Reading and Writing CSV and TSV Files with Super CSV
First, a quick look at the difference between CSV and TSV files:
TSV stands for tab-separated values. For the format, see: http://en.wikipedia.org/wiki/Tab-separated_values
CSV stands for comma-separated values. For the format, see: http://en.wikipedia.org/wiki/Comma-separated_values
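My project needed to tidy up the data in an existing TSV file and produce a new, more convenient TSV file with a few extra columns, which means both reading and writing TSV. This would be simple enough to implement by hand, but the Super CSV library already exists, so I decided to try it.
Official site: http://supercsv.sourceforge.net/index.html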
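To make the "simple enough to implement by hand" remark concrete, here is a minimal sketch of a quote-aware line splitter that takes the delimiter as a parameter. The class `DelimitedLineDemo` and the method `parseLine` are hypothetical names used only for illustration. This is not how Super CSV is implemented, and it deliberately ignores line breaks embedded in quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Super CSV. */
class DelimitedLineDemo {

    /**
     * Splits a single record into fields using the given delimiter.
     * A field may be wrapped in double quotes, and a doubled quote ("")
     * inside a quoted field stands for one literal quote character.
     * Line breaks inside quoted fields are not handled by this sketch.
     */
    static List<String> parseLine(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: keep one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // delimiters inside quotes are plain text
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        // The same code handles both formats; only the delimiter argument changes.
        System.out.println(parseLine("2,Bob,\"Menlo Park, CA\",Y", ','));
        System.out.println(parseLine("2\tBob\tMenlo Park, CA\tY", '\t'));
    }
}
```

Multi-line quoted fields, configurable quote characters, and per-column validation are where a library starts to pay off.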
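The documentation is clear and easy to follow, and there are plenty of examples online of parsing CSV files with Super CSV. Given how little TSV and CSV differ, a single set of code can handle both: only the delimiter needs to change. Super CSV does in fact work this way.
First, the example from the official site: http://supercsv.sourceforge.net/examples_reading.html
The CSV file to parse: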
customerNo,firstName,lastName,birthDate,mailingAddress,married,numberOfKids,favouriteQuote,email,loyaltyPoints
1,John,Dunbar,13/06/1945,"1600 Amphitheatre Parkway Mountain View, CA 94043 United States",,,"""May the Force be with you."" - Star Wars",jdunbar@gmail.com,0
2,Bob,Down,25/02/1919,"1601 Willow Rd. Menlo Park, CA 94025 United States",Y,0,"""Frankly, my dear, I don't give a damn."" - Gone With The Wind",bobdown@hotmail.com,123456
3,Alice,Wunderland,08/08/1985,"One Microsoft Way Redmond, WA 98052-6399 United States",Y,0,"""Play it, Sam. Play ""As Time Goes By."""" - Casablanca",throughthelookingglass@yahoo.com,2255887799
4,Bill,Jobs,10/07/1973,"2701 San Tomas Expressway Santa Clara, CA 95050 United States",Y,3,"""You've got to ask yourself one question: ""Do I feel lucky?"" Well, do ya, punk?"" - Dirty Harry",billy34@hotmail.com,36
Code that parses it with CsvMapReader:
import java.io.FileReader;
import java.util.Map;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.ParseBool;
import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.ParseInt;
import org.supercsv.cellprocessor.constraint.LMinMax;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.constraint.StrRegEx;
import org.supercsv.cellprocessor.constraint.UniqueHashCode;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvMapReader;
import org.supercsv.io.ICsvMapReader;
import org.supercsv.prefs.CsvPreference;

/**
 * An example of reading using CsvMapReader.
 */
private static void readWithCsvMapReader() throws Exception {

    ICsvMapReader mapReader = null;
    try {
        // CSV_FILENAME is the path to the CSV file above; its definition is not shown in this excerpt
        mapReader = new CsvMapReader(new FileReader(CSV_FILENAME), CsvPreference.STANDARD_PREFERENCE);

        // the header columns are used as the keys to the Map
        final String[] header = mapReader.getHeader(true);
        final CellProcessor[] processors = getProcessors();

        Map<String, Object> customerMap;
        while( (customerMap = mapReader.read(header, processors)) != null ) {
            System.out.println(String.format("lineNo=%s, rowNo=%s, customerMap=%s", mapReader.getLineNumber(),
                mapReader.getRowNumber(), customerMap));
        }
    }
    finally {
        if( mapReader != null ) {
            mapReader.close();
        }
    }
}

/**
 * Sets up the processors used for the examples. There are 10 CSV columns, so 10 processors are defined. Empty
 * columns are read as null (hence the NotNull() for mandatory columns).
 *
 * @return the cell processors
 */
private static CellProcessor[] getProcessors() {

    final String emailRegex = "[a-z0-9\\._]+@[a-z0-9\\.]+"; // just an example, not very robust!
    StrRegEx.registerMessage(emailRegex, "must be a valid email address");

    final CellProcessor[] processors = new CellProcessor[] {
        new UniqueHashCode(), // customerNo (must be unique)
        new NotNull(), // firstName
        new NotNull(), // lastName
        new ParseDate("dd/MM/yyyy"), // birthDate
        new NotNull(), // mailingAddress
        new Optional(new ParseBool()), // married
        new Optional(new ParseInt()), // numberOfKids
        new NotNull(), // favouriteQuote
        new StrRegEx(emailRegex), // email
        new LMinMax(0L, LMinMax.MAX_LONG) // loyaltyPoints
    };

    return processors;
}
The sample code could hardly be clearer. Only one point needs explaining: the delimiter is set through CsvPreference.STANDARD_PREFERENCE. To parse a TSV file, simply replace it with CsvPreference.TAB_PREFERENCE.
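As a rough sketch of the task that motivated this post (read a TSV file, add a column, write a new TSV file), the following uses CsvListReader and CsvListWriter with CsvPreference.TAB_PREFERENCE on both sides. The class name `TsvAddColumnDemo`, the method `addFullNameColumn`, and the added `fullName` column are hypothetical and chosen only for illustration. It reads from and writes to in-memory strings so it stays self-contained, and it assumes Super CSV 2.x is on the classpath and that firstName and lastName are the second and third columns.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.supercsv.io.CsvListReader;
import org.supercsv.io.CsvListWriter;
import org.supercsv.io.ICsvListReader;
import org.supercsv.io.ICsvListWriter;
import org.supercsv.prefs.CsvPreference;

/** Hypothetical demo class; only the org.supercsv types come from the library. */
class TsvAddColumnDemo {

    /** Reads TSV text, appends a "fullName" column (illustrative), and returns the new TSV text. */
    static String addFullNameColumn(String tsvInput) {
        StringWriter out = new StringWriter();
        // TAB_PREFERENCE switches the delimiter to '\t' for both reading and writing
        try (ICsvListReader reader = new CsvListReader(new StringReader(tsvInput), CsvPreference.TAB_PREFERENCE);
             ICsvListWriter writer = new CsvListWriter(out, CsvPreference.TAB_PREFERENCE)) {

            String[] header = reader.getHeader(true);
            if (header == null) {
                return ""; // empty input: nothing to copy
            }
            String[] newHeader = Arrays.copyOf(header, header.length + 1);
            newHeader[header.length] = "fullName";
            writer.writeHeader(newHeader);

            List<String> row;
            while ((row = reader.read()) != null) {
                List<String> newRow = new ArrayList<>();
                for (String value : row) {
                    // empty columns are read as null; write them back as empty strings
                    newRow.add(value == null ? "" : value);
                }
                // assumes firstName is column index 1 and lastName is column index 2
                newRow.add(newRow.get(1) + " " + newRow.get(2));
                writer.write(newRow);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // the writer has been closed (and flushed) by try-with-resources at this point
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "customerNo\tfirstName\tlastName\n1\tJohn\tDunbar\n2\tBob\tDown\n";
        System.out.print(addFullNameColumn(input));
    }
}
```

For real files, the StringReader and StringWriter would be swapped for a FileReader and FileWriter; the rest of the sketch stays the same.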
For reference, here are the predefined preferences from the CsvPreference source:
/**
 * Ready to use configuration that should cover 99% of all usages.
 */
public static final CsvPreference STANDARD_PREFERENCE = new CsvPreference.Builder('"', ',', "\r\n").build();

/**
 * Ready to use configuration for Windows Excel exported CSV files.
 */
public static final CsvPreference EXCEL_PREFERENCE = new CsvPreference.Builder('"', ',', "\n").build();

/**
 * Ready to use configuration for north European excel CSV files (columns are separated by ";" instead of ",")
 */
public static final CsvPreference EXCEL_NORTH_EUROPE_PREFERENCE = new CsvPreference.Builder('"', ';', "\n").build();

/**
 * Ready to use configuration for tab-delimited files.
 */
public static final CsvPreference TAB_PREFERENCE = new CsvPreference.Builder('"', '\t', "\n").build();
This article is licensed by the author under CC BY 4.0.