Parsing CSV in Java

臣服心动 2020-11-27 21:12

I have an unusual situation where I have to read the data horizontally: I am receiving a CSV file whose data is laid out in horizontal format, like below:

CompanyName,RunDat         


        
  • 2020-11-27 21:44

    String.split(",") isn't likely to work.
    It will split fields that have embedded commas ("Foo, Inc.") even though they are a single field in the CSV line.

    What if the company name is:
            Company, Inc.
    or worse:
            Joe's "Good, Fast, and Cheap" Food


    According to Wikipedia (http://en.wikipedia.org/wiki/Comma-separated_values):

    Fields with embedded commas must be enclosed within double-quote characters.

       1997,Ford,E350,"Super, luxurious truck"
    

    Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.

       1997,Ford,E350,"Super ""luxurious"" truck"
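    The two quoting rules above can be turned into a small helper for writing one field. This is only a sketch: the class and method names (`CsvEscape.escapeField`) are hypothetical, and the helper also quotes fields that contain line breaks, which the CSV format requires as well.

```java
// Hypothetical helper that applies the CSV quoting rules to a single field.
final class CsvEscape {
    private CsvEscape() {}

    static String escapeField(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field; // plain fields are written unchanged
        }
        // Double every embedded quote, then enclose the whole field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escapeField("Super, luxurious truck"));   // "Super, luxurious truck"
        System.out.println(escapeField("Super \"luxurious\" truck")); // "Super ""luxurious"" truck"
    }
}
```

    Quoting only when needed keeps plain fields such as E350 unchanged, matching the Wikipedia examples.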
    


    Even worse, quoted fields may have embedded line breaks (newlines; "\n"):

    Fields with embedded line breaks must be enclosed within double-quote characters.

       1997,Ford,E350,"Go get one now  
       they are going fast"
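    Taken together, these rules mean a reader has to track whether it is currently inside quotes, which a plain split cannot do. Below is a minimal state-machine sketch; the name `CsvRecords.parse` is hypothetical, and it is lenient (malformed input is not reported as an error). A maintained CSV library is still the safer choice in practice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, lenient CSV reader: handles quoted commas, doubled quotes
// and quoted line breaks, but does not validate malformed input.
final class CsvRecords {
    private CsvRecords() {}

    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes is one literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas and line breaks are literal here
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {        // end of record
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (c != '\r') {        // ignore carriage returns outside quotes (\r\n endings)
                field.append(c);
            }
        }
        // Flush the last record if the text does not end with a newline.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }
}
```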
    



    The following program demonstrates the problem with using String.split(",") to split on commas:

    The CSV line is:

    a,b,c,"Company, Inc.", d, e,"Joe's ""Good, Fast, and Cheap"" Food", f, 10/11/2010,1/1/2011, g, h, i


    // Test String.split(",") against a CSV line with
    // embedded commas and embedded double quotes in
    // quoted text fields.
    //
    // The company names are:
    //        Company, Inc.
    //        Joe's "Good, Fast, and Cheap" Food
    //
    // which should be written in a CSV file as:
    //        "Company, Inc."
    //        "Joe's ""Good, Fast, and Cheap"" Food"
    //
    public class TestSplit {
        // Splits s on the given regex and prints each segment on its own line.
        public static void printSplit(String s, String regex) {
            String[] segments = s.split(regex);

            for (String seg : segments) {
                System.out.println(seg);
            }
        }

        public static void main(String[] args) {
            // Same line as shown above, including the " g," field.
            String csvLine = "a,b,c,\"Company, Inc.\", d,"
                           + " e,\"Joe's \"\"Good, Fast,"
                           + " and Cheap\"\" Food\", f,"
                           + " 10/11/2010,1/1/2011, g, h, i";

            System.out.println("CSV line is:\n" + csvLine + "\n\n");
            printSplit(csvLine, ",");
        }
    }


    Produces the following:

    
    D:\projects\TestSplit>javac TestSplit.java
    
    D:\projects\TestSplit>java  TestSplit
    CSV line is:
    a,b,c,"Company, Inc.", d, e,"Joe's ""Good, Fast, and Cheap"" Food", f, 10/11/2010,1/1/2011, g, h, i
    
    
    a
    b
    c
    "Company
     Inc."
     d
     e
    "Joe's ""Good
     Fast
     and Cheap"" Food"
     f
     10/11/2010
    1/1/2011
     g
     h
     i
    
    D:\projects\TestSplit>
    



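    Whereas that CSV line should be split into these fields (a real CSV parser would additionally strip the enclosing quotes and turn each "" into a single "):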

    
    a
    b
    c
    "Company, Inc."
     d
     e
    "Joe's ""Good, Fast, and Cheap"" Food"
     f
     10/11/2010
    1/1/2011
     g
     h
     i
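    One commonly used workaround for single lines (not part of the original answer) is to split only on commas that are followed by an even number of double quotes, i.e. commas outside any quoted field. The class name `QuoteAwareSplit` is hypothetical. Applied to the line above, it should give the 13 fields just listed, quotes included. It does not unquote fields, cannot handle quoted line breaks, and the lookahead rescans the rest of the line at every comma, so it only suits short lines.

```java
// Sketch: split on commas that sit outside double-quoted fields.
// Quotes are kept as-is; quoted line breaks are not supported.
final class QuoteAwareSplit {
    private QuoteAwareSplit() {}

    // A comma followed by an even number of '"' up to end of line is not inside quotes.
    private static final String COMMA_OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] split(String line) {
        return line.split(COMMA_OUTSIDE_QUOTES, -1); // -1 keeps trailing empty fields
    }

    public static void main(String[] args) {
        String csvLine = "a,b,c,\"Company, Inc.\", d,"
                + " e,\"Joe's \"\"Good, Fast,"
                + " and Cheap\"\" Food\", f,"
                + " 10/11/2010,1/1/2011, g, h, i";
        for (String seg : split(csvLine)) {
            System.out.println(seg);
        }
    }
}
```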
    
  • 2020-11-27 21:44

    java.time

    Assuming you are using a CSV library for reading the file and supposing that you get the individual values as strings from that library:

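        // Imports: java.time.LocalDate, java.time.format.DateTimeFormatter,
        //          java.time.format.DateTimeParseException
        DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("MM/dd/uuuu");

        String valueFromCsvLibrary = "10/27/2010";
        try {
            LocalDate date = LocalDate.parse(valueFromCsvLibrary, dateFormatter);
            System.out.println("Parsed date: " + date);
        } catch (DateTimeParseException dtpe) {
            System.err.println("Not a valid date: " + dtpe);
        }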
    
    Parsed date: 2010-10-27
    

    You should prefer to process the dates as LocalDate in your code (neither as strings nor as instances of the long outdated and poorly designed Date class).

    Even though I don’t have experience with one myself, I am quite convinced that I would go with an open-source CSV library.

    Only if you are sure that the CSV file contains no quotes, no line breaks inside values, no commas in the values and no other complications, and for some reason you choose to parse it by hand:

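        // dateFormatter is the same DateTimeFormatter as in the previous snippet.
        String lineFromCsvFile = "CompanyName,RunDate,10/27/2010,11/12/2010,11/27/2010,12/13/2010,12/27/2010";
        String[] values = lineFromCsvFile.split(",");
        // Treat the remaining values as dates only if the label column says RunDate.
        if (values.length > 1 && values[1].equals("RunDate")) {
            for (int i = 2; i < values.length; i++) {
                LocalDate date = LocalDate.parse(values[i], dateFormatter);
                System.out.println("Parsed date: " + date);
            }
        }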
    
    Parsed date: 2010-10-27
    Parsed date: 2010-11-12
    Parsed date: 2010-11-27
    Parsed date: 2010-12-13
    Parsed date: 2010-12-27
    

    Exception handling works as before, so there is no need to repeat it.

  • 2020-11-27 21:46

    As others have suggested, you can use opencsv for splitting and parsing.

    For simple data, split each line on ",", parse the values, and add them all to a List.
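    For the simple case, the split-and-collect approach might look like this sketch. It assumes no field contains a comma, quote or line break, and `SimpleCsv.toList` is a hypothetical name. Passing a negative limit to `split` keeps trailing empty fields, which the one-argument form silently drops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: only safe for lines without quoted fields.
final class SimpleCsv {
    private SimpleCsv() {}

    static List<String> toList(String line) {
        // The -1 limit keeps trailing empty fields: "a,b," -> [a, b, ""].
        return new ArrayList<>(Arrays.asList(line.split(",", -1)));
    }
}
```

    Wrapping the result in an ArrayList gives a list you can add to or remove from, unlike the fixed-size view returned by Arrays.asList.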

  • 2020-11-27 21:46

    Use java.util.Scanner: call useDelimiter(",") to make the comma your delimiter, and read tokens with next(). The Scanner can be created directly from your file or from a string read from the file.
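    One caveat: with "," as the only delimiter, a line break is not a delimiter, so the last value of one row and the first value of the next come back as a single token. A sketch that avoids this reads each line first and then tokenizes it with its own comma-delimited Scanner; `ScannerCsv.read` is a hypothetical name, and quoted fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Sketch: one Scanner for lines, one comma-delimited Scanner per line.
final class ScannerCsv {
    private ScannerCsv() {}

    static List<List<String>> read(String text) {
        List<List<String>> rows = new ArrayList<>();
        try (Scanner lines = new Scanner(text)) {
            while (lines.hasNextLine()) {
                List<String> row = new ArrayList<>();
                // useDelimiter returns the same Scanner, so it can be the resource.
                try (Scanner fields = new Scanner(lines.nextLine()).useDelimiter(",")) {
                    while (fields.hasNext()) {
                        row.add(fields.next());
                    }
                }
                rows.add(row);
            }
        }
        return rows;
    }
}
```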

  • 2020-11-27 21:55

    Start by reading the entire line into a String. Then use String.split(...) with "," as the delimiter to get all the tokens on the line. (split takes a regular expression, but a comma has no special meaning in regex syntax, so it does not need escaping.)
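    The sketch below checks that a plain comma pattern and an escaped one split a line identically. In Java source the escaped form must be written "\\," because "\," is not a legal string literal; the regex engine accepts a backslash before any non-alphabetic character and treats that character literally.

```java
import java.util.Arrays;

// Sketch: "," and "\\," are equivalent split patterns.
final class SplitOnComma {
    private SplitOnComma() {}

    public static void main(String[] args) {
        String line = "CompanyName,RunDate,10/27/2010";
        String[] plain = line.split(",");     // comma is not a regex metacharacter
        String[] escaped = line.split("\\,"); // escaping is allowed but unnecessary
        System.out.println(Arrays.toString(plain));        // [CompanyName, RunDate, 10/27/2010]
        System.out.println(Arrays.equals(plain, escaped)); // true
    }
}
```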

  • 2020-11-27 22:00

    A CSV file is a newline-terminated (\n) text file in which the columns can be separated by either:

    • Comma or
    • Tabs \t

    I suggest using a BufferedReader to read the CSV file, calling its readLine() method to read each row.

    Split each row with String.split(arg), where arg is your comma or tab (\t), to get an array of columns. From there, you know what to do.
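    The readLine()/split approach might be sketched as follows, assuming no quoted fields. `DelimitedReader.readRows` is a hypothetical name; it takes any Reader so it can be tried on a string, and with a file you could pass, for example, `Files.newBufferedReader(path)`. `Pattern.quote` makes the delimiter literal, so the same code works for ',' and '\t'.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: one String[] per line, split on a literal delimiter character.
final class DelimitedReader {
    private DelimitedReader() {}

    static List<String[]> readRows(Reader source, char delimiter) throws IOException {
        String regex = Pattern.quote(String.valueOf(delimiter)); // match the delimiter literally
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) { // readLine() drops the line terminator
                rows.add(line.split(regex, -1));         // -1 keeps trailing empty columns
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = readRows(new StringReader("a\tb\tc\n1\t2\t3\n"), '\t');
        System.out.println(rows.size()); // 2
    }
}
```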
