Using CsvBeanReader to read a CSV file with a variable number of columns

甜味超标 2020-12-04 01:55

So I'm working on parsing a .csv file. I took the advice of another thread somewhere on StackOverflow and downloaded SuperCSV. I finally got pretty much everything working,

3 answers
  •  有刺的猬
    2020-12-04 02:24

    Edit: Update for Super CSV 2.0.0-beta-1

    Please note the API has changed in Super CSV 2.0.0-beta-1 (the code example is based on 1.52). The getCSVHeader() method on all readers is now getHeader() (to be in line with writeHeader on the writers).

    Also, SuperCSVException has been renamed to SuperCsvException.


    Edit: Update for Super CSV 2.1.0

    Since version 2.1.0 it's possible to execute the cell processors after reading a line of CSV by using the new executeProcessors() method. For more information see this example on the project website. Please note this is only relevant for CsvListReader, as it's the only reader that allows for variable column length.


    You're correct - CsvBeanReader doesn't support CSV files with a variable number of columns. According to most CSV specifications (including RFC 4180), the number of columns must be the same on every row.

    For this reason (as a Super CSV developer) I'm reluctant to add this functionality to Super CSV. If you can think of an elegant way to add it then feel free to make suggestions on the project's SourceForge site. It would probably mean a new reader that builds upon CsvBeanReader: it would have to split the reading and the mapping/processing into two separate methods (you can't do any processing or mapping to fields of the bean until you know how many columns there are).
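
    To make the "read first, map second" idea concrete, here is a minimal plain-Java sketch. It does not use Super CSV at all: TwoStepReader, readRow and mapRow are hypothetical names invented for illustration, the split on commas is naive (no quoted fields), and it assumes, as in the example later in this answer, that birthDate is the only column that can be missing.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Super CSV.
final class TwoStepReader {

    static final String[] NORMAL_HEADER = { "name", "birthDate", "city" };
    static final String[] NO_BIRTHDATE_HEADER = { "name", "city" };

    // Step 1: read a raw row without mapping or processing anything.
    // Assumption: simple data with no quoted fields or embedded commas.
    static List<String> readRow(String line) {
        // limit -1 keeps trailing empty fields
        return Arrays.asList(line.split(",", -1));
    }

    // Step 2: pick the name mapping only once the column count is known.
    static Map<String, String> mapRow(List<String> row) {
        final String[] header;
        if (row.size() == NORMAL_HEADER.length) {
            header = NORMAL_HEADER;
        } else if (row.size() == NO_BIRTHDATE_HEADER.length) {
            header = NO_BIRTHDATE_HEADER;
        } else {
            throw new IllegalArgumentException(
                    "unexpected number of columns: " + row.size());
        }
        // LinkedHashMap keeps the fields in column order
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            fields.put(header[i], row.get(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] lines = { "John,New York", "Sally,22/03/1974,London" };
        for (String line : lines) {
            System.out.println(mapRow(readRow(line)));
        }
    }
}
```

    A real reader would also have to run the cell processors and populate a bean in the second step; the sketch only shows why the column count has to be known before any mapping can happen.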

    Simple solution

    The simple solution to this (if you have control of the CSV file you're working with) is to simply add a blank column when writing your CSV file (the first line in your example would have a comma at the end - to indicate the last column is empty). That way, your CSV file will be valid (it will have the same number of columns on every row) and you can use CsvBeanReader as you're already doing.
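
    If the file is already written but you can pre-process it, the same effect can be had by padding short rows before parsing. The following is only a sketch under stated assumptions: RowPadder and padLine are hypothetical helpers (not part of Super CSV), the optional column is assumed to be the last one, and the data is assumed to contain no quoted fields or embedded commas.

```java
// Hypothetical helper, not part of Super CSV.
final class RowPadder {

    // Appends empty trailing fields until the line has expectedColumns fields.
    // Assumption: no quoted fields and no commas inside values.
    static String padLine(String line, int expectedColumns) {
        // limit -1 keeps trailing empty fields, so "a,b," counts as 3 columns
        int columns = line.split(",", -1).length;
        if (columns > expectedColumns) {
            throw new IllegalArgumentException(
                    "too many columns: " + columns);
        }
        StringBuilder padded = new StringBuilder(line);
        for (int i = columns; i < expectedColumns; i++) {
            padded.append(','); // one comma per missing trailing column
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        // Made-up rows where the optional column (birthDate) comes last
        System.out.println(padLine("John,New York", 3));
        System.out.println(padLine("Sally,London,22/03/1974", 3));
    }
}
```

    The first row comes out as "John,New York," (note the trailing comma) and the second is left unchanged, so every row ends up with three columns and can be handed to CsvBeanReader as usual.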

    If that's not possible, then all is not lost!

    Fancy solution

    As you probably realize, CsvBeanReader uses the name mapping to associate each column in the CSV file with a field in your bean, and the CellProcessor array to process each column. In other words, you have to know how many columns there are (and what they represent) if you want to use it.

    CsvListReader, on the other hand, is very primitive and can read rows of varying length (because it doesn't need to process or map them).

    So you can combine all the features of CsvBeanReader with CsvListReader (as done in the following example) by reading the file with both readers in parallel: using CsvListReader to figure out how many columns there are, and CsvBeanReader to do the processing/mapping.

    Note that this makes the assumption that it's only ever the birthDate column that may not be present (i.e. it wouldn't work if you can't tell which column is missing).

    package example;
    
    import java.io.StringReader;
    import java.util.Date;
    
    import org.supercsv.cellprocessor.ParseDate;
    import org.supercsv.cellprocessor.ift.CellProcessor;
    import org.supercsv.exception.SuperCSVException;
    import org.supercsv.io.CsvBeanReader;
    import org.supercsv.io.CsvListReader;
    import org.supercsv.io.ICsvBeanReader;
    import org.supercsv.io.ICsvListReader;
    import org.supercsv.prefs.CsvPreference;
    
    public class VariableColumns {
    
        private static final String INPUT = "name,birthDate,city\n"
            + "John,New York\n" 
            + "Sally,22/03/1974,London\n" 
            + "Jim,Sydney";
    
        // cell processors
        private static final CellProcessor[] NORMAL_PROCESSORS = 
        new CellProcessor[] {null, new ParseDate("dd/MM/yyyy"), null };
        private static final CellProcessor[] NO_BIRTHDATE_PROCESSORS = 
        new CellProcessor[] {null, null };
    
        // name mappings
        private static final String[] NORMAL_HEADER = 
        new String[] { "name", "birthDate", "city" };
        private static final String[] NO_BIRTHDATE_HEADER = 
        new String[] { "name", "city" };
    
        public static void main(String[] args) {
    
            // using bean reader and list reader together (to read the same file)
            final ICsvBeanReader beanReader = new CsvBeanReader(new StringReader(
                    INPUT), CsvPreference.STANDARD_PREFERENCE);
            final ICsvListReader listReader = new CsvListReader(new StringReader(
                    INPUT), CsvPreference.STANDARD_PREFERENCE);
    
            try {
                // skip over header
                beanReader.getCSVHeader(true);
                listReader.getCSVHeader(true);
    
                while (listReader.read() != null) {
    
                    final String[] nameMapping;
                    final CellProcessor[] processors;
    
                    if (listReader.length() == NORMAL_HEADER.length) {
                        // all columns present - use normal header/processors
                        nameMapping = NORMAL_HEADER;
                        processors = NORMAL_PROCESSORS;
    
                    } else if (listReader.length() == NO_BIRTHDATE_HEADER.length) {
                        // one less column - birth date must be missing
                        nameMapping = NO_BIRTHDATE_HEADER;
                        processors = NO_BIRTHDATE_PROCESSORS;
    
                    } else {
                        throw new SuperCSVException(
                                "unexpected number of columns: "
                                        + listReader.length());
                    }
    
                    // can now use CsvBeanReader safely 
                    // (we know how many columns there are)
                    Person person = beanReader.read(Person.class, nameMapping,
                            processors);
    
                    System.out.println(String.format(
                            "Person: name=%s, birthDate=%s, city=%s",
                            person.getName(), person.getBirthDate(),
                            person.getCity()));
    
                }
            } catch (Exception e) {
                // handle exceptions here
                e.printStackTrace();
            } finally {
                // close readers here
            }
        }
    
        public static class Person {
    
            private String name;
            private Date birthDate;
            private String city;
    
            public String getName() {
                return name;
            }
    
            public void setName(String name) {
                this.name = name;
            }
    
            public Date getBirthDate() {
                return birthDate;
            }
    
            public void setBirthDate(Date birthDate) {
                this.birthDate = birthDate;
            }
    
            public String getCity() {
                return city;
            }
    
            public void setCity(String city) {
                this.city = city;
            }
        }
    
    }
    

    I hope this helps.

    Oh, and is there any reason why the fields in your Entry class don't follow normal naming conventions (camelCase)? If you update your header array to use camelCase, then your fields can be camelCase as well.
