Spock: Reading Test Data from CSV File

Question

I'm trying to write an elegant Spock specification that reads a very large set of test data from a CSV file without loading all of it into memory. I'd like your feedback on how to improve on what I currently have here.

Let's assume my simplified CSV file looks like this:

1,2
3,4
5,6

The assertion is "column 1" + 1 == "column 2"

I'm using OpenCSV for the CSV parsing because the actual CSV file contains strings with special characters such as double quotes and commas, so naive parsing that splits each line on commas will not work.

<dependency>
    <groupId>net.sf.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>2.3</version>
</dependency>

Attempt 1

My first attempt loops through the CSV and asserts on every row. While this approach works, I can't use @Unroll to report each row as a separate, independent test.

def "read from csv"() {
    expect:
    def reader = new CSVReader(...)
    def fields

    while ((fields = reader.readNext()) != null) {
        def firstNum = Integer.valueOf(fields[0])
        def secondNum = Integer.valueOf(fields[1])

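        // Spock only treats top-level expressions in an expect block as
        // conditions; inside a loop an explicit assert is required.
        assert firstNum + 1 == secondNum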
    }
}

Attempt 2

This attempt allows me to use @Unroll, but it requires loading the entire data set into memory, which is what I'm trying to avoid in the first place.

@Unroll
def "read from csv"() {
    expect:
    Integer.valueOf(firstNum as String) + 1 == Integer.valueOf(secondNum as String)

    where:
    [firstNum, secondNum] << new CSVReader(...).readAll()
}

Attempt 3

After reading http://spock-framework.readthedocs.org/en/latest/data_driven_testing.html#data-pipes , I realized I can create an object that implements Iterable. Spock asks the data provider for the next value only when it is needed, which is exactly what I want.

@Unroll
def "read from csv"() {
    given:
    CSVParser csvParser = new CSVParser()

    expect:
    def fields = csvParser.parseLine(line as String)
    def firstNum = Integer.valueOf(fields[0])
    def secondNum = Integer.valueOf(fields[1])

    firstNum + 1 == secondNum

    where:
    line << new Iterable() {
        @Override
        Iterator iterator() {
            return new Scanner(...)
        }
    }
}

This attempt isn't too bad, but it looks weird that I have to do some CSV parsing in the expect block that clutters the actual intent here, which is to perform the assertion.
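One caveat about using Scanner as the iterator: java.util.Scanner does implement Iterator<String>, but with its default delimiter it yields whitespace-separated tokens rather than lines, so a row such as `John Smith,42` would arrive in two pieces. A minimal Java sketch of a line-based alternative follows. LazyLines is a hypothetical name, and the sketch assumes that no quoted field contains an embedded line break (if one does, reading by line is not enough and the CSV reader itself has to drive the iteration).

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical helper: iterates a character stream one line at a time. */
final class LazyLines implements Iterable<String> {
    // A supplier is used so that every call to iterator() opens a fresh reader.
    private final Supplier<Reader> source;

    LazyLines(Supplier<Reader> source) {
        this.source = source;
    }

    @Override
    public Iterator<String> iterator() {
        // BufferedReader.lines() is lazy: a line is read only when the
        // iterator is asked for it, so the file is never fully in memory.
        // Note: this sketch does not close the reader.
        return new BufferedReader(source.get()).lines().iterator();
    }
}
```

In the where block this could stand in for the anonymous Iterable, with the supplier opening whatever reader the elided Scanner argument would have used.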

Attempt 4

My final attempt pretty much creates an iterator wrapper that will return the fields as separate variables, but the code is rather ugly to read unless I extract the Iterable class into a separate API.

@Unroll
def "read from csv"() {
    expect:
    Integer.valueOf(firstNum as String) + 1 == Integer.valueOf(secondNum as String)

    where:
    [firstNum, secondNum] << new Iterable() {
        @Override
        Iterator iterator() {
            new Iterator() {
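                def reader = new CSVReader(...)

                // Read one row ahead so that hasNext() has no side effects
                // and can be called repeatedly without skipping rows.
                def fields = reader.readNext()

                @Override
                boolean hasNext() {
                    return fields != null
                }

                @Override
                Object next() {
                    def current = fields
                    fields = reader.readNext()
                    return current
                }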

                @Override
                void remove() {
                    throw new UnsupportedOperationException()
                }
            }
        }
    }
}
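An iterator whose hasNext() advances the underlying reader will skip a row whenever hasNext() is called twice before next(). Reading one element ahead avoids this, and the logic can be pulled out of the specification into a small reusable class. The following is a minimal Java sketch, assuming only that the source signals end of input by returning null, as CSVReader.readNext() does. RowSource and LookaheadIterator are hypothetical names, not part of OpenCSV or Spock.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical source of rows that signals end of input with null. */
interface RowSource<T> {
    T readNext() throws IOException;
}

/** Hypothetical adapter: turns a null-terminated source into an Iterator. */
final class LookaheadIterator<T> implements Iterator<T> {
    private final RowSource<T> source;
    private T next; // the row that next() will return, or null at end of input

    LookaheadIterator(RowSource<T> source) {
        this.source = source;
        this.next = read(); // read one row ahead
    }

    private T read() {
        try {
            return source.readNext();
        } catch (IOException e) {
            // Iterator methods cannot throw checked exceptions.
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return next != null; // no side effects, safe to call repeatedly
    }

    @Override
    public T next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        T current = next;
        next = read(); // advance only when a row is actually consumed
        return current;
    }
}
```

With the OpenCSV 2.3 reader used above, this could presumably be constructed from Java as `new LookaheadIterator<String[]>(reader::readNext)`, leaving only a one-line Iterable in the where block.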

Question

My question is: how would you approach this problem? Is there a better way (or a better CSV library)? As far as I know, Apache Commons CSV is the only parser that implements Iterable, but it has been a SNAPSHOT for a long time.

Thanks much.

Answer 1:

Write a utility class CSVFile that implements Iterable<Iterable<String>> (or Iterable<Iterable<Integer>>). Then use where: [firstNum, secondNum] << new CSVFile("path/to/file").
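To illustrate the Iterable<Iterable<Integer>> variant mentioned above, here is a minimal Java sketch of a wrapper that converts each row of strings to integers only when that row is requested. IntRows is a hypothetical name, and the sketch assumes every field is a plain decimal integer (Integer.valueOf throws NumberFormatException otherwise). Rows are returned as lists so they can be destructured by position.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical wrapper: lazily converts rows of strings to rows of integers. */
final class IntRows implements Iterable<List<Integer>> {
    private final Iterable<? extends Iterable<String>> rows;

    IntRows(Iterable<? extends Iterable<String>> rows) {
        this.rows = rows;
    }

    @Override
    public Iterator<List<Integer>> iterator() {
        final Iterator<? extends Iterable<String>> it = rows.iterator();
        return new Iterator<List<Integer>>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public List<Integer> next() {
                // Convert only the row being requested; nothing is buffered.
                List<Integer> converted = new ArrayList<>();
                for (String field : it.next()) {
                    converted.add(Integer.valueOf(field.trim()));
                }
                return converted;
            }
        };
    }
}
```

Wrapping the data provider this way, for example `[firstNum, secondNum] << new IntRows(new CSVFile("path/to/file"))`, would let the expect block shrink to `firstNum + 1 == secondNum`.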

Answer 2:

GroovyCSV will probably do what you are looking for:

GroovyCSV is a library to make csv processing just a little bit Groovier. The library uses opencsv behind the scenes and merely tries to add a thin layer of “Groovy-ness” to the mix.

Its CsvParser methods return iterators.

Answer 3:

It might be too late, but I've coded this based on Peter Niederwieser's suggestion. I will try to submit this to Spock.

It depends on the Guava and Apache Commons CSV libraries. I will try to remove these dependencies before submitting the patch.

import com.google.common.collect.Lists;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;

/**
 * @author Aravind R Yarram
 * @version 1.0.0-SNAPSHOT
 * @since 1.0.0
 */
public class CSVFile implements Iterable<Iterable<String>>
{
    private final String fileName;

    public CSVFile(String fileName)
    {
        this.fileName = fileName;
    }

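    /**
     * Opens the file and returns an iterator that yields one CSV record per
     * call to next(), each as a list of its field values. The underlying
     * reader is not closed by this class.
     *
     * @return an iterator over the records of the file
     */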
    @Override public Iterator<Iterable<String>> iterator()
    {
        Iterable<CSVRecord> records = null;

        try
        {
            Reader in = new FileReader(fileName);
            records = CSVFormat.EXCEL.parse(in);
        }
        catch (IOException e)
        {
            throw new RuntimeException(e);
        }
        final Iterator<CSVRecord> it = records.iterator();

        return new Iterator<Iterable<String>>()
        {

            @Override public boolean hasNext()
            {
                return it.hasNext();
            }

            /**
             * Returns the next element in the iteration.
             *
             * @return the next element in the iteration
             * @throws NoSuchElementException if the iteration has no more elements
             */
            @Override public Iterable<String> next()
            {
                CSVRecord next = it.next();

                return Lists.newArrayList(next.iterator());
            }

            @Override public void remove()
            {
                throw new UnsupportedOperationException("Not supported");
            }
        };
    }
}

Source: https://stackoverflow.com/questions/25189342/spock-reading-test-data-from-csv-file

Tags

unit-testing

csv

groovy

spock

opencsv