NullValues Option Not Working When Loading to DataTable

Question

When reading a CSV into a DataTable, I am trying to add options for boolean and null values that don't seem to be working. For example, a file containing data similar to:

Id,MaxDiscount,Name,Active,AltId
1,,Foo,1,ABC123
2,10,Bar,0,DEF345

And the following logic that uses a schema file to dynamically get the headers and data types we are expecting:

var dt = new DataTable();
using (var reader = new StreamReader(file.FullName))
using (var csv = new CsvReader(reader))
{
    csv.Configuration.HasHeaderRecord = true;
    csv.Configuration.IgnoreQuotes = false;
    csv.Configuration.TypeConverterOptionsCache.GetOptions<int>().NullValues.Add(string.Empty);
    csv.Configuration.TypeConverterOptionsCache.GetOptions<bool>().BooleanFalseValues.Add("0");
    csv.Configuration.TypeConverterOptionsCache.GetOptions<bool>().BooleanTrueValues.Add("1");

    using (var dr = new CsvDataReader(csv))
    {
        foreach (var p in schema.Properties)
        {
            var type = Type.GetType(p.Type, true, true);
            var dc = new DataColumn
            {
                ColumnName = p.Name,
                Unique = p.IsId,
                AllowDBNull = p.Nullable,
                DataType = type
            };

            dt.Columns.Add(dc);
        }
        dt.Load(dr);
    }
}

This leads to the error: String was not recognized as a valid Boolean. Couldn't store <0> in Active Column. Expected type is Boolean.

If I manually change the data and replace 0 with false and 1 with true, the boolean values then work, but I get a similar error: Input string was not in a correct format. Couldn't store <> in MaxDiscount Column. Expected type is Int32.

Is there something I am missing here in order to get this to work? Or do the Type Converter options only work on known objects?

EDIT:

I am unable to use any pre-defined object models when parsing the CSV files as they can contain any number of fields. As long as a schema exists, then the program should know how to handle it. An example schema would be something like the following:

{
  "type": "Part",
  "bucket": "s3Bucket",
  "prefix": "prefix/of/datafile",
  "targetDirectory": "..\\path\\to\\working\\dir",
  "delimiter": ",",
  "properties": [
    {
      "name": "Id",
      "type": "System.String",
      "required": true,
      "nullable": false,
      "isId": true,
      "defaultValue": null,
      "minLength": 6,
      "maxLength": 8
    },
    {
      "name": "MaxDiscount",
      "type": "System.Int32",
      "required": true,
      "nullable": true,
      "isId": false,
      "defaultValue": null,
      "minLength": -1,
      "maxLength": -1
    },
    {
      "name": "Name",
      "type": "System.String",
      "required": true,
      "nullable": false,
      "isId": false,
      "defaultValue": null,
      "minLength": 1,
      "maxLength": 127
    },
    {
      "name": "Active",
      "type": "System.Boolean",
      "required": true,
      "nullable": false,
      "isId": false,
      "defaultValue": null,
      "minLength": 1,
      "maxLength": 1
    },
    {
      "name": "AltId",
      "type": "System.String",
      "required": true,
      "nullable": true,
      "isId": false,
      "defaultValue": null,
      "minLength": 1,
      "maxLength": 127
    }
  ]
}

In this case, the Properties in the schema would relate to columns in the CSV file. This, in theory, would allow me to parse the files and validate the data types at runtime, rather than having to create a new object model each time a new CSV layout is introduced.

Answer 1:

In my opinion the CsvDataReader class is useless here. Its implementation of GetFieldType returns typeof(string) and GetValue also returns strings, so although it implements the typed data accessor methods, they are never called by the DataTable.Load method.

Thus no CsvHelper mapping occurs - the conversion is done by DataTable using standard string to type converters.
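The two errors in the question can be reproduced without CsvHelper at all. The following is a minimal sketch, assuming (as the error messages suggest) that DataTable falls back to the standard .NET string parsers; the class and method names are hypothetical and only serve the illustration:

```csharp
using System;

// Hypothetical helper: shows which raw CSV strings the standard .NET parsers accept.
public static class DefaultParsingDemo
{
    // True when the standard boolean parser accepts the text.
    public static bool CanParseBool(string text) => bool.TryParse(text, out _);

    // True when the standard integer parser accepts the text.
    public static bool CanParseInt(string text) => int.TryParse(text, out _);

    public static void Main()
    {
        Console.WriteLine(CanParseBool("0"));     // False: "0" is not a valid Boolean string
        Console.WriteLine(CanParseBool("false")); // True
        Console.WriteLine(CanParseInt(""));       // False: an empty field is not a valid Int32
        Console.WriteLine(CanParseInt("10"));     // True
    }
}
```

Because these parsers know nothing about the options set on csv.Configuration, adding NullValues or BooleanFalseValues there cannot change the outcome.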

I would suggest removing the usage of CsvDataReader class and replacing the dt.Load(dr); call with something like this:

static void Load(DataTable dt, CsvReader csv)
{
    if (csv.Configuration.HasHeaderRecord)
    {
        if (!csv.Read()) return;
        csv.ReadHeader();
    }
    var valueTypes = new Type[dt.Columns.Count];
    for (int i = 0; i < valueTypes.Length; i++)
    {
        var dc = dt.Columns[i];
        var type = dc.DataType;
        if (dc.AllowDBNull && type.IsValueType)
            type = typeof(Nullable<>).MakeGenericType(type);
        valueTypes[i] = type;
    }
    var valueBuffer = new object[valueTypes.Length];
    dt.BeginLoadData();
    while (csv.Read())
    {
        for (int i = 0; i < valueBuffer.Length; i++)
            valueBuffer[i] = csv.GetField(valueTypes[i], i);
        dt.LoadDataRow(valueBuffer, true);
    }
    dt.EndLoadData();
}

Essentially, this prepares a column type mapping and uses the CsvReader.GetField(type, index) method to populate the DataRow values. This way the conversion is performed by the CsvReader class and uses all the conversion options.

Btw, none of the shown options for boolean or null values is really needed - they are all handled by the CsvHelper default type converters.
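To check the per-field conversion in isolation, something like the following could be used. It is only a sketch: it assumes the same older CsvHelper version as the question (newer versions require a CultureInfo argument in the CsvReader constructor), and the class name, method name and inline sample data are made up for the example.

```csharp
using System;
using System.IO;
using CsvHelper;

// Hypothetical demo class: converts one data row with CsvReader.GetField(type, index).
public static class GetFieldDemo
{
    public static object[] ReadFirstRow()
    {
        // First data row of the question's sample file.
        const string data = "Id,MaxDiscount,Name,Active,AltId\n1,,Foo,1,ABC123\n";

        // Target types per column; int? stands in for a nullable Int32 column.
        var types = new[]
        {
            typeof(string), typeof(int?), typeof(string), typeof(bool), typeof(string)
        };

        using (var reader = new StringReader(data))
        using (var csv = new CsvReader(reader)) // assumption: older constructor without CultureInfo
        {
            csv.Read();
            csv.ReadHeader(); // consume the header record
            csv.Read();       // advance to the first data record

            var values = new object[types.Length];
            for (int i = 0; i < types.Length; i++)
                values[i] = csv.GetField(types[i], i); // conversion done by CsvHelper's converters
            return values;
        }
    }
}
```

If the default converters behave as described, the empty MaxDiscount field comes back as null and the Active field as a real bool, with no extra options configured.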

Answer 2:

From the CsvHelper documentation:

If you want to specify columns and column types, the data table will be loaded with the types automatically converted.

From what I can see, the CsvReader type converter options are ignored when using CsvDataReader.

But if you use csv.GetRecords, the defined type converter options are used.

List<csvData> result = csv.GetRecords<csvData>().ToList();

You will need a class for your CSV file, as below:

public class csvData
{
    public int Id { get; set; }
    public string MaxDiscount { get; set; }
    public string Name { get; set; }
    public bool Active { get; set; }
    public string AltId { get; set; }
}

Answer 3:

[Second try]

I was able to load data into a DataTable object via CsvDataReader as long as the collection of DataColumns was created by CsvDataReader and Configuration.Delimiter was set to a comma, but... the boolean field (Active) wasn't really boolean.

As per my tests and my understanding of the documentation, there is only one way to get proper data: via a helper class whose fields are decorated with attributes. Two of them are very important:

BooleanFalseValuesAttribute: The string values used to represent a boolean false when converting.
BooleanTrueValuesAttribute: The string values used to represent a boolean true when converting.

So, the decorated class may look like:

public class MyData
{
    [Name("Id")]
    public int Id { get; set; }
    [Name("MaxDiscount")]
    public int? MaxDiscount { get; set; }
    [Name("Name")]
    public string Name { get; set; }
    [Name("Active")]
    [BooleanTrueValues("1")]
    [BooleanFalseValues("0")]
    public bool? Active { get; set; }
    [Name("AltId")]
    public string AltId { get; set; }
}

And the helper class, which maps the fields:

public class MyDataMapper: ClassMap<MyData>
{
    public MyDataMapper()
    {
        Map(m => m.Id);
        Map(m => m.MaxDiscount);
        Map(m => m.Name);
        Map(m => m.Active);
        Map(m => m.AltId);
    }
}

Then I tried to set the configuration:

csv.Configuration.RegisterClassMap<MyDataMapper>();

to be able to grab data into a DataTable via the CsvDataReader object, but... no success :(

It seems that CsvDataReader ignores the configuration for some reason (or I wasn't able to set it up successfully).

Whenever there's a need to map fields, the documentation says that the correct way to grab data is to use the GetRecords<T> method:

var records = csv.GetRecords<Foo>();

See: Mapping properties

If I understand you correctly, you want to fetch data into a DataTable object... Take a look at this:

List<MyData> records = null;
using (var reader = new StreamReader(myfile))
using (var csv = new CsvReader(reader))
{
    csv.Configuration.HasHeaderRecord = true;
    csv.Configuration.IgnoreQuotes = false;
    csv.Configuration.Delimiter = ",";
    csv.Configuration.RegisterClassMap<MyDataMapper>();
    records = csv.GetRecords<MyData>().ToList();
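    // dt is assumed to be an existing DataTable whose columns match MyData
    dt = records.Select(x => dt.LoadDataRow(new object[]
            {
                x.Id,
                x.MaxDiscount,
                x.Name,
                x.Active,
                x.AltId
            }, false))
            .CopyToDataTable();
    dt.Dump(); // Dump() is LINQPad's output helper
}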

The result is:

Id MaxDiscount Name Active AltId
1  null        Foo  True   ABC123 
2  10          Bar  False  DEF345

Source: https://stackoverflow.com/questions/54545023/nullvalues-option-not-working-when-loading-to-datatable

Tags

.net

csvhelper