Question
When reading a CSV into a DataTable, I am trying to add options for boolean and null values that don't seem to be working. For example, a file containing data similar to:
Id,MaxDiscount,Name,Active,AltId
1,,Foo,1,ABC123
2,10,Bar,0,DEF345
And the following logic that uses a schema file to dynamically get the headers and data types we are expecting:
var dt = new DataTable();
using (var reader = new StreamReader(file.FullName))
using (var csv = new CsvReader(reader))
{
    csv.Configuration.HasHeaderRecord = true;
    csv.Configuration.IgnoreQuotes = false;
    csv.Configuration.TypeConverterOptionsCache.GetOptions<int>().NullValues.Add(string.Empty);
    csv.Configuration.TypeConverterOptionsCache.GetOptions<bool>().BooleanFalseValues.Add("0");
    csv.Configuration.TypeConverterOptionsCache.GetOptions<bool>().BooleanTrueValues.Add("1");

    using (var dr = new CsvDataReader(csv))
    {
        foreach (var p in schema.Properties)
        {
            var type = Type.GetType(p.Type, true, true);
            var dc = new DataColumn
            {
                ColumnName = p.Name,
                Unique = p.IsId,
                AllowDBNull = p.Nullable,
                DataType = type
            };
            dt.Columns.Add(dc);
        }
        dt.Load(dr);
    }
}
This leads to the error: "String was not recognized as a valid Boolean. Couldn't store <0> in Active Column. Expected type is Boolean."
If I manually change the data and replace 0 with false and 1 with true, the boolean values then work, but I get a similar error: "Input string was not in a correct format. Couldn't store <> in MaxDiscount Column. Expected type is Int32."
Is there something I am missing here in order to get this to work? Or do the Type Converter options only work on known objects?
EDIT:
I am unable to use any pre-defined object models when parsing the CSV files as they can contain any number of fields. As long as a schema exists, then the program should know how to handle it. An example schema would be something like the following:
{
    "type": "Part",
    "bucket": "s3Bucket",
    "prefix": "prefix/of/datafile",
    "targetDirectory": "..\\path\\to\\working\\dir",
    "delimiter": ",",
    "properties": [
        {
            "name": "Id",
            "type": "System.String",
            "required": true,
            "nullable": false,
            "isId": true,
            "defaultValue": null,
            "minLength": 6,
            "maxLength": 8
        },
        {
            "name": "MaxDiscount",
            "type": "System.Int32",
            "required": true,
            "nullable": true,
            "isId": false,
            "defaultValue": null,
            "minLength": -1,
            "maxLength": -1
        },
        {
            "name": "Name",
            "type": "System.String",
            "required": true,
            "nullable": false,
            "isId": false,
            "defaultValue": null,
            "minLength": 1,
            "maxLength": 127
        },
        {
            "name": "Active",
            "type": "System.Boolean",
            "required": true,
            "nullable": false,
            "isId": false,
            "defaultValue": null,
            "minLength": 1,
            "maxLength": 1
        },
        {
            "name": "AltId",
            "type": "System.String",
            "required": true,
            "nullable": true,
            "isId": false,
            "defaultValue": null,
            "minLength": 1,
            "maxLength": 127
        }
    ]
}
In this case, the Properties in the schema would relate to columns in the CSV file. This, in theory, would allow me to parse the files and validate the data types at runtime, rather than having to create a new object model each time a new CSV layout is introduced.
Answer 1:
In my opinion the CsvDataReader class is useless - the implementation of GetFieldType returns typeof(string), and GetValue also returns strings, so although it implements the typed data accessor methods, they are never called by the DataTable class's Load method.
Thus no CsvHelper mapping occurs - the conversion is done by DataTable using standard string-to-type converters.
I would suggest removing the CsvDataReader usage and replacing the dt.Load(dr); call with something like this:
static void Load(DataTable dt, CsvReader csv)
{
    if (csv.Configuration.HasHeaderRecord)
    {
        if (!csv.Read()) return;
        csv.ReadHeader();
    }
    var valueTypes = new Type[dt.Columns.Count];
    for (int i = 0; i < valueTypes.Length; i++)
    {
        var dc = dt.Columns[i];
        var type = dc.DataType;
        if (dc.AllowDBNull && type.IsValueType)
            type = typeof(Nullable<>).MakeGenericType(type);
        valueTypes[i] = type;
    }
    var valueBuffer = new object[valueTypes.Length];
    dt.BeginLoadData();
    while (csv.Read())
    {
        for (int i = 0; i < valueBuffer.Length; i++)
            valueBuffer[i] = csv.GetField(valueTypes[i], i);
        dt.LoadDataRow(valueBuffer, true);
    }
    dt.EndLoadData();
}
Essentially this prepares the column type mapping and uses the CsvReader.GetField(type, index) method to populate the DataRow values. This way the conversion is performed by the CsvReader class and will use all the conversion options.
Btw, none of the shown options for boolean or null values is actually needed - they are all handled by the CsvHelper default type converters.
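For completeness, here is a sketch of how the question's original snippet could call this helper - the CsvDataReader block is dropped entirely and dt.Load(dr) is swapped for Load(dt, csv), reusing the question's schema and file variables:

var dt = new DataTable();
using (var reader = new StreamReader(file.FullName))
using (var csv = new CsvReader(reader))
{
    csv.Configuration.HasHeaderRecord = true;
    // build the columns from the schema exactly as in the question
    foreach (var p in schema.Properties)
    {
        dt.Columns.Add(new DataColumn
        {
            ColumnName = p.Name,
            Unique = p.IsId,
            AllowDBNull = p.Nullable,
            DataType = Type.GetType(p.Type, true, true)
        });
    }
    // conversion now goes through CsvReader.GetField(type, index),
    // so nullable ints and any custom boolean values are honored
    Load(dt, csv);
}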
Answer 2:
From the CsvHelper documentation:
"If you want to specify columns and column types, the data table will be loaded with the types automatically converted."
From what I can see, the CsvReader type converter options are ignored when using CsvDataReader. But if you use csv.GetRecords, it will use the defined type converter options.
List<csvData> result = csv.GetRecords<csvData>().ToList();
You will need a class for your CSV file, as below:
public class csvData
{
    public int Id { get; set; }
    public string MaxDiscount { get; set; }
    public string Name { get; set; }
    public bool Active { get; set; }
    public string AltId { get; set; }
}
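Note that with GetRecords the properties can keep their real types instead of falling back to string; the converter options from the question then apply during parsing. A sketch (csvDataTyped is a hypothetical variant of the class above; depending on the CsvHelper version, null values for a nullable property may be looked up under the underlying or the nullable type, so both are set here as a precaution):

public class csvDataTyped
{
    public int Id { get; set; }
    public int? MaxDiscount { get; set; }   // nullable, so empty fields can map to null
    public string Name { get; set; }
    public bool Active { get; set; }
    public string AltId { get; set; }
}

using (var reader = new StreamReader(file.FullName))
using (var csv = new CsvReader(reader))
{
    csv.Configuration.TypeConverterOptionsCache.GetOptions<int>().NullValues.Add(string.Empty);
    csv.Configuration.TypeConverterOptionsCache.GetOptions<int?>().NullValues.Add(string.Empty);
    csv.Configuration.TypeConverterOptionsCache.GetOptions<bool>().BooleanTrueValues.Add("1");
    csv.Configuration.TypeConverterOptionsCache.GetOptions<bool>().BooleanFalseValues.Add("0");
    var result = csv.GetRecords<csvDataTyped>().ToList();
}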
Answer 3:
[Second try]
I was able to load data into a DataTable object via CsvDataReader as long as the collection of DataColumns was created by CsvDataReader and Configuration.Delimiter was set to a comma, but... the boolean field (Active) wasn't really boolean.
Based on my tests and my understanding of the documentation, the only way to get properly typed data is via a mapping class that applies attributes to the fields. Two of them are very important:
BooleanFalseValuesAttribute - the string values used to represent a boolean false when converting.
BooleanTrueValuesAttribute - the string values used to represent a boolean true when converting.
So the decorated class may look like:
public class MyData
{
    [Name("Id")]
    public int Id { get; set; }

    [Name("MaxDiscount")]
    public int? MaxDiscount { get; set; }

    [Name("Name")]
    public string Name { get; set; }

    [Name("Active")]
    [BooleanTrueValues("1")]
    [BooleanFalseValues("0")]
    public bool? Active { get; set; }

    [Name("AltId")]
    public string AltId { get; set; }
}
And the helper class, which maps the fields:
public class MyDataMapper : ClassMap<MyData>
{
    public MyDataMapper()
    {
        Map(m => m.Id);
        Map(m => m.MaxDiscount);
        Map(m => m.Name);
        Map(m => m.Active);
        Map(m => m.AltId);
    }
}
Then I tried to set the configuration:
csv.Configuration.RegisterClassMap<MyDataMapper>();
to be able to grab the data into a DataTable via a CsvDataReader object, but... no success :(
It seems that CsvDataReader ignores the configuration for some reason (or I wasn't able to set it up successfully).
Whenever there's a need to map fields, the documentation says the correct way to grab data is to use the GetRecords<T> method:
var records = csv.GetRecords<Foo>();
See: Mapping properties
If I understand you correctly, you want to fetch the data into a DataTable object... Take a look at this:
List<MyData> records = null;
// the DataTable and its columns must exist before LoadDataRow is called
var dt = new DataTable();
dt.Columns.Add("Id", typeof(int));
dt.Columns.Add("MaxDiscount", typeof(int));
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Active", typeof(bool));
dt.Columns.Add("AltId", typeof(string));

using (var reader = new StreamReader(myfile))
using (var csv = new CsvReader(reader))
{
    csv.Configuration.HasHeaderRecord = true;
    csv.Configuration.IgnoreQuotes = false;
    csv.Configuration.Delimiter = ",";
    csv.Configuration.RegisterClassMap<MyDataMapper>();
    records = csv.GetRecords<MyData>().ToList();
}

dt = records.Select(x => dt.LoadDataRow(new object[]
    {
        x.Id,
        (object)x.MaxDiscount ?? DBNull.Value,  // DataTable wants DBNull, not null
        x.Name,
        (object)x.Active ?? DBNull.Value,
        x.AltId
    }, false))
    .CopyToDataTable();
dt.Dump(); // LINQPad's Dump() extension; use a loop or the debugger elsewhere
The result is:
Id MaxDiscount Name Active AltId
1 null Foo True ABC123
2 10 Bar False DEF345
Source: https://stackoverflow.com/questions/54545023/nullvalues-option-not-working-when-loading-to-datatable