Question
How should I deal with missing values in a Deedle series?
For example, I have a series with fields Name and BirthDate, where BirthDate is initially DateTime? and I need to convert BirthDate to String.
var newDOB = df.GetColumn<DateTime?>("DOB").Select(x => x.Value.Value != null ? x.Value.Value.ToString("dd/MM/yyyy") : " ");
df.ReplaceColumn("DOB", newDOB);
This is what I tried and it does not work.
What is the best way to convert a missing DateTime? value to string for me?
And what is the best way in general to deal with missing values in Deedle series and Deedle dataframes in C#?
Answer 1:
When you create a Deedle series, Deedle detects invalid values and treats them as missing automatically. So when you create a series containing NaN or null, those are turned into missing values (this also works for nullables).
Furthermore, the Select method skips over all missing values. For example, consider this series:
Series<int, DateTime?> ds = Enumerable.Range(0, 100).Select(i =>
new KeyValuePair<int, DateTime?>(i, i%5==0 ? (DateTime?)null : DateTime.Now.AddHours(i))
).ToSeries();
ds.Print();
Here, Deedle recognizes that every fifth value is missing. When you call Select, it applies the operation only to valid values and every fifth value remains missing:
ds.Select(kvp => kvp.Value.Value.ToString("D")).Print();
If you want to do something with the missing values, you can use FillMissing (to fill them with a specified string or to copy the value from the previous item in the series) or DropMissing to discard them from the series. You can also use SelectOptional, which calls your function with an OptionalValue<V>, so you can implement your own custom logic for missing values.
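The options mentioned above can be sketched roughly as follows. This is only an illustration, assuming the Deedle NuGet package is referenced; the class name MissingValueOptions, the sample dates, and the "n/a" placeholder are made up for the example and are not part of Deedle.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using Deedle;

public static class MissingValueOptions
{
    // Hypothetical sample data: keys 0 and 5 are null, which Deedle stores as missing.
    public static Series<int, DateTime?> BuildSample()
    {
        var start = new DateTime(2015, 1, 1);
        return Enumerable.Range(0, 10)
            .Select(i => new KeyValuePair<int, DateTime?>(
                i, i % 5 == 0 ? (DateTime?)null : start.AddDays(i)))
            .ToSeries();
    }

    // SelectOptional passes an OptionalValue<V> for every key, including missing ones,
    // so the caller decides what a missing value turns into.
    public static Series<int, string> FormatWithPlaceholder(
        Series<int, DateTime> dates, string placeholder)
    {
        return dates.SelectOptional(kvp =>
            kvp.Value.HasValue
                ? new OptionalValue<string>(
                      kvp.Value.Value.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture))
                : new OptionalValue<string>(placeholder));
    }

    public static void Main()
    {
        // Strip the nullable wrapper; the null entries are already missing.
        Series<int, DateTime> dates = BuildSample().Select(kvp => kvp.Value.Value);

        // Select only sees valid values, so missing entries stay missing here.
        Series<int, string> formatted = dates.Select(kvp =>
            kvp.Value.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture));

        formatted.FillMissing("n/a").Print();             // fill with a constant string
        formatted.FillMissing(Direction.Forward).Print(); // copy the previous valid value
        formatted.DropMissing().Print();                  // discard keys without a value
        FormatWithPlaceholder(dates, "n/a").Print();      // custom logic per value
    }
}
```

Note that a forward fill cannot fill a missing value at the very start of the series, since there is no previous item to copy from.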
This also means that a Series<K, DateTime?> is not very useful, because Deedle already handles all the null values as missing. You can turn it into a Series<K, DateTime> using Select(kvp => kvp.Value.Value) and let Deedle handle missing values for you.
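Applied to the DOB column from the question, that advice might look like the sketch below. It assumes the Deedle NuGet package is referenced and that the column can be read with GetColumn<DateTime>; the class name DobColumnExample, the sample dates, and the way the frame is built here are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using Deedle;

public static class DobColumnExample
{
    // Hypothetical sample data with one missing birth date (key 1).
    public static Series<int, DateTime?> BuildSample()
    {
        return new[]
        {
            new KeyValuePair<int, DateTime?>(0, new DateTime(1990, 4, 23)),
            new KeyValuePair<int, DateTime?>(1, null),
            new KeyValuePair<int, DateTime?>(2, new DateTime(1985, 12, 1)),
        }.ToSeries();
    }

    // Formats valid dates and substitutes `placeholder` where the value is missing.
    public static Series<int, string> ToText(Series<int, DateTime?> dates, string placeholder)
    {
        return dates
            .Select(kvp => kvp.Value.Value) // Series<int, DateTime>; nulls stay missing
            .Select(kvp => kvp.Value.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture))
            .FillMissing(placeholder);      // only now are the missing entries filled
    }

    public static void Main()
    {
        // Build a small frame with a DOB column (assumed setup, not from the question).
        var df = Frame.CreateEmpty<int, string>();
        df.AddColumn("DOB", BuildSample().Select(kvp => kvp.Value.Value));

        // Read the column as DateTime, format it, fill the gaps, and write it back.
        var text = df.GetColumn<DateTime>("DOB")
            .Select(kvp => kvp.Value.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture))
            .FillMissing(" ");
        df.ReplaceColumn("DOB", text);
        df.Print();
    }
}
```

The null check inside the lambda from the question is not needed in this form, because Select never calls the lambda for a missing value; the placeholder comes from FillMissing instead.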
Source: https://stackoverflow.com/questions/31973143/how-to-deal-with-null-missing-values-in-a-deedle-series-in-c