Question
I also noticed something strange about Deedle's mapRows function that I can't explain:
let col1 = Series.ofObservations[1=>10.0;2=>System.Double.NaN;3=>System.Double.NaN;4=>10.0;5=>System.Double.NaN;6=>10.0; ]
let col2 = Series.ofObservations[1=>9.0;2=>5.5;3=>System.Double.NaN;4=>9.0;5=>System.Double.NaN;6=>9.0; ]
let f1 = Frame.ofColumns [ "c1" => col1; "c2" => col2 ]
let f2 = f1 |> Frame.mapRows (fun k r -> r) |> Frame.ofRows
let f3 =
    f1
    |> Frame.mapRows (fun k r ->
        let x = r.Get("c1")
        let y = r.Get("c2")
        r)
    |> Frame.ofRows
val f1 : Frame<int,string> =
     c1        c2
1 -> 10        9
2 -> <missing> 5.5
3 -> <missing> <missing>
4 -> 10        9
5 -> <missing> <missing>
6 -> 10        9
val f2 : Frame<int,string> =
     c1        c2
1 -> 10        9
2 -> <missing> 5.5
3 -> <missing> <missing>
4 -> 10        9
5 -> <missing> <missing>
6 -> 10        9
val f3 : Frame<int,string> =
     c1        c2
1 -> 10        9
2 -> <missing> <missing>
3 -> <missing> <missing>
4 -> 10        9
5 -> <missing> <missing>
6 -> 10        9
How can f3 have a different value than f2? All I did with f3 was read values from the ObjectSeries.
I am trying to use mapRows to do row-based processing that produces an ObjectSeries for each row, so that the result can be turned into a new frame with the same row keys. The processing has to be row-based because each column value needs to be updated based on its own value and neighboring values.
The calculation can't be done directly column to column, because it changes depending on the values in the row.
I'd appreciate any advice.
Update
Since posting the original question, I have used Deedle in C#. To my surprise, row-based calculation is very easy in C#, and the way the C# Frame.Rows property handles missing values is very different from the F# mapRows function. Below is a very basic example I used to test the logic; it might be useful to anyone looking for a similar application:
Things to pay attention to: 1. Rows does not drop a row when the values in both columns are missing. 2. Mean is smart enough to compute the mean from the data points that are available.
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Deedle;

namespace TestDeedleRowProcessWithMissingValues
{
    class Program
    {
        static void Main(string[] args)
        {
            // s1 has a NaN (missing) value two days ago
            var s1 = new SeriesBuilder<DateTime, double>(){
                {DateTime.Today.Date.AddDays(-5),10.0},
                {DateTime.Today.Date.AddDays(-4),9.0},
                {DateTime.Today.Date.AddDays(-3),8.0},
                {DateTime.Today.Date.AddDays(-2),double.NaN},
                {DateTime.Today.Date.AddDays(-1),6.0},
                {DateTime.Today.Date.AddDays(-0),5.0}
            }.Series;

            // s2 has two NaN values and no entry for today
            var s2 = new SeriesBuilder<DateTime, double>(){
                {DateTime.Today.Date.AddDays(-5),10.0},
                {DateTime.Today.Date.AddDays(-4),double.NaN},
                {DateTime.Today.Date.AddDays(-3),8.0},
                {DateTime.Today.Date.AddDays(-2),double.NaN},
                {DateTime.Today.Date.AddDays(-1),6.0}
            }.Series;

            var f = Frame.FromColumns(new KeyValuePair<string, Series<DateTime, double>>[] {
                KeyValue.Create("s1",s1),
                KeyValue.Create("s2",s2)
            });

            s1.Print();
            f.Print();

            // Each row is kept, even when both columns are missing
            f.Rows.Select(kvp => kvp.Value).Print();
            // 29/05/2015 12:00:00 AM -> series [ s1 => 10; s2 => 10]
            // 30/05/2015 12:00:00 AM -> series [ s1 => 9; s2 => <missing>]
            // 31/05/2015 12:00:00 AM -> series [ s1 => 8; s2 => 8]
            // 1/06/2015 12:00:00 AM -> series [ s1 => <missing>; s2 => <missing>]
            // 2/06/2015 12:00:00 AM -> series [ s1 => 6; s2 => 6]
            // 3/06/2015 12:00:00 AM -> series [ s1 => 5; s2 => <missing>]

            // The row mean is computed from whichever values are available
            f.Rows.Select(kvp => kvp.Value.As<double>().Mean()).Print();
            // 29/05/2015 12:00:00 AM -> 10
            // 30/05/2015 12:00:00 AM -> 9
            // 31/05/2015 12:00:00 AM -> 8
            // 1/06/2015 12:00:00 AM -> <missing>
            // 2/06/2015 12:00:00 AM -> 6
            // 3/06/2015 12:00:00 AM -> 5

            //Console.ReadLine();
        }
    }
}
Answer 1:
The reason why f3 differs follows from the way mapRows handles missing values.
When you access a value using r.Get("c1"), you either get the value or you get a ValueMissingException. The mapRows function handles this exception and marks the entire row as missing. If you write just:
let f3 =
    f1
    |> Frame.mapRows (fun k r ->
        let x = r.Get("c1")
        let y = r.Get("c2")
        r)
Then the result will be:
1 -> series [ c1 => 10; c2 => 9]
2 -> <missing>
3 -> <missing>
4 -> series [ c1 => 10; c2 => 9]
5 -> <missing>
6 -> series [ c1 => 10; c2 => 9]
If you want to write a function that returns the frame as it was (reading the data from original rows and producing new rows), you could do something like:
f1
|> Frame.mapRows (fun k r ->
    [ "X" => OptionalValue.asOption(r.TryGet("c1"))
      "Y" => OptionalValue.asOption(r.TryGet("c2")) ]
    |> Series.ofOptionalObservations)
|> Frame.ofRows
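The same pattern extends to an actual row-based calculation. The following is only a sketch under assumptions: the rule "use c2 when c1 is missing" is a hypothetical stand-in for the real per-row logic, the `#r "nuget: Deedle"` line assumes an F# Interactive script, and the names nan and filled are made up for illustration. It reads each cell with TryGetAs<float> rather than Get, so a missing cell becomes None instead of raising an exception that would mark the whole row as missing.

```fsharp
#r "nuget: Deedle"
open Deedle

let nan = System.Double.NaN

// Same data as f1 in the question; NaN is treated as a missing value
let col1 = Series.ofObservations [ 1 => 10.0; 2 => nan; 3 => nan; 4 => 10.0; 5 => nan; 6 => 10.0 ]
let col2 = Series.ofObservations [ 1 => 9.0; 2 => 5.5; 3 => nan; 4 => 9.0; 5 => nan; 6 => 9.0 ]
let f1 = Frame.ofColumns [ "c1" => col1; "c2" => col2 ]

// Hypothetical rule: keep c1 where it is present, otherwise fall back to c2
let filled =
    f1
    |> Frame.mapRows (fun _ row ->
        // TryGetAs returns an OptionalValue, so nothing is thrown for missing cells
        let c1 = row.TryGetAs<float>("c1") |> OptionalValue.asOption
        let c2 = row.TryGetAs<float>("c2") |> OptionalValue.asOption
        [ "c1" => Option.orElse c2 c1   // c1 if Some, else c2
          "c2" => c2 ]
        |> Series.ofOptionalObservations)
    |> Frame.ofRows

filled.Print()
```

With this approach, row 2 should end up with 5.5 in c1, while rows 3 and 5 stay in the frame with both cells missing, since no exception is raised inside the mapping function.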
Source: https://stackoverflow.com/questions/26049661/deedle-frame-maprows-how-to-properly-use-it-and-how-to-construct-objectseries-pr