Question
I am creating a program that reads football matches from different CSV files. The columns I am interested in are present in all the files, but the files have a varying number of columns.
This left me creating a separate mapping function for each variation of file, with a different sample for each type:
type GamesFile14 = CsvProvider<"./data/sample_14.csv">
type GamesFile15 = CsvProvider<"./data/sample_15.csv">
type GamesFile1617 = CsvProvider<"./data/sample_1617.csv">
let mapRows14 (rows:seq<GamesFile14.Row>) =
    rows |> Seq.map (fun c ->
        { Division = c.Div; Date = DateTime.Parse c.Date;
          HomeTeam = { Name = c.HomeTeam; Score = c.FTHG; Shots = c.HS; ShotsOnTarget = c.HST; Corners = c.HC; Fouls = c.HF };
          AwayTeam = { Name = c.AwayTeam; Score = c.FTAG; Shots = c.AS; ShotsOnTarget = c.AST; Corners = c.AC; Fouls = c.AF };
          Odds = { H = float c.B365H; U = float c.B365D; B = float c.B365A } })
let mapRows15 (rows:seq<GamesFile15.Row>) =
    rows |> Seq.map (fun c ->
        { Division = c.Div; Date = DateTime.Parse c.Date;
          HomeTeam = { Name = c.HomeTeam; Score = c.FTHG; Shots = c.HS; ShotsOnTarget = c.HST; Corners = c.HC; Fouls = c.HF };
          AwayTeam = { Name = c.AwayTeam; Score = c.FTAG; Shots = c.AS; ShotsOnTarget = c.AST; Corners = c.AC; Fouls = c.AF };
          Odds = { H = float c.B365H; U = float c.B365D; B = float c.B365A } })
let mapRows1617 (rows:seq<GamesFile1617.Row>) =
    rows |> Seq.map (fun c ->
        { Division = c.Div; Date = DateTime.Parse c.Date;
          HomeTeam = { Name = c.HomeTeam; Score = c.FTHG; Shots = c.HS; ShotsOnTarget = c.HST; Corners = c.HC; Fouls = c.HF };
          AwayTeam = { Name = c.AwayTeam; Score = c.FTAG; Shots = c.AS; ShotsOnTarget = c.AST; Corners = c.AC; Fouls = c.AF };
          Odds = { H = float c.B365H; U = float c.B365D; B = float c.B365A } })
These are in turn consumed by the loadGames function:
let loadGames season resource =
    if season.Year = 14 then GamesFile14.Load(resource).Rows |> mapRows14
    else if season.Year = 15 then GamesFile15.Load(resource).Rows |> mapRows15
    else GamesFile1617.Load(resource).Rows |> mapRows1617
It seems to me that there must be better ways to get around this problem.
Is there any way I could make my mapping function more generic so that I don't have to repeat the same function over and over?
Is it possible to create the CsvProvider on the fly based on the resource, or do I need to explicitly declare a sample for each variation of my csv-files like in the code above?
Other suggestions?
Answer 1:
In your scenario, you might get better results from FSharp.Data's CsvFile type. It takes a more dynamic approach to CSV parsing, using the dynamic ? operator for data access. You lose some of the type-safety guarantees that the type provider gives you, since every CSV file is loaded into the same CsvRow type. That means you can't guarantee at compile time that any given column will be in a file, and you have to be prepared for runtime errors. But in your case, that's just what you want, because it would allow your three functions to be rewritten like this:
// Requires: open FSharp.Data.CsvExtensions (defines the ? operator on CsvRow).
// c?Column returns a string, so the numeric fields are converted explicitly.
let mapRows14 rows =
    rows |> Seq.map (fun c ->
        { Division = c?Div; Date = DateTime.Parse c?Date;
          HomeTeam = { Name = c?HomeTeam; Score = int c?FTHG; Shots = int c?HS; ShotsOnTarget = int c?HST; Corners = int c?HC; Fouls = int c?HF };
          AwayTeam = { Name = c?AwayTeam; Score = int c?FTAG; Shots = int c?AS; ShotsOnTarget = int c?AST; Corners = int c?AC; Fouls = int c?AF };
          Odds = { H = float c?B365H; U = float c?B365D; B = float c?B365A } })
Give CsvFile a try and see if it solves your problem.
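For illustration, here is a rough end-to-end sketch of how a single mapping function could serve every file layout. The record types Team, Odds and Game are guesses, since the question does not show the real definitions, and the sample row is made up; the date handling also assumes a format the invariant culture can parse, which may not match the real files.

```fsharp
#r "nuget: FSharp.Data"

open System
open System.Globalization
open FSharp.Data
open FSharp.Data.CsvExtensions // provides the ? operator and the AsInteger/AsFloat helpers

// Hypothetical record types: the question does not show the real definitions.
type Team = { Name: string; Score: int; Shots: int; ShotsOnTarget: int; Corners: int; Fouls: int }
type Odds = { H: float; U: float; B: float }
type Game = { Division: string; Date: DateTime; HomeTeam: Team; AwayTeam: Team; Odds: Odds }

// A single mapping function for every file layout. Columns are looked up by
// header name at run time, so columns the mapping never mentions are ignored.
// Assumption: dates are in a format the invariant culture can parse.
let mapRows (rows: seq<CsvRow>) : seq<Game> =
    rows
    |> Seq.map (fun c ->
        { Division = c?Div
          Date = DateTime.Parse(c?Date, CultureInfo.InvariantCulture)
          HomeTeam =
            { Name = c?HomeTeam; Score = c?FTHG.AsInteger(); Shots = c?HS.AsInteger()
              ShotsOnTarget = c?HST.AsInteger(); Corners = c?HC.AsInteger(); Fouls = c?HF.AsInteger() }
          AwayTeam =
            { Name = c?AwayTeam; Score = c?FTAG.AsInteger(); Shots = c?AS.AsInteger()
              ShotsOnTarget = c?AST.AsInteger(); Corners = c?AC.AsInteger(); Fouls = c?AF.AsInteger() }
          Odds = { H = c?B365H.AsFloat(); U = c?B365D.AsFloat(); B = c?B365A.AsFloat() } })

// Loading no longer needs to branch on the season.
let loadGames (resource: string) : seq<Game> =
    CsvFile.Load(resource).Rows |> mapRows

// Made-up sample with one column ("Extra") that the mapping never reads.
let sample =
    String.concat "\n"
        [ "Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,HS,AS,HST,AST,HF,AF,HC,AC,B365H,B365D,B365A,Extra"
          "D1,2016-08-13,Home FC,Away FC,2,1,10,8,5,3,12,9,6,4,1.9,3.4,4.2,ignored" ]

let games = CsvFile.Parse(sample).Rows |> mapRows |> Seq.toList

for g in games do
    printfn "%s %d-%d %s" g.HomeTeam.Name g.HomeTeam.Score g.AwayTeam.Score g.AwayTeam.Name
```

The trade-off is the one described above: if a file lacks one of these columns, the problem only shows up at run time when a row is mapped, not at compile time.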
Source: https://stackoverflow.com/questions/39315903/f-csvtypeprovider-extracting-the-same-columns-from-slightly-different-csv-files