I want to read a rather large csv file and process it (slice, dice, summarize etc.) interactively
(data exploration). My idea is to read the file into a database (H2) and use SQL to process it:
1. Read the file: I use the Ostermiller CSV parser.
2. Determine the type of each column: I randomly select 50 rows and derive the type (int, long, double, date, string) of each column.
3. I want to use Squeryl to process the data. To do so I need to create a case class dynamically. That's the bottleneck so far!
4. I upload the file to H2 and can then use any SQL command.
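For illustration, the type-detection step could be sketched roughly as follows. This is only a minimal sketch under assumptions that are not in the original post: the names `ColumnTypes`, `typeOf`, `widen` and `infer` are hypothetical, dates are assumed to be in ISO `yyyy-MM-dd` form, and the sample is assumed to be non-empty with rows of equal length.

```scala
import scala.util.Try

object ColumnTypes {
  sealed trait ColType
  case object IntT extends ColType
  case object LongT extends ColType
  case object DoubleT extends ColType
  case object DateT extends ColType
  case object StringT extends ColType

  // Narrowest type that can represent one raw cell.
  // Assumption: dates use the ISO yyyy-MM-dd format.
  def typeOf(raw: String): ColType = {
    val s = raw.trim
    if (Try(s.toInt).isSuccess) IntT
    else if (Try(s.toLong).isSuccess) LongT
    else if (Try(s.toDouble).isSuccess) DoubleT
    else if (Try(java.time.LocalDate.parse(s)).isSuccess) DateT
    else StringT
  }

  // Smallest type that can hold values of both types; falls back to string.
  def widen(a: ColType, b: ColType): ColType = (a, b) match {
    case (x, y) if x == y => x
    case (IntT, LongT) | (LongT, IntT) => LongT
    case (IntT | LongT, DoubleT) | (DoubleT, IntT | LongT) => DoubleT
    case _ => StringT
  }

  // Infers one type per column from a non-empty sample of rows.
  def infer(sample: Seq[Seq[String]]): Seq[ColType] =
    sample
      .map(_.map(typeOf))
      .reduce((a, b) => a.zip(b).map { case (x, y) => widen(x, y) })
}
```

The 50-row sample could be drawn with something like `scala.util.Random.shuffle(rows).take(50)` before calling `infer`. A column is widened to string as soon as two sampled values disagree in a way that no numeric type covers.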
My questions:
- Is there a better general interactive way of doing this in Scala?
- Is there a way to solve the 3rd point? To state it differently, given a list of types (corresponding to the columns in the csv file), is it possible to dynamically create a case class corresponding to the table in Squeryl? To my understanding I can do that using macros, but I do not have enough exposure to do that.
I think your approach to the first question sounds reasonable.
Regarding your 2nd question, and as an addition to drexin's answer: it is possible to generate the bytecode at runtime with a library such as ASM. With such a library you can generate the same bytecode that the compiler would emit for a case class.
As Scala is a statically typed language there is no way to dynamically create classes except for reflection, which is slow and dangerous and therefore should be avoided. Even with macros you cannot do this. Macros are evaluated at compile-time, not at runtime, so you need to know the structure of your data at compile-time. What do you need the case classes for, if you don't even know what your data looks like? What benefit do you expect from this over using a Map[String, Any]?
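To make the `Map[String, Any]` alternative concrete, here is a minimal sketch. The names `MapRows`, `toRow` and `sumColumn` are hypothetical and only meant to show the shape of such code, not a recommended API.

```scala
object MapRows {
  // One row of the CSV file: column name -> value of whatever type was detected.
  type Row = Map[String, Any]

  // Pairs the header names with the already-parsed values of one line.
  def toRow(header: Seq[String], values: Seq[Any]): Row =
    header.zip(values).toMap

  // Sums a column, silently skipping values that are missing or not numeric.
  def sumColumn(rows: Seq[Row], column: String): Double =
    rows.flatMap(_.get(column)).collect {
      case i: Int    => i.toDouble
      case l: Long   => l.toDouble
      case d: Double => d
    }.sum
}
```

The structure of the data is only known at runtime here, so every access goes through a name lookup and a type test. That is the price of not having a class per table.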
I think you want to create a sealed base class and then a series of case classes as subclasses of it. Each subclass will wrap a different type that you support.
Then you can use match statements and deconstruction to deal with the individual types, and treat them generically via the base class in the places where it doesn't matter.
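A minimal sketch of that idea, assuming the five column types from the question. The names `Cell`, `IntCell`, `asDouble`, `columnMean` and so on are made up for this example, and `java.time.LocalDate` is an assumed choice for the date type.

```scala
object Cells {
  // Sealed base class: the compiler knows every subclass and can check matches.
  sealed abstract class Cell
  case class IntCell(value: Int) extends Cell
  case class LongCell(value: Long) extends Cell
  case class DoubleCell(value: Double) extends Cell
  case class DateCell(value: java.time.LocalDate) extends Cell
  case class StringCell(value: String) extends Cell

  // A row is just a sequence of cells, handled generically via the base class.
  type Row = Vector[Cell]

  // Deconstructs a cell where the concrete type matters.
  def asDouble(cell: Cell): Option[Double] = cell match {
    case IntCell(v)                  => Some(v.toDouble)
    case LongCell(v)                 => Some(v.toDouble)
    case DoubleCell(v)               => Some(v)
    case DateCell(_) | StringCell(_) => None
  }

  // Mean of the numeric cells in one column; None if the column has none.
  // Assumes every row has a cell at the given index.
  def columnMean(rows: Seq[Row], index: Int): Option[Double] = {
    val xs = rows.flatMap(r => asDouble(r(index)))
    if (xs.isEmpty) None else Some(xs.sum / xs.size)
  }
}
```

Because `Cell` is sealed, the compiler can warn when a `match` forgets one of the supported types, which is most of the type-safety that is realistically available here.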
You can't create a class for an entire row since you don't know enough about it at compile time. Even if you could dynamically generate a class (maybe by invoking the compiler at runtime), you wouldn't be able to benefit from type-safety and most of your code would have to treat it generically anyway.
Source: https://stackoverflow.com/questions/10583283/dynamically-generate-case-class-in-scala