I need to apply an SQL query to CSV files (comma-separated text files). My SQL is predefined from another tool, and is not eligible to change. It may contain embedded select
I would load the data into HSQL (HypersonicSQL). Pure Java, correct SQL, well-proven. Pretty much anything else has a bigger footprint.
I'd say embedded db. I'd suggest either Javadb (Derby built into the Java API) or H2 if you don't care about pulling the extra dependency.
If your SQL is predefined and cannot be changed your best option is to load your CSV into a database and run queries against it.
Apache Derby is a viable option, so are MySQL, which even has a CSV storage engine or PostgreSQL.
Does your SQL use any proprietary functions / extensions? If so, that may limit your choices.
There is a Groovy script, gcsvsql that lets you treat csv files as database tables, including joins. With gcsvsql you can do things like:
gcsvsql "select * from people.csv where age > 40"
gcsvsql "select people.name,children.child from people.csv,children.csv where people.name=children.name"
gcsvsql "select avg(score) from people.csv where age < 40"
You can find this script, which is based on the h2 database engine, at Google code here:
http://code.google.com/p/gcsvsql/
I know, it's a very old case, but...
CsvJdbc is a cool library, but there are some issues using DbUtils while mapping ResultsSets to PoJos. A second bad thing is, that dosn't have a good support for different Datatypes.
After playing with CSVJdbc I will use a stupid CsvParser to read the Files an pump them into a HsqlDB or something like that.
maybe a bit late, sorry for that.
I've been developing the csvjdbc for over a year now and since a few weeks I've got "administrator" rights on that project so I've been able to release the most recent version I had produced. it does all "we" need (we: me and my current my colleagues) need, and I'm adding things as bugs are filed.
have a look at it now and decide again. (the web documentation still needs reviewing, for better insight, check the test cases, which are very extensive).