Does anyone know of any tools to provide simple, fast queries of flat files using a SQL-like declarative query language? I'd rather not pay the overhead of loading the file in
You can look at the HXTT JDBC drivers. They provide JDBC drivers for most types of flat files, Excel, etc.
You can execute simple SQL queries on them.
They have trial versions available as well.
I made a tool that might help: http://www.mccoyonlinestore.com/index.php?txtSearch=mccoy_rdbms. Your SQL could be "Select Max(value) from animals" or it could be "Select * from animals order by value desc".
I just stumbled across this Python script which does something like what you want, although it only supports very basic queries.
I never managed to find a satisfying answer to my question, but I did at least find a solution to my toy problem using uniq's "-f" option, which I had been unaware of:
cat animals.txt | sort -t " " -k1,1 -k2,2nr \
| awk -F' ' '{print $2, " ", $1}' | uniq -f 1
The awk portion above could, obviously, be skipped entirely if the input file were created with columns in the opposite order.
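For anyone who wants the same result without the sort, here is a rough Python sketch of what the pipeline above computes: the largest value per name, in a single pass over the file. It assumes each line is a whitespace-separated "name value" pair with an integer value, which is my reading of the sort/awk/uniq command rather than anything stated about animals.txt. The function name max_per_key and the sample rows are made up for illustration.

```python
import io


def max_per_key(lines):
    """Return {name: largest value} for whitespace-separated 'name value' lines."""
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, value = fields[0], int(fields[1])
        # Keep only the largest value seen so far for each name.
        if name not in best or value > best[name]:
            best[name] = value
    return best


# Hypothetical sample rows standing in for animals.txt.
sample = io.StringIO("dog 15\ncat 20\ndog 10\ncat 30\ndog 20\n")
result = max_per_key(sample)

# Print "value name" to match the column order the pipeline emits.
for name in sorted(result):
    print(result[name], name)
```

To run it on a real file, pass an open file handle instead of the sample, e.g. `max_per_key(open("animals.txt"))`. It streams the input and only holds one entry per distinct name in memory.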
I'm still holding out hope for a SQL-like tool, though.
I wrote TxtSushi mostly to do SQL selects on flat files. Here is the command chain for your example (all of these commands are from TxtSushi):
tabtocsv animals.txt | namecolumns - | tssql -table animals - \
'select col1, max(as_int(col2)) from animals group by col1'
namecolumns is only required because animals.txt doesn't have a header row. You can get a quick sense of what is possible by looking through the example scripts. There are also links to similar tools on the bottom of the main page.
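If installing a separate tool is not an option, a similar group-by query can be sketched with Python's built-in sqlite3 module. Note that this does load the rows into an in-memory table, which is exactly the overhead the question hoped to avoid, so treat it as a fallback and not as an equivalent of TxtSushi. The column names name and value, the helper query_flat_file, and the sample rows are assumptions for illustration. The input is again assumed to be whitespace-separated "name value" lines.

```python
import sqlite3


def query_flat_file(lines, sql):
    """Load 'name value' lines into an in-memory SQLite table and run sql on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE animals (name TEXT, value INTEGER)")
        # Keep the first two whitespace-separated fields of each usable line.
        rows = (line.split()[:2] for line in lines if len(line.split()) >= 2)
        conn.executemany("INSERT INTO animals VALUES (?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical sample rows standing in for animals.txt.
sample_rows = ["dog 15", "cat 20", "dog 10", "cat 30", "dog 20"]
top = query_flat_file(
    sample_rows,
    "SELECT name, MAX(value) FROM animals GROUP BY name ORDER BY name",
)
for name, value in top:
    print(name, value)
```

The INTEGER column type makes SQLite convert the text fields to numbers on insert, so MAX compares numerically instead of as strings.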
You could use Perl DBI with DBD::AnyData.