I\'m looking for a way to convert xlsx files to csv files on Linux.
I do not want to use PHP/Perl or anything like that since I\'m looking at processing several mill
The Gnumeric spreadsheet application comes with a command line utility called ssconvert that can convert between a variety of spreadsheet formats:
$ ssconvert Book1.xlsx newfile.csv
Using exporter Gnumeric_stf:stf_csv
$ cat newfile.csv
Foo,Bar,Baz
1,2,3
123.6,7.89,
2012/05/14,,
The,last,Line
To install on Ubuntu:
apt-get install gnumeric
To install on Mac:
brew install gnumeric
In bash, I used this libreoffice command to convert all my xlsx files in the current directory:
for i in *.xlsx; do libreoffice --headless --convert-to csv "$i" ; done
It takes care of spaces in the filename.
Tried again some years later, and it didn't work. This thread gives some tips, but the quickiest solution was to run as root (or running a sudo libreoffice
). Not elegant, but quick.
Use the command scalc.exe in Windows
If you are OK to run Java command line then you can do it with Apache POI HSSF's Excel Extractor. It has a main method that says to be the command line extractor. This one seems to just dump everything out. They point out to this example that converts to CSV. You would have to compile it before you can run it but it too has a main
method so you should not have to do much coding per se to make it work.
Another option that might fly but will require some work on the other end is to make your Excel files come to you as Excel XML Data or XML Spreadsheet of whatever MS calls that format these days. It will open a whole new world of opportunities for you to slice and dice it the way you want.
Another option would be to use R via a small bash wrapper for convenience:
xlsx2txt(){
echo '
require(xlsx)
write.table(read.xlsx2(commandArgs(TRUE)[1], 1), stdout(), quote=F, row.names=FALSE, col.names=T, sep="\t")
' | Rscript --vanilla - $1 2>/dev/null
}
xlsx2txt file.xlsx > file.txt
Using the Gnumeric spreadsheet application which comes which a commandline utility called ssconvert is indeed super simple:
find . -name '*.xlsx' -exec ssconvert -T Gnumeric_stf:stf_csv {} \;
and you're done!
Use csvkit
in2csv data.xlsx > data.csv
For details check their excellent docs