We seem to be seeing more and more questions about executing awk on Excel spreadsheets so here is a Q/A on how to do that specific thing.
I have this information in
The combination of the following VBS and shell scripts create a CSV file for each sheet in the Excel spreadsheet:
$ cat xls2csv.vbs
csv_format = 6
Dim strFilename
Dim objFSO
Set objFSO = CreateObject("scripting.filesystemobject")
strFilename = objFSO.GetAbsolutePathName(WScript.Arguments(0))
If objFSO.fileexists(strFilename) Then
Call Writefile(strFilename)
Else
wscript.echo "no such file!"
End If
Set objFSO = Nothing
Sub Writefile(ByVal strFilename)
Dim objExcel
Dim objWB
Dim objws
Set objExcel = CreateObject("Excel.Application")
Set objWB = objExcel.Workbooks.Open(strFilename)
For Each objws In objWB.Sheets
objws.Copy
objExcel.ActiveWorkbook.SaveAs objWB.Path & "\" & objws.Name & ".csv", csv_format
objExcel.ActiveWorkbook.Close False
Next
objWB.Close False
objExcel.Quit
Set objExcel = Nothing
End Sub
.
$ cat xls2csv
PATH="$HOME:$PATH"
# the original XLS input file path components
inXlsPath="$1"
inXlsDir=$(dirname "$inXlsPath")
xlsFile=$(basename "$inXlsPath")
xlsBase="${xlsFile%.*}"
# The tmp dir we'll copy the XLS to and run the tool on
# to get the CSVs generated
tmpXlsDir="/usr/tmp/${xlsBase}.$$"
tmpXlsPath="${tmpXlsDir}/${xlsFile}"
absXlsPath="C:/cygwin64/${tmpXlsPath}" # need an absolute path for VBS to work
mkdir -p "$tmpXlsDir"
trap 'rm -f "${tmpXlsDir}/${xlsFile}"; rmdir "$tmpXlsDir"; exit' 0
cp "$inXlsPath" "$tmpXlsDir"
cygstart "$HOME/xls2csv.vbs" "$absXlsPath"
printf "Waiting for \"${tmpXlsDir}/~\$${xlsFile}\" to be created:\n" >&2
while [ ! -f "${tmpXlsDir}/~\$${xlsFile}" ]
do
# VBS is done when this tmp file is created and later removed
printf "." >&2
sleep 1
done
printf " Done.\n" >&2
printf "Waiting for \"${tmpXlsDir}/~\$${xlsFile}\" to be removed:\n" >&2
while [ -f "${tmpXlsDir}/~\$${xlsFile}" ]
do
# VBS is done when this tmp file is removed
printf "." >&2
sleep 1
done
printf " Done.\n" >&2
numFiles=0
for file in "$tmpXlsDir"/*.csv
do
numFiles=$(( numFiles + 1 ))
done
if (( numFiles >= 1 ))
then
outCsvDir="${inXlsDir}/${xlsBase}.csvs"
mkdir -p "$outCsvDir"
mv "$tmpXlsDir"/*.csv "$outCsvDir"
fi
Now we execute the shell script which internally calls cygstart to run the VBS script to generate the CSV files (one per sheet) in a subdirectory under the same directory where the Excel file exists named based on the Excel file name (e.g. Excel file staff.xlsx
produces CSVs directory staff.csvs
):
$ ./xls2csv "$D/staff.xlsx"
Waiting for "/usr/tmp/staff.2700/~$staff.xlsx" to be created:
.. Done.
Waiting for "/usr/tmp/staff.2700/~$staff.xlsx" to be removed:
. Done.
There is only one sheet with the default name Sheet1
in the target Excel file "$D/staff.xlsx"
so the output of the above is a file "$D/staff.csvs/Sheet1.csv"
:
$ cat "$D/staff.csvs/Sheet1.csv"
Name,Position
Sue,Manager
Bill,Secretary
Pat,Engineer
$ awk -F, -v name="Bill" '$1==name{print $2}' "$D/staff.csvs/Sheet1.csv"
Secretary
Also see What's the most robust way to efficiently parse CSV using awk? for how to then operate on those CSVs.
See also https://stackoverflow.com/a/58879683/1745001 for how to do the opposite, i.e. call a cygwin bash command from a Windows batch file.