I have a bunch of daily printer logs in CSV format and I'm writing a script to keep track of how much paper is being used and save the info to a database, but I'v
As I wrote in another answer:
Rather than interfere with what is evidently source data, i.e. the stuff inside the quotes, you might consider replacing the field-separator commas (with, say, |) instead:
s/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g
And then splitting on | (assuming none of your data has | in it).
Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern
There is probably an easier way using sed alone, but this should work. Loop over the file; for each line, match the quoted strings with grep -o, then replace the commas inside them with spaces (or whatever you would like to use to get rid of the commas; if you want to preserve the data, you can use a non-printable character and convert it back to commas afterward).
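# Replace commas inside double-quoted fields with spaces, one line at a time.
while IFS= read -r line; do
    var=$line
    # grep -o prints each "..." match on its own line
    while IFS= read -r field; do
        repl=${field//,/ }
        # Quoting $field makes bash treat it literally, not as a pattern
        var=${var/"$field"/"$repl"}
    done < <(grep -o '"[^"]*"' <<< "$line")
    printf '%s\n' "$var"
done < test.txt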
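For what it's worth, the sed-only route mentioned above might look something like the sketch below. This is one possible approach, not necessarily what anyone in the thread had in mind. It assumes a sed that supports -E (GNU and BSD sed both do), balanced double quotes, and no escaped quotes inside fields; the helper name strip_quoted_commas is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace commas inside double-quoted fields with spaces.
# The pattern anchors at the start of the line, skips any complete "..." pairs,
# then requires an opening quote, some non-comma non-quote text, and a comma.
# Each pass fixes one comma; "ta" loops back to :a until nothing matches.
strip_quoted_commas() {
    sed -E -e ':a' \
           -e 's/^(([^"]*"[^"]*")*[^"]*"[^",]*),/\1 /' \
           -e 'ta'
}

# Made-up sample line.
printf '%s\n' 'a,"b,c,d",e,"f,g"' | strip_quoted_commas
# prints a,"b c d",e,"f g"
```

Because the match always starts from the beginning of the line and counts quote pairs, the commas between fields are never touched, only the ones that follow an odd number of quotes.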