I have a set of data as input and need the second-to-last field, based on a delimiter. The lines may have different numbers of delimiters. How can I get the second-to-last field?
With the cuts utility:
$ cat file.txt
text,blah,blaah,foo
this,is,another,text,line
$ cuts -2 file.txt
blaah
text
cuts, which stands for "cut on steroids":
- automatically figures out the input field separators
- supports multi-char (and regexp) separators
- automatically pastes (side-by-side) multiple columns from multiple files
- supports negative offsets (from end of line)
- has good defaults to save typing + allows the user to override them
and much more.
I wrote cuts after being frustrated with the many limitations of cut on Unix. It is designed to replace various cut/paste combos, slicing and dicing columns from multiple files, with multiple separator variations, while requiring minimal typing from the user.
You can get cuts (free software, Artistic Licence) from GitHub: https://github.com/arielf/cuts/
Calling cuts without arguments will print a detailed Usage message.
There's no need to use cut, rev, or any other tools external to bash here at all. Just read each line into an array, and pick out the piece you want:
while IFS=, read -r -a entries; do
    printf '%s\n' "${entries[${#entries[@]} - 2]}"
done <file
Doing this in pure bash is far faster than starting up a pipeline, at least for reasonably small inputs. For large inputs, the better tool is awk.
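One caveat: a line with fewer than two fields leaves the array index in the loop above out of range. As a rough sketch of the same no-external-tools idea, here is a variant that swaps the array for plain parameter expansion and skips lines without a delimiter. The function name second_last is made up for illustration, and the sample input is the data from the question plus one short line.

```shell
# second_last is a hypothetical helper name used only for this sketch.
# Prints the second-to-last comma-separated field of each stdin line,
# skipping lines that contain no comma at all.
second_last() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *,*) ;;            # at least two fields: keep going
            *) continue ;;     # fewer than two fields: skip the line
        esac
        rest=${line%,*}                 # drop the last field and its comma
        printf '%s\n' "${rest##*,}"     # keep what follows the last remaining comma
    done
}

printf '%s\n' 'text,blah,blaah,foo' 'this,is,another,text,line' 'single' | second_last
```

Because it uses only parameter expansion, this should also run in a plain POSIX sh, not just bash.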
Code for GNU sed:
$ echo text,blah,blaah,foo|sed -r 's/^(\S+,){2}(\S+),.*/\2/'
blaah
$ echo this,is,another,text,line|sed -r 's/^(\S+,){2}(\S+),.*/\2/'
text
Code example similar to sudo_O's awk code:
$ sed -r 's/.*,(\w+),\w+$/\1/' file
blaah
text
It might be better to use more specialised programs for CSV files, e.g. awk or Excel.
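Note that \w in the second sed pattern above only matches letters, digits and underscores, so fields containing spaces or punctuation would not match. Below is a sketch of a more permissive variant, assuming a sed that accepts -E (GNU and BSD sed both do, as far as I know). The wrapper name second_last_sed is made up for illustration.

```shell
# second_last_sed is a hypothetical wrapper name used only for this sketch.
# [^,]* matches any run of non-comma characters, so fields may contain
# spaces or punctuation. -n plus the p flag prints only lines that matched,
# which drops lines with fewer than two fields.
second_last_sed() {
    sed -nE 's/^(.*,)?([^,]*),[^,]*$/\2/p' "$@"
}

printf '%s\n' 'text,blah,blaah,foo' 'this,is,another,text,line' 'single' | second_last_sed
```

The optional leading group (.*,)? is what lets a line with exactly two fields match as well.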
Awk is well suited for this:
awk -F, '{print $(NF-1)}' file
The variable NF is a special awk variable that contains the number of fields in the current record.
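Since NF varies per line, $(NF-1) is worth guarding: on a one-field line it becomes $0 (the whole line), and on an empty line it becomes $(-1), which awk implementations generally reject. A small sketch that only prints when there are at least two fields; the function name second_last_awk is hypothetical.

```shell
# second_last_awk is a hypothetical wrapper name used only for this sketch.
# The pattern NF >= 2 restricts the action to lines with two or more
# comma-separated fields, so $(NF-1) always refers to a real field.
second_last_awk() {
    awk -F, 'NF >= 2 { print $(NF-1) }' "$@"
}

printf '%s\n' 'text,blah,blaah,foo' 'this,is,another,text,line' 'single' | second_last_awk
```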
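A Perl solution, similar to the awk solution from @iiSeymour (-F, sets the field separator to a comma, since autosplit otherwise splits on whitespace):
perl -F, -lane 'print $F[-2]' file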
These command-line options are used:
- n: loop around every line of the input file, do not automatically print every line
- l: removes newlines before processing, and adds them back in afterwards
- a: autosplit mode, splits input lines into the @F array. Defaults to splitting on whitespace; the F option sets a different separator (e.g. -F, for commas)
- e: execute the perl code
The @F autosplit array starts at index [0], while awk fields start with $1. Index -1 is the last element and -2 is the second-to-last element.
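I got a hint from "Unix cut except last two tokens" and was able to figure out the answer. Reversing each line turns the second-to-last field into the second field, which cut can select; the final rev restores the field's original character order:
cat datafile | rev | cut -d '/' -f 2 | rev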
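For reference, here is the same rev/cut/rev idea adapted to the comma-separated sample from the question (the command above uses '/' as its delimiter). This is only a sketch: the function name second_last_rev is made up, and the -s flag is my addition so that cut drops lines with no delimiter instead of passing them through whole.

```shell
# second_last_rev is a hypothetical wrapper name used only for this sketch.
# rev reverses the characters of each line, cut picks field 2 of the
# reversed line (the original second-to-last field, spelled backwards),
# and the final rev turns that field the right way round again.
second_last_rev() {
    rev | cut -s -d ',' -f 2 | rev
}

printf '%s\n' 'text,blah,blaah,foo' 'this,is,another,text,line' | second_last_rev
```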