Is there a "canonical" way of doing that? I've been using head -n | tail -1, which does the trick, but I've been wondering if there's a Bash tool that specializes in extracting a single line from a file.
Wow, all the possibilities!
Try this:
sed -n "${lineNum}p" "$file"
or one of these depending upon your version of Awk:
awk -vlineNum=$lineNum 'NR == lineNum {print $0}' "$file"
awk -v lineNum=4 '{if (NR == lineNum) {print $0}}' "$file"
awk '{if (NR == lineNum) {print $0}}' lineNum=$lineNum "$file"
(You may have to try the nawk or gawk command.)
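A hedged aside, not part of the original answer: appending exit makes Awk stop reading the file as soon as the requested line has been printed, which matters for large inputs:
awk -v lineNum="$lineNum" 'NR == lineNum {print; exit}' "$file"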
Is there a tool that only prints that particular line? Not among the standard tools. However, sed is probably the closest and simplest to use.
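For instance, a minimal sketch (42 is just a sample line number) that prints one line and quits so the rest of the file isn't read:
sed -n '42{p;q;}' "$file"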
This question being tagged Bash, here's the Bash (≥4) way of doing it: use mapfile with the -s (skip) and -n (count) options.
If you need to get the 42nd line of a file file:
mapfile -s 41 -n 1 ary < file
At this point, you'll have an array ary whose fields contain the lines of file (including the trailing newline), where we have skipped the first 41 lines (-s 41) and stopped after reading one line (-n 1). So that's really the 42nd line. To print it out:
printf '%s' "${ary[0]}"
If you need a range of lines, say the range 42–666 (inclusive), and say you don't want to do the math yourself, and print them on stdout:
mapfile -s $((42-1)) -n $((666-42+1)) ary < file
printf '%s' "${ary[@]}"
If you need to process these lines too, it's not really convenient to store the trailing newline. In this case use the -t option (trim):
mapfile -t -s $((42-1)) -n $((666-42+1)) ary < file
# do stuff
printf '%s\n' "${ary[@]}"
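As a hedged illustration of the "# do stuff" step (the processing here is purely hypothetical):
for line in "${ary[@]}"; do
    # hypothetical processing: print each line prefixed with its length
    printf '%d: %s\n' "${#line}" "$line"
done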
You can have a function do that for you:
print_file_range() {
# $1-$2 is the range of file $3 to be printed to stdout
local ary
mapfile -s $(($1-1)) -n $(($2-$1+1)) ary < "$3"
printf '%s' "${ary[@]}"
}
No external commands, only Bash builtins!
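For example (the file path and range here are just placeholders):
print_file_range 42 666 /path/to/file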
You can also use Perl for this:
perl -wnl -e '$.== NUM && print && exit;' some.file
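Replacing NUM with an actual line number (42 below is only an example):
perl -wnl -e '$. == 42 && print && exit;' some.file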
I have a unique situation where I can benchmark the solutions proposed on this page, and so I'm writing this answer as a consolidation of the proposed solutions with included run times for each.
Set Up
I have a 3.261 gigabyte ASCII text data file with one key-value pair per row. The file contains 3,339,550,320 rows in total and defies opening in any editor I have tried, including my go-to Vim. I need to subset this file in order to investigate some of the values that I've discovered only start around row ~500,000,000.
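(As an aside, a row count like that presumably comes from something along the lines of the following, which itself has to read the entire file:)
wc -l myfile.ascii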
Because the file has so many rows:
My best-case scenario is a solution that extracts only a single line from the file without reading any of the other rows in the file, but I can't think of how I would accomplish this in Bash (see the hedged sketch just before the Baseline section below).
For the purposes of my sanity I'm not going to try to read the full 500,000,000 lines I'd need for my own problem. Instead I'll be trying to extract row 50,000,000 out of 3,339,550,320 (which means reading the full file would take roughly 67x as long as necessary).
I will be using the time built-in to benchmark each command.
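A hedged aside before the numbers: extracting a line without reading the preceding rows is only straightforward when every record has the same byte length, in which case dd can seek directly to the right offset. LINE_LEN and N below are hypothetical values; this is a sketch under that fixed-width assumption, not something I benchmarked:
# Only valid if every line is exactly LINE_LEN bytes, newline included;
# dd then skips N-1 whole records instead of reading them.
LINE_LEN=16      # hypothetical fixed record length
N=50000000       # target row
dd if=myfile.ascii bs="$LINE_LEN" skip=$((N - 1)) count=1 2>/dev/null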
Baseline
First let's see how the head | tail solution performs:
$ time head -50000000 myfile.ascii | tail -1
pgm_icnt = 0
real 1m15.321s
The baseline for row 50 million is 00:01:15.321; if I'd gone straight for row 500 million it would probably take ~12.5 minutes.
cut
I'm dubious of this one, but it's worth a shot:
$ time cut -f50000000 -d$'\n' myfile.ascii
pgm_icnt = 0
real 5m12.156s
This one took 00:05:12.156 to run, which is much slower than the baseline! I'm not sure whether it read through the entire file or just up to line 50 million before stopping, but regardless this doesn't seem like a viable solution to the problem.
AWK
I only ran the solution with exit because I wasn't going to wait for the full file to run:
$ time awk 'NR == 50000000 {print; exit}' myfile.ascii
pgm_icnt = 0
real 1m16.583s
This code ran in 00:01:16.583, which is only ~1 second slower, but still not an improvement on the baseline. At this rate if the exit command had been excluded it would have probably taken around ~76 minutes to read the entire file!
Perl
I ran the existing Perl solution as well:
$ time perl -wnl -e '$.== 50000000 && print && exit;' myfile.ascii
pgm_icnt = 0
real 1m13.146s
This code ran in 00:01:13.146, which is ~2 seconds faster than the baseline. If I'd run it on the full 500,000,000 it would probably take ~12 minutes.
sed
This is the top answer on the board, so here's my result:
$ time sed "50000000q;d" myfile.ascii
pgm_icnt = 0
real 1m12.705s
This code ran in 00:01:12.705, which is 3 seconds faster than the baseline, and ~0.4 seconds faster than Perl. If I'd run it on the full 500,000,000 rows it would have probably taken ~12 minutes.
mapfile
I have bash 3.1 and therefore cannot test the mapfile solution.
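(For reference, a quick way to check whether the builtin exists in your shell; this is an aside, not part of the benchmark:)
type mapfile 2>/dev/null || echo "mapfile not available in bash $BASH_VERSION"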
Conclusion
It looks like, for the most part, it's difficult to improve upon the head | tail solution. At best the sed solution provides a ~3% increase in efficiency.
(Percentages calculated with the formula % = (runtime/baseline - 1) * 100.)
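For instance, plugging in the measured sed and baseline times: (72.705 / 75.321 - 1) * 100 ≈ -3.5%, i.e. sed ran about 3.5% faster than the baseline.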
Row 50,000,000 (measured):
1. sed        00:01:12.705
2. perl       00:01:13.146
3. head|tail  00:01:15.321
4. awk        00:01:16.583
5. cut        00:05:12.156

Row 500,000,000 (extrapolated ranking, fastest to slowest): sed, perl, head|tail, awk, cut

Row 3,339,550,320 (the full file; extrapolated ranking, fastest to slowest): sed, perl, head|tail, awk, cut
Using what others mentioned, I wanted this to be a quick & dandy function in my bash shell.
Create a file: ~/.functions
Add to it the contents:
getline() {
   local line=$1
   sed "${line}q;d" "$2"
}
Then add this to your ~/.bash_profile:
source ~/.functions
Now when you open a new bash window, you can just call the function as so:
getline 441 myfile.txt
You may also use sed's print and quit commands:
sed -n '10{p;q;}' file # print line 10
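And if you want a contiguous range rather than a single line (a small variant, not from the original answer; 10 and 20 are just sample numbers):
sed -n '10,20p;20q' file # print lines 10 through 20, then stop reading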