Question
I have an example file, input.txt:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.
Now I can easily grep for a word and get its byte offset:
$ grep -ob incididunt /dev/null input.txt
input.txt:80:incididunt
Sadly, the information about the line contents and about the searched word gets lost. I only know the filename and the byte offset 80. I want to print the whole line that contains that byte offset inside the file.
Ideally, I would have a script.sh that takes two parameters, a file name and a byte offset, and outputs the matching line:
$ ./script.sh input.txt 80
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
More examples:
For the file=input.txt and the byte offset=130 the output should be:
enim ad minim veniam, quis nostrud exercitation ullamco laboris
For the file=input.txt and any byte offset from 195 up to 253 the output should be:
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
For the file=input.txt and the byte offset=400 the output should be:
sunt in culpa qui officia deserunt mollit anim id est laborum.
I have tried:
I can print from the byte offset up until the end of the line with GNU sed, but that misses the eiusmod tempor part. I can't think of a way to go "back" in the file to fetch the part from the preceding newline up to that byte offset.
$ sed -z 's/.\{80\}\([^\n]*\).*/\1\n/' input.txt
incididunt ut labore et dolore magna aliqua. Ut
I can read character by character, remember the last newline, and print from the last newline up until the next. That will not work with the shell's read, as it omits newlines. I think I can get it to work using dd, but surely there must be a simpler solution.
set -- input.txt 80
exec 10<"$1"
pos=0
lastnewlinepos=0
for ((i=0;i<"$2";++i)); do
    IFS= read -r -u 10 -N 1 c
    pos=$((pos+1))
    # this will not work..., read omits newlines
    if [ "$c" = $'\n' ]; then
        lastnewlinepos="$pos"
    fi
done
# as I know the last newline before the offset, it's ok to use this now
sed -z 's/.\{'"$lastnewlinepos"'\}\([^\n]*\).*/\1\n/' "$1"
How can I print the whole line that contains a given byte offset in a file, using bash and standard *nix tools?
Answer 1:
Keep a running count of the bytes read so far in a variable; as soon as the count exceeds your byte offset, print the current line and exit.
$ awk '{read+=1+length} read>80{print;exit}' input.txt
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
$ awk '{read+=1+length} read>130{print;exit}' input.txt
enim ad minim veniam, quis nostrud exercitation ullamco laboris
length is the length of the current line; we need to add 1 to it because awk strips the record separator (\n by default) from each line. The comparison is strict because the offsets printed by grep -b are 0-based: once a line has been counted, read equals the offset of the first byte of the next line.
Note that length counts characters, and a character may take more than one byte depending on the locale. To make it count bytes, set the environment variable LC_ALL to C while running awk, like:
LC_ALL=C awk '{read+=1+length} read>130{print;exit}' input.txt
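The one-liner can be wrapped so that it takes the file name and the byte offset as parameters, as the question asks. This is a minimal sketch, not a definitive implementation: the function name line_at_offset is made up for illustration, and the offset is assumed to be 0-based, as printed by grep -b. It uses a strict comparison so that an offset pointing at the first byte of a line selects that line rather than the previous one.

```shell
#!/bin/sh
# line_at_offset FILE OFFSET
# Hypothetical helper: prints the line of FILE that contains the 0-based byte OFFSET.
line_at_offset() {
    # LC_ALL=C makes length count bytes instead of characters
    LC_ALL=C awk -v offset="$2" '
        # read = number of bytes consumed so far, newline included
        { read += length($0) + 1 }
        # the first line whose end lies beyond the offset contains it
        read > offset { print; exit }
    ' "$1"
}

# Run only when the script is called with a file name and an offset
if [ "$#" -ge 2 ]; then
    line_at_offset "$1" "$2"
fi
```

Saved as script.sh and made executable, it would be called as ./script.sh input.txt 80. An offset past the end of the file prints nothing.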
Answer 2:
Please try the following. You can adjust the input and output to your needs, but it prints the actual offset of the word and the line containing the word:
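#!/bin/bash
SEARCH_TERM="$1"
SEARCH_FILE="$2"
# Byte offset of the first match only (grep -ob prints one offset per match)
OFFSET_OF_WORD="$(grep -ob -- "$SEARCH_TERM" "$SEARCH_FILE" | head -n 1 | cut -d':' -f1)"
lastNewLinePos=0
lineNumber=0
# grep -b '$' prints the offset at which each line starts; the file size is
# appended as a final boundary so that the last line can match too
for newLinePos in $(grep -b '$' "$SEARCH_FILE" | cut -d':' -f1) $(wc -c < "$SEARCH_FILE")
do
    if (( $OFFSET_OF_WORD >= lastNewLinePos && $OFFSET_OF_WORD < $newLinePos )); then
        echo "Offset: $OFFSET_OF_WORD"
        echo "Line: $(sed -n "${lineNumber}p" "$SEARCH_FILE")"
        break
    fi
    lastNewLinePos=$newLinePos
    let lineNumber++
done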
EDIT: Tested with your given input and executed as
./getLineByOffset.sh incididunt input.txt
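The loop depends on grep -b '$' reporting, for every line, the byte offset at which that line starts (the pattern '$' matches every line, and without -o the -b option prints the offset of the line itself). A small sketch for inspecting those boundaries, assuming GNU grep; the function name line_starts is made up for illustration:

```shell
#!/bin/sh
# line_starts FILE
# Hypothetical helper: prints the 0-based byte offset at which each line of FILE starts.
line_starts() {
    # grep -b prefixes each matching line with "offset:"; keep only the offset
    grep -b '$' "$1" | cut -d':' -f1
}
```

For a file containing the three lines ab, cd and ef, this prints 0, 3 and 6, one per line. An offset belongs to a line when it is at least that line's start and below the next line's start.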
Edit 2: If you only know the offset, not the actual search term:
#!/bin/bash
OFFSET_OF_WORD="$1"
SEARCH_FILE="$2"
lastNewLinePos=0
lineNumber=0
# grep -b '$' prints the offset at which each line starts; the file size is
# appended as a final boundary so that the last line can match too
for newLinePos in $(grep -b '$' "$SEARCH_FILE" | cut -d':' -f1) $(wc -c < "$SEARCH_FILE")
do
    if (( $OFFSET_OF_WORD >= lastNewLinePos && $OFFSET_OF_WORD < $newLinePos )); then
        echo "Offset: $OFFSET_OF_WORD"
        echo "Line: $(sed -n "${lineNumber}p" "$SEARCH_FILE")"
        break
    fi
    lastNewLinePos=$newLinePos
    let lineNumber++
done
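When only the offset is known, the line number can also be derived without a loop: the number of newlines among the first OFFSET bytes equals the number of complete lines before the offset. The following is a sketch of that idea rather than part of the answer's script. It assumes a head that supports -c (GNU and BSD head do) and a 0-based offset, and the function name line_by_offset is made up for illustration. The argument order follows the script above.

```shell
#!/bin/sh
# line_by_offset OFFSET FILE
# Hypothetical helper: prints the line of FILE that contains the 0-based byte OFFSET.
line_by_offset() {
    # Count the newlines that occur before the given byte
    newlines=$(head -c "$1" "$2" | wc -l)
    # That many complete lines precede the offset, so the wanted line is the next one
    sed -n "$((newlines + 1))p" "$2"
}
```

It would be called as line_by_offset 80 input.txt. An offset past the end of the file selects a line that does not exist, so nothing is printed.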
Source: https://stackoverflow.com/questions/56145864/how-to-print-the-whole-line-that-contains-a-specified-byte-offset-in-a-file