How to print the whole line that contains a specified byte offset in a file?

Submitted by 怎甘沉沦 on 2019-12-01 16:14:16

Question


I have an example input.txt file like this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.

Now I can easily grep for a word and get its byte offset:

$ grep -ob incididunt /dev/null input.txt 
input.txt:80:incididunt
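
For reference, the offset that grep -b reports is 0-based, so the first byte of the file is byte 0. A minimal sketch on a throwaway two-line input (not the input.txt above), assuming GNU grep:

```shell
# "abc" plus its newline occupy bytes 0-3, so "def" starts at byte 4.
printf 'abc\ndef\n' | grep -ob def
```

With GNU grep this prints 4:def. The 0-based convention matters when deciding whether an offset falls on the last byte of one line or the first byte of the next.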

Sadly, the line contents and the searched word then get lost: I only know the filename and the byte offset 80. I want to print the whole line that contains that byte offset inside the file.

Ideally I would have a script.sh that takes two parameters, a file name and a byte offset, and outputs the line in question:

$ ./script.sh input.txt 80
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut

More examples:

For the file=input.txt and the byte offset=130 the output should be:

enim ad minim veniam, quis nostrud exercitation ullamco laboris

For the file=input.txt and any byte offset from 195 up to 253 the output should be:

nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor

For the file=input.txt and the byte offset=400 the output should be:

sunt in culpa qui officia deserunt mollit anim id est laborum.

I have tried:

I can print from the byte offset to the end of the line with GNU sed, but that misses the "eiusmod tempor " part. I can't think of a way to go "back" in the file and fetch the part from the preceding newline up to that byte offset.

$ sed -z 's/.\{80\}\([^\n]*\).*/\1\n/' input.txt 
incididunt ut labore et dolore magna aliqua. Ut

I could read the file character by character, remember the position of the last newline, and print from that newline up to the next one. That does not work with the shell's plain read, because it strips newlines. I think I could get it to work using dd, but surely there must be a simpler solution.

set -- input.txt 80
exec 10<"$1"
pos=0
lastnewlinepos=0
for ((i=0;i<$2;++i)); do
        # -N 1 reads exactly one character and keeps newlines; a plain read would strip them
        IFS= read -r -u 10 -N 1 c
        pos=$((pos+1))
        if [ "$c" = $'\n' ]; then
                lastnewlinepos="$pos"
        fi
done
exec 10<&-
# as I know the last newline before the offset, it's ok to use this now
sed -z 's/.\{'"$lastnewlinepos"'\}\([^\n]*\).*/\1\n/' "$1"

How can I print the whole line that "contains" a given byte offset inside a file, using bash and *nix tools?


Answer 1:


Keep the number of bytes read so far in a variable; as soon as it exceeds your byte offset, print the current line and exit.

$ awk '{read+=1+length} read>80{print;exit}' input.txt
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
$ awk '{read+=1+length} read>130{print;exit}' input.txt
enim ad minim veniam, quis nostrud exercitation ullamco laboris

length is the length of the current line; we need to add 1 to it because awk strips the record separator (\n by default) from each line. The comparison is strict because the offsets reported by grep -b are 0-based: when read equals the offset, that byte is the first byte of the next line.


Note that length may count characters rather than bytes, and in a multibyte locale a single character can take more than one byte (up to four in UTF-8). To make it count bytes, set the environment variable LC_ALL to C while running awk, like:

LC_ALL=C awk '{read+=1+length} read>130{print;exit}' input.txt
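
The one-liner can be wrapped so that the file name and the offset arrive as parameters, as the question asks. The following is only a sketch: the function name line_at_offset and the awk variables offset and seen are made up for illustration, and the offset is assumed to be 0-based, as printed by grep -b.

```shell
#!/bin/bash
# Hypothetical helper (not part of the answer): print the line of FILE that
# contains the 0-based byte OFFSET.
line_at_offset() {
    local file=$1 offset=$2
    # LC_ALL=C makes length count bytes rather than characters.
    LC_ALL=C awk -v offset="$offset" '
        { seen += length($0) + 1 }      # bytes consumed so far, newline included
        seen > offset { print; exit }   # first line that extends past the offset
    ' "$file"
}

# Small demonstration on a throwaway file.
tmp=$(mktemp)
printf 'abc\ndefgh\nij\n' > "$tmp"
line_at_offset "$tmp" 4    # byte 4 is the "d" that starts the second line
rm -f "$tmp"
```

The demonstration prints defgh. An offset at or past the end of the file prints nothing, because no line satisfies the condition.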



Answer 2:


Please try the following. You can adjust the input and output to your needs; as written, it prints the offset of the word and the line containing it:

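#!/bin/bash
SEARCH_TERM="$1"
SEARCH_FILE="$2"
# Byte offset of the first match only
OFFSET_OF_WORD=$(grep -ob -- "$SEARCH_TERM" "$SEARCH_FILE" | head -n 1 | cut -d':' -f1)
[ -n "$OFFSET_OF_WORD" ] || exit 1

lastNewLinePos=0
lineNumber=0
# grep -b '$' prints the start offset of every line; the file size is appended
# so that the last line also has an upper bound
for newLinePos in $(grep -b '$' "$SEARCH_FILE" | cut -d':' -f1) $(wc -c < "$SEARCH_FILE")
do
    if (( OFFSET_OF_WORD >= lastNewLinePos && OFFSET_OF_WORD < newLinePos )); then
        echo "Offset: $OFFSET_OF_WORD"
        echo "Line: $(sed -n "${lineNumber}p" "$SEARCH_FILE")"
        break
    fi
    lastNewLinePos=$newLinePos
    lineNumber=$((lineNumber + 1))
done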

EDIT: Tested with your given input and executed as:

./getLineByOffset.sh incididunt input.txt

Edit 2: If you only know the offset, not the actual search term:

#!/bin/bash
OFFSET_OF_WORD="$1"
SEARCH_FILE="$2"

lastNewLinePos=0
lineNumber=0
# grep -b '$' prints the start offset of every line; the file size is appended
# so that the last line also has an upper bound
for newLinePos in $(grep -b '$' "$SEARCH_FILE" | cut -d':' -f1) $(wc -c < "$SEARCH_FILE")
do
    if (( OFFSET_OF_WORD >= lastNewLinePos && OFFSET_OF_WORD < newLinePos )); then
        echo "Offset: $OFFSET_OF_WORD"
        echo "Line: $(sed -n "${lineNumber}p" "$SEARCH_FILE")"
        break
    fi
    lastNewLinePos=$newLinePos
    lineNumber=$((lineNumber + 1))
done
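
The loop above turns the offset into a line number and hands it to sed -n. As a cross-check, the same line number can be obtained by counting the newlines that precede the offset. This is only a sketch under stated assumptions: the function name line_at_offset_head is made up for illustration, the offset is 0-based, and head -c is assumed to be available (it is in GNU coreutils, though POSIX does not require it).

```shell
#!/bin/bash
# Hypothetical helper (not part of the answer): print the line of FILE that
# contains the 0-based byte OFFSET by counting the newlines before it.
line_at_offset_head() {
    local file=$1 offset=$2 size newlines
    # Arithmetic expansion strips any padding that wc may print.
    size=$(( $(wc -c < "$file") ))
    # An offset at or past the end of the file belongs to no line.
    [ "$offset" -lt "$size" ] || return 1
    # Every newline before the offset ends one earlier line.
    newlines=$(head -c "$offset" "$file" | wc -l)
    # Print the line after those and stop reading.
    sed -n "$((newlines + 1)){p;q;}" "$file"
}

# Small demonstration on a throwaway file.
tmp=$(mktemp)
printf 'abc\ndefgh\nij\n' > "$tmp"
line_at_offset_head "$tmp" 4    # one newline precedes byte 4, so line 2
rm -f "$tmp"
```

The demonstration prints defgh. Unlike the loop, this variant reads the part of the file before the offset only once and then lets sed quit at the target line.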


Source: https://stackoverflow.com/questions/56145864/how-to-print-the-whole-line-that-contains-a-specified-byte-offset-in-a-file
