How to find position of word in file?

太阳男子 2021-01-02 21:32

For example, I have a file and the word "test". The file is partially binary but contains the string "test". How can I find the position (index) of the word in the file without loading the whole file into memory?

3 Answers
  • 2021-01-02 22:16

    You cannot find the position of a piece of text within a file unless you open the file. It is like asking someone to read a newspaper without opening their eyes.

    The first part of your question is relatively simple:

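    # Open in binary mode because the file is partially binary.
    with open('Path/to/file', 'rb') as f:
        content = f.read()  # note: this reads the whole file into memory
        print(content.index(b'test'))  # raises ValueError if not found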
    
  • 2021-01-02 22:17

    Try this:

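    SEARCH_WORD = b'test'   # bytes, because the file is opened in binary mode
    BLOCK_SIZE = 1 << 20    # read 1 MiB at a time instead of the whole file
    word_len = len(SEARCH_WORD)

    with open(file_dmp_path, 'rb') as file:
        base = 0     # file offset of the first byte in buf
        buf = b''
        while True:
            block = file.read(BLOCK_SIZE)
            if not block:
                break
            buf += block
            p = buf.find(SEARCH_WORD)
            while p > -1:
                print(base + p)  # absolute position of the match
                p = buf.find(SEARCH_WORD, p + 1)
            # keep the last word_len - 1 bytes so that a match split
            # across two blocks is still found
            keep = min(word_len - 1, len(buf))
            base += len(buf) - keep
            buf = buf[len(buf) - keep:]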
    
  • 2021-01-02 22:22

    You can use memory-mapped files and regular expressions. The operating system pages a mapped file in on demand, so it is not read into memory all at once.

    Memory-mapped file objects behave like both bytearray and file objects. You can use mmap objects in most places where bytes-like objects are expected; for example, you can use the re module to search through a memory-mapped file. Since they are mutable, you can change a single byte by doing obj[index] = 97, or change a subsequence by assigning to a slice of the same length: obj[i1:i2] = b'...'. You can also read and write data starting at the current file position, and seek() through the file to different positions.

    Example

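    import re
    import mmap

    # 'r+b' matches mmap's default read/write access
    with open('path/filename', 'r+b') as f:
        # length 0 maps the whole file
        with mmap.mmap(f.fileno(), 0) as mf:
            m = re.search(b'pattern', mf)  # the pattern must be bytes
            if m:  # re.search returns None when nothing matches
                print(m.start(), m.end())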
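
    If a plain substring is all that is needed, the regular expression can probably be skipped. Below is a minimal sketch, assuming Python 3, that uses the mmap object's own find() method with read-only access and yields every offset. The function name find_all and its parameters are made up for illustration.

```python
import mmap


def find_all(path, needle):
    """Yield the byte offset of every occurrence of needle (bytes) in the file."""
    with open(path, 'rb') as f:
        # ACCESS_READ maps the file read-only; length 0 maps the whole file.
        # Note: mmap raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mf:
            pos = mf.find(needle)
            while pos != -1:
                yield pos
                # continue one byte later so overlapping matches are reported
                pos = mf.find(needle, pos + 1)
```

    It could be used as `for offset in find_all('path/filename', b'test'): print(offset)`. Opening the mapping read-only means the file does not need to be writable.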
    