Python equivalent of unix “strings” utility

故里飘歌 2020-12-06 06:32

I'm trying to write a script which will extract strings from an executable binary and save them in a file. Having this file be newline-separated isn't an option since the

2 Answers
  • 2020-12-06 06:43

    To quote man strings:

    STRINGS(1)                   GNU Development Tools                  STRINGS(1)
    
    NAME
           strings - print the strings of printable characters in files.
    
    [...]
    DESCRIPTION
           For each file given, GNU strings prints the printable character
           sequences that are at least 4 characters long (or the number given with
           the options below) and are followed by an unprintable character.  By
           default, it only prints the strings from the initialized and loaded
           sections of object files; for other types of files, it prints the
           strings from the whole file.
    

    You could achieve a similar result with a regex that matches runs of at least 4 printable characters. Something like this:

    >>> import re
    
    >>> content = "hello,\x02World\x88!"
    >>> re.findall("[^\x00-\x1F\x7F-\xFF]{4,}", content)
    ['hello,', 'World']
    

    Please note that this solution requires the entire file content to be loaded into memory.
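    If the input is a binary file rather than a str, the same pattern can be applied directly to bytes in Python 3, which skips decoding entirely. A minimal sketch: the function names regex_strings and regex_strings_from_file are hypothetical, and "printable" is assumed to mean the ASCII bytes 0x20-0x7E, which is exactly what the negated class above leaves over when it is applied to bytes.

```python
import re
from typing import List


def regex_strings(data: bytes, min_len: int = 4) -> List[bytes]:
    # Same idea as the str pattern above, applied to raw bytes: a run of
    # at least min_len bytes in the printable ASCII range 0x20-0x7E.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return pattern.findall(data)


def regex_strings_from_file(filename: str, min_len: int = 4) -> List[bytes]:
    # Opens the file in binary mode, so no text decoding happens.
    # Still reads the whole file into memory, like the str version.
    with open(filename, "rb") as f:
        return regex_strings(f.read(), min_len)


if __name__ == "__main__":
    print(regex_strings(b"hello,\x02World\x88!"))  # [b'hello,', b'World']
```

    The results are bytes objects; call .decode("ascii") on each one if you need str, which cannot fail because every matched byte is ASCII.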

  • 2020-12-06 07:01

    Here's a generator that yields every string of printable characters at least min characters long (4 by default) that it finds in filename:

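    import string

    def strings(filename, min=4):
        # Python 3.x: decode with the default encoding, dropping undecodable bytes
        with open(filename, errors="ignore") as f:  # Python 3.x
        # with open(filename, "rb") as f:           # Python 2.x
            result = ""  # current run of printable characters
            for c in f.read():
                if c in string.printable:
                    result += c  # extend the current run
                    continue
                # A non-printable character ends the run; keep it if long enough.
                if len(result) >= min:
                    yield result
                result = ""
            if len(result) >= min:  # catch result at EOF
                yield result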
    

    Which you can iterate over:

    for s in strings("something.bin"):
        print(s)  # do something with s
    

    ... or store in a list:

    sl = list(strings("something.bin"))
    

    I've tested this very briefly, and it seems to give the same output as the Unix strings command for the arbitrary binary file I chose. However, it's pretty naïve (for a start, it reads the whole file into memory at once, which might be expensive for large files), and is very unlikely to approach the performance of the Unix strings command.
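    One way to avoid reading the whole file at once is to process it in fixed-size binary chunks and carry the current run across chunk boundaries. This is a rough sketch rather than a drop-in replacement: the names iter_strings and strings_streaming and the 64 KiB chunk size are assumptions of mine, and "printable" reuses string.printable as in the generator above. Memory use is bounded by the chunk size plus the length of the longest printable run.

```python
import io
import string
from typing import BinaryIO, Iterator

# Byte values treated as printable; mirrors string.printable used above
# (ASCII letters, digits, punctuation and whitespace).
PRINTABLE = frozenset(string.printable.encode("ascii"))


def iter_strings(f: BinaryIO, min_len: int = 4,
                 chunk_size: int = 64 * 1024) -> Iterator[str]:
    """Yield runs of printable bytes of length >= min_len, reading f in chunks."""
    run = bytearray()  # printable bytes seen so far in the current run
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        for b in chunk:  # iterating bytes yields ints
            if b in PRINTABLE:
                run.append(b)
            else:
                # A non-printable byte ends the run; emit it if long enough.
                if len(run) >= min_len:
                    yield run.decode("ascii")
                run.clear()
    # The run is carried across chunk boundaries, so flush what is left at EOF.
    if len(run) >= min_len:
        yield run.decode("ascii")


def strings_streaming(filename: str, min_len: int = 4) -> Iterator[str]:
    # Hypothetical file-based wrapper around iter_strings.
    with open(filename, "rb") as f:
        yield from iter_strings(f, min_len)


if __name__ == "__main__":
    data = io.BytesIO(b"hello,\x02World\x88!")
    # A tiny chunk size forces runs to span chunk boundaries.
    print(list(iter_strings(data, chunk_size=3)))  # ['hello,', 'World']
```

    Yielding str values keeps the same interface as strings() above, and because bytes outside string.printable never enter a run, decoding each run as ASCII cannot fail. The per-byte Python loop is still slow compared with the Unix strings command.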
