What's the best way to split a string into fixed length chunks and work with them in Python?

2020-11-28 10:47

I am reading in a line from a text file using:

   file = urllib2.urlopen("http://192.168.100.17/test.txt").read().splitlines()

and want to split each resulting line into fixed-length chunks to work with.

4 Answers
  • 2020-11-28 10:50

    My favorite way to solve this problem is with the re module.

    import re
    
    def chunkstring(string, length):
        return re.findall('.{%d}' % length, string)
    

    One caveat here is that re.findall will not return a final chunk that is shorter than the length value, so any remainder is silently dropped (see the variant at the end of this answer).

    However, if you're parsing fixed-width data, this is a great way to do it.

    For example, if I want to parse a block of text that I know is made up of fixed 32-character records (like a header section), I find this very readable and see no need to generalize it into a separate function (as in chunkstring):

    for header in re.findall('.{32}', header_data):
        ProcessHeader(header)
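
    If you do need the trailing remainder, a bounded quantifier keeps it; a minimal variant of the same approach (assuming length is at least 1):

    import re

    def chunkstring(string, length):
        # '{1,n}' greedily matches up to n characters, so the final
        # shorter chunk is included rather than dropped.
        return re.findall('.{1,%d}' % length, string)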
    
  • 2020-11-28 10:58

    I know it's an oldie, but I'd like to add how to chop up a string into variable-length columns:

    def chunkstring(string, lengths):
        # For each column, compute its start offset as the sum of all
        # preceding column widths, then slice out and strip that field.
        return (string[pos:pos + length].strip()
                for idx, length in enumerate(lengths)
                for pos in [sum(lengths[:idx])])
    
    # `line` is one fixed-width record, e.g. from the question's splitlines().
    column_lengths = [10, 19, 13, 11, 7, 7, 15]
    fields = list(chunkstring(line, column_lengths))
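
    The repeated sum makes this quadratic in the number of columns; a sketch of an equivalent linear version that precomputes the offsets with itertools.accumulate:

    from itertools import accumulate

    def chunkstring(string, lengths):
        # A running sum of the widths gives each column's start and end.
        offsets = [0, *accumulate(lengths)]
        return (string[start:end].strip()
                for start, end in zip(offsets, offsets[1:]))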
    
  • 2020-11-28 11:02

    One solution would be to use this function:

    def chunkstring(string, length):
        return (string[i:i + length] for i in range(0, len(string), length))
    

    This function returns a generator, built with a generator expression. Each slice starts at a multiple of the chunk length and extends one chunk length beyond it, so successive slices tile the string without overlap.

    You can iterate over the generator as you would a list, tuple or string (for chunk in chunkstring(s, n): ...), or convert it into a list with list(generator). Generators are more memory-efficient than lists because they generate their elements as they are needed rather than all at once, but they lack certain features, such as indexing.

    This generator also yields any shorter final chunk:

    >>> list(chunkstring("abcdefghijklmnopqrstuvwxyz", 5))
    ['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy', 'z']
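
    On Python 3.12 and later, the standard library's itertools.batched offers the same chunking; it yields tuples of characters, so join them back into strings:

    from itertools import batched  # Python 3.12+

    def chunkstring(string, length):
        return (''.join(chunk) for chunk in batched(string, length))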
    

    Example usage:

    text = """This is the first line.
               This is the second line.
               The line below is true.
               The line above is false.
               A short line.
               A very very very very very very very very very long line.
               A self-referential line.
               The last line.
            """
    
    lines = (i.strip() for i in text.splitlines())
    
    for line in lines:
        for chunk in chunkstring(line, 16):
            print(chunk)
    
  • 2020-11-28 11:05

    I think this way is easier to read:

    string = "when an unknown printer took a galley of type and scrambled it to make a type specimen book."
    length = 20
    list_of_strings = []
    # Step through the string in fixed-length strides, collecting each slice.
    for i in range(0, len(string), length):
        list_of_strings.append(string[i:i + length])
    print(list_of_strings)
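
    which prints (the final chunk keeps whatever is left over):

    ['when an unknown prin', 'ter took a galley of', ' type and scrambled ', 'it to make a type sp', 'ecimen book.']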
    