What's the best way to split a string into fixed length chunks and work with them in Python?

前端未结


I am reading in a line from a text file using:

    file = urllib2.urlopen("http://192.168.100.17/test.txt").read().splitlines()

and output


4 answers

北恋

2020-11-28 10:50
My favorite way to solve this problem is with the re module.
```
import re

def chunkstring(string, length):
  return re.findall('.{%d}' % length, string)
```
One caveat here is that re.findall will not return a chunk shorter than length, so any remainder at the end of the string is dropped. Also note that '.' does not match newline characters unless the re.DOTALL flag is set.

However, if you're parsing fixed-width data, this is a great way to do it.

For example, if I want to parse a block of text that I know is made up of 32-character records (like a header section), I find this very readable and see no need to generalize it into a separate function (as in chunkstring):
```
for header in re.findall('.{32}', header_data):
  ProcessHeader(header)
```
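If the trailing remainder matters, the caveat above can be worked around by letting the pattern match between 1 and length characters. The following is a minimal sketch, not part of the original answer; the name chunkstring_keep_tail is made up for illustration, and re.DOTALL is added on the assumption that newlines should count as ordinary characters.

```python
import re

def chunkstring_keep_tail(string, length):
    # Hypothetical variant of chunkstring: '.{1,n}' is greedy, so every match
    # is full length except possibly the last one, which holds the remainder.
    # re.DOTALL makes '.' match newline characters as well.
    return re.findall('.{1,%d}' % length, string, flags=re.DOTALL)

print(chunkstring_keep_tail("abcdefghijklmnopqrstuvwxyz", 5))
# ['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy', 'z']
```

For strictly fixed-width data the original '.{%d}' pattern may still be preferable, since a short final chunk there usually signals malformed input.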

无人共我

2020-11-28 10:58

I know this is an old question, but I'd like to add how to chop up a string with variable-length columns:

```
def chunkstring(string, lengths):
    return (string[pos:pos+length].strip()
            for idx,length in enumerate(lengths)
            for pos in [sum(map(int, lengths[:idx]))])

column_lengths = [10,19,13,11,7,7,15]
fields = list(chunkstring(line, column_lengths))
```
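The version above re-sums the preceding widths for every column. As a rough alternative sketch (not the answerer's code), the column offsets can be computed once with itertools.accumulate. The function name chunk_by_widths and the sample record are made up for illustration, and the widths are assumed to already be integers.

```python
from itertools import accumulate

def chunk_by_widths(string, widths):
    # Running totals give each column's end offset; the start offsets are
    # the same totals shifted right by one position, beginning at 0.
    ends = list(accumulate(widths))
    starts = [0] + ends[:-1]
    return [string[start:end].strip() for start, end in zip(starts, ends)]

# Made-up fixed-width record: columns of width 10, 12 and 2.
line = "Alice".ljust(10) + "Wonderland".ljust(12) + "42"
print(chunk_by_widths(line, [10, 12, 2]))
# ['Alice', 'Wonderland', '42']
```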


余生分开走

2020-11-28 11:02
One solution would be to use this function:
```
def chunkstring(string, length):
    return (string[0+i:length+i] for i in range(0, len(string), length))
```
This function returns a generator, built with a generator expression. Each item is a slice of the string that starts at a multiple of the chunk length and ends one chunk length later.

You can iterate over the generator like a list, tuple or string - for i in chunkstring(s,n): , or convert it into a list (for instance) with list(generator). Generators are more memory efficient than lists because they generate their elements as they are needed, not all at once; however, they lack certain features like indexing.

This generator also contains any smaller chunk at the end:
```
>>> list(chunkstring("abcdefghijklmnopqrstuvwxyz", 5))
['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy', 'z']
```
Example usage:
```
text = """This is the first line.
           This is the second line.
           The line below is true.
           The line above is false.
           A short line.
           A very very very very very very very very very long line.
           A self-referential line.
           The last line.
        """

lines = (i.strip() for i in text.splitlines())

for line in lines:
    for chunk in chunkstring(line, 16):
        print(chunk)
```
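Because the generator cannot be indexed, one way to take only the first few chunks without building the whole list is itertools.islice. This is a small sketch added for illustration; chunkstring is repeated here so the snippet runs on its own.

```python
from itertools import islice

def chunkstring(string, length):
    return (string[i:i + length] for i in range(0, len(string), length))

chunks = chunkstring("abcdefghijklmnopqrstuvwxyz", 5)

# islice pulls items lazily, so only the first two chunks are produced here.
first_two = list(islice(chunks, 2))
print(first_two)     # ['abcde', 'fghij']

# The generator remembers its position, so iteration resumes after them.
print(list(chunks))  # ['klmno', 'pqrst', 'uvwxy', 'z']
```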

心在旅途

2020-11-28 11:05

I think this way is easier to read:

```
string = "when an unknown printer took a galley of type and scrambled it to make a type specimen book."
length = 20
list_of_strings = []
for i in range(0, len(string), length):
    list_of_strings.append(string[i:length+i])
print(list_of_strings)
```
