Split a string by spaces — preserving quoted substrings — in Python


I have a string which is like this:

this is "a test"

I'm trying to write something in Python to split it up by space while ignoring spac

16 answers

面向向阳花

2020-11-22 15:46

Depending on your use case, you may also want to check out the csv module:

import csv
lines = ['this is "a string"', 'and more "stuff"']
for row in csv.reader(lines, delimiter=" "):
    print(row)

Output:

['this', 'is', 'a string']
['and', 'more', 'stuff']
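
For a single string rather than a list of lines, the same idea can be wrapped in a small helper. This is only a sketch: split_quoted is a hypothetical name, and it assumes double quotes are the only quoting character (the csv default).

```python
import csv

def split_quoted(line, delimiter=" "):
    """Split one string on `delimiter`, keeping double-quoted parts together."""
    # csv.reader expects an iterable of lines, so wrap the string in a list.
    return next(csv.reader([line], delimiter=delimiter))

print(split_quoted('this is "a test"'))  # ['this', 'is', 'a test']
```

Note that csv treats every single space as a delimiter, so two consecutive spaces produce an empty field between them.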


离开以前

2020-11-22 15:47

Since this question is tagged with regex, I decided to try a regex approach. I first replace every space inside the quoted parts with \x00, then split on whitespace, then turn each \x00 back into a space in every part.

Both versions do the same thing, but splitter is a bit more readable than splitter2.

import re

s = 'this is "a test" some text "another test"'

def splitter(s):
    def replacer(m):
        return m.group(0).replace(" ", "\x00")
    parts = re.sub('".+?"', replacer, s).split()
    parts = [p.replace("\x00", " ") for p in parts]
    return parts

def splitter2(s):
    return [p.replace("\x00", " ") for p in re.sub('".+?"', lambda m: m.group(0).replace(" ", "\x00"), s).split()]

print(splitter2(s))  # ['this', 'is', '"a test"', 'some', 'text', '"another test"']
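
Both functions above keep the double quotes around the quoted parts. If they are not wanted, one possible variant strips them after splitting. This is a sketch, assuming quotes are balanced and only double quotes are used; splitter_unquoted is a hypothetical name.

```python
import re

def splitter_unquoted(s):
    # Hide spaces inside double-quoted spans behind a NUL placeholder.
    masked = re.sub(r'".+?"', lambda m: m.group(0).replace(" ", "\x00"), s)
    # Split on whitespace, restore the spaces, and drop surrounding quotes.
    return [p.replace("\x00", " ").strip('"') for p in masked.split()]

print(splitter_unquoted('this is "a test" some text "another test"'))
# ['this', 'is', 'a test', 'some', 'text', 'another test']
```

Because strip('"') removes any leading or trailing double quote, a token with a stray unbalanced quote would lose it as well.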


一整个雨季

2020-11-22 15:48

I suggest re.findall with one alternative per token type (double-quoted, single-quoted, bare word).

Test string:

s = 'abc "ad" \'fg\' "kk\'rdt\'" zzz"34"zzz "" \'\''

To also capture empty "" and '':

import re
re.findall(r'"[^"]*"|\'[^\']*\'|[^"\'\s]+',s)

result:

['abc', '"ad"', "'fg'", '"kk\'rdt\'"', 'zzz', '"34"', 'zzz', '""', "''"]

To ignore empty "" and '':

import re
re.findall(r'"[^"]+"|\'[^\']+\'|[^"\'\s]+',s)

result:

['abc', '"ad"', "'fg'", '"kk\'rdt\'"', 'zzz', '"34"', 'zzz']
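
The results above keep the quote characters, and a quoted part glued to a word (zzz"34"zzz) comes back as three separate tokens. A minimal sketch of a wrapper that optionally removes the surrounding quotes follows; TOKEN and tokens are hypothetical names, and the pattern is the first one from this answer.

```python
import re

# Double-quoted span, single-quoted span, or a run of non-quote, non-space characters.
TOKEN = re.compile(r'''"[^"]*"|'[^']*'|[^"'\s]+''')

def tokens(s, strip_quotes=True):
    out = []
    for tok in TOKEN.findall(s):
        # A quoted token starts and ends with the same quote character.
        if strip_quotes and len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "\"'":
            tok = tok[1:-1]
        out.append(tok)
    return out

print(tokens('this is "a test" \'and more\''))  # ['this', 'is', 'a test', 'and more']
```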


一个人的身影

2020-11-22 15:49

If you don't care about quoted substrings, a simple str.split() will do:

>>> 'a short sized string with spaces '.split()

Performance:

>>> import timeit
>>> s = "('a short sized string with spaces '*100).split()"
>>> t = timeit.Timer(stmt=s)
>>> print("%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000))
171.39 usec/pass

Or the string module (Python 2 only; string.split was removed in Python 3):

>>> from string import split as stringsplit
>>> stringsplit('a short sized string with spaces '*100)

Performance: in this measurement the string module function is slightly faster than the string method:

>>> s = "stringsplit('a short sized string with spaces '*100)"
>>> t = timeit.Timer(s, "from string import split as stringsplit")
>>> print("%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000))
154.88 usec/pass

Or you can use the regex engine:

>>> from re import split as resplit
>>> regex = r'\s+'
>>> medstring = 'a short sized string with spaces '*100
>>> resplit(regex, medstring)

Performance:

>>> s = "resplit(regex, medstring)"
>>> t = timeit.Timer(s, r"from re import split as resplit; regex=r'\s+'; medstring='a short sized string with spaces '*100")
>>> print("%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000))
540.21 usec/pass
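
The timings above come from Python 2. A rough Python 3 sketch of the same comparison follows; usec_per_pass and SETUP are hypothetical names. It builds the test string once in the setup so that both statements time only the split, and no numbers are quoted here because they depend on the machine and interpreter.

```python
import timeit

# Shared setup: the test string and a precompiled whitespace pattern.
SETUP = r"import re; text = 'a short sized string with spaces ' * 100; ws = re.compile(r'\s+')"

def usec_per_pass(stmt, number=10000):
    """Return the average time of one execution of `stmt` in microseconds."""
    seconds = timeit.timeit(stmt, setup=SETUP, number=number)
    return 1e6 * seconds / number

for label, stmt in [("str.split", "text.split()"), ("re.split", "ws.split(text)")]:
    print("%-10s %.2f usec/pass" % (label, usec_per_pass(stmt)))
```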

For very long inputs, avoid loading the entire string into memory; instead, process the input line by line or split it iteratively.
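
One way to split iteratively is a generator that yields one word at a time from any iterable of lines, such as an open file. This is a sketch; iter_words is a hypothetical name, and like the examples in this answer it ignores quoting.

```python
import re

def iter_words(lines):
    """Yield whitespace-separated words one at a time from an iterable of lines."""
    for line in lines:
        # finditer scans lazily, so no list of all words is ever built.
        for match in re.finditer(r'\S+', line):
            yield match.group(0)

for word in iter_words(['a short sized\n', ' string with spaces \n']):
    print(word)
```

Passing a file object instead of a list reads the input one line at a time.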
