Question
I want to split the following string:
Quantity [*,'EXTRA 05',*]
With the desired results being:
["Quantity", "[*,'EXTRA 05',*]"]
The closest I have found is shlex.split; however, it removes the internal quotes, giving the following result:
['Quantity', '[*,EXTRA 05,*]']
Any suggestions would be greatly appreciated.
EDIT:
I will also need to handle multiple splits, such as:
"Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
To:
["Quantity", "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
Answer 1:
The basic tool for processing strings is the regular expression module, re.
Given the information you provided (which may be insufficient), the following code does the job:
import re
# Raw strings keep the backslashes intact for the regex engine
r = re.compile(r'(?! )[^[]+?(?= *\[)'
               r'|'
               r'\[.+?\]')
s1 = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
print(r.findall(s1))
print('---------------')
s2 = "'zug hug'Quantity boondoggle 'fish face monkey "\
     "dung' [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
print(r.findall(s2))
result
['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
---------------
["'zug hug'Quantity boondoggle 'fish face monkey dung'", "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
The regular expression pattern must be understood as follows:
'|' means OR.
So the regex pattern combines two partial REs: (?! )[^[]+?(?= *\[) and \[.+?\]
The first partial RE:
The core is [^[]+
Brackets define a set of characters. Because the symbol ^ comes right after the opening bracket [, the set is defined as all the characters other than those that follow the ^.
Here, [^[] means any character that isn't an opening bracket [ and, as there's a + after this set, [^[]+ means a sequence of one or more characters, none of which is an opening bracket.
Now, there is a question mark after [^[]+: it makes the + non-greedy, so the matched sequence is as short as possible and stops at the first position where what follows the question mark can match.
Here, what follows the ? is (?= *\[), a lookahead assertion. It is composed of (?=...), which signals a positive lookahead assertion, and of the sub-pattern ' *\[' (a blank, a star, an escaped bracket), which is the sequence in front of which the matched sequence must stop. ' *\[' means: zero or more blanks followed by an opening bracket (the backslash \ is needed to cancel the meaning of [ as the opening of a set of characters).
There's also (?! ) in front of the core. It's a negative lookahead assertion: it makes this partial RE match only sequences that do not begin with a blank, which avoids matching runs of blanks on their own. Remove this (?! ) and you'll see the effect.
The second partial RE:
\[.+?\] means: the opening bracket character [, then a sequence of characters matched by .+? (the dot matches any character except \n); this sequence must stop in front of the closing bracket character ], which is the last character matched.
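To make the two effects described above visible, here is a small sketch in Python 3 syntax. The names with_guard and without_guard are mine, not from the answer; the patterns themselves are the ones discussed. It compares the pattern with and without the (?! ) guard, and the lazy .+? with a greedy .+ in the bracket part:

```python
import re

s1 = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

# Full pattern from the answer, and the same pattern without the (?! ) guard.
with_guard = re.compile(r"(?! )[^[]+?(?= *\[)|\[.+?\]")
without_guard = re.compile(r"[^[]+?(?= *\[)|\[.+?\]")

print(with_guard.findall(s1))
# ['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]

# Without the guard, each single blank in front of a bracket is matched too.
print(without_guard.findall(s1))
# ['Quantity', ' ', "[*,'EXTRA 05',*]", ' ', "[*,'EXTRA 09',*]"]

# Lazy .+? stops at the first closing bracket; greedy .+ runs to the last one.
print(re.findall(r"\[.+?\]", s1))
# ["[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
print(re.findall(r"\[.+\]", s1))
# ["[*,'EXTRA 05',*] [*,'EXTRA 09',*]"]
```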
EDIT
import re
string = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
# Split on a space only when it is immediately followed by an opening bracket
print(re.split(r' (?=\[)', string))
result
['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
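A small sketch of how this lookahead split behaves on other inputs, assuming Python 3. The helper name split_before_brackets, the extra test strings, and the ' +' variant are my own additions for illustration, not part of the answer:

```python
import re

def split_before_brackets(text):
    # Split on one space, but only if the next character is '['.
    # The lookahead (?=\[) does not consume the bracket, so it stays in the result.
    return re.split(r' (?=\[)', text)

print(split_before_brackets("Quantity [*,'EXTRA 05',*]"))
# ['Quantity', "[*,'EXTRA 05',*]"]

# Spaces not followed by '[' are left alone, so a multi-word name stays whole.
print(split_before_brackets("Total Quantity [*,'EXTRA 05',*]"))
# ['Total Quantity', "[*,'EXTRA 05',*]"]

# Assumption: if several spaces may precede a bracket, ' +' swallows all of them.
print(re.split(r' +(?=\[)', "Quantity   [*,'EXTRA 05',*]"))
# ['Quantity', "[*,'EXTRA 05',*]"]
```

Note that this relies on every bracket group being preceded by a space and on no space followed by [ appearing inside a group.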
Answer 2:
A caveat for picky readers: this algorithm WON'T correctly split every string you pass to it, only strings like:
"Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
"Quantity [*,'EXTRA 05',*]"
"Quantity [*,'EXTRA 05',*] [*,'EXTRA 10',*] [*,'EXTRA 07',*] [*,'EXTRA 09',*]"
string = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
splitted_string = []
# Split once on single spaces and reuse the resulting token list
tokens = string.split(" ")
# This adds "Quantity" at position 0 of splitted_string
splitted_string.append(tokens[0])
# The loop goes from 1 to the length of tokens, increasing x by 2
# On the first iteration x is 1 and x+1 is 2, on the second x=3 and x+1=4, etc.
# The first iteration concatenates "[*,'EXTRA" and "05',*]" into one string
# The second iteration concatenates "[*,'EXTRA" and "09',*]" into one string
# If the string were longer, it would still work
for x in range(1, len(tokens), 2):
    splitted_string.append("%s %s" % (tokens[x], tokens[x+1]))
When I execute the code, splitted_string at the end contains:
['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
splitted_string[0] = 'Quantity'
splitted_string[1] = "[*,'EXTRA 05',*]"
splitted_string[2] = "[*,'EXTRA 09',*]"
I think that is exactly what you're looking for. If I'm wrong, or if you need an explanation of the code, let me know. I hope it helps.
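For reuse, the same pairing idea could be wrapped in a function. This is only a sketch: the name join_pairs and the ValueError check are my own additions, made under the assumption stated at the top of this answer that every bracket group contains exactly one space.

```python
def join_pairs(text):
    """Hypothetical helper: keep the first token, then rejoin the rest two by two."""
    tokens = text.split(" ")
    # The approach needs one leading token plus an even number of others.
    # Without this check, tokens[x + 1] would raise IndexError on other inputs.
    if len(tokens) % 2 == 0:
        raise ValueError("expected a name followed by bracket groups with one space each")
    result = [tokens[0]]
    for x in range(1, len(tokens), 2):
        result.append("%s %s" % (tokens[x], tokens[x + 1]))
    return result

print(join_pairs("Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"))
# ['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
```

A string such as "Quantity [*,'EXTRA',*]" (no space inside the group) is rejected by the check, which makes the limitation explicit instead of failing later with an index error.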
Answer 3:
Assuming you want a general solution for splitting at spaces but not at spaces inside quotations: I don't know of any Python library that does this, but that doesn't mean there isn't one.
In the absence of a known pre-rolled solution I would simply roll my own. It's relatively easy to scan a string looking for spaces and then use Python's slicing to divide the string into the parts you want. To ignore spaces inside quotes, you can include a flag that toggles whenever a quote symbol is encountered, switching the space detection off and on.
Here is some code I knocked up to do this; it is not extensively tested:
def spaceSplit(string):
    last = 0
    splits = []
    inQuote = None
    for i, letter in enumerate(string):
        if inQuote:
            if letter == inQuote:
                inQuote = None
        else:
            if letter == '"' or letter == "'":
                inQuote = letter
        if not inQuote and letter == ' ':
            splits.append(string[last:i])
            last = i+1
    if last < len(string):
        splits.append(string[last:])
    return splits
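A quick check of the function on the string from the question, as a sketch in Python 3 syntax. The function is repeated so the snippet runs on its own; the second and third test strings are my own and only probe edge cases:

```python
def spaceSplit(string):
    last = 0
    splits = []
    inQuote = None  # holds the quote character we are currently inside, if any
    for i, letter in enumerate(string):
        if inQuote:
            if letter == inQuote:
                inQuote = None
        elif letter == '"' or letter == "'":
            inQuote = letter
        # Only a space outside quotes ends the current piece.
        if not inQuote and letter == ' ':
            splits.append(string[last:i])
            last = i + 1
    if last < len(string):
        splits.append(string[last:])
    return splits

print(spaceSplit("Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"))
# ['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]

# A quote of the other kind inside a quoted part does not end it.
print(spaceSplit('say "it\'s fine" now'))

# Consecutive spaces produce empty strings in the result.
print(spaceSplit("a  b"))
# ['a', '', 'b']
```

If empty pieces are unwanted, they can be filtered out afterwards, for example with [p for p in spaceSplit(s) if p].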
Source: https://stackoverflow.com/questions/20256066/python-split-string-by-spaces-except-when-in-quotes-but-keep-the-quotes