In Python, how do I split a string and keep the separators?

[愿得一人] 2020-11-22 03:26

Here's the simplest way to explain this. Here's what I'm using:

re.split('\W', 'foo/bar spam\neggs')
-> ['foo', 'bar', 'spam', 'eggs']


        
13 Answers
  • 2020-11-22 03:38

    Here is a simple .split solution that works without regex.

    This answers "Python split() without removing the delimiter", so it is not exactly what the original post asks, but that other question was closed as a duplicate of this one.

    def splitkeep(s, delimiter):
        split = s.split(delimiter)
        return [substr + delimiter for substr in split[:-1]] + [split[-1]]
    

    Random tests:

    import random
    
    CHARS = [".", "a", "b", "c"]
    assert splitkeep("", "X") == [""]  # 0 length test
    for delimiter in ('.', '..'):
        for _ in range(100000):
            length = random.randint(1, 50)
            s = "".join(random.choice(CHARS) for _ in range(length))
            assert "".join(splitkeep(s, delimiter)) == s
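
A few concrete cases may make the behaviour easier to see than the random round-trip test. This is only a small sketch: it repeats the splitkeep function from above so that it runs on its own, and the sample strings are made up for illustration.

```python
def splitkeep(s, delimiter):
    # Same function as in the answer above, repeated so the snippet is self-contained.
    split = s.split(delimiter)
    return [substr + delimiter for substr in split[:-1]] + [split[-1]]

# Each piece keeps the delimiter that followed it.
print(splitkeep("a.b.c", "."))     # ['a.', 'b.', 'c']

# A trailing delimiter leaves an empty string as the last piece.
print(splitkeep("a.b.", "."))      # ['a.', 'b.', '']

# Multi-character delimiters are kept whole.
print(splitkeep("a..b..c", ".."))  # ['a..', 'b..', 'c']
```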
    
  • 2020-11-22 03:42

    If you are splitting on newline, use splitlines(True).

    >>> 'line 1\nline 2\nline without newline'.splitlines(True)
    ['line 1\n', 'line 2\n', 'line without newline']
    

    (Not a general solution, but adding this here in case someone comes here not realizing this method existed.)
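
To illustrate how this differs from splitting on '\n' by hand, the sketch below feeds splitlines a string with mixed line endings (the sample text is made up). keepends is the keyword name of the argument passed positionally as True above.

```python
text = "one\r\ntwo\nthree\rfour"

# keepends=True leaves each line's own terminator attached, whatever it was.
lines = text.splitlines(keepends=True)
print(lines)  # ['one\r\n', 'two\n', 'three\r', 'four']

# Joining the pieces gives back the original string.
assert "".join(lines) == text
```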

  • 2020-11-22 03:45
    # This keeps all separators in the result
    import re

    st = "%%(c+dd+e+f-1523)%%7"
    sh = re.compile(r'[+\-/*<>%()]')

    def splitStringFull(sh, st):
        ls = sh.split(st)
        lo = []
        start = 0
        for l in ls:
            if not l:
                continue
            # Search from the current position so repeated tokens are found in order.
            k = st.find(l, start)
            if k > start:
                lo.append(st[start:k])  # run of separators before this token
            lo.append(l)
            start = k + len(l)
        if start < len(st):
            lo.append(st[start:])  # separators after the last token
        return lo

    li = splitStringFull(sh, st)
    # ['%%(', 'c', '+', 'dd', '+', 'e', '+', 'f', '-', '1523', ')%%', '7']
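
For comparison, re.split itself keeps the separators when the pattern is wrapped in a capturing group; this is documented behaviour of the re module. The sketch below is an alternative to the loop above, not the answer author's method. It assumes the same set of separator characters, adds a trailing + so that runs of separators stay together, and drops the empty strings that re.split produces at the edges.

```python
import re

st = "%%(c+dd+e+f-1523)%%7"

# The parentheses make re.split return the matched separators as list items.
pattern = re.compile(r'([+\-/*<>%()]+)')
tokens = [t for t in pattern.split(st) if t]
print(tokens)
# ['%%(', 'c', '+', 'dd', '+', 'e', '+', 'f', '-', '1523', ')%%', '7']

# The same idea applied to the example from the question.
question_tokens = re.split(r'(\W)', 'foo/bar spam\neggs')
print(question_tokens)
# ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
```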
    
  • 2020-11-22 03:47

    If you have only 1 separator, you can employ list comprehensions:

    text = 'foo,bar,baz,qux'  
    sep = ','
    

    Appending/prepending separator:

    result = [x+sep for x in text.split(sep)]
    #['foo,', 'bar,', 'baz,', 'qux,']
    # to get rid of the trailing separator (slicing, because strip() treats sep as a set of characters)
    result[-1] = result[-1][:-len(sep)]
    #['foo,', 'bar,', 'baz,', 'qux']
    
    result = [sep+x for x in text.split(sep)]
    #[',foo', ',bar', ',baz', ',qux']
    # to get rid of the leading separator
    result[0] = result[0][len(sep):]
    #['foo', ',bar', ',baz', ',qux']
    

    Separator as its own element:

    result = [u for x in text.split(sep) for u in (x, sep)]
    #['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
    result = result[:-1]   # to get rid of the trailing separator
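
The same "separator as its own element" result can also be built without producing a trailing separator in the first place. A minimal sketch, where split_interleaved is a hypothetical helper name and not something from the answer:

```python
def split_interleaved(text, sep):
    # Hypothetical helper: put sep between the pieces instead of after each one.
    parts = text.split(sep)
    result = [parts[0]]
    for part in parts[1:]:
        result.extend((sep, part))
    return result

print(split_interleaved('foo,bar,baz,qux', ','))
# ['foo', ',', 'bar', ',', 'baz', ',', 'qux']
```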
    
  • 2020-11-22 03:47

    I had a similar issue trying to split a file path and struggled to find a simple answer. This worked for me and didn't involve having to substitute delimiters back into the split text:

    my_path = 'folder1/folder2/folder3/file1'

    import re

    re.findall('[^/]+/|[^/]+', my_path)

    returns:

    ['folder1/', 'folder2/', 'folder3/', 'file1']
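
One thing to watch with this pattern is that a leading or doubled slash is dropped, because every piece must contain at least one non-slash character. If that matters, a small variation lets the part before a slash be empty, so that joining the pieces reproduces the input. This is a sketch based on my assumption about what is wanted, not part of the answer, and the sample path is made up.

```python
import re

my_path = '/folder1//folder2/file1'

# Original pattern: the leading '/' and the second '/' of '//' are lost.
lossy = re.findall('[^/]+/|[^/]+', my_path)
print(lossy)
# ['folder1/', 'folder2/', 'file1']

# Variation: '[^/]*/' also matches a bare '/'.
pieces = re.findall('[^/]*/|[^/]+', my_path)
print(pieces)
# ['/', 'folder1/', '/', 'folder2/', 'file1']
```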

  • 2020-11-22 03:47

    I found this generator-based approach more satisfying:

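    def split_keep(string, sep):
        """Usage:
        >>> list(split_keep("a.b.c.d", "."))
        ['a.', 'b.', 'c.', 'd']
        """
        if not sep:
            raise ValueError("empty separator")
        start = 0
        while True:
            idx = string.find(sep, start)
            if idx == -1:
                break
            # Advance past the whole separator, so multi-character separators stay intact.
            end = idx + len(sep)
            yield string[start:end]
            start = end
        yield string[start:]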
    

    It avoids the need to figure out the correct regex and should, in theory, be fairly cheap: it yields one piece at a time instead of building an intermediate list, and it delegates most of the iteration work to the efficient find method.

    ... and in Python 3.8+ it can be as short as:

    def split_keep(string, sep):
        if not sep:
            raise ValueError("empty separator")
        start = 0
        while (idx := string.find(sep, start)) != -1:
            end = idx + len(sep)  # keep the whole separator
            yield string[start:end]
            start = end
        yield string[start:]
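
Because this is a generator, a caller can stop early without scanning the rest of the string. Here is a usage sketch with itertools.islice. The function is repeated, in a form that keeps multi-character separators whole and assumes a non-empty separator, so that the snippet runs on its own (Python 3.8+ because of the := operator). The sample inputs are made up.

```python
from itertools import islice

def split_keep(string, sep):
    # Assumes sep is a non-empty string.
    start = 0
    while (idx := string.find(sep, start)) != -1:
        end = idx + len(sep)
        yield string[start:end]
        start = end
    yield string[start:]

# Take only the first two pieces; the remainder is never searched.
first_two = list(islice(split_keep("a.b.c.d", "."), 2))
print(first_two)  # ['a.', 'b.']

print(list(split_keep("a--b--c", "--")))  # ['a--', 'b--', 'c']
```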
    