String split with indices in Python

小蘑菇 2021-01-04 00:37

I am looking for a Pythonic way to split a sentence into words and also store the index information of every word in the sentence, e.g.:

    a = "This is a sentence"


        
2 Answers
  • 2021-01-04 01:11

    Here is a method using regular expressions:

    >>> import re
    >>> a = "This is a sentence"
    >>> matches = [(m.group(0), (m.start(), m.end()-1)) for m in re.finditer(r'\S+', a)]
    >>> matches
    [('This', (0, 3)), ('is', (5, 6)), ('a', (8, 8)), ('sentence', (10, 17))]
    >>> b, c = zip(*matches)
    >>> b
    ('This', 'is', 'a', 'sentence')
    >>> c
    ((0, 3), (5, 6), (8, 8), (10, 17))
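If half-open `(start, end)` pairs are acceptable (the convention Python slicing uses), the same regex approach can call `Match.span()` directly instead of computing `m.end()-1`. A minimal sketch; the function name `words_with_spans` is hypothetical:

```python
import re

def words_with_spans(text):
    """Return (word, (start, end)) pairs where text[start:end] == word."""
    # \S+ matches each maximal run of non-whitespace characters;
    # m.span() returns (m.start(), m.end()), with the end exclusive
    return [(m.group(0), m.span()) for m in re.finditer(r'\S+', text)]

print(words_with_spans("This is a sentence"))
# [('This', (0, 4)), ('is', (5, 7)), ('a', (8, 9)), ('sentence', (10, 18))]
```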
    

    As a one-liner:

    b, c = zip(*[(m.group(0), (m.start(), m.end()-1)) for m in re.finditer(r'\S+', a)])
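One caveat with the one-liner: if `a` contains no words (for example an empty or all-space string), the list is empty, `zip(*[])` yields nothing, and the `b, c = ...` unpacking raises `ValueError`. A guarded sketch, using a hypothetical helper name `split_with_indices`:

```python
import re

def split_with_indices(text):
    """Return (words, spans) as tuples; spans are inclusive (start, end) pairs."""
    matches = [(m.group(0), (m.start(), m.end() - 1))
               for m in re.finditer(r'\S+', text)]
    if not matches:
        # zip(*[]) produces no tuples, so "b, c = zip(*matches)" would fail here
        return (), ()
    words, spans = zip(*matches)
    return words, spans

b, c = split_with_indices("This is a sentence")
print(b)  # ('This', 'is', 'a', 'sentence')
print(c)  # ((0, 3), (5, 6), (8, 8), (10, 17))
print(split_with_indices("   "))  # ((), ())
```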
    

    If you just want the indices:

    c = [(m.start(), m.end()-1) for m in re.finditer(r'\S+', a)]
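Because these end indices are inclusive, getting a word back out of `a` needs `end + 1` in the slice. A quick check, as a sketch:

```python
import re

a = "This is a sentence"
c = [(m.start(), m.end() - 1) for m in re.finditer(r'\S+', a)]

# The end index is inclusive, so the slice must stop one past it
words = [a[start:end + 1] for start, end in c]
print(words)  # ['This', 'is', 'a', 'sentence']
```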
    
  • 2021-01-04 01:15

    I think it's more natural to return the start and end of the corresponding slices, e.g. (0, 4) instead of (0, 3), so that each pair can be used directly as a slice of the string:

    >>> from itertools import groupby
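    >>> def splitWithIndices(s, c=' '):
    ...     p = 0  # start index of the current run
    ...     # k is True for runs of the separator c, False for runs of word characters
    ...     for k, g in groupby(s, lambda x: x == c):
    ...         q = p + sum(1 for _ in g)  # end index (exclusive) of this run
    ...         if not k:
    ...             yield p, q  # or p, q-1 if you are really sure you want that
    ...         p = q
    ...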
    >>> a = "This is a sentence"
    >>> list(splitWithIndices(a))
    [(0, 4), (5, 7), (8, 9), (10, 18)]
    
    >>> a[0:4]
    'This'
    >>> a[5:7]
    'is'
    >>> a[8:9]
    'a'
    >>> a[10:18]
    'sentence'
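The `splitWithIndices` above treats only the single character `c` as a separator, so tabs, newlines, or repeated mixed whitespace end up inside "words". A sketch of one possible variant (the name `split_with_indices_ws` is hypothetical) passes `str.isspace` as the `groupby` key instead, which matches the word boundaries of `str.split()` with no arguments:

```python
from itertools import groupby

def split_with_indices_ws(s):
    """Yield half-open (start, end) pairs for each run of non-whitespace in s."""
    pos = 0
    # groupby groups consecutive characters by whether they are whitespace
    for is_space, group in groupby(s, str.isspace):
        end = pos + sum(1 for _ in group)  # length of this run added to its start
        if not is_space:
            yield pos, end
        pos = end

text = "This\tis  a\nsentence"
spans = list(split_with_indices_ws(text))
print(spans)                          # [(0, 4), (5, 7), (9, 10), (11, 19)]
print([text[i:j] for i, j in spans])  # ['This', 'is', 'a', 'sentence']
```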
    