Python regex split without empty string

栀梦 2020-12-03 09:54

I have the following file names that exhibit this pattern:

000014_L_20111007T084734-20111008T023142.txt
000014_U_20111007T084734-20111008T023142.txt
...


        
5 Answers
  • 2020-12-03 10:20

    I'm no Python expert but maybe you could just remove the empty strings from your list?

    import re
    str_list = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
    time_info = list(filter(None, str_list))  # drop the empty strings
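
To see where the empty strings come from, here is a minimal sketch that assumes the file name from the question. re.split emits an empty string whenever a separator matches at the very start or very end of the input. The variable names are illustrative only.

```python
import re

f = "000014_L_20111007T084734-20111008T023142.txt"

# The pattern matches at the very start (the prefix) and at the very end
# (".txt"), so re.split yields an empty string on each side.
parts = re.split(r"^[0-9]+_[LU]_|-|\.txt$", f)
print(parts)

# Keep only the non-empty pieces.
time_info = [p for p in parts if p]
print(time_info)
```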
    
  • 2020-12-03 10:20
    >>> f='000014_L_20111007T084734-20111008T023142.txt'
    >>> f[9:-4].split('-')
    ['20111007T084734', '20111008T023142']
    

    or, somewhat more general:

    >>> f[f.rfind('_')+1:-4].split('-')
    ['20111007T084734', '20111008T023142']
    
  • 2020-12-03 10:28

    Since this came up on Google, and for completeness: try using re.findall as an alternative!

    This does require a little re-thinking, but it still returns a list of matches like split does. This makes it a nice drop-in replacement for some existing code and gets rid of the unwanted text. Pair it with lookaheads and/or lookbehinds and you get very similar behavior.

    Yes, this is a bit of a "you're asking the wrong question" answer, and it doesn't use re.split(). It does solve the underlying issue: your list of matches suddenly has zero-length strings in it and you don't want that.
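
A minimal sketch of the re.findall idea, assuming the timestamps always have the `YYYYMMDDTHHMMSS` shape seen in the question's file names (that shape is an assumption read off the examples, not something the question states):

```python
import re

f = "000014_L_20111007T084734-20111008T023142.txt"

# Match the timestamps themselves instead of splitting on what surrounds them.
# Lookbehind: the match must be preceded by "_" or "-".
# Lookahead: the match must be followed by "-" or ".".
# The \d{8}T\d{6} shape is an assumption based on the example file names.
time_info = re.findall(r"(?<=[_-])\d{8}T\d{6}(?=[-.])", f)
print(time_info)
```

Because the pattern describes what to keep rather than what to discard, no empty strings can appear in the result.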

  • 2020-12-03 10:33

    Don't use re.split(); use the groups() method of regex Match/SRE_Match objects instead.

    >>> import re
    >>> f = '000014_L_20111007T084734-20111008T023142.txt'
    >>> time_info = re.search(r'[LU]_(\w+)-(\w+)\.', f).groups()
    >>> time_info
    ('20111007T084734', '20111008T023142')
    

    You can even name the capturing groups and retrieve them in a dict, though you use groupdict() rather than groups() for that. (The regex pattern for such a case would be something like r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.')
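
The named-group variant might look like the sketch below. It reuses the `groupA`/`groupB` names from the pattern above; the guard against a failed match is an addition for illustration.

```python
import re

f = "000014_L_20111007T084734-20111008T023142.txt"

pattern = re.compile(r"[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.")
match = pattern.search(f)

# search() returns None when nothing matches, so guard before groupdict().
time_info = match.groupdict() if match else {}
print(time_info)
```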

  • 2020-12-03 10:41

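    If the timestamps always come after the second _, you can slice off the ".txt" extension and use str.split. (str.strip(".txt") also happens to work on this input, but it strips any of the characters ".", "t" and "x" from both ends rather than removing the suffix, so it is unsafe in general.)

    >>> strs = "000014_L_20111007T084734-20111008T023142.txt"
    >>> strs[:-len(".txt")].split("_", 2)[-1].split("-")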
    ['20111007T084734', '20111008T023142']
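
One caveat worth illustrating: str.strip takes a set of characters, not a suffix. The sketch below uses a hypothetical file name (not from the question) to show the difference, with os.path.splitext as one safer way to drop an extension.

```python
import os.path

name = "report_tx.txt"  # hypothetical file name, chosen to expose the pitfall

# strip(".txt") removes every trailing ".", "t" or "x", not just the suffix.
stripped = name.strip(".txt")
print(stripped)

# splitext removes only the final extension.
stem = os.path.splitext(name)[0]
print(stem)
```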
    