Extracting year from string in Python

一整个雨季 2021-01-07 02:43

How can I parse the following string in Python to extract the year?

'years since 1250-01-01 0:0:0'

The answer should be 1250

3 answers
  • 2021-01-07 03:14

    The following regex should make the four-digit year available as the first capture group:

    ^.*(\d{4})-\d{2}-\d{2}.*$
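
    A minimal sketch of applying this pattern from Python, assuming the capture group is written as (\d{4}) and that the input contains a YYYY-MM-DD date. The variable names are illustrative only:

```python
import re

# Anchored pattern: anything, then a four-digit year (captured),
# then -MM-DD, then anything up to the end of the string.
pattern = re.compile(r"^.*(\d{4})-\d{2}-\d{2}.*$")

s = "years since 1250-01-01 0:0:0"
m = pattern.match(s)

# match() returns None when the string has no such date,
# so guard before reading the capture group.
year = m.group(1) if m else None
print(year)
```

    The captured value is a string; wrap it in int() if a number is needed.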
    
  • 2021-01-07 03:25

    You can use a regex with a capture group around the four digits, while also making sure the expected pattern surrounds it. I would probably look for the following sequence:

    • four digits in a capture group (\d{4})

    • hyphen -

    • two digits \d{2}

    • hyphen -

    • two digits \d{2}

    Giving: (\d{4})-\d{2}-\d{2}

    Demo:

    >>> import re
    >>> d = re.findall(r'(\d{4})-\d{2}-\d{2}', 'years since 1250-01-01 0:0:0')
    >>> d
    ['1250']
    >>> d[0]
    '1250'
    

    If you need it as an int, convert it:

    >>> int(d[0])
    1250
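
    The steps above could be wrapped in a small helper. This is only a sketch: the name extract_year is hypothetical, and returning None for input without a date is an assumption about the desired behaviour, not something the question specifies.

```python
import re
from typing import Optional

# Four-digit year (captured), hyphen, two digits, hyphen, two digits.
DATE_RE = re.compile(r"(\d{4})-\d{2}-\d{2}")


def extract_year(text: str) -> Optional[int]:
    """Return the year of the first YYYY-MM-DD date in text, or None."""
    match = DATE_RE.search(text)
    return int(match.group(1)) if match else None


print(extract_year("years since 1250-01-01 0:0:0"))
print(extract_year("no date here"))
```

    Using search() with a guard avoids the IndexError that d[0] would raise when findall() returns an empty list.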
    
  • 2021-01-07 03:26

    There are all sorts of ways to do it. Here are several options:

    • dateutil parser in "fuzzy" mode:

      In [1]: s = 'years since 1250-01-01 0:0:0'
      
      In [2]: from dateutil.parser import parse
      
      In [3]: parse(s, fuzzy=True).year  # resulting year would be an integer
      Out[3]: 1250
      
    • regular expressions with a capturing group:

      In [2]: import re
      
      In [3]: re.search(r"years since (\d{4})", s).group(1)
      Out[3]: '1250'
      
    • splitting by "since" and then by a dash:

      In [2]: s.split("since", 1)[1].split("-", 1)[0].strip()
      Out[2]: '1250'
      
    • or maybe even splitting by the first dash and slicing the first substring:

      In [2]: s.split("-", 1)[0][-4:]
      Out[2]: '1250'
      

    The last two involve more "moving parts" and might not be applicable depending on possible variations of the input string.
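
    To illustrate that caveat, here is a rough comparison on two made-up input variations (both strings are hypothetical, not taken from the question). The function names are illustrative as well, and the regex used for comparison is the stricter date pattern rather than the "years since" one:

```python
import re


def by_since(s):
    # Split on "since", then take everything before the first dash.
    return s.split("since", 1)[1].split("-", 1)[0].strip()


def by_first_dash(s):
    # Take the last four characters before the first dash.
    return s.split("-", 1)[0][-4:]


def by_regex(s):
    # Capture four digits that start a YYYY-MM-DD date.
    return re.search(r"(\d{4})-\d{2}-\d{2}", s).group(1)


# Hypothetical variation with a dash before the date.
dashed = "time-step since 1250-01-01 0:0:0"
print(by_since(dashed))       # still finds the year
print(by_first_dash(dashed))  # returns "time", not the year
print(by_regex(dashed))       # still finds the year

# Hypothetical variation without the word "since".
no_since = "years from 1250-01-01"
try:
    by_since(no_since)
except IndexError:
    print("by_since fails: no 'since' in the string")
print(by_regex(no_since))
```

    Which approach is acceptable depends on how uniform the real input is; for a fixed "years since YYYY-MM-DD" format, any of them works.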
