Extracting the year from a string in Python

How can I parse the following string in Python to extract the year?

'years since 1250-01-01 0:0:0'

The answer should be 1250

3 Answers

长发绾君心

2021-01-07 03:14
The following regex should make the four-digit year available as the first capture group:
```
^.*(\d{4})-\d{2}-\d{2}.*$
```
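As a minimal sketch of how this pattern might be applied in Python (with the digit class written as `\d`; the variable names `m` and `year` are illustrative and not from the answer itself):

```python
import re

s = 'years since 1250-01-01 0:0:0'

# Group 1 holds the four digits that precede the -MM-DD part.
# A raw string keeps the backslashes intact for the regex engine.
m = re.match(r'^.*(\d{4})-\d{2}-\d{2}.*$', s)

# Guard against inputs that contain no date at all.
year = int(m.group(1)) if m else None
print(year)  # 1250
```

Because the pattern begins with `^.*`, `re.match` and `re.search` behave the same way here.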
遇见更好的自我

2021-01-07 03:25
You can use a regex with a capture group around the four digits, while also making sure that a particular pattern follows them. I would look for:
- four digits in a capture group: (\d{4})
- a hyphen: -
- two digits: \d{2}
- a hyphen: -
- two digits: \d{2}
Giving: (\d{4})-\d{2}-\d{2}

Demo:
```
>>> import re
>>> d = re.findall(r'(\d{4})-\d{2}-\d{2}', 'years since 1250-01-01 0:0:0')
>>> d
['1250']
>>> d[0]
'1250'
```
If you need it as an int, just convert it:
```
>>> int(d[0])
1250
```
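If this is needed in more than one place, the steps above could be wrapped in a small helper. This is only a sketch: the names `DATE_RE` and `extract_year` are hypothetical, and returning `None` when no date is found is an assumption about the desired behaviour.

```python
import re
from typing import Optional

# Same pattern as above, compiled once for reuse.
DATE_RE = re.compile(r'(\d{4})-\d{2}-\d{2}')

def extract_year(text: str) -> Optional[int]:
    """Return the year of the first YYYY-MM-DD date in text, or None."""
    m = DATE_RE.search(text)
    return int(m.group(1)) if m else None

print(extract_year('years since 1250-01-01 0:0:0'))  # 1250
print(extract_year('no date here'))                  # None
```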
失恋的感觉

2021-01-07 03:26
There are several ways to do it. Here are a few options:
- the dateutil parser in "fuzzy" mode:
```
In [1]: s = 'years since 1250-01-01 0:0:0'

In [2]: from dateutil.parser import parse

In [3]: parse(s, fuzzy=True).year  # resulting year would be an integer
Out[3]: 1250
```
- regular expressions with a capturing group:
```
In [2]: import re

In [3]: re.search(r"years since (\d{4})", s).group(1)
Out[3]: '1250'
```
- splitting by "since" and then by a dash:
```
In [2]: s.split("since", 1)[1].split("-", 1)[0].strip()
Out[2]: '1250'
```
- or maybe even splitting by the first dash and slicing the first substring:
```
In [2]: s.split("-", 1)[0][-4:]
Out[2]: '1250'
```
The last two involve more "moving parts" and might not be applicable depending on possible variations of the input string.
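To illustrate that caveat, here is a rough comparison on a made-up variation of the input. The string `tricky` and the function names are hypothetical, chosen only to show how an earlier dash can throw off the slicing approach while a date-shaped regex still finds the year.

```python
import re

def year_by_dash(s):
    # Slicing approach: text before the first dash, last four characters.
    return s.split("-", 1)[0][-4:]

def year_by_regex(s):
    # Regex approach: four digits followed by -MM-DD.
    m = re.search(r"(\d{4})-\d{2}-\d{2}", s)
    return m.group(1) if m else None

plain = 'years since 1250-01-01 0:0:0'
tricky = 'sea-level years since 1250-01-01 0:0:0'  # hypothetical variation

print(year_by_dash(plain))    # '1250'
print(year_by_dash(tricky))   # 'sea'
print(year_by_regex(tricky))  # '1250'
```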