Creating a year column in Pandas

梦如初夏 2021-01-22 09:18

I'm trying to create a year column with the year taken from the title column in my dataframe. This code works, but the column dtype is object. For example, in row 1 the year di

2 Answers
  •  陌清茗 (OP)
    2021-01-22 10:16

    Instead of re.findall, which returns a list of strings, you can use str.extract():

    wine['year'] = wine['title'].str.extract(r'\b(\d{4})\b')
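    A minimal sketch of the difference, using a few made-up titles because the question's wine DataFrame is not shown (the sample rows and the year_findall helper column are hypothetical). It passes expand=False so that str.extract returns a Series instead of a one-column DataFrame. Since the question is about the object dtype, it also converts the extracted strings with pd.to_numeric and the nullable Int64 dtype; that conversion step is an assumption on top of the answer, not part of it:

```python
import re

import pandas as pd

# Hypothetical sample data: the question's real `wine` DataFrame is not shown.
wine = pd.DataFrame({
    "title": [
        "Quinta dos Avidagos 2011 Avidagos Red (Douro)",
        "Rainstorm 2013 Pinot Gris (Willamette Valley)",
        "Nicosia NV Vulka Bianco (Etna)",
    ]
})

# re.findall gives a list of strings per row, so the column holds lists (object dtype).
wine["year_findall"] = wine["title"].apply(lambda t: re.findall(r"\b\d{4}\b", t))

# str.extract returns the first captured group; rows without a match get NaN.
wine["year"] = wine["title"].str.extract(r"\b(\d{4})\b", expand=False)

# The extracted values are still strings; convert them to a nullable integer
# dtype so that rows without a year become <NA> instead of failing the cast.
wine["year"] = pd.to_numeric(wine["year"]).astype("Int64")

print(wine[["year_findall", "year"]])
print(wine["year"].dtype)  # Int64
```

    Using Int64 rather than plain int matters here because titles without a year produce missing values, which a regular int64 column cannot hold.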
    

    Or, if you only want to match years from 1900 to 2099 (the 1900s and 2000s):

    wine['year'] = wine['title'].str.extract(r'\b((?:19|20)\d{2})\b')
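    A quick comparison of the two patterns on hypothetical titles. (?:19|20) is a non-capturing group, so the single capturing group still spans all four digits, and only 1900 through 2099 can match:

```python
import pandas as pd

# Hypothetical titles: a modern vintage, a 19th-century number, and no year at all.
titles = pd.Series([
    "Chateau Example 2015 Bordeaux",
    "Estate founded 1875 Reserve",
    "Nicosia NV Vulka Bianco (Etna)",
])

# Any standalone 4-digit number.
any_year = titles.str.extract(r"\b(\d{4})\b", expand=False)
# Only standalone numbers starting with 19 or 20.
modern_year = titles.str.extract(r"\b((?:19|20)\d{2})\b", expand=False)

print(any_year.tolist())     # ['2015', '1875', nan]
print(modern_year.tolist())  # ['2015', nan, nan]
```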
    

    Note that the pattern passed to str.extract must contain at least one capturing group; the captured value is used to populate the new column. Only the first match in each title is considered, so if a title contains several 4-digit numbers you may need to add more context to the pattern.
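    A sketch of these rules on a made-up title with two 4-digit numbers. It contrasts str.extract (first match only) with Series.str.extractall (every match), shows a pattern with added context (the word "bottled" is purely illustrative), and checks that a pattern without a capturing group is rejected with a ValueError:

```python
import pandas as pd

# Hypothetical title containing two 4-digit numbers.
titles = pd.Series(["Vintage 1999 Reserve, bottled 2004"])

# str.extract keeps only the first match per row.
first = titles.str.extract(r"\b(\d{4})\b", expand=False)
print(first.tolist())  # ['1999']

# str.extractall returns every match, indexed by (original row, match number);
# the unnamed capturing group becomes column 0.
all_matches = titles.str.extractall(r"\b(\d{4})\b")
print(all_matches[0].tolist())  # ['1999', '2004']

# Extra context in the pattern selects a specific occurrence.
bottled = titles.str.extract(r"bottled\s+(\d{4})\b", expand=False)
print(bottled.tolist())  # ['2004']

# A pattern with no capturing group cannot populate a column.
no_group_rejected = False
try:
    titles.str.extract(r"\b\d{4}\b")
except ValueError:
    no_group_rejected = True
print(no_group_rejected)  # True
```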

    I suggest using word boundaries \b around the \d{4} pattern to match 4-digit chunks as whole words and avoid partial matches in strings like 1234567890.
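    To see what the word boundaries change, here is a comparison of both patterns on a hypothetical title containing a longer digit string:

```python
import pandas as pd

# Hypothetical titles: one with a 10-digit code, one with a real year.
s = pd.Series(["Code 1234567890 Reserve", "Cuvee 2016 Brut"])

# Without \b, the first four digits of the longer number are captured.
no_boundary = s.str.extract(r"(\d{4})", expand=False)
# With \b on both sides, only standalone 4-digit chunks match.
with_boundary = s.str.extract(r"\b(\d{4})\b", expand=False)

print(no_boundary.tolist())    # ['1234', '2016']
print(with_boundary.tolist())  # [nan, '2016']
```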
