I am trying to extract some data from a DataFrame; however, the following query only extracts the first match and ignores the rest of the matches. For example, if the entire data i
You can use the Series.str.extractall() method, which returns every match of the pattern rather than only the first:
In [57]: x
Out[57]:
                                                    value
0  123 blah blah blah 456 blah blah blah 129kfj blah blah
1  237 blah blah blah 438 blah blah blah 365kfj blah blah
In [58]: x['newCol'] = x['value'].str.extractall(r'(\d{3})').unstack().apply(','.join, 1)
In [59]: x
Out[59]:
                                                    value       newCol
0  123 blah blah blah 456 blah blah blah 129kfj blah blah  123,456,129
1  237 blah blah blah 438 blah blah blah 365kfj blah blah  237,438,365
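To make the one-liner easier to follow, here is a rough step-by-step sketch of the same chain as a standalone script. The intermediate names (matches, wide) are only for illustration and are not part of the session above.

```python
import pandas as pd

# Same sample data as in the session above
x = pd.DataFrame({
    "value": [
        "123 blah blah blah 456 blah blah blah 129kfj blah blah",
        "237 blah blah blah 438 blah blah blah 365kfj blah blah",
    ]
})

# extractall returns one row per match, indexed by
# (original row label, match number)
matches = x["value"].str.extractall(r"(\d{3})")

# unstack moves the match number into the columns,
# leaving one row per original row
wide = matches.unstack()

# join the matches of each row into one comma-separated string
x["newCol"] = wide.apply(",".join, axis=1)

print(x["newCol"].tolist())
```

Note that this relies on every row having the same number of matches; the pattern also matches the first three digits of any longer run of digits.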
UPDATE:
In [77]: x
Out[77]:
                                                      value
0  123 blah blah blah, 456 blah blah blah, 129kfj blah blah
1  237 blah blah blah, 438 blah blah blah, 365kfj blah blah
In [78]: x['value'].str.extractall(r'(\d{3})').unstack().apply(','.join, 1)
Out[78]:
0    123,456,129
1    237,438,365
dtype: object
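One caveat, as far as I can tell: if the rows contain different numbers of matches, unstack() fills the gaps with NaN and ','.join then fails with a TypeError. A possible variant that should cope with that is to group the matches by the original row label instead of unstacking. This is only a sketch; the sample strings and the names s, joined and result are made up for the example.

```python
import pandas as pd

# Hypothetical data with a different number of matches per row
s = pd.Series([
    "123 blah 456 blah 129kfj",  # three matches
    "237 blah blah",             # one match
    "no digits here",            # no match
])

# Column 0 holds the single capture group; the index is
# (original row label, match number)
matches = s.str.extractall(r"(\d{3})")[0]

# Group by the original row label (index level 0) and join each group
joined = matches.groupby(level=0).agg(",".join)

# Rows without any match are absent from `joined`;
# reindex brings them back as NaN
result = joined.reindex(s.index)

print(result)
```

The reindex step is optional; without it, rows that have no match are simply missing from the result.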