I have a list of strings that I am trying to parse for data that is meaningful to me. I need an ID number that is contained within the string. Sometimes it might be two or even
Based on @alecxe's solution, you can also do this without any imports.
If your ID numbers always follow "id " and have a fixed number of digits (7), I would probably just use .split('id ') to break the string apart and take the first 7 characters of every block from the second one onwards.
You can then put them together in the desired format with '; '.join().
Putting everything together:
result = ['; '.join([value[:7] for value in valueList.split('id ')[1:]]) for valueList in lst1]
print(result)
Which prints out:
['3999595; 3999999', '3998895; 5555456; 3998899']
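For reference, here is a self-contained sketch of the same split-based idea. The helper name ids_by_split, its marker and width parameters, and the shortened sample strings are all made up for illustration; they are not part of the original question.

```python
def ids_by_split(text, marker='id ', width=7):
    """Return the `width` characters that follow each `marker` in `text`."""
    # The first chunk is whatever precedes the first marker, so skip it.
    chunks = text.split(marker)[1:]
    return [chunk[:width] for chunk in chunks]

# Shortened, hypothetical stand-ins for the strings in the question.
lst1 = [
    'model lines : id 3999595(tower : new floor : id 3999999)',
    'model lines : id 3998895(tower : id 5555456 whatever : id 3998899)',
]

result = ['; '.join(ids_by_split(item)) for item in lst1]
print(result)
# ['3999595; 3999999', '3998895; 5555456; 3998899']
```

Keep in mind that this takes whatever 7 characters follow the marker without checking that they are digits, and that any word ending in "id " (such as "grid ") would also be treated as a marker.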
You can use the regular expression id\s(\d{7}).
Iterate over the items in the list and join the results of each findall() call with "; ":
import re
lst1 = [
'(Tower 3rd floor window corner_ : option 3_floor cut out_large : GA - floors : : model lines : id 3999595(tower 4rd floor window corner : option 3_floor: : whatever else is in iit " new floor : id 3999999)',
'(Tower 3rd floor window corner_ : option 3_floor cut out_large : GA - floors : : model lines : id 3998895(tower 4rd floor window corner : option 3_floor: : id 5555456 whatever else is in iit " new floor : id 3998899)'
]
pattern = re.compile(r'id\s(\d{7})')
print(["; ".join(pattern.findall(item)) for item in lst1])
prints:
['3999595; 3999999', '3998895; 5555456; 3998899']
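If the IDs are not guaranteed to be exactly 7 digits long, a slightly looser pattern might be safer. The sketch below is a variation on the answer above, not the original pattern: it assumes the ID is any run of digits after the standalone word "id", and the names flexible and samples are hypothetical.

```python
import re

# \b stops words such as "grid" from matching; \d+ accepts any number of digits.
flexible = re.compile(r'\bid\s+(\d+)')

# Hypothetical sample strings, including a short ID and a decoy word.
samples = [
    'model lines : id 3999595(tower : new floor : id 42)',
    'grid 1234567 : id 5555456',
]

print(["; ".join(flexible.findall(item)) for item in samples])
# ['3999595; 42', '5555456']
```

With the fixed-width \d{7} pattern, a longer number would be silently cut to its first 7 digits, so which version fits depends on how strict the ID format really is.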