I have a string like the one below:
\"‘‘Apple’’ It is create by Steve Jobs (He was fired and get hired) ‘‘Microsoft’’ Bill Gates was the richest man in the world ‘‘Oracle
One option is to use re.findall with the following pattern:
‘‘(.*?)’’ (.*?)(?= ‘‘|$)
This captures the company name and the description in separate groups for each match found in the input. The lookahead (?= ‘‘|$) marks the end of the current description, which is either the start of the next entry or the end of the input. Both groups are lazy (.*?), so each one stops at the earliest position where the rest of the pattern can match.
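As a small illustration of why the lookahead is needed, the sketch below runs the pattern with and without it on a shortened, made-up input. The names text, without_lookahead and with_lookahead are only for this illustration and are not part of the question.

```python
import re

# Shortened, made-up input in the same format as the question's string.
text = "‘‘A’’ first one ‘‘B’’ second one"

# Without the lookahead, the lazy second group has nothing forcing it to
# grow, so it matches the empty string.
without_lookahead = re.findall(r'‘‘(.*?)’’ (.*?)', text)

# With the lookahead, the lazy group must expand until either " ‘‘"
# or the end of the input follows.
with_lookahead = re.findall(r'‘‘(.*?)’’ (.*?)(?= ‘‘|$)', text)

print(without_lookahead)  # [('A', ''), ('B', '')]
print(with_lookahead)     # [('A', 'first one'), ('B', 'second one')]
```

The lookahead does not consume the next ‘‘, so the following match can still start at it. The full example for the original string follows.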
import re

inp = "‘‘Apple’’ It is create by Steve Jobs (He was fired and get hired) ‘‘Microsoft’’ Bill Gates was the richest man in the world ‘‘Oracle’’ It is a database company"
# Each match is a (company, description) tuple, one per entry.
matches = re.findall(r'‘‘(.*?)’’ (.*?)(?= ‘‘|$)', inp)
companyList = [row[0] for row in matches]      # first group: company names
descriptionList = [row[1] for row in matches]  # second group: descriptions
print(companyList)
print(descriptionList)
This prints:
['Apple', 'Microsoft', 'Oracle']
['It is create by Steve Jobs (He was fired and get hired)',
'Bill Gates was the richest man in the world', 'It is a database company']
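If looking up a description by company name is more convenient than two parallel lists, the same matches can be turned into a dictionary. This is only a sketch: it assumes company names are unique (a repeated name would overwrite the earlier entry), and the names pattern and descriptions_by_company are introduced here for illustration.

```python
import re

inp = "‘‘Apple’’ It is create by Steve Jobs (He was fired and get hired) ‘‘Microsoft’’ Bill Gates was the richest man in the world ‘‘Oracle’’ It is a database company"

# Same pattern as above, compiled once so it can be reused.
pattern = re.compile(r'‘‘(.*?)’’ (.*?)(?= ‘‘|$)')

# findall returns (company, description) tuples, which dict() accepts directly.
descriptions_by_company = dict(pattern.findall(inp))

print(descriptions_by_company['Microsoft'])
# Bill Gates was the richest man in the world
```

Dictionaries preserve insertion order in Python 3.7 and later, so iterating over the result still yields the companies in the order they appear in the string.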