Question
I have a dataframe, one column is a URL, the other is a name. I'm simply trying to add a third column that takes the URL, and creates an HTML link.
The column newsSource has the link name, and url has the URL. For each row in the dataframe, I want to create a column that has:
<a href="[the url]">[newsSource name]</a>
Trying the line below throws an error:
df['sourceURL'] = df['url'].apply(lambda x: '<a href="{0}">{1}</a>'.format(x, x['source']))
The traceback is:
File "C:\Users\AwesomeMan\Documents\Python\MISC\News Alerts\simple_news.py", line 254, in <module>
    df['sourceURL'] = df['url'].apply(lambda x: '<a href="{0}">{1}</a>'.format(x, x[0]['newsSource']))
TypeError: string indices must be integers
But I've used x[colName] before? The line below works fine; it simply creates a column of the source's name:
df['newsSource'] = df['source'].apply(lambda x: x['name'])
Why suddenly ("suddenly" to me) is it saying I can't access the indices?
Answer 1:
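pd.Series.apply has access only to a single series, i.e. the series on which you are calling the method. In other words, the function you supply, irrespective of whether it is named or an anonymous lambda, will only have access to the values of df['url']. Each x passed to your lambda is therefore a single URL string, and x['source'] tries to index that string with another string, which raises the TypeError.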
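To see why the two calls behave differently, here is a minimal sketch. It assumes, as the question's working line suggests, that each cell of df['source'] holds a dict with a 'name' key, while each cell of df['url'] is a plain string. The sample values (including the 'id' key) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: 'source' holds dicts, 'url' holds plain strings.
df = pd.DataFrame({
    'source': [{'id': 'bbc-news', 'name': 'BBC'}],
    'url': ['http://www.bbc.o.uk'],
})

# Works: each x is a dict, so x['name'] is an ordinary key lookup.
df['newsSource'] = df['source'].apply(lambda x: x['name'])

# Fails: each x is a str (one URL), so x['source'] indexes a string
# with a string, which Python rejects with a TypeError.
error_message = None
try:
    df['url'].apply(lambda x: x['source'])
except TypeError as exc:
    error_message = str(exc)
    print(type(exc).__name__ + ':', error_message)
```

The same syntax x[...] means a dict lookup in the first call and string indexing in the second; only the type of the cell values differs.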
To access multiple series by row, you need pd.DataFrame.apply along axis=1:
def return_link(x):
    return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])

df['sourceURL'] = df.apply(return_link, axis=1)
Note there is an overhead associated with passing each row as a series in this way; pd.DataFrame.apply with axis=1 is just a thinly veiled, inefficient loop. You may find a list comprehension more efficient:
df['sourceURL'] = ['<a href="{0}">{1}</a>'.format(i, j)
                   for i, j in zip(df['url'], df['source'])]
Here's a working demo:
import pandas as pd

df = pd.DataFrame([['BBC', 'http://www.bbc.o.uk']],
                  columns=['source', 'url'])

def return_link(x):
    # x is one row of the dataframe, passed as a Series
    return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])

df['sourceURL'] = df.apply(return_link, axis=1)
print(df)

  source                  url                              sourceURL
0    BBC  http://www.bbc.o.uk  <a href="http://www.bbc.o.uk">BBC</a>
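The efficiency claim can be checked with a rough timing sketch. The frame here is synthetic (the row count and repeated values are arbitrary assumptions), and the helper names with_apply and with_comprehension are made up for this example. Timings vary by machine and pandas version, so treat the printed numbers as indicative only.

```python
import timeit

import pandas as pd

# Synthetic data; the size and values are arbitrary assumptions.
n = 10_000
df = pd.DataFrame({
    'source': ['BBC'] * n,
    'url': ['http://www.bbc.o.uk'] * n,
})

def return_link(row):
    return '<a href="{0}">{1}</a>'.format(row['url'], row['source'])

def with_apply():
    # Row-wise apply: every row is materialised as a Series first.
    return df.apply(return_link, axis=1).tolist()

def with_comprehension():
    # Plain Python loop over the two columns in lockstep.
    return ['<a href="{0}">{1}</a>'.format(i, j)
            for i, j in zip(df['url'], df['source'])]

# Both approaches build exactly the same strings.
assert with_apply() == with_comprehension()

# Total seconds for 3 runs of each approach.
print('apply:        ', timeit.timeit(with_apply, number=3))
print('comprehension:', timeit.timeit(with_comprehension, number=3))
```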
Answer 2:
With zip and old-school % string formatting:
df['sourceURL'] = ['<a href="%s">%s</a>' % (x, y) for x, y in zip(df['url'], df['source'])]
The same with an f-string (Python 3.6+):
df['sourceURL'] = [f'<a href="{x}">{y}</a>' for x, y in zip(df['url'], df['source'])]
Source: https://stackoverflow.com/questions/51564120/typeerror-string-indices-must-be-integers-using-pandas-apply-with-lambda