TypeError: string indices must be integers using pandas apply with lambda

旧巷少年郎 2021-01-13 23:55

I have a dataframe; one column is a URL, the other is a name. I'm simply trying to add a third column that takes the URL and creates an HTML link.

The column

2 answers
  • 2021-01-14 00:22

    With zip and old-school % string formatting:

    df['sourceURL'] = ['<a href="%s">%s</a>' % (x, y) for x, y in zip(df['url'], df['source'])]
    

    The same thing with an f-string:

    df['sourceURL'] = [f'<a href="{x}">{y}</a>' for x, y in zip(df['url'], df['source'])]
    
  • 2021-01-14 00:36

    pd.Series.apply has access only to a single series, i.e. the series on which you are calling the method. In other words, the function you supply, irrespective of whether it is named or an anonymous lambda, will only have access to df['source'].
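
    The question's own code is cut off above, so the following is only a guess at what triggers the error: a minimal sketch that calls apply on the single column df['source'] with a function that indexes by column label. The one-row frame is hypothetical and simply reuses the column names from this thread.

```python
import pandas as pd

# Hypothetical one-row frame, reusing the column names from this thread.
df = pd.DataFrame({'source': ['BBC'], 'url': ['http://www.bbc.o.uk']})

def return_link(x):
    return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])

# Series.apply hands the function one scalar at a time (here the str 'BBC'),
# so x['url'] tries to index a string with a string and raises TypeError.
try:
    df['source'].apply(return_link)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError

# DataFrame.apply with axis=1 passes each row as a Series, so labels work.
links = df.apply(return_link, axis=1)
print(links.iloc[0])  # <a href="http://www.bbc.o.uk">BBC</a>
```

    The exact wording of the TypeError message varies between Python versions, which is why the sketch prints only the exception type.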

    To access multiple series by row, you need pd.DataFrame.apply along axis=1:

    def return_link(x):
        return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])
    
    df['sourceURL'] = df.apply(return_link, axis=1)
    

    Note there is an overhead associated with passing an entire series in this way; pd.DataFrame.apply is just a thinly veiled, inefficient loop.

    You may find a list comprehension more efficient:

    df['sourceURL'] = ['<a href="{0}">{1}</a>'.format(i, j) \
                       for i, j in zip(df['url'], df['source'])]
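
    To check the efficiency claim on your own machine, a rough timing sketch along these lines may help. The row count, the repeat count and the helper names with_apply and with_comprehension are arbitrary choices made for illustration, and the timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical data: 2,000 identical rows, sized only for a quick comparison.
n_rows = 2_000
df = pd.DataFrame({'source': ['BBC'] * n_rows,
                   'url': ['http://www.bbc.o.uk'] * n_rows})

def with_apply():
    # Row-wise apply: each row is materialised as a Series before formatting.
    return df.apply(
        lambda x: '<a href="{0}">{1}</a>'.format(x['url'], x['source']),
        axis=1)

def with_comprehension():
    # Plain Python loop over the two columns, paired up by zip.
    return ['<a href="{0}">{1}</a>'.format(i, j)
            for i, j in zip(df['url'], df['source'])]

# Both approaches should build exactly the same strings.
assert with_apply().tolist() == with_comprehension()

print('apply:        ', timeit.timeit(with_apply, number=3))
print('comprehension:', timeit.timeit(with_comprehension, number=3))
```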
    

    Here's a working demo:

    import pandas as pd

    df = pd.DataFrame([['BBC', 'http://www.bbc.o.uk']],
                      columns=['source', 'url'])
    
    def return_link(x):
        return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])
    
    df['sourceURL'] = df.apply(return_link, axis=1)
    
    print(df)
    
      source                  url                              sourceURL
    0    BBC  http://www.bbc.o.uk  <a href="http://www.bbc.o.uk">BBC</a>
    