Focusing in on specific results while scraping Twitter with Python and Beautiful Soup 4?

孤独总比滥情好 2021-01-14 17:30

This is a follow-up to my post Using Python to Scrape Nested Divs and Spans in Twitter?.

I'm not using the Twitter API because it doesn't look at the tweets by ha

2 Answers
  •  抹茶落季
    2021-01-14 18:22

    Alecxe already explained how to use the 'href' key to get the value.

    So I'm going to answer the other part of your question:

    Similarly, the retweets and favorites commands return large chunks of html, when all I really need is the numerical value that is displayed for each one.

    .contents returns a list of a tag's direct children. Since the 'buttons' elements you are finding have several children you are interested in, you can reach the value by indexing into the parsed contents lists:

    retweetcount = retweets[0].contents[3].contents[1].contents[1].string
    

    This returns 4, the displayed retweet count, as a string.
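To see why the indices in a .contents chain can look arbitrary, here is a minimal sketch. The markup is hypothetical (it is not Twitter's real HTML), and the class names `Icon`, `count`, and `inner` are made up for illustration. It shows that whitespace between tags counts as a child too, which is why the tags sit at odd positions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on an action button; not Twitter's real HTML.
html = """<div class="retweet"><button>
<span class="Icon"></span>
<span class="count"><span class="inner">4</span></span>
</button></div>"""

soup = BeautifulSoup(html, "html.parser")
button = soup.find("button")

# .contents lists every direct child, including whitespace-only text nodes.
children = button.contents
kinds = [type(child).__name__ for child in children]
print(kinds)

# The newline strings occupy the even positions, so the tags are at 1 and 3.
count_text = button.contents[3].contents[0].string
print(count_text)
```

Because any change in whitespace or markup shifts these positions, index chains like this tend to be fragile.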

    If you want a rather more readable approach, try this:

    retweetcount = retweets[0].find_all('span', class_='ProfileTweet-actionCountForPresentation')[0].string
    
    favcount = favorites[0].find_all('span', {'class': 'ProfileTweet-actionCountForPresentation'})[0].string
    

    This returns 4 and 2 respectively. It works because indexing the ResultSet returned by find_all() with [0] gives a single Tag element, and calling find_all() on that Tag searches all of its descendants recursively.

    Now you can loop across each tweet and extract this information rather easily.
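A rough sketch of such a loop follows. Only the class name ProfileTweet-actionCountForPresentation comes from the answer above. The sample HTML, the `tweet`, `retweet`, and `favorite` class names, and the `read_count` helper are all hypothetical, and treating a missing or empty count as 0 is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; the container class names are made up for illustration.
html = """
<div class="tweet">
  <div class="retweet"><span class="ProfileTweet-actionCountForPresentation">4</span></div>
  <div class="favorite"><span class="ProfileTweet-actionCountForPresentation">2</span></div>
</div>
<div class="tweet">
  <div class="retweet"><span class="ProfileTweet-actionCountForPresentation">10</span></div>
  <div class="favorite"><span class="ProfileTweet-actionCountForPresentation"></span></div>
</div>
"""

COUNT_CLASS = "ProfileTweet-actionCountForPresentation"


def read_count(container):
    """Return the displayed count as an int; assume 0 if it is missing or empty."""
    if container is None:
        return 0
    span = container.find("span", class_=COUNT_CLASS)
    text = span.get_text(strip=True) if span is not None else ""
    return int(text) if text.isdigit() else 0


soup = BeautifulSoup(html, "html.parser")
results = []
for tweet in soup.find_all("div", class_="tweet"):
    # Search within each tweet only, so counts from different tweets never mix.
    results.append({
        "retweets": read_count(tweet.find("div", class_="retweet")),
        "favorites": read_count(tweet.find("div", class_="favorite")),
    })

print(results)
```

Using find() instead of find_all(...)[0] avoids an IndexError when a tweet has no matching element, since find() returns None in that case. Abbreviated counts such as "1.2K" would need extra parsing that this sketch does not attempt.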
