Scraping Twitter data with Beautiful Soup: I can get the data but can't save it to a CSV file

Submitted by 天大地大妈咪最大 on 2020-01-23 17:31:06

Question


I scraped Twitter for user names, tweets, replies, and retweets, but I can't save the results to a CSV file.

Here is the code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

file = "5_twitterBBC.csv"
f = open(file, "w")
Headers = "tweet_user, tweet_text,  replies,  retweets\n"
f.write(Headers)
for page in range(0,5):
    url = "https://twitter.com/BBCWorld".format(page)
    html = urlopen(url)
    soup = BeautifulSoup(html,"html.parser")
    tweets = soup.find_all("div", {"class":"js-stream-item"})
    for tweet in tweets:
        try:
            if tweet.find('p',{"class":'tweet-text'}):
             tweet_user = tweet.find('span',{"class":'username'}).text.strip()
             tweet_text = tweet.find('p',{"class":'tweet-text'}).text.encode('utf8').strip()
             replies = tweet.find('span',{"class":"ProfileTweet-actionCount"}).text.strip()
             retweets = tweet.find('span', {"class" : "ProfileTweet-action--retweet"}).text.strip()
             print(tweet_user, tweet_text,  replies,  retweets)
             f.write("{}".format(tweet_user).replace(",","|")+ ",{}".format(tweet_text)+ ",{}".format( replies).replace(",", " ")+ ",{}".format(retweets) +  "\n")
        except: AttributeError
f.close()

I get the data but can't save it to a CSV file. Can someone explain how to save the data to a CSV file?


Answer 1:


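Your code is nearly right; the problem is in the line that finds the tweets, tweets = soup.find_all("div", {"class":"js-stream-item"}). The stream items on the page are li elements rather than div elements, so this search matches nothing and the loop never writes a row. It should look like this: tweets = soup.find_all("li", attrs={"class":"js-stream-item"}). Naming the attrs argument is optional, because find_all also accepts the dictionary as its second positional argument.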
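A quick way to check how find_all treats the tag name and the attribute dictionary is to run it against a small snippet. The sketch below uses made-up markup that only imitates the class names in the question (it is not Twitter's real HTML) and compares a positional dictionary, the attrs= keyword, and a search on the wrong tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names are borrowed from the question
html = """
<ol>
  <li class="js-stream-item stream-item"><p class="tweet-text">first</p></li>
  <li class="js-stream-item stream-item"><p class="tweet-text">second</p></li>
</ol>
"""
soup = BeautifulSoup(html, "html.parser")

# The dictionary can be passed positionally or through the attrs keyword
positional = soup.find_all("li", {"class": "js-stream-item"})
keyword = soup.find_all("li", attrs={"class": "js-stream-item"})

# Searching on a tag that does not carry the class returns an empty result
wrong_tag = soup.find_all("div", attrs={"class": "js-stream-item"})

print(len(positional), len(keyword), len(wrong_tag))
```

In this snippet the two li searches return the same elements, while the div search returns nothing, which would leave a loop over the results with no rows to write.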

This is a working solution, but it only fetches the first 20 tweets:

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

file = "5_twitterBBC.csv"
url = "https://twitter.com/BBCWorld"
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")

# Get the tweets: each one is an <li> stream item
tweets = soup.find_all("li", attrs={"class":"js-stream-item"})

# Write the fetched tweets to the file; csv.writer quotes fields that contain commas
with open(file, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["tweet_user", "tweet_text", "replies", "retweets"])
    for tweet in tweets:
        try:
            if tweet.find('p',{"class":'tweet-text'}):
                tweet_user = tweet.find('span',{"class":'username'}).text.strip()
                # Keep the text as str; encoding it to bytes would write b'...' into the file
                tweet_text = tweet.find('p',{"class":'tweet-text'}).text.strip()
                replies = tweet.find('span',{"class":"ProfileTweet-actionCount"}).text.strip()
                retweets = tweet.find('span', {"class" : "ProfileTweet-action--retweet"}).text.strip()
                writer.writerow([tweet_user, tweet_text, replies, retweets])
        except AttributeError:
            # Skip stream items that lack one of the expected elements
            continue
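Tweet text often contains commas, quotation marks, and line breaks, and any of these can split a row that was assembled by hand with string formatting. Here is a minimal sketch of how Python's standard csv module quotes such fields. The rows are made-up sample values rather than scraped data, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Made-up rows standing in for scraped values (not real Twitter data)
rows = [
    ["@BBCWorld", "Markets fall, then recover", "12", "34"],
    ["@BBCWorld", 'PM says "no deal"\nmore follows', "5", "6"],
]

# newline="" leaves line endings to the csv module, as with open(..., newline="")
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["tweet_user", "tweet_text", "replies", "retweets"])
writer.writerows(rows)

# Read the text back to confirm every field survived intact
buffer.seek(0)
parsed = list(csv.reader(buffer))
print(parsed[1])
```

The same writer calls work on a real file opened with open(file, "w", newline="", encoding="utf-8").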



Answer 2:


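filename = "output.csv"
f = open(filename, "w", encoding="utf-8")
headers = "tweet_user,tweet_text,replies,retweets\n"
f.write(headers)

# ... your code that fetches the page and finds the tweets ...

for tweet in tweets:  # your existing loop
    # ... your code that sets tweet_user, tweet_text, replies, retweets as strings ...
    # Join the four string fields with commas and end the row with a newline
    f.write(",".join([tweet_user, tweet_text, replies, retweets]) + "\n")
f.close()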


Source: https://stackoverflow.com/questions/52103888/by-beautiful-soup-i-scrape-twitter-data-i-am-able-to-get-data-but-cant-save-in
