Pytrends trend results not similar with manually downloaded data

痴心易碎 submitted on 2020-07-08 12:54:38

Question


I use pytrends to automatically download data as CSV from Google Trends. The code I used is below. In this case, I am downloading monthly Google Trends data from 2008 to the present.

from pytrends.request import TrendReq
from urllib.parse import unquote
from dateutil.relativedelta import relativedelta
import datetime
import pytrends

google_username = "xxxxx@gmail.com"
google_password = "xxxxx"

# Decode the URL-encoded topic ID ('/m/07gyp7')
search_term = unquote('%2Fm%2F07gyp7')
# Log in and create a connection object
google_trend = TrendReq(google_username, google_password, custom_useragent='Pytrends')
google_trend_payload = {'gprop': 'news', 'q': search_term}
# trend() is an instance method, so call it on google_trend, not on the TrendReq class
trendresult = google_trend.trend(google_trend_payload, return_type='dataframe')
print(trendresult)

The results from the Google website for the first 5 months, compared with the results from pytrends:

Date          Pytrends data          Manual csv data
2008-01       21.0                   28.0
2008-02       16.0                   19.0
2008-03       16.0                   21.0
2008-04       15.0                   18.0
2008-05       22.0                   31.0

Does anyone know the reason? Thank you.


Answer 1:


I had the same issue, so I had to download the data manually during my project. I now know the reason: it is Google's sampling method. Each day Google returns a different trend series. Imagine Google has 10 million servers; each day, for each query, it samples only maybe 10k of them. So, to get a consistent series, you can download it 30 (or even 50) times and take the average. For series whose values are not too small (maybe a minimum above 30), the standard deviation is around 5%, which is acceptable.
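A minimal sketch of the averaging step, assuming the repeated downloads have already been collected as plain lists of numbers. The function name summarize_samples and the sample values are made up for illustration; nothing here calls pytrends or Google.

```python
from statistics import mean, stdev


def summarize_samples(samples):
    """Average repeated downloads of the same series, period by period.

    `samples` is a list of equal-length lists, one list per download.
    Returns (means, rel_stdevs), where each relative standard deviation
    is stdev / mean for that period (0.0 when the mean is 0).
    """
    if len(samples) < 2:
        raise ValueError("need at least two downloads to estimate the spread")
    if len({len(s) for s in samples}) != 1:
        raise ValueError("all downloads must cover the same periods")

    means, rel_stdevs = [], []
    # zip(*samples) groups the values that belong to the same period
    for values in zip(*samples):
        m = mean(values)
        means.append(m)
        rel_stdevs.append(stdev(values) / m if m else 0.0)
    return means, rel_stdevs


# Hypothetical numbers standing in for three downloads of a 3-month window
downloads = [
    [40, 50, 60],
    [42, 48, 63],
    [38, 52, 57],
]
means, rel_stdevs = summarize_samples(downloads)
print(means)
print([round(r, 3) for r in rel_stdevs])
```

With 30 or 50 real downloads in place of the three made-up ones, the relative standard deviation per period gives a rough check on whether the averaged series is stable enough to use.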

The difference between the manual download and the gtrend download may be because they do not extract the data the same way. gtrend downloads a URL of the type https://www.google.com/trends/fetchContent.... I do not know how the manual download is processed, but I do know there is another way to extract data, such as https://www.google.com/trends/trendsReport.. . The latter returns weekly series for everything (pretty rich).

At the moment, there also seems to be a quota limit problem.



Source: https://stackoverflow.com/questions/39652907/pytrends-trend-results-not-similar-with-manually-downloaded-data
