问题
import urllib.request
import re
import csv
import pandas as pd
from bs4 import BeautifulSoup
columns = []
data = []
f = open('companylist.csv')
csv_f = csv.reader(f)
for row in csv_f:
stocklist = row
print(stocklist)
for s in stocklist:
print('http://finance.yahoo.com/q?s='+s)
optionsUrl = urllib.request.urlopen('http://finance.yahoo.com/q?s='+s).read()
soup = BeautifulSoup(optionsUrl, "html.parser")
stocksymbol = ['Symbol:', s]
optionsTable = [stocksymbol]+[
[x.text for x in y.parent.contents]
for y in soup.findAll('td', attrs={'class': 'yfnc_tabledata1','rtq_table': ''})
]
if not columns:
columns = [o[0] for o in optionsTable] #list(my_df.loc[0])
data.append(o[1] for o in optionsTable)
# create DataFrame from data
df = pd.DataFrame(data, columns=columns)
df.to_csv('test.csv', index=False)
The scripts works fine when I have about 200 to 300 stocks, but my company list has around 6000 symbols.
- Is there a way I can download chunks of data, say like 200 stocks at a time, pause for while, and then resume the download again?
- The export is one stock at a time; how do I write 200 at a time, and append the next batch to the initial batch (for the CSV)?
回答1:
As @Merlin has recommended you - take a closer look at pandas_datareader
module - you can do a LOT using this tool. Here is a small example:
import csv
import pandas_datareader.data as data
from pandas_datareader.yahoo.quotes import _yahoo_codes
stocklist = ['aapl','goog','fb','amzn','COP']
#http://www.jarloo.com/yahoo_finance/
#https://greenido.wordpress.com/2009/12/22/yahoo-finance-hidden-api/
_yahoo_codes.update({'Market Cap': 'j1'})
_yahoo_codes.update({'Div Yield': 'y'})
_yahoo_codes.update({'Bid': 'b'})
_yahoo_codes.update({'Ask': 'a'})
_yahoo_codes.update({'Prev Close': 'p'})
_yahoo_codes.update({'Open': 'o'})
_yahoo_codes.update({'1 yr Target Price': 't8'})
_yahoo_codes.update({'Earnings/Share': 'e'})
_yahoo_codes.update({"Day’s Range": 'm'})
_yahoo_codes.update({'52-week Range': 'w'})
_yahoo_codes.update({'Volume': 'v'})
_yahoo_codes.update({'Avg Daily Volume': 'a2'})
_yahoo_codes.update({'EPS Est Current Year': 'e7'})
_yahoo_codes.update({'EPS Est Next Quarter': 'e9'})
data.get_quote_yahoo(stocklist).to_csv('test.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)
Output: i've intentionally transposed the result set, because there are too many columns to show them here
In [2]: data.get_quote_yahoo(stocklist).transpose()
Out[2]:
aapl goog fb amzn COP
1 yr Target Price 124.93 924.83 142.87 800.92 51.23
52-week Range 89.47 - 132.97 515.18 - 789.87 72.000 - 121.080 422.6400 - 731.5000 31.0500 - 64.1300
Ask 97.61 718.75 114.58 716.73 44.04
Avg Daily Volume 3.81601e+07 1.75567e+06 2.56467e+07 3.94018e+06 8.94779e+06
Bid 97.6 718.57 114.57 716.65 44.03
Day’s Range 97.10 - 99.12 716.51 - 725.44 113.310 - 115.480 711.1600 - 721.9900 43.8000 - 44.9600
Div Yield 2.31 N/A N/A N/A 4.45
EPS Est Current Year 8.28 33.6 3.55 5.39 -2.26
EPS Est Next Quarter 1.66 8.38 0.87 0.96 -0.48
Earnings/Share 8.98 24.58 1.635 2.426 -4.979
Market Cap 534.65B 493.46B 327.71B 338.17B 54.53B
Open 98.6 716.51 115 713.37 43.96
PE 10.87 29.25 70.074 295.437 N/A
Prev Close 98.83 719.41 116.62 717.91 44.51
Volume 3.07086e+07 868366 2.70182e+07 2.42218e+06 5.20412e+06
change_pct -1.23% -0.09% -1.757% -0.1644% -1.0782%
last 97.61 718.75 114.571 716.73 44.0301
short_ratio 1.18 1.41 0.81 1.29 1.88
time 3:15pm 3:15pm 3:15pm 3:15pm 3:15pm
If you need more fields (codes for Yahoo Finance API) you may want to check the following links:
http://www.jarloo.com/yahoo_finance/
https://greenido.wordpress.com/2009/12/22/yahoo-finance-hidden-api/
回答2:
Use python_datareader
for this.
In [1]: import pandas_datareader.data as web
In [2]: import datetime
In [3]: start = datetime.datetime(2010, 1, 1)
In [4]: end = datetime.datetime(2013, 1, 27)
In [5]: f = web.DataReader("F", 'yahoo', start, end)
In [6]: f.ix['2010-01-04']
Out[6]:
Open 10.170000
High 10.280000
Low 10.050000
Close 10.280000
Volume 60855800.000000
Adj Close 9.151094
Name: 2010-01-04 00:00:00, dtype: float64
回答3:
To pause after every 200 downloads, you could - also when you use pandas_datareader
:
import time
for i, s in enumerate(stocklist):
if i % 200 == 0:
time.sleep(5) # in seconds
To save all data into a single file (IIUC):
stocks = pd.DataFrame() # to collect all results
In every iteration:
stocks = pd.concat([stocks, pd.DataFrame(data, columns=columns))
Finally:
stocks.to_csv(path, index=False)
来源:https://stackoverflow.com/questions/37794874/pause-url-request-downloads