Question
I am trying to retrieve the list of available option expiries for a given ticker on Yahoo Finance, for instance with SPY as the ticker on https://finance.yahoo.com/quote/SPY/options
The expiries are listed in this drop-down list:
<div class="Fl(start) Pend(18px) option-contract-control drop-down-selector" data-reactid="4">
<select class="Fz(s)" data-reactid="5">
<option selected="" value="1576627200" data-reactid="6">December 18, 2019</option>
<option value="1576800000" data-reactid="7">December 20, 2019</option>
<option value="1577059200" data-reactid="8">December 23, 2019</option>
...
</select>
</div>
Using the div class name (or the select class name, but there seem to be several of these on the page), I get the values as a single string of concatenated expiries.
My function (I pass on ticker='SPY' from the main function):
from bs4 import BeautifulSoup
from selenium import webdriver

def get_list_expiries(ticker):
    browser = webdriver.Chrome()
    options_url = "https://finance.yahoo.com/quote/" + str(ticker) + "/options"
    browser.get(options_url)
    html_source = browser.page_source
    soup = BeautifulSoup(html_source, 'html.parser')
    expiries_dt = []
    for exp in soup.find_all(class_="Fl(start) Pend(18px) option-contract-control drop-down-selector"):
        expiries_dt.append(exp.text)
    browser.quit()
    return expiries_dt
This produces:
['December 18, 2019December 20, 2019December 23, 2019December 24, 2019December 27, 2019December 30, 2019...']
I understand I need to use Selenium for this but I can't figure out how. The result is always a list containing a single string. Ideally I would like to return two lists: one with the Unix timestamps (option value="1576627200") and another with the 'normal' dates (i.e. 18/12/2019).
Any help will be greatly appreciated.
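Answer 1:
To extract the Unix timestamps and expiry dates, wait for the select element to become clickable with WebDriverWait, wrap it in Select, and read the value attribute and innerHTML of each option. You can use the following locator strategy: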
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://finance.yahoo.com/quote/SPY/options')
select = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.option-contract-control.drop-down-selector>select"))))
print("Unix datestamp: ")
print([option.get_attribute("value") for option in select.options])
print("Dates: ")
print([option.get_attribute("innerHTML") for option in select.options])
Console Output:
Unix datestamp:
['1576627200', '1576800000', '1577059200', '1577145600', '1577404800', '1577664000', '1577750400', '1578009600', '1578268800', '1578441600', '1578614400', '1578873600', '1579046400', '1579219200', '1579564800', '1579824000', '1580428800', '1582243200', '1584662400', '1585612800', '1587081600', '1589500800', '1592524800', '1593475200', '1594944000', '1600387200', '1601424000', '1602806400', '1605830400', '1606780800', '1608249600', '1610668800', '1616112000', '1623974400', '1631836800', '1639699200', '1642723200']
Dates:
['December 18, 2019', 'December 20, 2019', 'December 23, 2019', 'December 24, 2019', 'December 27, 2019', 'December 30, 2019', 'December 31, 2019', 'January 3, 2020', 'January 6, 2020', 'January 8, 2020', 'January 10, 2020', 'January 13, 2020', 'January 15, 2020', 'January 17, 2020', 'January 21, 2020', 'January 24, 2020', 'January 31, 2020', 'February 21, 2020', 'March 20, 2020', 'March 31, 2020', 'April 17, 2020', 'May 15, 2020', 'June 19, 2020', 'June 30, 2020', 'July 17, 2020', 'September 18, 2020', 'September 30, 2020', 'October 16, 2020', 'November 20, 2020', 'December 1, 2020', 'December 18, 2020', 'January 15, 2021', 'March 19, 2021', 'June 18, 2021', 'September 17, 2021', 'December 17, 2021', 'January 21, 2022']
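Since the question asks for a function that returns two lists, the snippet above could be wrapped roughly as follows. This is only a sketch: split_options is a hypothetical helper name, the CSS selector is copied from this answer and may no longer match the live page, and webdriver.Chrome() is assumed to find a driver on its own, as in the question's code.

```python
def split_options(option_elements):
    """Return (timestamps, labels) from an iterable of <option> elements.

    Each element only needs a Selenium-style get_attribute(name) method.
    """
    timestamps, labels = [], []
    for opt in option_elements:
        timestamps.append(int(opt.get_attribute("value")))
        labels.append(opt.get_attribute("innerHTML").strip())
    return timestamps, labels


def get_list_expiries(ticker):
    """Open the options page for `ticker` and return (timestamps, labels)."""
    # Imported here so split_options can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select, WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(f"https://finance.yahoo.com/quote/{ticker}/options")
        # Selector taken from the answer above; it may not match the current page.
        dropdown = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable(
                (By.CSS_SELECTOR, "div.option-contract-control.drop-down-selector>select")
            )
        )
        return split_options(Select(dropdown).options)
    finally:
        # Close the browser even if the wait times out.
        driver.quit()
```

Calling get_list_expiries('SPY') would then give one list of integer timestamps and one list of date labels, in the same order as the drop-down.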
Answer 2:
Try SimplifiedDoc, a library for extracting data from HTML:
from simplified_scrapy.simplified_doc import SimplifiedDoc
html='''<div class="Fl(start) Pend(18px) option-contract-control drop-down-selector" data-reactid="4">
<select class="Fz(s)" data-reactid="5">
<option selected="" value="1576627200" data-reactid="6">December 18, 2019</option>
<option value="1576800000" data-reactid="7">December 20, 2019</option>
<option value="1577059200" data-reactid="8">December 23, 2019</option>
...
</select>
</div>
'''
doc = SimplifiedDoc(html)
div = doc.getElementByClass('Fl(start) Pend(18px) option-contract-control drop-down-selector')
options = div.options # get all options
expiries_dt = [option.html for option in options]
print (expiries_dt) # ['December 18, 2019', 'December 20, 2019', 'December 23, 2019']
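If an extra dependency is not wanted, the same extraction can be sketched with the standard library's html.parser, collecting both the value attributes and the labels. OptionCollector is a hypothetical class name, and the HTML is the question's snippet truncated to three options. Note that this simple version collects every option element in whatever it is fed, so on a full page it would need to be restricted to the expiry drop-down.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect the value attribute and text of every <option> element."""

    def __init__(self):
        super().__init__()
        self.values = []
        self.labels = []
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self.values.append(dict(attrs).get("value"))
            self.labels.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current label.
        if self._in_option:
            self.labels[-1] += data

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False


html = '''<div class="Fl(start) Pend(18px) option-contract-control drop-down-selector" data-reactid="4">
<select class="Fz(s)" data-reactid="5">
<option selected="" value="1576627200" data-reactid="6">December 18, 2019</option>
<option value="1576800000" data-reactid="7">December 20, 2019</option>
<option value="1577059200" data-reactid="8">December 23, 2019</option>
</select>
</div>
'''

collector = OptionCollector()
collector.feed(html)
collector.close()
print(collector.values)  # ['1576627200', '1576800000', '1577059200']
print(collector.labels)  # ['December 18, 2019', 'December 20, 2019', 'December 23, 2019']
```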
Answer 3:
You don't need Selenium for this part (and to be honest, for most Yahoo Finance data it is overkill). You can pull the timestamps out of the response text with a regex, convert the matched string representation of the list into an actual list with ast, and use the datetime module to produce the required date format.
import requests, re, ast
from datetime import datetime
r = requests.get('https://finance.yahoo.com/quote/SPY/options?guccounter=1')
p = re.compile(r'"expirationDates":(\[.*?\])')
timestamps = ast.literal_eval(p.findall(r.text)[0])
dates = [datetime.utcfromtimestamp(ts).strftime("%B %d, %Y") for ts in timestamps]
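The same idea can be tried offline on a sample string. The sketch below makes a few assumptions: parse_expiries is a hypothetical helper, the sample text is an invented fragment standing in for r.text (the real response may be laid out differently), json.loads is swapped in for ast.literal_eval because the captured text is a JSON array, and the dates use the dd/mm/yyyy format from the question. It also uses a timezone-aware conversion, since datetime.utcfromtimestamp is deprecated as of Python 3.12.

```python
import json
import re
from datetime import datetime, timezone

# Same pattern as in the answer: capture the first bracketed list after the key.
EXPIRY_PATTERN = re.compile(r'"expirationDates":(\[.*?\])')


def parse_expiries(page_text):
    """Return (timestamps, dates) for the first expirationDates array found."""
    match = EXPIRY_PATTERN.search(page_text)
    if match is None:
        return [], []
    timestamps = json.loads(match.group(1))
    dates = [
        datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%d/%m/%Y")
        for ts in timestamps
    ]
    return timestamps, dates


# Hypothetical fragment standing in for r.text; the real page layout may differ.
sample = '{"underlyingSymbol":"SPY","expirationDates":[1576627200,1576800000,1577059200],"strikes":[]}'
timestamps, dates = parse_expiries(sample)
print(timestamps)  # [1576627200, 1576800000, 1577059200]
print(dates)       # ['18/12/2019', '20/12/2019', '23/12/2019']
```

Returning two empty lists when the key is missing avoids the IndexError that p.findall(r.text)[0] would raise if the page no longer embeds that key.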
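Regex explanation: the pattern matches the literal text "expirationDates": and captures the shortest bracketed run that follows it, from [ to the first ], which is the list of timestamps.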
Datetime conversions:
- See the discussion by @jfs, which is where I originally saw utcfromtimestamp
- strftime
Source: https://stackoverflow.com/questions/59401010/how-to-retrieve-the-list-of-values-from-a-drop-down-list