Question
I have written my first Selenium script to practise web scraping in Python. The idea is to scrape all workbooks, views and favourites from a Tableau Public profile. I managed to extract those three variables, but I don't know how to assign favourites to their respective workbooks, since not every workbook has at least one favourite.
For example, "Skyler on Broadway" has no favourites, but if I matched workbooks and favourites in a dictionary, it would pull in the next available value, namely 4.
The filter f.text != "" only removes empty values at the end of the list.
What's the best way to approach this problem?
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome(executable_path=r',mypath')
driver.get("https://public.tableau.com/profile/skybjohnson#!/")

#load entire website:
while True:
    try:
        show_more = WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.ID, "load-more-vizzes")))
        driver.find_element_by_id("load-more-vizzes")
        driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
        WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.ID, "load-more-vizzes")))
    except Exception as e:
        print(e)
        break

#get workbook titles
titles = driver.find_elements_by_class_name("workbook-title")
workbook_titles = [i.text for i in titles if i.text != ""]
print(workbook_titles)

#get number of views per workbook
views = driver.find_elements_by_class_name('workbook-view-count')
workbook_views = [int(v.text.split()[0]) for v in views if v.text != ""]
print(workbook_views)

#get number of favourites per workbook
favs = driver.find_elements_by_xpath('//SPAN[@ng-bind="controller.workbook.numberOfFavorites"]')
workbook_favs = [f.text for f in favs if f.text != ""]
print(workbook_favs)
Answer 1:
First get all the vizzes, then read the title, views and favorites from the children of each one. That way every value stays tied to its own workbook. You also have to check whether the view count and favorites elements exist. The code below uses an improved scroll and reads the view count (0 if there are no views) and the favorites (0 if there are no favorites) per workbook:
# Reuses the imports and the driver created in the question's script.
wait = WebDriverWait(driver, 10)
with driver:
    driver.get("https://public.tableau.com/profile/skybjohnson#!/")
    wait.until(EC.presence_of_element_located((By.ID, "load-more-vizzes")))
    # Keep scrolling the "load more" element into view until it is no longer displayed.
    while driver.find_element_by_id("load-more-vizzes").is_displayed():
        driver.execute_script("arguments[0].scrollIntoView()", driver.find_element_by_id("load-more-vizzes"))

    vizzes = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".viz-container li.media-viz")))
    for viz in vizzes:
        if not viz.is_displayed():
            continue
        # Search inside this viz only, so the values cannot shift between workbooks.
        title = viz.find_element_by_css_selector('[ng-bind="controller.workbook.title"]').text
        # find_elements returns an empty list when nothing matches, so fall back to 0.
        views_count_list = viz.find_elements_by_css_selector('[ng-bind="controller.workbook.viewCount"]')
        views_count = views_count_list[0].text if len(views_count_list) > 0 else 0
        number_of_favorites_list = viz.find_elements_by_css_selector('[ng-bind="controller.workbook.numberOfFavorites"]')
        number_of_favorites = number_of_favorites_list[0].text if len(number_of_favorites_list) > 0 else 0
        print(title, views_count, number_of_favorites)
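
Since the question asks for a dictionary, the "first match or 0" pattern from the loop above could be wrapped in small helpers and the results stored per title. This is only a sketch: FakeElement, first_text_or_default, to_count and build_record are hypothetical names that are not part of Selenium, FakeElement merely stands in for a WebElement so the logic runs without a browser, and the workbook names and numbers are made up. It also assumes the count text starts with an integer, such as "4" or "120 views".

```python
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modelled."""
    text: str


def first_text_or_default(elements, default="0"):
    """Return the stripped text of the first element, or default if the list is empty or the text is blank."""
    if elements and elements[0].text.strip():
        return elements[0].text.strip()
    return default


def to_count(text):
    """Parse the leading integer of strings such as '4' or '120 views' (assumed format)."""
    return int(text.split()[0].replace(",", ""))


def build_record(view_elements, fav_elements):
    """Build one per-workbook record from the lists returned by find_elements-style lookups."""
    return {
        "views": to_count(first_text_or_default(view_elements)),
        "favourites": to_count(first_text_or_default(fav_elements)),
    }


# Made-up inputs: one workbook with a favourites element, one without.
stats = {
    "Workbook A": build_record([FakeElement("120 views")], [FakeElement("4")]),
    "Workbook B": build_record([FakeElement("35")], []),
}
print(stats)
```

In the real loop, the two lists passed to build_record would be the results of the viz.find_elements_by_css_selector(...) calls, and the dictionary key would be the title read from the same viz, so a workbook without favourites gets 0 instead of borrowing the next workbook's value.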
Source: https://stackoverflow.com/questions/59831517/python-selenium-webscraping-of-tableau-public-how-to-assign-favourites-to-workb