beautifulsoup

BS4 Searching by Class_ Returning Empty

Submitted by 吃可爱长大的小学妹 on 2021-02-11 18:16:12
Question: I am currently scraping the data I need by chaining bs4 .contents calls after a find_all('div'), but that seems inherently fragile. I'd like to go directly to the tag I need by class, but my class_= search returns None. I ran the following code on the HTML below, which returns None:
soup = BeautifulSoup(text)  # this works fine
tag = soup.find(class_="loan-section-content")  # this returns None
I also tried soup.find('div', class_="loan-section-content") - also
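
A minimal sketch of what usually resolves this: when find(class_=...) returns None even though the class shows up in the browser, the element is typically injected by JavaScript or lives inside an iframe, so it never reaches the parsed tree. The markup below is a stand-in, since the question's HTML is truncated; only the class name loan-section-content is taken from the question.

from bs4 import BeautifulSoup

# Stand-in markup; the real page's HTML is not shown in full in the question.
html = '<div class="loan-section"><div class="loan-section-content">$1,000</div></div>'

soup = BeautifulSoup(html, "html.parser")

# Either form returns the tag when the class is actually present in the
# markup that was parsed.
tag = soup.find("div", class_="loan-section-content")
tag_css = soup.select_one("div.loan-section-content")
print(tag.get_text(strip=True))   # -> $1,000

# If this still yields None on the real page, print(soup.prettify()) to see
# what was actually parsed; JavaScript-rendered content and iframe contents do
# not appear in the HTML handed to BeautifulSoup.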

Is it possible to just get the tags without a class or id with BeautifulSoup?

Submitted by 元气小坏坏 on 2021-02-11 18:16:01
Question: I have several thousand HTML pages and I am trying to filter the text from them. I am doing this with Beautiful Soup. get_text() gives me too much unnecessary information from these pages, so I wrote a loop:
l = []
for line in text5:
    soup = bs(line, 'html.parser')
    p_text = ' '.join(p.text for p in soup.find_all('p'))
    k = p_text.replace('\n', '')
    l.append(k)
But this loop gives me everything that was in a tag that starts with <p. For example: I want everything between two plain
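
One way to keep only the bare paragraphs is to filter on the tag's attributes inside the loop. A sketch, under the assumption that "plain" paragraphs are exactly those <p> tags carrying no class and no id:

from bs4 import BeautifulSoup

html = '<p>plain text</p><p class="ad">ad copy</p><p id="nav">navigation</p>'
soup = BeautifulSoup(html, "html.parser")

# Keep only <p> tags that have neither a class nor an id attribute.
plain = [
    p.get_text(strip=True)
    for p in soup.find_all("p")
    if not p.get("class") and not p.get("id")
]
print(plain)   # -> ['plain text']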

how to use selenium to go from one url tab to another before scraping?

Submitted by 夙愿已清 on 2021-02-11 17:01:34
Question: I have created the following code hoping to open a new tab with a few parameters and then scrape the data table that is on the new tab.
# Open Webpage
url = "https://www.website.com"
driver = webdriver.Chrome(executable_path=r"C:\mypathto\chromedriver.exe")
driver.get(url)
# Click Necessary Parameters
driver.find_element_by_partial_link_text('Output').click()
driver.find_element_by_xpath('//*[@id="flexOpt"]/table/tbody/tr/td[2]/input[3]').click()
driver.find_element_by_xpath('//*[@id=
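
The piece that is usually missing is driver.switch_to.window(): clicking a link that opens a new tab does not move the driver to that tab. A sketch, keeping the URL and link text as placeholders from the question and using the modern find_element(By, ...) call in place of the deprecated find_element_by_* helpers:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.website.com")          # placeholder URL from the question

original = driver.current_window_handle
driver.find_element(By.PARTIAL_LINK_TEXT, "Output").click()   # this opens the new tab

# Move the driver to whichever window handle is not the original one.
for handle in driver.window_handles:
    if handle != original:
        driver.switch_to.window(handle)
        break

# From here on, page_source and find_element refer to the new tab, so the
# data table can be handed to BeautifulSoup or pandas.read_html for scraping.
table_html = driver.page_source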

python requests not getting full page

Submitted by 巧了我就是萌 on 2021-02-11 16:52:08
Question: This is my code:
import requests
from bs4 import BeautifulSoup
import random
from selenium import webdriver

url = "http://www.yopmail.com/en/?smith"
request = requests.get(url)
soup = BeautifulSoup(request.text, 'html5lib')
print(soup)
It returns this output:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"><head> <meta content="text/html; charset=utf-8" http-equiv=

AttributeError: ResultSet object has no attribute 'get_text'. You're probably treating a list of elements like a single element

Submitted by 删除回忆录丶 on 2021-02-11 16:41:30
Question: I got the following list of lists from parsing with Bs4 through the snippet:
details = [i.find_all('span', {'class':re.compile('item')}) for i in cars]
[[<span class="item">Red <small>col.</small></span>, <span class="item">120 <small>cc.</small></span>, <span class="item">Available <small>in four days</small></span>, <span class="item"><small class="txt-highlight-red">15 min</small></span>], [<span class="item">Blue <small>col.</small></span>, <span class="item">200 <small>cc.</small></span>
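
find_all() returns a ResultSet, which behaves like a list, so get_text() has to be called on each tag inside it rather than on the list itself. A self-contained sketch; the markup and the `cars` containers are made up to mirror the spans shown in the question:

import re
from bs4 import BeautifulSoup

# Made-up markup mirroring the question's spans; the real car containers
# are not shown there.
html = (
    '<div class="car"><span class="item">Red <small>col.</small></span>'
    '<span class="item">120 <small>cc.</small></span></div>'
    '<div class="car"><span class="item">Blue <small>col.</small></span>'
    '<span class="item">200 <small>cc.</small></span></div>'
)
soup = BeautifulSoup(html, "html.parser")
cars = soup.find_all("div", class_="car")

details = [i.find_all("span", {"class": re.compile("item")}) for i in cars]

# Iterate over each inner ResultSet and call get_text() on the individual tags.
texts = [[span.get_text(" ", strip=True) for span in row] for row in details]
print(texts)   # -> [['Red col.', '120 cc.'], ['Blue col.', '200 cc.']]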

get financial data using Python

Submitted by 点点圈 on 2021-02-11 16:31:43
Question: I have managed to write some Python code with Selenium that navigates to a webpage containing financial data in some tables. I want to be able to extract the data and put it into Excel. The tables seem to be HTML-based; code below:
<tr> <td class="bc2T bc2gt">Last update</td> <td class="bc2V bc2D">03/15/2018</td><td class="bc2V bc2D">03/14/2019</td><td class="bc2V bc2D">03/12/2020</td><td class="bc2V bc2D" style="background-color:#DEFEFE;">05/22/2020</td><td class="bc2V bc2D"
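
Since the data sits in ordinary HTML <table> rows (the bc2* classes are just styling), one approach is to hand the Selenium-rendered page to pandas, which parses every table into a DataFrame and writes Excel directly. A sketch, assuming pandas plus an Excel writer such as openpyxl are installed; the URL and the table index are placeholders, not values from the question:

import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/financials")   # placeholder for the real page

# read_html parses every <table> in the rendered HTML into a DataFrame.
tables = pd.read_html(driver.page_source)
financials = tables[0]                         # pick the table you need by index
financials.to_excel("financials.xlsx", index=False)
driver.quit()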