web-scraping

Web scraping an “onclick” object table on a website with python

随声附和 submitted on 2021-02-15 07:44:51
Question: I am trying to scrape the data from this link: page. If you click the up arrow, you will notice the highlighted days in the month sections. Clicking a highlighted day brings up a table of the tenders initiated that day. All I need is the data in each table for each highlighted day in the calendar. There may be one or more tenders (up to a maximum of 7) per day. Table appears on click. I have done some web scraping with bs4, but I think this is a job for selenium (please,

Why is my Jsoup Code not Returning the Correct Elements?

家住魔仙堡 submitted on 2021-02-13 05:44:30
Question: I am working on an app in Android Studio and am having some trouble web scraping with JSoup. I have successfully connected to the webpage and returned some basic elements to test the library, but now I cannot get the elements I actually need for my app. I am trying to get a number of elements with the "data-at" attribute. The weird thing is that a few elements with the "data-at" attribute are returned, but not the ones I am looking for. For whatever reason my code is not extracting all of the

Scrape a table with rvest in R that has mismatched table headings

百般思念 submitted on 2021-02-11 18:24:35
Question: I'm trying to scrape this table, which seems like it should be super simple. Here's the URL of the table: https://fantasy.nfl.com/research/scoringleaders?position=1&sort=pts&statCategory=stats&statSeason=2019&statType=weekStats&statWeek=1 Here's what I coded: url <- "https://fantasy.nfl.com/research/scoringleaders?position=1&sort=pts&statCategory=stats&statSeason=2019&statType=weekStats&statWeek=1" x = data.frame(read_html(url) %>% html_nodes("table") %>% html_table()) This works OK but gives

BS4 Searching by Class_ Returning Empty

吃可爱长大的小学妹 submitted on 2021-02-11 18:16:12
Question: I am currently scraping the data I need successfully by chaining bs4 .contents together following a find_all('div'), but that seems inherently fragile. I'd like to go directly to the tag I need by class, but my "class_=" search is returning None. I ran the following code on the HTML below, which returns None: soup = BeautifulSoup(text) # this works fine tag = soup.find(class_ = "loan-section-content") # this returns None I also tried soup.find('div', class_ = "loan-section-content") - also

BS4 Searching by Class_ Returning Empty

梦想的初衷 submitted on 2021-02-11 18:16:10
Question: I am currently scraping the data I need successfully by chaining bs4 .contents together following a find_all('div'), but that seems inherently fragile. I'd like to go directly to the tag I need by class, but my "class_=" search is returning None. I ran the following code on the HTML below, which returns None: soup = BeautifulSoup(text) # this works fine tag = soup.find(class_ = "loan-section-content") # this returns None I also tried soup.find('div', class_ = "loan-section-content") - also

discord.py-rewrite - Dynamic Web Scraping using PyQt5 not working properly

末鹿安然 submitted on 2021-02-11 18:10:20
Question: In short, I'm making a Discord bot that downloads the "World of the Day" picture from the website https://growtopiagame.com as D:\Kelbot/render.png and then sends the picture to the channel where the command was called. However, it is not a static website and the URL is not in the source code, so I found a solution that uses PyQt5: import re import bs4 as bs import sys import urllib.request from PyQt5.QtWebEngineWidgets import QWebEnginePage from PyQt5.QtWidgets import QApplication from PyQt5.QtCore

Scrapy - how to manage pagination without 'Next' button?

烈酒焚心 submitted on 2021-02-11 18:03:36
Question: I'm scraping the content of articles from a site like this, where there is no 'Next' button to follow. The ItemLoader is passed from parse_issue in the response.meta object, along with some additional data such as section_name. Here is the function: def parse_article(self, response): self.logger.info('Parse function called parse_article on {}'.format(response.url)) acrobat = response.xpath('//div[@class="txt__lead"]/p[contains(text(), "Plik do pobrania w wersji (pdf) - wymagany Acrobat Reader")]')

Submitting form and reading results using Excel VBA and InternetExplorer

家住魔仙堡 submitted on 2021-02-11 17:52:49
Question: I'm submitting a form from Excel VBA using an InternetExplorer object. Once it is submitted, I can see the URL change on screen. However, when I attempt to output the URL (to confirm that it changed and that the code knows it), I get the same URL. Both debug statements below output the same URL. Code: Dim username As String Dim password As String Dim server_ip As String username = "aaa" password = "bbb" server_ip = "ip_here" Dim ie As New InternetExplorer Dim doc As HTMLDocument Set doc

Excel VBA click website button with no name or ID that uses JSON

北城余情 submitted on 2021-02-11 17:39:33
Question: I am using Excel VBA to open this website: Legal and General. It then needs to click the "Fund prices and charges" button. Inspecting the web page with Chrome, I can see that this button has the following code: <div class=" selected tab" tabindex ="0" role="button" aria-pressed="true">...</div> The HTML 'view source' suggests a script type="application/JSON". I'm very confused by all of this. Does anyone know how I can select this button? I have this section of code so far: Set HTMLDoc = .document

I want to print a proper table out of data scraped using Scrapy

匆匆过客 submitted on 2021-02-11 17:20:38
Question: So I have written all the code to scrape the table from [http://www.rarityguide.com/cbgames_view.php?FirstRecord=21][1], but I am getting output like this: # the output that i get {'EXG': (['17.00', '10.00', '90.00', '9.00', '13.00', '17.00', '16.00', '43.00', '125.00', '16.00', '11.00', '150.00', '17.00', '24.00', '15.00', '24.00', '21.00', '36.00', '270.00', '280.00'],), 'G': ['8.00', '5.00', '38.00', '2.00', '6.00', '7.00', '6.00', '20.00', '40.00', '7.00', '5.00', '70.00', '6.00', '12.00', '7.00',
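
One possible way to turn column-oriented output like the excerpt above into a row-by-row table is to transpose the columns with zip. The following is only a minimal sketch, not the poster's code: the helper names unwrap and format_table are hypothetical, the sample data is shortened to the first three values of each column quoted in the excerpt, and it assumes all columns have the same length. It also assumes that the one-element tuple around the 'EXG' list is unintended and should be unwrapped.

```python
def unwrap(column):
    """Return the inner list if a column arrived as a one-element tuple."""
    if isinstance(column, tuple) and len(column) == 1:
        return column[0]
    return column


def format_table(columns):
    """Turn a dict of column lists into right-aligned text lines."""
    headers = list(columns)
    data = [unwrap(columns[name]) for name in headers]
    # Transpose: zip pairs up the i-th value of every column into one row.
    rows = list(zip(*data))
    # Each column is as wide as its header or its longest value.
    widths = [
        max([len(name)] + [len(str(value)) for value in values])
        for name, values in zip(headers, data)
    ]
    lines = ["  ".join(name.rjust(w) for name, w in zip(headers, widths))]
    for row in rows:
        lines.append("  ".join(str(v).rjust(w) for v, w in zip(row, widths)))
    return lines


# Shortened sample in the same shape as the output quoted in the question.
sample = {
    "EXG": (["17.00", "10.00", "90.00"],),
    "G": ["8.00", "5.00", "38.00"],
}

for line in format_table(sample):
    print(line)
```

Note that zip stops at the shortest column, so columns of unequal length would silently lose rows; itertools.zip_longest may be a better fit in that case.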