web-scraping | 易学教程

Web scraping an “onclick” object table on a website with python

Question: I am trying to scrape the data from this link: page. If you click the up arrow, you will notice the highlighted days in the month sections. Clicking a highlighted day brings up a table of the tenders initiated on that day. All I need is the data in each table for each highlighted day in the calendar. There may be one or more tenders per day (up to a maximum of 7). (Image: table appears on click.) I have done some web scraping with bs4; however, I think that this is a job for selenium (please,

Why is my Jsoup Code not Returning the Correct Elements?

Question: I am working on an app in Android Studio and am having some trouble web scraping with Jsoup. I have successfully connected to the webpage and returned some basic elements to test the library, but now I cannot get the elements I actually need for my app. I am trying to get a number of elements with the "data-at" attribute. The weird thing is that a few elements with the "data-at" attribute are returned, but not the ones I am looking for. For whatever reason my code is not extracting all of the

Scrape a table with rvest in R that has mismatched table headings

Question: I'm trying to scrape this table, which seems like it should be super simple. Here's the URL of the table: https://fantasy.nfl.com/research/scoringleaders?position=1&sort=pts&statCategory=stats&statSeason=2019&statType=weekStats&statWeek=1 Here's what I coded: url <- "https://fantasy.nfl.com/research/scoringleaders?position=1&sort=pts&statCategory=stats&statSeason=2019&statType=weekStats&statWeek=1" x = data.frame(read_html(url) %>% html_nodes("table") %>% html_table()) This works OK but gives

BS4 Searching by Class_ Returning Empty

Question: I am currently scraping the data I need by chaining bs4 .contents together after a find_all('div'), but that seems inherently fragile. I'd like to go directly to the tag I need by class, but my "class_=" search is returning None. I ran the following code on the HTML below, which returns None: soup = BeautifulSoup(text) # this works fine tag = soup.find(class_ = "loan-section-content") # this returns None Also tried soup.find('div', class_ = "loan-section-content") - also
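As a point of reference for the class search described above, here is a minimal sketch of how `find(class_=...)` behaves in Beautiful Soup. The excerpt does not show the question's HTML, so the markup below is hypothetical, and the class name `not-in-this-document` is made up for illustration. It only demonstrates that `find` returns `None` when no element with that class exists in the markup that was actually parsed. It does not claim to diagnose the asker's page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the question's real HTML is not shown in the excerpt.
html = """
<div class="loan-section">
  <div class="loan-section-content highlighted">Rate: 4.5%</div>
</div>
"""

# Naming the parser explicitly avoids differences between installed parsers.
soup = BeautifulSoup(html, "html.parser")

# class_ matches when the given name is any one of the element's classes.
tag = soup.find("div", class_="loan-section-content")
content_text = tag.get_text(strip=True) if tag is not None else None

# A class that is absent from the parsed markup yields None, not an error.
missing = soup.find(class_="not-in-this-document")

print(content_text)
print(missing)
```

If a search like this returns `None` on real data, one thing worth checking is whether the target element is present in the string passed to `BeautifulSoup` at all, for example by searching the raw text for the class name.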

discord.py-rewrite - Dynamic Web Scraping using PyQt5 not working properly

Question: In short, I'm making a Discord bot that downloads the "World of the Day" picture from the website https://growtopiagame.com as D:\Kelbot/render.png and then sends the picture to the channel where the command was called. However, it is not a static website and the URL is not in the source code, so I found a solution that uses PyQt5: import re import bs4 as bs import sys import urllib.request from PyQt5.QtWebEngineWidgets import QWebEnginePage from PyQt5.QtWidgets import QApplication from PyQt5.QtCore

Scrapy - how to manage pagination without 'Next' button?

Question: I'm scraping the content of articles from a site like this, where there is no 'Next' button to follow. The ItemLoader is passed from parse_issue in the response.meta object, along with some additional data such as section_name. Here is the function: def parse_article(self, response): self.logger.info('Parse function called parse_article on {}'.format(response.url)) acrobat = response.xpath('//div[@class="txt__lead"]/p[contains(text(), "Plik do pobrania w wersji (pdf) - wymagany Acrobat Reader")]')

Submitting form and reading results using Excel VBA and InternetExplorer

Question: I'm submitting a form using Excel VBA with an InternetExplorer object. Once the form is submitted, I can see the URL change on screen. However, when I attempt to output the URL (to confirm that it changed and that the code knows it), I get the same URL. Both debug statements below output the same URL. Code: Dim username As String Dim password As String Dim server_ip As String username = "aaa" password = "bbb" server_ip = "ip_here" Dim ie As New InternetExplorer Dim doc As HTMLDocument Set doc

Excel VBA click website button with no name or ID that uses JSON

Question: I am using Excel VBA to open this website: Legal and General. It then needs to click the "Fund prices and charges" button. Inspecting the web page with Chrome, I can see this button has the following code: <div class=" selected tab" tabindex ="0" role="button" aria-pressed="true">...</div> The HTML 'view source' suggests a script type="application/JSON". I'm very confused by all of this. Does anyone know how I can select this button? I have this section of code so far: Set HTMLDoc = .document

I want to print a proper table out of data scraped using Scrapy

Question: So I have written all the code to scrape the table from [http://www.rarityguide.com/cbgames_view.php?FirstRecord=21][1], but I am getting output like # the output that i get {'EXG': (['17.00', '10.00', '90.00', '9.00', '13.00', '17.00', '16.00', '43.00', '125.00', '16.00', '11.00', '150.00', '17.00', '24.00', '15.00', '24.00', '21.00', '36.00', '270.00', '280.00'],), 'G': ['8.00', '5.00', '38.00', '2.00', '6.00', '7.00', '6.00', '20.00', '40.00', '7.00', '5.00', '70.00', '6.00', '12.00', '7.00',
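The output quoted above is column-oriented: each key holds every value for one column, and 'EXG' is additionally wrapped in a one-element tuple. One possible way to turn that shape into rows is sketched below in plain Python. It uses only the first three values of each column from the excerpt, and the helper names `unwrap` and `to_rows` are hypothetical, not part of Scrapy. This is an illustration of the transposition idea, not the asker's spider.

```python
# Column-oriented data shaped like the output in the question (first three
# values per column only). 'EXG' is a one-element tuple wrapping a list,
# as in the excerpt; 'G' is a plain list.
scraped = {
    "EXG": (["17.00", "10.00", "90.00"],),
    "G": ["8.00", "5.00", "38.00"],
}


def unwrap(column):
    """Return the inner list when a column is a 1-tuple wrapping a list."""
    if isinstance(column, tuple) and len(column) == 1:
        return column[0]
    return column


def to_rows(columns):
    """Transpose {header: [values]} into a list of row dictionaries."""
    headers = list(columns)
    value_lists = [unwrap(columns[h]) for h in headers]
    # zip(*value_lists) walks the columns in parallel, one row at a time.
    return [dict(zip(headers, values)) for values in zip(*value_lists)]


rows = to_rows(scraped)

# Render a simple tab-separated table: header line first, then one line per row.
lines = ["\t".join(rows[0].keys())]
lines += ["\t".join(row.values()) for row in rows]
table = "\n".join(lines)
print(table)
```

Note that `zip` stops at the shortest column, so columns of unequal length would silently lose trailing values. In a Scrapy spider, an alternative design is to yield one item per table row instead of collecting whole columns, which avoids the transposition step.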