web-scraping

Web scraping using Python and Selenium: tried to use multiprocessing but the code does not work; without it the code works fine

Submitted by 泄露秘密 on 2021-02-11 16:56:58
Question: I am doing web scraping with Python and Selenium. I used to scrape data for one location and one year at a time by creating 1800 .py files (600 places × 3 years = 1800), batch-opening 10 at a time, and waiting for them to complete. This is time-consuming, so I decided to use multiprocessing. I changed my code to read the place data from a text file and iterate over it. The text file looks like this: Aandimadam Aathur_Dindugal Aathur_Salem East Abiramam Acchirapakkam Adayar Adhiramapattinam Alandur
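A common reason this setup fails, especially on Windows, is a missing `__main__` guard and an attempt to share one webdriver across processes. A minimal sketch of the intended structure, with `scrape_place` as a hypothetical stand-in for the real Selenium routine:

```python
from multiprocessing import Pool

def scrape_place(place):
    # Placeholder for the real Selenium routine. Each worker process must
    # create its OWN webdriver instance inside this function; a driver
    # object cannot be shared across processes.
    return f"scraped {place}"

def load_places(path):
    # One place name per line, as in the question's text file.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

if __name__ == "__main__":
    # The __main__ guard is required on Windows: without it, every worker
    # re-imports the script and tries to spawn workers of its own.
    places = ["Aandimadam", "Aathur_Dindugal", "Abiramam"]  # or load_places("places.txt")
    with Pool(processes=3) as pool:
        results = pool.map(scrape_place, places)
    print(results)
```

With 600 places, `Pool(processes=10)` and one `pool.map` call replace the 1800 hand-made scripts; the pool feeds places to workers as they free up.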

EDGAR SEC 10-K Individual Sections Parser

Submitted by 点点圈 on 2021-02-11 15:54:01
Question: Do you know of any API (paid or free), tool, or Python package that can parse individual sections of SEC 10-K filings? I'm looking for the individual sections of 10-K filings (e.g. ITEM 1: Business, ITEM 1A: Risk Factors, etc.) separated from the entire 10-K filing, and preferably cleaned of any page headers (company name), footers (page numbers), and tables containing mostly numeric data. I've written a parser in Python using BeautifulSoup for entire 10-K statements, but dividing them into

How to scrape project urls from indiegogo using BeautifulSoup?

Submitted by 江枫思渺然 on 2021-02-11 15:40:46
Question: I am trying to scrape project URLs from Indiegogo, but I have had no success after hours. I cannot scrape them using either XPath or BeautifulSoup. The output of the following code does not contain the information I want: soup.find_all("div") BeautifulSoup also did not work: import requests from bs4 import BeautifulSoup url = 'https://www.indiegogo.com/explore/all?project_type=campaign&project_timing=ending_soon&sort=trending' page = requests.get(url) soup = BeautifulSoup(page.text, 'html.parser'
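The likely cause is that Indiegogo builds its project cards with JavaScript, so the HTML returned by requests.get never contains them; a JS-capable tool (Selenium, Playwright, requests-html) must render the page first. Once rendered HTML is available, the link extraction itself is simple — sketched here with the standard library on a hypothetical snippet standing in for the rendered markup:

```python
from html.parser import HTMLParser

class ProjectLinkParser(HTMLParser):
    """Collect hrefs that look like Indiegogo project pages."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.startswith("/projects/"):
                self.links.append("https://www.indiegogo.com" + href)

# Hypothetical snippet standing in for the HTML *after* JavaScript rendering;
# requests.get() alone never sees markup like this on Indiegogo.
rendered = '<div><a href="/projects/example-campaign">Example</a></div>'
parser = ProjectLinkParser()
parser.feed(rendered)
print(parser.links)
```

With Selenium, `driver.page_source` after the page loads would play the role of `rendered` here.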

Python Selenium webscraping of Tableau Public: how to assign favourites to workbook?

Submitted by 巧了我就是萌 on 2021-02-11 15:31:45
Question: I have written my first Selenium script to practise web scraping in Python. The idea is to scrape all workbooks, views, and favourites from a Tableau Public profile. I managed to extract those three key variables, but I don't know how to assign favourites to their respective workbooks, since not all workbooks have at least one favourite. For example, "Skyler on Broadway" has no favourites, but if I were to match workbooks and favourites in a dictionary, it would pull in the next best value,
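The misalignment happens when workbooks and favourites are collected as two separate flat lists. Searching for the favourite count inside each workbook card, and defaulting to 0 when it is absent, keeps the pairing correct. The markup and class names below are hypothetical; the point is the scoped lookup, shown with BeautifulSoup (the same idea applies to per-element `find_element` calls in Selenium):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a rendered Tableau Public profile;
# the real class names will differ.
html = """
<div class="workbook"><h3>Skyler on Broadway</h3></div>
<div class="workbook"><h3>Election Maps</h3><span class="favorites">4</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

favourites = {}
for card in soup.select("div.workbook"):
    name = card.h3.get_text(strip=True)
    fav = card.select_one("span.favorites")  # searched WITHIN this card only
    favourites[name] = int(fav.get_text()) if fav else 0

print(favourites)  # {'Skyler on Broadway': 0, 'Election Maps': 4}
```

Because the lookup never leaves the card, a workbook without favourites gets 0 instead of stealing the next workbook's value.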

Get content from certain tags with certain attributes using BS4

Submitted by 给你一囗甜甜゛ on 2021-02-11 15:31:15
Question: I need to get the content from tags with these attributes: <span class="h6 m-0"> . An example of the HTML I'll encounter would be <span class="h6 m-0">Hello world</span> , and it obviously needs to return Hello world . My current code is as follows: page = BeautifulSoup(text, 'html.parser') names = [item["class"] for item in page.find_all('span')] This works fine and gets me all the spans on the page, but I don't know how to specify that I only want those with the specific class
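BeautifulSoup can filter on the class directly. Note that `find_all("span", class_="h6 m-0")` matches the exact attribute string (so it misses `class="m-0 h6"`); a CSS selector that requires both classes regardless of order is the safer form:

```python
from bs4 import BeautifulSoup

text = '<div><span class="h6 m-0">Hello world</span><span class="other">skip</span></div>'
page = BeautifulSoup(text, "html.parser")

# CSS selector: spans carrying BOTH the h6 and m-0 classes, in any order.
names = [item.get_text() for item in page.select("span.h6.m-0")]
print(names)  # ['Hello world']
```

`get_text()` (rather than `item["class"]`) is what returns the tag's content.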

lxml web-scraping is returning empty values

Submitted by 两盒软妹~` on 2021-02-11 15:21:10
Question: I am trying to get all the food categories from this site: https://www.walmart.com/cp/976759 . Here is a snapshot of the category container: <div id="cp-center-module-5" class="cp-center-module"><span style="font-size: 0px;"></span><div data-module="FeaturedCategoriesCollapsible" data-module-id="e05783ed-f2bb-44f3-956f-9d7d5286d25b" class="TempoTileCollapsible FeaturedCategoriesCollapsible" data-tl-id="categorypage-FeaturedCategoriesCollapsible"><div class="TempoTileCollapsible-header"><div class=
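lxml returning empty values usually means the category tiles are built client-side by JavaScript and are absent from the raw HTML that was fetched. On pages like this, the underlying data often ships as a JSON blob inside a script tag, which can be pulled with `re` and `json`. The script id and JSON shape below are hypothetical, for illustration only:

```python
import json
import re

# Hypothetical page source: many retail sites embed their data as JSON in a
# script tag even when the visible DOM is built by JavaScript afterwards.
page_source = '''
<script id="initial-state">{"categories": [{"name": "Bakery & Bread"},
{"name": "Dairy & Eggs"}]}</script>
'''

match = re.search(r'<script id="initial-state">(\{.*\})</script>',
                  page_source, re.DOTALL)
data = json.loads(match.group(1))
names = [c["name"] for c in data["categories"]]
print(names)
```

If no such blob exists, rendering the page with Selenium first and then running the lxml XPath over `driver.page_source` is the fallback.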

Extracting table data from website using chrome

Submitted by 笑着哭i on 2021-02-11 15:09:40
Question: I want to extract table data from a website with the Chrome browser using Selenium. I wrote the code below, but it is not working: Sub Chartinka() Dim bot As New WebDriver, posts As WebElements, post As WebElement, i As Integer, mysheet As Worksheet, keys As Selenium.keys bot.Start "chrome", "https://chartink.com/screener/buy-15m-78" bot.Get "/" Set posts = bot.FindElementsByXPath("//*[@id='DataTables_Table_0']/tbody/tr[1]") i = 2 Set mysheet = Sheets("Sheet3") For Each post In posts ' Run time Error '438'
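The question's code is SeleniumBasic in VBA, but the extraction pattern is language-independent: select the rows, then select the cells within each row (note the XPath above selects only `tr[1]`, the first row). A sketch of that pattern in Python on a static stand-in for the table (the real page is JavaScript-rendered, so a browser-driven fetch is still needed first):

```python
from bs4 import BeautifulSoup

# Static stand-in for the screener table; real column values will differ.
html = """
<table id="DataTables_Table_0"><tbody>
<tr><td>ACC</td><td>2250.5</td></tr>
<tr><td>INFY</td><td>1391.0</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.select("#DataTables_Table_0 tbody tr"):  # all rows, not just tr[1]
    rows.append([td.get_text(strip=True) for td in tr.select("td")])

print(rows)
```

In the VBA version the equivalent fix is to drop `[1]` from the row XPath and look up the `td` elements relative to each `post` inside the loop.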

Requests-html: error while running on flask

Submitted by 本小妞迷上赌 on 2021-02-11 15:01:28
Question: I've prepared a script using requests-html which was working fine. I deployed it in a Flask app and now it gives me RuntimeError: There is no current event loop in thread 'Thread-3'. Here's the full error: Traceback (most recent call last): File "C:\Users\intel\AppData\Local\Programs\Python\Python38\Lib\site-packages\flask\app.py", line 2464, in __call__ return self.wsgi_app(environ, start_response) . . . File "C:\Users\intel\Desktop\One page\main.py", line 18, in hello_world r
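requests-html drives a headless browser through asyncio, and asyncio creates an event loop automatically only on the main thread, while Flask serves each request on a worker thread. A common workaround is to create and install a loop in the handling thread before calling render(). The pattern is shown here with plain asyncio so the sketch stays self-contained (requests-html itself is omitted):

```python
import asyncio
import threading

results = []

def worker():
    # In a non-main thread, asyncio has no current event loop, which is
    # what raises "There is no current event loop in thread ...".
    # Creating and installing one first avoids the RuntimeError:
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    # ... at this point requests-html's session.render(), which uses the
    # thread's event loop internally, could run safely ...
    results.append(loop.run_until_complete(asyncio.sleep(0, result="ok")))
    loop.close()

t = threading.Thread(target=worker)
t.start()
t.join()
print(results)  # ['ok']
```

In the Flask view, the `new_event_loop()`/`set_event_loop()` pair would go at the top of the handler, before the requests-html call.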

Problems with data retrieving using Python web scraping

Submitted by 拟墨画扇 on 2021-02-11 14:53:04
Question: I wrote simple code for scraping data from a web page, and I mentioned everything such as the object class and tag, but my program does not scrape the data. One more thing: there is an email on the page that I also want to scrape, but I don't know how to reference its id or class. Could you please guide me on how to fix this issue? Thanks! Here is my code: import requests from bs4 import BeautifulSoup import csv def get_page(url): response = requests.get(url) if not response.ok: print('server responded:', response
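On the email question: addresses are frequently present only as mailto: links, which have no dedicated id or class to target; matching on the href prefix works instead. A minimal sketch on a synthetic snippet (the real page's markup may differ):

```python
from bs4 import BeautifulSoup

# Synthetic snippet: the email exists only inside a mailto: link.
html = '<div><a href="mailto:info@example.com">Contact us</a></div>'
soup = BeautifulSoup(html, "html.parser")

emails = [a["href"][len("mailto:"):]
          for a in soup.find_all("a", href=True)
          if a["href"].startswith("mailto:")]
print(emails)  # ['info@example.com']
```

If the email is rendered as plain text rather than a link, a regex such as `[\w.+-]+@[\w-]+\.[\w.]+` over `soup.get_text()` is the usual fallback.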