web-scraping

Web scraping using Python and Selenium: tried to use multiprocessing but the code does not work; without it the code works fine

Submitted by 泄露秘密 on 2021-02-11 16:56:58
Question: I am doing web scraping with Python and Selenium. I used to scrape data for one location and one year at a time by creating 1800 .py files (600 places × 3 years = 1800), batch-opening 10 at a time, and waiting for them to complete. This is time-consuming, so I decided to use multiprocessing. I changed my code to read the place data from a text file and iterate over it. The text file looks like this: Aandimadam Aathur_Dindugal Aathur_Salem East Abiramam Acchirapakkam Adayar Adhiramapattinam Alandur
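A common reason this setup fails, especially on Windows, is a missing `__main__` guard and an attempt to share one webdriver across processes. A minimal sketch of the intended structure, with `scrape_place` as a hypothetical stand-in for the real Selenium routine:

```python
from multiprocessing import Pool

def scrape_place(place):
    # Placeholder for the real Selenium routine. Each worker process must
    # create its OWN webdriver instance inside this function; a driver
    # object cannot be shared across processes.
    return f"scraped {place}"

def load_places(path):
    # One place name per line, as in the question's text file.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

if __name__ == "__main__":
    # The __main__ guard is required on Windows: without it, every worker
    # re-imports the script and tries to spawn workers of its own.
    places = ["Aandimadam", "Aathur_Dindugal", "Abiramam"]  # or load_places("places.txt")
    with Pool(processes=3) as pool:
        results = pool.map(scrape_place, places)
    print(results)
```

With 600 places, `Pool(processes=10)` and one `pool.map` call replace the 1800 hand-made scripts; the pool feeds places to workers as they free up.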

EDGAR SEC 10-K Individual Sections Parser

Submitted by 点点圈 on 2021-02-11 15:54:01
Question: Do you know of any API (paid or free), tool, or Python package that can parse individual sections of SEC 10-K filings? I'm looking for the individual sections of 10-K filings (e.g. ITEM 1: Business, ITEM 1A: Risk Factors, etc.) separated from the entire 10-K filing, and preferably cleaned of any page headers (company name), footers (page numbers), and tables containing mostly numeric data. I've written a parser in Python using BeautifulSoup for entire 10-K statements, but dividing them into

How to scrape project urls from indiegogo using BeautifulSoup?

Submitted by 江枫思渺然 on 2021-02-11 15:40:46
Question: I am trying to scrape project URLs from Indiegogo, but I have had no success after hours. I cannot scrape them using either XPath or BeautifulSoup. The output of the following code does not contain the information I want: soup.find_all("div") BeautifulSoup also did not work: import requests from bs4 import BeautifulSoup url = 'https://www.indiegogo.com/explore/all?project_type=campaign&project_timing=ending_soon&sort=trending' page = requests.get(url) soup = BeautifulSoup(page.text, 'html.parser'
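The likely cause is that Indiegogo builds its project cards with JavaScript, so the HTML returned by requests.get never contains them; a JS-capable tool (Selenium, Playwright, requests-html) must render the page first. Once rendered HTML is available, the link extraction itself is simple — sketched here with the standard library on a hypothetical snippet standing in for the rendered markup:

```python
from html.parser import HTMLParser

class ProjectLinkParser(HTMLParser):
    """Collect hrefs that look like Indiegogo project pages."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.startswith("/projects/"):
                self.links.append("https://www.indiegogo.com" + href)

# Hypothetical snippet standing in for the HTML *after* JavaScript rendering;
# requests.get() alone never sees markup like this on Indiegogo.
rendered = '<div><a href="/projects/example-campaign">Example</a></div>'
parser = ProjectLinkParser()
parser.feed(rendered)
print(parser.links)
```

With Selenium, `driver.page_source` after the page loads would play the role of `rendered` here.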

Python Selenium webscraping of Tableau Public: how to assign favourites to workbook?

Submitted by 巧了我就是萌 on 2021-02-11 15:31:45
Question: I have written my first Selenium script to practise web scraping in Python. The idea is to scrape all workbooks, views, and favourites from a Tableau Public profile. I managed to extract those three key variables, but I don't know how to assign favourites to their respective workbooks, since not all workbooks have at least one favourite. For example, "Skyler on Broadway" has no favourites, but if I were to match workbooks and favourites in a dictionary, it would pull in the next best value,
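The misalignment happens when workbooks and favourites are collected as two separate flat lists. Searching for the favourite count inside each workbook card, and defaulting to 0 when it is absent, keeps the pairing correct. The markup and class names below are hypothetical; the point is the scoped lookup, shown with BeautifulSoup (the same idea applies to per-element `find_element` calls in Selenium):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a rendered Tableau Public profile;
# the real class names will differ.
html = """
<div class="workbook"><h3>Skyler on Broadway</h3></div>
<div class="workbook"><h3>Election Maps</h3><span class="favorites">4</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

favourites = {}
for card in soup.select("div.workbook"):
    name = card.h3.get_text(strip=True)
    fav = card.select_one("span.favorites")  # searched WITHIN this card only
    favourites[name] = int(fav.get_text()) if fav else 0

print(favourites)  # {'Skyler on Broadway': 0, 'Election Maps': 4}
```

Because the lookup never leaves the card, a workbook without favourites gets 0 instead of stealing the next workbook's value.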

Get content from certain tags with certain attributes using BS4

Submitted by 给你一囗甜甜゛ on 2021-02-11 15:31:15
Question: I need to get the content from tags with these attributes: <span class="h6 m-0"> . An example of the HTML I'll encounter would be <span class="h6 m-0">Hello world</span> , and it obviously needs to return Hello world . My current code is as follows: page = BeautifulSoup(text, 'html.parser') names = [item["class"] for item in page.find_all('span')] This works fine and gets me all the spans on the page, but I don't know how to specify that I only want those with the specific class
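BeautifulSoup can filter on the class directly. Note that `find_all("span", class_="h6 m-0")` matches the exact attribute string (so it misses `class="m-0 h6"`); a CSS selector that requires both classes regardless of order is the safer form:

```python
from bs4 import BeautifulSoup

text = '<div><span class="h6 m-0">Hello world</span><span class="other">skip</span></div>'
page = BeautifulSoup(text, "html.parser")

# CSS selector: spans carrying BOTH the h6 and m-0 classes, in any order.
names = [item.get_text() for item in page.select("span.h6.m-0")]
print(names)  # ['Hello world']
```

`get_text()` (rather than `item["class"]`) is what returns the tag's content.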

lxml web-scraping is returning empty values

Submitted by 两盒软妹~` on 2021-02-11 15:21:10
Question: I am trying to get all the food categories from this site: https://www.walmart.com/cp/976759 . Here is a snapshot of the category container: <div id="cp-center-module-5" class="cp-center-module"><span style="font-size: 0px;"></span><div data-module="FeaturedCategoriesCollapsible" data-module-id="e05783ed-f2bb-44f3-956f-9d7d5286d25b" class="TempoTileCollapsible FeaturedCategoriesCollapsible" data-tl-id="categorypage-FeaturedCategoriesCollapsible"><div class="TempoTileCollapsible-header"><div class=
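lxml returning empty values usually means the category tiles are built client-side by JavaScript and are absent from the raw HTML that was fetched. On pages like this, the underlying data often ships as a JSON blob inside a script tag, which can be pulled with `re` and `json`. The script id and JSON shape below are hypothetical, for illustration only:

```python
import json
import re

# Hypothetical page source: many retail sites embed their data as JSON in a
# script tag even when the visible DOM is built by JavaScript afterwards.
page_source = '''
<script id="initial-state">{"categories": [{"name": "Bakery & Bread"},
{"name": "Dairy & Eggs"}]}</script>
'''

match = re.search(r'<script id="initial-state">(\{.*\})</script>',
                  page_source, re.DOTALL)
data = json.loads(match.group(1))
names = [c["name"] for c in data["categories"]]
print(names)
```

If no such blob exists, rendering the page with Selenium first and then running the lxml XPath over `driver.page_source` is the fallback.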

Extracting table data from website using chrome

Submitted by 笑着哭i on 2021-02-11 15:09:40
Question: I want to extract table data from a website with the Chrome browser using Selenium. I wrote the code below, but it is not working: Sub Chartinka() Dim bot As New WebDriver, posts As WebElements, post As WebElement, i As Integer, mysheet As Worksheet, keys As Selenium.keys bot.Start "chrome", "https://chartink.com/screener/buy-15m-78" bot.Get "/" Set posts = bot.FindElementsByXPath("//*[@id='DataTables_Table_0']/tbody/tr[1]") i = 2 Set mysheet = Sheets("Sheet3") For Each post In posts ' Run time Error '438'
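The question's code is SeleniumBasic in VBA, but the extraction pattern is language-independent: select the rows, then select the cells within each row (note the XPath above selects only `tr[1]`, the first row). A sketch of that pattern in Python on a static stand-in for the table (the real page is JavaScript-rendered, so a browser-driven fetch is still needed first):

```python
from bs4 import BeautifulSoup

# Static stand-in for the screener table; real column values will differ.
html = """
<table id="DataTables_Table_0"><tbody>
<tr><td>ACC</td><td>2250.5</td></tr>
<tr><td>INFY</td><td>1391.0</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.select("#DataTables_Table_0 tbody tr"):  # all rows, not just tr[1]
    rows.append([td.get_text(strip=True) for td in tr.select("td")])

print(rows)
```

In the VBA version the equivalent fix is to drop `[1]` from the row XPath and look up the `td` elements relative to each `post` inside the loop.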

Requests-html: error while running on flask

Submitted by 本小妞迷上赌 on 2021-02-11 15:01:28
Question: I've prepared a script using requests-html which was working fine. I deployed it in a Flask app and now it gives me RuntimeError: There is no current event loop in thread 'Thread-3'. Here's the full error: Traceback (most recent call last): File "C:\Users\intel\AppData\Local\Programs\Python\Python38\Lib\site-packages\flask\app.py", line 2464, in __call__ return self.wsgi_app(environ, start_response) . . . File "C:\Users\intel\Desktop\One page\main.py", line 18, in hello_world r
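requests-html drives a headless browser through asyncio, and asyncio creates an event loop automatically only on the main thread, while Flask serves each request on a worker thread. A common workaround is to create and install a loop in the handling thread before calling render(). The pattern is shown here with plain asyncio so the sketch stays self-contained (requests-html itself is omitted):

```python
import asyncio
import threading

results = []

def worker():
    # In a non-main thread, asyncio has no current event loop, which is
    # what raises "There is no current event loop in thread ...".
    # Creating and installing one first avoids the RuntimeError:
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    # ... at this point requests-html's session.render(), which uses the
    # thread's event loop internally, could run safely ...
    results.append(loop.run_until_complete(asyncio.sleep(0, result="ok")))
    loop.close()

t = threading.Thread(target=worker)
t.start()
t.join()
print(results)  # ['ok']
```

In the Flask view, the `new_event_loop()`/`set_event_loop()` pair would go at the top of the handler, before the requests-html call.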

Problems with data retrieving using Python web scraping

Submitted by 拟墨画扇 on 2021-02-11 14:53:04
Question: I wrote simple code for scraping data from a web page, and I mentioned everything such as the object class and tag, but my program does not scrape the data. One more thing: there is an email on the page that I also want to scrape, but I don't know how to reference its id or class. Could you please guide me on how to fix this issue? Thanks! Here is my code: import requests from bs4 import BeautifulSoup import csv def get_page(url): response = requests.get(url) if not response.ok: print('server responded:', response
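On the email question: addresses are frequently present only as mailto: links, which have no dedicated id or class to target; matching on the href prefix works instead. A minimal sketch on a synthetic snippet (the real page's markup may differ):

```python
from bs4 import BeautifulSoup

# Synthetic snippet: the email exists only inside a mailto: link.
html = '<div><a href="mailto:info@example.com">Contact us</a></div>'
soup = BeautifulSoup(html, "html.parser")

emails = [a["href"][len("mailto:"):]
          for a in soup.find_all("a", href=True)
          if a["href"].startswith("mailto:")]
print(emails)  # ['info@example.com']
```

If the email is rendered as plain text rather than a link, a regex such as `[\w.+-]+@[\w-]+\.[\w.]+` over `soup.get_text()` is the usual fallback.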