screen-scraping

How to send JavaScript and Cookies Enabled in Scrapy?

╄→尐↘猪︶ㄣ submitted on 2020-07-05 07:20:09
Question: I am scraping a website with Scrapy that requires cookies and JavaScript to be enabled. I don't think I will actually have to process JavaScript; all I need is to pretend that JavaScript is enabled. Here is what I have tried: 1) Enabled cookies through the following settings: COOKIES_ENABLED = True COOKIES_DEBUG = True 2) Used a downloader middleware for cookies: DOWNLOADER_MIDDLEWARES = { 'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': 400, 'scrapy.contrib
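Since Scrapy never executes JavaScript, "pretending" it is enabled usually comes down to two things: letting Scrapy's cookie middleware do its job and sending the request headers a JavaScript-capable browser would send. A minimal settings sketch along those lines (the header values are illustrative examples, not anything Scrapy requires):

```python
# settings.py fragment: enable cookie handling and mimic a real browser.
COOKIES_ENABLED = True   # cookie middleware stores and resends cookies
COOKIES_DEBUG = True     # log Cookie / Set-Cookie headers while debugging

# Headers a real browser would send; the exact values are examples only.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/80.0 Safari/537.36"
    ),
}
```

If the page content itself is built by JavaScript, headers alone will not help and a rendering backend (e.g. a headless browser) becomes necessary.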

How to move to the next page on Python Selenium?

寵の児 submitted on 2020-07-03 07:46:57
Question: I am trying to build a proxy scraper for a specific site, but I'm failing to move to the next page. This is the code I'm using. If you answer my question, please explain a bit about what you used and, if you can, point me to some good tutorials covering this kind of code: from selenium import webdriver from selenium.webdriver.chrome.options import Options import time options = Options() #options.headless = True #for headless #options.add_argument('--disable-gpu')
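The excerpt stops before the pagination logic, so the target site's markup is unknown, but the two common patterns are clicking the site's "next" link and rebuilding the URL with an incremented page number. A sketch of the URL-rebuilding approach; the `page=` query parameter is an assumption about the target site:

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def next_page_url(url: str) -> str:
    """Return the same URL with its 'page' query parameter incremented.

    Assumes the site paginates via a 'page' parameter; pages without one
    are treated as page 1.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)
    page = int(query.get("page", ["1"])[0]) + 1
    query["page"] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

# Inside the Selenium loop this would be used roughly as:
#     driver.get(next_page_url(driver.current_url))
#     time.sleep(2)  # crude wait; WebDriverWait on a known element is more robust
```

When the site only exposes a clickable "Next" button (no page number in the URL), locating that button and calling `.click()` in a loop is the alternative.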

How to separate columns and format date when web scraping by using Python?

送分小仙女□ submitted on 2020-06-28 03:58:11
Question: I am trying to web scrape, using Python 3, a chart off of this website into a .csv file: 2013-14 NBA National TV Schedule. The chart starts out like: Game/Time Network Matchup Oct. 29, 8 p.m. ET TNT Chicago vs. Miami Oct. 29, 10:30 p.m. ET TNT LA Clippers vs. LA Lakers. I am using these packages: import re import requests import pandas as pd from bs4 import BeautifulSoup from itertools import groupby I imported the data with: pd.read_html("https://www.sbnation.com/2013/8/6/4595688/2013-14-nba
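The "Game/Time" column mixes date and time in one string, so a small parser is needed before the columns can be split. A stdlib-only sketch that turns `"Oct. 29, 8 p.m. ET"` into an ISO date plus a 24-hour time; it assumes the 2013-14 season convention that July-December games fall in 2013 and January-June games in 2014 (the page itself never states a year per row):

```python
import re

MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

def parse_game_time(text):
    """Split e.g. 'Oct. 29, 8 p.m. ET' into ('2013-10-29', '20:00')."""
    m = re.match(r"(\w+)\.? (\d+), (\d+)(?::(\d+))? ([ap])\.m\.", text)
    month = MONTHS[m.group(1)[:3]]
    day, hour = int(m.group(2)), int(m.group(3))
    minute = int(m.group(4) or 0)
    if m.group(5) == "p" and hour != 12:
        hour += 12                      # convert p.m. to 24-hour clock
    year = 2013 if month >= 7 else 2014  # season spans two calendar years
    return f"{year:04d}-{month:02d}-{day:02d}", f"{hour:02d}:{minute:02d}"
```

Applied to the `pd.read_html` result, this would feed two new DataFrame columns, e.g. `df[["date", "time"]] = df["Game/Time"].apply(lambda s: pd.Series(parse_game_time(s)))`.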

How can I fix the following error AttributeError: 'dict' object has no attribute 'text'

岁酱吖の submitted on 2020-06-22 13:25:26
Question: I am a few months into programming and am currently learning how to automate certain things in a project. My goal is to scrape text, src, and href values and store the data in my site's database, but when I try I get this error: AttributeError: 'dict' object has no attribute 'text', but it does. This is my code. I created a function: def get_world_too(): url = 'http://www.example.com' html = requests.get(url, headers=headers) soup = BeautifulSoup(html.text, 'html5lib') titles = soup
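The excerpt cuts off before the line that raises, but the message itself is unambiguous: whatever `.text` is called on holds a plain dict at that moment. A `requests.Response` and a BeautifulSoup `Tag` both expose `.text`; a dict never does, even when it has a `"text"` key (that needs `obj["text"]`). A dependency-free way to pin down which variable is the culprit (the names here are illustrative, not from the original code):

```python
def describe(name, obj):
    """Report what a variable actually is before calling .text on it."""
    has_text = hasattr(obj, "text")
    print(f"{name}: type={type(obj).__name__}, has .text: {has_text}")
    return has_text

# Simulated versions of the two shapes the scraper might be holding:
response_like = type("Resp", (), {"text": "<html>...</html>"})()
accidental_dict = {"text": "<html>...</html>"}   # .text fails, ['text'] works

describe("response_like", response_like)      # has .text: True
describe("accidental_dict", accidental_dict)  # has .text: False
```

Dropping a `describe(...)` call just before the failing line (or simply `print(type(variable))`) shows exactly where a dict sneaked in, typically from a `.json()` call or from iterating a dict instead of a list of tags.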

Selenium Scroll inside of popup div

大城市里の小女人 submitted on 2020-06-11 04:09:25
Question: I am using Selenium and trying to scroll inside the popup div on Instagram. I go to a page like 'https://www.instagram.com/kimkardashian/', click followers, and then I can't get the followers list to scroll down. I tried using hover, click_and_hold, and a few other tricks to select the div, but none of them worked. What would be the best way to select it? This is what I have tried so far: driver.find_elements_by_xpath("//*[contains(text(), 'followers')]")[0].click() element_to_hover_over =
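Hovering and click_and_hold only move the mouse; they do not scroll an inner overflow container. The usual workaround is to run JavaScript directly against the dialog element and set its `scrollTop`. A sketch of that pattern; the XPath for Instagram's follower dialog is an assumption and breaks whenever Instagram changes its markup:

```python
import time

# JavaScript run against the dialog element: jump its scroll position to the bottom.
SCROLL_DIALOG_JS = "arguments[0].scrollTop = arguments[0].scrollHeight"

def scroll_popup(driver, times=10, pause=1.0):
    """Scroll the followers dialog to its bottom `times` times."""
    # Hypothetical locator: the scrollable <div> inside Instagram's dialog.
    dialog = driver.find_element_by_xpath("//div[@role='dialog']//ul/..")
    for _ in range(times):
        driver.execute_script(SCROLL_DIALOG_JS, dialog)
        time.sleep(pause)  # give the next batch of followers time to load
```

Each iteration triggers Instagram's lazy loading, so `times` effectively controls how many batches of followers get fetched. (`find_element_by_xpath` matches the Selenium 3-era API used in the question; Selenium 4 replaces it with `find_element(By.XPATH, ...)`.)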

Puppeteer waitForSelector on multiple selectors

故事扮演 submitted on 2020-05-13 04:12:10
Question: I have Puppeteer controlling a website with a lookup form that can either return a result or a "No records found" message. How can I tell which one was returned? waitForSelector seems to wait for only one selector at a time, and waitForNavigation doesn't work because the result comes back via Ajax. I am using a try/catch, but it is tricky to get right and slows everything way down: try { await page.waitForSelector(SELECTOR1,{timeout:1000}); } catch(err) { await page.waitForSelector(SELECTOR2); } Answer 1:
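The answer itself is cut off above. One pattern that avoids the try/catch timeout dance is to race the two `waitForSelector` calls and see which one resolves first. A sketch; the helper name is mine, not part of Puppeteer's API:

```javascript
// Race several waitForSelector calls; resolve with the first selector that appears.
async function waitForAnySelector(page, selectors, timeout = 30000) {
  return Promise.race(
    selectors.map(sel =>
      page.waitForSelector(sel, { timeout }).then(() => sel)
    )
  );
}

// Usage against the lookup form described above:
//   const winner = await waitForAnySelector(page, [SELECTOR1, SELECTOR2]);
//   if (winner === SELECTOR2) console.log('No records found');
```

In production the losing selector's `waitForSelector` will eventually reject on timeout, so attach a `.catch` to each branch if unhandled-rejection warnings matter. When both selectors are plain CSS, a simpler alternative is a single selector list, `page.waitForSelector('SELECTOR1, SELECTOR2')`, since a CSS list matches whichever element shows up first.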

How to scrape table data from specific site JSOUP

隐身守侯 submitted on 2020-04-30 06:29:33
Question: I'm trying to scrape some data from the table on this site: https://www.worldometers.info/coronavirus/ Here is the source code of the scraper I've tried: public static void main(String[] args) throws Exception { String url = "https://www.worldometers.info/coronavirus/"; try{ Document doc = Jsoup.connect(url).get(); Element table = doc.getElementById("main_table_countries_today"); Elements rows = table.getElementsByTag("tr"); for(Element row : rows){ Elements tds = row.getElementsByTag("td"); for(int i
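The excerpt stops inside the cell loop. With Jsoup, the usual completion is to collect each cell's text (e.g. via `Elements#eachText()`) and assemble it into an output row. The row assembly itself is plain Java and can be sketched, and sanity-checked, without Jsoup; the class and method names here are mine:

```java
import java.util.List;
import java.util.StringJoiner;

public class RowFormatter {
    // Join one table row's cell texts into a CSV line, quoting cells that
    // themselves contain commas (Worldometers numbers like "80,000" do).
    static String toCsvRow(List<String> cellTexts) {
        StringJoiner row = new StringJoiner(",");
        for (String cell : cellTexts) {
            String clean = cell.trim();
            if (clean.contains(",")) {
                clean = "\"" + clean + "\"";
            }
            row.add(clean);
        }
        return row.toString();
    }

    // Inside the Jsoup loop from the question this would be roughly:
    //   List<String> cells = tds.eachText();
    //   writer.println(toCsvRow(cells));
}
```

Guarding against header rows (where `tds` is empty because the cells are `<th>`) is the other detail the truncated loop will need.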