web-scraping | 易学教程

Error local variable has been referenced before assignment

Question: I am new to the Stack Overflow community and new to programming in general. One of my first projects is to build a web scraper to see if I can collect market data. While building it, I keep getting stuck on an UnboundLocalError. I am aware that this has something to do with how I am instantiating my class and how I am referencing the variable, but I am not sure how to troubleshoot it. class Stock: def __init__(self, symbol, company): self.symbol = symbol self.company =
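For context, here is a minimal sketch of how Python raises UnboundLocalError, assuming the most common cause: a name that is assigned anywhere inside a function is treated as local for the whole function, so reading it before that assignment fails. The excerpt's code is truncated, so this may not be the asker's exact bug. The names `counter`, `broken_increment`, and `fixed_increment` are hypothetical and do not come from the question.

```python
counter = 0  # module-level variable (hypothetical, for illustration only)


def broken_increment():
    # The assignment below makes `counter` local to this function, so the
    # read on the right-hand side happens before any local value exists.
    counter = counter + 1
    return counter


def fixed_increment(current):
    # Passing the value in and returning the result avoids the hidden local.
    return current + 1


try:
    broken_increment()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

counter = fixed_increment(counter)
print(error_name, counter)
```

The same error appears when a variable is only assigned inside an `if` or `try` branch that did not run; giving the name a default value before the branch is one way to rule that out.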

Scraping with selenium and BeautifulSoup doesn't return all the items in the page

Question: Following on from the question here, I am now able to interact with the page, scroll down, close the popup that appears, and click at the bottom to expand the page. The problem is that when I count the items, the code returns only 20 when it should return 40. I have checked the code again and again; I'm missing something but I don't know what. See my code below: from selenium import webdriver from bs4 import BeautifulSoup import pandas as pd import time import datetime options = webdriver

How do I make Internet Explorer driver invisible using Selenium and VB?

Question: I am using Selenium WebDriver for some automation. With Chrome I can use the headless argument to hide the browser, but I don't know the argument to hide Internet Explorer. Dim driver As New ChromeDriver driver.AddArgument ("headless") Dim driver As New IEDriver driver.AddArgument ("?????????????????????") Library used: a Selenium-based browser automation framework for VB.Net, VBA and VBScript. Answer 1: Based on my search results, it looks like Internet Explorer does not support headless mode

Optimizing python web scraping script with Selenium

Question: I'm having an issue with my web scraping script with Selenium. Normally the script runs smoothly, but I often get this error within this for loop (I believe the script runs faster than the elements become visible): NoSuchElementException Traceback (most recent call last) <ipython-input-6-470748a6674f> in <module> 66 item_brand.append(driver.find_element_by_xpath('.//*[@id="brand"]/a/span/bdi').get_attribute('textContent')) 67 item_prices.append(driver.find_element_by_css
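If the asker's guess is right and the lookup runs before the element is rendered, the usual remedy is to wait and retry instead of failing on the first attempt. Selenium provides `WebDriverWait` for this. The sketch below does not use Selenium at all: it is a plain-Python polling helper that illustrates the same idea so it can run without a browser. `wait_until` and `flaky_lookup` are hypothetical names, and `LookupError` stands in for `NoSuchElementException`.

```python
import time


def wait_until(lookup, timeout=5.0, interval=0.1, retry_on=(Exception,)):
    """Call lookup() repeatedly until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return lookup()
        except retry_on:
            # Give up and re-raise the last error once the deadline has passed.
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)


# Stand-in for a page element that only becomes available on the third attempt.
attempts = {"count": 0}


def flaky_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not rendered yet")
    return "brand text"


result = wait_until(flaky_lookup, timeout=2.0, interval=0.01, retry_on=(LookupError,))
print(result, attempts["count"])
```

In a real script the `lookup` callable would wrap the driver call that currently raises, and `retry_on` would name the Selenium exception, so that unrelated errors still surface immediately.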

Scraping Google Images using Selenium in Python

Question: I have been trying to scrape Google Images using the following code: from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.common.keys import Keys import os import time import requests import re import urllib2 import re from threading import Thread import json #Assuming I have a folder named Pictures1, the images are downloaded there. def threaded_func(url,i): raw_img = urllib2.urlopen(url).read() cntr = len([i for i in os.listdir("Pictures1"

Creating a dataframe with text from a website

Question: I've been asked to create a data frame in R using information copied from a website; the data is not contained in a file. The full data list is at: https://www.npr.org/2012/12/07/166400760/hollywood-heights-the-ups-downs-and-in-betweens Here is a portion of the data: Leading Men (Average American male: 5 feet 9.5 inches) Dolph Lundgren — 6 feet 5 inches John Cleese — 6 feet 5 inches Michael Clarke Duncan — 6 feet 5 inches Vince Vaughn — 6 feet 5 inches Clint Eastwood — 6 feet 4 inches Jimmy
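The question asks for R, but the parsing step is the same in any language: split the pasted text into lines and pull out the name, feet, and inches with a pattern. The sketch below shows that logic in Python using only the standard library, on three lines copied from the excerpt. It assumes every entry follows the "Name — F feet I inches" layout seen above; `parse_heights`, `LINE_PATTERN`, and `total_inches` are hypothetical names.

```python
import re

# Sample lines copied from the excerpt; the full list is on the linked page.
raw_text = """Dolph Lundgren — 6 feet 5 inches
John Cleese — 6 feet 5 inches
Clint Eastwood — 6 feet 4 inches"""

# Name, a dash separator, feet, and an optional inches part (may be fractional).
LINE_PATTERN = re.compile(
    r"^(?P<name>.+?) — (?P<feet>\d+) feet(?: (?P<inches>\d+(?:\.\d+)?) inch(?:es)?)?$"
)


def parse_heights(text):
    """Return one dict per line that matches the expected layout."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip headings and anything else that does not fit
        inches = float(match.group("inches") or 0)
        rows.append(
            {
                "name": match.group("name"),
                "total_inches": int(match.group("feet")) * 12 + inches,
            }
        )
    return rows


rows = parse_heights(raw_text)
for row in rows:
    print(row)
```

A list of records like this maps directly onto a data frame with one row per person; section headings such as "Leading Men" are skipped here and would need separate handling if the category is wanted as a column.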

Excel VBA Web Scraping Returning Wrong Text in MSXML2.XMLHTTP method

Question: I am trying to extract the movie description from this URL: "https://ssl.ofdb.de/plot/138627,271359,I-Am-Legend" When I use the CreateObject("InternetExplorer.Application") method, it gives me the correct text as seen on the website (this method is slow). But if I use MSXML2.XMLHTTP, some of the text returned is unreadable (but this method is fast). Output of first method (no problem): Robert Neville (Will Smith) war ein hervorragender Wissenschaftler, aber auch er konnte
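One possible explanation, offered only as an assumption because the excerpt is cut off before the failing output, is a character-encoding mismatch: the response bytes are decoded with a different charset than the one the page was written in, which typically damages German umlauts. The Python sketch below reproduces that effect on a sample word that is not taken from the question (`sample` is hypothetical), and shows that decoding with the matching charset restores it.

```python
# Hypothetical sample text containing an umlaut; not taken from the page.
sample = "Künstler"

# Bytes as a server might send them, here assumed to be UTF-8 encoded.
raw_bytes = sample.encode("utf-8")

# Decoding with the wrong charset produces unreadable characters.
garbled = raw_bytes.decode("latin-1")

# Decoding with the charset that was actually used restores the text.
restored = raw_bytes.decode("utf-8")

print(garbled)
print(restored)
```

If this is the cause, the remedy is to read the raw response bytes and decode them with the charset declared by the page or its Content-Type header, instead of relying on the client's default text conversion.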

scrapy not giving any output

Question: I was following this link and was able to run a BaseSpider successfully. However, when I tried the same with a CrawlSpider, I did not get any output. My spider is as follows: from scrapy.contrib.spiders import CrawlSpider, Rule from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor from scrapy.http import request from scrapy.selector import HtmlXPathSelector from medsynergies.items import MedsynergiesItem class medsynergiesspider(CrawlSpider): name="medsynergies" allowed
