web-scraping

Error local variable has been referenced before assignment

喜欢而已 submitted on 2021-02-11 12:32:20
Question: I am new to the Stack Overflow community and new to programming in general. One of my first projects is to build a web scraper to see if I can collect market data. While building it, I keep getting stuck on an UnboundLocalError. I am aware that this has something to do with how I am instantiating my class and how I am referencing the variable, but I am not sure how to troubleshoot it. class Stock: def __init__(self,symbol,company): self.symbol = symbol self.company =
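
The excerpt is cut off before the failing code, so the following is only a minimal sketch of one common way this error arises: a local variable that is assigned on some code paths but not others. The Stock class mirrors the excerpt; the two methods and the quotes argument are hypothetical and are not taken from the original question.

```python
class Stock:
    def __init__(self, symbol, company):
        self.symbol = symbol
        self.company = company

    def latest_price_broken(self, quotes):
        # Hypothetical method: `price` is only assigned when a symbol matches.
        for sym, value in quotes:
            if sym == self.symbol:
                price = value
        # If nothing matched, `price` was never bound in this function,
        # so this line raises UnboundLocalError.
        return price

    def latest_price(self, quotes):
        # Binding a default before the loop guarantees the name exists.
        price = None
        for sym, value in quotes:
            if sym == self.symbol:
                price = value
        return price


if __name__ == "__main__":
    stock = Stock("ABC", "Example Corp")
    print(stock.latest_price([("ABC", 2.5)]))
    try:
        stock.latest_price_broken([("XYZ", 1.0)])
    except UnboundLocalError as exc:
        print("caught:", exc)
```

Giving the variable a default value before any branch or loop is usually the simplest fix; returning early or raising a descriptive error when nothing matches are alternatives.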

Scraping with selenium and BeautifulSoup doesn't return all the items in the page

六月ゝ 毕业季﹏ submitted on 2021-02-11 12:29:41
Question: I came from the question here. Now I am able to interact with the page, scroll down, close the popup that appears, and click at the bottom to expand the page. The problem is that when I count the items, the code returns only 20 when it should return 40. I have checked the code again and again. I'm missing something, but I don't know what. See my code below: from selenium import webdriver from bs4 import BeautifulSoup import pandas as pd import time import datetime options = webdriver

How do I make Internet Explorer driver invisible using Selenium and VB?

末鹿安然 submitted on 2021-02-11 12:20:48
Question: I am using Selenium WebDriver for some automation. With Chrome I can use the headless argument to hide the browser, but I don't know which argument hides Internet Explorer. Dim driver As New ChromeDriver driver.AddArgument ("headless") Dim driver As New IEDriver driver.AddArgument ("?????????????????????") Library used: A Selenium based browser automation framework for VB.Net, VBA and VBScript. Answer 1: Based on my search results, it looks like Internet Explorer does not support Headless mode

Optimizing python web scraping script with Selenium

不想你离开。 submitted on 2021-02-11 12:04:52
Question: I'm having an issue with my Selenium web scraping script. Normally the script runs smoothly. However, I often get this error inside the for loop (I believe the script runs too fast, before the elements become visible): NoSuchElementException Traceback (most recent call last) <ipython-input-6-470748a6674f> in <module> 66 item_brand.append(driver.find_element_by_xpath('.//*[@id="brand"]/a/span/bdi').get_attribute('textContent')) 67 item_prices.append(driver.find_element_by_css
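
If the asker's guess is right and the lookup runs before the element is rendered, the usual remedy is to poll for the element instead of looking it up once. Selenium ships WebDriverWait and expected_conditions for this purpose. The sketch below only illustrates the underlying polling idea using the standard library; wait_until, its parameters, and the fake lookup are hypothetical names introduced for illustration, not Selenium API.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(Exception,)):
    """Call `condition` until it returns a truthy value or `timeout` elapses.

    Exceptions listed in `ignored` are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # the element is not there yet; try again after a pause
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


if __name__ == "__main__":
    # Stand-in for a driver lookup that fails on the first two attempts.
    attempts = {"count": 0}

    def fake_lookup():
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise LookupError("element not rendered yet")
        return "brand text"

    print(wait_until(fake_lookup, timeout=2.0, poll=0.01))
```

In a real script the condition would wrap the driver lookup, and the ignored exception would be Selenium's NoSuchElementException.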

Scraping Google Images using Selenium in Python

风格不统一 submitted on 2021-02-11 08:44:29
Question: I have been trying to scrape Google Images using the following code: from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.common.keys import Keys import os import time import requests import re import urllib2 import re from threading import Thread import json #Assuming I have a folder named Pictures1, the images are downloaded there. def threaded_func(url,i): raw_img = urllib2.urlopen(url).read() cntr = len([i for i in os.listdir("Pictures1"

Creating a dataframe with text from a website

核能气质少年 submitted on 2021-02-11 06:38:10
Question: I've been asked to create a data frame in R using information copied from a website; the data is not contained in a file. The full data list is at: https://www.npr.org/2012/12/07/166400760/hollywood-heights-the-ups-downs-and-in-betweens Here is a portion of the data: Leading Men (Average American male: 5 feet 9.5 inches) Dolph Lundgren — 6 feet 5 inches John Cleese — 6 feet 5 inches Michael Clarke Duncan — 6 feet 5 inches Vince Vaughn — 6 feet 5 inches Clint Eastwood — 6 feet 4 inches Jimmy
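
The question asks for R, but the parsing step is the same in any language: split each "name, dash, height" line into columns. The following is a rough Python sketch of that step, assuming the copied text has one entry per line. The pattern, parse_heights, and the height_in column are hypothetical names, and the sample lines are taken from the excerpt above.

```python
import re

# One entry per line: "<name> \u2014 <feet> feet <inches> inches".
# The inches part is treated as optional in case an entry has none.
PATTERN = re.compile(
    r"^(?P<name>.+?)\s+\u2014\s+(?P<feet>\d+) feet"
    r"(?: (?P<inches>\d+(?:\.\d+)?) inch(?:es)?)?$"
)

SAMPLE = """Leading Men (Average American male: 5 feet 9.5 inches)
Dolph Lundgren — 6 feet 5 inches
John Cleese — 6 feet 5 inches
Clint Eastwood — 6 feet 4 inches"""


def parse_heights(text):
    """Return one dict per matching line; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match is None:
            continue  # not an entry line, e.g. the category header
        feet = int(match.group("feet"))
        inches = float(match.group("inches") or 0)
        rows.append({"name": match.group("name"), "height_in": feet * 12 + inches})
    return rows


if __name__ == "__main__":
    for row in parse_heights(SAMPLE):
        print(row)
```

A list of dicts like this can be handed directly to a data frame constructor; storing the height as a single number of inches makes the column easy to sort and compare.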

Excel VBA Web Scraping Returning Wrong Text in MSXML2.XMLHTTP method

∥☆過路亽.° submitted on 2021-02-11 06:29:38
Question: I am trying to extract the movie description from this URL: "https://ssl.ofdb.de/plot/138627,271359,I-Am-Legend". When I use the CreateObject("InternetExplorer.Application") method, it gives me the correct text, exactly as it appears on the website (but this method is slow). If I use MSXML2.XMLHTTP instead, some of the returned text is unreadable (but this method is fast). Output of the first method (no problem): Robert Neville (Will Smith) war ein hervorragender Wissenschaftler, aber auch er konnte
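
The excerpt ends before any diagnosis, so the cause here is unknown. One frequent source of unreadable characters in raw HTTP responses is decoding the body with a different character set than the one the server used. The sketch below illustrates that effect in Python; the sample string, the helper name decode_body, and the choice of ISO-8859-1 are assumptions made for illustration and are not confirmed for the page in question.

```python
def decode_body(raw: bytes, charset: str) -> str:
    """Decode a response body, marking undecodable bytes with U+FFFD."""
    return raw.decode(charset, errors="replace")


# Made-up German sample text with umlauts, encoded the way a server
# using ISO-8859-1 would send it.
original = "Überlebende in der Nähe"
raw = original.encode("iso-8859-1")

wrong = decode_body(raw, "utf-8")       # mismatched charset: umlauts are lost
right = decode_body(raw, "iso-8859-1")  # matching charset: text round-trips

if __name__ == "__main__":
    print(repr(wrong))
    print(repr(right))
```

If this is the cause, the fix is to read the raw bytes of the response and decode them with the charset declared in the Content-Type header or the page's meta tag.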

scrapy not giving any output

旧时模样 submitted on 2021-02-11 06:22:02
Question: I was following this link and was able to run a BaseSpider successfully. However, when I tried the same thing with a CrawlSpider, I did not get any output. My spider is as follows: from scrapy.contrib.spiders import CrawlSpider, Rule from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor from scrapy.http import request from scrapy.selector import HtmlXPathSelector from medsynergies.items import MedsynergiesItem class medsynergiesspider(CrawlSpider): name="medsynergies" allowed
