scrapy-spider

Export Python data to a CSV file

Submitted by 狂风中的少年 on 2020-01-06 23:49:00
Question: I'm trying to export my items via the command line:

    scrapy crawl tunisaianet -o save.csv -t csv

but nothing is happening. Any help? Here is my code:

    import scrapy
    import csv
    from tfaw.items import TfawItem

    class TunisianetSpider(scrapy.Spider):
        name = "tunisianet"
        allowed_domains = ["tunisianet.com.tn"]
        start_urls = [
            'http://www.tunisianet.com.tn/466-consoles-jeux/',
        ]

        def parse(self, response):
            item = TfawItem()
            data = []
            out = open('out.csv', 'a')
            x = response.xpath('//*[contains(@class, "ajax
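
Two details are worth checking here. The command runs tunisaianet while the spider is registered as name = "tunisianet", so Scrapy may not find the spider at all; and the -o save.csv feed export only writes items that parse yields, whereas this code writes to out.csv by hand. A minimal sketch of a yielding version (the CSS classes and field names are illustrative guesses, not taken from the site):

    import scrapy

    class TunisianetSpider(scrapy.Spider):
        name = "tunisianet"
        allowed_domains = ["tunisianet.com.tn"]
        start_urls = ['http://www.tunisianet.com.tn/466-consoles-jeux/']

        def parse(self, response):
            # Yield one item per product; the -o feed exporter writes these.
            for product in response.css('.product-container'):
                yield {
                    'title': product.css('h2 a::text').get(),
                    'price': product.css('.price::text').get(),
                }

With items being yielded, scrapy crawl tunisianet -o save.csv (note the spelling) produces the CSV without any manual file handling.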

Scrapy - Grab all product details

Submitted by 。_饼干妹妹 on 2020-01-06 04:38:09
Question: I need to grab all Product Details (with green tickmarks) from this page: https://sourceforge.net/software/product/Budget-Maestro/

    divs = response.xpath("//section[@class='row psp-section m-section-comm-details m-section-emphasized grey']/div[@class='list-outer column']/div")
    for div in divs:
        detail = div.xpath("./h3/text()").extract_first().strip() + ":"
        if detail != "Company Information:":
            divs2 = div.xpath(".//div[@class='list']/div")
            for div2 in divs2:
                dd = [val for val in div2.xpath(".
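
A sketch of one way to complete this loop, assuming the structure the question's XPath implies (one h3 heading per section, values listed under div.list); the selectors are untested against the live page:

    def parse(self, response):
        details = {}
        sections = response.xpath(
            "//section[contains(@class, 'm-section-comm-details')]"
            "/div[@class='list-outer column']/div"
        )
        for section in sections:
            heading = section.xpath("./h3/text()").extract_first(default="").strip()
            if heading == "Company Information":
                continue  # skipped in the question's code as well
            # Collect the text of every entry listed under this heading.
            values = [v.strip()
                      for v in section.xpath(".//div[@class='list']//text()").extract()
                      if v.strip()]
            details[heading] = values
        yield details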

How to get double quotes in Scrapy .csv results

Submitted by 廉价感情. on 2020-01-04 03:18:09
Question: I have a problem with quoting in my output from Scrapy. I am trying to scrape data that contains commas, and this results in double quotes around some columns, like so:

    TEST,TEST,TEST,ON,TEST,TEST,"$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"
    TEST,TEST,TEST,ON,TEST,TEST,"$2,895,000, 4,975 Sq Ft, 5 Bed, 4.1 Bath, Listed 01/03/2016"

Only columns containing commas get double-quoted. How can I double-quote all my data columns? I want Scrapy to output:

    "TEST","TEST","TEST","ON","TEST"
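
One approach (not taken from the original thread) is to subclass Scrapy's CsvItemExporter, which forwards extra keyword arguments to the underlying csv.writer, and force quoting on every field. A sketch, assuming a project module named myproject:

    # myproject/exporters.py
    import csv
    from scrapy.exporters import CsvItemExporter

    class QuoteAllCsvItemExporter(CsvItemExporter):
        """Quote every field, not only those containing commas."""
        def __init__(self, *args, **kwargs):
            kwargs['quoting'] = csv.QUOTE_ALL  # handed through to csv.writer
            super().__init__(*args, **kwargs)

Then register it so the csv format uses the subclass:

    # myproject/settings.py
    FEED_EXPORTERS = {
        'csv': 'myproject.exporters.QuoteAllCsvItemExporter',
    }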

How can I reuse the parse method of my scrapy Spider-based spider in an inheriting CrawlSpider?

Submitted by 久未见 on 2020-01-03 17:22:46
Question: I currently have a Spider-based spider that I wrote for crawling an input JSON array of start_urls:

    from scrapy.spider import Spider
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from foo.items import AtlanticFirearmsItem
    from scrapy.contrib.loader import ItemLoader
    import json
    import datetime
    import re

    class AtlanticFirearmsSpider(Spider):
        name = "atlantic_firearms"
        allowed_domains = ["atlanticfirearms.com"]

        def __init_
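
One common refactor (a sketch, not necessarily the thread's answer): CrawlSpider reserves parse for its own link-following, so move the extraction logic into a separately named method, have the original spider's parse delegate to it, and point the CrawlSpider's rules at the shared method. The /product/ pattern below is a hypothetical placeholder; the imports match the question's older scrapy.contrib paths:

    from scrapy.spider import Spider
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

    class AtlanticFirearmsSpider(Spider):
        name = "atlantic_firearms"
        allowed_domains = ["atlanticfirearms.com"]

        def parse(self, response):
            # The Spider entry point simply delegates to the shared logic.
            return self.parse_item(response)

        def parse_item(self, response):
            # ... the existing extraction logic moves here, unchanged ...
            pass

    class AtlanticFirearmsCrawlSpider(CrawlSpider, AtlanticFirearmsSpider):
        name = "atlantic_firearms_crawl"
        start_urls = ["http://www.atlanticfirearms.com/"]
        rules = (
            # CrawlSpider follows links and calls the inherited parse_item
            # on each match; parse itself must not be overridden here.
            Rule(SgmlLinkExtractor(allow=[r'/product/']), callback='parse_item'),
        )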

Scrapy ERROR: Error downloading - Could not open CONNECT tunnel

Submitted by ε祈祈猫儿з on 2020-01-03 16:54:56
Question: I have written a spider to crawl https://tecnoblog.net/categoria/review/, but when I let the spider crawl, I get this error:

    2015-05-19 15:13:20+0100 [scrapy] INFO: Scrapy 0.24.5 started (bot: reviews)
    2015-05-19 15:13:20+0100 [scrapy] INFO: Optional features available: ssl, http11
    2015-05-19 15:13:20+0100 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'reviews.spiders', 'SPIDER_MODULES': ['reviews.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'reviews'}
    2015-05-19 15:13:20+0100
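
"Could not open CONNECT tunnel" usually means the request went through a proxy and the CONNECT handshake to it failed; Scrapy's HttpProxyMiddleware silently picks up a proxy from the http_proxy/https_proxy environment variables, and since the log above stops before the traceback, this is a guess rather than a certain diagnosis. A quick check from Python:

    import os

    # A stale or broken proxy inherited from the environment is a common
    # cause of CONNECT tunnel errors; print whatever the middleware sees.
    for var in ('http_proxy', 'https_proxy', 'HTTP_PROXY', 'HTTPS_PROXY'):
        if os.environ.get(var):
            print('proxy inherited from environment:', var, '=', os.environ[var])

If one of these is set, unsetting it (or disabling HttpProxyMiddleware in the project settings) before rerunning the crawl should confirm whether the proxy is at fault.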

Retrying a Scrapy Request even when receiving a 200 status code

Submitted by 浪子不回头ぞ on 2020-01-03 09:24:54
Question: There is a website I'm scraping that will sometimes return a 200 but have no text in response.body (which raises an AttributeError when I try to parse it with Selector). Is there a simple way to check that the body includes text and, if not, retry the request until it does? Here is some pseudocode outlining what I'm trying to do:

    def check_response(response):
        if response.body != '':
            return response
        else:
            return Request(copy_of_response.request, callback=check_response)

Basically, is
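
One way to implement the retry the question describes (a sketch, not the thread's accepted answer) is a downloader middleware that re-queues empty-bodied responses. Note that _retry is an internal helper of Scrapy's RetryMiddleware, so this is version-dependent, and the myproject.middlewares path is a placeholder:

    from scrapy.downloadermiddlewares.retry import RetryMiddleware

    class EmptyBodyRetryMiddleware(RetryMiddleware):
        """Retry responses that come back 200 but with an empty body."""

        def process_response(self, request, response, spider):
            if not response.body:
                # _retry honours RETRY_TIMES and returns None once the
                # retry budget is exhausted, so fall back to the response.
                return self._retry(request, 'empty body', spider) or response
            # Keep the stock behaviour (retrying 5xx codes and so on).
            return super().process_response(request, response, spider)

Registered in settings.py in place of the stock middleware, whose behaviour the subclass already includes:

    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
        'myproject.middlewares.EmptyBodyRetryMiddleware': 550,
    }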

Portia/Scrapy - how to replace or add values to output JSON

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-03 05:28:04
Question: Just two quick questions:

1. I want my final JSON file to replace the extracted text (for example, the extracted text is ADD TO CART, but I want it to read IN STOCK in my final JSON). Is that possible?

2. I would also like to add some custom data to my final JSON file that is not on the website, for example a "Store name", so that every product I scrape carries the store name after it. Is that possible?

I am using both Portia and Scrapy, so suggestions are welcome for both platforms. My Scrapy spider
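
Both can be done in an ordinary Scrapy item pipeline; Portia projects are regular Scrapy projects underneath, so the same pipeline applies there. A sketch, assuming the items carry an 'availability' field (the field name and store name are placeholders, not taken from the question):

    class NormalizeItemPipeline(object):
        """Rewrite scraped values and inject constant fields before export."""

        STORE_NAME = 'My Store'  # constant that never appears on the site

        def process_item(self, item, spider):
            # 1. Replace the extracted button text with the desired label.
            if item.get('availability') == 'ADD TO CART':
                item['availability'] = 'IN STOCK'
            # 2. Attach custom data to every scraped product.
            item['store_name'] = self.STORE_NAME
            return item

Enable it in settings.py with ITEM_PIPELINES = {'myproject.pipelines.NormalizeItemPipeline': 300}.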

Why this inconsistent behaviour when printing results in the scrapy shell?

Submitted by 我的梦境 on 2020-01-03 01:34:07
Question: Load the scrapy shell:

    scrapy shell "http://www.worldfootball.net/all_matches/eng-premier-league-2015-2016/"

Try a selector:

    response.xpath('(//table[@class="standard_tabelle"])[1]/tr[not(th)]')

Note that it prints results. But now use that selector in a for statement:

    for row in response.xpath('(//table[@class="standard_tabelle"])[1]/tr[not(th)]'):
        row.xpath(".//a[contains(@href, 'report')]/@href").extract_first()

Hit return twice, and nothing is printed. To print results inside the for loop, you
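
This is standard Python REPL behaviour rather than a Scrapy quirk: the interactive shell only auto-echoes the value of a bare top-level expression, and a for loop is a compound statement, so values produced in its body are never echoed. Printing explicitly restores the output:

    for row in response.xpath('(//table[@class="standard_tabelle"])[1]/tr[not(th)]'):
        # Inside a loop the shell does not echo values; print them instead.
        print(row.xpath(".//a[contains(@href, 'report')]/@href").extract_first())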

ModuleNotFoundError: No module named 'Scrapy'

Submitted by て烟熏妆下的殇ゞ on 2020-01-02 10:22:24
Question: Here is my code:

    import Scrapy

    class NgaSpider(Scrapy.Spider):
        name = "NgaSpider"
        host = "http://bbs.ngacn.cc/"
        start_urls = [
            "http://bbs.ngacn.cc/thread.php?fid=406",
        ]

        def parse(self, response):
            print("response.body")

Error:

    ModuleNotFoundError: No module named 'Scrapy'

What is going on, and how do I fix this issue?

Answer 1: You are importing the scrapy module incorrectly: the package name is all lowercase. A simple tutorial and references can be found here. You have to make the following changes:

    import scrapy  # Change here

    class NgaSpider(scrapy.Spider):  #
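
For reference, a complete corrected version of the snippet (the only functional change is the lowercase module name; printing response.body rather than the string literal is a further guess at the intent):

    import scrapy  # module names are case-sensitive: scrapy, not Scrapy

    class NgaSpider(scrapy.Spider):
        name = "NgaSpider"
        host = "http://bbs.ngacn.cc/"
        start_urls = [
            "http://bbs.ngacn.cc/thread.php?fid=406",
        ]

        def parse(self, response):
            # Print the body itself, not the string "response.body".
            print(response.body)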

Twisted critical unhandled error on the Scrapy tutorial

Submitted by 南楼画角 on 2020-01-02 04:57:06
Question: I'm new to programming and I'm trying to learn Scrapy by following the Scrapy tutorial: http://doc.scrapy.org/en/latest/intro/tutorial.html

So I ran the "scrapy crawl dmoz" command and got this error:

    2015-07-14 16:11:02 [scrapy] INFO: Scrapy 1.0.1 started (bot: tutorial)
    2015-07-14 16:11:02 [scrapy] INFO: Optional features available: ssl, http11
    2015-07-14 16:11:02 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'}