pipeline

Writing items to a MySQL database in Scrapy

I am new to Scrapy and have the following spider code:

    class Example_spider(BaseSpider):
        name = "example"
        allowed_domains = ["www.example.com"]

        def start_requests(self):
            yield self.make_requests_from_url("http://www.example.com/bookstore/new")

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            urls = hxs.select('//div[@class="bookListingBookTitle"]/a/@href').extract()
            for i in urls:
                yield Request(urljoin("http://www.example.com/", i[1:]), callback=self.parse_url)

        def parse_url(self, response):
            hxs = HtmlXPathSelector(response)
            main = hxs.select('//div[@id="bookshelf-bg"]')
            items = []
            for i in
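The excerpt above is cut off before any pipeline code, but writing items to MySQL is normally done in an item pipeline. Below is a minimal sketch, assuming the MySQLdb/mysqlclient driver, a hypothetical books table with title and url columns, and placeholder credentials; adapt all of these to your own schema.

    import MySQLdb

    class MySQLStorePipeline(object):
        """Sketch of a pipeline that inserts each scraped item into MySQL."""

        def open_spider(self, spider):
            # Placeholder connection details -- replace with your own.
            self.conn = MySQLdb.connect(host="localhost", user="root",
                                        passwd="secret", db="scrapydb",
                                        charset="utf8mb4")
            self.cur = self.conn.cursor()

        def close_spider(self, spider):
            self.cur.close()
            self.conn.close()

        def process_item(self, item, spider):
            self.cur.execute("INSERT INTO books (title, url) VALUES (%s, %s)",
                             (item.get("title"), item.get("url")))
            self.conn.commit()
            return item

The pipeline then has to be enabled in settings.py, e.g. ITEM_PIPELINES = {"myproject.pipelines.MySQLStorePipeline": 300} (the dotted path here is hypothetical).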

Should I create a pipeline to save files with Scrapy?

Question: I need to save a file (.pdf) but I'm unsure how to do it. I need to save .pdfs and store them in such a way that they are organized in directories much like they are stored on the site I'm scraping them from. From what I can gather I need to make a pipeline, but from what I understand pipelines save "Items", and "items" are just basic data like strings/numbers. Is saving files a proper use of pipelines, or should I save the file in the spider instead?

Answer 1: Yes and no[1]. If you fetch a pdf it will be
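The answer above is truncated, but it is worth noting that recent Scrapy versions ship a built-in FilesPipeline that downloads and stores files referenced by items. A minimal sketch, assuming items carry a file_urls field and that you want saved paths to mirror the URL paths on the source site (the class name and FILES_STORE location are made up):

    # settings.py
    ITEM_PIPELINES = {"myproject.pipelines.SitePathFilesPipeline": 1}
    FILES_STORE = "/data/downloads"   # root directory for downloaded files

    # pipelines.py
    from urllib.parse import urlparse
    from scrapy.pipelines.files import FilesPipeline

    class SitePathFilesPipeline(FilesPipeline):
        """Store each file under a path mirroring its URL path on the source site."""

        def file_path(self, request, response=None, info=None, *, item=None):
            # e.g. http://example.com/docs/a/b.pdf -> docs/a/b.pdf under FILES_STORE
            return urlparse(request.url).path.lstrip("/")

With this, the spider only yields items containing a file_urls list; the pipeline downloads each URL and records where it was stored in a files field.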

Bash error: Integer expression expected

In the sections below, you'll see the shell script I am trying to run on a UNIX machine, along with a transcript. When I run this program, it gives the expected output, but it also gives an error shown in the transcript. What could be the problem and how can I fix it?

First, the script:

    #!/usr/bin/bash
    while read A B C D E F
    do
        E=`echo $E | cut -f 1 -d "%"`
        if test $# -eq 2
        then
            I=`echo $2`
        else
            I=90
        fi
        if test $E -ge $I
        then
            echo $F
        fi
    done

And the transcript of running it:

    $ df -k | ./filter.sh -c 50
    ./filter.sh: line 12: test: capacity: integer expression expected
    /etc/svc/volatile
    /var/run

sklearn pipeline - how to apply different transformations on different columns

I am pretty new to pipelines in sklearn and I am running into this problem: I have a dataset that is a mixture of text and numbers, i.e. certain columns contain only text and the rest contain integers (or floating point numbers). I was wondering whether it is possible to build a pipeline where I can, for example, call LabelEncoder() on the text features and MinMaxScaler() on the numeric columns. The examples I have seen on the web mostly point towards using LabelEncoder() on the entire dataset and not on select columns. Is this possible? If so, any pointers would be greatly appreciated.

maxymoo: The way I
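The answer is cut off above, but in current scikit-learn this per-column routing is exactly what ColumnTransformer does. A minimal sketch with invented column names (text_col, num_a, num_b); note that OneHotEncoder is used for the text column, since LabelEncoder is intended for target labels rather than features:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
    from sklearn.pipeline import Pipeline
    from sklearn.linear_model import LogisticRegression

    # Hypothetical frame: one text column, two numeric columns.
    X = pd.DataFrame({"text_col": ["a", "b", "a", "c"],
                      "num_a": [1.0, 2.0, 3.0, 4.0],
                      "num_b": [10, 20, 30, 40]})
    y = [0, 1, 0, 1]

    preprocess = ColumnTransformer([
        ("text", OneHotEncoder(handle_unknown="ignore"), ["text_col"]),
        ("nums", MinMaxScaler(), ["num_a", "num_b"]),
    ])

    model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
    model.fit(X, y)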

How does the PowerShell Pipeline Concept work?

Question: I understand that PowerShell piping works by taking the output of one cmdlet and passing it to another cmdlet as input. But how does it go about doing this? Does the first cmdlet finish and then pass all the output variables across at once, which are then processed by the next cmdlet? Or is each output from the first cmdlet taken one at a time and run through all of the remaining piped cmdlets?

Answer 1: You can see how pipeline order works with a simple bit of script: function a {begin

Implementing pipelining in C. What would be the best way to do that?

Question: I can't think of any way to implement pipelining in C that would actually work. That's why I've decided to write here. I should say that I understand how pipe/fork/mkfifo work. I've seen plenty of examples implementing 2-3 stage pipelines; that's easy. My problem starts when I have to implement a shell and the number of pipeline stages is unknown. What I've got now: for example, given

    ls -al | tr a-z A-Z | tr A-Z a-z | tr a-z A-Z

I transform such a line into something like this:

    array[0] = {"ls", "-al", NULL}
    array[1] =
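The question is about C, but the general pattern (one pipe per adjacent pair of commands, fork each stage, dup2 the pipe ends onto stdin/stdout, then exec) is language-independent. As a sketch of that loop over an arbitrary number of stages, here is the same sequence of POSIX calls expressed with Python's os module; a C version uses pipe(), fork(), dup2() and execvp() in exactly the same order.

    import os

    def run_pipeline(commands):
        """Run e.g. [["ls", "-al"], ["tr", "a-z", "A-Z"]] as `ls -al | tr a-z A-Z`."""
        prev_read = None                      # read end of the previous stage's pipe
        for i, argv in enumerate(commands):
            last = (i == len(commands) - 1)
            if not last:
                read_fd, write_fd = os.pipe() # pipe feeding the next stage
            pid = os.fork()
            if pid == 0:                      # child: wire up fds, then exec
                if prev_read is not None:
                    os.dup2(prev_read, 0)     # stdin <- previous stage
                if not last:
                    os.dup2(write_fd, 1)      # stdout -> next stage
                    os.close(read_fd)
                try:
                    os.execvp(argv[0], argv)
                except OSError:
                    os._exit(127)
            # parent: close ends it no longer needs so readers see EOF
            if prev_read is not None:
                os.close(prev_read)
            if not last:
                os.close(write_fd)
                prev_read = read_fd
        while True:                           # wait for every stage to finish
            try:
                os.wait()
            except ChildProcessError:
                break

    run_pipeline([["ls", "-al"], ["tr", "a-z", "A-Z"],
                  ["tr", "A-Z", "a-z"], ["tr", "a-z", "A-Z"]])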

Functional pipes in Python like %>% from R's magrittr

Question: In R (thanks to magrittr) you can now perform operations with a more functional piping syntax via %>%. This means that instead of coding this:

    > as.Date("2014-01-01")
    > as.character((sqrt(12)^2))

you could also do this:

    > "2014-01-01" %>% as.Date
    > 12 %>% sqrt %>% .^2 %>% as.character

To me this is more readable, and it extends to use cases beyond the data frame. Does the Python language have support for something similar?

Answer 1: One possible way of doing this is by using a module called
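The module named in the answer is cut off, but the underlying idea is small enough to sketch without any dependency: fold a starting value through a sequence of callables (roughly what helpers such as toolz.pipe provide). The pipe function below is illustrative, not a standard library API:

    from functools import reduce
    from datetime import date
    import math

    def pipe(value, *funcs):
        """Thread `value` through each function in turn: pipe(x, f, g) == g(f(x))."""
        return reduce(lambda acc, fn: fn(acc), funcs, value)

    # Rough equivalents of the two R examples above:
    print(pipe("2014-01-01", date.fromisoformat))     # a datetime.date object
    print(pipe(12, math.sqrt, lambda x: x ** 2, str)) # the string form of sqrt(12)^2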

Scrapy pipeline spider_opened and spider_closed not being called

Question: I am having some trouble with a Scrapy pipeline. My information is being scraped from sites fine and the process_item method is being called correctly. However, the spider_opened and spider_closed methods are not being called.

    class MyPipeline(object):
        def __init__(self):
            log.msg("Initializing Pipeline")
            self.conn = None
            self.cur = None

        def spider_opened(self, spider):
            log.msg("Pipeline.spider_opened called", level=log.DEBUG)

        def spider_closed(self, spider):
            log.msg("Pipeline.spider_closed
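The excerpt stops before an answer, but the usual explanation is that Scrapy only calls process_item on a pipeline automatically; methods named spider_opened and spider_closed are not wired to anything unless you connect them to the corresponding signals yourself. In current Scrapy the simpler route is to name the hooks open_spider and close_spider, which the engine calls for you; a minimal sketch using the standard logging module:

    import logging

    logger = logging.getLogger(__name__)

    class MyPipeline(object):
        def open_spider(self, spider):
            # Called automatically when the spider starts.
            logger.debug("Pipeline.open_spider called for %s", spider.name)
            self.conn = None
            self.cur = None

        def close_spider(self, spider):
            # Called automatically when the spider finishes.
            logger.debug("Pipeline.close_spider called for %s", spider.name)

        def process_item(self, item, spider):
            return item

Older answers instead connect the methods to Scrapy's spider_opened and spider_closed signals (e.g. from a from_crawler classmethod), which achieves the same effect.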

Writing console output to a file - file is unexpectedly empty

Question: I'm new to scripting and I am trying to write the information returned about a VM to a text file. My script looks like this:

    Connect-VIServer -Server 192.168.255.255 -Protocol https -User xxxx -Password XXXXXX
    Get-VM -Name xxxxxx
    Get-VM xxxxx | Get-HardDisk | Select Parent, Name, Filename, DiskType, Persistence | FT -AutoSize
    Out-File -FilePath C:Filepath

I am able to connect to the VM, retrieve the HDD info and see it in the console. The file is created where I want it and is correctly named

What determines whether the PowerShell pipeline will unroll a collection?

Question:

    # array
    C:\> (1,2,3).count
    3
    C:\> (1,2,3 | measure).count
    3

    # hashtable
    C:\> @{1=1; 2=2; 3=3}.count
    3
    C:\> (@{1=1; 2=2; 3=3} | measure).count
    1

    # array returned from function
    C:\> function UnrollMe { $args }
    C:\> (UnrollMe a,b,c).count
    3
    C:\> (UnrollMe a,b,c | measure).count
    1
    C:\> (1,2,3).gettype() -eq (UnrollMe a,b,c).gettype()
    True

The discrepancy with hashtables is fairly well known, although the official documentation only mentions it obliquely (via example). The issue with functions,