I'd been trying to scrape some data from an ASP.NET website. The start page should be the following one: http://www.e3050.com/Items.aspx?cat=SON
First, I want to displa
I did not study your code extensively, but I see something strange:
# Get last page number
last_page = hxs.select('//span[@id="ctl00_ctl00_ContentPlaceHolder1_ItemListPlaceHolder_lbl_PageSize"]/text()').extract()[0]
i = 1
# preparing requests for each page
while i < (int(last_page) / 5) + 1:
    requests.append(Request("http://www.e3050.com/Items.aspx?cat=SON", callback=self.parse_product))
    i += 1
First, instead of these manipulations with i, you can do:
for i in xrange(1, int(last_page) // 5 + 1):
Then you do:
requests.append(Request("http://www.e3050.com/Items.aspx?cat=SON", callback=self.parse_product))
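This creates many requests to the same URL. Is that what you intend? Scrapy filters out duplicate requests by default, so only the first of them would actually be fetched.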
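If the goal is one request per results page, each request has to differ in some way. Below is a minimal sketch of that idea in plain Python. It assumes the site accepts a page number as a query parameter; the parameter name `page` and the helper `page_urls` are hypothetical and not taken from the site. An ASP.NET site may instead paginate through form postbacks, in which case the URL stays the same and the requests would have to differ in their form data.

```python
BASE_URL = "http://www.e3050.com/Items.aspx?cat=SON"


def page_urls(last_page_text, items_per_page=5, page_param="page"):
    """Build one distinct URL per results page.

    last_page_text is the string extracted from the page, so it is
    converted with int() before any arithmetic. page_param is a
    hypothetical query parameter name (an assumption, not the site's API).
    """
    # Same page count as the original while loop: int(last_page) // 5
    page_count = int(last_page_text) // items_per_page
    return ["%s&%s=%d" % (BASE_URL, page_param, n)
            for n in range(1, page_count + 1)]


# Example with a made-up value for the extracted text
urls = page_urls("23")
```

Each URL in `urls` could then be passed to `Request(url, callback=self.parse_product)` in place of the repeated constant URL.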