Scrapy + splash: can't select element

房东的猫 submitted on 2019-12-31 17:24:38

Question


I'm learning to use Scrapy with Splash. As an exercise, I'm trying to visit https://www.ubereats.com/stores/, click on the address text box, enter a location, and then press Enter to move to the next page, which lists the restaurants available for that location. I have the following Lua code:

function main(splash)
  local url = splash.args.url
  assert(splash:go(url))
  assert(splash:wait(5))

  local element = splash:select('.base_29SQWm')
  local bounds = element:bounds()
  assert(element:mouseclick{x = bounds.width/2, y = bounds.height/2})
  assert(element:send_text("Wall Street"))
  assert(splash:send_keys("<Return>"))
  assert(splash:wait(5))

  return {
    html = splash:html(),
  }
end

When I click on "Render!" in the splash API, I get the following error message:

  {
      "info": {
          "message": "Lua error: [string \"function main(splash)\r...\"]:7: attempt to index local 'element' (a nil value)",
          "type": "LUA_ERROR",
          "error": "attempt to index local 'element' (a nil value)",
          "source": "[string \"function main(splash)\r...\"]",
          "line_number": 7
      },
      "error": 400,
      "type": "ScriptError",
      "description": "Error happened while executing Lua script"
  }

Apparently my CSS selector doesn't match anything, so splash:select returns nil and the script fails as soon as it tries to use the element. I've tried other selectors, but I can't figure out one that works.

Q: Does anyone know how to solve this problem?

EDIT: Even though I still would like to know how to actually click on the element, I figured out how to get the same result by just using keys:

function main(splash)
    local url = splash.args.url
    assert(splash:go(url))
    assert(splash:wait(5))
    splash:send_keys("<Tab>")
    splash:send_keys("<Tab>")
    splash:send_text("Wall Street, New York")
    splash:send_keys("<Return>")
    assert(splash:wait(10))

    return {
        html = splash:html(),
        png = splash:png(),
    }
end

However, the HTML and screenshot returned by Splash are from the page where you enter the address, not the page you see after entering the address and pressing Enter.

Q2: How do I successfully load the second page?


Answer 1:


Not a complete solution, but here is what I have so far:

import json
import re

import scrapy
from scrapy_splash import SplashRequest


class UberEatsSpider(scrapy.Spider):
    name = "ubereatspider"
    allowed_domains = ["ubereats.com"]

    def start_requests(self):
        script = """
        function main(splash)
            local url = splash.args.url
            assert(splash:go(url))
            assert(splash:wait(10))

            splash:set_viewport_full()

            local search_input = splash:select('#address-selection-input')
            search_input:send_text("Wall Street, New York")
            assert(splash:wait(5))

            local submit_button = splash:select('button[class^=submitButton_]')
            submit_button:click()

            assert(splash:wait(10))

            return {
                html = splash:html(),
                png = splash:png(),
            }
        end
        """
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
        }
        yield SplashRequest('https://www.ubereats.com/new_york/', self.parse, endpoint='execute', args={
            'lua_source': script,
            'wait': 5
        }, splash_headers=headers, headers=headers)

    def parse(self, response):
        script = response.xpath("//script[contains(., 'cityName')]/text()").extract_first()
        if script is None:
            # The state script was not rendered; nothing to parse.
            return

        # The restaurant data is embedded as JSON in window.INITIAL_STATE.
        pattern = re.compile(r"window\.INITIAL_STATE = (\{.*?\});", re.MULTILINE | re.DOTALL)
        match = pattern.search(script)
        if match:
            data = json.loads(match.group(1))
            for place in data["marketplace"]["marketplaceStores"]["data"]["entity"]:
                print(place["title"])

Note the changes in the Lua script: I locate the search input, send the search text to it, then locate the "Find" button and click it. In the screenshot, the search results never appeared no matter how long a delay I set, but I did manage to get the restaurant names from the contents of a script element on the page. The place objects contain all the information needed to filter the desired restaurants.
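
To make the extraction step easier to try without Splash, here is a minimal sketch of the same regex-plus-JSON approach run against a small, made-up HTML fragment. SAMPLE_HTML, the store names, and the extract_store_titles helper are assumptions for illustration only; the real state object is much larger and its layout may differ.

```python
import json
import re

# Hypothetical page fragment; the real embedded state is far larger.
SAMPLE_HTML = """
<script>
window.INITIAL_STATE = {"marketplace": {"marketplaceStores": {"data": {"entity": [
    {"title": "Example Deli", "uuid": "a1"},
    {"title": "Sample Sushi", "uuid": "b2"}
]}}}};
</script>
"""

# Capture the object literal assigned to window.INITIAL_STATE.
# The non-greedy match stops at the first "};", so it would break
# if a string value inside the JSON happened to contain "};".
STATE_RE = re.compile(r"window\.INITIAL_STATE\s*=\s*(\{.*?\});", re.DOTALL)


def extract_store_titles(html):
    """Return store titles from the embedded state, or [] if it is missing."""
    match = STATE_RE.search(html)
    if match is None:
        return []
    state = json.loads(match.group(1))
    stores = state["marketplace"]["marketplaceStores"]["data"]["entity"]
    return [store["title"] for store in stores]


titles = extract_store_titles(SAMPLE_HTML)
print(titles)
```

Returning an empty list when the pattern is absent keeps the caller from failing on pages where the state script was not rendered.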

Also note that the URL I'm navigating to is the "New York" one (not the general "stores").

I'm not completely sure why the search results page is not being loaded, but I hope this is a good starting point that you can improve further.
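
One possible improvement, offered as an untested sketch rather than a confirmed fix: instead of fixed delays, poll for the element and fail with a clear message if it never appears. splash:select returns nil when nothing matches, which is what produced the "attempt to index local 'element' (a nil value)" error in the question. The argument names selector and max_wait, and the build_splash_args helper, are made up for this example; they are read in Lua through splash.args.

```python
# Lua script that retries splash:select until the element exists
# or the (hypothetical) max_wait budget, in seconds, is used up.
WAIT_FOR_ELEMENT_LUA = """
function main(splash)
    assert(splash:go(splash.args.url))

    local element = nil
    local waited = 0
    while waited < splash.args.max_wait do
        element = splash:select(splash.args.selector)
        if element then break end
        assert(splash:wait(0.5))
        waited = waited + 0.5
    end

    if not element then
        error("no element matched " .. splash.args.selector)
    end

    return {
        html = splash:html(),
    }
end
"""


def build_splash_args(selector, max_wait=15):
    """Build the args dict for an 'execute' request using the script above."""
    return {
        "lua_source": WAIT_FOR_ELEMENT_LUA,
        "selector": selector,
        "max_wait": max_wait,
    }


args = build_splash_args("#address-selection-input")
print(sorted(args))
```

The resulting dict could be passed as args to SplashRequest with endpoint='execute', as in the spider above. Whether the selector still matches the live page is not something this sketch can guarantee.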



Source: https://stackoverflow.com/questions/41632784/scrapy-splash-cant-select-element
