Question
What I'm trying to do
On avito.ru (a Russian real estate site), a person's phone number is hidden until you click a button. I want to collect the phone number using Scrapy + Splash.
Example URL: https://www.avito.ru/moskva/kvartiry/2-k_kvartira_84_m_412_et._992361048
After you click the button, a pop-up is displayed and the phone number is visible.
I'm using the Splash execute API with the following Lua script:
function main(splash)
    splash:go(splash.args.url)
    splash:wait(10)
    splash:runjs("document.getElementsByClassName('item-phone-button')[0].click()")
    splash:wait(10)
    return splash:png()
end
Problem
The button is not clicked and the phone number is not displayed. It's a trivial task, and I have no explanation for why it doesn't work.
Clicking works fine for another field on the same page if we replace item-phone-button with js-show-stat. So JavaScript works in general, and the blue "Display phone" button must be special somehow.
What I've tried
To isolate the problem, I created a repo with a minimal example script and a docker-compose file for Splash: https://github.com/alexanderlukanin13/splash-avito-phone
The JavaScript code is valid; you can verify it in the JavaScript console in Chrome and Firefox:
document.getElementsByClassName('item-phone-button')[0].click()
I've tried it with Splash versions 3.0, 3.1 and 3.2; the result is the same.
Update
I've also tried:
- @Lore's suggestions, including the simulateClick() approach (see simulate_click branch)
- mouseDown/mouseUp events as described here: Simulating a mousedown, click, mouseup sequence in Tampermonkey? (see trigger_mouse_event branch)
Answer 1:
The following script works for me:
function main(splash, args)
    splash.private_mode_enabled = false
    assert(splash:go(args.url))
    btn = splash:select_all('.item-phone-button')[2]
    btn:mouse_click()
    btn.style.border = "5px solid black"
    assert(splash:wait(0.5))
    return {
        num = #splash:select_all('.item-phone-button'),
        html = splash:html(),
        png = splash:png(),
        har = splash:har(),
    }
end
There were 2 issues with the original solution:
- There are 2 elements with the 'item-phone-button' class, and the button of interest is the second one. I checked which element is matched by setting btn.style.border = "5px solid black".
- This website requires private mode to be disabled, likely because it uses localStorage. Check http://splash.readthedocs.io/en/stable/faq.html#website-is-not-rendered-correctly for other common suggestions.
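A complementary way to see which match is the one of interest is to list every element with that class along with its size. The snippet below is only a sketch: describeMatches is a hypothetical helper (not part of Splash or of the page), and it assumes the unwanted match is hidden or zero-sized, which nothing above confirms for this site.

```javascript
// Hypothetical helper for inspecting several elements that share a class.
// `elements` is any array-like of DOM elements, for example the result of
// document.getElementsByClassName('item-phone-button').
function describeMatches(elements) {
  return Array.from(elements).map((el, index) => {
    const rect = el.getBoundingClientRect();
    return {
      index, // 0-based in JavaScript; the Lua table from select_all is 1-based
      tag: el.tagName,
      text: (el.textContent || "").trim(),
      // Assumption: a zero-sized box means the element is not shown.
      visible: rect.width > 0 && rect.height > 0,
    };
  });
}
```

In a browser console this could be called as describeMatches(document.getElementsByClassName('item-phone-button')); an entry at JavaScript index 1 corresponds to [2] in the Lua script.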
Answer 2:
I don't know how your implementation works, but I suggest renaming main to parse, the default function called by spiders on start.
If this isn't the problem, the first thing to do is to check, using JavaScript with a CSS selector, that you have picked the right element of that class. Maybe another element with the item-phone-button class attribute exists and you are clicking in the wrong place.
If all of the above is correct, I suggest two options that worked for me:
-- Select the button with a CSS class selector, then click it.
local button = splash:select('.item-phone-button')
button:mouse_click()
function main(splash)
    splash:go(splash.args.url)
    -- The JavaScript snippet must define main(splash) and call splash.resume().
    splash:wait_for_resume([[
        function main(splash) {
            document.getElementsByClassName('item-phone-button')[0].click();
            splash.resume();
        }
    ]])
    return splash:png()
end
EDIT: it seems better to use dispatchEvent instead of click(), as in this example:
function simulateClick() {
    var event = new MouseEvent('click', {
        view: window,
        bubbles: true,
        cancelable: true
    });
    var cb = document.getElementById('checkbox');
    var cancelled = !cb.dispatchEvent(event);
    if (cancelled) {
        // A handler called preventDefault.
        alert("cancelled");
    } else {
        // None of the handlers called preventDefault.
        alert("not cancelled");
    }
}
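The cancelled check above relies on the return value of dispatchEvent. As a rough, self-contained illustration, the sketch below swaps the DOM element and MouseEvent for a generic EventTarget and Event, so it can also run outside a browser (for example in a recent Node.js). The helper name dispatchClick and both targets are made up for the example.

```javascript
// Sketch only: generic EventTarget/Event stand in for a DOM element and MouseEvent.
function dispatchClick(target) {
  const event = new Event("click", { bubbles: true, cancelable: true });
  // dispatchEvent returns false only when the event is cancelable and
  // at least one handler called preventDefault().
  return target.dispatchEvent(event);
}

// A target whose handler just counts clicks.
const plainTarget = new EventTarget();
let plainClicks = 0;
plainTarget.addEventListener("click", () => { plainClicks += 1; });

// A target whose handler cancels the event.
const blockingTarget = new EventTarget();
blockingTarget.addEventListener("click", (e) => e.preventDefault());

const plainResult = dispatchClick(plainTarget);       // true: nothing cancelled it
const blockingResult = dispatchClick(blockingTarget); // false: preventDefault was called
console.log(plainResult, blockingResult, plainClicks);
```

Either way the registered handlers run; the return value only reports whether one of them cancelled the event.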
Source: https://stackoverflow.com/questions/49276401/scrapy-splash-click-button-doesnt-work