dryscrape: “No route found for…”

Submitted by 假装没事ソ on 2019-12-11 06:06:06

Question


Context:

I am trying to code my own money aggregator because most of the tools available on the market do not cover all financial websites yet. I am using Python 2.7.9 on a Raspberry Pi.

I have managed to connect to 2 of my accounts so far (one crowd-lending website and one for my pension) thanks to the requests library. The third website I am trying to aggregate has been giving me a hard time for two weeks now; its name is https://www.amundi-ee.com.

I figured out that the website is actually using JavaScript, and after much research I ended up using dryscrape (I cannot use Selenium because ARM is no longer supported).

Issue:

When running this code:

import dryscrape

url='https://www.amundi-ee.com'
extensionInit='/psf/#login'
extensionConnect='/psf/authenticate'
extensionResult='/psf/#'
urlInit = url + extensionInit
urlConnect = url + extensionConnect
urlResult = url + extensionResult

s = dryscrape.Session()
s.visit(urlInit)
print s.body()
login = s.at_xpath('//*[@id="identifiant"]')
login.set("XXXXXXXX")
pwd = s.at_xpath('//*[@name="password"]')
pwd.set("YYYYYYY")
# Push the button
login.form().submit()
s.visit(urlConnect)   # NB: visit() issues a GET request on /psf/authenticate
print s.body()
s.visit(urlResult)

The issue occurs when the code visits urlConnect: the body printed right after that call returns the following:

{"code":405,"message":"No route found for \u0022GET \/authenticate\u0022: Method Not Allowed (Allow: POST)","errors":[]}
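The JSON body decodes to a plain HTTP 405: the /psf/authenticate route only accepts POST, while dryscrape's visit() issues a GET, so that call can never succeed. The snippet below only parses the exact response body shown above to make the decoded message visible:

```python
import json

# The exact body returned by the server (copied verbatim from above).
body = r'{"code":405,"message":"No route found for \u0022GET \/authenticate\u0022: Method Not Allowed (Allow: POST)","errors":[]}'
err = json.loads(body)

# err["code"] is 405; the decoded message reads:
#   No route found for "GET /authenticate": Method Not Allowed (Allow: POST)
# i.e. the route exists, but only for the POST method.
```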

Question

Why do I get this error message, and how can I log in to the website properly to retrieve the data I am looking for?

PS: My code was inspired by this question: Python dryscrape scrape page with cookies


Answer 1:


OK, so after more than one month of trying to tackle this, I am delighted to say that I finally managed to get what I wanted.

What was the issue?

Basically 2 major things (maybe more, but I might have forgotten some along the way):

  1. the password has to be entered via on-screen buttons, and their positions are randomly generated, so on every visit you need to build a new digit-to-button mapping
  2. login.form().submit() was interfering with access to the page holding the needed data; clicking the validate button instead was good enough
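Point 1 can be illustrated offline. The markup below is a hypothetical stand-in for the randomized num-pad (the real page's structure may differ); the mapping inverts "which digit is printed on button x" into "which button holds digit d", which is exactly what the mapping loop in the code below builds:

```python
import re

# Hypothetical keypad markup: the digits land on different buttons each visit.
page = ('<div id="num-pad">'
        '<button> 7 </button><button> 3 </button><button> 0 </button>'
        '<button> 5 </button><button> 1 </button><button> 9 </button>'
        '<button> 2 </button><button> 8 </button><button> 4 </button>'
        '<button> 6 </button></div>')

# Digits in the order the buttons appear on screen.
digits = [int(t) for t in re.findall(r'<button>\s*(\d)\s*</button>', page)]

# button_number[d] = zero-based position of the button showing digit d
button_number = {d: pos for pos, d in enumerate(digits)}

# XPath button[] indices are 1-based, hence the +1 when clicking.
button_1 = button_number[1] + 1   # digit 1 sits on the 5th button here
```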

Here is the final code; do not hesitate to comment if you find bad usage, as I am a Python novice and a sporadic coder.

import dryscrape
from bs4 import BeautifulSoup
from lxml import html
from time import sleep
from webkit_server import InvalidResponseError
from decimal import Decimal
import re
import sys 


def getAmundi(seconds=0):

    url = 'https://www.amundi-ee.com/psf'
    extensionInit='/#login'
    urlInit = url + extensionInit
    urlResult = url + '/#'
    timeoutRetry=1

    if 'linux' in sys.platform:
        # start xvfb in case no X is running. Make sure xvfb 
        # is installed, otherwise this won't work!
        dryscrape.start_xvfb()

    print "connecting to " + url + " with " + str(seconds) + "s of loading wait..." 
    s = dryscrape.Session()
    s.visit(urlInit)
    sleep(seconds)
    s.set_attribute('auto_load_images', False)
    s.set_header('User-agent', 'Google Chrome')
    while True:
        try:
            q = s.at_xpath('//*[@id="identifiant"]')
            q.set("XXXXXXXX")
        except Exception as ex:
            seconds+=timeoutRetry
            print "Failed, retrying to get the login field in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break 

    # get password button mapping
    print "logging in ..."
    soup = BeautifulSoup(s.body(), 'html.parser')
    button_number = range(10)
    for x in range(0, 10):
        button_number[int(soup.findAll('button')[x].text.strip())] = x

    #needed button
    button_1 = button_number[1] + 1
    button_2 = button_number[2] + 1
    button_3 = button_number[3] + 1
    button_5 = button_number[5] + 1

    #push buttons for password
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_2) +']')
    button.click()
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_1) +']')
    button.click()
    # ... click the remaining password digit buttons in the same way ...

    # Push the validate button
    button = s.at_xpath('//*[@id="content"]/router-view/div/form/div[3]/input')
    button.click()
    print "accessing ..."
    sleep(seconds)

    while True:
        try:
            soup = BeautifulSoup(s.body(), 'html.parser')
            total_lended = soup.findAll('span')[8].text.strip()
            total_lended = Decimal(total_lended.encode('ascii','ignore').replace(',','.').replace(' ',''))
            print total_lended

        except Exception as ex:
            seconds+=1
            print "Failed, retrying to get the data in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break 

    s.reset()
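A note on the amount-cleaning line near the end: the scraped text is a French-formatted figure (spaces as thousands separators, a comma as the decimal mark, a trailing euro sign), which the encode/replace chain strips down to something Decimal can parse. A Python 3-friendly sketch of the same cleanup, on a made-up sample value:

```python
from decimal import Decimal

# Hypothetical scraped text: '1 234,56 EUR-style' with a non-breaking space.
raw = u'1\u00a0234,56 \u20ac'

# Keep digits and the decimal comma, then swap the comma for a dot.
cleaned = ''.join(ch for ch in raw if ch.isdigit() or ch == ',')
amount = Decimal(cleaned.replace(',', '.'))
# amount is Decimal('1234.56')
```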


Source: https://stackoverflow.com/questions/43833051/dryscrape-no-route-found-for
