Question
Context:
I am trying to code my own money aggregator because most of the available tools on the market do not cover all financial websites yet. I am using Python 2.7.9 on a Raspberry Pi.
I managed to connect to 2 of my accounts so far (one crowdlending website and one for my pension) thanks to the requests library. The third website I am trying to aggregate, https://www.amundi-ee.com, has been giving me a hard time for 2 weeks now.
I figured out that the website actually relies on JavaScript, and after much research I ended up using dryscrape (I cannot use Selenium because ARM is no longer supported).
Issue:
When running this code:
import dryscrape
url='https://www.amundi-ee.com'
extensionInit='/psf/#login'
extensionConnect='/psf/authenticate'
extensionResult='/psf/#'
urlInit = url + extensionInit
urlConnect = url + extensionConnect
urlResult = url + extensionResult
s = dryscrape.Session()
s.visit(urlInit)
print s.body()
login = s.at_xpath('//*[@id="identifiant"]')
login.set("XXXXXXXX")
pwd = s.at_xpath('//*[@name="password"]')
pwd.set("YYYYYYY")
# Push the button
login.form().submit()
s.visit(urlConnect)
print s.body()
s.visit(urlResult)
There is an issue when the code visits urlConnect (s.visit(urlConnect)): the subsequent print s.body() returns the following:
{"code":405,"message":"No route found for \u0022GET \/authenticate\u0022: Method Not Allowed (Allow: POST)","errors":[]}
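For context: s.visit() issues a plain GET request, while the error body says the /authenticate route only accepts POST, hence the 405 Method Not Allowed. A minimal sketch using only the standard library (the endpoint and form field are made up for the demo, not Amundi's real ones) reproduces that behaviour:

```python
import threading

try:  # Python 3
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import urlopen
    from urllib.error import HTTPError
except ImportError:  # Python 2
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
    from urllib2 import urlopen, HTTPError


class AuthHandler(BaseHTTPRequestHandler):
    """Toy /authenticate endpoint that, like the real one, only allows POST."""

    def do_GET(self):
        self.send_response(405)
        self.send_header('Allow', 'POST')
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get('Content-Length') or 0)
        self.rfile.read(length)  # consume the form body
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'ok')

    def log_message(self, *args):  # keep the demo quiet
        pass


server = HTTPServer(('127.0.0.1', 0), AuthHandler)
port = server.server_address[1]
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()


def request_status(post):
    url = 'http://127.0.0.1:%d/authenticate' % port
    try:
        return urlopen(url, data=b'user=x' if post else None).getcode()
    except HTTPError as err:
        return err.code


get_status = request_status(False)   # what s.visit() does: a GET -> 405
post_status = request_status(True)   # what submitting the form does -> 200
server.shutdown()
```

This is why visiting urlConnect directly can never work: the browser-side login form performs the POST for you, so the session has to drive the form rather than navigate to the endpoint.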
Question
Why do I get this error message, and how can I log in to the website properly to retrieve the data I am looking for?
PS: my code was inspired by this question: Python dryscrape scrape page with cookies
Answer 1:
OK, so after more than one month of trying to tackle this, I am very delighted to say that I finally managed to get what I want.
What was the issue?
Basically 2 major things (maybe more, but I might have forgotten some in between):
- the password has to be entered via on-screen buttons, and those are laid out randomly, so on every visit you need to build a new mapping from digit to button position
- login.form().submit() was messing up access to the page with the needed data; clicking the validate button instead was good enough
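The first point, the digit-to-position mapping, can be illustrated in isolation: given the button labels scraped in page order, build a lookup from each digit to the 1-based position needed for the XPath index (the randomized layout below is made up for the example):

```python
def build_numpad_map(labels):
    """Map each digit (0-9) to the 1-based position of the button
    showing it, as required by an XPath index like button[N]."""
    mapping = {}
    for position, label in enumerate(labels, start=1):
        mapping[int(label.strip())] = position
    return mapping


# hypothetical randomized layout scraped from the login page
labels = ['3', '7', '0', '9', '2', '5', '8', '1', '4', '6']
numpad = build_numpad_map(labels)
# with this layout, digit 2 sits on the 5th button, so the click
# would target //*[@id="num-pad"]/button[5]
```

Because the layout changes on every page load, this mapping must be rebuilt from a fresh scrape before each login attempt.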
Here is the final code; do not hesitate to comment if you find bad usage, as I am a Python novice and a sporadic coder.
import dryscrape
from bs4 import BeautifulSoup
from lxml import html
from time import sleep
from webkit_server import InvalidResponseError
from decimal import Decimal
import re
import sys

def getAmundi(seconds=0):
    url = 'https://www.amundi-ee.com/psf'
    extensionInit = '/#login'
    urlInit = url + extensionInit
    urlResult = url + '/#'
    timeoutRetry = 1

    if 'linux' in sys.platform:
        # start xvfb in case no X is running. Make sure xvfb
        # is installed, otherwise this won't work!
        dryscrape.start_xvfb()

    print "connecting to " + url + " with " + str(seconds) + "s of loading wait..."
    s = dryscrape.Session()
    s.visit(urlInit)
    sleep(seconds)
    s.set_attribute('auto_load_images', False)
    s.set_header('User-agent', 'Google Chrome')

    while True:
        try:
            q = s.at_xpath('//*[@id="identifiant"]')
            q.set("XXXXXXXX")
        except Exception as ex:
            seconds += timeoutRetry
            print "Failed, retrying to get the login field in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break

    # get password button mapping
    print "logging in ..."
    soup = BeautifulSoup(s.body())
    button_number = range(10)
    for x in range(0, 10):
        button_number[int(soup.findAll('button')[x].text.strip())] = x

    # needed buttons (XPath positions are 1-based)
    button_1 = button_number[1] + 1
    button_2 = button_number[2] + 1
    button_3 = button_number[3] + 1
    button_5 = button_number[5] + 1

    # push buttons for password
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_2) + ']')
    button.click()
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_1) + ']')
    button.click()
    ..............

    # Push the validate button
    button = s.at_xpath('//*[@id="content"]/router-view/div/form/div[3]/input')
    button.click()
    print "accessing ..."
    sleep(seconds)

    while True:
        try:
            soup = BeautifulSoup(s.body())
            total_lended = soup.findAll('span')[8].text.strip()
            total_lended = Decimal(total_lended.encode('ascii', 'ignore').replace(',', '.').replace(' ', ''))
            print total_lended
        except Exception as ex:
            seconds += 1
            print "Failed, retrying to get the data in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break

    s.reset()
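The final cleanup step (dropping the currency sign and spaces, then swapping the French decimal comma for a dot before handing the string to Decimal) can be tested on its own. This is a sketch mirroring the cleanup in the scraper above, rewritten with a regex so the same code also runs on Python 3, where str.encode() returns bytes; the sample amount is made up:

```python
from decimal import Decimal
import re


def parse_amount(text):
    """Turn a French-formatted amount such as u'1 234,56 \u20ac'
    into a Decimal: drop every non-ASCII character (euro sign,
    non-breaking spaces), remove plain spaces, and replace the
    decimal comma with a dot."""
    ascii_only = re.sub(r'[^\x00-\x7f]', '', text)
    return Decimal(ascii_only.replace(' ', '').replace(',', '.'))


amount = parse_amount(u'1 234,56 \u20ac')
# amount == Decimal('1234.56')
```

Keeping the value as a Decimal rather than a float avoids rounding surprises when the aggregator later sums amounts across accounts.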
Source: https://stackoverflow.com/questions/43833051/dryscrape-no-route-found-for