POST request using Python to an ASP.NET page

无人共我 2020-12-03 09:03

I want to scrape the PIN codes from "http://www.indiapost.gov.in/pin/". I am using the following code:

import urllib
import urllib2
headers = {
    \'A         


        
1 Answer
  • 2020-12-03 09:15

    Where did you get the values of viewstate and eventvalidation? First, they shouldn't end with "..."; you must have omitted part of them. Second, they shouldn't be hard-coded, because the server generates them anew for each page it serves.

    One solution is like this:

    1. Retrieve the page at "http://www.indiapost.gov.in/pin/" without sending any form data.
    2. Parse the response and retrieve form values such as __VIEWSTATE and __EVENTVALIDATION (you can use BeautifulSoup for this).
    3. Get the search result (a second HTTP request) by sending the form values from step 2 together with the search fields.
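
The three steps above can be sketched roughly as follows. This is a minimal Python 3 sketch, not the original poster's code: it swaps BeautifulSoup for the standard library's html.parser so it runs without third-party packages, and the names HiddenFieldParser, hidden_fields and search are made up for illustration. It also assumes the page can be decoded as UTF-8, which has not been checked against the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # ASP.NET keeps __VIEWSTATE and __EVENTVALIDATION in hidden inputs.
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Step 2: return a dict of the hidden form fields found in `html`."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def search(url, extra_fields):
    """Steps 1 to 3: GET the page, copy its hidden fields, POST them back."""
    # Assumption: the page decodes as UTF-8; undecodable bytes are replaced.
    page = urlopen(url).read().decode("utf-8", errors="replace")
    form = hidden_fields(page)
    form.update(extra_fields)  # e.g. {'ddl_state': '1', 'btn_state': 'Search'}
    body = urlencode(form).encode("ascii")
    # Passing `data` makes urllib send a POST request.
    return urlopen(Request(url, data=body)).read()
```

Because the hidden fields are read from the page on every run, nothing has to be hard-coded or truncated. Only hidden_fields can be checked offline; search needs network access and depends on how the site behaves today.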

    UPDATE:

    Following the idea above, I modified your code slightly to make it work:

    # This script targets Python 2 (it uses urllib.FancyURLopener and urllib.urlencode).
    import urllib
    from bs4 import BeautifulSoup
    
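    # Kept from the question for reference. Note that this dict is never passed
    # to the opener below; only the User-Agent set in MyOpener is customized.
    headers = {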
        'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Origin': 'http://www.indiapost.gov.in',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Referer': 'http://www.indiapost.gov.in/pin/',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
    }
    
    class MyOpener(urllib.FancyURLopener):
        version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'
    
    myopener = MyOpener()
    url = 'http://www.indiapost.gov.in/pin/'
    # first HTTP request without form data
    f = myopener.open(url)
    # name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(f, 'html.parser')
    # parse and retrieve two vital form values
    viewstate = soup.select("#__VIEWSTATE")[0]['value']
    eventvalidation = soup.select("#__EVENTVALIDATION")[0]['value']
    
    formData = (
        ('__EVENTVALIDATION', eventvalidation),
        ('__VIEWSTATE', viewstate),
        ('__VIEWSTATEENCRYPTED',''),
        ('txt_offname', ''),
        ('ddl_dist', '0'),
        ('txt_dist_on', ''),
        ('ddl_state','1'),
        ('btn_state', 'Search'),
        ('txt_stateon', ''),
        ('hdn_tabchoice', '1'),
        ('search_on', 'Search'),
    )
    
    encodedFields = urllib.urlencode(formData)
    # second HTTP request with form data
    f = myopener.open(url, encodedFields)
    
    # For a quick check, dump the raw response to a file. In practice it is
    # better to parse it with BeautifulSoup again and extract only the results.
    # The results are also split across multiple pages, so more HTTP requests
    # are needed to fetch the remaining pages.
    try:
        with open('tmp.html', 'w') as fout:
            fout.writelines(f.readlines())
    except IOError:
        print('Could not write output file\n')
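
As the comments in the code suggest, it is usually more useful to pull the result rows out of the response than to save the whole HTML file. Here is a minimal Python 3 sketch of that idea. It assumes the results come back as an ordinary HTML table, which has not been verified against the real page; the class name TableRowParser, the function table_rows and the sample row in the usage example are invented for illustration. It uses the standard library's html.parser rather than BeautifulSoup, and it does not handle nested tables.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained no cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    """Return a list of rows, each a list of stripped cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Usage with a made-up fragment (not real data from the site):
sample = ("<table><tr><th>Office</th><th>Pincode</th></tr>"
          "<tr><td> Example B.O </td><td>000000</td></tr></table>")
rows = table_rows(sample)
```

The same function could be applied to each page of results once the additional requests have been made.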
    