How do I scrape information off ASP.NET websites when paging and JavaScript links are being used?

轻奢々 2021-01-12 19:27

I have been given a staff list which is supposed to be up to date, but it doesn't match an intranet People Finder which is written in ASP.NET.

As the information is

2 Answers
  • 2021-01-12 20:06

    You could post a variable to the HTML page to go through the paging.

    using System.IO;
    using System.Net;
    using System.Text;
    using System.Web;

    string lcUrl = "http://www.mysite.com/page.aspx";
    HttpWebRequest loHttp = (HttpWebRequest) WebRequest.Create(lcUrl);

    // *** Send any POST data
    string lcPostData = "gvEmployees=" + HttpUtility.UrlEncode("Page$2");

    loHttp.Method = "POST";
    // Form posts need this content type so the server parses the body
    loHttp.ContentType = "application/x-www-form-urlencoded";

    byte[] lbPostBuffer = Encoding.GetEncoding(1252).GetBytes(lcPostData);
    loHttp.ContentLength = lbPostBuffer.Length;

    Stream loPostData = loHttp.GetRequestStream();
    loPostData.Write(lbPostBuffer, 0, lbPostBuffer.Length);
    loPostData.Close();

    // *** Read the response as a string
    HttpWebResponse loWebResponse = (HttpWebResponse) loHttp.GetResponse();
    Encoding enc = Encoding.GetEncoding(1252);
    StreamReader loResponseStream =
        new StreamReader(loWebResponse.GetResponseStream(), enc);
    string lcHtml = loResponseStream.ReadToEnd();

    loResponseStream.Close();
    loWebResponse.Close();
    

    Then parse out the data you need from the string.
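
    As a rough illustration of that parsing step, here is a minimal sketch that pulls the cell text out of every table row in the returned HTML. The `GridParser` class and `ExtractRows` method are hypothetical names, and the sketch assumes the grid renders as plain `<tr>`/`<td>` markup without nested tables; a real HTML parser would be more robust than these regular expressions.

    ```csharp
    using System.Collections.Generic;
    using System.Net;
    using System.Text.RegularExpressions;

    static class GridParser
    {
        // Returns one list of cell texts per <tr> found in the HTML.
        public static List<List<string>> ExtractRows(string html)
        {
            var rows = new List<List<string>>();
            var options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

            foreach (Match row in Regex.Matches(html, @"<tr\b[^>]*>(.*?)</tr>", options))
            {
                var cells = new List<string>();
                foreach (Match cell in Regex.Matches(row.Groups[1].Value,
                                                     @"<t[dh]\b[^>]*>(.*?)</t[dh]>", options))
                {
                    // Drop nested tags (links, spans), decode entities, trim whitespace.
                    string text = Regex.Replace(cell.Groups[1].Value, "<[^>]+>", "");
                    cells.Add(WebUtility.HtmlDecode(text).Trim());
                }
                if (cells.Count > 0) rows.Add(cells);
            }
            return rows;
        }
    }
    ```

    Calling `GridParser.ExtractRows(lcHtml)` on the response string would give one list per row, header row included, which can then be compared against the staff list.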

    --EDIT--

    Here is what I would try (something similar) where all of the post data is sent:

    // UrlEncode turns "Page$2" into "Page%242", so pass the raw value here
    string lcPostData =
        "__EVENTTARGET=" + HttpUtility.UrlEncode("gvEmployees") + "&" +
        "__EVENTARGUMENT=" + HttpUtility.UrlEncode("Page$2") + "&" +
        "__VIEWSTATE=" + HttpUtility.UrlEncode("<Value of __VIEWSTATE>");
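
    The `__VIEWSTATE` value has to come from somewhere, and one way to get it is to GET the first page and read the hidden inputs out of its HTML. The sketch below assumes the hidden fields are rendered as ordinary `<input type="hidden" ...>` tags with quoted attributes; `HiddenFieldReader` and `Extract` are hypothetical names, not part of any library.

    ```csharp
    using System.Collections.Generic;
    using System.Net;
    using System.Text.RegularExpressions;

    static class HiddenFieldReader
    {
        // Maps each hidden input's name to its (HTML-decoded) value.
        public static Dictionary<string, string> Extract(string html)
        {
            var fields = new Dictionary<string, string>();
            var options = RegexOptions.IgnoreCase;

            foreach (Match input in Regex.Matches(html, @"<input\b[^>]*>", options))
            {
                string tag = input.Value;
                if (!Regex.IsMatch(tag, @"type\s*=\s*[""']hidden[""']", options))
                    continue; // skip text boxes, buttons, etc.

                Match name = Regex.Match(tag, @"\bname\s*=\s*[""']([^""']*)[""']", options);
                Match value = Regex.Match(tag, @"\bvalue\s*=\s*[""']([^""']*)[""']", options);
                if (name.Success)
                {
                    string raw = value.Success ? value.Groups[1].Value : "";
                    fields[name.Groups[1].Value] = WebUtility.HtmlDecode(raw);
                }
            }
            return fields;
        }
    }
    ```

    `HiddenFieldReader.Extract(lcHtml)["__VIEWSTATE"]` would then supply the value for the placeholder in the post data.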
    
  • 2021-01-12 20:09

    Open Fiddler and load the second page of the ASP.NET site's table. In the WebForms tab for that session, check the request body to see which variables are being posted. Concatenate all of them in the same order and format, and post the data using HttpWebRequest. In my case it was:

    string PostData = "__EVENTTARGET=" 
        + HttpUtility.UrlEncode("ctl00$ContentPlaceHolder2$grdDirectory") 
        + "&"
        + "__EVENTARGUMENT="+HttpUtility.UrlEncode("Page$2") 
        + "&"
        + "__VIEWSTATE="+ HttpUtility.UrlEncode(view_state)
        + "&"
        + "__VIEWSTATEGENERATOR=" 
        + HttpUtility.UrlEncode(viewstategenerator)
        + "&"
        + "__VIEWSTATEENCRYPTED=" 
        + HttpUtility.UrlEncode(viewstateencrypted) 
        + "&" 
        + "__EVENTVALIDATION=" + HttpUtility.UrlEncode(eventvalidation);
    

    Hope it works.
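
    To fetch more than one page, the same body can be rebuilt for each page number. The sketch below is one possible way to do that, assuming the hidden field values have already been read from the most recent response into a dictionary (they generally change with every response, so they should be re-read each time). `PostbackBody` and `ForPage` are hypothetical names, and the field order produced here may differ from what Fiddler shows.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;
    using System.Net;

    static class PostbackBody
    {
        // Builds a form-encoded body for a GridView paging postback.
        // hiddenFields: __VIEWSTATE, __EVENTVALIDATION, etc. from the last response.
        // gridUniqueId: the grid's posted name, e.g. "ctl00$ContentPlaceHolder2$grdDirectory".
        public static string ForPage(IDictionary<string, string> hiddenFields,
                                     string gridUniqueId, int page)
        {
            var form = new Dictionary<string, string>(hiddenFields);
            form["__EVENTTARGET"] = gridUniqueId;
            form["__EVENTARGUMENT"] = "Page$" + page;

            // Encode every key and value, then join as key=value&key=value.
            return string.Join("&", form.Select(p =>
                WebUtility.UrlEncode(p.Key) + "=" + WebUtility.UrlEncode(p.Value)));
        }
    }
    ```

    The returned string plays the role of `PostData` above: write it to the request stream, read the response, re-read the hidden fields, and repeat with the next page number.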
