WebClient problem with URL which ends with a period

无人及你 2021-01-21 13:48

I'm running the following code:

using (WebClient wc = new WebClient())
{
    string page = wc.DownloadString(URL);
    ...
}

To access the URL

5 answers
  • 2021-01-21 14:24

    It looks like you have hit a bug in WebClient/WebRequest, though the behavior may be intentional on Microsoft's part. Either way, when you pass in "TW.", the Uri class translates it to "TW", without the period. Because WebClient and WebRequest parse every URL string into a Uri, the trailing period is lost before the request is sent.

    To get around this you may have to use TcpClient and roll your own web client. Something along these lines:

    // Requires: using System.IO; using System.Net.Sockets; using System.Text;
    TcpClient oClient = new TcpClient("www.shareprice.co.uk", 80);
    
    NetworkStream ns = oClient.GetStream();
    
    StreamWriter sw = new StreamWriter(ns);
    // "Connection: close" asks the server to close the socket after responding,
    // so the read loop below terminates.
    sw.Write(
       string.Format( 
          "GET /{0} HTTP/1.1\r\nUser-Agent: {1}\r\nHost: www.shareprice.co.uk\r\nConnection: close\r\n\r\n",
               "TW.", 
               "MyTCPClient"  )
    );                    
    sw.Flush();
    
    StringBuilder sb = new StringBuilder();
    
    while (true)
    {
        int i = ns.ReadByte(); // Inefficient but simple 
        if (i == -1) break;    // Other side has closed the socket 
        sb.Append( (char) i ); // Accumulate the raw response (headers and body) 
    }
    
    oClient.Close();
    

    This returns a 302 redirect, so parse out the value of the 'Location:' header and send the request again with the new path.

    HTTP/1.1 302 Found
    Date: Wed, 11 Nov 2009 19:29:27 GMT
    Server: lighttpd
    X-Powered-By: PHP/5.2.4-2ubuntu5.7
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Pragma: no-cache
    Location: /TW./TAYLOR-WIMPEY-PLC
    Content-type: text/html; charset=UTF-8
    Content-Length: 0
    Set-Cookie: SSID=668d5d0023e9885e1ef3762ef5e44033; path=/
    Vary: Accept-Encoding
    Connection: close
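
    Pulling the 'Location:' value out of a raw response like the one above could be sketched as follows. This is only an illustration: RedirectHelper and ExtractLocation are hypothetical names, and the code assumes the whole response has already been read into a string (for example, sb.ToString() from the loop above).

```csharp
using System;
using System.IO;

public static class RedirectHelper
{
    // Hypothetical helper: returns the value of the Location header from a
    // raw HTTP response, or null if the header section has no such header.
    public static string ExtractLocation(string rawResponse)
    {
        using (var reader = new StringReader(rawResponse))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // A blank line marks the end of the headers; stop before the body.
                if (line.Length == 0) break;

                // Header names are case-insensitive.
                if (line.StartsWith("Location:", StringComparison.OrdinalIgnoreCase))
                {
                    return line.Substring("Location:".Length).Trim();
                }
            }
        }
        return null;
    }
}
```

    For the response shown above this yields the relative path /TW./TAYLOR-WIMPEY-PLC, which would go into the next GET line against the same host.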
    
  • 2021-01-21 14:24

    Try adding a slash to the end, after the period. Your normal web browser will do that for you, and the WebClient class isn't that smart.

    http://www.shareprice.co.uk/TW./
    

    This worked for me as well when I typed it into the browser.

    Edit - added

    The following also worked in the browser:

    http://www.shareprice.co.uk/TW
    

    and

    http://www.shareprice.co.uk/TW/

    So it looks like you can simply check whether the last character is a period and, if it is, remove it.
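
    A minimal sketch of that check, assuming the URL has no query string or fragment after the path. UrlHelper and TrimTrailingPeriod are hypothetical names used only for illustration.

```csharp
using System;

public static class UrlHelper
{
    // Hypothetical helper: drops a single trailing period from the URL,
    // and returns the URL unchanged if it does not end with one.
    public static string TrimTrailingPeriod(string url)
    {
        return url.EndsWith(".", StringComparison.Ordinal)
            ? url.Substring(0, url.Length - 1)
            : url;
    }
}
```

    The result can then be passed to wc.DownloadString as in the question. This relies on the site serving the same page with and without the period, which may not hold for other sites.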

  • 2021-01-21 14:25

    Just add a space after the period. When the URL is parsed, the space is removed but the period stays.

  • 2021-01-21 14:27

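    Use URL encoding for the period, that is, write the "." as %2E. (Standard encoders such as Uri.EscapeDataString leave "." untouched, so the replacement has to be done by hand.)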

  • 2021-01-21 14:28

    To allow a single period (.) at the end of a URL on the server side, in the web.config of an ASP.NET application, use the following:

    <system.web>
        <httpRuntime relaxedUrlToFileSystemMapping="true" />
    </system.web>
    

    To address two periods (..) or other denied sequences, see the following article:

    http://www.iis.net/ConfigReference/system.webServer/security/requestFiltering/denyUrlSequences
