WebClient problem with URL which ends with a period

烂漫一生 posted on 2019-12-31 03:39:07

Question


I'm running the following code:

using (WebClient wc = new WebClient())
{
    string page = wc.DownloadString(URL);
    ...
}

It accesses a share price website, http://www.shareprice.co.uk.

If you append a company's symbol name onto the end of the URL, then a page is returned which I parse to get the latest price info etc.

e.g.

http://www.shareprice.co.uk/VOD

http://www.shareprice.co.uk/TW.

Now, my problem is that some symbols end in periods, as in the second example there. For some unknown reason, the code above has a problem retrieving these sorts of URLs.

There is no run-time error, but the page that comes back reports "Symbol could not be found" from the website itself, which suggests that something happens to the trailing period between the call to DownloadString and the actual HTTP request.

Does anyone have any idea what might be causing this, and how to fix it?

Thanks


Answer 1:


It seems you have found a bug in WebClient/WebRequest, though perhaps Microsoft did this intentionally; who knows. Either way, when you pass in TW., the Uri class translates it to TW without the period. Since WebClient/WebRequest parse strings into a Uri, your . disappears along the way.
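A quick way to check whether the Uri class is dropping the period on your machine is to parse the URL and print the path it ends up with. This is only a minimal sketch; the class and method names are made up for illustration, and the result may differ between .NET versions, so no expected output is shown.

```csharp
using System;

static class UriTrailingDotCheck
{
    // Returns the path component as System.Uri parses it on the current runtime.
    public static string ParsedPath(string url)
    {
        return new Uri(url).AbsolutePath;
    }

    static void Main()
    {
        string path = ParsedPath("http://www.shareprice.co.uk/TW.");
        Console.WriteLine(path);
        Console.WriteLine(path.EndsWith(".", StringComparison.Ordinal)
            ? "trailing period kept"
            : "trailing period stripped");
    }
}
```

If the second line reports that the period was stripped, the request never contained it in the first place, which matches the "Symbol could not be found" page.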

You may have to use TcpClient to get around this and roll your own web client. Any variation of this:

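// Requires: using System.IO; using System.Net.Sockets; using System.Text;
StringBuilder sb = new StringBuilder();

using (TcpClient oClient = new TcpClient("www.shareprice.co.uk", 80))
using (NetworkStream ns = oClient.GetStream())
{
    StreamWriter sw = new StreamWriter(ns);

    // Write the request by hand so the trailing period in the path survives.
    // "Connection: close" asks the server to close the socket after responding,
    // which is what ends the read loop below.
    sw.Write(
        string.Format(
            "GET /{0} HTTP/1.1\r\nUser-Agent: {1}\r\nHost: www.shareprice.co.uk\r\nConnection: close\r\n\r\n",
            "TW.",
            "MyTCPClient"));
    sw.Flush();

    while (true)
    {
        int i = ns.ReadByte(); // Inefficient but more reliable
        if (i == -1) break;    // Other side has closed the socket
        sb.Append((char)i);    // Accumulate the response (headers and body)
    }
}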

This will give you a 302 redirect, so just parse out the 'Location:' and execute the above again with the new location.

HTTP/1.1 302 Found
Date: Wed, 11 Nov 2009 19:29:27 GMT
Server: lighttpd
X-Powered-By: PHP/5.2.4-2ubuntu5.7
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: /TW./TAYLOR-WIMPEY-PLC
Content-type: text/html; charset=UTF-8
Content-Length: 0
Set-Cookie: SSID=668d5d0023e9885e1ef3762ef5e44033; path=/
Vary: Accept-Encoding
Connection: close
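
Pulling the 'Location:' value out of a raw response like the one above could look something like this. It is a minimal sketch that assumes the whole response has already been read into a string (for example from the StringBuilder in the TcpClient code); RedirectHelper and GetLocation are hypothetical names, not part of any library.

```csharp
using System;

static class RedirectHelper
{
    // Returns the Location header value from a raw HTTP response,
    // or null if the response has no such header.
    public static string GetLocation(string rawResponse)
    {
        // Headers end at the first blank line; ignore anything after it.
        int headerEnd = rawResponse.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        string headers = headerEnd >= 0 ? rawResponse.Substring(0, headerEnd) : rawResponse;

        foreach (string line in headers.Split(new[] { "\r\n" }, StringSplitOptions.None))
        {
            // Header names are case-insensitive.
            if (line.StartsWith("Location:", StringComparison.OrdinalIgnoreCase))
                return line.Substring("Location:".Length).Trim();
        }
        return null;
    }

    static void Main()
    {
        string sample =
            "HTTP/1.1 302 Found\r\nLocation: /TW./TAYLOR-WIMPEY-PLC\r\nContent-Length: 0\r\n\r\n";
        Console.WriteLine(GetLocation(sample)); // prints /TW./TAYLOR-WIMPEY-PLC
    }
}
```

The returned path would then take the place of "TW." in the GET line of the second request.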



Answer 2:


Try adding a slash to the end, after the period. Your normal web browser will do that for you, and the WebClient class isn't that smart.

http://www.shareprice.co.uk/TW./

This worked for me as well when I typed it into the browser.

Edit - added

The following also all worked in the browser:

http://www.shareprice.co.uk/TW

and

http://www.shareprice.co.uk/TW/

so it looks like you should be able to just check to see if the last character is a period, and remove it.
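
As a rough sketch of that idea, the period can be trimmed before the URL is built. This relies on the site treating TW and TW. as the same symbol, which is only what the browser tests above suggest; SymbolUrl and Build are hypothetical names.

```csharp
using System;

static class SymbolUrl
{
    // Builds the page URL for a symbol, dropping any trailing period(s).
    public static string Build(string symbol)
    {
        return "http://www.shareprice.co.uk/" + symbol.TrimEnd('.');
    }

    static void Main()
    {
        Console.WriteLine(Build("TW."));  // prints http://www.shareprice.co.uk/TW
        Console.WriteLine(Build("VOD"));  // prints http://www.shareprice.co.uk/VOD
    }
}
```

The result can be passed straight to wc.DownloadString in the original code.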




Answer 3:


Use URL encoding; it will turn the "." into %2E.
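
One caveat worth checking before relying on this: Uri.EscapeDataString leaves "." alone because it is an unreserved character, so the %2E has to be written by hand. The sketch below shows both; EncodingCheck and its method are hypothetical names, and whether WebClient keeps the %2E intact when it builds the request is an assumption that has not been verified here.

```csharp
using System;

static class EncodingCheck
{
    // Replaces a single trailing period with its percent-encoded form.
    public static string ForceEncodeTrailingPeriod(string symbol)
    {
        return symbol.EndsWith(".", StringComparison.Ordinal)
            ? symbol.Substring(0, symbol.Length - 1) + "%2E"
            : symbol;
    }

    static void Main()
    {
        Console.WriteLine(Uri.EscapeDataString("TW."));       // prints TW.
        Console.WriteLine(ForceEncodeTrailingPeriod("TW."));  // prints TW%2E
    }
}
```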




Answer 4:


To address a single period (.) at the end of a URL use the following:

<system.web>
    <httpRuntime relaxedUrlToFileSystemMapping="true" />
</system.web>

To address two periods (..) or other denied sequences, see the following article:

http://www.iis.net/ConfigReference/system.webServer/security/requestFiltering/denyUrlSequences




Answer 5:


Just add a space after the period. When the URL is parsed, the space will be removed but the period will stay there.



Source: https://stackoverflow.com/questions/1716667/webclient-problem-with-url-which-ends-with-a-period
