Question
I'm running the following code:
    using (WebClient wc = new WebClient())
    {
        string page = wc.DownloadString(URL);
        ...
    }
It accesses a URL on a share price website, http://www.shareprice.co.uk
If you append a company's symbol name onto the end of the URL, then a page is returned which I parse to get the latest price info etc.
e.g.
http://www.shareprice.co.uk/VOD
http://www.shareprice.co.uk/TW.
Now, my problem is that some symbols end in periods, as in the second example there. For some unknown reason, the code above has a problem retrieving these sorts of URLs.
There is no run-time error, but the page that comes back is the website's own "Symbol could not be found" page. That suggests something is happening to the period on the end of the URL between the call to DownloadString and the actual HTTP request.
Does anyone have any idea what might be causing this, and how to fix it?
Thanks
Answer 1:
It seems you found a bug in WebClient/WebRequest, though perhaps Microsoft put that in intentionally, who knows. Nonetheless, when you pass in TW., the Uri class translates that to TW without the period. Since WebClient/WebRequest parse strings into Uri objects, your period disappears along the way.
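To see whether the Uri class on a particular runtime really drops the period, it may help to look at the path it produces before any request is made. The following is only a diagnostic sketch; the class and method names (UriDotCheck, PathAsSeenByUri) are made up for illustration, and the result for a trailing period can differ between .NET versions.

```csharp
using System;

static class UriDotCheck
{
    // Hypothetical helper: returns the path component exactly as
    // System.Uri has parsed it from the given URL string.
    public static string PathAsSeenByUri(string url)
    {
        return new Uri(url).AbsolutePath;
    }
}
```

Calling UriDotCheck.PathAsSeenByUri("http://www.shareprice.co.uk/TW.") and comparing the result with "/TW." shows whether the period survives parsing on the machine in question.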
You may have to use TcpClient to get around this and roll your own web client. Any variation of this:
    // Requires: using System.IO; using System.Net.Sockets; using System.Text;
    TcpClient oClient = new TcpClient("www.shareprice.co.uk", 80);
    NetworkStream ns = oClient.GetStream();
    StreamWriter sw = new StreamWriter(ns);
    // "Connection: close" asks the server to close the socket after responding,
    // which is what ends the read loop below.
    sw.Write(
        string.Format(
            "GET /{0} HTTP/1.1\r\nUser-Agent: {1}\r\nHost: www.shareprice.co.uk\r\nConnection: close\r\n\r\n",
            "TW.",
            "MyTCPClient")
    );
    sw.Flush();
    StringBuilder sb = new StringBuilder();
    while (true)
    {
        int i = ns.ReadByte(); // Inefficient but more reliable
        if (i == -1) break;    // Other side has closed socket
        sb.Append((char) i);   // Accumulate each byte as a char to save page data
    }
    oClient.Close();
This will give you a 302 redirect, so just parse out the 'Location:' header and execute the above again with the new location.
    HTTP/1.1 302 Found
    Date: Wed, 11 Nov 2009 19:29:27 GMT
    Server: lighttpd
    X-Powered-By: PHP/5.2.4-2ubuntu5.7
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Pragma: no-cache
    Location: /TW./TAYLOR-WIMPEY-PLC
    Content-type: text/html; charset=UTF-8
    Content-Length: 0
    Set-Cookie: SSID=668d5d0023e9885e1ef3762ef5e44033; path=/
    Vary: Accept-Encoding
    Connection: close
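Pulling the redirect target out of the raw response text could look roughly like this. It is a minimal sketch that assumes the whole response has already been collected into a string with CRLF line endings; RedirectParser and GetLocation are hypothetical names, not part of any library.

```csharp
using System;

static class RedirectParser
{
    // Returns the value of the Location header from a raw HTTP response,
    // or null if the header block contains no such header.
    public static string GetLocation(string rawResponse)
    {
        // Headers end at the first blank line; ignore any body after it.
        int headerEnd = rawResponse.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        string headerBlock = headerEnd >= 0
            ? rawResponse.Substring(0, headerEnd)
            : rawResponse;

        string[] lines = headerBlock.Split(
            new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            // Header names are case-insensitive.
            if (line.StartsWith("Location:", StringComparison.OrdinalIgnoreCase))
            {
                return line.Substring("Location:".Length).Trim();
            }
        }
        return null;
    }
}
```

The returned path can then be used in place of the symbol in a second GET request over a fresh connection.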
Answer 2:
Try adding a slash to the end, after the period. Your normal web browser will do that for you, and the WebClient class isn't that smart.
http://www.shareprice.co.uk/TW./
This worked for me as well when I typed it into the browser.
Edit - added
The following also worked in the browser:
http://www.shareprice.co.uk/TW
and
http://www.shareprice.co.uk/TW/
so it looks like you should be able to just check to see if the last character is a period, and remove it.
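A check like that might be sketched as follows. This assumes the site keeps serving the same page with and without the trailing period, as observed in the browser above; SymbolUrl and Build are hypothetical names used only for illustration.

```csharp
using System;

static class SymbolUrl
{
    // Hypothetical helper: builds the page URL for a symbol, dropping a
    // single trailing period so that Uri parsing has nothing to strip.
    public static string Build(string baseUrl, string symbol)
    {
        string cleaned = symbol.EndsWith(".", StringComparison.Ordinal)
            ? symbol.Substring(0, symbol.Length - 1)
            : symbol;
        return baseUrl.TrimEnd('/') + "/" + cleaned;
    }
}
```

The result of SymbolUrl.Build("http://www.shareprice.co.uk", "TW.") can be passed straight to wc.DownloadString.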
Answer 3:
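Use URL encoding, so that the "." becomes %2E. Standard encoders such as Uri.EscapeDataString leave "." unescaped because it is an unreserved character, so the %2E has to be written by hand.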
Answer 4:
To address a single period (.) at the end of a URL use the following:
    <system.web>
        <httpRuntime relaxedUrlToFileSystemMapping="true" />
    </system.web>
To address two periods (..) or other denied sequences, see the following article:
http://www.iis.net/ConfigReference/system.webServer/security/requestFiltering/denyUrlSequences
Answer 5:
Just add a space after the period. When the URL is parsed, the space will be removed but the period will stay.
Source: https://stackoverflow.com/questions/1716667/webclient-problem-with-url-which-ends-with-a-period