Question
I am trying to extract an image from a posted link. The first check I do is whether the link points to a plain image, like this:
HttpWebRequest request;
WebResponse webresponse;
request = (HttpWebRequest)HttpWebRequest.Create(url);
webresponse = request.GetResponse();
if (webresponse.ContentType.StartsWith("image/"))
...
If it is not an image, I want to continue with the HTML Agility Pack, but to do that I need to run:
HtmlDocument doc = new HtmlDocument();
StreamReader reader = new StreamReader(webresponse.GetResponseStream());
doc.LoadHtml(reader.ReadToEnd());
The problem is that LoadHtml does not find any source, even though I'm sure there is HTML in the response. I suspect the HTML is not well formed?
Here is part of what ReadToEnd returns:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sv" lang="sv">
<head><title>
X - Eclipse - 2011
</title>
<!--[if lt IE 7]>
<script defer type="text/javascript" src="../javascript/pngfix.js"></script>
<![endif]-->
<!--<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />-->
<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" /><link href="../../../App_Themes/X/mainStyleSheet.css" type="text/css" rel="stylesheet" /><meta name="author" content="" /><meta name="copyright" content="X.net" /><meta name="description" content="Välkommen in till ett av Sveriges största Xcommunity." /><meta name="keywords" content="X, rollspel, boardgamegeek, boardgame, X.net, X.net, community, Jimmy, Nilsson, schack, risk, puerto rico" /><script language="javascript" type="text/javascript" src="/sites/X/javascript/common.js"></script><script language="javascript" type="text/javascript" src="/sites/X/javascript/ajaxHandler.js"></script><script language="javascript" type="text/javascript" src="/javascript/jquery.js"></script><link rel="shortcut icon" href="/App_Themes/X/Images/common/browserIcon/favicon.ico" /><link rel="icon" href="/App_Themes/X/Images/common/browserIcon/animated_favicon1.gif" type="image/gif" /></head>
<body>
<div id="topBack">
<div id="siteContainer">
<form method="post" action="game.aspx?gameId=72125" id="aspnetForm" enctype="multipart/form-data">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDw....
I can see that the string contains some newline (\r\n) sequences, if that matters.
My goal is simply to avoid downloading the web page more than once; otherwise I could use WebClient.DownloadString(url) to download it in a format that I know works.
Answer 1:
This worked:
request = (HttpWebRequest)HttpWebRequest.Create(url);
webresponse = (HttpWebResponse)request.GetResponse();
if (webresponse.ContentType.StartsWith("image/"))
{...}
if (webresponse.ContentType.StartsWith("text/html"))
{
    // Pass the response stream straight to Load instead of reading it into a string first.
    var resultStream = webresponse.GetResponseStream();
    doc.Load(resultStream);
}
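For completeness, here is a minimal sketch of how the whole check could look with a single request. The class name ImageExtractor, the method FindImageUrls, and the XPath "//img[@src]" are my own assumptions for illustration, not part of the original code. It also assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

static class ImageExtractor
{
    // Hypothetical helper: returns image URLs found behind a link,
    // issuing only one HTTP request for the resource.
    public static List<string> FindImageUrls(string url)
    {
        var result = new List<string>();
        var request = (HttpWebRequest)WebRequest.Create(url);

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            string contentType = response.ContentType ?? "";

            if (contentType.StartsWith("image/", StringComparison.OrdinalIgnoreCase))
            {
                // The link itself points to an image.
                result.Add(url);
            }
            else if (contentType.StartsWith("text/html", StringComparison.OrdinalIgnoreCase))
            {
                var doc = new HtmlDocument();
                using (var stream = response.GetResponseStream())
                {
                    // Parse straight from the response stream, as in the answer above.
                    doc.Load(stream);
                }

                // SelectNodes returns null (not an empty collection) when nothing matches.
                var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
                if (nodes != null)
                {
                    foreach (var node in nodes)
                    {
                        // src values may be relative; they are returned as written in the page.
                        result.Add(node.GetAttributeValue("src", ""));
                    }
                }
            }
        }

        return result;
    }
}
```

The using blocks dispose the response and stream once parsing is done, so the connection is released even if parsing throws. If the page declares a charset that is not detected correctly, the Load(Stream, Encoding) overload may be worth trying.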
Source: https://stackoverflow.com/questions/15569133/htmldocument-loadhtml-from-webresponse