问题
HtmlHelper.GetTagsAndValues(htmlContent);
and i get this error:
at System.String.Split(String[] separator, Int32 count, StringSplitOptions options)
at System.String.Split(String[] separator, StringSplitOptions options)
at WebCrawler.Logic.CrawlerManager.UseRulesOnHtmlPage(Agencies agency, String pageUrl, List`1 listTagValuePair, RulesGroups ruleGroup) in D:\PROJEKTI\crawler\WebCrawlerSuite\WebCrawler.Logic\CrawlerManager.cs:line 263
at WebCrawler.Logic.CrawlerManager.GetAdvertismentFromHtmlContent(List`1 listTagValuePair, Agencies agency, String pageUrl) in D:\PROJEKTI\crawler\WebCrawlerSuite\WebCrawler.Logic\CrawlerManager.cs:line 191
at WebCrawler.Logic.CrawlerManager.ImportAdvertisment2Database.Work(Crawler crawler, PropertyBag propertyBag) in D:\PROJEKTI\crawler\WebCrawlerSuite\WebCrawler.Logic\CrawlerManager.cs:line 668
at WebCrawler.Logic.CrawlerManager.ImportAdvertisment2Database.Process(Crawler crawler, PropertyBag propertyBag) in D:\PROJEKTI\crawler\WebCrawlerSuite\WebCrawler.Logic\CrawlerManager.cs:line 584
i read this article:
http://blogs.msdn.com/b/ericlippert/archive/2009/06/08/out-of-memory-does-not-refer-to-physical-memory.aspx
How can i prevent this error?
whole method:
public static List<TagValuePair> GetTagsAndValues(string htmlContent)
{
List<TagValuePair> tagsValues = new List<TagValuePair>();
Dictionary<string, int> tagAppearance = new Dictionary<string, int>();
HtmlDocument doc = new HtmlDocument();
if (htmlContent != null)
{
doc.LoadHtml(htmlContent);
if (doc.DocumentNode.SelectNodes("//*") == null)
{
List<TagValuePair> tempList = new List<TagValuePair>();
tempList.Add(new TagValuePair("Error!", htmlContent, -1));
return tempList;
}
foreach (HtmlNode tag in doc.DocumentNode.SelectNodes("//*"))
{
try
{
if (!string.IsNullOrEmpty(tag.InnerHtml.Trim()))
{
if (!tagAppearance.Keys.Contains(tag.Name))
{
tagAppearance.Add(tag.Name, 1);
}
else
tagAppearance[tag.Name] = tagAppearance[tag.Name] + 1;
tagsValues.Add(new TagValuePair(tag.Name, tag.InnerHtml.Trim(), tagAppearance[tag.Name]));
}
else
{
// Help link: http://refactoringaspnet.blogspot.com/2010/04/using-htmlagilitypack-to-get-and-post_19.html
if (!string.IsNullOrEmpty(tag.GetAttributeValue("value", "").Trim()))
{
if (!tagAppearance.Keys.Contains("option value"))
{
tagAppearance.Add("option value", 1);
}
else
tagAppearance["option value"] = tagAppearance["option value"] + 1;
tagsValues.Add(new TagValuePair("option value", tag.GetAttributeValue("value", "").Trim(), tagAppearance["option value"]));
}
if (tag.NextSibling != null && !string.IsNullOrEmpty(tag.NextSibling.InnerHtml.Trim()))
{
if (!tagAppearance.Keys.Contains(tag.Name))
{
tagAppearance.Add(tag.Name, 1);
}
else
tagAppearance[tag.Name] = tagAppearance[tag.Name] + 1;
tagsValues.Add(new TagValuePair(tag.Name, tag.NextSibling.InnerHtml.Trim(), tagAppearance[tag.Name]));
}
}
}
catch (Exception)
{
return null;
}
}
}
EDIT:
exact error is here:
doc.LoadHtml(htmlContent);
回答1:
I would suggest looking at a memory profiler to ensure you haven't got any leaks in your application. Given you say it occurs after 12 hours of app working, it seems to indicate that it may be a slow leak that eventually causes the OutOfMemory exception.
There are a number of ways that you can unitentionally hold onto references that will cause a slow leak. Running a profiler will help you identify these issues. It may not be the one line of code that is causing the problem. It may just be that the one line of code is often showing you the straw that breaks the camels back.
I have used Redgates Ants Profiler before (it comes with a 14 day free trial), and it helped me heaps to get memory usage down and increase performance. I seem to be plugging this a lot recently, but it is purely due to the fact I find it to be a very valuable tool.
Take a look through some of their walkthroughs and/or vidoes to see how to track down a leak.
来源:https://stackoverflow.com/questions/6786000/exception-of-type-system-outofmemoryexception-was-thrown-in-c-sharp