Remove all strings in { } delimiter using Regex or Html Agility Pack in ASP.NET web forms [duplicate]

空扰寡人 submitted on 2019-12-16 18:04:10

Question


I'm trying to extract only the text content from a web page and display it. I use HtmlAgilityPack for the extraction, but the returned text also contains JavaScript and CSS, which I don't want. To remove all JavaScript and CSS from the returned text, I'm trying to detect the { } delimiters and delete everything inside them. I use a regex for this, but it isn't working because the braces are nested. This is the regex I'm trying:

string regex = "\t|\n|<.*?>|(\\[.*\\])|(\".*\")|('.*')|(\\(.*\\))|{\\[.*\\]}|{\".*\"}|{'.*'}|{\\(.*\\)}";
TextArea1.Value = Regex.Replace(s, regex, "");

Input Text:

Los Angeles Times - California, national and world news - Los Angeles Times;},svginImg:function;a.onload=function{var a=navigator.userAgent||navigator.vendor||window.opera;return/;},isIE9:function==9;}},notmobileCalccheck:function;a.style.cssText=;return !!a.style.length;},isAndroidBrowser:function{var a=navigator.userAgent||navigator.vendor;return/android/i.test&&!window.opera;},isSupportedBrowser:function&&!window.opera;},getScreenWidth:function;},isSupported:function isSupported{a=sessionStorage==;}else{try{a=this.supportsSvg{a=false;}}if<=8;}};trb.utils.redirect=function;b.name=;document.body.appendChild;b.submit;if{localStorage=d;}else{for{var c={};for{c;}return null;},remove:function remove;localStorage.removeItem{var b=localStorage;if;a=),f;for;}}},remove:function remove{a.trb=a.trb||{};trb.data=trb.data||{};trb.data.isMobile=trb.browsersupport.isMobile;trb.data.isIE9=trb.browsersupport.isIE9;trb.data.facebookAppId=;trb.data.parentSectionPath=);}if;}trb.data.isSectionFront=true;if;}trb.data.videos={};trb.data.videos.ndnFallbackJsURL=;trb.data.initialpathname=;trb.data.pages=trb.data.pages||{};trb.data.pages={};trb.data.pages.unsupportedBrowserPath=;trb.svg={};trb.svg.data={};trb.svg.data.svgStrings={};trb.svg.data.svgStrings.logoShort=;trb.svg.data.svgStrings.logo=;trb.svg.data.svgStrings.loadingCircle=;trb.svg.data.map={mastheadLogo:{colors:{PRIMARY_COLOR:},string:trb.svg.data.svgStrings.loadingCircle}}; { background: #404040; } .trb_allContentWrapper { background: #333; }


Answer 1:


I had been using HtmlAgilityPack to load a web page and extract only its text content, but the extracted text also included the CSS and JavaScript. I first tried to strip them with a regex that detects the { } delimiters, but that proved hard. I then tried another way that works and is much simpler: using Descendants() from HtmlAgilityPack to remove the script, style and comment nodes before reading the text. My code is:

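using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);

// Remove <script>, <style> and comment nodes before reading the text.
doc.DocumentNode.Descendants()
    .Where(n => n.Name == "script" || n.Name == "style" || n.Name == "#comment")
    .ToList()   // copy first so removal does not disturb the enumeration
    .ForEach(n => n.Remove());

string s = doc.DocumentNode.InnerText;
// Strip tabs, newlines and any leftover tags.
TextArea1.Value = Regex.Replace(s, @"\t|\n|<.*?>", "");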

I found this approach at THIS LINK, and everything works now.




Answer 2:


Why don't you simply try:

/\{.*?\}/g

and replace each match with an empty string.
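In C#, the pattern above could be applied with Regex.Replace roughly as follows. This is a minimal sketch, not the answerer's code: the names LazyBraceDemo and StripBraces are made up for illustration, and RegexOptions.Singleline is an added assumption so that '.' also matches newlines. The second call shows that the lazy quantifier stops at the first closing brace, so nested braces leave a remainder behind.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyBraceDemo   // hypothetical name, for illustration only
{
    // .NET counterpart of the JavaScript-style literal /\{.*?\}/g.
    // Regex.Replace already replaces every match, so no "g" flag is needed.
    public static string StripBraces(string input)
    {
        // Singleline (an assumption here) lets '.' match '\n' as well.
        return Regex.Replace(input, @"\{.*?\}", "", RegexOptions.Singleline);
    }

    public static void Main()
    {
        // Flat block: removed completely.
        Console.WriteLine(StripBraces("a{x: 1;}b"));   // prints "ab"
        // Nested block: the match ends at the first '}', leaving "d}" behind.
        Console.WriteLine(StripBraces("a{b{c}d}e"));   // prints "ad}e"
    }
}
```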




Answer 3:


To match every span from '{' to the next '}', including all characters between them that are not '}', use the following:

/\{[^\}]+\}/g
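Like the lazy pattern, this one still stops at the first '}' when braces are nested. One possible workaround, which is a different technique from the answer above and is offered only as a sketch, is to exclude '{' from the character class as well and repeat the replacement until nothing changes, so that the innermost blocks are removed first. The names InnermostBraceDemo and StripBraces are hypothetical, and '*' is used instead of '+' so that empty {} pairs are removed too.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InnermostBraceDemo   // hypothetical name, for illustration only
{
    // A brace pair that contains no other brace, i.e. an innermost block.
    private static readonly Regex Innermost = new Regex(@"\{[^{}]*\}");

    public static string StripBraces(string input)
    {
        string previous;
        do
        {
            previous = input;
            // Each pass peels off one nesting level.
            input = Innermost.Replace(input, "");
        } while (input != previous);   // stop once a pass changes nothing
        return input;
    }

    public static void Main()
    {
        // Pass 1 removes "{c}", pass 2 removes "{bd}".
        Console.WriteLine(StripBraces("a{b{c}d}e"));   // prints "ae"
        // An unclosed brace never matches and is left untouched.
        Console.WriteLine(StripBraces("keep {aa"));    // prints "keep {aa"
    }
}
```

The loop runs once per nesting level plus one final pass, which is usually acceptable for page-sized text.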



Answer 4:


You have nested braces.

In Perl and PHP, you could match the nested braces using (?R) (recursion syntax), and Ruby offers the equivalent \g<0>. But .NET does not have recursion. Does this mean we are lost? Luckily, no.

Balancing Groups to the Rescue

C# regex cannot use recursion, but it has an awesome feature called balancing groups.

This regex will match complete nested braces.

(?<counter>{)(?>(?<counter>{)|(?<-counter>})|[^{}]+)+?(?(counter)(?!))

For instance, it will match

  1. {sdfs{sdfs}sd{d{ab}}fs}
  2. {ab}
  3. But not {aa
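To tie this back to the question, the balancing-group regex can be passed to Regex.Replace to delete every balanced block. The following is a minimal sketch under a few assumptions: the names BalancedBraceDemo and StripBraces are invented for illustration, and the braces are escaped (\{ and \}) purely for readability, which should not change what the pattern matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedBraceDemo   // hypothetical name, for illustration only
{
    // Each '{' pushes onto the "counter" stack and each '}' pops it.
    // The final conditional fails the match if anything is left on the stack.
    private static readonly Regex Balanced = new Regex(
        @"(?<counter>\{)(?>(?<counter>\{)|(?<-counter>\})|[^{}]+)+?(?(counter)(?!))");

    public static string StripBraces(string input)
    {
        return Balanced.Replace(input, "");
    }

    public static void Main()
    {
        // The whole nested block is one match and is removed in one pass.
        Console.WriteLine(StripBraces("x{a{b}c}y"));                    // prints "xy"
        Console.WriteLine(StripBraces("a{sdfs{sdfs}sd{d{ab}}fs}b"));    // prints "ab"
        // An unclosed brace is not matched, so the text is unchanged.
        Console.WriteLine(StripBraces("keep {aa"));                     // prints "keep {aa"
    }
}
```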



Answer 5:


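// s is the extracted page text; strings are immutable, so work on a copy.
string text = s;
int close;
// Repeatedly remove the innermost {...} block so nested braces are handled.
while ((close = text.IndexOf('}')) >= 0)
{
    // The last '{' before the first '}' opens the innermost block.
    int open = text.LastIndexOf('{', close);
    if (open < 0) break; // unmatched '}': stop rather than loop forever
    text = text.Remove(open, close - open + 1);
}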


Source: https://stackoverflow.com/questions/24114019/remove-all-strings-in-delimiter-using-regex-or-html-agility-pack-in-asp-net
