I have HTML code as a string. I need to find all img tags in that string, read the value of each src attribute and pass it to a function; that function returns an entire
If I understand your need correctly, you can use HtmlAgilityPack for this purpose. Parsing HTML with regex is fragile (attribute order, quoting and spacing all vary), so it may cause unwanted behavior. Can you try the code below?
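// Required namespaces (at the top of the file)
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Mail;
using System.Net.Mime;
using HtmlAgilityPack;

// Rewrites every <img src="data:image..."> to src="cid:..." and returns the new HTML.
// The created LinkedResources are added to 'resources' so they can be attached to the mail later.
public static string DoIt(List<LinkedResource> resources)
{
    string htmlString;
    using (WebClient client = new WebClient())
        htmlString = client.DownloadString("http://dean.edwards.name/my/base64-ie.html"); //This is an example source for base64 img src, you can change this directly to your source.

    HtmlDocument document = new HtmlDocument();
    document.LoadHtml(htmlString);

    document.DocumentNode.Descendants("img")
        // GetAttributeValue returns "" when src is missing, so no null check is needed
        .Where(e => e.GetAttributeValue("src", "").StartsWith("data:image", StringComparison.OrdinalIgnoreCase))
        .ToList()
        .ForEach(x =>
        {
            string currentSrcValue = x.GetAttributeValue("src", "");
            currentSrcValue = currentSrcValue.Split(',')[1]; //Base64 part of string
            byte[] imageData = Convert.FromBase64String(currentSrcValue);
            string contentId = Guid.NewGuid().ToString();
            // Assumes JPEG; use the media type from the data URI if your images differ
            LinkedResource inline = new LinkedResource(new MemoryStream(imageData), "image/jpeg");
            inline.ContentId = contentId;
            inline.TransferEncoding = TransferEncoding.Base64;
            resources.Add(inline); // keep it, otherwise the cid: reference points at nothing
            x.SetAttributeValue("src", "cid:" + inline.ContentId);
        });

    return document.DocumentNode.OuterHtml;
}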
You can retrieve HtmlAgilityPack from https://www.nuget.org/packages/HtmlAgilityPack
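The `cid:` references only resolve if each `LinkedResource` ends up attached to the HTML `AlternateView` of the `MailMessage` you send; creating the resource alone is not enough. A minimal sketch of that wiring, assuming `System.Net.Mail` and a hypothetical `BuildMessage` helper that receives the rewritten HTML and the resources (for example, collected into a list as they are created):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Mail;
using System.Net.Mime;
using System.Text;

public static class InlineImageMail
{
    // Hypothetical helper: wraps the rewritten HTML and its inline images in one MailMessage.
    public static MailMessage BuildMessage(string html, IEnumerable<LinkedResource> resources)
    {
        var message = new MailMessage();

        // The HTML lives in an AlternateView so that linked resources can be attached to it.
        AlternateView htmlView = AlternateView.CreateAlternateViewFromString(html, Encoding.UTF8, MediaTypeNames.Text.Html);
        foreach (LinkedResource resource in resources)
            htmlView.LinkedResources.Add(resource); // each ContentId must match a "cid:" in the HTML

        message.AlternateViews.Add(htmlView);
        return message;
    }

    public static void Main()
    {
        // Tiny fake payload; real code would use the bytes decoded from the data URI.
        var inline = new LinkedResource(new MemoryStream(new byte[] { 1, 2, 3 }), "image/jpeg");
        inline.ContentId = Guid.NewGuid().ToString();
        inline.TransferEncoding = TransferEncoding.Base64;

        string html = "<p><img src=\"cid:" + inline.ContentId + "\"/></p>";
        using (MailMessage message = BuildMessage(html, new[] { inline }))
        {
            Console.WriteLine(message.AlternateViews.Count);                    // 1
            Console.WriteLine(message.AlternateViews[0].LinkedResources.Count); // 1
        }
    }
}
```

Disposing the `MailMessage` also disposes its views and linked resources, which closes the underlying `MemoryStream`s.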
Hope this helps
I think you need to run your code once for each img fetched from the string. The following method returns the src value of every img tag:
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Returns the src value of every <img> tag (regex-based, so only simple markup is handled)
public static List<string> FetchImgsFromSource(string htmlSource)
{
    List<string> listOfImgdata = new List<string>();
    string regexImgSrc = @"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
    MatchCollection matchesImgSrc = Regex.Matches(htmlSource, regexImgSrc, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    foreach (Match m in matchesImgSrc)
    {
        string src = m.Groups[1].Value; // captured src attribute value
        listOfImgdata.Add(src);
    }
    return listOfImgdata;
}
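To see what the pattern actually captures, here is a quick check against a small, made-up HTML string. The method is repeated inside a hypothetical `ImgSrcDemo` class only so the sample compiles on its own. Note the blind spot: an unquoted src that is directly followed by `>` is not reported, which is one of the ways a regex differs from a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImgSrcDemo
{
    // Same pattern as FetchImgsFromSource, copied so this sample is self-contained.
    public static List<string> FetchImgsFromSource(string htmlSource)
    {
        var result = new List<string>();
        string regexImgSrc = @"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
        foreach (Match m in Regex.Matches(htmlSource, regexImgSrc, RegexOptions.IgnoreCase | RegexOptions.Singleline))
            result.Add(m.Groups[1].Value);
        return result;
    }

    public static void Main()
    {
        string html = "<p><img src=\"data:image/png;base64,iVBORw0KGgo=\" alt=\"a\"></p>"
                    + "<IMG SRC='logo.gif'>"
                    + "<img src=plain.png>"; // unquoted and directly followed by '>'

        foreach (string src in FetchImgsFromSource(html))
            Console.WriteLine(src);
        // data:image/png;base64,iVBORw0KGgo=
        // logo.gif
        // (plain.png is not reported)
    }
}
```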
Then loop over this list and apply your logic to each item:
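// needs: using System; using System.IO; using System.Net.Mail; using System.Net.Mime;
// htmlBody is your HTML string
List<LinkedResource> resources = new List<LinkedResource>();
foreach (var item in FetchImgsFromSource(htmlBody))
{
    if (!item.StartsWith("data:image", StringComparison.OrdinalIgnoreCase))
        continue; // only inline base64 images are converted
    var imageData = Convert.FromBase64String(item.Substring(item.IndexOf(',') + 1)); // base64 part after the comma
    var contentId = Guid.NewGuid().ToString();
    LinkedResource inline = new LinkedResource(new MemoryStream(imageData), "image/jpeg");
    inline.ContentId = contentId;
    inline.TransferEncoding = TransferEncoding.Base64;
    resources.Add(inline);
    //Replace only this image's src value, so every img keeps its own cid
    htmlBody = htmlBody.Replace(item, "cid:" + inline.ContentId);
}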
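The loop above hard-codes `"image/jpeg"`, but a data URI already states its media type (`data:image/png;base64,...`). A minimal sketch of a hypothetical `DataUri.TryDecode` helper that extracts both the media type and the bytes, assuming only the `;base64` form needs to be supported:

```csharp
using System;
using System.Text;

public static class DataUri
{
    // Hypothetical helper: splits "data:image/png;base64,AAAA" into its media type and decoded bytes.
    // Only the base64 form is handled; anything else returns false.
    public static bool TryDecode(string src, out string mediaType, out byte[] data)
    {
        mediaType = null;
        data = null;
        if (src == null || !src.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            return false;

        int comma = src.IndexOf(',');
        if (comma < 0)
            return false;

        string header = src.Substring(5, comma - 5); // e.g. "image/png;base64"
        if (!header.EndsWith(";base64", StringComparison.OrdinalIgnoreCase))
            return false;

        try
        {
            data = Convert.FromBase64String(src.Substring(comma + 1));
        }
        catch (FormatException)
        {
            return false; // payload is not valid base64
        }
        mediaType = header.Substring(0, header.Length - ";base64".Length); // e.g. "image/png"
        return true;
    }

    public static void Main()
    {
        string src = "data:image/png;base64," + Convert.ToBase64String(Encoding.ASCII.GetBytes("PNG"));
        if (TryDecode(src, out string mediaType, out byte[] data))
            Console.WriteLine(mediaType + " " + data.Length); // image/png 3
    }
}
```

Inside the loop, the result could then be used as `new LinkedResource(new MemoryStream(data), mediaType)` instead of the fixed content type.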
Hope it works for you.
Also, the most robust way to parse the HTML DOM is HtmlAgilityPack, as mentioned by others.