Question:
I need to create newsletters from a URL. Here is what I do:
- Create a WebClient;
- Use WebClient's DownloadData method to get the page source as a byte array;
- Convert the source HTML byte array to a string and set it as the newsletter content.
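The three steps above can be sketched outside .NET too. Below is a rough JavaScript analogue, assuming Node 18+ (for the global fetch) or a modern browser. The function names are hypothetical, and UTF-8 is only a default: the real charset should come from the response's Content-Type header or the page's meta charset.

```javascript
// Hypothetical helpers mirroring the steps: download bytes, then decode to a string.
// Assumes Node 18+ (global fetch, TextDecoder) or a modern browser.

// Steps 1-2: fetch the page and return its body as a byte array.
async function downloadBytes(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} while fetching ${url}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}

// Step 3: decode the byte array into HTML text. UTF-8 is an assumption;
// pass the page's actual charset when it is known.
function bytesToHtml(bytes, charset = "utf-8") {
  return new TextDecoder(charset).decode(bytes);
}

// Usage (needs network access):
// downloadBytes("http://www.mysite.com/").then((b) => console.log(bytesToHtml(b)));
```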
But I have trouble with paths: all the elements' sources are relative (/img/welcome.png), but I need them to be absolute (http://www.mysite.com/img/welcome.png).
How can I do this?
Best regards, Alex.
Answer 1:
One possible way to solve this is to use the HtmlAgilityPack library.
For example, to fix links:
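using System.Net;
using System.Text;
using HtmlAgilityPack;

WebClient client = new WebClient();
byte[] requestHTML = client.DownloadData(sourceUrl);
string sourceHTML = Encoding.UTF8.GetString(requestHTML);
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(sourceHTML);
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection links = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        if (!string.IsNullOrEmpty(att.Value))
        {
            // AbsoluteUrlByRelative is your own helper, not part of HtmlAgilityPack
            att.Value = this.AbsoluteUrlByRelative(att.Value);
        }
    }
}
// The rewritten markup, ready to use as the newsletter content
string fixedHTML = htmlDoc.DocumentNode.OuterHtml;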
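The example calls this.AbsoluteUrlByRelative without showing it. A minimal sketch of what such a helper might do, written in JavaScript for illustration, is to resolve each value against the URL the page was downloaded from. The name absoluteUrlByRelative and its two-argument signature are hypothetical (in the C# code the base URL would presumably be a field).

```javascript
// Hypothetical helper: resolve an attribute value against the page's URL.
// Absolute URLs come back unchanged apart from URL normalization;
// values the URL parser rejects are returned untouched instead of throwing.
function absoluteUrlByRelative(pageUrl, value) {
  try {
    return new URL(value, pageUrl).href;
  } catch {
    return value;
  }
}

console.log(absoluteUrlByRelative("http://www.mysite.com/news/today.html", "/img/welcome.png"));
```

Because new URL accepts both relative and absolute input, links that are already absolute pass through (only normalized), so the caller does not need to test for http: or https: first.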
Answer 2:
If the page comes from the same site that is handling the current request (same-domain links), you can use this:
new Uri(Request.Url, "/img/welcome.png").ToString();
If you're in a non-web app, or you want to hardcode the domain name:
new Uri(new Uri("http://www.mysite.com"), "/img/welcome.png").ToString();
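A caveat when hardcoding only the domain: root-relative paths such as /img/welcome.png resolve the same against the domain or the page, but document-relative paths such as img/welcome.png or ../img/welcome.png resolve against the base's folder. Below is a small JavaScript comparison using the standard URL class, which should apply the same RFC 3986 resolution as new Uri(baseUri, relative) for simple cases like these. The page URL is a made-up example and compareBases is a hypothetical name.

```javascript
// Resolve the same references against the bare domain and against
// the (hypothetical) URL of the page they were found on.
function compareBases(references, domainBase, pageBase) {
  return references.map((ref) => ({
    ref,
    fromDomain: new URL(ref, domainBase).href,
    fromPage: new URL(ref, pageBase).href,
  }));
}

const rows = compareBases(
  ["/img/welcome.png", "img/welcome.png", "../img/welcome.png"],
  "http://www.mysite.com",
  "http://www.mysite.com/news/2010/today.html"
);
console.table(rows);
```

With the page URL as base, img/welcome.png becomes http://www.mysite.com/news/2010/img/welcome.png; with only the domain it becomes http://www.mysite.com/img/welcome.png, which may not exist. Using the URL the page was actually downloaded from avoids that.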
Answer 3:
You have some options:
- You can convert your byte array to a string and do a find-and-replace.
- You can create a DOM object, convert the byte array to a string, load it, and prepend the site's base URL to attribute values where needed (basically, you are looking for any src or href attribute that doesn't start with http: or https:).
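Imports System
Imports System.Net
Imports System.Text
Imports Microsoft.VisualBasic

Module Program
    Sub Main()
        Console.Write(ControlChars.Cr + "Please enter a Url (for example, http://www.msn.com): ")
        Dim remoteUrl As String = Console.ReadLine()
        Dim myWebClient As New WebClient()
        Console.WriteLine("Downloading " + remoteUrl)
        Dim myDataBuffer As Byte() = myWebClient.DownloadData(remoteUrl)
        ' UTF8 rather than ASCII, so non-ASCII characters survive the conversion
        Dim download As String = Encoding.UTF8.GetString(myDataBuffer)
        ' Prefix root-relative paths with scheme + host only (e.g. http://www.msn.com)
        Dim root As String = New Uri(remoteUrl).GetLeftPart(UriPartial.Authority)
        ' String.Replace returns a new string, so the result must be assigned back.
        ' Caveat: protocol-relative values (src="//...") are caught by this naive replace too.
        download = download.Replace("src=""/", "src=""" & root & "/")
        download = download.Replace("href=""/", "href=""" & root & "/")
        Console.WriteLine(download)
        Console.WriteLine("Download successful.")
    End Sub
End Module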
This is very contrived, and most of it is taken directly from http://msdn.microsoft.com/en-us/library/xz398a3f.aspx, but it illustrates the basic principle behind the first option.
Answer 4:
Just use this function:
' Converts a relative URL to an absolute URI
Function RelativeToAbsoluteUrl(ByVal baseURI As Uri, ByVal RelativeUrl As String) As Uri
    ' Parse the input, which may be relative or absolute
    Dim uriReturn As Uri = New Uri(RelativeUrl, UriKind.RelativeOrAbsolute)
    ' Make it absolute if it's relative
    If Not uriReturn.IsAbsoluteUri Then
        uriReturn = New Uri(baseURI, uriReturn)
    End If
    Return uriReturn
End Function
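To apply a function like this to a whole downloaded page without an HTML parser, one rough option is a regular expression over quoted src and href attributes that resolves each value against the page URL. Below is a JavaScript sketch under that assumption. The name absolutizeHtml is hypothetical, and a regex is not a real HTML parser: unquoted attributes, srcset and CSS url() values are not handled.

```javascript
// Rewrite every quoted src="..." / href='...' value to an absolute URL,
// resolved against the URL the page was downloaded from.
function absolutizeHtml(html, pageUrl) {
  const attrPattern = /\b(src|href)\s*=\s*(["'])(.*?)\2/gi;
  return html.replace(attrPattern, (match, attr, quote, value) => {
    if (value.trim() === "") {
      return match; // leave empty attributes alone
    }
    let absolute;
    try {
      absolute = new URL(value, pageUrl).href;
    } catch {
      return match; // leave values the URL parser rejects as they were
    }
    return `${attr}=${quote}${absolute}${quote}`;
  });
}
```

Unlike the plain find-and-replace in Answer 3, this also fixes document-relative (img/...) and protocol-relative (//cdn...) values.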
Answer 5:
Instead of resolving relative paths yourself, you can try adding a base element whose href attribute is set to the original base URI.
Placed as the first child of the head element, it should make the browser resolve all following relative paths against the original site, rather than against wherever the document (the newsletter) is located or served from.
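A rough JavaScript sketch of that idea, working on the downloaded HTML as a string, inserts a base element right after the opening head tag. The function name addBaseElement is hypothetical; it assumes the page has a head tag (otherwise it returns the HTML unchanged) and only escapes ampersands and double quotes in the URL.

```javascript
// Insert <base href="..."> as the first child of <head>, so relative
// paths resolve against the original site instead of the newsletter's location.
function addBaseElement(html, baseUrl) {
  const escaped = baseUrl.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  const headPattern = /<head\b[^>]*>/i; // \b keeps <header> from matching
  if (!headPattern.test(html)) {
    return html;
  }
  return html.replace(headPattern, (headTag) => `${headTag}<base href="${escaped}">`);
}
```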
In Firefox, a seemingly tautological round trip of reading and then re-assigning every src/href value writes complete (absolute) paths into the serialized HTML document, because the src and href properties return resolved URLs. The result can then be scripted or saved:
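var d = document;
var n = d.querySelectorAll('[src]'); // do the same for [href] ...
var op = "";
for (var i = 0; i < n.length; i++) {
    var ops = n[i].src; // the src property returns the resolved, absolute URL
    op = op + ops + "\n";
    n[i].src = ops;     // writing it back stores that absolute URL in the attribute
}
alert(op);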
Of course, none of the solutions above handle url() references: neither those in STYLE elements (for background images or content rules) nor those in style attributes on individual nodes.
Therefore, bringing the base-element approach to a valid, tested state (checked against a compatibility list) seems the more promising route to me.
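For the url() references mentioned above, a similar regex-based sketch can rewrite them in CSS text taken from a STYLE element or a style attribute. This is a JavaScript illustration with a hypothetical function name; it handles quoted and unquoted url(...) values and skips data: URIs, but not @import rules written as plain strings. For inline CSS the base should be the page URL; for an external stylesheet, relative URLs resolve against the stylesheet's own URL.

```javascript
// Rewrite url(...) references in CSS text to absolute URLs.
// Quoted and unquoted forms are handled; data: URIs and empty url() are skipped.
function absolutizeCssUrls(css, baseUrl) {
  const urlPattern = /url\(\s*(["']?)(.*?)\1\s*\)/gi;
  return css.replace(urlPattern, (match, quote, value) => {
    if (value === "" || /^data:/i.test(value)) {
      return match;
    }
    try {
      return `url(${quote}${new URL(value, baseUrl).href}${quote})`;
    } catch {
      return match; // leave values the URL parser rejects untouched
    }
  });
}
```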
Source: https://stackoverflow.com/questions/2719353/relative-to-absolute-paths-in-html-asp-net