Does anyone know how to screen scrape web-sites that use digest http authentication? I use code like this:
var request = (HttpWebRequest)WebRequest.Create(SiteUrl);
request.Credentials=new NetworkCredential(Login, Password)
I'm able to access the site's mainpage, but when I try to surf to any other pages (using another request with the same credentials) I get "HTTP/1.1 400 Bad Request" error.
I used Fiddler to compare requests of my C# application with Mozilla Firefox requests.
2 URLs that I try to access are: https://mysiteurl/forum/index.php https://mysiteurl/forum/viewforum.php?f=4&sid=d104363e563968b4e4c07e04f4a15203
Here are 2 requests () of my C# app:
Authorization: Digest username="xxx",realm="abc",nonce="NXa26+NjBAA=747dfd1776c9d585bd388377ef3160f1ff265429",uri="/forum/index.php",algorithm="MD5",cnonce="89179bf17dd27785aa1c88ad976817c9",nc=00000001,qop="auth",response="3088821620d9cbbf71e775fddbacfb6d"
Authorization: Digest username="xxx",realm="abc",nonce="1h7T6+NjBAA=4fed4d804d0edcb54bf4c2f912246330d96afa76",uri="/forum/viewforum.php",algorithm="MD5",cnonce="bb990b0516a371549401c0289fbacc7c",nc=00000001,qop="auth",response="1ddb95a45fd7ea8dbefd37a2db705e3a"
And that's what Firefox sending to the server:
Authorization: Digest username="xxx", realm="abc", nonce="T9ICNeRjBAA=4fbb28d42db044e182116ac27176e81d067a313c", uri="/forum/", algorithm=MD5, response="33f29dcc5d70b61be18eaddfca9bd601", qop=auth, nc=00000001, cnonce="ab96bbe39d8d776d"
Authorization: Digest username="xxx", realm="abc", nonce="T9ICNeRjBAA=4fbb28d42db044e182116ac27176e81d067a313c", uri="/forum/viewforum.php?f=4&sid=d104363e563968b4e4c07e04f4a15203", algorithm=MD5, response="a996dae9368a79d49f2f29ea7a327cd5", qop=auth, nc=00000002, cnonce="e233ae90908860e1"
So in my app I have different values in "nonce" field while in Firefox this field is the same. On the other hand I have same values in "nc" field while Firefox increments this field.
Also when my app tries to access site pages in Fiddler i can see that it always gets response "HTTP/1.1 401 Authorization Required", while Firefox authorizes only once. I've tried to set request.PreAuthenticate = true; but it seems to have no effect...
My question is: how to properly implement digest authentication using C#? Are there any standard methods or do I have to do it from scratch? Thanks in advance.
Create a class Digest.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Security.Cryptography;
using System.Text.RegularExpressions;
using System.Net;
using System.IO;
namespace NUI
{
public class DigestAuthFixer
{
private static string _host;
private static string _user;
private static string _password;
private static string _realm;
private static string _nonce;
private static string _qop;
private static string _cnonce;
private static DateTime _cnonceDate;
private static int _nc;
public DigestAuthFixer(string host, string user, string password)
{
// TODO: Complete member initialization
_host = host;
_user = user;
_password = password;
}
private string CalculateMd5Hash(
string input)
{
var inputBytes = Encoding.ASCII.GetBytes(input);
var hash = MD5.Create().ComputeHash(inputBytes);
var sb = new StringBuilder();
foreach (var b in hash)
sb.Append(b.ToString("x2"));
return sb.ToString();
}
private string GrabHeaderVar(
string varName,
string header)
{
var regHeader = new Regex(string.Format(@"{0}=""([^""]*)""", varName));
var matchHeader = regHeader.Match(header);
if (matchHeader.Success)
return matchHeader.Groups[1].Value;
throw new ApplicationException(string.Format("Header {0} not found", varName));
}
private string GetDigestHeader(
string dir)
{
_nc = _nc + 1;
var ha1 = CalculateMd5Hash(string.Format("{0}:{1}:{2}", _user, _realm, _password));
var ha2 = CalculateMd5Hash(string.Format("{0}:{1}", "GET", dir));
var digestResponse =
CalculateMd5Hash(string.Format("{0}:{1}:{2:00000000}:{3}:{4}:{5}", ha1, _nonce, _nc, _cnonce, _qop, ha2));
return string.Format("Digest username=\"{0}\", realm=\"{1}\", nonce=\"{2}\", uri=\"{3}\", " +
"algorithm=MD5, response=\"{4}\", qop={5}, nc={6:00000000}, cnonce=\"{7}\"",
_user, _realm, _nonce, dir, digestResponse, _qop, _nc, _cnonce);
}
public string GrabResponse(
string dir)
{
var url = _host + dir;
var uri = new Uri(url);
var request = (HttpWebRequest)WebRequest.Create(uri);
// If we've got a recent Auth header, re-use it!
if (!string.IsNullOrEmpty(_cnonce) &&
DateTime.Now.Subtract(_cnonceDate).TotalHours < 1.0)
{
request.Headers.Add("Authorization", GetDigestHeader(dir));
}
HttpWebResponse response;
try
{
response = (HttpWebResponse)request.GetResponse();
}
catch (WebException ex)
{
// Try to fix a 401 exception by adding a Authorization header
if (ex.Response == null || ((HttpWebResponse)ex.Response).StatusCode != HttpStatusCode.Unauthorized)
throw;
var wwwAuthenticateHeader = ex.Response.Headers["WWW-Authenticate"];
_realm = GrabHeaderVar("realm", wwwAuthenticateHeader);
_nonce = GrabHeaderVar("nonce", wwwAuthenticateHeader);
_qop = GrabHeaderVar("qop", wwwAuthenticateHeader);
_nc = 0;
_cnonce = new Random().Next(123400, 9999999).ToString();
_cnonceDate = DateTime.Now;
var request2 = (HttpWebRequest)WebRequest.Create(uri);
request2.Headers.Add("Authorization", GetDigestHeader(dir));
response = (HttpWebResponse)request2.GetResponse();
}
var reader = new StreamReader(response.GetResponseStream());
return reader.ReadToEnd();
}
}
}
Now in your application, you can use the following code:
DigestAuthFixer digest = new DigestAuthFixer(url, username, password);
string strReturn = digest.GrabResponse(url);
This article from 4GuysFromRolla appears to be what you are looking for:
I'm currently observing the same issue, though the web server I'm testing this against is my own. The server logs show:
Digest: uri mismatch - </var/path/some.jpg> does not match request-uri
</var/path/some.jpg?parameter=123456789>
I tried removing the arguments from the URL (as that seemed to be what's different), but the error still occurred just like before.
My conclusion is that the URL arguments have to be included in the digest hash as well and that the HttpWebRequest
is for some reason removing it.
来源:https://stackoverflow.com/questions/594323/implement-digest-authentication-via-httpwebrequest-in-c-sharp