Question
I'm unable to access WebHDFS from browser due to Kerberos security. Can anyone help me with this?
Below is the error shown in the browser for “http://****.****/webhdfs/v1/prod/snapshot_rpx/archive?op=LISTSTATUS&user.name=us”
HTTP ERROR 401
Problem accessing /webhdfs/v1/prod/snapshot_rpx/archive. Reason: Authentication required
.NET code for making the request to this URL:
// Note: no Kerberos/SPNEGO credentials are attached to this request,
// which is why the secured NameNode replies with 401.
HttpWebRequest http = (HttpWebRequest)WebRequest.Create(requestUri);
http.Timeout = timeout;
http.ContentType = contentType;
string responseData = string.Empty;
using (WebResponse response = http.GetResponse())
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
    responseData = sr.ReadToEnd();
}
return responseData;
Answer 1:
[Important notice] This answer applies to a plain Hadoop cluster using a Linux KDC (typically MIT Kerberos). For a Cloudera cluster relying on a Microsoft Active Directory KDC, any .NET HTTP connector can achieve SPNEGO through the Microsoft SSPI protocol (sooo boring...)
~~~~
The only way I know to access WebHDFS from the Microsoft world is an ugly and complex workaround:
- install the MIT Kerberos for Windows utility on the machine that will actually connect to HDFS, plus the appropriate Kerberos V5 config file
- make sure that your JVM has the "unlimited strength cryptography" JCE security policy installed (a separate download, duh)
- develop a small Java utility that connects to the WebHDFS service (on the NameNode) using SPNEGO with a GSSAPI Kerberos ticket
Option 1: create the ticket through the GUI, and tell Java to fetch it from the default cache
Option 2: tell Java to create its own ticket automatically, using a keytab file (must be created on Linux with ktutil; there is no such utility in the Windows package), and ignore the cache
- make your Java code run a single GET to retrieve an HDFS delegation token for this session, dump the token to StdOut, then exit
- make your .NET code run the Java utility, capture StdOut, and retrieve the token
- connect to WebHDFS (NameNode, plus any redirects to the DataNodes) without SPNEGO, inserting the token in the URL as proof of pre-authentication (see the sketch just below)
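For illustration, here is a minimal .NET sketch of the last three steps (run the Java helper, capture StdOut, reuse the token). The helper JAR name, NameNode host and HDFS path are placeholders; only the WebHDFS "delegation" query parameter is standard:

using System.Diagnostics;
using System.IO;
using System.Net;

static class WebHdfsWithDelegationToken
{
    // Run the Java utility; it does the SPNEGO/GSSAPI part and prints the token to StdOut.
    public static string GetDelegationToken()
    {
        var psi = new ProcessStartInfo("java", "-jar HdfsTokenFetcher.jar")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        using (Process proc = Process.Start(psi))
        {
            string token = proc.StandardOutput.ReadToEnd().Trim();
            proc.WaitForExit();
            return token;
        }
    }

    // Plain HTTP call, no SPNEGO: the token on the URL is the proof of pre-authentication.
    public static string ListStatus(string token)
    {
        string uri = "http://namenode.example.com:50070/webhdfs/v1/prod/snapshot_rpx/archive"
                   + "?op=LISTSTATUS&delegation=" + token;
        var http = (HttpWebRequest)WebRequest.Create(uri);
        using (WebResponse response = http.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}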
So in the end it's a Java problem. And setting up a working Kerberos config is incredibly tricky (cf. "Madness beyond the Gate", the current reference site on Kerberos implementation issues in the Hadoop ecosystem).
Answer 2:
Sorry for the delayed response. Apache Knox may actually provide the solution that you are looking for. It shields REST clients from the details of how the Hadoop cluster itself is secured. The cluster can go from secured to unsecured on a whim, and the clients will authenticate to the Knox Gateway the same way.
The question is how exactly you would like to authenticate to Knox. The typical way is HTTP Basic Auth against LDAP (which could be AD). There are, however, other authentication/federation providers to allow for other mechanisms as well.
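For example, here is a minimal .NET sketch of that Basic Auth flow against a Knox-proxied WebHDFS URL; the gateway host, port (8443 is the usual default) and topology name ("default") are assumptions to adapt to your own deployment:

using System.IO;
using System.Net;

static class KnoxWebHdfsExample
{
    public static string ListStatus(string user, string password)
    {
        string uri = "https://knox-host:8443/gateway/default/webhdfs/v1/prod/snapshot_rpx/archive?op=LISTSTATUS";
        var http = (HttpWebRequest)WebRequest.Create(uri);
        // HTTP Basic credentials are checked by Knox against LDAP/AD;
        // Knox then talks Kerberos to the cluster on the client's behalf.
        http.Credentials = new NetworkCredential(user, password);
        http.PreAuthenticate = true;
        using (WebResponse response = http.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}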
The header-based preauth SSO provider is a decent way to go for web-app type use cases. See: http://knox.apache.org/books/knox-0-7-0/user-guide.html#Preauthenticated+SSO+Provider
Coupled with SSL mutual authentication (http://knox.apache.org/books/knox-0-7-0/user-guide.html#Mutual+Authentication+with+SSL) between the application and Apache Knox, this is an effective way to leverage Knox's role as a trusted proxy for Hadoop and federate the identity established in your application.
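On the .NET side, presenting a client certificate is just one extra call; a small sketch (the .pfx path, password and URL are placeholders):

using System.Net;
using System.Security.Cryptography.X509Certificates;

static class KnoxMutualTlsExample
{
    public static HttpWebRequest CreateRequest(string uri)
    {
        var http = (HttpWebRequest)WebRequest.Create(uri);
        // The client certificate identifies the application to the Knox gateway (mutual SSL).
        http.ClientCertificates.Add(new X509Certificate2(@"C:\certs\app-client.pfx", "pfxPassword"));
        return http;
    }
}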
The upcoming v0.8.0 release introduces more SSO mechanisms as well.
Hadoop REST clients shouldn't need to know so many details about the Hadoop cluster that they all break whenever Hadoop's flexibility lets services move or security be enabled in different ways. Forcing SPNEGO on every browser is a show stopper for many. Apache Knox addresses these issues in a way that REST API developers/consumers are accustomed to working with.
Source: https://stackoverflow.com/questions/33878290/accessing-kerberos-protected-webhdfs-from-net-applicationconsole