I want to query an LDAP directory to find out how employees are distributed across departments and groups...
Something like: "Give me the department name of all th
Found the answer myself:
First run these commands to make sure RCurl is installed (as described in http://www.programmingr.com/content/webscraping-using-readlines-and-rcurl/ ):
install.packages("RCurl", dependencies = TRUE)
library("RCurl")
Then use getURL with an LDAP URL (as described in http://www.ietf.org/rfc/rfc2255.txt, although I couldn't understand it until I read http://docs.oracle.com/cd/E19396-01/817-7616/ldurl.html and saw the format ldap[s]://hostname:port/base_dn?attributes?scope?filter):
getURL("ldap://ldap.replaceme.com/o=replaceme.com?memberuid?sub?(cn=group-name)")
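To make the URL format easier to see, here is a minimal sketch that assembles such a URL from its parts. build_ldap_url is a hypothetical helper I made up for illustration (it is not part of RCurl), its default scope and filter are my own assumptions, and it does no percent-encoding, so values with special characters may need escaping first.

```r
# Sketch: assemble an LDAP URL of the form
#   ldap[s]://hostname:port/base_dn?attributes?scope?filter
# build_ldap_url is a hypothetical helper, not an RCurl function.
build_ldap_url <- function(host, base_dn, attributes = character(0),
                           scope = "sub", filter = "(objectClass=*)",
                           port = NULL, secure = FALSE) {
  scheme <- if (secure) "ldaps" else "ldap"
  # Only append ":port" when a port was given
  hostport <- if (is.null(port)) host else paste0(host, ":", port)
  # Several attributes are separated by commas inside the URL
  attrs <- paste(attributes, collapse = ",")
  paste0(scheme, "://", hostport, "/", base_dn,
         "?", attrs, "?", scope, "?", filter)
}

url <- build_ldap_url("ldap.replaceme.com", "o=replaceme.com",
                      attributes = "memberuid",
                      filter = "(cn=group-name)")
```

The resulting string can then be passed to getURL(url) just like the hand-written URL above.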
I wrote an R library for accessing LDAP servers using the OpenLDAP library. In detail, the function searchldap is a wrapper for the openldap method searchldap. https://github.com/LukasK13/ldapr
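I followed this strategy:
(1) use a Perl script to query the LDAP server and write the entries to a JSON file
(2) read that JSON file into R and convert it to a data frame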
For step (1), I used this script:
#use Modern::Perl;
use strict;
use warnings;
use feature 'say';
use Net::LDAP;
use JSON;
# chdir does not expand "~", so build the path from $HOME
chdir("$ENV{HOME}/git/_my/R_one-offs/R_grabbag") or die "Cannot chdir: $!";
my $ldap = Net::LDAP->new( 'ldap.mydomain.de' ) or die "$@";
my $outfile = "ldapentries_mydomain_ldap.json";
my $mesg = $ldap->bind ; # an anonymous bind
# get all cn's (= all names)
$mesg = $ldap->search(
base => "ou=People,dc=mydomain,dc=de",
filter => "(cn=*)"
);
my $json_text = "";
my @entries;
foreach my $entry ($mesg->entries){
my %entry;
foreach my $attr ($entry->attributes) {
foreach my $value ($entry->get_value($attr)) {
# note: for a multi-valued attribute only the last value is kept
$entry{$attr} = $value;
}
}
push @entries, \%entry;
}
$json_text = to_json(\@entries);
say "Length json_text: " . length($json_text);
open(my $FH, ">", $outfile) or die "Cannot open $outfile: $!";
print $FH $json_text;
close($FH);
$mesg = $ldap->unbind;
You might need to check the maximum number of entries the LDAP server returns per query. See https://serverfault.com/questions/328671/paging-using-ldapsearch
For step (2), I used this R code:
setwd("~/git/_my/R_one-offs/R_grabbag")
library(rjson)
# read into R list, from file, created from perl script
json <- rjson::fromJSON(file="ldapentries_mydomain_ldap.json",method = "C")
head(json)
# create a data frame from list
library(reshape2)
library(dplyr)
library(tidyr)
# not really efficient, maybe there's a better way to do it
df.ldap <- json %>% melt %>% spread( L2,value)
# optional:
# turn factors into characters
i <- sapply(df.ldap, is.factor)
df.ldap[i] <- lapply(df.ldap[i], as.character)
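As one possible alternative to the melt/spread step, here is a base R sketch that builds the data frame directly from a list of entries. The entries list below is made-up sample data standing in for the result of rjson::fromJSON, and entries_to_df is a hypothetical helper name. I have not compared its speed against the version above.

```r
# Made-up sample data, shaped like the list returned by rjson::fromJSON()
entries <- list(
  list(cn = "Alice Example", uid = "alice", mail = "alice@example.org"),
  list(cn = "Bob Example", uid = "bob")
)

# Hypothetical helper: one row per entry, one column per attribute
entries_to_df <- function(entries) {
  # Union of all attribute names seen in any entry
  attrs <- unique(unlist(lapply(entries, names)))
  rows <- lapply(entries, function(e) {
    vals <- lapply(attrs, function(a) {
      # Attributes missing from an entry become NA,
      # so every row ends up with the same columns
      if (is.null(e[[a]])) NA_character_ else as.character(e[[a]])[1]
    })
    names(vals) <- attrs
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

df.example <- entries_to_df(entries)
```

Because stringsAsFactors = FALSE is set, the columns are already character vectors and the factor conversion step is not needed.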
I've written a function here to parse LDAP output into a data frame, and I used the examples provided as a reference for getting everything going.
I hope it helps someone!
library(RCurl)
library(gtools)
parseldap<-function(url, userpwd=NULL)
{
ldapraw<-getURL(url, userpwd=userpwd)
# separate entries by two newlines
ldapraw<-gsub("(DN: .*?)\n", "\\1\n\n", ldapraw)
ldapsplit<-strsplit(ldapraw, "\n\n")
ldapsplit<-unlist(ldapsplit)
# init list and count
mylist<-list()
count<-0
for (ldapline in ldapsplit) {
# if this is the beginning of the entry
if(grepl("^DN:", ldapline)) {
count<-count+1
# after the first
if(count == 2 ) {
df<-data.frame(mylist)
mylist<-list()
}
if(count > 2) {
df<-smartbind(df, mylist)
mylist<-list()
}
mylist["DN"] <-gsub("^DN: ", "", ldapline)
} else {
linesplit<-unlist(strsplit(ldapline, "\n"))
if(length(linesplit) > 1) {
for(line in linesplit) {
linesplit2<-unlist(strsplit(line, "\t"))
linesplit2<-unlist(strsplit(linesplit2[2], ": "))
if(!is.null(unlist(mylist[linesplit2[1]]))) {
x<-strsplit(unlist(mylist[linesplit2[1]]), "|", fixed=TRUE)
x<-append(unlist(x), linesplit2[2])
x<-paste(x, sep="", collapse="|")
mylist[linesplit2[1]] <- x
} else {
mylist[linesplit2[1]] <- linesplit2[2]
}
}
} else {
ldaplinesplit<-unlist(strsplit(ldapline, "\t"))
ldaplinesplit<-unlist(strsplit(ldaplinesplit[2], ": "))
mylist[ldaplinesplit[1]] <- ldaplinesplit[2]
}
}
}
if(count == 1 ) {
df<-data.frame(mylist)
} else {
df<-smartbind(df, mylist)
}
return(df)
}
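Note that when an attribute occurs more than once in an entry, the function above joins its values into a single string separated by "|". A minimal sketch of splitting such a cell back into separate values, using a made-up string in place of a real column value:

```r
# Made-up value, shaped like a multi-valued cell produced by parseldap()
memberuid <- "alice|bob|carol"

# fixed = TRUE treats "|" as a literal character, not a regex alternation
members <- unlist(strsplit(memberuid, "|", fixed = TRUE))
```

This only works cleanly if the attribute values themselves never contain a "|" character.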