I have a list shaped like this:

'(("Alpha" . 1538)
  ("Beta" . 8036)
  ("Gamma" . 8990)
  ("Beta" . 10052)
  ("Alpha" . 12837))

How do I count the number of occurrences of each key?
I dunno that this is the most elegant, but it seems reasonable:
(defun add-for-cheeso (data)
  (let (result)
    (dolist (elt data result)
      (let ((sofar (assoc (car elt) result)))
        (if sofar
            (setcdr sofar (1+ (cdr sofar)))
          (push (cons (car elt) 1) result))))))
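For instance, with the sample data from the question it returns the counts in reverse order of first appearance:

(add-for-cheeso '(("Alpha" . 1538) ("Beta" . 8036) ("Gamma" . 8990)
                  ("Beta" . 10052) ("Alpha" . 12837)))
;; => (("Gamma" . 1) ("Beta" . 2) ("Alpha" . 2))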
Every time you want to traverse a list and return some value afterwards, be it a new list or some aggregate result, you are thinking of a fold, also called "reduce" in Python and Lisps. Fold is a great abstraction, as it lets you write generic code applicable to many use cases just by tweaking a few elements. What do finding the sum of several numbers, finding their product, and finding the minimum have in common? They are all folds, because you run through the list and then return some result based on its contents. In Emacs Lisp they would look like this (the examples below use reduce and friends from the old cl package; in current Emacs the same functions are available under cl- prefixed names, e.g. cl-reduce from cl-lib):
(reduce '+ '(1 2 3 4 5)) ; 15
(reduce '* '(1 2 3 4 5)) ; 120
(reduce 'min '(1 2 3 4 5)) ; 1
But folds are even more general than this. What is similar between finding a sum, counting the even numbers in a list, removing every odd number, and building a list with every number increased by 5? Every such function can be implemented by taking some base value and successively transforming it until you get the result. You take this base value, a metaphorical blob of clay, call it the "accumulator", then take one element from the list and, based on this element, do something to the blob of clay, turning it into a draft of a magnificent sculpture. Then you take the next element from the list and do something new to your sculpture. You repeat that until the list is empty and you end up with a masterpiece. It's as if every element of the list were a single instruction in a large recipe. Just bear in mind that you are completely free to do anything with the clay; you don't have to use the list elements in the result directly. Technically, this means that the accumulator (and thus the result) may be of a different type.
(reduce '+ '(1 2 3 4 5) :initial-value 0) ; 15
(reduce (lambda (acc x) (if (evenp x) (1+ acc) acc)) '(1 2 3 4 5) :initial-value 0) ; 2
(reduce (lambda (x acc) (if (oddp x) acc (cons x acc))) '(1 2 3 4 5) :initial-value '() :from-end t) ; (2 4)
(reduce (lambda (x acc) (cons (+ x 5) acc)) '(1 2 3 4 5) :initial-value '() :from-end t) ; (6 7 8 9 10)
Note about reducing from the end: lists in Lisps are not smart arrays like in Python or Java, they are linked lists, so accessing or changing an element somewhere in the middle of a list is an O(n) operation, while "consing" onto the beginning of a list is O(1). In other words, appending an element to the end of a list is expensive, so Lispers usually add elements to the beginning of a list and finally reverse it, which is called the push/nreverse idiom. If we did an ordinary reduce in the last two examples, we would cons 1 onto the accumulator and get (1), then cons 2 and get (2 1), and so on, until we got the correct result but upside-down. We could use the reverse function afterwards, but luckily Emacs's reduce supports the :from-end keyword argument, so it conses 5, then 4, then 3, and so on.
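As an aside, the push/nreverse idiom itself looks roughly like this (a minimal sketch, doubling each element purely for illustration):

(let (acc)
  (dolist (x '(1 2 3 4 5))
    (push (* 2 x) acc))   ; acc is built backwards: (10 8 6 4 2)
  (nreverse acc))         ; => (2 4 6 8 10)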
It's clear now that your operation is a fold: traverse the original alist and count occurrences of each key. But before writing our fold, let's talk about alists. An alist in Lisp is a poor man's hash-table. You don't usually tinker with a programming language's hash-table implementation details, do you? You work with an API. In Python this API is the square-bracket syntax (d['a'] = 1) and dict methods (d.keys()). For alists the API contains the function assoc, which returns an item given a key.
(assoc 'a '((a . 1) (b . 2))) ; (a . 1)
Why do I talk about implementation details? Because you work via assoc and you don't care what the alist actually looks like; you abstract that away. Another piece of the API is that if you want to add a new element or change an existing one, you simply cons a dotted pair onto the alist. That is how you are supposed to work with alists, regardless of their internal structure. Why does that work? For example, if I want to change the value for key a to 10, I simply run (push '(a . 10) my-alist), and my-alist ends up being '((a . 10) (a . 1) (b . 2)). That's no problem, because assoc returns only the first matching dotted pair and ignores the rest, so you can treat an alist just like any other key-value data structure. A quick demonstration of that shadowing behavior follows, and then our first serious fold.
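A minimal sketch of the shadowing behavior, with a toy alist used purely for illustration:

(setq my-alist '((a . 1) (b . 2)))
(push '(a . 10) my-alist)   ; my-alist is now ((a . 10) (a . 1) (b . 2))
(assoc 'a my-alist)         ; => (a . 10); the older (a . 1) is ignored

With that in mind, here is the fold: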
(reduce (lambda (acc x)
          (let* ((key (car x))
                 (pair (assoc key acc))
                 (count (cdr pair)))
            (if pair
                (cons (cons key (1+ count)) acc)
              (cons (cons key 1) acc))))
        my-alist
        :initial-value '())
What happens here? We take your data and an empty list, which will soon become our desired result. At each step we take a pair from the data and ask: does our result already contain info about this key? If not, we add the key to the result with a count of 1, since we've met it for the first time. If we do find the key in our result, we again cons a pair onto the result, but this time with the count increased by 1. Repeat that process for every item in your data, and (for a longer input list of the same shape as the five-pair sample above) you get:
(("Alpha" . 4) ("Gamma" . 3) ("Gamma" . 2) ("Rho" . 1) ("Alpha" . 3)
("Beta" . 5) ("Beta" . 4) ("Beta" . 3) ("Alpha" . 2) ("Beta" . 2)
("Gamma" . 1) ("Beta" . 1) ("Alpha" . 1))
Remember that assoc only cares about the first occurrence of a key? This alist behaves the same as if it were just (("Alpha" . 4) ("Gamma" . 3) ("Rho" . 1) ("Beta" . 5)), so we're good here. Still, could we change our fold so it produces the latter, shorter result instead? Hold on, why over-complicate the fold, if we can just tweak the result afterwards? After all, what is computer programming, if not a series of data transformations? There is no reason you couldn't simply remove all the "obsolete" pairs from your alist: call cl-remove-duplicates with the right arguments, and you're done.
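For instance, something like this should do it (a sketch, assuming the fold's return value is bound to result; :from-end t keeps the first pair for each key, which is exactly the pair assoc would have returned):

(cl-remove-duplicates result :key #'car :test #'equal :from-end t)
;; => (("Alpha" . 4) ("Gamma" . 3) ("Rho" . 1) ("Beta" . 5))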
So we're proud of ourselves, we wrote a fold, a pillar of functional programming, yet careful examination exposes an inefficiency: we traverse the accumulator with assoc to find the pair whose value we want to increment. assoc takes O(n), reduce itself takes O(n), therefore our algorithm is O(n²) (read about order of growth if you don't understand Big-O notation). Clearly we would be better off working with a proper, optimized hash-table and converting it to an alist only when we need one. Let's rewrite our fold:
(reduce (lambda (acc x)
          (cl-incf (gethash (car x) acc 0))
          acc)
        my-alist
        :initial-value (make-hash-table :test 'equal))
(gethash k d 0) is equivalent to Python's d.get('k', 0), where the last argument is the default. cl-incf (the cl-lib name for Common Lisp's incf) is a smart macro that increments its argument in place (read about setf to understand smart assignments). make-hash-table needs a custom test function, because strings can't be compared with the default eql. To get an alist, just convert the resulting hash-table of our fold with an ht->alist function, which we either take from Wilfred's ht.el library or write ourselves:
(defun ht->alist (table)
  (let (alist)
    (maphash (lambda (k v)
               (push (cons k v) alist))
             table)
    alist))
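Putting the two pieces together (a sketch; the order of pairs in the output depends on hash-table iteration order, so treat it as unspecified):

(ht->alist
 (reduce (lambda (acc x)
           (cl-incf (gethash (car x) acc 0))
           acc)
         my-alist
         :initial-value (make-hash-table :test 'equal)))
;; => e.g. (("Beta" . 5) ("Rho" . 1) ("Gamma" . 3) ("Alpha" . 4))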
Combining higher-level Common Lisp functions (available after (require 'cl)):
(defun count-unique (alist)
  (mapcar
   (lambda (item)
     (cons (car item)
           (count (car item) alist :test #'equal :key #'car)))
   (remove-duplicates alist :test #'equal :key #'car)))
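For example, with the question's sample data (remove-duplicates keeps the last pair for each key, so the result lists keys in that order):

(count-unique '(("Alpha" . 1538) ("Beta" . 8036) ("Gamma" . 8990)
                ("Beta" . 10052) ("Alpha" . 12837)))
;; => (("Gamma" . 1) ("Beta" . 2) ("Alpha" . 2))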
It doesn't scale to large lists though: count and remove-duplicates each re-traverse the whole list, so this version is quadratic. If you need O(n) performance, use a hash-table-based solution instead, such as this less elegant one:
(defun count-unique (alist)
  (loop
   with hash = (make-hash-table :test #'equal)
   for (key . nil) in alist
   do (incf (gethash key hash 0))
   finally (return
            (loop for key being each hash-key of hash
                  using (hash-value value)
                  collect (cons key value)))))