Curious if anyone has insight into what algorithm Google News uses to group similar stories together. K-means? Or something custom?
It is hard to find that out for certain, I guess, but I did find this good white paper on the algorithms used for Google News personalisation (recommendations). Have a look for yourself:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80.4329&rep=rep1&type=pdf
The three algorithms covered there are (1) MinHash clustering, (2) Probabilistic Latent Semantic Indexing (PLSI), and (3) covisitation counts, plus some combinations of them.
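To give a feel for the first of these, here is a minimal MinHash sketch in Python. It is only an illustration of the general technique, not Google's implementation: the function names, the seeded BLAKE2 hash family, and the toy click histories are all my own assumptions. The idea is that two sets share the same minimum hash value with probability equal to their Jaccard similarity, so sets whose signatures agree on many positions are likely similar and can be put in the same cluster.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """One member of a seeded hash family (an assumption; any good family works)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=64):
    """Return the MinHash signature of a non-empty collection of strings."""
    return tuple(min(_hash(seed, x) for x in items) for seed in range(num_hashes))


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, which estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def cluster_key(signature, band_size=2):
    """Concatenate the first few min-hashes; equal keys land in the same cluster."""
    return signature[:band_size]


# Hypothetical click histories: each user is the set of story IDs they clicked.
alice = {"story-1", "story-2", "story-3", "story-4"}
bob = {"story-2", "story-3", "story-4", "story-5"}
carol = {"story-8", "story-9"}

sig_alice = minhash_signature(alice)
sig_bob = minhash_signature(bob)
sig_carol = minhash_signature(carol)

# True Jaccard(alice, bob) is 3/5; the estimate gets closer as num_hashes grows.
print(estimated_jaccard(sig_alice, sig_bob))
print(estimated_jaccard(sig_alice, sig_carol))
```

A larger `band_size` makes clusters tighter (fewer, more similar members), while a smaller one makes them looser. The same trick applies to documents if you replace story IDs with word shingles.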
Hope this information was helpful!
When Google launched Google News, the "About Google News" page included a small section about the algorithms used for grouping. It mentioned "An advanced Bayesian Network" and some other algorithms (no other algorithm names were given). That paragraph is no longer on the page.