Simple Python implementation of collaborative topic modeling?

情歌与酒 2021-01-31 08:53

I came across these two papers, which combine collaborative filtering (matrix factorization) and topic modelling (LDA) to recommend users similar articles/posts based on topic ter

2 Answers
  • 2021-01-31 09:17

    This should get you started (although not sure why this hasn't been posted yet): https://github.com/arongdari/python-topic-model

    More specifically: https://github.com/arongdari/python-topic-model/blob/master/ptm/collabotm.py

    class CollaborativeTopicModel:
        """
        Wang, Chong, and David M. Blei. "Collaborative topic modeling for
        recommending scientific articles." Proceedings of the 17th ACM SIGKDD
        international conference on Knowledge discovery and data mining.
        ACM, 2011.

        Attributes
        ----------
        n_item: int
            number of items
        n_user: int
            number of users
        R: ndarray, shape (n_user, n_item)
            user x item rating matrix
        """
    

    Looks nice and straightforward. I'd still suggest at least looking at gensim; Radim has done a fantastic job of optimizing that software.
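
For intuition about what a class like this does, here is a minimal NumPy sketch of the alternating least squares updates as I understand them from the Wang and Blei paper cited in the docstring. It is not the code from the linked repository. The function name `fit_ctr` and the toy data are hypothetical, and it makes one big simplifying assumption: the per-item topic proportions `theta` are held fixed (for example, taken from a separately trained LDA model), whereas the paper also updates them.

```python
import numpy as np


def fit_ctr(R, theta, lambda_u=0.01, lambda_v=10.0, a=1.0, b=0.01, n_iter=20):
    """ALS for a CTR-style model with fixed topic proportions (a sketch).

    R:     (n_user, n_item) binary matrix, 1 = user has the item
    theta: (n_item, n_topic) per-item topic proportions, assumed given
    """
    n_user, n_item = R.shape
    n_topic = theta.shape[1]
    C = np.where(R > 0, a, b)      # confidence: a for observed, b for missing
    U = np.zeros((n_user, n_topic))
    V = theta.copy()               # item vectors start at their topic mix
    eye = np.eye(n_topic)
    for _ in range(n_iter):
        for i in range(n_user):
            Vc = V * C[i][:, None]  # rows of V weighted by c_ij
            U[i] = np.linalg.solve(Vc.T @ V + lambda_u * eye, Vc.T @ R[i])
        for j in range(n_item):
            Uc = U * C[:, j][:, None]
            # lambda_v * theta[j] pulls the item vector toward its topics
            V[j] = np.linalg.solve(Uc.T @ U + lambda_v * eye,
                                   Uc.T @ R[:, j] + lambda_v * theta[j])
    return U, V


# Toy data, made up for illustration: items 0-1 share one topic, items 2-3 the other.
R = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1]], dtype=float)
theta = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.1, 0.9],
                  [0.2, 0.8]])
U, V = fit_ctr(R, theta)
scores = U @ V.T                   # predicted preference, user x item
```

In this toy setup, user 1 has only item 0, yet item 1 should score higher for them than items 2 and 3 because it shares item 0's topics. A large `lambda_v` keeps item vectors close to their topic proportions, which is what lets the model score items that nobody has rated yet.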

  • 2021-01-31 09:22

    A very simple LDA implementation using gensim. You can find more information here: https://radimrehurek.com/gensim/tutorial.html

    I hope it helps.

    # Requires the NLTK tokenizer, stop word and RSLP data (see nltk.download).
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    from nltk.stem import RSLPStemmer
    from gensim import corpora
    from gensim.models import LdaModel

    # Note: RSLPStemmer is a Portuguese stemmer; for English text a stemmer
    # such as nltk.stem.PorterStemmer is the more usual choice.
    st = RSLPStemmer()
    stop_words = set(stopwords.words('english'))
    texts = []

    doc1 = "Veganism is both the practice of abstaining from the use of animal products, particularly in diet, and an associated philosophy that rejects the commodity status of animals"
    doc2 = "A follower of either the diet or the philosophy is known as a vegan."
    doc3 = "Distinctions are sometimes made between several categories of veganism."
    doc4 = "Dietary vegans refrain from ingesting animal products. This means avoiding not only meat but also egg and dairy products and other animal-derived foodstuffs."
    doc5 = "Some dietary vegans choose to wear clothing that includes animal products (for example, leather or wool)."

    docs = [doc1, doc2, doc3, doc4, doc5]

    for doc in docs:
        # lowercase, tokenize, drop stop words, then stem
        # (punctuation is not filtered, so ',' and '.' remain as tokens)
        tokens = word_tokenize(doc.lower())
        stopped_tokens = [w for w in tokens if w not in stop_words]
        stemmed_tokens = [st.stem(w) for w in stopped_tokens]
        texts.append(stemmed_tokens)

    # map each token to an integer id, then build bag-of-words vectors
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    # generate LDA model using gensim
    ldamodel = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)
    print(ldamodel.print_topics(num_topics=2, num_words=4))
    

    [(0, u'0.066*animal + 0.065*, + 0.047*product + 0.028*philosophy'), (1, u'0.085*. + 0.047*product + 0.028*dietary + 0.028*veg')]
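
Once an LDA model is trained, one simple way to use it for "similar articles" is to compare documents by their topic proportions. The sketch below is content-based only (no collaborative filtering) and uses just the standard library. The helper names `cosine` and `most_similar` and the topic vectors are made up for illustration. With gensim, dense vectors like these could presumably be built from `ldamodel.get_document_topics(bow, minimum_probability=0.0)`, which returns `(topic_id, probability)` pairs.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def most_similar(doc_topics, query_id, top_n=3):
    """Rank all other documents by topic similarity to the query document."""
    query = doc_topics[query_id]
    scored = [(other_id, cosine(query, vec))
              for other_id, vec in doc_topics.items() if other_id != query_id]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical topic proportions, not the output of the model above.
doc_topics = {
    "doc1": [0.90, 0.10],
    "doc2": [0.20, 0.80],
    "doc3": [0.85, 0.15],
    "doc4": [0.10, 0.90],
}
ranking = most_similar(doc_topics, "doc1", top_n=2)
print(ranking)
```

Here `doc3` ranks first for `doc1` because both are dominated by the same topic. Cosine similarity is only one option; a distance defined on probability distributions, such as Hellinger distance, would be a reasonable alternative.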
