Unsupervised automatic tagging algorithms?

挽巷 2021-01-30 00:46

I want to build a web application that lets users upload documents, videos, images, and music, and then search them. Thin

5 answers
  •  不思量自难忘°
    2021-01-30 01:02

    The most common unsupervised machine learning model for this type of task is Latent Dirichlet Allocation (LDA). The model automatically infers a set of topics over a corpus of documents based on the words in those documents. After you run LDA on your documents, every word has a probability under each topic and every document has a mixture over topics, so for a search term you can retrieve the documents whose topics give that term the highest probability.
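To make the idea concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, using only the Python standard library. It is a toy under stated assumptions, not what the answer's linked implementations do internally: the names `tokenize`, `fit_lda`, `search`, and `DOCS`, the hyperparameters `alpha` and `beta`, and the sample documents are all hypothetical. For real data, one of the libraries listed in this answer would be a better fit.

```python
import random

# Hypothetical toy corpus; in the web application these would be the
# text extracted from uploaded documents.
DOCS = [
    "guitar chords melody song band guitar",
    "melody song piano band concert",
    "camera photo lens image pixel",
    "image pixel photo camera lens exposure",
]

def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()

def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of token lists.

    Returns (theta, phi, word_id):
      theta[d][k] estimates p(topic k | document d)
      phi[k][w]   estimates p(word w | topic k)
      word_id     maps each word to its column in phi
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Turn the smoothed counts into probability estimates.
    theta = [
        [(c + alpha) / (sum(row) + num_topics * alpha) for c in row]
        for row in doc_topic
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, word_id

def search(query_word, theta, phi, word_id):
    """Rank document indices by p(word | doc) = sum_k theta[d][k] * phi[k][word]."""
    wid = word_id.get(query_word)
    if wid is None:
        return []  # the word never appeared in the corpus
    scores = [
        sum(theta[d][k] * phi[k][wid] for k in range(len(phi)))
        for d in range(len(theta))
    ]
    return sorted(range(len(theta)), key=lambda d: scores[d], reverse=True)

if __name__ == "__main__":
    theta, phi, word_id = fit_lda([tokenize(t) for t in DOCS])
    print(search("guitar", theta, phi, word_id))
```

Because the score sums over topics, a document can rank well for a query word it never contains, as long as it leans toward a topic in which that word is likely. That is the behavior that makes topic models useful for tagging and search.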

    There have also been extensions to images and music; see http://cseweb.ucsd.edu/~dhu/docs/research_exam09.pdf.

    Efficient implementations of LDA exist in several languages:

    • many implementations from the original researchers
    • MALLET (http://mallet.cs.umass.edu/), written in Java and recommended by others on SO
    • PLDA: a fast, parallelized C++ implementation
