Interesting NLP/machine-learning style project — analyzing privacy policies

别那么骄傲 2021-01-05 14:55

I wanted some input on an interesting problem I've been assigned. The task is to analyze hundreds, and eventually thousands, of privacy policies and identify core characte

3 answers
  •  抹茶落季
    2021-01-05 15:14

    I would approach this as a machine learning problem in which you classify each policy in several ways at once, i.e. one yes/no label per characteristic: wants location, wants SSN, etc.

    You'll need to enumerate the characteristics you care about (location, SSN) and then, for each document, record whether it uses that information or not. Choose your features, train your model on the labeled documents, and then classify and test.
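
    To make the labeling step concrete, here is a minimal sketch of how the labeled data could be laid out. The label names, the example sentences, and the helpers `to_label_vector` and `train_test_split` are all made up for illustration; real policies would be whole documents, not single sentences.

```python
import random

# Hypothetical characteristics to detect; extend this list for the real task.
LABELS = ["location", "ssn"]

# Hypothetical hand-labeled examples: (policy text, set of labels that apply).
LABELED_POLICIES = [
    ("We collect your precise location to show nearby stores.", {"location"}),
    ("We may ask for your social security number to verify identity.", {"ssn"}),
    ("We use GPS location and your SSN for fraud checks.", {"location", "ssn"}),
    ("We only store your email address.", set()),
]


def to_label_vector(labels):
    """Turn a set of label names into a 0/1 vector ordered like LABELS."""
    return [1 if name in labels else 0 for name in LABELS]


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle deterministically and hold out a fraction for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(LABELED_POLICIES)
```

    Keeping one 0/1 entry per characteristic means each characteristic can later be trained and scored as its own yes/no classifier.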

    I think simple features like words and n-grams would probably get you pretty far, and a dictionary of words related to things like SSN or location would finish it off nicely.
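
    A rough sketch of what those features could look like: word counts, bigram counts, and one "dictionary hit" count per characteristic. The `KEYWORDS` dictionary and the `DICT_` feature names are assumptions for the example; a real dictionary would be much larger.

```python
import re
from collections import Counter

# Hypothetical keyword dictionaries, one per characteristic.
KEYWORDS = {
    "ssn": {"ssn", "social security number"},
    "location": {"location", "gps", "geolocation"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return the space-joined n-grams over the token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Count unigrams, bigrams, and dictionary hits for each characteristic."""
    tokens = tokenize(text)
    features = Counter(tokens)          # unigram counts
    features.update(ngrams(tokens, 2))  # bigram counts
    for label, phrases in KEYWORDS.items():
        # Match dictionary entries of up to three words against the text.
        hits = sum(
            1 for n in (1, 2, 3) for gram in ngrams(tokens, n) if gram in phrases
        )
        if hits:
            features["DICT_" + label] = hits
    return features


feats = extract_features("We collect your Social Security Number and GPS location.")
```

    The uppercase `DICT_` prefix keeps the dictionary features from colliding with ordinary tokens, which are all lowercased.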

    Use the machine learning algorithm of your choice. Naive Bayes is very easy to implement and use, and would work OK as a first stab at the problem.
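
    To show how little code a first stab needs, here is a small from-scratch multinomial Naive Bayes with add-one smoothing, trained as one yes/no classifier per characteristic. It is only a sketch: the training sentences and labels are invented, it uses plain word counts as features, and in practice a library implementation would be the safer choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Binary multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def log_score(self, text, label):
        """Log of the smoothed class prior times the smoothed word likelihoods."""
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for word in tokenize(text):
            if word not in self.vocab:
                continue  # words never seen in training carry no evidence
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        return 1 if self.log_score(text, 1) > self.log_score(text, 0) else 0


# Hypothetical training data: one 0/1 label list per characteristic.
texts = [
    "We collect your location to show nearby stores.",
    "Your GPS location is shared with partners.",
    "We only store your email address.",
    "We ask for your social security number to verify identity.",
]
labels = {"location": [1, 1, 0, 0], "ssn": [0, 0, 0, 1]}

# One independent classifier per characteristic.
models = {name: NaiveBayes().fit(texts, y) for name, y in labels.items()}

predicted = {
    name: model.predict("The app tracks your location.")
    for name, model in models.items()
}
```

    Training a separate binary model per characteristic keeps each decision independent, so a policy can be flagged for location, SSN, both, or neither.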
