Interesting NLP/machine-learning style project — analyzing privacy policies

别那么骄傲 2021-01-05 14:55

I wanted some input on an interesting problem I've been assigned. The task is to analyze hundreds, and eventually thousands, of privacy policies and identify core characte

3 answers

抹茶落季 (OP)

2021-01-05 15:14

I would approach this as a machine learning problem in which you classify each policy along several independent dimensions, e.g. "wants location", "wants SSN", and so on.

You'll need to enumerate the characteristics you care about (location, SSN, ...), and then, for each document in a hand-labeled set, record whether that document uses that information or not. Choose your features, train a model on the labeled data, and then classify and test on documents you held out from training.
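
To make the setup above concrete, here is a minimal sketch of how the labeled data, the train/test split, and a per-characteristic accuracy check might look. The label names, the example sentences, and the helper names (`split`, `per_label_accuracy`) are all made up for illustration; a real label list would come from whatever characteristics you enumerate.

```python
import random

# Hypothetical label set; replace with the characteristics you enumerate.
LABELS = ["location", "ssn"]

# Toy, invented examples: each document gets one yes/no answer per label.
labeled_docs = [
    ("We collect your GPS location to show nearby stores.",
     {"location": True, "ssn": False}),
    ("Your Social Security number is required for tax forms.",
     {"location": False, "ssn": True}),
    ("We use cookies to remember your preferences.",
     {"location": False, "ssn": False}),
    ("We verify identity using your SSN and current location.",
     {"location": True, "ssn": True}),
]


def split(docs, test_fraction=0.25, seed=0):
    """Shuffle reproducibly and hold out a fraction of documents for testing."""
    shuffled = docs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def per_label_accuracy(gold, predicted):
    """Fraction of documents answered correctly, reported separately per label.

    gold and predicted are equal-length lists of {label: bool} dicts.
    """
    return {
        label: sum(g[label] == p[label] for g, p in zip(gold, predicted)) / len(gold)
        for label in LABELS
    }


train_docs, test_docs = split(labeled_docs)
```

Treating each characteristic as its own yes/no question keeps things simple: you can train one binary classifier per label and see which characteristics are hard to detect.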

I think simple features like words and n-grams would probably get you pretty far, and a dictionary of words related to things like SSN or location would round it out nicely.
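
As a rough illustration of those features, the sketch below counts unigrams and bigrams and adds one extra feature per keyword dictionary. The dictionaries, the `DICT_` prefix, and the function names are assumptions made for this example, not a recommended vocabulary.

```python
import re
from collections import Counter

# Hypothetical keyword dictionaries; real ones would be curated by hand.
DICTIONARIES = {
    "ssn": {"ssn", "social security"},
    "location": {"location", "gps", "geolocation"},
}


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(text, max_n=2):
    """Return counts of all n-grams up to max_n, plus dictionary-hit counts."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    # One aggregate feature per dictionary: how often any of its terms occurred.
    # Terms longer than max_n words would never match and are silently ignored.
    for name, terms in DICTIONARIES.items():
        hits = sum(features[term] for term in terms)
        if hits:
            features["DICT_" + name] = hits
    return features


features = extract_features(
    "We may ask for your Social Security number and GPS location."
)
```

The dictionary features let rare but related terms ("gps", "geolocation") pool their evidence, which helps when you only have a few hundred labeled policies.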

Use the machine learning algorithm of your choice. Naive Bayes is very easy to implement and use, and would work OK as a first stab at the problem.
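
For reference, a first stab at Naive Bayes can be written in a few dozen lines. This is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing over single words, trained for one yes/no characteristic; the class name, the training sentences, and the choice to skip unseen words are assumptions for illustration, and in practice a library implementation would likely be a better choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # words never seen in training carry no evidence
                score += math.log(
                    (self.word_counts[c][tok] + 1) / (total_words + len(self.vocab))
                )
            scores[c] = score
        return max(scores, key=scores.get)


# Toy, invented training data for a single characteristic: "wants location".
texts = [
    "We collect your precise location from your device.",
    "Your GPS location is shared with advertising partners.",
    "We store your email address and password.",
    "Cookies help us remember your preferences.",
]
wants_location = [True, True, False, False]

model = NaiveBayes().fit(texts, wants_location)
```

To cover several characteristics, you could train one such model per label and run each policy through all of them.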
