I know feature hashing (the hashing trick) is used to reduce dimensionality and handle the sparsity of bit vectors, but I don't understand how it really works. Can anyone explain this?
Large sparse feature spaces often arise from interactions: with U as the set of users and X as the set of email tokens, the dimensionality of U × X is memory intensive. Tasks like spam filtering usually come with tight time constraints as well.
The hashing trick stores each feature at a hashed index in a fixed-length vector, which makes large-scale training feasible. In theory, the more buckets the hash maps into, the fewer the collisions and the better the performance, as illustrated in the original paper.
It allocates the original features into buckets (a feature space of finite length) so that most of their information is preserved. Even when a spammer introduces typos to stay off the radar, the hashed representation of the message changes in only a few coordinates, so despite some distortion error it remains close to the original.
For example, "the quick brown fox" transforms to:
h(the) mod 5 = 0
h(quick) mod 5 = 1
h(brown) mod 5 = 1
h(fox) mod 5 = 3
Note that quick and brown collide at index 1; such collisions are the price of a fixed-size index space. Storing indices rather than text values saves space.
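A minimal sketch of this pipeline in Python (the names h and hash_vectorize are mine, and MD5 stands in for the unspecified hash function above, so the exact indices will differ from the example):

```python
import hashlib

def h(token: str) -> int:
    # Stable stand-in for the hash function h(.) above; Python's built-in
    # hash() is randomized per process, so MD5 is used instead.
    return int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)

def hash_vectorize(text: str, n_buckets: int = 5) -> list:
    # Map a text to a fixed-length count vector via the hashing trick.
    vec = [0] * n_buckets
    for token in text.split():
        vec[h(token) % n_buckets] += 1  # colliding tokens share a bucket
    return vec

print(hash_vectorize("the quick brown fox"))  # a length-5 vector; entries sum to 4
```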
To summarize some of the applications:
dimensionality reduction for high-dimensional feature vectors
sparsification
building bag-of-words representations on the fly
cross-product features (see the sketch after this list)
multi-task learning
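For cross-product features in particular, hashing lets you index an interaction such as (user, token) without ever materializing the full U × X space; a sketch under the same assumptions as above (MD5 as the hash, hypothetical feature strings):

```python
import hashlib

def bucket(feature: str, n_buckets: int = 2**20) -> int:
    # Hash an arbitrary feature string into a fixed-size index space.
    return int(hashlib.md5(feature.encode("utf-8")).hexdigest(), 16) % n_buckets

# Hypothetical user x token interaction for personalized spam filtering:
# only the pairs actually observed get hashed; U x X is never enumerated.
idx = bucket("user=alice^token=refinance")
print(idx)  # an index in [0, 2**20), no matter how many users or tokens exist
```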
Reference:
Original paper: Weinberger, K., Dasgupta, A., Langford, J., Smola, A., & Attenberg, J. (2009). Feature Hashing for Large Scale Multitask Learning.
Shi, Q., Petterson, J., Dror, G., Langford, J., Smola, A., Strehl, A., & Vishwanathan, S. V. N. (2009). Hash Kernels.
What is the hashing trick? (Quora)
Gionis, A., Indyk, P., & Motwani, R. (1999). Similarity Search in High Dimensions via Hashing.
Implementation:
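One ready-made implementation is scikit-learn's HashingVectorizer, which builds hashed bag-of-words vectors on the fly (Vowpal Wabbit is another well-known one); a minimal usage sketch:

```python
from sklearn.feature_extraction.text import HashingVectorizer

# 2**10 buckets for illustration; real systems often use 2**18 or more
# to keep the collision rate low.
vectorizer = HashingVectorizer(n_features=2**10)
X = vectorizer.transform(["the quick brown fox", "meet singles in your area"])
print(X.shape)  # (2, 1024): fixed width, and no vocabulary is ever stored
```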