I know feature hashing (the hashing trick) is used to reduce dimensionality and handle the sparsity of bit vectors, but I don't understand how it really works. Can anyone explain this?
Large sparse feature spaces often arise from interactions: with U as the set of users and X as the set of email tokens, the dimensionality of U × X is memory intensive. Tasks like spam filtering usually come with tight time constraints as well.
The hashing trick stores each feature at a hashed index in a fixed-length vector, which makes large-scale training feasible. In theory, the more buckets the hash maps into, the fewer the collisions and the better the performance, as illustrated in the original paper.
It allocates the original features into buckets (a feature space of finite length) so that most of their information is preserved. Even when a spammer introduces typos to stay off the radar, the hashed representation of the message changes in only a few coordinates, so despite some distortion error it remains close to the original.
For example, "the quick brown fox" transforms to:
h(the) mod 5 = 0
h(quick) mod 5 = 1
h(brown) mod 5 = 1
h(fox) mod 5 = 3
Note that quick and brown collide at index 1; such collisions are the price of a fixed-size index space. Storing indices rather than text values saves space.
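A minimal sketch of this pipeline in Python (the names h and hash_vectorize are mine, and MD5 stands in for the unspecified hash function above, so the exact indices will differ from the example):

```python
import hashlib

def h(token: str) -> int:
    # Stable stand-in for the hash function h(.) above; Python's built-in
    # hash() is randomized per process, so MD5 is used instead.
    return int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)

def hash_vectorize(text: str, n_buckets: int = 5) -> list:
    # Map a text to a fixed-length count vector via the hashing trick.
    vec = [0] * n_buckets
    for token in text.split():
        vec[h(token) % n_buckets] += 1  # colliding tokens share a bucket
    return vec

print(hash_vectorize("the quick brown fox"))  # a length-5 vector; entries sum to 4
```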
To summarize some of the applications:
dimensionality reduction for high-dimensional feature vectors
sparsification
building bag-of-words representations on the fly
cross-product features (see the sketch after this list)
multi-task learning
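For cross-product features in particular, hashing lets you index an interaction such as (user, token) without ever materializing the full U × X space; a sketch under the same assumptions as above (MD5 as the hash, hypothetical feature strings):

```python
import hashlib

def bucket(feature: str, n_buckets: int = 2**20) -> int:
    # Hash an arbitrary feature string into a fixed-size index space.
    return int(hashlib.md5(feature.encode("utf-8")).hexdigest(), 16) % n_buckets

# Hypothetical user x token interaction for personalized spam filtering:
# only the pairs actually observed get hashed; U x X is never enumerated.
idx = bucket("user=alice^token=refinance")
print(idx)  # an index in [0, 2**20), no matter how many users or tokens exist
```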
Reference:
Original paper: Weinberger, K., Dasgupta, A., Langford, J., Smola, A., & Attenberg, J. (2009). Feature Hashing for Large Scale Multitask Learning.
Shi, Q., Petterson, J., Dror, G., Langford, J., Smola, A., Strehl, A., & Vishwanathan, S. V. N. (2009). Hash Kernels.
What is the hashing trick? (Quora)
Gionis, A., Indyk, P., & Motwani, R. (1999). Similarity Search in High Dimensions via Hashing.
Implementation:
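One ready-made implementation is scikit-learn's HashingVectorizer, which builds hashed bag-of-words vectors on the fly (Vowpal Wabbit is another well-known one); a minimal usage sketch:

```python
from sklearn.feature_extraction.text import HashingVectorizer

# 2**10 buckets for illustration; real systems often use 2**18 or more
# to keep the collision rate low.
vectorizer = HashingVectorizer(n_features=2**10)
X = vectorizer.transform(["the quick brown fox", "meet singles in your area"])
print(X.shape)  # (2, 1024): fixed width, and no vocabulary is ever stored
```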