Fast search in compressed text files

旧巷少年郎 2021-02-04 18:56

I need to be able to search for text in a large number of zipped text files (.txt). The compression may be changed to something else, or may even become proprietary. I want to avoid

5 answers
  •  时光取名叫无心
    2021-02-04 19:41

    This is possible, and can be done quite efficiently. There is a lot of active research on this topic, which falls under the heading of succinct data structures and compressed full-text indexes. Some topics I would recommend looking into: wavelet trees, the FM-index (often built on RRR bit vectors), and succinct suffix arrays. You can also search Huffman-encoded strings efficiently, as a number of publications have demonstrated.
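To give a feel for how an FM-index answers queries, here is a minimal, uncompressed sketch of its backward search in Python. It is not taken from the answer above: the function names `build_fm_index` and `locate` are made up for illustration, the suffix array is built naively, and the occurrence table is stored in full rather than in a wavelet tree or RRR bit vector, so it shows the search logic only, not the space savings. It also assumes you can build the index once from the uncompressed text; it does not search inside a ZIP archive directly.

```python
def build_fm_index(text):
    """Build a toy FM-index (hypothetical helper, not a library API)."""
    # Append a sentinel that sorts before every other character.
    # Assumption: the input text itself never contains "\0".
    text += "\0"
    # Naive suffix array: fine for a sketch, far too slow for large inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # first[c] = number of characters in the text strictly smaller than c.
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i] = occurrences of c in bwt[:i]. A real index would replace
    # this table with a compressed rank structure.
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, first, occ


def locate(index, pattern):
    """Return the sorted start positions of pattern, via backward search."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # current range of matching suffixes, [lo, hi)
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("banana")
print(locate(index, "ana"))
```

Each pattern character narrows the suffix range with two table lookups, so the number of steps depends on the pattern length rather than on the text length. Production implementations typically keep only a sample of the suffix array and recover the remaining positions from the BWT.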
