How to compress small strings

没有蜡笔的小新 2021-02-01 09:22

I have an SQLite database holding a huge number of URLs. It takes up a huge amount of disk space, and accessing it causes many disk seeks and is slow. Average URL path length is

7 answers
  •  梦毁少年i
    2021-02-01 09:59

    Is that 97 bytes, or 97 8-bit ASCII characters, or 97 16-bit Unicode characters?

    Assuming all your URLs are legal URLs as per http://www.w3.org/Addressing/URL/url-spec.txt, they should contain only ASCII characters.

    If it is 97 16-bit Unicode characters, simply storing only the lower byte of each character will automatically give you 50% savings.
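
    As a rough illustration of the 16-bit case, here is a minimal Python sketch. It assumes the stored text is UTF-16 little-endian and pure ASCII; the function name narrow_utf16 and the example URL are made up for the demonstration and are not from the question.

```python
def narrow_utf16(data: bytes) -> bytes:
    """Re-encode UTF-16-LE text as 1 byte per character.

    Assumes every character is ASCII; anything else raises
    UnicodeEncodeError instead of being silently mangled.
    """
    return data.decode("utf-16-le").encode("ascii")


# Hypothetical URL, used only for illustration.
url = "http://www.example.com/some/path?query=1"
wide = url.encode("utf-16-le")   # 2 bytes per character
narrow = narrow_utf16(wide)      # 1 byte per character

# In little-endian UTF-16 the low byte comes first, so the even-indexed
# bytes are exactly the ASCII encoding and the odd-indexed bytes are zero.
assert wide[0::2] == narrow
assert len(narrow) * 2 == len(wide)
```

    If SQLite is storing the text as UTF-8 already, this step gains nothing, because ASCII characters take one byte each in UTF-8.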

    If it is 97 8-bit characters, notice that you only need 7 bits per character. You can write 7 bits at a time into a bitstream and store that bitstream in your database; use some older 7-bit transmission protocol; or come up with your own ad hoc scheme that stores every 8th character's bits in the high bits of the previous 7 characters. Either way, 8 characters fit in 7 bytes, a saving of 12.5%.
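
    The bitstream variant can be sketched in Python as follows. This is only one possible layout, not a standard format: the names pack7 and unpack7 are invented here, and the sketch assumes the character count is stored alongside the packed bytes, since the final byte may contain padding bits.

```python
def pack7(text: str) -> bytes:
    """Pack ASCII text into a bitstream, 7 bits per character."""
    data = text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for b in data:
        acc = (acc << 7) | b
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # drop the bits already written
    if nbits:
        # Left-align the leftover bits; the low bits of the last byte are padding.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack7(packed: bytes, length: int) -> str:
    """Inverse of pack7; length is the original character count."""
    acc = 0
    nbits = 0
    chars = bytearray()
    for b in packed:
        if len(chars) == length:
            break
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 7 and len(chars) < length:
            nbits -= 7
            chars.append((acc >> nbits) & 0x7F)
        acc &= (1 << nbits) - 1
    return chars.decode("ascii")


url = "http://www.example.com/some/path"  # hypothetical URL
packed = pack7(url)
assert unpack7(packed, len(url)) == url
assert len(packed) == (len(url) * 7 + 7) // 8  # ceil(7n / 8) bytes
```

    With this layout a 97-character URL takes 85 bytes instead of 97.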
