C Library for compressing sequential positive integers

前端 未结 6 1502
难免孤独
难免孤独 2021-02-05 19:39

I have the very common problem of creating an index for an in-disk array of strings. In short, I need to store the position of each string in the in-disk representation. For exa

6条回答
  •  礼貌的吻别
    2021-02-05 20:24

    I use fastbit (Kesheng Wu LBL.GOV), it seems you need something good, fast and NOW, so fastbit is a highly competient improvement on Oracle's BBC (byte aligned bitmap code, berkeleydb). It's easy to setup and very good gernally.

    However, given more time, you may want to look at a gray code solution, it seems optimal for your purposes.

    Daniel Lemire has a number of libraries for C/++/Java released on code.google, I've read over some of his papers and they are quite nice, several advancements on fastbit and alternative approaches for column re-ordering with permutated grey codes's.

    Almost forgot, I also came across Tokyo Cabinet, though I do not think it will be well suited for my current project, I may of considered it more if I had known about it before ;), it has a large degree of interoperability,

    Tokyo Cabinet is written in the C language, and provided as API of C, Perl, Ruby, Java, and Lua. Tokyo Cabinet is available on platforms which have API conforming to C99 and POSIX.

    As you referred to CDB, the TC benchmark has a TC mode (TC support's several operational constraint's for varying perf) where it surpassed CDB by 10 times for read performance and 2 times for write.

    With respect to your delta encoding requirement, I am quite confident in bsdiff and it's ability to out-perform any file.exe content patching system, it may also have some fundimental interfaces for your general needs.

    Google's new binary compression application, courgette may be worth checking out, in case you missed the press release, 10x smaller diff's than bsdiff in the one test case I have seen published.

提交回复
热议问题