Most efficient way to store a big DNA sequence?

滥情空心 2021-02-04 11:58

I want to bundle a giant DNA sequence (about 3,000,000,000 base pairs) with an iOS app. Each base pair can take one of four values: A, C, T, or G.

7 answers
  •  抹茶落季
    2021-02-04 12:45

    You may want to look into a 3D space-filling curve (SFC). A 3D SFC maps 3D data onto a 1D ordering, which reduces the 3D complexity to a 1D complexity. It is a little like an octree or an R-tree. If you can store your full DNA sequence in an SFC, you can look for similar tiles in the tree, although an SFC is more commonly used with lossy compression. Alternatively, if you know the size of the tiles, you could apply a block-sorting algorithm such as the Burrows-Wheeler transform (BWT) and then an entropy coder such as Huffman coding or a Golomb code.
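
As a rough illustration of the block-sorting step mentioned above, here is a minimal Python sketch of a naive Burrows-Wheeler transform and its inverse. The function names `bwt` and `inverse_bwt` and the `$` end-of-block sentinel are assumptions made for this example, not part of the answer. The sorted-rotations approach is only practical for small tiles; a real implementation over billions of bases would need suffix-array construction instead.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform using sorted rotations.

    The sentinel (an assumption of this sketch) marks the end of the
    block and must sort before every symbol in the input.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Build every rotation of the block and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the last column to each row, then sort the rows again.
        table = sorted(last_column[i] + table[i] for i in range(len(table)))
    # The row that ends with the sentinel is the original block.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    tile = "GATTACAGATTACA"  # hypothetical tile, for illustration only
    transformed = bwt(tile)
    print(transformed)
    print(inverse_bwt(transformed) == tile)
```

The transform itself does not shrink the data. It tends to group identical symbols together, which is what makes a following entropy-coding stage (Huffman, Golomb, or similar) more effective.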
