How to compute good preset dictionary for deflate compression
问题 I have an opportunity to preset dictionary for deflate compression. It makes sense in my case, because data to be compressed is relatively small 1kb-3kb and I have a large sample of representative examples. Data to be compressed consists of arbitrary sequence of bytes, so tokenization etc. is not a good way to go. Also, data shows a lot of repetition (between data examples), so good dictionary could potentially give very good results. The question is how calculate good dictionary? Is there an