How to compress a String in Java?

前端 未结 10 1553
一向
一向 2020-11-28 05:28

I use GZIPOutputStream or ZIPOutputStream to compress a String (my string.length() is less than 20), but the compressed result is long

相关标签:
10条回答
  • 2020-11-28 06:22

    When you create a String, you can think of it as a list of char's, this means that for each character in your String, you need to support all the possible values of char. From the sun docs

    char: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).

    If you have a reduced set of characters you want to support you can write a simple compression algorithm, which is analogous to binary->decimal->hex radix converstion. You go from 65,536 (or however many characters your target system supports) to 26 (alphabetical) / 36 (alphanumeric) etc.

    I've used this trick a few times, for example encoding timestamps as text (target 36 +, source 10) - just make sure you have plenty of unit tests!

    0 讨论(0)
  • 2020-11-28 06:24

    You don't see any compression happening for your String, As you atleast require couple of hundred bytes to have real compression using GZIPOutputStream or ZIPOutputStream. Your String is too small.(I don't understand why you require compression for same)

    Check Conclusion from this article:

    The article also shows how to compress and decompress data on the fly in order to reduce network traffic and improve the performance of your client/server applications. Compressing data on the fly, however, improves the performance of client/server applications only when the objects being compressed are more than a couple of hundred bytes. You would not be able to observe improvement in performance if the objects being compressed and transferred are simple String objects, for example.

    0 讨论(0)
  • 2020-11-28 06:28

    The ZIP algorithm is a combination of LZW and Huffman Trees. You can use one of theses algorithms separately.

    The compression is based on 2 factors :

    • the repetition of substrings in your original chain (LZW): if there are a lot of repetitions, the compression will be efficient. This algorithm has good performances for compressing a long plain text, since words are often repeated
    • the number of each character in the compressed chain (Huffman): more the repartition between characters is unbalanced, more the compression will be efficient

    In your case, you should try the LZW algorithm only. Used basically, the chain can be compressed without adding meta-informations: it is probably better for short strings compression.

    For the Huffman algorithm, the coding tree has to be sent with the compressed text. So, for a small text, the result can be larger than the original text, because of the tree.

    0 讨论(0)
  • 2020-11-28 06:33

    If the passwords are more or less "random" you are out of luck, you will not be able to get a significant reduction in size.

    But: Why do you need to compress the passwords? Maybe what you need is not a compression, but some sort of hash value? If you just need to check if a name matches a given password, you don't need do save the password, but can save the hash of a password. To check if a typed in password matches a given name, you can build the hash value the same way and compare it to the saved hash. As a hash (Object.hashCode()) is an int you will be able to store all 20 password-hashes in 80 bytes).

    0 讨论(0)
提交回复
热议问题