I want to compress a bigger string into a smaller string in C++. What are the different ways to do this in C++? The requirement is that output should also be a string.
As unaperson said, Google is your friend: Data Compression Algorithms.
Here are a few off the top of my head:
RLE (run-length encoding)
Huffman coding
Lempel-Ziv
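Of the algorithms listed above, run-length encoding is the simplest to sketch with a string as output. The following is only a minimal illustration, not a production codec: the function names `rle_encode` and `rle_decode` and the two-bytes-per-run format (a length byte from 1 to 255 followed by the character) are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>

// Encode each run of identical characters as two bytes:
// (run length in 1..255, the character). Longer runs are split.
std::string rle_encode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) {
            ++run;
        }
        out.push_back(static_cast<char>(static_cast<unsigned char>(run)));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

// Reverse of rle_encode: expand each (length, character) pair.
std::string rle_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        const std::size_t run = static_cast<unsigned char>(in[i]);
        out.append(run, in[i + 1]);
    }
    return out;
}
```

With this format a string of 35 identical characters shrinks to 2 bytes, but input without repeated characters doubles in size. The encoded string also holds raw length bytes, so it is generally not printable text.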
As I understand from the comments on the question, you don't need to decompress the result; you only want distinct strings to produce distinct results. The hashing algorithm I'm going to describe is easy to understand and works well in practice (I've used it many times). It is the simple rolling hash function used in the Rabin-Karp string search algorithm. Keep in mind that any fixed-size hash can collide, so distinct results are very likely rather than guaranteed.
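OK, let's treat each string as a number in base 257 (257 is a prime number). For example, the string "ab" becomes code('a') * 257 + code('b'), and in general a string s of length n becomes code(s[0]) * 257^(n-1) + code(s[1]) * 257^(n-2) + ... + code(s[n-1]), where code(char a) is the ASCII code of character a plus 1 (the +1 gives different results for a string of n '\0' characters and a string of m '\0' characters). Of course, if the string is long, the corresponding number cannot be stored in an int or even in an unsigned long long. That is not a problem: you can take it modulo the range of the data type you store it in (MOD MAX_SIZE), which unsigned arithmetic in C++ does automatically when it wraps around. So the final code of the hash function is the following.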
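#include <cstddef>
#include <string>

// Polynomial (base-257) hash; unsigned overflow wraps around, which acts as the MOD step.
unsigned long long hash(const std::string & s)
{
    unsigned long long ret = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        ret *= 257;
        // Go through unsigned char so bytes above 127 are not sign-extended.
        ret += static_cast<unsigned char>(s[i]) + 1;
    }
    return ret;
}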
EDIT: Added source of this algorithm.
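Since the question asks for a string as output, the numeric hash can be rendered as hexadecimal text. The sketch below is one possible way to do that; the names `poly_hash` and `hash_to_hex` are hypothetical, and `poly_hash` simply repeats the base-257 scheme described above so that the example is self-contained.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Base-257 polynomial hash (hypothetical name); overflow wraps around.
unsigned long long poly_hash(const std::string& s) {
    unsigned long long ret = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        ret *= 257;
        ret += static_cast<unsigned char>(s[i]) + 1;
    }
    return ret;
}

// Render the hash as lowercase hexadecimal text (hypothetical helper).
std::string hash_to_hex(const std::string& s) {
    std::ostringstream oss;
    oss << std::hex << poly_hash(s);
    return oss.str();
}
```

For example, `hash_to_hex("ab")` yields `"62c5"`, because (97 + 1) * 257 + (98 + 1) = 25285. The result is a short fingerprint, not a compressed form: the original string cannot be recovered from it, and two different long strings can map to the same value.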
Well, if you don't need to uncompress it later:
string s = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
s = "";
Edit: Sounds like you want a hash function - there are a zillion out there, depending on your requirements. Google is your friend.