Short, case-insensitive string obfuscation strategy

前端未结

关注

 3  340

I am looking for a way to identify (i.e. encode and decode) a set of Java strings with one token. The identification should not involve DB persistence. So far I

相关标签:

3条回答

梦谈多话

2021-01-21 10:54

Rot13 obfuscates but does not shorten. Zip shortens (usually) but does not survive the URL round trip. Encryption will not shorten, and may lengthen. Hashing shortens but is one-way. You do not have an easy problem. Base32 is case insensitive, but takes more space than Base64, which isn't. I suspect that you are going to have to drop or modify your requirements. Which requirements are most important and which least important?

0 讨论(0)
发布评论:

提交评论
- 加载中...

终归单人心

2021-01-21 11:04

I have spent some time on this and I have a good solution for you.

Encode as base64 then as a custom base32 that uses 0-9a-v. Essentially, you lay out the bits 6 at a time (your chars are 0-9a-zA-Z) then encode them 5 at a time. This leads to hardly any extra space. For example, ABCXYZdefxyz123789 encodes as i9crnsuj9ov1h8o4433i14

Here's an implementation that works, including some test code that proves it is case-insensitive:

// Note: You can add 1 more char to this if you want to
static String chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

private static String decodeToken(String encoded) {
    // Lay out the bits 5 at a time
    StringBuilder sb = new StringBuilder();
    for (byte b : encoded.toLowerCase().getBytes())
        sb.append(asBits(chars.indexOf(b), 5));

    sb.setLength(sb.length() - (sb.length() % 6));

    // Consume it 6 bits at a time
    int length = sb.length();
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < length; i += 6)
        result.append(chars.charAt(Integer.parseInt(sb.substring(i, i + 6), 2)));

    return result.toString();
}

private static String generateToken(String x) {
    StringBuilder sb = new StringBuilder();
    for (byte b : x.getBytes())
        sb.append(asBits(chars.indexOf(b), 6));

    // Round up to 5 bit multiple
    // Consume it 5 bits at a time
    int length = sb.length();
    sb.append("00000".substring(0, length % 5));
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < length; i += 5)
        result.append(chars.charAt(Integer.parseInt(sb.substring(i, i + 5), 2)));

    return result.toString();
}

private static String asBits(int index, int width) {
    String bits = "000000" + Integer.toBinaryString(index);
    return bits.substring(bits.length() - width);
}

public static void main(String[] args) {
    String input = "ABCXYZdefxyz123789";
    String token = generateToken(input);
    System.out.println(input + " ==> " + token);
    Assert.assertEquals("mixed", input, decodeToken(token));
    Assert.assertEquals("lower", input, decodeToken(token.toLowerCase()));
    Assert.assertEquals("upper", input, decodeToken(token.toUpperCase()));
    System.out.println("pass");
}

0 讨论(0)

温柔的废话

2021-01-21 11:10

What's a structure of the text (i.e. set of strings)? You could use your knowledge of it to encode it in a shorten form. E.g. if you have large base-decimal number "1234567890" you could translate it into 36-base number, which will be shorter.

Otherwise it looks like you are trying invent an universal archiver.

If you don't care about length, then yes, processing by alphabet based encoder (such as Base32) is the only choice.

Also, if text is large enough, maybe you could save some space by gzipping it.

0 讨论(0)
发布评论:

提交评论
- 加载中...