Is a JPEG with a bogus Huffman table recoverable?

醉话见心 2021-02-08 14:37

I have a JPEG that is un-openable in any program:

Opening in Ubuntu Image Viewer yields:

Passing the photo through convert yields similar results:

1 answer
  • 2021-02-08 14:57

    The approach used to get this back was more luck than judgement. I think I can explain, though be aware it involves a hex editor...

    The Wikipedia page for the syntax of a JPEG file explains that the file is made up of a series of segments, each introduced by a two-byte marker: 0xFF followed by another byte that indicates the type of segment.

    The hope was that only the Huffman table segment of the file was wrong, as the error message suggested. Without needing to understand what a Huffman table is, it was enough to see that the same section on Wikipedia gives 0xFF 0xC4 as the marker for a Huffman table segment.
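To make the segment layout concrete, here is a minimal sketch of a segment walker in Python. The function name list_segments is made up for illustration. It assumes every segment other than the standalone markers carries a two-byte big-endian length that counts itself, and it stops at the start-of-scan segment (0xFF 0xDA), after which compressed image data follows.

```python
import struct

# Markers with no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))

def list_segments(data: bytes):
    """Return (offset, marker, declared_length) tuples up to the SOS segment."""
    segments = []
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected 0xFF at offset {pos:#06x}")
        marker = data[pos + 1]
        if marker == 0xFF:              # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in STANDALONE:
            segments.append((pos, marker, 0))
            pos += 2
            continue
        if pos + 4 > len(data):
            raise ValueError("file ends inside a segment header")
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((pos, marker, length))
        if marker == 0xDA:              # SOS: entropy-coded data follows
            break
        pos += 2 + length               # marker bytes plus declared length
    return segments
```

If a length field does not match the real size of its segment, the walker lands on a byte that is not 0xFF and raises rather than guessing, which is itself a useful hint about where a file goes wrong.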

    Further down the page, it mentions:

    The JPEG standard provides general-purpose Huffman tables; encoders may also choose to generate Huffman tables...

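    Opening a few other JPEG files showed what looks like a standard set of 4 consecutive Huffman table segments, each starting with that 0xFF 0xC4 marker. The sample corrupt.jpg, however, had just one Huffman table segment, from position 0x00c8 to 0x02bc below. (The row labels in the dumps below appear to give the offset just past the end of each 16-byte row, so the row labelled 00d0 holds bytes 0x00c0 to 0x00cf.)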

    (Both contain that &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz sequence you mentioned in their Huffman tables. In the corrupt file it appears twice in that single Huffman table, in the 'more conventional' JPEGs it appears in the second and fourth Huffman tables.)
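One segment can hold several tables back to back, which is why a file may have one 0xFF 0xC4 marker or four. The sketch below splits the payload of a Huffman table segment (the bytes after the two-byte length) into its tables. The name parse_dht_payload is hypothetical, and the layout assumed is the usual one: a class/id byte, 16 code-length counts, then as many values as the counts add up to.

```python
def parse_dht_payload(payload: bytes):
    """Split a DHT payload into (table_class, table_id, counts, values) tuples."""
    tables = []
    pos = 0
    while pos < len(payload):
        if pos + 17 > len(payload):
            raise ValueError("truncated Huffman table header")
        class_and_id = payload[pos]
        counts = list(payload[pos + 1:pos + 17])   # codes per bit length 1..16
        n_values = sum(counts)
        values = payload[pos + 17:pos + 17 + n_values]
        if len(values) != n_values:
            raise ValueError("truncated Huffman table values")
        # High nibble: 0 = DC, 1 = AC. Low nibble: table id.
        tables.append((class_and_id >> 4, class_and_id & 0x0F, counts, values))
        pos += 17 + n_values
    return tables
```

Fed the 29 payload bytes of the first table in the dumps (00, then the 16 counts, then 00 to 0b), this returns a single DC table with 12 values. A payload whose bytes do not divide cleanly into tables raises instead.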

    From there, the fixed image was made by pasting the standard 4 Huffman tables over that range of bytes in corrupt.jpg; they now run from 0x00c8 to 0x0278 in the fixed file.

    Because the JPEG format is based around scanning for segments between those 0xff markers, you can just swap out the Huffman segments - there are no other pointers in the file to worry about. As you said, the rest of the file looked like a plausible JPEG.


    Summary of the steps taken:

    • Hex search corrupt.jpg for FF C4 and note the offset
    • Hex search for the next FF. If it's another FF C4 (so a second Huffman table), keep going until the next FF starts a different kind of segment
    • Delete the content from the first FF C4 (inclusive) up to but not including that next FF
    • Replace it with the 'standard 4 Huffman tables'. These are the bytes in the last sample below, or can be copied from 0x00c8 to 0x0278 in the fixed file
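The steps above can be sketched in Python for anyone who would rather not do this by hand in a hex editor. This is only an illustration of the same search-and-swap: replace_huffman_tables and its replacement argument are made-up names, and it leans on the same assumptions as the manual steps, namely that the first FF C4 in the file really is the Huffman segment and that no FF byte occurs inside the table data.

```python
DHT = b"\xff\xc4"

def replace_huffman_tables(data: bytes, replacement: bytes) -> bytes:
    """Swap the run of FF C4 segments in data for the replacement bytes."""
    start = data.find(DHT)
    if start < 0:
        raise ValueError("no FF C4 marker found")
    end = start
    # Step over consecutive FF C4 segments by jumping to the next FF each time.
    while data[end:end + 2] == DHT:
        next_ff = data.find(b"\xff", end + 2)
        if next_ff < 0:
            raise ValueError("no marker found after the Huffman tables")
        end = next_ff
    # Keep everything before the first FF C4 and from the next segment onward.
    return data[:start] + replacement + data[end:]
```

Here replacement would be the 432 bytes of the four standard segments shown in the last dump. Write the result to a new file and keep the original untouched.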

    Corrupt Huffman table:

    0000-00d0:  xx xx xx xx xx xx xx xx-ff c4 01 a2-00 00 01 05  !....... ........
    0000-00e0:  01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02  ........ ........
    0000-00f0:  03 04 05 06-07 08 09 0a-0b 01 00 03-01 01 01 01  ........ ........
    0000-0100:  0c 10 0d 0b-0c 0f 0c 09-0a 0e 13 0e-0f 10 11 12  ........ ........
    0000-0110:  12 12 0b 0d-13 15 13 11-15 10 11 12-11 01 03 03  ........ ........
    0000-0120:  03 04 04 04-08 04 04 08-11 0b 0a 0b-11 11 11 11  ........ ........
    0000-0130:  11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11  ........ ........
    0000-0140:  11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11  ........ ........
    0000-0150:  01 01 01 01-01 00 00 00-00 00 00 01-02 03 04 05  ........ ........
    0000-0160:  06 07 08 09-0a 0b 10 00-02 01 03 03-02 04 03 05  ........ ........
    0000-0170:  05 04 04 00-00 01 7d 01-02 03 00 04-11 05 12 21  ......}. .......!
    0000-0180:  31 41 06 13-51 61 07 22-71 14 32 81-91 a1 08 23  1A..Qa." q.2....#
    0000-0190:  42 b1 c1 15-52 d1 f0 24-33 62 72 82-09 0a 16 17  B...R..$ 3br.....
    0000-01a0:  18 19 1a 25-26 27 28 29-2a 34 35 36-37 38 39 3a  ...%&'() *456789:
    0000-01b0:  43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a  CDEFGHIJ STUVWXYZ
    0000-01c0:  63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a  cdefghij stuvwxyz
    0000-01d0:  83 84 85 86-87 88 89 8a-92 93 94 95-96 97 98 99  ........ ........
    0000-01e0:  9a a2 a3 a4-a5 a6 a7 a8-a9 aa b2 b3-b4 b5 b6 b7  ........ ........
    0000-01f0:  b8 b9 ba c2-c3 c4 c5 c6-c7 c8 c9 ca-d2 d3 d4 d5  ........ ........
    0000-0200:  d6 d7 d8 d9-da e1 e2 e3-e4 e5 e6 e7-e8 e9 ea f1  ........ ........
    0000-0210:  f2 f3 f4 f5-f6 f7 f8 f9-fa 11 00 02-01 02 04 04  ........ ........
    0000-0220:  03 04 07 05-04 04 00 01-02 77 00 01-02 03 11 04  ........ .w......
    0000-0230:  05 21 31 06-12 41 51 07-61 71 13 22-32 81 08 14  .!1..AQ. aq."2...
    0000-0240:  42 91 a1 b1-c1 09 23 33-52 f0 15 62-72 d1 0a 16  B.....#3 R..br...
    0000-0250:  24 34 e1 25-f1 17 18 19-1a 26 27 28-29 2a 35 36  $4.%.... .&'()*56
    0000-0260:  37 38 39 3a-43 44 45 46-47 48 49 4a-53 54 55 56  789:CDEF GHIJSTUV
    0000-0270:  57 58 59 5a-63 64 65 66-67 68 69 6a-73 74 75 76  WXYZcdef ghijstuv
    0000-0280:  77 78 79 7a-82 83 84 85-86 87 88 89-8a 92 93 94  wxyz.... ........
    0000-0290:  95 96 97 98-99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2  ........ ........
    0000-02a0:  b3 b4 b5 b6-b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9  ........ ........
    0000-02b0:  ca d2 d3 d4-d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7  ........ ........
    0000-02c0:  e8 e9 ea f2-f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx  ........ ........
    

    Then the next two bytes are ff dd for the start of the next segment:

    0000-02c0:  xx xx xx xx-xx xx xx xx-xx xx xx xx-ff dd 00 04  ........ ........
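A quick arithmetic check shows why this segment cannot be right as it stands. It assumes the offsets quoted earlier are exact, with 0x00c8 being the FF of the FF C4 marker and 0x02bc being the FF of the following FF DD marker, and that the two bytes after FF C4 (01 a2) are the segment's declared length.

```python
start, end = 0x00C8, 0x02BC   # FF C4 offset and FF DD offset quoted in the answer
declared = 0x01A2             # length field that follows FF C4 in the dump

span = end - start            # bytes actually sitting between the two markers
expected = 2 + declared       # marker bytes plus the declared length
surplus = span - expected

print(span, expected, surplus)  # 500 420 80
```

So the segment occupies 80 bytes more than its own length field allows, which fits the stray run of bytes visible in rows 0100 to 0140 of the dump.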
    

    This was replaced with the standard 4 general-purpose Huffman tables instead - look for the ff c4 markers:

    0000-00d0:  xx xx xx xx xx xx xx xx-ff c4 00 1f-00 00 01 05  !....... ........
    0000-00e0:  01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02  ........ ........
    0000-00f0:  03 04 05 06-07 08 09 0a-0b ff c4 00-b5 10 00 02  ........ ........
    0000-0100:  01 03 03 02-04 03 05 05-04 04 00 00-01 7d 01 02  ........ .....}..
    0000-0110:  03 00 04 11-05 12 21 31-41 06 13 51-61 07 22 71  ......!1 A..Qa."q
    0000-0120:  14 32 81 91-a1 08 23 42-b1 c1 15 52-d1 f0 24 33  .2....#B ...R..$3
    0000-0130:  62 72 82 09-0a 16 17 18-19 1a 25 26-27 28 29 2a  br...... ..%&'()*
    0000-0140:  34 35 36 37-38 39 3a 43-44 45 46 47-48 49 4a 53  456789:C DEFGHIJS
    0000-0150:  54 55 56 57-58 59 5a 63-64 65 66 67-68 69 6a 73  TUVWXYZc defghijs
    0000-0160:  74 75 76 77-78 79 7a 83-84 85 86 87-88 89 8a 92  tuvwxyz. ........
    0000-0170:  93 94 95 96-97 98 99 9a-a2 a3 a4 a5-a6 a7 a8 a9  ........ ........
    0000-0180:  aa b2 b3 b4-b5 b6 b7 b8-b9 ba c2 c3-c4 c5 c6 c7  ........ ........
    0000-0190:  c8 c9 ca d2-d3 d4 d5 d6-d7 d8 d9 da-e1 e2 e3 e4  ........ ........
    0000-01a0:  e5 e6 e7 e8-e9 ea f1 f2-f3 f4 f5 f6-f7 f8 f9 fa  ........ ........
    0000-01b0:  ff c4 00 1f-01 00 03 01-01 01 01 01-01 01 01 01  ........ ........
    0000-01c0:  00 00 00 00-00 00 01 02-03 04 05 06-07 08 09 0a  ........ ........
    0000-01d0:  0b ff c4 00-b5 11 00 02-01 02 04 04-03 04 07 05  ........ ........
    0000-01e0:  04 04 00 01-02 77 00 01-02 03 11 04-05 21 31 06  .....w.. .....!1.
    0000-01f0:  12 41 51 07-61 71 13 22-32 81 08 14-42 91 a1 b1  .AQ.aq." 2...B...
    0000-0200:  c1 09 23 33-52 f0 15 62-72 d1 0a 16-24 34 e1 25  ..#3R..b r...$4.%
    0000-0210:  f1 17 18 19-1a 26 27 28-29 2a 35 36-37 38 39 3a  .....&'( )*56789:
    0000-0220:  43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a  CDEFGHIJ STUVWXYZ
    0000-0230:  63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a  cdefghij stuvwxyz
    0000-0240:  82 83 84 85-86 87 88 89-8a 92 93 94-95 96 97 98  ........ ........
    0000-0250:  99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2-b3 b4 b5 b6  ........ ........
    0000-0260:  b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9-ca d2 d3 d4  ........ ........
    0000-0270:  d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7-e8 e9 ea f2  ........ ........
    0000-0280:  f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx xx xx xx xx  ........ .....(..
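The length fields in the replacement can be cross-checked against the offsets in the same way. The sketch below uses only numbers visible in the dump above (00 1f for the two short tables, 00 b5 for the two long ones) and assumes the block starts at 0x00c8.

```python
DC_LEN, AC_LEN = 0x1F, 0xB5               # length fields seen after each FF C4
segments = [DC_LEN, AC_LEN, DC_LEN, AC_LEN]

total = sum(2 + n for n in segments)      # each segment adds its 2 marker bytes
end = 0x00C8 + total                      # first byte after the four segments

# Length field if the same four tables shared one segment instead.
combined = 2 + sum(n - 2 for n in segments)

print(total, hex(end), hex(combined))     # 432 0x278 0x1a2
```

The four segments end exactly at 0x0278, matching the range given for the fixed file. The combined figure, 0x1a2, is the same length the corrupt segment declared, which suggests the original was meant to be these four tables in one segment.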
    