Is there any way to check whether an InputStream has been gzipped? Here's the code:
public static InputStream decompressStream(InputStream input) {
try {
Wrap the original stream in a BufferedInputStream and call "mark" on it, then wrap that in a GZIPInputStream. The GZIPInputStream constructor reads the gzip header and throws an IOException (a ZipException when the magic number does not match) if the data is not gzip. If construction succeeds, the stream is gzipped. If it fails, call "reset" on the BufferedInputStream to return to the initial position in the stream.
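The mark/reset idea above can be sketched roughly as follows. This is only a minimal sketch: the names MarkResetGzipProbe, decompressIfGzip and READ_LIMIT are hypothetical, and it assumes that letting the GZIPInputStream constructor parse the header is an acceptable probe.

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipException;

final class MarkResetGzipProbe {

    // Assumption: a malformed header is rejected well within this many bytes.
    // It matches BufferedInputStream's default buffer size.
    private static final int READ_LIMIT = 8192;

    // Hypothetical helper: returns a decompressing stream when the input is gzip,
    // otherwise the buffered input rewound to its first byte.
    static InputStream decompressIfGzip(InputStream input) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(input);
        buffered.mark(READ_LIMIT);
        try {
            // The constructor reads and validates the gzip header.
            return new GZIPInputStream(buffered);
        } catch (ZipException | EOFException notGzip) {
            // Wrong magic number, or the stream ended before the header did.
            buffered.reset();
            return buffered;
        }
    }
}
```

On success no reset is needed, because the GZIPInputStream has already consumed the header and is ready to deliver decompressed bytes.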
I believe this is the simplest way to check whether a byte array is gzip formatted. It does not depend on any HTTP entity or MIME type support.
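public static boolean isGzipStream(byte[] bytes) {
    if (bytes == null || bytes.length < 2) {
        return false; // too short to hold the two-byte magic number
    }
    // The magic number is stored little-endian: 0x1f first, then 0x8b
    int head = (bytes[0] & 0xff) | ((bytes[1] << 8) & 0xff00);
    return GZIPInputStream.GZIP_MAGIC == head;
}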
It's not foolproof but it's probably the easiest and doesn't rely on any external data. Like all decent formats, GZip too begins with a magic number which can be quickly checked without reading the entire stream.
public static InputStream decompressStream(InputStream input) throws IOException {
    PushbackInputStream pb = new PushbackInputStream(input, 2); // we need a pushback stream to look ahead
    byte[] signature = new byte[2];
    int len = pb.read(signature); // read the signature
    if (len > 0) {
        pb.unread(signature, 0, len); // push back the signature to the stream (nothing to push back at end of stream)
    }
    if (signature[0] == (byte) 0x1f && signature[1] == (byte) 0x8b) // check if matches standard gzip magic number
        return new GZIPInputStream(pb);
    else
        return pb;
}
(Source for the magic number: GZip file format specification)
Update: I've just discovered that there is also a constant called GZIP_MAGIC in GZIPInputStream which contains this value, so if you really want to, you can use the lower two bytes of it.
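For illustration, here is one way the two signature bytes might be derived from GZIP_MAGIC instead of being hard-coded. The class name GzipMagicBytes and the method startsWithGzipMagic are hypothetical; only GZIPInputStream.GZIP_MAGIC comes from the JDK.

```java
import java.util.zip.GZIPInputStream;

final class GzipMagicBytes {

    // GZIP_MAGIC is 0x8b1f. Its low byte is the first byte of the stream (0x1f).
    static final byte FIRST = (byte) GZIPInputStream.GZIP_MAGIC;

    // The next byte up is the second byte of the stream (0x8b).
    static final byte SECOND = (byte) (GZIPInputStream.GZIP_MAGIC >> 8);

    // Hypothetical helper: true when the array starts with the gzip signature.
    static boolean startsWithGzipMagic(byte[] data) {
        return data != null && data.length >= 2
                && data[0] == FIRST && data[1] == SECOND;
    }
}
```

The byte order matters here: the constant packs the second stream byte into its higher bits, which is why the other snippets in this thread shift the second byte left by 8 before comparing.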
This function works perfectly well in Java:
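public static boolean isGZipped(File f) throws IOException {
    // try-with-resources closes the file even if a read fails
    try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
        // the magic number is stored little-endian: 0x1f first, then 0x8b
        int magic = (raf.read() & 0xff) | ((raf.read() << 8) & 0xff00);
        return magic == GZIPInputStream.GZIP_MAGIC;
    }
}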
In Scala:
def isGZip(file: File): Boolean = {
  val raf = new RandomAccessFile(file, "r")
  try {
    // the magic number is stored little-endian: 0x1f first, then 0x8b
    GZIPInputStream.GZIP_MAGIC == ((raf.read() & 0xff) | ((raf.read() << 8) & 0xff00))
  } finally raf.close()
}
Building on the answer by @biziclop - this version uses the GZIP_MAGIC header and additionally is safe for empty or single byte data streams.
public static InputStream maybeDecompress(InputStream input) throws IOException {
    final PushbackInputStream pb = new PushbackInputStream(input, 2);
    int header = pb.read();
    if (header == -1) {
        return pb; // empty stream
    }
    int b = pb.read();
    if (b == -1) {
        pb.unread(header); // single-byte stream
        return pb;
    }
    pb.unread(new byte[]{(byte) header, (byte) b});
    header = (b << 8) | header; // little-endian: second byte is the high byte
    if (header == GZIPInputStream.GZIP_MAGIC) {
        return new GZIPInputStream(pb);
    } else {
        return pb;
    }
}
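To see how such a method behaves on gzip, plain, empty and single-byte input, a round trip along these lines may help. This is only a sketch: the class MaybeDecompressDemo and the gzip helper are hypothetical, and the maybeDecompress inside it is a variant that uses readNBytes (Java 9 or newer) rather than the two single-byte reads above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class MaybeDecompressDemo {

    // Variant of the method above; readNBytes needs Java 9 or newer.
    static InputStream maybeDecompress(InputStream input) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(input, 2);
        byte[] head = new byte[2];
        int n = pb.readNBytes(head, 0, 2); // 0, 1 or 2 bytes, depending on stream length
        if (n > 0) {
            pb.unread(head, 0, n); // put back exactly what was read
        }
        int magic = (head[0] & 0xff) | ((head[1] & 0xff) << 8);
        return (n == 2 && magic == GZIPInputStream.GZIP_MAGIC) ? new GZIPInputStream(pb) : pb;
    }

    // Hypothetical helper that produces gzip bytes to test with.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "hello gzip".getBytes(StandardCharsets.UTF_8);
        byte[][] samples = { gzip(text), text, new byte[0], { 0x1f } };
        for (byte[] sample : samples) {
            byte[] result = maybeDecompress(new ByteArrayInputStream(sample)).readAllBytes();
            System.out.println(sample.length + " bytes in, " + result.length + " bytes out");
        }
    }
}
```

Only the first sample should change size on the way through; the other three are handed back byte for byte.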
SimpleMagic is a Java library for resolving content types:
<!-- pom.xml -->
<dependency>
<groupId>com.j256.simplemagic</groupId>
<artifactId>simplemagic</artifactId>
<version>1.8</version>
</dependency>
import com.j256.simplemagic.ContentInfo;
import com.j256.simplemagic.ContentInfoUtil;
import com.j256.simplemagic.ContentType;
// ...
public class SimpleMagicSmokeTest {
    private static final Logger log = LoggerFactory.getLogger(SimpleMagicSmokeTest.class);

    @Test
    public void smokeTestSimpleMagic() throws IOException {
        ContentInfoUtil util = new ContentInfoUtil();
        InputStream possibleGzipInputStream = getGzipInputStream();
        ContentInfo info = util.findMatch(possibleGzipInputStream);
        log.info( info.toString() );
        assertEquals( ContentType.GZIP, info.getContentType() );
    }
}