I am experimenting with Unsafe to iterate over memory instead of iterating over the values in a byte[]. A memory block is allocated using unsafe. The memory is sufficient to
I think the two functions you posted are basically the same because they read only 1 byte and then convert it to int and do the futher comparing.
Reading 4-Byte int or 8-Byte long every time is much more effective.I wrote two function to do the same thing:compare the content of two byte[] to see if they are the same:
function 1:
public static boolean hadoopEquals(byte[] b1, byte[] b2)
{
if(b1 == b2)
{
return true;
}
if(b1.length != b2.length)
{
return false;
}
// Bring WritableComparator code local
for(int i = 0;i < b1.length; ++i)
{
int a = (b1[i] & 0xff);
int b = (b2[i] & 0xff);
if (a != b)
{
return false;
}
}
return true;
}
function 2:
public static boolean goodEquals(byte[] b1,byte[] b2)
{
if(b1 == b2)
{
return true;
}
if(b1.length != b2.length)
{
return false;
}
int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
int numLongs = (int)Math.ceil(b1.length / 8.0);
for(int i = 0;i < numLongs; ++i)
{
long currentOffset = baseOffset + (i * 8);
long l1 = UnSafe.getLong(b1, currentOffset);
long l2 = UnSafe.getLong(b2, currentOffset);
if(0L != (l1 ^ l2))
{
return false;
}
}
return true;
}
I ran these two functions on my laptop(corei7 2630QM , 8GB DDR3 , 64bit win 7 , 64bit Hotspot JVM),and compare two 400MB byte[],result is below:
function 1: ~670ms
function 2: ~80ms
2 is much more faster.
So my suggestion is to read 8-byte every time and use the XOR operator(^):
long l1 = UnSafe.getLong(byteArray, offset); //8 byte
if(0L == l1 ^ 0xFF) //if the lowest byte == 0?
/* do something */
if(0L == l1 ^ 0xFF00) //if the 2nd lowest byte == 0?
/* do something */
/* go on... */
============================================================================
Hi Wilf, I use your code to make a test class as below,this class compare the speed among 3 functions in looking up the 1st 0 in a byte array:
package test;
import java.lang.reflect.Field;
import sun.misc.Unsafe;
/**
* Test the speed in looking up the 1st 0 in a byte array
* Set -Xms the same as -Xms to avoid Heap reallocation
*
* @author yellowb
*
*/
public class StackOverflow
{
public static Unsafe UnSafe;
public static Unsafe getUnsafe() throws SecurityException,
NoSuchFieldException, IllegalArgumentException,
IllegalAccessException
{
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);
return unsafe;
}
/**
* use 'byte[index]' form to read 1 byte every time
* @param buf
*/
public static void normalLookup(byte[] buf)
{
for (int i = 0; i < buf.length; ++i)
{
if ((byte) 0 == buf[i])
{
System.out.println("The 1st '0' is at position : " + i);
return;
}
}
System.out.println("Not found '0'");
}
/**
* use Unsafe.getByte to read 1 byte every time directly from the memory
* @param buf
*/
public static void unsafeLookup_1B(byte[] buf)
{
int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
for (int i = 0; i < buf.length; ++i)
{
byte b = UnSafe.getByte(buf, (long) (baseOffset + i));
if (0 == ((int) b & 0xFF))
{
System.out.println("The 1st '0' is at position : " + i);
return;
}
}
System.out.println("Not found '0'");
}
/**
* use Unsafe.getLong to read 8 byte every time directly from the memory
* @param buf
*/
public static void unsafeLookup_8B(byte[] buf)
{
int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
//The first (numLongs * 8) bytes will be read by Unsafe.getLong in below loop
int numLongs = buf.length / 8;
long currentOffset = 0L;
for (int i = 0; i < numLongs; ++i)
{
currentOffset = baseOffset + (i * 8); //the step is 8 bytes
long l = UnSafe.getLong(buf, currentOffset);
//Compare each byte(in the 8-Byte long) to 0
//PS:x86 cpu is little-endian mode
if (0L == (l & 0xFF))
{
System.out.println("The 1st '0' is at position : " + (i * 8));
return;
}
if (0L == (l & 0xFF00L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 1));
return;
}
if (0L == (l & 0xFF0000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 2));
return;
}
if (0L == (l & 0xFF000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 3));
return;
}
if (0L == (l & 0xFF00000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 4));
return;
}
if (0L == (l & 0xFF0000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 5));
return;
}
if (0L == (l & 0xFF000000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 6));
return;
}
if (0L == (l & 0xFF00000000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 7));
return;
}
}
//If some rest bytes exists
int rest = buf.length % 8;
if(0 != rest)
{
currentOffset = currentOffset + 8;
//Because the length of rest bytes < 8,we have to read them one by one
for(; currentOffset < (baseOffset + buf.length); ++currentOffset)
{
byte b = UnSafe.getByte(buf, (long)currentOffset);
if (0 == ((int) b & 0xFF))
{
System.out.println("The 1st '0' is at position : " + (currentOffset - baseOffset));
return;
}
}
}
System.out.println("Not found '0'");
}
public static void main(String[] args) throws SecurityException,
NoSuchFieldException, IllegalArgumentException,
IllegalAccessException
{
UnSafe = getUnsafe();
int len = 1024 * 1024 * 1024; //1G
long startTime = 0L;
long endTime = 0L;
System.out.println("initialize data...");
byte[] byteArray1 = new byte[len];
for (int i = 0; i < len; ++i)
{
byteArray1[i] = (byte) (i % 128 + 1); //No byte will equal to 0
}
//If you want to set one byte to 0,uncomment the below statement
// byteArray1[2500] = (byte)0;
System.out.println("initialize data done!");
System.out.println("use normalLookup()...");
startTime = System.nanoTime();
normalLookup(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
System.out.println("use unsafeLookup_1B()...");
startTime = System.nanoTime();
unsafeLookup_1B(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
System.out.println("use unsafeLookup_8B()...");
startTime = System.nanoTime();
unsafeLookup_8B(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
}
}
And the output is:
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1271781 us.
use unsafeLookup_1B()...
Not found '0'
time : 716898 us.
use unsafeLookup_8B()...
Not found '0'
time : 591689 us.
the result shows that even reading 1 byte every time by Unsafe.getByte() is much faster than iterating the byte[] regularly.And reading 8-byte long is the fastest.