What is the fastest way to compare two byte arrays?

你的背包 2021-01-18 07:59

I am trying to compare two long byte arrays in VB.NET and have run into a snag. Comparing two 50-megabyte files takes almost two minutes, so I'm clearly doing something wrong.

6 Answers
  • 2021-01-18 08:16

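    The fastest way to compare two byte arrays of equal size is to call the C runtime's memcmp through interop (P/Invoke). Run the following code in a console application. It must be compiled with unsafe code enabled (/unsafe), and because it loads msvcrt.dll it only runs on Windows: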

    using System;
    using System.Runtime.InteropServices;
    using System.Security;
    
    namespace CompareByteArray
    {
        class Program
        {
            static void Main(string[] args)
            {
                const int SIZE = 100000;
                const int TEST_COUNT = 100;
    
                byte[] arrayA = new byte[SIZE];
                byte[] arrayB = new byte[SIZE];
    
                for (int i = 0; i < SIZE; i++)
                {
                    arrayA[i] = 0x22;
                    arrayB[i] = 0x22;
                }
    
                // Stopwatch is designed for measuring elapsed time; DateTime.Now can have
                // a resolution of several milliseconds, too coarse for runs this short.
                {
                    var sw = System.Diagnostics.Stopwatch.StartNew();
                    for (int i = 0; i < TEST_COUNT; i++)
                    {
                        int result = MemCmp_Safe(arrayA, arrayB, (UIntPtr)SIZE);
    
                        if (result != 0) throw new Exception();
                    }
                    sw.Stop();
    
                    Console.WriteLine("MemCmp_Safe: {0}", sw.Elapsed);
                }
    
                {
                    var sw = System.Diagnostics.Stopwatch.StartNew();
                    for (int i = 0; i < TEST_COUNT; i++)
                    {
                        int result = MemCmp_Unsafe(arrayA, arrayB, (UIntPtr)SIZE);
    
                        if (result != 0) throw new Exception();
                    }
                    sw.Stop();
    
                    Console.WriteLine("MemCmp_Unsafe: {0}", sw.Elapsed);
                }
    
                {
                    var sw = System.Diagnostics.Stopwatch.StartNew();
                    for (int i = 0; i < TEST_COUNT; i++)
                    {
                        int result = MemCmp_Pure(arrayA, arrayB, SIZE);
    
                        if (result != 0) throw new Exception();
                    }
                    sw.Stop();
    
                    Console.WriteLine("MemCmp_Pure: {0}", sw.Elapsed);
                }
                return;
            }
    
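            // P/Invoke declarations for memcmp from msvcrt.dll, the Windows C runtime.
            [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint="memcmp", ExactSpelling=true)]
            [SuppressUnmanagedCodeSecurity]
            static extern int memcmp_1(byte[] b1, byte[] b2, UIntPtr count);
    
            // Same function declared with raw pointers; pointer types need /unsafe.
            [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint = "memcmp", ExactSpelling = true)]
            [SuppressUnmanagedCodeSecurity]
            static extern unsafe int memcmp_2(byte* b1, byte* b2, UIntPtr count);
    
            // The interop marshaller pins the (blittable) arrays and passes pointers to them.
            public static int MemCmp_Safe(byte[] a, byte[] b, UIntPtr count)
            {
                return memcmp_1(a, b, count);
            }
    
            // Pins both arrays explicitly with fixed and passes the raw pointers.
            public unsafe static int MemCmp_Unsafe(byte[] a, byte[] b, UIntPtr count)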
            {
                fixed(byte* p_a = a)
                {
                    fixed (byte* p_b = b)
                    {
                        return memcmp_2(p_a, p_b, count);
                    }
                }
            }
    
            public static int MemCmp_Pure(byte[] a, byte[] b, int count)
            {
                // Managed loop: compares byte by byte and stops at the first difference.
                int result = 0;
                for (int i = 0; i < count && result == 0; i += 1)
                {
                    result = a[i] - b[i];
                }
    
                return result;
            }
    
        }
    }
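    For comparison, here is a minimal managed sketch that avoids both P/Invoke and unsafe code, assuming .NET Core 2.1 or later (on .NET Framework the span types come from the System.Memory NuGet package). It relies on MemoryExtensions.SequenceEqual, which the runtime implements with vectorized code for byte spans. The SpanCompare and BytesEqual names are hypothetical, and whether this beats the memcmp call on a given machine is something to measure with the benchmark above.

```csharp
using System;

public static class SpanCompare
{
    // Hypothetical helper: compares two byte arrays using the span-based
    // SequenceEqual from System.MemoryExtensions (no P/Invoke, no unsafe code).
    public static bool BytesEqual(byte[] a, byte[] b)
    {
        // Handle nulls explicitly: a null array converts to an empty span,
        // which would otherwise compare equal to a zero-length array.
        if (a == null || b == null)
            return a == b;

        ReadOnlySpan<byte> spanA = a;
        ReadOnlySpan<byte> spanB = b;

        // Returns false when the lengths differ; otherwise compares the contents.
        return spanA.SequenceEqual(spanB);
    }
}
```

    Usage: bool same = SpanCompare.BytesEqual(File.ReadAllBytes(pathA), File.ReadAllBytes(pathB)); where pathA and pathB are placeholder file paths.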
    
  • 2021-01-18 08:20

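    If you don't need to know which byte differs, compare 64-bit integers instead of single bytes, so each comparison checks 8 bytes at once. Even if you do need the position, you can find the differing byte once you've narrowed the mismatch down to a group of 8.

    Use BinaryReader to read 64-bit values:

    Dim value As Long = binReader.ReadInt64()
    

    Or, to read a block of bytes into an existing byte array (Read returns the number of bytes actually read):

    Dim count As Integer = binReader.Read(testArray, 0, testArray.Length)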
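    A minimal C# sketch of the 8-bytes-at-a-time idea, working on arrays already in memory rather than through a BinaryReader. BitConverter.ToUInt64 reads 8 bytes starting at a given index as one 64-bit value; the WordCompare class and FirstDifference method are hypothetical names. Whether this actually beats a plain byte loop depends on the runtime and JIT, so it is worth timing both.

```csharp
using System;

public static class WordCompare
{
    // Hypothetical helper: returns the index of the first differing byte,
    // or -1 if the arrays are equal. Assumes both arrays are non-null.
    public static int FirstDifference(byte[] a, byte[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int i = 0;
        // Compare 8 bytes at a time as 64-bit integers.
        for (; i + 8 <= a.Length; i += 8)
        {
            if (BitConverter.ToUInt64(a, i) != BitConverter.ToUInt64(b, i))
                break; // the mismatch is somewhere in bytes i..i+7
        }

        // Finish byte by byte: this scans either the 8-byte group where the
        // mismatch was detected, or the leftover tail of fewer than 8 bytes.
        for (; i < a.Length; i++)
        {
            if (a[i] != b[i])
                return i;
        }
        return -1;
    }
}
```

    FirstDifference(a, b) returns -1 when the arrays match; otherwise it returns the position of the first differing byte.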
    
  • 2021-01-18 08:22

    Another approach: if you are just trying to see whether the two are different, generate a hash of each byte array (as a string, for example) and compare the hashes. MD5 should work fine and is pretty efficient. Note that computing a hash still reads every byte of both arrays, so it only saves time when a hash computed earlier can be reused, for example when comparing one file against many.
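    A minimal C# sketch of the hash comparison, assuming MD5.Create() from System.Security.Cryptography and comparing the digests as the hex strings produced by BitConverter.ToString. HashCompare, Md5Hex and ProbablyEqual are hypothetical names. Equal digests only mean the arrays are almost certainly equal, because different inputs can share an MD5 hash.

```csharp
using System;
using System.Security.Cryptography;

public static class HashCompare
{
    // Hex string of the MD5 digest, uppercase with dashes, e.g. "D4-1D-8C-...".
    public static string Md5Hex(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            return BitConverter.ToString(md5.ComputeHash(data));
        }
    }

    // Matching digests strongly suggest, but do not prove, equal contents;
    // confirm with a byte-by-byte comparison if a collision would matter.
    public static bool ProbablyEqual(byte[] a, byte[] b) => Md5Hex(a) == Md5Hex(b);
}
```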

  • 2021-01-18 08:24

    Not strictly related to the comparison algorithm:

    Are you sure your bottleneck isn't the available memory and the time spent loading the byte arrays? Loading two 2 GB byte arrays just to compare them could bring most machines to their knees. If the program design allows, try using streams to read and compare smaller chunks instead.
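    A minimal sketch of the chunked approach in C#, assuming both inputs are available as Streams (for example from File.OpenRead). Only two fixed-size buffers are held in memory; the 64 KB default buffer size and the StreamCompare names are arbitrary, illustrative choices. Comparing the file lengths first (for example with FileInfo.Length) lets you skip the read entirely when the sizes differ.

```csharp
using System;
using System.IO;

public static class StreamCompare
{
    // Reads until the buffer is full or the stream ends; returns the bytes read.
    // Stream.Read may legally return fewer bytes than requested, hence the loop.
    private static int ReadFull(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = s.Read(buffer, total, buffer.Length - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }

    // Compares two streams chunk by chunk without loading either one fully.
    public static bool ContentsEqual(Stream a, Stream b, int bufferSize = 64 * 1024)
    {
        var bufA = new byte[bufferSize];
        var bufB = new byte[bufferSize];
        while (true)
        {
            int nA = ReadFull(a, bufA);
            int nB = ReadFull(b, bufB);
            if (nA != nB) return false; // one stream ended before the other
            if (nA == 0) return true;   // both ended at the same point
            for (int i = 0; i < nA; i++)
            {
                if (bufA[i] != bufB[i]) return false;
            }
        }
    }
}
```

    Usage: using (var fa = File.OpenRead(pathA)) using (var fb = File.OpenRead(pathB)) { bool same = StreamCompare.ContentsEqual(fa, fb); } where pathA and pathB are placeholder paths.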

  • I see two things that might help:

    First, rather than always accessing the second array as item.Bytes, use a local variable to point directly at the array. That is, before starting the loop, do something like this:

    Dim array2 As Byte() = item.Bytes
    

    That saves the overhead of going through the object's property each time you want a byte. That could be expensive in Visual Basic, especially if the property's getter does any work.

    Also, use a "definite loop" (a counted For loop) instead of For Each. You already know the length of the arrays, so just code the loop using that value. This avoids the overhead of treating the array as a collection. The loop would look something like this:

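    Dim areEqual As Boolean = True
    For i As Integer = 0 To array1.Length - 1   ' VB.NET arrays are zero-based
        If array1(i) <> array2(i) Then
            areEqual = False
            Exit For
        End If
    Next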
    
  • 2021-01-18 08:30

    What is the _Bytes(I) call doing? It's not loading the file each time, is it? Even with buffering, that would be bad news!

    There will be plenty of ways to micro-optimise this, such as comparing longs at a time, potentially using unsafe code, etc. - but I'd just concentrate on getting reasonable performance first. Clearly there's something very odd going on.

    I suggest you extract the comparison code into a separate function which takes two byte arrays. That way you know you won't be doing anything odd. I'd also use a simple For loop rather than For Each in this case - it'll be simpler. Oh, and check whether the lengths are correct first :)

    EDIT: Here's the code (untested, but simple enough) that I'd use. It's in C# for the minute - I'll convert it in a sec:

    public static bool Equals(byte[] first, byte[] second)
    {
        if (first == second)
        {
            return true;
        }
        if (first == null || second == null)
        {
            return false;
        }
        if (first.Length != second.Length)
        {
            return false;
        }
        for (int i=0; i < first.Length; i++)
        {
            if (first[i] != second[i])                
            {
                return false;
            }
        }
        return true;
    }
    

    EDIT: And here's the VB:

    Public Shared Function ArraysEqual(ByVal first As Byte(), _
                                       ByVal second As Byte()) As Boolean
        If (first Is second) Then
            Return True
        End If
    
        If (first Is Nothing OrElse second Is Nothing) Then
            Return False
        End If
        If (first.Length <> second.Length) Then
            Return False
        End If
    
        For i as Integer = 0 To first.Length - 1
            If (first(i) <> second(i)) Then
                Return False
            End If
        Next i
        Return True
    End Function
    