Is it possible to accelerate (dynamic) LINQ queries using GPU?

青春惊慌失措 2021-02-07 13:35

I have been searching for some days for solid information on whether LINQ queries can be accelerated using a GPU.

Technologies I have "investigated" so far:

5 Answers
  • 2021-02-07 14:15

    The GPU is not intended for all general-purpose computing, especially with object-oriented designs like this, and filtering an arbitrary collection of objects is not a good fit for it.

    GPU computations are great for things where you are performing the same operation on a large dataset, which is why things like matrix operations and transforms can work very well. There, the cost of copying the data can be outweighed by the incredibly fast computational capabilities of the GPU.

    In this case, you'd have to copy all of the data into the GPU to make this work, and restructure it into some form the GPU will understand, which would likely be more expensive than just performing the filter in software in the first place.

    Instead, I would recommend looking at using PLINQ for speeding up queries of this nature. Provided your filter is thread safe (which it'd have to be for any GPU related work...) this is likely a better option for general purpose query optimization, as it won't require the memory copying of your data. PLINQ would work by rewriting your query as:

    var result = myList.AsParallel().Where(x => x.SomeProperty == SomeValue);
    

    If the predicate is an expensive operation, or the collection is very large (and easily partitionable), this can make a significant improvement to the overall performance when compared to standard LINQ to Objects.

  • 2021-02-07 14:18
    select *
    from table1  -- contains 100k rows
    left join table2 -- contains 1M rows
    on table1.id1=table2.id2 -- this would run for ~100G times 
                             -- unless they are cached on sql side
    where table1.id between 1 and 100000 -- but this optimizes things (depends)
    

    could be turned into

    select id1 from table1 -- 400k bytes if id1 is 32 bit 
    -- no need to order
    

    stored in memory

    select id2 from table2 -- 4Mbytes if id2 is 32 bit
    -- no need to order
    

    stored in memory. Both arrays are then sent to the GPU and processed by a kernel (CUDA or OpenCL) like the one below

    int i=get_global_id(0); // to select an id2, we need a thread id
    int selectedID2=id2[i];
    int summary__=-1; // -1 means no row of table1 matched this id2
    for(int j=0;j<id1Length;j++)
    {
          int selectedID1=id1[j];
          summary__=(selectedID2==selectedID1?j:summary__); // no branching
    }
    // store the matching table1 index for the
    // "on table1.id1=table2.id2" part.
    summary[i]=summary__;
    

    On the host side, you can run

     select * from table1 --- query3
    

    and

     select * from table2 --- query4
    

    then use the id list from gpu to select the data

     // x is table1's data
     myList.AsParallel().ForAll(x=>query3.leftjoindata=query4[summary[index]]);
    

    The GPU code should take no more than about 50 ms on a GPU with constant memory, global broadcast capability and a few thousand cores.

    If any trigonometric function is used for filtering, performance would drop quickly. Also, the left join compares every row of one table with every row of the other, so its complexity is O(m*n), and millions of rows against millions of rows would be much slower. GPU memory bandwidth is important here.
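
    To make the O(m*n) cost concrete, here is a minimal CPU sketch in plain C of the same per-row scan the kernel above is meant to perform. The function name find_join_indices and its parameters are made up for illustration and are not part of any library; it can serve as a reference to check GPU output against.

```c
#include <stddef.h>

/* CPU reference for the per-row scan: for every id in id2, record the index
 * of the last matching entry of id1, or -1 when none matches.
 * Cost is id1_len * id2_len comparisons, i.e. O(m*n). */
void find_join_indices(const int *id1, size_t id1_len,
                       const int *id2, size_t id2_len,
                       int *summary)
{
    for (size_t i = 0; i < id2_len; i++) {      /* one GPU thread per i */
        int match = -1;
        for (size_t j = 0; j < id1_len; j++) {
            /* same select as the kernel: keep j when the ids are equal */
            match = (id2[i] == id1[j]) ? (int)j : match;
        }
        summary[i] = match;
    }
}
```

    On the GPU the outer loop disappears because each thread handles one value of i, but the inner loop still runs in full for every thread.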

    Edit: A single call to gpu.findIdToJoin(table1,table2,"id1","id2") on my HD 7870 (1280 cores) and R7 240 (320 cores), with a products table (64k rows) and a categories table (64k rows) as a left-join filter, took 48 milliseconds with an unoptimized kernel.

    ADO.NET's "nosql"-style LINQ join took more than 2000 ms with only a 44k-row products table and a 4k-row categories table.

    Edit-2:

    A left join with a string search condition gets 50 to 200 times faster on the GPU once the tables grow to thousands of rows, each holding at least hundreds of characters.

  • 2021-02-07 14:18

    The simple answer for your use case is no.

    1) There's no solution for that kind of workload even in raw LINQ to Objects, much less in something that would replace your database.

    2) Even if you were fine with loading the whole data set at once (which takes time), it would still be much slower. GPUs have high throughput, but access to them is high latency, so if you're looking for "very" fast solutions, GPGPU is often not the answer: just preparing and sending the workload and getting the results back will be slow, and in your case it would probably need to be done in chunks too.
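
    The latency-versus-throughput trade-off can be put into a rough cost model. The sketch below is only an illustration in plain C: the GpuLink type, both function names and the "three fixed latencies" assumption (copy in, kernel launch, copy out) are mine, and every number has to come from measurements on your own hardware.

```c
#include <stdbool.h>

/* All inputs are hypothetical measurements in seconds and bytes per second;
 * plug in numbers from your own hardware. */
typedef struct {
    double launch_latency_s;   /* fixed cost per transfer or kernel launch */
    double bandwidth_bytes_s;  /* sustained host<->device bandwidth */
} GpuLink;

/* Estimated wall time for: copy in, run kernel, copy out.
 * Assumes each of the three steps pays the fixed latency once. */
double gpu_total_seconds(GpuLink link, double bytes_in, double bytes_out,
                         double kernel_seconds)
{
    double transfer = (bytes_in + bytes_out) / link.bandwidth_bytes_s;
    return 3.0 * link.launch_latency_s + transfer + kernel_seconds;
}

/* True when the estimated GPU path beats the measured CPU time. */
bool gpu_is_worthwhile(GpuLink link, double bytes_in, double bytes_out,
                       double kernel_seconds, double cpu_seconds)
{
    return gpu_total_seconds(link, bytes_in, bytes_out, kernel_seconds)
           < cpu_seconds;
}
```

    If the work has to be sent in chunks, the fixed latency is paid once per chunk, which is why small or frequent queries tend to lose against the CPU.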

  • 2021-02-07 14:27

    I will answer definitively about Brahma since it's my library, but it probably applies to other approaches as well. The GPU has no knowledge of objects. Its memory is also almost completely separate from CPU memory.

    If you do have a LARGE set of objects and want to operate on them, you can only pack the data you want to operate on into a buffer suitable for the GPU/API you're using and send it off to be processed.

    Note that this will make two round trips over the CPU-GPU memory interface, so if you aren't doing enough work on the GPU to make it worthwhile, you'll be slower than if you simply used the CPU in the first place (like the sample above).
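
    The packing step can be sketched in plain C. This is not Brahma code: the Product type and pack_category_ids are hypothetical names chosen for illustration, and the record simply stands in for an object with several fields of which only one is needed on the GPU.

```c
#include <stddef.h>

/* Hypothetical record type; stands in for an object with many fields. */
typedef struct {
    int id;
    int category_id;
    double price;
} Product;

/* Gather one field into a contiguous buffer, the flat layout a GPU API
 * expects. This CPU-side pass is paid before the host-to-device copy. */
void pack_category_ids(const Product *items, size_t count, int *buffer)
{
    for (size_t i = 0; i < count; i++) {
        buffer[i] = items[i].category_id;
    }
}
```

    Only the packed buffer travels to the GPU, and a result buffer of similar size travels back, so the objects themselves never leave CPU memory.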

    Hope this helps.

  • 2021-02-07 14:30

    GpuLinq

    GpuLinq's main mission is to democratize GPGPU programming through LINQ. The main idea is that we represent the query as an Expression tree and, after various transformations and optimizations, compile it into fast OpenCL kernel code. In addition, we provide an easy-to-use API, without the need to deal with the details of the OpenCL API.

    https://github.com/nessos/GpuLinq
