Is it possible to accelerate (dynamic) LINQ queries using GPU?

我们两清 提交于 2019-12-20 17:59:14

问题


I have been searching for some days for solid information on the possibility to accelerate LINQ queries using a GPU.

Technologies I have "investigated" so far:

  • Microsoft Accelerator
  • Cudafy
  • Brahma

In short, would it even be possible at all to do an in-memory filtering of objects on the GPU?

Let´s say we have a list of some objects and we want to filter something like:

var result = myList.Where(x => x.SomeProperty == SomeValue);

Any pointers on this one?

Thanks in advance!

UPDATE

I´ll try to be more specific about what I am trying to achieve :)

The goal is, to use any technology, which is able to filter a list of objects (ranging from ~50 000 to ~2 000 000), in the absolutely fastest way possible.

The operations I perform on the data when the filtering is done (sum, min, max etc) is made using the built in LINQ-methods and is already fast enough for our application, so that´s not a problem.

The bottleneck is "simply" the filtering of data.

UPDATE

Just wanted to add that I have tested about 15 databases, including MySQL (checking possible cluster approach / memcached solution), H2, HSQLDB, VelocityDB (currently investigating further), SQLite, MongoDB etc, and NONE is good enough when it comes to the speed of filtering data (of course, the NO-sql solutions do not offer this like the sql ones, but you get the idea) and/or the returning of the actual data.

Just to summarize what I/we need:

A database which is able to sort data in the format of 200 columns and about 250 000 rows in less than 100 ms.

I currently have a solution with parallellized LINQ which is able (on a specific machine) to spend only nano-seconds on each row when filtering AND processing the result!

So, we need like sub-nano-second-filtering on each row.

  1. Why does it seem that only in-memory LINQ is able to provide this?
  2. Why would this be impossible?

Some figures from the logfile:

Total tid för 1164 frågor: 2579

This is Swedish and translates:

Total time for 1164 queries: 2579

Where the queries in this case are queries like:

WHERE SomeProperty = SomeValue

And those queries are all being done in parallell on 225639 rows.

So, 225639 rows are being filtered in memory 1164 times in about 2.5 seconds.

That´s 9,5185952917007032597107300413827e-9 seconds / row, BUT, that also includes the actual processing of the numbers! We do Count (not null), total count, Sum, Min, Max, Avg, Median. So, we have 7 operations on these filtered rows.

So, we could say it´s actually 7 times faster than the the databases we´ve tried, since we do NOT do any aggregation-stuff in those cases!

So, in conclusion, why are the databases so poor at filtering data compared to in-memory LINQ filtering? Have Microsoft really done such a good job that it is impossible to compete with it? :)

It makes sense though that in-memory filtering should be faster, but I don´t want a sense that it is faster. I want to know what is faster, and if it´s possible why.


回答1:


I will answer definitively about Brahma since it's my library, but it probably applies to other approaches as well. The GPU has no knowledge of objects. It's memory is also mostly completely separate from CPU memory.

If you do have a LARGE set of objects and want to operate on them, you can only pack the data you want to operate on into a buffer suitable for the GPU/API you're using and send it off to be processed.

Note that this will make two round trips over the CPU-GPU memory interface, so if you aren't doing enough work on the GPU to make it worthwhile, you'll be slower than if you simply used the CPU in the first place (like the sample above).

Hope this helps.




回答2:


The GPU is really not intended for all general purpose computing purposes, especially with object oriented designs like this, and filtering an arbitrary collection of data like this would really not be an appropriate thing.

GPU computations are great for things where you are performing the same operation on a large dataset - which is why things like matrix operations and transforms can be very nice. There, the data copying can be outweighed by the incredibly fast computational capabilities on the GPU....

In this case, you'd have to copy all of the data into the GPU to make this work, and restructure it into some form the GPU will understand, which would likely be more expensive than just performing the filter in software in the first place.

Instead, I would recommend looking at using PLINQ for speeding up queries of this nature. Provided your filter is thread safe (which it'd have to be for any GPU related work...) this is likely a better option for general purpose query optimization, as it won't require the memory copying of your data. PLINQ would work by rewriting your query as:

var result = myList.AsParallel().Where(x => x.SomeProperty == SomeValue);

If the predicate is an expensive operation, or the collection is very large (and easily partitionable), this can make a significant improvement to the overall performance when compared to standard LINQ to Objects.




回答3:


GpuLinq

GpuLinq's main mission is to democratize GPGPU programming through LINQ. The main idea is that we represent the query as an Expression tree and after various transformations-optimizations we compile it into fast OpenCL kernel code. In addition we provide a very easy to work API without the need of messing with the details of the OpenCL API.

https://github.com/nessos/GpuLinq




回答4:


select *
from table1  -- contains 100k rows
left join table2 -- contains 1M rows
on table1.id1=table2.id2 -- this would run for ~100G times 
                         -- unless they are cached on sql side
where table1.id between 1 and 100000 -- but this optimizes things (depends)

could be turned into

select id1 from table1 -- 400k bytes if id1 is 32 bit 
-- no need to order

stored in memory

select id2 from table2 -- 4Mbytes if id2 is 32 bit
-- no need to order

stored in memory, both arrays sent to gpu using a kernel(cuda,opencl) like below

int i=get_global_id(0); // to select an id2, we need a thread id
int selectedID2=id2[i];
summary__=-1;
for(int j=0;j<id1Length;j++)
{
      int selectedID1=id1[j];
      summary__=(selectedID2==selectedID1?j:summary__); // no branching
}
summary[i]=j; // accumulates target indexings of 
"on table1.id1=table2.id2" part.

On the host side, you can make

 select * from table1 --- query3

and

 select * from table2 --- query4

then use the id list from gpu to select the data

 // x is table1 ' s data
 myList.AsParallel().ForEach(x=>query3.leftjoindata=query4[summary[index]]);

The gpu code shouldn't be slower than 50ms for a gpu with constant memory, global broadcast ability and some thousands of cores.

If any trigonometric function is used for filtering, the performance would drop fast. Also when left joined tables row count makes it O(m*n) complexity so millions versus millions would be much slower. GPU memory bandwidth is important here.

Edit: A single operation of gpu.findIdToJoin(table1,table2,"id1","id2") on my hd7870(1280 cores) and R7-240(320 cores) with "products table(64k rows)" and a "categories table(64k rows)" (left join filter) took 48 milliseconds with unoptimized kernel.

Ado.Net 's "nosql" style linq-join took more than 2000 ms with only 44k products and 4k categories table.

Edit-2:

left join with a string search condition gets 50 to 200 x faster on gpu when tables grow to 1000s of rows each having at least hundreds of characters.




回答5:


The simple answer for your use case is no.

1) There's no solution for that kind of workload even in raw linq to object, much less in something that would replace your database.

2) Even if you were fine with loading the whole set of data at once (this takes time) it would still be much slower as GPU have high thoroughput but their access is high latency, so if you're looking at "very" fast solutions GPGPU is often not the answer as just preparing / sending the workload and getting back the results will be slow, and in your case probably need to be done in chunks too.



来源:https://stackoverflow.com/questions/9298430/is-it-possible-to-accelerate-dynamic-linq-queries-using-gpu

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!