Question
So I'm working on a project that involves an LCD screen that can update 60 times per second. It uses a BitmapFrame, and I need to copy those pixels to a library that updates the screen. Currently I'm getting about 30-35 FPS, which is too low. So I'm trying to use multi-threading, but this creates a lot of problems.
The DisplayController already creates a thread to do all the work on, like so:
public void Start()
{
    _looper = new Thread(Loop);
    _looper.IsBackground = true;
    _looper.Start();
}

private void Loop()
{
    while (_IsRunning)
    {
        renderScreen();
    }
}
This calls the renderScreen method, which draws all the elements and copies the pixels to the BitmapFrame. But this process takes too long, so my FPS drops. My attempt to solve this problem was to create a Task that draws, copies and writes the pixels. But this solution uses a lot of CPU and causes glitches on the screen.
public void renderScreen()
{
    Task.Run(() =>
    {
        Monitor.Enter(_object);
        // Push screen to LCD
        BitmapFrame bf = BitmapFrame.Create(screen);
        RenderOptions.SetBitmapScalingMode(bf, BitmapScalingMode.LowQuality);
        bf.CopyPixels(new Int32Rect(0, 0, width, height), pixels, width * 4, 0);
        DisplayWrapper.USBD480_DrawFullScreenBGRA32(ref disp, pixels);
        Monitor.Exit(_object);
    });
}
I've been reading a lot about concurrent queues for C# but that's not what I need. And using two threads causes the issue that the compiler says that the variable is owned by another thread.
How can I concurrently render a new bitmap and write that bitmap 60 times per second to the LCD?
Answer 1:
I think you should have two threads (and two only):
- one that continuously creates the bitmap; and
- one that continuously takes the most recent bitmap and pushes it to LCD.
Here's my naive implementation.
I used a shared array that holds the most recently produced image because it keeps the number of allocations low. With a shared array we can get away with 3 array objects (the shared one plus one local to each thread).
public class Program
{
    public class A
    {
        private readonly object pixelsLock = new object();
        Array shared = ...;

        // Producer: continuously renders into a thread-local array, then publishes it.
        public void Method2()
        {
            Array myPixels = (...);
            while (true)
            {
                // Prepare image
                BitmapFrame bf = BitmapFrame.Create(screen);
                RenderOptions.SetBitmapScalingMode(bf, BitmapScalingMode.LowQuality);
                bf.CopyPixels(new Int32Rect(0, 0, width, height), myPixels, width * 4, 0);
                lock (pixelsLock)
                {
                    // Copy the hard work to shared storage (the whole array)
                    Array.Copy(sourceArray: myPixels, destinationArray: shared, length: myPixels.Length);
                }
            }
        }

        // Consumer: continuously takes the latest published image and pushes it to the LCD.
        public void Method1()
        {
            Array myPixels = (...);
            while (true)
            {
                lock (pixelsLock)
                {
                    // Make a local copy so the slow LCD write happens outside the lock
                    Array.Copy(sourceArray: shared, destinationArray: myPixels, length: myPixels.Length);
                }
                DisplayWrapper.USBD480_DrawFullScreenBGRA32(ref disp, myPixels);
            }
        }
    }

    public static void Main(string[] args)
    {
        var a = new A();
        new Thread(new ThreadStart(a.Method1)).Start();
        new Thread(new ThreadStart(a.Method2)).Start();
        Console.ReadLine();
    }
}
Answer 2:
I assume that USBD480_DrawFullScreenBGRA32 is what actually writes to the LCD, and the rest of the code just prepares the image. I think your key to better performance is preparing the next image while the previous image is being written.
I think your best solution is to use two threads and use a ConcurrentQueue as a buffer for what needs to be written. One thread prepares the images and puts them into the ConcurrentQueue, and the other thread pulls them off the queue and writes them to the LCD. This way you don't have the overhead of calling Task.Run each time around.
It might also be wise to limit how many frames are written to the queue, so it doesn't get too far ahead and take up unnecessary memory.
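A minimal sketch of that two-thread arrangement with a capped buffer might look like the following. It swaps in BlockingCollection<T>, which wraps a ConcurrentQueue<T> by default and adds the capacity limit suggested above. RenderFrame and the writeToLcd callback are hypothetical stand-ins for the BitmapFrame/CopyPixels work and for DisplayWrapper.USBD480_DrawFullScreenBGRA32, and the frame size is an arbitrary placeholder.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class FrameQueueSketch
{
    // Hypothetical stand-in for the BitmapFrame.Create + CopyPixels work.
    private static byte[] RenderFrame(int frameNumber, int byteCount)
    {
        var pixels = new byte[byteCount];
        pixels[0] = (byte)frameNumber; // tag the frame so it can be identified later
        return pixels;
    }

    // writeToLcd is a hypothetical stand-in for USBD480_DrawFullScreenBGRA32.
    // Returns the number of frames the consumer thread wrote.
    public static int Run(int frameCount, int maxQueuedFrames, Action<byte[]> writeToLcd)
    {
        // The bounded capacity makes Add block once maxQueuedFrames frames are waiting,
        // so the producer cannot run far ahead of the LCD.
        using (var queue = new BlockingCollection<byte[]>(maxQueuedFrames))
        {
            int written = 0;

            var producer = new Thread(() =>
            {
                for (int i = 0; i < frameCount; i++)
                {
                    queue.Add(RenderFrame(i, byteCount: 16)); // blocks while the queue is full
                }
                queue.CompleteAdding(); // lets the consumer's loop end
            });

            var consumer = new Thread(() =>
            {
                foreach (byte[] frame in queue.GetConsumingEnumerable())
                {
                    writeToLcd(frame);
                    written++;
                }
            });

            producer.Start();
            consumer.Start();
            producer.Join();
            consumer.Join();
            return written;
        }
    }
}
```

In this sketch the producer waits when the buffer is full, so no frame is lost. If dropping frames is preferable to waiting, replacing Add with TryAdd would skip a frame instead of blocking.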
Answer 3:
You could consider using the robust, performant, and highly configurable TPL Dataflow library, which will allow you to construct a pipeline of data. You will be posting raw data into the first block of the pipeline, and the data will be transformed while flowing from one block to the next, before finally being rendered at the last block. All blocks will be working in parallel. In the example below there are three blocks, all configured with the default MaxDegreeOfParallelism = 1, so at most 3 threads will be concurrently busy doing work. I have configured the blocks with an intentionally small BoundedCapacity, so that if the incoming raw data is more than what the pipeline can process, the excess input will be dropped.
var block1 = new TransformBlock<Stream, BitmapFrame>(stream =>
{
    BitmapFrame bf = BitmapFrame.Create(stream);
    RenderOptions.SetBitmapScalingMode(bf, BitmapScalingMode.LowQuality);
    return bf;
}, new ExecutionDataflowBlockOptions()
{
    BoundedCapacity = 5
});

var block2 = new TransformBlock<BitmapFrame, int[]>(bf =>
{
    var pixels = new int[width * height * 4];
    bf.CopyPixels(new Int32Rect(0, 0, width, height), pixels, width * 4, 0);
    return pixels;
}, new ExecutionDataflowBlockOptions()
{
    BoundedCapacity = 5
});

var block3 = new ActionBlock<int[]>(pixels =>
{
    DisplayWrapper.USBD480_DrawFullScreenBGRA32(ref disp, pixels);
}, new ExecutionDataflowBlockOptions()
{
    BoundedCapacity = 5
});
The pipeline is created by linking the blocks together:
block1.LinkTo(block2, new DataflowLinkOptions() { PropagateCompletion = true });
block2.LinkTo(block3, new DataflowLinkOptions() { PropagateCompletion = true });
And finally the loop takes the form below:
void Loop()
{
    while (_IsRunning)
    {
        block1.Post(GetRawStreamData());
    }
    block1.Complete();
    block3.Completion.Wait(); // Optional, to wait for the last data to be processed
}
In this example two types of blocks are used: two TransformBlocks and one ActionBlock at the end. ActionBlocks do not produce any output, so they are frequently found at the end of TPL Dataflow pipelines.
An alternative to TPL Dataflow is a recently introduced library named Channels, a small library that is easy to learn. This one includes the interesting option BoundedChannelFullMode, for selecting what items are dropped when the queue is full:
- DropNewest: Removes and ignores the newest item in the channel in order to make room for the item being written.
- DropOldest: Removes and ignores the oldest item in the channel in order to make room for the item being written.
- DropWrite: Drops the item being written.
- Wait: Waits for space to be available in order to complete the write operation.
In contrast, TPL Dataflow has only two options. It can either drop the item being written by using the demonstrated block1.Post(...), or wait for space to be available by using the alternative block1.SendAsync(...).Wait().
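The difference between the two calls can be seen in isolation with a small sketch. It uses a BufferBlock<int> with BoundedCapacity = 1 in place of the pipeline's block1, purely to keep the example short; the class and method names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PostVersusSendAsyncSketch
{
    // Returns whether each of three offers to a one-slot buffer was accepted.
    public static (bool First, bool Second, bool Third) Run()
    {
        var buffer = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 1 });

        bool first = buffer.Post(1);  // buffer is empty, so the item is accepted
        bool second = buffer.Post(2); // buffer is full, so Post returns false and 2 is dropped

        Task<bool> pending = buffer.SendAsync(3); // stays incomplete until there is room
        buffer.Receive();                         // removes item 1 and frees the slot

        // Bounded wait so the sketch cannot hang if something goes wrong.
        bool third = pending.Wait(TimeSpan.FromSeconds(5)) && pending.Result;
        return (first, second, third);
    }
}
```

Post reports the drop through its return value, so a render loop that cares about skipped frames can count the false results.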
Channels are not a complete replacement for TPL Dataflow though, since they deal only with the queuing of the work items, and not with their actual processing.
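For a display, DropOldest is probably the most interesting mode, since stale frames are the ones worth discarding. The sketch below, assuming System.Threading.Channels is available and using plain integers as hypothetical frame numbers, writes five frames into a channel of capacity 2 and then drains it.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;

public static class LatestFrameChannelSketch
{
    // Writes frames 1..5 into a two-slot channel, then returns what is left to read.
    public static List<int> Run()
    {
        var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(2)
        {
            FullMode = BoundedChannelFullMode.DropOldest,
            SingleReader = true,
            SingleWriter = true
        });

        for (int frame = 1; frame <= 5; frame++)
        {
            // In DropOldest mode the write succeeds and the oldest queued frame is evicted.
            channel.Writer.TryWrite(frame);
        }
        channel.Writer.Complete();

        var received = new List<int>();
        while (channel.Reader.TryRead(out int next))
        {
            received.Add(next);
        }
        return received; // only the two newest frames remain: 4 and 5
    }
}
```

In a real pipeline the reader would run on its own thread and do the LCD write itself, which is the processing part that Channels leave to the caller.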
Source: https://stackoverflow.com/questions/58402150/c-sharp-asynchronous-lcd-write