Question
I'm working on a compute shader in Metal on macOS. I'm trying to do some very basic things to learn how they work. I'm seeing some output I don't understand. I thought I would start by trying to generate a simple 2D gradient. The red channel would increase from 0 to 1 along the width and the green channel would increase from 0 to 1 along the height. So I wrote this kernel:
kernel void myKernel(texture2d<half, access::write> outTexture [[ texture(MBKT_OutputTexture) ]],
uint2 gid [[thread_position_in_grid]])
{
half4 color = half4((float)gid.x / 480.0, (float)gid.y / 360.0, 0.0, 1.0);
outTexture.write(color, gid);
}
And what I get is an increase from 0 to 0.5 at the halfway point, and a solid 0.5 for the rest of the image. [Image not included.]
If I invert the 2 values so the kernel calculates this:
half4 color = half4(1.0 - (float)gid.x / 480.0, 1.0 - (float)gid.y / 360.0, 0.0, 1.0);
the results are even stranger. I would expect it to be 1.0 on the left and bottom and go down to 0.5 in the middle, but instead I get something quite different. [Image not included.]
What is going on here? In the first case, it's like everything past the mid point has a value of 0.5. In the second case it's like the left/bottom edge is 0.5 and the middle is 1.0, then flips back to 0.0 one pixel later.
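For reference, here is a minimal CPU-side sketch in C of the values the two kernels above are expected to produce. It is not part of the Metal project: the `GradientRG` type and the `expected_gradient` / `expected_inverted` helpers are hypothetical names introduced only for illustration, and the sketch uses `float` throughout, so it does not model `half` precision or any pixel-format conversion.

```c
#include <stdio.h>

/* Hypothetical CPU reference; only the red and green channels are modeled. */
typedef struct { float r, g; } GradientRG;

/* Mirrors the first kernel: red grows with x, green grows with y. */
GradientRG expected_gradient(unsigned x, unsigned y, float width, float height)
{
    GradientRG c = { (float)x / width, (float)y / height };
    return c;
}

/* Mirrors the inverted kernel: both channels fall from 1.0 toward 0.0. */
GradientRG expected_inverted(unsigned x, unsigned y, float width, float height)
{
    GradientRG c = { 1.0f - (float)x / width, 1.0f - (float)y / height };
    return c;
}

/* Prints the expected red/green pair for one pixel of a 480 x 360 image. */
void print_expected(unsigned x, unsigned y)
{
    GradientRG c = expected_gradient(x, y, 480.0f, 360.0f);
    printf("(%u, %u) -> r = %f, g = %f\n", x, y, c.r, c.g);
}
```

With a 480 x 360 image, `expected_gradient(240, 180, 480.0f, 360.0f)` gives 0.5 in both channels, and the last pixel (479, 359) comes out just below 1.0. That is the ramp I expected to see across the whole image, not only up to the midpoint.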
Oddly, if I use the thread_position_in_grid to pull values out of buffers, it works correctly. For example, I can compute a Mandelbrot set and the results are correct. But I'm confused by what happens with the simple kernel above. Can anyone explain this to me?
Here's my compute kernel setup code in MTKViewDelegate. This is based on the "Hello Compute" sample code from Apple:
_metalView = metalView;
_device = metalView.device;
_commandQueue = [_device newCommandQueue];
_metalView.colorPixelFormat = MTLPixelFormatBGRA8Unorm_sRGB;
// Load all the shader files with a .metal file extension in the project
id<MTLLibrary> defaultLibrary = [_device newDefaultLibrary];
// Load the kernel function from the library
id<MTLFunction> kernelFunction = [defaultLibrary newFunctionWithName:@"myKernel"];
// Create a compute pipeline state
NSError* error = nil;
_computePipelineState = [_device newComputePipelineStateWithFunction:kernelFunction
error:&error];
if(!_computePipelineState)
{
NSLog(@"Failed to create compute pipeline state, error %@", error);
return nil;
}
And here's the code where I create the output texture and the thread groups:
MTLTextureDescriptor* outputTextureDescriptor = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm_sRGB
width:_viewportSize.x
height:_viewportSize.y
mipmapped:NO];
_outputTexture = [_device newTextureWithDescriptor:outputTextureDescriptor];
// Set the compute kernel's threadgroup size of 16x16
_threadgroupSize = MTLSizeMake(16, 16, 1);
// Calculate the number of rows and columns of threadgroups given the width of the input image
// Ensure that you cover the entire image (or more) so you process every pixel
_threadgroupCount.width = (_viewportSize.x + _threadgroupSize.width - 1) / _threadgroupSize.width;
_threadgroupCount.height = (_viewportSize.y + _threadgroupSize.height - 1) / _threadgroupSize.height;
// Since we're only dealing with a 2D data set, set depth to 1
_threadgroupCount.depth = 1;
In my tests, the _viewportSize is 480 x 360.
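To make the round-up arithmetic concrete, here is a small sketch in plain C of the threadgroup count calculation for these sizes. The helper names `threadgroups_needed` and `threads_launched` are hypothetical, and the second one assumes the grid is dispatched as whole threadgroups; the dispatch call itself is not shown in this question, so treat that part as an assumption.

```c
#include <stddef.h>

/* Round-up integer division, the same formula used for _threadgroupCount. */
size_t threadgroups_needed(size_t pixels, size_t threads_per_group)
{
    return (pixels + threads_per_group - 1) / threads_per_group;
}

/* Threads launched along one axis, ASSUMING whole threadgroups are dispatched. */
size_t threads_launched(size_t pixels, size_t threads_per_group)
{
    return threadgroups_needed(pixels, threads_per_group) * threads_per_group;
}
```

For a width of 480 this gives 30 threadgroups and exactly 480 threads. For a height of 360 it gives 23 threadgroups, which under the whole-threadgroup assumption means 368 threads, so the last 8 rows of threads would have a gid.y outside the texture. The kernel as written has no bounds check for those threads.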
I've done an additional test suggested by @Egor_Shkorov in the comments. Instead of hard-coding 480 and 360, I used the threads_per_grid variable:
kernel void myKernel(
texture2d<half, access::write> outTexture [[ texture(MBKT_OutputTexture) ]],
uint2 gid [[thread_position_in_grid]],
uint2 tpg [[threads_per_grid]])
{
half4 color = half4((float)gid.x / tpg.x, (float)gid.y / tpg.y, 0.0, 1.0);
outTexture.write(color, gid);
}
That improves things, making the gradient stretch all the way in each direction, but it still only goes from 0 to 0.5 instead of to 1 in each direction. [Image not included.]
Source: https://stackoverflow.com/questions/55821459/confused-about-thread-position-in-grid