Confused about thread_position_in_grid

Question

I'm working on a compute shader in Metal on macOS, starting with some very basic things to learn how they work, and I'm seeing output I don't understand. As a first step I tried to generate a simple 2D gradient: the red channel should increase from 0 to 1 along the width, and the green channel should increase from 0 to 1 along the height. So I wrote this kernel:

kernel void myKernel(texture2d<half, access::write> outTexture [[ texture(MBKT_OutputTexture) ]],
                     uint2  gid  [[thread_position_in_grid]])
{
    half4  color = half4((float)gid.x / 480.0, (float)gid.y / 360.0, 0.0, 1.0);

    outTexture.write(color, gid);
}

And what I get is an increase from 0 to 0.5 at the halfway point, and a solid 0.5 for the rest of the image, like this:

If I invert the 2 values so the kernel calculates this:

half4  color = half4(1.0 - (float)gid.x / 480.0, 1.0 - (float)gid.y / 360.0, 0.0, 1.0);

the results are even stranger. I would expect it to be 1.0 on the left and bottom and go down to 0.5 in the middle, but instead, I get this:

What is going on here? In the first case, it's as if everything past the midpoint has a value of 0.5. In the second case, it's as if the left/bottom edge is 0.5 and the middle is 1.0, and then the value flips back to 0.0 one pixel later.
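To make the expectation concrete, here is a minimal sketch in plain C (not Metal) of the value each of the two expressions above should produce for a given pixel coordinate. The function names are hypothetical and exist only for illustration; the extents 480 and 360 are the ones hard-coded in the kernel.

```c
/* Value the first kernel is expected to write for coordinate `pos`
   along an axis of length `extent` (480 or 360 in the question). */
float expected_forward(unsigned pos, float extent) {
    return (float)pos / extent;
}

/* Value the inverted variant of the kernel is expected to write. */
float expected_inverted(unsigned pos, float extent) {
    return 1.0f - (float)pos / extent;
}
```

Under these formulas the left edge is 0.0 (1.0 when inverted), the midpoint at x = 240 is 0.5 in both variants, and the last column at x = 479 is just under 1.0 (just above 0.0 when inverted).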

Oddly, if I use the thread_position_in_grid to pull values out of buffers, it works correctly. For example, I can compute a Mandelbrot set and the results are correct. But I'm confused by what happens with the simple kernel above. Can anyone explain this to me?

Here's my compute kernel setup code in my MTKViewDelegate. It is based on the "Hello Compute" sample code from Apple:

    _metalView = metalView;
    _device = metalView.device;
    _commandQueue = [_device newCommandQueue];

    _metalView.colorPixelFormat = MTLPixelFormatBGRA8Unorm_sRGB;

    // Load all the shader files with a .metal file extension in the project
    id<MTLLibrary> defaultLibrary = [_device newDefaultLibrary];

    // Load the kernel function from the library
    id<MTLFunction> kernelFunction = [defaultLibrary newFunctionWithName:@"myKernel"];

    // Create a compute pipeline state
    NSError*    error   = nil;
    _computePipelineState = [_device newComputePipelineStateWithFunction:kernelFunction
                                                                   error:&error];

    if(!_computePipelineState)
    {
        NSLog(@"Failed to create compute pipeline state, error %@", error);
        return nil;
    }

And here's the code where I create the output texture and the thread groups:

MTLTextureDescriptor*   outputTextureDescriptor = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm_sRGB
                                                                                                     width:_viewportSize.x
                                                                                                    height:_viewportSize.y
                                                                                                 mipmapped:NO];
_outputTexture = [_device newTextureWithDescriptor:outputTextureDescriptor];

// Set the compute kernel's threadgroup size of 16x16
_threadgroupSize = MTLSizeMake(16, 16, 1);

// Calculate the number of rows and columns of threadgroups given the width of the input image
// Ensure that you cover the entire image (or more) so you process every pixel
_threadgroupCount.width  = (_viewportSize.x + _threadgroupSize.width - 1) / _threadgroupSize.width;
_threadgroupCount.height = (_viewportSize.y + _threadgroupSize.height - 1) / _threadgroupSize.height;

// Since we're only dealing with a 2D data set, set depth to 1
_threadgroupCount.depth = 1;

In my tests, the _viewportSize is 480 x 360.
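The threadgroup arithmetic above can be checked with a small sketch in plain C. It assumes the work is dispatched as whole threadgroups (as the "Hello Compute" sample does with dispatchThreadgroups:threadsPerThreadgroup:), which the code shown here does not confirm; the function names are hypothetical.

```c
#include <stddef.h>

/* Ceiling division: the same rounding the setup code applies per axis. */
size_t threadgroups_for(size_t extent, size_t group_size) {
    return (extent + group_size - 1) / group_size;
}

/* Threads launched along one axis, ASSUMING whole threadgroups are
   dispatched, so the grid is rounded up to a multiple of group_size. */
size_t threads_launched(size_t extent, size_t group_size) {
    return threadgroups_for(extent, group_size) * group_size;
}
```

For a 480 x 360 viewport and 16 x 16 threadgroups this gives 30 x 23 threadgroups, i.e. a grid of 480 x 368 threads. Under that assumption gid.y can run up to 367, eight rows past the texture height, so a kernel would normally need to guard against out-of-range positions before writing.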

I've done an additional test suggested by @Egor_Shkorov in the comments. Instead of hard-coding 480 and 360, I used the threads_per_grid attribute:

kernel void myKernel(texture2d<half, access::write> outTexture [[ texture(MBKT_OutputTexture) ]],
                     uint2  gid  [[thread_position_in_grid]],
                     uint2  tpg  [[threads_per_grid]])
{
    half4  color = half4((float)gid.x / tpg.x, (float)gid.y / tpg.y, 0.0, 1.0);

    outTexture.write(color, gid);
}

That improves things: the gradient now stretches all the way across in each direction. But it still only goes from 0 to 0.5, instead of from 0 to 1, in each direction:

Source: https://stackoverflow.com/questions/55821459/confused-about-thread-position-in-grid

Tags

metal

compute-shader