As a follow-up question to this answer: I am trying to replace a for-loop running on the CPU with a Metal kernel function, to parallelize the computation and improve performance.
The easiest way to allocate page-aligned memory is with posix_memalign. Here's a complete example of creating a buffer with page-aligned memory:
// `device` is assumed to be a valid id<MTLDevice>, e.g. from MTLCreateSystemDefaultDevice().
void *data = NULL;
NSUInteger pageSize = getpagesize();
NSUInteger allocationSize = /* required byte count, rounded up to next multiple of page size */ pageSize * 10;
// posix_memalign returns 0 on success; the returned memory is uninitialized.
int result = posix_memalign(&data, pageSize, allocationSize);
if (result == 0 && data) {
    id<MTLBuffer> buffer = [device newBufferWithBytesNoCopy:data
                                                     length:allocationSize
                                                    options:MTLResourceStorageModeShared
                                                deallocator:^(void *pointer, NSUInteger length) {
        // Metal invokes this block when the buffer is deallocated.
        free(pointer);
    }];
    NSLog(@"Created buffer of length %lu", (unsigned long)buffer.length);
}
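The example hard-codes pageSize * 10 where the comment asks for the required byte count rounded up to a multiple of the page size. A minimal sketch of that rounding step follows; it is plain C, so it can be used from Objective-C unchanged. The function name round_up_to_page_size is hypothetical, and the sketch assumes byteCount is small enough that adding one page to it does not overflow.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: rounds byteCount up to the next multiple of the
 * system page size. A byteCount that is already a multiple is returned
 * unchanged, and 0 stays 0 (callers should reject a zero-length request).
 * Assumes byteCount + pageSize - 1 does not overflow size_t. */
static size_t round_up_to_page_size(size_t byteCount) {
    size_t pageSize = (size_t)getpagesize();
    return ((byteCount + pageSize - 1) / pageSize) * pageSize;
}
```

The result could then be passed as both the size argument of posix_memalign and the length of the no-copy buffer, since the two values need to agree.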
Since you can't ensure that your data will arrive in a page-aligned pointer, you'll probably be better off just allocating an MTLBuffer of whatever size can accommodate your data, without using the no-copy variant. If you need to do real-time processing of the data, you should create a pool of buffers and cycle among them instead of waiting for each command buffer to complete. The Shared storage mode is correct for these use cases. The caveat related to malloc only applies to the no-copy case, since in every other case, Metal allocates the memory for you.
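One way the buffer pool could look is sketched below. This is not a definitive implementation: the pool size of three, the payload size, the frame count, and the `incoming` array standing in for data arriving from elsewhere are all assumptions made for illustration, and the compute pass itself is left as a comment. It assumes ARC and linking against Foundation and Metal. A semaphore counts free buffers, so the CPU blocks only when every buffer is still in use by the GPU.

```objc
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Hypothetical pool size; three is a common choice, not a requirement.
static const NSUInteger kBufferCount = 3;

int main(void) {
    @autoreleasepool {
        id<MTLDevice> device = MTLCreateSystemDefaultDevice();
        if (!device) { return 1; }
        id<MTLCommandQueue> queue = [device newCommandQueue];

        // Hypothetical payload: 4096 floats per batch.
        const NSUInteger byteCount = 4096 * sizeof(float);

        // Metal allocates the storage here, so no page alignment is needed.
        NSMutableArray<id<MTLBuffer>> *pool = [NSMutableArray array];
        for (NSUInteger i = 0; i < kBufferCount; i++) {
            id<MTLBuffer> buffer = [device newBufferWithLength:byteCount
                                                       options:MTLResourceStorageModeShared];
            if (!buffer) { return 1; }
            [pool addObject:buffer];
        }

        // Counts buffers that the GPU is not currently using.
        dispatch_semaphore_t available = dispatch_semaphore_create((long)kBufferCount);

        // Stand-in for data arriving from elsewhere (assumption).
        float *incoming = calloc(4096, sizeof(float));
        if (!incoming) { return 1; }

        for (NSUInteger frame = 0; frame < 10; frame++) {
            // Blocks only when all buffers are still in flight.
            dispatch_semaphore_wait(available, DISPATCH_TIME_FOREVER);

            id<MTLBuffer> buffer = pool[frame % kBufferCount];
            memcpy(buffer.contents, incoming, byteCount);

            id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
            // Encode the compute pass that reads `buffer` here.
            [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> completed) {
                // The GPU is done with this buffer; hand it back to the pool.
                dispatch_semaphore_signal(available);
            }];
            [commandBuffer commit];
        }

        // Wait for outstanding work, then restore the semaphore count:
        // libdispatch traps if a semaphore is released below its initial value.
        for (NSUInteger i = 0; i < kBufferCount; i++) {
            dispatch_semaphore_wait(available, DISPATCH_TIME_FOREVER);
        }
        for (NSUInteger i = 0; i < kBufferCount; i++) {
            dispatch_semaphore_signal(available);
        }
        free(incoming);
    }
    return 0;
}
```

Because command buffers committed to one queue run in order, cycling through the pool by index pairs naturally with the semaphore: by the time an index comes around again, the wait has already confirmed that a buffer was released.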