问题
I am writing an iOS application using Apple's new Metal framework. I have an array of Matrix4 objects (see Ray Wenderlich's tutorial) that I need to pass in to a shader via the MTLDevice.newBufferWithLength() method. The Matrix4 object is leveraging Apple's GLKit (it contains a GLKMatrix4 object).
I'm leveraging instancing with the GPU calls.
I will later change this to a struct which includes more data per instance (beyond just the Matrix4 object.
How can I efficiently copy the array of [Matrix4] objects into this buffer?
Is there a better way to do this? Again, I'll expand this to use a struct with more data in the future.
Below is a subset of my code:
let sizeofMatrix4 = sizeof(Float) * Matrix4.numberofElements()
// This returns an array of [Matrix4] objects.
let boxArray = createBoxArray(parentModelViewMatrix)
let sizeOfUniformBuffer = boxArray.count * sizeOfMatrix4
var uniformBuffer = device.newBufferWithLength(sizeofUniformBuffer, options: .CPUCacheModeDefaultCache)
let bufferPointer = uniformBuffer?.contents()
// Ouch - way too slow. How can I optimize?
for i in 0..<boxArray.count
{
memcpy(bufferPointer! + (i * sizeOfMatrix4), boxArray[i].raw(), sizeOfMatrix4)
}
renderEncoder.setVertexBuffer(uniformBuffer, offset: 0, atIndex: 2)
Note: The boxArray[i].raw() method is defined as this in the Objective-C code:
- (void *)raw {
return glkMatrix.m;
}
You can see I'm looping through each array object and then doing a memcpy. I did this since I was experiencing problems treating the array as a contiguous set of memory.
Thanks!
回答1:
A Swift Array is promised to be contiguous memory, but you need to make sure it's really a Swift Array and not secretly an NSArray. If you want to be completely certain, use a ContiguousArray. That will ensure contiguous memory even if the objects in it are bridgeable to ObjC. If you want even more control over the memory, look at ManagedBuffer
.
With that, you should be using newBufferWithBytesNoCopy(length:options:deallocator)
to create a MTL buffer around your existing memory.
回答2:
I've done this with an array of particles that I pass to a compute shader.
In a nutshell, I define some constants and declare a handful of mutable pointers and a mutable buffer pointer:
let particleCount: Int = 1048576
var particlesMemory:UnsafeMutablePointer<Void> = nil
let alignment:UInt = 0x4000
let particlesMemoryByteSize:UInt = UInt(1048576) * UInt(sizeof(Particle))
var particlesVoidPtr: COpaquePointer!
var particlesParticlePtr: UnsafeMutablePointer<Particle>!
var particlesParticleBufferPtr: UnsafeMutableBufferPointer<Particle>!
When I set up the particles, I populate the pointers and use posix_memalign() to allocate the memory:
posix_memalign(&particlesMemory, alignment, particlesMemoryByteSize)
particlesVoidPtr = COpaquePointer(particlesMemory)
particlesParticlePtr = UnsafeMutablePointer<Particle>(particlesVoidPtr)
particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
The loop to populate the particles is slightly different - I now loop over the buffer pointer:
for index in particlesParticleBufferPtr.startIndex ..< particlesParticleBufferPtr.endIndex
{
[...]
let particle = Particle(positionX: positionX, positionY: positionY, velocityX: velocityX, velocityY: velocityY)
particlesParticleBufferPtr[index] = particle
}
Inside the applyShader() function, I create a copy of the memory which is used as both the input and output buffer:
let particlesBufferNoCopy = device.newBufferWithBytesNoCopy(particlesMemory, length: Int(particlesMemoryByteSize),
options: nil, deallocator: nil)
commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 0)
commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 1)
...and after the shader has run, I put the shared memory (particlesMemory) back into the buffer pointer:
particlesVoidPtr = COpaquePointer(particlesMemory)
particlesParticlePtr = UnsafeMutablePointer(particlesVoidPtr)
particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
There's an up to date Swift 2.0 version of this at my GitHub repo here
回答3:
Obviously the point of using shared memory and MTLDevice.makeBuffer(bytesNoCopy:...)
is to avoid redundant memory copies. Therefore, ideally we look for a design that allows us to easily manipulate the data after it's already been loaded into the MTLBuffer
object.
After researching this for a while, I've decided to try and create a semi-generic solution to allow for simplified allocation of page-aligned memory, loading your content into that memory, and subsequently manipulating your items in that shared memory block.
I've created a Swift array implementation called PageAlignedArray that matches the interface and functionality of the built-in Swift array, but always resides on page-aligned memory, and so can be very easily made into an MTLBuffer
. I've also added a convenience method to directly convert PageAlignedArray
into a Metal buffer.
Of course, you can continue to mutate your array afterwards and your updates will be automatically available to the GPU courtesy of the shared-memory architecture. However, keep in mind that you must regenerate your MTLBuffer
object whenever the array's length changes.
Here's a quick code sample:
var alignedArray : PageAlignedContiguousArray<matrix_double4x4> = [matrixTest, matrixTest]
alignedArray.append(item)
alignedArray.removeFirst() // Behaves just like a built-in array, with all convenience methods
// When it's time to generate a Metal buffer:
let testMetalBuffer = device?.makeBufferWithPageAlignedArray(alignedArray)
The sample uses matrix_double4x4
, but the array should work for any Swift value types. Please note that if you use a reference type (such as any kind of class
), the array will contain pointers to your elements and so won't be usable from your GPU code.
来源:https://stackoverflow.com/questions/32317572/efficiently-copying-swift-array-to-memory-buffer-for-ios-metal