Mike Ash has a wonderful article about building a lightweight notification system in Swift: https://www.mikeash.com/pyblog/friday-qa-2015-01-23-lets-build-swift-notificat
Well, the Grand Central Dispatch (GCD) documentation is fairly vague about the inner workings and exact costs of dispatch queues, but it does state that:
GCD provides and manages FIFO queues to which your application can submit tasks in the form of block objects. Blocks submitted to dispatch queues are executed on a pool of threads fully managed by the system.
So it sounds like queues are little more than an interface for submitting blocks to a thread pool, and therefore have minimal (if any) impact on performance when idle.
The conceptual documentation also states that:
You can create as many serial queues as you need
That certainly suggests that the cost of creating a serial dispatch queue and leaving it idle is close to trivial.
To check, I created 10,000 serial and concurrent dispatch queues in an app with some OpenGL content and saw no performance impact: the FPS stayed the same, and the queues used only about 4 MB of extra RAM (~400 bytes per queue).
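Roughly, the experiment amounts to creating a large number of queues and keeping references to them without ever submitting work. The sketch below assumes 10,000 of each kind, and the labels and the `makeIdleQueues` helper are hypothetical; it does not measure anything itself, so FPS and memory would still have to be checked with Instruments or a similar tool.

```swift
import Dispatch

// Hypothetical helper: creates `count` dispatch queues and returns them so they
// stay alive. Nothing is ever submitted to them; they simply sit idle.
func makeIdleQueues(count: Int, concurrent: Bool) -> [DispatchQueue] {
    var queues: [DispatchQueue] = []
    queues.reserveCapacity(count)
    for i in 0..<count {
        // DispatchQueue(label:) is serial by default; .concurrent opts in.
        let queue = concurrent
            ? DispatchQueue(label: "com.example.idle.concurrent.\(i)", attributes: .concurrent)
            : DispatchQueue(label: "com.example.idle.serial.\(i)")
        queues.append(queue)
    }
    return queues
}

// Assumed setup: 10,000 serial plus 10,000 concurrent queues, all left idle.
let serialQueues = makeIdleQueues(count: 10_000, concurrent: false)
let concurrentQueues = makeIdleQueues(count: 10_000, concurrent: true)
print(serialQueues.count + concurrentQueues.count)
```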
As for using an OSSpinLock instead of dispatch queues, Apple is very clear in its "Migrating Away from Threads" documentation that GCD is more efficient than standard locks (at least in contended cases):
Replacing your lock-based code with queues eliminates many of the penalties associated with locks and also simplifies your remaining code. Instead of using a lock to protect a shared resource, you can instead create a queue to serialize the tasks that access that resource. Queues do not impose the same penalties as locks. For example, queueing a task does not require trapping into the kernel to acquire a mutex.
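Applied to a notification system, the quoted pattern might look like the sketch below: a private serial queue owns the observer table, and every access goes through it instead of through a lock. The `ObserverRegistry` type and its methods are hypothetical, not taken from Mike Ash's article; writes use `async` so callers don't block, and reads use `sync` so they see every previously queued write.

```swift
import Dispatch

// Hypothetical observer registry. A private serial queue serializes every
// access to `observers`, taking the place of a lock.
final class ObserverRegistry {
    private var observers: [String: [() -> Void]] = [:]
    private let queue = DispatchQueue(label: "com.example.ObserverRegistry")

    // Writes are queued asynchronously; the caller never waits for them.
    func add(name: String, handler: @escaping () -> Void) {
        queue.async {
            self.observers[name, default: []].append(handler)
        }
    }

    // Reads use sync, so they run after all previously queued writes (FIFO).
    func count(name: String) -> Int {
        return queue.sync { observers[name]?.count ?? 0 }
    }

    func post(name: String) {
        // Snapshot the handlers on the queue, then call them off the queue, so a
        // handler that calls back into the registry cannot deadlock.
        let handlers = queue.sync { observers[name] ?? [] }
        for handler in handlers {
            handler()
        }
    }
}
```

Because the queue is serial, calling `add(name:handler:)` from many threads at once is safe without any explicit locking.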
It's also worth noting that, if memory is a concern, you can always release a queue when you're not using it and re-create it later when you need it again.
Dispatch queues are the way to go. You don't need to worry much about creating lots of queues that sit idle, and under contention they're more efficient than locks.
Edit: You found that a spinlock is actually faster in uncontended situations, so you'll probably want to use that here!
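For comparison, a lock-based version of the same hypothetical registry is sketched below. Note that OSSpinLock has since been deprecated (iOS 10 / macOS 10.12) in favor of os_unfair_lock. This sketch uses Foundation's `NSLock` instead, which is a mutex rather than a spinlock, so that it also builds outside Apple platforms; treat it as an illustration of the lock-based shape, not as a benchmark of spinlock speed.

```swift
import Dispatch
import Foundation

// Hypothetical lock-based counterpart of a queue-protected observer registry.
// NSLock stands in for OSSpinLock / os_unfair_lock here.
final class LockedObserverRegistry {
    private var observers: [String: [() -> Void]] = [:]
    private let lock = NSLock()

    func add(name: String, handler: @escaping () -> Void) {
        lock.lock()
        defer { lock.unlock() }
        observers[name, default: []].append(handler)
    }

    func count(name: String) -> Int {
        lock.lock()
        defer { lock.unlock() }
        return observers[name]?.count ?? 0
    }

    func post(name: String) {
        // Copy the handlers while holding the lock, then release it before
        // calling them, so a handler that re-enters the registry cannot deadlock.
        lock.lock()
        let handlers = observers[name] ?? []
        lock.unlock()
        for handler in handlers {
            handler()
        }
    }
}
```

Unlike the queue-based version, `add(name:handler:)` here blocks the caller until the lock is free, which is cheap when there is no contention and is where the two approaches differ most.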