I am developing a system with a DSP and an ARM. On the ARM there is a Linux OS. I have a DSP sending data to the ARM (Linux). In Linux there is a kernel module which read t
Polling has no advantage over waiting. The process still has to be scheduled and switched to, and on top of that it spends part of its time on polls that find nothing.
Linux runs the scheduler when returning from an interrupt, so if the in-kernel interrupt handler wakes up the waiting task and that task has a high priority (you should give it real-time priority, obviously), the task will be scheduled immediately. You won't beat that with polling.
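For the real-time priority part, here is a minimal user-space sketch, assuming the reader process uses the `SCHED_FIFO` policy. The helper names `clamp_fifo_priority` and `set_realtime_priority` are made up for illustration; the underlying calls are the standard `sched_setscheduler`, `sched_get_priority_min` and `sched_get_priority_max`. Switching to a real-time policy normally needs root, `CAP_SYS_NICE` or a suitable `RLIMIT_RTPRIO`, so expect it to fail with `EPERM` otherwise.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Clamp a requested priority into the valid SCHED_FIFO range
 * reported by the running kernel. (Hypothetical helper.) */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/* Put the calling thread under SCHED_FIFO at the given priority.
 * Returns 0 on success, -1 on failure with errno set
 * (typically EPERM when the process lacks the privilege).
 * (Hypothetical helper.) */
int set_realtime_priority(int prio)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = clamp_fifo_priority(prio);

    /* pid 0 means "the caller". */
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

You would call something like `set_realtime_priority(50)` once at the start of the thread that waits for the DSP data, before it enters its read loop. The value 50 is only an example; pick it relative to the other real-time tasks on your system.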
The standard (character) device file interface is reasonably fast, so just implement blocking read, poll (which is a blocking system call and does not actually poll anything) and possibly asynchronous read (which uses a real-time signal). I suspect, though, that a dedicated thread waiting in the read system call will perform better than AIO, and it's easier to write too. You should find enough examples in the kernel sources.
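On the user-space side, the blocking approach could look roughly like the sketch below. It only assumes that your driver exposes a file descriptor supporting `read` and `poll`; the function names `wait_readable`, `read_blocking` and `consume_until_eof` are hypothetical, and nothing here is specific to your module.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Sleep in poll() until fd has data. timeout_ms < 0 waits forever.
 * Returns 1 if readable, 0 on timeout, -1 on error or if the
 * descriptor reported something other than readable data. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if a signal interrupted us */

    if (rc <= 0)
        return rc;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Blocking read that retries when interrupted by a signal.
 * Returns the byte count, 0 on end of file, -1 on error. */
ssize_t read_blocking(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Body of a dedicated reader: the task sleeps inside read() and is
 * woken only when the driver has data. handle may be NULL.
 * Returns the total number of bytes consumed, or -1 on error. */
long consume_until_eof(int fd, void (*handle)(const char *, size_t))
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read_blocking(fd, buf, sizeof(buf))) > 0) {
        if (handle)
            handle(buf, (size_t)n);
        total += n;
    }
    return (n < 0) ? -1 : total;
}
```

In the real program you would `open` your device node (say a hypothetical `/dev/dsp_data`) and run `consume_until_eof` in its own thread. There is no busy loop anywhere: the thread uses no CPU until the driver wakes it.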