I recently encountered an issue in a custom Linux kernel (2.6.31.5, x86) driver where copy_to_user would periodically not copy any bytes to user space. It would return the coun
I've found the answer. My #2 suggestion was correct, and the mechanism was right in front of my face. The page fault does happen, but the fixup_exception mechanism provides an exception/continue path. This section adds entries to the exception handler table:
    ".section __ex_table,\"a\"\n" \
    " .align 4\n" \
    " .long 4b,5b\n" \
    " .long 0b,3b\n" \
    " .long 1b,6b\n" \
    ".previous" \
This says: if the instruction pointer (IP) equals the first address of an entry when an exception is taken, the fault handler sets the IP to the second address of that entry and execution continues from there.
So if the exception happens at "4:", jump to "5:". If the exception happens at "0:", jump to "3:", and if the exception happens at "1:", jump to "6:".
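To make the table lookup concrete, here is a small user-space sketch of the idea. It is not the kernel's code: the struct, the function name find_fixup, and the label addresses are all made up for illustration, and it uses a linear scan, whereas the real kernel keeps the table sorted and searches it differently.

```c
#include <stddef.h>

/* Toy model of one __ex_table entry: a faulting address and its fixup. */
struct ex_entry {
    unsigned long insn;   /* address of the instruction that may fault */
    unsigned long fixup;  /* address to resume at if it does */
};

/* Made-up addresses standing in for the labels 0:, 1:, 3:, 4:, 5:, 6:. */
enum { L0 = 0x100, L1 = 0x110, L3 = 0x200, L4 = 0x0f0, L5 = 0x210, L6 = 0x220 };

static const struct ex_entry ex_table[] = {
    { L4, L5 },   /* .long 4b,5b */
    { L0, L3 },   /* .long 0b,3b */
    { L1, L6 },   /* .long 1b,6b */
};

/* Return the fixup address for a faulting ip, or 0 if none is registered. */
unsigned long find_fixup(unsigned long ip)
{
    for (size_t i = 0; i < sizeof(ex_table) / sizeof(ex_table[0]); i++) {
        if (ex_table[i].insn == ip)
            return ex_table[i].fixup;
    }
    return 0;
}
```

In this model a fault at L0 resumes at L3, and an address with no entry returns 0, which corresponds to a fault that has no fixup.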
The missing piece is in do_page_fault() in arch/x86/mm/fault.c:
    /*
     * If we're in an interrupt, have no user context or are running
     * in an atomic region then we must not take the fault:
     */
    if (unlikely(in_atomic() || !mm)) {
        bad_area_nosemaphore(regs, error_code, address);
        return;
    }
in_atomic() returned true because we were holding a write_lock_bh() lock, which disables bottom halves and therefore leaves preempt_count non-zero. bad_area_nosemaphore() eventually does the fixup.
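A rough user-space model of that decision may help. Everything here is an assumption for illustration: the model_ names are hypothetical, the offset value is only illustrative, and the real in_atomic() also masks out a flag bit that this sketch ignores.

```c
#include <stdbool.h>

#define MODEL_SOFTIRQ_OFFSET 0x100u   /* illustrative value only */

/* Stand-in for the per-task preempt_count. */
static unsigned int model_preempt_count;

/* Taking a _bh lock disables bottom halves, which bumps the counter. */
void model_write_lock_bh(void)   { model_preempt_count += MODEL_SOFTIRQ_OFFSET; }
void model_write_unlock_bh(void) { model_preempt_count -= MODEL_SOFTIRQ_OFFSET; }

/* Simplified: any non-zero count means "atomic". */
bool model_in_atomic(void) { return model_preempt_count != 0; }

enum fault_result { FAULT_HANDLED, FAULT_FIXUP };

/* Mirrors the check quoted above: in atomic context or without an mm,
 * the fault is not serviced and the exception-table fixup is used. */
enum fault_result model_page_fault(bool have_mm)
{
    if (model_in_atomic() || !have_mm)
        return FAULT_FIXUP;
    return FAULT_HANDLED;
}
```

With the lock held, model_page_fault(true) reports FAULT_FIXUP. After model_write_unlock_bh() the same call reports FAULT_HANDLED.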
If a page fault did occur (which was unlikely, because the user buffer was usually already resident in the process's working set), the copy would fail and jump out of the __copy_user macro through the fixup path, with the number of uncopied bytes set to size, because the fault was taken in atomic context.
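This also explains why the failure was only intermittent. The sketch below models the return value only. The function name and the two boolean parameters are hypothetical, and it simplifies by assuming the fault hits on the very first byte, so nothing is copied.

```c
#include <stdbool.h>
#include <string.h>

/* Like copy_to_user, returns the number of bytes NOT copied.
 * dst_resident: is the destination page already mapped in?
 * atomic:       is the caller in atomic context (e.g. holding a _bh lock)? */
unsigned long model_copy_to_user(void *dst, const void *src, unsigned long n,
                                 bool dst_resident, bool atomic)
{
    if (!dst_resident && atomic)
        return n;        /* fault cannot be serviced: fixup path, all n uncopied */

    /* Either no fault happens, or the fault is serviced and the copy retried. */
    memcpy(dst, src, n);
    return 0;
}
```

In this model the copy succeeds under the lock as long as the destination page is resident, and it returns the full size only when the page is absent and the caller is atomic.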