How does schedule()+switch_to() functions from linux kernel actually work?

人盡茶涼 提交于 2019-11-28 15:54:56
  1. After calling switch_to(), the kernel stack is switched to that of the task named in next. Changing the address space, etc, is handled in eg context_switch().
  2. schedule() cannot be called in atomic context, including from an interrupt (see the check in schedule_debug()). If a reschedule is needed, the TIF_NEED_RESCHED task flag is set, which is checked in the interrupt return path.
  3. See 2.
  4. I believe that, with the default 8K stacks, Interrupts are handled with whatever kernel stack is currently executing. If 4K stacks are used, I believe there's a separate interrupt stack (automatically loaded thanks to some x86 magic), but I'm not completely certain on that point.

To be a bit more detailed, here's a practical example:

  1. An interrupt occurs. The CPU switches to an interrupt trampoline routine, which pushes the interrupt number onto the stack, then jmps to common_interrupt
  2. common_interrupt calls do_IRQ, which disables preemption then handles the IRQ
  3. At some point, a decision is made to switch tasks. This may be from the timer interrupt, or from a wakeup call. In either case, set_task_need_resched is invoked, setting the TIF_NEED_RESCHED task flag.
  4. Eventually, the CPU returns from do_IRQ in the original interrupt, and proceeds to the IRQ exit path. If this IRQ was invoked from within the kernel, it checks whether TIF_NEED_RESCHED is set, and if so calls preempt_schedule_irq, which briefly enables interrupts while performing a schedule().
  5. If the IRQ was invoked from userspace, we first check whether there's anything that needs doing prior to returning. If so, we go to retint_careful, which checks both for a pending reschedule (and directly invokes schedule() if needed) as well as checking for pending signals, then goes back for another round at retint_check until there's no more important flags set.
  6. Finally, we restore GS and return from the interrupt handler.

As for switch_to(); what switch_to() (on x86-32) does is:

  1. Save the current values of EIP (instruction pointer) and ESP (stack pointer) for when we return to this task at some point later.
  2. Switch the value of current_task. At this point, current now points to the new task.
  3. Switch to the new stack, then push the EIP saved by the task we're switching to onto the stack. Later, a return will be performed, using this EIP as the return address; this is how it jumps back to the old code that previously called switch_to()
  4. Call __switch_to(). At this point, current points to the new task, and we're on the new task's stack, but various other CPU state hasn't been updated. __switch_to() handles switching the state of things like the FPU, segment descriptors, debug registers, etc.
  5. Upon return from __switch_to(), the return address that switch_to() manually pushed onto the stack is returned to, placing execution back where it was prior to the switch_to() in the new task. Execution has now fully resumed on the switched-to task.

x86-64 is very similar, but has to do slightly more saving/restoration of state due to the different ABI.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!