I've heard that the TLB is maintained by the MMU, not by the CPU cache.
Does a single TLB exist on the CPU, shared between all processors (cores), or does each processor have its own TLB?
Could anyone please explain the relationship between the MMU and the L1 and L2 caches?
The TLB caches the translations listed in the page table. Each CPU core can be running in a different context, with different page tables. The translation hardware (the TLB plus the page walker that fills it) is what you'd call the MMU, if it were a separate "unit", so each core has its own MMU. Any shared caches are always physically indexed and physically tagged, so they cache based on the post-MMU physical address.
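To make that split concrete, here is a toy Python model. It is not how any real CPU is built. Each `Core` owns a private TLB, a dict from virtual page number to physical frame number, filled only from that core's own page table. A `SharedCache` stands in for the shared last-level cache and only ever sees physical addresses. The following are all assumptions for illustration: the 4 KiB page size, the 64-byte line size, the flat dict page table (x86-64 really uses a multi-level radix tree), and the class names.

```python
PAGE_SHIFT = 12                  # assumption: 4 KiB pages
LINE_SHIFT = 6                   # assumption: 64-byte cache lines
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class Core:
    """Toy core with a private TLB; hypothetical, not a real hardware API."""

    def __init__(self, page_table):
        # Flat dict of virtual page number -> physical frame number.
        # Real x86-64 page tables are a multi-level radix tree.
        self.page_table = page_table
        self.tlb = {}            # private to this core, never shared
        self.tlb_misses = 0

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        if vpn not in self.tlb:
            # TLB miss: "walk" this core's own page table and cache the result.
            self.tlb_misses += 1
            self.tlb[vpn] = self.page_table[vpn]
        return (self.tlb[vpn] << PAGE_SHIFT) | offset


class SharedCache:
    """Toy shared cache, looked up only by physical line address."""

    def __init__(self):
        self.lines = set()

    def access(self, paddr):
        line = paddr >> LINE_SHIFT
        hit = line in self.lines
        self.lines.add(line)     # fill on miss (no eviction in this toy)
        return hit


# Two cores running different processes: same virtual page, different frames.
core0 = Core({0x10: 0x200})
core1 = Core({0x10: 0x300})
llc = SharedCache()

vaddr = 0x10123
print(hex(core0.translate(vaddr)), hex(core1.translate(vaddr)))  # 0x200123 0x300123
print(llc.access(core0.translate(vaddr)))  # False: first touch of that physical line
print(llc.access(core1.translate(vaddr)))  # False: a different physical line
print(llc.access(core0.translate(vaddr)))  # True: same physical line as before
```

The shared cache never needs to know which core or process asked. It only sees the physical address each core's private TLB produced.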
The TLB is a cache, so technically it's just an implementation detail that could vary by microarchitecture (between different implementations of the x86 architecture).
In practice, all that really varies is the size. Two-level TLBs are now common: they keep full TLB misses to a minimum while the first level is still fast enough to allow 3 translations per clock cycle. The TLB's main goal is to be fast, not necessarily big, so a TLB shared between cores wouldn't be useful, especially given the overhead of making sure that all the cores using it were running threads that share the same page tables.
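A toy sketch of a two-level lookup follows. The small first level is checked first, then the larger second level, and only a miss in both triggers a page walk. The sizes (4 and 16 entries), the fully associative LRU replacement, and the class names are made up for illustration. Real TLBs are set-associative, much larger, and implemented in hardware.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU map of vpn -> pfn (toy; real TLBs are set-associative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn, pfn):
        self.entries[vpn] = pfn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry


class TwoLevelTLB:
    """Hypothetical two-level TLB: small fast L1, larger L2, page walk on full miss."""

    def __init__(self, page_table, l1_size=4, l2_size=16):
        self.page_table = page_table
        self.l1 = LRUCache(l1_size)
        self.l2 = LRUCache(l2_size)
        self.stats = {"l1_hit": 0, "l2_hit": 0, "walk": 0}

    def translate_vpn(self, vpn):
        pfn = self.l1.lookup(vpn)
        if pfn is not None:
            self.stats["l1_hit"] += 1
            return pfn
        pfn = self.l2.lookup(vpn)
        if pfn is not None:
            self.stats["l2_hit"] += 1
        else:
            self.stats["walk"] += 1            # full TLB miss: the expensive case
            pfn = self.page_table[vpn]
            self.l2.insert(vpn, pfn)
        self.l1.insert(vpn, pfn)               # refill the small, fast level
        return pfn


page_table = {vpn: 0x1000 + vpn for vpn in range(32)}
tlb = TwoLevelTLB(page_table)
for _ in range(2):
    for vpn in range(8):       # 8 hot pages: too many for L1 (4), but they fit in L2 (16)
        tlb.translate_vpn(vpn)
tlb.translate_vpn(7)           # still in the first level
print(tlb.stats)               # {'l1_hit': 1, 'l2_hit': 8, 'walk': 8}
```

On the second pass the 8-page working set thrashes the 4-entry first level but never pays for another page walk. That is the job of the second level.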
Even if all cores were running threads from the same process, some threads might be in kernel mode handling an interrupt or system call, and thus using the kernel's page tables. That makes a TLB shared across cores of very low value, and harder to implement.
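The following toy illustrates the hazard, with hypothetical names throughout. If one TLB were shared by cores in different contexts, an entry filled by one core could be handed to a core that is using different page tables. Tagging each entry with a context label avoids the wrong answer, but then cores in different contexts gain nothing from sharing. Two processes stand in for "user vs. kernel page tables" here only to keep the example small.

```python
class SharedTLB:
    """Hypothetical TLB shared by several cores; real x86 cores do not do this."""

    def __init__(self, tag_with_context):
        self.tag_with_context = tag_with_context
        self.entries = {}

    def translate_vpn(self, context, page_table, vpn):
        # `context` is a hypothetical label for which page tables the calling
        # core is currently using (one process, another process, the kernel...).
        key = (context, vpn) if self.tag_with_context else vpn
        if key not in self.entries:
            self.entries[key] = page_table[vpn]  # fill from the caller's own page table
        return self.entries[key]


process_a = {0x10: 0x200}
process_b = {0x10: 0x300}

# Untagged: the core running B silently reuses the translation filled for A.
untagged = SharedTLB(tag_with_context=False)
a_untagged = untagged.translate_vpn("A", process_a, 0x10)
b_untagged = untagged.translate_vpn("B", process_b, 0x10)
print(hex(a_untagged), hex(b_untagged))  # 0x200 0x200  (wrong for B)

# Tagged: correct, but entries from different contexts never help each other.
tagged = SharedTLB(tag_with_context=True)
a_tagged = tagged.translate_vpn("A", process_a, 0x10)
b_tagged = tagged.translate_vpn("B", process_b, 0x10)
print(hex(a_tagged), hex(b_tagged))      # 0x200 0x300
```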
For an example of how the pieces fit together in a real CPU, see David Kanter's write-up of Intel's Sandybridge design. Note that the diagrams are for a single SnB core. In most CPUs, the only cache shared between cores is the last-level cache. Intel's SnB-family designs use a modular L3 cache on a ring bus, with 2 MiB of L3 per core. So adding cores also adds L3 to the total pool, alongside the new cores themselves (each with its own L2/L1D/L1I/uop cache and two-level TLB).
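Here is a rough sketch of what "modular" means. Each core contributes one L3 slice, so total L3 scales with core count. Every physical line has exactly one home slice, and any core that wants the line goes to that slice over the ring. The slice-selection function is a stand-in (plain modulo on the line number). Intel's real hash is not publicly documented, so treat the function as an assumption, along with the 64-byte line size. The function names are hypothetical.

```python
LINE_SHIFT = 6                          # assumption: 64-byte cache lines
L3_SLICE_BYTES = 2 * 1024 * 1024        # per the answer: 2 MiB of L3 per core


def total_l3_bytes(cores):
    """Each core brings its own L3 slice, so the shared pool grows with core count."""
    return cores * L3_SLICE_BYTES


def home_slice(paddr, cores):
    """Which slice holds a given physical line.

    Toy hash: real Intel CPUs hash many physical address bits in an
    undocumented way; plain modulo on the line number is only an assumption.
    It depends only on the physical address, not on which core is asking.
    """
    return (paddr >> LINE_SHIFT) % cores


for cores in (2, 4):
    print(cores, "cores:", total_l3_bytes(cores) // (1024 * 1024), "MiB of L3")

# Consecutive 64-byte lines spread across the 4 slices.
print([home_slice(line << LINE_SHIFT, 4) for line in range(6)])  # [0, 1, 2, 3, 0, 1]
```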
Source: https://stackoverflow.com/questions/34437371/is-the-tlb-shared-between-multiple-cores