Question
How is system memory (RAM) mapped for GPU access? I am clear about how virtual memory works for the CPU, but I am not sure how that works when the GPU accesses GPU-mapped system memory (host). Basically, I am asking how data is copied from system memory to host memory and vice versa. Can you provide explanations backed by reference articles, please?
Answer 1:
I found the following slide set quite useful: http://developer.amd.com/afds/assets/presentations/1004_final.pdf
Memory System on Fusion APUs: The Benefits of Zero Copy
Pierre Boudier, AMD Fellow of OpenGL/OpenCL
Graham Sellers, AMD Manager of OpenGL
AMD Fusion Developer Summit, June 2011
Be aware, however, that this is a fast-moving area. It is not so much about developing new concepts as about (finally) applying concepts like virtual memory to GPUs. Let me summarize.
In the old days, say prior to 2010, GPUs were usually separate PCI or PCI Express cards or boards. They had some DRAM on board the GPU card. This on-board DRAM is pretty fast. They could also access DRAM on the CPU side, typically via DMA copy engines across PCI. GPU access to CPU memory like this is usually quite slow.
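To make that "explicit copy across the bus" model concrete, here is a minimal CUDA sketch (my own illustration; the paper cited above uses OpenCL and AMD terminology). It allocates on-board GPU DRAM and moves data to and from pageable host memory with explicit copies, which is the DMA path described above:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

int main() {
    const size_t n = 1 << 20;                           // 1M floats, arbitrary size
    float *h_buf = (float *)malloc(n * sizeof(float));  // pageable CPU DRAM
    float *d_buf = nullptr;
    cudaMalloc((void **)&d_buf, n * sizeof(float));     // on-board GPU DRAM

    // Host -> device: the driver stages the pageable pages and the GPU's DMA
    // copy engine pulls them across PCI Express. This is the (relatively slow)
    // path the answer describes for moving data between CPU and GPU memory.
    cudaMemcpy(d_buf, h_buf, n * sizeof(float), cudaMemcpyHostToDevice);

    // ... kernels would operate on d_buf here ...

    cudaMemcpy(h_buf, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    free(h_buf);
    printf("copies done\n");
    return 0;
}
```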
The GPU memory was not paged. For that matter, the GPU memory is usually uncached, except for the software-managed caches inside the GPU, like the texture caches. "Software managed" means these caches are not cache coherent and must be manually flushed.
Typically, only a small section of the CPU DRAM was accessible to the GPU: an aperture. Typically it was pinned, i.e., not subject to paging. Usually it was not even subject to virtual address translation; typically virtual address = physical address, plus maybe some offset.
(Of course, the rest of CPU memory is proper virtual memory: paged, certainly translated, and cached. It's just that the GPU cannot access this safely, because the GPU does (did) not have access to the virtual memory subsystem and the cache coherence system.)
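A hedged sketch of the "pinned" part in CUDA terms: a page-locked host allocation is memory the OS will not page out, so a DMA engine can safely target its physical pages, which is what made an aperture like the one above usable. cudaHostAlloc and cudaFreeHost are standard CUDA calls; the surrounding usage is illustrative only.

```cuda
#include <cuda_runtime.h>

// Allocate pinned (page-locked) host memory. Because the OS cannot page these
// pages out, a GPU DMA engine can target them directly.
int allocate_pinned(size_t bytes, float **out) {
    cudaError_t err = cudaHostAlloc((void **)out, bytes, cudaHostAllocDefault);
    return (err == cudaSuccess) ? 0 : -1;
}

int main() {
    float *pinned = nullptr;
    if (allocate_pinned((1 << 20) * sizeof(float), &pinned) != 0) return 1;
    // ... copies to/from `pinned` can use the fast, truly asynchronous DMA path ...
    cudaFreeHost(pinned);
    return 0;
}
```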
Now, the above works, but it's a pain. Operating on something first inside the CPU and then inside the GPU is slow and error prone. It is also a great security risk: user-provided GPU code often could access (slowly and unsafely) all CPU DRAM, so it could be used by malware.
AMD has announced a goal of more tightly integrating GPUs and CPUs. One of the first steps was to create the "Fusion" APUs, chips containing both CPUs and GPUs. (Intel has done something similar with Sandy Bridge; I expect ARM to do so as well.)
AMD has also announced that they intend to have the GPU use the virtual memory subsystem, and use caches.
A step in the direction of having the GPU use virtual memory is the AMD IOMMU. Intel has something similar. However, the IOMMUs are more oriented towards virtual machines than towards virtual memory for non-virtual-machine OSes.
Systems where the CPU and GPU are inside the same chip typically have the CPU and GPU access the same DRAM chips, so there is no longer "on-GPU-board" and "off-GPU-board" (CPU-side) DRAM.
But there usually still is a split, a partition, of the DRAM on the system motherboard into memory mainly used by the CPU and memory mainly used by the GPU. Even though the memory may live in the same DRAM chips, typically a big chunk is "graphics" memory. In the paper above it is called "Local" memory, for historical reasons. CPU and graphics memory may be tuned differently; typically the GPU memory is lower priority (except for video refresh) and uses longer bursts.
In the paper I refer you to, there are different internal busses: "Onion" for "system" memory and "Garlic" for faster access to the graphics memory partition. Garlic memory is typically uncached.
The paper I refer to talks about how the CPU and GPU have different page tables. The subtitle, "the benefits of zero copy", refers to mapping a CPU data structure into the GPU page tables so that you don't need to copy it.
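To illustrate the zero-copy idea (the paper itself uses OpenCL on Fusion APUs; the CUDA calls below are my own analogous example, not the authors' code): a mappable pinned host buffer is exposed to the GPU through a device pointer, and a kernel reads and writes it in place, with no copy into graphics memory.

```cuda
#include <cuda_runtime.h>

// Kernel that touches the mapped host buffer in place: the GPU's page tables
// point at the host pages, so no copy into graphics memory is needed.
__global__ void scale(float *data, float s, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

int main() {
    const size_t n = 1 << 20;
    float *h_ptr = nullptr;   // CPU-side view
    float *d_ptr = nullptr;   // GPU-side view of the same physical pages

    // Required on older devices to allow mapping host memory into the GPU.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Pinned, GPU-mappable host allocation.
    cudaHostAlloc((void **)&h_ptr, n * sizeof(float), cudaHostAllocMapped);
    for (size_t i = 0; i < n; ++i) h_ptr[i] = 1.0f;

    // Device pointer aliasing the host allocation: this is the "mapping a CPU
    // data structure into the GPU page tables" that the subtitle refers to.
    cudaHostGetDevicePointer((void **)&d_ptr, h_ptr, 0);

    scale<<<(unsigned)((n + 255) / 256), 256>>>(d_ptr, 2.0f, n);
    cudaDeviceSynchronize();  // afterwards h_ptr[i] == 2.0f, with no cudaMemcpy

    cudaFreeHost(h_ptr);
    return 0;
}
```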
Etc., etc.
This area of the system is evolving rapidly, so the 2011 paper is already almost obsolete. But you should note the trends:
(a) software WANTS uniform access to CPU and GPU memory: virtual memory and cacheable,
but
(b) although hardware tries to provide (a), special graphics memory features nearly always make dedicated graphics memory, even if it is just a partition of the same DRAMs, significantly faster or more power efficient.
The gap may be narrowing, but every time you think it is about to go away, another hardware trick can be played.
---
BTW, this answer from 2012 should be updated; I am writing this in 2019. Much still applies, e.g., the CPU/GPU memory distinction. GPU memory is still higher speed, but nowadays there is often more GPU memory than CPU memory, at least in datacenter DL systems (not so much in home PCs). Also, GPUs now support virtual memory. This is by no means a full update.
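As a rough illustration of the "GPUs now support virtual memory" point (my own example using CUDA managed memory; the original answer names no specific API): a single managed allocation is addressable from both CPU and GPU, with pages migrated by the driver and hardware much like ordinary virtual memory.

```cuda
#include <cuda_runtime.h>

__global__ void fill(int *p, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = (int)i;
}

int main() {
    const size_t n = 1 << 20;
    int *p = nullptr;

    // One allocation, one pointer, valid on both CPU and GPU; pages are
    // migrated on demand (or at kernel launch on older GPUs).
    cudaMallocManaged((void **)&p, n * sizeof(int));

    fill<<<(unsigned)((n + 255) / 256), 256>>>(p, n);  // pages move to the GPU
    cudaDeviceSynchronize();

    long long sum = 0;
    for (size_t i = 0; i < n; ++i) sum += p[i];        // pages move back to the CPU

    cudaFree(p);
    return (sum == (long long)n * (n - 1) / 2) ? 0 : 1;
}
```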
Source: https://stackoverflow.com/questions/11355426/gpu-system-memory-mapping