How to get L1, L2 & L3 cache sizes using the CPUID instruction on x86

忘掉有多难 2021-01-05 17:45

I ran into a problem while preparing an x86 assembly project whose subject is to write a program that gets the L1 data, L1 code, L2 and L3 cache sizes.

I tried to fi

3 Answers
  • 2021-01-05 18:07

    You can get the CPU's L1, L2 and L3 cache sizes with the CPUID instruction. According to the Intel x86 Software Developer's Manual, Volume 2 (Instruction Set Reference), CPUID returns cache information when EAX is set to 2 or 4. EAX=2 is the older version, and newer CPUs do not seem to use it, so I will describe the EAX=4 case.

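    Its output format is:

    • EAX[4:0]: cache type (0 = no more caches, 1 = data, 2 = instruction, 3 = unified)

    • EAX[7:5]: cache level

    • EBX[11:0]: Line_Size, the system coherency line size minus 1

    • EBX[21:12]: Partitions, the physical line partitions minus 1

    • EBX[31:22]: Ways, the ways of associativity minus 1

    • ECX[31:0]: Sets, the number of sets minus 1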

    So you can calculate the cache size in bytes with the following formula:

    Cache size = (Ways + 1) * (Partitions + 1) * (Line_Size + 1) * (Sets + 1) or

    Cache size = (EBX[31:22] + 1) * (EBX[21:12] + 1) * (EBX[11:0] + 1) * (ECX + 1)
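
The formula above can be sketched as a small decoder. This is only an illustration: Python cannot execute CPUID itself, so the register values are passed in as plain integers, and the function name `decode_leaf4` is made up for this example.

```python
def decode_leaf4(ebx: int, ecx: int) -> dict:
    """Unpack the EBX/ECX values of one CPUID leaf 4 subleaf (hypothetical helper)."""
    line_size = (ebx & 0xFFF) + 1            # EBX[11:0]  + 1
    partitions = ((ebx >> 12) & 0x3FF) + 1   # EBX[21:12] + 1
    ways = ((ebx >> 22) & 0x3FF) + 1         # EBX[31:22] + 1
    sets = (ecx & 0xFFFFFFFF) + 1            # ECX        + 1
    return {
        "ways": ways,
        "partitions": partitions,
        "line_size": line_size,
        "sets": sets,
        "size_bytes": ways * partitions * line_size * sets,
    }


# EBX assembled by hand from Ways=7, Partitions=0, Line_Size=63; ECX holds Sets=63.
ebx = (7 << 22) | (0 << 12) | 63
info = decode_leaf4(ebx, 63)
print(info["size_bytes"])  # 32768
```

In a real program the two integers would come from whatever executes CPUID for you (inline assembly, a compiler intrinsic, or a native extension).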

    For example, I ran the "cpuid -li" command on my Ubuntu system and got the following output:

       deterministic cache parameters (4):
      --- cache 0 ---
      cache type                           = data cache (1)
      cache level                          = 0x1 (1)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0x1 (1)
      extra processor cores on this die    = 0x7 (7)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0x7 (7)
      ways of associativity                = 0x0 (0)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = false
      complex cache indexing               = false
      number of sets - 1 (s)               = 63
      --- cache 1 ---
      cache type                           = instruction cache (2)
      cache level                          = 0x1 (1)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0x1 (1)
      extra processor cores on this die    = 0x7 (7)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0x7 (7)
      ways of associativity                = 0x0 (0)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = false
      complex cache indexing               = false
      number of sets - 1 (s)               = 63
      --- cache 2 ---
      cache type                           = unified cache (3)
      cache level                          = 0x2 (2)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0x1 (1)
      extra processor cores on this die    = 0x7 (7)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0x3 (3)
      ways of associativity                = 0x0 (0)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = false
      complex cache indexing               = false
      number of sets - 1 (s)               = 1023
      --- cache 3 ---
      cache type                           = unified cache (3)
      cache level                          = 0x3 (3)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0xf (15)
      extra processor cores on this die    = 0x7 (7)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0xb (11)
      ways of associativity                = 0x6 (6)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = true
      complex cache indexing               = true
      number of sets - 1 (s)               = 12287
    

    L1 data cache size is: (7+1) * (0+1) * (63+1) * (63+1) = 32K

    L3 cache size is: (11+1) * (0+1) * (63+1) * (12287+1) = 9M
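
As a quick check of the arithmetic, the same formula can be applied to all four caches in the listing. The field values below are copied from the cpuid output above; the helper name `size_bytes` is just for this sketch.

```python
# (name, ways-1, partitions-1, line_size-1, sets-1), copied from the listing above
caches = [
    ("L1 data", 7, 0, 63, 63),
    ("L1 instruction", 7, 0, 63, 63),
    ("L2 unified", 3, 0, 63, 1023),
    ("L3 unified", 11, 0, 63, 12287),
]


def size_bytes(ways, partitions, line_size, sets):
    # Every field is stored as "value minus 1", hence the +1 on each factor.
    return (ways + 1) * (partitions + 1) * (line_size + 1) * (sets + 1)


sizes = {name: size_bytes(*fields) for name, *fields in caches}
for name, size in sizes.items():
    print(f"{name}: {size // 1024} KiB")
```

This gives 32 KiB for each L1 cache, 256 KiB for L2 and 9216 KiB (9 MiB) for L3.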

  • 2021-01-05 18:16

    Marat Dukhan basically gave you the right answer. For newer Intel processors, meaning those made in the last 5-6 years, the best solution is to enumerate CPUID leaf 4: call CPUID several times, first with EAX=4 and ECX=0, then with EAX=4 and ECX=1, and so forth, until the cache type field in EAX[4:0] comes back as 0. This returns not only the cache sizes and types but also tells you how these caches are connected to the CPU cores and hyperthreading/SMT units. The algorithm and sample code are given at https://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/ , more specifically in the section titled "Cache Topology Enumeration".
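
A rough sketch of that enumeration loop follows. It is not Intel's sample code. The `cpuid` argument is an assumed callable that returns (EAX, EBX, ECX, EDX) for a given leaf and subleaf; here it is replaced by `fake_cpuid`, a stand-in built by hand from typical field values so the loop can run anywhere.

```python
from typing import Callable, Iterator, Tuple

Regs = Tuple[int, int, int, int]  # (EAX, EBX, ECX, EDX)
CACHE_TYPES = {1: "data", 2: "instruction", 3: "unified"}


def enumerate_caches(cpuid: Callable[[int, int], Regs]) -> Iterator[dict]:
    """Walk CPUID leaf 4 subleaves until the cache type field reads 0."""
    subleaf = 0
    while True:
        eax, ebx, ecx, _edx = cpuid(4, subleaf)
        cache_type = eax & 0x1F                      # EAX[4:0]
        if cache_type == 0:                          # 0 means no more caches
            break
        ways = ((ebx >> 22) & 0x3FF) + 1             # EBX[31:22] + 1
        partitions = ((ebx >> 12) & 0x3FF) + 1       # EBX[21:12] + 1
        line_size = (ebx & 0xFFF) + 1                # EBX[11:0]  + 1
        yield {
            "type": CACHE_TYPES.get(cache_type, "reserved"),
            "level": (eax >> 5) & 0x7,               # EAX[7:5]
            # EAX[25:14] + 1: max addressable logical-processor IDs sharing the cache
            "max_ids_sharing": ((eax >> 14) & 0xFFF) + 1,
            "size_bytes": ways * partitions * line_size * (ecx + 1),
        }
        subleaf += 1


def fake_cpuid(leaf: int, subleaf: int) -> Regs:
    """Hand-built stand-in for the real instruction (hypothetical values)."""
    table = [
        ((1 << 14) | (1 << 5) | 1, (7 << 22) | 63, 63, 0),       # L1 data
        ((1 << 14) | (1 << 5) | 2, (7 << 22) | 63, 63, 0),       # L1 instruction
        ((1 << 14) | (2 << 5) | 3, (3 << 22) | 63, 1023, 0),     # L2 unified
        ((15 << 14) | (3 << 5) | 3, (11 << 22) | 63, 12287, 0),  # L3 unified
    ]
    if leaf == 4 and subleaf < len(table):
        return table[subleaf]
    return (0, 0, 0, 0)


caches = list(enumerate_caches(fake_cpuid))
for c in caches:
    print(c["level"], c["type"], c["size_bytes"], c["max_ids_sharing"])
```

Passing the CPUID access in as a callable keeps the decoding logic separate from the platform-specific part, which would have to be written in assembly or C.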

  • 2021-01-05 18:21

    For Intel CPUs:

    • for newer CPUs you should use "CPUID, eax=0x00000004" (with different values in ECX)

    • for older CPUs (that don't support the first option) you should use "CPUID, eax=0x00000002". This involves having a table to look up what the values mean. There are cases where the same value means different things for different CPUs and you need additional information (e.g. CPU family/model/stepping).

    For VIA CPUs, use the same methods as you would for Intel (with different tables for anything that involves "family/model/stepping").

    For AMD CPUs:

    • for newer CPUs you should use "CPUID, eax=0x8000001D" (with different values in ECX)

    • for older CPUs (that don't support the first option) you should use "CPUID, eax=0x80000006" (for L2 and L3 only), plus "CPUID, eax=0x80000005" (for L1 only).
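
For the older AMD leaves, the sizes are stored directly in the registers rather than as ways/sets fields. A minimal sketch of the decoding, assuming the register values have already been read by native code; the function name and the sample values are made up for illustration and do not come from a real CPU.

```python
def decode_amd_legacy(l1_ecx: int, l1_edx: int, l2_ecx: int, l3_edx: int) -> dict:
    """Decode cache sizes from CPUID 0x80000005 (L1) and 0x80000006 (L2/L3)."""
    return {
        "l1d_kib": (l1_ecx >> 24) & 0xFF,            # 0x80000005 ECX[31:24], in KB
        "l1i_kib": (l1_edx >> 24) & 0xFF,            # 0x80000005 EDX[31:24], in KB
        "l2_kib": (l2_ecx >> 16) & 0xFFFF,           # 0x80000006 ECX[31:16], in KB
        "l3_kib": ((l3_edx >> 18) & 0x3FFF) * 512,   # 0x80000006 EDX[31:18], 512 KB units
    }


# Hand-built sample registers: 32 KB L1D, 64 KB L1I, 512 KB L2, 8 MB L3 (16 * 512 KB).
# The low byte (64) is the line size field, which this sketch ignores.
sample = decode_amd_legacy(
    l1_ecx=(32 << 24) | 64,
    l1_edx=(64 << 24) | 64,
    l2_ecx=(512 << 16) | 64,
    l3_edx=(16 << 18) | 64,
)
print(sample)
```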

    For all other cases (very old Intel/VIA/AMD CPUs, CPUs from other manufacturers):

    • use CPU "vendor/family/model/stepping" (from "CPUID, eax=0x00000001") with a table (or maybe 1 table per vendor) so you can search for the right CPU in your table/s and get the information that way.

    • if CPUID is not supported there are ways to try to narrow down the possibilities and determine what the CPU is with reasonable accuracy; but mostly you should just give up.
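
Building the lookup key for such a table could look roughly like this. It is a sketch under assumptions: the register values are supplied by native code, the function names are invented here, and the model calculation follows Intel's rule (AMD documents the extended model as applying only when the base family is 0xF). The table itself is left out because its contents depend on the CPUs you support.

```python
import struct


def decode_vendor(ebx: int, edx: int, ecx: int) -> str:
    """CPUID leaf 0 returns the 12-character vendor string in EBX, EDX, ECX order."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def decode_signature(eax: int) -> dict:
    """Split the CPUID leaf 1 EAX value into family, model and stepping."""
    stepping = eax & 0xF               # EAX[3:0]
    base_model = (eax >> 4) & 0xF      # EAX[7:4]
    base_family = (eax >> 8) & 0xF     # EAX[11:8]
    ext_model = (eax >> 16) & 0xF      # EAX[19:16]
    ext_family = (eax >> 20) & 0xFF    # EAX[27:20]
    family = base_family + ext_family if base_family == 0xF else base_family
    if base_family in (0x6, 0xF):      # Intel's rule for using the extended model
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


vendor = decode_vendor(0x756E6547, 0x49656E69, 0x6C65746E)
sig = decode_signature(0x000906EA)     # illustrative signature value
key = (vendor, sig["family"], sig["model"], sig["stepping"])
print(key)  # ('GenuineIntel', 6, 158, 10)
```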

    In addition, for all CPUs you should trawl through the errata sheets to see if CPUID provides wrong information, and implement workarounds to correct that wrong information.

    Note that (depending on which range of CPUs you support and how awesome you want your code to be) it can take several months of work just to extract reliable information about caches.
