Do the ARM instructions ldrex/strex have to operate on cache aligned data?

假如想象 submitted on 2019-12-12 07:20:28

Question


On Intel, the arguments to CMPXCHG must be cache line aligned (since Intel uses MESI to implement CAS).

On ARM, ldrex and strex operate on exclusive reservation granules.

To be clear, does this then mean on ARM the data being operated upon does not have to be cache line aligned?


Answer 1:


It says so right in the ARM Architecture Reference Manual, section A.3.2.1 "Unaligned data access": LDREX and STREX require word alignment, not cache-line alignment. That makes sense, because an unaligned data access can span exclusive reservation granules.
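To see why an unaligned access is a problem, here is a minimal C sketch that models a reservation granule as an aligned block of granule bytes and checks whether an access [addr, addr + len) crosses a granule boundary. The granule size is implementation defined, so it is a parameter here; granule_base and spans_granules are hypothetical helpers, not an ARM API. A naturally aligned access whose size is a power of two no larger than the granule never crosses a boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Base address of the granule that contains addr.
   granule must be a power of two (its real size is implementation defined). */
uintptr_t granule_base(uintptr_t addr, size_t granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    return addr & ~(uintptr_t)(granule - 1);
}

/* True if an access of len bytes (len >= 1) starting at addr touches
   more than one granule, so a single reservation could not cover it. */
bool spans_granules(uintptr_t addr, size_t len, size_t granule)
{
    assert(len >= 1);
    return granule_base(addr, granule) != granule_base(addr + len - 1, granule);
}
```

With a hypothetical 8-byte granule, a word access at 0x1004 stays inside the granule at 0x1000, while a word access at 0x1006 also touches the granule at 0x1008.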




Answer 2:


Exclusive access restrictions

The following restrictions apply to exclusive accesses:

• The size and length of an exclusive write with a given ID must be the same as the size and length of the preceding exclusive read with the same ID.

• The address of an exclusive access must be aligned to the total number of bytes in the transaction.

• The address for the exclusive read and the exclusive write must be identical.

• The ARID field of the read portion of the exclusive access must match the AWID of the write portion.

• The control signals for the read and write portions of the exclusive access must be identical.

• The number of bytes to be transferred in an exclusive access burst must be a power of 2, that is, 1, 2, 4, 8, 16, 32, 64, or 128 bytes.

• The maximum number of bytes that can be transferred in an exclusive burst is 128.

• The value of the ARCACHE[3:0] or AWCACHE[3:0] signals must guarantee that the slave that is monitoring the exclusive access sees the transaction. For example, an exclusive access being monitored by a slave must not have an ARCACHE[3:0] or AWCACHE[3:0] value that indicates that the transaction is cacheable.

Failure to observe these restrictions causes Unpredictable behavior.

The above is from the AMBA/AXI spec. You will find that some vendors ignore AWLOCK/ARLOCK (meaning ldrex/strex won't work outside the core). I have some code that demonstrates this, or at least will if you find a system that doesn't support exclusive access.

https://github.com/dwelch67/raspberrypi/tree/master/extest
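The size, alignment and pairing rules in the quoted AXI list are mechanical enough to check in code. A minimal sketch, assuming a hypothetical, simplified struct excl_txn that folds "size and length" into a single total byte count and leaves out the control-signal and ARCACHE/AWCACHE rules; excl_txn_ok and excl_pair_ok are illustrative names, not part of any AXI tooling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one half of an AXI exclusive access. */
struct excl_txn {
    uint64_t addr;   /* start address */
    uint32_t bytes;  /* total bytes transferred in the transaction */
    uint32_t id;     /* ARID for the read half, AWID for the write half */
};

static bool is_pow2(uint32_t n) { return n != 0 && (n & (n - 1)) == 0; }

/* Per-transaction rules: power-of-two size, at most 128 bytes,
   address aligned to the total number of bytes. */
bool excl_txn_ok(const struct excl_txn *t)
{
    return is_pow2(t->bytes)          /* 1, 2, 4, ..., 128 (rejects 0 too) */
        && t->bytes <= 128            /* maximum exclusive burst */
        && t->addr % t->bytes == 0;   /* safe: bytes is nonzero here */
}

/* Pairing rules: identical address and size, ARID matches AWID. */
bool excl_pair_ok(const struct excl_txn *rd, const struct excl_txn *wr)
{
    return excl_txn_ok(rd) && excl_txn_ok(wr)
        && rd->addr == wr->addr
        && rd->bytes == wr->bytes
        && rd->id == wr->id;
}
```

Passing these checks only means a transaction is well formed; whether the interconnect actually honors the exclusive access is the separate problem described above.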

Depending on the task and how portable you want to be, you may need to provide both swp and ldrex/strex solutions surrounded by ifdefs, and/or use the plethora of registers available at run time to tell you which instructions the core you are running on does or does not support. (You may find that in at least one case neither swp nor ldrex/strex is supported.)
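A minimal sketch of the ifdef approach, assuming GCC or Clang. The function name swap_word is hypothetical, and the selection relies on the predefined macros __arm__ and __thumb__ and on the ACLE macro __ARM_FEATURE_LDREX (bit 2 set means word-sized LDREX/STREX exist); check those against your own toolchain. It uses LDREX/STREX where the compiler reports them, SWP for older ARM-state code, and a compiler builtin elsewhere. This is compile-time selection only: it does not read any registers at run time, and it cannot help if the memory system ignores exclusive accesses.

```c
/* Atomically stores newval into *p and returns the previous value.
   No memory-ordering guarantees: barriers (e.g. DMB) are deliberately omitted. */
int swap_word(int *p, int newval)
{
#if defined(__arm__) && defined(__ARM_FEATURE_LDREX) && (__ARM_FEATURE_LDREX & 4)
    /* Load-exclusive / store-exclusive loop: retry until the store succeeds. */
    int old, failed;
    do {
        __asm__ volatile("ldrex %0, [%2]\n\t"
                         "strex %1, %3, [%2]"
                         : "=&r"(old), "=&r"(failed)  /* early-clobber: STREX status
                                                          reg must differ from inputs */
                         : "r"(p), "r"(newval)
                         : "memory");
    } while (failed);
    return old;
#elif defined(__arm__) && !defined(__thumb__)
    /* Older ARM-state code: SWP (deprecated from ARMv6 on, absent on some cores). */
    int old;
    __asm__ volatile("swp %0, %2, [%1]"
                     : "=&r"(old)
                     : "r"(p), "r"(newval)
                     : "memory");
    return old;
#else
    /* Not ARM: fall back to the GCC/Clang atomic builtin. */
    return __atomic_exchange_n(p, newval, __ATOMIC_RELAXED);
#endif
}
```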




Answer 3:


On Intel, the arguments to CMPXCHG do NOT need to be cache-line aligned. Try it; you will see that it works.

But you are correct that in cacheable memory Intel does use the cache protocol to implement CMPXCHG. So it would be wise not to put two independent, heavily used synchronization variables in the same cache line: if two processors were synchronizing using these different variables, the cache line could thrash back and forth between them. But this is exactly the same issue as for any data: you don't want different processors writing to the same cache line at the same time. That is false sharing.
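The usual fix for false sharing is to give each hot variable its own cache line. A minimal C11 sketch, assuming a 64-byte line (common on current x86, but not universal); struct sync_vars and same_cache_line are hypothetical names for illustration.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; query the real value for your CPU. */
#define CACHE_LINE 64

/* Two independent, heavily used synchronization variables, each aligned
   to its own cache line so two processors using them do not false-share. */
struct sync_vars {
    alignas(CACHE_LINE) int lock_a;
    alignas(CACHE_LINE) int lock_b;
};

/* Nonzero if byte addresses a and b fall in the same cache line. */
int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}
```

The cost is space: the struct grows to two full lines (128 bytes under this assumption) to hold two ints.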

But you certainly can use locks that are not cache-line aligned:

struct Foo {
  int data;
  Lock lock;
  int data_after;
};
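Here is a minimal C11 version of the "try it" experiment, assuming Lock is an atomic int (the answer leaves Lock undefined; try_lock and unlock are hypothetical helpers). The lock sits right after data, so it is word aligned but not cache-line aligned, and a compare-and-swap on it still works. On x86, GCC and Clang compile atomic_compare_exchange_strong on an int to LOCK CMPXCHG.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the answer's Lock is just an atomic int. */
typedef atomic_int Lock;

struct Foo {
    int data;
    Lock lock;        /* typically at offset 4: word aligned, not line aligned */
    int data_after;
};

/* Take the lock with a single compare-and-swap: succeeds only if it was 0. */
bool try_lock(struct Foo *f)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&f->lock, &expected, 1);
}

void unlock(struct Foo *f)
{
    atomic_store(&f->lock, 0);
}
```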

You can put different locks in the same cacheline:

struct Foo {
  int data;
  Lock read_lock;
  int data_between;
  Lock write_lock;
  int data_after;
};

Since reading and writing tend to be mutually exclusive, there may be little or no lossage.
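To check which line each lock lands in, compute it from the field offsets. A minimal sketch, assuming Lock is a plain int, a 64-byte line, and a Foo that starts on a line boundary (line_of is a hypothetical helper). With 4-byte ints the two locks typically sit at offsets 4 and 12, in the same line, which is harmless only while readers and writers rarely contend at the same time.

```c
#include <stddef.h>

typedef int Lock;        /* placeholder: the answer leaves Lock undefined */
#define CACHE_LINE 64    /* assumed line size */

struct Foo {
    int data;
    Lock read_lock;
    int data_between;
    Lock write_lock;
    int data_after;
};

/* Index of the cache line holding byte offset off, for an object
   that starts on a cache-line boundary. */
size_t line_of(size_t off)
{
    return off / CACHE_LINE;
}
```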

By the way, in uncached memory Intel does not use the cache snooping protocol for atomic operations like CMPXCHG, so there is less reason to cache-line align synchronization variables. But you still may want to: many memory subsystems interleave in cache-line-sized blocks, even when uncached.
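To make the interleaving point concrete, here is a minimal sketch assuming the simplest scheme, where consecutive line-sized blocks go round-robin to nbanks banks. bank_of and blocks_touched are hypothetical helpers, and real memory controllers often hash more address bits than this.

```c
#include <stddef.h>
#include <stdint.h>

/* Bank that serves addr when consecutive blocks of `line` bytes are
   spread round-robin over nbanks banks (simplified, hypothetical mapping). */
unsigned bank_of(uintptr_t addr, size_t line, unsigned nbanks)
{
    return (unsigned)((addr / line) % nbanks);
}

/* Number of line-sized blocks an access of len bytes (len >= 1) spans;
   under this interleaving each block may live in a different bank. */
size_t blocks_touched(uintptr_t addr, size_t len, size_t line)
{
    return (addr + len - 1) / line - addr / line + 1;
}
```

For example, with 64-byte blocks and 4 banks, an 8-byte variable at 0x3C spans two blocks (banks 0 and 1), while the same variable at 0x40 stays in a single block in bank 1.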

And as for ARM: it is pretty much the same.

On a snoopy bus, or uncached, you may not need to worry too much about cache line alignment.

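But in a clustered cache hierarchy, you have exactly the same issues as on x86. More so, in fact: it is well known how to "export" an operation like CMPXCHG (perform it out in the memory system rather than in the core's cache), but not how to do the same for ARM's ldrexd/strexd.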



Source: https://stackoverflow.com/questions/11383125/do-the-arm-instructions-ldrex-strex-have-to-operate-on-cache-aligned-data
