ARM Neon: How to convert from uint8x16_t to uint8x8x2_t?

梦想与她 提交于 2019-12-21 12:36:22

问题


I recently discovered about the vreinterpret{q}_dsttype_srctype casting operator. However this doesn't seem to support conversion in the data type described at this link (bottom of the page):

Some intrinsics use an array of vector types of the form:

<type><size>x<number of lanes>x<length of array>_t

These types are treated as ordinary C structures containing a single element named val.

An example structure definition is:

struct int16x4x2_t    
{
    int16x4_t val[2];     
};

Do you know how to convert from uint8x16_t to uint8x8x2_t?

Note that that the problem cannot be reliably addressed using union (reading from inactive members leads to undefined behaviour Edit: That's only the case for C++, while it turns out that C allows type punning), nor by using pointers to cast (breaks the strict aliasing rule).


回答1:


Based on your comments, it seems you want to perform a bona fide conversion -- that is, to produce a distinct, new, separate value of a different type. This is a very different thing than a reinterpretation, such as the lead-in to your question suggests you wanted. In particular, you posit variables declared like this:

uint8x16_t  a;
uint8x8x2_t b;

// code to set the value of a ...

and you want to know how to set the value of b so that it is in some sense equivalent to the value of a.

Speaking to the C language:

The strict aliasing rule (C2011 6.5/7) says,

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object, [...]
  • an aggregate or union type that includes one of the aforementioned types among its members [...], or
  • a character type.

(Emphasis added. Other enumerated options involve differently-qualified and differently-signed versions of the of the effective type of the object or compatible types; these are not relevant here.)

Note that these provisions never interfere with accessing a's value, including the member value, via variable a, and similarly for b. But don't overlook overlook the usage of the term "effective type" -- this is where things can get bolluxed up under slightly different circumstances. More on that later.

Using a union

C certainly permits you to perform a conversion via an intermediate union, or you could rely on b being a union member in the first place so as to remove the "intermediate" part:

union {
    uint8x16_t  x1;
    uint8x8_2_t x2;
} temp;
temp.x1 = a;
b = temp.x2;

Using a typecast pointer (to produce UB)

However, although it's not so uncommon to see it, C does not permit you to type-pun via a pointer:

// UNDEFINED BEHAVIOR - strict-aliasing violation
    b = *(uint8x8x2_t *)&a;
// DON'T DO THAT

There, you are accessing the value of a, whose effective type is uint8x16_t, via an lvalue of type uint8x8x2_t. Note that it is not the cast that is forbidden, nor even, I'd argue, the dereferencing -- it is reading the dereferenced value so as to apply the side effect of the = operator.

Using memcpy()

Now, what about memcpy()? This is where it gets interesting. C permits the stored values of a and b to be accessed via lvalues of character type, and although its arguments are declared to have type void *, this is the only plausible interpretation of how memcpy() works. Certainly its description characterizes it as copying characters. There is therefore nothing wrong with performing a

memcpy(&b, &a, sizeof a);

Having done so, you may freely access the value of b via variable b, as already mentioned. There are aspects of doing so that could be problematic in a more general context, but there's no UB here.

However, contrast this with the superficially similar situation in which you want to put the converted value into dynamically-allocated space:

uint8x8x2_t *c = malloc(sizeof(*c));
memcpy(c, &a, sizeof a);

What could be wrong with that? Nothing is wrong with it, as far as it goes, but here you have UB if you afterward you try to access the value of *c. Why? because the memory to which c points does not have a declared type, therefore its effective type is the effective type of whatever was last stored in it (if that has an effective type), including if that value was copied into it via memcpy() (C2011 6.5/6). As a result, the object to which c points has effective type uint8x16_t after the copy, whereas the expression *c has type uint8x8x2_t; the strict aliasing rule says that accessing that object via that lvalue produces UB.




回答2:


It's completely legal in C++ to type pun via pointer casting, as long as you're only doing it to char*. This, not coincidentally, is what memcpy is defined as working on (technically unsigned char* which is good enough).

Kindly observe the following passage:

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char.

42 If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value. [Example:

#define N sizeof(T)
char buf[N];
T obj;
// obj initialized to its original value
std::memcpy(buf, &obj, N);
// between these two calls to std::memcpy,
// obj might be modified 
std::memcpy(&obj, buf, N);
// at this point, each subobject of obj of scalar type
// holds its original value

— end example ]

Put simply, copying like this is the intended function of std::memcpy. As long as the types you're dealing with meet the necessary triviality requirements, it's totally legit.

Strict aliasing does not include char* or unsigned char*- you are free to alias any type with these.

Note that for unsigned ints specifically, you have some very explicit leeway here. The C++ Standard requires that they meet the requirements of the C Standard. The C Standard mandates the format. The only way that trap representations or anything like that can be involved is if your implementation has any padding bits, but ARM does not have any- 8bit bytes, 8bit and 16bit integers. So for unsigned integers on implementations with zero padding bits, any byte is a valid unsigned integer.

For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2N−1, so that objects of that type shall be capable of representing values from 0 to 2N−1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.




回答3:


So there are a bunch of gotchas here. This reflects C++.

First you can convert trivially copyable data to char* or unsigned char* or c++17 std::byte*, then copy it from one location to another. The result is defined behavior. The values of the bytes are unspecified.

If you do this from a value of one one type to another via something like memcpy, this can result in undefined behaviour upon access of the target type unless the target type has valid values for all byte representations, or if the layout of the two types is specified by your compiler.

There is the possibility of "trap representations" in the target type -- byte combinations that result in machine exceptions or something similar if interpreted as a value of that type. Imagine a system that doesn't use IEEE floats and where doing math on NaN or INF or the like causes a segfault.

There are also alignment concerns.

In C, I believe that type punning via unions is legal, with similar qualifications.

Finally, note that under a strict reading of the c++ standard, foo* pf = (foo*)malloc(sizeof(foo)); is not a pointer to a foo even if foo was plain old data. You must create an object before interacting with it, and the only way to create an object outside of automatic storage is via new or placement new. This means you must have data of the target type before you memcpy into it.




回答4:


Do you know how to convert from uint8x16_t to uint8x8x2_t?

uint8x16_t input = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
uint8x8x2_t output = { vget_low_u8(input), vget_high_u8(input) };

One must understand that with neon intrinsics, uint8x16_t represents a 16-byte register; while uint8x8x2_t represents two adjacent 8-byte registers. For ARMv7 these may be the same thing (q0 == {d0, d1}) but for ARMv8 the register layout is different. It's necessary to get (extract) the low 8 bytes and the high 8 bytes of the single 16-byte register using two functions. The clang compiler will determine which instruction(s) are necessary based on the context.



来源:https://stackoverflow.com/questions/43521206/arm-neon-how-to-convert-from-uint8x16-t-to-uint8x8x2-t

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!