neon

How to check the CPU ID on the IMX6 series

谁说胖子不能爱 submitted on 2019-12-04 03:40:39
To view the CPU's ID information, run the cat /proc/cpuinfo command. (The output below was produced on a (电鱼) SAIL-IMX6 board.) Steps, on the Imx6dl:

    root@imx6qdlsolo:~# cat /proc/cpuinfo
    processor : 0
    model name : ARMv7 Processor rev 10 (v7l)
    BogoMIPS : 6.00
    Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
    CPU implementer : 0x41
    CPU architecture: 7
    CPU variant : 0x2
    CPU part : 0xc09
    CPU revision : 10
    processor : 1
    model name : ARMv7 Processor rev 10 (v7l)
    BogoMIPS : 6.00
    Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
    CPU implementer : 0x41
    CPU architecture: 7
    CPU variant : 0x2
    CPU part : 0xc09
    CPU revision : 10
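The Features line in this output is also where NEON support shows up. As a minimal sketch, assuming the text of /proc/cpuinfo has already been read into a NUL-terminated buffer, a check for a given flag could look like the following. The helper name cpuinfo_has_feature is hypothetical and not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `flag` appears as a whole
 * whitespace-separated token on any "Features" line of
 * /proc/cpuinfo-style text. */
bool cpuinfo_has_feature(const char *text, const char *flag)
{
    assert(text != NULL && flag != NULL);
    size_t flag_len = strlen(flag);
    if (flag_len == 0)
        return false;

    const char *line = text;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len >= 8 && strncmp(line, "Features", 8) == 0) {
            /* Tokens start after the ':' separator. */
            const char *colon = memchr(line, ':', len);
            const char *stop = line + len;
            const char *p = colon ? colon + 1 : stop;
            while (p < stop) {
                while (p < stop && (*p == ' ' || *p == '\t'))
                    p++;                      /* skip separators */
                const char *tok = p;
                while (p < stop && *p != ' ' && *p != '\t')
                    p++;                      /* scan one token */
                if ((size_t)(p - tok) == flag_len &&
                    strncmp(tok, flag, flag_len) == 0)
                    return true;
            }
        }
        line = end ? end + 1 : line + len;
    }
    return false;
}
```

Calling cpuinfo_has_feature(buffer, "neon") on the output above would match the neon token. Matching whole tokens avoids confusing vfp with vfpv3.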

Fast ARM NEON memcpy

谁说我不能喝 submitted on 2019-12-03 16:34:30
I want to copy an image on an ARMv7 core. The naive implementation is to call memcpy per line.

    for(i = 0; i < h; i++) {
        memcpy(d, s, w);
        s += sp;
        d += dp;
    }

I know that d, dp, s, sp and w are all 32-byte aligned, so my next (still quite naive) implementation was along the lines of

    for (int i = 0; i < h; i++) {
        uint8_t* dst = d;
        const uint8_t* src = s;
        int remaining = w;
        asm volatile (
            "1: \n"
            "subs %[rem], %[rem], #32 \n"
            "vld1.u8 {d0, d1, d2, d3}, [%[src],:256]! \n"
            "vst1.u8 {d0, d1, d2, d3}, [%[dst],:256]! \n"
            "bgt 1b \n"
            : [dst]"+r"(dst), [src]"+r"(src), [rem]"+r"(remaining)
            :
            :
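For reference, the per-line memcpy baseline from the question can be written as a self-contained function. This is only a sketch: the name copy_image and the use of size_t for the strides are assumptions, and the single-memcpy fast path for tightly packed images is an addition that is not in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies h rows of w bytes each.
 * sp and dp are the source and destination strides in bytes. */
void copy_image(uint8_t *d, size_t dp,
                const uint8_t *s, size_t sp,
                size_t w, size_t h)
{
    assert(w <= sp && w <= dp);

    /* Added fast path (an assumption, not from the question):
     * if neither image has row padding, one memcpy covers it all. */
    if (sp == w && dp == w) {
        memcpy(d, s, w * h);
        return;
    }

    /* General case: one memcpy per row, indexed so that no pointer
     * is advanced past the end of either buffer. */
    for (size_t i = 0; i < h; i++)
        memcpy(d + i * dp, s + i * sp, w);
}
```

A baseline like this is useful for checking that a hand-written NEON loop produces identical output before comparing speed.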

ARM NEON vectorization failure

余生颓废 submitted on 2019-12-03 15:51:24
I would like to enable NEON vectorization on my ARM Cortex-A9, but I get this output at compile time:

    "not vectorized: relevant stmt not supported: D.14140_82 = D.14143_77 * D.14141_81"

Here is my loop:

    void my_mul(float32_t * __restrict data1, float32_t * __restrict data2, float32_t * __restrict out){
        for(int i=0; i<SIZE*4; i+=1){
            out[i] = data1[i]*data2[i];
        }
    }

And the options used at compile time:

    -march=armv7-a -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -mvectorize-with-neon-quad -ftree-vectorizer-verbose=2

I am using the arm-linux-gnueabi (v4.6) compiler. It is important to note
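One documented reason GCC may decline to vectorize a float loop like this on 32-bit ARM: the GCC manual's entry for -mfpu notes that, with NEON, floating-point operations are not generated by the auto-vectorization pass unless -funsafe-math-optimizations is also specified, because NEON there does not fully implement IEEE 754 (denormal values are treated as zero). Whether that is the cause in this particular case is an assumption, since the excerpt is cut off. The sketch below is a portable version of the loop to experiment with; the name mul_arrays and the explicit length parameter n are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise product of two float arrays.
 * `restrict` promises the compiler that the three arrays do not
 * overlap, which is one of the things a vectorizer needs to know. */
void mul_arrays(const float *restrict a,
                const float *restrict b,
                float *restrict out,
                size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}
```

Compiling this with the options from the question, once with and once without -funsafe-math-optimizations, and comparing the vectorizer report would show whether that flag makes the difference on a given toolchain.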

Divide by floating-point number using NEON intrinsics

泪湿孤枕 submitted on 2019-12-03 15:46:40
Question: I'm processing an image four pixels at a time on an ARMv7 device for an Android application. I want to divide a float32x4_t vector by another vector, but its values range from about 0.7 to 3.85, and it seems to me that the only way to divide is a right shift, which only works for divisors of the form 2^n. Also, I'm new to this, so any constructive help or comment is welcome. Example: How can I perform these operations with NEON intrinsics? float32x4_t a = {25.3,34.1,11.0,25.1};

How to use the multiply and accumulate intrinsics in ARM Cortex-a8?

僤鯓⒐⒋嵵緔 submitted on 2019-12-03 12:50:59
Question: How do I use the multiply-accumulate intrinsics provided by GCC?

    float32x4_t vmlaq_f32 (float32x4_t , float32x4_t , float32x4_t);

Can anyone explain which three parameters I have to pass to this function? I mean the source and destination registers, and what the function returns. Help!!!

Answer 1: Simply put, the vmla instruction does the following:

    struct { float val[4]; } float32x4_t

    float32x4_t vmla (float32x4_t a, float32x4_t b, float32x4_t c) {
        float32x4 result;
        for (int i=0; i<4; i++) {
            result
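Because the answer's pseudocode is cut off, here is a scalar model of what vmlaq_f32(a, b, c) computes: each lane of the result is a + b * c, so the first argument acts as the accumulator. The type f32x4_model and both function names are hypothetical stand-ins so that the sketch compiles without <arm_neon.h>; it models the semantics and is not the real intrinsic.

```c
#include <assert.h>

/* Stand-in for float32x4_t: four float lanes. */
typedef struct { float val[4]; } f32x4_model;

/* Scalar model of vmlaq_f32(a, b, c): per lane, a + b * c. */
f32x4_model vmlaq_f32_model(f32x4_model a, f32x4_model b, f32x4_model c)
{
    f32x4_model r;
    for (int i = 0; i < 4; i++)
        r.val[i] = a.val[i] + b.val[i] * c.val[i];
    return r;
}

/* Typical use: a dot product that accumulates four lanes at a time.
 * n must be a multiple of 4 in this simplified sketch. */
float dot_model(const float *x, const float *y, int n)
{
    assert(n >= 0 && n % 4 == 0);
    f32x4_model acc = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int i = 0; i < n; i += 4) {
        f32x4_model vx = {{x[i], x[i + 1], x[i + 2], x[i + 3]}};
        f32x4_model vy = {{y[i], y[i + 1], y[i + 2], y[i + 3]}};
        acc = vmlaq_f32_model(acc, vx, vy);   /* acc += vx * vy */
    }
    /* Horizontal sum of the four partial sums. */
    return acc.val[0] + acc.val[1] + acc.val[2] + acc.val[3];
}
```

With the real intrinsic, the loop body would be acc = vmlaq_f32(acc, vld1q_f32(x + i), vld1q_f32(y + i)), with the return value assigned back to the accumulator in the same way.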

SIMD optimization of cvtColor using ARM NEON intrinsics

流过昼夜 submitted on 2019-12-03 12:09:26
I'm working on a SIMD optimization of BGR to grayscale conversion which is equivalent to OpenCV's cvtColor() function. There is an Intel SSE version of this function and I'm referring to it. (What I'm doing is basically converting SSE code to NEON code.) I've almost finished writing the code, and can compile it with g++, but I can't get the proper output. Does anyone have any ideas what the error could be?

What I'm getting (incorrect):
What I should be getting:

Here's my code:

    #include <opencv/cv.hpp>
    #include <opencv/highgui.h>
    #include <arm_neon.h>
    //#include <iostream>
    using namespace std;
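When a SIMD port produces wrong output, a plain scalar reference to compare against pixel by pixel often narrows the bug down. The sketch below is such a reference, assuming interleaved 8-bit BGR input and the usual luma weights Y = 0.299 R + 0.587 G + 0.114 B. The 8-bit fixed-point constants (77, 150, 29) and the function name bgr_to_gray are assumptions made for illustration; OpenCV uses its own fixed-point constants, so results may differ slightly from cvtColor().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts n interleaved BGR pixels to 8-bit gray.
 * Weights are scaled by 256: 29 (B) + 150 (G) + 77 (R) = 256,
 * and 128 is added before the shift to round to nearest. */
void bgr_to_gray(const uint8_t *bgr, uint8_t *gray, size_t n)
{
    assert(bgr != NULL && gray != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned b = bgr[3 * i];
        unsigned g = bgr[3 * i + 1];
        unsigned r = bgr[3 * i + 2];
        gray[i] = (uint8_t)((29u * b + 150u * g + 77u * r + 128u) >> 8);
    }
}
```

Note the channel order: in BGR data the blue byte comes first, so swapping the first and third weights is an easy mistake to make when porting code written for RGB.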

iPhone detecting processor model / NEON support

微笑、不失礼 submitted on 2019-12-03 07:34:25
Question: I'm looking for a way to differentiate at runtime between devices equipped with the new ARM processor (such as the iPhone 3GS and some 3G iPods) and devices equipped with the old ARM processors. I know I can use uname() to determine the device model, but as only some of the 3G iPod touches received a boost in their ARM processor, this isn't enough. Therefore, I'm looking for one of these: A way of detecting processor model - I suppose there's none. A way of determining whether ARM neon

Optimizing RGBA8888 to RGB565 conversion with NEON

蹲街弑〆低调 submitted on 2019-12-03 06:54:56
I'm trying to optimize an image format conversion on iOS using the NEON vector instruction set. I assumed this would map well to NEON because it processes a bunch of similar data. My attempts haven't gone that well, though, achieving only a marginal speedup over the naive C implementation:

    for(int i = 0; i < pixelCount; ++i, ++inPixel32) {
        const unsigned int r = ((*inPixel32 >> 0 ) & 0xFF);
        const unsigned int g = ((*inPixel32 >> 8 ) & 0xFF);
        const unsigned int b = ((*inPixel32 >> 16) & 0xFF);
        *outPixel16++ = ((r >> 3) << 11) | ((g >> 2) << 5) | ((b >> 3) << 0);
    }

1 megapixel image array on iPad
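One widely used NEON pattern for this packing is to de-interleave the channels with vld4_u8, widen each channel into the top byte of a 16-bit lane with vshll_n_u8, and merge them with vsriq_n_u16 (shift right and insert). The sketch below is an illustration of that pattern and not the asker's code: the function names are hypothetical, the input is taken as bytes in R, G, B, A memory order (which matches the question's 32-bit reads on a little-endian CPU), and the NEON path is only compiled when __ARM_NEON is defined, with the scalar loop handling the remainder and non-ARM builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Packs one pixel the same way as the scalar loop in the question. */
static inline uint16_t pack565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* in:  n pixels stored as bytes R, G, B, A
 * out: n RGB565 values */
void rgba8888_to_rgb565(const uint8_t *in, uint16_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 8 <= n; i += 8) {
        /* Load 8 pixels and split them into R, G, B, A vectors. */
        uint8x8x4_t px = vld4_u8(in + 4 * i);
        /* R into bits 15..8 of each 16-bit lane. */
        uint16x8_t v = vshll_n_u8(px.val[0], 8);
        /* Keep the top 5 bits (R), insert G below them. */
        v = vsriq_n_u16(v, vshll_n_u8(px.val[1], 8), 5);
        /* Keep the top 11 bits (R, G), insert B below them. */
        v = vsriq_n_u16(v, vshll_n_u8(px.val[2], 8), 11);
        vst1q_u16(out + i, v);
    }
#endif
    /* Scalar tail, and the whole job when NEON is unavailable. */
    for (; i < n; i++)
        out[i] = pack565(in[4 * i], in[4 * i + 1], in[4 * i + 2]);
}
```

Processing eight pixels per iteration with de-interleaving loads avoids the per-pixel shift-and-mask work. Whether it beats the scalar loop by much on a given device also depends on memory bandwidth, so it is worth measuring.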

Divide by floating-point number using NEON intrinsics

邮差的信 submitted on 2019-12-03 05:22:49
I'm processing an image four pixels at a time on an ARMv7 device for an Android application. I want to divide a float32x4_t vector by another vector, but its values range from about 0.7 to 3.85, and it seems to me that the only way to divide is a right shift, which only works for divisors of the form 2^n. Also, I'm new to this, so any constructive help or comment is welcome. Example: How can I perform these operations with NEON intrinsics?

    float32x4_t a = {25.3,34.1,11.0,25.1};
    float32x4_t b = {1.2,3.5,2.5,2.0};
    // something like this
    float32x4 resultado = a/b; // {21.08,9.74,4.4
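ARMv7 NEON has no vector divide instruction, so a common approach is to multiply by an approximate reciprocal: vrecpeq_f32 gives a rough estimate of 1/b, and each vrecpsq_f32 step (which returns 2 - b*r) refines it by Newton-Raphson iteration. The sketch below assumes that approach; the function name div_f32 is hypothetical, the result is close to but not bit-identical with true division, and the NEON path is compiled only when __ARM_NEON is defined, falling back to plain C otherwise.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = a[i] / b[i] for n floats; every b[i] must be non-zero. */
void div_f32(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        /* Low-precision estimate of 1/b. */
        float32x4_t r = vrecpeq_f32(vb);
        /* Two Newton-Raphson refinements: r = r * (2 - b * r). */
        r = vmulq_f32(r, vrecpsq_f32(vb, r));
        r = vmulq_f32(r, vrecpsq_f32(vb, r));
        /* a / b is approximated by a * (1/b). */
        vst1q_f32(out + i, vmulq_f32(va, r));
    }
#endif
    /* Scalar tail, and the whole job when NEON is unavailable. */
    for (; i < n; i++)
        out[i] = a[i] / b[i];
}
```

With the values from the question, the results come out near 21.08, 9.74, 4.4 and 12.55. On AArch64 the vdivq_f32 intrinsic performs the division directly, so the estimate-and-refine sequence is mainly relevant to 32-bit ARMv7 targets.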

what is the fastest FFT library for iOS/Android ARM devices? [closed]

时间秒杀一切 submitted on 2019-12-03 04:24:38
Question: What is the fastest FFT library for iOS/Android ARM devices? And what library do people typically use on iOS/Android platforms? I'm guessing vDSP is the library most frequently used on iOS. EDIT: my code is at http://anthonix.com/ffts and uses the BSD license. It runs on