Some doubts in optimizing the neon code

给你一囗甜甜゛ submitted on 2019-12-23 05:44:17

Question


I wrote some NEON code in assembly, aiming for maximum optimization. Although the numbers seem satisfactory, I wanted to understand whether it could be optimized further. I then came across an online tool that counts the cycles of each instruction.

Here is the link to my code: http://pulsar.webshaker.net/ccc/sample-115d4c29

It clearly marked the areas of concern, but I could not understand why those statements incur overhead.

The code is divided into 7 sections, marked in the 'comment' area, to make it easier to refer to.

Thanks in advance. :)


Answer 1:


You can try this link:

http://pulsar.webshaker.net/ccc/beta-sample-115d4c29

This uses the beta version 0.9 of the cycle counter. The main difference is that the NEON simulator no longer uses 2 distinct pipelines, because the Cortex-A9 can't execute 2 NEON instructions in one cycle.

I have started to update some parts of the cycle counter.

The result is:

- The cycle information is more accurate for the Cortex-A9.

- The result is easier to read, because most of the NEON latency information is due to unpaired instructions.

Orange means latency due to waiting for the pipeline.

Red means latency due to a register conflict.

The number specified next to the register is not the number of lost cycles. It is the maximum number of instructions you could place before this instruction.

I hope that helps!



Source: https://stackoverflow.com/questions/8533833/some-doubts-in-optimizing-the-neon-code
