Verilog for loops - synthetization

Submitted by 不羁岁月 on 2019-12-11 08:07:46

Question


I am pretty new to Verilog, but I would like to understand it properly. Currently I am building a TxRx on an FPGA. I noticed that my code consumes a huge amount of logic, although it should not, so I did not write my code properly. I know where the mistake is: my for loop is obviously parallelizing the expressions (especially because this for loop is nested inside another for loop). What is the right way to write this code to avoid that? The code works, but it is not efficient. Feel free to comment and suggest; I am still learning, so any advice will help. Thank you in advance.


Answer 1:


Each line of your inner loop has three multiplications on data and an addition, as well as some smaller operations (e.g. %16). Synthesis tools unroll loops and try to build logic that performs all of these operations in a single clock cycle, which adds up to 6*256 (1,536) multiplications. This takes a lot of area and leaves very little room for resource sharing.
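To make the unrolling concrete, here is a minimal sketch under stated assumptions: the asker's code is not shown, so the module `dot_unrolled`, its ports, and the dot-product body (one multiplication per iteration) are hypothetical stand-ins. Because the `for` loop sits in a combinational `always @*` block, a synthesizer expands it into N multipliers and a chain of adders that must all settle within one clock cycle.

```verilog
// Hypothetical stand-in for the loop in the question: an N-term dot product.
// The for loop is inside a combinational always block, so synthesis unrolls
// it into N multipliers plus a chain of adders, all evaluated within a
// single clock cycle.
module dot_unrolled #(
    parameter N = 16,  // loop iterations = number of multipliers built
    parameter W = 8    // operand width in bits
) (
    input  wire [N*W-1:0]           a_flat,  // N operands packed into one vector
    input  wire [N*W-1:0]           b_flat,
    output reg  [2*W+$clog2(N)-1:0] sum      // wide enough for N full-scale products
);
    integer i;

    always @* begin
        sum = 0;
        for (i = 0; i < N; i = i + 1)
            sum = sum + a_flat[i*W +: W] * b_flat[i*W +: W];
    end
endmodule
```

The loop variable only controls how much hardware is replicated, not how many cycles are spent. Nesting multiplies the replication: a 16-iteration loop inside another 16-iteration loop unrolls into 256 copies of its body.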

You can trade off some performance for area. I would try the following:

  • Implement one iteration of the loop per clock cycle: compute that iteration, store the results in registers, then use them in the next clock cycle. This reduces the area roughly 256 times, but it takes 256 clock cycles to finish, i.e., you can accept a new input only every 256 clock cycles. You can experiment with different numbers of iterations per clock cycle. For example, you can compute one iteration of your outer loop in each cycle. This reduces your area by about 16 times, and each calculation takes 16 clock cycles.

  • If performance is of high importance, you can try pipelining your circuit. This makes your code a bit more complex, but it significantly increases your throughput. For example, you can have 256 stages plus the area overhead of the pipeline registers, but your clock period can be up to 256 times shorter. Again, you can experiment with different numbers of pipeline stages and choose the one that best fits your needs.
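The first option (fewer iterations per clock cycle) might look like the following sketch. It is hypothetical and reuses a dot-product stand-in for the asker's loop: the parameter `LANES` sets how many products are computed per cycle, so the circuit holds `LANES` multipliers and needs `N/LANES` cycles per result. `LANES = 1` corresponds to one iteration per cycle, while `LANES = N` behaves like the fully unrolled version. The module name and the `start`/`busy`/`done` handshake are assumptions for illustration, not part of the original answer.

```verilog
// Hypothetical sequential dot product: LANES products per clock cycle.
// The design contains LANES multipliers and needs N/LANES cycles per result.
// LANES = N behaves like the fully unrolled version; LANES = 1 is smallest.
module dot_iterative #(
    parameter N     = 16,  // total number of products
    parameter W     = 8,   // operand width in bits
    parameter LANES = 1    // products per cycle; assumed to divide N
) (
    input  wire                     clk,
    input  wire                     rst,     // synchronous reset
    input  wire                     start,   // begin a new calculation when idle
    input  wire [N*W-1:0]           a_flat,  // N operands packed into one vector
    input  wire [N*W-1:0]           b_flat,
    output reg  [2*W+$clog2(N)-1:0] sum,     // final value valid when done is high
    output reg                      busy,
    output reg                      done     // one-cycle pulse
);
    localparam STEPS = N / LANES;
    localparam SW    = 2*W + $clog2(N);

    reg [N*W-1:0]             a_reg, b_reg;  // operands held while iterating
    reg [$clog2(STEPS+1)-1:0] step;          // counts 0 .. STEPS-1
    reg [SW-1:0]              partial;       // sum of this cycle's LANES products
    integer j;

    // Still an unrolled loop, but its bound is LANES rather than N,
    // so only LANES multipliers are described.
    always @* begin
        partial = 0;
        for (j = 0; j < LANES; j = j + 1)
            partial = partial + a_reg[j*W +: W] * b_reg[j*W +: W];
    end

    always @(posedge clk) begin
        done <= 1'b0;
        if (rst) begin
            busy <= 1'b0;
            sum  <= 0;
        end else if (start && !busy) begin
            a_reg <= a_flat;
            b_reg <= b_flat;
            sum   <= 0;
            step  <= 0;
            busy  <= 1'b1;
        end else if (busy) begin
            sum   <= sum + partial;
            // Move the next LANES operands into the low positions.
            a_reg <= a_reg >> (LANES*W);
            b_reg <= b_reg >> (LANES*W);
            step  <= step + 1'b1;
            if (step == STEPS - 1) begin
                busy <= 1'b0;
                done <= 1'b1;
            end
        end
    end
endmodule
```

Shifting the operand registers each cycle means every multiplier always reads the same low-order bits, which avoids a wide multiplexer indexed by the step counter.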
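The pipelining option can be sketched in the same way, again as a hypothetical example built on a dot-product stand-in: stage k adds product k to the partial sum from stage k-1 and passes the remaining operands along. Each stage holds one multiplier and one adder between registers, so the critical path is a single multiply-add instead of N of them, and a new input can be accepted every cycle, at the cost of N register stages and a latency of N cycles. The module name and the `rst`, `in_valid` and `out_valid` signals are assumed for illustration.

```verilog
// Hypothetical pipelined dot product: stage k adds product k to the running
// sum from stage k-1. Every stage is one multiply-add between registers, so
// a new input can enter each cycle and its result appears N cycles later.
module dot_pipelined #(
    parameter N = 16,  // number of stages (one per product)
    parameter W = 8    // operand width in bits
) (
    input  wire                     clk,
    input  wire                     rst,       // synchronous reset of the valid bits
    input  wire                     in_valid,
    input  wire [N*W-1:0]           a_flat,    // N operands packed into one vector
    input  wire [N*W-1:0]           b_flat,
    output wire [2*W+$clog2(N)-1:0] sum,
    output wire                     out_valid
);
    localparam SW = 2*W + $clog2(N);

    reg [SW-1:0]  acc    [0:N-1];  // partial sum after stage k
    reg [N*W-1:0] a_pipe [0:N-1];  // remaining operands, next one in the low W bits
    reg [N*W-1:0] b_pipe [0:N-1];
    reg           v      [0:N-1];  // valid bit travelling with the data

    // Stage 0 consumes operand 0 straight from the inputs.
    always @(posedge clk) begin
        acc[0]    <= a_flat[0 +: W] * b_flat[0 +: W];
        a_pipe[0] <= a_flat >> W;
        b_pipe[0] <= b_flat >> W;
        v[0]      <= rst ? 1'b0 : in_valid;
    end

    // Stages 1..N-1: one multiplier and one adder each.
    genvar k;
    generate
        for (k = 1; k < N; k = k + 1) begin : stage
            always @(posedge clk) begin
                acc[k]    <= acc[k-1] + a_pipe[k-1][0 +: W] * b_pipe[k-1][0 +: W];
                a_pipe[k] <= a_pipe[k-1] >> W;
                b_pipe[k] <= b_pipe[k-1] >> W;
                v[k]      <= rst ? 1'b0 : v[k-1];
            end
        end
    endgenerate

    assign sum       = acc[N-1];
    assign out_valid = v[N-1];
endmodule
```

All N multipliers are still present, so this buys throughput and clock frequency rather than area; fewer, wider stages are the knob for experimenting with the number of pipeline stages.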

Here is an example of implementing an iterative algorithm either in a single clock cycle or in multiple clock cycles (see the simple_mult module).



Source: https://stackoverflow.com/questions/29148206/verilog-for-loops-synthetization
