This post wraps up the series. So far we have seen some really simple examples where vectorization either happens or does not. But usually you have more complicated code. What should you do in that case, and how can you make use of your CPU's vectorization capabilities?
To best answer this question, I want to highlight the typical reasons why code is not vectorized and give guidelines for writing vectorizable code.
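As a small illustration of one typical reason, here is a sketch assuming C99 and GCC or Clang (the function names are hypothetical). Possible pointer aliasing can block vectorization: if the compiler cannot prove that the output array does not overlap the inputs, it has to add a runtime overlap check or keep the loop scalar. The C99 `restrict` qualifier is one way to tell the compiler that the arrays do not overlap.

```c
#include <stddef.h>

/* Without `restrict`, the compiler must assume `dst` may overlap `a` or `b`.
   It then either emits a runtime overlap check (loop versioning) or does not
   vectorize the loop at all. */
void add_may_alias(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* `restrict` promises that the arrays do not overlap, removing that obstacle.
   A simple counted loop with unit stride and no early exits is the shape
   that auto-vectorizers handle best. */
void add_no_alias(float *restrict dst, const float *restrict a,
                  const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Note that `restrict` is a promise made by the programmer: calling `add_no_alias` with overlapping arrays is undefined behavior.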
However, the main advice is: look at the compiler optimization reports to understand what the compiler did for you. If you measured and your code still…
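A minimal way to look at such reports, assuming GCC or Clang (the file name `report_demo.c` and the functions are hypothetical): compile the file below with the flags from its header comment and compare the remarks for the two loops. The first loop has independent iterations; the second carries a dependency from one iteration to the next, so it is typically reported as not vectorized.

```c
/* Build with one of these to see the vectorizer's decisions:
 *   gcc   -O3 -c report_demo.c -fopt-info-vec-all
 *   clang -O3 -c report_demo.c -Rpass=loop-vectorize \
 *         -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize
 * The exact wording of the remarks differs between compilers and versions. */
#include <stddef.h>

/* Independent iterations: expected to be reported as vectorized. */
void scale(float *restrict out, const float *restrict in, float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;
}

/* Each iteration reads the value written by the previous one (a loop-carried
   dependency), so this loop is expected to show up as missed. */
void running_sum(float *x, size_t n) {
    for (size_t i = 1; i < n; ++i)
        x[i] += x[i - 1];
}
```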
Some items from the two checklists below were taken from the Intel Compiler Autovectorization Guide. I really recommend it, even though it is slightly outdated.
Specifically, I want to point out that compilers can do all sorts of loop transformations to make vectorization possible. I recommend at least familiarizing yourself with the basic loop transformations. For example, the compiler can apply some of them to eliminate a loop-carried dependency, which in turn enables vectorization.
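Loop distribution (also called loop fission) is one such transformation. The sketch below applies it by hand to hypothetical functions: splitting the loop isolates the loop-carried dependency on `d` in its own loop, so the element-wise add into `a` becomes vectorizable. The split is legal here because the second statement only reads `a[i]` produced in the same iteration, and the first statement never reads `d`. Whether a particular compiler performs this split on its own depends on the compiler and its version.

```c
#include <stddef.h>

/* Original loop: the update of `d` depends on the previous iteration, and it
   shares the loop body with an independent element-wise add. */
void fused(float *restrict a, const float *restrict b, const float *restrict c,
           float *restrict d, size_t n) {
    for (size_t i = 1; i < n; ++i) {
        a[i] = b[i] + c[i];      /* independent per iteration */
        d[i] = d[i - 1] + a[i];  /* depends on iteration i - 1 */
    }
}

/* After loop distribution: the first loop has no loop-carried dependency and
   can be vectorized; only the second loop has to stay scalar. */
void distributed(float *restrict a, const float *restrict b,
                 const float *restrict c, float *restrict d, size_t n) {
    for (size_t i = 1; i < n; ++i)
        a[i] = b[i] + c[i];
    for (size_t i = 1; i < n; ++i)
        d[i] = d[i - 1] + a[i];
}
```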
This is a really nice article with lots of examples: Crunching numbers with AVX and AVX2. It is a good guide if you want to try writing vector intrinsics, and it has nice pictures of how particular hardware instructions work.
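For a taste of what intrinsics code looks like, here is a sketch of an element-wise float add using AVX intrinsics from `<immintrin.h>` (`_mm256_loadu_ps`, `_mm256_add_ps`, `_mm256_storeu_ps`). It assumes an x86-64 target and GCC or Clang, which provide the `target("avx")` function attribute and `__builtin_cpu_supports`; the function names `add_avx` and `add_floats` are hypothetical. It processes 8 floats per iteration, handles the remainder with a scalar loop, and falls back to plain C when AVX is not available.

```c
#include <stddef.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

/* Compiled for AVX through the target attribute, so the rest of the file does
   not need -mavx. Only call it after checking CPU support at runtime. */
__attribute__((target("avx")))
static void add_avx(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {             /* 8 floats per 256-bit register */
        __m256 va = _mm256_loadu_ps(a + i);  /* unaligned loads */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)                       /* scalar tail for leftovers */
        dst[i] = a[i] + b[i];
}
#endif

/* Dispatcher: uses the AVX path when the CPU supports it, plain C otherwise. */
void add_floats(float *dst, const float *a, const float *b, size_t n) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("avx")) {
        add_avx(dst, a, b, n);
        return;
    }
#endif
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

The runtime check keeps the binary usable on older CPUs; code built entirely with `-mavx` would skip it but then require AVX everywhere it runs.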
The Vectorization codebook gives a rather high-level view of the topic, with links to more detailed documents.