Playing around with Vectorization
While I'm on paternity leave, I'm enjoying a bit of time off to write a blog article. Here we go!
Vectorization of code is one of the key tenets of performance computing. Unfortunately, these aspects are quite hidden in high-level programming language (Python/R/SQL) and Data Scientists are unaware of the inner workings. As a matter of fact, even Software Engineer at Tech companies rarely go that low.
This post aims at rediscovering them. Let's dive in!
0- Baseline approach
- For each user i
- For each user j
- if i and j are not already friends
- Check if there is a user c such as c is both friends with i and j
1- Bitset approach
2- Bitpacking + SIMD approach
- One can check 64 relationships at the time doing a bitwise AND between two ULL
- Furthermore, the 3rd loop is nicely set up as a SIMD reduction (running vectorized AND and reducing with vectorized OR).
Code in git: link