Showing posts from August, 2021

Should you ship this feature?

Introduction

During my time at tech companies, one of the hardest tasks for my team has been guiding the Product team in its decision to launch a feature. The decision is both highly consequential and fraught with pitfalls. Let's take an example to ease into the topic: you worked hard to convince leadership to measure the effect of a cool new feature, and you got a nice experiment set up according to the canons of measurement. After 4 weeks, you are seeing a statistically significant improvement in your primary metric and no secondary metrics affected negatively. The story is clear as day: you should tell the team to ship it, right?

The problem with shipping features

Once the results are out, the focus is on the outcomes, and leadership is incentivized to ship. As a result, little attention is given to the cost side of the equation:

Tech debt: Does this feature make the system more complex? Does it hinder long-term growth by exacting a tax on every new feature? Think about the Net Present Value of such t

Playing around with Vectorization

While I'm on paternity leave, I'm enjoying a bit of time off to write a blog article. Here we go! Vectorization of code is one of the key tenets of performance computing. Unfortunately, these aspects are quite hidden in high-level programming languages (Python/R/SQL), and Data Scientists are often unaware of the inner workings. As a matter of fact, even Software Engineers at tech companies rarely go that low. This post aims at rediscovering them. Let's dive in!

Problem Statement

Let's take a simple problem: a social media company wants to send invites to folks who have a 2nd degree connection (i.e. at least 1 friend in common) but are not friends yet. The problem gives you an N x N matrix containing bits (1=friend). How many emails would you send?

Context: There are better ways to solve this problem (for instance, by storing the indices of the friends as opposed to all bits). Here, we are taking the data format as a constraint.

Implementations

0- Baseline approach

We start w
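To make the problem statement concrete, here is a minimal Python sketch of one way to count the invites. It is not the post's baseline implementation. It assumes the matrix is symmetric, and it counts each unordered pair once (if both people in a pair receive an email, double the result). The function name `count_invite_pairs` and the sample matrix are made up for illustration. Each row is packed into a single Python integer so that one bitwise AND checks all N bits for a common friend at once, which hints at the idea behind vectorization.

```python
def count_invite_pairs(friends):
    """Count unordered pairs (i, j) that are not friends yet
    but share at least one friend.

    Assumption: `friends` is a symmetric N x N list of 0/1 rows.
    """
    n = len(friends)
    # Pack each row into one integer: bit k of rows[i] is friends[i][k].
    rows = [sum(bit << k for k, bit in enumerate(row)) for row in friends]
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if friends[i][j]:
                continue  # already friends, no invite needed
            # A non-zero AND means at least one friend in common.
            if rows[i] & rows[j]:
                pairs += 1
    return pairs


# Hypothetical sample: a chain 0 - 1 - 2 - 3.
friends = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(count_invite_pairs(friends))  # pairs (0, 2) and (1, 3) -> 2
```

The double loop over pairs is still plain Python; only the inner "any friend in common?" check is done on all bits at once.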