Saving $12K on modeling jobs in AWS

Is your AWS bill going through the roof, and you don't know what steps to take? Although plenty of opinion leaders have voiced the somewhat flawed claim that "AWS is cheap", the reality is different: costs need careful monitoring, because getting extra resources has become as simple as pushing a button.

AWS is still expensive:

AWS has lowered its prices repeatedly over the past years (42 times as of last year, according to AWS) in a race to the bottom with Google Cloud. That said, a large on-demand instance can still cost up to $17K a year, as the graph below shows:

Modeling jobs are "peak demand":

Modeling jobs need lots of resources for a short amount of time, days at the most. Consequently, they should be treated as peak demand, which makes them among the first candidates for Elastic Computing. Plus, a modeling job can easily be bundled as a "Data + Instructions" package, making it easy to outsource to another server.

StarCluster to the rescue:

StarCluster is an open-source utility developed at MIT. It was first built for resource-hungry MIT students running simulation jobs but has now migrated to the Cloud, AWS in particular.

In practice, StarCluster allows the user to spin up 100%-ready clusters on demand. The installation of required packages (password-less SSH and Network File System on the infrastructure side, OpenMPI and OpenBLAS on the distributed computing side) is handled seamlessly by StarCluster, so that the user can focus on the high-value tasks. Only 15-20 minutes are needed to get a cluster of any size ready to crunch data!

Besides, the organization can put the power back into the users' hands by letting them set up their own cluster on demand through a simple configuration file provided by StarCluster:
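
To give a rough idea of what such a configuration file can look like, here is a minimal Python sketch that generates one. The section and option names follow the StarCluster config template as far as I recall it. The helper `build_starcluster_config`, the cluster name, the key name, the instance type and the AMI ID are all hypothetical placeholders, so treat the output as a starting point rather than a ready-to-use file.

```python
import configparser
import io


def build_starcluster_config(cluster_name, size, instance_type, ami_id, key_name):
    """Return the text of a minimal StarCluster-style config file.

    Hypothetical helper for illustration; it is not part of StarCluster.
    """
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names upper-case as written

    # Cluster template used when none is named on the command line
    config["global"] = {"DEFAULT_TEMPLATE": cluster_name}

    # Credentials are placeholders; never commit real keys
    config["aws info"] = {
        "AWS_ACCESS_KEY_ID": "<your-access-key-id>",
        "AWS_SECRET_ACCESS_KEY": "<your-secret-access-key>",
        "AWS_USER_ID": "<your-aws-user-id>",
    }

    # EC2 key pair used for SSH access (path is an assumption)
    config[f"key {key_name}"] = {"KEY_LOCATION": f"~/.ssh/{key_name}.rsa"}

    # The cluster template itself: how many nodes, and of which type
    config[f"cluster {cluster_name}"] = {
        "KEYNAME": key_name,
        "CLUSTER_SIZE": str(size),
        "CLUSTER_USER": "sgeadmin",
        "NODE_IMAGE_ID": ami_id,
        "NODE_INSTANCE_TYPE": instance_type,
    }

    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Example values are placeholders, not recommendations
config_text = build_starcluster_config(
    cluster_name="modeling",
    size=8,
    instance_type="c3.4xlarge",
    ami_id="ami-00000000",
    key_name="mykey",
)
print(config_text)
```

Saved to StarCluster's config location (`~/.starcluster/config` by default, if I remember correctly), a file like this lets each user change the cluster size or instance type by editing one or two lines before launching with `starcluster start`.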


All things considered, the winning formula would be to down-scale the current infrastructure and transfer the big modeling jobs to StarCluster. For instance, instead of keeping a 4xlarge running, it would make sense to keep only one X-Large reserved instance for the day-to-day operations and offload the rest to a StarCluster setup. In the process, you would save up to $12K in expenditures.
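
The arithmetic behind that kind of saving can be sketched in a few lines of Python. Only the $17K yearly figure comes from this post. The reserved-instance cost, the burst hourly rate, the node count and the job frequency below are hypothetical placeholders picked to keep the numbers easy to follow, not actual AWS prices, so plug in your own figures.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def always_on_annual_cost(hourly_rate):
    """Yearly cost of one instance that runs around the clock."""
    return hourly_rate * HOURS_PER_YEAR


def burst_annual_cost(hourly_rate, nodes, hours_per_job, jobs_per_year):
    """Yearly cost of clusters that only exist while a modeling job runs."""
    return hourly_rate * nodes * hours_per_job * jobs_per_year


def annual_savings(current_annual, reserved_annual, burst_annual):
    """Savings from replacing the always-on setup with reserved + burst."""
    return current_annual - (reserved_annual + burst_annual)


# From the post: a large on-demand instance can cost up to $17K a year,
# which implies roughly this hourly rate when running non-stop.
current_annual = 17_000.0
implied_hourly = current_annual / HOURS_PER_YEAR

# Hypothetical inputs (assumptions, not AWS prices)
reserved_annual = 3_000.0  # smaller reserved instance for daily work
burst = burst_annual_cost(
    hourly_rate=2.0,   # per node, while the cluster is up
    nodes=4,
    hours_per_job=24,  # modeling jobs last days at the most
    jobs_per_year=10,
)

savings = annual_savings(current_annual, reserved_annual, burst)
print(f"Implied on-demand rate: ${implied_hourly:.2f}/hour")
print(f"Burst cost per year:    ${burst:,.0f}")
print(f"Estimated savings:      ${savings:,.0f}")
```

The point of the sketch is the shape of the comparison: the always-on cost scales with 8,760 hours a year, while the burst cost scales only with the hours the modeling jobs actually run.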

Also noteworthy: StarCluster comes with Python pre-installed (if you're more of a Python fan).

