Scala: The Bridge Language

The Fragmented World of Languages

Data Science has changed along every dimension: applications, algorithms and techniques, software and, of course, languages. Languages tell a fascinating story because they reflect the nature and the state of mind of their practitioners. Not surprisingly, they have changed a lot over the years.

When I first started getting interested in Data Science, SAS and Matlab were still sure bets. A few years in (2013 or so), R became the lingua franca around me: it was easy to code and understand, and its vectorized calculations backed by the data frame API made it very practical for non-CS practitioners (read: statisticians and engineering generalists like myself). It did away with a lot of the lower-level considerations and offered a simpler interface, predictably at the expense of the CS crowd, who loathed such abstractions.

Today, I think we are at another turning point, with the pendulum swinging in the opposite direction: the languages that used to cater to the Computer Science community are becoming increasingly easy to use, the most prominent of them being Martin Odersky's Scala. Let's dive in.


Scala: the Bridge Language

My personal experience with Scala goes back to 2015, when I wanted to better understand Apache Spark. The language is touted as a user-friendly alternative to Java (both run on the JVM) that also has lots of additional features like macros and Functional Programming constructs. Right from the get-go, the language stands out for its ease of use (the REPL really helps) and its clean syntax.
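
To give a flavor of that syntax, here is a minimal sketch you could paste into a Scala file or adapt for the REPL. The `Trip` type and the `meanDistanceByCity` function are made up for illustration and do not come from any library.

```scala
// Hypothetical record type, used only for illustration.
final case class Trip(city: String, distanceKm: Double)

object TripStats {
  // Mean trip distance per city, written as a chain of
  // collection transformations instead of explicit loops.
  def meanDistanceByCity(trips: Seq[Trip]): Map[String, Double] =
    trips
      .groupBy(_.city)
      .map { case (city, ts) => city -> ts.map(_.distanceKm).sum / ts.size }
}
```

Calling `TripStats.meanDistanceByCity(Seq(Trip("SF", 2.0), Trip("SF", 4.0), Trip("NYC", 3.0)))` gives a map with a mean of 3.0 for both cities. The whole thing is statically typed, yet it reads much like the data frame code a Data Scientist would write.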

Scala is unique because it can fulfill the needs of both Programmers and Data Scientists. It sits very much at the junction of the two worlds, which is why I call it a "Bridge Language". Smart companies, e.g. LinkedIn, Square and Spotify to name a few, have been quick to understand its benefits: more productivity, more collaboration and more readability. All of a sudden, the code did not need rewriting for production use!

What Scala can do for you:

Scala can use any Java project, which makes it easy to leverage all the work that has taken place in Java over the last two decades. If you don't know Java, Scala is a good entry point because it is more user-friendly. If you do know Java, Scala is worth learning to become more productive.
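
As a small illustration of that interoperability, the sketch below calls the standard Java library directly from Scala. The `JavaInterop` object and its method names are made up for this example, and the `scala.jdk.CollectionConverters` import assumes Scala 2.13 or later.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit
import scala.jdk.CollectionConverters._

// Hypothetical helper object, used only for illustration.
object JavaInterop {
  // java.time is a plain Java API; Scala calls it with no wrapper code.
  def daysBetween(start: String, end: String): Long =
    ChronoUnit.DAYS.between(LocalDate.parse(start), LocalDate.parse(end))

  // A Java collection is viewed as a Scala one via asScala,
  // then processed with the usual Scala collection methods.
  def upperCased(words: java.util.List[String]): List[String] =
    words.asScala.map(_.toUpperCase).toList
}
```

For instance, `JavaInterop.daysBetween("2015-01-01", "2015-01-31")` returns 30, and `JavaInterop.upperCased(java.util.Arrays.asList("spark", "kafka"))` returns `List("SPARK", "KAFKA")`. Third-party Java libraries are used the same way once they are on the classpath.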

Besides, additional Scala libraries bring top-notch code to you:

Data Science: 
  • Apache Spark: The ultimate tool for Big Data Processing with its MLlib for Data Science
  • Apache Kafka: The go-to tool for stream distribution and processing
  • Akka: Actors (quite low level but easy to use)
Web Dev:
  • Play Framework: Modern Web Applications for everybody 
  • Slick: Functional connector to databases
Those libraries are very well documented. You can usually get started in minutes even if you're not an expert in the field.

How to get started:

I think getting your feet wet is the biggest hurdle. It is mostly psychological, but there are also some technical difficulties that can be hard to overcome if you don't follow the right sequence. Here is my recommended path to nail it:

  1. I would start with Martin Odersky's Coursera class. It is a great intro to Functional Programming (which you will encounter often in Scala) and it also features great materials about the installation. Try to do the whole specialization if you have time. I would encourage you to use IntelliJ as your IDE; I find it a better alternative to Eclipse.
  2. If you're a Data Scientist, try to learn more about Data Structures using Scala. I followed the Data Structures and Algorithms specialization on Coursera, a $420 investment you won't regret. If you are a Software Engineer, chances are you already have this knowledge (but it could be a good refresher).
  3. Choose a side project to hone your skills. Here are a few ideas:
    1. Create a web service with Play
    2. Put together a Data Pipeline with Spark
    3. Replace a small Java App with Scala (or code your next Java App with Scala)
Good luck!

Have experience using Scala? Let me know what you think!

