Why Spark is a game changer in Big Data and Data Science
The second part of this post is technical and requires an understanding of a classification algorithm.

The Business Part:

Hadoop was born in 2004 and conquered the world of top-end software engineering in the Valley. Used primarily to process logs, it had to be reinvented in order to attract business users. In 2014, I noticed a shift in the industry with the ascent of “Hadoop 2.0”: faster and more approachable for business users (e.g. Cloudera Impala), it is bound to overtake Hadoop as we know it. Spark has been at the forefront of this revolution and provides a general-purpose Big Data environment.

Spark comes with a strong value proposition:

- It's fast (10-100X vs. MapReduce)
- It's scalable (I would venture to say that it can scale for 99.99% of the companies out there)
- It's integrated (for instance, it's possible to run SQL and ML algorithms in the same script)
- It's flexible

Spark's Components (Credit: spark.apache.org)

The flexib