In 2014, Apache™ Hadoop® gathered momentum as the leading platform for big data analytics. Hadoop is clearly here to stay: it has extended its dominance from enterprise software into social media (Twitter and Facebook both use it), making it hard to imagine a clear successor emerging any time soon. That said, while data scientists are getting better and better at processing larger and larger sets of numbers, there is still a lot of work to be done in improving how data are measured, organized, and visualized. Here are some trends we are following in 2015.
Hadoop outgrows MapReduce
One of the first things we can expect from 2015 is that Hadoop clusters will start to benefit from programming models other than MapReduce for dealing with large data sets. We already saw YARN begin to gain momentum in 2014, when it got across-the-board support from distribution providers such as Cloudera and Hortonworks. Expect this investment to begin paying off in 2015 as more customers start leveraging YARN’s ability to support alternative execution engines, such as Apache Spark™.
Spark gains traction
2014 was the year when Spark emerged as the most obvious successor to MapReduce. Unlike MapReduce, Spark works well with iterative algorithms and is considerably more lightweight. Spark also comes with a much-touted API that many programmers say is easier to work with. That said, at five years old, Spark is still an immature technology, and it is more expensive than some alternatives. If Spark is to live up to its promise, its community must deliver better documentation, training, and stability.
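To make the point about iterative algorithms concrete, here is a minimal sketch in plain Python. It does not use Spark or Hadoop; it only mimics the access pattern. The `CountingSource` class and the `run_reloading` and `run_cached` functions are hypothetical names invented for this illustration. The assumption is that a chain of separate batch jobs re-reads its input from storage on every pass, while an engine that keeps the working set in memory reads it once.

```python
class CountingSource:
    """Hypothetical stand-in for a dataset on shared storage; counts full reads."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        # Each call represents one full scan of the input.
        self.reads += 1
        return list(self._values)


def step(data, estimate, rate=0.5):
    """One gradient-descent step that moves the estimate toward the mean of the data."""
    gradient = sum(estimate - x for x in data) / len(data)
    return estimate - rate * gradient


def run_reloading(source, iterations):
    """Chained-jobs style: every iteration re-reads the input from storage."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(source.load(), estimate)
    return estimate


def run_cached(source, iterations):
    """In-memory style: read the input once, then reuse it on every iteration."""
    data = source.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(data, estimate)
    return estimate
```

Both functions arrive at the same estimate, but over ten iterations the first scans the input ten times and the second scans it once. On a large data set, that difference in I/O is a big part of why iterative workloads tend to favor an in-memory engine.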
Hadoop wins over late-majority adopters
There is also good reason to believe that 2015 will see a bit of a rush from late-majority adopters who have been slow to embrace Hadoop. Apache Hadoop has long since broken free of its web giant and ad tech heritage, penetrating most industries, notably music as streaming became ubiquitous. In 2015, even late adopters will turn their attention to Hadoop, so expect an uptick in cost-driven implementations around better storage and faster load times: SAN/NAS augmentation, ETL offload, and mainframe conversions.
Hadoop gets better apps
NoSQL data management systems were a big driver of innovation in 2014 as SQL emerged in the app space, making it far easier for data scientists to run queries on the go. We should expect more advancement in SQL-on-Hadoop tools such as Hive, Impala, Presto, and Phoenix as developers build out more options for the interactive querying of data in Hadoop, HBase, and others. We also think that Hive, the warehouse infrastructure favored by Facebook and Netflix, as well as Presto and Impala, will attract more support from the business intelligence community and other reporting tool vendors as more complete SQL support emerges.
Data analysts now have more options than ever before when it comes to how they collect and analyze the data that constantly streams from our devices and phones. In 2015 we expect these tools to become faster, more dynamic, and easier to use. Adoption will also probably be rapid: Spark has already started to appear in job descriptions. Expect these trends to continue, if not intensify, in the year to come.