Companies are adopting Apache™ Hadoop® and other open source big data technologies to build data lakes as they strive to take advantage of big, diverse data to enable smarter business decisions. However, simply installing a Hadoop cluster does not constitute a data lake. In fact, without thinking big but starting smart, adhering to data management processes, and continually measuring business value, the investment in a data lake will result in wasted time and money.
Rick and I wrote this article to help companies land on a common definition of a data lake, and to codify some of the best practices we’re seeing in data lake initiatives in the field. Analyst firms are predicting high failure rates for data lake projects in the near future, and companies are spending too much time, energy, and money to get these important projects wrong!
There are many reasons why data lake implementations fail. We think we’ve codified three best practices to get them right, but there are surely more. What’s your experience, and what can you add to the article? We’d love to hear from you!