The Open Source Community Welcomes Kylo™: A Next Generation Data Lake Management Software Platform


Hadoop is difficult to get right, and most organizations will freely admit they don’t have the in-house engineering skills to successfully implement big data solutions on the Hadoop stack. In fact, at the recent Gartner Data & Analytics Summit in Sydney, Gartner research director Nick Heudecker claimed that 70 per cent of Hadoop deployments in 2017 will fail to deliver either their estimated cost savings or their predicted revenue.

It’s a dim view of the year ahead, but on the plus side, it’s not a view shared by the emerging Kylo community. Kylo™, a new open source, enterprise-ready data lake management software platform inspired by more than eight years’ experience and 150 projects, has been used in beta and production throughout the past year across a dozen major multi-national companies, and today debuts its full release. A framework used to develop data lakes with self-service data ingest and data preparation, and with integrated metadata management, governance, and security best practices, Kylo™ is helping organizations make sense of the many moving pieces in Hadoop.

Getting Data Lakes Started Right: Implementing Best Practices Upfront

Until now, data lake engineering options have been limited to integrating low-level open source components or using expensive, outdated ETL tools with only bolted-on Hadoop support. Kylo™ bridges the gaps left by other open source frameworks by providing IT with templates for building pipelines while also prompting teams to follow best practices, so that overall data lake development is done comprehensively rather than in silos.

For example, when creating feeds in Kylo™, data engineers design templates that enable self-service feed creation. Kylo™ enforces best practices such as metadata capture and lineage tracking, and includes features to measure data quality. Many organizations don’t think about the importance of data quality until months or years after they begin development.
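To make the idea of a template that bakes in best practices more concrete, here is a minimal sketch in plain Python. It is not Kylo™'s API: `FeedTemplate`, its fields, the rule names, and the `crm_export.csv` source are all hypothetical, chosen only to illustrate how a reusable template might run data quality rules and record lineage-style metadata on every ingest.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# Hypothetical names throughout: this is NOT Kylo's API, only an illustration.
Row = dict[str, str]
Rule = Callable[[Row], bool]


@dataclass
class FeedTemplate:
    """A reusable ingest template bundling validation rules with metadata capture."""
    name: str
    rules: dict[str, Rule]
    runs: list[dict] = field(default_factory=list)  # one metadata record per run

    def ingest(self, source: str, rows: list[Row]) -> list[Row]:
        valid = []
        failures = {rule_name: 0 for rule_name in self.rules}
        for row in rows:
            failed = [n for n, rule in self.rules.items() if not rule(row)]
            for n in failed:
                failures[n] += 1
            if not failed:
                valid.append(row)
        # Record where the data came from and how clean it was, on every run.
        self.runs.append({
            "template": self.name,
            "source": source,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "rows_in": len(rows),
            "rows_valid": len(valid),
            "rule_failures": failures,
        })
        return valid


template = FeedTemplate(
    name="customer_feed",
    rules={
        "id_present": lambda r: bool(r.get("id")),
        "age_is_number": lambda r: r.get("age", "").isdigit(),
    },
)
clean = template.ingest("crm_export.csv", [
    {"id": "1", "age": "34"},
    {"id": "", "age": "51"},
    {"id": "3", "age": "unknown"},
])
```

Because the rules and the metadata record live in the template rather than in each pipeline, anyone who creates a feed from it gets quality metrics and a source record without writing extra code, which is the general idea the paragraph above describes.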

Out-of-the-box Features

  • Extensible, flexible APIs and plug-in architecture: Whereas similar platforms typically offer a ‘fixed and opinionated’ model for data ingestion with some customization capability, Kylo™ was built with extensibility in mind. The default model for data ingestion can be easily modified or augmented by entirely new models (for example, streaming pipelines or highly customized error handling).
  • Metadata capture: Kylo™ automatically captures all operational metadata generated by feeds, as well as extensive business and technical (for example, schema) metadata defined during the creation of feeds and categories.
  • Essential operations capabilities: Kylo provides key operational capabilities around monitoring feeds, troubleshooting, and measuring service levels.
  • User-oriented features such as self-service data ingest, data preparation, and data discovery: The Kylo™ application is oriented around data lake users, but Kylo™ also provides a framework to IT that enables a skills shift from low-level software coding to enabling power users, such as data analysts, to self-serve data preparation and ingestion. Once data is ingested, Kylo™ offers a visual SQL builder and over 100 transformation features to help data analysts wrangle data before they publish and schedule their feeds.
  • Enterprise training: A variety of courses to choose from on Kylo™, Hadoop, and Spark.
  • Commercial managed services: Managed operational services are available for Hadoop clusters, including Kylo™, for both on-premises and cloud-based deployments. Managed services are delivered via expert Kylo™ teams with operations experience on major Hadoop distributions.

How is Kylo™ Helping?

Here’s an example of how Kylo is helping a global telecommunications company:

The company had 30 data engineers spend 12 weeks hand-coding data ingestion pipelines. Using Kylo™, one engineer who was well versed in the framework was able to ingest, cleanse, profile, and validate the same data in five days with 10 times the throughput. In this sense, Kylo™ not only improved efficiency, but also freed in-house engineers to focus on other business priorities.
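For a rough sense of scale, the effort figures above can be compared in engineer-days. The short calculation below is only a back-of-the-envelope sketch: it assumes a five-day working week and that all 30 engineers were occupied for the full 12 weeks, neither of which is stated in the example.

```python
# Assumptions (not stated in the article): a five-day working week, and all
# 30 engineers occupied for the full 12 weeks.
WORKDAYS_PER_WEEK = 5

hand_coded_days = 30 * 12 * WORKDAYS_PER_WEEK  # engineers * weeks * days per week
kylo_days = 1 * 5                              # one engineer, five days

effort_ratio = hand_coded_days / kylo_days

print(f"Hand-coded: {hand_coded_days} engineer-days")
print(f"With Kylo:  {kylo_days} engineer-days")
print(f"Ratio:      {effort_ratio:.0f}x")
```

Under those assumptions, the hand-coded approach comes to 1,800 engineer-days against 5, a ratio of 360 to 1. This is separate from the tenfold throughput improvement quoted above.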

Ready to Try out Kylo™?

If you’d like to give Kylo™ a try, please see the sandbox download and tutorials. If you are interested in becoming part of the Kylo™ contributing community, please see the contributing instructions, where you can also learn more about the immediate roadmap.

2 responses to “The Open Source Community Welcomes Kylo™: A Next Generation Data Lake Management Software Platform”

  1. Hi,
    I'm Ravikumar, and I work for Quotient technology pvt ltd as a data engineer.
    We would like to explore Kylo for one of our upcoming projects, but we are facing some issues while setting it up.

    We will need help from you guys.
