If you read the trade press, it seems as if everyone is excited about Hadoop, and with good reason. There is a lot to be excited about, especially the applications that run on YARN on top of HDFS, such as Storm, Spark and others.
But downloading free software, buying commodity servers or firing up a cloud-based Hadoop instance is actually the easy part of adding Hadoop to the technology mix. The real challenge lies in organizing around analytics on Hadoop, deciding which capabilities to focus on first and getting everyone on the same page about data governance.
At Think Big, many of the organizations we work with tell us that their primary challenges are a lack of the right internal skills and the lack of a strategy or roadmap for moving Hadoop beyond small-scale proofs of concept.
Organizations often follow a common Hadoop adoption path: it begins with an immediate need for large-scale massively parallel processing (MPP) across all data types, continues with establishing a data repository and then moves into initial analytics exploration. The critical point is the transition from that initial analytics phase, where excitement begins to build, to actually integrating Hadoop into the organization's overall analytics capabilities.
Fortunately, several strategies and best practices can help expand analytics on Hadoop and overcome organizational and other obstacles.
Working with multi-structured data and exploiting the processing power of Hadoop requires new skills, which can open significant gaps between existing resources and what is needed. The groups that will use Hadoop also often lack a common vocabulary.
Finally, knowledge capture and sharing are often lacking because organizational silos do not naturally support broad knowledge transfer. What can be done?
- First of all, secure senior-level support, which is critical for forming a cross-business-unit committee that guides organizational change, defines a common vocabulary, defends the effort before executive leadership and shares successes.
- Conduct thorough, honest skills assessments to identify gaps and training needs, and map needs to roles and responsibilities.
- Document tool requirements based on current and projected skills.
- Establish an organizational and collaboration architecture.
- Plug into existing knowledge transfer practices and tools, and allow for informal information exchange.
Foundational big data capabilities, which don’t immediately impact the bottom line, can lose resources at the whim of new business priorities. Here are a few ways to address capabilities challenges:
- Consolidate ownership in a team that has organizational influence and includes representatives from business, infrastructure, architecture, data and analytics.
- Reach consensus on what "capabilities" means by establishing a common vocabulary shared by your business unit and technology partners.
- Create collaborative, cross-functional roadmaps that visually map high-level goals against a timeline, which helps define projects and priorities.
- Dedicate resources to capabilities (building the foundation) and protect those resources.
- Revisit your roadmap and ask: does it still reflect our vision?
Hadoop can support a variety of data structures, which is exciting because it enables new types of analytics. However, a data schema is still needed. Without one, the burden of defining the schema shifts to each data user, and different users can end up interpreting the same data in different ways.
Consistent taxonomies and reference data are critical for meaningful analysis, because users must know what each field means. Challenges also arise when individual teams create their own taxonomies or when reference data changes, as well as from architectural access patterns and flows, data flows across platforms, regular updates, and physical and virtual constraints. Some ways to address data challenges include:
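To make the schema point concrete, here is a minimal sketch, in plain Python rather than on a Hadoop cluster, of a schema defined once and applied whenever raw files are read. The pipe-delimited sales extract, the `SALES_SCHEMA` definition and the `read_with_schema` helper are all hypothetical illustrations; the idea is simply that every team parses the same fields into the same types instead of each user inventing their own interpretation.

```python
import csv
import io
from datetime import date
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical shared schema for a raw, pipe-delimited sales extract.
# Each entry pairs a column name with the function that converts its raw text.
SALES_SCHEMA: List[Tuple[str, Callable[[str], Any]]] = [
    ("order_id", str),
    ("region_code", str),
    ("order_date", date.fromisoformat),
    ("amount", float),
]


def read_with_schema(
    raw_text: str,
    schema: List[Tuple[str, Callable[[str], Any]]],
    delimiter: str = "|",
) -> List[Dict[str, Any]]:
    """Parse delimited text into typed records using one shared schema definition."""
    records = []
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    for line_no, row in enumerate(reader, start=1):
        # Reject rows that do not match the agreed layout instead of guessing.
        if len(row) != len(schema):
            raise ValueError(
                f"line {line_no}: expected {len(schema)} fields, got {len(row)}"
            )
        records.append({name: parse(value) for (name, parse), value in zip(schema, row)})
    return records


raw = "A1|EMEA|2015-03-02|120.50\nA2|APAC|2015-03-03|80.00\n"
rows = read_with_schema(raw, SALES_SCHEMA)
print(rows[0]["amount"], rows[1]["order_date"])
```

Because the schema lives in one place, a change to a field's meaning or type is made once and picked up by every consumer, rather than being rediscovered by each analyst.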
- Create a data lake, a topic on which opinions vary widely.
- Test and define common data manipulation patterns for different use cases, such as aggregations, reductions and basic statistical derivations.
- Centralize responsibility for data governance, data architecture, taxonomy and maintenance.
- Establish knowledge sharing about data after it has been analyzed.
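As an illustration of a "common data manipulation pattern", the sketch below packages one aggregation (grouping), one reduction (summing) and a few basic statistical derivations into a single shared, tested function. It is a minimal in-memory example under stated assumptions: the `summarize_by` name, the record layout and the chosen statistics are hypothetical, and on a cluster the same contract would be implemented in whatever processing engine the organization has standardized on (that translation is not shown here).

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Mapping


def summarize_by(
    records: Iterable[Mapping], key: str, value: str
) -> Dict[str, Dict[str, float]]:
    """Group records by `key` and derive count, total, mean and population
    standard deviation of the numeric field `value` for each group."""
    # Aggregation step: collect the values belonging to each group.
    groups: Dict[str, List[float]] = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(float(rec[value]))

    # Reduction and derivation step: collapse each group to summary statistics.
    summary: Dict[str, Dict[str, float]] = {}
    for group, values in groups.items():
        summary[group] = {
            "count": len(values),
            "total": sum(values),
            "mean": mean(values),
            "stdev": pstdev(values),
        }
    return summary


sales = [
    {"region": "EMEA", "amount": 100},
    {"region": "EMEA", "amount": 300},
    {"region": "APAC", "amount": 50},
]
print(summarize_by(sales, "region", "amount"))
```

Writing the pattern once, with tests, means every team that needs per-group totals or averages gets the same definitions (for example, population rather than sample standard deviation) instead of subtly different versions.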
Want to add Hadoop to your analytics mix? First, realize that technology alone is not enough. Organizational change, whether complete or partial, cannot be ignored. Teams now work in ways that are truly different from the past, which makes having the right people and processes critical to success. Addressing capabilities requires dedicating resources to build a strong core and having difficult conversations early, which can produce much better alignment. As for data, centralized data management and alignment of data with business needs are critical.
By understanding these challenges and how to address them, you can ensure that adding Hadoop to your analytics mix lives up to the excitement it has generated and delivers real business value.
Madina Kassengaliyeva is Director of Client Services at Think Big, a Teradata Company. She is responsible for driving the client engagement strategy and ensuring the ongoing success of Think Big engagements, helping clients capture the value of adding big data technologies and methods to their businesses. She recently conducted a webinar about adding Hadoop to the analytics mix.