There are a lot of definitions for data science and a lot of mixed messages, but the core is something we can all agree on. At a high level, data science is about bringing programming and statistical tools to bear on new data—whether it’s data that hasn’t been looked at before, data that is inherently new to the organization, or a new way of looking at data that you’ve already captured.
At Think Big, we’ve spent at least 100,000 man-hours over the last five years executing data science or activities that support it. Our insight from this experience boils down to three critical areas for supporting data scientists and analysts to drive value through data:
It’s not surprising, but above all else you need data. Not clean or sanitized data, though. As a data scientist, I want the messy flood, with bad events, headers, and so on. For example, if it’s sensor data, I need the “bad” data (i.e., data that IT once discarded) to predict failures. The next things needed are metadata and systems that enable tagging data, so it can be used for multiple purposes downstream within the organization. When I build automated systems for data quality, I need those scores saved as metadata about records, not used to justify data deletion!
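To make the "score it, don't delete it" idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Record` class, the `quality_score` and `annotate` helpers, the `EXPECTED_FIELDS` tuple, and the sensor field names are hypothetical, and the scoring rule (the fraction of expected fields that are present) is just a stand-in for whatever quality checks you actually run.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical schema for a sensor event; real field names will differ.
EXPECTED_FIELDS = ("sensor_id", "timestamp", "reading")


@dataclass
class Record:
    payload: Dict[str, Any]                                   # raw event, kept untouched
    metadata: Dict[str, Any] = field(default_factory=dict)    # annotations live here


def quality_score(payload: Dict[str, Any]) -> float:
    """Illustrative rule: fraction of expected fields that are present and non-null."""
    present = sum(1 for key in EXPECTED_FIELDS if payload.get(key) is not None)
    return present / len(EXPECTED_FIELDS)


def annotate(records: List[Record]) -> List[Record]:
    """Attach a quality score and tags to every record; never drop any."""
    for record in records:
        score = quality_score(record.payload)
        record.metadata["quality_score"] = score
        record.metadata["tags"] = ["suspect"] if score < 1.0 else []
    return records


if __name__ == "__main__":
    raw = [
        Record({"sensor_id": "a1", "timestamp": 1000, "reading": 20.5}),
        Record({"sensor_id": "a1", "timestamp": 1001, "reading": None}),  # "bad" event
    ]
    for record in annotate(raw):
        print(record.payload, record.metadata)
```

The point of the design is that the "bad" event stays in the data set with a low score and a tag, so a downstream failure-prediction model can still learn from it while a reporting job can choose to filter it out.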
From speaking to many enterprise IT leaders about strategy, it’s clear there’s often a dirty secret in the room. Probably half the leaders who have adopted a system, are ready to work with big data, and are thinking about creating a data lake look at us and ask: “What exactly should we be doing with big data? Should we look to improve operational efficiency? If so, what kind of efficiency should we be targeting?” If you don’t have the answers to basic strategic questions, then you probably don’t have clear goals defined, and that is a dangerous place to start your big data journey.
Metaphorically speaking, an issue we see time and again is that there’s too much focus on “how to build the fastest car”. Let me say that you don’t need the fanciest and most costly data science platform and software with all the bells and whistles. What you really need is a solution that is trusted to reliably and consistently get results today and tomorrow, something that works not just for data scientists but for the business. To earn the trust of your business executives your solution has to be something they can touch, monitor and see value from—quickly! We often begin with customers by helping them answer a simple question: “How will these new models and insights integrate into existing business processes?” Without an answer to that question, the best models in the world can’t add value.
With these three areas addressed (capturing all your data, tying your platform and engineering work back to an overall strategy, and having a plan for integration between the data science side and the business), you will begin to see actual results that support real data-driven decision making. And your data scientists will thank you!
Dan Mallinger is the Director of Data Science at Think Big, a Teradata Company. Dan specializes in creating value by integrating data science and big data into existing business processes. He recently spoke at MongoDB World 2015. To see his complete keynote address, visit https://www.youtube.com/watch?v=03FJjb_Gw8M