Earlier this year, Twitter admitted that hackers had stolen personal information on roughly 250,000 users. Other organizations, including the New York Times and the Federal Reserve, reported that hackers had been inside their systems. Companies of all sizes, along with the network security community, battle constantly against advanced persistent threats. It’s time to fight back on a level playing field. A new weapon in this fight is the ability to use Big Data techniques to build comprehensive predictive analytic applications on your data.
Analyzing massive data sets, including all data going back years, was nearly impossible to manage until recently. New technologies such as Hadoop and NoSQL, in the hands of a new wave of Big Data vendors, now make it possible for companies to cost-effectively store, process, and analyze 100 percent of their data. Before, they could store and analyze only a fraction of their information because of the cost of traditional data warehouse systems.
A new role exists today to fight back against hackers, and the market is recognizing it as the Data Scientist. Today’s Data Scientists can build analytic applications that detect problems and analyze small signals in the long tail of data, so teams can troubleshoot and take preventive action before problems materialize. Now that companies can easily store and keep on hand 100 percent of their data, viruses and malware hidden inside company data, just as on personal computers, can be located. Delayed-action malware is designed to “sleep” in the data until that data is no longer current, then wake up and enable the attack. Companies that don’t manage old data have a poor chance of ever detecting its presence.
Built on Big Data technology, Device Data Analytic Applications read individual machine records and find patterns before they become problems. For a manufacturer, the pattern may be a trend pointing to device or part failures. For every company, the pattern may be hidden malware ready to take down critical systems or, worse, steal customer information or financial records.
Many firms are already tackling both external and internal threats. Think Big has helped several clients use data science to understand how disparate computers coordinate and communicate to prepare for attacks months in advance. These “botnets” are armies of infected machines that repeatedly make network requests to a central “commander,” which moves to different network locations throughout the day. By comparing network requests across machines, using both supervised and unsupervised algorithms, data scientists bubble up only the malicious traffic, the location of the infected machines, and the potential locations of the commander. This type of solution will change the way the security community looks at threats and has already been shown to improve the lead time on threat detection by as much as three months.
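To make the idea of comparing network requests across machines concrete, here is a minimal sketch in Python. It is not Think Big's method, only one simple unsupervised heuristic: flag destinations that several machines contact at suspiciously regular intervals. The function name `find_beaconing`, the `(machine, destination, timestamp)` record layout, and all thresholds are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_beaconing(requests, min_requests=4, max_cv=0.1, min_machines=2):
    """Flag destinations that several machines contact at regular intervals.

    requests: iterable of (machine, destination, timestamp_in_seconds).
    All thresholds are illustrative assumptions, not tuned values.
    """
    # Group request timestamps by (machine, destination) pair.
    times = defaultdict(list)
    for machine, dest, ts in requests:
        times[(machine, dest)].append(ts)

    # destination -> set of machines whose traffic to it looks periodic
    beaconing = defaultdict(set)
    for (machine, dest), stamps in times.items():
        if len(stamps) < min_requests:
            continue
        stamps.sort()
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        avg = mean(gaps)
        # A low coefficient of variation means near-constant gaps,
        # which is typical of automated check-ins rather than people.
        if avg > 0 and pstdev(gaps) / avg <= max_cv:
            beaconing[dest].add(machine)

    # Keep only destinations shared by several periodic machines.
    return {
        dest: sorted(machines)
        for dest, machines in beaconing.items()
        if len(machines) >= min_machines
    }
```

A real system would also have to follow a commander that changes network locations during the day, and would likely combine a heuristic like this with supervised models trained on known infections. The sketch covers only the cross-machine comparison step.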
Other clients of Think Big have looked to their own intranets for security issues. Like most companies, they have known for years that by the time viruses or malware are detectable via software, the infections have often already taken place. Infected computers may then behave maliciously, for example by attacking non-infected corporate machines. Using intranet network patterns, data science, and Hadoop, these clients build network-path-based propensity models over an acyclic graph centered on the infection point. Across the graph, the propensity for a machine to be infected is measured as a function of machine characteristics as well as its place in the network. With this data science power, companies are able to alert security teams to audit specific machines rather than waiting for antivirus software to identify infections.
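A rough sketch of such a propensity model, in Python, might look like the following. It is an illustration under stated assumptions, not the clients' actual model: a breadth-first search from the infection point yields an acyclic shortest-path tree, and each machine's score is a hypothetical `vulnerability` value (standing in for machine characteristics) discounted by its hop distance from the infection. The function name `infection_propensity`, the `decay` parameter, and the scoring formula are all invented for this example. A production model would be fitted to historical infection data.

```python
from collections import deque


def infection_propensity(links, vulnerability, infected, decay=0.5):
    """Rank machines by an illustrative infection-propensity score.

    links: iterable of (machine_a, machine_b) intranet connections.
    vulnerability: dict of machine -> score in [0, 1] (assumed input).
    infected: the machine where the infection was found.
    decay: assumed per-hop discount applied along the network path.
    """
    # Build an undirected adjacency map from the observed connections.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search from the infection point. Recording each
    # machine once, at first visit, gives its shortest hop distance
    # and implicitly an acyclic tree rooted at the infection point.
    hops = {infected: 0}
    queue = deque([infected])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)

    # Score = machine characteristic discounted by network distance.
    scores = {
        machine: vulnerability.get(machine, 0.0) * decay ** distance
        for machine, distance in hops.items()
        if machine != infected
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The machines at the top of the returned list are the ones a security team would audit first. Machines with no network path to the infection point are left out entirely.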
What is it worth to organizations to have the flexibility to run 50x more unique analytics on their data? To run analyses that were previously not possible? To do it at a fraction of the cost? With the right Big Data tools and analytics applications applied to your entire data pool, you may not have to admit that hackers have done a better job than you of exploiting the value of your own data.