Presto is an open source distributed SQL query engine designed for running interactive analytic queries against data sources of all sizes. Through a single query, Presto allows you to access data where it lives, including in Apache Hive™, Apache Cassandra™, relational databases or even proprietary data stores. Presto was created by Facebook for the analytics needs of extremely large data-driven organizations.
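The idea of reaching several data stores through a single query can be sketched in miniature. The Python example below is only an analogy, not Presto itself: it uses SQLite's `ATTACH DATABASE` to put two separate in-memory databases behind one connection and joins across them in one SQL statement. The schema names `weblogs` and `crm`, the tables, and the sample rows are hypothetical and chosen purely for illustration.

```python
import sqlite3


def federated_demo():
    """Join tables that live in two separate databases with one SQL query.

    Analogy only: SQLite stands in for a federating engine, and the
    'weblogs' and 'crm' stores are hypothetical.
    """
    conn = sqlite3.connect(":memory:")

    # Each ATTACH of ':memory:' creates a distinct, independent database.
    conn.execute("ATTACH DATABASE ':memory:' AS weblogs")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute("CREATE TABLE weblogs.clicks (user_id INTEGER, url TEXT)")
    conn.execute("CREATE TABLE crm.users (user_id INTEGER, name TEXT)")

    conn.executemany(
        "INSERT INTO weblogs.clicks VALUES (?, ?)",
        [(1, "/home"), (1, "/pricing"), (2, "/home")],
    )
    conn.executemany(
        "INSERT INTO crm.users VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )

    # One query spans both stores; the data is not copied into one place first.
    rows = conn.execute(
        """
        SELECT u.name, COUNT(*) AS clicks
        FROM weblogs.clicks AS c
        JOIN crm.users AS u ON u.user_id = c.user_id
        GROUP BY u.name
        ORDER BY u.name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, clicks in federated_demo():
        print(name, clicks)
```

In a federating engine, the two schema prefixes would typically refer to different backing systems (for example, one Hive table and one relational table) rather than two SQLite databases.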
Teradata is contributing open source code to Presto and making a multi-year commitment to increase its adoption in the enterprise. Through this commitment, Teradata is adding critical features in the areas of software installation, monitoring and management, YARN integration, security, ODBC/JDBC driver support, ecosystem integration and BI tool certifications. In addition to these software contributions, Teradata is also improving Presto’s documentation and creating easy-to-follow QuickStart guides.
Think Big, a Teradata Company, is now offering specialized Presto professional services, including:
- Piloting new functionality with Presto Jumpstart
- Customized Presto training
- Design and development services for Presto
Impala
As useful as Hive is, the latency of the Hadoop MapReduce jobs behind it means that each query in an interactive Hive session takes many seconds, which makes it difficult to explore and evolve ideas quickly. Impala is a new query engine that bypasses MapReduce to run very fast queries over data sets in HDFS, using HiveQL as its query language. Impala is very new; the first production release is forthcoming. It does not yet support all HiveQL features, but in many scenarios speedups of 100x over Hive are already possible.
Spark and Shark
While very flexible, MapReduce has a number of constraints that affect performance. Spark is a newer distributed computing framework that exploits sophisticated in-memory data caching to significantly speed up many common data operations, sometimes by 10x or more. Shark is a port of Hive to Spark, bringing performance improvements to Hive queries that are comparable to the improvements Impala provides.
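Why in-memory caching helps can be shown with a toy model. The Python sketch below is not Spark's API; the `Dataset` class, its `cache()` method, and the `read_from_disk` loader are hypothetical stand-ins. It only demonstrates the principle that, once a data set is held in memory, later operations reuse it instead of re-reading the source each time.

```python
class Dataset:
    """Toy data set backed by an expensive loader (hypothetical, not Spark)."""

    def __init__(self, loader):
        self._loader = loader
        self._cached = None
        self._use_cache = False
        self.loads = 0  # how many times the source was actually read

    def cache(self):
        """Ask for the rows to be kept in memory after the first read."""
        self._use_cache = True
        return self

    def _rows(self):
        if self._cached is not None:
            return self._cached  # served from memory, no re-read
        self.loads += 1
        rows = list(self._loader())
        if self._use_cache:
            self._cached = rows
        return rows

    def count(self):
        return len(self._rows())

    def total(self):
        return sum(self._rows())


def read_from_disk():
    # Stand-in for a slow scan of files in a distributed file system.
    return range(1, 6)


# Without caching, every operation re-reads the source.
uncached = Dataset(read_from_disk)
uncached.count()
uncached.total()

# With caching, the source is read once and reused by later operations.
cached = Dataset(read_from_disk).cache()
cached.count()
cached.total()

if __name__ == "__main__":
    print("reads without cache:", uncached.loads)
    print("reads with cache:", cached.loads)
```

The more operations that run over the same data, the larger the saving, which is why iterative and interactive workloads tend to benefit most from this approach.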
Google BigQuery is a web service that lets you do interactive analysis of massive datasets, up to billions of rows. It offers the first public access to Google’s internal big data technology stack.