Hadoop: The Definitive Guide

Tom White


Organizations large and small are adopting Apache Hadoop to deal with huge application datasets. Hadoop: The Definitive Guide provides you with the key for unlocking the wealth this data holds. Hadoop is ideal for storing and processing massive amounts of data, but until now, information on this open source project has been lacking, especially with regard to best practices. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems. Programmers will find details for analyzing large datasets with Hadoop, and administrators will learn how to set up and run Hadoop clusters.

This book helps you:

Use the Hadoop Distributed File System (HDFS) to store large datasets, and run distributed computations over those datasets using MapReduce
Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
Discover common pitfalls and advanced features for writing real-world MapReduce programs
Use Pig, a high-level query language for large-scale data processing
Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems
Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
Use HBase, Hadoop's database for structured and semi-structured data

Size: 3 MB
Type: PDF

Other files:

Hadoop: The Definitive Guide, 2nd Edition

HBase: The Definitive Guide
If your organization is looking for a storage solution to accommodate a virtually endless amount of data, this book will show you how Apache HBase can...

Professional Hadoop Solutions
Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world...

Pro Microsoft HDInsight: Hadoop on Windows
Pro Microsoft HDInsight is a complete guide to deploying and using Apache Hadoop on the Microsoft Windows and Windows Azure Platforms. The information...

Securing Hadoop
Security of Big Data is one of the biggest concerns for enterprises today. How do we protect the sensitive information in a Hadoop ecosystem? How can...