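If you need to maintain large and complex Hadoop clusters, Hadoop Operations (reprint edition) is indispensable. As Hadoop has become the industry standard for large-scale data processing in the data center, demand for operational guidance has grown sharply. In this book, Eric Sammer, Principal Solution Architect at Cloudera, walks through the details of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance. Rather than enumerating every possible scenario, this guide takes a pragmatic approach and describes the steps involved in serious deployments.
This book covers:
- An overview of HDFS and MapReduce: why they exist and how they work
- Planning a Hadoop deployment, from hardware and OS selection to network requirements
- Setup and configuration details, organized around a list of critical properties
- Managing resources by sharing a cluster across multiple groups
- Runbooks for the most common cluster maintenance tasks
- Monitoring Hadoop clusters, and learning troubleshooting from real-world examples
- Using basic tools and techniques to handle backup and catastrophic failure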
Table of Contents
- Preface
- 1. Introduction
- 2. HDFS
- Goals and Motivation
- Design
- Daemons
- Reading and Writing Data
- The Read Path
- The Write Path
- Managing Filesystem Metadata
- Namenode High Availability
- Namenode Federation
- Access and Integration
- Command-Line Tools
- FUSE
- REST Support
- 3. MapReduce
- The Stages of MapReduce
- Introducing Hadoop MapReduce
- Daemons
- When It All Goes Wrong
- YARN
- 4. Planning a Hadoop Cluster
- Picking a Distribution and Version of Hadoop
- Apache Hadoop
- Cloudera's Distribution Including Apache Hadoop
- What Should I Use?
- Hardware Selection
- Master Hardware Selection
- Worker Hardware Selection
- Cluster Sizing
- Blades, SANs, and Virtualization
- Operating System Selection and Preparation
- Deployment Layout
- Software
- Hostnames, DNS, and Identification
- Users, Groups, and Privileges
- Kernel Tuning
- vm.swappiness
- vm.overcommit_memory
- Disk Configuration
- Choosing a Filesystem
- Mount Options
- Network Design
- Network Usage in Hadoop: A Review
- 1 Gb versus 10 Gb Networks
- Typical Network Topologies
- 5. Installation and Configuration
- Installing Hadoop
- Apache Hadoop
- CDH
- Configuration: An Overview
- The Hadoop XML Configuration Files
- Environment Variables and Shell Scripts
- Logging Configuration
- HDFS
- Identification and Location
- Optimization and Tuning
- Formatting the Namenode
- Creating a /tmp Directory
- Namenode High Availability
- Fencing Options
- Basic Configuration
- Automatic Failover Configuration
- Format and Bootstrap the Namenodes
- Namenode Federation
- MapReduce
- Identification and Location
- Optimization and Tuning
- Rack Topology
- Security
- 6. Identity, Authentication, and Authorization
- Identity
- Kerberos and Hadoop
- Kerberos: A Refresher
- Kerberos Support in Hadoop
- Authorization
- HDFS
- MapReduce
- Other Tools and Systems
- Tying It Together
- 7. Resource Management
- What Is Resource Management?
- HDFS Quotas
- MapReduce Schedulers
- The FIFO Scheduler
- The Fair Scheduler
- The Capacity Scheduler
- The Future
- 8. Cluster Maintenance
- Managing Hadoop Processes
- Starting and Stopping Processes with Init Scripts
- Starting and Stopping Processes Manually
- HDFS Maintenance Tasks
- Adding a Datanode
- Decommissioning a Datanode
- Checking Filesystem Integrity with fsck
- Balancing HDFS Block Data
- Dealing with a Failed Disk
- MapReduce Maintenance Tasks
- Adding a Tasktracker
- Decommissioning a Tasktracker
- Killing a MapReduce Job
- Killing a MapReduce Task
- Dealing with a Blacklisted Tasktracker
- 9. Troubleshooting
- Differential Diagnosis Applied to Systems