
Hadoop Official Documentation Translation: YARN Architecture (2.7.3)


The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs.

The ResourceManager and the NodeManager form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The NodeManager is the per-machine framework agent that is responsible for containers, monitoring their resource usage (CPU, memory, disk, network) and reporting the same to the ResourceManager/Scheduler.

The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.

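The basic idea of YARN is to split resource management and job scheduling/monitoring into two separate daemons. The design calls for one global ResourceManager (RM) and one ApplicationMaster (AM) for each application. An application can be a single job or a DAG of jobs.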

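The ResourceManager and the NodeManager make up the data-computation framework. The RM has the final say in arbitrating resources among all applications in the system. The NodeManager is the framework's agent on each machine: it is responsible for containers, monitors their resource usage (memory, CPU, disk, and network), and reports it to the ResourceManager/Scheduler.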

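Each application's ApplicationMaster is in effect a framework-specific library. It negotiates resources from the RM and works with the NodeManagers (NM) to execute and monitor tasks.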
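The reporting relationship between NodeManagers and the ResourceManager can be sketched with a toy model. This is not Hadoop code: the class names mirror the YARN daemons, but every method (`launch`, `heartbeat`, `receive_heartbeat`, `total_memory_mb`) and all host and container names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerUsage:
    """Resource usage of one container, as a node agent might report it."""
    container_id: str
    memory_mb: int
    vcores: int


@dataclass
class NodeManager:
    """Toy per-machine agent: owns containers and reports their usage."""
    host: str
    containers: dict = field(default_factory=dict)

    def launch(self, container_id, memory_mb, vcores):
        # Hypothetical helper: register a container running on this machine.
        self.containers[container_id] = ContainerUsage(container_id, memory_mb, vcores)

    def heartbeat(self):
        # Report the usage of every container on this machine.
        return list(self.containers.values())


@dataclass
class ResourceManager:
    """Toy global authority: builds a cluster-wide view from heartbeats."""
    usage_by_host: dict = field(default_factory=dict)

    def receive_heartbeat(self, node):
        self.usage_by_host[node.host] = node.heartbeat()

    def total_memory_mb(self):
        return sum(u.memory_mb for usages in self.usage_by_host.values() for u in usages)


rm = ResourceManager()
nm1, nm2 = NodeManager("node-1"), NodeManager("node-2")
nm1.launch("c1", memory_mb=1024, vcores=1)
nm1.launch("c2", memory_mb=2048, vcores=2)
nm2.launch("c3", memory_mb=512, vcores=1)
for nm in (nm1, nm2):
    rm.receive_heartbeat(nm)
print(rm.total_memory_mb())  # 3584
```

The point of the sketch is the direction of information flow: each node knows only about its own containers, while the ResourceManager holds the only cluster-wide view.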

[Figure: YARN Architecture (2.7.3)]

The ResourceManager has two main components: Scheduler and ApplicationsManager.

The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application. Also, it offers no guarantees about restarting failed tasks either due to application failure or hardware failures. The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so based on the abstract notion of a resource Container which incorporates elements such as memory, CPU, disk, network etc.

The Scheduler has a pluggable policy which is responsible for partitioning the cluster resources among the various queues, applications etc. The current schedulers such as the CapacityScheduler and the FairScheduler would be some examples of plug-ins.

The ApplicationsManager is responsible for accepting job-submissions, negotiating the first container for executing the application specific ApplicationMaster and providing the service for restarting the ApplicationMaster container on failure. The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.

MapReduce in Hadoop-2.x maintains API compatibility with the previous stable release (hadoop-1.x). This means that all MapReduce jobs should still run unchanged on top of YARN with just a recompile.

The ResourceManager has two main components: the Scheduler and the ApplicationsManager.

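The Scheduler allocates resources to the running applications, subject to familiar constraints such as capacities and queues. It is a pure scheduler: it does not monitor or track application status, and it does not guarantee that tasks which fail because of application or hardware failures will be restarted. It schedules according to each application's resource requirements, expressed through the abstract notion of a resource Container, which covers elements such as memory, CPU, disk, and network.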
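A minimal sketch of what "scheduling on resource requirements alone" can look like, assuming a container is described only by memory and virtual cores. `Resource`, `ToyScheduler`, and the first-fit placement rule are illustrative assumptions, not the behaviour of any real YARN scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Abstract container size: only memory and vcores in this sketch."""
    memory_mb: int
    vcores: int

    def fits_in(self, other):
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores

    def minus(self, other):
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)


class ToyScheduler:
    """Hands out containers purely from resource requests; tracks no task status."""

    def __init__(self, node_capacity):
        # host -> Resource still free on that host
        self.free = dict(node_capacity)

    def allocate(self, app_id, request):
        # First-fit (an assumption): take the first node with enough free resources.
        for host, available in self.free.items():
            if request.fits_in(available):
                self.free[host] = available.minus(request)
                return {"app": app_id, "host": host, "resource": request}
        return None  # nothing fits; the application has to ask again later


scheduler = ToyScheduler({
    "node-1": Resource(memory_mb=4096, vcores=4),
    "node-2": Resource(memory_mb=8192, vcores=8),
})
first = scheduler.allocate("app-1", Resource(3072, 2))
second = scheduler.allocate("app-2", Resource(2048, 1))
third = scheduler.allocate("app-3", Resource(16384, 1))
print(first["host"], second["host"], third)  # node-1 node-2 None
```

Note what the toy scheduler does not do: it never learns whether a task inside a container succeeded, and it never retries anything. In YARN that responsibility sits with the ApplicationMaster.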

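The Scheduler has a pluggable policy that partitions cluster resources among the queues, applications, and so on. The current CapacityScheduler and FairScheduler are examples of such plug-ins.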
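The idea of a pluggable policy can be illustrated with a small strategy-pattern sketch. The two policies below are deliberately simplistic stand-ins: they are not the algorithms used by the CapacityScheduler or the FairScheduler, and all class names, queue names, and fields (`percent`, `running_apps`) are hypothetical.

```python
from abc import ABC, abstractmethod


class SchedulingPolicy(ABC):
    """Plug-in point: decides how cluster memory is split among queues."""

    @abstractmethod
    def partition(self, cluster_memory_mb, queues):
        """Return {queue_name: memory_mb}."""


class FixedShareToyPolicy(SchedulingPolicy):
    """Each queue gets its configured percentage of the cluster."""

    def partition(self, cluster_memory_mb, queues):
        return {name: cluster_memory_mb * q["percent"] // 100 for name, q in queues.items()}


class EqualShareToyPolicy(SchedulingPolicy):
    """Every queue with running applications gets an equal share."""

    def partition(self, cluster_memory_mb, queues):
        active = [name for name, q in queues.items() if q["running_apps"] > 0]
        share = cluster_memory_mb // len(active) if active else 0
        return {name: (share if name in active else 0) for name in queues}


class ToyScheduler:
    """The scheduler itself stays the same; only the policy object changes."""

    def __init__(self, policy):
        self.policy = policy

    def queue_limits(self, cluster_memory_mb, queues):
        return self.policy.partition(cluster_memory_mb, queues)


queues = {
    "prod": {"percent": 70, "running_apps": 3},
    "dev": {"percent": 30, "running_apps": 1},
    "adhoc": {"percent": 0, "running_apps": 0},
}
fixed = ToyScheduler(FixedShareToyPolicy()).queue_limits(10000, queues)
equal = ToyScheduler(EqualShareToyPolicy()).queue_limits(10000, queues)
print(fixed)  # {'prod': 7000, 'dev': 3000, 'adhoc': 0}
print(equal)  # {'prod': 5000, 'dev': 5000, 'adhoc': 0}
```

In a real cluster the scheduler implementation is selected through configuration (the `yarn.resourcemanager.scheduler.class` property in `yarn-site.xml`) rather than by passing an object in code.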

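The ApplicationsManager accepts job submissions, negotiates the first container for running the application-specific ApplicationMaster, and provides the service that restarts the ApplicationMaster container when it fails. Each application's ApplicationMaster negotiates suitable resource containers from the Scheduler, tracks their status, and monitors progress.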
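The restart-on-failure service can be sketched as a bounded retry loop. Everything here is a toy under stated assumptions: `ToyApplicationsManager`, its `submit` method, the `max_am_attempts` limit, and the flaky ApplicationMaster are invented for illustration and do not correspond to Hadoop classes or signatures.

```python
class ToyApplicationsManager:
    """Accepts submissions, starts the ApplicationMaster, restarts it on failure."""

    def __init__(self, max_am_attempts=2):
        self.max_am_attempts = max_am_attempts  # illustrative limit, not a Hadoop default
        self.attempts = {}  # app_id -> number of AM attempts made so far

    def submit(self, app_id, run_application_master):
        """Run the AM in its 'first container'; start a new attempt if it crashes."""
        for attempt in range(1, self.max_am_attempts + 1):
            self.attempts[app_id] = attempt
            try:
                # Stand-in for launching the AM inside a negotiated container.
                return run_application_master(attempt)
            except RuntimeError:
                continue  # AM container failed: fall through to the next attempt
        return "FAILED"


def flaky_application_master(attempt):
    # Hypothetical AM that crashes on its first attempt only.
    if attempt == 1:
        raise RuntimeError("AM container lost")
    return "SUCCEEDED"


manager = ToyApplicationsManager(max_am_attempts=2)
status = manager.submit("app-1", flaky_application_master)
print(status, manager.attempts["app-1"])  # SUCCEEDED 2
```

The sketch restarts only the ApplicationMaster itself. Recovering the tasks the application was running remains the job of the restarted ApplicationMaster, which matches the division of labour described above.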

MapReduce in hadoop-2.x is API-compatible with the previous stable release (hadoop-1.x). This means all MapReduce jobs should run on YARN unchanged, needing only a recompile.

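* The translator's ability is limited, so this translation is bound to contain inaccurate wording. Please bear with it, and please point out anything that is translated incorrectly or imprecisely so that we can discuss and improve together. Thank you.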

You may also like the following Hadoop articles:

Hadoop 2.4.1 standalone / pseudo-distributed installation and configuration on Ubuntu 14.04  http://www.linuxidc.com/Linux/2015-02/113487.htm

Installing and configuring Hadoop 2.2.0 on CentOS  http://www.linuxidc.com/Linux/2014-01/94685.htm

Setting up a pseudo-distributed Hadoop platform on CentOS 6.3  http://www.linuxidc.com/Linux/2016-11/136789.htm

Installing Hadoop 1.2.1 on Ubuntu 14.04 LTS (pseudo-distributed mode)  http://www.linuxidc.com/Linux/2016-09/135406.htm

Setting up a Hadoop environment on Ubuntu (standalone mode + pseudo-distributed mode)  http://www.linuxidc.com/Linux/2013-01/77681.htm

Hands-on: deploying a Hadoop cluster service on CentOS  http://www.linuxidc.com/Linux/2016-11/137246.htm

Illustrated tutorial on setting up a standalone Hadoop environment  http://www.linuxidc.com/Linux/2012-02/53927.htm

Hadoop 2.6.0 HA high-availability cluster configuration in detail  http://www.linuxidc.com/Linux/2016-08/134180.htm

Setting up a Spark 1.5 and Hadoop 2.7 cluster environment  http://www.linuxidc.com/Linux/2016-09/135067.htm

For more Hadoop information, see the Hadoop topic page  http://www.linuxidc.com/topicnews.aspx?tid=13

Permanent link to this article: http://www.linuxidc.com/Linux/2016-12/138028.htm

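Copyright notice: original article on this site, published by 星锅 on 2022-01-21, 3,830 characters in total.
Reprint notice: unless otherwise stated, articles on this site are published under the CC-4.0 license; please credit the source when reprinting.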