Hadoop 2.7.3 Secure Mode: The Official Hadoop Kerberos Configuration Guide Explained

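Introduction

This document describes how to configure authentication for Hadoop in secure mode. When Hadoop is configured to run in secure mode, every Hadoop service and every user must be authenticated by Kerberos. Forward and reverse host lookup for all service hosts must be configured correctly so that services can authenticate with each other. Host lookups may be configured using either DNS or /etc/hosts files. A working knowledge of Kerberos and DNS is recommended before attempting to configure Hadoop in secure mode.

For a detailed introduction to Kerberos, see http://www.linuxidc.com/Linux/2016-09/134949.htm.

Hadoop's security features consist of Authentication, Service Level Authorization, Authentication for Web Consoles (http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/HttpAuthentication.html), and Data Confidentiality.

Authentication

End User Accounts

When service level authentication is turned on, end users must authenticate themselves before interacting with Hadoop services. The simplest way is to authenticate interactively using the Kerberos kinit command. Programmatic authentication using Kerberos keytab files may be used when interactive login with kinit is not feasible.

User Accounts for Hadoop Daemons

Ensure that the HDFS and YARN daemons run as different Unix users, e.g. hdfs and yarn. Also ensure that the MapReduce JobHistory server runs as yet another user, e.g. mapred.

It is recommended that they share a Unix group, e.g. hadoop. See "Mapping from User to Group" for group management.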

User:Group | Daemons
hdfs:hadoop | NameNode, Secondary NameNode, JournalNode, DataNode
yarn:hadoop | ResourceManager, NodeManager
mapred:hadoop | MapReduce JobHistory Server

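Kerberos Principals for Hadoop Daemons

Each Hadoop service instance must be configured with its Kerberos principal and keytab file location.

The general format of a service principal is ServiceName/_HOST@REALM.TLD, e.g. dn/_HOST@EXAMPLE.COM.

Hadoop simplifies the deployment of configuration files by allowing the hostname component of the service principal to be specified as the _HOST wildcard. Each service instance substitutes its own fully qualified hostname for _HOST at runtime. This allows administrators to deploy the same set of configuration files on all nodes. The keytab files, however, will differ from host to host.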
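
As an illustration of the _HOST wildcard, the following hdfs-site.xml fragment is a minimal sketch that could be deployed unchanged to every DataNode. It reuses the dn principal and the keytab path that appear elsewhere in this document; REALM.TLD is a placeholder for your own realm.

```xml
<configuration>
  <!-- hdfs-site.xml (sketch): identical on every DataNode host -->
  <property>
    <name>dfs.datanode.kerberos.principal</name>
    <!-- _HOST is replaced at runtime with this host's fully qualified name -->
    <value>dn/_HOST@REALM.TLD</value>
  </property>
  <property>
    <name>dfs.datanode.keytab.file</name>
    <!-- same path on every host, but the keytab contents differ per host -->
    <value>/etc/security/keytab/dn.service.keytab</value>
  </property>
</configuration>
```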

HDFS

The NameNode keytab file, on each NameNode host, should look like the following:

$ klist -e -k -t /etc/security/keytab/nn.service.keytab
Keytab name: FILE:/etc/security/keytab/nn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

The Secondary NameNode keytab file, on that host, should look like the following:

$ klist -e -k -t /etc/security/keytab/sn.service.keytab
Keytab name: FILE:/etc/security/keytab/sn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

The DataNode keytab file, on each host, should look like the following:

$ klist -e -k -t /etc/security/keytab/dn.service.keytab
Keytab name: FILE:/etc/security/keytab/dn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

YARN

The ResourceManager keytab file, on the ResourceManager host, should look like the following:

$ klist -e -k -t /etc/security/keytab/rm.service.keytab
Keytab name: FILE:/etc/security/keytab/rm.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

The NodeManager keytab file, on each host, should look like the following:

$ klist -e -k -t /etc/security/keytab/nm.service.keytab
Keytab name: FILE:/etc/security/keytab/nm.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

MapReduce JobHistory Server

The MapReduce JobHistory Server keytab file, on that host, should look like the following:

$ klist -e -k -t /etc/security/keytab/jhs.service.keytab
Keytab name: FILE:/etc/security/keytab/jhs.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

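Mapping from Kerberos Principals to OS User Accounts

Hadoop maps Kerberos principals to OS (system) user accounts using the rules specified by hadoop.security.auth_to_local. These rules work in the same way as auth_to_local in the Kerberos configuration file (krb5.conf). In addition, Hadoop's auth_to_local mapping supports the /L flag, which lowercases the returned name.

By default, the first component of the principal name is taken as the system user name if the realm matches default_realm (usually defined in /etc/krb5.conf). For example, the default rule maps the principal host/full.qualified.domain.name@REALM.TLD to the system user host. The default rule is probably not appropriate for most clusters.

In a typical cluster, the HDFS and YARN services are launched as the system users hdfs and yarn respectively. hadoop.security.auth_to_local can be configured as follows: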

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0](nn/.*@.*REALM.TLD)s/.*/hdfs/
    RULE:[2:$1@$0](jn/.*@.*REALM.TLD)s/.*/hdfs/
    RULE:[2:$1@$0](dn/.*@.*REALM.TLD)s/.*/hdfs/
    RULE:[2:$1@$0](nm/.*@.*REALM.TLD)s/.*/yarn/
    RULE:[2:$1@$0](rm/.*@.*REALM.TLD)s/.*/yarn/
    RULE:[2:$1@$0](jhs/.*@.*REALM.TLD)s/.*/mapred/
    DEFAULT
  </value>
</property>

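Custom rules can be tested using the hadoop kerbname command. This command allows you to specify a principal and applies Hadoop's current auth_to_local rule set to it.

Mapping from User to Group

The mapping mechanism from system users to system groups can be configured via hadoop.security.group.mapping. See the HDFS Permissions Guide for details.

In practice, you need to manage an SSO (single sign-on) environment using Kerberos with LDAP for Hadoop in secure mode.

Proxy User

Some products that access Hadoop services on behalf of end users, such as Apache Oozie, need to be able to impersonate end users. See the proxy user documentation for details.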
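
A proxy user is typically declared in core-site.xml using the hadoop.proxyuser.superuser.hosts and hadoop.proxyuser.superuser.groups parameters listed later in this document, with "superuser" replaced by the impersonating account. The sketch below assumes a superuser named oozie; the host name and the group name are hypothetical and must be adapted to your cluster.

```xml
<configuration>
  <!-- core-site.xml (sketch): allow the assumed superuser "oozie" to impersonate end users -->
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <!-- hypothetical host from which oozie may impersonate; * would mean any host -->
    <value>oozie-host.example.com</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <!-- hypothetical group whose members may be impersonated; * would mean any group -->
    <value>analysts</value>
  </property>
</configuration>
```

Restricting both hosts and groups, rather than using the * wildcard, keeps the impersonation surface as small as possible.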

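Secure DataNode

Because the DataNode data transfer protocol does not use the Hadoop RPC framework, DataNodes must authenticate themselves using the privileged ports specified by dfs.datanode.address and dfs.datanode.http.address. This authentication is based on the assumption that an attacker cannot obtain root privileges on DataNode hosts.

When you execute the hdfs datanode command as root, the server process binds the privileged ports first, then drops privileges and runs as the user account specified by HADOOP_SECURE_DN_USER. This startup process uses the jsvc program installed in JSVC_HOME. You must specify HADOOP_SECURE_DN_USER and JSVC_HOME as environment variables at startup (in hadoop-env.sh).

As of version 2.6.0, SASL can be used to authenticate the data transfer protocol. In this configuration, a secured cluster no longer needs to start the DataNode as root using jsvc and bind to privileged ports. To enable SASL on the data transfer protocol, set dfs.data.transfer.protection in hdfs-site.xml, set a non-privileged port for dfs.datanode.address, set dfs.http.policy to HTTPS_ONLY, and make sure the HADOOP_SECURE_DN_USER environment variable is not defined. Note that it is not possible to use SASL on the data transfer protocol if dfs.datanode.address is set to a privileged port. This is required for backwards-compatibility reasons.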
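
The SASL settings named above can be sketched as the following hdfs-site.xml fragment. The protection level authentication is one of the three documented values (authentication, integrity, privacy), and port 10019 is only an assumed example of a non-privileged port (1024 or above); neither choice is prescribed by this document.

```xml
<configuration>
  <!-- hdfs-site.xml (sketch): authenticate the data transfer protocol with SASL -->
  <property>
    <name>dfs.data.transfer.protection</name>
    <value>authentication</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <!-- assumed non-privileged port; SASL cannot be used with a privileged port here -->
    <value>0.0.0.0:10019</value>
  </property>
  <property>
    <name>dfs.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>
</configuration>
```

In addition, HADOOP_SECURE_DN_USER must be left undefined in hadoop-env.sh when the DataNode is started this way.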

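To migrate an existing cluster from root authentication to SASL, first ensure that version 2.6.0 or later has been deployed to all cluster nodes and to every external application that needs to connect to the cluster. Only HDFS clients of version 2.6.0 or later can connect to a DataNode that uses SASL to authenticate the data transfer protocol, so it is vital that all callers run the correct version before migrating. Once version 2.6.0 or later has been deployed everywhere, update the configuration of all external applications to enable SASL. An HDFS client with SASL enabled can connect successfully to a DataNode regardless of whether that DataNode uses root authentication or SASL authentication. Configuring all clients first guarantees that subsequent configuration changes on the DataNodes will not disrupt those applications. Finally, each individual DataNode can be migrated by changing its configuration and restarting it.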

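Data Confidentiality

Data Encryption on RPC

The data transferred between Hadoop services and clients can be encrypted. Setting hadoop.rpc.protection to privacy in core-site.xml activates the encryption.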
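
As a minimal sketch, the RPC encryption setting described above would look like this in core-site.xml:

```xml
<configuration>
  <!-- core-site.xml (sketch): encrypt RPC traffic between Hadoop services and clients -->
  <property>
    <name>hadoop.rpc.protection</name>
    <!-- privacy = authentication + integrity check + data encryption -->
    <value>privacy</value>
  </property>
</configuration>
```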

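Data Encryption on Block Data Transfer

You need to set dfs.encrypt.data.transfer to true in hdfs-site.xml to activate data encryption for the DataNode data transfer protocol.

Optionally, you may set dfs.encrypt.data.transfer.algorithm to either 3des or rc4 to choose a specific encryption algorithm. If it is not specified, the JCE default configured on the system is used, which is usually 3DES.

Setting dfs.encrypt.data.transfer.cipher.suites to AES/CTR/NoPadding activates AES encryption. By default this is unspecified, so AES is not used. When AES is used, the algorithm specified in dfs.encrypt.data.transfer.algorithm is still used during the initial key exchange. The AES key bit length can be configured by setting dfs.encrypt.data.transfer.cipher.key.bitlength to 128, 192, or 256. The default is 128.

AES offers the greatest cryptographic strength and the best performance. At this time, 3DES and RC4 have been used more often in Hadoop clusters.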
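
The block transfer encryption options above could be combined as in the following hdfs-site.xml sketch. The particular choices (3des for the key exchange and a 256-bit AES key) are illustrative picks from the documented values, not a recommendation made by this document.

```xml
<configuration>
  <!-- hdfs-site.xml (sketch): encrypt the DataNode data transfer protocol with AES -->
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.algorithm</name>
    <!-- still used for the initial key exchange when AES is active -->
    <value>3des</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.cipher.suites</name>
    <value>AES/CTR/NoPadding</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
    <!-- 128 (default), 192 or 256 -->
    <value>256</value>
  </property>
</configuration>
```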

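Data Encryption on HTTP

Data transfer between the web consoles and clients is protected by SSL (HTTPS). SSL configuration is recommended, but it is not required in order to configure Hadoop security with Kerberos.

Configuration

Permissions for both HDFS and Local Filesystem Paths

The following table lists various HDFS and local filesystem paths (on all nodes) and the recommended permissions: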

Filesystem | Path | User:Group | Permissions
local | dfs.namenode.name.dir | hdfs:hadoop | drwx------
local | dfs.datanode.data.dir | hdfs:hadoop | drwx------
local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x
local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x
local | yarn.nodemanager.local-dirs | yarn:hadoop | drwxr-xr-x
local | yarn.nodemanager.log-dirs | yarn:hadoop | drwxr-xr-x
local | container-executor | root:hadoop | --Sr-s--*
local | conf/container-executor.cfg | root:hadoop | r-------*
hdfs | / | hdfs:hadoop | drwxr-xr-x
hdfs | /tmp | hdfs:hadoop | drwxrwxrwxt
hdfs | /user | hdfs:hadoop | drwxr-xr-x
hdfs | yarn.nodemanager.remote-app-log-dir | yarn:hadoop | drwxrwxrwxt
hdfs | mapreduce.jobhistory.intermediate-done-dir | mapred:hadoop | drwxrwxrwxt
hdfs | mapreduce.jobhistory.done-dir | mapred:hadoop | drwxr-x---

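Common Configurations

To turn on RPC authentication in Hadoop, set the value of the hadoop.security.authentication property to "kerberos", and set the security-related settings listed below appropriately.

The following properties should be in the core-site.xml of all nodes in the cluster.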

Parameter | Value | Notes
hadoop.security.authentication | kerberos | simple : No authentication. (default) kerberos : Enable authentication by Kerberos.
hadoop.security.authorization | true | Enable RPC service-level authorization.
hadoop.rpc.protection | authentication | authentication : authentication only (default); integrity : integrity check in addition to authentication; privacy : data encryption in addition to integrity
hadoop.security.auth_to_local | RULE:exp1 RULE:exp2 … DEFAULT | The value is a string containing new line characters. See the Kerberos documentation for the format of exp.
hadoop.proxyuser.superuser.hosts | | Comma-separated hosts from which superuser access is allowed to impersonate. * means wildcard.
hadoop.proxyuser.superuser.groups | | Comma-separated groups to which users impersonated by superuser belong. * means wildcard.
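
Expressed as XML, the first three parameters of the table above might look like the following core-site.xml sketch. It uses the values from the Value column; the auth_to_local and proxy user entries are omitted here because they are site-specific.

```xml
<configuration>
  <!-- core-site.xml (sketch): enable Kerberos authentication on all nodes -->
  <property>
    <name>hadoop.security.authentication</name>
    <!-- the default is "simple", which means no authentication -->
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <!-- turns on RPC service-level authorization -->
    <value>true</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <!-- authentication (default), integrity or privacy -->
    <value>authentication</value>
  </property>
</configuration>
```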

NameNode

Parameter | Value | Notes
dfs.block.access.token.enable | true | Enable HDFS block access tokens for secure operations.
dfs.namenode.kerberos.principal | nn/_HOST@REALM.TLD | Kerberos principal name for the NameNode.
dfs.namenode.keytab.file | /etc/security/keytab/nn.service.keytab | Kerberos keytab file for the NameNode.
dfs.namenode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | The server principal used by the NameNode for web UI SPNEGO authentication. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is '*', the web server will attempt to login with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal}, i.e. use the value of dfs.web.authentication.kerberos.principal.
dfs.web.authentication.kerberos.keytab | /etc/security/keytab/spnego.service.keytab | SPNEGO keytab file for the NameNode. In HA clusters this setting is shared with the Journal Nodes.
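
For orientation, the core NameNode parameters from the table above could be written in hdfs-site.xml roughly as follows. This is a sketch using the table's example values; REALM.TLD and the keytab path are placeholders to be replaced with your own.

```xml
<configuration>
  <!-- hdfs-site.xml (sketch): Kerberos settings for the NameNode -->
  <property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>nn/_HOST@REALM.TLD</value>
  </property>
  <property>
    <name>dfs.namenode.keytab.file</name>
    <value>/etc/security/keytab/nn.service.keytab</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.internal.spnego.principal</name>
    <!-- reuses the value of dfs.web.authentication.kerberos.principal, as the notes suggest -->
    <value>${dfs.web.authentication.kerberos.principal}</value>
  </property>
</configuration>
```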

The following settings allow configuring SSL access to the NameNode web UI (optional).

Parameter | Value | Notes
dfs.http.policy | HTTP_ONLY or HTTPS_ONLY or HTTP_AND_HTTPS | HTTPS_ONLY turns off http access. This option takes precedence over the deprecated configuration dfs.https.enable and hadoop.ssl.enabled. If using SASL to authenticate data transfer protocol instead of running DataNode as root and using privileged ports, then this property must be set to HTTPS_ONLY to guarantee authentication of HTTP servers. (See dfs.data.transfer.protection.)
dfs.namenode.https-address | nn_host_fqdn:50470 |
dfs.https.port | 50470 |
dfs.https.enable | true | This value is deprecated. Use dfs.http.policy.

Secondary NameNode

Parameter | Value | Notes
dfs.namenode.secondary.http-address | snn_host_fqdn:50090 |
dfs.secondary.namenode.keytab.file | /etc/security/keytab/sn.service.keytab | Kerberos keytab file for the Secondary NameNode.
dfs.secondary.namenode.kerberos.principal | sn/_HOST@REALM.TLD | Kerberos principal name for the Secondary NameNode.
dfs.secondary.namenode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | The server principal used by the Secondary NameNode for web UI SPNEGO authentication. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is '*', the web server will attempt to login with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal}, i.e. use the value of dfs.web.authentication.kerberos.principal.
dfs.namenode.secondary.https-port | 50470 |

JournalNode

Parameter | Value | Notes
dfs.journalnode.kerberos.principal | jn/_HOST@REALM.TLD | Kerberos principal name for the JournalNode.
dfs.journalnode.keytab.file | /etc/security/keytab/jn.service.keytab | Kerberos keytab file for the JournalNode.
dfs.journalnode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | The server principal used by the JournalNode for web UI SPNEGO authentication when Kerberos security is enabled. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is '*', the web server will attempt to login with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal}, i.e. use the value of dfs.web.authentication.kerberos.principal.
dfs.web.authentication.kerberos.keytab | /etc/security/keytab/spnego.service.keytab | SPNEGO keytab file for the JournalNode. In HA clusters this setting is shared with the Name Nodes.

DataNode

Parameter | Value | Notes
dfs.datanode.data.dir.perm | 700 |
dfs.datanode.address | 0.0.0.0:1004 | Secure DataNode must use privileged port in order to assure that the server was started securely. This means that the server must be started via jsvc. Alternatively, this must be set to a non-privileged port if using SASL to authenticate data transfer protocol. (See dfs.data.transfer.protection.)
dfs.datanode.http.address | 0.0.0.0:1006 | Secure DataNode must use privileged port in order to assure that the server was started securely. This means that the server must be started via jsvc.
dfs.datanode.https.address | 0.0.0.0:50470 |
dfs.datanode.kerberos.principal | dn/_HOST@REALM.TLD | Kerberos principal name for the DataNode.
dfs.datanode.keytab.file | /etc/security/keytab/dn.service.keytab | Kerberos keytab file for the DataNode.
dfs.encrypt.data.transfer | false | Set to true when using data encryption.
dfs.encrypt.data.transfer.algorithm | | Optionally set to 3des or rc4 when using data encryption to control the encryption algorithm.
dfs.encrypt.data.transfer.cipher.suites | | Optionally set to AES/CTR/NoPadding to activate AES encryption when using data encryption.
dfs.encrypt.data.transfer.cipher.key.bitlength | | Optionally set to 128, 192 or 256 to control key bit length when using AES with data encryption.
dfs.data.transfer.protection | | authentication : authentication only; integrity : integrity check in addition to authentication; privacy : data encryption in addition to integrity. This property is unspecified by default. Setting this property enables SASL for authentication of data transfer protocol. If this is enabled, then dfs.datanode.address must use a non-privileged port, dfs.http.policy must be set to HTTPS_ONLY and the HADOOP_SECURE_DN_USER environment variable must be undefined when starting the DataNode process.

WebHDFS

Parameter | Value | Notes
dfs.web.authentication.kerberos.principal | http/_HOST@REALM.TLD | Kerberos principal name for WebHDFS. In HA clusters this setting is commonly used by the JournalNodes for securing access to the JournalNode HTTP server with SPNEGO.
dfs.web.authentication.kerberos.keytab | /etc/security/keytab/http.service.keytab | Kerberos keytab file for WebHDFS. In HA clusters this setting is commonly used by the JournalNodes for securing access to the JournalNode HTTP server with SPNEGO.

ResourceManager

Parameter | Value | Notes
yarn.resourcemanager.principal | rm/_HOST@REALM.TLD | Kerberos principal name for the ResourceManager.
yarn.resourcemanager.keytab | /etc/security/keytab/rm.service.keytab | Kerberos keytab file for the ResourceManager.

NodeManager

Parameter | Value | Notes
yarn.nodemanager.principal | nm/_HOST@REALM.TLD | Kerberos principal name for the NodeManager.
yarn.nodemanager.keytab | /etc/security/keytab/nm.service.keytab | Kerberos keytab file for the NodeManager.
yarn.nodemanager.container-executor.class | org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor | Use LinuxContainerExecutor.
yarn.nodemanager.linux-container-executor.group | hadoop | Unix group of the NodeManager.
yarn.nodemanager.linux-container-executor.path | /path/to/bin/container-executor | The path to the executable of Linux container executor.
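
The NodeManager parameters above might be written in yarn-site.xml as in the sketch below. All values are the table's example values; in particular /path/to/bin/container-executor is a placeholder, not a real installation path.

```xml
<configuration>
  <!-- yarn-site.xml (sketch): Kerberos and LinuxContainerExecutor settings for the NodeManager -->
  <property>
    <name>yarn.nodemanager.principal</name>
    <value>nm/_HOST@REALM.TLD</value>
  </property>
  <property>
    <name>yarn.nodemanager.keytab</name>
    <value>/etc/security/keytab/nm.service.keytab</value>
  </property>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <!-- must match the group configured in container-executor.cfg -->
    <value>hadoop</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.path</name>
    <!-- placeholder path to the setuid executable -->
    <value>/path/to/bin/container-executor</value>
  </property>
</configuration>
```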

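Configuration for WebAppProxy

The WebAppProxy provides a proxy between the web applications exported by an application and an end user. If security is enabled, it warns users before they access a potentially unsafe web application. Authentication and authorization through the proxy are handled just like for any other privileged web application.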

Parameter | Value | Notes
yarn.web-proxy.address | WebAppProxy host:port for proxy to AM web apps. | host:port. If this is the same as yarn.resourcemanager.webapp.address or it is not defined, then the ResourceManager will run the proxy; otherwise a standalone proxy server will need to be launched.
yarn.web-proxy.keytab | /etc/security/keytab/web-app.service.keytab | Kerberos keytab file for the WebAppProxy.
yarn.web-proxy.principal | wap/_HOST@REALM.TLD | Kerberos principal name for the WebAppProxy.

LinuxContainerExecutor

A ContainerExecutor used by the YARN framework defines how any container is launched and controlled.

The following are available in Hadoop YARN:

ContainerExecutor | Description
DefaultContainerExecutor | The default executor which YARN uses to manage container execution. The container process has the same Unix user as the NodeManager.
LinuxContainerExecutor | Supported only on GNU/Linux, this executor runs the containers as either the YARN user who submitted the application (when full security is enabled) or as a dedicated user (defaults to nobody) when full security is not enabled. When full security is enabled, this executor requires all user accounts to be created on the cluster nodes where the containers are launched. It uses a setuid executable that is included in the Hadoop distribution. The NodeManager uses this executable to launch and kill containers. The setuid executable switches to the user who has submitted the application and launches or kills the containers. For maximum security, this executor sets up restricted permissions and user/group ownership of local files and directories used by the containers such as the shared objects, jars, intermediate files, log files etc. Particularly note that, because of this, except the application owner and NodeManager, no other user can access any of the local files/directories including those localized as part of the distributed cache.

To build the LinuxContainerExecutor executable, run:

$ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/

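The executable must have specific permissions: 6050 or --Sr-s---, user-owned by root (the super-user) and group-owned by a special group (e.g. hadoop) of which the NodeManager Unix user is a member and no ordinary application user is. If any application user belongs to this special group, security is compromised. The name of this special group must be specified in the yarn.nodemanager.linux-container-executor.group configuration property in both conf/yarn-site.xml and conf/container-executor.cfg.

For example, suppose the NodeManager runs as user yarn, who is a member of the groups users and hadoop, either of which may be the primary group. Suppose also that users has both yarn and another user, alice (an application submitter), as its members, and that alice does not belong to hadoop. Going by the description above, the setuid/setgid executable should be set to 6050 or --Sr-s---, with user-owner yarn and group-owner hadoop, which has yarn as its member (and not users, which has alice as a member in addition to yarn).

The LinuxTaskController requires that the directories specified in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs, and the paths leading up to them, be set to 755 permissions, as described in the table of directory permissions above.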

  • conf/container-executor.cfg

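The executable requires a configuration file called container-executor.cfg to be present in the configuration directory passed to the mvn target mentioned above.

The configuration file must be owned by the user running the NodeManager (user yarn in the example above), may be group-owned by any group, and should have the permissions 0400 or r--------.

The executable requires the following configuration items to be present in the conf/container-executor.cfg file. The items should be given as simple key=value pairs, one per line.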

Parameter | Value | Notes
yarn.nodemanager.linux-container-executor.group | hadoop | Unix group of the NodeManager. The group owner of the container-executor binary should be this group. Should be the same as the value with which the NodeManager is configured. This configuration is required for validating the secure access of the container-executor binary.
banned.users | hdfs,yarn,mapred,bin | Banned users.
allowed.system.users | foo,bar | Allowed system users.
min.user.id | 1000 | Prevent other super-users.

To recap, here are the local filesystem permissions required for the various paths related to the LinuxContainerExecutor:

Filesystem | Path | User:Group | Permissions
local | container-executor | root:hadoop | --Sr-s--*
local | conf/container-executor.cfg | root:hadoop | r-------*
local | yarn.nodemanager.local-dirs | yarn:hadoop | drwxr-xr-x
local | yarn.nodemanager.log-dirs | yarn:hadoop | drwxr-xr-x

MapReduce JobHistory Server

Parameter | Value | Notes
mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020.
mapreduce.jobhistory.keytab | /etc/security/keytab/jhs.service.keytab | Kerberos keytab file for the MapReduce JobHistory Server.
mapreduce.jobhistory.principal | jhs/_HOST@REALM.TLD | Kerberos principal name for the MapReduce JobHistory Server.
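
A mapred-site.xml sketch of the JobHistory Server parameters above follows. The host name jhs-host.example.com is hypothetical; the port is the documented default, and the principal and keytab are the table's example values.

```xml
<configuration>
  <!-- mapred-site.xml (sketch): Kerberos settings for the MapReduce JobHistory Server -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <!-- hypothetical host; 10020 is the default port -->
    <value>jhs-host.example.com:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.keytab</name>
    <value>/etc/security/keytab/jhs.service.keytab</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.principal</name>
    <value>jhs/_HOST@REALM.TLD</value>
  </property>
</configuration>
```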

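Multihoming

Multihomed setups, in which each host has multiple hostnames in DNS (e.g. different hostnames corresponding to public and private network interfaces), may require additional configuration to get Kerberos authentication working. See HDFS Support for Multihomed Networks.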

Permanent link to this article: http://www.linuxidc.com/Linux/2016-09/134948.htm

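Copyright notice: original article on this site, published by 星锅 on 2022-01-21, 19,812 characters in total.
Reposting: unless otherwise noted, all articles on this site are released under the CC-4.0 license; please credit the source when reposting.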