
Setting Up a Hadoop 2.2.0 Environment on OpenStack


OpenStack has become the first choice of many cloud vendors for building private clouds, and many academic institutions use it to run small test environments for student labs. This article walks through building a Hadoop 2.2.0 environment on OpenStack virtual machines.

1. Preparing the VM Environment

OpenStack version: Folsom

a. Launch three test virtual machines running Ubuntu-12.04.2-x86_64.

b. Configure IP addresses. In the Folsom release, FlatDHCP-mode networking gives each VM a fixed IP in the 10.0.x.x range, so /etc/hosts must be configured inside each VM.

# vim /etc/hosts

127.0.0.1 localhost localhost.localdomain
10.0.0.225 hdp-server-01
10.0.1.19 hdp-server-02
10.0.1.17 hdp-server-03
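For three nodes it is quick enough to edit /etc/hosts by hand, but the entries above can also be generated from a list. A minimal sketch (it writes to a scratch file rather than /etc/hosts, using the IPs from this article):

```shell
# Sketch: build a hosts file from the cluster's name/IP pairs.
# On the VMs this would append to /etc/hosts (as root); here we use a temp file.
HOSTS_FILE=$(mktemp)
echo "127.0.0.1 localhost localhost.localdomain" > "$HOSTS_FILE"
for entry in \
    "10.0.0.225 hdp-server-01" \
    "10.0.1.19 hdp-server-02" \
    "10.0.1.17 hdp-server-03"; do
  echo "$entry" >> "$HOSTS_FILE"
done
cat "$HOSTS_FILE"
```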

c. As root, create a user named yarn on each machine, using the same password everywhere.

# useradd -m -s /bin/bash yarn
# passwd yarn
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully

d. Set up passwordless SSH access between the nodes.

# on every machine
$ su yarn
$ cd ~
$ ssh-keygen -t rsa
$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
# use "ssh localhost" to check that passwordless login works
# then copy the contents of each node's .ssh/authorized_keys into every other node's .ssh/authorized_keys
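Copying keys pairwise gets tedious; the merge itself is just concatenation plus de-duplication. A local simulation of the merge (the key strings below are placeholders, not real key material):

```shell
# Sketch: merge the public keys of all three nodes into one authorized_keys,
# dropping duplicates -- the same result as copying keys between nodes.
WORK=$(mktemp -d)
for host in hdp-server-01 hdp-server-02 hdp-server-03; do
  # placeholder standing in for each node's .ssh/id_rsa.pub
  echo "ssh-rsa PLACEHOLDER-KEY-$host yarn@$host" > "$WORK/$host.pub"
done
cat "$WORK"/*.pub | sort -u > "$WORK/authorized_keys"
wc -l < "$WORK/authorized_keys"
```

On a real cluster, running `ssh-copy-id yarn@hdp-server-02` (and likewise for the other hosts) from each node achieves the same thing without manual copying.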

e. As the yarn user, SSH between the machines using the hostnames from /etc/hosts and verify that no password is required.

Because the operating system is 64-bit, the files downloaded from the official site cannot be used for installation directly; Hadoop must be compiled manually. The build process follows.

2. Compiling Hadoop 2.2.0

a. Configure the JDK environment variables, assuming the JDK lives in /usr/local/java/jdk1.7.0_45:

$ su yarn
$ vim ~/.bashrc
# append the following lines:

export JAVA_HOME=/usr/local/java/jdk1.7.0_45
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

$ source ~/.bashrc  # apply the settings
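A quick way to confirm the settings took effect after sourcing ~/.bashrc (a sketch; the stand-in assignments just let the check run anywhere):

```shell
# Sketch: verify JAVA_HOME is set and that its bin directory is on PATH.
JAVA_HOME=${JAVA_HOME:-/usr/local/java/jdk1.7.0_45}   # stand-in default for illustration
PATH="$JAVA_HOME/bin:$PATH"
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) JAVA_ON_PATH=yes ;;
  *)                    JAVA_ON_PATH=no ;;
esac
echo "JAVA_HOME=$JAVA_HOME java-on-path=$JAVA_ON_PATH"
```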


b. Install the build dependencies:

$ sudo apt-get install g++ autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev

c. Install protobuf 2.5.0:

$ cd $HOME/protobuf2.5.0

$ ./configure --prefix=/usr
$ sudo make
$ sudo make check
$ sudo make install

$ protoc --version
libprotoc 2.5.0

d. Install Maven:

$ sudo apt-get install maven

e. Start the build:

$ cd ~
$ tar -xzvf hadoop-2.2.0-src.tar.gz
$ cd hadoop-2.2.0-src/
$ mvn package -Pdist,native -DskipTests -Dtar

The build takes roughly 30 minutes; the finished files end up under hadoop-2.2.0-src/hadoop-dist/target. Copy the resulting hadoop-2.2.0 directory from there to $HOME/hadoop-2.2.0, which is the path used below. If all the VMs run the same operating system, the build only needs to be done on one machine.

Verify the build result:

yarn@hdp-server-01:~$ $HOME/hadoop-2.2.0/bin/hadoop version
Hadoop 2.2.0
Subversion Unknown -r Unknown
Compiled by yarn on 2013-11-05T06:41Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4

This command was run using /home/yarn/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar

 

yarn@hdp-server-01:~$ file $HOME/hadoop-2.2.0/lib/native/*
/home/yarn/hadoop-2.2.0/lib/native/libhadoop.a:        current ar archive
/home/yarn/hadoop-2.2.0/lib/native/libhadooppipes.a:  current ar archive
/home/yarn/hadoop-2.2.0/lib/native/libhadoop.so:      ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0xaa74c9d23bfe750f160412e4465b14c88cf1c650, not stripped
/home/yarn/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0xaa74c9d23bfe750f160412e4465b14c88cf1c650, not stripped
/home/yarn/hadoop-2.2.0/lib/native/libhadooputils.a:  current ar archive
/home/yarn/hadoop-2.2.0/lib/native/libhdfs.a:          current ar archive
/home/yarn/hadoop-2.2.0/lib/native/libhdfs.so:        ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x89671252f3c5fb7034425e80c9d31ea67da75c4d, not stripped
/home/yarn/hadoop-2.2.0/lib/native/libhdfs.so.0.0.0:  ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x89671252f3c5fb7034425e80c9d31ea67da75c4d, not stripped


3. Installing and Configuring Hadoop 2.2.0

Assume the node roles are assigned as follows:

hdp-server-01    resourcemanager, nodemanager, proxyserver, historyserver, datanode, namenode
hdp-server-02    datanode, nodemanager
hdp-server-03    datanode, nodemanager

a. Prepare the directories:

mkdir -p ~/yarn_data/tmp
mkdir -p ~/yarn_data/mapred

b. Configure the environment variables (append to ~/.bashrc):

# hadoop env
export HADOOP_HOME="$HOME/hadoop-2.2.0"
export HADOOP_PREFIX="$HADOOP_HOME/"
export YARN_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop/"
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"

c. Patch the stock startup script:

$ cd $YARN_HOME/libexec/
$ vim hadoop-config.sh
# change line 96 to:
export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"
# save and quit vim
d. Edit the configuration files.

<!– $YARN_HOME/etc/hadoop/core-site.xml –>

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdp-server-01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/yarn/yarn_data/tmp/hadoop-grid</value>
  </property>
</configuration>

 

<!– $YARN_HOME/etc/hadoop/hdfs-site.xml –>

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

 

<!– $YARN_HOME/etc/hadoop/yarn-site.xml –>

<?xml version="1.0"?>

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hdp-server-01:8032</value>
  </property>
  <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>hdp-server-01:8031</value>
  </property>
  <property>
      <name>yarn.resourcemanager.admin.address</name>
      <value>hdp-server-01:8033</value>
  </property>
  <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>hdp-server-01:8030</value>
  </property>
  <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/home/yarn/yarn_data/mapred/nodemanager</value>
      <final>true</final>
  </property>
  <property>
      <name>yarn.web-proxy.address</name>
      <value>hdp-server-01:8888</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

 

 

<!– $YARN_HOME/etc/hadoop/mapred-site.xml –>

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

 

This completes the configuration. Copy the $HOME/hadoop-2.2.0 and $HOME/yarn_data directories to the same locations on the other machines, making sure all the files are owned by yarn.
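Distributing the two directories can be scripted. The sketch below only prints the copy commands (so it is safe to run anywhere); on the cluster you would run them for real, e.g. with scp:

```shell
# Sketch: print the scp commands that would push the build and data dirs
# to the other nodes (hostnames from this article).
CMDS=$(for host in hdp-server-02 hdp-server-03; do
  echo "scp -r \$HOME/hadoop-2.2.0 \$HOME/yarn_data yarn@$host:"
done)
echo "$CMDS"
```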

e. Format HDFS:

$ hdfs namenode -format

f. Start Hadoop

@hdp-server-01  # each VM starts a different set of services, according to the roles assigned above

$ cd $YARN_HOME
$ sbin/hadoop-daemon.sh --script hdfs start namenode    # start the namenode
$ sbin/hadoop-daemon.sh --script hdfs start datanode    # start the datanode
$ sbin/yarn-daemon.sh start nodemanager                 # start the nodemanager
$ sbin/yarn-daemon.sh start resourcemanager             # start the resourcemanager
$ sbin/yarn-daemon.sh start proxyserver                 # start the web app proxy
$ sbin/mr-jobhistory-daemon.sh start historyserver      # start the job history server

Check with jps:
$ jps
8770 ResourceManager
11609 Jps
8644 NodeManager
9071 JobHistoryServer
8479 NameNode
9000 WebAppProxyServer
8552 DataNode
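With six daemons on the first node it is easy to miss one, so the expected set can be checked mechanically. A sketch that validates a jps listing (it uses the output above as sample data; on the node itself you would replace JPS_OUT with `$(jps)`):

```shell
# Sketch: check a jps listing for the daemons hdp-server-01 should run.
JPS_OUT="8770 ResourceManager
8644 NodeManager
9071 JobHistoryServer
8479 NameNode
9000 WebAppProxyServer
8552 DataNode"
MISSING=""
for d in ResourceManager NodeManager JobHistoryServer NameNode WebAppProxyServer DataNode; do
  echo "$JPS_OUT" | grep -qw "$d" || MISSING="$MISSING $d"
done
if [ -z "$MISSING" ]; then echo "all expected daemons running"; else echo "missing:$MISSING"; fi
```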

@hdp-server-02
@hdp-server-03

$ cd $YARN_HOME
$ sbin/yarn-daemon.sh start nodemanager                 # start the nodemanager
$ sbin/hadoop-daemon.sh --script hdfs start datanode    # start the datanode

Check with jps:
$ jps
6691 NodeManager
9089 Jps
6787 DataNode

The cluster is now up. Try running a test job:

$ cd $YARN_HOME
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 1000

This is the bundled example that estimates pi with a Monte Carlo method; the two numbers after pi are the number of map tasks and the number of samples per map (which controls the precision). The result:

Number of Maps  = 10
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
13/12/22 17:50:42 INFO client.RMProxy: Connecting to ResourceManager at hdp-server-01/10.0.0.225:8032
13/12/22 17:50:43 INFO input.FileInputFormat: Total input paths to process : 10
13/12/22 17:50:43 INFO mapreduce.JobSubmitter: number of splits:10
13/12/22 17:50:43 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
13/12/22 17:50:43 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/12/22 17:50:43 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/12/22 17:50:43 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/12/22 17:50:43 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/12/22 17:50:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1387700249346_0004
13/12/22 17:50:44 INFO impl.YarnClientImpl: Submitted application application_1387700249346_0004 to ResourceManager at hdp-server-01/10.0.0.225:8032
13/12/22 17:50:44 INFO mapreduce.Job: The url to track the job: http://hdp-server-01:8888/proxy/application_1387700249346_0004/
13/12/22 17:50:44 INFO mapreduce.Job: Running job: job_1387700249346_0004
13/12/22 17:50:53 INFO mapreduce.Job: Job job_1387700249346_0004 running in uber mode : false
13/12/22 17:50:53 INFO mapreduce.Job:  map 0% reduce 0%
13/12/22 17:51:03 INFO mapreduce.Job:  map 40% reduce 0%
13/12/22 17:51:13 INFO mapreduce.Job:  map 90% reduce 0%
13/12/22 17:51:14 INFO mapreduce.Job:  map 100% reduce 0%
13/12/22 17:51:15 INFO mapreduce.Job:  map 100% reduce 100%
13/12/22 17:51:16 INFO mapreduce.Job: Job job_1387700249346_0004 completed successfully
13/12/22 17:51:16 INFO mapreduce.Job: Counters: 43
 File System Counters
  FILE: Number of bytes read=226
  FILE: Number of bytes written=878638
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=2680
  HDFS: Number of bytes written=215
  HDFS: Number of read operations=43
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=3
 Job Counters
  Launched map tasks=10
  Launched reduce tasks=1
  Data-local map tasks=10
  Total time spent by all maps in occupied slots (ms)=142127
  Total time spent by all reduces in occupied slots (ms)=8333
 Map-Reduce Framework
  Map input records=10
  Map output records=20
  Map output bytes=180
  Map output materialized bytes=280
  Input split bytes=1500
  Combine input records=0
  Combine output records=0
  Reduce input groups=2
  Reduce shuffle bytes=280
  Reduce input records=20
  Reduce output records=0
  Spilled Records=40
  Shuffled Maps =10
  Failed Shuffles=0
  Merged Map outputs=10
  GC time elapsed (ms)=2606
  CPU time spent (ms)=11090
  Physical memory (bytes) snapshot=2605563904
  Virtual memory (bytes) snapshot=11336945664
  Total committed heap usage (bytes)=2184183808
 Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=0
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
 File Input Format Counters
  Bytes Read=1180
 File Output Format Counters
  Bytes Written=97
Job Finished in 34.098 seconds
Estimated value of Pi is 3.14080000000000000000
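The printed estimate is consistent with the quasi-Monte Carlo count: 10 maps × 1000 samples = 10,000 points, and the estimator is pi ≈ 4 × inside/total, so 3.1408 corresponds to 7,852 points landing inside the quarter circle (a value inferred from the output; the job does not print it). Checking the arithmetic:

```shell
# Sketch: recompute the pi estimate from the (inferred) inside-point count.
PI_EST=$(awk 'BEGIN { inside = 7852; total = 10 * 1000; printf "%.4f", 4 * inside / total }')
echo "$PI_EST"
```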

Summary:

1. Setting up Hadoop on OpenStack virtual machines differs little from doing so on physical machines, but pay attention to which IP addresses the VMs obtain: the floating IPs assigned by OpenStack usually cannot be used, because a floating IP is set up by nova-network for NAT forwarding and the VM itself does not know that address.

2. The VMs in the cluster should ideally run the same operating system so that the compiled files can be reused. In Hadoop 2.2.0 every node carries an identical set of configuration files (which roles a node plays is decided by which daemons you start, not by per-node configuration), so you can launch one VM, install and configure it, snapshot it as an image, and then boot as many nodes as needed, differing only in the services they start.

More Hadoop-related information can be found on the Hadoop topic page: http://www.linuxidc.com/topicnews.aspx?tid=13

Posted by 星锅 on 2022-01-20.