Detailed Steps for Setting Up Hadoop on Linux

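Prerequisites: a virtual machine with Linux installed, working network connectivity between the host and the virtual machine, and a JDK installed on Linux.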

1: On Linux, run vi /etc/profile and add HADOOP_HOME (together with JAVA_HOME and PATH):

export  JAVA_HOME=/home/hadoop/export/jdk
export  HADOOP_HOME=/home/hadoop/export/hadoop
export  PATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
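After saving /etc/profile, the new variables only apply to shells that have re-read the file, for example after running source /etc/profile or logging in again. A minimal sketch for confirming that both variables point at real directories is shown below; check_hadoop_env is a hypothetical helper name used only for illustration, not something Hadoop ships.

```shell
# Hypothetical helper: report whether JAVA_HOME and HADOOP_HOME
# each point at an existing directory.
check_hadoop_env() {
    local status=0 name value
    for name in JAVA_HOME HADOOP_HOME; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -d "$value" ]; then
            echo "$name=$value (ok)"
        else
            echo "$name='$value' is not a directory" >&2
            status=1
        fi
    done
    return "$status"
}
```

A typical use would be source /etc/profile followed by check_hadoop_env; a non-zero exit status means at least one of the two paths needs correcting.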

2: Edit line 9 of hadoop-env.sh in the hadoop/conf directory:

export JAVA_HOME=/home/hadoop/export/jdk

3: Edit core-site.xml in the hadoop/conf directory:

<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/…/tmp</value>
        </property>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://127.0.0.1:9000</value>
        </property>
</configuration>

4: Edit hdfs-site.xml in the hadoop/conf directory:

<configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
</configuration>

5: Edit mapred-site.xml in the hadoop/conf directory:

<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>127.0.0.1:9001</value>
        </property>
</configuration>
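
To double-check what actually ended up in the three files, a property value can be read back from the shell. The sketch below assumes each <name> and <value> element sits on its own line, as in the files above; it is not a general XML parser, and get_hadoop_property is a hypothetical helper name.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file laid out one element per line.
get_hadoop_property() {
    local file="$1" name="$2"
    awk -v want="$name" '
        /<name>/ {
            line = $0
            sub(/.*<name>/, "", line)
            sub(/<\/name>.*/, "", line)
            current = line          # remember the most recent property name
        }
        /<value>/ {
            if (current == want) {
                line = $0
                sub(/.*<value>/, "", line)
                sub(/<\/value>.*/, "", line)
                print line
                exit
            }
        }
    ' "$file"
}
```

For the core-site.xml from step 3, get_hadoop_property "$HADOOP_HOME/conf/core-site.xml" fs.default.name would print hdfs://127.0.0.1:9000.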

The configuration is now complete.
Change to the hadoop/bin directory and run hadoop namenode -format
Output like the following indicates success:

Warning: $HADOOP_HOME is deprecated.

14/07/15 16:06:27 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:  host = ubuntu/127.0.1.1
STARTUP_MSG:  args = [-format]
STARTUP_MSG:  version = 1.2.1
STARTUP_MSG:  build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:  java = 1.7.0_55
************************************************************/

14/07/15 16:07:09 INFO util.GSet: Computing capacity for map BlocksMap
14/07/15 16:07:09 INFO util.GSet: VM type      = 32-bit
14/07/15 16:07:09 INFO util.GSet: 2.0% max memory = 1013645312
14/07/15 16:07:09 INFO util.GSet: capacity      = 2^22 = 4194304 entries
14/07/15 16:07:09 INFO util.GSet: recommended=4194304, actual=4194304
14/07/15 16:07:10 INFO namenode.FSNamesystem: fsOwner=hadoop
14/07/15 16:07:10 INFO namenode.FSNamesystem: supergroup=supergroup
14/07/15 16:07:10 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/07/15 16:07:10 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/07/15 16:07:10 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/07/15 16:07:10 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/07/15 16:07:10 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/07/15 16:07:10 INFO common.Storage: Image file /home/hadoop/tmp/dfs/name/current/fsimage of size 118 bytes saved in 0 seconds.
14/07/15 16:07:10 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/hadoop/tmp/dfs/name/current/edits
14/07/15 16:07:10 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/hadoop/tmp/dfs/name/current/edits
14/07/15 16:07:10 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.
14/07/15 16:07:10 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/

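Some people will hit a failure at this step. If that happens, check the logs directory under the Hadoop installation; the exceptions recorded there are very detailed.
After a failed first attempt, remember to delete the output under tmp, because leftover data can cause incompatibility errors.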
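
As a rough sketch of how to find those exceptions quickly, the helper below prints every ERROR or FATAL line from the .log files in a log directory. The function name show_log_errors is hypothetical, and the pattern assumes the log line layout seen in this article (timestamp, level, class name).

```shell
# Hypothetical helper: list ERROR and FATAL lines from all *.log files
# in the given directory, each prefixed with the file it came from.
show_log_errors() {
    local log_dir="$1"
    # -H prints the file name even when only one file matches the glob.
    grep -H -E ' (ERROR|FATAL) ' "$log_dir"/*.log
}
```

With the layout used here, it would be called as show_log_errors "$HADOOP_HOME/logs".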

Then run start-all.sh:

Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-namenode-ubuntu.out
localhost: starting datanode, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-ubuntu.out
starting jobtracker, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-ubuntu.out

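During this process you may be prompted for a password. In that case you can set up passwordless SSH login, which is covered elsewhere on my blog.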
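
The blog post referred to is not reproduced here, so the following is only a sketch of one common way to set this up for a single-machine installation, not necessarily the author's method. It generates an RSA key with an empty passphrase and authorizes it for the same account; setup_passwordless_ssh and its optional directory argument are hypothetical names.

```shell
# Hypothetical helper: create a passphrase-less RSA key (if none exists)
# and authorize it for SSH logins to this same account.
setup_passwordless_ssh() {
    local ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N '' -f "$ssh_dir/id_rsa" || return 1
    fi
    # Append the public key only if it is not already authorized.
    if ! grep -qxF -- "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" 2>/dev/null; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

After running setup_passwordless_ssh with no arguments, ssh localhost should log in without asking for a password, provided an SSH server is running on the machine.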
Run jps. The output looks like the following (one daemon, the DataNode, is missing because I deliberately introduced an error here):
10666 NameNode
11547 Jps
11445 TaskTracker
11130 SecondaryNameNode
11218 JobTracker
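
Spotting a missing daemon by eye is easy to get wrong, so here is a small sketch that compares jps output against the five daemons started by start-all.sh above. The function name missing_daemons is hypothetical; it reads jps output on standard input.

```shell
# Hypothetical helper: read jps output on stdin and print the name of
# every expected Hadoop 1.x daemon that does not appear in it.
missing_daemons() {
    local output daemon
    output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -w matches whole words only, so "NameNode" is not satisfied
        # by a line that merely contains "SecondaryNameNode".
        if ! printf '%s\n' "$output" | grep -qw "$daemon"; then
            printf '%s\n' "$daemon"
        fi
    done
}
```

Running jps | missing_daemons on the output shown above would print DataNode; no output means all five daemons are up.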

 

Check the logs:

2014-07-15 16:13:43,032 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-07-15 16:13:43,094 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2014-07-15 16:13:43,098 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-07-15 16:13:43,118 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2014-07-15 16:13:43,999 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2014-07-15 16:13:44,044 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2014-07-15 16:13:45,484 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/hadoop/tmp/dfs/data: namenode namespaceID = 224603228; datanode namespaceID = 566757162
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)

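The DataNode's stored namespaceID no longer matches the NameNode's. To fix this, simply delete the files under tmp and the problem is resolved.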
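
A cautious sketch of that cleanup is given below. It assumes the tmp directory is the one shown in the log (/home/hadoop/tmp) and that losing everything stored in HDFS is acceptable, since the NameNode metadata lives under the same directory and has to be re-created with hadoop namenode -format afterwards. clear_hadoop_tmp is a hypothetical helper name.

```shell
# Hypothetical helper: remove everything inside the Hadoop tmp directory
# while refusing obviously dangerous arguments.
clear_hadoop_tmp() {
    local tmp_dir="$1"
    if [ -z "$tmp_dir" ] || [ "$tmp_dir" = "/" ] || [ ! -d "$tmp_dir" ]; then
        echo "refusing to clear '$tmp_dir'" >&2
        return 1
    fi
    # ${tmp_dir:?} aborts the expansion if the variable is somehow empty.
    rm -rf -- "${tmp_dir:?}"/*
}

# Intended sequence on the machine from this article (not executed here):
#   stop-all.sh
#   clear_hadoop_tmp /home/hadoop/tmp
#   hadoop namenode -format
#   start-all.sh
```

The daemons are stopped first so that nothing is writing into the directory while it is being emptied.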

 

Next you can run an example job. The steps are as follows:

hadoop@ubuntu:~/export/hadoop$ ls
bin          hadoop-ant-1.2.1.jar          ivy          README.txt
build.xml    hadoop-client-1.2.1.jar      ivy.xml      sbin
c++          hadoop-core-1.2.1.jar        lib          share
CHANGES.txt  hadoop-examples-1.2.1.jar    libexec      src
conf        hadoop-minicluster-1.2.1.jar  LICENSE.txt  webapps
contrib      hadoop-test-1.2.1.jar        logs
docs        hadoop-tools-1.2.1.jar        NOTICE.txt

Upload a file to HDFS:

hadoop@ubuntu:~/export/hadoop$ hadoop fs -put README.txt  /
Warning: $HADOOP_HOME is deprecated.

Output like the above indicates that the upload succeeded.
Now run the wordcount program on README.txt:

hadoop@ubuntu:~/export/hadoop$ hadoop jar hadoop-examples-1.2.1.jar wordcount /README.txt /wordcountoutput
Warning: $HADOOP_HOME is deprecated.

14/07/15 15:23:01 INFO input.FileInputFormat: Total input paths to process : 1
14/07/15 15:23:01 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/07/15 15:23:01 WARN snappy.LoadSnappy: Snappy native library not loaded
14/07/15 15:23:02 INFO mapred.JobClient: Running job: job_201407141636_0001
14/07/15 15:23:03 INFO mapred.JobClient:  map 0% reduce 0%
14/07/15 15:23:15 INFO mapred.JobClient:  map 100% reduce 0%
14/07/15 15:23:30 INFO mapred.JobClient:  map 100% reduce 100%
14/07/15 15:23:32 INFO mapred.JobClient: Job complete: job_201407141636_0001
14/07/15 15:23:32 INFO mapred.JobClient: Counters: 29
14/07/15 15:23:32 INFO mapred.JobClient:  Job Counters
14/07/15 15:23:32 INFO mapred.JobClient:    Launched reduce tasks=1
14/07/15 15:23:32 INFO mapred.JobClient:    SLOTS_MILLIS_MAPS=12563
14/07/15 15:23:32 INFO mapred.JobClient:    Total time spent by all reduces waiting after reserving slots (ms)=0
14/07/15 15:23:32 INFO mapred.JobClient:    Total time spent by all maps waiting after reserving slots (ms)=0
14/07/15 15:23:32 INFO mapred.JobClient:    Launched map tasks=1
14/07/15 15:23:32 INFO mapred.JobClient:    Data-local map tasks=1
14/07/15 15:23:32 INFO mapred.JobClient:    SLOTS_MILLIS_REDUCES=14550
14/07/15 15:23:32 INFO mapred.JobClient:  File Output Format Counters
14/07/15 15:23:32 INFO mapred.JobClient:    Bytes Written=1306
14/07/15 15:23:32 INFO mapred.JobClient:  FileSystemCounters
14/07/15 15:23:32 INFO mapred.JobClient:    FILE_BYTES_READ=1836
14/07/15 15:23:32 INFO mapred.JobClient:    HDFS_BYTES_READ=1463
14/07/15 15:23:32 INFO mapred.JobClient:    FILE_BYTES_WRITTEN=120839
14/07/15 15:23:32 INFO mapred.JobClient:    HDFS_BYTES_WRITTEN=1306
14/07/15 15:23:32 INFO mapred.JobClient:  File Input Format Counters
14/07/15 15:23:32 INFO mapred.JobClient:    Bytes Read=1366
14/07/15 15:23:32 INFO mapred.JobClient:  Map-Reduce Framework
14/07/15 15:23:32 INFO mapred.JobClient:    Map output materialized bytes=1836
14/07/15 15:23:32 INFO mapred.JobClient:    Map input records=31
14/07/15 15:23:32 INFO mapred.JobClient:    Reduce shuffle bytes=1836
14/07/15 15:23:32 INFO mapred.JobClient:    Spilled Records=262
14/07/15 15:23:32 INFO mapred.JobClient:    Map output bytes=2055
14/07/15 15:23:32 INFO mapred.JobClient:    Total committed heap usage (bytes)=212611072
14/07/15 15:23:32 INFO mapred.JobClient:    CPU time spent (ms)=2430
14/07/15 15:23:32 INFO mapred.JobClient:    Combine input records=179
14/07/15 15:23:32 INFO mapred.JobClient:    SPLIT_RAW_BYTES=97
14/07/15 15:23:32 INFO mapred.JobClient:    Reduce input records=131
14/07/15 15:23:32 INFO mapred.JobClient:    Reduce input groups=131
14/07/15 15:23:32 INFO mapred.JobClient:    Combine output records=131
14/07/15 15:23:32 INFO mapred.JobClient:    Physical memory (bytes) snapshot=177545216
14/07/15 15:23:32 INFO mapred.JobClient:    Reduce output records=131
14/07/15 15:23:32 INFO mapred.JobClient:    Virtual memory (bytes) snapshot=695681024
14/07/15 15:23:32 INFO mapred.JobClient:    Map output records=179

hadoop@ubuntu:~/export/hadoop$ hadoop fs -ls /
Warning: $HADOOP_HOME is deprecated.

Found 3 items
-rw-r--r--  1 hadoop supergroup      1366 2014-07-15 15:21 /README.txt
drwxr-xr-x  - hadoop supergroup          0 2014-07-14 16:36 /home
drwxr-xr-x  - hadoop supergroup          0 2014-07-15 15:23 /wordcountoutput
hadoop@ubuntu:~/export/hadoop$ hadoop fs -get  /wordcountoutput  /home/hadoop/
Warning: $HADOOP_HOME is deprecated.

You can download the output and take a look at the file.
Its contents look like this:

(see 1
5D002.C.1, 1
740.13) 1
<http://www.wassenaar.org/> 1
Administration 1
Apache 1
BEFORE 1
BIS 1
Bureau 1
Commerce, 1
Commodity 1
Control 1
Core 1
Department 1
ENC 1
Exception 1
Export 2
For 1
Foundation 1
Government 1
Hadoop 1
Hadoop, 1
Industry 1
Jetty 1
License 1
Number 1
Regulations, 1
SSL 1
Section 1
Security 1
See 1
Software 2
Technology 1
The 4
This 1
U.S. 1
Unrestricted 1
about 1
algorithms. 1
and 6
and/or 1
another 1
any 1
as 1
asymmetric 1
at: 2
both 1
by 1
check 1
classified 1
code 1
code. 1
concerning 1
country 1
country's 1
country, 1
cryptographic 3
currently 1
details 1
distribution 2
eligible 1
encryption 3
exception 1
export 1
following 1
for 3
form 1
from 1
functions 1
has 1
have 1
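
To sanity-check a result like this without Hadoop, the same counts can be approximated locally. The sketch below assumes the example job splits its input on whitespace only, which is consistent with tokens such as "Hadoop," and "code." in the listing above; local_wordcount is a hypothetical helper name.

```shell
# Hypothetical helper: count whitespace-separated tokens in a file and
# print "token<TAB>count" lines in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2 "\t" $1 }'
}
```

Running local_wordcount README.txt in the Hadoop directory gives a listing that can be compared side by side with the downloaded job output.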

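Copyright notice: this is an original article from this site, published by 星锅 on 2022-01-20, 13,960 characters in total.
Reprint notice: unless otherwise stated, articles on this site are released under the CC-4.0 license; please credit the source when reprinting.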