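Installing Java and Hadoop on Ubuntu 14.04
Java is installed in /usr/lib/jvm/jdk1.7.0_72
1. Download the JDK.
2. Use sudo to create the jvm directory and cp the archive into it.
3. Unpack it with tar -zxvf.
4. Run sudo chown -R castle:castle hadoop-2.6.0 to change ownership.
5. Configure the environment variables.
Add the following to ~/.profile; ~/.bashrc works as well.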
# set java env
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_72
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
# set hadoop env
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin
Run source ~/.profile so that the file takes effect without logging out and back in.
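Steps 2 and 3 above (create the directory, then unpack the archive into it) can be sketched as one small shell helper. This is only an illustration: unpack_to and the SUDO variable are made-up names, and the archive file name in the usage line is an assumption, since the article does not say what the downloaded file is called.

```shell
# Hypothetical helper: unpack a .tar.gz archive into a target directory,
# creating the directory first if needed.
# Set SUDO=sudo when the target (for example /usr/lib/jvm) is owned by root.
unpack_to() {
  archive=$1
  target=$2
  ${SUDO:-} mkdir -p "$target" || return 1
  ${SUDO:-} tar -zxf "$archive" -C "$target"
}
```

Usage, with an assumed archive name: SUDO=sudo unpack_to jdk-7u72-linux-x64.tar.gz /usr/lib/jvm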
Hadoop is installed in /usr/local/hadoop/hadoop-2.6.0
The first steps are much the same as for Java above.
1. Configure etc/hadoop/hadoop-env.sh
# set to the root of your Java installation
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_72
# hadoop
export HADOOP_PREFIX=/usr/local/hadoop/hadoop-2.6.0
2. Pseudo-distributed configuration
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop-2.6.0/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hadoop-2.6.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hadoop-2.6.0/dfs/data</value>
</property>
<!-- The next property keeps Eclipse from being refused read/write access later on. -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
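As a quick sanity check, the value that one of these files sets for a property can be pulled out with standard text tools. The helper below is a rough sketch and not part of Hadoop: get_prop is a made-up name, and it assumes that the <value> tag directly follows the <name> tag of the same property, as it does in the files shown here. The property name is used as a regular expression, so a dot in it matches any character.

```shell
# Hypothetical helper: print the value of property NAME in a Hadoop XML file.
# usage: get_prop FILE NAME
get_prop() {
  # Join the file into one line, then capture the text inside the <value>
  # element that immediately follows the requested <name> element.
  tr -d '\n' < "$1" |
    sed -n "s|.*<name>$2</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}
```

For example, get_prop etc/hadoop/core-site.xml fs.defaultFS should print the hdfs:// address configured above. With Hadoop on the PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves.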
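etc/hadoop/mapred-site.xml:
<!-- mapreduce parameter -->
<!-- The new framework supports third-party MapReduce development frameworks, including non-YARN architectures such as SmartTalk/DGSG. This value is normally set to yarn.
If it is not set, submitted jobs run only in local mode, not in distributed mode. -->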
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Note: older versions of MapReduce required the following to be configured here:
<property>
<name>mapred.job.tracker</name>
<value>http://192.168.1.2:9001</value>
</property>
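In the new framework this has been replaced by the ResourceManager and NodeManager settings in yarn-site.xml. Lookup of historical jobs has also been split out of the JobTracker into the separate mapreduce.jobtracker.jobhistory settings,
so this option does not need to be configured here. Setting the relevant properties in yarn-site.xml is enough.
etc/hadoop/yarn-site.xml: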
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
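For the differences between the old and new versions of MapReduce, see: http://www.linuxidc.com/Linux/2013-09/90090.htm
Xiapi's classic cluster configuration guide: http://www.linuxidc.com/Linux/2012-12/76694.htm
Other articles on these changes: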
http://www.linuxidc.com/Linux/2015-01/112368.htm
http://www.linuxidc.com/Linux/2015-01/112369.htm
3. Configure passwordless SSH login
If Ubuntu does not have the SSH software installed:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh-keygen generates the key pair.
ssh localhost still failed with a connection error:
ssh: connect to host localhost port 22: Connection refused
The fix I found online:
1. Check whether an sshd process is running:
ps -e | grep ssh
2. If there is none, start it:
sudo /etc/init.d/ssh start
If it will not start, the server needs to be installed.
3. Install it:
sudo apt-get install openssh-server
4. Start it again.
5. Check again; this time it works:
1695 ? 00:00:00 ssh-agent
12407 ? 00:00:00 sshd
castle@castle-X550VC:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is ae:23:4a:95:bc:37:dd:1a:5b:48:4f:66:e2:87:12:1c.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-43-generic x86_64)
 * Documentation: https://help.ubuntu.com/
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
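One thing to watch in the sshd check: ps -e | grep ssh also matches ssh-agent, as the process listing above shows, so a match alone does not prove the server is running. A small sketch that looks only for sshd itself; sshd_running is a made-up helper name, and it reads ps -e output on standard input:

```shell
# Hypothetical helper: succeed only if the ps listing on stdin contains a
# process whose command name (the last column) is exactly "sshd".
sshd_running() {
  awk '$NF == "sshd" { found = 1 } END { exit !found }'
}
```

It can be combined with the start command from step 2, for example: ps -e | sshd_running || sudo /etc/init.d/ssh start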
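$ bin/hdfs namenode -format
bin/hdfs namenode -format only needs to be run once. If it is run again,
each format creates a new namenodeId and
/usr/local/hadoop/hadoop-2.6.0/tmp/dfs/name
is wiped, while the datanode directory is not.
The result is that the datanode's clusterID and the namenode's clusterID no longer match.
The fix for this problem is to edit the namenodeId under …/tmp/dfs/name so that the two agree again.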
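Before editing anything, it helps to confirm that the two IDs really differ. Both the namenode and the datanode storage directories keep a current/VERSION file with a clusterID= line. The sketch below compares them; cluster_id and same_cluster are made-up helper names, and the paths in the usage line assume the dfs/name and dfs/data directories from the hdfs-site.xml shown earlier.

```shell
# Hypothetical helper: print the clusterID recorded in a VERSION file.
cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# Hypothetical helper: succeed only when both VERSION files carry the same,
# non-empty clusterID.
same_cluster() {
  nn=$(cluster_id "$1")
  dn=$(cluster_id "$2")
  [ -n "$nn" ] && [ "$nn" = "$dn" ]
}
```

Usage, with the assumed paths: same_cluster /usr/local/hadoop/hadoop-2.6.0/dfs/name/current/VERSION /usr/local/hadoop/hadoop-2.6.0/dfs/data/current/VERSION || echo "clusterID mismatch"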
Why did I run format every time on hadoop0.20.2? Probably because none of my formats ever succeeded.
hdfs dfs -mkdir /user creates a directory in HDFS.
$ sbin/start-dfs.sh
Check the result with the jps command:
2855 org.eclipse.equinox.launcher_1.3.0.v20140415-2008.jar
11127 DataNode
10975 NameNode
11432 Jps
11284 SecondaryNameNode
This shows that HDFS started successfully.
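The same check can be scripted. The sketch below reads jps output on standard input and reports which of the three HDFS daemons listed above are missing; check_daemons is a made-up helper name, not a Hadoop tool.

```shell
# Hypothetical helper: read `jps` output on stdin, print one line per missing
# HDFS daemon, and return non-zero if any of them is missing.
check_daemons() {
  out=$(cat)
  missing=0
  for d in NameNode DataNode SecondaryNameNode; do
    # Anchor the match so "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$out" | grep -qE "^[0-9]+ $d\$"; then
      echo "missing: $d"
      missing=1
    fi
  done
  return $missing
}
```

Usage: jps | check_daemons && echo "HDFS daemons are up"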
$ sbin/start-yarn.sh
$ sbin/stop-dfs.sh
$ sbin/stop-yarn.sh
If the console prints nothing about the run when you launch a hello-world job from Eclipse, copy etc/hadoop/log4j.properties from the Hadoop installation directory into the src folder of the Eclipse project. The output then looks like this:
15/01/24 10:30:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/24 10:30:13 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/01/24 10:30:13 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/01/24 10:30:13 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
15/01/24 10:30:13 INFO input.FileInputFormat: Total input paths to process : 2
15/01/24 10:30:14 INFO mapreduce.JobSubmitter: number of splits:2
15/01/24 10:30:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local632218717_0001
15/01/24 10:30:14 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/01/24 10:30:14 INFO mapreduce.Job: Running job: job_local632218717_0001
15/01/24 10:30:14 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/01/24 10:30:14 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/01/24 10:30:15 INFO mapred.LocalJobRunner: Waiting for map tasks
15/01/24 10:30:15 INFO mapred.LocalJobRunner: Starting task: attempt_local632218717_0001_m_000000_0
15/01/24 10:30:15 INFO mapred.Task: Using ResourceCalculatorProcessTree : []
15/01/24 10:30:15 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/castle/wordcount_input/input1:0+32
15/01/24 10:30:15 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/01/24 10:30:15 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/01/24 10:30:15 INFO mapred.MapTask: soft limit at 83886080
15/01/24 10:30:15 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/01/24 10:30:15 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/01/24 10:30:15 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/01/24 10:30:15 INFO mapred.LocalJobRunner:
15/01/24 10:30:15 INFO mapred.MapTask: Starting flush of map output
15/01/24 10:30:15 INFO mapred.MapTask: Spilling map output
15/01/24 10:30:15 INFO mapred.MapTask: bufstart = 0; bufend = 52; bufvoid = 104857600
15/01/24 10:30:15 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214380(104857520); length = 17/6553600
15/01/24 10:30:15 INFO mapred.MapTask: Finished spill 0
15/01/24 10:30:15 INFO mapred.Task: Task:attempt_local632218717_0001_m_000000_0 is done. And is in the process of committing
15/01/24 10:30:15 INFO mapred.LocalJobRunner: map
15/01/24 10:30:15 INFO mapred.Task: Task 'attempt_local632218717_0001_m_000000_0' done.
15/01/24 10:30:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local632218717_0001_m_000000_0
15/01/24 10:30:15 INFO mapred.LocalJobRunner: Starting task: attempt_local632218717_0001_m_000001_0
15/01/24 10:30:15 INFO mapred.Task: Using ResourceCalculatorProcessTree : []
15/01/24 10:30:15 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/castle/wordcount_input/input2:0+29
15/01/24 10:30:15 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/01/24 10:30:15 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/01/24 10:30:15 INFO mapred.MapTask: soft limit at 83886080
15/01/24 10:30:15 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/01/24 10:30:15 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/01/24 10:30:15 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/01/24 10:30:15 INFO mapred.LocalJobRunner:
15/01/24 10:30:15 INFO mapred.MapTask: Starting flush of map output
15/01/24 10:30:15 INFO mapred.MapTask: Spilling map output
15/01/24 10:30:15 INFO mapred.MapTask: bufstart = 0; bufend = 49; bufvoid = 104857600
15/01/24 10:30:15 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214380(104857520); length = 17/6553600
15/01/24 10:30:15 INFO mapred.MapTask: Finished spill 0
15/01/24 10:30:15 INFO mapred.Task: Task:attempt_local632218717_0001_m_000001_0 is done. And is in the process of committing
15/01/24 10:30:15 INFO mapred.LocalJobRunner: map
15/01/24 10:30:15 INFO mapred.Task: Task 'attempt_local632218717_0001_m_000001_0' done.
15/01/24 10:30:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local632218717_0001_m_000001_0
15/01/24 10:30:15 INFO mapred.LocalJobRunner: map task executor complete.
15/01/24 10:30:15 INFO mapred.LocalJobRunner: Waiting for reduce tasks
15/01/24 10:30:15 INFO mapred.LocalJobRunner: Starting task: attempt_local632218717_0001_r_000000_0
15/01/24 10:30:15 INFO mapred.Task: Using ResourceCalculatorProcessTree : []
15/01/24 10:30:15 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@158e338a
15/01/24 10:30:15 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=626471744, maxSingleShuffleLimit=156617936, mergeThreshold=413471360, ioSortFactor=10, memToMemMergeOutputsThreshold=10
15/01/24 10:30:15 INFO reduce.EventFetcher: attempt_local632218717_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
15/01/24 10:30:15 INFO mapreduce.Job: Job job_local632218717_0001 running in uber mode : false
15/01/24 10:30:15 INFO mapreduce.Job: map 100% reduce 0%
15/01/24 10:30:16 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local632218717_0001_m_000000_0 decomp: 40 len: 44 to MEMORY
15/01/24 10:30:16 INFO reduce.InMemoryMapOutput: Read 40 bytes from map-output for attempt_local632218717_0001_m_000000_0
15/01/24 10:30:16 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 40, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->40
15/01/24 10:30:16 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local632218717_0001_m_000001_0 decomp: 51 len: 55 to MEMORY
15/01/24 10:30:16 INFO reduce.InMemoryMapOutput: Read 51 bytes from map-output for attempt_local632218717_0001_m_000001_0
15/01/24 10:30:16 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 51, inMemoryMapOutputs.size() -> 2, commitMemory -> 40, usedMemory ->91
15/01/24 10:30:16 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
15/01/24 10:30:16 INFO mapred.LocalJobRunner: 2 / 2 copied.
15/01/24 10:30:16 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
15/01/24 10:30:16 INFO mapred.Merger: Merging 2 sorted segments
15/01/24 10:30:16 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 71 bytes
15/01/24 10:30:16 INFO reduce.MergeManagerImpl: Merged 2 segments, 91 bytes to disk to satisfy reduce memory limit
15/01/24 10:30:16 INFO reduce.MergeManagerImpl: Merging 1 files, 93 bytes from disk
15/01/24 10:30:16 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
15/01/24 10:30:16 INFO mapred.Merger: Merging 1 sorted segments
15/01/24 10:30:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
15/01/24 10:30:16 INFO mapred.LocalJobRunner: 2 / 2 copied.
15/01/24 10:30:16 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
15/01/24 10:30:16 INFO mapred.Task: Task:attempt_local632218717_0001_r_000000_0 is done. And is in the process of committing
15/01/24 10:30:16 INFO mapred.LocalJobRunner: 2 / 2 copied.
15/01/24 10:30:16 INFO mapred.Task: Task attempt_local632218717_0001_r_000000_0 is allowed to commit now
15/01/24 10:30:16 INFO output.FileOutputCommitter: Saved output of task 'attempt_local632218717_0001_r_000000_0' to hdfs://localhost:9000/user/castle/wordcount_output/_temporary/0/task_local632218717_0001_r_000000
15/01/24 10:30:16 INFO mapred.LocalJobRunner: reduce > reduce
15/01/24 10:30:16 INFO mapred.Task: Task 'attempt_local632218717_0001_r_000000_0' done.
15/01/24 10:30:16 INFO mapred.LocalJobRunner: Finishing task: attempt_local632218717_0001_r_000000_0
15/01/24 10:30:16 INFO mapred.LocalJobRunner: reduce task executor complete.
15/01/24 10:30:16 INFO mapreduce.Job: map 100% reduce 100%
15/01/24 10:30:16 INFO mapreduce.Job: Job job_local632218717_0001 completed successfully
15/01/24 10:30:16 INFO mapreduce.Job: Counters: 38
    File System Counters
        FILE: Number of bytes read=1732
        FILE: Number of bytes written=754881
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=154
        HDFS: Number of bytes written=42
        HDFS: Number of read operations=25
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=5
    Map-Reduce Framework
        Map input records=10
        Map output records=10
        Map output bytes=101
        Map output materialized bytes=99
        Input split bytes=242
        Combine input records=10
        Combine output records=7
        Reduce input groups=5
        Reduce shuffle bytes=99
        Reduce input records=7
        Reduce output records=5
        Spilled Records=14
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=0
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=855638016
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=61
    File Output Format Counters
        Bytes Written=42
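One detail worth noticing in this log: the job ID starts with job_local and the tasks are handled by mapred.LocalJobRunner, which means this run executed in local mode inside the client JVM rather than on YARN. It still read its input from and wrote its output to HDFS. A minimal sketch for spotting this in a saved log, assuming the log text is fed on standard input; ran_locally is a made-up helper name:

```shell
# Hypothetical helper: succeed if the job log on stdin contains
# LocalJobRunner lines, i.e. the job ran in local mode.
ran_locally() {
  grep -q 'mapred\.LocalJobRunner'
}
```

Usage, with an assumed log file name: ran_locally < job.log && echo "job ran in local mode"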
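Hadoop 2.6 and Eclipse integration: development setup
Build the Hadoop Eclipse plugin
$ git clone https://github.com/winghc/hadoop2x-eclipse-plugin.git
Then, from inside the cloned directory, build it with ant:
$ cd src/contrib/eclipse-plugin
$ ant jar -Dversion=2.6.0 -Declipse.home=/usr/local/eclipse -Dhadoop.home=/usr/local/hadoop-2.6.0
This needs a manually installed Eclipse; one installed in a single step from the command line does not work.
Set eclipse.home and hadoop.home to the paths in your own environment.
The jar is generated at: /home/hunter/hadoop2x-eclipse-plugin/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.6.0.jar
Unfortunately I did not succeed: the build hung partway through without reporting any error. In the end I used the hadoop2.2.0 build found under release in the same git repository. That one works; the others did not.
In the plugin's location settings, the values on the right must match core-site.xml. The left side can be left unconfigured; with the old MapReduce it had to match mapred-site.xml.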