Notes on Setting Up a Spark + Python Development Environment on CentOS 6


1. Starting pyspark from $SPARK_HOME/bin/ fails with: Traceback (most recent call last):

File "/home/joy/spark/spark/python/pyspark/shell.py", line 28, in

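Following search results, I first installed the missing packages with yum install -y zlib*, but the error persisted. Running ./pyspark with sudo then worked, but from then on it only ran correctly under sudo. This looked like an environment problem and was left for later: because the installation had been done with sudo, the files were owned by root, so I changed the owner with chown.
That still meant pip had to be installed with sudo, so to solve the problem once and for all I recompiled Python.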
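Because the sudo-based installation left files owned by root, it can help to list which files are not owned by your user before running chown. A minimal sketch, assuming a Unix system; files_not_owned_by is a hypothetical helper name, not part of any tool mentioned here:

```python
import os


def files_not_owned_by(root, uid=None):
    """List paths under `root` (recursively) whose owner UID is not `uid`.

    `uid` defaults to the UID of the current process (Unix only).
    """
    if uid is None:
        uid = os.getuid()
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat: check a symlink itself rather than the file it points to
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return sorted(mismatched)
```

Any paths it reports can then be handed back to your user with chown -R.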
Solution:

1. Install the dependencies zlib and zlib-devel.

2. Recompile and reinstall Python:

./configure
Edit the Modules/Setup file
Find the following line and remove the comment marker:

zlib zlibmodule.c -I$(prefix)/include -L$(exec_prefix)/lib -lz

Rebuild and install: make && make install
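The Modules/Setup edit can also be scripted; it would be run from the source tree after ./configure and before make. A minimal sketch, assuming the line appears commented out with a leading # exactly as shown above; uncomment_zlib and patch_setup_file are hypothetical helper names:

```python
import re

# Matches a commented-out zlib line such as
# "#zlib zlibmodule.c -I$(prefix)/include -L$(exec_prefix)/lib -lz"
ZLIB_LINE = re.compile(r"^#\s*(zlib\s+zlibmodule\.c\b.*)$", re.MULTILINE)


def uncomment_zlib(setup_text):
    """Return `setup_text` with the zlib module line uncommented."""
    return ZLIB_LINE.sub(r"\1", setup_text)


def patch_setup_file(path="Modules/Setup"):
    """Rewrite the Setup file in place; return True if anything changed."""
    with open(path) as f:
        original = f.read()
    patched = uncomment_zlib(original)
    if patched != original:
        with open(path, "w") as f:
            f.write(patched)
    return patched != original
```

Running it twice is harmless, since an already uncommented line no longer matches.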
After the build, it still reported that some modules could not be built:
Python build finished, but the necessary bits to build these modules were not found:
_bsddb _curses _curses_panel
_sqlite3 _ssl _tkinter
bsddb185 bz2 dbm
dl gdbm imageop

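Whatever the individual messages say, the meaning is clear: at build time the system could not find what these modules depend on. To fix this, install the dependency packages before compiling. The modules and their dependencies are listed below (the list may be incomplete):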

Module	Dependency	Description
_bsddb	bsddb	Interface to Berkeley DB library. bsddb is deprecated since Python 2.6; the bsddb3 module is the recommended replacement.
_curses	ncurses	Terminal handling for character-cell displays.
_curses_panel	ncurses	A panel stack extension for curses.
_sqlite3	sqlite	DB-API 2.0 interface for SQLite databases. On CentOS, install sqlite-devel.
_ssl	openssl-devel.i686	TLS/SSL wrapper for socket objects.
_tkinter	N/A	A thin object-oriented layer on top of Tcl/Tk. Can be ignored if you do not build desktop GUI programs.
bsddb185	old bsddb module	The old bsddb module; can be ignored.
bz2	bzip2-devel.i686	Compression compatible with bzip2.
dbm	bsddb	Simple "database" interface.
dl	N/A	Call C functions in shared objects. Deprecated since Python 2.6.
gdbm	gdbm-devel.i686	GNU's reinterpretation of dbm.
imageop	N/A	Manipulate raw image data. Deprecated.
readline	readline-devel	GNU readline interface.
sunaudiodev	N/A	Access to Sun audio hardware. Specific to the Sun platform; can be ignored on CentOS.
zlib	zlib	Compression compatible with gzip.

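On CentOS you can install these dependency packages: readline-devel, sqlite-devel, bzip2-devel.i686, openssl-devel.i686, gdbm-devel.i686, libdbi-devel.i686, ncurses-libs, zlib-devel.i686. After installing them, compile again; errors for the modules marked as deprecated or ignorable in the table above can be ignored.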
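The table can be turned into a small lookup that reads the "necessary bits ... were not found" message and suggests packages. A sketch, assuming the build output has the layout shown earlier (module names on the lines after the message, ended by a blank line); the package names are copied from the list above, and missing_modules, plan, PACKAGE_FOR and IGNORABLE are hypothetical names:

```python
# Failed module -> CentOS package, following the table and package list above.
PACKAGE_FOR = {
    # as listed above; the curses headers usually come from ncurses-devel
    "_curses": "ncurses-libs",
    "_curses_panel": "ncurses-libs",
    "_sqlite3": "sqlite-devel",
    "_ssl": "openssl-devel.i686",
    "bz2": "bzip2-devel.i686",
    "gdbm": "gdbm-devel.i686",
    "readline": "readline-devel",
    "zlib": "zlib-devel.i686",
}

# Modules the table marks as deprecated or safe to ignore on CentOS.
IGNORABLE = {"_bsddb", "_tkinter", "bsddb185", "dl", "imageop", "sunaudiodev"}

MARKER = "necessary bits to build these modules were not found:"


def missing_modules(build_output):
    """Extract the module names listed after the 'not found' message."""
    _, found, tail = build_output.partition(MARKER)
    if not found:
        return []
    names = []
    for line in tail.splitlines():
        if not line.strip():
            if names:
                break  # a blank line ends the module list
            continue
        names.extend(line.split())
    return names


def plan(build_output):
    """Split missing modules into yum packages, ignorable and unmapped modules."""
    packages, ignored, unmapped = set(), [], []
    for name in missing_modules(build_output):
        if name in PACKAGE_FOR:
            packages.add(PACKAGE_FOR[name])
        elif name in IGNORABLE:
            ignored.append(name)
        else:
            unmapped.append(name)
    return sorted(packages), ignored, unmapped
```

The first element of the result can be joined into a yum install -y command; anything in the third list still needs a manual decision.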

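Once the build completes, install Python into the target directory with the make install step above, then look in the installation directory to confirm that Python was installed correctly.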
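Besides looking at the installation directory, one can try importing the optional modules from the table with the new interpreter. A sketch; the module list is an assumption based on the table (Python 2 names), and check_modules is a hypothetical name:

```python
import importlib

# Optional modules from the table above (Python 2 names; on Python 3 some
# were renamed or removed, e.g. gdbm became dbm.gnu).
OPTIONAL_MODULES = ["zlib", "bz2", "readline", "_ssl", "_sqlite3", "_curses", "gdbm"]


def check_modules(names):
    """Map each module name to True if it imports, False otherwise."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status
```

Calling check_modules(OPTIONAL_MODULES) with the freshly installed interpreter, not the system one, shows which modules are still missing.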

3. SparkSQL preparation

First, check what HiveContext requires. The reference article lists three requirements:
1. Check that $SPARK_HOME/lib contains datanucleus-api-jdo-3.2.1.jar, datanucleus-rdbms-3.2.1.jar and datanucleus-core-3.2.2.jar.
2. Check that $SPARK_HOME/conf contains a hive-site.xml copied from $HIVE_HOME/conf.
3. When submitting the program, put the database driver jar on the driver classpath, e.g. bin/spark-submit --driver-class-path *.jar, or set SPARK_CLASSPATH in spark-env.sh.
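Requirement 3 can also be met from a script. Note that an unquoted shell glob such as *.jar expands into several space-separated arguments, so with more than one matching jar only the first becomes the classpath and the second would be treated as the application. The sketch below joins the jars with the path separator instead. spark_submit_command is a hypothetical helper, and the jar pattern you pass in is up to you:

```python
import glob
import os


def spark_submit_command(app, jar_patterns, spark_home=None):
    """Build an argv list for spark-submit with driver jars on the classpath."""
    spark_home = spark_home or os.environ.get("SPARK_HOME", "")
    jars = []
    for pattern in jar_patterns:
        jars.extend(sorted(glob.glob(pattern)))
    cmd = [os.path.join(spark_home, "bin", "spark-submit")]
    if jars:
        # --driver-class-path takes ONE value; join multiple jars with ':' on Linux
        cmd += ["--driver-class-path", os.pathsep.join(jars)]
    cmd.append(app)
    return cmd
```

The returned list can be passed straight to subprocess.call(), which avoids shell glob expansion altogether.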

Following the article, I copied the jars whose names start with datanucleus from $HIVE_HOME/lib to $SPARK_HOME/lib, copied hive-site.xml from $HIVE_HOME/conf to $SPARK_HOME/conf, and copied the mysql-connector jar from $HIVE_HOME/lib to $SPARK_HOME/jars.
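The three checks, plus the copied driver jar, can be verified with a short script. A sketch under assumptions: it looks in both $SPARK_HOME/lib and $SPARK_HOME/jars, matches jars by name prefix because versions differ between Hive and Spark releases, and also flags duplicates, since DataNucleus warns when the same plugin is on the classpath more than once. find_jars and check_hive_setup are hypothetical names:

```python
import glob
import os

# Jar name prefixes from the requirements above; versions vary, so match any.
DATANUCLEUS = ["datanucleus-api-jdo", "datanucleus-rdbms", "datanucleus-core"]


def find_jars(spark_home, prefix):
    """All jars starting with `prefix` in $SPARK_HOME/lib or $SPARK_HOME/jars."""
    found = []
    for sub in ("lib", "jars"):
        found += glob.glob(os.path.join(spark_home, sub, prefix + "*.jar"))
    return sorted(found)


def check_hive_setup(spark_home):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for prefix in DATANUCLEUS + ["mysql-connector"]:
        jars = find_jars(spark_home, prefix)
        if not jars:
            problems.append("no %s jar found" % prefix)
        elif len(jars) > 1:
            # DataNucleus complains when the same plugin is registered twice
            problems.append("multiple %s jars: %s" % (prefix, ", ".join(jars)))
    if not os.path.isfile(os.path.join(spark_home, "conf", "hive-site.xml")):
        problems.append("conf/hive-site.xml not found")
    return problems
```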

2. Error when starting spark-shell

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/01/17 11:42:58 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0
17/01/17 11:43:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/17 11:43:00 WARN Utils: Your hostname, node1 resolves to a loopback address: 127.0.0.1; using 192.168.85.128 instead (on interface eth1)
17/01/17 11:43:00 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/01/17 11:43:11 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.server2.thrift.http.min.worker.threads does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.mapjoin.optimized.keys does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.mapjoin.lazy.hashtable does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.datampi.maxslots does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.metastore.ds.retry.attempts does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.server2.thrift.http.max.worker.threads does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.datampi.sendqueue does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.optimize.multigroupby.common.distincts does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.metastore.ds.retry.interval does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.datampi.parallelism does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.stats.map.parallelism does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.datampi.memusedpercent does not exist
17/01/17 11:43:12 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/joy/spark/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/joy/spark/spark/jars/datanucleus-rdbms-3.2.9.jar."
17/01/17 11:43:12 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/joy/spark/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/joy/spark/spark/jars/datanucleus-api-jdo-3.2.6.jar."
17/01/17 11:43:12 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/joy/spark/spark/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/joy/spark/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-core-3.2.10.jar."
17/01/17 11:43:16 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.server2.thrift.http.min.worker.threads does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.mapjoin.optimized.keys does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.mapjoin.lazy.hashtable does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.datampi.maxslots does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.metastore.ds.retry.attempts does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.server2.thrift.http.max.worker.threads does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.datampi.sendqueue does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.optimize.multigroupby.common.distincts does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.metastore.ds.retry.interval does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.datampi.parallelism does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.stats.map.parallelism does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.datampi.memusedpercent does not exist
17/01/17 11:43:22 ERROR ObjectStore: Version information found in metastore differs 0.13.0 from expected schema version 1.2.0. Schema verififcation is disabled hive.metastore.schema.verification so setting version.
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
  at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
  at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
  ... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
  ... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
  at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
  at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
  at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
  ... 63 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.io.FileNotFoundException: File /hive/tmp does not exist
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
  ... 71 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.io.FileNotFoundException: File /hive/tmp does not exist
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270)
  at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
  ... 76 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File /hive/tmp does not exist
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:192)
  ... 84 more
Caused by: java.io.FileNotFoundException: File /hive/tmp does not exist
  at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:537)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:750)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:527)
  at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
  at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
  at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
  ... 85 more

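Analyzing the trace, the root cause is a FileNotFoundException saying /hive/tmp does not exist. In hive-site.xml this path is the value of hive.exec.scratchdir, an HDFS path that stores the execution plans of the different map/reduce stages and the intermediate output of those stages.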
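Looking up a value in hive-site.xml can be done with the standard library XML parser. A minimal sketch; hive_property is a hypothetical helper, and it assumes the usual configuration/property/name/value layout of Hadoop-style config files:

```python
import xml.etree.ElementTree as ET


def hive_property(xml_text, name, default=None):
    """Return the <value> of the property called `name`, or `default`."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return default
```

For example, reading $SPARK_HOME/conf/hive-site.xml into a string and calling hive_property(text, "hive.exec.scratchdir") returns the scratch directory Spark will use.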

Running hadoop fs -ls /hive in a terminal gives:

Found 2 items
drwxr-xr-x   - joy supergroup          0 2016-06-12 21:35 /hive/log
drwxr-xr-x   - joy supergroup          0 2017-01-16 14:17 /hive/tmp

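The permissions looked wrong because group write was missing, so I added it with hadoop fs -chmod g+w /hive/tmp and hadoop fs -chmod g+w /hive/log, but the error still said the directory did not exist.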
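The directory exists on HDFS, yet the first trace goes through org.apache.hadoop.fs.RawLocalFileSystem, Hadoop's local file system. That suggests /hive/tmp was being looked up on the local disk: Hadoop resolves a path without a scheme against fs.defaultFS, whose built-in default is file:///, so without the cluster's core-site.xml that default applies. A simplified sketch of the resolution (absolute paths only; default_fs and qualify are hypothetical names, and reading core-site.xml from a config directory is an assumption about how it is located):

```python
import os
import xml.etree.ElementTree as ET

HADOOP_DEFAULT_FS = "file:///"  # Hadoop's built-in default for fs.defaultFS


def default_fs(conf_dir):
    """Read fs.defaultFS from core-site.xml in `conf_dir`, if present."""
    path = os.path.join(conf_dir, "core-site.xml") if conf_dir else ""
    if not path or not os.path.isfile(path):
        return HADOOP_DEFAULT_FS
    for prop in ET.parse(path).getroot().iter("property"):
        # fs.default.name is the older, deprecated key for the same setting
        if (prop.findtext("name") or "").strip() in ("fs.defaultFS", "fs.default.name"):
            return (prop.findtext("value") or "").strip() or HADOOP_DEFAULT_FS
    return HADOOP_DEFAULT_FS


def qualify(path, conf_dir):
    """Simplified resolution of an absolute path such as /hive/tmp."""
    if "://" in path:
        return path  # already fully qualified
    return default_fs(conf_dir).rstrip("/") + path
```

With no configuration directory, qualify("/hive/tmp", None) yields file:/hive/tmp, i.e. a local path.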

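To make Spark read the Hadoop configuration, I added HADOOP_CONF_DIR to spark-env.sh in $SPARK_HOME/conf, so that /hive/tmp is looked up on HDFS. After that, the error changed to: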

java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
  at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
  at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
  ... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
  ... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
  at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
  at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
  at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
  ... 63 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /hive/tmp on HDFS should be writable. Current permissions are: rwxrwxr-x
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
  ... 71 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /hive/tmp on HDFS should be writable. Current permissions are: rwxrwxr-x
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270)
  at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
  ... 76 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /hive/tmp on HDFS should be writable. Current permissions are: rwxrwxr-x
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:192)
  ... 84 more
Caused by: java.lang.RuntimeException: The root scratch dir: /hive/tmp on HDFS should be writable. Current permissions are: rwxrwxr-x
  at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
  at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
  ... 85 more
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql

This time the error says the directory permissions are wrong. Running hadoop fs -ls /hive again shows:

drwxrwxr-x   - joy supergroup          0 2016-06-12 21:35 /hive/log
drwxrwxr-x   - joy supergroup          0 2017-01-16 14:17 /hive/tmp

After changing the directory permissions to 777, spark-shell finally started successfully.
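Why did rwxrwxr-x fail even though the group could write? Reading Hive's SessionState.createRootHDFSDir (the Hive 1.2 client bundled with Spark 2.x), the check appears to AND the current permissions with 0733 (rwx for the owner, write and execute for group and others) and require every one of those bits. A sketch of that check, assuming the nine-character permission string printed by hadoop fs -ls; the exact mask and the function names are assumptions:

```python
# Assumed from Hive's SessionState.createRootHDFSDir; treat as an assumption.
REQUIRED_BITS = 0o733  # rwx for owner, -wx for group, -wx for others


def mode_from_string(perm):
    """Convert 'rwxrwxr-x' or 'drwxrwxr-x' (hadoop fs -ls style) to an int mode."""
    perm = perm[-9:]  # drop the leading file-type character, if any
    mode = 0
    for ch in perm:
        # Any letter counts as "set"; sticky/setuid nuances (e.g. 'T') are ignored.
        mode = (mode << 1) | (1 if ch != "-" else 0)
    return mode


def scratch_dir_writable(perm):
    """True if the permissions contain every bit Hive appears to require."""
    return (mode_from_string(perm) & REQUIRED_BITS) == REQUIRED_BITS
```

By this logic rwxrwxr-x (0775) fails because others lack write permission, 777 passes, and 733 would pass as well.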

4. KeyError: u'y'

The error message looks like the following:

Traceback (most recent call last):
  File "/Users/lyj/Programs/kiseliugit/MyPysparkCodes/test/spark2.0.py", line 5, in <module>
    spark = SparkSession.builder.master("local").appName('test 2.0').config(conf=SparkConf()).getOrCreate()
  File "/Users/lyj/Programs/Apache/Spark2/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/Users/lyj/Programs/Apache/Spark2/python/pyspark/context.py", line 243, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/Users/lyj/Programs/Apache/Spark2/python/pyspark/java_gateway.py", line 116, in launch_gateway
    java_import(gateway.jvm, "org.apache.spark.SparkConf")
  File "/Library/Python/2.7/site-packages/py4j/java_gateway.py", line 90, in java_import
    return_value = get_return_value(answer, gateway_client, None, None)
  File "/Library/Python/2.7/site-packages/py4j/protocol.py", line 306, in get_return_value
    value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
KeyError: u'y'

The cause is an outdated py4j version; upgrading it with pip install --upgrade py4j fixes the error.
Reference: http://stackoverflow.com/questions/38637988/how-could-i-write-the-right-entry-point-in-spark-2-0-program-actually-pyspark-2
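In the traceback, py4j is imported from /Library/Python/2.7/site-packages, i.e. a separately installed copy, while Spark ships its own py4j as a zip under $SPARK_HOME/python/lib. A sketch for comparing the two versions, assuming the bundled file follows the py4j-<version>-src.zip naming; the installed version can be read from pip show py4j. The function names are hypothetical:

```python
import glob
import os
import re


def bundled_py4j_version(spark_home):
    """Version of the py4j zip shipped under $SPARK_HOME/python/lib, or None."""
    zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
    if not zips:
        return None
    match = re.search(r"py4j-([\d.]+)-src\.zip$", os.path.basename(sorted(zips)[-1]))
    return match.group(1) if match else None


def version_tuple(version):
    """'0.10.4' -> (0, 10, 4); non-numeric parts are skipped."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def py4j_too_old(installed, bundled):
    """True if the installed py4j is older than the one Spark ships with."""
    return version_tuple(installed) < version_tuple(bundled)
```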

Permanent link to this article: http://www.linuxidc.com/Linux/2017-03/141725.htm