Installing a Spark 1.5.2 Standalone Environment

Published: 2016-03-15  Category: Spark

This article describes how to set up a single-machine Spark environment, suitable for testing and development. It is organized into four parts:
(1) Environment preparation
(2) Install Scala
(3) Install Spark
(4) Verify the installation

1. Environment preparation
(1) Required software versions: Java, Python 2.6, Scala 2.10. Pay attention to these version requirements.
(2) Install Linux, the JDK, and Python. Most Linux distributions ship with a JDK and Python already installed, but note that the default JDK is usually OpenJDK; reinstalling the Oracle JDK is recommended.

JDK download: http://www.oracle.com/technetwork/java/javase/downloads/jdk-netbeans-jsp-142931.html
(3) IP: 10.171.29.191  hostname: master
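Before continuing, the versions already present can be checked, for example with java -version and python -V.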

2. Install Scala
(1) Download Scala
wget http://downloads.typesafe.com/scala/2.11.8/scala-2.11.8.tgz

Or download it manually from: http://www.scala-lang.org/download/

(2) Unpack the archive
tar -zxvf scala-2.11.8.tgz

(3) Configure environment variables
# vi /etc/profile
#SCALA VARIABLES START
export SCALA_HOME=/home/wyang22/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
#SCALA VARIABLES END

$ source /etc/profile
$ scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2013, LAMP/EPFL

(4) Verify Scala
$ scala
Welcome to Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51).
Type in expressions to have them evaluated.
Type :help for more information.

scala> 9*9
res0: Int = 81

3. Install Spark
(1) Download Spark
wget http://mirror.bit.edu.cn/apache/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz

Or download it manually from: https://spark.apache.org/downloads.html

 

(2) Unpack Spark
tar -zxvf spark-1.5.2-bin-hadoop2.6.tgz

(3) Configure environment variables
# vi /etc/profile
#SPARK VARIABLES START
export SPARK_HOME=/home/wyang22/spark-1.5.2-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
#SPARK VARIABLES END

$ source /etc/profile

(4) Configure Spark
$ pwd
/home/wyang22/spark-1.5.2-bin-hadoop2.6/conf

$ mv spark-env.sh.template spark-env.sh
$ vi spark-env.sh
export SCALA_HOME=/home/wyang22/scala-2.11.8
export JAVA_HOME=/usr/java/jdk1.7.0_51
export SPARK_MASTER_IP=10.171.29.191
export SPARK_WORKER_MEMORY=512m
# export MASTER=spark://10.171.29.191:7077
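Here SPARK_MASTER_IP is the address the standalone master binds to, and SPARK_WORKER_MEMORY limits how much memory the worker can hand out to executors. The master listens on port 7077 by default, so once the cluster is running a shell can be attached to it with, for example:

$ spark-shell --master spark://10.171.29.191:7077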

(5) Start Spark
cd /home/wyang22/spark-1.5.2-bin-hadoop2.6/sbin
$ ./start-all.sh
Note that Hadoop also ships a start-all.sh script, so be sure to run this one from Spark's sbin directory explicitly.

$ jps
30302 Worker
30859 Jps
30172 Master
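In the jps output, Master is the standalone cluster manager process and Worker is a local worker node that has registered with it; seeing both means the standalone cluster came up correctly.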

4. Verify the installation
(1) Run a bundled example
$ bin/run-example org.apache.spark.examples.SparkPi
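The example takes optional arguments; for instance, bin/run-example SparkPi 10 runs the pi estimation over 10 partitions, and the result should appear in the console output as a line like "Pi is roughly 3.14...".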

(2) Check the cluster web UI
http://master:8080/

or http://IPAddress:8080/

(3) Start spark-shell
$ spark-shell

Or change into the spark-1.5.2-bin-hadoop2.6/bin directory and run ./spark-shell to enter the Scala command line.
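Inside the shell a SparkContext is already created for you as sc, so a quick sanity check (a minimal sketch) is:

scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050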

 

Spark also provides a Python API. To run Spark interactively in a Python interpreter, use bin/pyspark:

./bin/pyspark --master local[2]
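(Here local[2] means running Spark locally with two worker threads instead of connecting to the standalone master.)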

(4) View jobs and other runtime information
http://master:4040/jobs/

(5) Example applications are also provided in Python. For example,

./bin/spark-submit examples/src/main/python/pi.py 10

For the steps to submit a Java or Scala application to Spark, see: http://spark.apache.org/docs/latest/submitting-applications.html. A minimal Scala sketch of such an application is shown below.
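As an illustration, a self-contained Scala application that could be submitted this way might look like the following sketch (the object name SimplePi and the jar path are hypothetical, and it assumes the Spark 1.5.x core dependency is on the classpath, e.g. via sbt):

import org.apache.spark.{SparkConf, SparkContext}

// Minimal standalone Spark application (Spark 1.5.x API):
// estimates pi by sampling random points in the unit square.
object SimplePi {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SimplePi")
    val sc = new SparkContext(conf)
    val n = 100000
    val inside = sc.parallelize(1 to n).map { _ =>
      val x = math.random
      val y = math.random
      if (x * x + y * y <= 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * inside / n}")
    sc.stop()
  }
}

After packaging it into a jar (for example with sbt package), it would be submitted with something like:

./bin/spark-submit --class SimplePi --master spark://10.171.29.191:7077 path/to/simple-pi.jar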

(6) Running MLlib from Python requires NumPy version 1.4 or newer. It can be installed automatically (for example with pip install numpy), or you can download the NumPy zip (or tar.gz) package manually from https://sourceforge.net/projects/numpy/files/NumPy/1.4.1/ and then run python setup.py install.

 

For more information, see: http://spark.apache.org/docs/latest/
