1. Create the configuration files
cd /opt/spark-1.4.1-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh
cp slaves.template slaves
2. Add environment variables for Spark
vi spark-env.sh
export JAVA_HOME=/opt/jdk1.7.0_75
export SCALA_HOME=/opt/scala-2.11.6
export HADOOP_CONF_DIR=/opt/hadoop-2.6.0/etc/hadoop
# Automatically clean up temporary files in Spark's work directory; cleanup runs every half hour (interval is in seconds)
export SPARK_WORKER_DIR="/home/hadoop/spark/worker/"
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800"
vi slaves
Enter the hostname of each node, one per line
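As a minimal sketch of the slaves file described above, the following writes three worker hostnames, one per line. The hostnames node1, node2 and node3 and the SLAVES_FILE variable are assumptions for illustration only; substitute the real hostnames of your cluster. start-all.sh connects to each listed host over SSH, so passwordless SSH from the master is normally needed.

```shell
# Hypothetical hostnames; replace with the hostnames of your own nodes.
SLAVES_FILE="${SLAVES_FILE:-./slaves}"   # assumes you are in the conf directory

# One hostname per line; a Worker is started on each listed host.
printf '%s\n' node1 node2 node3 > "$SLAVES_FILE"

# Show the result for a quick visual check.
cat "$SLAVES_FILE"
```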
3. Add system-wide environment variables
vi /etc/profile
export SPARK_HOME=/opt/spark-1.4.1-bin-hadoop2.6
export PATH=$SPARK_HOME/bin:$PATH
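Changes to /etc/profile only take effect in new login shells or after the file is reloaded. The helper below is a small sketch for confirming the two variables from this step; the function name spark_env_ok is hypothetical and not part of Spark.

```shell
# Reports whether SPARK_HOME is set and $SPARK_HOME/bin is on PATH.
spark_env_ok() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set"
    return 1
  fi
  case ":$PATH:" in
    *":$SPARK_HOME/bin:"*) echo "ok: $SPARK_HOME/bin is on PATH" ;;
    *) echo "$SPARK_HOME/bin is not on PATH"; return 1 ;;
  esac
}

# After editing /etc/profile, reload it in the current shell first:
#   source /etc/profile
spark_env_ok || true
```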
4. Start the cluster
cd ../sbin/
./start-all.sh
5. Check that the processes have started
jps
4211 Master
4367 Worker
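The check in step 5 can be scripted. This is only a sketch: check_spark_daemons is a hypothetical helper name, and it assumes the node is expected to run both a Master and a Worker, as in the sample output above.

```shell
# Reads jps-style output ("<pid> <class name>") on stdin and reports
# any of the expected Spark standalone daemons that are missing.
check_spark_daemons() {
  jps_output="$(cat)"
  missing=0
  for daemon in Master Worker; do
    if ! printf '%s\n' "$jps_output" | grep -Eq "^[0-9]+ ${daemon}\$"; then
      echo "missing: ${daemon}"
      missing=1
    fi
  done
  return "$missing"
}

# On a live node: jps | check_spark_daemons
# Demo with the sample output from step 5:
printf '4211 Master\n4367 Worker\n' | check_spark_daemons && echo "Master and Worker are running"
```

On a node that is listed in slaves but is not the master, only Worker is expected, so a "missing: Master" line there is not an error.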
6. Open the Spark master web UI: http://spore:8080/
7. Use spark-shell
cd ../bin/
./spark-shell
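8. Spark application UI (available while spark-shell or another application is running): http://spore:4040
Source reading: to see which SQL keywords Spark supports, look at: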
spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\SQLParser.scala
Example of user-defined functions in Spark SQL:
http://colobu.com/2014/12/11/spark-sql-quick-start/
To use the bin/spark-sql command, the Hive metastore must be running and conf/hive-site.xml must contain the hive.metastore.uris setting, for example:
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://byd0087:9083</value>
</property>
</configuration>
Start bin/spark-sql and you can run Hive HQL statements directly; queries generally run much faster than in Hive.
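Because spark-sql depends on the hive.metastore.uris setting, it can help to check for it before launching. The sketch below does that; has_metastore_uri and spark_sql_checked are hypothetical helper names, and the fallback Spark path is taken from the install location used earlier in these notes.

```shell
# Succeeds only if the given hive-site.xml declares hive.metastore.uris.
has_metastore_uri() {
  [ -f "$1" ] && grep -q '<name>hive.metastore.uris</name>' "$1"
}

# Hypothetical wrapper: refuse to launch spark-sql without the setting.
spark_sql_checked() {
  spark_dir="${SPARK_HOME:-/opt/spark-1.4.1-bin-hadoop2.6}"
  conf_file="$spark_dir/conf/hive-site.xml"
  if has_metastore_uri "$conf_file"; then
    "$spark_dir/bin/spark-sql" "$@"
  else
    echo "hive.metastore.uris not found in $conf_file" >&2
    return 1
  fi
}

# Example (assumes the metastore is already running, for instance
# started with `hive --service metastore &`):
#   spark_sql_checked -e 'SHOW TABLES;'
```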
Spark tuning:
http://my.oschina.net/u/877759/blog/490053
Source: oschina
Link: https://my.oschina.net/u/877759/blog/490527