(1) Spark installation, deployment, and development environment
1. Spark pseudo-distributed and fully distributed installation guide
http://my.oschina.net/leejun2005/blog/394928
2. Exploring Apache Spark: a comparison of three distributed deployment modes
http://dongxicheng.org/framework-on-yarn/apache-spark-comparing-three-deploying-ways/
3. Running Spark SQL with Hive in local mode from IntelliJ IDEA
http://dataknocker.github.io/2014/10/11/idea%E4%B8%8A%E8%BF%90%E8%A1%8Clocal%E7%9A%84spark-sql-hive/
4. Learning Apache Spark: developing Spark applications in Scala
http://dongxicheng.org/framework-on-yarn/spark-scala-writing-application/
5. How to run Spark applications on CDH 5 (Scala, Java, Python)
http://blog.javachen.com/2015/02/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/
6. Installing and using a Spark cluster
http://blog.javachen.com/2014/07/01/spark-install-and-usage/#
(2) Spark architecture, internals, and programming
1. Understanding the RDD, the core of Spark
http://www.infoq.com/cn/articles/spark-core-rdd
2. How-to: Translate from MapReduce to Apache Spark
http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/
3. Spark SQL source code analysis: In-Memory Columnar Storage and cache table
http://blog.csdn.net/oopsoom/article/details/39525483
4. Databricks Spark Knowledge Base
http://aiyanbo.gitbooks.io/databricks-spark-knowledge-base-zh-cn/content/
5. The Spark 1.0.0 programming model
http://blog.csdn.net/book_mmicky/article/details/32096871
6. Spark internals: source code analysis of Client, Master, and Worker communication
http://blog.csdn.net/anzhsoft/article/details/30802603
7. Spark Streaming programming guide
http://yangqijun.com/archives/200
8. The Spark distributed computing execution model
9. Top 3 Troubleshooting Tips To Keep You Sparking
http://engineering.sharethrough.com/blog/2013/09/13/top-3-troubleshooting-tips-to-keep-you-sparking/
10. Apache Spark design and implementation (focuses on design ideas, execution principles, implementation architecture, and performance tuning, with a side discussion of how Spark differs from MapReduce in design and implementation)
https://github.com/JerryLead/SparkInternals/tree/master/markdown
11. Spark Examples
http://spark.apache.org/examples.html
12. RDD operations in detail
http://dataknocker.github.io/2014/07/20/RDD%E5%90%84%E6%93%8D%E4%BD%9C%E8%AF%A6%E8%A7%A3/
13. Notes on the Spark programming guide
http://blog.javachen.com/2015/02/03/spark-programming-guide/#
14. Spark Core Runtime analysis: DAGScheduler, TaskScheduler, SchedulerBackend
http://blog.csdn.net/pelick/article/details/44495611
15. Getting Started with Spark (in Python)
https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python
16. DataFrame in Spark SQL
http://blog.javachen.com/2015/03/26/spark-sql-dataframe/#
17. The Spark RDD API in detail (part 1): Map and Reduce
https://www.zybuluo.com/jewes/note/35032
18. Spark operators article series
http://lxw1234.com/archives/2015/07/363.htm
19. Spark Streaming in practice and optimization
(3) Spark monitoring and management
1. Common Spark Troubleshooting
http://www.datastax.com/dev/blog/common-spark-troubleshooting
(4) YARN and Spark
1. Exploring Apache Spark: multi-process model or multi-threaded model?
http://dongxicheng.org/framework-on-yarn/apache-spark-multi-threads-model/
(5) Spark data platform architecture
(6) Spark applications and practice
1. How-to: Do Near-Real Time Sessionization with Spark Streaming and Apache Hadoop
2. Integrating Kafka and Spark Streaming: Code Examples and State of the Game
http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/
3. Reading nginx website log messages from Kafka with Spark and writing them to HDFS
http://yangqijun.com/archives/227
4. Flafka: Apache Flume Meets Apache Kafka for Event Processing
http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
5. Log Analysis with Spark
6. Writing Spark computation results to MySQL
http://www.iteblog.com/archives/1275
7. Improvements to Kafka integration in Spark Streaming 1.3, explained
http://www.iteblog.com/archives/1307
8. Data sources in Spark SQL
http://blog.javachen.com/2015/04/03/spark-sql-datasource/#
9. Kafka + Spark Streaming + Redis: real-time computing integration in practice
http://shiyanjun.cn/archives/1097.html
(7) Spark machine learning in practice
1. ML Pipelines: A New High-Level API for MLlib
http://databricks.com/blog/2015/01/07/ml-pipelines-a-new-high-level-api-for-mllib.html
2. Introduction to the Spark 0.9.1 MLlib machine learning library
(8) Scala learning guide
1. Spark development guide (0.8.1, Chinese edition)
2. The many syntactic similarities between Swift and Scala
http://segmentfault.com/a/1190000000575561
3. Awesome Scala
https://github.com/lauris/awesome-scala
4. Scala (a blog by an Alibaba engineer covering the JVM, Scala, and backend architecture; quite good)
5. Scala crash course
http://my.oschina.net/mup/blog/363436?from=20150111
6. An-Overview-of-the-Scala-Programming-Language
https://github.com/wecite/papers/tree/master/An-Overview-of-the-Scala-Programming-Language
7. A concise Scala tutorial
http://colobu.com/2015/01/14/Scala-Quick-Start-for-Java-Programmers/
8. Scala School
http://twitter.github.io/scala_school/zh_cn/index.html
9. Basic Scala syntax and concepts
http://blog.javachen.com/2015/04/20/basic-of-scala.html
Scala collections
http://blog.javachen.com/2015/04/22/scala-collections.html
10. Scala: from beginner to beginner+
http://segmentfault.com/a/1190000003068853
(9) Spark books
1. Spark Cookbook
http://www.infoobjects.com/spark-cookbook/
2. Fast Data Processing with Spark
http://it-ebooks.info/book/3185/
3. An overview of the Scala language
http://wecite.github.io/docs/ScalaOverview-20150226.pdf
4. Effective Scala
http://twitter.github.io/effectivescala/index-cn.html
5. The fun Scala language: concise Scala syntax
http://www.ibm.com/developerworks/cn/java/j-lo-funinscala2/
Source: oschina
Link: https://my.oschina.net/u/2306127/blog/683687