Apache Spark:
Apache Spark™ is a fast and general engine for large-scale data processing.
Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
Spark gives us a comprehensive, unified framework for managing big-data processing requirements across data sets that are diverse both in nature (text data, graph data, etc.) and in source (batch vs. real-time streaming data).
Integrates well with the Hadoop ecosystem and data sources (HDFS, Amazon S3, Hive, HBase, Cassandra, etc.)
Can run on clusters managed by Hadoop YARN or Apache Mesos, and can also run in Standalone mode
Provides APIs in Scala, Java, and Python, with support for other languages (such as R) on the way
In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning and graph data processing.
We should look at Spark as an alternative to Hadoop MapReduce rather than a replacement for Hadoop.
Have a look at the InfoQ and Toptal articles for a better understanding.
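To make the Map and Reduce operations concrete, here is a minimal pure-Python sketch of the word-count pattern that Spark generalizes. No Spark installation is assumed; in PySpark the same logic would be expressed with the flatMap, map, and reduceByKey operations on an RDD.

```python
from collections import defaultdict

def word_count(lines):
    """MapReduce-style word count over a list of text lines."""
    # Map phase: split each line and emit (word, 1) pairs
    # (flatMap + map in PySpark terms).
    mapped = [(word, 1) for line in lines for word in line.split()]
    # Shuffle + reduce phase: group pairs by key and sum the counts
    # (reduceByKey in PySpark terms).
    counts = defaultdict(int)
    for word, n in mapped:
        counts[word] += n
    return dict(counts)

print(word_count(["spark runs fast", "spark scales"]))
# → {'spark': 2, 'runs': 1, 'fast': 1, 'scales': 1}
```

Spark's advantage over this single-machine sketch is that the map and reduce phases are partitioned across a cluster, with intermediate results kept in memory between stages.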
Major Use cases for Spark:
- Machine Learning algorithms
- Interactive analytics
- Streaming data
Akka: from Letitcrash
Akka is an event-driven middleware framework for building high-performance, reliable distributed applications in Java and Scala. Akka decouples business logic from low-level mechanisms such as threads, locks, and non-blocking I/O. With Akka, you can easily configure how actors are created, destroyed, scheduled, and restarted upon failure.
Have a look at this Typesafe article for a better understanding of the Actor framework.
Akka provides fault tolerance based on supervisor hierarchies. Every actor can create other actors, which it then supervises, deciding whether they should be resumed, restarted, or stopped, or whether the problem should be escalated.
Have a look at the Akka documentation and related Stack Overflow questions.
Major use cases for Akka:
- Transaction processing
- Concurrency/parallelism
- Simulation
- Batch processing
- Gaming and Betting
- Complex Event Stream Processing