Surprisingly, Redis, one of the most popular in-memory databases, did not have auto sharding until about three months ago. The feature was added recently: Redis 3.0 has auto sharding.
Aerospike (AS) supports auto clustering and has a fast restart feature: all the indexes are persisted without a throughput penalty, so the database can be brought back up in a couple of minutes, even at a size of 50 TB. All of this can be achieved on commodity hardware. Adding capacity is simply a matter of adding a new node to the cluster. It works across data centers and cloud environments and, just as importantly, in any local environment.
It also supports online matchmaking (matching demand with supply).
A NoSQL database has to handle real-time use cases to meet the aggressive SLAs of today's advertising world, online shopping portals, logistics providers such as Ola Cabs (identifying the nearest cab that is ready for a pickup and can reach the customer within 5 minutes is computed in under 3 ms), online bidding applications (99.7% accuracy in finalizing an ad bid in under 3 ms), fraud detection systems that need to identify a malicious user in under 5 ms, and so on.
- Aerospike is ACID compliant at the record level, which is true of most NoSQL databases.
- Aerospike is designed for clustered environments.
- Built for horizontal scaling.
- Supports data balancing (automatic/manual).
- Auto sharding: application-level or transparent to the end user.
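To illustrate transparent auto sharding and data balancing, here is a minimal sketch that assigns 4096 partitions to nodes and shows what happens when a node is added. The assignment rule is an assumption: it uses rendezvous (highest-random-weight) hashing as a stand-in, not Aerospike's actual partition-map algorithm, and the node names and `partition_map` helper are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096  # assumed partition count ("4K")


def weight(node_id: str, partition_id: int) -> int:
    # Deterministic pseudo-random score for a (node, partition) pair.
    h = hashlib.sha256(f"{node_id}:{partition_id}".encode()).digest()
    return int.from_bytes(h[:8], "big")


def partition_map(nodes):
    # Each partition is owned by the node with the highest score
    # (rendezvous / highest-random-weight hashing, used here as a stand-in).
    return {pid: max(nodes, key=lambda n: weight(n, pid)) for pid in range(N_PARTITIONS)}


before = partition_map(["A", "B", "C"])
after = partition_map(["A", "B", "C", "D"])

# Only partitions that the new node wins have to migrate; all others stay put.
moved = [pid for pid in range(N_PARTITIONS) if before[pid] != after[pid]]
print(f"{len(moved)} of {N_PARTITIONS} partitions migrate")
```

With this scheme, adding a fourth node moves roughly a quarter of the partitions, and every moved partition goes to the new node, so the application never has to know which node holds a key.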
Aerospike is an open-source, real-time NoSQL key-value store.
It is built in C from scratch so that the database can take full advantage of the hardware: networking, SSD, memory and the kernel. It is optimized for SSD/flash storage, the reasoning being that SSDs are the future of storage devices, but it also works on HDDs (rotational disk drives). An SSD provides parallel channels; depending on the vendor, a drive may use 8, 16, 32 or more.
SSDs wear out if the same block location is repeatedly written to and erased. Because SSDs are written in units of blocks, Aerospike uses the SSD as a raw block store with no file system, organized as a ring buffer: you start writing at the beginning of the ring buffer and keep appending data to the next block, and the next, until you reach the end of the drive. Once you reach the end, you wrap around to the first block and carry on in the same fashion. This ensures that the first location is not written more often than the others; every location is written an equal number of times.
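The ring-buffer write pattern can be sketched as a toy model to show why it spreads wear evenly. `RingBufferDevice` is a hypothetical class, not Aerospike code, and it deliberately ignores one real concern: an actual storage engine must not overwrite blocks that still hold live records, so that bookkeeping (defragmentation) is omitted here.

```python
class RingBufferDevice:
    """Toy model of a raw SSD used as a ring buffer of fixed-size blocks (hypothetical)."""

    def __init__(self, n_blocks: int):
        self.n_blocks = n_blocks
        self.blocks = [None] * n_blocks
        self.write_counts = [0] * n_blocks  # how many times each block was written
        self.head = 0                       # next block to write

    def append(self, payload: bytes) -> int:
        block = self.head
        self.blocks[block] = payload
        self.write_counts[block] += 1
        # Advance to the next block, wrapping back to block 0 at the end of the drive.
        self.head = (self.head + 1) % self.n_blocks
        return block


dev = RingBufferDevice(n_blocks=8)
for i in range(24):
    dev.append(f"record-{i}".encode())
print(dev.write_counts)  # [3, 3, 3, 3, 3, 3, 3, 3]
```

After 24 writes to 8 blocks, every block has been written exactly 3 times, which is the even wear the paragraph describes.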
Clustering, or let's call it auto clustering.
Adding a node and bringing it into the cluster takes under 100 ms. This is implemented using the Paxos algorithm.
What is the Paxos algorithm?
http://www.quora.com/Distributed-Systems/What-is-a-simple-explanation-of-the-Paxos-algorithm
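As a minimal sketch of the idea, here is textbook single-decree Paxos with in-process acceptors: one round lets a majority agree on a single value, here a hypothetical cluster-membership string such as `"nodes=A,B,C"`. This is the classic algorithm, not Aerospike's actual clustering protocol; networking, failures and retries are left out, and the `Acceptor`/`propose` names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                   # highest ballot this acceptor has promised
    accepted_ballot: int = -1            # ballot of the last accepted value (-1 = none)
    accepted_value: Optional[str] = None

    def prepare(self, ballot: int) -> Tuple[bool, int, Optional[str]]:
        # Phase 1b: promise to ignore lower ballots and report any value already accepted.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, -1, None

    def accept(self, ballot: int, value: str) -> bool:
        # Phase 2b: accept unless a higher ballot has been promised in the meantime.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run one Paxos round; return the chosen value, or None if this ballot lost."""
    majority = len(acceptors) // 2 + 1
    # Phase 1a: ask every acceptor for a promise.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [(b, v) for ok, b, v in replies if ok]
    if len(promises) < majority:
        return None
    # If some acceptor already accepted a value, re-propose the one with the highest ballot.
    _, prior_value = max(promises, key=lambda p: p[0])
    if prior_value is not None:
        value = prior_value
    # Phase 2a: ask acceptors to accept (ballot, value).
    votes = sum(a.accept(ballot, value) for a in acceptors)
    return value if votes >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, ballot=1, value="nodes=A,B,C"))    # nodes=A,B,C
print(propose(acceptors, ballot=2, value="nodes=A,B,C,D"))  # nodes=A,B,C (already chosen)
print(propose(acceptors, ballot=1, value="stale"))          # None (ballot too old)
```

Once a value is chosen, later proposals in the same instance can only re-confirm it; agreeing on a new membership view (for example after a node joins) would take a fresh Paxos instance.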
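Keys are hashed with RIPEMD-160, which produces a 20-byte (160-bit) digest. The digest is treated as unique (a collision is astronomically unlikely), and the hashes are distributed evenly across 4K (4096) partitions.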
Every namespace maintains its own partition trees: each partition has a partition ID, and each partition has its own index tree (a red-black tree in Aerospike's primary index).
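Here is a sketch of how a key could be mapped to one of the 4096 partitions, under stated assumptions. The input layout (set name, a separator byte, then the key) is hypothetical, and so is the partition rule of taking 12 bits from the first two digest bytes. Python's `hashlib` only offers RIPEMD-160 when the underlying OpenSSL build provides it, so the sketch falls back to SHA-1, which is a different hash but also yields a 20-byte digest.

```python
import hashlib

N_PARTITIONS = 4096  # 4K partitions per namespace


def key_digest(set_name: str, user_key: str) -> bytes:
    """20-byte digest of a record key (hypothetical input layout)."""
    data = set_name.encode() + b"\x00" + user_key.encode()
    try:
        h = hashlib.new("ripemd160")  # only available if the OpenSSL build provides it
    except ValueError:
        h = hashlib.sha1()            # stand-in: also a 160-bit (20-byte) digest
    h.update(data)
    return h.digest()


def partition_id(digest: bytes) -> int:
    # Assumption: 12 bits taken from the first two digest bytes select one of 4096 partitions.
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


d = key_digest("users", "alice")
print(len(d), partition_id(d))
```

Because the digest bits are effectively uniform, keys spread evenly over the partitions, and the same key always lands in the same partition.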
Storage Model
In-memory database: everything is stored in DRAM, giving high performance at a high cost.
Disk storage: primary and secondary indexes are stored in DRAM; data goes on SSD or HDD. SSD is the optimal choice here: slightly slower than DRAM, but at least ~10x cheaper.
Hybrid storage: everything is stored in DRAM, and the data is also persisted on SSD or HDD. You get DRAM performance backed by SSD or HDD persistence, paying the higher DRAM cost without losing out on performance.
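To make the trade-off between the three storage models concrete, here is a rough sizing sketch. The `footprint` helper is hypothetical; the 64 bytes per record for the primary index follows Aerospike's capacity-planning guidance, while secondary indexes, replication factor and on-device overhead are deliberately ignored.

```python
INDEX_BYTES_PER_RECORD = 64  # primary-index entry size per record


def footprint(model: str, n_records: int, record_bytes: int):
    """Return (dram_bytes, persistent_bytes) for one copy of the data (rough sketch)."""
    index = n_records * INDEX_BYTES_PER_RECORD
    data = n_records * record_bytes
    if model == "in-memory":
        return index + data, 0      # index and data both in DRAM
    if model == "disk":
        return index, data          # index in DRAM, data on SSD/HDD
    if model == "hybrid":
        return index + data, data   # everything in DRAM, data also persisted
    raise ValueError(f"unknown storage model: {model}")


for model in ("in-memory", "disk", "hybrid"):
    print(model, footprint(model, n_records=100_000_000, record_bytes=1000))
```

For 100 million 1,000-byte records, the disk model needs only 6.4 GB of DRAM for the index, against about 106 GB when the data itself also lives in DRAM.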
Benchmark
1.6 million TPS with YCSB (Yahoo! Cloud Serving Benchmark) on a 4-node in-memory cluster.
SSD performance guarantee given by Aerospike:
ACT (Aerospike Certification Tool): it was defined and developed to test SSD performance, and today it is the standard certification for SSDs. Intel stated in a blog post that it is the only SSD provider in the world that supports 1 million TPS using ACT.
Google Cloud has done some work to showcase the throughput of Google Compute Engine. Google posted on its blog that what Cassandra takes 300 nodes to achieve, Aerospike achieves with 50 nodes.
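The benchmark figures above reduce to simple per-node arithmetic; this short sketch only restates the numbers quoted in this section.

```python
ycsb_tps, ycsb_nodes = 1_600_000, 4
per_node = ycsb_tps // ycsb_nodes
print(per_node)  # 400000 TPS per node in the 4-node YCSB run

cassandra_nodes, aerospike_nodes = 300, 50
ratio = cassandra_nodes // aerospike_nodes
print(ratio)  # 6, i.e. 6x fewer nodes for the same workload
```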
Aerospike handles real-time problems very effectively.