Cassandra Client Java APIs [closed]

半世苍凉 submitted on 2019-11-27 17:26:41
Lyuben Todorov

Thrift is becoming more of a legacy API:

First, you should be aware that the Thrift API is not going to be getting new features; it's there for backwards compatibility, and not recommended for new projects.
- the paul

So I'd avoid Thrift-based APIs (Thrift is only kept for backwards compatibility).

That said, if you do need a Thrift-based API, I'd go for Astyanax. It is very easy to use compared to other Thrift APIs, but in my personal experience DataStax's driver is even easier.

So you should have a look at DataStax's API (and its GitHub repo). I'm not sure whether there are any compiled versions of the API for download, but you can easily build it with Maven. The GitHub repo's commit log also shows that it is updated very frequently.

The driver works exclusively with CQL3 and is asynchronous, but be warned that Cassandra 1.2 is the earliest supported version.

Performance
Astyanax is Thrift-based, and DataStax's driver uses the binary protocol. Here are the latest benchmarks I could find comparing Thrift and CQL (note that these are definitely out of date). In fairness, the small difference in performance shown in these benchmarks will rarely matter.

Async support
DataStax's async support is a definite advantage over Astyanax (Netflix tried implementing it but decided not to).
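
To make the difference concrete, here is a minimal sketch of blocking versus future-based calls using only the JDK. `FakeSession`, `execute` and `executeAsync` are hypothetical stand-ins I made up for illustration; this is not the DataStax API, just the general shape of an asynchronous client.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a driver session (an assumption, not a real driver class).
class FakeSession {
    private final ExecutorService io = Executors.newFixedThreadPool(2);

    // Blocking style: the calling thread waits for the result.
    String execute(String cql) {
        return "rows for: " + cql;
    }

    // Asynchronous style: returns at once; the work runs on the I/O pool.
    CompletableFuture<String> executeAsync(String cql) {
        return CompletableFuture.supplyAsync(() -> execute(cql), io);
    }

    void close() {
        io.shutdown();
    }
}

class AsyncSketch {
    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        try {
            // Both queries are in flight at the same time.
            CompletableFuture<String> users = session.executeAsync("SELECT * FROM users");
            CompletableFuture<String> orders = session.executeAsync("SELECT * FROM orders");
            // Combine the two results once both futures have completed.
            String combined = users.thenCombine(orders, (u, o) -> u + " | " + o).join();
            System.out.println(combined);
        } finally {
            session.close();
        }
    }
}
```

The point is only that the caller gets a future back and decides when (or whether) to block, instead of waiting on every request in turn.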

Documentation
I can't really argue against Netflix's wiki. The documentation is excellent and it's updated fairly frequently. Their wiki includes code examples, and you can find tests in the source code if you need to see the code at work. I struggled to find any documentation for the DataStax driver; however, tests are provided in the GitHub repository, so that is a starting point.

Also have a look at this answer (well, not my one anyway). It looks into some advantages and disadvantages of Thrift and CQL.

I would recommend the DataStax Java driver for Cassandra (http://www.datastax.com).

For JPA-like support, try my mapping tool: http://valchkou.com/cassandra-driver-mapping.html

It is annotation driven: no mapping files, no scripts, no configuration files. There is no need for DDL scripts; the schema is automatically synchronized with the entity definition.

Usage sample:

   Entity entity = new Entity();
   mappingSession.save(entity);
   entity = mappingSession.get(Entity.class, id);
   mappingSession.delete(entity); 

Available on Maven Central:

   <dependency>
      <groupId>com.valchkou.datastax</groupId>
      <artifactId>cassandra-driver-mapping</artifactId>
   </dependency>

I would also add decent support as a factor. We post answers about playORM all the time on Stack Overflow ;). It is also about to start supporting MongoDB (work is nearly finished), so any client can run on MongoDB or Cassandra. It has its own query language, so this port works just fine. You always have access to the raw Astyanax interface too when you really need the speed.

Also, regarding your note on async: Thrift previously did not support async, so no clients did either, as they were built on the generated Thrift code. That has since changed, but I don't know of a client that has added async support.

I know HBase has an async client, though. Anyway, just thought I would add my 2 cents in case it helps a little.

EDIT: I was recently in the cassandra-thrift generated source code, and it is not a very good API for async development: it has a send method and a recv() method, but you don't know when to call recv. Aaron Morton on the Cassandra user list has a blog post on how you can really do it, but it is not clean at all. You have to grab the selector from deep down in Thrift and do some work so you know when to call the recv method. Pretty nasty stuff.

later, Dean

I've used Hector, Astyanax and Thrift directly. I've also used the Python client PyCassa.

The features that I found important and differentiating were:

  • Ease of use of the API
  • Composite column support
  • Connection pooling
  • Latency
  • Documentation

One of the major issues is getting the types correct. You want to be able to pass in longs, Strings, byte[], etc. Both Hector and Astyanax solve this by using Serializer objects. In Astyanax you specify them higher up the chain, so you have to specify them less often. In Hector the syntax is often very clunky and hard to adapt if you change your schema.
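
A rough sketch of the serializer idea, written against the JDK only. The interface and class names below are my own illustration (they are not copied from Hector or Astyanax), and `ColumnFamilySketch` is a hypothetical handle showing what "specifying serializers higher up the chain" could look like: you declare them once and every later call reuses them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative interface: converts a typed value to raw bytes and back.
interface Serializer<T> {
    ByteBuffer toBytes(T value);
    T fromBytes(ByteBuffer bytes);
}

final class LongSerializer implements Serializer<Long> {
    public ByteBuffer toBytes(Long value) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(value);
        buf.flip(); // make the written bytes readable
        return buf;
    }
    public Long fromBytes(ByteBuffer bytes) {
        return bytes.duplicate().getLong(); // duplicate() leaves the caller's position untouched
    }
}

final class StringSerializer implements Serializer<String> {
    public ByteBuffer toBytes(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }
    public String fromBytes(ByteBuffer bytes) {
        return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
    }
}

// Hypothetical column family handle: the serializers are fixed once, here.
final class ColumnFamilySketch<K, C> {
    private final Serializer<K> keySerializer;
    private final Serializer<C> columnSerializer;

    ColumnFamilySketch(Serializer<K> keySerializer, Serializer<C> columnSerializer) {
        this.keySerializer = keySerializer;
        this.columnSerializer = columnSerializer;
    }

    ByteBuffer encodeKey(K key) { return keySerializer.toBytes(key); }
    K decodeKey(ByteBuffer raw) { return keySerializer.fromBytes(raw); }
    ByteBuffer encodeColumn(C name) { return columnSerializer.toBytes(name); }
    C decodeColumn(ByteBuffer raw) { return columnSerializer.fromBytes(raw); }
}

class SerializerDemo {
    public static void main(String[] args) {
        ColumnFamilySketch<Long, String> users =
                new ColumnFamilySketch<>(new LongSerializer(), new StringSerializer());
        ByteBuffer rawKey = users.encodeKey(42L);
        System.out.println(users.decodeKey(rawKey));
        System.out.println(users.decodeColumn(users.encodeColumn("email")));
    }
}
```

With the types pinned on the handle, a schema change means touching one declaration rather than every call site.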

Since Python has dynamic types, it is much easier to deal with this in PyCassa. Since it's not an option for you I won't say much about it, but I found it by far the easiest to use, though also quite slow.

Composite column support is very confusing in Hector. Astyanax has annotations to greatly simplify this.

As far as I know, the connection pooling is the same for Hector and Astyanax. Both will avoid downed hosts and discover new ones added to the ring. Both of these features are crucial to reliability and maintainability. Pelops appears to have these features but I've never tried it.
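
For anyone wondering what "avoid downed hosts and discover new ones" boils down to, here is a toy round-robin host selector. `HostPoolSketch` and its methods are names I invented for this sketch; the real clients are far more involved (health checks, retries, per-host connection limits), so treat this as the bare idea only.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Toy host selector: round-robin over known hosts, skipping any marked down.
final class HostPoolSketch {
    private final CopyOnWriteArrayList<String> hosts = new CopyOnWriteArrayList<>();
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger cursor = new AtomicInteger();

    // Called when a new node is discovered in the ring.
    void addHost(String host) { hosts.addIfAbsent(host); }

    // Called when a request to the host fails or a health check times out.
    void markDown(String host) { down.add(host); }

    // Called when the host answers again.
    void markUp(String host) { down.remove(host); }

    // Returns the next live host, trying each known host at most once.
    String nextHost() {
        int n = hosts.size();
        for (int i = 0; i < n; i++) {
            String candidate = hosts.get(Math.floorMod(cursor.getAndIncrement(), n));
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live hosts");
    }
}

class HostPoolDemo {
    public static void main(String[] args) {
        HostPoolSketch pool = new HostPoolSketch();
        pool.addHost("10.0.0.1");
        pool.addHost("10.0.0.2");
        pool.addHost("10.0.0.3");
        pool.markDown("10.0.0.2");
        for (int i = 0; i < 4; i++) {
            System.out.println(pool.nextHost()); // never prints 10.0.0.2 while it is down
        }
    }
}
```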

A key difference between Astyanax and Hector is the latency optimizations. Astyanax has the ability to route read and write requests to a replica node, potentially avoiding an extra networking hop. This can reduce the latency by a few milliseconds.
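
A simplified picture of how a client can pick a replica directly: hash the row key to a token and look up which node owns that token on the ring. Everything here (`TokenRingSketch`, the toy hash, the token values) is an assumption for illustration; a real cluster uses its configured partitioner and a replication strategy rather than a single owner per token.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy token ring: each node owns the range ending at its token (inclusive).
final class TokenRingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String host) { ring.put(token, host); }

    // Toy hash into 0..999; NOT the hash Cassandra uses.
    static int tokenFor(String rowKey) {
        return Math.floorMod(rowKey.hashCode(), 1000);
    }

    // The owner is the first node whose token is >= the key's token, wrapping around.
    String ownerOfToken(int token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("empty ring");
        }
        Map.Entry<Integer, String> owner = ring.ceilingEntry(token);
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    String ownerOfKey(String rowKey) {
        return ownerOfToken(tokenFor(rowKey));
    }
}

class TokenRingDemo {
    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(250, "node-a");
        ring.addNode(500, "node-b");
        ring.addNode(750, "node-c");
        // The client sends the request straight to the owner instead of a random coordinator.
        System.out.println(ring.ownerOfKey("user:42"));
    }
}
```

Sending the request to the owning node is what saves the extra hop: a random coordinator would otherwise have to forward it.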

At last look, Astyanax had poor documentation, but it seems much improved now.

The only advantage of Hector I can see today is that it is much more widely used so probably less buggy. But Astyanax has a better feature set.

I have a similar recommendation to Valchkou's. The DataStax Java CQL driver is very good. I tried Astyanax, Kundera and buffalosw's playORM. Astyanax is very low level and somewhat complex. Kundera and playORM are generic ORMs for NoSQL databases, and are complex to set up and get started with.

The DataStax APIs are pretty similar to a JDBC driver: you have to embed CQL statements in your DAO and write several lines of code to load and save your entities. To solve this problem, I wrote a Java object mapper called cassandra-jom, built around the DataStax CQL driver. Cassandra-jom annotations are very similar to JPA/Hibernate annotations and can even create or update your column family schema from your object model. It is easy to use, reliable, and used in my other live web applications. Check it out at its GitHub page.

https://github.com/w3cloud/cassandra-jom
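
To give a feel for how annotation-driven schema generation can work in general, here is a small reflection-based sketch. The annotations `@TableSketch` and `@IdSketch`, the type mapping, and `SchemaSketch.createTable` are all hypothetical names for this example; they are not the annotations or API of cassandra-jom or any other mapper mentioned in this thread.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical annotations, loosely in the spirit of JPA's @Table and @Id.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface TableSketch { String name(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IdSketch { }

@TableSketch(name = "users")
class UserEntity {
    @IdSketch long id;
    String email;
    int age;
}

final class SchemaSketch {
    // Minimal Java-to-CQL type mapping; a real mapper would cover many more types.
    static String cqlType(Class<?> type) {
        if (type == long.class || type == Long.class) return "bigint";
        if (type == int.class || type == Integer.class) return "int";
        if (type == String.class) return "text";
        throw new IllegalArgumentException("unmapped type: " + type);
    }

    // Builds a CREATE TABLE statement from the annotated entity class.
    static String createTable(Class<?> entity) {
        TableSketch table = entity.getAnnotation(TableSketch.class);
        if (table == null) {
            throw new IllegalArgumentException("missing @TableSketch on " + entity);
        }
        Field[] fields = entity.getDeclaredFields();
        // Sort by name so the generated statement is deterministic.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        List<String> columns = new ArrayList<>();
        String key = null;
        for (Field field : fields) {
            if (field.isSynthetic()) continue; // skip compiler- or tool-generated fields
            columns.add(field.getName() + " " + cqlType(field.getType()));
            if (field.isAnnotationPresent(IdSketch.class)) key = field.getName();
        }
        if (key == null) {
            throw new IllegalArgumentException("missing @IdSketch field on " + entity);
        }
        return "CREATE TABLE " + table.name() + " (" + String.join(", ", columns)
                + ", PRIMARY KEY (" + key + "))";
    }
}

class SchemaDemo {
    public static void main(String[] args) {
        System.out.println(SchemaSketch.createTable(UserEntity.class));
    }
}
```

A mapper would then run the generated statement through the driver, which is why no hand-written DDL scripts are needed.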
