A 10-node load test comparison using Amazon EC2 SSD-based instances. Each test run processed 1 billion vertices and 1 billion edges, driven by the titan-loadtest project.
Method
The experiment maximizes data locality by co-locating load generation, the Titan graph database, and Cassandra/Hazelcast within the same JVM instance while partitioning data across the cluster. It also explores methods for tuning garbage collection, Titan, and Cassandra for this peer-computing use case.
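Titan can run Cassandra inside the same JVM via its `embeddedcassandra` storage backend, which is one way to achieve the co-location described above. A minimal configuration sketch (property names follow the Titan 0.3.x documentation; the config-file path is illustrative):

```
# Run Cassandra embedded in the Titan JVM rather than as a separate process
storage.backend=embeddedcassandra
# Path to the Cassandra YAML used by the embedded instance (example path)
storage.cassandra-config-dir=file:///opt/titan/config/cassandra.yaml
```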
The following components were utilized during the experiment:
Technology | Version
---|---
RHEL x64 HVM AMI | 6.4
Oracle JDK x64 | 1.7_45
Apache Cassandra | 1.2.9
Hazelcast | 3.1.1
Titan | 0.3.2
Each test iteration has six read-ratio phases, starting at 0% reads (100% writes) and progressing up to 90% reads and 10% writes. For all tests, the persistence implementation executes in the same JVM as Titan to avoid unnecessary context switching and serialization overhead. Tests were conducted within an Amazon placement group to ensure instances resided on the same subnet. The storage was formatted with 4K blocks and used the noop scheduler to improve latency.
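The phase structure can be sketched as follows. The post only states the endpoints of the read-ratio range (0% up to 90%), so the intermediate ratios below are assumed for illustration; `isRead` shows how an individual operation would be classified as a read or a write for a given phase.

```java
import java.util.Random;

public class PhaseMix {
    // Hypothetical read ratios for the six phases; only the 0% and 90%
    // endpoints are stated in the post, the rest are assumptions.
    static final double[] READ_RATIOS = {0.0, 0.1, 0.3, 0.5, 0.7, 0.9};

    // Decide whether the next operation is a read for the given phase ratio.
    static boolean isRead(double readRatio, Random rng) {
        return rng.nextDouble() < readRatio;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (double ratio : READ_RATIOS) {
            int reads = 0, ops = 100_000;
            for (int i = 0; i < ops; i++) {
                if (isRead(ratio, rng)) reads++;
            }
            System.out.printf("phase %.0f%% reads -> observed %.1f%% reads%n",
                    ratio * 100, 100.0 * reads / ops);
        }
    }
}
```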
For each phase, new vertices were added, each with one edge linking back to a previous vertex. No update or delete operations were tested.
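The write pattern above can be sketched with plain Java collections standing in for the Titan graph API. The post does not say how the earlier vertex is chosen, so picking it uniformly at random is an assumption here; the invariant that matters is one back-edge per vertex after the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of the load pattern: every new vertex gets exactly one edge
// linking back to an earlier vertex. Not the Titan API, just the shape.
public class ChainLoader {
    // edges.get(i) holds the id of the earlier vertex that vertex i links to
    final List<Integer> edges = new ArrayList<>();
    final Random rng;

    ChainLoader(long seed) { this.rng = new Random(seed); }

    // Add one vertex; all vertices after the first get one back-edge.
    void addVertex() {
        int id = edges.size();
        // -1 marks the root vertex, which has no earlier vertex to link to
        edges.add(id == 0 ? -1 : rng.nextInt(id));
    }

    int vertexCount() { return edges.size(); }

    long edgeCount() {
        return edges.stream().filter(t -> t >= 0).count();
    }
}
```

Loading n vertices this way always yields exactly n - 1 edges, matching the post's 1-billion/1-billion scale to within one edge.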
Please see the titan-loadtest project above for all Cassandra and Hazelcast settings and configurations used in the test.
Results
Please note: the results are listed as rates in thousands of vertices per second, and each operation includes the creation of an edge as well as a vertex. Also, the Hazelcast SSD x1 results used a custom flash storage module for Hazelcast that was developed privately, so those results are not reproducible without that module installed.
Conclusions
Hazelcast performed better than Cassandra in all tests and demonstrated an order of magnitude better performance on reads. Surprisingly, Hazelcast slightly outperformed Cassandra on writes as well.
super curious about “Hazelcast SSD x1 results used a custom flash storage module for Hazelcast developed privately”
Yea, still trying to get my employer to agree to open source that one 🙂
So just FYI in followup tests I’ve been able to get Hazelcast to scale to about 120 million vertices and 120 mil edges on a single node. So if you are planning a cluster make sure to use a larger number of smaller nodes to get more efficient utilization of the hardware you purchase 🙂