Titan: Cassandra vs. Hazelcast persistence benchmark
10 node load test comparison using Amazon EC2 SSD-based instances. 1 billion vertices and 1 billion edges processed for each test run. Used the titan-loadtest project to run each test.
Experiment maximizes data locality by co-locating load generation, Titan graph database, and Cassandra/Hazelcast within the same JVM instance while partitioning data across a cluster. Exploration of methods for tuning garbage collection, Titan, and Cassandra for the peer computing use case.
The following components were utilized during the experiment:
|RHEL x64 HVM AMI||6.4|
|Oracle JDK x64||1.7_45|
Each test iteration has 6 read ratio phases starting with 0% reads (100% writes) all the way up to 90% reads and 10% writes. For all tests, the persistence implementation executes in the same JVM as Titan to avoid unnecessary context switching and serialization overhead. Tests were conducted using an Amazon placement group to ensure instances resided on the same subnet. The storage was formatted with 4K blocks and used the noop scheduler to improve latency.
For each phase, new vertices were added with one edge linking back to a previous vertex. No tests of update or delete were conducted.
Please see the titan-loadtest project above for all Cassandra and Hazelcast settings and configurations used in the test.
Please note: the results are listed in rates of thousands of vertices per second and include the creation of an edge as well as a vertex. Also, the Hazelcast SSD x1 results used a custom flash storage module for Hazelcast developed privately so those results are not replicable without that module installed.
Hazelcast performed better than Cassandra for all tests and demonstrated one order of magnitude better performance on reads. Surprisingly, Hazelcast slightly outperformed Cassandra for writes as well.