A 10-node load test comparison using Amazon EC2 SSD-based instances. Each test run processed 1 billion vertices and 1 billion edges, driven by the titan-loadtest project.
Method
The experiment maximizes data locality by co-locating load generation, the Titan graph database, and Cassandra/Hazelcast within the same JVM instance while partitioning data across the cluster. It also explores methods for tuning garbage collection, Titan, and Cassandra for this peer-computing use case.
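Titan can run Cassandra inside the same JVM via its `embeddedcassandra` storage backend, which is one way to achieve the co-location described above. A minimal configuration sketch (property names follow the Titan 0.3.x documentation; the config-file path is illustrative):

```
# Run Cassandra embedded in the Titan JVM rather than as a separate process
storage.backend=embeddedcassandra
# Path to the Cassandra YAML used by the embedded instance (example path)
storage.cassandra-config-dir=file:///opt/titan/config/cassandra.yaml
```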
The following components were utilized during the experiment:
Technology | Version
---|---
RHEL x64 HVM AMI | 6.4
Oracle JDK x64 | 1.7_45
Apache Cassandra | 1.2.9
Hazelcast | 3.1.1
Titan | 0.3.2
Each test iteration has six read-ratio phases, starting at 0% reads (100% writes) and progressing up to 90% reads and 10% writes. For all tests, the persistence implementation executes in the same JVM as Titan to avoid unnecessary context switching and serialization overhead. Tests were conducted within an Amazon placement group to ensure instances resided on the same subnet. The storage was formatted with 4K blocks and used the noop scheduler to improve latency.
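The phase structure can be sketched as follows. The post only states the endpoints of the read-ratio range (0% up to 90%), so the intermediate ratios below are assumed for illustration; `isRead` shows how an individual operation would be classified as a read or a write for a given phase.

```java
import java.util.Random;

public class PhaseMix {
    // Hypothetical read ratios for the six phases; only the 0% and 90%
    // endpoints are stated in the post, the rest are assumptions.
    static final double[] READ_RATIOS = {0.0, 0.1, 0.3, 0.5, 0.7, 0.9};

    // Decide whether the next operation is a read for the given phase ratio.
    static boolean isRead(double readRatio, Random rng) {
        return rng.nextDouble() < readRatio;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (double ratio : READ_RATIOS) {
            int reads = 0, ops = 100_000;
            for (int i = 0; i < ops; i++) {
                if (isRead(ratio, rng)) reads++;
            }
            System.out.printf("phase %.0f%% reads -> observed %.1f%% reads%n",
                    ratio * 100, 100.0 * reads / ops);
        }
    }
}
```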
For each phase, new vertices were added, each with one edge linking back to a previous vertex. No update or delete operations were tested.
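The write pattern above can be sketched with plain Java collections standing in for the Titan graph API. The post does not say how the earlier vertex is chosen, so picking it uniformly at random is an assumption here; the invariant that matters is one back-edge per vertex after the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of the load pattern: every new vertex gets exactly one edge
// linking back to an earlier vertex. Not the Titan API, just the shape.
public class ChainLoader {
    // edges.get(i) holds the id of the earlier vertex that vertex i links to
    final List<Integer> edges = new ArrayList<>();
    final Random rng;

    ChainLoader(long seed) { this.rng = new Random(seed); }

    // Add one vertex; all vertices after the first get one back-edge.
    void addVertex() {
        int id = edges.size();
        // -1 marks the root vertex, which has no earlier vertex to link to
        edges.add(id == 0 ? -1 : rng.nextInt(id));
    }

    int vertexCount() { return edges.size(); }

    long edgeCount() {
        return edges.stream().filter(t -> t >= 0).count();
    }
}
```

Loading n vertices this way always yields exactly n - 1 edges, matching the post's 1-billion/1-billion scale to within one edge.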
Please see the titan-loadtest project above for all Cassandra and Hazelcast settings and configurations used in the test.
Results
Please note: the results are listed as rates in thousands of vertices per second, and each operation includes the creation of an edge as well as a vertex. Also, the Hazelcast SSD x1 results used a custom flash storage module for Hazelcast that was developed privately, so those results are not reproducible without that module installed.
Conclusions
Hazelcast performed better than Cassandra in all tests and demonstrated an order of magnitude better performance on reads. Surprisingly, Hazelcast slightly outperformed Cassandra on writes as well.
super curious about “Hazelcast SSD x1 results used a custom flash storage module for Hazelcast developed privately”
Yea, still trying to get my employer to agree to open source that one 🙂
So just FYI in followup tests I’ve been able to get Hazelcast to scale to about 120 million vertices and 120 mil edges on a single node. So if you are planning a cluster make sure to use a larger number of smaller nodes to get more efficient utilization of the hardware you purchase 🙂