In my previous post, “Game Over for VMs, what’s next?”, I made the case that the concerns that originally gave rise to virtual machine technology no longer exist, and that container technologies such as Docker have made all VMs, including the Java VM, unnecessary. This post documents an experiment measuring the cost of using VM technology for a specific, non-trivial use case. The experiment had to satisfy three criteria:
- A non-trivial use case with real-world applications.
- Terabytes or even petabytes of data readily available for testing.
- A reputable third party has developed and open sourced functionally identical code that runs in both native and VM environments.
The design I settled on was to test regex processing with Google’s RE2 library against the Common Crawl data set. Google has open sourced RE2 in both its native (C++) and Java (RE2/J) forms.
I wrote a code base, in both C and Java, to extract potential phone numbers from the raw text (WET) format provided by Common Crawl. The goal was to extract anything that looked like a phone number from the crawl data and output the original matching line along with a standardized phone number. The program was intended as a high-performance upstream filter that would “boil the ocean” and surface every potential instance of a phone number. Downstream processing could later determine whether each number was valid and real; this process made no attempt to do so.
The following command was used to execute the C test:
cat ~/Downloads/CC-MAIN-20160524002110-00000-ip-10-185-217-139.ec2.internal.warc.wet | time ./getphone-c/Release/getphone-c > getphone-c.txt
The following command was used to execute the Java test:
cat ~/Downloads/CC-MAIN-20160524002110-00000-ip-10-185-217-139.ec2.internal.warc.wet | time java -jar getphone-java/target/getphone-main.jar > getphone-java.txt
Both tests were run on the same uncompressed 412 MiB crawl file, each preceded by an identical warm-up run. Both tests saturated a single core throughout.
This test assumes that Google puts reasonable effort into tuning both the C and Java versions of RE2. Under that assumption, the test shows that for this use case the native implementation delivers 20 times the hardware ROI of the VM implementation.
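Assuming hardware ROI is proportional to throughput per core (both runs used one core on the same input), the 20 times figure corresponds to the ratio below, where S is the 412 MiB input and t is each run's wall-clock time:

```latex
T_{\mathrm{C}} = \frac{S}{t_{\mathrm{C}}}, \qquad
T_{\mathrm{Java}} = \frac{S}{t_{\mathrm{Java}}}, \qquad
S = 412\ \mathrm{MiB}

\frac{\mathrm{ROI}_{\mathrm{C}}}{\mathrm{ROI}_{\mathrm{Java}}}
  = \frac{T_{\mathrm{C}}}{T_{\mathrm{Java}}}
  = \frac{t_{\mathrm{Java}}}{t_{\mathrm{C}}} \approx 20
```

Under this reading, the Java run took roughly 20 times as long as the C run, so matching the C version's throughput would take roughly 20 times as many cores.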
That includes the Java VM. Yes, you heard that right. I’ve been writing Java since 1996, and in 2016 I can officially say that all the reasons I supported Java for all these years no longer apply. I accurately predicted the rise of Java when the technology was literally a laughing stock, and I have stuck with it for very good reasons until now. What’s changed? And more importantly: what’s next?
FYI: this post concerns mission-critical enterprise software, not desktop software.
- Back in the day, there were legions of different processors, endianness issues, and not even agreement on how big a “byte” was. Now only three viable von Neumann architectures exist: Intel, Intel, and Intel. The von Neumann architecture itself is dead too. We’ll get to that.
- In years past, networks were slow, so you had to move the code to the data to avoid moving the data across the network. Java Applets and Hadoop are good examples of this. Java was ideal here because of its platform independence, dynamic class loading, and dynamic compilation to native code. Today it is actually the software that slows the network down, not the other way around. We’ll get to that.
- In the old days, operating systems vied for superiority, spreading FUD as they went. No one knew who would win (nail-biter). Now only three operating systems vie for dominance, and they are all flavors of Linux.
Spinning up a Linux container has almost no overhead, yet it comes with enterprise-class resource management, security, and robustness. The industry currently focuses on microservices as a design pattern for client-side applications, but the pattern applies equally to server-side applications. New Linux flavors like CoreOS and Alpine build on this concept: everything except the kernel is a microservice running in a container. This allows very high levels of performance, security, and efficiency in the kernel that all the other services rely on. These new server-side microservice platforms provide all the enterprise-class deployment, management, monitoring, security, and interoperability that the Java Platform delivered 21 years ago, without a virtual machine of any kind. Server-side microservices provide resource isolation and maximum performance at the same time, at a level that neither the Java VM nor any other VM can come close to matching. So what is the language of choice for implementing these world-changing server-side microservices?
The OS itself is implemented in C, so naturally any server-side microservice not implemented in C will have a very hard time competing with those that are. Note that the container model completely eliminates the usual native package-management hell associated with C, to the point where Docker recently announced an “App Store” for containers.
Marketplaces like the Docker Store let you purchase an entire server cluster, preconfigured with the server-side microservices of your choice, on any cloud platform or even in a local bare-metal data center. The same approach also solves the cloud vendor lock-in that many companies have been struggling with. Like I said: GAME OVER.
On a final note: von Neumann microprocessors no longer keep pace with Moore’s Law, and price/performance ratios have been degrading for some time now. The data volumes and low-latency requirements of the Internet of Things will soon place unbearable pressure on the von Neumann microprocessor model. Programmable hardware such as FPGAs has traditionally required learning different languages and a complete software rewrite to take advantage of the programmable processor architecture. Seymour Cray founded a company 20 years ago with this exact realization, and the graphic below says it all. If you want to leverage the power of the Saturn FPGA cartridge, you’ll be writing your code in C, thank you very much 🙂