That includes the Java VM. Yes, you heard it. I’ve been writing Java since 1996 and in 2016 I can officially say that all the reasons I supported Java for all these years no longer apply. I accurately predicted the rise of Java when the technology was literally a laughing stock, and I have stuck with it for very good reasons until now. What’s Changed? And more importantly: What’s Next?
FYI: this post relates to mission-critical enterprise software, not desktop software.
Back in the day, there were legions of different processors, endian issues, and not even agreement on how big a “byte” was. Now only three viable von Neumann architectures exist: Intel, Intel, and Intel. The von Neumann architecture itself is dead too. We’ll get to that.
In yesteryear, networks were slow, so you had to move the code around to avoid moving the data across the network. Java Applets and Hadoop are good examples of this. Java was ideal for this because of platform independence, dynamic class loading, and dynamic compilation to native code. Now it is actually the software that is slowing the networks, not the other way around. We’ll get to that.
In the old days, operating systems vied for superiority, spreading FUD as they went. No one knew who would win (nail-biter). Now there are only three operating systems vying for dominance, and they are all flavors of Linux.
Spinning up a Linux container has almost no overhead, and yet it comes with enterprise-class resource management, security, and robustness. The industry currently focuses on microservices as a design pattern for client-side applications; however, this pattern applies equally to server-side applications. New Linux flavors like CoreOS and Alpine build on this concept, where everything except the kernel is a microservice operating in a container. This allows very high levels of performance, security, and efficiency in the kernel that all the other services rely upon. These new server-side microservice platforms provide all the enterprise-class deployment, management, monitoring, security, and interoperability that the Java Platform delivered 21 years ago, without the need for a virtual machine of any kind. Server-side microservices provide both resource isolation and maximum performance at the same time, at a level which neither the Java VM nor any other VM can come close to matching. And what would be the language of choice for implementing these world-changing server-side microservices?
The OS itself is implemented in C, so naturally any server-side microservice not implemented in C will have a very hard time competing with those that are. Note that the container model completely eliminates the normal native package management hell associated with C, even to the point where an “Apple Store” for containers was recently announced by Docker.
Marketplaces like the Docker Store allow purchasing an entire server cluster pre-configured with server-side microservices of your choice on any cloud platform or even in a local bare metal data center. The same solution also solves the cloud vendor lock-in that many companies have been struggling with. Like I said: GAME OVER
On a final note: von Neumann microprocessors no longer fit Moore’s Law, and price/performance ratios have been degrading for some time now. The data volumes and low latency requirements of the Internet-of-Things will soon place unbearable pressures on the von Neumann microprocessor model. Programmable hardware such as FPGAs has traditionally required learning different languages and a complete software rewrite to take advantage of the programmable processor architecture. Seymour Cray founded a company 20 years ago with this exact realization, and the graphic below says it all. If you want to leverage the power of the Saturn FPGA cartridge, you’ll be writing your code in C, thank you very much 🙂
They realize all their “little friends” have been leading them off the face of a cliff. I know it’s a bit macabre, but *can* I watch? The Oracle-hating (read success-hating) that birthed Scala is finally coming to a head.
By their fruits you shall know them. The final fraudulent claim: all the trash code produced by Scala’s over-complexity will somehow run faster natively, and the saving grace is finally getting rid of that darn JVM. As I have said previously, FYI: creating a new language doesn’t make your writing better. Java used to have ahead-of-time compilation too, but that died off because dynamic run time application redefinition is more valuable in a business sense than statically compiled code. It is easy to criticize a mature, free, and broadly adopted platform like the JVM for the minor flaws it has, and very hard to create a new native platform that actually works. More likely Scala will lose the tiny market share it has long before this native platform kills off the stragglers.
But seriously, is it too much to hope that *all* the talent-haters and success-haters will be killed off by this one Darwin Award qualifying mass extinction event?
100% PowerPoint free! Learn at your own pace with 10 hours of video training and 15+ hours of in-depth exercises created by Matt Pouttu-Clarke and available exclusively through Harvard Innovation Labs on the Experfy Training Platform.
Learn to develop industrial strength MapReduce applications hands-on with tricks of the trade you will find nowhere else.
Apply advanced concepts such as Monte Carlo Simulations, Intelligent Hashing, Push Predicates, and Partition Pruning.
Learn to produce truly reusable User Defined Functions (UDFs) which stand the test of time and work seamlessly with multiple Hadoop tools and distributions.
Learn the latest industry best practices on how to utilize Hadoop ecosystem tools such as Hive, Pig, Flume, Sqoop, and Oozie in an Enterprise context.
In this video training, Matt explains how hyperdimensional reasoning implicitly plays a part in all big data analyses and how today’s analytics and deep learning can utilize hyperdimensionality to improve accuracy and reduce algorithmic blind spots.
The Software In Silicon Data Analytic Accelerator (SWiS DAX) APIs released by Oracle this week signify a sea change for big data and fast data analytic processing. Natively accelerated common analytic functions usable in C, Python, and Java have already shown a 6x lift for a Spark cube building application. Apache Flink and Apache Drill completely eclipse Spark performance, so it will be very interesting to see upcoming benchmarks of these higher performing frameworks on SWiS DAX. There is nothing to keep any vendor or group from benchmarking with these APIs, as they will work with any C, Python, or Java application.
I’m also looking forward to testing the performance of SWiS DAX on non-partitionable data sets in a big memory SMP architecture. The easy problems are partitionable, and true data discovery should allow any-to-any relations without injecting a priori partitioning assumptions.
It seems that Oracle’s long-standing commitment to developing Sun’s SPARC processors is about to pay off in a very big way for big data and fast data analysts.
Remember back in the day when some kids would invent their own language like Pig Latin and then talk it amongst themselves? And if you didn’t talk to them in Pig Latin you just weren’t cool enough to be part of the conversation. I was more like the “why bother” kid. Not that I couldn’t learn Pig Latin, I just didn’t see the point. Seriously, not that much changes.
Don’t get me wrong, I’m being hard on Berkeley out of love. Several of my relatives went there during the People’s Park era. I get that having your own language like Swift or Go provides artificial culture creation and built-in revenue protection. That’s fine if you want to program to the Google or Apple Kool-Aid culture and plug right in. Not my thing, but it takes all kinds. It’s just that I’m not getting this whole Scala thing. Why it exists.
Ok so I get that for a lot of people the Java music died when Scott McNealy finally sold out to Oracle. And the whole Bill Joy thing and how that was handled… Shameful, all. Oracle’s attempts to profiteer and build an empire off Java with things like the OpenOffice purchase: ridiculous. But the funny thing is, Oracle has taken a beating from their industry customers (like me) and has actually realized that being a great Java stakeholder is the best chance they have of preserving market share. Of course they would never admit that publicly but that’s what I love about Oracle: they spend a lot more time delivering than talking about it. They’re kind of like the quiet doer kid who thinks he’s Iron Man.
Hard coding Hadoop dependencies in because that’s for sure how I’ll store data when everything is on Non-volatile RAM
It’s not really about the language in the end: it’s about who’s writing it and the quality and integrity of what is written.
So I just want to say it clearly and definitively for all to hear: Twitter is dead wrong, Nathan Marz is right, and the coolest kids are alive and well and speaking Clojure just because they love it. Nothing personal.
To me this weekend wasn’t the Panthers vs. Broncos match-up for Super Bowl 50, or when we found out that Bernie Sanders won the New Hampshire primary. Although both of these were hoooge: it WAS when these parallel but significant facts emerged:
Google makes its historic first open source contribution to the Apache Foundation in the form of Apache Beam
When combined with Apache Spark’s severe tech resourcing issues caused by mandatory Scala dependencies, it seems that Apache Beam has all the bases covered to become the de facto streaming analytic API. The cool thing is that by using Apache Beam you can switch run time engines between Google Cloud, Apache Spark, and Apache Flink. A generic streaming API like Beam also opens up the market for others to provide better and faster run times as drop-in replacements. Google is the perfect stakeholder because they are playing the cloud angle and don’t seem to be interested in supporting on-site deployments. Hats off Google, and may the best Apache Beam run time win!
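The idea of switching run time engines under a single pipeline definition can be illustrated with a toy sketch in plain Python. This is not Apache Beam's API: `build_pipeline`, `SequentialRunner`, `ThreadedRunner`, and `run_with` are hypothetical names invented here, and the "engines" are just two in-process execution strategies. The only point is that the pipeline is described once and the runner is a drop-in choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical engine-neutral pipeline: an ordered list of element-wise steps.
Step = Callable[[int], int]


def build_pipeline() -> List[Step]:
    """Describe the computation once, with no reference to any engine."""
    return [lambda x: x + 1, lambda x: x * 10]


def apply_steps(steps: List[Step], item: int) -> int:
    """Run every step, in order, on a single element."""
    for step in steps:
        item = step(item)
    return item


class SequentialRunner:
    """Toy 'engine' that processes elements one at a time."""

    def run(self, steps: List[Step], data: List[int]) -> List[int]:
        return [apply_steps(steps, item) for item in data]


class ThreadedRunner:
    """Toy 'engine' that processes elements on a thread pool."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, steps: List[Step], data: List[int]) -> List[int]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # pool.map returns results in input order
            return list(pool.map(lambda item: apply_steps(steps, item), data))


def run_with(runner, data: List[int]) -> List[int]:
    """Same pipeline, whichever runner is plugged in."""
    return runner.run(build_pipeline(), data)
```

Calling `run_with(SequentialRunner(), data)` and `run_with(ThreadedRunner(), data)` yields the same results; only the execution strategy differs, which is the property that makes competing run times interchangeable.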
Apache Beam from Google finally provides robust unification of batch and real-time Big Data. This framework replaced MapReduce, FlumeJava, and MillWheel at Google. Major big data vendors already contributed Apache Beam execution engines for both Flink and Spark, before Beam even officially hit incubation. Anyone else seeing the future of Big Data in a new light? I know I am…
Scala works best with the IntelliJ IDEA IDE, which has licensing costs and is extremely unlikely to replace free Eclipse tooling at any large company
Scala is among a crowd of strong contenders and faces a moving target as Java has gained 5% in market share between 2015 and 2016. To put this in perspective, Scala has less market share than Lisp
Consistency and Integrity Issues
Trying to get Spark to meet rigorous standards of data consistency and integrity proves difficult. Apache Spark’s design originates from companies that consider data consistency and data integrity secondary concerns, while most industries consider these primary concerns. For example, achieving exactly-once semantics (that is, at-most-once and at-least-once guarantees together) from Spark requires numerous workarounds and hacks: http://blog.cloudera.com/blog/2015/03/exactly-once-spark-streaming-from-apache-kafka/
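To make the kind of workaround concrete: one common pattern for getting exactly-once results on top of at-least-once delivery is an idempotent sink, where each record is written under a unique key so that a replay overwrites instead of duplicating. The sketch below is plain Python, not Spark or Kafka code; `IdempotentSink`, `process_batch`, and `run_with_retry` are hypothetical names, and the crash is simulated.

```python
from typing import Dict, List, Optional, Tuple

# (partition, offset, payload): the first two fields uniquely identify a record.
Record = Tuple[int, int, str]


class IdempotentSink:
    """Stores rows keyed by (partition, offset), so replays overwrite."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[int, int], str] = {}
        self.write_calls = 0

    def write(self, record: Record) -> None:
        partition, offset, payload = record
        self.write_calls += 1
        self.rows[(partition, offset)] = payload


def process_batch(batch: List[Record], sink: IdempotentSink,
                  crash_after: Optional[int] = None) -> bool:
    """Write a batch; optionally simulate a crash after N records.

    Returns True only if the whole batch was written, meaning its
    offsets could safely be committed.
    """
    for count, record in enumerate(batch):
        if crash_after is not None and count == crash_after:
            return False  # simulated failure: offsets are NOT committed
        sink.write(record)
    return True


def run_with_retry(batch: List[Record], sink: IdempotentSink,
                   crash_after: int) -> None:
    """At-least-once delivery: an uncommitted batch is replayed in full."""
    if not process_batch(batch, sink, crash_after=crash_after):
        process_batch(batch, sink)
```

With a three-record batch and a simulated crash after two records, the sink receives five writes but ends up holding exactly three rows. The cost is that every sink has to be designed around a unique key, which is part of the extra work the paragraph above complains about.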
Dependency Hell with a Vengeance
Apache Spark (and Scala) import a huge number of transitive dependencies compared to alternative technologies. Programmers must master all of those dependencies in order to master Spark. No wonder very few true experts in Spark exist in the market today.
What’s the Alternative to Spark?
For real-time in-memory processing Use Case: data grids, once the purview of blue chip commercial vendors, now have very strong open source competition. Primary contenders include Apache Ignite and Hazelcast.
For fast SQL analytics (OLAP) Use Case: Apache Drill provides similar performance to Spark SQL with a much simpler, more efficient, and more robust footprint. Apache Kylin from eBay looks to become a major OLAP player very quickly, although I have not used it myself.
For stream processing Use Case: Apache Beam from Google looks likely to become the de facto streaming workhorse, unseating both Apache Flink and Spark Streaming. Major big data vendors have already contributed Apache Beam execution engines for both Flink and Spark, before Beam even officially hit incubation.
If you try these alternative technologies, and compare to Spark, I’m sure you’ll agree that Spark isn’t worth the headache.