Google recently released a detailed comparison of the programming models of Apache Beam vs. Apache Spark. FYI: Apache Beam used to be called Cloud DataFlow before it was open sourced by Google:
Here’s a link to the academic paper by Google describing the theory underpinning the Apache Beam execution model:
When combined with Apache Spark’s severe tech resourcing issues caused by mandatory Scala dependencies, it seems that Apache Beam has all the bases covered to become the de facto streaming analytic API. The cool thing is that by using Apache Beam you can switch run time engines between Google Cloud, Apache Spark, and Apache Flink. A generic streaming API like Beam also opens up the market for others to provide better and faster run times as drop-in replacements. Google is the perfect stakeholder because they are playing the cloud angle and don’t seem to be interested in supporting on-site deployments. Hats off Google, and may the best Apache Beam run time win!