
The Elephant and the Rhino

If we use JavaScript for code-on-demand, couldn’t we also use it for inverse code-on-demand with Hadoop? People who know JavaScript are much easier to find than people who know Groovy or Python, so using the most popular scripting language for our embedded business rules means less of a learning curve for everyone involved. Plus, if we build them right, we could use the same business rule across our entire ultra-high-scale web presence: from the browser all the way to Elastic MapReduce.

Mozilla Rhino doesn’t have a lot of marketing behind it, but it’s been around. Feature-rich, mature, and embeddable, Mozilla’s JavaScript engine looks like a winner: we could inject input splits and emit results with ease.

Plus, if you thought the Bull in the china shop was fun, wait until you see the Elephant and the Rhino!

Make the Elephant and the Rhino your partners, and utterly destroy a china shop near you! China shops don’t scale anyway…

Tips for Implementing Rhino in Hadoop

Make sure to use the Java Scripting API’s compilation option in the setup() and cleanup() methods of your mapper or reducer. Please see this article on how to compile scripts. Compiling a script once, instead of evaluating its source for every record, dramatically reduces the CPU requirements and execution time for scripts.
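Here is a minimal sketch of the compile-once pattern, written in plain Java so it runs without a Hadoop cluster. The class `CompiledRuleRunner` and its methods are hypothetical: `setup()` stands in for `Mapper.setup(Context)` and `apply()` for the per-record `map()` call. It assumes a JSR-223 JavaScript engine registered under the name `JavaScript` is on the classpath (JDK 6 and 7 bundled a Rhino-based engine and JDK 8 through 14 bundled Nashorn; newer JDKs need one added separately). `Compilable`, `CompiledScript`, and `Bindings` are the standard `javax.script` types.

```java
import java.util.Map;
import javax.script.Bindings;
import javax.script.Compilable;
import javax.script.CompiledScript;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper mirroring a Hadoop Mapper's lifecycle:
// compile the business rule once in setup(), evaluate it per record in apply().
final class CompiledRuleRunner {
    private final ScriptEngine engine;
    private CompiledScript compiledRule;

    CompiledRuleRunner(String engineName) {
        engine = new ScriptEngineManager().getEngineByName(engineName);
        // getEngineByName returns null when no engine matches; null fails instanceof.
        if (!(engine instanceof Compilable)) {
            throw new IllegalStateException("No compilable JSR-223 engine named " + engineName);
        }
    }

    // True if an engine with this name exists and supports compilation.
    static boolean engineAvailable(String engineName) {
        return new ScriptEngineManager().getEngineByName(engineName) instanceof Compilable;
    }

    // Call once per task, as you would from Mapper.setup().
    void setup(String ruleSource) {
        try {
            compiledRule = ((Compilable) engine).compile(ruleSource);
        } catch (ScriptException e) {
            throw new IllegalArgumentException("Rule failed to compile", e);
        }
    }

    // Call once per record, as you would from Mapper.map(): the record's
    // fields become script variables and the rule's result is returned.
    Object apply(Map<String, ?> fields) {
        Bindings bindings = engine.createBindings();
        bindings.putAll(fields);
        try {
            return compiledRule.eval(bindings);
        } catch (ScriptException e) {
            throw new IllegalStateException("Rule failed on record " + fields, e);
        }
    }
}
```

Script exceptions are wrapped in unchecked exceptions because Hadoop’s `setup()` and `map()` only declare `IOException` and `InterruptedException`. With a rule such as `amount > 100`, each call to `apply()` reuses the compiled form rather than parsing the source again.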

Inverse REST

The principles of REST allow HTTP to scale to Internet user volumes, and code-on-demand is one of the building blocks of REST. Code-on-demand enables rich content in the browser via JavaScript or browser plug-ins, and the technique has matured to the point where even the most sophisticated applications need minimal server interaction to run.

In short, code-on-demand enables scaling across users by moving the code (logic) to the consumer, instead of requiring the consumer to make a remote request to the code.

It follows logically that any successful approach to scaling across big data requires inverting REST and executing the code as close to the data as possible, rather than trying to move big data to the code.

Likewise, the web architecture scales across users by utilizing caching to provide data locality for shared data.  Scaling across big data also requires data locality for reference data. We must move our small data close to our big data to scale efficiently.
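A minimal sketch of moving small data to big data is a map-side (replicated) join, shown here in plain Java. `ReplicatedJoin` and the `key,value` record format are hypothetical. In a real Hadoop job the reference table would typically be shipped to each node through the distributed cache and loaded in the mapper’s `setup()`; here it is passed in directly so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-side ("replicated") join: the small reference table is
// held in memory by every task, so each big-data record is enriched with a
// local lookup instead of a remote call to a service or database.
final class ReplicatedJoin {
    private final Map<String, String> reference = new HashMap<String, String>();

    // Load the small data once per task, as a Mapper's setup() would after
    // reading the reference file shipped to the node.
    void setup(Map<String, String> smallTable) {
        reference.clear();
        reference.putAll(smallTable);
    }

    // Enrich one "key,value" record; keys with no match get a placeholder label.
    String map(String record) {
        String key = record.split(",", 2)[0];
        String label = reference.get(key);
        return record + "," + (label != null ? label : "UNKNOWN");
    }
}
```

For example, with a reference table mapping `US` to `United States`, the record `US,42` becomes `US,42,United States`. This approach only works while the reference data fits in each task’s memory; when both datasets are large, a reduce-side join is the usual alternative.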

The major flaw designed into many current big data applications is the failure to apply the inverse of REST. SOA and integration vendors sell customers on the idea that big data problems can be solved by moving all data through a layer of middleware, whether a J2EE application server, a service bus, or an integration tool. I have literally spent years of my career trying to tune and optimize middleware solutions for big data. That’s why I can say definitively that the middleware concept does very well at selling consulting hours, software licenses, and hardware. What it does not do is scale to big data.

You could code all the big data logic in stored procedures, assuming you are willing to embed business logic in a closed system and that a database will scale to your data volumes. Database vendors are only beginning to adopt Inverse REST: evaluating filters, transformations, and lookups in the storage pipeline is a new (or nonexistent) feature in most DBMSs. Yet another opportunity for vendor lock-in.

Hadoop MapReduce is an open-system implementation of Inverse REST.

Regardless of who wins the battle between RDBMS and MapReduce, one thing is certain: anyone not leveraging the principles of Inverse REST will be left in the dust.

Google, Yahoo, Facebook, StumbleUpon, and others have already hit the wall, and it’s only a matter of time before we all do.
