RESTful High Availability and Low Latency on Amazon EC2

For this post, I have condensed some tricks, tips, and techniques for improving availability and latency for RESTful apps on Amazon Elastic Compute Cloud.

Geo-aware Load Balancing

Since Amazon Elastic Load Balancing does not automatically route users to the best region, layering a geo-aware DNS load balancing service on top of the Amazon ELB tier reduces latency and can also increase availability.  These services automatically route each user request to the closest region, reducing network latency.  Also, if an entire region goes down, user requests are routed to another region.  Examples include Dyn and Akamai's DNS services.

Leveraging HTTP Caching

Using RESTful services makes all the rich caching features of HTTP available to your services.  Browser (user agent) cache takes load directly off your servers at no cost to you, so conduct a thorough review of your HTTP interactions to determine which responses could be cached.  This can dramatically improve both latency and availability by reducing round trips to the server.  Surrogate cache (such as Amazon CloudFront) operates over a content delivery network (CDN), which replicates content close to the user to reduce latency and increase availability.  You can put CloudFront in front of any HTTP server to take load off the web server, app server, and database tiers and improve response time for users.  Secure (HTTPS) content cannot be cached by a surrogate, so it is important to segregate truly secure interactions from shared resources such as images or other rich content which could be shared among users.  Also, caching normally requires managing the service URLs to ensure cacheability (i.e. don't embed JSESSIONID in the URL) and customizing the HTTP Cache-Control headers on app server responses that are cacheable.
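As a minimal sketch (the class and method names are my own, not from any framework), a small helper can centralize the Cache-Control values you settle on during that review; in a servlet you would apply the result with response.setHeader("Cache-Control", ...):

```java
// Sketch: centralize Cache-Control header values for cacheable responses.
public class CacheControl {

    // Shared resources (images, rich content) can be cached by surrogates
    // like CloudFront; per-user responses should stay private to the browser.
    public static String headerFor(boolean shared, long maxAgeSeconds) {
        return (shared ? "public" : "private") + ", max-age=" + maxAgeSeconds;
    }

    // Truly secure interactions should never be stored by any cache.
    public static String noStore() {
        return "no-store, no-cache";
    }
}
```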

Throttle Back and Fail-fast

When interacting with a load balancer, use the HTTP 503 Service Unavailable response code to immediately cause the load balancer to re-submit the request to another server instance. This allows throttling back the load proactively before a downstream server or other component fails, and allows the application to control how much load it is willing to take on at any given moment.  This is also called "fail-fast", since we can throttle back the request immediately, without waiting for a server or connection timeout.  In order to take advantage of throttling we need to ensure that we process requests asynchronously and monitor the queue depth or request backlog.  Then we need a tunable configuration point where we can control when the application throttles back.  This technique improves availability and latency in the following areas:

  1. Under heavy load and spikes, throttling allows the architecture to respond proactively to potential overload rather than waiting around for something bad to happen.
  2. Throttling back tells the load balancer exactly when to send a request to another node, instead of having it perform guess-work based on CPU, network traffic, or request latency.  This reduces availability problems which occur when the load balancer guesses wrong or has the wrong configuration.
  3. The front end web application can control throttle back based on implicit knowledge of the end-to-end architecture which the load balancer cannot possibly have.
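The tunable throttle point described above can be sketched with a simple in-memory backlog counter (the class name and limit are illustrative, not a standard API).  When tryAcquire returns false, the request handler responds with HTTP 503 so the load balancer fails fast to another instance:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a tunable throttle point. When the backlog exceeds the configured
// limit, the caller responds 503 Service Unavailable immediately so the
// load balancer re-submits the request to another instance (fail-fast).
public class ThrottleGate {
    private final AtomicInteger backlog = new AtomicInteger();
    private final int maxBacklog; // the tunable configuration point

    public ThrottleGate(int maxBacklog) { this.maxBacklog = maxBacklog; }

    // Returns false when the app is unwilling to take on more load right now.
    public boolean tryAcquire() {
        if (backlog.incrementAndGet() > maxBacklog) {
            backlog.decrementAndGet();
            return false; // caller sends HTTP 503
        }
        return true;
    }

    // Call once the queued request has been handed off or completed.
    public void release() { backlog.decrementAndGet(); }
}
```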

In order to implement throttling correctly we need to ensure all requests are either idempotent or queued.  An idempotent request is implicitly cacheable and could be served from another layer such as a CDN.  Anything which is not idempotent must be queued.  Non-idempotent requests implicitly take more time and resources to execute and may involve stateful interactions with high overhead.  Using a reliable and scalable managed queuing mechanism such as Amazon SQS ensures that the queue itself is not a point of failure.

Turn Off Sessioning

Most RESTful services will not require tracking or persisting HTTP session information.  In fact, for scalability purposes REST explicitly defines a “Stateless” constraint to prevent cross-request state from interfering with web server scalability.  To disable HTTP sessions for a Tomcat webapp, add a context.xml under your webapp/META-INF (not WEB-INF) with the following contents.  This will work even with Elastic Beanstalk applications, since Beanstalk uses Tomcat:

<?xml version='1.0' encoding='utf-8'?>
<Context cookies="false" cacheTTL="0">

  <!-- Disable session persistence across restarts -->
  <Manager pathname="" />

</Context>


Redirect After Post

Many applications I’ve seen still serve a response body with a POST, PUT, or DELETE request.  This violates REST principles, and removes a needed separation of concerns.  It can affect latency and availability, since responses to POST, PUT, and DELETE are not cacheable.  It can also allow the user to refresh the page and unknowingly re-submit the transaction.  To avoid these problems, utilize the Redirect After Post pattern. As mentioned under the throttling section above, the POST, PUT, or DELETE can issue a message to the queue. This returns immediately, and should redirect to a status page, or redirect to the next page in the user interaction flow. The status page can automatically refresh periodically until the request is done, or we can just move on with the flow. It is counterproductive to latency and availability to keep an HTTP connection open while a (potentially long running) state change completes, and there is no reason to keep the user waiting either.
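A sketch of the pattern, assuming the non-idempotent work is handed to a queue (the names and status URL scheme here are illustrative): the POST handler enqueues the change and immediately returns a redirect target for a 303 See Other, rather than rendering a body:

```java
import java.util.Queue;
import java.util.UUID;

// Sketch: Redirect After Post. The handler enqueues the state change and
// returns a redirect target instead of a response body, so the connection
// is not held open while the (potentially long running) change completes.
public class RedirectAfterPost {

    // Enqueue the request body and return the Location for a 303 See Other.
    // In a servlet: response.setStatus(303); response.setHeader("Location", loc);
    public static String handlePost(Queue<String> queue, String body) {
        String id = UUID.randomUUID().toString();
        queue.add(id + ":" + body); // in production, an SQS enqueue
        return "/status/" + id;     // status page polls until the change is done
    }
}
```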

Utilize Application Tier Cache

The ephemeral nature of cloud instances means that anything in memory or on local disk could disappear at any moment.  However, this does not mean we should avoid in-memory caching in the application tier.  If an instance goes down, ELB stops routing requests to it and sends them to the remaining instances.  Therefore, you can use application tier caches just as you would in any JEE deployment.  Remember, any single server is by nature ephemeral.  The cloud does nothing to change this.  Someone could spill coffee on the server at any time.  The only thing that changes with the cloud is that the person spilling the coffee works for Amazon, so you have the potential to get your money back!
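For example, a minimal in-memory cache sketch using the JDK's ConcurrentHashMap (no eviction shown; a production cache would add TTLs or size limits, or use a library):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: an application-tier in-memory cache. If the instance disappears,
// the cache simply warms up again on a replacement instance.
public class AppCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a database or service lookup

    public AppCache(Function<K, V> loader) { this.loader = loader; }

    // Loads on first access, then serves from memory.
    public V get(K key) {
        return map.computeIfAbsent(key, loader);
    }
}
```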


Shard Your Data

If possible, shard your data in a way that evenly distributes load.  Not all businesses can shard their data; however, if you provide a user-centric service which doesn’t often mix data between users (like Facebook), then sharding may be your availability and latency coup de grâce.  Set up a series of sub-domains, RDS databases, etc.  Then, in the user profile, store the assigned resource name.  If you assign the resources intelligently you can service any number of users with a consistent level of latency and availability, as long as users don’t cross geo regions.  If users cross regions then there will be some overhead for that user due to cross-region communication.  However, if this becomes a problem later, you can create a routine to detect when a user changes regions and migrate their data to the new region.
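A sketch of the assignment step, assuming shards are identified by sub-domain or database name (the hashing scheme is illustrative; intelligent assignment could also weigh current shard load, after which the chosen name is persisted in the user profile):

```java
import java.util.List;

// Sketch: assign each user to a shard once, then store the shard name in
// the user profile so every later request goes straight to the right resource.
public class ShardAssigner {

    // Deterministic assignment: the same user always maps to the same shard.
    public static String assignShard(String userId, List<String> shards) {
        return shards.get(Math.floorMod(userId.hashCode(), shards.size()));
    }
}
```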

Using RDS without Sharding?  Try Read Replicas

If you do not need absolute read consistency and need to scale out reads, Amazon RDS allows you to create multiple read replicas (replication is asynchronous, so replicas may serve slightly stale data). Read away!

PS: make sure to create a Multi-AZ deployment; the snapshot used to create the replica is then taken from the standby database, avoiding load on the transactional instance.

Batch SQS Messages

For many applications, the latency associated with Simple Queue Service is unacceptable.  In this case, try batching multiple incoming requests into one SQS message.  SQS handles larger messages efficiently, so batching should reduce the latency of your service endpoint if you can find the “sweet spot” for SQS message size.  I recently benchmarked SQS on an m1.large instance and found that queue throughput with 8KB messages was 393 messages per second, and that single-threaded benchmark scaled roughly linearly across multiple threads.  So by batching eight 1KB requests into each 8KB SQS message, a single thread performing the SQS enqueues can service over 3,000 requests per second.
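A sketch of the batching step (the 8KB target and newline delimiter are assumptions for illustration; each resulting batch would then go out as a single SQS SendMessage call):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: pack many small requests into SQS messages near a target size,
// trading a little batching delay for far fewer SendMessage round trips.
public class SqsBatcher {

    // Greedily fills each message up to maxBytes, joining requests with '\n'.
    public static List<String> batch(List<String> requests, int maxBytes) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String req : requests) {
            int extra = (current.length() == 0 ? 0 : 1) + req.length();
            if (current.length() > 0 && current.length() + extra > maxBytes) {
                messages.add(current.toString()); // flush the full message
                current.setLength(0);
            }
            if (current.length() > 0) current.append('\n');
            current.append(req);
        }
        if (current.length() > 0) messages.add(current.toString());
        return messages;
    }
}
```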

Convert Piecemeal Changes to Bulk Changes

If you followed the advice above about queuing changes in SQS, then you will find a queue full of changes to process instead of a slew of piecemeal changes.  If these changes need to be applied to an RDS MySQL instance, considerable performance gains can be had by applying them with batch JDBC. Overall, database latency and availability improve with batch operations because batches incur fewer network round trips and transaction commits than piecemeal changes.
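A sketch of draining queued changes into a single JDBC batch (the table, column names, and the id/value shape of a change are made up for illustration):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.Map;

// Sketch: apply a queue-full of changes as one JDBC batch in one transaction,
// instead of one network round trip and one commit per piecemeal change.
public class BulkApplier {

    public static void applyBatch(Connection conn, List<Map.Entry<Long, String>> changes)
            throws SQLException {
        conn.setAutoCommit(false); // one commit for the whole batch
        try (PreparedStatement ps =
                 conn.prepareStatement("UPDATE user_profile SET status = ? WHERE id = ?")) {
            for (Map.Entry<Long, String> change : changes) {
                ps.setString(1, change.getValue());
                ps.setLong(2, change.getKey());
                ps.addBatch();     // buffered locally, no network hit yet
            }
            ps.executeBatch();     // single round trip to the database
            conn.commit();         // single transaction commit
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```

For MySQL Connector/J, also consider setting the rewriteBatchedStatements=true connection property so the driver coalesces the batch into multi-row statements on the wire.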

Closing Comment

Hopefully the techniques above will help you build faster, more available RESTful applications on the Amazon EC2 platform.  If you have any questions, comments, or feedback, please feel free to comment on the post!