Building Web Apps That Leverage Content Delivery Networks
Five different integration options

As the Web becomes an intrinsic part of the economy and our everyday lives, the success and survival of many businesses increasingly depend on the availability and accessibility of their core Web applications. Although a high degree of scalability and reliability can be achieved through the right combination of local and global redundancy, load balancing and sound application design, many companies turn to Content Delivery Networks or CDNs such as Akamai or Speedera. This article recounts experiences and lessons learned from developing an information portal that serves millions of users and leverages Akamai's CDN.

How a CDN Works

Typically, CDN providers augment the traditional Web infrastructure shown in Figure 1-a by introducing thousands of edge servers, usually located at ISPs, carriers, backbones, and other Web hubs around the world. These servers intercept the HTTP traffic directed to the sites of the network's customers and attempt to serve each request from the closest possible location, as shown in Figure 1-b. If the requested information can't be found in the cache, an edge server requests it from the origin site, passes it to the client, and then caches it to serve future requests.

It's important to know that page URLs are used as caching keys. If two pages have identical URLs, they are considered the same page even if their content differs. Conversely, two identical pages with different URLs are considered distinct and are cached separately.
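
The miss-then-cache behavior and the role of the URL as the key can be illustrated with a toy model. This is only a sketch: the `EdgeServer` class, the `origin` function, and the example.com URLs are made up for illustration and don't reflect how any real CDN is implemented.

```python
from typing import Callable, Dict


class EdgeServer:
    """Toy edge server: caches origin responses, keyed by the full URL string."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, str] = {}
        self.origin_requests = 0

    def get(self, url: str) -> str:
        if url not in self._cache:
            # Cache miss: ask the origin, then keep the response for later requests.
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def origin(url: str) -> str:
    # Hypothetical origin that renders the same page whatever the query string is.
    return "<html>news</html>"


edge = EdgeServer(origin)
edge.get("http://example.com/news")
edge.get("http://example.com/news")           # same key: served from the cache
edge.get("http://example.com/news?ref=home")  # same content, but a different key
print(edge.origin_requests)  # 2
```

The third request returns exactly the same HTML as the first two, yet it still costs a trip to the origin and a second cache entry, because the edge compares URLs rather than content.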

This basic service model is often complemented by premium services. In the case of Akamai, these include server mapping and cache hierarchies. Its edge platform consists of more than 14,000 servers. Under high load, the origin site could be swamped by requests from edge servers in different locations. Creating a dedicated cache hierarchy by establishing parent-child relationships among the edge servers can mitigate this situation. In such hierarchies, children request content from their parents rather than from the origin site. This provides caching on multiple levels and can greatly reduce the number of edge servers that access the site itself. Server mapping involves dedicating specific servers in each data center to a particular origin site. This reduces the overall number of edge servers that access the site and improves the cache hit rate of each server.
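
The effect of a parent tier on origin traffic can be sketched with a small simulation. The `CacheNode` class and the four-edge topology are assumptions chosen for illustration. Real hierarchies are configured by the CDN provider and are considerably more elaborate.

```python
from typing import Dict, List, Optional


class CacheNode:
    """Toy cache server; a node without a parent fetches from the origin."""

    def __init__(self, origin_hits: List[str],
                 parent: Optional["CacheNode"] = None) -> None:
        self.origin_hits = origin_hits  # shared log of URLs requested from the origin
        self.parent = parent
        self.cache: Dict[str, str] = {}

    def get(self, url: str) -> str:
        if url not in self.cache:
            if self.parent is not None:
                # Child node: ask the parent tier instead of the origin.
                self.cache[url] = self.parent.get(url)
            else:
                # Root node: this is the only place the origin is contacted.
                self.origin_hits.append(url)
                self.cache[url] = f"content of {url}"
        return self.cache[url]


def origin_requests(use_parent: bool, edge_count: int = 4) -> int:
    """Count origin requests when every edge server misses on the same page."""
    hits: List[str] = []
    parent = CacheNode(hits) if use_parent else None
    edges = [CacheNode(hits, parent) for _ in range(edge_count)]
    for edge in edges:
        edge.get("/index.html")
    return len(hits)


print(origin_requests(use_parent=False))  # 4
print(origin_requests(use_parent=True))   # 1
```

With no parent, each of the four edge servers goes to the origin once. With a shared parent, only the first miss reaches the origin and the remaining edges are filled from the parent's cache.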

Today, many well-known sites are delivered from the edge by a CDN. An easy way to check is to ping the same Web address through two separate providers, e.g., from home and from the office. If it resolves to two different hosts outside the site's domain, they're likely to be edge servers. Another clue is when Netcraft reports an impossible platform combination such as this: "www.cdc.gov was running Microsoft-IIS on Linux when last queried at 24-Sep-2004 10:01:22 GMT."

CDN Integration Options

My first impression after reading white papers published by CDN providers and talking to their sales staff was that integrating a CDN into the solution architecture was easy and transparent. It looked like all that had to be done to make an application globally available was to sign the contract and provide the server and URL information. However, reality is rarely that simple, and CDNs, like most other optimizations, can be ineffective or even detrimental if applied incorrectly.

Further analysis identified five possible levels of integration between a Web application and a CDN: Asset Caching, Page Caching, Personalized Page Caching, Edge Side Includes, and Edge Computing. They place different requirements on the Web application and have a major impact on end-user performance and on the load on the application.

Asset Caching
Cacheable assets include file-based elements embedded in the pages and files that are downloadable from a site. They can be images, scripts, style sheets, applets, and static documents. This mechanism also includes serving streaming media from the network's edge. It's completely transparent to applications and is the easiest integration method to use. Most CDNs are flexible enough to control the caching of objects by extension, file name, path, or domain, and let cached sites force the premature expiration of certain assets. The main drawback is that, with the exception of streaming media, asset caching has little impact on either server load or the user's perception of performance. In this mode, each page is still served from the origin site, and most of the load reduction occurs on the Web server, which is normally responsible for serving static assets. The application server still has to build every page, and users have to wait for the entire roundtrip to complete before the page starts rendering in the browser.
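
To make the idea of controlling caching by extension or path concrete, here is a minimal sketch of a rule table. The patterns, the TTL values, and the `asset_ttl` helper are all hypothetical. Real CDNs expose this kind of control through their own configuration formats, not through application code.

```python
from fnmatch import fnmatchcase
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical rules: (path pattern, time to live in seconds); first match wins.
RULES = [
    ("/downloads/*", 86400),  # downloadable documents: one day
    ("*.css", 3600),          # style sheets: one hour
    ("*.js", 3600),           # scripts: one hour
    ("*.gif", 604800),        # images: one week
    ("*.jpg", 604800),
]


def asset_ttl(url: str) -> Optional[int]:
    """Return the TTL for a cacheable asset, or None if the URL is not an asset."""
    path = urlsplit(url).path
    for pattern, ttl in RULES:
        if fnmatchcase(path, pattern):
            return ttl
    return None  # e.g. a dynamic page, which still goes to the origin


print(asset_ttl("http://example.com/img/logo.gif"))         # 604800
print(asset_ttl("http://example.com/downloads/guide.pdf"))  # 86400
print(asset_ttl("http://example.com/news/story.jsp"))       # None
```

The last call shows the limitation described above: the page itself matches no asset rule, so every page view still reaches the origin.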

Page Caching
The next level involves caching pages in their entirety. CDNs usually allow the flexibility to control various aspects of caching and expiration down to the individual page level. Since edge servers cache pages by their URLs, for pages to be statically cacheable they must render identical HTML when requested by different users or at different times within their Time To Live window. It's also important to limit the number of different URLs that represent each page. Having multiple URLs for the same page can result in decreased efficiency and increased cost, since CDN providers usually charge by bandwidth. When applicable, Page Caching is the most effective mechanism from all points of view. It offers the most significant load reduction on the entire Web application, dramatically reduces roundtrip times for cached pages, and ensures availability by serving the cached pages if the origin site goes down. The only drawback is that, for consistent results, this method requires the application to maintain strict URL and content discipline.
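
One way to keep the number of URLs per page down is to generate every link through a single canonicalizing helper (or to redirect non-canonical requests to the canonical form). The sketch below assumes that the parameters listed in `IGNORED_PARAMS` have no effect on the rendered HTML. Both that list and the `canonical_url` helper are illustrative assumptions, not part of any CDN's API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change what the page renders.
IGNORED_PARAMS = {"ref", "sessionid"}


def canonical_url(url: str) -> str:
    """Map equivalent URLs onto a single spelling usable as one cache key."""
    parts = urlsplit(url)
    # Drop irrelevant parameters and sort the rest so their order doesn't matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # Lower-case the scheme and host, and discard any fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


print(canonical_url("http://Example.com/story?id=7&lang=en&ref=home"))
print(canonical_url("http://example.com/story?lang=en&id=7"))
# both print: http://example.com/story?id=7&lang=en
```

If every link on the site is emitted this way, the two spellings above occupy one cache entry at the edge instead of two.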

Personalized Page Caching
Page Caching can be extended to slightly personalized pages (e.g., ones containing a user's name or a link to a 'My...' page) by using the <jesi:personalize> tag. This mechanism lets some user-specific values be calculated and inserted into a page at the edge. The information for this substitution comes from session cookies and other request parameters such as the browser locale. It doesn't increase the load on the origin servers compared to Page Caching, and it imposes the same limitations on the application.
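
Conceptually, the edge keeps one shared copy of the page and fills in a few values from the request. The sketch below illustrates only that idea. The `$user_name` placeholder syntax, the cookie names, and the `personalize` function are assumptions made for the example and are not actual JESI markup.

```python
from html import escape
from http.cookies import SimpleCookie
from string import Template

# One cached copy of the page, shared by all users. The placeholders are a
# made-up notation for this sketch, not real <jesi:personalize> syntax.
CACHED_PAGE = Template(
    "<p>Welcome back, $user_name!</p><a href='/my/$user_id'>My page</a>"
)


def personalize(cookie_header: str) -> str:
    """Fill the cached page with values taken from the request's cookies."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    values = {
        # Escape cookie values because they are inserted into HTML.
        "user_name": escape(cookies["user_name"].value) if "user_name" in cookies else "guest",
        "user_id": escape(cookies["user_id"].value) if "user_id" in cookies else "",
    }
    return CACHED_PAGE.substitute(values)


print(personalize("user_name=Alex; user_id=42"))
# <p>Welcome back, Alex!</p><a href='/my/42'>My page</a>
```

Nothing in this step involves the origin server, which is why the load stays the same as with plain Page Caching.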

Edge Side Includes
Full Edge Side Includes (ESI) integration assembles volatile and highly personalized pages at the edge by caching static page fragments while delegating the generation of mutable content to the origin server. This mechanism provides the ultimate flexibility, but it requires extensive modifications to the Web site implementation, including the ability to render individual page fragments along with complete pages. These modifications don't map well to most Web architectures, particularly MVC frameworks such as Struts. And, despite the claim that ESI was specifically designed with portals in mind, I have yet to find a successful implementation with any commercial portal platform, such as WebLogic or Vignette. To get the desired flexibility, ESI shifts a significant part of the load for generating every page back to the origin server. If a page contains multiple includes, each one has to be requested separately, so overall page load times can actually increase, especially over a high-latency connection.
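
The cost of multiple includes is easy to see in a simplified model of edge assembly. The sketch handles only self-closing `<esi:include src="..."/>` tags and ignores the rest of the ESI language. The fragment paths, the `assemble` function, and the stand-in origin are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Matches only the simplest form of the tag: <esi:include src="..."/>
INCLUDE_TAG = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(template: str, cache: Dict[str, str],
             fetch: Callable[[str], str]) -> str:
    """Replace each include with a cached fragment or a fresh origin response."""

    def resolve(match: "re.Match[str]") -> str:
        src = match.group(1)
        if src in cache:
            return cache[src]  # static fragment served from the edge cache
        return fetch(src)      # volatile fragment: one origin request each

    return INCLUDE_TAG.sub(resolve, template)


origin_calls: List[str] = []


def fetch_from_origin(src: str) -> str:
    # Hypothetical origin that records which fragments were requested.
    origin_calls.append(src)
    return f"<div>{src}</div>"


template = ('<html><esi:include src="/header"/>'
            '<esi:include src="/quotes"/><esi:include src="/account"/></html>')
page = assemble(template, {"/header": "<h1>Portal</h1>"}, fetch_from_origin)

print(page)          # <html><h1>Portal</h1><div>/quotes</div><div>/account</div></html>
print(origin_calls)  # ['/quotes', '/account']
```

A single page view here produces two separate origin requests, one per uncached fragment, where Page Caching would have produced none.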

Edge Computing
The most radical step in bringing dynamic content closer to the user is to generate it right at the edge. Akamai offers an Edge Computing solution based on WebSphere that deploys J2EE components such as JSPs and servlets to the edge of the network. Usually only the presentation components are pushed to the edge; however, the offering includes Cloudscape, a lightweight database that can be used to store relatively static application data such as product catalogs. If used appropriately, this option is unbeatable for versatility, performance, and load management. But building and managing massively parallel applications is a very complex task full of potential pitfalls and unmarked dangers. This option might also be unsuitable for non-technical reasons such as cost, network latency, and application security.

Building Page-Cacheable Applications

Given these considerations, it becomes clear that, when applicable, Page Caching offers the optimal balance of improving performance, reducing the load on the target site, and preserving a Web application's original architecture. So from here on, we'll focus on building applications that work consistently and reliably with the Page Caching mechanism of CDNs.

An Application's View of a CDN

When working through a CDN, applications still get HTTP requests from the Internet, but user demographics and behavior change dramatically. There's no longer a big and diverse population of "normal" users who surf pages sequentially and take time to read the content. Instead there's a small pack of "crazy" users, who look like they've bookmarked every page in the system and are jumping between them without any apparent logic or even time to digest the content. These users are cache servers, and their odd behavior comes from the collective cache misses of the actual users accessing the origin site through each server. Besides the virtual Attention Deficit Disorder, these new users also have multiple personalities - page requests that get through from a given edge server contain different cookies, browsers, and locales.
About Alex Maclinovsky
Alex works at Sun Microsystems as the Engineering Manager for Sun SOA Governance Solution. For nearly two decades he has architected and built distributed systems on enterprise, national, and global scales. Alex specializes in SOA Infrastructure, Security, and Composite Applications. He blogs at http://blogs.sun.com/RealSOA/ and can be contacted at maclinovsky@yahoo.com