
How You Get There Matters: "Middle-Mile" RUM to Drive CDN Strategy


Traditional RUM products use APIs built into browsers (Navigation Timing, Resource Timing, etc.) to measure the performance characteristics of a web page as it loads. That is, they do their best to quantify the work the browser does to render a web page.

APM products use instrumentation and other techniques to provide visibility into performance bottlenecks experienced in the back end of an application. While both of these types of tools are tremendously valuable and necessary if you care about the performance and availability of your applications, even when used together, they paint an incomplete picture of the pipeline used to deliver a digital experience to users’ eyeballs.

With the Spring 2018 release of mPulse, we are breaking open the black box of the middle mile — and exposing performance timers and metrics about our CDN in a fully transparent way. With the new data that we are collecting and exposing in our world-class visualizations, we are making it dead-easy to monitor, optimize, and validate your caching, compression, and acceleration strategies.

New Timers

Origin Latency

Origin Latency is defined as the duration (in milliseconds) of a round trip between the edge server closest to the origin and the origin. You can use Origin Latency as a proxy measurement for the distance between your origin server and the nearest CDN PoP.

Client Round Trip

Client Round Trip is defined as the duration (in milliseconds) of a round trip between the client (browser) and the edge server closest to the client. Just like Origin Latency, Client Round Trip is a CDN measurement. It can help you quantify how well distributed your CDN PoPs are in relation to your actual users.

New Dimensions

mPulse now lets you slice and dice your data by the following edge-injected dimensions.

Akamai Adaptive Acceleration

Was Adaptive Acceleration (A2) applied? What are the performance gains from the transformations A2 applies? And how are those gains affecting Session Duration and Conversion Rate? With mPulse 60, we can easily get at that data.

Possible values: enabled, disabled

Akamai Front-End Optimization

Was FEO applied? If applied, how much acceleration am I seeing? By deferring non-critical scripts, how much of a boost am I seeing for DOM Ready or Page Load?

Possible values: enabled, disabled
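Questions like these for A2 and FEO come down to splitting beacons on the dimension and comparing a timer across the resulting groups. The following is a minimal sketch, assuming beacons have already been exported as plain objects; the field names (`a2`, `feo`, `pageLoadTime`) and the `medianByDimension` helper are hypothetical, not the mPulse beacon schema or API.

```javascript
// Hypothetical beacon shape, for illustration only:
//   { a2: "enabled" | "disabled", feo: "enabled" | "disabled", pageLoadTime: ms }
// These field names are assumptions, not the mPulse beacon schema.

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group beacons by the value of one dimension and return the median of a
// timer for each group, keyed by dimension value.
function medianByDimension(beacons, dimension, timer) {
  const groups = new Map();
  for (const beacon of beacons) {
    const key = beacon[dimension];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(beacon[timer]);
  }
  const result = {};
  for (const [key, values] of groups) result[key] = median(values);
  return result;
}

const beacons = [
  { a2: "enabled", pageLoadTime: 1800 },
  { a2: "enabled", pageLoadTime: 2200 },
  { a2: "disabled", pageLoadTime: 2600 },
  { a2: "disabled", pageLoadTime: 3000 },
  { a2: "disabled", pageLoadTime: 2800 },
];
console.log(medianByDimension(beacons, "a2", "pageLoadTime"));
// → { enabled: 2000, disabled: 2800 }
```

The sketch compares medians rather than means because page-load timers typically have long tails that would otherwise dominate the comparison.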

HTTP Protocol

This identifies the protocol negotiated with the client when NPN or ALPN is used.

Possible values: h2, h2-13, h2-14, spdy/2, spdy/3, spdy/3.1, http/1.0, http/1.1
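The draft identifiers (h2-13, h2-14) and the SPDY variants are often easier to read when rolled up into protocol families. A minimal sketch; the `protocolFamily` helper and its family labels are our own illustrative choices, not values mPulse reports.

```javascript
// Map the raw protocol values listed above onto coarse families so that
// HTTP/2 drafts and SPDY versions roll up together on a chart.
function protocolFamily(protocol) {
  const p = String(protocol).toLowerCase();
  if (p === "h2" || p.startsWith("h2-")) return "HTTP/2";
  if (p.startsWith("spdy/")) return "SPDY";
  if (p === "http/1.0" || p === "http/1.1") return "HTTP/1.x";
  return "other"; // anything not in the list above
}

console.log(protocolFamily("h2-14")); // → HTTP/2
```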

IP Version

This identifies the IP version of the client's address (IPv4 or IPv6).

Possible values: 4, 6

Browser Cache Hit Ratio

As we know, the most performant request is the one we don’t have to make. So let’s use some of our new mPulse data to help evaluate our browser caching strategy.

We use Resource Timing Level 2 to identify which resources (scripts, stylesheets, fonts, and documents) were served from the browser’s cache. For same-origin resources (and cross-origin resources that send a Timing-Allow-Origin header), we use the “transferSize” attribute.

function browserCacheHit(resourceTimingEntry) {
  return resourceTimingEntry.transferSize === 0
}

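That is, if any bytes came over the wire, then the resource was NOT served from the browser’s cache. Cross-origin resources without a Timing-Allow-Origin header are a little more complicated: their size attributes are reported as zero, so “transferSize” alone would count every one of them as a cache hit. But we can use the rules of the spec (and the rules of physics) to give us a useful heuristic.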

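function browserCacheHit(resourceTimingEntry) {
  // Cross-origin resources without a Timing-Allow-Origin header have their
  // detailed timestamps (including requestStart) and sizes zeroed out
  if (resourceTimingEntry.requestStart === 0) {
    // Because physics: a fetch that completes in under 30 ms is assumed
    // to have been served from the browser's cache
    return resourceTimingEntry.duration < 30
  }
  return resourceTimingEntry.transferSize === 0
}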

The only responses whose timing data we can’t access at all are those loaded inside cross-origin IFRAMEs, and they make up a significant share of resources on most pages: almost 32% for the Alexa Top 100 sites.

New in mPulse, we are collecting and aggregating the Browser Cache Hit Ratio as recorded during page loads. This will be a first-class, built-in metric in the mPulse UI. Faceting on page group will help you plot meaningful percentiles over time.
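Per page load, the ratio is the number of resource entries judged to be cache hits divided by the total number of entries. A minimal sketch that repeats the heuristic above so the block stands alone; the `browserCacheHitRatio` helper is hypothetical, the 30 ms threshold is the same rule of thumb, and mPulse’s own aggregation may differ.

```javascript
// Same heuristic as above: opaque cross-origin entries (requestStart === 0)
// that finished in under 30 ms are assumed to be browser cache hits;
// otherwise, zero bytes over the wire means a cache hit.
function browserCacheHit(entry) {
  if (entry.requestStart === 0) {
    return entry.duration < 30;
  }
  return entry.transferSize === 0;
}

// Fraction of resource entries served from the browser cache (0 if none).
function browserCacheHitRatio(entries) {
  if (entries.length === 0) return 0;
  const hits = entries.filter(browserCacheHit).length;
  return hits / entries.length;
}

// In a browser:
// browserCacheHitRatio(performance.getEntriesByType("resource"));
```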

Compression Ratio

For those resources that either aren’t cacheable or aren’t yet cached, it’s important to compress them to minimize bytes transferred over the wire. Relying once again on the Resource Timing Level 2 spec, we can compare the “encodedBodySize” to the “decodedBodySize.” This allows us to evaluate the compression strategy that we’re employing, be it from build tooling or CDN configuration.

Unfortunately, from a RUM perspective, we are blind to cross-origin resources that don’t send a Timing-Allow-Origin header, as well as to resources loaded inside cross-origin IFRAMEs. But for all same-origin resources, we can calculate the compression ratio like this:

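function calcCompressionRatio() {
  // Sum body sizes across all resource entries on the page; opaque
  // cross-origin entries report 0 for both sizes and so add nothing
  let uncompressedSize = 0, compressedSize = 0
  for (let {encodedBodySize, decodedBodySize} of performance.getEntriesByType('resource')) {
    uncompressedSize += decodedBodySize
    compressedSize += encodedBodySize
  }
  // Decoded bytes per encoded byte (e.g. 4 means 4:1); 0 if nothing was measured
  return compressedSize &&
    uncompressedSize / compressedSize
}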

Server-Timing

Server-Timing is a new web performance API that landed in Chrome 65 (and Opera 52!). It lets the server attach timing data and other metadata to every response, where script running in the browser can read it. That makes it the perfect mechanism for communicating CDN RUM metrics and timers to the client, where we can collect them and beacon them back alongside our other traditional RUM data. Browser support is getting there: we expect it to land in Firefox 59 and a Safari Technology Preview soon! Let’s see how we can leverage Server-Timing to report on CDN RUM.
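In the browser, Server-Timing metrics surface as the `serverTiming` array of `PerformanceServerTiming` objects (each with `name`, `duration`, and `description`) on the matching Resource Timing or Navigation Timing entry. A minimal sketch of flattening that array into a lookup; the `serverTimingMetrics` helper is hypothetical. Cross-origin entries expose an empty `serverTiming` array unless the response includes a `Timing-Allow-Origin` header.

```javascript
// Collect the Server-Timing metrics attached to one performance entry into
// a plain object keyed by metric name. If a name repeats, the last one wins.
function serverTimingMetrics(entry) {
  const metrics = {};
  for (const { name, duration, description } of entry.serverTiming || []) {
    metrics[name] = { duration, description };
  }
  return metrics;
}

// In a browser, for the base page:
// const [nav] = performance.getEntriesByType("navigation");
// const metrics = serverTimingMetrics(nav); // keys such as edge, origin, cdn-cache
```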

CDN RUM

“If the service we provide can be put out of business by being transparent with our performance, then we are not delivering enough value to our customers.” — Colin Bendell

But what about those resources that do have to hit the network? Let’s assume your site sits behind a CDN. How effective is your CDN at offloading your origin? How effective is the multi-tiered caching that your CDN orchestrates? If we were carving up a “blame pie” for the time the browser spent waiting on a resource, how big would the CDN slice be?

Edge Time

Server-Timing: edge; dur=123; desc="millis spent at client facing edge server"

“Edge time” is measured on the CDN machine that sits closest to the browser. It is defined as the duration between the moment the first byte of the request is received and the moment just before the first byte of the response is written, excluding any time spent forwarding the request to the origin. The edge could have the resource in its own cache, or it might have to go to a parent or sibling; either way, all of that time is included in edge time. Analyzing edge time for base pages and subresources gives you insight into the work your CDN has to do.

This duration is measured for all requests that pass through the Akamai network on mPulse-enabled properties, including requests for the base page and all subresources. Edge time is displayed as a first-class timer alongside other RUM timers like Page Load Time, DOM Ready, Front-End Time, Back-End Time, etc.
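When inspecting responses outside the browser (for example, from a test script), the raw header value has to be parsed by hand. A minimal sketch of a parser for the `name; dur=<ms>; desc=<text>` form shown above; `parseServerTiming` is a hypothetical helper that tolerates commas and semicolons inside quoted descriptions but not backslash escapes, and it is not a complete implementation of the Server-Timing grammar.

```javascript
// Split text on a delimiter, ignoring delimiters inside double quotes.
// Backslash escapes inside quoted strings are not handled.
function splitOutsideQuotes(text, delimiter) {
  const parts = [];
  let current = "";
  let inQuotes = false;
  for (const ch of text) {
    if (ch === '"') inQuotes = !inQuotes;
    if (ch === delimiter && !inQuotes) {
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}

// Parse a Server-Timing header value into objects shaped like the browser's
// PerformanceServerTiming entries: { name, duration, description }.
function parseServerTiming(headerValue) {
  return splitOutsideQuotes(headerValue, ",")
    .map((metric) => metric.trim())
    .filter((metric) => metric.length > 0)
    .map((metric) => {
      const [name, ...params] = splitOutsideQuotes(metric, ";").map((s) => s.trim());
      const entry = { name, duration: 0, description: "" };
      for (const param of params) {
        const eq = param.indexOf("=");
        if (eq === -1) continue; // skip parameters without a value
        const key = param.slice(0, eq).trim().toLowerCase();
        let value = param.slice(eq + 1).trim();
        if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
          value = value.slice(1, -1); // strip surrounding quotes
        }
        if (key === "dur") entry.duration = Number(value) || 0;
        else if (key === "desc") entry.description = value;
      }
      return entry;
    });
}
```

Applied to the edge header above, this yields a single entry named `edge` with a duration of 123 and the description text without its surrounding quotes.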

Origin Time

Server-Timing: origin; dur=123; desc="millis spent forwarding the request to origin"

“Origin time” is measured on the CDN machine that sits closest to the origin and is defined as the round-trip time between that machine and the origin, including the time the origin spends handling the request. This duration can be invaluable in diagnosing latency in fulfilling requests to the browser.

Like “Edge time”, it is measured for all requests that pass through the Akamai network on mPulse-enabled properties, and is displayed as a first-class timer in the dashboard UI.
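With both timers available, the browser-measured wait for a response can be carved into the “blame pie” slices described earlier. A rough sketch, assuming the entry carries `edge` and `origin` Server-Timing metrics as in the headers above and has a non-zero `requestStart` (same-origin, or cross-origin with `Timing-Allow-Origin`); the `blamePie` helper is hypothetical and attributes everything the two timers don’t cover to the network.

```javascript
// Hypothetical helper: split the browser-measured wait (requestStart to
// responseStart) into CDN, origin, and network slices for one entry.
function blamePie(entry) {
  const metric = (name) => {
    const found = (entry.serverTiming || []).find((m) => m.name === name);
    return found ? found.duration : 0;
  };
  const wait = entry.responseStart - entry.requestStart;
  const edge = metric("edge");
  const origin = metric("origin");
  // Whatever neither timer covers is attributed to the network: the client
  // round trip plus any middle-mile transit between the client-facing edge
  // and the CDN machine nearest the origin. Clamped at 0 in case of rounding.
  const network = Math.max(0, wait - edge - origin);
  return { wait, edge, origin, network };
}

// In a browser, for the base page:
// const [nav] = performance.getEntriesByType("navigation");
// console.log(blamePie(nav));
```

For instance, 300 ms between `requestStart` and `responseStart`, with edge=40 and origin=180, leaves 80 ms for the network.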

CDN Cache Hit Ratio

Server-Timing: cdn-cache; desc=HIT

Server-Timing: cdn-cache; desc=MISS

This is our take on the classic CDN “offload” metric. How well is my CDN offloading requests made to my origin? How well is the multi-tiered caching architecture keeping my assets in a cache very close to all of my clients? By communicating cdn-cache status in a Server-Timing response header on every request that goes through Akamai’s servers (for mPulse-enabled properties), we can build a picture of how many of your resources were served from the CDN cache.
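Aggregated over a page’s resources, the cdn-cache values give a CDN cache hit ratio. A minimal sketch; the `cdnCacheHitRatio` helper is hypothetical, and entries without a `cdn-cache` metric (for example, responses that didn’t pass through the CDN, or cross-origin entries without `Timing-Allow-Origin`) are left out of both the numerator and the denominator.

```javascript
// CDN cache hit ratio over performance entries, based on the cdn-cache
// Server-Timing metric (desc=HIT or desc=MISS). Returns 0 if no entry
// carries the metric.
function cdnCacheHitRatio(entries) {
  let hits = 0;
  let total = 0;
  for (const entry of entries) {
    const status = (entry.serverTiming || []).find((m) => m.name === "cdn-cache");
    if (!status) continue; // no CDN cache status reported for this entry
    total++;
    if (status.description === "HIT") hits++;
  }
  return total === 0 ? 0 : hits / total;
}

// In a browser:
// cdnCacheHitRatio(performance.getEntriesByType("resource"));
```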

Now, using Server-Timing, we can tell exactly where each same-origin request was ultimately served from: the browser cache, the CDN cache, or the origin. This lets you use RUM to monitor the cacheability and cached-ness of the resources that make up your pages. You can set up alerts on cache regressions, watch trends and seasonality over time, and get an overall picture of how much work your origin is actually being asked to do.
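Combining the browser cache heuristic with the cdn-cache metric gives a per-resource answer to where each response came from. A minimal sketch; `servedBy` and `servedByBreakdown` are hypothetical helpers, and treating every non-HIT value as an origin fetch is an assumption, since a MISS at the client-facing edge might still be satisfied by a parent tier.

```javascript
// Classify one performance entry by the tier that ultimately served it:
// "browser", "cdn", "origin", or "unknown".
function servedBy(entry) {
  // Browser cache heuristic from the Browser Cache Hit Ratio section.
  const browserHit =
    entry.requestStart === 0 ? entry.duration < 30 : entry.transferSize === 0;
  if (browserHit) return "browser";
  const status = (entry.serverTiming || []).find((m) => m.name === "cdn-cache");
  if (!status) return "unknown";
  // Assumption: any value other than HIT is counted as an origin fetch.
  return status.description === "HIT" ? "cdn" : "origin";
}

// Count entries per serving tier.
function servedByBreakdown(entries) {
  const counts = { browser: 0, cdn: 0, origin: 0, unknown: 0 };
  for (const entry of entries) counts[servedBy(entry)]++;
  return counts;
}

// In a browser:
// servedByBreakdown(performance.getEntriesByType("resource"));
```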

Monitor, Optimize, and Validate

A cloud delivery platform helps you transport and optimize your digital experience to your distributed end users, making it fast, reliable, and secure. The strategy you use to deliver your experience is just as critical as the strategy you use to design and build it. These new metrics in mPulse help you validate that your strategy is working, tune CDN settings to optimize delivery, and gain the visibility to reduce time to resolution should something go wrong.


*** This is a Security Bloggers Network syndicated blog from The Akamai Blog authored by Charles Vazac. Read the original post at: http://feedproxy.google.com/~r/TheAkamaiBlog/~3/C-pAhwAK5Y8/how-you-get-there-matters-middle-mile-visibility-to-drive-cdn-strategy.html