Enhancing video streaming quality for ExoPlayer – Part 1: Quality of User Experience Metrics

Authors: Mark Greve, Domițian Tămaș-Selicean

The online video player landscape is fragmented, with a wide variety of players across a mix of popular platforms. In the world of HTML5 video players in browsers, there are a number of open-source solutions (e.g., hls.js, dash.js, Shaka Player), as well as commercial offerings such as Akamai’s AMP.

On Android, one of the most popular choices is Google’s ExoPlayer, which is the focus of this blog post. ExoPlayer is an open-source player developed by Google for the Android platform and distributed under the Apache License 2.0. Although every Android version ships with a native MediaPlayer, ExoPlayer has several advantages: (1) it is open-source, modular, customizable and extensible; (2) it supports multiple streaming formats (e.g., HLS, MPEG-DASH, SmoothStreaming) and features (e.g., Widevine common encryption); and (3) it allows an app to use the same player across different Android versions.

Specifically, in this series we will look at the options you have to improve the quality of user experience (QoE) by tweaking configuration options in ExoPlayer. In this first post we define the QoE metrics that we will look at, introduce common video player features, explain how these features impact the QoE metrics, and describe the trade-offs between the different metrics.

In the subsequent posts we will focus on specific ExoPlayer configuration options, and on how to tweak them to improve certain QoE metrics.

Quality of User Experience

When we refer to quality of user experience (QoE), we refer to the overall experience of a user watching a video stream. Unlike quality of service (QoS), QoE is subjective, and therefore difficult to measure or to guarantee at a certain level.

We will focus on the following elements that influence the QoE:

Startup time

Startup time is the time that passes from the moment the playback session is initiated until playback has started. Put more crudely, it’s the time from the user pressing Play until there’s video playback on the screen.

The importance of a short startup time is highlighted in a report we published in 2016: “viewers will start abandoning a video if startup takes longer than two seconds to begin playing and for every additional second of delay, roughly an additional 6% of the audience leaves. […] with a 10 second delay, nearly half the audience has left.” See the original report for more details.
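Read literally, the quoted figures suggest a simple linear relationship. The sketch below only illustrates that arithmetic: the function name, the two-second grace period as a parameter, and the linear shape itself are assumptions made here for illustration, not a model taken from the report.

```python
def estimated_abandonment(startup_seconds, grace_seconds=2.0, loss_per_second=0.06):
    """Fraction of the audience lost, under a linear reading of the quoted figures."""
    # No viewers are assumed to leave during the first `grace_seconds`.
    extra = max(0.0, startup_seconds - grace_seconds)
    # Cap at 1.0: no more than the whole audience can leave.
    return min(1.0, extra * loss_per_second)


for delay in (1, 2, 5, 10):
    print(f"{delay:>2} s startup: {estimated_abandonment(delay):.0%} of viewers lost")
```

For a 10 second delay this gives 8 × 6% = 48%, which is consistent with the “nearly half the audience” in the quote.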

In practice, the startup time is influenced by two factors:

  1. The player: the startup algorithm of the player, which decides where in the stream to start and how much to buffer before starting playback. For video-on-demand (VOD) content, the start point is evident: the beginning of the stream. For live content, this gets more complicated, as it depends on the type of stream, standards and player implementations.
  2. Network conditions: the delivery time, i.e., how much time it takes to download the actual content.

For this QoE metric, we have chosen to focus only on the first factor, the startup algorithm of the player, and to treat the delivery time as part of the next QoE metric.

Measure: the shorter, the better.

End-to-end hand-waving latency

End-to-end hand-waving latency (or hand-waving latency, end-to-end delay, glass-to-glass delay, capture-to-display delay) refers to the time it takes for a frame of a live stream to get from ingestion to the viewer’s screen. More colloquially, the hand-waving latency is the time it takes from the moment a person waves a hand in front of the camera until it is seen by the viewer on screen.

The hand-waving latency is an important metric, relevant only for live events.

The end-to-end hand-waving latency can be broken down into three big components:

  1. The “Ingest Time” (also referred to as the First Mile): the time it takes for the video stream to get from the camera, via the encoder, to the entry point in the Akamai Intelligent Platform (or another cloud).
  2. The “Cloud Time”: the time it takes for the video stream to make its way through the cloud (including any replication, backup, or live transcoding that might take place in the cloud).
  3. The “Delivery Time” (also referred to as the Last Mile): the time it takes for the stream to get from the cloud exit point to the user’s end device.
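Since the three components simply add up, a latency budget can be written down directly. The following is a minimal sketch; the class name, field names and the example values are hypothetical and only serve to show where the time goes.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    ingest_s: float    # first mile: camera -> encoder -> cloud entry point
    cloud_s: float     # replication, backup, live transcoding in the cloud
    delivery_s: float  # last mile: cloud exit point -> viewer's device

    def end_to_end_s(self) -> float:
        """Hand-waving latency as the sum of the three components."""
        return self.ingest_s + self.cloud_s + self.delivery_s

    def shares(self) -> dict:
        """Fraction of the total latency contributed by each component."""
        total = self.end_to_end_s()
        if total == 0:
            raise ValueError("total latency is zero")
        return {name: value / total for name, value in vars(self).items()}


# Hypothetical numbers, for illustration only.
budget = LatencyBudget(ingest_s=2.0, cloud_s=4.0, delivery_s=14.0)
print(budget.end_to_end_s())
print(budget.shares())
```

Breaking the total down this way makes it easier to see which component is worth optimizing first.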

Please check out this Akamai guide to encoding and transcoding to see how Akamai can help you lower your end-to-end latency and this Akamai blog post on the options for ultra low end-to-end latency with chunked-encoded and chunk-transferred CMAF.

Latency-cloud.png

Measure: the shorter, the better.

Video quality

Video quality is a function of the video bitrate: usually, a higher bitrate means better video quality, clearer and crisper picture, richer colours.

Measure: higher quality is always better.

Bitrate switches

In the case of streams with multiple renditions of different qualities, we refer to a bitrate switch as the change from a rendition of a certain quality to another rendition of a different quality. If the new rendition is of higher quality (e.g., a switch from 720p to 1080p), we call this an upswitch. If the new rendition is of lower quality (e.g., a switch from 2160p to 1080p), we call this a downswitch.

Recent research has shown that viewers respond negatively to bitrate switches (both down- and upswitches), preferring a constant bitrate even to an upswitch.

Measure: the fewer, the better.
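Counting switches is straightforward once the bitrate of each played segment is known. The sketch below assumes a hypothetical log with one bitrate entry per segment; the function name and the sample data are made up for illustration.

```python
def count_switches(bitrates_kbps):
    """Return (upswitches, downswitches) for a sequence of per-segment bitrates."""
    up = down = 0
    # Compare every segment with the one played right before it.
    for previous, current in zip(bitrates_kbps, bitrates_kbps[1:]):
        if current > previous:
            up += 1
        elif current < previous:
            down += 1
    return up, down


played = [800, 800, 1600, 3200, 3200, 1600, 1600, 3200]
print(count_switches(played))  # (3, 1)
```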


Rebuffering

Rebuffering (also referred to as buffering or stalling) is possibly the most noticeable undesired playback event: the player runs out of media data, which results in a pause of the video.

Research from 2016 has shown that “a viewer experiencing a rebuffer delay that equals or exceeds 1% of the video duration played 5.02% less of the video in comparison with a similar viewer who experienced no rebuffering.” In 2018, Limelight reported that 28% of viewers experiencing a rebuffering event abandon the playback session. One of the key findings of a 2019 report from Akamai and MTM was that a single rebuffering event could lead to a loss of over 85,000 USD in revenue.

Measure: the fewer, the better.
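Besides the number of events, rebuffering is often expressed as a ratio, which is the form used in the research quoted above. The sketch below assumes one common definition, total stall time divided by the media time played; other definitions exist, and the function name and sample values are hypothetical.

```python
def rebuffer_ratio(stall_durations_s, played_s):
    """Total stall time as a fraction of the media time actually played."""
    if played_s <= 0:
        raise ValueError("played_s must be positive")
    return sum(stall_durations_s) / played_s


stalls = [1.5, 3.0]  # two hypothetical rebuffering events, in seconds
ratio = rebuffer_ratio(stalls, played_s=600)
print(f"{len(stalls)} rebuffering events, ratio {ratio:.2%}")
```

In this example, 4.5 seconds of stalling over 10 minutes of playback stays below the 1% mark mentioned in the 2016 research.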

Representing the QoE Metrics

enhancing video image two.PNG

In this blog post series, we represent our QoE metrics using a radar chart, where:

  • the innermost level represents the “good” state (e.g., high video quality),
  • the middle level is “unclear”, and
  • the outermost level represents the “bad” state (e.g., high hand-waving latency).

The chart above shows the state of the ideal user experience: no bitrate switches or rebuffering events, minimal startup time and hand-waving latency, and the highest video quality available.

QoE References and Further Reading

In a 2016 survey among 351 company managers, the participants identified the following culprits that negatively affect the user experience: (1) video buffering, (2) audio out of sync, (3) blurry or pixelated picture, (4) slow start, stops mid-play, (5) lagging behind the source. See this infographic summarizing the results.

A research paper from 2016, which analyzed over 400,000 YouTube views by more than 900 viewers from over 100 countries, found that rebuffering and bitrate switch events (even upswitches) negatively affect the QoE.

According to a 2017 study, a single rebuffering event causes a decrease in positive emotions (happiness down 14%) and a 16% increase in negative emotions. The study also confirms that video quality matters: in non-buffering video sequences, higher resolutions produce 10.4% higher emotional engagement than lower resolutions.

A 2019 white paper cites a senior manager at a major broadcaster: “When rebuffering is less than 0.5, 90% of the sessions are completed. As soon as you get 0.5-1%, then the number starts to drop — 80%. As soon as you hit 1% you see the rate drop down to 50%.”

Player features that impact QoE

All video players for modern streaming formats (e.g., HLS and MPEG-DASH) have a common feature set. Many of the features are subject to various trade-offs between QoE and other parameters, which means it’s often possible to improve QoE by coming up with better heuristics. For some players (including ExoPlayer), there’s an easier way to improve some QoE metrics at the expense of others, since the heuristics can be tweaked using configuration options in the player. The two most important features in modern video players that have an impact on QoE are:

  • Bitrate selection: picking a suitable bitrate when there are multiple renditions in different qualities for a video stream. This feature is known by many names, e.g., adaptive bitrate (ABR) strategy, multi bitrate (MBR) strategy, automatic bitrate selection, etc.
  • Buffering strategy: deciding the amount of media data to keep in the player’s internal buffer, when to fetch media data, and how much media data is needed at startup before playback is initiated.

Next, we will present in more detail some of the questions and trade-offs that shape the bitrate selection strategy and the buffering strategy. In the next blog posts of this series, we will look at how it’s possible to tweak the behavior in ExoPlayer for some of these key questions.

Bitrate selection strategy

If you were to develop a new video player, then you would face a number of questions to answer when building the bitrate selection strategy. Some of the key questions are listed below:

Which bitrate should the strategy pick at startup?                              

Picking a bitrate that is too high (i.e., one that cannot be sustained) may lead to a long startup time and to many rebuffering events. Picking a bitrate that is too low (i.e., well below what the connection can sustain) means that the video quality will be low and the viewer may experience several bitrate switches before reaching the highest available bitrate that it can sustain.

Startup bitrate -- too high vs too low.png
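One way to balance the two risks is to pick the highest rendition that fits within a fraction of an initial bandwidth estimate. The sketch below is an illustration under stated assumptions: that some estimate is available before playback (for example, from a previous session), and that a safety factor of 0.75 is reasonable. The function name, the parameters and the bitrate ladder are hypothetical and not taken from any particular player.

```python
def pick_startup_bitrate(renditions_bps, estimated_bandwidth_bps, safety_factor=0.75):
    """Highest rendition that fits within a fraction of the estimated bandwidth."""
    # Leave headroom: only spend part of the estimate on the video itself.
    budget = estimated_bandwidth_bps * safety_factor
    affordable = [b for b in sorted(renditions_bps) if b <= budget]
    # Fall back to the lowest rendition when nothing fits the budget.
    return affordable[-1] if affordable else min(renditions_bps)


ladder = [400_000, 1_200_000, 2_500_000, 5_000_000]  # hypothetical bitrate ladder
print(pick_startup_bitrate(ladder, estimated_bandwidth_bps=4_000_000))
print(pick_startup_bitrate(ladder, estimated_bandwidth_bps=300_000))
```

A lower safety factor makes the choice more conservative (shorter startup, lower initial quality); a higher one does the opposite.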

Which criteria are used for switching up in bitrate?

Switching up too fast will increase the chance of a rebuffering event, if the player cannot sustain the high bitrate. Switching up too slowly will keep the viewer at a low quality bitrate, and potentially increase the number of bitrate switch events.

Switching up in bitrate -- too fast vs too slow.png

Which criteria are used for switching down in bitrate?

If the strategy switches to a lower quality bitrate while the current one could have been sustained, the viewer will unnecessarily experience a low quality picture. If the switch happens too late, the viewer may experience rebuffering events.

Switching down in bitrate -- too fast vs too slow.png

How can you avoid rapid oscillations in bitrate switching?

As mentioned previously, research has demonstrated that bitrate switches, regardless of whether they are upswitches or downswitches, negatively impact the user experience. Thus, it is really important that the strategy avoids unnecessary bitrate switches and implements a heuristic that avoids sudden changes.
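The three switching questions above can be combined into one rule that uses the buffer level as a damper. The sketch below is one possible heuristic, not the algorithm of any specific player: it only switches up when enough media is buffered to absorb a wrong guess, and it holds the current bitrate through a bandwidth dip when the buffer is comfortably full. All names and threshold values are assumptions chosen for illustration.

```python
def next_bitrate(current_bps, renditions_bps, bandwidth_bps, buffered_s,
                 bandwidth_fraction=0.75, min_buffer_for_up_s=10.0,
                 max_buffer_for_down_s=25.0):
    """Pick the bitrate for the next segment, damping unnecessary switches."""
    # The rendition we would pick if only the bandwidth estimate mattered.
    budget = bandwidth_bps * bandwidth_fraction
    fitting = [b for b in sorted(renditions_bps) if b <= budget]
    ideal = fitting[-1] if fitting else min(renditions_bps)

    if ideal > current_bps and buffered_s < min_buffer_for_up_s:
        return current_bps  # too little buffer to risk an upswitch
    if ideal < current_bps and buffered_s >= max_buffer_for_down_s:
        return current_bps  # enough buffer to ride out a temporary dip
    return ideal


ladder = [400_000, 1_200_000, 2_500_000, 5_000_000]  # hypothetical bitrate ladder
print(next_bitrate(1_200_000, ladder, bandwidth_bps=8_000_000, buffered_s=4))
print(next_bitrate(1_200_000, ladder, bandwidth_bps=8_000_000, buffered_s=15))
print(next_bitrate(2_500_000, ladder, bandwidth_bps=2_000_000, buffered_s=30))
print(next_bitrate(2_500_000, ladder, bandwidth_bps=2_000_000, buffered_s=5))
```

Raising the two buffer thresholds makes the rule switch less often, at the cost of reacting more slowly to genuine changes in network conditions.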

Buffering strategy

For the buffering strategy, there are similar key questions. We list them below, together with a short discussion of some of the trade-offs involved:

How much data should be buffered before playback can be initiated?

Buffering too little increases the risk of a rebuffering event shortly after playback starts. On the other hand, the more the strategy buffers, the more it increases the startup time and the hand-waving latency.

Buffering before playback -- too little vs too much.png
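The cost of a larger startup buffer can be estimated with simple arithmetic: the time to download N seconds of media is roughly N times the bitrate divided by the available bandwidth. The sketch below uses that first-order estimate; it deliberately ignores request latency, manifest downloads and segment boundaries, and the function name and numbers are hypothetical.

```python
def startup_buffer_fill_time_s(startup_buffer_s, bitrate_bps, bandwidth_bps):
    """Rough time needed to download `startup_buffer_s` seconds of media."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth_bps must be positive")
    return startup_buffer_s * bitrate_bps / bandwidth_bps


# A 3 Mbps rendition over a 6 Mbps connection, for two startup buffer sizes.
for buffer_s in (2.5, 10.0):
    wait = startup_buffer_fill_time_s(buffer_s, bitrate_bps=3_000_000, bandwidth_bps=6_000_000)
    print(f"buffer {buffer_s:>4} s -> about {wait:.2f} s before playback can start")
```

Quadrupling the startup buffer quadruples this part of the startup time, which is the trade-off described above.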

When and by how much should the player’s internal buffer be filled with media data?

Keeping the internal buffer always filled to a certain level (drip style) versus filling it in bursts on an interval basis (or based on other metrics) affects the network usage and battery usage of the device. For example, this thread on the ExoPlayer bug tracker reveals that the buffering strategy in ExoPlayer versions 2.9.6 and below is based on the assumption that network operators prefer burst transfers rather than drip style (the ExoPlayer developers plan to change this strategy in a subsequent release).

As for how much to fill the internal buffer, a large buffer considerably decreases the chances of rebuffering events. However, the capabilities of the device can limit the buffer size. Furthermore, in the case of live streams, there’s a direct correlation between the minimum required amount of media data in the buffer and the hand-waving latency: the larger the required amount of media data, the further the player plays from the live edge of the stream.
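The difference between drip-style and burst-style filling can be made concrete with a toy simulation. The sketch below assumes a player that starts downloading when the buffer drops below a minimum level and stops once it reaches a maximum level; setting both levels to the same value approximates drip style. The thresholds, the fill rate and the one-second time step are illustrative assumptions, not measurements of any player.

```python
def simulate_buffer(min_buffer_s, max_buffer_s, fill_rate=3.0, duration_s=300, step_s=1.0):
    """Return (number of download bursts, fraction of time spent downloading).

    fill_rate is the number of media seconds downloaded per wall-clock second.
    """
    buffered, loading, bursts, loading_steps = 0.0, False, 0, 0
    steps = int(duration_s / step_s)
    for _ in range(steps):
        if not loading and buffered < min_buffer_s:
            loading, bursts = True, bursts + 1  # buffer ran low: start a new burst
        elif loading and buffered >= max_buffer_s:
            loading = False                     # buffer is full: stop downloading
        if loading:
            buffered += fill_rate * step_s
            loading_steps += 1
        buffered = max(0.0, buffered - step_s)  # playback drains the buffer
    return bursts, loading_steps / steps


burst_style = simulate_buffer(min_buffer_s=15, max_buffer_s=50)
drip_style = simulate_buffer(min_buffer_s=50, max_buffer_s=50)
print("burst style:", burst_style)
print("drip style: ", drip_style)
```

In this model both styles spend a similar share of the time downloading, but drip style splits it into many more short transfers, which is the pattern that matters for network and battery usage.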

How much data should be retained from the previous bitrate when switching to a new bitrate?

Retaining too much media data from the previous bitrate limits the amount of buffer available for media data from the new bitrate. However, retaining too little (or none) runs the risk of rebuffering events if the player reverts to the previous bitrate (i.e., it cannot sustain the new bitrate in the case of an upswitch, or it decides that the previous bitrate was sustainable after all in the case of a downswitch).

ExoPlayer Details

ExoPlayer version investigated: 2.9.6

ExoPlayer is a modular open-source player, with the following four components common to all ExoPlayer implementations:

  • MediaSource: defines and provides the media to be played. ExoPlayer has default implementations for HLS, MPEG-DASH and SmoothStreaming.
  • Renderer: consumes the media from the MediaSource and renders it.
  • TrackSelector: implements the bitrate selection strategy. ExoPlayer provides a default implementation (DefaultTrackSelector), which relies on one of several track selection implementations (FixedTrackSelection, RandomTrackSelection and AdaptiveTrackSelection).
  • LoadControl: implements the buffering strategy. ExoPlayer provides a default configurable implementation (DefaultLoadControl).

In this series, we will show how to configure the TrackSelector and the LoadControl to improve one or several QoE metrics. Stay tuned for the next installment.

*** This is a Security Bloggers Network syndicated blog from The Akamai Blog authored by Akamai. Read the original post at: