Digital distribution: Analytics with logs and heartbeats

Ralfas Jegorovas

Technical Product Manager

Last month, Federico Benedetto wrote about how content is delivered via partner CDNs (Content Distribution Networks) to our audience. Technical Product Manager Ralfas Jegorovas now describes how we identify problems affecting the quality of content delivery to our audience and diagnose their underlying causes, so that we can provide the highest possible quality of service for iPlayer, iPlayer Radio and other video and audio content available on BBC Online.

It's not playing!

Delivering media content to our audiences across the internet is a complex task involving many technologies and many organisations. So when a problem arises, it's important that we can narrow it down efficiently so that it can be fixed as quickly as possible.

For example, if a video isn't playing for you on iPlayer or if it is showing the spinner (buffering) for a long time, then it could be due to any of the following:

  • a problem with the media player or the operating system it is running on.
  • a problem with the network connection between the device (PC, TV, tablet or phone) and the internet.
  • a problem with a CDN or its connection to the BBC.
  • a problem with the BBC distribution platforms, Video Factory or Audio Factory.

Where is it broken?

We use two main sources of information to help us narrow down the problem to a specific organisation and, when possible, the particular technology that's at fault.

Figure: Media analytics are available during the journey content takes from source to audience

The first of these sources is the records - commonly referred to as access logs - created by CDNs, describing what content has been sent where.

We have developed internal technology called Logarama that collects this data from CDNs, transforms it and provides visualisations to give us insight into how well content is being delivered, thus enabling us to narrow down the problem. I will describe some of the specific methods we use later in this post.

While CDN access logs give us a complete view of what has been delivered to our audience, they can be a slow way to learn about a problem - some CDNs only make the logs available to us several hours after the fact.
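As a rough illustration of the "transform" step, the sketch below parses access log lines and summarises them by HTTP status and bytes sent. It assumes the logs arrive in the widely used Common Log Format; the formats our CDNs actually use, and the function name `summarise`, are assumptions made for this example and do not describe Logarama itself.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
# (assumed here purely for illustration; real CDN log formats vary).
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def summarise(lines):
    """Return (requests per status code, total bytes sent) for parsable lines."""
    status_counts = Counter()
    bytes_sent = 0
    for line in lines:
        match = CLF.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        status_counts[match["status"]] += 1
        if match["size"] != "-":
            bytes_sent += int(match["size"])
    return status_counts, bytes_sent


# Made-up sample lines using documentation-reserved IP addresses.
sample = [
    '192.0.2.1 - - [10/Oct/2014:13:55:36 +0000] "GET /media/seg1.ts HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2014:13:55:37 +0000] "GET /media/seg2.ts HTTP/1.1" 404 -',
    'not a log line',
]
counts, total_bytes = summarise(sample)
```

A rising share of error statuses for one CDN would be one hint that the problem lies on that part of the journey.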

The second source of information is "heartbeat" messages sent to us at regular one-minute intervals by your browser or app while it's playing media. They tell us what you’re playing, via which CDN and at which stream quality.
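To make this concrete, a heartbeat could be modelled as a small record carrying those three facts plus a timestamp. The field names and the JSON encoding below are assumptions made for this sketch and are not the actual message format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical heartbeat payload; every field name is illustrative."""
    content_id: str    # what is being played
    cdn: str           # which CDN is delivering the stream
    bitrate_kbps: int  # stream quality currently selected
    sent_at: int       # Unix timestamp (seconds) when the message was sent


def encode(heartbeat: Heartbeat) -> str:
    """Serialise a heartbeat to JSON (an assumed wire format)."""
    return json.dumps(asdict(heartbeat), sort_keys=True)


def decode(raw: str) -> Heartbeat:
    """Parse a JSON heartbeat back into a record."""
    return Heartbeat(**json.loads(raw))


# Made-up example values.
example = Heartbeat(content_id="episode-1", cdn="cdn-a", bitrate_kbps=1500, sent_at=60)
round_tripped = decode(encode(example))
```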

We receive heartbeat messages in close to real time, which makes them an effective way of monitoring our content distribution and spotting problems early. To handle the large volume of heartbeats - which at peak hours are in the order of millions per minute - we are developing a technology called Sawmill.

Unfortunately, we cannot rely solely on heartbeat messages because not all of the devices that can stream our media content can implement our heartbeat reporting mechanism.

This means that heartbeat messages give us an early warning with reasonably broad coverage across our supported devices, while CDN logs give us the full picture of what was delivered.

Needle in the Haystack

iPlayer and iPlayer Radio have a very large audience, which results in terabytes of logs from CDNs and hundreds of gigabytes of heartbeat messages. To make sense of all this information, we have to look at it in aggregate. For example, instead of looking at each individual video stream, we would combine streams that share any one of these characteristics into a “bucket”:

  • network connection type
  • Internet Service Provider
  • CDN delivering content
  • media player or app
  • type of media being streamed

Once we have these buckets, we can then combine this data in several ways and look for changes in the trend.
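The bucketing described above can be sketched as a simple count of streams per value of one characteristic. The record layout, key names and sample values here are hypothetical and chosen only to mirror the list of characteristics.

```python
from collections import Counter

# The five characteristics listed above; the key names are assumptions.
DIMENSIONS = ("connection_type", "isp", "cdn", "player", "media_type")


def bucket_streams(streams, dimension):
    """Count streams per value of one characteristic (one bucket per value)."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    return Counter(stream[dimension] for stream in streams)


# Hypothetical sample records, made up for illustration.
streams = [
    {"connection_type": "wifi", "isp": "isp-a", "cdn": "cdn-1",
     "player": "web", "media_type": "video"},
    {"connection_type": "wifi", "isp": "isp-b", "cdn": "cdn-2",
     "player": "tv", "media_type": "video"},
    {"connection_type": "mobile", "isp": "isp-a", "cdn": "cdn-1",
     "player": "app", "media_type": "audio"},
]
by_cdn = bucket_streams(streams, "cdn")
by_isp = bucket_streams(streams, "isp")
```

Computing the same counts for successive time windows gives one trend line per bucket, which is what we then watch for changes.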

Consumption Monitoring

We use the heartbeat messages to monitor consumption, which lets us detect a change in the number of viewers or listeners at any point in time. If this number falls below what we regularly see, there is likely to be a problem, and Sawmill will alert us when this happens.

However, Sawmill only helps identify problems once they are already affecting a large proportion of our audience - when they can no longer play content. In the future, we will be improving Sawmill to detect more subtle changes so that we can detect and react to problems more quickly.
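A check of this kind could look something like the sketch below, which compares the current audience count with the median of counts seen at the same time on earlier days. The use of a median, the 70% threshold and the function name are all assumptions for illustration; they do not describe how Sawmill actually decides to alert.

```python
from statistics import median


def is_consumption_low(current, history, threshold=0.7):
    """Return True if `current` is below `threshold` times the usual level.

    `history` holds counts seen at the same time of day on earlier days.
    The median baseline and the 0.7 threshold are illustrative choices.
    """
    if not history:
        return False  # nothing to compare against, so do not alert
    baseline = median(history)
    return current < threshold * baseline


# Made-up numbers: the usual level is about 1,000 concurrent streams.
usual = [1000, 1100, 950]
alert_on_drop = is_consumption_low(500, usual)
alert_on_normal = is_consumption_low(900, usual)
```

A lower threshold produces fewer false alarms but only catches large drops, which is the trade-off described above.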

Bitrate Distribution

Another useful trend to monitor is the spread of iPlayer requests across the available stream quality options - we refer to this as the bitrate distribution. As with the consumption trend, these graphs are normally very regular, which makes them useful for spotting unusual changes.

Figure: Bitrate distribution graphs are useful for spotting unusual changes

Above is an example of the bitrate distribution for a particular subset of our video distribution. Some things to note are:

  • The daily pattern, which is somewhat different on the weekend
  • A change in the “roughness” of the lines in the graph since the morning of the 8th

In this case, the change in “roughness” can be an indication that media players may be automatically changing stream quality more often than normal, resulting in a poorer experience for the viewer. To verify this we would look at the data received by Sawmill.
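One way to put a number on both ideas is sketched below: the share of requests at each bitrate for a single time window, and a simple "roughness" score for one line of the graph, taken as the mean absolute change between consecutive samples. This particular metric and the function names are assumptions for illustration, not the measure we use in practice.

```python
def bitrate_shares(counts):
    """Turn request counts per bitrate into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {bitrate: 0.0 for bitrate in counts}
    return {bitrate: n / total for bitrate, n in counts.items()}


def roughness(series):
    """Mean absolute change between consecutive samples of one graph line."""
    if len(series) < 2:
        return 0.0
    steps = [abs(later - earlier) for earlier, later in zip(series, series[1:])]
    return sum(steps) / len(steps)


# Made-up data: requests per bitrate (kbps) in one time window.
shares = bitrate_shares({800: 1, 1500: 3})

# Made-up share of one bitrate over time, before and after a change.
smooth_line = [0.50, 0.51, 0.52, 0.53]
rough_line = [0.50, 0.60, 0.45, 0.58]
became_rougher = roughness(rough_line) > roughness(smooth_line)
```

A sustained rise in this score for one bucket would be a prompt to look more closely at the heartbeat data for that bucket.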

In the near future, we will be writing more about how Sawmill deals with half a million heartbeats per minute.

We're Hiring

If the type of work described in this post interests you, we're looking for experienced engineers to help us design, build and operate the BBC's media distribution infrastructure.
