Building a reactive file-based delivery workflow for Video Factory

Andrew Payne

Senior Software Engineer

The Platform Media Services team build and run Video Factory, the system which ingests, transcodes and delivers programme content onto BBC iPlayer. Andrew Payne, Senior Software Engineer, gives an update on the latest work undertaken by the team.

Video Factory is designed as a collection of microservice components - small independent pieces of software each responsible for a particular step in the overall workflow of processing an item of content. The components communicate with each other asynchronously using messages sent to and read from queues. When a component receives a message, it will perform some task on the item which the message is about, then typically send new messages to other downstream components. In other words, components react when the workflow needs something done.
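
As a rough illustration of this message-driven style, the sketch below models one component with an in-memory queue. The names (`Queue`, `makeComponent`, the `status` field) are hypothetical and stand in for a real queueing service; this is not Video Factory’s code.

```javascript
// Minimal in-memory stand-in for a message queue (hypothetical; a real
// system would use a managed queueing service instead).
class Queue {
  constructor() { this.messages = []; }
  send(message) { this.messages.push(message); }
  receive() { return this.messages.shift(); }
}

// A component reads one message from its inbox, performs its task on the
// item the message is about, then sends the result to each downstream queue.
function makeComponent(task, inbox, downstreamQueues) {
  return function processNext() {
    const message = inbox.receive();
    if (message === undefined) return false; // nothing waiting
    const result = task(message);
    downstreamQueues.forEach((queue) => queue.send(result));
    return true;
  };
}

// Usage: a "checker" component that marks an item as checked and passes
// it on to a hypothetical transcode queue.
const inbox = new Queue();
const transcodeQueue = new Queue();
const checker = makeComponent(
  (message) => ({ itemId: message.itemId, status: "checked" }),
  inbox,
  [transcodeQueue]
);

inbox.send({ itemId: "item-1" });
checker();
console.log(transcodeQueue.messages);
```

The component knows nothing about who sent the message or who will read its output, which is what lets each piece be deployed and scaled independently.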

Deploying Video Factory to the cloud has allowed it to be reactive in other ways. Many workflows begin when a broadcast ends, and when several end at the same time the increased workload could previously have caused delays in content reaching iPlayer. Now, though, it’s easy to expand our computing infrastructure at these busy times and reduce it again when there’s less to do. This flexibility also means we can be resilient against any particular server or network connection breaking down.

Video Factory ingests raw content in several different ways, one of which is file-based delivery (FBD). When one of our suppliers uploads video files to the S3 storage service, S3 sends a notification message to a component known as Dipper. Dipper’s job is to check whether the file meets our criteria for ingest, extract DPP metadata from it (hence “DiPPer”), save that metadata for future use, and finally send a message downstream to the component responsible for actually ingesting the video. DPP - the Digital Production Partnership - is a joint effort by several UK broadcasters to define a set of standards, including ones for video metadata.
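
To make the first step concrete, here is a minimal sketch of pulling the bucket and object key out of an S3 event notification. The field layout follows S3’s documented notification format, but the bucket name, file name and the `describeUploads` function are made up for illustration, and how Dipper actually reads its notifications is not shown in this post.

```javascript
// Extract the bucket name, object key and size from an S3 event notification.
// S3 URL-encodes object keys in notifications, with spaces sent as "+".
function describeUploads(event) {
  return event.Records.map((record) => ({
    bucket: record.s3.bucket.name,
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
    size: record.s3.object.size,
  }));
}

// Hypothetical notification, trimmed to the fields used above.
const sampleEvent = {
  Records: [
    {
      eventName: "ObjectCreated:Put",
      s3: {
        bucket: { name: "example-delivery-bucket" },
        object: { key: "supplier-a/My+Programme.mxf", size: 1024 },
      },
    },
  ],
};

const uploads = describeUploads(sampleEvent);
console.log(uploads);
```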

Within Dipper, the stages in its processing form a dependency tree, with each step using results from one or more earlier steps to provide its result. This led us to extend our “reactive” approach to encompass not only the way components communicate with each other but also the way an individual component works.

Our development of FBD coincided with AWS Lambda becoming available as a service. Lambda allows code to be run directly in response to a notification, without us having to boot up and manage a server for the code to run on. While Lambda does now support Java, the language in which most of Video Factory has been written, at the time it required the use of Node.js.
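
For readers who haven’t used Lambda, a handler is just a function that receives the notification as its argument. The sketch below uses the `async` handler style supported by current Node.js runtimes, which may differ from what was available when Dipper was written; `processRecord` is a hypothetical placeholder for the real work.

```javascript
// Sketch of a Node.js Lambda handler. In a deployed function it would be
// exported, e.g. `exports.handler = handler;` (left out here so the sketch
// runs on its own).
async function handler(event) {
  const records = event.Records || [];
  // Handle every record in the notification; the calls run concurrently.
  const results = await Promise.all(records.map(processRecord));
  return { processed: results.length };
}

// Hypothetical placeholder: a real implementation would inspect the file.
async function processRecord(record) {
  return record.eventName;
}

// Local invocation with a made-up event.
const invocation = handler({ Records: [{ eventName: "ObjectCreated:Put" }] });
invocation.then((summary) => console.log(summary));
```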

Often, programs boil down to a series of imperative instructions executed one after another. In Node.js, though, I/O operations are generally asynchronous, meaning that the result isn’t available straight away. To get the result, you have to register a callback: a function which is to be run when the operation completes. If you have several I/O operations to do, JS code can end up resembling the so-called Pyramid of Doom.

stepA(function (value1) {
    stepB(value1, function(value2) {
        stepC(value2, function(value3) {
            stepD(value3, function(value4) {
                // Do something with value4
            });
        });
    });
});

This quickly gets messy, especially if several steps actually depend on the same value.

Since Dipper does a lot of I/O, we want to avoid a nested callback design. Instead, we just want to define how each piece of data can be obtained, and what the data dependencies are between the various processing steps.

A Node.js library called Q implements a feature called promises, which makes this possible. A promise is an object that represents the return value that a function (say, f1) will eventually provide. In JS, functions are also objects and can be attached to promises such that when the promise is “resolved” (i.e. the return value becomes known) any attached functions will be invoked with that value. Moreover, attaching a function (f2) to a promise gives you a new promise: a promise of the return value of f2.
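
The f1 and f2 relationship can be sketched in a few lines. As an assumption made so the example runs without dependencies, it uses the `Promise` built into modern JavaScript rather than Q; the values and the 10 ms delay are arbitrary.

```javascript
// f1 returns a promise of a value that only becomes known later.
function f1() {
  return new Promise((resolve) => setTimeout(() => resolve(20), 10));
}

// f2 is an ordinary function. Attaching it to a promise with then()
// yields a new promise of f2's return value.
function f2(value) {
  return value + 1;
}

const promiseOfF1 = f1();
const promiseOfF2 = promiseOfF1.then(f2);

// More than one function can be attached to the same promise.
const promiseOfDouble = promiseOfF1.then((value) => value * 2);

const done = Promise.all([promiseOfF2, promiseOfDouble]).then((results) => {
  console.log(results); // [21, 40]
  return results;
});
```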

An example of parsing some XML might look like this:

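var Q = require('q');

var promiseOfParser = buildParser(); // resolves to a parser
var promiseOfText = readFile();      // resolves to the XML text
// runParserOnText is called with [parser, text] once both have resolved
var promiseOfObject = Q.all([promiseOfParser, promiseOfText]).then(runParserOnText);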

We can attach as many functions to one promise as we need, and combine promises together into new ones (as with Q.all, above) where a function takes more than one input, like our runParserOnText function here. And so long as the dependencies are satisfied, we don’t care about the order in which steps are executed. When Lambda’s Node.js interpreter runs the code, independent I/O operations can be in flight at the same time, so one step doesn’t have to wait for an unrelated step to finish.

The main module of Dipper’s code is solely concerned with the wiring-together of promises into a dependency tree. When the final promise in the tree is resolved, the whole job has been done. Other modules concern themselves with providing specific bits of data, like the contents of a downloaded file, or the response from a message queue.
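
A wiring module in this style might look something like the sketch below. Every step name, value and delay here is hypothetical, chosen only to mirror the stages described earlier (check the file, extract metadata, save it, notify downstream); it is not Dipper’s actual code, and it uses the built-in `Promise` rather than Q so that it runs without dependencies.

```javascript
// Each hypothetical step returns a promise and records when it finished,
// with a timer standing in for real I/O.
const log = [];
function step(name, value, delayMs) {
  return new Promise((resolve) =>
    setTimeout(() => { log.push(name); resolve(value); }, delayMs));
}

const downloadFile = () => step("download", "<file bytes>", 20);
const loadCriteria = () => step("criteria", { maxSize: 100 }, 5);
const checkFile = ([file, criteria]) =>
  step("check", file.length <= criteria.maxSize, 5);
const extractMetadata = (file) => step("extract", { title: "Example" }, 5);
const saveMetadata = (metadata) => step("save", metadata, 5);
const notifyDownstream = ([ok, saved]) =>
  step("notify", { ok: ok, title: saved.title }, 5);

// The wiring: each line states only what a step depends on.
const promiseOfFile = downloadFile();
const promiseOfCriteria = loadCriteria();
const promiseOfCheck =
  Promise.all([promiseOfFile, promiseOfCriteria]).then(checkFile);
const promiseOfMetadata = promiseOfFile.then(extractMetadata);
const promiseOfSaved = promiseOfMetadata.then(saveMetadata);
const promiseOfJob =
  Promise.all([promiseOfCheck, promiseOfSaved]).then(notifyDownstream);

// When the final promise resolves, the whole job is done.
promiseOfJob.then((result) => console.log(result, log));
```

Note that the downloaded file feeds two separate steps, and that the download and the criteria lookup start together; neither needs any special handling beyond declaring the dependency.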

When in the future we want to build workflows for new suppliers, it’s likely we’ll want to reuse some of the steps inside Dipper, perhaps adding or removing some others, or changing some dependencies. Having our code structured this way will help make that reuse a reality.
