Technology + Creativity at the BBC Feed

Introducing machine-based video recommendations in BBC Sport

Robert Heap — Tue, 14 Sep 2021 12:07:03 +0000

From this week we are adding a new feature to our short-form video pages on the BBC Sport website.

Related clips

On every video page in BBC Sport you’ll see a related links section. This is usually put together by our editorial colleagues, a routine task that can be time-consuming. They have good knowledge of related content but cannot know about everything, which means that the audience may not see some content that could be relevant.

Short Form Video & Datalab

With this in mind, we have worked with Datalab (our in-house BBC machine learning specialists) to create an algorithm-based video recommendations engine, which we hope will help our audience see more of the content they love while reducing the editorial overhead of creating a set of relevant links.

Algorithm-based recommendations

The engine works by combining content information about each clip with information about user journeys from across the BBC. Combining multiple sources should provide a more relevant list of videos for our audience to watch next. This is the first cross-product engine supporting user journeys across News and Sport, which means that you may see news, sport or a combination of both in the module. This is the first version of the short-form video recommender, and there will be more improvements to come as we continue to develop it.
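The post doesn't describe the engine's internals. As a rough illustration of "combining content information with user journeys", here is a minimal Python sketch that blends tag overlap between clips with how often clips co-occur in the same user journey. The `alpha` weighting, the Jaccard similarity and all clip IDs are assumptions for illustration, not the Datalab algorithm.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Share of tags two clips have in common, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_view_counts(journeys: list, seed: str) -> Counter:
    """Count how many journeys contain both the seed clip and each other clip."""
    counts = Counter()
    for journey in journeys:
        if seed in journey:
            for clip in set(journey) - {seed}:
                counts[clip] += 1
    return counts


def recommend(seed: str, tags: dict, journeys: list, alpha: float = 0.5, k: int = 3) -> list:
    """Rank clips by alpha * content similarity + (1 - alpha) * normalised co-views."""
    co = co_view_counts(journeys, seed)
    max_co = max(co.values(), default=0) or 1  # avoid dividing by zero
    scores = {}
    for clip, clip_tags in tags.items():
        if clip == seed:
            continue
        content = jaccard(tags[seed], clip_tags)
        behaviour = co[clip] / max_co
        scores[clip] = alpha * content + (1 - alpha) * behaviour
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


# Hypothetical clips: two sport clips and one news clip.
tags = {
    "nfl-1": {"nfl", "american-football", "chiefs"},
    "nfl-2": {"nfl", "american-football", "bills"},
    "news-1": {"us-politics"},
    "f1-1": {"formula-1"},
}
journeys = [["nfl-1", "news-1"], ["nfl-1", "news-1"], ["nfl-1", "nfl-2"]]
print(recommend("nfl-1", tags, journeys, alpha=0.7, k=2))  # ['nfl-2', 'news-1']
```

Lowering `alpha` lets journey data outweigh tag overlap, which is how a news clip can surface in a sport clip's module in a cross-product engine like this.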

Launching the new recommendations

The plan is to launch this new functionality on all American Football clips in the week commencing 13 September, to give the engine a trial with freshly published content and to give us the opportunity to measure its impact. Provided all is well, we will gradually release this feature across all BBC Sport videos. After that, we will begin to roll out the same engine for BBC News. Beyond that, we will continue to work with editorial colleagues to improve it over the coming months.

If you have any feedback on this new video experience, please leave your comments below.

Philip 21 - an interactive story exploring race, love and modern Britain

Joey Amoah — Tue, 20 Jul 2021 13:12:44 +0000

Philip 21 is a brand new narrative object-based media (OBM) experience that takes the premise of a date with a young black man and turns it into an introspective examination of race, love and modern Britain. In this blog post, we look at the techniques and mechanics that underpin this and other branching narrative experiences, and at how they keep audiences engaged compared with traditional media.

In this regard, two specific areas of the project are worth examining: the fabula and syuzhet of the authored experience, and the dual narrative created by making the outcome of the story that each audience member sees depend on the choices they make.

BBC Taster - Try Philip 21

The 'fabula' is a literary term for the raw material of a story: what takes place and the chronology of events. It describes the skeleton of the experience, the story beats and the audience journey. The 'syuzhet' describes how the story is organised and presented to audiences. It covers everything from the perspective the story is told from to the arrangement of actors and the cinematography of a scene.

In traditional media, the audience is passive and cannot interact with what is being shown. Storytellers may play with the audience journey and how the story is presented, but they never surrender control of the constituent elements, and they do not offer alternative, equally valid branches. halucid_ have had to wrestle with the challenge of making the fabula and syuzhet work together while also giving up a degree of creative control.

The fabula and syuzhet of the authored experience

In the case of Philip 21, the fabula could be described as broadly linear. The audience arrives for a date with Philip; they engage in conversation with him and are ultimately asked whether they would like a second date. However, this simple sequence is not what the audience receives. What they do receive is a fragmented series of scenes in which progress can only be made through user choices and engagement. halucid_ plays with the fabula and uses the narrative setting, the conversational structure and the first-person perspective to drive the experience forward. Philip 21 takes the narrative setting, that of a date, and uses it to establish the boundaries of the world and to tell us how we should behave. Since many of us will be familiar with this situation from our own lives, halucid_ leans on our understanding of social norms to get audiences to participate in the way it intends. Building on this, halucid_ uses the codes and conventions of conversation to create an internal metre that demands our engagement. It is only by responding to Philip’s questions and internalising his responses that the narrative can advance. In so doing, it asks the viewer to suspend their disbelief and enter the story world. Finally, the first-person perspective means that the audience is always focused on the subject, unable to look away, and the fixed camera position creates a sense of intensity, intimacy and immediacy, which is further heightened by the one-to-one interaction with Philip.

These creative decisions move the narrative forward and straddle the line between the fabula and how the authored experience is presented. This is particularly important in a narrative OBM experience, because content creators and audiences are jointly responsible for how the story unfolds.

Philip 21 can be navigated in several different ways, with each choice offering a different route through the experience. These routes were created by halucid_, but audiences are free to choose which paths to follow. They can choose a path from the outset or change course at any point, meaning that the syuzhet presented is unique to each viewer. Philip 21 has no primary path, and as a result, all routes through the experience and all outcomes are equally valid.
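One way to picture "no primary path" is as a directed graph of scenes in which every route to an ending is an equally valid syuzhet. The sketch below enumerates every route through a tiny, invented story graph; the scene names and choices are illustrative assumptions, not Philip 21's actual structure.

```python
# Hypothetical scene graph: each scene maps a choice label to the next scene.
# Scenes with no choices are endings.
STORY = {
    "arrive": {"be yourself": "sincere", "play a role": "persona"},
    "sincere": {"yes to a second date": "end_yes", "no to a second date": "end_no"},
    "persona": {
        "drop the act": "sincere",
        "yes to a second date": "end_yes",
        "no to a second date": "end_no",
    },
    "end_yes": {},
    "end_no": {},
}


def routes(graph: dict, scene: str = "arrive", choices: tuple = ()):
    """Yield (choices made, ending reached) for every route through an acyclic story graph."""
    if not graph[scene]:
        yield list(choices), scene
        return
    for choice, next_scene in graph[scene].items():
        yield from routes(graph, next_scene, choices + (choice,))


for made, ending in routes(STORY):
    print(" -> ".join(made), "=>", ending)
```

Each printed line is one viewer's syuzhet; none is privileged over the others, and the same ending can be reached by different routes.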

The dual narrative

The second element of Philip 21 worth discussing is how it creates a dual narrative. On the one hand, Philip 21 is an authored experience, a story that halucid_ is seeking to tell; on the other, there is the audience experience, and what the audience brings to and takes away from it. The audience is integral to delivering Philip 21: by taking part, they are positioned not as passive watchers but as co-protagonists alongside Philip. The viewer becomes a character in the story and must decide how to approach the situations it presents.


The viewer must decide whether to approach the work sincerely, as their authentic self, or to assume a persona and play a role that goes against type. This choice, whether conscious or unconscious, determines where the story goes, so what is presented back to the audience is a reflection of the choices they made. Philip 21 could therefore be considered as having two narratives: the authored story, and the one reflected back to the audience based on how they interact with the experience.

What comes next

Narrative OBM experiences are still in their infancy, and we are only beginning to understand the impact of content creators handing control to audiences. Altering the syuzhet of a story presents a wide array of creative opportunities for creators and audiences, but these choices need to work in tandem with the fabula rather than distract from the story. We are unsure what works best and are eager to see further experiments in this area.

Likewise, the interplay between the authored experience and the one reflected back at audiences is something we are keen to explore in more detail. What would happen if additional choices were offered to audiences? What would occur if a story was told episodically and not in one session? How would audiences experience this? The only way to find out is to build and test these kinds of experiences.

Building a WebAssembly Runtime for BBC iPlayer and enhanced audience experiences

Juliette Carter — Mon, 01 Mar 2021 10:19:41 +0000

At BBC Research & Development, we are investigating how to evolve our current multimedia applications beyond video by using object-based media (OBM). OBM allows us to develop future audience experiences that are immersive, interactive and personalised.

There is an ever-increasing number and range of audience devices capable of playing back OBM experiences. The challenge we now face is universal access: how can we enable all members of the audience to enjoy OBM experiences on any device, and how do we do this sustainably and at minimal cost?

Our Render Engine Broadcasting (REB) project is investigating new technologies that will allow the BBC to deliver these OBM experiences at scale to all of our audiences, no matter what device they use. Our ultimate goal is to deliver real-time, fully rendered experiences on any device or platform, writing the software to do it only once. We have been investigating the use of WebAssembly as a cross-platform technology for this.

What is WebAssembly?

WebAssembly (wasm) is a universal binary format designed as a sandboxed environment and a portable compilation target, which means that the same wasm module can run securely on multiple platforms. A number of statically typed languages, such as C/C++, Rust and AssemblyScript, can compile to WebAssembly, making it language agnostic. This makes it attractive for industry adoption, as it enables developers to use languages they already know to produce wasm binaries.

When WebAssembly was first developed a few years ago, its target platform was the web. The aim was to compile fast, efficient system-level code and run it in the browser, so that compute-intensive applications, such as real-time interactive rendered graphics, could run in a web browser at near-native performance. This also enabled some native applications to be ported to the web, increasing their reach and usage. Examples include an application that renders 3D representations of satellite imagery in the browser, and a CAD package that now offers a web app for creating and editing CAD drawings.

In the last couple of years, WebAssembly outside the browser has been gaining traction. A number of native wasm runtimes have been developed, enabling the use of WebAssembly for microservices and server applications. In 2018, a website security company announced the use of WebAssembly on its edge workers, allowing users to deploy secure, fast serverless code compiled to wasm, and an edge cloud platform provider offers wasm-based edge computation using its native runtime, Lucet.

The portability of WebAssembly across multiple platforms and its security model are the key reasons for BBC R&D’s interest in using this technology as a compilation target for media experiences. As a public service broadcaster, we need to deliver value to all of our audiences, regardless of the device they use. Traditionally, each target platform would require its own codebase and a different team to maintain it; using WebAssembly potentially allows for a much more sustainable developer ecosystem. It enables media software applications to be created once, from a single codebase, compiled to WebAssembly and deployed on any client or server platform depending on the capabilities required. It also offers numerous advantages over previous multimedia or cross-platform technologies (such as Flash or the Java Runtime Environment): it is language agnostic, security-focused, has predictable performance, and works both inside and outside the browser. WebAssembly is also an open standard, which encourages its adoption.

How have we used WebAssembly?

We wanted to demonstrate how WebAssembly could be used to deliver media experiences that run on many target platforms from a single codebase. To do that, we implemented an example media application in C++ and compiled it to WebAssembly, giving us a wasm module. We designed this application to look like a version of BBC iPlayer, allowing users to select content, watch video programmes and play OBM experiences. We call this application the Single Service Player (SSP).

An example of how object-based media experiences could appear within BBC iPlayer.

To run our SSP wasm module, we needed a wasm runtime. The SSP makes use of some low-level media functionality, which isn’t scoped by the WebAssembly specification. To enable wasm modules to make use of these low-level multimedia capabilities, they need to be implemented in the runtime and made available to the wasm module through a set of imports. Examples of such capabilities include:

Windowing and rendering — In most cases, a multimedia application will have some graphical elements to it, which requires things to be drawn in a window (such as video frames or a UI screen).
User inputs — An interactive multimedia experience expects user inputs, such as keyboard or mouse events.
Media encoding and decoding — To efficiently encode and decode media (such as video frames or audio packets), it is preferable to use the host’s hardware resources where possible.

As no existing WebAssembly runtime offers these media capabilities, we decided to create our own.

There are already some efforts to specify ways in which a wasm module can talk to the host. One such proposal defines a set of standardised POSIX-like syscalls (the programmatic way in which a computer program communicates with the host system) for libc functionality, mainly file handling and networking. These are called from the wasm module and implemented in the runtime.

We decided to use a similar approach to allow our SSP wasm module to communicate with the host, enabling it to have access to low-level media functionality. This involved identifying all the platform-specific media capabilities that could not be compiled to wasm and implementing them in the runtime. These capabilities were then made accessible to the wasm module through a set of platform-independent syscalls passed as imports.

This figure illustrates the whole process, from writing a media experience as software (such as the SSP) to running it as a wasm module on any device. The steps are detailed below.

The first step was to design the multimedia syscall API behind which we would implement our cross-platform multimedia capabilities in the runtime. It needed careful consideration to ensure it was thread-safe and honoured the wasm security requirements around memory access. In the figure above, we use reb_decode_video() as an example syscall, which our SSP application can use to access low-level multimedia functionality, such as using the system’s hardware for video decoding.
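The post doesn't give the signature of reb_decode_video(). To illustrate the two requirements named above, bounds-checked access to the module's memory and thread safety, here is a Python sketch of the host side of such a syscall. The (pointer, length) calling convention, the error codes, the import module name "reb" and the `MediaHost` class are all assumptions; a real runtime would register the function with its wasm engine rather than return a dict.

```python
import threading

ERR_OUT_OF_BOUNDS = -1     # a pointer range falls outside the module's memory
ERR_BUFFER_TOO_SMALL = -2  # the decoded frame does not fit the output buffer


class LinearMemory:
    """Stand-in for a wasm module's linear memory: one flat, bounded byte array."""

    def __init__(self, size: int):
        self.data = bytearray(size)

    def in_bounds(self, ptr: int, length: int) -> bool:
        return ptr >= 0 and length >= 0 and ptr + length <= len(self.data)


class MediaHost:
    """Host side of a hypothetical reb_decode_video(in_ptr, in_len, out_ptr, out_cap)."""

    def __init__(self, memory: LinearMemory, decoder):
        self.memory = memory
        self.decoder = decoder        # platform-specific decoder (hardware-backed in practice)
        self.lock = threading.Lock()  # serialise calls into the shared decoder

    def reb_decode_video(self, in_ptr: int, in_len: int, out_ptr: int, out_cap: int) -> int:
        mem = self.memory
        # The module passes offsets into its own memory; never trust them.
        if not (mem.in_bounds(in_ptr, in_len) and mem.in_bounds(out_ptr, out_cap)):
            return ERR_OUT_OF_BOUNDS
        packet = bytes(mem.data[in_ptr:in_ptr + in_len])
        with self.lock:
            frame = self.decoder(packet)
        if len(frame) > out_cap:
            return ERR_BUFFER_TOO_SMALL
        mem.data[out_ptr:out_ptr + len(frame)] = frame
        return len(frame)             # bytes written, as seen by the module

    def imports(self) -> dict:
        """Import table a runtime would link against the module's declared imports."""
        return {"reb": {"reb_decode_video": self.reb_decode_video}}


mem = LinearMemory(64)
mem.data[0:4] = b"\x01\x02\x03\x04"
host = MediaHost(mem, decoder=lambda packet: packet * 2)  # fake "decoder" for the demo
decode = host.imports()["reb"]["reb_decode_video"]
print(decode(0, 4, 16, 16))  # 8
print(decode(60, 8, 0, 8))   # -1: the input range runs past the end of memory
```

The design choice illustrated is that the module only ever hands the host offsets into its own sandboxed memory, so every syscall must validate them before reading or writing.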

Our SSP code was compiled to wasm using the clang compiler and the wasi-sdk toolchain, and the required syscalls were added as imports to the wasm module.

We then built the multimedia wasm runtime, which consists of two parts. The first is the execution environment for wasm modules, which allows us to load and run a wasm module. For this, we embedded an existing wasm runtime project, built on a code generator that compiles the wasm binary to machine code for the target platform.

The second part of our runtime is the implementation of the low-level multimedia functionality. For this, we created a cross-platform C++ library with input detection, networking, windowing, graphical rendering, and media decoding, which sits behind our carefully designed syscall APIs. We compiled our library for several target platforms, such as Linux, macOS, Windows, Raspberry Pi and Android. We also wrote some glue code to connect the two parts.

Where do we go from here?

A wasm runtime capable of executing multimedia applications opens up many possibilities, principally around flexible compute. Flexible compute allows us to run computationally demanding applications by dividing the workload between available resources. These resources could be located locally (a laptop, games console or phone in your house), at the edge, or in the cloud.

As we move towards delivering fully rendered real-time interactive experiences, the flexible compute approach becomes an attractive solution to the computational demands of such applications. We could, for example, consider segmenting a rendered frame into several tiles or objects, each of those rendered on a separate available compute resource. Many systems approach this problem by running specific compute tasks in containers across the available devices and platforms. We hope to use our work and accrued knowledge in developing the wasm multimedia runtime to investigate a viable alternative to the container approach for distributed media applications. We are looking into using wasm modules to perform secure and fast computation on any remote compute nodes.
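To make the tiling idea concrete, here is a minimal Python sketch that splits a frame into tiles and spreads them across compute nodes in proportion to an assumed relative capacity. The node names, capacity figures and greedy placement rule are illustrative assumptions, not a description of the R&D system; a real scheduler would also weigh the network latency to each node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self) -> int:
        return self.w * self.h


def split_frame(width: int, height: int, size: int) -> list:
    """Cut a width x height frame into tiles of at most size x size pixels."""
    return [
        Tile(x, y, min(size, width - x), min(size, height - y))
        for y in range(0, height, size)
        for x in range(0, width, size)
    ]


def assign(tiles: list, capacity: dict) -> dict:
    """Greedy placement: each tile goes to the node whose load/capacity stays lowest."""
    load = {node: 0 for node in capacity}
    plan = {node: [] for node in capacity}
    # Place the largest tiles first so the small edge tiles even things out.
    for tile in sorted(tiles, key=lambda t: t.area, reverse=True):
        node = min(capacity, key=lambda n: (load[n] + tile.area) / capacity[n])
        plan[node].append(tile)
        load[node] += tile.area
    return plan


tiles = split_frame(1920, 1080, 512)
plan = assign(tiles, {"games-console": 2, "laptop": 1, "edge-node": 4})
for node, assigned in plan.items():
    print(node, len(assigned), sum(t.area for t in assigned))
```

Each node would then render its tiles (for example as a wasm module) and return them to the client for compositing.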

Our runtime, capable of performing media services such as rendering and decoding or encoding of rendered video frames, can be used to display the final experience to the user on a client device and to execute the remote computational tasks as wasm modules. Using WebAssembly combined with a flexible compute approach, we hope to develop technology that allows the audience to access any future experience, regardless of their devices at home.

The complexities of creating a new 'follow topic' capability

Dave Lee — Wed, 13 Jan 2021 13:49:58 +0000

Exploring how the visual emphasis of our Follow UI works alongside other calls to action.

The BBC is committed to creating a truly personalised experience for everyone. We believe that a more personal BBC helps us build a deeper connection with our audiences, and helps us get more value from our online services. We have so much to offer across our products and services, and one of our challenges is helping our audience discover the content we have.

We have a number of approaches to achieve this but alongside automated personalisation like smarter recommendations, we want to enable our audience to tell us what they want to see more of. So, we’re exploring the idea of enabling audiences to follow topics, a new capability that could work as a service for all of our teams and products.

This consistent and collaborative approach is not only more efficient, but would also allow users’ preferences to follow them wherever they are on the BBC. This blog post sets out the complexities of creating this new follow capability.

Context

The BBC creates thousands of different content items on a daily basis, ready to be consumed by people in the UK and all around the world. But how might we improve the discoverability of that content? How might we offer new content to those who are interested in it?

The BBC team started working on the idea of ‘topics’ in summer 2019. Working with colleagues from around the BBC, we have developed the technology to present a Pan-BBC topic to the audience.

Very early in the work on topics, we recognised an opportunity to harness the power of topics by allowing the audience to choose which topics they personally find interesting, so that we can personalise their experience of the BBC and notify them when relevant content is available. Examples of this already exist within the BBC, such as the My Sport functionality in the BBC Sport app, subscribing to podcasts in BBC Sounds and adding a programme to your favourites in BBC iPlayer.

These earlier implementations were built in a product-specific way, but we are building ‘follow’ in a unified way, aiming for a seamless audience experience and technical approach across all products. Behind the scenes, the mechanism for all these types of ‘following’ is common. How might we create an experience that harnesses all these ‘follow’ events to aid content discovery, putting new and relevant content in front of our audiences quickly and efficiently?

Principles

For further background, read Matthew Clark’s blog post.

Platform Consistency

All new developments will take place within WebCore, the BBC’s existing web technology platform, unless that is not technically possible. WebCore uses a mono-repository, and because all the teams can access everyone’s code, it becomes much easier to be consistent across all our teams’ work. The graphical components are all stored in a central repository and are available to all teams to use and integrate into their own developments.

Reuse

All existing solution building blocks need to be taken into account and reused where applicable. If anything new is developed, it must be straightforward for other teams to utilise or develop further to meet their own needs.

Collaborate

Collaboration with other teams is key to this project’s success. This work is being carried out within a complex organisation and as such, numerous teams and colleagues will be stakeholders in it. Openness and a readiness to collaborate will pay dividends later.

The Key Considerations and Complexities:

What concepts should the audience be able to follow?

We have chosen to do this, initially, at the topic level. For example, audience members will be able to follow their favourite musical artists, news correspondents or even locations. As topics represent the concepts the BBC creates content about, and new content will keep being added to them, they feel like a natural fit. Individual content items can be revised after publication (with minor or major edits) or as a story develops, but this feels too granular a level for a user to follow specifically; a save or read-later function is more appropriate. To ‘follow’ an article, for example, is really a ‘read later’ function.

We need to understand how a ‘follow’ button should operate and how it changes state based on the current context it’s displayed within.

We need to consider whether the audience member is already following a topic and, therefore, whether we need to offer the ability to unfollow it.
We also need to consider preferences we’ve already been given for other follow-like implementations and if it’s appropriate or not to use that data.
It might also be applicable to present a follow count to indicate how popular a particular topic might be.
We will also make the button mechanism available to our app ecosystem to keep the mechanism consistent across all our platforms.
We need to build the logical components in such a way that they will operate in the same way regardless of where the follow button is presented.
For example, if the follow button is offered on a Pan-BBC topics page like Manchester or Mark D’Arcy, we must ensure the behaviour is consistent and intuitive.
How will it work within accessibility guidelines?
How will it work for our apps?
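A single shared component keeps button behaviour consistent wherever it appears. This minimal Python sketch, using a hypothetical in-memory `FollowStore` rather than any real BBC service, shows follow and unfollow as idempotent operations and derives the button's label, next action, accessible pressed state and follower count from one source of truth.

```python
from collections import defaultdict


class FollowStore:
    """Hypothetical in-memory stand-in for a shared follow service."""

    def __init__(self):
        self._topics_by_user = defaultdict(set)
        self._follower_counts = defaultdict(int)

    def follow(self, user: str, topic: str) -> None:
        # Idempotent: following twice (e.g. from two pages) counts once.
        if topic not in self._topics_by_user[user]:
            self._topics_by_user[user].add(topic)
            self._follower_counts[topic] += 1

    def unfollow(self, user: str, topic: str) -> None:
        if topic in self._topics_by_user[user]:
            self._topics_by_user[user].remove(topic)
            self._follower_counts[topic] -= 1

    def button_state(self, user: str, topic: str) -> dict:
        """Everything a web or app button needs to render, in one place."""
        following = topic in self._topics_by_user[user]
        return {
            "label": "Following" if following else "Follow",
            "next_action": "unfollow" if following else "follow",
            "aria_pressed": following,  # toggle-button state for screen readers
            "follower_count": self._follower_counts[topic],
        }


store = FollowStore()
store.follow("user-1", "manchester")
store.follow("user-1", "manchester")
store.follow("user-2", "manchester")
print(store.button_state("user-1", "manchester"))
```

Because web and app clients would read the same state, a button on a topic page such as Manchester behaves the same everywhere it is shown.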

Legal constraints and how this will operate within the law

Considering GDPR and previous preferences provided to us by our audience members, can this new capability function within the law and if not, what must we do to ensure we are legally compliant and transparent to our audience?

How will this operate within the overall ecosystem of the BBC Online estate?

The BBC Online estate is complex and ever-shifting. We must ensure we abide by open standards and reuse existing services to ensure a good fit today and in the future. We must also collaborate with teams internally to ensure our internal stakeholders are fully aware of our progress.

How will we ensure we don’t create echo chambers for audiences?

We need to carefully consider how we present followed content, so that it is presented coherently alongside both recommendations and related content.
Followed content will be shown next to editorially curated selections, so we will also need to consider how we deduplicate and prioritise content.
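One simple way to handle both deduplication and prioritisation is a priority merge with a cap on followed content, which also limits how far followed topics can dominate a page. The ordering (editorial picks first) and the cap are assumptions for illustration, not a decided BBC policy.

```python
def merge_feed(curated, followed, recommended, limit=8, max_followed=3):
    """Build one list from three sources in priority order.

    Editorial picks come first, then followed-topic items (capped so they
    cannot crowd out everything else), then recommendations. An item already
    placed by a higher-priority source is skipped, so nothing appears twice.
    """
    feed, seen = [], set()
    sources = [
        ("curated", curated, limit),
        ("followed", followed, max_followed),
        ("recommended", recommended, limit),
    ]
    for source, items, cap in sources:
        taken = 0
        for item in items:
            if len(feed) == limit or taken == cap:
                break
            if item in seen:
                continue
            feed.append((item, source))
            seen.add(item)
            taken += 1
    return feed


print(merge_feed(["a", "b"], ["b", "c", "d", "e"], ["a", "g", "h"], limit=6, max_followed=2))
```

Here "b" is kept as an editorial pick rather than repeated as followed content, and only two followed items are admitted, leaving room for recommendations.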

How do we maintain adequate levels of data to be functional and keep that minimal dataset secure?

We have a very robust data security process to satisfy. At every design step, security of the data is a key requirement.

What data do we need to record to adequately record a Follow action?

This data must be easy to parse and follow our own internal standards and schemas.
This data format must be compatible with the User Activity Service — our centralised location for storing user activity data — and be registered with it.
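The post doesn't publish the schema. As a sketch under stated assumptions, here is one minimal Follow record in Python: a pseudonymous user ID, a topic ID, the action, a UTC timestamp and the product where it happened, validated on creation and serialised to JSON. The field names are hypothetical, not the User Activity Service's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

ALLOWED_ACTIONS = {"follow", "unfollow"}


@dataclass(frozen=True)
class FollowEvent:
    user_id: str    # pseudonymous identifier, never a name or email address
    topic_id: str   # stable identifier of the followed topic
    action: str     # "follow" or "unfollow"
    timestamp: str  # ISO 8601 in UTC
    source: str     # product where the action happened, e.g. "sport-web"

    def __post_init__(self):
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        if not self.user_id or not self.topic_id:
            raise ValueError("user_id and topic_id are required")

    def to_json(self) -> str:
        # Sorted keys give a stable, easily parsed serialisation.
        return json.dumps(asdict(self), sort_keys=True)


def make_event(user_id: str, topic_id: str, action: str, source: str,
               now: Optional[datetime] = None) -> FollowEvent:
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return FollowEvent(user_id, topic_id, action, ts, source)


event = make_event("u-123", "topic-manchester", "follow", "sport-web",
                   now=datetime(2021, 1, 13, 12, 0, tzinfo=timezone.utc))
print(event.to_json())
```

Keeping the record to these few fields reflects the aim of holding only the minimal dataset needed for the feature to work.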

Where is the result of following one or more topics displayed? i.e. how are followed topics surfaced?

This is a key question for any product team. The homepage team will have different needs from, for example, the Sport team, but the underlying system must be able to cope with this as efficiently as possible.
A key consideration is how much we should develop for each delivery. We could work for a long period and deliver in a more big-bang approach, or it may make more sense to develop in small iterations, accepting a changing interface and functionality for our audience.

How will this level of personalisation work within the WebCore stack’s caching strategies?

This is less of a concern for our team as we’ll be using the outcome rather than heavily contributing to it. However, it is of great importance and so we will keep in close touch with the team working on it to ensure no assumptions are made about what is needed.

Do we have the technical capability to process a potentially complex set of follow activities?

Our BBC Data Capabilities Team is developing specific technology based on AWS Elasticsearch for this very purpose.
This system will capture metadata about our content from various internal sources to allow us to query that metadata in novel ways to find the content we need on a per user basis.
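As an illustration of querying that metadata per user, this Python function builds an Elasticsearch query DSL body (a `bool` filter with `terms` and `range` clauses, an `ids` exclusion and a `sort` on date) for recent content tagged with any of a user's followed topics. The index fields `topics` and `published` are assumptions; the post doesn't describe the Data Capabilities Team's mapping.

```python
import json


def followed_content_query(followed_topic_ids, exclude_ids=(), since="now-7d", size=20):
    """Elasticsearch query DSL body: recent content tagged with any followed topic, newest first."""
    query = {
        "bool": {
            "filter": [
                # Match documents whose (assumed) "topics" field holds any followed topic ID.
                {"terms": {"topics": sorted(followed_topic_ids)}},
                # Date math relative to now; "published" is an assumed date field.
                {"range": {"published": {"gte": since}}},
            ]
        }
    }
    if exclude_ids:
        # Drop items already shown elsewhere on the page.
        query["bool"]["must_not"] = [{"ids": {"values": list(exclude_ids)}}]
    return {"query": query, "sort": [{"published": {"order": "desc"}}], "size": size}


print(json.dumps(followed_content_query({"topic-manchester", "topic-mark-darcy"}), indent=2))
```

Using `filter` rather than `must` keeps the topic match as a yes/no condition, leaving the ordering to the explicit date sort.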

How will the resulting follow actions be surfaced?

The Follow Product and User Experience team have broken down the approach into four separate phases:

Audience members will be able to see what topics they’re currently following, follow new ones and choose to unfollow others from a topic index page or the BBC homepage.
Our audience members will be able to see the topics they have followed across the News and Sport websites and apps. This will come with the new ability to manage what they have followed across the BBC.
Also, our audience will get the ability to follow a topic from the content level i.e. from within an article.
Follow will become a unified approach across the BBC online estate, helping break down barriers between content and aiding discovery for all audience members.

One outlet for follow is alerts and notifications. A team is working hard on understanding how best to achieve this while offering the most value to our audience. It’s a complex area, so we’re keeping close to that work to ensure we’re ready for it.

Conclusion

It is early days for ‘follow’, but the groundwork is already in place for it to be a great success. The BBC topics product is maturing and already powers the majority of the Sport website and app. It also powers the BBC homepages, and we’re currently working closely with the News team to help them bring the BBC News topics up to date and into the WebCore world.

We’ve detailed our key principles and considerations around the follow action and how we plan to implement it. It’s a challenging area to work within but all the building blocks are sliding into place, and we’re starting to look at how and when we can roll out the first phase. Following a topic is something people understand, as they’re used to doing it on social media.

We’re working hard to bring this functionality to our audiences as soon as we can, while making sure it meets the level of quality our audience expects from the BBC. We aim to bring all our content to all our audiences in such a way that the traditional content silos are blurred to the point of invisibility.

Me, you and the machine

Matthew Postgate — Mon, 20 Jul 2020 09:24:37 +0000

We’re relying on a wide set of actions and tools to help us deal with the current pandemic. The BBC is playing its part to inform, educate and entertain, and for us and others, digital technologies are playing a key role. In this blog post, I discuss the BBC’s approach to one of the most important sets of digital tools: 'machine learning'.

The term machine learning (ML) covers a range of computer systems which learn from experience. With Covid-19, we know ML techniques are being used for contact mapping and predicting the effectiveness of drugs.

One reason ML is being deployed here is that it is being deployed everywhere. Tools that can be trained on vast data sets and learn and improve as a result are behind social media feeds, computer vision and robotics, financial and weather models, and of course the improved machine translation and voice recognition systems that many of us use every day.

Many of these areas are directly relevant to the BBC and its day-to-day operations. The Design and Engineering division I lead has been looking at them closely for some time, exploring ways in which machine learning can help us to enhance what the BBC offers our audiences.

We believe that ML can help us respond to audience expectations, especially from ‘digital native’ younger audiences. A key area is content discovery and recommendations. Audiences no longer accept having to put significant effort into searching for what they want. They want a personalised offer, which feels both relevant and fresh – something ML can help us to provide.

And ML can help us innovate. There is potential to transform the ways we make programmes, the way we run as a business, and of course the ways we do our journalism. Examples include speeding up video compression or finding ways of detecting and flagging disinformation.

It's not surprising that we should be looking at ML in this way: the BBC has always worked with new technologies to offer the best user experience we can. This is why we created iPlayer and Sounds, and developed approaches like our Global Experience Language (GEL), the BBC’s shared design framework. As ML has developed, we have started to explore how to use the technologies responsibly and efficiently. We have also developed a set of principles governing our deployment of ML technologies.

I want to be clear about where our ambitions lie. We are not Microsoft, Google or Baidu. We don't have their amounts of data, money or computing power. We are not aiming to compete with them by developing our own machine learning frameworks, or performing advanced research in novel algorithms.

But the BBC is a fertile environment for applying ML techniques. We have unique types of problems to solve, and as an organisation we can draw on almost one hundred years of experience in storytelling. We are ambitious in our desire to explore the positive impact of applying ML to our operations.

What does this mean in practice?

The first thing we consider is whether an ML solution is needed at all. We then assess the benefits of each application to both individuals and society. An example would be designing the BBC’s content recommendation engines to broaden our audience’s horizons, because we think there is both individual and public value in discovering new perspectives, music or experiences, not simply finding more of the same.
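One established technique for recommendations that broaden rather than narrow is maximal marginal relevance (MMR), which trades an item's relevance against its similarity to items already chosen. This Python sketch is a swapped-in illustration of the idea, not a description of the BBC's engines; the relevance scores, tag sets and `lam` weight are made-up inputs.

```python
def jaccard(a: set, b: set) -> float:
    """Tag overlap between two items, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr(candidates: list, relevance: dict, tags: dict, k: int = 3, lam: float = 0.7) -> list:
    """Maximal marginal relevance: pick items that are relevant but unlike those already picked.

    lam = 1.0 ranks purely by relevance; lower values reward variety.
    """
    picked, pool = [], list(candidates)
    while pool and len(picked) < k:
        def score(item):
            redundancy = max((jaccard(tags[item], tags[p]) for p in picked), default=0.0)
            return lam * relevance[item] - (1 - lam) * redundancy
        best = max(pool, key=score)
        picked.append(best)
        pool.remove(best)
    return picked


relevance = {"drama-1": 0.9, "drama-2": 0.85, "doc-1": 0.6}
tags = {
    "drama-1": {"drama", "crime"},
    "drama-2": {"drama", "crime"},
    "doc-1": {"documentary", "science"},
}
print(mmr(list(relevance), relevance, tags, k=2, lam=0.5))  # ['drama-1', 'doc-1']
print(mmr(list(relevance), relevance, tags, k=2, lam=1.0))  # ['drama-1', 'drama-2']
```

With `lam=1.0` the list is "more of the same"; lowering it lets a less obvious documentary displace a near-duplicate drama.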

We also ensure that we use our resources efficiently. ML requires a solid data platform and a consistent and modern approach to experimentation across our portfolio of products and services. It is important to maintain a central and coordinated approach so that, as an organisation, we can deploy scarce capability in the most effective way and optimise on learning quickly.

We pair our machine learning capabilities with human judgement and diversity of experience. This applies both to technology development, where we bring together technical experts (e.g. data scientists, UX designers, product specialists) with editorial, policy, legal and R&D colleagues, and to our audience experience, where the BBC’s automated curation will sit alongside human curation.

Finally, we recognise the need for collaboration and co-operation with other industries and organisations as our approach to ML matures. Collaborations that allow media and technology companies to bring their expertise together for the public good will create more powerful experiences than anything we can do alone.

Machine learning has enormous potential to transform not just the �� but every other organisation. I want us to use it to connect with people more effectively, to bring out the strengths of our storytelling and to find new ways of communicating our trusted journalism. I hope the power of machines will help me and my colleagues create something new, compelling and distinctively �� for each member of our audience.

How metadata will drive content discovery for the �� online

Jonathan Murphy and Jeremy Tarling — Wed, 15 Apr 2020 13:53:12 +0000

Jonathan Murphy, editorial lead for metadata, and Jeremy Tarling, lead data governance specialist in Digital Publishing, explain what's being done to create a common metadata structure.

The ��’s online portfolio has been built up over more than 20 years into a rich and varied collection of websites and services, but with so much content it can be difficult to find some of it, let alone manage it all. As a result we have lots of hidden gems that aren’t being surfaced, and that’s something we’re trying to fix with a new content discovery strategy.

One of the challenges we’re facing as we rebuild our digital portfolio is how to make more of our content discoverable and personalised for more of our audiences. That’s particularly true of under-35s, who now have an array of competing platforms that do a great job of building algorithms to attract their attention.

Underpinning this strategy of discovery and personalisation will be establishing more detailed metadata that describes all of our content using the same terminology and the same tools and data model.

Metadata is the background information that describes the things we make. It comes in many forms, from technical metadata, such as which camera was used in a film shoot, to promotional metadata used to describe the plot of a programme. For this project, we’re focusing on what we call descriptive content metadata: tags that describe what an asset (e.g. an article, programme, or TV or audio clip) is about, or who or what it mentions. This kind of metadata is already used on the �� News and �� Sport websites through a data architecture called Dynamic Semantic Publishing, which was created for the London 2012 Olympics and now drives many thousands of subject-based aggregations, or Topic pages.

There are also categories of programmes on �� Sounds and �� iPlayer, which use a mixture of genres and formats contained in the PIPs database that supports our vast online library of programme information. As a result of these two data silos, and their limitations, it’s difficult to offer audiences any pan-�� experiences or anything that requires an in-depth understanding of the content.

Common Metadata

So we need to go further, in order to offer material that both reflects the breadth of our online content and suits everyone’s tastes and needs. Here in the ��’s Digital Publishing team, we’re developing tools that make content description possible at all stages of production across our portfolio, and new vocabularies that allow for richer descriptions of our content.

We’ve already worked with the Sounds team, who have used our new tags and curations to create some of their Music Playlists, which give you a soundtrack to suit your mood, whether that’s ‘chilled out’, ‘feel good’, music to dance to, or music to focus your mind.

To make this possible we’ve developed a concept that every piece of content has a basic set of common metadata associated with it, that it carries around wherever it’s surfaced across the ��’s portfolio - whether that’s in Sounds, iPlayer or on the �� News homepage. We’re storing this set of basic common metadata in what we call a ‘Passport’, and to create and manage this metadata we’ve developed a tool called Passport Control.

Using the simple graph model first developed for the ��’s 2012 Olympic’s coverage, we create subject-predicate-object triples to describe the nature of the relationship between an asset (the subject) and a tag (the object).

In the above example this asset has been described as being “about” some things - this kind of subject based tagging is well established at the ��, especially in our journalism output. But we have added three new predicates: “editorial tone”, “intended audience”, and “genre”.

Each predicate can be used with an associated controlled vocabulary of terms. In some cases these controlled vocabularies are taxonomic hierarchies (like �� genres) while in others they are simple lists of terms that we have developed to describe our output in ways that make sense to us and our audience.
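To make the triple model concrete, here is a minimal Python sketch of an asset's tags stored as subject-predicate-object triples, with each predicate other than "about" checked against its own controlled vocabulary. The identifiers (such as `clip:123`), the vocabulary terms, and the rule that "about" tags are accepted without a check are all assumptions for illustration. The post does not describe how Passport Control stores or validates tags.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabularies: the real term lists are not given in this post.
CONTROLLED_VOCABULARIES = {
    "editorial tone": {"chilled out", "feel good", "serious", "light-hearted"},
    "intended audience": {"children", "teenagers", "general"},
    "genre": {"music", "sport", "drama", "factual"},
}


@dataclass(frozen=True)
class Triple:
    subject: str    # the asset, e.g. an article or clip identifier
    predicate: str  # the relationship, e.g. "about" or "genre"
    obj: str        # the tag the asset is linked to


def validate(triple: Triple) -> bool:
    """Check a triple's tag against the controlled vocabulary for its predicate.

    Assumption for this sketch: "about" tags point at an open-ended set of
    subject concepts, so they are accepted without a vocabulary check.
    """
    if triple.predicate == "about":
        return True
    vocabulary = CONTROLLED_VOCABULARIES.get(triple.predicate)
    return vocabulary is not None and triple.obj in vocabulary


# A hypothetical 'Passport' for one clip: the small set of triples it carries everywhere.
passport = [
    Triple("clip:123", "about", "netball"),
    Triple("clip:123", "editorial tone", "feel good"),
    Triple("clip:123", "intended audience", "general"),
    Triple("clip:123", "genre", "sport"),
]
print(all(validate(t) for t in passport))              # True
print(validate(Triple("clip:123", "genre", "polka")))  # False: not in the vocabulary
```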

These new types of metadata can be used to make much richer collections of content, either as manual editorial curations or algorithmically generated recommendations.
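As an illustration of how these predicates could feed a curation, the sketch below selects every asset whose tags match all of a given set of (predicate, tag) criteria. This is roughly how a mood playlist might be seeded. The asset IDs, the tags and the `build_collection` helper are hypothetical, and a real system would probably query a graph store rather than a Python list.

```python
from collections import defaultdict

# Hypothetical (asset, predicate, tag) triples for a handful of assets.
TRIPLES = [
    ("track:1", "editorial tone", "chilled out"),
    ("track:1", "genre", "music"),
    ("track:2", "editorial tone", "feel good"),
    ("track:2", "genre", "music"),
    ("clip:3", "editorial tone", "feel good"),
    ("clip:3", "genre", "sport"),
]


def build_collection(triples, criteria):
    """Return the assets tagged with every (predicate, tag) pair in `criteria`,
    in the order the assets first appear in `triples`."""
    tags_by_asset = defaultdict(set)
    for subject, predicate, obj in triples:
        tags_by_asset[subject].add((predicate, obj))
    wanted = set(criteria.items())
    return [asset for asset, tags in tags_by_asset.items() if wanted <= tags]


print(build_collection(TRIPLES, {"editorial tone": "feel good"}))
# ['track:2', 'clip:3']
print(build_collection(TRIPLES, {"editorial tone": "feel good", "genre": "music"}))
# ['track:2']
```

The same tags could equally serve as input features for an algorithmic recommender rather than as a hard filter.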

Our colleague Anna McGovern explains further some of those challenges we face at the �� here in building our curations and recommendations, building on our public service values. With the amount and variety of material that we produce, from news articles to music mixes, live events to boxsets, we think we’re in a good position to provide content for all kinds of different tastes.

We’ll update you more about metadata developments, curations and recommendations as these features begin to roll out on �� Online over the coming months.

Understanding public service curation: What do ‘good’ recommendations look like?

Anna McGovern — Tue, 17 Dec 2019 13:24:25 +0000

Unless you know exactly what you’re after, finding the �� content that is just right for you can be a little like looking for a needle in a haystack. Some of the challenges we face at the �� are:

1. We produce a vast quantity of content, at a rough estimate, around 2000 pieces of new content a day. In terms of our ‘shop window’, we have a good number of promotional areas and slots – on homepages, in our apps, on our schedules and on our social media feeds – but simply not enough to accommodate all our content.

2. When content makes it to a prominent promotional slot, it will, of course, get good traffic, but there is no guarantee that the person for whom that content is most relevant will see it at the time. And once that content loses its place, it becomes hard to find; indeed, no one will know it exists unless they are armed with determination and the greatest of digital search skills.

3. Colleagues working in curation have an incredible ability to seek out the best content and ensure that it is labelled in a way that maximises its impact, but they simply can’t read, watch or listen to all of our output. There is just too much of it.

So far, so familiar.

Personalised recommendations, fuelled by the power of machine learning, are how every forward-thinking media and tech organisation puts the content that audiences would most enjoy right in front of them. The �� already has automatic recommendations in some of its products and on some news language services, which use strategies like content similarity, popularity and collaborative filtering.
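Of the strategies mentioned, collaborative filtering is the least self-explanatory. One common form, item-based collaborative filtering, treats two items as similar when the same people consume both. It then recommends items that are similar to what a user has already played. The sketch below uses cosine similarity over sets of users. The data and function names are hypothetical, and the sketch is not a description of the ��'s own engines.

```python
import math
from collections import defaultdict


def item_similarities(interactions):
    """interactions: dict of user -> set of item ids they played.
    Returns {(item_a, item_b): cosine similarity of their audiences}."""
    users_by_item = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_by_item[item].add(user)
    sims = {}
    items = list(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(users_by_item[a] & users_by_item[b])
            if shared:
                score = shared / math.sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[(a, b)] = sims[(b, a)] = score
    return sims


def recommend(user, interactions, k=3):
    """Score unseen items by summed similarity to the items the user has played."""
    sims = item_similarities(interactions)
    seen = interactions[user]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


interactions = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"c", "d"}}
print(recommend("u1", interactions))  # ['c']
```

Because similarity here comes only from shared audiences, a newly published item that nobody has played yet cannot be recommended at all.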

The curatorial challenge for the �� becomes more interesting and complex, because our guiding principles relate so strongly to delivering public service value to the audience. We cannot simply work out what would get the most clicks, and show that to our audiences (although ensuring content is attractive to our audiences is important). Instead, we are required:

To provide impartial news and information to help people understand and engage with the world around them
To support learning for people of all ages
To show the most creative, highest quality and distinctive output and services
To reflect, represent and serve the diverse communities of all of the United Kingdom’s nations and regions and, in doing so, support the creative economy across the United Kingdom
To reflect the United Kingdom, its culture and values to the world

These requirements, written into our Charter, are why editorial values are woven into our Machine Learning Engine Principles and why those working in machine learning work closely with editorial teams. I recently conducted a deep exploration of what public service curation means to an organisation like the ��, so that we can begin to identify some of the signals that can help us as we build next-level public service recommendation services.

Around 80 editorial staff, involved in some way with digital curation or content creation, took part in the discussions. Around 40 Design & Engineering colleagues, mostly working in areas related to data science and engineering, metadata and research, observed them. Five criteria for public service curation emerged:

1. We want our content to reach and engage as large an audience as possible. We have a role in the national conversation by bringing the most important and resonant stories of the day to the attention of our audiences, as well as prioritising content that is popular and has universal appeal for the greatest number of people. Examples of this might include content that brings people together like Strictly Come Dancing or content of global importance like Seven Worlds, One Planet, both of which have an impact on the national and cultural conversation. Popular and impactful content also represents good value for money.

2. In a seemingly contradictory move, we don’t always optimise for peak popularity. We make and promote content that aims to appeal to different audiences, groups, communities, regions and perspectives. We want to showcase content that feels personal. Take the podcast Netballers, which is about a sport with less national impact than, say, Premier League football. In terms of feeling relevant, it’s a win for women, young and BAME audiences. And it reflects an aspect of British culture. By its nature, it doesn’t get as many downloads as the Peter Crouch podcast, but it serves to inform and entertain those with a passion for netball.

3. We strive always to bring audiences something new - newly released music, new writers, new presenters, untold stories, news about events and stories of national significance, information about emerging technologies, research, discoveries, perspectives, major drama series, ways of storytelling. This quest for the new is the kindling that starts the national and cultural conversation in the first place. The �� produces a lot of this type of content, but it is not easily identified by engines which are based on collaborative filtering alone*.

4. We provide useful, helpful, practical information, explainers and fact checks, which makes us a trusted source of information: that could be a news story, or revision notes for a Chemistry GCSE, or recipes for a quick mid-week meal, or sports results.

5. We have enormous breadth and depth, which means that there is something for everyone: there is variety by topic, tone, format, duration, location, level of expertise, age suitability and relevance.

Our long term ambition is to use this thinking to build recommendation systems which can broaden our audience’s horizons, providing different perspectives and stories and experiences that they might not otherwise have come across. But this is complex - and a recommender that can effectively deal with as multi-faceted an editorial issue as impartiality is extremely challenging and will take significant time to develop.

For now, curation in the context of our recommenders involves upholding our editorial values and finding ways to surface the most relevant and compelling content for each user. All the content the �� makes for our UK audiences, one way or another, is public service, so our recommendations will of course always have a public service flavour.

I’m excited that we have already built models with business rules about increasing breadth (in the Sounds recommender) and depth (in the World Service recommenders), as well as rules reflecting editorial values around sensitivity. For example, in the Sport recommender, due for release next year, we’ve taken the curatorial decision that content from rival teams will not be shown together in a set of recommendations.
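A rule like the rival-teams decision can be applied as a post-processing step on a ranked list of candidates. The sketch below greedily keeps the highest-ranked clips and skips any clip whose team is a rival of a team already in the set. The team names, the `RIVALS` pairs and the choice of a greedy filter are all assumptions. The post does not say how the Sport recommender implements the rule.

```python
# Hypothetical rivalry pairs; the real editorial list is not described in the post.
RIVALS = {frozenset({"Team A", "Team B"}), frozenset({"Team C", "Team D"})}


def apply_rivalry_rule(ranked, k):
    """ranked: list of (clip_id, team) in descending score order.
    Keep up to k clips, skipping any clip whose team is a rival of a team
    already chosen."""
    chosen, teams = [], set()
    for clip_id, team in ranked:
        if any(frozenset({team, other}) in RIVALS for other in teams):
            continue  # would put rival teams side by side
        chosen.append(clip_id)
        teams.add(team)
        if len(chosen) == k:
            break
    return chosen


ranked = [("c1", "Team A"), ("c2", "Team B"), ("c3", "Team C"),
          ("c4", "Team A"), ("c5", "Team D")]
print(apply_rivalry_rule(ranked, 3))  # ['c1', 'c3', 'c4']
```

Filtering after ranking keeps the business rule separate from the model, so editorial colleagues can change the list of rivalries without retraining anything.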

We can learn so much about public service by sharing the editorial point of view as we iterate and refine our approach. I’ll be working in collaboration with editorial colleagues and data scientists to ensure these public service curation criteria inform the ��’s future recommendations engines.

Lastly, we also recognise that the power of machine learning - of which recommendations is a part - can only get us so far. Machines cannot understand all the subtleties, complexities and nuance of editorial decision making. An algorithm will have trouble identifying what is entertaining or fresh or authentic without significant human assistance. A machine can help only up to a point to accurately tag content in a metadata system before a human verifies the machine’s choices and hits ‘publish’.

So at the ��, we’ll be maintaining a human hand in content creation and discovery. We need both humans and machines to best serve our audiences; editorial colleagues are highly skilled at making and promoting our content and machines can help amplify those skills. More and more, what recommendations can do is help locate the needle in the haystack in the first place - and exactly the right needle for you.

*Which is why we have built a factorisation machine for our Sounds recommender which combines collaborative filtering with content matching. For more detail see Developing personalised recommendation systems at the ��.
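The footnote mentions a factorisation machine that combines collaborative filtering with content matching. The post does not describe the model's features or training, so what follows is only a sketch of how a standard second-order factorisation machine scores one example. One-hot user and item indicators let it learn collaborative-filtering interactions, while content features such as genre (assumed here) let it match on content. Every pair of features interacts through the dot product of their learned latent vectors. The weights below are made-up numbers, and training is omitted.

```python
def fm_score(x, w0, w, V):
    """Second-order factorisation machine score for one feature vector.

    x:  feature values (e.g. one-hot user id, one-hot item id, content features)
    w0: global bias; w: one linear weight per feature
    V:  one latent factor vector (length k) per feature
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # Pairwise term sum_{i<j} <v_i, v_j> x_i x_j, computed in O(k n) via
    # 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        total = sum(V[i][f] * x[i] for i in range(n))
        total_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (total * total - total_sq)
    return linear + pairwise


# Hypothetical features: [user:alice, user:bob, item:clip1, genre:sport]
x = [1.0, 0.0, 1.0, 1.0]  # alice watching clip1, which is tagged as sport
w0 = 0.5
w = [0.1, 0.2, 0.3, 0.4]
V = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4], [-0.2, 0.3]]
print(round(fm_score(x, w0, w, V), 4))  # 1.49
```

Because a new item still has content features, its genre's latent vector can give it a meaningful score before anyone has played it. Collaborative filtering alone cannot do this.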

Scaling responsible machine learning at the ��

Gabriel Straub — Fri, 04 Oct 2019 09:32:44 +0000

Machine learning is a set of techniques where computers can ‘solve’ problems without being explicitly programmed with all the steps to solve the problem, within the parameters set and controlled by data scientists working in partnership with editorial colleagues.

The �� currently uses machine learning in a range of ways: for example, to provide users with personalised content recommendations, to help us understand what is in our vast archive, and to help transcribe the many hours of content we produce. In the future, we expect machine learning to become an ever more important tool to help the �� create great audience experiences.

The �� was founded in 1922 in order to inform, educate and entertain the public, and we take that purpose very seriously. We are governed by our Royal Charter, and public service is at the heart of everything we do. This means that we act on behalf of our audience by giving them agency, and that our organisation exists to serve individuals and society as a whole rather than a small set of stakeholders.

With machine learning becoming a more prevalent aspect of everyday life, our commitment to audience agency is reflected in this area as well. In 2017 we submitted evidence in which we promised to lead the way in the responsible use of all AI technologies, including machine learning.

But what does this mean in practice?

For the last couple of months, we have been bringing together colleagues from editorial, operational privacy, policy, research and development, legal and data science teams in order to discuss what guidance and governance is necessary to ensure our machine learning work is in line with that commitment.

Together, we agreed that the ��’s machine learning engines will support public service outcomes (i.e. to inform, educate and entertain) and empower our audiences.

This statement then led to a set of �� Machine Learning Principles:

The ��’s Values

1. The ��’s ML engines will reflect the values of our organisation: upholding trust, putting audiences at the heart of everything we do, celebrating diversity, delivering quality and value for money, and boosting creativity.

Our Audiences

2. Our audiences create the data which fuels some of the ��’s ML engines, alongside �� data. We hold audience-created data on their behalf, and use it to improve their experiences with the ��.

3. Audiences have a right to know what we are doing with their data. We will explain, in plain English, what data we collect and how this is being used, for example in personalisation and recommendations.

Responsible Development of Technology

4. The �� takes full responsibility for the functioning of our ML engines (in house and third party). Through regular documentation, monitoring and review, we will ensure that data is handled securely and that our algorithms serve our audiences equally and fairly, so that the full breadth of the �� is available to everyone.

5. Where ML engines surface content, outcomes are compliant with the ��’s editorial values (and where relevant as set out in our editorial guidelines). We will also seek to broaden, rather than narrow, our audience’s horizons.

6. ML is an evolving set of technologies, where the �� continues to innovate and experiment. Algorithms form only part of the content discovery process for our audiences, and sit alongside (human) editorial curation.

These principles are supported by a checklist that gives practitioners concrete questions to ask themselves throughout a machine learning project. These questions are not meant as a governance framework to be ticked off, but instead aim to help teams building machine learning engines to think properly about the consequences of their work. Teams can reflect on the purpose of their algorithms; the sources of their data; our editorial values; how they trained and tested the model; how the models will be monitored throughout their lifecycle; and their approaches to security, privacy and other legal questions.

While we expect our six principles to remain pretty consistent, the checklist will have to evolve as the �� develops its machine learning capabilities over time.

The Datalab team is currently testing this approach as it builds the ��’s first in-house recommender systems, which will offer a more personalised experience for �� Sport and �� Sounds. We also hope to improve the recommendations for other products and content areas in the future. We know that this framework will only be impactful if it is easy to use and can fit into the workflows of the teams building machine learning products.

The �� believes there are huge benefits to being transparent about how we’re using Machine Learning technologies. We want to communicate to our audiences how we’re using their data and why. We want to demystify machine learning. And we want to lead the way on a responsible approach. These factors are not only essential in building quality ML systems, but also in retaining the trust of our audiences.

This is only the beginning. As a public service, we are ultimately accountable to the public and so are keen to hear what you think of the above.

Navigating the data ecosystem technology landscape

Hannes Ricklefs, Max Leonard — Tue, 03 Sep 2019 12:46:36 +0000

Credit: Jasmine Cox

Want to message your Facebook friends on Twitter? Move your purchased music from iTunes to Amazon? Get Netflix recommendations based on your iPlayer history? Well, currently you can’t.

Many organisations are built on data, but the vast majority of the leading players in this market are structured as vertically integrated walled gardens, with few (if any) meaningful interfaces to any outside services. There are a great number of reasons for this, but regardless of whether they are intentional or technological happenstance (or a mixture of both), there is a rapidly growing movement of GDPR-supercharged technologists who are putting forward decentralised and open alternatives to the household names of today. For the �� in particular, these new ways of approaching data are well aligned with our public service ethos and commitment to treating data in the most ethical way possible.

Refining how the �� uses data, both personal and public, is critical if we are to create a truly personalised �� in the near term and essential if we want to remain relevant in the coming decades. Our Chief Technology and Product Officer Matthew Postgate recently spoke about the ��’s role within data-led services, in which he outlined some of the work we have been doing in this respect to ensure the �� and other public service organisations are not absent from new and emerging data economies.

Alongside focused technical research projects like the �� Box, we have been mapping the emerging players, technologies and data ecosystems to further inform the ��’s potential role in this emerging landscape. Our view is that such an ecosystem is made up of the following core capabilities: Identity, data management (storage, access, and processing), data semantics and the developer experience, which are currently handled wholesale in traditional vertical services. A first step for us is hence to ascertain which of these core capabilities can realistically be deployed in a federated, decentralised future, and which implementations currently exist to practically facilitate this.

Identity, a crucial component of the data ecosystem, proves that users are who they say they are, providing a true digital identity. Furthermore, we expect standard account features, such as authentication and sharing options via unique access tokens that could let users get insights or share data, to be part of any offering. We found that identity, in the sense of proving a user’s identity, was not provided by any of the solutions we investigated. Standard account features were present, ranging from platform-specific implementations to decentralised identifier approaches via WebID and blockchain-based distributed ledger approaches. As we strongly believe it is important to prove a user is who they say they are, at this point we would look to integrate solutions that specialise in this domain.

Data management can be further broken down into 3 areas:

Data usage and access involves integrating data sources with an associated permission and authorisation model. Users should have complete control over their data and over its use by data services. Strong data security controls and progressive disclosure of data are key here. Given that our investigation focused on personal data stores (PDS) and time series sensor/IoT device data platforms that capture personal, public and open data, providing access and controls around the sharing of data was a fundamental capability of all offerings. All of them gave users significant granularity and transparency about what data is being stored, its source and its usage by external services.
Data storage must provide high protection guarantees for users’ data, encrypted in transit and at rest, giving users complete control over, and transparency into, data lifecycle management. Again, this is a fundamental requirement, so storage is either a core offering of every platform or is outsourced to external services that store data in strongly encrypted formats.
Data processing mechanisms allow users to bring “algorithms” to their data, combined with a strong contract-based exchange of data. Users are in control and understand what insights algorithms and services derive from their data. These might include the creation of reports, the creation and execution of machine learning models, and other capabilities that reinforce the user’s control over how their personal data is used to generate insights. Through contract- and authorisation-based approaches, users have complete audit trails of any processing performed. This makes it transparent how services use data and allows suspicious or unauthorised data access to be detected continuously. Our investigations found that data processing is either supported through SDKs that heavily prescribe the processing workflow, or not provided at all, leaving developers to create their own solution.
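To illustrate bringing algorithms to the data with an audit trail, here is a minimal sketch of a personal data store. In it, a user grants a named service access to one category of data, the service's function runs inside the store, and every attempt is logged whether or not it is permitted. The class, service and category names are hypothetical and do not correspond to any of the platforms investigated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PersonalDataStore:
    """Hypothetical personal data store: data stays here and services bring algorithms to it."""
    data: dict                                     # category -> the user's data
    grants: dict = field(default_factory=dict)     # service -> set of permitted categories
    audit_log: list = field(default_factory=list)  # (timestamp, service, category, allowed)

    def grant(self, service, category):
        self.grants.setdefault(service, set()).add(category)

    def revoke(self, service, category):
        self.grants.get(service, set()).discard(category)

    def process(self, service, category, algorithm):
        """Run `algorithm` on one data category if the user has granted it; log every attempt."""
        allowed = category in self.grants.get(service, set())
        self.audit_log.append(
            (datetime.now(timezone.utc).isoformat(), service, category, allowed))
        if not allowed:
            raise PermissionError(f"{service} is not permitted to process {category}")
        # Only the derived insight leaves the store, not the raw data.
        return algorithm(self.data[category])


store = PersonalDataStore(data={"listening_minutes": [30, 45, 20]})
store.grant("weekly-report", "listening_minutes")
print(store.process("weekly-report", "listening_minutes", sum))  # 95
store.revoke("weekly-report", "listening_minutes")
try:
    store.process("weekly-report", "listening_minutes", sum)
except PermissionError as err:
    print(err)
print([entry[1:] for entry in store.audit_log])
```

Logging the refused attempt as well as the permitted one is what makes unauthorised access detectable from the audit trail.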

Data model and semantics refers to mechanisms that describe (schemas, ontologies) and maintain the data domains inside the ecosystem, which is essential for extensibility and interoperability. Our investigations found approaches spanning a wide spectrum:

no provision requiring developers to come to conclusions about the best way to proceed
using open standards such as schema.org and modelling data around linked data and RDF
completely proprietary definitions around schemas within the system.
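For the open-standards option in the list above, a content record might be described with schema.org terms and serialised as JSON-LD, a linked-data format. The sketch below builds a schema.org `VideoObject` using the standard `name`, `uploadDate`, `duration`, `genre` and `about` properties. The asset URL, the title and the `to_json_ld` helper are placeholders for illustration only.

```python
import json


def to_json_ld(asset_id, title, upload_date, duration_seconds, subjects, genre):
    """Describe a (hypothetical) video asset with schema.org terms, serialised as JSON-LD."""
    minutes, seconds = divmod(duration_seconds, 60)
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "@id": asset_id,
        "name": title,
        "uploadDate": upload_date,
        "duration": f"PT{minutes}M{seconds}S",  # ISO 8601 duration
        "genre": genre,
        "about": [{"@type": "Thing", "name": s} for s in subjects],
    }, indent=2)


print(to_json_ld("https://example.org/clips/1", "Netball highlights",
                 "2019-09-03", 95, ["Netball"], "Sport"))
```

Because the terms come from a shared vocabulary, another provider can interpret the record without knowing anything about the system that produced it. That shared interpretation is the interoperability the open-standards approach aims for.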

Finally the developer experience is key. It requires a set of software development tools to enable engineers to develop features and experiences as well as being able to implement unique value propositions required by services. This is the strongest and most consistent area across all our findings.

In summary our investigations have shown that there is no one solution that provides all of our identified and required capabilities. Crucially the majority of the explored end user solutions are still commercially orientated, such that they either make money from subscribers or through associated services.

So with the number of start-ups, software projects and standards that meet these capabilities snowballing, where might the �� fit into this increasingly crowded new world?

We believe that the �� has a role to play in all of these capabilities and that it would enhance our existing public service offering: to inform, educate and entertain. A healthy ecosystem requires multiple tenants and solutions providers, all adhering to core values such as transparency, interoperability and extensibility. Only then will users be able to freely and independently move or share their data between providers which would enable purposeful collaboration and fair competition toward delivering value to audiences, society and industry.

The �� was incorporated at the dawn of the radio era to counteract the unbridled free-for-all that often comes with any disruptive technology, and its remit to shape standards and practices for the good of the UK and its population still stands today. Given a scale, reach and purpose that are unique to the ��, it is strongly congruent with our public service duty to help drive policy, standards and access rights, so that the riches on offer in these new ecosystems are not co-opted solely for the downward pursuit of profit and remain accessible for the benefit of all.

Machine learning and editorial collaboration within the ��

Anna McGovern, Ewan Nicolson, Svetlana Videnova — Thu, 29 Aug 2019 13:55:00 +0000

The �� is nearly 100 years old. Inevitably, as an organisation we are having to adapt to meet the technological requirements of the future, such as incorporating Machine Learning (ML) technologies. ML recommendations, for example, are a standard way for audiences to discover content, and the �� is committed to making this discovery more personal. Developing these services has created an interesting opportunity for collaboration between the ML and Editorial teams within Datalab, the �� team focused on building recommendation engines.

About a year ago we started experimenting with the ��+ app. This was the first time the �� provided the audience with a fully automated ML service. With the knowledge gained from it, and with more data science initiatives taking shape, we want to use all the available expertise the �� can provide.

Our aim is to create responsible recommendation engines that are true to the ��’s values. In industry, it is commonplace for data science teams to draw on specialist knowledge to inform how models are developed. For example, data scientists working for a travel site would consult experts with knowledge about everything from business flights to how and when families go on holiday. Likewise, Datalab consulted editorial teams and representatives who specialised in curation as it began to develop recommendations for content discovery.

Datalab’s editorial lead, Anna McGovern, advises on editorial judgement and content curation expertise within the ��. Ewan Nicolson is lead data scientist and represents the technological side of Datalab’s work. Svetlana Videnova, Business Analyst, poses some of the common teamwork problems within the public media industry and the technological challenges we face today. We will focus on challenges in the curation of content and leave its creation phase for another post. Anna and Ewan each describe how they would tackle these challenges in their own field. The last column of the table below shows how the collaboration works in our team.

As you’ll see, the two fields of editorial and data science complement each other. Working across disciplines gives better results for the audience and helps us learn from each other. It means that machine learning is solving the right problems, because we’re making use of the rich expertise of editorial. It also means that editorial teams are able to take advantage of techniques like machine learning to multiply their efforts and deliver more value to the audience.

Challenge	Machine Learning solution	Editorial solution	When we collaborate
How do we ensure curation is a good experience for users?	We consider many different measures of success: accuracy, diversity, recency, impartiality, editorial priority.	Traditionally on an editorial team, a journalist would research a story, discuss how it might be covered and compose the story itself to make it compelling.	The data scientists get a rich understanding from editorial of the different trade-offs between these measures of success. Deep domain knowledge.
How does recency impact curation of content?	We include publication date as a feature in our models. We sometimes try to optimise for recency, showing people more current content in some situations.	One of the challenges is that once that work is done it is fairly hard to bring the editorial creation back to life, especially for evergreen content. This is one of many areas where ML recommendations could help, by surfacing this content at the most relevant time according to the user’s experience or history.	By working together we’re able to identify how to make decisions about which pieces of content are evergreen and suitable for recommendation, and which pieces have a limited shelf-life and shouldn’t be presented to users beyond a certain point.
How does the �� ensure impartiality?	We assess whether our model is giving unbiased results, and good practice in machine learning helps make sure that we’re using unbiased training data.	Editors, journalists and content creators make a concerted effort to ensure that a range of views and perspectives are shown within a piece of content or across several pieces of content (within a series, for example).	We combine our good practices with domain knowledge from editorial. We use techniques like human-in-the-loop machine learning or semi-supervised learning to make editorial’s lives easier and apply their knowledge at massive scale. ML helps editorial identify those pieces of content that show a breadth of views.
How do we ensure variety in the content we serve?	We construct mathematical measures of variety and include these in our machine learning optimisations.	Editorial staff responsible for curation ensure a breadth and depth of content on indexes, within collections, etc.	We learn about the differences between our different pieces of content. Working together, we’re able to determine whether our recommendations offer an interesting, relevant and useful journey for the user. The ��’s audio networks feature different output and tones of voice; for example, Radio 4 has a very different ‘flavour’ to 6Music. Consequently, network can be used to ensure variety in results.
How do we avoid legal issues?	We are given a checklist, and we check the items off. We get told that there are things “we can’t do for opaque legal reasons” but never really understand why, and limit the functionality of our solution.	Editors, journalists and content creators have to attend a mandatory course relating to media law, so that they have full knowledge about issues such as contempt of court, defamation and privacy. An editor will sign off content to ensure that content is compliant with legal requirements.	By talking to legal advisers we can build business rules to minimise the risk of legal infractions. Close collaboration with editorial means we gain a deep understanding of the potential problems ahead at an early stage. We build with awareness of these concerns, and with that awareness build a solution that is high quality from both a technical and editorial point of view.
How do we handle editorial quality?	We build and refine a model using data science good practices, and then turn it over to our editorial colleagues. They then decide if the results are good or not.	When editors curate they can choose content that is relevant, interesting and of good quality. Recommendations present a specific editorial challenge, in that recommenders can surface content that is not the best of our output.	In BBC+ we prioritised content that we knew would suit the environment in which it appeared: standalone, short-form videos, appearing in a feed, from digital-first areas such as Radio 1, The Social, BBC Ideas, etc. Including editorial throughout the process means that they teach us about what is important in the results, so that data science understands the real problems that we’re trying to solve. We fail quickly and learn quickly, getting to a better-quality result.
How do we learn from our audiences? Accuracy/user-generated content?	We measure user activity with the products and construct measurements of engagement, building implicit and explicit feedback loops. An explicit feedback loop is having a “like” button; an implicit feedback loop is determining a way to measure when something has gone wrong, like bounce rate or user churn.	We monitor feedback and analyse stats to build a picture about how our audiences engage with our content.	We work with editorial to understand the insights we get from data. They help rationalise the behaviours that we see in the data. They also teach us things that we should look for in the data.
How do we test recommendations?	A mixture of offline evaluation metrics (e.g. testing against a known test set of data) and online evaluation metrics (e.g. A/B testing).	Traditionally, we monitor feedback and analyse stats to build a picture about how our audiences engage with our content.	The editorial lead works with data scientists on the composition of the recommender. The results are then reviewed by the editorial lead and, to obtain a variety of opinions, by more editorial colleagues. More on quantitative testing here. The rich editorial feedback lets us understand where our model could be better and make improvements.
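The table mentions mathematical measures of variety, and using network (Radio 4 versus 6Music, say) to keep results varied. As a minimal sketch, assuming each recommended item is tagged with the network it came from (the tags below are made up), one common generic measure is intra-list diversity: the share of item pairs in a list that come from different networks. This illustrates the idea only; it is not the measure the team actually uses.

```python
from itertools import combinations


def intra_list_diversity(networks: list[str]) -> float:
    """Share of item pairs in one recommendation list that come from different networks.

    0.0 means every item shares a network; 1.0 means no two items do.
    """
    pairs = list(combinations(networks, 2))
    if not pairs:
        return 0.0  # fewer than two items: nothing to compare
    differing = sum(1 for a, b in pairs if a != b)
    return differing / len(pairs)


# Hypothetical lists, each item represented only by its network.
print(intra_list_diversity(["Radio 4", "Radio 4", "Radio 4"]))  # 0.0
print(intra_list_diversity(["Radio 4", "6Music", "Radio 1"]))   # 1.0
```

A score like this could be tracked offline, or added as a term in a re-ranking objective that trades a little relevance for a broader spread of networks.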

We’re big believers in cross-disciplinary collaboration. As we’ve touched on in this article, the BBC has a lot of uniquely complex problems to solve in this space. This collaboration is essential if we’re going to continue to deliver value to the BBC’s audience using data.

If you are curious about this collaboration and would like to know more in depth about how we work, leave us a message and we will be happy to get back to you.

Also, we are hiring.

Developing personalised recommender systems at the BBC

Jana Eggink — Thu, 22 Aug 2019 14:20:43 +0000

The BBC is on a journey to become more personalised, and recommendations are an important part of that goal. To date, recommendations in the BBC have been provided primarily by external providers. We feel that offering (and understanding) good recommendations is a crucial area for us in reaching our target audience of young listeners, so we have started exploring this area in-house. The is a relatively new team specialising in machine learning and looking after recommender systems in the BBC. We work with product groups to develop new ways to personalise their offerings, and also collaborate with BBC R&D.

We want to be able to explain the composition of our recommendations and so we need to understand how they are generated. Our recommendations should reflect the breadth and diversity of our content and meet our editorial guidelines, as well as informing, educating and entertaining! All these were good reasons for us to build the capability to constantly create challengers to the existing recommendation models.

Current recommendations on Sounds website

Datalab was assigned this brilliant and fun challenge and began collaborating with the Sounds team, using a multidisciplinary group made up of data scientists, engineers, editorial specialists and product managers.

The team had some prior experience building personalised recommendations for our video clip app BBC+. For BBC+, the recommender was purely content-based, using existing metadata information such as genres (e.g. Drama/Medical) or brands (e.g. Glastonbury Festival). This would probably have been a good approach if our content had been labelled for the express purpose of personalisation. However, the BBC’s production workflows were designed to meet the needs of broadcast systems, and we didn’t always have all the labels we would have wanted for recommendations.

BBC+ app

Factorisation machines come with the enticing promise of combining content-based recommendations with collaborative filtering.

Using a standard content-based approach, if a user had listened to podcasts from the genre ‘Health & Wellbeing’, the system would recommend a new episode from Radio 1’s Life Hacks, but it could also recommend Radio 4’s Inside Health, which has a very different tone of voice. By contrast, collaborative filtering matches programmes based on what similar users have enjoyed, so if users listen to Radio 1’s Life Hacks, they might be recommended Radio 1 comedy. This model relies on ‘adjacent’ content, similar to the recommendations found on shopping websites where ‘customers who bought this also bought that’. This approach often leads to better recommendations for established content, but is less effective for fresh content that hasn’t been consumed by significant numbers of people. Since the BBC continuously produces new content throughout the day, this recommendation strategy by itself would be limiting.
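To make the contrast concrete, here is a toy sketch of both strategies. The content-based function matches on a shared genre label, so it happily pairs Life Hacks with Inside Health; the collaborative one counts ‘listened to this, also listened to that’ co-occurrences, so it can only suggest programmes that already have listeners. All programme names, genres and listening histories are invented for illustration.

```python
from collections import Counter

# Hypothetical metadata: programme -> genre label.
GENRES = {
    "Life Hacks": "Health & Wellbeing",
    "Inside Health": "Health & Wellbeing",
    "Radio 1 Comedy": "Comedy",
    "Brand New Podcast": "Health & Wellbeing",  # just published, no listens yet
}

# Hypothetical listening histories: user -> programmes heard.
HISTORIES = {
    "u1": {"Life Hacks", "Radio 1 Comedy"},
    "u2": {"Life Hacks", "Radio 1 Comedy"},
    "u3": {"Inside Health"},
}


def content_based(seed: str) -> list[str]:
    """Recommend every other programme that shares the seed's genre label."""
    genre = GENRES[seed]
    return sorted(p for p, g in GENRES.items() if g == genre and p != seed)


def collaborative(seed: str) -> list[str]:
    """Recommend programmes most often co-listened with the seed (item-item co-occurrence)."""
    counts = Counter()
    for heard in HISTORIES.values():
        if seed in heard:
            counts.update(p for p in heard if p != seed)
    return [p for p, _ in counts.most_common()]


print(content_based("Life Hacks"))  # ['Brand New Podcast', 'Inside Health']
print(collaborative("Life Hacks"))  # ['Radio 1 Comedy']
```

The fresh ‘Brand New Podcast’ only shows up in the content-based list, which is the cold-start limitation described above.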

Factorisation machines are a smart way to combine both. They have been around for a few years, and open source toolboxes exist to support them. Our team programs primarily in Python, so we wanted a toolbox that integrates with it. Obviously, we also wanted it to be fast, give superior results and be easy to use (more on that later…).
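As a rough illustration of why factorisation machines can blend both signals: a standard second-order factorisation machine scores a sparse feature vector in which the user, the programme, its genre and its brand can all be active at once, and models every pairwise interaction through dot products of latent vectors. The sketch below only computes that prediction in plain Python, with hand-picked rather than learned weights and hypothetical feature names. It is not the API of xlearn or any other toolbox.

```python
def fm_predict(
    x: dict[str, float],
    w0: float,
    w: dict[str, float],
    v: dict[str, list[float]],
) -> float:
    """Score one sparse feature vector with a second-order factorisation machine.

    x  : active features and their values, e.g. {"user=u1": 1.0}
    w0 : global bias; w : per-feature linear weights
    v  : per-feature latent vectors, all of the same length k
    """
    linear = sum(w.get(name, 0.0) * value for name, value in x.items())

    # Pairwise term via the identity
    #   sum_{i<j} <v_i, v_j> x_i x_j
    #     = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2],
    # which costs O(k * n) instead of O(k * n^2).
    active = [(v[name], value) for name, value in x.items() if name in v]
    k = len(active[0][0]) if active else 0
    pairwise = 0.0
    for f in range(k):
        s = sum(vec[f] * value for vec, value in active)
        s2 = sum((vec[f] * value) ** 2 for vec, value in active)
        pairwise += 0.5 * (s * s - s2)
    return w0 + linear + pairwise


# Hypothetical features for one (user, programme) pair; weights are hand-picked, not learned.
example_x = {"user=u1": 1.0, "programme=life_hacks": 1.0, "genre=health": 1.0}
example_v = {
    "user=u1": [1.0, 0.0],
    "programme=life_hacks": [0.5, 0.5],
    "genre=health": [1.0, 1.0],
}
# Approximately 2.8: 0.1 bias + 0.2 linear + 2.5 pairwise.
print(fm_predict(example_x, 0.1, {"programme=life_hacks": 0.2}, example_v))
```

Because genre and brand features get latent vectors of their own, a newly published programme can still be scored through them before it has any listens, which is how this kind of model softens the cold-start problem of pure collaborative filtering.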

We stored user-item interactions (i.e. the programmes a specific user has listened to) in a BigQuery table. The programme items with the corresponding genre and brand metadata were in a different table, and both needed to be assembled in the correct format for the factorisation machines. Our first choice of toolbox was xlearn. The code seemed relatively mature, running a first test example was easy, and the toolbox offers a variety of different flavours in terms of learning algorithm. But it was hard to get the data into the correct format and, even now that we have a version up and running, we’re still not sure we got everything right — mainly because the initial results are nowhere near as good as we had wanted (and expected) them to be!
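The data-wrangling step is where things got hard, so a minimal sketch of the join may help. It assumes the two tables have already been fetched as Python rows (the field names `user_id`, `programme_id`, `genre`, `brand` and `label`, and the sample rows, are all hypothetical). It also assumes the libffm-style text format `label field:index:value ...` that xLearn's documentation describes for its field-aware models; check the docs before relying on it. Listens only give positive examples, so the `label` column, including any sampled negatives, is assumed to exist already. Every distinct feature value needs one stable integer index, which is an easy place for silent bugs to creep in.

```python
def to_libffm(interactions: list[dict], programmes: dict[str, dict]) -> list[str]:
    """Join user-programme interactions with programme metadata and emit
    libffm-style lines of the form "label field:index:value ...".

    Field layout (an assumption for this sketch): 0 = user, 1 = programme,
    2 = genre, 3 = brand. Every feature is one-hot, so each value is 1.
    """
    feature_index: dict[str, int] = {}

    def index_of(feature: str) -> int:
        # Give each distinct feature string a stable integer id on first sight.
        return feature_index.setdefault(feature, len(feature_index))

    lines = []
    for row in interactions:
        meta = programmes.get(row["programme_id"])
        if meta is None:
            continue  # no metadata row for this programme: drop the interaction
        features = [
            (0, f"user={row['user_id']}"),
            (1, f"programme={row['programme_id']}"),
            (2, f"genre={meta['genre']}"),
            (3, f"brand={meta['brand']}"),
        ]
        columns = " ".join(f"{field}:{index_of(name)}:1" for field, name in features)
        lines.append(f"{row['label']} {columns}")
    return lines


# Hypothetical rows standing in for the two query results.
SAMPLE_INTERACTIONS = [
    {"user_id": "u1", "programme_id": "p1", "label": 1},
    {"user_id": "u2", "programme_id": "p1", "label": 0},
    {"user_id": "u1", "programme_id": "p9", "label": 1},  # p9 has no metadata
]
SAMPLE_PROGRAMMES = {"p1": {"genre": "Comedy", "brand": "Radio 1 Comedy"}}

for line in to_libffm(SAMPLE_INTERACTIONS, SAMPLE_PROGRAMMES):
    print(line)
```

In practice the `feature_index` mapping would also need to be saved, so that training and scoring use the same indices.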

The quality of recommendations can be subjective and we needed a way to test them before making them live anywhere on the BBC’s websites or apps. Predicting past behaviour is one way of doing this, but it also comes with all sorts of problems: users only click on what they see. A piece of content might be brilliant, but if it does not appear in the results, the user will not see it and cannot click on it. Recommending the most popular items generally gives good numbers (as by definition these items get the most clicks), but rarely leads to recommendations of fresh content. In practical terms, it’s also a lot of work to set up if your data is stored in ways that were not devised with ease of access for data scientists in mind…
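One common offline set-up for ‘predicting past behaviour’ is leave-last-out evaluation: hide each user's most recent listen and check whether it appears in their top-k recommendations (hit rate@k). The sketch below uses invented histories and a hypothetical popularity baseline to show the bias described above. The baseline scores respectably just by recommending the most-played item, while the fresh releases can never be hit.

```python
from collections import Counter
from typing import Callable

# A recommender takes a user's training history and k, and returns up to k items.
Recommender = Callable[[list[str], int], list[str]]


def hit_rate_at_k(histories: dict[str, list[str]], recommend: Recommender, k: int) -> float:
    """Leave-last-out evaluation: hide each user's most recent item and report
    the share of users whose hidden item appears in their top-k list."""
    hits = evaluated = 0
    for items in histories.values():
        if len(items) < 2:
            continue  # need at least one item to learn from and one to hold out
        *train, held_out = items
        evaluated += 1
        if held_out in recommend(train, k):
            hits += 1
    return hits / evaluated if evaluated else 0.0


def popularity_baseline(histories: dict[str, list[str]]) -> Recommender:
    """Recommend the most-listened items the user has not heard yet.

    Popularity is counted on the training part of each history only, so the
    held-out items do not leak into the baseline.
    """
    counts = Counter(item for items in histories.values() for item in items[:-1])

    def recommend(train: list[str], k: int) -> list[str]:
        seen = set(train)
        return [item for item, _ in counts.most_common() if item not in seen][:k]

    return recommend


# Hypothetical histories, oldest first; "new_a" and "new_b" are fresh releases.
SAMPLE_HISTORIES = {
    "u1": ["doc", "hit"],
    "u2": ["hit", "new_a"],
    "u3": ["quiz", "hit"],
    "u4": ["hit", "film", "new_b"],
}
print(hit_rate_at_k(SAMPLE_HISTORIES, popularity_baseline(SAMPLE_HISTORIES), k=1))  # 0.5
```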

So we decided to test the results using qualitative evaluation, asking about 20 editorial and other non-technical people to judge our new recommendations against those from an existing provider. We didn’t tell them which set came from which recommender! We used the individual history of the internal test participants to generate the recommendations by both providers and asked for their preference and general feedback.
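Blinding is easy to get wrong by hand. Here is a small sketch of one way to do it, assuming each participant sees two lists labelled A and B. The mapping is randomised per person, only the preferred letter is recorded, and the results are un-blinded at the end. The names `in_house` and `existing` are placeholders; the post does not describe how the experiment was actually administered.

```python
import random
from collections import Counter

SYSTEMS = ("in_house", "existing")  # placeholder names for the two recommenders


def blind_assignment(participants: list[str], seed: int) -> dict[str, dict[str, str]]:
    """Randomly decide, per participant, which recommender's list is shown as A and which as B."""
    rng = random.Random(seed)
    assignment = {}
    for person in participants:
        first, second = rng.sample(SYSTEMS, 2)
        assignment[person] = {"A": first, "B": second}
    return assignment


def unblind(assignment: dict[str, dict[str, str]], choices: dict[str, str]) -> Counter:
    """Map each participant's 'A'/'B' preference back to a recommender and count wins."""
    return Counter(assignment[person][label] for person, label in choices.items())


assignment = blind_assignment(["p1", "p2", "p3"], seed=7)
choices = {"p1": "A", "p2": "B", "p3": "A"}  # collected without revealing the mapping
print(unblind(assignment, choices))
```

Keeping the free-text feedback alongside each choice, as the team did, is what turns a simple win count into something actionable.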

Qualitative experiment

Most of our test users preferred the recommendations we currently have live to our first set of test recommendations, and we weren’t keen on them either, so we knew we had more work to do.

With the overall infrastructure set up, it was quite easy to swap out the toolbox we had used for the factorisation machines. We had previously looked at and it had a much simpler data format, so we decided to give it a go. We were able to compute new recommendations and run another qualitative experiment in less than two weeks. Our recommendations looked much better, and our test users agreed! However, these are still first results. We don’t feel we’ve fully solved the problem of recommending popular items versus programmes that are strongly tailored towards a specific user’s interests, and are looking into ways to improve this.

We are happy with the results so far, but there is still a lot of work to do to bring the recommender into production. The infrastructure needs decidedly more work to make it robust and able to scale, and we’d like to do more testing. Having a variety of offline metrics should help us to optimise parameters, and test new algorithms without having to go back to our testing panels every few days. We’re also still looking at a simple content-based recommender to have another baseline, so more results hopefully soon.

We also still have some more fundamental questions that we hope our practical work will help us to answer. For example, can we use the same approach for recommending entertainment as for news, or do we need specialised systems for each domain? And what if we change the medium and move from audio and video to text, or new interfaces like voice controlled devices? Even if the overall editorial guidelines do not change, we might need different technical approaches to be able to achieve them. But we also want to avoid starting from scratch for every new recommender we build, and we’re still trying to figure out how best to do that. In summary, there is lots to do, but it’s exciting and we’re enjoying the challenge!

Want to work with us?

Personalisation: Is there a price for convenience?

Sinead O'Brien — Fri, 19 Oct 2018 12:52:34 +0000

Sinead O'Brien, Technology Strategy & Architecture's Lead Project Manager for Transformation Delivery, shares her insights from this month's BBC Machine Learning Fireside Chat.

As more and more of our intimate data is collected, is there a price for convenience? If so, what is it, and is it worth paying? The decidedly thought-provoking discussion at last week’s sold-out ‘BBC Machine Learning Fireside Chats presents: The Price of Convenience’ was hosted by Ahmed Razek of BBC Blue Room.

The provocation…
There is an increasingly fine line between personalised services and invasive services. Do people understand that they’re trading their personal data for these services? Are they aware of the risks? Do they care?

On the panel…
Maxine Mackintosh, PhD student at . Maxine’s PhD involves mining medical records for new predictors of dementia. She is passionate about understanding how we might make better use of routinely collected data to improve our cognitive health.

Also on the stellar line-up was Josh Cowls, Research Associate in Data Ethics at The Alan Turing Institute, and a doctoral researcher at the Digital Ethics Lab, Oxford Internet Institute. Josh's research agenda centres on decision-making in the digital era, with a particular focus on the social and ethical impact of big data and AI and its intersection with public opinion and policy-making.

The third guest speaker for the evening, Martin Goodson, is Chief Scientist and CEO of . Martin is a specialist in natural language processing, the computational understanding of human language.

The discourse…

Maxine kicked off the conversation with a rather hard-hitting statement: that we misunderstand what “health data” really means. When we discuss health data, it is presumed that we mean the data collected when we interact with the health system, i.e. our medical records. Incorrect. That is “sickness data”. Health data includes search data (the information captured when we Google) and data such as travel, which indicate how healthy we are.

Maxine is a member of independent review panel. The board looks to build trust through radical transparency. She argued that we cannot expect the NHS or academia to afford the computational power required to get things right, so we have to work together with the large corporations. Corporates can play an innovative role, but they (and not just DeepMind) should not be enabled to profit from our data. We, the citizens, own the data. The government has a regulatory role to play in protecting society.

Martin spoke further about the tensions between privacy and innovation: if we are too private with our data, there will be less innovation. He argued that the privileged in society are more likely to benefit from AI in terms of convenience. Data needs to work for people and for society. Misuses of machine-learning-based systems that have led to cruel justice were cited as examples of the negative impact on the less privileged in society.

The panel then moved on to the topic of ethics. There has been a sudden interest in the ethical perspective. Ahmed asked if there is a risk of “ethics-washing”: using ethical defence to side-step issues such as privacy, autonomy and agency. The general consensus amongst the panel was that the UK is in a good place to be setting the agenda. Europe has a long tradition of setting out human liberties. But we need to be ethical and enable innovation at the same time.

The panel argued that unless citizens are personally affected by data breaches, they don’t really understand the repercussions. The public perspective is as much about when and how you ask, as whom you ask. We don’t need to teach kids to code. We need to teach young people to think about how coding impacts and why the control of data may be important.

Maxine highlighted that NHS users are automatically opted in to their depersonalised confidential patient information being used for research and planning by the NHS, as well as by commercial and academic partners. NHS data isn’t great, but it does have scale, and there are huge benefits for population-level research. Health data was likened to taxes: a societal contract. Informed decision-making is important. I am happy to share my data in this scenario. Would you opt out of giving your health data?

The discussion closed with a last thought-provoking question: "Can we put data solely in the hands of non-profits?". The panel argued that our health and justice systems need to be able to engage with organisations commercially. And sufficient profit is needed to run these organisations. The panel concluded that we need to define what we mean by “reasonable profit” in this sense.

For more details about upcoming events, visit .

Privacy by design

Adam Bailin — Wed, 13 Jun 2018 14:18:00 +0000

This little button means a lot to us.

When it’s switched on it allows you to view personalised content recommendations based on your historic BBC activity data.

I work for the BBC Analytics Services team in Cardiff and we’ve been working on what happens when .

The BBC decided that when you switch off personalisation, it shouldn’t link any activity data to your account. In fact, we think that it shouldn’t ever be possible to tie activity data back to you. This page expresses it well:

“Data about how you use the BBC will be anonymous. For instance, we’d be able to see that someone looked at a particular story on BBC News, but we wouldn’t be able to tell if it was you.”

That’s actually quite a difficult software engineering problem to solve. The way most analytics tracking systems work is by using cookies or some other persistent identifier precisely to be able to tie together a user’s activity across multiple sessions. To get around that, we reset a user’s analytics identifier whenever they switch personalisation off.
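The reset idea can be sketched in a few lines. This is a hypothetical model of one browser's analytics state, not the BBC's implementation. Events carry a random analytics ID, the account is attached only while personalisation is on, and changing the setting issues a fresh ID. Rotating the ID when personalisation is switched back on, as well as when it is switched off, is our own assumption: without it, the ID used during the anonymous period would later be recorded alongside the account.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnalyticsState:
    """Hypothetical client-side analytics state for one browser."""

    account_id: Optional[str] = None
    personalisation: bool = True
    analytics_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def record(self, page: str) -> dict:
        """Build an activity event; attach the account only when personalisation is on."""
        event = {"analytics_id": self.analytics_id, "page": page}
        if self.personalisation and self.account_id is not None:
            event["account_id"] = self.account_id
        return event

    def set_personalisation(self, enabled: bool) -> None:
        """Switch personalisation, issuing a fresh random ID whenever the setting changes."""
        if enabled != self.personalisation:
            # A new ID breaks the chain between activity before and after the switch.
            self.analytics_id = uuid.uuid4().hex
        self.personalisation = enabled


state = AnalyticsState(account_id="acct-123")
before = state.record("/sport")
state.set_personalisation(False)
after = state.record("/news")
print("account_id" in after)  # False
print(before["analytics_id"] != after["analytics_id"])  # True
```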

How it's normally done

Normally, when you sign in or out of a service your analytics identifier will persist. This means that organisations can attribute data to you as an individual even when you’re signed out or have chosen not to receive a personalised experience.

How we've designed it

We’ve designed our analytics ID differently, with privacy in mind. We’ve designed it so that when you sign in to your BBC account and disable personalisation, we can’t attribute activity back to you as an individual.

We’ve designed privacy into our analytics.

Putting users in control of their data

The General Data Protection Regulation, or GDPR for short, is one of the biggest changes to data privacy law in recent years. It is designed to put you in control of how your information is collected and used by organisations.

This change to our analytics services is a small example of how we are designing privacy as a feature to put users in control of their data. It is part of a wider opportunity to enable much greater control over how your data is collected, what you share and with whom, and in turn to drive more relevant and nuanced personalisation of the BBC’s services.

We see privacy not just as an exercise in legal compliance but as an opportunity to deliver greater value for users.

Your data matters

Julie Foster — Tue, 22 May 2018 14:04:00 +0000

Last , we updated everyone on our plans to make the BBC more personalised and relevant to you. We can give you more of what you love when we understand you better, and also make sure that as a public service, we make something for everyone.

We now have over 15 million people with BBC accounts who have used the BBC’s websites and apps in the last month. What’s more, they use BBC websites and apps more than people who are not signed in: 64% of BBC account users visit BBC online more than 2 days per week, compared to 46% of all users. And when they are on BBC websites and apps, people with a BBC account spend an additional hour per week compared with people who are not signed in.

Your personal data is helping power this transformation. We can’t provide you with a meaningful personal or tailored experience without this information, but it is ultimately your data. And your data matters.

The General Data Protection Regulation, or GDPR for short, comes into force next week. It makes sure that businesses clearly explain to you why they collect your personal data and how they use it. It is an evolution of the Data Protection Act, and gives you new and important rights.

As we’ve said before, we’ve built our new BBC account system with GDPR in mind, but we’re always reviewing our processes, technology and governance.

We use your personal data for different reasons, and it’s important that we are transparent with you about why we collect and use this data. Our site spells out, in plain English, what we will (and importantly won’t) do with your data. It can also help you exercise your GDPR rights, such as changing some of your details in Settings.

How have we prepared for GDPR?

For starters, you should not need to be a rocket scientist to know your rights. We’ve updated our policies to make them even more transparent and clear.

What are my rights?

We’ve created a new section in Using The BBC all about GDPR to help you understand what your rights are, how you can exercise them with the BBC and get help.
We’ve also innovated and developed technology with data privacy at the heart of what we create. Below are a couple of examples of the kind of work we’ve done to prepare for GDPR.

Privacy for children

We want to help you make sure that your child can only watch programmes, read comments and upload their creations in a space that is age appropriate and suitable to them. For this reason, we’ve developed a way for parents or guardians to register their child which is simple and easy to do, but more importantly is safe and secure for the entire family. We want your children to get great experiences on BBC websites and apps, and we want to play our part in helping to protect them.

Privacy by design

This little button means a lot to us.

We really want you to have a personalised experience, like picking up where you left off watching a show, getting recommendations on programmes you might like or getting notifications about your favourite football team. But you have the right to turn these features off if you don’t want them. Our analytics services team has worked hard to develop a technical solution that makes this easy while ensuring your privacy.

What’s next?

We have some fantastic events coming this summer, from the to FIFA World Cup and Wimbledon. Your data is helping us learn what you like, so we can make sure you get the best out of this summer, and improve our services for you in the future.

Building a more personal and local BBC News website

Karolina Iwaszko — Wed, 08 Nov 2017 09:55:00 +0000

We are making some changes to the way we provide local news, sport and other location driven information, like travel or weather updates online. Our new pages offer a view of what’s relevant and important to a person from the perspective of their locality and allow users to define how zoomed in they want their view to be.

If you enter or share your location, you’ll get the latest BBC content about the location that matters to you. You can choose from all locations in the UK, the Channel Islands and the Isle of Man (around 17,000 of them!). You will see the stories relevant to your selected location in the Latest Stories section; they are ordered by publication time, with the most recent ones at the top.

You’ll also see, at the top of the page, a selection of the stories that we think might interest you most, covering a slightly wider surrounding area. To make it more personal and convenient, we are making it possible to save your chosen location, so when you return to the page you will immediately see the content that is most relevant to you. Your location will be saved in cookies and shared with other BBC websites. Clearing cookies will remove this setting.

We understand people can have different views of what is “local” so we have designed this new proposition to allow everyone to define the level of locality most suitable for them personally. If you are interested in what’s happening in the surrounding area you can extend the radius to bring in stories from further afield (up to a radius of 20 miles).
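The radius option and the newest-first ordering can be sketched together. This assumes each story is tagged with a single latitude/longitude point, which the post does not say (it only mentions around 17,000 selectable locations), and uses the standard haversine great-circle distance. The story data and coordinates are made up.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_MILES = 3958.8
MAX_RADIUS_MILES = 20.0  # the upper limit on the user-chosen radius


@dataclass
class Story:
    headline: str
    lat: float
    lon: float
    published: datetime


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def latest_stories(stories: list[Story], lat: float, lon: float, radius_miles: float) -> list[Story]:
    """Stories within the chosen radius (capped at 20 miles), newest first."""
    radius = min(radius_miles, MAX_RADIUS_MILES)
    nearby = [s for s in stories if haversine_miles(lat, lon, s.lat, s.lon) <= radius]
    return sorted(nearby, key=lambda s: s.published, reverse=True)


# Hypothetical stories around Cardiff (coordinates are approximate).
STORIES = [
    Story("Cardiff story", 51.48, -3.18, datetime(2017, 11, 8, 9, 0)),
    Story("Newport story", 51.58, -3.00, datetime(2017, 11, 8, 10, 0)),
    Story("Swansea story", 51.62, -3.94, datetime(2017, 11, 8, 11, 0)),
]
for story in latest_stories(STORIES, 51.48, -3.18, radius_miles=20):
    print(story.headline)  # Newport (about 10 miles away) first, then Cardiff
```

Swansea, roughly 34 miles away, stays out even if a larger radius is requested, because of the 20-mile cap.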

These changes are part of a wider piece of work designed to improve discovery and navigation around BBC content, as well as to make the BBC more relevant to its audience by offering more personalised products and services online. Other changes will follow and will see us moving away from our current local pages towards a more refined service of personalised content.

We are committed to continuing to improve the BBC’s local news experience and we will be adding more features in the coming months. We really welcome your .

You can find more about the new service here.

Laura Ellis (Head of Digital, English Regions) and Karolina Iwaszko (Executive Product Manager, Design & Engineering).
