
ABC-IP and work on audio archiving research


Dominic Tinley | 09:09 UK time, Tuesday, 8 November 2011

We're a few months into a new collaboration with our project partners, looking at how to unlock archive content by making more effective use of metadata. The Automatic Broadcast Content Interlinking Project (ABC-IP for short) is researching and developing advanced text processing techniques to link together different sources of metadata around large video and audio collections. The project is part-funded by the Technology Strategy Board (TSB) under its 'Metadata: increasing the value of digital content' competition, which was awarded earlier this year. The idea is that by cross-matching the various sources of data we have access to - many of which are incomplete - we will be able to build a range of new services that help audiences find their way around content from the BBC and beyond.

Our starting point is the English component of the massive World Service audio archive. The World Service has been broadcasting since 1932, so deriving tags from this content gives us a hugely rich dataset of places, people, subjects and organisations across a wide range of genres, all mapped against the times they were broadcast.

The distribution of programme genres in the World Service radio archive


One of the early innovations on the project has been to improve the way topic tags can be derived from automated speech-to-text transcriptions, which gives us a whole new set of metadata to work with for comparatively little effort. Speech recognition produces transcriptions with a high word error rate, so we've optimised various algorithms to cope with that kind of noisy text, and the results so far have been quite impressive.
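To give a feel for what tagging a noisy transcript involves, here is a toy sketch. It is not the project's algorithm: it simply uses fuzzy string matching from Python's standard library so that misrecognised words can still count towards a topic. The `TOPIC_CUES` vocabulary, the `tag_transcript` function and its thresholds are all made up for illustration.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical vocabulary: topic label -> lower-case cue words.
TOPIC_CUES = {
    "Nigeria": ["nigeria", "lagos", "abuja"],
    "Cricket": ["cricket", "wicket", "innings"],
}


def tag_transcript(transcript, topic_cues=TOPIC_CUES, cutoff=0.8, min_hits=2):
    """Return topics whose cue words occur, approximately, at least min_hits times."""
    # Invert the vocabulary so each cue word points back to its topic.
    cue_to_topic = {cue: topic for topic, cues in topic_cues.items() for cue in cues}
    cues = list(cue_to_topic)

    hits = Counter()
    for word in transcript.lower().split():
        word = word.strip(".,;:!?\"'")
        # Fuzzy match tolerates small recognition errors such as "wickit".
        match = get_close_matches(word, cues, n=1, cutoff=cutoff)
        if match:
            hits[cue_to_topic[match[0]]] += 1

    # Requiring several hits stops one misheard word from creating a tag.
    return [topic for topic, count in hits.most_common() if count >= min_hits]


noisy = "the wickit fell early in the inings at lagos"
print(tag_transcript(noisy))
```

Requiring more than one hit per topic is the main design choice here: with a high word error rate, a single matching word is weak evidence, whereas several related words in the same programme are much more convincing.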

Other sources of data include everything from BBC Programmes, including the topics manually added by BBC producers, and everything from BBC Redux, an internal playable archive of almost everything broadcast since mid-2007. In later stages of the project we'll also be adding data about what people watch and listen to. Blending all this together provides many different views of BBC programmes and related content including, for example, topics over time or mappings of where people and topics intersect. The end result is a far richer set of metadata for each unique programme than would be possible with either automatic or manual metadata generation alone.
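As a rough illustration of the blending idea, the sketch below merges tags from several sources per programme, remembers which source supplied each tag, and then counts how often each topic was broadcast per year. The function names, programme identifiers and data shapes are assumptions made for the example, not the project's actual data model.

```python
from collections import defaultdict


def blend_tags(*sources):
    """Merge (source_name, {programme_id: tags}) pairs into
    {programme_id: {tag: set of source names}}."""
    blended = defaultdict(lambda: defaultdict(set))
    for source_name, tags_by_programme in sources:
        for programme_id, tags in tags_by_programme.items():
            for tag in tags:
                blended[programme_id][tag].add(source_name)
    return blended


def topics_over_time(blended, broadcast_year):
    """Count, for each tag, how many programmes carrying it aired in each year."""
    counts = defaultdict(lambda: defaultdict(int))
    for programme_id, tags in blended.items():
        year = broadcast_year.get(programme_id)
        if year is None:
            continue  # no broadcast date known for this programme
        for tag in tags:
            counts[tag][year] += 1
    return counts


# Hypothetical inputs: producer-added tags and tags derived from transcripts.
manual = {"p1": {"Nigeria"}}
automatic = {"p1": {"Nigeria", "Oil"}, "p2": {"Oil"}}

blended = blend_tags(("manual", manual), ("automatic", automatic))
timeline = topics_over_time(blended, {"p1": 1985, "p2": 1990})
print(dict(timeline["Oil"]))
```

Keeping the provenance of each tag means a tag confirmed by both a producer and the automatic process can be trusted more than one that only a noisy transcript suggests.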

Based on the work so far, our project partners have built the first user-facing prototype for the project, called Tellytopic, which lets users navigate between programmes using each of the new tags available. You can find more on .

The plan is that the work we're doing will eventually complement projects in other parts of the BBC, such as Audiopedia, which was announced by Mark Thompson last week. We'll talk more about other ways we're going to use the data on this blog over the coming months.
