BBC Genome Blog Feed News, highlights and banter from the team at BBC Genome – the website that shows you all the BBC’s listings between 1923 and 2009 (and tells you what was on the day you were born!) Join us and share all the oddities, archive gems and historical firsts you find while digging around… 2016-11-09T16:00:00+00:00 Zend_Feed_Writer /blogs/genome <![CDATA[Under construction...]]> 2016-11-09T16:00:00+00:00 2016-11-09T16:00:00+00:00 /blogs/genome/entries/adf3c700-7be5-4f5e-a04b-b84e31fe6cf0 Andy Armstrong <div class="component"> <img class="image" src="https://ichef.bbci.co.uk/images/ic/320xn/p04fhg7z.jpg" srcset="https://ichef.bbci.co.uk/images/ic/80xn/p04fhg7z.jpg 80w, https://ichef.bbci.co.uk/images/ic/160xn/p04fhg7z.jpg 160w, https://ichef.bbci.co.uk/images/ic/320xn/p04fhg7z.jpg 320w, https://ichef.bbci.co.uk/images/ic/480xn/p04fhg7z.jpg 480w, https://ichef.bbci.co.uk/images/ic/640xn/p04fhg7z.jpg 640w, https://ichef.bbci.co.uk/images/ic/768xn/p04fhg7z.jpg 768w, https://ichef.bbci.co.uk/images/ic/896xn/p04fhg7z.jpg 896w, https://ichef.bbci.co.uk/images/ic/1008xn/p04fhg7z.jpg 1008w" sizes="(min-width: 63em) 613px, (min-width: 48.125em) 66.666666666667vw, 100vw" alt=""><p><em>Not the BBC Genome team at work - but we thought you would like this picture of an engineer installing cable in the new BBC Wales Broadcasting House in 1965</em></p></div> <div class="component prose"> <p><a title="BBC Internet Blog" href="http://www.bbc.co.uk/blogs/internet/entries/cedd1f30-ac95-3271-8e8a-78c07b9a4d23" target="_blank">When we launched Genome</a> just over two years ago we had no idea how popular the ability to make corrections to programme listings would be. It turns out that it's been quite popular - as I write we've accepted more than 175,000 edits.
Thank you to <a title="Genome - editors" href="http://www.bbc.co.uk/blogs/genome/entries/0e47c634-5d48-47f8-bffe-ca68cdb200be" target="_blank">everyone who has worked</a> so diligently to improve the quality of the data.</p> <p>When a correction is suggested it is sent from one of the four Genome web servers to an admin server for review by the moderating team. When a change is accepted it is then sent back to the web servers where it is applied to the main Genome database.</p> <p>The current system has worked reasonably well but it isn't as flexible as we'd like it to be. In addition to individual corrections suggested via the <a title="BBC Genome" href="http://genome.ch.bbc.co.uk" target="_blank">Genome website</a> we also want to make bulk changes to the data - for example when new sources of programme listings are discovered or to fix an OCR error that affects many programmes in the same way. We're also working on being able to split listings where a single item should be broken into multiple programmes. The Genome software doesn't currently handle <a title="Genome FAQs" href="http://genome.ch.bbc.co.uk/faqs#missing-listings" target="_blank">those kinds of edits.</a></p> <p>We've been working on a more flexible and general purpose version of this core functionality. It allows all changes to the Genome data to be handled in the same way. That makes the code simpler and more reliable which, in turn, makes it easier for us to introduce new editing features.</p> <p>We've been testing this new code for a few weeks and now it's time to release it. Because we're switching to an entirely new way of synchronising edits between the public web servers and the admin server we will have to disable editing while we migrate to this new system. We hope to complete the switchover in a few days.</p> <p>At first you won't (we hope) notice any changes; our initial goal is to get everything working as before with the new code. 
Once we're happy that we've done that we can start introducing new editing features and publishing some of the additional listing sources we've discovered.</p> </div> <![CDATA[New content added to Genome!]]> 2015-09-16T14:45:00+00:00 2015-09-16T14:45:00+00:00 /blogs/genome/entries/06107fe1-eeae-4ed6-8466-6b5df49c93a9 <div class="component prose"> <p>Many of you noticed and wrote to tell us that the listings for programmes like Grandstand and musical concerts were incomplete. So good news coming up - we're pleased to announce that some 40,000 listings now have more information in them.</p> <p>These enhanced and enriched entries restore information that was missed because of a number of issues which occurred during the original scanning process, which is <a title="explained in detail here" href="http://genome.ch.bbc.co.uk/faqs#missing-listings" target="_blank">explained in detail here.</a></p> <p>You’ll now find that there are extended synopses for programmes including Grandstand, which show start times and details for individual sports. Previously, this information was listed in extra 'boxes' scattered around Radio Times and was not directly associated with the main entry.</p> <p>Further details about music programmes and pieces played during these shows have also been restored. A number of listings which displayed incorrect dates have also been cleaned up as a result of this work.</p> <p>A previous entry on the blog reveals exactly how this data has been <a title="painstakingly extracted" href="http://www.bbc.co.uk/blogs/genome/entries/c2bfad9e-bae3-4a62-aa85-cc8b020b1b2d" target="_blank">painstakingly extracted</a> by Genome's technical team. This is an ongoing process that could eventually add a total of 1,000,000 new and updated programme listings.</p> <p>We hope that being able to search for composers and pieces of music, for example, will make Genome more useful for you. 
But remember, BBC Genome is still very much a work in progress, so we would appreciate all <a title="your help with edits" href="http://genome.ch.bbc.co.uk/faqs#entry-edit" target="_blank">your help with edits</a> to continue to make it better.</p> </div> <div class="component"> <img class="image" src="https://ichef.bbci.co.uk/images/ic/320xn/p032nn2l.jpg" srcset="https://ichef.bbci.co.uk/images/ic/80xn/p032nn2l.jpg 80w, https://ichef.bbci.co.uk/images/ic/160xn/p032nn2l.jpg 160w, https://ichef.bbci.co.uk/images/ic/320xn/p032nn2l.jpg 320w, https://ichef.bbci.co.uk/images/ic/480xn/p032nn2l.jpg 480w, https://ichef.bbci.co.uk/images/ic/640xn/p032nn2l.jpg 640w, https://ichef.bbci.co.uk/images/ic/768xn/p032nn2l.jpg 768w, https://ichef.bbci.co.uk/images/ic/896xn/p032nn2l.jpg 896w, https://ichef.bbci.co.uk/images/ic/1008xn/p032nn2l.jpg 1008w" sizes="(min-width: 63em) 613px, (min-width: 48.125em) 66.666666666667vw, 100vw" alt=""><p><em>A number of listings for sports digest Grandstand will now be more complete</em></p></div> <![CDATA[Data archaeology - or the missing listings]]> 2015-08-11T10:30:00+00:00 2015-08-11T10:30:00+00:00 /blogs/genome/entries/c2bfad9e-bae3-4a62-aa85-cc8b020b1b2d Jake Berger <div class="component prose"> <p>The small-but-perfectly-formed Genome technical team have spent the last six months digging into the vast cache of data that was produced during the scanning of over 350,000 pages of the Radio Times. They’ve found quite a lot of interesting artefacts in unexpected places, and are currently brushing them down and cleaning them up in readiness for publication on the site...</p> <p>I’ll describe a couple of these artefacts, and try to explain how they came to exist and what we are doing with them.</p> <p>The Genome project set out to create an easily navigable and searchable broadcast history of the BBC - the public face of the BBC’s catalogue - rather than to simply digitise the Radio Times. 
This meant that substantial chunks of the magazines would not be made available online (e.g. the adverts, articles, letters), as these would have made the listings themselves harder to find.</p> <p>In order to achieve this, during the design of the data extraction process, algorithms were written to automagically identify which sections of the magazine and which elements within a page were likely to contain a listing (as opposed to anything else). These algorithms essentially looked for patterns in the layout of pages and patterns in sequences of characters.</p> <p>As with most automated pattern recognition scenarios, there will always be an element of error. Given the substantial variance in the design and layout of the Radio Times between 1923 and 2009 (I blame the graphic designers) these algorithms had to be continually tweaked to achieve a satisfactory accuracy rate, but a degree of error was inevitable. Finding and correcting these errors is of huge importance if we are going to be able to make available the fullest and most accurate representation of the BBC’s broadcast history.</p> <p>Simon Smith - our champion data archaeologist - spent a substantial amount of time poring over a million or so rows of data in the Genome database in two tables, called ‘genome_related’ and ‘genome_tables’. 
Fields in these rows contain blocks of text that the algorithms decided were not programme listings (they usually decided that these were articles and therefore not for publication).</p> <p> </p> </div> <div class="component"> <img class="image" src="https://ichef.bbci.co.uk/images/ic/320xn/p02z7ydt.jpg" srcset="https://ichef.bbci.co.uk/images/ic/80xn/p02z7ydt.jpg 80w, https://ichef.bbci.co.uk/images/ic/160xn/p02z7ydt.jpg 160w, https://ichef.bbci.co.uk/images/ic/320xn/p02z7ydt.jpg 320w, https://ichef.bbci.co.uk/images/ic/480xn/p02z7ydt.jpg 480w, https://ichef.bbci.co.uk/images/ic/640xn/p02z7ydt.jpg 640w, https://ichef.bbci.co.uk/images/ic/768xn/p02z7ydt.jpg 768w, https://ichef.bbci.co.uk/images/ic/896xn/p02z7ydt.jpg 896w, https://ichef.bbci.co.uk/images/ic/1008xn/p02z7ydt.jpg 1008w" sizes="(min-width: 63em) 613px, (min-width: 48.125em) 66.666666666667vw, 100vw" alt=""><p><em>This is what a Grandstand listing would look like.</em></p></div> <div class="component prose"> <p>Simon found that in many cases, these fields contained either what was essentially an extended synopsis (as found in text boxes adjacent to the programme listing or in the longer photo captions) or further details about the programme listed (such as which pieces by which composer would be played during a radio music programme) or ‘sub-listings’ (such as the start time of the various sports featured in an episode of Grandstand). He has painstakingly identified more than a hundred thousand items which we are working on making available on the Genome website, adding a whole new level of detail to the listings.</p> <p>We are very excited (yes - we do need to get out more) about something else that Simon has found: around 150,000 chunks of text that the algorithms misidentified as articles when in fact they are brief programme listings. We estimate that when the data is extracted and properly classified, we might be able to add nearly a million new programme listings to Genome. 
This will take a good while to get ready, but we’ll blog about it in more detail when we have made some more progress.</p> </div>