Fighting misinformation: An embedded media provenance specification
Mon, 04 Oct 2021 15:51:46 +0000 | Charlie Halford | /blogs/internet/entries/5d371e1b-54be-491f-b8ee-9e354bafb168

For the last few years, the Ö÷²¥´óÐã has had a project running in its technology division looking at technology solutions to various problems in the domain of news disinformation. Part of that effort, called Project Origin, is working to make it easier to understand where the news you consume online really comes from, so that you can decide how credible it is. You can find some history on this in Laura Ellis' excellent "Project Origin: one year on" blog.

Part of Project Origin has been working in collaboration with major media and tech companies, most recently with the Coalition for Content Provenance and Authenticity (C2PA), which it helped form. This group recently released its specification. The spec tackles the problem of missing, trustworthy provenance information in the images, video and audio consumed on the internet. For example, a video of elections in one country from 10 years ago might be presented as video of recent elections in another. This is an overview of how that specification is intended to work.

Embedding

The C2PA specification works primarily by defining mechanisms for embedding additional data into media assets to indicate their authentic origin. An essential aspect of this data is "assertions" - statements about when and where media was produced. The embedded information is then digitally signed so that a consumer knows who is making the statements.

While the C2PA specification also includes mechanisms for locating this provenance data remotely (e.g. hosted somewhere on the internet), I'll focus on the use case where all data is embedded directly in the asset itself.

Data model

The C2PA specification uses a few different mechanisms for embedding and storing data. Embedding is done with JUMBF, a container format, and structured data storage is done with a combination of JSON-LD and CBOR (which is a binary format based on JSON).

Container - The "Manifest Store"

The C2PA specification defines several embedding points in a selection of media formats where a "Manifest Store" in JUMBF format can be placed; this is the container for the various pieces of provenance data. Once you've identified where and how a manifest store is embedded in your favourite media format, most of the specification is format-agnostic.

What is JUMBF?

JUMBF (JPEG universal metadata box format) is a binary container format initially designed for adding metadata to JPEG files, and it’s now used in other file formats too. It is structurally similar to the ISO Base Media File Format (ISO BMFF), an extensible container format that is used for many different types of media files. JUMBF "superboxes" are boxes that only contain other boxes. JUMBF "content type" boxes contain actual payload data, the serialisation of which should match the advertised content type of the box. All boxes have labels, which allow boxes to be addressed and understood when parsing. C2PA uses JUMBF in all the media formats it supports as the container format for the Manifest, Claims, Assertions, Verifiable Credentials and Signatures.
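
To make the box model concrete, here is a rough TypeScript sketch of how a parsed JUMBF tree might be represented and addressed by label. The type and helper names are illustrative only and are not taken from the JUMBF or C2PA specifications.

interface JumbfBox {
  label: string;                 // every box carries a label used for addressing
  contentType?: string;          // advertised content type for payload boxes
  payload?: Uint8Array;          // raw serialised data (CBOR, JSON-LD, ...)
  children?: JumbfBox[];         // present only on "superboxes"
}

// Walk a superbox tree and resolve a slash-separated label path,
// e.g. "c2pa/<manifest label>/c2pa.assertions".
function findByLabel(root: JumbfBox, path: string): JumbfBox | undefined {
  return path.split("/").reduce<JumbfBox | undefined>(
    (box, label) => box?.children?.find((child) => child.label === label),
    { label: "", children: [root] },
  );
}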

Each piece of embedded provenance data is called a “Manifest”. A manifest contains a part of the provenance data about the current asset, or the assets it was made from. Because an asset might have been created from multiple original sources or have been processed multiple times, we will often need to store several manifests to understand the complete history of the current asset.

Manifests are located in the "Manifest Store", which is a JUMBF superbox. The last manifest in the store is the "ActiveManifest", which is the provenance data about the current asset and it's the logical place for validation to start. The other manifests are the data for the "ingredients" of the active manifest - i.e. the assets that were a part of the creation of the active manifest. This is one of the key features of C2PA: each asset provides a graph of the history of editing and composition actions that went into the active asset, exposing as little or as much as the asset publisher wants.

Each manifest within the store is again its own JUMBF superbox. A manifest then consists of a "Claim", an "AssertionStore", a "Verifiable Credential Store" and a "Signature". Manifests are signed by an actor (the “Signer”) whose credential identifies them to the user validating or consuming them.
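
As a rough sketch of how those pieces fit together (the field names are indicative rather than the spec's own), the manifest store might be modelled like this in TypeScript, with validation starting from the last, "active" manifest:

interface Manifest {
  claim: Uint8Array;                    // CBOR-serialised claim (the thing that gets signed)
  assertions: Map<string, Uint8Array>;  // assertion store: label -> serialised assertion
  credentials?: Uint8Array[];           // optional Verifiable Credential store
  signature: Uint8Array;                // COSE signature over the claim
}

interface ManifestStore {
  manifests: Manifest[];                // earlier entries describe the "ingredients"
}

// Validation starts from the provenance data about the current asset.
function activeManifest(store: ManifestStore): Manifest {
  return store.manifests[store.manifests.length - 1];
}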

Diagram of a Manifest box, without any VCs

Assertions

Assertions are the statements being made by the signer of a manifest. They are the bits of provenance data that consumers of that data are being asked to trust, for example, the date of image capture, the geographical location, or the publisher of a video.
In the spec, each assertion has its own data model. Some are published as "Standard Assertions" in the spec, some are adoptions of existing metadata specifications such as EXIF, and it is expected that implementers will extend the spec by defining their own as well.

Media metadata isn't new

For example, the EXIF standard is nearly universal in digital photographs, used to record location and camera settings. The fundamentally new thing that C2PA does is to allow you to cryptographically bind that metadata (with hashes) to a particular media asset and then sign it with the identity credential of the origin of that data, making the result tamper-evident and provable.

Assertions are contained in their own JUMBF Content Type Box in the assertion store superbox and are serialised in the format defined in the spec for that assertion. The C2PA-defined assertions are stored as CBOR, while most adopted assertions from other standards are JSON-LD.

Here's an example of an "Action" assertion (shown in CBOR diagnostic notation) which tells you what the signer thinks was done in creating the active asset:

{
  "actions": [
    {
      "action": "c2pa.filtered",
      "when": 0("2020-02-11T09:00:00Z"),
      "softwareAgent": "Joe's Photo Editor",
      "changed": "change1,change2",
      "instanceID": 37(h'ed610ae51f604002be3dbf0c589a2f1f')
    }
  ]
}

And here's an EXIF one (in JSON-LD) that contains location data:

{
  "@context" : {
    "exif": "http://ns.adobe.com/exif/1.0/",
  },
  "exif:GPSLatitude": "39,21.102N",
  "exif:GPSLongitude": "74,26.5737W",
  ...
}

The one critical assertion is the binding: something that ties the claim to a specific asset (the spec requires one). This ensures that claims are not applied to any asset other than the one they were signed against, which helps the consumer trust that the C2PA data wasn't tampered with between the publisher and the consumer. There are currently two types of "hard bindings" available: a simple hash binding to an area of bytes in a file, or a more complex one intended for ISO BMFF-based assets, which can use their box format to reference specific boxes that should be hashed.
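
To illustrate the idea behind the simple hash binding (not the spec's exact data model), here is a TypeScript sketch using the Web Crypto API: hash the asset's bytes while skipping the byte range occupied by the manifest store itself, then compare the result with the hash recorded in the binding assertion. The Exclusion shape and function name are assumptions made for illustration.

// One exclusion range: typically the bytes occupied by the embedded manifest store.
interface Exclusion { start: number; length: number; }

async function verifyHashBinding(
  asset: Uint8Array,
  exclusions: Exclusion[],
  expectedHash: Uint8Array,
): Promise<boolean> {
  // Collect every byte of the asset that is NOT inside an exclusion range.
  const kept: number[] = [];
  for (let i = 0; i < asset.length; i++) {
    const excluded = exclusions.some((e) => i >= e.start && i < e.start + e.length);
    if (!excluded) kept.push(asset[i]);
  }
  const digest = new Uint8Array(await crypto.subtle.digest("SHA-256", new Uint8Array(kept)));
  // Compare against the hash stored in the hard-binding assertion.
  return digest.length === expectedHash.length &&
    digest.every((byte, i) => byte === expectedHash[i]);
}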

Claim

The claim in a manifest exists to pull together the assertions being made, any "redactions" (removals of previous provenance data for privacy reasons), and some extra metadata about the asset, the software that created the claim, and the hashing algorithm used. Assertions are linked by their reference in the assertion store and a hash. The claim itself is another JUMBF box, serialised as a CBOR structure. This is the thing that is signed, and it provides a location to find the signature itself.
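
Conceptually, the claim is little more than a list of hashed references plus some metadata. The TypeScript sketch below is indicative only; the field names and types are assumptions, and the real CBOR structure in the spec differs in detail.

interface HashedReference {
  url: string;        // JUMBF path of the assertion within the assertion store
  hash: Uint8Array;   // hash of the assertion's serialised bytes
  alg?: string;       // hashing algorithm, if it differs from the claim default
}

interface Claim {
  generator: string;              // software that created the claim
  defaultAlg: string;             // e.g. "sha256"
  assertions: HashedReference[];  // the assertions this claim pulls together
  redactions?: string[];          // paths of assertions removed from ingredient manifests
  signatureRef: string;           // where to find the signature box
}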

Signature

The signature in a manifest is a COSE structure (serialised as CBOR) that signs the contents of the claim box. COSE is the CBOR-based counterpart of the JOSE framework of specs, which includes JWT/JWS. The signature is produced using the credentials of the signer. The signer is the primary point of trust in the C2PA Trust Model, and consumers are expected to use the signer's identity to help them make a trust decision on the claim's assertions.
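
At a very high level, validation boils down to verifying a signature over the claim's bytes with the signer's public key. The sketch below uses the Web Crypto API and deliberately skips the COSE framing (protected headers, the Sig_structure and so on) that a real validator must construct; it is only meant to show the shape of the check, and the choice of ECDSA with SHA-256 is an assumption for illustration.

async function verifyClaimSignature(
  claimBytes: Uint8Array,       // in reality: the COSE signing input built from the claim
  signature: Uint8Array,
  signerPublicKey: CryptoKey,   // extracted from the signer's x.509 certificate
): Promise<boolean> {
  return crypto.subtle.verify(
    { name: "ECDSA", hash: "SHA-256" },  // in practice the algorithm comes from the COSE headers
    signerPublicKey,
    signature,
    claimBytes,
  );
}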

The only currently supported credentials for producing the signature are X.509 certificates. The specification provides a profile that certificates are expected to adhere to (including key usages such as “id-kp-emailProtection”, which is a placeholder). The specification does not include any requirements on how validators and consumers assemble lists of trusted issuers, as it is expected that an ecosystem of issuers will develop around this specification. Instead, it simply requires that validators maintain or reference such a list of trust anchors. Alternatively, they can put together a trusted list of individual entity certificates provided out-of-band of the trust anchor list.

What now?

This is an overview and omits both the detail required to produce C2PA manifests and the breadth of some of the other components of the specification (e.g. ingredients, the use of Verifiable Credentials, the concept of assertion metadata, timestamping, etc.). I'd love to produce a worked example of how to extract and validate a C2PA manifest from an asset; watch out for that in the future. At least one early implementation already exists, and I know of other implementations in the works, too.

At the Ö÷²¥´óÐã, we can't wait for this specification to develop and gain adoption. We'd love to see it supported in production and distribution tools, web browsers, and on social media and messaging platforms. We really think it can make a difference to some of the harms done by mis- and disinformation.

]]>
0
Refreshed interface, extra features: new Ö÷²¥´óÐã Media Player launched for web browsers
Thu, 02 Sep 2021 16:04:34 +0000 | Oli Freke | /blogs/internet/entries/65db5af7-1ec1-4ba5-9f09-4f3c5f632b03

We are pleased to announce the launch of our new web media player, codenamed Project Toucan. We have been working on the player over the last year, and it brings many benefits for audiences, including a refreshed user interface and additional features. The new player is already live on Ö÷²¥´óÐã Weather, Ö÷²¥´óÐã Food and Ö÷²¥´óÐã Motorsport for most browsers as it begins to incrementally roll out over the coming months – Safari and iPhones will be supported in due course.

New playback and skip forward and back controls in the new version of the media player.

Toucan is a significant update to our existing Standard Media Player (SMP) - which has been around in one form or another since the 2012 Olympics. The existing player has been continuously updated since then and is still used across the whole of the Ö÷²¥´óÐã website, from Ö÷²¥´óÐã iPlayer to Ö÷²¥´óÐã Sounds and from the World Service to Weather. Over the last twelve months, the player delivered over three billion streams to web browsers on desktop, tablet, and mobile phones.

Why introduce a new player?

The SMP dates from 2012 and has accumulated support for many older browsers and technologies, which are increasingly being deprecated or no longer exist at all. One example is the use of iframes to separate the player code from the embedding page's code. This technique still works but is no longer the preferred method in the web development community. Additionally, we wanted to completely redesign the user interface (UI) and take a mobile-first approach. Building a brand new UI on top of an old codebase didn’t make sense due to the incompatible technologies we wanted to use. The advantage of a new player is that we can drop all the old code, rebuild more efficiently and be better able to add up-to-date features and technologies in the future.

Toucan, therefore, supports many modern features that audiences expect. These include 20-second skip forward/back buttons, variable speed playback, slicker animations, and a cleaner design with mobile use in mind. Accessibility is also vitally important and has been considered from the start of our development. The player now has keyboard control – press space to play and pause, arrow keys to seek forward and back, and F to go full screen.
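
To give a flavour of the wiring involved (this is an illustrative sketch rather than the player's actual code, and mapping the arrow keys to a 20-second step is assumed for illustration), a keyboard handler for those shortcuts might look like this in TypeScript:

const SEEK_STEP_SECONDS = 20;

function attachKeyboardControls(video: HTMLVideoElement): void {
  document.addEventListener("keydown", (event: KeyboardEvent) => {
    switch (event.key) {
      case " ":                 // space toggles play/pause
        event.preventDefault();
        if (video.paused) { void video.play(); } else { video.pause(); }
        break;
      case "ArrowRight":        // seek forward
        video.currentTime = Math.min(video.duration, video.currentTime + SEEK_STEP_SECONDS);
        break;
      case "ArrowLeft":         // seek back
        video.currentTime = Math.max(0, video.currentTime - SEEK_STEP_SECONDS);
        break;
      case "f":                 // toggle full screen
        if (document.fullscreenElement) { void document.exitFullscreen(); }
        else { void video.requestFullscreen(); }
        break;
    }
  });
}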

The new volume slider in the latest Ö÷²¥´óÐã media player.

One major advantage we’ve brought to the new player is a componentised approach. This means that the UI is loaded separately from the main player, which enables us to produce different UIs more easily when required and to load the most appropriate one. For example, a child-friendly version with larger buttons or additional features could be created and used where necessary (saving download costs by not loading any UI which is not needed). Another advantage of the componentised UI is that we can build once and use everywhere, meaning we don’t waste effort building the same UI for multiple use-cases. The new UI that comes with the new player is already being used with the new Chromecast receiver, which has prevented exactly that kind of duplication. This UI can be seen now on your TV if you cast from iPlayer mobile (and shortly, iPlayer web and Sounds mobile).

New Chromecast receiver controls used in the iPlayer mobile app version of our updated player.

We have also updated to a more modern version of CSS and JavaScript, and the player continues to be built on Dash.js, the open-source media playback component, which we contribute back to for the benefit of the Dash.js community. New releases of Dash.js are, of course, always fully tested before going live in production.

There are performance gains from Toucan's smaller size and more efficient build practices, and benefits from not needing to support as many older browsers as the current player does (and will continue to do).

It’s built using Web Components, which were recently released in browsers as a native way to produce a component that can easily be integrated in diverse situations. We’ve also dropped our use of iframes in favour of Web Components’ built-in encapsulation, and this is leading to a significant decrease in player load times.
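
As a rough illustration of the approach (not Toucan's real code; the element name is made up), a player can be exposed as a custom element whose internals are encapsulated without an iframe:

class ToucanPlayer extends HTMLElement {       // hypothetical element name
  connectedCallback(): void {
    // Shadow DOM keeps the player's markup and styles isolated from the host page,
    // which is the encapsulation role the iframe used to play.
    const shadow = this.attachShadow({ mode: "open" });
    const video = document.createElement("video");
    video.src = this.getAttribute("src") ?? "";
    video.controls = true;
    shadow.appendChild(video);
  }
}

customElements.define("toucan-player", ToucanPlayer);

// Embedding pages then just write: <toucan-player src="..."></toucan-player>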

Now that we have launched the first version, we will be adding all the other features that audiences need in a modern player. This includes audio on-demand, podcasts, live video, live audio, variable speed playback and much more. This will take time, but keep your eyes peeled as we incrementally develop the player across the Ö÷²¥´óÐã’s digital services.

You will soon see the new player on all our web and mobile services that playback video and audio, including Ö÷²¥´óÐã Sport, Ö÷²¥´óÐã News, Ö÷²¥´óÐã iPlayer, Ö÷²¥´óÐã Sounds, World Service and many more over the coming months as the features that these services require are built and launched into the new player.

While Toucan is exciting and will eventually replace the current player, we won’t rush to switch off the SMP. It will be needed by the audience where they are using browsers that don’t support the necessary features that Toucan requires, and we are committed to making sure that people can enjoy the Ö÷²¥´óÐã wherever they happen to be and on whichever platform they choose to use.

But for browsers that do support Toucan - we hope you enjoy that experience. Do send feedback to us at mediaplayer@bbc.co.uk.

]]>
0
Quality engineering for a shared codebase
Mon, 22 Mar 2021 13:39:37 +0000 | Abigael Ombaso | /blogs/internet/entries/c3bfeca9-88b5-4930-8be0-d7ca77ac6ea6

The ‘You might have missed’ section showing featured content at the bottom of the Ö÷²¥´óÐã homepage

The Ö÷²¥´óÐã is developing a shared platform for building its digital products to reduce development complexity and duplication as much as possible. The aim is to enable quicker and more efficient software development, resulting in quicker delivery of digital content to our audiences. Read more about the technology changes.

A key aspect of this project has been having a shared repository for the presentation layer code, with different teams working on this platform. This post shares our experiences so far through the lens of quality engineering, by answering three questions that commonly pop up before, during, and after product development: who is going to use the product, how will we ensure quality, and what have we learned so far?

Who will be using the product?

The main users are the engineers across different teams working directly on the platform, and the consumers of our digital products. We want to keep making great digital products (quality, usability and design), even as we change technology platforms, while also minimising bugs in our software as much as possible.

To reduce bugs and the number of issues raised, testing is integrated into the development workflow, and team members across disciplines have ownership of product quality. Having a consistent approach to testing features in the platform, and a quick feedback loop for spotting and fixing defects early, helps minimise the risks and impact across different teams. This is an ongoing process, fine-tuned based on feedback from the development teams.

Solution: The users’ needs (in our case, those of our digital product users and of the engineering teams building on the shared platform) help to define the product requirements that influence the test process.

How do we do the testing?

One of the key things test engineers and other project stakeholders consider is risk. The impact of code merges and changes cascading to different teams was one such risk in the shared repo. Having a shared platform meant sharing other infrastructure (besides a GitHub repo), such as deployment pipelines, communication channels in Slack, documentation, etc.

A consequence of this is that deployments are now visible to multiple teams or stakeholders, with the notifications in our Slack channels flagging failing builds. Bugs get flagged up quickly and when needed, different development team members are able to ‘swarm’ (even while working remotely) to collaboratively debug and resolve these issues. This has led to more frequent and better communication across teams and we think this has been a beneficial and worthwhile project just for getting more people talking and working together more often.

Consistency has been planned into the project as a whole from the start. Similarly, it was important to have consistency across teams when it came to testing the features developed in the platform, as we simultaneously worked in this shared code space, in order to minimise bugs, regressions and other product risks. Having an overarching test strategy covering approaches to manual and automated testing has informed our testing.

Automated tests run as part of pull request checks and again before deployment to live. We have consistency in the automated test tools we use, so engineers across teams know what the expectations for testing are. This is by no means a finished endeavour but a continuous work in progress, so having forums like the Test Guild and team knowledge sharing helps with communication, continuous learning, and further improvements.

Because of the scale of the project we rely on automated test tooling for regression testing. We also began to use fairly new test tools for visual regression testing, such as Storybook and Chromatic (alternatives we considered were Percy, Nightwatchjs and Browserstack). For other types of automated tests we use Puppeteer, and formerly Cypress. We had communication channels with the test tool makers to feed back issues encountered and to request new features as we scaled and grappled with the different test tools.
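
To give a flavour of what an automated check in this kind of setup can look like, here is a simplified Puppeteer sketch of the sort of test that could run as a pull request check; the URL, selector and expected text are placeholders rather than our real ones:

import puppeteer from "puppeteer";

// Smoke-test that the homepage renders its featured-content section.
async function checkFeaturedSection(): Promise<void> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto("https://example.test/homepage", { waitUntil: "networkidle0" });
    const heading = await page.$eval("[data-testid='featured']", (el) => el.textContent);
    if (!heading || !heading.includes("You might have missed")) {
      throw new Error("Featured content section did not render as expected");
    }
  } finally {
    await browser.close();
  }
}

checkFeaturedSection().catch((err) => {
  console.error(err);
  process.exit(1);   // fail the pull request check
});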

Solution: Have a test strategy and plan early to mitigate against identified project risks by including quick and early feedback during the development process.

A diagram showing factors influencing quality engineering cycle in the project - strategy and planning, product users, communication, technology and continuous learning.

What have we learned?

One of the benefits of a brand-new project is that there is no legacy code or technical debt at the start (this changes pretty quickly though!). Mature products have gone through the growth pains. There are known unknowns and workarounds for known problems or pain points which the development teams (and test engineers in particular) come to know and understand fairly well.

The challenge with new projects, however, is that there are lots of unknowns in the new technology stack. As the project has grown we have also been dealing with, and learning from, scaling challenges such as pipeline issues from multiple deployments taking place at the same time, improving monitoring of traffic and website status errors, and optimising our stack’s performance as more product features have been built. Being able to identify such issues early on has been important. Manual testing by different team members helps to identify issues that may not initially be covered by the automated processes.

Solution: Continuously learn and iterate as issues are identified and fixed.

Conclusion

Building quality engineering into a shared repository requires similar considerations to those of single-team projects, but on a bigger scale and with a wider focus. These considerations are: who the product is being made for, and by whom; what the product risks are; and what test approach or plan will reduce the impact of those risks. The aim is to provide quick feedback and monitoring for regressions during the software development process, and test automation and tooling are important for facilitating this. We also continuously learn about our product (including from our product users) through regular communication, exploration and collaborative working. This has helped us iterate on our quality processes based on our findings and has been important for our quality engineering.

]]>
0
Building a WebAssembly Runtime for Ö÷²¥´óÐã iPlayer and enhanced audience experiences
Mon, 01 Mar 2021 10:19:41 +0000 | Juliette Carter | /blogs/internet/entries/39f42525-77db-43b0-81bb-70a0d5b1f062

At Ö÷²¥´óÐã Research & Development, we are investigating how we evolve our current multimedia applications to move beyond video by using object-based media (OBM). OBM allows us to develop future audience experiences which are immersive, interactive and personalised.

There is an ever-increasing number and range of audience devices capable of playing back OBM experiences. The challenge we now face is universal access - How can we get all members of the audience to enjoy OBM experiences on any device, and how do we do this sustainably and at minimal cost?

Our Render Engine Broadcasting (REB) project is investigating new technologies that will allow the Ö÷²¥´óÐã to deliver these OBM experiences at scale to all of our audiences, no matter what device they use. Our ultimate goal is to deliver real-time and fully rendered experiences on any device or platform and write the software to do it only once. We have been investigating the use of WebAssembly as a cross-platform technology for this.

What is WebAssembly?

WebAssembly (wasm) is a universal binary format designed as a sandboxed environment and a portable compilation target, which means that the same wasm module can run securely on multiple platforms. A number of strongly typed languages, such as C/C++, Rust or AssemblyScript, can compile to WebAssembly, making it language agnostic. This makes it an attractive option for adoption in the industry, as it enables developers to use languages they already know to produce wasm binaries.

When WebAssembly was first developed a few years ago, its target platform was the web. The aim was to compile fast and efficient system-level code and have it run in the browser. Compute-intensive applications, such as real-time interactive rendered graphics, could be run in a web browser at near-native performance. This also enabled some native applications to be ported to the web, increasing their reach and usage. These include Google Earth, which renders 3D representations of satellite imagery in the browser, and AutoCAD, which now offers a web app to create and edit CAD drawings.

In the last couple of years, WebAssembly outside of the browser has been gaining traction. A number of native wasm runtimes have been developed, which has enabled the use of WebAssembly for microservices and server applications. In 2018, the website security company Cloudflare announced the use of WebAssembly on their edge workers, allowing users to deploy secure and fast serverless code compiled to wasm. And the edge cloud platform provider Fastly offers wasm-based edge computation using their native runtime Lucet.

The portability of WebAssembly across multiple platforms and its security model are the key reasons for Ö÷²¥´óÐã R&D’s interest in using this technology as a compilation target for media experiences. As a public service broadcaster, we need to deliver value to all of our audiences, regardless of the device they use. Where traditionally, a codebase for each target platform and a different team to maintain each codebase would be required, the use of WebAssembly potentially allows for a much more sustainable developer ecosystem. It enables media software applications to be created once, from a single codebase, compiled to WebAssembly and deployed on any client or server platform depending on the capabilities required. It also offers numerous advantages compared to previous multimedia or cross-platform technologies (such as Flash or Java Runtime Environment). Indeed, it is language agnostic, security-focused, has predictable performance, and works inside and outside the browser. WebAssembly is also an open standard, which encourages its adoption.

How have we used WebAssembly?

We wanted to demonstrate how we could use WebAssembly to deliver media experiences that can run on many target platforms built from a single codebase. To do that, we implemented an example media application written in C++, which we compile to WebAssembly, giving us a wasm module. We designed this application to look like a version of Ö÷²¥´óÐã iPlayer, allowing users to select content, watch video programmes, AND play OBM experiences. We call this application the Single Service Player (SSP).

An example of how object-based media experiences could appear within Ö÷²¥´óÐã iPlayer.

To run our SSP wasm module, we needed a wasm runtime. The SSP makes use of some low-level media functionality, which isn’t scoped by the WebAssembly specification. To enable wasm modules to make use of these low-level multimedia capabilities, they need to be implemented in the runtime and made available to the wasm module through a set of imports. Examples of such capabilities include:

  • Windowing and rendering — In most cases, a multimedia application will have some graphical elements to it, which requires things to be drawn in a window (such as video frames or a UI screen).
  • User inputs — An interactive multimedia experience expects user inputs, such as keyboard or mouse events.
  • Media encoding and decoding — To efficiently encode and decode media (such as video frames or audio packets), it is preferable to use the host’s hardware resources where possible.

As there is currently no WebAssembly runtime that offers these media capabilities, we've decided to create our own.

There are already some efforts to specify ways a wasm module can talk to the host. WASI (the WebAssembly System Interface) proposes a set of standardised POSIX-like syscalls (the programmatic way in which a computer programme communicates with the host system) for libc functionality, mainly file handling and networking. These are called from the wasm module and implemented in the runtime.

We decided to use a similar approach to allow our SSP wasm module to communicate with the host, enabling it to have access to low-level media functionality. This involved identifying all the platform-specific media capabilities that could not be compiled to wasm and implementing them in the runtime. These capabilities were then made accessible to the wasm module through a set of platform-independent syscalls passed as imports.

This figure illustrates the whole process, from writing a media experience as software (such as the SSP) to running it as a wasm module on any device. The steps are detailed below.

The first step was to design the multimedia sys-call API behind which we would implement our cross-platform multimedia capabilities in the runtime. It needed careful consideration to ensure it was thread-safe and honoured the wasm security requirements around memory access. In the figure above, we use reb_decode_video() as an example syscall, which our SSP application can use to access low-level multimedia functionality, such as utilising the system’s hardware for video decoding.
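
The JavaScript WebAssembly API illustrates the general pattern of supplying host functionality to a module as imports; our native runtime does the equivalent in C++. In the TypeScript sketch below, the signature for reb_decode_video and the module layout are invented purely for illustration.

// Host side: implement the platform-specific capabilities and hand them
// to the module as imports under an agreed module name.
const imports: WebAssembly.Imports = {
  reb: {
    // Hypothetical syscall: decode a video frame using the host's hardware decoder.
    reb_decode_video: (_streamHandle: number, _frameBuffer: number): number => {
      // ...call into the platform's decoding library here...
      return 0; // status code
    },
  },
};

async function runModule(wasmBytes: BufferSource): Promise<void> {
  const { instance } = await WebAssembly.instantiate(wasmBytes, imports);
  // The module's exported entry point can now call reb_decode_video via its imports.
  (instance.exports.start as () => void)();
}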

Our SSP code was compiled to wasm using the clang compiler and the wasi-sdk toolchain, and the required syscalls are added as imports to the wasm module.

We then built the multimedia wasm runtime, consisting of two parts. The first is the execution environment for wasm modules, which allows us to load and run a wasm module. For this, we embedded an existing wasm compilation project, which generates the machine code for the target platform from the wasm binary.

The second part of our runtime is the implementation of the low-level multimedia functionality. For this, we created a cross-platform C++ library with input detection, networking, windowing, graphical rendering, and media decoding, which sits behind our carefully designed syscall APIs. We compiled our library for several target platforms, such as Linux, macOS, Windows, Raspberry Pi and Android. We also wrote some glue code to connect the two parts.

Where do we go from here?

A wasm runtime capable of executing multimedia applications opens a lot of possibilities, principally around flexible compute. Flexible compute allows us to run computationally demanding applications by dividing up the workload between available resources. These resources could be located locally (a laptop, games console or phone in your house), in the edge, or the cloud.

As we move towards delivering fully rendered real-time interactive experiences, the flexible compute approach becomes an attractive solution to the computational demands of such applications. We could, for example, consider segmenting a rendered frame into several tiles or objects, each of those rendered on a separate available compute resource. Many systems approach this problem by running specific compute tasks in containers across the available devices and platforms. We hope to use our work and accrued knowledge in developing the wasm multimedia runtime to investigate a viable alternative to the container approach for distributed media applications. We are looking into using wasm modules to perform secure and fast computation on any remote compute nodes.

Our runtime, capable of performing media services such as rendering and decoding or encoding of rendered video frames, can be used to display the final experience to the user on a client device and to execute the remote computational tasks as wasm modules. Using WebAssembly combined with a flexible compute approach, we hope to develop technology that allows the audience to access any future experience, regardless of their devices at home.

]]>
0
Looks good to me: Making code reviews better for remote-first teams
Tue, 23 Feb 2021 10:44:31 +0000 | James Donohue | /blogs/internet/entries/a07fc8a0-ff7f-46be-87c8-aded28dd65c0

The Russian version of the Ö÷²¥´óÐã News website.

Why do you do code reviews? Perhaps it’s company policy, or just an automatic part of your process, but have you ever sat down with your team and asked what everyone hopes to get out of it?

As a developer, has it ever felt like playing a strange board game where the rules are secret and keep on changing? Or as a delivery manager, have you ever been puzzled why reviews sometimes seem to take longer than writing the code itself?

These are some of the things we were asking ourselves at Ö÷²¥´óÐã News a year ago. We’re not sure we’ve found all the answers yet. But we think what we’ve learned so far has improved our engineering culture and helped to make code reviews a better experience for everyone.
In this post I’ll share why we started asking these questions, and some of the things we found out along the way.

Our first taste of remote-first working

Throughout 2019 the teams working on the Ö÷²¥´óÐã News and World Service websites went through a period of rapid expansion and changing priorities. What had started out as a close-knit team of five building a reimagined articles page for Ö÷²¥´óÐã News ended up with 35 engineers across five teams contributing to the same product.

Suddenly we weren’t just building one page type; we were rebuilding them all from scratch.

There were other changes. At the start of the year, everyone was co-located around the same bank of desks at Broadcasting House in London. Instant feedback and advice were usually available by turning to the person sat next to you, and pairing came relatively easily.

By December however, the engineers were split across six geographical locations in three time zones. While nobody could have imagined the upheavals of 2020 that lay ahead, we found ourselves having to adapt to a remote-first culture rapidly.

Growing pains

Despite the stresses of scaling sevenfold in a matter of months, the teams made a great success of the changes: by mid-2020 we had successfully rebuilt a set of sites that deliver free and impartial journalism to 35m unique visitors globally per week (for an overview of the Ö÷²¥´óÐã’s cloud journey see Matthew Clark’s recent post, and for the performance improvements in the new sites see Chris Hinds’ post here).

But there were definitely challenges along the way, and one area that was often raised as a source of difficulties was peer code reviews. A large influx of engineers from other teams, most of whom were learning about the codebases for the first time, started opening pull requests (PRs) to get feedback on their work.

We saw that these PRs would sometimes spend considerable time in the code review stage, often cycling between review and re-work multiple times. Approaches to giving and receiving reviews seemed inconsistent. Some developers reported that they found the process unclear, and at times stressful.

It was clear it was time to talk and reflect as an engineering community. We started asking what this ‘code review’ thing we were doing automatically was actually for. We decided we needed a kind of charter for code reviews.

Creating our very own charter

Why make our own? After all, there are already numerous excellent online resources available.

The answer was twofold. Firstly, we wanted to create a shared, living document that would evolve over time as we learned more about what works and does not work for our teams.

Secondly, while existing resources offer many useful global principles, a lot of the questions we had required local answers, specific to the structure of our organisation, our codebases and the design of our automated tooling and Continuous Delivery processes. For example: who can ‘approve’ a PR? Do all types of changes get reviewed in the same way (documentation, config, infrastructure code etc.)?

As Gergely Orosz points out, having an engineer-initiated guide to code reviewing is one hallmark of organisational support for the practice. So we started an anonymous collaborative document to source ideas, frank observations and anecdotes about code reviews from engineers of all levels of experience. Then, with feedback and advice from a range of viewpoints we gradually distilled that advice into our guide.

We wanted to make sure that the guide was easy to discover and would continue to be updated, so we put it right into our repositories and linked to it from our top-level READMEs and PR templates.

And as an added bonus, the project we were working on is open source, meaning that potential contributors from outside the Ö÷²¥´óÐã benefit from more transparency about how we aim to review contributions.

A quick quiz

What’s the goal of doing code reviews? Is it:

  1. To catch defects before they reach production
  2. To share knowledge about helpful patterns and best practices
  3. To discuss alternative approaches and viewpoints
  4. To allow developers of all levels of experience to learn
  5. To link to useful documentation and other resources
  6. To distribute knowledge across the team
  7. To ask questions and check understanding
  8. To improve readability and maintainability of the code
  9. To identify documentation needs
  10. To ensure that work meets quality standards
  11. To record the rationale for certain changes
  12. To coach and mentor junior developers
  13. To promote sharing ownership of the codebase
  14. To notify other affected teams of changes that are being made
  15. All of the above

If you answered “all of the above,” you’ve come to the same conclusion that we did.

We’ve seen over and again that a good code review can achieve all of the above and more (which is not to say that code reviews are the only way to achieve these things). Participating in code reviews in a spirit of open, egoless collaboration is the key to unlocking all of these benefits.

In fact, we sometimes wonder if catching defects is one of the weakest arguments for code reviews, at least in the relatively unstructured way we do them. Think about the last time a bug hit production and caused you a problem, perhaps because of something as trivial as a typo in some config. Could it realistically have been caught during code review? Was it? The truth is that busy developers are not always great at playing ‘spot the difference’.

By the way, the idea that there is a discrepancy between what we expect from code reviews and what they actually achieve is not a new one. An analysis of hundreds of code reviews at Microsoft in 2013 found that sharing understanding and social communication were more common outcomes than finding defects.

What we learned from the process

A few of the most interesting themes that emerged from our reading and internal discussions are summarised below. For more context, see the full guide linked to at the end of this section.

Reviewing code is a first-class activity

We think learning to review code well is as important as learning to write it well. Code reviews are not a second-class activity, to be squeezed in between ‘real’ work. We allow sufficient time in our planning and estimates to allow it to take place, and we don’t neglect it in emergencies, as the cost of haste at these times can be even greater. We actively nurture and develop the skill of reviewing in engineers of all levels.

Communication is at the heart of code review

As mentioned earlier, often the real benefits of code reviews are communication and understanding. GitHub is a powerful collaboration tool but not the only one at our disposal. Jumping on a Zoom or Teams call to talk it through can be friendlier and more efficient than back-and-forth debates on PRs.

This is especially important for growing, distributed teams where there is greater scope for confusion and misunderstandings. We have also found early design conversations effective, especially for significant new abstractions.

Don’t miss the chance to learn

To quote from our guide:

Our primary aim in participating in code reviews is to learn from each other, increasing our understanding of the codebase we are responsible for and of the technologies we use.

Code reviews aren’t just about getting a rubber stamp of approval from someone (often written LGTM, or Looks Good To Me). Every code review is an opportunity to ask questions, share knowledge and consider alternative ways of doing things. Done properly, both authors and reviewers can expect to learn and grow from this process.

And if potential bugs are caught along the way, that’s awesome too.

Remember to be kind

Code reviews can be emotionally demanding for both sides. Pride can come into play — authors may have spent a lot of time preparing their changes, and others may be protective of a codebase.

Taking a cue from the retrospective prime directive, we assume that everyone did the best job they could, given what they knew at the time and the resources available. As we found, trying always to understand the other person’s perspective is particularly critical when new team members are contributing to an established codebase.

Language is also important (Philipp Hauer provides some useful guidance here). Careless use of evaluations and jargon can undermine the spirit of collaboration that ought to underpin code review.

Keep things in proportion

Many of us have seen cases where a review hasn’t gone so well. A high comment count, multiple revision cycles and comments that have forgotten to be kind are all warning signs that team leads should be watching out for.

Time spent on code review should be kept in proportion with other stages of the development life cycle including testing and accessibility and UX review.

We find that, as a rule of thumb, if a change has spent longer in review than it took to implement, that might point to a wider problem that needs attention. Are team members prioritising reviews correctly? If there are lengthy conversations, perhaps some of them could have happened earlier in the process?

It may also be helpful for dev teams to regularly reflect on reviews which were harder or less effective than they should have been.

Look for every opportunity to make reviews easier

We’ve noticed that changes that have been authored by two engineers pairing or a larger group swarming often seem to spend less time in code review, with fewer review cycles. This makes sense given the above points about communication, as a lot of potential questions and concerns can be pre-empted, and the change should have already benefited from a more diverse range of skills and viewpoints.

And as April Wensel suggests, automate what you can, for example using linters effectively to minimise trivial nitpicks.

More detail on the above can be found in the full guide. Bear in mind that some of it is intentionally quite specific to how we work at Ö÷²¥´óÐã News and may not apply in your context.

Ask difficult questions!

There are many potential benefits to doing code reviews, but they are not always what people expect, and there are pitfalls along the way too. Even so, not all teams will need or want to go through the process of reflection described above. Perhaps tacit assumptions about code reviews are working great for you, or you find one of the existing online resources sufficient.

But in the spirit of continuous improvement and trying to make the way we collaborate more transparent, we found writing our guide a really positive experience. And with remote working now the norm, having a clear shared understanding of these activities is more vital than ever.

The document is now part of our onboarding process for new engineers, and we’ve seen promising feedback about the cultural benefits that talking frankly about code reviews has brought.

But the journey doesn’t end there — we’ll continue to update our guide as we learn more about what works for our teams. Sometimes it’s only by asking difficult questions about every stage in our development process that we can really improve the way we build software together.

]]>
0
Optimising serverless for Ö÷²¥´óÐã Online
Tue, 26 Jan 2021 13:34:29 +0000 | Johnathan Ishmael | /blogs/internet/entries/41db1cb0-8948-40a9-b2be-22c9494078f6

The Ö÷²¥´óÐã Ö÷²¥´óÐãpage, powered by serverless.

Previously, I talked about why Web Core, our new web technology platform, is built using serverless cloud functions. In this post we’ll discuss our experience of integrating them with our architecture and how we’ve optimised them for our use case.

The Ö÷²¥´óÐã makes use of many different technology providers to deliver its online services. We use AWS Lambda to deliver the serverless functionality for Web Core. Other platforms provide similar functionality, and the majority of the points made here can be applied to any of them.

Serverless in Web Core

We make use of serverless in two of the high CPU intensive areas of Web Core. The first is the React app that renders the HTML. This layer understands which visual components make up a page, and fetches the data for them. The second is our business logic layer, which transforms data from many different Ö÷²¥´óÐã systems into a common data model.

Unlike many typical architectures, we don’t make use of an API gateway to launch our serverless functions. Instead we have our own custom APIs running on virtual machine instances. These sanitise requests and forward traffic to the relevant serverless functions, as well as handling caching and fallbacks. These APIs are high-network, high-memory, low-CPU services, which makes a virtual machine instance a better fit than a serverless approach, given the availability of different instance types. In addition, the scaling pattern for the APIs is slower and more predictable as traffic grows, compared to the high-CPU workload carried out by our serverless functions.

A simplified version of the Web Core stack, requests enter at the traffic management layer.

So why the intermediate caching layers, why not just call two serverless functions one after each other?

Firstly to reduce cost. If we had serverless functions calling serverless functions directly without intermediate caching, we would be chaining them. This is a problem as we would be paying for the first to sit idle, waiting for the second to return. This can get expensive quite quickly, especially when service calls are involved. If this were a virtual machine instance, we would be able to run multiple requests in parallel, using the CPU’s idle time waiting for a request to return. But most serverless solutions have a one-to-one mapping between request and function invocation. Every request results in a new invocation, meaning the costs can quickly add up. Performance is therefore critical, but additional caching layers ensure most functions can complete in under 50ms.

The second benefit of intermediary cache layers is that we can perform content revalidation (e.g. stale while revalidate). This allows the presentation function to return a response to the audience quickly (and terminate the first serverless function) while the second serverless function runs and populates the cache for the next request.
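
A minimal sketch of that stale-while-revalidate behaviour in an intermediary caching layer might look like the following TypeScript; the in-memory cache, TTL and fetch function are placeholders rather than our real implementation.

interface CacheEntry<T> { value: T; expiresAt: number; }

const cache = new Map<string, CacheEntry<string>>();
const TTL_MS = 30_000;

// Return cached data immediately (even if stale) and refresh it in the background,
// so the rendering function never waits on the business-logic function.
async function getWithRevalidate(key: string, fetchFresh: () => Promise<string>): Promise<string> {
  const entry = cache.get(key);
  if (entry) {
    if (Date.now() > entry.expiresAt) {
      // Stale: serve it now, revalidate asynchronously for the next request.
      fetchFresh()
        .then((value) => cache.set(key, { value, expiresAt: Date.now() + TTL_MS }))
        .catch(() => { /* keep serving stale data if the refresh fails */ });
    }
    return entry.value;
  }
  // Cold cache: we have to wait for the downstream call this one time.
  const value = await fetchFresh();
  cache.set(key, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}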

Optimising our performance

Out of the box, serverless performs well and can easily handle large workloads that you throw at it. However, as it is the workhorse of our stack, we wanted to make sure we could get the most value out of it. The longer a serverless function runs, the more it costs us, and the longer the user has to wait for a response.

Most things you can do to optimise a normal application also apply to a serverless function, such as intelligent caching of data, reducing time waiting for other systems to respond, and reducing synchronous operations. For the purposes of this post we’ll touch on Lambda specific tuning that we have implemented.

Selecting the correct memory profile

Unlike virtual machine instances, Lambdas only have a single configurable “capacity” dimension — memory. Our React app is fairly low in terms of memory usage (~200MB), so ideally we would pick the lowest value (and so the lowest cost). However, changing the memory setting also impacts the available vCPUs. This in turn impacts our response times and cold start times.

We performed load tests against each memory setting to determine the average response time. There is a critical point at which further increases in memory yield no improvement in performance but do increase cost. In our case we found 1024MB to be the optimal price/performance point.

Cold start times

When you make your first request to a serverless function, it must copy your code bundle to the physical machine and launch your ‘container’ for the first time. If you make a second request, it’s likely to hit the same ‘container’ that you previously launched. If two concurrent requests come in at the same time, the first hits the existing ‘container’ you have in place, and the second needs a new one starting up. It’s this start-up time which is referred to as the cold start time. This happens every time we increase the number of concurrent requests beyond a previous high, or deploy a new version of the code.

Some of the factors that can impact cold start time include memory allocation, size of the code bundle, time taken to invoke the run time associated with your code, and any initialisation you may choose to do before the serverless function executes.

As our website is mostly request-driven, cold starts result in audience members waiting longer. They also cause an increase in connections held open, and so a significant slow-down in serverless function execution times, which can cause issues for any service waiting for those functions to return.

One advantage of the cold start process is that you can take care of any initialisation that might be required for when the function executes. This allows the actual invocation to be as efficient as possible. You also don’t pay for the cold start period, so you essentially get a few free cycles that you can use to optimise the performance of later invocations. We used this time to establish network connections to all the APIs we need (removing the need for SSL negotiation later), and we also loaded our JavaScript dependencies into memory for later use. This means that we are not having to perform this work during every execution.
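
In a Node.js function, that initialisation simply lives in module scope, outside the handler, so it runs once per cold start rather than on every invocation. A hedged sketch (the endpoint, template and response handling are placeholders):

import https from "node:https";

// Module scope: runs once, during the cold start.
// A keep-alive agent lets warm invocations reuse TLS connections instead of renegotiating.
const keepAliveAgent = new https.Agent({ keepAlive: true });

// Hypothetical example of other one-off initialisation (e.g. loading templates into memory).
const templates: Record<string, string> = { page: "<html>{{body}}</html>" };

function fetchData(url: string, agent: https.Agent): Promise<string> {
  return new Promise((resolve, reject) => {
    https.get(url, { agent }, (res) => {
      let body = "";
      res.on("data", (chunk) => (body += chunk));
      res.on("end", () => resolve(body));
    }).on("error", reject);
  });
}

export async function handler(event: { path: string }): Promise<{ statusCode: number; body: string }> {
  // Per-invocation work only: the expensive setup above has already happened.
  const data = await fetchData(`https://data.internal.example/${event.path}`, keepAliveAgent);
  return { statusCode: 200, body: templates.page.replace("{{body}}", data) };
}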

As mentioned earlier, we went through a process of identifying the correct memory profile for our serverless function. During that process we discovered that the memory profile also impacted cold start times, with a 512 MB memory profile increasing cold start times by a factor of 3 over a 1024 MB cold start time.

So in tuning our cold start times, we struck a balanced approach between per invocation overhead and per cold start overhead.

A graph showing minimum, average and maximum cold start times over a week for our react rendering function. The X axis is time, Y axis is cold start duration in milliseconds.

In terms of overall cold start performance, it’s very predictable, with our average cold start time sitting around 250ms, and our maximum peaking around 1 to 2 seconds depending upon time of day.

What type of performance do we see?

We perform over 100 million serverless function invocations per day. The performance of each invocation is predictable, and matches what we would get from a virtual machine or container-based service. Our HTML render times are quick, at around 220ms, and this includes the time taken to fetch the data from our data services.

A graph showing the average and p90 response times of our presentation React app serverless function over a 4 week period. X axis is time, Y axis is duration in milliseconds.

In terms of scalability, during benchmark tests we were able to go from a cold system at 0 rps with everything uncached to 5,000 rps within a few minutes. We would have tried a faster scale up, but we didn’t want to break our internal content production infrastructure. During our production use we have not seen a single function invocation fail. At 100 million requests a day, that’s some achievement.

Diagnosing and debugging

One of the disadvantages of serverless is the lack of a platform to remotely diagnose during an operational incident. If something went wrong whilst using an EC2 instance, once you’d exhausted logs and monitoring, you would typically SSH into the box and diagnose the health of the running device.

Moving to serverless for our compute has meant adopting other platforms to support monitoring and debugging. To do this we make use of a combination of CloudWatch and X-Ray, brought together via Service Lens. Similar observability platforms are available from other providers.

A screenshot of an X-ray trace showing the service calls and timings of rendering our homepage.

On every request that enters our stack, we annotate the request with a trace identifier. As the request navigates our stack, it keeps this ID with it. This allows us to build a graph of each request's path through the system, along with a timing waterfall.
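
The mechanics of that propagation are simple: read the trace identifier off the incoming request (or mint one if it is missing) and copy it onto every outgoing call. A simplified TypeScript sketch with an illustrative header name:

import { randomUUID } from "node:crypto";

const TRACE_HEADER = "x-trace-id";   // illustrative header name, not our real one

function traceIdFrom(headers: Record<string, string | undefined>): string {
  // Reuse the caller's ID if present so the whole journey shares one trace.
  return headers[TRACE_HEADER] ?? randomUUID();
}

async function callDownstream(url: string, traceId: string): Promise<Response> {
  // Forward the same ID so the downstream service's logs join the same trace.
  return fetch(url, { headers: { [TRACE_HEADER]: traceId } });
}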

If an individual page is erroring, we can look at the traces generated by that URL and understand where in the stack that issue lies. Through the Service Lens interface, we’re able to provide all the log files for that trace in a single place, which enables us to view any stack traces or errors generated by the code.

We have found that this form of debugging has enabled us to go from being notified of an operational incident to a log stack trace in a very short period of time, which is great for ensuring our website stays available and up to date.

Deploying to Live

One of the big surprises with serverless was the way in which it changes your approach to environments and deployments. Typical virtual machines require you to build an image and roll changes out to your infrastructure. If you want to run two builds concurrently (e.g. staging and testing), you need separate infrastructure running that software.

With serverless, it’s slightly different. We package and upload our code and it sits there ready to be invoked, at any scale, but critically in isolation away from the other versions you have already uploaded. Containers do operate in a similar fashion, but could still impact the underlying hardware on which your critical service might be running.

Within the Web Core project, we have the ability to build and deploy our serverless function on each commit to GitHub (although typically we wait until the pull request stage). This can then be invoked by a developer, tester or product owner and deliver the full functionality as if it were audience-facing, even when it’s in development. In essence this gives us an environment per pull request!

When a feature or change is ready for deployment, we update the function's alias to deploy the change to live. Making a particular commit hash the ‘production’ alias will expose that serverless function version to the audience. A rollback is just a case of updating the alias again.
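
With the AWS SDK, that alias flip is a single UpdateAlias call; the function and alias names below are placeholders, and a rollback is the same call with the previously live version number.

import { LambdaClient, UpdateAliasCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

// Point the "production" alias at a specific published version of the function.
// Rolling back is just re-running this with the version that was live before.
async function promoteToLive(functionName: string, version: string): Promise<void> {
  await lambda.send(new UpdateAliasCommand({
    FunctionName: functionName,   // e.g. "web-core-render" (placeholder)
    Name: "production",
    FunctionVersion: version,     // the published version for the approved commit
  }));
}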

Thanks for reading!

The platforms that run Ö÷²¥´óÐã Online are always evolving and changing as our technology providers improve their services. We’re always on the lookout for how we can deliver the best audience experience. The Ö÷²¥´óÐã website can reach 60 million browsers a day, with our users requesting up to 20k pages per second.

Compute comes in many flavours, and after careful consideration we chose to use serverless for the task. We are now running over 100 million serverless function invocations per day, with the technology proving cost-effective, able to respond quickly to changes in demand, and performant in serving audience requests.

The technology has worked well from the outset, however we have learned a number of tricks to improve its performance (including optimising cold start times and behaviour). We have also overcome some of its limitations by adding non-serverless layers to our stack.

We have been able to take advantage of the monitoring services integrated with serverless, improving our ability to diagnose and debug issues quickly.

]]>
0
Delivering Ö÷²¥´óÐã Online using serverless
Wed, 20 Jan 2021 11:31:49 +0000 | Johnathan Ishmael | /blogs/internet/entries/54a6decb-1469-46df-939c-2ab299494183

A Ö÷²¥´óÐã News article powered by serverless.

Previously we wrote about how we’ve moved Ö÷²¥´óÐã Online to the cloud. This post looks more closely at why Web Core, our new web technology platform, is built using serverless cloud functions.

The Ö÷²¥´óÐã has some big traffic days

Before we talk about serverless, here’s some context to the workloads we need our technology platform to handle.

During the 2020 US Election, the Ö÷²¥´óÐã News website received visits from over 165 million unique browsers from around the world. During the election week the Web Core stack averaged 80,000 requests per minute, peaking at 120,000 requests per minute to our platform. Those requests are the ones which made it past the traffic management layers (caches) in front of us. At its peak our edge traffic managers and CDNs saw 2.5 million page views per minute (around 41,000 requests per second).

At first glance, our traffic profile is typically cyclical — busy during the day, and quiet overnight. However, it isn’t completely smooth and predictable, as the graph below shows. Instead, it fluctuates based on the current news and social media traffic, as well as calls to action on TV and radio and when we send push notifications to our apps.

A view of how a request travels through the Web Core Stack

In addition, different parts of Ö÷²¥´óÐã Online will peak at different times, and in unpredictable ways. If a major news story breaks, Ö÷²¥´óÐã News can peak in an instant. And Ö÷²¥´óÐã Bitesize, during the first UK national lockdown, became consistently busy during the school day, as school children moved to learning online.

As an example, let’s take the breaking news story around the London Bridge attack in November 2019. The traffic profile at our edge cache for this page showed a 3x increase in traffic in a single minute (4k req/s to 12k req/s), followed by a further two-thirds increase a few minutes later (12k req/s to 20k req/s).

These key moments are critical for the Ö÷²¥´óÐã, and they are the moments in which the audience turn to us. We must not fail. Any technology we choose must be able to respond to these traffic patterns.

A chart showing requests per second over the 7 day period 3/11/2020 to 10/11/2020 for bbc.co.uk/news. The traffic is mostly cyclical, with a few exceptions as news stories break.

Choosing the right platform for the job

Choosing the type of compute depends upon factors associated with your workload. These can include the number of requests, the duration of work, and the resources required (CPU, memory, network). For a website, requests are short lived — up to a second or two — and don’t require persistence between requests. The requests are typically CPU bound, rather than restricted by network or memory.

Virtual machines are useful for workloads that change no faster than you’re able to add capacity, or for workloads that can tolerate a delay in scaling (e.g. queue-based event systems). They are also useful for short-lived sessions that can tolerate scale-down events. For our use case — rendering web pages — virtual machines don’t tolerate large bursts of traffic well enough, as they can take a few minutes to add capacity. As mentioned previously, our traffic load can double in less than a minute, and so autoscaling might not react quickly enough. Our experience has taught us that we would need to over-provision the number of instances available (e.g. 50% more than we need) in order to deliver enough capacity to manage scale events. Even trickier is understanding when you have over-provisioned and when you should scale down. Virtual machines come in many different flavours, catering for high memory, high CPU or a combination of both, so it’s possible to pick a compute platform that matches your workload, response times and cost profile.

Containers run on top of fixed compute. Like virtual machines, they are useful for traffic volumes which change slowly over time. While you’re able to start up a new container quickly to handle a new session, you’re still limited by the underlying compute instance on which the containers are running: the underlying hardware has to scale to meet the demands of the running containers. Containers are great for long-lived, stateful sessions, as they can be ported between physical hardware instances while still running. This allows them to be redistributed across physical hardware as the workload grows and shrinks. A great use case for containers would be an online game, where you need sessions to persist as you reorganise your workloads.

Serverless functions are, in essence, containers running on top of physical hardware. They are ideally suited to handling unpredictable traffic volumes and to non-persistent, stateless connections. As a customer of the service we don’t need to worry about how the underlying hardware is provisioned to deal with load; we simply send requests to the service, and the burden of provisioning the containers and scaling the underlying infrastructure is handled for us by the provider. So why can they handle the scaling better than we can using containers? The simple answer is economy of scale: the provider manages the workloads of many customers across the same physical hardware platform, so what is a large increase in workload for us is small in comparison to the provider’s overall workload.
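
To make the model concrete, here is a minimal sketch of the kind of stateless, short-lived function described above, assuming an AWS Lambda style Node.js runtime and React for rendering; the component and module names are hypothetical:

    // Minimal sketch only: one request in, one rendered page out, with no
    // state kept between invocations. Assumes a Lambda-style Node.js handler
    // and React server-side rendering; ArticlePage is a hypothetical component.
    const React = require('react');
    const { renderToString } = require('react-dom/server');
    const ArticlePage = require('./components/ArticlePage'); // hypothetical

    exports.handler = async (event) => {
      // Everything needed to serve the request arrives in the event itself.
      const html = renderToString(React.createElement(ArticlePage, { path: event.path }));

      return {
        statusCode: 200,
        headers: { 'content-type': 'text/html; charset=utf-8' },
        body: `<!doctype html>${html}`,
      };
    };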

Reducing operational complexity

One of the biggest advantages of serverless over fixed compute is that it reduces the operational complexity of running the service. Reducing the engineering effort spent running and maintaining our systems allows us to focus more on the audience product and experiences. The further away you move from the bare-metal hardware, the lower the engineering cost for the development team. What used to be engineering effort for DevOps teams is now commoditised.

With virtual machines we need to consider general maintenance and security: everything from patching the underlying operating system through to ensuring the machine is configured for running at scale (such as the number of available file handles). We need to ensure the machine is secure from external attack, whilst remaining remotely administrable. We also have to manage things such as log rotation and temporary file storage. All of these concerns go away when you move to serverless.

Managing the scaling rules of a fixed compute server is complex and requires extensive testing. You have to consider aspects such as measuring your workload, capturing metrics, and understanding the correct properties to scale on and by how much. Understanding and implementing these things comes with considerable effort.

Next time… using serverless in Web Core

This blog post has outlined some of the considerations we took when choosing a technology platform for Web Core.

In my next blog post I’ll talk about how we make use of serverless, including how it fits into our architecture, its performance and some benefits we’ve gained from its use.

 

Ö÷²¥´óÐã Online: 2020 in review Mon, 07 Dec 2020 13:44:53 +0000 /blogs/internet/entries/cf71a61c-9da9-4194-90af-4e754956edf3 /blogs/internet/entries/cf71a61c-9da9-4194-90af-4e754956edf3 Neil Craig Neil Craig

2020 has clearly been a very unusual and oftentimes painful year in many ways for just about everyone on the planet. Like many organisations, we’ve had to shift the majority of our staff to working from home in a very short timeframe — this was easier for some teams and people than others.

It’s well documented that in troubling times, the world comes to the Ö÷²¥´óÐã. Nowadays, that means we see big increases in demand for our online services. That extra demand and adaptation to new working patterns has been coupled with enormous changes in our output — far fewer TV programmes being made, much more frequent breaking news events, a significant bump in the volume of online education we provide and regular live press briefings, to name but a few.

With all this going on, it’s easy to forget about the fantastic work our teams have delivered, so we wanted to follow on from last year’s “Ö÷²¥´óÐã Online — 2019 in review” and document 2020. Lots of our teams have contributed, so you’ll be able to see some of the highlights of their achievements.

Ö÷²¥´óÐã Children’s and Education

The Children’s and Education team are responsible for building and maintaining 4 flagship CBeebies app propositions, 1 CÖ÷²¥´óÐã app proposition, 350+ interactive games and content items, the CÖ÷²¥´óÐã, CBeebies and Own It websites along with the Bitesize website and app.

  • Jacob Clark has written a blog post “Shipping Progressive Web Apps everywhere”, which outlines in detail the journey we’ve been on these past 12 months transforming how we build our 5 flagship Children’s apps: shifting away from proprietary native app frameworks to building on modern, open web standards, and developing universally distributable web apps on a new platform.
  • We’ve been working hard over the course of the past year ensuring children and parents have access to resources such as Daily Lessons to continue their education, as well as developing new, personalised and adaptive learning experiences.
  • Across our games estate, we’ve been working on ways to provide users with onward journeys into more games they might be interested in through our cross-platform “Exit Tile”, whilst developing features such as Shops within our core game engine technology.

Ö÷²¥´óÐã Account

The Ö÷²¥´óÐã Account team are responsible for enabling sign in or registration for a Ö÷²¥´óÐã Account. This helps to enable a personalised service for our audience members across TVs, the Ö÷²¥´óÐã website and mobile applications.

In 2020 we’ve brought lots of great improvements to Ö÷²¥´óÐã account:

  • Children’s profiles: We’ve worked closely with the iPlayer team to enable our younger viewers to see the content that’s more relevant to them using our new children’s profiles on televisions.
  • Password breach checker: Building on an idea from our engineering 10% innovation time, we aim to help our audience to create safer passwords by checking if they have been used in a previous data breach. Read more here.
  • Showcasing Account Memories: To remind users of the amazing content they have enjoyed and to associate happy memories with the Ö÷²¥´óÐã, we’ve started to play back some of the user activities within the Account space.
  • Performance improvements: We’ve iterated and optimised our applications to enable better resource utilisation and autoscaling. This has allowed us to cope with the increased demand for our accounts during lockdown.
  • Rolling refresh tokens: To improve the experience for our TV users, we’re keeping regular users signed in for longer.
  • iOS and Android: We’re making better use of the capabilities of these platforms to make it easier for our users to sign in to multiple Ö÷²¥´óÐã apps and websites.

Ö÷²¥´óÐã World Service News

The World Service News teams are responsible for building and maintaining an open source platform called Simorgh which is used for the Ö÷²¥´óÐã’s 41 language World Service News websites. We’re made up of 13 Engineers, 2 Business Analysts, 2 Product Managers, 2 Delivery Managers, 3 Testers and 7 User Experience Designers.

This year we’ve been working on:

  • Simorgh: Over the past 12 months we’ve migrated our pages, which are spread across 41 discrete sites, from a legacy PHP monolith to a new React based application. This application is called Simorgh, an open source, isomorphic single page application developed by the World Service languages team.
  • Performance improvements: A large proportion of the Ö÷²¥´óÐã World Service audience are on slower 2G and 3G networks, using lower-end, budget-friendly Android handsets or feature phones. Blocking JavaScript requests dropped by 100% from 9 to 0, JavaScript requests dropped by 79% and total page weight is now 60% smaller than before. We have made other efforts to improve our web page performance, meaning our worldwide audiences can access stories quicker than before.
  • New reading experiences: We have started work on a project to deliver a new article experience to our users, which will allow our editorial teams to more easily create content for our audiences.
  • Improved media experiences: We have launched new ways for our worldwide audiences to access audio and video content on the World Service News websites.

Ö÷²¥´óÐã Website (including News, Sport and Ö÷²¥´óÐã)

This year has been a huge year for the Ö÷²¥´óÐã website, which has completed its rebuild and move to the cloud. Tens of millions of people now consume articles, video clips, and other content on a rebuilt site that’s faster and more accessible. Multiple pages have been rebuilt, including the Ö÷²¥´óÐã Ö÷²¥´óÐã Page, topic pages, articles, clips, and search. And the technology used has changed, from PHP to Node.JS/React. It’s also partly hosted on serverless technology, and we believe we’re one of the largest sites in the world to do so.

This recreation of our website, ready for the next decade, has been a huge programme of work. It’s tackled duplication and cross-team challenges, ensuring the new site is as efficient and high quality as possible. Matthew Clark has written a separate post on the approach taken to make such a large change happen.

Radio & Music Services (Ö÷²¥´óÐã Sounds Back End)

Radio & Music Services are a team made up of around 12 software engineers and 5 software engineers in test, along with product and business analysts. We’re responsible for building the API that encapsulates the product functionality of Ö÷²¥´óÐã Sounds. This API is used across web, apps and other clients to provide the Sounds experience, drawing on content from around the Ö÷²¥´óÐã.

This year we’ve been working on:

  • Audience: Rolling out Sounds to International users.
  • Tooling: Migrating our build system to Code Pipeline.

Ö÷²¥´óÐã Voice+AI

The Voice+AI team are responsible for building & maintaining several Voice-based and AI-supported services in the Ö÷²¥´óÐã. We’re made up of 45 engineers, 9 Product Managers, 5 Delivery Managers and a team of UX designers and editorial.

This year we’ve been working on:

  • Beeb, the voice assistant from the Ö÷²¥´óÐã: We’ve built, from the ground up, a fully-functioning voice assistant and released a Beta demonstrator on Windows 10. We’re using several new technologies, including a stellar bespoke neural voice that we developed for the Ö÷²¥´óÐã. Beeb for Windows 10 is available to download today.
  • Ö÷²¥´óÐã Corona Bot: Launched to the Ö÷²¥´óÐã News Facebook pages’ 50 million followers. The model for the chatbot was trained using editorially curated question and answer pairs covering the latest information, from lockdown rules to relevant medical information, with support from the NHS.
  • Songbird: In partnership with Ö÷²¥´óÐã World News we launched a ‘listen to this article’ feature. An entirely new content format for the Ö÷²¥´óÐã, the feature provides text-to-speech rendered articles using our bespoke neural voice. Try it out on The Life Project on bbc.com.
  • Local news bulletins: At the beginning of lockdown in March, we launched an ambitious project in partnership with Nations and Regions to make available news updates from all 40 local stations in England, Scotland, NI and Wales. In the midst of a pandemic, local information is more important than ever.

Ö÷²¥´óÐã Online Technology Group (OTG)

The Online Technology Group is responsible for the provision and management of the Ö÷²¥´óÐã’s core online services, including networking, audience-facing traffic management, routing to content origins, our on-premise hosting platforms, and media analytics.

Here’s what some of our teams have been working on in 2020:

Website traffic management

  • DNS: Migrating our core DNS platform to a new supplier and building a robust, fast, low-cost, automated deployment and testing CI/CD pipeline which is driven from source control.
  • Statistics: Adapted our endpoint to also receive data from our client-side code.
  • Website traffic management: Removed our old traffic managers and caching layer from the majority of traffic on www.bbc.co.uk.
  • Quality & efficiency: Made our www edge services more consistent & efficient (cache offload) across CDN & in-house traffic managers (GTM).
  • Traffic peaks: Like many organisations, we’ve seen the events of 2020 result in constantly elevated traffic to our websites. We’ve seen around 40% more than we did in 2019 — a big bump on the usual year-on-year increases.

Media Distribution

Ö÷²¥´óÐã Internet Distribution Infrastructure (BIDI) is an in-house Content Delivery Network (CDN) designed for high volume distribution of iPlayer video and audio streams.

  • Traffic peaks: BIDI’s traffic peak reached 1.2 Terabits per second in 2020, serving nearly 250,000 concurrent iPlayer clients.
  • SmartShard: A new playback routing algorithm called SmartShard has been developed, greatly improving cache efficiency and automating capacity management around the network.
  • Capacity: BIDI estate size has doubled in 2020. BIDI now consists of 102 caching servers, across 24 datacentres, using over 1,200 Solid State Drives. Deployments are run in both Ö÷²¥´óÐã sites and within 3 UK Internet Service Provider networks.
  • Resilience: Automatic failure detection is now active, reducing service disruption when either the hardware or underlying network fails.

For a look at BIDI growth so far and plans for the future, see:

iBL (iPlayer API)

We are the iBL team — the iPlayer Business Layer, and we are responsible for the data that powers all Ö÷²¥´óÐã iPlayer apps — mobile, web, and smart TVs. From promotions on the homepage, categories, channels, to personalisation, user data, schedules and recommendations. We are a team of 5 engineers, a product manager, and a delivery manager.

In the past twelve months we:

  • Improved promotions and relevancy: Creating “hero” imagery to showcase high profile content, and using audiences’ watching history to highlight content they are interested in rather than treating everyone the same. This includes messaging why we recommend certain programmes, and tailoring homepage promotions for younger audiences.
  • Safeguarding younger users: There is a breadth of content on iPlayer, and not all of it is suitable for all ages, so we are taking steps to allow users to make better informed decisions.
  • Technical Reliability: We’ve worked on our resiliency a lot — we have improved capacity and redundancy across about a dozen of our database-backed services this year. We’ve also moved some of our error handling to a CDN so we can more reliably serve peaks of demand.
  • Developer workflows: We already had our “infrastructure as code”, using CloudFormation. This year we have also moved our CI pipelines into code, and we include infrastructure changes in our CI/CD workflows to make sure the code is always up to date with what is actually deployed. We are also iterating on our internal documentation, using OpenAPI schemas to ensure the data we exchange internally is what we expect it to be.

Ö÷²¥´óÐã iPlayer — Off-Product Discovery

We are the people that share the iPlayer catalogue and availability with partners and link up voice interactions from non-Ö÷²¥´óÐã platforms. It’s what allows you to choose to watch a programme on iPlayer from somewhere which isn’t inside iPlayer! Next time you say ‘watch His Dark Materials on Ö÷²¥´óÐã iPlayer’ to your Fire TV stick or Chromecast with Google TV, think of us!

Big deliverables in 2020 have been:

  • Amazon catalogue and Voice integration: Following on from many months’ work, we are now delivering our standard catalogue and integrating with the Amazon video platform. We completed the work to allow you to find content and then control the playback of that content in iPlayer using voice interactions as well as from the remote. You can now see that Doctor Who and Killing Eve have the newest episodes on iPlayer, available for no additional charge.
  • GoogleTV catalogue and Voice integration: A very different technical solution, completed quickly by following on from the lessons learned working with other partners and having a solid understanding of the user journeys that voice control opens up. Again, we have developed and delivered the code to link up Google’s infrastructure with the Ö÷²¥´óÐã’s and allow you to watch great content in iPlayer via voice control alone.
  • LG Catalogue and Search: Having a strong, well-defined catalogue format and a means of integrating this with partners allowed the team to complete a VoD catalogue and promotions integration with the team from LG in a very compressed time period. This is what powers the row of links to content above the iPlayer logo on LG TVs and allows the LG voice search to find content in the iPlayer catalogue.

Digital Publishing

The Digital Publishing team work for the creators: the journalists, editors, curators, programme-makers, podcasters and more who make Ö÷²¥´óÐã online. This year we’ve been working on how they do what they do, moving away from online’s late-1990s roots to ways that get the content they make to the right audiences, at the right time (and more cost-effectively, too).

We’ve built:

  • Passports, which rethinks how we describe the content we make. That means better recommendations, better curator tools, and a focus on how the audience discovers content. Looking for relaxing music in Sounds or good news stories? Passports gets it to you.
  • Curation tools to allow editorial to showcase content. Alongside pan-Ö÷²¥´óÐã Topics, this lets us build collections that span the Ö÷²¥´óÐã’s services, and will soon be used in Ö÷²¥´óÐãpage.
  • “Optimo”, a tool for writers that moves beyond the classic 500-word story and supports the articles audiences are asking for. It’s starting in Ö÷²¥´óÐã World Service, and will be a big focus in 2021.
  • Signal Box, which brings together production data (things like ‘how much content did we make?’) with content data (‘what was it about?’) and audience data (‘how many people consumed it?’) in one place. That answers the questions our content teams need to ask.

Technology Strategy & Architecture (TS&A)

Technology Strategy & Architecture is a community of technologists and architects who help ensure the Ö÷²¥´óÐã makes technology decisions that provide the best value and most effective technology systems to deliver world class services to its audiences. Its 8 different teams cover areas such as security, connectivity and digital products, and represent disciplines from data science to broadcast engineering to architects who oversee the infrastructure of the systems we all use. Below is a snapshot of what some of the TS&A teams have been working on in 2020:

Ö÷²¥´óÐã Blue Room

The Ö÷²¥´óÐã Blue Room is the Ö÷²¥´óÐã’s consumer technology research lab. The team offers access, insight and guidance on consumer technologies, new experiences and changing behaviours, through its Blue Room demo spaces across the UK and ‘on the road’ at public and industry events. This is what the Blue Room team has been up to in 2020:

Ö÷²¥´óÐã Datalab

Ö÷²¥´óÐã Datalab is responsible for using machine learning to provide great recommendations across the Ö÷²¥´óÐã’s product portfolio.

This is what Ö÷²¥´óÐã Datalab have been up to in 2020:

  • Significantly progressed development of the Ö÷²¥´óÐã’s in-house recommender systems and successfully launched a personalised recommender for Ö÷²¥´óÐã Sounds, as well as recommenders for several World Service language sites and the new Ö÷²¥´óÐã News app.
  • Facilitated another pan-Ö÷²¥´óÐã effort to build on the Ö÷²¥´óÐã's Machine Learning Engine Principles, a framework for responsible development of technology.

Ö÷²¥´óÐã Digital Ecosystem

Ö÷²¥´óÐã Digital Ecosystem is a small team of architects working mainly with the Ö÷²¥´óÐã Platform team to create foundational software capabilities for digital transformation. In 2020 the team have been focussing on:

  • Building a suite of advanced software services to make Ö÷²¥´óÐã content media and metadata simple to access across the Ö÷²¥´óÐã and for our Partners, allowing us to get the best value from our content for our audience.
  • Defining the shape and technology for the Ö÷²¥´óÐã’s next generation data platform to help us understand how to serve our audience better.
  • Project Origin - a collaboration between the Ö÷²¥´óÐã, Canadian Broadcasting Corporation/Radio Canada, Microsoft and The New York Times to focus on one aspect of the disinformation challenge - content source verification. The goal of Project Origin is to apply clear signals to videos and pictures from publishers so you know the content is coming from where it says it is and has not been manipulated.

Ö÷²¥´óÐã Product Architecture

Ö÷²¥´óÐã Product Architecture helps Ö÷²¥´óÐã teams to develop and deliver audience experiences across all surfaces and modalities. In 2020 the team has:

  • Provided architecture support and guidance to the Ö÷²¥´óÐã Voice and AI team across client facing products, such as Beeb and the Ö÷²¥´óÐã’s presence on Alexa, as well as efforts such as Songbird, which brings Text to Speech capabilities to bbc.com’s The Life Project.
  • Successfully introduced a shared approach to architecture visualisation to D+E teams, and started a rollout to evaluate the suitability of a tooling option in the production environment.

Ö÷²¥´óÐã News and Weather Apps Team

We are responsible for building and maintaining the News and Weather apps, across Android and iOS, for UK and international audiences.

  • Ö÷²¥´óÐã News app beta: We are reinventing the News app from the inside out to give it a fresh feel. This includes recommendations, new ways to showcase the top stories, short video clips and an article overview to present articles in a short, digestible way (pictured). The first version is being rolled out to both Android and iOS before the end of the year.
  • ABL: The Apps Business Layer is a backend for frontend technology that will produce app-specific data at its endpoint. This will make a faster, cleaner and more flexible app experience on the device with more work being done on the backend.
  • Weather app: We started work on an updated weather app with split screen, animated icons and a weather widget for iOS 14.

Ö÷²¥´óÐã Research & Development

This year Ö÷²¥´óÐã Research & Development is 90 years old. We have played a huge part in broadcast technology innovations, from Digital Television to Object Based Media. Whilst the technology has evolved, our purpose hasn’t. We look forward 3, 5, 7 years, all the time asking how we can use technology to keep public services relevant.

Here are some things we’ve been working on in 2020:

  • AI: Autumnwatch used R&D’s AI system - YOLO (You Only Look Once) - to identify different animals captured by multiple cameras, saving hours of production time.
  • 5G: Ö÷²¥´óÐã R&D were integral to a project in IBC’s Accelerator programme exploring cutting edge 5G remote production, and to 5G Records, a major EU project looking at professional production workflows across audio, video and immersive content.
  • Sustainability: Our Sustainability team released research exploring the energy used to deliver and consume television and radio.
  • Streaming: We published the technical specification for Dynamic Adaptive Streaming over IP Multicast, which would enable the distribution of live television at scale over the Internet.
  • Sounds Amazing: Something that was better in Lockdown! We brought together producers, presenters, reporters and engineers to talk about what will be next and later for audio. Over 3000 experts attended this virtual conference from around the world.

And lots more - /rd

Shipping Progressive Web Apps everywhere Wed, 02 Dec 2020 10:01:02 +0000 /blogs/internet/entries/3533ce9c-393a-43d8-8f44-0c46214c86aa /blogs/internet/entries/3533ce9c-393a-43d8-8f44-0c46214c86aa Jacob Clark Jacob Clark

For the past five years Ö÷²¥´óÐã Children’s & Education have been building interactive games and experiences with JavaScript and HTML5. This enables us to ship games to any platform with a modern web runtime. We work with agencies to commission new games and experiences for the many brands that we have.

We have over 300 games, some of which are distributed via our five mobile apps on iOS and Android, while others are embedded across www.bbc.co.uk.

Our 5 Apps within Children’s & Education

We optimise ourselves around being cross-platform, designing our organisational structure to make bringing new content into our portfolio as painless as possible, and good value for money.

We maintain multiple in-house game engines. These enable agencies to build various game mechanics effortlessly at varying fidelities, while we build abstractions and delivery pipelines so the games can be delivered to multiple platforms rapidly.

Delivering these HTML5 games to the web is fairly straightforward: we fetch the content from S3 and pop it into the DOM. Each game shares a common API interface that is configured at runtime, allowing the host platform to inject its own implementations of libraries and config. This allows our game developers to focus on building the game and not the underlying mechanics of how our web and app platforms work.
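
A hedged sketch of that wiring is below; the CDN URL, global start function and config shape are illustrative stand-ins rather than the real interface:

    // Illustrative sketch only: fetch a game bundle and boot it with
    // host-provided config. The CDN URL, startGame global and config shape
    // are hypothetical, not the real Children's platform interface.
    async function embedGame(container, gameId, hostConfig) {
      const response = await fetch(`https://games-cdn.example/${gameId}/main.js`);
      const source = await response.text();

      // "Pop it into the DOM": attach the bundle to the host page.
      const script = document.createElement('script');
      script.textContent = source;
      container.appendChild(script);

      // Hand the game the host's implementation of the shared interface,
      // so the same bundle can run on the website or inside an app shell.
      if (typeof window.startGame === 'function') {
        window.startGame({ container, ...hostConfig });
      }
    }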

Games on the web can start utilising modern browser features like Service Workers, Web Workers, and Progressive Web Apps to deliver more and more optimised experiences relatively easily.

Delivering web games into app stores, however, isn’t as straightforward as “popping it into an app shell”.

In a series of blog posts, the Children’s and Education team are going to outline how we rapidly develop web content for app stores and beyond using modern web technologies on a platform dubbed the “Universal App Platform”.

What came before?

Around five years ago, when the business was first facing the challenge of shipping HTML5 content onto the iOS and Android stores, many of the options that existed were fairly immature. Tools like PhoneGap and Cordova were only just materialising, and client-side libraries like React were not as popular at the time. Web standards were growing, but there was no concept of a Progressive Web App yet and browser capabilities like Service Workers did not exist.

Back then, putting web content into apps was a hostile environment. Shipping onto app stores was typically considered a problem that could only be solved well by experienced native app developers.

This platform complexity, coupled with immature frameworks and the business requirement to ship web apps to app stores, led us to spinning up an internal engineering team to create a proprietary app wrapping framework. This framework would be capable of bundling HTML5 web content and generating iOS and Android binaries from that content.

We called this framework Pick’n’Mix.

Pick’n’Mix provided a heap of functionality to its embedded web content, including the ability to download and store remote content from a CDN to disk, UI overlays for app settings and download management, analytics events, push notification support, and many other features. This functionality was built from scratch and solely maintained by our internal native engineering team.

Fundamentally, Pick’n’Mix was a WebView/WKWebView with a JS “bridge” capable of passing messages to and from content running within the WebView to trigger certain actions on the web/native side. This bridge implemented a “standard” we maintain internally called the Games Messaging Interface (or GMI). The GMI ensures that games are interoperable between many platforms and abstracts away from game developers the nuances of the runtime they are running in.

You might expect to see methods such as getConnectivity, setConnectivityCallback, download, onSettings, getSettings, setMotionSettings on the GMI.
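
To give a feel for the shape of that bridge, here is a hedged sketch of a web-only GMI implementation using the method names above; the real interface and its signatures are internal, so everything below is illustrative:

    // Illustrative sketch of a host-side GMI implementation, using the method
    // names mentioned above. The real interface and its signatures are
    // internal to the Ö÷²¥´óÐã; this is only a guess at the general shape.
    function createWebGmi(storage = window.localStorage) {
      let onConnectivityChange = null;

      // Report connectivity changes to the game as the browser sees them.
      window.addEventListener('online', () => onConnectivityChange && onConnectivityChange(true));
      window.addEventListener('offline', () => onConnectivityChange && onConnectivityChange(false));

      return {
        getConnectivity: () => navigator.onLine,
        setConnectivityCallback: (callback) => { onConnectivityChange = callback; },
        download: (url) => fetch(url).then((response) => response.blob()),
        getSettings: () => JSON.parse(storage.getItem('gameSettings') || '{}'),
        onSettings: (settings) => storage.setItem('gameSettings', JSON.stringify(settings)),
        setMotionSettings: (enabled) => storage.setItem('motionEnabled', String(enabled)),
      };
    }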

A high-level look at the Architecture of Pick’n’Mix:

High res image: https://github.com/bbc/uap-architecture/blob/main/Pick%20n%20Mix%20Container%20Diagram.png

From a product perspective, our apps can be split into two distinctive parts:

  1. The Menu system where users can browse the catalogue of games and experiences they are able to download and interact with
  2. The Content items themselves

Content items are typically cross-platform: built for the browser and deployed to both the Ö÷²¥´óÐã website and Pick’n’Mix.

Menu systems, however, were only needed in the app deployable to enable discoverability of content. This meant that menus were built exclusively for Pick’n’Mix and as such we historically required the engineers building menu systems to develop, test, and deliver the menus as fully-fledged ‘Pick’n’Mix’ apps. A very closely coupled integration. This is an important nuance; we were asking engineers to deliver us HTML5 content to run exclusively on a proprietary platform.

To support this, we had to provide means for engineers to test and develop their menus within Pick’n’Mix itself, adding a “development harness” to the pile of proprietary things we were having to build and maintain ourselves. This nuance cut even deeper: by requiring HTML5 apps to be delivered to a proprietary set of standards, we were moving further and further away from new and emerging web standards. This meant we were unable to take advantage of fantastic resources like MDN, and we were fundamentally in a position where 80% of the underlying runtime had to be supported by us.

The web is an amazing thing; it can turbocharge your ability to get things done and ship content anywhere. But it’s flexible too, making it easy to box yourself into a corner of unsustainable bespoke implementations; before you know it, your once esoteric requirement is a web standard.

Pick’n’Mix was monolithic: it was a semantically versioned framework that every app had to depend on and bundle itself into. Pick’n’Mix did not stipulate how apps should build themselves, none of our apps were running on the same version of Pick’n’Mix, and each time an app needed to take a new version the upgrade was laborious and expensive from a development and testing perspective.

Due to Pick’n’Mix’s un-opinionated stance on how things should be built, each app had to maintain its own build pipelines, and menus/games had to orchestrate their own interaction with the GMI; ultimately there was no sharing or composability between apps. On the surface, abstracting native functionality and business logic behind the GMI seemed sensible and good value for money: a single, well-defined API for fetching remote data, right?! Digging into this further, though, it was clear that we’d simply offset the cost away from agencies having to build the native functionality (which is good in itself) to agencies having to integrate with an API that was just as complex and potentially less intuitive than if we had done nothing at all.

This also led to us having significant amounts of duplication across our portfolio of apps: UX patterns implemented time and time again, written imperatively with WebGL/Canvas; several integrations and duplicated pieces of business logic around how downloads should be handled under certain network conditions; inconsistent bundling tools; and inconsistent config approaches across apps.

We used Wardley Mapping to understand where each of the technical components required to ship our apps on Pick’n’Mix sits in terms of maturity (and thus cost) against the value it delivers to our audience.

A Wardley Map illustrating cost/maturity of all the aspects of our Pick’n’Mix framework

From this map, and our experience and knowledge of building hybrid web apps over the years, it was clear the majority of our time and money was being spent building our in-house app wrapping technology, and not on the app experiences themselves.

In 2020, the problem of wrapping HTML5 content for app stores is a well-solved one. Several open-source tools exist that can bundle HTML5 content into a native app, including PhoneGap, Cordova, Capacitor.js, Framework7, Ionic and others. Progressive Web Apps (PWAs) and Trusted Web Apps are maturing as standardised ways of building portable, app-like web experiences. As well as this, platforms are beginning to support PWAs/HTML5 apps as first-class deployment types, from Android via the Trusted Web Activity through to Amazon Fire TV sticks. Service Workers, Web Workers, and many other mature and newly emerging web standards are coming to fruition that make building native-like apps on the web a reality.

It was clear that continuing to maintain our hybrid app wrapping framework in 2020 was unsustainable, so we decided to re-orientate ourselves around a series of principles to better inform a more sustainable technical approach to building experiences that can be shipped onto an app store.

Principles

  • Embrace the technology — fundamentally our apps are websites, we should embrace that
  • A single codebase for app and web — we’d got so caught up in where our code was deployed to that we were not optimising for the right things
  • If it exists, we don’t reinvent it — The JavaScript ecosystem is extremely vibrant around the problem area we’re in, we should stand on the shoulders of giants
  • It runs anywhere — we want to move away from maintaining proprietary tools and enable web developers to use the tools they are most familiar with
  • There is no framework — the most important principle, we want to stand on the shoulders of giants and define a set of opinions that make building our experiences familiar to any developer who has worked on a modern web stack recently

Building a Universal App Platform

Once we set out our principles, we ran some experiments to answer a series of questions. How hard can we push the web today? How close to an app store ready app can we get with little to no native engineering? Could we move all natively built user interface components out to web components? Could we avoid building our own hybrid app framework full stop?

We learnt that, as of today, shipping Progressive Web Apps to many platforms is becoming a reality. Whether it is Android’s support for PWAs in Google Play via a Trusted Web Activity, or support for modern web standards across a range of smart devices from Smart TVs to Fire TV Sticks — it’s clear this is a rapidly changing and improving field.

Our unique and immersive user interfaces also mean that we don’t need to match the native system look and feel. Having our own design system that kids are already familiar with, across all of our experiences, places us in a unique position where a single code implementation makes sense.

Left: Our new, cross-platform settings menu for Web, iOS and Android. Right: Our old, natively built, platform-specific settings menu.

Progressive Web Apps are yet to make an appearance across any part of the Apple ecosystem, but we know Safari has been adopting more and more modern web standards which we can use to our advantage as we develop this technical strategy.

We know, however, that there will be gaps in platform support. These gaps require us to have a flexible architecture that puts modern web standards at the heart of our engineering but retains the ability to patch in polyfills/native implementations where required.

From our research and development spikes, we settled on a 3-pronged technical strategy:

  1. Modern web standards like Progressive Web Apps now give us the ability to build app quality experiences with just web technology — we should embrace the PWA as our core development environment and optimise for this as a priority
  2. Emerging technology and open source tooling give us the capability of shipping compliant Progressive Web Apps to all app stores. We identified Ionic’s Capacitor.js to be the best choice to wrap compliant PWAs for app store distribution
  3. A declarative, component-driven architecture will allow us to drive more re-use, increase consistency, and reduce platform-specific implementations across App + Web. We chose React.js to enable us to build up this architecture

PWAs + Capacitor.js + React = Universal App Platform

This is a significant product and technical change for us. We’re shifting our focus away from “we build native apps” to “we build immersive experiences for any platform”. This new way of thinking enables us to build consistency into the platform as a whole whilst also reducing the cost of building our products, reducing cognitive load across all disciplines and making room for us to focus on new product propositions.

The Universal App Platform Architecture. The diagram illustrates the relationship between the Progressive Web App, the App shell for App Stores and how each are built and deployed respectively. High res image: https://github.com/bbc/uap-architecture/blob/main/container-diagram.png

Outcomes

  • Re-built our entire proprietary app platform in less than a year, whilst at the same time rebuilding all our menu experiences to have highly reusable, declarative front ends
  • A single codebase composed of multiple apps and a cohesive set of components that can be reused, extended, and delivered across multiple platforms
  • A single community of engineers collaborating, improving and developing a shared platform
  • Standardised ways of working, linting rules, and testing approaches across multiple teams
  • A shift away from imperative WebGL/Canvas programming to a DOM based component model through React.js for our Menu systems
  • No further requirement to maintain proprietary development harnesses, testing frameworks, or documentation
  • Our ability to power all of our five apps, plus our entire interactive embedding estate of 350+ games from a single codebase and a single set of cohesive components
  • The ability to focus on new, product enhancing capabilities across the entire portfolio of Children’s and Education products

Re-mapping where we are today illustrates that, through shifting heavily to off-the-shelf, open source and commodity-driven ways of building software, we’ve been able to re-focus our time and money on building immersive experiences for our audiences. We’ve achieved this whilst eliminating significant amounts of duplication, as well as having a modern, standardised technical stack and reaping all the benefits that come with it.

A Wardley Map illustrating cost/maturity of all the aspects of our Universal App Platform

The left screenshot below is our existing, bespoke, imperative Canvas/WebGL based Pick’n’Mix wrapped Playtime Island app; on the right is our new skinnable, component driven Progressive Web App wrapped in Capacitor.js producing the Playtime Island app.

Left: Playtime Island running within Pick’n’Mix, built imperatively with Canvas/WebGL. Right: Playtime Island running as a component driven Progressive Web App in Capacitor.js. Virtually no difference in look and feel.

Here is just a small flavour of how declarative, composable and intuitive our component driven architecture now is. This component displays an information banner across the app, transitioning onto the screen and animating images that sit either side of the banner text.

An example UAP React.js JSX component
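
The component itself was shown as a screenshot in the original post; as a stand-in, a minimal sketch of that kind of banner might look like the following (the component, prop and class names are illustrative):

    // Illustrative sketch only; the real UAP component was shown as an image
    // in the original post. Component, prop and class names are hypothetical.
    import React, { useEffect, useState } from 'react';

    export const InfoBanner = ({ text, leftImage, rightImage }) => {
      const [onScreen, setOnScreen] = useState(false);

      // Trigger the slide-in transition once the banner has mounted;
      // the actual animation lives in the (hypothetical) CSS classes.
      useEffect(() => setOnScreen(true), []);

      return (
        <div className={`info-banner ${onScreen ? 'info-banner--on-screen' : ''}`}>
          <img className="info-banner__image info-banner__image--left" src={leftImage} alt="" />
          <span className="info-banner__text">{text}</span>
          <img className="info-banner__image info-banner__image--right" src={rightImage} alt="" />
        </div>
      );
    };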

What’s coming next?

As the Universal App Platform matures, we will be looking at new opportunities to deliver more compelling, personalised, and rich experiences to our audience, as well as understanding what future opportunities our Universal App Platform can support for products looking to be cross-platform.

In the short term, we will be launching all four CBeebies apps, migrating our game embedding technology and developing rich, personalised experiences to benefit both Web and Apps on the Universal App Platform, ultimately decommissioning both Pick’n’Mix and our game embedding platform CAGE in early 2021.

In the longer term, we’ll be using the opportunities this transformation has provided to develop our product propositions further, looking towards opportunities in personalisation, AR/VR, new platforms and more.

This has been a significant, strategic, technical and product level change for us. A tremendous number of people have been involved in making this change happen. Change at this scale is never easy, but working together is the key to success. Expect to see more details on this blog from teams working on the Universal App Platform over the coming weeks and months.

It really does run anywhere! Go Explore as a PWA running on a Tesla In Car Entertainment System.

Ö÷²¥´óÐã World Service: Migrating 31 million readers and an 83% improvement in page performance Wed, 25 Nov 2020 10:47:40 +0000 /blogs/internet/entries/a98f6952-4051-4b8f-8d27-35b3db69a839 /blogs/internet/entries/a98f6952-4051-4b8f-8d27-35b3db69a839 Chris Hinds Chris Hinds

The Ö÷²¥´óÐã World Service publishes news stories in over 40 languages globally. Stories are written by journalists around the world in their native language instead of using translations. World Service covers everything from local to global news and content is delivered in multiple formats, including text, video and audio.

As mentioned in Moving Ö÷²¥´óÐã Online to the cloud, the frontend (and many backend services) powering the World Service websites were previously written mostly in PHP and hosted in Ö÷²¥´óÐã-owned data centres. Over the last couple of years, teams within Ö÷²¥´óÐã Online have been working tirelessly to migrate their services to the cloud, and the Ö÷²¥´óÐã World Service has nearly completed this transition.

Over the past 12 months we’ve migrated our pages, which are spread across 41 discrete sites, from a legacy PHP monolith to a new React based application. This application is called Simorgh, an open source, isomorphic single page application developed by the World Service languages team.

What do we mean by Single page application and Isomorphic?

Single Page Application (SPA)

A single page application (SPA) is a web application that works solely in the browser, removing the need to refresh or reload the page. This creates an outstanding user experience that feels close to that of a native mobile application. Some common services you may use on a daily basis make use of this technology, including Gmail, GitHub, Facebook and Google Maps.

Isomorphic

An isomorphic (sometimes referred to as “universal”) app is a web app that can run on both the server and the client. The idea is that the first request to a web page will be rendered on the web server, delivering server-side rendered HTML to the reader’s browser. Once the rendered page reaches the client and the JavaScript is downloaded and parsed, the browser is able to take control and treat subsequent page views as a single page application. In React this is handled via the React hydrate function, which “hydrates” the client-side DOM with the data that was used to render the page on the server. In most cases the reader does not notice this phase, as React is performing a diff on the DOM between the server-side render and the client-side render, and for the most part these will be identical. From this point on, React is in control of the page rendering in the browser.
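
As a hedged sketch of that flow (Express, the App component and the file names are illustrative, not Simorgh’s actual code):

    // Minimal sketch of the isomorphic flow described above. Express, the
    // App component and file names are illustrative, not Simorgh itself.
    const express = require('express');
    const React = require('react');
    const { renderToString } = require('react-dom/server');
    const App = require('./App'); // hypothetical shared component

    const server = express();
    server.get('*', (req, res) => {
      // First request: render the page to HTML on the server.
      const html = renderToString(React.createElement(App, { path: req.path }));
      res.send(`<!doctype html><div id="root">${html}</div><script src="/client.js"></script>`);
    });
    server.listen(3000);

    // client.js then "hydrates" the server-rendered markup, after which React
    // handles subsequent navigation like a single page application:
    //   const ReactDOM = require('react-dom');
    //   ReactDOM.hydrate(
    //     React.createElement(App, { path: window.location.pathname }),
    //     document.getElementById('root')
    //   );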

Simorgh (The React SPA built by the Ö÷²¥´óÐã World Service)

Simorgh is the rendering platform built by the Ö÷²¥´óÐã World Service web team using the technologies described above. What made Simorgh challenging to build wasn’t the technology we used, but the specific requirements of Ö÷²¥´óÐã World Service. When building Simorgh to replace our dated PHP solution we had to bear in mind the following:

  • Performance — The websites must be as performant as they can be. Many of our readers are on lower end smart/feature phones on networks with low bandwidth rates and high data costs, slow connections and patchy coverage.
  • Accessibility — The Ö÷²¥´óÐã aims to provide a fully accessible web platform, ensuring that anyone can access our websites using any assistive technology.
  • Support for multiple languages — Ö÷²¥´óÐã World Service currently supports 41 different languages, each language site has its own editorial team and from the outside this is seen as 41 separate websites.
  • Huge volumes of traffic — The World Service currently serves 31m weekly readers. Simorgh, despite being behind many different caching/routing layers, is rendering on average 1 million unique pages per day, with an average of 11 million daily renders across the 41 languages.
  • First class AMP support — Offering AMP variants of all supported pages. This allows us to move away from the previously separate AMP rendering system that was built on an internal Ruby based framework for static rendering.

We are planning to post a dedicated writeup on the history of Simorgh and the technologies chosen in the near future so keep an eye out for that.

So where are we today?

Simorgh currently supports twelve different page types.

These pages may appear visually similar, but our internal content management system treats them as different page types. One of the biggest advantages we have seen from rebuilding our platform is the ability to reuse code as much as possible. Many of these pages share the same code and React components. These components are located in our open source React component library, documented in Storybook.

We knew that we wanted to focus on improving web performance for the World Service, but this was difficult with the previous PHP platform. So one of the main goals of the migration was to build a platform that would enable this type of rapid prototyping, allowing us to make changes and improve the feedback loop of those pages.

Web performance wasn’t the number one goal when we created Simorgh, but as we followed best practice in developing the new platform, we did see vast improvements compared to the old one. Much of this can be attributed to early design decisions such as no blocking JS, minimal layout shift and server-side rendering.

We released pages in batches, grouped by language, and the first language we released onto the new platform saw huge performance gains in many areas.

  • Lighthouse performance score saw a 224% increase from 24 > 94
  • Lighthouse best practice score saw a 27% increase from 79 > 100
  • Total number of requests dropped by 85% to 17 down from 112
  • Blocking JS requests dropped by 100% from 9 to 0
  • JS requests dropped by 79%
  • Total page weight is now 60% smaller than before
  • JS size dropped by 61%
  • Dom Content Loaded is 85% faster at just 0.4s down from 2.6s
  • Visually complete time dropped by 62% down to just 1.8s vs the previous 4.7s

These performance metrics were captured using SpeedCurve, comparing the same URL on the old platform and the new.

So as you can see, we have already made a great improvement in our frontend web performance — but we won’t stop here. A large proportion of the Ö÷²¥´óÐã World Service audience are on slower 2G and 3G networks, using lower-end, budget-friendly Android handsets or feature phones. In some of our supported regions network coverage is patchy at best; some readers may only have network access whilst travelling to work or whilst at work. We must continue to make improvements in every way we can, to make our pages some of the most accessible web pages in the news category, both in terms of accessibility requirements and performance.

This video demonstrates the performance improvement between the old and new platforms.

Post migration improvements

Since the migration we have already released a number of new features that aim to help improve performance; perhaps the most notable was the lazyloading of social embeds (Tweets, Instagram posts and YouTube videos).

Social embeds are often a key part of telling a story. We have found that many of our journalists add a number of social embeds to each page. For instance, one language always embeds 2–3 YouTube videos at the bottom of each story. When looking into the performance metrics for these pages we noticed upwards of 500KB of JS (more than the entire Simorgh application) was being loaded by YouTube, and that some of this JS was actually blocking the rendering of our page as it was being parsed. In one extreme example the Time to Interactive (TTI) was at 12s.

This content had to be on the page as it was part of the onward journeys experience. However not every reader would scroll down the page to where these embeds were rendered, so they shouldn't have to download the extra JavaScript or use the extra data allowance when they may never interact with these social embeds.

The Solution?

Lazyloading of third party content - We already do this with any images that are outside of the viewport, so why not for social embeds? A quick pull request later and we were lazyloading social embeds: no new library, no JS size increase, just an already existing feature on the platform. Soon after releasing we saw a wide variety of results, as these were dependent on where the social embeds were in the story and how many a given story had.

In most cases we were seeing a 10–15% improvement in TTI, as well as reducing, if not eliminating, the render blocking time. Where I was most impressed, though, was in the story mentioned earlier: we had taken the TTI from 12s down to 6s. 6s is still a long time; however, this was a story with many different social embeds and in some ways a worst-case scenario. In any case, a 50% improvement from just a few lines of code is phenomenal. This kind of change would not have been possible, at least not so quickly, on the previous platform.
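
The pattern is roughly the one sketched below. This is not the actual Simorgh implementation (which is open source), just an illustration using an IntersectionObserver:

    // Rough sketch of the lazy-loading pattern, not the actual Simorgh code.
    // The embed's children (and their third-party JS) are only mounted once
    // the placeholder gets close to the viewport.
    import React, { useEffect, useRef, useState } from 'react';

    export const LazySocialEmbed = ({ children, rootMargin = '500px' }) => {
      const placeholderRef = useRef(null);
      const [isNearViewport, setIsNearViewport] = useState(false);

      useEffect(() => {
        const observer = new IntersectionObserver(([entry]) => {
          if (entry.isIntersecting) {
            setIsNearViewport(true);
            observer.disconnect();
          }
        }, { rootMargin });

        observer.observe(placeholderRef.current);
        return () => observer.disconnect();
      }, [rootMargin]);

      return <div ref={placeholderRef}>{isNearViewport ? children : null}</div>;
    };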

How are we monitoring web performance?

Now that the migration is complete, we are in a position to start making more improvements to web performance and changes to the platform. Before we can make many meaningful improvements to the application we need to be able to monitor web performance.

There are two common ways of monitoring web performance:

Synthetic Testing

Synthetic testing is great for catching regressions during the development lifecycle. We use Lighthouse, SpeedCurve and WebPageTest to measure our web page performance.

RUM (Real User Monitoring)

RUM testing is a method of capturing performance metrics from our users. RUM is generally more expensive in comparison to synthetic testing, however it provides a vital look into how real users are experiencing our site.

We use a combination of synthetic and RUM monitoring for Simorgh. During development, Lighthouse runs on every pull request/feature branch. Lighthouse tests a subset of pages and for the most part looks at the Accessibility, PWA and Best Practices audits.

Lighthouse is also used in our continuous delivery pipeline. After we deploy to the test environment, we run Lighthouse against the environment and can choose to fail the build if the audits fail. This same test will then also run against the live environment once the deployment is complete.
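
A hedged sketch of what such a pipeline step could look like, using the public lighthouse and chrome-launcher packages; the URL, audit categories and threshold are hypothetical:

    // Hedged sketch of a CI step that fails the build when Lighthouse audit
    // scores drop below a threshold. Uses the public lighthouse and
    // chrome-launcher packages; URL, categories and threshold are hypothetical.
    const lighthouse = require('lighthouse');
    const chromeLauncher = require('chrome-launcher');

    async function audit(url) {
      const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
      const { lhr } = await lighthouse(url, {
        port: chrome.port,
        onlyCategories: ['accessibility', 'best-practices', 'pwa'],
      });
      await chrome.kill();

      // Lighthouse scores are 0-1; treat anything under 0.9 as a failure.
      const failures = Object.values(lhr.categories).filter((c) => c.score < 0.9);
      if (failures.length > 0) {
        console.error('Audits below threshold:', failures.map((c) => c.title).join(', '));
        process.exit(1); // fail the pipeline
      }
    }

    audit('https://test.example/some-world-service-page');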

SpeedCurve runs daily tests against a smaller subset of URLs. SpeedCurve is a tool that essentially wraps around WebPageTest and Lighthouse, providing a fantastic UI on top of those underlying tools. These tests give us an insight into performance of our pages from different regions around the world.

Core Web Vitals

A recent initiative from Google is Core Web Vitals. The idea behind these metrics is that they are a way to determine and monitor the user experience of your site. Google collects the metrics themselves from popular sites and publishes them in the CrUX (Chrome User Experience Report) dataset. These metrics include things like Time to First Byte, First Input Delay and Cumulative Layout Shift.

Through a new package in our component library we are now able to collect these same metrics ourselves if the user has opted into performance tracking via the Ö÷²¥´óÐã cookies settings page.

In comparison to procuring a third-party tool, this is a cost-effective way for us to collect real user metrics. RUM is very important for us in the Ö÷²¥´óÐã World Service, as our users are situated all around the world; they use different devices with different capabilities and connect over a wide variety of networks. Getting this sort of test coverage with just synthetic testing would be impossible. This new data will allow us to start making informed production decisions about where we need to improve the web pages directly affecting the reader’s experience.
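
As an illustration, client-side collection can be as small as the sketch below. This assumes the open-source web-vitals package and uses a hypothetical consent check and beacon endpoint; it is not our exact implementation:

    // Illustrative sketch of client-side Web Vitals collection, assuming the
    // open-source web-vitals package. The consent check and endpoint below
    // are hypothetical placeholders, not the real implementation.
    import { getCLS, getFID, getLCP, getTTFB } from 'web-vitals';

    // Hypothetical consent helper; the real check reads the user's cookie settings.
    const hasPerformanceConsent = () => document.cookie.includes('performance=true');

    function sendToAnalytics(metric) {
      const body = JSON.stringify({ name: metric.name, value: metric.value, id: metric.id });
      // sendBeacon survives page unloads; fall back to fetch if it's unavailable.
      if (navigator.sendBeacon) {
        navigator.sendBeacon('/rum-endpoint', body);
      } else {
        fetch('/rum-endpoint', { method: 'POST', body, keepalive: true });
      }
    }

    // Only collect metrics for users who have opted in to performance tracking.
    if (hasPerformanceConsent()) {
      getCLS(sendToAnalytics);
      getFID(sendToAnalytics);
      getLCP(sendToAnalytics);
      getTTFB(sendToAnalytics);
    }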

Screenshot of our custom RUM solution reporting on Web Vitals

We hope to publish a dedicated post in the near future about how we collect and use Web Vitals.

Conclusion

It’s been a busy period for many teams at the Ö÷²¥´óÐã this year but we are seeing the light at the end of the tunnel. The World Service migration has been a great success thus far. We have migrated to a modern platform that is open source, and faster than ever both in terms of product/feature iteration and web performance.

Our journey has only just begun. Simorgh represents a new beginning for the Ö÷²¥´óÐã World Service, and we will continue to improve the performance and accessibility of our news web pages for our global audiences.

The lessons learnt creating a design system for Ö÷²¥´óÐã Online Mon, 23 Nov 2020 10:33:48 +0000 /blogs/internet/entries/55645f54-f554-45aa-b652-df470a721e59 /blogs/internet/entries/55645f54-f554-45aa-b652-df470a721e59 Dave Morris and Josh Tumath Dave Morris and Josh Tumath

Dave Morris and Josh Tumath are part of something called the Presentation Layer team: a team working on the design system and React front-end for the new Ö÷²¥´óÐã website.

A short history lesson

Ö÷²¥´óÐã GEL is the design framework for the Ö÷²¥´óÐã and was one of the very first design frameworks to be published online back in 2010. GEL sets out guidelines to allow UX designers working on the different sections of the Ö÷²¥´óÐã website to create a consistent user experience and visual identity. This was fantastic as a set of guidelines for designers and developers to follow when creating websites for Ö÷²¥´óÐã Sport, Ö÷²¥´óÐã News and Ö÷²¥´óÐã Bitesize (to name just a few). Plus, all the other experiences for mobile apps and TV.

However, as Matt talked about in his recent blog post about moving to the cloud, we’ve been going through a cultural change in the Ö÷²¥´óÐã. We’re not building separate websites under one domain name any more. Through the WebCore project, we’re now building one website. And one key part of that is the design system.

Design systems are the single source of truth for both the design and implementation of a digital product, whether that’s a website or app. GEL can now be injected into the design system instead of being left to interpretation by teams reading and digesting the guidelines.

While there are plenty of design systems which are only built for one brand, ours has to serve a wide variety of destinations at the Ö÷²¥´óÐã. Destinations are the places people come to the Ö÷²¥´óÐã to visit, and they have to feel different to each other. There are informative destinations like News and Sport, but also more playful ones like CÖ÷²¥´óÐã and CBeebies. This is tricky to do in a structured design system, and it’s something we’re learning how best to do as we go.

Our design system is split into five areas:

  • Foundations. Basic tokens like font sizes and spacing units.
  • Components. These range from simple UI snippets like a button to complex ones like a navigation bar. Each component has accessibility and reusability considered from the first iteration.
  • Layout components. Special components with a specific job to arrange other components in a layout, like Grids, Sidebars and the humble .
  • Levers. The theming system. There are three levers (so far) in the design system: the font palette, the core colour palette and the brand colour palette. We want more levers. More on this later.
  • Containers. The top-level components that assemble other components together and fill them with data to form a stand-alone experience that can be used on a page. For example, the Article container fetches the article content from the business layer, and assembles the Heading, Tag List, Rich Text and Image components to make an article.
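
To make these areas a little more concrete, here is a purely illustrative sketch (invented names, not the real WebCore code) of how a foundation token, a component and a container relate to one another.

```typescript
// Invented, simplified example of the layering described above.
import React from 'react';

// Foundations: basic tokens such as spacing units and font sizes.
const spacing = { single: '0.5rem', double: '1rem', quad: '2rem' } as const;
const fontSizes = { body: '1rem', headline: '2rem' } as const;

// Component: a small, reusable UI element built from foundation tokens,
// with accessibility considered from the first iteration.
type PromoProps = { headline: string; href: string };

function Promo({ headline, href }: PromoProps): JSX.Element {
  return (
    <a href={href} style={{ display: 'block', padding: spacing.double }}>
      <h3 style={{ fontSize: fontSizes.headline, margin: 0 }}>{headline}</h3>
    </a>
  );
}

// Container: assembles components and fills them with data to form a
// stand-alone experience that can be used on a page.
function OnwardJourneys({ promos }: { promos: PromoProps[] }): JSX.Element {
  return (
    <section aria-label="More from the Ö÷²¥´óÐã">
      {promos.map((promo) => (
        <Promo key={promo.href} {...promo} />
      ))}
    </section>
  );
}

export { spacing, fontSizes, Promo, OnwardJourneys };
```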

Our community in Ö÷²¥´óÐã Design + Engineering has been building up the design system for over a year now, and we want to share with you the lessons we’ve learnt in that year.

1. Make it as easy as possible to share

We’ve taken a very different approach to the GEL design system compared to others out there. Our team, the Presentation Layer team, are not the sole maintainers of the design system. We haven’t tried to anticipate requirements and build a library of components like buttons, input boxes, navigation bars and dialog boxes in advance. We haven’t created the design system as a versioned library that developers need to pull into their project and keep up-to-date.

So what have we done instead? We’ve done three things:

  1. The Presentation Layer, our React-based rendering layer for the new Ö÷²¥´óÐã website, is a monorepo that includes the design system. Only web pages created in the monorepo can use the design system. And, most importantly, any changes to components in the design system are rolled out instantly.
  2. Layouts, Components and Tokens are only created when they are needed. We typically wait until there are at least two use-cases for a component before creating it. We didn’t even have a Button component until a few months into the project!
  3. Our team does not solely own the design system. Every team building with the design system jointly owns it. Working together as a community, we all build and make changes to components.

Each team contributes and reuses from the design system. It’s a give and take relationship

By doing these three things, every page is always using the up-to-date version of the design system and using it consistently. The design system does not stand alone. It’s built into the wider platform. So it’s not possible to build something bespoke to a particular team’s needs; everything can be shared.

2. Do things on a need-to-make basis

We intend to build the one Ö÷²¥´óÐã website together because previous approaches have led to fragmented results. Therefore, a top-down approach of ‘we build these things and you use them’ doesn’t play nicely with this way of doing things. It shouldn’t be down to one central team to create the components all teams will eventually use. Our product teams do their own user research, so they know much more about how the audience are using Ö÷²¥´óÐã products and services than our team does. They’re the ones best placed to create new components and improve what we already have available.

We knew this approach was going to be difficult. Teams across the Ö÷²¥´óÐã are used to working in silos, with the autonomy to create components and features to meet their own needs whilst using the GEL guidelines as a steer. We’re essentially driving a move from vertical silos to horizontal shared ownership through the design system. Any new addition to the design system should be considered in terms of how anyone might end up using it.

Because of this culture change, we’re striving to keep the components simple to start with. Remember that this design system has to serve multiple destinations at the Ö÷²¥´óÐã. If we start with something complicated then it’s only going to get more complicated and unmanageable by the time we’ve figured out how it can be used by everyone.

Keeping things simple means avoiding making components until they’re needed. In some cases it also means not making them with all the bells and whistles until those become a requirement. This allows us to avoid complicated discussions and move forwards in the short term.

We’re there to join the dots between all those other teams creating stuff in the design system and offer guidance and support in building and improving components. We create the standards and principles for all teams so that we can make sure accessibility, reuse opportunities and consistency are at the heart of what everyone makes.

3. The reinvention of the wheel

Prompting teams to find and use solutions already in the design system can be challenging. It’s something we’ve had to tackle whilst teams are getting acquainted with the design system.

There are some products at the Ö÷²¥´óÐã with well-established online experiences which are now rebuilding in the cloud using the design system. This can bring along with it a like-for-like mentality. Some teams (but not all) expect to create a carbon-copy of their existing experience using the design system.

There’s a whole array of articles out there covering the topic of “what happens when a design system gets you 90% there”; these are our experiences on the subject. We know design systems are fundamentally driven by re-use. Creating something new which is similar to another pattern or component already in the library can add unnecessary weight, take up a lot of designers’ and developers’ time, and cause confusion in the future for those looking to find the right component for their needs.

We’ve been building things in isolation for years at the Ö÷²¥´óÐã, so it’s sometimes hard for teams to think outside of their silo and horizontally about what the business needs in a component library. The design system and the Presentation Layer team are now influencing teams to think more horizontally.

A simple tactic we employ in these circumstances is to ask teams what problem they are trying to solve. This flips the priority of “we need this” to “a user needs to be able to do this”. We try to approach each new request for a component with a re-use-first way of thinking. By doing this, we’re setting an example for the rest of the teams building with the design system.

We can then have a conversation about the existing components available and potential iterations which all teams could benefit from. If we do find that there’s a requirement for a new component, we work with that team to build it with reuse considered. With requests from every area on our radar, it’s easy to spot the same problems appearing in different places. We then join the dots and get the relevant teams involved to help everyone work more collaboratively.

4. Be strict to allow for flexibility

To drive re-use, we need to be able to create components once; not multiple times for each of the different brands. So, to support the variety of destinations on the Ö÷²¥´óÐã website, we came up with a theming system we call ‘the levers’. Anyone creating a page can choose from a set of font combinations and colour swatches to theme it. And each component will work with all of them out-of-the-box.

A theming system might not seem like a ground-breaking idea, but it’s pretty powerful when Ö÷²¥´óÐã News leadership ask, ‘When can you get this video player page on the Ö÷²¥´óÐã Sport website ready to use on News as well?’ and we can pull some levers and say, ‘Hey presto – it’s already done!’

Just pull some levers and hey presto — you’ve got a website! (Provided you plug in the right data sources for the content too)

To make this work, however, we needed to be strict about what sets of colours and fonts are allowed to be used in Components. We had to resist the temptation to use our own colours in specific places.

Colours and fonts are given ‘semantic’ names based on where they’re intended to be used, rather than what they look like. For example, the colour for heading text is called ‘primary’ and the colour for borders is called… you guessed it… ‘border’!
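
As a rough illustration of the idea (the palette names and hex values below are invented, not the real GEL ones), components only ever refer to the semantic token names, and a destination supplies the concrete values.

```typescript
// Invented example: semantic colour tokens ('levers') decouple components
// from any one brand's palette. The values below are placeholders.
interface CoreColours {
  primary: string;    // e.g. heading text
  secondary: string;  // e.g. supporting text
  border: string;
  background: string;
}

const sportPalette: CoreColours = {
  primary: '#1a1a1a',
  secondary: '#444444',
  border: '#dddddd',
  background: '#ffffff',
};

const newsPalette: CoreColours = {
  primary: '#141414',
  secondary: '#555555',
  border: '#cccccc',
  background: '#f6f6f6',
};

// A component asks only for the semantic names…
function headingStyle(colours: CoreColours): { color: string; borderBottom: string } {
  return { color: colours.primary, borderBottom: `1px solid ${colours.border}` };
}

// …so re-theming the same page for a different destination is just a matter
// of pulling a different lever.
const sportHeading = headingStyle(sportPalette);
const newsHeading = headingStyle(newsPalette);
```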

That doesn’t mean they’re set in stone, though. We’ve already had to add new colours and change them as newer use-cases arrive, and we hope to add more levers in the future too. How great would it be to have different levers to change the type of motion applied to an experience? Or to help us create fantastic children’s experiences, where the requirements for a website catering to those audiences may be wildly different.

5. Documentation is hard, but important

One thing we always worry about is our documentation. We have a lot of it: accessibility guides, design guides, development guides, tutorials, component docs, principles, values… And to make matters worse, they’re all over the place: on our internal wiki, in our cloud storage, on GitHub and in Storybook.

Storybook catalogues our reusable components and contains other documentation

Building a successful Design System and platform hinges on the quality of the documentation. It needs to be kept up-to-date. It needs to be discoverable. It needs to be easy to navigate.

Just like with components, documentation can start off simple but grow unwieldy as people add to it bit-by-bit. Taking the time to audit it, get feedback and clean it up is really important. That’s something we’re doing at the moment.

The Presentation Layer team duties are part design system curation and part support team. Having a complete and well organised set of documentation means we can run a tight ship. It helps us to automate a lot of support requests for both designers and developers.

6. Design for longevity through obsolescence

Finally, the Design System has got to be fit for the future. By designing our content and layouts as components, it’s easy to rip them out and replace them when the time comes. And we shouldn’t be afraid to do that!

Intrinsic Web Design is becoming a hot topic in the world of web design. As we start to embrace newer features of CSS like Grid and intrinsic sizing units, it will be easy for us to swap out our existing layout components with something more intrinsic.

We hope to talk more about Intrinsic Web Design in a future blog post.

============================================

It’s been over a year of designing, building and learning as we go. Above all, we’ve learnt that we need to be flexible as we work together with everyone from different teams. The progress that’s been made so far has been brilliant, and we’re excited about building on the foundations we’ve made to make experiences that our audiences will love.

This article is also featured on Ö÷²¥´óÐã GEL and the . 

You don't know what you don't know Tue, 10 Nov 2020 11:26:55 +0000 /blogs/internet/entries/585c7800-5248-4201-939a-89d44213a418 /blogs/internet/entries/585c7800-5248-4201-939a-89d44213a418 Lucia Mormocea Lucia Mormocea

The story of how the Bitesize team created a quiz feature from their bedrooms, kitchens and garages.

It was the middle of summer 2020. After the whirlwind brought on by coronavirus, lockdown, working from home and, not least, Daily Lessons, we were just settling into a comfortable routine.

And when I say ‘we’, I mean the lovely bunch of people that make up the Bitesize team these days: developers, testers, UX designer, business analysts, project and product managers.

Tickets were added at a steady pace and features were coming along nicely. We even spent a couple of days in a hackathon devising new ideas for the website while shaking off some technical cobwebs.

Overall life working from home was as peachy as it could get in a global pandemic.

Maths, Science, Magic

We started looking ahead at what we could do to support our audiences in the new autumn term. Students faced interrupted lessons and a potential mix of in-school and at-home learning. We came up with a number of things we could offer to help. One of them was using quizzes to diagnose gaps in their knowledge and recommend where they should focus.

Quizzes are nothing new to Bitesize; we already have two types of them. One is the old favourite made up of ten questions at the end of the revision chapters, and the other is a more elaborate learning experience, powered by machine learning, which aims to help the user retain information.

This quiz was different!

The data science team had a prototype that used existing questions from test chapters and computed the user’s mastery level across each study guide with a certain level of confidence.

The more questions the user was asked, the higher the confidence level, and the more accurate the mastery level.

Based on those mastery levels we could suggest different topics or guides that the user should revise, effectively allowing us to diagnose a user’s gaps in knowledge.

“Trusssst in me…” (Kaa, the python – The Jungle Book)

The data science team had written their prototype in Python. Within the Bitesize team we had never touched that particular gem of a programming language so we were very apprehensive at first.

However, Python is “The” language of choice for machine learning and data science in general, not least because two of the most powerful libraries for mathematical operations, numpy and scipy, are built for it.

Those two libraries featured heavily in the prototype presented to us, so it made perfect sense for us to try and use Python for the actual implementation. However it did mean we had to produce a whole new CI/CD pipeline for it to work.

It’s a long way to Live if you wanna rock ’n roll

Having a fantastically clever algorithm that can provide mastery and confidence scores is great!

Shaping it into a product that provides value to our users is a whole different proposition.

Some of the initial questions we had to answer were:

  • How should we package the algorithm so it can be hosted in our AWS VPC?
  • What’s the algorithm’s domain and data contract?
  • How do we explain the algorithm to our audiences? Understandably, these days people can be very suspicious of software that uses data to provide information.
  • And last but not least, what happens when there is a Live bug? Contrary to popular belief, software will always have bugs. Our job is to ensure they are few and far between.

A few shaping and discovery sessions with the data science team led to refactoring the prototype into a new working model.

We started calling it the Efficient Learning algorithm or just “the algorithm”.

Its domain seemed to be made up of questions and answers. It required knowledge of the Bitesize curriculum domain, made up of subjects, topics and study guides. For example, it needed to know that the topic Cell Biology contains guides like Cell Structure, Cell Division etc.

The results returned would be an array of guides with mastery and confidence scores for each one.
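
The real contract is internal, but the shape described above might be sketched roughly like this; the field names are invented for illustration.

```typescript
// Illustrative sketch of the data contract described above – field names are
// invented for the example, not the real API.
interface AnsweredQuestion {
  questionId: string;
  guideId: string;   // the study guide the question belongs to
  correct: boolean;  // whether the user answered it correctly
}

interface AlgorithmRequest {
  topicId: string;             // e.g. Cell Biology
  answers: AnsweredQuestion[]; // the quiz state so far
}

interface GuideResult {
  guideId: string;    // e.g. Cell Structure, Cell Division
  mastery: number;    // estimated mastery level for the guide
  confidence: number; // confidence in that estimate
}

type AlgorithmResponse = GuideResult[];
```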

While our understanding of the maths behind the scenes was not that great, hence the “Mystery Box”, we knew enough to get started on building the infrastructure and the UI.

To Lambda or not to Lambda

One of the early decisions made about this new quiz was to host the algorithm as a lambda.

Some very good reasons for that were that the algorithm looked to be a data-in/data-out function which did not need to store the results or any user data. Therefore, we did not need a permanently running EC2 instance for this. It could be packaged in a lambda that spun up, computed results, then shut down.

In addition, the “algorithm” did not need access to any of our systems except perhaps the curriculum data. Since the required data about subjects, topics and study guides was static, we could just add it to one of our S3 buckets.

Lambdas are a lot easier to develop, wonderfully encapsulated, easy to spin up and tear down. Plus we already had a few kicking about our estate and we felt confident we could do this fast.

A Matter of State

Choosing to create a lambda implied that something else would have to handle the state. The “state” in our case consisted of a list of questions sent to the algorithm with a flag for whether the answer was right or wrong.

Initial UI drafts suggested that the quiz would run one question at a time and required the ability to store the user’s progress. We did not need any personal user data for this which suited us just fine because we wanted the quiz to be available to non-signed in users.

This is where User Store was the right solution for us. It is currently a client-side module which handles CRUD operations for key-value data using the localStorage API. The ambition for the future is to update the component to use a more permanent cloud storage solution whilst hiding the complexities away from the UI.

For the quizzes MVP we use it to store the questions which have already been answered. We could later extend this to securely store achievements and progress for signed in users.
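
The real User Store module is internal to Bitesize, but a minimal sketch of the same idea, a namespaced key-value wrapper over the localStorage API, might look like this (all names invented).

```typescript
// Minimal sketch of a client-side key-value store over the localStorage API,
// in the spirit of the User Store module described above (names invented).
class UserStore {
  constructor(private readonly namespace: string) {}

  private key(name: string): string {
    return `${this.namespace}:${name}`;
  }

  read<T>(name: string): T | null {
    const raw = window.localStorage.getItem(this.key(name));
    return raw === null ? null : (JSON.parse(raw) as T);
  }

  write<T>(name: string, value: T): void {
    window.localStorage.setItem(this.key(name), JSON.stringify(value));
  }

  remove(name: string): void {
    window.localStorage.removeItem(this.key(name));
  }
}

// e.g. persisting which questions have already been answered in the quiz
const quizStore = new UserStore('bitesize-quiz');
quizStore.write('answeredQuestions', ['q1', 'q7', 'q12']);
```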

UI and the MVP approach

The MVP planning sessions with the teams are always fun!

They involve creative discussions about what the core functionality should be. Then we look at the time we have to build it and start trimming the edges.

The MVP approach allows the whole team to build this product as fast as possible, then learn what works for the audience, and what doesn’t.
UX designers also changed their approach. Instead of carrying out a big discovery phase with lots of user studies, they switched to a ‘build, measure, learn’ one.

The UI focused on getting the user to results as fast as possible with minimal onboarding. It had to cater for questions and answers with very different text sizes.

It needed to be friendly to smaller devices, since it was estimated that most students would be doing their studying on tablets or phones.

The UI also challenged us to think about how we could display results in a way that is meaningful to the user.

Mastery scores on their own are not extremely useful. We needed a system that could explain how well a student has done for each topic they chose.
In order to help with this, the data science team expanded the algorithm to return band 1, 2 or 3 for each study guide, with band 3 being the highest mastery level.

To simplify results even further, we created band 1, 2 and 3 for the whole topic based on the scores of each study guide.


The bands map to a colour system of Red, Orange and Green.

The most useful part for the user is the study suggestions provided for each topic. For Red and Orange bands we display links to the study guides with the lowest mastery scores. This should direct our users to content that is highly relevant to their revision goals.
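
A rough, simplified sketch of that mapping is shown below; the exact banding rules belong to the algorithm, so the thresholds and the way the topic band is derived here are invented for illustration only.

```typescript
// Illustrative only: map mastery bands to the traffic-light colours and pick
// the weakest guides as revision suggestions. The banding logic is simplified.
type Band = 1 | 2 | 3;

const bandColour: Record<Band, 'Red' | 'Orange' | 'Green'> = {
  1: 'Red',
  2: 'Orange',
  3: 'Green',
};

interface BandedGuide {
  guideId: string;
  mastery: number;
  band: Band;
}

// Topic band derived from the guide bands (here, simply the lowest one).
function topicBand(guides: BandedGuide[]): Band {
  return guides.reduce<Band>((lowest, guide) => (guide.band < lowest ? guide.band : lowest), 3);
}

// For Red and Orange topics, suggest the guides with the lowest mastery first.
function suggestions(guides: BandedGuide[], count = 3): BandedGuide[] {
  if (bandColour[topicBand(guides)] === 'Green') return [];
  return [...guides].sort((a, b) => a.mastery - b.mastery).slice(0, count);
}
```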

Infrastructure

Getting a Python lambda off the ground was not trivial!

At first we had to set up a local development environment to run the function locally. We used a Python 3 virtualenv and AWS SAM, which enabled us to test the function with different types of events.

Our CI/CD pipelines are mostly done via Jenkins and Cosmos. Cosmos is an internal tool that provides a GUI for managing CloudFormation stacks. We created a pipeline that spun up a Docker container from a Python 3 image. The pipeline copied the code, ran the unit tests and packaged the build and dependencies into a zip file. From there it was a simple step to publish it to our AWS VPC.

Cosmos uses the AWS CLI behind the scenes, so the size of the package when uploading a lambda is limited to 50 MB. However, when creating the package we quickly discovered that the numpy and scipy libraries in the dependencies pushed the size of the package over that threshold.

This is where AWS Layers were helpful. The Layers functionality in AWS allowed us to move these two runtime dependencies outside the function. AWS offers an out-of-the-box layer for the two libraries, so we did not need to create one ourselves.

Automate all the things!

Between our development teams, our testers are spread thinner than marmite on toast. So our plans to secretly make them GCSE experts by getting them to manually test the quiz over and over again seemed unlikely.

We came to the conclusion that we had created a very challenging system to test.

The algorithm was not deterministic. It would not be possible to force it to provide the same questions with the same input. Luckily it provided the same mastery results if the same questions were used.

The UI was intricate, worked fully client side and the quiz was 20 questions long. So manual testing would be long, boring and repetitive. Exactly the opposite of what we want our testers to do!

The test team built two sets of automated tests in Cypress. One set runs contract tests against the responses coming from the API Gateway. The other runs an end to end test of the UI and can run through the whole quiz interface as many times as it is needed.
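
For flavour, simplified versions of those two kinds of test might look something like the sketch below; the endpoint, request shape, selectors and score ranges are hypothetical, not the real Bitesize tests.

```typescript
// Illustrative Cypress sketches – endpoint, selectors and shapes are invented.

// 1) Contract test against the response coming back through the API Gateway
describe('efficient-learning algorithm contract', () => {
  it('returns mastery and confidence for each guide', () => {
    cy.request('POST', '/example-quiz-api/results', { topicId: 'cell-biology', answers: [] })
      .then((response) => {
        expect(response.status).to.eq(200);
        response.body.forEach((guide: { mastery: number; confidence: number }) => {
          expect(guide.mastery).to.be.within(0, 1);    // assumed range
          expect(guide.confidence).to.be.within(0, 1); // assumed range
        });
      });
  });
});

// 2) End-to-end run through the whole quiz UI
describe('quiz journey', () => {
  it('answers every question and reaches the results page', () => {
    cy.visit('/example-quiz-page');
    for (let i = 0; i < 20; i += 1) {
      cy.get('[data-testid="answer-option"]').first().click();
      cy.get('[data-testid="next-question"]').click();
    }
    cy.get('[data-testid="results"]').should('be.visible');
  });
});
```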

Creating and updating automated tests has become part of the regular development process and as a result it has given us a lot of confidence in our features over the past year. Automated tests run across the whole estate on an hourly basis and flag breaking changes.

This means we can deploy to Live faster, catch problems early and we are not restricted by the number of testers in our team.

You Don’t Know What You Don’t Know … Until you know it!

Hopefully Bitesize quizzes will help you figure out what you don’t know. Well, at least as far as GCSEs go.

We had a great time making this and look forward to adding more subjects and introducing new types of questions.

Moving Ö÷²¥´óÐã Online to the cloud Thu, 29 Oct 2020 10:56:20 +0000 /blogs/internet/entries/8673fe2a-e876-45fc-9a5f-203c049c9f9c /blogs/internet/entries/8673fe2a-e876-45fc-9a5f-203c049c9f9c Matthew Clark Matthew Clark

This is the first in a series of posts in the coming weeks about how Ö÷²¥´óÐã Online is changing, making use of the cloud and more. 

Over the past few years, we in the Ö÷²¥´óÐã’s Design+Engineering team have completely rebuilt the Ö÷²¥´óÐã website. We’ve replaced a site hosted on our datacentres with a new one, designed and built for the cloud. Most of the tools and systems that power the site have moved too. We’ve used modern approaches and technologies, like serverless. And we’ve refreshed the design, approach, and editorial workflow, ready for the future. Hundreds of people have been involved, over several years. And it’s just about complete.

This post is the first of several looking at the what, the why, and most importantly the how: the approach we took to creating a new site that’s ready for the future. Delivering quality technology change, quickly and effectively.

Examples of pages that have been recreated on the cloud in the past year

Context

The Ö÷²¥´óÐã website is huge. Over half the UK population use it every week. Tens of millions more use it around the world. It has content in 44 different languages. And it offers over 200 different types of page — from programmes and articles, to games and food recipes.

As is the case with tech, if you stand still, you go backwards. Until recently, much of the Ö÷²¥´óÐã website was written in PHP and hosted on two datacentres near London. That was a sensible tech choice when it was made in 2010; but not now.

The Ö÷²¥´óÐã’s site is made up of several services (such as iPlayer, Sounds, News and Sport). For them all, we need to ensure they use the latest and best technology. That’s the only way to ensure they’re the best at what they do. They all need to be hugely reliable at scale, as well as fast, well-designed, and accessible to all.

And so, over the past few years, it’s been our strategy to recreate Ö÷²¥´óÐã Online. Almost every part has been rebuilt on the cloud. We’ve taken advantage of the many benefits that the cloud brings — such as the flexibility to provision new services almost instantly. And we’ve used best-practice tools and techniques — such as the React framework, and the DevOps model. We’ll look more at the approach in a moment, but first, let’s discuss the underlying principles.

Principles

Rebuilding a massive website could easily suffer from the . It’s all too easy for new projects to be over-ambitious with the requirements and the approach. A push for perfection makes it tempting to pick the most sophisticated solutions, rather than the simplest. We needed to prevent this, to ensure good value and delivery at pace. Here are some principles that helped us do this.

1) Don’t solve what’s been solved elsewhere

When building something as large as Ö÷²¥´óÐã Online, it might be tempting to consider everything from scratch. Doing so provides the most control, and leaves no stone unturned. But the cost in doing so can be huge. An off-the-shelf solution might only give you 90% of what you want, but if it can be delivered in 10% of the time, it’s probably a worthy trade-off. This applies to tech, UX, business analysis, and pretty much everything else. Most problems have already been solved somewhere else; so don’t solve them again.

A great tech example of this is the use of serverless. Around half of the Ö÷²¥´óÐã’s website is rendered serverlessly with AWS Lambda. Managing virtual machines (or containers) is expensive — keeping them secure, reliable and scalable takes time. Serverless mostly solves that problem for us. And problems solved elsewhere mean we shouldn’t do them ourselves.

2) Remove duplication (but don’t over-simplify)

When there are many teams, duplication is inevitable. Two teams will each come across the same problem, and create their own solution. In some ways this is good — teams should be empowered to own and solve their challenges. But left unchecked, it can create multiple solutions that are incompatible and expensive to maintain.

As we’ve rebuilt Ö÷²¥´óÐã Online, we’ve removed a lot of duplication and difference that has built up over the years. Multiple bespoke systems have been replaced with one generic system. It’s a double win, because as well as being more efficient (cheaper), we can focus on making the new single approach better than the multiple old approaches. It’s because of this that the Ö÷²¥´óÐã website now has better performance and accessibility than ever before.

However, we must be wary of over-simplification. Replacing multiple systems with one looks great from a business point-of-view. But software complexity grows exponentially: each new feature costs more than the previous one. There comes a point where two simple systems are better than one sophisticated system.

As an example, we chose to keep the Ö÷²¥´óÐã’s World Service sites separate from the main English-language Ö÷²¥´óÐã site. The needs of the World Service (such as working well in poor network conditions) were specialist enough to warrant a separate solution. Two simpler websites, in this case, are better than one complex site.

The rendering of the English-language site (left) and World Service site (right) is kept separate, to not over-complicate one solution

3) Break the tech silos through culture & communication

Creating a new Ö÷²¥´óÐã website has involved many teams. To be successful, we needed these teams to align and collaborate more than we’ve ever done before. Otherwise, we could easily create something that’s less than the sum of its parts.

It’s hard to overstate the value of communication. Without it, teams cannot understand how their work fits alongside that of other teams. And without that understanding, they cannot see the opportunities to share and align. Teams may even start to distrust each other.

Communication brings understanding, and that allows the culture to change. Instead of teams doing their own thing in isolation, they naturally share, collaborate, and flex to each other’s needs. They go beyond what is strictly their team’s remit, knowing that others will too. Which ultimately makes a better solution for everyone.

Over recent months I’ve heard teams say things like ‘that other team is busy so we’re helping them out’, or ‘we’re aligning our tech choice with the other teams’. It’s a level of collaboration I’ve not seen before. By understanding the bigger picture, and how everyone plays their part, we’ve created a level of trust and alignment that’s a joy to see.

4) Organise around the problems

This ‘onion diagram’ helped explain how common concerns & capabilities (in the centre) should be solved once. This frees up teams to work on the more specialist capabilities in the outer rings.

Even with great culture and communication, multiple teams won’t necessarily collectively come together to build the right thing. This is, of course, the much-quoted Conway’s law.

“Organizations which design systems […] are constrained to produce designs which are copies of the communication structures of these organizations.”
Melvin Conway

The Ö÷²¥´óÐã has historically had separate websites — for News, for Sport, and so on. Each one had a separate team. To change that, and build one website, we needed to reorganise. But how? One gigantic team wouldn’t work, so instead we organised teams in the most efficient way we could. We created teams for each page ‘type’ — a home page, an article page, a video page, and so on. We also created teams to handle common concerns — such as how the site is developed and hosted. Altogether, it minimised overlap and duplication, and allowed each team to own and become expert in their area.

5) Plan for the future, but build for today

When designing large software systems, we’ve a tricky balance to find. We must plan for the future, so that we meet tomorrow’s needs as well as today’s. But we also don’t want to over-engineer. We cannot be sure what the future will bring. Requirements will change. Cloud providers will offer new technologies. The rate of change in the world — particularly the technology world — is higher than ever before.

There is no substitute for good analysis, planning, and software design. Researching the opportunities and choices available is key to ensuring a project sets off in the right direction. But we must resist the danger of over-thinking a solution, because it may be for a future that never comes.
The beauty of agile software development is that we can discover and adapt to challenges as we go along. Business plans and architectures need to evolve too, based on what we learn as the project progresses. Don’t solve problems until you’re sure they’re problems you actually have.

6) Build first, optimise later

Expanding on the above, we must be careful not to optimise too soon. Large software projects will inevitably run into performance issues at some point. It’s super-hard to predict when and how these issues will appear. So, don’t. Use the benefit of agile development to respond to performance issues only when they become real.

As mentioned above, much of the Ö÷²¥´óÐã website now renders serverlessly using AWS Lambda. At the start of the project, we were sceptical about how quickly Lambda could render web pages at scale. We had an alternative approach planned. But in the end, we didn’t need it. Performance using serverless has been excellent. By not optimising too early, we saved a huge amount of effort.

7) If the problem is complexity, start over

This principle is neatly captured by Gall’s law:

“A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.”
John Gall

Removing complexity from an existing system is hard. And in our case, we had multiple complex websites that we wanted to combine. The collective requirements of these sites would overload any one system. So, we had to start again, going back to the basics of what common abilities were needed.

8) Move fast, release early and often, stay reliable

Part of the monthly status report we make for the WebCore project.

Finally, a very practical principle: ensure you are able to move quickly, so that you can learn and adapt. Release early, and often — even if it’s only to a small audience. As discussed earlier, predicting the future is notoriously hard. The best way to understand the future is to get there quicker.

The counterargument to this is that change brings risk. And with a popular service like Ö÷²¥´óÐã Online, reliability is critical. The Ö÷²¥´óÐã has always had a strong operational process (including 24/7 teams managing services, and a DevOps approach to ensure those who develop systems are also responsible for maintaining them). We’ve continued to invest in this area, with new teams focussing on infrastructure, and on the developer experience (DevX). We’ll go into more details in a future blog post.

Smaller releases, done more often, are also an excellent way to minimise risk.

High level technology overview

Putting the above principles into practice, here’s a super-high-level overview of how the Ö÷²¥´óÐã website works.

Let’s look at each layer.

Traffic management layer

First up, all traffic to www.bbc.co.uk or www.bbc.com reaches the Global Traffic Manager (GTM). This is an in-house traffic management solution based on Nginx. It handles tens of thousands of requests a second. Because of its scale, and the need to offer extremely low latency, it partly resides in our own datacentres, and partly on AWS.

For parts of our site, a second traffic management layer is sometimes used. (Internally, these are called Mozart and Belfrage). These services, hosted on AWS EC2s, handle around 10,000 requests per second. They provide caching, routing, and load balancing. They also play a key part in keeping the site resilient, by spotting errors, ‘serving stale’, and backing off to allow underlying systems to recover from failure.

Website rendering layer

The vast majority of the Ö÷²¥´óÐã’s webpages are rendered on AWS, using React. React’s isomorphic nature allows us to render the pages server-side (for best performance) and then do some further updates client-side.

Increasingly, the rendering happens on AWS Lambda. About 2,000 lambdas run every second to create the Ö÷²¥´óÐã website; a number that we expect to grow. As discussed earlier, serverless removes the cost of operating and maintenance. And it’s far quicker at scaling. When there’s a breaking news event, our traffic levels can rocket in an instant; Lambda can handle this in a way that EC2 auto-scaling cannot.
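
As a much-simplified sketch of the idea (not the actual WebCore code), a Lambda handler can render a React page to HTML on each request; the component, routing and response shape below are invented for illustration.

```typescript
// Simplified sketch of server-side rendering a React page inside a Lambda –
// not the actual WebCore implementation; the component and data are invented.
import { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
import React from 'react';
import { renderToString } from 'react-dom/server';

function ArticlePage({ headline }: { headline: string }): JSX.Element {
  return <h1>{headline}</h1>;
}

export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  // In reality the content for event.path would come from the business layer
  const html = renderToString(<ArticlePage headline={`Rendered for ${event.path}`} />);

  return {
    statusCode: 200,
    headers: { 'content-type': 'text/html; charset=utf-8' },
    body: `<!doctype html><html><body><div id="root">${html}</div></body></html>`,
  };
};
```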

A new internal project, called WebCore, has provided a new standard way for creating the Ö÷²¥´óÐã website. It’s built around common capabilities (such as an article, a home page, and a video page). It’s created as a monorepo, to maximise the opportunity for sharing, and to make upgrades (e.g. to the React version) easier. By focussing on creating one site, rather than several, we’re seeing significant improvements in performance, reliability, and SEO.

Spot the difference: comparing the older page (left) with the new cloud-based version (right). The numbers are Lighthouse performance scoring.

As discussed earlier, we’ve kept our World Service site as a separate implementation, so that it can focus on the challenges of meeting a diverse worldwide audience. (This project, called Simorgh, is open source and available on GitHub.) Our iPlayer and Sounds sites are kept separate too, though there is still a considerable amount of sharing (e.g. in areas such as networking, search, and the underlying data stores).

Business layer

The rendering layer focuses just on presentation. Logic on fetching and understanding the content is better placed in a ‘business layer’. Its job is to provide a (RESTful) API to the website rendering layer, with precisely the right content necessary to create the page. The Ö÷²¥´óÐã’s apps also use this API, to get the same benefits.

The Ö÷²¥´óÐã has a wide variety of content types (such as programmes, Bitesize revision guides, weather forecasts, and dozens more). Each one has different data and requires its own business logic. Creating dozens of separate systems, for each content type, is expensive. Not only do they need the right logic, but they also need to run reliably, at scale, and securely.

And so, a key part of our strategy has been to simplify the process of creating business layers. An internal system called the Fast Agnostic Business Layer (FABL) allows different teams to create their own business logic without worrying about the challenges of operating it at scale. Issues such as access control, monitoring, scaling, and caching are handled in a single, standard way. As per our principles, we’re making sure we don’t solve the same problem twice.
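
FABL’s real interface is internal, so the following is purely illustrative of the idea rather than its actual API: a team supplies business logic as a function that shapes upstream content for a page, while the platform takes care of operating it.

```typescript
// Purely illustrative – this is not FABL's real interface. It only sketches
// the idea of business logic as a function that the platform operates for you.
interface PageDataRequest {
  path: string;                 // e.g. /sport/football/12345678 (hypothetical)
  query: Record<string, string>;
}

interface PageDataResponse<T> {
  status: number;
  cacheTtlSeconds: number;      // the platform, not the module, does the caching
  payload: T;
}

// A team supplies only this: fetch and shape exactly what the page needs.
type BusinessLogic<T> = (request: PageDataRequest) => Promise<PageDataResponse<T>>;

const articleModule: BusinessLogic<{ headline: string; blocks: unknown[] }> = async (request) => {
  // Hypothetical upstream call – access control, monitoring and scaling are
  // the platform's concern, not this module's.
  const content = await fetch(`https://content.example.invalid${request.path}`).then((r) => r.json());
  return {
    status: 200,
    cacheTtlSeconds: 60,
    payload: { headline: content.headline, blocks: content.blocks },
  };
};
```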

Platform and workflow layers

The final two layers provide a wide range of services and tools that allow content to be created, controlled, stored and processed. It’s beyond the scope of this post to talk about this area. But the principles are the same: these services are moving from datacentres to the cloud, tackling difference and duplication as they do, and ensuring they are best-placed to evolve as Ö÷²¥´óÐã Online grows.

On to the next chapter

So, Ö÷²¥´óÐã Online is now (almost) entirely on the cloud. And it’s faster, better and more reliable as a result. We’ve discussed the key principles on how we made that change happen. And we’ve seen a summary of the technology used. More on that in later posts.

Most excitingly, this isn’t the end; just the start of something new. The tech and culture ensure we’re in a brilliant place to make Ö÷²¥´óÐã Online the best it can be. And that’s what we’ll do.

I’m massively proud of all we’ve achieved — to everyone involved, thank you.

The Remote Generation Mon, 27 Apr 2020 09:00:34 +0000 /blogs/internet/entries/7a072979-8e30-49c6-a44e-a5bed647e12c /blogs/internet/entries/7a072979-8e30-49c6-a44e-a5bed647e12c Eloise Coveny and Moritz Kornher Eloise Coveny and Moritz Kornher

Illustration of Eloise and Moritz by Eloise Coveny

About the authors
We are Eloise Coveny, and Moritz Kornher. A Kiwi who once lived in Germany, and a German who once lived in New Zealand. And now we are both Software Engineers at the Ö÷²¥´óÐã. We gave a version of this article as a talk at the {develop:Ö÷²¥´óÐã} conference in March 2020. It was the beginning of the coronavirus outbreak in Europe, and the conference had to be moved online within a very short time frame.

A few months ago, the idea of a worldwide pandemic seemed unimaginable. Today the whole planet is in lockdown. Back then only some of us had any experience working remotely. Now everyone does.

We were planning to write a light-hearted blog about remote-first practices. We thought that it might be thought-provoking. But as the current context has been changing daily, so has this. Now that you’re all remote working experts, what can we possibly say that you don’t already know?

We want to tell you two stories about remote-first working. Our teams have been engaging in these practices for some time now. Whilst we found remote working to be very different for each individual and every team, we did also see some common themes emerging. By putting our two stories next to each other, we hope to highlight different perspectives.

Remote Generation

by Eloise Coveny

Since I started working at the Ö÷²¥´óÐã in May of last year I have been working in a cross-located team across London and Glasgow. Everything has been done remote-first: our morning stand-ups, pair programming, and all social interaction with the wider team. In my first few months it was just my team lead and I who were in Glasgow, so our team was only just transitioning to becoming remote-first. I can only imagine that, for colleagues who were used to being all co-located, this must have been a difficult transition to make.

It was particularly awkward for me, not just because I was new to the Ö÷²¥´óÐã, but also because it was my very first role as a software engineer. I was transitioning from a career in the arts, so it was a completely new experience. In hindsight, I think joining a cross-located team meant that I had to be a good communicator from day one. Luckily for me I was no stranger to communication, as I had over a decade of experience in my past career.

I understood the importance of rigorous clear communication very early on, and that it would become my lifeline if I was to get to know my team and learn to become a better engineer. What helped me to feel valued by my team was when my colleagues would reach out to me and either express an interest in getting to know me, or simply offer to pair with me on a task.

So given my experience-to-date as a software engineer has been by and large remote-first, does that make me part of a new “remote” generation? It’s an interesting concept. And perhaps now due to this pandemic, we are the Remote Generation. So what does this mean for the future of software engineering?

Having been a cross-located team now for just under a year, I think we are all pretty comfortable with remote-first practices, and on the whole things seem to work quite well. But this isn’t to say that we can’t improve. Now there are more engineers in our team in Glasgow, I can be guilty of swivelling round to my colleague sitting next to me to ask him a quick question. But when I do this instead of asking the wider team via Slack, I am excluding others and restricting knowledge sharing. Code reviews, for example, are more inclusive when done remotely because more people can be involved, so I can receive more varied feedback and it increases knowledge transfer across the team. So this got me thinking: can remote-first practices benefit us, allowing us to develop our practices beyond the limits of co-located restrictions?

We understand the importance of social interaction and meeting colleagues in real life. As ironic as it sounds now, none of us want to be subjected to eternal house arrest. So how do we get around this? What can we do now with our remote tooling to try to bridge these gaps?

Our team has been constantly facing this problem. We have been trying a few things, and although they are no substitute for face-to-face time, they do perhaps make up for some things in our day-to-day working. Every morning, in the 15 minutes before stand-up, we have a casual drop-in chat room called Talking Heads. This dedicated time-box gives us an opportunity to have the over-the-desk conversations with our colleagues in London that we otherwise don’t get to have. We have also had several farewell calls, giving us the opportunity to say our goodbyes to departing colleagues and share in their embarrassment as they open their leaving gifts. And just very recently we have begun holding end-of-play quizzes on Fridays. Three hosts prepare some questions about topics of their personal interest. The rest of us break off into groups to answer them, and we get the pleasure of learning something new in the process.

It’s fun being able to have the creative scope to use our imagination and design different ways of doing things with remote tools. But, how far can we push these boundaries? And where will remote working take us next?

Test Drive

by Moritz Kornher

For us, remote working started with an email. The subject line was unsuspicious: “Video First trial”. Its contents were ambitious, because in just a couple of weeks the whole of the Voice + AI department would switch to remote-first working for two months. It was July 2019 and little did we know what the world would be like less than a year later. It was only after the trial that I learned it was, amongst other things, planned as a test drive of our business continuity.

To be clear, we weren’t kicked out of the office. But the intentions of the experiment were communicated very openly. People would choose where they wanted to work: at the office, from home, on the beach or in a lonely bothy in the Scottish Highlands. We were to push the boundaries of our existing working practices and learn as much as we could about this exciting new challenge.

How we prepared

So, how do you take a whole department remote? We didn’t need to start from nothing. Slack has always been an integral part of our communication, and early in the year the department did sign up for a video conferencing solution. Around then my team had also moved to remote-first stand-ups to support regular home-workers.

Of course we still had another three weeks to prepare. As a first measure, a department-wide town hall meeting was called. Town halls are exactly what they sound like: one large video call with everyone from within the department and an open floor to ask the senior leadership questions. We held them regularly during the trial and I found them to be a very effective tool for vertical knowledge sharing. These meetings allowed me to ask questions beyond the general brief that was normally communicated. And it was a great test to see if we could have a web conference with over a hundred participants.

With lots of questions answered, some uncertainty remained around the practical execution. What does it mean for me as an individual to work “Video First”? To tackle all these little issues, a multi-discipline working group was put together. They have created a Remote-Working Handbook. It reflects industry practices as well as the lessons we learned during the trial, and is now widely shared across the business.

How I dealt with it

The first day of the experiment had arrived. Most colleagues started to work from home straight away. Others preferred to keep working from the office, which is what I did. It gave me the chance to observe the changes around us. For obvious reasons, face-to-face meetings did not occur any more; everything moved online. Our work area was much quieter than usual, and some co-workers started earlier in the day to make good use of the extra time they were saving on their commute.

Eventually I took the leap and also stayed at home to work. What surprised me at first was how little changed. Within two weeks our processes had adapted so swiftly that the only noticeable difference was a lack of direct social interactions. Of course, there were challenges. If you have ever tried remote pair programming, you will know how tricky it can be. Running our team retrospectives smoothly took us a while, until we had figured out the best tools and the right flow. And yes, video call fatigue can be a real problem.

But these are not issues with remote first working. Look closely and you will find the exact same problems in a co-located workplace. Many people don’t follow proper pairing protocols even when they sit next to each other. Retros can easily be dominated by a few vocal people, and I almost never take the recommended hourly screen breaks.

It was important for us to identify these pain points and to address them quickly. As we had hoped, our ways of working were pushed to their limits and it was clear they needed to change. In a remote world everything is more formalised, so flaws in the workflow are exposed earlier. No overhearing of conversations to fill gaps, or kitchen chit-chat to get the latest updates. I believe that being remote helped us to manage this change.

Remote practices unlock new possibilities that a co-located world cannot provide. When I am online there is less of a barrier to using tools or online services that help me to improve my pairing technique. Virtual retrospectives allow us to be more creative and so engage more team members in different ways, helping to encourage them to share their thoughts. And since I’m in web meetings all day anyway, I can easily schedule my breaks between calls, or I get friendly reminders from my colleagues.

How we are doing now

Fast forward a few months and it is the beginning of 2020. We have been following the same remote-first practices we all adopted during our trial. On any given day in my team, someone was working from home. Sometimes it was just one person; on other days it was almost all of us. We did not plan our presence in the office any more; it just happened. The trial had helped us to improve our working practices so much that we did not notice a difference any more. For us it was the new “normal”.

Until coronavirus put Europe in lockdown. At first we were told to stay away from cafés and pubs. Later we were asked to work from home. That’s when I realised how lucky we were. Because we had a chance to prepare for exactly this situation.

Now I know it’s okay to call someone for a quick question, or a wee social chat.
Now I know the value of a well facilitated remote meeting, and I never want to go back to a meeting room again.
Now I know pair programming is a much more effective tool for knowledge sharing than overhearing conversations in the office.
Now I know that it’s okay to do my laundry while I’m listening to that town hall meeting.
Now I know that regular screen breaks are just as important as doing the work itself.
And of course, now I know that I can wear pyjama bottoms all day.

Originally I wanted to finish with a recommendation. I would have suggested that everyone in your team should adopt remote-first working practices for a couple of months; to maybe pick a time that is a bit quieter, but really just to go for it. I would have said that all of you need to experience remote working first-hand, so that you will be ready for the new decade and the changes it will bring.

Instead I would now like to leave you with an interesting and telling observation my team lead made at the beginning of the year.

'Everything works better when we work at home'

Keep an open mind, and I promise you can get there as well.

Futurespective

Two very different stories from two unique perspectives. In the first we learn about a team that is gradually forming remotely. We learn how a new starter finds their footing, whilst being physically disconnected from the rest of the team. And how the team finds remote answers to their remote questions.

In the second story we have an existing team that know each other well. A team that is already comfortable working together. And then suddenly they become remote all at once, and they have to figure out how to carry over their practices into this new environment.

Of course both stories share some similarities. Each of our teams had to overcome challenges. But by being flexible and trying out new things, we found that remote-first practices have helped to improve our workflows.

There are two points that particularly stand out for us. Firstly, it’s all about communication. Communication is much more difficult when we are not in the same place, but as humans it is engrained in our DNA. We now have the remote tools to allow us to be more creative in the ways we communicate; to explore more, to play more. We need to use this creativity to overcome any remote hurdles.

Secondly, it’s important to have a positive outlook. We are all in this situation together and we can’t just opt out of it. However, it is more than just that: we are in the midst of the biggest work revolution since the nineteenth century. Coronavirus is not the cause, it is only an accelerant for this transformation. Our ways of working will change. What we can do is be a part of this change, to influence it, to make it positive. Let us re-imagine and re-design the way we work as an industry and as a society.

What will our working environments look like?
Our relationships within our teams?
What about our daily schedules?
Our workflows?

Once this is over, all of us will form this new Remote Generation. We just don’t know what it will look like yet.

Mobile: Apps & Beyond – a conference without limits? Thu, 20 Feb 2020 13:38:35 +0000 /blogs/internet/entries/941ff3c2-6f47-4141-9ed2-50b212fb103b /blogs/internet/entries/941ff3c2-6f47-4141-9ed2-50b212fb103b Shahnaz Hameed Shahnaz Hameed

On 5th February 2020, the Ö÷²¥´óÐã’s mobile community came together for the first time to discuss the opportunities and challenges they face, share experiences and learn from others working in the mobile space.

Breaking down silos and building a wider mobile community - that was the key message of the introductory address made by Will Wychgram, Senior Architect for Technology Strategy & Architecture (TS&A) who worked tirelessly - alongside TS&A’s Alison Kelly and Diane Richard - to put together the first mobile community conference, with Google’s London Academy as our hosts for the day.

As members of the tech sector - software engineers, developers in test, business analysts, project managers and product owners - we all feel that we rarely get the opportunity to talk to the other teams whose products we integrate with, or to reach out to another team to find out how they might approach a problem that’s causing us some grief. That’s what the inaugural Mobile: Apps & Beyond conference was all about.

The day was full of talks, presentations and opportunities for networking – you can’t beat passing time chatting out in the cold, during a fire drill!

The first of our talks was a fireside chat with Matthew Postgate, Ö÷²¥´óÐã Chief Technology & Product Officer - hosted by Ö÷²¥´óÐã R&D’s Bill Thompson - who mused that ‘...mobile is still the most tangible expression of the information age…’, aligning with thoughts expressed in a lecture Postgate had heard at the Royal Academy of Engineering regarding information as a utility, in that it would be constantly available and accessible.

Matthew Postgate and Bill Thompson engage in a fireside chat

Thoughts then turned to 5G and Ambient Computing. Postgate cautioned that whilst 5G will lead to more device connectivity, there’s no telling how users will make use of that bandwidth, and the technology is still quite young. Ambient Computing opens up a number of possibilities and scenarios - it’s something the Ö÷²¥´óÐã is exploring with Voice and AI - where an interaction might be something said out loud, or triggered by a watch, with the response being displayed; or even an activity started on one device and continued on another.

After being evacuated for a fire drill – surely a coincidence that this followed Matthew’s fireside chat – Victor Riparbelli of Synthesia took to the stage. He has worked with the Blue Room to put together a concept of a personalised weather report utilising a computer-generated weather reporter. Riparbelli touched upon other ways Synthesia uses AI to create content, such as generating streets and traffic, and having someone speak a number of languages, as showcased in David Beckham’s Malaria Must Die campaign.

The mobile community in networking mode - including party hats

We then had a panel discussion with Neil Hall (Head of Sport Digital), Alex Watson (Head of Product, News), Matt Clark (Head of Architecture, Digital Products), Christine Bellamy (Head of Product, Ö÷²¥´óÐã) and David Andrade (Head of Software Engineering, iPlayer & Sounds), chaired by Ahmed Razek (Blue Room). Topics of discussion covered:

  • The 2020 Vision - The panel turned its thoughts towards ways to generate engagement with Ö÷²¥´óÐã content during this year’s Sporting events, and Ö÷²¥´óÐã Sounds. By creating a personalised recommended experience for users, with a larger focus on acquiring and retaining under 35s with Project Chrysalis.
  • Retaining Under 35s - As this is a large age range to cover, there are different strategies being utilised, such as enhancing children’s experience of iPlayer, encouraging Sign In and celebrating the release of Nightfall; so far, experience shows that teens are harder to engage, but this challenge also presents many opportunities. Across the board though, the panel felt we need to be more granular, to aggregate content and have a better symbiosis and partnership with Editorial, so that we can get relevant content in front of our audiences.
  • Staying Competitive - With the launch of Disney Plus and Amazon’s airing of sport, the panel reminded us to have confidence in our position - the Ö÷²¥´óÐã has a strong relationship with sports fans and we should be playing to our strengths. However, we shouldn’t rest on our laurels, as it’s good to be worried with the market getting so competitive - it keeps us on our toes and drives us to get more creative. Despite being our competitors, we should still look for opportunities to collaborate with these platforms professionally and create an “Education Space” for us to learn from each other, whilst maintaining our brand identity.
  • The Level of Focus on Web vs Mobile - Whilst we get a lot of users interacting with the Ö÷²¥´óÐã via mobile, this is not solely through our apps, but also our web presence. We do need to encourage more development in the “App Space” because apps can interact with users in a way that the web doesn’t, such as through push notifications. The panel recognised that there is frustration with the slow pace of change, but assured us this is due to deep structural reasons.

Demos, Networking & Lightning Talks

During lunch, we were given a tour of the Ö÷²¥´óÐã Blue Room’s Digital Human demos and Ö÷²¥´óÐã R&D helped us discover the potential uses of 5G in content production, consumption and distribution.

After lunch we all participated in a fun networking activity which involved party hats, emoji stickers and some surprising revelations! To round off the day, there was a series of interesting lightning talks - one from Google showcasing some highlights of what to expect from Android Studio 4.0, and the rest from various Ö÷²¥´óÐã Mobile teams including News, Sport, iPlayer & Sounds, and Children’s.

Overall the day provided us with a great opportunity to meet, share ideas, collaborate and build new working relationships with other members of the mobile community – it was certainly long overdue and I hope it is just the start of many more such opportunities to come together as one community.
