Video Compression Using Neural Networks: Improving Intra-Prediction

Published: 22 June 2020

Maria Santamaria
Researcher
Saverio Blasi
Lead Research Engineer
Marta Mrak
Lead R&D Engineer

Artificial intelligence (AI) can be successfully applied to images and videos to improve how they look - to add colour, to understand their content better or to help with storytelling, for instance. While these models can successfully automate a variety of tasks, AI algorithms can be biased if not used wisely. Due to the increasing popularity and application of these learning-based models, it is important to be able to explain how their results are devised.

We recently explored various forms of AI to create new video compression coding tools, and we have explained how we use convolutional neural networks in their design. We are also experimenting to see whether some video codecs could benefit from machine learning based on fully connected networks (FCNs).

What we're doing

Our objective is to improve intra-prediction. This is a well-known technique to combine and process the neighbouring pixels of a specified area of a video frame to obtain a good prediction of the content being compressed. By getting intra-predictions to be as close as possible to portions of the original content, we can avoid transmitting these portions of the video frame in full, and therefore achieve compression!
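To make the idea concrete, here is a minimal sketch of a conventional intra-prediction mode, DC prediction, which fills a block with the mean of its neighbouring reference samples. It is not the FCN-based predictor discussed in this post; the function names and sample values are illustrative assumptions. It shows why a good prediction helps: only the small residual (original minus prediction) remains to be coded.

```python
def dc_predict(top, left, size):
    """Predict a size x size block as the rounded mean of its neighbours.

    `top` and `left` are the already-decoded reference samples above and
    to the left of the block (hypothetical values in this sketch).
    """
    refs = list(top) + list(left)
    dc = round(sum(refs) / len(refs))
    return [[dc] * size for _ in range(size)]


def residual(original, prediction):
    """Element-wise difference between the original block and its prediction."""
    return [
        [o - p for o, p in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(original, prediction)
    ]


# Illustrative 4x4 block and its neighbouring reference samples.
top = [100, 102, 104, 106]
left = [98, 100, 102, 104]
original = [
    [101, 102, 103, 104],
    [101, 102, 103, 104],
    [102, 102, 103, 103],
    [102, 102, 102, 103],
]

prediction = dc_predict(top, left, 4)
res = residual(original, prediction)
# The residual values are much smaller than the original samples,
# so they are cheaper to transmit.
```

The closer the prediction is to the original block, the smaller the residual values become, which is where the compression gain comes from.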

FCNs have the potential to improve intra-prediction vastly. However, the resulting models are very complex and difficult to interpret, mostly because of their structure: a large number of layers and parameters. A layer receives an input and transforms it with linear and non-linear functions (the rate of change of a linear function is constant; that of a non-linear function is not). The resulting values are passed to the next layer, and so on. The parameters of the network are the weights, learned during training, that are used to compute these operations. By simplifying the network and reducing the number of weights, it becomes easier to understand how models make their predictions and to identify ways to reduce the model further. This can result in a compact and explainable model that requires fewer computational resources, meaning it can be used in applications such as video on demand and video streaming. This is our goal.
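The layer-by-layer computation described above can be sketched in a few lines. This is a toy example under stated assumptions: the ReLU activation, the layer sizes and the weight values are all hypothetical and are not taken from our trained models. It also counts the parameters, which is the quantity that simplification aims to reduce.

```python
def dense(x, weights, bias):
    """Linear part of a layer: one weighted sum plus bias per output."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    """A common non-linear function (assumed here for illustration)."""
    return [max(0.0, v) for v in values]


def forward(x, layers, activation=relu):
    """Pass x through each layer; the activation follows every layer but the last."""
    for i, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        if i < len(layers) - 1:
            x = activation(x)
    return x


def count_parameters(layers):
    """Total number of learned weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


# Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
    ([[1.0, 2.0]], [-1.0]),
]

output = forward([2.0, 3.0], layers)
n_params = count_parameters(layers)
```

In a real intra-prediction network the input would be the reference samples around a block and the output the predicted block, so the weight matrices, and therefore the parameter count, are far larger than in this toy case.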

Our approach

We used the to train an FCN for intra-prediction. The parameters of the network are updated by minimising a function that takes into account coding the residual (the difference between the original and predicted content). Just as we did in interpreting CNNs for video coding, we analysed the model that was generated to avoid applying the learned parameters without understanding how the model works. The result of our analysis is a simplified and more efficient model that can then be used in video compression.

In our intra-prediction coding experiments using an FCN, after training a multilayer model we have seen that all non-linear functions can be removed from the implementation without significant loss of performance, as shown in the following video.
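One consequence of removing the non-linear functions is worth spelling out: a stack of purely linear layers is mathematically equivalent to a single linear layer, because W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). The sketch below demonstrates this identity with hypothetical weights; the helper names are illustrative, and the sketch does not claim to reproduce the exact procedure used in our experiments.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def forward_linear(x, layers):
    """Apply each (weights, bias) layer in turn, with no non-linear functions."""
    for weights, bias in layers:
        x = [s + b for s, b in zip(matvec(weights, x), bias)]
    return x


def collapse(layers):
    """Fold a stack of purely linear layers into one (weights, bias) pair."""
    w_total, b_total = layers[0]
    for weights, bias in layers[1:]:
        # Update the bias first, since it depends on the previous b_total.
        b_total = [s + b for s, b in zip(matvec(weights, b_total), bias)]
        w_total = matmul(weights, w_total)
    return w_total, b_total


# Hypothetical two-layer network with the non-linear functions removed.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
    ([[1.0, 2.0]], [-1.0]),
]

x = [2.0, 3.0]
single_w, single_b = collapse(layers)
layered_out = forward_linear(x, layers)
collapsed_out = forward_linear(x, [(single_w, single_b)])
# Both paths give the same output, but the collapsed model needs
# only one matrix-vector product.
```

In the collapsed form, each row of the single weight matrix shows directly how much every reference sample contributes to one predicted sample, which is what makes such a model easier to interpret.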

We evaluated the compression efficiency as well as the encoding and decoding time of both the original FCN and our simplification. Our tests show that our simplification can achieve similar compression efficiency while taking less processing time. The resulting models are easy to interpret and enable a clear understanding of how reference samples contribute to producing the intra-predictions.

More details about this approach can be found in the paper , to be presented at the .

What's next?

Our results demonstrate that simple techniques can perform similarly to more complex ones and in less time in the context of intra-prediction. The learned knowledge can be used to improve future video codec solutions.

This work was co-supported by the , through an in collaboration with the , .
