Linear Video Coding and Transmission Schemes for Next-Generation Video Applications
VCIP 2022 Tutorial, 13th December, Suzhou, China

Tutorial Description and Motivation

Conventional video coding and transmission systems currently combine digital video compression (e.g., HEVC) with a suitable network protocol (802.11, 4G, or 5G) and rely on Shannon's source-channel separation theorem. However, they suffer from some inherent limitations when video content is transmitted over error-prone wireless networks. First, the coding choices (compression rate, channel coding rate) are decided a priori at the transmitter and are the same for all potential receivers, so they may not match the actual channel conditions. Users with degraded channels may experience the digital cliff effect (video glitches or freezes), while users with very good channels cannot take full advantage of them, since the design choices are based on more pessimistic assumptions. Second, traditional techniques require the transmitter to continuously adapt the coding parameters, relying on estimates of both the rate-distortion characteristics of the source and the channel characteristics; this adaptation introduces additional delay. Third, further delay is introduced by the buffers at the encoder, within the network, and at the receiver, which are needed either to smooth out variations in the encoding rate and channel characteristics or because the network infrastructure is shared.

Linear Video Coding and Transmission (LVCT) schemes, pioneered by the SoftCast architecture, have demonstrated over the last decade a strong potential to address or mitigate these issues. LVCT schemes are joint source-channel video coding and transmission architectures that process pixels through successive linear operations (spatio-temporal decorrelation transform, power allocation, analog modulation) and transmit the information directly, without quantization or coding. LVCT architectures deliver a single data stream that can be decoded by any receiver, even one experiencing poor channel quality. Each receiver decodes a video quality commensurate with its own channel quality, without any feedback information and without the complex adaptation mechanisms of conventional schemes. Moreover, LVCT schemes offer a relatively low and controlled latency that can be adjusted through the size of the temporal transform. This is a paradigm shift with respect to traditional video transmission architectures, with the potential to dramatically improve the quality of experience in wireless and latency-constrained scenarios.
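To make the chain of linear operations described above concrete, here is a minimal NumPy sketch of a SoftCast-style encoder and decoder. It is an illustration under stated assumptions, not a reference implementation: it assumes an orthonormal 3D DCT over a group of pictures (GOP), fixed-size square chunks in each DCT frame, per-chunk gains proportional to λ_i^(-1/4) (λ_i being the chunk energy), an AWGN channel standing in for real (dense) modulation, and chunk energies known at the receiver as metadata. All function names and parameters (`cs`, `snr_db`, `avg_power`) are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); rows are basis vectors."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct3(x, inverse=False):
    """Separable 3D DCT over the (time, height, width) axes of a GOP."""
    for axis in range(3):
        c = dct_matrix(x.shape[axis])
        mat = c.T if inverse else c  # orthonormal, so the inverse is the transpose
        x = np.moveaxis(np.tensordot(mat, x, axes=(1, axis)), 0, axis)
    return x

def to_chunks(coeffs, cs):
    """Split each DCT frame into cs x cs chunks -> array (n_chunks, cs*cs)."""
    t, h, w = coeffs.shape
    blocks = coeffs.reshape(t, h // cs, cs, w // cs, cs).transpose(0, 1, 3, 2, 4)
    return blocks.reshape(-1, cs * cs)

def from_chunks(chunks, shape, cs):
    """Inverse of to_chunks."""
    t, h, w = shape
    blocks = chunks.reshape(t, h // cs, w // cs, cs, cs).transpose(0, 1, 3, 2, 4)
    return blocks.reshape(t, h, w)

def power_scaling(lam, avg_power=1.0):
    """Gains g_i proportional to lam_i**(-1/4), normalized so that the mean
    transmitted power per symbol, mean(g_i**2 * lam_i), equals avg_power."""
    n = lam.size
    return lam ** -0.25 * np.sqrt(n * avg_power / np.sum(np.sqrt(lam)))

def softcast_sketch(gop, cs=4, snr_db=10.0, seed=0):
    """Encode a GOP, send it over an AWGN channel, and linearly decode it."""
    rng = np.random.default_rng(seed)
    coeffs = dct3(gop.astype(float))                       # spatio-temporal decorrelation
    chunks = to_chunks(coeffs, cs)
    lam = np.maximum(np.mean(chunks ** 2, axis=1), 1e-12)  # chunk energies (metadata)
    g = power_scaling(lam)                                 # power allocation
    tx = g[:, None] * chunks                               # analog symbols, no quantization
    sigma2 = 10 ** (-snr_db / 10)                          # noise power for unit signal power
    rx = tx + rng.normal(scale=np.sqrt(sigma2), size=tx.shape)
    a = g * lam / (g ** 2 * lam + sigma2)                  # per-chunk linear least-squares gain
    est = a[:, None] * rx
    return dct3(from_chunks(est, coeffs.shape, cs), inverse=True)

# Usage: the same transmitted signal decoded at different channel SNRs.
gop = np.random.default_rng(1).uniform(0, 255, size=(4, 16, 16))
for snr in (5, 15, 25):
    rec = softcast_sketch(gop, snr_db=snr)
    print(snr, np.mean((rec - gop) ** 2))
```

Because every step is linear and nothing is quantized, lowering `snr_db` degrades the reconstruction gradually rather than abruptly. This is the graceful behavior that lets a single stream serve receivers with very different channels. The GOP length (the first axis of `gop`) is the temporal transform size that sets the latency.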

This tutorial will first introduce use cases where LVCT schemes can make a difference compared to traditional schemes relying on conventionally encoded video streams (e.g., HEVC) over a suitable network protocol (802.11, 4G, or 5G). Issues with conventional digital schemes (e.g., complex adaptation and the cliff effect) will also be discussed to motivate the LVCT approach. Then, the components of the baseline SoftCast LVCT scheme will be described block by block, with visual examples to facilitate understanding. A third part will be devoted to real implementations of LVCT architectures, detailing the dense modulation process and the bandwidth computation. Recent technical innovations and results from the literature, including those dedicated to new media (VR, 360° video, point clouds), will then be presented and discussed. Finally, current research challenges in the development of LVCT schemes will be presented, including hybrid digital-analog schemes, channel adaptation, and latency reduction.