The Inefficiency of Current Video Codecs (Part 2)
Prior to its transmission or storage, a multi-media signal (whether audio, image or video) often needs to be converted from one format, or code, to another. This process (‘encoding’) is reversed (‘decoding’) for playback or editing. The technology that makes this happen is thus referred to as a “codec”. With the advent of signal digitisation midway through the last century, codecs were originally used to convert analogue signals into digital form. Today, however, their uses are wide-ranging and, unsurprisingly, many different kinds of codecs have been developed for specific purposes. Nonetheless, competition amongst the most widely used codecs in the multi-media streaming industry centres on three key features: quality, compression, and computational efficiency.
From the end-user’s perspective, quality is perhaps the most obviously important factor, as it directly shapes the viewing experience. However, compression also matters: the smaller the file, the faster it can be transmitted, because it demands less bandwidth. A compressed file also occupies less memory, so playback consumes fewer system resources. Greater compression therefore means shorter transmission delays, which in turn improves the multi-media experience: everyone can relate to the frustration of waiting for the spinning buffering circle to disappear and for the video to resume. Better compression also benefits the streaming media provider, as smaller files lower data storage costs. Unsurprisingly, higher-quality multi-media is frequently offered at a premium to cover the additional storage and bandwidth costs.
However, the dynamic between compression and quality in the context of codecs is often a zero-sum game, as improving one comes at the cost of the other. Codecs that prioritise compression over quality are referred to as ‘lossy’ codecs, and the most commonly used lossy codecs are based on a mathematical technique called the discrete cosine transform, or “DCT”. This is the most widely applied mathematical technique in digital media encoding and has been in use for over 50 years in the compression of images, audio, video, television and radio broadcasts, and speech. Although the principles are fundamentally the same across different media, this article will focus on the application of DCT to video only.

Perhaps the easiest way to understand how our visual environment (as perceived by our eyes) is converted into digital form using DCT (for representation on electronic devices) is to imagine all signals (e.g. light waves) as continuous streams of information. For example, the colour of a flower is never static in reality; it changes constantly with small variations in factors such as atmospheric pressure, humidity, ambient lighting or even the viewing angle. To represent the flower as an image (or series of images), a snapshot is taken at a discrete moment in time, and the wavelength data captured in that snapshot is converted into digital form by assigning a numerical value to each discrete colour frequency and associating that value with its corresponding grid coordinate on the image. This data can easily be stored digitally (‘encoded’) and reconstituted into a visual representation (‘decoded’) for viewing or editing, in a process similar to a paint-by-number exercise. The problem with this technique, however, is that it creates hard boundaries between areas of different colours, just as a paint-by-number has boundaries around each colour area.
This imprecision creates a type of error known as an image artefact, the technical term for a feature that is not present in the original object. Although such errors can be mitigated somewhat by increasing the sampling size and frequency, any residual errors in an image are magnified when that image forms part of a series of images, as is the case with video.
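The sampling-and-truncation idea behind lossy DCT coding can be sketched in a few lines. The code below is a toy illustration, not any production codec: it applies a plain DCT-II to an eight-sample block of brightness values (like one row of the 8×8 blocks used in image compression), discards the high-frequency coefficients, and measures the reconstruction error — the residue that shows up visually as an artefact.

```python
import math

def dct(signal):
    """Forward DCT-II: convert N samples into N frequency coefficients."""
    n = len(signal)
    return [sum(x * math.cos(math.pi * k * (i + 0.5) / n)
                for i, x in enumerate(signal))
            for k in range(n)]

def idct(coeffs):
    """Inverse (scaled DCT-III): reconstruct samples from coefficients."""
    n = len(coeffs)
    return [coeffs[0] / n + (2.0 / n) * sum(
                c * math.cos(math.pi * k * (i + 0.5) / n)
                for k, c in enumerate(coeffs) if k > 0)
            for i in range(n)]

# A smooth 8-sample brightness ramp, like one row of an image block.
samples = [10, 12, 15, 19, 24, 28, 31, 33]
coeffs = dct(samples)

# 'Lossy' step: keep only the four lowest-frequency coefficients.
truncated = coeffs[:4] + [0.0] * 4
approx = idct(truncated)

# The reconstruction is close but not exact; the residue is the artefact.
error = max(abs(a - b) for a, b in zip(samples, approx))
```

With all eight coefficients retained the round trip is exact; the error only appears once coefficients are thrown away, and it grows as the signal becomes less smooth — which is why sharp edges are where DCT artefacts are most visible.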
To understand why this occurs, we need to introduce the concept of motion compensation. Simply put, motion compensation is an algorithm that tracks the ‘movement’ of pixels from one frame in a video to the next and determines the most efficient mathematical formula to represent their change in position. An effective algorithm only requires key images to be ‘saved’ at maximum quality, with motion compensation determining what happens between those key images to simulate movement. Whilst the algorithm itself may not take up much file space, it requires complex operations that increase computational time. Motion compensation comes in varying degrees of complexity and efficiency, and implementations at the lower end can introduce image artefacts into a frame that tend to accumulate as they are mathematically repeated across a series of frames. This effect is the main source of quality loss in DCT-based video codecs and follows directly from the errors inherent in discrete techniques such as DCT. Such errors include blurring, jitter and various other distortions perceptible to the human eye.
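The search step at the heart of motion compensation can be sketched with exhaustive block matching. This is a minimal illustration using tiny integer-valued ‘frames’, not any particular codec’s implementation (real encoders use much larger blocks and far cleverer search strategies): it finds the displacement that best predicts a block of the current frame from the previous one.

```python
def best_motion_vector(prev, curr, bx, by, size, radius):
    """Exhaustive block matching: find the (dx, dy) shift that best maps
    a block of `curr` back onto `prev`, minimising the sum of absolute
    differences (SAD) over the block."""
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float('inf')
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sx, sy = bx + dx, by + dy
            if sx < 0 or sy < 0 or sx + size > w or sy + size > h:
                continue  # candidate block falls outside the reference frame
            cost = sum(abs(curr[by + r][bx + c] - prev[sy + r][sx + c])
                       for r in range(size) for c in range(size))
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost

# Two 6x6 'frames': a bright 2x2 patch moves one pixel right and one down.
prev = [[0] * 6 for _ in range(6)]
curr = [[0] * 6 for _ in range(6)]
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    prev[r][c] = 255
for r, c in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    curr[r][c] = 255

vector, cost = best_motion_vector(prev, curr, bx=2, by=2, size=2, radius=2)
# A cost of 0 means the block is perfectly predicted by the shift alone.
```

Note the nested loops: even this tiny search performs a block comparison for every candidate shift, which is why motion estimation dominates encoding time, and why cheaper (smaller-radius) searches miss matches and introduce the artefacts described above.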
Although DCT has been applied to codec technology for nearly 50 years, its ability to meet the demands of the current consumer environment is likely exhausted. A modern codec is simultaneously expected to reduce quality loss, file size and code complexity during the encoding process. To that end, recent improvements to codec technology have focused on adding filters that boost quality by dividing an image (or frame, in the context of video) into increasingly smaller chunks (i.e. increasing the sampling frequency within a static image), to which the same compression mathematics are then applied. Predictably, these additional chunks (albeit smaller) add computational complexity and consequently demand more time and resources during encoding. An alternative solution aimed at reducing file size (applicable to video codecs specifically) skips certain frames from the compressed video feed (i.e. reduces the sampling frequency across time). As mentioned above, improving one parameter entails sacrificing the other, and this method results in a significant drop in video quality: omitting inter-frame data forces the motion compensation algorithm to reconstruct movement from a smaller sampling frequency, leading to an accumulation of artefact errors that form visible distortions.
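The frame-skipping trade-off is easy to demonstrate with a toy example (hypothetical one-dimensional ‘frames’, not any real codec’s interpolation method): reconstructing a dropped frame by blending its neighbours produces a ghosted double image rather than the true intermediate position.

```python
def interpolate(a, b, t):
    """Linear blend between two frames (flat lists of pixel values)."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

# Three 1-D 'frames' of a bright pixel moving right one position per frame.
f0 = [255, 0, 0, 0]
f1 = [0, 255, 0, 0]   # the true middle frame
f2 = [0, 0, 255, 0]

# Skip f1 during encoding, then reconstruct it by blending f0 and f2.
guess = interpolate(f0, f2, 0.5)

# Instead of one bright pixel in the middle, the guess is a 'ghosted'
# double image: [127.5, 0, 127.5, 0] -- a visible artefact of the
# reduced temporal sampling rate.
error = sum(abs(g - t) for g, t in zip(guess, f1))
```

A motion-compensated reconstruction would shift the pixel rather than blend the frames, but the further apart the retained frames are, the harder that shift is to estimate correctly — which is exactly how artefacts accumulate.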
Much has been written about how, in the pursuit of higher video quality, enhancements to DCT algorithms seem unable to reduce computing time for encoding operations. In other words, code complexity in DCT-based codecs has reached the point where it pushes the limits of efficiency. A breakthrough will likely require the application of new mathematical principles – a solution the industry has been unable to develop convincingly. Fortunately, a European start-up called POLAR HPC Solutions Limited has developed a model for encoding and decoding multi-media data through the use of fuzzy logic – an approach based on ‘varying degrees of truth’ rather than the classical Boolean logic applied in modern computing, which offers only a binary notion of truth. Predictably, the results are in a class of their own. POLAR’s codec is called the Wave Spiral Codec (“WSS”), and the main difference between it and other video codecs currently in use is that WSS does not require signal data to be assigned a discrete numerical value. Accordingly, the contours of a shape are more blended, akin to an impressionist painting rather than a paint-by-number. As a result, WSS does not need a complex motion compensation algorithm to eliminate artefacts and distortions, which gives it an undeniable advantage over competing codecs in terms of video quality.
As mentioned in the previous article, algorithmic complexity is of great importance in the development of the next generation of codecs. Here too, WSS is vastly superior to leading codecs (such as H.264 and H.265/HEVC): the number of computations required by the WSS algorithm is lower by a factor of at least 10, and by as much as 95 in certain circumstances. Put differently, for the same workload WSS requires fewer hardware resources due to its superior computational speed, which translates directly into data centre savings. Additional features of WSS include:
- greater preservation of image integrity;
- individual compression of each image frame in a frame series;
- no transfer of image artefacts between frames; and
- optional adjustment of distortion levels by altering the desired compression rate.
However, limitations of the underlying software logic are not the only hurdle to overcome. As a developer of FPGA-based high-performance computing devices, POLAR also believes that hardware acceleration can greatly reduce not only compression time but also overall power consumption. Part 3 in this series will explore the current state of hardware solutions in the multi-media streaming market and describe POLAR’s integrated response to the evolving needs of the industry.