Why are video encoders all different?

Most lossy codec standards, whether for audio, still pictures or video, do not actually define the encoding process. What they define is the format in which the information is communicated (the syntax of the compressed bitstream) and how to decode that data. All decoders must follow this decoding process accurately, although they can still differ in small ways, such as how they handle corrupt or incomplete data and how they buffer data. For standards such as H.265/HEVC, the same compressed bitstream yields bit-exact decoded output from any conforming decoder. The diagram below shows a simplified codec design. It does not show temporal prediction or the encoder's closed loop with the decoder, both of which are used in modern video codecs. (IMHO the diagram gets too complex when these are included, but our CTO did not like this simplification.) The important takeaway is that in a standard like H.265/HEVC only the decoder is fully defined.
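To make this concrete, here is a toy sketch in Python. It is not a real codec: the "standard" here is hypothetical and defines only a bitstream syntax (a quantizer step plus integer levels) and the decoding rule. The two encoders and all names are invented for illustration. They make different choices and produce different bitstreams, yet the same decoder reconstructs both deterministically.

```python
from typing import List, Tuple

# Toy "bitstream": (step, levels). The hypothetical standard defines only this
# syntax and the decode() rule; it says nothing about how an encoder picks them.
Bitstream = Tuple[int, List[int]]


def decode(bitstream: Bitstream) -> List[int]:
    """Normative part: every conforming decoder must reconstruct exactly this."""
    step, levels = bitstream
    return [level * step for level in levels]


def encode_fast(samples: List[int]) -> Bitstream:
    """Encoder A (illustrative): coarse step, truncating division. Cheap, lossier."""
    step = 8
    return step, [s // step for s in samples]


def encode_careful(samples: List[int]) -> Bitstream:
    """Encoder B (illustrative): finer step, rounds to nearest level. Less error."""
    step = 4
    return step, [(s + step // 2) // step for s in samples]


def sum_abs_error(original: List[int], decoded: List[int]) -> int:
    """Simple distortion measure: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(original, decoded))


samples = [3, 17, 30, 44, 61]
stream_a = encode_fast(samples)
stream_b = encode_careful(samples)

for label, stream in (("fast", stream_a), ("careful", stream_b)):
    decoded = decode(stream)
    print(label, stream, decoded, sum_abs_error(samples, decoded))
```

Both bitstreams are valid, and decoding either one always gives the same result for that bitstream. What differs is how close the reconstruction is to the source, and that is entirely the encoder's business.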

In the case of H.265/HEVC, the number of different ways to compress a block of pixels is virtually unlimited, and much larger than it was for H.264/AVC. It is the ‘art’ of the encoder vendor to invent proprietary algorithms that find the best trade-off between image quality, compression efficiency and computational requirements.
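One common way to formalize this trade-off is a Lagrangian rate-distortion cost, J = D + lambda * R, where D is the distortion, R is the number of bits and lambda sets how much a bit "costs" relative to quality. This is a general textbook technique, not a description of any particular vendor's algorithm. The sketch below is a minimal Python illustration; the candidate names and all numbers are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical label for one way of coding the block
    distortion: float  # error versus the source block (made-up values)
    bits: int          # bits needed to code the block this way (made-up values)


def rd_cost(candidate: Candidate, lam: float) -> float:
    """Lagrangian cost J = D + lambda * R."""
    return candidate.distortion + lam * candidate.bits


def choose_mode(candidates: List[Candidate], lam: float) -> Candidate:
    """Exhaustive search: evaluate every candidate and keep the cheapest."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Illustrative candidates only; a real encoder has vastly more options.
candidates = [
    Candidate("skip", distortion=900.0, bits=1),
    Candidate("inter_prediction", distortion=300.0, bits=40),
    Candidate("intra_prediction", distortion=120.0, bits=160),
]

quality_first = choose_mode(candidates, lam=0.5)   # bits are cheap
balanced = choose_mode(candidates, lam=2.0)
bitrate_first = choose_mode(candidates, lam=20.0)  # bits are expensive

print(quality_first.name, balanced.name, bitrate_first.name)
```

The computational side of the trade-off shows up in the search itself: every extra candidate evaluated costs encoder time or silicon, so a practical encoder has to decide which options are worth trying at all.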

Different encoder markets have different requirements. Some markets, such as professional broadcast, care most about compression quality. They need the lowest bit rate for a given quality and are willing to spend more computational resources to achieve it. Typically the Average Selling Price (ASP) of ICs for this market is $1,000-2,000 each!

The power consumption of these solutions is also quite high. Broadcast customers expect a very sophisticated level of configurability, because they need to optimize the encoding for their source material and system architecture.

Let's compare this to a smartphone encoder. In a smartphone, power consumption and cost are very important, so the encoder needs to be smaller and extensively optimized for low power.

An IP camera used in a surveillance system typically streams over Ethernet, where bandwidth is usually plentiful and packet loss is rare, and as such its encoder is not very focussed on compression quality or error robustness. What these customers do focus on is integration and low cost.

A telepresence video conferencing customer needs error robustness and low latency. They also need good compression quality and lots of configurability. Telepresence customers use specific features of the standard, like long-term reference frames and slices, which are not normally supported in smartphone encoders.

The following table shows some of the different encoder requirements by market.

So we have shown how encoder vendors optimize for different requirements. At NGCodec we are building an encoder that can be configured at synthesis time and at run-time to support a wide range of requirements across different markets and applications.