Introduction


We are currently entering a new phase in the delivery of multimedia content to consumers. As in every previous phase, the quality and fidelity of the experience increase. In this phase we are migrating from High Definition TV (HDTV) to Ultra High Definition TV (UHDTV). It has become apparent that, in addition to higher resolution, UHDTV would benefit from other visual quality enhancements such as High Dynamic Range (HDR) and Wide Color Gamut (WCG). Furthermore, there are situations in which a higher frame rate (High Frame Rate, HFR) may bring benefits, especially for high-motion content such as sports. Consumer research has revealed that these features can result in a much more realistic experience, and they have been received very positively in the marketplace. This has led most consumer electronics (CE) manufacturers to create and promote related technologies and equipment.


High Dynamic Range is the capability to represent a large variation in brightness in the video signal, i.e., from very low luminance (dark) values (<0.01 cd/m²) (i) to very high luminance (bright) values (≥1,000 cd/m²). Existing systems, which use what is commonly referred to as Standard Dynamic Range (SDR), typically support brightness values only in the range of roughly 0.1 to 100 cd/m².
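As a quick illustration of what these ranges mean, the short Python sketch below converts the nominal endpoints quoted above into contrast ratios and photographic stops (one stop is a doubling of luminance). The helper names are ours, and the endpoints are only the example values from the text; real content and displays vary.

```python
import math

def contrast_ratio(min_nits: float, max_nits: float) -> float:
    """Ratio between the brightest and darkest representable luminance."""
    return max_nits / min_nits

def stops(min_nits: float, max_nits: float) -> float:
    """Dynamic range in photographic stops (number of luminance doublings)."""
    return math.log2(max_nits / min_nits)

# Nominal ranges from the text, in cd/m²
RANGES = {"SDR": (0.1, 100.0), "HDR": (0.01, 1000.0)}

for name, (lo, hi) in RANGES.items():
    print(f"{name}: {contrast_ratio(lo, hi):,.0f}:1, {stops(lo, hi):.1f} stops")
```

With these endpoints SDR spans about 10 stops, while the HDR example spans about 16.6 stops.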

Similarly, Wide Color Gamut is the capability of representing a larger variety of colors than conventional systems have supported. Existing systems have been based on the BT.709 and BT.601 color spaces, which capture only a relatively small percentage (~33.5%) of all visible chromaticity values in the CIE 1931 chromaticity diagram. Wider color spaces, such as P3D65 and BT.2020, can represent a much larger percentage of visible chromaticities (~45.5% and ~63.3%, respectively). These representations can result in more colorful and realistic content and thus enhance the viewer’s experience and impression of reality.
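The relative size of these gamuts can be checked with a few lines of Python. The sketch below uses the standard CIE 1931 xy chromaticities of the BT.709, P3D65 and BT.2020 primaries and the shoelace formula for the area of each gamut triangle. It only computes areas relative to BT.2020; reproducing the absolute percentages above would also require spectral locus data, which is not included here.

```python
# CIE 1931 xy chromaticities of the red, green and blue primaries
PRIMARIES = {
    "BT.709": [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)],
    "P3D65": [(0.680, 0.320), (0.265, 0.690), (0.150, 0.060)],
    "BT.2020": [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)],
}

def triangle_area(pts):
    """Area of a gamut triangle in the xy plane (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = pts
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

ref = triangle_area(PRIMARIES["BT.2020"])
for name, pts in PRIMARIES.items():
    area = triangle_area(pts)
    print(f"{name:8s} area {area:.4f}  ({area / ref:.1%} of BT.2020)")
```

BT.709 comes out at roughly 53% and P3D65 at roughly 72% of the BT.2020 triangle, in line with the ratios of the percentages quoted above (33.5/63.3 and 45.5/63.3). Area in the xy plane is not perceptually uniform, so this is a geometric comparison only.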

                Coverage of BT.2020, P3 and BT.709 in CIE 1931 XYZ chromaticity diagram


HDR/WCG content has very different characteristics from SDR content. For example, when content is produced for an HDR/WCG display, it needs to be adapted to the new characteristics of the display and may require different color grading (tone mapping) to account for the higher dynamic range being represented. The characteristics of the content may also differ because of different psychovisual models or processing methods used during production of the material.

These differences may require a number of changes to the pre-processing, encoding, and post-processing blocks of a delivery chain in order to convert, encode, and display the content properly. For example, HDR/WCG content can vary considerably in brightness range, with most content currently produced at peak brightness values from ~300 up to 2,000 cd/m². Similarly, HDR/WCG content may be produced using only BT.709 or P3D65 primaries. However, existing and upcoming HDR/WCG display systems can only represent peak brightness values from ~300 to ~1,000 cd/m² and may have their own color gamut limitations. These factors need to be taken into account when processing HDR/WCG content.

 

Encoding HDR/WCG content with HEVC Main 10 profile: HDR10 

One of the earliest concerns regarding the delivery of HDR/WCG material with the desired color gamut and dynamic range to consumer devices was whether a consumer-implementation-friendly format could be used to deliver such material with little, if any, degradation. It was initially believed that higher bit depths (e.g. 12 bits), 4:4:4 encoding, and color spaces such as CIE 1931 XYZ would be necessary to achieve that goal. Fortunately, after extensive investigation it was determined that a carefully designed 10 bit 4:2:0 format would still be suitable for HDR/WCG delivery. This fit well with plans to deploy the HEVC Main 10 profile in several consumer devices and services, and various tests conducted using this profile demonstrated the viability of this solution.
Given these results, several organizations, including the Blu-ray Disc Association (BDA), the High-Definition Multimedia Interface (HDMI) Forum, and the Ultra-High Definition (UHD) Alliance, have decided to adopt a delivery format based on HEVC Main 10, commonly referred to as “HDR10”, for the compression and delivery of HDR and WCG content. The Consumer Electronics Association (CEA) defined the term “HDR10 Media Profile” in an August 27, 2015 press release. An example of an HDR10 processing chain is presented in the following figure:


In the above example, the input is assumed to be a 4:4:4 linear light RGB signal (ii) with BT.2020 primaries. This signal is likely to have been graded by a colorist or some other system so as to have a much higher dynamic range and color gamut than found in existing video content. However, SDR content may also have to be converted into this format, for example to ease transitions between SDR and HDR scenes. The encoding and decoding process is based on the HEVC Main 10 profile.

In the example, the encoder side includes:

  • The SMPTE ST 2084:2014 Inverse-EOTF (the inverse of the Electro-Optical Transfer Function, loosely referred to as the OETF), applied to linear light samples in the RGB BT.2020 domain. The EOTF and its inverse are commonly referred to as the Transfer Function (TF).
  • Color conversion to the Non-Constant Luminance (NCL) YCbCr BT.2020 format.
  • Chroma down-sampling to 4:2:0. 
  • Quantization to a 10 bit integer representation. 
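The color conversion and quantization steps above are simple per-pixel arithmetic. The following sketch assumes the R'G'B' inputs have already been passed through the ST 2084 inverse EOTF and normalized to [0, 1], uses the BT.2020 non-constant-luminance coefficients (Kr = 0.2627, Kb = 0.0593), and assumes 10-bit narrow ("video") range output, which is the usual choice but is not stated in this post. Chroma down-sampling is omitted, and the function names are illustrative.

```python
import math

# BT.2020 non-constant-luminance luma coefficients
KR, KB = 0.2627, 0.0593
KG = 1.0 - KR - KB  # 0.6780

def rgb_to_ycbcr_ncl(r: float, g: float, b: float):
    """Non-linear R'G'B' in [0, 1] -> (Y', Cb, Cr); Y' in [0, 1], Cb/Cr in [-0.5, 0.5]."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))  # (B' - Y') / 1.8814
    cr = (r - y) / (2.0 * (1.0 - KR))  # (R' - Y') / 1.4746
    return y, cb, cr

def _round_clip(v: float) -> int:
    """Round half up and clip to the 10-bit code range."""
    return max(0, min(1023, math.floor(v + 0.5)))

def quantize_10bit_narrow(y: float, cb: float, cr: float):
    """Narrow range: Y' maps to 64..940, Cb/Cr to 64..960 centred on 512."""
    return (
        _round_clip(876.0 * y + 64.0),
        _round_clip(896.0 * cb + 512.0),
        _round_clip(896.0 * cr + 512.0),
    )

# Example: a fully saturated red in the PQ domain
print(quantize_10bit_narrow(*rgb_to_ycbcr_ncl(1.0, 0.0, 0.0)))  # (294, 387, 960)
```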

Essentially, HDR10 is defined as the combination of the following container and coding characteristics:

  • Color container/primaries: BT.2020
  • Transfer function (OETF/EOTF): SMPTE ST 2084
  • Representation: Non Constant Luminance (NCL) YCbCr
  • Sampling: 4:2:0
  • Bit Depth: 10 bits
  • Metadata: SMPTE ST 2086 (mastering display color volume), MaxFALL, and MaxCLL, carried in HEVC Supplemental Enhancement Information (SEI) messages
  • Encoding using: HEVC Main 10 profile (iii)  
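In an HEVC bitstream, most of these characteristics are signaled through code points in the sequence parameter set and its VUI. The sketch below collects them in a Python dataclass and reports any field that differs from the HDR10 combination. Field names loosely follow the HEVC specification (bit depths are stored directly rather than as the spec's `*_minus8` syntax elements), and the checker is a hypothetical helper, not part of any tool.

```python
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class StreamSignaling:
    """Subset of HEVC SPS/VUI values relevant to HDR10."""
    general_profile_idc: int
    chroma_format_idc: int
    bit_depth_luma: int
    bit_depth_chroma: int
    colour_primaries: int
    transfer_characteristics: int
    matrix_coeffs: int

HDR10 = StreamSignaling(
    general_profile_idc=2,        # Main 10 profile
    chroma_format_idc=1,          # 4:2:0 sampling
    bit_depth_luma=10,
    bit_depth_chroma=10,
    colour_primaries=9,           # BT.2020 primaries
    transfer_characteristics=16,  # SMPTE ST 2084 (PQ)
    matrix_coeffs=9,              # BT.2020 non-constant luminance YCbCr
)

def hdr10_mismatches(s: StreamSignaling) -> list:
    """Names of the fields in `s` that differ from the HDR10 combination."""
    return [f.name for f in fields(StreamSignaling)
            if getattr(s, f.name) != getattr(HDR10, f.name)]

hlg_stream = replace(HDR10, transfer_characteristics=18)  # 18 = ARIB STD-B67 (HLG)
print(hdr10_mismatches(hlg_stream))  # ['transfer_characteristics']
```

The metadata items (ST 2086, MaxCLL, MaxFALL) travel in SEI messages rather than in these fields, so this check does not cover them.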

The HEVC specification supports all of these features, as well as SEI metadata that can describe the mastering display and the brightness limits of the content.
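MaxCLL and MaxFALL are simple statistics over the linear-light content. The sketch below follows the usual definition from CTA-861.3: each pixel's light level is the maximum of its linear R, G and B values in cd/m², MaxCLL is the largest such level in the whole sequence, and MaxFALL is the largest per-frame average of those levels. It is simplified: frames are plain Python lists of (R, G, B) tuples, and details such as rounding for signaling are left out.

```python
from typing import Iterable, Sequence, Tuple

Pixel = Tuple[float, float, float]  # linear-light R, G, B in cd/m²

def content_light_levels(frames: Iterable[Sequence[Pixel]]) -> Tuple[float, float]:
    """Return (MaxCLL, MaxFALL) in cd/m² for a sequence of frames."""
    max_cll = 0.0
    max_fall = 0.0
    for frame in frames:
        if not frame:
            continue
        # Per-pixel light level: the largest of the three linear components
        levels = [max(pixel) for pixel in frame]
        max_cll = max(max_cll, max(levels))
        # Frame-average light level for this frame
        max_fall = max(max_fall, sum(levels) / len(levels))
    return max_cll, max_fall

# Two tiny two-pixel "frames"
frames = [
    [(100.0, 50.0, 20.0), (10.0, 10.0, 10.0)],
    [(800.0, 900.0, 700.0), (0.0, 0.0, 0.0)],
]
print(content_light_levels(frames))  # (900.0, 450.0)
```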

 

Electro-Optical Transfer Function (EOTF) and Inverse-EOTF (OETF) examples: SMPTE ST 2084:2014 and Hybrid Log-Gamma (HLG)

In the previous figure, the linear light signal was first converted to a non-linear representation using the ST 2084 (or PQ, Perceptual Quantizer) inverse transfer function. The main purpose of this conversion is to map linear optical data to an electronic/digital representation that best exploits the characteristics of the human visual system. This step is necessary to represent a rather wide dynamic range of brightness with little loss of information using a fixed-precision representation (e.g. 10 or 12 bits), as required by current digital signal processing and compression systems.
ST 2084 is a transfer function recently standardized by SMPTE that enables a relatively compact representation of HDR content using such limited bit depths. Unlike traditional power-law gamma curves, e.g. BT.709 or BT.1886, which were designed to cover brightness values of up to 100 cd/m², this transfer function covers a much wider range of brightness values. Specifically, ST 2084 is able to support brightness values in the range of 0.001 to 10,000 cd/m², as shown in the following figure. Even though existing consumer displays can only support much more limited dynamic ranges (e.g. <2,000 cd/m²), it was designed with future consumer equipment in mind, as well as professional applications, such as post-production and archiving, and display interfaces (e.g. HDMI).
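The ST 2084 curve and its inverse can be written directly from the constants published in the standard. The Python sketch below normalizes luminance to the 10,000 cd/m² peak; the function names are ours. The loop at the end assumes 10-bit narrow-range coding (codes 64 to 940) and prints the relative luminance step between adjacent code values at a few brightness levels, a rough way to see how PQ spreads its code values across the range.

```python
# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384        # 0.1593017578125
M2 = 2523 / 4096 * 128   # 78.84375
C1 = 3424 / 4096         # 0.8359375
C2 = 2413 / 4096 * 32    # 18.8515625
C3 = 2392 / 4096 * 32    # 18.6875
PEAK = 10000.0           # cd/m²

def pq_inverse_eotf(nits: float) -> float:
    """Linear luminance in cd/m² -> non-linear PQ signal in [0, 1]."""
    y = min(max(nits / PEAK, 0.0), 1.0) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2

def pq_eotf(signal: float) -> float:
    """Non-linear PQ signal in [0, 1] -> linear luminance in cd/m²."""
    e = min(max(signal, 0.0), 1.0) ** (1.0 / M2)
    return PEAK * (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1.0 / M1)

# Relative luminance step between adjacent 10-bit narrow-range codes
for nits in (0.1, 1.0, 100.0, 1000.0):
    code = round(64 + 876 * pq_inverse_eotf(nits))
    lo = pq_eotf((code - 64) / 876)
    hi = pq_eotf((code + 1 - 64) / 876)
    print(f"{nits:>7} cd/m² -> code {code}, step {100 * (hi - lo) / lo:.2f}%")
```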

SMPTE ST 2084 EOTF (courtesy of SMPTE)


In contrast, the Association of Radio Industries and Businesses (ARIB) adopted a hybrid log-gamma OETF, jointly developed by the BBC and NHK, for the UHDTV service that is to be deployed in Japan. This service will support the delivery of 8K video with both HDR and WCG. The transfer function is specified in ARIB STD-B67 and, apart from being able to represent higher dynamic range content, also claims a form of backward compatibility when the content is decoded and displayed on a BT.1886-capable system. On such a system the content, although not perfect, would still look reasonable to the casual viewer, without the need for any additional processing. A major difference between this transfer function and ST 2084 is that it is considered a scene-referred transfer function, like BT.709, which allows a system to invert it based on the target display's capabilities. ST 2084, on the other hand, is a display-referred transfer function and requires the exact inverse process to be applied so that the display matches the exact intent of the grading display. Another difference is that this transfer function covers a considerably narrower dynamic range than ST 2084 (<2,000 cd/m²). Nevertheless, there is considerable interest in using it for several live production applications, especially in TV broadcasting.
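For comparison with ST 2084, the hybrid log-gamma OETF is a square-root segment for darker scene light and a logarithmic segment for highlights. The sketch below uses the normalized form (scene light E in [0, 1]) with a = 0.17883277, b = 1 - 4a and c = 0.5 - a·ln(4a), as later published in ITU-R BT.2100; ARIB STD-B67 itself expresses the same curve on a differently scaled input. The function names are ours.

```python
import math

# Hybrid log-gamma constants (normalized form, scene light E in [0, 1])
A = 0.17883277
B = 1.0 - 4.0 * A                # 0.28466892
C = 0.5 - A * math.log(4.0 * A)  # ~0.55991073

def hlg_oetf(e: float) -> float:
    """Normalized linear scene light E -> non-linear signal E' in [0, 1]."""
    e = min(max(e, 0.0), 1.0)
    if e <= 1.0 / 12.0:
        return math.sqrt(3.0 * e)          # square-root segment (darker values)
    return A * math.log(12.0 * e - B) + C  # logarithmic segment (highlights)

def hlg_inverse_oetf(ep: float) -> float:
    """Non-linear signal E' in [0, 1] -> normalized linear scene light E."""
    ep = min(max(ep, 0.0), 1.0)
    if ep <= 0.5:
        return ep * ep / 3.0
    return (math.exp((ep - C) / A) + B) / 12.0

for e in (0.01, 1.0 / 12.0, 0.25, 0.5, 1.0):
    print(f"E = {e:.4f} -> E' = {hlg_oetf(e):.3f}")
```

The two segments meet at E = 1/12, where the signal is exactly 0.5, and the constants are chosen so that E = 1 maps to a signal of 1.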

Hybrid log-gamma and SDR OETFs (courtesy of BBC/NHK)


Backward compatibility with SDR

As there are a large number of displays and TV sets that are not capable of displaying HDR content, it is important to have some backward compatibility that allows an HDR signal to be displayed on existing SDR systems. There are different ways to provide backward compatibility with traditional SDR systems.

Several schemes are based on multi-layer, scalable solutions that may provide color gamut and brightness scalability and, in some systems, bit-depth scalability. These are to some extent similar to the color gamut and bit-depth scalability features supported in the scalable extension of HEVC (SHVC), in whose development NGCodec was also very active. However, similar scalable solutions have been tried in the past for other purposes (such as resolution or visual quality scalability) but never became popular due to a number of limitations, including decoder complexity, limited compression efficiency gains, the difficulty of creating a complete optimized encoder, and restricted flexibility in discontinuing legacy services. Dolby also employs a scalable solution, with the main difference that the enhancement layer is primarily a residual signal and thus may require a lower bit-depth representation than other solutions. There are also major differences in the prediction process between the base and enhancement layer signals.

A few other systems, such as those proposed by Philips and Technicolor, try to avoid some of the above issues by constraining how the SDR signal is generated, so that the HDR signal can be derived from it using simple metadata instead of a complex enhancement layer. The SDR signal would still be encoded using existing codecs, such as HEVC or AVC. A legacy decoder can then decode and display the SDR signal while ignoring the metadata. However, that SDR signal may look different from an SDR signal explicitly graded for such target displays. It is not yet clear whether this limitation is acceptable to content owners/creators.

Backward compatibility may also be achieved using an HDR-capable decoder by performing a tone mapping process on the decoded data, based on the capabilities of the display. The tone mapping process could potentially be enhanced or directed by metadata embedded in the bitstream. Such metadata can help the result more closely match the intent of the content owners/creators, although, again, with some limitations. It should be noted that this method could be used to handle a wide range of display capabilities, not just SDR displays. Given its importance, SMPTE is currently developing the ST 2094 suite of standards, which specifies the semantics and representation of the content-dependent metadata needed for the colour volume transformation of high dynamic range and wide colour gamut imagery to smaller colour volumes. This currently includes technologies from Dolby, Technicolor, Philips, and Samsung.
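As a minimal sketch of such display-side tone mapping, the Python below applies a simple global curve (the "extended Reinhard" operator, chosen here only for illustration and not one of the ST 2094 methods) to luminance decoded from the HDR signal, then encodes the result for an SDR display with the BT.1886 inverse EOTF, assuming a zero black level. The `source_peak` parameter stands in for a value that could come from metadata, such as the mastering display peak or MaxCLL; all names and defaults are hypothetical.

```python
def tone_map_nits(l_in: float, source_peak: float = 1000.0,
                  target_peak: float = 100.0) -> float:
    """Hypothetical global tone curve (extended Reinhard) from HDR luminance
    to an SDR display; source_peak maps exactly to target_peak."""
    x = max(l_in, 0.0) / target_peak   # luminance relative to the display peak
    w = source_peak / target_peak      # brightest source value, same units
    y = x * (1.0 + x / (w * w)) / (1.0 + x)
    return min(y, 1.0) * target_peak   # anything above source_peak is clipped

def bt1886_inverse(l_out: float, display_peak: float = 100.0) -> float:
    """Display luminance -> SDR signal in [0, 1], BT.1886 with zero black level."""
    return (min(max(l_out, 0.0), display_peak) / display_peak) ** (1.0 / 2.4)

# HDR luminance (e.g. from the ST 2084 EOTF) -> SDR signal
for nits in (1.0, 10.0, 100.0, 500.0, 1000.0):
    sdr_nits = tone_map_nits(nits)
    print(f"{nits:7.1f} -> {sdr_nits:6.2f} cd/m², signal {bt1886_inverse(sdr_nits):.3f}")
```

The curve is monotonic and maps `source_peak` exactly to the SDR peak, but it also darkens mid-tones noticeably (100 cd/m² maps to about 50 cd/m²), which is the kind of trade-off that content-dependent metadata is meant to steer.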

Finally, as mentioned earlier, the Hybrid Log-Gamma TF may also allow some form of legacy support. This is the approach proposed by NHK and the BBC and adopted in the ARIB specification for delivering UHDTV content in Japan.


MPEG and VCEG

MPEG currently has an activity studying how HEVC compression efficiency can be improved for HDR content. HDR and WCG were already considered very important during the development of HEVC, which is reflected in the prominence of the Main 10 profile, compared with previous specifications that focused on 8 bits and did not consider HDR or WCG.
Based on this activity, MPEG has concluded that there is no need to modify the decoding process or the bitstream specification, and that the Main 10 profile provides the compression efficiency and fidelity needed for the delivery of HDR content.
MPEG and ITU-T (VCEG) are also starting to work on new tools that will be used to create a future video compression specification, and efficient support for HDR content is a very important element of this work as well.

Pre-processing/Encoding system changes to accommodate HDR/WCG

Using HDR10 as an example, it is possible to identify the changes required for an SDR-capable system to support the encoding of HDR content.

Pre-Processing/Content Ingestion:

  • Implementation of an Inverse Transfer Function: this is normally a simple process and can easily be performed with the help of a Lookup Table.
  • Color conversion to Non-Constant Luminance YCbCr: this is also a simple arithmetic process.
  • Chroma down-sampling to 4:2:0: this is a more sophisticated filtering process that tries to preserve as much of the chrominance information in the original RGB content as possible. It can also be combined with a noise reduction mechanism to reduce noise in the incoming video before it is encoded.
  • Quantization to a 10 bit integer representation: this is also a simple arithmetic process.
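Of these steps, chroma down-sampling involves the most design freedom. The sketch below is a deliberately simple version that works on one chroma plane stored as a list of rows: a 3-tap [1, 2, 1]/4 horizontal filter centred on the even columns (horizontally co-sited chroma), followed by averaging each pair of rows (vertically interstitial chroma), with edge samples repeated. This siting and these filters are assumptions for illustration; production converters typically use longer filters and must match the chroma sample location signaled in the bitstream.

```python
from typing import List

Plane = List[List[float]]

def downsample_420(plane: Plane) -> Plane:
    """Halve a chroma plane horizontally and vertically (4:4:4 -> 4:2:0).

    Horizontal: [1, 2, 1] / 4 filter centred on even columns (co-sited).
    Vertical: average of each pair of rows (sample halfway between rows).
    Edge samples are repeated at the borders.
    """
    h, w = len(plane), len(plane[0])
    if h % 2 or w % 2:
        raise ValueError("plane dimensions must be even")

    def px(row: List[float], x: int) -> float:
        return row[min(max(x, 0), w - 1)]  # clamp the index at the edges

    horizontal = [
        [(px(row, x - 1) + 2.0 * px(row, x) + px(row, x + 1)) / 4.0
         for x in range(0, w, 2)]
        for row in plane
    ]
    return [
        [(a + b) / 2.0 for a, b in zip(horizontal[y], horizontal[y + 1])]
        for y in range(0, h, 2)
    ]

cb = [[0.0, 1.0, 0.0, 1.0],
      [0.0, 1.0, 0.0, 1.0]]
print(downsample_420(cb))  # [[0.25, 0.5]]
```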

Encoding using an HEVC encoder: 

  • While the bitstream syntax and essentially the entire decoding process do not need to be modified, the actual characteristics of an HDR/WCG sequence may be quite different from those of its SDR counterpart. Apart from a different, wider distribution of code values, for example, the use of the ST 2084 transfer function also tends to create different visual quality tradeoffs. This implies that encoding optimization techniques suitable for HDR/WCG material may have to be substantially different from techniques commonly employed for SDR material. Such modifications would likely be necessary to allow HEVC encoders to compress this content as efficiently as possible. Enhancements may be needed, for example, in the motion estimation engine, the rate/quality allocation and rate control schemes, the pre-analysis blocks that commonly collect information about the content to assist the compression process, and other blocks such as mode decision and quantization.
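As one illustration of the kind of rate/quality allocation change mentioned above, the sketch below assigns each block a QP offset derived from its average 10-bit luma, giving brighter blocks a lower QP (finer quantization). Luma-dependent QP adaptation is one of the techniques that have been discussed for PQ-coded content, but the pivot, step size and clamping range used here are hypothetical placeholders, not values taken from any standard or reference encoder.

```python
from statistics import fmean
from typing import Sequence

QP_MIN, QP_MAX = -12, 51  # HEVC luma QP range for 10-bit video

def luma_qp_offset(avg_luma: float, pivot: float = 512.0, step: float = 64.0,
                   lo: int = -6, hi: int = 3) -> int:
    """Hypothetical rule: one QP step per `step` code values away from `pivot`,
    negative for brighter blocks and positive for darker ones, clamped to [lo, hi]."""
    offset = round((pivot - avg_luma) / step)
    return max(lo, min(hi, offset))

def block_qp(base_qp: int, luma_block: Sequence[Sequence[int]]) -> int:
    """QP for one block of 10-bit luma samples, given the frame's base QP."""
    avg = fmean(v for row in luma_block for v in row)
    return max(QP_MIN, min(QP_MAX, base_qp + luma_qp_offset(avg)))

dark = [[100] * 8 for _ in range(8)]
bright = [[832] * 8 for _ in range(8)]
print(block_qp(32, dark), block_qp(32, bright))  # 35 27
```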


Conclusion: Future Proofing with Programmable FPGAs


HDR can be combined with other elements such as WCG, higher pixel counts, and higher frame rates to deliver a better visual experience than traditional HDTV systems.
The characteristics of HDR/WCG systems are still evolving and will continue to evolve in the near future.
The Main 10 profile of HEVC can already provide the necessary compression technology to encode HDR/WCG content without any modifications to the decoding process or to the bitstream specification.
Nevertheless, additional or different encoding optimizations may be needed in an HEVC encoder to better support HDR/WCG signals, given their different characteristics compared with SDR signals.
This presents a clear advantage for programmable platforms, such as Field Programmable Gate Arrays (FPGAs), as they can be updated and enhanced to optimize for the characteristics of different video content and improve their performance over time, unlike non-programmable, non-upgradable solutions.

“This blog would not have been possible without the efforts, help, and feedback of several people in the standardization community, in particular Alexis Tourapis, Joel Sole and Chad Fogg, among others.”

Footnotes

  i) This unit is also commonly referred to as a nit. However, "nit" is a non-SI unit name and is considered by many to be obsolete.
  ii) Most available content would likely be in 16 bit RGB ST 2084 format (with 12 or 14 bits of effective data).
  iii) AVC/H.264 High 10 could also be used.
