PNG (Portable Network Graphics) Specification, Version 1.1

Previous page
Next page
Table of contents

9. Recommendations for Encoders

This chapter gives some recommendations for encoder behavior. The only absolute requirement on a PNG encoder is that it produce files that conform to the format specified in the preceding chapters. However, best results will usually be achieved by following these recommendations.

9.1. Sample depth scaling

When encoding input samples that have a sample depth that cannot be directly represented in PNG, the encoder must scale the samples up to a sample depth that is allowed by PNG. The most accurate scaling method is the linear equation

   output = ROUND(input * MAXOUTSAMPLE / MAXINSAMPLE)

where the input samples range from 0 to MAXINSAMPLE and the outputs range from 0 to MAXOUTSAMPLE (which is (2^sampledepth)-1).

A close approximation to the linear scaling method can be achieved by "left bit replication", which is shifting the valid bits to begin in the most significant bit and repeating the most significant bits into the open bits. This method is often faster to compute than linear scaling. As an example, assume that 5-bit samples are being scaled up to 8 bits. If the source sample value is 27 (in the range from 0-31), then the original bits are:

   4 3 2 1 0
   ---------
   1 1 0 1 1

Left bit replication gives a value of 222:

   7 6 5 4 3  2 1 0
   ----------------
   1 1 0 1 1  1 1 0
   |=======|  |===|
       |      Leftmost Bits Repeated to Fill Open Bits
       |
   Original Bits

which matches the value computed by the linear equation. Left bit replication usually gives the same value as linear scaling and is never off by more than one.

A distinctly less accurate approximation is obtained by simply left-shifting the input value and filling the low order bits with zeroes. This scheme cannot reproduce white exactly, since it does not generate an all-ones maximum value; the net effect is to darken the image slightly. This method is not recommended in general, but it does have the effect of improving compression, particularly when dealing with greater-than-eight-bit sample depths. Since the relative error introduced by zero-fill scaling is small at high sample depths, some encoders may choose to use it. Zero-fill must not be used for alpha channel data, however, since many decoders will special-case alpha values of all zeroes and all ones. It is important to represent both those values exactly in the scaled data.

When the encoder writes an sBIT chunk, it is required to do the scaling in such a way that the high-order bits of the stored samples match the original data. That is, if the sBIT chunk specifies a sample depth of S, the high-order S bits of the stored data must agree with the original S-bit data values. This allows decoders to recover the original data by shifting right. The added low-order bits are not constrained. Note that all the above scaling methods meet this restriction.

When scaling up source data, it is recommended that the low-order bits be filled consistently for all samples; that is, the same source value should generate the same sample value at any pixel position. This improves compression by reducing the number of distinct sample values. However, this is not a requirement, and some encoders may choose not to follow it. For example, an encoder might instead dither the low-order bits, improving displayed image quality at the price of increasing file size.

In some applications the original source data may have a range that is not a power of 2. The linear scaling equation still works for this case, although the shifting methods do not. It is recommended that an sBIT chunk not be written for such images, since sBIT suggests that the original data range was exactly 0..(2^S)-1.

9.2. Encoder gamma handling

See Gamma Tutorial if you aren't already familiar with gamma issues.

Encoders capable of full-fledged color management [ICC] will perform more sophisticated calculations than those described here, and they may choose to use the iCCP chunk. Encoders that know that their image samples conform to the sRGB specification [sRGB] should use the sRGB chunk and not perform gamma handling. Otherwise, this section applies.

The encoder has two gamma-related decisions to make. First, it must decide how to transform whatever image samples it has into the image samples that will go into the PNG file. Second, it must decide what value to write into the gAMA chunk.

The rule for the second decision is simply to write whatever value will cause a decoder to do what you want. See Recommendations for Decoders: Decoder gamma handling.

The first decision depends on the nature of the image samples and their precision. If the samples represent light intensity in floating-point or high-precision integer form (perhaps from a computer image renderer), then the encoder may perform "gamma encoding" (applying a power function with exponent less than 1) before quantizing the data to integer values for output to the file. This results in fewer banding artifacts at a given sample depth, or allows smaller samples while retaining the same visual quality. An intensity level expressed as a floating-point value in the range 0 to 1 can be converted to a file image sample by

   sample = intensity ^ encoding_exponent
   integer_sample = ROUND(sample * (2^bitdepth - 1))

If the intensity in the equation is the desired display output intensity, then the encoding exponent is the gamma value to be written to the file, by the definition of gAMA (See the gAMA chunk specification). But if the intensity available to the encoder is the original scene intensity, another transformation may be needed. Sometimes the displayed image should have higher contrast than the original image; in other words, the end-to-end transfer function from original scene to display output should have an exponent greater than 1. In this case,

   gamma = encoding_exponent / end_to_end_exponent

If you don't know whether the conditions under which the original image was captured (or calculated) warrant such a contrast change, you may assume that display intensities are proportional to original scene intensities; in other words, the end-to-end exponent is 1, so gamma and the encoding exponent are equal.

If the image is being written to a file only, the encoder is free to choose the encoding exponent. Choosing a value that causes the gamma value in the gAMA chunk to be 1/2.2 is often a reasonable choice because it minimizes the work for a decoder displaying on a typical video monitor.

Some image renderers may simultaneously write the image to a PNG file and display it on-screen. The displayed pixels should be appropriate for the display system, so that the user sees a proper representation of the intended scene.

If the renderer wants to write the displayed sample values to the PNG file, avoiding a separate gamma encoding step for file output, then the renderer should approximate the transfer function of the display system by a power function, and write the reciprocal of the exponent into the gAMA chunk. This will allow a PNG decoder to reproduce what the file's originator saw on screen during rendering.

However, it is equally reasonable for a renderer to compute displayed pixels appropriate for the display device, and to perform separate gamma encoding for file storage, arranging to have a value in the gAMA chunk more appropriate to the future use of the image.

Computer graphics renderers often do not perform gamma encoding, instead making sample values directly proportional to scene light intensity. If the PNG encoder receives intensity samples that have already been quantized into integers, there is no point in doing gamma encoding on them; that would just result in further loss of information. The encoder should just write the sample values to the PNG file. This does not imply that the gAMA chunk should contain a gamma value of 1.0, because the desired end-to-end transfer function from scene intensity to display output intensity is not necessarily linear. The desired gamma value is probably not far from 1.0, however. It may depend on whether the scene being rendered is a daylight scene or an indoor scene, etc.

When the sample values come directly from a piece of hardware, the correct gamma value can in principle be inferred from the transfer function of the hardware and the lighting conditions of the scene. In the case of video digitizers ("frame grabbers"), the samples are probably in the sRGB color space, because the sRGB specification was designed to be compatible with video standards. Image scanners are less predictable. Their output samples may be proportional to the input light intensity because CCD (charge coupled device) sensors themselves are linear, or the scanner hardware may have already applied a power function designed to compensate for dot gain in subsequent printing (an exponent of about 0.57), or the scanner may have corrected the samples for display on a monitor. The device documentation might describe the transformation performed, or might describe the target display or printer for the image data (which might be configurable). You can also scan a calibrated target and use calibration software to determine the behavior of the device. Remember that gamma relates file samples to desired display output, not to scanner input.

File format converters generally should not attempt to convert supplied images to a different gamma. Store the data in the PNG file without conversion, and deduce the gamma value from information in the source file if possible. Gamma alteration at file conversion time causes re-quantization of the set of intensity levels that are represented, introducing further roundoff error with little benefit. It's almost always better to just copy the sample values intact from the input to the output file.

If the source file format describes the gamma characteristic of the image, a file format converter is strongly encouraged to write a PNG gAMA chunk. Note that some file formats specify the exponent of the function mapping file samples to display output rather than the other direction. If the source file's gamma value is greater than 1.0, it is probably a display system exponent, and you should use its reciprocal for the PNG gamma. If the source file format records the relationship between image samples and something other than display output, then deducing the PNG gamma value will be more complex.

Regardless of how an image was originally created, if an encoder or file format converter knows that the image has been displayed satisfactorily using a display system whose transfer function can be approximated by a power function with exponent display_exponent, then the image can be marked as having the gamma value:

   gamma = 1 / display_exponent

It's better to write a gAMA chunk with an approximately right value than to omit the chunk and force PNG decoders to guess at an appropriate gamma.

On the other hand, if the encoder has no way to infer the gamma value, then it is better to omit the gAMA chunk entirely. If the image gamma has to be guessed at, leave it to the decoder to do the guessing.

Gamma does not apply to alpha samples; alpha is always represented linearly.

See also Recommendations for Decoders: Decoder gamma handling.

9.3. Encoder color handling

See Color Tutorial if you aren't already familiar with color issues.

Encoders capable of full-fledged color management [ICC] will perform more sophisticated calculations than those described here, and they may choose to use the iCCP chunk. Encoders that know that their image samples conform to the sRGB specification [sRGB] are strongly encouraged to use the sRGB chunk. Otherwise, this section applies.

If it is possible for the encoder to determine the chromaticities of the source display primaries, or to make a strong guess based on the origin of the image or the hardware running it, then the encoder is strongly encouraged to output the cHRM chunk. If it does so, the gAMA chunk should also be written; decoders can do little with cHRM if gAMA is missing.

Video created with recent video equipment probably uses the CCIR 709 primaries and D65 white point [ITU-BT709], which are:

            R           G           B         White
   x      0.640       0.300       0.150       0.3127
   y      0.330       0.600       0.060       0.3290

An older but still very popular video standard is SMPTE-C [SMPTE-170M]:

            R           G           B         White
   x      0.630       0.310       0.155       0.3127
   y      0.340       0.595       0.070       0.3290

The original NTSC color primaries have not been used in decades. Although you may still find the NTSC numbers listed in standards documents, you won't find any images that actually use them.

Scanners that produce PNG files as output should insert the filter chromaticities into a cHRM chunk.

In the case of hand-drawn or digitally edited images, you have to determine what monitor they were viewed on when being produced. Many image editing programs allow you to specify what type of monitor you are using. This is often because they are working in some device-independent space internally. Such programs have enough information to write valid cHRM and gAMA chunks, and should do so automatically.

If the encoder is compiled as a portion of a computer image renderer that performs full-spectral rendering, the monitor values that were used to convert from the internal device-independent color space to RGB should be written into the cHRM chunk. Any colors that are outside the gamut of the chosen RGB device should be clipped or otherwise constrained to be within the gamut; PNG does not store out-of-gamut colors.

If the computer image renderer performs calculations directly in device-dependent RGB space, a cHRM chunk should not be written unless the scene description and rendering parameters have been adjusted to look good on a particular monitor. In that case, the data for that monitor (if known) should be used to construct a cHRM chunk.

There are often cases where an image's exact origins are unknown, particularly if it began life in some other format. A few image formats store calibration information, which can be used to fill in the cHRM chunk. For example, all PhotoCD images use the CCIR 709 primaries and D65 whitepoint, so these values can be written into the cHRM chunk when converting a PhotoCD file. PhotoCD also uses the SMPTE-170M transfer function. (PhotoCD can store colors outside the RGB gamut, so the image data will require gamut mapping before writing to PNG format.) TIFF 6.0 files can optionally store calibration information, which if present should be used to construct the cHRM chunk. GIF and most other formats do not store any calibration information.

It is not recommended that file format converters attempt to convert supplied images to a different RGB color space. Store the data in the PNG file without conversion, and record the source primary chromaticities if they are known. Color space transformation at file conversion time is a bad idea because of gamut mismatches and rounding errors. As with gamma conversions, it's better to store the data losslessly and incur at most one conversion when the image is finally displayed.

See also Recommendations for Decoders: Decoder color handling.

9.4. Alpha channel creation

The alpha channel can be regarded either as a mask that temporarily hides transparent parts of the image, or as a means for constructing a non-rectangular image. In the first case, the color values of fully transparent pixels should be preserved for future use. In the second case, the transparent pixels carry no useful data and are simply there to fill out the rectangular image area required by PNG. In this case, fully transparent pixels should all be assigned the same color value for best compression.

Image authors should keep in mind the possibility that a decoder will ignore transparency control. Hence, the colors assigned to transparent pixels should be reasonable background colors whenever feasible.

For applications that do not require a full alpha channel, or cannot afford the price in compression efficiency, the tRNS transparency chunk is also available.

If the image has a known background color, this color should be written in the bKGD chunk. Even decoders that ignore transparency may use the bKGD color to fill unused screen area.

If the original image has premultiplied (also called "associated") alpha data, convert it to PNG's non-premultiplied format by dividing each sample value by the corresponding alpha value, then multiplying by the maximum value for the image bit depth, and rounding to the nearest integer. In valid premultiplied data, the sample values never exceed their corresponding alpha values, so the result of the division should always be in the range 0 to 1. If the alpha value is zero, output black (zeroes).

9.5. Suggested palettes

Suggested palettes can appear as sPLT chunks in any PNG file, or as a PLTE chunk in truecolor PNG files. In either case, the suggested palette is not an essential part of the image data, but it may be used to present the image on indexed-color display hardware. Suggested palettes are of no interest to viewers running on truecolor hardware.

When sPLT is used to provide a suggested palette, it is recommended that the encoder use the frequency fields to indicate the relative importance of the palette entries, rather than leave them all zero (meaning undefined). The frequency values are most easily computed as "nearest neighbor" counts, that is, the approximate usage of each RGBA palette entry if no dithering is applied. (These counts will often be available for free as a consequence of developing the suggested palette.) Because the suggested palette includes transparency information, it should be computed for the uncomposited image.

Even for indexed-color images, sPLT can be used to define alternative reduced palettes for viewers that are unable to display all the colors present in the PLTE chunk.

An older method for including a suggested palette in a truecolor PNG file uses the PLTE chunk. If this method is used, the histogram (frequencies) should appear in a separate hIST chunk. Also, PLTE does not include transparency information, so for images of color type 6 (truecolor with alpha channel), it is recommended that a bKGD chunk appear and that the palette and histogram be computed with reference to the image as it would appear after compositing against the specified background color. This definition is necessary to ensure that useful palette entries are generated for pixels having fractional alpha values. The resulting palette will probably be useful only to viewers that present the image against the same background color. It is recommended that PNG editors delete or recompute the palette if they alter or remove the bKGD chunk in an image of color type 6.

For images of color type 2 (truecolor without alpha channel), it is recommended that PLTE and hIST be computed with reference to the RGB data only, ignoring any transparent-color specification. If the file uses transparency (has a tRNS chunk), viewers can easily adapt the resulting palette for use with their intended background color. They need only replace the palette entry closest to the tRNS color with their background color (which may or may not match the file's bKGD color, if any).

If PLTE appears without bKGD in an image of color type 6, the circumstances under which the palette was computed are unspecified.

For providing suggested palettes, sPLT is more flexible than PLTE in the following ways:

With sPLT, there can be multiple suggested palettes. A decoder may choose an appropriate palette based on name or number of entries.
In an RGBA (color type 6) PNG, PLTE represents a palette already composited against the bKGD color, so it is useful only for display against that background color. The sPLT chunk provides an uncomposited palette, which is useful for display against backgrounds of the decoder's choice.
Since sPLT is a noncritical chunk, a PNG editor can add or modify suggested palettes without being forced to discard unknown unsafe-to-copy chunks.
Whereas sPLT is allowed in PNG files of color types 0, 3, and 4 (grayscale and indexed), PLTE cannot be used to provide reduced palettes in these cases.
More than 256 entries can appear in sPLT.

An encoder that uses sPLT may choose to write a PLTE/hIST suggested palette as well, for backward compatibility with decoders that do not recognize sPLT.

9.6. Filter selection

For images of color type 3 (indexed color), filter type 0 (None) is usually the most effective. Note that color images with 256 or fewer colors should almost always be stored in indexed color format; truecolor format is likely to be much larger.

Filter type 0 is also recommended for images of bit depths less than 8. For low-bit-depth grayscale images, it may be a net win to expand the image to 8-bit representation and apply filtering, but this is rare.

For truecolor and grayscale images, any of the five filters may prove the most effective. If an encoder uses a fixed filter, the Paeth filter is most likely to be the best.

For best compression of truecolor and grayscale images, we recommend an adaptive filtering approach in which a filter is chosen for each scanline. The following simple heuristic has performed well in early tests: compute the output scanline using all five filters, and select the filter that gives the smallest sum of absolute values of outputs. (Consider the output bytes as signed differences for this test.) This method usually outperforms any single fixed filter choice. However, it is likely that much better heuristics will be found as more experience is gained with PNG.

Filtering according to these recommendations is effective on interlaced as well as noninterlaced images.

9.7. Text chunk processing

A nonempty keyword must be provided for each text chunk. The generic keyword "Comment" can be used if no better description of the text is available. If a user-supplied keyword is used, be sure to check that it meets the restrictions on keywords.

PNG text strings are expected to use the Latin-1 character set. Encoders should avoid storing characters that are not defined in Latin-1, and should provide character code remapping if the local system's character set is not Latin-1.

Encoders should discourage the creation of single lines of text longer than 79 characters, in order to facilitate easy reading.

It is recommended that text items less than 1K (1024 bytes) in size should be output using uncompressed tEXt chunks. In particular, it is recommended that the basic title and author keywords should always be output using uncompressed tEXt chunks. Lengthy disclaimers, on the other hand, are ideal candidates for zTXt.

Placing large tEXt and zTXt chunks after the image data (after IDAT) can speed up image display in some situations, since the decoder won't have to read over the text to get to the image data. But it is recommended that small text chunks, such as the image title, appear before IDAT.

9.8. Use of private chunks

Applications can use PNG private chunks to carry information that need not be understood by other applications. Such chunks must be given names with lowercase second letters, to ensure that they can never conflict with any future public chunk definition. Note, however, that there is no guarantee that some other application will not use the same private chunk name. If you use a private chunk type, it is prudent to store additional identifying information at the beginning of the chunk data.

Use an ancillary chunk type (lowercase first letter), not a critical chunk type, for all private chunks that store information that is not absolutely essential to view the image. Creation of private critical chunks is discouraged because they render PNG files unportable. Such chunks should not be used in publicly available software or files. If private critical chunks are essential for your application, it is recommended that one appear near the start of the file, so that a standard decoder need not read very far before discovering that it cannot handle the file.

If you want others outside your organization to understand a chunk type that you invent, contact the maintainers of the PNG specification to submit a proposed chunk name and definition for addition to the list of special-purpose public chunks (see Additional chunk types). Note that a proposed public chunk name (with uppercase second letter) must not be used in publicly available software or files until registration has been approved.

If an ancillary chunk contains textual information that might be of interest to a human user, you should not create a special chunk type for it. Instead use a tEXt chunk and define a suitable keyword. That way, the information will be available to users not using your software.

Keywords in tEXt chunks should be reasonably self-explanatory, since the idea is to let other users figure out what the chunk contains. If of general usefulness, new keywords can be registered with the maintainers of the PNG specification. But it is permissible to use keywords without registering them first.

9.9. Private type and method codes

This specification defines the meaning of only some of the possible values of some fields. For example, only compression method 0 and filter types 0 through 4 are defined. Numbers greater than 127 must be used when inventing experimental or private definitions of values for any of these fields. Numbers below 128 are reserved for possible future public extensions of this specification. Note that use of private type codes may render a file unreadable by standard decoders. Such codes are strongly discouraged except for experimental purposes, and should not appear in publicly available software or files.

Previous page
Next page
Table of contents