Adaptive Predictor for Lossless Image Compression

Václav Hlavác, Jaroslav Fojtí k
Center for Machine Perception
Faculty of Electrical Engineering, Czech Technical University
121 35 Prague 2, Karlovo námestí 13, Czech Republic
{hlavac,fojtik}@vision.felk.cvut.cz
keywords: image compression, lossless compression,
predictor-based compression

Abstract

A new method for lossless image compression of grey-level images is proposed. The image is treated as a set of stacked bit planes. The compressed version of the image is represented by encoded residuals of a non-linear local predictor spanning the current bit plane as well as a few neighbouring ones. Predictor configurations are grouped in pairs differing in one bit of the representative point only. The frequency of predictor configurations is obtained from the input image. The predictor adapts automatically to the image, it is able to estimate the influence of neighbouring cells and thus copes even with complicated structure or fine texture.

The residuals between the original and the predicted image are those that correspond to the less frequent predictor configurations. Efficiently coded residuals constitute the output image. To our knowledge, the performance of the proposed compression algorithm is comparable to the current state of the art. Especially good results were obtained for binary images, grey-level cartoons and man-made drawings.

1 Introduction

Data compression can be viewed as a process producing a model of the input data together with associated residuals. An improved data model yields better compression results.

Redundant information in generic raster images is due to high correlation between neighbouring cells¹. In other words, the local structure and similarity are the main source of redundancy in generic imagery. The proposed image compression method examines the neighbourhood of the current cell using an adaptive probe, creates a non-linear predictor for the current cell value, and stores residuals of the predictor in the resulting compressed image.

In the compression phase, an image is systematically traversed using an adaptive probe and, for the current cell, a non-linear predictor is created by examining the local neighbourhood. The estimated value is compared to the actual value of the cell. Where there is a discrepancy between the estimated and the actual value the residual (binary value one) is stored in the position of the current cell. If the predictor provides a good estimate, a precise match will be found for the majority of the cells and the resulting compressed image will be small. The description of the predictor and the positions of the residuals in the image are stored together in the output file. Three predictors with increasing complexity are described in this paper, i.e. (i) for binary images, (ii) for grey-level images, (iii) adaptive predictor for grey-level image.

The reduced data is optimally coded, and together with the predictor description, the compressed image is created.

The paper is organized as follows. Section lists the current most efficient lossless compression methods and these are later used for comparison with the proposed methods. Section explains the basic predictor as a set of probe pairs that differ in one bit only. Section generalizes the predictor for the grey-level image that is treated as stacked bit planes. Section further improves the predictor by allowing it to adapt to the actual image data. Section proposes an efficient way to encode the sparse matrix of predictor residuals. In Section we present experimental results. In the last section we draw conclusions.

A preliminary version of the paper was published in [].

2 Related works

There are several lossless compression methods used for 1D streams of data as text or executable programs. The ZIP or LZW are examples of commonly used packages. They are known and we do not describe them here. Let us begin with describing some methods similar to ours developed for 2D compression. Lossless compression techniques are usually divided into two generations. The methods in the first generation use quite simple algorithms. Let us just mention three representatives here.

The IBM Q80 coder (lossless JPEG) [] is obsolete and outperformed by the conceptually similar Portable Network Graphics (PNG).

PNG [] consists of two steps. In the first step, the difference image D_x (x,y) = f(x-1,y)-f(x,y) is computed and a matrix which contains only differences is created. In the second step, this matrix is encoded using the LZW method. Both parts may be connected by a pipe and these can run in parallel. The source code of the algorithm is freely available. The disadvantage is that the second step is not tuned to 2D image data. PNG is based on the assumption that the number of regions with limited 1st derivatives is smaller than the number of regions with higher 1st derivatives. The context of images is usually richer.

Paper [] discusses several simple compression methods PVRG-JPEG proposed by the Portable Video Research Group at Stanford. Each method was tried on the images and the best method was selected for actual compression.

The second generation lossless compression methods are usually divided into two subgroups:

Region based.

The image is divided into regions that have similar intensity or other properties. Regions are compressed separately and stored together with a description of the boundary of the region. The method Segment [] belongs to this group. The regions are grown from seeds. The resulting compression ratios are very good.

Data prediction based

methods try to create an internal data model. Residuals of the model are stored. A method which is performing similarly to ours that has similarly is called Callic []. It calculates several features from the pixels in a small neighbourhood. The pixels in the predictor are chosen adaptively using information from image directional derivatives.

The rounding transform from [] uses circular-shaped hierarchical estimators. The compression ratio does not rank the method among the best.

There is one more method which is relevant to ours. It treats the image as separate bit planes and is called the Logic coding method []. The method uses boolean logic for image compression.

3 Idea of image coding for binary image

The initial idea comes from Schlesinger [] who proposed a new representation scheme for binary images. Let us denote the method by Schl-2 . Number 2 comes from two degrees of freedom (DOF).

The Schl-2 method takes a binary image f as input. The support of the image is T={(x,y): 0 Ł x Ł M; 0 Ł y Ł N }, where M is the width of the image and N is the height. The image represents the mapping f(x,y) Ž{0,1}. Without loss of generality, we assume that the leftmost column and the bottom line of the image have value zero, i.e. f(0,y) = f(x,0) = 0.

The Schl-2 method traverses the binary image f by the probe consisting of 2×2 cells. Its representative cell (x,y) is placed in the upper right corner of the probe. The value of the actual (estimated) cell is p₄ = f(x,y). Let values of the probe placed in the particular position in the probe be denoted p₁ = f(x,y-1); p₂ = f(x-1,y-1); p₃ = f(x-1,y); see Fig. .

Schlesinger designed the predictor (estimator, probe) e in the following way. All the 16 possible combinations of cells within the 2 ×2 probe are considered.

Probes are arranged into two rows and eight columns in such a way that probes in the same column differ in the upper right bit only, see Fig. . A frequency analysis of the predictor configurations was performed on many different images with the aim of finding out which configuration in each pairs occurs more often. The more frequent configurations are highlighted in Fig. . The probes in column 5 will be discussed soon.

The idea of this compression method is to store residuals of less frequent probes, i.e. value one in the representative point of the probe. The arrangement of the probes allows us to reconstruct the image from a sparse matrix of stored residuals only.

Schlesinger tested the predictor on many images and the more frequent probe of each pair was found through extensive experiments. In the compression phase, only the residuals between predicted and real values are stored. This typically yields a much sparser image.

Actually, the probe configurations proposed by Schlesinger are suboptimal with respect to the relative frequencies of different configurations. Probes in Fig. , column 5 would have to be highlighted in opposite manner if the probe statistics had to be optimal for binary images tested by Schlesinger. A substantial advantage of the suboptimal configurations independent of the image content is that standard set operations (intersection, union, negation) and binary morphology operations can be performed on the compressed images.

4 Compression of binary images

Our compression method for binary images abbreviated as FH-2 takes a step forward compared to Schlesinger's one. The probe frequency analysis is performed for each image. The result is then stored in a frequency table consisting of two rows and eight columns. A value in the table represents the number of occurrences of the corresponding probe in the image. The maxima in columns of the frequency table point to optimal predictor configurations. These are marked.

One bit is needed to distinguish between two probes in one column. Thus one byte describes all eight pairs of probes. This information will be added to the compressed image to store the predictor description.

In the compression phase , the image is traversed in a bottom-up manner and from right to left. In each position of the image, a check is made to determine whether a more frequent or less frequent probe appeared. In the former case, the value 0 is written to the representative cell. Whereas, in the latter case, the value one is stored (corresponding to the residual of the predictor). The algorithm traverses the whole image and produces an image of residuals. This reduced image is typically much sparser than the input image due to the estimation abilities of the predictor.

The decompression phase has at hand the description of the predictor (one byte) and the image of residuals. The left-most column and the bottom most row are known to be zero-valued. The probe starts at the bottom left corner with the values p₁, p₂, and p₃ being known. These three values uniquely determine one of the eight columns of the predictor. The value of the stored residual distinguishes the correct probe in the pair. The decompressed value p₄ is set accordingly and overwrites the residual in the position p₄. The probe traverses the image row-wise from left to right and from bottom to top, until the original image is restored.

5 Compression of grey-level images

The grey-level image can be treated as stacked bit planes. The predictor described in the previous section was generalized to cope with the current bit plane and the closest bit plane upwards. The predictor context encompassed by the volumetric probe has three degrees of freedom (DOF). Therefore we abbreviate the proposed compression algorithm FH-3. The arrangement of cells in the probe is shown in Fig. . The processed bit plane is the lower one. Its representative point, i.e. the predicted cell, is denoted by p₈ and is highlighted. The idea of the compression remains the same as in the binary case.

The frequency analysis of probe configurations is performed when passing through the image for the first time. The process is analog to the binary image case described in Section 3. Here 2⁷ bits = 16 bytes are needed to store the frequency table of all possible configurations. The image of the processed bit-plane is reduced and residuals are stored in the same way as in the binary image case. This requires a second pass through the original image data. The result is a sparse matrix with a small number of residuals.

The original image can be uniquely reconstructed using the residuals and the description of the predictor stored in the compressed image. The compression process as described above starts from the top bit plane using the FH-2 algorithm (since there is no available data above the top plane). Then the FH-3 algorithm is applied to remaining bit planes.

More complicated probes spanning several bit planes can be also used, see Fig. . The rightmost probe is used for processing the top bit plane. The probe to the left is used for coding the bit plane below the top plane and so on.

6 Adaptive compression

The two previous predictors had a fixed number of probe cells (3 in the Schl-2 case and 7 in the FH-3 case) in a fixed arrangement. The desire is to exploit a more complex image context in the predictor. The support set of the predictor can vary in more dimensions according to the context of the image. Such a predictor can encode more complicated structures in the image, e.g. some fine texture.

The size of the predictor support is limited from a practical point of view by the size of the frequency table. The size of the frequency table grows exponentially and has 2^k entries, where k is the number of cells of the predictor. The predicted representative cell of the probe is not counted. For example, for 20 cells, the frequency table size is 2²⁰ = 1Mbit. This would be too much. Of course, not all predictor cells contain the same amount of information. The trick we propose is to throw away some cells from a relatively large probe adaptively. This is done according to the content of the particular image. In the following, let us denote such an adaptive predictor by FH-Adapt in the sequel.

The construction of predictor probes is similar to the procedure described earlier. They have more cells, e.g. 20, and could span n-DOF (degrees of freedom). In the case of an RGB colour image, there are 3-DOF in the intensity image corresponding to one of the colour components. The two other DOF are added by the context in the same bit-plane in other two colour components. Probes are again arranged in pairs that differ only in the value of the actual (estimated) cell. This facilitates a unique reconstruction of the original image in the decompression phase.

The frequency analysis comes next. The result of the analysis is the frequency table with the size 2^k+1, where k is the number of support cells of the predictor. Note that there is no need for the representative cell to be in the set of k elements in the probe. The frequency table is usually rather big. The aim is to select the subset of i cells, i < k, containing most of the information. This results in a reduction of the length of the frequency table to 2ⁱ.

The selection step does not need to pass through the original image another time and thus is independent of the size of the image. Predictor cells may be viewed as features in statistical pattern recognition task. The cells contribute to the estimate of the representative point of the probe. After the frequency analysis, it is known how often a particular predictor probe occurs in the image. Moreover, the cells can be ordered according to their importance for estimation of the representative cell. The proposed algorithm decouples the influences of individual cells from the frequency table.

Let us explain the decoupling in the specific case of a binary image. The aim is to reduce the number of probe cells by one to k = 3. All configurations of the probe were shown in Fig. . Assume for a moment that we would like to decouple the influence of cell p₃ (the top left corner of the probe Fig. ). If this feature was omitted the probe pairs would merge as shown in Fig. . The frequency of the new probe configuration is the sum of the frequencies of its two constituting configurations, which differ only by the value of the removed cell.

An appropriate representation of the frequency table facilitates the use of very efficient bit operations. Recall that the representative cell has the highest index. Let the frequency table be stored in a vector q with 2^k+1 elements. The first 2^k elements of the vector q = (q₀, ź,q_k) correspond to the frequencies of the top row of probe configurations.

Let e_all be the number of occurrences of the less frequent configurations of the probes. Furthermore, let b be an index to the first element of the second half of the frequency table, b = 2^k, and c the index of the removed cell. The value e_all is calculated as

e_all =

b-1
ĺ
i = 0

min

(q_i, q_i+b) .

Let e_c denote a measure of the estimativness of a cell with index c.

e_c = e_all -

b-1
ĺ
i = 0

min

( q_i + q_{i Ĺc},q_{i Ĺb} + q_{i Ĺc Ĺb} ) .

The factor ¹/₂ in front of the sum comes from the fact that the value within the sum is considered twice. The estimativness e_c expresses the increase in the number of residuals when the cell c is omitted. The operator Ĺ denotes logical eXclusive OR.

The cell with the lowest estimativness e_c is omitted from the probe. This algorithm can be applied iteratively and in each iteration the cell corresponding to the minimal value e_c is thrown away.

At last, we will define the termination criterion for removal of cells. The number of residuals e_all increases monotonically if we throw away cells one by one. The number of entries in the frequency table decreases monotonically and thus its length decreases. We are trying to find the extremum of the difference between these two functions. If the increase of the length of the coded residuals (given by the value e_c) is bigger than the decrease of the table length or the table is empty then the algorithm terminates.

7 Encoding residuals

To achieve high compression ratio the matrix of residuals should be efficiently coded. Consider Fig. where a simple example of residuals is shown. The idea is to code residuals along line scans using distances between residuals. The counting starts from an hypothetical starting point positioned left of the first row. The counting finishes in the hypothetical end point positioned to the right of the last row. The distances l₁, ź, l_k obtained represent the residuals uniquely. Of course, the sum of all distances is constant, i.e.

k
ĺ
i = 1

l_i = M N + 1 .

This fact can be used for error checking, e.g. while loading stored data.

The question how to code distances l₁, ź, l_k efficiently is important. The traditional Huffman coding is not suitable because the maximal possible length l_max = M N+1 is too big. The time needed to compute the Huffman table is also important. The best algorithms can do it in O(n log(n)) time where the optimal complexity of sorting items is O(n log(n)) and the complexity of computing the Huffman table is O(n). The number O((M N+1) log(M N+1)) would be too high for all practical purposes.

It is desirable to find suitable code to represent big integer numbers. There is a need to encode large values very often. The Elias code [] inspired us. The Elias code is composed of two parts, the first uses unary and the second binary code. We propose a new code, called code with logarithmic growth , that was inspired by Elias' code. The length of this code grows logarithmically with the encoded number l_i.

We combined the Huffman coding and the prefix code with logarithmic growth for better compression. The combination of the Huffman code and the prefix code with logarithmic growth is illustrated in Fig. . The number K is the highest number coded by the Huffman code. If K š 0 we chose a modified code with logarithmic growth. This code can be found in the column round(log₂(K)).

There are three options, see Tab. : (a) K = 0, i.e. only the code with logarithmic growth is used (basic option); (b) K is fixed. Experiments have shown that the best values are typically K = 32 or K = 64; (c) K is set optimally according to the data to be compressed. This option is described next.

202mm

No. Binary (a) K = 0 (b) Fixed K (c) Optimal K

1 00 00 000 0000
2 01 01 001 0001
3 100 1000 010 0010
4 101 1001 011 0011
5 11000 1010 10000 0100
6 11001 1011 10001 0101
7 11010 110000 10010 0110
8 11011 110001 10011 0111
9 1110000 110010 10100 100000
10 1110001 110011 10101 100001
11 1110010 110100 10110 100010
12 1110011 110101 10111 100011
13 1110100 110110 1100000 100100
14 1110101 110111 1100001 100101
15 1110110 11100000 1100010 100110
16 1110111 11100001 1100011 100111
17 111100000 11100011 1100100 101000

Table 1: Code with logarithmic growth.

The length L₁ of the Huffman code table is an increasing function

L₁ = 4 + (n+1) log₂ S [bits],

where S is the size of the Huffman code space (that corresponds to the longest coded symbol). Value (n+1) is the number of entries in the table, n entries are used for the numbers and 1 is used for the prefix. The log₂ S term gives the length of the table entry in bits. The length of the coded residuals, (the small perturbations are caused mainly by the change of the exponent, e.g. from 2³ to 2⁴), decreases more or less monotonically. The difference between these two monotonic functions is the function we are looking for. The point of interest is the maximum of the function corresponding to the optimal of K.

The relationship between the number K and the length of encoded residuals is illustrated in Fig. . Eight curves, one for each bit plane of a test image, are shown.

8 Experiments

The goal of the experiments were to (1) test the properties of the proposed method, (2) compare it with the results of others, and (3) find possibilities of further improvements.

Let us illustrate the behaviour of the proposed FH-3 compression algorithm on the standard grey-level testing image Lena of size 256×256×8 bits, see Fig. . Selected bit-planes of the original and compressed image are shown in Fig. . Note that the compression is more effective on bit-planes corresponding to more significant bits as there are usually less changes there. Look how well the residuals express perceptually significant features in the image. The mentioned experiment is quantified in Tab. . Notice how the number of residuals increases from the most significant bit-plane to the least significant bit-plane. It is difficult to represent randomness by the estimator. The estimativness e_c gives a clear insight into the behaviour of the predictor. Cell 6 and 4 from the neighbouring bit planes contain the highest amount of information.

Bit-plane	bit 7	bit 6	bit 5	bit 4	bit 3	bit 2	bit 1	bit 0

FH-2 residuals	2010	5477	10177	15067	21865	28595	31771	32214
FH-3 residuals	empty	3841	5935	9421	15045	21690	28622	31272

estimativness e₁	empty	25	160	218	233	91	125	320
estimativness e₂	empty	616	2165	3108	4930	4919	2521	457
estimativness e₃	empty	52	180	219	233	91	128	352
estimativness e₄	empty	1892	4101	5543	6611	6600	2813	447
estimativness e₅	0	14	123	143	92	26	112	361
estimativness e₆	1457	2097	3477	4169	5807	5031	2312	467
estimativness e₇	0	23	118	136	170	85	71	220

Table 2: Quantitative description of the FH-2 and FH-3 methods.

The proposed adaptive compression method FH-Adapt was compared on several images from the standard JPEG image set []. The methods used for comparison were: Segment from [], Portable Video Research Group at Stanford PVRG-JPEG from [], Rounding transform from [], and Logic coding from []. Results were taken directly from the papers. The only implementation that was available was that of Callic [].

The quantitative measure used for comparison of the different methods is the compression efficiency CE []

CE =

total number of input bytes - totalnumber of outputbytes

total number of input bytes

·100 %

The value CE in Tab. shows how our method performs in comparison with others. The FH-3 method was outperformed only by Segment and PNG in the case of an image of a monkey (Baboon). The reason is that the predictor cannot cope with the complicated structure of the monkey's hair. Segment performs also better on the Lena image. Unfortunately we were not able to experiment with Segment because the code was not available. The PNG compression is very similar to the IBM Q80 coder []. IBM Q80 is obsolete and the implementation was not accessible.

Method/Image	Lena	Baboon	Boats

PVRG-JPEG	22.7-29.4	6.7-13.7	24.9-33.1
Logic	26.1	12.6	29.3
Segment	54.6	36.3	empty
Rounding Tr.	37.6	empty	empty
PNG	20.7	41.7	33.7
*FH-3*	34.0	19.1	41.2

Table 3: Compression coefficients compared with results of others

The Tab. shows the comparison of our method FH- Adapt with other common methods on the Lena image and on a technical drawing image, see Fig. . The latter image is perfectly suited for our compression method as can be seen from the results.

Type of data	Lena [Bytes]	CE Lena [%]	Drawing [bytes]	CE Drawing [%]

Uncompressed	262656	0	106940	0
ZIP	225007	15.62	10224	90.44
ARJ	228389	14.35	10380	90.29
PCX	274256	-2.85	38624	63.88
GIF	278771	-4.54	17290	83.83
*FH-Adapt.*	175434	34.21	7297	93.18

Table 4: Comparison with typical compression methods.

Tab. illustrates the influence of three residual coding methods on three different predictors. This gives us an understanding what improvement the individual method brings. The values in the table are the lengths of the compressed 512 × 512 image Lena in bytes.

Method/Coding	Exp. code only	Fixed Huffman	Moving Huffman

FH-2	203000	200758	200599
FH-3	178655	176021	175922
FH-Adapt	178130	175541	175432

Table 5: Comparison of three proposed predictors and three types of coding techniques. Values are given in bytes.

Tests were performed on an Intel Pentium 120MHz computer. The computation time was obtained using a RAM disk to avoid randomness in accessing the disk. The compression of the image Lena (512 × 512 × 8 Bits) takes 6.2 seconds using FH-3 method. The decompression is approximately twice as fast, i.e. in the Lena case it runs 2.7 seconds. Loading/Storing a data without compression takes 0.1 seconds.

9 Conclusions

We have proposed a new lossless image compression method that in many respects outperforms other methods considered as state of the art. The basic idea of the method is to predict differences in one bit only.

The advantages are:

The performance in terms of compression efficiency is similar or better compared to the best methods used. Especially good results were obtained for binary images, grey-level cartoons and man-made drawings.
The quick implementation is possible since the operations used are very simple, e.g. SHL, XOR, ADD, and the selection from a table.
Substantial parts of the compression and decompression phases can run in parallel.
The predictor that constitutes the core of the method can be extended to handle more degrees of the freedom, e.g. to colour images or image sequences.
When the resolution of the image increases, the compression ratio does not degrade. This happens in the case of pyramidal algorithms, e.g. MLP (Multi Level Progressive) [].

The disadvantages of the method are:

The compression algorithm needs at least two passes through the image data. When the Huffman coding is used to encode the residuals from the predictor an additional pass is needed.
The decompression algorithm is very sensitive to errors in the compressed data. Of course, this disadvantage holds for most compression methods.

We encourage the reader to test the method, see the www page
http://cmp.felk.cvut.cz/~fojtik/.

10 Acknowledgements

We are grateful to M. I. Schlesinger from Institute of Cybernetics, Ukrainian Academy of Sciences Kiev, Ukraine. His series of lectures held in Prague in summer semester 1996 initiated research reported in this article. J. Matas suggested us a few improvements and K. Jonsson carefully read the manuscript.

This research was supported by the Czech Ministry of Education grant VS96049, the Grant Agency of the Czech Republic, grants 102/97/0480, 102/97/0855 and the European Union grant Copernicus CP941068.

Figure Figure 1: 2 ×2 probe of the predictor.

Figure Figure 2: Predictor used for binary image coding. All possible configurations are arranged into eight pairs differing in the upper right bit only. Highlighted configurations are treated as less frequent.

30mm

Figure

Figure 3: FH-3 predictor.

Figure Figure 4: Modified probes spanning several bit planes.

45mm

Figure

Figure 5: Lengths to be coded.

100mm

Figure

Figure 6: K selects the length of the fixed part of the code.

Figure Figure 7: Changes of the overall size in dependence of a K value.

Figure Figure 8: Extracted cell p₃.

Figure Figure 9: Lena - grey level test image.

0.23

Figure 10: Comparison between bit-planes of the original image and bit-planes storing residuals.

Figure Figure 11: Image of the scanned technical drawing.

Footnotes:

¹ The term cell is used instead of pixel common for 2D images as it copes with higher dimensions too.

File translated from T_EX by T_TH, version 2.33.
On 5 Aug 1999, 13:50.

No.	Binary	(a) K = 0	(b) Fixed K	(c) Optimal K

1	00	00	000	0000
2	01	01	001	0001
3	100	1000	010	0010
4	101	1001	011	0011
5	11000	1010	10000	0100
6	11001	1011	10001	0101
7	11010	110000	10010	0110
8	11011	110001	10011	0111
9	1110000	110010	10100	100000
10	1110001	110011	10101	100001
11	1110010	110100	10110	100010
12	1110011	110101	10111	100011
13	1110100	110110	1100000	100100
14	1110101	110111	1100001	100101
15	1110110	11100000	1100010	100110
16	1110111	11100001	1100011	100111
17	111100000	11100011	1100100	101000

Adaptive Predictor for Lossless Image Compression

Václav Hlavác, Jaroslav Fojtí k Center for Machine Perception Faculty of Electrical Engineering, Czech Technical University 121 35 Prague 2, Karlovo námestí 13, Czech Republic {hlavac,fojtik}@vision.felk.cvut.cz keywords: image compression, lossless compression, predictor-based compression

Abstract

1 Introduction

2 Related works

3 Idea of image coding for binary image

4 Compression of binary images

5 Compression of grey-level images

6 Adaptive compression

7 Encoding residuals

8 Experiments

9 Conclusions

10 Acknowledgements

Footnotes:

Václav Hlavác, Jaroslav Fojtí k
Center for Machine Perception
Faculty of Electrical Engineering, Czech Technical University
121 35 Prague 2, Karlovo námestí 13, Czech Republic
{hlavac,fojtik}@vision.felk.cvut.cz
keywords: image compression, lossless compression,
predictor-based compression