Digital Image Processing - Compression 2020


What is compression?

We compress an image to minimize the number of bits needed to represent it,

  1. while being able to reconstruct the original image exactly (lossless), or
  2. while maintaining a reasonable quality of the reconstructed image (lossy), as the sketch below illustrates.
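
A minimal sketch of the two regimes with OpenCV; the file name input.png is a hypothetical placeholder for any 8-bit image.

# Minimal sketch of lossless vs. lossy compression with OpenCV.
# "input.png" is a hypothetical placeholder for any 8-bit image.
import cv2

img = cv2.imread("input.png")

# Lossless: PNG reconstructs the original pixels exactly.
cv2.imwrite("out.png", img)

# Lossy: JPEG trades exactness for a smaller file (quality 0-100).
cv2.imwrite("out.jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 50])

print((cv2.imread("out.png") == img).all())   # True: exact reconstruction
print((cv2.imread("out.jpg") == img).all())   # generally False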




Why are signals compressible?
  1. Signals have statistical redundancy or structure (spatial, spectral, or temporal) in the data.
  2. Signals contain perceptually irrelevant information, details we cannot see or hear, so such data can be discarded.




Vision

The picture below shows a simplified cross section of the human eye: the cornea, the lens, and the retina. The retina is where the images we see are projected before being sent on to the brain.


Human_eye.png

The retina is covered with light sensors. It contains two major types of light-sensitive photoreceptor cells used for vision: the rods and the cones.

The concentration of cones peaks sharply around the fovea, which is where we see best. Cones are very good at resolving detail, especially in bright light, and they are responsible for color vision; however, they require brighter light to function than rods do. We constantly move our eyes so that the scene is projected as much as possible onto the fovea region of the retina, and the density of cones falls off quickly away from the fovea.

The other type of receptor, the rods, is marked with dashed lines in the plot below. As we can see, the concentration of rods is much more uniform across the whole retina.

To summarize: the cones see well in bright light and are concentrated around the fovea, while the rods are spread across the whole retina and see well in very low light. Rods cannot distinguish colors; they are responsible for low-light (scotopic), monochrome (black-and-white) vision.

Note that there is one region of the retina with no receptors at all; it is called the blind spot.

receptors.png




Color Systems

How do we store the pixel values? We can select the color space and the data type to use. The color space refers to how we combine color components in order to encode a given color. There are several systems; each breaks a color down into three or four basic components, and combinations of those components can reproduce any other color (a short code sketch follows the list).

  1. The most popular one is RGB, mainly because this is also how our eye builds up colors. Its base colors are red, green, and blue. To encode the transparency of a color, a fourth component, alpha (A), is sometimes added.
  2. HSV and HSL decompose colors into their hue, saturation, and value/lightness components, which is a more natural way for us to describe colors.
  3. YCrCb is used by the popular JPEG image format, which we discuss later in this chapter.
  4. CIE L*a*b* is a perceptually uniform color space, which comes in handy when we need to measure the distance between two colors.
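
As a quick sketch of these systems in code, OpenCV can convert between them directly. Remember that OpenCV loads images in BGR channel order; input.png is again a hypothetical placeholder.

# Converting one image into the color spaces listed above with OpenCV.
import cv2

img = cv2.imread("input.png")                   # 8-bit BGR image

hsv   = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)    # hue, saturation, value
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)  # luma plus two chroma channels
lab   = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)    # CIE L*a*b*, perceptually uniform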





Image Compression - JPEG

JPEG compression is block based. The data reduction comes from subsampling the color information, quantizing the DCT coefficients, and Huffman encoding the result.

JPEG typically achieves 10:1 compression with little perceptible loss in image quality, and is the file type most often produced in digital photography or computer screenshots. (Wikipedia)

1. Conversion - $RGB \text{ to } Y'C_bC_r$ and subsampling

The representation of the colors in the image is converted from $RGB$ to $Y'C_bC_r$, consisting of one luma component ($Y'$) representing brightness and two chroma components ($C_b$ and $C_r$) representing color. This step is sometimes skipped. The resolution of the chroma data is then reduced, usually by a factor of 2, which reflects the fact that the eye is less sensitive to fine color detail than to fine brightness detail. In other words, the color information ($C_b$ and $C_r$) can be subsampled without a significant loss of visible image information, as illustrated in the picture below.
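
A minimal NumPy/OpenCV sketch of the idea, assuming a hypothetical input.png: convert to YCrCb, keep the luma at full resolution, and keep only every second sample of each chroma plane.

# 2x chroma subsampling: full-resolution luma, quarter-resolution chroma.
import cv2

img = cv2.imread("input.png")
y, cr, cb = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb))

cr_sub = cr[::2, ::2]   # every second row and column
cb_sub = cb[::2, ::2]

# Y keeps all of its samples; each chroma plane keeps only a quarter,
# so the three planes together hold half the original number of samples.
print(y.size, cr_sub.size, cb_sub.size)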

UnderSampling.png

Picture source: Wikipedia



JPEG_Compression_process_YCbCr.png

2. Construct $n\times n$ subimages

JPEG_Compression_process_subimages.png

We usually use $8\times 8$.
Why 8?
We split the image into subimages to make processing faster, and 8 turned out to be a sort of magic number, in the sense that it produces results close to those of the Karhunen-Loève transform (which yields the best basis, i.e., the one minimizing the reconstruction error): $$e_{rms} = \left[\frac{1} {\text{# of pixels}} \sum_{pixels}(\hat{f}-f)^2 \right]^{\frac12}$$ where $\hat{f}$ is the reconstructed image, $f$ is the original image, and $e_{rms}$ is the root-mean-square (RMS) error.
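
A small sketch of both ideas: tiling a grayscale image into $8\times 8$ blocks and evaluating the RMS error above. It assumes the image dimensions are multiples of 8.

import numpy as np

def blocks_8x8(img):
    """Yield the 8x8 subimages of a grayscale image (sides multiples of 8)."""
    h, w = img.shape
    for r in range(0, h, 8):
        for c in range(0, w, 8):
            yield img[r:r+8, c:c+8]

def rms_error(f, f_hat):
    """Root-mean-square error between the original f and reconstruction f_hat."""
    diff = f_hat.astype(np.float64) - f.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))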




3. Forward transformation - DCT

The $Y$, $C_b$, and $C_r$ data in each $8\times 8$ pixel block undergo the Discrete Cosine Transform (DCT).


JPEG_Compression_process_DCT.png

The DCT represents each block as a linear combination of the basis images shown below.


JPEG_Compression_process_8_8_basis.png

In the spatial domain (before the DCT), the data is described by a digital value for each pixel, so we represent the image as a list of pixel values. After the transformation, the image is described by the coefficients of the spatial frequencies in the vertical and horizontal directions. We still have to store 64 frequency coefficients after the DCT, which means the transform by itself gives no data reduction. To reduce the amount of data needed to store the 64 coefficients, we quantize them, as described in the next section. A quick numerical check of this follows.
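
This sketch uses OpenCV's cv2.dct and cv2.idct on a random block to show that the transform alone discards nothing.

# The DCT is invertible: 64 pixels in, 64 coefficients out, and the
# inverse transform recovers the block up to floating-point precision.
import cv2
import numpy as np

block = np.random.rand(8, 8).astype(np.float32)
coeffs = cv2.dct(block)                         # still 64 numbers
restored = cv2.idct(coeffs)
print(np.allclose(block, restored, atol=1e-5))  # True: no data reduction yet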

The following picture demonstrates how we can decompose an image into basis images (an 8 x 8 grayscale smiley face decomposed in the Hadamard basis):

Hadamard_Decomposition.png



Why_DCT.png

The Fourier Transform assumes periodicity at the boundary of each block (top picture), which is not realistic: the pixels at opposite edges of a block tend to be drastically different, so the periodic extension introduces artificial discontinuities. The DCT instead implies a symmetric (mirrored) extension, so neighboring pixels across the boundary remain similar (bottom picture, a Markovian image).



DC_AC_Coefficient.png

As shown in the picture, the image (left) to be encoded is first divided into 8 by 8 blocks. The blocks are zero-shifted so that, for an 8-bit-per-pixel image, their values range from -128 to 127. Then the discrete cosine transform is taken of each and every block. The picture on the right shows the DCT coefficients. The (0,0) coefficient is referred to as the DC coefficient, while the rest are referred to as AC coefficients, by analogy with direct and alternating current.
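
A sketch of this step on a single block; the random block is a stand-in for real image data, and cv2.dct expects floating-point input.

# Zero-shift an 8-bit block to [-128, 127], take the DCT, and pick out
# the DC and AC coefficients.
import cv2
import numpy as np

block = np.random.randint(0, 256, (8, 8))     # stand-in for one image block

shifted = block.astype(np.float32) - 128.0    # zero-shift for 8-bit data
coeffs = cv2.dct(shifted)

dc = coeffs[0, 0]              # the (0,0), or DC, coefficient
ac = coeffs.flatten()[1:]      # the 63 AC coefficients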








4. Quantize Transformation Coefficient

JPEG_Compression_process_quantizer.png

The amplitudes of the frequency components are quantized: each coefficient is divided by a step size and rounded, so that small high-frequency values collapse to zero.

JPEG_Compression_process_COEF_pixel.png JPEG_Compression_process_COEF.png

Human vision is much more sensitive to small variations in color or brightness over large areas than to the strength of high-frequency brightness variations. Therefore, the magnitudes of the high-frequency components are stored with a lower accuracy than the low-frequency components. The quality setting of the encoder affects to what extent the resolution of each frequency component is reduced.

JPEG_Compression_process_QUAN_STEP.png
$$T(u,v) = \sum_x \sum_y F(x,y)\, r(x,y,u,v)$$ where $T(u,v)$ are the transform coefficients of the block, $F(x,y)$ is our image, and $r(x,y,u,v)$ is the basis shown in step 3 (DCT). So, from the quantized coefficients $\hat{T}$, we get the reconstructed image $\hat{F}$:
$$\hat{F}(x,y) = \sum_u \sum_v \hat{T}(u,v)\, r(x,y,u,v)$$

Most of the time, the user can choose the strength of the JPEG compression. Quantization is the step where this user setting influences the image quality and the file size.
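
A sketch of the quantization itself, using the example luminance table from Annex K of the JPEG standard; real encoders scale this table according to the user's quality setting. Here coeffs stands for an 8x8 array of DCT coefficients as produced in step 3.

import numpy as np

# Example luminance quantization table from the JPEG standard (Annex K).
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]])

def quantize(coeffs, Q):
    """T-hat(u,v): each coefficient rounded to a multiple of its step size."""
    return np.round(coeffs / Q).astype(np.int32)

def dequantize(t_hat, Q):
    """The decoder's approximation of the original coefficients."""
    return t_hat * Q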




5. Encoder

The resulting data for all $8\times 8$ blocks is further compressed with a lossless algorithm, a variant of Huffman encoding.


JPEG_Compression_process_Huffman.png

For more on Huffman encoding, please visit
http://www.bogotobogo.com/Algorithms/compression_huffman_encoding.php.
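
Before the entropy coding, the 64 quantized coefficients of each block are conventionally read out in zigzag order (the "vectorize" step in the example below), so the zeros produced by quantization cluster into long runs that run-length and Huffman coding compress well. A small sketch of that reordering, not code from the article:

import numpy as np

def zigzag(block):
    """Return the 64 entries of an 8x8 block in JPEG zigzag order (DC first)."""
    idx = sorted(((r, c) for r in range(8) for c in range(8)),
                 key=lambda rc: (rc[0] + rc[1],                        # anti-diagonal
                                 rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return np.array([block[r, c] for r, c in idx])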





JPEG Encoding Example

The following pictures are from Fundamentals of Digital Image and Video Processing:


DivideImageIntoBlocks.png

CalculateDCT.png

QuantizeTheDCTCoefficient.png

VectorizeTheCoefficients.png




Reference
  1. Gonzalez, R. C. and Woods, R. E., Digital Image Processing, 3rd ed.
  2. Fundamentals of Digital Image and Video Processing




For image sensors, visit Image Sensors (CCD & CMOS).







