EFFICIENT TRANSMISSION OF PICTORIAL INFORMATION
Document Type:
Collection:
Document Number (FOIA) /ESDN (CREST):
CIA-RDP80B00829A001100060001-0
Release Decision:
RIPPUB
Original Classification:
K
Document Page Count:
245
Document Creation Date:
December 9, 2016
Document Release Date:
August 22, 2001
Sequence Number:
1
Case Number:
Publication Date:
August 22, 1976
Content Type:
REPORT
File:
Attachment | Size |
---|---|
CIA-RDP80B00829A001100060001-0.pdf | 22.28 MB |
Body:
Proceedings of the Society of Photo-Optical Instrumentation Engineers
VOLUME 66
Efficient Transmission of
Pictorial Information
August 21-22, 1975, San Diego, California
Andrew G. Tescher
Editor
Presented by
The Society of Photo-Optical Instrumentation Engineers
In cooperation with
Department of the Army, United States Army Electronics Command, Night Vision Laboratory
Department of the Navy, Office of Naval Research
Energy Research and Development Administration
Environmental Research Institute of Michigan
Institute of Electrical & Electronics Engineers, San Diego Section
The Institute of Optics, University of Rochester
Jet Propulsion Laboratory
National Aeronautics & Space Administration, Ames Research Center
National Aeronautics & Space Administration, Headquarters, Washington, D. C.
National Science Foundation
The Optical Society of America
U. S. Department of Commerce, National Oceanic and Atmospheric Administration
Copyright © 1975 by the Society of Photo-Optical Instrumentation Engineers, 338 Tejon Place, Palos
Verdes Estates, California 90274 U.S.A. All rights reserved. This book or any part thereof must not
be reproduced in any form without the written permission of the publisher. Printed in U.S.A.
Contents
Author Index
Seminar Committee vii
Reasons for Data Compression - An Introduction ix
Andrew G. Tescher, The Aerospace Corporation
SESSION I. FUNDAMENTALS OF IMAGE DATA COMPRESSION 1
An Overview of Human Observer Characteristics and their Effect on Image Transmission and Display 2
Thomas G. Stockham, Jr., University of Utah
Coding of Two-Tone Images 5
T. S. Huang, Purdue University
Source Coding of Television Signals Using Interframe Techniques 9
B. G. Haskell and P. L. Gordon, Bell Telephone Laboratories, Inc.
Bandwidth Compression of Multispectral Data 23
A. Habibi and A. S. Samulon, TRW Systems Group
Real Time Television Image Bandwidth Reduction Using Charge Transfer Devices 36
H. J. Whitehouse, R. W. Means, E. H. Wrench, Naval Undersea Center
Technology of Charge-Coupled Devices for Video Bandwidth Reduction 48
D. D. Buss, R. W. Brodersen, C. R. Hewes, A. F. Tasch, Jr., Texas Instruments, Inc.
SESSION II. ADVANCED IMPLEMENTATIONS 57
Real-Time Video Compression Algorithm for Hadamard Transform Processing 58
Scott C. Knauer, Ames Research Center, NASA
An Advanced Imaging Communication System for Planetary Exploration 70
Robert F. Rice, Jet Propulsion Laboratory
Advanced Facsimile System and Network Applications 90
John W. Watkins, Harris Corporation, Electronic Systems Division
An Operational Video Data Compression System for ATS and ITOS 94
R. L. Kutz, NASA/Goddard Space Flight Center; L. D. Davisson, Univ. of Southern California
A CAQ Bandwidth Reduction System for RPV Video Transmission 101
James J. Pearson, Lockheed Palo Alto Research Laboratory
Intraframe and Interframe Adaptive Transform Coding 108
Clifford Reader, Aeronutronic Ford Corporation
SESSION III. Panel: IMPACTS OF IMAGE DATA COMPRESSION 119
SESSION IV. POTENTIAL IMPLEMENTATIONS 121
Joint Pattern Recognition Data Compression Concept for ERTS Multispectral Imaging 122
Edward E. Hilbert, Jet Propulsion Laboratory
A Dual Mode Nonlinear Interpolative Compressor for SAR Images 138
Wen-hsiung Chen, Aeronutronic Ford Corporation
A Fast Two-Dimensional Karhunen-Loeve Transform 144
Robert M. Haralick, Norman Griswold, Nimitra Kattiyakulwanich, University of Kansas Center
for Research, Inc., Remote Sensing Laboratory
Singular Value Decomposition for Image Coding 160
Harry C. Andrews, University of Southern California, Image Processing Institute
TV Bandwidth Compression 161
Paul Wintz, Purdue University; Jim Wilson, WINTEK Corp.; Ernie Schmidt, Aeronutronic
Ford Corp.
DPCM Quantization Error Reduction for Image Coding 167
William K. Pratt, Michael N. Huhns, University of Southern California
Combined Spatial and Temporal Coding of Digital Image Sequences 172
John A. Roese, Naval Undersea Center; Guner S. Robinson, University of Southern California
SESSION V. RELATED TOPICS 181
Laser Scanner/Recorders for Image Transmission and Computer Processing 182
W. F. Schreiber, U. F. Gronemann, D. E. Troxel, Massachusetts Institute of Technology
Real-Time Image Data Acquisition/Retrieval (RIDAR) System 190
Harold H. Kantner, CAI, a Division of Bourns, Inc.
An Investigation of MSE Contributions in Transform Image Coding Schemes 196
John R. Parsons, Andrew G. Tescher, The Aerospace Corporation
A Comparison of Hardware Implementations of the Hadamard Transform for Real-Time Image Coding 207
Stephen C. Noble, Air Force Avionics Laboratory, Wright Patterson Air Force Base
A Teaching Stereo-Video Microscope 212
James F. Butterfield, Dimensional Television Corporation
Pictorial Information Transmission through Simulation 215
Robert T. P. Wang, Honeywell, Marine Systems Division California Center
Three-Dimensional Perception Produced by Two 2-Dimensional Geometric Patterns Presented with
Brief Exposure Times and Time Intervals 223
Kiyoe Mizusawa, The Pennsylvania State University
Author Index
Andrews, Harry C. (University of Southern California, Image Processing Institute) 160
Singular Value Decomposition for Image Coding
Brodersen, R. W. (Texas Instruments, Inc.) 48
Technology of Charge-Coupled Devices for Video Bandwidth Reduction
Buss, D. D. (Texas Instruments, Inc.) 48
Technology of Charge-Coupled Devices for Video Bandwidth Reduction
Butterfield, James F. (Dimensional Television Corporation) 212
A Teaching Stereo-Video Microscope
Chen, Wen-hsiung (Aeronutronic Ford Corporation) 138
A Dual Mode Nonlinear Interpolative Compressor for SAR Images
Davisson, L. D. (University of Southern California) 94
An Operational Video Data Compression System for ATS and ITOS
Gordon, P. L. (Bell Telephone Laboratories, Inc.) 9
Source Coding of Television Signals Using Interframe Techniques
Griswold, Norman (University of Kansas Center for Research, Inc.) 144
A Fast Two-Dimensional Karhunen-Loeve Transform
Gronemann, U. F. (Massachusetts Institute of Technology) 182
Laser Scanner/Recorders for Image Transmission and Computer Processing
Habibi, A. (TRW Systems Group) 23
Bandwidth Compression of Multispectral Data
Haralick, Robert M. (University of Kansas Center for Research, Inc.) 144
A Fast Two-Dimensional Karhunen-Loeve Transform
Haskell, B. G. (Bell Telephone Laboratories, Inc.) 9
Source Coding of Television Signals Using Interframe Techniques
Hewes, C. R. (Texas Instruments, Inc.) 48
Technology of Charge-Coupled Devices for Video Bandwidth Reduction
Hilbert, Edward E. (Jet Propulsion Laboratory) 122
Joint Pattern Recognition/Data Compression Concept for ERTS Multispectral Imaging
Huang, T. S. (Purdue University) 5
Coding of Two-Tone Images
Huhns, Michael N. (University of Southern California) 167
DPCM Quantization Error Reduction for Image Coding
Kantner, Harold H. (CAI, a Division of Bourns, Inc.) 190
Real-Time Image Data Acquisition/Retrieval (RIDAR) System
Kattiyakulwanich, Nimitra (University of Kansas Center for Research, Inc.) 144
A Fast Two-Dimensional Karhunen-Loeve Transform
Knauer, Scott C. (Ames Research Center/NASA) 58
Real-Time Video Compression Algorithm for Hadamard Transform Processing
Kutz, R. L. (NASA/Goddard Space Flight Center) 94
An Operational Video Data Compression System for ATS and ITOS
Means, R. W. (Naval Undersea Center) 36
Real Time Television Image Bandwidth Reduction Using Charge Transfer Devices
Mizusawa, Kiyoe (The Pennsylvania State University) 223
Three-Dimensional Perception Produced by Two 2-Dimensional Geometric Patterns
Presented with Brief Exposure Times and Time Intervals
Noble, Stephen C. (Air Force Avionics Laboratory, Wright-Patterson Air Force Base) 207
A Comparison of Hardware Implementations of the Hadamard Transform for
Real-Time Image Coding
Parsons, John R. (The Aerospace Corporation) 196
An Investigation of MSE Contributions in Transform Image Coding Schemes
Pearson, James J. (Lockheed Palo Alto Research Laboratory) 101
A CAQ Bandwidth Reduction System for RPV Video Transmission
Pratt, William K. (University of Southern California) 167
DPCM Quantization Error Reduction for Image Coding
Reader, Clifford (Aeronutronic Ford Corporation) 108
Intraframe and Interframe Adaptive Transform Coding
Rice, Robert F. (Jet Propulsion Laboratory) 70
An Advanced Imaging Communication System for Planetary Exploration
Robinson, Guner S. (University of Southern California) 172
Combined Spatial and Temporal Coding of Digital Image Sequences
Roese, John A. (Naval Undersea Center) 172
Combined Spatial and Temporal Coding of Digital Image Sequences
Samulon, A. S. (TRW Systems Group) 23
Bandwidth Compression of Multispectral Data
Schmidt, Ernie (Aeronutronic Ford Corporation) 161
TV Bandwidth Compression
Schreiber, W. F. (Massachusetts Institute of Technology) 182
Laser Scanner/Recorders for Image Transmission and Computer Processing
Stockham, Jr., Thomas G. (University of Utah) 2
Overview of Human Observer Characteristics & their Effect on Image Transmission & Display
Tasch, Jr., A. F. (Texas Instruments, Inc.) 48
Technology of Charge-Coupled Devices for Video Bandwidth Reduction
Tescher, Andrew G. (The Aerospace Corporation) ix, 196
Reasons for Data Compression - An Introduction
An Investigation of MSE Contributions in Transform Image Coding Schemes
Troxel, D. E. (Massachusetts Institute of Technology) 182
Laser Scanner/Recorders for Image Transmission and Computer Processing
Wang, Robert T. P. (Honeywell, Marine Systems Division California Center) 215
Pictorial Information Transmission through Simulation
Watkins, John W. (Harris Corporation, Electronic Systems Division) 90
Advanced Facsimile System and Network Applications
Whitehouse, H. J. (Naval Undersea Center) 36
Real Time Television Image Bandwidth Reduction Using Charge Transfer Devices
Wilson, Jim (WINTEK Corp.) 161
TV Bandwidth Compression
Wintz, Paul (Purdue University) 161
TV Bandwidth Compression
Wrench, E. H. (Naval Undersea Center) 36
Real Time Television Image Bandwidth Reduction Using Charge Transfer Devices
Seminar Committee
General Chairman
Andrew G. Tescher, The Aerospace Corporation
Technical Program Committee
Harry C. Andrews, University of Southern California
Ali Habibi, TRW Systems Group
John R. Parsons, The Aerospace Corporation and University of Arizona
Session I. Fundamentals of Image Data Compression
Chairman, Andrew G. Tescher, The Aerospace Corporation
Session II. Advanced Implementations
Chairman, Ali Habibi, TRW Systems Group
Session III. Impacts of Image Data Compression
Panel Moderator, Andrew G. Tescher, The Aerospace Corporation
Session IV. Potential Implementations
Chairman, J. R. Parsons, The Aerospace Corporation
Session V. Related Topics
Chairman, Harry C. Andrews, Image Processing Institute, University of Southern
California
Reasons for Data Compression - An Introduction
Andrew G. Tescher
The Aerospace Corporation, P. O. Box 92957, Los Angeles, California 90009
Visual data is the primary form by which man acquires information. However, the re-
quired information and its potential user may be separated by large distances. Although
conventional communication channels are available, the large amount of information
present in pictorial data quite often taxes this available communication bandwidth. At
the practical level, pictorial data transmission may be prohibitively expensive.
The seminar on "Efficient Transmission of Pictorial Information" attempted to cover
various problem areas associated with visual information transfer technology. The twenty-
six, primarily invited, papers address current and future problem areas. The various sub-
ject areas can be grouped into image representation, image coding, psychophysical consid-
erations, channel effects, image quality, data management, and display technology. The
papers strike a balance between primarily theoretical discussions and those which emphasize
hardware development.
By design, the presentation level of the various papers is diverse. On one side of the spec-
trum, the several review papers, whose authors are well-respected experts of long experi-
ence in this field, are primarily provided for the newcomers to the subject of image com-
pression. The majority of the papers, however, present new and valuable results and in
most cases advance the state-of-the-art of their respective specializations.
In addition to the formal presentations, an informal panel discussion was organized and
moderated by this author. The panelists were F. Billingsley of JPL, H. M. Federhen of
ARPA, C. R. Hewes of Texas Instruments, Inc., and P. A. Wintz of Purdue University.
The panel, interacting with the numerous individuals who offered comments and sugges-
tions, reiterated the need for new solutions in the field of pictorial data communica-
tion.
Opinions differ on just how fast new solutions will come. Perhaps reflecting a personal
bias toward optimism, this author believes that new and effective
solutions will come soon. This belief is supported by the extensive and high quality ac-
tivity in the field of image compression and related technologies.
Andrew G. Tescher, Editor
Session I
FUNDAMENTALS OF IMAGE DATA COMPRESSION
Chairman
ANDREW G. TESCHER
The Aerospace Corporation
AN OVERVIEW OF HUMAN OBSERVER CHARACTERISTICS AND THEIR EFFECT
ON IMAGE TRANSMISSION AND DISPLAY
Thomas G. Stockham, Jr.
Departments of Computer Science and Electrical Engineering
University of Utah
Salt Lake City, Utah 84112
The science and technology of image transmission and display has evolved primarily
from the disciplines of computer science, electrical engineering and physics. Thus, it is
only natural that techniques and attitudes which have developed are characterized by the
style in which people in these areas approach the subject of images and their trans-
mission, display, and processing. For example, one finds many important differences
between the methods of television and those of photography. Moreover, an even greater
contrast is found when comparing the methods of electrical engineering, physics, tele-
vision and photography with those now understood to be employed by the human eye and the
human visual system. Specifically, let us contrast the methods by which television,
photography, and the human visual system represent image information. In television
(particularly digital imaging), image values are represented by signals analogous to
quantities of light. They are called intensities. In photography, on the other hand,
the representing quantities are concentrations of silver or dyes. Consequently, due to
the natural exponentiating laws governing the interaction of light with these media,
they are analogous to the logarithm of quantities of light. They are called densities.
The human visual system, on the other hand, (while being generally logarithmically
sensitive) moves a large step further away from representation by physical quantities of
light and produces at very early stages in its processing highly modified versions of the
patterns of light or their logarithms. The natural question then arises, why should the
human visual system try to do this; and since it does, what consequences are implied in
terms of the television and photographic presentations normally employed?
The answers to these questions rest crucially on the issue of errors. In any trans-
mission or display system, errors will be committed. These are unavoidable and are a
result of the physical limitations encountered. In a broad variety of applications, the
most important forms of error encountered are those imposed by limited dynamic range and
various forms of noise. The classic dilemma one faces in the disciplines of image trans-
mission and display is that a compromise must be effected between the conflicting con-
straints of dynamic range and noise. On the one hand, one wishes to make signals larger
so that the noise may be rendered negligible. At the same time, large signals are pre-
cluded by the limitation in dynamic range in the form of distortions which affect the
signals when their values are made too variable. More specifically, in any transmission
and/or display design, the natural goal would seem to be the attainment of high fidelity
reproduction. Unfortunately, the demands of typical imagery upon transmission and dis-
play systems are usually so severe that this goal cannot be reasonably met. The dis-
tortions due to limited dynamic range and noise, especially the former, are relentlessly
unavoidable and so the designer must content himself with one form of distortion or
another.
Fortunately, the human visual system itself produces a large quantity of distortion.
This phenomenon may be exploited by trading off undesirable forms for forms which will be
encountered naturally anyway. The distortions produced by the human visual system are
often referred to as optical illusions. The simplest of these is the gray scale illusion
which reveals the logarithmic sensitivity of the human visual system alluded to above(1).
This illusion is responsible for the fact that we do not turn the lights on during the
day although they add just as much light then as at night. Another ramification is that
gray scales must be arranged in exponential progression to appear arithmetic to the
observer.
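To make this concrete, here is a small sketch, assuming a crude logarithmic response model for the observer; the luminance range and the number of patches are arbitrary choices. The successive perceived differences come out constant, which is the sense in which a geometric gray scale appears arithmetic.

```python
# Sketch: a gray scale in exponential (geometric) progression looks uniform to a
# roughly logarithmic observer. Range and step count are arbitrary choices.
import math

l_min, l_max, steps = 1.0, 100.0, 8
ratio = (l_max / l_min) ** (1.0 / (steps - 1))

luminances = [l_min * ratio ** i for i in range(steps)]
perceived = [math.log(v) for v in luminances]      # crude logarithmic response model

diffs = [perceived[i + 1] - perceived[i] for i in range(steps - 1)]
print([round(v, 1) for v in luminances])           # geometric progression of luminances
print([round(d, 3) for d in diffs])                # equal perceived steps
```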
Other illusions which are spatial in character are much more important and striking,
however. The simplest of these is the illusion of simultaneous contrast which permits
two neutral gray shades to appear so different from one another at the same time, that
one can assign the names near-white to one and near-black to the other. Careful study of
these illusions has permitted researchers to formulate signal processing models for the
early processing stages of the human visual system. These models (e.g. consider those of
Stockham(2), Baudelaire(3), Frei(4), and Baxter(5)) predict the human visual illusions or
distortions well enough to permit their use in defining useful objective measures of how
different two images (one original, the other distorted) will look from one another.
Applications of these models to problems in image transmission and display have
already yielded significant advantages. For example, when the model is used to evaluate
the distortions produced by a given display instrument, it is often possible to predict
how the physical performance of that display may be significantly relaxed without
producing appreciable visual degradation. Additionally, by using the visual model as a
processor prototype, it is possible to "look at the image" before it is displayed or
transmitted by first processing the image with the visual model processor and then trans-
mitting or displaying it. This procedure places the image information in a signal space
maximally compatible with the most recent knowledge of the human visual system before the
degradations characteristic of either transmission or display are allowed to detract from
the quality.
The advantages are significant as can be seen from (2) and (6). Rom shows how in the
case of transmission with coding, it is possible by including human observer character-
istics in the criterion for error to produce significantly superior subjective image
transmission. First, the image is processed through the visual model. The resulting
signal is then coded and transmitted. The received image is then passed through the
inverse of the visual model and displayed. Subject to the validity of the model, the
effect is as if the distortions due to coding were not allowed to enter the system until
the image had been processed significantly by the human visual system itself.
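The following sketch illustrates this pipeline under strong simplifying assumptions: a logarithmic point nonlinearity stands in for the visual model, a uniform quantizer stands in for the coder, and the test image and level count are arbitrary. The models cited in the text are considerably more elaborate.

```python
# Minimal sketch of coding "through" a visual model: forward transform, coarse
# quantization (standing in for the coder/channel), inverse transform at the receiver.
import numpy as np

def visual_forward(intensity):
    return np.log(intensity)                # intensities -> rough visual domain

def visual_inverse(v):
    return np.exp(v)                        # back to displayable intensities

def quantize(x, levels):
    lo, hi = x.min(), x.max()
    step = (hi - lo) / (levels - 1)
    return lo + np.round((x - lo) / step) * step

image = np.random.uniform(1.0, 100.0, size=(64, 64))    # toy image, arbitrary range

coded = quantize(visual_forward(image), levels=32)       # "transmitted" representation
received = visual_inverse(coded)

# The quantization error is now roughly uniform in perceived (log) units rather than
# in raw intensity, which is the point of processing before coding.
print(float(np.max(np.abs(np.log(received) - np.log(image)))))
```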
The most important problem in image display is that associated with limited dynamic
range. Cathode ray tubes, photographic transparencies, reflection photographs (i.e.
prints), and images printed with ink on paper all face the basic light-to-dark ratio
limitations of their media. For example, a good photographic film is not capable of
reflecting much more than 90% of the light that strikes it. Neither is it capable of
reflecting much less than 1% of the light that strikes it. The result is a dynamic range
limitation of about 100:1. Unfortunately, high fidelity images such as those seen by the
human eye in the real world attain dynamic ranges far in excess of 1000:1 or even
10,000:1. This limitation can and has been overcome throughout the years by various
techniques. Now, however, knowledge of the human visual system allows us to effect the
same dynamic range limiting processes on the raw images to be displayed that the eye uses
naturally. Since these processes take place anyway when the display is viewed, little is
lost by also invoking them before the limitations of the display or transmission medium
is encountered. In addition, by the very nature of the processes which the visual system
uses, double processing is not subjectively disagreeable and often can hardly be dis-
tinguished from the single natural process which takes place in direct viewing of the
original scene. Thus, by deliberately distorting the image before transmission and dis-
play by using visual model processing, one can avoid the much more disagreeable dis-
tortions which would be otherwise encountered due to the transmission or display.
As our knowledge of human observer characteristics grows, it is possible to formulate
more sophisticated and complex visual models, and our abilities to cope with the problems
being described here increase. While the latest work in visual modeling has included
such aspects of vision as color constancy and contrast(7) as well as the relatively
recently disclosed frequency channel model of vision(8)(5), it is anticipated that in the
future, much more sophisticated aspects will also be embraced. Such aspects will
undoubtedly include edge detection, motion sensitivity, non-uniform visual acuity over
the field and perhaps higher forms of image abstraction.
Studies are already underway in an effort to discover methods for analyzing images by
synthesis. One possible synthesis method is through the modern technology of shaded
image synthesis by computer. While image transmission systems employing coding have for
the most part made use of signal properties to achieve their bandwidth reduction, it is
also possible to achieve potentially much greater reduction by coding to a level embrac-
ing the geometric structure constituting the scene which has been imaged. Such a break-
through would be based, not only on the structure of images themselves, but also upon the
way in which that structure is perceived by the visual system. An analogy with speech
bandwidth compression, which is capable of much greater information rate savings than the
best general sound bandwidth compression methods, is an informative one. The basic
structure of speech and the mechanism which produces it are exploited in such bandwidth
compression methods. Techniques for detecting the parameters of vocal cord motion and
the shape of the human vocal tract directly from the acoustic signal are at the heart of
such techniques. In addition, the nature of human hearing is exploited in effecting
compromises which must be introduced during the analysis.
So it is that one might expect the most modern efforts in image transmission, display
and processing to be strongly influenced by a knowledge of the human observer charac-
teristics. The implication of this statement is that the image processing scientist must
be well versed not only in the physics of optics, the electrophysics of sensing, the
electronics and chemistry of transmission, and the analog and digital disciplines of pro-
cessing, but also in the psychophysics, biophysics and psychology of vision.
References
1. Cornsweet, T.N. Visual Perception. Chapter X. New York: Academic Press, 1970, pp.
249-253.
2. Stockham, T.G., Jr. "Image Processing in the Context of a Visual Model," Proc.
IEEE, vol. 60, pp. 828-842, July 1972.
3. Colas-Baudelaire, Patrick. "Digital Picture Processing and Psychophysics: A
Study of Brightness Perception," Computer Science Dept., University of Utah, Salt Lake
City, Utah, UTEC-CSc-74-025, March 1973.
4. Frei, Werner. "A New Model of Color Vision and Some Practical Implications,"
Univ. of Southern California Image Processing Institute, Semiannual Technical Report,
March 31, 1974, USCIPI Report 530, pp. 128-143.
5. Baxter, Brent S. "Image Processing in the Human Visual System," Computer Science
Dept., University of Utah, Salt Lake City, Utah, (in publication).
6. Rom, Raphael. "Image Transmission and Coding Based on Human Vision," Computer
Science Dept., University of Utah, Salt Lake City, Utah, (in publication).
7. Cornsweet, T.N. Visual Perception. Chapter XIII. New York: Academic Press,
1970.
8. Campbell, F.W., Cooper, G.F., and Enroth-Cugell, Christina. "The Spatial Selectivity of
the Visual Cells of the Cat." Journal of Physiology, vol. 203, pp. 223-235, 1969.
CODING OF TWO-TONE IMAGES
T. S. Huang
School of Electrical Engineering
Purdue University
West Lafayette, Indiana 47907
Abstract
We give a brief overview of efficient coding methods for two-tone images, especially white block skip-
ping and runlength coding.
I. Introduction
In this tutorial paper we discuss some aspects of the digitization and efficient coding of two-tone
images, such as business documents, line drawings, and weather maps.
After a brief discussion on sampling and quantization of nominally two-tone images, we describe the two
main classes of efficient coding methods: White skipping, and contour coding. Perhaps the most commonly
used efficient coding scheme is runlength coding. The remaining part of the paper concentrates on easily
implementable codes for runlengths.
II. Digitization
We shall concern ourselves with the digital transmission of images. Then, the first step is to sample
the image and quantize the amplitude of each sample. For two-tone images we quantize the amplitude to two
levels(1); thus each sample is represented by one bit. The number of samples required depends on the
image at hand. We take two examples: business letters and newspaper pages.
In analog transmission of business letters, a resolution of around 100 lines per inch gives satisfactory
reproductions. For digital transmission where the amplitude of each sample is quantized to only two levels,
the same resolution of 100 lines per inch yields reproductions which are marginally acceptable, the main de-
fect being the staircase appearance of the boundaries between the black and the white. We can improve the
image quality by either increasing the resolution or using more than two levels for each sample(1). Keep-
ing two levels per sample, a linear resolution of around 150 points per inch (ppi) is required for satis-
factory image quality. At 150 ppi and 1 bit per sample, an 8 1/2" x 11" letter gives us approximately 2
million bits. To transmit such a letter using PCM through a telephone channel with a 2400 bits/sec modem
takes about 14 minutes. In order to reduce the transmission time to say 1 minute, a data reduction or
compression ratio of 14:1 is required.
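The figures quoted above follow from straightforward arithmetic; the sketch below simply recomputes them from the page size, resolution, and modem rate given in the text.

```python
# An 8 1/2" x 11" page at 150 ppi and 1 bit per sample, sent by PCM at 2400 bits/sec.
ppi = 150
bits = (8.5 * ppi) * (11 * ppi) * 1
minutes = bits / 2400.0 / 60.0

print(round(bits / 1e6, 2), "million bits")     # about 2.1 million bits
print(round(minutes, 1), "minutes")             # about 14.6 minutes
print(round(minutes, 1), ": 1 compression needed for a 1-minute transmission")
```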
There has been considerable interest in the digital transmission of newspaper pages(2). It has been
found that, for text material, a sampling density of 400 ppi is sufficient to ensure satisfactory repro-
duction quality. For halftone pictures(3), the required sampling density depends on two factors: (1) how
many equivalent brightness levels (i.e. how many different halftone dot sizes) would be desirable; and (2)
how dense we have to sample in order to avoid the appearance of moire patterns.
No systematic subjective experiments have been conducted with respect to the first question. However,
extrapolating from experiments with continuous-tone pictures(4), it is estimated that 64 levels are suf-
ficient and 32 levels may be satisfactory in many cases. Let fh and fs be the halftone screen density and
the sampling density, respectively, in ppi. If fs is k times fh, then we have k^2 samples per dot, which
allows us to create k^2 dot sizes (by making some of the samples black). For example, for 64 levels, we
need fs = 8fh; for 32 levels, fs = 6fh.
With respect to the second question, less is known. It is generally assumed that fs must be at least
10fh in order to avoid moire patterns. However, recent experiments showed that the factor 10 might be too
conservative since no moire patterns were observed in any experiment with fs > 8fh.
From the above discussion, then, it seems that fs = 8fh is a good choice for the sampling density in the
case of halftone pictures. A typical value for fh is 85 ppi. The corresponding sampling density is
fs = 8fh = 680 ppi, which is 70% larger than what is needed for the text material. Since, operationally, it
is much simpler to use the same sampling density for both the text material and the halftone pictures, we
will be wasting a lot of samples in the text material. In any case, it should be obvious that a huge
amount of bits is required to represent newspaper pages. Even with a wideband channel, some data com-
pression is desirable.
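The sampling-density relations above reduce to simple arithmetic; this sketch assumes the 85 ppi screen density quoted in the text.

```python
# fs = k * fh gives k*k samples per halftone dot, hence k*k distinguishable dot sizes.
fh = 85                                          # typical halftone screen density, ppi
for k in (6, 8):
    print("k =", k, "->", k * k, "dot sizes, fs =", k * fh, "ppi")
# k = 8 gives 64 dot sizes at fs = 680 ppi, about 70% above the 400 ppi needed for text.
```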
III. Basic Principles of Efficient Coding
To reduce transmission time, we can either do source coding to reduce data redundancy or design
higher speed modems. In this paper we are concerned with the first approach. We shall also limit our-
selves to error-free coding - the receiver can reconstruct an exact copy of the original digitized image
(assuming the channel is noiseless).
There are two basic principles of efficient coding for two-tone images: skipping white, and contour
coding.
In most two-tone images, such as business documents and weather maps, the information is contained in
black and there is a large amount of white background. We should aim to skip the white space and not trans-
mit it.
Furthermore, to transmit the (sampled) black areas, we have to transmit only the boundary points. The
receiver can fill in the interiors by black.
In sections IV and V, we discuss some practical implementations of these basic principles.
IV. White Block Skipping
Most documents and line drawings contain a large amount of white space. One approach to their efficient
coding is to skip the white space. A simple way of doing that was suggested by de Coulon and Kunt(5),
Horlander(6), and others. Each sampled scan line is divided into N-picture-element (pel) blocks. For a
block containing all white pels, the 1-bit code word "0" is used. For a block containing at least one
black pel, we use an (N+1)-bit code word, the first bit being "1" and the next N bits being the binary
pattern of the N-pel block in question. For example, if N = 4 and a block contains two white pels followed
by two black pels, then the code word for that block is "10011" - we have used "0" to represent white and
"1" black. We call this code the white block skipping code.
Once a value for N is fixed, the optimum code for the 2^N possible N-pel patterns is of course the
Huffman code based on the probability distribution of these patterns. However, the implementation of a
Huffman code is complicated. If the image being coded contains a large amount of white, then the white
block skipping (WBS) code may perform almost as well as the Huffman code and in the meantime is much
simpler to implement.
For a given image or class of images, the average bit rate bN (bits/pel) of the WBS code depends on the
block size N. We would like of course to use an N which minimizes the bit rate bN.
The WBS code does not take full advantage of the fact that most texts contain a large number of all-
white scan lines. To take advantage of that, we propose a modified WBS code where all-white scan lines are
treated specially (7). We use the 1-bit code word "0" to represent an all-white scan line. For a scan line
containing at least one black pel, we transmit "1" followed by the regular WBS code words for that line.
Assuming N = 3, then for a scan line containing the pels: WWWBBWBWBW..., we will transmit: 1011101101...".
The first bit "1" indicates the scan line is not all white, and the second bit "0" represents WWW, etc.
The optimum N is chosen by looking at the statistics of the image after we have discarded the all-white scan
lines. Thus, by adding a one-bit overhead for each line, we have eliminated the necessity of coding and
transmitting the all-white scan lines. This overhead is low on a per pel basis. For example, if we have
1,000 pels in each scan line, then the overhead is only .001 bit/pel which is negligible in most cases.
For images such as weather maps which contain very few or no all-white scan lines, the modified code
reduces essentially to the regular WBS codes.
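The modified code is a one-line wrapper around the WBS sketch above; the example reproduces the WWWBBWBWB line quoted in the text.

```python
# Modified WBS: a 1-bit "0" flags an all-white scan line; otherwise "1" is followed
# by the ordinary WBS code for that line (wbs_encode is the sketch given earlier).
def modified_wbs_encode(line, n):
    if not any(line):
        return "0"
    return "1" + wbs_encode(line, n)

# N = 3, line WWWBBWBWB -> "1011101101", matching the example in the text.
print(modified_wbs_encode([0, 0, 0, 1, 1, 0, 1, 0, 1], 3))
```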
For some experimental results, see Table I. The pictures used are the same as in Ref. 5.
V. Runlength Coding and Extensions(8)
In runlength coding, the image is scanned linearly in the usual manner; however, instead of transmitting
the individual samples, the lengths of the black and the white runs are transmitted. To achieve compression,
the statistics of the runs are utilized. Some results are shown in Table I. Huffman coding was used for
the runs. For a given set of probabilities for the runlengths, the most efficient code is of course the
Huffman code. However, the implementation of a Huffman code requires a large dictionary and is complicated.
There are suboptimum variable-length codes which are almost as efficient as Huffman codes yet much
simpler to implement. Two classes of such codes, the linear codes and the logarithmic codes, are particular-
ly suitable for graphical data(9). The linear codes, which include the Laemmel codes and the Golomb codes,
have the property that the code word length increases approximately linearly with runlength. They are nearly
optimum for exponential distributions. The logarithmic codes, which include the Hasler codes(10), are so
named because the codeword length increases approximately logarithmically with runlength. They are nearly
optimum for negative-power distributions.
Experimental evidence indicates that the frequency distributions of runlengths of alphanumerical data
(typewritten letters, printed texts, etc.) are roughly negative-power, and that those of runlengths of line
drawings and weather maps are roughly exponential. One would therefore expect that linear codes are nearly
optimum for line drawings and weather maps and that logarithmic codes are nearly optimum for alphanumerical
data. Such indeed turns out to be the case.
Both linear and logarithmic codes are easy to implement. The main part of the encoder or decoder is
a counter.
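As one concrete example of a counter-based linear code, here is a sketch in the Golomb style: a unary count of full groups of m pels followed by a fixed-length remainder, so the codeword length grows roughly linearly with the runlength. The parameter m is an illustrative choice, and this is not necessarily the exact Laemmel, Golomb, or Hasler construction of the references.

```python
# Golomb-style linear runlength code (m restricted to a power of two for simplicity).
def linear_encode(runlength, m):
    q, r = divmod(runlength, m)
    width = (m - 1).bit_length()                 # fixed-length remainder field
    return "1" * q + "0" + format(r, "0{}b".format(width))

for run in (1, 3, 17, 200):
    print(run, linear_encode(run, m=16))         # codeword length ~ run/m + 1 + log2(m)
```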
An extension of runlength coding, called PDQ, was described in Ref. 8. In this method, essentially the
changes in the black-to-white and white-to-black transition locations from scanline to scanline are coded.
Assuming that the black/white boundaries have no preferred direction, we can show that the probability dis-
tribution of the amount of change of a transition location from line to line is Cauchy, which is approxi-
mately negative-power. We therefore expect that logarithmic codes are suitable in this case.
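The sketch below shows only the underlying delta computation, not the actual PDQ code format of Ref. 8; it also assumes, for simplicity, that the two lines have the same number of transitions.

```python
# PDQ idea: code how far each black/white transition moves from one scan line to the next.
def transitions(line):
    # positions where the pel value changes (0 = white, 1 = black)
    return [i for i in range(1, len(line)) if line[i] != line[i - 1]]

prev = [0, 0, 1, 1, 1, 0, 0, 0]
curr = [0, 1, 1, 1, 0, 0, 0, 0]

deltas = [c - p for p, c in zip(transitions(prev), transitions(curr))]
print(transitions(prev), transitions(curr), deltas)      # small deltas, cheap to code
```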
VI. Other Coding Schemes
For other coding schemes, see Refs. 11-14. For channel error consideration, see Ref. 15.
Acknowledgement
This work was supported in part by the Advanced Research Projects Agency of the Department of Defense
under Contract No. F 30602-75-C-0150.
Table I. Bit Rates of Coding Schemes

Picture Number          Bit Rate (bits/pel)
                        WBS Code    Modified WBS Code    Runlength Coding
A1 (typewritten)          .15             .10                  .10
A2 (typewritten)          .30             .18                  .22
A3 (circuit diagram)      .25             .25                  .13
A4 (handwritten)          .19             .14                  .12
A5 (weather map)          .40             .41                  .24
A6 (weather map)          .37             .37                  .23
References
1. Huang, T.S. and Koller, H. U., "Coding of Multilevel Images," IEEE Trans. on Communications,
June 1975.
2. ANPA Research Institute Bulletin 949, All-digital page facsimile tests at the ANPA/RI Research
Center, March 18, 1968.
3. Huang, T.S., "Digital Transmission of Halftone Pictures," Computer Graphics and Image Processing,
September 1974.
4. Huang, T. S., "PCM Picture Transmission," IEEE Spectrum, December 1965.
5. Coulon, F. de and Kunt, M., "An Alternative to Run-length Coding for Black-and-White Facsimile,"
Proc. of the 1974 International Zürich Seminar on Digital Communications, Zürich, Switzerland, March 1974.
6. Horlander, F. J., IBM Disclosure, 1972.
7. Huang, T. S. and Hussain, A. B. S., "Facsimile Coding by Skipping White," to appear in IEEE Trans.
on Communications.
8. Huang, T. S., "Runlength Coding and its Extensions," in Picture Bandwidth Compression, ed. by
T. S. Huang and O. J. Tretiak, Gordon and Breach, 1972.
9. Huang, T. S., "Easily Implementable Suboptimum Runlength Codes," 1975 Int'l Communication Conference
Record, June 16-18, 1975, San Francisco.
10. Meyr, H., Rosdolsky, H., and Huang, T. S., "Optimum Runlength Codes," IEEE Trans. on Communications,
June 1974.
11. Arps, R. B., "Bibliography on Digital Graphic Image Compression and Quality," IEEE Trans. on Inf.
Theo., Jan. 1974.
12. Stern, P. A. and Heinlein, W. E., "Facsimile Coding Using Two-dimensional Runlength Prediction,"
Proc. 1974 Zürich Seminar on Digital Communications, Zürich, Switzerland, March 1974.
13. Ascher, R. N. and Nagy, G., "A Means for Achieving a High Degree of Compaction on Scan-digitized
Printed Text," IEEE Trans. on Computers, Nov. 1974.
14. Session 7: Digital Facsimile Compression, 1975 Int'l Comm. Conf., June 16-18, 1975, San Francisco.
15. Koller, H. U., "Einfluss von Kanalfehlern auf Digitale Faksimile-Übertragungen," Ph.D. Thesis,
Institut für Technische Physik, ETH-Zürich, Switzerland, 1975.
SOURCE CODING OF TELEVISION SIGNALS
USING INTERFRAME TECHNIQUES
by
B. G. Haskell and P. L. Gordon
Bell Telephone Laboratories, Incorporated
Holmdel, New Jersey
The channel capacity required for long distance digital transmission of video signals
can be substantially reduced through a system using interframe coding. This is possible
because of the amount of frame-to-frame redundancy which exists in a video source
output. Through appropriate signal processing it is feasible to send only the changed
area of each frame rather than the full frame provided that the previous frame values
are retained in memories.
Since the very beginning of television, efforts have been underway to develop cheaper
means for transmitting video signals.(1,2) Recently, most of the attention has been
focused on digital systems because, as is well known, noise does not accumulate in
digital regenerators as it does in analog amplifiers, and, in addition, signal
processing is much easier in a digital format.
Progress is being made on two fronts. First, the present high cost-per-bit of
transmitting a digital data stream has generated interest in a number of methods for
cost reduction which have general application and are not confined to data streams
produced by a video signal source. However, it is important to remember that video bit-
rates tend to be considerably higher than those required for voice or data transmission.
The most promising techniques for more economical digital transmission include
waveguides, optical fibers, and laser light pipes, among others.
The second front on which progress is being made involves reducing the number of bits
which have to be transmitted in a video communication system. The balance of this paper
deals with this facet of the problem.
Bit-rate reduction is accomplished by eliminating, as much as possible, the substantial
amount of redundant information which exists in a video signal as it leaves the camera.
The cost of the signal processing which is required to reduce this redundancy determines
its economic feasibility in a given system. The savings which accrue from lowering the
transmission bit-rate must more than offset the cost of the required signal processing
if redundancy reduction is to be economical.
Statistical Redundancy and Subjective Redundancy
If an information source such as a television camera produces statistically redundant
data - that is, information which could just as well have been derived from past data -
then a saving of transmission bit-rate can result if the redundant information is
removed prior to transmission. In most cases this requires, at the transmitter, a
capability for storing some of the past source output so that a decision can be made as
to what is and what is not redundant in the present source output. Memory of past
information is also required at the receiver so that the redundancy can be rederived and
inserted back into the data stream in order to reproduce the original signal.
For example, in a television picture successive picture points (elements) along a line
are very much alike, and redundancy reduction can be achieved by sending element-to-
element differences instead of the elements themselves. The differences are small most
of the time and large only occasionally; thus, an average bit-rate saving can be
obtained by using short binary words to represent the more-probable, small differences
and longer binary words to represent the infrequent, large differences. In Figure 1,
single element delays are used to store the previous element so that differences can be
generated at the sending end and picture elements can be reconstructed at the receiving
end. In successive frames a picture element also changes very little on the average.
To transmit frame-to-frame differences, the delays in Figure 1 should be increased from
an element period to a frame period. The delay elements comprise the aforementioned
memory required for redundancy removal. Statistical redundancy is not the only form of
redundancy in a video signal. There is also considerable subjective redundancy; that
is, information which is produced by the source, but which is not necessary for
subjectively acceptable picture quality at the receiver. For example, it is well known
that viewers are less sensitive to degradations near edges, i.e., large brightness
transitions, in the picture. Also, viewers require less reproduced resolution for
moving objects in a picture than for stationary objects. Thus, in nonscientific
applications where exact reproduction of the video source output is not necessary as
long as the displayed picture is subjectively pleasing, a further reduction in
transmission bit-rate can be achieved by removing subjective redundancy.
For example, the element-differential PCM coder of Figure 1 need not transmit large
differences as accurately as small differences because of the viewer's insensitivity to
distortion near large brightness transitions. Thus, prior to transmission, large
element differences can be quantized more coarsely than small differences, thereby
reducing the number of levels which must be transmitted. Since the original signal is
distorted by using this type of quantization (companded quantization) the output of the
transmitter delay in Figure 1 is not the actual previous element as produced by the
camera. It is, instead, the previous element as it appears at the receiver with the
subjective redundancy removed.
Videotelephone format pictures (1 MHz, 267 lines) can be transmitted with 16 level
quantization of element differences (4 bits per picture element). The bit-rate can be
reduced further by using multilength binary words for transmission; however, a buffer
memory is then needed to transmit the resulting irregular data-rate over a constant bit-
rate digital channel.
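A minimal sketch of the element-differential PCM loop of Figure 1, with a companded quantizer that represents large differences more coarsely than small ones. The level table and sample values are illustrative assumptions, not the 16-level videotelephone quantizer mentioned above.

```python
# Element-differential PCM: quantize the difference between each sample and the
# locally decoded previous element, so transmitter and receiver stay in step.
LEVELS = [-64, -32, -16, -8, -4, -2, -1, 0, 1, 2, 4, 8, 16, 32, 64]   # companded

def quantize(d):
    return min(LEVELS, key=lambda lv: abs(lv - d))

def dpcm_encode(samples):
    prediction, codes = 0, []
    for s in samples:
        q = quantize(s - prediction)       # quantized element difference
        codes.append(q)
        prediction += q                    # transmitter tracks the receiver's value
    return codes

def dpcm_decode(codes):
    value, out = 0, []
    for q in codes:
        value += q
        out.append(value)
    return out

line = [10, 12, 13, 15, 80, 85, 86, 84, 20, 18]
codes = dpcm_encode(line)
print(codes)
print(dpcm_decode(codes))                  # close to the input; errors cluster at edges
```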
[Figure 1 block diagram: transmitter - input minus the previous element (delay of one element period), quantizer producing the quantized element difference, coder sending codes to the binary channel; receiver - decoder, adder, and one-element-period delay reconstructing a representation of the input at the output.]
Figure 1 Transmission of element-to-element differences instead of the elements
themselves. The differences are small most of the time. Thus, a bit rate
reduction results if short binary code words are used for small differences
and longer code words are used for large differences. Frame-to-frame
differences can be transmitted if the delays are increased to a frame
period.
Simple Methods of Frame-to-Frame Redundancy Reduction
In order to reduce frame-to-frame redundancy a memory or delay capable of storing an
entire frame of video information is needed. At present, this requirement is the main
factor determining the economic feasibility of frame-to-frame signal processing.
However, it is expected that costs for digital storage will continue to decline, thereby
making this type of signal processing even more attractive in the years to come.
Devices for digital storage which are currently being studied include magnetic bubble
memories,(3) charge coupled devices,(4) large MOS stores(5) and surface acoustic wave
devices. (6)
One method of removing frame-to-frame redundancy is simply to reduce the number of
frames that are transmitted per second as shown in Figure 2. At the receiver, frames
are repeated as in motion picture projection in order to avoid flicker in the display.
This technique takes advantage of the fact that frame display rates must be greater than
about 50 Hz to eliminate objectionable flicker, whereas something between 20 and 30 Hz
is all that is required for rendition of normal motion, and less than 15 Hz for
rendition of low speed movement.(7)
In most systems interlaced scanning already takes advantage of this phenomenon to some
extent. Odd-numbered lines are sent during one half frame period (field 1) and even-
numbered lines during the other half frame period (field 2). For example, broadcast
television and videotelephone systems in the United States transmit 30 frames per second
using 2:1 interlace (60 fields per second). A 50 percent reduction in bit-rate can then
be obtained by transmitting only 15 frames per second and displaying each frame twice as
shown in Figure 2; however, jerkiness is then noticeable if the scene contains objects
moving with moderate or rapid speed.
[Figure 2 block diagram: input at 30 frames/sec (two interlaced fields per frame); only alternate frames are sent over the transmission channel at 15 frames/sec; at the receiver a frame delay lets each transmitted frame be displayed twice, giving a 30 frames/sec output.]
Figure 2 2:1 frame repeating. Reducing the frame transmission rate by half (15
Frames/sec) and displaying each frame twice at the receiver (30 Frames/sec)
works well for movement at slow speeds but gives rise to visible jerkiness
during movement at higher speeds.
[Figure 3 block diagram: 8-bit PCM video input minus a reference signal held in a frame delay memory, an M-level quantizer feeding the transmission channel, and a receiver adder with its own frame delay memory producing the video output.]
Figure 3
Frame-differential PCM. When transmitting frame-to-frame differences small
values are quantized finely to give high quality reproduction in stationary
areas of the picture, and large values are quantized coarsely to take
advantage of the viewer's tolerance of brightness errors near moving edges
in the picture. Because of the feedback configuration, a high quality
picture is displayed within a few frame periods after motion in the scene
has ceased.
If in Figure 1 the delays of the feedback coder and decoder are increased to one frame
period, then the system transmits companded frame-differences as shown in Figure 3.
Since small frame differences are finely quantized, scenes containing little or no
movement are displayed with exceptionally high picture quality. When movement does
occur, thereby creating larger, more coarsely quantized frame differences, some
distortion is produced; but it is all confined to the moving areas of the picture where
it is less noticeable to the viewer. Because of the feedback configuration, a high
quality picture is again displayed within a few frame periods after motion in the scene
ceases.
In this system a digital transmission error caused by the channel introduces an
erroneous picture element into the receiver frame memory where it would remain
indefinitely if some means for removing it were not provided. One method for doing this
(called leak) is to reduce the magnitude of the reference signals by a small percentage
both at the transmitter and at the receiver. The effect of a transmission error then
slowly dies away as time progresses. Another technique which is more useful in
interframe coders (called forced updating) is to transmit a small percentage of the
picture elements in each frame as PCM. In this case a transmission error will remain
visible until the corresponding PCM updating information arrives, at which time the
error will suddenly disappear.
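A small sketch of the leak mechanism; the leak factor and the size of the initial error are illustrative assumptions.

```python
# "Leak": the reference (previous-frame) value is attenuated by a small factor at both
# transmitter and receiver, so a value corrupted by a channel error decays over frames.
LEAK = 0.95

error = 40.0                               # element error caused by a single channel hit
for frame in range(10):
    print(frame, round(error, 2))
    error = LEAK * error                   # no further differences arrive for this element
```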
For video signals, between 30 and 60 levels of frame difference quantization are
required for good picture quality during rapid motion.(8) However, if picture quality
during periods of movement is of no concern, then two-level quantization, i.e., one bit
per element or frame-to-frame delta-modulation, is sufficient to reproduce still scenes
such as charts or graphs with excellent fidelity.(8) Since more quantization levels are
required for frame differences than for element differences (for approximately
equivalent picture quality during periods of rapid movement) there is little advantage
in frame-differential coding unless multilength binary code words are used along with
fairly large buffers. And if complex operations such as these are to be carried out,
other techniques can be used which are of the same order of complexity but are much more
efficient in terms of bit-rate reduction.
Figure 4
a) Videotelephone scene containing movement.
b) Significant changes in a videotelephone scene containing moderate
motion. The white picture elements denote frame-to-frame differences
which are larger in magnitude than 1.5 percent of maximum. There are
about 10,000 changes in this frame out of a total of about 56,000
visible elements.
c) The white picture elements here are the ones which would be transmitted
using cluster coding. For each cluster, an address is transmitted
corresponding to the first picture element; then the frame differences
are sent followed by a special code signalling the end of the cluster.
Isolated changes in Figure 4b are ignored, and gaps of six picture
elements or less between clusters are bridged to avoid sending excessive
addressing bits.
Conditional Replenishment of the Reference Signal
In most videotelephone scenes, for example, the amount of movement is fairly small (see
Figure 4b). There is very little camera panning, and there are few sudden scene
changes. Furthermore, when there is a lot of movement such as when the camera is moved
or when graphical material is being positioned, it is not necessary to transmit the
picture with as high quality as in the normal situation. Thus, since there are usually
so many small frame differences, there is considerable advantage in not transmitting
them and, instead, sending only the frame differences which are significant;(10,11) that
is, leaving the reference signals in Figure 3 unchanged unless the frame difference is
large enough to warrant replenishment with new information. If only the significant
frame differences are to be transmitted, i.e., those larger than about 1.5 percent of
maximum, then addressing information must also be sent to tell the decoder where in the
picture the received frame differences should be placed. Also, since data is generated
at a very irregular rate depending on the location and on the amount of moving area in
the picture, an elastic digital store or buffer must be used at the transmitter and
receiver in order to use a constant bit-rate digital channel.
Conditional replenishment takes advantage of the long runs of insignificant differences
that normally occur in videotelephone pictures. It has been found that significant
frame differences also tend to occur in clusters,(8) and for this reason it is more
efficient to address only the beginning and end of each cluster than to address each
significant change separately. Moreover, since an address generally requires many more
bits for transmission than does a frame difference, it is worthwhile to coalesce
clusters of significant changes which are separated only by small gaps as well as to
eliminate as much as possible isolated differences due to noise. The white dots in
Figure 4b show the significant changes; Figure 4c shows the result of removing isolated
changes and of bridging small gaps between clusters of changes.
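The grouping of significant changes into clusters can likewise be illustrated. In the sketch below the six-element gap and the removal of isolated changes follow the description above; everything else is an assumption made only for the example.

```python
def clusters_along_line(significant, max_gap=6):
    """Group significant changes on one scan line into clusters.

    Gaps of max_gap elements or less between changes are bridged, and
    isolated single changes are discarded, so that only one address per
    cluster need be transmitted.
    """
    positions = [i for i, flag in enumerate(significant) if flag]
    if not positions:
        return []
    clusters = []
    start = end = positions[0]
    for p in positions[1:]:
        if p - end <= max_gap + 1:      # bridge gaps of max_gap elements or less
            end = p
        else:
            clusters.append((start, end))
            start = end = p
    clusters.append((start, end))
    # drop isolated changes (clusters consisting of a single element)
    return [(s, e) for s, e in clusters if e > s]

# Example: a run of changes at elements 10-14, an isolated change at 40,
# and a second run at 60-66 containing a small internal gap.
line = [False] * 80
for i in list(range(10, 15)) + [40] + [60, 61, 65, 66]:
    line[i] = True
print(clusters_along_line(line))   # [(10, 14), (60, 66)]
```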
[Figure 5 block diagram: the video input and a frame-period-minus-D delay feed a subtractor and quantizer; a significant change detector and segmenter (with compensating delay D) drive an address generator and an 8-bit PCM coder; quantized frame differences update the reference picture and, with the addresses, pass through the buffer to the transmission channel, with feedback to prevent buffer overflow and underflow.]
Figure 5 A conditional replenishment coder which transmits clusters of significant
frame differences. The segmenter segments each frame into changed areas and
stationary background areas, i.e., defines the clusters of frame differences
which must be transmitted. The reference picture, which is available also
at the receiver, is updated or replenished with quantized significant frame
differences as determined by the segmenter. Since the segmenter requires a
delay in order to operate efficiently, compensating delays must be included
in the coder loops. Buffer overflow or underflow can be prevented by reducing
or increasing the number of clusters transmitted. Forced updating
(transmitting a few lines of each frame as PCM) to alleviate the effect of
channel errors is not shown.
In the conditional replenishment coder of Figure 5 the segmenter defines the clusters of
frame differences which must be transmitted. The required channel bit-rate depends on
the size of the buffer and on the amount of motion in the picture which is to be
accommodated. During periods of rapid movement when the data generation rate of the
coder exceeds the channel rate, the buffer will begin to fill, and if this condition
persists for too long, the buffer will overflow and data will be lost. Figure 6 shows
the required buffer size versus transmission bit-rate for a videotelephone picture in
which a person is engaged in normal conversation. Sixteen bits are assumed for
addressing each cluster and four bits on the average are assumed for each frame-
difference. For high quality pictures (30-60 level frame difference quantization)
multiword-length coding of frame differences prior to transmission is implied. The
videotelephone signal format used here is: 1 MHz analog bandwidth, 271 lines/frame, 30
frames/sec with two interlaced fields per frame. Thus, 3 Megabits/sec corresponds to an
average of 1.5 bits per picture element, blanking included.
[Figure 6 plot: buffer size in bits (4K to 100K) versus channel rate in Megabits/sec (2.5 to 4).]
Figure 6 Buffer size versus transmission bit rate required for simple conditional
replenishment where the scene contains the head and shoulders view of a
person engaged in normal conversation. The curve is characterized by a
sharp knee at the point where the buffer is just large enough to smooth the
generated data over a field period (1/60 sec). Sixteen bits per cluster were
assumed for addressing and four bits on the average were assumed per
frame difference. Very large buffers (several million bits) are capable of
smoothing between one human action and the next (the fidget interval), but
they introduce too much delay into the conversation to be useful.
Figure 6 shows that significant savings in bit-rate are achieved as the buffer size is
increased up to about 15,000 bits. Around this point, data peaks generated in small
areas of the picture are smoothed over an entire field period. Above 15,000 bits,
however, the buffer begins to smooth the data over periods of time longer than a field
interval (1/60 second here), and since the amounts of data in successive fields tend to
be very much the same, relatively little saving of transmission bit-rate is
obtained.(11) Very large buffers (several million bits) are capable of reducing the
bit-rate significantly by smoothing the data from one human action to the next; however,
besides being uneconomical they also introduce several seconds of delay into the
transmission, and for face-to-face conversations over videotelephone, for example, this
is intolerable. More will be said about smoothing the irregular data-rate when the
subject of channel sharing by several coders is discussed.
Thus, for simple conditional replenishment there seems to be little advantage in
providing more buffer capacity than is required to smooth the data over a field period.
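The smoothing effect of a buffer feeding a constant-rate channel can be illustrated with a short simulation. The field-by-field bit counts below are invented solely for the example and are not measured data.

```python
def buffer_occupancy(bits_per_field, channel_rate_bps, field_rate=60.0,
                     buffer_size=15000):
    """Track transmitter buffer fullness field by field.

    bits_per_field   : bits generated by the coder in each field (bursty)
    channel_rate_bps : constant channel bit-rate
    Returns the occupancy after each field and a flag indicating overflow.
    """
    drained_per_field = channel_rate_bps / field_rate
    occupancy, trace, overflow = 0.0, [], False
    for bits in bits_per_field:
        occupancy = max(0.0, occupancy + bits - drained_per_field)
        if occupancy > buffer_size:
            overflow = True
        trace.append(occupancy)
    return trace, overflow

# Invented example: quiet fields (~20 kbit) around a burst of motion (~80 kbit).
fields = [20000] * 10 + [80000] * 5 + [20000] * 10
trace, overflowed = buffer_occupancy(fields, channel_rate_bps=3e6)
```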
If normal motion is to be accommodated in head and shoulders views of people in
conversation then with a 15,000 bit buffer, a transmission rate of about 3 Megabits/sec
or 1.5 bits per picture element (including blanking) is needed. This is a saving of
about 5:1 over 8-bit PCM. With a 600,000 bit buffer (which is about as large as can be
used without introducing annoying delay into the conversation) a transmission rate of
about 1.25 bits per picture element can be achieved. However, lower transmission rates
can be achieved without using large buffers by using adaptive or multimode coding.
These techniques are described next.
Adaptive or Multimode Conditional Replenishment
Viewers are accustomed to blurring in areas of the picture containing moderate or rapid
movement. The main reason for this blurring is that since television cameras have no
shutters, the light falling on a particular point of the target is integrated over one
frame period before being read out, and when there is movement this results in some
blurring in the picture even before transmission.
Since rapidly moving objects are blurred, there is no need to transmit these areas with
full picture resolution.(13,14,8) Thus, a different mode of conditional replenishment
coding can be used during periods of rapid movement. Since the transmission channel
capacity required for conditional replenishment coders is determined mainly by the bit-
rate generated during rapid motion, reducing the bit-rate during these periods lowers
the required bit-rate of the channel.
A particularly simple method of reducing the moving-area resolution and, hence, the bit-
rate in conditional replenishment systems is to transmit only every other significant
frame difference along a line instead of all of them as is normally done. The moving-
area picture elements which are not transmitted can then be obtained from the ones that
are by simple interpolation. Full resolution is maintained in the stationary background
areas of the picture since they are simply repeated from the previous frame. An
effective way of making sure that the reduced resolution coding mode is only used during
periods of rapid movement is to bring it in only when the number of bits stored in the
buffer exceeds a certain threshold value. Subsampling the moving area at half-rate in
this way almost halves the channel capacity required for transmission. It has been
found that further horizontal subsampling gives rise to visible degradation except with
extremely rapid motion.(14)
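A minimal sketch of this half-rate moving-area subsampling and of the receiver's interpolation follows; the data layout and function names are assumptions made for illustration.

```python
import numpy as np

def subsample_moving_area(line_values, moving_mask):
    """Keep only every other moving-area element along one scan line."""
    indices = np.flatnonzero(moving_mask)
    kept = indices[::2]
    return kept, line_values[kept]

def interpolate_moving_area(received_line, moving_mask, kept):
    """At the receiver, fill the skipped moving-area elements by averaging
    their left and right neighbours (already received or simply repeated)."""
    skipped = np.setdiff1d(np.flatnonzero(moving_mask), kept)
    for i in skipped:
        left = received_line[i - 1] if i > 0 else received_line[i + 1]
        right = received_line[i + 1] if i + 1 < len(received_line) else received_line[i - 1]
        received_line[i] = (int(left) + int(right)) // 2
    return received_line
```

In the actual coder the stationary background elements are simply repeated from the previous frame, so only the skipped moving-area elements need be interpolated.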
Feedback from the buffer can be used to control the picture quality in other ways as
well. For example, the frame difference threshold value used by the segmenter to test
for significance can be raised as the buffer fills. And when there is violent motion or
movement of the camera (periods during which the viewer is not very critical of picture
quality) a simple, but effective way of cutting down the data rate is to stop
conditional replenishment altogether for one frame period whenever the buffer threatens
to overflow and in this way give the buffer a chance to empty. At the receiver the
previous frame is simply displayed twice resulting in a slight jerkiness in the movement
instead of the smooth motion which is normally reproduced.
[Figure 7 diagram: coder modes plotted against the number of bits in the buffer (empty to full, 10K to 60K); the significant frame difference threshold steps through 4, 5, 6 and 7 (on a scale of 0...255), and the points at which subsampling starts and stops, a frame is repeated, and forced updating occurs are marked.]
Figure 7 Multimode conditional replenishment where the coding mode is controlled by
the buffer fullness or queue length. Fairly active motion can be
accommodated with a 65K bit buffer and a transmission channel rate of 2
Megabits/sec or 1 bit per picture element. A saving of 8:1 over 8-bit PCM
is obtained. The significant frame difference threshold is raised from 4 to
5, 6 and 7 on a scale of 0...255 as the buffer-queue-length increases beyond
20K, 35K and 50K bits, respectively. All replenishment is stopped when the
queue length exceeds 65K bits, and it does not resume again until the queue
length falls below 2500 bits. The system starts subsampling in the moving
area at half rate when the queue length exceeds 20K bits and does not return
to full sampling until the queue length falls below 10K bits. One TV line
is sent as 8-bit PCM (forced updating) whenever the queue length falls below
2500 bits.
Multimode conditional replenishment with forced updating, subsampling at half rate
during rapid movement and frame repeating to prevent buffer overflow as shown in Figure
7 requires, for videotelephone, a transmission rate of about 2 Megabits/sec - an 8:1
saving in bit rate over 8-bit PCM.(8) Fairly active motion can be accommodated using a
buffer size of about 65K bits. Round trip delay introduced by the coder is about 1/15
of a second which is not noticeable in most situations. If, during violent motion, the
transmitting buffer fills then repeating one frame allows the buffer to completely empty
so that several frames of violent motion can then be transmitted before another frame
must be repeated. Emptying of the transmitter buffer during periods of very slow motion
can be conveniently prevented by a forced updating of one TV line as 8-bit PCM whenever
the number of bits stored in the buffer falls below a certain threshold. In fact, it is
prudent to transmit a few lines of each frame as PCM regardless of the state of the
buffer in order to accommodate for the effects of digital transmission errors which
would otherwise last indefinitely.
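The buffer-feedback logic of Figure 7 can be collected into a small control routine. The threshold values below are those quoted in the caption, while the structure of the routine itself is an illustrative reconstruction rather than the implemented coder.

```python
class ModeController:
    """Illustrative buffer-feedback control after Figure 7.

    Given the current queue length (bits in the buffer), returns the coder
    settings for the next frame: the significant-change threshold (on a
    0..255 scale), whether to subsample the moving area at half rate,
    whether to stop replenishment and repeat a frame, and whether to
    force-update one TV line as 8-bit PCM.
    """
    def __init__(self):
        self.subsampling = False
        self.replenishment_stopped = False

    def settings(self, queue):
        # Significant frame-difference threshold rises with queue length.
        if queue > 50_000:   threshold = 7
        elif queue > 35_000: threshold = 6
        elif queue > 20_000: threshold = 5
        else:                threshold = 4
        # Half-rate subsampling with hysteresis (start above 20K, stop below 10K).
        if queue > 20_000:   self.subsampling = True
        elif queue < 10_000: self.subsampling = False
        # Stop replenishment (repeat a frame) above 65K, resume below 2500 bits.
        if queue > 65_000:   self.replenishment_stopped = True
        elif queue < 2_500:  self.replenishment_stopped = False
        forced_update = queue < 2_500    # send one line as 8-bit PCM
        return threshold, self.subsampling, self.replenishment_stopped, forced_update
```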
Conditional Subsampling in the Vertical Direction
Subsampling horizontally along a line in the moving area at less than half rate gives
rise to visible degradation unless movement is very fast.(15) However, moving-area
subsampling at half rate in the vertical direction is feasible, and like horizontal
subsampling it reduces the transmitted data-rate significantly. Because of interlace,
vertical subsampling at half rate in the moving area is the same as not transmitting any
frame differences at all during alternate fields. The moving area picture elements in
those fields would be obtained at the receiver by interpolation between the adjacent
fields(15) as shown in Figure 8. The effect on spatial resolution is usually negligible
partly because of the smearing due to movement and partly due to the fact that good
vertical resolution is difficult to obtain in a television picture in any case because
of the scanning mechanism.
[Figure 8 diagram: the lines of fields 1 through 4 of frames 1 and 2, showing the interlaced line positions.]
Figure 8 Subsampling at half rate in the vertical direction. Because of interlace
this is equivalent to conditionally replenishing alternate fields only. In
the diagram, fields 1 and 3 would be coded and transmitted via conditional
replenishment. Moving area picture elements in the intervening fields would
be obtained at the receiver by interpolation. Thus, changed picture
elements in line B2 would be obtained by a four-way average of picture
elements in lines A1, C1, A3 and C3. Simple vertical subsampling and four-
way interpolation in this manner does not give good rendition of moving
edges, however, and results in a slightly visible jerkiness in the picture.
By sending a small amount of additional information to repair these edges,
the jerkiness can be eliminated. This is called conditional vertical
subsampling.(16)
The effect of field-to-field interpolation does have a noticeable effect on temporal
resolution, however, especially during periods of rapid motion.(15,16) Since the
interpolation is based on picture information occurring 1/60 of a second away in time,
moving edges are poorly represented, and this leads to jerkiness in movement rendition
when large areas move fast. This jerkiness is much less noticeable than with frame
repeating, but it is nevertheless detectable.
In most scenes only a small amount of additional information need be transmitted to the
receiver to enable it to adequately reproduce those edges which have been distorted by
the field-to-field interpolation.(16) This procedure is called "conditional vertical
subsampling." The white picture elements of Figure 9 show where simple field-to-field
interpolation is not good enough and additional information must be transmitted. The
original scene is the same as in Figure 4. In a single videotelephone field of Figure
4c there are 7870 picture elements in the moving area, but only 2250 of them as shown in
Figure 9 need be transmitted using conditional vertical subsampling. The channel bit-
rate can be reduced by about a factor of 1.5 using conditional vertical subsampling.
Figure 9 Picture elements in one field of Figure 4 which would be transmitted using
conditional vertical subsampling with threshold T = 12 (on a scale of
0...255). In the fields to be interpolated (fields 2 and 4 of Figure 8) the
actual value of each moving area picture element is compared with the value
obtained by the four-way interpolation. If the magnitude of the difference
exceeds the threshold T, then the difference is transmitted along with
addressing information. Otherwise,
nothing is sent, and the four-way average is used at the receiver. In this
field, there are 7870 picture elements in the moving area, but only 2250
need be transmitted using conditional vertical subsampling.
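The test described in the caption of Figure 9 can be written out directly. The four-way average and the threshold comparison follow the text, while the line-oriented data layout is an assumption.

```python
import numpy as np

def conditional_vertical_subsampling(actual_line, above_prev, below_prev,
                                     above_next, below_next, moving_mask, T=12):
    """Decide which moving-area elements of an interpolated field to send.

    The receiver forms a four-way average of the vertically adjacent lines
    in the two transmitted fields; only where the true value differs from
    that average by more than T is a correction (difference) transmitted,
    together with its address.
    """
    interp = (above_prev.astype(int) + below_prev.astype(int) +
              above_next.astype(int) + below_next.astype(int)) // 4
    diff = actual_line.astype(int) - interp
    send = moving_mask & (np.abs(diff) > T)
    return np.flatnonzero(send), diff[send], interp
```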
Buffer and Channel Sharing
The channel bit-rate required by the conditional replenishment coder of Figure 5 is
determined mainly by the peak data-rate produced by the coder, i.e., the data-rate
generated during periods of active motion in the scene. Since periods of active motion
in videotelephone scenes, for example, tend to come in bursts, separated by long
intervals of inactivity, the channel bit-rate tends to be much larger than the long-term
average data-rate (excluding the forced-updating needed to prevent buffer emptying). As
was stated previously, an extremely large buffer (several million bits) would be needed
in order to appreciably smooth the data from one period of active motion to the next
and, thus, bring the channel bit-rate down closer to the long-term average data-rate.
Such a buffer would not be feasible because of economic reasons and because of the
excessive delay it would introduce into a videotelephone conversation.
At any given time, however, the data-rate averaged amongst many conditional
replenishment coders will be fairly close to the long-term average data-rate produced by
a single coder. Thus, if the data produced by many videotelephone users is combined
prior to buffering and transmission, as shown in Figure 10, the advantage of a large
buffer can be realized, namely lower channel bit-rate per data source, without the
disadvantages of high buffer-memory cost and excessive delay. This technique
automatically allocates more buffer space and channel capacity to those users which are
temporarily in rapid motion at the expense of those which are not. Computer simulations
have been carried out to evaluate this scheme, and results indicate that by combining
the data from about 15 conditional replenishment coders prior to transmission, the
channel bit-rate per coder can be halved and the buffer size requirements can be
drastically reduced. (19)
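The advantage of sharing a channel among several bursty coders can be seen from a crude Monte-Carlo sketch. The two-state traffic model and the numbers below are invented for illustration and are not the model of reference 19.

```python
import numpy as np

rng = np.random.default_rng(0)

def peak_rate_per_source(n_sources, n_fields=10000,
                         active_prob=0.1, active_bits=60000, idle_bits=5000):
    """Per-source capacity needed to carry the combined traffic of
    n_sources independent on/off coders (a crude two-state model):
    size the shared channel for a high percentile of the combined load."""
    active = rng.random((n_fields, n_sources)) < active_prob
    bits = np.where(active, active_bits, idle_bits).sum(axis=1)
    return np.percentile(bits, 99.9) / n_sources

# The per-source figure drops sharply as more coders share one channel.
print(peak_rate_per_source(1), peak_rate_per_source(15))
```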
[Figure 10 diagram: unbuffered data from several sources enters individual prebuffers, is multiplexed into a principal buffer, and passes to the channel.]
Figure 10 Buffer and channel sharing by several conditional replenishment coders. The
unbuffered data (entering from the left) produced by each coder occurs at a
very irregular rate - short periods of high data-rate caused by active
motion in the videotelephone scene tend to be separated by long periods of
low data-rate. The data is first stored temporarily in individual
prebuffers, then transferred to the principal buffer via the multiplexing
switch, and finally transmitted over a single high capacity channel to the
receiver where the inverse operation takes place. This technique
automatically allocates more buffer space and channel capacity to those
sources which are temporarily producing the higher data-rates. By sharing
the channel in this way amongst about 15 conditional replenishment coders,
the channel bit-rate per coder can be halved.(19)
Frame-to-Frame Coding of Signals Which Have Been Previously
Transmitted Via Intraframe Coding
A possible configuration for long-distance transmission of video signals is the
hierarchical arrangement shown in Figure 11. Within large cities, for example, the
signals might be sent via relatively cheap intraframe coding techniques such as element
differential PCM. Between cities where transmission costs are much higher, interframe
techniques such as those described previously might be used to reduce the bit-rate as
much as possible.
A fundamental difficulty arises with this situation, however. All intraframe coders
introduce a certain amount of frame-to-frame noise values into the signal which,
although they may not be objectionable to the viewer, are detected by the interframe
coder as a large number of significant frame differences. The white picture elements in
Figure 12b show the frame differences which are larger in magnitude than 1.5 percent of
maximum using a video signal which has been passed through an element-differential PCM
codec. The number of significant changes far exceeds the number of changes which occur
when 8-bit PCM is used as the input signal as in Figure 4b.
In order to reduce the number of picture elements which must be transmitted it is not
sufficient to simply raise the threshold which determines whether or not a frame
difference is significant. Unacceptable picture quality results. Instead, advantage
must be taken of the distinguishing properties of the frame differences caused by
intraframe quantization noise as opposed to the distinguishing properties of the frame
differences caused by movement. For example, frame differences caused by intraframe
quantization noise tend to be uncorrelated both spatially and temporally, and they are
usually larger near edges in the picture because of companding. Frame differences
caused by movement, on the other hand, are highly correlated both spatially and
temporally and tend to be large only near moving edges in the picture.
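As an illustration of how such properties might be exploited, the sketch below declares a picture element part of the moving area only if a substantial fraction of its neighbours also changed significantly, which suppresses spatially uncorrelated quantization noise. This is a hedged simplification, not the algorithm of reference 17.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def moving_area(frame_diff, threshold=4, neighbour_fraction=0.5):
    """Crude moving-area detector: a significant change is kept only if at
    least half of the elements in its 3x3 neighbourhood are also significant,
    so isolated (noise-like) differences are rejected."""
    significant = (np.abs(frame_diff) > threshold).astype(float)
    local_density = uniform_filter(significant, size=3)
    return local_density >= neighbour_fraction
```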
Using information such as this, algorithms have been designed for segmenting the picture
with good accuracy into moving areas and stationary background areas.(17) The white
picture elements in Figure 12c show the moving areas as defined by such an algorithm for
the same signal used in Figure 12b. However, the number of moving-area picture elements
still tends to be significantly higher than with an 8-bit PCM input. But using
conditional vertical subsampling at half rate, horizontal subsampling at half rate and
multiword-length coding of frame differences, a multimode conditional replenishment
coder for videotelephone has been simulated which operates with an average data
transmission rate of 2 Megabits/sec (1 bit/picture element).(18)
[Figure 11 diagram: input, intraframe coder, intracity digital link, intraframe decoder, PCM, interframe coder, intercity digital link, interframe decoder, PCM, intraframe coder, intracity digital link, intraframe decoder, output.]
Figure 11 A possible configuration for long-distance transmission of videotelephone
signals. Cheaper intraframe codecs are used for short, intracity digital
links. Between cities where transmission costs are high, interframe codecs
are used to reduce bit-rate as much as possible.
                                                                     Channel Bit-Rate
     Coding Method                                                    (Megabits/Sec)

     1. 8-Bit PCM                                                           16
     2. Element-Differential PCM                                             8
     3. Simple Conditional Replenishment                                     3
     4. Multimode Conditional Replenishment                                  2
     5. Multimode Conditional Replenishment and
        Conditional Vertical Subsampling                                     1.3
     6. Multimode Conditional Replenishment, Conditional
        Vertical Subsampling and Channel Sharing by 15 Users                 0.7

TABLE I - Bit-rates required for videotelephone using various coding techniques.
          Element-differential PCM is an intraframe coding technique while methods
          3-6 are interframe techniques. Method 4 has been demonstrated while the
          figures cited for methods 5 and 6 are based on simulations.
Figure 12 a) Videotelephone scene containing movement.
b) The white picture elements denote frame-to-frame differences which are
larger in magnitude than 1.5 percent of maximum using a video signal
which has been passed through an element-differential PCM codec. The
number of significant changes far exceeds the number of changes in
Figure 4c where 8-bit PCM was used as the input signal.
c) Using the same video signal as in a) the white picture elements here
denote the moving area as defined by a more sophisticated segmenting
algorithm.(17) This algorithm takes advantage of a number of signal
properties which enable it to distinguish between frame differences due
to intraframe quantization noise and the frame differences caused by
movement.
Conclusion
This article has covered some of the known methods for taking advantage of frame-to-
frame redundancy in video signals in order to reduce the transmission channel bit-rate.
The problem was approached from two sides. First, since only a portion of the picture
elements change significantly between frames, there is considerable advantage in
transmitting only the changed area, i.e., using conditional replenishment.(10,11)
Second, since viewers are less critical of picture quality in moving areas than in the
stationary areas, a saving of bit-rate can be achieved by progressively lowering the
resolution of the moving area as the amount of movement increases.(13,8,14)
The bit-rate savings possible using various frame-to-frame coding techniques on
videotelephone signals are shown approximately in Table I. A saving of 8:1 in bit-rate
over 8-bit PCM has been demonstrated using multimode conditional replenishment, while
the figures cited for conditional vertical subsampling(16) and channel sharing(19) are
based on simulations. All but rapid movement of large areas is accommodated with
excellent picture quality. With large area movement, frame repeating is used which
causes some visible jerkiness in the picture. If the input signal contains noise due to
a previous intraframe coding and decoding, then too many spurious frame differences are
present to use only these methods. More sophisticated moving area detection
techniques(17) must be used to transmit such a signal efficiently.
Transmission errors in the digital channel have been mentioned only in passing. A basic
principle in the efficient coding of information is that the more one removes redundancy
from a data stream the more vulnerable to transmission errors the information becomes.
This is certainly true of picture coding. However, some information is more important
to the receiver than other information. For example, in conditional replenishment,
addresses and synchronization information are far more important than frame differences.
Thus they should be protected by adding redundancy in a controlled fashion. In this
way, the effect of transmission errors can be minimized.
Other techniques for removing frame-to-frame redundancy are currently being
explored. (20-22) However, most of these schemes, like the techniques described in this
paper, are fairly expensive to implement, and this tends to limit their usefulness at
the present time to only long-distance transmission links where the expense of the
redundancy reduction systems are significantly outweighed by the savings in transmission
costs. In the future, however, prices of digital logic and storage are expected to
continue the dramatic decline experienced in the last several years making some of these
bit-rate reduction schemes economically feasible for digital transmission over shorter
distances.
Acknowledgment
This article describes the work of a number of people, many of whom we have had the
opportunity to talk with personally. We would like to express our gratitude to them for
the many stimulating discussions which we have had.
REFERENCES
1. Special Issue on Redundancy Reduction, Proceedings of the IEEE, v. 55, no. 3, March 1967.
2. Special Issue on Digital Picture Processing, Proceedings of the IEEE, v. 60, pp. 768-820,
   July 1972.
3. G. Lapidus, "The Domain of Magnetic Bubbles," IEEE Spectrum, v. 9, no. 9, September 1972.
4. W. S. Boyle and G. E. Smith, "Charge-Coupled Devices - A New Approach to MIS Device
   Structures," IEEE Spectrum, v. 8, no. 7, July 1971, pp. 18-27.
5. L. M. Terman, "MOSFET Memory Circuits," Proceedings of the IEEE, v. 59, no. 7, July
1971, pp. 1044-1058.
6. G. S. Kino and H. Matthews, "Signal Processing in Acoustic Surface-Wave Devices,"
   IEEE Spectrum, v. 8, no. 6, August 1971, pp. 22-35.
7. R. C. Brainard, F. W. Mounts, and B. Prasada, "Low Resolution TV: Subjective
   Effects of Frame Repetition and Picture Replenishment," Bell System Tech. J., v. 46,
   pp. 261-271, January 1967.
8. J. C. Candy, M. A. Franke, B. G. Haskell, and F. W. Mounts, "Transmitting Television
   as Clusters of Frame-to-Frame Differences," Bell System Tech. J., v. 50, pp. 1877-
   1888, July-August 1971.
9. R. Schaphorst, "Frame-to-Frame Coding of NTSC Color TV," in Symposium on Picture
Bandwidth Compression (Mass. Inst. Technol., Cambridge, Mass., April 1969).
10. A. J. Seyler, "Real-Time Recording of Television Frame Difference Areas," Proc. IEEE
(Corresp.) v. 51, pp. 478-480, March 1963.
11. F. W. Mounts, "Video Encoding System with Conditional Picture Element
Replenishment," Bell System Tech. Jz., v. 48, pp. 2545-2554, September 1969.
12. J. 0. Limb, "Buffering of Data Generated by the Coding of Moving Images," Bell
System Tech. J., v. 51, pp. 239-259, January 1972.
13. A. J. Seyler, "The Coding of Visual Signals to Reduce Channel-Capacity
Requirements," Proc. IEEE, v. 109, pt. C, pp. 676-684, September 1962.
14. R. F. W. Pease and J. 0. Limb, "Exchange of Spatial and Temporal Resolution in
Television Coding," Bell System Tech. J., v. 50, pp. 191-200, January 1971.
15. J. O. Limb and R. F. W. Pease, "A Simple Interframe Coder for Video Telephony,"
Bell System Tech. J., v. 50, pp. 1877-1888, July-August 1971.
16. R. F. W. Pease, "Conditional Vertical Subsampling - A Technique to Assist in the
    Coding of Television Signals," Bell System Tech. J., v. 51, pp. 787-802, April
1972.
17. D. J. Connor, J. O. Limb, R. F. W. Pease, and W. G. Scholes, "Detecting the Moving
Area in Noisy Video Signals," to be published.
18. D. J. Connor, B. G. Haskell, and F. W. Mounts, "A Frame-to-Frame PICTUREPHONE
Coder for Signals Containing Differential Quantizing Noise," Bell System Tech. J.,
v. 52, pp. 35-51, January 1973.
19. B. G. Haskell, "Buffer and Channel Sharing by Several Interframe PICTUREPHONE
Coders," Bell Sy2tem Tech. J., v. 51, pp. 261-289, January 1972.
20. B. G. Haskell and J. O. Limb, "Predictive Video Encoding Using Measured Subject
Velocity," U. S. Patent 3,632,865; January 4, 1972.
21. B. G. Haskell, "Frame-to-Frame Predictive Coding of Videotelephone Signals," IEEE
International Conference on Communications - Conference Record, v. 8, pp. 31-51,
June 19-21, 1972.
22. F. Rocca and S. Zanoletti, "Bandwidth Reduction Via Movement Compensation on a
Model of the Random Video Process," IEEE Trans. on Communications, v. COM-20, pp.
960-965, October 1972.
BANDWIDTH COMPRESSION OF MULTISPECTRAL DATA
A. Habibi and A. S. Samulon
TRW Systems Group
Redondo Beach, California 90278
Abstract
The coding methods suitable for bandwidth compression of multispectral imagery are considered. These
methods are compared using various criteria of optimality such as MSE, signal-to-noise ratio, recognition
accuracy, as well as computational complexity. Based on these results the candidate methods are reduced to
three recommended methods. These are KL-2 Dimensional DPCM, KL-cosine-DPCM and a cluster coding method.
The performance of the recommended methods is examined in the presence of a noisy channel and concatenated
with a variable rate (Huffman) encoder.
Introduction
Improvements in earth resource satellite MSS scanners will result in higher spatial resolutions, great-
er number of spectral bands, and faster output rates for the multispectral earth resources data. This will
require larger bandwidth for transmission of the multispectral data as well as larger storage capacities
and more complicated ground processing systems for classification and dissemination of the data. Compression
of data at the source can reduce the costs and the constraints upon each of these parts of a scanner-to-
user information management system. Most studies of data compression techniques for multispectral data
have considered only one coding technique. General Electric and Philco-Ford have considered only simple
and elementary coding algorithms [1-2]. Purdue University has considered use of transform coding techniques
and clustering algorithms for compressing the multispectral data in separate studies [3,4]. TRW Systems
Group considered coding multispectral data using errorless entropy coding of this data for archiving
applications [5]. Jet Propulsion Laboratory has considered using clustering algorithms for joint bandwidth
compression and classification of multispectral images [6]. These studies have used different data bases.
They leave out many types of efficient coding algorithms, and generally do not include the effect of
important parameters such as channel noise and the sensor behavior on the performance of the individual
coding algorithms. In addition, due to the use of different data bases and different criteria of optimality,
a meaningful comparison of various coding techniques is impossible.
We consider a number of promising algorithms for on-board compression of the multispectral data. Included
are Differential Pulse Code Modulator (DPCM) and transform coding systems as well as hybrid coding techniques.
The performance of these systems is improved by concatenating these coding algorithms with variable rate
(Huffman) encoders. In addition, we examine a cluster coding method which reduces the bandwidth of multi-
spectral imagery by classifying the multispectral data into a fixed number of clusters. The clusters are
approximated at the receiver using a classified image and the cluster averages. These algorithms are
compared based on their performances on representative ERTS data. The experimental results are obtained by
simulating the algorithms on digital computers. A number of criteria of optimality such as mean square
error, peak-to-peak signal to noise ratio, retention of classification accuracy and the computational
complexity are used to compare these techniques.
Criteria of Bandwidth Comparison
Several criteria serve as a basis for comparison of alternative compression techniques. These include
distortion, complexity, and susceptibility to channel and sensor effects.
Distortion
In order that data compression be possible, one or both of two conditions must be satisfied. The first
condition is that the data have some structure or redundancy. By knowing something about the data struct-
ure, data compression can be accomplished without losing any of the information the data contains. Hence,
data compression that relies solely on knowledge of data structure is called information preserving or
error free data compression. The second condition which leads to compression is that the image user have
interest in only limited accuracy or a particular aspect of the data collected. In this case, data which
is of low value to the user can be discarded, resulting in data compression. Since information discarded
is permanently lost, compression based on user interest is called information destroying data compression.
In order to determine whether a particular information destroying compression algorithm preserves
sufficient information for a particular use, a criterion must be defined by which the information loss can
be measured. Because of its mathematical convenience, a number of authors have used mean square error as
a measure of distortion. We use this as one measure during our study. It has the advantages of being
widely used and understood, of making tractable analytical treatments of the problem, and of leading to
satisfactory results for many applications. On the negative side, it is a criterion not specific to
particular uses of the data.
A criterion closely related to mean square error is the signal to noise ratio. Indeed this criterion
can be considered a normalized form of the MSE. Peak-to-peak signal to root mean square (RMS) value of the
noise as well as RMS signal-to-noise ratio is widely used by the television industry as a measure of
television signal quality.
Recognition Accuracy
Another criterion which is more application oriented is the algorithm's effect on classification
accuracy. For example, if each raster point in the image is to be classified as corresponding to some
crop or land use, based on the recorded brightnesses in the several spectral bands, the increase in class-
ification errors resulting from performing classification on the reconstruction of the compressed data
rather than on the original image is a measure of the relevant information lost due to compression.
The increased probability of misclassification can be used as a distortion measure, but, unfortunately,
the complexity of the classification procedure generally precludes calculation of the rate distortion
function.
The problem of classification is equivalent to partitioning the space of possible measurement values.
All measurement values falling within one element of the partition are viewed as representative of some
category. For example, in the case of earth resources data, one might assume that all recorded brightness
values within some range are representative of wheat.
Two basic approaches exist for determining a partition. In one method, a set of categories is
specified. The space of measurement values is then broken into segments such that the values in a partic-
ular segment correspond with one of the prespecified categories. The segments are chosen in a way that in
some sense is "good". In the other approach to determination of a partition, no categories are specified
in advance. Instead the measurement values are examined to see if they cluster into groups. In general it
is difficult to define precisely what it means for certain values to belong to one cluster and others to
another. A myriad of criteria have been proposed. Most of them yield different clusters, for some
sets of measurement values, than a human would intuitively obtain.
Now we define what is meant by preservation of classification accuracy under a distortion of the
measurement space. We assume that we have two measurement spaces, each having its own partition. In the
first space are the undistorted measurements, x, and in the second are the distorted measurements. In order
that classification accuracy be preserved the two partitions should be such that if y is the distorted
value corresponding with x, then the category which y represents should be the same as the category which
x represents. Using more precise notation, we make the following definitions:
X   a space of measurements with elements x
PX  a partition of X (PX(x) is the category which x represents)
Y   a space of measurements with elements y
PY  a partition of Y
D   a mapping from X into Y
If PY(D(x)) = PX(x) for every x in X, then we say that PY preserves the classification accuracy of PX under the measurement distortion D.
We can see in Figure 1 that class 1 and class 2 can be distinguished equally well in the space X and
the space Y with the appropriate partitions. Under certain distortions, however, it may not be possible to
preserve classification accuracy. It is always possible to find a partition which preserves classification
accuracy if D is a 1 to 1 function since, in that case, values which can be distinguished in space X can be
distinguished in space Y. If D is not a 1 to 1 function, it may still be possible to preserve classification
accuracy. For ex-
ample a function which maps all values in one category in the original space into one point in the distorted
space can still preserve classification accuracy as long as it maps values in different categories into
different points.
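Stated operationally, the definition amounts to a check over the measurement space. The small routine below assumes the partitions and the distortion are available as Python callables and is intended only to restate the definition.

```python
def preserves_classification(X, P_X, P_Y, D):
    """True if the partition P_Y of the distorted space preserves the
    classification accuracy of P_X under the distortion D, i.e. if
    P_Y(D(x)) == P_X(x) for every measurement x in X."""
    return all(P_Y(D(x)) == P_X(x) for x in X)

# Toy example: one-dimensional measurements, two categories split at 12,
# and a distortion that quantizes each value down to a multiple of 4.
X = range(0, 20)
P_X = lambda x: 0 if x < 12 else 1
D = lambda x: 4 * (x // 4)
P_Y = lambda y: 0 if y < 12 else 1
print(preserves_classification(X, P_X, P_Y, D))   # True: the boundary is respected
```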
We now fit the definition we have made into the context of earth resources data. For each point in
the image we obtain a set of brightness measurements in several spectral bands. The space of measurement
values in the various spectral bands can be represented by the space X in the definition above. In
addition, further measurements can be obtained by taking into account the spatial variation of brightness
in the neighborhood of each point. Such measurements are often called "texture" measurements. These
measurements can also be included in the space X. Thus, for each point in the image we have a space X of
measurement values. Using pattern classification, we can determine, based on the measurement space, what
category (e.g., wheat, petroleum bearing land) is represented by a particular point in the measurement
space. Data compression which is classification accuracy preserving is then obtained in two ways. First,
several points in the same category can be mapped into one point. Second, spatial redundancy of the
measurement values can be exploited to achieve further data compression.
It was decided to use a clustering technique for classification for several reasons. First, a
technique using prespecified categories would have required training samples to define the best partition.
Thus, ground truth information would be required for each prespecified category represented in the data.
Second, each category (e.g. winter wheat) has a different spectral signature depending on sun angle,
latitude, season, and so on. Therefore, ground truth data for the particular data sets processed would be
required. Third, we wanted to use a data set with a number of different categories, including a variety of
crops, water, desert, and mountains so that any conclusions developed during the study would be relevant for
a number of disciplines.
In the classification accuracy determination procedure, we calculate the centroids of the clusters.
Unfortunately, calculation of the centroids is an iterative process which is quite time consuming. Hence,
it is desirable to sample the available imagery and determine the centroids using a subset of the picture
elements. For convenience we have chosen to use only elements belonging to every fourth row and column of
the imagery in determining the centroids.
Using the clustered subset we obtain the centroids of the clusters. By centroid is meant the following.
Each cluster has a particular mean value in every spectral band. The centroid of a cluster is the set of
mean values for that cluster. Figure 2 shows pictorially the meaning of centroid for the case of only two
spectral bands.
The centroid finding program has a number of steps that can be summarized as follows: A particular
number of clusters is selected. More or less arbitrarily, a set of initial centroid values are chosen. Each
sample value is assigned to the category corresponding to the closest centroid. Then the centroids of the
sample values belonging to each class are computed. Now the data samples are mapped into those categories
having the closest of the new centroids. This entire process is repeated until very few of the sample
values change category on one of the iterations. At that point, the so called "Swain-Fu" distance between
each pair of classes is measured. If the minimum distance between classes is less than some prescribed
distance, the two closest classes are merged and the whole process is repeated with one fewer class.
Otherwise the clustering algorithm terminates. A minimum number of clusters is also specified in order
that merging not continue indefinitely.
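The procedure amounts to an iterative clustering loop with a merge step, which can be sketched as follows. A simple Euclidean distance between centroids stands in for the Swain-Fu distance, and all parameter values are assumptions made for the illustration.

```python
import numpy as np

def find_centroids(samples, n_clusters, min_clusters=4,
                   merge_distance=10.0, change_tol=0.01, max_iter=50):
    """Iterative centroid finding in the spirit of the described procedure.

    samples : (n_samples, n_bands) array of spectral measurements.
    Assignment/recompute iterations run until few samples change class; the
    two closest classes are then merged if they are nearer than
    merge_distance, and the whole process repeats with one fewer class.
    """
    rng = np.random.default_rng(0)
    centroids = samples[rng.choice(len(samples), n_clusters, replace=False)].astype(float)
    while True:
        labels = np.zeros(len(samples), dtype=int)
        for _ in range(max_iter):
            d = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
            new_labels = d.argmin(axis=1)
            changed = np.mean(new_labels != labels)
            labels = new_labels
            centroids = np.array([samples[labels == k].mean(axis=0)
                                  if np.any(labels == k) else centroids[k]
                                  for k in range(len(centroids))])
            if changed < change_tol:
                break
        if len(centroids) <= min_clusters:
            return centroids, labels
        dists = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=2)
        np.fill_diagonal(dists, np.inf)
        i, j = np.unravel_index(dists.argmin(), dists.shape)
        if dists[i, j] >= merge_distance:
            return centroids, labels
        centroids = np.delete(centroids, j, axis=0)   # merge the two closest classes

# Sampled measurement vectors (e.g., every fourth row and column of a scene).
samples = np.random.rand(1000, 4) * 255
centroids, labels = find_centroids(samples, n_clusters=12)
```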
After completing determination of the cluster centroids, the entire 256x256 image is classified. This
is accomplished by determining to which centroid each sample is closest. The output of this program is an
image in which the value corresponding with any pixel is the number of the cluster to which it belongs.
In order to compare the clusters obtained on the original imagery and on the reconstruction of the
compressed imagery, we first determine the correspondence between clusters in the two sets of imagery. To
illustrate what is meant by finding a correspondence between clusters, consider the case when cluster 1 in
the original image represents water, and in the reconstruction water is represented by cluster 3. In order
to save time this correspondence determining program does not find the cluster correspondence which will
minimize the difference in classification but instead first minimizes the difference in classification for
the subarea of pixels belonging to the cluster with the most elements, then minimizes the difference in
classification error for the subarea of pixels belonging to the cluster with the next greatest number of
elements, and so on.
Once the cluster correspondence is made the percentage of picture elements classified differently in
the original imagery and the reconstruction is determined. This number is used as the classification in-
accuracy for the particular bandwidth compression method.
Implementation Complexity
Once it is shown that several compression techniques can significantly reduce the required data rate
without introducing unacceptable distortion, it is necessary to compare the implementation complexity of the
techniques. Compressor complexity is a particularly significant factor for spacecraft applications. Re-
construction complexity on the ground has less effect on overall system costs.
To evaluate complexity, an algorithm was proposed for each technique. Then the required number of
additions and multiplications were counted and the needed buffer sizes determined, based on typical block
sizes and numbers of spectral bands.
Sensitivity to Sensor Effects
The properties of the sensors that gather multispectral earth resources data can be expected to affect
the data structure and thus in turn affect compression performance. As a typical sensor to be used in the
next decade for gathering multispectral earth resources data, we considered the thematic mapper projected
for NASA's EOS. The thematic mapper is a multispectral scanner similar to that on the LANDSAT (ERTS). It
uses photodiodes as transducers and a moving mirror to scan lines perpendicular to the satellite's track.
The sensor properties that can be expected to affect compression are nonuniformity and nonlinearity of
the photodiodes, and geometric distortion and spectral misregistration due to the optics and scanning
patterns.
For a fixed mean square error, the compression obtainable with a particular compression technique can
be related to the correlation coefficient of the process of which the data is a sample function. Based on
the scanning geometry of a likely thematic mapper and on the instrument's present specifications, the
variation in correlation coefficient over an entire image was predicted under the assumption that the data
are samples of a wide sense stationary process. These variations were in turn related to increased mean
square error or equivalently to decreased signal to noise ratio.
Bandwidth Compression Methods for Multispectral Imagery
Multispectral data can be modeled by a stochastic process f(x,y,k) which is a function of three discrete
variables x, y, and k; x and y refer to spatial variables, and k is the spectral variable. In theoretical
studies of image bandwidth compression methods one often models the data as a stochastic process having
statistics identical to those of the imagery so that one can then evaluate the performance of various
encoders on the process. This approach to the analysis of algorithms used for coding video data is often
carried out by assuming that the stochastic model f(x,y,k) is Markov on all three variables x, y, and k.
This would imply that the auto-covariance of the assumed stationary process f(x,y,k) is exponential in all
three dimensions [3], i.e.
R(x,x',y,y',k,k') = exp[-α1|x-x'| - α2|y-y'| - α3|k-k'|]                                    (1)

This is a desirable assumption since it simplifies the analysis of the problem to a considerable degree. The
results are valid for most naturally obtained video data that show ordinary movements in both the spatial and
temporal directions. Considering this model for MSS data one proceeds with the correlation measurements
which specify α1, α2, and α3 in (1). These measurements also verify the correctness and the accuracy of the
above model.
Measurements of the spatial correlation of the data reveal that the correlation of the data in the
horizontal and the vertical directions is indeed exponential. However, spectral correlation does not
reduce exponentially by moving across the bands. Indeed various bands show similarities and differences
which are totally different from interframe behavior of other video data encountered in television or in
reconnaissance flights. The following summarizes the statistics of the MSS LANDSAT imagery.
a) Large positive correlation between the red and the green bands.
b) Large positive correlation between the two infrared bands.
c) Small negative correlations between the red and the two infrared bands.
While these correlations apply on the average they do not apply for many particular classes - water being
one counterexample.
Transform Coding
MSS data is three-dimensional; it possesses spectral as well as spatial correlation. Three-dimensional
transform coding methods take advantage of the correlation of the data in all three-dimensions for bandwidth
compression. In this study we will use three-dimensional Fourier, Cosine, Hadamard and Slant transforms.
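For orientation, a three-dimensional cosine transform of a small (x, y, band) block can be taken with standard library routines. The block size and the rule for discarding coefficients below are arbitrary choices made only to illustrate the idea of three-dimensional transform coding, not the configuration evaluated in this study.

```python
import numpy as np
from scipy.fft import dctn, idctn

def transform_code_block(block, keep_fraction=0.125):
    """Three-dimensional cosine transform coding of one (x, y, band) block:
    transform, keep only the largest-magnitude fraction of coefficients,
    and inverse transform to reconstruct."""
    coeffs = dctn(block, norm='ortho')
    flat = np.abs(coeffs).ravel()
    cutoff = np.sort(flat)[int((1 - keep_fraction) * flat.size)]
    coeffs[np.abs(coeffs) < cutoff] = 0.0
    return idctn(coeffs, norm='ortho')

block = np.random.rand(16, 16, 4)      # 16x16 spatial block, 4 spectral bands
reconstructed = transform_code_block(block)
```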
Delta Modulators and Predictive Coding
In the class of predictive coding methods only DPCM systems are considered. Delta modulators are not
considered because for bit rates higher than one bit per picture element they require sampling the analog
signal at a higher sampling rate. Besides it has been shown that the performances of delta modulators and
adaptive delta modulators are suboptimal to the performance of two-dimensional DPCM encoders for image data
[8]
In addition to DPCM and transform coding techniques the hybrid encoders that combine transform coding
systems with DPCM encoders have been shown to give good performance for both intra-and inter-frame coding
of television signals [9-10]. These systems as applied to MSS data can be divided into three categories;
1) Systems that perform a two-dimensional transform of each band of the MSS data
and use a bank of DPCM encoders to process the transformed data across the
individual bands.
2) The systems that take a transform across the spectral bands and use a two-
dimensional DPCM encoder to process the data in the spectrally transformed domain.
3) Systems that follow a transformation in the spectral domain of MSS data by a second
transform in the horizontal direction (scan direction) and a DPCM encoder in the
vertical direction.
The hybrid systems in category (1) are rejected because most multispectral data have a small number of
bands and this does not allow DPCM systems to reach a steady state. In addition small correlation of the
spectral bands and the absence of the Markovian characteristics limit the performance of the DPCM systems.
Cluster Coding of Multispectral Data
Many important uses of earth resources data rely heavily on the use of computerized classification and
pattern recognition of the multispectral earth resources imagery. For many classification applications
multispectral data is first used to obtain a clustered image. This presents an alternate approach to
compressing the bandwidth of multispectral imagery consisting of classifying the multispectral data on-
board the space vehicle and transmitting only the clustered imagery. The problem with this approach is
that there are many users who are interested in information contained in the individual spectral bands as
well as the classified imagery. For this reason in addition to the classified picture, which contains an
arbitrary number of clusters, additional information should also be transmitted such that individual bands
of the original multispectral data can be reconstructed with an arbitrarily small level of degradation using
the classified imagery and the additional information.
Different approaches to this method of bandwidth compression have been discussed in recent literature
[4,6]. In this study we will discuss the cluster coding algorithm that uses Swain-Fu distance in gener-
ating various clusters.
Coding Methodology
Classification of multispectral data discussed in a previous section is a method of quantization in
the spectral domain. That is, a number of picture elements in the measurement domain are represented by a
single point in the four-dimensional space. This is illustrated in Figure 2 for a two-dimensional space.
Thus, the combination of the clustered imagery and the centroids represent a quantized form of the multi-
spectral data which can be used to reconstruct a quantized form of each individual band. Thus, reduction
of a multispectral image to an image of clusters and centroids for each cluster actually corresponds to
exploiting the spectral correlation of multispectral imagery. The spatial correlation inherent in the
multispectral imagery is preserved, to some extent, in the form of spatial correlation in the classified
imagery. Naturally in addition to spatial correlation of the classified imagery, correlation of the centroids
can be used for further compression of the bandwidth of the multispectral data.
The cluster coding algorithm discussed here uses the following steps for encoding and decoding the
multispectral data.
1. A multispectral image is divided into blocks of 16 by 16 picture elements in each band.
2. Using all spectral components each block is partitioned into a number of classes. A
clustered image and centroids corresponding to each cluster are the output of this step in
the coding operation. The number of clusters in each block can be fixed or it can be allowed
to vary generating differing numbers of clusters for each block. The latter corresponds to an
adaptive bandwidth compression technique that operates at a variable bit rate where the former
could be made to operate at a fixed bit rate. Even when the number of clusters is allowed to
vary from one block to the other the maximum and the minimum number of clusters in each block can
be fixed. This sets a fixed upper-bound and a lower-bound on the output bit rate which can be
used for a more effective control of a storage buffer that has to be used with any variable-rate
encoder.
3. The receiver reconstructs each block of the multispectral image by generating the individual bands
in each block from the clustered image and the corresponding centroids of those clusters. The
procedure is to examine each point in the clustered image and specify to which class it belongs.
Then individual bands corresponding to the particular picture location are reconstructed by
choosing their values equal to the centroid of that particular class.
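The three steps can be put together for a single block as follows. The clustering here is a plain nearest-centroid iteration standing in for the procedure described earlier, and quantization of the centroids is omitted, so the sketch is illustrative only.

```python
import numpy as np

def encode_block(block, n_clusters=8, n_iter=10):
    """Cluster-code one block of multispectral data.

    block : (N, N, B) array of brightness values.
    Returns the clustered image (N, N) of class indices and the centroid of
    each class in all B bands.
    """
    N, _, B = block.shape
    pixels = block.reshape(-1, B).astype(float)
    rng = np.random.default_rng(0)
    centroids = pixels[rng.choice(len(pixels), n_clusters, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = pixels[labels == k].mean(axis=0)
    return labels.reshape(N, N), centroids

def decode_block(clustered, centroids):
    """Reconstruct every spectral band by replacing each picture element
    with the centroid of the class to which it belongs."""
    return centroids[clustered]          # shape (N, N, B)

block = np.random.randint(0, 256, (16, 16, 4))
clustered, centroids = encode_block(block)
reconstruction = decode_block(clustered, centroids)
```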
The centroids are real numbers. Thus, they must be quantized and encoded. We have used the rather
inefficient (PCM) scheme for coding the centroids for the preliminary results reported here. However, each
picture element of the ij-th block in the clustered imagery can be coded using [log2 Cij]+ binary digits,
where Cij is the number of clusters in the block indexed by i,j and [log2 Cij]+ is the smallest integer
larger than log2 Cij. Using [log2 Cij]+ bits per element for coding the ij-th block in the classified
imagery is a rather inefficient use of binary digits, since it does not exploit any part of the spatial
correlation in the clustered imagery. Several methods for efficient coding of the clustered imagery exist.
One is using contour tracing combined with statistical coding algorithms such as Huffman encoding of
directionals and gray levels. The other is use of differential encoders combined with the statistical
coding algorithms. These modifications reduce the bit rate required for transmitting the clustered image
and the centroids and thus correspond to a further bandwidth compression of the multispectral imagery.

The average bit rate per sample per band Rij for the ij-th block of imagery for the cluster coding
algorithm using PCM transmission of the centroids and the clustered imagery is

    Rij = [log2 Cij]+ / B  +  P Cij / N^2                                                   (2)

where 2^P is the number of levels used in quantizing each centroid component, B is the number of bands and
N^2 is the number of samples in each block.
Equation (2) shows that the bit rate is directly related to the number of clusters in each block and the
number of levels that one uses for quantization of centroids. It is inversely related to the number of
samples in each block and the number of spectral bands in the multispectral data. Since one has to allow
for a large number of clusters in each block to recover the details of multispectral imagery the method
becomes more efficient for a large number of spectral bands.
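A worked example of equation (2), with parameter values chosen purely for illustration, makes the trade-off concrete.

```python
import math

def cluster_coding_rate(C, P, B, N):
    """Average bits per sample per band for one block, following the form of
    equation (2): [log2 C]+ / B + P*C / N^2."""
    return math.ceil(math.log2(C)) / B + P * C / (N * N)

# Hypothetical values: 10 clusters, 6-bit centroid components,
# 4 spectral bands, 16 x 16 blocks.
print(cluster_coding_rate(C=10, P=6, B=4, N=16))   # about 1.23 bits/sample/band
```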
The idea of dividing the multispectral imagery into blocks of N2 samples and cluster coding each
block is not essential for the coding algorithm. Indeed if one is interested only in a classified image
using the whole image as one block will result in a very efficient encoder using this procedure. However,
to reconstruct details in various bands requires a large number of clusters, which increases the compu-
tational complexity of the system enormously. On the other hand using relatively small block sizes detailed
information in various spectral bands can be reconstructed rather accurately using a moderate number of
classes.
Experimental Results
Simulation of DPCM, transform, hybrid and clustering methods using the criteria discussed in previous sections leads to recommending the following three bandwidth compression techniques for multispectral imagery:
1. A hybrid method using Karhunen-Loeve (KL) transform in the spectral domain followed by a
two-dimensional DPCM encoder.
2. A hybrid method using KL transform in the spectral domain followed by a cosine transform in the
horizontal and a DPCM encoder in the vertical direction.
3. The cluster coding method.
In methods (1) and (2) the Haar transform is the most suitable deterministic replacement for the KL transform. In (2) the Hadamard or Slant transform can replace the cosine transform with almost unnoticeable effects on the performance of the technique. It was also found that eliminating the spectral transformation affects the performance of methods (1) and (2) by about 1.5 dB, based on the results using the LANDSAT imagery shown in Figure 3.
In the following the recommended techniques are compared using various criteria of optimality such as MSE, signal-to-noise ratio, recognition accuracy, and computational complexity. The performance of methods (1) and (2) is also evaluated in the presence of channel noise and with a variable rate (Huffman) encoder which further reduces the output bit rate without affecting the system performance.
Comparison of the Recommended Techniques Based on MSE and Signal-to-Noise Ratio
Figure 4 shows the performance of the three-dimensional hybrid encoder using the KL-Cosine transforms with the DPCM encoder, in comparison with the other two recommended bandwidth compression systems, using S/N as the criterion of optimality. One is the system that follows the Karhunen-Loeve transform in the spectral domain with a two-dimensional DPCM encoder. The other is the cluster coding technique. The encoder using the KL transformation followed by a two-dimensional DPCM encoder performs significantly worse than the hybrid encoder (KL-Cosine-DPCM) at low bit rates; however, their performances are rather similar at higher bit rates (about 2 bits per sample per band). The performance of the cluster coding method is superior to that of the hybrid encoder at low bit rates, but it gives inferior results at high bit rates. In this respect its performance is the opposite of that of the KL-2 Dimensional DPCM encoder.
Figure 4 also shows the performance of the hybrid (KL-Cosine-DPCM) and KL-2 Dimensional DPCM encoders in concatenation with a variable rate encoder. The variable rate encoder is the Huffman encoder, which reduces the output bit rate by assigning shorter code words to the more probable output levels and longer code words to the less probable output levels. The effect of entropy coding on the performance of the KL-Cosine-DPCM encoder is less than its effect on the performance of the KL-2 Dimensional DPCM system. This is because in the KL-Cosine-DPCM system a bank of DPCM modulators is used, whereas the KL-2 Dimensional system uses a single DPCM modulator. Using a number of DPCM modulators increases the total number of output symbols, thus reducing the number of times each symbol occurs. This reduces the effectiveness of the variable-length encoder in lowering the output bit rate. The end result is that the performance of the KL-2 Dimensional DPCM system improves by an average of 25%, whereas the performance of the KL-Cosine-DPCM system improves by about 10%.
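The entropy-coding stage can be illustrated with a textbook Huffman construction over the DPCM output levels; the symbol alphabet and probabilities below are invented for the example and are not taken from the simulations:

import heapq

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> bit string,
    with shorter words assigned to the more probable output levels."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)            # merge the two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, i, merged))
        i += 1
    return heap[0][2]

levels = {0: 0.50, 1: 0.20, -1: 0.20, 2: 0.05, -2: 0.05}   # hypothetical DPCM output levels
code = huffman_code(levels)
avg_bits = sum(p * len(code[s]) for s, p in levels.items())  # expected bits per symbol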
Comparison of the Recommended Techniques Based on Recognition Accuracy
Figure 5 shows the performances of the three recommended bandwidth compression methods in "preserving the classification accuracy of the reconstructed multispectral imagery." Using this criterion the comparative performance of the KL-2 Dimensional DPCM system and the KL-Cosine-DPCM system is basically the same as it was when signal-to-noise ratio was the criterion of optimality. This shows a strong correlation between the signal-to-noise ratio and the classification accuracy.
Figure 5 also shows the performance of the cluster coding method. The classification accuracy decreases as the number of clusters in each block is increased, up to 16 clusters per block. The classification accuracy then increases sharply when 32 clusters per block are allowed. This behavior is due to the fact that the cluster coding algorithm uses a classification procedure on blocks of 16 by 16 samples, while classification accuracy is measured using a classification procedure on blocks of 256 by 256 samples. A classification accuracy of 100% results if the block size for measuring the classification accuracy and the block size in the cluster coding method are the same. The argument for using large block sizes in classifying multispectral data is not very solid; it depends very much on the users and applications of the multispectral imagery. Large block sizes are likely to be used in applications where one is only interested in a limited number of objects in a large segment of imagery, as in hydrology and urban planning. For such applications the imagery obtained using the cluster coding algorithm has a rather poor classification accuracy compared to the other two recommended bandwidth compression algorithms.
Comparison of the Recommended Techniques Based on System Complexity
The implementational complexity of the selected bandwidth compression techniques is summarized in Table 1. The results are in terms of the number of computations and the memory per picture element. The table also shows typical numbers for 4-channel data and the block sizes used in simulating the compression methods. A detailed analysis of the complexity of the cluster coding method was not performed; but based on the computer time required for compressing the bandwidth of the representative LANDSAT data using this system, and comparing it with the computer time required in simulating the KL-2 Dimensional DPCM encoder, it was concluded that the cluster coding algorithm is about one order of magnitude more complex.
Comparison of the Recommended Techniques Based on Channel Noise and Sensor Effects
Figure 6 shows the reduction in the signal-to-noise ratio and the classification accuracy which is caused by the introduction of a noisy channel. The results refer to a binary symmetric channel with bit-error rates ranging from 10^-4 to 10^-2. Both algorithms degrade significantly at higher bit-error rates. The system using the KL-2 Dimensional encoder has the greater degradation. Figure 7 shows the reconstructed form of band 3 of the LANDSAT data after bandwidth compression using the Haar-2 Dimensional DPCM encoder at bit error rates of 10^-3 and 10^-4. The propagation of the channel error is clearly visible for a bit-error rate of 10^-3. However, this is less significant than the effect of channel error for a simple two-dimensional DPCM system. This is because the recommended system using the Haar-2 Dimensional DPCM encoder utilizes the two-dimensional DPCM encoder in the spatial domain. The signal at the receiver of the two-dimensional DPCM system is then transformed by the inverse of the Haar transform to reconstruct the multispectral data. The total effect of the inverse Haar transform at the receiver is to distribute the channel error (and its propagation) among all spectral bands. Although this leaves the total error unchanged, spreading it over all spectral bands makes it less objectionable to human vision.
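The propagation effect is easy to reproduce with a toy one-dimensional DPCM loop (a sketch only; the previous-sample predictor, the omission of quantization and the single corrupted difference are illustrative assumptions):

import numpy as np

def dpcm_encode(x):
    """Previous-sample predictor; transmit the differences."""
    d = np.empty_like(x)
    pred = 0
    for n, v in enumerate(x):
        d[n] = v - pred
        pred = v                  # quantization omitted for clarity
    return d

def dpcm_decode(d):
    return np.cumsum(d)           # each sample depends on all earlier differences

x = np.arange(20)
d = dpcm_encode(x)
d[5] += 8                         # simulate a channel error in one transmitted difference
print(dpcm_decode(d) - x)         # the +8 error persists from sample 5 onward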
The sensor imperfections affect the correlation of the data which in turn affects the performance of
the bandwidth compression techniques. Analytical results show that this effect is minimal.
Conclusions
The following are the major conclusions of the study:
(1) Fixed rate encoders give excellent quality for the reconstructed imagery at a 4 to 1 compression ratio. This compression ratio is increased to 6 to 1 by making the system operate at a variable rate, by adding a Huffman encoder to the system. At this compression ratio the method using the Haar-2 Dimensional DPCM system has a signal-to-noise ratio better than 35 dB, a classification accuracy better than 91%, and gives reconstructed imagery which is almost indistinguishable from the originals.
(2) At a fixed bit rate and a compression ratio of 8 to 1 the Haar-Cosine-DPCM system gives acceptable results. This corresponds to a signal-to-noise ratio of better than 31 dB, recognition accuracy of better than 82% and slightly degraded imagery.
(3) Cluster coding methods can be used to obtain a compression ratio of 12 to 1, which corresponds to a signal-to-noise ratio of better than 29 dB, recognition accuracy of better than 71% and possibly acceptable reconstructed imagery.
(4) Compression ratios of 16 to 1 and higher correspond to noticeably degraded imagery which is acceptable only for some users.
(5) The effect of the sensor imperfection on the performance of the candidate bandwidth compression methods was minimal. Therefore, it had minimal impact on the choice of the recommended techniques.
(6) Channel error effects are minimal for transform and DPCM systems for bit error rates less than 10^-4.
Figure 1. Partitioning Data in Two Classes Before and After Distorting the Data. (Scatter diagram: Class 1 and Class 2 separated by lines defining partitions Px and Py.)
Figure 2. Each Class has a Centroid Equal to the Average Location of Data in that Class. (Scatter plot of brightness in Band 1 versus brightness in Band 2, showing Class 1 and Class 2 and their centroids.)
Figure 3. Original Bands
Figure 4. Bit Rate Versus Signal-To-Noise Ratio for KL-2 Dim. DPCM, KL-Cosine-DPCM, and Cluster Coding. (Bit rate axis: 0.5 to 2.0 bits/sample/band; SNR axis: 22 to 36 dB.)
Figure 5. Bit Rate Versus Classification Inaccuracy for Cluster Coding, KL-Cosine-DPCM, and KL-2 Dim. DPCM. (Classification inaccuracy axis: 6 to 36%.)
Figure 6. P-P Signal-to-Noise Ratio and Classification Inaccuracy (dashed line) versus Bit Error Rate (0.0, 10^-4, 10^-3, 10^-2), for Haar-2 Dim. DPCM and Haar-Cos-DPCM.
Figure 7. Band 3 - 1 Bit - Haar-2 Dim. DPCM, reconstructed at bit error rates P = 10^-4 and P = 10^-3.
Table 1. Computational Complexity of 3 Bandwidth Compression Methods

Typical values used: B (block size) = 16; T (bits/pixel) = 1; S (spectral bands) = 4; W (samples in each scan) = 3200.

1. 3-Dimensional Hadamard transform and block quantization
   Adds (or subtracts)/pixel: log2(B^2 S) + T
   Multiplies/pixel: 1
   Record storage: 16BSW bits (typical 3,300,000)
   Memory: 44B^2 S + 8000 bits (typical 54,000)

2. 1-Dimensional Karhunen-Loeve transform on the spectral domain followed by 2-Dimensional DPCM
   Adds (or subtracts)/pixel: S + T + 3
   Multiplies/pixel: S + 4
   Record storage: 16SW bits (typical 200,000)
   Memory: 16S^2 + 148S + 8000 bits (typical 17,077)

3. Karhunen-Loeve transform on the spectral domain followed by a Hadamard transform in the scan direction (x-direction) with y-direction DPCM using block quantization
   Adds (or subtracts)/pixel: log2(B) + S + T + 1
   Multiplies/pixel: S + 2
   Record storage: 32SW bits (typical 400,000)
   Memory: 84BS + 32B + 32S(S+1) + 8000 bits (typical 15,000)
REAL TIME TELEVISION IMAGE BANDWIDTH REDUCTION
USING CHARGE TRANSFER DEVICES
H. J. Whitehouse
R. W. Means
E. H. Wrench
Abstract
Recent advances in analog semiconductor technology have made possible the direct sensing and processing of television images. By
combining a charge transfer device (CTD) imager and a CTD transversal filter, real time image sensing and encoding have been achieved
with low power integrated circuits so that digital transmission and bit rate reduction are made possible using differential pulse code
modulation (DPCM). Good mean square error performance and freedom from DPCM artifacts are possible in a hybrid intraframe image
encoder. The hybrid transform encoder performs a discrete cosine transform (DCT) on each line of the television image as it is scanned.
This compacts the variance into low frequency coefficients and the DPCM encodes the corresponding DCT coefficients between successive
lines. Computer simulation of this hybrid coding technique has shown good performance on 256 x 256 pixel images at 0.5 bits/pixel
and channel bit error rates of 10^-2. An experimental system using a low resolution General Electric 100 x 100 charge injection device
camera and a Texas Instruments bucket brigade transversal filter as part of the DCT processor has been constructed and provides good
low resolution image quality at 1 bit/pixel and bit error rates of 10^-3. A high resolution vidicon compatible system is also being
constructed.
Introduction
Unitary transforms for image encoding have been used for intraframe encoding.(1) In addition, these techniques may also be
applied to interframe and multispectral encoding. However, all unitary transformations are information preserving and no bandwidth
reduction results from the application of the transform to the image. Instead, the transforms redistribute the variance associated with
each picture element (pixel) so that subsequent to the transform, basis restricting operations on the transform coefficients will result in
bandwidth reduction. Upon reconstruction of the original image from the basis restricted transform coefficients, a degraded version of the
original image can be obtained. Unfortunately, the interrelationship between the type of transform, the form of the noninvertible oper-
ation, and the type of degradation in the reconstructed image is very complicated and subjective. The universally used analytic criterion
of the mean-square-error is, at present, the best compromise technique for transform comparison.
For the particular operation of basis restriction by truncation, a particularly simple interpretation of the bandwidth reduction can
be made. The transforms may be viewed as a variance redistributing operation that approximately decorrelates the transform coefficients
while transforming the variance associated with each picture element into the low-order coefficients of the transform. Under the assump-
tion that each set of picture elements can be considered as a sample function from a wide sense stationary random process with correlation
function rITI, there exists an optimum discrete transformation, the Karhunen-Loeve transformation, which totally decorrelates the trans-
form coefficients and maximally compacts the variance to the low-order coefficients. All other transformations can be compared in their
performance by comparing their transform coefficient decorrelation and variance compaction with this optimum transformation.
This intuitive interpretation can be made rigorous through the use of the rate-distortion criterion.(2, 3) It has been found from
experience that the closer the eigenvectors of the transformation are to the eigenvectors of the optimum Karhunen-Loeve transformation
the greater the variance is compacted and the more the coefficients can be truncated while maintaining a fixed rate distortion or mean-
square-error.
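This comparison is straightforward to reproduce numerically: for a first-order Markov source with covariance r^|i-j| the eigenvectors of the covariance matrix form the Karhunen-Loeve basis, and their variance compaction can be compared with that of a fixed basis such as the cosine transform. The numpy sketch below is illustrative only; the values of N and r are assumptions.

import numpy as np

N, r = 16, 0.95
R = r ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))   # covariance r^|i-j|

# Karhunen-Loeve basis: eigenvectors of R, ordered by decreasing eigenvalue
lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]

# Orthonormal DCT-II basis (picture-independent substitute for the K-L basis)
k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = 1.0 / np.sqrt(N)

def compaction(T, R, m):
    """Fraction of total variance carried by the m lowest-order coefficients."""
    var = np.diag(T @ R @ T.T)
    return var[:m].sum() / var.sum()

print(compaction(V.T, R, 4), compaction(C, R, 4))   # K-L gives the upper bound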
Transform Encoding
Karhunen-Loeve Transformation
If a continuous time function of zero mean and autocorrelation function R(τ) = e^(-a|τ|) is considered to be a sample function from a wide-sense stationary random process, then this time function can be explicitly expanded by the Karhunen-Loeve expansion(4) and the resulting coefficients will be uncorrelated. For a discrete function of zero mean and autocorrelation function R(τ) = r^|τ|, which may be considered as a sample function from a first-order Markov process, a similar discrete Karhunen-Loeve transformation may be defined.(5) This transformation diagonalizes the covariance matrix and is optimal in the mean-square-error sense for a restricted set of basis functions that do not span the complete space.
The discrete Karhunen-Loeve expansion is given by(5) for the case N = 2m as

    G_k = \sum_{n=1}^{2m} g_n [ 2 / (2m + \lambda_n^2) ]^{1/2} \sin( \omega_n [k - (2m + 1)/2] + n\pi/2 ),    k = 1, 2, ..., 2m        (1)

where

    \lambda_n^2 = (1 - r^2) / (1 - 2r \cos\omega_n + r^2)        (2)

and the \omega_n are the first N positive roots of

    \tan(2m\omega) = - (1 - r^2) \sin\omega / ( \cos\omega - 2r + r^2 \cos\omega )        (3)
Since the discrete Karhunen-Loeve expansion involves both the solution of a transcendental equation and the evaluation of the auto-
correlation function of the data to be transformed, real time computation of this transform is quite complex. However, Habibi and
Wintz(1) have shown that Karhunen-Loeve transformations calculated using approximate autocorrelation functions are satisfactory for
many applications.
Discrete Fourier Transform
Since the discrete Fourier transform is asymptotic to the Karhunen-Loeve transformation(6) for the exponential covariance function
and the basis vectors are picture independent, the Fourier transform represents a logical choice for real time implementation. The Fourier
transform exists for all data lengths N. This is defined by
    G_k = \sum_{n=0}^{N-1} g_n e^{-i 2\pi nk/N},    k = 0, 1, ..., N-1        (4)
Discrete Cosine Transform
Two different types of discrete cosine transform (DCT) are useful for reduced redundancy television image transmission. Both are
obtained by extending the length N data block to have even symmetry, taking the discrete Fourier transform (DFT) of the extended data
block, and saving N terms of the resulting DFT. Since the DFT of a real even sequence is a real even sequence, either DCT is its own in-
verse if a normalized DFT is used.
The "Odd DCT" (ODCT) extends the length N data block to length 2N - 1, with the middle point of the extended block as a center
of even symmetry. The "Even DCT" (EDCT) extends the length N data block to length 2N, with a center of even symmetry located be-
tween the two points nearest the middle. For example, the odd length extension of the sequence A B C is CBAB C, and even length is
CB AAB C. In both cases, the symmetrization eliminates the jumps in the periodic extension of the data block which would occur if
one edge of the data block had a high value and the other edge had a low value; in effect, it performs a sort of smoothing operation with
no loss of information. It will be noted that the terms "odd" and "even" in ODCT and EDCT refer only to the extended data block -
in both cases the extended data block has even symmetry.
Let the data sequence be g_0, g_1, ..., g_{N-1}. The ODCT of g is defined as

    G_k = \sum_{n=-(N-1)}^{N-1} g_n e^{-i 2\pi nk/(2N-1)},    k = 0, 1, ..., N-1        (5)

where

    g_{-n} = g_n    for n = 0, 1, ..., N-1.        (6)

By straightforward substitution it may be shown that

    G_k = 2 Re { \sum_{n=0}^{N-1} g_n' e^{-i 2\pi nk/(2N-1)} }        (7)

where g_n' is defined by equation (8),

    g_n' = 0.5 g_0 for n = 0;    g_n' = g_n for n = 1, 2, ..., N-1.        (8)

The EDCT of g is defined by equation (9), where the extended sequence is defined by equation (10).

    G_k = e^{-i\pi k/(2N)} \sum_{n=-N}^{N-1} g_n e^{-i 2\pi nk/(2N)},    k = 0, 1, ..., N-1        (9)

    g_{-1-n} = g_n    for n = 0, 1, ..., N-1        (10)
If the mutually complex conjugate terms in equation (9) are combined, then equation (11) results. Equation (11) may be viewed as an alternate way of defining the EDCT.

    G_k = 2 Re { e^{-i\pi k/(2N)} \sum_{n=0}^{N-1} g_n e^{-i 2\pi nk/(2N)} } = 2 \sum_{n=0}^{N-1} g_n \cos[ (2\pi/(2N)) (n + 0.5) k ]        (11)
Ahmed(7) has investigated the use of the EDCT as a substitute for the Karhunen-Loeve transform and finds that it is superior to the
Fourier transform and is comparable to the Karhunen-Loeve (K-L) in rate-distortion performance while maintaining the computation
simplicity of a transform which does not depend on the picture statistics. Habibi(8) has shown by simulation that the DCT is equivalent
in a mean-square-error sense to the K-L transform under basis restriction.
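The symmetric-extension construction of the EDCT is easy to verify in software: extend the length-N block to length 2N with even symmetry, take the DFT, and keep N terms. The sketch below (an illustration, not the paper's hardware) checks that this matches the closed-form cosine sum of equation (11).

import numpy as np

def edct(g):
    """Even DCT of equation (9): symmetric extension, then a DFT over n = -N..N-1."""
    N = len(g)
    ext = np.concatenate([g[::-1], g])         # g[-1-n] = g[n], n = 0..N-1
    k = np.arange(N)
    n = np.arange(-N, N)
    phase = np.exp(-1j * np.pi * k / (2 * N))
    kernel = np.exp(-1j * 2 * np.pi * np.outer(k, n) / (2 * N))
    return (phase * (kernel * ext[None, :]).sum(1)).real

def edct_cosine(g):
    """Closed form of equation (11)."""
    N = len(g)
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    return 2.0 * (g[None, :] * np.cos(2 * np.pi * (n + 0.5) * k / (2 * N))).sum(1)

g = np.random.rand(8)
print(np.allclose(edct(g), edct_cosine(g)))    # True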
Hybrid Transforms
Mixed transforms can be used either for intraframe encoding or interframe encoding depending on the available memory and the
type of transforms implemented. In order for a mixed transform to be competitive with one of the conventional two-dimensional trans-
forms, it must offer either superior performance or simplicity of implementation. Of the transforms examined the odd length cosine
transform is competitive in performance since it can be implemented as the real part of a CZT and since the transform samples are real
and are the samples of an autocorrelation function which may then be extrapolated by well-known techniques. In simulations of trans-
form performance, the cosine transform has been shown to closely approximate the behavior of the Karhunen-Loeve transform.
The benefits of mixed transformation implementation and minimum memory may be achieved for digital transmission by combining
a one- or two-dimensional unitary transform with generalized DPCM in a hybrid system. The basis of operation of the hybrid transform
is that the unitary transform decorrelates the image within its constraints of transform type, dimensionality, and block size, while the
generalized DPCM removes the correlation between transform blocks. This hybrid system is particularly attractive for remote sensor
application since it has been found that its performance is approximately as good as the Karhunen-Loeve transform and its implementation
requires minimum memory.(8)
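A minimal sketch of the hybrid idea, assuming a one-dimensional cosine transform along each scan line followed by previous-line DPCM on each coefficient; the orthonormal DCT matrix, the unquantized prediction loop and the image size are illustrative simplifications rather than the system described in this paper:

import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II analysis matrix (rows are basis vectors)."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] = 1.0 / np.sqrt(N)
    return C

def hybrid_encode(img):
    """Cosine transform each line, then DPCM each coefficient between successive lines."""
    C = dct_matrix(img.shape[1])
    coeffs = img @ C.T                                    # one DCT per scan line
    diffs = np.diff(coeffs, axis=0, prepend=np.zeros((1, img.shape[1])))
    return diffs                                          # these would be quantized and coded

def hybrid_decode(diffs):
    C = dct_matrix(diffs.shape[1])
    coeffs = np.cumsum(diffs, axis=0)                     # undo the line-to-line DPCM
    return coeffs @ C                                     # inverse DCT of each line

img = np.random.rand(64, 64)
print(np.allclose(hybrid_decode(hybrid_encode(img)), img))   # lossless without quantization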
Implementation
Computational Modules
A linear transform on sampled data of finite extent may be viewed as the multiplication of a vector by a matrix. Multiplication by
diagonal, circulant, or Toeplitz matrices may be accomplished rapidly with simple computational hardware modules. Multiplication by
an N X N diagonal matrix requires only a scalar multiplier and a memory containing N values to provide serial access to the reference
function. Multiplication by an N X N Toeplitz matrix corresponds to a convolution and may be performed using a transversal filter
having 2N-1 taps. Multiplication by an N X N circulant is a special case of multiplication by N X N Toeplitz matrix in which the length
of the transversal filter may be reduced to N taps if the data block is recirculated through the filter or reread into the filter from a buffer
memory.
One-Dimensional DFT
Linear filters have been used for many years for the calculation of the power spectra of continuous signals. One of the earliest
methods used a bank of wave filters to measure the spectra in fractional octave bands for telephone network equalization.(9) However,
when increased resolution was required the number of filters rapidly become unmanageable. An alternative which overcame the difficulty
of a large number of filters each with small time-bandwidth product was to substitute one linear fm (chirp) filter with large time-bandwidth
product and to employ matched filtering. In this system the signal to be analyzed is used to single sideband (SSB) modulate a locally
generated chirp signal and the composite modulated signal is filtered in a chirp delay line matched filter. Each component of the input
signal spectrum shifts the locally generated chirp to a different position in the spectrum after SSB modulation and these shifted chirps
then correlate with the reference signal represented as the impulse response of the matched filter at different times. Thus the output
signal amplitude-time history reflects the amplitude-frequency composition of the input signal.
Bluestein(10) recognized that the discrete Fourier transform (DFT) of sampled data was amenable to a similar interpretation. In addition to just calculating the magnitude of the Fourier transform, linear filters could calculate the phase, and thus all of the operations such as cross convolution and cross correlation could be calculated. This technique came to be called the chirp-Z transform (CZT) and can be applied to other problems besides just the calculation of the DFT.(11) Prior to these developments, digital computation of the DFT had been significantly improved by the use of a special algorithm called the fast Fourier transform (FFT) which was described by Cooley and Tukey.(12) The FFT algorithm gained rapid popularity in signal processing since it allowed the calculation of the DFT to
be done using significantly fewer machine operations (multiplications) than direct evaluation.
By direct inspection it is observed that, if symmetries of the function exp(-j2\pi nm/N) are not exploited, then the number of complex multiplications required will be N^2, corresponding to N multiplications for each frequency component evaluated. Even on high speed
digital computers this can become the limiting consideration in signal processing applications. The advantage of the FFT algorithm is
that for highly composite values of the DFT size N the number of multiplications is proportional to N log2N.
Although the FFT has been successful in substantially reducing the computing time and cost of using general purpose digital com-
puters it has several disadvantages for special purpose real time computation. At high throughput rates which are required for real time
image processing the processor either must operate log2N times faster than the data rate or pipeline structures which use distributed
memory and log2N multipliers must be used. In addition, the internal arithmetic of the FFT processor must be done at increased preci-
sion in order to compensate for the multiple round off errors introduced by the successive stages in the FFT processor. Although these
difficulties can be overcome, it is not always possible to arrange the computation in a form where the size of the transform is highly
composite. For the above reasons and because of the difficulty of obtaining small, low power, fast analog to digital converters, linear
transversal filter implementations of the chirp-Z transform are attractive,(13) rather than the previous CZT implementation which used
an FFT to perform the required convolution.
The DFT may be easily reduced to the form suitable for linear filtering by the substitution

    2nm = n^2 + m^2 - (n - m)^2        (12)

which changes a product of variables into a difference, so that

    G_m = e^{-j\pi m^2/N} \sum_{n=0}^{N-1} [ g_n e^{-j\pi n^2/N} ] e^{j\pi (n-m)^2/N}        (13)

This form is seen to be equivalent to factoring the Fourier matrix F into the product of three matrices

    F = D T D        (14)

where D is a diagonal matrix with elements d_nn = exp(-j\pi n^2/N) and T is a Toeplitz matrix with elements t_nm = exp(j\pi (n - m)^2/N).
The CZT algorithm is easily implemented by transversal filter techniques. In this case the DFT is computed by premultiplication by a discrete chirp, convolution with a discrete chirp, and postmultiplication by a discrete chirp. Figure 1 shows this configuration. However, it must be remembered that both the multiplications and convolutions are complex and a suitable representation of the complex numbers must be used. One representation is by real and imaginary part. Figure 2 shows the DFT organized as a CZT and implemented with parallel computation of the real and imaginary parts. In Figure 2 the input signal is represented as g = g_R + j g_I and the output signal is represented as G = G_R + j G_I, where it is understood that g = g_n, n = 0, ..., N-1 and G = G_n, n = 0, ..., N-1.
In order to determine the specific form of the transversal filters it is necessary to know the specific value of N. When N is odd the Toeplitz matrix T may be represented as a transversal filter with 2N-1 complex taps h_{-(N-1)} to h_{N-1}, where h_n = W^{-n^2/2}, n = -(N-1) to N-1, and W = exp(-j2\pi/N). The required convolution has been implemented with the general transversal filter shown in Figure 3.
When N is even, it can be shown that T_{n,m} = T_{N+n,m}, where the subscripts are reduced mod N. Thus T is a circulant matrix and can be implemented with a recirculating transversal filter as shown in Figure 4, where the number of complex taps is N and the tap weights are h_n = W^{-n^2/2}, n = 0, ..., N-1.
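The factorization F = DTD can be checked directly in software: premultiply by the chirp exp(-j\pi n^2/N), convolve with the chirp exp(+j\pi n^2/N), and postmultiply by the chirp again. The numpy sketch below (an illustration, not the charge-transfer hardware) verifies that this reproduces the DFT.

import numpy as np

def czt_dft(g):
    """DFT via the chirp-Z decomposition of equations (12)-(14)."""
    N = len(g)
    n = np.arange(N)
    chirp = np.exp(-1j * np.pi * n**2 / N)                          # diagonal matrix D
    kernel = np.exp(1j * np.pi * np.arange(-(N - 1), N)**2 / N)     # Toeplitz taps t_{n-m}
    pre = g * chirp                                                 # premultiplication
    conv = np.convolve(pre, kernel)                                 # transversal-filter convolution
    return chirp * conv[N - 1:2 * N - 1]                            # postmultiplication

g = np.random.rand(16) + 1j * np.random.rand(16)
print(np.allclose(czt_dft(g), np.fft.fft(g)))                       # True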
Charge Coupled Devices
CCDs are sampled data analog circuits which can be fabricated by Metal Oxide Semiconductor (MOS) technology as LSI compo-
nents.(14) As such they are directly compatible with other MOS circuits. Current CCD transversal filters have operated as video devices
with sample rates up to 5 MHz. CCDs operate by the manipulation of injected minority carriers in potential wells under MOS capacitors
and thus behave as capacitive reactances with low power dissipation. However, since the potential wells which contain the minority car-
riers also attract thermally generated minority carriers, there is a maximum storage time for the analog signal which depends on the dark
current associated with the temperature of the silicon. Under normal conditions at room temperature, dark currents are tens of nAmps/
cm2 and storage times of hundreds of milliseconds can be achieved.
There are many ways in which unidirectional charge transfer can be achieved. The first developed was a three-phase clocking struc-
ture which is illustrated in the transversal filter of Figure 5. The three electrode CCD structure is planar, much like the SAW devices, and
the direction of charge propagation is determined by the sequence of potentials applied to the three electrodes. Unfortunately, if the
minority carriers are allowed to collect at the semiconductor-oxide boundary, poor charge transfer efficiency will result due to minority
carriers getting caught in trapping sites. This means that the CCD will behave nonlinearly unless there is sufficient propagating charge
present to fill all of the traps. By biasing the operating condition of the CCD so that about 10% of the dynamic range is used for the
injection of a "fat zero," the traps are kept continuously filled and the device has over a 60 dB dynamic range. In practice, a video signal
representing the signal to be processed is added to a fixed bias somewhat larger than one-half of the peak-to-peak value of the signal.
Figure 1. Chirp-Z Transform Implementation of the DFT (premultiplication by the chirp e^{-j\pi m^2/N}, convolution or circular convolution with the chirp e^{j\pi m^2/N}, and postmultiplication by the chirp e^{-j\pi m^2/N}).
Figure 2. DFT Via CZT Algorithm with Parallel Implementation of Complex Arithmetic.
Figure 3. Transversal Filter.
Figure 4. Circular Convolution.
Figure 5. Schematic of the Sampling, Weighting, and Summing Operation.
Since the effective storage time of the device is long relative to the time required to execute a convolution, CCDs can be considered to be
interruptible signal processors and as such are more compatible with the executive control required for signal processing. A 64 point CCD
filter with discrete cosine transform sine and cosine chirps is shown in Figure 6. This chip was developed by Texas Instruments for the
Naval Undersea Center for image processing. In addition to the four DCT filters, four DFT filters, a Hilbert transform and other experi-
mental signal processing functions were also implemented.
Current research in CCDs is directed toward improving the charge transfer efficiency and removing the requirement of continuous
"fat zero" charge injection by ion implantation techniques which keep the minority carriers away from the semiconductor oxide bound-
ary. Ion implantation is also being used to provide asymmetric potential wells so that simpler two-phase clocking can be employed. Cur-
rently available CCDs have 500 stages with 0.9999 transfer efficiency and devices with up to 2000 stages are planned.
Another charge transfer device similar to the CCD is the Bucket Brigade Device (BBD). This is a sequence of MOS transistors coupled
together by diffusion enhanced Miller capacitance. Although these devices do not operate at frequencies as high as CCDs, they have
better low frequency performance since they include active devices. A CZT has been implemented with two BBD chips. Two 200 tap
filters are implemented on each chip: one a discrete cosine and the other a discrete sine filter. The BBD chip is shown in Figure 7. The
complex chirp used in the premultiplier and a typical input and output are shown in Figure 8. The input is an offset cosine wave and the
output shows a D.C. component plus a response at the cosine wave frequency. These filters can operate at 100 kHz and have tap accura-
cies better than 1%. With careful control of geometry, both BBD and CCD filters with tap accuracies approaching 0.1% should be possible.
This chip was also developed by Texas Instruments for the Naval Undersea Center.
Figure 6. 64 Point CCD Filters.
Figure 7. Bucket Brigade Chirp Filters.
Figure 8. BBD Performance (impulse response, premultiplier chirp, input to DCT, and DCT output).
Discrete Cosine Transform
Let the data sequence be g_0, g_1, ..., g_{N-1}. The ODCT of g is defined in (5) as

    G_k = \sum_{n=-(N-1)}^{N-1} g_n e^{-j 2\pi nk/(2N-1)},    k = 0, 1, ..., N-1        (5)

The identity (15) may be used to obtain the CZT form of the ODCT shown in equation (16).

    2nk = n^2 + k^2 - (n - k)^2        (15)

    G_k = 2 Re { e^{-j\pi k^2/(2N-1)} \sum_{n=0}^{N-1} g_n' e^{-j\pi n^2/(2N-1)} e^{j\pi (n-k)^2/(2N-1)} }        (16)

The block diagram of the ODCT is shown in Figure 9. Since only real inputs and outputs are required, a simplified implementation is possible and is shown in Figure 10.
A corresponding implementation may be found for the EDCT. The EDCT of g is defined by equation (9), where the extended sequence is defined by equation (10).

    G_k = e^{-j\pi k/(2N)} \sum_{n=-N}^{N-1} g_n e^{-j 2\pi nk/(2N)},    k = 0, 1, ..., N-1        (9)

    g_{-1-n} = g_n    for n = 0, 1, ..., N-1        (10)
(Figures 9 and 10: block diagram and simplified real-arithmetic implementation of the ODCT via the CZT, with chirp multipliers e^{\pm j\pi m^2/(2N-1)}.)
Now x_1(i,j), x_2(i,j) and x_3(i,j) are transform coefficients as transmitted, that is, elements of C(l,i,j), whereas x̂_1(0,1), x̂_2(0,1) and x̂_3(0,1) represent estimates, from surrounding data, of the original coefficients in C(l,0,1) before shifting and thresholding. If the x̂_k(0,1) are indeed valid estimates of the true coefficients, then operating on them with S_s[·] and T_t[·] should produce the same set of transmitted numbers {x_k(i,j)}. We next make use of this principle to guide a decision to use x̂_k(0,1) or some modification of x̂_k(0,1).

In order to make more direct use of equations developed earlier we let

    x_k'(i,j) ≜ 4 x_k(i,j)    and    x̂_k'(0,1) ≜ 4 x̂_k(0,1)        (41)

where we recall that x_k'(i,j) is simply the stored integer form of x_k(i,j) (see (15), (17) and Fig. 6).

Making use of (16) and (20), the final form of any estimate x̂_k(0,1) must satisfy

    x_k'(i,j) = 2^{-s_k} T_t[ S_{s_k}[ x̂_k'(0,1) ] ]        (42)

Here we will treat only the case of primary interest where x_1(i,j) = x_2(i,j) = x_3(i,j) = 0.

We define

    Δ ≜ max_k | S_{s_k}[ x̂_k'(0,1) ] |        (43)

and a ≜ t̄/Δ. Then if Δ < t̄ we choose the final form for the two by two array W to be Ŵ. If Δ ≥ t̄ we obtain this output by applying the inverse two by two Hadamard transform to

    a x̂_k(0,1).        (44)

We recognize a as a scaling factor which makes a x̂_k(0,1) satisfy (42).

The condition in (42) is much more restrictive when some of the transmitted coefficients are non-zero. As a result the algorithm presently simulated will tend to accept Ŵ unmodified a large percentage of the time. It basically leaves edges alone.

Observe that when W lies on the border of W_0(l-1) the surrounding data, E_{i,j}, may still be available. Provided the particular W does not lie on the border of a picture, the necessary E_{i,j} can be obtained from the corresponding transform level of adjacent 64 by 64 data arrays. Present simulations have shown that this eliminates noticeable transitions between adjacent data arrays even at very low data rates.
Comments. The adaptive inverse algorithm
described above has not been shown to be optimum
in any fashion. It is however both practical and
effective in improving subjective fidelity and in
reducing mean-square-error. It is felt that both
simulations and analysis could lead to more
sophisticated adaptive versions with worthwhile
additional performance benefits.
RM2 IMPLEMENTATION
Based upon preliminary implementation studies, it was estimated that 300 to 400 CMOS integrated circuits would be required to implement the RM2 compression algorithm in the form described in this report. Such a design would accommodate an 80 kbs input data rate. More restrictive applications could reduce this substantially, although the 300-400 chips is considered quite practical. Continuing advances in solid state technology promise to further minimize the chip count.
It is of interest to note that the transform implementation is almost trivial since it requires no more than an iterative application of a two by two fast Hadamard transform.[6] The rate allocation algorithms could ideally be performed by microcomputers, a likely element in future flight data systems. This was not assumed in the estimates, however. Much higher speeds could be accommodated by using a more parallel structure than the one assumed.
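The transform stage really does reduce to sums and differences. The sketch below applies a normalized 2 by 2 Hadamard butterfly to every 2 by 2 block and iterates on the low-pass quadrant; the quadrant layout, normalization and recursion depth are illustrative assumptions, not the RM2 flight design.

import numpy as np

def hadamard_2x2_level(img):
    """One level: 2x2 Hadamard transform of every non-overlapping 2x2 block."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    return 0.5 * np.block([[a + b + c + d, a - b + c - d],
                           [a + b - c - d, a - b - c + d]])

def iterated_hadamard(img, levels):
    """Iterative application of the 2x2 transform, recursing on the sum quadrant."""
    out = img.astype(float).copy()
    n = img.shape[0]
    for _ in range(levels):
        out[:n, :n] = hadamard_2x2_level(out[:n, :n])
        n //= 2                            # next level works on the low-pass quadrant
    return out

x = np.random.rand(64, 64)
y = iterated_hadamard(x, levels=3)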
SIMULATIONS
A study was recently initiated to investigate
the applicability of AICS to advanced Pioneer
missions which incorporate imaging. Sample
images have been selected for this purpose by
imaging scientists to be representative of
expected mission targets, and via processing,
to exhibit characteristics similar to those of a
planned Pioneer CCD camera. We will present
some preliminary pictorial results.
Mercury Closeup
One of the sample images to be used in this
study is a 512 by 768 closeup of Mercury derived
from the recent Mariner flybys. A standard
stretched display of this scene, including
histograms, is given in Fig. 10. A more reveal-
ing display is shown in Fig. 11. Here a two dimen-
sional filter which tends to equalize average bright-
ness throughout the scene was applied. This permits
a more severe contrast stretch to be used, enhancing
visual detail on film. These operations are
representative of the block labeled "image pro-
cessing" in the overall AICS system diagram in
Fig. 2.
RM2 Compressed
Examples of images compressed by RM2 at rates of 2.0, 1.33, 1.0, 0.66 and 0.5 bits/pixel (compression factors of 4, 6, 8, 12 and 16) are shown in Figs. 12-16 respectively. The same critical display technique of Fig. 11 is used. Except for the rate, all internal parameters of RM2 were identical in each case. These are noted in Table 1. The reader may refer to the appropriate section for further details. In addition, a simple black space detection algorithm was employed (see (4) and (5)) to illustrate the idea.
Table 1. Parameters for Simulation Examples

    Parameters                                      Used for                                        Where
    Gain = 1.0                                      Picture Loop Rate Control, within the           Fig. 3
                                                    imaging sequence
    Gain = 0.35                                     Array Loop Rate Control, within an image        Fig. 4
    e(m), m = 1, 2, ..., 8:                         Rate Allocation to picture arrays               Equa. 6, 7
      0.0, 0.5, 1.0, 1.5, 1.9, 2.2, 2.5, 2.8
    a_MIN = 0.35                                    Rate Allocation to picture arrays               Equa. 6-10
    Initial shifts s1(0) = s2(0) = 2, s3(0) = 1     Obtaining Array Activity; starting point        Equa. 31-34
                                                    for finding s_i
    β1 = 0.2, β2 = 0.3, β3 = 0.5                    Frequency Emphasis                              Equa. 29-32
    ψ1 = ψ2 = ψ3 = 0.6                              Adaptive Inverse Estimation Parameters          Equa. 38
Fig. 10. Stretched Pioneer Test Frame (Test Frame 2, Mercury, 2 km/pixel; reseau removed; stretched; stretched histogram shown).
Fig. 11. Mercury Closeup Original, Filtered.
Fig. 12. RM2 Coded to 2.0 bits/pixel (4:1).
Fig. 13. RM2 Coded to 1.33 bits/pixel (6:1).
Fig. 14. RM2 Coded to 1.0 bits/pixel (8:1).
Fig. 15. RM2 Coded to 0.66 bits/pixel (12:1).
Fig. 16. RM2 Coded to 0.5 bits/pixel (16:1).
Rate distribution. The top and bottom of the
Mercury picture were separately treated as a
sequence of two 256 by 768 images. It is of inter-
est to look at the initial rate allocation to each of
the 48 arrays making up the top half as a result
of the algorithm described in (5) - (9). This
demonstrates one aspect of the adaptive nature of
RM2. The distribution of array activities derived
from (32) is shown in Fig. 17.
Choosing the image rate of 1.33 bits/pixel
as an example (Fig. 13), the application of the
algorithm in (5)-(9) yields the initial rate alloca-
tions in bits/pixel to each of these arrays
shown in Fig. 18.
Observe that the two arrays in the lower
righthand corner have been allocated a rate of
zero bits/pixel. These have actually been edited
(see (5)). No real data has been lost as can be
noted from Figs. 10 and 11. Without the editing
they would have been assigned a rate of around
0.5 bits/pixel, the same as for the low activity
arrays immediately above. The rate assignments
to other areas reflect their relative activity, with
one extremely active array receiving over
3 bits/pixel.
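As a toy illustration of rate allocation driven by activity (this is not the algorithm of (5)-(9); the proportional rule and the editing threshold below are invented for the example, using a few of the Fig. 17 activity values):

import numpy as np

def toy_allocation(activities, target_rate):
    """Assign each 64x64 array a rate proportional to its activity, normalized so the
    active arrays average the target; arrays below an editing threshold get zero."""
    a = np.asarray(activities, dtype=float)
    active = a > 0.05
    return np.where(active, target_rate * a / a[active].mean(), 0.0)

acts = [0.47, 0.43, 0.54, 0.95, 0.31, 0.03]
print(toy_allocation(acts, target_rate=1.33))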
Control loop error. At the completion of the top half picture the rate error (e_i in Fig. 3) was 1125 bits, or 0.006 bits/pixel. All these "unused bits" were used to adjust the rate of the bottom half picture (i.e., a picture loop gain of 1). At all other compression rates |e_i| was quite similar.
Rate/fidelity. Recall that because of the con-
catenated Reed-Solomon/Viterbi Channel in Fig. 2
any rate/fidelity tradeoffs provided by RM2, as
in the above sequence of images, represent true
end-to-end communication system tradeoffs.
Further, the user could specify any intermediate
rate and expect a smooth monotonic relationship
between the rate specified and the fidelity obtained.
These considerations and a preliminary assessment of RM2 performance from these examples and other simulation results give AICS in Fig. 2 an effective average information rate capability of 3 to 5 times that of the Jupiter/Saturn Communication System in Fig. 1. Similar comparisons with an uncoded communication system, such as that of the recent Mariner Venus/Mercury flybys, would yield an AICS information rate advantage of up to 10 dB.
A more extensive evaluation of RM2 when applied to planetary imaging data will come from the Pioneer study. Applications of RM2 to other problems are being investigated and initial results have been excellent. In addition to tailoring the simulation version of RM2 described here to specific applications, further improvements in performance can be expected from extensions and modifications to the algorithm. Particular emphasis here should be placed on expanding the rate allocation concepts and improving the adaptive inverse. Also important is the specification of simpler versions of RM2 for more restrictive applications in which being extremely adaptive is not important.
REFERENCES
1. R. F. Rice, "Channel coding and data compression system considerations for efficient communication of planetary imaging data," Technical Memorandum 33-695, Jet Propulsion Laboratory, Pasadena, California, June 15, 1974.
2. J. P. Odenwalder et al., "Hybrid coding systems study," submitted to NASA Ames Res. Ctr. by Linkabit Corp., San Diego, Calif., Final Rep., Contract NAS2-6722, Sept. 1972.
0.47  0.43  0.54  0.55  0.50  0.42  0.55  0.69  0.95  0.47  0.31  0.03
0.49  0.53  0.49  0.34  0.28  0.33  0.43  0.63  0.63  0.44  0.25  0.07
0.42  0.46  0.60  0.44  0.37  0.41  0.38  0.66  0.62  0.47  0.25  0.02
0.46  0.40  0.37  0.43  0.43  0.51  0.61  0.81  0.40  0.42  0.22  0.02

Fig. 17. Array (64 by 64) Activities for Top Half Mercury Closeup.
1.38  1.29  1.57  1.61  1.48  1.25  1.60  2.06  3.11  1.38  1.00  0.50
1.43  1.55  1.43  1.06  0.93  1.05  1.27  1.88  1.87  1.30  0.88  0.56
1.24  1.36  1.75  1.31  1.13  1.22  1.16  1.95  1.84  1.39  0.86  0.00
1.36  1.22  1.13  1.29  1.29  1.50  1.79  2.52  1.19  1.25  0.82  0.00

Fig. 18. Initial Rate Allocations for Average Rate of 1.33 bits/pixel.
3. J. P. Odenwalder, "Concatenated Reed-
Solornon/Viterbi channel coding for advanced
planetary missions: analysis, simulations
and tests, " submitted to Jet Propulsion
Laboratory by Linkabit Corp., San Diego,
Calif., Final Pep.., Contract 953866,
Dec. 1, 1974.
4. J. A. Heller and I. M. Jacobs, "Viterbi
decoding for satellite and space communica-
tion," IEEE Trans. Commun. Technol. ,
Vol. COM-19, part II, Oct. 1971,
pp. 835-848.
5. "Deep Space Network/Flight Project Interface
Design Handbook," JPL Document 810-5,
Rev. C. Jet Propulsion Laboratory,
Pasadena, Calif., April 15, 1972 (JPL
Internal Document).
6. R. F. Rice, "RMZ: transform operations, 1'
Technical Memorandum 33-680. Jet
Propulsion Laboratory, Pasadena,
California, March 1, 1974.
7. R. F. Rice and J. R. Plaunt,
"Adaptive variable length coding
for efficient compression of space-
craft television data, "IEEE Trans.
Commun. Technol. , Vol. COM-19,
part I, Dec. 1971, pp. 889-897.
ADVANCED FACSIMILE SYSTEM
AND NETWORK APPLICATIONS
John W. Watkins
Harris Corporation, Electronic Systems Division
Melbourne, Florida 32901
Introduction
Laser, acousto-optics and dry silver material technologies were designed into an advanced facsimile system to achieve
high recording speeds, high copy resolution, and chemical-free operation. The basic system was designed so that its per-
formance specifications can be modified over a wide range with minor changes required to enable a broad spectrum of
applications to be addressed.
The technologies that were used and the design features of the facsimile system are reviewed in this report, including
a description of the unique optics bench that was designed to provide electronic control of its performance parameters.
These unique features, along with the modular design of the system, provide the versatility required to satisfy a wide range
of network applications.
Laserfax Design Objectives
Facsimile and hard copy display applications have been found to require a relatively wide range of operational and
functional specifications. While it would be difficult and expensive, if not impossible, to develop a single system that
would satisfy all requirements, an approach has been devised by which a significant segment of the applications can be
addressed through a functionally designed system whose specifications can be modified significantly without requiring a
major redesign.
It was first determined that although the facsimile system to be developed would be capable of transmitting and repro-
ducing printed documents, its principal capability would be to satisfy those applications that require quality photographic
reproductions. Therefore, the system trade-off for a specific application is usually between the desired spatial resolution
in the output copy and the transmission time over a predefined data link. The dynamic range or the number of gray shades
in the reproduced photograph is normally not a significant trade-off parameter. Thus, the next constraint that was placed
upon the design was to require that the resolution of the system be easily modifiable with the document scanning rate being
treated as the dependent variable.
With new telephone, microwave and CATV data links becoming more available, it was also desirable for the overall
speed of the system to be changeable while causing a minimum impact to the basic system design.
In addition to the system versatility described above, the equipment would have many convenient operational features
such as copy reproduction on a dry recording material, low cost per copy, daylight loading of the recording material,
ease of operation with no operator required at the receiver, low maintenance, high reliability, easy access to all sub-
assemblies for service, and others.
System Configuration and Operation
The drawings presented in Figures 1 and 2 show cross-sectional outlines of the major subassemblies in the facsimile
transmitter and receiver, respectively. In the transmitter, the document or photograph to be transmitted is placed in the
input chute. Upon command, the capstan drive translates the document past the scanning laser beam that is projected from
the optics bench and that is brought to focus on the document as depicted in the figure. With the intensity of the laser
beam being held constant from the optics bench, the light that is scattered by the document and that falls onto the detector
is therefore proportional to the reflectivity of the photograph or document being scanned. This video signal is then ampli-
fied and conditioned for transmission over the data link that has been selected for the particular application.
In the receiver, unexposed dry silver paper (3M Type 7771) is fed from the supply cassette around the capstan drive and
past the exposure station where the modulated laser beam from the optics beam is brought to focus. After exposure, the
paper progresses up through the paper feed mechanism towards the heat processor. When the transmission is complete, the
paper is brought forward until the exposed paper passes the cutter bar where it is then cut. The unexposed paper is then
retracted to the exposure station and the exposed sheet progresses through the heat processor where it is developed and
delivered out the exit chute into the stacking tray.
The major subassemblies in the receiver are the capstan drive, the paper supply cassette, the heat processor, the optics
bench and the electronics assembly; and in the transmitter, they are the capstan drive, the optics bench and the electronics
Imagery Transmission System (ITS)
The Harris ITS is a high-speed, photographic-quality, secure, digital
facsimile switched network which is currently installed and can
provide full duplex connections for up to 60 subscribers.
- Secure
- Automatic Unattended Receiver Operation
- Digital T-1 Rates (1.544 Mb/s)
- Automatic Opaque/Transparency Scan
- One Minute Transmission
- 285 Scan Lines Per Inch, 6-Bit Quantization
- Full Duplex, Simultaneous Voice/Data
- Dry Silver Output
- TEMPEST Qualified
- Switch-Selectable Contrast Enhancement
- Multi-Terminal Broadcast Connections
- Loopback Fault Diagnosis
[Network diagram: remote terminal subsystems at remote sites connect through customer furnished equipment (CFE) COMSEC and link subsystems to the central site.]
The central site consists of a computer-controlled
nonblocking space-division solid-state switch
capable of providing point-to-point and broadcast
connections on prearranged subscriber lists. The
ITS terminal consists of a Harris facsimile trans-
mitter/receiver pair and a voice-control unit which
are connected in a network arrangement to provide
full-duplex simultaneous voice and imagery data
over secure T-1 digital lines. The terminal operates
at 285 scan lines per inch and 6-bit pixel quantiza-
tion to provide extremely high quality photographic
images on 3M Dry Silver paper. Transmission time
is one minute over 1.544 Mb/s (T-1) data links
conditioned through standard COMSEC interface
equipment. Various terminal modes are available
including unattended receiver operation. A status
panel provides the operator with continuous status
display of all terminals on the network including
in-service, out-of-service and busy conditions. The
facsimile transmitter is capable of automatically
detecting and scanning either opaque paper copies
or transparent film copies. Up to four object
enhancement curves with PROM density transfer
functions are provided to modify the imagery for
shadow stretch or glint enhancement. Several loop-
back modes are designed to ease maintenance by
quickly isolating system and terminal faults.
HARRIS CORPORATION Government Systems Group
Communications Systems, P.O. Box 37, Melbourne, FL 32901, (305) 727-4311
Fig. 1. Facsimile transmitter system configuration. [Diagram labels: input chute, drive capstan, read detectors, laser, drive motor, optical bench, focal point, exit chute.]
Fig. 2. Facsimile receiver system configuration. [Diagram labels: paper supply, cutter, paper feed mechanism, drive capstan assembly, heat processor, laser, optical bench, focal point, exit chute, electronics.]
assembly. Each of these subassemblies has many unique features that could be discussed at length. However, in keeping
with the purpose of this report, only those features that help clarify the system approach, operation and versatility will be
discussed here.
The speed of the paper in both the transmitter and receiver can be modified over a wide range by either replacing the
drive motors or part of the drive assembly. In many cases, the desired speed can be achieved by simply changing a drive
belt and pulley. Thus, the desired paper speed for a particular application can readily be achieved.
For applications that require data rates that are significantly higher than can be transmitted over conditioned telephone
lines, a broadband single-element silicon photodetector is used. The mechanical mounts for both detectors are compatible
with the assembly.
The electronics subassembly is functionally designed with unique functions being isolated to individual circuit cards.
This permits a specific function to be modified by interchanging the corresponding card. The entire electronics assembly,
including the motherboard, can be replaced with another assembly to satisfy those applications with significantly different
system specifications.
Although the optics bench was designed so that it can be modified to satisfy a wide range of applications, it possesses
a few unique features that enable many requirements to be satisfied with little or no modification. The key to
the versatility of this facsimile system lies with the optical system design. Therefore, it will be discussed in more detail
throughout the remainder of this paper.
Optical System Description
The basic optical system is the same for both the facsimile transmitter and receiver. It is a straightforward coherent
optical system design consisting of a 2 mW helium neon laser, an acousto-optical modulator, focusing lenses and a galva-
nometer. This configuration is shown in the conceptual diagram that is presented in Figure 3 and in the optical system
schematic that is presented in Figure 4. A long photodetector is mounted on the outside of the optics bench in the trans-
mitter to detect the light that is scattered off of the document that is being transmitted.
This basic optical system was designed to scan and reproduce photographic documents with a spatial resolution across
an 11-inch or less flatbed scan of up to about 300 TV lines per inch within a transmission time of about 1 minute or longer
(8.5- by 11-inch document). With 3M Type 7771 dry silver paper as the output medium the system is capable of reproduc-
ing up to about 12 shades of gray (IEEE facsimile chart) in the output copy.
As will be discussed later, the spatial resolution can be substantially increased above 300 lines/inch by making
relatively inexpensive modifications to the optical system. The transmission time can also be reduced for many applications
by utilizing a 6 mW laser that is mechanically interchangeable with the 2 mW laser. More expensive coatings can also be
used on the optical elements to further improve the overall efficiency of the optical system. The light intensity at the
paper can be increased by using a more efficient AOM than is used in the basic system.
Refer to the optical system schematic shown in Figure 4. The laser beam from the 2 mW laser is projected through the
acousto-optical modulator (AOM) and brought to focus on the optical axis by lens L1. This focused spot is reimaged by
Fig. 3. Laserfax conceptual optical system diagram. [Diagram: 2 mW He-Ne laser, acousto-optic modulator, focusing lens, galvanometer with rotating mirror, exposing station, paper roll, heat processor.]
Fig. 4. Laserfax optical system schematic (typical parameters). [Schematic: the laser beam (0.9 mm diameter at the 1/e2 points) passes through the acousto-optic modulator and lenses L1 and L2, is expanded to 4.6 mm, and is deflected by the galvanometer through 7.2 mradians; the galvanometer is 593 mm and lens L2 is 639 mm from the paper.]
lens L2 onto the paper after the beam has been reflected by a mirror that is attached to the shaft of the galvanometer. Sta-
tionary mirrors that are used to fold the optics path into a rigid and compact optics bench are not shown in Figure 4. A
beam splitter that is located between the AOM and lens L1 is also not shown in this figure. Its function is to reflect part
of the laser beam onto a small photodetector to provide a feedback loop to the AOM that is used to normalize the output
power of the laser.
In the mechanical package of the optics bench lens L2 is hard-mounted a distance of 639 mm from the paper and the
galvanometer is located a distance of 593 mm from the paper. Lens L1 is attached to an axially adjustable mount for two
reasons. First, this provides a focus adjustment in the system after the bench has been assembled. The second but most
significant reason is that lenses with different focal lengths can be mounted in this location to provide a wide range of
spot sizes at the paper. As will be discussed later, this is a significant feature when a high resolution system is to be
achieved.
The diameter of the laser beam to the left of the lens L2 is normally about 0.9 mm (1/e2 points). In the system shown
in Figure 4 the beam has been expanded to 4.6 mm by the lens combination. The diameter of the focused spot at the paper
is therefore
d0 = 4λF/(πD) = 0.112 mm = 0.0044 inch.
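As a check of this spot-size arithmetic, the short Python fragment below recomputes the focused spot diameter from the Gaussian-beam relation d0 = 4λF/(πD); the 632.8 nm He-Ne wavelength is an assumption on our part, while the 639 mm lens-to-paper distance and the 4.6 mm expanded beam diameter are the values given in Figure 4.

    # Illustrative check of the focused-spot arithmetic; not part of the original system description.
    import math

    wavelength_mm = 632.8e-6    # He-Ne laser wavelength (assumed), expressed in mm
    lens_to_paper_mm = 639.0    # lens L2 to paper distance (Figure 4)
    beam_diameter_mm = 4.6      # expanded beam diameter at the 1/e^2 points (Figure 4)

    spot_mm = 4.0 * wavelength_mm * lens_to_paper_mm / (math.pi * beam_diameter_mm)
    print(round(spot_mm, 3), round(spot_mm / 25.4, 4))   # about 0.112 mm, 0.0044 inch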
It is clear that if this spot is used to reproduce photographs at a scan line density of from 100 to 200 lines/inch, a
significant raster structure would be present in the output copy. To prevent the raster structure and to provide added ver-
satility to the optical system, the focused laser spot at the paper is deflected perpendicular to the galvanometer scan
direction at a very high rate. This high rate vertical scanning is referred to as spot dither and is achieved with the AOM.
The drive signal to the AOM is amplitude modulated with the video signal in the receiver but its amplitude is held constant
in the transmitter. It is also frequency modulated in both the transmitter and the receiver to achieve spot dither at the
paper. Figure 5 shows the effective spot profiles in both the horizontal and vertical dimensions with dither providing spot
elongation in the vertical dimension. The parameters presented in the figure are for a typical system with a scan line density
of 110 lines/inch.
It is clear that the line height can be decreased by reduc-
ing the frequency deviation of the FM signal to the AOM and
the line density can be increased by either driving the paper
at a slower speed or by increasing the scan rate of the gal-
vanometer. Hence, the resolution of the facsimile system is
now electronically controlled. The speed of the galvanometer
is directly proportional to the video signal bandwidth. And,
the speed of the paper determines the transmission time. These
electronically controlled system performance parameters are
the trade-offs that provide the versatility that is required to
enable a wide range of applications to be readily addressed
with this basic facsimile system.
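The sketch below illustrates these trade-offs with hypothetical numbers; only the proportional relationships (line density set by the ratio of galvanometer scan rate to paper speed, transmission time set by paper speed) are taken from the text, and the specific rates are assumed for illustration.

    # Hypothetical numbers illustrating the electronically controlled trade-offs described above.
    scan_rate_lines_per_s = 17.0          # galvanometer scans per second (assumed)
    line_density_lines_per_in = 110.0     # desired scan line density (typical value from the text)
    paper_speed_in_per_s = scan_rate_lines_per_s / line_density_lines_per_in
    paper_travel_in = 8.5                 # assumed paper travel for an 8.5- by 11-inch document
    transmission_time_s = paper_travel_in / paper_speed_in_per_s
    print(round(paper_speed_in_per_s, 3), round(transmission_time_s))
    # Halving the paper speed doubles both the line density and the transmission time;
    # raising the galvanometer rate raises the line density and, with it, the required
    # video bandwidth.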
Fig. 5. Typical spot profile for Laserfax system. [Plot of the across-scan and along-scan spot profiles versus distance in thousandths of an inch, with the line height indicated relative to the scan direction.]
AN OPERATIONAL VIDEO DATA COMPRESSION SYSTEM FOR ATS AND ITOS
R. L. Kutz
NASA
Goddard Space Flight Center
Greenbelt, Maryland
L. D. Davisson
University of Southern California
Los Angeles, California
Abstract
An operational data compression system has been developed and implemented for transmission of digitized ATS
and ITOS-VHRR satellite video data over the wideband communication link between the Wallops Island, Va. Command
and Data Acquisition Station and the National Environmental Satellite Service at Suitland, Md. This system uses mini-
computers for the coding and decoding of the data to achieve maximum flexibility together with specially designed inter-
face equipment for greater efficiency. No loss in data quality occurs due to the compression, and, in certain cases,
data is transmitted which would be otherwise unavailable due to the limited channel capacity. This paper describes the
method of compression, the equipment used, and the compression results attained.
Summary
A real-time programmable data compression system has been developed for operation over a digital wideband com-
munication link between the National Environmental Satellite Service (NESS) Wallops Island, Virginia, command and
data acquisition station and the NESS Data Processing and Analysis Division at Suitland, Maryland. The NESS is part
of the National Oceanic and Atmospheric Administration (NOAA) of the Department of Commerce. The source data
consist of satellite derived images of the earth from which environmental data are extracted by NESS and others. Four
spacecraft have been used to provide the source data. The first imagery data used came from the Applications Tech-
nology Satellite (ATS) 1 or 3 Spin-Scan Camera and the imagery currently being employed comes from the NOAA 2 or 3
spacecraft Very High Resolution Radiometer (VHRR). The imagery data are received at Wallops Island, Virginia, com-
pressed, transmitted over a 300 km wideband ground link to Suitland, Maryland and reconstructed in real-time. Using
general purpose minicomputers together with specially designed input/output channels, a reduction in channel time-
bandwidth requirements of two-to-one can be attained with zero loss of data quality with respect to the quantized uncom-
pressed data. As far as the final user is concerned, the system is invisible. Because of the tremendous flexibility of
the minicomputers, it is possible to vary the compression algorithm and to perform auxiliary tasks such as data for-
matting. The compression system has seen continuous use 18 hours a day from December 1972 through July 1974.
Introduction
For many applications, technology has created the ability to generate ever increasing amounts of data. With the
generation of this data comes the increased cost of transmission, storage, and processing in proportion to the greater
channel capacity required. Data compression is a highly promising technique which can be applied to reduce the
increased capacity requirements through more efficient coding of source data. Such techniques have been studied for
the last ten years but as yet have had little impact on actual operations due, at least in part, to the increased hardware
complexity, lack of appreciation by operational people, and reluctance on the part of users to allow their data to be
subjected to any more manipulation than is absolutely necessary. The time has come, however, when economic con-
siderations necessitate the careful review and evaluation of techniques that suggest a more cost effective use of the
channel storage and computer capacity, e.g. data compression. This paper describes one important application to an
operational situation.
Large quantities of earth-observational data are now generated and it is possible to apply sophisticated processing
to that data for many purposes. Principal among these data are meteorological observations for the many aspects of
weather prediction with their obvious economic benefits. The use of satellites to acquire images of the earth on an
operational basis has dramatically increased the quantity and quality of data involved in weather prediction. The NOAA
image data are transmitted from two Command and Data Acquisition (CDA) stations to a central location for computer
processing. No processing capability exists at the CDA stations. The particular link used for the data compression
system to be described is a microwave link between the CDA station at Wallops Island, Virginia and the Data Process-
ing and Analysis Division of NOAA/NESS at Suitland, Maryland. This link has been used operationally since 1969 for
digital ATS spin scan image data without data compression. The recent deployment of the Synchronous Meteorological
Satellite (SMS) involves the transmission of higher resolution data than currently installed NOAA/NESS ground trans-
mission equipment can handle. It is apparent that the channel capacity needed to transmit meteorological data will con-
tinue to increase as the multiple SMS and TIROS-N spacecraft go into operation.
With the required large channel capacity, data compression provides a method for significant economic savings in
transmission costs. Basically, data compression in this application is based on the fact that data from TV pictures of
the earth have a significant amount of redundancy from one resolution element to the next. Nevertheless, since it is
not known, a priori, where the redundancy will exist within the picture, the data transmission system without data
compression is generally designed as if no redundancy exists. In effect the system transmits each resolution element
independently. Data compression techniques involve the processing of the picture data to remove redundancy prior to
transmission so that a reduced quantity of data is transmitted. Upon reception of the compressed data, processing
reinserts the redundancy to reproduce the original image data.
Because of the potentiality of data compression, a feasibility study was initiated by the University of Southern
California in participation with and under contract to NASA Goddard Space Flight Center with the cooperation of NESS.
Theoretical studies complemented by an empirical analysis of taped digital data verified that data compression could
be applied at a significant economic savings to ATS video data. After the NOAA-2 spacecraft launch in October of 1972,
the ATS derived data compression algorithm was applied to the NOAA-2 Very High Resolution Radiometer (VHRR) data.
A performance comparison verified that the ATS data compression technique could also be used with VHRR data to pro-
duce a significant economic savings. A more detailed description of the data, the compression techniques, and the
equipment is contained in the succeeding sections.
Data Source
Two types of spacecraft have been used to provide the source data. The first type is the ATS (1 or 3) spacecraft
in geostationary orbit (height 35,780 km, period 24 hours) and the second is the NOAA (2 or 3) spacecraft in sun-
synchronous polar orbit (height 1460 km, period 115 minutes).
The ATS 1&3 spacecraft are spin stabilized with a spin rate of 100 rpm and with the spacecraft spin axis parallel
to the earth's spin axis. The ATS data used is from the ATS spin scan camera which sweeps one image line per space-
craft revolution. The ATS spin scan camera image line period is determined by the spin period of the spacecraft,
nominally 600 milliseconds. To permit the full earth disc to be scanned from pole to pole, a worm drive moves the
image field of view via the scan optics so that the optical axis stays in the plane containing the spacecraft spin axis. The
two mile image field of view is moved southward two miles for each revolution of the ATS spacecraft. A full earth
image takes 24 minutes to scan 2400 image lines. Since the ATS 1&3 spacecraft are in synchronous orbit, the earth
subtends an angle of 18 degrees or 1/20th of a revolution. The video data are derived from a photomultiplier tube in
the spin scan camera which points at the earth for only 30 msec. out of each 600 msec. scan period. The data are
transmitted in analog form to the CDA station where it is digitized for transmission to the central data processing loca-
tion. A digital synchronizing circuit uses a spacecraft derived sun pulse to produce a signal to indicate when the A/D
converter should begin to operate for the 30 msec. of useful data out of each 600 msec. spin period.
The sampling rate and digital resolution depend upon the user requirements and the transmission capability.
Nominal characteristics are 4096 samples per line, each quantized to one of 64 levels and represented by a six-bit binary
word. The communications link is a 48 kHz wideband group with Western Electric 303C modems operated asynchron-
ously without a scrambler.
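As a check on the arithmetic implied by the last two paragraphs, the short Python fragment below recomputes the line period, the earth-view interval, the full-disc frame time, and the average uncompressed bit rate over a line period; the bit-rate figure is our own illustrative calculation and is not quoted in the paper.

    # Illustrative arithmetic for the ATS spin-scan source, based on the figures above.
    spin_rpm = 100.0
    line_period_s = 60.0 / spin_rpm                  # 0.6 s per spacecraft revolution
    earth_view_s = line_period_s * 18.0 / 360.0      # earth subtends 18 degrees -> 30 ms of useful data
    frame_time_min = 2400 * line_period_s / 60.0     # 2400 lines -> 24 minutes per full-disc image

    samples_per_line = 4096
    bits_per_sample = 6
    bits_per_line = samples_per_line * bits_per_sample   # 24,576 bits
    avg_rate_bps = bits_per_line / line_period_s         # roughly 41 kb/s averaged over a line period

    print(line_period_s, earth_view_s, frame_time_min)
    print(bits_per_line, round(avg_rate_bps))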
The NOAA spacecraft are 3-axis stabilized so that the scanning radiometers always point earthward. The space-
craft rotates about its pitch axis once per orbit with the pitch attitude being nominally maintained to within ±0.5 degrees.
The NOAA spacecraft data used with the data compression equipment come from a scanning radiometer called the Very
High Resolution Radiometer (VHRR). The two-channel scanning VHRR instrument is sensitive to energy in the visible
spectrum (0.6 to 0.7 microns) and infrared window (10.5 to 12.5 microns). Within the VHRR instrument, energy is
gathered by a 12.7 cm elliptical scan mirror, set at an angle of 45 degrees to the scan axis, and a telescope. The scan
mirror rotates at 400 rpm which causes the half nautical mile image field of view to sweep the earth in a direction per-
pendicular to the NOAA spacecraft ground track once every 150 msec. One image scan line is generated per revolution
of the VHRR mirror. The consecutive scan lines of the VHRR data can be used to produce an image due to the space-
craft motion along the orbital track which causes the subsatellite point to move one half nautical mile per 150 msec.,
one scan line period. The VHRR instrument produces a continuous stream of scan line image data without frame seg-
mentation. Since the NOAA spacecraft are in a low altitude orbit compared to ATS 1&3 the earth is viewed for one third
of each VHRR scan or about 50 msec. The VHRR data are handled in two ways. First, the VHRR data are continuously
transmitted throughout every orbit. This mode of operation is known as the High Resolution Picture Transmission
(HRPT) and at the ground station the data are designated HRPT data. Second, in addition to the HRPT mode of opera-
tion up to 8.5 minutes of data may be recorded onboard the spacecraft for later playback when commanded during a CDA
contact. The spacecraft recorded VHRR instrument data are designated VREC data. The VREC data are played back
from the reel-to-reel spacecraft recorder by driving the tape in the reverse direction and at the same speed as that
used for recording. At the CDA station the VREC data differ from the HRPT data in that the time axis is reversed, i.e.
the VREC data are backwards with respect to the HRPT data. Tape speed variations affect both timing and amplitude of
the playback VREC data (FM recording is used), however the timing error is much more noticeable in the displayed
image without compensation. Image compensation is provided for by recording a reference tone along with the VREC
instrument data on the spacecraft recorder. During playback the reference tone is transmitted on a separate subcarrier
to permit its use on the ground in compensating for tape recorder flutter and wow (F&W). The space-to-ground NOAA-
2&3 communications link uses FM-FM modulation. At the CDA station the NOAA spacecraft data are demodulated to
produce FM modulated HRPT data, VREC data, and the VREC F&W reference signal. These three signals plus a CDA
crystal controlled reference tone are recorded on an instrumentation ground recorder in the 'direct mode' at 60 inches
per second (ips). All of the NOAA-2&3 data processed by the data compression system is obtained from the CDA
instrumentation recorder operating at one fourth the record speed or 15 ips. The VREC F&W signal passes through
two record/playback steps and the HRPT F&W signal through one step since it is generated locally at the CDA station.
No attempt is made to correct the amplitude variations introduced in the FM modulated HRPT & VREC data signals
due to the F&W of the analog tape recorders. However the F&W signals are used to control the sampling rate of the
analog to digital (A/D) converter and this corrects for timing variations. The A/D converter is part of the data com-
pression equipment and is preceded by an FM discriminator, figure 1.
Figure 1. Wallops Island Data Compression System. [Block diagram: ATS and NOAA spacecraft receivers; the HRPT and VREC signals are recorded on an analog instrumentation recorder at 60 ips and played back at 15 ips through an FM demodulator; a synchronizer and A/D converter feed a buffer and input channel to the Interdata 4 minicomputer, which also connects to an ASR 33 teletype and a 9-track 800 CPI magnetic tape; the output channel drives a 303C modem to Suitland.]
Implementation Requirements and Constraints
In the case of the ATS data compression experiment an operational digital transmission system had been installed
in 1969 and was in daily use between the Wallops CDA station and the Suitland NESS central data handling facility when
the data compression experiment was conceived. At Suitland the ATS data is used to produce hundreds of wind vectors
over the oceans on a 400 kin grid to be used as one type of input to the primary NOAA numerical weather model. The
ATS data is also converted to an analog facsimile format which is transmitted over 4 khz phone tines with C5 condition-
ing to remote weather analysis centers e.g. the hurricane watch center in Miami, Florida and the tornado watch center
in Kansas City, Kansas. In an on-line weather data transmission system, particularly where the data is being used to
monitor severe storms, reliability is very important. For this reason, it was necessary to test the data compression
system in an off-line mode before it could be used to transmit real-time data in an operational mode. Since the non-
data compression operational ATS Data handling system between Wallops and Suitland handles digital data quantized to
six bits per sample in a gray code, it is necessary to deliver reconstructed data from a data compression system in the
same format. NESS also requested that the data compression system not be allowed to introduce errors in the samples
due to the compression algorithm. The data compression system is required to be transparent to the receiving terminal
at Suitland and to the data user. To provide for ease of testing and rapid recovery in the event of a malfunction, the data
compression system is designed to be inserted or removed from the transmission system by the operation of a single
switch at each end of the wideband communications link. The operation of the data compression system is designed to be
simple to insure that operator errors are minimized. Several versions of the software interface between the operator
and the data compression hardware were required to achieve a satisfactory level of operation.
The NOAA spacecraft data can be effectively handled by preprocessing to provide an improved line synchronization,
calibration, and gray scale correction. The image display device used with the NOAA spacecraft data is the same as
that used for the ATS data and all users on the ATS facsimile circuit can receive NOAA 2&3 VHRR images. Since it is
desired to produce a 1:1 aspect ratio in the center of the VHRR images and the line to line spacing is fixed by the worm
drive in the photofax display, it is necessary to select a sampling rate which allows 3/4 of the earth view to be dis-
played. This is a satisfactory compromise since the geometric distortion of the image is significant beyond the area
displayed, due to the low 1460 km orbit of the NOAA spacecraft. The photofax display available for the VHRR data is
the same one used to display ATS image data and operates at 100 rpm. At the CDA station the VHRR data are received
at a 400 line per minute rate and recorded on a Mincom instrumentation recorder at 60 ips. It would have been prefer-
able to digitize the VREC data at the CDA station as it was received from the spacecraft; however, the ATS data com-
pression hardware cannot handle the high data rates necessary. A high speed large volume digital data buffer would
be required and funds were not available for its procurement. The analog Mincom recorder is used to replace the
digital buffer and to slow down the VHRR line rate from 400 to 100 lines per minute by playing back the tape at 15
ips. Two types of operations are required for the VHRR data handling; the first type transforms the VHRR data into
Figure 2. Suitland Data Reconstruction System. [Block diagram: the 303C modem from Wallops Island feeds a bit synchronizer and input channel into the Interdata 4 minicomputer with its ASR 33 teletype and 9-track 800 CPI digital tape; the output channel drives the picture processor, which sends digital data to the NESS computer and analog video data to Kansas City and Miami.]
a format compatible with the ATS digital display and data handling equipment. This has the advantage of requiring
only half of the processing equipment, namely the Wallops end of the compression system. The second type of operation
required is that digital VHRR data be quantized to 8 bits and transmitted to Suitland. An additional requirement of the
second type of operation is that 100% of the VHRR scan be reconstructed at Suitland while only 50% was needed in the
first type of operation. Since the use of data compression could accomplish the data volume objectives of the second
type operation, it was further required that no error be introduced in the reconstructed data due to the compression
algorithm in the absence of channel errors.
Data Compression Techniques
The goal set for the data compression system is to attain a reduction in transmission rate of two to one with six
bit ATS data or eight bit VHRR data and not introduce error due to the compression algorithm. The algorithm used
has been chosen to meet the real-time requirements for error free compression within the limitations of the Interdata
4 minicomputer used at each end of the compression system. The data compression algorithm can be varied within
wide limits because of the use of general purpose minicomputers to do the encoding and decoding. Presently the
algorithm forms successive sample-to-sample differences and encodes the differences of up to a maximum magnitude
of 9 for six bit data or 14 for eight bit data using a variable length code. When the difference exceeds the maximum
magnitude, the sample value itself is sent with an appropriate prefix to distinguish this from a difference code word.
The code words are chosen using the Shannon-Fano encoding procedure with empirical frequencies for the relative
probabilities. In particular, a difference greater than 14 is encoded by sending a prefix of four zeros followed by the
sample from the data.
The above procedure was found to result in a code whose efficiency is greater than 90%. The code words are
characterized by a prefix whose leading zeros identify each code word's length and decoding table.
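The following Python sketch illustrates the structure of this difference coding for eight-bit data. The actual Shannon-Fano code words are not listed in the paper, so the word assignments below are hypothetical; only the structural features described above are reproduced, namely short words for small differences, a leading-zero count that identifies each word's class, and a prefix of four zeros that escapes to the raw eight-bit sample when the difference magnitude exceeds 14.

    # Sketch of the sample-difference encoder described above, for eight-bit data.
    # The code words are hypothetical stand-ins; only the structure follows the text.

    def encode_difference(d, sample):
        if d == 0:
            return "1"
        sign = "1" if d > 0 else "0"
        mag = abs(d)
        if mag == 1:
            return "01" + sign
        if mag == 2:
            return "001" + sign
        if mag <= 14:
            return "0001" + sign + format(mag, "04b")
        return "0000" + format(sample, "08b")      # prefix of four zeros, then the sample itself

    def encode_line(samples, start=0):
        prev, bits = start, []
        for s in samples:
            bits.append(encode_difference(s - prev, s))
            prev = s                               # lossless: the receiver tracks the same value
        return "".join(bits)

    print(encode_line([100, 101, 101, 99, 60, 61]))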
When redundancy is removed, the data quality becomes more sensitive to channel error. To combat this effect,
controlled redundancy is added in the form of synchronization words. In addition to the usual line synch., data synch.
words are periodically inserted in the encoded data after each fixed number of encoded samples. Following each synch.
word the difference from zero is sent rather than the sample difference to eliminate any error accumulated at the re-
ceiving terminal. Each scan line starts with a 32-bit line synchronization word, which establishes both line and word
synchronization. The 4096 data samples in each scan line are next encoded into 16 blocks with each block represent-
ing 256 data samples. At the end of each block is a 16-bit data synchronization word which ensures that word syn-
chronization is maintained and that word synch. can be reacquired within 256 data samples in the event of a synch.
loss. Due to the variable length coding, the number of bits between two successive synchronization words varies.
Although the data compression encoding procedure selected leads to a data format where word synchronization depends
on a low probability of channel error, experience has shown that the data compression system has a more robust syn-
chronization characteristic than an image transmission system in which there is only one line synchronization word on
a scan line and in which word synchronization is obtained from a fixed word length data format. The variable block
length and hence variable line length data format produces a variable number of bits per image scan line; the number
of bits depends on the redundancy in the scene at the scan line location.
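A minimal sketch of this framing is given below, assuming the block structure described in the text; the 32-bit line-sync and 16-bit data-sync bit patterns are placeholders, since the actual patterns are not given, and encode_block stands for a difference encoder such as the one sketched earlier, restarted from zero at the head of each block.

    # Sketch of the line framing described above; sync patterns are placeholders.
    LINE_SYNC = "1" * 32          # placeholder 32-bit line synchronization word
    DATA_SYNC = "1" * 16          # placeholder 16-bit data synchronization word
    SAMPLES_PER_BLOCK = 256
    BLOCKS_PER_LINE = 16          # 16 x 256 = 4096 samples per scan line

    def frame_line(samples, encode_block):
        assert len(samples) == SAMPLES_PER_BLOCK * BLOCKS_PER_LINE
        out = [LINE_SYNC]
        for b in range(BLOCKS_PER_LINE):
            block = samples[b * SAMPLES_PER_BLOCK:(b + 1) * SAMPLES_PER_BLOCK]
            out.append(encode_block(block, start=0))   # first difference is taken from zero
            out.append(DATA_SYNC)                      # resynchronization point every 256 samples
        return "".join(out)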
In figure 3, the image scan line number runs from 0 to 2000 on the abscissa, and the ordinate indicates the per-
centage of non-redundant data from the original uncompressed line remaining to be transmitted after data compres-
sion. The image data to which figure 3 corresponds is an ATS spin-scan cloud cover image of the full earth disc
enclosed by a black space-view surround. Figure 3 shows that more data must be transmitted in the middle of the
picture than at either end, since the intersection of the scan line with the earth disc has its maximum chord length near
line number 1000. However, not all of the data compression in each line is due to the constant amplitude space-view,
since space-view accounts for only 10 percent of the uncompressed data near line 1000 and yet the compression ratio
exceeds two-to-one in this area of figure 3. The area above the curve of figure 3 represents the increased capacity
of the data compression communication system as a function of the ATS image scan line number.
Figure 3. % Data Remaining After Data Compression on a Line-by-Line Basis. [Plot: the abscissa is the compressed line number, 0 to 2000; the ordinate is the percentage of data remaining, up to 100.]
Figure 4. Wallops Commands. [Table relating the two-digit operating mode to its effect: the first digit selects the data source (0 = ATS, 1 = HRPT, 2 = VREC), and the second digit (0 through 7) selects the tape output (OFF, RECORD COMP., RECORD GRAY, RECORD 8 BIT, or READ 8 BIT) and whether compressed or gray-coded data is sent.]
Since data compression can produce short term increases in data quantity, the shorter of the compressed image
line and the original image line without compression can be transmitted with a special code used to indicate which. In
spite of the obvious varying data activity from one pole of the earth to the other in figure 3, the compressed line length
remains relatively constant suggesting that the system can be quite effective with only a one line buffer. This is in
fact true except for noisy scan lines which are usually of little interest to the user.
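A minimal sketch of this selection rule follows; the single flag bit used to mark which form was sent is a placeholder, since the paper says only that a special code indicates the choice.

    # Sketch of the "send whichever is shorter" rule described above.
    def select_line(raw_bits, compressed_bits):
        if len(compressed_bits) < len(raw_bits):
            return "1" + compressed_bits    # flag: compressed line follows
        return "0" + raw_bits               # flag: uncompressed line follows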
Compression System Hardware
The Wallops Island data compression equipment configuration is shown in block diagram form in figure 1. The
front end of the system accepts analog data from one of three sources (ATS, HRPT, and VREC), of which the latter
two are derived from the NOAA-2&3 spacecraft VHRR instrument. The analog data are sampled, quantized to eight
bits and stored in the Interdata-4 (I4) core memory through a direct memory access channel. The A/D sampling
rate is controlled from the I4 by loading a register with the desired sampling rate control word. Between the A/D
converter and the I4 memory is the A/D and direct memory access control logic. The sampling rate of the A/D con-
verter may be phase locked to the F&W signal from the CDA ground recorder or a crystal controlled oscillator. The
input direct-memory-access (DMA) logic may be triggered to begin storing A/D samples in memory. An externally
derived line synchronization pulse is used for this purpose.
There are four DMA channels in the data compression system, two at Wallops Island and two at Suitland. At
Wallops and Suitland one DMA channel is used for input and one is used for output. A DMA channel has a 16 bit
register which may be loaded by an instruction from the I4. This register is called the Pointer Register (PR). The
PR must be loaded with the I4 memory address of the DMA channel instructions. The DMA channel instruction list
contains three types of information: the start and end of an I4 memory data I/O buffer, the address of the next instruc-
tion list, and DMA channel control bits to indicate the action of the channel when an I/O buffer data transfer is com-
plete. The starting and final address registers of the DMA channel are loaded by the DMA control logic from the I4
core memory instruction list once the 'DMA START' instruction is executed by the I4. Once started, the DMA channel
requires no further action by the I4 CPU and thus leaves the CPU free to compress data, etc. The DMA channel
can generate interrupts when each I/O buffer is complete. This permits the I4 computer program to maintain syn-
chronization with the I/O to be processed. It is a simple matter with this type of DMA channel to generate interrupts
at the end of several regions of an input scan line and initiate different types of processing algorithms for each region.
This procedure is used to control the timing of the data compression/reconstruction software.
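The instruction list can be pictured as a simple chained record, as in the illustrative Python model below; the field names and example addresses are hypothetical, and only the three kinds of information named above are taken from the text.

    # Illustrative model of the DMA channel instruction list described above.
    from dataclasses import dataclass

    @dataclass
    class DmaInstruction:
        buffer_start: int         # start address of the I4 memory I/O buffer
        buffer_end: int           # final address of the buffer
        next_list: int            # address of the next instruction list
        interrupt_on_done: bool   # control bit: interrupt the CPU when the buffer is complete

    # Two chained buffers covering different regions of one scan line, so that a
    # different processing routine can be triggered for each region (addresses hypothetical).
    line_list = [
        DmaInstruction(0x1000, 0x10FF, next_list=0x0208, interrupt_on_done=True),
        DmaInstruction(0x1100, 0x1FFF, next_list=0x0200, interrupt_on_done=True),
    ]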
There are three other elements of the data compression hardware which are used in both the Wallops Island
redundancy removal equipment, figure 1 and the Suitland data reconstruction equipment, figure 2: an ASR 33 teletype,
a 9 track 800 CPI digital magnetic tape drive, and the 16K byte (eight bit) core memory of the I4 minicomputer
(1K = 1024). The teletype is used to transmit operator commands to the compression system software and to display
the compression system status both for the operator and as a historical record for future reference. Programs are
entered into the I4 memory from the digital 9 track tape. Simple revisions to the software are made while the pro-
gram is core resident by modifying, adding, or deleting the necessary I4 instructions/constants, and finally writing the
modified core image out to the digital tape for later re-entry. The 800 CPI 9 track tape runs at a speed of 25 inches
per second which produces a data transfer rate of 20,000 bytes per second. The transfer of data between tape and
the I4 memory is handled by a standard Interdata Selector Channel which operates as a DMA interface between the
tape controller and the core memory. The Interdata Selector Channel must be set up under program control of the
I4 and can transfer only a single sequential data area before it halts and must be set up again. Major changes to the
compression system software are accomplished at GSFC or USC, stored on digital tape, and sent through the mail to
be used at Wallops Island or Suitland. However, the main use of the digital tape is for the temporary storage of
processed data. At Wallops the digital tape is used for recording when the Wallops-Suitland communications link
is unavailable. The advantage of having the data on digital tape is that for the VHRR data the image data stream can be
easily broken into 2400 line frames with a 100 line overlap. The overlap consistency is difficult to maintain when
working from analog tape.
Figure 5. Example of System Control Teletype Input-Output. [Two sample dialogues, with operator input underlined: the program identifies itself (B.V. 17.4, 3/28/74) and prompts ENTER MODE(0-2C); entering 10 selects VHRR with tape off; the operator answers the documentation prompts D*, T*, R/O*, and VIS OR IR*, types GO, and may change the line count with commands such as L=100 or L=1000; the program reports TELE ON, SEND COMP, the compression ratio (e.g. COMP. RAT. = 2.92), and END when the line count is reached.]
Figure 6. ATS-3 Spin Scan IMAGE 17, Dec. 21, 1972
The Wallops CDA compression hardware contains an output parallel-to-serial converter to interface the I4
memory to the 303C modem. The output bit rate is selected from the I4 software by loading a register in the output
channel control logic. Between the I4 core memory and the output controller logic is a DMA channel described above.
Note that none of the compression operations are done with special purpose hardware.
The NESS data reconstruction equipment at Suitland is similar to the Wallops Island system. However the input
and output device controllers are different. The input device controller receives a clock and serial data from a bit
synchronizer which is in turn driven from the receiving Western Electric 303C modem at Suitland. The input channel
logic can be set up to find both the 32 bit line synch. and the 16 bit data synch. using a Hamming metric with the
allowed distance usually set to zero. The input channel is essentially a serial-to-parallel converter with two select-
able modes of operation provided for handling the serial input data. The first mode assembles sequential 16 bit words
of data from the incoming data stream and sends the parallel words to the DMA channel to be stored in the I4 core
memory. The second mode uses a hardware decoder to help reconstruct compressed data. The code class which can
be decoded by the hardware must have a prefix of N leading zeros terminated by a one, N is from 0 to 15. The hard-
ware compression decoder has been used to reconstruct six bit source data. Implementation of the hardware com-
pression decoder was undertaken to speed up the data reconstruction process. It has since been found that the pure
software reconstruction algorithm for eight bit source data could be written to operate nearly as fast as the six bit
data hardware/software decoder combination. The hardware decoder, as implemented, has a disadvantage in that
it stores 16 bits in the I4 memory for each sample decoded, producing a temporary data expansion, and does not
allow compressed data to be buffered since it is not available in the core memory.
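The leading-zero class selection can be illustrated in software as below; this sketch decodes the hypothetical code words used in the earlier encoder sketch, not the actual code tables of the operational system, which are not given in the paper.

    # Software sketch of the leading-zero class decode described above.
    def decode_line(bits, start=0):
        samples, prev, i = [], start, 0
        while i < len(bits):
            zeros = 0                                   # count leading zeros, capped at four
            while zeros < 4 and bits[i + zeros] == "0":
                zeros += 1
            if zeros == 4:                              # four zeros: the raw eight-bit sample follows
                value = int(bits[i + 4:i + 12], 2)
                i += 12
            else:
                i += zeros + 1                          # skip the zeros and the terminating one
                if zeros == 0:
                    d = 0
                elif zeros in (1, 2):                   # magnitude equals the zero count
                    d = (1 if bits[i] == "1" else -1) * zeros
                    i += 1
                else:                                   # three zeros: sign bit plus 4-bit magnitude
                    d = (1 if bits[i] == "1" else -1) * int(bits[i + 1:i + 5], 2)
                    i += 5
                value = prev + d
            samples.append(value)
            prev = value
        return samples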
At Suitland the output device controller is the final element of the data compression communication system. The
output device controller is basically a parallel-to-serial converter which can produce a data stream in the same for-
mat as the Suitland bit synchronizer. The Suitland output device drives the NESS digital picture terminal which is
the distribution point at Suitland for analog facsimile data to be used by remote display devices as shown in figure 2.
The NESS digital picture terminal also generates images for use at Suitland and sends digital data to the NOAA
weather computer, an IBM 360-195. The output channel requests data from a DMA channel connected to the I4
memory. There are two selectable modes of operation for the output channel. First, each eight bit byte can be con-
verted to the six bit gray code format required by the digital picture terminal. Second, consecutive 16 bit halfwords
from the I4 memory are shifted out of the output channel device in one long serial stream. The second mode is used
to test the Suitland compression equipment input channel. The input device and output device are designed to be
driven from independent data clock rates, although only a common clock has been used to date.
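The paper does not spell out the eight-bit to six-bit gray-code mapping; one plausible reading, sketched below, truncates each eight-bit sample to its six most significant bits and applies the standard binary-reflected Gray code.

    # Hedged sketch of the 8-bit to 6-bit gray-code conversion mentioned above.
    def to_six_bit_gray(sample8):
        b = (sample8 >> 2) & 0x3F      # keep the six most significant bits of the 8-bit sample
        return b ^ (b >> 1)            # binary-reflected Gray code

    print(format(to_six_bit_gray(0b10110110), "06b"))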
Data Compression System Operation
To operate the data compression system the operator initiates execution of the compression system executive
software and receives a response defining the program revision number, the date revised, and a request that the
operating mode number be entered as shown in figure 4. The operator enters a two character operating mode on the
teletype console. The first digit indicates the data source, 0 = ATS, 1 = HRPT, and 2 = VREC. The second digit
selects the I/0 devices and the type of processing to be used. The operating system responds by requesting that
documentation information be entered by the operator. For example, D*, T*, R/O*, VIS OR IR* indicate the date,
time, spacecraft orbit, and whether the data is visible or infrared (see figure 5). The VIS OR IR input is used both
for documentation and to select the appropriate display gray scale lookup table. Messages are displayed to indicate
the status of the input/output devices, e.g., TAPE RECORD 8 BIT, TELE ON, and SEND COMP. When the operator
Figure 7. NOAA-2 VHRR Images of Hurricane "AVA", Rev 2942, June 7, 1973. [Panel label: VIS.]
is ready to begin processing data, he can type GO to start or SE to enable the processing of data which will start upon
the receipt of a frame synch. signal. At Wallops the termination of the processing of image data by the data com-
pression hardware can occur in two ways: either the line count reaches a preset limit or the operator types FI to
finish processing. The line count limit is automatically set to 2400 by the compression system software before the
processing of each image but can be changed by the operator command L=XXXX, e.g. L=1000. On the front panel of
the Interdata-4 computer are two rows of display lights. Each row of display lights contains four hexadecimal light
emitting diode display characters which can be loaded under program control. The upper and lower rows of the I4
display are loaded with the decimal line count limit and the current line count, respectively. At Suitland the end-of-
image is found by reaching the preset line count limit or by receiving a frame-complete synch. word from Wallops.
At Wallops two types of operating procedures are employed. When the ground link to Suitland is available data
are sent in the desired format. When the ground link is busy with other traffic, analog data is input to the compres-
sion system and stored on digital tape for later playback to Suitland. These digital tapes are selectively sent to GSFC
or USC for evaluation. At Suitland the data compression system is used in two ways. First, to receive, process, and
record on digital tape data to be used in the IBM 360-195 operational weather data processing system. Second the
data reconstruction system can drive the digital picture terminal and its associated facsimile remote displays.
Results
The Wallops Island, VA to Suitland, MD data compression system has been operating for the past two and one
half years, has demonstrated a high degree of reliability, and has helped to achieve a high degree of user confidence
in the value of compressed data. The compression ratio varies depending on the image scene content but a typical
value is 2:1. The quality of the received and reconstructed data has been monitored in two ways. First, based upon
a visual review, the images produced have been found to be of equal or superior quality to those produced without
data compression (see examples in figures 6 & 7). In the rare cases where a channel error effect is discernible in
a reconstructed data compression image, the effect shows up as a streak of less than 1/16 of the image scan line
length. Second, an error detection code shows less than one percent of the reconstructed image lines are in error.
These image line errors are caused by channel errors and not by the compression system hardware or software.
The role of data compression is not yet clear in the operational NOAA/NESS spacecraft program. The Alaska
CDA to Suitland, MD communication system will very likely be reconfigured to take advantage of the economies inher-
ent with communication satellites for long ground distances. Satellite-to-satellite communication systems are being
planned which will reduce the necessity for on-board storage and allow continuous real time transmission of space-
craft sensor data to the central data processing facility. Perhaps the ground storage and processing areas will be
the first to benefit from the practical application of data compression techniques.
Acknowledgement
The authors are pleased to acknowledge the encouragement, operational support, and helpful comments of R.
Koffler, J. Leese, C. Bristor, E. Connelly, and H. Van Dyke of the NOAA/NESS.
A CAQ BANDWIDTH REDUCTION SYSTEM
FOR RPV VIDEO TRANSMISSION
James J. Pearson
Lockheed Palo Alto Research Laboratory
Palo Alto, California
Abstract
A bandwidth compression system for the transmission of video images from remotely piloted vehicles has
been built and demonstrated. Novel features of this system are the use of the Constant Area Quantization
(CAQ) technique to obtain spatial bit rate reduction of 6:1 and a rugged and compact scan convertor, based
on a core memory, to accommodate temporal frame rate reduction. Based on the ability of the human eye to
perceive more detail in high contrast regions than in low, the CAQ method transmits higher resolution in
the former areas. The original six-bit digitized video is converted to a three level signal by the quanti-
zing circuit and then Huffman-encoded to exploit its statistical properties and reduce it further to
one-bit per pixel. These circuits operate on one line of the picture at a time, and can handle information
at full video (10 MHz) rate. The compressed information when received on the ground is stored in coded
form in a two-frame (500,000 bit) digital core memory. One frame of the memory is filled while the other
is being displayed and then the two are interchanged. Decoding and reconstruction of the video are per-
formed between the memory and the display.
Introduction
The use of remotely piloted vehicles (RPV's) to perform tactical reconnaissance and strike missions
under conditions too dangerous to permit the risk of sending manned aircraft is being given increasingly
serious consideration by the armed services. The eyes of the absent pilot are replaced by transmission of
video imagery from an onboard camera, and the success of the mission depends crucially on the integrity of
the video link. Video signals are inherently broad band, and the necessity of protecting them from jamming
by some form of spectrum spreading technique makes even greater demands on the limited available spectrum.
When transmissions from a number of vehicles are occurring simultaneously, as envisioned in most RPV battle
scenarios, the spectrum required is simply unavailable. Therefore, some means must be employed to reduce
the bandwidth of the video signal prior to transmission.
Fortunately, a sequence of meaningful video images contains enough redundancies to permit its com-
pression by a sizeable factor without the loss of significant information. These redundancies are of two
basic types--temporal and spatial. The temporal, or frame-to-frame, redundancies are most practically
exploited by reducing the frame rate from the standard thirty frames per second to some lesser number,
dictated by the particular mission requirements, through the use of some form of slow-scan camera. The
spatial, or single frame, redundancies arise from the existence of finite range correlations within the
picture. It is in the choice of a method to exploit these spatial redundancies that the constraints of the
RPV application are most strongly felt. Since the vehicles are small and light, often comparable to large
model airplanes, the equipment must be compact and lightweight and require little power. In addition,
since the RPV's are, to some degree at least, expendable, the equipment must have a low cost.
With this application and its attendant constraints in mind, the Lockheed Palo Alto Research Labora-
tories have designed, built and demonstrated in the laboratory a bandwidth compression system based on the
Constant Area Quantization (CAQ) technique. The Lockheed system performs spatial compression to reduce a
six-bit per pixel digitized input signal to one-bit per pixel. It is designed to operate either at full
video rate or at any reduced frame rate for which the jerkiness and delay are tolerable in the particular
application. In the reduced frame rate case, scan conversion is performed on the ground in a rugged and
compact digital memory, and an analog signal is delivered to a standard monitor at thirty frames per second.
Principle of Operation
Constant Area Quantization is a one-dimensional
spatial compression technique. It is based on the
property of human vision that the eye sees more
detail in high contrast regions than in low con-
trast ones. This property is illustrated in Fig.
1, where the circles decrease in size from left to
right and in contrast from top to bottom.
Fig. 1 Test pattern demonstrating the principle
of Constant Area Quantization.
If the figure is viewed under conditions of distance and illumination such that some but not all of the
circles are visible, it will be seen that the cutoff is not along a vertical or horizontal line but along a
diagonal, indicating a combination of size and contrast. This fact is exploited by transmitting higher
resolution in the high contrast areas where the eye will detect it, and lower resolution elsewhere.
In practice, this is accomplished in the simple
way shown in Fig. 2. The bold line in 2 (a) repre-
sents the analogue video signal which is sampled at
a rate indicated by the tick marks. The smooth
curve is replaced by line segments, beginning and
ending at sample points, in such a way that the
right triangles with these segments as hypotenuse
and sides parallel to the coordinate axes all have
the same area, A/2. The value of the area
threshold, A, is an arbitrary parameter chosen at
the outset. It has the somewhat peculiar units of
brightness levels times pixels. At each sample
point the output signal in 2 (b) is generated.
This signal is zero except at the end of a line
segment, where it is either +1 or -1 (P or N)
depending on whether the signal increased or de-
creased during that interval. In this way, a
six-bit (64 level) input signal is reduced to a
three level one. In binary terms three levels
corresponds to 1.58 (= log 3/log 2) bits, and this
value can be approached arbitrarily closely by
coding a number of picture elements together.
(For example, the 243 possible values for five
adjacent pixels can be represented by one eight
bit number, giving an average of 1.6 bits per
pixel.) The original picture can be recon-
structed from the three level signal in two
different ways illustrated in 2 (c) and 2 (d).
In the first or "slope" method, the line segments
in 2 (a) are reconstructed from a knowledge of the
base and area of the appropriate right triangle.
In the second, or "step" method only the endpoints
of the segments are computed and the intervening
pixels are filled in with constant values. The
relative performance of the two reconstruction
techniques is discussed later, but the step re-
construction is used in the actual system.
Fig. 2. Constant Area Quantization applied to a simulated video signal: (a) the sampled analogue signal and its line-segment approximation, (b) the three-level CAQ output signal, (c) the "slope" reconstruction, (d) the "step" reconstruction.
The way in which the three level signal is generated is shown schematically in Fig. 3. The video signal is sampled and digitized to form the signal Vi for pixel i. Vi is subtracted from the reference level R corresponding to the beginning of a line segment in Fig. 2 (a). The magnitude of the resulting difference is then compared to the value A/Δi, where Δi is the time difference between pixel i and the pixel at the beginning of the segment. If |Vi - R| ≥ A/Δi, this indicates that the area, (1/2)|Vi - R|·Δi, of the previously mentioned right triangle exceeds the value A/2. When this occurs, a P or an N is produced depending on the sign of Vi - R; otherwise a zero is output. Whenever a P or an N is generated, a new reference level, R, is formed by the reconstructor, and a new line segment is thus initiated. The value R, which is identical to the picture element which will be reconstructed on the ground, differs from the actual video value, V, because the threshold is in general not reached exactly at a sampling point. R is used instead of V to prevent the accumulation of errors.
Fig. 3. Constant Area Quantizer block diagram: sampler and A-to-D convertor, subtractor, comparison waveform generator (A/Δi), comparator producing P or N, and signal reconstructor feeding back the reconstructed reference level R.
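The quantization rule and the "step" reconstruction lend themselves to a short software model. The Python sketch below is a hedged, one-dimensional illustration only: the start-of-line reference level, the integer approximation of A/Δ, and the symbol representation are assumptions for clarity, not details taken from the hardware of Fig. 3.

```python
def caq_encode(line, A):
    """Constant Area Quantization of one scan line (software model only).

    line: sequence of integer brightness values (e.g. 0..63 for six bits)
    A:    area threshold in brightness levels x pixels
    Returns one symbol per pixel: '0', 'P' or 'N'.
    """
    symbols = []
    R = line[0]              # assumed start-of-line reference level
    delta = 0                # pixels since the current segment began
    for v in line:
        delta += 1
        diff = v - R
        # triangle area (1/2)|v - R|*delta exceeds A/2  <=>  |v - R|*delta >= A
        if abs(diff) * delta >= A:
            sym = 'P' if diff > 0 else 'N'
            # new reference = reconstructed endpoint (R +/- A/delta), not the
            # true sample value, so encoder and decoder stay in step
            step = max(1, A // delta)        # integer approximation of A/delta
            R = R + step if diff > 0 else R - step
            delta = 0
        else:
            sym = '0'
        symbols.append(sym)
    return symbols


def caq_step_reconstruct(symbols, A, start):
    """'Step' reconstruction: hold the last endpoint until the next P or N."""
    R = start
    delta = 0
    recon = []
    for sym in symbols:
        delta += 1
        if sym in ('P', 'N'):
            step = max(1, A // delta)
            R = R + step if sym == 'P' else R - step
            delta = 0
        recon.append(R)
    return recon
```

The decoder needs the same starting reference level and threshold as the encoder; in the real system these are fixed by the line format rather than passed explicitly.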
Experience with a number of pictures indicates that
the three output signals of the CAQ are not equally
probable. In fact, zeroes predominate to such an
extent that a considerable additional bandwidth com-
pression can be achieved by using some form of
statistical coding. The code actually chosen is a
Huffman code, applied to the nine possible values of a
pair of adjacent pixels. The code itself is given in
Table I. It is a variable length comma-free code,
which assigns the shortest code (0) to the most proba-
ble event (two adjacent zeroes) and the longest codes
to the least probable events. This encoding is used to reduce the picture to one bit per pixel. This value of one bit per pixel is strictly maintained for each line. This can result in a small amount of truncation at the right hand edge of a particularly busy line, but that is the only additional image degradation introduced by the Huffman encoding.

Table I. Huffman code for ternary-to-binary conversion

Original Symbols   Probability   Binary Code   Code Length   Average Code Length
00                 .5390         0             1             .5390
OP                 .0976         101           3             .2928
ON                 .0976         110           3             .2928
PO                 .0976         111           3             .2928
NO                 .0976         1000          4             .3904
PP                 .0177         100110        6             .1062
PN                 .0177         100111        6             .1062
NP                 .0177         100100        6             .1062
NN                 .0177         100101        6             .1062
                                                             2.2326

System Description

The bandwidth compression system, as it would be configured for flight, is outlined in Fig. 4. The analog video output from either a standard or slow scan video camera is sampled and digitized to six-bits (the rest of the system can handle eight-bits if sufficient camera quality and A/D convertor complexity are justified).
The digital signal is fed to the Constant Area Quantizer, which outputs one of three values (0, P, N) at the
pixel rate. This three level signal goes to the Huffman encoder whose output comes in bursts of from one to
six bits at the end of each two-pixel interval. This coded signal must be buffered for uniform transmission.
The signal at the output of the buffer has a bandwidth corresponding to one-bit per pixel and can be trans-
mitted over any desired form of digital anti-jam data link.
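The ternary-to-binary conversion of Table I can be modelled with a simple lookup table. The Python sketch below encodes and decodes pairs of CAQ symbols; the per-line one-bit-per-pixel budget, the buffering, and the line truncation of the real system are deliberately not modelled.

```python
# Huffman code of Table I: pairs of CAQ symbols -> binary code words.
PAIR_CODE = {
    ('0', '0'): '0',
    ('0', 'P'): '101',
    ('0', 'N'): '110',
    ('P', '0'): '111',
    ('N', '0'): '1000',
    ('P', 'P'): '100110',
    ('P', 'N'): '100111',
    ('N', 'P'): '100100',
    ('N', 'N'): '100101',
}

def huffman_encode_pairs(symbols):
    """Encode a CAQ symbol sequence ('0', 'P', 'N') two pixels at a time."""
    symbols = list(symbols)
    if len(symbols) % 2:                 # pad an odd-length line with a zero
        symbols.append('0')
    return ''.join(PAIR_CODE[(a, b)]
                   for a, b in zip(symbols[0::2], symbols[1::2]))

def huffman_decode_pairs(bitstring):
    """Invert the prefix code, recovering two CAQ symbols per code word."""
    inverse = {v: k for k, v in PAIR_CODE.items()}
    symbols, word = [], ''
    for bit in bitstring:
        word += bit
        if word in inverse:
            symbols.extend(inverse[word])
            word = ''
    return symbols
```

Because the code is prefix-free, the decoder can recover symbol pairs greedily, one code word at a time, which mirrors the sequential logic network described later for the ground decoder.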
Fig. 4. System block diagram. Airborne equipment: sensor, analog-to-digital convertor, Constant Area Quantizer, Huffman coder, buffer, synchronization and timing generator, and transmitter. Ground equipment: synchronizer, two-frame refresh buffer memory, Huffman decoder, Constant Area Reconstructor, digital-to-analog convertor, timing generator, and display.
The received and demodulated signal is stored in coded form in one-half of a two frame memory. While
one-half of the memory is being filled, the other half is being used to refresh the display at a full
thirty frame per second TV rate. Then the roles of the two halves of the memory are interchanged. If a
standard rate video camera is used in the vehicle, this scan conversion step is by-passed. One line at a
time is removed from the memory at video pixel rate into a buffer capable of delivering it to the decoder in
high speed bursts. The decoder removes from the buffer as many bits as it finds necessary to generate
two-pixels in CAQ form. The three level output of the decoder, which is uniform in time, is used by the re-
constructor to generate a digital video signal which passes through a D/A convertor to the monitor.
It is not the intention of this paper to describe the circuitry in detail. A few comments about some of
the individual blocks in Fig. 4 may, however, prove helpful. All of the circuitry is designed to operate at
a 10 MHz pixel rate, to enable it to handle standard video frame rates easily. This speed requirement can
be relaxed for the airborne portions of the system if a slow scan camera is to be used, and a simplification
in the circuitry results. The ground-based decoder and reconstructor must operate at the high speed, how-
ever, since they operate between the scan convertor and the display in order to permit storage of the images
in compact coded form. Since several operations must, in general, occur within one pixel time, a 40 MHz
clock is used throughout. All of the components are Schottky series TTL except in the Huffman encoder,
where some 10K series ECL chips are used. The quantizer operates as described in Fig. 3. The quantity 1/Δi
is generated by a hyperbola generator consisting of a counter and a decoder network. Thresholds are
restricted to powers of two so that multiplication by A implies simply a shift in the bit connections. The
encoder straightforwardly generates the bits specified by Table I for each pair of pixels, using a series of
NOR gates, and shifts out the appropriate number of them into the buffer. Figures 5 and 6 are photographs
of the CAQ and Huffman encoder circuit boards. The decoder is a sequential logic network which takes bits
from the buffer until it recognizes the end of a code and then generates the appropriate pair of pixel
values. The scan convertor consists of two 32,768 word by nine bit LEC memories on a single chassis. The
memories can deliver nine bits to the line buffer in 800 nanoseconds, which is sufficiently fast to assure
that 512 bits can be removed in each video line time. This form of scan convertor has the advantage of
being compact, rugged, and relatively inexpensive, without adding additional degradation to the imagery.
Fig. 5. Constant Area Quantizer Circuit Board.
Fig. 6. Huffman Encoder Circuit Board.

Performance
Since this system is designed to transfer certain types of infor-
mation from moving video imagery to human observers, the only really
satisfactory measure of its performance is its effectiveness in doing
that job. Within the confines of a paper and in the absence of any
definitive subjective tests, more indirect criteria must be used. Most
of the pictures in this section, illustrating as they do a wider
variety of algorithms and parameter values than are wired into the
actual system, are computer simulations. They are all of the same
subject to facilitate comparison. The original is shown in Fig. 7.
The tank farm subject, having a great deal of detail, is in some
sense a "worst case". The simulations reproduced on a microfilm
plotter have 256x256 pixels and thus represent only a portion of the
actual frame, which in the system contains 480x480 pixels. Figures
8 (a) and 8 (b) are original and compressed versions respectively of
video taped imagery run through the actual system and photographed
with a Polaroid camera directly off the monitor. The principal
parameter which can be varied in the CAQ quantizer is the area
threshold, A. Varying A does not alter the amount of bandwidth
reduction directly (the output of the quantizer is always 1.6 bits
per pixel, independent of both A and the input picture) but it does
greatly affect the image quality. A small value of A causes P's
and N's to be produced more often, thus increasing the low contrast
detail, but reduces the speed with which high contrast changes can
be followed, resulting in edge blurring. A large A improves the
edge response, but washes out the low contrast detail. The visible result of this is a streaked appearance,
as large regions are replaced by constant values which vary from line to line. Fig. 9, which shows CAQ
compressed versions with three different thresholds, illustrates these effects. The same point is made in
a different way by Fig. 10, in which mean squared error is plotted against threshold for the tank farm
picture (Fig. 7). The deterioration for low and high values of A is reflected in this plot. The difficulty
of using an objective criterion like mean squared error to judge a technique which attempts a subjective
match to human vision is illustrated, however, by a comparison of Figures 11 and 12. Both of these have
the same mean squared error (1.6%), but it is difficult to conclude that they are of the same quality.
Fig. 7. Original tank farm
image.
While the choice of A does not affect the amount of compression directly, it does vary the information
content of the output by altering the proportion of P's, N's, and O's produced, and thus it affects the
further saving that can be achieved by statistical coding. Fig. 13 indicates, for the same image, the
average run length of zeroes as a function of threshold. Fig. 14 contains a plot of the entropy of the CAQ
signal versus the average run length. Since the entropy places a lower limit on the number of bits per
pixel that can be achieved without further loss, it is a good measure of the possible benefit of coding.
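For reference, the entropy in question is the zero-memory entropy of the three-symbol CAQ source. A small Python sketch, assuming (as in Table I) that P and N are equally probable and pixels are treated as independent:

```python
from math import log2

def caq_entropy(p_zero):
    """Zero-memory entropy in bits per pixel of a 0/P/N source with P(P) = P(N)."""
    p_pn = (1.0 - p_zero) / 2.0
    probs = [p for p in (p_zero, p_pn, p_pn) if p > 0]
    return -sum(p * log2(p) for p in probs)

# Example: the pair probability .5390 of Table I corresponds to a per-pixel
# zero probability of about 0.734, giving roughly 1.1 bits per pixel
# before the pairwise Huffman coding is applied.
print(round(caq_entropy(0.734), 2))
```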
Fig. 8. Original (a) and compressed (b) imagery photographed from the monitor.
Fig. 9. CAQ-compressed images with three different thresholds: (a) A = 2, (b) A = 12, (c) A = 32.
Fig. 10. Dependence of mean squared error on threshold (brightness levels x pixels) for tank farm image.
Fig. 11. CAQ-compressed image with A = 6.
Fig. 12. CAQ-compressed image with A = 20.
Fig. 13. Dependence of average run length on threshold (brightness levels x pixels) for tank farm image.
Fig. 14. Entropy of CAQ signal and Huffman-coded bits per pixel as a function of average run length for tank farm image.
The actual performance of the Huffman code is plotted on the same axes, and it can be seen to approach the
entropy curve very closely in the region of highest picture quality. The probabilities given in Table I,
which were used in determining the optimum code, were obtained by quantizing the tank farm scene, using a
threshold of 16, and counting the P's and N's. P's, N's, and O's were assumed to be statistically inde-
pendent and P's and N's to be equally probable. Different scenes, different thresholds, and more accurate
statistics lead to slightly different codes, but the differences in performance among them are essentially
negligible over normal operating ranges. The system is designed to pass one bit per pixel. Since the run
length versus threshold curve (Fig. 13) is picture dependent, a trade-
off exists between picture degradation caused by too high a threshold,
and line truncation produced by too short an average run length. This
tradeoff can be made by adjusting the threshold. The best threshold
for the busy image of Fig. 7 was found to be 16, while the optimum for
the video scene (Fig. 8) was 8.
The picture quality is affected to some small degree by the recon-
struction method chosen. Figures 15 and 16 compare the results using
the "slope" and "step" reconstructions discussed earlier. Although
the former gives a closer approximation to the original, in a mean
squared error sense, the subjective appearance of the picture was not
felt to be improved enough to justify the additional complexity.
Attempts to compare the CAQ technique with others must necessarily
involve subjective judgements. Since it is a one-dimensional algo-
rithm, it does not, of course, exploit correlations in the other
direction. An 8x8 block Hadamard encoding with compression comparable
to Fig. 16 is included for comparison as Fig. 17.
Fig. 15. "Slope" reconstruction
of CAQ image (A = 16).
Fig. 16. "Step" reconstruction of
of CAQ image (A = 16).
Fig. 17. Two-dimensional
Hadamard compressed
Image.
106 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information
Fig. 18. Three level DPCM
image.
The closest relatives of the CAQ are the class of predictive techniques, such as Differential Pulse
Code Modulation (DPCM), and it is very similar to them in its response to bit errors. CAQ differs from
these predictive techniques in quantizing the brightness-distance product, rather than merely the bright-
ness. A picture produced by a three level DPCM is shown in Fig. 18 and can also be compared with the CAQ
picture in Fig. 16. The DPCM levels are symmetric around zero change and the threshold was chosen to give
the same entropy (1.1 bits per pixel) as the CAQ picture. The mean squared errors were 1.41% for the CAQ
and 1.54% for the DPCM.
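For comparison, a minimal three-level DPCM of the kind used for Fig. 18 might be sketched as follows; the symmetric quantizer with decision levels at half the step size is an assumption for illustration, not the exact simulation parameters used for the figure.

```python
def dpcm3_encode_decode(line, step):
    """Three-level DPCM: each pixel is predicted by the previous reconstructed
    pixel and the prediction error is quantized to -step, 0 or +step."""
    symbols, recon = [], []
    prev = line[0]
    for v in line:
        err = v - prev
        if err > step / 2:
            sym, q = 'P', step
        elif err < -step / 2:
            sym, q = 'N', -step
        else:
            sym, q = '0', 0
        prev = prev + q          # decoder tracks the same reconstruction
        symbols.append(sym)
        recon.append(prev)
    return symbols, recon
```

Unlike CAQ, this quantizes only the brightness difference at each sample; the distance since the last update plays no part in the decision.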
In summary, the CAQ technique provides a relatively simple approach, which can be implemented with con-
ventional and available components to meet the size, weight, and power restrictions of RPV's. The
technique requires no image storage for encoding; it exploits properties of human vision to provide
pictures of good subjective quality using only one bit per pixel; and it has been implemented to operate
over a full range of frame rates up to thirty frames per second.
Acknowledgements
This bandwidth compression system has been the work of a large number of people. The CAQ concept was
originated a number of years ago by G. C. Sziklai. His advice and suggestions strongly influenced the
development of the system, particularly in the early stages. Algorithm development, computer simulation
and coding analysis were performed by C. D. Kuglin. The CAQ circuitry and the Huffman encoder and decoder
were designed by M. Hilsenrath. The scan convertor and video circuitry were the work of B. S. Golosman
and H. Massey.
INTRAFRAME AND INTERFRAME ADAPTIVE TRANSFORM CODING
Clifford Reader
Aeronutronic Ford Corporation
3939 Fabian Way, Palo Alto, California 94303
Abstract
This paper describes work performed as an extension of the techniques of transform
coding with block quantization. These techniques have been very well defined theoreti-
cally with implementation of the predicted optimum results. The purpose of this work was
to examine the techniques in practice and to define adaptive coding techniques which
could be practically applied to the coding of still and moving pictures.
Introduction
The techniques of transform coding and block quantization for a single image are well
established; the purpose of the work described in the first part of this paper was to
examine in detail the application of these techniques in practice [1]. It is to be
expected that there will be a variability of performance since any practical implementa-
tion will be rather removed from the theoretical optimum case with two dimensional
Karhunen-Loeve sampling according to a known covariance and quantization based upon
assumed statistics. In particular it is noted that the condition of stationarity of
statistics will be violated and that for ease of implementation separable sub-optimum
transformations - in this case the Slant - will be used over sub-blocks (16 x 16 pixels)
of the image. A range of adaptive coding techniques is examined with practicality as
well as coding efficiency being of importance.
The work is then continued into interframe coding. A coder is described which
achieves a significant reduction in bit rate while not requiring the full frame memory
of other interframe coders. The error performance of the coder is examined and static
examples are presented. A discussion follows of the visual effect of the coder in real
time together with the projected buffer requirements.
Intraframe Coding
Summary of Coding Procedure
The image is slant transformed in sub-blocks of 16 x 16 pixels. To perform quantiza-
tion, the data is modelled as a Markov process and from this the variances of the trans-
form domain samples are estimated. Coding bits are allocated in proportion to these
variances and the samples are normalized in proportion to their estimated variances
before being quantized according to the Max quantizer. Tuning of the process is possible
by manipulation of the row and column correlations used to define the variances and with
the constant of proportionality used in normalizing by the variances. The apparently
arbitrary way in which tuning of the parameters influenced the coding error provided the
motivation for this work.
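A hedged Python sketch of the quantization set-up just summarized: transform-domain variances are estimated from a separable first-order Markov model defined by row and column correlations, and bits are then allocated according to the usual log-variance rule. The orthonormal DCT is used here purely as a stand-in for the Slant transform, and the rounding and clipping of the allocation are simplifications.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (a stand-in for the Slant transform)."""
    k = np.arange(n)
    T = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    T[0, :] *= np.sqrt(1.0 / n)
    T[1:, :] *= np.sqrt(2.0 / n)
    return T

def markov_variances(n, rho_row, rho_col):
    """Transform-domain variances under a separable first-order Markov model."""
    idx = np.arange(n)
    def corr(rho):
        return rho ** np.abs(idx[:, None] - idx[None, :])
    T = dct_matrix(n)
    var_row = np.diag(T @ corr(rho_row) @ T.T)
    var_col = np.diag(T @ corr(rho_col) @ T.T)
    return np.outer(var_col, var_row)          # n x n variance map

def allocate_bits(variances, avg_bits, max_bits=8):
    """Allocate bits in proportion to the log of the estimated variances."""
    log_var = np.log2(variances)
    bits = avg_bits + 0.5 * (log_var - log_var.mean())
    return np.clip(np.round(bits), 0, max_bits).astype(int)

# Example: a 16 x 16 sub-block with correlations (0.86; 0.86) at 1.5 bits per pixel.
bit_map = allocate_bits(markov_variances(16, 0.86, 0.86), avg_bits=1.5)
```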
Adaptive Coding 1 - The Amplitude Parameter
The constant of proportionality (C) in the normalization of the data is termed the
amplitude parameter (AP). It was found that each sub-block of an image required a
different AP for optimal coding (in mean squared error sense) and that although for most
sub-blocks a good approximation could be made with one AP, in certain significant cases a
markedly different - higher - AP was required suggesting that an adaptive process was
required. This was effected by making the constant C a function of the square root of
the energy of the sub-block less the D.C. component, i.e., the A.C. energy. This is
termed the amplitude factor AF. Thus C = AP x AF. The result was as follows. In non-
adaptively coded images, the best result is obtained with an AP high enough to encode
the problem sub-blocks and high frequency quantization noise then appears on the other
sub-blocks. In the adaptive coding, the AF automatically compensates for the high energy
problem sub-blocks such that one AP is more suitable for the whole image. This is seen
as a freedom from noise in "flat" image areas and a slightly reduced mean square error.
Figure 1.
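In code, the adaptation simply scales the normalizing constant by an amplitude factor derived from the sub-block's A.C. energy; a brief sketch follows, in which the exact scaling convention and the position of the D.C. sample are assumptions.

```python
import numpy as np

def normalizing_constant(coeffs, AP):
    """C = AP x AF, with AF the square root of the sub-block A.C. energy."""
    ac = np.asarray(coeffs, dtype=float).copy()
    ac.flat[0] = 0.0                 # drop the D.C. component (assumed first)
    AF = np.sqrt(np.sum(ac ** 2))    # square root of the A.C. energy
    return AP * AF
```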
Adaptive Coding 2 - Inter-Pixel Correlation
True horizontal and vertical correlations for natural images fall in the region of
0.91 to 0.96. Nevertheless, the effect was examined of using different correlations to
define the model from which bit allocations and normalizing constants are derived.
Results are summarized in Figure 2 where it can be seen that the lowest error is
108 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information
Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0
Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0
encountered using far lower correlations than the real ones at the expense of increased
sensitivity. What happens is that with the lower correlations, the available bits are
spread further into the transform domain and more samples are encoded. Figure 3. However
each sample receives fewer bits and thus the position of the quantization levels (as fine
tuned by the AP) is more critical. Correlations of 0.76 were selected as a compromise
with the highly satisfactory results shown in Figure 4.
Figure 1
Comparison of Adaptive (Square Root AC Energy) and Non-Adaptive Coding: correlations (0.86; 0.86), 1.5 bits per pixel, with MSE values for amplitude parameters between 0.55 and 0.85 given in the original figure.

Figure 2
Graph of mean squared error vs. amplitude parameter (girl picture) for correlations p from 0.62 to 0.90.
Figure 3
Bit Allocations: 16 x 16 maps of bits per transform-domain sample for correlations of 0.86 and 0.76, decreasing from 8 bits at the low-frequency corner to 0 at high frequencies.

Figure 4
Coding with Square Root AC Energy Amplitude Factor, correlations (0.76; 0.76), at 1.5 and 1.0 bits per pixel; MSE values for AP = 0.25, 0.30 and 0.35 are given in the original figure.
Adaptive Coding 3 - Optimised Coding
The fine structure of the parameters required by each sub-block for optimum coding was
examined by performing an experiment in which fifteen pairs of inter-pixel correlations
(vertical and horizontal; Figure 5) were used to define bit allocations and normalizing
constants which were applied in every combination (225) to every sub-block. Figure 6
shows sub-block maps in which the reference number (from Figure 5) is shown for the
optimum combinations of bit allocations and normalizing constants. The resultant minimum
mean squared error is shown in Figure 7 with the coded pictures shown in Figure 8. The
pictures show good detail and edge rendition and it should be noted that even better
coding could have resulted if correlations of less than 0.80 (type 1) had been allowed.
Figure 5

Correlation Reference No.    Correlations
 1    0.80; 0.80
 2    0.84; 0.84
 3    0.88; 0.88
 4    0.92; 0.92
 5    0.96; 0.96
 6    0.80; 0.92
 7    0.80; 0.84
 8    0.84; 0.92
 9    0.88; 0.96
10    0.92; 0.96
11    0.92; 0.80
12    0.84; 0.80
13    0.92; 0.84
14    0.96; 0.88
15    0.96; 0.92
Figure 7
Sub-Block Map (Girl Image): the optimum mean square error for each 16 x 16 sub-block, coarsely approximated in units of 0.0001.

Figure 6
Sub-Block Maps (Girl Image): (i) sub-optimum bit allocation types and (ii) sub-optimum companding constant types, giving for each sub-block the reference number (from Figure 5) of the optimum combination. Sub-blocks marked with a circle may be fairly well coded with bit allocations and companding constants of Type 1.
Adaptive Coding 4 - Zonal and Threshold Coding
The coding thus far described employs bit
allocations in which those samples which are
quantized fall inside a maximum variance zone.
The coding error may be divided into the quanti-
zation error and the error resulting from those
samples outside the zone which do not get coded
at all. There will always be exceptional signi-
ficant samples in the latter category and they
could be coded by setting a magnitude threshold
in the area outside the zone and run-length cod-
ing all samples exceeding such threshold. As an
example, four bits were allocated to code each of
such samples and, assuming that a line starting
code were used, four bits were required to
address each sample. The number of thresholded
samples is shown by the sub-block maps of Figure
9 where it can be seen that a picture which has
been zonal coded at 1.5 bits per pixel now
requires 1.7 bits per pixel. In order to get
back to 1.5 bits per pixel, the zonal coding
must be effected at 1.2 bits per pixel. The corresponding coded pictures are shown in
Figure 10. It is considered that the result obtained at a total of 1.5 bits per pixel
is not significantly improved and does not justify the work involved especially since
channel error would be a problem.
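The extra rate of the threshold coding just described can be estimated directly from the sample counts. A hedged Python sketch follows; the 4-bit amplitude and 4-bit address figures come from the text, while the array layout and masking are illustrative assumptions.

```python
import numpy as np

def threshold_code_cost(coeffs, zone_mask, threshold,
                        amp_bits=4, addr_bits=4):
    """Count the extra bits spent coding significant samples that fall
    outside the zonal bit-allocation zone.

    coeffs:    2-D array of transform-domain samples for one sub-block
    zone_mask: boolean array, True where samples are already zone coded
    """
    outside = (~zone_mask) & (np.abs(coeffs) > threshold)
    n_samples = int(outside.sum())
    return n_samples, n_samples * (amp_bits + addr_bits)
```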
Figure 8
Optimised Coding (Girl Image): original; 1.5 bits per pixel, MSE = 0.000744; 1.0 bits per pixel, MSE = 0.000796.

Figure 9
Sub-Block Maps (Girl Image): the number of threshold samples in each sub-block. (i) 1.7 bits per pixel, total number of threshold samples = 6,008. (ii) 1.5 bits per pixel, total number of threshold samples = 10,532.

Figure 10
Threshold Coding: 1.7 bits, MSE = 0.000533; 1.5 bits, MSE = 0.000606.
Adaptive Coding 5 - Zonal Coding
If the sub-block maps of Figures 6, 7 and 9 are studied, then it is seen that there is
high correlation between them in terms of which sub-blocks display exceptional informa-
tion. Two further maps are presented in Figure 11.
Figure 11
Sub-Block Maps (Girl Image): (i) sub-optimum companding constant types, with sub-blocks of high amplitude factor marked; (ii) the AC energy amplitude factors for the girl image.
Now it can be seen that certain sub-blocks require exceptional bit allocations and nor-
malizing constants, have many significant samples without a normal bit allocation zone,
produce a high mean squared error even when optimally coded and furthermore possess the
highest AC energy. This last property allows easy identification of these sub-blocks
and hence definition of an adaptive, variable zonal coding is possible. An example is
presented to indicate the performance of such a scheme. The most difficult sub-blocks
will require many bits in order to be coded satisfactorily, e.g. 3 bits per pixel; then
the relatively high energy sub-blocks might be coded well at 2 bits per pixel. The effect
for these images is shown in Figure 12 where the bit rate for the remaining (background
information) sub-blocks is shown calculated to yield an overall 1.5 bits per pixel.
Figure 12

Image     Number of sub-blocks    Number of sub-blocks    Number of sub-blocks    Rate of remaining sub-blocks
          at 3.0 bits per pixel   at 2.0 bits per pixel   remaining               (bits per pixel)
Girl              11                     127                     118                      0.82
Couple             3                     120                     133                      1.02
Moon               4                      71                     181                      1.27
The distribution of bits is shown in Figure 13 and it is immediately obvious that the AP
is a very effective delineator of detail in the picture. Excellent quality would result
from such a scheme with the only problem occurring when the discreteness of the sub-
blocks causes part of a feature to be missed.
Conclusion - The Application of Statistical Coding
The results presented in previous sections show that most areas of the images are
coded effectively by straightforward application of the statistical models. The exceptions, when they occur, are significant because they involve prominent features.
Figure 13
Adaptive Zonal Coding
Sub-Blocks Coded at 3 Bits Per Pixel
Sub-Blocks Coded at 3 Bits Per Pixel Plus Those Coded at 2 Bits Per Pixel
Typically these are characterized by a sharp high contrast edge. Obviously, in a sub-
block containing such features the statistics are not Markovian and not stationary. No
manipulation of the maximum variance zone except to simply increase its size is effective
in coding this data. As an example, the transform domain of the sub-block located on the
diagonal edge of the girl's hairline (reference Figure 13) is shown in Figure 14.
Figure 14
Slant Transform domain of Sub-block (2,7), Girl Image: the 16 x 16 array of transform-domain coefficients (values listed in the original figure).
The significant samples have a distribution out along the leading diagonal, very unlike
the maximum variance zone. In consequence, the adaptive zonal coding is recommended as
best, particularly since bits are concentrated where desired - in the girl image, the girl
herself is coded mostly with two bits and the insignificant background with only 0.82
bits. The moon picture, however, has the few outstanding features coded at 2 or 3 bits
but the more noticeable background receives 1.27 bits.
Lastly it is noted that the mathematical model approach to coding has now been
abandoned in favor of zones based upon measured statistics (variances); an improved coding
results since the requirements of the problem sub-blocks are built-in.
Interframe Coding
It is well known that considerable redundancy is present when a moving picture is
transmitted. The adaptive coder described here was designed to take advantage of the
limitations of human vision, essentially providing only sufficient information to yield
the illusion of a complete high detail moving picture. The coder is specifically designed
to minimize terminal cost in addition to minimizing transmission cost. In addition a two
stage coding is applied in order to obtain flexibility in implementation. An application
was picture telephone in which simple coding might be used locally with additional coding
for long distance. The first stage of coding consists of the removal of spatial redun-
dancy within single frames. This employs the techniques discussed earlier. The second
stage of coding - interframe coding is thus presented with coded transform domain samples
at a rate considerably less than the source rate. The bit rate reduction is to be
achieved by the updating of only those portions of the frame in which movement has taken
place since the previous frame. Since the interframe coder is presented with data which
has been transformed in sub-blocks, it is necessary to decide whether a particular sub-
block contains movement and if so, to update it. Detection of the sub-blocks which con-
tain motion may be effected by examination of the sub-block frame difference energy with
a threshold decision of whether to transmit a new sub-block or not. The effect is a
remarkably linear one as indicated by the error curves of Figure 15.
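A minimal sketch of the update decision follows: the frame-difference energy of each coded sub-block is summed and compared with a threshold, and only sub-blocks exceeding it are retransmitted. How the energy is normalized against the threshold values of Fig. 15 is not specified here and is an assumption.

```python
import numpy as np

def select_updates(curr_blocks, prev_blocks, threshold):
    """Return the indices of sub-blocks whose frame-difference energy
    exceeds the threshold and therefore need to be updated.

    curr_blocks, prev_blocks: sequences of equal-shaped sub-block
    coefficient arrays for the current and previous frame.
    """
    updates = []
    for idx, curr in enumerate(curr_blocks):
        diff = curr - prev_blocks[idx]
        energy = float(np.sum(diff ** 2))   # sub-block difference energy
        if energy > threshold:
            updates.append(idx)
    return updates
```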
Figure 15
(i) Graph of mean squared error vs. threshold for five degrees of motion (from very fast to slow), with the threshold set as a function of sub-block difference energy. (ii) Graph of mean squared error vs. threshold, degrees of motion combined; the relationship is close to linear with slope 0.74 (y = 0.74x - 1.63).
Serious deviation occurs only for very slow
motion where the discreteness of the update
process becomes apparent. The block diagram
for an implementation of the system is shown
in Figure 16. Shown is the difference energy
summing and thresholding, the memory and
delays which compensate for the time to make
the update decision. In a normal coder the
memory would have to store a complete frame.
The advantage of this coder is that compres-
sion has already been achieved by the intra-
frame coder and the same degree of compres-
sion applies to the size of the memory. A
4:1 compression (6 bits to 1.5 bits) is
likely and this results in only a one quarter
frame memory. From the curve of Figure 15
(ii) a threshold of 4 is selected as offering
the best compromise between updating all
moving areas but not updating differences due to noise. The result is shown in Figure 17.
Some blur is visible from the intraframe coding but the interframe coding is perfect.
The result with a threshold of 9 is shown in Figure 18. Some updating errors are visible.
Figure 16
Block diagram of the adaptive interframe coder: the intraframe coder (transform and quantiser), a delay, summing of the sub-block frame-difference energy, and a threshold decision on whether to transmit the updated sub-block.
Owing to the non-availability of data, it was not possible to test the coder dynami-
cally - possibly a problem might be visibility of the sub-block updating structure. This
could be coped with by overlapping sub-blocks or careful transmission of the DC and low
frequency information. The buffering problems associated with the variable data rate
produced by the coder were found to be similar to those encountered with a pixel cluster
coding update scheme. In particular, in the case of buffer overload it is noted that an
elegant reduction of data input may be effected by a further spatial compression (re-
quantization or removal of transform domain samples) to take advantage of the limited
spatial resolution of the eye to moving images.
Figure 17
Interframe coding with threshold of 4.0, four frame sequence.
(i) Intraframe coding at 3.0 bits per pixel: Frame 2, 1.16 bits per pixel, IMSE = 0.000052; Frame 3, 1.17 bits per pixel, IMSE = 0.000051; Frame 4, 1.19 bits per pixel, IMSE = 0.000048.
(ii) Intraframe coding at 1.5 bits per pixel: Frame 2, 0.54 bits per pixel, IMSE = 0.000049; Frame 3, 0.54 bits per pixel, IMSE = 0.000048; Frame 4, 0.55 bits per pixel, IMSE = 0.000055.

Figure 18
Interframe coding with threshold of 9.0, four frame sequence, intraframe coding at 1.5 bits per pixel: Frame 2, 0.5 bits per pixel, IMSE = 0.000063; Frame 3, 0.52 bits per pixel, IMSE = 0.000074; Frame 4, 0.52 bits per pixel, IMSE = 0.000092.
References
1. "Orothogonal Transform Coding of Still and Moving Pictures", C. Reader,
Ph.D. Thesis, Sussex University, England, 1974.
Session III
IMPACTS OF IMAGE DATA COMPRESSION
Panel Moderator
ANDREW G. TESCHER
The Aerospace Corporation
Session IV
POTENTIAL IMPLEMENTATIONS
Chairman
J. R. PARSONS
The Aerospace Corporation
JOINT PATTERN RECOGNITION/DATA COMPRESSION CONCEPT
FOR ERTS MULTISPECTRAL IMAGING*
Edward E. Hilbert
Jet Propulsion Laboratory
4800 Oak Grove Drive
Pasadena, California 91103
Abstract
This paper describes a new technique which jointly applies clustering and source encoding concepts to
obtain data compression. The cluster compression technique basically uses clustering to extract features
from the measurement data set which are used to describe characteristics of the entire data set. In
addition, the features may be used to approximate each individual measurement vector by forming a sequence
of scalar numbers which define each measurement vector in terms of the cluster features. This sequence,
called the feature map, is then efficiently represented by using source encoding concepts. A description
of a practical cluster compression algorithm is given and experimental results are presented to show
trade-offs and characteristics of various implementations. Examples are provided which demonstrate the
application of cluster compression to multispectral image data of the Earth Resources Technology Satellite.
Introduction
This paper further describes a new technique which jointly applies clustering and source encoding
concepts to obtain data compression. A practical algorithm for implementing this concept was initially
discussed in a previous paper [1]. This cluster compression technique is applicable to multidimensional
information sources in general, but particular emphasis is given here to compressing multispectral image
data from future operational Earth Resources Technology Satellites (ERTS). A Joint Pattern Recognition/
Data Compression Model is defined to illustrate the general concept of combining pattern recognition and
data compression. Then a practical realization of the cluster compression concept is described and
experimental results are presented which show trade-offs and characteristics of various implementations.
Examples are also provided which demonstrate the application of cluster compression to multispectral image
data from ERTS.
Data Model
The multispectral image source can be modeled by the continuous random process s(y1,y2,w), which is the electromagnetic energy at wavelength w for spatial coordinates y1 and y2. The measurement vector elements of a digitized d-band multispectral image are then represented by the vector X(y1,y2) obtained from s(y1,y2,w) by discretizing the variables y1, y2 and w to give

    X(y1,y2) = [s(y1,y2,w1), s(y1,y2,w2), ..., s(y1,y2,wd)].     (1)
*This paper presents the results of one phase of research carried out at the Jet Propulsion Laboratory,
California Institute of Technology, under contract No. NAS7-100, sponsored by the National Aeronautics
and Space Administration.
A Joint Pattern Recognition/Data Compression Model
Figure 1 describes a model of the generalized concept for the cluster compression technique. Let {Xi}, i = 1, ..., n, be a sequence of n measurement vectors obtained from a subset of the multispectral image data. The entire sequence {Xi} is analyzed to extract features {Tj}, j = 1, ..., m, for the sequence {Xi}. These features could also be considered as primitives, or as a basis for approximating measurement vectors. The sequence of m features {Tj} provides a description of the characteristics pertaining jointly to the entire sequence of measurement vectors {Xi} and in some applications may be the only output required. In other applications each measurement vector needs to be approximated in terms of {Tj} by the sequence of scalar numbers {γk}, k = 1, ..., n, which assigns to each measurement vector an approximation in terms of one of the primitives. Alternately, the concept of Fuzzy Set Theory [2] might be used to allow each measurement vector to be described by a weighted mixture of more than one of the features in the sequence {Tj}. The scalar sequence {γk} constitutes a spatial map of the measurement vectors in terms of the primitives, and is called the feature map. Spatial features can be extracted from {γk}, and source encoding can be used to efficiently represent the spatial characteristics of {Xi} through the encoded feature map denoted by Cγ. For each sequence {Xi} of measurement vectors the model basically uses pattern recognition concepts to determine a set of features or primitives, and then the model uses data compression techniques to efficiently represent the spatial features in terms of the primitives.
Fig. 1. A Joint Pattern Recognition/Data Compression Model: the sensor output {Xi} feeds a feature/primitive extractor that produces the features {Tj}; each Xi is then defined in terms of {Tj}, and a spatial feature extractor/encoder produces the encoded feature map Cγ.
The remainder of this paper describes a new practical method of data compression, called cluster
compression, which is based upon the model in Fig. 1. Cluster compression uses clustering to extract
multidimensional primitives from subsets of spatially contiguous measurements in the image data and then
applies entropy encoding to the feature map. Another independently developed data compression technique
related to the model in Fig. 1 is the Blob algorithm [3]. This algorithm is basically a boundary finding
algorithm that guarantees closed boundaries. A blob is defined as a connected set of picture elements all
of which have some common characteristic. In the Blob algorithm the primitives are defined by the boundary
finding algorithm and consist of statistical descriptions of the measurement vectors contained within the
various blobs. Spatial definition can then be provided by encoding a sequence which defines the spatial
boundaries of the blobs. This algorithm is most useful as a data compressor when the image consists of
well defined boundaries enclosing picture elements of nearly uniform characteristics, (e.g., agricultural
fields). The Cluster Compression algorithm is different in that it extracts primitives and obtains signif-
icant compression independent of the spatial characteristics of the data. Thus the cluster compression
technique can efficiently compress images whether or not they contain uniform areas with well defined
boundaries. In addition, if the image does consist of uniform areas with well defined boundaries, such as
fields, the entropy encoding of the feature map also results in efficient representation of these spatial
boundaries.
CLUSTERING FOR EXTRACTION OF FEATURES
Let {Xi}, i = 1, ..., n, be a sequence of n d-dimensional spectral intensity measurements from a spatially local region of a multispectral image. These vectors generally tend to form groups in multispectral intensity space. A typical plot of {Xi} for two bands in spectral intensity space is shown in Fig. 2(a). An intuitively natural set of features to use for representing {Xi} in Fig. 2(a) are the descriptions for groups of measurement vectors, for example, the groups depicted in Figs. 2(b)-2(d). Each feature then consists of whatever set of parameters are used to describe the source distribution for the group of measurement vectors. The mean and covariance are an obvious example of a group description, or feature. The number of features extracted from any sequence of measurement vectors would depend on how accurately {Xi} must be represented, and correspondingly what data rate is acceptable. An example of how two, four, or eight features might be chosen is shown in Figs. 2(b)-2(d).
Fig. 2. Typical display in two band spectral intensity space for measurement vectors and for groups of measurement vectors used as features. (a) Measurement vectors. (b) Two groups. (c) Four groups. (d) Eight groups.
Clustering Approaches
Clustering is a means for automatically grouping multidimensional data [4]-[5]. Figure 3 shows the
basic iterative approach to clustering. All clustering techniques depend on a distance measure between
vectors and a distance measure between clusters, or groups of vectors. The distance measure between
vectors is usually either the Euclidean distance or the absolute value distance. A distance measure
between clusters could be as simple as the distance between means, or more generally the shape of the
two clusters could also be used in computing a measure of distance.
Fig. 3. The basic iterative approach to clustering: under user supervision (the number of clusters), choose initial cluster centers, then iterate (assign each Xi to the closest cluster center and recompute the cluster centers), stopping when no change occurs.
The simple clustering algorithm of Fig. 3 is defined as the Basic Clustering Algorithm (BCA), and
simulation results involving its use in cluster compression are presented later in this paper. The repeti-
tive assignment computations for the BCA are very simple, and the algorithm can be structured in a highly
parallel manner making very high data rates feasible. The only supervision required in this clustering
is the choice of how many clusters are to be obtained for each set of measurement vectors. Even when the
number of clusters for each data set is constant, the self scaling characteristics of clustering provide
a good multidimensional quantization of spectral intensity space. However, for some applications it is
desirable to have the number of clusters used per data set be variable and adaptively determined to meet
certain quality requirements. There are many ways of making the basic iterative clustering algorithm
in Fig. 3 adaptive. For example, clusters might be split, combined, or deleted based on intracluster
and intercluster distance measures. A widely used algorithm, ISODATA, with these adaptive traits was
originated by Ball and Hall [7]. Modifications of ISODATA for ground data evaluations of ERTS multispec-
tral data produced ISOCLS [8], [9]. The Adaptive Clustering Algorithm (ACA) referred to later in the
simulation examples is basically ISOCLS with some simplifying modifications. Another clustering approach
often used with ERTS multispectral data is defined in [10]. This approach is similar to the ISOCLS
approach except that a Swain-Fu distance is used as a more accurate measure of intercluster distance.
However, the Swain-Fu distance also requires more computation. Any clustering approach could be used
with the cluster compression concept, but the emphasis in this paper is on clustering approaches amenable
to high data rate implementations. Furthermore, initial simulation results suggest that more sophisti-
cated clustering is unnecessary.
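A minimal Python sketch of the Basic Clustering Algorithm of Fig. 3 follows (numpy assumed). The initial-center choice and the Euclidean distance are the simple options mentioned in the text; the highly parallel hardware structuring referred to above is not modelled.

```python
import numpy as np

def basic_clustering(X, m, max_iter=50):
    """Basic Clustering Algorithm (BCA) of Fig. 3.

    X: (n, d) array of d-band measurement vectors.
    m: number of clusters requested (the only user supervision).
    Returns (centers, labels); labels plays the role of the feature map gamma_k.
    """
    n, d = X.shape
    # choose initial cluster centers, here simply m vectors spread over the data
    centers = X[np.linspace(0, n - 1, m).astype(int)].astype(float)
    labels = None
    for _ in range(max_iter):
        # assign each X_i to the closest cluster center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                       # stop if no change
        labels = new_labels
        # recompute cluster centers as the mean of their assigned vectors
        for j in range(m):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels
```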
Cluster Features
The features extracted from {Xi}, i = 1, ..., n, by clustering are typically a subset of the following: cluster
means, cluster variances per band, cluster covariances, number of vectors in each cluster, set of
intercluster distances, etc. Any description of a cluster can be considered a feature, even the set of
vectors themselves. These types of cluster parameters comprise a compact description of a group
of measurement vectors in a statistical format which is both easily interpreted and directly meaningful
for image approximation and computer classification applications. An example of a simple cluster feature
is the cluster mean.
Cluster Feature Rate. Let {Tj}, j = 1, ..., m, be the set of m cluster features which describe the sequence of n measurement vectors. Assume each Tj requires b bits of quantization per band. For example, in image approximation Tj* might be the j*th cluster mean with b equal to 6 (or 8) bits per band, or in classification uses Tj* might be the j*th cluster mean and variance with b equal to 9 (or 11) bits per band if the variance per band is defined to 3 bit resolution. The spectral rate Rspec is defined as the bits per picture element per band (bpppb) needed to define the spectral characteristics of {Xi}:

    Rspec = (m · b) / n     (2)

A graph of Rspec vs. m for various values of n is shown in Fig. 4 with b assumed equal to six, which corresponds to using only cluster means as features. In some cases this description of {Xi} may be all that is needed, and Rspec then represents the total rate in bpppb.
FEATURE MAP SPATIAL DEFINITION
Feature Map Rate
In many applications it is desired to further approximate {X_i}_{i=1}^n by approximating each vector
individually in terms of the cluster features. For example, if vector X_i belongs to cluster j*, then
X_i could be approximated by T_{j*}, which might be simply the mean of cluster j*. Alternately, X_i could
be approximated by a combination of {T_j}_{j=1}^m. The feature map sequence {y_k}_{k=1}^n defines the approximation
Fig. 4. Graph of spectral rate per band vs. number of features for various measurement vector sequence lengths with 6 bits per band per feature.

Fig. 5. Graph of spatial rate and practical spatial rate per band vs. number of features for 1 and 4 dimensional measurement vectors (curves of R_spat and R³_spat for d=1 and d=4).
of each measurement vector and thus gives spatial definition for the sequence {X_i}_{i=1}^n. The data rate in
bpppb for {y_k}_{k=1}^n is defined as R_spat. If each d-dimensional measurement vector is approximated by
one of m primitives, then

    R_spat = (log2 m) / d                                           (3)
Practical System Rate. In practical systems, however, if m is not a power of two, the rate in
(3) can only be approximated by representing extensions of the source. Let R³_spat be the rate achieved
in practice when groups of three of the original m alphabet source are represented by natural coding.
This representation of the source is easy to implement since m³ is small. Figure 5 shows R_spat and R³_spat
vs. m for two values of d and shows R³_spat is never significantly greater than R_spat for any value of m
which is of interest.
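As a numerical illustration of (3) and of the third-extension rate discussed above, the short Python fragment below computes both; the function name and the example values m = 5, d = 4 are only illustrative:

    import math

    def spatial_rates(m, d):
        # Exact spatial rate of Eq. (3) and the practical rate obtained by
        # natural (fixed-length) coding of groups of three map symbols.
        r_spat = math.log2(m) / d
        bits_per_group = math.ceil(math.log2(m ** 3))   # natural code for m^3 symbols
        r_spat3 = bits_per_group / (3 * d)
        return r_spat, r_spat3

    # m = 5 features, d = 4 bands: Eq. (3) gives about 0.580 bpppb, while the
    # three-symbol extension needs 7 bits per group, or about 0.583 bpppb.
    print(spatial_rates(5, 4))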
Feature Map Source Coding
Entropy Coding. A typical feature map sequence for an {X_i}_{i=1}^n described by four cluster features
is shown in Fig. 6(a). In image data there is significant spectral correlation between spatially close
picture elements, and correspondingly the differences between spatially adjacent elements in the feature
map have a sample distribution function similar to that shown in Fig. 6(b). The entropy for the distribu-
tion of differences in the feature maps is on the average lower than log2 m. Thus entropy coding
techniques offer an opportunity to reduce R_spat without any degradation in the reconstructed feature
map. The entropy for the distribution of differences for each {y_k}_{k=1}^n is often used as a measure of per-
formance for the encoding of the differences. This entropy represents a performance bound for the encoding
of a zero memory sequence with that sample distribution. Similarly, averaging the entropies over any number
of sequences can be used as a measure of average performance for encoding sequences of differences. The
average entropy represents an unachievable bound on the average performance for encoding all such sequences
with zero memory, since it ignores the overhead required to identify the optimum code assignment for each
sequence of differences.
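The zero-memory entropy bound discussed above can be estimated with a few lines of Python; measuring only the differences between horizontally adjacent elements, as done below, is an illustrative simplification of the per-sequence difference statistics used in the paper:

    import numpy as np

    def difference_entropy(feature_map):
        # Entropy (bits/symbol) of the sample distribution of differences
        # between horizontally adjacent feature-map elements.
        fm = np.asarray(feature_map)
        diffs = (fm[:, 1:] - fm[:, :-1]).ravel()
        _, counts = np.unique(diffs, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    fm = np.array([[3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2],
                   [3, 3, 3, 1, 1, 1, 2, 1, 1, 2, 2]])
    print(difference_entropy(fm))   # typically well below log2(m)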
Entropy Coding Simulation Results. Experimental results for entropy coding of the feature maps are
given in Fig. 7. Rspat is shown as a function of n and m for no coding, and for coding of differences
using an adaptive variable length entropy coding technique defined in [11]. Also shown in Fig. 7 is the
unachievable performance bound for coding differences with the assumption that the source has zero memory.
    33331111222
    33311121122
    44331111122
    44121112143

Fig. 6. (a) Partial example of typical feature map. (b) Example of typical distribution of differences between adjacent elements in feature map.

Fig. 7. Uncoded and entropy coded spatial rate vs. n for m=2,4,8,16. Also a performance bound for coding the distribution of differences (curves of R³_spat, the coded rate R^c_spat, and the performance bound PB, plotted for n = 36, 100, 256, 576, 1024).
All curves in Fig. 7 represent the average results for all square arrays with n picture elements in the
4-band test image shown in Fig. 14(a). Figure 7 shows that entropy coding is more effective for larger n.
In addition, Fig. 7 shows the entropy coding technique described in [11] and used in these simulations
performed close to the unachievable performance bounds (for the zero memory source). This entropy coding
is easy to implement, especially for the small source alphabet sizes of the feature maps.
Other Feature Map Coding. The above entropy coding did not introduce any degradation in represen-
ting the feature map. However, in some applications only certain spatial characteristics of the feature
map are of interest. For example, only closed boundaries in the feature map may be desired in field
image interpretation. The detection of spatial characteristics in {X_i}_{i=1}^n has been simplified, since
the clustering has reduced the d-dimensional image to an array of scalar numbers indicating the spatial
occurrence of multidimensional features. When only spatial boundaries are of interest, the spatial
encoder must preserve only those numbers in the feature map needed to identify the boundaries, and then the
resulting differences in the feature map will encode to a lower spatial rate.
Another simple example of spatial redundancy removal is the deletion of subsets of the feature map
with reconstruction by linear interpolation. This approach might be used if spatial resolution is higher
than necessary to observe texture required in a particular application. The spatial rate in this case is
simply reduced by the subsampling factor.
Coded Feature Map Rate. Let R^c_spat represent the rate in bpppb for the encoded feature map sequence
{c_r}. Define the compression ratio CR as

    CR = R_spat / R^c_spat                                          (4)
If only natural coding of the source extension is used, the CR is nearly one. In general, the rate for
the coded feature map sequence {c_r} is

    R^c_spat = (log2 m) / (d · CR)                                  (5)

BASIC CLUSTER COMPRESSION ALGORITHM

Total Rate
A basic configuration of the Cluster Compression Algorithm (CCA) is depicted in Fig. 8. The input to
the CCA is the user supervision, which for a fixed rate mode may be only the constant number of clusters
to be used for every input sequence. For a quality controlled mode, the supervision input consists of
simple threshold values which control the splitting and combining of clusters in the adaptive clustering,
thereby determining the number of clusters needed for each input sequence. The CCA output for each input
sequence consists of the spectral and spatial definitions for which the rates are defined respectively
in (2) and (5). Thus the total rate Rtot is given by
b-m log2m
R = R + Rc = +
tot spec spat n d.CR '
(6)
Equation (6) is still valid if the entropy coder is not used, but then CR is approximately one. Note that
either R_spec or R_spat could be zero in some applications, but in general R_tot is a mixture of both spectral
and spatial definitions. Let R_tot in (6) be equal to a constant R*_tot, and express n as a function of m
to get

    n = b · m / [R*_tot - (log2 m)/(d · CR)]                        (7)

where

    R*_tot > (log2 m)/(d · CR)                                      (8)

A plot of n vs. m in (7) for a constant R*_tot = 1 is given in Fig. 9 with b=6, d=4 and CR=1 and 2. This plot
demonstrates that a given R_tot may be approximately obtained by many combinations of m, n and CR, and
similarly there is much flexibility in choosing the division of R_tot between the spectral and spatial
definitions.
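The trade-offs expressed by (6)-(8) are easy to tabulate; the Python fragment below uses b = 6 and d = 4 as defaults, matching the example of Fig. 9, and is illustrative only:

    import math

    def total_rate(m, n, b=6, d=4, cr=1.0):
        # Total rate R_tot (bpppb) from Eq. (6): spectral plus coded spatial rate.
        return b * m / n + math.log2(m) / (d * cr)

    def n_for_rate(m, r_tot, b=6, d=4, cr=1.0):
        # Sequence length n needed for a target R_tot, from Eq. (7); only
        # valid when R_tot exceeds log2(m)/(d*CR), Eq. (8).
        spatial = math.log2(m) / (d * cr)
        if r_tot <= spatial:
            raise ValueError("R_tot must exceed log2(m)/(d*CR)")
        return b * m / (r_tot - spatial)

    print(total_rate(m=4, n=256))       # about 0.59 bpppb
    print(n_for_rate(m=4, r_tot=1.0))   # n = 48 at CR = 1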
Fig. 8. The Basic Cluster Compression Algorithm (sensor vectors {X_i}_{i=1}^n are clustered under user supervision; the cluster features are transmitted at rate R_spec; each X_i is assigned a feature and the entropy coded feature map is transmitted at rate R_spat; together these make up R_tot).
Fig. 9. Plot of n vs. m for b=6, d=4 and R_tot=1 (curves for CR=1 and CR=2).

Fig. 10. Performance vs. rate: %MSE vs. R_tot (bpppb) for various values of n for the uncoded CCA (1. n=36, 2. n=100, 3. n=576, 1024, 4. n=256).
In terms of implementation it is desirable to keep both m and n small, since clustering requires
more computations and memory when m and n are larger. However, one must also question the comparative
impact on performance of various (m, n, CR) combinations which have the same R_tot. Performance might be
measured subjectively in terms of image appearance, or quantitatively in terms of percent mean square error
(%MSE), or classification accuracy. Some pictures are provided at the end of this paper for subjective
evaluations of image quality, but at this point a quantitative measure of quality is desired. Classifica-
tion accuracy is an important quantitative measure of image quality for many ERTS applications. However,
classification accuracy is dependent on the classification technique, as well as the compression tech-
nique, and often significant accuracy improvements can be obtained by tailoring the classification tech-
nique for each specific compression technique. Results of jointly investigating corresponding compression
and classification techniques will appear in a later publication. Percent MSE is often not a good per-
formance criterion when comparing across different compression techniques, or different images, but %MSE
is generally meaningful for comparing the performance from options on the same compression technique and
same test image. Thus %MSE is used here as a performance criterion in comparing (m, n, CR) combinations.
Let X represent the original d-dimensional image data, and let X̂ represent the reconstructed
approximation of the same image data after compression. Then the MSE between the original image and the
approximate image is the expected value of the square of the Euclidean distance between X and X̂:

    MSE = E[ ||X - X̂||² ]                                          (9)

The average energy ε in the original image is defined in terms of the variance in the i-th spectral
band, σ_i², i = 1, 2, ..., d:

    ε = Σ_{i=1}^{d} σ_i²                                            (10)

and the %MSE is then defined by

    %MSE = 100 · MSE / ε                                            (11)
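As a sketch of how (9)-(11) are evaluated on sampled imagery, the Python fragment below computes %MSE for a multiband image held as a (bands, rows, cols) array; the array layout and the use of per-band sample variances for the average energy are assumptions consistent with the definitions above:

    import numpy as np

    def percent_mse(original, reconstructed):
        # %MSE per Eqs. (9)-(11): mean squared Euclidean error between pixels,
        # normalized by the summed per-band variances of the original image.
        x = np.asarray(original, dtype=float)
        x_hat = np.asarray(reconstructed, dtype=float)
        mse = np.mean(np.sum((x - x_hat) ** 2, axis=0))   # Eq. (9)
        energy = np.sum(np.var(x, axis=(1, 2)))           # Eq. (10)
        return 100.0 * mse / energy                       # Eq. (11)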
Simulation Results
Uncoded CCA
Various configurations of the CCA were simulated on the test image shown in Fig. 14(a). This test
image is a 256 x 256 picture element subset from a 4-band ERTS image of the Verde Valley. The purpose of
these simulations is to obtain a measure of performance vs. rate as a function of the (n, m, CR) combina-
tions. The first simulations used the Basic Clustering Algorithm (BCA) of Fig. 3, but without coding of the
feature map. This corresponds to determining performance vs. rate as a function of (n,m) pairs for CR=1.
The performance of this uncoded CCA in terms of %MSE vs. Rtot for various values of n is plotted in Fig. 10.
For each n shown, all square subsets of n picture elements in the test image were compressed using various
values of m. For example, for n=36 the test image was processed four times as follows: two features
(m=2) per each subset of 6 x 6 (n=36) picture elements, three features (m=3) per 6 x 6, four features
(m=4) per 6 x 6, and five features (m=5) per 6 x 6. These four simulation results were then plotted in
Fig. 10 and a smooth curve was drawn between them. The cluster feature used for this image approximation
application was only the cluster mean, and each cluster mean was defined by either 6 or 8 bits per band
(b=6 or 8). Therefore, the image approximation consisted of substituting the closest cluster mean for
each picture element. For every simulation with a given n and m, the %MSE between the original and
reconstructed image was calculated from (9)-(11), and Rtot was calculated from (6). For n=576 and n=1024
the performance curves were essentially identical. The results in Fig. 10 show that for the selected
source data the uncoded CCA peaks in performance for all Rtot at n approximately equal to 256. Similar
results have also been obtained from simulations with other test images.
Coded CCA
The same simulations that were conducted for the uncoded CCA (CR=1) were also conducted with entropy
coding of the feature map. This simulation configuration is called the coded CCA. The purpose of these
simulations was to measure performance vs. rate as a function of (m,n) pairs when CR is determined by the
entropy coding of the feature map differences. The entropy coding technique used is that which was
discussed earlier and which is defined in [11]. Basically the encoder adaptively assigns variable length
codewords to the differences between adjacent elements of the feature map, which results in reducing the
average rate while allowing the feature map to be exactly reconstructed. All overhead rate costs have
been included in the compression ratio results for the entropy coding. The spatial rate reduction, or CR
due to entropy coding was shown in Fig. 7 to be greater for larger n. Thus the performance curves for the
coded CCA will shift more to the left of the uncoded CCA curves when n is larger. This effect is observed
by comparing the coded CCA curves in Fig. 11 with the uncoded CCA curves in Fig. 10. The coded CCA
continues to increase in performance as n increases, but it is near peak performance for n=256.
Adaptive CCA
In the next simulations, the Adaptive Clustering Algorithm (ACA), which was discussed under Cluster
Approaches, was used to adaptively determine the number of clusters to use for each sequence of n picture
elements. The total rate for this CCA configuration is approximately the average of all total rates for
each of the sequences of n elements, since the overhead to identify the number of clusters for each
sequence is very small. The advantage of the adaptive CCA is that a lower average value of m can typically
be used to obtain the same image quality. The performance and rate of the adaptive CCA are now dependent
on the criteria for increasing or decreasing the number of clusters, as well as the combination of
(n, m, CR). The performance of the adaptive CCA was observed to be less dependent on the size of n, with
all results for n of ten or greater nearly on the same performance curve. Although Rtot was sensitive to
changes in the criteria for determining the number of clusters used per sequence, the performance vs. rate
trade-off was very insensitive to changes in that criterion. There is a significant performance gain in the
adaptive CCA relative to the coded and uncoded CCA. This gain is shown in curves 3, 4 and 5 of Fig. 12.
However, the performance gain due to adaptability is dependent on both the image data and user application.
Hadamard and Fourier Transform Techniques
Tests were performed to compare the MSE performance of the CCA with some other compression techniques
which are well known. The two techniques used for comparison are the well known Hadamard and Fourier
transform techniques. A discussion of Hadamard and Fourier transforms in image coding is given in [12]
and [13]. Both techniques were used in an adaptive form, and the compression was done on each band of the
test image independently. Some improvement in performance could be obtained by modifying the technique to
work jointly on all four bands. The Hadamard transform technique involved taking the two dimensional
Hadamard transform of all 8 x 8 subsets of the image. Then the 64 coefficients were placed in 9 zones,
each of which had the quantization and corresponding bit rate per coefficient based on the variance, or
energy of the coefficients in the zone. The Fourier transform technique was similarly adaptive, but in
addition a symmetrical transform approach was used which doubly folded each data subset to provide
horizontal and vertical symmetry. This symmetry reduces the intensity discontinuities at the boundaries
due to the low pass filtering. The %MSE vs. total rate for these two adaptive transform techniques is
shown in curves 1 and 2 of Fig. 12 along with the CCA performance curves.
Simulation Summary
In Fig. 13 a summary of the above simulation results is given by plotting Rtot required for 2 %MSE
vs. n for the various compression configurations. The data points plotted in Fig. 13 were interpreted
from the smooth curves in the previous figures, as well as from other simulation results. These curves
point out a substantial MSE penalty for n

              +1    a ≥ 0
    sgn(a) =                                                        (2)
              -1    a < 0

and l is expressed in binary form as the set

    l = (l_{m-1} l_{m-2} ... l_1 l_0)                               (3)

with l_r ∈ {0,1}                                                    (4)

Slant Transform(4)

For n = 4 we have:

                  | 1      1      1      1    |
    [T]_4 = (1/2) | 3/√5   1/√5  -1/√5  -3/√5 |                     (5)
                  | 1     -1     -1      1    |
                  | 1/√5  -3/√5   3/√5  -1/√5 |

Larger arrays are constructed iteratively.
By design n is a power of two to allow implementation in a "fast" algorithm. The resulting set of coefficients
[C]_n representing the partitioned image data must now be requantized into a coefficient set [C']_n
according to some quantization model to produce the desired bit rate reduction. It is this step which is
perhaps the most important in any compression algorithm. For example one can perform "zonal" coding by
partitioning the transform array and assigning a bit rate to each coefficient within a partition based on
rate distortion theory (7) or pre-assigned rates based on subjective evaluations of the decoded imagery.
Another strategy is to threshold the coefficients and quantize only those elements greater than the thres-
hold value.
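A minimal Python sketch of the threshold strategy just mentioned is given below; the uniform mid-tread quantizer and the 8-bit default are illustrative assumptions, not a coder taken from this paper:

    import numpy as np

    def threshold_coder(coeffs, threshold, bits=8):
        # Keep only coefficients whose magnitude exceeds the threshold and
        # uniformly requantize them; everything else is deleted (set to zero).
        c = np.asarray(coeffs, dtype=float)
        kept = np.abs(c) > threshold
        c_max = np.abs(c[kept]).max() if kept.any() else 1.0
        step = 2.0 * c_max / (2 ** bits)
        quantized = np.round(c / step) * step      # uniform mid-tread quantizer
        return np.where(kept, quantized, 0.0), kept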
Adaptive techniques can also be applied. For example, bit assignments can be varied
as a function of image statistics over the partition. For this case, the output bits should be buffered
to insure a constant bit rate for transmission.
The code word bit stream representing the sequential source coding of each image partition, along
with any overhead bits required for the decoding process such as utilized in certain adaptive schemes, is
then channel encoded and transmitted. At the receiver the inverse operations are performed and the
coefficient set [C']_n reconstructed. The inverse transform operation

    [I']_n = [T]^t_n [C']_n [T]_n                                   (6)

then yields an image partition [I']_n which should be a faithful reproduction of [I]_n. These partitions
are then used to form the composite reconstructed image [I']_N.
Before going on to discuss the comparison of I and I', consider another operation which can be
accomplished before source decoding. The process discussed above can be related to a low pass filtering
operation (assuming that the coefficient requantization effects can be ignored) and therefore one can
investigate the possibility of applying a high frequency boost or other transfer function compensation to
the coefficients of [C']_n for each partition before inverse transformation. Such an operation would, for
the case of a cosine transform, correspond to the block-mode filtering of digital imagery except that
there is no overlap between partitions to reduce edge effects. Thus one would form the scalar multiplica-
tion:
    [C'']_n = F_n · [C']_n                                          (7)

where F_n is an array of coefficient weights. The utility of such a technique is demonstrated in Figure 3.
The original image (a) is digitized at 10 bits/pixel and N = 512 while (b) is the reconstructed image
after a compression to 2.6 bits/pixel using the cosine transform with n = 16. Figure 3(c) is the recon-
structed image where the coefficient weighting array F(k,l) = 1 + √[(k-1)² + (l-1)²]/15 for k,l = 1,2,...,16
was first applied to the quantized coefficients of each partition before reconstruction. A radial section
of this 2:1 boost filter is shown in 3(d). The improvement is obvious and in fact the filtered image has
been subjectively rated superior to the original.
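The weighting of (7) amounts to an element-by-element multiplication of each requantized coefficient block before the inverse transform. The Python sketch below is illustrative; in particular, the radial form of the boost array is reconstructed from the garbled printed expression and the 2:1 boost of Fig. 3(d), and should be read as an assumption:

    import numpy as np

    def boost_weights(n=16):
        # Coefficient weighting array F of Eq. (7): 1 at the DC term, rising
        # to about 2 at the highest horizontal or vertical frequency.
        k = np.arange(1, n + 1)
        kk, ll = np.meshgrid(k, k, indexing="ij")
        return 1.0 + np.sqrt((kk - 1) ** 2 + (ll - 1) ** 2) / (n - 1)

    def apply_boost(coeff_block):
        # Element-wise weighting of a requantized coefficient block.
        return boost_weights(coeff_block.shape[0]) * coeff_block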
The MSE As A Quality Metric
The most widely used measure of image quality for coding analysis is the normalized mean square error
(MSE) between the coder input (I) and output (I') images:

    MSE = Σ_{i,j} |I(i,j) - I'(i,j)|² / Σ_{i,j} |I(i,j)|²
        = Σ_p Σ_{i,j} |I_p(i,j) - I'_p(i,j)|² / Σ_p Σ_{i,j} |I_p(i,j)|²        (8)

where Σ_p indicates a sum over all partitions.

The orthonormality of the transform [T] allows us to restate (8) as:

    MSE = Σ_p Σ_{k,l} |C(k,l) - C'(k,l)|² / Σ_p Σ_{k,l} |C(k,l)|²              (9)
One can therefore relate the overall MSE to the errors introduced on the coefficients C(k,l) by their
requantization and/or deletion. The problem of optimizing the quantization algorithm relative to the MSE
and subjective quality assessment has been considered previously. We have found that decoded image
errors due to coefficient requantization in zonal coding schemes are more objectionable than the smoothing
effect introduced by high frequency coefficient deletion (low pass filtering). To develop a useful
requantization design strategy which is consistent with subjective quality measures, we separate the two
major components of the MSE:
    MSE = MSE_q + MSE_f                                             (10)

where

    MSE_f = Σ_p Σ_{k_d,l_d} |C(k_d,l_d)|² / Σ_p Σ_{k,l} |C(k,l)|²

for deleted coefficients C(k_d,l_d) is the error due to filtered or deleted coefficients, and

    MSE_q = MSE - MSE_f
          = [Σ_p Σ_{k,l} |C(k,l) - C'(k,l)|² - Σ_p Σ_{k_d,l_d} |C(k_d,l_d)|²] / Σ_p Σ_{k,l} |C(k,l)|²       (11)

is the error due to coefficient requantization.
The subjective effects are considerably different for the two error terms. The coefficient requantiza-
tion generates a "noise-like" distortion and artifacts while deletion of the high frequency terms actually
reduces noise, although at the expense of resolution. An obvious observation is that both quantization
and filtering effects are undesirable yet unavoidable parts of the compression process, and it is felt that
the compression algorithm designer can utilize MSE_f and MSE_q in order to optimize his algorithm. Assuming
an optimum quantization procedure is utilized for individual coefficients, a good statistical model is
available for transform domain variances, the transmission rate is fixed, and there are a total of K = n²
FIGURE 3. Compression/Filtering Experiment Using Cosine Transform and 16 x 16 Partitioning. (a) Original (I) image at 10 bits/pixel. (b) Reconstructed decoded (I') image after compression to 2.6 bits/pixel. (c) Reconstructed decoded image with high frequency boost over each partition. (d) Radial section of high frequency boost filter.
coefficients per sub-block of which K_q are quantized and K_f are deleted (so K = K_f + K_q), then the
following procedure can be used:

1. Set K_f such that MSE_q ≈ MSE_f.
2. Compare I and I'. If I' contains negligible noise, iteratively decrease K_f.
3. If I' contains excessive additional noise and/or artifacting, iteratively increase K_f.

Step (1) should result in a minimum MSE but steps (2) and (3) can result in improved visual representation
by the tradeoff of quantization noise and resolution loss.
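The decomposition used by this procedure can be sketched in Python as follows; the function assumes the deleted coefficients are simply set to zero in the requantized set, which is consistent with (10) and (11) but is stated here as an assumption:

    import numpy as np

    def mse_components(C, C_quant, deleted_mask):
        # Split the normalized MSE of Eqs. (8)-(11) into a filtering term
        # (deleted coefficients) and a requantization term (retained ones),
        # using the orthonormality of the transform. C and C_quant are stacks
        # of coefficient blocks; deleted_mask marks deleted coefficients,
        # which are assumed to be zero in C_quant.
        C = np.asarray(C, dtype=float)
        C_quant = np.asarray(C_quant, dtype=float)
        energy = np.sum(C ** 2)
        mse_f = np.sum(C[deleted_mask] ** 2) / energy
        mse_total = np.sum((C - C_quant) ** 2) / energy
        mse_q = mse_total - mse_f
        return mse_total, mse_f, mse_q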
Experiment
A computer program has been developed for the simulation of various transform coding methods and
it has been used to generate data for actual imagery. Figure 4 illustrates the manner in which this
package was utilized in an experiment to evaluate the correlation of MSE_f and MSE_q with subjective
image ranking.
The image used for this study is shown in Figure 3(a) and consists of 512² (N = 512) pixel values, each
quantized to 10 bits. Analysis of this array indicates that there are 9 bits of useful image data and
furthermore its structure is representative of what might be encountered in a low altitude remote sensing
application. This image was partitioned into one of two sizes (n = 8 or 16) and one of 3 transforms
applied (Cosine, Slant, Hadamard). Selected coefficients were then requantized and others deleted to give
bit rates (R) of .64, 1.4, 2.6, 4.6 bits/pixel for n = 16 and rates of 1.28, 2.6, 4.3, 6.9 for n = 8.
The requantization scheme consisted of first looking at the coefficients in bands parallel to the opposing
diagonal of [C]_n (Figure 5(a)). We, and others(11), have observed that iso-variance regions of Σ_p [C]_n
tend to line up quite well with this zonal approach. After determining the coefficient variance σ_z² for
the z-th zone one of the following two methods was applied.

1. The maximum absolute value |C|_max in each band was determined and the coefficients in that zone
uniformly requantized between ±|C|_max; |C|_max was quantized to 10 bits for each zone.
2. The variance σ_z² was used to define a gaussian quantizer for that band; σ_z² was quantized
to 10 bits for each zone.
The actual bit assignments for each zone to give the above bit rates are shown in Table 1. In
addition to these zonal coding experiments, an adaptive approach as used in reference 10 was utilized
for the cosine transform with n = 16. In this case the coefficients of [C]_n were first ordered into a
one-dimensional sequence of 256 values. The analog for n = 8 is shown in Figure 5(b). The coefficients
C(m) for m = 1 to 256 were then recursively coded using a variance estimate σ̂²_m based on previously
coded coefficients, and a gaussian quantizer. The coded coefficients Ĉ(m) are determined by the quantiz-
ation operator Q such that Ĉ(m) = Q_{σ̂_m}(C(m)) and the next estimated variance is σ̂²_{m+1} =
W·σ̂²_m + (1-W)·Ĉ(m)² for a pre-determined weight W. The primary assumptions of such a technique are that
the C(m) are uncorrelated and the σ²_m are highly correlated. This particular coder was operated at 1.55
bits/pixel.
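A minimal Python sketch of this recursive coder follows; the uniform quantizer, the weight W = 0.75, the 3-bit word, and the initial variance are illustrative assumptions (the experiment used a gaussian quantizer, and the paper does not state these values):

    import numpy as np

    def adaptive_coder(c, w=0.75, bits=3, sigma0=1.0):
        # Each ordered coefficient C(m) is quantized with a quantizer scaled
        # by the running variance estimate, and the estimate is then updated
        # from the coded value, as described in the text.
        var = sigma0 ** 2
        coded = np.zeros(len(c))
        levels = 2 ** (bits - 1)
        for m, cm in enumerate(c):
            step = np.sqrt(var) / levels                      # scale to sigma_m
            coded[m] = np.clip(np.round(cm / step), -levels, levels - 1) * step
            var = w * var + (1 - w) * coded[m] ** 2           # next variance estimate
        return coded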
Each of the above 49 coded data sets was decoded and the resulting images [I']_N were both numerically
and subjectively compared to [I]_N. The numerical comparison consisted of a computation of MSE, MSE_f and
MSE_q for each image [I']_N while the subjective analysis consisted of a ranking of the displayed imagery
according to a variety of tasks aimed at "exploiting" the imagery. Care was taken to insure that the
subjective ranking was somehow tied to how the data would be used.
The displayed imagery represented a wide range of quality and bit rates as demonstrated by Figures 6
and 7. Figure 6(a) is the decoded image for the cosine transform with n = 16, R = 2.6 bits and uniform
requantization while 6(c) is the same transform with R = .64 bits. The corresponding scaled absolute
value difference images (α|[I]_N - [I']_N|) are shown in Figures 6(b) and 6(d). The scaling factor α was
set to 15 to enhance the visual impact of the error distribution. One can immediately note the correlation
of errors with image edges and other regions of high spatial frequency content. Since image information
may be located at or near edges, it is a natural step to look at the performance of the adaptive scheme.
It should be sensitive to the frequency content (or sub-block "busyness") and therefore tend to distribute
the errors over the entire scene rather than at edges. Figure 7 demonstrates that phenomenon. Figure 7(a)
is the decoded image for R = 1.4 bits, n = 16 and gaussian requantization of the cosine transform
coefficients, while 7(b) is the scaled (α = 15) difference image. Figure 7(c) is the decoded image for
the adaptive case (R = 1.55) with 7(d) the scaled difference image. Comparison of 7(b) and 7(d) illustrates
the superior performance of the adaptive approach in decorrelating coding errors and image structure, while
providing an overall MSE roughly half that of the non-adaptive case for nearly equal bit rates.
The results of the computed objective (MSE) metrics are shown in Table 2 for all 49 cases used in
the experiment. The cosine transform is clearly the superior performer when one compares the metrics for
equivalent quantizer, R and n. Also the gaussian quantizer is superior to the uniform case, but the
margin is surprisingly narrow. Finally, the apparent ambiguity in MSE_f and MSE_q between Slant and Hadamard
transform cases should be noted. This is due to the different convergence properties of the higher sequency
Slant and Hadamard terms.

A regression analysis was performed on each ranking of the image data and the corresponding set of
metrics, this being done for all such rankings. Table 3 shows the resulting correlation coefficients for
each MSE metric.
FIGURE 4. Transform Coding Evaluation Experiment (block diagram: input image, N = 512 at 10 bits; partition into n = 8 or 16; reformatting; transform: Cosine, Slant, Hadamard; coefficient requantization and deletion at R = .64, 1.4, 2.6, 4.6 for n = 16 or R = 1.28, 2.6, 4.3, 6.9 for n = 8, with uniform, gaussian, or adaptive (R = 1.55) quantization; coefficient reconstruction and inverse transform; output image, N = 512 at 10 bits; computation of MSE, MSE_f, MSE_q; hardcopy, difference images, and subjective ranking).
FIGURE 5. Definition of Transform Domain Zones and Coefficient Ordering for 8 x 8 Partitioning. (a) Coefficient zones (z), numbered in bands parallel to the opposing diagonal of [C]_n. (b) Coefficient ordering to obtain C(m).
FIGURE 6. Cosine Transform Coding Examples with Uniform Quantizer. (a) Reconstructed decoded (I') image for n = 16, R = 2.6 bits/pixel, MSE = .073%. (b) Difference image α|I - I'| for case R = 2.6 and α = 15. (c) Reconstructed decoded (I') image for n = 16, R = .64 bits/pixel, MSE = .59%. (d) Difference image α|I - I'| for R = .64 and α = 15.
FIGURE 7. Comparison of Adaptive and Non-Adaptive Coders Using Cosine Transform. (a) Reconstructed decoded (I') image for n = 16, R = 1.4 bits/pixel, gaussian quantizer, MSE = .17%. (b) Difference image α|I - I'| for case R = 1.4 and α = 15. (c) Reconstructed decoded (I') image for adaptive coder, n = 16, R = 1.55 bits/pixel, MSE = .08%. (d) Difference image α|I - I'| for adaptive case and α = 15.
TABLE 1. Zone Bit Assignments for Experiment Bit Rates (R) and Partition Size (n)
(bits per coefficient in each zone z; zones not listed are deleted)

  n    R      z=1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16-31
  16   0.64    10   9   8   8   6   deleted
  16   1.4     10   9   8   8   6   4   4   4   3   3   deleted
  16   2.6     10   9   8   8   6   4   4   4   3   3   3   3   3   3   3   deleted
  16   4.6     10   9   8   8   6   4   4   4   3   3   3   3   3   3   3   3
  8    1.28    10   9   8   deleted
  8    2.6     10   9   8   8   6   deleted
  8    4.3     10   9   8   8   6   4   4   4   deleted
  8    6.9     10   9   8   8   6   4   4   4   3   3   3   3   3   3   3
                              COSINE                 SLANT                  HADAMARD
  n    R     QUANTIZER   MSE    MSE_f  MSE_q    MSE    MSE_f  MSE_q    MSE    MSE_f  MSE_q
  16   .64   Uniform     .59    .579   .011     .67    .657   .013     .69    .676   .014
  16   1.4   Uniform     .19    .17    .020     .26    .23    .03      .28    .25    .03
  16   2.6   Uniform     .073   .046   .027     .096   .060   .036     .102   .068   .034
  16   4.6   Uniform     .030   0      .030     .040   0      .040     .06    0      .06
  16   .64   Gaussian    .55    .542   .008     .62    .609   .011     .63    .618   .012
  16   1.4   Gaussian    .17    .155   .014     .24    .219   .021     .25    .231   .019
  16   2.6   Gaussian    .070   .049   .021     .09    .062   .028     .088   .062   .026
  16   4.6   Gaussian    .028   0      .028     .038   0      .038     .035   0      .035
  8    1.28  Uniform     .47    .465   .005     .49    .483   .007     .50    .493   .007
  8    2.6   Uniform     .123   .112   .011     .16    .138   .022     .17    .149   .021
  8    4.3   Uniform     .047   .023   .024     .060   .030   .030     .061   .033   .028
  8    6.9   Uniform     .028   0      .028     .030   0      .030     .028   0      .028
  8    1.28  Gaussian    .45    .446   .004     .46    .454   .006     .46    .455   .005
  8    2.6   Gaussian    .10    .092   .008     .11    .095   .015     .12    .109   .011
  8    4.3   Gaussian    .030   .018   .012     .040   .020   .020     .038   .020   .018
  8    6.9   Gaussian    .021   0      .021     .024   0      .024     .022   0      .022
  16   1.55  Adaptive    .080   .046   .034

TABLE 2. Objective Metrics for Coder Evaluation Experiment
  METRIC    CORRELATION COEFFICIENT
  MSE             .73
  MSE_f           .75
  MSE_q           .86

TABLE 3. Objective MSE Metrics and their Correlation with Subjective Image Ranking
Conclusions
The development of a viable and general objective measure of image quality is a difficult task. One
should try to identify the salient features of a particular image chain model and incorporate them in
some fashion into a particular quality criterion well-suited and in harmony with the image and its intended
use. We have taken such a tack in the development of a refined MSE error as it applies to the judgement
of imagery which has been transform coded. The noise introduced by coefficient requantization (MSE_q) has
been found to be a better measure of quality than the traditional overall MSE. It is hoped that future
work in this area will yield still better metrics and aid in the efficient modelling of such data
transmission systems.
Acknowledgements
The authors wish to acknowledge the programming support provided by Mr. John N. Hamilton.
References
1. A. G. Tescher and J. R. Parsons, "Cross-Spectrum Error Criterion as an Image Quality Measure," Applied Optics, Vol. 12, June 1974, p. 1460-1465.
2. P. A. Wintz, "Transform Picture Coding," Proc. IEEE, Vol. 60, No. 7, July 1972.
3. W. K. Pratt, J. Kane and H. C. Andrews, "Hadamard Transform Image Coding," Proc. IEEE, Vol. 57, Jan. 1969.
4. W. K. Pratt, W. Chen and L. R. Welch, "Slant Transform Image Coding," IEEE Trans. Comm., Vol. COM-22, No. 8, Aug. 1974.
5. N. Ahmed, T. Natarajan and K. R. Rao, "Discrete Cosine Transform," IEEE Trans. Computers, Vol. C-23, pp. 90-93, Jan. 1974.
6. C. Cardot, "A New Eye on Walsh Functions," Proc. 1972 Symposium on Applications of Walsh Functions, AD-744-650, pp. 393-400.
7. L. D. Davisson, "Rate Distortion Theory and Applications," Proc. IEEE, Vol. 60, July 1972.
8. B. R. Hunt, "Block-Mode Digital Filtering of Pictures," Mathematical Biosciences, Vol. 11 (1971), p. 343-354.
9. M. Tasto and P. A. Wintz, "Image Coding by Adaptive Block Quantization," IEEE Trans. Comm., Vol. COM-19, pp. 957-972, Dec. 1971.
10. A. G. Tescher, "Adaptive Intra- and Inter-Frame Image Coding Utilizing Fourier Techniques," USC Image Processing Research Semiannual Technical Report No. USCEE 444, pp. 24-33.
11. W. B. Scheming, "Digital Image Transform Encoding," RCA Corporate Engineering Services, Report RE-20-1-10, 1973.
A COMPARISON OF HARDWARE IMPLEMENTATIONS OF
THE HADAMARD TRANSFORM FOR REAL-TIME IMAGE CODING
Stephen C. Noble
Air Force Avionics Laboratory
Wright-Patterson Air Force Base, Ohio 45433
Abstract
Several different hardware configurations for implementing the Hadamard transform in
real-time are discussed. The discussion is referenced to a 64 point (8 x 8) transform
that is used in image coding for bandwidth reduction. A successive implementation of two
8 point transformations with intermediate random access memory using the straight forward
N2 computation is compared to implementations of the Fast Hadamard Transform (FHT). The
speed of computing transform coefficients is compared to the hardware requirements of the
various techniques. The relationship between computational speed and the resolution and
frame rate of television imagery that may be transform encoded using the different
methods is illustrated. It is shown that for picture element (pixel) rates up to 5 MHz
the N² computation is the more economical to implement.
Introduction
Transform coding for the efficient transmission of images has been a subject of inter-
est in recent years. In a general article on the subject, Wintz (1) describes the com-
monly used procedure of subdividing the image into subpictures and then computing the
transform of the pixels in each subpicture separately. The subpicture size referred to
in this paper is 8 x 8 pixels.
The basis for the interest in transform coding is that, for a variety of transforms,
picture information is concentrated into fewer transform coefficients than the pixel
amplitudes that are transformed. By efficiently assigning bits to the transform coeffi-
cients a substantial reduction in the number of bits over those required for conventional
pulse code modulation (PCM) of the pixels is obtained. Reductions from 6 bits/pixel PCM
to 1-2 bits/pixel (average over the subpicture) in the transform domain are typical.
Of the two-dimensional transforms, the Hadamard transform allows for the fastest im-
plementations in real-time.(2) Recent advances in charge coupled device (CCD) technology
have made possible the use of the cosine transform at pixel rates in the 1-5 MHz range.(3)
The differences in the mean square error performance of the various transforms when used
to encode images having exponential correlation statistics are in the range of .05% at
the .25% error level.(4) In choosing a transform for image coding the paramount question
is one of cost for the given speed requirements.

The Hadamard transform has been implemented in real-time(5) using TTL hardware in the
pipeline FHT configuration. For cases where a slightly lower resolution of 256 elements
per scan line or lower frame rates of 7.5 frames per second are acceptable, a combination
of two 8 point Hadamard transforms with intermediate random access memory (RAM) may be
used. This combination requires only two arithmetic accumulators.
The Hadamard Transform

The following discussion will address the 8 point Hadamard transform. Larger point
transforms are common, but for purposes of illustration the 8 point transform is easier
to use. Furthermore, there is evidence that there is no substantial improvement in per-
formance for the Hadamard transform for subpicture sizes greater than 8 x 8 pixels.
The 64 point two-dimensional transform may be obtained by cascading two 8 point trans-
forms.

The 8 point Hadamard transform is described by eight equations of the type shown in
equation (1), which is the sequency 5 Hadamard coefficient.

    H5 = P0 - P1 - P2 + P3 - P4 + P5 + P6 - P7                       (1)

As can be seen, each Hadamard coefficient is generated by a combination of sums and
differences of the eight pixel amplitudes. Equation (2) is the matrix operator that
defines the addition and subtraction operations for all eight Hadamard coefficients.
                                          Sequency or
                                          No. of Sign Changes
             + + + + + + + +                      0
             + - + - + - + -                      7
             + + - - + + - -                      3
             + - - + + - - +                      4
    [H]_8 =  + + + + - - - -                      1                  (2)
             + - + - - + - +                      6
             + + - - - - + +                      2
             + - - + - + + -                      5
The only operations required to implement the Hadamard transform in hardware are addition/
subtraction and register shifts (multiplication/division by 2).
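For illustration, the straightforward N² evaluation implied by equation (2) can be sketched in Python; the Kronecker construction of the sign matrix and the omission of any normalizing factor are assumptions made only for the sketch:

    import numpy as np

    def hadamard_matrix(n=8):
        # Natural (Hadamard) ordered sign matrix; for n = 8 its rows carry
        # the sequencies 0, 7, 3, 4, 1, 6, 2, 5 listed in equation (2).
        h2 = np.array([[1, 1], [1, -1]])
        m = np.array([[1]])
        while m.shape[0] < n:
            m = np.kron(m, h2)
        return m

    def hadamard_8pt(pixels):
        # Each coefficient is a signed sum of the eight pixel amplitudes,
        # i.e. the straightforward N^2 = 64 add/subtract computation.
        return hadamard_matrix(8) @ np.asarray(pixels)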
The Fast Hadamard Transform
The fast Hadamard transform takes advantage of intermediate results in the computation
of coefficients to reduce the total number of computations required. The implementation
is very similar to the fast Fourier transform which requires multiplication of complex
numbers to implement.
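A software sketch of such a fast transform is given below in Python; it performs the log2 N butterfly stages of additions and subtractions and is meant only to illustrate the data flow, not the hardware configurations discussed next:

    import numpy as np

    def fht(x):
        # Fast (Walsh-)Hadamard transform: log2(N) stages of butterflies,
        # N*log2(N) additions/subtractions instead of N^2. The output is in
        # natural (Hadamard) order; no normalization is applied.
        x = np.asarray(x, dtype=float).copy()
        n = len(x)
        h = 1
        while h < n:
            for start in range(0, n, 2 * h):
                for i in range(start, start + h):
                    a, b = x[i], x[i + h]
                    x[i], x[i + h] = a + b, a - b    # butterfly
            h *= 2
        return x

    # For 8 points this uses 8 * log2(8) = 24 additions and subtractions.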
The Pipeline Implementation
Fig. 1 illustrates the pipeline implementation of the FHT. The dashed line input to
each arithmetic element designated with a minus sign (-) is the subtrahend input to the
indicated subtraction. The figure is organized in the perfect shuffle configuration.
Each stage of the computation is interconnected in the same way. The speed of this method
is limited only by the speed of the arithmetic hardware used. Each symbol in Fig. 1
represents an arithmetic element together with a register. Arithmetic operation speeds
on the order of 25 ns are available using emitter coupled logic (ECL) 10,000 series
devices. For an 8 point transform this corresponds to a pixel transformation rate
greater than 300 MHz. The system implemented at NASA-Ames Research Center(5) uses a
16 point pipeline Hadamard transform together with a multiplexed 4 point implementation
to achieve a 128 megapixel per second transformation rate. The system uses TTL hardware
operating at 125 ns for each arithmetic operation. Such speeds are required in special
applications where it is not possible to process pixels at a steady rate. That such
high speeds are not required may be made apparent by considering a 525 line 30 frame
per second video signal that has been digitized to 512 pixels per line. The pixel rate
is 512 x 525 x 30 = 8 MHz. Clearly, when it is feasible to buffer the video data properly,
the number of arithmetic elements used may be traded against lower speed requirements.
The Iterative Implementation of the Perfect Shuffle
Since the interconnection pattern of the perfect shuffle is the same for all stages of
the computation, the hardware can be reduced to one stage by feeding the output back to
the input and cycling the operation 3 times. In general, the number of cycle times
required is log2 N, where N is the number of points to be transformed. This cycling
operation is illustrated in Fig. 2 with a feedback loop to show that the output of each
arithmetic element is fed back to the corresponding point above the element. Every
third operation new data is entered through the multiplexer (MUX). This configuration
operates at one third the rate of the pipeline system, but requires only eight arithmetic
elements. A pixel rate of 100 MHz can be achieved using ECL 10,000 series components.
Using low power Schottky TTL, pixel rates of 20 MHz can be maintained. Based on the
calculation in the preceding paragraph, these speeds are still in excess of the 8 MHz
required. The main reason for using such high speed processing has been the difficulty
in fully buffering the pixel data. To transform 525 line interlaced video may require
a field storage with access to 4 lines of video data in each of two fields. Once the
required data is collected it is usually available at the 8 MHz rate from many outputs
simultaneously. In the case of a 64 point (8 x 8) transform a total of 8 inputs at
8 MHz would result since the output of eight scan lines would have to be processed
simultaneously. To avoid this requirement, eight lines of pixel data must be stored
and read out one subpicture at a time. For 512 pixels by 8 lines this amounts to 4K
Fig. 1 Pipeline Implementation of the FHT (inputs P0 through P7; outputs H0, H7, H3, H4, H1, H6, H2, H5)
storage with a read-modify-write speed of 125 ns to meet the requirements for 30 frame
per second video. At 6 bits/pixel this amounts to 24 1K TTL RAM chips. In addition to
the chip count, these high speed RAMs will consume 15-20 watts of DC power.
An 8 Point FHT Using One Arithmetic Element
Except for the high speed memory requirement a single ECL arithmetic element could
perform the required 8 log2 8 = 24 additions and subtractions fast enough to transform
pixels at the required 8 MHz rate. At 25 ns for an addition operation the 24 operations
could be completed in 600 ns. This equates to 75 ns per pixel or a 13.3 MHz rate, which
is more than adequate. Several very high speed ECL memories are required to manipulate
the data to be operated on by the ECL arithmetic element. A possible sequence for that
manipulation was outlined by Pratt.(2)
An Economical Implementation for Low Frame Rates
The preceding section discussed several implementations of the FHT that operate at
pixel rates in excess of 10 MHz. If the application is suitable for low frame rates
(7.5 frames per second or less) or lower resolution (256 x 256 pixels per frame), then a
simple implementation of a 64 point (8 x 8) transform can be achieved in real-time.
Fig. 3 illustrates the use of two 8 point transforms using N² = 64 computations each
and a RAM to process the video data.
Fig. 2 An Iterative Implementation of the FHT (pixels P0 through P7 enter through a multiplexer; the outputs H0, H7, H3, H4, H1, H6, H2, H5 are fed back to the inputs for successive passes)
Fig. 3 A Two-Dimensional Implementation of the Hadamard Transform Using Two ALU's and a RAM (each 8 point transformer consists of registers, a mode-controlled ALU and an accumulator, with the RAM holding intermediate results between the two to produce the transform coefficients)
Circuit Operation
Video data in the form of pixels enters R1. After the accumulation of 8 pixels they
are loaded into R2 and recirculated in R2 8 times at a rate 8 times faster than the
shift rate in R1. During each circulation of the 8 pixels the mode input to the
arithmetic logic unit (ALU) provides the proper sequence of add (+) and subtract (-)
instructions to the ALU to generate a Hadamard coefficient. At the end of each circula-
tion the Hadamard coefficient is transferred from the accumulator into the RAM. The RAM
is large enough to store 8 lines of transformed video data. This amounts to 4K words for
512 elements per line. The data is read from the RAM orthogonal to the orientation in
which the data was stored. This procedure generates the two-dimensional Hadamard trans-
form coefficients when the data from the RAM is processed through the second 8 point
transform circuit.
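The row-then-column data flow just described can be sketched in software as follows; this Python fragment illustrates only the arithmetic of the separable 64 point transform (no normalization is applied, which is an assumption), not the register and RAM timing of Fig. 3:

    import numpy as np

    def hadamard_2d_block(block):
        # 8-point transform along each row of the 8 x 8 sub-picture, results
        # held in intermediate storage (the RAM of Fig. 3), then an 8-point
        # transform along each column of the stored results.
        h2 = np.array([[1, 1], [1, -1]])
        h8 = np.kron(np.kron(h2, h2), h2)     # 8-point Hadamard sign matrix
        rows = np.asarray(block) @ h8.T       # first (row-wise) transform
        return h8 @ rows                      # second (column-wise) transform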
Speed of Operation
The primary limitation in the speed of operation of this circuit is the RAM. A read-
modify-write time of 500 ns will result in a 7.5 frame per second processing rate.
Schottky TTL or ECL logic will meet the 62 ns operating speed required to implement the
8 point transform at this speed (8 operations x 62 ns = 500 ns). For this lower speed
operation, less than 50 IC's are required to implement this two-dimensional Hadamard
transform. The use of ECL to build this unit would result in a pixel rate of greater
than 5 MHz. This would be 20 frames per second at a resolution of 500 x 500 pixels.
The use of higher speed RAM's (200 ns read-modify-write) would be necessary to build the
system. 4K RAMs are not generally available with this speed. An increase in the number
of RAM IC's from 8 to 24 would be required.
Conclusions
Several methods of computing the Hadamard transform have been illustrated covering
a wide range of computational speeds. In particular, a combination of two 8 point
transformers using the straightforward N² = 64 computation and a small RAM results in
an economical implementation when lower frame rates or resolution are adequate to meet
the requirement.
References
1. Wintz, P. A., "Transform Image Coding," Proceedings of the IEEE, Vol. 60, No. 7,
pp. 809-820, July 1972.
2. Pratt, W. K., Kane, J., and Andrews, H. C., "Hadamard Transform Image Coding,"
Proceedings of the IEEE, Vol. 57, No. 1, pp. 58-68, January 1969.
3. Habibi, A., Pratt, W. K., Robinson, G., Means, R., Whitehouse, H., and Speiser, J.,
"Real Time Image Redundancy Reduction Using Transform Coding Techniques," 1974 Inter-
national Communications Conference, June 1974.
4. Ahmed, N., Natarajan, T., and Rao, K. R., "Discrete Cosine Transform," IEEE
Transactions on Computers, Vol. C-23, pp. 90-93, January 1974.
5. Noble, S. C., Knauer, S. C., and Giem, J. I., "A Real-Time Hadamard Transform
System for Spacial and Temporal Redundancy Reduction in Television," presented at 1973
International Telemetering Conference, October 1973.
6. Robinson, G. S., "Orthogonal Transform Feasibility Study," prepared for NASA/MSC
by COMSAT, Contract No. NAS 9-11240, Final Report, pp. 3-7 to 3-21, October 1971.
A TEACHING STEREO-VIDEO MICROSCOPE
James F. Butterfield, President
Dimensional Television Corporation
13720 Riverside Drive
Sherman Oaks, California 91423
Abstract
A new medical training instrument has been developed, which provides a means whereby microsurgery procedures
performed using an optical microscope can be viewed by any number of medical students. This is accomplished by
picking-up a television picture from the operating microscope and displaying the picture so the operation can be seen
"live" on TV monitors at remote locations. Also the picture can be recorded on video tape for later playback for med-
ical teaching purposes. This instrument has the principal advantage over other teaching means of microsurgery in that
it can provide a three dimensional color picture so that the student sees the same view the surgeon saw as he perform-
ed the operation.
Background
In recent years an increasing number of surgeons have been using optical stereo microscopes to view small areas
of the body so that they can perform microsurgical techniques. Such surgical microscopes are used in the fields of
ophthalmology, otolaryngology, neurosurgery and plastic surgery to name a few. However, medical schools and indi-
vidual surgeons have had difficulty in teaching medical students their techniques and in keeping up with new techniques
developed by others. This problem is occasioned by the fact that such surgery is performed in a small area and usu-
ally has to be viewed with magnifications of up to 20x. It is, therefore, difficult for a group of students to gather
around the patient and attempt to see the same area of the body the surgeon is viewing. The only exception to this is an
assistant's microscope trained on the area, which allows one person at a time to see an oblique view.
Surgeons some years ago drew illustrations of surgery and used these as visual teaching aids. More recently still
and motion picture cameras have been trained directly on the area of operation or have been attached with beam split-
ters to operating microscopes. These photographic means have certain disadvantages. Except in the case of Polaroid
photos, the picture is not immediately available as development and printing usually take several days. When the pic-
ture is viewed it is sometimes out of focus or some other technical problem occurred, making the photograph or film
not useable. Also there is no way for others to see this picture simultaneously with the operation and it must be view-
ed at a special meeting set up some days later. Finally, these methods provide a picture in only two dimensions and
not three dimensional as the surgeon sees with his own two eyes directly or through the stereo optical microscope.
TEACHING STEREO-VIDEO MICROSCOPE (illustration, front, side and top views: microscope with slit lamp, objective lens, stereo beam splitter, zoom tubes and eyepieces; television camera; display console with TV monitor, stereo-hood and video tape recorder; classroom TV monitor with stereo-screen)
The author and his associates have been active in the fields of video microscopy and three dimensional television
(1 thru 6). A recent paper on the development of a surgical video microscope describes an instrument which produces
a high resolution stereoscopic color television picture so that the surgeon can actually operate by looking at a televi-
sion display. This equipment was designed in coordination with and named in honor of Dr. John P. Beale, Jr. of the
San Francisco Eye Institute. With such equipment the picture must be three dimensional for the surgeon to perform
delicate operations, such as cutting and sewing. The surgical video microscope has the advantage over an optical
microscope of permitting the surgeon considerable head movement (7).
Description
The illustration shows the various components of a Teaching Video Microscope. The Slit Lamp directs its light on
the patient and the light is in turn reflected up through the Objective Lens and Zoom Tubes of the optical microscope.
The eyepiece housing has been separated from the objective lens housing and a Stereo Beam Splitter inserted therein.
This transmits half of the light of each image of the stereo-pair up through the Eyepieces and the other half of the light
of each image is transmitted horizontally to a Television Camera. The left and right eye images are focused side-by-
side on the pickup tubes of the miniaturized color television camera. The two images are displayed side-by-side on a
color Television Monitor in a Display Console for "real time" viewing and on classroom TV monitors.
The classroom TV monitors have a Stereo-Screen positioned in front of them. This consists of polarizing filters on each
half, oriented at right angles to each other. The students wear Stereo-Glasses, which include corresponding
polarizing filters in front of each eye that insure the proper image is channeled to the appropriate eye. The Stereo-
Glasses also include prism wedges, which cause the images to appear superimposed. Attached to the Display Console
monitor is a hinged Stereo-Hood, which consists of an enclosure holding a large pair of Stereo-Glasses. In this case,
only individual viewing is possible but the viewer can wear regular corrective glasses and turn away and make notes
and look back at the image without the necessity of removing the special Stereo-Glasses. The Display Console also
contains a conventional Video Tape Recorder, which is connected into the circuit. This VTR records the two side-by-
side pictures and plays them back for stereo viewing on the classroom monitors.
Advantages
The Teaching Video Microscope has several valuable advantages. This is the only means whereby a group of
students can see the operation in exactly the same manner as the surgeon does in 3D and color. This viewing can take
place simultaneously with the operation or a video tape may be made for later instruction or analysis purposes. The
video tape can be edited into a quality educational tape, which may be distributed through medical television network
channels.
As the operation is being performed other members of the surgical team can follow the progress of the operation
on two dimensional or three dimensional television monitors and thereby be more proficient in aiding the surgeon. Al-
so, other surgeons can follow the progress of the operation from remote areas without having to scrub and suit up.
They see the same picture that the surgeon does and hear his comments. Since the tapes are conventionally recorded
and played back, this permits easy distribution for medical training purposes in teaching hospitals, research organi-
zations, or for viewing by the doctor in his office or home. The equipment required is that of a conventional video
tape recorder and monitor. If the viewing is to be in 3D then a Stereo-Screen and Stereo-Glasses or a Stereo-Hood is
necessary. The Hood can be removed or swung up out of the way during two dimensional viewing.
Specialized Teaching Video Microscopes can use "false-color" coding, wherein a black-and-white television pic-
ture is picked up and electronically processed so that the different levels of gray provide many shades of color on a
color TV monitor. The "false-color" picture is similar to microscopic "staining" and conveys more visual informa-
tion to the eye than a black-and-white and gray picture. Another possibility is to use electronic polarity reversing so
that the picture is seen in its normal shades of black-and-white and then at the touch of a switch the blacks are re-
versed to white. Sometimes rapid polarity reversing by the surgeon brings out features not seen in a normal manner.
Television pickup tubes are now available that are sensitive to a wide range of the electromagnetic spectrum. For
example, there are tubes sensitive to infrared or ultraviolet and these may be used so that heretofore unobservable
features can be seen by the surgeon and student. Finally, low light level pickup tubes can be employed for conditions
where the intense illumination of a slit lamp would be damaging to the tissues.
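By way of illustration, the sketch below shows one way the gray-level-to-color mapping and the polarity reversal could be expressed in software; the eight-entry lookup table, the function names, and the NumPy formulation are assumptions for illustration only, not the instrument's actual analog circuitry.

```python
import numpy as np

# Hypothetical eight-entry false-color lookup table (RGB triplets, 0-255).
# The instrument's actual gray-to-color assignment is analog circuitry;
# this table is only an illustrative choice.
FALSE_COLOR_LUT = np.array([
    [0,   0,   0],    # darkest grays  -> black
    [0,   0,   128],  # -> dark blue
    [0,   128, 255],  # -> light blue
    [0,   255, 0],    # -> green
    [255, 255, 0],    # -> yellow
    [255, 128, 0],    # -> orange
    [255, 0,   0],    # -> red
    [255, 255, 255],  # brightest grays -> white
], dtype=np.uint8)

def false_color(gray_image):
    """Map an 8-bit monochrome image (H x W, values 0-255) to color."""
    levels = gray_image // 32          # quantize 256 gray levels into 8 bins
    return FALSE_COLOR_LUT[levels]     # LUT indexing yields an (H, W, 3) image

def reverse_polarity(gray_image):
    """Electronic polarity reversal: blacks become white and vice versa."""
    return 255 - gray_image
```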
Microscopes are following the evolution that has occurred in other fields in that optical instruments, such as
motion pictures, are being supplemented and replaced by electronic instruments, such as television. The electronic
picture of the Teaching Video Microscope can be stored in a computer, processed and displayed. Electronic enhance-
ment can bring out features not discernible optically.
Conclusion
In the past the surgeon has been extremely limited in his ability to visually communicate information regarding
operational techniques on small areas of the body. Now television has revolutionized this by converting the optical
picture into an electronic picture, which can be sent down a cable, through the air, and recorded on magnetic tape to
be seen now or at a later time. The Teaching Video Microscope can provide a three dimensional color television dis-
play of surgical procedures so that the student sees essentially the same picture that the surgeon saw at the time
he performed the operation.
References
1. Butterfield, J., "Video Microscope", S.P.I.E. Seminar on Imaging Techniques for Testing and Inspection, February 1972.
2. Butterfield, J., "Television Microscopy", Preprints of 25th Annual S.P.S.E. Conference, pp. 1-4, May 1972.
3. Butterfield, J., "Three Dimensional Television", Proc. of 15th Annual S.P.I.E. Symposium, Vol. 3, pp. 3-9, September 1970.
4. Butterfield, J., "A Survey of Three Dimensional Displays", Digest of IEEE International Convention, pp. 116-117, March 1972.
5. Butterfield, J., "Stereoscopic Television", Digest of OSA Technical Papers, "Design & Visual Interface of Biocular Systems", p. ThA7-1, May 1972.
6. Butterfield, J., "Stereo Television Microscope", U.S. Patent No. 3,818,125, June 1974.
7. Butterfield, J., "Development of a Surgical Stereo-Video Microscope", Proc. of 18th Annual S.P.I.E. Technical Meeting, Vol. 54, pp. 8-11, August 1974.
PICTORIAL INFORMATION TRANSMISSION THROUGH SIMULATION
Robert T. P. Wang
Honeywell
Marine Systems Division, California Center
West Covina, California 91790
Abstract
In an age when man is inundated with information, his natural ability to selectively assimilate the data presented
to him has become an indispensable tool for survival. By the same token, man's visual perception limitations have
been used to reduce the amount of data needed to reproduce pictorial information designed for his consumption.
A comparative study of two divergent approaches to the problem of providing an optimum amount of video in-
formation to a human viewer is discussed. Video communication systems exemplify one area where the approach
consists of reducing the data from real world scenes by video data compression algorithms. The opposite approach
is found in visual simulators where scenes are constructed synthetically to approach real world realism by adding
cues to the basic structure of the digital image representation used. Such simulators are used in groundbased trainers
designed to reduce the cost of training operators of expensive equipment. In both situations there is a need to provide
realistic video to a human observer. In the quest for optimum pictorial information transmission, simulated scenes
are shown to provide some rather unusual, hitherto unexplored, insights and alternatives.
1.0 Introduction
Communicating through visual imagery has received considerable attention in two rather diverse fields of
application -- the efficient transmission of video information over digital communication channels, and the means of
providing the visual stimuli needed by groundbased operator trainers. In both cases, a realistic real-time image
reconstructed from a minimum number of information bits is desired.
Video data compression begins with an analog image that is sampled and digitally coded. The resulting bits are
then acted upon by algorithms that use a priori knowledge of the local structure of the pictures in time and space.
Some processing algorithms also use visual perception properties of the human eye. The result of such processing is
a minimal set of data that can be used by its matching receiver to produce a subjectively pleasing picture to the
viewer. In this way the required channel bandwidth can be reduced.
Visual imagery displays in simulation trainers are used to provide a sense of realism to the trainees. The
simulator must respond to physical inputs just as the operational equipment would, with sufficient detail and resolution
to supply all the necessary visual cues to the student. Since the imagery is simulated, the "world" which forms the
"source" of the imagery must be stored within the simulator as a data base. Using the stored data and the geometry
of the scenario, the proper projection of the light rays is computed to provide the desired image. To produce a real-
time system, a study to determine the minimum information needed to provide realistic, real-time visual imagery
must be undertaken. Therefore, the approach used to simulate visual imagery for a trainer is discussed. Beginning
with basic line drawings, factors that add realism to the simulated image are progressively introduced until an acceptably realistic image is obtained.
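As a minimal sketch of the projection step just described, assuming a simple pinhole-camera model; the function and parameter names are illustrative, not the trainer's actual implementation:

```python
import numpy as np

def project_point(world_point, eye, look_direction, up, focal_length=1.0):
    """Pinhole-camera projection of one data-base point onto the image plane."""
    f = np.asarray(look_direction, float)
    f = f / np.linalg.norm(f)                      # viewing (optical) axis
    r = np.cross(np.asarray(up, float), f)
    r = r / np.linalg.norm(r)                      # camera "right" axis
    u = np.cross(f, r)                             # camera "up" axis
    p = np.asarray(world_point, float) - np.asarray(eye, float)
    x, y, depth = p.dot(r), p.dot(u), p.dot(f)
    if depth <= 0.0:
        return None                                # point lies behind the viewpoint
    return focal_length * x / depth, focal_length * y / depth

# Example: a point 10 units ahead of an observer at the origin looking along +Z
# project_point([1.0, 2.0, 10.0], eye=[0, 0, 0], look_direction=[0, 0, 1],
#               up=[0, 1, 0]) -> (0.1, 0.2)
```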
The approaches used to provide a "realistic" or visually pleasing image for digital video data communication
systems appear to be diametrically opposed to the approach used in simulation trainers. However, since the goals
of both endeavors are ultimately the same, it is important to both fields that the approaches used by one another be
understood. Such a cross-fertilization of ideas will result in better techniques for both.
The fundamentals of data compression and simulation trainers are outlined in the next two sections followed by a
summary of salient principles used. By applying the underlying principles from one field to the other, it is shown
that new directions emerge in both endeavors that will potentially aid the advancement of each. A discussion of these
methods is given in Section Four.
2.0 Fundamentals of Video Data Compression
The generic open-loop video data communication
system is shown in Figure 2-1.
The basic nature of the source signal and the per-
ception properties of the human eye play key roles in the
development of any video data compression system. For
completeness, the underlying nature of both the source
and the viewer are reviewed in the following two sub-
sections.
2.1 The Source
The data source is derived from scanning a projection of the visible world onto a two-dimensional plane*. The projected data display spatial and temporal properties that can be exploited to reduce signalling rates.

[Fig. 2-1. Video Communication System. Block diagram: Camera - Transmitter - Channel - Receiver - Display, with S = Signal Source; O = Optical Imager (camera optics); T = Transducer (vidicon); Es = Source Encoder (data compressor); Ec = Channel Encoder (error coding and modulation); Dc = Channel Decoder (error detection and correction, and demodulation); Ds = Source Decoder (data decompressor/image reconstruction); TV = Image Display.]
Estournet [1969] showed the video source to be a non-stationary stochastic process. Clustering of iso-intensity
picture elements (pixels), which forms a source parameter obeying certain area distributions, was studied by Nishikawa
et al. [1965]. Both of these properties are inherent in pictures and were used in the development of a generalized
structural model of the source [Wang, 1972]. In addition to these characteristics, it is known that intensity variations
form patterns in the spatial domain that cover non-trivial areas. Thus, correlation between spatially distinct pixels
can persist even over large spatial distances [Wang, 1975].
Time sampling of the video source results in a series of frames.** Within the time span between two successive frames, the amount of motion that produces changes between the two frames is minimal. This results in considerable temporal redundancy.†
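A minimal sketch of how this temporal redundancy can be exploited, assuming the crudest form of conditional replenishment in which only pixels whose change exceeds a threshold are transmitted; the function name and threshold value are illustrative assumptions:

```python
import numpy as np

def changed_pixels(previous_frame, current_frame, threshold=4):
    """Return (row, column, new value) for every pixel whose intensity changed
    by more than `threshold` since the previous frame; only these samples
    would need to be transmitted in a conditional-replenishment scheme."""
    difference = np.abs(current_frame.astype(int) - previous_frame.astype(int))
    rows, cols = np.nonzero(difference > threshold)
    return list(zip(rows.tolist(), cols.tolist(),
                    current_frame[rows, cols].tolist()))
```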
2.2 The Viewer
The human visual system can be grossly partitioned into four major components. Consider Figure 2-2. The
optical system images the "real world" onto a screen called the retina. Transducers (rods and cones) on the retina
sample and convert the image into electrical impulses that are transmitted along neurons to the brain. It is at the
transducer juncture that man's knowledge becomes extremely dim. Considerable processing is thought to occur
between the rods and cones and the optic nerve. This is based on the fact that the number of individual fibres in the
optic nerve accounts for only 1% of the total number of sensor elements in the retina [Merchant, 1965]. It is further
conjectured that processing might even occur along the optic nerve itself.
Rods and cones are unevenly distributed on the retina. A high density of sensors is concentrated around the fovea
and drops off rapidly on either side, as shown in Figure 2-3. This sampling configuration results in spatial band-
limitedness that changes with the distance an object lies off the optical axis of the eye.
[Fig. 2-2. Simplified Block Diagram of the Human Visual System. S = Signal Source (light from the field of view); O = Physical Optics (eyeball); T = Sampler/Transducer (retina); P = Preprocessor/Transmission Line (neural network); B = Memory/Processor (brain); C = Feedback Motor Controls (eye muscles).]
[Fig. 2-3. Distribution of Rods and Cones on the Retina: sensor density plotted against angular separation from the fovea (degrees); the nasal side and the blind spot are indicated.]
The human eye is very sensitive to sudden changes that are connected in space to form edges. However, a
pattern of edges that form a spatial structure will produce an averaging that makes the eye relatively insensitive to
noise having a matching pattern. These phenomena have been modelled by a cross-coupling neural net that produces
lateral inhibition [Cornsweet, 1971]. This model also explains the high sensitivity of the eye to motion and to noise in
low-activity regions of a picture, and its insensitivity to noise along edges in a picture.
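The inhibition mechanism can be caricatured by a center-surround spatial filter, as in the sketch below; the 3 x 3 kernel weights are illustrative assumptions, not Cornsweet's model.

```python
import numpy as np

# Illustrative 3 x 3 lateral-inhibition kernel: each receptor is excited by
# its own input and inhibited by the average of its eight neighbours.
INHIBITION_KERNEL = np.array([[-0.125, -0.125, -0.125],
                              [-0.125,  2.000, -0.125],
                              [-0.125, -0.125, -0.125]])

def lateral_inhibition(image):
    """Convolve a 2-D intensity image with the kernel: edges are emphasized
    while uniform areas are attenuated, mimicking the eye's edge sensitivity."""
    h, w = image.shape
    padded = np.pad(image.astype(float), 1, mode="edge")
    output = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            output += INHIBITION_KERNEL[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return output
```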
In the temporal domain, bandlimitedness is manifested through the well known phenomenon of "persistence of
vision". Persistence of vision has been the basis for motion in movies and television for many years, allowing
movement to be captured in time samples of the pictorial data.
Finally, the eye is responsive to a tri-stimulus color system and becomes progressively color blind as the
spatial size of a colored object diminishes. This fact was used in the design of the NTSC standard where high spatial
frequencies are represented by the luminance signal alone. A further experimental fact concerning colored imagery
is that the eye is more tolerant of luminance noise and error when color is present.
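A sketch of the principle, using the standard NTSC luminance/chrominance (YIQ) coefficients; the 4:1 chrominance subsampling factor and function names are assumptions for illustration, not the NTSC signal format itself.

```python
import numpy as np

def rgb_to_yiq(rgb):
    """Convert an (H, W, 3) RGB image to NTSC YIQ components."""
    m = np.array([[0.299,  0.587,  0.114],    # Y  (luminance)
                  [0.596, -0.274, -0.322],    # I  (chrominance)
                  [0.211, -0.523,  0.312]])   # Q  (chrominance)
    return rgb @ m.T

def subsample_chroma(yiq, factor=4):
    """Keep full-resolution luminance but only every `factor`-th chrominance
    sample in each direction (illustrative spatial subsampling)."""
    y = yiq[..., 0]
    i = yiq[::factor, ::factor, 1]
    q = yiq[::factor, ::factor, 2]
    return y, i, q
```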
The factors discussed in the previous two sections will be used to support the reasons underlying the various
video data compression techniques that have been developed through the years.
2.3 Basic Compression Approaches
2.3.1 Overview. Data compression algorithms initially concentrated on the characteristics of pictures, but
later expanded to include the properties of human visual perception.
* The visible world is assumed to include imagery that is not necessarily visible to the naked eye, but which can be
made visible via various processing means.
** American NTSC standard frame repetition rate is 30 frames per second.
† It has been shown that camera pan and rapid scene changes do occur, and that frame-to-frame redundancies are
not as extensive as earlier expected.
Predominant among early techniques were those that used iso-intensity pixel clustering or "redundancy" in the
spatial domain*. As memories or storage devices became economically reasonable, frame-to-frame redundancies
were explored to provide further savings.
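Run-length coding along a scan line is the simplest exploitation of such iso-intensity clustering; a minimal sketch, assuming a non-empty line of integer intensities:

```python
def run_length_encode(scan_line):
    """Encode a scan line of pixel intensities as (value, run length) pairs.
    Long runs of equal intensity collapse to a few pairs."""
    runs = []
    value, length = scan_line[0], 1
    for pixel in scan_line[1:]:
        if pixel == value:
            length += 1
        else:
            runs.append((value, length))
            value, length = pixel, 1
    runs.append((value, length))
    return runs

# Example: run_length_encode([12, 12, 12, 12, 200, 200]) -> [(12, 4), (200, 2)]
```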
Context-dependent noise sensitivity of human visual perception was first explored through non-linear quantization
of difference signals. Such perception-matched noise distribution was finally carried into the two-dimensional
transform-domain. Algorithms tailored to visual perception properties were later extended into the temporal domain
through motion dependent spatial coding. A controversy still rages as to whether the "inability" of the eye to perceive
high spatial structure in moving objects is truly a characteristic of visual perception. Some researchers feel that the
phenomenon, presently adopted as a visual perception trait, is actually induced by long hours of viewing television,
whose cathode ray tube introduces lags due to phosphor and electron beam characteristics.
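A sketch of such perception-matched quantization of difference signals: fine steps near zero, where the eye readily sees noise in flat areas, and coarse steps for the large differences found at edges, where noise is masked. The breakpoints and reconstruction values below are illustrative assumptions, not any particular published quantizer.

```python
import bisect

# Illustrative decision levels and reconstruction values for the magnitude
# of a pixel difference (range 0-255): fine near zero, coarse for edges.
DECISION_LEVELS       = [1, 3, 7, 15, 31, 63, 127]
RECONSTRUCTION_VALUES = [0, 2, 5, 11, 23, 47, 95, 160]

def quantize_difference(difference):
    """Non-linearly quantize a signed pixel difference in [-255, 255]."""
    sign = -1 if difference < 0 else 1
    index = bisect.bisect_right(DECISION_LEVELS, abs(difference))
    return sign * RECONSTRUCTION_VALUES[index]
```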
2.3.2 Redundancy Removal Techniques. Given that X_1, X_2, ..., X_n denotes a string of n pixel values sampled
from a picture through a raster scan, the two basic options available to the designer are depicted in Figure 2-4.
2.3.2.1 Interpolator. An interpolator forms a function f(t: m, n) through two preselected pixels X_m, X_n, where X_m is some pixel defining the initial value of the function and X_n the latest value of that function. A term-by-term error analysis is then conducted, where

X_t - f(t: m, n) = e_t ,   m <= t <= n    (2.1)

If e_t exceeds the error bound e_b, then the values X_{n-1} and X_n are both transmitted to the receiver as the terminal value of the previous interpolation interval and the start of a new interpolation interval, respectively. The process is then repeated for the next series of pixels, which obey a function f_{n-1}(t: m, n), where the subscript denotes the starting location of the function. In this way, the picture is described by a series of functions f_m, f_{n-1}, ...
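A sketch of this interpolative scheme, assuming a straight line (first-order interpolator) for f(t: m, n); the function name and the handling of interval boundaries are illustrative choices, not a specific published algorithm.

```python
def interpolative_compress(samples, error_bound):
    """First-order (straight-line) interpolator sketch: keep only the
    endpoints of intervals over which every interior sample lies within
    `error_bound` of the line through the interval's end pixels."""
    endpoints = [(0, samples[0])]            # X_m of the first interval
    m, n = 0, 1
    while n < len(samples):
        fits = True
        for t in range(m + 1, n):            # term-by-term error analysis
            f_t = samples[m] + (samples[n] - samples[m]) * (t - m) / (n - m)
            if abs(samples[t] - f_t) > error_bound:
                fits = False
                break
        if not fits:
            # X_{n-1} terminates the old interval, X_n starts the new one
            endpoints.append((n - 1, samples[n - 1]))
            endpoints.append((n, samples[n]))
            m = n
        n += 1
    if endpoints[-1][0] != len(samples) - 1:
        endpoints.append((len(samples) - 1, samples[-1]))
    return endpoints

# Example: interpolative_compress([10, 12, 14, 16, 40, 42], 2)
#          -> [(0, 10), (3, 16), (4, 40), (5, 42)]
```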
2.3.2.2 Predictor. A similar breaking-up of the picture into a series of functions is accomplished by prediction. In this algorithm, a function of the previous s pixel values is used to predict the value of the latest sample. Mathematically,

X̂_n = f(X_{n-1}, ..., X_{n-s})    (2.2)

If e_b is the error bound for the system and

X̂_n - X_n