EFFICIENT TRANSMISSION OF PICTORIAL INFORMATION

Document Type: 
Collection: 
Document Number (FOIA) /ESDN (CREST): 
CIA-RDP80B00829A001100060001-0
Release Decision: 
RIPPUB
Original Classification: 
K
Document Page Count: 
245
Document Creation Date: 
December 9, 2016
Document Release Date: 
August 22, 2001
Sequence Number: 
1
Case Number: 
Publication Date: 
August 22, 1976
Content Type: 
REPORT
File: 
CIA-RDP80B00829A001100060001-0.pdf (22.28 MB)
Body: 
STATINTL

PROCEEDINGS OF THE SOCIETY OF PHOTO-OPTICAL INSTRUMENTATION ENGINEERS

VOLUME 66
Efficient Transmission of Pictorial Information
August 21-22, 1975
San Diego, California

Andrew G. Tescher, Editor

Presented by The Society of Photo-Optical Instrumentation Engineers in cooperation with:
Department of the Army, United States Army Electronics Command, Night Vision Laboratory
Department of the Navy, Office of Naval Research
Energy Research Development Agency
Environmental Research Institute of Michigan
Institute of Electrical & Electronic Engineers, San Diego Section
The Institute of Optics, University of Rochester
Jet Propulsion Laboratory
National Aeronautics & Space Administration, Ames Research Center
National Aeronautics & Space Administration, Headquarters, Washington, D.C.
National Science Foundation
The Optical Society of America
U.S. Department of Commerce, National Oceanic and Atmospheric Administration

Copyright © 1975 by the Society of Photo-Optical Instrumentation Engineers, 338 Tejon Place, Palos Verdes Estates, California 90274 U.S.A. All rights reserved. This book or any part thereof must not be reproduced in any form without the written permission of the publisher. Printed in U.S.A.

Contents

Author Index
Seminar Committee  vii
Reasons for Data Compression - An Introduction  ix
  Andrew G. Tescher, The Aerospace Corporation

SESSION I. FUNDAMENTALS OF IMAGE DATA COMPRESSION  1
An Overview of Human Observer Characteristics and their Effect on Image Transmission and Display  2
  Thomas G. Stockham, Jr., University of Utah
Coding of Two-Tone Images  5
  T. S. Huang, Purdue University
Source Coding of Television Signals Using Interframe Techniques  9
  B. G. Haskell and P. L. Gordon, Bell Telephone Laboratories, Inc.
Bandwidth Compression of Multispectral Data  23
  A. Habibi and A. S. Samulon, TRW Systems Group
Real Time Television Image Bandwidth Reduction Using Charge Transfer Devices  36
  H. J. Whitehouse, R. W. Means, E. H. Wrench, Naval Undersea Center
Technology of Charge-Coupled Devices for Video Bandwidth Reduction  48
  D. D. Buss, R. W. Brodersen, C. R. Hewes, A. F. Tasch, Jr., Texas Instruments, Inc.

SESSION II. ADVANCED IMPLEMENTATIONS  57
Real-Time Video Compression Algorithm for Hadamard Transform Processing  58
  Scott C. Knauer, Ames Research Center, NASA
An Advanced Imaging Communication System for Planetary Exploration  70
  Robert F. Rice, Jet Propulsion Laboratory
Advanced Facsimile System and Network Applications  90
  John W. Watkins, Harris Corporation, Electronic Systems Division
An Operational Video Data Compression System for ATS and ITOS  94
  R. L. Kutz, NASA/Goddard Space Flight Center; L. D. Davisson, Univ. of Southern California
A CAQ Bandwidth Reduction System for RPV Video Transmission  101
  James J. Pearson, Lockheed Palo Alto Research Laboratory
Intraframe and Interframe Adaptive Transform Coding  108
  Clifford Reader, Aeronutronic Ford Corporation

SESSION III. Panel: IMPACTS OF IMAGE DATA COMPRESSION  119

SESSION IV. POTENTIAL IMPLEMENTATIONS  121
Joint Pattern Recognition/Data Compression Concept for ERTS Multispectral Imaging  122
  Edward E. Hilbert, Jet Propulsion Laboratory
A Dual Mode Nonlinear Interpolative Compressor for SAR Images  138
  Wen-hsiung Chen, Aeronutronic Ford Corporation
A Fast Two-Dimensional Karhunen-Loeve Transform  144
  Robert M. Haralick, Norman Griswold, Nimitra Kattiyakulwanich, University of Kansas Center for Research, Inc., Remote Sensing Laboratory
Singular Value Decomposition for Image Coding  160
  Harry C. Andrews, University of Southern California, Image Processing Institute
TV Bandwidth Compression  161
  Paul Wintz, Purdue University; Jim Wilson, WINTEK Corp.; Ernie Schmidt, Aeronutronic Ford Corp.
DPCM Quantization Error Reduction for Image Coding  167
  William K. Pratt, Michael N. Huhns, University of Southern California
Combined Spatial and Temporal Coding of Digital Image Sequences  172
  John A. Roese, Naval Undersea Center; Guner S. Robinson, University of Southern California

SESSION V. RELATED TOPICS  181
Laser Scanner/Recorders for Image Transmission and Computer Processing  182
  W. F. Schreiber, U. F. Gronemann, D. E. Troxel, Massachusetts Institute of Technology
Real-Time Image Data Acquisition/Retrieval (RIDAR) System  190
  Harold H. Kantner, CAI, a Division of Bourns, Inc.
An Investigation of MSE Contributions in Transform Image Coding Schemes  196
  John R. Parsons, Andrew G. Tescher, The Aerospace Corporation
A Comparison of Hardware Implementations of the Hadamard Transform for Real-Time Image Coding  207
  Stephen C. Noble, Air Force Avionics Laboratory, Wright Patterson Air Force Base
A Teaching Stereo-Video Microscope  212
  James F. Butterfield, Dimensional Television Corporation
Pictorial Information Transmission through Simulation  215
  Robert T. P. Wang, Honeywell, Marine Systems Division California Center
Three-Dimensional Perception Produced by Two 2-Dimensional Geometric Patterns Presented with Brief Exposure Times and Time Intervals  223
  Kiyoe Mizusawa, The Pennsylvania State University

Author Index

Andrews, Harry C. (University of Southern California, Image Processing Institute)  160  Singular Value Decomposition for Image Coding
Brodersen, R. W. (Texas Instruments, Inc.)  48  Technology of Charge-Coupled Devices for Video Bandwidth Reduction
Buss, D. D. (Texas Instruments, Inc.)  48  Technology of Charge-Coupled Devices for Video Bandwidth Reduction
Butterfield, James F. (Dimensional Television Corporation)  212  A Teaching Stereo-Video Microscope
Chen, Wen-hsiung (Aeronutronic Ford Corporation)  138  A Dual Mode Nonlinear Interpolative Compressor for SAR Images
Davisson, L. D. (University of Southern California)  94  An Operational Video Data Compression System for ATS and ITOS
Gordon, P. L. (Bell Telephone Laboratories, Inc.)  9  Source Coding of Television Signals Using Interframe Techniques
Griswold, Norman (University of Kansas Center for Research, Inc.)  144  A Fast Two-Dimensional Karhunen-Loeve Transform
Gronemann, U. F. (Massachusetts Institute of Technology)  182  Laser Scanner/Recorders for Image Transmission and Computer Processing
Habibi, A. (TRW Systems Group)  23  Bandwidth Compression of Multispectral Data
Haralick, Robert M. (University of Kansas Center for Research, Inc.)  144  A Fast Two-Dimensional Karhunen-Loeve Transform
Haskell, B. G. (Bell Telephone Laboratories, Inc.)  9  Source Coding of Television Signals Using Interframe Techniques
Hewes, C. R. (Texas Instruments, Inc.)  48  Technology of Charge-Coupled Devices for Video Bandwidth Reduction
Hilbert, Edward E. (Jet Propulsion Laboratory)  122  Joint Pattern Recognition/Data Compression Concept for ERTS Multispectral Imaging
Huang, T. S. (Purdue University)  5  Coding of Two-Tone Images
Huhns, Michael N. (University of Southern California)  167  DPCM Quantization Error Reduction for Image Coding
Kantner, Harold H. (CAI, a Division of Bourns, Inc.)  190  Real-Time Image Data Acquisition/Retrieval (RIDAR) System
Kattiyakulwanich, Nimitra (University of Kansas Center for Research, Inc.)  144  A Fast Two-Dimensional Karhunen-Loeve Transform
Knauer, Scott C. (Ames Research Center/NASA)  58  Real-Time Video Compression Algorithm for Hadamard Transform Processing
Kutz, R. L. (NASA/Goddard Space Flight Center)  94  An Operational Video Data Compression System for ATS and ITOS
Means, R. W. (Naval Undersea Center)  36  Real Time Television Image Bandwidth Reduction Using Charge Transfer Devices
Mizusawa, Kiyoe (The Pennsylvania State University)  223  Three-Dimensional Perception Produced by Two 2-Dimensional Geometric Patterns Presented with Brief Exposure Times and Time Intervals
Noble, Stephen C. (Air Force Avionics Laboratory, Wright-Patterson Air Force Base)  207  A Comparison of Hardware Implementations of the Hadamard Transform for Real-Time Image Coding
Parsons, John R. (The Aerospace Corporation)  196  An Investigation of MSE Contributions in Transform Image Coding Schemes
Pearson, James J. (Lockheed Palo Alto Research Laboratory)  101  A CAQ Bandwidth Reduction System for RPV Video Transmission
Pratt, William K. (University of Southern California)  167  DPCM Quantization Error Reduction for Image Coding
Reader, Clifford (Aeronutronic Ford Corporation)  108  Intraframe and Interframe Adaptive Transform Coding
Rice, Robert F. (Jet Propulsion Laboratory)  70  An Advanced Imaging Communication System for Planetary Exploration
Robinson, Guner S. (University of Southern California)  172  Combined Spatial and Temporal Coding of Digital Image Sequences
Roese, John A. (Naval Undersea Center)  172  Combined Spatial and Temporal Coding of Digital Image Sequences
Samulon, A. S. (TRW Systems Group)  23  Bandwidth Compression of Multispectral Data
Schmidt, Ernie (Aeronutronic Ford Corporation)  161  TV Bandwidth Compression
Schreiber, W. F. (Massachusetts Institute of Technology)  182  Laser Scanner/Recorders for Image Transmission and Computer Processing
Stockham, Jr., Thomas G. (University of Utah)  2  Overview of Human Observer Characteristics & their Effect on Image Transmission & Display
Tasch, Jr., A. F. (Texas Instruments, Inc.)  48  Technology of Charge-Coupled Devices for Video Bandwidth Reduction
Tescher, Andrew G. (The Aerospace Corporation)  ix, 196  Reasons for Data Compression - An Introduction; An Investigation of MSE Contributions in Transform Image Coding Schemes
Troxel, D. E. (Massachusetts Institute of Technology)  182  Laser Scanner/Recorders for Image Transmission and Computer Processing
Wang, Robert T. P. (Honeywell, Marine Systems Division California Center)  215  Pictorial Information Transmission through Simulation
Watkins, John W. (Harris Corporation, Electronic Systems Division)  90  Advanced Facsimile System and Network Applications
Whitehouse, H. J. (Naval Undersea Center)  36  Real Time Television Image Bandwidth Reduction Using Charge Transfer Devices
Wilson, Jim (WINTEK Corp.)  161  TV Bandwidth Compression
Wintz, Paul (Purdue University)  161  TV Bandwidth Compression
Wrench, E. H. (Naval Undersea Center)  36  Real Time Television Image Bandwidth Reduction Using Charge Transfer Devices

Seminar Committee

General Chairman: Andrew G. Tescher, The Aerospace Corporation

Technical Program Committee: Harry C. Andrews, University of Southern California; Ali Habibi, TRW Systems Group; John R. Parsons, The Aerospace Corporation and University of Arizona

Session I. Fundamentals of Image Data Compression - Chairman, Andrew G. Tescher, The Aerospace Corporation
Session II. Advanced Implementations - Chairman, Ali Habibi, TRW Systems Group
Session III. Impacts of Image Data Compression - Panel Moderator, Andrew G. Tescher, The Aerospace Corporation
Session IV. Potential Implementations - Chairman, J. R. Parsons, The Aerospace Corporation
Session V. Related Topics - Chairman, Harry C. Andrews, Image Processing Institute, University of Southern California

Reasons for Data Compression - An Introduction

Andrew G. Tescher
The Aerospace Corporation, P. O. Box 92957, Los Angeles, California 90009

Visual data is the primary form by which man acquires information. However, the required information and its potential user may be separated by large distances. Although conventional communication channels are available, the large amount of information present in pictorial data quite often taxes the available communication bandwidth. At the practical level, pictorial data transmission may be prohibitively expensive.

The seminar on "Efficient Transmission of Pictorial Information" attempted to cover various problem areas associated with visual information transfer technology. The twenty-six, primarily invited, papers address current and future problem areas. The various subject areas can be grouped into image representation, image coding, psychophysical considerations, channel effects, image quality, data management, and display technology. There is a balanced mix of primarily theoretical discussions and papers that emphasize hardware development. By design, the presentation level of the various papers is diverse.
On one side of the spectrum, several review papers, whose authors are well-respected experts of long experience in this field, are provided primarily for newcomers to the subject of image compression. The majority of the papers, however, present new and valuable results and in most cases advance the state of the art of their respective specializations.

In addition to the formal presentations, an informal panel discussion was organized and moderated by this author. The panelists were F. Billingsley of JPL, H. M. Federhen of ARPA, C. R. Hewes of Texas Instruments, Inc., and P. A. Wintz of Purdue University. The panel, together with the numerous individuals who offered comments and suggestions, reiterated the need for new solutions in the field of pictorial data communication. Regarding just how fast new solutions will come, opinion is divided. Perhaps based on a personal bias leaning towards optimism, this author believes that new and effective solutions will come soon. This belief is supported by the extensive and high-quality activity in the field of image compression and related technologies.

Andrew G. Tescher, Editor

Session I
FUNDAMENTALS OF IMAGE DATA COMPRESSION
Chairman: ANDREW G. TESCHER, The Aerospace Corporation

AN OVERVIEW OF HUMAN OBSERVER CHARACTERISTICS AND THEIR EFFECT ON IMAGE TRANSMISSION AND DISPLAY

Thomas G. Stockham, Jr.
Departments of Computer Science and Electrical Engineering
University of Utah, Salt Lake City, Utah 84112

The science and technology of image transmission and display have evolved primarily from the disciplines of computer science, electrical engineering, and physics. Thus, it is only natural that the techniques and attitudes which have developed are characterized by the style in which people in these areas approach the subject of images and their transmission, display, and processing. For example, one finds many important differences between the methods of television and those of photography. Moreover, an even greater contrast is found when comparing the methods of electrical engineering, physics, television, and photography with those now understood to be employed by the human eye and the human visual system.

Specifically, let us contrast the methods by which television, photography, and the human visual system represent image information. In television (particularly digital imaging), image values are represented by signals analogous to quantities of light. They are called intensities. In photography, on the other hand, the representing quantities are concentrations of silver or dyes. Consequently, due to the natural exponentiating laws governing the interaction of light with these media, they are analogous to the logarithm of quantities of light. They are called densities.
The human visual system, on the other hand (while being generally logarithmically sensitive), moves a large step further away from representation by physical quantities of light and produces, at very early stages in its processing, highly modified versions of the patterns of light or their logarithms. The natural question then arises: why should the human visual system try to do this; and since it does, what consequences are implied in terms of the television and photographic presentations normally employed?

The answers to these questions rest crucially on the issue of errors. In any transmission or display system, errors will be committed. These are unavoidable and are a result of the physical limitations encountered. In a broad variety of applications, the most important forms of error encountered are those imposed by limited dynamic range and various forms of noise. The classic dilemma one faces in the disciplines of image transmission and display is that a compromise must be effected between the conflicting constraints of dynamic range and noise. On the one hand, one wishes to make signals larger so that the noise may be rendered negligible. At the same time, large signals are precluded by the limitation in dynamic range, in the form of distortions which affect the signals when their values are made too variable. More specifically, in any transmission and/or display design, the natural goal would seem to be the attainment of faithful reproduction. Unfortunately, the demands of typical imagery upon transmission and display systems are usually so severe that this goal cannot reasonably be met. The distortions due to limited dynamic range and noise, especially the former, are relentlessly unavoidable, and so the designer must content himself with one form of distortion or another.

Fortunately, the human visual system itself produces a large quantity of distortion. This phenomenon may be exploited by trading off undesirable forms for forms which will be encountered naturally anyway. The distortions produced by the human visual system are often referred to as optical illusions. The simplest of these is the gray scale illusion, which reveals the logarithmic sensitivity of the human visual system alluded to above(1). This illusion is responsible for the fact that we do not turn the lights on during the day although they add just as much light then as at night. Another ramification is that gray scales must be arranged in exponential progression to appear arithmetic to the observer. Other illusions which are spatial in character are much more important and striking, however. The simplest of these is the illusion of simultaneous contrast, which permits two neutral gray shades to appear so different from one another at the same time that one can assign the names near-white to one and near-black to the other.

Careful study of these illusions has permitted researchers to formulate signal processing models for the early processing stages of the human visual system. These models (e.g., consider those of Stockham(2), Baudelaire(3), Frei(4), and Baxter(5)) predict the human visual illusions or distortions well enough to permit their use in defining useful objective measures of how different two images (one original, the other distorted) will look from one another. Applications of these models to problems in image transmission and display have already yielded significant advantages.
For example, when the model is used to evaluate the distortions produced by a given display instrument, it is often possible to predict how the physical performance of that display may be significantly relaxed without producing appreciable visual degradation. Additionally, by using the visual model as a processor prototype, it is possible to "look at the image" before it is displayed or transmitted, by first processing the image with the visual model processor and then transmitting or displaying it. This procedure places the image information in a signal space maximally compatible with the most recent knowledge of the human visual system before the degradations characteristic of either transmission or display are allowed to detract from the quality.

The advantages are significant, as can be seen from (2) and (6). Rom shows how, in the case of transmission with coding, it is possible by including human observer characteristics in the criterion for error to produce significantly superior subjective image transmission. First, the image is processed through the visual model. The resulting signal is then coded and transmitted. The received image is then passed through the inverse of the visual model and displayed. Subject to the validity of the model, the effect is as if the distortions due to coding were not allowed to enter the system until the image had been processed significantly by the human visual system itself.

The most important problem in image display is that associated with limited dynamic range. Cathode ray tubes, photographic transparencies, reflection photographs (i.e., prints), and images printed with ink on paper all face the basic light-to-dark ratio limitations of their media. For example, a good photographic film is not capable of reflecting much more than 90% of the light that strikes it. Neither is it capable of reflecting much less than 1% of the light that strikes it. The result is a dynamic range limitation of about 100:1. Unfortunately, high fidelity images such as those seen by the human eye in the real world attain dynamic ranges far in excess of 1000:1 or even 10,000:1. This limitation can be, and has been, overcome throughout the years by various techniques. Now, however, knowledge of the human visual system allows us to effect the same dynamic range limiting processes on the raw images to be displayed that the eye uses naturally. Since these processes take place anyway when the display is viewed, little is lost by also invoking them before the limitations of the display or transmission medium are encountered. In addition, by the very nature of the processes which the visual system uses, double processing is not subjectively disagreeable and often can hardly be distinguished from the single natural process which takes place in direct viewing of the original scene. Thus, by deliberately distorting the image before transmission and display using visual model processing, one can avoid the much more disagreeable distortions which would otherwise be encountered due to the transmission or display. As our knowledge of human observer characteristics grows, it becomes possible to formulate more sophisticated and complex visual models, and our ability to cope with the problems being described here increases.
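Stockham's full visual model is not reproduced in this overview, but the general structure he and Rom describe - move the image into a perceptually matched space, let the coder or display act there, and invert the mapping at the receiver - can be sketched with just the logarithmic stage mentioned above. The Python fragment below is only an illustration of that structure under stated assumptions: the choice of a plain logarithm, the epsilon floor, the 64-level stand-in quantizer, and all function names are assumptions of this sketch, not the model of reference (2), which includes further spatial processing.

```python
import numpy as np

def to_visual_space(intensity, eps=1e-3):
    """Map linear light intensities (0...1) into a density-like, roughly
    perceptual space via a logarithm - the first stage of the simple
    visual models discussed in the text (eps avoids log(0))."""
    return np.log(np.maximum(intensity, eps))

def from_visual_space(v):
    """Inverse mapping, applied after decoding and before display."""
    return np.exp(v)

def uniform_quantizer(v, levels=64):
    """Stand-in for any coder or display limitation: uniform quantization
    of whatever signal it is given (here, the log-domain values)."""
    lo, hi = float(v.min()), float(v.max())
    scale = max(hi - lo, 1e-12)
    steps = np.round((v - lo) / scale * (levels - 1))
    return lo + steps / (levels - 1) * scale

def transmit(intensity, coder=uniform_quantizer):
    """Code in the visual space: the distortions introduced by `coder`
    then resemble distortions the visual system would impose anyway."""
    return from_visual_space(coder(to_visual_space(intensity)))

# Example: an image spanning a wide intensity range survives 64-level
# quantization far better in the log domain than it would linearly.
image = np.logspace(-3, 0, 256).reshape(16, 16)
reconstructed = transmit(image)
```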
While the latest work in visual modeling has included such aspects of vision as color constancy and contrast(7), as well as the relatively recently disclosed frequency channel model of vision(8)(5), it is anticipated that in the future much more sophisticated aspects will also be embraced. Such aspects will undoubtedly include edge detection, motion sensitivity, non-uniform visual acuity over the field, and perhaps higher forms of image abstraction.

Studies are already underway in an effort to discover methods for analyzing images by synthesis. One possible synthesis method is through the modern technology of shaded image synthesis by computer. While image transmission systems employing coding have for the most part made use of signal properties to achieve their bandwidth reduction, it is also possible to achieve potentially much greater reduction by coding at a level embracing the geometric structure constituting the scene which has been imaged. Such a breakthrough would be based not only on the structure of images themselves, but also upon the way in which that structure is perceived by the visual system. An analogy with speech bandwidth compression, which is capable of much greater information rate savings than the best general sound bandwidth compression methods, is an informative one. The basic structure of speech, and the mechanism which produces it, is exploited in such bandwidth compression methods. Techniques for detecting the parameters of vocal cord motion and the shape of the human vocal tract directly from the acoustic signal are at the heart of such techniques. In addition, the nature of human hearing is exploited in effecting compromises which must be introduced during the analysis.

So it is that one might expect the most modern efforts in image transmission, display and processing to be strongly influenced by a knowledge of human observer characteristics. The implication of this statement is that the image processing scientist must be well versed not only in the physics of optics, the electrophysics of sensing, the electronics and chemistry of transmission, and the analog and digital disciplines of processing, but also in the psychophysics, biophysics and psychology of vision.

References

1. Cornsweet, T. N., Visual Perception, Chapter X. New York: Academic Press, 1970, pp. 249-253.
2. Stockham, T. G., Jr., "Image Processing in the Context of a Visual Model," Proc. IEEE, vol. 60, pp. 828-842, July 1972.
3. Colas-Baudelaire, Patrick, "Digital Picture Processing and Psychophysics: A Study of Brightness Perception," Computer Science Dept., University of Utah, Salt Lake City, Utah, UTEC-CSc-74-025, March 1973.
4. Frei, Werner, "A New Model of Color Vision and Some Practical Implications," Univ. of Southern California Image Processing Institute, Semiannual Technical Report, March 31, 1974, USCIPI Report 530, pp. 128-143.
5. Baxter, Brent S., "Image Processing in the Human Visual System," Computer Science Dept., University of Utah, Salt Lake City, Utah (in publication).
6. Rom, Raphael, "Image Transmission and Coding Based on Human Vision," Computer Science Dept., University of Utah, Salt Lake City, Utah (in publication).
7. Cornsweet, T. N., Visual Perception, Chapter XIII. New York: Academic Press, 1970.
8. Campbell, F. W., Cooper, G. F., and Enroth-Cugell, Christina, "The Spatial Selectivity of the Visual Cells of the Cat," Journal of Physiology, vol. 203, p. 223.

CODING OF TWO-TONE IMAGES

T. S. Huang
School of Electrical Engineering
Purdue University, West Lafayette, Indiana 47907

Abstract

We give a brief overview of efficient coding methods for two-tone images, especially white block skipping and runlength coding.

I. Introduction

In this tutorial paper we discuss some aspects of the digitization and efficient coding of two-tone images, such as business documents, line drawings, and weather maps. After a brief discussion of the sampling and quantization of nominally two-tone images, we describe the two main classes of efficient coding methods: white skipping and contour coding. Perhaps the most commonly used efficient coding scheme is runlength coding. The remaining part of the paper concentrates on easily implementable codes for runlengths.

II. Digitization

We shall concern ourselves with the digital transmission of images. The first step is to sample the image and quantize the amplitude of each sample. For two-tone images we quantize the amplitude to two levels(1); thus each sample is represented by one bit. The number of samples required depends on the image at hand. We take two examples: business letters and newspaper pages.

In analog transmission of business letters, a resolution of around 100 lines per inch gives satisfactory reproductions. For digital transmission where the amplitude of each sample is quantized to only two levels, the same resolution of 100 lines per inch yields reproductions which are marginally acceptable, the main defect being the staircase appearance of the boundaries between the black and the white. We can improve the image quality by either increasing the resolution or using more than two levels for each sample(1). Keeping two levels per sample, a linear resolution of around 150 points per inch (ppi) is required for satisfactory image quality. At 150 ppi and 1 bit per sample, an 8 1/2" x 11" letter gives us approximately 2 million bits. To transmit such a letter using PCM through a telephone channel with a 2400 bits/sec modem takes about 14 minutes. In order to reduce the transmission time to, say, 1 minute, a data reduction or compression ratio of 14:1 is required.

There has been considerable interest in the digital transmission of newspaper pages(2). It has been found that, for text material, a sampling density of 400 ppi is sufficient to ensure satisfactory reproduction quality. For halftone pictures(3), the required sampling density depends on two factors: (1) how many equivalent brightness levels (i.e., how many different halftone dot sizes) would be desirable; and (2) how densely we have to sample in order to avoid the appearance of moire patterns.

No systematic subjective experiments have been conducted with respect to the first question. However, extrapolating from experiments with continuous-tone pictures(4), it is estimated that 64 levels are sufficient and 32 levels may be satisfactory in many cases. Let fh and fs be the halftone screen density and the sampling density, respectively, in ppi. If fs is k times fh, then we have k^2 samples per dot, which allows us to create k^2 dot sizes (by making some of the samples black). For example, for 64 levels we need fs = 8fh; for 32 levels, fs = 6fh. With respect to the second question, less is known.
It is generally assumed that fs must be at least 10fh in order to avoid moire patterns. However, recent experiments showed that the factor 10 might be too conservative, since no moire patterns were observed in any experiment with fs of 8fh or more. From the above discussion, then, it seems that fs = 8fh is a good choice for the sampling density in the case of halftone pictures. A typical value for fh is 85 ppi. The corresponding sampling density is fs = 8fh = 680 ppi, which is 70% larger than what is needed for the text material. Since, operationally, it is much simpler to use the same sampling density for both the text material and the halftone pictures, we will be wasting a lot of samples in the text material. In any case, it should be obvious that a huge number of bits is required to represent newspaper pages. Even with a wideband channel, some data compression is desirable.

III. Basic Principles of Efficient Coding

To reduce transmission time, we can either do source coding to reduce data redundancy or design higher-speed modems. In this paper we are concerned with the first approach. We shall also limit ourselves to error-free coding - the receiver can reconstruct an exact copy of the original digitized image (assuming the channel is noiseless).

There are two basic principles of efficient coding for two-tone images: skipping white, and contour coding. In most two-tone images, such as business documents and weather maps, the information is contained in the black and there is a large amount of white background. We should aim to skip the white space and not transmit it. Furthermore, to transmit the (sampled) black areas, we have to transmit only the boundary points. The receiver can fill in the interiors with black. In sections IV and V, we discuss some practical implementations of these basic principles.

IV. White Block Skipping

Most documents and line drawings contain a large amount of white space. One approach to their efficient coding is to skip the white space. A simple way of doing that was suggested by de Coulon and Kunt(5), Horlander(6), and others. Each sampled scan line is divided into N-picture-element (pel) blocks. For a block containing all white pels, the 1-bit code word "0" is used. For a block containing at least one black pel, we use an (N+1)-bit code word, the first bit being "1" and the next N bits being the binary pattern of the N-pel block in question. For example, if N = 4 and a block contains two white pels followed by two black pels, then the code word for that block is "10011" - we have used "0" to represent white and "1" black. We call this code the white block skipping code.

Once a value for N is fixed, the optimum code for the 2^N possible N-pel patterns is of course the Huffman code based on the probability distribution of these patterns. However, the implementation of a Huffman code is complicated. If the image being coded contains a large amount of white, then the white block skipping (WBS) code may perform almost as well as the Huffman code while being much simpler to implement. For a given image or class of images, the average bit rate bN (bits/pel) of the WBS code depends on the block size N. We would of course like to use an N which minimizes the bit rate bN.
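As a concrete illustration of the rule just described, the following Python fragment encodes one scan line with the WBS code (block size N, "0" for an all-white block, "1" plus the N-bit pattern otherwise). It is a minimal sketch: the function name and the string form of the output are choices made here for illustration, not part of the paper.

```python
def wbs_encode(line, n):
    """White block skipping (WBS) code for one scan line.

    `line` is a sequence of pels (0 = white, 1 = black) whose length is
    assumed to be a multiple of the block size `n`.  Returns the code
    word stream as a string of '0'/'1' characters.
    """
    out = []
    for i in range(0, len(line), n):
        block = line[i:i + n]
        if any(block):                          # at least one black pel
            out.append('1' + ''.join(str(p) for p in block))
        else:                                   # all-white block: 1-bit code word
            out.append('0')
    return ''.join(out)

# The example from the text: N = 4, two white pels followed by two black
# pels gives the 5-bit code word "10011".
assert wbs_encode([0, 0, 1, 1], 4) == '10011'
```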
The WBS code does not take full advantage of the fact that most texts contain a large number of all-white scan lines. To take advantage of this, we propose a modified WBS code in which all-white scan lines are treated specially(7). We use the 1-bit code word "0" to represent an all-white scan line. For a scan line containing at least one black pel, we transmit "1" followed by the regular WBS code words for that line. Assuming N = 3, then for a scan line containing the pels WWWBBWBWBW..., we will transmit "1011101101...". The first bit "1" indicates that the scan line is not all white, and the second bit "0" represents WWW, etc. The optimum N is chosen by looking at the statistics of the image after we have discarded the all-white scan lines.

Thus, by adding a one-bit overhead for each line, we have eliminated the necessity of coding and transmitting the all-white scan lines. This overhead is low on a per-pel basis. For example, if we have 1,000 pels in each scan line, then the overhead is only .001 bit/pel, which is negligible in most cases. For images such as weather maps, which contain very few or no all-white scan lines, the modified code reduces essentially to the regular WBS code. For some experimental results, see Table I. The pictures used are the same as in Ref. 5.

V. Runlength Coding and Extensions(8)

In runlength coding, the image is scanned linearly in the usual manner; however, instead of transmitting the individual samples, the lengths of the black and the white runs are transmitted. To achieve compression, the statistics of the runs are utilized. Some results are shown in Table I. Huffman coding was used for the runs.

For a given set of probabilities for the runlengths, the most efficient code is of course the Huffman code. However, the implementation of a Huffman code requires a large dictionary and is complicated. There are suboptimum variable-length codes which are almost as efficient as Huffman codes yet much simpler to implement. Two classes of such codes, the linear codes and the logarithmic codes, are particularly suitable for graphical data(9). The linear codes, which include the Laemmel codes and the Golomb codes, have the property that the code word length increases approximately linearly with runlength. They are nearly optimum for exponential distributions. The logarithmic codes, which include the Hasler codes(10), are so named because the code word length increases approximately logarithmically with runlength. They are nearly optimum for negative-power distributions.

Experimental evidence indicates that the frequency distributions of runlengths of alphanumerical data (typewritten letters, printed texts, etc.) are roughly negative-power, and that those of runlengths of line drawings and weather maps are roughly exponential. One would therefore expect that linear codes are nearly optimum for line drawings and weather maps and that logarithmic codes are nearly optimum for alphanumerical data. Such indeed turns out to be the case. Both linear and logarithmic codes are easy to implement; the main part of the encoder or decoder is a counter. A sketch of a linear runlength code is given below.

An extension of runlength coding, called PDQ, was described in Ref. 8. In this method, essentially the changes in the black-to-white and white-to-black transition locations from scan line to scan line are coded.
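Before turning to the statistics of those transition changes, here is the promised sketch of a linear runlength code. It uses the Golomb construction with a power-of-two parameter, the form that is easiest to write down; the Laemmel, Golomb and Hasler codes used in the experiments differ in their details, so this is an illustration of the linear family rather than a reproduction of any code from the references, and the helper names are assumptions of this sketch.

```python
def golomb_encode(run_length, m=4):
    """Encode a non-negative run length with a Golomb code whose
    parameter m is a power of two.

    Code word = unary(quotient) + fixed-width binary(remainder); its
    length grows roughly linearly with the run length, which is why
    such codes suit the exponential run statistics of line drawings
    and weather maps."""
    assert m > 0 and m & (m - 1) == 0, "m must be a power of two"
    q, r = divmod(run_length, m)
    k = m.bit_length() - 1                 # bits needed for the remainder
    unary = '1' * q + '0'
    return unary + (format(r, f'0{k}b') if k else '')

def run_lengths(line):
    """Split a scan line (0 = white, 1 = black) into run lengths,
    starting with the run of the first pel's colour."""
    runs, current, count = [], line[0], 0
    for pel in line:
        if pel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pel, 1
    runs.append(count)
    return runs

# Example: the line WWWBBWBW has runs 3, 2, 1, 1, 1.
line = [0, 0, 0, 1, 1, 0, 1, 0]
codes = [golomb_encode(r) for r in run_lengths(line)]
```

In practice the parameter m would be matched to the measured run statistics; the main point is that both the encoder and the decoder reduce to counting, as the text notes.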
Assuming that the black/white boundaries have no preferred direction, we can show that the probability distribution of the amount of change of a transition location from line to line is Cauchy, which is approximately negative-power. We therefore expect that logarithmic codes are suitable in this case.

VI. Other Coding Schemes

For other coding schemes, see Refs. 11-14. For channel error considerations, see Ref. 15.

Acknowledgement

This work was supported in part by the Advanced Research Projects Agency of the Department of Defense under Contract No. F 30602-75-C-0150.

Table I. Bit Rates of Coding Schemes (bits/pel)

Picture                  WBS Code   Modified WBS Code   Runlength Coding
A1 (typewritten)           .15            .10                .10
A2 (typewritten)           .30            .18                .22
A3 (circuit diagram)       .25            .25                .13
A4 (handwritten)           .19            .14                .12
A5 (weather map)           .40            .41                .24
A6 (weather map)           .37            .37                .23

References

1. Huang, T. S. and Koller, H. U., "Coding of Multilevel Images," IEEE Trans. on Communications, June 1975.
2. ANPA Research Institute Bulletin 949, "All-digital page facsimile tests at the ANPA/RI Research Center," March 18, 1968.
3. Huang, T. S., "Digital Transmission of Halftone Pictures," Computer Graphics and Image Processing, September 1974.
4. Huang, T. S., "PCM Picture Transmission," IEEE Spectrum, December 1965.
5. Coulon, F. de and Kunt, M., "An Alternative to Run-length Coding for Black-and-White Facsimile," Proc. of the 1974 International Zurich Seminar on Digital Communications, Zurich, Switzerland, March 1974.
6. Horlander, F. J., IBM Disclosure, 1972.
7. Huang, T. S. and Hussain, A. B. S., "Facsimile Coding by Skipping White," to appear in IEEE Trans. on Communications.
8. Huang, T. S., "Runlength Coding and its Extensions," in Picture Bandwidth Compression, ed. by T. S. Huang and O. J. Tretiak, Gordon and Breach, 1972.
9. Huang, T. S., "Easily Implementable Suboptimum Runlength Codes," 1975 Int'l Communication Conference Record, June 16-18, 1975, San Francisco.
10. Meyr, H., Rosdolsky, H., and Huang, T. S., "Optimum Runlength Codes," IEEE Trans. on Communications, June 1974.
11. Arps, R. B., "Bibliography on Digital Graphic Image Compression and Quality," IEEE Trans. on Information Theory, January 1974.
12. Stern, P. A. and Heinlein, W. E., "Facsimile Coding Using Two-dimensional Runlength Prediction," Proc. 1974 Zurich Seminar on Digital Communications, Zurich, Switzerland, March 1974.
13. Ascher, R. N. and Nagy, G., "A Means for Achieving a High Degree of Compaction on Scan-digitized Printed Text," IEEE Trans. on Computers, November 1974.
14. Session 7: Digital Facsimile Compression, 1975 Int'l Comm. Conf., June 16-18, 1975, San Francisco.
15. Koller, H. U., "Einfluss von Kanalfehlern auf digitale Faksimile-Übertragungen," Ph.D. Thesis, Institut für Technische Physik, ETH Zürich, Switzerland, 1975.

SOURCE CODING OF TELEVISION SIGNALS USING INTERFRAME TECHNIQUES
B. G. Haskell and P. L. Gordon
Bell Telephone Laboratories, Incorporated, Holmdel, New Jersey

The channel capacity required for long distance digital transmission of video signals can be substantially reduced through a system using interframe coding. This is possible because of the amount of frame-to-frame redundancy which exists in a video source output. Through appropriate signal processing it is feasible to send only the changed area of each frame rather than the full frame, provided that the previous frame values are retained in memories.

Since the very beginning of television, efforts have been underway to develop cheaper means for transmitting video signals.(1,2) Recently, most of the attention has been focused on digital systems because, as is well known, noise does not accumulate in digital regenerators as it does in analog amplifiers, and, in addition, signal processing is much easier in a digital format.

Progress is being made on two fronts. First, the present high cost-per-bit of transmitting a digital data stream has generated interest in a number of methods for cost reduction which have general application and are not confined to data streams produced by a video signal source. However, it is important to remember that video bit-rates tend to be considerably higher than those required for voice or data transmission. The most promising techniques for more economical digital transmission include waveguides, optical fibers, and laser light pipes, among others.

The second front on which progress is being made involves reducing the number of bits which have to be transmitted in a video communication system. The balance of this paper deals with this facet of the problem. Bit-rate reduction is accomplished by eliminating, as much as possible, the substantial amount of redundant information which exists in a video signal as it leaves the camera. The cost of the signal processing which is required to reduce this redundancy determines its economic feasibility in a given system. The savings which accrue from lowering the transmission bit-rate must more than offset the cost of the required signal processing if redundancy reduction is to be economical.

Statistical Redundancy and Subjective Redundancy

If an information source such as a television camera produces statistically redundant data - that is, information which could just as well have been derived from past data - then a saving of transmission bit-rate can result if the redundant information is removed prior to transmission. In most cases this requires, at the transmitter, a capability for storing some of the past source output so that a decision can be made as to what is and what is not redundant in the present source output. Memory of past information is also required at the receiver so that the redundancy can be rederived and inserted back into the data stream in order to reproduce the original signal.

For example, in a television picture successive picture points (elements) along a line are very much alike, and redundancy reduction can be achieved by sending element-to-element differences instead of the elements themselves. The differences are small most of the time and large only occasionally; thus, an average bit-rate saving can be obtained by using short binary words to represent the more probable, small differences and longer binary words to represent the infrequent, large differences.
In Figure 1, single element delays are used to store the previous element so that differences can be generated at the sending end and picture elements can be reconstructed at the receiving end. In successive frames a picture element also changes very little on the average. To transmit frame-to-frame differences, the delays in Figure 1 should be increased from an element period to a frame period. The delay elements comprise the aforementioned memory required for redundancy removal.

Statistical redundancy is not the only form of redundancy in a video signal. There is also considerable subjective redundancy; that is, information which is produced by the source but which is not necessary for subjectively acceptable picture quality at the receiver. For example, it is well known that viewers are less sensitive to degradations near edges, i.e., large brightness transitions, in the picture. Also, viewers require less reproduced resolution for moving objects in a picture than for stationary objects. Thus, in nonscientific applications where exact reproduction of the video source output is not necessary as long as the displayed picture is subjectively pleasing, a further reduction in transmission bit-rate can be achieved by removing subjective redundancy.

For example, the element-differential PCM coder of Figure 1 need not transmit large differences as accurately as small differences because of the viewer's insensitivity to distortion near large brightness transitions. Thus, prior to transmission, large element differences can be quantized more coarsely than small differences, thereby reducing the number of levels which must be transmitted. Since the original signal is distorted by using this type of quantization (companded quantization), the output of the transmitter delay in Figure 1 is not the actual previous element as produced by the camera. It is, instead, the previous element as it appears at the receiver, with the subjective redundancy removed. Videotelephone format pictures (1 MHz, 267 lines) can be transmitted with 16-level quantization of element differences (4 bits per picture element). The bit-rate can be reduced further by using multilength binary words for transmission; however, a buffer memory is then needed to transmit the resulting irregular data-rate over a constant bit-rate digital channel.

[Figure 1 block diagram: at the transmitter, the input is differenced against the previous element (one-element-period delay), quantized, and binary coded onto the channel; at the receiver, the decoded quantized element difference is added to the delayed previous element to reconstruct the output.]

Figure 1. Transmission of element-to-element differences instead of the elements themselves. The differences are small most of the time. Thus, a bit rate reduction results if short binary code words are used for small differences and longer code words are used for large differences. Frame-to-frame differences can be transmitted if the delays are increased to a frame period.

Simple Methods of Frame-to-Frame Redundancy Reduction

In order to reduce frame-to-frame redundancy, a memory or delay capable of storing an entire frame of video information is needed. At present, this requirement is the main factor determining the economic feasibility of frame-to-frame signal processing.
However, it is expected that costs for digital storage will continue to decline, thereby making this type of signal processing even more attractive in the years to come. Devices for digital storage which are currently being studied include magnetic bubble memories,(3) charge coupled devices,(4) large MOS stores(5) and surface acoustic wave devices.(6)

One method of removing frame-to-frame redundancy is simply to reduce the number of frames that are transmitted per second, as shown in Figure 2. At the receiver, frames are repeated as in motion picture projection in order to avoid flicker in the display. This technique takes advantage of the fact that frame display rates must be greater than about 50 Hz to eliminate objectionable flicker, whereas something between 20 and 30 Hz is all that is required for rendition of normal motion, and less than 15 Hz for rendition of low speed movement.(7)

In most systems interlaced scanning already takes advantage of this phenomenon to some extent. Odd-numbered lines are sent during one half frame period (field 1) and even-numbered lines during the other half frame period (field 2). For example, broadcast television and videotelephone systems in the United States transmit 30 frames per second using 2:1 interlace (60 fields per second). A 50 percent reduction in bit-rate can then be obtained by transmitting only 15 frames per second and displaying each frame twice as shown in Figure 2; however, jerkiness is then noticeable if the scene contains objects moving with moderate or rapid speed.

[Figure 2 diagram: only alternate frames are transmitted (30 frames/sec input, 15 frames/sec on the transmission channel); with a frame delay at the receiver, the transmitted frames (two interlaced fields per frame) are each displayed twice to give a 30 frames/sec output.]

Figure 2. 2:1 frame repeating. Reducing the frame transmission rate by half (15 frames/sec) and displaying each frame twice at the receiver (30 frames/sec) works well for movement at slow speeds but gives rise to visible jerkiness during movement at higher speeds.

[Figure 3 diagram: 8-bit PCM video signal in; an M-level quantizer and a frame delay memory form the reference signal at the transmitter; at the receiver an adder and a frame delay memory reconstruct the video output from the transmitted quantized frame differences.]

Figure 3. Frame-differential PCM. When transmitting frame-to-frame differences, small values are quantized finely to give high quality reproduction in stationary areas of the picture, and large values are quantized coarsely to take advantage of the viewer's tolerance of brightness errors near moving edges in the picture. Because of the feedback configuration, a high quality picture is displayed within a few frame periods after motion in the scene has ceased.

If in Figure 1 the delays of the feedback coder and decoder are increased to one frame period, then the system transmits companded frame differences as shown in Figure 3. Since small frame differences are finely quantized, scenes containing little or no movement are displayed with exceptionally high picture quality. When movement does occur, thereby creating larger, more coarsely quantized frame differences, some distortion is produced; but it is all confined to the moving areas of the picture, where it is less noticeable to the viewer.
Because of the feedback configuration, a high quality picture is again displayed within a few frame periods after motion in the scene ceases. In this system a digital transmission error caused by the channel introduces an erroneous picture element into the receiver frame memory, where it would remain indefinitely if some means for removing it were not provided. One method for doing this (called leak) is to reduce the magnitude of the reference signals by a small percentage both at the transmitter and at the receiver. The effect of a transmission error then slowly dies away as time progresses. Another technique, which is more useful in interframe coders (called forced updating), is to transmit a small percentage of the picture elements in each frame as PCM. In this case a transmission error will remain visible until the corresponding PCM updating information arrives, at which time the error will suddenly disappear.

For video signals, between 30 and 60 levels of frame difference quantization are required for good picture quality during rapid motion.(8) However, if picture quality during periods of movement is of no concern, then two-level quantization, i.e., one bit per element or frame-to-frame delta modulation, is sufficient to reproduce still scenes such as charts or graphs with excellent fidelity.(8) Since more quantization levels are required for frame differences than for element differences (for approximately equivalent picture quality during periods of rapid movement), there is little advantage in frame-differential coding unless multilength binary code words are used along with fairly large buffers. And if complex operations such as these are to be carried out, other techniques can be used which are of the same order of complexity but are much more efficient in terms of bit-rate reduction.

Figure 4. a) Videotelephone scene containing movement. b) Significant changes in a videotelephone scene containing moderate motion. The white picture elements denote frame-to-frame differences which are larger in magnitude than 1.5 percent of maximum. There are about 10,000 changes in this frame out of a total of about 56,000 visible elements. c) The white picture elements here are the ones which would be transmitted using cluster coding. For each cluster, an address is transmitted corresponding to the first picture element; then the frame differences are sent, followed by a special code signalling the end of the cluster. Isolated changes in Figure 4b are ignored, and gaps of six picture elements or less between clusters are bridged to avoid sending excessive addressing bits.
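The frame-differential loop of Figure 3, together with the leak just described, can be written out as a short numerical sketch. The quantizer breakpoints and the leak factor below are values assumed for illustration, not figures taken from the paper, and the element-differential coder of Figure 1 has exactly the same structure with the delay shortened from one frame to one picture element.

```python
import numpy as np

# Illustrative companded quantizer: output levels are packed finely near zero
# (small frame differences) and spaced coarsely for large differences.  These
# particular levels are assumptions made for this sketch.
LEVELS = np.array([-60, -30, -14, -6, -2, 0, 2, 6, 14, 30, 60], dtype=float)

def quantize(diff):
    """Map each frame difference to the nearest representative level."""
    idx = np.abs(diff[..., None] - LEVELS).argmin(axis=-1)
    return LEVELS[idx]

def code_frame(frame, reference, leak=0.97):
    """One step of frame-differential PCM with leak.

    Transmitter and receiver both hold `reference`, the previously
    reconstructed frame; only the quantized difference travels over the
    channel.  Scaling the reference by a factor slightly below one lets
    the effect of a channel error die away over successive frames.
    """
    diff = frame - leak * reference           # prediction error
    transmitted = quantize(diff)              # what goes on the channel
    reconstructed = leak * reference + transmitted
    return transmitted, reconstructed
```

Forced updating, by contrast, would periodically replace a small fraction of each frame's elements with their full PCM values rather than differences.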
Thus, since there are usually so many small frame differences, there is considerable advantage in not transmitting them and, instead, sending only the frame differences which are significant;(1o,11) that is, leaving the reference signals in Figure 3 unchanged unless the frame difference is large enough to warrant replenishment with new information. If only the significant frame differences are to be transmitted, i.e., those larger than about 1.5 percent of maximum, then addressing information must also be sent to tell the decoder where in the picture the received frame differences should be placed. Also, since data is generated at a very irregular rate depending on the location and on the amount of moving area in the picture, an elastic digital store or buffer must be used at the transmitter and receiver in order to use a constant bit-rate digital channel. Conditional replenishment takes advantage of the long runs of ingignificant differences that normally occur in videotelephone pictures. It has been found that gignificant frame differences also tend to occur in clusters,(8) and for this reason it is more efficient to address only the beginning and end of each cluster than to address each significant change separately. Moreover, since an address generally requires many more bits for transmission than does a frame difference, it is worthwhile to coalesce clusters of significant changes which are separated only by small gaps as well as to eliminate as much as possible isolated differences due to noise. The white dots in Figure 4b show the significant changes; Figure 4c shows the result of removing isolated changes and of bridging small gaps between clusters of changes. SIGNIFICANT CHANGE DETECTOR SEGMENTER DELAY-D FEEDBACK TO PREVENT BUFFER OVERFLOW AND UNDERFLOW 8-BIT PCM CODER SIGNAL ADDRESS GENERATER FRAME 1 DIFFERENCE TO TRANSMISSION CHANNEL 0 --IP' 0 ?*ire? VIDEO .? ipf DELAY OUANTIZER REFERENCE PICTURE FRAME PERIOD MINUS 0 4 DELAY DELAY Figure 5 A conditional replenishment coder which transmits clusters of significant frame differences. The segmenter segments each frame into changed areas and stationary background areas, i.e., defines the clusters of frame differences which must be transmitted. The reference picture, which is available also at the receiver, is updated or replenished with quantized significant frame differences as determined by the segmenter. Since the segmenter requires a delay in order to operate efficiently, compensating delays must be included in the coder loops. Buffer overflow or underf low can be prevent by reducing or increasing the number of clusters transmitted. Forced updating (transmitting a few lines of each frame as PCM) to alleviate the effect of channel errors is not shown. In the conditional replenishment coder of Figure 5 the segmenter defines the clusters of frame differences which must be transmitted. The required channel bit-rate depends on the size of the buffer and on the amount of motion in the picture which is to be accommodated. During periods of rapid movement when the data generation rate of the coder exceeds the channel rate, the buffer will begin to fill, and if this condition persists for too long, the buffer will overflow and data will be lost. Figure 6 shows SPIE Vol. 
In the conditional replenishment coder of Figure 5 the segmenter defines the clusters of frame differences which must be transmitted. The required channel bit-rate depends on the size of the buffer and on the amount of motion in the picture which is to be accommodated. During periods of rapid movement, when the data generation rate of the coder exceeds the channel rate, the buffer will begin to fill, and if this condition persists for too long, the buffer will overflow and data will be lost. Figure 6 shows the required buffer size versus transmission bit-rate for a videotelephone picture in which a person is engaged in normal conversation. Sixteen bits are assumed for addressing each cluster and four bits on the average are assumed for each frame difference. For high quality pictures (30-60 level frame difference quantization) multiword-length coding of frame differences prior to transmission is implied. The videotelephone signal format used here is: 1 MHz analog bandwidth, 271 lines/frame, 30 frames/sec with two interlaced fields per frame. Thus, 3 Megabits/sec corresponds to an average of 1.5 bits per picture element, blanking included.

[Figure 6 plot: buffer size (bits, roughly 4K to 100K) versus channel rate (2.5 to 4 Megabits/sec).]

Figure 6 Buffer size versus transmission bit rate required for simple conditional replenishment where the scene contains the head and shoulders view of a person engaged in normal conversation. The curve is characterized by a sharp knee at the point where the buffer is just large enough to smooth the generated data over a field period (1/60 sec). Sixteen bits per cluster were assumed for addressing and four bits on the average were assumed per frame difference. Very large buffers (several million bits) are capable of smoothing between one human action and the next (the fidget interval), but they introduce too much delay into the conversation to be useful.

Figure 6 shows that significant savings in bit-rate are achieved as the buffer size is increased up to about 15,000 bits. Around this point, data peaks generated in small areas of the picture are smoothed over an entire field period. Above 15,000 bits, however, the buffer begins to smooth the data over periods of time longer than a field interval (1/60 second here), and since the amounts of data in successive fields tend to be very much the same, relatively little saving of transmission bit-rate is obtained.(11) Very large buffers (several million bits) are capable of reducing the bit-rate significantly by smoothing the data from one human action to the next; however, besides being uneconomical they also introduce several seconds of delay into the transmission, and for face-to-face conversations over videotelephone, for example, this is intolerable. More will be said about smoothing the irregular data-rate when the subject of channel sharing by several coders is discussed.

Thus, for simple conditional replenishment there seems to be little advantage in providing more buffer capacity than is required to smooth the data over a field period. If normal motion is to be accommodated in head and shoulders views of people in conversation, then with a 15,000 bit buffer a transmission rate of about 3 Megabits/sec, or 1.5 bits per picture element (including blanking), is needed. This is a saving of about 5:1 over 8-bit PCM. With a 600,000 bit buffer (which is about as large as can be used without introducing annoying delay into the conversation) a transmission rate of about 1.25 bits per picture element can be achieved. However, lower transmission rates can be achieved without using large buffers by using adaptive or multimode coding. These techniques are described next.

Adaptive or Multimode Conditional Replenishment

Viewers are accustomed to blurring in areas of the picture containing moderate or rapid movement.
The main reason for this blurring is that, since television cameras have no shutters, the light falling on a particular point of the target is integrated over one frame period before being read out, and when there is movement this results in some blurring in the picture even before transmission. Since rapidly moving objects are blurred, there is no need to transmit these areas with full picture resolution.(13,14,8) Thus, a different mode of conditional replenishment coding can be used during periods of rapid movement. Since the transmission channel capacity required for conditional replenishment coders is determined mainly by the bit-rate generated during rapid motion, reducing the bit-rate during these periods lowers the required bit-rate of the channel.

A particularly simple method of reducing the moving-area resolution and, hence, the bit-rate in conditional replenishment systems is to transmit only every other significant frame difference along a line instead of all of them as is normally done. The moving-area picture elements which are not transmitted can then be obtained from the ones that are by simple interpolation. Full resolution is maintained in the stationary background areas of the picture since they are simply repeated from the previous frame. An effective way of making sure that the reduced resolution coding mode is only used during periods of rapid movement is to bring it in only when the number of bits stored in the buffer exceeds a certain threshold value. Subsampling the moving area at half rate in this way almost halves the channel capacity required for transmission. It has been found that further horizontal subsampling gives rise to visible degradation except with extremely rapid motion.(14)

Feedback from the buffer can be used to control the picture quality in other ways as well. For example, the frame difference threshold value used by the segmenter to test for significance can be raised as the buffer fills. And when there is violent motion or movement of the camera (periods during which the viewer is not very critical of picture quality) a simple but effective way of cutting down the data rate is to stop conditional replenishment altogether for one frame period whenever the buffer threatens to overflow, and in this way give the buffer a chance to empty. At the receiver the previous frame is simply displayed twice, resulting in a slight jerkiness in the movement instead of the smooth motion which is normally reproduced.

[Figure 7 chart: number of bits in the buffer (empty to full, 10K-60K) versus the coding actions - start subsampling, repeat a frame, force updating - and the significant frame difference threshold (4 to 7 on a scale of 0...255).]

Figure 7 Multimode conditional replenishment where the coding mode is controlled by the buffer fullness or queue length. Fairly active motion can be accommodated with a 65K bit buffer and a transmission channel rate of 2 Megabits/sec or 1 bit per picture element. A saving of 8:1 over 8-bit PCM is obtained. The significant frame difference threshold is raised from 4 to 5, 6 and 7 on a scale of 0...255 as the buffer queue length increases beyond 20K, 35K and 50K bits, respectively. All replenishment is stopped when the queue length exceeds 65K bits, and it does not resume again until the queue length falls below 2500 bits. The system starts subsampling in the moving area at half rate when the queue length exceeds 20K bits and does not return to full sampling until the queue length falls below 10K bits. One TV line is sent as 8-bit PCM (forced updating) whenever the queue length falls below 2500 bits.
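The mode logic summarized in the Figure 7 caption can be restated as a small state machine driven by the buffer queue length. The sketch below follows the thresholds quoted there (20K, 35K, 50K and 65K bits on the way up; 10K and 2500 bits on the way down); the class structure and return values are our own illustrative choices, not the paper's implementation.

```python
class MultimodeController:
    """Buffer-feedback control of the coding mode, after the Figure 7 caption."""

    def __init__(self):
        self.subsampling = False        # half-rate subsampling of the moving area
        self.replenishing = True        # False means "repeat a frame"

    def update(self, queue_bits):
        # Significant frame-difference threshold rises from 4 to 5, 6 and 7
        # (scale 0...255) as the queue grows past 20K, 35K and 50K bits.
        if queue_bits > 50_000:
            threshold = 7
        elif queue_bits > 35_000:
            threshold = 6
        elif queue_bits > 20_000:
            threshold = 5
        else:
            threshold = 4

        # Half-rate subsampling with hysteresis: start above 20K, stop below 10K.
        if queue_bits > 20_000:
            self.subsampling = True
        elif queue_bits < 10_000:
            self.subsampling = False

        # Stop all replenishment above 65K bits; resume only below 2500 bits.
        if queue_bits > 65_000:
            self.replenishing = False
        elif queue_bits < 2_500:
            self.replenishing = True

        # Forced updating: one TV line as 8-bit PCM whenever the queue is nearly empty.
        force_update = queue_bits < 2_500
        return threshold, self.subsampling, self.replenishing, force_update
```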
Multimode conditional replenishment with forced updating, subsampling at half rate during rapid movement, and frame repeating to prevent buffer overflow, as shown in Figure 7, requires for videotelephone a transmission rate of about 2 Megabits/sec - an 8:1 saving in bit rate over 8-bit PCM. Fairly active motion can be accommodated using a buffer size of about 65K bits. Round trip delay introduced by the coder is about 1/15 of a second, which is not noticeable in most situations. If, during violent motion, the transmitting buffer fills, then repeating one frame allows the buffer to completely empty so that several frames of violent motion can then be transmitted before another frame must be repeated. Emptying of the transmitter buffer during periods of very slow motion can be conveniently prevented by a forced updating of one TV line as 8-bit PCM whenever the number of bits stored in the buffer falls below a certain threshold. In fact, it is prudent to transmit a few lines of each frame as PCM regardless of the state of the buffer in order to accommodate the effects of digital transmission errors, which would otherwise last indefinitely.

Conditional Subsampling in the Vertical Direction

Subsampling horizontally along a line in the moving area at less than half rate gives rise to visible degradation unless movement is very fast.(15) However, moving-area subsampling at half rate in the vertical direction is feasible, and like horizontal subsampling it reduces the transmitted data-rate significantly. Because of interlace, vertical subsampling at half rate in the moving area is the same as not transmitting any frame differences at all during alternate fields. The moving area picture elements in those fields would be obtained at the receiver by interpolation between the adjacent fields(15) as shown in Figure 8. The effect on spatial resolution is usually negligible, partly because of the smearing due to movement and partly because good vertical resolution is difficult to obtain in a television picture in any case because of the scanning mechanism.

[Figure 8 diagram: the scan lines of fields 1 through 4 making up frames 1 and 2, showing the interlace pattern.]

Figure 8 Subsampling at half rate in the vertical direction. Because of interlace this is equivalent to conditionally replenishing alternate fields only. In the diagram, fields 1 and 3 would be coded and transmitted via conditional replenishment. Moving area picture elements in the intervening fields would be obtained at the receiver by interpolation. Thus, changed picture elements in line B2 would be obtained by a four-way average of picture elements in lines A1, C1, A3 and C3. Simple vertical subsampling and four-way interpolation in this manner does not give good rendition of moving edges, however, and results in a slightly visible jerkiness in the picture. By sending a small amount of additional information to repair these edges, the jerkiness can be eliminated. This is called conditional vertical subsampling.(16)
The field-to-field interpolation does have a noticeable effect on temporal resolution, however, especially during periods of rapid motion.(15,16) Since the interpolation is based on picture information occurring 1/60 of a second away in time, moving edges are poorly represented, and this leads to jerkiness in movement rendition when large areas move fast. This jerkiness is much less noticeable than with frame repeating, but it is nevertheless detectable. In most scenes only a small amount of additional information need be transmitted to the receiver to enable it to adequately reproduce those edges which have been distorted by the field-to-field interpolation.(16) This procedure is called "conditional vertical subsampling." The white picture elements of Figure 9 show where simple field-to-field interpolation is not good enough and additional information must be transmitted. The original scene is the same as in Figure 4. In a single videotelephone field of Figure 4c there are 7870 picture elements in the moving area, but only 2250 of them, as shown in Figure 9, need be transmitted using conditional vertical subsampling. The channel bit-rate can be reduced by about a factor of 1.5 using conditional vertical subsampling.

Figure 9 Picture elements in one field of Figure 4 which would be transmitted using conditional vertical subsampling with threshold T = 12 (on a scale of 0...255). In the fields to be interpolated (fields 2 and 4 of Figure 8) the actual value of each moving area picture element is compared with the value obtained by the four-way interpolation. If the magnitude of the difference exceeds the threshold T, then the difference is transmitted along with addressing information. Otherwise, nothing is sent, and the four-way average is used at the receiver. In this field, there are 7870 picture elements in the moving area, but only 2250 need be transmitted using conditional vertical subsampling.
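Conditional vertical subsampling, as described in the Figure 9 caption, reduces to a four-way average followed by a thresholded correction. The numpy sketch below assumes that line r of the intervening field lies spatially between lines r and r+1 of its two neighbouring fields; that indexing convention, the function name and the data layout are illustrative assumptions, while the threshold T = 12 is the value quoted above.

```python
import numpy as np

def conditional_vertical_subsampling(f_prev, f_mid, f_next, moving_mask, T=12):
    """For moving-area elements of the intervening field f_mid, form the
    four-way average of the adjacent lines of the transmitted fields
    (lines A1, C1, A3, C3 of Figure 8) and emit an (address, difference)
    correction only where the interpolation error exceeds T."""
    corrections = []
    for r in range(f_mid.shape[0] - 1):
        interp = (f_prev[r].astype(int) + f_prev[r + 1] +
                  f_next[r] + f_next[r + 1]) // 4
        err = f_mid[r].astype(int) - interp
        for c in np.nonzero(moving_mask[r] & (np.abs(err) > T))[0]:
            corrections.append((r, int(c), int(err[c])))
    return corrections
```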
Buffer and Channel Sharing

The channel bit-rate required by the conditional replenishment coder of Figure 5 is determined mainly by the peak data-rate produced by the coder, i.e., the data-rate generated during periods of active motion in the scene. Since periods of active motion in videotelephone scenes, for example, tend to come in bursts separated by long intervals of inactivity, the channel bit-rate tends to be much larger than the long-term average data-rate (excluding the forced updating needed to prevent buffer emptying). As was stated previously, an extremely large buffer (several million bits) would be needed in order to appreciably smooth the data from one period of active motion to the next and, thus, bring the channel bit-rate down closer to the long-term average data-rate. Such a buffer would not be feasible, because of economic reasons and because of the excessive delay it would introduce into a videotelephone conversation. At any given time, however, the data-rate averaged amongst many conditional replenishment coders will be fairly close to the long-term average data-rate produced by a single coder.

Thus, if the data produced by many videotelephone users is combined prior to buffering and transmission, as shown in Figure 10, the advantage of a large buffer can be realized, namely lower channel bit-rate per data source, without the disadvantages of high buffer-memory cost and excessive delay. This technique automatically allocates more buffer space and channel capacity to those users which are temporarily in rapid motion at the expense of those which are not. Computer simulations have been carried out to evaluate this scheme, and results indicate that by combining the data from about 15 conditional replenishment coders prior to transmission, the channel bit-rate per coder can be halved and the buffer size requirements can be drastically reduced.(19)

[Figure 10 diagram: unbuffered data from several sources entering individual prebuffers, a multiplexing switch, and a principal buffer feeding the channel.]

Figure 10 Buffer and channel sharing by several conditional replenishment coders. The unbuffered data (entering from the left) produced by each coder occurs at a very irregular rate - short periods of high data-rate caused by active motion in the videotelephone scene tend to be separated by long periods of low data-rate. The data is first stored temporarily in individual prebuffers, then transferred to the principal buffer via the multiplexing switch, and finally transmitted over a single high capacity channel to the receiver where the inverse operation takes place. This technique automatically allocates more buffer space and channel capacity to those sources which are temporarily producing the higher data-rates. By sharing the channel in this way amongst about 15 conditional replenishment coders, the channel bit-rate per coder can be halved.(19)
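The sharing arrangement of Figure 10 is easy to mimic in a toy simulation: bursty bit counts from several sources are pooled into one principal buffer that drains at the channel rate, so one source's burst borrows capacity left idle by the others. Everything below - the field-based time step, the service order and the example numbers - is an illustrative assumption rather than the simulation reported in reference 19.

```python
def shared_buffer_occupancy(per_field_bits, channel_bits_per_field):
    """per_field_bits[t][k] is the number of bits source k generates in field t.
    All prebuffer contents are multiplexed into the principal buffer each field,
    which is then drained at the channel rate; the returned list is the
    principal-buffer occupancy after each field."""
    principal = 0
    occupancy = []
    for field in per_field_bits:
        principal += sum(field)                                  # multiplexed arrivals
        principal = max(0, principal - channel_bits_per_field)   # drain at channel rate
        occupancy.append(principal)
    return occupancy

# Example: 15 sources, only one of which is "active" in any given field.
# The pooled load (96K bits/field) fits the shared channel even though a single
# active source's 40K burst far exceeds a 1/15 share of the channel.
fields = [[40_000 if k == t % 15 else 4_000 for k in range(15)] for t in range(150)]
print("peak shared-buffer occupancy:",
      max(shared_buffer_occupancy(fields, channel_bits_per_field=100_000)))
```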
Frame-to-Frame Coding of Signals Which Have Been Previously Transmitted Via Intraframe Coding

A possible configuration for long-distance transmission of video signals is the hierarchical arrangement shown in Figure 11. Within large cities, for example, the signals might be sent via relatively cheap intraframe coding techniques such as element-differential PCM. Between cities, where transmission costs are much higher, interframe techniques such as those described previously might be used to reduce the bit-rate as much as possible. A fundamental difficulty arises with this situation, however. All intraframe coders introduce a certain amount of frame-to-frame noise into the signal which, although it may not be objectionable to the viewer, is detected by the interframe coder as a large number of significant frame differences. The white picture elements in Figure 12b show the frame differences which are larger in magnitude than 1.5 percent of maximum using a video signal which has been passed through an element-differential PCM codec. The number of significant changes far exceeds the number of changes which occur when 8-bit PCM is used as the input signal as in Figure 4b. In order to reduce the number of picture elements which must be transmitted, it is not sufficient to simply raise the threshold which determines whether or not a frame difference is significant; unacceptable picture quality results.

Instead, advantage must be taken of the distinguishing properties of the frame differences caused by intraframe quantization noise as opposed to those of the frame differences caused by movement. For example, frame differences caused by intraframe quantization noise tend to be uncorrelated both spatially and temporally, and they are usually larger near edges in the picture because of companding. Frame differences caused by movement, on the other hand, are highly correlated both spatially and temporally and tend to be large only near moving edges in the picture. Using information such as this, algorithms have been designed for segmenting the picture with good accuracy into moving areas and stationary background areas.(17) The white picture elements in Figure 12c show the moving areas as defined by such an algorithm for the same signal used in Figure 12b. However, the number of moving-area picture elements still tends to be significantly higher than with an 8-bit PCM input. Using conditional vertical subsampling at half rate, horizontal subsampling at half rate and multiword-length coding of frame differences, a multimode conditional replenishment coder for videotelephone has been simulated which operates with an average data transmission rate of 2 Megabits/sec (1 bit/picture element).(18)
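The segmenting algorithm of reference 17 is not spelled out here, but the spatial-correlation property just described already suggests a crude discriminator: keep a significant frame difference only if enough of its neighbours are also significant. The numpy sketch below implements that heuristic only - it is an illustration of the idea, not the algorithm of reference 17 - and the neighbourhood size and count are arbitrary choices.

```python
import numpy as np

def moving_area_mask(significant, min_neighbours=4):
    """Retain a significant frame difference only if at least min_neighbours of
    its eight spatial neighbours are also significant; spatially uncorrelated
    differences (typical of intraframe quantizing noise) are discarded, while
    spatially clustered ones (typical of movement) are kept."""
    s = significant.astype(int)
    padded = np.pad(s, 1)
    h, w = s.shape
    neighbours = sum(padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)) - s
    return significant & (neighbours >= min_neighbours)
```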
[Figure 11 block diagram: input, intraframe coder, intracity digital link, intraframe decoder, PCM, interframe coder, intercity digital link, interframe decoder, PCM, intraframe coder, intracity digital link, intraframe decoder, output.]

Figure 11 A possible configuration for long-distance transmission of videotelephone signals. Cheaper intraframe codecs are used for short, intracity digital links. Between cities, where transmission costs are high, interframe codecs are used to reduce bit-rate as much as possible.

TABLE I - Bit-rates required for videotelephone using various coding techniques. Element-differential PCM is an intraframe coding technique, while methods 3-6 are interframe techniques. Method 4 has been demonstrated, while the figures cited for methods 5 and 6 are based on simulations.

   Coding Method                                                  Channel Bit-Rate (Megabits/Sec)
1. 8-Bit PCM                                                      16
2. Element-Differential PCM                                        8
3. Simple Conditional Replenishment                                3
4. Multimode Conditional Replenishment                             2
5. Multimode Conditional Replenishment and Conditional
   Vertical Subsampling                                            1.3
6. Multimode Conditional Replenishment, Conditional Vertical
   Subsampling and Channel Sharing by 15 Users                     0.7

Figure 12 a) Videotelephone scene containing movement. b) The white picture elements denote frame-to-frame differences which are larger in magnitude than 1.5 percent of maximum using a video signal which has been passed through an element-differential PCM codec. The number of significant changes far exceeds the number of changes in Figure 4c where 8-bit PCM was used as the input signal. c) Using the same video signal as in a), the white picture elements here denote the moving area as defined by a more sophisticated segmenting algorithm.(17) This algorithm takes advantage of a number of signal properties which enable it to distinguish between frame differences due to intraframe quantization noise and the frame differences caused by movement.

Conclusion

This article has covered some of the known methods for taking advantage of frame-to-frame redundancy in video signals in order to reduce the transmission channel bit-rate. The problem was approached from two sides. First, since only a portion of the picture elements change significantly between frames, there is considerable advantage in transmitting only the changed area, i.e., using conditional replenishment.(10,11) Second, since viewers are less critical of picture quality in moving areas than in stationary areas, a saving of bit-rate can be achieved by progressively lowering the resolution of the moving area as the amount of movement increases.(13,8,14)

The bit-rate savings possible using various frame-to-frame coding techniques on videotelephone signals are shown approximately in Table I. A saving of 8:1 in bit-rate over 8-bit PCM has been demonstrated using multimode conditional replenishment, while the figures cited for conditional vertical subsampling(16) and channel sharing(19) are based on simulations. All but rapid movement of large areas is accommodated with excellent picture quality. With large area movement, frame repeating is used, which causes some visible jerkiness in the picture. If the input signal contains noise due to a previous intraframe coding and decoding, then too many spurious frame differences are present to use only these methods. More sophisticated moving area detection techniques(17) must be used to transmit such a signal efficiently.

Transmission errors in the digital channel have been mentioned only in passing. A basic principle in the efficient coding of information is that the more one removes redundancy from a data stream, the more vulnerable to transmission errors the information becomes. This is certainly true of picture coding. However, some information is more important to the receiver than other information. For example, in conditional replenishment, addresses and synchronization information are far more important than frame differences. Thus they should be protected by adding redundancy in a controlled fashion. In this way the effect of transmission errors can be minimized.

Other techniques for removing frame-to-frame redundancy are currently being explored.(20-22) However, most of these schemes, like the techniques described in this paper, are fairly expensive to implement, and this tends to limit their usefulness at the present time to long-distance transmission links where the expense of the redundancy reduction systems is significantly outweighed by the savings in transmission costs. In the future, however, prices of digital logic and storage are expected to continue the dramatic decline experienced in the last several years, making some of these bit-rate reduction schemes economically feasible for digital transmission over shorter distances.
Acknowledgment

This article describes the work of a number of people, many of whom we have had the opportunity to talk with personally. We would like to express our gratitude to them for the many stimulating discussions which we have had.

REFERENCES

1. Special Issue on Redundancy Reduction, Proceedings of the IEEE, v. 55, no. 3, March 1967.
2. Special Issue on Digital Picture Processing, Proceedings of the IEEE, v. 60, pp. 768-820, July 1972.
3. G. Lapidus, "The Domain of Magnetic Bubbles," IEEE Spectrum, v. 9, no. 9, September 1972.
4. W. S. Boyle and G. E. Smith, "Charge-Coupled Devices - A New Approach to MIS Device Structures," IEEE Spectrum, v. 8, no. 7, July 1971, pp. 18-27.
5. L. M. Terman, "MOSFET Memory Circuits," Proceedings of the IEEE, v. 59, no. 7, July 1971, pp. 1044-1058.
6. G. S. Kino and H. Matthews, "Signal Processing in Acoustic Surface-Wave Devices," IEEE Spectrum, v. 8, no. 6, August 1971, pp. 22-35.
7. R. C. Brainard, F. W. Mounts, and B. Prasada, "Low Resolution TV: Subjective Effects of Frame Repetition and Picture Replenishment," Bell System Tech. J., v. 46, pp. 261-271, January 1967.
8. J. C. Candy, M. A. Franke, B. G. Haskell, and F. W. Mounts, "Transmitting Television as Clusters of Frame-to-Frame Differences," Bell System Tech. J., v. 50, pp. 1877-1888, July-August 1971.
9. R. Schaphorst, "Frame-to-Frame Coding of NTSC Color TV," in Symposium on Picture Bandwidth Compression (Mass. Inst. Technol., Cambridge, Mass., April 1969).
10. A. J. Seyler, "Real-Time Recording of Television Frame Difference Areas," Proc. IEEE (Corresp.), v. 51, pp. 478-480, March 1963.
11. F. W. Mounts, "Video Encoding System with Conditional Picture Element Replenishment," Bell System Tech. J., v. 48, pp. 2545-2554, September 1969.
12. J. O. Limb, "Buffering of Data Generated by the Coding of Moving Images," Bell System Tech. J., v. 51, pp. 239-259, January 1972.
13. A. J. Seyler, "The Coding of Visual Signals to Reduce Channel-Capacity Requirements," Proc. IEEE, v. 109, pt. C, pp. 676-684, September 1962.
14. R. F. W. Pease and J. O. Limb, "Exchange of Spatial and Temporal Resolution in Television Coding," Bell System Tech. J., v. 50, pp. 191-200, January 1971.
15. J. O. Limb and R. F. W. Pease, "A Simple Interframe Coder for Video Telephony," Bell System Tech. J., v. 50, pp. 1877-1888, July-August 1971.
16. R. F. W. Pease, "Conditional Vertical Subsampling - A Technique to Assist in the Coding of Television Signals," Bell System Tech. J., v. 51, pp. 787-802, April 1972.
17. D. J. Connor, J. O. Limb, R. F. W. Pease, and W. G. Scholes, "Detecting the Moving Area in Noisy Video Signals," to be published.
18. D. J. Connor, B. G. Haskell, and F. W. Mounts, "A Frame-to-Frame PICTUREPHONE Coder for Signals Containing Differential Quantizing Noise," Bell System Tech. J., v. 52, pp. 35-51, January 1973.
19. B. G. Haskell, "Buffer and Channel Sharing by Several Interframe PICTUREPHONE Coders," Bell System Tech. J., v. 51, pp. 261-289, January 1972.
20. B. G. Haskell and J. O. Limb, "Predictive Video Encoding Using Measured Subject Velocity," U.S. Patent 3,632,865, January 4, 1972.
21. B. G. Haskell, "Frame-to-Frame Predictive Coding of Videotelephone Signals," IEEE International Conference on Communications - Conference Record, v. 8, pp. 31-51, June 19-21, 1972.
22. F. Rocca and S. Zanoletti, "Bandwidth Reduction Via Movement Compensation on a Model of the Random Video Process," IEEE Trans. on Communications, v. COM-20, pp. 960-965, October 1972.
Zanoletti, "Bandwidth Reduction Via Movement Compensation on a Model of the Random Video Process," IEEE Trans. on Communications, v. COM-20, pp. 060-965, October 1972. 22 / SINE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 BANDWIDTH COMPRESSION OF MULTISPECTRAL DATA A. Habibi and A. S. Samulon TRW Systems Group Redondo Beach, California 90278 Abstract The coding methods suitable for bandwidth compression of multispectral imagery are considered. These methods are compared using various criteria of optimality such as MSE,signal-to-noise ratio, recognition accuracy, as well as computational complexity. Based on these results the candidate methods are reduced to three recommended methods. These are KL-2 Dimensional DPCM, KL-cosine-DPCM and a cluster coding method. The performance of the recommended methods are examined in the presence of a noisy channel and concatenated with a variable rate (Huffman) encoder. Introduction Improvements in earth resource satellite MSS scanners will result in higher spatial resolutions, great- er number of spectral bands, and faster output rates for the multispectral earth resources data. This will require larger bandwidth for transmission of the multispectral data as well as larger storage capacities and more complicated ground processing systems for classification and dissemination of the data. Compression of data at the source can reduce the costs and the constraints upon each of these parts of a scanner-to- user information management system. Most studies of data compression techniques for multispectral data have considered only one coding technique. General Electric and Philco-Ford have considered only simple and elementary coding algorithms [1-2]. Purdue University has considered use of transform coding techniques and clustering algorithms for compressing the multispectral data in separate studies [3,4]. TRW Systems Group considered coding multispectral data using errorless entropy coding of this data for archiving applications [5]. Jet Propulsion Laboratory has considered using clustering algorithms for joint bandwidth compression and classification of multispectral images [6]. These studies have used different data bases. They leave out many types of efficient coding algorithms, and generally do not include the effect of important parameters such as channel noise and the sensor behavior on the performance of the individual coding algorithms. In addition due to use of different data bases and different criteria of optimality a meaningful comparison of various coding techniques is impossible. We consider a number of promising algorithms for on-board compression of the multispectral data. Included are Differential Pulse Code Modulator (DPCM) and transform coding systems as well as hybrid coding techniques. The performance of these systems are improved by concatenating these coding algorithms with variable rate (Huffman) encoders. In addition, we examine a cluster coding method which reduces the bandwidth of multi- spectral imagery by classifying the multispectral data into a fixed number of clusters. The clusters are approximated at the receiver using a classified image and the cluster averages. These algorithms are compared based on their performances on representative ERTS data. The experimental results are obtained by simulating the algorithms on digital computers. 
A number of criteria of optimality, such as mean square error, peak-to-peak signal-to-noise ratio, retention of classification accuracy and computational complexity, are used to compare these techniques.

Criteria of Bandwidth Comparison

Several criteria serve as a basis for comparison of alternative compression techniques. These include distortion, complexity, and susceptibility to channel and sensor effects.

Distortion

In order that data compression be possible, one or both of two conditions must be satisfied. The first condition is that the data have some structure or redundancy. By knowing something about the data structure, data compression can be accomplished without losing any of the information the data contains. Hence, data compression that relies solely on knowledge of data structure is called information preserving or error free data compression. The second condition which leads to compression is that the image user have interest in only limited accuracy or a particular aspect of the data collected. In this case, data which is of low value to the user can be discarded, resulting in data compression. Since information discarded is permanently lost, compression based on user interest is called information destroying data compression.

In order to determine whether a particular information destroying compression algorithm preserves sufficient information for a particular use, a criterion must be defined by which the information loss can be measured. Because of its mathematical convenience, a number of authors have used mean square error as a measure of distortion. We use this as one measure during our study. It has the advantages of being widely used and understood, of making analytical treatments of the problem tractable, and of leading to satisfactory results for many applications. On the negative side, it is a criterion not specific to particular uses of the data.

A criterion closely related to mean square error is the signal-to-noise ratio. Indeed, this criterion can be considered a normalized form of the MSE. Peak-to-peak signal to root mean square (RMS) value of the noise, as well as RMS signal-to-noise ratio, is widely used by the television industry as a measure of television signal quality.
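For concreteness, the two distortion measures just named can be written in a few lines of numpy. The 0-255 peak value is an assumption about the data, and the function names are ours.

```python
import numpy as np

def mse(original, reconstructed):
    """Mean square error between an original and a reconstructed image."""
    d = original.astype(float) - reconstructed.astype(float)
    return float(np.mean(d ** 2))

def peak_to_rms_snr_db(original, reconstructed, peak=255.0):
    """Peak-to-peak signal to RMS noise ratio in decibels, one normalized
    form of the MSE."""
    return 10.0 * np.log10(peak ** 2 / mse(original, reconstructed))
```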
Recognition Accuracy

Another criterion which is more application oriented is the algorithm's effect on classification accuracy. For example, if each raster point in the image is to be classified as corresponding to some crop or land use, based on the recorded brightnesses in the several spectral bands, the increase in classification errors resulting from performing classification on the reconstruction of the compressed data rather than on the original image is a measure of the relevant information lost due to compression. The increased probability of misclassification can be used as a distortion measure, but, unfortunately, the complexity of the classification procedure generally precludes calculation of the rate distortion function.

The problem of classification is equivalent to partitioning the space of possible measurement values. All measurement values falling within one element of the partition are viewed as representative of some category. For example, in the case of earth resources data, one might assume that all recorded brightness values within some range are representative of wheat.

Two basic approaches exist for determining a partition. In one method, a set of categories is specified. The space of measurement values is then broken into segments such that the values in a particular segment correspond with one of the prespecified categories. The segments are chosen in a way that in some sense is "good". In the other approach to determination of a partition, no categories are specified in advance. Instead the measurement values are examined to see if they cluster into groups. In general it is difficult to define what is meant by saying that certain values belong to one cluster and certain values belong to another cluster. A myriad of criteria have been proposed. Most of them yield different clusters, for some sets of measurement values, than a human would intuitively obtain.

Now we define what is meant by preservation of classification accuracy under a distortion of the measurement space. We assume that we have two measurement spaces, each having its own partition. In the first space are the undistorted measurements, x, and in the second are the distorted measurements. In order that classification accuracy be preserved, the two partitions should be such that if y is the distorted value corresponding with x, then the category which y represents should be the same as the category which x represents. Using more precise notation, we make the following definitions:

X: a space of measurements with elements x
P_X: a partition of X (P_X(x) is the category which x represents)
Y: a space of measurements with elements y
P_Y: a partition of Y
D: a mapping from X into Y

Then, if P_Y(D(x)) = P_X(x) for every x in X, we say that P_Y preserves the classification accuracy of P_X under the measurement distortion D. We can see in Figure 1 that class 1 and class 2 can be distinguished equally well in the space X and the space Y with the appropriate partitions. Under certain distortions, however, it may not be possible to preserve classification accuracy. It is always possible to find a partition which preserves classification accuracy if D is a 1-to-1 function since, in that case, values which can be distinguished in space X can be distinguished in space Y. If D is not a 1-to-1 function, it may still be possible to preserve classification accuracy. For example, a function which maps all values in one category in the original space into one point in the distorted space can still preserve classification accuracy as long as it maps values in different categories into different points.

We now fit the definition we have made into the context of earth resources data. For each point in the image we obtain a set of brightness measurements in several spectral bands. The space of measurement values in the various spectral bands can be represented by the space X in the definition above. In addition, further measurements can be obtained by taking into account the spatial variation of brightness in the neighborhood of each point. Such measurements are often called "texture" measurements. These measurements can also be included in the space X. Thus, for each point in the image we have a space X of measurement values. Using pattern classification, we can determine, based on the measurement space, what category (e.g., wheat, petroleum bearing land) is represented by a particular point in the measurement space.
Data compression which is classification accuracy preserving is then obtained in two ways. First, several points in the same category can be mapped into one point. Second, spatial redundancy of the measurement values can be exploited to achieve further data compression.

It was decided to use a clustering technique for classification for several reasons. First, a technique using prespecified categories would have required training samples to define the best partition; thus, ground truth information would be required for each prespecified category represented in the data. Second, each category (e.g., winter wheat) has a different spectral signature depending on sun angle, latitude, season, and so on. Therefore, ground truth data for the particular data sets processed would be required. Third, we wanted to use a data set with a number of different categories, including a variety of crops, water, desert, and mountains, so that any conclusions developed during the study would be relevant for a number of disciplines.

In the classification accuracy determination procedure, we calculate the centroids of the clusters. Unfortunately, calculation of the centroids is an iterative process which is quite time consuming. Hence, it is desirable to sample the available imagery and determine the centroids using a subset of the picture elements. For convenience we have chosen to use only elements belonging to every fourth row and column of the imagery in determining the centroids. Using the clustered subset we obtain the centroids of the clusters. By centroid is meant the following: each cluster has a particular mean value in every spectral band, and the centroid of a cluster is the set of mean values for that cluster. Figure 2 shows pictorially the meaning of centroid for the case of only two spectral bands.

The centroid finding program has a number of steps that can be summarized as follows. A particular number of clusters is selected. More or less arbitrarily, a set of initial centroid values is chosen. Each sample value is assigned to the category corresponding to the closest centroid. Then the centroids of the sample values belonging to each class are computed. Now the data samples are mapped into those categories having the closest of the new centroids. This entire process is repeated until very few of the sample values change category on one of the iterations. At that point, the so-called "Swain-Fu" distance between each pair of classes is measured. If the minimum distance between classes is less than some prescribed distance, the two closest classes are merged and the whole process is repeated with one fewer class. Otherwise the clustering algorithm terminates. A minimum number of clusters is also specified in order that merging not continue indefinitely.

After completing determination of the cluster centroids, the entire 256x256 image is classified. This is accomplished by determining to which centroid each sample is closest. The output of this program is an image in which the value corresponding with any pixel is the number of the cluster to which it belongs.
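A stripped-down version of the centroid-finding and classification procedure just described can be sketched as follows. The subsampling of every fourth row and column and the nearest-centroid assignment follow the text; the Swain-Fu distance test and class merging are omitted, and the initialization, iteration count and function names are our own choices.

```python
import numpy as np

def find_centroids(image, n_clusters, n_iter=20, seed=0):
    """Iterative centroid finding on a subsampled image of shape
    (rows, cols, bands): assign samples to the nearest centroid,
    recompute the centroids, and repeat."""
    samples = image[::4, ::4].reshape(-1, image.shape[2]).astype(float)
    rng = np.random.default_rng(seed)
    centroids = samples[rng.choice(len(samples), n_clusters, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = samples[labels == k].mean(axis=0)
    return centroids

def classify(image, centroids):
    """Assign every picture element of the full image to its nearest centroid,
    producing the clustered (classified) image."""
    pixels = image.reshape(-1, image.shape[2]).astype(float)
    d = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1).reshape(image.shape[:2])
```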
In order to compare the clusters obtained on the original imagery and on the reconstruction of the compressed imagery, we first determine the correspondence between clusters in the two sets of imagery. To illustrate what is meant by finding a correspondence between clusters, consider the case when cluster 1 in the original image represents water while in the reconstruction water is represented by cluster 3. In order to save time, this correspondence-determining program does not find the cluster correspondence which minimizes the overall difference in classification. Instead, it first minimizes the difference in classification for the subarea of pixels belonging to the cluster with the most elements, then minimizes the difference in classification for the subarea of pixels belonging to the cluster with the next greatest number of elements, and so on. Once the cluster correspondence is made, the percentage of picture elements classified differently in the original imagery and the reconstruction is determined. This number is used as the classification inaccuracy for the particular bandwidth compression method.
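The greedy correspondence and the resulting inaccuracy figure can be sketched directly from that description. The code below assumes the two classifications use the same number of clusters and integer labels; it is an illustration of the procedure, not the program used in the study.

```python
import numpy as np

def classification_inaccuracy(labels_orig, labels_recon, n_clusters):
    """Match cluster numberings greedily, largest original cluster first,
    then return the percentage of picture elements classified differently."""
    order = np.argsort([-(labels_orig == k).sum() for k in range(n_clusters)])
    available = set(range(n_clusters))
    mapping = {}
    for k in order:
        in_k = labels_recon[labels_orig == k]
        # pick the still-unused reconstruction cluster best matching cluster k
        best = max(available, key=lambda j: (in_k == j).sum())
        mapping[best] = k
        available.discard(best)
    remapped = np.vectorize(lambda j: mapping.get(j, j))(labels_recon)
    return 100.0 * np.mean(remapped != labels_orig)
```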
Implementation Complexity

Once it is shown that several compression techniques can significantly reduce the required data rate without introducing unacceptable distortion, it is necessary to compare the implementation complexity of the techniques. Compressor complexity is a particularly significant factor for spacecraft applications. Reconstruction complexity on the ground has less effect on overall system costs. To evaluate complexity, an algorithm was proposed for each technique. Then the required number of additions and multiplications was counted and the needed buffer sizes determined, based on typical block sizes and numbers of spectral bands.

Sensitivity to Sensor Effects

The properties of the sensors that gather multispectral earth resources data can be expected to affect the data structure and thus in turn affect compression performance. As a typical sensor to be used in the next decade for gathering multispectral earth resources data, we considered the thematic mapper projected for NASA's EOS. The thematic mapper is a multispectral scanner similar to that on LANDSAT (ERTS). It uses photodiodes as transducers and a moving mirror to scan lines perpendicular to the satellite's track. The sensor properties that can be expected to affect compression are nonuniformity and nonlinearity of the photodiodes, and geometric distortion and spectral misregistration due to the optics and scanning patterns. For a fixed mean square error, the compression obtainable with a particular compression technique can be related to the correlation coefficient of the process of which the data is a sample function. Based on the scanning geometry of a likely thematic mapper and on the instrument's present specifications, the variation in correlation coefficient over an entire image was predicted under the assumption that the data are samples of a wide-sense stationary process. These variations were in turn related to increased mean square error or, equivalently, to decreased signal-to-noise ratio.

Bandwidth Compression Methods for Multispectral Imagery

Multispectral data can be modeled by a stochastic process f(x,y,k) which is a function of three discrete variables x, y, and k; x and y refer to spatial variables, and k is the spectral variable. In theoretical studies of image bandwidth compression methods one often models the data as a stochastic process having statistics identical to those of the imagery, so that one can then evaluate the performance of various encoders on the process. This approach to the analysis of algorithms used for coding video data is often carried out by assuming that the stochastic model f(x,y,k) is Markov in all three variables x, y, and k. This would imply that the auto-covariance of the assumed stationary process f(x,y,k) is exponential in all three dimensions [3], i.e.,

    R(x,x',y,y',k,k') = exp(-a1|x-x'| - a2|y-y'| - a3|k-k'|)          (1)

This is a desirable assumption since it simplifies the analysis of the problem to a considerable degree. The results are valid for most naturally obtained video data that show ordinary movements in both spatial and temporal directions. Considering this model for MSS data, one proceeds with the correlation measurements which specify a1, a2, and a3 in (1). These measurements also verify the correctness and the accuracy of the above model. Measurements of the spatial correlation of the data reveal that the correlation of the data in the horizontal and the vertical directions is indeed exponential. However, the spectral correlation does not fall off exponentially moving across the bands. Indeed, various bands show similarities and differences which are totally different from the interframe behavior of other video data encountered in television or in reconnaissance flights. The following summarizes the statistics of the MSS LANDSAT imagery.

a) Large positive correlation between the red and the green bands.
b) Large positive correlation between the two infrared bands.
c) Small negative correlations between the red and the two infrared bands.

While these correlations apply on the average, they do not apply for many particular classes - water being one counterexample.

Transform Coding

MSS data is three-dimensional; it possesses spectral as well as spatial correlation. Three-dimensional transform coding methods take advantage of the correlation of the data in all three dimensions for bandwidth compression. In this study we use three-dimensional Fourier, Cosine, Hadamard and Slant transforms.

Delta Modulators and Predictive Coding

In the class of predictive coding methods only DPCM systems are considered. Delta modulators are not considered because for bit rates higher than one bit per picture element they require sampling the analog signal at a higher sampling rate. Besides, it has been shown that the performance of delta modulators and adaptive delta modulators is suboptimal to that of two-dimensional DPCM encoders for image data [8].

In addition to DPCM and transform coding techniques, hybrid encoders that combine transform coding systems with DPCM encoders have been shown to give good performance for both intra- and inter-frame coding of television signals [9-10]. These systems as applied to MSS data can be divided into three categories: 1) systems that perform a two-dimensional transform of each band of the MSS data and use a bank of DPCM encoders to process the transformed data across the individual bands; 2) systems that take a transform across the spectral bands and use a two-dimensional DPCM encoder to process the data in the spectrally transformed domain (a sketch of such a system follows this list); 3) systems that follow a transformation in the spectral domain of the MSS data by a second transform in the horizontal direction (scan direction) and a DPCM encoder in the vertical direction.
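To make category (2) concrete - this is the structure behind the recommended KL-2 Dimensional DPCM method discussed later - here is a compact numpy sketch of a spectral KL transform followed by a two-dimensional DPCM loop. The previous-element/previous-line predictor and the uniform quantizer step are illustrative placeholder choices, not the parameters used in the study.

```python
import numpy as np

def spectral_kl_transform(cube):
    """Karhunen-Loeve (principal component) transform across the spectral
    bands of an image cube with shape (rows, cols, bands)."""
    x = cube.reshape(-1, cube.shape[2]).astype(float)
    x -= x.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(x, rowvar=False))
    basis = vecs[:, ::-1]                    # largest-variance components first
    return (x @ basis).reshape(cube.shape), basis

def dpcm_2d(band, step=4.0):
    """Two-dimensional DPCM of one (transformed) band: predict each element
    from the reconstructed left and upper neighbours and quantize the
    prediction error with a uniform quantizer of the given step size."""
    rows, cols = band.shape
    recon = np.zeros((rows, cols))
    codes = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            left = recon[r, c - 1] if c else 0.0
            up = recon[r - 1, c] if r else 0.0
            pred = (left + up) / 2.0 if (r and c) else (left + up)
            codes[r, c] = int(round((band[r, c] - pred) / step))
            recon[r, c] = pred + codes[r, c] * step   # quantizer inside the loop
    return codes, recon
```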
The hybrid systems in category (1) are rejected because most multispectral data have a small number of bands, which does not allow the DPCM systems to reach a steady state. In addition, the small correlation of the spectral bands and the absence of Markovian characteristics limit the performance of the DPCM systems.

Cluster Coding of Multispectral Data

Many important uses of earth resources data rely heavily on computerized classification and pattern recognition of the multispectral earth resources imagery. For many classification applications multispectral data is first used to obtain a clustered image. This suggests an alternate approach to compressing the bandwidth of multispectral imagery, consisting of classifying the multispectral data on-board the space vehicle and transmitting only the clustered imagery. The problem with this approach is that there are many users who are interested in information contained in the individual spectral bands as well as the classified imagery. For this reason, in addition to the classified picture, which contains an arbitrary number of clusters, additional information should also be transmitted such that the individual bands of the original multispectral data can be reconstructed with an arbitrarily small level of degradation using the classified imagery and the additional information. Different approaches to this method of bandwidth compression have been discussed in recent literature [4,6]. In this study we discuss the cluster coding algorithm that uses the Swain-Fu distance in generating the various clusters.

Coding Methodology

Classification of multispectral data, discussed in a previous section, is a method of quantization in the spectral domain. That is, a number of picture elements in the measurement domain are represented by a single point in the four-dimensional space. This is illustrated in Figure 2 for a two-dimensional space. Thus, the combination of the clustered imagery and the centroids represents a quantized form of the multispectral data which can be used to reconstruct a quantized form of each individual band. Reduction of a multispectral image to an image of clusters and centroids for each cluster therefore corresponds to exploiting the spectral correlation of the multispectral imagery. The spatial correlation inherent in the multispectral imagery is preserved, to some extent, in the form of spatial correlation in the classified imagery. Naturally, in addition to the spatial correlation of the classified imagery, correlation of the centroids can be used for further compression of the bandwidth of the multispectral data.

The cluster coding algorithm discussed here uses the following steps for encoding and decoding the multispectral data.

1. A multispectral image is divided into blocks of 16 by 16 picture elements in each band.

2. Using all spectral components, each block is partitioned into a number of classes. A clustered image and centroids corresponding to each cluster are the output of this step in the coding operation. The number of clusters in each block can be fixed or it can be allowed to vary, generating differing numbers of clusters for each block. The latter corresponds to an adaptive bandwidth compression technique that operates at a variable bit rate, whereas the former can be made to operate at a fixed bit rate.
Even when the number of clusters is allowed to vary from one block to the other, the maximum and the minimum number of clusters in each block can be fixed. This sets a fixed upper bound and lower bound on the output bit rate, which can be used for a more effective control of the storage buffer that has to be used with any variable-rate encoder.

3. The receiver reconstructs each block of the multispectral image by generating the individual bands in each block from the clustered image and the corresponding centroids of those clusters. The procedure is to examine each point in the clustered image and specify to which class it belongs. Then the individual bands corresponding to the particular picture location are reconstructed by choosing their values equal to the centroid of that particular class.

The centroids are real numbers; thus, they must be quantized and encoded. We have used the rather inefficient PCM scheme for coding the centroids for the preliminary results reported here. The ij-th block in the clustered imagery can be coded using [log2 C_ij]+ binary digits per sample, where C_ij is the number of clusters in the block indexed by i, j and [log2 C_ij]+ is the smallest integer larger than log2 C_ij. Using [log2 C_ij]+ bits per sample for coding the ij-th block in the classified imagery is a rather inefficient use of binary digits, since it does not exploit any of the spatial correlation in the clustered imagery. Several methods for efficient coding of the clustered imagery exist. One is contour tracing combined with statistical coding algorithms such as Huffman encoding of directionals and gray levels. The other is the use of differential encoders combined with statistical coding algorithms. These modifications reduce the bit rate required for transmitting the clustered image and the centroids and thus correspond to a further bandwidth compression of the multispectral imagery. The average bit rate per sample per band, R_ij, for the ij-th block of imagery for the cluster coding algorithm, using PCM transmission of the centroids and the clustered imagery, is

    R_ij = [log2 C_ij]+ / B + P C_ij / N^2          (2)

where 2^P is the number of levels used in quantizing each centroid component, B is the number of bands and N^2 is the number of samples in each block.

Equation (2) shows that the bit rate is directly related to the number of clusters in each block and the number of levels used for quantization of the centroids. It is inversely related to the number of samples in each block and the number of spectral bands in the multispectral data. Since one has to allow for a large number of clusters in each block to recover the details of the multispectral imagery, the method becomes more efficient for a large number of spectral bands.
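The arithmetic of equation (2), as reconstructed above, is easy to check numerically. In the example below the 16 by 16 block size matches the paper, while the 8-bit centroid precision, the cluster count and the band count are sample assumptions; the whole computation follows our reconstructed form of the equation.

```python
from math import ceil, log2

def cluster_coding_rate(n_clusters, n_bands, block_side=16, centroid_bits=8):
    """Average bits per sample per band for one block: cluster-label bits
    shared across the bands plus PCM-coded centroid components spread
    over the block's samples."""
    label_bits = ceil(log2(n_clusters))        # [log2 C]+ bits per sample
    samples = block_side ** 2                  # N^2 samples per band
    return label_bits / n_bands + centroid_bits * n_clusters / samples

# Example: 16 clusters, 4 spectral bands, 16x16 block, 8-bit centroid components
print(round(cluster_coding_rate(16, 4), 3))    # -> 1.5 bits per sample per band
```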
The idea of dividing the multispectral imagery into blocks of N^2 samples and cluster coding each block is not essential to the coding algorithm. Indeed, if one is interested only in a classified image, using the whole image as one block will result in a very efficient encoder using this procedure. However, reconstructing details in the various bands then requires a large number of clusters, which increases the computational complexity of the system enormously. On the other hand, using relatively small block sizes, detailed information in the various spectral bands can be reconstructed rather accurately using a moderate number of classes.

Experimental Results

Simulation of the DPCM, transform, hybrid and clustering methods using the criteria discussed in previous sections leads to recommending the following three bandwidth compression techniques for multispectral imagery:

1. A hybrid method using the Karhunen-Loeve (KL) transform in the spectral domain followed by a two-dimensional DPCM encoder.

2. A hybrid method using the KL transform in the spectral domain followed by a cosine transform in the horizontal direction and a DPCM encoder in the vertical direction.

3. The cluster coding method.

In methods (1) and (2) the Haar transform is the most suitable deterministic replacement for the KL transform. In (2) the Hadamard or Slant transform can replace the cosine transform with almost unnoticeable effects on the performance of the technique. It was also found that eliminating the spectral transformation affects the performance of methods (1) and (2) by about 1.5 dB, based on the results using the LANDSAT imagery shown in Figure 3. In the following, the recommended techniques are compared using various criteria of optimality such as MSE, signal-to-noise ratio, recognition accuracy, and computational complexity. The performance of methods (1) and (2) is also evaluated in the presence of channel noise and in concatenation with a variable rate (Huffman) encoder, which further reduces the output bit rate without affecting the system performance.

Comparison of the Recommended Techniques Based on MSE and Signal-to-Noise Ratio

Figure 4 shows the performance of the three-dimensional hybrid encoder using KL-Cosine transforms with the DPCM encoder in comparison with the other two recommended bandwidth compression systems, using S/N as the criterion of optimality. One is the system that follows the Karhunen-Loeve transform in the spectral domain with a two-dimensional DPCM encoder; the other is the cluster coding technique. It is shown that the encoder using the KL transformation followed by a two-dimensional DPCM encoder performs significantly worse than the hybrid encoder (KL-Cosine-DPCM) at low bit rates. However, their performances are rather similar at higher bit rates (about 2 bits per sample per band). The performance of the cluster coding method is superior to that of the hybrid encoder at low bit rates but inferior at high bit rates; in this respect its behavior is completely opposite to that of the KL-2 Dimensional DPCM encoder.

Figure 4 also shows the performance of the hybrid (KL-Cosine-DPCM) and KL-2 Dimensional DPCM encoders in concatenation with a variable rate encoder. The variable rate encoder is the Huffman encoder, which reduces the output bit rate by assigning shorter code words to more probable output levels and longer code words to the less probable output levels. The effect of entropy coding on the performance of the KL-Cosine-DPCM encoder is less than its effect on the performance of the KL-2 Dimensional DPCM system. This is because in the KL-Cosine-DPCM system a bank of DPCM modulators is used, whereas the KL-2 Dimensional system uses a single DPCM modulator. Using a number of DPCM modulators increases the total number of output symbols, thus reducing the number of times each symbol occurs. This reduces the effectiveness of the variable-length encoder in reducing the output bit rate.
The end result is that the performance of the KL-2 Dimensional DPCM system improves by an average of 25%, whereas the performance of the KL-Cosine-DPCM system improves by about 10%.

Comparison of the Recommended Techniques Based on Recognition Accuracy

Figure 5 shows the performance of the three recommended bandwidth compression methods in preserving the classification accuracy of the reconstructed multispectral imagery. Using this criterion, the comparative performance of the KL-2 Dimensional DPCM system and the KL-Cosine-DPCM system is basically the same as it was when signal-to-noise ratio was used as the criterion of optimality. This shows a strong correlation between signal-to-noise ratio and classification accuracy.

Figure 5 also shows the performance of the cluster coding method. The classification accuracy decreases as the number of clusters in each block increases up to 16 clusters per block, and then increases sharply when 32 clusters per block are allowed. This behavior is due to the fact that the cluster coding algorithm applies a classification procedure to blocks of 16 by 16 samples, while classification accuracy is measured using a classification procedure on blocks of 256 by 256 samples. A classification accuracy of 100% results if the block size for measuring the classification accuracy and the block size in the cluster coding method are the same. The argument for using large block sizes in classifying multispectral data is not very solid; it depends very much on the users and applications of the multispectral imagery. Large block sizes are likely to be used in applications where one is interested only in a limited number of objects in a large segment of imagery, as in hydrology and urban planning. For such applications the imagery obtained using the cluster coding algorithm has rather poor classification accuracy compared with the other two recommended bandwidth compression algorithms.

Comparison of the Recommended Techniques Based on System Complexity

The implementational complexity of the selected bandwidth compression techniques is summarized in Table 1. The results are given in terms of the number of computations and the memory required per picture element. The table also shows typical numbers for 4-channel data and the block sizes used in simulating the compression methods. A detailed analysis of the complexity of the cluster coding method was not performed, but based on the computer time required for compressing the representative LANDSAT data with this system, compared with the computer time required to simulate the KL-2 Dimensional DPCM encoder, it was concluded that the cluster coding algorithm is about one order of magnitude more complicated.

Comparison of the Recommended Techniques Based on Channel Noise and Sensor Effects

Figure 6 shows the reduction in signal-to-noise ratio and classification accuracy caused by the introduction of a noisy channel. The results refer to a binary symmetric channel with bit-error rates ranging from 10^-4 to 10^-2. Both algorithms degrade significantly at the higher bit-error rates. The system using the KL-2 Dimensional encoder shows the greater degradation.
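The mechanism behind this degradation is the familiar error propagation of a DPCM decoder, which a one-dimensional sketch makes concrete. The code below assumes numpy, uses an illustrative leaky-predictor coefficient of 0.95, and omits the quantizer; none of these numbers are taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
x = 100.0 + np.cumsum(0.1 * rng.standard_normal(512))   # smooth synthetic scan line
a = 0.95                                                # assumed predictor coefficient

d = x - a * np.concatenate(([0.0], x[:-1]))             # DPCM differences (no quantizer)
d_rx = d.copy()
d_rx[200] += 25.0                                       # one corrupted difference (channel hit)

def dpcm_decode(diff, a):
    # y[n] = diff[n] + a * y[n-1]
    y = np.empty_like(diff)
    prev = 0.0
    for n, dn in enumerate(diff):
        prev = dn + a * prev
        y[n] = prev
    return y

err = dpcm_decode(d_rx, a) - dpcm_decode(d, a)
print(np.round(err[198:206], 2))   # the hit appears at n = 200 and decays roughly as a**(n - 200)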
Figure 7 shows the reconstructed form of band 3 of the LANDSAT data after bandwidth compression using the Haar-2 Dimensional DPCM encoder at bit-error rates of 10^-3 and 10^-4. The propagation of the channel error is clearly visible at a bit-error rate of 10^-3. However, it is less significant than the effect of channel error in a simple two-dimensional DPCM system. The reason is that the recommended system uses the two-dimensional DPCM encoder in the spatial domain; the signal at the receiver of the two-dimensional DPCM decoder is then transformed by the inverse Haar transform to reconstruct the multispectral data. The net effect of the inverse Haar transform at the receiver is to distribute the channel error, and its propagation, among all spectral bands. Although the total error is unchanged, spreading it among all spectral bands makes it less objectionable to human vision.

Sensor imperfections affect the correlation of the data, which in turn affects the performance of the bandwidth compression techniques. Analytical results show that this effect is minimal.

Conclusions

The following are the major conclusions of the study:

(1) Fixed-rate encoders give excellent quality in the reconstructed imagery at a 4 to 1 compression ratio. This compression ratio is increased to 6 to 1 by making the system operate at a variable rate through the addition of a Huffman encoder. At this compression ratio the Haar-2 Dimensional DPCM system has a signal-to-noise ratio better than 35 dB, a classification accuracy better than 91%, and gives reconstructed imagery that is almost indistinguishable from the originals.

(2) At a fixed bit rate and a compression ratio of 8 to 1, the Haar-Cosine-DPCM system gives acceptable results. This corresponds to a signal-to-noise ratio better than 31 dB, recognition accuracy better than 82%, and slightly degraded imagery.

(3) Cluster coding methods can be used to obtain a compression ratio of 12 to 1, which corresponds to a signal-to-noise ratio better than 29 dB, recognition accuracy greater than 71%, and possibly acceptable reconstructed imagery.

(4) Compression ratios of 16 to 1 and higher correspond to noticeably degraded imagery that is acceptable only to some users.

(5) The effect of sensor imperfection on the performance of the candidate bandwidth compression methods was minimal. Therefore, it had minimal impact on the choice of the recommended techniques.

(6) Channel error effects are minimal for the transform and DPCM systems at bit-error rates less than 10^-4.

References

[1] Definition of the Total Earth Resources System for the Shuttle Era, General Electric Space Division, Contract No. NAS9-13401, NASA Lyndon B. Johnson Space Center, Houston, Texas 77058.
[2] Multispectral Scanner Data Redundance Study, Final Report, May 15, 1971, Philco-Ford, Contract No. NAS5-21162, NASA/Goddard Space Flight Center, Greenbelt, Maryland.
[3] P. J. Ready, P. A. Wintz, "Information Extraction, SNR Improvement, and Data Compression in Multispectral Imagery", IEEE Transactions on Communications, Vol. 21, October 1973, pp. 1123-1131.
[4] J. N. Gupta, P. A. Wintz, "A Boundary Finding Algorithm and its Applications", IEEE Transactions on Circuits and Systems, Vol. CAS-22, No.
4, April 1975, pp. 351-362.
[5] C. L. May, D. J. Spencer, "Data Compression for Earth Resources Satellites", International Telemetering Conference, Los Angeles, September 1972.
[6] E. E. Hilbert, "Joint Classification and Data Compression of Multidimensional Information Sources, Application to ERTS", Conference Record, Vol. II, International Conference on Communications, June 1975, pp. 27(1-6).
[7] A. Habibi, "Two-Dimensional Bayesian Estimate of Images", Proceedings of the IEEE, Vol. 60, No. 7, July 1972, pp. 878-883.
[8] A. Habibi, "Delta Modulation and DPCM Coding of Color Signals", Proceedings of the International Telemetering Conference, Dec. 1972, pp. 363-373.
[9] A. Habibi, "Hybrid Coding of Pictorial Data", IEEE Trans. on Communications, Vol. COM-22, No. 5, May 1974, pp. 614-624.
[10] J. A. Roese, W. K. Pratt, G. S. Robinson, A. Habibi, "Interframe Transform Coding and Predictive Coding Methods", Conference Record of the International Conference on Communications, June 1975, pp. 23(17-21).

Figure 1. Partitioning Data in Two Classes Before and After Distorting the Data (Class 1, Class 2, and the lines defining partitions Px and Py).
Figure 2. Each Class Has a Centroid Equal to the Average Location of Data in that Class (brightness in Band 1 versus brightness in Band 2).
Figure 3. Original Bands.
Figure 4. Bit Rate Versus Signal-to-Noise Ratio for KL-2 Dim. DPCM, KL-Cosine-DPCM, and Cluster Coding.
Figure 5. Bit Rate Versus Classification Inaccuracy (cluster coding, KL-Cosine-DPCM, KL-2 Dim.-DPCM).
Figure 6. Peak-to-Peak Signal-to-Noise Ratio and Classification Inaccuracy (dashed line) Versus Bit Error Rate, for the Haar-2 Dim. DPCM and Haar-Cos-DPCM systems.
Figure 7. Band 3, 1 Bit, Haar-2 Dim. DPCM (P = 10^-4 and P = 10^-3).

Table 1. Computational Complexity of 3 Bandwidth Compression Methods

Method 1: 3-Dimensional Hadamard transform and block quantization.
Adds (or subtracts)/pixel: log2 B^2 S + T. Multiplies/pixel: 1 (typical).
Record storage: 16BSW bits (typical 3,300,000). Memory: 44B^2 S + 8000 bits (typical 54,000).

Method 2: 1-Dimensional Karhunen-Loeve transform on the spectral domain followed by 2-Dimensional DPCM.
Adds (or subtracts)/pixel: S + T + 3. Multiplies/pixel: S + 4.
Record storage: 16SW bits (typical 200,000). Memory: 16S^2 + 148S + 8000 bits (typical 17,077).

Method 3: Karhunen-Loeve transform on the spectral domain followed by a Hadamard transform in the scan direction (x-direction) with y-direction DPCM using block quantization.
Adds (or subtracts)/pixel: log2 B + S + T + 1 (typical 14). Multiplies/pixel: S + 2.
Record storage: 32SW bits (typical 400,000). Memory: 84BS + 32B + 32S(S + 1) + 8000 bits (typical 15,000).

Typical parameter values: B (block size) = 16; T (bits/pixel) = 1; S (spectral bands) = 4; W (samples in each scan) = 3200.

REAL TIME TELEVISION IMAGE BANDWIDTH REDUCTION USING CHARGE TRANSFER DEVICES

H. J. Whitehouse, R. W. Means, E. H. Wrench

Abstract

Recent advances in analog semiconductor technology have made possible the direct sensing and processing of television images. By combining a charge transfer device (CTD) imager and a CTD transversal filter, real-time image sensing and encoding have been achieved with low-power integrated circuits, so that digital transmission and bit-rate reduction are made possible using differential pulse code modulation (DPCM). Good mean-square-error performance and freedom from DPCM artifacts are possible in a hybrid intraframe image encoder. The hybrid transform encoder performs a discrete cosine transform (DCT) on each line of the television image as it is scanned. This compacts the variance into low-frequency coefficients, and the DPCM encodes the corresponding DCT coefficients between successive lines. Computer simulation of this hybrid coding technique has shown good performance on 256 x 256 pixel images at 0.5 bits/pixel and channel bit error rates of 10^-2. An experimental system using a low-resolution General Electric 100 x 100 charge injection device camera and a Texas Instruments bucket brigade transversal filter as part of the DCT processor has been constructed and provides good low-resolution image quality at 1 bit/pixel and bit error rates of 10^-3. A high-resolution vidicon-compatible system is also being constructed.

Introduction

Unitary transforms for image encoding have been used for intraframe encoding.(1) In addition, these techniques may also be applied to interframe and multispectral encoding. However, all unitary transformations are information preserving, and no bandwidth reduction results from the application of the transform to the image. Instead, the transforms redistribute the variance associated with each picture element (pixel) so that, subsequent to the transform, basis-restricting operations on the transform coefficients will result in bandwidth reduction. Upon reconstruction of the original image from the basis-restricted transform coefficients, a degraded version of the original image can be obtained. Unfortunately, the interrelationship between the type of transform, the form of the noninvertible operation, and the type of degradation in the reconstructed image is very complicated and subjective. The universally used analytic criterion of mean-square error is, at present, the best compromise technique for transform comparison. For the particular operation of basis restriction by truncation, a particularly simple interpretation of the bandwidth reduction can be made.
The transforms may be viewed as a variance-redistributing operation that approximately decorrelates the transform coefficients while transferring the variance associated with each picture element into the low-order coefficients of the transform. Under the assumption that each set of picture elements can be considered a sample function from a wide-sense stationary random process with correlation function r^|tau|, there exists an optimum discrete transformation, the Karhunen-Loeve transformation, which totally decorrelates the transform coefficients and maximally compacts the variance into the low-order coefficients. All other transformations can be compared in performance by comparing their transform-coefficient decorrelation and variance compaction with this optimum transformation. This intuitive interpretation can be made rigorous through the use of the rate-distortion criterion.(2, 3) It has been found from experience that the closer the eigenvectors of a transformation are to the eigenvectors of the optimum Karhunen-Loeve transformation, the more the variance is compacted and the more the coefficients can be truncated while maintaining a fixed rate distortion or mean-square error.

Transform Encoding

Karhunen-Loeve Transformation

If a continuous time function of zero mean and autocorrelation function R(tau) = e^(-alpha |tau|) is considered to be a sample function from a wide-sense stationary random process, then this time function can be explicitly expanded by the Karhunen-Loeve expansion(4) and the resulting coefficients will be uncorrelated. For a discrete function of zero mean and autocorrelation function R(tau) = r^|tau|, which may be considered a sample function from a first-order Markov process, a similar discrete Karhunen-Loeve transformation may be defined.(5) This transformation diagonalizes the covariance matrix and is optimal in the mean-square-error sense for a restricted set of basis functions that do not span the complete space. The discrete Karhunen-Loeve expansion is given by(5) for the case N = 2m as

G_k = sum_{n=1}^{2m} g_n [2 / (2m + lambda_n^2)]^(1/2) sin{ omega_n [k - (2m + 1)/2] + n pi/2 },   k = 1, 2, ..., 2m     (1)

where

lambda_n^2 = (1 - r^2) / (1 - 2r cos omega_n + r^2)     (2)

and the omega_n are the first N positive roots of

tan 2m omega = -(1 - r^2) sin omega / (cos omega - 2r + r^2 cos omega)     (3)

Since the discrete Karhunen-Loeve expansion involves both the solution of a transcendental equation and the evaluation of the autocorrelation function of the data to be transformed, real-time computation of this transform is quite complex. However, Habibi and Wintz(1) have shown that Karhunen-Loeve transformations calculated using approximate autocorrelation functions are satisfactory for many applications.

Discrete Fourier Transform

Since the discrete Fourier transform is asymptotic to the Karhunen-Loeve transformation(6) for the exponential covariance function, and its basis vectors are picture independent, the Fourier transform represents a logical choice for real-time implementation. The Fourier transform exists for all data lengths N and is defined by

G_k = sum_{n=0}^{N-1} g_n e^(-i 2 pi n k / N),   k = 0, 1, ..., N - 1     (4)

Discrete Cosine Transform

Two different types of discrete cosine transform (DCT) are useful for reduced-redundancy television image transmission.
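Before turning to the two forms of the DCT, note that the discrete Karhunen-Loeve basis defined by (1)-(3) can also be obtained numerically, without solving the transcendental equation, by eigendecomposition of the Markov covariance matrix. The sketch below assumes numpy; the block length and correlation value are illustrative only.

import numpy as np

N, r = 8, 0.95                               # illustrative block length and adjacent-sample correlation
n = np.arange(N)
R = r ** np.abs(n[:, None] - n[None, :])     # first-order Markov covariance, R[m, n] = r**|m - n|

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # sort basis vectors by decreasing variance
eigvals, Phi = eigvals[order], eigvecs[:, order]

g = np.random.default_rng(1).multivariate_normal(np.zeros(N), R, size=10000)
G = g @ Phi                                  # discrete K-L coefficients of each sample block
print(np.round(np.cov(G.T), 2))              # approximately diagonal: coefficients are decorrelated
print(np.round(eigvals / eigvals.sum(), 3))  # variance compacted into the low-order coefficients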
Both forms of the DCT are obtained by extending the length-N data block to have even symmetry, taking the discrete Fourier transform (DFT) of the extended data block, and saving N terms of the resulting DFT. Since the DFT of a real even sequence is a real even sequence, either DCT is its own inverse if a normalized DFT is used. The "Odd DCT" (ODCT) extends the length-N data block to length 2N - 1, with the middle point of the extended block as a center of even symmetry. The "Even DCT" (EDCT) extends the length-N data block to length 2N, with a center of even symmetry located between the two points nearest the middle. For example, the odd-length extension of the sequence A B C is C B A B C, and the even-length extension is C B A A B C. In both cases the symmetrization eliminates the jumps in the periodic extension of the data block that would occur if one edge of the data block had a high value and the other a low value; in effect, it performs a kind of smoothing with no loss of information. Note that the terms "odd" and "even" in ODCT and EDCT refer only to the length of the extended data block; in both cases the extended block has even symmetry.

Let the data sequence be g_0, g_1, ..., g_{N-1}. The ODCT of g is defined as

G_k = sum_{n=-(N-1)}^{N-1} g_n e^(-i 2 pi n k / (2N - 1)),   k = 0, 1, ..., N - 1     (5)

where

g_{-n} = g_n,   n = 0, 1, ..., N - 1.     (6)

By straightforward substitution it may be shown that

G_k = 2 Re sum_{n=0}^{N-1} g'_n e^(-i 2 pi n k / (2N - 1))     (7)

where g'_n is defined by

g'_0 = 0.5 g_0, and g'_n = g_n for n = 1, ..., N - 1.     (8)

The EDCT of g is defined by equation (9), where the extended sequence is defined by equation (10):

G_k = e^(-i pi k / (2N)) sum_{n=-N}^{N-1} g_n e^(-i 2 pi n k / (2N)),   k = 0, 1, ..., N - 1     (9)

g_{-1-n} = g_n,   n = 0, 1, ..., N - 1.     (10)

If the mutually complex-conjugate terms in equation (9) are combined, equation (11) results; equation (11) may be viewed as an alternate definition of the EDCT:

G_k = 2 Re{ e^(-i pi k / (2N)) sum_{n=0}^{N-1} g_n e^(-i 2 pi n k / (2N)) } = 2 sum_{n=0}^{N-1} g_n cos[ 2 pi (n + 0.5) k / (2N) ]     (11)

Ahmed(7) has investigated the use of the EDCT as a substitute for the Karhunen-Loeve transform and finds that it is superior to the Fourier transform and comparable to the Karhunen-Loeve (K-L) transform in rate-distortion performance, while retaining the computational simplicity of a transform that does not depend on the picture statistics. Habibi(8) has shown by simulation that the DCT is equivalent in a mean-square-error sense to the K-L transform under basis restriction.

Hybrid Transforms

Mixed transforms can be used either for intraframe encoding or for interframe encoding, depending on the available memory and the type of transforms implemented. For a mixed transform to be competitive with one of the conventional two-dimensional transforms, it must offer either superior performance or simplicity of implementation. Of the transforms examined, the odd-length cosine transform is competitive in performance, since it can be implemented as the real part of a CZT and since the transform samples are real and are samples of an autocorrelation function which may then be extrapolated by well-known techniques. In simulations of transform performance, the cosine transform has been shown to closely approximate the behavior of the Karhunen-Loeve transform.
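Returning briefly to the cosine transform itself, the EDCT of (11) is simple to verify numerically against the symmetric-extension definition in (9). The sketch below assumes numpy; the (-1)**k factor arises from re-indexing the length-2N extension so that a standard FFT can be used.

import numpy as np

def edct(g):
    # "Even DCT" per equation (11): G_k = 2 * sum_n g_n * cos(2*pi*(n + 0.5)*k / (2N))
    g = np.asarray(g, dtype=float)
    N = len(g)
    n = np.arange(N)
    k = np.arange(N)[:, None]
    return 2.0 * (g * np.cos(2.0 * np.pi * (n + 0.5) * k / (2.0 * N))).sum(axis=1)

g = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
N = len(g)
k = np.arange(N)

# Defining form (9): extend in C B A A B C fashion, take the length-2N DFT,
# apply the exp(-i*pi*k/2N) phase, and keep the first N (real) terms.
ext = np.concatenate((g[::-1], g))
F = np.fft.fft(ext)
check = (np.exp(-1j * np.pi * k / (2 * N)) * (-1.0) ** k * F[:N]).real

print(np.allclose(edct(g), check))   # True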
The benefits of mixed transformation implementation and minimum memory may be achieved for digital transmission by combining a one- or two-dimensional unitary transform with generalized DPCM in a hybrid system. The basis of operation of the hybrid transform is that the unitary transform decorrelates the image within its constraints of transform type, dimensionality, and block size, while the generalized DPCM removes the correlation between transform blocks. This hybrid system is particularly attractive for remote sensor application since it has been found that its performance is approximately as good as the Karhunen-Loeve transform and its implementation requires minimum memory.(8) Implementation Computational Modules .A linear transform on sampled data of finite extent may be viewed as the multiplication of a vector by a matrix. Multiplication by diagonal, circulant, or Toeplitz matrices may be accomplished rapidly with simple computational hardware modules. Multiplication by an N X N diagonal matrix requires only a scalar multiplier and a memory containing N values to provide serial access to the reference function. Multiplication by an N X N Toeplitz matrix corresponds to a convolution and may be performed using a transversal filter having 2N-1 taps. Multiplication by an N X N circulant is a special case of multiplication by N X N Toeplitz matrix in which the length of the transversal filter may be reduced to N taps if the data block is recirculated through the filter or reread into the filter from a buffer memory. One-Dimensional DFT Linear filters have been used for many years for the calculation of the power spectra of continuous signals. One of the earliest methods used a bank of wave filters to measure the spectra in fractional octave bands for telephone network equalization.(9) However, when increased resolution was required the number of filters rapidly become unmanageable. An alternative which overcame the difficulty of a large number of filters each with small time-bandwidth product was to substitute one linear fm (chirp) filter with large time-bandwidth product and to employ matched filtering. In this system the signal to be analyzed is used to single sideband (SSB) modulate a locally generated chirp signal and the composite modulated signal is filtered in a chirp delay line matched filter. Each component of the input signal spectrum shifts the locally generated chirp to a different position in the spectrum after SSB modulation and these shifted chirps then correlate with the reference signal represented as the impulse response of the matched filter at different times. Thus the output signal amplitude-time history reflects the amplitude-frequency composition of the input signal. Bleustein(I 0) recognized that the discrete Fourier transform (DFT) of sampled data was amenable to a similar interpretation. In addition to just calculating the magnitude of the Fourier transform, linear filters could calculate the phase and thus all of the operations such a cross convolution and a cross correlation could be calculated. 
This technique came to be called the chirp-Z transform (CZT) and can be applied to other problems besides the calculation of the DFT.(11) Prior to these developments, digital computation of the DFT had been significantly improved by the use of a special algorithm called the fast Fourier transform (FFT), described by Cooley and Tukey.(12) The FFT algorithm gained rapid popularity in signal processing since it allowed the DFT to be calculated using significantly fewer machine operations (multiplications) than direct evaluation. By direct inspection it is observed that, if symmetries of the function exp(j 2 pi n m / N) are not exploited, the number of complex multiplications required will be N^2, corresponding to N multiplications for each frequency component evaluated. Even on high-speed digital computers this can become the limiting consideration in signal processing applications. The advantage of the FFT algorithm is that, for highly composite values of the DFT size N, the number of multiplications is proportional to N log2 N.

Although the FFT has been successful in substantially reducing the computing time and cost of using general-purpose digital computers, it has several disadvantages for special-purpose real-time computation. At the high throughput rates required for real-time image processing, the processor must either operate log2 N times faster than the data rate or use pipeline structures with distributed memory and log2 N multipliers. In addition, the internal arithmetic of the FFT processor must be carried at increased precision to compensate for the round-off errors introduced by its successive stages. Although these difficulties can be overcome, it is not always possible to arrange the computation so that the size of the transform is highly composite. For these reasons, and because of the difficulty of obtaining small, low-power, fast analog-to-digital converters, linear transversal filter implementations of the chirp-Z transform are attractive,(13) rather than the earlier CZT implementation which used an FFT to perform the required convolution.

The DFT may easily be reduced to a form suitable for linear filtering by the substitution

2nm = n^2 - (n - m)^2 + m^2     (12)

which changes a product of variables into a difference, so that

G_m = e^(-j pi m^2 / N) sum_{n=0}^{N-1} [ g_n e^(-j pi n^2 / N) ] e^(j pi (n - m)^2 / N)     (13)

This form is seen to be equivalent to factoring the Fourier matrix F into the product of three matrices,

F = D T D     (14)

where D is a diagonal matrix with elements d_nn = exp(-j pi n^2 / N) and T is a Toeplitz matrix with elements t_nm = exp(j pi (n - m)^2 / N). The CZT algorithm is easily implemented by transversal filter techniques: the DFT is computed by premultiplication by a discrete chirp, convolution with a discrete chirp, and postmultiplication by a discrete chirp. Figure 1 shows this configuration. It must be remembered, however, that both the multiplications and the convolution are complex, and a suitable representation of the complex numbers must be used. One representation is by real and imaginary part. Figure 2 shows the DFT organized as a CZT and implemented with parallel computation of the real and imaginary parts.
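The decomposition (12)-(14) is easy to check numerically. The sketch below assumes numpy and carries out the chirp convolution directly, as a transversal filter would; it is a verification of the algebra, not a model of the hardware.

import numpy as np

def dft_via_czt(g):
    # DFT per (12)-(14): premultiply by a chirp, convolve with a chirp,
    # postmultiply by a chirp. The inner sum is exactly the Toeplitz
    # (transversal-filter) convolution with taps exp(j*pi*(n - m)**2 / N).
    g = np.asarray(g, dtype=complex)
    N = len(g)
    n = np.arange(N)
    chirp = np.exp(-1j * np.pi * n**2 / N)        # diagonal matrix D of (14)
    a = g * chirp                                 # premultiplied data
    G = np.empty(N, dtype=complex)
    for m in range(N):
        G[m] = chirp[m] * np.sum(a * np.exp(1j * np.pi * (n - m)**2 / N))
    return G

x = np.random.default_rng(2).standard_normal(16)
print(np.allclose(dft_via_czt(x), np.fft.fft(x)))   # True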
In Figure 2 the input signal is represented as g = g_R + j g_I and the output signal as G = G_R + j G_I, where it is understood that g = {g_n}, n = 0, ..., N - 1, and G = {G_n}, n = 0, ..., N - 1. To determine the specific form of the transversal filters it is necessary to know the specific value of N. When N is odd, the Toeplitz matrix T may be represented as a transversal filter with 2N - 1 complex taps h_{-(N-1)}, ..., h_{N-1}, where h_n = W^(-n^2/2), n = -(N - 1), ..., N - 1, and W = exp(-j 2 pi / N). The required convolution has been implemented with the general transversal filter shown in Figure 3. When N is even, it can be shown that T_{n,m} = T_{N+n,m}, where the subscripts are reduced mod N. Thus T is a circulant matrix and can be implemented with a recirculating transversal filter, as shown in Figure 4, where the number of complex taps is N and the tap weights are h_n = W^(-n^2/2), n = 0, ..., N - 1.

Charge Coupled Devices

CCDs are sampled-data analog circuits which can be fabricated by metal-oxide-semiconductor (MOS) technology as LSI components.(14) As such they are directly compatible with other MOS circuits. Current CCD transversal filters have operated as video devices with sample rates up to 5 MHz. CCDs operate by the manipulation of injected minority carriers in potential wells under MOS capacitors, and thus behave as capacitive reactances with low power dissipation. However, since the potential wells that contain the minority carriers also attract thermally generated minority carriers, there is a maximum storage time for the analog signal which depends on the dark current associated with the temperature of the silicon. Under normal conditions at room temperature, dark currents are tens of nA/cm^2 and storage times of hundreds of milliseconds can be achieved.

There are many ways in which unidirectional charge transfer can be achieved. The first developed was a three-phase clocking structure, illustrated in the transversal filter of Figure 5. The three-electrode CCD structure is planar, much like the SAW devices, and the direction of charge propagation is determined by the sequence of potentials applied to the three electrodes. Unfortunately, if the minority carriers are allowed to collect at the semiconductor-oxide boundary, poor charge-transfer efficiency results because minority carriers are caught in trapping sites. This means that the CCD will behave nonlinearly unless there is sufficient propagating charge present to fill all of the traps. By biasing the operating condition of the CCD so that about 10% of the dynamic range is used for the injection of a "fat zero," the traps are kept continuously filled and the device has over 60 dB of dynamic range. In practice, a video signal representing the signal to be processed is added to a fixed bias somewhat larger than one-half of the peak-to-peak value of the signal. Since the effective storage time of the device is long relative to the time required to execute a convolution, CCDs can be considered interruptible signal processors and as such are more compatible with the executive control required for signal processing.

Figure 1. Chirp-Z Transform Implementation of the DFT.
Figure 2. DFT Via CZT Algorithm with Parallel Implementation of Complex Arithmetic.
Figure 3. Transversal Filter.
Figure 4. Circular Convolution.
Figure 5. Schematic of the Sampling, Weighting, and Summing Operation.

A 64-point CCD filter with discrete cosine transform sine and cosine chirps is shown in Figure 6. This chip was developed by Texas Instruments for the Naval Undersea Center for image processing. In addition to the four DCT filters, four DFT filters, a Hilbert transform, and other experimental signal-processing functions were also implemented. Current research in CCDs is directed toward improving the charge-transfer efficiency and removing the requirement of continuous "fat zero" charge injection by ion-implantation techniques which keep the minority carriers away from the semiconductor-oxide boundary. Ion implantation is also being used to provide asymmetric potential wells so that simpler two-phase clocking can be employed. Currently available CCDs have 500 stages with 0.9999 transfer efficiency, and devices with up to 2000 stages are planned.

Another charge transfer device similar to the CCD is the bucket brigade device (BBD). This is a sequence of MOS transistors coupled together by diffusion-enhanced Miller capacitance. Although these devices do not operate at frequencies as high as CCDs, they have better low-frequency performance since they include active devices. A CZT has been implemented with two BBD chips. Two 200-tap filters are implemented on each chip: one a discrete cosine and the other a discrete sine filter. The BBD chip is shown in Figure 7. The complex chirp used in the premultiplier and a typical input and output are shown in Figure 8. The input is an offset cosine wave and the output shows a d.c. component plus a response at the cosine-wave frequency. These filters can operate at 100 kHz and have tap accuracies better than 1%. With careful control of geometry, both BBD and CCD filters with tap accuracies approaching 0.1% should be possible. This chip was also developed by Texas Instruments for the Naval Undersea Center.

Figure 6. 64-Point CCD Filters.
Figure 7. Bucket Brigade Chirp Filters.
Figure 8. BBD Performance: impulse response, premultiplier chirp, input to DCT, and DCT output.

Discrete Cosine Transform

Let the data sequence be g_0, g_1, ..., g_{N-1}. The ODCT of g is defined in (5) as

G_k = sum_{n=-(N-1)}^{N-1} g_n e^(-j 2 pi n k / (2N - 1)),   k = 0, 1, ..., N - 1     (5)

The identity (15) may be used to obtain the CZT form of the ODCT shown in equation (16):

2nk = n^2 + k^2 - (n - k)^2     (15)

G_k = 2 Re{ e^(-j pi k^2 / (2N - 1)) sum_{n=0}^{N-1} g_n e^(-j pi n^2 / (2N - 1)) e^(j pi (n - k)^2 / (2N - 1)) }     (16)

The block diagram of the ODCT is shown in Figure 9. Since only real inputs and outputs are required, a simplified implementation is possible and is shown in Figure 10. A corresponding implementation may be found for the EDCT, which is defined by equation (9) with the extended sequence defined by equation (10):

G_k = e^(-j pi k / (2N)) sum_{n=-N}^{N-1} g_n e^(-j 2 pi n k / (2N)),   k = 0, 1, ..., N - 1     (9)

g_{-1-n} = g_n,   n = 0, 1, ..., N - 1     (10)

Now x_1, x_2 and x_3 are transform coefficients as transmitted, that is, elements of the coded array C, whereas the estimates x̂_1, x̂_2 and x̂_3 represent estimates, from surrounding data, of the original coefficients in C before shifting and thresholding. If the x̂_k are indeed valid estimates of the true coefficients, then operating on them with the same shift and threshold operators S and T should produce the same set of transmitted numbers x'_k. We next make use of this principle to guide a decision to use the x̂_k or some modification of them. In order to make more direct use of equations developed earlier we work with the scaled coefficients X_k = 4x_k and X̂_k = 4x̂_k (41), where we recall that x'_k is simply the stored integer form of x_k (see (15), (17) and Fig. 6). Making use of (16) and (20), the final form of any estimate x̂_k must satisfy the condition in (42): after the same shift and threshold operations are applied, it must reproduce the transmitted integers x'_k. Here we will treat only the case of primary interest, where x'_1 = x'_2 = x'_3 = 0. We define Delta as the largest magnitude attained by the shifted estimates (43), and a corresponding scale factor alpha (44). If Delta is below the preset threshold, we choose the final form of the two-by-two array to be the unmodified estimate; otherwise we obtain this output by applying the inverse two-by-two Hadamard transform to the estimates scaled by alpha, where alpha is recognized as a scaling factor which makes alpha x̂_k satisfy (42).

The condition in (42) is much more restrictive when some of the transmitted coefficients are non-zero. As a result, the algorithm presently simulated will tend to accept the array unmodified a large percentage of the time; it basically leaves edges alone. Observe that when the array lies on the border of W_0(l - 1), the surrounding data E_{i,j} may still be available. Provided the particular array does not lie on the border of a picture, the necessary E_{i,j} can be obtained from the corresponding transform level of adjacent 64 by 64 data arrays. Present simulations have shown that this eliminates noticeable transitions between adjacent data arrays even at very low data rates.

Comments. The adaptive inverse algorithm described above has not been shown to be optimum in any fashion.
It is however both practical and effective in improving subjective fidelity and in reducing mean-square-error. It is felt that both simulations and analysis could lead to more sophisticated adaptive versions with worthwhile additional performance benefits. RM2 IMPLEMENTATION Based upon preliminary implementation studies, it was estimated that 300 to 400 CMOS integrated circuits would be required to implement the RM2 compression algorithminthe form described in this report. Suc ha design would accommodate an 80 KBS input data rate. More restrictive applica- tions could reduce this substantially although the 300-400 chips is considered quite practical. Con- tinuing advances in solid state technology promise to further minimize the chip count. It is of interest to note that the transform implementation is almost trivial since it requires no more than an iterative application of a two [6] by two fast Hadamard transform. The rate allocation algorithms could ideally be performed by microcomputers, a likely element in future flight data systems. This was not assumed in the estimates however. Much higher speeds could be accommodated by using a more parallel struc- ture than the one assumed. SIMULATIONS A study was recently initiated to investigate the applicability of AICS to advanced Pioneer missions which incorporate imaging. Sample images have been selected for this purpose by imaging scientists to be representative of expected mission targets, and via processing, to exhibit characteristics similar to those of a planned Pioneer CCD camera. We will present some preliminary pictorial results. Mercury Closeup One of the sample images to be used in this study is a 512 by 768 closeup of Mercury derived from the recent Mariner flybys. A standard stretched display of this scene, including histograms, is given in Fig. 10. A more reveal- ing display is shown in Fig. 11. Here a two dimen- sional filter which tends to equalize average bright- ness throughout the scene was applied. This permits a more severe contrast stretch to be used, enhancing visual detail on film. These operations are representative of the block labeled "image pro- cessing" in the overall AICS system diagram in Fig. 2. RM2 Compressed Examples of images compressed by RM2 at rates of 2. 0, 1.33, 1. 0, 0.66 and 0.5 bits/pixel (compression factors of 4, 6, 8, 12 and 16) are shown in Figs. 12-16 respectively. The same critical display technique of Fig. 11 is used. Except for the rate; all internal parameters of RM2 were identical in each case. These are noted in Table 1. The reader may refer to the appropriate section for further details. In addition, a simple black space detection algorithm was employed (see (4) and (5)) to illustrate the idea. Table 1. Parameters for Simulation Examples Parameters Used for Where Picture Loop Rate Control Fig. 3 Gain = 1.0 Within Imaging Sequence Array Loop Rate Control Fig. 4 Gain = 0. 35 Within an Image e(m), m= 1, Rate Allocation Equa. 2, ? ? ? 8 to picture arrays 6, 7 0. 0,0. 5, 1. 0, 1. 5, 1. 9, 2. 2, 2. 5, 2.8 t n aMIN= 0.35 Rate Allocation Equa. to picture arrays 6-10 Initial shifts Obtaining Array Equa. 81(0) = sz(0) Activity; starting 31-34 = 2, s3(0)= 1 point for finding gi. pi = 0.2, = 0 3 32 ' ' Frequency Emphasis Equa. 29-32 133 = O. 5 \j1 = ?12 = V3 Adaptive Inverse Equa. 38 = 0. 6 Estimation Parameters SPIE Vol. 
Fig. 10. Stretched Pioneer Test Frame (test frame 2, Mercury, 2 km/pixel, reseau removed; stretched histogram shown).
Fig. 11. Mercury Closeup Original, Filtered.
Fig. 12. RM2 Coded to 2.0 bits/pixel (4:1).
Fig. 13. RM2 Coded to 1.33 bits/pixel (6:1).
Fig. 14. RM2 Coded to 1.0 bits/pixel (8:1).
Fig. 15. RM2 Coded to 0.66 bits/pixel (12:1).
Fig. 16. RM2 Coded to 0.5 bits/pixel (16:1).

Rate distribution. The top and bottom of the Mercury picture were separately treated as a sequence of two 256 by 768 images. It is of interest to look at the initial rate allocation to each of the 48 arrays making up the top half as a result of the algorithm described in (5)-(9). This demonstrates one aspect of the adaptive nature of RM2. The distribution of array activities derived from (32) is shown in Fig. 17. Choosing the image rate of 1.33 bits/pixel as an example (Fig. 13), the application of the algorithm in (5)-(9) yields the initial rate allocations, in bits/pixel, to each of these arrays shown in Fig. 18. Observe that the two arrays in the lower right-hand corner have been allocated a rate of zero bits/pixel. These have actually been edited (see (5)). No real data has been lost, as can be noted from Figs. 10 and 11. Without the editing they would have been assigned a rate of around 0.5 bits/pixel, the same as the low-activity arrays immediately above. The rate assignments to other areas reflect their relative activity, with one extremely active array receiving over 3 bits/pixel.

Control loop error. At the completion of the top-half picture the rate error (e1 in Fig.
3) was 1125 bits or 0.006 bits/pixel. All these "unused bits" were used to adjust the rate of the bottom half picture (i. e. , a picture loop gain of 1). At all other compression rates lei was quite similar. Rate/fidelity. Recall that because of the con- catenated Reed-Solomon/Viterbi Channel in Fig. 2 any rate/fidelity tradeoffs provided by RM2, as in the above sequence of images, represent true end-to-end communication system tradeoffs. Further, the user could specify any intermediate : CIA-RDP80600829A001100060001-0 rate and expect a smooth monotonic relationship between the rate specified and the fidelity obtained. These considerations and a preliminary assess- ment of RM2 performance from these examples and other simulation results gives AICS in Fig. 2 an effective average information rate capability of 3 to 5 times that of the Jupiter/Saturn Communication System in Fig. 1. Similar comparisons with an uncoded communication system, such as that of the recent Mariner Venus /Mercury flybys, would yield an AICS information rate advantage of up to 10 dB. Amore extensive evaluation of RM2 when applied to planetary imaging data will come from the Pioneer study. Applications of RM2 to other problems is being investigated and initial results have been excellent. In addition to tailor- ing the simulation version of RM2 described here to specific applications, further improvements in performance can be expected from extensions and modifications to the algorithm. Particular emphasis here should be placed on expanding the rate allocation concepts and improving the adap- tive inverse. Also important is the specification of simpler versions of RM2 for more restrictive applications in which being extremely adaptive is not important. REFERENCES 1. Rice, R. F., "Channel coding and data compression system considerations for efficient communication of planetary imaging data, " Technical Memorandum 33-695. Jet Propulsion Laboratory, Pasadena, California, June 15, 1974. , 2. J. P. Odenwalder et al., "Hybrid coding sys- tems study," submitted to NASA Ames Res. Ctr. by Linkabit Corp., San Diego, Calif., Final Rep. , ContractNAS2-6722, Sept. 1972. 0.47 0.43 0.54 0.55 0.50 0.42 0.55 0.69 0.95 0.47 0.31 0.03 0.49 0.53 0.49 0.34 0.28 0.33 0.43 0.63 0.63 0.44 0.25 0.07 0.42 0.46 0.60 0.44 0.37 0.41 0.38 0.66 0.62 0.47 0.25 0.02 0.46 0.40 0.37 0.43 0.43 0.51 0.61 0.81 0.40 0.42 0.22 0.02 Fig. 17. Array (64 by 64) Activities for Top Half Mercury Closeup. 88 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 1. 38 1. 29 1. 57 1. 61 1. 48 1.25 1.60 2.06 3. 11 1. 38 1. 00 0. 50 1.43 1.55 1.43 1.06 0.931 1.05 1.27 1.88 1.87 1.30 0.88 0.56 1. 24 1. 36 1.75 1.31 1. 13 1.22 1. 16 1.95 1.84 1.39 0.86 0.00 1.36 1.22 1. 13 1.29 1.29 1.50 1.79 2.52 1. 19 1.25 0.82 O. 00 Fig. 18. Initial Rate Allocations for Average Rate of 1. 33 bits/pixel. 3. J. P. Odenwalder, "Concatenated Reed- Solornon/Viterbi channel coding for advanced planetary missions: analysis, simulations and tests, " submitted to Jet Propulsion Laboratory by Linkabit Corp., San Diego, Calif., Final Pep.., Contract 953866, Dec. 1, 1974. 4. J. A. Heller and I. M. Jacobs, "Viterbi decoding for satellite and space communica- tion," IEEE Trans. Commun. Technol. , Vol. COM-19, part II, Oct. 1971, pp. 835-848. 5. "Deep Space Network/Flight Project Interface Design Handbook," JPL Document 810-5, Rev. C. 
Jet Propulsion Laboratory, Pasadena, Calif., April 15, 1972 (JPL Internal Document). 6. R. F. Rice, "RMZ: transform operations, 1' Technical Memorandum 33-680. Jet Propulsion Laboratory, Pasadena, California, March 1, 1974. 7. R. F. Rice and J. R. Plaunt, "Adaptive variable length coding for efficient compression of space- craft television data, "IEEE Trans. Commun. Technol. , Vol. COM-19, part I, Dec. 1971, pp. 889-897. SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information / 89 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 ADVANCED FACSIMILE SYSTEM AND NETWORK APPLICATIONS John W. Watkins Harris Corporation, Electronic Systems Division Melbourne, Florida 32901 Introduction Laser, acousto-optics and dry silver material technologies were designed into an advanced facsimile system to achieve high recording speeds, high copy resolution, and chemical-free operation. The basic system was designed so that its per- formance specifications can be modified over a wide range with minor changes required to enable a broad spectrum of applications to be addressed. The technologies that were used and the design features of the facsimile system are reviewed in this report, including a description of the unique optics bench that was designed to provide electronic control of its performance parameters. These unique features, along with the modular design of the system, provide the versatility required to satisfy a wide range of network applications. Laserfax Design Objectives Facsimile and hard copy display applications have been found to require a relatively wide range of operational and functional specifications. While it would be difficult and expensive, if not impossible, to develop a single system that would satisfy all requirements, an approach has been devised by which a significant segment of the applications can be addressed through a functionally designed system whose specifications can be modified significantly without requiring a major redesign. It was first determined that although the facsimile system to be developed would be capable of transmitting and repro- ducing printed documents, its principal capability would be to satisfy those applications that require quality photographic reproductions. Therefore, the system trade-off for a specific application is usually between the desired spatial resolution in the output copy and the transmission time over a predefined data link. The dynamic range or the number of gray shades in the reproduced photograph is normally not a significant trade-off parameter. Thus, the next constraint that was placed upon the design was to require that the resolution of the system be easily modifiable with the document scanning rate being treated as the dependent variable. With new telephone, microwave and CATV data links becoming more available, it was also desirable for the overall speed of the system to be changeable while causing a minimum impact to the basic system design. In addition to the system versatility described above, the equipment would have many convenient operational features such as, copy reproduction on a dry recording material, low cost per copy, daylight loading of the recording material, ease of operation with no operator required at the receiver, low maintenance, high reliability, easy access to all sub- assemblies for service, and others. 
System Configuration and Operation The drawings presented in Figures 1 and 2 show cross-sectional outlines of the major subassemblies in the facsimile transmitter and receiver, respectively. In the transmitter, the document or photograph to be transmitted is placed in the input chute. Upon command, the capstan drive translates the document past the scanning laser beam that is projected from the optics bench and that is brought to focus on the document as depicted in the figure. With the intensity of the laser beam being held constant from the optics bench, the light that is scattered by the document and that falls onto the detector is therefore proportional to the reflectivity of the photograph or document being scanned. This video signal is then ampli- fied and conditioned for transmission over the data link that has been selected for the particular application. In the receiver, unexposed dry silver paper (3M Type 7771) is fed from the supply cassette around the capstan drive and past the exposure station where the modulated laser beam from the optics beam is brought to focus. After exposure, the paper progresses up through the paper feed mechanism towards the heat processor. When the transmission is complete, the paper is brought forward until the exposed paper passes the cutter bar where it is then cut. The unexposed paper is then retracted to the exposure station and the exposed sheet progresses through the heat processor where it is developed and delivered out the exit chute into the stacking tray. The major subassemblies in the receiver are the capstan drive, the paper supply cassette, the heat processor, the optics bench and the electronics assembly; and in the transmitter, they are the capstan drive, the optics bench and the electronics 90 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Imagery Transmission System (ITS) The Harris ITS is a high speed, photographic quality, secure,digital facsimile, switched network which is currently installed and can provide full duplex connections for up to 60 subscribers. 0 Secure 0 Automatic Unattended Receiver Operation 0 Digital T-1 Rates (1.544 Mb/s) 0 Automatic Opaque/Transparency Scan One Minute Transmission D 285 Scan Lines Per Inch, 6-Bit Quantization 0 Full Duplex, Simultaneous Voice/Data 0 Dry Silver Output 0 TEMPEST Qualified 0 Switch-Selectable Contrast Enhancement 0 Multi-Terminal Broadcast Connections 0 Loopback Fault Diagnosis Approved For iwg For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP801300829A001100060001-0 REMOTE SITE REMOTE TERMINAL SUBSYSTEM CENTRAL SITE REMOTE TERMINAL SUBSYSTEM REMOTE TERMINAL SUBSYSTEM CFE COMSEC 44i_Ho AND LINK-- 1.21...IBSYSTEM CFE COMSEC AND LINK SUBSYST=A USTOMER FURNISHED OUIPMENT REMOTE TERMINAL SUBSYSTEM REMOTE SITE The central site consists of a computer-controlled nonblocking space-division solid-state switch capable of providing point-to-point and broadcast connections on prearranged subscriber lists. The ITS terminal consists of a Harris facsimile trans- mitter/receiver pair and a voice-control unit which are connected in a network arrangement to provide full-duplex simultaneous voice and imagery data over secure T-1 digital lines. The terminal operates at 285 scan lines per inch and 6-bit pixel quantiza- tion to provide extremely high quality photographic images on 3M Dry Silver paper. 
Transmission time is one minute over 1.544 Mb/s (T-1) data links conditioned through standard COMSEC interface HARRIS ILILP equipment. Various terminal modes are available including unattended receiver operation. A status panel provides the operator with continuous status display of all terminals on the network including in-service, out-of-service and busy conditions. The facsimile transmitter is capable of automatically detecting and scanning either opaque paper copies or transparent film copies. Up to four object enhancement curves with PROM density transfer functions are provided to modify the imagery for shadow stretch or glint enhancement. Several loop- back modes are designed to ease maintenance by quickly isolating system and terminal faults. HARRIS CORPORATION Government Systems Group Communications Systems PO Box 37 Melbourne, FL 32901 (305)727-431 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Imagery Transmission System (ITS) The Harris ITS is a high speed, photographic quality, secure,digital facsimile, switched network which is currently installed and can provide full duplex connections for up to 60 subscribers. D Secure 0 Automatic Unattended Receiver Operation 0 Digital T-1 Rates (1.544 Mb/s) LII Automatic Opaque/Transparency Scan 0 One Minute Transmission 285 Scan Lines Per Inch, 6-Bit Quantization 0 Full Duplex, Simultaneous Voice/Data 0 Dry Silver Output II TEMPEST Qualified 0 Switch-Selectable Contrast Enhancement 0 Multi-Terminal Broadcast Connections 0 Loopback Fault Diagnosis Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 REMOTE SITE ? REMOTE TERMINAL SUBSYSTEM I"CF E COMM' AND LINK"? LaBSYSTEM CENTRAL SITE REMOTE TERMINAL SUBSYSTEM 4N. '\ UP TO 60 RTS I REMOTE SITE REMOTE TERMINAL. SUBSYSTEM CFE COMSEC1 AND LINK 10. SUBSYSTEI2j CENTRAL SITE SUBSYSTEM A oFFE COMSEC 4 AND LINK LS......HUBSYSTEM OPM111?1?011111111 REMOTE SITE REMOTE TERMINAL SUBSYSTEM UP TO 60 RTS V CFE COMSEC AND LINK SUBSYSTLIV..1 / NETWORK NOTE: CFE - CUSTOMER FURNISHED EQUIPMENT REMOTE TERMINAL SUBSYSTEM REMOTE SITE The central site consists of a computer-controlled nonblocking space-division solid-state switch capable of providing point-to-point and broadcast connections on prearranged subscriber lists. The ITS terminal consists of a Harris facsimile trans- mitter/receiver pair and a voice-control unit which are connected in a network arrangement to provide full-duplex simultaneous voice and imagery data over secure T-1 digital lines. The terminal operates at 285 scan lines per inch and 6-bit pixel quantiza- tion to provide extremely high quality photographic images on 3M Dry Silver paper. Transmission time is one minute over 1.544 Mb/s (T-1) data links conditioned through standard COMSEC interface LILO HARRIS equipment. Various terminal modes are available including unattended receiver operation. A status panel provides the operator with continuous status display of all terminals on the network including in-service, out-of-service and busy conditions. The facsimile transmitter is capable of automatically detecting and scanning either opaque paper copies or transparent film copies. Up to four object enhancement curves with PROM density transfer functions are provided to modify the imagery for shadow stretch or glint enhancement. Several loop- back modes are designed to ease maintenance by quickly isolating system and terminal faults. 
[Fig. 1. Facsimile transmitter system configuration. Labeled elements: input chute, drive capstan, read detectors, laser, drive motor, optical bench, exit chute, focal point.]

[Fig. 2. Facsimile receiver system configuration. Labeled elements: paper supply, cutter, paper feed mechanism, heat processor, drive capstan, laser, optical bench, electronics assembly, exit chute, focal point.]

Each of these subassemblies has many unique features that could be discussed at length. However, in keeping with the purpose of this report, only those features that help clarify the system approach, operation and versatility will be discussed here.

The paper speed in both the transmitter and receiver can be modified over a wide range by replacing either the drive motors or part of the drive assembly. In many cases, the desired speed can be achieved by simply changing a drive belt and pulley, so the paper speed appropriate to a particular application can readily be obtained. For applications that require data rates significantly higher than can be transmitted over conditioned telephone lines, a broadband single-element silicon photodetector is used. The mechanical mounts for both detectors are compatible with the assembly.

The electronics subassembly is functionally designed, with unique functions isolated to individual circuit cards. This permits a specific function to be modified by interchanging the corresponding card. The entire electronics assembly, including the motherboard, can be replaced with another assembly to satisfy applications with significantly different system specifications.

Although the optics bench was designed so that it can be modified to satisfy a wide range of applications, it possesses a few unique features that enable many requirements to be satisfied with little or no modification. The key to the versatility of this facsimile system lies with the optical system design; therefore, it will be discussed in more detail throughout the remainder of this paper.

Optical System Description

The basic optical system is the same for both the facsimile transmitter and receiver. It is a straightforward coherent optical system consisting of a 2 mW helium-neon laser, an acousto-optical modulator, focusing lenses and a galvanometer. This configuration is shown in the conceptual diagram presented in Figure 3 and in the optical system schematic presented in Figure 4. A long photodetector is mounted on the outside of the optics bench in the transmitter to detect the light that is scattered off the document being transmitted. This basic optical system was designed to scan and reproduce photographic documents with a spatial resolution, across an 11-inch or smaller flatbed scan, of up to about 300 TV lines per inch within a transmission time of about 1 minute or longer (8.5- by 11-inch document). With 3M Type 7771 dry silver paper as the output medium, the system is capable of reproducing up to about 12 shades of gray (IEEE facsimile chart) in the output copy. As will be discussed later, the spatial resolution can be substantially increased above 300 lines/inch by making relatively inexpensive modifications to the optical system.
The transmission time can also be reduced for many applications by utilizing a 6 mW laser that is mechanically interchangeable with the 2 mW laser. More expensive coatings can also be used on the optical elements to further improve the overall efficiency of the optical system, and the light intensity at the paper can be increased by using a more efficient AOM than is used in the basic system.

[Fig. 3. Laserfax conceptual optical system diagram. Elements shown: 2 mW He-Ne laser, acousto-optic modulator, focusing lens, galvanometer with rotating mirror, exposing station, paper roll, heat processor.]

[Fig. 4. Laserfax optical system schematic (typical parameters): 0.9 mm beam at 1/e2 expanded to 4.6 mm, acousto-optic modulator, galvanometer deflection of 7.2 mradians, galvanometer 593 mm and lens L2 639 mm from the paper.]

Refer to the optical system schematic shown in Figure 4. The laser beam from the 2 mW laser is projected through the acousto-optical modulator (AOM) and brought to focus on the optical axis by lens L1. This focused spot is reimaged by lens L2 onto the paper after the beam has been reflected by a mirror that is attached to the shaft of the galvanometer. Stationary mirrors that are used to fold the optical path into a rigid and compact optics bench are not shown in Figure 4. A beam splitter located between the AOM and lens L1 is also not shown in this figure; its function is to reflect part of the laser beam onto a small photodetector to provide a feedback loop to the AOM that is used to normalize the output power of the laser.

In the mechanical package of the optics bench, lens L2 is hard-mounted a distance of 639 mm from the paper and the galvanometer is located a distance of 593 mm from the paper. Lens L1 is attached to an axially adjustable mount for two reasons. First, this provides a focus adjustment in the system after the bench has been assembled. The second, and most significant, reason is that lenses with different focal lengths can be mounted in this location to provide a wide range of spot sizes at the paper. As will be discussed later, this is a significant feature where a high resolution system is to be achieved. The diameter of the laser beam to the left of lens L2 is normally about 0.9 mm (1/e2 points). In the system shown in Figure 4 the beam has been expanded to 4.6 mm by the lens combination. The diameter of the focused spot at the paper is therefore d0 = 4λf/(πD) = 0.112 mm = 0.0044 inch, where λ is the He-Ne wavelength, f is the lens-to-paper distance (639 mm) and D is the expanded beam diameter (4.6 mm).
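The expression above is the standard Gaussian-beam (1/e2) spot-size relation; as a quick check — a sketch, not taken from the paper — plugging the He-Ne wavelength and the Figure 4 geometry into it reproduces the quoted 0.112 mm. Variable names below are illustrative.

```python
import math

# Assumed inputs (Figure 4 parameters quoted in the text).
wavelength_mm = 632.8e-6   # He-Ne laser wavelength, 632.8 nm expressed in mm
beam_diameter_mm = 4.6     # expanded 1/e^2 beam diameter at lens L2
focal_distance_mm = 639.0  # lens L2 to paper distance

# Gaussian-beam focused spot diameter (1/e^2): d0 = 4*lambda*f / (pi*D).
spot_mm = 4.0 * wavelength_mm * focal_distance_mm / (math.pi * beam_diameter_mm)

print(f"focused spot diameter = {spot_mm:.3f} mm = {spot_mm / 25.4:.4f} inch")
# -> approximately 0.112 mm = 0.0044 inch, matching the value in the paper
```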
It is clear that if this spot is used to reproduce photographs at a scan line density of 100 to 200 lines/inch, a significant raster structure would be present in the output copy. To prevent the raster structure and to provide added versatility to the optical system, the focused laser spot at the paper is deflected perpendicular to the galvanometer scan direction at a very high rate. This high rate vertical scanning is referred to as spot dither and is achieved with the AOM. The drive signal to the AOM is amplitude modulated with the video signal in the receiver but its amplitude is held constant in the transmitter. It is also frequency modulated in both the transmitter and the receiver to achieve spot dither at the paper.

Figure 5 shows the effective spot profiles in both the along-scan and across-scan dimensions, with dither providing spot elongation in the vertical dimension. The parameters presented in the figure are for a typical system with a scan line density of 110 lines/inch. The line height can be decreased by reducing the frequency deviation of the FM signal to the AOM, and the line density can be increased either by driving the paper at a slower speed or by increasing the scan rate of the galvanometer. Hence, the resolution of the facsimile system is now electronically controlled. The speed of the galvanometer is directly proportional to the video signal bandwidth, and the speed of the paper determines the transmission time. These electronically controlled performance parameters are the trade-offs that provide the versatility required to address a wide range of applications with this basic facsimile system.

[Fig. 5. Typical spot profile for Laserfax system: along-scan and across-scan profiles plotted against distance in thousandths of an inch, with the line height indicated on the across-scan profile.]

AN OPERATIONAL VIDEO DATA COMPRESSION SYSTEM FOR ATS AND ITOS

R. L. Kutz, NASA Goddard Space Flight Center, Greenbelt, Maryland
L. D. Davisson, University of Southern California, Los Angeles, California

Abstract

An operational data compression system has been developed and implemented for transmission of digitized ATS and ITOS-VHRR satellite video data over the wideband communication link between the Wallops Island, Va. Command and Data Acquisition Station and the National Environmental Satellite Service at Suitland, Md. This system uses minicomputers for the coding and decoding of the data to achieve maximum flexibility, together with specially designed interface equipment for greater efficiency. No loss in data quality occurs due to the compression, and, in certain cases, data is transmitted which would otherwise be unavailable due to the limited channel capacity. This paper describes the method of compression, the equipment used, and the compression results attained.

Summary

A real-time programmable data compression system has been developed for operation over a digital wideband communication link between the National Environmental Satellite Service (NESS) Wallops Island, Virginia, command and data acquisition station and the NESS Data Processing and Analysis Division at Suitland, Maryland. The NESS is part of the National Oceanic and Atmospheric Administration (NOAA) of the Department of Commerce. The source data consist of satellite-derived images of the earth from which environmental data are extracted by NESS and others. Four spacecraft have been used to provide the source data. The first imagery data used came from the Applications Technology Satellite (ATS) 1 or 3 Spin-Scan Camera, and the imagery currently being employed comes from the NOAA 2 or 3 spacecraft Very High Resolution Radiometer (VHRR). The imagery data are received at Wallops Island, Virginia, compressed, transmitted over a 300 km wideband ground link to Suitland, Maryland and reconstructed in real-time.
Using general purpose minicomputers together with specially designed input/output channels, a reduction in channel time- bandwidth requirements of two-to-one can be attained with zero loss of data quality with respect to the quantized uncom- pressed data. As far as the final user is concerned, the system is invisible. Because of the tremendous flexibility of the minicomputers, it is possible to vary the compression algorithm and to perform auxiliary tasks such as data for- matting. The compression system has seen continuous use 18 hours a day from December 1972 through July 1974. Introduction For many applications, technology has created the ability to generate ever increasing amounts of data. With the generation of this data comes the increased cost of transmission, storage, and processing in proportion to the greater channel capacity required. Data compression is a highly promising technique which can be applied to reduce the increased capacity requirements through more efficient coding of source data. Such techniques have been studied for the last ten years but as yet have had little impact on actual operations due, at least in part, to the increased hardware complexity, lack of appreciation by operational people, and reluctance on the part of users to allow their data to be subjected to any more manipulation than is absolutely necessary. The time has come, however, when economic con- siderations necessitate the careful review and evaluation of techniques that suggest a more cost effective use of the channel storage and computer capacity, e.g. data compression. This paper describes one important application to an operational situation. Large quantities of earth-observational data are now generated and it is possible to apply sophisticated processing to that data for many purposes. Principal among these data are meteorological observations for the many aspects of weather prediction with their obvious economic benefits. The use of satellites to acquire images of the earth on an operational basis has dramatically increased the quantity and quality of data involved in weather prediction. The NOAA image data are transmitted from two Command and Data Acquisition (CDA) stations to a central location for computer processing. No processing capability exists at the CDA stations. The particular link used for the data compression system to be described is a microwave link between the CDA station at Wallops Island, Virginia and the Data Process- ing and Analysis Division of NOAA/NESS at Suitland, Maryland. This link has been used operationally since 1969 for digital ATS spin scan image data without data compression. The recent deployment of the Synchronous Meteorological Satellite (SMS) involves the transmission of higher resolution data than currently installed NOAA/NESS ground trans- mission equipment can handle. It is apparent that the channel capacity needed to transmit meteorological data will con- tinue to increase as the multiple SMS and TIROS-N spacecraft go into operation. 94 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80B00829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 With the required large channel capacity, data compression provides a method for significant economic savings in transmission costs. Basically, data compression in this application is based on the fact that data from TV pictures of the earth have a significant amount of redundancy from one resolution element to the next. 
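As a concrete illustration of this element-to-element redundancy, the sketch below (synthetic data, not from the paper) compares the zero-order entropy of raw 6-bit samples with that of their sample-to-sample differences; for smooth scenes the difference entropy is far lower, and that gap is the margin a compressor can exploit.

```python
import numpy as np

def entropy_bits(values):
    """Zero-order entropy (bits per sample) of a discrete sequence."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Synthetic stand-in for one 6-bit scan line (4096 samples): a smooth profile
# plus mild noise, quantized to 64 levels.
rng = np.random.default_rng(0)
line = np.clip(np.round(32 + 20 * np.sin(np.linspace(0, 3, 4096))
                        + rng.normal(0, 1, 4096)), 0, 63).astype(int)

diffs = np.diff(line)
print(f"raw-sample entropy : {entropy_bits(line):.2f} bits")
print(f"difference entropy : {entropy_bits(diffs):.2f} bits")
```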
Nevertheless, since it is not known, a priori, where the redundancy will exist within the picture, the data transmission system without data compression is generally designed as if no redundancy exists. In effect the system transmits each resolution element independently. Data compression techniques involve the processing of the picture data to remove redundancy prior to transmission so that a reduced quantity of data is transmitted. Upon reception of the compressed data, processing reinserts the redundancy to reproduce the original image data. Because of the potentiality of data compression, a feasibility study was initiated by the University of Southern California in participation with and under contract to NASA Goddard Space Flight Center with the cooperation of NESS. Theoretical studies complemented by an empirical analysis of taped digital data verified that data compression could be applied at a significant economic savings to ATS video data. After the NOAA-2 spacecraft launch in October of 1972, the ATS derived data compression algorithm was applied to the NOAA-2 Very High Resolution Radiometer (VHRR) data. A performance comparison verified that the ATS data compression technique could also be used with VHRR data to pro- duce a significant economic savings. A more detailed description of the data, the compression techniques, and the equipment is contained in the succeeding sections. Data Source Two types of spacecraft have been used to provide the source data. The first type is the ATS (1 or 3) spacecraft in geostationary orbit ? height 35,780 km, period 24 hours ? and the second is the NOAA (2 or 3) spacecraft in sun- synchronous polar orbit ? height 1460 km, period 115 minutes. The ATS 1&3 spacecraft are spin stabilized with a spin rate of 100 rpm and with the spacecraft spin axis parallel to the earth's spin axis. The ATS data used is from the ATS spin scan camera which sweeps one image line per space- craft revolution. The ATS spin scan camera image line period is determined by the spin period of the spacecraft, nominally 600 milliseconds. To permit the full earth disc to be scanned from pole to pole, a worm drive moves the image field of view via the scan optics so that the optical axis stays in the plane containing the spacecraft spin axis. The two mile image field of view is moved southward two miles for each revolution of the ATS spacecraft. A full earth image takes 24 minutes to scan 2400 image lines. Since the ATS 1&3 spacecraft are in synchronous orbit, the earth subtends an angle of 18 degrees or 1/20th of a revolution. The video data are derived from a photomultiplier tube in the spin scan camera which points at the earth for only 30 msec. out of each 600 msec. scan period. The data are transmitted in analog form to the CDA station where it is digitized for transmission to the central data processing loca- tion. A digital synchronizing circuit uses a spacecraft derived sun pulse to produce a signal to indicate when the A/D converter should begin to operate for the 30 msec. of useful data out of each 600 msec. spin period. The sampling rate and digital resolution depends upon the user requirements and the transmission capability. Nominal characteristics are 4096 samples per line each quantized to one of 64 levels and represented by a six bit binary word. The communications link is a 48 khz wideband group with Western Electric 303C modems operated asynchron- ously without a scrambler. 
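A back-of-the-envelope sketch (illustrative, not from the paper) using only the figures quoted above — 600 ms spin period, 30 ms of earth view per spin, 4096 six-bit samples per line, 2400 lines — which also recovers the 24-minute full-disc scan time from the spin period.

```python
spin_period_s = 0.600        # one image line per spacecraft revolution
earth_view_s = 0.030         # photomultiplier sees the earth ~30 ms per spin
samples_per_line = 4096
bits_per_sample = 6
lines_per_image = 2400

bits_per_line = samples_per_line * bits_per_sample          # 24,576 bits per line
burst_rate_bps = bits_per_line / earth_view_s               # rate during the 30 ms view
average_rate_bps = bits_per_line / spin_period_s            # rate averaged over a spin
image_bits = bits_per_line * lines_per_image                # bits per full earth disc
image_time_min = lines_per_image * spin_period_s / 60.0     # minutes per full disc

print(f"burst rate during earth view : {burst_rate_bps / 1e3:.0f} kb/s")
print(f"average rate per spin        : {average_rate_bps / 1e3:.1f} kb/s")
print(f"bits per full-disc image     : {image_bits / 1e6:.1f} Mbit")
print(f"full-disc image time         : {image_time_min:.0f} minutes")
```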
The NOAA spacecraft are 3-axis stabilized so that the scanning radiometers always point earthward. The space- craft rotates about its pitch axis once per orbit with the pitch attitude being nominally maintained to within ?0.5 degrees. The NOAA spacecraft data used with the data compression equipment come from a scanning radiometer called the Very High Resolution Radiometer (VHRR). The two-channel scanning VHRR instrument is sensitive to energy in the visible spectrum (0.6 to 0.7 microns) and infrared window (10.5 to 12.5 microns). Within the VHRR instrument, energy is gathered by a 12.7 cm eliptical scan mirror, set at an angle of 45 degrees to the scan axis, and a telescope. The scan mirror rotates at 400 rpm which causes the half nautical mile image field of view to sweep the earth in a direction per- pendicular to the NOAA spacecraft ground track once every 150 msec. One image scan line is generated per revolution of the VHRR mirror. The consecutive scan lines of the VHRR data can be used to produce an image due to the space- craft motion along the orbital track which causes the subsatellite point to move one half nautical mile per 150 msec. ? one scan line period. The VHRR instrument produces a continuous stream of scan line image data without frame seg- mentation. Since the NOAA spacecraft are in a low altitude orbit compared to ATS 1&3 the earth is viewed for one third of each VHRR scan or about 50 msec. The VHRR data are handled in two ways. First, the VHRR data are continuously transmitted throughout every orbit. This mode of operation is known as the High Resolution Picture Transmission (HRPT) and at the ground station the data are designated HRPT data. Second, in addition to the HRPT mode of opera- tion up to 8.5 minutes of data may be recorded onboard the spacecraft for later playback when commanded during a CDA contact. The spacecraft recorded VHRR instrument data are designated VREC data. The VREC data are played back from the reel-to-reel spacecraft recorder by driving the tape in the reverse direction and at the same speed as that used for recording. At the CDA station the VREC data differ from tie HRPT data in that the time axis is reversed, i.e. the VREC data are backwards with respect to the HRPT data. Tape speed variations affect both timing and amplitude of the playback VREC data (FM recording is used), however the timing error is much more noticeable in the displayed image without compensation. Image compensation is provided for by recording a reference tone along with the VREC instrument data on the spacecraft recorder. During playback the reference tone is transmitted on a separate subcarrier to permit its use on the ground in compensating for tape recorder flutter and wow (F&W). The space-to-ground NOAA- 2&3 communications link uses FM-FM modulation. At the CDA station the NOAA spacecraft data are demodulated to produce FM modulated HRPT data, VREC data, and the VREC F&W reference signal. These three signals plus a CDA crystal controlled reference tone are recorded on an instrumentation ground recorder in the 'direct mode' at 60 inches SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information / 95 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80B00829A001100060001-0 per second (ips). All of the NOAA-2&3 data processed by the data compression system is obtained from the CDA instrumentation recorder operating at one fourth the record speed or 15 ips. 
The VREC F&W signal passes through two record/playback steps and the HRPT F&W signal through one step since it is generated locally at the CDA station. No attempt is made to correct the amplitude variations introduced in the FM modulated HRPT & VREC data signals due to the F&W of the analog tape recorders. However the F&W signals are used to control the sampling rate of the analog to digital (A/D) converter and this corrects for timing variations. The A/D converter is part of the data com- pression equipment and is preceded by an FM discriminator, figure 1. ATS NOAA S/C RECEIVER SYNCH. A/D ATS 7.--04 A/D HRPT VREC RECEIVER RECORD 60 IPS -00 BUFFER INPUT CHANNEL MINNOMMIMIN????111111. 303C TO 7"."1MODEM SUITLAND 11 INTERDATA 4 L OUTPUT CHANNEL ASR 33 9TK - 800 CPI TELETYPE MAGNETIC TAPE PLAYBACK 15IPS ANALOG TAPE Li FM DEMODULATOR Figure 1. Wallops Island Data Compression System Implementation Requirements and Constraints In the case of the ATS data compression experiment an operational digital transmission system had been installed in 1969 and was in daily use between the Wallops CDA station and the Suitland NESS central data handling facility when the data compression experiment was conceived. At Suitland the ATS data is used to produce hundreds of wind vectors over the oceans on a 400 kin grid to be used as one type of input to the primary NOAA numerical weather model. The ATS data is also converted to an analog facsimile format which is transmitted over 4 khz phone tines with C5 condition- ing to remote weather analysis centers e.g. the hurricane watch center in Miami, Florida and the tornado watch center in Kansas City, Kansas. In an on-line weather data transmission system, particularly where the data is being used to monitor severe storms, reliability is very important. For this reason, it was necessary to test the data compression system in an off-line mode before it could be used to transmit real-time data in an operational mode. Since the non- data compression operational ATS Data handling system between Wallops and Suitland handles digital data quantized to six bits per sample in a gray code, it is necessary to deliver reconstructed data from a data compression system in the same format. NESS also requested that the data compression system not be allowed to introduce errors in the samples due to the compression algorithm. The data compression system is required to be transparent to the receiving terminal at Suitland and to the data user. To provide for ease of testing and rapid recovery in the event of a malfunction, the data compression system is designed to be inserted or removed from the transmission system by the operation of a single switch at each end of the wideband communications link. The operation of the data compression system is designed to be simple to insure that operator errors are minimized. Several versions of the software interface between the operator and the data compression hardware were required to achieve a satisfactory level of operation. The NOAA spacecraft data can be effectively handled by preprocessing to provide an improved line synchronization, calibration, and gray scale correction. The image display device used with the NOAA spacecraft data is the same as that used for the ATS data and all users on the ATS facsimile circuit can receive NOAA 2&3 VHRR images. 
Since it is desired to produce a 1:1 aspect ratio in the center of the VHRR images and the line to line spacing is fixed by the worm drive in the photofax display, it is necessary to select a sampling rate which allows 3/4 of the earth view to be dis- played. This is a satisfactory compromise since the geometric distortion of the image is significant beyond the area displayed, due to the low 1460 km orbit of the NOAA spacecraft. The photofax display available for the VHRR data is the same one used to display ATS image data and operates at 100 rpm. At the CDA station the VHRR data are received at a 400 line per minute rate and recorded on a Mincom instrumentation recorder at 60 ips. It would have been prefer- able to digitize the VERB data at the CDA station as it was received from the spacecraft, however the ATS data com- pression hardware cannot handle the high data rates necessary. A high speed large volume digital data buffer would be required and funds were not available for its procurement. The analog Mincom recorder is used to replace the digital buffer and to slow down the VHRR line rate from 400 to 100 lines per minute by playing back the tape at 15 ips. Two types of operations are required for the VHRR data handling; the first type transforms the VHRR data into 96 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80B00829A001100060001-0 FROM WALLOPS ISLAND Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 303C MODEM BIT SYNCH. 0 io Crt'( INPUT CHANNEL PICTURE PROCESSOR DIGITAL DATA TO NESS COMPUTER ANALOG VIDEO DATA TO KANSAS CITY & MIAMI INTERDATA/4 MINICOMPUTER ASR 33 TELETYPE OUTPUT CHANNEL 9TK -800 CPI DIGITAL TAPE Figure 2. Suitland Data Reconstruction System a format compatible with the ATS digital display and data handling equipment. This has the advantage of requiring only half of the processing equipment ? the Wallops end of the compression system. The second type of operation required is that digital VHRR data be quantized to 8 bits and transmitted to Suitland. An additional requirement of the second type of operation is that 100% of the VHRR scan be reconstructed at Suitland while only 50% was needed in the first type of operation. Since the use of data compression could accomplish the data volume objectives of the second type operation, it was further required that no error be introduced in the reconstructed data due to the compression algorithm in the absence of channel errors. Data Compression Techniques The goal set for the data compression system is to attain a reduction in transmission rate of two to one with six bit ATS data or eight bit VHRR data and not introduce error due to the compression algorithm. The algorithm used has been chosen to meet the real-time requirements for error free compression within the limitations of the Interdata 4 minicomputer used at each end of the compression system. The data compression algorithm can be varied within wide limits because of the use of general purpose minicomputers to do the encoding and decoding. Presently the algorithm forms successive sample-to-sample differences and encodes the differences of up to a maximum magnitude of 9 for six bit data or 14 for eight bit data using a variable length code. When the difference exceeds the maximum magnitude, the sample value itself is sent with an appropriate prefix to distinguish this from a difference code word. 
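A minimal sketch of the sample-difference scheme just described, with an escape to the raw sample value when a difference exceeds the maximum coded magnitude (9 for six-bit data, 14 for eight-bit data). The tokens here are symbolic stand-ins; the actual bit-level code assignment, described next, used Shannon-Fano codes built from empirical difference statistics.

```python
def encode_line(samples, max_diff=14):
    """Difference-plus-escape encoding of one scan line into symbolic tokens.

    Emits ('DIFF', d) when the sample-to-sample difference is codable and
    ('ESCAPE', sample) otherwise, mirroring the scheme described in the text.
    """
    tokens = []
    previous = 0                       # reference reset to zero, as after a sync word
    for sample in samples:
        diff = sample - previous
        if abs(diff) <= max_diff:
            tokens.append(('DIFF', diff))
        else:
            tokens.append(('ESCAPE', sample))
        previous = sample              # receiver tracks the same value, so no drift
    return tokens

def decode_line(tokens):
    samples, previous = [], 0
    for kind, value in tokens:
        previous = previous + value if kind == 'DIFF' else value
        samples.append(previous)
    return samples

line = [0, 3, 5, 5, 60, 61, 59, 10]
assert decode_line(encode_line(line)) == line
```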
The code words are chosen using the Shannon-Fano encoding procedure with empirical frequencies for the relative probabilities. In particular, a difference greater than 14 is encoded by sending a prefix of four zeros followed by the sample from the data. The above procedure was found to result in a code whose efficiency is greater than 90%. The code words are characterized by a prefix whose leading zeros identify each code word's length and decoding table. When redundancy is removed, the data quality becomes more sensitive to channel error. To combat this effect, controlled redundancy is added in the form of synchronization words. In addition to the usual line synch., data synch. words are periodically inserted in the encoded data after each fixed number of encoded samples. Following each synch. word the difference from zero is sent rather than the sample difference to eliminate any error accumulated at the re- ceiving terminal. Each scan line starts with a 32-bit line synchronization word, which establishes both line and word synchronization. The 9096 data samples in each scan line are next encoded into 16 blocks with each block represent- ing 256 data samples. At the end of each block is a 16-bit data synchronization word which ensures that word syn- chronization is maintained and that word synch, can be reacquired within 256 data samples in the event of a synch. loss. Due to the variable length coding, the number of bits between two successive synchronization words varies. Although the data compression encoding procedure selected leads to a data format where word synchronization depends on a low probability of channel error, experience has shown that the data compression system has a more robust syn- chronization characteristic than an image transmission system in which there is only one line synchronization word on a scan line and in which word synchronization is obtained from a fixed word length data format. The variable block length and hence variable line length data format produces a variable number of bits per image scan line; the number of bits depends on the redundancy in the scene at the scan line location. In figure 3, the image scan line number runs from 0 to 2000 on the abscissa, andthe ordinate indicates the per- centage of non-redundant data from the original uncompressed line remaining to be transmitted after data compres- sion. The image data to which figure 3 corresponds is an ATS spin-scan cloud cover image of the full earth disc enclosed by a black space-view surround. Figure 3 shows that more data must be transmitted in the middle of the picture than at either end, since the intersection of the scan line with the earth disc has its maximum chord length near line number 1000. However, not all of the data compression in each line is due to the constant amplitude space-view, since space-view accounts for only 10 percent of the uncompressed data near line 1000 and yet the compression ratio exceeds two-to-one in this area of figure 3. The area above the curve of figure 3 represents the increased capacity of the data compression communication system as a function of the ATS image scan line number. SPIE Vot 66 (1975) Efficient Transmission of Pictorial Information / 97 Approved For Release 2001/09/05: CIA-RDP80600829A001100060001-0 100 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 1000 COMPRESSED LINE NUMBER Figure 3. 
% Data Remaining After Data Compression on a Line-by-Line Basis 2000 ENTER MODE (0 - 2CT XV X V TAPE OUTPUT ATS HRPT 2 VREC 0 1 2 3 4 5 6 7 FBI BA OFF OFF RECORD COMP. RECORD GRAY RECORD 8811 RECORD 8 BIT RECORD 8811 READ 8 BIT HEAD 8 BIT GRAY READ OR COMP. RECORD COMP. RECORD GRAY RECORD QUANTITATIVE COMPRESSED GRAY OFF OFF OF F GRAY COMPRESSED GRAY COMPRESSED GRAY COMPRESSED COMPRESSED GRAY OFF Figure 4. Wallops Commands Since data compression can produce short term increases in data quantity, the shorter of the compressed image line and the original image line without compression can be transmitted with a special code used to indicate which. In spite of the obvious varying data activity from one pole of the earth to the other in figure 3, the compressed line length remains relatively constant suggesting that the system can be quite effective with only a one line buffer. This is in fact true except for noisy scan lines which are usually of little interest to the user. Compression System Hardware The Wallops Island data compression equipment configuration is shown in block diagram form in figure 1. The front end of the system accepts analog data from one of three sources ? ATS, HRPT, and VREC, of which the latter two are derived from the NOAA-2&3 spacecraft VHRR instrument. The analog data are sampled, quantized to eight bits and stored in the Interdata-4 (I4) core memory through a direct memory access channel. The A/D sampling rate is controlled from the 14 by loading a register with the desired sampling rate control word. Between the A/D converter and the 14 memory is the A/D and direct memory access control logic. The sampling rate of the A/D con- verter may be phase locked to the F&W signal from the CDA ground recorder or a crystal controlled oscillator. The input direct-memory-access (DMA) logic may be triggered to begin storing A/D samples in memory. An externally derived line synchronization pulse is used for this purpose. There are four DMA channels in the data compression system, two at Wallops Island and two at Suitland. At Wallops and Suitland one DMA channel is used for input and one is used for output. A DMA channel has a 16 bit register which may be loaded by an instruction from the 14. This register is called the Pointer Register (PR). The PR must be loaded with the 14 memory address of the DMA channel instructions. The DMA channel instruction list contains three types of information, the start and end of an 14 memory data I/O buffer, the address of the next instruc- tion list, and DMA channel control bits to indicate the action of the channel when an I/O buffer data transfer is com- plete. The starting and final address registers of the DMA channel are loaded by the DMA control logic from the 14 core memory instruction list once the 'DMA START' instruction is executed by the 14. Once started the DMA channel requires no further action by the 14 CPU and thus, leaves the CPU free to compress data, etc. The DMA channel can generate interrupts when each I/0 buffer is complete. This permits the 14 computerprogram to maintain syn- chronization with the I/0 to be processed. It is a simple matter with this type of DMA channel to generate interrupts at the end of several regions of an input scan line and initiate different types of processing algorithms for each region. This procedure is used to control the timing of the data compression/reconstruction software. 
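A schematic (hypothetical) model of the chained DMA instruction lists described above: each entry gives a buffer's start and end addresses, a control bit, and a link to the next entry, and the channel raises an interrupt as each buffer completes so software can process each region of a scan line. Field names are invented for illustration and are not Interdata nomenclature.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class DmaInstruction:
    buffer_start: int                   # start address of the I/O buffer
    buffer_end: int                     # final address of the I/O buffer
    interrupt_on_done: bool             # control bit: interrupt when the buffer completes
    next_instruction: Optional['DmaInstruction'] = None   # link to the next list entry

def run_dma(pointer_register: DmaInstruction,
            on_interrupt: Callable[[DmaInstruction], None]) -> None:
    """Walk the instruction chain the way the channel would after 'DMA START',
    with no further CPU involvement until each buffer-complete interrupt."""
    entry = pointer_register
    while entry is not None:
        # ...hardware would transfer buffer_start..buffer_end here...
        if entry.interrupt_on_done:
            on_interrupt(entry)         # lets software time its processing per region
        entry = entry.next_instruction

tail = DmaInstruction(0x2000, 0x23FF, True)
head = DmaInstruction(0x1000, 0x1FFF, True, tail)
run_dma(head, lambda e: print(f"buffer ending at {e.buffer_end:#x} complete"))
```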
There are three other elements of the data compression hardware which are used in both the Wallops Island redundancy removal equipment, figure 1 and the Suitland data reconstruction equipment, figure 2: an ASR 33 teletype, a 9 track 800 CPI digital magnetic tape drive, and the 16K byte (eight bit) core memory of the 14 minicomputer ? 1K = 1024. The teletype is used to transmit operator commands to the compression system software and to display the compression system status both for the operator and as a historical record for future reference. Programs are entered into the 14 memory from the digital 9 track tape. Simple revisions to the software are made while the pro- gram is core resident by modifying, adding, deleting the necessary 14 instructions/constants, and finally writing the modified core image out to the digital tape for later re-entry. The 800 CPI 9 track tape runs at a speed of 25 inches per second which produces a data transfer rate of 20,000 bytes per second. The transfer of data between tape and the 14 memory is handled by a standard Interdata Selector Channel which operates as a DMA interface between the tape controller and the core memory. The Interdata Selector Channel must be set up under program control of the 14 and can transfer only a single sequential data area before it halts and must be set up again. Major changes to the compression system software are accomplished at GSFC or USC, stored on digital tape, and sent through the mail to be used at Wallops Island or Suitland. However, the main use of the digital tape is for the temporary storage of processed data. At Wallops the digital tape is used for recording when the Wallops-Suitland communications link is unavailable. The advantage of having the data on digital tape is that for the VHRR data the image data stream can be easily broken into 2400 line frames with a 100 line overlap. The overlap consistency is difficult to maintain when working from analog tape. 98 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 UNDERLINE Indicates Operator Input B. V. 17.4 3/28/74 ENTER MODE(0-2C):10 MODE 10 VIIRR TAPE OFF D* 040974 * ST ST ]3.V. 17.4 3/28/74 ENTER MODE(0-2C):10 MODE 10 VHRR TAPE OFF T* 1900Z R/0* 1901 D* 040974 VIS OR IR* VIS T* 19352 040974 1900Z VHRR 1901 VIS R/O* 10-908 TELE ON SEND COMP VIS OR IR* IR * GO 040974 1935Z VIIRR 1908 IR GO TELE ON SEND COMP * L=100 * GO L=100 GO * L=1000 * L=100 L=1000 L-100 * COMP. RAT. =2.92 * COMP. RAT.-2.15 END 1000 LINES END 0100 LINES Figure 5. Example of System Control Teletype Input-Output Figure 6. ATS-3 Spin Scan IMAGE 17, Dec. 21, 1972 The Wallops CDA compression hardware contains an output parallel-to-serial converter to interface the 14 memory to the 303C modem. The output bit rate is selected from the 14 software by loading a register in the output channel control logic. Between the 14 core memory and the output controller logic is a DMA channel described above. Note that none of the compression operations are done with special purpose hardware. The NESS data reconstruction equipment at Suitland is similar to the Wallops Island system. However the input and output device controllers are different. The input device controller receives a clock and serial data from a bit synchronizer which is in turn driven from the receiving Western Electric 303C modem at Suitland. 
The input channel logic can be set up to find both the 32 bit line synch. and the 16 bit data synch. using a Hamming metric with the allowed distance usually set to zero. The input channel is essentially a serial-to-parallel converter with two select- able modes of operation provided for handling the serial input data. The first mode assembles sequential 16 bit words of data from the incoming data stream and sends the parallel words to the DMA channel to be stored in the 14 core memory. The second mode uses a hardware decoder to help reconstruct compressed data. The code class which can be decoded by the hardware must have a prefix of N leading zeros terminated by a one, N is from 0 to 15. The hard- ware compression decoder has been used to reconstruct six bit source data. Implementation of the hardware com- pression decoder was undertaken to speed up the data reconstruction process. It has since been found that the pure software reconstruction algorithm for eight bit source data could be written to operate nearly as fast as the six bit data hardware/software decoder combination. The hardware decoder, as implemented, has a disadvantage in that it stores 16 bits in the 14 memory for each sample decoded, producing a temporary data expansion, and does not allow compressed data to be buffered since it is not available in the core memory. At Suitland the output device controller is the final element of the data compression communication system. The output device controller is basically a parallel-to-serial converter which can produce a data stream in the same for- mat as the Suitland bit synchronizer. The Suitland output device drives the NESS digital picture terminal which is the distribution point at Suitland for analog facsimile data to be used by remote display devices as shown in figure 2. The NESS digital picture terminal also generates images for use at Suitland and sends digital data to the NOAA weather computer, an IBM 360-195. The output channel requests data from a DMA channel connected to the 14 memory. There are two selectable modes of operation for the output channel. First, each eight bit byte can be con- verted to the six bit gray code format required by the digital picture terminal. Second, consecutive 16 bit halfwords from the 14 memory are shifted out of the output channel device in one long serial stream. The second mode is used to test the Suitland compression equipment input channel. The input device and output device are designed to be driven from independent data clock rates, although only a common clock has been used to date. Data Compression System Operation To operate the data compression system the operator initiates execution of the compression system executive software and receives a response defining the program revision number, the date revised, and a request that the operating mode number be entered as shown in figure 4. The operator enters a two character operating mode on the teletype console. The first digit indicates the data source, 0 = ATS, 1 = HRPT, and 2 = VREC. The second digit selects the I/0 devices and the type of processing to be used. The operating system responds by requesting that documentation information be entered by the operator. For example, D*, T*, R/0*, VIS OR IR* indicate the date, time, spacecraft orbit, and whether the data is visible or infrared ? see figure 5. The VIS OR IR input is used both for documentation and to select the appropriate display gray scale lookup table. 
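Stepping back to the input-channel synchronization search described above — the 32-bit line sync and 16-bit data sync are located with a Hamming metric whose allowed distance is usually set to zero — a minimal illustrative sketch follows; the pattern and bit stream are made up and much shorter than the real words.

```python
def find_sync(bits, sync_word, max_distance=0):
    """Return the first index where the stream matches sync_word within the
    allowed Hamming distance, or -1 if no match is found."""
    n = len(sync_word)
    for i in range(len(bits) - n + 1):
        distance = sum(b != s for b, s in zip(bits[i:i + n], sync_word))
        if distance <= max_distance:
            return i
    return -1

sync = [1, 0, 1, 1, 0, 0, 1, 0]                  # stand-in for the 32-bit pattern
stream = [0, 1, 1] + sync + [1, 0, 1]
print(find_sync(stream, sync, max_distance=0))   # -> 3
```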
Messages are displayed to indicate the status of the input/output devices, e.g., TAPE RECORD 8 BIT, TELE ON, and SEND COMP. When the operator SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information / 99 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 VIS Figure 7. NOAA ? 2 VHRR Images of Hurricane "AVA" Rev 2942, June 7, 1973 is ready to begin processing data he can type GO to start or SE to enable the processing of data which will start upon the receipt of a frame synch. signal. At Wallops the termination of the processing of image data by the data com- pression hardware can occur in two ways ? either the line count reaches a preset limit or the operator types FI to finish processing. The line count limit is automatically set to 2400 by the compression system software before the processing of each image but can be changed by the operator command L=XXXX, e.g. L=1000. On the front panel of the Interdata-4 computer are two rows of display lights. Each row of display lights contains four hexadecimal light emitting diode display characters which can be loaded under program control. The upper and lower rows of the 14 display are loaded with the decimal line count limit and the current line count respectively. At Suitland the end-of- image is found by reaching the preset line count limit or by receiving a frame-complete synch, word from Wallops. At Wallops two types of operating procedures are employed. When the ground link to Suitland is available data are sent in the desired format. When the ground link is busy with other traffic, analog data is input to the compres- sion system and stored on digital tape for later playback to Suitland. These digital tapes are selectively sent to GSFC or USC for evaluation. At Suitland the data compression system is used in two ways. First, to receive, process, and record on digital tape data to be used in the IBM 360-195 operational weather data processing system. Second the data reconstruction system can drive the digital picture terminal and its associated facsimile remote displays. Results The Wallops Island, VA to Suitland, MD data compression system has been operating for the past two and one half years, has demonstrated a high degree of reliability, and has helped to achieve a high degree of user confidence in the value of compressed data. The compression ratio varies depending on the image scene content but a typical value is 2:1. The quality of the received and reconstructed data have been monitored in two ways. First, based upon a visual review, the images produced have been found to be of equal or superior quality to those produced without data compression ? see example in figures 6 & 7. In the rare cases where a channel error affect is discernable in a reconstructed data compression image, the effect shows up as a streak of less than 1/16 of the image scan line length. Second, an error detection code shows less than one percent of the reconstructed image lines are in error. These image line errors are caused by channel errors and not by the compression system hardware or software. The role of data compression is not yet clear in the operational NOAA/NESS spacecraft program. The Alaska CDA to Suitland, MD communication system will very likely be reconfigured to take advantage of the economies inher- ent with communication satellites for long ground distances. 
Satellite-to-satellite communication systems are being planned which will reduce the necessity for on-board storage and allow continuous real time transmission of space- craft sensor data to the central data processing facility. Perhaps the ground storage and processing areas will be the first to benefit from the practical application of data compression techniques. Acknowledgement The authors are pleased to acknowledge the encouragement, operational support, and helpful comments of R. Koffler, J. Leese, C. Bristor, E. Connelly, and H. Van Dyke of the NOAA/NESS. 100 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 A CAQ BANDWIDTH REDUCTION SYSTEM FOR RPV VIDEO TRANSMISSION James J. Pearson Lockheed Palo Alto Research Laboratory Palo Alto, California Abstract A bandwidth compression system for the transmission of video images from remotely piloted vehicles has been built and demonstrated. Novel features of this system are the use of the Constant Area Quantization (CAQ) technique to obtain spatial bit rate reduction of 6:1 and a rugged and compact scan convertor, based on a core memory, to accommodate temporal frame rate reduction. Based on the ability of the human eye to perceive more detail in high contrast regions than in low, the CAQ method transmits higher resolution in the former areas. The original six-bit digitized video is converted to a three level signal by the quanti- zing circuit and then Huffman - encoded to exploit its statistical properties and reduce it further to one-bit per pixel. These circuits operate on one line of the picture at a time, and can handle information at full video (10 MHz) rate. The compressed information when received on the ground is stored in coded form in a two-frame (500,000 bit) digital core memory. One frame of the memory is filled while the other is being displayed and then the two are interchanged. Decoding and reconstruction of the video are per- formed between the memory and the display. Introduction The use of remotely piloted vehicles (RPV's) to perform tactical reconnaissance and strike missions under conditions too dangerous to permit the risk of sending manned aircraft is being given increasingly serious consideration by the armed services. The eyes of the absent pilot are replaced by transmission of video imagery from an onboard camera, and the success of the mission depends crucially on the integrity of the video link. Video signals are inherently broad band, and the necessity of protecting them from jamming by some form of spectrum spreading technique makes even greater demands on the limited available spectrum. When transmissions from a number of vehicles are occurring simultaneously, as envisioned in most RPV battle scenarios, the spectrum required is simply unavailable. Therefore, some means must be employed to reduce the bandwidth of the video signal prior to transmission. Fortunately, a sequence of meaningful video images contains enough redundancies to permit its com- pression by a sizeable factor without the loss of significant information. These redundancies are of two basic types--temporal and spatial. The temporal, or frame-to-frame, redundancies are most practically exploited by reducing the frame rate from the standard thirty frames per second to some lesser number, dictated by the particular mission requirements, through the use of some form of slow-scan camera. 
The spatial, or single frame, redundancies arise from the existence of finite range correlations within the picture. It is in the choice of a method to exploit these spatial redundancies that the constraints of the RPV application are most strongly felt. Since the vehicles are small and light, often comparable to large model airplanes, the equipment must be compact and lightweight and require little power. In addition, since the RPV's are, to some degree at least, expendable, the equipment must have a low cost. With this application and its attendant constraints in mind, the Lockheed Palo Alto Research Labora- tories have designed, built and demonstrated in the laboratory a bandwidth compression system based on the Constant Area Quantization (CAQ) technique. The Lockheed system performs spatial compression to reduce a six-bit per pixel digitized input signal to one-bit per pixel. It is designed to operate either at full video rate or at any reduced frame rate for which the jerkiness and delay are tolerable in the particular application. In the reduced frame rate case, scan conversion is performed on the ground in a rugged and compact digital memory, and an analog signal is delivered to a standard monitor at thirty frames per second. Principle of Operation Constant Area Quantization is a one-dimensional spatial compression technique. It is based on the property of human vision that the eye sees more detail in high contrast regions than in low con- trast ones. This property is illustrated in Fig. 1, where the circles decrease in size from left to right and in contrast from top to bottom. Fig. 1 Test pattern demonstrating the principle of Constant Area Quantization. ? SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information/ 101 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 If the figure is viewed under conditions of distance and illumination such that some but not all of the circles are visible, it will be seen that the cutoff is not along a verticle or horizontal line but along a diagonal, indicating a combination of size and contrast. This fact is exploited by transmitting higher resolution in the high contrast areas where the eye will detect it, and lower resolution elsewhere. In practice, this is accomplished in the simple way shown in Fig. 2. The bold line in 2 (a) repre- sents the analogue video signal which is sampled at a rate indicated by the tick marks. The smooth curve is replaced by line segments, beginning and ending at sample points, in such a way that the right triangles with these segments as hypotenuse and sides parallel to the coordinate axes all have the same area, A/2. The value of the area threshold A, is an arbitrary parameter chosen at the outset. It has the somewhat peculiar units of brightness levels times pixels. At each sample point the output signal in 2 (b) is generated. This signal is zero except at the end of a line segment, where it is either +1 or -1 (P or N) depending on whether the signal increased or de- creased during that interval. In this way, a six-bit (64 level) input signal is reduced to a three level one. In binary terms three levels corresponds to 1.58 (= log 3/log 2) bits, and this value can be approached arbitrarily closely by coding a number of picture elements together. (For example, the 243 possible values for five adjacent pixels can be represented by one eight bit number, giving an average of 1.6 bits per pixel.) 
The original picture can be recon- structed from the three level signal in two different ways illustrated in 2 (c) and 2 (d). In the first or "slope" method, the line segments in 2 (a) are reconstructed from a knowledge of the base and area of the appropriate right triangle. In the second, or "step" method only the endpoints of the segments are computed and the intervening pixels are filled in with constant values. The relative performance of the two reconstruction techniques is discussed later, but the step re- construction is used in the actual system. PP 0 0 0 0 0 0 0 1 t 1 0 11 1 0 000 1 111 111 CAQ OUTPUT SIGNAL .......- SLOT,E RECONSTRUCTION 1 000000 FrIl STEP RECONSTRUCTION 1 I 1TI11111. Fig. 2. Constant Area Quantization applied to a simulated video signal. The way in which the three level signal is generated is shown schematically in Fig. 3. The video signal is sampled and digitized to form the signal Vi for pixel i. Vi is subtracted from the reference level R corresponding to the beginning of a line segment in Fig. 2 (a). The magnitude of the resulting difference is then compared to the value A/Ai, where Ai is the time difference between pixel i and the pixel at the beginning of the segment. If IVi - RkA/Ai this indicates that the area, 1/2 IVi - RI.Ai, of the previously mentioned right triangle exceeds the value A/2. When this occurs, a P or an N is produced de- pending on the sign of Vi - R; otherwise a zero is r? cmi, 1 output. Whenever a P or an N is generated, a new I At COMPARISON reference level, R, is formed by the reconstructor, and SYNC. ---fd CLOCK --- WAVEFORM a new line segment is thus initiated. The value R, '---1-4"'GENERATOR which is identical to the picture element which will be 1 I reconstructed on the ground, differs from the actual video value, V, because the threshold is in general not IvSUBTRACTOR , reached exactly at a sampling point. R is used instead COMPARATORI-on G. P.N of V to prevent the accumulation of errors. ANALOG VIDEO ?4.1 SIGNAL SAMPLER is A TO CONVERTER RECONSTRUCTED SIGNAL R - L SIGNAL RECONSTRUCTOR P OR N Fig. 3. Constant Area Quantizer block diagram. 102 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Experience with a number of pictures indicates that the three output signals of the CAQ are not equally probable. In fact, zeroes predominate to such an extent that a considerable additional bandwidth com- pression can be achieved by using some form of statistical coding. The code actually chosen is a Huffman code, applied to the nine possible values of a pair of adjacent pixels. The code itself is given in Table I. It is a variable length comma-free code, which assigns the shortest code (0) to the most proba- ble event (two adjacent zeroes) and the longest codes Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 to the least probable events. This encoding is used to reduce the picture to one bit per pixel. This value of one bit per pixel is strictly maintained for each line. This can result in a small amount of truncation at the right hand Table I. Huffman code for ternary-to-binary conversion Original Symbols Probability Binary Code Code Length Average Code Length 00 .5390 o 1 .5390 edge of a particularly busy line, but that is OP .0976 101 3 .2928 the only additional image degradation introduced ON .0976 110 3 .2928 by the Huffman encoding. 
PO .0976 111 3 .2928 NO .0976 1000 4 .3904 System Description PP .0177 100110 6 .1062 PN .0177 100111 6 .1062 The bandwidth compression system, as it would NP .0177 100100 6 .1062 be configured for flight is outlined in Fig. 4. NN .0177 100101 6 .1062 The analog video output from either a standard 2.2326 or slow scan video camera is sampled and digi- tized to six-bits (the rest of the system can handle eight-bits if sufficient camera quality and A/D convertor complexity are justified). The digital signal is fed to the Constant Area Quantizer, which outputs one of three values (0, P, N) at the pixel rate. This three level signal goes to the Huffman encoder whose output comes in bursts of from one to six-bits ?at the end of each two-pixel interval. This coded signal must be buffered for uniform transmission. The signal at the output of the buffer has a bandwidth corresponding to one-bit per pixel and can be trans- mitted over any desired form of digital anti-jam data link. SENSOR -11???? ANALOG TO DIGITAL AIRBORNE EQUIPMENT CONSTANT HUFFMAN AREA QUANTIZER CODER BUFFER SYNCHRONIZATION AND TIMING GENERATOR GROUND EQUIPMENT I2 FRAME REFRESH BUFFER MEMORY [SYNCHRONIZER HHUFFMAN DECODER TRANS- MITTER CONSTANT AREA RECONSTRUCTOR DIGITAL TO ANALOG 4 ITIMING GENERATOR Fig. 4. System block diagram. DISPLAY The received and demodulated signal is stored in coded form in one-half of a two frame memory. While one-half of the memory is being filled, the other half is being used to refresh the display at a full thirty frame per second TV rate. Then the roles of the two halves of the memory are interchanged. If a standard rate video camera is used in the vehicle, this scan conversion step is by-passed. One line at a time is removed from the memory at video pixel rate into a buffer capable of delivering it to the decoder in high speed bursts. The decoder removes from the buffer as many bits as it finds necessary to generate two-pixels in CAQ form. The three level output of the decoder, which is uniform in time, is used by the re- constructor to generate a digital video signal which passes through a D/A convertor to the monitor. It is not the intention of this paper to describe the circuitry in detail. A few comments about some of the individual blocks in Fig. 4 may, however, prove helpful. All of the circuitry is designed to operate at a 10 MHz pixel rate, to enable it to handle standard video frame rates easily. This speed requirement can be relaxed for the airborne portions of the system if a slow scan camera is to be used, and a simplification in the circuitry results. The ground-based decoder and reconstructor must operate at the high speed, how- ever, since they operate between the scan convertor and the display in order to permit storage of the images in compact coded form. Since several operations must, in general, occur within one pixel time, a 40 MHz clock is used throughout. All of the components are Schottke series TTL except in the Huffman encoder, where some 10K series ECL chips are used. The quantizer operates as described in Fig. 3. The quantity 1/L1 is generated by a hyperbola generator consisting of a counter and a decoder network. Thresholds are restricted to powers of two so that multiplication by A implies simply a shift in the bit connections. The encoder straightforwardly generates the bits specified by Table I for each pair of pixels, using a series of SPIE Vol. 
Figures 5 and 6 are photographs of the CAQ and Huffman encoder circuit boards. The decoder is a sequential logic network which takes bits from the buffer until it recognizes the end of a code and then generates the appropriate pair of pixel values. The scan convertor consists of two 32,768 word by nine bit LEC memories on a single chassis. The memories can deliver nine bits to the line buffer in 800 nanoseconds, which is sufficiently fast to assure that 512 bits can be removed in each video line time. This form of scan convertor has the advantage of being compact, rugged, and relatively inexpensive, without adding additional degradation to the imagery.

Fig. 5. Constant Area Quantizer circuit board.

Fig. 6. Huffman encoder circuit board.

Performance

Since this system is designed to transfer certain types of information from moving video imagery to human observers, the only really satisfactory measure of its performance is its effectiveness in doing that job. Within the confines of a paper, and in the absence of any definitive subject tests, more indirect criteria must be used.

Most of the pictures in this section, illustrating as they do a wider variety of algorithms and parameter values than are wired into the actual system, are computer simulations. They are all of the same subject to facilitate comparison. The original is shown in Fig. 7. The tank farm subject, having a great deal of detail, is in some sense a "worst case". The simulations reproduced on a microfilm plotter have 256x256 pixels and thus represent only a portion of the actual frame, which in the system contains 480x480 pixels. Figures 8(a) and 8(b) are original and compressed versions, respectively, of video taped imagery run through the actual system and photographed with a Polaroid camera directly off the monitor.

Fig. 7. Original tank farm image.

The principal parameter which can be varied in the CAQ is the area threshold, A. Varying A does not alter the amount of bandwidth reduction directly (the output of the quantizer is always 1.6 bits per pixel, independent of both A and the input picture), but it does greatly affect the image quality. A small value of A causes P's and N's to be produced more often, thus increasing the low contrast detail, but reduces the speed with which high contrast changes can be followed, resulting in edge blurring. A large A improves the edge response, but washes out the low contrast detail. The visible result of this is a streaked appearance, as large regions are replaced by constant values which vary from line to line. Fig. 9, which shows CAQ compressed versions with three different thresholds, illustrates these effects. The same point is made in a different way by Fig. 10, in which mean squared error is plotted against threshold for the tank farm picture (Fig. 7). The deterioration for low and high values of A is reflected in this plot. The difficulty of using an objective criterion like mean squared error to judge a technique which attempts a subjective match to human vision is illustrated, however, by a comparison of Figures 11 and 12. Both of these have the same mean squared error (1.6%), but it is difficult to conclude that they are of the same quality.
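An experiment in the spirit of Fig. 10 can be outlined with the two routines sketched earlier: quantize each line at several thresholds, reconstruct, and record the error. The snippet below is only a schematic of that kind of threshold sweep, not the authors' simulation; caq_quantize and caq_step_reconstruct refer to the sketch above, and the error is expressed here, for the example only, as a percentage of the signal variance (the paper does not state its normalization).

    import numpy as np

    def percent_mse(original, reconstructed):
        """Mean squared error as a percentage of the original signal variance."""
        original = np.asarray(original, dtype=float)
        reconstructed = np.asarray(reconstructed, dtype=float)
        return 100.0 * np.mean((original - reconstructed) ** 2) / np.var(original)

    def threshold_sweep(image_rows, thresholds):
        """Run CAQ line by line at each threshold and tabulate the average error."""
        results = {}
        for A in thresholds:
            errs = []
            for row in image_rows:
                symbols = caq_quantize(row, A)
                recon = caq_step_reconstruct(symbols, row[0], A)
                errs.append(percent_mse(row, recon))
            results[A] = sum(errs) / len(errs)
        return results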
While the choice of A does not affect the amount of compression directly, it does vary the information content of the output by altering the proportion of P's, N's, and 0's produced, and thus it affects the further saving that can be achieved by statistical coding. Fig. 13 indicates, for the same image, the average run length of zeroes as a function of threshold. Fig. 14 contains a plot of the entropy of the CAQ signal versus the average run length. Since the entropy places a lower limit on the number of bits per pixel that can be achieved without further loss, it is a good measure of the possible benefit of coding.

Fig. 8. Original (a) and compressed (b) imagery photographed from the monitor.

Fig. 9. CAQ-compressed images with three different thresholds: (a) A = 2, (b) A = 12, (c) A = 32.

Fig. 10. Dependence of mean squared error on threshold (brightness levels x pixels) for the tank farm image.

Fig. 11. CAQ-compressed image with A = 6.

Fig. 12. CAQ-compressed image with A = 20.

Fig. 13. Dependence of average run length on threshold (brightness levels x pixels) for the tank farm image.

Fig. 14. Entropy of the CAQ signal and Huffman-coded bits per pixel as a function of average run length for the tank farm image.

The actual performance of the Huffman code is plotted on the same axes, and it can be seen to approach the entropy curve very closely in the region of highest picture quality. The probabilities given in Table I, which were used in determining the optimum code, were obtained by quantizing the tank farm scene, using a threshold of 16, and counting the P's and N's. P's, N's, and 0's were assumed to be statistically independent, and P's and N's to be equally probable. Different scenes, different thresholds, and more accurate statistics lead to slightly different codes, but the differences in performance among them are essentially negligible over normal operating ranges.

The system is designed to pass one bit per pixel. Since the run length versus threshold curve (Fig. 13) is picture dependent, a trade-off exists between picture degradation caused by too high a threshold and line truncation produced by too short an average run length. This tradeoff can be made by adjusting the threshold. The best threshold for the busy image of Fig. 7 was found to be 16, while the optimum for the video scene (Fig. 8) was 8.

The picture quality is affected to some small degree by the reconstruction method chosen. Figures 15 and 16 compare the results using the "slope" and "step" reconstructions discussed earlier. Although the former gives a closer approximation to the original, in a mean squared error sense, the subjective appearance of the picture was not felt to be improved enough to justify the additional complexity.
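The statistical assumptions behind Table I (independent symbols, with P and N equally probable) can be checked numerically: from a measured probability of zeroes one can form the nine pair probabilities, compute the pair entropy, and compare it with the 2.2326-bit average codeword length of Table I. The short sketch below does that arithmetic; it is an illustration only, not a reproduction of the original analysis, and its names are invented.

    import math

    def pair_statistics(p_zero):
        """Pair entropy and Table I average length per pixel, assuming
        independent symbols with P and N equally probable."""
        p_pn = (1.0 - p_zero) / 2.0                  # P(P) = P(N)
        singles = {'0': p_zero, 'P': p_pn, 'N': p_pn}
        lengths = {('0','0'): 1, ('0','P'): 3, ('0','N'): 3, ('P','0'): 3,
                   ('N','0'): 4, ('P','P'): 6, ('P','N'): 6,
                   ('N','P'): 6, ('N','N'): 6}       # code lengths from Table I
        entropy = avg_len = 0.0
        for pair, L in lengths.items():
            p = singles[pair[0]] * singles[pair[1]]
            entropy -= p * math.log2(p)
            avg_len += p * L
        return entropy / 2.0, avg_len / 2.0          # bits per pixel

    if __name__ == '__main__':
        h, l = pair_statistics(0.7342)   # P(0) chosen so that P(00) is about .5390
        print(f'entropy {h:.3f} bits/pixel, Huffman {l:.3f} bits/pixel')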
Attempts to compare the CAQ technique with others must necessarily involve subjective judgements. Since it is a one-dimensional algorithm, it does not, of course, exploit correlations in the other direction. An 8x8 block Hadamard encoding with compression comparable to Fig. 16 is included for comparison as Fig. 17.

Fig. 15. "Slope" reconstruction of CAQ image (A = 16).

Fig. 16. "Step" reconstruction of CAQ image (A = 16).

Fig. 17. Two-dimensional Hadamard compressed image.

Fig. 18. Three level DPCM image.

The closest relatives of the CAQ are the class of predictive techniques, such as Differential Pulse Code Modulation (DPCM), and it is very similar to them in its response to bit errors. CAQ differs from these predictive techniques in quantizing the brightness-distance product, rather than merely the brightness. A picture produced by a three level DPCM is shown in Fig. 18 and can also be compared with the CAQ picture in Fig. 16. The DPCM levels are symmetric around zero change, and the threshold was chosen to give the same entropy (1.1 bits per pixel) as the CAQ picture. The mean squared errors were 1.41% for the CAQ and 1.54% for the DPCM.

In summary, the CAQ technique provides a relatively simple approach, which can be implemented with conventional and available components to meet the size, weight, and power restrictions of RPV's. The technique requires no image storage for encoding; it exploits properties of human vision to provide pictures of good subjective quality using only one bit per pixel; and it has been implemented to operate over a full range of frame rates up to thirty frames per second.

Acknowledgements

This bandwidth compression system has been the work of a large number of people. The CAQ concept was originated a number of years ago by G. C. Sziklai. His advice and suggestions strongly influenced the development of the system, particularly in the early stages. Algorithm development, computer simulation and coding analysis were performed by C. D. Kuglin. The CAQ circuitry and the Huffman encoder and decoder were designed by M. Hilsenrath. The scan convertor and video circuitry were the work of B. S. Golosman and H. Massey.

INTRAFRAME AND INTERFRAME ADAPTIVE TRANSFORM CODING

Clifford Reader
Aeronutronic Ford Corporation
3939 Fabian Way, Palo Alto, California 94303

Abstract

This paper describes work performed as an extension of the techniques of transform coding with block quantization. These techniques have been very well defined theoretically, with implementations achieving the predicted optimum results. The purpose of this work was to examine the techniques in practice and to define adaptive coding techniques which could be practically applied to the coding of still and moving pictures.

Introduction

The techniques of transform coding and block quantization for a single image are well established; the purpose of the work described in the first part of this paper was to examine in detail the application of these techniques in practice [1].
It is to be expected that there will be a variability of performance, since any practical implementation will be rather removed from the theoretical optimum case of two dimensional Karhunen-Loeve sampling according to a known covariance and quantization based upon assumed statistics. In particular, it is noted that the condition of stationarity of statistics will be violated and that, for ease of implementation, separable sub-optimum transformations - in this case the Slant - will be used over sub-blocks (16 x 16 pixels) of the image. A range of adaptive coding techniques is examined, with practicality as well as coding efficiency being of importance.

The work is then continued into interframe coding. A coder is described which achieves a significant reduction in bit rate while not requiring the full frame memory of other interframe coders. The error performance of the coder is examined and static examples are presented. A discussion follows of the visual effect of the coder in real time, together with the projected buffer requirements.

Intraframe Coding

Summary of Coding Procedure

The image is slant transformed in sub-blocks of 16 x 16 pixels. To perform quantization, the data is modelled as a Markov process and from this the variances of the transform domain samples are estimated. Coding bits are allocated in proportion to these variances, and the samples are normalized in proportion to their estimated variances before being quantized according to the Max quantizer. Tuning of the process is possible by manipulation of the row and column correlations used to define the variances, and with the constant of proportionality used in normalizing by the variances. The apparently arbitrary way in which tuning of the parameters influenced the coding error provided the motivation for this work.

Adaptive Coding 1 - The Amplitude Parameter

The constant of proportionality (C) in the normalization of the data is termed the amplitude parameter (AP). It was found that each sub-block of an image required a different AP for optimal coding (in a mean squared error sense) and that, although for most sub-blocks a good approximation could be made with one AP, in certain significant cases a markedly different - higher - AP was required, suggesting that an adaptive process was needed. This was effected by making the constant C a function of the square root of the energy of the sub-block less the D.C. component, i.e., the A.C. energy. This is termed the amplitude factor AF. Thus C = AP x AF. The result was as follows. In non-adaptively coded images, the best result is obtained with an AP high enough to encode the problem sub-blocks, and high frequency quantization noise then appears on the other sub-blocks. In the adaptive coding, the AF automatically compensates for the high energy problem sub-blocks such that one AP is more suitable for the whole image. This is seen as a freedom from noise in "flat" image areas and a slightly reduced mean square error (Figure 1).

Adaptive Coding 2 - Inter-Pixel Correlation

True horizontal and vertical correlations for natural images fall in the region of 0.91 to 0.96. Nevertheless, the effect was examined of using different correlations to define the model from which bit allocations and normalizing constants are derived.
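To make the modelling step concrete, the sketch below estimates transform-domain variances for an N x N sub-block from a separable first-order Markov (exponential correlation) model and then assigns integer bit allocations that grow with the log of the variance under a fixed average budget. It is a generic illustration of transform coding with block quantization under assumed row and column correlations, not the exact allocation rule, normalization or Slant-transform machinery of this study; the orthonormal transform matrix T and all names are assumptions of the example.

    import numpy as np

    def markov_covariance(n, rho):
        """Correlation matrix of a first-order Markov source: R[i, j] = rho**|i - j|."""
        idx = np.arange(n)
        return rho ** np.abs(idx[:, None] - idx[None, :])

    def transform_variances(n, rho_row, rho_col, T):
        """Variances of the 2-D transform coefficients under a separable Markov
        model. T is an orthonormal n x n transform matrix (e.g. a DCT or Slant
        matrix supplied by the caller)."""
        var_row = np.diag(T @ markov_covariance(n, rho_row) @ T.T)
        var_col = np.diag(T @ markov_covariance(n, rho_col) @ T.T)
        return np.outer(var_col, var_row)      # separable model: product of 1-D variances

    def allocate_bits(variances, avg_bits, max_bits=8):
        """Integer bit allocation that rises with log2 of the coefficient variance,
        shifted so the mean is near the target average rate, then clipped and
        rounded (so the realized average may differ slightly from the target)."""
        log_var = np.log2(variances)
        bits = avg_bits + 0.5 * (log_var - log_var.mean())
        return np.clip(np.round(bits), 0, max_bits).astype(int)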
Results are summarized in Figure 2, where it can be seen that the lowest error is encountered using far lower correlations than the real ones, at the expense of increased sensitivity. What happens is that, with the lower correlations, the available bits are spread further into the transform domain and more samples are encoded (Figure 3). However, each sample receives fewer bits, and thus the position of the quantization levels (as fine tuned by the AP) is more critical. Correlations of 0.76 were selected as a compromise, with the highly satisfactory results shown in Figure 4.

Figure 1. Comparison of adaptive (square root AC energy amplitude factor) and non-adaptive coding; correlations (0.86, 0.86), 1.5 bits per pixel, with the mean squared error shown for each coded image at several amplitude parameters.

Figure 2. Graph of mean squared error vs. amplitude parameter for the girl picture, with curves for correlations from about 0.62 to 0.90.

Figure 3. Bit allocations over a 16 x 16 sub-block for correlations of 0.86 and of 0.76.

Figure 4. Coding with the square root AC energy amplitude factor, correlations (0.76, 0.76), at 1.5 and 1.0 bits per pixel for amplitude parameters of 0.25, 0.30 and 0.35, with the corresponding mean squared errors.

Adaptive Coding 3 - Optimised Coding

The fine structure of the parameters required by each sub-block for optimum coding was examined by performing an experiment in which fifteen pairs of inter-pixel correlations (vertical and horizontal; Figure 5) were used to define bit allocations and normalizing constants, which were applied in every combination (225) to every sub-block. Figure 6 shows sub-block maps in which the reference number (from Figure 5) is shown for the optimum combinations of bit allocations and normalizing constants. The resultant minimum mean squared error is shown in Figure 7, with the coded pictures shown in Figure 8.
The pictures show good detail and edge rendition, and it should be noted that even better coding could have resulted if correlations of less than 0.80 (type 1) had been allowed.

Figure 5. Correlation reference numbers

Reference No.   Correlations
1               0.80; 0.80
2               0.84; 0.84
3               0.88; 0.88
4               0.92; 0.92
5               0.96; 0.96
6               0.80; 0.92
7               0.80; 0.84
8               0.84; 0.92
9               0.88; 0.96
10              0.92; 0.96
11              0.92; 0.80
12              0.84; 0.80
13              0.92; 0.84
14              0.96; 0.88
15              0.96; 0.92

Figure 6. Sub-block maps (girl image): (i) sub-optimum bit allocation types; (ii) sub-optimum companding constant types. Sub-blocks marked with a circle may be fairly well coded with bit allocations and companding constants of Type 1.

Figure 7. Sub-block map (girl image) of the optimum mean square error, coarsely approximated in units of 0.0001.

Adaptive Coding 4 - Zonal and Threshold Coding

The coding thus far described employs bit allocations in which those samples which are quantized fall inside a maximum variance zone. The coding error may be divided into the quantization error and the error resulting from those samples outside the zone which do not get coded at all. There will always be exceptional significant samples in the latter category, and they could be coded by setting a magnitude threshold in the area outside the zone and run-length coding all samples exceeding such threshold. As an example, four bits were allocated to code each of such samples and, assuming that a line starting code were used, four bits were required to address each sample.
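A rough software rendering of this accounting is given below: samples inside the maximum variance zone are assumed to be handled by the zonal coder, while out-of-zone samples whose magnitude exceeds a threshold are counted and charged the 4 + 4 bits quoted above. It is a schematic of the cost bookkeeping only, not of the coder actually used; the zone mask, threshold and example figures are assumptions, and the names are invented.

    import numpy as np

    def threshold_coding_cost(block, zone_mask, threshold,
                              amp_bits=4, addr_bits=4):
        """Count out-of-zone transform samples exceeding the magnitude threshold
        and return (number of samples, extra bits charged for them).
        block     : 2-D array of transform coefficients for one sub-block
        zone_mask : boolean array, True inside the maximum variance zone
        threshold : magnitude above which an out-of-zone sample is kept"""
        block = np.asarray(block, dtype=float)
        outside = ~np.asarray(zone_mask, dtype=bool)
        kept = outside & (np.abs(block) > threshold)
        n_kept = int(kept.sum())
        return n_kept, n_kept * (amp_bits + addr_bits)

    # Example: a 16 x 16 sub-block whose zone is taken as the upper-left triangle.
    if __name__ == '__main__':
        rng = np.random.default_rng(0)
        coeffs = rng.normal(scale=0.05, size=(16, 16))
        zone = np.add.outer(np.arange(16), np.arange(16)) < 12
        print(threshold_coding_cost(coeffs, zone, threshold=0.1))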
The number of thresholded samples is shown by the sub-block maps of Figure 9, where it can be seen that a picture which has been zonal coded at 1.5 bits per pixel now requires 1.7 bits per pixel. In order to get back to 1.5 bits per pixel, the zonal coding must be effected at 1.2 bits per pixel. The corresponding coded pictures are shown in Figure 10. It is considered that the result obtained at a total of 1.5 bits per pixel is not significantly improved and does not justify the work involved, especially since channel error would be a problem.

Figure 8. Optimised coding: original and coded images at 1.5 bits per pixel (MSE = 0.000744) and 1.0 bits per pixel (MSE = 0.000796).

Figure 9. Sub-block maps (girl image) of the number of threshold samples: (i) 1.7 bits per pixel, total number of threshold samples = 6,008; (ii) 1.5 bits per pixel, total number of threshold samples = 10,532.

Figure 10. Threshold coding: 1.7 bits per pixel, MSE = 0.000533; 1.5 bits per pixel, MSE = 0.000606.

Adaptive Coding 5 - Zonal Coding

If the sub-block maps of Figures 6, 7 and 9 are studied, it is seen that there is high correlation between them in terms of which sub-blocks display exceptional information. Two further maps are presented in Figure 11.
Figure 11. Sub-block maps (girl image): sub-optimum companding constant types with an indication of high amplitude factor, and the AC energy amplitude factors.

Now it can be seen that certain sub-blocks require exceptional bit allocations and normalizing constants, have many significant samples outside a normal bit allocation zone, produce a high mean squared error even when optimally coded, and furthermore possess the highest AC energy. This last property allows easy identification of these sub-blocks, and hence the definition of an adaptive, variable zonal coding is possible. An example is presented to indicate the performance of such a scheme. The most difficult sub-blocks will require many bits in order to be coded satisfactorily, e.g. 3 bits per pixel; the relatively high energy sub-blocks might then be coded well at 2 bits per pixel. The effect for these images is shown in Figure 12, where the bit rate for the remaining (background information) sub-blocks is shown, calculated to yield an overall 1.5 bits per pixel.

Figure 12

Image    Sub-blocks at        Sub-blocks at        Sub-blocks    Rate of remaining sub-blocks
         3.0 bits per pixel   2.0 bits per pixel   remaining     (bits per pixel)
Girl     11                   127                  118           0.82
Couple   3                    120                  133           1.02
Moon     4                    71                   181           1.27

The distribution of bits is shown in Figure 13, and it is immediately obvious that the AP is a very effective delineator of detail in the picture. Excellent quality would result from such a scheme, with the only problem occurring when the discreteness of the sub-blocks causes part of a feature to be missed.

Conclusion - The Application of Statistical Coding

The results presented in previous sections show that most areas of the images are coded effectively by straightforward application of the statistical models. The exceptions, when they occur, are significant because they involve prominent features.
Figure 13. Adaptive zonal coding: sub-blocks coded at 3 bits per pixel, and sub-blocks coded at 3 bits per pixel plus those coded at 2 bits per pixel.

Typically these are characterized by a sharp, high contrast edge. Obviously, in a sub-block containing such features the statistics are not Markovian and not stationary. No manipulation of the maximum variance zone, except simply to increase its size, is effective in coding this data. As an example, the transform domain of the sub-block located on the diagonal edge of the girl's hairline (reference Figure 13) is shown in Figure 14.

Figure 14. Slant transform domain of sub-block (2,7) of the girl image (16 x 16 array of coefficient values).

The significant samples have a distribution out along the leading diagonal, very unlike the maximum variance zone. In consequence, the adaptive zonal coding is recommended as best, particularly since bits are concentrated where desired - in the girl image, the girl herself is coded mostly with two bits and the insignificant background with only 0.82 bits. The moon picture, however, has the few outstanding features coded at 2 or 3 bits, while the more noticeable background receives 1.27 bits. Lastly, it is noted that when the mathematical model approach to coding is abandoned in favor of zones based upon measured statistics (variances), an improved coding results, since the requirements of the problem sub-blocks are built in.
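The AC-energy test that drives this adaptive zonal coding can be written down in a few lines. The sketch below classifies each sub-block by its AC energy into the three rate classes of Figure 12; it is a schematic rather than the coder actually used, the two energy thresholds are data-dependent free parameters, and the 0.82 bits-per-pixel background figure is simply the girl-image value from Figure 12 used as a placeholder (in practice that rate would be solved for to meet the overall budget).

    import numpy as np

    def classify_subblocks(image, hi_energy, lo_energy, block=16,
                           rates=(3.0, 2.0, 0.82)):
        """Assign a bits-per-pixel rate class to every block x block sub-block
        according to its AC energy (total energy less the DC component)."""
        image = np.asarray(image, dtype=float)
        rows, cols = image.shape[0] // block, image.shape[1] // block
        rate_map = np.empty((rows, cols))
        for r in range(rows):
            for c in range(cols):
                sub = image[r*block:(r+1)*block, c*block:(c+1)*block]
                ac_energy = np.sum(sub**2) - sub.size * sub.mean()**2
                if ac_energy > hi_energy:
                    rate_map[r, c] = rates[0]   # most difficult sub-blocks
                elif ac_energy > lo_energy:
                    rate_map[r, c] = rates[1]   # relatively high energy sub-blocks
                else:
                    rate_map[r, c] = rates[2]   # background sub-blocks
        return rate_map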
Interframe Coding

It is well known that considerable redundancy is present when a moving picture is transmitted. The adaptive coder described here was designed to take advantage of the limitations of human vision, essentially providing only sufficient information to yield the illusion of a complete, high detail moving picture. The coder is specifically designed to minimize terminal cost in addition to minimizing transmission cost. In addition, a two stage coding is applied in order to obtain flexibility in implementation. An application was picture telephone, in which simple coding might be used locally with additional coding for long distance. The first stage of coding consists of the removal of spatial redundancy within single frames. This employs the techniques discussed earlier. The second stage of coding - interframe coding - is thus presented with coded transform domain samples at a rate considerably less than the source rate. The bit rate reduction is to be achieved by updating only those portions of the frame in which movement has taken place since the previous frame. Since the interframe coder is presented with data which has been transformed in sub-blocks, it is necessary to decide whether a particular sub-block contains movement and, if so, to update it. Detection of the sub-blocks which contain motion may be effected by examination of the sub-block frame difference energy, with a threshold decision of whether to transmit a new sub-block or not. The effect is a remarkably linear one, as indicated by the error curves of Figure 15.

Figure 15. Graphs of mean squared error vs. threshold, with the threshold set as a function of sub-block difference energy: (i) for five degrees of motion, from very fast to very slow; (ii) for medium motion, with a fitted line of slope 0.74 (y = 0.74x - 1.63).

Serious deviation occurs only for very slow motion, where the discreteness of the update process becomes apparent. The block diagram for an implementation of the system is shown in Figure 16. Shown are the difference energy summing and thresholding, and the memory and delays which compensate for the time taken to make the update decision. In a normal coder the memory would have to store a complete frame. The advantage of this coder is that compression has already been achieved by the intraframe coder, and the same degree of compression applies to the size of the memory. A 4:1 compression (6 bits to 1.5 bits) is likely, and this results in only a one quarter frame memory. From the curve of Figure 15 (ii), a threshold of 4 is selected as offering the best compromise between updating all moving areas and not updating differences due to noise.
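In software terms the update decision amounts to comparing a sub-block's frame-difference energy with a threshold and either transmitting the newly coded sub-block or repeating the stored one. The sketch below captures that decision rule for sub-blocks already reduced to coded coefficient arrays; the delays, buffering and memory organization of Figure 16 are not modelled, and all names are invented for the example.

    import numpy as np

    def interframe_update(prev_blocks, new_blocks, threshold=4.0):
        """Conditional replenishment of transform-coded sub-blocks.
        prev_blocks, new_blocks : dicts mapping (row, col) -> coefficient array.
        Returns the updated store and the set of sub-block addresses transmitted."""
        updated, transmitted = dict(prev_blocks), set()
        for key, new in new_blocks.items():
            old = prev_blocks.get(key)
            diff_energy = np.sum((new - old) ** 2) if old is not None else np.inf
            if diff_energy > threshold:     # movement detected: send the new sub-block
                updated[key] = new
                transmitted.add(key)
            # otherwise the receiver keeps refreshing the display from the old one
        return updated, transmitted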
uij It -4 3LOCK DIAGRAM OF ADAPTIVE INTERFRAME CODER Ed>r 0 W.", Owing to the non-availability of data, it was not possible to test the coder dynami- cally - possibly a problem might be visibility of the sub-block updating structure. This could be coped with by overlapping sub-blocks or careful transmission of the DC and low frequency information. The buffering problems associated with the variable data rate produced by the coder were found to be similar to those encountered with a pixel cluster coding update scheme. In particular, in the case of buffer overload it is noted that an elegant reduction of data input maybe effected by a further spatial compression (re- quantization or removal of transform domain samples) to take advantage of the limited spatial resolution of the eye to moving images. Figure 17 Interframe Coding with Threshold of 4.0 Four Frame Sequence Frame 2 1.16 bits per pixel IMSE = 0.000052 Frame 3 1.17 bits per pixel IMSE = 0.000051 Frame 4 (i) Intraframe coding at 3.0 bits per pixel 1.19 bits per pixel IMSE = 0.000048 116/SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Figure 17 (Continued) Frame 2 0.54 bits per pixel IMSE = 0.000049 Frame 2 Frame 3 0.54 bits per pixel IMSE = 0.000048 Frame 4 0.55 IMSE (ii) Intraframe coding at 1.5 bits per pixel Figure 18 Interframe Coding with Threshold of 9.0 Four Frame Sequence Intraframe Coding at 1.5 bits per pixel 0.5 bits per pixel IMSE m, 0.000063 Frame 3 0.52 bits per pixel IMSE = 0.000074 bits per pixel = 0.000055 Frame 4 0.52 bits per pixel IMSE = 0.000092 References 1. "Orothogonal Transform Coding of Still and Moving Pictures", C. Reader, Ph.D. Thesis, Sussex University, England, 1974. SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information / 117 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 118 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Session III IMPACTS OF IMAGE DATA COMPRESSION Panel Moderator ANDREW G. TESCHER The Aerospace Corporation N SPIE Vol. 66(1975) Efficient Transmission of Pictorial Information /119 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 120 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 c N Session IV POTENTIAL IMPLEMENTATIONS Chairman J. R. PARSONS The Aerospace Corporation SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information / 121 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 JOINT PATTERN RECOGNITION/DATA COMPRESSION CONCEPT FOR ERTS MULTISPECTRAL IMAGING* Edward E. Hilbert Jet Propulsion Laboratory 4800 Oak Grove Drive Pasadena, California 91103 Abstract This paper describes a new technique which jointly applies clustering and source encoding concepts to obtain data compression. The cluster compression technique basically uses clustering to extract features from the measurement data set which are used to describe characteristics of the entire data set. 
In addition, the features may be used to approximate each individual measurement vector by forming a sequence of scalar numbers which define each measurement vector in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. A description of a practical cluster compression algorithm is given and experimental results are presented to show trade-offs and characteristics of various implementations. Examples are provided which demonstrate the application of cluster compression to multispectral image data of the Earth Resources Technology Satellite. Introduction This paper further describes a new technique which jointly applies clustering and source encoding concepts to obtain data compression. A practical algorithm for implementing this concept was initially discussed in a previous paper [1]. This cluster compression technique is applicable to multidimensional information sources in general, but particular emphasis is given here to compressing multispectral image data from future operational Earth Resources Technology Satellites (ERTS). A Joint Pattern Recognition/ Data Compression Model is defined to illustrate the general concept of combining pattern recognition and data compression. Then a practical realization of the cluster compression concept is described and experimental results are presented which show trade-offs and characteristics of various implementations. Examples are also provided which demonstrate the application of cluster compression to multispectral image data from ERTS. Data Model The multispectral image source can be modeled by the continuous random process s(y1,y21w) which is the electromagnetic energy at wavelength w for spatial coordinates yl and y2. The measurement vector elements of a digitized d-band multispectral image are then represented by the vector X(Y1,Y2) obtained from s(y1,y2,w) by discretizing the variables yl, y2 and w to give X(Y,,Y2).[s(Y1,Y2,W1),s(Y1,Y2,142),...,s(Y1,Y2,Wd)]. *This paper presents the results of one phase of research carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract No. NAS7-100, sponsored by the National Aeronautics and Space Administration. 122 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 (1) Approved For Release 2001/09/05: CIA-RDP80600829A001100060001-0 A Joint Pattern Recognition/Data Compression Model Figure 1 describes a model of the generalized concept for the cluster compression technique. Let {X.}11 11 be a sequence of n measurement vectors obtained from a subset of the multispectral image data. The -1 1= entire sequence {X1}7=1 is analyzed to extract features {,j}j.1 for the sequence CX07.1. These features could also be considered as primitives, or as a basis for approximating measurement vectors. The sequence of m features {T.lm provides a description of the characteristics pertaining jointly to the entire j j=1 sequence of measurement vectors {X07.1 and in some applications may be the only output required. In other applications each measurement vector needs to be approximated in terms of ftp.)111 by the sequence of scalar j j=1 numbers 1,(01(.1 which assigns to each measurement vector an approximation in terms of one of the primitives. 
Alternately, the concept of Fuzzy Set Theory [2] might be used to allow each measurement vector to be described by a weighted mixture of more than one of the features in the sequence {Tj}. The scalar sequence {yk, k = 1, ..., n} constitutes a spatial map of the measurement vectors in terms of the primitives, and is called the feature map. Spatial features can be extracted from {yk}, and source encoding can be used to efficiently represent the spatial characteristics of {Xi} through the encoded feature map, denoted by Cy. For each sequence {Xi} of measurement vectors, the model basically uses pattern recognition concepts to determine a set of features or primitives, and then uses data compression techniques to efficiently represent the spatial features in terms of the primitives.

Fig. 1. A Joint Pattern Recognition/Data Compression Model: the sensor output {Xi} is analyzed to extract features/primitives {Tj}; each Xi is then defined in terms of {Tj}, and a spatial feature extractor/encoder produces the features and the encoded feature map Cy.

The remainder of this paper describes a new practical method of data compression, called cluster compression, which is based upon the model in Fig. 1. Cluster compression uses clustering to extract multidimensional primitives from subsets of spatially contiguous measurements in the image data and then applies entropy encoding to the feature map. Another independently developed data compression technique related to the model in Fig. 1 is the Blob algorithm [3]. This algorithm is basically a boundary finding algorithm that guarantees closed boundaries. A blob is defined as a connected set of picture elements all of which have some common characteristic. In the Blob algorithm the primitives are defined by the boundary finding algorithm and consist of statistical descriptions of the measurement vectors contained within the various blobs. Spatial definition can then be provided by encoding a sequence which defines the spatial boundaries of the blobs. This algorithm is most useful as a data compressor when the image consists of well defined boundaries enclosing picture elements of nearly uniform characteristics (e.g., agricultural fields). The Cluster Compression Algorithm is different in that it extracts primitives and obtains significant compression independent of the spatial characteristics of the data. Thus the cluster compression technique can efficiently compress images whether or not they contain uniform areas with well defined boundaries. In addition, if the image does consist of uniform areas with well defined boundaries, such as fields, the entropy encoding of the feature map also results in efficient representation of these spatial boundaries.

CLUSTERING FOR EXTRACTION OF FEATURES

Let {Xi, i = 1, ..., n} be a sequence of n d-dimensional spectral intensity measurements from a spatially local region of a multispectral image. These vectors generally tend to form groups in multispectral intensity space. A typical plot of {Xi} for two bands in spectral intensity space is shown in Fig. 2(a). An intuitively natural set of features to use for representing {Xi} in Fig. 2(a) are the descriptions for groups of measurement vectors, for example, the groups depicted in Figs. 2(b)-2(d).
Each feature then consists of whatever set of parameters is used to describe the source distribution for the group of measurement vectors. The mean and covariance are an obvious example of a group description, or feature. The number of features extracted from any sequence of measurement vectors would depend on how accurately {Xi} must be represented, and correspondingly what data rate is acceptable. An example of how two, four, or eight features might be chosen is shown in Figs. 2(b)-2(d).

Fig. 2. Typical display in two-band spectral intensity space for measurement vectors and for groups of measurement vectors used as features: (a) measurement vectors, (b) two groups, (c) four groups, (d) eight groups.

Clustering Approaches

Clustering is a means for automatically grouping multidimensional data [4]-[5]. Figure 3 shows the basic iterative approach to clustering. All clustering techniques depend on a distance measure between vectors and a distance measure between clusters, or groups of vectors. The distance measure between vectors is usually either the Euclidean distance or the absolute value distance. A distance measure between clusters could be as simple as the distance between means, or, more generally, the shape of the two clusters could also be used in computing a measure of distance.

Fig. 3. The basic iterative approach to clustering: under user supervision (the number of clusters), choose initial cluster centers, assign each Xi to the closest cluster center, recompute the cluster centers, and iterate, stopping when no assignment changes.

The simple clustering algorithm of Fig. 3 is defined as the Basic Clustering Algorithm (BCA), and simulation results involving its use in cluster compression are presented later in this paper. The repetitive assignment computations for the BCA are very simple, and the algorithm can be structured in a highly parallel manner, making very high data rates feasible. The only supervision required in this clustering is the choice of how many clusters are to be obtained for each set of measurement vectors. Even when the number of clusters for each data set is constant, the self-scaling characteristics of clustering provide a good multidimensional quantization of spectral intensity space.
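A compact software rendering of the Basic Clustering Algorithm of Fig. 3 follows: only the number of clusters is supplied, initial centers are taken from the data, and the assign/recompute loop runs until no vector changes cluster. This is a generic sketch of the iterative scheme (essentially the familiar k-means loop), not the parallel hardware structure discussed in the text, and the initialization choice is an assumption of the example.

    import numpy as np

    def basic_clustering(X, m, max_iter=50):
        """Basic Clustering Algorithm of Fig. 3.
        X is an (n, d) array of measurement vectors; m is the user-supplied
        number of clusters. Returns the cluster means and the feature map."""
        X = np.asarray(X, dtype=float)
        n = len(X)
        centers = X[np.linspace(0, n - 1, m).astype(int)]   # simple initial choice
        labels = np.full(n, -1)
        for _ in range(max_iter):
            # assign each X_i to the closest cluster center (Euclidean distance)
            dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            new_labels = dist.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break                                        # stop if no change
            labels = new_labels
            # recompute each cluster center as the mean of its members
            for j in range(m):
                members = X[labels == j]
                if len(members):
                    centers[j] = members.mean(axis=0)
        return centers, labels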
Any clustering approach could be used with the cluster compression concept, but the emphasis in this paper is on clustering approaches amenable to high data rate implementations. Furthermore, initial simulation results suggest that more sophisti- cated clustering is unnecessary. Cluster Features The features extracted from {X.}n 1 by clustering are typically a subset of the following: cluster -1 i= means, cluster variances per band, cluster covariances, number of vectors in each cluster, set of intercluster distances, etc. Any description of a cluster can be considered a feature, even the set of vectors themselves. These types of cluster parameters comprise a compact description of a group of measurement vectors in a statistical format which is both easily interpreted and directly meaningful for image approximation and computer classification applications. An example of a simple cluster feature is the cluster mean. Cluster Feature Rate. Let (T1m be the set of m cluster features which describe the sequence of j=1 n measurement vectors. Assume each T requires b bits of quantization per band. For example, in image approximation T * might be the j*th cluster mean with b equal to 6 (or 8) bits per band, or in classifica- tion uses Tj* might be the j*th cluster mean and variance with b equal to 9 (or 11) bits per band if the variance per band is defined to 3 bit resolution. The spectral rate Rspec is defined as the bits per pic- ture element per band (bpppb) needed to define the spectral characteristics of m.b (2) Rspec n A graph of Rspec vs. m for various values of n is shown in Fig. 4 with b assumed equal to six, which corresponds to using only cluster means as features. In some cases this description of {X07.1 may be all that is needed, and Rspec then represents the total rate in bpppb. FEATURE MAP SPATIAL DEFINITION Feature Map Rate In many applications it is desired to further approximate {Xi}7.1 by approximating each vector individually in terms of the cluster features. For example, if vector Xi, belongs to cluster j*, then X. could be approximated by T.*, which might be simply the mean of cluster Y. Alternately, X1. could _q* _* be approximated by a combination of ITA1.1. The feature map sequence {yk4=1 defines the approximation SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information / 125 Approved For Release 2001/09/05: CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 , 2 4 1.6 Fig. 4. Graph of spectral rate per band vs. number of features for various measurement vector sequence lengths with 6 bits per band per feature. 4 3 cL 2 et. eL 1 0 R3 ood=1 /179/"P9------"I spat -- Rspat 2 4 8 in 16 d=4 Fig. 5. Graph of spatial rate and prac- tical spatial rate per band vs. number of features for 1 and 4 dimensional measure- ment vectors. of each measurement vector and thus gives spatial defintion for the sequence {Xi17.1. The data rate in bpppb for fy is defined as Rspat. If each d-dimensional measurement vector is approximated by one of in primitives, then Rspat d spat d " log2m (3) Practical System. Rate. In practical systems, however, if in is not a power of two, the rate in (3) can only be approximated by representing extensions of the source. Let R3 pat be the rate achieved s in practice when groups of three of the original m alphabet source are represented by natural coding. 3 This representation of the source is easy to implement since m is small. Figure 5 shows Rspat and Rspat 3 vs. 
Feature Map Source Coding

Entropy Coding. A typical feature map sequence for an {Xi} described by four cluster features is shown in Fig. 6(a). In image data there is significant spectral correlation between spatially close picture elements, and correspondingly the differences between spatially adjacent elements in the feature map have a sample distribution function similar to that shown in Fig. 6(b). The entropy for the distribution of differences in the feature maps is on the average lower than log2(m). Thus entropy coding techniques offer an opportunity to reduce Rspat without any degradation in the reconstructed feature map.

The entropy for the distribution of differences for each {yk} is often used as a measure of performance for the encoding of the differences. This entropy represents a performance bound for the encoding of a zero memory sequence with that sample distribution. Similarly, averaging the entropies over any number of sequences can be used as a measure of average performance for encoding sequences of differences. The average entropy represents an unachievable bound on the average performance for encoding all such sequences with zero memory, since it ignores the overhead required to identify the optimum code assignment for each sequence of differences.

Entropy Coding Simulation Results. Experimental results for entropy coding of the feature maps are given in Fig. 7. Rspat is shown as a function of n and m for no coding, and for coding of differences using an adaptive variable length entropy coding technique defined in [11]. Also shown in Fig. 7 is the unachievable performance bound for coding differences with the assumption that the source has zero memory.
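The gain that Fig. 7 attributes to entropy coding can be estimated directly from a feature map: take first differences between adjacent elements, form their sample distribution, and compare the resulting entropy with the log2(m) bits of straight coding. The sketch below computes that zero-memory bound; it is not the adaptive variable-length coder of [11], whose codeword assignment and overhead are not reproduced here, and its names are invented for the example.

    import numpy as np

    def difference_entropy(feature_map, m):
        """Entropy (bits per element) of the differences between adjacent
        feature-map elements, returned alongside log2(m) for comparison.
        Divide both by the number of bands d to express the result in bpppb."""
        y = np.asarray(feature_map).ravel()
        diffs = np.diff(y)
        values, counts = np.unique(diffs, return_counts=True)
        p = counts / counts.sum()
        h = -np.sum(p * np.log2(p))
        return h, np.log2(m)

    if __name__ == '__main__':
        fmap = [3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 1, 1, 2, 2]
        print(difference_entropy(fmap, 4))   # toy feature map with m = 4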
The detection of spatial characteristics in {Xi4=1 has been simplified, since the clustering has reduced the d-dimensional image to an array of scalar numbers indicating the spatial occurrence of multidimensional features. When only spatial boundaries are of interest, the spatial encoder must preserve only those numbers in the feature map needed to identify the boundaries, and then the resulting differences in the feature map will encode to a lower spatial rate. Another simple example of spatial redundancy removal is the deletion of subsets of the feature map with reconstruction by linear interpolation. This approach might be used if spatial resolution is higher than necessary to observe texture required in a particular application. The spatial rate in this case is simply reduced by the subsampling factor. Coded Feature Map_ Rate. Let Rc pat represent the rate in bpppb for the encoded feature map sequence, s Cr. Define the compression ratio CR as CR - R spat Rs pat (4) SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information / 127 Approved For Release 2001/09/05: CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 If only natural coding of the source extension is used, the CR is nearly one. In general, the rate for the coded feature map sequence Cr is log,m R ? ` spat d.CR ' BASIC CLUSTER COMPRESSION ALGORITHM Total Rate (s) A basic configuration of the Cluster Compression Algorithm (CCA) is depicted in Fig. 8. The input to the CCA is the user supervision, which for a fixed rate mode may be only the constant number of clusters to be used for every input sequence. For a quality controlled mode, the supervision input consists of simple threshold values which control the splitting and combining of clusters in the adaptive clustering, thereby determining the number of clusters needed for each input sequence. The CCA output for each input sequence consists of the spectral and spatial definitions for which the rates are defined respectively in (2) and (5). Thus the total rate Rtot is given by b-m log2m R = R + Rc = + tot spec spat n d.CR ' (6) Equation (6) is still valid if the entropy coder is not used, but then CR is approximately one. Note that either Rspec or Rspat could be zero in some applications, but in general Rtot is a mixture of both spectral and spatial definitions. Let Rtot in (6) be equal to a constant Rot, and express n as a function of m to get where n = b.m/Eqot log2m/(d.CR)j (7) * log2m Rtot > U7717 (8) A plot of n vs. m in (7) for a constant Rt0t=1 is given in Fig. 9 with b=6, d=4 and CR=1 and 2. This plot demonstrates that a given Rtot may be approximately obtained by many combinations of m, n and CR, and similarly there is much flexiblity in choosing the division of Rtot between the spectral and spatial definitions. SENSOR ?1 1=1 CLUSTER Rspec ASSIGN FEATURE TO EACH X. ENTROPY CODER USER SUPERVISION CLUSTER FEATURES Rspat ENCODED FEATURE MAP Fig. 8. The Basic Cluster Compression Algorithm Rtot 128 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 7- 6 1000 ? 5 4 n 1.' 500w ? 3 CR=1 ? 250- ? CR=2 125. 64. ? ? ? ? ? ..... 111 at i I I 2 1 i ! 1 - 2 4 6 8 10 12 14 16 Fig. 9. Plot of n vs. m for b=6, d=4 and Rtot=1 .2 Fig. 10. Performance vs. Rate UNCODED CCA 1. n=36 2. n=100 3. n=576, 1024 4. n=256 .4 .6 .8 1.0 1.2 1.4 Rtot (bpppb) %MSE vs. 
Rtot for various values of n for the uncoded CCA In terms of implementation it is desirable to keep both in and n small, since clustering requires more computations and memory when in and n are larger. However, one must also question the comparative impact on performance of various (m, n, CR) combinations which have the same Rtot. Performance might be measured subjectively in terms of image appearance, or quantitatively in terms of percent mean square error (OISE), or classification accuracy. Some pictures are provided at the end of this paper for subjective evaluations of image quality, but at this point a quantitative measure of quality is desired. Classifica- tion accuracy is an important quantitative measure of image quality for many ERTS applications. However, classification accuracy is dependent on the classification technique, as well as the compression tech- nique, and often significant accuracy improvements can be obtained by tailoring the classification tech- nique for each specific compression technique. Results of jointly investigating corresponding compression and classification techniques will appear in a later publication. Percent MSE is often not a good per- formance criteria when comparing across different compression techniques, or different images, but %MSE is generally meaningful for comparing the performance from options on the same compression technique and same test image. Thus %MSE is used here as a performance criteria in comparing (m, n, CR) combinations. Let X represent the original d-dimensional image data, and let X represent the reconstructed approximation of the same image data after compression. Then the MSE between the original image and the approximate image is the expected value of the square of the Euclidean distance between X and X. MSE = Ell X - X 112] (9) The average energy, e in the original image is defined in terms of the variance in the ith spectral band, af, i=1,2,...,d. Thus the %MSE is defined by (10) SPIE Vol 66 (1975) Efficient Transmission of Pictorial Information /129 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Simulation Results Uncoded CCA Various configurations of the CCA were simulated on the test image shown in Fig. 14(a). This test image is a 256 x 256 picture element subset from a 4-band ERTS image of the Verde Valley. The purpose of these simulations is to obtain a measure of performance vs. rate as a function of the (n, m, CR) combina- tions. The first simulations used the Basic Clustering Algorithm (BCA) of Fig. 3, but without coding of the feature map. This corresponds to determining performance vs. rate as a function of (n,m) pairs for CR=1. The performance of this uncoded CCA in terms of %MSE vs. Rtot for various values of n is plotted in Fig. 10. For each n shown, all square subsets of n picture elements in the test image were compressed using various values of m. For example, for n=36 the test image was processed four times as follows: two features (m=2) per each subset of 6 x 6 (n=36) picture elements, three features (m=3) per 6 x 6, four features (m=4) per 6 x 6, and five features (m=5) per 6 x 6. These four simulation results were then plotted in Fig. 10 and a smooth curve was drawn between them. The cluster feature used for this image approximation application was only the cluster mean, and each cluster mean was defined by either 6 or 8 bits per band (b=6 or 8). 
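A compact sketch of this per-block procedure is given below. A generic k-means style clustering stands in for the Basic Clustering Algorithm of Fig. 3, the cluster mean is the only feature retained, and the 4-band block values are hypothetical; the %MSE follows the definitions given above (mean squared Euclidean error normalized by the sum of the per-band variances).

    import numpy as np

    def cluster_block(block, m, iterations=10):
        """Crude k-means stand-in for the clustering step: block is an (n, d)
        array of d-band pixels; returns m cluster means and the feature map of
        nearest-mean indices."""
        rng = np.random.default_rng(0)
        means = block[rng.choice(len(block), size=m, replace=False)].astype(float)
        for _ in range(iterations):
            dist = ((block[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            for k in range(m):
                if np.any(labels == k):
                    means[k] = block[labels == k].mean(axis=0)
        return means, labels

    def percent_mse(original, reconstructed):
        """%MSE: mean squared Euclidean error normalized by the sum of the
        per-band variances (the average energy e) of the original data."""
        err = ((original - reconstructed) ** 2).sum(axis=-1).mean()
        energy = original.reshape(-1, original.shape[-1]).var(axis=0).sum()
        return 100.0 * err / energy

    # Hypothetical 4-band data: one 6 x 6 block (n = 36), m = 2 features.
    rng = np.random.default_rng(1)
    block = rng.integers(0, 64, size=(36, 4)).astype(float)
    means, labels = cluster_block(block, m=2)
    recon = means[labels]              # substitute the nearest cluster mean
    print("%%MSE for this block: %.2f" % percent_mse(block, recon))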
Therefore, the image approximation consisted of substituting the closest cluster mean for each picture element. For every simulation with a given n and m, the %MSE between the original and reconstructed image was calculated from (9)-(11), and Rtot was calculated from (6). For n=576 and n=1024 the performance curves were essentially identical. The results in Fig. 10 show that for the selected source data the uncoded CCA peaks in performance for all Rtot at n approximately equal to 256. Similar results have also been obtained from simulations with other test images. Coded CCA The same simulations that were conducted for the uncoded CCA (CR=1) were also conducted with entropy coding of the feature map. This simulation configuration is called the coded CCA. The purpose of these simulations was to measure performance vs. rate as a function of (m,n) pairs when CR is determined by the entropy coding of the feature map differences. The entropy coding technique used is that which was discussed earlier and which is defined in [ll]. Basically the encoder adaptively assigns variable length codewords to the differences between adjacent elements of the feature map, which results in reducing the average rate while allowing the feature map to be exactly reconstructed. All overhead rate costs have been included in the compression ratio results for the entropy coding. The spatial rate reduction, or CR due to entropy coding was shown in Fig. 7 to be greater for larger n. Thus the performance curves for the coded CCA will shift more to the left of the uncoded CCA curves when n is larger. This effect is observed by comparing the coded CCA curves in Fig. 11 with the uncoded CCA curves in Fig. 10. The coded CCA continues to increase in performance as n increases, but it is near peak performance for n=256. Adaptive CCA In the next simulations, the Adaptive Clustering Algorithm (ACA), which was discussed under Cluster Approaches, was used to adaptively determine the number of clusters to use for each sequence of n picture elements. The total rate for this CCA configuration is approximately the average of all total rates for each of the sequences of n elements, since the overhead to identify the number of clusters for each sequence is very small. The advantage of the adaptive CCA is that a lower average value of in can typically be used to obtain the same image quality. The performance and rate of the adaptive CCA are now dependent on the criteria for increasing or decreasing the number of clusters, as well as the combination of (n, m, CR). The performance of the adaptive CCA was observed to be less dependent on the size of n, with all results for n of ten or greater nearly on the same performance curve. Although Rtot was sensitive to 130 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 changes in the criteria for determining the number of clusters used per sequence, the performance vs. rate trade-off was very insensitive to changes in that criteria. There is a significant performance gain in the adaptive CCA relative to the coded and uncoded CCA. This gain is shown in curves 3, 4 and 5 of Fig. 12. However, the performance gain due to adaptability is dependent on both the image data and user application. Hadamard and Fourier Transform Techniques Tests were performed to compare the MSE performance of the CCA with some other compression techniques which are well known. 
The two techniques used for comparison are the well known Hadamard and Fourier transform techniques. A discussion of Hadamard and Fourier transforms in image coding is given in [12] and [13]. Both techniques were used in an adaptive form, and the compression was done on each band of the test image independently. Some improvement in performance could be obtained by modifying the technique to work jointly on all four bands. The Hadamard transform technique involved taking the two dimensional Hadamard transform of all 8 x 8 subsets of the image. Then the 64 coefficients were placed in 9 zones, each of which had the quantization and corresponding bit rate per coefficient based on the variance, or energy of the coefficients in the zone. The Fourier transform technique was similarly adaptive, but in addition a symmetrical transform approach was used which doubly folded each data subset to provide horizontal and vertical symmetry. This symmetry reduces the intensity discontinuities at the boundaries due to the low pass filtering. The %MSE vs. total rate for these two adaptive transform techniques is shown in curves 1 and 2 of Fig. 12 along with the CCA performance curves. Simulation Summary In Fig. 13 a summary of the above simulation results is given by plotting Rtot required for 2 %MSE vs. n for the various compression configurations. The data points plotted in Fig. 13 were interpreted from the smooth curves in the previous figures, as well as from other simulation results. These curves point out a substantial MSE penalty for n 0 sgn (a) = -1 a < 0 and / is expressed in binary form as the set 1 = (1m-11m-2 1110) with /r (0,1) Slant Transform(4) (2) (3) (4) For n = 4 we have: 1 1 1 1 1 [T]4 =7 3/ 5 1/ 5 -1/ 5 -3/ 5 (5) 1 -1 -1 1 1/5 -3/5 3/5 -1/ 5 Larger arrays are constructed iteratively. By design n e 2' to allow implementation in a "fast" algorithm. The resulting set of coefficients [C]n representing the partitioned image data must now be requantized into a coefficient set [C'] 0 according to some quantization model to produce the desired bit rate reduction. It is this step which is perhaps the most important in any compression algorithm. For example one can perform "zonal" coding by partitioning the transform array and assigning a bit rate to each coefficient within a partition based on rate distortion theory (7) or pre-assigned rates based on subjective evaluations of the decoded imagery. Another strategy is to threshold the coefficients and quantize only those elements greater than the thres- hold value. Utilization of adaptive techniques can also be applied. For example, bit assignments can be varied as a function of image statistics over the partition. For this case, the output bits should be buffered to insure a constant bit rate for transmission. The code word bit stream representing the sequential source coding of each image partition, along with any overhead bits required for the decoding process such as utilized in certain adaptive schemes, is then channel encoded and transmitted. At the receiver the inverse operations are performed and the coefficient set [C']n reconstructed. The inverse transform operation [II]n = [T]t[C'ln[T]n (6) then yields an image partition [I']n which should be a faithful reproduction of [I]n. These partitions are then used to form the composite reconstructed image [1'41. Before going on to discuss the comparison of I and I', consider another operation which can be accomplished before source decoding. 
The process discussed above can be related to a low pass filtering operation (assuming that the coefficient requantization effects can be ignored) and therefore one can investigate the possibility of applying a high frequency boost or other transfer function compensation to the coefficients of [Ct]n for each partition before inverse transformation. Such an ac-4r would, for the case of a cosine transform, correspond to the block-mode filtering of digital imagery except that there is no overlap between partitions to reduce edge effects. Thus one would form the scalar multiplica- tion: 198 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05: CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 [v]n = FICI]n (7) where Fn is an array of coefficient weights. The utility of such a technique is demonstrated in Figure 3. The original image (a) is digitized at 10 bits/pixel and N = 512 while (b) is the reconstructed image after a compression to 2.6 bits/pixel using the cosine transform with n = 16. Figure 3(c) is the recon- structed image where the coefficient weighting array F(k,/) = 1 + fik-1)2 + (A-1)2/15 for kyt = 1,2,...16 was first applied to the quantized coefficients of each partition before reconstruction. A radial section of this 2:1 boost filter is shown in 3(d). The improvement is obvious and in fact the filtered image has been subjectively rated superior to the original. The MSE As A Quality Metric The most widely used measure of image quality for coding analysis is the normalized mean square error (ISE) between the coder input (I) and output (I') images. MSE = 1I(i,j) - I' (i,j)12/ li(i,J)12 i,j i,j n 10 = :E: I 12 II(12j)41(1)3)12 /I: II: 1I(i,j)12 j..0 where indicates a sum over all partitions. The orthonormality of the transform [T] allows us to restate (8) as: MSE = Ep k;t I/I:p :12..? k,t (8) (9) One can therefore relate the overall MSE to the errors introduced on the coefficients C(X,t) by their requantization and/or deletion. The problem of optimizing the quanization algorithm relative to the MSE and subjective quality assessment has been considered previously.0) We have found that decoded image errors due to coefficient requantization in zonal coding schemes are more objectional than the smoothing effect introduced by high frequency coefficient delation (low pass filtering). To develop a useful requantization design strategy which is consistent with subjective quality measures, we separate the two major components of the MSE: where: MSE MSE = MSE + MSEf IE i c(keld)I2 /E E kd'td p 1 k,/ I c(k,01 2 for deleted coefficients C(kd,td) is the error due to filtered or deleted coefficients and MSE = MSE - MBEf In E? co?0-c,(k,012 - 1C(kd,/d)12 kd'd \ill: t IC(k,L)121 p (10) is the error due to coefficient requantization. The subjective effects are considerably different for the two error terms. The coefficient requantiza- tion generates a "noise-like" distortion and artifacts while deletion of the high frequency terms actually reduces noise, although at the expense of resolution. An obvious observation is that both quantization and filtering effects are undesirable yet unavoidable parts of the compression process, and it is felt that the compression algorithm designer can utilize MSEf and MSE in order to optimize his algorithm. 
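To make the two error components concrete, the short Python listing below forms an orthonormal DCT of one synthetic 8 x 8 partition, deletes the high frequency coefficients beyond a single threshold zone, uniformly requantizes the remainder, and evaluates MSE, MSEf and MSEq in the transform domain. It is a sketch only: the block data, the single zone boundary and the bit allocation are arbitrary stand-ins for the experiment's nine-zone design.

    import numpy as np

    def dct_matrix(n):
        """Orthonormal DCT-II matrix, used here simply as a convenient
        orthonormal transform [T] so that Parseval's relation holds."""
        k = np.arange(n)[:, None]
        i = np.arange(n)[None, :]
        t = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
        t[0, :] = np.sqrt(1.0 / n)
        return t

    def mse_split(block, kept_bits=5, zone_limit=8):
        """Return (MSE, MSEf, MSEq), normalized by the block energy, when
        coefficients with k + l >= zone_limit are deleted (filtering error)
        and the remaining coefficients are uniformly requantized."""
        n = block.shape[0]
        t = dct_matrix(n)
        c = t @ block @ t.T                       # forward transform [C] = [T][I][T]'
        k, l = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        deleted = (k + l) >= zone_limit
        c_kept = np.where(deleted, 0.0, c)
        step = (np.abs(c_kept).max() + 1e-12) / (2 ** (kept_bits - 1))
        c_quant = np.where(deleted, 0.0, np.round(c_kept / step) * step)
        energy = (c ** 2).sum()
        mse_f = (c[deleted] ** 2).sum() / energy          # deleted coefficients
        mse_q = ((c_kept - c_quant) ** 2).sum() / energy  # requantization error
        mse = ((c - c_quant) ** 2).sum() / energy         # equals mse_f + mse_q
        return mse, mse_f, mse_q

    rng = np.random.default_rng(0)
    block = rng.normal(128.0, 30.0, size=(8, 8))   # hypothetical 8 x 8 partition
    print("MSE = %.4f, MSEf = %.4f, MSEq = %.4f" % mse_split(block))

Because the transform is orthonormal, the printed values satisfy MSE = MSEf + MSEq, which is the decomposition used in the design procedure that follows.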
Assuming an optimum quantization procedure is utilized for individual coefficients, a good statistical model is 2 available for transform domain variances, the transmission rate is fixed, and there are a total of K = n SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information / 199 Approved For Release 2001/09/05: CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 (a) Original (I) image at 10 Bits/Pixel Reconstructed decoded image with high frequency boost over each partition (b) Reconstructed decoded (I') image after compression to 2.6 Bits/Pixel 2 1 10 16 Coefficient ( k or 10 (d) Radial section of high frequency boost filter FIGURE 3. Compression/Filtering Experiment Using Cosine Transform and 16 x 16 Partitioning 200 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 coefficients per sub-block of which K are quantized and Kf are deleted (so K = Kf +), then the following procedure can be used: 1. Set Kf such that MSE MSE q f 2. Compare I and I. If I' contains negligible noise, iteratively decrease Kf. 3. If I contains excessive additional noise and/or artifacting, iteratively increase Kf. Step (1) should result in a minimum MSE but steps (2) and (3) can result in improved visual representation by the tradeoff of quantization noise and resolution loss. Experiment A computer program has been developed for the simulation of various transform coding methods and it has been used to generate data for actual imagery. Figure 4 illustrates the manner in which this package was utilized in an experiment to evaluate the correlation of MSEf and MSE with subjective image ranking. 2, The image used for this study is shown in Figure 3(a) and consists of 512 cN=512) pixel values, each quantized to 10 bits. Analysis of this array indicates that there are 9 bits of useful image data and furthermore its structure is representative of what might be encountered in a low altitude remote sensing application. This image was partitioned into one of two sizes (n = 8 or 16) and one of 3 transforms applied (Cosine, Slant, Hadamard). Selected coefficients were then requantized and others deleted to give bit rates (R) of .64, 1.4, 2.6, 4.6 bits/pixel for n = 16 and rates of 1.28, 2.6, 4.3, 6.9 for n = 8. The requantization scheme consisted of first looing at the coefficients in bands parallel to the opposing diagonal of [C]n (Figure 5(a)). We, and othersk111, have observed that iso-variance regions of 2: [C] P n tend to line up quite well with this zonal approach. After determining the coefficient variance2 for the 5th zone one of the following two methods was applied. 1. The maximum absolute value 1 CI max i in each band was determined and the coefficients in that zone uniformly requantized between + ICI ? C was quantized to 10 bits for each zone. ? max max 2. The variance a2 was used to define a gaussian quantizer for that band, cr2 was quantized z z to 10 bits for each zone. The actual bit assignments for each zone to give the above bit rates are shown in Table 1. In addition to these zonal coding experiments, an adaptive approach as used in reference 10 was utilized for the cosine transform with n = 16. In this case the coefficients of [C], were first ordered into a pe-dimensional sequence of 256 values. The analog for n = 8 is shown in Figl;re 5(b). 
The coefficients C(m) for in = 1 to 256 were then recursively coded using a variance estimate e- based on previously coded coefficients, and a gaussian quantize. The coded coefficients 'a(m) are determined by the quantiz- ation operator Q such that Ckm) = /t1 C(m)) and the next estimated variance is am+1 = m' We2 + (1-W) 6(m)2 for a pre-determined weight W. The primary assumptions of such a technique are that the C(m) are uncorrelated and the 02 are highly correlated. This particular coder was operated at 1.55 bits/pixel. Each of the above 49 coded data sets was decoded and the resulting images [It]N were both numerically and subjectively compared to [I]N. The numerical comparison consisted of a computation of MSE, MSEf and MsE for each image [I']N while the subjective analysis consisted of a ranking of the displayed imagery according to a variety of tasks aimed at "exploiting" the imagery. Care was taken to insure that the subjective ranking was somehow tied to how the data would be used. The displayed imagery represented a wide range of quality and bit rates as demonstrated by Figures 6 and 7. Figure 6(a) is the decoded image for the cosine transform with n = 16, R = 2.6 bits and uniform requantization while 6(c) is the same transform with R = .64 bits. The corresponding scaled absolute value difference images (al[I], - [I'JNI) are daown in Figures 6(b) and 6(d). The scaling factor a was set to 15 to enhance the visua/ impact of the error distribution. One can immediately note the correlation of errors with image edges and other regions of high spatial frequency content. Since image information may be located at or near edges, it is a natural step to look at the performance of the adaptive scheme. It should be sensitive to the frequency content (or sub-block "business") and therefore tend to distribute the errors over the entire scene rather than at edges. Figure 7 demonstrates that phenomenon. Figure 7(a) is the decoded image for R = 1.4 bits, n = 16 and gaussian requantization of the cosine transform coefficients, while 7(b) is the scaled (a = 15) difference image. Figure 7(c) is the decoded image for the adaptive case (R = 1.55) with 7(d) the scaled difference image. Comparison of 7(b) and 7(d)illustrates the superior performance of the adaptive approach in decorrelating coding errors and image structure, while providing an overall MBE roughly half that of the non-adaptive case for nearly equal bit rates. The results of the computed objective (MSE) metrics are shown in Table 2 for all 49 cases used in the experiment. The cosine transform is clearly the superior performer when one compares the metrics for equivalent quantizer, R and n. Also the gaussian quantizer is superior to the uniform case, but the margin is suprisingly narrow. Finally, the apparent ambiguity in MSEf and MSE between Slant and Hadamard transform cases should be noted. This is due to the different convergence properties of the higher sequency Slant and Hadamard terms. A regression analysis was performed on each ranking of the image data and the corresponding set of metrics, this being done for all such rankings. Table 2 shows the resulting correlation coefficients for each MSE metric. SPIE Vol. 
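A minimal Python model of the recursive variance estimation used by this adaptive coder is sketched below. It is an illustration only: a uniform quantizer scaled by the running standard deviation stands in for the gaussian quantizer used in the experiment, and the weight W, the rate and the coefficient sequence are arbitrary.

    import numpy as np

    def adaptive_code(coeffs, bits=3, weight=0.75, sigma0=10.0):
        """Recursively quantize a 1-D coefficient sequence C(m).  Each C(m) is
        quantized with a step scaled by the current variance estimate, and the
        estimate is then updated from the coded value:
            sigma2(m+1) = W * sigma2(m) + (1 - W) * Chat(m)**2
        Since the update uses only coded values, the decoder can track it."""
        sigma2 = sigma0 ** 2
        coded = np.empty_like(coeffs, dtype=float)
        levels = 2 ** (bits - 1)
        for m, c in enumerate(coeffs):
            step = max(np.sqrt(sigma2), 1e-6) / levels
            q = int(np.clip(np.round(c / step), -levels, levels - 1))
            coded[m] = q * step
            sigma2 = weight * sigma2 + (1.0 - weight) * coded[m] ** 2
        return coded

    # Hypothetical ordered coefficient sequence from one 16 x 16 partition.
    rng = np.random.default_rng(2)
    c = rng.normal(0.0, 1.0, 256) * np.exp(-np.arange(256) / 60.0) * 100.0
    chat = adaptive_code(c)
    print("rms coding error: %.2f" % np.sqrt(np.mean((c - chat) ** 2)))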
FIGURE 4. Transform Coding Evaluation Experiment. (Block diagram: 512 x 512, 10-bit input image; partitioning into n = 8 or 16 subblocks [I]n; Cosine, Slant or Hadamard transform; coefficient requantization and deletion at R = .64, 1.4, 2.6, 4.6 bits/pixel for n = 16 and 1.28, 2.6, 4.3, 6.9 for n = 8 with uniform, gaussian or adaptive (R = 1.55) quantization; inverse transform and reformatting to the output image; computation of MSE, MSEf, MSEq; hardcopy and difference images for subjective ranking.)

FIGURE 5. Definition of Transform Domain Zones and Coefficient Ordering for 8 x 8 Partitioning. (a) Coefficient zones (z). (b) Coefficient ordering to obtain C(m).

FIGURE 6. Cosine Transform Coding Examples with Uniform Quantizer. (a) Reconstructed decoded (I') image for n = 16, R = 2.6 bits/pixel, MSE = .073%. (b) Difference image a|I - I'| for R = 2.6 and a = 15. (c) Reconstructed decoded (I') image for n = 16, R = .64 bits/pixel, MSE = .59%. (d) Difference image a|I - I'| for R = .64 and a = 15.

FIGURE 7. Comparison of Adaptive and Non-Adaptive Coders Using Cosine Transform. (a) Reconstructed decoded (I') image for n = 16, R = 1.4 bits/pixel, gaussian quantizer, MSE = .17%. (b) Difference image a|I - I'| for R = 1.4 and a = 15. (c) Reconstructed decoded (I') image for the adaptive coder, n = 16, R = 1.55 bits/pixel, MSE = .08%. (d) Difference image a|I - I'| for the adaptive case and a = 15.

Bits assigned to zones z = 1, 2, 3, ... in order; zones not listed were deleted:

    n = 16  R = 0.64:  10 9 8 8 6
    n = 16  R = 1.4:   10 9 8 8 6 4 4 4 3 3
    n = 16  R = 2.6:   10 9 8 8 6 4 4 4 3 3 3 3 3 3 3 3
    n = 16  R = 4.6:   10 9 8 8 6 4 4 4 3 3 3 3 3 3 3 3 3
    n = 8   R = 1.28:  10 9 8
    n = 8   R = 2.6:   10 9 8 8 6
    n = 8   R = 4.3:   10 9 8 8 6 4 4 4
    n = 8   R = 6.9:   10 9 8 8 6 4 4 4 3 3 3 3 3 3 3 3

TABLE 1. Zone Bit Assignments for Experiment Bit Rates (R) and Partition Size (n).
TRANSFORM COSINE SLANT HADAMARD n R QUANTIZER MBE msEf MSE MSE MSE MSE MSE M SEf MSE 16 .64 Uniform .59 .579 q .011 * .67 .26 f .657 .23 q .013 .03 .69 .28 .676 .25 q .014 .03 1.4 .19 .17 .020 2.6 ti .073 .030 .046 0 .027 .030 .096 .040 .o6o o .036 .040 .102 .06 .068 o .034 .06 4.6 II 16 .64 Gaussian .55 .542 .008 .62 .609 .011 .63 .618 .012 1.4 " .17 .155 .014 .24 .219 .021 .25 .231 .019 2.6 fl .070 .049 .021 .09 .062 .028 .088 .062 .026 4.6 fl .028 0 .028 .038 0 .038 .035 0 8 8 1.28 Uniform .47 .465 .005 .49 .483 .007 .50 .493 .035 .007 , 2.6 u .123 .112 .011 .16 .138 .022 .17 .149 .021 4.3 II .047 .023 .024 .060 .030 .030 .061 .033 .028 6,9 ti .028 0 .028 .030 0 .030 .028 0 1.28 Gaussian .45 .446 .004 .46 .454 .006 .46 .455 .028 .005 2.6 i, .10 .092 .008 .11 .095 .015 .12 .109 .011 .0 .018 .012 .040 .020 .020 .038 .020 .018 6.9 ht .021 0 .021 .024 0 .024 .022 0 .022 1.55 Adaptive .080 .046 .034 TABLE 2. Objective Metrics for Coder Evaluation Experiment SPIE Vol. 66(19751 Efficient Transmission of Pictorial Information /205 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 METRIC CORRELATION COaTICIENT MSE .73 M mI, .75 MSE, .86 TABLE 3. Objective MSE Metrics and their Correlation with Subjective Image Ranking Conclusions The development of a viable and general objective meansure of image quality is a difficult task. One should try and identify the salient features of a particular image chain model and encorporate them in some fashion into a particular quality criterion well-suited and in harmony with the image and its intended use. We have taken such a track in the development of a refined MSE error as it applies to the judgement of imagery which has been transform coded. The noise introduced by coefficient requantization (MSE q) has been found to be a superior measure of quality than the traditional overall MSE. It is hoped that future work in this area will yield still better metrics and aid in the efficient modelling of such data transmission systems. Acknowledgements The authors wish to acknowledge the programming support provided by Mr. John N. Hamilton. References 1. A. G. Tescher and J. R. Parsons, "Cross-Spectrum Error Criterion as an Image Quality Measure," Applied Optics, Vol. 12, June 1974, p. 1460-1465. 2. P. A. Wintz, "Transform Picture Coding", Proc. IEEE, Vol. 60, No. TO, July, 1972. 3. W. K. Pratt, J. Kane and H. C. Andrews, "Hadamard Transform Image Coding," Proc. IhhE, Vol. 57, Jan. 1969. 4. W. K. Pratt, W. Chen and L. R. Welch, "Slant Transform Image Coding," IEEE Trans. Comm., Vol. COM-22, No. 8, Aug. 1974. 5. N. Ahmed, T. Natarajan and K. R. Rao, "Discrete Cosine Transform," IEEE Trans. Computers, Vol. C-23, pp. 90-93, Jan. 1974. 6. C. Cardot, "A New Eye on Walsh Functions", Proc. 1972 Symposium on Applications of Walsh Functions, AD-744-650, pp. 393-400. 7. L. D. Davisson, "Rate Distortion Theory and Applications," Proc. IEEE, Vol. 60, July 1972. 8. B. R. Hunt, "Block-Mode Digital Filtering of Pictures", Mathematical Biosciences, Vol. 11 (1971), p. 343-354. 9. M. Tasto and P. A. Wintz, "Image Coding by Adaptive Block Quantization," IEEE Trans. Comm., Vol. coni-19, pp. 957-972, Dec. 1971. 10. A. G. Tescher, "Adapative Intra-and Inter-Frame Image Coding Utilizing Fourier Techniques," USC Image Processing Research Semiannual Technical Report No. USCEE 444, pp. 24-33. 11. W. B. Scheming, "Digital Image Transform Encoding," RCA Corporate Engineering Services, Report RE-20-1-10, 1973. 
206 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 A COMPARISON OF HARDWARE IMPLEMENTATIONS OF THE HADAMARD TRANSFORM FOR REAL-TIME IMAGE CODING Stephen C. Noble Air Force Avionics Laboratory Wright-Patterson Air Force Base, Ohio 45433 Abstract Several different hardware configurations for implementing the Hadamard transform in real-time are discussed. The discussion is referenced to a 64 point (8 x 8) transform that is used in image coding for bandwidth reduction. A successive implementation of two 8 point transformations with intermediate random access memory using the straight forward N2 computation is compared to implementations of the Fast Hadamard Transform (FHT). The speed of computing transform coefficients is compared to the hardware requirements of the various techniques. The relationship between computational speed and the resolution and frame rate of television imagery that may be transform encoded using the different methods is illustrated. It is shown that for picture element (pixel) rates up to 5 MHz the N2 computation is the more economical to implement. Introduction Transform coding for the efficient transmission of images has been a subject of inter- est in recent years. In a general article on the subject, Wintz (1) describes the com- monly used procedure of subdividing the image into subpictures and then computing the transform of the pixels in each subpicture separately. The subpicture size referred to in this paper is 8 x 8 pixels. The basis for the interest in transform coding is that, for a variety of transforms, picture information is concentrated into fewer transform coefficients than the pixel amplitudes that are transformed. By efficiently assigning bits to the transform coeffi- cients a substantial reduction in the number of bits over those required for conventional pulse code modulation (PCM) of the pixels is obtained. Reductions from 6 bits/pixel PCM to 1-2 bits/pixel (average over the subpicture) in the transform domain is typical. Of the two-dimensional transforms, the Hadamard transform allows for the fastest im- plementations in real-time.(2) Recent advances in charge coupled device (CCD) technology have made possible the use of the cosine transform at pixel rates in the 1-5 MHz range.(d) The differences in the mean square error performance of the various transforms when used to encode images having exponential correlation statistics are in the range of .05% at the .25% error 1evel.(4) In choosing a transform for image coding the paramount question is one of cost for the given speed requirements. The Hadamard transform has been implemented in real-time(5) using TTL hardware in the pipeline FHT configuration. For cases where a slightly lower resolution of 256 elements per scan line or lower frame rates of 7.5 frames per second are acceptable, a combination of two 8 point Hadamard transforms with intermediate random access memory (RAM) may be used. This combination requires only two arithmetic accumulators. The Hadamard Transform The following discussion will address the 8 point Hadamard transform. Larger point transforms are common, but for purposes of illustration the 8 point transform is easier to use. 
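For reference, the natural-order Hadamard matrix used below can be generated recursively, and the sequency (number of sign changes) of each row counted, with a few lines of Python; the printout format is an arbitrary illustration.

    import numpy as np

    def hadamard(n):
        """Natural-order (Sylvester) Hadamard matrix, n a power of two:
        H_2N = [[H_N, H_N], [H_N, -H_N]], starting from H_1 = [1]."""
        h = np.array([[1]])
        while h.shape[0] < n:
            h = np.block([[h, h], [h, -h]])
        return h

    h8 = hadamard(8)
    for row in h8:
        sequency = int(np.sum(row[:-1] != row[1:]))   # number of sign changes
        print("".join("+" if v > 0 else "-" for v in row), " sequency", sequency)

The rows appear in natural order with sequencies 0, 7, 3, 4, 1, 6, 2, 5, which is the labeling shown in equation (2) below.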
Furthermore, there is evidence that there is no substantial improvement in performance for the Hadamard transform for subpicture sizes greater than 8 x 8 pixels.() The 64 point two-dimensional transform may be obtained by cascading two 8 point transforms. The 8 point Hadamard transform is described by eight equations of the type shown in equation (1), which is the sequency 5 Hadamard coefficient.

    H5 = P0 - P1 - P2 + P3 - P4 + P5 + P6 - P7                    (1)

As can be seen, each Hadamard coefficient is generated by a combination of sums and differences of the eight pixel amplitudes. Equation (2) is the matrix operator that defines the addition and subtraction operations for all eight Hadamard coefficients.

                                       Sequency or
                                       No. of Sign Changes
              + + + + + + + +                 0
              + - + - + - + -                 7
              + + - - + + - -                 3
              + - - + + - - +                 4
    [H]8 =    + + + + - - - -                 1                   (2)
              + - + - - + - +                 6
              + + - - - - + +                 2
              + - - + - + + -                 5

The only operations required to implement the Hadamard transform in hardware are addition/subtraction and register shifts (multiplication/division by 2).

The Fast Hadamard Transform

The fast Hadamard transform takes advantage of intermediate results in the computation of coefficients to reduce the total number of computations required. The implementation is very similar to that of the fast Fourier transform, which requires complex multiplications.

The Pipeline Implementation

Fig. 1 illustrates the pipeline implementation of the FHT. The dashed line input to each arithmetic element designated with a minus sign (-) is the subtrahend input to the indicated subtraction. The figure is organized in the perfect shuffle configuration. Each stage of the computation is interconnected in the same way. The speed of this method is limited only by the speed of the arithmetic hardware used. Each symbol in Fig. 1 represents an arithmetic element together with a register. Arithmetic operation speeds on the order of 25 ns are available using emitter coupled logic (ECL) 10,000 series devices. For an 8 point transform this corresponds to a pixel transformation rate greater than 300 MHz. The system implemented at NASA-Ames Research Center(5) uses a 16 point pipeline Hadamard transform together with a multiplexed 4 point implementation to achieve a 128 megapixel per second transformation rate. The system uses TTL hardware operating at 125 ns for each arithmetic operation. Such speeds are required in special applications where it is not possible to process pixels at a steady rate. That such high speeds are not required may be made apparent by considering a 525 line 30 frame per second video signal that has been digitized to 512 pixels per line. The pixel rate is 512 x 525 x 30 = 8 MHz. Clearly, when it is feasible to buffer the video data properly, the number of arithmetic elements used may be traded against lower speed requirements.

The Iterative Implementation of the Perfect Shuffle

Since the interconnection pattern of the perfect shuffle is the same for all stages of the computation, the hardware can be reduced to one stage by feeding the output back to the input and cycling the operation 3 times. In general, the number of cycle times required is log2 N, where N is the number of points to be transformed. This cycling operation is illustrated in Fig.
2 with a feedback loop to show that the output of each arithmetic element is feed back to the corresponding point above the element. Every third operation new data is entered through the multiplexer (MUX). This configuration operates at one third the rate of the pipeline system, but requires only eight arithmetic elements. A pixel rate of 100 MHz can be achieved using ECL 10,000 series components. Using low power, Schottky TTL pixel rates of 20 MHz can be maintained. Based on the calculation in the preceeding paragraph, these speeds are still in excess of the 8 MHz required. The main reason for using such high speed processing has been the difficulty in fully buffering the pixel data. To transform 525 line interlaced video may require a field storage with access to 4 lines of video data in each of two fields. Once the required data is collected it is usually available at the 8 MHz rate from many outputs simultaneously. In the case of a 64 point (8 x 8) transform a total of 8 inputs at 8 MHz would result since the output of eight scan lines would have to be processed simultaneously. To avoid this requirement, eight lines of pixel data must be stored and read out one subpicture at a time. For 512 pixels by 8 lines this amounts to 4K 208 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80B00829A001100060001-0 Po P1 P2 P3 P4 P P6 P7 H0 H7 H3 H4 H1 H6 H2 Fig. 1 Pipeline Implementation of the FHT H5 storage with a read-modify-write speed of 125 ns to meet the requirements for 30 frame per second video. At 6 bits/pixel this amounts to 24 11< TTL RAM chips. In addition to the chip count, these high speed RAMS will consume 15-20 watts of DC power. An 8 Point FHT Using One Arithmetic Element Except for the high speed memory requirement a single ECL arithmetic element could perform the required 8 log 2 8 = 24 additions and subtractions fast enough to transform pixels at the required 8 MHz rate. At 25 ns for an addition operation the 24 operations could be completed in 600 no. This equates to 75 ns per pixel or a 13.3 MHz rate, which is more than adequate. Several very high speed ECL memories are required to manipulate the data to be operated on by the ECL arithmetic element. A possible sequence for that manipulation was outlined by Pratt.(2) An Economical Implementation for Low Frame Rates The preceeding section discussed several implementations of the FHT that operate at pixel rates in excess of 10 MHz. If the application is suitable for low frame rates (7.5 frames per second or less) or lower resolution (256 x 256 pixels per frame), then a simple implementation of a 64 point (8 x 8) transform can be achieved in real-time. Fig. 3 illustrates the use of two 8 point transforms using N2 = 64 computations each and a RAM to process the video data. SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information /209 Approved For Release 2001/09/05 : CIA-RDP80B00829A001100060001-0 FEEDBACK Pixels Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 1 P2 P3 P4 Ps P6 P7 HO H7 H3 H4 H1 H6 H2 Fig. 2 An Iterative Implementation of the FHT Mode ALU Accum. RAM H5 R4 Mode ALU Accum. Fig. 3 A Two-Dimensional Implementation of the Hadamard Transform Using Two ALU's and a RAM Transform Coef. 210 / SPIE Vol. 
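In software terms, the two-pass organization of Fig. 3 amounts to an 8 point transform along each row of a subpicture, an intermediate store, and a second 8 point transform along the orthogonal direction. A brief Python sketch of that data flow is given below; register timing, the ALU mode sequencing and the word lengths are ignored, and the pixel values are hypothetical.

    import numpy as np

    H8 = np.array([[ 1, 1, 1, 1, 1, 1, 1, 1],
                   [ 1,-1, 1,-1, 1,-1, 1,-1],
                   [ 1, 1,-1,-1, 1, 1,-1,-1],
                   [ 1,-1,-1, 1, 1,-1,-1, 1],
                   [ 1, 1, 1, 1,-1,-1,-1,-1],
                   [ 1,-1, 1,-1,-1, 1,-1, 1],
                   [ 1, 1,-1,-1,-1,-1, 1, 1],
                   [ 1,-1,-1, 1,-1, 1, 1,-1]])   # natural-order [H]8 of eq. (2)

    def hadamard_2d(subpicture):
        """64 point (8 x 8) transform as two cascaded 8 point transforms:
        first pass over the rows (N**2 adds/subtracts), intermediate store,
        second pass over the orthogonal direction."""
        intermediate = subpicture @ H8.T      # 8 point transform of each row
        return H8 @ intermediate              # 8 point transform of each column

    rng = np.random.default_rng(3)
    block = rng.integers(0, 64, size=(8, 8))  # hypothetical 6-bit pixels
    coeffs = hadamard_2d(block)
    # [H]8 is symmetric and [H]8[H]8 = 8 I, so applying the same two-pass
    # transform again and dividing by 64 (a register shift) restores the block.
    restored = hadamard_2d(coeffs) // 64
    assert np.array_equal(restored, block)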
Circuit Operation

Video data in the form of pixels enters R1. After the accumulation of 8 pixels they are loaded into R2 and recirculated in R2 8 times at a rate 8 times faster than the shift rate in R1. During each circulation of the 8 pixels the mode input to the arithmetic logic unit (ALU) provides the proper sequence of add (+) and subtract (-) instructions to the ALU to generate a Hadamard coefficient. At the end of each circulation the Hadamard coefficient is transferred from the accumulator into the RAM. The RAM is large enough to store 8 lines of transformed video data. This amounts to 4K words for 512 elements per line. The data is read from the RAM orthogonal to the orientation in which the data was stored. This procedure generates the two-dimensional Hadamard transform coefficients when the data from the RAM is processed through the second 8 point transform circuit.

Speed of Operation

The primary limitation in the speed of operation of this circuit is the RAM. A read-modify-write time of 500 ns will result in a 7.5 frame per second processing rate. Schottky TTL or ECL logic will meet the 62 ns operating speed required to implement the 8 point transform at this speed (8 operations x 62 ns = 500 ns). For this lower speed operation, less than 50 IC's are required to implement this two-dimensional Hadamard transform. The use of ECL to build this unit would result in a pixel rate of greater than 5 MHz. This would be 20 frames per second at a resolution of 500 x 500 pixels. The use of higher speed RAM's (200 ns read-modify-write) would be necessary to build the system. 4K RAM's are not generally available with this speed. An increase in the number of RAM IC's from 8 to 24 would be required.

Conclusions

Several methods of computing the Hadamard transform have been illustrated covering a wide range of computational speeds. In particular, a combination of two 8 point transforms using the straight forward N2 = 64 computation and a small RAM results in an economical implementation when lower frame rates or resolution are adequate to meet the requirement.

References

1. Wintz, P. A., "Transform Image Coding," Proceedings of the IEEE, Vol. 60, No. 7, pp. 809-820, July 1972.
2. Pratt, W. K., Kane, J., and Andrews, H. C., "Hadamard Transform Image Coding," Proceedings of the IEEE, Vol. 57, No. 1, pp. 58-68, January 1969.
3. Habibi, A., Pratt, W. K., Robinson, G., Means, R., Whitehouse, H., and Speiser, J., "Real Time Image Redundancy Reduction Using Transform Coding Techniques," 1974 International Communications Conference, June 1974.
4. Ahmed, N., Natarajan, T., and Rao, K. R., "Discrete Cosine Transform," IEEE Transactions on Computers, Vol. C-23, pp. 90-93, January 1974.
5. Noble, S. C., Knauer, S. C., and Giem, J. I., "A Real-Time Hadamard Transform System for Spatial and Temporal Redundancy Reduction in Television," presented at 1973 International Telemetering Conference, October 1973.
6. Robinson, G. S., "Orthogonal Transform Feasibility Study," prepared for NASA/MSC by COMSAT, Contract No. NAS 9-11240, Final Report, pp. 3-7 to 3-21, October 1971.
66 (1975) Efficient Transtni.ssion of MctotialinformItion /211 Approved For Release 2001/09/05 : CIA-RDP80B00829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 A TEACHING STEREO-VIDEO MICROSCOPE James F. Butterfield, President Dimensional Television Corporation 13720 Riverside Drive Sherman Oaks, California 91423 Abstract A new medical training instrument has been developed, which provides a means whereby microsurgery procedures performed using an optical microscope can be viewed by any number of medical students. This is accomplished by picking-up a television picture from the operating microscope and displaying the picture so the operation can be seen "live" on TV monitors at remote locations. Also the picture can be recorded on video tape for later playback for med- ical teaching purposes. This instrument has the principal advantage over other teaching means of microsurgery in that It can provide a three dimensional color picture so that the student sees the same view the surgeon saw as he perform- ed the operation. Background In recent years an increasing number of surgeons have been using optical stereo microscopes to view small areas of the body so that they can perform microsurgical techniques. Such surgical microscopes are used in the fields of ophthalmology, otolaryngology, neurosurgery and plastic surgery to name a few. However, medical schools and indi- vidual surgeons have had difficulty in teaching medical students their techniques and in keeping up with new techniques developed by others. This problem is occasioned by the fact that such surgery is performed in a small area and usu- ally has to be viewed with magnifications of up to 20x. It is, therefore, difficult for a group of students to gather around the patient and attempt to see the same area of the body the surgeon is viewing. The only exception to this is an assistant's microscope trained on the area, which allows one person at a time to see an oblique view. Surgeons some years ago drew illustrations of surgery and used these as visual teaching aids. More recently still and motion picture cameras have been trained directly on the area of operation or have been attached with beam split- ters to operating microscopes. These photographic means have certain disadvantages. Except in the case of Polaroid? photos, the picture is not immediately available as development and printing usually take several days. When the pic- ture is viewed it is sometimes out of focus or some other technical problem occurred, making the photograph or film not useable. Also there is no way for others to see this picture simultaneously with the operation and it must be view- ed at a special meeting set up some days later. Finally, these methods provide a picture in only two dimensions and not three dimensional as the surgeon sees with his own two eyes directly or through the stereo optical microscope. DISPLAY CONSOLE SIDE VIEW TV MONITOR ktt VIDEO TAPE RECORDER CLASSROOM TV MONITOR TOP VIEW STEREO-HOOD STEREO-SCREEN TELEVISION CAMERA / ZOOM TUBES TUBES rip EYEPIECES STEREO BEAM SPLITTER OBJECTIVE SLIT LENS / LAMP / MICROSCOPE FRONT VIEW TEACHING STEREO-VIDEO MICROSCOPE 212 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 The author and his associates have been active in the fields of video microscopy and three dimensional television (1 thru 6)? 
A recent paper on the development of a surgical video microscope describes an instrument which produces a high resolution stereoscopic color television picture so that the surgeon can actually operate by looking at a televi- sion display. This equipment was designed in coordination with and named in honor of Dr. John P. Beale, Jr. of the San Francisco Eye Institute. With such equipment the picture must be three dimensional for the surgeon to perform delicate operations, such as cutting and sewing. The surgical video microscope has the advantage over an optical microscope of permitting the surgeon considerable head movement (7). Description The illustration shows the various components of a Teaching Video Microscope. The Slit Lamp directs its light on the patient and the light is in turn reflected up through the Objective Lens and Zoom Tubes of the optical microscope. The eyepiece housing has been separated from the objective lens housing and a Stereo Beam Splitter inserted therein. This transmits half of the light of each image of the stereo-pair up through the Eyepieces and the other half of the light of each image is transmitted horizontally to a Television Camera. The left and right eye images are focused side-by- side on the pickup tubes of the miniaturized color television camera. The two images are displayed side-by-side on a color Television Monitor in a Display Console for "real time" viewing and on classroom TV monitors. The classroom TV monitors have a Stereo-Screen positioned in front of them. This consists on each half of polar- izing filters oriented at right angles to each other. The students wear Stereo-Glasses, which include corresponding polarizing filters in front of each eye that insure the proper image is channeled to the appropriate eye. The Stereo- Glasses also include prism wedges, which cause the images to appear superimposed. Attached to the Display Console monitor is a hinged Stereo-Hood, which consists of an enclosure holding a large pair of Stereo-Glasses. In this case, only individual viewing is possible but the viewer can wear regular corrective glasses and turn away and make notes and look back at the image without the necessity of removing the special Stereo-Glasses. The Display Console also contains a conventional Video Tape Recorder, which is connected into the circuit. This VTR records the two side-by- side pictures and plays them back for stereo viewing on the classroom monitors. Advantages The Teaching Video Microscope has several valuable advantages. This is the only means whereby a group of students can see the operation in exactly the same manner as the surgeon does in 3D and color. This viewing can take place simultaneously with the operation or a video tape may be made for later instruction or analysis purposes. The video tape can be edited into a quality educational tape, which may be distributed through medical television network channels. As the operation is being performed other members of the surgical team can follow the progress of the operation on two dimensional or three dimensional television monitors and thereby be more proficient in aiding the surgeon. Al- so, other surgeons can follow the progress of the operation from remote areas without having to scrub and suit up. They see the same picture that the surgeon does and hear his comments. 
Since the tapes are conventionally recorded and played back, this permits easy distribution for medical training purposes in teaching hospitals, research organizations, or for viewing by the doctor in his office or home. The equipment required is that of a conventional video tape recorder and monitor. If the viewing is to be in 3D then a Stereo-Screen and Stereo-Glasses or a Stereo-Hood is necessary. The Hood can be removed or swung up out of the way during two dimensional viewing.

Specialized Teaching Video Microscopes can use "false-color" coding, wherein a black-and-white television picture is picked up and electronically processed so that the different levels of gray provide many shades of color on a color TV monitor. The "false-color" picture is similar to microscopic "staining" and conveys more visual information to the eye than a black-and-white and gray picture. Another possibility is to use electronic polarity reversing so that the picture is seen in its normal shades of black-and-white and then at the touch of a switch the blacks are reversed to white. Sometimes rapid polarity reversing by the surgeon brings out features not seen in a normal manner. Television pickup tubes are now available which are sensitive to a wide range of the electromagnetic spectrum. For example, there are tubes sensitive to infrared or ultraviolet and these may be used so that heretofore unobservable features can be seen by the surgeon and student. Finally, low light level pickup tubes can be employed for conditions where the intense illumination of a slit lamp would be damaging to the tissues.

Microscopes are following the evolution that has occurred in other fields in that optical instruments, such as motion pictures, are being supplemented and replaced by electronic instruments, such as television. The electronic picture of the Teaching Video Microscope can be stored in a computer, processed and displayed. Electronic enhancement can bring out features not discernible optically.

Conclusion

In the past the surgeon has been extremely limited in his ability to visually communicate information regarding operational techniques on small areas of the body. Now television has revolutionized this by converting the optical picture into an electronic picture, which can be sent down a cable, through the air, and recorded on magnetic tape to be seen now or at a later time. The Teaching Video Microscope can provide a three dimensional color television display of surgical procedures so that the student sees essentially the same picture that the surgeon saw during the time he performed the operation.
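Both the false-color coding and the polarity reversal described above are simple point operations on the digitized gray levels. A brief Python sketch follows; the eight-color palette and the test ramp are arbitrary illustrations, not any particular instrument's mapping.

    import numpy as np

    def false_color(gray, bands=8):
        """Map gray levels (0-255) into a small set of display colors, in the
        spirit of 'staining': each band of gray levels gets one RGB value."""
        palette = np.array([(0, 0, 128), (0, 0, 255), (0, 128, 255), (0, 255, 128),
                            (128, 255, 0), (255, 255, 0), (255, 128, 0), (255, 0, 0)],
                           dtype=np.uint8)
        index = np.clip(gray.astype(int) * bands // 256, 0, bands - 1)
        return palette[index]

    def reverse_polarity(gray):
        """Electronic polarity reversal: blacks become white and vice versa."""
        return 255 - gray

    gray = np.arange(256, dtype=np.uint8).reshape(16, 16)   # hypothetical test ramp
    rgb = false_color(gray)
    negative = reverse_polarity(gray)
    print(rgb.shape, negative.dtype)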
Patent No. 3,818,125, June, 1974. 7, Butterfield, J., "Development of a Surgical Stereo-Video Microscope", Proc. of 18th Annual S. P, I. E. Tech Meeting, Vol. 54, p. 8-11, August, 1974. 214 / SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP801300829A001100060001-0 PICTORIAL INFORMATION TRANSMISSION THROUGH SIMULATION Robert T. P. Wang Honeywell Marine Systems Division California Center West Covina, California 91790 Abstract In an age when man is inundated with information, his natural ability to selectively assimilate the data presented to him has become an indispensible tool for survival. By the same token, man's visual perception limitations have been used to reduce the amount of data needed to reproduce pictorial information designed for his consumption. A comparative study of two divergent approaches to the problem of providing an optimum amount of video in- formation to a human viewer is discussed. Video communication systems exemplify one area where the approach consists of reducing the data from real world scenes by video data compression algorithms. The opposite approach is found in visual simulators where scenes are constructed synthetically to approach real world realism by adding cues to the basic structure of the digital image representation used. Such simulators are used in groundbased trainers designed to reduce the cost of training operators of expensive equipment. In both situations there is a need to provide realistic video to a human observer. In the quest for optimum pictorial information transmission, simulated scenes are shown to provide some rather unusual, hitherto unexplored, insights and alternatives. 1. 0 Introduction Communicating through visual imagery has received considerable attention in two rather diverse fields of application -- the efficient transmission of video information over digital communication channels, and the means of providing the visual stimuli needs of groundbased operator trainers. In both cases, a realistic real-time image reconstructed from a minimum number of information bits is desired. Video data compression begins with an analog image that is sampled, and digitally coded. The resulting bits are then acted upon by algorithms that use a priori knowledge of the local structure of the pictures in time and space. Some processing algorithms also use visual perception properties of the human eye. The result of such processing is a minimal set of data that can be used by its matching receiver to produce a subjectively pleasing picture to the viewer. In this way the required channel bandwidth can be reduced. Visual imagery displays in simulation trainers are used to provide a sense of realism to the trainees. The simulator must respond to physical inputs just as the operational equipment would with sufficient detail and resolution to supply all the necessary visual cues to the student. Since the imagery is simulated, the "world" which forms the "source" of the imagery must be stored within the simulator as a data base. Using the stored data and the geometry of the scenario, the proper projection of the light rays is computed to provide the desired image. To produce a real- time system, a study to determine the minimum information needed to provide realistic, real-time visual imagery must be undertaken. Therefore, the approach used to simulate visual imagery for a trainer is discussed. 
Beginning with basic line drawings, factors that add realism to the simulated image are progressively introduced until an image is obtained. The approaches used to provide a "realistic" or visually pleasing image for digital video data communication systems appears to be diametrically opposed to the approach used in simulation trainers. However, since the goals of both endeavours are ultimately the same, it is important to both fields that the approaches used by one another be understood. Such a cross fertilization of ideas will result in better techniques for both. The fundamentals of data compression and simulation trainers are outlined in the next two sections followed by a summary of salient principles used. By applying the underlying principles from one field to the other, it is shown that new directions emerge in both endeavors that will potentially aid the advancement of each. A discussion of these methods is given in Section Four. 2. 0 Fundamentals Of Video Data Compression The generic open-loop video data communication system is shown in Figure 2-1. The basic nature of the source signal and the per- ception properties of the human eye play key roles in the development of any video data compression system. For completeness, the underlying nature of both the source and the viewer are reviewed in the following two sub- sections. 2.1 The Source The data source is derived from scanning a projection of the visible world onto a two-dimensional CAMERA TRANSMITTER CHANNEL! RECEIVER DISPLAY S = Signal Source 0 = Optical Imager (Camera Optics) T = Transducer (Vidicon) E $ = Source Encoder (Data Compressor) E c = Channel Encoder (Error Coding and Modulation) Dc = Channel Decoder (Error detection and correction and Demodulation) Ds = Source Decoder (Data Decompressor/Image Reconstruction) TV = Image Display Fig. 2-1. Video Communication System SPIE Vol. 66 (1975) Efficient Transmission of Pictorial Information / 215 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 Approved For Release 2001/09/05 : CIA-RDP80600829A001100060001-0 plane*. The projected data display spatial and temporal properties that can be exploited to reduce signalling rates. Estournet [1969] showed the video source to be a non-stationary stochastic process. Clustering of iso-intensity picture elements (pixels), forming a source parameter that obey certain area distributions, was studied by Nishikawa, et al [1965]. Both of these properties are inherent in pictures and were used in the development of a generalized structural model of the source [Wang, 1972]. In addition to these characteristics, it is known that intensity variations form patterns in the spatial domain that cover non-trivial areas. Thus, correlation between spatially distinct pixels can persist even over large spatial distances [Wang, 1975]. Time sampling of the video source results in a series of frames. ** Within the time span between two suc- cessive frames, the amount of motion that produces changes between the two frames is minimal. This results in considerable temporal redundancy.t 2.2 The Viewer The human visual system can be grossly partitioned into four major components. Consider Figure 2-2. The optical system images the "real world" onto a screen called the retina. Transducers (rods and cones) on the retina sample and convert the image into electrical impulses that are transmitted along neurons to the brain. It is at the transducer juncture that man's knowledge becomes extremely dim. 
2.2 The Viewer

The human visual system can be grossly partitioned into four major components; consider Figure 2-2. The optical system images the "real world" onto a screen called the retina. Transducers (rods and cones) on the retina sample and convert the image into electrical impulses that are transmitted along neurons to the brain. It is at the transducer juncture that man's knowledge becomes extremely dim. Considerable processing is thought to occur between the rods and cones and the optic nerve. This is based on the fact that the number of individual fibres in the optic nerve accounts for only 1% of the total number of sensor elements in the retina [Merchant, 1965]. It is further conjectured that processing might even occur along the optic nerve itself.

Fig. 2-2. Simplified Block Diagram of the Human Visual System. S = Signal Source (light from the field of view); O = Physical Optics (eyeball); T = Sampler/Transducer (retina); P = Preprocessor/Transmission Line (neural network); B = Memory/Processor (brain); C = Feedback Motor Controls (eye muscles).

Rods and cones are unevenly distributed on the retina. A high density of sensors concentrates around the fovea and drops off rapidly on either side, as shown in Figure 2-3. This sampling configuration results in a spatial band-limitedness that changes with the distance an object lies off the optical axis of the eye.

Fig. 2-3. Distribution of Rods and Cones on the Retina. [Plot of receptor density versus angular separation from the fovea (deg), from the nasal side out to about 90 deg on the temporal side; the blind spot appears on the nasal side.]

The human eye is very sensitive to sudden changes that are connected in space to form edges. However, a pattern of edges that forms a spatial structure produces an averaging that makes the eye relatively insensitive to noise having a matching pattern. These phenomena have been modelled by a cross-coupling neural net that produces lateral inhibition [Cornsweet, 1971]. This model also explains the eye's high sensitivity to motion and to noise in low-activity regions of a picture, as well as its insensitivity to noise along edges in a picture.

In the temporal domain, band-limitedness is manifested through the well-known phenomenon of "persistence of vision". Persistence of vision has been the basis for motion in movies and television for many years, allowing movement to be captured in time samples of the pictorial data.

Finally, the eye is responsive to a tri-stimulus color system and becomes progressively color blind as the spatial size of a colored object diminishes. This fact was used in the design of the NTSC standard, where high spatial frequencies are represented by the luminance signal alone. A further experimental fact concerning colored imagery is that the eye is more tolerant of luminance noise and error when color is present.

The factors discussed in the previous two sections will be used to support the reasons underlying the various video data compression techniques that have been developed through the years.

2.3 Basic Compression Approaches

2.3.1 Overview. Data compression algorithms initially concentrated on the characteristics of pictures, but later expanded to include the properties of human visual perception. Predominant among early techniques were those that used iso-intensity pixel clustering or "redundancy" in the spatial domain.*
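A simple concrete instance of such spatial-redundancy removal is run-length coding of a scan line, sketched below in Python (an illustrative example only, not an algorithm from the paper; the sample scan line is invented). Each run of identical pixel values is replaced by a (value, run length) pair.

    def run_length_encode(scan_line):
        # Collapse each run of identical pixel values into a (value, length) pair.
        runs = []
        for value in scan_line:
            if runs and runs[-1][0] == value:
                runs[-1][1] += 1
            else:
                runs.append([value, 1])
        return [tuple(run) for run in runs]

    print(run_length_encode([12, 12, 12, 12, 40, 40, 7]))
    # [(12, 4), (40, 2), (7, 1)] -- seven samples described by three pairs

The saving grows with the size of the iso-intensity clusters that the source model of Section 2.1 describes.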
As memories and storage devices became economically reasonable, frame-to-frame redundancies were explored to provide further savings. The context-dependent noise sensitivity of human visual perception was first exploited through non-linear quantization of difference signals. Such perception-matched noise distribution was finally carried into the two-dimensional transform domain. Algorithms tailored to visual perception properties were later extended into the temporal domain through motion-dependent spatial coding. A controversy still rages as to whether the "inability" of the eye to perceive high spatial structure in moving objects is truly a characteristic of visual perception. Some researchers feel that the phenomenon, presently adopted as a visual perception trait, is actually induced by long hours of viewing television, whose cathode ray tube introduces lags due to phosphor and electron beam characteristics.

2.3.2 Redundancy Removal Techniques. Given that X_1, X_2, ..., X_n denotes a string of n pixel values sampled from a picture through a raster scan, the two basic options available to the designer are depicted in Figure 2-4.

2.3.2.1 Interpolator. An interpolator forms a function f(t:m, n) through two preselected pixels X_m and X_n, where X_m is some pixel defining the initial value of the function and X_n the latest value of that function. A term-by-term error analysis is then conducted, where

    X_t - f(t:m, n) = e_t,    m ≤ t ≤ n.

When |e_t| exceeds the error bound e_b, the values X_{n-1} and X_n are both transmitted to the receiver as the terminal value of the previous interpolation interval and the start of a new interpolation interval, respectively. The process is then repeated for the next series of pixels, which obey a function f_{n-1}(t:m, n), where the subscript denotes the starting location of the function. In this way, the picture is described by a series of functions f_m, f_{n-1}, ... .

2.3.2.2 Predictor. A similar breaking-up of the picture into a series of functions is accomplished by prediction. In this algorithm, a function of the previous s pixel values is used to predict the value of the latest sample. Mathematically,

    X̂_n = f(X_{n-1}, ..., X_{n-s}).                    (2.2)

If e_b is the error bound for the system and X̂_n - X_n