RECORDS MANAGEMENT HANDBOOK INFORMATION RETRIEVAL

Document Number (FOIA)/ESDN (CREST): CIA-RDP74-00005R000100020030-9
Release Decision: RIFPUB
Original Classification: K
Document Page Count: 137
Document Creation Date: December 9, 2016
Document Release Date: April 24, 2001
Sequence Number: 30
Publication Date: January 1, 1972
Content Type: REPORT
Approved For Release 2001/07/17 : CIA-RDP74-00005R000100020030-9

RECORDS MANAGEMENT HANDBOOK
Managing Information Retrieval
INFORMATION RETRIEVAL
GENERAL SERVICES ADMINISTRATION
NATIONAL ARCHIVES AND RECORDS SERVICE
OFFICE OF RECORDS MANAGEMENT
Federal Stock Number 7610-042-8762

RECORDS MANAGEMENT HANDBOOKS are developed by the National Archives and Records Service as technical guides to reducing and simplifying paperwork. (Title, year, pages)

Managing correspondence: Plain Letters, 1955, 47 p.
Managing correspondence: Form Letters, 1954, 33 p.
Managing correspondence: Guide Letters, 1955, 23 p.
Managing directives: Communicating Policy and Procedure, 1967, 62 p.
Managing forms: Forms Analysis, 1960, 62 p.
Managing forms: Forms Design, 1960, 89 p.
Managing forms: Forms Management, 1969, 34 p.
Managing mail: Managing the Mail, 1971, 94 p.
Managing current files: Files Operations, 1964, 76 p.
Managing current files: File Stations, 1967, 52 p.
Managing current files: Subject Filing, 1966, 40 p.
Managing information retrieval: Information Retrieval, 1972, 132 p.
Managing information retrieval: Information Retrieval Systems, 1970, 150 p.
Managing information retrieval: Microform Retrieval Equipment Guide, 1970, 64 p.
Managing emergency preparedness files: Federal Vital Records Program, 1968, 16 p.
Managing noncurrent files: Applying Records Schedules, 1961, 23 p.
Managing noncurrent files: Federal Records Centers, 1967, 39 p.
Mechanizing paperwork: Source Data Automation, 1965, 78 p.
Mechanizing paperwork: Source Data Automation Equipment Guide, 1970, 122 p.
Mechanizing paperwork: Source Data Automation Systems, 1963, 183 p.
General: Bibliography for Records Managers, 1965, 58 p.
General: Copying Equipment, 1966, 82 p.

FOREWORD

Management at every level is being subjected to increasing pressure to automate the files of the office: to adopt new, nonconventional methods and equipment to improve the dissemination, storage, and retrieval of information. Professional journals, trade magazines, and agency publications are constantly reporting how management is solving its information problems through the use of these new systems. But today's manager knows that the new systems usually represent a sizable investment, and he is also aware that the investment has not always paid off.

It is the purpose of this handbook to provide the manager and those who assist him with guidelines for determining where these new systems might profitably be employed in Government offices, and with criteria for selecting the right methods and equipment. While the main objective is to encourage greater use of modern information retrieval techniques, the guidelines should also help prevent the installation of ill-advised or unprofitable systems. For those offices that have already installed modern information retrieval systems, the handbook may prove helpful in analyzing and evaluating existing system performance or in revising an ineffective system.

This handbook is intended primarily for the use of management analysts, systems personnel, middle management, and any others who may be directly involved in conducting information retrieval studies or in designing and installing an information retrieval system.
Although this handbook is issued as one of a series of Records Management Handbooks produced by the National Archives and Records Service, General Services Administration (GSA), the United States Air Force shared in its development. It was produced under a contract jointly funded and administered by the Air Force and GSA.

CONTENTS

I. WHY NEW INFORMATION RETRIEVAL SYSTEMS ARE NEEDED
  What Is Information Retrieval?
  Summary of Conventional Methods
  Summary of Nonconventional Methods
  Limitations and Advantages of Conventional Methods
  Advantages and Limitations of Nonconventional Methods
  Coordinate Indexing: Key to Many Nonconventional Systems
II. HOW COORDINATE INDEXING SYSTEMS WORK
  Principles of Coordinate Indexing
  Types of Indexing Terms
  Index File Arrangements
  Major Advantages of Coordinate Indexing Systems
III. MICROFORM SYSTEMS
  How Microforms Help Solve Typical Information Problems
  Prerequisites for a Successful Microform System
  Types of Microfilm and Cameras
  Factors Affecting the Choice of the Type of Microform System
  Types of Microform Systems
  Microform-Computer Combinations
  Special Considerations
IV. MANUAL NONCONVENTIONAL INDEXING SYSTEMS
  Types of Situations Where Nonconventional Indexing Systems Are Used
  Prerequisites for a Successful Manual Nonconventional Indexing System
  Factors Affecting the Choice of the Type of Manual Nonconventional Indexing System
  Types of Manual Nonconventional Systems
  Special Considerations
V. NONCONVENTIONAL MACHINE INDEXING AND RETRIEVAL SYSTEMS
  Types of Situations Where Machine Indexing and Retrieval Systems Apply
  Prerequisites for a Successful Machine Indexing or Retrieval System
  Factors Affecting the Choice of the Type of a Machine Indexing or Retrieval System
  Types of Machine Indexing and Retrieval Systems
  Other Machine Indexing and Retrieval Systems
VI. HOW TO DECIDE IF A NEW SYSTEM IS NEEDED
  The Preliminary Survey
  Where to Look
  Examining User Needs
  Fact-Gathering Forms
  Decision Tables
  Summary
VII. HOW TO DETERMINE SYSTEM REQUIREMENTS
  Data Collection Techniques
  Suggested Questionnaires
  Data Summarization Techniques
  Final Review and Analysis of Findings
  Users' Briefings
  Use of General Analysis Techniques and Tools
VIII. SELECTING THE RIGHT METHODS AND EQUIPMENT
  Step 1, Selecting the Applicable Functional Category
  Step 2, Selecting the Right Methods and Equipment
IX. DESIGNING A COORDINATE INDEX
  Economics of Coordinate Indexes
  Steps in Developing a Coordinate Index
  Staffing
  Current Awareness Services
  Quality Control
  Setting Quality Standards
  Conclusion
APPENDIX A. Nonconventional Methods and Equipment Guide
APPENDIX B. Information Retrieval Equipment and Supplies Sources
APPENDIX C. Information Retrieval: Recommended Primers and Selected Research Sources
APPENDIX D. Forms for Evaluating a Potential Information Retrieval Application
APPENDIX E. Sample Directive (Air Force) Covering Document Miniaturization Systems

I. WHY NEW INFORMATION RETRIEVAL SYSTEMS ARE NEEDED

Conventional methods for storing and retrieving information have been doing an effective information-handling job for some 50 years, and in many situations today they are still the best answer. However, during and since World War II more and more people have been questioning these conventional methods and looking for new and better ways to satisfy their information needs. The three main reasons for this exploratory research have been the information explosion, the trend toward a much higher degree of specialization in all technical fields, and the advent of the new technologies of electronic data processing and document miniaturization.

The information explosion is now overtaxing conventional methods and equipment for indexing and storing the thousands of new documents being prepared each year. The trend toward greater specialization is resulting in the preparation of documents that deal with increasingly narrow aspects of subject topics, and new classes of information are constantly being formed by the emergence of interdisciplinary specialists. Conventional methods for classifying and indexing information are frequently not well suited to meet the demand for greater specificity in organizing and retrieving information, or the need to manipulate information freely.

Information specialists in the scientific and technical fields were among the first to apply the electronic computer, microforms, and other nonconventional methods and equipment to solve information retrieval problems. This handbook draws largely on their knowledge and experience.

What Is Information Retrieval?

It is the approach to the problem of information dissemination, storage, and retrieval that is new: nonconventional methods and equipment that have been introduced during the last decade or so. It is this new, nonconventional approach which has become known as "information retrieval." Stated in other ways:
• Information retrieval employs methods and equipment that depart in one way or another from the conventional methods we find in most offices and libraries.
• Information retrieval means there are now available methods and equipment for disseminating, storing, and retrieving information that make it possible, and often quite practical, to do things that no one considered doing before.
• Information retrieval means simply new ways of performing old tasks, and it is used primarily when conventional methods will no longer suffice.

Perhaps one of the best ways to define nonconventional systems is to first explain what is meant by conventional methods and equipment, that is, the things not covered in this handbook. Examples of these conventional methods are shown in figure 1, which includes a standard file cabinet, a reference visible file, a mobile shelf file, a rotary file, and a mechanized file.

[Figure 1. Conventional filing equipment (panel label recovered: Rotary File)]

Summary of Conventional Methods

The characteristics of the documents and the methods used in organizing the information in conventional files are as follows:
• The documents are largely in paper form.
• The documents are maintained in a structured file, that is, a file organized and arranged for direct searching according to the filing feature (name, number, subject, etc.) most often known by the user when looking up the information.
• If necessary, a separate index is maintained to find information when users ask for it on a basis different from that by which the document file is structured.

The success of conventional methods depends largely on the following factors:
• Stability of the information and language contained in the documents.
• Simplicity and shortness of the documents.
• Predictability of users' needs and of the way in which they will ask for documents.
• Simplicity of users' needs.
• Availability of space close to the users to store the documents.

The following GSA Records Management Handbooks relate primarily to conventional systems and should be carefully reviewed before any information retrieval study is undertaken:
Files Operations, FSN 7610-985-6973, 1964
Subject Filing, FSN 7610-926-2128, 1966
File Stations, FSN 7610-926-2129, 1967

Summary of Nonconventional Methods

Nonconventional methods for storing and retrieving information have one or more of the following characteristics:
• The information is disseminated and stored in miniaturized form.
• The document file is largely unstructured: the documents are filed by a simple identifier such as an accession number or machine location address.
• The contents of the documents are described in detail by means of a separate, highly manipulative index file, or the entire contents are maintained in machine-readable form.

Several nonconventional tools used in modern information retrieval systems are summarized below; these, with a number of others, are described in chapters III, IV, and V of this handbook.

Edge-notched cards. Edge-notched cards have been available for many years and employ a technique that is superior to conventional filing methods in numerous applications.

Optical coincidence cards. The optical coincidence, or "peek-a-boo," card is useful in special applications for organizing and retrieving information.

Microforms. Microfilm was conceived as a recording medium about 100 years ago, and recent developments have made microforms a vital link in solving many of today's information problems.

EAM punched cards. EAM (electrical accounting machine) punched cards have been used extensively for processing numerical data, and they can also be used readily for storing and retrieving information.

Computers. The most important of the nonconventional tools is the electronic computer, which is playing an increasingly important role in storing and retrieving information.

[Figure: Edge-Notched Cards; Optical Coincidence Cards; Computers]

Nonconventional methods can often help when one or more of the following conditions exist:
• Types of information and terminology contained in the document collection are constantly changing.
• Individual documents are lengthy and contain information on a wide variety of subjects or include large quantities of data.
• Users ask for information in a variety of ways, and their needs are continuously changing.
• Users' needs are complex in that they require precise information and often must be able to correlate or manipulate it.
• Documents must be maintained in multiple sets to facilitate dissemination, storage, and retrieval.

Limitations and Advantages of Conventional Methods

To fully appreciate why nonconventional methods and equipment are needed and where they can best be used, one must first understand the sort of retrieval problems that cannot readily be solved by conventional methods. The three broad types of problems are:
• Location of specific information. Many times today the information the user needs is deeply embedded in a lengthy document, perhaps in one paragraph of a 50-page research report. If this situation is commonplace and there are a large number of documents in the collection, retrieval of needed information can be very difficult.
• Location of individual items of data. In some work situations it is frequently necessary to look up individual items of such data as names, numbers, and dates. Using such conventional methods as folder files and printed listings for maintaining the data may make this a time-consuming and tiresome chore.
• Conducting coordinate-type searches. In many work situations it is necessary or desirable to conduct coordinate-type searches to identify those documents, persons, places, or things which meet a particular set of criteria. For example, management may have an urgent need for locating employees who can speak a certain language, have had certain types of experience, and are willing to travel. Conventional methods usually make it impractical, if not impossible, to conduct searches of this type.

Four general types of systems may be used for organizing information by conventional methods. The following is a description of each, together with an explanation of why each may sometimes fail.

1. Subject document files (fig. 3).

Definition: Documents arranged by subject categories, as in hierarchical subject classification systems for correspondence folder files, library books, and other written material.

[Figure 3. Subject Document File (folder labels include Vegetables and Poultry)]

Significant problem: Developing a classification scheme that will satisfy the viewpoints, terminology, and needs of individual users in instances where the users have a wide variety of interests.

Why the system may fail: A hierarchical subject classification scheme needs to be directly related to the background and thinking processes of the users served.
It is, therefore, virtually impossible to construct a classification scheme that will ideally serve the needs of a wide variety of interest groups.

Significant problem: Modifying the system in situations where the fields of knowledge or work functions are constantly changing, or redesigning it to take advantage of a new understanding, gained through additional experience with the system, of how the information should be organized.

Why the system may fail: Many times, the experience gained by using the system reveals shortcomings in the first arrangement that could be eliminated by reorganizing the classification structure. The rigid structure of a hierarchical classification scheme makes adjustments of this sort very difficult.

Significant problem: Classifying, filing, and retrieving individual documents in situations where they are often lengthy and involve numerous subject categories.

Why the system may fail: If an individual document relates to only one topic represented in the classification structure, it can simply be filed under that topic. But if the document has more than one subject, then cross-referencing becomes necessary. When such a situation is commonplace, the conventional system will tend to break down. A complex search involving several subjects can become a jungle of cross-references, which makes the searching process very difficult, time-consuming, and possibly unsuccessful.

2. Subject index card files (fig. 4).

Definition: Manual card files arranged by subject topics, as in a library's 3- by 5-inch subject heading card file.

Significant problem: Selecting subject terms that will be meaningful in the future.

Why the system may fail: Selecting subject topic terms that will always be meaningful and useful in the future is not only difficult but at times impossible. The problem is particularly thorny when conventional methods are employed.
Significant problem: Card preparation and updating costs.

Why the system may fail: Just the initial preparation and filing of manual index cards can be quite costly, especially if it is necessary to prepare and file several cards for each document; updating a large file may be so costly that in actual practice it cannot be done.

Significant problem: Detailed (deep) indexing of documents involving a large number of subject topics.

Why the system may fail: The physical limitations of index cards are a problem if a document must be indexed in depth. Detailed (deep) indexing of documents involving a large number of subject topics is difficult because of the size of the file that this practice would create. A card must be prepared for each subject in the document, and a cross-reference prepared to all other related subjects. The structure of the card and the size of the file create barriers to fast and efficient searching. Collating these cards in a search is also very tedious and time-consuming.

3. Case document files (fig. 5).

Definition: Documents arranged by case name or number, as in a personnel folder file.

[Figure 5. Case document file (folder label: Cows Incorporated)]

Significant problem: Searching large numbers of folders in situations where it is often necessary to correlate, compare, or analyze data, as in personnel selection and placement.

Why the system may fail: A case file containing large numbers of folders is very difficult to search if information must be correlated, compared, or analyzed. The physical problem of handling the folders prevents quick and easy reference. Every folder must be thoroughly analyzed from front to back before a complete job is done.

Significant problem: Locating or extracting specific items of data appearing at various places within the folder, in situations where the data is frequently needed for such purposes as answering inquiries and preparing reports.
Why the system may fail: The items of data in a document are usually arranged not for retrieval purposes but for easy preparation. When individual items must be located in a large number of case folders, the problem of pulling the folder and finding the item on the form becomes very tedious. A search of this type takes a lot of time and is subject to a large amount of human error in locating and transcribing information.

Significant problem: Locating precedent or policy material scattered among the case folders.

Why the system may fail: If material on precedent or policy matters must be located, usually it can be done only by making a search of the file or calling upon the memories of employees who have had long experience in the subject-matter field. Seldom is this type of information readily accessible in a separate section of the folder. The problems of interpreting precedent or policy matters are large enough; in addition, the typical case file has the disadvantage of requiring a tiresome, page-by-page, folder-by-folder search of the file for this type of information. Many organizations that depend upon the memories of long-time employees for such information are rightfully becoming concerned, as these older employees retire, with methods and techniques for capturing their knowledge in a permanent, readily accessible form.

4. Case index card files (fig. 6).

Definition: Manually prepared cards arranged by case names or numbers, as in a personnel data card file.

Significant problem: Cost of updating and preparing cards.

Why the system may fail: Card preparation and updating costs can be very high for such files. Each card must be manually prepared and individually inspected. As the size of the file grows, the point is reached where the cost of manually maintaining and updating the cards becomes exorbitant.
Why the system may fail: Manual case index card files must be properly designed and controlled to prevent loss of information. If the card for a certain item is lost, then the whole record of activity for that item is lost. Although methods of color coding, grooving, tabbing, and sequential numbering can make refiling so easy that even a newcomer to the system can do the job well, most systems are not this refined. Therefore, this possibility presents a severe limitation, particularly if the information is valuable.

Significant problem: Losing vital information through illegible hand postings and errors.

Why the system may fail: Whenever a file is manually maintained, a certain loss of information results no matter how many precautions are taken to prevent it. This is particularly significant in case card files because of the uniqueness of the information placed on each card. Preparing cards in this way makes verification for accuracy a very time-consuming and costly job. The best that can be hoped for is that most of the important mistakes are found and corrected.

Conventional systems always offer certain advantages, and if they will satisfy the needs of the users, they are often preferable to nonconventional systems. Chapter VI provides guidance on how to determine which of the two methods should be used. The following are the major general advantages of conventional systems:
• Usually simpler to design and operate.
• Require no special equipment.
• Permit direct access and often facilitate browsing.
• Input costs are usually lower.

Advantages and Limitations of Nonconventional Methods

A cost-benefit study should always be made before converting from a conventional to a nonconventional system. Nonconventional methods can, under the proper circumstances and application, result in one or more of the benefits described below.

Faster retrieval. This refers to the speed at which a user gets the exact information he needs to perform a task. Fast retrieval can be the significant element of a system when need is measured in seconds or minutes. For instance, if a child has swallowed poison and the antidote must be known immediately to save a life, speed is the most essential characteristic. Or if a policeman chasing a speeding automobile calls the station to identify the license number, again fast retrieval is essential.

Better information. This means information that is more complete, more accurate, and more current. For example, modern information retrieval systems can be designed to reduce the chance that any pertinent information will be overlooked, a most important consideration in situations such as those facing the patent attorney or physician. Modern information retrieval systems make it practical to store and correlate more information and data, since they usually have the capability to reduce masses of information to a manageable proportion more quickly than conventional systems.

Conserving users' time. How much time is spent searching for information through folders, reports, card files, book indexes, and other document files in an agency or field station? No one knows exactly, but in many situations it is far too much. In some legal offices, for example, attorneys spend as much as 75 percent of their time searching for precedent decisions and the like. Modern information retrieval methods can save valuable users' time by reducing the man-hours spent in looking up, searching for, and correlating information needed to complete their tasks.
Retrieval may be simple yet time-consuming, as in looking up individual social security numbers many times each day; or it may be as complex and time-consuming as a one-time correlation of data to determine the possible cause of a missile failure.

Improve service. This refers to providing better agency service for the general public rather than to improving service within the agency for the direct users of the information. [...] to render service never before thought possible, or to improve the service far beyond that which was possible when only conventional methods were available.

The full extent of the disadvantages and limitations of a nonconventional system may not become evident until the system has been in operation for some time. This is one of the reasons that a feasibility study is needed and that careful attention must be given to all aspects of the system design. (For guidance in these matters, see chapter VII.) When compared with conventional systems, nonconventional systems generally have the following disadvantages:
• Require specially trained personnel to design and operate the system.
• Usually require special equipment.
• Often require use of special procedures and techniques to retrieve information.
• Input costs are usually higher.

Coordinate Indexing: Key to Many Nonconventional Systems

The concept of coordinate indexing (or concept or aspect indexing, as it is variously called) has been a major factor in removing the restraints imposed by earlier classification and indexing systems. All coordinate indexing systems have one feature in common: no attempt is made at the time of input to limit the description of a document by classifying or indexing it under a major subject heading or two.
Instead, large numbers of highly definitive indexing terms or data elements are employed, and the document is indexed under all entries that are pertinent. To retrieve information, the user selects those indexing terms or data elements that describe the items he is looking for, and the system quickly identifies all those that fit his description.

The key to the success of coordinate indexing is that all the descriptive information in the system is freely accessible, and no structuring of information takes place until a query is received. This permits an endless variety of on-demand searches to be made, each tailored to the precise interests and needs of the user.

Various types of equipment may be employed in coordinate indexing systems, as discussed in chapters IV and V; additional information on this subject is also included in chapters II and IX.

II. HOW COORDINATE INDEXING SYSTEMS WORK

Through the years, two traditional methods have been employed for organizing information by subject: hierarchical subject classification systems and manual subject indexing systems. The disadvantages and limitations of each were discussed in chapter I.

In hierarchical classification systems, the documents themselves are organized and arranged by primary subject categories and then further broken down by secondary categories, and so forth. Figure 7 illustrates two examples of hierarchical subject classification systems:

Figure 7. Examples of Hierarchical Subject Classification Systems

Subject Numeric Filing System (office type)
ACCOUNTING
  1 Accounts Current
  2 Allotments
    2-1 Symbols
    2-2 Obligations
  3 Disbursements
    3-1 Loans
AUDIT
  1 Assignments
  2 Contract Audits

Dewey Decimal Classification System (library or office type)
600 APPLIED SCIENCE
  610 Engineering
    611 General Engineering
      611.1 Equipment and Supplies
        611.11 Tools
          611.111 Cutting Tools
            611.111.1 Stroke
              611.111.11 Depth of Cut

The manual subject index file, such as the 3- by 5-inch card file found in most libraries, is often employed as a supplementary finding aid. Broad subject headings that are complete unto themselves are normally used, and the headings are arranged in alphabetical sequence. Typically, the card includes the title, date, author, and similar identifying information, perhaps plus a very brief description of the document. If the document is a book, usually several subject heading cards are prepared and filed alphabetically. Author and title cards may also be prepared and interfiled among the subject cards. The following are some examples of possible subject headings:
Automatic data processing
Correspondence management
Forms management
Information retrieval
Records retirement
Source data automation
Survey techniques
Work measurement

Principles of Coordinate Indexing

Coordinate indexing systems can be used to replace either or both of the hierarchical subject classification systems described above. The documents are identified and arranged by number, name, author, storage location address, or some other simple identifier. The index is usually a separate, highly manipulative, often mechanized file.

In a typical coordinate index, large numbers (sometimes thousands) of short terms are employed, most of which are not intended to be used alone but rather in any desired combination, "coordinated" to describe the various topics, concepts, aspects, characteristics, features, or attributes of the document or other item being indexed. These terms range from precise words and quantitative or qualitative data to abstract concepts or ideas. Both broad and narrow terms are used in the same system.
Suppose the vocabulary of indexing terms includes such terms as those illustrated in figure 8:

Figure 8. Sample Vocabulary of Indexing Terms

Africa, albatross, Antarctic, Arctic, Asia, bear, black, blue, capture, cat, color, conservation, deer, defense, diseases, dog, domestic, dorsum, duck/goose, eagle, ear, egg, elephant, Europe, exercise, exterior, eye, fish, food, fright, habit, horse, housebreaking, hunting, leg, life span, lion, obedience, offense, perception, population, preserve, price, reproduction, rescue, research, respiratory, rodent, shelter, size, South America, speed, strength, whale, white, worm, zebra, zooid
temperature-under 32°, temperature-32°-60°, temperature-60°-80°, temperature-80°-100°, temperature-over 100°
Before 1000 AD, 1000-1500 AD, 1500-1900 AD, 1900 AD-present

When an individual document is indexed, all the terms that are pertinent are used to describe it. The description of the document thus consists of a group of interdependent terms that together form, in effect, a very brief abstract of the document.

In searching a coordinate index, one selects those indexing terms in the vocabulary that best describe the desired information. The index file is then searched to find any documents indexed under those terms: to find the documents that would satisfy the search question, the searcher looks for the particular document numbers that have been entered on all the pertinent cards.

As in the indexing process, the searching process permits free coordination of a large number and wide variety of terms. For example, when desirable, one can narrow the search by using more specific terms, broaden it by dropping the more specific terms, or form new combinations of information or data by changing the configuration of the terms used in the search. Figure 9 illustrates the principles involved in searching a coordinate index.
The cards represent indexing terms considered pertinent to a particular search question; the numbers on each card represent those documents indexed under that term.

Figure 9. Searching a Coordinate Index

Types of Indexing Terms

Two types of indexing terms may be used:

Keyword. The index terms consist of key words selected from the title or text of the documents. The indexing vocabulary is a by-product of the indexing process, and some form of control is usually exercised to keep the system manageable. The indexing of individual documents may be accomplished by either manual or machine (automatic) indexing methods.

Descriptor. A specially prepared vocabulary of indexing terms is developed through a continuing analysis of the documents being indexed. The descriptors are usually formalized and controlled by means of a thesaurus. Indexing terms are manually assigned to individual documents from the approved list. Some of the terms selected to describe a particular document may coincide with keywords appearing within the document, while many will not.

Index File Arrangements

The index file is arranged in either of the two following ways:

By Document Numbers. A card or machine record, usually in coded form, is prepared for each document stored in the system, with all the indexing terms describing the document recorded on it. Retrieval of information from the file involves sequential or serial searching, since the searcher must examine all the index records in the system to identify those documents that are assigned the terms used in the search.

By Indexing Terms. A card or machine record is established for each indexing term. When the indexer has decided which terms will be assigned to a particular document, the index records for those terms are selected and the document number is recorded on each.
Retrieval involves selective or parallel searching, since the searcher or the machine selects and examines only those records representing the terms used in the search.

Major Advantages of Coordinate Indexing Systems

• More Specific. Coordinate indexing makes it not only possible but practical to describe documents or other items in greater detail (depth) than conventional methods do.
• More Adaptable. Coordinate indexes are far more adaptable to changing situations and unanticipated events than conventional methods.
• More Manipulative. Coordinate indexing makes it possible to quickly correlate and manipulate information and data in an endless variety of ways to achieve the desired search results.

Those desiring to install a coordinate indexing system have a wide variety of equipment choices. These include such manual types as the columnar, optical coincidence, and edge-notched card systems covered in chapter IV. Also, certain types of microform equipment, electrical accounting machine punched card systems, and electronic computers, described in chapters V and VI, may be used. For information about designing a coordinate indexing system, see chapter IX.

III. MICROFORM SYSTEMS

Microform is the general name for the various types and formats of microfilm and other media used for recording information in miniaturized form. In the past, microform was used mainly to save space, but numerous studies have shown that it is often less costly to place the records in the low-cost storage facilities provided by the Federal records centers than to microfilm them. Today, however, microforms are assuming a new and far more important role in solving problems of information dissemination, storage, and retrieval.

How Microforms Help Solve Typical Information Problems

The following are some typical problems that can sometimes be solved, or partly solved, by the use of a microform. Moreover, in any given situation it is unlikely that only one of these problems prevails, which largely explains the growing interest in microforms.

Problem: Document Accessibility

It is usually possible to keep small collections of documents that occupy a file cabinet or bookcase near the users. But the larger document collections, by necessity, are usually located at some distance from the users' area. This means that either the document or the user has to travel back and forth to the storage site.

Further, there are times when the same document is needed by more than one user, and each must wait his turn. These problems, of course, cause work delays. They also tend to reduce the usefulness of the information contained in the documents, since users are inclined to try to do without unavailable documents if they can.

Both problems could be solved through the use of a microform system. Once the documents are converted to a microform, inexpensive duplicate sets could be placed in various locations in the users' work areas. A second choice, which solves only the problem of users competing for the same document, is to make film-to-film copies for multiple users who need to see the documents.

Problem: Document Servicing and Control

• Man-hour requirements for pulling folders and preparing document chargeouts.
• Man-hour requirements for filing returned documents.
• Man-hour requirements for following up on unreturned documents.
• Man-hour requirements for routine document maintenance.

If a microform system is used, inexpensive diazo copies of the documents can be made and given to the user instead of lending the file copy. The user disposes of the duplicate copy when he is through with it. Thus there is no document chargeout and refile problem, and file maintenance is reduced to a minimum.
Because personnel costs are rising constantly and it is sometimes difficult to obtain file clerks, records managers will increasingly have to turn to microform to solve their problems.

Problem: Retrieval Speed and Costs

• Random lookup of individual items of data.
• Scanning and retrieving information in textual documents and indexes.

In situations where a large volume of data can be readily converted to a microform, retrieval speeds sometimes can be increased. This is particularly true where retrieval involves random lookup of individual items of discrete data such as a social security number, date of birth, or street address.

If there is a continuing need to examine graphic information, such as large maps, engineering drawings, or photographs, microform often will make the job faster as well as easier. Similarly, scanning or browsing through large collections of textual material and indexes is sometimes easier and faster if they are available in microform.

Overall retrieval speeds and costs often can be improved because a microform system makes it possible to store needed documents and data at the user's work station rather than at a remote location.

Problem: Document Printing, Distribution, and Stocking

• High costs for printing, collating, and packaging of paper documents.
• Transportation and handling costs.
• Stock control and replenishment costs.
• Time-delay problem.

Many Government agencies discovered some years ago that the most economical and efficient way to reproduce, distribute, and fill individual requests for unpublished reports is by means of the microform. Federal agencies, particularly within the Department of Defense, are saving thousands of dollars each year by using the microform for reproduction and distribution of engineering drawings of military equipment.

Not only is it sometimes possible to reduce the initial printing costs, but significant savings can often be realized in handling and transporting documents. Stocking usually can be eliminated altogether, since the microform stored at the original source or at any distribution point can be used to reproduce low-cost film-to-film copies or enlarged paper copies on demand. The original microform can be produced readily by photographing paper documents. However, with the advent of document preparation, editing, index preparation, formatting, and Computer-Output Microfilm (COM) equipment, direct publication of documents in microform is now possible. The computer output magnetic tape also can be used to print paper copies automatically. For many agencies, these new techniques offer the means for a substantial reduction in the time lag between document drafting and receipt by the users.

Problem: Computer Data Storage and Accessibility

• Storage and retrieval of machine language backup data.
• Storage and retrieval of static or semistatic data.

It doesn't take long for a computer to fill a reel of magnetic tape with data. If it is kept busy all day, the computer may produce dozens of tape reels to add to the tape library. It is little wonder, then, that some computer installations have thousands of tape reels or millions of punched cards in their files and must often restrict the computer master files to summary data. While the backup data resulting from input processing and other machine runs is usually essential to system documentation, its great volume often makes it too costly to retain the data in machine language and search it by computer. The Social Security Administration was among the first to use the microform, and the first to procure a COM device, to solve this problem.
While the computer provides the fastest and most accurate means for compiling, updating, and organizing static and semistatic data, the size and cost limitations of mass memories, together with time requirements, often make it impractical to use the computer to retrieve data from these files. Often the best current solution to the problem is to convert data recorded on magnetic tape to a microform by means of COM equipment. A special optical mark reader, called the "Foto" Optical Sensing Device for Input to Computer (FOSDIC), has been developed to read and process Hollerith-coded data on a microfilm copy of punched cards.

In some instances, computer-maintained static data such as transportation schedules, rates, and special tables can be converted periodically to microfilm and then searched by means of standard microfilm readers. Where static information ties in with dynamic data maintained "on-line" with the computer, special remote terminals have been designed to permit users to interrogate both data bases at the same time.

By necessity, most large Automatic Data Processing (ADP) systems must use batch-processing techniques and access the master file on a cyclical basis: perhaps once or twice a day, once a week, or even less frequently. In the interim, the data is locked up in the tape reels, and inquiries must wait until the next processing cycle comes around to be answered. By converting the data to a microform by means of COM equipment, inquiries and requests can be handled quickly and efficiently by nonskilled personnel equipped with microfilm readers.

Problem: Updating and Maintenance of Directives, Manuals, and Catalogs

• Total costs for individual updating of directives, manuals, and catalogs kept at numerous locations.
• Errors and delays in individual updating.
• Maintaining large, frequently used manuals and catalogs intact and in good condition.

Updating maintenance and procedural manuals, catalogs, and similar publications can be a time-consuming and difficult problem if there are numerous publications and they are maintained at numerous locations. Errors are made in entering the changes, and some changes are inserted late or never at all. If the manuals and catalogs receive heavy use, as they often do in a maintenance shop, the pages are likely to be torn and lost. When detailed information is needed at the job site, the mechanic may have to copy the information by hand or remove a page.

In most agencies, no one knows exactly what this is costing or is aware of the full effects of not having current, accurate data on hand at each location. Where a detailed study was made, such as at some of the airlines, the savings were sufficient to pay for the cost of the microform system in a comparatively short time.

One way to solve these problems through microform is to maintain a single master copy in cut-sheet form at a central point. Changes are entered in this master copy as they occur. The entire master copy is periodically rephotographed, reproduced in microform, and distributed to the users, who simply dispose of the entire old copy. The microform readers are often equipped with a paper copier so that mechanics can make disposable copies to take back to their job sites when needed. In some situations the microform might also be produced through the use of the computer and COM equipment, as described earlier.

Problem: Procedural Bottlenecks

• Collection and transportation of large volumes of data.
• Verification of data on documents passing through the system.

Collection and transportation of large volumes of data, such as questionnaires and reports, can be a knotty problem if they are retained in their original paper form. The U.S. Census Bureau, Department of Commerce, solves this problem by having the census questionnaires microfilmed at various locations in the field. The microfilm is then shipped to the headquarters office at Suitland, Md., where it is placed on a FOSDIC microfilm optical mark reader, which converts the data to machine language code for processing by computers.

Several Government agencies receive large volumes of checks from the public. The checks can be microfilmed while being processed through the system in order to verify any data that may later be questioned. For similar reasons, organizations using Optical Character Recognition (OCR) equipment for computer input sometimes microfilm incoming documents.

One Federal department must maintain a record of each of the 1.5 million checks it issues each day. In the past, this was done by preparing a paper record. Using COM equipment, the record is now produced directly from magnetic tape, making it possible to place the issue record for 102,000 or more checks on a single roll of microfilm. Duplicate microfilm copies of each month's veterans' benefit check issues are sent to Veterans Administration regional offices throughout the United States, where the microfilm is used to answer thousands of inquiries a month, conduct postaudit operations, obtain a historical record of payments in specific cases, and locate addresses.

If it is necessary to log incoming and outgoing documents, microfilming is usually a much simpler and cheaper method than keeping records by hand. Many libraries use this technique for charging out books. Equipment manufacturers have developed lightweight portable cameras, including some that are battery operated, that add to the practicability of using a microform.

Problem: Storage and Handling of Large and Nonstandard-Size Documents

• Special equipment needs.
• Folding and unfolding of oversize documents.
• Storage of documents with irregular sizes and shapes.

Oversize documents such as tracings, drawings, and maps can be recorded on microfilm to eliminate the need for special equipment and for unfolding and folding the documents each time they are used. However, the original documents must conform to certain quality standards in order to produce a satisfactory microfilm substitute.

Documents having irregular sizes and shapes can be reduced to a uniform size through microfilm. Improved color microfilm is available if color is a significant factor.

Problem: File Integrity

Errors in filing occur in spite of the best efforts of file supervisors. If the file is a large one, it may be days, months, or years before a missing document turns up. Whenever a document is removed from a file and forwarded to a user, it might be lost in transit, accidentally destroyed, damaged, or not returned. These, of course, are serious risks when dealing with important documents such as those affecting individual rights and claims.

Often the best way to ensure absolute file integrity is to convert documents to a microform system. The user is given access through a film-to-film copy or an enlarged paper copy.

Problem: Document Acquisition

• Rising cost of hard copy publications.
• Acquisition of rare or unique documents.

The rising costs of publications printed in paper copy are making it necessary for many libraries, offices, and others to curb their document-acquisition programs. Where a document is available in either paper copy or microform, savings of 70 percent or more can usually be realized by purchasing the microform.

There are also times when desired documents are out of print. If such documents are needed urgently, the simplest and generally cheapest way to obtain them is to make microform copies.

Problem: Document Preservation and Protection

• Prevention of wear and defacement of valuable, irreplaceable documents.
• Protection of indispensable operating records against a disaster.

Archives use microfilm extensively for preservation of important documents. The microfilm copies, not the original documents, are made available to scholars and researchers.

Microfilm is used by many agencies for protection of indispensable operating records against a fire or national disaster. The film is usually kept in a remote, protected depository that in most instances is equipped with machines and supplies for making film-to-film copies or paper enlargements. The original copies of classified documents may be microfilmed so that either the original or a copy of the document is always secure.

Problem: Equipment and Space for Document Storage

• Availability of adequate space to house documents.

While space and equipment savings are often an important factor in a microform cost-benefit analysis, microfilming can seldom be justified for this purpose alone.

Prerequisites for a Successful Microform System

For a microform to serve as a satisfactory substitute for paper copy, it must be as legible and easy to use as its paper counterpart. Microform system success depends upon such factors as the condition of the original documents, the film, the camera, the camera operator's work, the quality of film processing, the suitability of the microform type, proper storage and handling of the microform, the adequacy of viewing equipment, and the ability to locate information quickly within the microform record. A weakness in any of these areas may cause the system to fail.

The single most critical factor is the condition of the documents. Not only does this largely govern the quality of the finished microform, but it is a major cost factor in the filming operation. Typical problems are poor contrast between the reading matter and the paper; extremely fine lines; variations in the color, size, and thickness of documents; intermingling of one-sided and two-sided documents; the need for removal of staples, pins, and other fasteners; and the need for sequence checking and screening to remove extraneous material.

Within the next 10 years, many of the existing large-folder file systems in the Federal Government can be expected to be converted to microform. Steps should therefore be taken as soon as possible to clean up and revise such systems so that the essential papers will be susceptible to low-cost, high-quality microfilming. Careful attention should also be given to planning and maintaining any new, long-term paper document files so that they too may be readily converted to a microform should this later become desirable.

Types of Microfilm and Cameras

Normally, the initial step in any microform system is the recording of document images on silver-halide roll microfilm. This master film, in which images appear in negative form, is then used to produce duplicate reference copies as needed. The copies may also be silver films, but if widespread duplication is necessary, the lower cost, ammonia-developed diazo films are commonly used. A third type, thermally developed vesicular film, may also be used for producing reference copies.

While the original microfilm master is normally in roll form of 16 mm, 35 mm, 70 mm, or 105 mm width, the reference copies are often cut into small pieces for use in systems employing unitized microform media. These include strips, chips, microfiche, microfilm jackets, and aperture cards, which are described later in this chapter.

Four main types of cameras are used in the original filming operation. See figure 10.
These are as follows:

Planetary cameras are employed for obtaining high-quality microfilm of engineering drawings, maps, and assorted other documents that cannot be satisfactorily filmed by a rotary camera.

Step-and-repeat cameras are used for direct filming of documents in the multiple-row microfiche grid format. (Microfiche may also be constructed by cutting 16 mm or 35 mm film into strips and placing the strips in microfilm jackets or arranging them in rows on a special frame or sheet of clear film.)

Rotary cameras are used for filming printed and other documents of uniform size and color where ordinary film quality will suffice. They are largely automatic, thus permitting higher input speeds and the use of unskilled operators.

Computer-Output Microfilm (COM) devices record computer-produced data directly onto microfilm, thereby bypassing the preparation of paper documents altogether. These devices can also automatically add to the microfilm copy the bars or code lines, image marks (blips), or photo-optical binary codes often employed to assist in the retrieval of documents or data.

Figure 10. Planetary Camera; Rotary Camera; Step-and-Repeat Camera; COM (Computer-Output Microfilm) Device

Microfilm reader illustrations: Washington Scientific Industries Model RH Portable Reader; The University Microfilms Model 1212 Reader; DASA Corporation's Mark I Model U Reader; The Information Handling Services Satellite IIW Reader; The Recordak Motormatic Reader, Model MPG.

Factors Affecting the Choice of the Type of Microform System

The choice of which microform system to select is governed by many factors.
Mainly, these are the height and width of the documents, the number of pages per document, the total volume of documents or data, the organization of the file, the nature and extent of changes and additions to the file, the number and location of the users, the nature of the reference activity, the reference rate, retrieval speed requirements, and requirements for producing film or enlarged paper copies. Information on gathering the necessary data for system requirements, analyzing user needs, and selecting the right method and equipment class is provided in chapter VII. Guidance on selection of a particular manufacturer's equipment is contained in the records management handbook Microform Retrieval Equipment Guide. A description of a number of systems employing microforms is included in the records management handbook Information Retrieval Systems.

Types of Microform Systems

The following are descriptions of the various types of microform systems, together with a brief summary of the main advantages and limitations of each. Most of the microform readers mentioned are also available in reader-printer models that can produce full-size paper copies of the documents.

Conventional Roll Microfilm. This category includes systems using hand-driven microfilm readers and standard microfilm reels, as illustrated in figure 11. Flashcards or flash targets are used to separate file segments or pages. (Figure 14 depicts a sample of a flashcard used on roll microfilm.) Conventional roll microfilm systems are well suited to storage or protection of documents for archival, administrative, legal, or security purposes, and to other situations where reference activity is very low. The main limitations of conventional roll microfilm systems are slow retrieval speeds and inconvenience to the user.
The microfilm must be hand threaded through the reader, a slow and tedious operation. The user must then hand crank the film and scan the reader screen image by image until he finds the desired document.

Motorized Roll Microfilm with Mechanized Image Locator Aids. In both this system and the one that follows, most of the microfilm reading equipment has been improved in three ways: first, a motor, usually with both high and low speeds, has been added; second, film cartridges or cassettes have been substituted for standard microfilm reels, and the reader has been made self-threading; and third, new techniques or devices have been employed to aid in locating desired film images.

Except for conventional roll microfilm systems, the motorized roll microfilm systems with mechanized image locator aids are generally the lowest in overall costs. They offer particular advantages for lengthy documents or record series. They can be successfully employed for the reproduction, dissemination, storage, and retrieval of catalogs, manuals, and publications, in which case many of the advantages described below for microfiche apply. Figure 12 shows some typical motorized (mechanized) roll microfilm readers and reader-printers, while figure 13 provides examples of the various types of cartridges or cassettes employed.

The mechanized image locator aids are of three types, as follows:

• Bars or code lines superimposed between images on the film that, when matched with a corresponding scale on the reader screen, can usually narrow the search to ten images or fewer in a sequentially arranged numerical or alphabetical file.
• Film pull-down (linear location) aids, which employ microfilm readers incorporating an odometer-like device for finding images on the basis of their linear location on the film. As with the image count system, this one depends upon the user's knowing or separately looking up the location of the desired image.
• Image count aids, which consist of marks (blips) superimposed beside each film image for use on a reader that has a photoelectric counting device. To locate an image, the user must know or separately look up the image location number for the document he wishes to view. He enters this number on the reader keyboard, and the film automatically moves through the reader and stops when it reaches that number.

Figure 14 depicts examples of roll microfilm employing these three mechanized locator aids.

The use of cartridges and cassettes with self-threading motorized microfilm readers has substantially improved the ease and convenience of using roll microfilm. The image-finding aids are a real boon to retrieval speeds in situations where they can be satisfactorily applied. Of the three techniques, film pull-down (linear location) is usually the least costly and can be incorporated into a system quite easily. The bar or code line systems are the next least costly and somewhat more difficult to incorporate into a system.

All three image-finding techniques have certain limitations. Bar or code line systems can be used only where the file is sequentially arranged by numerical or alphabetical identifiers and the user is conducting his search on the same basis. While the film pull-down (linear location) and image count techniques permit the documents to be in random sequence, a separately maintained list or index may be required to determine the proper microfilm roll and image location. Systems employing the image count technique require microfilm readers that are more complex, and hence normally more costly, than those used in the other two.

Special Note on Changing or Adding to Roll Microfilm

Most roll microfilm systems have one problem in common: changing or adding to previously filmed records.
There are three methods for do- ing this, and none may prove entirely satisfactory. However, under certain circumstances, one or more might prove practical. The first and least likely method (except for publication of catalogs, manuals, listings, and COM produced items) is to retain the original documents, make the changes, and periodically refilm the entire file. A second but not always practical choice is to film the changes or additions and splice the new film onto the old film. A third method is to film the changes 23 Approved For Release 2001/07/17 : CIA-RDP74-00005R000100020030-9 Approved For Release 2001/07/17 : CIA-RDP74-00005R000100020030-9 INDEX METHODS USED IN 16 mm FILM Image Count Photo-optical Code Figure 14 24 Approved For Release 2001/07/17 : CIA-RDP74-00005R000100020030-9 App&&yMgsF&qrtWe1ff a el gkVQO 7 113 thg DP sfrypgogRNo0 g0 o900200 or 30;Reet containing full- a to the microfilm collection, and maintain a separate index or locator record (preferably com- puter maintained and produced) showing the lo- cation (microfilm roll and possibly image num- ber) of both the old and new images. This re- quires the user to make a double lookup, but this may prove to be only a minor handicap. Roll Microfilm with Photo-optical Binary Code. This type of coding system can be used to conduct computer-like searches. Figure 14 de- picts a sample of photo-optically coded roll microfilm. Such document descriptive data as titles, names, dates, numbers, and subject topics can be recorded in photo-optical binary code for- mat on the film, thus permitting the user to auto- matically conduct both simple and complex or coordinate-type searches. Depending upon the features of the particular equipment, search entry is made through a keyboard, dials, or a machine record such as edge-punched cards. 
The major advantage of the motorized roll microfilm system with photo-optical binary code is that it permits the user, while conducting the computer-like search, to see the documents involved at the same time. The major disadvantage of these systems is cost. Except where COM equipment is employed for preparing the microfilm, the input costs are usually greater than for other microfilm systems. The retrieval equipment costs more than that used in most other microfilm systems and is somewhat more difficult to operate. Unlike data stored in a computer, the binary optical code, once recorded on the film, cannot be changed. Further, unless the file can be broken down into separate autonomous groups and the individual searches confined to a single group, the time required to conduct individual searches will increase as the file grows. This could result in a need for additional equipment and personnel, and thereby tend to offset the initial advantages of the system.

Microfilm Strip Systems. Microfilm strip systems employ roll microfilm cut into segments for storage of multipage documents. Three general manual methods used for storage and retrieval of the strips are: (1) maintenance in separate small metal or plastic containers; (2) attachment of the strips to a card or sheet containing full-size written information; and (3) attachment of the strips to plastic sticks about a foot long, maintained in horizontal racks for rapid removal and refiling. The first two have received limited use for dissemination, storage, and on-demand reproduction of lengthy documents, while the third has been used primarily for storing and retrieving information and data contained in such listings as a directory or catalog. Figure 15 depicts a microfilm strip attached to a plastic stick, and the special storage rack and reader used for this type of strip system.

All three techniques provide a means for unitizing microfilm so that the individual documents or parts thereof may be independently selected, viewed or copied, and refiled.
The third technique facilitates storage and retrieval of lengthy listings by making it possible to keep them in a very small space while at the same time permitting fast, random access to the information. However, an actual test is always needed to determine comparative retrieval speeds. The major problem with the first type is the physical handling of the strips: opening the container, hand-threading the strip through a reader or splicing it onto another length of film, and returning it to storage. The main problem with the second type is that it, too, is somewhat awkward to handle and can be used only in certain microfilm readers. The main limitation of the third type is the cost of preparing and mounting the film and of purchasing the special reader required to view it.

[Figure 15: Microfilm strip on a plastic stick, with special storage rack and reader]

Microfilm Chip Automated Systems. These systems, as illustrated in figure 16, usually employ small pieces of cut microfilm that are often stored in cartridges or cells and manipulated by means of electronic circuitry and electromechanical devices. A keyboard or other device is used to conduct searches. These systems have been used primarily to meet the need for high-speed retrieval of short documents (generally one to three pages) from extremely large files.

In some systems, a considerable amount of photo-optical binary coded data can be entered on the chip, while in others only a document number or address can be recorded. In one system there is an iron oxide coated strip for recording data by means of a magnetic binary code, as on the magnetic tape used with computers. Microfilm chip systems are quite complex, usually involve rather high equipment costs, and thus have not been used as extensively as some of the other systems. The hardware is generally not available off the shelf but must be custom engineered.
[Figure 16: Automated microfilm chip system]

Microfiche. Microfiche, as illustrated in figure 17, are sheets of microfilm containing multiple rows of micro-images arranged in a grid pattern. Microfiche are particularly well suited to the reproduction, dissemination, storage, and retrieval of documents or records having a total length of 20 to 98 pages, or having chapters, sections, or parts of that length; they can also be used for longer documents, of course. Microfiche are sometimes used for storage of case-type material, such as hospital records.

The two most commonly used microfiche formats are both about 4 by 6 inches in size. The format shown in figure 17 (60 pages per microfiche) was adopted in 1965 as the Government standard for reproduction of scientific and technical documents. Another format (98 pages per microfiche) has recently been used increasingly by both industry and Government. Figure 18 describes some of the wide variety of microfiche formats and reduction ratios in use today, including high reduction (HR) ratios.

[Figure 17: Sample of a microfiche]

One of the major advantages of the microfiche is a possible saving of 70 percent or more to the user in acquisition costs where a document is available in both microfiche and paper form. Another advantage is the elimination of document warehousing problems, since low-cost copies of microfiche can be produced at any point on demand. In many situations the most significant advantage is the savings in time and costs for packaging, shipping, storing, and retrieving documents. Probably the major disadvantage of the microfiche is the relatively high input cost, which may make this type of microform uneconomical for internal application within a single office. However, if the documents are widely distributed, input costs can become quite insignificant.
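The grid arrangement of microfiche described in this section lends itself to simple address arithmetic. The sketch below assumes that the 60-frame format is laid out as 5 rows of 12 frames and the 98-frame format as 7 rows of 14, with rows lettered from the top and frames filled left to right. Those layouts and the helper names are assumptions made for illustration; any space a real format reserves for an eye-readable title is ignored here.

```python
import math
import string
from typing import NamedTuple


class FicheAddress(NamedTuple):
    fiche: int   # sheet number within the set (1-based)
    row: str     # grid row letter, counted from the top
    column: int  # grid column number, counted from the left (1-based)


# Assumed grid layouts (rows, columns) for the two formats named in the text.
GRIDS = {60: (5, 12), 98: (7, 14)}


def fiches_needed(pages: int, frames_per_fiche: int = 60) -> int:
    """Number of sheets needed to hold a document of the given length."""
    return math.ceil(pages / frames_per_fiche)


def locate(page: int, frames_per_fiche: int = 60) -> FicheAddress:
    """Map a 1-based page number to its sheet and grid cell, filled row by row."""
    rows, columns = GRIDS[frames_per_fiche]
    sheet_index, frame = divmod(page - 1, rows * columns)
    row_index, column_index = divmod(frame, columns)
    return FicheAddress(sheet_index + 1, string.ascii_uppercase[row_index], column_index + 1)


print(fiches_needed(200))       # 4
print(fiches_needed(200, 98))   # 3
print(locate(60))               # FicheAddress(fiche=1, row='E', column=12)
print(locate(130, 98))          # FicheAddress(fiche=2, row='C', column=4)
```

A coordinate such as "sheet 2, C4" is all a reader user needs in order to position the fiche under the lens. The second call also shows why the 98-frame format appeals for longer documents, since it needs fewer sheets for the same length.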
Another disadvantage is that, up to the time this handbook was prepared, there had been no practical, inexpensive method for changing or adding to individual microfiche. If updating is required, the alternatives are similar to those described above under the heading "Special Note on Changing or Adding to Roll Microfilm." Still another factor limiting the use of the microfiche is that special readers are required at every point of use; even though inexpensive readers are available, the overall equipment investment may be substantial. However, as the use of the microfiche is extended to more and more document series, the readers may eventually become standard office equipment. Another possible disadvantage is that some users feel further improvements are needed in the readers to make viewing more convenient and comfortable.

[Figure 18: Common microfiche formats and reduction ratios]

Microfilm Jackets. Microfilm jackets are transparent carriers with one or more sleeves or pockets for holding strips of microfilm, as shown in figure 19. The entire jacket, with the microfilm inside, is placed in a reader for viewing. Film-to-film copies and paper enlargements may be made without removing the film from the jacket. To get the best results it is necessary to use one of the newer "thin film" jackets.

[Figure 19: Sample of a microfilm jacket]

The major advantage of the microfilm jacket is that new images may be added, which makes it particularly suitable for active case-type records. It is compatible with the microfiche and can be used in the same types of readers and film-to-film copiers, and thus has many of the advantages noted above for the microfiche. The major disadvantage of the film jacket is the time required for inserting individual micro-images into the sleeves of the jacket; however, special equipment has been developed to make this task much easier.

Aperture Cards (Microfilm Electric Accounting Machine Punched Card). These cards, illustrated in figure 20, are standard punched cards (or edge-notched cards) with windows containing micro-images. The window is usually designed to accommodate one large document, such as an engineering drawing, or as many as eight or 10 letter-size pages; in the case of the punched card, the window occupies 22 card columns of space. This leaves over 50 columns for recording data such as the document number, description, and date in machine-coded form. There are also aperture cards containing sleeves, as in microfilm jackets, for inserting and adding images.

[Figure 20: Sample of an aperture card]
[Figure 21: Superminiature (high reduction) microforms]

One of the major advantages of the aperture card is the convenience in filing, retrieving, and adding to the file.
Another advantage of aperture card systems is the capability for using mechanical devices for sorting and selecting individual cards, while at the same time permitting manual filing and selection of cards. Still another important advantage is the savings in time and cost for duplicating, shipping, handling, storing, and retrieving documents. Further, a wide variety of equipment is available to satisfy the needs of the smallest to the largest user.

[Figure 22: Video recording system (image storage; desktop viewing equipment)]

The major disadvantage of the aperture card system is the relatively high input cost involved in filming, keypunching (or edge-notching) the cards, and mounting the micro-images in the apertures. Therefore, as in the case of microfiche, the cost may make such systems uneconomical for internal application within a single office. Further, extensive machine sorting and selection of the aperture cards may not be practical if the file is a very large one. When punched card equipment is used for card sorting and selecting, it is usually modified to minimize damage to the micro-images; or a duplicate "slave" deck, which does not contain the micro-images, is created for use in the punched card machines.

Superminiature (high reduction) Microforms. Superminiature microforms and those referred to as ultraminiature microforms (ultrafiche) employ a reduction ratio much higher than those used for ordinary microforms. (See figure 21 for an example of a book of more than 1,000 pages reduced to one ultrafiche, and the special reader required for viewing the images.) The standard reduction ratios in use today readily permit the recording of 2,000 to 2,500 letter-size pages on a 100-foot roll of microfilm (and in some systems, up to 4,000 pages per 100-foot roll).
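Page counts like the 2,000 to 2,500 pages per 100-foot roll quoted above follow from simple arithmetic on the reduction ratio. The sketch below is illustrative only. It assumes a letter-size page filmed with its 11-inch side running along the film at 24X and a hypothetical 0.05-inch spacing between frames, and it ignores leader and trailer. The text does not say which ratio or spacing its figures assume. Because a reduction ratio is linear, the area an image occupies shrinks with the square of the ratio, and that is where the superminiature forms gain their compactness.

```python
INCHES_PER_FOOT = 12


def frames_per_roll(roll_feet: float, page_inches: float, ratio: float,
                    gap_inches: float = 0.0) -> int:
    """Count whole frames that fit on a roll at a given linear reduction ratio.

    page_inches is the page dimension that runs along the film; gap_inches is
    an assumed spacing between frames. Leader and trailer are ignored.
    """
    pitch = page_inches / ratio + gap_inches  # film length consumed per frame
    return int(roll_feet * INCHES_PER_FOOT // pitch)


def area_gain(high_ratio: float, low_ratio: float) -> float:
    """Reduction ratios are linear, so image area shrinks with the square."""
    return (high_ratio / low_ratio) ** 2


print(frames_per_roll(100, 11, 24))        # 2618, with no spacing
print(frames_per_roll(100, 11, 24, 0.05))  # 2360, with a 0.05-inch gap
print(round(area_gain(200, 24)))           # 69
```

Under these assumptions, a 200X superminiature image occupies about 1/69 of the film area of a 24X image. That ratio, rather than the 200-to-24 linear figure, is what matters for storage and shipping.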
Reduction ratios as low as 10 to 1 (10X) are used for newspapers, and ratios as high as 42 to 1 (42X) are used for COM-produced listings and cancelled checks. Superminiature microfilm, on the other hand, employs reduction ratios of approximately 200 to 1 (200X) and higher.

The major advantage of superminiature microforms is the further savings in space and shipping costs resulting from the greater compactness of the micro-images. Superminiature microforms make it possible to store an extremely large collection of documents close to the users, or possibly within the viewing equipment itself. The major disadvantage of superminiature microforms is the initial cost of preparing the master copy. However, as in the case of the microfiche and the aperture card, this cost may not prove excessive if there are a large number of users at various locations who use the same information over and over again. Another disadvantage is the lack of compatibility between this and any other microform medium. Special readers with optics compatible with the very high photographic reductions are required.

Video Recording Systems. These systems employ the basic techniques and equipment used in recording television programs, as illustrated in figure 22. Documents are placed under a camera and magnetically recorded on video tape or other media. A separate track records the document's number or other identifier. Retrieval is accomplished through a keyboard or by preparing a machine record, such as a punched card, that is fed into the retrieval device. Images of the retrieved documents may be viewed on remote terminal cathode ray tube (CRT) screens, or enlarged paper copies can be produced.
The major advantages of the video recording systems are the instant recording and inspection of document images, the ability to add or delete documents, ease of use, and relatively fast retrieval speeds. Video recording systems have not been in use long enough to fully evaluate their performance and potential. However, the major disadvantages appear to be the relatively high system cost; the need for special skills in planning, operating, and maintaining the system; and the need for special work procedures and routines to compensate for the lack of a practical means for gaining random access to the file.

Special note on mechanized devices (miscellaneous card selectors) for storage and selection of microfiche, microfilm jackets, aperture cards, and other unit records.

There are numerous electromechanical devices that permit selection of individual unitized microforms by means of a keyboard. The smaller ones have trays holding approximately 1,000 items each, which can be interconnected and operated through a single keyboard. Typically, the individual items are notched along the bottom edge, and the selected item pops up when its identifying number or location address is entered on the keyboard. There are also very large units, some of which can be accessed through remote terminals equipped with keyboards and CRT displays. Some also have the ability to perform limited coordinate-type searches.

The major advantages of these devices are that they reduce physical strain, eliminate the need for interfiling as microforms are returned to the file, and make possible an increase in retrieval speed. The major disadvantage is cost. To justify the purchase of such equipment, the file must be very active, but not more so than one person per keyboard could handle. Thus, the limited access to the file could pose a serious problem in times of peak loads, expanded reference activity, or machine breakdown.
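The card selectors described above can be modeled in a simplified way. This is a hypothetical sketch: it assumes that interconnected trays of 1,000 items split an address into a tray number and a slot, and that an item's bottom-edge notches encode its slot as a 10-position binary pattern. Real selectors used their own mechanical codes. The model does illustrate why interfiling is unnecessary: selection matches notch patterns, not positions, so a returned item can go anywhere in its tray.

```python
import random
from dataclasses import dataclass
from typing import Optional

TRAY_CAPACITY = 1_000  # "approximately 1,000 items" per tray, per the text
NOTCH_BITS = 10        # assumption: ten notch positions encode slots 0-999


@dataclass
class Item:
    address: int  # identifying number or location address
    notches: str  # hypothetical bottom-edge notch pattern ("1" = notch cut)


def notch_pattern(slot: int) -> str:
    """Encode a slot number within a tray as a binary notch pattern."""
    return format(slot, f"0{NOTCH_BITS}b")


def refile(trays: dict[int, list[Item]], address: int) -> None:
    """Drop an item back into its tray at any position; no interfiling needed."""
    tray_no, slot = divmod(address, TRAY_CAPACITY)
    tray = trays.setdefault(tray_no, [])
    tray.insert(random.randrange(len(tray) + 1), Item(address, notch_pattern(slot)))


def select(trays: dict[int, list[Item]], address: int) -> Optional[Item]:
    """Simulate a keyboard entry: the item whose notches match 'pops up'."""
    tray_no, slot = divmod(address, TRAY_CAPACITY)
    wanted = notch_pattern(slot)
    for item in trays.get(tray_no, []):
        if item.notches == wanted:
            return item
    return None


trays: dict[int, list[Item]] = {}
for address in range(2_500):       # fills three interconnected trays
    refile(trays, address)

print(len(trays))                  # 3
print(select(trays, 1_234))        # Item(address=1234, notches='0011101010')
```

The single shared keyboard appears here as the one `select` call at a time. That is the bottleneck the text warns about at peak loads.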
Microform-Computer Combinations

The motorized roll microfilm systems with photo-optical binary code and the microfilm chip systems combine machine-readable data and document images in a single medium, allowing the micro-images to be searched and viewed at the same time. Further, any of the microform methods and equipment described earlier can be used in combination with a computer. There is also an increasing number of microform devices specially designed for direct use with computers.

Computers, as explained in chapter V, can perform complex coordinate and other types of logical searches, as well as other forms of data manipulation, at very high speeds. However, on-line storage of very large volumes of data can be extremely expensive; and since computers can work only with information that has been converted to a machine language code format, their capability for storage and presentation of graphics and large masses of data is rather limited. The situation is much the reverse for microforms, of course. Consequently, the computer and the microform can often complement each other very effectively: the low-volume index data (or dynamic data) are maintained on-line with the computer, and the large volume of information and graphics (or static data) is kept in microform. Finally, a communications link (human, part human and part machine, or all machine) is provided to permit the two to work as a team. Roll microfilm and various forms of unitized microfilm such as microfiche, microfilm jackets, aperture cards, and chips are often employed.

In any event, the microform portion of the system's work station includes a microfilm reader or copier that is mechanized to some degree. Communication with the computer portion of the system may be accomplished by either of two methods: one uses a remote terminal with a keyboard and possibly a CRT display; the other uses punched card or punched paper tape equipment for sending messages to and from the computer. There is also equipment available that permits use of a single keyboard to communicate with both the microfilm and computer portions of the system. It employs a split viewing screen for simultaneously displaying information produced by both parts of the system.

If a person serves as the communication link between the computer and the microform storage unit, he is responsible for retrieving the appropriate micro-images upon receipt of the message from the computer. In other systems the computer message is used to automatically activate a microform reader that finds and displays the related micro-images for the user. In still another system, the computer message is used to control a mechanism that locates the proper microfilm image and makes a film-to-film copy of it.

The advantages of combined microform-computer systems include an increase in the usefulness of the computer, reduction of computer storage costs, faster retrieval of information, and improved access to information. Microforms can store large masses of previously acquired information, along with current static or semistatic data, close at hand, while the computer quickly identifies the location of needed information and performs related ADP operations. Together they provide new solutions to both today's and tomorrow's problems.

The disadvantages are mainly that such systems usually require highly skilled designers and a rather substantial initial investment.

Special Considerations

It should be quite clear by now that microform systems do not offer a panacea for all of an agency's document dissemination problems. A cost-benefit study should always be made and pilot tests conducted before deciding to go ahead with a system.
A major obstacle in any microform system is gaining user acceptance, and nothing should be left to chance. Appendix E (Department of the Air Force Regulation 12-40, March 5, 1971) provides a good example of the types of management controls required to ensure the successful application of document miniaturization techniques.

When designing a microform system, serious consideration should also be given to capturing and maintaining key identifying data in machine language. Using source data automation techniques, this can be done for a small additional cost at the same time the labels are typed. The machine-language record should prove highly useful for automatically preparing finding aids, inventory lists, and new labels, and for purging the file.

Attention should also be given to subpart 101-11.5 of the Federal Property Management Regulations (41 CFR 101-11.5). While this regulation applies primarily to the microfilming of permanent records so that the originals can be destroyed, many of the safeguards it provides should be observed in all microform systems.

The National Archives and Records Service, General Services Administration, operates microfilming service centers throughout the country. Government officials interested in these services or desiring assistance in microfilming and other paperwork management matters should contact the manager of the nearest GSA Regional Office or Federal Records Center.

IV. MANUAL NONCONVENTIONAL INDEXING SYSTEMS

The methods and equipment described in chapter III, "Microform Systems," were developed primarily to solve the problems associated with the physical handling and storage of documents.
That chapter also explained how microform systems can sometimes help with looking up data in such voluminous listings as payrolls, directories, schedules, and price lists. If, for example, the user's problem is simply to look up the social security number, address, or telephone number of individuals with whom he deals, a microform system, or perhaps a conventional tool such as a printed directory or card file, is usually all that is needed. If, on the other hand, retrieval involves searching for documents or information on the basis of subject topics or a variety of characteristics, attributes, or other features, the problem is quite a different one. The problems and limitations of using conventional methods and equipment in situations of this type are described in chapter I, "Why Information Retrieval Systems Are Needed." Chapter I, as well as chapter II, "How Coordinate Indexing Systems Work," explains how nonconventional information retrieval systems may be employed to solve these problems.

This chapter (IV) and the next one (V) describe the specific methods and equipment used in these nonconventional systems. This chapter covers manual methods and equipment, while the one that follows describes those employing mechanized equipment.

Manual nonconventional indexing systems, for the purpose of this handbook, include those where the search is conducted by manual methods. The tools or devices may be prepared manually, but some are, and most could be, produced and updated by computers and other machines. Further, some of the tools could be converted to a microform format for ease in duplication and dissemination.

Types of Situations Where Nonconventional Indexing Systems Are Used

There are two basic types of situations where the methods and equipment in this and the next chapter are applicable.
The first type involves organization of information mainly on the basis of subject topics for retrieval of textual documents or information. The second type is concerned with organizing information (data) on the basis of characteristics or attributes (also referred to as indexing terms in this handbook) for use in identifying and retrieving information or documents relating to individual people, places, or things. An example of this second type is a personnel skills inventory describing employees in terms of their education, experience, languages spoken, etc., for use in selecting people for promotion, reassignment, special projects, or other purposes. This second type of system is far less complex to design and operate than the first, mainly because the characteristics, attributes, or features to be used as indexing terms are relatively simple to develop and define, while selecting and defining subject topics is difficult and imperfect because of the ambiguity of human language.

Prerequisites for a Successful Manual Nonconventional Indexing System

The most important prerequisite for a successful indexing system is to obtain the right people for the job. In all but the smallest and simplest of systems, special talents of two types are required. The first requirement is for the services of a skilled person to design the system and then return periodically to revise it, since there is no such thing as a finished design for an indexing system. If the system involves indexing documents by subject, this person should have a thorough knowledge of both the subject matter field and indexing.
If no such person is available, it may be necessary to use the team approach; that is, to bring together an individual who has a thorough knowledge of indexing but only a limited knowledge of the subject matter with a person of the opposite qualifications.

The second type of talent needed is qualified personnel to operate the system. Again, if the system is used for indexing documents by subject, the indexers and searchers or indexer-searchers (and abstracters, if any) should have a thorough knowledge of the subject matter field and be properly trained in performing their duties.

Of next or perhaps equal importance is the need for an operating manual or rule book. The operating manual should include a vocabulary of indexing terms (a thesaurus, as it is commonly called) that lists all indexing terms and defines how they are used in the system, supplemented by cross-references for synonyms and by one or more devices for showing relationships among indexing terms. The operating manual should also include any other rules, guidelines, and reference aids needed by indexers, searchers, and users.

Another prerequisite for a successful system is close coordination between the operators and users of the system in all matters, including selection of documents or data entered into the system and continuous feedback on the effectiveness and value of the system. All users need to be kept informed about new accessions, and new users should be oriented in the contents and use of the system.

Another possible prerequisite, or at least a desirable feature of the system, is compatibility with other systems with which it may be interfaced now or in the future. This compatibility is of two kinds: system vocabulary and physical aspects.
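The thesaurus described above has three parts: authorized terms with their usage defined, synonym cross-references, and a device for showing relationships. Each part maps onto a small data structure. This is a minimal sketch under stated assumptions. The terms, the scope notes, and the choice of a single "broader term" link as the relationship device are all hypothetical. Controlling synonyms this way is also what makes a vocabulary compatible across systems, because both sides resolve variant wording to the same preferred term.

```python
# Hypothetical thesaurus fragment. Each preferred term carries a scope note
# (how the term is used) and, optionally, a broader term (a relationship device).
PREFERRED = {
    "microforms": {"scope": "Any medium carrying micro-images", "broader": None},
    "microfiche": {"scope": "Sheets of micro-images in a grid", "broader": "microforms"},
    "aperture cards": {"scope": "Punched cards with film windows", "broader": "microforms"},
}

# Synonym and variant-form cross-references: entry term -> preferred term.
SYNONYMS = {
    "fiche": "microfiche",
    "aperture card": "aperture cards",
    "microcopies": "microforms",
}


def normalize(term: str) -> str:
    """Replace a synonym with its preferred term; reject terms not in the vocabulary."""
    key = term.strip().lower()
    key = SYNONYMS.get(key, key)
    if key not in PREFERRED:
        raise KeyError(f"{term!r} is not an authorized indexing term")
    return key


def narrower(term: str) -> list[str]:
    """Preferred terms whose broader term is the given term."""
    return sorted(t for t, entry in PREFERRED.items() if entry["broader"] == term)


def expand_for_search(term: str) -> list[str]:
    """A searcher's term plus its narrower terms, after synonym control."""
    preferred = normalize(term)
    return [preferred] + narrower(preferred)


print(normalize("Fiche"))                # microfiche
print(expand_for_search("microcopies"))  # ['microforms', 'aperture cards', 'microfiche']
```

Rejecting unknown terms outright, instead of silently accepting them, reflects the rule-book role of the operating manual. An indexer who needs a new term goes through the person responsible for revising the vocabulary.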
Today, any given collection of documents or data is seldom of interest or value to only a single organizational element. Somewhere within an agency, another agency, or the private sector, there is likely to be one or more groups of people collecting, storing, and retrieving similar if not identical information. System compatibility can therefore be of mutual benefit, possibly contributing through sharing arrangements to lower costs for all systems involved, while increasing the level of service to users.

Another important prerequisite is that there should be a minimum of delay in entering new items into the system and making them available to users. Not only should a search of the index reveal the presence of an item, but it should also be possible for the user to obtain a copy of it quickly. Other prerequisites for providing good service to the user include ready access to the system and satisfactory performance of the system. A highly desirable, but not necessarily essential, feature would be that the system be readily convertible to an automated system.

Factors Affecting the Choice of the Type of Manual Nonconventional Indexing System

The major factors to be considered in choosing the most suitable type of manual nonconventional indexing system are as follows: (1) the present file size, growth rate, and estimated future size of the collection; (2) if the system is to be used for retrieving information by subject, the average number of indexing terms that will be assigned to each document and the total number of indexing terms for the system; (3) if the system is to be used for retrieving information or identifying people, places, or things on the basis of characteristics, attributes, or features, the number of these that will be used to describe each item entered into the system; (4) the physical form, format, cost, and source of the input; and (5) the extent to which the documents or data will have to be changed, updated, or deleted.
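Factors (1) and (2) above combine into a quick sizing estimate that is useful before choosing among the systems that follow. This is a rough sketch only. It assumes steady linear growth and counts one "posting" (one entry of an item under one term) for every term assigned. The sample figures are invented for illustration, not drawn from any actual system. Average postings per term gives a first indication of how much each term entry in the index must carry.

```python
def project_file(present_items: int, items_added_per_year: int, years: int,
                 terms_per_item: float, vocabulary_size: int) -> dict[str, float]:
    """Rough sizing figures for choosing a manual indexing system.

    Assumes steady (linear) growth and that every term assigned to an item
    produces one posting (one entry of that item under that term).
    """
    future_items = present_items + items_added_per_year * years
    postings = future_items * terms_per_item
    return {
        "future_items": future_items,
        "postings": postings,
        "postings_per_term": postings / vocabulary_size,
    }


# Illustrative figures only: 20,000 documents now, 5,000 added a year,
# a five-year horizon, 8 terms per document, a 1,200-term vocabulary.
print(project_file(20_000, 5_000, 5, 8, 1_200))
# {'future_items': 45000, 'postings': 360000, 'postings_per_term': 300.0}
```

Rerunning the estimate with the high end of each range (faster growth, more terms per document) shows how sensitive the choice of system is to these factors.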
Other important factors to be considered in selecting the type of system include: (1) the average number of indexing terms to be used per search, the average number of searches per day, and the extent of workload fluctuations and peak loads; (2) the number and types of users and their physical location; (3) the physical form, format, and nature of the output required by the users; (4) service speed requirements; (5) special features required, if any, such as abstracting and evaluating documents and selective dissemination of information (SDI); (6) accuracy and reliability requirements; and (7) agency resources, including availability of funds, personnel, and equipment for operation of the system.

Further information regarding the significance of these factors, and guidelines on gathering the data, analyzing user needs, and selecting the right method and equipment, are included in chapters VI and VII. A description of a number of systems employing manual nonconventional indexing methods and equipment is included in the records management handbook, Information Retrieval Systems.

Types of Manual Nonconventional Indexing Systems

The following are descriptions of the various types of manual nonconventional indexing systems, with a brief summary of the main advantages and disadvantages or limitations of each.

Clue-word Extract Card Systems. These systems are subject indexes consisting of 5- by 8-inch cards arranged alphabetically by "clue-words" (keywords) taken from the titles and text of the documents. Each card contains an extract of the document in which the keyword appeared. The extract is marked to indicate other keywords contained in the document, thus providing built-in "clues" as to other places to look in the file when conducting a search. Figure 23 illustrates how the "clue-word" principle operates.

[Figure 23: Clue-word extract card system]
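In present-day terms, filing an extract card under each of its keywords, as this system does, builds an inverted file. Following the underlined clues from card to card is then a breadth-first walk over that file. The sketch below is a minimal model under stated assumptions. The document numbers, extracts, and keywords are hypothetical, and the `max_hops` parameter stands in for the searcher's own judgment about when to stop following clues.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractCard:
    doc_no: str
    extract: str
    keywords: tuple[str, ...]  # the underlined clue-words


def build_file(cards: list[ExtractCard]) -> dict[str, list[ExtractCard]]:
    """File one copy of each card under every one of its keywords."""
    card_file: dict[str, list[ExtractCard]] = defaultdict(list)
    for card in cards:
        for keyword in card.keywords:
            card_file[keyword].append(card)
    return card_file


def search(card_file: dict[str, list[ExtractCard]], start: str, max_hops: int = 1) -> list[str]:
    """Scan the cards under a starting keyword, then follow clue-words.

    max_hops stands in for the searcher's judgment about how far to follow
    clues: 0 scans only the starting keyword, 1 also scans each clue found there.
    """
    visited = {start}
    found: dict[str, ExtractCard] = {}
    queue = deque([(start, 0)])
    while queue:
        keyword, hops = queue.popleft()
        for card in card_file.get(keyword, []):
            found.setdefault(card.doc_no, card)
            if hops < max_hops:
                for clue in card.keywords:
                    if clue not in visited:
                        visited.add(clue)
                        queue.append((clue, hops + 1))
    return sorted(found)


cards = [
    ExtractCard("D1", "Fiche cut distribution costs...", ("microfiche", "dissemination")),
    ExtractCard("D2", "Report distribution lists...", ("dissemination", "reports")),
    ExtractCard("D3", "Reports budget review...", ("reports", "budget")),
]
card_file = build_file(cards)
print(search(card_file, "microfiche", max_hops=0))  # ['D1']
print(search(card_file, "microfiche", max_hops=1))  # ['D1', 'D2']
print(search(card_file, "microfiche", max_hops=2))  # ['D1', 'D2', 'D3']
```

The duplication in `build_file`, where each card is copied once per keyword, is the source of the bulkiness the text lists as a disadvantage of the original card systems.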
Information specialists, or preferably users of the system, evaluate incoming documents for relevancy. They underline the keywords in each selected document and place brackets around the portions to be extracted. They also assign additional indexing terms, if needed. Typically, tables of contents, author-prepared abstracts, and key illustrations are included in the extract. Typists then prepare a 5- by 8-inch duplicating master containing the document number, title, author, other standard descriptive headings, and the extract with all keywords underscored. Enough cards are made of each document to permit filing one card under each of its keywords and the standard headings. Cards of various colors, colored stripes, and corner cuts are employed to code the cards as to date, source, type of document, etc. The incoming material is maintained in a separate file.

The user begins his search by choosing a keyword he thinks should be helpful in identifying documents that may contain the information he is seeking. If, after scanning the cards filed under that term, he is still unable to find what he wants or needs further information, he notes the other underlined keywords appearing in the body of the cards for "clues" as to where else to search. He then refers to the other cards and proceeds in this way until he finds the desired information or has satisfied himself that the document collection contains nothing significant on the subject.

The major advantages of the clue-word extract card system are that no complicated input and output equipment is required; no preconstructed index vocabulary is needed (the system is self-organizing); no special training is needed for conducting searches; it is highly browsable; and the extract cards are self-sufficient (it is usually not necessary to refer to the original document).
Further, this technique offers a simple, effective means for compacting text. The system concept is susceptible to application of computer techniques for information dissemination, automatic searching, and preparation of special finding aids. Therefore, anyone establishing a manual clue-word extract card system today should capture and retain the input data in machine language format for possible conversion to a computerized system at a later date.

The major disadvantages of the original clue-word extract card systems are the bulkiness of the files and the lack of a practical means for converting the file to an automated system.

Permuted Indexes. Permuted indexes are specially organized printed manual indexes, usually prepared by a computer from document titles, full text, a catalog, or index entries, as illustrated by the format of the KWIC (keyword-in-context) index shown in figure 24. There are various other formats, many of which are an improvement over this one. Some of the better known other permuted indexes are KWOC (Keyword Out of Context), WADEX (Word and Author Index), and SPINDEX (Special Permuted Index). To obtain these indexes, a computer is programmed to arrange the entries alphabetically so that each document or other thing being described in the index is listed under each of its keywords.

[Figure 24: Sample page from a permuted (KWIC) index]

KWIC indexes have been successfully applied in indexing operating procedures and directives, forms catalogs, the Comptroller General's decisions, and in numerous other situations. In cases where permuted indexes are used for indexing procedures and directives, a special dividend may be expected: the index will highlight inconsistencies, duplications, and omissions. With the increased usage of permuted title indexing, authors are giving more attention to selecting meaningful, useful titles; and this, together with the improved formats and low costs, is enhancing the use of permuted title indexing.
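A permuted index of the KWIC type can be produced by a short program. The sketch below is a minimal illustration, not a reconstruction of any particular KWIC program: the stoplist is hypothetical, titles are split on blanks only, and each title is listed once under every word not on the stoplist, with that keyword aligned in a fixed column, its preceding words to the left, and the document number at the right, in the manner of the sample page in figure 24.

```python
# Hypothetical stoplist; the handbook does not give one.
KWIC_STOPLIST = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "under"}

def kwic(titles, width=30):
    """Build keyword-in-context lines from (title, document number) pairs.

    Each title appears once under every word not on the stoplist. The
    keyword starts in a fixed column, left context is right-justified
    before it, and the lines are sorted alphabetically by keyword.
    """
    entries = []
    for title, doc in titles:
        words = title.upper().split()
        for i, word in enumerate(words):
            if word.lower() in KWIC_STOPLIST:
                continue
            left = " ".join(words[:i])[-width:]   # truncate long left context
            right = " ".join(words[i:])[:width]   # keyword plus what follows
            sort_key = (word, " ".join(words[i + 1:]), doc)
            entries.append((sort_key, f"{left:>{width}}  {right:<{width}}  {doc}"))
    entries.sort(key=lambda entry: entry[0])
    return [line for _, line in entries]
```

Two titles, "Involuntary conversion of property" and "Issuance of stock to voting trustees," yield seven lines, filed under CONVERSION, INVOLUNTARY, ISSUANCE, PROPERTY, STOCK, TRUSTEES, and VOTING. Unlike some KWIC programs, this sketch does not wrap the end of a long title around into the left-hand context.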
The retrieval capability of permuted indexes can be increased by inclusion of additional indexing terms selected from an index vocabulary such as the Thesaurus of Engineering and Scientific Terms used by the Department of Defense and other Government agencies.

The major advantages of permuted indexes are the following: (1) the relatively low overall cost (in some situations the index can serve as a low-cost substitute for manually prepared indexes or can make it practical to provide an index where none existed before); (2) speed and ease of preparation (computer printouts that serve as final copy for offset printing of the index can be obtained in a matter of hours); (3) ease of revision (the speed of a computer makes it possible to print out an entire new index, including any revisions, rather than trying to patch up a printed copy manually as revisions are made); (4) entries that are more meaningful and browsable than those of conventionally printed indexes (whose one- or two-word entries do not normally convey an entire concept); and (5) reduction in the time required to announce new documents and enter them into the system. Permuted indexes can also serve as a means for developing in-house capability in the use of computers for information processing, and in at least some instances will result in the establishment of a computer data base that may serve even more important purposes in the future.

The major disadvantage of the permuted index is that it does not provide cross-references for synonyms; therefore, it is subject to searching problems created by the author's inconsistencies in word usage and the normal ambiguity of human language. Further, if limited to document titles only, it becomes a shallow index; if applied to the entire text, it may become too cumbersome to be practical.

An ideal permuted index for procedural manuals and similar publications would include a permuted listing of titles for the parts, chapters, sections, the paragraph or other headings, and any abstracts or other summaries of the contents of the documents.

Columnar Card Systems. These systems, as shown in figure 25, are coordinate indexes in which one card is prepared for each indexing term used in the system. The numbers of all documents indexed under each term are entered on its term card. Each term card is divided into ten columns, 0 through 9, and the document number is posted in the column corresponding to its terminal digit. Searches are conducted by selecting those term cards that seem pertinent, and then matching document numbers column by column to locate any numbers that appear on all the cards. The cards are usually prepared and maintained manually, either by hand or typewriter; the basic data, however, could be maintained in machine language form and the cards produced by a computer.

The major advantages of the columnar card systems are that the costs for supplies and equipment are extremely low; they permit parallel searching of the index file (rather than requiring a card-by-card serial search); and they are simple and easy to maintain and use, being highly manipulative and browsable.

The major disadvantages or limitations of the columnar card systems are that it is usually necessary to refer to a second document, such as an abstract or even the document itself, to obtain a document description or to determine a document's relevancy; and if the system is used extensively, searching can become slow and tedious should the columns of numbers become long and individual searches involve several indexing terms.

Dual Dictionary Systems. These systems, illustrated in figure 26, are similar in design and use to columnar card systems, except that all the indexing terms and document numbers are printed on two identical lists mounted side by side in a binder.
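The column-by-column matching used with columnar cards, which the dual dictionary repeats on paper, can be sketched as follows. The term names and document numbers are hypothetical. The sketch illustrates why terminal-digit columns help: a document number ending in 7 can only be posted in column 7 of any card, so the searcher compares short columns instead of scanning whole cards.

```python
def post(card, doc):
    """Post a document number in the column matching its terminal digit."""
    card.setdefault(doc % 10, []).append(doc)

def make_card(docs):
    """Prepare one term card from the numbers of the documents indexed under it."""
    card = {}
    for doc in docs:
        post(card, doc)
    return card

def columnar_search(*cards):
    """Return the document numbers that appear on every selected term card.

    Matching is done column by column; numbers in different columns can
    never match, so only same-numbered columns are compared.
    """
    hits = []
    for column in range(10):
        common = set(cards[0].get(column, []))
        for card in cards[1:]:
            common &= set(card.get(column, []))
        hits.extend(sorted(common))
    return sorted(hits)
```

With "college education" posted for documents 12, 27, 35, 47, and 50 and "speaks French" for 27, 33, 47, and 61, only column 7 holds numbers on both cards, and the search returns 27 and 47.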
Instead of matching cards during the search process, the user looks up the first term in its alphabetical location on the left side of the dual dictionary and then locates the second and other terms on the right side (or vice versa), checking for coinciding numbers at each step until the search is completed. Usually many copies of the dual dictionary are made and distributed to individual users.

[Figure 25: Searching with columnar cards]

The disadvantages of the dual dictionary are also essentially the same as those for the columnar cards, with one exception: they are far more costly to maintain. However, if the number of users is sufficient, the overall system costs could, by comparison, be relatively low.

The dual dictionary is best suited to those situations where there are many users in different locations. The dictionary's usefulness can be increased by furnishing with it abstracts of the documents and a copy of the thesaurus or other vocabulary of indexing terms.

The data for the dual dictionary may be manually maintained; however, more often it is maintained and updated by computer and then periodically printed out, duplicated, and distributed to the users.

The major advantages of dual dictionary systems are the same as those for the columnar card system, plus an important additional one: these systems permit numerous individual users or groups of users to do their own searching, thus reducing the workload at the main information center and giving the user direct access to the system.

Edge-notched Card Systems. These systems, as illustrated in figure 27, use cards containing punching positions, represented by pilot holes along one or more of their edges, for recording in coded form such data as indexing terms, dates, and numbers. The data is recorded by punching out the area in front of the pilot hole.
The edge notching may be done manually by a hand punch or semiautomatically by special equipment. The interior of the cards, which are printed in various sizes and formats, may be used for written information or graphics. Typically, one card is prepared for every document or item being indexed.

To search the file, needles are passed through the appropriate pilot holes in the deck of edge-notched cards. The selected cards (those that are notched) fall out, while the others remain on the needle. Searching usually involves numerous needle passes. Other devices and equipment, in addition to the standard needles, are available for assisting in the search process.

[Figure 27: Edge-notched cards]

The major advantages of the edge-notched card systems include low cost, simplicity, the ease with which users may browse, immediate access to the description of the documents or things involved in the search process, and, in many situations, elimination of the need to maintain the cards in a precise sequence.

The major disadvantages of the edge-notched card systems include limitations on the amount of coded data that may be recorded on the card; slowness and awkwardness in the search procedure if the cards are used extensively for complex searches (due to the system requirement of serial searching); limitations on the size of file (many information specialists consider 5,000 cards to be the upper practical limit); the somewhat complicated code patterns; and the possible difficulty in detecting coding (edge-notching) errors.
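The needle search can be modeled in a few lines. The sketch assumes the simplest direct code, one hole position per indexing term, with hypothetical terms and positions; practical edge-notched systems often used more complicated code patterns to fit more data on a card, which this sketch does not attempt. Each needle pass examines every card still in the deck, which is the serial searching referred to above.

```python
# Hypothetical direct code: each indexing term owns one pilot-hole position.
POSITIONS = {"microfilm": 0, "engineering": 1, "aperture": 2, "forms": 3}

class Card:
    def __init__(self, doc, terms):
        self.doc = doc
        # A notched position is one where the edge in front of the hole is punched out.
        self.notched = {POSITIONS[term] for term in terms}

def needle_pass(deck, position):
    """Pass a needle through one hole position of every card in the deck.

    Notched cards drop out (selected); the rest stay on the needle.
    """
    dropped = [card for card in deck if position in card.notched]
    kept = [card for card in deck if position not in card.notched]
    return dropped, kept

def needle_search(deck, terms):
    """Combine terms by re-needling only the cards that dropped on the last pass."""
    selected = list(deck)
    for term in terms:
        selected, _ = needle_pass(selected, POSITIONS[term])
    return [card.doc for card in selected]
```

For a deck in which documents 1, 2, and 4 are notched for "aperture" and documents 2 and 4 also for "engineering," a two-pass search on those terms selects documents 2 and 4.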
[Figure 28: Optical coincidence cards and viewer]

Optical Coincidence Systems. These systems, as illustrated in figure 28, employ cards (or sheets) with a fixed number of dedicated positions or address locations for drilling (or punching) holes representing the individual documents or items being indexed. A separate optical coincidence term card is maintained for each indexing term. After each incoming item has been indexed and assigned a serial number or optical coincidence card address location, all related term cards are removed from the file and machine drilled or punched in the appropriate position.

Searching is accomplished by first selecting the optical coincidence term cards pertinent to the query. The selected cards are then stacked and placed in front of a light source to reveal any coinciding holes. The position of the matching holes on the cards indicates the numbers or address locations of any documents or items that fully satisfy the search question.

In addition to identifying documents or other items pertinent to a query, the cards may also be used as a data manipulation and tallying device for compiling statistics or, through the use of transparent overlays, as a means for presenting statistical data in a visual manner. Although in most optical coincidence systems the cards are drilled, manipulated, and interrogated manually, there is equipment available for machine-controlled drilling of the cards, machine counting of holes, and automatic printout of numbers. In the system developed by the National Bureau of Standards, the user can see an enlarged microfilm image of the related document abstract during the interrogation process.

The optical coincidence cards most commonly used are about 9 inches in size and can accommodate up to 10,000 documents or items and 1,000 indexing terms. Prescored punched cards that can accommodate 480 items are also sometimes used.

The major advantages of optical coincidence systems are manipulatory ability; encouragement of browsing by the user; rapid searching speeds (partly because these systems permit parallel searching of the index file rather than requiring a serial card-by-card search); low cost for supplies and equipment; simplicity; and fast, easy read-out of the search results.

The major disadvantage of optical coincidence cards is that it is usually necessary to refer to a second information source to obtain a description of the document or item, or to determine its relevancy. Another possible problem is in error correction; however, some types of input equipment help keep errors to a minimum by preventing redrilling in the same hole.

Special Considerations

This chapter reveals that there are many simple, rather inexpensive nonconventional indexing systems which, although manually operated, offer significant advantages over conventional systems for organizing and retrieving information. In many situations today, one of these manual systems may be all that is needed to solve the information retrieval problem. However, in most situations it will some day become desirable to convert the system to one that takes advantage of computer capabilities for maintaining, reorganizing, reformatting, merging, updating, and purging information in the file, and for manipulating, selecting, and presenting the information. In order to do these things, the data contained in the index file must be in machine language.

Consequently, when developing and installing any manual nonconventional indexing system, serious consideration should be given to recording the index data in machine language as a by-product of the input operations. Such devices as paper tape and magnetic tape or card typewriters are ideally suited to this purpose. Further, as mentioned earlier in this chapter, the machine language data base, with the aid of a computer, can be used to produce many of the nonconventional manual indexing tools.
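As the paragraph above notes, one machine-language data base can be used to produce the manual indexing tools. The sketch below, using hypothetical records and terms, derives both a dual dictionary listing and a set of optical coincidence term cards from the same records. Each term card is represented as an integer whose bit n stands for a hole drilled at position n, so holding the stacked cards up to the light corresponds to a bitwise AND.

```python
# Hypothetical machine-language index records: document number -> indexing terms.
RECORDS = {
    3: {"college education", "speaks French"},
    7: {"college education", "speaks German"},
    12: {"speaks French", "speaks German"},
    15: {"college education", "speaks French", "speaks German"},
}

def dual_dictionary(records):
    """Alphabetical term listing, each term followed by its document numbers."""
    listing = {}
    for doc, terms in records.items():
        for term in terms:
            listing.setdefault(term, []).append(doc)
    return {term: sorted(docs) for term, docs in sorted(listing.items())}

def optical_cards(records):
    """One term card per term; bit n set means a hole drilled at position n."""
    cards = {}
    for doc, terms in records.items():
        for term in terms:
            cards[term] = cards.get(term, 0) | (1 << doc)
    return cards

def light_through(cards, terms):
    """Stack the selected cards: light passes only where every card has a hole."""
    stack = cards[terms[0]]
    for term in terms[1:]:
        stack &= cards[term]
    return [n for n in range(stack.bit_length()) if (stack >> n) & 1]
```

Stacking the "college education" and "speaks French" cards lets light through positions 3 and 15 only; adding the "speaks German" card leaves position 15.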
V. NONCONVENTIONAL MACHINE INDEXING AND RETRIEVAL SYSTEMS

The significance of nonconventional machine indexing and retrieval systems rests not in the number of basic types of equipment that are available, but in the wide variety of tasks these systems can perform, their flexibility, and their future potential. In numerous instances the indexing, storage, and retrieval operations are, or could be, a satellite of a larger integrated automatic data processing (ADP) system. Today, there are many instances where the data base maintained for an ADP system could, with slight modification and expansion, serve as the nucleus for a highly useful information retrieval system. On the other hand, there are situations where machine nonconventional indexing and retrieval systems could largely pay for themselves by solving logistical and other problems involved in the preparation, stocking, distribution, replenishing, and control of documents.

Obviously, the equipment used in machine nonconventional indexing and retrieval systems is usually more expensive than that used in the manual systems. Further, the machine systems are generally more difficult to design and operate. However, these conclusions can be misleading, and in practice they prove to be but a slight barrier in installing a machine system. The first reason for this is that instead of acquiring your own equipment, you could more than likely obtain machine time on equipment already installed in the agency or available through a service bureau. The second reason is that there are available many standard and special machine programs (machine instructions and procedures) that, with slight modifications, can be adapted to the job at hand.
When one considers these possibilities, and the indisputable move toward automation in all areas, it becomes increasingly clear that any information retrieval system study should include a thorough investigation of machine methods for doing all or part of the job either now or in the future.

Types of Situations Where Machine Indexing and Retrieval Systems Apply

There are two basic situations where the methods and equipment described in chapter IV and in this chapter may apply. In the first situation, i.e., retrieval of textual documents or information on the basis of subject topics, machine systems are proving highly satisfactory; in addition, many of the systems can automatically furnish the user with a complete description of the document or permit him to view the document or, perhaps immediately, to obtain a copy of it. In the second type of situation, i.e., retrieval of information or documents on the basis of characteristics or attributes, machine systems have the additional capability of being able to automatically retrieve selected data about a person, place, or thing, or a complete description or image of it. There is also an additional type of situation where only nonconventional machine information retrieval systems apply: the storage and retrieval of large masses of data in what are commonly called data banks. Machine methods and equipment can be used to update these files, to automatically and selectively transfer data from one file to another, and, on demand, to selectively retrieve data and perform data manipulations.

Prerequisites for a Successful Machine Indexing or Retrieval System

All the prerequisites cited in chapter IV for a successful manual nonconventional indexing system are also important to the success of machine systems, and therefore should be carefully noted.
An additional prerequisite for machine systems is the ready availability of personnel, either on a full- or part-time basis, who are trained and experienced in the operation of the equipment. Another important prerequisite is the accessibility of equipment, that is, being able to have access to it at the right time and frequency required by the users.

[Figure 29: EAM punched cards and collator]

Another important, but not necessarily essential, feature is that the data elements and codes be compatible with other computer data banks in the same field of interest, so that if it should later become necessary or desirable, the data can be readily exchanged, compared, or combined on a machine-to-machine basis.

Factors Affecting the Choice of the Type of Machine Indexing or Retrieval System

In addition to the factors cited for manual systems in chapter IV, which also apply here, machine systems are concerned with machine record lengths. Machine record lengths involve the number of data elements (for example, date of birth) per record; the number of data items (for example, year of birth) within the data element; and the total number of characters (alphabetical, numerical, and special) per record.

Types of Machine Indexing and Retrieval Systems

The following are descriptions of the various types of machine nonconventional indexing and retrieval systems, together with a brief summary of the main advantages and disadvantages or limitations of each.

EAM (electrical accounting machine) Punched Card Systems. These systems employ cards divided into vertical columns, with each column divided into 12 punching positions. Each column can be used to record, by means of one or more punched holes, a single alphabetical, numerical, or special character.
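To show how one column holds one character, the sketch below encodes letters, digits, and blanks in the Hollerith code used on the common IBM 80-column card. The handbook does not name a code; this one is assumed here because it was widely used. Digits take a single punch in rows 0 through 9, and letters combine a zone punch (row 12, 11, or 0) with a digit punch. Special characters are left out because their codes differed between machines.

```python
# Punching positions in a column, top to bottom: zone rows 12, 11, 0, then rows 1-9.
def hollerith(ch):
    """Return the set of rows punched in one card column for character `ch`.

    Covers blank, digits, and letters only (assumed common Hollerith code).
    """
    if len(ch) != 1:
        raise ValueError("one column holds exactly one character")
    if ch == " ":
        return set()                             # blank column: no holes
    if ch in "0123456789":
        return {int(ch)}                         # single punch, rows 0-9
    ch = ch.upper()
    if "A" <= ch <= "I":
        return {12, ord(ch) - ord("A") + 1}      # 12 zone + 1-9
    if "J" <= ch <= "R":
        return {11, ord(ch) - ord("J") + 1}      # 11 zone + 1-9
    if "S" <= ch <= "Z":
        return {0, ord(ch) - ord("S") + 2}       # 0 zone + 2-9
    raise ValueError(f"no code defined in this sketch for {ch!r}")

def punch_field(text, width):
    """Left-justify text in a fixed-width field and encode each column."""
    if len(text) > width:
        raise ValueError("text longer than field")
    return [hollerith(ch) for ch in text.ljust(width)]
```

Because each character occupies one column, the total number of characters per record, one of the record-length factors listed above, determines how many columns (and, for long records, how many cards) each record needs.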
The cards are divided into segments (fields) of various lengths for recording such individual data elements as the following: titles, segments of text, names, dates, addresses; and code numbers representing names of organizations, forms, products, or indexing terms. A wide variety of equipment is available for punching, sorting (including electronic high-speed sorters), collating, interpreting (card printing), selecting, and analyzing the punched cards, in addition to equipment for performing arithmetic operations and preparing printed listings. Figure 29 illustrates a punched card and a special collating machine.

Punched card systems were originally intended for use in performing statistical and accounting operations. In using punched cards as a medium for recording and retrieving data for information retrieval, the system designer has to adjust his methods to the capabilities and characteristics inherent in punched card equipment.

In organizing a punched card file for a coordinate index, there are two general ways of recording the index data and arranging the punched card file. One way is to prepare one or more punched cards, as needed, for each document or other thing being indexed and record thereon a brief description of the document and its indexing terms (subject topics, or characteristics or attributes). The file is arranged in document number sequence. The second way to organize the index file is to prepare a separate punched card for each indexing term assigned each document. Each card usually contains only the document number and the assigned indexing term; the cards are arranged in groups according to the indexing terms. This is commonly referred to as an inverted file.

The first way of organizing the file has the disadvantage of making it necessary to pass the entire punched card file through the equipment each time a search is conducted; however, it has the advantage of furnishing the user at least a brief description of the document. The second approach has the advantage of making it necessary to process only those punched cards representing the indexing terms involved in the search, which is conducted by comparing the punched cards representing any two of the indexing terms to determine coinciding document numbers, and repeating the matching process for the remainder of the term cards involved. This second method has the disadvantage of providing the user with the document numbers only, thus making it necessary for him to refer to a second source or to the document itself to obtain a description of the document and determine its relevance to the search question. Another method of recording the indexing terms on the punched cards is to use superimposed coding, which offers greater data compaction but requires considerably more skill on the part of the system designers and operators.

The major advantages of punched card systems, when used for information retrieval, are their ease of manipulation; their relative simplicity (when compared with computers); their ease in reformatting, transferring, extracting, updating, and duplicating data; their capability for producing low-cost duplicate sets and printed listings; the ability of the cards to also be manually selected, read, and refiled; and their ready convertibility to computer systems.

The two major disadvantages of punched cards used as information retrieval systems are (1) the relatively slow searching speeds and (2) the limited accessibility of the punched card system, including a restriction upon the freedom of the user to browse, due to the fact that card files and equipment are usually maintained in a machine room and their use requires trained machine operators.
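The inverted-file search described above, comparing two term decks for coinciding document numbers and then repeating the comparison with each remaining deck, can be sketched as a match-merge, the kind of matching a collator performs. The sketch assumes each term's cards are kept in document-number order; the decks and terms are hypothetical, and each list element stands for one card.

```python
def collate_match(deck_a, deck_b):
    """Match-merge two term decks, each sorted by document number.

    Like a collator, it reads both decks once, comparing the numbers on
    the front cards; equal numbers are matches, the lower card is dropped.
    """
    matches, i, j = [], 0, 0
    while i < len(deck_a) and j < len(deck_b):
        a, b = deck_a[i], deck_b[j]
        if a == b:
            matches.append(a)
            i += 1
            j += 1
        elif a < b:
            i += 1  # card from deck A has no partner in deck B
        else:
            j += 1  # card from deck B has no partner in deck A
    return matches

def coordinate_search(term_decks, terms):
    """Match the first two term decks, then match the result against each remaining deck."""
    result = term_decks[terms[0]]
    for term in terms[1:]:
        result = collate_match(result, term_decks[term])
    return result
```

Because only the decks for the terms in the query are read, the work grows with the size of those decks rather than with the whole file, which is the advantage the text attributes to the second arrangement.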
Most systems employing punched cards for coordinate indexing consist of fewer than 20,000 cards; however, if used primarily for simple data lookups and only occasionally for coordinate-type searches, a file of 50,000 or more may be feasible. Therefore, punched cards, for this reason and because of the advantages described above, are particularly well suited to personnel skills inventory and other systems that usually entail a large volume of manual data lookups and recurring or special printed listings of various types and formats, but only a limited number of coordinate-type searches. Punched cards may also be used for selective dissemination of information (SDI) systems, but since computers are more often used for this purpose today, SDI systems are discussed under computers.

Computers. Computer equipment is of two basic types: analog and digital. Analog computers may be likened to a slide rule or an automobile odometer, since they work with physical quantities and compute by measuring. Digital computers, on the other hand, work with numbers or digits and compute by counting. Digital computers are divided into two classes, special purpose and general purpose; general-purpose computers are normally used for automatic data processing (ADP) and information retrieval. A typical equipment configuration is shown in figure 30.

Computers are the most versatile and powerful of all the devices used for information retrieval, due to their high processing speeds, accuracy, ease of updating, ability to perform complex transactions automatically and to communicate with each other, and their ability to provide the user with a wide range of on-line search capability and off-line services and tools, including permuted indexes such as the KWIC index described in chapter III.
Another advantage offered by the computer used for information retrieval purposes is its usefulness for administrative and logistical tasks. For example, it can be used to prepare requisitions and announcements of new accessions, to operate a selective dissemination of information (SDI) system, to bill for user charges, and to maintain an inventory. These special tasks are all accomplished as a by-product of normal input and output operations. The computer can also be useful in controlling access to restricted or classified information.

Three of the major limitations in using the computer for information retrieval are (1) high input costs; (2) a shortage of systems analysts and programmers having experience in information retrieval systems; and (3) lack of low-cost, on-line computer mass memories. Solution to the input problem depends on applying source data automation (SDA) techniques, including capturing data in machine language as a by-product of other processing operations and using optical character recognition (OCR) equipment for automatic document reading and conversion to machine language. The problem of the scarcity of experienced systems analysts and programmers in the information retrieval area is still acute, and the only significant relief available at present is to utilize existing computer programs and operating systems developed and designed by others. The problem of developing low-cost, on-line mass memories is the object of intensive research by many computer manufacturers and others, and while the results look promising, none are yet commonly available.

Notable progress has been made in computer-user communications.
While most systems still require the preparation of a punched card to gain access to the computer, and most of the output is still in the form of printed forms and listings, punched cards, or microfilm produced by COM equipment, there are more and more systems that permit direct communication between the user and the computer.

These two-way communications are accomplished by means of remote terminals employing teletypewriters, other types of typewriters, and cathode ray tube (CRT) devices with keyboards and light pens, as illustrated in figure 31. By keying in the proper user identification code and [...], the user is able to obtain answers to his questions or possibly update, edit, or delete data in the computer store. With the addition of the light pen, he is able to pinpoint numbers, words, or phrases appearing on a CRT to make searching easier and faster or to quickly instruct the computer to delete, change, edit, or transfer stored data. Data in the computer store can also be used to produce charts and other graphics.

Significant refinements in computer programs, which make communication with the computer more like conversation, plus improvements in the hardware and reductions in equipment costs, assure that the remote terminal will eventually become commonplace. Since the main use of the remote terminal is to retrieve and manipulate data, those who manage the agency's records and other management [...] expect an increasing demand on the part of [...] to computerize the agency's important data bases, particularly those that are dynamic in nature.

Rather than describing computers in accordance with their size, type, or operating characteristics, this chapter describes them in terms of the ways they are most often used for information storage and retrieval.
Computer index searching systems are those used to search index files where the indexing itself is performed manually. Indexers, using a guide such as a thesaurus of indexing terms, assign the index- ing terms to the individual documents. The index- ing terms are then usually coded, that is, con- verted to a numerical representation, and along with other pertinent data recorded in machine A CRT TERMINAL WITH A LIGHT PEN Approved For Release 2001/07/17 : CIA-RDP74-00005R000100020030-9 language roved For Release 2Q01/Q7/17 ? CIA-RDP74-90005ROQ010002fi0030-9 g by means of a ey r device such as a searching can be ene cla in numerous card punch, paper typewriter, or other encoding device. The output of the machine language re- cording device is made a part of the computer As in the case of punched card systems, the index file can be organized and arranged in either of two ways-by document numbers or by index- ing terms. If the index file is arranged by docu- ment numbers, the file description of the docu- ment may include the title, author, date, and a list of indexing terms assigned to the document, together with other bibliographic data and pos- sibly an abstract or extract of the document. If the index file is organized and arranged by index- ing terms, only the number of each indexing term and the numbers of all documents assigned that term are shown on the main computer's index file (inverted file arrangement). In systems arranged by indexing terms, a separate auxiliary biblio- graphic record similar to the main index record for systems arranged by document numbers, is often maintained on the computer. When conducting a coordinate-type search in those systems where the index file is organized and arranged by document numbers, it is neces- sary to make a serial search of the file, which may necessitate the loading and unloading of sev- eral reels of magnetic tape if the information is stored on tape. 
Whenever a document satisfies the search requirements, its complete description is immediately available. When conducting a search in those systems where the main index file is organized and arranged by indexing terms (inverted file), the entire index file, which is highly compact, can often be quickly searched on-line. However, it is then necessary to go to the auxiliary computer index file, or perhaps a separate manual index file or the document itself, to obtain the description of the document. Which file arrangement is best depends on such factors as the index file size, the number and frequency of searches, the type of equipment and machine program used, the needs of the users, and the capability of the computer to conduct more than one search at a time.

In addition to the general advantages of the computer mentioned earlier, its use for index searching can be beneficial in numerous other ways. The computer can be used to provide statistics on the frequency of use of indexing terms in both indexing and searching, and on the frequency of association between indexing terms, information that will provide valuable clues in system modification and control. The computer can also be used to construct or prepare the index dictionary or thesaurus of indexing terms and the various special reference aids for indexers, searchers, and users.

Computer automatic indexing and searching, or "full text," systems substitute the computer and its programmed instructions for human effort, not only in conducting searches but also in indexing documents. The full title, the full text, and other bibliographic data, including an abstract if any, are converted to machine language for input to the computer. Automatic indexing is based on the general principle that the noncommon words in the document are suitable indexing terms.
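The term statistics mentioned above (how often each indexing term is used in indexing and in searching, and how often pairs of terms are assigned together) reduce to simple counting. The sketch below assumes hypothetical indexing records and a hypothetical log of search questions; it is an illustration, not a description of any particular system's reports.

```python
from collections import Counter
from itertools import combinations

# Hypothetical indexing records: the terms assigned to each document.
indexed_docs = [
    {"RAILROADS", "RATES", "TARIFFS"},
    {"TRUCKING", "RATES"},
    {"RAILROADS", "SAFETY"},
    {"RAILROADS", "RATES"},
]
# Hypothetical search log: the terms used in each search question.
searches = [{"RATES"}, {"RAILROADS", "RATES"}, {"SAFETY"}]


def term_usage(term_sets):
    """Count how many records (documents or searches) used each term."""
    counts = Counter()
    for terms in term_sets:
        counts.update(terms)
    return counts


def term_associations(term_sets):
    """Count how often each pair of terms is assigned to the same record."""
    pairs = Counter()
    for terms in term_sets:
        for pair in combinations(sorted(terms), 2):
            pairs[pair] += 1
    return pairs


indexing_use = term_usage(indexed_docs)
search_use = term_usage(searches)
associations = term_associations(indexed_docs)

# Terms assigned during indexing but never used in a search may merit review.
unused_in_searches = sorted(set(indexing_use) - set(search_use))
print(dict(indexing_use))
print(associations[("RAILROADS", "RATES")])
print(unused_in_searches)
```

Pairs are stored in sorted order so that ("RAILROADS", "RATES") and ("RATES", "RAILROADS") are counted as one association.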
In order to make it possible for the computer to choose the noncommon words, it is supplied with a list ("stoplist") of such common words as "the" and "of," which are not to be included in the index. In the input processing, the computer compares each word in the text against those contained in the stoplist, and where they do not match, the word becomes an indexing term.

Typically, in deriving an index in this manner, each document, paragraph, sentence, line, and word is automatically assigned a serial number, and the computer index file is arranged in concordance fashion. Following each of the indexing terms (the noncommon words), the serial number is listed for each location where the term appears in the text. In addition to the index, the complete original text is also usually maintained in machine language.

Numerous techniques are used for conducting computer searches of the full text index file. Typically they include the Boolean algebra or set theory concepts employing the computer logic operations of and (intersection), or (union), and but not (negation), as illustrated in figure 32. Additional techniques commonly employed include specifying how many times the indexing term must appear in a document (word frequency counts) and the proximity of one indexing term to another. Further refinements in searching may be achieved by placing special conditions on the search, such as that the index term must follow the phrase "in conclusion," or must appear in the first sentence of a paragraph, and so on.

Figure 32. Conducting subject-type searches by computer

Single term: A. Identification of all documents or things that have been indexed with one particular term, for example: A-college education.

Logical sum: A + B + C + ... + Z. Identification of all documents or things that have been indexed with one or more of certain indexing terms, for example: A-college education; and/or B-speaks French; and/or C-speaks German; etc.

Logical product: A x B. Identification of all documents or things that have been indexed with two or more terms in common, for example: A-college education and B-speaks French.

Logical product of logical sums: (A + B) x (C + D). Identification of all documents that have been indexed with one or more of the terms in designated groups of terms. For example, when using A-college education, B-speaks French, C-speaks German, and D-cartographer, all documents or things identified with any of the following combinations of terms would be retrieved: A and C; A and D; B and C; B and D; A, B, and C; A, B, and D; A, C, and D; B, C, and D; and A, B, C, and D.

Logical difference: A - B. Identification of all documents indexed with one or more terms but not another, for example: selection of all people with an A-college education except those also indexed under term B-speaks French.

Sequence: A x B (in order). Identification of all documents or things where two or more particular indexing terms appear in a particular sequence, for example: A-blue (first) and B-steel (second).

Searches between barriers: (Barrier X (A x B) X Barrier). Identification of all documents or things where the indexing terms appear within a specified subunit, for example, A-railroad and B-rates in the same paragraph.

Greater than and less than: > <. Identification of documents or things that have been indexed with numerical data, generally, which lies between specified limits, for example, all people who were born between 1900 and 1910: >1899 <1911.

The United States Air Force Legal Information Thru Electronics (LITE) system at Denver, Colo., available for use by all Government agencies, is a good example of the versatility of an automatic indexing and searching system.
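The stoplist, the concordance of serial numbers, and the search types of figure 32 can be combined in one small sketch. It rests on several assumptions: a hypothetical and deliberately tiny stoplist, paragraphs separated by a blank line, sentences ending in ".", "!", or "?", and serial numbers kept as (document, paragraph, sentence, word). Sets of document numbers stand in for the computer's logic operations: `|` for or, `&` for and, `-` for but not. The sequence search only requires the first term to precede the second within one sentence, which is one possible reading of the figure.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stoplist of common words.
STOPLIST = {"the", "of", "and", "a", "an", "in", "is", "for", "were", "was"}


def build_concordance(documents):
    """Map each noncommon word to its (doc, paragraph, sentence, word) serial numbers."""
    concordance = defaultdict(list)
    for doc_no, text in documents.items():
        for para_no, paragraph in enumerate(text.split("\n\n"), start=1):
            sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
            for sent_no, sentence in enumerate(sentences, start=1):
                words = re.findall(r"[a-z0-9]+", sentence.lower())
                for word_no, word in enumerate(words, start=1):
                    if word not in STOPLIST:  # no stoplist match: indexing term
                        concordance[word].append((doc_no, para_no, sent_no, word_no))
    return concordance


def docs_with(concordance, term):
    """Set of document numbers in which the term appears."""
    return {loc[0] for loc in concordance.get(term, [])}


def frequency(concordance, term, doc_no):
    """Word frequency count for one term in one document."""
    return sum(1 for loc in concordance.get(term, []) if loc[0] == doc_no)


def same_unit(concordance, a, b, level):
    """Search between barriers: level 2 = same paragraph, level 3 = same sentence."""
    units_a = {loc[:level] for loc in concordance.get(a, [])}
    units_b = {loc[:level] for loc in concordance.get(b, [])}
    return {unit[0] for unit in units_a & units_b}


def in_sequence(concordance, first, second):
    """Documents where `first` occurs before `second` within one sentence."""
    first_positions = defaultdict(list)
    for doc, para, sent, word in concordance.get(first, []):
        first_positions[(doc, para, sent)].append(word)
    return {
        doc
        for doc, para, sent, word in concordance.get(second, [])
        if any(w < word for w in first_positions.get((doc, para, sent), []))
    }


documents = {  # hypothetical sample texts
    1: "The railroad rates were raised.\n\nBlue steel is used in the rails.",
    2: "Trucking rates rose. The railroad was not affected.",
    3: "The railroad bought steel painted blue.",
}
idx = build_concordance(documents)

A, B = docs_with(idx, "railroad"), docs_with(idx, "rates")
print("A + B (or):", A | B)
print("A x B (and):", A & B)
print("A - B (but not):", A - B)
print("same paragraph:", same_unit(idx, "railroad", "rates", level=2))
print("same sentence:", same_unit(idx, "railroad", "rates", level=3))
print("blue before steel:", in_sequence(idx, "blue", "steel"))

# Greater than / less than on numeric data (hypothetical birth years).
birth_year = {"Adams": 1898, "Baker": 1905, "Clark": 1910, "Davis": 1911}
born_1900s = sorted(name for name, year in birth_year.items() if 1899 < year < 1911)
```

Narrowing the barrier from paragraph to sentence drops document 2, where the two terms fall in different sentences of the same paragraph.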
The LITE system includes the full text of all published Decisions of the Comptroller General of the United States; Armed Services Procurement Regulations; Air Force manual 75-34, Reporting of Transportation Discrepancies in Shipments; and some 30 other sets of documents. When requesting a search, the user has three choices as to the output: a list citing the documents found to be pertinent to the search question; a three-line KWIC (keyword-in-context) listing from those parts of the document text where the index term appears; or a complete printout of the full text of the documents.

By using many of the same techniques as those employed for automatic indexing and searching, computers can also be programmed for development of classification systems, automatic classification of documents, and automatic preparation of abstracts and extracts. However, work in these areas is largely experimental. Other forms of automatic indexing include techniques employing statistical word counts and association maps. Work has also been done in refining automatic indexes by adding a thesaurus-like computer record that is used to provide guidance and assistance in either the indexing or the searching process.

No system for indexing textual material by subject is without its faults. All things considered, a well-designed and properly operated computer indexing and searching system can be expected to perform about as well as those information retrieval systems where the indexing is done manually. The major limitations of automatic indexing, searching, and preparing abstracts or extracts are the cost and the high degree of expertise required to design and operate such systems.
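The exact layout of the LITE three-line KWIC listing is not given here, so the following is only an assumed approximation: for each line of text that contains the search term, print that line together with the line before and the line after it. The function name and the sample text are hypothetical.

```python
import re


def kwic(text, term, context_lines=1):
    """Return (line number, excerpt) for each line containing the term.

    Each excerpt is the hit line plus `context_lines` lines on either side,
    i.e. three lines with the default setting (fewer at the start or end).
    """
    lines = text.splitlines()
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    excerpts = []
    for i, line in enumerate(lines):
        if pattern.search(line):
            start = max(0, i - context_lines)
            end = min(len(lines), i + context_lines + 1)
            excerpts.append((i + 1, lines[start:end]))
    return excerpts


sample = """Shipments were received at the depot
on the date shown. A discrepancy in
weight was noted by the carrier and
reported on the prescribed form.
No further action was required."""

for line_no, excerpt in kwic(sample, "discrepancy"):
    print(f"line {line_no}:")
    for text_line in excerpt:
        print("    " + text_line)
```

The word-boundary pattern keeps a search for "rate" from matching "rates" or "separate"; a real system might instead match on the indexed word forms.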
However, the cost factor will become less critical as more and more offices move toward integrated information processing and retrieval systems that ultimately may include such features as computer-assisted document preparation and revision, computerized editing and preparation of the table of contents and index, and computerized printing. A copy of the same computer magnetic tape that goes to the Government Printing Office for use in automatic photocomposition and printing, or is used to produce microform copy by COM equipment, will also serve as input to the automatic indexing system, thereby eliminating one major cost: that of converting the information retrieval system input to machine language. These integrated information processing systems have one advantage that for many organizations may be far more important than the possible savings in cost, namely, the reduction in the period that elapses between the time an important event occurs, a fact is discovered, or a decision is rendered, and the time the information is in the hands of those for whom it is destined or who may be searching for it.

Those interested in learning more about the subject should read NBS Monograph 91, Automatic Indexing: A State-of-the-Art Report, reissued February 1970 by the National Bureau of Standards (NBS), U.S. Department of Commerce.

Selective dissemination of information (SDI) systems are those that employ the computer or punched cards to provide individual users or user groups with tailormade announcements of new documents in their individual spheres of interest. The user's interest profile may be developed by having him look over the thesaurus of indexing terms and select those terms that reflect his areas of interest. The results are then recorded on magnetic tape. Each time a new document is indexed, the indexing terms assigned to the document or appearing in its abstract are compared with those stored on the user profile magnetic tape.
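The SDI matching step can be sketched as a comparison of each new document's terms against every stored profile. This sketch assumes a simple match rule (a profile is satisfied when at least `min_matches` of its terms are assigned to the document); actual SDI systems may use weighted or Boolean profiles instead. The profile owners, including one for a project rather than a person, and all terms are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    owner: str            # a person, user group, or project (hypothetical names)
    terms: frozenset      # indexing terms selected from the thesaurus
    min_matches: int = 1  # assumed match rule: terms that must coincide


def announce(new_doc_terms, abstract, profiles):
    """Compare a newly indexed document's terms with every stored profile.

    Returns one announcement (owner, matched terms, abstract) per satisfied profile.
    """
    notices = []
    for profile in profiles:
        overlap = profile.terms & new_doc_terms
        if len(overlap) >= profile.min_matches:
            notices.append((profile.owner, sorted(overlap), abstract))
    return notices


profiles = [
    Profile("J. Smith", frozenset({"INFORMATION RETRIEVAL", "ABSTRACTS"})),
    Profile("Rail rates project", frozenset({"RAILROADS", "RATES", "TARIFFS"}), min_matches=2),
    Profile("Cartography group", frozenset({"MAPS", "SURVEYING"})),
]

doc_terms = {"INFORMATION RETRIEVAL", "ABSTRACTS", "RELEVANCE"}
for owner, matched, abstract in announce(doc_terms, "Titles vs. abstracts for relevance.", profiles):
    print(owner, matched)
```

Raising `min_matches` for a broad group or project profile is one way to keep it from generating too many announcements.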
In those instances where the requirements for a match are satisfied, the user is sent an announcement of the document, including its abstract, if any. Figure 33 illustrates an article announcement (abstract) card and a card used by the recipient to respond to the SDI system operators. Note that there are blocks on the recipient's response form for him to use in indicating whether or not he wants to see the document and, if not, why not, thereby providing the system operators with the necessary feedback.

An interesting variation of the SDI technique is to develop interest profiles for major projects or programs, instead of for people, and to use the computer to keep the project director informed of any new documents on the subject.

While the costs of SDI systems are appreciable, they may not be considered unreasonable from management's point of view, particularly in the areas of scientific and technical research and development. However, scientists and engineers are not the only professionals who have trouble wading through the tremendous volume of new documents made available to them while at the same time trying to make sure they have not missed any documents that could have a major impact on their work.

The trend toward using group interest profiles rather than the profiles of individual users is resulting in less expensive and often more practical SDI systems. SDI systems are especially valuable in providing the user with "peripheral vision": information of direct interest to him that might be overlooked without the benefit of an SDI service.

Computer data storage and retrieval systems, sometimes referred to as data banks, are those used to store, retrieve, and manipulate large volumes of data (facts, numbers, letters, and symbols representing basic elements of information that can be processed or produced).
Figure 33. Article announcement (abstract) card and recipient's response card

Announcement card: 5473 RESNICK A RELATIVE EFFECTIVENESS OF DOCUMENT TITLES AND ABSTRACTS FOR DETERMINING RELEVANCE OF DOCUMENTS IBM ASDD YORKTOWN HGTS NY, 17-033, OCT 1961. INDIVIDUALS WHO RECEIVED DOCUMENTS THROUGH A SELECTIVE DISSEMINATION OF INFORMATION SYSTEM WERE ASKED TO DETERMINE THE RELEVANCE OF DOCUMENTS TO THEIR WORK INTERESTS ON THE BASIS OF TITLES AND OF ABSTRACTS. THE RESULTS INDICATE THAT THERE WAS NO SIGNIFICANT DIFFERENCE BETWEEN THE USEFULNESS OF TITLES AND OF ABSTRACTS FOR THIS PURPOSE. 2 PAGES

Recipient's response card: 1. Read the abstract. 2. Punch the appropriate box. 3. If you care to comment, punch the comment box and write your comment on this card. 4. Return this card to SDI. Boxes: Of Interest, Document Not Wanted; Of Interest, Have Copy; Of Interest, Document Requested.

Data bases may be either of two types or perhaps a mixture of the two: (1) recurrent or dynamic data, which is subject to change, and (2) noncurrent or static (archival) data relating to a unique event or representing an unchanging situation. The data base may be specially created for information retrieval purposes, as in the case of weather data, or it may be used to serve multiple purposes. For example, census data is used for developing statistics and preparing reports as well as for information retrieval. The social security and Federal income tax data bases are used mainly for automatic data processing purposes and only secondarily for information retrieval. Computerized management information systems also serve two purposes: to automatically produce reports and other communications, and to retrieve information. It is the exception rather than the rule that a data bank is created and used solely for information retrieval.
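To illustrate how one data base can serve several purposes, the following sketch keeps hypothetical personnel records as named data elements and uses the same records both for a statistical summary and for retrieval, then rearranges them by identification number, organizational unit, and years of service. All field names and values are invented for illustration; note that each record keeps the individual's name and each listing prints column headings.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical personnel records; each field is a named data element.
personnel = [
    {"id": "0412", "name": "Adams, R.", "unit": "Records Branch", "series": "0301", "years": 12},
    {"id": "0057", "name": "Baker, L.", "unit": "Mail Unit", "series": "0305", "years": 3},
    {"id": "0233", "name": "Clark, M.", "unit": "Records Branch", "series": "0305", "years": 7},
]

# Purpose 1: a statistical report (headcount per organizational unit).
headcount = Counter(rec["unit"] for rec in personnel)

# Purpose 2: information retrieval on selected data elements.
experienced_0305 = [rec["name"] for rec in personnel
                    if rec["series"] == "0305" and rec["years"] >= 5]

# The same data rearranged several ways for special listings.
by_id = sorted(personnel, key=itemgetter("id"))
by_unit = sorted(personnel, key=itemgetter("unit", "name"))
by_service = sorted(personnel, key=itemgetter("years"), reverse=True)

# A listing with column headings, so each item can be read selectively.
print(f"{'NAME':<12}{'UNIT':<16}{'YEARS':>5}")
for rec in by_service:
    print(f"{rec['name']:<12}{rec['unit']:<16}{rec['years']:>5}")
print(dict(headcount), experienced_0305)
```

Because no arrangement is built into the stored records, adding another listing (say, by position classification series) is just another sort key.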
However, unless careful attention is given to the information retrieval needs in the planning and design of these multipurpose computer systems, there may be serious limitations or problems when later attempts are made to use the system for retrieving information. For example, some of the earlier ADP systems, in attempting to keep the machine record as short as possible, omitted such important data as the names of the individuals whose records were being maintained in the computer. Others were designed in such a way that individual items of data could not be selectively retrieved because the data was merely printed out in long lines without column headings. Sometimes the data was expressed in coded form, making it necessary for the user to refer to a special table to interpret the printout. Another problem, which is particularly critical at this time, is the lack of standardization or compatibility in data elements, which makes it difficult and sometimes impossible to exchange, compare, or combine data maintained in separate systems but relating to the same people, places, or things.

Unlike computer index searching systems and computer automatic indexing and searching systems, computer data storage and retrieval systems can be organized and arranged in an endless variety of ways. Generally, the method used initially for organizing and arranging the data prior to conversion to a computerized system is also the method selected for the new system. Thus, computerized census records are organized and arranged on a geographical basis much as they were before the advent of the computer. Personnel data banks are usually organized by the name or identification number of individual employees or job applicants. However, the computer offers one distinct advantage not normally possible or practical in conventional systems: the capability of organizing and arranging the same data in a variety of other ways. For example, personnel data can, in addition to the basic arrangement, be organized on the basis of organizational assignment, position classification series, years of service, etc., for direct searching or preparation of special listings.

Case files (files organized by the names or identifying numbers of people, places, or things) represent approximately 85 percent of the folderized records of the Federal Government. These files contain a wealth of data, but when stored in conventional systems the data is buried so deep in the file that it receives only limited use. By converting the data in these files to computerized systems, it becomes possible to readily select, extract, compare, and manipulate the data in an endless variety of ways to meet day-to-day operational requirements, to provide statistical data for management decisions, and to satisfy unpredictable needs of the future.

The only serious disadvantage of computer data storage and retrieval systems at present is their cost. However, the cost picture is gradually changing due to reductions in computer input costs through the application of SDA techniques; larger and cheaper computer data storage devices; faster processing speeds; and faster, less costly methods and equipment for retrieving and producing the system output.

Tomorrow's records manager will more than likely discover that most of the data needed to satisfy his clientele will be available via the computer and that his conventional files will serve mainly as depositories for selected original documents having legal or archival value. Today's records managers should therefore survey every [...] for the purpose of identifying those which at some future date will or should be converted to a computerized data base, and then work with management in developing an orderly schedule for the conversion.

Other Machine Indexing and Retrieval Systems

While most of the microform equipment described in chapter III is designed primarily for storage of documents or data in miniaturized form, some also have the capability to conduct logic-type searches. These are as follows:

Motorized (mechanized) Roll Microfilm with Photo-optical Binary Code. Although retrieval speeds with this type of equipment are not nearly so fast as those that are possible with a computer, it permits the user to automatically retrieve information. The information is displayed in page size, usually on a viewing screen, or reproduced on a film or paper copy. However, data on the film cannot be moved from one location to another, nor rearranged or changed. (For further information, see chapter III.)

Microfilm Chip, Automated. This equipment has about the same capabilities as the system described immediately above. The use of the chips, however, does make it possible to insert and delete individual pages. (For further information, see chapter III.)

Aperture Card (EAM punched card-microfilm). Systems of this type make it possible to mechanically sort, select, display, and copy printed or graphic information appearing on the film images displayed on the cards. However, as in the case of automated microfilm chip systems, the equipment is not well suited to personal searching by individual users. (For further information, see chapter III.)

Microform-Computer Combinations. Various types of microform equipment can be linked either directly or indirectly to a computer so that the computer can be used to conduct the searches and the microform device used to store and display the information or documents the user is seeking. (For further information, see chapter III.)

VI. HOW TO DECIDE IF A NEW SYSTEM IS NEEDED

The Preliminary Survey

This handbook gives considerable attention to finding the best system for storing and retrieving information. There will always be situations where the best system is the same system used in the past. Other situations will warrant the use of modern information retrieval methods and equipment. For further clarification of the wide potential, consider any of the following situations:

Case-type records used for looking up and extracting discrete data such as names, addresses, amounts, dates, and other data needed for such purposes as answering correspondence, processing applications, and preparing reports.

Case-type records used to correlate or compare data relating to individual persons, places, or things, for such purposes as personnel selection and placement, selection of contractors for bidding, selection of equipment, and conducting special analyses.

Subject files and indexes relating to written text and used for obtaining any information that might aid in handling a current task or problem in connection with such activities as legal work, research, preparation of instructions, and management planning.

Reference collections containing such items as publications, technical reports, procedural manuals, directories, catalogs, and statistics used in day-to-day operations or research.

Files of graphic or pictorial material such as maps, photographs, slides, and engineering or architectural drawings in situations where the users are trying to find items having set characteristics or attributes.

Sometimes information retrieval studies are pursued for weeks or months, or a new system is installed, only to discover that a conventional system is all that is needed. The first question, therefore, that needs to be answered, and rather quickly, is "When do I use the old and when do I use the new?" This chapter describes a step-by-step procedure for making a preliminary survey to answer that question. It will help in deciding when conventional methods should be used and when it is worthwhile to spend the time and effort to make a detailed study of the possibilities of modern information retrieval methods and equipment.

Where to Look

The preliminary survey should not be limited to the major files, the library, or collections of reference materials. Rather, you should look anywhere there is a collection of information stashed away, regardless of the form in which it is stored. In this handbook, these files or other collections are referred to as "information facilities." Certainly, the size and frequency of use of the information facility are considerations, but they are less likely to rule out any system than they are to affect the type of system needed when weighed on the cost-benefits scale. Small units can sometimes justify relatively inexpensive and yet modern information retrieval systems. This is particularly true where there are many small information facilities containing information all or a substantial portion of which is the same.

Examining User Needs

Looking at all information facilities, of whatever description, is a practical and solid starting point. It is, however, at least equally important to examine the needs of the people who use the information.

Why is it important to look at both the infor-