DOCUMENT CLASSIFICATION
Document Type:
Collection:
Document Number (FOIA) /ESDN (CREST):
CIA-RDP84-00951R000400070003-0
Release Decision:
RIPPUB
Original Classification:
U
Document Page Count:
64
Document Creation Date:
November 11, 2016
Document Release Date:
February 25, 1999
Sequence Number:
3
Case Number:
Publication Date:
January 1, 1960
Content Type:
REPORT
File:
Attachment | Size |
---|---|
CIA-RDP84-00951R000400070003-0.pdf | 5.07 MB |
Body:
Approved For Release 1999/09/24: CIA-RDP84-00951R000400070003-0
FOR OFFICIAL USE ONLY
REFERENCE AID
DOCUMENT CLASSIFICATION
PAPERS PRESENTED
AT THE
CONFERENCE ON
PHILOSOPHY OF DOCUMENT CLASSIFICATION IN OCR
CIA/CR-31
January 1960
CENTRAL INTELLIGENCE AGENCY
OFFICE OF CENTRAL REFERENCE
F-O-R O-F-F-I-C-I-A-L U-S-E
DOCUMENT CLASSIFICATION
PAPERS PRESENTED
AT THE
CONFERENCE ON
PHILOSOPHY OF DOCUMENT CLASSIFICATION IN OCR
21 November 1959
CIA/CR-31
OFFICE OF CENTRAL REFERENCE
CENTRAL INTELLIGENCE AGENCY
JANUARY 1960
CONTENTS
Foreword
Introduction to the Nature of Classification
Panel I. The Intelligence Subject Code
Discussion
Panel II. Classification Tools
Discussion
Panel III. Supplements to the Main Classified File
Discussion
Panel IV. Contribution of Machines to the Classification Process
Discussion
Summary of Final General Discussion
Appendix: Conference Schedule
Foreword
Until recent years the subject control of written information
has been largely limited to the control of printed books.
The present flood tide of documents and reports has resulted
in a proliferation of specialized subject codes and classifications,
each suitable for the control of a portion of this
flow.
Intelligence documentation must cover virtually all fields
of human knowledge at sufficient speed to bring the pertinent
portions of millions of documents to bear on a problem demanding
immediate solution. Advances towards this goal have been achieved
by the liberal extension and modification of traditional information
processing techniques. In many areas machines have superseded
the manual searcher, bringing with them new capabilities
and new limitations.
The papers that follow reflect some of the developments
that have taken place within the Central Intelligence Agency
to make documentary information usable. They were presented
at an off-duty gathering of document analysts and reference
personnel, sponsored by the Office of Central Reference. The
views expressed are those of the individual writers and do not
necessarily constitute OCR policy.
INTRODUCTION TO THE NATURE OF CLASSIFICATION
25X1A9a
I have been assigned the task this morning of providing some introductory
remarks on the nature and use of classification.
The first problem, of course, will be to make sure that we agree on
what we mean by classification. That calls for some definitions.
Secondly, it will help in our orientation to take a brief look at the
history of this art and to try to identify some of the principal systems of
classification which have evolved.
Finally, it will be appropriate and I hope useful to speculate on the
use and value of classification in intelligence.
I should note at the outset that we are concerned here with the classi-
fication of knowledge, not with security classification which is a highly
specialized application in the field.
Classification is not some inert device such as you might look at in
a display case in a museum. It is a highly adaptable tool for solving
problems. Every organization of individuals engaged in a common activity
will inevitably require and develop a locally adapted classification for
sorting and retrieving its information. How effectively the system operates
cannot be judged in any abstract manner. It must be evaluated in the par-
ticular, local situation in which it has been developed and employed.
Fortunately I am talking to an audience this morning that possesses an
advanced degree of experience in this field. Even so, it is not a little
ambitious to discuss the general aspects of classification in twenty minutes
when one remembers that library schools offer year-long courses on the subject.
Now for some definitions.
"Classification is the grouping of various things on the basis of
likeness." Classification is also described as a grouping or segregation
into classes which have systematic relations usually founded on common
properties or characteristics. I should insert a comment at this point
concerning the sources I have drawn on for this paper. I have relied at a
number of points on the work of an Englishman, Mr. John Edwin Holmstrom. His
book "Facts, Files and Action", published in 1953, is the most complete and
satisfactory single discussion I have found on the general subject of
documentation. Also useful was the book "Classification in Theory and Practice"
by Thelma Eaton, published in 1957.
Mr. Holmstrom makes the following observation at one point: A true
classification is a map showing the interrelationship of ideas whereby a
user can orient himself and make cross-country journeys from one idea to
another in a more or less distant part of the field.
Should a person seek to devise a scheme of classification there are
two conditions he must satisfy in order to make it workable: First, a
distinctively labelled home must be provided (or providable) for every
possible kind of item that is liable to arise; secondly, these homes must
be so labelled as to make them mutually exclusive. A classification must
have conciseness, orienting power, and specificity. If its terms cause a
user to knock on the wrong door or to go past doors which in fact enclose
what he is looking for, his scheme is inadequate in one or more of these
respects.
Knowledge is gathered into classes so that causes and effects may be
systematically examined. Where a cause invariably produces a certain effect
we discover in this process a natural law. It has been pointed out that
classification clarifies thought, advances investigation, reveals gaps in
the sequence of knowledge and thus promotes discovery.
The type of knowledge classification with which we are concerned today
was first applied to books. The development of mass produced books and of
public education brought such tremendous increases in the rate of growth
of libraries that no one individual could any longer hope to command personal
knowledge of all of the books in a large library. Thus it became important
both for librarians and for researchers that books be arranged on shelves
so that they could be got at without endless searching.
Systematic preplanned classification of books according to a scheme
workable in many libraries is a surprisingly recent development. An
American, Melvil Dewey, in 1876 developed the first of the present-day
widely used book classifications, usually referred to as the Dewey Decimal
Classification. The Dewey concept proceeded by branching the whole of
knowledge into ten main divisions. Each of these in turn was sub-divided into
ten and so on to whatever number of decimal places might be necessary to
specify the subject matter under analysis.
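The branching by tens described here can be made concrete with a short sketch. It assumes the customary written form of a Dewey number (three digits, then an optional decimal extension); the function name `broader_classes` and the sample number 621.38 are illustrative only and do not come from the paper.

```python
def broader_classes(notation: str) -> list[str]:
    """Return the chain of classes from the main division down to `notation`.

    Each additional digit narrows the subject by one step, so every
    prefix of the digit string names a broader class.
    """
    digits = notation.replace(".", "")
    if not digits.isdigit() or len(digits) < 3:
        raise ValueError("expected a number such as '621.38'")

    chain: list[str] = []
    for depth in range(1, len(digits) + 1):
        prefix = digits[:depth]
        if depth <= 3:
            # Main division, division, section: padded to three digits.
            label = prefix.ljust(3, "0")
        else:
            # Finer subdivisions are written after a decimal point.
            label = prefix[:3] + "." + prefix[3:]
        # Skip repeats such as 600 -> 600 when the number ends in zeros.
        if not chain or chain[-1] != label:
            chain.append(label)
    return chain


print(broader_classes("621.38"))
# ['600', '620', '621', '621.3', '621.38']
```

Each step in the chain is one of the ten possible sub-divisions of the step before it, which is why the notation can be carried to as many decimal places as the subject requires.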
Dewey also integrated in a logical and concise manner the other components
that make up the system of organization of the holdings of a modern
library. We need to check our glossary at this point. The outline of a field
of knowledge is called a schedule. An index in alphabetic order of the terms
contained in the schedule is required to provide ease of entry for a user
with a particular subject interest. The books are located on the shelves
according to a notation, a scheme of numbers or letters or a combination of
the two, patterned to reflect the hierarchy of knowledge in the classification
schedule. Finally, since books contain many subjects as a rule, yet only one
of these can control the point at which the given book shall be shelved, a
system of subject headings is required as a means by which all pertinent
subjects in the book may be individually recorded in a subject catalog.
The Dewey system has enjoyed a remarkable success probably for reasons
well stated in a comment by William Gladstone, the former British Prime
Minister: "It is an immense advantage to bring the eye in aid of the mind;
to see within a limited compass all the works that are accessible in a
given library on a given subject; and have the power of dealing with them at
a given spot instead of hunting them through an entire collection."
The next major system to appear after Dewey was the Library of Congress
classification first published in 1901. Perhaps its principal innovation
was the use of many more classes and the development of an alpha-numeric
notation to accommodate them. Furthermore many libraries were discovering
even by that date that the Dewey structuring of classes in ten divisions
was arbitrary and unsatisfactory in many subject fields.
The third modern classification scheme, the Universal Decimal Classi-
fication, appeared in 1905. Basically it was an internationally standardized
extension of Dewey. It followed the same plan as Dewey and its main classes
bore the same numbers. However, because it was designed to deal with the
analytical indexing of miscellaneous detailed items of information, especially
in scientific and technical journal literature, its categories were extended
much further into detail and to date over 100,000 categories have been agreed
upon against 11,000 for Dewey. There are some very familiar criticisms of
the UDC. Listen to these from Holmstrom: "Despite its standardization it is
not in fact the case that independent classifiers will always give the same
item exactly the same class number and searchers will invariably know under
which number to look. At many points the choice of numbers still leaves
room for a considerable personal equation. Also, since expansions are
centrally controlled the extension of class numbers to cover new
developments always lags substantially behind the needs of libraries."
The systems we have talked about thus far are what we call pre-planned
classifications because they seek to provide in advance for all knowledge.
A rather remarkable occurrence of the last thirty years has been the appearance
of what may be called self-developing classifications. These represent an
attempt to avoid the difficulties experienced under pre-planned classifications
in dealing with change and with the growth of knowledge. They seek a
flexibility that will permit the addition of new terms and new intersections of
knowledge without upsetting existing records. They avoid making a cataloger
establish a correlation of knowledge and instead make it possible for each
researcher to proceed according to his own personal concept of the
classification of a subject field.
In this country the best known schemes of this variety carry the label
"coordinate indexing" as fought over by Calvin Mooers and Mortimer Taube.
The system is applied roughly as follows:
1. A documentation staff develops a list of terms which are
significant to its researchers and under which it wishes
to establish a record of pertinent documents, books or
other recorded information.
2. A separate record card is established for each of these terms.
3. Incoming documents are analyzed for subject content according
to these terms.
4. The control number of a document is posted on the term cards
which the indexer has determined are appropriate.
5. Now if one compares the document numbers posted on the record
cards for terms A and B, a match of numbers will indicate that
both terms are dealt with in the given document.
Knowledge comes in patterns. Here is an attempt to atomize such patterns on
the theory that previous and new patterns can be reconstructed by the user of
the system. Whether this is truly practicable or not is a highly controversial
issue at the present time. Much thought has been given to the possibilities
of symbolic representation of the terms and their manipulation according to the
rules of mathematical logic.
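The five steps of coordinate indexing listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `CoordinateIndex`, the sample terms, and the document numbers are hypothetical, and a set of numbers stands in for each physical term card.

```python
class CoordinateIndex:
    """One 'card' per term, each holding the numbers of the documents posted to it."""

    def __init__(self, terms):
        # Steps 1 and 2: an agreed list of terms, with a separate card for each.
        self.cards = {term: set() for term in terms}

    def post(self, document_number, terms):
        # Steps 3 and 4: the indexer selects the applicable terms and the
        # document's control number is posted on each of those cards.
        for term in terms:
            if term not in self.cards:
                raise KeyError(f"term not in the agreed list: {term!r}")
            self.cards[term].add(document_number)

    def match(self, *terms):
        # Step 5: numbers found on every requested card identify the
        # documents that deal with all of the terms together.
        if not terms:
            return []
        common = set.intersection(*(self.cards[t] for t in terms))
        return sorted(common)


index = CoordinateIndex(["turbines", "metallurgy", "exports"])
index.post(101, ["turbines", "metallurgy"])
index.post(102, ["turbines", "exports"])
index.post(103, ["metallurgy"])
print(index.match("turbines", "metallurgy"))  # [101]
```

Comparing cards is a set intersection, which is one simple case of the manipulation by mathematical logic mentioned above. The sketch does not settle whether atomized terms can reconstruct the original patterns of knowledge.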
In 1934 S. R. Ranganathan of India published the first edition of his now
famous proposal for what he called colon classification. Unfortunately there
just isn't time to dig into this system in detail. A major characteristic is
the lack of a comprehensive hierarchical structure of knowledge. Rather,
Ranganathan has sought to develop a method for analyzing knowledge. He
conceives of five basic facets from which logical branchings of knowledge and
indications of the intersections of knowledge should proceed. These are Time,
Space, Energy, Material and Personality. The symbols for expression of facets
are letters, numbers and punctuation. Relationships between facets must be
expressed in a prescribed sequence, separated, or linked if you wish, by colons
and various other symbols which specify role. In making a subject entry in
the Ranganathan card catalog one places a card not only at the terminal point
of a complete facet analysis, for example, at the end of a linkage base idea.
I might add that various applications and tests of the system are underway;
also, that the comment has been made that the whole concept is rather alien
to American thought patterns.
The future of self-developing classifications is closely tied to machines.
The original applications of coordinate indexing were embodied in simple
manual card systems. Applications involving progressively sophisticated
equipment have fairly mushroomed in the post-war period. We can mention
edge-notched cards, the Batten or peek-a-boo perforated card devices, applications
of IBM punched cards and the use of computers, for example at the General
Electric gas turbine plant in Cincinnati.
It is time now, however, to break off from this line of discussion and
to consider the use of classification in CIA. There will be time for only the
most general of propositions.
I think you have got to approach the problems of information handling
initially from the viewpoint of the researcher. This is interesting business.
The typical analyst brings with him a university background of training in
scholarly research and a rather elaborate structure of formal knowledge of
his chosen subject field. In CIA he finds a kind of newspaper world with
its field collectors or reporters of information and its copywriters,
editorial writers and editors at headquarters who estimate the news up to
each edition deadline, follow developing stories through edition after edition
and employ "successive approximation", as it has been called by Max Millikan,
as the method of getting at the "truth." Today's conclusions may confirm,
contradict or modify those of yesterday. Much depends on the brain power of
the team, but good sources, good methods and just plain luck will also bear
on the quality of the performance.
Now what does the analyst do on the job? He sifts incoming information
from a variety of sources. What he can't keep in his head he records, pretty
much as he pleases in a first-stage external memory, his working file. This
is a classification system. It may be formal or highly subjective in its
structuring of knowledge. It represents a sort of capillary system for hand-
ling new knowledge and new language. It also constitutes the basic platform
on which the analyst proceeds with his problem solving in his field of
specialization.
In large systems, a second-stage externalized memory such as the Office
of Central Reference is also required. This facility must serve the needs
of all analysts in its audience. Unavoidably it must compromise the desires
of individuals. It must perform what the analysts have neither the time nor
the acquired skill to do. Each category of knowledge requires a discipline -
rules for consistent processing and manipulation. Thus specialization of
information storage occurs. And let me say that I think OCR does these jobs
very well indeed, as well as they are done anywhere. We have developed
"know-how" in defining, controlling and retrieving by category or within
category, our data on names, area, photography, industrial plants, trade
fairs or information on Communist Party activities. We need apologize to
no one when we also acknowledge that we hope to improve in the future our
methods for making these systems better serve our customers.
The intelligence process in which both the analyst and we play a part is
certain to be deeply affected by automation and in the relatively near future.
There is much evidence already at hand.
Print-reading devices will transfer information from document to
machine.
A method is already proved for bringing field reports to headquarters
on tapes ready for machine input.
A first-cut identification of information in the document will be
made by auto-abstracting and language manipulation techniques. As you
may know, ACSI hopes to begin a program of this sort in 1960.
Dissemination to the analyst's office by Western Union type ticker
tape will be inaugurated based on computer matching of document contents
with his requirements.
Analysts and field collection staff will communicate by voice and
television and adjust requirements to "feedback" immediately as agreed
upon. This communication will include instantaneous transmission of
text, photographs, maps, et cetera.
The analyst will store at least some of his information under
categories of his own choosing in the central facility. He will record
his evaluation of each document with us so that we may correlate it with
others for the benefit of future users.
The analyst will contribute to the construction of OCR's indexing
dictionary, thesaurus and hierarchic classification scheme.
We will apply automatic indexing techniques in particular to
our directory type programs.
Our indexing staffs will be primarily concerned with the subject
control of complex subjects, with abstracting and consolidation of index
data, with purging, and with the coverage of many categories of
information we cannot now afford to process.
Our customers will learn to utilize our facility on a much more
spontaneous basis, as they now use their personal files, because our
system will respond promptly. It will return their own contributions
and those of other experts including their evaluations of any report
in the system.
I offer you these opinions in closing:
Our system does not mesh as well as it ought to with the information
handling patterns of the intelligence researcher. In the future
the researcher must be able to query our facilities as simply, quickly
and directly as he does his own files, or his telephone directory,
dictionary or encyclopedia.
We must be keenly aware, day in and day out, of the researcher's
interests, his language, the file system he uses and his opinions as
to the effectiveness of our operations.
We cannot provide the value judgments on what we process. We may
assist the analyst to discover significant relationships between facts
but ours will never be the final authoritative judgment.
I am much puzzled by the problem of handling what I will call
saturation reporting. I have seen 20 or more reports on the Paris
information conference of last summer. These reports have been
highly repetitive in content. It may be a reductio ad absurdum to
try to subject-code each facet of their contents for fine-mesh re-
trievability.
Washington Platt has estimated that strategic intelligence information
depreciates at the rate of 20 per cent per year and that facts
such as those concerning a port or manufacturing plant depreciate at the
rate of 10 per cent per year. As our body of knowledge grows, we will
experience increasing difficulty in selecting out, that is purging, this
dead information.
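To give a feel for what these rates imply, the following sketch works the arithmetic. It assumes the depreciation compounds year over year, which the estimate as quoted does not specify; the function names are illustrative.

```python
import math


def remaining_value(annual_rate: float, years: float) -> float:
    """Fraction of the original value left after `years`, assuming the loss compounds annually."""
    return (1.0 - annual_rate) ** years


def half_life(annual_rate: float) -> float:
    """Years until half the original value is gone, under the same compounding assumption."""
    return math.log(0.5) / math.log(1.0 - annual_rate)


for label, rate in [("strategic intelligence", 0.20),
                    ("port or plant facts", 0.10)]:
    print(f"{label}: {remaining_value(rate, 5):.3f} of value left after 5 years, "
          f"half gone in {half_life(rate):.1f} years")
```

Under this assumption a strategic report retains about a third of its value after five years, while a fact about a port or plant retains a little under three fifths. That gives some measure of how quickly material becomes a candidate for purging.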
We will have to be thick skinned towards criticism. A retrieval
of unsatisfactory information or an answer of "no information" may find
the researcher condemning us even though the system itself operated
efficiently. Failure in a search may mean that no information has been
received in our system. On the other hand we must not evade criticism
when it can be shown that we have in effect hidden information from its
potential user.
In the automated reference center of the future, the analyst will
be able to a degree not now possible to orient himself on the map of
interrelationships of ideas and to make cross country journeys from
one idea to another anywhere in the field of knowledge.
Panel I
THE INTELLIGENCE SUBJECT CODE
25X1A9a
X1A9a
Panel Members: (spokesman)
In carrying out their joint obligation for making intelligence documents
available to consumers, the Document Division and the CIA Library have developed
a system of classification which is the basis for the storage and retrieval
of information through the Intellofax System. The experience of these two
divisions reflects a continuous encounter with problems of input and
retrieval. Actions taken in response to immediate needs are integrated into the
system. Collectively, these may seem to reveal more hopeful pragmatism than
systematic and logical progress. Yet each decision has been taken with the
same object in view -- that of providing the user with the material he needs
and excluding that which is irrelevant. While this paper is intended to
examine the system from the viewpoint of its aims rather than methods, it is
still necessary to rely upon a measure of description to explain how the
classification scheme has taken its present form. The various factors which
combine to form the problem of classification refinement will be discussed
in general and in particular. These are: the classification system, the
documents, and the requests for information.
While it is true that plain words are best, all technical operations
develop a vocabulary which permits the substitution of word or phrase for
a lengthy description. Certain terms used frequently in describing the
processing of documents and the system itself should be defined here
before they appear in context:
Intellofax System. A mechanically supported system in effect in OCI
since 1948 consisting of an index file of IBM cards coded by subject and
area; taped lists of bibliographic citations to documents; and microfilm
aperture cards which are the Library's file copies of documents.
Index. The verbs "to index", "to classify", and "to code" are
used interchangeably.
Nodex. The term used to indicate material which is not indexed.
Notation. The professional term for what we commonly call the "code".
It may be alphabetic, numeric, or a combination. In our particular
classification system, numbers are used to represent subjects.
Processing. Here refers to the entire activity concerned with
receiving, disseminating, indexing, and microfilming documents; and punching
IBM cards.
ISC. Abbreviation for the Intelligence Subject Code, the classifi-
cation system used in the Document Division and the Library since 1948.
I. The Classification System
The need for a classification system was evident from the outset of
operations if the incoming documents were to be handled in keeping with
principles of systematic process and uniform control. At its inception
in 1948 the ISC consisted of 8 chapters each representing one major subject
category. The entire code contained 980 notations. Within each series,
decimal breakdowns permitted refinement to the extent of six digits.
This project originated in an attempt to consolidate various systems of
classification into one comprehensive code. There was no particular
requirement that the scheme devised at the time should form a pattern
for other agencies forming the intelligence community. As a matter of
fact, there was no meeting of minds at that time for the need for a
common system. The expanded code of today with its 15,000 notations
and the revised new issue to be published in 1960 represent the response
by the Document Division to the need imposed by the increased flow of
documents, the wider range of Intellofax patrons, and the more recently
proposed USIB adoption of the ISC as a common classification scheme.
One example and comparison may illustrate how a single subject has
been treated in response to intelligence needs. In the original ISC,
Communism and the Communist party were covered by eight notations.
There are now 190. For this same subject, the Dewey Decimal
Classification uses only 12; the Library of Congress lists 13 headings. This
one example shows clearly how detailed a special purpose index can
become as compared with a universal system. Likewise, the particular
needs of the Agency and the diversity of material received from other
Government agencies have determined the direction of our efforts to
provide the service required by consumers.
The chronology of the many adaptations within the present ISC
shows that there have been two parallel developments. First, there
has been a continual process of additions to the code. Some were in
the form of major revisions of entire chapters, but most are single
new notations, generally inserted as a result of internal decisions
to take care of day-to-day needs not already provided for in the ISC.
The major changes in the past 10 years have been new issues of
2 chapters, one for Air Force, the other on Scientific Research and
Development. Both were reconstructed according to the wishes of the
office most concerned. The Air Force chapter was designed to fit Air
Intelligence needs when the Air Force adopted the ISC for its own
Minicard use in 1956. In consequence, the classification refinements
of that section are more suited to Air Force needs than to those of
this Agency.
Other Agencies and Offices have expressed active interest in
revisions of the ISC with results similar to the example of the Air
Force. Army submitted a revised code in 1956 which was so refined
in subjects of purely military interest such as order of battle,
logistics, and army organization that it was unsuited for general
use.
In the case of the chapter on Scientific Research and Development,
the demands for expansion have been many and varied. One of
the most articulate was in 1949 from a plant biologist who succeeded
in introducing 38 codes for diseases of plants. Although not one of
these codes has been used in retrieval during the past two years,
the section remains as a harmless curiosity of over-classification of
no particular use to intelligence needs.
At the same time, while the classification scheme was growing
more detailed, the number of documents received for processing
increased rapidly. Therefore, the second major development and a
logical consequence was the decision to exclude altogether certain
types of documents from the coding system. This process of not
indexing has been termed NODEX, a necessary limitation intended to
allow more time for coding those documents selected for thorough
processing because of their intelligence value.
In the present ISC, within a moderate size of 15,000 notations,
a place is provided for very wide ranges of human knowledge and
activity. This development, however, has not been altogether systematic
and a glance at its varying degrees of fineness suggests that a
number of influences have been brought to bear on its contents.
The pressure from outside has generally been for detail expansion,
but usually in a haphazard way. The result has been equally
unbalanced, as certain sections have been accorded very fine breakdowns
and others have remained relatively unchanged. Demands for
over-specialization create a problem which invariably faces those
who are trying to construct a classification scheme which fits the
needs of input and retrieval. The fineness of the classification
structure must reflect a compromise between the document analyst's
need to apply the scheme to all types of intelligence documents
and the librarian's and researcher's need to retrieve for very
specific needs. Because the documents vary greatly in their degree
of generality or specificity and because the Intellofax System
serves a variety of customers whose needs often contradict each other,
indexing standards must aim at being at once both uniform and
flexible.
The subject specialist's approach to code structure and indexing
is sometimes termed "over-sophistication". It commonly reflects a
close familiarity with one subject and only certain aspects of this
subject, and quite naturally assumes a particular importance to the
individual concerned. A better term might well be "over-simplifi-
cation" for it can lead to a degree of specificity which requires
indexing every substance and function by name. If carried to its
logical conclusion, this fineness of classification would require
a code of dictionary size while at the same time narrowing the base
of application. Coding by words as opposed to ideas may serve
fairly well in dealing with commodities and installations. But
when the element of judgment is removed altogether from coding, the
end product becomes a mass of unmanageable size and much unrelated
data.
This conflict over specificity is carried over into an unresolved
disagreement about who may best be assigned to the task of coding
documents. There are three possible candidates: 1. The subject
specialist with professional status in indexing. This type is hard
to find, particularly so at moderate salary levels. 2. The subject
specialist with no indexing experience and no great desire to index.
3. The trained indexer with a general educational background. The
only experiment of any size which was intended to form some con-
clusions on this problem has been conducted in Great Britain. Its
value is exceedingly limited by the selection of one type of techni-
cal document as the entire body of test material. Here in OCR in
1957, a team of library consultants surveyed operations and made
many recommendations. Task Team #1 assigned to answering the con-
sultants' observations on the Intellofax System stated that "it
did not recommend the hiring and maintaining of true subject
specialists (such as an organic chemist, an inorganic chemist or
biochemist) but rather the division of the coding universe into
large subject groups, and specialization only within these groups.
Even though the coder would be a generalist compared with the subject
specialist in the consumer office, rough specialization would result
in many factors capable of improving the coding."
II. Nature of Documents
Other considerations which bear upon classification stem from
the nature of the documents and the requests which are expected to be
placed upon retrieval.
The volume of documents received in the Document Division ranges
between 700 and 1,000 a day and the variety is extremely diverse.
There are many short factual attaché or foreign service information
reports which are 1 or 2 pages in length and can be covered with
1 or 2 codes with little depth in analysis required. There are the longer
raw information reports, such as many 00-B reports, which cover
technical research, and therefore require more specialized subject
knowledge. In these cases, the language of the document often does
not match the language of the classification scheme, and the document
analyst, who is a generalist and not a subject specialist, may have
difficulties with too fine a classification. Here it is better to
use a broader approach which then places the burden upon the techni-
cal researcher to read many related documents in order to pull out
the specificity he needs.
Many raw information reports cover political and economic
subjects - some factual, others abstract and theoretical. It is the
latter type of report which is the most challenging and which requires
effective analysis. This is the information which is also the most
difficult to retrieve, for it allows the minimum reliance on mechani-
cal aids. The consumer may not be clear in his request and the clas-
sification cannot always provide the proper clues.
Finished intelligence reports prepared by evaluating components
of the intelligence community are normally longer than raw information
reports. The emphasis is different and because much of the factual
data has been culled from raw information reports, the depth of
coding is not the same. Although it may be necessary to assign many
codes, the fineness of classification is not so vital since broader
aspects are used.
III. Nature of Requests
In the construction of the chapter on economic subjects and
commodities provision is made for discrete coding of materials and
fabricated objects. It was assumed that analysts would ask for all
material on one or more items, for example, copper. It was soon
evident that even this one subject divided by further decimals into
stages of processing from ore to finished product would be too general
for the customer who asked only for production statistics. With the
addition of supplemental modifying prefix codes, it is now possible
to select any or all 35 functions of the copper industry (or any
selected industry). This is the finest classification we have
attained in our present Intellofax System with the use of prefix
codes. However, it is also the least analytical. It requires no
particular skill and certainly no interpretation. Likewise, the
newer mechanical or electronic devices which will retrieve this type
of document are better equipment only in the sense that they can
produce references more rapidly.
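The interplay of decimal subdivision and prefix codes described above may be pictured with a short sketch. It assumes only that a subject code is a decimal string whose leading digits name the broader class. The code numbers, modifier values, and document identifiers are invented for illustration and are not actual ISC notations.

```python
# Hypothetical index entries: (modifier prefix code, decimal subject code, document id).
# None of these numbers are actual ISC notations.
ENTRIES = [
    ("12", "735.1", "DOC-001"),  # production / copper ore
    ("12", "735.4", "DOC-002"),  # production / refined copper
    ("31", "735.4", "DOC-003"),  # export / refined copper
    ("12", "736.2", "DOC-004"),  # production / a different metal
]


def search(subject_prefix, modifier=None):
    """Return ids of documents coded at or below subject_prefix.

    If modifier is given, keep only entries carrying that prefix code.
    """
    hits = []
    for mod, subject, doc_id in ENTRIES:
        if not subject.startswith(subject_prefix):
            continue
        if modifier is not None and mod != modifier:
            continue
        hits.append(doc_id)
    return hits


print(search("735"))        # every copper reference, all stages and functions
print(search("735", "12"))  # copper references limited to one function
```

The search is purely mechanical matching of code strings, which agrees with the observation that this kind of retrieval calls for no interpretation. It also returns document references rather than the statistics themselves.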
Experience has shown that searches in response to requests for
information of this sort are frequently the least successful in terms
of customer satisfaction. The difficulty stems from the fact that
the analyst is asking for statistical data and he receives a list
of documents which contain some coded reference to that subject.
Available equipment does not permit the accumulation of information
to be issued in the form of direct answers to queries, nor would
this necessarily be a desirable substitute for the source documents.
essential to accurate and logical evaluation of information. With
these supporting facts removed, the reliability of pure information
cannot be determined, while it is perfectly possible that different
documents would give different figures in answer to a given
question.
A second category of specific subjects falls within the general
description of target information, that is data about installations
and cities. Information of this sort is more often requested by
users outside the Agency who are not well acquainted with the facili-
ties or services of OCR. What may appear to be a very simple request
such as "major industries in a selected area," while it could be
retrieved, would be highly impractical and uneconomical to translate
into an ISC search. Here the detailed coding which the ISC provides
becomes an obstacle to efficient retrieval, and the Industrial
Register becomes the proper source for such information. Target
information is further hindered by the limitations of area coding
on the present Intellofax card. Locations are not coded beyond
the level of countries with the exception of Russian oblasts and
Chinese provinces. There are occasions in searches for instal-
lations when an IBM punch for city coding or clear text of city
names would be valuable. However, the Intellofax index does not
stress finer classification of this type of information because the
Registers have been established to service such requests.
The fineness of a classification scheme may also be dependent
upon the use of clear text in conjunction with codes. The Intellofax
System uses no clear text punch columns today because of limitations
of the IBM card as it is designed for our present use. Some special-
ized files have used clear text for many years and with great success,
and any new system devised for Intellofax will without doubt consider
the possibility of adopting clear text. Panel II will discuss in
greater detail clear text as a necessary classification tool. It
should be mentioned here, however, that many of our present problems
with the classification structure could have been solved and the
requester more adequately served if we had been able to use clear
text.
So far we have examined the coding of concrete subjects, which
have proved to be the least demanding in terms of coding experience
or substantive knowledge. Here fine classification has proven to be
useful for the most part only in reducing volume in retrieval.
When we turn to the use of subjective codes, the handling of
reports which discuss political and economic conditions and affairs,
the criteria for both codes and coders must be altered to accomplish
efficient retrieval. Here the classification tool and the ability
of the indexer must join to produce a useful product capable of
supplying the information needed for consumers. The chapter of the
ISC which covers world politics has also been expanded greatly from
its original 1,2 notations, but its growth has been more systematic
and controlled. Here the problem has been the difficulty in maintain-
ing a balance between the language of documents and the ideas which
they reflect. Moreover, as the need for interpretation increases,
uniformity in coding becomes more and more difficult to maintain.
Supervision and review are the available means of control, but much
individual analysis of documents remains unchecked as the large volume
of documents flows through processing each day. Yet the problems of
ensuring uniform coding of ideas are fewer than those which arise
when each and every word is indexed, and the concepts or ideas are
removed. Without a lengthy excursion into the subject of semantics,
it is still possible to observe some of the problems which are implicit
in reconciling words and meanings. In theory the meanings which may
become attached to words are not subject to fixed limits, yet the words
when printed become a measurable piece of information. If the
material is to be of any intellectual value, the general context in
which a sentence is placed must be taken into account. The alterna-
tive is a very limited form of classification which reduces effec-
tiveness of the system to the minimum. With a document before him,
a coder can achieve some measure of understanding of the total meaning
created by the words in print, consisting of their literal values
plus their significance as presently used. This in turn becomes the
basis for the interpretive coding which the ISC classification for
political subjects allows.
The ISC's principle of mutually exclusive subdivisions within
larger categories serves equally well in subjective classifications
but it does limit the fineness of classification which can be
employed. From the viewpoint of retrieval, this is not the handicap
that it may appear to be. Requests for political information are
commonly made in search of material bearing on a research topic for
selected countries such as "Present situation and outlook for
Austria," or "Relative importance of military, labor, Church, and
intellectual forces" in Spain. In efforts to serve such problems,
the need is for a collection of documents reflecting these various
subjects not only with a set of isolated facts but complete with
the observations of the report writers whose work is based on local
experience. At best it is difficult for a research analyst to
reconstruct the variables of any recorded event or situation.
He must be especially careful not to assign meaning to what he reads
which may stem from his own bias or imagination. To assist him in
making estimative judgments, in weighing one political faction against
another, meanings rather than isolated words provide the clues.
So in classification, broad categories serve the need more effectively
than a close word-indexing system which may provide good statistics
but no ideas.
Returning to our original problem, it is clear that classifi-
cation is the means employed for the systematic storage and retrieval
of information contained in intelligence documents. The operational
system by which this information is brought to the requester, and
the form of the end-product, while they may enhance the speed and
scientific appearance, do not contribute to the intellectual quality
of the research reports to be written. The emphasis must remain
therefore upon the documents themselves and the consumers who will
use them. The diverse contents of the first and the varied circum-
stances of the second create the essential problem of classification.
If the two are to be brought together with any degree of success,
some variety of treatment is needed. For material objects, the finer
classification which records statistics such as how much? and by
whom? is most likely to provide the answers. There is little room
for marginal material, none for the irrelevant in subjects of this
nature, and we should be able to provide only such documents as
will contain the desired information. Whatever is more or less
must be regarded as a poor product. For abstract and interpretive
subjects a classification scheme must be specific enough to permit
fairly rapid indexing within a uniform pattern which will allow for
discrete retrieval searches. Yet it must also be capable of reflect-
ing concepts and ideas which may not be found in the direct language
of the documents.
IV. Theoretical Problems of Fine Classification
Specificity of classification is sought in order to pin-point a
species inside a class, e.g., to distinguish men from all other animals
inside the class vertebrates. By means of such specificity the special-
ist can go to his field of interest immediately and without regard
for all other coordinate fields. However, the end-result of such
specificity is, oddly enough, the creation of a new broad category,
particularly if the specific subject is elaborated from its original
status as a differentiated species into a pseudo-class. This can be
seen by a consideration of the guided missile.
The missile most probably made its earliest appearance as a
piece of rock which fitted easily into a man's hand and was just the
right weight to be thrown a short distance and to bash a man's skull.
Modern technology has changed the piece of rock into an interconti-
nental ballistic missile with an atomic warhead, and the skull into
a city or a nation, but the principle is still the same: the guided
missile, like the rock, is a weapon. However, inside the large
including class of military weapons it seemed logical to give a
special notation for the species, guided missiles, to distinguish
them from boomerangs, for example.
Soon there was the anti-missile missile, and then the anti-
anti-missile. Finally there will be a missile which is itself an
anti-missile missile or which will carry its own anti-missile missiles.
Since it no longer seems possible to consider a missile without
regard for the possibility of an anti-missile missile specifically
designed to frustrate it, the differentiated species of guided
missiles should necessarily include the anti-missile; the anti-missile
should necessarily include the anti-anti-missile. However, if the
specificity has been carried to the point that, from the beginning,
the anti-missile and the anti-anti-missile are equal to the missile
in the classification as co-ordinate species, even as the ape and the
chimpanzee are equal to man in the large class of vertebrates, there
is another problem in classification. Double or triple coding must
be used to compensate for the excessive specificity by creating a
new large class or broad category which might be called guided missile
warfare and weapons. It is not beyond probability that, since so many
other military devices, each given as a differentiated species, e.g.,
radar, are involved in guided missile warfare, eventually the guided
missile warfare and weapons class would be as large as or equivalent
to the original broad category of military weapons, particularly in
a document collection specializing in current military technology.
Furthermore, consideration of the anti-missile missile without
regard for the missile could lead a researcher to think of it as
something which had sprung full-grown into being. An apt analogy can
be found in space medicine. It did not spring into being as a fully
defined and specific field of knowledge. It has its gestation in
aviation medicine, which grew itself from army medicine, which was
an outgrowth of general medicine. It was subsumed in the class for
aviation medicine until suddenly it had such a sturdy self-existence
that it seemed to require a class for itself. However, what about
the earlier material classified as aviation medicine?
The easiest way to handle the problem is to double-code space
medicine, that is to assign to it both its own code and the code for
aviation medicine, treating it as if space medicine were both a new
class and an old class. In this way it is possible in the ISC to
relate the present state of knowledge to its antecedents, to show
the hierarchical relationship, not by means of classification but
by means of an additional subject heading. Even though classifi-
cation has been ignored, there is built into this method the seed
of a new specific class, i.e., that portion of A, e.g., aviation
medicine, which is also part of B, e.g., space medicine or space
aviation medicine. The alternative to this third new specific is
to reclassify as much of the class for aviation as is exclusively
space medicine. In this way A is A, B is B, and the two can be
made coordinate subdivisions of AB, flight medicine. This is an
expensive business, but it is good classification.
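The two treatments just described, double coding and reclassification under a common parent, can be contrasted in a small sketch. The letters A, B, and AB follow the text; the document identifiers, and the set of documents judged to be exclusively space medicine, are assumptions made only for illustration.

```python
# "A" = aviation medicine, "B" = space medicine, "AB" = flight medicine (as in the text).
# Document ids are invented for illustration.
DOUBLE_CODED = {
    "DOC-1": {"A"},       # earlier space material, filed before class B existed
    "DOC-2": {"A", "B"},  # new space medicine report, double-coded
    "DOC-3": {"A"},       # aviation medicine proper
}

PARENT = {"A": "AB", "B": "AB"}  # A and B as coordinate subdivisions of AB


def retrieve(index, code):
    """Documents carrying the given code."""
    return sorted(doc for doc, codes in index.items() if code in codes)


def reclassify(index, space_only):
    """Give every document a single code: B if judged exclusively space medicine, else A."""
    return {doc: ({"B"} if doc in space_only else {"A"}) for doc in index}


def retrieve_generic(index, parent):
    """Documents whose code falls under the given parent class."""
    return sorted(
        doc for doc, codes in index.items()
        if any(PARENT[code] == parent for code in codes)
    )


print(retrieve(DOUBLE_CODED, "B"))          # misses the earlier material
reclassified = reclassify(DOUBLE_CODED, {"DOC-1", "DOC-2"})
print(retrieve(reclassified, "B"))          # earlier material now found
print(retrieve_generic(reclassified, "AB"))
```

Under double coding a search on B alone overlooks the earlier report still filed under A. Reclassification repairs this, at the cost of re-examining every document in the old class, which is the expense the text refers to.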
As the field of knowledge increases in size, it is necessary to
readjust the classification, i.e., the arbitrary, formal, conventional
scheme of organization. The possible ways of adjustment are four.
First: Each new field can be made into a discrete class. Second:
As the specificity of the discrete classes explodes like the popu-
lation of the earth, larger and broader categories can be constructed
to pull the specific small classes into another large class.
Third: The whole classification scheme can be altered on a continuous
basis so that the specific small classes will be simultaneously
specific and related in a hierarchical order. Fourth: Double,
triple, and quadruple coding can be given to the same subject in a vain
attempt to classify into two or more classes simultaneously. One can
feel more and more sympathy with the oriental belief that the "All is
one," and "One is all," or even with the "Bellum omnium contra omnes."
The foregoing observations reflect the exceptional complexity of
general intelligence document collections as opposed to the relatively
definitive problems of more specialized and limited collections.
This distinction accounts in great measure for the lack of literature
applicable to intelligence document problems. Opinions regarding
new approaches to classification vary from those which may be termed
unreasonably doubtful to the other extreme of the unwisely confident.
Both are subject to some adjustment when confronted by the realities
of daily practice which must take its form from the materials available
and the demands which are made upon them.
Mr. Jesse Shera, Dean of the Library School at Western Reserve
University, has offered one brief analysis which can be applied to
our situation: "The pattern of classification appropriate to a
given library situation is conditioned by (a) the volume; (b) the
characteristics; (c) the pattern of thought of the field; (d) the
pattern of thought of the individual user." We have discussed all
the points Mr. Shera has made. We have indicated the large volume
of approximately 1,000 intelligence reports a day received for
processing. The characteristics of these reports are found to be
as varied as the types of people who are hired to do the processing.
The pattern of thought of the writer of the report may not be the
same as the pattern of thought of the indexer or document analyst,
nor as a matter of fact, as the classification system itself.
And last, but certainly not least, there is the thought pattern
of the individual consumer whom we are trying to serve. Certainly
all these factors, subject to such varied degrees of control, have
played and will continue to play an important role in determining
the nature of classification.
There will undoubtedly be some powerful and complex machinery
devised for future use in document and information retrieval. Its
success will still rest in great measure upon the skill acquired by
those who perform the indexing phase of the system. These people,
when thoroughly trained, are specialists in their own right, responsible
for constant and discreet judgment upon the documents. Neither broad
coding nor fine coding has any intrinsic value unless accompanied by
sound reason and systematic control for both input and retrieval.
We have found that there are inherent deficiencies in any system or
classification, and there are ambiguities in requests which often
resist a ready solution. It is the obligation of those assigned to
retrieval to make up for these deficiencies.
Panel I
DISCUSSION
QUESTION: What have been the positive responses to requests for the changes
to the Intelligence Subject Code (ISC)?
The revised ISC incorporates many suggested changes. The
proposed Army ACSI code has been accepted in part; however the detailed
specificity of the Order of Battle and logistics codes would have been too
many to include in toto. Likewise Air Force suggestions are included.
QUESTION: Who is using the present ISC?
The present ISC is used by Air Force Intelligence in its
[...]ding project and by two AF commands, Strategic Air Command (SAC)
and Shepherd Air Force Base. The code is used in part by the Army Signal
Corps Intelligence Agency. In addition to the above the ISC is used within
the military organization of SHAPE and is known as the SHAPE Intelligence
Code (SISC). Five NATO countries are using it today as a national intelli-
gence subject code.
QUESTION: In addition to the obsolescence of the system what about
obsolescence of the information itself?
The system reflects the present stage of knowledge but the older
[...] also remains in the collection. The search for retired material
[...] dependent upon the memories of people.
It has been ascertained that about 22 per cent of our re-
trieval is for retired material, that is, material more than five years
old. The Minicard Coding Group, which is working with a discrete corpus
of documents, was asked to assess the half life of the documents handled.
The preliminary conclusion is that we cannot estimate the half life of a
document because of its historical value and the nature of research.
QUESTION: How is the document analyst kept current with the needs of
researchers?
Participation on a monthly rotation basis of senior analysts
on the Composite Group (Library/DD Intellofax Processing Team) alerts them
to needs of the researcher which they relay to junior analysts. In addition,
[...] provides a means of keeping the permanent Library member apprised
of current coding practices.
It may seem extravagant to have two persons doing the work
instead of one as in the past, but we have found the cost of two on the
Composite Group in interrogating the requesters has been more than offset
by the savings in processing time and the end product.
Panel II
CLASSIFICATION TOOLS
Panel Members spokesman)
A. Introduction
1. The views and problems presented in this paper represent the ex-
perience and experimentation of the Intellofax System, the Special
Register, and the Minicard Coding Group.
2. Intelligence document storage and retrieval present complex problems.
Intelligence documents vary from highly polished, well organized,
single topic dissertations to disorganized, multi-topic fragments.
The questions asked of an intelligence index vary from the generic
NIS type of request to specific problems, the nature of which no
indexing system so far envisioned could possibly anticipate with
index categories. In addition, a general indexing service such as
Intellofax or Special Register must be able to cope with all fields
of knowledge using document analysts with limited subject speciali-
zation. As recently stated by a Library of Congress consultant,
we are facing problems never before experienced or anticipated in
the field of documentation. Other documentation services face
these problems in part, but the totality of factors mentioned
above is unique to the intelligence community. For this reason it
is difficult to get competent advice from experts in the field
of indexing. Most of the experts' experience is limited to book
cataloging or specialized, technical document collections. It some-
times seems there is entirely too much furor over which indexing
system should be applied to the average, small, specialized document
collection held by various U.S. industrial concerns. It would seem
that subject heading indexing, the simplest form of all, would
suffice except for highly complex subjects such as chemistry.
3. There are two problems critical to any storage and retrieval system
which are particularly applicable to an intelligence system. They
are the need for uniformity of input and specificity of retrieval.
The indexing tools or techniques discussed in this paper are attempts
to resolve these problems.
B. Intelligence Subject Code
1. There are in present use three main systems of indexing. They are:
a. Subject Headings - Subject headings are a simple alphabetical
arrangement of recognized words or compound terms which are
familiar to the users of the indexing system for which they
are designed. Complicated subject heading schemes tend to
take on many of the characteristics of a classification
system. The most notable example of an index using subject
headings is the Readers' Guide to Periodical Literature.
b. Coordinate Indexes - Coordinate indexing carries the subject
heading system a step further by allowing, in the retrieval
process, the coordination of ideas or index terms which refer
to the same document. Unlike subject headings, coordinate
indexing is more effective if used with some sort of mechanical
equipment. Uniterms, key words, and descriptors are all used
in coordinate indexing systems.
c. Classified Indexes - Classification systems attempt to classify
knowledge into broad groupings and sub-groupings. The botanical
classification of plant life, Dewey Decimal Classification, and
Library of Congress Classification are examples of classification
systems as well as the systems used in OCR, namely the Intelli-
gence Subject Code and the schemes used in the specialized
registers.
2. Subject headings are in general not applicable to intelligence doc-
uments. A very specific subject heading list tends to get complicated
and difficult to use, and generic searching is extremely laborious.
In addition, the use of subject headings does not provide for the
coordination of ideas which is extremely necessary to specific
retrieval in an intelligence organization.
3. Coordinate indexing, which seems to be gaining the most popularity,
also has serious drawbacks when applied to intelligence documentation.
Coordinate indexing has been applied almost exclusively to limited
fields of knowledge, particularly scientific knowledge. The language
of these limited fields is usually fairly stable and concrete. When
new terms do arise they generally have an entirely new meaning and
do not conflict with previously accepted terms. However, when co-
ordinate indexing is applied to broad fields of knowledge it en-
counters many semantic difficulties. A word does not have the same
meaning in one field of knowledge that it has in another, e.g.,
stability has a different meaning for the chemist, the physicist,
the aeronautical engineer, and even the political scientist. The
problem of synonyms is obvious and very difficult to overcome. In
addition, as in the case of subject headings, generic searching is
laborious or impossible without complicated techniques. Coordinate
indexing seems to work very well in limited subject fields, par-
ticularly well disciplined scientific fields, but it presents some
seemingly insurmountable problems when applied to a large collection
covering all fields of knowledge, especially fields such as politics
and sociology, which include many abstract concepts.
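The mechanics of coordinate indexing, and the semantic difficulty noted above, can be shown in a minimal sketch. It assumes each document is posted under a set of single index terms and that retrieval is the intersection of the postings for the terms requested. The report identifiers and terms are invented for illustration.

```python
from collections import defaultdict

# Hypothetical reports and the index terms posted for each.
REPORTS = {
    "R-1": {"stability", "aircraft"},
    "R-2": {"stability", "government"},
    "R-3": {"stability", "compound"},
    "R-4": {"aircraft", "production"},
}


def build_postings(reports):
    """Map each index term to the set of reports posted under it."""
    postings = defaultdict(set)
    for report_id, terms in reports.items():
        for term in terms:
            postings[term].add(report_id)
    return postings


def coordinate(postings, *terms):
    """Reports posted under every one of the requested terms."""
    if not terms:
        return []
    sets = [postings.get(term, set()) for term in terms]
    return sorted(set.intersection(*sets))


postings = build_postings(REPORTS)
print(coordinate(postings, "stability", "aircraft"))  # coordination narrows the search
print(coordinate(postings, "stability"))              # one word, three unrelated fields
```

A search on the single term "stability" brings together aeronautical, political, and chemical reports alike. The term cannot tell these senses apart, and only coordination with a second term separates them.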
4. It has become very popular in the documentation field to criticize
classified indexes. They are said to be structurally complicated,
difficult to use, too rigid for easy incorporation of new subjects,
and not specific enough or too specific. It is also maintained that
they do not change fast enough and quickly become outdated. Many
of these criticisms are true when one of the standard classification
systems is applied to a document collection. The basic principles
of classification systems originally designed for books have been
used in the development of systems applicable to document indexing.
Any classified scheme has the one great advantage, when properly
indexed and cross-referenced, of gathering all the subjects in a
particular field together in a minimum number of places. It greatly
facilitates generic searching and it alerts the indexer to subjects
of index interest. When correctly constructed and indexed it need
not be overly complicated and difficult to use. When designed to
handle a particular documentation problem it need not suffer the
criticism applicable to the general classification schemes designed
for books. It would appear that the classified index and its
auxiliary tools are more applicable to the intelligence document
problem than the other index choices.
5. The Intelligence Subject Code, which has been in use in the Intellofax
System since 1948, is currently under revision for publication in
early 1960. The ISC has been criticized for having the following
weaknesses:
a. There is no guide on how to apply the ISC, and its structure
is difficult to understand without knowledge of the interpre-
tations placed on the various sections by CIA.
b. The repetition of the same commodities in several different
sections is confusing and unnecessary in light of developments
such as the subject modifiers.
c. The ISC is unbalanced in subject coverage. Important subjects
such as space travel and artificial satellites have limited
coverage, whereas an extensive section is allocated to plant
diseases on which there is little reporting.
d. Its index is unreliable and outdated.
e. It does not have enough cross references and explanation of
individual code meanings.
6. The revision attempts to overcome these weaknesses by:
a. Providing an introduction explaining the ISC's content and how
it should be applied.
b. Placing commodities, including military weapons and equipment,
in one chapter and assigning appropriate subject modifiers
(action codes) to distinguish the various actions affecting
commodities. Also, the three former separate chapters for
the armed forces are combined into one chapter with appropri-
ate subject modifiers.
c. Updating the subject content and deleting unnecessary subjects.
d. Providing a complete index prepared on IBM cards which can be
kept current.
e. Providing complete cross references, liberal scope notes, and
other annotations to aid the analyst and the reference
librarian.
7. The revision is no panacea, but it will overcome many of the ob-
jections to the present code. In most respects the revision does
not go into great subject depth, but greater depth can be added
as needed. With the addition of clear text, classification depth
is not as critical a problem as it is with the present Intellofax
System. The revised ISC should ensure much greater uniformity of
input and with the addition of other techniques discussed below
there should be much greater specificity of retrieval.
C. Subject Modifiers (Action Codes, End Use Codes)
It was found in early retrieval experience in both the Intellofax
System and Special Register that requesters were not interested
in all aspects of some subjects, e.g., commodities, but that they
wanted certain modifications or actions only, e.g., production,
export, etc. It was also found that in the commodity field, for
instance, these same actions were requested repeatedly. One
solution to this problem would have been to add these modifications
as subject subdivisions to all the subjects to which they applied.
This was impracticable because there were many modifications and
they applied to many subjects. Their addition as subject sub-
divisions would have increased the size of the code book tenfold.
These modifiers finally evolved as two or three digit action
codes which can be combined with various subjects as appropriate.
The subject modifier or action code as applied in OCR is a new
development but the idea itself is old. It is very similar to the
Universal Decimal Classification system of auxiliary tables and
bears some resemblance to faceted classification. Its use greatly
facilitates specificity of input and retrieval.
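The economy of combining action codes with subjects, rather than listing every action as a subdivision of every subject, can be illustrated with a minimal sketch. The subject codes, action codes, and the "/" separator below are invented for illustration and are not taken from the ISC.

```python
from itertools import product

# Hypothetical subject and action codes, invented for illustration only.
SUBJECTS = {"735.1": "aluminum", "735.2": "copper", "741.0": "locomotives"}
ACTIONS = {"10": "production", "20": "export", "30": "import"}


def combine(subject_code, action_code):
    """Join a subject code and an action code into one index entry."""
    if subject_code not in SUBJECTS or action_code not in ACTIONS:
        raise KeyError("unknown subject or action code")
    return f"{subject_code}/{action_code}"


def describe(entry):
    """Translate a combined entry back into words."""
    subject_code, action_code = entry.split("/")
    return f"{SUBJECTS[subject_code]} {ACTIONS[action_code]}"


# Code book size: one line per subject plus one line per action, versus
# one line for every subject-action pair if actions were subdivisions.
combined_size = len(SUBJECTS) + len(ACTIONS)
subdivided_size = len(list(product(SUBJECTS, ACTIONS)))
```

With many subjects and many modifiers the first figure grows as a sum and the second as a product, which is the reason given above for keeping the modifiers in a separate table.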
D. Area Codes
1. Intelligence analysts usually have an area responsibility in addition
to their subject responsibility. An overwhelming number of machine
run requests are for information on a specific country only and in
some cases on a subordinate area within a country. Area codes are
not difficult to construct aside from the problems of digital
limitations and whether the code should consist of numbers or
letters. The construction of the code should in general conform
to the area interests of the users, e.g., Middle East, South
East Asia, and should also be able to show limited geopolitical
concepts, e.g., Communist versus Non-Communist. In some systems
specificity is important enough to require an area code based
on longitude and latitude.
2. A primary consideration in area coding is one of depth. It is
obvious that the interest in Russia and China is so great that
these areas should be broken down to at least the oblast and
province levels. The need for fine area coding in other parts of
the world is not so obvious. There are occasional requests for
areas such as Lower Saxony in Germany, but it is questionable
whether the additional coding time involved in fine area coding
could be justified in view of the few requests received. Cities
also present a problem. It does not seem feasible to construct
an area code for cities, even one limited to Soviet and Chinese
cities, because there are no particular criteria for choosing those
cities considered important. A Soviet settlement consisting of
100 people becomes very important if a guided missile site is dis-
covered nearby.
3. An area code consisting of all the countries and other major areas
of the world, e.g., international waters, and the major political
subdivisions of Russia and China would seem to be adequate. In
addition, for the Intellofax System it seems necessary to be able
to code in clear text at least Soviet and Chinese cities and other
bloc or non-bloc cities deemed important.
4. One further consideration regarding area coding is file arrange-
ment. A completely reversible subject-area file is the most useful,
i.e., one file arranged by subject with area subdivision and a
duplicate file arranged by area with subject subdivisions. The
area file should include related (secondary) areas as well as
main (primary) areas. The above arrangement is desirable because
some searches are more easily accomplished through entry into the
area file and conversely some searches are feasible only through
the subject file. Very broad subject searches, e.g., everything
on science, for a particular country are almost impossible without
an area file approach.
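The reversible subject-area arrangement described in paragraph 4 can be sketched as two indexes built from the same entries. The document numbers, subjects, and area names are invented for illustration, and the choice to post related areas only in the area file is an assumption drawn from the wording above.

```python
from collections import defaultdict

# Hypothetical index entries:
# (document number, subject, primary area, related areas).
ENTRIES = [
    ("DOC-1", "science", "USSR", []),
    ("DOC-2", "aluminum", "USSR", ["China"]),
    ("DOC-3", "science", "China", []),
]


def build_files(entries):
    """Build a subject file with area subdivisions and its reverse."""
    by_subject = defaultdict(lambda: defaultdict(list))
    by_area = defaultdict(lambda: defaultdict(list))
    for doc, subject, primary, related in entries:
        by_subject[subject][primary].append(doc)
        # The area file carries related (secondary) areas as well.
        for area in [primary, *related]:
            by_area[area][subject].append(doc)
    return by_subject, by_area


by_subject, by_area = build_files(ENTRIES)
```

A broad search such as "everything on China" is a single lookup in `by_area`, whereas answering it from `by_subject` would mean visiting every subject in turn.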
E. Direction, Nationality and Reaction Codes
1. It is often necessary to show area relationships in order to ensure
specific retrieval. The indexing of information such as
export-import data is of little value unless both the origin and
destination of the shipment are shown. Area relationships are expressed
in the Intellofax System by the use of a two digit code called a
related area which can be selected by the IBM machines in
conjunction with the main area. The related area code is fairly
satisfactory, but it has taken on a variety of meanings. Usually
it is used to show the direction of an area relationship, but
it is also used to express nationality, e.g., French troops in
Morocco, and comments and reactions, e.g., Soviet reactions to a
U.S. nuclear bomb test. The multiple use of the related area has
led to the need for strict rules for its application, a variety
of memos to handle specific coding situations, and some retrieval
confusion.
2. The Minicard Coding Group has adopted a very simple device for
overcoming this problem similar to that used in SR. This was done
by entering a 1, 2, N, or R in the fourth position as an extension
of the three digit area code. "1" equals the concepts of sending,
from, whence, or source country. "2" equals the concepts of
receiving, whither, target, or destination. "N" and "R" stand for
concepts of nationality and reactions. This is a valuable coding
technique which should be included in any future indexing system.
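The fourth-position device can be sketched as follows. The indicator values 1, 2, N, and R are those given above; the three digit area codes are invented for illustration and are not the actual Area Code.

```python
# Fourth-position indicators as described in the text.
ROLE_MEANINGS = {
    "1": "source",
    "2": "destination",
    "N": "nationality",
    "R": "reaction",
}

# Hypothetical three digit area codes, invented for illustration only.
AREAS = {"100": "USSR", "250": "France", "410": "Morocco"}


def encode(area_code, role):
    """Extend a three digit area code with a role in the fourth position."""
    if area_code not in AREAS:
        raise ValueError(f"unknown area code: {area_code}")
    if role not in ROLE_MEANINGS:
        raise ValueError(f"unknown role indicator: {role}")
    return area_code + role


def decode(code):
    """Split a four character code into area name and role meaning."""
    if len(code) != 4:
        raise ValueError("expected three digits plus one indicator")
    return AREAS[code[:3]], ROLE_MEANINGS[code[3]]
```

Under these assumed codes, "French troops in Morocco" would carry Morocco as the main area and `encode("250", "N")` as the related area, so that nationality is no longer confused with direction.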
F. Clear Text Coding
Clear text coding is the entering of words, abbreviations, and
numbers into a machine system to give more specific meaning to
subject, area, and modifier codes. It is an auxiliary device which
allows for any degree of coding depth desired. It has been used
successfully by the MCG and SR and it would have been highly
desirable in the Intellofax System had there been space available on
the IBM card. It is presently being used in the Minicard experiment
to specify subjects for which there is no exact code, and to cite
the names of people, organizations, installations, and geographic
place names. Clear text coding is the ultimate key to the
specificity problem. Clear text and phrase coding (see below) used
with a classified index allows for the organizational and generic
values of classification plus the specificity advantages of
coordinate indexing. It is an essential auxiliary to an index such as
the ISC which for practical reasons cannot go into great depth on
all subjects.
G. Phrase Coding
Phrase coding is inverse coordinate coding. In phrase coding,
subjects, areas, modifiers, and clear text codes are all linked
together by logic on input to express an idea or phrase rather
than a single subject. The phrase can then be retrieved as a
unified idea. The main advantage of the phrase is that it prevents
the so-called false drop. If index terms were entered into a
Minicard type system (or IBM for that matter) without a phrase
linkage and retrieval involved a request for two different subjects
linked together in the same document, e.g., aluminum and aircraft,
false answers or false drops would occur since there would be a
number of documents which discussed aluminum and aircraft un-
related to each other. Phrase coding does not limit searching
since subjects can also be searched without regard to the phrase.
Phrase coding is an integral part of any computer type system and
there would seem to be some very real advantages to linking several
subjects together in an IBM punch card system in order to eliminate
false drops on coordinated searches. It might also simplify IBM
searches.
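The false drop and its prevention by phrase linkage can be shown in a minimal sketch. The documents and terms are invented for illustration; each document is represented as a list of phrases, and each phrase as a set of terms linked on input.

```python
# Hypothetical documents; each inner set is one phrase linked on input.
DOCUMENTS = {
    "DOC-1": [{"aluminum", "aircraft", "production"}],
    "DOC-2": [{"aluminum", "export"}, {"aircraft", "order of battle"}],
}


def search_unlinked(terms):
    """Coordinate search that ignores phrase linkage (may false-drop)."""
    wanted = set(terms)
    return sorted(
        doc for doc, phrases in DOCUMENTS.items()
        if wanted <= set().union(*phrases)
    )


def search_by_phrase(terms):
    """Search requiring all terms to occur within one linked phrase."""
    wanted = set(terms)
    return sorted(
        doc for doc, phrases in DOCUMENTS.items()
        if any(wanted <= phrase for phrase in phrases)
    )
```

Here DOC-2 mentions aluminum and aircraft in unrelated phrases, so the unlinked search returns it as a false drop while the phrase search does not. The unlinked search remains available, which matches the statement above that phrase coding does not limit searching.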
H. Coding Dictionary
1. The coding dictionary should contain those aids to the
classification scheme necessary or valuable to the coding operation.
It need not be bound in one volume.
2. It has often been stated that the key to uniformity of classified
indexing input is the alphabetical index to the classification
scheme. Classified indexes by their nature must place similar
subjects in more than one place in the classification scheme in
order to maintain the classification of knowledge pattern, e.g.,
locomotive production would normally not fall in the same subject
series as locomotive engineering. The alphabetical index to the
classification scheme points up these distinctions or at a minimum
gives the various places in the classified index where locomotives
are indexed.
3. No classified index can specifically include all of the subject
matter which it must index, but it generally has subject categories
broad enough to blanket almost any subject which may be reported,
e.g., the index may contain pharmaceuticals, but no specific types.
Specific types would be entered under the broad subject pharmaceuticals.
When subjects such as specific types of pharmaceuticals are identified
and their place in the classification scheme is located, an entry
should be made in the index to the classification scheme so that
when indexers encounter the same specific pharmaceutical in future
reports, they can easily determine the previous decision and ensure
uniformity of input.
4. There is also a third type of entry which should be included in the
coding dictionary; namely, coding rules applicable to specific
happenings, e.g., Berlin crisis. Generally a coding pattern has
to be established to handle these situations. This coding pattern
may consist of several subjects and areas. One way of informing
the coding group of these coding patterns is to circulate a memo
and establish a central authority card file. This has been the
Intellofax practice. A superior method is to include these decisions
in the coding dictionary with the other index entries, thereby
lessening the number of places the indexer must search.
5. For those indexing operations which have a clear text entry capability,
the coding dictionary should also contain the form and authority
for the clear text entry. Uniformity of clear text input is
vital, and therefore has to be rigidly controlled.
6. The coding dictionary should contain then the index to specific
entries in the classified index, the index to entries which do not
specifically appear in the classified index, clear text entry
authority, and other valuable coding rules or aids. The Special
Register and the Minicard Code Group have both proved that the
coding dictionary can be efficiently managed on IBM cards and
issued in the form of an IBM printout. IBM cards can be added or
deleted from the authority as necessary and printouts can be easily
obtained.
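The coding dictionary as a deck of cards that can be added to, deleted from, and printed out can be sketched as below. The card fields, the entry kinds, and all terms and codes are assumptions made for illustration; they do not reproduce the Special Register or MCG card layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryCard:
    term: str       # word or name as the indexer meets it
    code: str       # hypothetical place in the classification scheme
    kind: str       # "index", "decision", "rule", or "clear_text"
    note: str = ""  # scope note or coding instruction


class CodingDictionary:
    def __init__(self):
        self._cards = {}

    def add(self, card):
        self._cards.setdefault(card.term.lower(), []).append(card)

    def delete(self, term, code):
        key = term.lower()
        kept = [c for c in self._cards.get(key, []) if c.code != code]
        if kept:
            self._cards[key] = kept
        else:
            self._cards.pop(key, None)

    def lookup(self, term):
        """Return every place a term is indexed, to keep input uniform."""
        return list(self._cards.get(term.lower(), []))

    def printout(self):
        """Alphabetical listing, one line per card."""
        return [
            f"{c.term:<24}{c.code:<10}{c.kind}"
            for key in sorted(self._cards)
            for c in self._cards[key]
        ]


dictionary = CodingDictionary()
dictionary.add(DictionaryCard("locomotives", "741.0", "index", "production"))
dictionary.add(DictionaryCard("locomotives", "652.3", "index", "engineering"))
dictionary.add(DictionaryCard("penicillin", "780.0", "decision",
                              "entered under pharmaceuticals"))
```

A lookup on "locomotives" returns both places in the scheme, which is the distinction the alphabetical index is meant to point up, and a recorded decision such as the one for "penicillin" lets later indexers repeat an earlier choice.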
I. Operating Procedures
1. Growth of the classification scheme - There are three main sources
of suggestions for subject addition to the classification scheme:
a. Suggestions for code additions arise from the document analysts
who feel that certain subjects are not represented or that
reporting on certain topics is so voluminous that further
subject breakdown is needed.
b. When reference traffic indicates that certain subjects are
difficult to search, consideration should be given to subject
additions or changes to ease the search problem.
c. Research analysts may feel that their interests are not fully
represented and that further subject breakdown or rearrangement
is desirable.
Any logical and necessary subjects should be added to the classifi-
cation scheme after due regard has been given to the possibility of
using clear text in place of subject additions. Subject expansion,
however, requires the strictest management to ensure that the sub-
ject does not already exist in the classification scheme in a
different form, that the subject expansion requested is not more
extensive than required, and also to ensure that when an addition
is made, it is placed in the proper place in the classification
scheme. Great caution should be exercised in considering the
expansion needs of research analysts. It should be ensured that there
is actual reporting on the requested expansion. Also it is
extremely important that the expansion not be too technical.
Otherwise it may be beyond the comprehension of the average document
analyst.
2. General Coding Procedures - Aside from specific coding rules, there
are a number of general procedures on which the document analyst
should be instructed. These procedures include such things as
depth of coding for certain types of documents, a list of types
of documents which should not be indexed, how to prepare abstracts
and title expansions, how to fill in a code sheet, etc. The
analyst should not be expected to retain all of these procedures
in his head, rather he should be given a separate coding manual
incorporating these procedures. Supplements should be issued
as needed in form suitable for filing in the manual.
3. Informing the Document Analyst - In order to keep the analyst informed
so that he can do a better job of subject analysis, there should be
available to him a number of easily understood classified and un-
classified reference works on difficult subject fields. If the
reporting contains many abbreviations, an abbreviations file
should be established (the Intellofax System has had such a file
since 1950). Briefings by staff and non-staff members should be
arranged to clarify the subject content of the code book. Any-
thing which keeps the analyst better informed improves the accuracy
of the input and makes the analyst's job more interesting.
4. Review - It is desirable to have total review of each analyst's
work in order to assure quality and uniformity of input. The
reviewers should of course have unquestioned coding competence.
If 100 per cent review is unrealistic, there should be a definite
program of review. The analyst should be made to feel that the
purpose of the review is to better the input rather than to maintain
a constant check on him.
J. Selection Problems
1. Inclusion of the tools and techniques discussed above will ensure
a high degree of input uniformity and retrieval specificity. There
is one area of intelligence indexing, however, that bears heavily
on these problems and on which there are no guidelines. What do
you index and how much do you index? The managers of the Intellofax
System determined that there were certain types of documents which
had little intelligence value, or did not fit into the indexing
system, e.g., fragmentary order of battle and State Department
housekeeping reporting. These documents fall within a nodex or
"no index" category. These nodexes which are not entered in the
indexing system are, however, disseminated. This nodex category
has been extended to the point where it is occasionally criticized
by research analysts, yet we can get little guidance from using
offices as to what we should index.
2. There are certain reports which are considered very important by
research analysts which are not included in the Intellofax System,
for example, FDD Summaries and me Daily Reports. For a number of
reasons these reports do not easily lend themselves to Intellofaxing.
However, should not perhaps more attention be given to
incorporating these reports into the system and less attention to reports
of marginal value? This question, of course, raises again the basic
question: How can OCR determine whether a report is of marginal
value?
3. Summary reports are another problem. Should finished intelligence
be indexed in great depth or indexed only very broadly? The
problem of indexing depth for intelligence summaries arises
constantly and decisions are often made on the basis of the work
involved rather than that of the value of the document.
4. On various coding uniformity tests there is usually agreement
as to the central theme codes which should be assigned, but there
is wide disagreement as to how much of the peripheral information
should be indexed. Every indexer develops his own patterns
which he can justify but which do not arise from any specific
direction from the user.
5. Greater participation by the user in the form of briefings and
actual selection of materials for indexing would improve feedback.
Until this interaction between the indexer and user is achieved,
the system cannot reach its full reliability and effectiveness.
Panel II
DISCUSSION
QUESTION: Is the Area Code being revised?
and I were members of the CODIB Working Group
which developed a new Area Code. This code will be issued as a part
of the Revised ISC and will carry a 4 digit numeric notation as well
as a 6 character alphabetic notation.
QUESTION: Will dictionary building affect the organization of coding
activity and/or the distribution of documents within the activity?
Panel members were in agreement that a "Dictionary Building" activity
does not necessitate reorganization or affect appreciably the
distribution or flow of documents. Recognition of the need for a
dictionary entry and the recording of a term or concept according to a
pre-planned format rests with the desk analyst. The dictionary card,
together with the document on which it has been based, must then be
routed to the Review Officer, who is responsible for standardization
of all entries and for keeping updated listings on the desks of all
Classification Analysts.
QUESTION: Explain the term "Minicard".
"Minicard" is a means of storage and retrieval of information in
binary form on film. OCR has an experimental set of
equipment in the 3rd Wing of M Building. The Minicard Coding Group
(MCG) has been working in support ?of this experimental group of
machinery for almost a year. A decision to use Minicard equipment
involves the expenditure of 1.5 million dollars and therefore must be
based on all evidence that can be accumulated as to the ease of input
and retrieval, file organization, dependability of equipment, mechanical
problems, etc. OCR has already learned enough to more than justify its
experimental group and should the equipment itself be rejected the
system of classification as used by the MCG in the coding of the corpus
could be used with other equipment.
Panel III
SUPPLEMENTS TO THE MAIN CLASSIFIED FILE
Panel Members:
(spokesman)
This paper deals with the classification philosophy underlying the
existence of information systems which supplement the main classified
file in OCR. Some might prefer to call these supplemental systems "special
collections," "auxiliary files," or "special libraries." Whatever
expression we use, we are all aware that they exist (the more obvious examples
being the Industrial, Graphics, Biographic or Special Registers), although
we may never have thought too much about the "why" of their existence.
In the following paragraphs, I hope to outline some of the reasons why
these files exist as entities separate and distinct from the main file,
as well as some of the issues which relate to their maintenance or
management.
It might be advisable to point out, first of all, that most files or
information collections in the intelligence community are supplements to
other files in one sense or another. This Agency's Office of Central
Reference and all the Registers contained therein might be considered a
special file created to serve the special needs of this Agency. Similarly,
the RI file might be regarded as a supplemental file to OCR, specializing
as it does in data of counter-intelligence significance.
At the other extreme are the files that an analyst, section, or
branch might keep for whatever purposes they have in mind. There are
countless such supplemental files. Almost every office has one, and new
ones are born every day. The Document Division, for example, has an
Abbreviation File which is maintained for the obvious purpose of
identifying abbreviations found in documents or used in abstracting documents.
The Library used to maintain a finished intelligence file which indexed
by area and subject the intelligence reports of a finished and evaluated
nature. It also has a bibliographic card file on the publications and
speeches of Marx, Lenin and Stalin. ORR's Geography Division has a file
on the Kurdish problem; the Industrial Register has a special file
dealing with certain trip reports; while Graphic Register has a file which
controls films containing information on the tradecraft of intelligence,
as well as a file in which photographs of naval vessels are controlled by
class of vessel rather than by the location there. My own Register has
many special files which supplement our main file system. Logically, we
cannot exclude any of these files, small though they may be, from being
designated "supplements to the main file," for the size of the file has
nothing to do with it. A supplement is simply, as the dictionary defines
it, something which supplies a want, fills the deficiencies of, or makes an
addition to something already organized or set apart.
If there is agreement then that these supplemental files include a
wide variety of files both great and small, the next question that might
be asked is, "Why are these files separate from the main collection?"
Why do we have a Biographic Register, an Industrial Register, a
Historical Intelligence Collection, or an analyst file on Communist front
organizations? Why aren't they part and parcel of the main collection
where these and other kinds of data could be analyzed, stored, and retrieved
in one centralized operation?
If we reflect for a moment on the world outside, we become aware of
the obvious parallels between our auxiliary collections and the
information libraries maintained by industrial concerns, the special collections
within general libraries, libraries devoted to a single subject (such as
the Folger Library on Shakespeare), and so on. In 1953 there were some
2,489 special libraries in the United States, covering about every
subject field.
A number of these libraries developed because of caprice. Perhaps
a wealthy benefactor wanted to bring together in one place all the books
with a certain kind of binding, and supplied the funds to see that it was
done. No doubt some special intelligence collections have been founded
at least in part because of the caprice of a high-level official (or even
a medium-level analyst), and either should never have been created at all,
or at least have long since fulfilled what useful purpose they once had.
But caprice does not fully explain the phenomenal growth of specialized
libraries in the outside world, nor is it an acceptable explanation for
the creation of most auxiliary collections in the field of intelligence.
The reason that is most often given for this extraordinary develop-
ment is the tremendous increase in recorded knowledge. Until a few
centuries ago, information control problems, as we know them today, were
entirely unknown. Few books were written, and these could be easily
classified in one or more of the few then recognized sciences or fields
of human endeavor. The industrial revolution, however, created a new
body of knowledge entirely different in nature and much larger than all
preceding knowledge. This knowledge was not only published in many forms
other than books, but it also contained bits of information related to or
of potential importance to many different subjects.
We in the intelligence reference business would certainly admit that
the sheer size and diversity of application of the information in our
collections has been a factor contributing to the development of
supplements to our main file. But there is another factor also which has
contributed to this development. The business of digging out information
has become so involved and time consuming that a librarian can no longer
remain a mere "custodian of knowledge," as Webster once defined him. He
can no longer merely collect and guard data. He is asked to assess it and
often called upon to summarize it. In brief, he is asked to file information
rather than material and this has necessitated the introduction of special
filing and retrieval techniques.
We should also not forget the problems of the physical character of
our documentary materials. No one as yet has found one acceptable solution
for cataloging books, journals, maps, photographs, films, and so on -- or
for filing them. Each medium raises problems different from the other, as
evidenced by the growth of special manuscript collections, photo libraries,
etc.
Yet even if we admit all these facts to be true, it still does not
entirely explain why certain files have been separated from the main system
and not others. And it is even more perplexing when one considers such
auxiliary files as the Biographic or Industrial Register, where the physical
nature of the material does not differ greatly from what is filed in the
central document collection. Why can't these registers, who read many of
the same documents received and classified in the Document Division, become
a part of the main file?
I would say that the real reason is none of those that have been
cited -- neither the growth in recorded knowledge, nor the increasing
demand for information service, nor the physical nature of the material
itself -- though all no doubt play their part. What actually causes a
special or auxiliary file to be established -- I think we all would agree --
is consumer demand. But it is more complex than that, for consumers demand
many things but they do not all occasion the creation of a special file or
register. If one attempted to embrace all the complex factors that enter
into it in a single sentence or formula, it might read as follows: Given
the present state of information storage and retrieval theory, the magnitude
of the requirement for establishing an auxiliary file system is a function
of such variables as the size, nature, and organization of the collection,
the scope of requests, and the comprehensiveness and form of the answers
to be provided.
Note that I have qualified this statement with the words, "Given the
present state of information storage and retrieval theory." It would, of
course, be most desirable to have one central information system where all
could go to get the data they wanted. It would eliminate the very real
danger of duplication of effort, overcome the problem of attempting to
define mutually exclusive subject areas, and achieve greater efficiency.
But the science (or art) of indexing and machine filing and manipulation
of data has not advanced to the point where such centralization is possible.
Conceivably, the day might come when we will have one universal classification
system applicable to all kinds of data of whatever depth, and some mechanical
storage device into which all types of graphic materials could be filed.
With such a development, we would theoretically not have to worry about the
size or nature of our special materials, nor about the kind of information
we would be expected to provide. But that day is not here, and even if it
were, it would not dispense with the need for specialization on the part of
the classifier or information officer. Some human analysis and judgment will
probably still be required, both in classification and in retrieval, even if
auto-abstracting and similar tools become available. If so, then it would be
immaterial whether the entire information system and all its personnel could
be housed under one roof, all the data classified by one index system, and
all stored in one machine -- for there would still be a need for industrial
analysts, if not an Industrial Register, for graphic analysts, if not a
Graphics Register, and so on.
I have said that consumer demand is the most important factor affecting
the establishment of an auxiliary file system, because it seems self-evident
that we do not (or should not) store information for the sake of storage
alone. Even a public library, which is not an information system in our
sense, tends to reflect the interests of the community in which it is located.
It is true that we may collect and store material in which there is no
immediate interest, but we will certainly not index it to any great extent
nor separate it from our main collection. Whether we will ever do so depends
first of all on consumer demand, and secondly on a combination of certain
other factors which I inserted in the formula offered above.
One of the items referred to was the scope of requests. And by scope
I mean depth as well as breadth. Let us take, as an example, a document
dealing with the Ukrainian Academy of Sciences which might be received by
our central collection. Presumably the document would be abstracted for
the Intellofax System and classified under the Intelligence Subject Code
by such subjects as history, the Ukraine, science, and so on. If this kind
of general classification satisfies the users of the main file, if they are
only rarely interested in obtaining information about particular institutes
within the various Soviet academies, then it would be foolish to index such
a document in greater detail, much less to divert it from the main file to
some auxiliary collection. Moreover, it would not be necessary for the
classifier to have any special knowledge of the subject in order to catalog
the document adequately.
Let us suppose, however, that the information system frequently
receives requests on various matters related to Soviet science, including
the general organization of scientific research in the Soviet Union. If
the main file's classification system is too cumbersome or too general to
enable one to locate documents dealing with this subject quickly and easily,
a small desk file or background folder will be set up to make the information
easier to locate. We now have the beginnings of an auxiliary collection
which might easily expand into a large supplementary file operation -- an
"Organization Register" -- employing dozens of specialists. It would all
depend on how many customers required this kind of data, how much informa-
tion was received, whether the depth of requesters' interest was such that
they might require information on research conducted in a specific Ukrainian
laboratory, whether the information specialists would have to learn Ukrainian
or some other foreign language to perform their job properly, and what form
of response would be required. I have used the example of organizational
information because it is a subject which has, in fact, become of such interest
to the intelligence community, that within the USSR Section of BR we have
created an organizational file which is auxiliary to our main biographic
file, which in turn supplements the central Intellofax system. We have
persons who specialize in this kind of data, and they have even gone so far
as to write organizational studies for the NIS and other intelligence
programs.
I am told that somewhere there exists a file on the agents of a certain
intelligence service. These agents habitually use aliases which may consist
simply of a given name -- such as "Frank" -- which they often change. It is
important that information obtained on these individuals be collected and
filed for counter-intelligence purposes. But how can one classify and
retrieve the pertinent data on one of these persons if he is continually
reported under different or incomplete names?
The method employed in this case is to classify by physical charac-
teristics -- by scars, birthmarks, moles, and other distinctive features of
a person's physiognomy. Presumably, material on a certain "Henri" with a
scar across his nose would be filed with information on a "Gustav" who is
said to have a similar scar. Can anyone imagine such data being mixed in
with our central document collection with any hope of retrieval? Think
what it would do to the Intelligence Subject Code. Think too of the poor
classifier who should have to leap from reflecting on how to code
intra-bloc fiscal policies to those persons he has indexed as having scars
on their faces.
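The filing method just described, in which reports are grouped by
distinctive features rather than by name, can be sketched in modern terms
as an inverted index. The following is only a minimal illustration: the
report records, the feature wording, and the function names are all
assumptions made for the example and do not describe the actual file.

```python
from collections import defaultdict

# Hypothetical reports; aliases and features are illustrative only.
reports = [
    {"id": 1, "alias": "Henri", "features": {"scar across nose"}},
    {"id": 2, "alias": "Gustav", "features": {"scar across nose", "mole on chin"}},
    {"id": 3, "alias": "Frank", "features": {"birthmark on neck"}},
]


def build_feature_index(reports):
    """Map each physical characteristic to the ids of reports mentioning it."""
    index = defaultdict(set)
    for report in reports:
        for feature in report["features"]:
            index[feature].add(report["id"])
    return index


def candidates_for(report, index):
    """Return ids of other reports sharing at least one feature with `report`."""
    matches = set()
    for feature in report["features"]:
        matches |= index[feature]
    matches.discard(report["id"])
    return matches


index = build_feature_index(reports)
# Reports that may concern the same person as the "Henri" report.
print(candidates_for(reports[0], index))
```

Under these assumptions the alias plays no part in retrieval at all, which
is the point of the method: a changed name does not scatter the material.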
Having examined some of the reasons for the existence of collections
which supplement the main file, let us now consider some of the problems
we encounter in their maintenance.
One of the inevitable consequences of a compartmentalized information
system is duplication in some sense between the auxiliary files and the
main file, and among the auxiliary files themselves. This invariably
disturbs management. Duplication connotes waste, and waste must be
eliminated. An effort is often made to centralize the various information
processing and retrieval operations.
Recently, I and two other OCR representatives were invited to study
the information handling activities carried on by the various branches of
a division in another CIA office, with a view to the possible
centralization and standardization of these activities. Each of the
branches of this particular division had certain specialized substantive
interests, but all were concerned with the same general subject. Each
branch was coding and classifying material that flowed into the division
in a way that it felt best. None of the classification systems were the
same, and there was no single place where a person could go to get all the
data on a given subject. Naturally there was a certain amount of
duplication among these specialized file systems, and pressure was being
exerted for uniformity of processing, if not centralization.
In the course of this investigation I had occasion to visit another
division of this same office. Here the situation was exactly the reverse of
the one we were studying. In order to avoid duplication and the other ills
of decentralization, the responsible officials of this division had,
sometime in the past, concentrated their information activities in one
branch. I learned, however, from one of the information specialists in this
branch, that this trend was actually being reversed. Intelligence officers
in the other branches were beginning to compile their own files again in
their own
ways, and it had reached the point where one could no longer rely on the
central information system.
It may be that this kind of problem could be avoided if there was a
better understanding of what we mean by duplication. Duplication in what
sense? When is it permissible and when is it not?
All of us in the reference business, I am sure, can sense when duplication
is good and when it is bad. I feel certain that my colleagues in IR,
BR, and FDD would all agree that much of our work on organizations and
institutes is duplication, and it is bad. Why? Not simply because we are
processing the exact same data out of the exact same documents. But because
we are all processing and storing in anticipation of certain needs of our
customers -- in anticipation of present or future retrieval requirements --
which in this instance happen to be identical. The same cannot be said of
IR's and GR's coding and storage of industrial photos. Although the photos
are the same, each division feels that its retrieval requirements differ.
There is duplication too between the main classified file and some of
the registers in that they are coding the same data. But it has been said,
and I think rightly so, that neither activity can substitute for the other.
The central system must supply the necessary generality in indexing so that
it can handle intangible or abstract subjects, without delimiting file
categories to a degree that might hamper future searching from a new approach.
Since it is these abstract subjects which are most susceptible to change,
reflecting as they do the thought patterns of the time and a particular
researcher, they must not be coded in any great detail lest they be unable
to generate the answers to new problems.
Let us imagine for a moment that the Library was told to stop answering
requests having to do with Soviet scientists since this is a duplication of
work done in BR. A requester might then approach BR for information on the
number of Soviet physicists who received the doctorate degree in 1958. BR,
like any specialized information system, attempts to exploit all sources
of information which have any bearing on its reference mission, and to index
such data to the greatest depth required.
It is conceivable that BR could answer this request by the laborious
process of first selecting out all the Russian physicists in its files,
and then reviewing these dossiers to see which physicists held the doctorate
degree, and, of those, the number that received their degrees in 1958. But
a reference facility could have answered this kind of general question much
more easily since, unlike a specialized file, it is not imprisoned by the
detail of its own classification system. The central system, like BR, may
have received information pertaining to the training of physicists in Russia,
but instead of indexing names that might have appeared therein, it would
have cataloged the material under "Science Education -- Russia," or some
such subject.
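The contrast between the two routes to an answer can be illustrated with a
short sketch. It is offered only as an aid to the argument: the dossier
fields, the sample records, and the document identifiers below are
hypothetical, and neither BR's files nor the central catalog is organized
as a Python structure.

```python
# Hypothetical dossiers held by a specialized (name-indexed) file.
dossiers = [
    {"name": "A", "field": "physics", "degree": "doctorate", "year": 1958},
    {"name": "B", "field": "physics", "degree": "doctorate", "year": 1956},
    {"name": "C", "field": "chemistry", "degree": "doctorate", "year": 1958},
    {"name": "D", "field": "physics", "degree": "candidate", "year": 1958},
]

# Hypothetical central catalog: broad subject heading -> documents filed there.
subject_catalog = {
    "Science Education -- Russia": ["report-17", "report-42"],
}


def count_by_dossier_scan(dossiers, field, degree, year):
    """The laborious route: narrow the dossiers step by step, then count."""
    in_field = [d for d in dossiers if d["field"] == field]
    with_degree = [d for d in in_field if d["degree"] == degree]
    return sum(1 for d in with_degree if d["year"] == year)


def documents_by_subject(catalog, heading):
    """The general route: go straight to the material under a broad heading."""
    return catalog.get(heading, [])


print(count_by_dossier_scan(dossiers, "physics", "doctorate", 1958))
print(documents_by_subject(subject_catalog, "Science Education -- Russia"))
```

The first function must touch every dossier in the file; the second is a
single look-up, but only because the material was cataloged under a
general heading when it arrived.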
In addition to this problem of duplication, there is also the issue
of whether the classification problems of the subsystem differ in character
from those of the main file or are any less difficult to resolve.
It has been said that it is the degree of diffuseness of information
that is the heart of the classification problem. If this is true, then
it seems logical to carry the reasoning further, as some do, and state
that where an information collection is assembled for special purposes the
problem becomes less severe, since the indexing need cover only a fraction
of the full potential of the information.
Supporting this line of reasoning is the argument which some of our
own people used a few years ago in replying to the criticisms of the
Library Consultants. Their reply emphasized that it is much easier to
classify specific named objects, such as people, plants, geographic
place-names, and so on, than to classify abstract subjects. For the
classification of named objects, they said, lends itself to specificity,
detail, and relative stability when compared with abstract or intangible
subjects.
At first thought, this view appears to make sense. BR's classification
problems, for example, seem fairly simple -- its business is people, and
people are specific enough. As the poet said, when asked to explain geography
and biography:
"Geography is about maps,
Biography is about chaps."
Nevertheless, we who have been concerned with the maintenance of the
special collections have found that it is not quite that simple. That, in
fact, as time goes on, every subfile of the main file system ultimately
encounters most of the same classification problems which cause such
difficulty for classifiers associated with the main file.
The reasons for this are not hard to determine. No respectable specialized
collection which deals with named objects would be content to index by these
named objects alone. If they did so they would soon be servicing only a
fraction of their potential customers. In truth, the user of the information
system would like to have every item of information, named object or
otherwise, indexed by every conceivable category into which it may fall. This,
of course, is impossible, since we are limited not only by cost considerations,
but also by our incomplete mastery of the science of information storage and
retrieval. But we do make some effort in this direction and it inevitably
leads us into the indistinct world of abstract ideas and patterns of thought.
For soon we are not simply indexing the name of the plant or the person, but
the economic, scientific, or social scientific subjects with which these
named objects are connected. We are not merely indexing the name "Dmitriy
Yemelyanov," we are also saying that we think he is a cyberneticist and that
perhaps he should be connected with the subject of aid to under-developed
countries. This is one of the reasons why we have "Snag Files," and why
we develop complex hierarchical coding systems which look very similar to
the main collection's Intelligence Subject Code -- although differing in
content since they have been formulated to meet our own peculiar
requirements. It also explains why one large specialized collection which
at one
time attempted to expand beyond mere name-index control of its material,
found that its subject files had become catchall repositories, and has now
concluded that the semantic problem is too great to overcome.
Another question that has disturbed some of us is whether there are
any logical limits to the number of items of information that should be
classified by the auxiliary file system. Must we index our primary
subjects -- whether they be plants, people, photos, or other -- by every
known fact which can be applied to them?
One of the reasons for indexing in detail is, of course, to enable
one to find a specific piece of information quickly in a large mass of
material. Of course, this does not mean that you have to index everything
in your file system in order to find what you want. There is, however,
a second advantage to the kind of detailed coding and indexing done by an
auxiliary file system, and that is that it enables data to be synthesized
at a later date in such a way that it may reveal items of intelligence
information that might otherwise never have been discovered.
To put the matter in another way -- one of the best reasons for an
auxiliary file to index in great detail is that it permits statistical
analysis of a whole range of intelligence problems. And on occasion this
kind of analysis will lead to significant intelligence breakthroughs.
Undoubtedly we will see this kind of technique being used even more in
the future, especially as we acquire faster and better machines to do
the job. But while statistical analysis does require a vast body of
detail to work on, this does not mean that we must index all incoming
information by every classification category possible. Detailed classifi-
cation is justified only when there is sufficient data to have statistical
significance and when there is likelihood that there will be inquiries
that can be answered by conclusions from this data.
This may seem to be an obvious point, but it is one which we tend
to forget. Too often, in our zeal to satisfy the wishes of our consumers,
we begin to classify what they want (or what we think they may want) even
though we will never have enough data in these subject categories from which
any significant conclusions can be drawn. Whether the subject is license
plate information, the age of factory buildings, or the domestic travel
of Soviet nationals, the classifier has the responsibility to decide, on
the basis of his intimate knowledge of the materials with which he deals,
whether indexing would be worthwhile. It is in this way, as in others,
that he fulfills his true role as an information specialist.
In summary, this paper has argued: that when we talk about supplements
to the main classified file we must include in our thinking any file that
fills the deficiencies of another file; that we have these files because
specialized research requires specialized information; that they are born
of consumer demand and shaped by its needs; that file duplication is to be
condemned only when the retrieval objectives are identical; that while an
auxiliary file may begin with a narrow field of operations, in the attempt
to win complete mastery of its information content it meets the same
classification obstacles as the general file; and that the application of
index controls to data must always be governed by the quality and quantity
of material coming in, and by the good judgment of the classifier.
Panel III
DISCUSSION
Do the members of the Panel desire to add to what the speaker
said with respect to supplementary files?
One subject which I didn't cover with much detail is the snag file.
Do you want to comment on that?
I propose that a Snag File may be understood to control
selected information under criteria which are not adequately
served by formal files already in the environment and which continue
to conflict with the criteria of the formal files. To put this a
little more simply, if you don't like the system, don't resign -- start
a snag file.
Snag files are the competing rudiments of future specialized
registers. Everybody, who is conscious of a problem which irritates
him in his duties and which is not adequately controlled by the
apparatus already available, has the responsibility (because of
his awareness) to begin doing something about it. The physical
equipment with which he records his evidence on his chosen subject
is his snag file. I regard most of the information files, desk
files and specialized accumulations of records in the possession
of analysts throughout the intelligence community as snag files,
supplementing (in a regretfully discoordinated way) the main file.
We need a much better organized means of feeding back the quality
that these files possess into the main file. For example, there
are a great many files on the subject of organizations, in a great
many different hands. An "Organization Register" could be supported
with an immense amount of data, if it were assembled in one place.
Referring to the subject under which we assembled: the
philosophy of classification, let us not forget that philosophy
involves controversy. It is not usual for philosophers to agree.
And we may all claim the right to maintain our own point of view,
and document it and support it by building our own snag files
(to the extent that we can get the energy and means to do so)
because compromises which defeat our own point of view are not
necessarily losses for all time. The occasion which justifies
our special point of view may be coming. Of course, there is
survival of the fittest in this game. Not everybody always
wins.
The difference between the main, central file and all the
many little snag files, at two polar opposites of the information
control activity, may be summarized:
In the main file, everything received is thrust
into some category or other. It amounts to an array
of 5,000 or 15,000 boxes. A requester is presented
with a selection of boxes in which he must rummage
among the nuts and bolts to pick out what he wants.
He is participating in a stage of the information
control process because he has nobody else to do
it for him.
In his snag file his selections have already
been made.
We must remember that the indexes we build, whether snag files
or central files, are accumulations of our perceptions. We could
repeat the effort two years hence, or we could have the indexing
done by a dozen people (the White Stork method) and come up with a
wide range of perceptions. But this process is never finished
because perceiving is never finished. Variety of perceptions
provides opportunity for snag files.
Don't you think these snag files are actually files that
are closer to the consumer, files that meet his needs in the most
immediate way, because the snag filer knows what the consumer really
wants? He is not guided by official missions or descriptions of
missions. He talks to consumers all the time and he recognizes that
they want a certain kind of control set up apart from the main file,
or a supplement to the main file. He is really providing a more
intelligent service than that available from the main file.
Yes, I agree. The snag files you are describing are
the ones which accumulate around input desks. I've been emphasizing
that they are not the only snag files. Around input desks we can
have maverick files -- that is, files that fail to conform to team
requirements. When thirty or forty indexers perform a major input
program as a team they can't behave like thirty or forty mavericks.
The freedom to be a maverick is available only in the snag file.
But I propose not to overlook the snag files in the user community.
A sufficiently active dissatisfaction with the detailed service
that he can get from the main file is the proper license of every
user, if it is within his means, to start a file which goes some
way toward remedying his problem.
I might ask Mary this question: do you think there should
be more special collections rather than less? Should more of the
named objects be separated from the main file and made the subject of
a supplemental register?
Well, as you said, such files would have to be born of
consumer demand and shaped by its needs. I think if we have one
bond in common here today it is our problem with the consumer,
trying to ascertain what he wants and what we can do to give him better
service. So we might well have a separate register for organizations.
As brought out in the discussion, the coverage of organizations is
scattered and there is different emphasis in retrieving information
on organizations. So we'd have to ask the consumer three questions:
(1) We'd want to know how great is his need. Does the unequal
depth of coverage cause any great problem to him?
(2) We should want to know what, exactly, he hopes to find in
an "Organizations Register." Just what information does he hope to
find? Because it would be possible to just give him back specific
organizations but very often we find that the consumer is using an
organization as an approach to another type of information. For
instance, in the USSR he may be tracing the changes in subordi-
nation of organizations. He may find that a plant has changed in
subordination from one ministry to another, that this indicates a
change in production, and change in emphasis of a whole industry
or an armament program. Or he may be interested in approaching
a particular subject. There may be many cards in an IBM index
of a certain subject and it might be much simpler to take an
organization, or a few organizations, involved in this subject,
if that's what he really wants to get. Or the consumer may be
interested in finding out the names of particular persons associ-
ated with particular industries, plants or organizations. If
we had an organization file we would want to emphasize the thing
the consumer hopes to find by placing a request with us.
(3) Then we'd want to know in what form he hoped to get this
information. Does he want documents, IBM listings, or a synthesis
of data? Or a combination of all three? The answers to these
questions would guide the operation of the register. A synthesis
of data could be obtained as a by-product of the classification
system. Particularly if you use a punched card system, you could
determine which factors you were interested in, what you wanted
to know about a particular organization. As this information
became available in indexing on a daily basis, it could be re-
corded, kept up to date, corrected and changed, and the infor-
mation could be arranged. Now, such a synthesis of data would
speed up service to the consumer because you could arrange
information, or, by checking your file, you could establish
whether you had ever seen such information. (There would be no
point in searching through 1,500 documents to find the street
address for a particular factory if you had controlled that
street address in your information file.) Arranging the infor-
mation file in several different ways might suggest further
channels for investigation. It would help achieve consistency
in classification to have the information file on hand for
your classification analysts as well as for the consumer. It
would help to identify vague references, and pull them together.
I think we could, or the consumer could use, an organizations
register. But, of course, it would be up to him in the final analysis.
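The kind of information file described in these remarks, one card per
organization, corrected as new data arrives and arranged in whatever
order a question requires, might be sketched as follows. This is a
minimal illustration only: the card fields, the class and method names,
and the sample organizations are assumptions made for the example and do
not describe any existing register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrgCard:
    """One hypothetical register card; the fields are illustrative."""
    name: str
    ministry: str
    city: str
    street_address: Optional[str] = None


class OrganizationRegister:
    def __init__(self):
        self._cards = {}

    def post(self, card):
        """Record a card; a later posting for the same name replaces the earlier."""
        self._cards[card.name] = card

    def street_address(self, name):
        """Answer from the file itself, without searching the documents."""
        card = self._cards.get(name)
        return card.street_address if card else None

    def arranged_by(self, field_name):
        """List the cards in a different sequence, e.g. by ministry or city."""
        return sorted(self._cards.values(), key=lambda c: getattr(c, field_name))


register = OrganizationRegister()
register.post(OrgCard("Plant A", "Ministry X", "Kiev"))
register.post(OrgCard("Plant B", "Ministry Y", "Odessa", "12 Example Street"))
# A later report shows Plant A resubordinated; the card is brought up to date.
register.post(OrgCard("Plant A", "Ministry Y", "Kiev"))

print(register.street_address("Plant B"))
print([c.name for c in register.arranged_by("ministry")])
```

Replacing the card outright loses the earlier subordination; a register
meant to trace changes from one ministry to another would keep the
superseded cards as well.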
Aren't there two possible interpretations of the term
"snag file"? Are they by their nature supplemental to a register,
or are they files of information which should be handled by the
register? I'm thinking of a file that the Industrial Register might
have that identifies certain types of installations. Shouldn't the
files which represent failures of the system be incorporated
into the system in some way?
Yes. I think we all share the employee's suggestion
that analysts should please come forward with their half-done files
and arrange for an improved degree of community accessibility by
central listing of them. This kind of call has been made many
times, we know. I'd estimate rather more than a thousand little
files around the community that might be called snag files. I
think a very small proportion of these will ever develop into,
or be incorporated into a register. Some of them are not going
to achieve this (perhaps desirable) central responsibility. If
you mean there is vagueness in the term "snag file", that is
granted. There may be a better term. But we are talking about
voluntary files, the result of active dissatisfaction with the
degree of control of information in a subject field where certain
services are already available.
Were the Registers created to process data and supply
informational answers, as distinct from the Document Division's
providing for document retrievability?
A requester asking for material relating to a specific
individual might be provided with a dossier, which is nothing more
than a collection of documents which refer to the individual. This
is not much different from the provision of documents from the main
file pertaining, for instance, to the subject of underdeveloped
areas. Admittedly, in one case there is reference to a named
object and in the other to a more abstract idea, but I don't see
much difference in the way the questions were answered. True,
the Registers produce research aids intended to save requesters
time. These amount to research for them, in a sense. In the
Register there is greater emphasis on providing the requester
with information, but you can't carry that idea too far. The
main file provides information, too, as well as providing
documents.
Graphics Register is somewhat unique since the
material on file is the photographs themselves. We have found,
through experience, that the consumer prefers to come in and use
the photographs as they are found in our files. Governed by this
preference, we generalize in our coding. We have two classification
systems. In the Film Branch, we use the ISC, and in the Photo
Branch, we have a photo-intelligence code, much more generalized
than the ISC.
Perhaps some of these "supplements to supplements"
cannot be merged with the auxiliary file, desirable as that
might be. For example, why couldn't you merge your little file
of photographs of naval vessels, for control, with your photographic
file system?
Our photographic file system just wouldn't handle
it. Analysts are expected to select pictures which add
to photo intelligence on the naval vessels. This file is handy
for us, and it can be used, on occasion, by a requester who is
searching for a photo of a particular vessel. The present general
filing system for photographs is by country, province, city and
then by a numerical control number. The mounting card bears a
32-subject set of broad categories which we call "selection by
compartmentation". When a photograph of a naval vessel in
Odessa comes in, we don't have the physical means of filing the
master photo in two places. It is filed in Odessa, but this file
for naval vessels gives a cross-reference, by number, type, name,
etc., to the master copy. The photo is coded also in the ISC
system, but not with the same degree of fineness as in the naval
vessel file.
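The arrangement described, one master copy filed by place and a separate
vessel file that merely points to it, can be sketched as two look-up
tables. The sketch is an illustration under stated assumptions: the key
layout, the function names, and the sample vessel and control numbers are
hypothetical.

```python
# Master file: (country, province, city, control number) -> photo description.
master_file = {}
# Naval vessel file: vessel name -> keys of master photos showing that vessel.
vessel_xref = {}


def file_photo(country, province, city, control_number, description, vessel=None):
    """File the single master copy by place; cross-reference it if it shows a vessel."""
    key = (country, province, city, control_number)
    master_file[key] = description
    if vessel is not None:
        vessel_xref.setdefault(vessel, []).append(key)
    return key


def photos_of_vessel(vessel):
    """Follow the cross-references back to the master copies."""
    return [master_file[key] for key in vessel_xref.get(vessel, [])]


# Hypothetical receipts.
file_photo("USSR", "Odessa Oblast", "Odessa", 1041,
           "harbor view with vessel", vessel="Vessel X")
file_photo("USSR", "Odessa Oblast", "Odessa", 1042, "grain elevator")

print(photos_of_vessel("Vessel X"))
```

The photograph exists once; only the reference to it is duplicated, which
is why the small file can index more finely than the general system
without a second copy of the print.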
In the Industrial Register, in using a three-digit
code, do they also use clear text as a means of entry to the indexed
information?
No, the primary emphasis is on the three-digit code.
But we do have little reference files and subject files, as on
mining within a country, transportation facilities, other economic
subjects, filed by country, but not a clear-text index.
Is the three-digit code card-punched?
It is card-punched, and recoverable by machine methods by
industrial categories.
Is there space on the punched card for other information, such as
clear text, if you chose to use it?
Yes, I'm sure there are many, possibly eight spaces,
available. Further punching would create several problems, however,
perhaps longer listings and an unwieldy working tool. We deem the
three digits sufficient.
PANEL IV
CONTRIBUTION OF MACHINES TO THE
CLASSIFICATION PROCESS
Panel Members: (spokesman)
A. Purpose of Paper
The purpose of this paper is to suggest areas and ways in which
machines can be put to use to assist classification personnel in
the performance of their indexing function.
B. Composition of Paper
The paper is composed of two sections - one major, the other
minor. The major section identifies areas in which machines have
been used within OCR in support of the classification function
and outlines some specific applications in this regard. The
second and minor section of the paper treats of the working climate
which must exist among classification, machine, and reference
components in order that advantage be taken of the full potential
of machines in the OCR complex.
C. Apologia
I should like to admit a few things right away about this
paper: First, there is throughout a presumption that our topic
"Contribution of Machines to the Classification Process" is not
an absurdity, a presumption that machines can assist persons
engaged in the task of classifying data for input to a machine
index system. Second, we are talking here about standard EAM
machines; that is, punch-card equipment of the type now available
in OCR. We are not talking about EDPM machines or high-speed
magnetic tape equipment with computer capabilities, etc.
Third, there is what may appear to be an undue accent on the
experience and practices of the Special Register (SR) in this
paper. SR's entire reference system is heavily oriented
towards machines, much more so than is the case with any other
OCR reference component. So we have developed a habit of trying
to make machines do things for us. And one of the things we
with their indexing function.
Some of the tasks performed by machines in support of classi-
fication personnel could be accomplished by other means, of course.
This paper, however, will outline something of what has been done
in the prospect that you may find some "transfer" value in these
applications.
D. Basic Machine Capabilities
Before getting into the specifics of how machines can assist
in the classification process, it may be helpful to list briefly
the types of processes EAM machines can perform. Each of these
processes or functional capabilities may be of help to
classification personnel attempting to use machines in support of their
classification task.
First, of course, machines can store information, keeping it
"at the ready" in the form of punched card files or indexes.
Machines can cumulate or merge information, thereby updating
files.
Machines can compare information and check file sequence
in the process.
Machines can arrange or sort information into differing
sequences.
Machines can select wanted information from a larger mass
of data.
Machines can perform simple arithmetic tasks, such as counting
cards, adding and subtracting figures, etc.
Machines can reproduce data and, at the same time, adjust or
rearrange the relative left/right positions of data fields in the
process.
Lastly, machines can print out information.
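For readers who think more readily in present-day programming terms, the
capabilities just listed can be mimicked on a small deck of records. The
sketch below is only an analogy, not a description of EAM equipment: each
"card" is assumed to be a (code, title) pair, and the codes and titles
are hypothetical.

```python
import heapq

# Two hypothetical card files, each already in code-number sequence (store).
master = [("101", "armor plates"), ("205", "boiler plates"), ("310", "dental plates")]
updates = [("150", "battery plates"), ("400", "dinner plates")]


def in_sequence(cards):
    """Compare adjacent cards to check that the file is in ascending code order."""
    return all(a[0] <= b[0] for a, b in zip(cards, cards[1:]))


# Merge: bring two sequenced files together, thereby updating the master.
merged = list(heapq.merge(master, updates))

# Sort: arrange the same cards into a different sequence (by title).
by_title = sorted(merged, key=lambda card: card[1])

# Select: pull wanted cards from the larger file (codes below 300).
selected = [card for card in merged if card[0] < "300"]

# Simple arithmetic: count the cards.
card_count = len(merged)

# Reproduce: copy the data with the two fields in exchanged positions.
reproduced = [(title, code) for code, title in merged]

# Print out.
for code, title in merged:
    print(f"{code}  {title}")
```

Each step takes a whole file in and gives a whole file back, which is
also how the punched card machines are used: one pass of the deck per
operation.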
EAM equipment is today considered to be slow by comparison
with EDPM equipment. However, even EAM machines perform
these data-handling processes with great speed. One of our
print-out machines, for example, could keep pace with a complement
of 37 Agency-qualified typists.
Well, these are the basic functional capabilities you have
at your disposal. I'm sure most of you are familiar with them.
MACHINE SUPPORT TO THE CLASSIFICATION PROCESS
Now let us turn to those areas, together with some specific
applications, where machines have been used in OCR to support
classification personnel in their indexing activities.
A. Classification Manuals
Of course, machines can be used to store, sort, and print-out
your classification manuals or code books. There are several
advantages to this:
1. Revision of manuals
Because a punched card file constitutes a kind of
"dynamic storage" of data (that is, the storage is ex-
tremely flexible and mobile), the recording of your
classification manuals in machine cards greatly fa-
cilitates posting revisions to your coding scheme.
Not all code systems change appreciably. Some change
a great deal. The Special Register's code scheme for
Soviet organizations, for example, undergoes hundreds
of changes each year in response to the changes oc-
curring in the organizational structure in the Soviet
Union and in response to our improved understanding
of this Soviet structure through new intelligence
receipts. The Intelligence Subject Code (ISC) has
recently been extensively revised and the planning
for processing with Minicard equipment may result in
further changes. Such revisions are very easily
recorded and controlled when your code book is stored
in punched cards.
2. Currency of Manuals
The speed of machine print-out Makes feasible
more frequent printings of your code books, with the
very desirable result that the manuals on the desks
of your classification analysts are kept more up to
date. In a growing classification system, this is
particularly important.
3. Multiple Sequence of Manuals (Index to basic book and others)
Through their sorting capability, machines make
it entirely feasible to list your code books in varying
sequences. The basic sequence for a structured
or classed code is, of course, by code number. This
sequence groups the topics of your classification
scheme by major category with generic subordination
within each major category. If your code books are
recorded in punched cards, however, it is possible to
list your books in alphabetical sequence by code title,
thereby obtaining alphabetical indexes to your basic
(code number) books.
Such alphabetical indexes to the basic code books
have proved very helpful to classification analysts.
The basic purpose of the alphabetical index, of course,
is to provide the classification analyst with a tool for
quick access to the code number for a given concept
through direct alphabetic look-up rather than oblig-
ing him to find his topic within the structured ar-
rangement of his basic classed code.
The alphabetical index, however, serves two
auxiliary purposes which seem worthy of note.
Such an index to the basic code often facilitates
more complete and accurate use of the code scheme
by the classification analyst in that it serves to
alert him to the full range of coverage within the
overall coding system of a given term and the mul-
tiple meanings this term may possess. For example,
the analyst sees a document reference to the term
plates, without further specification. In under-
taking to assign a code number to this topic he
may think of some of the following possibilities
but probably would not think of all of them:
anchor plates, armor plates, battery plates, boiler
plates, cathode plates, cutting plates, dental
plates, dinner plates, etc., to pursue only the
first four letters of the alphabet.
It may be possible (it often is) in view of
the larger context of the document and with the
aid of such an alphabetical index, to determine
the specific nature of the reference at hand. If
not, the analyst at least has the range of pos-
sibilities at his finger tips and can code the
item in accordance with classification procedures
established for such contingencies.
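In present-day terms, the "plates" look-up described above might be sketched as follows. This is only an illustration in Python, not a description of the punched-card procedure: the code numbers are invented, and the names build_word_index and look_up are hypothetical.

```python
from collections import defaultdict

# Hypothetical code-book entries: (code number, code title).
# The code numbers are invented for illustration only.
CODE_BOOK = [
    ("631.2", "armor plates"),
    ("742.5", "battery plates"),
    ("815.1", "dental plates"),
    ("904.3", "dinner plates"),
    ("742.9", "battery chargers"),
]


def build_word_index(code_book):
    """Map each title word to the sorted (title, code) pairs containing it."""
    index = defaultdict(list)
    for code, title in code_book:
        for word in set(title.lower().split()):
            index[word].append((title, code))
    return {word: sorted(entries) for word, entries in index.items()}


def look_up(index, term):
    """Return every (title, code) pair whose title contains the term."""
    return index.get(term.lower(), [])


index = build_word_index(CODE_BOOK)
for title, code in look_up(index, "plates"):
    print(f"{title:<20}{code}")
```

A single look-up on the bare term thus puts the whole range of "plates" entries before the analyst at once.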
In addition, the alphabetical index provides
to classification personnel a tool which may be
used to improve the code scheme itself. Not
infrequently, confusion creeps into a growing clas-
sification scheme through entry into the system of
code titles which, although ideationally unrelated,
contain similar or even identical title words.
The inexperienced or careless analyst in his search
for a code number may find the desired word in the
basic classification manual but not the desired con-
cept - with the result that incorrect classification
occurs. With an alphabetical list-out of all code
titles contained in the system at his disposal, the
classification officer can readily locate for survey
such potential trouble areas in his coding scheme and,
through appropriate re-wording and cross referencing
of code titles, minimize the confusion caused by
similarity of title words and consequently the po-
tential misuse of the classification system.
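The survey for "potential trouble areas" described above might be sketched in Python as follows. The sketch assumes, purely for illustration, that the major category is the part of the code number before the period. The code numbers, the stop-word list, and the name shared_title_words are hypothetical.

```python
from collections import defaultdict

# Hypothetical code titles; the code numbers are invented for illustration.
CODE_BOOK = [
    ("631.2", "armor plates"),
    ("631.7", "armor belts"),
    ("815.1", "dental plates"),
    ("904.3", "dinner plates"),
]

# Words too common to signal a real clash (an assumed list).
STOP_WORDS = frozenset({"of", "and", "the", "for"})


def shared_title_words(code_book):
    """Return {word: titles} for words used under more than one major category."""
    categories = defaultdict(set)
    titles = defaultdict(set)
    for code, title in code_book:
        major = code.split(".")[0]  # assumed: digits before the period
        for word in title.lower().split():
            if word not in STOP_WORDS:
                categories[word].add(major)
                titles[word].add(title)
    return {
        word: sorted(titles[word])
        for word in sorted(titles)
        if len(categories[word]) > 1
    }


for word, found in shared_title_words(CODE_BOOK).items():
    print(word, "->", "; ".join(found))
```

Words that recur only within one major category are left out, since the confusion in question arises between unrelated concepts.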
The basic code number sequence and the alpha-
betical sequence by code title (the index) may not
be the only sequences in which it will prove profitable
to have your classification scheme listed. For
example, SR's classification scheme for Soviet
organizations is listed, believe it or not, in ten
sequences: (1) by code number; (2) by title of
organization; (3) by city of location; (4) by city
of location within Oblast; (5) by SOVNARKHOZ or
Regional Economic Council; (6) by Soviet plant
number; and by four different echelon levels,
viz., (7) by Chief Directorate or Main Adminis-
tration; (8) by Directorate or Department;
(9) by Plant, Trust, or Combine; and (10) by
Laboratory, Office, Base, etc., within Plant.
It is very unlikely that any classification
manual not controlled by machine would be
listed in so many sequences. Yet each of these
sequences is important to SR's classification
effort.
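Listing one file in several sequences, as described above, amounts to sorting the same records on different fields. A minimal Python sketch follows, covering only four of the ten sequences. The organization records, the field names, and the function list_in_sequence are all invented for illustration.

```python
from operator import itemgetter

# Hypothetical organization records; every value below is invented.
ORGANIZATIONS = [
    {"code": "1203", "title": "Machine Tool Plant", "city": "Kiev", "oblast": "Kiev"},
    {"code": "0417", "title": "Chemical Combine", "city": "Perm", "oblast": "Perm"},
    {"code": "0932", "title": "Bearing Plant", "city": "Kiev", "oblast": "Kiev"},
]

# Each listing sequence is simply a different tuple of sort fields.
SEQUENCES = {
    "by code number": ("code",),
    "by title": ("title",),
    "by city": ("city", "title"),
    "by city within oblast": ("oblast", "city", "title"),
}


def list_in_sequence(records, sequence_name):
    """Return the records sorted for the named listing sequence."""
    fields = SEQUENCES[sequence_name]
    return sorted(records, key=itemgetter(*fields))


for name in SEQUENCES:
    print(name)
    for record in list_in_sequence(ORGANIZATIONS, name):
        print("   ", record["code"], record["title"], record["city"])
```

Adding a further sequence means adding one entry to the table of sort fields; the records themselves are stored once.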
B. Control of Problem Topics
Another area in which machines can be used in support of the
classification process is in the control of "problem topics" - that
is, topics which cannot be clearly identified or which do not,
for one reason or another, fit precisely into the classification
scheme.
In SR, we have developed two types of controls for handling
these problem topics, both of which enlist the support of machines.
1. The Authority File
The first of these controls is the so-called
Authority File. By Authority File, we mean here
a file or index of topics which have proved trouble-
some to code and for which, accordingly, codes have
been established by supervisory direction as those
to be used by all input analysts. These code numbers,
although perhaps largely arbitrary, by such action
become the "official" or "authorized" codes for the
topics in question. Thus, the term Authority File.
Machines, of course, can be used to store, sort, and
print-out the Authority File, with all the ad-
vantages concomitant to machine-card maintenance of
the Classification Manuals themselves.
2. The Snag File
The second control for "problem topics" is what
is called, in the terminology of this paper, the Snag
File. The Snag File in SR is a special auxiliary file
of those problem topics which are not controlled by
the Authority File. The Snag File is maintained in
sequence (alphabetically and numerically) by the problem
topics themselves rather than by the classification
code numbers under which they were indexed, thereby
providing direct reference access to problem topics
irrespective of codes used.
The Authority File and the Snag File both record
coding actions that have been taken. The difference
is that the Authority File contains those coding
actions which have been thoroughly thought through
and constitute authoritative and lasting decisions,
whereas the Snag File is composed of coding actions
which are not likely to recur and are not considered
worthy of long deliberation. Actions recorded in
the Snag File might be termed spur-of-the-moment
decisions. Strict uniformity in this type of coding
action is not necessary because data retrieval is
guaranteed by inclusion in the Snag File of all
such decisions. Pragmatically, the Authority File
tells how to classify a problem topic (providing
a single, authorized code for each) while the
Snag File tells how a problem topic has been
classified (providing a record of the several
codes which may have been used in ad hoc coding
actions taken). The Authority File, then, is
primarily an aid to classification; the
Snag File is an aid to reference recovery.
In SR, machines have been used to prepare a
separate deck of cards for problem topics caught
in the daily work flow. This culling by machine
from our daily work flow of questionable classi-
fication entries is accomplished by a simple
overpunch in the machine card ordered by the
classification analyst whenever he feels he has
not fully resolved all reasonable doubt in assign-
ing to the topic in question the code he has chosen.
This is an extremely low-cost method of compiling
these data.
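The distinction drawn above between the two files can be restated as two small data structures: one holding a single authorized code per topic, the other holding every code actually used. The Python sketch below is an illustration only. The class and method names, the topics, and the code numbers are hypothetical.

```python
from collections import defaultdict


class AuthorityFile:
    """One authorized code per problem topic: tells how to classify."""

    def __init__(self):
        self._codes = {}

    def authorize(self, topic, code):
        # A later ruling on the same topic replaces the earlier one.
        self._codes[topic.lower()] = code

    def code_for(self, topic):
        return self._codes.get(topic.lower())


class SnagFile:
    """Every ad hoc code used for a topic: tells how it has been classified."""

    def __init__(self):
        self._codes = defaultdict(list)

    def record(self, topic, code):
        self._codes[topic.lower()].append(code)

    def codes_used(self, topic):
        return list(self._codes.get(topic.lower(), []))

    def topics(self):
        # Kept in sequence by topic, not by code number.
        return sorted(self._codes)


authority = AuthorityFile()
authority.authorize("plates, unspecified", "999.9")

snags = SnagFile()
snags.record("widget X", "631.2")
snags.record("widget X", "742.5")

print(authority.code_for("plates, unspecified"))
print(snags.codes_used("widget X"))
```

The Snag File deliberately tolerates several codes for one topic, because recovery goes through the topic entry rather than through the code.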
3. Authority File and Index to Classification
Manual Combined
It may be worthy of note that in SR we have
combined "authority file" entries with the regular
code book entries, thereby producing a single
alphabetical listing of these combined decks of cards
which serves the classification analyst both as an
index to the basic code book and as an authority list-
ing for problem topics. That both the authority file
and the code book are stored in punched cards makes
merging and sequencing of these data a simple matter.
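The merging of the two decks into one alphabetical listing might be sketched in Python as below. It assumes each deck is already in alphabetical order by title, as a sorted card deck would be. The card layout (title, code, source) and all sample entries are invented for illustration.

```python
import heapq

# Hypothetical cards: (title, code, source). Each deck is already sorted by title.
CODE_BOOK_INDEX = [
    ("armor plates", "631.2", "code book"),
    ("dental plates", "815.1", "code book"),
]
AUTHORITY_ENTRIES = [
    ("boiler scale", "742.0", "authority file"),
    ("plates, unspecified", "999.9", "authority file"),
]


def sort_key(card):
    """Alphabetical sequence by title, ignoring case."""
    return card[0].lower()


def merge_decks(*decks):
    """Merge already-sorted decks into one alphabetical listing."""
    return list(heapq.merge(*decks, key=sort_key))


for title, code, source in merge_decks(CODE_BOOK_INDEX, AUTHORITY_ENTRIES):
    print(f"{title:<24}{code:<8}{source}")
```

Carrying the source on each card lets the analyst see at a glance whether an entry is a regular code title or an authority ruling.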
C. Input Quality Control
A third major area in which machines can be of assistance to
the classification function is that of input quality control. There
are several techniques which may be employed here - the objec-
tive being to catch errors in the index cards before they are
merged into the standing indexes of the service system.
1. Daily Work Listings
One technique is to prepare listings in code
number order of new index cards as generated in the
daily work flow. In the hands of the classification
analyst, such listings make it feasible for him to
catch (1) impossible codes, (2) alphabetical entries
which are incorrectly spelled, and (3) entries which
do not conform to established procedures governing
the manner of entry and fielding of data. This
technique has been used in the Special Register as
a check point in our quality control efforts for all
our basic indexes.
2. Daily Work Matched Against the Classification Manual
Another technique aimed at minimizing input
error consists of matching by machine new index cards
against the official classification manual codes in
order to isolate all impossible codes. This technique
is currently used by the Machine Division in connection
with the Intellofax system.
This same technique is used by SR in the case
of our Soviet place name index. This matching of
new index cards against our area classification manual
validates the accuracy of both area code numbers
and area place names.
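Matching the day's cards against the manual, as described above, is a membership test: any code not found in the manual is set aside before merging. A minimal Python sketch follows; the card fields, document numbers, and code numbers are invented for illustration.

```python
# Hypothetical set of codes taken from the official classification manual.
MANUAL_CODES = {"631.2", "815.1", "904.3"}

# Hypothetical new index cards from the daily work flow.
NEW_CARDS = [
    {"code": "631.2", "doc": "D-1"},
    {"code": "613.2", "doc": "D-2"},  # transposed digits: not in the manual
    {"code": "815.1", "doc": "D-3"},
]


def isolate_impossible_codes(new_cards, manual_codes):
    """Split cards into those whose code exists in the manual and those whose code does not."""
    valid = set(manual_codes)
    accepted = [card for card in new_cards if card["code"] in valid]
    rejected = [card for card in new_cards if card["code"] not in valid]
    return accepted, rejected


accepted, rejected = isolate_impossible_codes(NEW_CARDS, MANUAL_CODES)
print("merge into standing index:", [card["doc"] for card in accepted])
print("return to analyst:", [card["doc"] for card in rejected])
```

Note that this check catches only impossible codes. A valid code assigned to the wrong topic passes through it.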
3. Automatic Authentication of Data Entry Patterns
Another technique for the control of input
quality consists of screening by machine new detail
cards to assure that prescribed patterns of data
recording are being maintained. Conformity to
prescribed patterns can be checked at multiple
points in the index cards by a single pass through
the machines. In SR, all the major fields of new
subject and commodity index cards are checked by
this technique before being merged into the standing
indexes.
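The pattern screening described above might be sketched in Python with one prescribed pattern per field, all checked in a single pass over each card. The field names and formats below are assumptions made for illustration and are not SR's actual card layout.

```python
import re

# Assumed field formats, for illustration only.
FIELD_PATTERNS = {
    "code": re.compile(r"\d{3}\.\d"),  # e.g. 631.2
    "area": re.compile(r"[A-Z]{2}"),   # two-letter area symbol
    "date": re.compile(r"\d{6}"),      # six-digit date
}


def failed_fields(card):
    """Return the names of fields that are missing or break their pattern."""
    return [
        name
        for name, pattern in FIELD_PATTERNS.items()
        if not pattern.fullmatch(card.get(name, ""))
    ]


good_card = {"code": "631.2", "area": "UR", "date": "591121"}
bad_card = {"code": "63.12", "area": "ur", "date": "591121"}

print(failed_fields(good_card))
print(failed_fields(bad_card))
```

A card is merged only when the list of failed fields is empty; otherwise the list tells the analyst which fields to re-examine.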
D. Correction of Index Cards
A fourth major area in which machines can be used in support
of the classification function relates to the correction process.
Every index system has its errors and it is one of our less
pleasant tasks to try to get them out. Machines can help.
1. Correction as Step in Service Processing
The following technique, now used in SR, possesses
the particular virtue of limiting or restricting
the correction effort to those portions of the index
file actively used in the servicing of requests. As
a standard step in the processing of each search
of our machine indexes, a listing is prepared of
all index cards recovered. This listing shows all
data contained in every index card selected in the
search. Documents referenced in these index cards
are then pulled from the document file and are
scanned for pertinency, possible follow-up runs,
etc., prior to release to the consumer. If,
through this scan, a document is found which does
NOT relate to the requested topic, it is known
that an error exists in the index card which
produced this particular reference. By turning
to the machine listing, the person effecting
correction can easily determine which portion
of the index card is in error. The listing also
provides him with all data required for the
deletion card employed in the correction process.
Data for the correction card, of course, must
usually come from re-analysis of the document - a
step which is taken before the document is returned
to file.
2. Detail File Conversion: Old Codes to New
Another facet of the correction process with
which machines can help is the conversion of old
codes to new codes in the index file following a
change in the classification scheme itself. When
the classification scheme has been altered, the
index file must be correspondingly updated or
Reference personnel are forced to work with
multiple recovery systems, which is, of course,
undesirable. Machines can be of great assist-
ance in this regard through automatic conversion
of the index cards from codes in the discarded
system to their equivalent codes in the new system.
Of course, this type of conversion can be
effected on punched data only. All data in SR's
system is punched. In the Intellofax card,
however, a substantial portion of the data
carried is not punched but is entered as repro-
duced typewriter text. This textual data would,
with present OCR equipment, be lost in the type of
conversion depicted above (except when correction
is effected by "under punching"). There are
indications, however, that new equipment may be
on the market before long which could circumvent
this difficulty, permitting Intellofax card
reproduction without loss of unpunched or textual
data.
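The old-to-new conversion described in this item can be sketched in Python as a table look-up applied to every card. The sketch assumes the conversion table lists every old code, including those that carry over unchanged; cards whose code is absent from the table are set aside for manual review. All codes and card fields are invented.

```python
# Hypothetical conversion table: old code -> equivalent new code.
# Codes that did not change are assumed to map to themselves.
CONVERSION_TABLE = {
    "631.2": "640.2",
    "815.1": "815.1",
}

# Hypothetical index cards punched under the old scheme.
OLD_CARDS = [
    {"code": "631.2", "doc": "D-1"},
    {"code": "815.1", "doc": "D-2"},
    {"code": "777.7", "doc": "D-3"},  # no equivalent listed
]


def convert_codes(cards, conversion_table):
    """Return (converted cards, cards with no equivalent code)."""
    converted, unresolved = [], []
    for card in cards:
        new_code = conversion_table.get(card["code"])
        if new_code is None:
            unresolved.append(card)
        else:
            # Build a new card so the original deck is left untouched.
            converted.append({**card, "code": new_code})
    return converted, unresolved


converted, unresolved = convert_codes(OLD_CARDS, CONVERSION_TABLE)
print([card["code"] for card in converted])
print([card["doc"] for card in unresolved])
```

Only the code field is rewritten; every other field on the card is carried across as it stands, which is the sense in which the conversion works on punched data only.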
3. Suspect Portions of Detail File Listed for Survey
There is yet another way in which machines can
facilitate the correction process. It is not unusual
that, in the course of operating a large index system,
certain sections of the system, for one reason or
another, become suspect. In such cases, it has often
proved helpful to use machines to list out for study
by the classification officer all cards in the suspect
sections of the system. Analysis of these listings
may lead to compensatory actions such as (1) card
corrections, (2) reprocessing of batches of
materials, (3) altering or tightening of input pro-
cedures, and (4) revisions to the classification
scheme.
III. THE TRINITY OF MACHINE REFERENCE SYSTEMS
Now, in conclusion here, I'd like to moralize a little about
the working climate necessary to the mechanized reference system.
We have been talking here about ways in which machines can
be put to work in support of our classification people. I think
it will be evident to you that support activities such as those
outlined in this paper suggest a working atmosphere of mutual
understanding and cooperation among all elements of the system.
The point I'd now like to stress is that a mechanized
reference or data-handling system of any appreciable complexity
or scope can function properly only when a high degree of oper-
ational integration or synthesis exists among the three principal
components of the system; i.e., Classification, Machine, and
Reference.
If there was ever a case for the left hand's knowing what the
right hand is doing, it is the mechanized reference system. A
data-handling machine is a very precise and exacting piece of
equipment. It imposes upon its users demands of formidable ri-
gidity. It is essential that data symbols and their inter-rela-
tionships in the machine system preserve constancy of meaning
from the initial point of classification input, through all the
processes of machine manipulation, to the terminal point of
reference output. The machine will accommodate no ambiguity ...
no difference of interpretation. The burden of constancy lies
with the personnel operating the system.
The mechanized reference system, then, has a "need-to-know"
principle all of its own, a principle quite the opposite in its
effect from the need-to-know principle of security doctrine.
Instead of restricting knowledge and communication, the
"need-to-know" principle of the mechanized reference system
transcends the barriers of organizational compartmentation
and proclaims that all components of the mechanized system
need to know a great deal about one another.
What are some of the specifics of this "need-to-know"
principle?
The Classification component, in order to do its job, needs
to know the nature and capacity of the basic machine record; needs
to know the function, design, and maintenance sequence of all
data files in the system; needs to know the functional capabili-
ties of the machines and something of their speeds; needs to
know the nature of the end product desired by the Reference
component; needs to know the avenues of approach to the data
files and the search techniques which will be relied upon in
fulfilling requirements; needs to know the scope and accent
of requests to be placed, including shifts in the emphasis of
consumer interest; and needs to know of success and failure in
the operation of the system as guide posts to classification
development.
The Machine component, in order to do its job, needs to
know the nature of the classification schemes; needs to know
the data categories requiring discrete maintenance sequences;
needs to know the substantive inter-relationships of data re-
corded in the system; needs to know the nature of the products
the Reference component will require; needs to know the re-
covery techniques or access routes the Reference component will
wish to employ; needs to know the scope and number of requests
to be serviced and the time limits imposed on servicing them;
and needs to know the nature of new types of consumer needs so
that new applications of the machine potential may not be
neglected.
And the Reference component, in order to do its job? Well,
the Reference component, ideally, needs to know everything about
its sister components. It needs to know the substance of source
materials and the classification coverage of these materials;
it needs to know all classification schemes and techniques
and procedures; it needs to know the design, data coverage,
and maintenance sequence of all mechanized files; and it needs
to know machine capabilities in searching and otherwise
manipulating these files so that reference recovery approaches
may be efficient and fruitful and so that new methods of
exploiting the potential of the system may be conceived and
activated.
Our insistence on this point may seem like a tempest in
a teapot, but, in view of experience already gained and in
view of the demands in this respect which the new data-handling
equipments now on the horizon will make upon its users, we
feel our point to be both timely and of substance.
Now all this stress on the "need-to-know" about one another
is not, of course, to suggest that you cannot have internal
structure within your mechanized reference system. You've all
seen organization charts which fracture our Office, Divisions,
Branches, etc., into neat little black-walled cells or boxes
with very straight and very narrow paths between for com-
municating upward and downward (but not laterally). Well,
this cellulation or compartmentalization is admittedly
necessary for numerous administrative reasons and is all well
and good for the purposes intended. There are, in fact, lots
of very splendid benefits from compartmentalization. It is
a great help in the Hearts and flowers department; it is the
only element of order in Time and Attendance reporting; it
intersperses enough supervisors among us so that employees often
get to know the faces and sometimes even the names of their
bosses - and vice versa; it has not been neglected as a mechanism
for justifying promotions; it is absolutely indispensable when
it comes to "orientation briefings"; and it can be a great com-
fort to hard-living employees by giving virtual assurance they
can safely expect not to have to speak to a living soul before
the morning coffee break.
So, compartmentalization has its justifications and none
of us would know how to live without it. But there is the
grave risk here, nonetheless, of which we are warning - the risk
that organizational form take precedence over function - that
delineation of elements within your system effect a separation
of the properly inseparable. The Classification, Machine, and
Reference components of a mechanized reference system either
work together or they do not work at all. They are not per-
mitted unilateral license. They are highly interdependent
elements of a single entity, the Reference System. They are,
if you will, a trinity, composed of three, but constituting
one.
Unless this interdependence is recognized, unless the
"need-to-know" principle is practiced, unless close inter-unit
work relations are established and sustained, your mechanized
reference system will not only suffer a deficiency of imagi-
native service applications, it may well fail to fulfill
the basic functions it was established to perform.
PANEL IV
DISCUSSION
QUESTION: Does any structural organization or other mechanism exist
to foster understanding and cooperation among the three major components
(Classification, Machine, and Reference) of the mechanized reference
system?
25X1A9a  The "need-to-know" more about one another is a matter of
great concern. The present OCR seminar is the first Office-wide effort
to break down organizational barriers, and I hope to have similar meet-
ings at least semi-annually, each to deal with some aspect of the
information problem confronting OCR.
One inter-unit mechanism is already in being - the OCR
Composite Group established to strengthen the servicing of Intellofax
requests. This group is composed of representatives from the Reference
Branch of the Library and the Analysis Branch of the Document Division.
25X1A9a
The Document Division periodically schedules members of its
Analysis Branch to work with the Intellofax retrieval component of
the Library in order to acquaint themselves better with retrieval
activities and problems encountered. The Machine Division is also
represented when appropriate through its standby member of the
Group.
There are many ways to circumvent organizational compartmenta-
tion. In the Special Register, the barriers of compartmentation are
diminished by circulating all consumer requests as written up by
Reference to the classification analysts as a double check on
validity of retrieval coding; by establishing and exercising direct
working-level channels among those actually carrying out assigned
tasks irrespective of Branch of assignment or supervisory lines
of communication; by staffing the servicing component of Reference
only with persons who have served at least two years as classifica-
tion analysts; by periodic re-training of Reference personnel in
Analysis operations; by exchanging written operational procedures
among Reference, Analysis, and Machine units; and by inter-unit
staff meetings called by any unit when the need for same arises.
Other possibilities include: (1) additional training programs
established by each component for the benefit of selected personnel
from sister components and (2) the repetition, after 4 to 6 months,
of the familiarization tours now given new EOD's, with follow-up
tours thereafter every year or two.
QUESTION: Will the effect on inter-unit coordination be negative if
OCR machine operations are consolidated into a single machine center
such as is currently under consideration?
25X1A9a  The purpose of consolidation is largely economic and
I do not believe consolidation of machines need have a harmful effect
on efforts to coordinate activities of Classification, Reference,
and Machine components. As more complex and costly machines are
acquired, there is a natural and concomitant tendency to consolidate
machine facilities because of the prohibitive costs of multiple
installations and the technical specialization required to operate
such equipment. The increased need for coordinating input and
output activities when working with such equipment may counter-bal-
ance any separation of input and reference personnel from machine
personnel resulting from the physical consolidation of machine
installations.
Summary of Final General Discussion
Most discussion and comment in the final session of the
Conference centered on consumer requests and appropriate OCR reactions
thereto. 25X1A9a pointed out that our response to most informa-
tion requests must, of necessity, be a collective or "team" response
because of the complexity of our retrieval problem and the substantive
interrelationships of the materials received and classified by the
various components of OCR.
25X1A9a, reflecting on "requesters he has known," stated that
they generally fall into one of two categories: those who have no
knowledge of how to retrieve the data pertinent to their request,
and those who feel they know better than the information specialist
how to perform the search. Personally, he said, he prefers the
former, although this usually means that a reference analyst must
spend considerable time with a requester in order to determine ex-
actly what it is that he seeks. Requests from within the Agency,
he stated, usually reflect considerable understanding of the re-
trieval problem, while extra-Agency requests do not. As for the
advantages to classification of better titling of reports, he agreed
that increased training of collection officers in report preparation
and titling would be useful as long as such titles as "Waltz me
around again, Mohammed" continue to be received.
The session ended with some brief observations on the oft-
discussed possibility of establishing a central point in OCR where
customers could obtain coordinated reference service. 25X1A9a
indicated that centralization and coordination of our service
activities would be materially aided by the physical accommodations
planned for OCR in the new Agency building. In the more immediate
future, he added, it is likely that there will be a greater number
of intra-OCR briefings and increased interchange of personnel be-
tween the various OCR divisions.
25X1A9a
APPENDIX
CONFERENCE ON PHILOSOPHY OF DOCUMENT CLASSIFICATION IN OCR
21 November 1959
Chairman - Paul W. Howerton
0900  Introduction to the Nature of Classification
0930  Panel I - The Intelligence Subject Code
1030  Break
1045  Discussion
1100  Panel II - Classification Tools
1200  Lunch
1300  Discussion (Panel II)
1320  Panel III - Supplements to the Main Classified File
1400  Discussion
1430  Break
1445  Panel IV - Contribution of Machines to Development
      of the Classification Process
1515  Discussion
1545  General Discussion
Program Assistants   25X1A9a
Logistical Support   25X1A9a
FOR OFFICIAL USE ONLY