PROJECT(Classified) DOCUMENTATION/INFORMATION RETRIEVAL SYSTEM DEVELOPMENT TASK PHASE I OUTLINE REPORT

Document Type: 
Collection: 
Document Number (FOIA) /ESDN (CREST): 
CIA-RDP78-04727A000200250043-7
Release Decision: 
RIPPUB
Original Classification: 
S
Document Page Count: 
25
Document Creation Date: 
December 9, 2016
Document Release Date: 
March 26, 2001
Sequence Number: 
43
Case Number: 
Publication Date: 
June 28, 1963
Content Type: 
REPORT
Body: 
Approved For Release 2001/07/12 : CIA-RDP78-04727A000200250043-7

CIA AUTOMATIC DATA PROCESSING STAFF
PROJECT [redacted]
DOCUMENT/INFORMATION RETRIEVAL SYSTEM DEVELOPMENT TASK
PHASE I OUTLINE REPORT
28 June 1963

Preface

This outline report deals with the document/information retrieval system development element of Project [redacted] thinking at the end of Phase I of the system development task. The report covers:

(1) The results of [redacted] fact-finding throughout the DD/I;
(2) The conclusion that a major central reference system is required;
(3) The initial concept of a new central system;
(4) A suggestion to management that a base document indexing system be urged upon the intelligence community and that this indexing function be performed once and centrally for the members of the community;
(5) The [redacted] plan for proceeding with the detailed development of a new document/information retrieval system (through Phases II & III);
(6) A set of general observations of particular interest to management;
(7) Major alternatives open to management; and
(8) ADPS recommendation.

Note: [redacted] has produced several "depth" papers for its own purposes which elaborate on the contents of this outline report. These papers are available in ADPS to persons wishing to peruse them.

Contents

I. Document/Information Retrieval System Development Task
   A. Four Phases of System Development Task
   B. Phase I
      1. Fact-Finding
      2. Central vs De-Centralized System
      3. [redacted] System Concept
      4. An Intelligence Community Task, Ideally
      5. General Plan for Proceeding with CHIVE System Task
   C. Phase II
   D. Phase III
   E. Phase IV
II. General Observations
III. Alternatives
IV. Recommendation

I. Document/Information Retrieval System Development Task

A. Four Phases of System Development Task:
   Phase I - Fact-Finding and Formulation of the Overall Concept of the New System (Sept 62 - June 63)
   Phase II - Detailed Systems Design (July 63 - June 64)
   Phase III - Implementation of Initial Segment (July 64 - April 65)
   Phase IV - Implementation of Additional Increments (May 65 - ?)

B. Phase I

1. Fact-Finding

   a. General
      Personnel Conducting the Survey:
         4 ADPS
         4 [redacted]
      Scope:
         All Offices of the DD/I
         150+ components studied
         Fact-finding reports prepared on each
      Major Targets of [redacted] Fact-Finding:
         (1) Missions and functions of DD/I components
         (2) Information sources used
         (3) Internal processing and files (internal to Branch, etc. visited)
         (4) Use and evaluation of external files
         (5) Reports produced
         (6) Information needs and problems
      Survey Completed April 1963

   b. Major Factors Bearing on System Development Task
      Volume of Document Receipts
      Multiplicity of DD/I Missions and Interests
      Variety and Depth of Info Required from these Documents
      Variable Time Requirements:
         For basic intelligence research
         For programmed, shorter-length research
         For current intelligence
      Trend toward Current Reporting

   c. DD/I Information Resources (Present System)
      Composed of:
         Analyst Files (para. d immediately below)
         Central Info System (OCR) (para. e)
         Dissemination Services (para. f)
         Other Internal and External Services (para. g)

   d. Analyst Files
      The Analyst Files are, in fact, the primary DD/I info retrieval system in terms of:
         Use rate
         Response time
         Indexing and content to meet analyst specifications
      [Major Uses]
         To check validity of new data and to determine its effect on what is already known.
         To handle immediate, short lead-time ad hoc queries.
         Basis for more leisurely research, also.
      Major Strengths
         Readily accessible
         Contain filtered data (reflects specialist/user judgment)
         Tailored to analysts' needs (topic, sequence, and index control)
         Ability to control subjects (concepts) according to the specific requirements of the analyst
      Major Weaknesses
         Data control largely limited to current interests
         Not readily manipulated
         Limited and partial historical depth
         Not ideally accessible to other analysts
         Organizations, personalities, areas not easily controlled
         Duplicative processing among DD/I components
         File maintenance detracts from analytic time

   e. Central System (OCR)
      General Role - Back-up to Analyst Files for:
         Historical depth
         Gaps in analyst file coverage
         Routine, long lead time requests
      Major Uses
         To provide comprehensive recovery for long lead time research projects
         To provide retrieval of data not controlled in analyst files
         To provide comprehensive storage and retrieval on organizations, personalities, areas
      Major Strengths
         Provides historical depth (institutional memory)
         Comprehensive topic and area coverage
         Multi-access to documents, e.g., date, source, topic, area, etc.
         Backstops intelligence gaps in analyst files
         Document repository
      Major Weaknesses
         No single point for all-source retrieval
         Outputs from multiple points not compatible
         Insufficient emphasis given to [redacted] open literature, and cables
         Not sensitive to shifts in intelligence sources and priority interests
         Inadequate geographic coordinate retrieval
         Duplicative processing

   f. Dissemination Services
      Manual system
      Minimum of 120 man years/year (rough estimate)
      One million unique documents/year
      10-15 million multiple copies/year
      150-200 components served with specific reading requirements
      General analyst satisfaction
      Timely and accurate
      Inefficient and costly

   g. Other Information Retrieval Services
      [redacted] Agriculture, etc.
      Published bibliographies and indexes: Monthly Index of Russian Accessions, Referativnyy Zhurnal, ASTIA Technical Abstract Bulletin, etc.
      Files of other agencies: FTD/AFSC (White Stork), Dept. of Commerce, NSA, etc.
      [redacted] Map Library, NPIC, [redacted], RPB/[redacted], RID/DDP, etc.
      Analyst chatter

2. Central vs De-Centralized System

   This is a major decision area for both systems design and management.
   [A decision for a de-centralized system would mean the up-grading and coordination of the Analyst File complex with near-total dependence upon same and the correlative curtailment of the central system to a very low use, very slow response, essentially archival role.]

   [On the other hand, a decision for the continuation of an up-graded central system, in addition to the Analyst File system, means that heavy expenditures for a central system will not only continue but undoubtedly increase, that the effort to devise an improved central system must continue, and that eventually the resultant advanced system must be implemented and the cost and commotion of doing so accepted.]

   a. De-Centralized System (Analyst Files)
      Provides primary support to intelligence production
      Proven in practice
      Reflects user needs and judgments
      For majority of uses, is preferred by analysts. (Will always exist to some degree.)
      Integrated sources (within clearance-level of analyst)
      "Personalized" files
      Difficult for others to use
      Lack continuity and consistency
      Difficult to manipulate
      Coverage of all orgs., persons, and areas, etc. not feasible
      Number and size would increase without central system

   b. Centralized System
      [redacted] concludes a central system is long-run "must" for systematic doc/info control
      If improved, would:
         Have higher use rate... thereby increasing the return on expenditures; and
         Make inroads into present Analyst Files... thereby helping to offset costs
      If accepted as a base index system for the Intelligence Community (see para. IB4 below), the [redacted] system would undoubtedly pay for itself several times over.

3. [redacted] System Concept

   a. Very simple to say:
      Central, integrated, machine-supported system to provide document and information retrieval for the total DD/I document flow.
         All geographic areas
         All topics (persons, places, things, organizations, subjects)
         Depth indexing
         Direct entry to files (input or querying)
         Single-processing of input
         Single-point retrieval

   [Portion of page illegible]
         [illegible] machine-assisted input ([illegible] auto indexing)
         [illegible] indexing and dissemination
         [illegible] random access capability
         [illegible] machine translation/Stenowriter capability
         Experimental remote inquiry or display
      Intermediate System (1966-1967)
         [illegible] hardware complex/some advanced hardware
         [illegible] indexing of hard copy
         Some automatic indexing of machine language sources
         Some character recognition (experimental)
         Limited remote interrogation and display
         Some automatic dissemination
         Volume machine translation
      Target System (1968 - ?)
         Very large and advanced hardware complex, including extensive random access capability
         Automatic indexing for major portions of base recovery system (incl. character recognition)
         Human indexing for special info retrieval projects
         Remote interrogation and display
         Automatic dissemination
         Volume machine translation (improved quality)

      (1) Document storage and retrieval
         (a) Persons, organizations/installations, and geographic locations to be stressed
            [illegible] of most universal interest to Analysts
            Weakest links in Analyst Files
            Strongest elements of present Central System
            Volume beyond proper handling via Analyst [Files]
         (b) Commodities and Subjects to be covered with less emphasis
            Not priority need
            Limited use in central system
            Analyst Files handle concepts (Subjects) better
      (2) Information Storage, Manipulation, and Retrieval
         (a) Correlative to Document Index System via:
            Index display
            Synthesis and summarization of index entries
         (b) Special Projects (Language Processing), such as:
            Strategic Facilities Project
            [redacted] Project
         (c) Major Automated Information System
            Subject: Targets
            Scope: World-wide
            Inputs:
               Machine language files external to [redacted]
               [redacted] index data (selected)
               Special inputs designed for this system
            (For elaboration, see [redacted] paper, same subject, dated 2 May 63)
         (d) Computation (Numerical Processing), such as: [redacted]
      (3) Non-liter[illegible] Processing, such as: [redacted]
      (4) Machine Translation/Stenowriter
      (5) Publication Support (Use of computer for composing, typesetting, etc.)

   d. [redacted] troubled by size of task
      (1) Complexity of system design
      (2) Balanced handling of such variety and volume
         Accomplish objectives without undesirable consequences
         Hardware/software limitations
      (3) Costs - personnel and budgetary

   e. Full solution will require:
      (1) Development of new techniques
         Index, dissemination, abstract, display, input/output, etc.
      (2) Development of new hardware
         Memory, input/output, character readers, etc.
      (3) Money and people
         Major investments during developmental years. Savings in long run?

4. Ideally, an Intelligence Community Task

   a. Ideal approach
      [illegible] task should be done centrally for the Intelligence Community
      (1) Community Mission
         (a) Design and develop centrally a base doc/info [illegible] for use by community members
         (b) Index centrally all docs collected/originated by Intelligence Community
            Some decentralized input but [illegible] base system
            Some special-purpose, limited-interest categories excepted
         (c) Provide the retrieval index, or suitable portions, to community members
         (d) Output servicing to be performed by individual members for its local users
            Base system - provided by central organization
            Special files, as required - built and maintained by individual members
            Some output servicing provided by central organization
         (e) Initially: doc/info indexing and retrieval
         (f) Eventually: translation, requirements control, etc.
      (2) Executive Agent - CIA (or Intelligence Processing Center under USIB)
         CIA has most experience in large-scale document systems
         CIA has best/largest personnel base
         CIA already started towards such a system via [redacted]
         CIA must do so anyway for its own needs

   [Passage largely illegible] [illegible] management to take [illegible] interest to CIA, [illegible] should respond with real [illegible] to such an idea, and [illegible] one system [illegible] real world

   [Intervening text illegible]

   c. Fund and shape external R&D of hardware/software if commercial development of same is not adequate... ([illegible] - ?)
      Must have new capabilities to accommodate growth of system
      Requirements will be clarified during system design
   d. Implement initial segment of new system... (July 64 - April 65)
   e. Expand coverage of new system... (May 65 - ?)

C. Phase II - Detailed Systems Design (July 63 - June 64)

1. Personnel:
   a. ADPS - continuing from Phase I
   b. Contractor ([illegible]) - continuing from Phase I
   c. OCR - [illegible] middle-level [illegible] from OCR to work full-time on Phase II. This [illegible] would:
      Receive training in EDP [illegible]

[Text missing; the outline resumes within Section II, General Observations]

      Integrated output
      Postures the central system to grow with EDP (where future machine-support capabilities lie)
      Eventual automation of some functions now done manually

C. Functions of OCR Affected/Not Affected by [redacted] System
   1. Affected:
      Indexing and retrieval
      Machine support
      Dissemination
      Document storage and retrieval
      Photo storage and retrieval
      Extracting/abstracting services
      Publications procurement accounting and control
   2. Not Affected:
      Book cataloging and shelving
      Publications and [illegible] procurement
      Library reference and circulation services (non-document)
      Distribution services, i.e., the mailroom functions
      Motion picture presentations
      Liaison Staff
      Historical Intelligence Collection

D. Organizational Effects on OCR
   Interim - New system will slowly absorb people and functions
      Constitute new element; traditional elements continue
   Eventual - Present OCR Divisions will largely disappear
      Input Divisions within [redacted] will be organized by Geographic Region
      - Service Division
      - Systems Development Division
      - Programming Division
      - Computer Operations Division
      - New non-[redacted] Division(s) for non-[redacted] functions

E. Schedule of Effects on OCR
   Phase I - Fact-Finding and Systems Concept... (Sept 62 - June 63)
      Effect: None
   Phase II - Detailed Systems Design... (July 63 - June 64)
      Effect: None, except OCR System Trainees join with [redacted]
   Phase III - Initial Implementation... (July 64 - April 65)
      Effect: [illegible] index, reference, and [illegible] personnel phase over to new system
         Conversion of old to new files ([illegible])
         [illegible] old system [illegible]
   Phase IV - Expansion of [redacted] (May 65 - ?)
      Effect: File maintenance/index/reference from [illegible] IR/DR/GR/DD/LY (Intellofax) phase into new system
         [illegible] personnel in MD phase over
         [illegible] conversion accomplished (limited)
         [illegible] portions of old system continue operations

F. Single Service Point Idea
   Implementation of initial segment of [redacted] adds one more, unless OCR develops now a single service point to tap for the consumer all pertinent OCR resources.

G. Organization of OCR by Geographic Region Prior to Implementation of [redacted]
   Organization of OCR by Region before [redacted] implementation would foster development of single OCR service point, would lead to [illegible] by Region as well as source, and would facilitate successive expansions of [redacted].

H. State-of-the-Art Implications
   Conventional human indexing pushed to limit
   EAM support pushed to limit
   EDP offers hope through new capabilities
   Even with EDP, R&D in hardware and software a "must" to expand capabilities to meet expanded requirements in [redacted] Phase IV.
   Machine indexing inferior to human indexing today
      But, offers speed, consistency, and eventually perhaps comparable quality
   Total document retrieval system for DD/I appears not feasible with today's equipment
   Eventual DD/I system will be based on next 3-5 years of implementation experience and on [illegible] industry

I. Budgetary Implications
   Development and implementation costs will be heavy
      Hardware development (Government R&D support may be required)
      Systems/Techniques development (Government support almost certainly required)
      Parallel Systems Operation
      Conversion
   Eventual system more economical per item of data controlled

J. Manpower Implications
   By single input handling of documents, hope to gain manpower to permit:
      Deeper indexing
      Broader coverage
      Greater effort on output

K. Conversion Implications
   It is desirable to convert present OCR machine files, if feasible.
   EAM data may not be compatible with EDP files, however (a study question for Phase II).

L. Security Implications
   "All-Source" clearance for all personnel operating the CHIVE system
   ([illegible] security classification code, however)