WORKING GROUP INFORMATION
Document Type:
Collection:
Document Number (FOIA) /ESDN (CREST):
CIA-RDP84B00890R000400060012-7
Release Decision:
RIPPUB
Original Classification:
S
Document Page Count:
31
Document Creation Date:
December 14, 2016
Document Release Date:
July 7, 2003
Sequence Number:
12
Case Number:
Publication Date:
October 26, 1981
Content Type:
MF
File:
Attachment | Size |
---|---|
CIA-RDP84B00890R000400060012-7.pdf | 1.34 MB |
Body:
WORKING PAPER
26 October 1981
MEMORANDUM FOR: Members, Continuity and Contingency Working Group
FROM: Chairman
SUBJECT: Working Group Information
1. Background
As you are aware, the Office of the IHSA has requested your participation in a 2-3 week effort to help them identify the direction the Agency should follow relative to the availability and reliability of IH systems that will be supported in the '85-'89 time frame.

The IHSA has been directed to develop a strategic plan for the Agency's IH systems by the end of August 1982. To accomplish this task within the given time constraints, they have devised a four-phase approach. In the first phase (our phase) a series of user-oriented working groups will discuss and clarify the goals or objectives of IH from their perspective. The second phase will require other working groups, consisting primarily of IH providers, to address how the goals identified in Phase One might be implemented given technical, space, budgetary and other resource realities. Phases Three and Four concern themselves with preparing the draft and final versions of the Plan. A schedule for development of the Plan is attached for your information.
2. IHSA Point Papers
The IHSA has prepared two discussion papers that we can
use as "strawmen" to focus our views and structure our product.
In addition to background information provided, there are a
number of specific questions found in both the overview and
robustness point papers which will require our response. Since
our time is very limited, please give as much thought to these
questions and goals as you can prior to our initial gathering
on the 2nd and 3rd of November. If you can address the
quantitative questions early on, so much the better. If you
are disposed to committing some of your thoughts to paper
for group consideration, please do so.
3. Some Guidelines
a. The time frame for goal implementation is
1985 through 1989. It is assumed that any resources
to significantly alter the current plans for IH
enhancements would not be available until the
FY-85 budget.
b. Our focus should be on requirements and
needs, not solutions or implementations.
c. The basic theme of our discussion is simply:
What is the magnitude of the dependence that Agency
users will place on IH services in the late 80's?
Stated differently--At what point does the unavail-
ability of an IH service seriously impact an
intelligence function?
d. Recommending changes to the functional
capabilities of existing systems is not our concern.
e. We should avoid 'general goodness' suggestions
such as commonality and interoperability. These are
givens in any IH architectural plan.
f. In addition to the questions and goals posed
in the attached point papers, I would welcome any
additional topics that you or your components would
like addressed.
4. Briefings
We have arranged for a number of tutorial briefings during the first two days to bring everyone up to speed on the current status, problems and any planned improvements in the availability of the major IH systems. Topics and speakers are listed in the attached schedule. If there are other subjects that you feel would be beneficial to present to the group, please advise [redacted] or myself.
5. Reference Material

As you can ascertain from the listing attached, there are a number of existing documents which have some relevance to the issue of continuity and contingency. The overview point paper briefly summarizes what we have been able to assemble from these sources. Copies of any of the reference documents may be acquired by calling [redacted]. Copies will also be available during the working group deliberations.

6. I look forward to an informative and productive association. If you have any questions prior to the 2nd, please call [redacted].
Attachments:
As Stated
Next 1 Page(s) In Document Exempt
Copies of the following publications are available from
the Chairman or IHSA representative.
1. OC - Information Handling Survivability Study, dated 3 September 1980

2. ODP - MFR on Emergency Planning, by [author redacted], dated 28 October 1980

3. NBS - Federal Information Processing Standards (FIPS) Pub 87, Guidelines for ADP Contingency Planning, dated 27 March 1981

4. Processing System Availability Charts covering recent Fiscal Years

5. Communications Planning Issue (Recapitalization Paper), First Two Sections: Challenge and Staff Network, dated 17 August 1981

6. Information Handling Study-1980, Final Report of the Information Handling Task Force, dated 5 September 1980
Plan Development Schedule (September 1981 - August 1982)

Phase I: Objectives Definition
- Working Group Sessions (phased)
- Synthesis
- Report to Senior Management

Phase II: Implementation Planning Guidance
- Development of Planning Guidance (parallel working groups)

Phase III: Development of Integrated Plan
- Development of Rough Draft Strategic Plan
- Report to Senior Management

Phase IV: Reconciliation
- Reconciliation with Budget
- Development of Final Report
- Report to Senior Management
WORKING
PAPER
Continuity and Contingency Overview Paper
a. Objective
b. Definitions
a. Political Considerations
b. Recognition of Need
c. Architectural Considerations
3. Current Planning Efforts
a. NIEPS
b. OC Initiatives
c. ODP Initiatives
d. OL Initiatives
a. Affordability
b. Interoperability
c. Reserve Capacity
6. Discussion Questions
Continuity and Contingency
a. Objective
The objective of this point paper is to focus attention on system continuity and contingency planning for Agency IHS services. Our intent is to examine the more vulnerable aspects of the Agency's information handling systems (IHSs), identifying critical functions and susceptible choke points; to gain perspective on a user's view of the criticality of these individual system weaknesses; and to generate user-backed input for the Agency's IHS Strategic Plan. (For this effort, the scope of threats to be considered excludes the nuclear war concerns of the National Intelligence Emergency Planning Staff (NIEPS) and related activities.)
b. Definitions

Continuity is defined as the capability to sustain the intelligence process when the system is impacted by forces that cause a stressful condition. Contingency planning is those steps that are taken to secure continuity of operations under the circumstances mentioned above. Other terms which describe a system's ability to sustain failure or damage and continue to provide the services that users need include availability, serviceability, reliability and survivability. From a systems view, the term robustness describes these characteristics. It refers to the resilience and reliability of a system -- its ability to continue to function without failure, or, given failure or damage, with minimum degradation of performance. In addition, robustness refers to the strength of the system in the sense of its ability to absorb any additional demands that may be placed upon it in a stressed mode.
a. Political Considerations

Events of the recent past, as well as press and intelligence reports, have confirmed that the terrorist threat is ever-present and increasing in intensity and sophistication. The vulnerable position of [redacted] to terrorist attack has been readily apparent for many years. For example, the Office of Communications has long been concerned about the "all the eggs in one basket" problem [redacted]. As far back as ten years ago, studies were made addressing [redacted] susceptibility to terrorist attack, as well as natural forces and catastrophic failures
that might occur within the systems installed there. Over the past
decade this situation has only worsened, especially when considered in
the light of the many concentrated nodes in the ODP data processing
system and the critical choke points that are found within the OC data
links throughout the region. (Consult OC Survivability Study.)
When considering contingency plans, the focus of the Office of
Communications has traditionally been directed toward the facilities
of the foreign network, since that was where the major threat was
perceived to be. Today, however, this situation has changed
dramatically. In certain cases our facilities may be even less "safe"
Moreover,
an evident, heavy concentration of information handling equipment in
any one location is almost an open invitation to hostile acts against
it. There are, as well, throughout our system a number of fairly
evident (and highly critical) architectural choke points. The loss of
any one of which may be stifling to maintenance of the intelligence
process. There is now much evidence of increased managerial concern
for protecting the Agency's IHSs from such a loss. Additional
security measures represent only the first tier in providing this
protection-- interoperability, expanded capacity, availability,
survivability all represent additional and, ultimately, necessary
layers in the plan.
b. Recognition of Need
The Intelligence Capabilities for 1985 Study on Surge Collection and Analysis further confirms the need for an unspecified measure of reserve IH capacity, both fixed and transportable, as well as a robust information handling infrastructure with increased capability to absorb unanticipated system demands. Although the Surge paper does not specify, in definitive terms, the extent of the data communications and processing capacity required, the need for an expanded capability in this arena is clearly evidenced.

Also, the EXCOM Staff Long-Range Planning Project of December 1980, CIA Management Directions for the 1980's, affirmed the requirement for flexible and prompt reaction to challenges in the IHS arena. It highlights the destructive impact of constrained funding in past communications budgets, and the unacceptable risks incurred unless we make the large front-end investments required for a more capable and robust IHS architecture.
c. Architectural Considerations

At the IH systems level a measure of redundancy is usually built into the design, so many internal system failures are covered. The infrastructure which supports the system, however, is often not as well backed up and, therefore, is subject to single-point failures. This is not to argue that our system architecture must be redesigned, for this is neither practical from a physical nor a technical
viewpoint. It is to suggest, however, that the largely vertical IHS architecture (see the Headquarters Signal Flow Diagram in the Robustness point paper as an example) should be critically examined and future systems redesigned accordingly. Current technological developments, such as packet technology, will greatly assist in this evolution. In the system robustness paper which follows, this generic problem is explored at greater length and appropriate suggestions are put forward.
3. Current Planning Efforts
a. National Intelligence Emergency Planning (NIEPS)
At the Community level, the NIEPS staff is planning for the
information, communications, and personnel requirements needed to
support the President and his successors in a national emergency.
This effort is focused on continuity of government in the context of
strategic nuclear war, as mandated by Presidential Directives
(PD/NSC's) 53 and 58.
Within the Agency, the Planning Staff, Office of Plans and Policy,
has been assigned responsibility for continuity of operations of the
Agency, also primarily in the context of nuclear war. This planning
encompasses all types of eventualities, however, and is focused on
continuity of the Agency's general foreign intelligence functions.
The role of this segment of the DCI's staff is to guide, stimulate,
and coordinate the emergency planning of the individual offices within
the Agency. Ultimately, the product will be a contingency plan for
continuity of vital Agency functions in the event of a catastrophic
confrontation. It will be the role of the Planning Staff to
coordinate the various initiatives and, to that end, the results of
this working group will also be made available. The objective is to
bring about an orchestrated, serious and concerted examination of all
Agency contingency capability and planning.
b. OC Initiatives

The Information Handling Survivability Study of 3 September 1980 is the most recent assessment of the survivability of the OC information handling systems. After a close examination of the communications network, several detailed suggestions are brought forward in this report. Principal among these are the following:
1) Provide high speed data handling capabilities
to the foreign field.
2) Improve the connectivity of the Agency's microwave
system and explore redundancy in the routing of
commercial carrier circuitry.
3) Provide a backhaul capability for the SKYLINK
system.
4) Improve the circuitry (and equipment) at the Headquarters building to expand its gateway function in the event of a catastrophic failure.

5) Establish a network architecture plan that incorporates the compatibility requirements of different networks.
If desired, this document is available to participants for further,
more detailed examination.
c. ODP Initiatives

The Office of Data Processing has also taken several initiatives to examine its system, identify the vulnerable aspects of it, and recommend approaches to increase the overall system robustness of the Agency's data processing assets. The most recent of these, a 28 October 1980 MFR, is also available to the Working Group for review. The MFR summary highlights several potential problem areas including the need for review of SAFE compatibility "not only from the backup concept but also from the manpower, training and user point of view." The MFR, however, is largely directed toward reporting things as they are or have been, and not as ODP would like them to be. It concludes that "plans to develop a capability outside present quarters should wait for a statement of overall Agency policy and guidance." This current exercise is directed toward that end.
d. OL Initiatives

The Headquarters Engineering Branch of the Real Estate and Construction Division, OL, is completing a two-month study of alternative actions to improve the reliability/availability of the conditioned power systems. This study looks at some alternative investments and their likely reliability/availability payoffs. The principal considerations, and perhaps some of the results, from this study will be available to this Working Group as it meets.
In examining the continuity and contingency question it is useful to first identify the threats that might be expected and, in some fashion, assess their relative likelihood of occurrence. The following table presents a categorized listing of such threats. The table represents general hazard groupings which are intended to facilitate easier treatment of the threats that are likely to confront us throughout the ensuing decade. For example, the likelihood of a suborned Agency or contractor employee attempting to sabotage a node or nodes of our IHSs will probably be significant if we find ourselves militarily opposing a Soviet push and we have the same IHS architecture as today. Also, system supportability issues may become more critical as systems become more complex. As used in the Outage
Causes table, maintainability covers the range of conditions that may
exist if an IHS product line became materially or technically
unsupportable for lack of spare parts, tech training, vendor support,
obsolescence, etc.
Participants are requested to complete the table by evaluating the "likelihood of occurrence" during the decade of the 80's of each of the 17 specific threats listed. This information, along with the "extent of damage expected," should be recorded in their respective columns in terms of very low, low, medium, high and very high. The concern is that we have an architecture that is balanced with respect to risk. That is, a high probability of occurrence should correlate with a low level of damage, and a low probability of occurrence with a high level of damage. A threat that presented, for example, a medium probability of occurrence and a high level of damage would represent an elevated level of risk. Such a threat would then be identified as a particular concern.
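As a rough illustration of this balancing logic, the short Python sketch below scores the qualitative ratings numerically and flags combinations where both likelihood and damage run high; the scoring scale and the example entries are illustrative assumptions, not values supplied by the working group.

```python
# Illustrative risk screen for the Outage Causes table (assumed scoring scale).
LEVELS = {"very low": 1, "low": 2, "medium": 3, "high": 4, "very high": 5}

def risk_score(likelihood: str, damage: str) -> int:
    # A simple product; a balanced architecture keeps this product low.
    return LEVELS[likelihood] * LEVELS[damage]

# Hypothetical ratings for two of the listed threats.
ratings = {
    "Terrorist Attack": ("medium", "high"),
    "Operator Error": ("high", "very low"),
}
for threat, (likelihood, damage) in ratings.items():
    score = risk_score(likelihood, damage)
    flag = "elevated risk" if score >= 12 else "balanced"
    print(f"{threat}: {flag} (score {score})")
```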
(1) Affordability

Probably the biggest single issue with respect to continuity and contingency is the priority that should be assigned to these concerns relative to the other concerns (as represented in the other four working groups). Accordingly, at the close of this series of Working Group meetings each participant will be provided a form on which they will be requested to quantify the relative importance of improvements in this IHS concern. Within the scope of this Working Group, however, the assessment of priorities should be qualitative. The dominant concern is ultimately the affordability of continuity and contingency investments.
(2) Interoperability

There, of course, are other valid concepts and concerns, such as system interoperability and alternate path processing, which may impact heavily on contingency planning. Articulation of critical IHS requirements will assist in identifying candidate systems for enhanced interoperability design. An awareness of opportunities to use interface and protocol specifications, in both planned and current systems design, that will fully exploit IHS interoperability will greatly assist in overall large-scale contingency planning.
Probably the best historical example of near-continuous
contingency planning within the Agency can be found in the altroute
management in the Office of Communications HF network. Before the
coming of the satellite era, all overseas field station links were
either solely or secondarily HF-dependent and, therefore, assigned to
one of several HF Base Stations. However, the very nature of the HF
transmission medium as well as the political vulnerability of the
individual overseas Base Stations made exercising various failure
scenarios within the network a regular monthly necessity.
[...] understood by the participants; over the ensuing decades the procedure became nearly institutionalized. Although these exercises were, and are, admittedly carried out in a much more restrictive and compatible domain, a similar pattern of back-up operations could well be tailored to strengthen the overall IHS contingency posture.
(3) Reserve Capacity
The Surge Collection and Analysis paper of EXCOM's Intelligence
Capabilities, 1985 Study delineates a requirement for rapidly
deployable field capabilities and rapidly expandable Headquarters
processing capabilities. A major part of providing the surge
capability that is desired for the mid-80's is to build enough reserve capacity into the IHS infrastructure to handle unforeseen requirements as they occur. Another major part of surge planning is to have the field-deployable, transportable assets available to bring them to bear on the collection targets when they are needed. By judicious and prudent management of assets that are directed toward continuity and contingency planning it should be possible to also cover much of the surge IHS requirements. The linkages between surge capacity and contingency planning are strong, and the capabilities required for one complement the other.
6. Discussion Questions
From experience and the information presented, the Working Group is requested to comment on the following questions. Responses should not be limited to a simple endorsement or rejection of the proposition, but should include justifying comments along with any pertinent suggestions they may wish to contribute. All participants are encouraged to voice free and candid opinions in any discussion area.

1. What types of threats are most likely to stress IHS assets during the 1980's? Please fill in the two Outage Table columns. Add explanatory comments as required.
2. What are currently the most critical IHS service
failures? What do users want improved first?
From a user perspective, are the system availability
figures cited in the robustness paper "believable"?
How much effect does outbuilding location have on
system availability?
3. Should a formal IH system availability
improvement program be established? Is
it a desirable objective for the IHSA to
pursue? As an initial stage of this effort
is it feasible to institute a data-gathering
exercise for all IHS's (and associated support
systems and networks) sufficient to enable
sound availability figures to be specified
for each element of the system as well as
overall system availability? Given the
complexities mentioned, is it reasonable to
expect suppliers to collect MTBF and MTTR
data on all their IHS-related systems? Should
this be a required exercise to justify and
schedule system replacement? Should a time
phased plan be developed that specifies
annual availability improvement figures and
ultimate system availability goals for
specific systems and baseline conditions?
4. What are the most likely (or anticipated) areas
for IHS expenditures in the 1980's? What is
likely to be given up in favor of what? What
are the options and tradeoffs? How extensively should IHS expenditures be directed toward contingency capabilities? From a study of
the point papers, supplemental information and
their own experience, users are asked to make general
comments regarding the extent to which they believe IHS
resources should be committed to improving the
serviceability and survivability of the IHS architecture.
5. Should Agency contingency planning be expanded to
include provision for Surge requirements anticipated
in the mid 1980's? If so, should this Surge
planning include:
a. Building sufficient and extensive reserve
capability into the future IHS architecture
to absorb Surge requirements without putting
strain (or having negative impact) upon the
IH Services that are regularly and routinely
provided to the Agency? (It should be
recognized that under Surge conditions
it is likely that the regular intelligence
processes will also be in a highly
active mode.)
b. Providing field-deployable IH systems sufficient to support Surge field requirements for communications, collection and other on-site requirements? (It is anticipated that these systems would normally be transportable and have a high degree of mobility and versatility built into the various deployment schemes.)
IHS Outage Causes Table

(For each threat listed, enter the likelihood of occurrence during the 1980's and the extent of damage to IHS capability if the threat occurs.)

A. System Forced

   1. Internal System Failures
      a. Spontaneous Component Failure
      b. Operator Error
      c. System Maintainability (NORS, NORM)

   2. Agency Support Failures
      a. Power Generator or UPS
      b. Air Conditioning or Water Supply
      c. Communication Lines/Links

B. Outside-Agent Forced

      a. Extreme Weather
      b. Flood
      c. Fire
      d. Untenability

      a. External Utility Failures
      b. Fire
      c. Water
      d. Operator Accident

   3. Aggression Caused
      a. Deliberate System Tampering
      b. Sabotage
      c. Terrorist Attack
      d. Armed Conflict
System Robustness Point Paper
I. Background
The constrained budget environment of the past six or eight years
and the government-wide zero based budgeting process have taken their
toll on the Agency Information Handling Systems (IHS). We now have an
environment that is near an optimum with respect to cost versus
performance. Concomitantly, our systems are fragile. For the most
part, it is a single thread environment. As a broad generalization,
there are a large number of single points of failure in our IHSs; it
is far too likely that any single failure will bring down a system,
albeit perhaps, only briefly.
This lack of robustness of our systems will not be acceptable for the foreseeable future. The Agency has an ever-increasing level of on-line, real-time operational responsibilities. Furthermore, it must provide continuity of operation during contingencies and war. The current robustness of our IHSs falls far short of serving these needs. It provides inadequate availability for the projected numbers of users of communications and data processing functionalities for the mid-80's, and inadequate survivability in the event that hostile action or an accident disables a major functional entity.
The reason that the current situation exists, of course, is that
robustness costs money and contributes little, if anything, to
performance. To improve robustness, we have to invest in it, perhaps
at the expense of providing other IHS functionalities. (Such a
tradeoff will always obtain, because there will always be more things
on our "wish" list than there are funds to implement them.) As a
consequence, we have to be careful in specifying robustness
objectives. It is tempting to specify lofty goals, but it should be
recognized that accomplishment of these goals costs, in terms of other
things not done. The problem is intensified because the cost of
availability tends to rise exponentially as the level approaches 1.0.
Since there will be acute limitations of funds in the light of all
the objectives the Agency has for new IHS functionalities, the
governing philosophy should probably be for the Agency to work its way
out of the fragility corner via the planned evolution of the
architecture. As we develop new capabilities we should assure that
they meet our robustness criteria and provide architectural redundancy
with respect to current systems. We should not modify the existing
systems simply to improve robustness--that is simply unaffordable.
While the robustness objectives of the IHS strategic plan need to be specific to be meaningful, it should be recognized that they are
not based on hard analyses. These will have to come later, and they
will doubtless result in some modification of our goals. But by being
specific at this point, it is possible to assure a common
understanding and mutual agreement as to what we need to accomplish,
relative to where we are today.
1. Measurement of System Availability
The issue of availability is made more complex by the fact that there are some definitional problems relevant to the term "availability." For hardware, intrinsic availability, Ao, is defined as:

    Ao = MTBF / (MTBF + MTTR)

where MTBF is the mean time between failures and MTTR is the mean time to repair.
That is the definition we currently use, but it is recognized that it
has significant shortcomings in the IHS environment. Principally,
these shortcomings are:
o A momentary "spike," or system outage is not
uncommon and can have major impact in the IHS
environment. Although such a spike can
destroy a considerable amount of in-process
processing and contaminate electronic files,
the effect on the numerical values of availability
is negligible.
o An on-line system can slow down quite sharply
when its task load approaches system saturation.
Technically, the system has not failed, yet its
performance may be clearly unsatisfactory.
o There may be occasional, complex software errors in such elements as the operating system, transaction handler, or communications elements of the systems software. These errors are frequently so difficult to diagnose that recourse is had to temporary "work-arounds," which deflect the flow of the processing either back to the user or into some non-ideal channel. Such deflections are equivalent to either a loss of time or a degraded operational mode, yet do not count as an availability reduction.
Numerous proposals for new definitions of availability for the IHS
environment have been made in the professional literature. To date,
there is not a consensus with regard to any of these, and thus a clear
and reasonably complete specification of IHS availability is not
available. For the purpose of this planning effort, it is recommended
that we proceed with the current hardware systems availability
definition presented above.
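To make the arithmetic concrete, the short Python sketch below evaluates this Ao definition; the MTBF figures are illustrative assumptions chosen so that, with the roughly one-hour MTTR cited later in this paper, the result matches the 0.97 reported today and the 0.997 goal discussed below.

```python
def intrinsic_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Intrinsic availability: Ao = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Illustrative values only (assumed), using the ~1-hour MTTR noted later.
print(round(intrinsic_availability(32.0, 1.0), 3))   # 0.970 -- roughly today's RCC figure
print(round(intrinsic_availability(330.0, 1.0), 3))  # 0.997 -- the level discussed as a goal
```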
When the SAFE system comes on-line in 1983, there will be three central systems in the Headquarters area: the Ruffing Computer Center (RCC), the Special Computer Center (SCC), and SAFE. In addition, there are numerous other systems dedicated to specific operational functions, and some standalone general purpose mini-computers.
The RCC has grown in numbers of computers over a period of years, reaching its current status of seven large, general purpose computers in June 1978. Needed capacity growth from this point on has principally been achieved by replacing existing processors with newer-technology ones of greater power. The installation currently has three IBM 370/168's, an IBM 370/158, an IBM 3033 MP, an IBM 3033 UP, and an Amdahl V-8, all served by three COMTENs. Several hundred disk drives and tape devices provide bulk memory for the RCC. The Center provides three types of services: VM, GIMS, and batch. In general, each of the mainframes is dedicated to one of these services.
The SCC consists of four mainframes: two Amdahl V-6's and two IBM
370/158's. CAMS currently runs on one of the Amdahl's, but CAMS II
development will be done on a 168 at a remote site in the Washington
area.
Currently the backup computer capability that we have is pretty much limited to intracenter transfers of workload. We have not had to test intercenter backup, but there is hardware compatibility and a communications link between the RCC and SCC which should permit it, should the need arise.
The proportion of IHSs dedicated to on-line functions has grown
steadily. Historical data for the RCC indicates an exponentially
growing level of simultaneous on-line users, averaging 20 percent per
year. Data available on user expectations for utilization in the near-term future indicates that an extrapolation of this historical data could produce a projection that would be low. All indications are that central system utilization over the next several years will grow at a substantially higher percentage rate, limited by communications and central processing capacity rather than by demand.
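For a sense of scale (an illustrative calculation, not a projection from this paper), compounding makes the gap between the historical and anticipated rates significant: 20 percent annual growth yields roughly a 2.5-fold increase over five years, while an order-of-magnitude increase over the same period implies closer to 60 percent per year.

```python
# Compound growth comparison -- illustrative arithmetic only.
historical_rate = 0.20
five_year_factor = (1 + historical_rate) ** 5
print(round(five_year_factor, 2))      # ~2.49x over five years at 20%/yr

# Annual rate implied by a ten-fold increase in five years:
implied_rate = 10 ** (1 / 5) - 1
print(round(implied_rate * 100, 1))    # ~58.5% per year
```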
Based on the Ao availability definition in the previous section, ODP reports a Ruffing Computer Center system intrinsic availability of about 0.97. User perception is usually that it is not as good as
this, however, and the difference clearly lies in the sort of definitional shortcomings described above in association with the Ao definition.
Figures 1, 2, and 3 provide representative data from ODP's most
recent computer system quarterly report. From these figures it is
possible to get a feel for the operating environment we have today.
Additionally, table 1 presents the availability and MTBF for each of
the major services at the quarterly reporting points.
The value of 0.97 is certainly comparable with the best values obtaining commercially today. It is a satisfactory level when supporting the approximately [redacted] simultaneous on-line users that we do today. It is, in fact, a remarkable value when it is recognized that the centers are comprised of components for which the contractually committed availability is 0.90. However, maybe in five years, the number of simultaneous on-line users could be an order of magnitude greater. At such a level, 0.97 is no longer acceptable; availability must be significantly better.
If we maintain the lost work due to system failure at the same level as today, a ten-fold increase in simultaneous on-line users would correlate with an availability of 0.997. If we specify no more than one lost hour in a work month of 9-hour days, we are talking about Ao = 0.995. Also, the SAFE system has specified an availability of 0.997. The fact that these approaches produce such similar values argues that our availability objective should be in this range.
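The following sketch works through both of these rules of thumb (the 22 working days per month is an assumed figure, used only to make the second calculation concrete):

```python
# (a) Hold total lost user-hours constant while the user population grows ten-fold.
a_today = 0.97
user_growth = 10
a_required = 1 - (1 - a_today) / user_growth
print(round(a_required, 3))           # 0.997

# (b) Allow no more than one lost hour per work month of 9-hour days
#     (assuming roughly 22 working days per month).
work_hours_per_month = 22 * 9         # ~198 hours
a_one_lost_hour = 1 - 1 / work_hours_per_month
print(round(a_one_lost_hour, 3))      # 0.995
```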
The increase from Ao = 0.97 to approximately 0.997 for ODP's central systems has major IHS system implications. It means we must have:

o better component reliability

o greater redundancy in the context of greater system flexibility

o system architecture with reduced single points of failure

o system architecture with improved fail-soft character

o superior design for maintainability
The requirement for greater reliability is going to have an impact
on the way we procure equipment. GSA schedule EDP equipment has an
individually negotiated availability of 0.90, based on roughly
comparable duty requirements. There has been a traditional reluctance
in the computer industry to bid on RAM-type (Reliability, Availability
and Maintainability) performance characteristics for equipment. Most
Next 3 Page(s) In Document Exempt
component suppliers have been demurring at the suggestion of bidding to higher levels of availability in non-schedule procurements. That reluctance has recently been decreasing, however, as a high-reliability market emerges. A few suppliers, such as Hewlett-Packard, Tandem, and Wang, now specifically market their higher equipment reliabilities -- values of 0.98 or greater.
To achieve a 0.997 system availability without unacceptable
redundancy and component sparing, individual components should provide
even better reliability. The implication is that we need to develop
an approach to procuring a very high level of IHS components'
reliability in a competitive environment.
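As a rough illustration of why unit reliability matters (assuming independent failures; the figures below are illustrative, not procurement data), the redundancy needed to reach a given system availability drops sharply as unit availability rises:

```python
# Availability of n identical units in parallel, any one of which suffices.
def parallel_availability(a_unit: float, n: int) -> float:
    return 1 - (1 - a_unit) ** n

print(round(parallel_availability(0.90, 2), 4))  # 0.99   -- two 0.90 units still miss 0.997
print(round(parallel_availability(0.90, 3), 4))  # 0.999  -- three 0.90 units
print(round(parallel_availability(0.98, 2), 4))  # 0.9996 -- two 0.98 units
```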
3. Communications System Availability
To maintain the required level of services the Office of
Communications has traditionally designed the communications switches
with a high degree of built-in system redundancy. The pay-off has, of
course, been reflected in terms of very good individual system
availability figures. (See Table 2 for FY-1981 figures.) System
availability has been maintained at a high level for quite some time
despite the continued aging and questionable supportability of the MAX
switches.
These figures do not, however, reflect overall end-to-end system availability. To determine this quantity for a particular service location, the availability of the transmission link and terminal equipment must also be included in the composite availability figure. (Individual terminal equipment availability probably would not have much impact, however, since very high availability is usually achieved by immediate "sparing off" to reserve equipment.)
Table 2. FY-1981 Communications Switch System Availability (percent)

    System              Availability
    MAX (avg of 3)          99.92
    MAX-1A                  99.994
    MAX-II                  99.894
    MAX-III                 99.882
    ARS (avg of 4)          99.52
    ARS                     99.779
    ARS                     99.533
    WARS                    99.624
    ARS                     99.114
    CDS (Ao)
    DATEX (Ao)
    ACT-0 (As)

    * System was deactivated circa 1 April 1981.
Table 3 is a compilation of the overseas communications system outage figures for both the computer switches and the overseas transmission media. This table provides instances of outage from whatever source in these two generic domains. Traditionally, the quality of the communications link is measured in the number of hours of carrier outage per month, and is so logged for reporting by overseas communications operators.
Availability values for various services, such as provided by ODP,
are more difficult to measure in the case of communications data links
because of the great intricacy of the various networks and the time-
variable scheduling of service to the overseas stations. This
intricacy is illustrated in figure 4, showing just the major
electronic traffic paths within the Headquarters building.
To develop an accurate system availability (Ao) figure for any particular data service link, the end-to-end system availability has to be taken into account. When the effect of time-variable scheduling of service resources, coupled with the many combinations of transmission systems external to the building, is considered, the complexity of accurately presenting Ao on a representative circuit escalates considerably. End-to-end circuit availability figures have not, therefore, been routinely provided to the user community, although an effort is currently underway to do this for the data transmission links in the foreign network. Better documentation was a recommendation of the 1980 OC Survivability Study.
In general, however, overall communication system availability is perceived to be comparable to that of the data processing systems. Consider a cable flow to the DDO from [redacted]. This traffic is processed through the following systems: ARS (Hqs), MAX-III, MAX-II, and CDS. Neglecting the effect of carrier and station service hardware, these devices provide a circuit availability of 0.982. This is actually an upper limit, since the effect of the overall transmission media will degrade it slightly.
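A minimal sketch of this series calculation follows; the MAX figures echo Table 2, while the ARS and CDS values are placeholders (the actual per-element figures for this circuit are not given in the paper):

```python
# End-to-end availability of a chain of elements is the product of the parts
# (assuming independent failures). Some element values below are placeholders.
elements = {
    "ARS (Hqs)": 0.998,     # placeholder
    "MAX-III": 0.99882,     # from Table 2
    "MAX-II": 0.99894,      # from Table 2
    "CDS": 0.985,           # placeholder
}

end_to_end = 1.0
for name, availability in elements.items():
    end_to_end *= availability

print(round(end_to_end, 3))  # ~0.981 with these placeholder values
```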
An area that has been a recognized availability problem is the data links to the Headquarters area outbuildings. This is most noticeably reflected in the availability seen by ODP terminal users in these buildings. Although no data is regularly collected on the availability of these links, it is generally perceived by users to fall well below that of ODP's central systems, which the terminal users are accessing.
Next 2 Page(s) In Document Exempt
An additional dimension of the robustness concern is the provision of support services, primarily power and cooling water. Emergency power for Headquarters is provided by a three-sector system: five diesel-powered emergency generators, five uninterruptible power systems (UPSs), and a complex, relay-based system to control the emergency power start-up and system insertion. Elsewhere, such as [redacted], the configuration is basically the same, although smaller and simpler.
Most of the UPS failures -- that is, failures to provide uninterrupted power when the VEPCO power falters or fails -- have been traced to electrical and mechanical stress in the control systems. The UPS control systems currently employed involve a hybrid of solid-state and relay technologies. Some of this technology is now fairly dated, reflecting the age of the equipment. The control logic for the three sectors of the Headquarters area systems is implemented primarily in terms of relays. It is a large-scale system, and such relay control systems are tricky in terms of logic. It is difficult to test such systems for all possible event sequences, so they frequently have unproven sequences. Like any relay system, they may also have intermittent failures in proven sequences. The latter most frequently occur as a consequence of electrical stress affecting relay actions in a way which modifies the intended logic. (Modification of required make-before-break or break-before-make sequencing is a frequent logic corruption, unique to relay systems, which can be produced by electrical stress.)
A significant portion of the current data processing system failures are caused by support systems. Providing system availabilities of the level specified for SAFE, 0.997, is not possible without upgrading the reliability of these systems. Such investments should be considered in two areas: reliability and capacity. The principal reliability concern is the control systems. Some of these, as in the case of the UPS, are intrinsic to vendor-supplied equipment. Headquarters has an Agency-developed, relay-based control system containing approximately 2000 relays. Capacity improvements are required if terminal local nets, with their associated disk files, and mini-computers are to be protected by UPSs in the same manner as are the central facilities.

As a reference point, the approximate cost of providing UPS power throughout the new headquarters compound building is estimated at $5 million. This produces an allocated cost of $1667 for each of the [redacted].
To retrofit the existing building to provide the same total coverage might cost 50 percent more per person. Our recent VEPCO experience is that there are about 20 spikes or short-term voltage drops which exceed equipment tolerances per year, and one to three outages.
6. System Architecture: Current and Required
Today we do not have an Agency IHS architecture; we have a group
of architectures. There are the central processing systems, the
overseas communications system, the SAFE system and the NPIC Data
System, to name a few. As the loads on these systems increase,
coupled with user demands for interoperational capabilities, the
importance of an overall architecture rapidly increases.
Certainly one of the most important current concerns with respect
to architectural definition is robustness. The architecture has to be
designed in the light of the system risks. The principal ones are:
o Failure of a key component - usually one
at a single point of failure
o System design or implementation shortcoming
that results in information spillage
o Substantial damage by accident or natural
causes
o Sabotage by an Agency employee or
facility-cleared contractor
o Terrorist acts
o Conventional war
o Strategic war
The emphasis we have placed on these risks in the design of our systems is approximately in the order in which they are listed. That none has had a strong relationship to funding is seen in the fact that there are over 30 single points of failure in the RCC system. That is not the way anybody would like to have it; it is the product of cost-performance optimization in a severely strained budget environment. The other risks have been accorded progressively less concern, and the last three, hardly any. Those that rely heavily, or totally, on a single functional node, such as NFAC on the RCC, are in an extremely vulnerable position with respect to the last five risks.
A remote backup facility would provide a much more robust
environment with respect to the data processing and communications
functions than we now have. To meet operational objectives and be
cost effective, it should probably meet the following criteria:
o Support combined data processing and communications
functions.
o Have a "critical mass" that would permit it to
maintain an adequate level of IHS functions to
meet Agency operational responsibilities, given
o Be located at least 100 miles outside the Washington
metropolitan area. (A location well south of
Washington would permit a higher, and thus less
vulnerable satellite antenna attitude.)
o Have some colocated Agency operational functions that require extensive use of the data processing and communications facilities. (Only such normal daily loading of the facility would assure its continuous operational value to support emergency operations.)
The cost of a remote backup facility may not be as substantial as it would first appear. The data processing equipment could be almost totally provided from existing stock, and much of the communication equipment may be similarly available as well. (ODP creates excess mainframes when it has to buy larger, new-technology machines to meet demand. OC will have switching facilities -- principally the Network Switching Centers (NSCs) -- that MERCURY will provide, which perhaps could be placed in a remote site instead of Headquarters.) On this basis, the investment required in a remote facility is principally in the facility itself, cryptographic and communications link equipment, and any extra cost associated with concealment and deception relevant to the function of the facility.
Whether such a remote facility alternative makes sense depends on two key factors in addition to IHS fragility overseas: the level of the threat and cost. An expert assessment of the threat is beyond the purview of this strategic planning activity. Determination of the cost can only come as the product of concentrated study by a small team dedicated to the evaluation of the issue. In fact, any such evaluation should develop alternative configurations in a gradation of capability, and cost, corresponding to those alternatives.
In summary, the elements of physical security to be considered in
response to the various threats to IHSs include:
o Physical separation of primary resources
o Mutual backup of separated facilities to provide system fail-soft operational characteristics
o Provision for a backup facility, of a
specified "critical mass," more than a
hundred miles away from Headquarters
o Greater physical protection of each
resource center to increase the closest
point of approach of a sabotage-type
threat
o Even more stringent access control to
vital resource spaces, e.g., special
personnel identification cards with
associated monitoring equipment.
There are also electronic architectural concerns. The electronic
and physical architectural issues concerning robustness are coupled.
The capability for mutual backup of physically separated facilities
puts a substantial additional constraint on the electronic
architecture. One of the chief elements of this electronic architectural constraint is that the hosted functionalities be performable on an alternative resource of lesser scale, and perhaps simpler architecture. There are, however, electronic architectures
for central systems, which essentially preempt backup within the
context of practical cost. Local nets suffer from the fact that,
lacking dedicated operational personnel, backup discipline will be
loosely applied. As a consequence, wherever precious data is
involved, it should be stored either on a central system or a mini-
computer system large enough to have dedicated operational personnel.
1. How Important is Improved Availability of IHS Services?
This question refers to "routine" failures of components. The
answer,of course, varying somewhat with the type of service. The
issue is the level of availability that we need to support Agency
operational needs.
In the foregoing discussion it was suggested that one rationale for setting Ao goals is to provide a constant loss of manhours, or investment, with failures. In this context, if MTTR is a constant, then the MTBF required is proportional to such parameters as the number of logged users on VM or the batch CPU time. (Evaluation of the MTTRs implied by table 1 indicates remarkable constancy: about 1 hour.) If MTTR can be progressively reduced with increasing MTBF, then the increase in MTBF might not have to be so great. Other rationales might be considered, but there should be some rationale to guide this objective.
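A small sketch of this constant-lost-manhours rationale follows; the baseline MTBF and the load factors are illustrative assumptions, while the fixed 1-hour MTTR follows the table 1 observation above.

```python
# Required MTBF if lost user-hours are held constant while load grows,
# with MTTR fixed at ~1 hour. Baseline MTBF is an assumed illustrative value.
mttr = 1.0            # hours
baseline_mtbf = 32.0  # hours, consistent with Ao of about 0.97

for load_factor in (1, 2, 5, 10):
    mtbf = baseline_mtbf * load_factor      # MTBF must scale with the load
    ao = mtbf / (mtbf + mttr)
    print(f"load x{load_factor}: MTBF {mtbf:.0f} h, Ao {ao:.3f}")
# load x1: Ao 0.970 ... load x10: Ao 0.997
```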
The base issue is identified as how important is the availability,
rather than what should it be, because the latter is subject to
competing priorities and implementation engineering realities.
Improved availability is going to require investment, and the question
is ultimately going to be, What is it worth?
2. What is the Requirement for Assured IHS Capabilities?
In considering the requirements for mutual backup of headquarters
central processing facilities and for a backup facility, the first
question is, What is the priority and requirement for assured IHS
capabilities? These are capabilities whose availability must be
restored within a specified time to avoid significant damage to
national intelligence operations.
Since this is a first, crude look at such a question, the answer
should not be expected to be precise. The word "significant" is
itself not precise. Ultimately, one might like to consider categories
of mission importance, but that appears to be beyond the scope of this
initial effort. The current objective is to get a sense for the scope
of the requirement for assured IHS capabilities, and from this, to
infer the requirement for backup capabilities.
In table 2 is presented the consumption of ODP's resources in 1981
by the larger operational systems. In table 3 are the component-
funded systems. Hopefully, the systems in these two tables are also
most of the important ones with respect to Agency operations. There
may be others, however, of lesser size which are of equivalent
importance. If so, they should be added. The requirement for assured
capabilities should be based on evaluation of the specific systems of
the Agency, as presented in tables 2 and 3.
3. How Important is it to Upgrade Contingency Utilities?

Currently, we only have UPS power conditioning for central communications and data processing facilities in the Washington metropolitan-area buildings. How important is it to retrofit such buildings as Headquarters [redacted] to provide total power conditioning? (It is assumed that our tenancy in the other metropolitan area buildings is sufficiently short-term that their retrofit is clearly not warranted. This assumption should be verified, however.)
A convenient measure in evaluating this issue may be the investment required per person. Is this investment considered worthwhile? With what priority? Since the value of such total conditioning strongly relates to the installation of terminals and local nets, the time phasing of such total power conditioning should relate to the terminal installation goals developed by the Information Handling Facilities working group.
The reliability of the existing UPS systems which support the
central system is also a legitimate dimension of this issue. Since
this is an integral aspect of the availability of the central systems,
it is probably best dealt with in this context, however.
IV. Discussion Questions
The following is a list of questions which probably need to be answered in order to deal successfully with the foregoing issues.
1. What IHS functions are most critical to Agency users?
How long can the unavailability of each of the systems
listed in tables 4 and 5 be sustained before there is
significant damage to operational intelligence functions?
Please complete tables 4 and 5 by selecting time groupings for each system and also indicate your estimate of desired Ao. For like systems, what are
the best estimates of system availability figures
that users will require in the 1985 to 1989 timeframe?
On this basis, what functionalities or systems need to
be backed up against a normal host center loss? What
does all of this add up to in terms of non-Washington
area backup capability?
2. What opportunities have been identified for non-Agency
processing in a capacity-loss contingency? Have they
been explored with respect to security acceptability
and compatibility? Have they been tested in the backup mode?
3. How damaging to Agency operational functions would be the loss of [redacted] or a major computer center? Is the risk high enough to warrant an investment in a detailed evaluation of a remote facility?
4. What priority should be given, in general,
to development of contingency capabilities
for IHSs? For example, is the investment and
disruption associated with development and
testing for identified mutual backups worthwhile?
Should simulated failure exercises be conducted
on a regular basis to validate the back-up
capability?
5. What are the availability implications of a local processing capability for identified functionalities? It should be recognized that for such local capabilities:

- Overall functional availability may well be less unless complete data base and application software compatibility with central systems exists.

- Backup is likely to be unreliably performed unless dedicated operational personnel are provided.

(Complete compatibility = software runs either centrally or locally, without modification, and files use the same DBMS in both environments and can be transferred in either direction.)
6. What sort of new approaches might be evaluated in the next phase of this planning effort to obtain high-reliability machinery in a competitive procurement environment? For example, can we make audit of supplier quality assurance processes and procedures an element of proposal evaluation? Are there instances where we should require that suppliers provide verified reliability data from independent test laboratories in their proposals? And should we plan statistically designed acceptance test and evaluation of a sample lot of equipment?
Next 3 Page(s) In Document Exempt