THE PARAMETERS OF AN OPERATIONAL MACHINE TRANSLATION SYSTEM
Document Type:
Collection:
Document Number (FOIA) /ESDN (CREST):
CIA-RDP80B01083A000100160026-0
Release Decision:
RIFPUB
Original Classification:
K
Document Page Count:
10
Document Creation Date:
December 22, 2016
Document Release Date:
March 27, 2012
Sequence Number:
26
Case Number:
Publication Date:
October 27, 1960
Content Type:
PAPER
File:
Attachment | Size |
---|---|
CIA-RDP80B01083A000100160026-0.pdf | 480.17 KB |
Body:
Declassified and Approved For Release 2012/03/27: CIA-RDP80B01083A000100160026-0
PAPER READ BEFORE THE NATIONAL CONFERENCE OF THE
AMERICAN DOCUMENTATION INSTITUTE
BERKELEY, CALIFORNIA
11:00 A. M. PST 27 October 1960
THE PARAMETERS OF AN OPERATIONAL MACHINE TRANSLATION SYSTEM
by
Paul W. Howerton
Deputy Assistant Director, Central Intelligence Agency
The use of machines to do high-volume, high-speed translation from one natural
language to another is rapidly approaching operational capability. There have been many
claims and counter-claims by several of the centers of research in machine translation
published in the press, and, as is usually the case, there is some truth in each of these
statements useful to our purpose of defining the operational parameters. In this paper
I propose to discuss the current requirements for machine translation and the data base
which can be used to come to a final decision concerning these parameters. I do not
intend to recite the historical development of the field except as this experience is
useful to the purpose of this discussion, since that chore has been well done by the
Committee on Science and Astronautics of the U. S. House of Representatives.*
* U. S. Congress. House, House Report No. 2021, 28 June 1960.
THE STATE OF THE ART
There are two principal schools of thought concerning the development of machine
translation. The first has few advocates, but the few are very articulate. This group
maintains that we must first concern ourselves with the design of special machines to do
the translating. The other school believes that general purpose computers can be used
for some time to come for both research and production in machine translation. Incisive
inquiry resolves this dichotomy to the conclusion that the former group believes the
problem of MT to be a machine one, while the latter believes it to be a linguistic problem.
I count myself in the linguistic group.
There is disagreement between the so-called "pure research types" and those of
us who believe that the need for machine capability is so urgent that we are willing to
be satisfied for the time being with finding a routine that works reasonably well and
whose operations are based on potentially transcendent concepts.
There are some who believe that a machine should be able to turn out a grammatically
and syntactically perfect product before we attempt production. It seems strange
that a machine should be expected to turn out translations which require no editing or
revising when human translators cannot. There is no translation facility in the
government or elsewhere known to me which does not use a review process for polishing its
product and assuring meaning transfer. Although a few brave souls have tried to assign
percentages of adequacy to machine translated materials, they have never been very
successful in relating their percentages to a base which was constant. In another section
of this paper I shall put forth some experience which I believe will form a constant base
for evaluation.
Because my task here is to talk about operational capability, I shall not speak
to the theoretical research being so ably carried on by several research centers.
Rather, I shall now make the categorical statement that, in my opinion, based on
association with machine translation research since 1952, the United States can look
forward to an acceptable machine production capability in 6 to 10 disciplines in a year's time.
The Air Force program has a general vocabulary now in being, which is able to make
word-by-word translations from Russian language newspaper text. Our program at
Georgetown University under Prof. Leon E. Dostert is now capable of translating from
Russian randomly selected texts in organic chemistry and very soon will be able to
accept texts in economics. By early spring 1961 we shall have vocabularies in physical
chemistry, geophysics, high energy physics and solid state physics to add to our
present lexical repertory. The computer program at Georgetown is being changed
over from its original form for the IBM 705 computer to the IBM 7090. With the
vocabularies in the six disciplines listed above, we expect to have turned out by
mid-1961 about 6 million words of text which have never before been translated
and which were not used in the development of the MT program.
Although I postulate the state of the art of machine translation to be of a
sufficient level to warrant operational machine translation production from
Russian-language materials, I do not wish to suggest that all problems in the transference of
meaning from one language to another by machine have been completely solved.
Further, although I am considered one of the strongest advocates of an operational
machine translation system now, I wish also to be counted as one who would raise
his voice in support of any meaningful research which would continue the upward trend
in quality of the machine translated output.
THE MAGNITUDE OF THE TRANSLATION PROBLEM
Our most immediate concern is with the translation of the Russian scientific and
technical press for the benefit of the American scientific community and through it the
national security. With the availability of this material in a form usable by the scientist
in this country who has no capability in the Russian language, we shall be able to appraise
the present state-of-the-art and the probable directions of scientific research in the
Soviet Union. In our early planning for the establishment of operational machine
translation, we reviewed the scientific literature output of the USSR for 1958. These
findings are summarized in the table below.
Table I
Soviet Scientific & Technical Publications for 1958*
Scientific Field                          Words
Physicomathematical Sciences         80,255,000
Chemical Sciences                    26,015,000
Biological Sciences                  40,968,000
Geological-Geographical Sciences     85,515,000
Medical Sciences                    153,948,000
  Subtotal                          386,701,000
Engineering-Industrial              488,375,000
  Total                             875,076,000
* Source: Accumulation of data from 1958 issues of Letopis' Zhurnal'nykh Statey
(Annals of Journal Articles) and Knizhnaya Letopis' (Book Annals).
If even half of the scientific material were worth translating, we would have
a total load of over 1 million words per day for every day of the year. The question
has been put to me several times as to who would read all of this material. This
question is an absurdity, since no one person would want to read all of this output
under any circumstances, any more than anyone would wish to read all the books in
the Library of Congress. The real benefit lies in making the material available soon
after publication without the ordinary delays of getting translations made by human
effort. No one wants all this translated material, but everyone wishes to be able to
select from it.
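The arithmetic behind the daily load can be checked with a short sketch. The word counts are those of Table I; the variable names, and the reading of "half of the scientific material" as half of the 875,076,000-word total spread over a 365-day year, are assumptions made here for illustration.

```python
# Word counts from Table I (Soviet scientific and technical publications, 1958).
SCIENCES = {
    "Physicomathematical Sciences": 80_255_000,
    "Chemical Sciences": 26_015_000,
    "Biological Sciences": 40_968_000,
    "Geological-Geographical Sciences": 85_515_000,
    "Medical Sciences": 153_948_000,
}
ENGINEERING_INDUSTRIAL = 488_375_000

# The two unlabeled figures in the table are the running sums.
sciences_subtotal = sum(SCIENCES.values())
grand_total = sciences_subtotal + ENGINEERING_INDUSTRIAL

# Assumption: "half" is taken of the grand total and spread over 365 days.
daily_load = grand_total * 0.5 / 365

print(f"{sciences_subtotal:,}")  # 386,701,000
print(f"{grand_total:,}")        # 875,076,000
print(f"{daily_load:,.0f}")      # 1,198,734
```

Under that reading the load comes to roughly 1.2 million words per day, consistent with the figure of "over 1 million" given in the text.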
It may be interesting to note that a scientific linguist working full time on the
translation of Russian material is able to translate only about 1800 words per day.
With existing and forthcoming machine programs, it is or will soon be possible to
translate up to 50,000 words per hour and, as the programs become refined and as
more efficient methods of input and output are developed, there seems to be no
reason why this rate could not be increased to between 150,000 and 200,000 words
per hour.
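A rough comparison of the human and machine rates quoted above against a load of about 1 million words per day can be sketched as follows. The rates are those stated in the text; the round 1,000,000-word target and the helper names are assumptions for illustration.

```python
import math

DAILY_LOAD = 1_000_000                      # words per day (round figure, assumed)
HUMAN_RATE = 1_800                          # words per translator per day
MACHINE_RATES = (50_000, 150_000, 200_000)  # words per hour

# Full-time human translators needed to keep up with the daily load.
translators_needed = math.ceil(DAILY_LOAD / HUMAN_RATE)


def machine_hours(rate_per_hour):
    """Machine hours per day needed to translate the daily load."""
    return DAILY_LOAD / rate_per_hour


print(translators_needed)  # 556
for rate in MACHINE_RATES:
    print(rate, round(machine_hours(rate), 1))
```

At 50,000 words per hour the machine would need about 20 hours of each day; at 200,000 words per hour, about 5.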
At the present time all machine translation research centers are using
either punched cards or punched paper tape as the input medium. Our experience
with the preparation of punched cards has shown that a first-class card punch
operator is able to prepare about 9,000 words per eight-hour shift with an extremely
low error rate. As a matter of fact, although these card punch operators had had
no previous experience with Cyrillic alphabet materials, with minimum training they
were able to achieve error rates which were lower than the rates demonstrated by
operators who were transcribing materials in the Latin alphabet. In order to satisfy the
input requirements for our suggested million-words-a-day production, a staff of more
than one hundred card punch operators capable of the production rate described above
would be needed. Our experience with punched paper tape has been that although a
paper tape machine operator will turn out higher production on a short test, over the
longer range of a continuous eight hour day the card punch operator will turn out
approximately 14% more material ready for the machine. The explanation for this
situation lies in the fact that the correction of errors on punched cards is considerably
simpler and less time consuming than the correction of errors on paper tape.
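The staffing estimate can be reproduced with a minimal sketch. It assumes one eight-hour shift per operator per day and a round load of 1,000,000 words; the paper tape rate is inferred here from the 14% figure rather than taken from a measurement, and the function name is hypothetical.

```python
import math

DAILY_LOAD = 1_000_000        # words per day (round figure from the text)
CARD_WORDS_PER_SHIFT = 9_000  # first-class card punch operator, eight-hour shift
CARD_ADVANTAGE = 1.14         # card punch yields about 14% more than paper tape

# Assumption: paper tape output over a full day, inferred from the 14% figure.
tape_words_per_shift = CARD_WORDS_PER_SHIFT / CARD_ADVANTAGE


def operators_needed(words_per_shift, daily_load=DAILY_LOAD):
    """Operators required if each works one full shift per day."""
    return math.ceil(daily_load / words_per_shift)


card_operators = operators_needed(CARD_WORDS_PER_SHIFT)
tape_operators = operators_needed(tape_words_per_shift)

print(card_operators)  # 112
print(tape_operators)  # 127
```

The card punch figure of 112 agrees with the "more than one hundred" operators mentioned above; on the same assumptions paper tape would call for about 15 more.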
The ultimate in our present horizon of input capability is the early development
of a machine which will read directly from original text and translate that original
text from its printed form into a digital machine language acceptable by the computer.
The present state of development of reading machines suggests a rate of input of
approximately a hundred words per second. This rate is completely acceptable and
compatible with the translation rates which we have suggested to be the optimum in
computer equipment now in being or contemplated. The principal problem as yet
unsolved is the transcription of graphic representations on a page of text. The training
of a reading machine to recognize graphic materials and the routines to place these
graphic materials correctly in the output text remain to be developed. As an interim
measure we shall have to be satisfied with a reading machine which will input textual
materials at a net rate of 50 words per second and then we shall manually insert the
graphics as they should appear in the output text.
The parameters of input then call for a capability to feed the machine fifty
words a second, a capability which appears to be in the immediate offing, and an
ultimate input rate of 100 words per second.
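To see how these reading rates compare with the translation rates discussed earlier, the per-second figures can be converted to words per hour. This is a simple unit conversion; the function names and the round 1,000,000-word daily load are assumptions for illustration.

```python
SECONDS_PER_HOUR = 3_600
DAILY_LOAD = 1_000_000  # words per day (round figure, assumed)


def words_per_hour(words_per_second):
    """Convert a reading rate from words per second to words per hour."""
    return words_per_second * SECONDS_PER_HOUR


def hours_to_read(words_per_second, load=DAILY_LOAD):
    """Hours of continuous reading needed to input the daily load."""
    return load / words_per_hour(words_per_second)


interim = words_per_hour(50)    # interim reading machine
ultimate = words_per_hour(100)  # ultimate input rate

print(interim, round(hours_to_read(50), 2))    # 180000 5.56
print(ultimate, round(hours_to_read(100), 2))  # 360000 2.78
```

The interim rate of 180,000 words per hour already exceeds the present translation rate of 50,000 words per hour and falls within the projected range of 150,000 to 200,000; the ultimate rate exceeds that range.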
As mentioned above there are some who will argue the value of the special
purpose computer for machine translation over the use of the general purpose
computer. I have no doubt that at some time in the future as the methods of machine
translation become more and more refined we shall find it desirable to have a special
purpose, linguistic computer built. However, at the present time there appears to be
no reason why such a special purpose machine is necessary. There are many computers
capable of doing machine translation available in the United States at the present time.
As routines and programs are developed for these various brands of computers, it
will be possible for institutions or firms having such machines to do their own automatic
translation when their requirement for such translation does not even approximate that
which would justify the acquisition of a special purpose, linguistic computer. Therefore,
I conclude that for the time being the general purpose computer will be quite adequate
for the planning for an operational machine translation capability.
The reliance on table-look-up as opposed to algorithmic programs contributes
neither to efficient nor to economical machine translation. If all of the paradigms
of a language must be maintained in table form, there is a great expense in memory.
On the other hand the use of algorithmic routines will permit the storage of only the
stem form of words with the computer carrying out the necessary logical analysis to
identify the morphology and the function of a word in a sentence. For the time being
it seems to me to be desirable that both the table-look-up method and the algorithmic
method be pushed forward with deliberate speed so that sufficient evidence can be
assembled to permit a decision as to which of these methods is superior.
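The storage trade-off between the two methods can be illustrated with a toy sketch. The two transliterated Russian nouns, the five singular endings, and all function names are chosen here for illustration only; this is not the Georgetown program or its dictionary, and real morphological analysis involves far more than suffix splitting.

```python
# Toy data (assumed for illustration): two transliterated Russian feminine nouns
# and a few singular case endings. This is not an actual MT dictionary.
STEMS = {"gazet": "newspaper", "raket": "rocket"}
ENDINGS = {
    "a": "nominative singular",
    "y": "genitive singular",
    "e": "dative or prepositional singular",
    "u": "accusative singular",
    "oy": "instrumental singular",
}

# Table-look-up method: every inflected form is stored as its own entry.
FULL_FORM_TABLE = {
    stem + ending: (gloss, label)
    for stem, gloss in STEMS.items()
    for ending, label in ENDINGS.items()
}


def lookup(word):
    """Table-look-up: find the whole inflected form in the stored table."""
    return FULL_FORM_TABLE.get(word)


def analyze(word):
    """Algorithmic method: split the word into a stored stem plus a known ending."""
    for cut in range(1, len(word)):
        stem, ending = word[:cut], word[cut:]
        if stem in STEMS and ending in ENDINGS:
            return STEMS[stem], ENDINGS[ending]
    return None


print(lookup("gazetu"))   # ('newspaper', 'accusative singular')
print(analyze("gazetu"))  # ('newspaper', 'accusative singular')
print(len(FULL_FORM_TABLE), len(STEMS) + len(ENDINGS))  # 10 7
```

Both methods give the same answer here, but the table stores one entry per stem-ending pair while the algorithmic version stores stems and endings separately, so the gap in memory widens as the vocabulary grows.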
There are some workers in the field who have insisted that the responsibility
for determining the quality of translation lies with the MT research personnel. I
believe that the only meaningful criterion which can be applied to machine translation,
or human translation for that matter, is the effective transference of meaning from
one language to another. To satisfy ourselves that this transference of meaning was
in fact taking place, an experiment was conducted using a single observer who was
qualified in both the Russian language and the substance of the material under discussion.
He examined the machine output sentence by sentence and compared the translation with
the original Russian text. His findings were that there was effective meaning transfer.
We then undertook a more extensive research program in which a similar analysis
was carried out by a group of about one hundred scientists broken up into four groups.
The first group had substantive knowledge of the material which had been translated
and also Russian language capability. The second group had knowledge of the discipline,
but not the Russian language. The third group had the Russian language capability but
no expertise in the substance. And the fourth group had neither knowledge of the
Russian language nor of the discipline of the test materials. The summary results
of this experiment showed that in the case of the first group full meaning transfer
had taken place and the translated text was acceptable. The second group, whose
grasp of the discipline was good but whose language capability was slight or
non-existent, found more difficulty sorting out the meanings in lexical gaps, but they
still found meaning transfer to be recognizable. Frustration was apparent with the
two groups whose knowledge of the substance was either absent or minimal - frustration
which at times manifested itself in condemnation of machine translation. Please note
that all respondents who had knowledge of the discipline found the machine translation
acceptable and usable. This I believe to be the overriding criterion.
THE PARAMETERS OF OUTPUT
At the present time the machine output is put onto magnetic tape and an
off-line print-out is made. Under conditions of large scale production, this method may
be unsatisfactory. There are in being, however, several devices which will permit
high-speed and high-capacity alpha-numeric output from a computer. There remains
only to determine the relative economics of the two methods - there is a limit to the
number of off-line print-out devices one may use before the costs overtake the capital
investment and operating cost of on-line equipment.
A great controversy has developed concerning the degree and type of post-editing
required for the machine output before publication. There are some who are so naive
as to think that a machine will be developed which can turn out machine translation
not requiring post-editing. Those of us who have been concerned with translation of
materials for some years know that this is not realistic. In his book entitled
"Cybernetics of the Present and Future," Yu. I. Sokolovskiy, in discussing the quality
of automatic translation from the Russian point of view, states: "On the whole one
may say that a machine translation needs approximately the same amount of editing
as a man-made translation." In order to determine the qualifications of a good
post-editor, we believe it necessary to carry on a series of experiments using actual
machine output, and with people of varying qualifications, to arrive at some sort of
reliable criteria for personnel selection. Such a program is now underway at
Georgetown University.
AN OPERATIONAL MACHINE TRANSLATION CENTER
The first approximation of an operational machine translation center shall
have available in it three principal equipment complexes. The first of these shall
be the mechanical reading device which shall convert the printed form of literature
into machine acceptable language. The second complex shall be the translator itself
which, for the time being, can be a general purpose computer, but at some time in
the future will probably be a special purpose computer. The third complex shall be
the equipment necessary for accepting the output of the machine and converting it
into printed form in as expeditious a manner as possible. Because of the speeds which we
believe practically obtainable, it does not appear necessary to contemplate the existence
of more than one translation center for Russian language materials for the immediate
future. However, as our capability grows and we are able to handle new languages and
new disciplines, expansion of the center to greater capacity, or the creation of other
centers to deal with other languages, may be desirable.
To review then - we must set up a center which will be capable of translating
approximately 1 million words per day starting from the raw publication and ending up
with a printed form of the output ready for post-editing. At the present time the
rate-determining step in this enterprise will be the input step. However, with the
development of reading machines, it is our belief that this step will not long remain a
problem area.
Let us not ask of machine translation more than we have asked of other
scientific developments in the past. The aircraft of 20 years ago was considerably
slower and of shorter range than equipment in use today. But that fact did not interfere
with the use of the then existing capability while new and better machines were developed.
Let us remember that the greatest enemy of progress is perfection.