A SURVEY OF FREE-RESPONSE JUDGING PRACTICES
Document Number (FOIA) /ESDN (CREST): CIA-RDP96-00792R000701020002-6
Release Decision: RIFPUB
Original Classification: U
Document Page Count: 17
Document Creation Date: November 4, 2016
Document Release Date: May 17, 2000
Sequence Number: 2
Content Type: RP
Approved For Release 2000/08/15: CIA-RDP96-00792R000701020002-6
A SURVEY OF FREE-RESPONSE JUDGING PRACTICES
Julie Milton
Psychology Department
University of Edinburgh
7 George Square
Edinburgh EH8 9JZ
Scotland, U.K.
An idealised model of the free-response judging process is developed, and
its elements discussed in terms of judging practices in those free-response
studies published in full between 1964 and 1985. A wide variety of
occasionally conflicting judging practices was found, along with valuable
indications for further research in this important area.
ACKNOWLEDGEMENTS: My thanks are due to Nancy Zingrone and Deborah Weiner for
allowing me to use a draft version of their free-response bibliography.
While free-response methodology has been popular in ESP studies over
recent years, very little research has been directed to the important question
of how best to judge the correspondence between free-response material and the
target. However, many experimenters have commented on judging issues, or have
reported relevant analyses or data which, when brought together, may suggest
strengths and weaknesses in our judging practices, and promising directions
for future research.
With these aims in mind, I have examined various aspects of procedure
which might influence the success of judging, using as a database eighty-five
free-response studies in which statistical assessment of the results was
attempted and which were published in full between 1964 and 1987 inclusive, in
the Journal of Parapsychology, Journal of the American Society for Psychical
Research, Journal of the Society for Psychical Research, International Journal
of Parapsychology, and European Journal of Parapsychology. Space constraints
prevent me from presenting a summary table of these studies and their full
references, but these can be obtained from me on request. All of the papers
in these journals (whether experimental or not), and those appearing in
Research in Parapsychology during the same period, were searched for
commentary relevant to free-response judging, as well as other sources where
appropriate. The survey is in two sections. In the first section, a model of
an ideal judging process is presented, and its elements discussed in terms of
their importance in current judging practices. The second section addresses
the issues of whether percipients or independent judges are best suited to
perform the complex judging task, and what qualities a judge should have.
Finally, the findings of the review are discussed with their implications for
further research.
The underlying structure of the judging process
In a free-response ESP experiment, the percipient's task is to observe
and report his or her thoughts, imagery, feelings and mental or physical
experiences, which might relate to a randomly selected target. In
free-response studies, the targets used are generally fairly complex (they may
be people, or geographical locations, objects, and so on). The targets may
have elements (such as colour, the presence or absence of people) which differ
in their salience for the percipient, and in their frequency of occurrence.
In addition, targets may be regarded as possessing various broad categories of
content (such as semantic content, or emotional content), and these broad
categories may themselves differ in salience. The salience of both individual
elements and categories of content may differ from one percipient to another,
depending on individual differences.
Just as free-response targets are complex and varied, so too are the
mentations reported by percipients. Mentations may be in the form of imagery
in any sense modality, or merely abstract concepts; they may be vivid, bizarre,
fleeting, spontaneous, or have other distinguishing characteristics. Content
of various kinds may be present in them, with varying chance frequencies of
occurrence. Mentation items may relate in a variety of ways to the target
material, such as semantically or by association, and to a greater or lesser
degree. The type of correspondence may vary from percipient to percipient, or
from mentation to mentation, or both. Certain types of mentation, and certain
kinds of target-mentation correspondence may be more likely to carry psi
information than others.
The function of a free-response judge (in process-oriented research at
least) is to estimate the
probability that psi was responsible for any resemblance between the target
and the mentation (or inversely, the strength of the ESP component on a given
trial). In the complex situation described above, one way of looking at the
task of an ideal judge is that he or she should:
(i) Assign some numerical value in proportion to the degree of
correspondence between a single mentation item and the target (and, in some
types of judging, to the controls);
(ii) increase this value (given a perfect match) in accordance with the
rarity of occurrence of the mentation item's content in the mentation of all
percipients in similar experimental conditions (or in the mentation of that
particular percipient on other trials, if such data is available);
(iii) increase this value (given a perfect match) in accordance with the
rarity of occurrence of the mentation item's content in the entire
experimental target pool;
(iv) increase this value in accordance with the likelihood that the
mentation item, by virtue of its characteristics, is psi-related (e.g.,
whether it was bizarre, vivid, spontaneous, or whatever characteristics, if
any, are shown by research to mediate ESP);
(v) increase this value in accordance with the salience which the
content of the mentation has for the percipient (e.g. if research shows that
the presence of people in a target is highly salient to a percipient, then a
mentation item bearing on the presence or absence of people would be weighted
relatively heavily);
(vi) increase this value in accordance with the likelihood that the type
of correspondence (semantic, emotional, etc.) between mentation item and
target carries psi-related information, if such differences in likelihood are
indicated by research.
Having thus arrived at a weighted measure of the correspondence between
each mentation item in a trial and the target (and controls if appropriate),
the measures may be summed across the trial or otherwise combined to yield the
ESP score for that trial. Although this procedure resembles an atomistic
judging procedure most closely in its structure, it can also be thought of as
an implicit or idealised basis for holistic or coded judging procedures. In
holistic judging, it is possible to think of the overall rating assigned to
items in the judging set as a sum of individual mentation ratings weighted as
appropriate. In coded judging, the decision of whether a given content
category was present or absent could be regarded as being made according to
the sum of weighted ratings of relevant mentation items. Further weightings
could then be assigned to each decision according to the known salience of the
content category and the rarity of that value of the code in the target pool.
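As a rough illustration only, the six elements of the idealised model might be combined as in the sketch below. The model as described says only that the basic correspondence value should be "increased" in accordance with each factor; the multiplicative weighting, the logarithmic rarity function, and all of the names used here (MentationItem, rarity_weight, item_score, trial_score) are assumptions made for the sake of the example, not procedures taken from any of the studies surveyed.

```python
from dataclasses import dataclass
from math import log2
from typing import Iterable


@dataclass
class MentationItem:
    """One mentation item as seen by a hypothetical ideal judge."""
    correspondence: float        # (i) judged match with the target, 0..1
    mentation_freq: float        # (ii) chance frequency of this content in mentation, in (0, 1]
    pool_freq: float             # (iii) chance frequency of this content in the target pool, in (0, 1]
    psi_likelihood: float = 1.0  # (iv) weight for item characteristics (vivid, bizarre, ...)
    salience: float = 1.0        # (v) weight for salience of the content to the percipient
    type_weight: float = 1.0     # (vi) weight for the type of correspondence


def rarity_weight(freq: float) -> float:
    """Assumed rarity weighting: 1 for content that always occurs, larger for rarer content."""
    if not 0.0 < freq <= 1.0:
        raise ValueError("frequency must lie in (0, 1]")
    return 1.0 - log2(freq)


def item_score(item: MentationItem) -> float:
    """Weighted correspondence for one item (multiplicative combination is an assumption)."""
    return (
        item.correspondence
        * rarity_weight(item.mentation_freq)
        * rarity_weight(item.pool_freq)
        * item.psi_likelihood
        * item.salience
        * item.type_weight
    )


def trial_score(items: Iterable[MentationItem]) -> float:
    """ESP score for a trial: the weighted item scores summed across the trial."""
    return sum(item_score(item) for item in items)


if __name__ == "__main__":
    trial = [
        # A perfect match on content that always occurs: no rarity bonus.
        MentationItem(correspondence=1.0, mentation_freq=1.0, pool_freq=1.0),
        # A partial match on rare, highly salient content.
        MentationItem(correspondence=0.5, mentation_freq=0.5, pool_freq=0.25, salience=2.0),
    ]
    print(trial_score(trial))
```

The same functions could be applied to each control in the judging set as well as to the target, so that the trial scores can be compared or ranked. Any other monotonic weighting scheme would serve equally well as an illustration; which weights, if any, are justified is exactly the empirical question raised above.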
The importance of elements of judging in the literature
Each of the six elements of judging has, in various forms, received
occasional attention, either implicitly or overtly, in experimental and
theoretical papers, although very little direct or systematic research has
been done on this topic. Most opinion about how best to judge free-response
material seems to be based on anecdotal observations. While such observations
may be unreliable, they may also contain useful information about aspects of
judging which should be investigated empirically. This being so, each of the
six elements of judging is discussed in turn below in the context of
commentary and experimental results in the literature surveyed.
(i) Assignment of a numerical value to correspondence
Ideally, the value assigned to the correspondence between a mentation
item and a target should reflect the correspondence in some objective (and
hence reliable) way. Sixteen studies, reported in 10 papers in the database
surveyed, used atomistic judging, but in no case was interjudge reliability
calculated for the allocation of such ratings. In eight of the studies, each
point on the rating scale was labelled for the use of the judges (e.g., 0 =
"no correspondence"), which practice might be expected to increase interjudge
reliability. The number of points on the rating scale ranged from two to
eleven, with a mean of 4.2. It is possible that scales at the low end of the
range are too constrained to be sensitive, while those at the high end require
judges to make finer judgements than is appropriate, and so are insensitive in
effect because they increase error variance. In this latter case of a large
rating range, interjudge reliability may be reduced. The
same may be true of holistic rating scales, which ranged from 4 points to 101,
and which were clearly reported as being labelled in only 14 out of the 52
studies in which a holistic scale was used. The number of items in the
judging set may be a factor in determining the appropriate rating scale; in
the studies surveyed, set size ranged from 2 to 36 items. Any future research
which addresses the issue of the appropriate rating scale in this task could
most usefully do so in the context of active training of judges, with
feedback, in the use of such scales. Boerenkamp (1984) had considerable
success in training eight independent judges to rate each statement made by a
"psychic" about a missing person on a fully-labelled four-point scale of
likelihood that it would apply to anyone in the population. To test the
reliability of the judges' ratings, the judges were randomly assigned to two
groups of four, and the average ratings of each statement were correlated,
yielding correlations ranging from rs = +0.66 (36df, p