ANOMALY OR ARTIFACT? COMMENTS ON BEM AND HONORTON
Document Number (FOIA)/ESDN (CREST): CIA-RDP96-00789R003200100001-5
Release Decision: RIFPUB
Original Classification: U
Document Page Count: 9
Document Creation Date: November 4, 2016
Document Release Date: October 27, 1998
Sequence Number: 1
Content Type: BULL
Approved For Release 2000/08/08 : CIA-RDP96-00789R003200100001-5
Psychological Bulletin
1994, Vol. 115, No. 1, 19-24
Copyright 1994 by the American Psychological Association, Inc.
0033-2909/94/$3.00
Anomaly or Artifact? Comments on Bem and Honorton
Ray Hyman
Bem and Honorton imply that the 11 autoganzfeld experiments demonstrate the existence of psi, a communications anomaly. They claim that the autoganzfeld results are consistent with previous parapsychological findings and constitute evidence for a replicable psi effect. Although the autoganzfeld experiments are methodologically superior to previous parapsychological experiments, the tests of their randomization procedures were inadequate. The autoganzfeld experiments consistently produced positive hit rates, whose combined effect was highly significant. However, these experiments produced important inconsistencies with the previous ganzfeld experiments. They also showed a unique pattern in the data that may reflect a systematic artifact. Because of these unique features, we have to wait for independent replications of these experiments before we can conclude that a replicable anomaly or psi has been demonstrated.
Bem and Honorton (1994) imply that if psychologists were familiar with the most recent parapsychological research, they would be more willing to accept the possibility that a communications anomaly exists. In particular, Bem and Honorton focus on the experiments that are based on the ganzfeld procedure. They "believe that the replication rates and effect sizes achieved by one particular experimental method, the ganzfeld procedure, are now sufficient to warrant bringing this body of data to the attention of the wider psychological community" (Bem & Honorton, 1994, p. 4). They review the debate between Honorton and me over the original ganzfeld experiments. Hyman (1985) found that these studies suffered from statistical, methodological, and documentation problems. Honorton (1985) responded that these flaws were not sufficient to account for the observed hit rates. Bem and Honorton (1994) review this controversy and cite reviewers who apparently agree with Honorton's position. The implication is that despite the deficiencies in the ganzfeld experiments, the results support the existence of psi, a communications anomaly.
To Honorton's credit, he initiated a new series of experiments that would be free from the flaws of the earlier ganzfeld database (Honorton et al., 1990). These 11 new experiments, called the autoganzfeld studies, yielded consistently positive hit rates and a highly significant overall effect. Because these new experiments showed positive results and allegedly were consistent with the earlier ganzfeld database and other psi research, Bem and Honorton implied that parapsychology had found its previously elusive repeatable experiment.
Since the beginnings of psychical research in the mid-nineteenth century, its investigators have believed that they have scientific evidence sufficiently strong to place before the general scientific community. Each generation has tried to get the attention of the scientific community with findings that they claim to be irrefutable. The particular evidence put forth has changed from generation to generation. What a previous generation of parapsychologists considered to be a solid case for psi was abandoned by later generations in favor of a more current candidate. This shifting database for parapsychology's best case may be why parapsychology still has not achieved the recognition it desires from the general scientific community.
Now Bem and Honorton (1994) believe that they have a strong case to put before the psychological community. They admit that the autoganzfeld findings still require independent confirmation. To their credit, they specify the conditions and the required sample size needed to provide adequate power. The informed critic of parapsychology might ask: What makes the current situation different from past claims for psi? Why should we now believe that Honorton and his colleagues have finally found a way to consistently produce evidence for psi?
We must wait for future attempts at replication before we have an answer to the question. Bem and Honorton appear confident that this time is different. Their review of the ganzfeld and autoganzfeld databases encourages them to believe that consistent psi results are within reach. In this commentary, I provide reasons for believing that the autoganzfeld results contain inconsistencies and some unique patterns that raise doubts about their replicability.¹
Agreements and Differences
Although my commentary focuses on my disagreements with Bem and Honorton's (1994) presentation, I would like to briefly specify some points of agreement. The autoganzfeld studies do comply with most of the "stringent standards" (p. 353) spelled out in the joint communique by Hyman and Honorton (1986). I commend Honorton and his colleagues (1990) for creating a protocol that eliminates most of the flaws that plagued the original ganzfeld experiments. The 11 autoganzfeld studies consistently yield positive effects that, taken together, are highly significant. I concur with Bem and Honorton's admission that "the autoganzfeld studies by themselves cannot satisfy the requirement that replications be conducted by a 'broader range of investigators'" (p. 13). I also support their suggestion that several parapsychologists pool their resources and plan a large-scale ganzfeld replication in which each laboratory contributes a set of trials to the total pool.

Correspondence concerning this article should be addressed to Ray Hyman, Department of Psychology, University of Oregon, Eugene, Oregon 97403.

¹ Although I take a pessimistic position regarding future replications, I think it is good that Bem and the parapsychologists are optimistic. Such optimism should encourage investigators to attempt replications. These replications will eventually decide the issue.
So what is there to disagree about? I disagree with Bem and Honorton about how strongly the autoganzfeld studies support the hope for a replicable psi experiment. Where they see consistency between the autoganzfeld studies and previous parapsychological findings, I see inconsistency. Although I agree that the autoganzfeld studies meet most of the stringent standards that Honorton and I spelled out, I disagree that they meet all of those standards. Our disagreements are a matter of degree. The value of discussing our disagreements is to help clarify what should constitute adequate evidence for the existence of an anomaly. The existence or nonexistence of psi will not be settled by debate. The existence issue will be settled by independent attempts at replication, at least four of which are currently underway (McCrone, 1993).
In explaining my disagreements, I point to weaknesses in the autoganzfeld experiments. I want to emphasize that as a single contribution to the ganzfeld database, these are commendable experiments of high quality. But no single experiment or set of studies can be perfect in all respects. When such a series is given the responsibility of carrying a burden beyond its original purposes, then various deficiencies will inevitably become apparent. This is the case, I believe, with the autoganzfeld studies.
Internal Consistency Within the Autoganzfeld Studies
Bem and Honorton describe the autoganzfeld studies as 11 separate experiments conducted by eight different experimenters. The hit rates are positive and consistent across the studies and the experimenters. Although this is encouraging, the consistency tells us little about potential replicability. Neither the studies nor the experimenters are independent. The studies vary in whether they use naive or experienced subjects. However, the target set, the selection and judging procedures, the laboratory, the setting, and the procedures are identical across studies and experimenters. No experimenter is associated with a single study, nor does an experimenter have independent input into the design and procedure as happens in an independent replication. Indeed, the term experimenter in this context simply refers to a person who plays an already scripted role. Any unique features of the autoganzfeld procedure, including possible artifacts, would be the same for all 11 studies and the eight different experimenters. Consequently, the autoganzfeld studies should be looked on as one large experiment rather than 11 separate contributions.
Consistency With the Original Ganzfeld Database
Bem and Honorton claim that "[the autoganzfeld] results are statistically significant and consistent with those in the earlier database" (p. 13). They cite only two reasons to support this claim. The first is that the overall effect size, or hit rate, is approximately the same in the two databases. This apparent agreement in overall effect size is meaningless. The overall effect size in the autoganzfeld studies is a composite of two significantly different effect sizes: that for the static targets and that for the dynamic targets. The overall effect size in the ganzfeld database is an arbitrary composite of heterogeneous effect sizes, contributed in unequal numbers, from different laboratories. The fact that the two composites yield approximately the same effect size is accidental. Both numbers could easily have been larger or smaller, depending on the mix of the arbitrary sources from which they were composed.
The dynamic targets yielded a significantly higher hit rate than did the static targets in the autoganzfeld studies. Bem and Honorton argued that this was consistent with the finding that the multiple-image targets (View Master stereoscopic slide reels) in the ganzfeld database yielded significantly higher hit rates than did the single-image targets. I do not believe that multiple static images on a View Master reel can be equated to the dynamic moving image on a videoclip. However, I will not argue this point.
Clearly the dynamic targets outperform the static targets in the autoganzfeld studies. Even if this is consistent with the apparent superiority of the View Master targets over the single-image targets, Bem and Honorton (1994) overlook a serious discrepancy. Single-image targets constituted 76% of the 835 sessions in the ganzfeld experiments. Their average hit rate was .346. Given this effect size and the 166 trials using static targets, the power, or probability of replicating this effect in the autoganzfeld experiments, was .82. Yet the static targets failed to yield a significant effect in the autoganzfeld studies. This failure is all the more notable given that these experiments were conducted in "the warm social ambience" (p. 14) of Honorton's laboratory.
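The power figure above is reported without its method. As a minimal sketch, assuming a one-sided exact binomial test at α = .05, a chance hit rate of .25 (one target judged against three decoys), and the ganzfeld single-image rate of .346 as the true rate for the 166 static-target trials, the calculation can be written as follows. The helper names `binom_sf` and `exact_binomial_power` are hypothetical, and a different test or significance level would give a value somewhat different from .82.

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_binomial_power(n: int, p0: float, p1: float, alpha: float = 0.05):
    """One-sided exact test of H0: p = p0 against p > p0.

    Returns (k, power): k is the smallest hit count whose upper-tail
    probability under p0 is at most alpha; power is P(X >= k) when the
    true hit rate is p1.
    """
    k = next(k for k in range(n + 1) if binom_sf(k, n, p0) <= alpha)
    return k, binom_sf(k, n, p1)


# Assumed inputs: 166 static-target trials, chance rate .25, true rate .346.
k_crit, power = exact_binomial_power(166, 0.25, 0.346)
print(f"critical hits = {k_crit}, power = {power:.2f}")
```

Using an exact test rather than a normal approximation sidesteps continuity corrections at this sample size, at the cost of an attained significance level slightly below α.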
Bem and Honorton acknowledge that the autoganzfeld studies failed to replicate the predicted sender-receiver pairing effect. In the original ganzfeld database, the trials on which the receiver chose a friend as a sender produced a hit rate of .44, compared with a hit rate of only .26 for those trials on which the experimenter assigned a sender. I would emphasize that, given an effect of this size and the 198 trials with friends as senders and 128 with someone else as senders, the power of getting a significant replication of the effect is over .92. Again, given the "psi-conducive" atmosphere of Honorton's laboratory, this failure to get significance is a noteworthy inconsistency.
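The pairing-effect power can be sketched in the same spirit with a two-sample comparison of proportions. This is a normal-approximation sketch under stated assumptions, not a reconstruction of Hyman's computation: it treats .44 and .26 as the true hit rates for the 198 friend-sender and 128 assigned-sender trials and uses a one-sided test. `two_proportion_power` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p1, n1, p2, n2, alpha=0.05):
    """Approximate power of a one-sided two-sample z test of p1 > p2.

    Uses the pooled standard error under H0 for the critical difference and
    the unpooled standard error under the alternative.
    """
    z = NormalDist()
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    se1 = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    critical_diff = z.inv_cdf(1 - alpha) * se0
    return z.cdf((p1 - p2 - critical_diff) / se1)


# Assumed true rates .44 (friend senders) and .26 (assigned senders).
print(f"power = {two_proportion_power(0.44, 198, 0.26, 128):.3f}")
```

A two-sided test at .05 corresponds to passing `alpha=0.025` here, which gives a lower figure than the one-sided default.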
On two key comparisons with the original ganzfeld database, the autoganzfeld studies fail to replicate even with adequate power. The positive hit rate and overall significance of the autoganzfeld studies are due to an essentially new type of target, presented in a new way. Even if we agree that there is a kinship between the View Master reels of the ganzfeld experiments and the dynamic targets of the autoganzfeld, we cannot ignore the differences between multiple images of a travel scene presented statically with a slide projector and excerpts from motion pictures presented with their accompanying audio on videocassettes. The problems of selecting, presenting, and controlling such targets present new challenges. During the judging procedure in the original ganzfeld experiments, the target and the decoys were displayed simultaneously. The judging procedure for the autoganzfeld involves presenting the target and its decoys one at a time. Because the positive hit rate and significance are due to an essentially new type of target presented in a new way, the need for independent replication is especially urgent.
Consistency With Previous Parapsychological Findings
Bem and Honorton (1994) also claimed that "there are reliable relationships between successful psi performance and conceptually relevant experimental and subject variables, relationships that also replicate previous findings" (p. 13). They point to three such "replications." One is a small but statistically significant correlation of .18 between a measure of extroversion and "psi performance." This is consistent with a tendency found in previous psi studies. Second, they report the strong psi performance of the Juilliard students, which they see as consistent with psi studies that found a relation between psi abilities and creative and artistic abilities. This latter replication is not so impressive when one considers that only 20 students were involved and that their performance was not significantly different from that of the other participants in the two studies in which they participated (Fisher's exact p = .262, two-tailed). In addition, as I point out below, the Juilliard students were exposed to just those conditions that favored high hit rates: targets that were repeated, a preponderance of dynamic targets, and active prompting by the experimenter during judging. Thus, it is unclear whether their high hit rate was a function of their creativity or a function of the special targets and conditions with which they happened to be associated.
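Fisher's exact test, cited above for the Juilliard comparison, conditions on the margins of a 2x2 table and sums hypergeometric probabilities. The counts behind p = .262 are not given in this section, so the sketch below is checked against Fisher's textbook tea-tasting table instead. `fisher_exact_two_sided` is a hypothetical helper that uses the common definition of the two-sided p value: the total probability of all tables no more probable than the observed one.

```python
from math import comb


def hypergeom_pmf(a, row1, col1, n):
    """Probability of a 2x2 table with top-left count a, given fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1)]
    # Small tolerance so tables tied with the observed one are not lost to rounding
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


# Fisher's "lady tasting tea" table: two-sided p = 34/70.
print(round(fisher_exact_two_sided([[3, 1], [1, 3]]), 4))
```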
The third correlate could not be demonstrated for the autoganzfeld studies. Bem and Honorton (1994) pointed out that the subjects in the autoganzfeld tended to believe in psi, reported psychic experiences, and had practiced meditation or related techniques. These variables were previously reported as correlates of psi. However, I do not see how they can claim that these attributes of their subject population are a replication of previous findings. They report no correlations between these variables and performance in the autoganzfeld studies. Indeed, they cannot report any correlation because they did not have subjects who lacked these properties. We do not know if nonbelievers and people without psychic experiences would have performed better or worse than the actual subjects.
In other words, they can justify only one of the correlates that they use to claim consistency with previous psi studies. Even here the relationship is weak and is just one of many previously reported correlates that might have been found. At one time, for example, parapsychologists claimed that the decline effect was a pervasive and characteristic property of psi. However, when no decline effect is found in a parapsychological study, it does not deter the experimenter from pointing to some other significant departure from chance as evidence for psi. Note that in the autoganzfeld studies, there is no decline effect.
Randomization and Claims of Psi
As I already stated, I agree that the autoganzfeld studies meet most of the requirements that Honorton and I specified in our joint communique (Hyman & Honorton, 1986). One surprising exception is the inadequate testing of the randomization procedures. The issue of randomization was central to the debate concerning the original ganzfeld findings (Hyman, 1985). Adequate randomization procedures are critical for parapsychological research because the evidence for psi is based on a low probability value for a departure from a chance baseline. Such probability values have meaning within an idealized statistical model of the experimental situation. Whether this statistical model applies to a given situation is an empirical matter that must be adequately justified if the stated significance levels are to be taken seriously. Appropriate randomization procedures are one way to help ensure that the statistical model applies to the experimental data. With respect to the autoganzfeld studies, this would entail selecting the targets for each trial and ordering the target and decoys during judging in a demonstrably random manner. In addition, following the practice of a few past researchers, the parapsychologist can also provide some post hoc analyses to show that the distributions of targets and judging orders are consistent with the underlying probability model.
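One way to make target selection and judging order demonstrably random is to draw both from a seeded generator and record the seed, so that an auditor can regenerate the whole schedule, then tally where the target fell in each judging sequence as a post hoc check. The sketch assumes 40 pools of four items labeled 1 to 160; that layout and the names `make_schedule` and `position_counts` are illustrative assumptions, and Python's Mersenne Twister generator is reproducible but not cryptographically secure.

```python
import random
from collections import Counter


def make_schedule(pools, n_trials, seed):
    """Draw a target and a judging order for each trial from a seeded generator."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_trials):
        pool = rng.choice(pools)                # which target pool this trial uses
        target = rng.choice(pool)               # the target within that pool
        order = rng.sample(pool, k=len(pool))   # random judging order (a permutation)
        schedule.append((target, order))
    return schedule


def position_counts(schedule):
    """Post hoc tally of the target's place in the judging sequence (0 = first)."""
    return Counter(order.index(target) for target, order in schedule)


# Hypothetical layout: 40 pools of 4 items, giving target labels 1 to 160.
pools = [list(range(4 * i + 1, 4 * i + 5)) for i in range(40)]
schedule = make_schedule(pools, 2000, seed=1994)
print(sorted(position_counts(schedule).items()))
```

Under adequate randomization each of the four judging positions should hold the target about a quarter of the time; a marked excess in the first or second position is the kind of pattern such a tally is meant to expose.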
Unfortunately, the autoganzfeld studies fell short on this critical requirement. The tests for adequacy of randomization were confined to showing a uniform distribution of outputs from 1 to 160 for target selection and a uniform distribution of the permutations of all possible orderings during the judging procedure. Emitting a uniform distribution of target choices is a necessary but hardly sufficient requirement for an adequate random generator.
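Why a uniform output distribution is not enough can be shown with a deliberately bad generator that cycles 1, 2, ..., 160: its frequencies are perfectly uniform, yet each output fully determines the next. The sketch compares a chi-square test of single-value frequencies with a serial test on consecutive pairs, a standard additional check for random generators. The cyclic sequence is a contrived counterexample, not a model of the autoganzfeld generator, and the function names are hypothetical.

```python
from collections import Counter


def uniformity_chi_square(seq, k):
    """Chi-square statistic for equal frequencies of the values 1..k."""
    counts = Counter(seq)
    expected = len(seq) / k
    return sum((counts[v] - expected) ** 2 / expected for v in range(1, k + 1))


def serial_chi_square(seq, k):
    """Chi-square statistic for equal frequencies of consecutive pairs (u, v).

    A good generator should also pass this check: each of the k*k ordered
    pairs of successive outputs should be about equally common.
    """
    pairs = Counter(zip(seq, seq[1:]))
    expected = (len(seq) - 1) / k**2
    return sum((pairs[(u, v)] - expected) ** 2 / expected
               for u in range(1, k + 1) for v in range(1, k + 1))


# A deliberately non-random "generator" that simply cycles 1, 2, ..., 160.
k = 160
cyclic = list(range(1, k + 1)) * 10
print(uniformity_chi_square(cyclic, k))  # perfectly uniform counts
print(serial_chi_square(cyclic, k))      # enormous: each output fixes the next
```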
These randomization procedures are critical because we can expect strong systematic biases during the judging procedure. The fact that the items to be judged have to be presented sequentially, when combined with what we know about subjective validation (Marks & Kammann, 1980), would lead us to expect a strong tendency to select the first or second items during the judging series. We would also expect strong response biases within each target pool. Bem and Honorton show such a bias in the target pool used for Study 302. Both these biases may be strengthened by the fact that the experimenter interacts with the receiver during the judging process. Although most receivers participate in one session, each experimenter participates in several. The response biases of the experimenters can play an important role, especially in those studies in which the experimenter deliberately prompts the receiver to choose a particular item during the judging. Such active prompting occurred in 6 of the 11 studies (Honorton et al., 1990).²
If the randomizing of the selection of targets and of the ordering of items during judging is adequate, such response biases should not affect the validity of the statistical tests. One way to prevent response biases from distorting the hit rate is to use a randomizing procedure that makes sure that each item within a target pool occurs equally often. The simple randomizing procedure used in the autoganzfeld studies would guarantee that each target occurred an equal proportion (not number) of times only in the very long run. In any finite number of trials, the individual targets would occur with varying frequencies. Again, if the randomization was adequate, this inequality of occurrence would not bias the hit rate. The items in some target pools that occurred most frequently would be those that were favored by the response bias. This would bias the hit rate upward. The items in other target pools, however, that occurred most frequently would be those that were avoided by the response bias. This would bias the hit rate downward. With adequate randomization, these two tendencies would balance each other.

² One referee suggested that I make it clear that I am not claiming that sensory leakage occurred because of experimenter prompting. I agree. The experimenter, according to the protocol, was ignorant of which member of the target pool was the target during the judging procedure. The point is that by actively helping the subject to rate the members of the target pool, the experimenter let his or her own subjective biases enter the selection procedure.

Table 1
Hit Rate as a Function of the Frequency of Occurrence of Targets

Occurrences | Hits | Misses | n | Hit rate
---|---|---|---|---
1 | 12 | 36 | 48 | .250
2 | 25 | 65 | 90 | .278
3 | 16 | 26 | 42 | .381
4 | 20 | 48 | 68 | .294
5 | 19 | 36 | 55 | .346
6 | 4 | 8 | 12 | .333
7 | 4 | 3 | 7 | .571
8 | 6 | 2 | 8 | .750
Total | 106 | 224 | 330 | .321
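The balancing argument about response biases can be illustrated with a small Monte Carlo sketch in which there is no psi at all: the judge picks from a four-item pool according to a fixed response bias that ignores the target. The bias and selection weights below are hypothetical, as is the helper `simulate_hit_rate`. With uniform target selection the expected hit rate is the chance value .25; when selection happens to favor the item the judge prefers, the hit rate rises, and when it favors the avoided items, the hit rate falls.

```python
import random


def simulate_hit_rate(n_trials, target_weights, judge_weights, seed=0):
    """Monte Carlo hit rate for one pool when the judge has no information.

    The judge's choice depends only on a fixed response bias (judge_weights),
    never on the target, so any departure from .25 comes from target selection.
    """
    rng = random.Random(seed)
    items = range(len(target_weights))
    hits = 0
    for _ in range(n_trials):
        target = rng.choices(items, weights=target_weights)[0]
        choice = rng.choices(items, weights=judge_weights)[0]
        hits += target == choice
    return hits / n_trials


judge = [0.55, 0.15, 0.15, 0.15]      # hypothetical bias toward item 0
uniform = [1, 1, 1, 1]                # each item equally likely to be the target
favored = [0.40, 0.20, 0.20, 0.20]    # selection skewed toward the preferred item
avoided = [0.10, 0.30, 0.30, 0.30]    # selection skewed away from it

for name, weights in [("uniform", uniform), ("favored", favored), ("avoided", avoided)]:
    print(name, round(simulate_hit_rate(100_000, weights, judge), 3))
```

With these weights the expected hit rates are .25, .31, and .19 (the sum over items of target probability times choice probability), so the favored and avoided cases average back to chance. Nothing guarantees that such skews cancel in a finite series produced by an inadequate randomizer, which is the concern raised in the text.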
Achieving adequate randomization is not easy. Much can go wrong, as some parapsychologists, among others, have shown. This is why it is disappointing that the autoganzfeld studies did not show the same concern for randomizing that they showed for other aspects of the methodology. This is also why, in my role of devil's advocate, I was interested in directly checking the actual distribution of target positions among the decoys during judging. Daryl Bem kindly agreed to supply me with this information along with other data from the autoganzfeld database. Unfortunately, the variable labeled position on the data sheet turned out to be the original position of the target in its target pool rather than its position during judging. This latter information was unavailable to either Bem or me at the time of this writing.
Hit Rate and Target Frequency
Because I could not directly check the adequacy of the randomization procedures, I tried to find some indirect indicators. If randomizing was inadequate and targets occurred with varying frequency, possible biases might show up as differential hit rates for targets occurring with various frequencies. For example, if targets favored by response biases were also favored by a deficient target selection procedure, then we would find a positive correlation between hit rate and target frequency. It would be possible, of course, for a deficient randomization procedure to yield a negative correlation. To see if actual repetitions of targets had any observable consequences, I tabulated the proportion of hits as a function of how many times a target occurred in this database.³
As Table 1 shows, the relation between hit rate and target frequency was strong. The test for a linear trend among the proportions (Snedecor & Cochran, 1967, pp. 246-248) was positive and significant (z = 2.49, p = .013, two-tailed). An indication of the strength of this trend is given by the Spearman rank-order correlation between the hit rate and target frequency, which was .83. Another way to look at this relationship would be to compare the hit rate of targets that occurred once or twice (.27) with that of targets that appeared three or more times (.36).
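The statistics in this paragraph can be recomputed from the counts in Table 1. The sketch below assumes the eight rows correspond to targets that occurred 1 through 8 times; because the trend z is unchanged by any equally spaced increasing scoring of the rows, and Spearman's rho depends only on their order, that labeling does not affect the results. The trend test is written in the Cochran-Armitage form of the chi-square test for a linear trend in proportions; rounding or a slightly different variance formula may shift the last digit relative to the published z = 2.49. Helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Table 1: hits and trials by how many times a target occurred (assumed 1 to 8).
freq = [1, 2, 3, 4, 5, 6, 7, 8]
hits = [12, 25, 16, 20, 19, 4, 4, 6]
n = [48, 90, 42, 68, 55, 12, 7, 8]


def linear_trend_z(scores, x, n):
    """Linear trend in proportions (Cochran-Armitage form) as a z score."""
    total = sum(n)
    p_bar = sum(x) / total
    t = sum(s * (xi - ni * p_bar) for s, xi, ni in zip(scores, x, n))
    sum_ns = sum(ni * s for s, ni in zip(scores, n))
    sum_ns2 = sum(ni * s * s for s, ni in zip(scores, n))
    var = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns**2 / total)
    return t / sqrt(var)


def ranks(values):
    """Rank positions (1 = smallest); assumes no tied values."""
    return {v: r for r, v in enumerate(sorted(values), start=1)}


def spearman_no_ties(a, b):
    """Spearman's rho as 1 - 6 * sum(d^2) / (m(m^2 - 1)), valid without ties."""
    ra, rb, m = ranks(a), ranks(b), len(a)
    d2 = sum((ra[x] - rb[y]) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (m * (m * m - 1))


rates = [h / k for h, k in zip(hits, n)]
z = linear_trend_z(freq, hits, n)
p_two = 2 * (1 - NormalDist().cdf(abs(z)))
rho = spearman_no_ties(freq, rates)
low = sum(hits[:2]) / sum(n[:2])    # targets seen once or twice
high = sum(hits[2:]) / sum(n[2:])   # targets seen three or more times
print(f"z = {z:.2f}, p = {p_two:.3f}, rho = {rho:.2f}, {low:.2f} vs {high:.2f}")
```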
This pattern exists separately for the static and dynamic targets, although it is stronger among the dynamic targets. The static targets that occurred once or twice had a hit rate of .22 compared with a hit rate of .31 for those that occurred more than twice. The hit rate was .32 for those dynamic targets that occurred once or twice as compared with a hit rate of .41 for those that occurred three or more times.
Target Occurrence and Experimenter Prompting
What accounts for this peculiar relationship? Is the correlation between target frequency and hit rate determined by which particular targets get repeated? Or does repetition itself somehow increase the hit rate? If the relation is due to response biases, I would expect experimenter prompting to affect the later occurrences of targets rather than their first occurrences. With these questions in mind, I conducted a multinomial analysis of variance (Woodward, Bonett, & Brecht, 1990). In this analysis, hit rate was the dependent variable, and three two-level factors were the independent variables: target type (static, dynamic), target occurrence (first, later), and experimenter prompting (no, yes). Of the interactions, only that between target occurrence and experimenter prompting was significant, χ²(1, N = 330) = 6.83, p = .009. The two significant main effects were target type, χ²(1, N = 330) = 4.76, p = .030, and target occurrence, χ²(1, N = 330) = 11.56, p