ANOMALY OR ARTIFACT? COMMENTS ON BEM AND HONORTON

Document Number (FOIA) /ESDN (CREST): CIA-RDP96-00789R003200100001-5
Release Decision: RIFPUB
Original Classification: U
Document Page Count: 9
Document Creation Date: November 4, 2016
Document Release Date: October 27, 1998
Sequence Number: 1
Content Type: BULL
Approved For Release 2000/08/08 : CIA-RDP96-00789R003200100001-5

Psychological Bulletin, 1994, Vol. 115, No. 1, 19–24
Copyright 1994 by the American Psychological Association, Inc. 0033-2909/94/$3.00

Anomaly or Artifact? Comments on Bem and Honorton

Ray Hyman

Bem and Honorton imply that the 11 autoganzfeld experiments demonstrate the existence of psi, a communications anomaly. They claim that the autoganzfeld results are consistent with previous parapsychological findings and constitute evidence for a replicable psi effect. Although the autoganzfeld experiments are methodologically superior to previous parapsychological experiments, the tests of their randomization procedures were inadequate. The autoganzfeld experiments consistently produced positive hit rates, whose combined effect was highly significant. However, these experiments produced important inconsistencies with the previous ganzfeld experiments. They also showed a unique pattern in the data that may reflect a systematic artifact. Because of these unique features, we have to wait for independent replications of these experiments before we can conclude that a replicable anomaly or psi has been demonstrated.

Bem and Honorton (1994) imply that if psychologists were familiar with the most recent parapsychological research, they would be more willing to accept the possibility that a communications anomaly exists. In particular, Bem and Honorton focus on the experiments that are based on the ganzfeld procedure. They "believe that the replication rates and effect sizes achieved by one particular experimental method, the ganzfeld procedure, are now sufficient to warrant bringing this body of data to the attention of the wider psychological community" (Bem & Honorton, 1994, p. 4). They review the debate between Honorton and me over the original ganzfeld experiments. Hyman (1985) found that these studies suffered from statistical, methodological, and documentation problems.
Honorton (1985) responded that these flaws were not sufficient to account for the observed hit rates. Bem and Honorton (1994) review this controversy and cite reviewers who apparently agree with Honorton's position. The implication is that despite the deficiencies in the ganzfeld experiments, the results support the existence of psi, a communications anomaly. To Honorton's credit, he initiated a new series of experiments that would be free from the flaws of the earlier ganzfeld database (Honorton et al., 1990). These 11 new experiments, called the autoganzfeld studies, yielded consistently positive hit rates and a highly significant overall effect. Because these new experiments showed positive results and allegedly were consistent with the earlier ganzfeld database and other psi research, Bem and Honorton implied that parapsychology had found its previously elusive repeatable experiment.

Since the beginnings of psychical research in the mid-nineteenth century, its investigators have believed that they have scientific evidence sufficiently strong to place before the general scientific community. Each generation has tried to get the attention of the scientific community with findings that it claims to be irrefutable. The particular evidence put forth has changed from generation to generation. What a previous generation of parapsychologists considered to be a solid case for psi was abandoned by later generations in favor of a more current candidate. This shifting database for parapsychology's best case may be why parapsychology still has not achieved the recognition it desires from the general scientific community.

Now Bem and Honorton (1994) believe that they have a strong case to put before the psychological community. They admit that the autoganzfeld findings still require independent confirmation. To their credit, they specify the conditions and the required sample size needed to provide adequate power.
The informed critic of parapsychology might ask what makes the current situation different from past claims for psi. Why should we now believe that Honorton and his colleagues have finally found a way to consistently produce evidence for psi? We must wait for future attempts at replication before we have an answer to that question. Bem and Honorton appear confident that this time is different. Their review of the ganzfeld and autoganzfeld databases encourages them to believe that consistent psi results are within reach. In this commentary, I provide reasons for believing that the autoganzfeld results contain inconsistencies and some unique patterns that raise doubts about their replicability.¹

Agreements and Differences

Although my commentary focuses on my disagreements with Bem and Honorton's (1994) presentation, I would like to briefly specify some points of agreement. The autoganzfeld studies do comply with most of the "stringent standards" (p. 353) spelled out in the joint communiqué by Hyman and Honorton (1986). I commend Honorton and his colleagues (1990) for creating a protocol that eliminates most of the flaws that plagued the original ganzfeld experiments. The 11 autoganzfeld studies consistently yield positive effects that, taken together, are highly significant. I concur with Bem and Honorton's admission that "the autoganzfeld studies by themselves cannot satisfy the requirement that replications be conducted by a 'broader range of investigators'" (p. 13). I also support their suggestion that several parapsychologists pool their resources and plan a large-scale ganzfeld replication in which each laboratory contributes a set of trials to the total pool.

So what is there to disagree about? I disagree with Bem and Honorton about how strongly the autoganzfeld studies support the hope for a replicable psi experiment. Where they see consistency between the autoganzfeld studies and previous parapsychological findings, I see inconsistency. Although I agree that the autoganzfeld studies meet most of the stringent standards that Honorton and I spelled out, I disagree that they meet all of those standards. Our disagreements are a matter of degree. The value of discussing our disagreements is to help clarify what should constitute adequate evidence for the existence of an anomaly. The existence or nonexistence of psi will not be settled by debate. The existence issue will be settled by independent attempts at replication, at least four of which are currently underway (McCrone, 1993).

In explaining my disagreements, I point to weaknesses in the autoganzfeld experiments. I want to emphasize that as a single contribution to the ganzfeld database, these are commendable experiments of high quality. But no single experiment or set of studies can be perfect in all respects. When such a series is given the responsibility of carrying a burden beyond its original purposes, then various deficiencies will inevitably become apparent. This is the case, I believe, with the autoganzfeld studies.

Correspondence concerning this article should be addressed to Ray Hyman, Department of Psychology, University of Oregon, Eugene, Oregon 97403.

¹ Although I take a pessimistic position regarding future replications, I think it is good that Bem and the parapsychologists are optimistic. Such optimism should encourage investigators to attempt replications. These replications will eventually decide the issue.
Internal Consistency Within the Autoganzfeld Studies

Bem and Honorton describe the autoganzfeld studies as 11 separate experiments conducted by eight different experimenters. The hit rates are positive and consistent across the studies and the experimenters. Although this is encouraging, the consistency tells us little about potential replicability. Neither the studies nor the experimenters are independent. The studies vary in whether they use naive or experienced subjects. However, the target set, the selection and judging procedures, the laboratory, the setting, and the procedures are identical across studies and experimenters. No experimenter is associated with a single study, nor does an experimenter have independent input into the design and procedure, as happens in an independent replication. Indeed, the term experimenter in this context simply refers to a person who plays an already scripted role. Any unique features of the autoganzfeld procedure, including possible artifacts, would be the same for all 11 studies and the eight different experimenters. Consequently, the autoganzfeld studies should be looked on as one large experiment rather than 11 separate contributions.

Consistency With the Original Ganzfeld Database

Bem and Honorton claim that "[the autoganzfeld] results are statistically significant and consistent with those in the earlier database" (p. 13). They cite only two reasons to support this claim. The first is that the overall effect size, or hit rate, is approximately the same in the two databases. This apparent agreement in overall effect size is meaningless. The overall effect size in the autoganzfeld studies is a composite of two significantly different effect sizes: that for the static targets and that for the dynamic targets. The overall effect size in the ganzfeld database is an arbitrary composite of heterogeneous effect sizes, contributed in unequal numbers, from different laboratories.
The fact that the two composites yield approximately the same effect size is accidental. Both numbers could easily have been larger or smaller, depending on the mix of the arbitrary sources from which they were composed.

The dynamic targets yielded a significantly higher hit rate than did the static targets in the autoganzfeld studies. Bem and Honorton argued that this was consistent with the finding that the multiple-image targets (View Master stereoscopic slide reels) in the ganzfeld database yielded significantly higher hit rates than did the single-image targets. I do not believe that multiple static images on a View Master reel can be equated to the dynamic moving image on a videoclip. However, I will not argue this point.

Clearly the dynamic targets outperform the static targets in the autoganzfeld studies. Even if this is consistent with the apparent superiority of the View Master targets over the single-image targets, Bem and Honorton (1994) overlook a serious discrepancy. Single-image targets constituted 76% of the 835 sessions in the ganzfeld experiments. Their average hit rate was .346. Given this effect size and the 166 trials using static targets, the power, or probability, of replicating this effect in the autoganzfeld experiments was .82. Yet the static targets in the autoganzfeld studies did not yield a significant hit rate. This failure to find a significant effect with the static targets is even more notable given that these experiments were conducted in "the warm social ambience" (p. 14) of Honorton's laboratory.

Bem and Honorton acknowledge that the autoganzfeld studies failed to replicate the predicted sender-receiver pairing effect. In the original ganzfeld database, the trials on which the receiver chose a friend as a sender produced a hit rate of .44, compared with a hit rate of only .26 for those trials on which the experimenter assigned a sender.
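The power figure of .82 can be roughly checked. What follows is a sketch under stated assumptions, not the calculation behind the article's number: it assumes four-choice judging (so the chance hit rate is .25), a one-tailed exact binomial test at α = .05, and treats the earlier single-image hit rate of .346 as the true rate over the 166 static-target trials. The article does not say which test it used, so the result should fall in the same region as .82 rather than match it exactly.

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_power(n: int, p0: float, p1: float, alpha: float = 0.05) -> tuple[int, float]:
    """One-tailed exact binomial test against p0, evaluated at a true rate p1.

    Returns the smallest hit count k with P(X >= k | p0) <= alpha,
    and the power P(X >= k | p1)."""
    k = next(k for k in range(n + 1) if binom_sf(k, n, p0) <= alpha)
    return k, binom_sf(k, n, p1)


# Assumed inputs: 166 static-target trials, chance rate .25 (one target
# among four choices), and the ganzfeld single-image hit rate of .346.
critical_hits, power = exact_power(n=166, p0=0.25, p1=0.346)
print(f"reject chance at {critical_hits}+ hits; power = {power:.2f}")
```

Replacing the exact test with a normal approximation, or the one-tailed criterion with a two-tailed one, shifts the answer by a few hundredths, which is why only rough agreement with the reported value should be expected.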
I would emphasize that, given an effect of this size with 198 trials with friends as senders and 128 with someone else as senders, the power of getting a significant replication of the effect is over .92. Again, given the "psi-conducive" atmosphere of Honorton's laboratory, this failure to get significance is a noteworthy inconsistency.

On two key comparisons with the original ganzfeld database, the autoganzfeld studies fail to replicate even with adequate power. The positive hit rate and overall significance of the autoganzfeld studies are due to an essentially new type of target, presented in a new way. Even if we agree that there is a kinship between the View Master reels of the ganzfeld experiments and the dynamic targets of the autoganzfeld, we cannot ignore the differences between multiple images of a travel scene presented statically with a slide projector and excerpts from motion pictures presented with their accompanying audio on videocassettes. The problems of selecting, presenting, and controlling such targets present new challenges. During the judging procedure in the original ganzfeld experiments, the target and the decoys were displayed simultaneously. The judging procedure for the autoganzfeld involves presenting the target and its decoys one at a time. Because the positive hit rate and significance are due to an essentially new type of target presented in a new way, the need for independent replication is especially urgent.

Consistency With Previous Parapsychological Findings

Bem and Honorton (1994) also claimed that "there are reliable relationships between successful psi performance and conceptually relevant experimental and subject variables, relationships that also replicate previous findings" (p. 13). They point to three such "replications." One is a small, but statistically significant, correlation of .18 between a measure of extroversion and "psi performance." This is consistent with a tendency found in previous psi studies. Second, they report the strong psi performance of the Juilliard students, which they see as consistent with psi studies that found a relation between psi abilities and creative and artistic abilities. This latter replication is not so impressive when one considers that only 20 students were involved and that their performance was not significantly different from that of the other participants in the two studies in which they participated (Fisher's exact p = .262, two-tailed). In addition, as I point out below, the Juilliard students were exposed to just those conditions that favored high hit rates: targets that were repeated, a preponderance of dynamic targets, and active prompting by the experimenter during judging. Thus, it is unclear whether their high hit rate was a function of their creativity or a function of the special targets and conditions with which they happened to be associated.

The third correlate could not be demonstrated for the autoganzfeld studies. Bem and Honorton (1994) pointed out that the subjects in the autoganzfeld tended to believe in psi, reported psychic experiences, and had practiced meditation or related techniques. These variables were previously reported as correlates of psi. However, I do not see how they can claim that these attributes of their subject population are a replication of previous findings. They report no correlations between these variables and performance in the autoganzfeld studies. Indeed, they cannot report any correlation because they did not have subjects who lacked these properties. We do not know if nonbelievers and people without psychic experiences would have performed better or worse than the actual subjects. In other words, they can justify only one of the correlates that they use to claim consistency with previous psi studies.
Even here the relationship is weak and is just one of many previously reported correlates that might have been found. At one time, for example, parapsychologists claimed that the decline effect was a pervasive and characteristic property of psi. However, when no decline effect is found in a parapsychological study, it does not deter the experimenter from pointing to some other significant departure from chance as evidence for psi. Note that in the autoganzfeld studies, there is no decline effect.

Randomization and Claims of Psi

As I already stated, I agree that the autoganzfeld studies meet most of the requirements that Honorton and I specified in our joint communiqué (Hyman & Honorton, 1986). One surprising exception is the inadequate testing of the randomization procedures. The issue of randomization was central to the debate concerning the original ganzfeld findings (Hyman, 1985). Adequate randomization procedures are critical for parapsychological research because the evidence for psi is based on a low probability value for a departure from a chance baseline. Such probability values have meaning within an idealized statistical model of the experimental situation. Whether this statistical model applies to a given situation is an empirical matter that must be adequately justified if the stated significance levels are to be taken seriously. Appropriate randomization procedures are one way to help ensure that the statistical model applies to the experimental data. With respect to the autoganzfeld studies, this would entail selecting the targets for each trial and ordering the target and decoys during judging in a demonstrably random manner. In addition, following the practice of a few past researchers, the parapsychologist can also provide some post hoc analyses to show that the distributions of targets and judging orders are consistent with the underlying probability model.
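Why the chance baseline depends on randomization can be shown with a little arithmetic. In this sketch, which is an illustration and not a model of the actual autoganzfeld data, a hypothetical receiver with no psi at all picks judging positions 1 to 4 with fixed preferences. Because the target's position and the receiver's choice are then independent, the expected hit rate is the sum over positions of P(target in that position) times P(receiver picks it). Both the preference vector and the "skewed" placement below are invented for illustration.

```python
import random


def expected_hit_rate(placement: list[float], preference: list[float]) -> float:
    """Expected hit rate for a receiver with no psi: target position and
    choice are independent, so P(hit) = sum over i of q_i * r_i."""
    return sum(q * r for q, r in zip(placement, preference))


def simulated_hit_rate(placement, preference, trials, rng):
    """Monte Carlo check of the same quantity."""
    positions = range(len(placement))
    hits = 0
    for _ in range(trials):
        target = rng.choices(positions, weights=placement)[0]
        choice = rng.choices(positions, weights=preference)[0]
        hits += target == choice
    return hits / trials


# Hypothetical receiver who favors early positions in a four-item judging series.
preference = [0.40, 0.30, 0.20, 0.10]
uniform = [0.25, 0.25, 0.25, 0.25]  # adequate randomization of target position
skewed = [0.35, 0.25, 0.20, 0.20]   # hypothetical deficient placement

print(expected_hit_rate(uniform, preference))  # chance baseline
print(expected_hit_rate(skewed, preference))   # above chance with no psi
print(simulated_hit_rate(skewed, preference, 100_000, random.Random(0)))
```

With uniform placement the expected hit rate is .25 for any preference vector that sums to 1, which is the formal sense in which adequate randomization neutralizes response bias; with the skewed placement the same receiver scores .275.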
Unfortunately, the autoganzfeld studies fell short on this critical requirement. The tests for adequacy of randomization were confined to showing a uniform distribution of outputs from 1 to 160 for target selection and a uniform distribution of the permutations of all possible orderings during the judging procedure. Emitting a uniform distribution of target choices is a necessary but hardly sufficient requirement for an adequate random generator.

These randomization procedures are critical because we can expect strong systematic biases during the judging procedure. The fact that the items to be judged have to be presented sequentially, when combined with what we know about subjective validation (Marks & Kammann, 1980), would lead us to expect a strong tendency to select the first or second items during the judging series. We would also expect strong response biases within each target pool. Bem and Honorton show such a bias in the target pool used for Study 302. Both these biases may be strengthened by the fact that the experimenter interacts with the receiver during the judging process. Although most receivers participate in one session, each experimenter participates in several. The response biases of the experimenters can play an important role, especially in those studies in which the experimenter deliberately prompts the receiver to choose a particular item during the judging. Such active prompting occurred in 6 of the 11 studies (Honorton et al., 1990).²

If the randomizing of the selection of targets and of the ordering of items during judging is adequate, such response biases should not affect the validity of the statistical tests.
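The point that a uniform output distribution is necessary but not sufficient can be made concrete. The sketch below is illustrative only and does not describe the generator actually used in the autoganzfeld studies: it compares a hypothetical defective source that simply cycles through 1 to 160 with Python's `random` module. A Pearson chi-square test of uniformity rates the defective source as perfect, because its counts are exactly flat, while a simple sequential check exposes it at once.

```python
import random
from collections import Counter

N_OUTPUTS = 160  # target-selection outputs run from 1 to 160


def chi_square_uniform(draws, k=N_OUTPUTS):
    """Pearson chi-square statistic for the hypothesis that draws are uniform on 1..k."""
    counts = Counter(draws)
    expected = len(draws) / k
    return sum((counts[v] - expected) ** 2 / expected for v in range(1, k + 1))


def successor_fraction(draws, k=N_OUTPUTS):
    """Share of consecutive pairs in which the next draw is the current draw + 1
    (wrapping from k to 1). An independent uniform source gives about 1/k."""
    pairs = list(zip(draws, draws[1:]))
    return sum(1 for a, b in pairs if b == a % k + 1) / len(pairs)


def cyclic_source(n, k=N_OUTPUTS):
    """Hypothetical defective 'random' source: 1, 2, ..., k, 1, 2, ..."""
    return [i % k + 1 for i in range(n)]


n_draws = N_OUTPUTS * 50
bad = cyclic_source(n_draws)
rng = random.Random(1994)
good = [rng.randint(1, N_OUTPUTS) for _ in range(n_draws)]

for name, draws in (("cyclic", bad), ("random", good)):
    print(f"{name}: chi-square = {chi_square_uniform(draws):.1f} "
          f"(df = {N_OUTPUTS - 1}), successor fraction = {successor_fraction(draws):.4f}")
```

Serious generator validation relies on batteries of distributional and sequential tests; the single successor check here is only a minimal, hypothetical example of the kind of test that a uniformity check alone leaves out.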
One way to prevent response biases from distorting the hit rate is to use a randomizing procedure that makes sure that each item within a target pool occurs equally often. The simple randomizing procedure used in the autoganzfeld studies would guarantee that each target occurred an equal proportion (not number) of times only in the very long run. In any finite number of trials, the individual targets would occur with varying frequencies. Again, if the randomization was adequate, this inequality of occurrence would not bias the hit rate. The items in some target pools that occurred most frequently would be those that were favored by the response bias. This would bias the hit rate upward. The items in other target pools, however, that occurred most frequently would be those that were avoided by the response bias. This would bias the hit rate downward. With adequate randomization, these two tendencies would balance each other. Achieving adequate randomization is not easy. Much can go wrong, as some parapsychologists, among others, have shown.

² One referee suggested that I make it clear that I am not claiming that sensory leakage occurred because of experimenter prompting. I agree. The experimenter, according to the protocol, was ignorant of which member of the target pool was the target during the judging procedure. The point is that by actively helping the subject to rate the members of the target pool, the experimenter let his or her own subjective biases enter the selection procedure.

Table 1
Hit Rate as a Function of the Frequency of Occurrence of Targets

Occurrences      1     2     3     4     5     6     7     8   Total
Hits            12    25    16    20    19     4     4     6     106
Misses          36    65    26    48    36     8     3     2     224
n               48    90    42    68    55    12     7     8     330
Hit rate      .250  .278  .381  .294  .346  .333  .571  .750    .321
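The difference between equal proportions in the long run and equal numbers in a finite run can be sketched for a single four-item pool. One standard way to get equal numbers, offered here as an illustration and not as a procedure proposed in the article, is permuted-block selection: shuffle the pool, use each item once, then reshuffle. The pool labels are hypothetical.

```python
import random
from collections import Counter

POOL = ["A", "B", "C", "D"]  # hypothetical four-item target pool


def simple_selection(n_trials: int, rng: random.Random) -> list[str]:
    """Independent uniform choice on every trial: equal proportions only in the long run."""
    return [rng.choice(POOL) for _ in range(n_trials)]


def balanced_selection(n_trials: int, rng: random.Random) -> list[str]:
    """Permuted-block selection: each block of len(POOL) consecutive trials
    uses every item exactly once, in a freshly shuffled order."""
    sequence: list[str] = []
    while len(sequence) < n_trials:
        block = POOL[:]
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_trials]


def count_spread(sequence: list[str]) -> int:
    """Largest minus smallest number of occurrences across the pool's items."""
    counts = Counter(sequence)
    occurrences = [counts[item] for item in POOL]
    return max(occurrences) - min(occurrences)


rng = random.Random(42)
for n in (8, 20, 40):
    print(n, count_spread(simple_selection(n, rng)), count_spread(balanced_selection(n, rng)))
```

Under balanced selection the counts never differ by more than one. The trade-off is predictability: within a block, the last target is fixed by the earlier ones, so the block structure has to stay hidden from anyone who might learn the sequence.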
This is why it is disappointing that the autoganzfeld studies did not show the same concern for randomizing that they showed for other aspects of the methodology. This is also why, in my role of devil's advocate, I was interested in directly checking the actual distribution of target positions among the decoys during judging. Daryl Bem kindly agreed to supply me with this information along with other data from the autoganzfeld database. Unfortunately, the variable labeled position on the data sheet turned out to be the original position of the target in its target pool rather than its position during judging. This latter information was unavailable to either Bem or me at the time of this writing.

Hit Rate and Target Frequency

Because I could not directly check the adequacy of the randomization procedures, I tried to find some indirect indicators. If randomizing was inadequate and targets occurred with varying frequency, possible biases might show up as differential hit rates for targets occurring with various frequencies. For example, if targets favored by response biases were also favored by a deficient target selection procedure, then we would find a positive correlation between hit rate and target frequency. It would be possible, of course, for a deficient randomization procedure to yield a negative correlation. To see if actual repetitions of targets had any observable consequences, I tabulated the proportion of hits as a function of how many times a target occurred in this database.³

As Table 1 shows, the relation between hit rate and target frequency was strong. The test for a linear trend among the proportions (Snedecor & Cochran, 1967, pp. 246–248) was positive and significant (z = 2.49, p = .013, two-tailed). An indication of the strength of this trend is given by the Spearman rank-order correlation between the hit rate and target frequency, which was .83.
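The Table 1 statistics can be recomputed from its counts alone. This sketch assumes scores of 1 to 8 for the eight frequency columns and uses the Cochran–Armitage form of the test for a linear trend in proportions; it is a reconstruction, and the article's computation may differ slightly in the variance term. It gives z of about 2.5 (an N/(N − 1) factor in the variance brings it to 2.49), a two-tailed p near .013, a Spearman correlation of about .83, and the grouped hit rates for targets seen once or twice versus three or more times.

```python
import math

# Table 1 counts; column j holds targets that occurred j times (j = 1..8).
freq = [1, 2, 3, 4, 5, 6, 7, 8]
hits = [12, 25, 16, 20, 19, 4, 4, 6]
trials = [48, 90, 42, 68, 55, 12, 7, 8]


def trend_z(scores, successes, totals):
    """Cochran-Armitage z statistic for a linear trend in proportions."""
    n_total = sum(totals)
    p_bar = sum(successes) / n_total
    x_bar = sum(n * x for n, x in zip(totals, scores)) / n_total
    # Observed minus expected successes, weighted by each group's centered score.
    numerator = sum((a - n * p_bar) * (x - x_bar)
                    for a, n, x in zip(successes, totals, scores))
    s_xx = sum(n * (x - x_bar) ** 2 for n, x in zip(totals, scores))
    return numerator / math.sqrt(p_bar * (1 - p_bar) * s_xx)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


hit_rates = [h / n for h, n in zip(hits, trials)]
z = trend_z(freq, hits, trials)
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
rho = spearman(freq, hit_rates)
once_or_twice = sum(hits[:2]) / sum(trials[:2])
three_or_more = sum(hits[2:]) / sum(trials[2:])

print(f"z = {z:.2f}, p = {p_two_tailed:.3f}, rho = {rho:.2f}")
print(f"hit rate, 1-2 occurrences: {once_or_twice:.2f}; 3+: {three_or_more:.2f}")
```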
Another way to look at this relationship would be to compare the hit rate of targets that occurred once or twice (.27) with that of targets that appeared three or more times (.36). This pattern exists separately for the static and dynamic targets, although it is stronger among the dynamic targets. The static targets that occurred once or twice had a hit rate of .22, compared with a hit rate of .31 for those that occurred more than twice. The hit rate was .32 for those dynamic targets that occurred once or twice, as compared with a hit rate of .41 for those that occurred three or more times.

Target Occurrence and Experimenter Prompting

What accounts for this peculiar relationship? Is the correlation between target frequency and hit rate determined by which particular targets get repeated? Or does replication itself somehow increase the hit rate? If the relation is due to response biases, I would expect experimenter prompting to affect the later occurrences of targets rather than their first occurrences. With these questions in mind, I conducted a multinomial analysis of variance (Woodward, Bonett, & Brecht, 1990). In this analysis, hit rate was the dependent variable, and three two-level factors were the independent variables: target type (static, dynamic), target occurrence (first, later), and experimenter prompting (no, yes). Of the interactions, only that between target occurrence and experimenter prompting was significant, χ²(1, N = 330) = 6.83, p = .009. The two significant main effects were target type, χ²(1, N = 330) = 4.76, p = .030, and target occurrence, χ²(1, N = 330) = 11.56, p