ANOMALOUS MENTAL PHENOMENA
Document Type:
Collection:
Document Number (FOIA) /ESDN (CREST):
CIA-RDP96-00789R003100030001-4
Release Decision:
RIFPUB
Original Classification:
U
Document Page Count:
439
Document Creation Date:
November 4, 2016
Document Release Date:
January 22, 2003
Sequence Number:
1
Case Number:
Publication Date:
June 24, 1991
Content Type:
RP
File:
Attachment | Size |
---|---|
CIA-RDP96-00789R003100030001-4.pdf | 34.76 MB |
Body:
Approved For Release 2003/04/18 : CIA-RDP96-00789R003100030001-4
Anomalous Mental Phenomena:
Selected Papers
Compiled By:
The Cognitive Sciences Laboratory
24 June 1991
Science Applications International Corporation
An Employee-Owned Company
5150 El Camino Real, Suite B-31, Los Altos, California 94022 (415) 960-5910
Other SAIC Offices: [illegible], Seattle, Tucson
I INTRODUCTION
In this volume, we present a selected set of papers on, and/or in support of, anomalous mental phenom-
ena. No section could possibly be complete; however, we have chosen papers that are representative of
their particular sections. The sections, which are separated by blue sheets, are as follows:
Section | Number of Papers |
---|---|
I Introduction | |
II Meta-analyses of Anomalous Mental Phenomena | 8 |
III Main-stream Publications | 7 |
IV Anomalous-mental-phenomena Journal Publications | 6 |
V Magnetoencephalography | 3 |
VI Physics | 6 |
II META-ANALYSES OF ANOMALOUS MENTAL
PHENOMENA
As in all behavioral sciences, replication of experiments in anomalous mental phenomena (AMP) is
critical before any putative effects can be verified as part of nature. Because of the complex nature of
most behavioral experiments, drawing conclusions from a body of similar experiments has been prob-
lematical. Meta-analysis, however, is a relatively new statistical approach that has been specifically de-
signed to address the particular difficulties inherent in the behavioral sciences.
The papers in this section have been selected because they represent all such analyses of a substantial
portion of the published AMP literature to date. Through replication and meta-analysis, the general
scientific community will have tools with which to judge the claims of the AMP literature.
The number that appears in the upper right-hand corner of the first page for each publication is keyed
to the following descriptions:
1. Utts, J., "Successful Replication Versus Statistical Significance," Journal of Parapsychology, Vol.
52, pp. 305-320, (December, 1988). By defining, in statistical terms, the meaning of replication
for few?a effects, Utts, a Professor of Statistics from the University of California at Davis, sets the
statistical basis for meta-analysis.
2. Honorton, C., "Error Some Place!" Journal of Communication, pp. 103-116, (Winter, 1975). This
paper predates the development of formal meta-analysis, but Honorton provides a critical review
of all the ESP card-guessing experiments from 1934 to 1939. The paper includes a description of
the claims and counter-claims surrounding the controversy of the day.
3. Honorton, C. and Ferrari, D. C., "'Future Telling': A Meta-Analysis of Forced-Choice Precognition
Experiments, 1935-1987," Journal of Parapsychology, Vol. 53, pp. 282-308, (December, 1989).
Using the full complement of meta-analytical tools, Honorton provides a critical review of all the
ESP experiments during which the target material (i.e., usually ESP cards) is generated after the
guess has been recorded.
4. Honorton, C., Berger, R. E., Varvoglis, M. P., Quant, M., Derr, P., Schechter, E. I., and Ferrari, D.
C., "Psi Communication in the Ganzfeld," Journal of Parapsychology, Vol. 54, pp. 99-137, (June,
1990). This paper provides a meta-analysis of Ganzfeld experiments (i.e., a form of anomalous
cognition). The database is comprised of 11 series for a total of 355 individual trials.
5. Radin, D. I. and Nelson, R. D., "Evidence for Consciousness-Related Anomalies in Random
Physical Systems," Foundations of Physics, Vol. 19, No. 12, pp. 1499-1514, (December, 1989).
Radin and Nelson analyze over 800 experiments that claim evidence for mental human-machine
interactions (i.e., anomalous perturbation). After a careful analysis, which includes accounting
for experiment flaws, they conclude that there is substantial statistical evidence to support the
claim.
6. Honorton, C., Ferrari, D. C., and Bem, D. J., "Extraversion and ESP Performance: Meta-Analysis
and a New Confirmation," Proceedings of the Parapsychological Association 33rd Annual
Convention, Chevy Chase, MD, (August, 1990). In an important link to traditional psychological
experimentation, this paper provides a meta-analysis for the correlation of ESP performance and
a traditional personality variable, extraversion.
7. Rosenthal, R., "Meta-Analytic Procedures and the Nature of Replication: The Ganzfeld Debate,"
Journal of Parapsychology, Vol. 50, pp. 319-336, (December, 1986). Rosenthal, a professor of
psychology at Harvard University, is one of the early developers of meta-analysis techniques.
In this paper, he comments on the Ganzfeld controversy.
8. Utts, J., "Replication and Meta-Analysis in Parapsychology," Accepted for publication in
Statistical Sciences. In this paper, Utts provides an independent and objective overview of the
AMP meta-analyses that follow.
Journal of Parapsychology, Vol. 52, December 1988
SUCCESSFUL REPLICATION VERSUS
STATISTICAL SIGNIFICANCE
BY JESSICA UTTS
ABSTRACT: The aim of this paper is to show that successful replication in
parapsychology should not be equated with the achievement of statistical significance,
whether at the .05 or at any other level. The p value from a hypothesis test is
closely related to the size of the sample used for the test; so a definition of
successful replication based on a specific p value favors studies done with large
samples. Many "nonsignificant" studies may simply be ones for which the sample size
was not large enough to detect the small magnitude effect that was operating.
Conversely, "significant" studies may result from a small but conceptually insignificant
bias, magnified by a very large sample.
The paper traces the history of the definition of statistical significance in
parapsychology and then outlines the problems with using hypothesis-testing results to
define successful replications, especially when applied in a cookbook fashion.
Finally, suggestions are given for alternative approaches to looking at experimental
data. These include calculating statistical power before doing an experiment, using
estimation instead of, or in conjunction with, hypothesis testing, and implementing
some of the ideas from Bayesian statistics.
Replication is a major issue in parapsychology. Arguments about
whether a given research paradigm has been successful tend to focus
on what the replication rate has been. For example, the recent
review of parapsychology by the National Research Council includes
statements such as "...of these 188 [RNG] experiments with some
claim to scientific status, 58 reported statistically significant results
(compared with the 9 or 10 experiments that would be expected by
chance)" (Druckman & Swets, 1988, p. 185). In each section, the
report critically evaluates "significant" experiments and ignores
"nonsignificant" experiments. The extent to which nonsignificant
experiments are ignored is exemplified by the following oversight,
in which the total number of studies is equated with the number of
"successful" studies: "Of the thirteen scientifically reported experi-
ments [of remote viewing], nine are classified as successful in their
outcomes by Hansen et al.... As it turns out, all but one of the nine
scientifically reported studies of remote viewing suffer from the flaw
of sensory cueing" (p. 183, emphasis added). Apparently the au-
thors decided that the four experiments that did not attain a p value
of .05 or less did not even warrant acknowledgment.
The practice of defining a successful replication as an experiment
that attains a p value of .05 or less is common in parapsychology,
psychology, and some other disciplines that use statistics. However,
like many other conventions in science, it is based on a series
of historical events rather than on rational thought. In this paper, I
will trace some of the history leading to this definition of a "suc-
cessful" experiment, outline some problems with this approach, and
suggest some methods that parapsychologists should consider in ad-
dition to the usual hypothesis-testing regimen. Rao (1984) and Hon-
orton (1984) have discussed similar problems and solutions in the
context of psi experiments.
HISTORY
It has not always been the case among parapsychologists that an
experiment was deemed successful if it reached a significance level
of p = .05. In 1917, John Edgar Coover, who was the Thomas Welton
Stanford Psychical Research Fellow at Stanford University from
1912 to 1937, published a book with the results from several experiments
he had conducted up to that time (Coover, 1917/1975). Although
hypothesis testing as we know it today had not yet been formalized,
he essentially conducted tests on many facets of this data
and found no evidence for psi that was convincing to him. His con-
clusions regarding these results are typified by an example he gave
in which the hit rate for 518 trials was 30.1%, when 25% was ex-
pected by chance (exact p value = .00476):
We get 0.9938 [p-value = 1 - 0.9938 = 0.0062] for the probability that
chance deviations will not exceed this limit [of 30.1 percent]....Since
this value, then, lies within the field of chance deviation, although the
probability of its occurrence by chance is fairly low, it cannot be ac-
cepted as a decisive indication of some cause beyond chance which op-
erated in favor of success in guessing. (p. 82)
He then revealed what level of evidence would convince him that
nonchance factors were operating: "...if we meet the requirement
of a degree of accuracy usual in scientific work by making P =
0.9999779, when absolute certainty is P = 1, then [there is] satisfac-
tory evidence for some cause in addition to chance" (p. 83). In other
words, he was defining significance with a p value of 2.21 x 10^-5.
Coover was not alone in requiring that results conform to arbitrarily
stringent significance levels. In 1940, when Rhine et al. published
Extra-Sensory Perception After Sixty Years, they included the following
definitions in the glossary:
p-value = probability of success in each trial
SIGNIFICANCE: When the probability that chance factors alone produced
a given deviation is sufficiently small to provide relative certainty
that chance is not a reasonable expectation, the deviation is significantly
above or below the chance level. Among ESP results, this is arbitrarily
taken to mean a deviation in the expected direction such that the critical
ratio is 2.5 times the standard deviation (or four times the probable error)
or greater. (p. 423-424)
Thus, significance was defined by z ≥ 2.5, or p ≤ .0062.
Seventeen years later, in their book Parapsychology: Frontier Science
of the Mind, Rhine and Pratt (1957) suggested that .01 was the
appropriate threshold:
In order for such judgments to have the necessary objectivity, a criterion
of significance is established by practice and general agreement among
the research workers in a particular field.... Most workers in parapsychology
accept a probability of .01 as the criterion of significance. (p. 186)
Finally, the Journal of Parapsychology has included a definition of
significance in its glossary for many years, but the appropriate p
value has fluctuated back and forth between .01 and .02, finally settling
at .02 in 1968. The following are excerpts from those glossaries:
December 1949: "A numerical result is significant when it equals
or surpasses some criterion of degree of chance improbability.
Common criteria are: a probability value of .01 or less."
March 1950 to June 1957: "The criterion commonly used in this
Journal is a probability value of .02 or less."
September 1957: "The criterion commonly used in this Journal
is P = .01."
December 1957 to December 1967: "The criterion commonly
used in parapsychology today is a probability value of .01 or
less."
March 1968 to December 1986: "The criterion commonly used
in parapsychology today is a probability value of .02 (odds of
50 to 1 against chance) or less.... Odds of 20 to 1 (probability
of .05) are regarded as strongly suggestive."
March 1987: The term significance no longer appears in the glos-
sary.
By the mid-1980's, despite the value of .02 given in the Journal
of Parapsychology, significance seemed to have been determined to
correspond to a p value of .05. For example, in their bibliography
of remote-viewing research, Hansen, Schlitz, and Tart (1984) claim:
"We have found that more than half (fifteen out of twenty-eight) of
the published formal experiments have been successful, where only
one in twenty would be expected by chance." As mentioned in my
introduction, .05 was the value used by the National Research Council
in their recent evaluation of parapsychology. Both Hyman (1985)
and Honorton (1985) used .05 as the criterion for a successful ganzfeld
study. In discussing the Schmidt REG experiments, Palmer
(1985) implicitly used .05 as the cut-off for significance by observing:
"Based on Z-tests ... 25 of the 33 (76%) were significant at the
.05 level, two-tailed. In two of the seven non-significant studies...."
(p. 102).
This definition of significance is obviously not unique to para-
psychology. A popular introductory textbook in psychology states
that:
Psychologists used a statistical inference procedure that gives them an
estimate of the probability that an observed difference could have occurred
by chance. This computation is based on the size of the difference
and the spread of the scores. By common agreement, they accept
a difference as "real" when the probability that it might be due to
chance is less than 5 in 100 (indicated by the notation p < .05). A significant
difference is one that meets this criterion.... With a statistically
significant difference, a researcher can draw a conclusion about the behavior
that was under investigation. (Zimbardo, 1988, p. 54)
Given the weight that has been attached to .05 as the criterion
for significance, one would think that it resulted from careful consideration
of the issue by statisticians and psychologists. Unfortunately,
such is not the case. Its roots apparently lie in the following
passage published in 1926 by one of the founders of modern statistics,
Sir Ronald A. Fisher:
It is convenient to draw the line at about the level at which we can say:
"Either there is something in the treatment, or a coincidence has oc-
curred such as does not occur more than once in twenty trials." ...If
one in twenty does not seem high enough odds, we may, if we prefer
it, draw the line at one in fifty (the 2 per cent point), or one in a
hundred (the 1 per cent point). Personally, the writer prefers to set a
low standard of significance at the 5 per cent point, and ignore entirely
all results which fail to reach that level. A scientific fact should be regarded
as experimentally established only if a properly designed experiment
rarely fails to give this level of significance. (Fisher, 1926, p. 504;
also quoted in Savage, 1976, p. 471)
Thus began the belief that an experiment is successful only if the
null hypothesis can be rejected using α = 0.05. As an immediate
consequence of this belief, Fisher and his followers created tables of
F statistics that included values only for tail areas of .05 and .01.
Since researchers did not have access to computer algorithms to de-
termine intermediate p values, success came to be measured in terms
of these two values alone.
PROBLEMS WITH HYPOTHESIS TESTING
Misconceptions about p Values
Most modern research reports include p values instead of simply
discussing whether an experimental result is significant at a pre-
specified level. Although this is somewhat better than the old
method of "one star or two" (corresponding to a significant result
at .05 or .01, respectively), it is still a misleading way to examine
experimental results.
The problem is that many researchers interpret p values as being
related to the probability that the null hypothesis is true. Even some so-
phisticated researchers tend to think that an extremely small p value
must correspond to a very large effect in the population and that a
large p value (say > .10) means that there is no effect. In other
words, the size of the p value is incorrectly interpreted as the size of the
effect. It should be interpreted as the probability of observing results
as extreme or more so than those observed, if there is no effect.
To see how arbitrary it is to base a decision about the truth or
falsity of a statement on a p value, consider a binomial study based
on a sample of size n which results in z = 0.30, p value = .38, one-
tailed. One would probably abandon the hypothesis under study
and decide not to pursue the given line of research. Now suppose
that the study had been run with a sample of size 100n instead and
resulted in the exact same proportion of hits. Then we would find
z = 3.00, p value = .0013. These results would be regarded as
highly significant!
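
The arithmetic in this example can be checked with a short Python sketch. It assumes only that, when the observed proportion of hits is held fixed, z grows with the square root of the sample size; the helper name `upper_tail_p` is illustrative and does not come from the paper.

```python
import math

def upper_tail_p(z: float) -> float:
    """One-tailed p value: P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z_small = 0.30                       # z from the study with n trials
z_large = z_small * math.sqrt(100)   # same proportion of hits, 100n trials

p_small = upper_tail_p(z_small)
p_large = upper_tail_p(z_large)
print(f"n trials:    z = {z_small:.2f}, p = {p_small:.2f}")
print(f"100n trials: z = {z_large:.2f}, p = {p_large:.4f}")
```

Nothing about the hit rate changes between the two lines of output; only the sample size does.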
As another example, consider a chi square test for randomness
based on a sequence of n numbers, each of which can take the val-
ues 1, 2, ... 10. Suppose that the test results in a chi-square value of
11.0, df = 9, p value = 0.28. Now suppose the sequence was three
times as long but the proportions of each digit remained the same.
Then each term in the numerator of the chi-square statistic would
be multiplied by 3^2, whereas each term in the denominator would
only be multiplied by 3. The degrees of freedom would not change,
but the new result would be χ² = 33.0, df = 9, p value = .00013.
In the first case, the conclusion would be that the sequence was suf-
ficiently random, yet a sequence three times as long with the same
pattern would be seen to deviate considerably from randomness!
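
A minimal Python sketch of this scaling follows. The digit counts are hypothetical (the paper gives none) and were chosen only so that the shorter sequence yields a chi-square of 11.0. The tail probability uses the standard closed form for odd degrees of freedom, and the function names are illustrative.

```python
import math

def chi_square_statistic(counts):
    """Pearson chi-square against equal expected counts in every cell."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def chi_square_sf_odd_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square with odd df.

    Closed form for odd df:
    erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_r x^(r - 1/2) / (1*3*...*(2r-1)),
    with r running from 1 to (df - 1) / 2.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch handles odd df only")
    total, odd_product = 0.0, 1.0
    for r in range(1, (df - 1) // 2 + 1):
        odd_product *= 2 * r - 1
        total += x ** (r - 0.5) / odd_product
    tail = math.sqrt(2 / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail

# Hypothetical counts for the digits 1..10 (not from the paper).
short_counts = [15, 5, 15, 5, 12, 8, 11, 9, 10, 10]
long_counts = [3 * c for c in short_counts]  # same proportions, three times as long

for counts in (short_counts, long_counts):
    stat = chi_square_statistic(counts)
    p_value = chi_square_sf_odd_df(stat, df=9)
    print(f"n = {sum(counts)}: chi-square = {stat:.1f}, p = {p_value:.5f}")
```

Tripling every count triples the statistic while leaving the degrees of freedom at 9, which is what drives the p value down.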
This problem was recognized more than 50 years ago by Berk-
son (1938):
We may assume that it is practically certain that any series of real ob-
servations does not actually follow a normal curve with absolute exactitude
in all respects, and no matter how small the discrepancy between the
normal curve and the true curve of observations, the chi-square P will
be small if the sample has a sufficiently large number of observations in it.
If this be so, then we have something here that is apt to trouble the
conscience of a reflective statistician using the chi-square test. For I sup-
pose it would be agreed by statisticians that a large sample is always
better than a small sample. If, then, we know in advance the P that will
result from an application of a chi-square test to a large sample, there
would seem to be no use in doing it on a smaller one, but since the
result of the former test is known, it is no test at all. (pp. 526-527,
emphasis in original)
Replication
Very often researchers simply do not understand the connection
between the p value and the size of the sample. For example, Rosenthal
and Gaito (1963) asked nine faculty members and ten graduate
students in a university psychology department to rate their
degree of belief or confidence in results of hypothetical studies with
various p values and with sample sizes of 10 and 100. Given the
same p value, one should have more confidence in a study with a
smaller sample because it would take a larger underlying effect to
obtain the small p value for a small sample. Unfortunately, these
respondents demonstrated that they were far more likely to believe
results based on the large sample when the p values were the same.
(For a discussion of this example and some other problems with hy-
pothesis testing in psychology, see Bakan, 1967.)
One consequence of this misunderstanding is that researchers
misinterpret what constitutes a "successful replication" of an exper-
iment. Tversky and Kahneman (1982) asked 84 members of the
American Psychological Association or the Mathematical Psychology
Group the following question:
Suppose you have run an experiment on 20 subjects, and have obtained
a significant result which confirms your theory (z = 2.23, p < .05, two-
tailed). You now have cause to run an additional group of 10 subjects.
What do you think the probability is that the results will be significant,
by a one-tailed test, separately for this group? (p. 23)
The median answer given was .85. Only 9 of the 84 respondents
gave an answer between .40 and .60. Assuming that the value ob-
tained in the first test was close to the true population value, the
probability of achieving a p value ≤ .05 on the second test is actually
only about .47. This is because the sample size in the second study
is so small. The effect would have to be quite large in order to be
detected with such a small sample.
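
One way to arrive at a figure near .47 is sketched below in Python. It assumes that the standardized effect equals the estimate from the first study and that the second study's z is normally distributed with standard deviation 1. This is an illustration of the reasoning, not necessarily the calculation the author used.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

z_first, n_first = 2.23, 20   # original study
n_second = 10                 # additional group of subjects
z_critical = 1.645            # one-tailed cutoff at the .05 level

# Assumption: the effect per subject matches the first study's estimate,
# so the expected z scales with the square root of the sample size.
effect_per_root_n = z_first / math.sqrt(n_first)
expected_z_second = effect_per_root_n * math.sqrt(n_second)

prob_significant = 1.0 - normal_cdf(z_critical - expected_z_second)
print(f"expected z = {expected_z_second:.2f}, P(significant) = {prob_significant:.2f}")
```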
In the same survey, Tversky and Kahneman also asked:
An investigator has reported a result that you consider implausible. He
ran 15 subjects, and reported a significant value, t = 2.46. Another investigator
has attempted to duplicate his procedure, and he obtained a
nonsignificant value of t with the same number of subjects. The direc-
tion was the same in both sets of data. You are reviewing the literature.
What is the highest value of t in the second set of data that you would
describe as a failure to replicate? (p. 28)
The majority of respondents considered t = 1.70 as a failure to
replicate. But if the results from both studies are combined, then
(assuming equal variances) the result is t = 2.94, df = 29, p value
= .003. The paradox is that the new study decreases faith in the orig-
inal result if viewed separately but increases it when combined with
the original data!
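
The combined value can be approximated with a short Python sketch. It assumes equal sample sizes and a common standard deviation in the two studies, in which case pooling the data gives roughly (t1 + t2) / sqrt(2). How the paper actually obtained 2.94 is not stated, so this is only one plausible route.

```python
import math

t_first, t_second = 2.46, 1.70   # t values from the two 15-subject studies

# With equal n and a common standard deviation, each t is mean / (sd / sqrt(n)).
# Pooling doubles n and averages the two means, so the combined t is
# approximately (t_first + t_second) / sqrt(2).
t_combined = (t_first + t_second) / math.sqrt(2)
print(f"combined t = {t_combined:.2f}")
```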
This misunderstanding about replication is quite prevalent in the
psi literature, as demonstrated by the emphasis on successful replication,
where success is defined in terms of a specific p value, regardless
of sample size. As an example of how unnecessarily dis[illegible]ters,
I have shown elsewhere (Utts,
1986) that if the true hit rate in a binomial study (such as a ganzfeld
experiment) is actually 33%, and 25% is expected by chance, then a
study based on a sample of size 26 should be expected to be "successful"
(p ≤ .05) only about one fifth of the time. Even a study
based on a sample of size 100 should be "successful" only about half
of the time. It is no wonder that there are so many "unsuccessful"
attempts at replication in psi.
As another example of the paradoxical nature of this definition
of replication, consider the "unsuccessful" direct-hit ganzfeld studies
covered by the meta-analyses of Hyman (1985) and Honorton
(1985). Using those studies with p(hit) = .25, there were 13 out of
24 that were nonsignificant, α = 0.05, one-tailed. (See Honorton, p.
84, Table Al.) But when these 13 "failures" are combined, the re-
sult is 106 hits out of 367 trials, z = 1.66, p = .0485!
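
The pooled figure can be reproduced, to rounding, with the Python sketch below. It assumes a continuity correction of 0.5, matching the critical ratio described in the power calculations of this paper; the paper does not say explicitly how this particular z was computed.

```python
import math

hits, trials, p_chance = 106, 367, 0.25

expected_hits = trials * p_chance
std_dev = math.sqrt(trials * p_chance * (1 - p_chance))

# Assumption: a 0.5 continuity correction is applied to the hit count.
z = (hits - 0.5 - expected_hits) / std_dev
p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2.0))
print(f"hit rate = {hits / trials:.3f}, z = {z:.2f}, p = {p_one_tailed:.3f}")
```

The small difference between this p value and the published .0485 is consistent with the published figure having been read from a table at z = 1.66.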
Problems with Point Null Hypotheses
A point null hypothesis is one that specifies a particular value
("point") as the one being tested. Most hypothesis testing is done
with point null hypotheses. The problem with this approach is that
any given hypothesis is bound to be false, even if just by a minuscule
amount. For example, in a coin-tossing experiment, the null hypothesis
is that the coin is fair, that is to say, H0: P = .5000000.
This is never precisely true in nature. All coins and coin-tossers introduce
a slight bias into the experiment. This slight bias can produce
a very small p value if the sample size is large enough. If, for
example, the true probability of heads is .5001, and the observed
proportion of heads falls right at this value, then the null hypothesis
will be rejected at .05 if the sample size is at least 6.7 x 10^7. As long
as there is any bias at all, the p value can be made arbitrarily small
by taking a large enough sample.
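
The required sample size can be estimated with the Python sketch below. It assumes a one-tailed test with critical value 1.645, which is the reading that matches the order of magnitude quoted above; a two-tailed test would need a larger sample.

```python
import math

p_null, p_true = 0.5, 0.5001
z_critical = 1.645   # assumed one-tailed test at the .05 level

# The observed proportion is taken to equal p_true, so the test statistic is
# z = (p_true - p_null) / sqrt(p_null * (1 - p_null) / n).
# Setting z >= z_critical and solving for n gives:
std_dev_single_trial = math.sqrt(p_null * (1 - p_null))
n_required = (z_critical * std_dev_single_trial / (p_true - p_null)) ** 2
print(f"sample size needed: about {n_required:.2e}")
```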
In practice, this problem was rarely serious before it became pos-
sible to collect large amounts of data rapidly using computers. Stat-
isticians have often used ESP as an example of one of the few cases
where it really is possible to specify an exact value for the null hy-
pothesis. But even this view is changing, as shown by this comment
from a recent issue of a popular statistics journal:
It is rare, and perhaps impossible, to have a null hypothesis that can be
exactly modeled as θ = θ0. One might feel that hypotheses such as
H0: A subject has no ESP, or
H0: Talking to plants has no effect on their growth,
are representable as exact (and believable) point nulls, but, even here,
minor biases in the experiments will usually prevent exact representa-
tions as points. (Berger & Delampady, 1987, p. 320)
In summary, hypothesis testing as it is currently formulated
tends to be a misleading approach to examining data. Small samples
tend to lead to "nonsignificant" studies, whereas large samples can
lead to extremely small p values, even if the null hypothesis is only
slightly wrong. Many researchers do not understand the meaning of
a p value and do not understand how closely replication issues are
tied to sample size. Arguments about replication should not be
based on p values alone.
SOLUTIONS
Power Calculations
If a hypothesis test is to be done at all, a researcher should at
least determine in advance whether it is likely to be successful. The
statistical power of a test is the probability that the null hypothesis
will be rejected. It obviously depends on what the true underlying
state of nature is. Because this information cannot be known (or
there would be no point in doing the experiment), it is a good idea
to look at power for a variety of possibilities before conducting the
experiment. The results will tell you whether you are likely to be
able to reject the null hypothesis, using the sample size you have
planned, for specific values of the magnitude of the effect.
Statistical power is a function of the sample size, the true underlying
magnitude of the effect, the level of significance for which the
experiment would be considered a success, and the method of
analysis used. It does not depend on the data.
As an example, suppose you are planning to conduct a test of
the hypothesis H0: P = .25 using a series of 10 independent trials.
Power calculations would proceed as follows:
1. Find the cutoff point for the number of hits that would lead
to rejection of H0. In this case, the p value for 5 hits is .08, and for
6 hits it is .02, so 6 hits would probably be required to reject the
null hypothesis.
2. Power for a specific alternative is the probability that the null
hypothesis would be rejected if that alternative value is true. In this
case, power = P(6 or more hits). This can be computed directly,
using the binomial formula, for any specified hit rate. Here are
some examples:
Hit rate | Power = P(6 or more hits) |
---|---|
0.30 | .047 |
0.33 | .073 |
0.40 | .166 |
0.50 | .377 |
Notice that even if the true hit rate is 50% instead of the chance level
of 25%, the chances of a "successful" replication are poor, that is, only
37.7%. In most psi applications, 30% or 33% is probably a more realistic
approximation to the true hit rate, so there would be a very
small chance of having this experiment succeed with only 10 trials.
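
The two steps of this power calculation can be sketched in Python with exact binomial probabilities. The function names are illustrative, and the .05 level is assumed to be the success criterion, as in the example.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def rejection_cutoff(n: int, p_null: float, alpha: float = 0.05) -> int:
    """Smallest number of hits whose one-tailed p value is at most alpha."""
    for k in range(n + 1):
        if binomial_upper_tail(k, n, p_null) <= alpha:
            return k
    raise ValueError("no cutoff reaches the requested alpha")

n_trials, p_null = 10, 0.25

# Step 1: find the cutoff under the null hypothesis.
cutoff = rejection_cutoff(n_trials, p_null)
print(f"cutoff = {cutoff} hits")

# Step 2: power is the chance of reaching the cutoff at each assumed hit rate.
for hit_rate in (0.30, 0.33, 0.40, 0.50):
    power = binomial_upper_tail(cutoff, n_trials, hit_rate)
    print(f"hit rate {hit_rate:.2f}: power = {power:.3f}")
```

Changing `n_trials` shows how quickly power improves as the planned number of trials grows.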
As a second example, suppose you are planning to run the same
experiment with 100 trials and are planning to use the normal approximation
instead of an exact test. Further, suppose you will reject
the null hypothesis if z ≥ 1.645, where z is the usual critical
ratio, corrected for continuity: z = (number of hits - 0.5 - 25) /
sqrt(100 x .25 x .75) = .23(number of hits - 25.5). Using
simple algebra, note that z ≥ 1.645 when the number of hits ≥
32.65. Thus, the null hypothesis will be rejected if there are 33 or
more hits, so power = P(33 or more hits). Computing this for the
same hypothetical hit rates as in the previous example gives:
Hit rate | Power = P(33 or more hits) |
---|---|
0.30 | .289 |
0.33 | .538 |
0.40 | .939 |
0.50 | .9998 |
Now there is a more reasonable chance for a successful study, although
it is still only 29% even if the true hit rate is 30%.
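
The 100-trial case can be sketched the same way in Python. The cutoff comes from the continuity-corrected critical ratio given above; the power values are then computed from the exact binomial distribution, which is an assumption, since the paper does not say whether its tabled values are exact or approximate.

```python
import math
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_trials, p_null, z_critical = 100, 0.25, 1.645

mean_null = n_trials * p_null
sd_null = math.sqrt(n_trials * p_null * (1 - p_null))

# Solve (hits - 0.5 - mean_null) / sd_null >= z_critical for a whole number of hits.
min_hits = math.ceil(mean_null + 0.5 + z_critical * sd_null)
print(f"reject when hits >= {min_hits}")

for hit_rate in (0.30, 0.33, 0.40, 0.50):
    power = binomial_upper_tail(min_hits, n_trials, hit_rate)
    print(f"hit rate {hit_rate:.2f}: power = {power:.3f}")
```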
For studies in which the null hypothesis does not involve a single
value, it can be more difficult to compute power because it is not so
easy to specify a reasonable alternative. In these cases, it is still possible
to look at the p value that can be expected if psychic functioning
were to occur at specified levels for the sample size planned. For
example, McClenon and Hyman (1987) conducted a remote-viewing
study with eight trials, one for each of eight subjects, and used the
preferential-ranking method of Solfvin, Kelly, and Burdick (1978)
on the subject rankings. Each subject was asked to rank-order eight
choices of potential targets as compared to the response he or she
had produced. By chance, the average rank should be 4.5. If
psychic functioning had reduced the average rank to 4.0, the p
value would have been .298, not significant. Even if the average
rank had been reduced to 3.5, the study would still not have been
significant, p value = .126. The average rank would have to be 3.0
before this study would achieve a significant result. A parapsychologist
experienced in remote viewing should be able to determine in
advance whether such a study would be likely to be successful with
such a small sample.
The lesson here is that a "nonsignificant" study may be nothing
more than a study with low power. Before investing time and
money in a new study, it should be determined whether it is likely
to succeed if psychic functioning is operating at a given level.
Estimation
An approach that avoids many of the problems with hypothesis
testing is to construct a "confidence interval" or an "interval esti-
mate" for the magnitude of an effect. This is done by computing an
interval of values that almost certainly covers the true population
value. The degree of certainty is called the confidence coefficient and
is specified by the researcher. Common values are 95% and 99%.
As an example, consider a binomial study with 100 trials that
results in 35 hits. Using the normal approximation, one would expect
the proportion of hits in the sample to be within 1.96 standard
deviations of the true hit rate 95% of the time. The appropriate
standard deviation for the proportion P of hits is $\sqrt{P(1 - P)/n}$.
Thus, a 95% confidence interval for the true hit rate is found by
adding and subtracting 1.96 of these standard deviations to the
proportion of hits observed in the sample. The resulting interval in this
case is 0.35 - 0.09 to 0.35 + 0.09, or 0.26 to 0.44. This tells us that
with a fair amount of certainty (95%), the true hit rate is covered
by the interval from 0.26 to 0.44. For the same proportion of hits
in a study with 1,000 trials, the interval would be from 0.32 to 0.38.
The larger the sample size, the shorter the width of the interval.
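The interval calculation just described can be written out as a few lines of code. This is a minimal sketch of the normal-approximation interval, with the observed proportion substituted for P in the standard deviation; the function name and its default multiplier of 1.96 are illustrative choices.

```python
import math


def normal_approx_ci(hits, trials, z=1.96):
    """Normal-approximation confidence interval for a binomial hit rate."""
    p_hat = hits / trials
    # Standard deviation of the sample proportion, estimated from the data.
    sd = math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - z * sd, p_hat + z * sd


if __name__ == "__main__":
    for hits, trials in ((35, 100), (350, 1000)):
        low, high = normal_approx_ci(hits, trials)
        print(f"{hits}/{trials}: {low:.2f} to {high:.2f}")
```

Passing z=2.576 instead of 1.96 gives the corresponding 99% interval.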
Consider two studies designed to test H0: P = .5:

              z      p value      n
Study 1     3.60      .0004     1,000
Study 2     2.40      .0164       100
Which study provides more convincing evidence that there is a
strong effect? In keeping with the results of Rosenthal and Gaito
(1963) discussed earlier, most people would say that the first study
shows a stronger effect, both because the p value is smaller and because
it is based on a larger sample. In fact, the opposite is true.
The numbers of hits for the two studies are 557 (55.7%) and 62
(62%), respectively; the smaller study had a higher hit rate. The
95% confidence intervals for the hit rates in the two studies are
(0.53 to 0.59) and (0.53 to 0.72), respectively, so in both studies we
are relatively sure that the hit rate is at least 53%, but in the second
study it could be as high as 72% whereas in the first it is probably
no higher than 59%.
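The comparison of the two studies can be reproduced from the hit counts alone. The sketch below assumes the z score is the usual normal approximation to the binomial without a continuity correction and that the p value is two-sided; the tabulated .0004 for the larger study may reflect a slightly different convention, so the code is an illustration rather than a reconstruction of the original computation.

```python
import math


def z_and_p(hits, trials, p0=0.5):
    """z score and two-sided normal p value for a test of H0: P = p0."""
    z = (hits - trials * p0) / math.sqrt(trials * p0 * (1 - p0))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


if __name__ == "__main__":
    for label, hits, trials in (("Study 1", 557, 1000), ("Study 2", 62, 100)):
        z, p = z_and_p(hits, trials)
        print(f"{label}: hit rate {hits / trials:.3f}, z = {z:.2f}, p = {p:.4f}")
```

The larger study has the larger z and the smaller p value even though its hit rate is lower, which is the point of the example.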
In studies with huge sample sizes, confidence intervals make it
evident that an infinitesimal p value does not correspond to an effect
of large magnitude. For example, consider a study based on
100,000 trials and designed to test H0: P = .50. Suppose there were
50,500 hits. Then z = 3.16, and the p value is 7.9 x 10^-4. But what
does this mean in practical terms? A 95% confidence interval for the
true hit rate is from 0.5019 to 0.5081. Thus, it appears that the true
hit rate is indeed different from 0.50, but reporting the results in
this way makes it clear that the magnitude of the difference is very
small. The reader can decide whether an effect of this size has any
meaning in the context of the experiment.
In summary, confidence intervals are preferable to hypothesis
tests for the following reasons:
1. They show the magnitude of the effect.
2. They show that the accuracy of the conclusion is highly dependent
on the sample size.
3. They remove the focus from decision making, which is arbitrary
at best because of sample size problems.
4. They highlight the distinction between statistical significance
and practical significance.
5. They allow the reader of a research report to come to his or
her own conclusion.
Meta-Analyses
Meta-analytic techniques may be viewed by some parapsycholo-
gists as the solution to studying the issue of replication. Even though
these techniques can address the replication issue in useful ways,
they also contain some dangerous pitfalls. For example, both Hy-
man (1985) and Honorton (1985) used "vote-counting" in their
meta-analyses of the ganzfeld data base. In other words, they tallied
the number of significant studies in the data base. This procedure
inherits all of the problems associated with the original determina-
tion of whether a study was "significant" in the first place. A series
of studies, each with low power, may all be determined to be non-
significant, when the combined data may lead to an extremely sig-
nificant result. Conversely, a series of studies based on large samples
may all be significant, but the magnitude of the effect may be very
small. A vote-count showing that most studies are significant could
mislead researchers into believing that there was a large effect.
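The first pitfall of vote-counting can be illustrated with invented numbers. In the sketch below, the five studies and their hit counts are hypothetical and chosen only for illustration; each study is individually nonsignificant at the two-tailed .05 level, yet the pooled data are highly significant.

```python
import math


def z_score(hits, trials, p0=0.5):
    """Normal-approximation z score for a binomial test of H0: P = p0."""
    return (hits - trials * p0) / math.sqrt(trials * p0 * (1 - p0))


# Hypothetical data: five small studies, each with 30 hits in 50 trials.
studies = [(30, 50)] * 5

individual_z = [z_score(hits, trials) for hits, trials in studies]
significant_votes = sum(1 for z in individual_z if abs(z) >= 1.96)

pooled_hits = sum(hits for hits, _ in studies)
pooled_trials = sum(trials for _, trials in studies)
pooled_z = z_score(pooled_hits, pooled_trials)

if __name__ == "__main__":
    print("individual z scores:", [round(z, 2) for z in individual_z])
    print("studies counted as significant:", significant_votes)
    print("pooled z score:", round(pooled_z, 2))
```

Pooling raw hits is only one way of combining studies and assumes they are comparable; it is used here because it makes the contrast with the vote count easy to see.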
The concept of effect size was introduced to account for the fact
that individual study results are highly dependent on sample size.
Estimating the effect sizes for a series of studies and seeing whether
they are similar is a useful way of studying replication. However,
examining only the effect size for an individual study does not give
any indication of the accuracy of the result. The effect size should
therefore be reported in conjunction with some estimate of its
accuracy, such as a confidence interval.
Bayesian Methods
Many statisticians believe that the conceptual framework of hypothesis
testing and interval estimation is philosophically incorrect.
Rather, they start by assigning prior probabilities, based on subjective
belief, to various hypotheses, and then combine these "priors"
with the data to compute final or "posterior" probabilities for the
hypotheses. This is called the Bayesian approach to statistics. An in-
troduction to the ideas of Bayesian analysis can be found in Berger
and Berry (1988) or Edwards, Lindman, and Savage (1963). A more
technical reference is Berger (1985).
Berger and Berry (1988), in a recent article in American Scientist,
discussed the use of Bayesian methods instead of classical methods:
The first step of this demonstration is to calculate the actual probability
that the hypothesis is true in light of the data. This is the domain of
Bayesian statistics, which processes data to produce "final probabilities"
...for hypotheses. Thus, the conclusion of a Bayesian analysis might be
that the final probability of H is 0.30.
The direct simplicity of such a statement compared with the convoluted
reasoning necessary to interpret a P-value is in itself a potent argument
for Bayesian methods. Nothing is free, however, and the elegantly
simple Bayesian conclusion requires additional input. To obtain
the final probability of a hypothesis in light of the experimental data, it
is necessary to specify the probability of the hypothesis before or apart
from the experimental data.
Where does this initial probability come from? The answer is simple.
It must be subjectively chosen by the person interpreting the data. A
person who doubts the hypothesis initially might choose a probability of
0.1; by contrast, someone who believes in it might choose 0.9. (p. 162)
They then provide an example of testing the hypothesis H: P = .5,
where P is the proportion of hits expected in a binomial experiment.
Suppose that in 17 trials there are 13 successes (76.5%). Then
the p value is .049, two-tailed, provided the number of trials was
fixed in advance. If the experiment was instead designed to stop at
the fourth failure rather than at the 17th trial, the p value, with
the identical data, would be only .021. Such problems arise with
classical methods, but not with Bayesian methods.
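The dependence of the p value on the stopping rule can be verified directly. The sketch below assumes that "two-tailed" means doubling the one-tailed probability under both designs; the function names are illustrative.

```python
from math import comb


def p_fixed_trials(successes=13, trials=17):
    """Two-tailed p value when the number of trials is fixed in advance."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1)) / 2 ** trials
    return 2 * tail


def p_stop_at_failure(successes=13, failures=4):
    """Two-tailed p value when sampling stops at a preset number of failures.

    Seeing at least `successes` successes before the final failure is the
    same event as seeing at most failures - 1 failures in the first
    successes + failures - 1 trials.
    """
    n = successes + failures - 1
    tail = sum(comb(n, k) for k in range(failures)) / 2 ** n
    return 2 * tail


if __name__ == "__main__":
    print("fixed at 17 trials:     ", round(p_fixed_trials(), 3))
    print("stop at fourth failure: ", round(p_stop_at_failure(), 3))
```

The data passed to both functions are identical; only the assumed design differs.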
Using the Bayesian approach, suppose that one's prior belief
that H is true is 50%. If H isn't true, the prior belief is that the true
value of P is equally likely to be anywhere between 0.5 - c and 0.5
+ c (where c is some constant), but could not possibly be farther
than that from 0.5. The choice of c represents prior opinion about
the strength of the effect, if there is one. Choosing c = 0.1 (the
effect isn't likely to be very strong even if it exists) results in a final
probability of 0.41 for H (given that there were 13 successes in 17
trials), whereas choosing c = 0.4 results in a probability of 0.21 for
H. In other words, the final degree of belief in H is dependent on
one's prior belief about the strength of the effect. It also depends
on prior opinion about the veracity of H, and on the observed data.
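The two final probabilities can be reproduced numerically. The sketch below assumes exactly the setup described: prior probability 0.5 for H: P = .5 and, if H is false, a uniform prior for P on the interval from 0.5 - c to 0.5 + c. The midpoint-rule integration and the function names are illustrative choices, not part of the original example.

```python
def likelihood(p, successes=13, failures=4):
    """Binomial likelihood kernel; the constant factor cancels in the ratio."""
    return p ** successes * (1 - p) ** failures


def posterior_h(c, prior_h=0.5, successes=13, failures=4, steps=20000):
    """Posterior probability of H: P = .5 against a uniform alternative."""
    like_h = likelihood(0.5, successes, failures)
    low, width = 0.5 - c, 2 * c
    dx = width / steps
    # Average likelihood over the uniform prior on (0.5 - c, 0.5 + c),
    # approximated by the midpoint rule.
    area = sum(
        likelihood(low + (i + 0.5) * dx, successes, failures)
        for i in range(steps)
    ) * dx
    like_alt = area / width
    numerator = prior_h * like_h
    return numerator / (numerator + (1 - prior_h) * like_alt)


if __name__ == "__main__":
    for c in (0.1, 0.4):
        print(f"c = {c}: posterior probability of H = {posterior_h(c):.2f}")
```

Changing prior_h shows how much the conclusion also depends on the initial belief in H itself, for example for a reader whose prior for H is close to 1.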
One reason that Bayesian methods are not more widely used is
that they are often difficult to apply. Another reason is that researchers
are uncomfortable with having to specify subjective degrees
of belief in their hypotheses. This approach makes particular
sense for parapsychology, however, because most researchers have
strong opinions about the probability that psi is real, and these opin-
ions play a central role in how psi researchers and critics evaluate
the evidence. Posterior probabilities in Bayesian analyses are a func-
tion of both the prior probabilities and the strength of the evidence;
it may be informative to formalize these opinions and to see how
much evidence would be needed to increase the posterior probabil-
ity of a psi hypothesis to a non-negligible level when the prior prob-
ability was close to zero.
REFERENCES
BAKAN, D. (1967). On method: Toward a reconstruction of psychological investigation.
San Francisco: Jossey-Bass, Inc.
BERGER, J. O. (1985). Statistical decision theory and Bayesian analysis. New
York: Springer-Verlag.
BERGER, J. O., & BERRY, D. A. (1988, March-April). Statistical analysis and
the illusion of objectivity. American Scientist, pp. 159-165.
BERGER, J. O., & DELAMPADY, M. (1987). Testing precise hypotheses. Statistical
Science, 2(3), 317-334.
BERKSON, J. (1938). Some difficulties of interpretation encountered in the
application of the chi-square test. Journal of the American Statistical Association,
33, 526-542.
COOVER, J. E. (1975). Experiments in psychical research. New York: Arno
Press. (Originally published 1917)
DRUCKMAN, D., & SWETS, J. A. (1988). Enhancing human performance. Washington,
D.C.: National Academy Press.
EDWARDS, W., LINDMAN, H., & SAVAGE, L. J. (1963). Bayesian statistical inference
for psychological research. Psychological Review, 70, 193-242.
FISHER, R. A. (1926). The arrangement of field experiments. Journal of the
Ministry of Agriculture of Great Britain, 33, 503-513.
HANSEN, G. P., SCHLITZ, M. J., & TART, C. T. (1984). Bibliography, remote-viewing
research, 1973-1981. In R. Targ & K. Harary, The mind race
(pp. 265-269). New York: Villard Books.
HONORTON, C. (1984). How to evaluate and improve the replicability of
parapsychological effects. In B. Shapin & L. Coly (Eds.), The repeatability
problem in parapsychology (pp. 238-255). New York: Parapsychology
Foundation, Inc.
HONORTON, C. (1985). Meta-analysis of psi ganzfeld research: A response
to Hyman. Journal of Parapsychology, 49, 51-91.
HYMAN, R. (1985). The ganzfeld psi experiment: A critical appraisal. Journal
of Parapsychology, 49, 3-49.
MCCLENON, J., & HYMAN, R. (1987). A remote viewing experiment conducted
by a skeptic and a believer. Zetetic Scholar, Nos. 12/13, 21-33.
PALMER, J. (1985). An evaluative report on the current status of parapsychology.
U.S. Army Research Institute for the Behavioral and Social
Sciences, Alexandria, VA.
RAO, K. R. (1984). Replication in conventional and controversial sciences.
In B. Shapin & L. Coly (Eds.), The repeatability problem in parapsychology
(pp. 22-41). New York: Parapsychology Foundation, Inc.
RHINE, J. B., & PRATT, J. G. (1957). Parapsychology: Frontier science of the
mind. Springfield, IL: Charles C. Thomas.
RHINE, J. B., PRATT, J. G., STUART, C. E., SMITH, B. M., & GREENWOOD,
J. A. (1940). Extra-sensory perception after sixty years. Boston: Bruce Humphries.
ROSENTHAL, R., & GAITO, J. (1963). The interpretation of levels of significance
by psychological researchers. Journal of Psychology, 55, 33-38.
SAVAGE, L. J. (1976). On rereading R. A. Fisher. Annals of Statistics, 4, 441-500.
SOLFVIN, G. F., KELLY, E. F., & BURDICK, D. S. (1978). Some new methods
for preferential-ranking data. Journal of the American Society for Psychical
Research, 72, 93-110.
TVERSKY, A., & KAHNEMAN, D. (1982). Belief in the law of small numbers.
In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty:
Heuristics and biases. Cambridge: Cambridge University Press.
UTTS, J. M. (1986). The ganzfeld debate: A statistician's perspective. Journal
of Parapsychology, 50, 393-402.
ZIMBARDO, P. G. (1988). Psychology and life. Glenview, IL: Scott, Foresman
and Co.

Division of Statistics
University of California
Davis, CA 95616
Paranormal Communication
"Error Some Place!"
by Charles Honorton
Review of the ESP controversy
traces debate from statistical
and methodological issues to
the a priori critique and the
paradigm of "normal science."
Asked his opinion of ESP, a skeptical psychologist once retorted, "Error
Some Place!" I believe he was right, but for the wrong reasons. Western
science has always been ambivalent toward the mental side of reality, and
it is perhaps not surprising that the occurrence of "psychic" phenomena is
one of the most controversial topics in the history of science.
The first serious effort toward scientific examination of psi claims was
undertaken by the Society for Psychical Research (SPR), founded in London
in 1882 for the purpose of "making an organized and systematic attempt
to investigate the large group of phenomena designated by such terms as
mesmeric, psychical, and spiritualistic." The SPR leadership included many
distinguished scholars of the period, and similar organizations quickly
spread to other countries, including the American Society for Psychical Re-
search, founded in New York in 1885 under the aegis of William James,
who himself took an active role in early investigations of mediumistic
communications.
These turn-of-the-century investigators focused much of their attention
on authenticating individual cases of spontaneous experiences suggestive
of psi communication. While a great deal of provocative material was care-
fully examined and reported (e.g., 18), the limitations inherent in the case
study approach prohibited definitive conclusions. However thoroughly au-
thenticated, spontaneous cases cannot provide adequate assessment of such
potential sources of contamination as chance coincidence, unconscious in-
ference and sensory leakage, retroactive falsification, or deliberate fraud.
Charles Honorton is director of research in the Division of Parapsychology and
Psychophysics, Department of Psychiatry, Maimonides Medical Center, Brooklyn, N.Y.
Journal of Communication, Winter 1975
Early experimental approaches primarily involved the "telepathic" reproduction
of drawings at a distance (6 ). While striking correspondences
were often obtained, the experimental conditions did not usually provide for
random selection of target (stimulus) material, and were not always
adequate with respect to the possibility of sensory leakage, intentional or
otherwise.
Neither the spontaneous case studies nor the early experimental efforts
made much impact upon the scientific community, though they drew critical
comment from prominent period scientists. "Neither the testimony of all
the Fellows of the Royal Society, nor even the evidence of my own senses,"
proclaimed Helmholtz, "would lead me to believe in the transmission of
thought from one person to another independently of the recognized channels
of sense." Thomas Huxley declined an invitation to participate in
some of the early SPR investigations, saying he would sooner listen to the
idle gossip of old women.
The rudiments of an experimental methodology
for testing psi were suggested three
centuries ago by Francis Bacon.
In Sylva Sylvarum, a work published posthumously, Bacon discussed
"experiments in consort, monitory touching transmission of spirits and
forces of imagination." He suggested that "the motions of shuffling cards,
or casting of dice" could be used to test the "binding of thoughts. . . .
The experiment of binding of thoughts should be diversified and tried to
the full; and you are to note whether it hit for the most part though not
always" (2).
The application of probability theory to the assessment of deviations
from theoretically expected chance outcomes was introduced to psychical
research in 1884 by the French Nobel laureate, Charles Richet, in experiments
involving card-guessing. The popularity of card-guessing as an
experimental methodology was greatly influenced by the work of J. B. Rhine
and his associates at Duke University in the early 1930s. Rhine (50) devised
a standard set of procedures around a simplified card deck containing
randomized sequences of five geometric forms (circle, cross, wavy lines,
square, and star). These "ESP cards" were prepared in packs of 25, and
each "run" through the pack was associated with a constant binomial
probability of 1/5, since subjects were not given trial-by-trial feedback.
Providing the experimental conditions were adequate to eliminate illicit sensory
cues, recording errors, and rational inference, statistically significant
departures from binomial chance expectation were interpreted as indicating
extrasensory communication.
Initially, "telepathy" tests consisted of having a subject in one room
attempt to identify the order of the cards as they were observed by an
"agent" in another room. In "clairvoyance" tests, the subject attempted to
"guess" the order of the cards directly, as they lay concealed in an opaque
container or in another room, without an agent. "Precognition" tests,
introduced somewhat later (59), required the subject to make anticipatory
guesses of the card order before the pack was shuffled or otherwise randomized.
Rhine introduced the term "ESP" in his first major report on the Duke
University work in 1934 (50). He reported a total of 85,724 card-guessing
trials, carried out with a wide variety of subjects and under a wide range
of test conditions. The results as a whole were astronomically significant,
though informal exploratory trials were indiscriminately pooled with those
carried out under more carefully controlled conditions. The best-controlled
work during this period was the Pearce-Pratt distance series of clairvoyance
tests (58), in which the subject, Pearce, located in one building, attempted
to identify the order of the cards as they were handled, but not viewed, by
Pratt, the experimenter, located in another building. The level of accuracy
obtained in this series of 1,850 trials was associated with a probability of
As a stimulant to experimental research, Rhine's work had unprece-
dented influence. For the first time a common methodology was adopted
and employed on a large scale by a number of independent and widely
separated investigators. For the first time, also, the scientific community
was confronted with a body of data, collected through conventional methods,
which it could no longer ignore, nor too hastily accept. The wide-scale
adoption of the card-guessing methodology was accompanied by a
plethora of critical articles, challenging almost every aspect of the evaluative
techniques and the experimental conditions. During the period between
1934 and 1940, approximately 60 critical articles by 40 authors appeared,
primarily in the psychological literature. While card-guessing is
no longer the primary methodology in experimental parapsychology, the
questions which arose over its use are of equal relevance to the more
sophisticated approaches used today.
The first major issue concerned the
validity of the assumption that the
probability of success in the card-guessing
experiments was actually 1/5.
If chance expectation is other than 1/5, the significance of the observed
deviations would obviously be in doubt. This issue was quickly resolved
by mathematical proof and through empirical "cross-checks," a form of
control series in which responses (guesses) were deliberately compared with
target orders for which they were not intended (e.g., responses on run n1
matched with the target sequence for run n2). Empirical cross-checks were
reported for 24 separate experimental series involving a total of 12,228
runs (305,700 individual trials). While the actual experimental run scores
(e.g., guesses on run n1 compared to targets for run n1) were highly significant
and yielded a mean scoring rate of 7.23/25, the control cross-check
scores were in all cases nonsignificant, with a mean scoring rate of 5.04 (4).
Several critics questioned the applicability of the binomial distribution
as a basis for assessing the statistical significance of ESP card-guessing data.
Willoughby (711) proposed the use of an empirical control series, but later
withdrew the suggestion after comparing the two methods (79). Alternative
methods of deriving the probable error and recommendations for using
the empirical standard deviation were also proposed and later withdrawn
(21, 22). Concern over this issue diminished and was generally abandoned
following the publication of a large chance control series involving half
a million trials and demonstrating close approximation to the binomial
model (12).
Another question arose about whether the binomial model provides
sufficient approximation to the normal distribution to allow use of normal
probability integral tables for determination of significance levels (17).
Stuart and Greenwood (73) showed that when the normal distribution is
used as an approximation to the binomial model, discrepancies are
important only with cases of borderline significance and few trials.
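The size of the discrepancy between the binomial model and its normal approximation can be examined for any score. The sketch below compares the exact upper-tail binomial probability with the normal approximation for card-guessing data with chance probability 1/5; the hit counts in the demonstration and the optional continuity correction are illustrative assumptions, not figures from the studies cited.

```python
import math


def exact_upper_tail(hits, trials, p=0.2):
    """Exact binomial probability of scoring at least `hits` by chance."""
    return sum(
        math.comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def normal_upper_tail(hits, trials, p=0.2, continuity=True):
    """Normal approximation to the same upper-tail probability."""
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    x = hits - 0.5 if continuity else hits
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # One run of 25 cards versus 100 runs; hit counts are hypothetical.
    for hits, trials in ((9, 25), (540, 2500)):
        print(
            f"{hits}/{trials}: exact = {exact_upper_tail(hits, trials):.4f}, "
            f"normal = {normal_upper_tail(hits, trials):.4f}"
        )
```

Comparing the two columns for a single run and for a long series gives a direct view of when the choice of method could change a borderline result.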
The use of the binomial critical ratio (z) to evaluate the significance of
the ESP card-guessing deviations was generally approved by professional
statisticians (6, 20). Fisher (10), however, commented that high levels of
statistical significance should not be accepted as substitutes for independent
replication. In another vein, Huntington (20) asked, "If mathematics has
successfully disposed of the hypothesis of chance, what has psychology to
say about the hypothesis of ESP?"
The most frequently expressed methodological
concern was the possibility of some
form of "sensory leakage," giving the ESP
subject enough information about the targets
to account for significant, extrachance results.
As early as 1895, two Danish psychologists, Hansen and Lehmann
(16), reported that with the aid of parabolic reflectors subjects could detect
digits and other material silently concentrated upon by an agent. In these
experiments, the subject and agent sat with their heads close to the foci
of two concave mirrors. While the agent concentrated on the number, he
made a special effort to keep his lips closed. Under these conditions, the
subjects were frequently successful in identifying the number. These results
were interpreted by Hansen and Lehmann as supporting the hypothesis
of "involuntary whispering." The utilization of subtle sensory cues was
demonstrated in a careful investigation by S. G. Soal of a stage "telepathist"
(66). There were also reports, such as the case of "Ilga K.," a mentally
retarded Latvian child who could read any text, even in a foreign language,
when someone stood behind her, reading "silently." Experiments with
dictaphone recordings revealed that "Ilga" was responding to very slight
auditory cues (3).
able to the ESP hypothesis made 71.5 percent more errors of commission
(increasing ESP scores), while those who were unfavorable to the ESP
hypothesis made 100 percent more errors of omission (decreasing ESP
scores). Murphy (37) reported an analysis of 175,000 trials from experiments
reporting positive evidence for ESP and found only 175 errors (0.10 percent).
Greenwood (12) reported only 90 recording errors in rechecking his
500,000-trial control study, of which 76 were errors of omission.
Some critics also alleged that improper selection of data could account
for experimental successes. This could be done in several ways: (a) selection
of subjects; (b) selection of particular blocks of data out of larger samples;
(c) selection of one of several forms of analysis; and (d) selective reporting
of particular studies. The questions raised have sometimes been stated
cynically in the form, "Parapsychologists must run 100 subjects before
they find one with 'ESP'." As if in defense against this charge, a number
of the reported studies specifically stated that all of the data collected were
included in the analysis (see 43, pp. 118-124, Table 12).
Concerning selection of subjects, Warner (76) suggested two criteria:
first, results of "poor" subjects must be included up to the point when
they are discontinued since it does not matter how many trials a given
subject makes as long as all of the trials (for all subjects) are included;
second, exclude all preliminary trials (for both "good" and "poor" sub-
jects) and use preliminary screening studies to select "good" candidates for
formal work. These criteria were generally endorsed by the chief critics of
the period (e.g., 23).
The question of post hoc selection of analyses was not a point of serious
concern in the period between 1934 and 1940, though it is relevant to the
assessment of some of the process-oriented investigations reported more
recently. The question of whether nonsignificant studies were withheld
from publication involves an issue which is of great concern to the behavioral
sciences as a whole (70, 81) and one which is difficult to accurately
assess since there is no way of knowing how many studies may have been
withheld from publication because their results failed to disconfirm the
null hypothesis.
Several studies of American Psychological Association publication policies
(4, 70, 81) indicate that experimental studies in general are more likely
to be published if the null hypothesis is rejected at the conventional .05
and .01 alpha levels than if it is not rejected. These studies also indicate
that a negligible proportion of published studies are replications. Bozarth
and Roberts (4), in a survey of 1,334 articles from psychological journals,
found that 94 percent of the articles involving statistical tests of significance
reported rejection of specific null hypotheses; only eight articles (less than
1 percent) involved replications of previously published studies.
With respect to the implications of such selection for the ESP hypothesis,
there are two partial answers. First, considering the degree of critical interest
which prevailed in the 1930s, it seems unlikely that nonsignificant findings
would have been repressed during this period; second, the high levels
It is clear that at least some of the early exploratory series reported in
Rhine's monograph were open to criticism for inadequate controls against
sensory cues. While Rhine did not base major conclusions on such poorly
controlled data, inclusion of them in his monograph provided a ready target
for critical reviewers and sidetracked discussion away from the better con-
trolled work, such as the Pearce-Pratt series, which was not susceptible to
explanation by sensory cues.
Defects in an early commercial printing of ESP cards were reported by
several investigators (18, 25). It was found that the cards were warped and
could under certain conditions be identified from the back. This discovery
circulated widely for a time as an explanation of all successful (i.e., statistically
significant) experimental series. The parapsychologists retorted that
defective cards had not been employed in any of the experiments reported
in the literature and that, in any case, they could not account for results
from studies involving adequate screening with such devices as opaque
envelopes, screens, distance, or work involving the precognition paradigm
in which the target sequences were not generated until after the subject
had made his responses (53, 54, 72).
By 1940 nearly one million experimental trials had been reported under
conditions which precluded sensory leakage. These included five studies
in which the target cards were enclosed in opaque sealed envelopes (41, 45,
46, 54, 59), 16 studies employing opaque screens (7, 8, 11, 19, 83, 34, 35, 38,
41, 42, 44, 45, 46, 59, 71), ten studies involving separation of subjects and
targets in different buildings (50, 51, 52, 53, 34, 32, 8, 77, 61, 60), and two
studies involving precognition tasks (59, 75). These data are summarized in
Table 1. The results were independently significant in 27 of the 33 experiments.
By the end of the 1930s there was general agreement that the better-controlled
ESP experiments could not be accounted for on the basis of
sensory leakage.
The hypothesis that significant "extrachance" deviations in ESP experiments
might be attributable to motivated scoring errors was investigated in
several studies. In one investigation (26), 28 observers recorded 11,125 mock
ESP trials. Of these, 126 (1.13 percent) were misrecorded. Observers favor-
Table 1: ESP card-guessing experiments (1934-1939) excluding sensory cues.

Method                                         Studies   N (Trials)   Mean/25   P<
"Clairvoyance" paradigm, stimuli in sealed,
opaque envelopes                                  5       129,775      5.21
Combined z                6.14     1.29
Studies with p < .05      87.5%    0.0%
Mean ES                   .055     .005
SD                        .045     .035

t(15) = 2.61, p = .01
r = .559
These results are quite striking and suggest that future studies
combining these moderators should yield especially reliable effects.
SUMMARY AND CONCLUSIONS
Our meta-analysis of forced-choice precognition experiments
confirms the existence of a small but highly significant precognition
effect. The effect appears to be replicable; significant outcomes are
reported by 40 investigators using a variety of methodological paradigms
and subject populations.
The precognition effect is statistically very robust: it remains
highly significant despite elimination of studies with z scores in the
upper and lower 10% of the z-score distribution and when a third
of the remaining investigators (the major contributors of precognition
studies) are eliminated.
Estimates of the "filedrawer" problem and consideration of parapsychological
publication practices indicate that the precognition effect
cannot plausibly be explained on the basis of selective publication
bias. Analyses of precognition effect sizes in relation to eight
measures of research quality fail to support the hypothesis that the
observed effect is driven to any appreciable extent by methodological
flaws; indeed, several analyses indicate that methodologically superior
studies yield stronger effects than methodologically weaker
studies.
Analyses of parapsychological alternatives to precognition, although
limited to the subset of studies using random number tables,
provide no support for the hypothesis that the effect results from
the operation of contemporaneous ESP and PK at the time of randomization.
Although the overall precognition effect size is small, this does
not imply that it has no practical consequences. It is, for example,
of the same order of magnitude as effect sizes leading to the early
termination of several major medical research studies. In 1981, the
National Heart, Lung, and Blood Institute discontinued its study of
propranolol because the results were so favorable to the propranolol
treatment that it would be unethical to continue placebo treatment
(Kolata, 1981); the effect size was 0.04. More recently, The Steering
Committee of the Physicians' Health Study Research Group (1988),
in a widely publicized report, terminated its study of the effects of
aspirin in the prevention of heart attacks for the same reason. The
aspirin group suffered significantly fewer heart attacks than a
placebo control group; the associated effect size was 0.03.
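An effect size of this kind can be illustrated with the phi coefficient of a 2 x 2 outcome table. The sketch below is hedged: the text does not state how the medical effect sizes were computed, and the counts used here (104 heart attacks among 11,037 aspirin recipients, 189 among 11,034 placebo recipients) are figures commonly quoted for the aspirin trial and are an assumption, not data from this paper.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table [[a, b], [c, d]]: (ad - bc) / sqrt of the product of the margins."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Assumed counts (not from the text): rows = aspirin, placebo;
# columns = heart attack, no heart attack.
aspirin_mi, aspirin_total = 104, 11037
placebo_mi, placebo_total = 189, 11034

phi = phi_coefficient(
    aspirin_mi, aspirin_total - aspirin_mi,
    placebo_mi, placebo_total - placebo_mi,
)
print(f"|phi| = {abs(phi):.3f}")
```

Under these assumed counts the magnitude comes out near 0.03, which shows how a very small correlation can still correspond to a clinically important difference.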
The most important outcome of the meta-analysis is the
identification of several moderating variables that appear to covary
systematically with precognition performance. The largest effects are
observed in studies using subjects selected on the basis of prior test
performance, who are tested individually, and who receive
trial-by-trial feedback. The outcomes of studies combining these factors
contrast sharply with the null outcomes associated with the combination
of group testing, unselected subjects, and no feedback of results.
Because the two groups of studies were conducted by a subset of the
same investigators, it is unlikely that the observed difference in
performance is due to experimenter effects. Indeed, these outcomes
underscore the importance of carefully examining differences in
subject populations, test setting, and so forth, before resorting to
facile "explanations" based on psi-mediated experimenter effects or
the "elusiveness of psi."
The identification of these moderating variables has important
implications for our understanding of the phenomena and provides
a clear direction for future research. The existence of moderating
variables indicates that the precognition effect is not merely an
unexplained departure from a theoretical chance baseline, but
rather is an effect that covaries with factors known to influence
more familiar aspects of human performance. It should now be
possible to exploit these moderating factors to increase the magnitude
and reliability of precognition effects in new studies.
REFERENCES
AKERS, C. (1987). Parapsychology is science, but its findings are inconclusive.
Behavioral and Brain Sciences, 10, 566-568.
BARNETT, V., & LEWIS, T. (1978). Outliers in statistical data. New York: John
Wiley & Sons.
BROWNLEE, K. A. (1965). Statistical theory and methodology in science and
engineering. New York: John Wiley & Sons.
COHEN, J. (1977). Statistical power analysis for the behavioral sciences. New York:
Academic Press.
DAWES, R. M., LANDMAN, J., & WILLIAMS, J. (1984). Reply to Kurosawa.
American Psychologist, 39, 74-75.
HONORTON, C. (1985). Meta-analysis of psi ganzfeld research: A response to
Hyman. Journal of Parapsychology, 49, 51-92.
HYMAN, R. (1985). The ganzfeld psi experiment: A critical appraisal. Journal
of Parapsychology, 49, 3-50.
KOLATA, G. B. (1981). Drug found to help heart attack survivors. Science, 214,
774-775.
MANGAN, G. L. (1955). Evidence of displacement in a precognition test. Journal
of Parapsychology, 19, 35-44.
MORRIS, R. L. (1982). Assessing experimental support for true precognition.
Journal of Parapsychology, 46, 321-336.
ROSENTHAL, R. (1984). Meta-analytic procedures for social research. Beverly Hills,
CA: Sage.
STEERING COMMITTEE OF THE PHYSICIANS' HEALTH STUDY RESEARCH GROUP.
(1988). Preliminary report: Findings from the aspirin component of the
ongoing Physicians' Health Study. New England Journal of Medicine, 318,
262-264.
STERLING, T. D. (1959). Publication decisions and their possible effects on
inferences drawn from tests of significance, or vice versa. Journal of the
American Statistical Association, 54, 30-34.
WILKINSON, L. (1984). SYSTAT: The system for statistics. Evanston, IL: SYSTAT.
CHRONOLOGICAL LISTING OF STUDIES IN META-ANALYSIS
CARINGTON, W. (1935). Preliminary experiments in precognitive guessing. Journal
of the Society for Psychical Research, 29, 86-104.
RHINE, J. B. (1938). Experiments bearing on the precognition hypothesis: I.
Pre-shuffling card calling. Journal of Parapsychology, 2, 38-54.
RHINE, J. B., SMITH, B. M., & WOODRUFF, J. L. (1938). Experiments bearing
on the precognition hypothesis: II. The role of ESP in the shuffling of
cards. Journal of Parapsychology, 2, 119-131.
HUMPHREY, B. M., & PRATT, J. G. (1941). A comparison of five ESP test
procedures. Journal of Parapsychology, 5, 267-293.
RHINE, J. B. (1941). Experiments bearing upon the precognition hypothesis:
III. Mechanically selected cards. Journal of Parapsychology, 5, 1-57.
STUART, C. E. (1941). An analysis to determine a test predictive of extra-chance
scoring in card-calling tests. Journal of Parapsychology, 5, 99-137.
HUMPHREY, B. M., & RHINE, J. B. (1942). A confirmatory study of salience in
precognition tests. Journal of Parapsychology, 6, 190-219.
RHINE, J. B. (1942). Evidence of precognition in the covariation of salience
ratios. Journal of Parapsychology, 6, 111-143.
NICOL, J. F., & CARINGTON, W. (1947). Some experiments in willed
die-throwing. Proceedings of the Society for Psychical Research, 48, 164-175.
THOULESS, R. H. (1949). A comparative study of performance in three psi
tasks. Journal of Parapsychology, 13, 263-273.
BASTIN, E. W., & GREEN, J. M. (1953). Some experiments in precognition.
Journal of Parapsychology, 17, 137-143.
MCMAHAN, E. A., & BATES, E. K. (1954). Report of further Marchesi
experiments. Journal of Parapsychology, 18, 82-92.
MANGAN, G. L. (1955). Evidence of displacement in a precognition test. Journal
of Parapsychology, 19, 35-44.
OSIS, K. (1955). Precognition over time intervals of one to thirty-three days.
Journal of Parapsychology, 19, 82-91.
NIELSEN, W. (1956). An exploratory precognition experiment. Journal of
Parapsychology, 20, 33-39.
NIELSEN, W. (1956). Mental states associated with success in precognition.
Journal of Parapsychology, 20, 96-109.
FAHLER, J. (1957). ESP card tests with and without hypnosis. Journal of
Parapsychology, 21, 179-185.
MANGAN, G. L. (1957). An ESP experiment with dual-aspect targets involving
one trial a day. Journal of Parapsychology, 21, 273-283.
ANDERSON, M., & WHITE, R. (1958). A survey of work on ESP and
teacher-pupil attitudes. Journal of Parapsychology, 22, 246-268.
NASH, C. B. (1958). Correlation between ESP and religious value. Journal of
Parapsychology, 22, 204-209.
ANDERSON, M. (1959). A precognition experiment comparing time intervals of
a few days and one year. Journal of Parapsychology, 23, 81-89.
ANDERSON, M., & GREGORY, E. (1959). A two-year program of tests for
clairvoyance and precognition with a class of public school pupils. Journal of
Parapsychology, 23, 149-177.
NASH, C. B. (1960). Can precognition occur diametrically? Journal of
Parapsychology, 24, 26-32.
FREEMAN, J. A. (1962). An experiment in precognition. Journal of Parapsychology,
26, 123-130.
RHINE, J. B. (1962). The precognition of computer numbers in a public test.
Journal of Parapsychology, 26, 244-251.
RYZL, M. (1962). Training the psi faculty by hypnosis. Journal of the Society for
Psychical Research, 41, 234-252.
SANDERS, M. S. (1962). A comparison of verbal and written responses in a
precognition experiment. Journal of Parapsychology, 26, 23-34.
FREEMAN, J. (1963). Boy-girl differences in a group precognition test. Journal
of Parapsychology, 27, 175-181.
RAO, K. R. (1963). Studies in the preferential effect: II. A language ESP test
involving precognition and "intervention." Journal of Parapsychology, 27,
147-160.
FREEMAN, J. (1964). A precognition test with a high-school science club. Journal
of Parapsychology, 28, 214-221.
FREEMAN, J., & NIELSEN, W. (1964). Precognition score deviations as related
to anxiety levels. Journal of Parapsychology, 28, 239-249.
SCHMEIDLER, G. (1964). An experiment on precognitive clairvoyance: Part I.
The main results. Journal of Parapsychology, 28, 1-14.
FREEMAN, J. A. (1965). Differential response of the sexes to contrasting
arrangements of ESP target material. Journal of Parapsychology, 29, 251-258.
OSIS, K., & FAHLER, J. (1965). Space and time variables in ESP. Journal of the
American Society for Psychical Research, 59, 130-145.
FAHLER, J., & OSIS, K. (1966). Checking for awareness of hits in a precognition
experiment with hypnotized subjects. Journal of the American Society for
Psychical Research, 60, 340-346.
FREEMAN, J. A. (1966). Sex differences and target arrangement: High-school
booklet tests of precognition. Journal of Parapsychology, 30, 227-235.
ROGERS, D. P. (1966). Negative and positive affect and ESP run-score variance.
Journal of Parapsychology, 30, 151-159.
ROGERS, D. P., & CARPENTER, J. C. (1966). The decline of variance of ESP
scores within a testing session. Journal of Parapsychology, 30, 141-150.
BRIER, B. (1967). A correspondence ESP experiment with high-I.Q. subjects.
Journal of Parapsychology, 31, 113-148.
BUZBY, D. E. (1967). Subject attitude and score variance in ESP tests. Journal
of Parapsychology, 31, 43-50.
BUZBY, D. E. (1967). Precognition and a test of sensory perception. Journal of
Parapsychology, 31, 135-142.
FREEMAN, J. A. (1967). Sex differences, target arrangement, and primary
mental abilities. Journal of Parapsychology, 31, 271-279.
HONORTON, C. (1967). Creativity and precognition scoring level. Journal of
Parapsychology, 31, 29-42.
CARPENTER, J. C. (1968). Two related studies on mood and precognition
run-score variance. Journal of Parapsychology, 32, 75-89.
DUVAL, P., & MONTREDON, E. (1968). ESP experiments with mice. Journal of
Parapsychology, 32, 153-166.
FEATHER, S. R., & BRIER, R. (1968). The possible effect of the checker in
precognition tests. Journal of Parapsychology, 32, 167-175.
FREEMAN, J. A. (1968). Sex differences and primary mental abilities in a group
precognition test. Journal of Parapsychology, 32, 176-182.
NASH, C. S., & NASH, C. B. (1968). Effect of target selection, field dependence,
and body concept on ESP performance. Journal of Parapsychology, 32,
248-257.
RHINE, L. E. (1968). Note on an informal group test of ESP. Journal of
Parapsychology, 32, 47-53.
RYZL, M. (1968). Precognition scoring and attitude toward ESP. Journal of
Parapsychology, 32, 1-8.
RYZL, M. (1968). Precognition scoring and attitude. Journal of Parapsychology,
32, 183-189.
CARPENTER, J. C. (1969). Further study on a mood adjective check list and
ESP run-score variance. Journal of Parapsychology, 33, 48-56.
DUVAL, P., & MONTREDON, E. (1969). Precognition in mice: A confirmation.
Journal of Parapsychology, 33, 71-72.
FREEMAN, J. A. (1969). The psi-differential effect in a precognition test. Journal
of Parapsychology, 33, 206-212.
FREEMAN, J. A. (1969). A precognition experiment with science teachers. Journal
of Parapsychology, 33, 307-310.
JOHNSON, M. (1969). Attitude and target differences in a group precognition
test. Journal of Parapsychology, 33, 324-325.
MONTREDON, E., & ROBINSON, A. (1969). Further precognition work with mice.
Journal of Parapsychology, 33, 162-163.
SCHMIDT, H. (1969). Precognition of a quantum process. Journal of
Parapsychology, 33, 99-108.
BENDER, H. (1970). Differential scoring of an outstanding subject on GESP and
clairvoyance. Journal of Parapsychology, 34, 272-273.
FREEMAN, J. A. (1970). Sex differences in ESP response as shown by the
Freeman picture-figure test. Journal of Parapsychology, 34, 37-46.
FREEMAN, J. A. (1970). Ten-page booklet tests with elementary-school children.
Journal of Parapsychology, 34, 192-196.
FREEMAN, J. (1970). Shift in scoring direction with junior-high-school students:
A summary. Journal of Parapsychology, 34, 275.
FREEMAN, J. A. (1970). Mood, personality, and attitude in precognition tests.
Journal of Parapsychology, 34, 322.
HARALDSSON, E. (1970). Subject selection in a machine precognition test. Journal
of Parapsychology, 34, 182-191.
HARALDSSON, E. (1970). Precognition of a quantum process: A modified
replication. Journal of Parapsychology, 34, 329-330.
NIELSEN, W. (1970). Relationships between precognition scoring level and
mood. Journal of Parapsychology, 34, 93-116.
SCHMIDT, H. (1970). Precognition test with a high-school group. Journal of
Parapsychology, 34, 70.
BELOFF, J., & BATE, D. (1971). An attempt to replicate the Schmidt findings.
Journal of the Society for Psychical Research, 46, 21-31.
HONORTON, C. (1971). Automated forced-choice precognition tests with a
"sensitive." Journal of the American Society for Psychical Research, 65, 476-481.
MITCHELL, E. D. (1971). An ESP test from Apollo 14. Journal of Parapsychology,
35, 89-107.
SCHMIDT, H., & PANTAS, L. (1971). Psi tests with psychologically equivalent
conditions and internally different machines. Journal of Parapsychology, 35,
326-327.
STANFORD, R. G. (1971). Extrasensory effects upon "memory." Journal of the
American Society for Psychical Research, 64, 161-186.
STEILBERG, B. J. (1971). Investigation of the paranormal gifts of the Dutch
sensitive Lida T. Journal of Parapsychology, 35, 219-225.
THOULESS, R. H. (1971). Experiments on psi self-training with Dr. Schmidt's
pre-cognitive apparatus. Journal of the Society for Psychical Research, 46,
15-21.
HONORTON, C. (1972). Reported frequency of dream recall and ESP. Journal
of the American Society for Psychical Research, 66, 369-374.
JOHNSON, M., & NORDBECK, B. (1972). Variation in the scoring behavior of a
"psychic" subject. Journal of Parapsychology, 36, 122-132.
KELLY, E. F., & KANTHAMANI, B. K. (1972). A subject's efforts toward voluntary
control. Journal of Parapsychology, 36, 185-197.
SCHMIDT, H., & PANTAS, L. (1972). Psi tests with internally different machines.
Journal of Parapsychology, 36, 222-232.
CRAIG, J. G. (1973). The effect of contingency on precognition in the rat.
Research in Parapsychology 1972, 154-156.
FREEMAN, J. A. (1973). The psi quiz: A new ESP test. Research in Parapsychology
1972, 132-134.
ARTLEY, B. (1974). Confirmation of the small-rodent precognition work. Journal
of Parapsychology, 38, 238-239.
HARRIS, S., & TERRY, J. (1974). Precognition in a water-deprived Wistar rat.
Journal of Parapsychology, 38, 239.
RANDALL, J. L. (1974). An extended series of ESP and PK tests with three
English schoolboys. Journal of the Society for Psychical Research, 47, 485-494.
EYSENCK, H. J. (1975). Precognition in rats. Journal of Parapsychology, 39,
222-227.
HARALDSSON, E. (1975). Reported dream recall, precognitive dreams, and ESP.
Research in Parapsychology 1974, 47-48.
HONORTON, C., RAMSEY, M., & CABIBBO, C. (1975). Experimenter effects in
extrasensory perception. Journal of the American Society for Psychical Research,
69, 135-149.
KANTHAMANI, H., & RAO, H. H. (1975). Response tendencies and stimulus
structure. Journal of Parapsychology, 39, 97-105.
LEVIN, J. A. (1975). A series of psi experiments with gerbils. Journal of
Parapsychology, 39, 363-365.
TERRY, J. C., & HARRIS, S. A. (1975). Precognition in water-deprived rats.
Research in Parapsychology 1974, 81.
DAVIS, J. W., & HAIGHT, J. (1976). Psi experiments with rats. Journal of
Parapsychology, 40, 54-55.
JACOBS, J., & BREEDERVELD, H. (1976). Possible influences of birth order on
ESP ability. Research Letter (Parapsychology Laboratory, University of
Utrecht), No. 7, 10-20.
NEVILLE, R. C. (1976). Some aspects of precognition testing. Research in
Parapsychology 1975, 29-31.
DRUCKER, S. A., DREWES, A. A., & RUBIN, L. (1977). ESP in relation to cognitive
development and IQ in young children. Journal of the American Society for
Psychical Research, 71, 289-298.
HARALDSSON, E. (1977). ESP and the defense mechanism test (DMT): A further
validation. European Journal of Parapsychology, 2, 104-114.
SARGENT, C. L. (1977). An experiment involving a novel precognition task.
Journal of Parapsychology, 41, 275-293.
BIERMAN, D. J. (1978). Testing the "advanced wave" hypothesis: An attempted
replication. European Journal of Parapsychology, 2, 206-212.
BRAUD, W. (1979). Project Chicken Little: A precognition experiment involving
the SKYLAB space station. European Journal of Parapsychology, 3, 149-165.
HARALDSSON, E., & JOHNSON, M. (1979). ESP and the defense mechanism test
(DMT) Icelandic study No. III: A case of the experimenter effect? European
Journal of Parapsychology, 3, 11-20.
O'BRIEN, J. T. (1979). An examination of the checker effect. Research in
Parapsychology 1978, 153-155.
CLEMENS, D. B., & PHILLIPS, D. T. (1980). Further studies of precognition in
mice. Research in Parapsychology 1979, 156.
HARALDSSON, E. (1980). Scoring in a precognition test as a function of the
frequency of reading on psychical phenomena and belief in ESP. Research
Letter (Parapsychology Laboratory, University of Utrecht), No. 10, 1-8.
SARGENT, C., & HARLEY, T. A. (1981). Three studies using a psi-predictive
trait variable questionnaire. Journal of Parapsychology, 45, 199-214.
WINKELMAN, M. (1981). The effect of formal education on extrasensory
abilities: The Ozolco study. Journal of Parapsychology, 45, 321-336.
NASH, C. B. (1982). ESP of present and future targets. Journal of the Society for
Psychical Research, 51, 374-377.
THALBOURNE, M., BELOFF, J., & DELANOY, D. (1982). A test for the
"extraverted sheep versus introverted goats" hypothesis. Research in Parapsychology
1981, 155-156.
CRANDALL, J. E., & HITE, D. D. (1983). Psi-missing and displacement: Evidence
for improperly focused psi? Journal of the American Society for Psychical
Research, 77, 209-228.
[...] preliminary findings. Research in Parapsychology 1982, 103-105.
JOHNSON, M., & HARALDSSON, E. (1984). The Defense Mechanism Test as a
predictor of ESP scores: Icelandic studies IV and V. Journal of Parapsychology,
48, 185-200.
TEDDER, W. (1984). Computer-based long-distance ESP: An exploratory
examination (RB/PS). Research in Parapsychology 1983, 100-101.
HESELTINE, G. L. (1985). PK success during structured and nonstructured
RNG operation. Journal of Parapsychology, 49, 155-163.
HARALDSSON, E., & JOHNSON, M. (1986). The Defense Mechanism Test (DMT)
as a predictor of ESP performance: Icelandic studies VI and VII. Research
in Parapsychology 1985, 43-44.
VASSY, Z. (1986). Experimental study of complexity dependence in
precognition. Journal of Parapsychology, 50, 235-270.
HONORTON, C. (1987). Precognition and real-time ESP performance in a
computer task with an exceptional subject. Journal of Parapsychology, 51,
291-320.
Psychophysical Research Laboratories
P. O. Box 569
Plainsboro, NJ 08536
PSI COMMUNICATION IN THE GANZFELD
EXPERIMENTS WITH AN AUTOMATED TESTING SYSTEM
AND A COMPARISON WITH A META-ANALYSIS
OF EARLIER STUDIES
BY CHARLES HONORTON, RICK E. BERGER, MARIO P. VARVOGLIS,
MARTA QUANT, PATRICIA DERR, EPHRAIM I. SCHECHTER, AND
DIANE C. FERRARI
ABSTRACT: A computer-controlled testing system was used in 11 experiments on
ganzfeld psi communication. The automated ganzfeld system controls target selection
and presentation, subjects' blind-judging, and data recording and storage.
Videotaped targets included video segments (dynamic targets) as well as single images
(static targets). Two hundred and forty-one volunteer subjects completed 355 psi
ganzfeld sessions. The subjects, on a blind basis, correctly identified randomly
selected and remotely viewed targets to a statistically significant degree, z = 3.89, p =
.00005. Study outcomes were homogeneous across the 11 series and eight different
experimenters. Performance on dynamic targets was highly significant, z = 4.62, p
= .0000019, as was the difference between dynamic and static targets, p = .002.
Suggestively stronger performance occurred with friends than with unacquainted
sender/receiver pairs, p = .0635. The automated ganzfeld study outcomes are
compared with a meta-analysis of 28 earlier ganzfeld studies. The two data sets are
consistent on four dimensions: overall success rate, impact of dynamic and static targets,
effect of sender/receiver acquaintance, and prior ganzfeld experience. The combined
z for all 39 studies is 7.53, p = 9 x 10^-14.
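The z scores and p values quoted in the abstract can be related through the upper tail of the standard normal distribution. The sketch below assumes the reported p values are one-tailed, which the abstract does not state explicitly; the function name is illustrative.

```python
import math

def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# z scores quoted in the abstract for overall and dynamic-target performance.
for z in (3.89, 4.62):
    print(f"z = {z:.2f} -> one-tailed p = {one_tailed_p(z):.2g}")
```

Under that assumption, z = 3.89 and z = 4.62 correspond to p values close to the .00005 and .0000019 given in the abstract.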
This work was supported by the James S. McDonnell Foundation of St. Louis,
Missouri, and by the John E. Fetzer Foundation of Kalamazoo, Michigan.
We wish to thank Marilyn J. Schlitz, Peter Rojcewicz, and Rosemarie Pilkington
for their help in recruiting participants; Daryl J. Bem of Cornell University and
Donald McCarthy of St. John's University for helpful comments on an earlier draft
of this paper; Edwin C. May of SRI International for performing the audio spectrum
analysis; and Robert Rosenthal of Harvard University for suggestions concerning
data analysis. We also wish to thank several PRL colleagues who contributed in
various ways to the work reported here: Nancy Sondow for assistance in the preparation
of the relaxation exercise and instruction tape that was used throughout, and George
Hansen and Linda Moore who served frequently as lab senders. Hansen also
provided technical assistance and conducted a data audit resulting in the correction of
several minor errors that appeared in a version of this report presented at the 32nd
Annual Convention of the Parapsychological Association. Finally, we thank the 241
volunteer participants for providing us with such interesting data.

Research on psi communication in the ganzfeld developed as the
result of earlier research suggesting that psi functioning is
frequently associated with internal attention states brought about
through dreaming, hypnosis, meditation, and similar naturally
occurring or artificially induced states (Braud, 1978; Honorton, 1977).
This generalization, based on converging evidence from
spontaneous case studies, clinical observations, and experimental studies,
led to the development of a low-level descriptive model of psi
functioning, according to which internal attention states facilitate psi
detection by attenuating sensory and somatic stimuli that normally
mask weaker psi input (Honorton, 1977, 1978). This
"noise-reduction" model thus identified sensory deprivation as a key to the
frequent association between psi communication and internal attention
states, and the ganzfeld procedure was developed specifically to test
the impact of perceptual isolation on psi performance.
Fifteen years have passed since the initial reports of psi
communication in the ganzfeld (Braud, Wood, & Braud, 1975;
Honorton & Harper, 1974; Parker, 1975). Dozens of additional psi
ganzfeld studies have appeared since then, and the success of the
paradigm has triggered substantial critical interest. Indeed, there is
at least one critical review or commentary for every ganzfeld study
reporting significant evidence of psi communication (Akers, 1984;
Alcock, 1986; Blackmore, 1980, 1987; Child, 1986; Druckman &
Swets, 1988; Harley & Matthews, 1987; Harris & Rosenthal, 1988;
Honorton, 1979, 1983, 1985; Hövelmann, 1986; Hyman, 1983,
1985, 1988; Hyman & Honorton, 1986; Kennedy, 1979; McClenon,
1986; Palmer, 1986; Palmer, Honorton, & Utts, 1989; Parker &
Wiklund, 1987; Rosenthal, 1986; Sargent, 1987; Scott, 1986;
Stanford, 1984, 1986; Stokes, 1986; Utts, 1986).
Of the many controversies spanning the history of
parapsychological inquiry, the psi ganzfeld domain is unique in three respects.
First, the central issue involves the replicability of a theoretically
based technique rather than the special abilities of exceptional
individuals (Honorton, 1977). Second, meta-analytic techniques have
been used to assess statistical significance, effect size, and potential
threats to validity (Harris & Rosenthal, 1988; Honorton, 1985;
Hyman, 1985, 1988; Rosenthal, 1986). Third, investigators and
critics have agreed on specific guidelines for the conduct and evaluation
of future psi ganzfeld research (Hyman & Honorton, 1986).
The Automated Ganzfeld Testing System
Psi ganzfeld experiments typically involve four participants. The
subject (or receiver, R) attempts to gain target-relevant imagery
while in the ganzfeld; following the ganzfeld/imagery period, R
tries, on a blind basis, to identify the actual target from among
four possibilities. A physically isolated sender (Se) views the target
and attempts to communicate salient aspects of it to R. Two
experimenters (Es) are usually required. One E manages R, elicits R's
verbal report of ganzfeld imagery (mentation), and supervises R's blind
judging of the target and decoys; a second E supervises Se, and
randomly selects and records the target.
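Because R chooses among four possibilities, the chance probability of a direct hit is 1/4, and a run of sessions can be evaluated against that baseline. The sketch below is one hedged way to do so, using an exact one-tailed binomial test; the hit and session counts are hypothetical, and the paper's own scoring method is not described in this passage.

```python
from math import comb

def binomial_tail(hits, trials, p_chance=0.25):
    """Exact probability of observing `hits` or more successes in `trials` at chance."""
    return sum(
        comb(trials, k) * p_chance ** k * (1.0 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# Hypothetical example: 10 direct hits in 20 four-choice sessions.
print(f"p = {binomial_tail(10, 20):.4f}")
```

With a four-choice design, 5 hits in 20 sessions is exactly the chance expectation; the further the observed count rises above that, the smaller the tail probability.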
We developed an automated ganzfeld testing system
("autoganzfeld") to eliminate potential methodological problems that were
identified in earlier ganzfeld studies (Honorton, 1979; Hyman &
Honorton, 1986; Kennedy, 1979) and to explore factors associated
with successful performance. The system provides computer control
of target selection and presentation, blind judging, subject feedback,
and data recording and storage (Berger & Honorton, 1986). A
computer-controlled videocassette recorder (VCR) accesses and
automatically presents target stimuli to Se. A second E is required only
for assistance in target selection. The system includes an
experimental design module through which E specifies the sample size and
status of a new series.
The system was designed to enable further assessment of factors
identified with successful performance in earlier ganzfeld studies.
Differences in target type and sender/receiver acquaintance seem to
be particularly important. Significantly better performance occurred
in studies using dynamic rather than static targets. Dynamic targets
contain multiple images reinforcing a central theme, whereas static
targets contain a single image. Also, studies permitting subjects to
have friends as their senders yielded significantly superior
performance compared to those requiring subjects to work with laboratory
senders. (See "Comparison of Study Outcomes with Ganzfeld
Meta-Analysis" in the Results section.)
The autoganzfeld system uses both dynamic and static targets.
The dynamic targets are excerpts from films; static targets include
art work and photographs. Receivers may, if they choose, bring
friends or family members to serve as their senders; a session setup
module registers the sender type and other session information.
In this report, we present the results of the 11 autoganzfeld
series conducted between the inauguration of the experiments in
February, 1983, and September, 1989, when funding problems
required suspension of the PRL research program.¹ We focus on
(1) evidence for psi in the autoganzfeld situation, (2) the impact of
dynamic versus static targets, (3) the effects of sender/receiver
acquaintance, (4) the impact of prior psi ganzfeld experience, and
(5) a comparison of these four factors with the outcomes of earlier
nonautomated psi ganzfeld experiments. Our findings on
demographic, psychological, and target factors will be presented in later
reports.

¹This article conforms to the reporting guidelines recommended by Hyman and
Honorton (1986). Because of the size of this database, however, it is not practical to
include the data in an appendix to the report. Instead, we will supply the data to
qualified investigators in a Lotus-compatible, MS-DOS computer disk file. There is a
small fee to cover materials and mailing. Address inquiries to the Journal.

Subjects
The participants are 100 men and 141 women ranging in age
from 17 to 74 years (mean = 37.3, SD = 11.8). This is a
well-educated group; the mean formal education is 15.6 years (SD =
2.0).
Our primary sources of recruitment include referrals from
colleagues (24%), media presentations concerning PRL research (23%),
friends or acquaintances of PRL staff (20%), and referrals from
other participants (18%).
Belief in psi is strong in this population. On a seven-point scale
where "1" indicates strong disbelief and "7" indicates strong belief
in psi, the mean is 6.20 (SD = 1.03); only two participants rated
their belief in psi below the midpoint of the scale. Personal
experiences suggestive of psi were reported by 88% of the subjects; 80%
reported ostensible telepathic experiences. Eighty percent of the
participants have had some training in meditation or other
techniques involving internal focus of attention.
Participant Orientation
Initial contact. New participants receive an information pack
before their first session. The information pack includes a 55-item
personal history survey (Participant Information Form [PIF];
Psychophysical Research Laboratories, 1983), Form F of the Myers-Briggs
Type Indicator (MBTI; Briggs & Myers, 1957), general information
about the research program, and directions for reaching PRL.
Participants usually return the completed questionnaires before their
first session. However, if new participants are scheduled on short
notice, they either complete the questionnaires at PRL or, in a few
cases, at home after the session.
Whenever possible, new participants are encouraged to come in
for a preliminary orientation session, prior to their first PRL
ganzfeld session. The orientation serves as a "get acquainted" session for
participants and the PRL staff, and introduces participants to the
PRL program and facility. Participants who avail themselves of this
option generally complete the MBTI and PIF questionnaires during
the orientation session. We inform new participants that they may
bring a friend or family member to serve as their sender. When a
participant chooses not to do so, a PRL staff member serves as
sender. We encourage participants to reschedule their session rather
than feel they must come in to "fulfill an obligation" if they are not
feeling well.
Session orientation. We greet participants at the door when they
arrive and attempt to create a friendly and informal social
atmosphere. Coffee, tea, and soft drinks are available. E and other staff
members engage in conversation with R during this period. When
a laboratory sender is used, time is taken for sender and receiver to
become acquainted.
If the participant is a novice, we describe the rationale and
background of the ganzfeld research, and we seek to create positive
expectations concerning R's ability to identify the target. This
information is tailored to our perception of the needs of the individual
participant, but it generally includes four elements: (1) a brief
review of experimental, clinical, and spontaneous case trends
indicating that ESP is more readily detected during internal attention states
such as dreaming, hypnosis, and meditation (Honorton, 1977),
(2) the notion that these states all involve physical relaxation and
functional sensory deprivation, suggesting that weak ESP
impressions may be more readily detected when perceptual and somatic
noise is reduced, (3) the development of the ganzfeld technique to
test this noise-reduction hypothesis, and (4) the long-term success of
the ganzfeld technique as a means of facilitating psi communication
in unselected subjects.
We encourage "goal orientation" and discourage excessive "task
orientation" during the session; this is especially emphasized with
participants who appear to be anxious or overly concerned about
their ability to succeed in the ganzfeld task. We discourage partici-
pants from analyzing their mentation during the session, and tell
them that they will have an opportunity to analyze their mentation
during the judging procedure. They are encouraged to adopt the
role of an outside observer of their mental processes during the
ganzfeld. Again, this is emphasized with those who appear anxious
104 The Journal of Parapsychology
about their performance; they are advised to relax, follow the taped
instructions, and to simply allow the procedure to work. We inform
participants that they may experience various types of correspond-
ence between their mentation and the target; they are told that they
may experience direct, literal correspondences to the target, but that
they should also be prepared for correspondences involving distor-
tions or transformations of the target content, cognitive associations,
and similarities in emotional tone. Finally, we orient new partici-
pants to where Se and E will be located during the session.
Layout and Equipment
R and Se are sequestered in nonadjacent, sound-isolated and
electrically shielded rooms. Both rooms are copper-screened, and
are 14 ft apart on opposite sides of E's monitoring room, which pro-
vides the only access. R and Se remain isolated in their respective
rooms until R completes the blind-judging procedure.
R's room is an Industrial Acoustics Corp., IAC 1205A Sound-
Isolation Room, consisting of two 4-inch sheetrock-filled steel
panels. The two panels are separated by a 4-inch air space, for a
total thickness of one foot.
The inside walls and ceiling of Se's room are covered with 4-inch
Sonex acoustical material, similar to that used in commercial
broadcast studios. A free-standing Sonex-covered plywood barrier
(5 ft wide by 8 ft high) positioned inside the sender's room, between
Se's chair and the acoustical door, blocks sound transmission
through the door frame. Figure 1 shows the floor plan of the ex-
perimental rooms.
E occupies a console housing the computer system and other
equipment. The computer is an Apple II Plus with two disk drives,
a printer, and an expansion chassis. The computer peripherals in-
clude a real-time clock, a noise-based random number generator
(RNG), a Cavri Interactive Video Interface, an Apple game pad-
dle, and a fan. Other equipment includes a color TV monitor, the
VCR used to access and display targets, and three electrically iso-
lated audiocassette recorders. One audiocassette recorder presents
audio stimuli (prerecorded relaxation exercises, session instructions,
and white noise). Another plays background music during the ex-
perimental setup. The third records R's ganzfeld mentation and
Psi Communication in the Ganzfeld 105
judging period associations. There is two-way intercom communi-
cation between E and R. One-way audio communication from R to
Se allows Se to listen to R's ganzfeld mentation.
[Figure 1. Floor plan of experimental suite. Labeled areas: RECEIVER (Industrial Acoustics 1205A Sound Isolation Room); EXPERIMENTER (E's equipment console); SENDER (double wall with 4" Sonex acoustical padding and acoustical door). Scale bar: 5 ft.]
Receiver Preparation
R sits in a comfortable reclining chair in the IAC room. Se keeps
R company while E prepares R for visual and auditory ganzfeld
stimulation. Translucent hemispheres are taped over R's eyes with
Micropore tape. Headphones are placed over R's ears. A clip-on
microphone is fastened to R's collar. A 600-watt red-filtered flood-
light, located approximately 6 ft in front of R's face, is adjusted in
intensity until R reports a comfortable, shadow-free, homogeneous
visual field. White noise level is similarly adjusted; R is informed
that the white noise should be as loud as possible without being an-
noying or uncomfortable. The ganzfeld light and white noise inten-
sity are adjusted from E's console after R and Se are sequestered in
their respective rooms.
Sender Preparation
Se sits in a comfortable reclining chair in the sender's room. Se
faces a color TV monitor, wearing headphones. During the session,
Se can hear R's mentation report through one headphone; if dy-
namic targets are used, Se hears the target audio channel through
the other headphone.
Series Manager Setup Procedures
E accesses the autoganzfeld computer program through the Se-
ries Manager software. Series Manager is a password-protected, menu-
driven control program. It provides the only means through which
an experimenter may specify parameters for the series design, reg-
ister new participants in the series, set up a session, and run a ses-
sion. The Series Manager menu is accessed through entry of a private
(and nonechoing) password.
Series design. A valid series design must exist before sessions can
be run in an experimental series. This is done through the Series
Manager "design" module. The design module prompts E to specify
the type of series (pilot, screening, or formal), the number of
participants, the maximum number of trials per participant, the
total number of trials per series, and the series name. There is no
provision for changing the series design once it is accepted by E.
Design parameters are saved in a disk file; they are passed to the
experimental program at the beginning of the session.
Participant registration. When R is new to a series, E accesses
"Participant Registration" from the Series Manager menu before the
session. E is prompted to enter R's name and identification number.
The module verifies that the maximum number of participants
specified in the design is not exceeded. (An error message appears
if an attempt is made to register more participants than are speci-
fied in the design; then, control is returned to the Series Manager
menu.)
Session setup. E then selects "Session Setup" from the Series Man-
ager menu. E is prompted to enter R's name, and the program ver-
ifies that R has not already completed the maximum number of
trials specified in the design module. (An error message appears if
a participant has completed the number of sessions allowed for the
series or has not been properly registered; control is then returned
to the Series Manager menu.) E enters Se's name and the sender
type: lab, lab friend, or friend. Lab senders are PRL staff members
whose acquaintance with the participant is limited to the experi-
ment. Lab friend refers to PRL staff senders who have some social
acquaintance with R outside the laboratory. Friend senders are friends
or family members of the participant. Finally, E enters the ganzfeld
light and noise intensity levels and his or her initials. E then leaves
the monitoring room while another PRL staff person supervises tar-
get selection.
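The Series Manager checks described above (a design that cannot be changed once accepted, a cap on registered participants, and a cap on sessions per participant) can be sketched as follows. This is only an illustrative sketch, not the original Apple II program: the class names, field names, and error messages are hypothetical, and for simplicity the sketch counts a session as soon as it is set up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SeriesDesign:
    """Design parameters (hypothetical names); frozen because the text
    says a design cannot be changed once it has been accepted."""
    name: str
    series_type: str                  # "pilot", "screening", or "formal"
    max_participants: int
    max_trials_per_participant: int
    total_trials: int


@dataclass
class SeriesManager:
    design: SeriesDesign
    # participant name -> number of sessions set up so far
    sessions: dict = field(default_factory=dict)

    def register(self, name: str) -> None:
        """Register a new participant, refusing to exceed the design limit."""
        if name in self.sessions:
            return
        if len(self.sessions) >= self.design.max_participants:
            raise ValueError("participant limit for this series reached")
        self.sessions[name] = 0

    def setup_session(self, receiver: str, sender_type: str) -> None:
        """Validate a session request and count it against the receiver."""
        if receiver not in self.sessions:
            raise ValueError("receiver is not registered in this series")
        if self.sessions[receiver] >= self.design.max_trials_per_participant:
            raise ValueError("receiver has completed the allowed sessions")
        if sender_type not in ("lab", "lab friend", "friend"):
            raise ValueError("unknown sender type")
        self.sessions[receiver] += 1
```

In this sketch an error is raised where the text says an error message appears and control returns to the menu; a caller would catch the exception and redisplay the menu.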
Targets
The system uses short video segments (dynamic targets) and still
pictures (static targets) as targets. Dynamic targets include excerpts
from motion pictures, documentaries, and cartoons. Static targets
include art prints, photographs, and magazine advertisements.
There are 160 targets, arranged in judging sets of four dynamic
or four static targets. The sets were constructed to minimize simi-
larities among targets within a set. The targets are recorded on four
one-half-inch VHS format videocassettes; each videocassette con-
tains 10 target sets (5 dynamic and 5 static). A signal recorded on
an audio track of each videocassette allows computer access of the
targets. Target display time (to Se during each sending period and
to R during the judging period) is approximately one minute;
blank space added to briefer targets ensures that the VCR remains
in play mode for the same length of time for all targets.
Preview packs. The video display format of the autoganzfeld tar-
gets does not permit simultaneous viewing of the entire target set
during the judging procedure as is done in many nonautomated
ganzfeld studies. Each target set is therefore accompanied by a pre-
view pack containing brief excerpts of all four targets in the set; this
gives R a general impression of the range of target possibilities. R
views the preview pack at the beginning of the judging procedure;
it runs approximately 30 sec.
Target Selection
The target selector (TS) is a PRL staff member who has no con-
tact with either E or R until after the blind-judging procedure. TS
is needed to load the videocassette containing the target into the
VCR. TS is informed which of the four videocassettes contains the
target, but remains blind to the target's identity. If Se is a staff
member, Se serves this role; otherwise, a staff member not involved
in the session serves as TS. (In the latter case, Se and R are seques-
tered in their respective rooms before TS enters the monitoring
room.)
The Series Manager program prompts TS to press a key on the
computer keyboard. A program call to the hardware RNG obtains
the target value (a number between 1 and 160) and stores it in com-
puter memory.² The program determines the target set and video-
cassette number from the target value. The videocassette number is
displayed on the monitor, and TS is prompted to insert it into the
VCR. The program verifies that the correct videocassette has been
inserted and clears the monitor screen; if the videocassette is not
correct, an error message prompts TS to insert the correct video-
cassette.
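As a rough illustration of how a target set and videocassette number can be derived from a target value, the sketch below assumes that targets are numbered consecutively within sets and that sets are numbered consecutively across the four videocassettes. That numbering scheme is an assumption, although it is consistent with the report's statement that Targets 77 and 79 both belong to Set 20. The function names are hypothetical, and Python's `random.SystemRandom` merely stands in for the hardware RNG.

```python
import random

N_TARGETS = 160          # total targets in the pool
TARGETS_PER_SET = 4      # each judging set holds four targets
SETS_PER_CASSETTE = 10   # each videocassette holds ten sets


def draw_target_value(rng=None) -> int:
    """Stand-in for the hardware RNG call: an integer from 1 to 160."""
    rng = rng or random.SystemRandom()
    return rng.randint(1, N_TARGETS)


def locate_target(target_value: int) -> tuple[int, int, int]:
    """Return (set number, position within set, videocassette number),
    all 1-based, under the consecutive-numbering assumption."""
    if not 1 <= target_value <= N_TARGETS:
        raise ValueError("target value must be between 1 and 160")
    set_number = (target_value - 1) // TARGETS_PER_SET + 1
    position = (target_value - 1) % TARGETS_PER_SET + 1
    cassette = (set_number - 1) // SETS_PER_CASSETTE + 1
    return set_number, position, cassette
```

Only the cassette number would be shown to TS; the set number and position stay in memory, which is what keeps TS blind to the target's identity.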
TS places a cardboard cover over the VCR's front panel to con-
ceal the digital counters and VU meters. Finally, TS leaves the mon-
itoring room with the three remaining videocassettes, knocking
three times on the monitoring room door as a signal for E to return.
Relaxation Exercises and Ganzfeld Instructions
R and Se undergo a 14-min prerecorded relaxation exercise be-
fore the mentation/sending period. This provides a unique shared
experience for R and Se before the ESP task. The relaxation exer-
cise includes progressive relaxation exercises and autogenic phrases
(Jacobson, 1929; Shultz, 1950). Ganzfeld instructions are recorded
after the relaxation exercise. The instructions and relaxation exer-
cise are delivered in a slow, soothing but confident manner with
ocean sounds in the background. The style of presentation is similar
to a hypnotic induction procedure. The ganzfeld instructions to R,
which are also heard by Se, are as follows:
During this experiment we want you to think out loud. Report all of the
images, thoughts, and feelings that pass through your mind. Do not
cling to any of them. Just observe them as they go by. At some point
during the session, we will send you the target information. Do not try
to anticipate or conjure up this information. Just give yourself the sug-
gestion, right now, in the form of making a wish, that the information
will appear in consciousness at the appropriate time. Keep your eyes
open as much as possible during the session and allow your conscious-
ness to flow through the sound you will hear through the headphones.
One of us will be monitoring you in the other room. Now get as com-
fortable as possible, release all conscious hold of your body, and allow
it to relax completely. As soon as you begin observing your mental proc-
esses, start thinking out loud. Continue to share your thoughts, images,
and feelings with us throughout the session.
2 An exception occurs in the two target comparison series (Series 301 and 302).
See pp. 112-113.
Mentation/Sending Procedures
Receiver mentation report. After the relaxation exercise and in-
structions, R listens to the white noise through headphones for 30
minutes. R reports whatever thoughts, images, and feelings occur in
the ganzfeld. The mentation report is monitored by E and Se from
their respective rooms. The mentation report is tape recorded, and
E takes detailed notes for review with R prior to judging.
Target presentation and sender procedures. A Cavri Video Interface
automates computer access and control of targets from a JVC BR-
6400U VCR. An electronic video switcher selectively routes the
video output (VCR or computer text mode) to three color TV mon-
itors, one each for E, R, and Se. E's and R's monitors remain in
computer text mode until the judging period. During each of the
six sending periods, Se's TV monitor is switched from computer
text to VCR mode.
At the beginning of each sending period, Se's monitor displays
the prompt, "Silently communicate the contents and meaning of the
target to [R's first name]." Se views the target and attempts to com-
municate its contents to R. Se mentally reinforces R for target-
related associations and mentally discourages R when the mentation
is unrelated to the target.
Judging Procedure
After the mentation period, E turns off the ganzfeld light and
reads back R's mentation from the session notes. R remains in ganz-
feld during the mentation review to minimize any abrupt shift in
state. E's and R's TV monitors are switched into VCR mode by the
computer, which also prompts Se to "Silently direct [R's first name]
to select the target that you saw." Se's TV monitor remains blank
(computer mode) during this period.
R removes the eye covers and views the preview pack. From
their respective rooms, R and E then view the four potential targets
(the actual target and three decoys), which are presented in one of
four random sequences. R, viewing each candidate, associates to the
item as though it were the actual target, describing perceived simi-
larities between the item and the ganzfeld mentation. While R as-
sociates to each candidate, E points out potential correspondences
that R may have overlooked.³ R views any of the target candidates
as often as desired before proceeding to the judging task.
3 This applies to Pilot Series 3, Novice Series 103-105, and to Experienced Series
A 40-point rating scale then appears on R's TV monitor. The
scale is labelled 0% on the left and 100% on the right. Using a com-
puter-game paddle to move a pointer horizontally across the rating
scale, R indicates the degree of similarity between his ganzfeld men-
tation and each potential target. E and Se view R's ratings on their
monitors. The program checks for ties, and, if they occur, R re-rates
the four candidates to obtain unique ratings for each. The program
then converts R's ratings into ranks. A rank of 1 is assigned to the
candidate R believes has the strongest similarity to his ganzfeld men-
tation; a rank of 4 is given to the candidate R believes is least like
his ganzfeld experience.
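The tie check and the rating-to-rank conversion described above might be sketched as follows. The function names are hypothetical, and the sketch assumes ratings are stored as integers on the 40-point scale. The standardized target rating is computed here as the target's rating minus the mean of the four ratings, divided by their sample standard deviation; that formula is an assumption about how the z score is obtained, not something stated in this section.

```python
from statistics import mean, stdev


def has_ties(ratings):
    """True if any two candidates received the same rating."""
    return len(set(ratings)) < len(ratings)


def ratings_to_ranks(ratings):
    """Convert similarity ratings to ranks; rank 1 = highest rating."""
    if has_ties(ratings):
        # In the procedure described, R re-rates the candidates on ties.
        raise ValueError("ratings must be unique; re-rate the candidates")
    ordered = sorted(ratings, reverse=True)
    return [ordered.index(r) + 1 for r in ratings]


def standardized_target_rating(ratings, target_index):
    """Assumed z score: (target rating - mean) / sample SD of the ratings."""
    return (ratings[target_index] - mean(ratings)) / stdev(ratings)


# Hypothetical session: four candidate ratings, target in position 0.
example_ratings = [30, 12, 5, 21]
example_ranks = ratings_to_ranks(example_ratings)
```

A direct hit corresponds to the target receiving rank 1.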
Feedback and Post-Session Procedures
After R finishes judging, Se leaves the sender's room and enters
R's room with E. Se reveals the actual target, which the computer
automatically displays on R's TV monitor. The session data are writ-
ten to a floppy disk file.
Following feedback, E is prompted to back up the series data
disk. The target videocassette is then automatically wound to a po-
sition near the center of the videocassette (frame 50,000). E selects
"Analysis" from the Series Manager menu, and obtains a hardcopy
printout of the session data file. The printout includes: the file
name, R's name and ID number, series type, session number, Se's
name, E's initials, date and start time, target number, target position
in the set, R's target ranking, the standardized target rating (z
score), target judging sequence, target name, target type and set
number, sender type, light and white noise levels, finish time, and
optional experimenter's comments. The printout is attached to E's
notes on R's mentation and placed in a ring binder containing all
such information for the series. The audio tape of the session is sim-
ilarly filed.
Experimenters
Eight Es contributed to the autoganzfeld database. Honorton,
one of the originators of the psi ganzfeld technique, has conducted
psi ganzfeld experiments over a 16-year period. Derr and Varvoglis
worked with Honorton at Maimonides Medical Center and were
trained by him. Berger is primarily responsible for the technical im-
plementation of the autoganzfeld system. He trained Honorton,
Derr, Varvoglis, and Schechter in its use. Honorton trained Quant,
Ferrari, and Schlitz in the use of the autoganzfeld system.⁴
201 and 302. It does not apply to the earlier series (Pilot Series 1-2; Novice Series
101-102; or Experienced Series 301). This practice was initiated because participants
frequently failed to identify obvious correspondences between their mentation and
target elements.
Experimental Series
Altogether, 241 participants contributed 355 sessions in 11 se-
ries. To fully address the issue of selective reporting, we include
every session completed from the inauguration of the experiments
in February, 1983, to September, 1989, when the PRL facility was
closed. Thus, this database has no "file-drawer" problem (Rosenthal,
1984).
The studies include three pilot series and eight formal series.
Five of the formal series were single-session studies with novice par-
ticipants. The remaining three formal series involved experienced
participants.
Pilot Series
Series 1. This initial pilot series was conducted during the devel-
opment and testing of the autoganzfeld system. It served to test sys-
tem operation, to detect and correct programming errors, and to
fine-tune session timing functions. Nineteen subjects contributed 22
sessions as Rs. Seven, including PRL staff members, had prior ex-
perience as Rs in nonautomated ganzfeld studies at Maimonides
Medical Center. The remaining 12 Rs were novices with no prior
ganzfeld experience. Series sample size was not specified in advance;
the series continued until we were satisfied that the system was op-
erating reliably.
Series 2. This pilot series was designed by Berger in an attempt
to avert potential displacement effects and subject judging problems
by having E rather than R serve as judge; R received feedback only
to the actual target. Four participants contributed to this series.
Nine of the planned 50 sessions were completed before Berger's de-
parture from PRL, when this series was discontinued.
4 Berger, Schechter, and Varvoglis have doctorate degrees in psychology. Quant
holds a master's degree in counselling psychology, and Ferrari has a bachelor's degree
in psychology. Schlitz has conducted independent ganzfeld and remote-viewing re-
search in other laboratories and has a master's degree in anthropology.
Series 3. This pilot series was a practice series for participants
who completed the allotted number of sessions in ongoing formal
series but who wanted additional ganzfeld experience. This series
also includes several demonstration sessions when TV film crews
were present and provided receiver experience for new PRL staff.
The sample size was not preset.
Novice ("First-Timers") Series
The identification of characteristics associated with successful in-
itial performance was a major goal of the PRL ganzfeld project
(Honorton & Schechter, 1987). Except for Series 105, each novice
series includes 50 ganzfeld novices, that is, participants with no
prior ganzfeld experience. Each novice contributed a single ganz-
feld session. Most novices had not participated in any psi experiment
prior to the novice series.
Series 101. This is the first novice series.
Series 102. Beginning with this series, R was prompted after the
mentation period to estimate the number of minutes since the end
of the relaxation/instructions tape.
Series 103. Starting with this series, Rs were given the option of
having no sender (i.e., "clairvoyance" condition). Only four partici-
pants opted to have no sender.
Series 104. A visiting scientist (Marilyn Schlitz) served as E in
seven sessions and as Se in six sessions with subjects from The Juil-
liard School in New York.
Series 105. This series was started to accommodate the overflow
of Juilliard students from Series 104. The sample size was set to 25.
Six sessions were completed at the time the PRL program was sus-
pended. (There were 20 Juilliard students altogether. Sixteen were
in Series 104 and four were in Series 105.)
Experienced Subjects Series
Series 201. This series involved especially promising subjects.
The number of trials was set to 20. Seven sessions by three Rs were
completed at the time the PRL program was suspended.
Series 301. This series compared dynamic and static targets.
Sample size was set to 50 sessions. Twenty-five experienced subjects
each contributed two sessions. The autoganzfeld program was mod-
ified for this series so that each R would have one session with dy-
namic targets and one session with static targets. Subjects were in-
formed of this only after completing both sessions.
Series 302. This series used a single dynamic target set (Set 20).
In earlier series, Target 77 ("Tidal Wave Engulfing Ancient City")
had an especially strong success rate while Target 79 ("High-Speed
Sex Trio") had never been correctly identified. We made two pro-
gram modifications for this series. The target selection ("Random-
ize") routine was modified to select only targets in Set 20, and the
VCR tape-centering routine was modified to wind the videotape to
a randomly selected position between frame numbers 85,000 and
95,000. The second modification ensured that E could not be cued,
perhaps unconsciously, by the time required to wind the tape from
its initial position to the target location.
The study involved experienced Rs who had no prior experience
with Set 20. Each R contributed one session. Participants were un-
aware of the purpose of the study or that it was limited to one target
set. The design called for the series to continue until 15 sessions
were completed with each of the two targets of interest. Twenty-five
sessions were completed when the PRL program was suspended.
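The two program modifications for Series 302 can be illustrated with a short sketch. It assumes that targets are numbered consecutively within sets (so that Set 20 holds Targets 77 to 80, consistent with the target numbers given above) and uses Python's pseudorandom generator as a stand-in for the hardware RNG; the function names are hypothetical.

```python
import random

TARGETS_PER_SET = 4


def targets_in_set(set_number):
    """Target numbers in a set, assuming consecutive numbering."""
    first = (set_number - 1) * TARGETS_PER_SET + 1
    return list(range(first, first + TARGETS_PER_SET))


def select_series_302_target(rng):
    """Restricted selection: only targets in Set 20 are eligible."""
    return rng.choice(targets_in_set(20))


def tape_rest_position(rng, low=85_000, high=95_000):
    """Random frame to which the tape is wound after the session, so the
    winding time carries no cue about the target's location."""
    return rng.randint(low, high)
```

Randomizing the resting frame means the time needed to reach the next target from that position varies unpredictably from session to session.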
Statistical Analysis
Except for two pilot series, series sample sizes were specified in
advance. Our primary hypothesis was that the observed success
rate (the proportion of correctly identified targets) would reliably
exceed the null hypothesis expectation of .25. To test this hypoth-
esis, we calculated the exact binomial probability for the observed
number of direct hits (ranks of 1) with p = .25 and q = .75. On
the basis of the overwhelmingly positive outcomes of earlier studies,
we preset alpha to .05, one-tailed.
We also tested two secondary hypotheses, based on patterns of
success in earlier psi ganzfeld research. These are: (1) that dynamic
targets are significantly superior to static targets, and (2) that per-
formance is significantly enhanced when the sender is a friend of R,
compared to when R and Se are not acquainted. We initially
planned to test these hypotheses by chi-square tests, a trial-based
analysis. However, a consultant (Dr. Robert Rosenthal) suggested
that a t test using the series as the unit would be a more powerful
test of these hypotheses, and we have followed his recommendation.
The remaining analyses are exploratory.⁵
5 The statistical analyses in this report were performed using SYSTAT ,(Wilkin-
TABLE 1
OUTCOME BY SERIES

                       N         N       Hits   Hits   Effect
Series  Series type    subjects  trials  N      %      size (h)   z
1       Pilot           19        22       8     36     .25        .99
2       Pilot            4         9       3     33     .18        .25
3       Pilot           25        36      10     28     .07       [illegible]
101     Novice          50        50      12     24    -.02       -.30
102     Novice          50        50      18     36     .24       1.60
103     Novice          50        50      15     30     .11        .67
104     Novice          50        50      18     36     .24       1.60
105     Novice           6         6       4     67     .87       1.78
201     Experienced      3         7       3     43     .38        .69
301     Experienced     25        50      15     30     .11        .67
302     Experienced     25        25      16     64     .81       3.93
Overall                241       355     122     34     .20       3.89

Note. The z scores are based on the exact binomial probability with p = .25
and q = .75.
RESULTS
Overall Success Rate
Ganzfeld hit rate. There were 241 participants, who contributed
355 autoganzfeld sessions. The 122 direct hits (34.4%) yield an exact
binomial p of .00005 (z = 3.89). The effect size, Cohen's h (Cohen,
1977), is .20. The 95% confidence interval (CI) is a hit rate from
30% to 39%. Because this level of accuracy would occur about one
time in 20,000 by chance, we reject the null hypothesis. (See Table
1.)
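For readers who wish to check these figures, the sketch below computes a one-tailed exact binomial probability and Cohen's h from the reported counts (122 hits in 355 sessions, chance hit rate .25). It is an independent illustration using only the Python standard library, not the SYSTAT procedure used for the report; the function names are hypothetical, and h is computed with the usual arcsine formula, h = 2 arcsin(sqrt(p1)) - 2 arcsin(sqrt(p0)).

```python
from math import asin, comb, sqrt


def binomial_upper_tail(hits, trials, p=0.25):
    """Exact probability of observing `hits` or more successes by chance."""
    q = 1.0 - p
    return sum(comb(trials, k) * p**k * q**(trials - k)
               for k in range(hits, trials + 1))


def cohens_h(p_observed, p_null=0.25):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p_observed)) - 2 * asin(sqrt(p_null))


hits, trials = 122, 355
p_value = binomial_upper_tail(hits, trials)   # one-tailed exact binomial p
effect_size = cohens_h(hits / trials)         # approximately .2
```

Small differences from the printed values can arise from rounding in the report.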
Success rate by series. Of the 11 series, 10 yield positive outcomes.
The mean series effect size is .29, SD = .29, t (10) = 3.32.
Homogeneity of effect sizes. Traditionally, psi investigators have
been preoccupied by whether there is a significant nonzero effect.
An equally important issue, however, is the size of the effect. There
is a growing tendency among behavioral scientists to define replic-
ability in terms of the homogeneity of effect sizes (Hedges, 1987;
son, 1988). When t tests are reported on samples with unequal variances, they are
calculated using the separate variances within groups for the error and degrees of
freedom following Brownlee (1965). Combined zs are based on Stouffer's method
(Rosenthal, 1984). Unless otherwise specified, p levels are one-tailed.
TABLE 2
OUTCOME BY EXPERIMENTER

                          Hits   Hits   Effect
Experimenter   N trials   N      %      size (h)
Quant            106       38     36     .24
Honorton          72       27     38     .29
Berger            53       18     34     .20
Derr              45       12     27     .05
Varvoglis         43       11     26     .03
Schechter         14        5     36     .23
Ferrari           15        9     60     .72
Schlitz            7        2     29     .08
Rosenthal, 1986; Utts, 1986). Two or more studies are replicates of
one another if their effect sizes are homogeneous. We assess the
homogeneity of effect sizes across the 11 series by performing a chi-
square homogeneity test comparing the effect size for each series
with the weighted mean effect size (Hedges, 1981; Rosenthal, 1984).
The formula is:

\[ \chi^2(k - 1) = \sum_{i=1}^{k} N_i \,(h_i - \bar{h})^2, \]

where k is the number of studies, N_i is the sample size of the ith
study, and the weighted mean effect size is:

\[ \bar{h} = \frac{\sum_{i=1}^{k} N_i h_i}{\sum_{i=1}^{k} N_i}. \]

The test shows that the series effect sizes are not significantly non-
homogeneous: χ² = 16.25, 10 df, p = .093.
Homogeneity of Outcome by Experimenter
Eight Es contributed to the autoganzfeld database. (See Table 2.)
All eight experimenters have positive effect sizes. A chi-square ho-
mogeneity test, using the mean effect sizes for each E weighted by
sample size, indicates that the results are homogeneous across ex-
perimenters: χ² = 7.13, 7 df, p = .415.
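The homogeneity statistic used in these tests can be sketched as follows, assuming it is the sample-size-weighted sum of squared deviations of each effect size from the weighted mean effect size. The per-series effect sizes and trial counts below are transcribed from Table 1 as read from the scan, so they are rounded and should be treated as approximate. The tail probability uses a closed form that holds only for an even number of degrees of freedom, so it is applied to the series test (10 df) but not to the experimenter test (7 df). Function names are hypothetical.

```python
from math import exp, factorial


def weighted_mean(effects, sizes):
    """Mean effect size weighted by sample size."""
    return sum(n * h for h, n in zip(effects, sizes)) / sum(sizes)


def homogeneity_chi2(effects, sizes):
    """Sum of N_i * (h_i - weighted mean)^2, with k - 1 degrees of freedom."""
    h_bar = weighted_mean(effects, sizes)
    return sum(n * (h - h_bar) ** 2 for h, n in zip(effects, sizes))


def chi2_tail_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of df")
    half = x / 2.0
    return exp(-half) * sum(half ** j / factorial(j) for j in range(df // 2))


# Effect sizes (h) and trial counts for the 11 series, as read from Table 1.
series_h = [.25, .18, .07, -.02, .24, .11, .24, .87, .38, .11, .81]
series_n = [22, 9, 36, 50, 50, 50, 50, 6, 7, 50, 25]

chi2_series = homogeneity_chi2(series_h, series_n)
p_series = chi2_tail_even_df(chi2_series, df=len(series_h) - 1)
```

With these rounded inputs the statistic comes out close to, though not exactly at, the reported value of 16.25.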
TABLE 3
GANZFELD SUCCESS IN RELATION TO NUMBER OF SESSIONS

                     No. of sessions as receiver
                     1       2       3       4+
N subjects           183     23      24      11
N trials             183     46      72      54
Hits                  53     19      31      19
% Hits                29     41      43      35
Effect size (h)      .09     .34     .38     .22
Subject-Based Analysis
Seventy-six percent of the participants (N = 183) contributed a
single session as R. Fifty-eight Rs contributed multiple sessions. Par-
ticipants with multiple sessions either had direct hits or strongly
suggestive target mentation correspondences in their first session.
(See Table 3.)
Success rate by subjects. To test the consistency of ganzfeld
performance across participants, we use the standardized ratings of the
target and decoys (Stanford's z scores; Stanford & Sargent, 1983) as
the dependent variable. Stanford zs are averaged for participants
with multiple sessions. Direct hits and Stanford zs are highly
correlated. In this database, r (353) is .776. The mean Stanford z for the
241 participants is .21 (SD = 1.04), and t (240) = 3.22 (p = .00073).
The 95% CI is a Stanford z from .08 to .35. The effect size (Cohen's
d; Cohen, 1977) is .21. (The effect size for subjects is nearly identical
to the trial-based effect size, h = .20.) Thus, there is a general
tendency for participants to give higher ratings to the actual target
than to the decoys, and the significance of these experiments is not
attributable to exceptional performance by a few outstanding subjects.
Dynamic Versus Static Targets
The success rate for dynamic targets is highly significant. There
are 190 dynamic target sessions and 77 direct hits (40%, h = .32;
exact binomial p = 1.9 x 10^-6, z = 4.62). The hit rate for static
targets is not significant (165 trials, 45 hits, 27%, h = .05, p = .276,
z = .59). Using the series effect size as the outcome variable and
target type as the predictor variable, the point-biserial correlation
(rpb) between ganzfeld performance and target type is .663, t (17) =
3.65, p = .002.' The 95% CI for dynamic targets is a hit rate from
34% to 47%. The CI for static targets is from 21% to 34%. Thus,
our hypothesis concerning the superiority of dynamic targets is
strongly supported.

TABLE 4
SENDER/RECEIVER PAIRING

                        Sender type
                   Lab    Lab friend   Friend
N trials           140        66         145
N hits              46        24          52
% Hits              33        36          36
Effect size (h)    .18       .24         .24
z                 2.01      1.93        2.83
p                 .023      .026       .0023

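The effect sizes and exact binomial probabilities quoted for dynamic and static targets can be checked with a short sketch. It assumes a chance hit rate of .25 (four judging alternatives) and the usual arcsine definition of Cohen's h; the function names are illustrative only. Because the sketch works from raw counts rather than rounded percentages, its h values can differ from the reported ones in the second decimal place.

```python
import math


def cohens_h(p_obs, p_chance=0.25):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p_obs)) - 2 * math.asin(math.sqrt(p_chance))


def binomial_upper_tail(hits, trials, p_chance=0.25):
    """Exact one-tailed probability of observing `hits` or more successes."""
    return sum(
        math.comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


if __name__ == "__main__":
    # Counts taken from the text: (hits, trials) for each target type.
    for label, hits, trials in [("dynamic", 77, 190), ("static", 45, 165)]:
        h = cohens_h(hits / trials)
        p = binomial_upper_tail(hits, trials)
        print(f"{label}: h = {h:.2f}, exact binomial p = {p:.3g}")
```

The exact tail is summed directly instead of using a normal approximation, which matters for the dynamic targets, where the probability is far out in the tail.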
Sender/Receiver Pairing
Receivers are more successful with friends than with laboratory
senders, although the difference is not statistically significant. The
number of sessions in this analysis is 351 because four subjects
opted to have no sender. The best performance occurs with friend
senders. Sessions with laboratory senders, although significant, have
the lowest success rate. (See Table 4.)
Using series effect sizes as the unit of analysis and sender type
as the predictor variable (combining lab friend and friends), rpb is
.363, t (17) = 1.61, p = .063.' The 95% CI for
friends is a hit rate from 33.3% to 47%. For lab senders, the CI is
from 18.3% to 41.8%. Thus, although the effect of sender type is
not statistically significant, there is a trend toward better results with
friends.
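The point-biserial correlations in these analyses are ordinary Pearson correlations in which the predictor is coded 0 or 1, and the accompanying t statistics follow from r and its degrees of freedom. The sketch below illustrates both steps; the function names and the toy data are hypothetical, not taken from the study.

```python
import math


def point_biserial(binary, outcome):
    """Pearson correlation between a 0/1 predictor and a numeric outcome."""
    n = len(binary)
    mx = sum(binary) / n
    my = sum(outcome) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(binary, outcome))
    sxx = sum((x - mx) ** 2 for x in binary)
    syy = sum((y - my) ** 2 for y in outcome)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, df):
    """t statistic for a correlation r with df = n - 2 degrees of freedom."""
    return r * math.sqrt(df) / math.sqrt(1 - r * r)


if __name__ == "__main__":
    # Toy data: predictor coded 0/1, outcome is a per-series effect size.
    r = point_biserial([0, 0, 1, 1], [0.1, 0.2, 0.3, 0.4])
    print(round(r, 3), round(t_from_r(r, 2), 2))
    # The target-type correlation reported in the text:
    print(round(t_from_r(0.663, 17), 2))
```

Applied to the target-type result, `t_from_r(0.663, 17)` reproduces the reported t (17) = 3.65.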
'Separate effect sizes were obtained for the dynamic and static target sessions of
each series. Since Series 302 used dynamic targets only, the analysis is based on 11
dynamic target effect sizes and 8 static target effect sizes; two static target series (105
and 201) had extremely small sample sizes (2 and 3 sessions, respectively). A similar
procedure is used in the analyses of sender/receiver pairing and experienced versus
novice subjects.
'Three series involving laboratory senders were eliminated from this analysis
because of extremely small sample sizes. These include Series 2 (n = 2),
Series 105 (n = 2), and Series 201 (n = 1). Thus, the point biserial
correlation is based on 11 series with friends and 8 series with laboratory senders.
Ganzfeld Experience
Two hundred and eighteen participants had their first experience
as ganzfeld receivers in the autoganzfeld series. (This includes
the 5 Novice Series 101-105 and 12 novices in Series 1.) For all but
24 (11%), their initial autoganzfeld session provided their first
experience as participant in any parapsychological research. Of the
218 novices, 71 (32.5%, h = .17) correctly identified their target
(exact binomial p = .0073, z = 2.44).
Participants with some ganzfeld experience contributed 137
trials and 51 hits (37%, h = .26, p = .001, z = 3.09). When series
effect sizes are used as the unit of analysis and prior ganzfeld
experience is used as the predictor variable, rpb is .078, t (10) = 0.25,
p = .41. The 95% CI for novices is a hit rate from 25.5% to 49.5%.
The CI for experienced participants is from 29% to 50%.
Participation by PRL Laboratory Staff
For completeness, we report the contribution of laboratory staff
as subjects in this database. PRL staff members contributed 12
sessions as R. These sessions yield 3 hits (exact binomial p = .50; h =
.00).
White Noise and Ganzfeld Illumination Levels
The mean white noise level (in arbitrary units of 0-7.5) is 2.97
(SD = 1.77). As measured from the headphones, the mean noise
level is approximately 68 dB. The mean light intensity (arbitrary
units of 0-100) is 73.8 (SD = 26.1). Preferred noise and light
intensity levels are highly correlated: r = .569, t (353) = 12.99.
Neither noise nor light intensity is significantly related to
ganzfeld performance. The point-biserial correlation between hits and
noise level is -.026, t (353) = -0.48, p = .631, two tailed. For light
intensity, rpb is -.040, t (353) = -0.76, p = .449, two tailed.
RANDOMNESS TESTS
The adequacy of randomization was a major source of disagree-
ment in two meta-analytic reviews of earlier psi ganzfeld research
(Honorton, 1985; Hyman, 1985). In this section we document the
adequacy of our randomization procedure according to guidelines
agreed on by Hyman and Honorton (1986).
Global Tests of Random Number Generator
Full-range frequency analysis. As described earlier, autoganzfeld
targets are selected through a program call to the RNG for values
within the target range (1-160). The number of experimental
sessions (N = 355) is too small to assess the RNG output distribution
for the full range, so we performed a large-scale control series to
test the distribution of values. Twelve control samples were
collected. These included five samples with 156,000 trials, six samples
with 1,560 trials, and one sample of 1,560,000 trials. The 12
resulting chi-square values were compared to a chi-square distribution
with 155 df, using the Kolmogorov-Smirnov (KS) one-sample test.
The KS test yields a two-tailed p = .577, indicating that the RNG
used in these experiments provides a uniform distribution of values
throughout the full target range.'
Test of frequency distribution for Set 20. We used a single target set
(Set 20) in Series 302. We repeated the frequency analysis in a
40,000-trial control sample, restricting target selection to the four
target values within Set 20 (Targets 77-80). A chi-square test of the
distribution of targets within Set 20 shows that the RNG produces
a uniform distribution of the target values within the set: χ² = 3.19,
3 df, p = .363.
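A chi-square test of uniformity over four equally likely target values, of the kind reported for Set 20, can be sketched as follows. The counts below are hypothetical (the control-sample frequencies are not given in the text), and the p value uses the closed-form upper tail of the chi-square distribution for 3 df.

```python
import math


def uniform_chi2(counts):
    """Chi-square statistic for observed counts against a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variate with 3 df."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


if __name__ == "__main__":
    # Hypothetical frequencies for four target values in a 40,000-trial sample.
    counts = [10060, 9950, 10035, 9955]
    stat = uniform_chi2(counts)
    print(round(stat, 2), round(chi2_sf_3df(stat), 3))
```

Evaluating `chi2_sf_3df` at the reported statistic of 3.19 gives a value close to the reported p = .363.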
Tests of the Experimental RNG Usage
Each autoganzfeld session required two RNG calls. An RNG call
at the beginning of the session determined the target; another,
made before the judging procedure, determined the order in which
the target and decoys were presented for judging.
Distribution of targets in the experiment. A chi-square test of the
distribution of values within the target sets shows that the targets were
selected uniformly from among the four possibilities within each set:
χ² with 3 df is 0.86, p = .835.
Distribution of judging order. A chi-square test of the judging order
indicates that the targets were uniformly distributed among the four
possible judging sequences: the χ² with 3 df is 1.85, p = .604.
'One of the preview pack elements for Set 6, containing Targets 21-24, was
damaged. This required filtering the RNG calls in the experiment and control tests
to bypass the damaged portion of the videotape, leaving the targets in Pool 6 unused.
Thus, for the full-range analyses reported here, there are 155 df rather than 159.
Summary
The randomness tests demonstrate that the RNG used for target
selection in these experiments provides an adequate source of ran-
dom numbers and was functioning properly during the experi-
ments.
EXAMPLES OF TARGET-MENTATION CORRESPONDENCES
In this section, we present some examples of correspondences
between targets and ganzfeld mentation. Although conclusions cannot
be drawn from qualitative data, this material should not be
ignored. It constitutes the raw data on which the objective statistical
evidence is based, and may provide important insights concerning
the underlying process. These examples are excerpts from sessions
of subjects' ganzfeld mentation reports, identified by them during
the blind judging procedure as providing their basis for rating the
target.
Target 90, Static: Dali's "Christ Crucified."
Series: 1. Participant ID: 77. Rank = 1. z score = 1.67.
"...I think of guides, like spirit guides, leading me and I come into like
a court with a king. It's quiet.... It's like heaven. The king is something
like Jesus. Woman. Now I'm just sort of summersaulting through
heaven.... Brooding.... Aztecs, the Sun God.... High priest....
Fear.... Graves. Woman. Prayer.... Funeral.... Dark. Death....
Souls.... Ten Commandments. Moses...."
Target 77, Dynamic: Tidal wave engulfing ancient city. From "The Clash
of the Titans," a film based on Greek mythology. A huge tidal wave crashes
to the shore. The scene shifts to a center courtyard of an ancient Greek
city; there is a statue in the center, and buildings with Greek columns around
the periphery. People are running to escape consumption by the tidal wave.
Water rushes through the buildings, destroying the columns and the statue;
people scurry through a stone tunnel, just ahead of the engulfing water;
debris floats through the water.
Series: 1. Participant ID: 87. Rank = 1. z score = 1.42.
"...The city of Bath comes to mind. The Romans. The reconstruction
of the baths through archaeology. The Parthenon. Also getting sort of
buildings like Stonehenge but sort of a cross between Stonehenge and
the Parthenon. The Byzantine Empire. The Gates of Thunder. The
Holy See. Tables floating about.... The number 7 very clearly. That just
popped out of nowhere. It reminds me a bit of one of the first Clash
albums, however. The Clash, "Two Sevens" I think it was called, I'm not
sure...." [The target was number 77.]
Series: 302. Participant ID: 267. Rank = 1. z score = 2.00.
"...A big storm over New York City. I'm assuming it's New York City.
No, it's San Francisco.... A big storm and danger. It looks so beautiful
but I'm getting the sense of danger from it.... It's a storm. An earth-
quake...."
Target 63, Dynamic: Horses. From the film, "The Lathe of Heaven." An
overhead view of five horses galloping in a snow storm. The camera zooms
in on the horses as they gallop through the snow. The scene shifts to a close-
up of a single horse trotting in a grassy meadow, first at normal speed, then
in slow-motion. The scene shifts again; the same horse trotting slowly
through empty city streets.
Series: 101. Participant ID: 92. Rank = 1. z score = 1.25.
"...I keep going to the mountains.... It's snowing.... Moving again,
this time to the left, spinning to the left.... Spinning. Like on a carousel,
horses. Horses on a carousel, a circus...."
Target 46, Dynamic: Collapsing Bridge. Newsreel footage of the collapse of
a bridge in the 1940s. The bridge is swaying back and forth and up and down.
Light posts are swaying. The bridge collapses from the center into the water.
Series: 101. Participant ID: 135. Rank = 1. z score = 1.94.
"...Something, some vertical object bending or swaying, almost
something swaying in the wind.... Some thin vertical object, bending to the
left.... Some kind of ladder-like structure but it seems to be almost
blowing in the wind. Almost like a ladder-like bridge over some kind of
chasm that's waving in the wind. This is not vertical this is
horizontal.... A bridge, a drawbridge over something. It's like one of those old
English type bridges that opens up from either side. The middle part
comes up. I see it opening. It's opening. There was a flash of an old
English stone bridge but then back to this one that's opening. The
bridge is lifting, both sides now. Now both sides are straight up. Now
it's closing again. It's closing, it's coming down, it's closed. Arc, images
of arcs, arcs, bridges. Passageways, many arcs. Bridges with many
arcs...."
Target 137, Static: "Working on a Watermelon Farm." This painting shows
a black man with his back to the picture; his suspenders form a V-shape
around his shoulders. A dog is in front of the man; there are watermelons
between the dog and the man. The man faces a dirt path with watermelon
patches on either side. On the left side, another man pushes a wheelbarrow
filled with huge watermelons.
Series: 101. Participant ID: 105. Rank = 2. z score = 0.98.
"...a small lamb, very soft, outside. Small, playful.... I see a
shape.... An apple.... I see a kitchen towel with a picture on it. Apple
seeds or a fruit cut in half showing the seeds. A tomato or an apple.
The fruit was red on the outside.... I thought of watermelon as in a
watermelon basket. Thinking of kids playing on a beach. Little kids
playing with balls that are bigger than they are and buckets that are
three-quarters their size.... I had a thought of going through a tunnel,
not the kind of tunnel you see on Earth but the type of tunnel described
when someone dies."
Target 64, Dynamic: 1920s Car Sinking. From the film "Ghost Story." The
scene depicts the murder of a young blonde woman by three young men in
the 1920s. The men are all wearing suits; one of the men is wearing a
fedora hat that is turned up in the back. The men push an old car into a
lake. The camera shifts between close-ups of their facial expressions, and the
car, as it slowly sinks into the water. The woman's face and hand appear in
the car's large rectangular rear window; she silently screams out for help.
The car disappears beneath the water as the sequence ends.
Series: 102. Participant ID: 154. Rank = 1. z score = 1.45.
"...Girl with a haircut.... Blond hair.... A car.... The back of
someone's head.... Someone running to the right.... Someone on the right
in a brown suit... and a fedora hat turned up very much in the
back.... Fedora, trench coat, dark tie.... A tire of a car. The car's going
to the left. An old movie.... I'm picturing an Edward G. Robinson
movie.... Big roundish car like 1940's. Those scenes from the back
window. Bumping once in a while up and down looking through the back
window you could see that it was probably a big screen in back of the
car and the car's standing still actually.... I think it's a movie I saw.
They're being shot at and shooting at the window and then the girl gets
shot.... Girl with the blonde haircut.... Someone walking in a suit,
brown suit.... It's the 1940's again, 30's maybe. Except it looks like it's
in color. Something red, blood... blood on someone's lap.... A dead
person all of a sudden.... A big mouth opened. Yelling, but no
sound.... Two people running near a train.... Dressed in 1920 type
suits with balloony pants, like knickers.... A big, old-fashioned white car
with a flat top. 1920's...."
Target 107, Static: Stained-Glass Madonna with Child. This is a stained-
glass window depicting the Virgin Mary and Christ child.
Series: 102. Participant ID: 183. Rank = 2. z score = 0.61.
"Some kind of a house, structure.... Some kind of wall or building.
Something with the sky in the background. Thinking of a bell. A bell
structure. Something with a hole with the light coming through the
hole.... Like a stained glass window like you see in churches."
Target 19, Static: Flying Eagle. An eagle with outstretched wings is about
to land on a perch; its claws are extended. The eagle's head is white and its
wings and body are black.
Series: 104. Participant ID: 316. Rank = 1. z score = 2.00.
" ... A black bird. I see a dark shape of a black bird with a very pointed
beak with his wings down.... Almost needle-like beak.... Something
that would fly or is flying... like a big parrot with long feathers on a
perch. Lots of feathers, tail feathers, long, long, long.... Flying, a big
huge, huge eagle. The wings of an eagle spread out.... The head of an
eagle. White head and dark feathers.... The bottom of a bird...."
Target 144, Dynamic: Hell. From the film "Altered States." This sequence
depicts a psychedelic experience. Everything is tinted red. The rapidly shifting
scenes include: A man screaming; many people in the midst of fire and
smoke; a man screaming in an isolation tank; people in agony; a large sun
with a corona around it; a mass crucifixion; people jumping off a precipice,
in the midst of fire, smoke, and molten lava; spiraling crucifixes. There is a
close-up of a lizard's head, slowly opening its mouth, at the end of the
sequence.
Series: 104. Participant ID: 321. Rank = 1. z score = 1.49.
"...I just see a big 'X'. A big 'X'.... I see a tunnel in front of me. It's
like a tunnel of smog or a tunnel of smoke. I'm going down.... I'm
going down it at a pretty fast speed.... I still see the color red, red, red,
red, red, red, red, red.... Ah, suddenly the sun.... The kind of cartoon
sun you see when you can see each pointy spike around the sphere.... I
stepped on a piece of glass and there's a bit of blood coming out of my
foot.... A lizard, with a big, big, big head...."
Target 148, Static: Three Unusual Planes. Three small aircraft flying in
formation. The planes are white and have swept-back wings; their
landing-gear is extended. A winding road is visible below.
Series: 104. Participant ID: 322. Rank = 2. z score = 0.39.
"...A jet plane.... A 747 on the way to Greece. Blue skies. Sounds like
it's going higher.... I think I'm back on the plane again. I never used
to be afraid of flying until recently.... They need better insulated jets,
soundproof like these rooms. They could use these comfortable seats,
too. And the leg room. The service isn't bad either.... Still can't get the
feeling of being in an airplane out of my mind. Flying over Greenland
and Iceland when I went to England.... Feels like we're going higher
and higher.... Descending. It seems we're descending.... Big airplanes
flying over with people like me staring down.... Flying around in a
piece of tin.... Feel like I'm getting a G-force. Maybe I am taking off.
Sure feels like it. Feels like we're going straight up.... I always feel like
when I'm on the plane going home, I just hope that plane makes it past
the Rocky Mountains...."
Target 10, Static: Santa and Coke. This is a Coca-Cola Christmas ad from
the 1950s, showing Santa Claus holding a Coke bottle in his left hand; three
buttons are visible on Santa's suit. Behind Santa and to his left, is a large
bottle cap with the Coca-Cola logo leaning against an ornamented Christmas
tree.
Series: 104. Participant ID: 332. Rank = 1. z score = 1.11.
"...There's a man with a dark beard and he's got a sharp face....
There's another man with a beard. Now there's green and white and
he's in bushes and he's sort of colonial. He looks like Robin Hood and
he's wearing a hat.... I can see him from behind. I can see his hat and
he has a sack over his shoulder.... Window ledge is looking down and
there's a billboard that says 'Coca-Cola' on it.... There's a snowman
again and it's got a carrot for a nose and three black buttons coming
down the front.... There's a white beard again. There's a man with a
white beard.... There's an old man with a beard...."
Target 70, Dynamic: Dancing in NY City Streets. From the film "The Wiz."
The span of a yellow-paved bridge over a body of water and automobile traffic
is visible in the opening scene; the New York City skyline is in the
background. A hot-air balloon flies overhead. The scene shifts as Dorothy (Diana
Ross), her dog Toto, the Lion, Tin Man, and Scarecrow dance along the
bridge; one of the bridge's supporting arches is behind them. The Chrysler
Building is in the background. At the end of the sequence, the characters
dance in front of a painted backdrop of an old-fashioned building.
Series: 105. Participant ID: 336. Rank = 1. z score = 1.40.
"Big colorful hot air balloons.... White brick wall.... Ocean.... People
walking before my eyes. Several people.... A dog. Hot air balloon....
...a nightclub singer.... Back of a woman's head, short curly hair....
Water.... Balloon, big balloon.... Yellow.... Very tall building.
Looking down at a city. Leaving a city, going up.... Faces. An arc....
Water.... A woman's face.... Cars, freeway.... A rock-n-roll star
chanting.... Architecture. A jester's geometrical figures, designs.
...Yellow chocolate bar. Water. Going down into water, deep down....
Man with long golden hair and sun glasses.... The Bay, San Francisco
Bay. A lion.... Highways.... Lion, see a lion.... Tornado....
Balloon.... Face mask.... City.... Leaning Tower of Pisa.... Long
hallway, doorway.... Long road. Long, long desert road...."
Target 22, Dynamic: Spiders. From the documentary, "Life on Earth." A
spider is weaving its web. The spider's long legs spring up and down
repeatedly, weaving strands of the web. The body of the spider is constantly in
motion, and bounces up and down. A close-up shows one of the veins of the
web being stretched out by the spider. Various views of the web.
Series: 301. Participant ID: 146. Rank = 2. z score = 0.65.
"...Now visual patterns more like a spider web and the color. And then
like the form of the veins of a windmill.... Something like a spider web
again. A spider web. A pattern that instead of a spider web it looks like
basket weaving.... An image of the way some children were able to do
something like flying when I was a child though I never had one. It was
a -- forgotten what it was called -- a pogo stick or a jump stick, something
in which you jumped up and down and you could hop quite a distance
by doing so.... I have kinesthetic images all over as in vigorous motion
expressed in flying or jumping on this sort of spring stick that I
mentioned.... Vigorous motion. It's as though I were trying to combine
relaxation with participating in an image of something very vigorous.... I
really feel carried away by these images of vigorous activity without
being able to localize this activity as to what...."
Target 108, Static: Two fire eaters. A young fire eater, in the foreground,
facing to the right of the picture, blows a huge flame out of his mouth. In
the background there is another fire eater. A group of people are watching
on the left side of the picture.
Series: 301. Participant ID: 146. Rank = 1. z score = 1.71.
"...I keep having images of flames now and then.... The sound
reminds me of flames too.... I find flames again.... In these new images
the fire takes on a very menacing meaning.... Rather mountainous
sticking up of bare rocks just as though they had come from a recently
formed volcano. Volcanos of course get back to the fire, extreme heat.
I had an image of a volcano with molten lava inside the crater. Molten
lava running down the side of the volcano.... Cold. Written out there
behind the visual field and thinking how it contrasts with my images of
flames. Although my images of flames didn't actually include much real
feeling of heat. I didn't have any imagery of heat in connection with the
flames. Just abstract thought of flames.... Now I think of the water as
a way of putting out flames. Suddenly, I was biting my lip. Biting my
lip as though lips had something to do with the imagery and I see lips
out in front of me.... And the lips I see are bright red, reminding me
of the flame imagery earlier. And then a bright heart such as Valentine's
candy in the shape of a heart. The cinnamon flavored candies that I
remember as a child having at Valentine's. Red color.... This red as in
the cinnamon candy is a deep very intense red. And similarly for the
flames. And now I see the word 'red'...."
Target 94, Dynamic: Hang Gliders. The sequence shows a skier on a
V-shaped hang glider. The skier flies high up above snow covered mountains
and a pine forest. At the end, the skier lands on a mountain slope and skis
away. The sequence is accompanied by Pachelbel's Canon.
Series: 301. Participant ID: 188. Rank = 1. z score = 1.26.
"...Some kind of 'V' shape, like an open book.... I get some
mountain.... Some kind of bird with a long wing.... The shape of an upside
down 'V'.... Ski, something about skiing came to me.... Some kind of
a body like an oval shape of a body with wings on top of it in a 'V'
shape. Another 'V' like a wing shape.... Something with wings....
Again the shape of an umbrella came into my mind. A butterfly
shape...."
Target 80, Dynamic: Bugs Bunny in Space. In this cartoon, there is a
close-up of the lower part of a cigar-shaped rocketship and the supports holding
it up. The rocket assembly slides over to the launching pad, directly above
Bugs Bunny's underground patch. The scene shifts to the underground
patch, as Bugs Bunny climbs up the ladder leading out of his patch.
Unknowingly, he climbs up through the interior of the rocketship. The rocket's
supports pull away and then it takes off into space. The rocket's nose cone
spins as Bugs Bunny appears through the top and he sees the Earth recede
rapidly in the distance. As the sequence ends, Bugs Bunny is hit in the belly
by a comet.
Series: 302. Participant ID: 292. Rank = 1. z score = 1.48.
"...Space craft.... The solar system. The underside of a helicopter or
a submarine or some kind of fish that you're seeing from
underneath.... Sort of being underneath it. Sort of being underneath.... A
very strange image like a cartoon character, animated character. With
his mouth open kind of.... Like a hypodermic needle or a candle or
this shaft like thing with a pointed top again.... Missiles
flying.... An aerial perspective.... I'm just kind of editing here I think.
I'm really hoping all this rocketship kind of imagery isn't because of the
noise. I feel like I'm in a rocketship or something.... That image of the
ship going into the belly of the mother ship...."
COMPARISON OF STUDY OUTCOMES WITH
GANZFELD META-ANALYSIS
In this section, we compare the automated ganzfeld study
outcomes with the results of earlier ganzfeld studies, summarized in a
meta-analysis (Honorton, 1985). We compare the two databases on
four dimensions: (1) overall success rate, (2) dynamic versus static
targets, (3) sender/receiver pairing, and (4) novice versus
experienced subjects.

TABLE 5
COMPARISON OF OVERALL PERFORMANCE IN AUTOMATED GANZFELD AND
META-ANALYSIS DATA SETS

Outcome variable   Database        N studies   Mean    SD      t     df     p
z scores           Meta-analysis      28       1.25   1.57   0.33    25   .748
                   Autoganzfeld       11       1.10   1.14
Effect sizes (h)   Meta-analysis      28        .28    .46   0.14    28   .892
                   Autoganzfeld       11        .29    .29

Note. The p values are two-tailed.

Overall Success Rate
To assess the consistency of results, we compare the 11
autoganzfeld series to the 28 studies in a meta-analysis of earlier
ganzfeld studies (Honorton, 1985, Table A1, p. 84), using direct hits as
the dependent variable. The outcomes of the two data sets are
consistent. Both display a predominance of positive outcomes: 23 of the
28 studies in the meta-analysis (82%) and 10 of the 11 autoganzfeld
series (91%) yield positive z scores. The mean autoganzfeld z scores
and effect sizes are very similar to those in the meta-analysis. (See
Table 5.)
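The t tests in Table 5 are described as using separate group variances for the error term and degrees of freedom. Assuming this corresponds to the familiar unequal-variance (Welch-Satterthwaite) form, the comparison can be sketched from the summary statistics alone; the function name is illustrative, and the assumption about the exact procedure is ours rather than something stated in the text.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t statistic and its approximate degrees of freedom."""
    v1 = sd1**2 / n1
    v2 = sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # z-score row of Table 5: meta-analysis (28 studies) vs. autoganzfeld (11 series).
    t, df = welch_t(1.25, 1.57, 28, 1.10, 1.14, 11)
    print(round(t, 2), round(df))
```

For the z-score row this gives a t of about 0.33 on roughly 25 df, in line with the tabled values. The effect-size row is more sensitive to rounding of the published means, so a sketch working from two-decimal summaries should not be expected to match it exactly.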
Combined Estimates of Ganzfeld Success Rate
Because the z scores and effect sizes for the automated ganzfeld
are consistent with the original set of 28 studies in the meta-analysis,
a better estimate of their true population values may be obtained by
combining them. Positive outcomes were obtained in 33 of the 39
studies (85%); the 95% CI is from 69% to 99%. Table 6 shows a
stem-and-leaf frequency plot of the z scores (Tukey, 1977). Unlike
other methods of displaying frequency distributions, the
stem-and-leaf plot retains the numerical data precisely. (Turned on its side,
the stem-and-leaf plot becomes a conventional histogram.) Each
number includes a stem and one or more leaves. For example, the
stem 1 is followed by leaves of 6,6,6,7,7,7, representing z scores of
1.6,1.6,1.6,1.7,1.7,1.7. In the display, the letter "H" identifies the
upper and lower hinges of the distribution, and "M" identifies its
median. The z's range from -1.97 to 4.02 (mean z = 1.21, SD =
1.45), and the 95% CI is a z from .76 to 1.66.

TABLE 6
DISTRIBUTION OF z SCORES

Stem     Leaf
 -1.     97
 -0.     85
 -0.     33
  0.   H 222224
  0.   M 6667777999
  1.     666777
  2.   H 011
  2.     8
  3.     01124
  3.     9
  4.     0

Minimum z               = -1.97
Lower hinge             =  0.25
Median z                =  0.92
Mean z                  =  1.21
Upper hinge             =  2.08
Maximum z               =  4.02
SD                      =  1.44
Skewness (g1)           =  0.05
Kurtosis (g2)           = -0.37
Combined (Stouffer) z   =  7.53

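For readers unfamiliar with the display, a stem-and-leaf plot of this kind can be generated with a few lines of Python. The sketch below is illustrative only: it truncates each value to one decimal place and uses one row per integer stem, whereas the published table splits some stems across two rows and marks hinges and median. The function name is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a one-decimal stem-and-leaf display as strings."""
    rows = defaultdict(list)
    for v in sorted(values):
        tenths = int(abs(v) * 10 + 1e-9)  # truncate to one decimal place
        stem, leaf = divmod(tenths, 10)
        negative = v < 0 and tenths > 0
        # Negative stems sort before positive ones, largest magnitude first.
        key = (0, -stem) if negative else (1, stem)
        rows[key].append(str(leaf))
    lines = []
    for key in sorted(rows):
        sign = "-" if key[0] == 0 else ""
        lines.append(f"{sign}{abs(key[1])} | {''.join(rows[key])}")
    return lines


if __name__ == "__main__":
    # The six z scores used as the worked example in the text.
    for row in stem_and_leaf([1.6, 1.6, 1.6, 1.7, 1.7, 1.7]):
        print(row)
```

Each digit to the right of the bar is one observation, so the length of a row shows the frequency while the digits preserve the values themselves.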
The combined z for the 39 studies is 7.53 (p = 9 x
Rosenthal's (1984) file-drawer statistic indicates that 778 additional
studies with z scores averaging zero would be required to reduce the
significance of the combined ganzfeld database to nonsignificance;
that is a ratio of 19 unknown studies for every known study.
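The combined z and the file-drawer estimate are linked by simple arithmetic. The sketch below assumes Stouffer's method in its unweighted form (sum of z scores divided by the square root of the number of studies) and Rosenthal's fail-safe formula with a one-tailed .05 criterion (z = 1.645); the function names are illustrative.

```python
import math


def stouffer_z(z_scores):
    """Unweighted Stouffer combination of independent z scores."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def failsafe_n(combined_z, k, z_crit=1.645):
    """Number of unreported null studies needed to bring the combined z
    for k studies down to the critical value z_crit."""
    sum_z = combined_z * math.sqrt(k)
    return sum_z**2 / z_crit**2 - k


if __name__ == "__main__":
    # Values from the text: 39 studies with a combined z of 7.53.
    print(int(failsafe_n(7.53, 39)))
```

With k = 39 and a combined z of 7.53, the formula yields a fail-safe number of 778, matching the figure reported above.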
A stem-and-leaf display of the effect sizes is shown in Table 7.
The effect sizes range from -.93 to 1.44 (mean h = .28, SD = .41).
The two most extreme values on both sides of the distribution are
outliers. The 95% CI is an h between .15 and .41; the equivalent hit
rate is from 31.5% to 44.5%.
Dynamic Versus Static Targets
The use of video sequences as targets is a novel feature of the
autoganzfeld database. However, a comparable difference in target
type exists in the earlier ganzfeld studies. Of the 28 direct hits
studies in the meta-analysis, 9 studies (by three independent
investigators) used View Master stereoscopic slide reels as targets
(Honorton, 1985, Studies 7-8, 16-19, 21, 38-39). Static targets
(single pictures or slides) were used in the remaining 19 studies by
seven independent investigators (Studies 1, 2, 4, 10-13, 23-31,
33-34, 41-42). Like the autoganzfeld video sequences, View Master
targets present a variety of images reinforcing a central target theme.
TABLE 7
DISTRIBUTION OF EFFECT SIZES (COHEN'S h)

Stem     Leaf
 -.9     3
 -.4     0
         OUTSIDE VALUES
 -.3     [illegible]
 -.1     0
 -.0     51
  .0     7779
  .1   H 002888
  .2   M 1334
  .3     11144777
  .4   H 01113
  .5     7
  .7     3
  .8     17
         OUTSIDE VALUES
 1.3     3
 1.4     4

Minimum h        = -0.93
Lower hinge      =  0.10
Median h         =  [illegible]
Mean h           =  0.28
Upper hinge      =  [illegible]
Maximum h        =  1.44
SD               =  0.41
Skewness (g1)    =  0.2[illegible]
Kurtosis (g2)    =  2.49

To compare the relative impact of dynamic and static targets in
the autoganzfeld and meta-analysis, we obtained point-biserial
correlations for each data set using target type (static or dynamic) as
the predictor variable and the series effect size, Cohen's h, as the
outcome variable. We test the difference between the two correlations
using Cohen's q (Cohen, 1977). Dynamic targets yield significantly
larger effect sizes in both data sets. For the meta-analysis, rp
is .409, t(26) = 2.28, p = .015; and for the autoganzfeld, as
reported above, rp is .663. The two correlations are not significantly
different (q = .36; z = 1.14). Therefore, we combine the two data
sets to obtain a better estimate of the relationship between effect size
and target type: rp = .439, t(45) = 3.28, p = .002. The 95% CIs
are 24% to 36% for static targets and 38% to 55% for dynamic targets.
Thus, the cumulative evidence strongly indicates that dynamic
targets are more accurately retrieved than static targets.
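The t statistic for a correlation and Cohen's q for the difference between two correlations can be checked with a short sketch. The formulas are the standard ones (t from r and its degrees of freedom; q as a difference of Fisher z-transformed correlations). The sample sizes passed to `z_for_q` are an inference, not stated in the text: 28 meta-analysis studies, and 19 autoganzfeld units implied by the combined t(45).

```python
import math


def t_from_r(r, df):
    """t statistic for a correlation r with df degrees of freedom."""
    return r * math.sqrt(df) / math.sqrt(1 - r * r)


def cohen_q(r1, r2):
    """Cohen's q: difference between Fisher z-transformed correlations."""
    return math.atanh(r1) - math.atanh(r2)


def z_for_q(q, n1, n2):
    """Normal deviate for q, using the usual 1/(n - 3) variance terms."""
    return q / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))


q = cohen_q(0.663, 0.409)
print(f"t(26) = {t_from_r(0.409, 26):.2f}")
print(f"q = {q:.2f}, z = {z_for_q(q, 28, 19):.2f}")  # n = 19 is inferred
print(f"t(45) = {t_from_r(0.439, 45):.2f}")
```

Because the published correlations are rounded to three places, recomputed values can differ from the printed ones in the last digit.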
130 The Journal of Parapsychology
Sender/Receiver Pairing
A similar analysis compares the effects of sender/receiver pairing
in the two databases. Studies in the meta-analysis did not routinely
provide detailed breakdowns regarding sender/receiver pairing.
Sender/receiver pairing in the meta-analysis can only be coded
according to whether subjects could bring friends to serve as their
sender or were restricted to laboratory senders. In 17 studies, by six
independent investigators, subjects were free to bring friends
(Honorton, 1985, Studies 1-2, 4, 7-8, 16, 23-28, 30, 33-34, 38-39).
Laboratory-assigned senders were used exclusively in the remaining
8 studies, by four independent investigators (Studies 10-12,
18-19, 21, 29, 41). (Three studies using clairvoyance procedures
and no senders are excluded from this analysis.) For the
autoganzfeld studies, we calculated separate effect sizes for each series
by sender type (combining lab friend and friend for comparability
with the meta-analysis). In the meta-analysis, rp(23) is .403; larger
effect sizes occurred in studies where friends could serve as sender
(t = 2.11, p = .023). For the autoganzfeld, as reported above, rp is
.363, in the same direction. The two correlations are very similar (q
= .05; z = 0.14) and are combined to give a better estimate of the
relationship between sender/receiver pairing and ganzfeld study
outcome: rp = .38, t(42) = 2.66, p = .0055. The 95% CIs are 20%
to 34% for unacquainted sender/receiver pairs and 31.1% to 49.2%
for friends. Thus, the sender/receiver relationship does have a
significant impact on performance.
Effect of Prior Ganzfeld Experience
The meta-analysis includes 14 studies, by nine independent in-
vestigators, in which novices are used exclusively (Honorton, 1985,
Studies 2, 4, 8, 10-12, 16-18, 23-24, 31, 41-42). Experienced or
mixed samples of novice and experienced subjects are used in the
remaining 14 studies, by four different investigators (Studies 1, 7,
19, 21, 25-30, 33-34, 38-39). Studies using experienced subjects
were more successful than those limited to novices; the point-biserial
correlation between level of experience and effect size is .229, t (26)
= 1.20, p = .12. For the autoganzfeld studies, as reported above,
rp is .078. The two correlations do not differ significantly (q = .155;
z = 0.40), and the combined rp is .194, t (38) = 1.22, p = .105. The
respective 95% CIs are 24.5% to 44.5% for novices and 35.5% to
48% for experienced subjects.
The 95% CIs for these comparative analyses are shown graphically
in Figure 2. The bottom two rows are CIs for the overall hit
rates in the meta-analysis and autoganzfeld, respectively. The next
two rows give the CIs for dynamic targets in the two data sets, and
so on.

[Figure 2: confidence limits plotted by data set and condition
(vertical axis) against effect size (h) (horizontal axis, -1.0 to 1.0).]
Figure 2. Comparison of autoganzfeld and meta-analysis 95% confidence
limits. Abbreviations are defined as follows: Meta = meta-analysis studies,
Auto = automated ganzfeld studies, Dyn = dynamic targets, Sta = static
targets, Lab = laboratory senders, Fr = sender is friend or acquaintance
of receiver, Novice = no prior ganzfeld experience, Exper = prior
ganzfeld experience.
DISCUSSION
We now consider various rival hypotheses that might account for
the experimental outcomes, and the degree to which the automated
ganzfeld experiments, viewed in conjunction with the earlier psi
ganzfeld studies, constitute evidence for psi communication. Finally,
we consider directions for future research suggested by these find-
ings.
Rival Hypotheses
Sensory Cues. Only Se knows the identity of the target until R
finishes the automated judging procedure. If Se is not a PRL staff
member, a staff member not otherwise involved in the session
supervises target selection. In either case, the target selector knows
only which videocassette contains the target. The target selector
leaves the monitoring room with the remaining three target tapes
after knocking three times on the monitoring room door, signalling
E to return. Since the target selector only knows the videocassette
number, variations in knocking cannot communicate any useful
information to E. The cardboard cover over the VCR eliminates any
visual cues to E regarding the position of the videotape or the
activity of the VU meters (which are active when the target is dynamic
and has a soundtrack).
Sensory transmission from Se to R during the ganzfeld session is
eliminated by having R and Se in separate, sound-attenuated rooms.
If either participant leaves their room before R's ratings have been
registered in the computer, the session is unconditionally aborted.
The videotape target display system prevents potential handling
cues during the judging procedure. Computer registration of R's
target ratings and automated feedback after the session prevents the
possibility of cheating by Se during feedback, raised by Hyman
(1985).
After about 80% of the sessions were completed, it was becoming
clear that our hypothesis concerning the superiority of dynamic
targets over static targets was receiving substantial confirmation.
Because dynamic targets contain auditory as well as visual information,
we conducted a supplementary test to assess the possibility of
auditory leakage from the VCR soundtrack to R. With the VCR audio
set to normal amplification, no auditory signal could be detected
through R's headphones, with or without white noise. When an
external amplifier was added between the VCR and R's headphones
and with the white noise turned completely off, the soundtrack
could sometimes be faintly detected. It is unlikely that subjects could
have detected any target audio signal with the normal VCR
amplification and white noise; as we have reported, there is no correlation
between ganzfeld success rate and white noise level in these
experiments. Nevertheless, to totally exclude any possibility of subliminal
cueing, we modified the equipment. Additional testing confirmed
that this modification effectively eliminated all leakage. This was
formally confirmed by an audio spectrum analysis, covering the
frequency domain between 475 Hz and 15.2 kHz. The critical question,
of course, is whether performance on dynamic targets diminished
after this modification. The answer is no; in fact, performance
improved. Before the modification, the direct hit rate on dynamic
targets was 38% (150 trials, 57 hits, h = .28, exact binomial p =
.00029, z = 3.44); the 95% CI was from 31% to 45%. Following the
modification, the direct hit rate was 50% (40 trials, 20 hits, h = .52,
exact binomial p = .00057, z = 3.25) with a 95% CI from 37% to
63%. The direct hit rate for all targets, static and dynamic, after
the modification was 44% (64 trials, 28 hits, h = .39, exact binomial
p = .00082, z = 3.15).
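An exact binomial tail probability of the kind quoted above can be computed directly from the binomial distribution. The sketch below is illustrative only; it assumes a chance hit probability of .25 and a one-tailed test, and `binom_tail` is a hypothetical helper name.

```python
from math import comb


def binom_tail(hits, trials, p=0.25):
    """One-tailed exact probability P(X >= hits), X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Hit counts reported before and after the equipment modification.
for hits, trials in ((57, 150), (20, 40)):
    print(f"{hits}/{trials}: exact one-tailed p = {binom_tail(hits, trials):.5f}")
```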
Randomization. As Hyman and Honorton (1986, p. 357) have
pointed out, "Because ganzfeld experiments involve only one target
selection per session..., the ganzfeld investigator can restrict his or
her attention to a frequency analysis allowing assessment of the de-
gree to which targets occur with equal probability." We have
documented both the general adequacy of the RNG used for target
selection and its proper functioning during the experiment.
Data selection. Except for two pilot studies, the number of
participants and trials were specified in advance for each series. The pilot
or formal status of each series was similarly specified in advance and
recorded on disk before beginning the series. We have reported all
trials, including pilot and ongoing series, using the automated
ganzfeld system. Thus, there is no "file-drawer" problem in this database.
Psi ganzfeld success rate is similar for pilot and formal sessions.
The proportion of hits for the 66 pilot sessions is .32 (h = .16, p =
.129, z = 1.13). For the 289 formal sessions, the proportion correct
is .35 (h = .22, p = .0001, z = 3.71). The difference is not
significant: χ² = 0.11, 1 df, p = .734.
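The pilot-versus-formal comparison can be approximated with a 2 x 2 chi-square test. The sketch below is a hedged reconstruction: the hit counts (21 of 66 and 101 of 289) are inferred by rounding the reported proportions and are not given in the text, and the use of a continuity (Yates) correction is an assumption.

```python
import math


def chi2_2x2_yates(a, b, c, d):
    """Yates-corrected chi-square (1 df) and p value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Continuity correction, clamped so the corrected difference is never negative.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * diff**2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df, via the normal distribution.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: pilot, formal; columns: hits, misses (counts inferred, see above).
chi2, p = chi2_2x2_yates(21, 45, 101, 188)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

Under these assumptions the result agrees with the reported values to the precision printed in the text.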
If we assume that the remaining trials in the three unfinished
series would yield only chance results, these series would still be sta-
tistically significant (exact binomial p = .009, z = 2.36). This would
reduce the overall z for all 11 series from 3.89 to 3.61. Thus, inclusion
of the three incomplete studies does not pose an optional stopping
problem.
Multiple analysis. Informal examination of recent issues of several
American Psychological Association journals suggests that correction
for multiple comparisons is not a common practice in more
conventional areas of psychological inquiry. Nevertheless, half of Hyman's
(1985) 50-page critique of earlier psi ganzfeld research focused on
issues related to multiple testing. In the present case, advance
specification of the primary hypothesis and method of analysis prevents
problems involving multiple analysis or multiple indices in our test
of the overall psi ganzfeld effect. Our direct hits analysis is actually
less significant than either the sum of ranks method (z = 4.01, p =
2.7 x 10^-5) or Stanford's z scores (t = 4.53, 354 df, p = 4.1 x
10^-6).
In addition to the primary hypothesis, however, we also tested
two secondary hypotheses concerning the impact of target type and
sender/receiver pairing on psi performance, and we have presented
several purely exploratory analyses as well. Our Results section
includes 15 significance tests involving psi performance as the
dependent variable, and the p values cited are not adjusted for multiple
comparisons. Of the 15 significance tests, 9 are associated with p <
.05. The Bonferroni multiple comparisons procedure provides a
conservative method of adjusting the alpha level when several
simultaneous tests of significance are performed (Holland &
Copenhaver, 1988; Hyman & Honorton, 1986; Rosenthal & Rubin, 1984).
When the Bonferroni adjustment is applied, six of the nine
individually significant outcomes remain significant; these are: the overall
hit rate, the subject-based analysis using Stanford z scores, the
difference between dynamic and static targets, the dynamic target hit
rate, and the hit rate for experienced subjects.
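The Bonferroni adjustment described above amounts to dividing the alpha level by the number of tests. The sketch below assumes alpha = .05 and 15 tests, as in the text; the three p values are taken from elsewhere in this article purely as illustration and are not the authors' list of 15 tests.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni procedure."""
    return alpha / n_tests


def survives(p_value, alpha=0.05, n_tests=15):
    """True if a single p value remains significant after adjustment."""
    return p_value < bonferroni_alpha(alpha, n_tests)


# Illustrative p values only (hypothetical selection, see lead-in).
examples = {
    "formal sessions hit rate": 0.0001,
    "target type (combined)": 0.002,
    "sender pairing (combined)": 0.0055,
}
print(f"adjusted alpha = {bonferroni_alpha(0.05, 15):.4f}")
for label, p in examples.items():
    print(f"{label}: p = {p} -> significant after adjustment: {survives(p)}")
```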
Although the relationship between psi performance and sender
type is not independently significant in the autoganzfeld, the cor-
relation coefficient of .363 is close to that observed in the meta-
analysis (r = .403), and the combined result is significant. The cumulative
evidence, therefore, does support the conclusion that the
sender/receiver relationship is a significant moderator of ganzfeld
psi performance.
Security. Given the large number of subjects and the significance
of the outcome using subjects as the unit of analysis, subject decep-
tion is not a plausible explanation. The automated ganzfeld protocol
has been examined by several dozen parapsychologists and behav-
ioral researchers from other fields, including well-known critics of
parapsychology. Many have participated as subjects, senders, or ob-
servers. All have expressed satisfaction with our handling of security
issues and controls.
In addition, two experts on the simulation of psi ability have
examined the autoganzfeld system and protocol. Ford Kross has been
a professional mentalist for over 20 years. He is the author of many
articles in mentalist periodicals and has served as Secretary/Treasurer
of the Psychic Entertainers Association. Mr. Kross has provided
us with the following statement: "In my professional capacity as a
mentalist, I have reviewed Psychophysical Research Laboratories'
automated ganzfeld system and found it to provide excellent
security against deception by subjects" (personal communication, May,
1989). We have received similar comments from Daryl Bem,
Professor of Psychology at Cornell University. Professor Bem is well
known for his research in social and personality psychology. He is
also a member of the Psychic Entertainers Association and has
performed for many years as a mentalist. He visited PRL for several
days and was a subject in Series 101.
The issue of investigator integrity can only be conclusively
addressed through independent replications. It is, however, worth
drawing attention to the 13 sessions in which a visiting scientist,
Marilyn J. Schlitz, served as either experimenter (N = 7, 29% hits,
h = .08) or sender (N = 6, 67% hits, h = .86). Altogether, these
sessions yielded 6 direct hits (N = 13, 46.2% hits, h = .45). This
effect size is more than twice as large as that for the database as a
whole.
Status of the Evidence for Psi Communication in the Ganzfeld
The automated ganzfeld studies satisfy the methodological
guidelines recommended by Hyman and Honorton (1986). The
results are statistically significant. The effect size is homogeneous
across 11 experimental series and eight different experimenters.
Moreover, the autoganzfeld results are consistent with the outcomes
of the earlier, nonautomated ganzfeld studies; the combined z of
7.53 would be expected to arise by chance less than one time in 9
trillion.
We have shown that, contrary to the assertions of certain critics
(Druckman & Swets, 1988, p. 175), the ganzfeld psi effect exhibits
"consistent and lawful patterns of covariation found in other areas
of inquiry." The automated ganzfeld studies display the same
patterns of relationships between psi performance and target type,
sender/receiver acquaintance, and prior testing experience found in
earlier ganzfeld studies, and the magnitude of these relationships is
consistent across the two data sets. The impact of target type and
sender/receiver acquaintance is also consistent with patterns in
spontaneous case studies, linking ostensible psi experiences to
emotionally significant events and persons. These findings cannot be
explained by conventional theories of coincidence (Diaconis &
Mosteller, 1989).
Hyman and Honorton (1986) have stated,
...the best way to resolve the [ganzfeld] controversy. ... is to await the
outcome of future ganzfeld experiments. These experiments, ideally,
will be carried out in such a way as to circumvent the file-drawer prob-
lem, problems of multiple analysis, and the various defects in
randomization, statistical application, and documentation pointed out by
Hyman. If a variety of parapsychologists and other investigators con-
tinue to obtain significant results under these conditions, then the exis-
tence of a genuine communications anomaly will have been demon-
strated. (pp. 353-354)
We have presented a series of experiments that satisfy these
guidelines. Although no single investigator or laboratory can satisfy
the requirement of independent replication, the automated ganzfeld
studies are quite consistent with the earlier studies. On the basis of
the cumulative evidence, we conclude that the ganzfeld effect rep-
resents a genuine communications anomaly. This conclusion will
either be strengthened or weakened by additional independent rep-
lications, but there is no longer any justification for the claim made
by some critics that the existing evidence does not warrant serious
attention by the scientific community.
Recommendations for Future Research
Recent psi ganzfeld research has necessarily focused on meth-
odological issues arising from the ganzfeld controversy. It is essen-
tial that future studies comply with the methodological standards
agreed [remainder of sentence illegible]. It is imperative that
serious attention be given to conditions associated with successful
outcomes.
Small to medium effect sizes characterize many research findings
in the biomedical and social sciences (e.g., Cohen, 1977; Rosenthal,
1984). Rosenthal (1986) and Utts (1986) make a strong case for
more careful consideration of the magnitude of effect in the design
and analysis of future ganzfeld studies. The automated ganzfeld
studies show a success rate slightly in excess of 34%. Utts's (1986)
power analysis shows that for an effect of this size, the investigator
has only about one chance in three of obtaining a statistically
significant result in a 50-trial experiment. Even with 100 trials (an
unusually large sample size in ganzfeld research), the probability of a
significant outcome is only about .5.
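The kind of power calculation referred to above can be sketched with an exact binomial test. This is not Utts's (1986) analysis; it is a minimal illustration assuming a one-tailed test at alpha = .05, a chance rate of .25, and a true hit rate of .34. Because the result depends on those assumptions and on the discreteness of the binomial distribution, the output need not match the rounded figures quoted in the text.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_hits(n, p0=0.25, alpha=0.05):
    """Smallest hit count whose one-tailed p value under chance is <= alpha."""
    for k in range(n + 1):
        if binom_tail(k, n, p0) <= alpha:
            return k
    return n + 1  # no attainable hit count reaches significance


def power(n, p_true, p0=0.25, alpha=0.05):
    """Probability of a significant outcome when the true hit rate is p_true."""
    return binom_tail(critical_hits(n, p0, alpha), n, p_true)


for n in (50, 100):
    print(f"{n} trials: power = {power(n, 0.34):.2f}")
```

The same function can be called with other assumed hit rates to see how strongly the chance of a significant outcome depends on effect size and sample size.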
We urge ganzfeld investigators to use dynamic targets and to
design their studies to allow subjects to have the option to have friends
or acquaintances as their senders. The similarity of the autoganzfeld
and meta-analysis data sets strongly indicates that these factors are
important moderators of psi ganzfeld performance. If our estimate
of the impact of dynamic and static targets is accurate, a 50-session
series using dynamic targets has approximately an 84% chance of
yielding a significant outcome. A comparable series with static
targets has only about one chance in five of achieving significance.
REFERENCES
ALCOCK, J. E. (1986). Comments on the Hyman-Honorton ganzfeld
controversy. Journal of Parapsychology, 50, 345-348.
AKERS, C. (1984). Methodological criticisms of parapsychology. In S. Krippner
(Ed.), Advances in parapsychological research, Vol. 4 (pp. 112-164).
Jefferson, NC: McFarland.
BERGER, R. E., & HONORTON, C. (1986). An automated psi ganzfeld testing
system. In D. H. Weiner & D. I. Radin (Eds.), Research in parapsychology
1985 (pp. 85-88). Metuchen, NJ: Scarecrow Press.
BLACKMORE, S. (1980). The extent of selective reporting of ESP ganzfeld
studies. European Journal of Parapsychology, 3, 213-219.
BLACKMORE, S. (1987). A report of a visit to Carl Sargent's laboratory. Journal
of the Society for Psychical Research, 54, 186-198.
BRAUD, W. G. (1978). Psi conducive conditions: Explorations and
interpretations. In B. Shapin & L. Coly (Eds.), Psi and states of awareness
(pp. 1-34). New York: Parapsychology Foundation, Inc.
BRAUD, W. G., WOOD, R., & BRAUD, L. W. (1975). Free-response GESP
performance during an experimental hypnagogic state induced by visual
and acoustic ganzfeld techniques: A replication and extension. Journal of
the American Society for Psychical Research, 69, 105-113.
BRIGGS, K. C., & MYERS, I. B. (1957). Myers-Briggs Type Indicator Form F. Palo
Alto, CA: Consulting Psychologists Press, Inc.
BROWNLEE, K. A. (1965). Statistical theory and methodology in science and
engineering. New York: John Wiley & Sons, Inc.
CHILD, I. L. (1986). Comments on the ganzfeld controversy. Journal of
Parapsychology, 50, 337-344.
COHEN, J. (1977). Statistical power analysis for the behavioral sciences (rev. ed.).
New York: Academic Press.
DIACONIS, P., & MOSTELLER, F. (1989). Methods for studying coincidences.
Journal of the American Statistical Association, 84, 853-861.
DRUCKMAN, D., & SWETS, J. (1988). Enhancing human performance: Issues,
theories, and techniques. Washington, DC: National Academy Press.
HARLEY, T., & MATTHEWS, G. (1987). Cheating, psi, and the appliance of
science: A reply to Blackmore. Journal of the Society for Psychical Research,
54, 199-207.
Volume 19, Number 12, December 1989
Plenum Press · New York-London
This issue completes Volume 19
FNDPA4 19(12) 1441-1538 (1989)
ISSN 0015-9018
FOUNDATIONS
OF PHYSICS
An International Journal Devoted to the Conceptual Bases and
Fundamental Theories of Modern Physics, Biophysics, and Cosmology
Editor:
Alwyn van der Merwe
Editorial Board
Asim 0. Barut
Peter G. Bergmann
Nikolai N. Bogolubov
David Bohm
Robert S. Cohen
Olivier Costa de Beauregard
Robert H. Dicke
Hao
Max Jammer
Brian D. Josephson
R. Bruce Lindsay
Per-Olov Löwdin
Henry Margenau
Jagdish Mehra
André Mercier
Louis Néel
Kazuhiko Nishijima
James L. Park
Linus Pauling
Rudolph Peierls
Karl R. Popper
Ilya Prigogine
Abdus Salam
John L. Synge
Hans-J. Treder
Jean-Pierre Vigier
Mikhail Vol'kenshtein
Carl Friedrich von Weizsäcker
Eugene P. Wigner
Chen-Ning Yang
Founding Editors: Henry Margenau and Wolfgang Yourgrau
Foundations of Physics, Vol. 19, No. 12, 1989
Evidence for Consciousness-Related Anomalies in
Random Physical Systems
Dean I. Radin1 and Roger D. Nelson2
Received May 6, 1988; revised June 12, 1989
Speculations about the role of consciousness in physical systems are frequently
observed in the literature concerned with the interpretation of quantum mechanics.
While only three experimental investigations can be found on this topic in physics
journals, more than 800 relevant experiments have been reported in the literature
of parapsychology. A well-defined body of empirical evidence from this domain
was reviewed using meta-analytic techniques to assess methodological quality and
overall effect size. Results showed effects conforming to chance expectation in
control conditions and unequivocal non-chance effects in experimental conditions.
This quantitative literature review agrees with the findings of two earlier reviews,
suggesting the existence of some form of consciousness-related anomaly in random
physical systems.
1. INTRODUCTION
The nature of the relationship between human consciousness and the
physical world has intrigued philosophers for millennia. In this century,
speculations about mind-body interactions persist, often contributed by
physicists in discussions of the measurement problem in quantum mechanics.
Virtually all of the founders of quantum theory (Planck, de Broglie,
Heisenberg, Schrödinger, Einstein) considered this subject in depth,(1) and
contemporary physicists continue this tradition.(2-7)
1 Department of Psychology, Princeton University, Princeton, New Jersey 08544. Present
address: Contel Technology Center, 15000 Conference Center Drive, P.O. Box 10814,
Chantilly, Virginia 22021-3808.
2 Department of Mechanical and Aerospace Engineering, Princeton University, Princeton,
New Jersey 08544.
0015-9018/89/1200-1499$06.00/0 © 1989 Plenum Publishing Corporation
1500 Radin and Nelson
The following expression of the problem can be found in a recent
interpretation of quantum theory:
If conscious choice can decide what particular observation I measure, and there-
fore into what states my consciousness splits, might not conscious choice also
be able to influence the outcome of the measurement? One possible place where
mind may influence matter is in quantum effects. Experiments on whether it is
possible to affect the decay rates of nuclei by thinking suitable thoughts would
presumably be easy to perform, and might be worth doing.(8)
Given the distinguished history of speculations about the role of
consciousness in quantum mechanics, one might expect that the physics
literature would contain a sizable body of empirical data on this topic. A
search, however, reveals only three studies.
The first is in an article by Hall, Kim, McElroy, and Shimony, who
reported an experiment "based upon taking seriously the proposal that the
reduction of the wave packet is due to a mind-body interaction, in which
both of the interacting systems are changed."(9) This experiment examined
whether one person could detect if another person had previously observed
a quantum mechanical event (gamma emission from sodium-22 atoms).
The idea was based on the supposition that if person A's observation
actually changes the physical state of a system, then when person B obser-
ves the same system later, B's experience may be different according to
whether A has or has not looked at the system. Hall et al.'s results, based
on a total of 554 trials, did not support the hypothesis; the observed
number of "hits" obtained in their experiment was precisely the number
expected by chance (277), while the variance of their measurements was
significantly smaller than expected (p<
The second study is referred to by Hall et al., who end their article by
pointing out that a similar, unpublished experiment using cobalt-57 as the
source was successful (40 hits out of 67 trials).(10)
The third study is a more systematic investigation reported by
Jahn and Dunne,(11) who summarize results of over 25 million binary
trials collected during seven years of experimentation with random-event
generators. These experiments, involving long-term data collection with
33 unselected individuals, provide persuasive, replicable evidence of an
anomalous correlation between conscious intention and the output of
random number generators.
Thus, of three pertinent experiments referenced in mainstream physics
journals, one describes results statistically too close to chance expectation
and two describe positive effects.(9-11) Given the theoretical implications of
such an effect, it is remarkable that no further experiments of this type can
be found in the physics literature; but this is not to say that no such
experiments have been performed. In fact, dozens of researchers have
Consciousness in Physical Systems
1501
reported conceptually identical experiments in the puzzling and uncertain
domain of parapsychology. Perhaps because of the insular nature of
scientific disciplines, the vast majority of these experiments are unknown
to most scientists. A few critics who have considered this literature have
dismissed the experiments as being flawed, nonreplicable, or open
to fraud,(12-16) but their assertions are countered by at least two
detailed reviews which provide strong statistical support for the existence
of anomalous consciousness-related effects with random number
generators.(17,18) In this paper, we describe the results of a comprehensive,
quantitative meta-analysis which focused on the questions of
methodological quality and replicability in these experiments.
2. THE EXPERIMENTS
The experiments involved some form of microelectronic random
number generator (RNG), a human observer, and a set of instructions for
the observer to attempt to "influence" the RNG to generate particular
numbers, or changes in a distribution, solely by intention. RNGs are
usually based upon a source of truly random events such as electronic
noise, radioactive decay, or randomly seeded pseudorandom sequences.(19)
Feedback about the distribution of random events is often provided in the
form of a digital display, but audio feedback, computer graphics, and a
variety of other mechanisms have also been used. Some of the RNGs
described in the literature are technically sophisticated, the best devices
employing electromagnetic shielding, environmental failsafe mechanisms
triggered by deviant voltages, currents, or temperature, automatic
computer-based data recording on magnetic media, redundant hard copy
output, periodic randomness calibrations, and so on.(18-20)
RNGs are typically designed to produce a sequence of random bits at
the press of a button. After generating a sequence of, say, 100 random bits
(0's or 1's), the number of 1's in the sequence may be provided as feedback.
In an experimental protocol using a binary RNG, a run might consist of
an observer being asked to cause the RNG to produce, in three successive
button presses, a high number (sum of 1's greater than chance expectation
of 50), a low number (less than 50), and a control condition with no
directional intention. An experiment might consist of a group of individuals
each contributing a hundred such runs, or one individual contributing
several thousand runs. Results are usually analyzed by comparing high
aim and low aim means against a control mean or theoretical chance
expectation.
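The protocol described above can be sketched in a few lines. This is an illustration under stated assumptions, not a model of any device in the reviewed literature: a seeded software pseudorandom generator stands in for a hardware RNG, each button press yields 100 bits, and the function names are hypothetical.

```python
import math
import random


def button_press(rng, n_bits=100):
    """One trial: the number of 1's among n_bits random bits."""
    return sum(rng.getrandbits(1) for _ in range(n_bits))


def z_score(total_ones, total_bits, p=0.5):
    """Standard normal deviate of the observed count against chance."""
    expected = total_bits * p
    sd = math.sqrt(total_bits * p * (1 - p))
    return (total_ones - expected) / sd


rng = random.Random(12345)  # seeded software generator (assumption)
counts = [button_press(rng) for _ in range(1000)]  # e.g. a control series
print(f"mean count per trial = {sum(counts) / len(counts):.2f}")
print(f"Z against chance = {z_score(sum(counts), 100 * len(counts)):.2f}")
```

With no intention involved, the mean count should sit near the chance expectation of 50, which is the baseline against which high-aim and low-aim series are compared.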
3. META-ANALYTIC PROCEDURES
The quantitative literature review, also called meta-analysis, has
become a valuable tool in the behavioral and social sciences.(21)
Meta-analysis is analogous to well-established procedures used in the
physical sciences to determine parameters and constants. The technique
assesses replication of an effect within a body of studies by examining the
distribution of effect sizes.(22-24) In the present context, the null hypothesis
(no mental influence on the RNG output) specifies an expected mean effect
size of zero. A homogeneous distribution of effect sizes with nonzero mean
indicates replication of an effect, and the size of the deviation of the mean
from its expected value estimates the magnitude of the effect.
Meta-analyses assume that effects being compared are similar across
different experiments, that is, that all studies seek to estimate the same pop-
ulation parameters. Thus the scope of a quantitative review must be strictly
delimited to ensure appropriate commonality across the different studies
that are combined.(21,25) This can present a nontrivial problem in meta-
analytic reviews because replication studies typically investigate a number
of variables in addition to those studied in the original experiments. In the
present case, because different subjects, experimental protocols, and RNGs
were employed within the reviewed literature, some heterogeneity
attributable to these factors was expected in the obtained distribution of
effect sizes. However, the circumscription for the review required that every
study in the database have the same primary goal or hypothesis, and hence
estimate the same underlying effect.
Experiments selected for review examined the following hypothesis:
The statistical output of an electronic RNG is correlated with observer
intention in accordance with prespecified instructions, as indicated by
the directional shift of distribution parameters (usually the mean) from
expected values.
Because this "directional shift" is most often reported as a standard
normal deviate (i.e., Z score) in the reviewed experiments, we determined
effect size as a Z score normalized by the square root of the sample size
(N), $e = Z/\sqrt{N}$, where N was the total number of individual random events
(with probability of a hit at p = 0.5, p = 0.25, etc.). This effect size measure
is equivalent to a Pearson product moment correlation.(')
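The effect size defined above is a one-line computation. In the sketch below, the Z score and sample size are hypothetical values chosen only to show the scale of the result.

```python
import math


def effect_size(z, n_events):
    """Effect size e = Z / sqrt(N), with N the number of random events."""
    return z / math.sqrt(n_events)


# Hypothetical example: Z = 3.0 accumulated over one million binary events.
print(f"e = {effect_size(3.0, 1_000_000):.4f}")
```

The example shows why a statistically reliable Z can correspond to a very small effect per event when N is large.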
3.1. Unit of Analysis
To avoid redundant inclusion of data in a meta-analysis, "units of
analysis" are often specified. We employed the following method: If
an author distinguished among several experiments reported in a single
article with titles such as "pilot test" or "confirmatory test," or provided
independent statistical summaries, each of these studies was coded and
quality-assessed separately. If an experiment consisted of two or more
conditions comparing different intentions or types of RNG devices, the
data were split into separate units of analysis to allow the results to be
coded unambiguously. In general, within a given reviewed report, the
largest possible aggregation of nonoverlapping data collected under a
single intentional aim was defined as the unit of analysis (hereafter called
an experiment or study).
For each experiment, a Z score was assigned corresponding to
whether the observed result matched the direction of intention. Thus, a
negative Z obtained under intention to "aim low" was recorded as a
positive score. When sufficient data were provided in a report, Z was
calculated from those data and compared with the reported results; the
new calculation was used if there was a discrepancy. If only probability
levels were reported, these were transformed into the corresponding Z
score. For experiments reported only as "nonsignificant," a conservative
value of Z = 0 was assigned; if the outcome was reported only as
"statistically significant," Z = 1.645 was assigned; and if sample size was
not reported or could not be calculated from the information provided, a
special code of N = 1 was assigned.
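The coding rules above, together with the effect size e = Z divided by the square root of N, can be sketched in a few lines of Python. This is only an illustration: the function names, the `aim` and `label` arguments, and the assumption that reported probability levels are one-tailed are ours, not the authors'.

```python
from math import sqrt
from statistics import NormalDist


def coded_z(z=None, p_one_tailed=None, label=None, aim="high"):
    """Return the Z score coded for one experiment (hypothetical helper).

    z            -- Z as reported or recalculated, in its raw direction
    p_one_tailed -- probability level, ASSUMED one-tailed in the intended direction
    label        -- "nonsignificant" or "statistically significant"
    aim          -- "high" or "low"; a negative Z under "aim low" becomes positive
    """
    if z is not None:
        return -z if aim == "low" else z
    if p_one_tailed is not None:
        return NormalDist().inv_cdf(1.0 - p_one_tailed)
    if label == "nonsignificant":
        return 0.0  # conservative value from the text
    if label == "statistically significant":
        return 1.645  # value assigned in the text
    raise ValueError("not enough information to assign a Z score")


def effect_size(z, n=None):
    """Effect size e = Z / sqrt(N); N = 1 is the special code for unknown N."""
    n = 1 if n is None else n
    return z / sqrt(n)


if __name__ == "__main__":
    z = coded_z(z=-1.5, aim="low")  # result in the intended direction
    print(z, effect_size(z, 10_000))
```

Note that the special code N = 1 leaves the effect size equal to Z itself, so such studies should be handled with care in any weighted average.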
3.2. Assessing Quality
Because the hypothesized anomalous effect is not easily accommodated
within the prevailing scientific world-view, it is particularly
important to assess the trustworthiness of each reviewed experiment.
Unfortunately, estimating experimental quality tends to be a subjective
task confounded by prior expectations and beliefs.(26,27) Estimates of
inter-judge reliability in assessing the quality of research reports, for example,
rarely exceed correlations of 0.5.(28) We addressed this problem by
assigning to each experiment a single quality weight derived from a set of
sixteen binary (present/absent) criteria. The first author coded and
double-checked the coding for all studies; the second author independently
coded the first 100 studies. Inter-judge reliability for quality criteria was
r = 0.802 with 98 degrees of freedom.
These criteria were developed from published criticisms about
random-number generator experiments(4,5,29-33) and from expert opinion
on important methodological considerations when performing studies
involving human behavior.(20,34,35) Collectively, these criteria form a
measure of credibility by which to judge the reported data. The criteria
assess the integrity of the experiment in four categories (procedures,
statistics, the data, and the RNG device), and they cover virtually all
methodological criticisms raised to date. They are (1) control tests noted,
(2) local controls conducted, (3) global controls conducted, (4) controls
established through the experimental protocol, (5) randomness calibrations
conducted, (6) failsafe equipment employed, (7) data automatically recorded,
(8) redundant data recording employed, (9) data double checked,
(10) data permanently archived, (11) targets alternated on successive trials,
(12) data selection prevented by protocol or equipment, (13) fixed run
lengths specified, (14) formal experiment declared, (15) tamper-resistant
RNG employed, and (16) use of unselected subjects.
Each criterion was coded as being present or absent in the report of
an experiment, specifically excluding consideration of previously published
descriptions of RNG devices or control tests. This strategy was employed
to reflect lower confidence in such experiments since, for example, random-
ness tests conducted once on an RNG do not guarantee acceptable perfor-
mance in the same RNG in all future experiments. As a result, assessed
quality was conservative, that is, lower than the "true" quality for some
experiments, especially those reported only as abstracts or conference
proceedings. Using unit weights (which have been shown to be robust in
such applications(36)) on each of the sixteen descriptors, the quality rating
for an individual experiment was simply the sum of the descriptors. Thus,
while a quality score near zero indicated a low quality or poorly reported
experiment, a score near sixteen reflected a highly credible experiment.
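As a minimal sketch of the unit-weight scoring just described, the quality rating is simply a count of the criteria found in a report. The short identifiers below are our own hypothetical labels for the sixteen criteria listed in the text.

```python
# Hypothetical short labels for the sixteen binary criteria named in the text.
CRITERIA = (
    "control_tests_noted", "local_controls", "global_controls",
    "protocol_controls", "randomness_calibrations", "failsafe_equipment",
    "automatic_recording", "redundant_recording", "data_double_checked",
    "data_archived", "targets_alternated", "data_selection_prevented",
    "fixed_run_lengths", "formal_experiment_declared",
    "tamper_resistant_rng", "unselected_subjects",
)


def quality_score(present):
    """Sum of unit-weighted criteria coded as present (0 to 16)."""
    present = set(present)
    unknown = present - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return sum(1 for name in CRITERIA if name in present)


if __name__ == "__main__":
    print(quality_score({"local_controls", "fixed_run_lengths"}))
```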
3.3. Assessing Effect Size
Assume that each of K experiments produces effect size estimates $e_i$ of
a parameter E, based on $N_i$ samples, and that each $e_i$ has a known standard
error $s_i$. The weighted mean effect size is calculated as
$e_{\cdot} = \sum_i \omega_i e_i / \sum_i \omega_i$,
where $\omega_i = 1/s_i^2 = N_i$, and i ranges from 1 to K. The standard error of $e_{\cdot}$ is
$s_e = (\sum_i \omega_i)^{-1/2}$. A test for homogeneity for the K estimates of $e_i$ is given by
$H_K = \sum_i \omega_i (e_i - e_{\cdot})^2$, where $H_K$ has a chi-square distribution with $K - 1$
degrees of freedom.(") The same procedure can be followed to test for
homogeneity of effect size across M independent investigators. In this case,
$e_{\cdot j}$ and $s_{e_j}$ are calculated per investigator, and the test for homogeneity is
performed as $H_M = \sum_j \omega_j (e_{\cdot j} - e_{\cdot M})^2$, where $e_{\cdot j}$ and $\omega_j$ are mean weighted
effect size and $1/s_{e_j}^2$ per investigator, respectively,
$e_{\cdot M} = \sum_j \omega_j e_{\cdot j} / \sum_j \omega_j$, and
j ranges from 1 to M. $H_M$ has $M - 1$ degrees of freedom.
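The weighted mean, its standard error, and the homogeneity statistic can be sketched as follows. This is an illustrative reading of the formulas, assuming weights equal to the sample sizes N; the function name and the numbers in the usage example are invented, not data from the review.

```python
def combine_effects(effects, sizes):
    """Weighted mean effect size, its standard error, and homogeneity H.

    effects -- effect sizes e_i = Z_i / sqrt(N_i)
    sizes   -- sample sizes N_i, used directly as weights (w_i = 1/s_i^2 = N_i)
    Returns (mean, standard_error, H, degrees_of_freedom).
    """
    if len(effects) != len(sizes) or not effects:
        raise ValueError("need equally long, non-empty lists")
    weights = [float(n) for n in sizes]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / total
    standard_error = total ** -0.5
    # H is referred to a chi-square distribution with K - 1 degrees of freedom.
    h = sum(w * (e - mean) ** 2 for w, e in zip(weights, effects))
    return mean, standard_error, h, len(effects) - 1


if __name__ == "__main__":
    # Made-up effect sizes and sample sizes, for illustration only.
    print(combine_effects([0.01, 0.03], [100, 300]))
```

The same function applies to the per-investigator test if it is given each investigator's mean weighted effect size and the corresponding weight in place of N.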
For a quality-weighted analysis, we may determine
$e_{\cdot Q} = \sum_i (Q_i \omega_i e_i) / \sum_i (Q_i \omega_i)$, where $Q_i$ is the quality assessed for experiment i. The
standard error associated with $e_{\cdot Q}$ is $s_{eQ} = (\sum_i (Q_i^2 \omega_i) / (\sum_i Q_i \omega_i)^2)^{1/2}$; the
test for homogeneity is similar to that described above. Finally, following
the practice of reviewers in the physical sciences,(23,24) we deleted potential
"outlier" studies to obtain a homogeneous distribution of effect sizes and to
reduce the possibility that the calculated mean effect size may have been
spuriously enlarged by extreme values. The procedure used was as follows:
If the homogeneity statistic for all studies was significant (a I the p .05).
Free-response studies involving group testing. Only
two FR studies involved group testing (Table 1, row
8). Both studies were contributed by the same inves-
tigator. The mean weighted r is .19 (z = 1.83, p = .067,
95% CI from -.01 to .37). The results are significantly
nonhomogeneous (x2i = 7.53, p 05
Notes. r is the weighted average correlation coefficient (Hedges & Olkin, 1985). χ² is the within-group homogeneity statistic
(Rosenthal, 1984).
Consistency across Investigators
Table 3 shows the overall FR results by investiga-
tor. Three of the four investigators have significant
ESP/extraversion correlations and the results of the
fourth investigator (Braud) approach significance.
The z by investigator is 5.11, a result that should arise
by chance less than one time in 3.3 million. The results
are homogeneous across investigators (χ²(3) = 2.51, p
> .05). Although 10 of the 14 FR studies were contrib-
uted by one investigator (Sargent), evidence for the
relationship between free-response ESP performance
and extraversion is not dependent upon that investigator.
When Sargent's work is eliminated, the results
of the three remaining investigators still strongly
support a relationship between ESP performance and
extraversion (z = 3.35, p = 0.0008, two-tailed).
Therefore, we conclude that the ESP/extraversion
relationship is consistent across investigators.
Extraversion Measures
Each FR investigator used a different scale for
measuring extraversion. Marsh used the Bernreuter
Personality Inventory (Super, 1942); Sargent and his
group used the Cattell 16PF (Cattell, Eber & Tatsuoka,
1970); Braud and Bellis & Morris used scales
constructed by the investigators (with no psychometric
validation provided). It is impossible to isolate the
effects of the instruments for measuring extraversion
from the ensemble of procedures and research styles
associated with the investigators. All that can be said
is that a relationship between extraversion and ESP
performance is evident in studies using four different
measures of extraversion.
Selective Reporting
In order to assess the vulnerability of these studies
to selective reporting, we used Rosenthal's (1984)
"Fail-safe N" statistic to estimate the number of
unreported studies averaging null outcomes necessary to
reduce the known data base to nonsignificance. The
Fail-safe N is 140 studies. In other words, if we were
to assume that the observed outcomes arise from
selective reporting, it would be necessary to postulate
10 unreported studies averaging null outcomes for
each reported study. Therefore, we conclude that the
free-response ESP/extraversion relationship cannot
be explained on the basis of selective reporting.
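For readers unfamiliar with the statistic, the sketch below shows the usual form of Rosenthal's fail-safe N: the squared sum of the study Z scores divided by the squared critical Z, minus the number of studies. The paper does not list the individual Z scores at this point, so the input values here are invented purely for illustration and the function name is ours.

```python
def fail_safe_n(z_scores, z_crit=1.645):
    """Number of unreported null-result studies needed to bring the
    combined (Stouffer) Z down to z_crit (one-tailed p = .05 by default)."""
    k = len(z_scores)
    if k == 0:
        raise ValueError("need at least one study")
    return (sum(z_scores) ** 2) / (z_crit ** 2) - k


if __name__ == "__main__":
    # Hypothetical Z scores, NOT the values from the meta-analysis.
    example = [1.645, 1.645, 1.645, 1.645]
    print(round(fail_safe_n(example), 6))
```

Dividing the result by the number of retrieved studies gives the ratio of unreported to reported studies quoted in the text.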
Power Analysis
The FR mean r of .20 is equivalent to an average
ESP scoring advantage for extraverts over introverts
of 0.4 standard deviations. The FR studies' average
sample size is 44 subjects and the likelihood of detecting
a correlation of .2 at the five percent significance
level with this sample size (the statistical power) is
37 percent (Cohen, 1977, p. 87). Thus, in a sample of
14 studies, the expected number of statistically significant
studies is 5.2; the actual number of significant
studies is seven (exact binomial probability, with
p = .37 & q = .63, p = .23, one-tailed). Thus, the observed
rate of significant outcomes is consistent with
a correlation of .2.
Achievement of statistical significance, assuming
a correlation of .2, is essentially a coin toss with
sample sizes less than 48 subjects; a sample size of 180
is necessary to achieve 85 percent power.
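The power figures and the binomial probability above can be approximated with standard-library Python. This is a sketch under stated assumptions: it uses the Fisher z approximation for a one-tailed test rather than Cohen's (1977) tables, so the values agree with the text only to within rounding, and the function names are ours.

```python
from math import atanh, comb, sqrt
from statistics import NormalDist


def power_for_correlation(r, n, alpha=0.05):
    """Approximate one-tailed power to detect a population correlation r
    with n subjects, using the Fisher z transformation."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1.0 - alpha)
    return normal.cdf(atanh(r) * sqrt(n - 3) - z_crit)


def binomial_upper_tail(k, n, p):
    """Exact probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    print(round(power_for_correlation(0.2, 44), 2))    # power at the mean sample size
    print(round(power_for_correlation(0.2, 180), 2))   # power with 180 subjects
    print(round(14 * 0.37, 1))                         # expected significant studies
    print(round(binomial_upper_tail(7, 14, 0.37), 2))  # seven or more of 14
```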
In the following section, we explore the predictive
validity of the ESP/extraversion meta-analysis by
comparing the meta-analytic estimate to the outcome
of a new data set.
Table 4. ESP/Extraversion Correlations by Experimenter in the PRL Novice Series
Experimenter | N Subjects | r | z | Experimenter EI Score |
---|---|---|---|---|
Honorton | 41 | .27 | 1.71 | 101 |
Quant | 69 | .29 | 2.38 | 103 |
Derr | 22 | .03 | 0.68 | 81 |
Berger | 13 | -.37 | -1.18 | 115 |
Varvoglis | 21 | .05 | 0.32 | 133 |
Schechter | 7 | -.05 | -0.10 | 125 |
Ferrari | 10 | -.20 | -0.54 | 133 |
Schlitz | 7 | .15 | 0.92 | 69 |
Note. r is the weighted average correlation coefficient (Hedges & Olkin, 1985).
A New Confirmation
Extraversion data is available for 221 of the 241
subjects in a series of ESP ganzfeld studies reported
by Honorton, Berger, Varvoglis, Quant, Derr, Han-
sen, Schechter & Ferrari (1990) and conducted at the
Psychophysical Research Laboratories (PRL) in
Princeton, N.J. The experimental procedures are de-
scribed in detail in the Honorton, et al. (1990) report.
Subjects
The subjects were 131 women and 90 men. Their
average age is 37 years (sd = 11.7). This is a well-edu-
cated group; the mean formal education is 15.5 years
(sd = 2.0) and belief in psi is strong in this population.
On a seven-point scale where "1" indicates strong
disbelief and "7" indicates strong belief in psi, the
mean is 6.20 (sd = 1.03). Personal experiences suggestive
of psi were reported by 88 percent of the subjects;
eighty percent reported ostensible telepathic experi-
ences. Eighty percent have had some training in
meditation or other techniques involving internal fo-
cus of attention. One hundred and sixty-three sub-
jects contributed a single ESP ganzfeld session and 58
contributed multiple sessions.
Extraversion Measure
Extraversion was measured using the continuous
scores of the Extraversion/Introversion (EI) Scale in
Form F of the Myers-Briggs Type Indicator (MBTI;
Briggs & Myers, 1957). The MBTI was not used in any
of the meta-analysis studies. The MBTI EI Scale is
constructed so that scores below 100 indicate extrav-
ersion and scores above 100 indicate introversion.
(For consistency with the meta-analysis, we have
reversed the signs so that positive correlations reflect
a positive relationship between ESP performance and
extraversion.) The mean EI score for the PRL subjects
is 100.36 (sd = 25.18).
ESP Measure
ESP performance was measured using the stand-
ardized ratings of the target and decoys (Stanford's
z-scores; Stanford and Sargent, 1983). Stanford z's
were averaged for subjects with multiple sessions.
Results
Overall results. The correlation between ESP performance
and extraversion in the PRL series is significant
(r = .18, 219 df, t = 2.67, p = .008, two-tailed, 95%
CI from .05 to .30). This outcome is very close to the
meta-analytic estimate for free-response studies
(r = .20) and the difference between the two correla-
tions is nonsignificant (Cohen's q = .02, z = -0.26,
p = .793, two-tailed).
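The quantities reported in this paragraph can be approximately recomputed from the rounded correlations. The sketch below assumes the conventional formulas (t from r, a Fisher z confidence interval, and Cohen's q as a difference of Fisher z values); the function names are ours, and because it starts from the rounded r = .18 the t value comes out near 2.7 rather than exactly the reported 2.67.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def t_from_r(r, df):
    """t statistic for a correlation r with df = n - 2 degrees of freedom."""
    return r * sqrt(df) / sqrt(1.0 - r * r)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for r using the Fisher z transformation."""
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) / sqrt(n - 3)
    z = atanh(r)
    return tanh(z - half_width), tanh(z + half_width)


def cohens_q(r1, r2):
    """Cohen's q: difference between two Fisher-transformed correlations."""
    return atanh(r1) - atanh(r2)


if __name__ == "__main__":
    low, high = fisher_ci(0.18, 221)
    print(round(t_from_r(0.18, 219), 2))
    print(round(low, 2), round(high, 2))
    print(round(cohens_q(0.20, 0.18), 2))
```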
Ganzfeld Novices. The results are similar if we re-
strict our analysis to the five PRL Novice series with
inexperienced subjects who each completed a single
ganzfeld session. MBTI data is available for 190 of the
205 Novices and the mean weighted r for the five
series is .17 (z = 2.25, p = .024, two-tailed, 95% CI from
.02 to .31). The ESP/extraversion correlations are
homogeneous across the five series (χ²(4) = 2.88, p > .05).
Eleven subjects in the first Novice series (Series 101)
completed the MBTI between six and eighteen
months after their ESP ganzfeld session and we did
not maintain records of their identity. However, the
results are essentially the same when this series is
eliminated. The mean weighted r for the remaining
four Novice series is .19 (z = 2.30, p = .021, two-tailed,
95% CI from .03 to .34).
Outcome by experimenter. Eight experimenters con-
tributed to the PRL data base (Honorton, et al., 1990).
Table 4 shows the ESP/extraversion correlation by
experimenter for the five Novice series. The mean
weighted r for the eight experimenters is .16 (z = 2.09,
p = .037, two-tailed, 95% CI from .01 to .30). The
results are homogeneous across the eight
experimenters (χ²(7) = 6.43, p > .05).
Outcome in relation to EI status of experimenter. It is
possible that the relationship between ESP perform-
ance and extraversion is moderated by personality
characteristics of the experimenter. The last column
of Table 4 shows the MBTI EI scores for each experi-
menter. Only two experimenters (Derr and Schlitz)
are extraverts. Two others (Honorton and Quant) are
borderline introverts. While the above analyses indi-
cate that the ESP/extraversion correlation is consis-
tent across experimenters, there is a nonsignificant
tendency for the relationship to be stronger in the
data of less introverted experimenters (r = .47, 6 df, p
= .235, two-tailed).
Combined Estimate of the
Relationship between
Free-response ESP Performance
and Extraversion
Combining the new confirmation with the meta-analysis,
the overall mean weighted r is .19 (z = 5.50,
p = 3.8 × 10⁻⁸, 95% CI from .13 to .26). The "Fail-safe
N" for the combined estimate is 181 studies, or a ratio
of 12 unreported studies averaging null effects for
each known study. Four of the five investigators have
overall significant outcomes and the outcomes are
homogeneous across investigators (χ²(4) = 6.03,
p > .05).
Discussion
The Meta-Analysis
Forced-choice studies. The meta-analysis challenges
the conclusions from earlier narrative reviews of the
relationship between extraversion and forced-choice
ESP performance (Eysenck, 1967; Palmer, 1977; Sar-
gent, 1981). The apparent relationship between ex-
traversion and ESP performance in these studies ap-
pears to be due to the influence of subjects' knowl-
edge of their ESP performance on their subsequent
responses to the extraversion measures. Evidence for
a relationship between ESP and extraversion occurs
only when extraversion was measured after the ESP
test; no evidence of an ESP/extraversion relationship
is found in studies where extraversion was measured
before the ESP task.
Evidence for a nonzero effect in the forced-choice
studies is also limited to the subset of studies involving
ESP testing procedures that were vulnerable to
potential sensory leakage. There is reason to believe,
however, that this may result from a procedural confound:
six of the eight studies in this subgroup for which
information on the order of testing is available also
involved extraversion testing following ESP feedback.
The apparent biasing effect of ESP feedback probably
arises from one of two possibilities. Awareness
of "success" or "failure" may lead subjects to later
perceive themselves as more extraverted or introverted.
Or, the problem may arise from an experimenter
expectancy effect (Rosenthal & Rubin, 1978),
in which subjects respond to the investigator's expectations
that extraverts are more successful in ESP
tasks than introverts. Obviously, further research will
be necessary to clarify the problem.
The existence of this problem, however, necessarily
arouses concern over the viability of reported
relationships between ESP performance and other
personality factors such as neuroticism (Palmer,
1977). Much of the research in these areas was conducted
by the same investigators, and it is likely that
similar methods were used. We believe that conclusions
regarding the relationship between ESP performance
and other personality variables should be
suspended until the relevant study domains can be
examined with respect to this problem.
Free-response studies. The meta-analysis does support
the existence of a relationship between extraversion
and free-response ESP performance. The free-response
studies are not amenable to explanation in
terms of an order artifact or other identifiable threats
to validity. The overall correlation of .20 would be
expected to occur only about one time in 674,000 by
chance. Three of the four investigators contributing
to this data base obtained significant ESP/extraversion
relationships, and the fourth investigator's results
approach significance. The correlations are homogeneous
across investigators, and across the largest
grouping of studies in which subjects were tested
individually. The effect remains highly significant
even when 71 percent of the studies, contributed by
one investigator, are eliminated from consideration.
Thus, the relationship seems to be robust. Estimation
of the filedrawer problem (Rosenthal, 1984) indicates
that it would be necessary to postulate 10 unreported
studies averaging null results for every retrieved
study in order to account for the observed effect on
the basis of selective reporting.
The New Confirmation
The results of the confirmation, involving a new
set of investigators and a new scale of extraversion,
support the meta-analytic findings and increase their
generalizability. The relationship between free-re-
sponse ESP performance and extraversion now spans
833 subjects and five independent investigator teams.
The homogeneity of the effect across the eight experi-
menters in the confirmatory study further increases
our confidence that the effect is replicable and is not
dependent upon unknown characteristics of individ-
ual investigators. A nonsignificant trend in the data
does suggest that the ESP/extraversion relationship
may, to some extent, be moderated by the experi-
menter's extravertedness and it may be advisable for
future investigators to record and report extraver-
sion/introversion scores of the experimenters.
The Predictive Validity of Meta-Analysis
Meta-analysis is a powerful tool for summarizing
existing evidence. It enables more precise estimation
of the significance and magnitude of behavioral ef-
fects than has been possible with traditional narrative
reviews, and is useful in identifying moderating vari-
ables. In the present case, meta-analytic techniques
revealed a serious source of bias that had been overlooked
in earlier narrative reviews of the ESP/extraversion
domain. Moreover, the meta-analysis identified
a subset of the domain that is not amenable to the
discovered bias and provided an estimate of the mag-
nitude of the relationship between ESP and extraver-
sion in that subset.
Ultimately, the usefulness of meta-analysis will be
judged by its ability to predict new outcomes and in
this regard we consider the results of the confirma-
tion study to be especially noteworthy. The correla-
tion between ESP performance and extraversion in
the confirmation study is very close to that predicted
by the meta-analysis. This is the second test of the
predictive validity of meta-analysis in parapsy-
chological problem areas; we have previously re-
ported that ESP ganzfeld performance in a new series
of studies (Honorton, et al., 1990) closely matched the
outcomes of earlier studies in a meta-analysis
(Honorton, 1985). Predictability is the hallmark of
successful science and these findings lead us to be
optimistic concerning the prospect that parapsychol-
ogy may be approaching this more advanced stage of
development.
References
Briggs, K. C., and Myers, I. B. (1957). Myers-Briggs
Type Indicator Form F. Palo Alto, CA: Consulting
Psychologists Press, Inc.
Cattell, R. B., Eber, H. W., & Tatsuoka, M. M. (1970).
Handbook for the Sixteen Personality Factor Question-
naire. Champaign, IL: Institute for Personality and
Ability Testing.
Cohen, J. (1977). Statistical power analysis for the behav-
ioral sciences. New York: Academic Press. (Re-
vised Edition.)
Eysenck, H. J. (1967). Personality and extra-sensory
perception. Journal of the Society for Psychical Re-
search, 44, 55-70.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for
Meta-Analysis. New York: Academic Press.
Hedges, L. V. (1987). How hard is hard science, how
soft is soft science? The empirical cumulativeness
of research. American Psychologist, 42, 443-455.
Honorton, C. (1985). Meta-analysis of psi ganzfeld
research: a response to Hyman. Journal of Parapsy-
chology, 49, 51-92.
Honorton, C., Berger, R. E., Varvoglis, M. P., Quant,
M., Derr, P., Hansen, G., Schechter, E. I., and
Ferrari, D. C. (1990). Psi communication in the
ganzfeld: experiments with an automated testing
system and a comparison with a meta-analysis of
earlier studies. In Research in Parapsychology 1989.
Metuchen, NJ: Scarecrow Press. (In press.)
Honorton, C., & Ferrari, D. C. (1989). "Future Telling":
a meta-analysis of forced-choice precognition experiments,
1935-1987. Journal of Parapsychology,
53, in press.
Hyman, R. (1985). The psi ganzfeld experiment: A
critical appraisal. Journal of Parapsychology, 49, 3-49.
McCarthy, D., & Schechter, E. I. (1986). Estimating
effect size from critical ratios. In D. H. Weiner &
D. I. Radin (Eds.) Research in Parapsychology 1985.
Scarecrow Press, pp. 95-96.
Palmer, J. (1977). Attitudes and personality traits in
experimental ESP research. In B. B. Wolman (Ed.)
Handbook of parapsychology. New York: Van Nos-
trand Reinhold.
Palmer, J., & Lieberman, R. (1975). The influence of
psychological set on ESP and out-of-the-body experiences.
Journal of the American Society for Psychical
Research, 69, 193-213.
Radin, D. I., & Nelson, R. D. (1989). Evidence for
consciousness-related anomalies in random
physical systems. Foundations of Physics, 19, 1499-1514.
Rosenthal, R. (1984). Meta-Analytic procedures for social
research. Beverly Hills, CA: Sage.
Rosenthal, R., & Rubin, D. B. (1978). Interpersonal
expectancy effects: The first 345 studies. Behavioral
and Brain Sciences, 3, 377-386.
Sargent, C. L. (1981). Extraversion and performance
in 'extra-sensory perception' tasks. Personality
and Individual Differences, 2, 137-143.
Stanford, R. G., and Sargent, C. L. (1983). Z scores in
free-response methodology: comments on their
utility and correction of an error. Journal of the
American Society for Psychical Research, 77, 319-326.
Super, D. E. (1942). The Bernreuter Personality Inventory:
a review of research. Psychological Bulletin,
39, 94-125.
Tukey, J. W. (1977). Exploratory data analysis. Reading,
MA: Addison-Wesley.
Studies Used in the Meta-Analysis
Some reports contain more than one study. For
reports with multiple studies, the number of studies
is indicated in brackets following the reference.
Ashton, H. T., Dear, P. R., & Harley, T. A. (1981). A
four-subject study of psi in the ganzfeld. Journal
of the Society for Psychical Research, 51, 12-21.
Astrom, J. (1965). GESP and the MPI measures. Jour-
nal of Parapsychology, 29, 292-293.
Bellis, J., & Morris, R. L. (1980). Openness, closedness
and psi. Research in Parapsychology 1979, 98-99.
Braud, L. W. (1976). Openness versus closedness and
its relationship to psi. Research in Parapsychology
1975, 155-159.
Braud, L. W. (1977). Openness vs. closedness and its
relationship to psi. Research in Parapsychology
1976, 162-165.
Casper, G. W. (1952). Effects of the receiver's attitude
toward the sender in ESP tests. Journal of Parapsychology,
16, 212-218.
Fisk, G. W. (1960). The Rhodes experiment. Linkage
in extra-sensory perception by M. C. Marsh. Journal
of the Society for Psychical Research, 40, 219-239
and M. C. Marsh. (unpublished). Linkage in Extra-
Sensory Perception. Unpublished doctoral disser-
tation, Dept. of Psychology, Rhodes University,
Grahamstown, South Africa. 450 pages.
Green, C. E. (1966). Extra-sensory perception and the
Maudsley Personality Inventory. Journal of the So-
ciety for Psychical Research, 43, 285-286.
Green, C. E. (1966). Extra-sensory perception and the
extraversion scale of the Maudsley Personality
Inventory. Journal of the Society for Psychical Research,
43, 337.
Haraldsson, E. (1970). Psychological variables in a
GESP test using plethysmograph recordings. Proceedings
of the Parapsychological Association, 7, 6-7.
Harley, T. A., & Sargent, C. L. (1980). Trait and state
factors influencing ESP performance in the ganzfeld.
Research in Parapsychology 1979, 126-127.
Humphrey, B. M. (1945). An exploratory correlation
study of personality measures and ESP scores.
Journal of Parapsychology, 9, 116-123. [3 studies]
Humphrey, B. M. (1951). Introversion-extraversion
ratings in relation to scores in ESP tests. Journal of
Parapsychology, 15, 252-262.
Kanthamani, B. K. (1966). ESP and social stimulus.
Journal of Parapsychology, 30, 31-38.
Kanthamani, B. K., & Rao, K. R. (1972). Personality
characteristics of ESP subjects: III. Extraversion
and ESP. Journal of Parapsychology, 36, 198-212.
Krishna, S. R., & Rao, K. R. (1981). Personality and
'belief' in relation to language ESP scores. Research
in Parapsychology 1980, 61-63. [2 studies]
McElroy, W. A., and Brown, W. K. R. (1950). Electric
shocks for errors in ESP card tests. Journal of
Parapsychology, 14, 257-266.
Nash, C. B. (1966). Relation between ESP scoring level
and the Minnesota Multiphasic Personality Inventory.
Journal of the American Society for Psychical
Research, 60, 56-62. [8 studies]
Nicol, J. F., & Humphrey, B. M. (1953). The exploration
of ESP and human personality. Journal of the
American Society for Psychical Research, 47, 133-178.
Nicol, J. F., & Humphrey, B. M. (1955). The repeatability
problem in ESP-personality research. Journal of
the American Society for Psychical Research, 49, 125-156.
Nielsen, W. (1970). Relationships between precognition
scoring level and mood. Journal of Parapsychology,
34, 93-116.
Nielsen, W. (1970). Studies in group targets: a social
psychology class. Proceedings of the Parapsychological
Association, 7, -57.
Nielsen, W. (1970). Studies in group targets: an unusual
high school group. Proceedings of the Parapsychological
Association, 7, 57-58.
Sargent, C. L. (1978). Hypnosis as a psi-conducive
state: a controlled replication study. Journal of
Parapsychology, 42, 257-275. [2 studies]
Sargent, C. L. (1980). Exploring psi in the ganzfeld. New
York: Parapsychology Foundation, Inc. [2 studies]
Sargent, C. L., Bartlett, H. J., and Moss, S. P. (1982).
Response structure and temporal incline in ganzfeld
free-response GESP testing. Journal of Parapsychology,
46, 85-110. [2 studies]
Sargent, C. L., Harley, T. A., Lane, J., & Radcliffe, K.
(1981). Ganzfeld psi-optimization in relation to
session duration. Research in Parapsychology 1980,
82-84.
Sargent, C. L., & Harley, T. A. (1981). Three studies
using a psi-predictive trait variable question-
naire. Journal of Parapsychology, 45, 199-214.
Sargent, C. L., and Matthews, G. (1982). Ganzfeld
GESP performance with variable-duration test-
ing. Research in Parapsychology 1981, 159-160.
Shields, E. (1962). Comparison of children's guessing
ability (ESP) with personality characteristics.
Journal of Parapsychology, 26, 200-210. [2 studies]
Shrager, E. F. (1978). The effects of sender-receiver
relationship and associated personality variables
on ESP scores. Journal of the American Society for
Psychical Research, 72, 35-47. [2 studies]
Szczygielski, D., & Schmeidler, G. R. (1975). ESP and
two measures of introversion. Research in Parapsychology
1974, 15-17.
Thalbourne, M. A., Beloff, J., and Delanoy, D. (1982).
A test for the 'extraverted sheep versus intro-
verted goats' hypothesis. Research in Parapsychol-
ogy 1981, 155-156. [2 studies]
Thalbourne, M. A., Beloff, J., Delanoy, D., & Jung-
kuntz, J. H. (1983). Some further tests of the ex-
traverted sheep versus introverted goats
hypothesis. Research in Parapsychology 1982, 199-
200. [4 studies]
Thalbourne, M. A., and Jungkuntz, J. H. (1983). Extraverted
sheep versus introverted goats: experiments
VII and VIII. Journal of Parapsychology, 47,
49-51. [2 studies]
Journal of Parapsychology, Vol. 50, December 1986
META-ANALYTIC PROCEDURES AND THE
NATURE OF REPLICATION: THE
GANZFELD DEBATE
By ROBERT ROSENTHAL
ABSTRACT: This paper is a commentary on the valuable debate between Charles
Honorton (1985) and Ray Hyman (1985) about the evidence for psi in the ganzfeld
situation. Their debate was a creative, constructive, and task-oriented dialogue that
served admirably to sharpen the issues involved. In my commentary I focus on the
concept of replication, distinguishing the troublesome older view with a more use-
ful alternative. Specific issues related to replication are discussed including prob-
lems of multiple testing, subdividing studies, weighting replications, and problems
of small effects. The earlier meta-analytic work is summarized, evaluated, and com-
pared with a meta-analysis of a different controversial area. Rival hypotheses of
procedural and statistical types are discussed, and a tentative inference is offered.
The conclusion calls for wider use of newer views of the success of replication.
Science in general and parapsychological inquiry in particular
have been well served by the recent ganzfeld debate between
Charles Honorton (1985) and Ray Hyman (1985) as organized by
the Journal's editor, K. Ramakrishna Rao. Two serious and highly
knowledgeable scholars have invested a great amount of time, energy,
and creative thought to produce a debate that is a model of
task-oriented, constructive dialogue. It is clear that the participants
have been devoted to clarifying and understanding the scientific is-
sues rather than simply to "scoring points."
As a result of their efforts we have an excellent review of the
issues to be considered in evaluating the data generated by the ganz-
feld experiments. In addition, through their meta-analytic work, we
have an enormously valuable quantitative summary of the ganzfeld
studies. In the end, Hyman and Honorton have not resolved all
their differences, nor is it likely that they will. Hyman has raised
cogent and telling questions. Honorton has answered them in cogent
and telling terms. I am sure that Hyman will have excellent
The preparation of this paper and the development of some of the procedures
described within it were supported by the National Science Foundation. Much of the
summary and interpretation of the meta-analyses will be included in a paper
commissioned by the National Academy of Sciences that is in preparation by Monica J.
Harris and Robert Rosenthal.
study failed to replicate that of Smith. Such errors are made very
frequently in most areas of psychology and the other behavioral
sciences.
Pseudo-Successful Replications
Return now to Table 1 and focus attention on cell B, the cell of
"successful replication." Suppose that two investigators both rejected
the null hypothesis at p < .05 with both results in the same direction.
Suppose further, however, that in one study the effect size r
was .90 whereas in the other study the effect size r was only .10,
significantly smaller than the r of .90 (Rosenthal & Rubin, 1982a).
In this case our interpretation is more complex. We have indeed
had a successful replication of the rejection of the null, but we have
not come even close to a successful replication of the effect size.
"Successful Replication" of Type II Error
Cell C of Table 1 represents the situation in which both studies
failed to reject the null hypothesis. Under those conditions investigators
might conclude that there was no relationship between the
variables investigated. Such a conclusion could be very much in error,
the more so the lower the power of the two studies
(Cohen, 1977). If power levels of the two studies (assuming medium
effect sizes in the population) were very high, say .90 or .95, then
two failures to obtain a significant relationship would provide evidence
that the effect investigated was not likely to be a very large
effect. If power calculations had been made assuming a very small
effect size, two failures to reject the null, although not providing
strong evidence for the null, would at least suggest that the size of
the effect in the population was probably quite modest.
If sample sizes of the two studies failing to reject the null were
modest so that power to detect all but the largest effects was low,
very little could be concluded from two failures to reject except that
the effect sizes were unlikely to be enormous. For example, two
investigators with N's of 20 and 40, respectively, find results not
significant at p < .05. The effect sizes phi (i.e., r for dichotomous
variables) were .29 and .20, respectively, and both p's were
approximately .20. The combined p of these two results, however,
is .035 [(z1 + z2)/√2 = z], and the mean effect size in the mid-.20's
is not trivial (Rosenthal & Rubin, 1982b).
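The arithmetic of this example can be checked with a short sketch. It assumes the usual relation for a 2 x 2 table, z = phi x √N, and reads the combined p of .035 as one-tailed; the function names are illustrative only and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def z_from_phi(phi, n):
    """Assumed relation for a 2 x 2 table: chi-square(1) = n * phi**2, z = sqrt(chi-square)."""
    return phi * sqrt(n)


def two_tailed_p(z):
    """Two-tailed p associated with a standard normal deviate."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def stouffer_z(zs):
    """Combine independent z's: sum of z's divided by sqrt of their number."""
    return sum(zs) / sqrt(len(zs))


z1 = z_from_phi(0.29, 20)   # first investigator, N = 20
z2 = z_from_phi(0.20, 40)   # second investigator, N = 40
combined_z = stouffer_z([z1, z2])
combined_p = 1 - STD_NORMAL.cdf(combined_z)  # one-tailed

print(f"p1 = {two_tailed_p(z1):.2f}, p2 = {two_tailed_p(z2):.2f}")
print(f"combined z = {combined_z:.2f}, combined p = {combined_p:.3f}")
```

Neither study is significant alone, yet the combined result is, which is the point of the example.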
TABLE 3
COMPARISON OF TWO SETS OF REPLICATIONS
                            Replication set A       Replication set B
                            Study 1    Study 2      Study 1    Study 2
N                             96         15           98         27
p (two-tailed)               .05        .05          .01        .18
z (p)                       1.96       1.96         2.58       1.34
r                            .20        .50          .26        .26
z (r)                        .20        .55          .27        .27
Cohen's q (z_r1 - z_r2)           .35                     .00
Comparing Views of Replication
The traditional, not very useful, view of replication modeled in
Table 1 has two primary characteristics:
1. It focuses on significance level as the relevant summary statistic
of a study.
2. It makes its evaluation of whether replication has been suc-
cessful in a dichotomous fashion. For example, replications are suc-
cessful if both or neither p < .05 (or .01, etc.), and they are unsuc-
cessful if one p < .05 (or .01, etc.) and the other p > .05 (or .01,
etc.). Psychologists' reliance on a dichotomous decision procedure
accompanied by an untenable discontinuity of credibility in results
varying in p levels has been well documented (Nelson, Rosenthal, &
Rosnow, 1986; Rosenthal & Gaito, 1963, 1964).
The newer, more useful view of replication success has two primary
characteristics:
1. It focuses on effect size as the more important summary sta-
tistic of a study with only a relatively minor interest in the statistical
significance level.
2. It makes its evaluation of whether replication has been successful
in a continuous fashion. For example, two studies are not
said to be successful or unsuccessful replicates of each other but,
rather, the degree of failure to replicate is specified.
Table 3 shows two sets of replications. Replication set A shows
two results both rejecting the null but with a difference in effect
sizes of .30 in units of r or .35 in units of Fisher's z transformation
of r (Cohen, 1977; Rosenthal & Rosnow, 1984; Snedecor & Cochran,
1980). That difference, in units of r or Fisher's z, is the degree
multiple questions, multiple dependent variables make good scientific
sense. However, as both Honorton (1985) and Hyman (1985)
point out, the use of multiple dependent variables may affect the
accuracy of the p levels computed. For example, if five dependent
variables are used and one of these is found to show an effect at p
< .05, it would be misleading to say that an effect has been demonstrated
at p < .05. That is because the actual p of finding one p
significant at .05 (or any other chosen level) increases as the number
of tests made increases. That is not a good reason to decrease the
variety of dependent variables used, assuming there is a good theoretical
basis for choosing to use each one.
Alternate procedures are available. Bonferroni procedures can
be used to adjust for the number of tests made (Rosenthal & Rubin,
1983). To overcome the conservatism of this basic approach and decrease
Type II errors, it is possible to weight the dependent variables
according to their importance and apply a so-called ordered
Bonferroni procedure (Rosenthal & Rubin, 1984, 1985). Perhaps it
is most useful, however, to apply specially developed procedures
that integrate all the information from all the dependent variables
and obtain only a single overall test of significance and effect size
estimate. This can be accomplished very easily so long as we have
reasonable estimates of the intercorrelations among the dependent
variables (Rosenthal & Rubin, 1986).
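The inflation described for five dependent variables, and the basic Bonferroni correction, can be illustrated with a minimal sketch. It assumes the tests are independent, which real dependent variables seldom are, and it does not implement the ordered (weighted) Bonferroni procedure or the single overall test mentioned above; the function names are hypothetical.

```python
def familywise_error(alpha, k):
    """Chance of at least one p < alpha among k independent tests of a true null."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha, k):
    """Per-test significance level that holds the overall level at about alpha."""
    return alpha / k


k = 5  # five dependent variables, as in the example in the text
print(f"actual chance of at least one 'hit' at .05: {familywise_error(0.05, k):.3f}")
print(f"Bonferroni per-test level: {bonferroni_threshold(0.05, k):.3f}")
```

Under these assumptions the chance of at least one nominally significant result among five is a little over one in five rather than one in twenty.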
Subdividing Studies
An issue discussed in the ganzfeld debate has to do with the subdivision
of studies into substudies as a function of different experimental
procedures or individual difference variables such as sex,
age, degree of belief in psi effects, and the like (Schmeidler, 1968).
As long as all the data are preserved and entered into the meta-analysis,
no harm is done by subdividing. Indeed, subdividing is
very useful in the search for moderator variables (Rosenthal, 1984).
Subdividing could have a very biasing effect on the accuracy of
a cited p value if the overall data are subdivided in various ways,
significant results are reported for one or more substudies, and the
rest of the substudies are "thrown away." In the ordinary more
proper application of meta-analytic procedures, however, subdividing
makes little difference. Consider a psi experiment with an overall
nonsignificant effect (p = .13, two-tailed). After the study is over,
it is noted that about half the subjects were favorable toward psi and
half were not and that there had been both female and male subjects.
Suppose that a subgroup of subjects, say female believers,
show a significant psi effect but the remaining groups do not. No
harm is done by reporting that fact, though an adjustment is useful
in reporting the obtained p that takes into account how many
subgroups were tested. It is essential, however, that the results of
significance tests for the nonsignificant subgroups also be entered
into the meta-analysis.
TABLE 4
SUBDIVISION OF A LARGER EXPERIMENT
              Believing subjects          Disbelieving subjects
              Two-tailed p      z         Two-tailed p      z
Females           .05          2.0            .62          0.5
Males             .32          1.0            .62         -0.5
Note: For the study as a whole, p was .13 and z was 1.5 before subdividing. Positive
z's reflect results in the predicted direction; negative z's reflect results in the
unpredicted direction.
Table 4 illustrates the situation; four substudies have been
formed, only one of which was significant. When we combine the
results of the four substudies, however, we find the overall z to be
[(2.0) + (1.0) + (0.5) + (-0.5)]/√4 = 1.5, p = .13, two-tailed.
Essentially, subdividing makes little difference so long as no data are
discarded. If a particular substudy showed great promise of evi-
dencing psi, nothing would prevent the investigator from conduct-
ing new studies using only the preselected experimental conditions
or types of subjects. It would also be appropriate to conduct a
meta-analysis on all the substudies that could be found that met the
promising condition. In that case, however, the initial "study of dis-
covery" should be entered with an adjustment for the fact that sev-
eral tests of significance were computed (Rosenthal & Rubin, 1983,
1984).
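A small sketch of the subdivision example follows. The four z's are those given in the text; the dictionary layout and the simple Bonferroni adjustment of the best subgroup's p are assumptions made for illustration, standing in for whichever adjustment an investigator actually prefers.

```python
from math import sqrt
from statistics import NormalDist

# z's of the four substudies as given in the text (sign = direction of result)
substudy_z = {
    ("female", "believing"): 2.0,
    ("male", "believing"): 1.0,
    ("female", "disbelieving"): 0.5,
    ("male", "disbelieving"): -0.5,
}


def two_tailed_p(z):
    """Two-tailed p associated with a standard normal deviate."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Recombining every substudy recovers the result for the study as a whole.
overall_z = sum(substudy_z.values()) / sqrt(len(substudy_z))
overall_p = two_tailed_p(overall_z)

# One possible adjustment (assumed here: plain Bonferroni) for having
# tested four subgroups before singling out the most significant one.
best_z = max(substudy_z.values())
adjusted_p = min(1.0, two_tailed_p(best_z) * len(substudy_z))

print(f"overall z = {overall_z:.1f}, two-tailed p = {overall_p:.2f}")
print(f"best subgroup p after adjustment = {adjusted_p:.2f}")
```

Dropping any of the three nonsignificant substudies from the sum would raise the combined z, which is why discarding them biases a cited p.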
Flaw Effects and Weighting Replications
There are few flawless studies in the behavioral sciences. Flaws
can increase Type I or Type II errors, and the wise meta-analyst
would do well to note how well Hyman (1985) and Honorton (1985)
have searched for and evaluated flaws. For each flaw, it would be
desirable to make some estimate of how much difference it made to
the outcome. In the present debate some flaws seemed to make a
difference and others did not. When flaws matter we can adjust for
of failure to replicate. That both studies were able to reject the null
and at exactly the same p level is simply a function of sample size.
Replication set B shows two studies with different p values, one
significant at p < .05, the other not significant. However, the two effect
size estimates are in excellent agreement. We would say, accord-
ingly, that replication set B shows more successful replication than
does replication set A.
It should be noted that the values of Table 3 were chosen so that
the combined probability of the two studies of set A would be identical
to the combined probability of the two studies of set B; (z1 +
z2)/√2 = z of 2.77, p = .0028, one-tailed.
The Metrics of the Success of Replication
Once we adopt a view of the success of replication as a function
of similarity of effect sizes obtained, we can become more precise in
our assessments of the success of replication. Figure 1 shows the
"replication plane" generated by crossing the results of the first
study conducted (expressed in units of the effect size r) by the re-
sults of the second study conducted. All perfect replications, those
in which the effect sizes are identical in the two studies, fall on a
diagonal rising from the lower left corner (-1.00,-1.00) to the up-
per right corner (+1.00, +1.00). The results of replication set B
from Table 3 are shown to fall exactly on the diagonal of successful
replication (+ .26, +.26). The results of replication set A are shown
to fall somewhat above the line representing perfect replication. Fig-
ure 1 shows that although set B reflects a more successful replica-
tion than set A, the latter is also located fairly close to the line and
is, therefore, a fairly successful replication set as well.
Cohen's q. An alternative to the indexing of the success of replication
by the difference between obtained effect size r's is to transform
the r's to Fisher's z's before taking the difference. Fisher's z
metric is distributed nearly normally and can thus be used in setting
confidence intervals and testing hypotheses about r's, whereas r's
distribution is skewed, and the more so as the population value of r
moves further from zero. Cohen's q is especially useful for testing
the significance of difference between two obtained effect size r's.
This is accomplished by means of the fact that
$$\frac{q}{\sqrt{\dfrac{1}{N_1 - 3} + \dfrac{1}{N_2 - 3}}}$$
is distributed as z, the standard normal deviate (Rosenthal, 1984;
[Figure 1. The replication plane. Both axes are in units of the effect size r and run from -1.00 to +1.00; the points for Set A and Set B are marked.]
Rosenthal & Rubin, 1982a; Snedecor & Cochran, 1980). When there
are more than two effect size r's to be evaluated for their variability
(i.e., heterogeneity), the three references above all provide the ap-
propriate formula for computing the test of the heterogeneity of
ISSUES RELATED TO REPLICATION
Multiple Testing
In ganzfeld studies, in parapsychological research more broadly,
and, indeed, in most areas of behavioral science, it is common that
more than one test of significance is computed to evaluate a research
hypothesis. There may, for example, be a set of several dependent
variables used to evaluate outcome. So long as there are
these flaws in our weighting of studies. For example, we can give
weights of zero to truly terrible studies and lowered but nonzero
weights to less than truly terrible studies. Such weighting may lead
to less biased conclusions than simple discarding of studies for flaws
(Fiske, 1978; Rosenthal, 1984; Rosenthal & Rubin, 1985).
Replication Difficulty and Small Effects
Although Hyman (1985) and Honorton (1985) disagree on the
degree of confidence warranted by the ganzfeld literature, they
agree that the results reported do not reflect an enormous magnitude
of effect. In Cohen's (1977) terminology, the average size of
the ganzfeld effect reported by Hyman (1985) and Honorton (1985)
is on the small side. That, of course, is not surprising. Controversial
research areas are characterized by small effect sizes. For example,
in a recent review of five controversial areas of human performance
research, Harris and Rosenthal (1986) estimated the actual effect
sizes (r) to range only from .00 to .18 with a median of .10 and a
95% confidence interval ranging from .02 to .19.
Small effect sizes are just what we should expect from controversial
areas. According to fundamental principles of statistical
power (Cohen, 1977), if the true effect size were substantial, studies
with only modest sample sizes would routinely be able to reject the
null. For example, if the population value of r were .60, 90% of
replication attempts would be significant at p < .05 with sample sizes
of 24 (Cohen, 1977, p. 92). However, if the population value of r
were .10, the median of our five controversial areas (Harris & Rosenthal,
1986), only 7% of replication attempts would be significant
at p < .05 with sample sizes of 24. For the small population value
of r (.10), it would require sample sizes of over 1,000 to achieve a
90% rate of rejecting the null at p < .05.
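These power figures can be approximated without Cohen's tables. The sketch below uses the Fisher z normal approximation for a two-tailed test of r = 0, which is an assumption of this illustration and gives values close to, but not identical with, the tabled ones; the target power of .90 in the sample-size function is likewise an assumed default.

```python
from math import atanh, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a population correlation r with n subjects."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)  # expected value of the test statistic
    upper = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower = STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


def n_for_power(r, power=0.90, alpha=0.05):
    """Approximate sample size needed to reach the given power (upper tail only)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ((z_crit + z_power) / atanh(r)) ** 2 + 3


print(f"r = .60, N = 24: power about {approx_power(0.60, 24):.2f}")
print(f"r = .10, N = 24: power about {approx_power(0.10, 24):.2f}")
print(f"r = .10: N for .90 power about {n_for_power(0.10):.0f}")
```

The same sample size that almost always detects a large effect almost never detects a small one, which is why failures to replicate are expected in controversial areas.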
Even though controversial research areas are characterized by
small effects (including zero as a possibility), that does not mean that
the effects are of no practical importance. Indeed, the median small
effect of five areas cited above (r = .10) is equivalent to improving
our success rate from 45% to a success rate of 55% (Rosenthal &
Rubin, 1982b).
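The equivalence between r = .10 and a change in success rate from 45% to 55% follows from displaying r as a difference between two success rates centered on .50. A minimal sketch, with a hypothetical function name:

```python
def success_rates(r):
    """Express r as two success rates centered on .50 whose difference equals r."""
    return 0.5 - r / 2, 0.5 + r / 2


low, high = success_rates(0.10)
print(f"r = .10 corresponds to success rates of {low:.2f} and {high:.2f}")
```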
Before leaving the topic of replication difficulty, it may help us
to place this problem in useful perspective by noting that it is not
only in the parapsychological or other behavioral sciences that replication
difficulties emerge. Indeed, students of the physical sciences
have pointed out failures to replicate the construction of TEA-lasers
despite the availability of detailed instructions for replication. Apparently
TEA-lasers could be replicated dependably only when the
replication instructions were accompanied by a scientist who had
actually built a laser (Collins, 1985).
SUMMARIZING THE META-ANALYSES
Hyman (1985) and Honorton (1985) have done important meta-analytic
work on the topic of the ganzfeld experiments; it is this
work I summarize here.
Five indices of "psi" success have been used in ganzfeld research
(Honorton, 1985). One criticism of research in this area is that some
investigators used several such indices in their studies and failed to
adjust their reported levels of significance (p) for the fact that they
had made multiple tests (Hyman, 1985). Because most studies used
a particular one of these five methods, the method of direct hits,
Honorton focused his meta-analysis on just those 28 studies (of a
total of 42) for which direct hit data were available.
The method of direct hits scores a success only when the single
correct target is chosen out of a set of t total targets. Thus, the
probability of success on a single trial is 1/t, with t usually = 4 but
sometimes 5 or 6. The other methods, using some form of partial credit,
available. Although they differ in their interpretation of the results,
Honorton (1985) and Hyman (1985) agree quite well on the basic
quantitative results of the meta-analysis of these 28 studies. This
agreement holds both for the estimation of statistical significance
(Honorton, 1985, p. 58) and of effect size (Hyman, 1985, p. 13).
Stem-and-Leaf Display
Table 5 shows a stem-and-leaf display of the 28 effect size estimates
based on the direct hits studies summarized by Honorton
(1985, p. 84). The effect size estimates shown in Table 5 are in units
of Cohen's h, which is the difference between (a) the arcsine transformed
proportion of direct hits obtained and (b) the arcsine transformed
proportion of direct hits expected under the null hypothesis
(i.e., 1/t). The advantage of h over j, the difference between raw
proportions, is that all h values that are identical are identically
detectable whereas all j values that are identical (e.g., .65 - .45 and
.25 - .05) are not equally detectable (Cohen, 1977, p. 181).
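A brief sketch may make the contrast between h and j concrete. It assumes Cohen's usual definition of the arcsine transformation, 2 x arcsin(√P); the function name is illustrative.

```python
from math import asin, sqrt


def cohens_h(p_obtained, p_null):
    """Difference between arcsine-transformed proportions, 2*asin(sqrt(P))."""
    return 2 * asin(sqrt(p_obtained)) - 2 * asin(sqrt(p_null))


# Two comparisons with the same raw difference j = .20
h_mid = cohens_h(0.65, 0.45)
h_low = cohens_h(0.25, 0.05)

print(f".65 vs .45: j = .20, h = {h_mid:.2f}")
print(f".25 vs .05: j = .20, h = {h_low:.2f}")
```

The same raw difference of .20 corresponds to a noticeably larger h near the floor of the scale, which is the sense in which equal j values are not equally detectable.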
TABLE 5
STEM-AND-LEAF PLOT OF "DIRECT HIT" GANZFELD STUDIES: COHEN'S h
Stem  | Leaf
 1.4  | 4
 1.3  | 3
 1.2  |
 1.1  |
 1.0  |
  .9  |
  .8  |
  .7  | 3
  .6  |
  .5  | 8
  .4  | 0 2 2 2 4
  .3  | 1 2 2 4 4 7 8
  .2  | 2
  .1  | 3 8 8
  .0  | 7 7 9
 -.0  | 5
 -.1  | 0
 -.2  |
 -.3  | 2
 -.4  | 0
 -.5  |
 -.6  |
 -.7  |
 -.8  |
 -.9  | 3
Tukey (1977) developed the stem-and-leaf plot as a special form
of frequency distribution to facilitate the inspection of a batch of
data. Each number in the data batch is made up of one stem and
one leaf, but each stem may serve several leaves. Thus, the stem .1
is followed by leaves of 3, 8, 8 representing the numbers .13, .18,
.18. The first digit is the stem; the next digit is the leaf. The
stem-and-leaf display functions as any other frequency distribution but
the original data are retained precisely.
Distribution of studies. From Table 5 we see that the distribution
of effect sizes is unimodal, with the bulk of the results (80%) falling
between -.10 and .58. The distribution is nicely symmetrical, with
the skewness index (g1 = .17) only 24% of that required for significance
at p < .05 (Snedecor & Cochran, 1980, pp. 78-79, 492). The
tails of the distribution, however, are too long for normality with
kurtosis index g2 = 2.04, p = .02. Relative to what we would expect
from a normal distribution, we have studies that show larger posi-
tive and larger negative effect sizes than would be reasonable. In-
deed, the two largest positive effect sizes are significant outliers at p
< .05, and the largest negative effect size approaches significance,
with a Dixon index of .37 compared to one of .40 for the largest
positive effect size (Snedecor & Cochran, 1980, pp. 279-280, 490).
The total sample of studies is still small; however, if a much larger
sample showed the same result, that would be a pattern consistent
with the idea that both strong positive results ("psi") and strong neg-
ative results ("psi-missing") might be more likely to find their way
into print or at least to be more available to a meta-analyst.
Distribution of subjects. It is useful to examine the distribution of
effect sizes obtained in the summarized studies. It would also be
useful to examine the distribution of effect sizes obtained by individual
subjects within the studies summarized. For example, in a
study with a mean h of .20, is the distribution of h fairly normal with
centering at .20, or is the distribution skewed with the bulk of the
subjects centered closer to zero but with a few subjects earning
consistently high values of h?
Distribution of investigators. Just as it is useful to examine the distribution
of the results of studies and of subjects within studies, it is
also useful to examine the distribution of results obtained by different
investigators (Honorton, 1985; Hyman, 1985; Rosenthal, 1969,
1984). The 28 direct hit studies were conducted by 10 different investigators
(Honorton, 1985, p. 60). Four investigators conducted
only one study each, two conducted two studies each, two conducted
three studies each, one conducted five studies, and one conducted
nine studies. Analysis of variance showed that these 10 investigators
differed significantly and importantly in the average magnitude of
the effects they obtained with F(9,18) = 3.81, p < .01, eta = .81.
Interestingly, there was little relationship between the mean effect
size obtained by each investigator and the number of studies conducted
(r = .11; t(8) = 0.31, p > .70).
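The two effect-size indices reported here can be recovered from the test statistics alone. The sketch below uses the standard conversions from F to eta and from r to t; the function names are hypothetical.

```python
from math import sqrt


def eta_from_f(f, df_between, df_within):
    """eta = sqrt(F * df_between / (F * df_between + df_within))."""
    return sqrt(f * df_between / (f * df_between + df_within))


def t_from_r(r, df):
    """t = r * sqrt(df) / sqrt(1 - r**2), with df = number of pairs minus 2."""
    return r * sqrt(df) / sqrt(1 - r ** 2)


eta = eta_from_f(3.81, 9, 18)  # 10 investigators, 28 studies
t = t_from_r(0.11, 8)          # 10 investigators, so df = 8

print(f"eta = {eta:.2f}")
print(f"t(8) = {t:.2f}")
```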
That different investigators may obtain significantly different results
from their subjects is well known in various areas of psychology
(Rosenthal, 1966). For example, in such a standard experimental
area as eyelid conditioning, studies conducted at Iowa obtained results
in the predicted direction 94% of the time, whereas those conducted
elsewhere obtained such results only 62% of the time with
χ²(1) = 4.05, p < .05, N = 25, r = .40 (Rosenthal, 1966, p. 24;
TABLE 6
STATISTICAL SUMMARY OF "DIRECT HIT" GANZFELD STUDIES
Central tendency (Cohen's h)             Variability
  Unweighted mean              .28         Maximum                 1.44
  Weighted mean                .23         Quartile 3 (Q3)          .42
  Median                       .32         Median (Q2)              .32
  Proportion positive sign     .82         Quartile 1 (Q1)          .08
                                           Minimum                 -.93
Significance tests                         Q3 - Q1                  .34
  Combined Stouffer z         6.60         σ̂: [.75 (Q3 - Q1)]      .26
  t test of mean z            3.23         S                        .45
  Z of proportion positive    3.40
                                         Correlation of h
Confidence intervals(a)                    With z                   .86
            From      To                   With raw j               .98
  80%        .17      .39
  95%        .11      .45
  99%        .04      .52
  99.9%     -.03      .59
(a) Based on N of 28 studies.
Summary of Stem-and-Leaf Display
Table 6 provides a summary of the stem-and-leaf display of Table 5
and some additional useful information about central tendency,
variability, significance tests, confidence intervals, and correlations
between Cohen's h and (a) significance level (z) and (b) raw
difference in proportions (j). Only a few comments are required.
Effect size. The bulk of the results (82%) show a positive effect
size where 50% would be expected under the null (p = .0004). The
mean effect size, h, of .28 is equivalent to having a direct hit rate of
.38 when .25 was expected under the null. The 95% confidence interval
suggests the likely range of effect sizes to be from .11 to .45,
equivalent to accuracy rates of .30 to .46 when .25 was expected
under the null hypothesis.
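The translation from h back to a hit rate can be sketched by inverting the arcsine transformation. The sketch assumes Cohen's definition of h as a difference of 2 x arcsin(√P) values and a null hit rate of .25; the function name is illustrative, and the inversion holds only while the transformed angle stays between 0 and π/2.

```python
from math import asin, sin, sqrt


def hit_rate_from_h(h, p_null=0.25):
    """Hit rate whose arcsine-transformed value lies h above that of p_null."""
    return sin(asin(sqrt(p_null)) + h / 2) ** 2


for label, h in [("mean h", 0.28), ("lower 95% limit", 0.11), ("upper 95% limit", 0.45)]:
    print(f"{label}: h = {h:.2f} -> hit rate about {hit_rate_from_h(h):.2f}")
```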
Significance testing. The overall probability that obtained accuracy
was better than the accuracy expected under the null was a p of
3.37/10^11 associated with a Stouffer z of 6.60 (Mosteller & Bush,
1954; Rosenthal, 1978a, 1984).
File-drawer analysis. A combined p as low as that obtained can be
used as a guide to the tolerance level for null results that never
found their way into the meta-analytic data base (Rosenthal, 1979,
1984). It has long been believed that studies failing to reach statistical
significance may be less likely to be published (Rosenthal, 1966;
Sterling, 1959). Thus it may be that there is a residual of nonsignificant
studies languishing in the investigators' file drawers. With simple
calculations, it can be shown that, for the current studies summarized,
there would have to be 423 studies with mean p = .50,
one-tailed, or z = 0.00 in those file drawers before the overall combined
p would become just > .05, as Honorton (1985) has pointed
out.
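The "simple calculations" can be sketched as follows, assuming the file-drawer tolerance is obtained by asking how many additional studies averaging z = 0.00 would pull the Stouffer z down to the one-tailed .05 criterion of 1.645. The function name is hypothetical.

```python
from math import ceil, sqrt


def file_drawer_tolerance(stouffer_z, k, z_crit=1.645):
    """Number of unretrieved studies with mean z = 0 that would bring the
    combined z of k retrieved studies down to z_crit."""
    sum_z = stouffer_z * sqrt(k)          # recover the sum of the k z's
    return (sum_z / z_crit) ** 2 - k      # solve sum_z / sqrt(k + X) = z_crit for X


tolerance = file_drawer_tolerance(6.60, 28)
print(f"studies needed in the file drawers: {ceil(tolerance)}")
```

With the Stouffer z of 6.60 from the 28 direct hit studies, this reproduces the figure of 423.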
That many studies unretrieved seems unlikely for this specialized
area of parapsychology (Honorton, 1985; Hyman, 1985). Based on
experience with meta-analyses in other domains of research (e.g.,
interpersonal expectancy effects) the mean z or effect size for non-
significant studies is not 0.00 but a value pulled strongly from 0.00
toward the mean z or mean effect size of the obtained studies (Ro-
senthal & Rubin, 1978).
Comparison with an Earlier Meta-Analysis
It is instructive to compare the results of the ganzfeld research
meta-analysis by Honorton (1985) with the results of an older and
larger meta-analysis of another controversial research domain-
that of interpersonal expectancy effects (Rosenthal & Rubin, 1978).
In that analysis, eight areas of expectancy effects were summarized;
effect sizes (Cohen's d, roughly equivalent to Cohen's h) ranged
from .14 to 1.73 with a grand mean d of .70. Honorton's mean ef-
fect size (h = .28) exceeds the mean d of two of the eight areas
(reaction time experiments [d = .17], and studies using laboratory
interviews [d = .14]).
The earlier meta-analysis displayed the distribution of the z's as-
sociated with the obtained p levels. Table 7 shows a comparison of
the two meta-analyses' distributions of z's. It is interesting to note
the high degree of similarity in the distributions of significance lev-
els. The total proportion of significant results is somewhat higher
for the ganzfeld studies but not significantly so (χ²(1) = 1.07, N =
373, p = .30, φ = .05).
TABLE 7
PROPORTION OF STUDIES REACHING CRITICAL LEVELS OF SIGNIFICANCE
FOR TWO RESEARCH AREAS
                         Expected      Expectancy     Ganzfeld
Interval for z           proportion    research(a)    research(b)   Difference
Unpredicted direction
  -1.65 and below          .05           .03            .07            .04
Not significant
  -1.64 to +1.64           .90           .60            .50           -.10
Predicted direction
  +1.65 and above          .05           .36            .43            .07
  +2.33 and above          .01           .19            .25            .06
  +3.09 and above          .001          .12            .18            .06
  +3.72 and above          .0001         .07            .04           -.03
(a) N = 345 studies; from Rosenthal & Rubin (1978).
(b) N = 28 studies; from Honorton (1985).
INTERPRETING THE META-ANALYTIC RESULTS
Although the results of the meta-analysis are clear, the meaning
of these results is open to various interpretations. The most obvious
interpretation might be that at a very low p, and with a fairly impressive
effect size, the ganzfeld psi phenomenon has been demonstrated.
However, there are rival hypotheses that will need to be
considered, many of them put forward in the detailed evaluation by
Hyman (1985).
Procedural Rival Hypotheses
Sensory leakage. A standard rival hypothesis to the hypothesis of
ESP is that sensory leakage occurred and that the receiver was
knowingly or unknowingly cued by the sender or by an intermediary
between the sender and receiver. As early as 1895, Hansen and
Lehmann (1895) described "unconscious whispering" in the laboratory,
and Kennedy (1938, 1939) was able to show that senders in
telepathy experiments could give auditory cues to their receivers
quite unwittingly. Ingenious use of parabolic sound reflectors made
this demonstration possible. Moll (1898), Stratton (1921), and Warner
and Raible (1937) all gave early warnings on the dangers of unintentional
cueing (for summaries see Rosenthal, 1965, 1966). The
subtle kinds of cues described by these early workers were just the
kind we have come to look for in searching for cues given off by
experimenters that might serve to mediate the experimenter expectancy
effects found in laboratory settings (Rosenthal, 1966,
1985).
By their nature, ganzfeld studies tend to minimize problems of
sensory cueing. An exception occurs when the subject is asked to
choose which of four (or more) stimuli has been "sent" by another
person or agent. When the same stimuli held originally by the
sender are shown to the receiver, finger smudges or other marks
may serve as cues. Honorton has shown, however, that studies con-
trolling for this type of cue yield at least as many significant effects
as do the studies not controlling for this type of cue.
Recording errors. A second rival hypothesis has nearly as long a
history. Kennedy and Uphoff (1939) and Sheffield and Kaufman
(1952) both found biased errors of recording the data of
parapsychological experiments. In a meta-analysis of 139,000 recorded
observations in 21 studies, it was found that about 1% of all observations
were in error and that, of the errors committed, twice as many
favored the hypothesis as opposed it (Rosenthal, 1978b). Although
it is difficult to rule recording errors out of ganzfeld studies (or any
other kind of research), their magnitude is such that they could
probably have only a small biasing effect on the estimated average
effect size (Rosenthal, 1978b, p. 1007).
Intentional error. The very recent history of science has reminded
us that even though fraud in science is not quite of epidemic pro-
portion, it must be given close attention (Broad & Wade, 1982;
Zuckerman, 1977). Fraud in parapsychological research has been a
constant concern, a concern found to be justified by periodic fla-
grant examples (Rhine, 1975). In the analyses of Hyman (1985) and
Honorton (1985), in any case, there appeared to be no relationship
between degree of monitoring of participants and the results of the
study.
Statistical Rival Hypotheses
File-drawer issues. The problem of biased retrieval of studies for
any meta-analysis was described earlier. Part of this problem is
addressed by the 10-year-old norm of the Parapsychological Association
of reporting negative results at its meetings and in its journals
(Honorton, 1985). Part of this problem is addressed also by Black-
more (1980), who conducted a survey to retrieve unreported ganz-
feld studies. She found that 7 of her total of 19 studies were judged
significant overall by the investigators. This proportion of significant
results (.37) was not significantly (or appreciably) lower than the
proportion of published studies found significant (.43) in Honorton's
(1985) meta-analysis of direct hit ganzfeld studies (χ²(1) =
0.17, φ = .06). Somewhat similar results were obtained by Sommer
(in press) in her analysis of research on the menstrual cycle. She
found 61% of the published results to be significant compared to
40% of the unpublished studies; χ²(1) = 2.30, p < .065, one-tailed,
φ = .20. The results of the Blackmore and Sommer studies did not
differ significantly (z = 0.69). Taken together, these studies provide
only modest evidence for a serious file-drawer problem.
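The comparison of Blackmore's retrieved studies with the published ones can be reproduced from a 2 x 2 table of counts. In the sketch below, the count of 12 significant studies among the 28 published ones is an assumption inferred from the stated proportion of .43; the function name is illustrative, and no continuity correction is applied.

```python
from math import sqrt


def chi_square_2x2(a, b, c, d):
    """Chi-square(1) and phi for the 2 x 2 table [[a, b], [c, d]], uncorrected."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, sqrt(chi2 / n)


# Rows: Blackmore's unreported studies (7 significant of 19),
#       published direct hit studies (12 significant of 28, assumed from .43)
chi2, phi = chi_square_2x2(7, 12, 12, 16)
print(f"chi-square(1) = {chi2:.2f}, phi = {phi:.2f}")
```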
A problem that seems to be a special case of the file-drawer
problem was pointed out by Hyman (1985). That was a possible
tendency to report the results of pilot studies along with subsequent
significant results when the pilot data were significant. At the same
time it is possible that pilot studies were conducted without
promising results, pilot studies that then found their way into the file
drawers. In any case, it is nearly impossible to have an accurate
estimate of the number of unretrieved studies or pilot studies actually
conducted. Chances seem good, however, that there would be fewer
than the 423 results of mean z = 0.00 required to bring the overall
combined p to > .05.
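The figure of 423 can be read through the fail-safe logic of Rosenthal (1979): how many unretrieved studies averaging z = 0.00 would pull the Stouffer combined z down to the one-tailed .05 critical value? The sketch below is illustrative only. The function names are hypothetical, and because the individual z scores are not reproduced in this passage, it works backward from the two numbers that are given (28 retrieved studies and 423 tolerated null results).

```python
import math

Z_CRIT = 1.645  # one-tailed .05 critical value of the standard normal


def fail_safe_n(sum_z, k, z_crit=Z_CRIT):
    """Number of unretrieved studies averaging z = 0 that would bring
    the Stouffer combined z of k retrieved studies down to z_crit."""
    return (sum_z / z_crit) ** 2 - k


def implied_sum_z(n_hidden, k, z_crit=Z_CRIT):
    """Sum of z scores implied by a stated fail-safe number."""
    return z_crit * math.sqrt(k + n_hidden)


k = 28          # retrieved studies (from the text)
n_hidden = 423  # tolerated null results (from the text)

sum_z = implied_sum_z(n_hidden, k)
combined_z = sum_z / math.sqrt(k)  # Stouffer z of the retrieved studies alone
print(round(combined_z, 2), round(fail_safe_n(sum_z, k)))
```

Under these assumptions the 28 studies would need a combined z of roughly 6.6 for the tolerance to reach 423, which shows how much unreported null work the stated figure presupposes.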
Multiple testing. Each ganzfeld study may have more than one
dependent variable for scoring degree of success. If investigators use
these dependent variables sequentially until they find one significant
at p < .05, the true p will be higher than .05 (Hyman, 1985). This
issue was discussed earlier; it is not an inherently intractable one
(Rosenthal & Rubin, 1986).
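To see how quickly the true p rises, the following sketch computes the chance that at least one of m measures reaches .05 under the null. It assumes the measures are independent, which overstates the inflation for the correlated scoring indices of a real ganzfeld study. The Bonferroni criterion is shown as one conventional remedy and is not claimed to be the specific procedure of the papers cited above; both function names are hypothetical.

```python
def familywise_alpha(m, alpha=0.05):
    """Chance that at least one of m independent tests reaches alpha
    when the null hypothesis is true for all of them."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(m, alpha=0.05):
    """Per-test criterion that keeps the overall rate at or below alpha."""
    return alpha / m


for m in (1, 2, 4, 6):
    print(m, round(familywise_alpha(m), 3), round(bonferroni_alpha(m), 4))
```

With four independent measures the nominal .05 becomes roughly .19, so a study scored this way would need each measure to reach about .0125 to keep the overall rate at .05.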
Randomization. Hyman (1985) has noted that the target stimulus
may not have been selected in a truly random way from the pool of
potential targets. To the extent that this is the case, the p values
calculated can be in error. Hyman (1985) and Honorton (1985)
disagree over the frequency in this sample of studies of improper
randomization. In addition, they disagree over the magnitude of the
relationship between inadequate randomization and study outcome.
Hyman felt this relationship to be significant and positive; Honorton
felt this relationship to be nonsignificant and negative. Because the
median p level of just those 16 studies using random number tables
or generators (z = .94) was essentially identical to that found for all
28 studies, it seems unlikely that poor randomization procedures
were associated with much of an increase in significance level
(Honorton, 1985, p. 71).
Statistical errors. Hyman (1985) and Honorton agree that 6 of the
28 studies contained statistical errors. However, the median effect
size of these studies (h = .33) was very similar to the overall median
(h = .32), so that it seems unlikely that these errors had a major
effect on the overall effect size estimate. Omitting these six studies
from the analysis decreases the mean h from .28 to .26. Such a drop
is equivalent to a drop of the mean accuracy rate from .38 to .37
when .25 is the expected value under the null.
A Tentative Inference
On the basis of the preceding summary and the very valuable
meta-analytic evaluations of Honorton (1985) and Hyman (1985),
what are we to believe? It would be easiest to say, "Let's wait until
more data have been accumulated from studies purged of the prob-
lems noted by Hyman, Honorton, and others." That is not a realistic
approach. At any point in time some judgment can be made, and
though our judgment might be more accurate later on when those
more nearly perfect studies become available, the situation for the
ganzfeld domain seems reasonably clear. We feel it would be im-
plausible to entertain the null given the combined p from these 28
studies. Given the various problems or flaws pointed out by Hyman
and Honorton, the true effect size is almost surely smaller than the
mean h of .28 equivalent to a mean accuracy of 38% when 25% is
expected under the null. We are persuaded that the net result of
statistical errors was a biased increase in estimated effect size of at
least a full percentage point (from 37% to 38%). Furthermore, we
are persuaded that file-drawer and related problems are such that
some of the smaller effect size results have probably been kept off
the market. If pressed to estimate a more accurate effect size, we
might think in terms of a shrinkage of h from the obtained value of
.28 to perhaps an h of .18. Thus, when the accuracy rate expected
under the null is 1/4, we might estimate the obtained accuracy rate
to be about 1/3.
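The correspondence between h and accuracy rate used above can be checked directly. The sketch assumes that h is the arcsine-difference index of Cohen (1977), h = 2 arcsin(sqrt(p1)) - 2 arcsin(sqrt(p0)), with p0 = .25 as the rate expected under the null; the function names are hypothetical.

```python
import math


def cohens_h(p1, p0):
    """Arcsine-difference effect size between two proportions."""
    return 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p0))


def accuracy_from_h(h, p0=0.25):
    """Accuracy rate that lies h above the null rate p0 (inverse of cohens_h)."""
    return math.sin(math.asin(math.sqrt(p0)) + h / 2.0) ** 2


for h in (0.28, 0.26, 0.18):
    print(h, round(accuracy_from_h(h), 3))
```

Under that assumption, h values of .28, .26, and .18 map to accuracy rates of about .38, .37, and .33, in line with the figures quoted in the text.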
CONCLUSION
Parapsychologists in particular and scientists in general owe a
great debt of gratitude to Ray Hyman (1985) and Charles Honorton
(1985) for their careful and extensive analytic and meta-analytic
work on the ganzfeld problem. Their debate has yielded an especially
high light/heat ratio, and many of the important issues have
now been brought out into bold relief.
In my commentary on the ganzfeld debate, I focused most
closely on the concept of replication. That seemed appropriate, not
only because of the centrality of the problem of replicability in the
parapsychological literature, but also because of the centrality of the
problem in many sciences, especially when the effect sizes sought in
the population are small. The effect size zero is only a special case
of the class of small effect sizes.
In closing I want only to suggest that parapsychological and
other behavioral sciences would be well served to modify their view
of the success of replication in the direction of the following newer
view:
1. A replication is successful to the degree that the second study
obtains an effect size similar to the effect size of the first study.
2. Three or more investigations are successful replicates of one
another to the extent that the effect sizes are homogeneous.
3. Significance testing has nothing to do with success of replication
though it can be useful in many ways, including the assessment
of the likelihood of the null given all prior research (weighted as
desired and as reasonable) and the likelihood of real differences
among the effect sizes of two or more studies.
REFERENCES
BLACKMORE, S. (1980). The extent of selective reporting of ESP ganzfeld
studies. European Journal of Parapsychology, 3, 213-219.
BROAD, W., & WADE, N. (1982). Betrayers of the truth. New York: Simon and
Schuster.
COHEN, J. (1977). Statistical power analysis for the behavioral sciences (rev. ed.).
New York: Academic Press.
COLLINS, H. M. (1985). Changing order: Replication and induction in scientific
practice. Beverly Hills, CA: Sage.
FISKE, D. W. (1978). The several kinds of generalization. The Behavioral and
Brain Sciences, 3, 393-394.
HANSEN, F. C. C., & LEHMANN, A. (1895). Ueber unwillkürliches Flüstern.
Philosophische Studien, 11, 471-530.
HARRIS, M. J., & ROSENTHAL, R. (1986). Interpersonal expectancy effects and
human performance research. Report prepared for the National Academy
of Sciences.
HONORTON, C. (1985). Meta-analysis of psi ganzfeld research: A response
to Hyman. Journal of Parapsychology, 49, 51-91.
HYMAN, R. (1985). The ganzfeld psi experiment: A critical appraisal. Journal
of Parapsychology, 49, 3-49.
KENNEDY, J. L. (1938). Experiments on "unconscious whispering." Psycho-
logical Bulletin, 35, 526. (Abstract)
KENNEDY, J. L. (1939). A methodological review of extra-sensory perception.
Psychological Bulletin, 36, 59-103.
KENNEDY, J. L., & UPHOFF, H. F. (1939). Experiments on the nature of
extra-sensory perception: III. The recording error criticism of extra-chance
scores. Journal of Parapsychology, 3, 226-245.
MOLL, A. (1898). Hypnotism (4th ed.). New York: Scribner.
MOSTELLER, F. M., & BUSH, R. R. (1954). Selected quantitative techniques.
In G. Lindzey (Ed.), Handbook of social psychology: Vol. 1. Theory and
method (pp. 289-334). Cambridge, MA: Addison-Wesley.
NELSON, N., ROSENTHAL, R., & ROSNOW, R. L. (1986). Interpretation of
significance levels and effect sizes by psychological researchers. American
Psychologist, 41, 1299-1301.
RAO, K. R. (1985). The ganzfeld debate. Journal of Parapsychology, 49, 1-2.
RHINE, J. B. (1975). Second report on a case of experimenter fraud. Journal
of Parapsychology, 39, 306-325.
ROSENTHAL, R. (1965). Clever Hans: A case study of scientific method. In
O. Pfungst, Clever Hans (pp. ix-xlii). New York: Holt, Rinehart and
Winston.
ROSENTHAL, R. (1966). Experimenter effects in behavioral research. New York:
Appleton-Century-Crofts.
ROSENTHAL, R. (1969). Interpersonal expectations. In R. Rosenthal & R. L.
Rosnow (Eds.), Artifact in behavioral research (pp. 181-277). New York:
Academic Press.
ROSENTHAL, R. (1978a). Combining results of independent studies. Psychological
Bulletin, 85, 185-193.
ROSENTHAL, R. (1978b). How often are our numbers wrong? American Psy-
chologist, 33, 1005-1008.
ROSENTHAL, R. (1979). The "file drawer problem" and tolerance for null
results. Psychological Bulletin, 86, 638-641.
ROSENTHAL, R. (1984). Meta-analytic procedures for social research. Beverly
Hills, CA: Sage.
ROSENTHAL, R. (1985). Nonverbal cues in the mediation of interpersonal
expectancy effects. In A. W. Siegman & S. Feldstein (Eds.), Multichannel
integrations of nonverbal behavior (pp. 105-128). Hillsdale, NJ: Lawrence
Erlbaum Associates.
ROSENTHAL, R., & GAITO, J. (1963). The interpretation of levels of significance
by psychological researchers. Journal of Psychology, 55, 33-38.
ROSENTHAL, R., & GAITO, J. (1964). Further evidence for the cliff effect in
the interpretation of levels of significance. Psychological Reports, 15, 570.
ROSENTHAL, R., & ROSNOW, R. L. (1984). Essentials of behavioral research:
Methods and data analysis. New York: McGraw-Hill.
ROSENTHAL, R., & RUBIN, D. B. (1978). Interpersonal expectancy effects:
The first 345 studies. The Behavioral and Brain Sciences, 3, 377-386.
ROSENTHAL, R., & RUBIN, D. B. (1979). Comparing significance levels of
independent studies. Psychological Bulletin, 86, 1165-1168.
ROSENTHAL, R., & RUBIN, D. B. (1982a). Comparing effect sizes of independent
studies. Psychological Bulletin, 92, 500-504.
ROSENTHAL, R., & RUBIN, D. B. (1982b). A simple, general purpose display
of magnitude of experimental effect. Journal of Educational Psychology,
74, 166-169.
ROSENTHAL, R., & RUBIN, D. B. (1983). Ensemble-adjusted p values. Psychological
Bulletin, 94, 540-541.
ROSENTHAL, R., & RUBIN, D. B. (1984). Multiple contrasts and ordered
Bonferroni procedures. Journal of Educational Psychology, 76, 1028-1034.
ROSENTHAL, R., & RUBIN, D. B. (1985). Statistical analysis: Summarizing
evidence versus establishing facts. Psychological Bulletin, 97, 527-529.
ROSENTHAL, R., & RUBIN, D. B. (1986). Meta-analytic procedures for combining
studies with multiple effect sizes. Psychological Bulletin, 99, 400-406.
SCHMEIDLER, G. R. (1968). Parapsychology. In International Encyclopedia
of the Social Sciences (pp. 386-390). New York: Macmillan & Free Press.
SHEFFIELD, F. D., KAUFMAN, R. S., & RHINE, J. B. (1952). A PK experiment
at Yale starts a controversy. Journal of the American Society for Psychical
Research, 46, 111-117.
SNEDECOR, G. W., & COCHRAN, W. G. (1980). Statistical methods (7th ed.).
Ames: Iowa State University Press.
SOMMER, B. (in press). The file drawer effect and publication rates in menstrual
cycle research. Psychology of Women Quarterly.
SPENCE, K. W. (1964). Anxiety (drive) level and performance in eyelid conditioning.
Psychological Bulletin, 61, 129-139.
STERLING, T. D. (1959). Publication decisions and their possible effects on
inferences drawn from tests of significance -- or vice versa. Journal of
the American Statistical Association, 54, 30-34.
STRATTON, G. M. (1921). The control of another person by obscure signs.
Psychological Review, 28, 301-314.
TRUZZI, M. (1981). Reflections on paranormal communication: A zetetic
perspective. In T. A. Sebeok & R. Rosenthal (Eds.), The Clever Hans
phenomenon (pp. 297-309). New York: New York Academy of Sciences.
TUKEY, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
WARNER, L., & RAIBLE, M. (1937). Telepathy in the psychophysical laboratory.
Journal of Parapsychology, 1, 44-51.
ZUCKERMAN, H. (1977). Deviant behavior and social control in science. In
E. Sagarin (Ed.), Deviance and social change (pp. 87-138). Beverly Hills,
CA: Sage.
Department of Psychology
Harvard University
Cambridge, MA 02138
REPLICATION AND META-ANALYSIS IN PARAPSYCHOLOGY
Jessica Utts
Division of Statistics
University of California, Davis
1. INTRODUCTION
In a June 1990 Gallup Poll, 49% of the 1,236 respondents claimed to believe in
extrasensory perception (ESP), and one in four claimed to have had a personal experience
involving telepathy (Gallup and Newport, 1991). Other surveys have shown even higher
percentages; the University of Chicago's National Opinion Research Council recently surveyed
1,473 adults, of which 67% claimed that they had experienced ESP (Greeley, 1987).
Public opinion is a poor arbiter of science, however, and experience is a poor substitute
for the scientific method. For more than a century, small numbers of scientists have been
conducting laboratory experiments to study phenomena such as telepathy, clairvoyance, and
precognition, collectively known as "psi" abilities. This paper will examine some of that work,
as well as some of the statistical controversies it has generated.
Parapsychology, as this field is called, has been a source of controversy throughout its
history. Strong beliefs tend to be resistant to change even in the face of data, and many people,
scientists included, seem to have made up their minds on the question without examining any
empirical data at all. A critic of parapsychology recently acknowledged that "The level of the
debate during the past 130 years has been an embarrassment for anyone who would like to
believe that scholars and scientists adhere to standards of rationality and fair play" (Hyman,
1985a, p.89). While much of the controversy has focused on poor experimental design and
potential fraud, there have been attacks and defenses of the statistical methods as well,
sometimes calling into question the very foundations of probability and statistical inference.
Most of the criticisms have been leveled by psychologists. For example, a 1988 report
of the U.S. National Academy of Sciences concluded that "The committee finds no scientific
justification from research conducted over a period of 130 years for the existence of
parapsychological phenomena" (Druckman and Swets, 1988, p. 22). The chapter on
parapsychology was written by a subcommittee chaired by a psychologist who had published a
similar conclusion prior to his appointment to the committee (Hyman, 1985a, p. 7). There were
no parapsychologists involved with the writing of the report. Resulting accusations of bias
(Palmer, Honorton and Utts, 1989) led U.S. Senator Claiborne Pell to request that the
Congressional Office of Technology Assessment (OTA) conduct an investigation with a more
balanced group. A one-day workshop was held on September 30, 1988 bringing together
parapsychologists, critics, and experts in some related fields (including the author of this paper).
The report concluded that parapsychology needs "a fairer hearing across a broader spectrum of
the scientific community, so that emotionality does not impede objective assessment of
experimental results" (Office of Technology Assessment, 1989).
It is in the spirit of the OTA report that this article is written. After Section 2, which
offers an anecdotal account of the role of statisticians and statistics in parapsychology, the
discussion turns to the more general question of replication of experimental results. Section 3
illustrates how replication has been (mis)interpreted by scientists in many fields. Returning to
parapsychology in Section 4, a particular experimental regime called the "ganzfeld" is described,
and an extended debate about the interpretation of the experimental results is discussed. Section
5 examines a meta-analysis of recent ganzfeld experiments designed to resolve the debate.
Finally, Section 6 contains a brief account of meta-analyses that have been conducted in other
areas of parapsychology, and conclusions are given in Section 7.
2. STATISTICS AND PARAPSYCHOLOGY
Parapsychology had its beginnings in the investigation of purported mediums and other
anecdotal claims in the late 19th century. The Society for Psychical Research was founded in
Britain in 1882, and its American counterpart was founded in Boston in 1884. While these
organizations and their members were primarily involved with investigating anecdotal material,
a few of the early researchers were already conducting "forced-choice" experiments such as
card-guessing. (Forced-choice experiments are like multiple choice tests; on each trial the
subject must guess from a small, known set of possibilities.) Notable among these was Nobel
Laureate Charles Richet, who is generally credited with being the first to recognize that
probability theory could be applied to card-guessing experiments (Rhine, 1977, p.26; Richet,
1884).
F.Y. Edgeworth, partly in response to what he considered to be incorrect analyses of
these experiments, offered one of the earliest treatises on the statistical evaluation of forced-
choice experiments in two articles published in the Proceedings of the Society for Psychical
Research (Edgeworth, 1885, 1886). Unfortunately, as noted by Mauskopf and McVaugh (1979)
in their historical account of the period, Edgeworth's papers were "perhaps too difficult for their
immediate audience" (p. 105).
Edgeworth began his analysis by using Bayes Theorem to derive the formula for the
posterior probability that chance was operating, given the data. He then continued with an
argument "savouring more of Bernoulli than Bayes" in which "it is consonant, I submit, to
experience, to put 1/2 both for α and β," i.e. for both the prior probability that chance alone was
operating, and the prior probability that "there should have been some additional agency." He
then reasoned (using a Taylor Series expansion of the posterior probability formula) that if there
were a large probability of observing the data given that some additional agency was at work,
and a small objective probability of the data under chance, then the latter (binomial) probability
"may be taken as a rough measure of the sought a posteriori probability in favour of mere
chance" (p. 195). Edgeworth concluded his article by applying his method to some data
published previously in the same journal. He found the probability against chance to be .99996,
which he said "may fairly be regarded as physical certainty" (p. 199). He concluded:
"Such is the evidence which the calculus of probabilities affords as to the
existence of an agency other than mere chance. The calculus is silent as to the
nature of that agency -- whether it is more likely to be vulgar illusion or
extraordinary law. That is a question to be decided, not by formulae and figures,
but by general philosophy and common sense" (p. 199).
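Edgeworth's approximation can be illustrated numerically. The sketch below applies Bayes' Theorem with equal priors of 1/2 and shows that, when the data are nearly certain under the "additional agency" hypothesis, the posterior probability of chance is close to the binomial probability of the data under chance. The likelihood values are hypothetical, chosen only so that the chance probability echoes the .99996 figure quoted above; they are not Edgeworth's actual inputs.

```python
def posterior_chance(p_data_chance, p_data_agency, prior_chance=0.5):
    """Posterior probability that chance alone was operating, given the data."""
    prior_agency = 1.0 - prior_chance
    numerator = p_data_chance * prior_chance
    return numerator / (numerator + p_data_agency * prior_agency)


# Hypothetical likelihoods, for illustration only.
p_chance = 0.00004
for p_agency in (1.0, 0.9, 0.5):
    print(p_agency, posterior_chance(p_chance, p_agency))
```

As the probability of the data under the alternative falls away from 1, the posterior for chance drifts above the raw binomial probability, which is why Edgeworth called the latter only "a rough measure."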
Both the statistical arguments and the experimental controls in these early experiments
were somewhat loose. For example, Edgeworth treated as binomial an experiment in which one
person chose a string of eight letters and another attempted to guess the string. Since it has long
been understood that people are poor random number (or letter) generators, there is no statistical
basis for analyzing such an experiment. Nonetheless, Edgeworth and his contemporaries set the
stage for the use of controlled experiments with statistical evaluation in laboratory
parapsychology.
One of the first American researchers to use statistical methods in parapsychology was
John Edgar Coover, who was the Thomas Welton Stanford Psychical Research Fellow, in the
Psychology Department at Stanford University, from 1912 to 1937 (Dommeyer, 1975). In 1917
Coover published a large volume summarizing his work (Coover, 1917). Coover believed that
his results were consistent with chance, but others have argued that Coover's definition of
significance was too strict (Dommeyer, 1975). For example, in one evaluation of his telepathy
experiments, Coover found a two-tailed p-value of .0062. He concluded "Since this value, then,
lies within the field of chance deviation, although the probability of its occurrence by chance is
fairly low, it cannot be accepted as a decisive indication of some cause beyond chance which
operated in favor of success in guessing" (Coover, 1917, p. 82). On the next page he made it
explicit that he would require a p-value of .0000221 to declare that something other than chance
was operating.
It was during the summer of 1930, with the card-guessing experiments of J.B. Rhine at
Duke University, that parapsychology began to take hold as a laboratory science. In fact,
Rhine's laboratory still exists under the name of the Foundation for Research on the Nature of
Man, housed at the edge of the Duke University campus.
It wasn't long after Rhine published his first book, Extrasensory Perception in 1934, that
the attacks on his methodology began. Since his claims were wholly based on statistical analyses
of his experiments, the statistical methods were closely scrutinized by critics anxious to find a
plausible explanation for Rhine's positive results.
The most persistent critic was a psychologist from McGill University named Chester
Kellogg (Mauskopf and McVaugh, 1979). Kellogg's main argument was that Rhine was using
the binomial distribution (and normal approximation) on a series of trials that were not
independent. The experiments in question consisted of having a subject guess the order of a
deck of 25 cards, with five each of five symbols, so technically Kellogg was correct.
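The size of the departure from the binomial model can be worked out exactly. The sketch below computes the variance of the number of hits for a fixed sequence of 25 guesses against a uniformly shuffled closed deck, by summing the covariances between trials. It assumes no trial-by-trial feedback and guesses drawn from the deck's own five symbols; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def closed_deck_hit_variance(guesses, per_symbol=5, symbols=5):
    """Exact variance of the hit count when `guesses` is matched against
    a uniformly shuffled deck holding `per_symbol` cards of each symbol."""
    n = per_symbol * symbols
    if len(guesses) != n:
        raise ValueError("one guess per card is required")
    p = Fraction(per_symbol, n)                       # P(hit) on any one trial
    p_same = p * Fraction(per_symbol - 1, n - 1)      # both hit, same symbol guessed
    p_diff = p * Fraction(per_symbol, n - 1)          # both hit, different symbols
    counts = Counter(guesses)
    same_pairs = sum(c * (c - 1) for c in counts.values())  # ordered pairs of trials
    diff_pairs = n * (n - 1) - same_pairs
    return (n * p * (1 - p)
            + same_pairs * (p_same - p * p)
            + diff_pairs * (p_diff - p * p))


binomial_variance = 25 * Fraction(1, 5) * Fraction(4, 5)
print(binomial_variance, closed_deck_hit_variance("ABCDE" * 5))
```

For a subject who calls each symbol five times, the variance is 25/6 (about 4.17) rather than the binomial value of 4, while the expected number of hits stays at 5. The difference is real but small, which is consistent with the later finding that the correction did little to change the significance of the results.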
By 1937 several mathematicians and statisticians had come to Rhine's aid. Mauskopf and
McVaugh (1979) speculated that since statistics was itself a young discipline, "a number of
statisticians were equally outraged by Kellogg, whose arguments they saw as discrediting their
profession" (p. 258). The major technical work, which acknowledged that Kellogg's criticisms
were accurate but did little to change the significance of the results, was conducted by Charles
Stuart and Joseph A. Greenwood and published in the first volume of the Journal of
Parapsychology (Stuart and Greenwood, 1937). Stuart, who had been an undergraduate in
mathematics at Duke, was one of Rhine's early subjects, and continued to work with him as a
researcher until Stuart's death in 1947. Greenwood was a Duke mathematician, who apparently
converted to a statistician at the urging of Rhine.
Another prominent figure who was distressed with Kellogg's attack was E. V.
Huntington, a mathematician at Harvard. After corresponding with Rhine, Huntington decided
that, rather than further confuse the public with a technical reply to Kellogg's arguments, a
simple statement should be made to the effect that the mathematical issues in Rhine's work had
been resolved. Huntington must have successfully convinced his former student, Burton Camp
of Wesleyan, that this was a wise approach. Camp was the 1937 President of IMS. When the
annual meetings were held in December of 1937 (jointly with AMS and AAAS), Camp released
a statement to the press that read:
"Dr. Rhine's investigations have two aspects: experimental and statistical. On the
experimental side mathematicians, of course, have nothing to say. On the
statistical side, however, recent mathematical work has established the fact that,
assuming that the experiments have been properly performed, the statistical
analysis is essentially valid. If the Rhine investigation is to be fairly attacked, it
must be on other than mathematical grounds" (Camp, 1937).
One statistician who did emerge as a critic was William Feller. In a talk at the Duke
Mathematical Seminar on April 24, 1940, Feller raised three criticisms to Rhine's work (Feller,
1940). They had been raised before by others (and continue to be raised even today). The first
was that inadequate shuffling of the cards resulted in additional information from one series to
the next. The second was what is now known as the "file-drawer effect," namely, that if one
combines the results of published studies only, there is sure to be a bias in favor of successful
studies. The third was that the results were enhanced by the use of optional stopping, i.e. by
not specifying the number of trials in advance. All three of these criticisms were addressed in
a rejoinder by Greenwood and Stuart (1940), but Feller was never convinced. Even in its third
edition published in 1968, his book An Introduction to Probability Theory and Its Applications
still contains his conclusion about Greenwood and Stuart: "Both their arithmetic and their
experiments have a distinct tinge of the supernatural" (Feller, 1968, p. 407). In his discussion
of Feller's position, Diaconis (1978) remarks, "I believe Feller was confused...he seemed to
have decided the opposition was wrong and that was that."
Several statisticians have contributed to the literature in parapsychology to greater or
lesser degrees. T.N.E. Greville devoted much of his professional life to developing statistical
methods for parapsychology; Fisher (1924, 1929) addressed some specific problems in card-
guessing experiments; Wilks (1965) described various statistical methods for parapsychology;
Lindley (1957) presented a Bayesian analysis of some parapsychology data; and Diaconis (1978)
pointed out some problems with certain experiments and presented a method for analyzing
experiments when feedback is given.
Occasionally, attacks on parapsychology have taken the form of attacks on statistical
inference in general, at least as it is applied to real data. Spencer-Brown (1957) attempted to
show that true randomness is impossible, at least in finite sequences, and that this could be the
explanation for the results in parapsychology. That argument re-emerged in a recent debate on
the role of randomness in parapsychology, initiated by psychologist J. Barnard Gilmore
(Gilmore, 1989; Utts, 1989a; Palmer, 1989; Gilmore, 1990; Palmer, 1990). Gilmore stated that
"The agnostic statistician, advising on research in psi, should take account of the possible
inappropriateness of classical inferential statistics" (1989, p.338). In his second paper, Gilmore
reviewed several non-psi studies showing purportedly random systems that do not behave as they
should under randomness (e.g. Iversen, Longcor, Mosteller, Gilbert, and Youtz, 1971; and
Spencer-Brown, 1957). Gilmore concluded that "Anomalous data ...should not be found nearly
so often if classical statistics offers a valid model of reality" (1990, p. 54), thus rejecting the use
of classical statistical inference for real-world applications in general.
3. REPLICATION
Implicit and explicit in the literature on parapsychology is the assumption that in order
to truly establish itself, the field needs to find a repeatable experiment. For example, Diaconis
(1978) starts the summary of his article in Science with the words "In search of repeatable ESP
experiments, modern investigators..." (p. 131). On October 28-29, 1983, the 32nd International
Conference of the Parapsychology Foundation was held in San Antonio, Texas, to address "The
Repeatability Problem in Parapsychology." The Conference Proceedings (Shapin and Coly,
1985) reflect the diverse views among parapsychologists on the nature of the problem. Honorton
(1985a) and Rao (1985), for example, both argued that strict replication is uncommon in most
branches of science, and that parapsychology should not be singled out as unique in this regard.
Other authors expressed disappointment in the lack of a single repeatable experiment in
parapsychology, with titles such as "Unrepeatability: Parapsychology's Only Finding"
(Blackmore, 1985), and "Research Strategies for Dealing with Unstable Phenomena" (Beloff,
1985).
It has never been clear, however, just exactly what would constitute acceptable evidence
of a repeatable experiment. In the early days of investigation, the major critics "insisted that
it would be sufficient for Rhine and Soal to convince them of ESP if a parapsychologist could
perform successfully a single 'fraud-proof' experiment" (Hyman, 1985a, p. 71). However, as
soon as well-designed experiments showing statistical significance emerged, the critics realized
that a single experiment could be statistically significant just by chance. British psychologist
C.E.M. Hansel quantified the new expectation, that the experiment should be repeated a few
times, as follows:
"If a result is significant at the .01 level and this result is not due to chance but
to information reaching the subject, it may be expected that by making two
further sets of trials the antichance odds of one hundred to one will be increased
to around a million to one, thus enabling the effects of ESP -- or whatever is
responsible for the original result -- to manifest itself to such an extent that there
will be little doubt that the result is not due to chance" (Hansel, 1980, p.298).
In other words, three consecutive experiments at p ≤ .01 would convince Hansel that something
other than chance was at work.
This argument implies that if a particular experiment produces a statistically significant
result, but subsequent replications fail to attain significance, then the original result was probably
due to chance, or at least remains unconvincing. The problem with this line of reasoning is that
there is no consideration given to sample size or power. Only an experiment with extremely
high power should be expected to be "successful" three times in succession.
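The point about power can be made concrete with a small calculation. The sketch below is illustrative only: it assumes an exact one-tailed binomial test at the .05 level and borrows the figures quoted later in Section 4.3 (28 trials, a chance hit rate of .25, and a true hit rate of .33). The function names are hypothetical and are not taken from any of the cited papers.

```python
from math import comb

def upper_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def critical_value(n, p0=0.25, alpha=0.05):
    """Smallest hit count c whose one-tailed p-value under p0 is at most alpha."""
    for c in range(n + 2):  # c = n + 1 gives an empty tail (probability 0)
        if upper_tail(c, n, p0) <= alpha:
            return c

def power(n, p_true, p0=0.25, alpha=0.05):
    """Probability that a single n-trial study reaches significance."""
    return upper_tail(critical_value(n, p0, alpha), n, p_true)

single = power(28, 0.33)  # assumed values, taken from Section 4.3
print("power of one study:", round(single, 3))
print("three significant results in a row:", round(single**3, 4))
```

Under these assumptions a single study succeeds less than one time in five, so three consecutive successes would be rare even if the effect were real.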
It is perhaps a failure of the way statistics is taught that many scientists do not understand
the importance of power in defining successful replication. To illustrate this point, psychologists
Tversky and Kahneman (1982) distributed a questionnaire to their colleagues at a professional
meeting, with the question:
"An investigator has reported a result that you consider implausible. He ran 15
subjects, and reported a significant value, t = 2.46. Another investigator has
attempted to duplicate his procedure, and he obtained a nonsignificant value of
t with the same number of subjects. The direction was the same in both sets of
data. You are reviewing the literature. What is the highest value of t in the
second set of data that you would describe as a failure to replicate?" (1982, p.
28).
In reporting their results, Tversky and Kahneman stated:
"The majority of our respondents regarded t = 1.70 as a failure to replicate. If
the data of two such studies (t = 2.46 and t = 1.70) are pooled, the value of t
for the combined data is about 3.00 (assuming equal variances). Thus, we are
faced with a paradoxical state of affairs, in which the same data that would
increase our confidence in the finding when viewed as part of the original study,
shake our confidence when viewed as an independent study" (1982, p. 28).
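The pooled value quoted above can be approximated with a short calculation. This is a rough sketch, assuming one-sample t statistics from studies of equal size with a common standard deviation, so that the pooled t is the sum of the individual t values divided by the square root of the number of studies. It ignores the change in degrees of freedom and any between-study difference in means. The function name is hypothetical.

```python
import math

def pooled_t_equal_n(t_values):
    """Approximate t for the pooled data of equal-n, equal-variance studies."""
    return sum(t_values) / math.sqrt(len(t_values))

# The two studies in the Tversky and Kahneman questionnaire
print(round(pooled_t_equal_n([2.46, 1.70]), 2))
```

The result is a little under 3, in line with the "about 3.00" reported in the quotation.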
At a recent presentation to the History and Philosophy of Science Seminar at the
University of California at Davis, I asked the following question. Two scientists, Professors A
and B, each have a theory they would like to demonstrate. Each plans to run a fixed number
of Bernoulli trials and then test H0: p = .25 versus Ha: p > .25. Professor A has access to
large numbers of students each semester to use as subjects. In his first experiment he runs 100
subjects, and there are 33 successes (p = .04, one-tailed). Knowing the importance of
replication, Professor A runs an additional 100 subjects as a second experiment. He finds 36
successes (p = .009, one-tailed).
Professor B only teaches small classes. Each quarter she runs an experiment on her
students to test her theory.
She carries out ten studies this way, with the following results:
Number of trials    Number of successes    One-tailed p-value
       10                    4                    .22
       15                    6                    .15
       17                    6                    .23
       25                    8                    .17
       30                   10                    .20
       40                   13                    .18
       18                    7                    .14
       10                    5                    .08
       15                    5                    .31
       20                    7                    .21
I asked the audience by a show of hands to indicate whether or not they felt the scientists
had successfully demonstrated their theories. Professor A's theory received overwhelming
support, with approximately 20 votes, while Professor B's theory received only one vote.
If you aggregate the results of the experiments for each Professor, you will notice that
each conducted 200 trials, and Professor B actually demonstrated a higher level of success than
Professor A, with 71 as opposed to 69 successful trials. The one-tailed p-values for the
combined trials are .0017 for Professor A and .0006 for Professor B.
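The individual and combined p-values can be checked with an exact binomial tail probability. The following is a minimal sketch, assuming each p-value is P(X >= successes) under a chance rate of .25. The helper name binom_upper_tail is hypothetical.

```python
from math import comb

def binom_upper_tail(successes, n, p0=0.25):
    """Exact one-tailed p-value: P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(successes, n + 1))

# (trials, successes) for Professor B's ten studies, from the table above
prof_b = [(10, 4), (15, 6), (17, 6), (25, 8), (30, 10),
          (40, 13), (18, 7), (10, 5), (15, 5), (20, 7)]

for n, x in prof_b:
    print(n, x, round(binom_upper_tail(x, n), 2))

total_n = sum(n for n, _ in prof_b)
total_x = sum(x for _, x in prof_b)
print("Professor B pooled:", total_x, "of", total_n,
      binom_upper_tail(total_x, total_n))
print("Professor A pooled:", 69, "of", 200, binom_upper_tail(69, 200))
```

None of Professor B's ten studies is significant on its own, yet her pooled result is stronger than Professor A's.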
To address the question of replication more explicitly, I also posed the following
scenario. In December of 1987 it was decided to prematurely terminate a study on the effects
of aspirin in reducing heart attacks because the data were so convincing (See e.g. Greenhouse
and Greenhouse, 1988; Rosenthal, 1990a). The physician-subjects had been randomly assigned
to take aspirin or a placebo. There were 104 heart attacks among the 11,037 subjects in the
aspirin group, and 189 heart attacks among the 11,034 subjects in the placebo group (chi-square
= 25.01, p< .00001).
After showing the results of that study, I presented the audience with two hypothetical
experiments conducted to try to replicate the original result, with outcomes as follows:
REPLICATION #1                        REPLICATION #2
             Heart Attack                          Heart Attack
             Yes      No                           Yes      No
Aspirin       11    1156              Aspirin       20    2314
Placebo       19    1090              Placebo       48    2170
Chi-square = 2.596, p = .11           Chi-square = 13.206, p = .0003
I asked the audience to indicate which one they thought was a more successful
replication. The audience chose the second one, as would most journal editors, because of the
"significant p-value". In fact, the first replication has almost exactly the same proportion of
heart attacks in the two groups as the original study, and is thus a very close replication of that
result. The second replication has very different proportions, and in fact the relative risk from
the second study is not even contained in a 95% confidence interval for relative risk from the
original study. The magnitude of the effect has been much more closely matched by the "non-
significant" replication.
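The comparison can be reproduced from the counts given for the three studies. The sketch below computes the Pearson chi-square (without continuity correction) and the relative risk for each 2 x 2 table. The 95% interval uses the usual log-scale normal approximation for a relative risk, which is my assumption, since the text does not say how the interval was computed. All function names are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def relative_risk(a, b, c, d):
    """Risk in the first row divided by risk in the second row."""
    return (a / (a + b)) / (c / (c + d))

def relative_risk_ci(a, b, c, d, z=1.96):
    """Approximate 95% interval for the relative risk, computed on the log scale."""
    rr = relative_risk(a, b, c, d)
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr * math.exp(-z * se), rr * math.exp(z * se)

# Rows: (aspirin yes, aspirin no, placebo yes, placebo no)
studies = {
    "original": (104, 11037 - 104, 189, 11034 - 189),
    "replication 1": (11, 1156, 19, 1090),
    "replication 2": (20, 2314, 48, 2170),
}

for name, cells in studies.items():
    print(name, round(chi_square_2x2(*cells), 3), round(relative_risk(*cells), 3))

print("95% interval from original study:", relative_risk_ci(*studies["original"]))
```

Under these assumptions the relative risk from the first replication falls inside the interval from the original study, and the relative risk from the second does not.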
Fortunately, psychologists are beginning to notice that replication is not as
straightforward as they were originally led to believe. A special issue of the Journal of Social
Behavior and Personality was entirely devoted to the question of replication (Neuliep, 1990).
In one of the articles, Rosenthal cautioned his colleagues: "Given the levels of statistical power
at which we normally operate, we have no right to expect the proportion of significant results
that we typically do expect, even if in nature there is a very real and very important effect"
(Rosenthal, 1990b, p.16).
Jacob Cohen, in his insightful article titled "Things I Have Learned (So Far)," identified
another misconception common among social scientists: "Despite widespread misconceptions to
the contrary, the rejection of a given null hypothesis gives us no basis for estimating the
probability that a replication of the research will again result in rejecting that null hypothesis"
(Cohen, 1990, p.1307).
Cohen and Rosenthal both advocate the use of effect sizes as opposed to significance
levels when defining the strength of an experimental effect. In general, effect sizes measure the
amount by which the data deviate from the null hypothesis in terms of standardized units. For
instance, the effect size for a two-sample t-test is usually defined to be the difference in the two
means, divided by the standard deviation for the control group. This measure can be compared
across studies without the dependence on sample size inherent in significance levels. (Of course
there will still be variability in the sample effect sizes, decreasing as a function of sample size.)
Comparison of effect sizes across studies is one of the major components of meta-analysis.
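As a small illustration of the two-sample definition just given, the sketch below divides the difference in sample means by the sample standard deviation of the control group. The data are invented for the example and the function name is hypothetical.

```python
from statistics import mean, stdev

def effect_size(treatment, control):
    """Difference in means, in units of the control group's standard deviation."""
    return (mean(treatment) - mean(control)) / stdev(control)

# Made-up scores, used only to show the calculation
print(effect_size([2, 3, 4], [1, 2, 3]))
```

Because the result is expressed in standard deviation units, it does not grow with the number of subjects the way a t statistic or p-value does.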
Similar arguments have recently been made in the medical literature. For example,
Gardner and Altman (1986) stated that the use of p-values "to define two alternative outcomes -
significant and not significant - is not helpful and encourages lazy thinking" (p. 746). They
advocated the use of confidence intervals instead.
As discussed in the next section, the arguments used to conclude that parapsychology has
failed to demonstrate a replicable effect hinge on these misconceptions of replication and failure
to examine power. A more appropriate analysis would compare the effect sizes for similar
experiments across experimenters and across time to see if there have been consistent effects of
the same magnitude. Rosenthal also advocates this view of replication:
"The traditional view of replication focuses on significance level as the relevant
summary statistic of a study and evaluates the success of a replication in a
dichotomous fashion. The newer, more useful view of replication focuses on
effect size as the more important summary statistic of a study and evaluates the
success of a replication not in a dichotomous but in a continuous fashion"
(Rosenthal, 1990b, p. 28).
The dichotomous view of replication has been used throughout the history of
parapsychology, by both parapsychologists and critics (Utts, 1988). For example, the National
Academy of Sciences Report critically evaluated "significant" experiments, but entirely ignored
"nonsignificant" experiments.
In the next three sections we will examine some of the results in parapsychology using
the broader, more appropriate definition of replication. In doing so, we will show that the
results are far more interesting than the critics would have us believe.
4. THE GANZFELD DEBATE IN PARAPSYCHOLOGY
An extensive debate took place in the mid-1980's between a parapsychologist and a critic,
questioning whether or not a particular body of parapsychological data had demonstrated psi
abilities. The experiments in question were all conducted using the ganzfeld setting (described
below). Several authors were invited to write commentaries on the debate. As a result, this
data base has been more thoroughly analyzed by both critics and proponents than any other, and
provides a good source for studying replication in parapsychology.
The debate concluded with a detailed series of recommendations for further experiments,
and left open the question of whether or not psi abilities had been demonstrated. A new series
of experiments that followed the recommendations was conducted over the next few years. The
results of the new experiments will be presented in Section 5.
4.1 Free-response Experiments
Recent experiments in parapsychology tend to use more complex target material than the
cards and dice used in the early investigations, partially to alleviate boredom on the part of the
subjects and partially because they are thought to "more nearly resemble the conditions of
spontaneous psi occurrences" (Burdick and Kelly, 1977, p. 109). These experiments fall under
the general heading of "free-response" experiments, because the subject is asked to give a verbal
or written description of the target, rather than being forced to make a choice from a small
discrete set of possibilities. Various types of target material have been used, including pictures,
short segments of movies on video tapes, actual locations, and small objects.
Despite the more complex target material, the statistical methods used to analyze these
experiments are similar to those for forced-choice experiments. A typical experiment proceeds
as follows. Before conducting any trials, a large pool of potential targets is assembled, usually
in packets of four. Similarity of targets within a packet is kept to a minimum, for reasons made
clear below. At the start of an experimental session, after the subject is sequestered in an
isolated room, a target is selected at random from the pool. A sender is placed in another room
with the target. The subject is asked to provide a verbal or written description of what he or
she thinks is in the target, knowing only that it is a photograph, an object, etc.
After the subject's description has been recorded and secured against the potential for
later alteration, a judge (who may or may not be the subject) is given a copy of the subject's
description and the four possible targets that were in the packet with the correct target. A
properly conducted experiment either uses video tapes or has two identical sets of target material
and uses the duplicate set for this part of the process, to ensure that clues such as fingerprints
don't give away the answer. Based on the subject's description, and of course on a blind basis,
the judge is asked to either rank the four choices from most to least likely to have been the
target, or to select the one from the four that seems to best match the subject's description. If
ranks are used, the statistical analysis proceeds by summing the ranks over a series of trials and
comparing the sum to what would be expected by chance. If the selection method is used, a
"direct hit" occurs if the correct target is chosen, and the number of direct hits over a series of
trials is compared to the number expected in a binomial experiment with p = .25.
Note that the subjects' responses cannot be considered to be "random" in any sense, so
probability assessments are based on the random selection of the target and decoys. In a
correctly designed experiment, the probability of a direct hit by chance is .25 on each trial,
regardless of the response, and the trials are independent. These and other issues related to
analyzing free-response experiments are discussed by Utts (1989b).
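The sum-of-ranks analysis described above can be sketched as follows. Under the null hypothesis the rank given to the correct target is equally likely to be 1, 2, 3 or 4 on each trial, so each rank has mean 2.5 and variance 1.25. The normal approximation to the sum is my simplifying assumption; an exact distribution could be used instead. The function names and the example ranks are hypothetical.

```python
import math

def sum_of_ranks_z(ranks, n_choices=4):
    """z-score of the observed rank sum against its chance expectation."""
    n = len(ranks)
    mean_rank = (n_choices + 1) / 2       # 2.5 for four choices
    var_rank = (n_choices ** 2 - 1) / 12  # 1.25 for four choices
    return (sum(ranks) - n * mean_rank) / math.sqrt(n * var_rank)

def lower_tail_p(z):
    """P(Z <= z); small rank sums (correct target ranked high) indicate success."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Invented ranks assigned to the correct target over 12 trials
ranks = [1, 2, 1, 3, 1, 4, 2, 1, 2, 3, 1, 2]
z = sum_of_ranks_z(ranks)
print(round(z, 2), round(lower_tail_p(z), 3))
```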
4.2 The Psi Ganzfeld Experiments
The ganzfeld procedure is a particular kind of free-response experiment utilizing a
perceptual isolation technique originally developed by Gestalt psychologists for other purposes.
Evidence from spontaneous case studies and experimental work had led parapsychologists to a
model proposing that psychic functioning may be masked by sensory input and by inattention
to internal states (Honorton, 1977). The ganzfeld procedure was specifically designed to test
whether or not reduction of external "noise" would enhance psi performance.
In these experiments, the subject is placed in a comfortable reclining chair in an
acoustically shielded room. To create a mild form of sensory deprivation, the subject wears
headphones through which white noise is played, and stares into a constant field of red light.
This is achieved by taping halved translucent ping-pong balls over the eyes and then illuminating
the room with red light. In the psi ganzfeld experiment, the subject speaks into a microphone
and attempts to describe the target material being observed by the sender in a distant room.
At the 1982 Annual Meeting of the Parapsychological Association, a debate took place
over the degree to which the results of the psi ganzfeld experiments constituted evidence of psi
abilities. Psychologist and critic Ray Hyman and parapsychologist Charles Honorton each
analyzed the results of all known psi ganzfeld experiments to date, and reached strikingly
different conclusions. The debate continued with the publication of their arguments in separate
articles in the March 1985 issue of the Journal of Parapsychology. Finally, in the December
1986 issue of the Journal of Parapsychology, Hyman and Honorton wrote a joint article in which
they highlighted their agreements and disagreements, and outlined detailed criteria for future
experiments. That same issue contained commentaries on the debate by ten other authors.
The data base analyzed by Hyman and Honorton consisted of results taken from 34
reports written by a total of 47 authors. Honorton counted 42 separate experiments described
in the reports, of which 28 reported enough information to determine the number of direct hits
achieved. Twenty three of the studies (55%) were classified by Honorton as having achieved
statistical significance at .05.
4.3 The Vote-Counting Debate
Vote-counting is the term commonly used for the technique of drawing inferences about
an experimental effect by counting the number of significant versus non-significant studies of
the effect. Hedges and Olkin (1985) give a detailed analysis of the inadequacy of this method,
showing that it is more and more likely to make the wrong decision as the number of studies
increases. While Hyman acknowledged that "vote-counting raises many problems" (Hyman,
1985b, p.8), he nonetheless spent half of his critique of the ganzfeld studies showing why
Honorton's count of 55% was wrong.
Hyman's first complaint was that several of the studies contained multiple conditions,
each of which should be considered as a separate study. Using this definition he counted 80
studies (thus further reducing the sample sizes of the individual studies), of which 25 (31%)
were "successful." Honorton's response to this was to invite readers to examine the studies and
decide for themselves if the varying conditions constituted separate experiments.
Hyman next postulated that there was selection bias, so that significant studies were more
likely to be reported. He raised some important issues about how pilot studies may be
terminated and not reported if they don't show significant results, or may at least be subject to
optional stopping, allowing the experimenter to determine the number of trials. He also
presented a chi-square analysis that "suggests a tendency to report studies with a small sample
only if they have significant results" (Hyman, 1985b, p.14), but I have questioned his analysis
elsewhere (Utts, 1986, p. 397).
Honorton refuted Hyman's argument with four rejoinders (Honorton, 1985b, p.66). In
addition to reinterpreting Hyman's chi-square analysis, Honorton pointed out that the
Parapsychological Association has an official policy encouraging the publication of non-
significant results in its journals and proceedings, that a large number of reported ganzfeld
studies did not achieve statistical significance, and that there would have to be 15 studies in the
"file-drawer" for every one reported to cancel out the observed significant results.
The remainder of Hyman's vote-counting analysis consisted of showing that the effective
error rate for each study was actually much higher than the nominal 5%. For example, each
study could have been analyzed using the direct hit measure, the sum of ranks measure, or one
of two other measures used for free-response analyses. Hyman carried out a simulation study
that showed the true error rate would be .22 if "significance" was defined by requiring at least
one of these four measures to achieve the .05 level. He suggested several other ways in which
multiple testing could occur, and concluded that the effective error rate in each experiment was
not the nominal .05, but rather was probably close to the 31% he had determined to be the
actual success rate in his vote-count.
Honorton acknowledged that there was a multiple testing problem, but he had a two-fold
response. First, he applied a Bonferroni correction and found that the number of significant
studies (using his definition of a study) only dropped from 55% to 45%. Next, he proposed that
a uniform index of success be applied to all studies. He used the number of direct hits, since
it was by far the most commonly reported measure and was the measure used in the first
published psi ganzfeld study. He then conducted a detailed analysis of the 28 studies reporting
direct hits and found that 43% were significant at .05 on that measure alone. Further, he
showed that significant effects were reported by six of the 10 independent investigators, and
thus were not due to just one or two investigators or laboratories. He also noted that success
rates were very similar for reports published in refereed journals and those published in
unrefereed monographs and abstracts.
While Hyman's arguments identified issues such as selective reporting and optional
stopping that should be considered in any meta-analysis, the dependence of significance levels
on sample size makes the vote-counting technique almost useless for assessing the magnitude of
the effect. Consider for example the 24 studies where the direct hit measure was reported and
the chance probability of a direct hit was .25, the most common type of study in the data base.
(There were 4 direct hit studies with other chance probabilities and 14 that did not report direct
hits.) Of the 24 studies, 13 (54%) were "nonsignificant" at α = .05, one-tailed. But if the 367
trials in these "failed replications" are combined, there are 106 direct hits, z = 1.66, and p =
.0485, one-tailed. This is reminiscent of the dilemma of Professor B in Section 3.
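The combined figure for the 13 "nonsignificant" studies can be checked with a normal approximation to the binomial. The sketch below assumes a continuity correction of one half, which is my assumption: it reproduces the reported z of 1.66, but the text does not state the method. The function names are hypothetical.

```python
import math

def direct_hit_z(hits, trials, p0=0.25, continuity=True):
    """Normal-approximation z-score for a count of direct hits."""
    expected = trials * p0
    sd = math.sqrt(trials * p0 * (1 - p0))
    correction = 0.5 if continuity else 0.0
    return (hits - correction - expected) / sd

def upper_tail_p(z):
    """One-tailed p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = direct_hit_z(106, 367)
print(round(z, 2), round(upper_tail_p(z), 4))
```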
Power is typically very low for these studies. The median sample size for the studies
reporting direct hits was 28. If there is a real effect and it increases the success probability from
the chance .25 to an actual .33 (a value whose rationale will be made clear below), the power
for a study with 28 trials is only .181 (Utts, 1986). It should be no surprise that there is a
"repeatability" problem in parapsychology.
4.4 Flaw Analysis and Future Recommendations
The second half of Hyman's paper consisted of a "Meta-Analysis of Flaws and
Successful Outcomes" (1985b, p. 30), designed to explore whether or not various measures of
success were related to specific flaws in the experiments. While many critics have argued that
the results in parapsychology can be explained by experimental flaws, Hyman's analysis was the
first to attempt to quantify the relationship between flaws and significant results.
Hyman identified 12 potential flaws in the ganzfeld experiments, such as inadequate
randomization, multiple tests used without adjusting the significance level (thus inflating the
significance level from the nominal 5%), and failure to use a duplicate set of targets for the
judging process (thus allowing possible clues such as fingerprints). Using cluster and factor
analyses, the 12 binary flaw variables were combined into three new variables, which Hyman
named General Security, Statistics and Controls.
Several analyses were then conducted. The one reported with the most detail is a factor
analysis utilizing 17 variables for each of 36 studies. Four factors emerged from the analysis.
From these, Hyman concluded that security had increased over the years, that the significance
level tended to be inflated the most for the most complex studies, and that both effect size and
level of significance were correlated with the existence of flaws.
Following his factor analysis, Hyman picked the three flaws that seemed to be most
highly correlated with success, which were inadequate attention to both randomization and
documentation, and the potential for ordinary communication between the sender and receiver.
A regression equation was then computed using each of the three flaws as dummy variables, and
the effect size for the experiment as the dependent variable. From this equation, Hyman
concluded that a study without these three flaws would be predicted to have a hit rate of 27%.
He concluded that this is "well within the statistical neighborhood of the 25% chance rate" (ibid,
p. 37), and thus "the ganzfeld psi data base, despite initial impressions, is inadequate either to
support the contention of a repeatable study or to demonstrate the reality of psi" (ibid p. 38).
Honorton discounted both Hyman's flaw classification and his analysis. He did not deny
that flaws existed, but objected that Hyman's analysis was faulty and impossible to interpret.
Honorton asked psychometrician David Saunders to write an Appendix to his article, evaluating
Hyman's analysis. Saunders first criticized Hyman's use of a factor analysis with 17 variables
(many of which were dichotomous) and only 36 cases, and concluded that "the entire analysis
is meaningless" (Saunders, 1985, p.87). He then noted that Hyman's choice of the three flaws
to include in his regression analysis constituted a clear case of multiple analysis, since there were
84 possible sets of three that could have been selected (out of nine potential flaws), and Hyman
chose the set most highly correlated with effect size. Again, Saunders concluded that "any
interpretation drawn from [the regression analysis] must be regarded as meaningless" (ibid, p.
88).
Hyman's results were also contradicted by Harris and Rosenthal (1988b) in an analysis
requested by Hyman in his capacity as Chair of the National Academy of Sciences'
Subcommittee on Parapsychology. Using Hyman's flaw classifications and a multivariate
analysis, Harris and Rosenthal concluded that "Our analysis of the effects of flaws on study
outcome lends no support to the hypothesis that ganzfeld research results are a significant
function of the set of flaw variables" (1988b, p. 3).
Hyman and Honorton were in the process of preparing papers for a second round of
debate when they were invited to lunch together at the 1986 Meeting of the Parapsychological
Association. They discovered that they were in general agreement on several major issues, and
decided to coauthor a "Joint Communique" (Hyman and Honorton, 1986). It is clear from their
paper that they both thought it was more important to set the stage for future experimentation
than to continue the technical arguments over the current data base. In the abstract to their
paper they wrote:
"We agree that there is an overall significant effect in this data base that cannot
reasonably be explained by selective reporting or multiple analysis. We continue to
differ over the degree to which the effect constitutes evidence for psi, but we agree that
the final verdict awaits the outcome of future experiments conducted by a broader range
of investigators and according to more stringent standards" (Ibid, p. 351).
The paper then outlined what these standards should be. They included controls against
any kind of sensory leakage, thorough testing and documentation of randomization methods used,
better reporting of judging and feedback protocols, control for multiple analyses, and advance
specification of number of trials and type of experiment. Indeed, any area of research could
benefit from such a careful list of procedural recommendations.
4.5 Rosenthal's Meta-Analysis
The same issue of the Journal of Parapsychology in which the Joint Communique
appeared also carried commentaries on the debate by 10 separate authors. In his commentary,
psychologist Robert Rosenthal, one of the pioneers of meta-analysis in psychology, summarized
the aspects of Hyman's and Honorton's work that would typically be included in a meta-analysis
(Rosenthal, 1986). It is worth reviewing Rosenthal's results so that they can be used as a basis
of comparison for the more recent psi ganzfeld studies reported in Section 5.
Rosenthal, like Hyman and Honorton, focused only on the 28 studies for which
direct hits were known. He chose to use an effect size measure called Cohen's h, which is the
difference between the arcsin transformed proportions of direct hits that were observed and
expected:
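$$h = 2\left(\arcsin\sqrt{\hat{p}} - \arcsin\sqrt{p}\right)$$
where $\hat{p}$ is the observed proportion of direct hits and $p$ is the proportion expected by chance.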
One advantage of this measure over the difference in raw proportions is that it can be used to
compare experiments with different chance hit rates.
If the observed and expected numbers of hits were identical, the effect size would be
zero. Of the 28 studies, 23 (82%) had effect sizes greater than zero, with a median effect size
of .32 and a mean of .28. These correspond to direct hit rates of .40 and .38 respectively, when
.25 is expected by chance. A 95% confidence interval for the true effect size is from .11 to .45,
corresponding to direct hit rates of from .30 to .46 when chance is .25.
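The correspondence between effect sizes and hit rates quoted above can be checked directly from the definition of Cohen's h, that is, twice the difference between the arcsine of the square root of the observed proportion and the arcsine of the square root of the expected proportion. The sketch below also inverts the formula to recover a hit rate from a given h. The function names are hypothetical.

```python
import math

def cohens_h(p_obs, p_exp):
    """Cohen's h: difference of arcsine-transformed proportions, times two."""
    return 2 * (math.asin(math.sqrt(p_obs)) - math.asin(math.sqrt(p_exp)))

def hit_rate_from_h(h, p_exp=0.25):
    """Observed hit rate implied by effect size h at a given chance rate."""
    return math.sin(h / 2 + math.asin(math.sqrt(p_exp))) ** 2

# Hit rates of .40 and .38 against a chance rate of .25
print(round(cohens_h(0.40, 0.25), 2), round(cohens_h(0.38, 0.25), 2))

# Ends of the 95% confidence interval for h, converted back to hit rates
print(round(hit_rate_from_h(0.11), 2), round(hit_rate_from_h(0.45), 2))
```

Because h is defined on the transformed scale, the same function applies unchanged to the four studies whose chance hit rate was not .25.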
A common technique in meta-analysis is to calculate a "combined z," found by summing
the individual z scores and dividing by the square root of the number of studies. The result
should have a standard normal distribution if each z score has a standard normal distribution.
For the ganzfeld studies, Rosenthal reported a combined z of 6.60 with a p-value of 3.37 x
He also reiterated Honorton's file-drawer assessment by calculating that there would have to be
423 studies unreported to negate the significant effect in the 28 direct hit studies.
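Both the combined z and the file-drawer figure can be sketched in a few lines. The combined z follows the definition given in the text. For the file-drawer count I assume the standard "fail-safe N" calculation, in which the unreported studies are taken to average z = 0 and the combined result is required to stay at the one-tailed .05 level (z = 1.645). That the figure of 423 was obtained this way is my assumption, although the calculation does reproduce it. The function names are hypothetical.

```python
import math

def stouffer_z(z_scores):
    """Combined z: sum of the z scores over the square root of their number."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def fail_safe_n(combined_z, k, z_alpha=1.645):
    """Number of unreported null studies needed to pull the combined z
    of k reported studies down to z_alpha."""
    sum_z = combined_z * math.sqrt(k)
    return sum_z ** 2 / z_alpha ** 2 - k

n_hidden = fail_safe_n(6.60, 28)
print(round(n_hidden), round(n_hidden / 28, 1))
```

The second number printed is the count of unreported studies per reported study, which is comparable to the ratio of 15 to one cited by Honorton in Section 4.3.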
Finally, Rosenthal acknowledged that because of the flaws in the data base and the
potential for at least a small file drawer effect, the true average effect size was probably closer
to .18 than .28. He concluded, "Thus, when the accuracy rate expected under the null is 1/4,
we might estimate the obtained accuracy rate to be about 1/3" (Ibid, p. 333). This is the value
used for the earlier power calculation.
It is worth mentioning that Rosenthal was commissioned by the National Academy of
Sciences to prepare a background paper to accompany its 1988 report on parapsychology. That
paper (Harris and Rosenthal, 1988a) contained much of the same analysis as his commentary
summarized above. Ironically, the discussion of the ganzfeld work in the National Academy
Report focused on Hyman's 1985 analysis, but never mentioned the work it had commissioned
Rosenthal to perform, which contradicted the final conclusion in the report.
5. A META-ANALYSIS OF RECENT GANZFELD EXPERIMENTS
After the initial exchange with Hyman at the 1982 Parapsychological Association
Meeting, Honorton and his colleagues developed an automated ganzfeld experiment that was
designed to eliminate the methodological flaws identified by Hyman. The execution and
reporting of the experiments followed the detailed guidelines agreed upon by Hyman and
Honorton.
Using this "autoganzfeld" experiment, eleven experimental series were conducted by eight
experimenters between February 1983 and September 1989, when the equipment had to be
dismantled due to lack of funding. In this section the results of these experiments are
summarized and compared to the earlier ganzfeld studies. Much of the information is derived
from Honorton et al (1990).
5.1 The Automated Ganzfeld Procedure
Like earlier ganzfeld studies, the "autoganzfeld" experiments require four participants.
The first is the Receiver (R), who attempts to identify the target material being observed by the
Sender (S). The Experimenter (E) prepares R for the task, elicits the response from R, and
supervises R's judging of the response against the four potential targets. (Judging is double-
blind; E does not know which is the correct target.) The fourth participant is the lab assistant
(LA) whose only task is to instruct the computer to randomly select the target. No one involved
in the experiment knows the identity of the target.
Both R and S are sequestered in sound-isolated, electrically shielded rooms. R is
prepared as in earlier ganzfeld studies, with white noise and a field of red light. In a
non-adjacent room, S watches the target material on a television and can hear R's target
description ("mentation") as it is being given. The mentation is also tape-recorded.
The judging process takes place immediately after the 30 minute sending period. On a
TV monitor in the isolated room, R views the four choices from the target pack that contains
the actual target. R is asked to rate each one according to how closely it matches the ganzfeld
mentation. The ratings are converted to ranks, and if the correct target is ranked first, a direct
hit is scored. The entire process is automatically recorded by the computer. The computer then
displays the correct choice to R as feedback.
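The scoring step described above can be sketched in a few lines of Python. This is only an illustration: the function names, the example ratings, and the tie-handling rule (ties counted against the target) are assumptions and are not taken from Honorton et al (1990).

```python
def rank_of_target(ratings, target_index):
    """Rank of the target among the four choices (1 = highest rating).

    Assumption: a tie with a decoy is counted against the target.
    """
    target_rating = ratings[target_index]
    better_or_equal = sum(
        1
        for i, rating in enumerate(ratings)
        if i != target_index and rating >= target_rating
    )
    return 1 + better_or_equal


def is_direct_hit(ratings, target_index):
    """A direct hit is scored only when the target is ranked first."""
    return rank_of_target(ratings, target_index) == 1


# Hypothetical ratings given by R to the four choices in one target pack.
example_ratings = [10, 35, 20, 5]
print(is_direct_hit(example_ratings, target_index=1))
print(rank_of_target(example_ratings, target_index=2))
```

Under the null hypothesis the randomly selected target is equally likely to be any of the four choices, so the chance of a direct hit is 1/4 whatever the ratings happen to be.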
There were 160 pre-selected targets, used with replacement, in ten of the eleven series.
They were arranged in packets of 4, and the decoys for a given target were always the remaining
three in the same set. Thus, even if a particular target in a set were consistently favored by R's,
the probability of a direct hit under the null hypothesis would remain at 1/4. Popular targets
should be no more likely to be selected by the computer's random number generator than any
of the others in the set. The selection of the target by the computer is the only source of
randomness in these experiments. This is an important point, and one that is often
misunderstood. (See Utts, 1989b for elucidation.)
Eighty of the targets were "dynamic," consisting of scenes from movies, documentaries
and cartoons; and 80 were "static", consisting of photographs, art prints, and advertisements.
The four targets within each set were all of the same type. Earlier studies indicated that
dynamic targets were more likely to produce successful results, and one of the goals of the new
experiments was to test that theory.
The randomization procedure used to select the target and the order of presentation for
judging was thoroughly tested before and during the experiments. A detailed description is
given by Honorton et al (1990, p. 118-120).
Three of the eleven series were pilot series, five were formal series with novice
receivers, and three were formal series with experienced receivers. The last series with
experienced receivers was the only one that did not use the 160 targets. Instead, it used only
one set of four dynamic targets in which one target had previously received several first place
ranks, and one had never received a first place rank. The receivers, none of whom had had
prior exposure to that target pack, were not aware that only one target pack was being used.
They each contributed one session only to the series. This will be called the "special series" in
what follows.
Except for two of the pilot series, numbers of trials were planned in advance for each
series. Unfortunately, three of the formal series were not yet completed when the funding ran
out, including the special series, and one pilot study with advance planning was terminated early
when the experimenter relocated. There were no unreported trials during the six year period
under review, so there was no "file-drawer".
Overall, there were 183 R's who contributed only one trial and 58 who contributed more
than one, for a total of 241 participants and 355 trials. Only twenty three R's had previously
participated in ganzfeld experiments and 194 R's (81%) had never participated in any
parapsychological research.
5.2 Results
While acknowledging that no probabilistic conclusions can be drawn from qualitative
data, Honorton et al (1990) included several examples of session excerpts that R's identified as
providing the basis for their target rating. To give a flavor for the dream-like quality of the
mentation and the amount of information that can be lost by only assigning a rank, the first
example is reproduced here. The target was a painting by Salvador Dali called "Christ
Crucified." The correct target received a first place rank. The part of the mentation R used to
make this assessment read:
"... I think of guides, like spirit guides, leading me and I come into a court with a king.
It's quiet.... It's like heaven. The king is something like Jesus. Woman. Now I'm just
sort of summersaulting through heaven.... Brooding.... Aztecs, the Sun God.... High
priest.... Fear.... Graves. Woman. Prayer.... Funeral.... Dark. Death.... Souls.... Ten
Commandments. Moses...." (Ibid, p. 120).
Over all eleven series there were 122 direct hits in the 355 trials, for a hit rate of 34.4%
(exact binomial p-value = .00005) when 25% were expected by chance. Cohen's h is .20, and
a 95% confidence interval for the overall hit rate is from .30 to .39. This calculation assumes,
of course, that the probability of a direct hit is constant and independent across trials, an
assumption that may be questionable except under the null hypothesis of no psi abilities.
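The summary figures above can be checked with a short Python sketch. It assumes the usual definition of Cohen's h as the difference of arcsine-transformed proportions and a one-sided exact binomial test against a chance rate of 1/4; the function names are illustrative only.

```python
from math import asin, comb, sqrt


def cohens_h(p_obs, p_null):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * (asin(sqrt(p_obs)) - asin(sqrt(p_null)))


def binomial_upper_tail(hits, trials, p_null):
    """Exact P(X >= hits) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(hits, trials + 1)
    )


hits, trials, p_chance = 122, 355, 0.25
hit_rate = hits / trials
print(f"hit rate          = {hit_rate:.3f}")
print(f"Cohen's h         = {cohens_h(hit_rate, p_chance):.2f}")
print(f"exact one-sided p = {binomial_upper_tail(hits, trials, p_chance):.1e}")
```

The same `cohens_h` function can be applied to any of the subgroup hit rates quoted in this section, always with .25 as the null proportion.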
Honorton et al also calculated effect sizes for each of the eleven series and each of the
eight experimenters. All but one of the series (the first novice series) had positive effect sizes,
as did all of the experimenters.
The special series with experienced R's had an exceptionally high effect size with h =
.81, corresponding to 16 direct hits out of 25 trials (64%), but the remaining series and the
experimenters had relatively homogeneous effect sizes given the amount of variability expected
by chance. If the special series is removed, the overall hit rate is 32.1%, h = .16. Thus, the
positive effects are not due to just one series or one experimenter.
Seventy one of the 218 trials contributed by novices were direct hits (32.5%, h = .17),
compared with 51 hits in the 137 trials by those with prior ganzfeld experience (37%, h = .26).
The hit rates and effect sizes were 31% (h = .14) for the combined pilot series, 32.5% (h =
.17) for the combined formal novice series, and 41.5% (h = .35) for the combined experienced
series. The last figure drops to 31.6% if the outlier series is removed. Finally, without the
outlier series the hit rate for the combined series where all of the planned trials were completed
was 31.2% (h = .14) while it was 35% (h = .22) for the combined series that were terminated
early. Thus, optional stopping cannot account for the positive effect.
There were two interesting comparisons that had been suggested by earlier work and
were preplanned in these experiments. The first was to compare results for trials with dynamic
targets with those for static targets. In the 190 dynamic target sessions there were 77 direct hits
(40%, h = .32) and for the static targets there were 45 hits in 165 trials (27%, h = .05), thus
indicating that dynamic targets produced far more successful results.
The second comparison of interest was whether or not the sender was a friend of the
receiver. This was a choice the receiver could make. If he or she did not bring a friend, a lab
member acted as sender. There were 211 trials with friends as senders (some of whom were
also lab staff), resulting in 76 direct hits (36%, h = .24). Four trials used no sender. The
remaining 140 trials used non-friend lab staff as senders and resulted in 46 direct hits (33%, h
= .18). Thus, trials with friends as senders were slightly more successful than those without.
Consonant with the definition of replication based on consistent effect sizes, it is
informative to compare the autoganzfeld experiments with the direct hit studies in the previous
data base. The overall success rates are extremely similar. The overall direct hit rate was
34.4% for the autoganzfeld studies and was 38% for the comparable direct hit studies in the
earlier meta-analysis. Rosenthal's (1986) adjustment for flaws had placed a more conservative
estimate at 33%, very close to the observed 34.4% in the new studies.
One limitation of this work is that the autoganzfeld studies, while conducted by eight
experimenters, all used the same equipment in the same laboratory. Unfortunately, the level of
funding available in parapsychology and the cost in time and equipment to conduct proper
experiments make it difficult to amass large amounts of data across laboratories. Another
autoganzfeld laboratory is currently being constructed at the University of Edinburgh in
Scotland, so interlaboratory comparisons may be possible in the near future.
Based on the effect size observed to date, large samples are needed to achieve reasonable
power. If there is a constant effect across all trials, resulting in 33% direct hits when 25% are
expected by chance, to achieve a one tailed significance level of .05 with 95% probability would
require 345 sessions.
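The figure of 345 sessions is consistent with the standard normal-approximation sample size formula for a one-sided test of a single proportion. The sketch below assumes that formula; the source does not state which method was used, and the function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sessions_needed(p_null, p_alt, alpha=0.05, power=0.95):
    """Sessions needed for a one-sided test of p_null against p_alt.

    Uses the normal approximation to the binomial (an assumption).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(p_null * (1 - p_null))
        + z_power * sqrt(p_alt * (1 - p_alt))
    )
    return ceil((numerator / (p_alt - p_null)) ** 2)


print(sessions_needed(0.25, 0.33))
```

Because the required sample size grows with the inverse square of the difference between the two rates, even a slightly smaller true hit rate raises the number of sessions needed considerably.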
We end this section by returning to the aspirin and heart attack example in Section 3, and
expanding a comparison noted by Atkinson et al (1990, p. 237). Computing the equivalent of
Cohen's h for comparing observed heart attack rates in the aspirin and placebo groups results
in h = .068. Thus, the effect size observed in the ganzfeld data base is triple the
much-publicized effect of aspirin on heart attacks.
6. OTHER META-ANALYSES IN PARAPSYCHOLOGY
Four additional meta-analyses have been conducted in various areas of parapsychology
since the original ganzfeld meta-analyses were reported. Three of the four analyses focused on
evidence of psi abilities, while the fourth examined the relationship between extraversion and
psychic functioning. In this section, each of the four analyses will be briefly summarized.
There are only a handful of English-language journals and proceedings in
parapsychology, so retrieval of the relevant studies in each of the four cases was simple to
accomplish by searching those sources in detail and by searching other bibliographic data bases
for keywords.
Each analysis included an overall summary, an analysis of the quality of the studies
versus the size of the effect, and a "file-drawer" analysis to determine the possible number of
unreported studies. Three of the four also contained comparisons across various conditions.
6.1 Forced-choice Precognition Experiments
Honorton and Ferrari (1989) analyzed forced-choice experiments conducted from 1935
to 1987, in which the target material was randomly selected after the subject had attempted to
predict what it would be. The time delay in selecting the target ranged from under a second to
one year. Target material included items as diverse as ESP cards and automated random number
generators. Two investigators, S.G. Soal and Walter J. Levy, were not included because some
of their work has been suspected to be fraudulent.
Overall Results. There were 309 studies reported by 62 senior authors, including more
than 50,000 subjects and nearly two million individual trials. Honorton and Ferrari used z/√n
as the measure of effect size (ES) for each study, where n was the number of Bernoulli trials in
the study. They reported a mean ES of 0.020, and a mean z-score of 0.65 over all studies.
They also reported a combined z of 11.41, p = 6.3 x 10^-25. Thirty percent (92) of the studies
were statistically significant at α = .05. The mean ES per investigator was 0.033, and the
significant results were not due to just a few investigators.
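The relationship between the mean z-score and the combined z can be illustrated with a short Python sketch. It assumes the combined z is the usual Stouffer combination (the sum of the study z-scores divided by the square root of the number of studies); the function names are illustrative, and because the input is the rounded mean z of 0.65 the result agrees with the reported 11.41 only approximately.

```python
from math import sqrt


def stouffer_z(z_scores):
    """Stouffer combined z: sum of z-scores over sqrt(number of studies)."""
    return sum(z_scores) / sqrt(len(z_scores))


def stouffer_from_mean(mean_z, k):
    """Same quantity computed from the mean z-score of k studies."""
    return mean_z * sqrt(k)


def effect_size(z, n):
    """Per-study effect size z/sqrt(n), with n the number of trials."""
    return z / sqrt(n)


print(f"{stouffer_from_mean(0.65, 309):.2f}")
```

The sketch shows why a small mean z-score can still give a very large combined z: the combined value grows with the square root of the number of studies.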
Quality. Eight dichotomous quality measures were assigned to each study, resulting in
possible scores from zero for the lowest quality, to eight for the highest. They included features
such as adequate randomization, preplanned analysis, and automated recording of the results.
The correlation between study quality and effect size was 0.081, indicating a slight tendency for
higher quality studies to be more successful, contrary to claims by critics that the opposite would
be true. There was a clear relationship between quality and year of publication, presumably
because over the years experimenters in parapsychology have responded to suggestions from
critics for improving their methodology.
File-drawer. Following Rosenthal (1984), the authors calculated the "fail-safe N"
indicating the number of unreported studies that would have to be sitting in file-drawers in order
to negate the significant effect. They found N = 14,268, or a ratio of 46 unreported studies for
each one reported. They also followed a suggestion by Dawes et al (1984) and computed the
mean z for all studies with z > 1.65. If such studies were a random sample from the upper 5%
tail of a N(0,1) distribution, the mean z would be 2.06. In this case it was 3.61. They
concluded that selective reporting could not explain these results.
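Both file-drawer checks can be sketched in Python. The first function assumes Rosenthal's fail-safe N in its common form, (sum of z)^2 / z_crit^2 - k; fed the rounded combined z of 11.41 it gives a value of the same order as the reported 14,268 but does not reproduce it exactly. The second function gives the expected z for a random draw from the upper 5% tail of a N(0,1) distribution. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def fail_safe_n(combined_z, k, alpha=0.05):
    """Rosenthal's fail-safe N (common form, assumed here).

    Number of unreported null studies needed to pull the Stouffer
    combined z of k studies down to the one-sided critical value.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    sum_z = combined_z * sqrt(k)
    return sum_z**2 / z_crit**2 - k


def mean_z_in_upper_tail(tail_prob=0.05):
    """Mean of a standard normal variable given that it falls in the upper tail."""
    cutoff = STD_NORMAL.inv_cdf(1 - tail_prob)
    return STD_NORMAL.pdf(cutoff) / tail_prob


print(f"fail-safe N (approx.) = {fail_safe_n(11.41, 309):,.0f}")
print(f"mean z in upper 5% tail = {mean_z_in_upper_tail():.2f}")
```

An observed mean well above the tail expectation, as reported here, is what one would expect if the significant studies were not merely the lucky end of a null distribution.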
Comparisons. Four variables were identified that appeared to have a systematic
relationship to study outcome. The first was that the 25 studies using subjects selected on the
basis of good past performance were more successful than the 223 using unselected subjects,
with mean effect sizes of .051 and .008, respectively. Second, the 97 studies testing subjects
individually were more successful than the 105 studies that used group testing; mean effect sizes
were .021 and .004, respectively. Timing of feedback was the third moderating variable, but
information was only available for 104 studies. The 15 studies that never told the subjects what
the targets were had a mean effect size of -.001. Feedback after each trial produced the best
results; the mean ES for the 47 studies was .035. Feedback after each set of trials resulted in a
mean ES of .023 (21 studies), while delayed feedback (also 21 studies) yielded a mean ES of
only .009. There is a clear ordering: as the gap between time of feedback and time of the actual
guesses decreased, effect sizes increased.
The fourth variable was the time interval between the subject's guess and the actual target
selection, available for 144 studies. The best results were for the 31 studies that generated
targets less than a second after the guess (mean ES = .045), while the worst were for the 7
studies that delayed target selection by at least a month (mean ES = .001). The mean effect
sizes showed a clear trend, decreasing in order as the time interval increased from minutes to
hours to days to weeks to months.
6.2 Attempts to Influence Random Physical Systems
Radin and Nelson (1989) examined studies designed to test the hypothesis that "The
statistical output of an electronic RNG [random number generator] is correlated with observer
intention in accordance with prespecified instructions" (p. 1502). These experiments typically
involve RNGs based on radioactive decay, electronic noise, or pseudorandom number sequences
seeded with true random sources. Usually the subject is instructed to try to influence the results
of a string of binary trials by mental intention alone. A typical protocol would ask a subject to
press a button (thus starting the collection of a fixed-length sequence of bits), and then try to
influence the random source to produce more zeroes or more ones. A run might consist of three
successive button presses, one each in which the desired result was more zeroes or more ones,
and one as a control with no conscious intention. A z score would then be computed for each
button press.
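The z score for a single button press can be sketched in Python as follows. The sketch assumes each bit is a fair Bernoulli trial under the null hypothesis; the run length, bit counts, and function names are hypothetical and are not taken from Radin and Nelson (1989).

```python
from math import sqrt


def run_z_score(bits, target_bit):
    """z score for the count of target bits in a fixed-length sequence.

    Under the null hypothesis each bit equals the target with
    probability 1/2, so the count has mean n/2 and variance n/4.
    """
    n = len(bits)
    hits = sum(1 for bit in bits if bit == target_bit)
    return (hits - n / 2) / sqrt(n / 4)


def effect_size(z, n):
    """Effect size z/sqrt(n) for a sequence of n bits."""
    return z / sqrt(n)


# Hypothetical sequence: 110 ones and 90 zeroes from one button press.
bits = [1] * 110 + [0] * 90
z_ones = run_z_score(bits, target_bit=1)
print(f"z (aiming for ones)   = {z_ones:.3f}")
print(f"z (aiming for zeroes) = {run_z_score(bits, target_bit=0):.3f}")
print(f"effect size           = {effect_size(z_ones, len(bits)):.3f}")
```

The sign of z depends on the instructed direction, so the same bit sequence counts as a success under one intention and a failure under the other.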
The 832 studies in the analysis were conducted from 1959 to 1987, and included 235
"control" studies in which the output of the RNGs was recorded but there was no conscious
intention involved. These were usually conducted before and during the experimental series, as
tests of the RNGs.
Results. The effect size measure used was again z/√n, where z was positive if more
bits of the specified type were achieved. The mean effect size for control studies was not
significantly different from zero (-1.0 x 10^-5). The mean effect size for the experimental studies
was also very small, 3.2 x 10^-4, but it was significantly higher than the mean ES for the control
studies (z = 4.1).
Quality. Sixteen quality measures were defined and assigned to each study, under the
four general categories of procedures, statistics, data, and the RNG device. A score of 16
reflected the highest quality. The authors regressed mean effect size on mean quality for each
investigator, and found a slope of 2.5 x 10^-5 with standard error of 3.2 x 10^-5, indicating little
relationship between quality and outcome. They also calculated a weighted mean effect size,
using quality scores as weights, and found that it was very similar to the unweighted mean ES.
They concluded that "differences in methodological quality are not significant predictors of effect
size" (p. 1507).
File-drawer. Radin and Nelson used several methods for estimating the number of
unreported studies (p. 1508-10). Their estimates ranged from 200 to 1000 based on models
assuming that all significant studies were reported. They also calculated the fail-safe N to be
54,000.
6.3 Attempts to Influence Dice
Radin and Ferrari (1991) examined 148 studies, published from 1935 to 1987, designed
to test whether or not consciousness can influence the results of tossing dice. They also found
31 "control" studies in which no conscious intention was involved.
Results. The effect size measure used was z/√n, where z was based on the number of
throws in which the die landed with the desired face (or faces) up, in n throws. The weighted
mean ES for the experimental studies was 0.0122 with a standard error of 0.00062; for the
control studies the mean and standard error were 0.00093 and 0.00255, respectively. Weights
for each study were determined by quality, giving more weight to high quality studies.
Combined z scores for the experimental and control studies were reported by Radin and Ferrari
to be 18.2 and 0.18, respectively.
Quality. Eleven dichotomous quality measures were assigned, ranging from automated
recording to whether or not control studies were interspersed with the experimental studies. The
final quality score for each study combined these with information on method of tossing the dice,
and with source of subject (defined below). A regression of quality score versus effect size
resulted in a slope of -.002, with a standard error of .0011. However, when effect sizes were
weighted by sample size there was a significant relationship between quality and effect size,
leading Radin and Ferrari to conclude that higher quality studies produced lower weighted effect
sizes.
File-drawer. Radin and Ferrari calculated Rosenthal's fail-safe N for this analysis to be
17,974. Using the assumption that all significant studies were reported, they estimated the
number of unreported studies to be 1,152. As a final assessment, they compared studies
published before and after 1975, when the Journal of Parapsychology adopted an official policy
of publishing nonsignificant results. They concluded, based on that analysis, that more
nonsignificant studies were published after 1975, and thus "We must consider the overall (1935-
1987) data base as suspect with respect to the filedrawer problem."
Comparisons. Radin and Ferrari noted that there was bias in both the experimental and
control studies across die face. Six was the face most likely to come up, consistent with the
observation that it has the least mass. Therefore, they examined results for the subset of 69
studies in which targets were evenly balanced among the six faces. They still found a significant
effect, with mean and standard error for effect size of 8.6 x 10^-3 and 1.1 x 10^-3, respectively.
The combined z was 7.617 for these studies.
They also compared effect sizes across types of subjects used in the studies, categorizing
them as unselected, experimenter and other subjects, experimenter as sole subject, and specially
selected subjects. Like Honorton and Ferrari (1989), they found the highest mean ES for studies
with selected subjects; it was approximately .02, more than twice that for unselected subjects.
6.4 Extraversion and ESP Performance
Honorton, Ferrari and Bem (1990) conducted a meta-analysis to examine the relationship
between scores on tests of extraversion and scores on psi-related tasks. They found 60 studies
by 17 investigators, conducted from 1945 to 1983.
Results. The effect size measure used for this analysis was the correlation between each
subject's extraversion score and ESP score. A variety of measures had been used for both
scores across studies, so various correlation coefficients were used. Nonetheless, a stem and
leaf diagram of the correlations showed an approximate bell shape with mean and standard
deviation of .19 and .26, respectively, and with an additional outlier at r = .91. Honorton et
al reported that when weighted by degrees of freedom, the weighted mean r was .14, with a
95% confidence interval covering .10 to .19.
Forced-choice versus Free-response Results. Because forced-choice and free-response
tests differ qualitatively, Honorton et al chose to examine their relationship to extraversion
separately. They found that for free-response studies there was a significant correlation between
extraversion and ESP scores, with mean r = .20 and z = 4.46. Further, this effect was
homogeneous across both investigators and extraversion scales.
For forced-choice studies, there was a significant correlation between ESP and
extraversion, but only for those studies that reported the ESP results to the subjects before
measuring extraversion. Honorton et al speculated that the relationship was an artifact, in which
extraversion scores were temporarily inflated as a result of positive feedback on ESP
performance.
Confirmation with New Data. Following the extraversion/ESP meta-analysis, Honorton
et al attempted to confirm the relationship using the autoganzfeld data base. Extraversion scores
based on the Myers-Briggs Type Indicator were available for 221 of the 241 subjects who had
participated in autoganzfeld studies.
The correlation between extraversion scores and ganzfeld rating scores was r = .18, with
a 95% confidence interval from .05 to .30. This is consistent with the mean correlation of r =
.20 for free-response experiments, determined from the meta-analysis. These correlations
indicate that extraverted subjects can produce higher scores in free-response ESP tests.
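The confidence interval quoted for the autoganzfeld correlation is consistent with the standard Fisher z-transformation interval for r = .18 and n = 221. The sketch below assumes that method; the source does not say how the interval was computed, and the function name is illustrative.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Confidence interval for a correlation via Fisher's z transformation."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    center = atanh(r)                 # Fisher z of the sample correlation
    half_width = z_crit / sqrt(n - 3)  # approximate standard error is 1/sqrt(n - 3)
    return tanh(center - half_width), tanh(center + half_width)


low, high = correlation_ci(0.18, 221)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

The interval excludes zero and contains the meta-analytic mean of .20, which is the sense in which the new data are consistent with the earlier free-response studies.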
7. CONCLUSIONS
Parapsychologists often make a distinction between "proof-oriented research" and
"process-oriented research." The former is typically conducted to test the hypothesis that psi
abilities exist, while the latter is designed to answer questions about how psychic functioning
works. Proof-oriented research has dominated the literature in parapsychology. Unfortunately,
many of the studies used small samples and would thus be nonsignificant even if a moderate-
sized effect exists.
The recent focus on meta-analysis in parapsychology has revealed that there are small but
consistently nonzero effects across studies, experimenters, and laboratories. The sizes of the
effects in forced-choice studies appear to be comparable to those reported in some medical
studies that had been heralded as breakthroughs. (See Section 5, and Honorton and Ferrari,
1989, p. 301.) Free-response studies show effect sizes of far greater magnitude.
A promising direction for future process-oriented research is to examine the causes of
individual differences in psychic functioning. The ESP/extraversion meta-analysis is a step in
that direction.
In keeping with the idea of individual differences, Bayes and empirical Bayes methods
would appear to make more sense than the classical inference methods commonly used, since
they would allow individual abilities and beliefs to be modelled. Jeffreys (1990) reported a
Bayesian analysis of some of the RNG experiments, and showed that conclusions were closely
tied to prior beliefs even though hundreds of thousands of trials were available.
It may be that the nonzero effects observed in the meta-analyses can be explained by
something other than ESP, such as shortcomings in our understanding of randomness and
independence. Nonetheless, there is an anomaly that needs an explanation. As I have argued
elsewhere (Utts, 1987), research in parapsychology should receive more support from the
scientific community. If ESP does not exist, there is little to be lost by erring in the direction
of further research, which may in fact uncover other anomalies. If ESP does exist, there is much
to be lost by not doing process-oriented research, and much to be gained by discovering how
to enhance and apply these abilities to important world problems.
REFERENCES
Atkinson, R.L., Atkinson, R.C., Smith, E.E. and Bem, D.J. (1990). Introduction to
Psychology, 10th Ed. Harcourt Brace Jovanovich, San Diego.
Beloff, J. (1985). Research strategies for dealing with unstable phenomena. In The Repeatability
Problem in Parapsychology (B. Shapin and L. Coly, eds.) 1-21. Parapsychology
Foundation, New York.
Blackmore, S.J. (1985). Unrepeatability: Parapsychology's only finding. In The Repeatability
Problem in Parapsychology (B. Shapin and L. Coly, eds.) 183-206. Parapsychology
Foundation, New York.
Burdick, D.S. and Kelly, E.F. (1977). Statistical methods in parapsychological research. In
Handbook of Parapsychology (B.B. Wolman, ed.) 81-130. Van Nostrand Reinhold, New
York.
Camp, B.H. (1937). (Statement in Notes Section.) Journal of Parapsychology 1 305.
Cohen, J. (1990). Things I have learned (so far). American Psychologist 45 1304-1312.
Coover, J.E. (1917). Experiments in Psychical Research at Leland Stanford Junior University.
Stanford University, Stanford, CA.
Dawes, R.M., Landman, J. and Williams, J. (1984). Reply to Kurosawa. American Psychologist
39 74-75.
Diaconis, P. (1978). Statistical problems in ESP research. Science 201 131-136.
Dommeyer, F.C. (1975). Psychical Research at Stanford University. Journal of Parapsychology
39 173-205.
Druckman, D. and Swets, J.A., Eds. (1988). Enhancing Human Performance: Issues, Theories,
and Techniques. National Academy Press, Washington, DC.
Edgeworth, F.Y. (1885). The calculus of probabilities applied to psychical research.
Proceedings of the Society for Psychical Research 3 190-199.
Edgeworth, F.Y. (1886). The calculus of probabilities applied to psychical research II.
Proceedings of the Society for Psychical Research 4 189-208.
Feller, W.K. (1968). An Introduction to Probability Theory and Its Applications, Volume 1, 3rd
Ed. Wiley, New York.
Feller, W.K. (1940). Statistical aspects of ESP. Journal of Parapsychology 4 271-297.
Fisher, R.A. (1924). A method of scoring coincidences in tests with playing cards. Proceedings
of the Society for Psychical Research 34 181-185.
Fisher, R.A. (1929). The statistical method in psychical research. Proceedings of the Society for
Psychical Research 39 189-192.
Gallup, G.H. Jr. and Newport, F. (1991). Belief in paranormal phenomena among adult
Americans. Skeptical Inquirer 15 137-146.
Gardner, M.J. and Altman, D.G. (1986). Confidence intervals rather than p-values: estimation
rather than hypothesis testing. British Medical Journal 292 746-750.
Gilmore, J.B. (1989). Randomness and the search for psi. Journal of Parapsychology 53 309-
340.
Gilmore, J.B. (1990). Anomalous significance in pararandom and psi-free domains. Journal of
Parapsychology 54 53-58.
Greeley, A. (1987). Mysticism goes mainstream. American Health 7 47-49.
Greenhouse, J.B. and Greenhouse, S.W. (1988). An aspirin a day...? Chance 1 24-31.
Greenwood, J.A. and Stuart, C.E. (1940). A review of Dr. Feller's critique. Journal of
Parapsychology 4 299-319.
Hansel, C.E.M. (1980). ESP and Parapsychology: A Critical Re-evaluation. Prometheus Books,
Buffalo.
Harris, M.J. and Rosenthal, R. (1988a). Interpersonal Expectancy Effects and Human
Performance Research. National Academy Press, Washington DC.
Harris, M.J. and Rosenthal, R. (1988b). Postscript to Interpersonal Expectancy Effects and
Human Performance Research. National Academy Press, Washington DC.
Hedges, L.V. and Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic Press, Inc.,
Orlando, FL.
Honorton, C. (1977). Psi and internal attention states. In Handbook of Parapsychology (B.B.
Wolman, ed.) 435-472. Van Nostrand Reinhold, New York.
Honorton, C. (1985a). How to evaluate and improve the replicability of parapsychological
effects. In The Repeatability Problem in Parapsychology (B. Shapin and L. Coly, eds.)
238-255. Parapsychology Foundation, New York.
Honorton, C. (1985b). Meta-analysis of psi ganzfeld research: A response to Hyman. Journal
of Parapsychology 49 51-91.
Honorton, C., Berger, R.E., Varvoglis, M.P., Quant, M., Derr, P., Schechter, E.I. and
Ferrari, D.C. (1990). Psi communication in the ganzfeld: experiments with an automated
testing system and a comparison with a meta-analysis of earlier studies. Journal of
Parapsychology 54 99-139.
Honorton, C. and Ferrari, D.C. (1989). "Future telling": A meta-analysis of forced-choice
precognition experiments, 1935-1987. Journal of Parapsychology 53 281-308.
Honorton, C., Ferrari, D.C., and Bem, D.J. (1990). Extraversion and ESP Performance: A
meta-analysis and a new confirmation. Proceedings of the Annual Meeting of the
Parapsychological Association.
Hyman, R. (1985a). A critical overview of parapsychology. In A Skeptic's Handbook of
Parapsychology (P. Kurtz, ed.) 1-96. Prometheus Books, Buffalo.
Hyman, R. (1985b). The ganzfeld psi experiment: A critical appraisal. Journal of
Parapsychology 49 3-49.
Hyman, R. and Honorton, C. (1986). Joint communique: The psi ganzfeld controversy. Journal
of Parapsychology 50 351-364.
Iversen, G.R, Longcor, W.H., Mosteller, F., Gilbert, J.P., and Youtz, C. (1971). Bias and
runs in dice throwing and recording: A few million throws. Psychometrika 36 1-19.
Jeffreys, W.H. (1990). Bayesian analysis of random event generator data. Journal of Scientific
Exploration 4 153-169.
Lindley, D.V. (1957). A statistical paradox. Biometrika 44 187-192.
Mauskopf, S.H. and McVaugh, M. (1979). The Elusive Science: Origins of Experimental
Psychical Research. The Johns Hopkins University Press, Baltimore.
McVaugh, M.R. and Mauskopf, S.H. (1976). J.B. Rhine's Extrasensory Perception and its
background in psychical research. Isis 67 161-189.
Neuliep, J.W. (Ed.) (1990). Handbook of replication research in the behavioral and social
sciences. Journal of Social Behavior and Personality (Special Issue) 5(4).
Office of Technology Assessment (1989). Report of a workshop on experimental
parapsychology. Journal of the American Society for Psychical Research 83 317-339.
Palmer, J. (1989). A reply to Gilmore. Journal of Parapsychology 53 341-344.
Palmer, J. (1990). Reply to Gilmore: Round two. Journal of Parapsychology 54 59-61.
Palmer, J.A., Honorton, C. and Utts, J. (1989). Reply to the National Research Council study
on parapsychology. Journal of the American Society for Psychical Research 83 31-49.
Radin, D.I. and Ferrari, D.C. (1991). Effects of consciousness on the fall of dice: A meta-
analysis. Journal of Scientific Exploration 5 (to appear).
Radin, D.I. and Nelson, R.D. (1989). Evidence for consciousness-related anomalies in random
physical systems. Foundations of Physics 19 1499-1514.
Rao, K.R. (1985). Replication in conventional and controversial sciences. In The Repeatability
Problem in Parapsychology (B. Shapin and L. Coly, eds.) 22-41. Parapsychology
Foundation, New York.
Rhine, J.B. (1934). Extrasensory Perception. Boston Society for Psychical Research, Boston.
(Reprinted by Branden Press in 1964).
Rhine, J.B. (1977). History of experimental studies. In Handbook of Parapsychology (B.B.
Wolman, ed.) 25-47. Van Nostrand Reinhold, New York.
Richet, C. (1884). La suggestion mentale et le calcul des probabilites. Revue Philosophique 18
608-674.
Rosenthal, R. (1984). Meta-Analytic Procedures for Social Research. Sage, Beverly Hills.
Rosenthal, R. (1986). Meta-analytic procedures and the nature of replication: The ganzfeld
debate. Journal of Parapsychology 50 315-336.
Rosenthal, R. (1990a). How are we doing in soft psychology? American Psychologist 45 775-
777.
Rosenthal, R. (1990b). Replication in behavioral research. Journal of Social Behavior and
Personality 5 1-30.
Saunders, D.R. (1985). On Hyman's factor analysis. Journal of Parapsychology 49 86-88.
Shapin, B. and Coly, L. (Eds.) (1985). The Repeatability Problem in Parapsychology.
Parapsychology Foundation, New York.
Spencer-Brown, G. (1957). Probability and Scientific Inference. Longmans Green, London and
New York.
Stuart, C.E. and Greenwood, J.A. (1937). A review of criticisms of the mathematical evaluation
of ESP data. Journal of Parapsychology 1 295-304.
Tversky, A. and Kahneman, D. (1982). Belief in the law of small numbers. In D. Kahneman,
P. Slovic and A. Tversky (eds.), Judgment under uncertainty: Heuristics and biases.
Cambridge University Press, Cambridge.
Utts, J. (1986). The ganzfeld debate: A statistician's perspective. Journal of Parapsychology 50
395-402.
Utts, J. (1987). Psi, statistics, and society. Behavioral and Brain Sciences 10 615-616.
Utts, J. (1988). Successful replication versus statistical significance. Journal of Parapsychology
52 305-320.
Utts, J. (1989a). Randomness and randomization tests: A reply to Gilmore. Journal of
Parapsychology 53 345-351.
Utts, J. (1989b). Analyzing free-response data - a progress report. To appear in Psi Research
Methodology: A Re-examination (L. Coly, ed.). Parapsychology Foundation, New York.
Wilks, S.S. (1965). N.Y. Statistician 16 (nos. 6 and 7).
III MAIN-STREAM PUBLICATIONS
One measure of the acceptance of anomalous mental phenomena as a valid area for investigation is the
degree to which research papers appear in the main-stream scientific literature. The reports in this
section have been selected because they are a representative sample of such papers.
The number that appears in the upper right-hand corner of the first page for each publication is keyed
to the following descriptions:
9. Targ, R. and Puthoff, H. E., "Information Transmission Under Conditions of Sensory Shielding,"
Nature, Vol. 251, pp. 602-607, (October, 1974). Targ and Puthoff describe a series of experiments
with selected individuals, including Mr. Uri Geller, and introduce an anomalous cognition
technique called remote viewing. The paper also includes a pilot experiment to investigate the
effects of anomalous cognition on the alpha rhythms in the brain.
10. Puthoff, H. E. and Targ, R., "A Perceptual Channel for Information Transfer over Kilometer
Distances: Historical Perspective and Recent Research," Proceedings of the IEEE, Vol. 64, No. 3,
pp. 329-354, (March, 1976). Puthoff and Targ provide a historical review of the pertinent
literature and describe over 50 remote viewing (i.e., anomalous cognition) trials. The paper also
includes representative examples of remote viewing.
11. Jahn, R. G., "The Persistent Paradox of Psychic Phenomena: An Engineering Perspective,"
Invited Paper, Proceedings of the IEEE, Vol. 70, No. 2, pp. 136-170, (February, 1982). Jahn
describes a replication of remote viewing and extends the distance to over 10,000 kilometers. In
addition to an independent overview of parapsychology, Jahn also includes descriptions of a
number of anomalous perturbation experiments.
12. Child, I. L., "Psychology and Anomalous Observations: The Question of ESP in Dreams,"
American Psychologist, Vol. 40, No. 11, pp. 1219-1230, (November, 1985). Professor Child, the
then Chairman of the Psychology Department at Yale University, provides a critical review of the
anomalous cognition dream studies conducted at Maimonides Medical Center in the early 1970's.
Professor Child warns the general psychological research community not to dismiss the body of
research and suggests that it should be of wide interest to them.
13. Atkinson, R. L., Atkinson, R. C., Smith, E. E., and Bem, D. J., Introduction to Psychology, 10th Edition,
pp. 234-243, Harcourt Brace Jovanovich, New York, (1990). Professor Bem included anomalous
cognition in a chapter on consciousness and its altered states in a widely-used introductory text in
psychology. Bem provides definitions of terms, a review of the experimental evidence for anomalous
cognition, an analysis of the debate over the evidence, and a review of the anecdotal evidence.
14. Walker, E. H., May, E. C., Spottiswoode, S. J. P., and Piantanida, T., "Testing Schrodinger's
Paradox with a Michelson Interferometer," Physica B, Vol. 151, pp. 339-348, (1988). While not
directly related to anomalous mental phenomena, this paper describes an experimental test to
determine if consciousness is a necessary ingredient for determining physical reality. The authors
conclude that it is not, and thus, this result has implications for anomalous perturbation research.
15. Hyman, R., "Parapsychological Research: A Tutorial Review and Critical Appraisal," Invited
Paper, Proceedings of the IEEE, Vol. 74, No. 6, pp. 823-849, (June, 1986). Dr. Hyman is a Professor
of Psychology at the University of Oregon in Eugene and has been a long-time critic of and
commentator on the field of parapsychology. Hyman reviews the historical experiments and
provides a critical analysis of the current research.
Nature, Vol. 251, No. 5476, October 18, 1974
Information transmission under conditions of sensory shielding
WE present results of experiments suggesting the existence of
one or more perceptual modalities through which individuals
obtain information about their environment, although this
information is not presented to any known sense. The literature
(refs 1-3) and our observations lead us to conclude that such
abilities can be studied under laboratory conditions.
We have investigated the ability of certain people to describe
graphical material or remote scenes shielded against ordinary
perception. In addition, we performed pilot studies to determine
if electroencephalographic (EEG) recordings might indicate
perception of remote happenings even in the absence of correct
overt responses.
We concentrated on what we consider to be our primary
responsibility: to resolve under conditions as unambiguous
as possible the basic issue of whether a certain class of paranormal
perception phenomena exists. So we conducted our
experiments with sufficient control, utilising visual, acoustic
and electrical shielding, to ensure that all conventional paths of
sensory input were blocked. At all times we took measures to
prevent sensory leakage and to prevent deception, whether
intentional or unintentional.
Our goal is not just to catalogue interesting events, but to
uncover patterns of cause-effect relationships that lend themselves
to analysis and hypothesis in the forms with which
we are familiar in scientific study. The results presented here
constitute a first step toward that goal; we have established
under known conditions a data base from which departures as a
function of physical and psychological variables can be studied
in future work.
REMOTE PERCEPTION OF GRAPHIC MATERIAL
First, we conducted experiments with Mr Uri Geller in
which we examined his ability, while located in an electrically
shielded room, to reproduce target pictures drawn by experimenters
located at remote locations. Second, we conducted
double-blind experiments with Mr Pat Price, in which we
measured his ability to describe remote outdoor scenes many
miles from his physical location. Finally, we conducted preliminary
tests using EEGs, in which subjects were asked to
perceive whether a remote light was flashing, and to determine
whether a subject could perceive the presence of the light,
even if only at a noncognitive level of awareness.
In preliminary testing Geller apparently demonstrated an
ability to reproduce simple pictures (line drawings) which had
been drawn and placed in opaque sealed envelopes which he
was not permitted to handle. But since each of the targets was
known to at least one experimenter in the room with Geller,
it was not possible on the basis of the preliminary testing to
discriminate between Geller's direct perception of envelope
contents and perception through some mechanism involving
the experimenters, whether paranormal or subliminal.
So we examined the phenomenon under conditions designed
to eliminate all conventional information channels, overt or
subliminal. Geller was separated from both the target material
and anyone knowledgeable of the material, as in the experiments
of ref. 4.
In the first part of the study a series of 13 separate drawing
experiments were carried out over 7 days. No experiments
are deleted from the results presented here.
At the beginning of the experiment either Geller or the
experimenters entered a shielded room so that from that time
forward Geller was at all times visually, acoustically and
electrically shielded from personnel and material at the target
location. Only following Geller's isolation from the experimenters
was a target chosen and drawn, a procedure designed
to eliminate pre-experiment cueing. Furthermore, to eliminate
the possibility of pre-experiment target forcing, Geller was kept
ignorant as to the identity of the person selecting the target
and as to the method of target selection. This was accomplished
by the use of three different techniques: (1) pseudo-random
technique of opening a dictionary arbitrarily and choosing the
first word that could be drawn (Experiments 1-4); (2) targets,
blind to experimenters and subject, prepared independently by
SRI scientists outside the experimental group (following
Geller's isolation) and provided to the experimenters during
the course of the experiment (Experiments 5-7, 11-13); and (3)
arbitrary selection from a target pool decided upon in advance
of daily experimentation and designed to provide data concerning
information content for use in testing specific hypotheses
(Experiments 8-10). Geller's task was to reproduce with pen
on paper the line drawing generated at the target location.
Following a period of effort ranging from a few minutes to
half an hour, Geller either passed (when he did not feel confident)
or indicated he was ready to submit a drawing to the
experimenters, in which case the drawing was collected before
Geller was permitted to see the target.
To prevent sensory cueing of the target information, Experiments
1 through 10 were carried out using a shielded room in SRI's facility
for EEG research. The acoustic and visual isolation is provided
by a double-walled steel room, locked by means of an inner and
outer door, each of which is secured with a refrigerator-type locking
mechanism. Following target selection when Geller was inside
the room, a one-way audio monitor, operating only from the inside
to the outside, was activated to monitor Geller during his efforts.
The target picture was never discussed by the experimenters after the
picture was drawn and brought near the shielded room. In our
detailed examination of the shielded room and the protocol used in
these experiments, no sensory leakage has been found.
The conditions and results for the 10 experiments carried out in the
shielded room are displayed in Table 1 and Fig. 1. All experiments
except 4 and 5 were conducted with Geller inside the shielded room.
In Experiments 4 and 5, the procedure was reversed. For those
experiments in which Geller was inside the shielded room, the target
location was in an adjacent room at a distance of about 4 m, except
for Experiments 3 and 8, in which the target locations were, respectively,
an office at a distance of 475 m and a room at a distance of
about 7 m.
Fig. 1 Target pictures and responses drawn by Uri Geller under shielded conditions.
Table 1
Experiment  Date (month/day/year)  Geller location                    Target location           Target                                              Figure
1           8/4/73                 Shielded room 1*                   Adjacent room (4.1 m)†    Firecracker                                         1a
2           8/4/73                 Shielded room 1                    Adjacent room (4.1 m)     Grapes                                              1b
3           8/5/73                 Shielded room 1                    Office (475 m)            Devil                                               1c
4           8/5/73                 Room adjacent to shielded room 1   Shielded room 1 (3.2 m)   Solar system                                        1d
5           8/6/73                 Room adjacent to shielded room 1   Shielded room 1 (3.2 m)   Rabbit                                              No drawing
6           8/7/73                 Shielded room 1                    Adjacent room (4.1 m)     Tree                                                No drawing
7           8/7/73                 Shielded room 1                    Adjacent room (4.1 m)     Envelope                                            No drawing
8           8/8/73                 Shielded room 1                    Remote room (6.75 m)      Camel                                               1e
9           8/8/73                 Shielded room 1                    Adjacent room (4.1 m)     Bridge                                              1f
10          8/8/73                 Shielded room 1                    Adjacent room (4.1 m)     Seagull                                             1g
11          8/9/73                 Shielded room 2‡                   Computer (54 m)           Kite (computer CRT)                                 2a
12          8/10/73                Shielded room 2                    Computer (54 m)           Church (computer memory)                            2b
13          8/10/73                Shielded room 2                    Computer (54 m)           Arrow through heart (computer CRT, zero intensity)  2c
*EEG Facility shielded room (see text).
†Perceiver-target distances measured in metres.
‡SRI Radio Systems Laboratory shielded room (see text).
A response was obtained in all experiments except Numbers
5-7. In Experiment 5, the person-to-person link was eliminated
by arranging for a scientist outside the usual experimental
group to draw a picture, lock it in the shielded room before
Geller's arrival at SRI, and leave the area. Geller was then led
by the experimenters to the shielded room and asked to draw
the picture located inside the room. He said that he got no clear
impression and therefore did not submit a drawing. The elimination
of the person-to-person link was examined further in the
second series of experiments with this subject.
Experiments 6 and 7 were carried out while we attempted to
record Geller's EEG during his efforts to perceive the target
pictures. The target pictures were, respectively, a tree and an
envelope. He found it difficult to hold adequately still for good
EEG records, said that he experienced difficulty in getting
impressions of the targets and again submitted no drawings.
Experiments 11 through 13 were carried out in SRI's Engineering
Building, to make use of the computer facilities available
there. For these experiments, Geller was secured in a double-walled,
copper-screen Faraday cage 54 m down the hall and
around the corner from the computer room. The Faraday cage
provides 120 dB attenuation for plane wave radio frequency
radiation over a range of 15 kHz to 1 GHz. For magnetic fields
the attenuation is 68 dB at 15 kHz and decreases to 3 dB at
60 Hz. Following Geller's isolation, the targets for these
experiments were chosen by computer laboratory personnel
not otherwise associated with either the experiment or Geller,
and the experimenters and subject were kept blind as to the
contents of the target pool.
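The attenuation figures quoted for the Faraday cage can be read as ratios. The following is a minimal sketch in Python, assuming the standard decibel definitions (power ratio 10^(dB/10), field-amplitude ratio 10^(dB/20)). The function names are illustrative and do not come from the paper.

```python
def db_to_power_ratio(db):
    """Power attenuation factor for an attenuation given in decibels."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Field-amplitude attenuation factor for an attenuation given in decibels."""
    return 10 ** (db / 20)


# Attenuation values quoted in the text for the copper-screen Faraday cage.
quoted = {
    "plane-wave RF, 15 kHz to 1 GHz": 120,
    "magnetic field at 15 kHz": 68,
    "magnetic field at 60 Hz": 3,
}

for label, db in quoted.items():
    print(
        f"{label}: {db} dB -> amplitude reduced by a factor of "
        f"{db_to_amplitude_ratio(db):.3g}, power by {db_to_power_ratio(db):.3g}"
    )
```

On these assumptions, the 3 dB figure at 60 Hz corresponds to a field reduction of only about 1.4, so the cage gives little shielding against magnetic fields at that frequency.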
For Experiment 11, a picture of a kite was drawn on the face
of a cathode ray tube display screen, driven by the computer's
graphics program. For Experiment 12, a picture of a church
was drawn and stored in the memory of the computer. In
Experiment 13, the target drawing, an arrow through a heart
(Fig. 2c), was drawn on the face of the cathode ray tube and
then the display intensity was turned off so that no picture
was visible.
To obtain an independent evaluation of the correlation between
target and response data, the experimenters submitted
the data for judging on a 'blind' basis by two SRI scientists
who were not otherwise associated with the research. For the
10 cases in which Geller provided a response, the judges were
asked to match the response data with the corresponding
target data (without replacement). In those cases in which
Geller made more than one drawing as his response to the
target, all the drawings were combined as a set for judging.
The two judges each matched the target data to the response
data with no error. For either judge such a correspondence has
an a priori probability, under the null hypothesis of no information
channel, of $P = (10!)^{-1} = 3 \times 10^{-7}$.
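The quoted probability can be checked directly. The following is a minimal sketch, assuming that under the null hypothesis a judge's matching is a uniformly random permutation of the 10 responses, so that exactly one of the 10! orderings is error-free. The variable names are illustrative.

```python
from math import factorial

n_responses = 10  # cases in which a drawing was submitted

# Exactly one of the n! possible one-to-one matchings is fully correct.
n_matchings = factorial(n_responses)
p_perfect = 1 / n_matchings

print(f"P = 1/{n_matchings} = {p_perfect:.2e}")
```

The result is about 2.8 x 10^-7, which rounds to the 3 x 10^-7 given in the text. The calculation treats each judge separately, as the text does.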
A second series of experiments was carried out to determine
whether direct perception of envelope contents was possible
without some person knowing of the target picture.
One hundred target pictures of everyday objects were drawn
by an SRI artist and sealed by other SRI personnel in double
envelopes containing black cardboard. The hundred targets
were divided randomly into groups of 20 for use in each of the
three days' experiments.
On each of the three days of these experiments, Geller passed.
That is, he declined to associate any envelope with a drawing
that he made, expressing dissatisfaction with the existence of
such a large target pool. On each day he made approximately 12
recognisable drawings, which he felt were associated with the
entire target pool of 100. On each of the three days, two of his
drawings could reasonably be associated with two of the 20
daily targets. On the third day, two of his drawings were very
close replications of two of that day's target pictures. The
drawings resulting from this experiment do not depart significantly
from what would be expected by chance.
In a simpler experiment Geller was successful in obtaining
information under conditions in which no persons were knowledgeable
of the target. A double-blind experiment was performed
in which a single 3/4 inch die was placed in a 3 x 4 x 5
inch steel box. The box was then vigorously shaken by one of the
experimenters and placed on the table, a technique found in
control runs to produce a distribution of die faces differing
non-significantly from chance. The orientation of the die within the
box was unknown to the experimenters at that time. Geller
would then write down which die face was uppermost. The
target pool was known, but the targets were individually prepared
in a manner blind to all persons involved in the experiment.
This experiment was performed ten times, with Geller
passing twice and giving a response eight times. In the eight
times in which he gave a response, he was correct each time.
The distribution of responses consisted of three 2s, one 4, two
5s, and two 6s. The probability of this occurring by chance is
approximately one in $10^{6}$.
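The order of magnitude of this figure can be reproduced with a short calculation. The following is a minimal sketch, assuming a fair die, independent throws, and that the two passes are simply left out of the count, which is how the text appears to treat them. The variable names are illustrative.

```python
from fractions import Fraction

p_single = Fraction(1, 6)  # chance of naming the uppermost face of a fair die
n_responses = 8            # throws on which a response was given

# Probability that all eight responses are correct by chance alone.
p_all_correct = p_single ** n_responses

print(p_all_correct, float(p_all_correct))
```

This gives 1 in 1,679,616, or roughly 6 x 10^-7, in line with the "approximately one in 10^6" stated above. The sketch does not model the option to pass, which the text does not quantify.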
In certain situations significant information transmission can
take place under shielded conditions. Factors which appear to
be important and therefore candidates for future investigation
include whether the subject knows the set of targets in the target
pool, the actual number of targets in the target pool at any
given time, and whether the target is known by any of the
experimenters.
It has been widely reported that Geller has demonstrated the
ability to bend metal by paranormal means. Although metal
bending by Geller has been observed in our laboratory, we have
not been able to combine such observations with adequately
controlled experiments to obtain data sufficient to support the
paranormal hypothesis.
'REMOTE VIEWING NATURAL TARGETS
A study by Osis' led us to determine whether a suoject could
describe randomly chosen geographical sites located several
miles from the subject's position and demarcated by some
8 : CIA-RDP96-00789R003100030001-4
appropriate.means (remote viewing), This experiment carried the experimenters and the subject were kept blind as to the
out wilinAhigc1M.rp?aafga oq seopAr_olD962664t,t6FIbiplibrakjaibiwich were used without replace,-
e- inn, amon- ment.
city co"uircilfrian, consisted-Zr a ren .
stration-of-ability tests involving loaal. targets in the San An experimenter was closeted with Price at SRI to wait 30 Min to
Francisco Bay area which could be documented by several jade- "begin the narrative description of the remote location. The SRI
locations from which the subject viewed the remote locations con--
pendent judges. We .planned the experiment considering that sisted of an outdoor. park (Experiments 1. 2), the double-walled
natural gcograPhiCal places or. man-inade sites that have copper-screen Faraday cage .discussed earlier (Experiments 3, 4, and
existed for a long time are more potent targets for.paranormal 6-9); and an office (Experiment 5). 6 second experimenter would then
perception experiments than arc artificial targets prepared iii the obtain a target location from the Division Director from a set of
travelling Orders previously prepared and randamised by the Director
laboratory. This is based on subject opinions that the use of and kept under his control. The target demarcation team (two to
artificial targets involves a 'trivialisation of the ability' as corn- four SRI experimenters) then -proceeded directly to the target by
pared with natural pre-existing targets.. automobile without communicating with the s-ubject or experimenter
In each of nine experiments involving. Price as subject and remaining behind. Since the experimenter remaining with the subject
at SRI was in ignorance both as to the particular target and as to
SRI experimenters as a target demarcation team, a remote the target pool, he was free to question Price to clarify his descrip-
location ,Was ? chosen in a double-blind protocol. Price, who tions. The demarcation team then remained at the 'target site for
remained at SRI, was asked to describe this remote location, as 30 min after the 30 min allotted .fo'r travel. During the observation
well as whatever. activities might be going on there. . period, the remote-viewing subject would describe his impressions of
the target site into a tape recorder. A comparison was then made
Several descriptions yielded significantly correct data per- when the demarcation team returned. `?
taming to and descriptive of the target location. ? . Price's ability to describe correctly buildings, docks, roads,
.. .
In the experiments a set Of twelve target locations clearly gardens and so on, ? including structural materials, colour,
differentiated from each other and within 30 min driving time ambience and activity, sometimes in great detail, indicated the
from SRI had been 'chosen from a target-rich environment (more functioning of a remote perceptual ability. But the descriptions
than 100 targets .of the type used in the experimental series) contained inaccuracies as well as coitet-t statements. To obtain
prior to the experimental series by an individual in SRI manage- a numerical evaluation of the accuracy of the remote viewing
ment, the director of .the Information Science and Engineering experiment, the experimental results were subjected to inde-
Division,. not otherwise associated with the experiment. Both pendent judging on a blind basis by five SRI scientists who were
TARGET
Fig. ZsoOmputer drawings and responses drawn 14 Uri Geller. a, Computer drawing stored on video display; b conipitici-arawing.storeeFii
corriputer.meapory. only; ccomputer drawing stored:on; video- display .with?ixrcp.intensityi.
?
Approved For Release 2003/04/18 : CIA-RDP96-00789R003-100030001-4 ?
Table 2 Distribution of correct selections by judges A, B, C, D, and E in remote viewing experiments
. Descriptions chosen by judges
1
2
3 -
? Places viiited by judges
4 ! 5 6
7
8
9
Hoover Tower
ABODE
D
Baylands Nature:Preserve
2
ABC
Radio Telescope
3
ACD
BE
Redwood City Marina
4
CI)
ABDE
Bridge Toll Plaza
5
ABD
DCE
Drive-In Theatre
6
C
Arts and Crafts Garden Plaza
'7
A33CE
Church
8
Rinconada Park
9
CE
AL
Of the 45 selections (5 judges, 9 choices), 24 were correct. Bold type indicates the description chosen Most often for each place visited. Ce__
choices lie on the main diagonal. The number of correct matches by Judges A through E is 7, 6, 5, 3, and 3, respectively. The expected nu_ )
of correct matches from the five judges was five; in the experiment 24 such matches were Obtained. The a priori probability of such an occur,,i
by chance, conservatively assuming assignment without replacement on the part of the judges, is P = 8.10'4?.
not otherwise associated with the research. The judges were
asked to match the nine locations, which they independently
visited, apinet the typed manuscripts Of the tape-recorded nar-
ratives of the remote viewer. The transcripts were -unlabelled
and presented in random order. The judges were asked to find a
narrative which they would consider the best match for each
of the places they visited. A given narrative could be assigned
to more than one target location. A correct match requires that
the transcript of a given date be associated with the target Of
that date. Table 2. shows the distribution of the judges' choices.
Among all possible analyses, the most conservative is a
permutation analysis of the plurality vote of the judges' selections
assuming assignment without replacement, an approach
independent of the number of judges. By plurality vote, six of the
nine descriptions and locations were correctly matched. Under
the null hypothesis (no remote viewing and a random selection
of descriptions without replacement), this outcome has an a
priori probability of P = 5.6 x 10^-4, since, among all possible
permutations of the integers one through nine, the probability
of six or more being in their natural position in the list has that
value. Therefore, although Price's descriptions contain
inaccuracies, the descriptions are sufficiently accurate to permit
the judges to differentiate among the various targets to the
degree indicated.
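The permutation figure quoted above can be checked by direct counting. The following is a minimal sketch and not part of the original analysis; the function names are illustrative. It counts the permutations of nine items that leave at least six in their natural position, using derangement numbers for the items that are displaced.

```python
from math import comb, factorial


def derangements(n):
    """Number of permutations of n items with no item in its natural position."""
    if n == 0:
        return 1
    previous, current = 1, 0  # D(0) = 1, D(1) = 0
    for m in range(2, n + 1):
        # Standard recurrence: D(m) = (m - 1) * (D(m - 1) + D(m - 2))
        previous, current = current, (m - 1) * (previous + current)
    return current


def prob_at_least_fixed(n, k_min):
    """Probability that a random permutation of n items has >= k_min fixed points."""
    favourable = sum(
        comb(n, k) * derangements(n - k)  # choose the k fixed items, derange the rest
        for k in range(k_min, n + 1)
    )
    return favourable / factorial(n)


if __name__ == "__main__":
    # Nine targets, six or more matched correctly by plurality vote.
    print(prob_at_least_fixed(9, 6))
```

For nine items and six or more correct matches, the count is 205 favourable permutations out of 9! = 362,880, which gives about 5.6 x 10^-4, in agreement with the value stated in the text.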
EEG EXPERIMENTS
An experiment was undertaken to determine whether a
physiological measure such as EEG activity could be used as an
indicator of information transmission between an isolated
subject and a remote stimulus. We hypothesised that perception
could be indicated by such a measure even in the absence of
verbal or other overt indicators.
It was assumed that the application of remote stimuli would
result in responses similar to those obtained under conditions
of direct stimulation. For example, when normal subjects are
stimulated with a flashing light, their EEG typically shows a
decrease in the amplitude of the resting rhythm and a driving
of the brain waves at the frequency of the flashes. We
hypothesised that if we stimulated one subject in this manner (a sender),
the EEG of another subject in a remote room with no flash
present (a receiver) might show changes in alpha (9-11 Hz)
activity, and possibly EEG driving similar to that of the sender.
We informed our subject that at certain times a light was to
be flashed in a sender's eyes in a distant room, and if the subject
perceived that event, consciously or unconsciously, it might be
evident from changes in his EEG output. The receiver was
seated in the visually opaque, acoustically and electrically
shielded double-walled steel room previously described. The
sender was seated in a room about 7 m from the receiver.
To find subjects who were responsive to such a remote
stimulus, we initially worked with four female and two male
volunteer subjects, all of whom believed that success in the
experimental situation might be possible. These were designated
'receivers'. The senders were either other subjects or the
experimenters. We decided beforehand to run one or two
sessions of 36 trials each with each subject in this selection
procedure, and to do a more extensive study with any subject
whose results were positive.
A Grass PS-2 photostimulator placed about 1 m in front of the
sender was used to present flash trains of 10 s duration. The receiver's
EEG activity from the occipital region (Oz), referenced to linked
mastoids, was amplified with a Grass 5P-1 preamplifier and associated
driver amplifier with a bandpass of 1-120 Hz. The EEG data were
recorded on magnetic tape with an Ampex SP 300 recorder.
On each trial, a tone burst of fixed frequency was presented to both
sender and receiver and was followed in one second by either a
train of flashes or a null flash interval presented to the sender. Thirty-
six such trials were given in an experimental session, consisting of 12
null trials (no flashes following the tone), 12 trials of flashes at 6 f.p.s.
and 12 trials of flashes at 16 f.p.s., all randomly intermixed, as
determined by entries from a table of random numbers. Each of the trials
generated an 11-s EEG epoch. The last 4 s of the epoch was selected
for analysis to minimise the desynchronising action of the warning
cue. This 4-s segment was subjected to Fourier analysis on a LINC
computer.
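The session layout described above (36 trials, 12 in each of three stimulus conditions, randomly intermixed) can be sketched as follows. This is an illustrative reconstruction only: the original order was taken from a table of random numbers, whereas this sketch substitutes a seeded pseudo-random generator, and the function and parameter names are assumptions.

```python
import random


def make_session(seed=None, trials_per_condition=12, flash_rates_fps=(0, 6, 16)):
    """Return a randomly intermixed list of flash rates, one entry per trial.

    A rate of 0 stands for a null trial (tone only, no flashes).
    The seeded generator is a stand-in for the table of random numbers
    used in the original procedure.
    """
    rng = random.Random(seed)
    # Build the balanced set of trials, then shuffle the order in place.
    trials = [rate for rate in flash_rates_fps for _ in range(trials_per_condition)]
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    session = make_session(seed=1974)
    print(len(session), "trials:", session)
```

Shuffling a balanced list keeps exactly 12 trials per condition in every session, which is what the stated design requires; drawing each trial's condition independently would not guarantee that balance.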
Spectrum analyses gave no evidence of EEG driving in any receiver,
although in control runs the receivers did exhibit driving when
physically stimulated with the flashes. But of the six subjects studied
initially, one subject (H. H.) showed a consistent alpha blocking effect.
We therefore undertook further study with this subject.
Data from seven sets of 36 trials each were collected from this
subject on three separate days. This comprises all the data collected
to date with this subject under the test conditions described above.
The alpha band was identified from average spectra, then scores of
average power and peak power were obtained from individual trials
and subjected to statistical analysis.
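The scoring step can be illustrated with a short sketch that takes the last 4 s of an 11-s epoch, computes a power spectrum, and reports the average and peak power in the 9-11 Hz band. This is not the original analysis code: the sampling rate, the direct DFT, the power normalisation, and all names are assumptions made for illustration.

```python
import cmath
import math


def power_spectrum(samples, sample_rate):
    """Return (frequencies in Hz, power) for a real signal via a direct DFT."""
    n = len(samples)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)
        )
        freqs.append(k * sample_rate / n)
        power.append(abs(coeff) ** 2 / n ** 2)  # arbitrary normalisation (assumed)
    return freqs, power


def alpha_scores(epoch, sample_rate, analysis_seconds=4.0, band=(9.0, 11.0)):
    """Average and peak power in `band` over the final `analysis_seconds` of an epoch."""
    # Keep only the tail of the epoch, away from the warning tone.
    segment = epoch[-int(analysis_seconds * sample_rate):]
    freqs, power = power_spectrum(segment, sample_rate)
    in_band = [p for f, p in zip(freqs, power) if band[0] <= f <= band[1]]
    return sum(in_band) / len(in_band), max(in_band)


if __name__ == "__main__":
    rate = 64  # samples per second (assumed; not stated in the text)
    # Synthetic 11-s epoch: a pure 10 Hz rhythm standing in for alpha activity.
    epoch = [2.0 * math.sin(2 * math.pi * 10.0 * i / rate) for i in range(11 * rate)]
    print(alpha_scores(epoch, rate))
```

With a 4-s segment the frequency resolution is 0.25 Hz, so the 9-11 Hz band spans nine spectral bins; a narrow ("monochromatic") alpha rhythm therefore gives a peak power well above the band average.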
Of our six subjects, H. H. had by far the most monochromatic
EEG spectrum. Figure 3 shows an overlay of the three averaged
spectra from one of this subject's 36-trial runs, displaying
changes in her alpha activity for the three stimulus conditions.
Mean values for the average power and peak power for
Table 3 EEG data for H.H. showing average power and peak power
in the 9-11 Hz band, as a function of flash frequency and sender
                                    Average power           Peak power
Flash frequency (f.p.s.)            0      6      16        0       6       16
Sender
J.L.                                94.8   84.1   76.8      357.7   329.2   289.6
R.T.                                41.3   45.5   37.0      160.7   161.0   125.0
No sender (subject informed)        25.1   35.7   28.2       87.5    95.7    81.7
J.L.                                54.2   55.3   44.8      191.4   170.5   149.3
J.L.                                56.8   50.9   32.8      240.6   178.0   104.6
R.T.                                39.8   24.9   30.3      145.2    74.2   122.1
No sender (subject not informed)    86.0   53.0   52.1      318.1   180.6   202.3
Averages                            56.8   49.9   43.1      214.5   169.8   153.5
-12% -24%,(P