[Translation] Is Science Seriously Ill?

科学的退化
Scientific Regress

Author: William A. Wilson @ 2016-05
Translator: 小聂 (@PuppetMaster)
Proofreader: 龙泉
Source: First Things, https://www.firstthings.com/article/2016/05/scientific-regress

The problem with science is that so much of it simply isn’t. Last summer, the Open Science Collaboration announced that it had tried to replicate one hundred published psychology experiments sampled from three of the most prestigious journals in the field. Scientific claims rest on the idea that experiments repeated under nearly identical conditions ought to yield approximately the same results, but until very recently, very few had bothered to check in a systematic way whether this was actually the case.

科学研究的问题在于,它们中的很大一部分其实根本不科学。去年夏天,开放科学合作组织(OSC)宣布他们曾试图重复100个选自三本行业权威杂志上的心理学实验。科学论断建基于这样一个观念:在几乎相同的条件下重复实验,其结果也应该大致相同。但是直到最近为止,几乎没有人系统性地验证是不是真的如此。

The OSC was the biggest attempt yet to check a field’s results, and the most shocking. In many cases, they had used original experimental materials, and sometimes even performed the experiments under the guidance of the original researchers. Of the studies that had originally reported positive results, an astonishing 65 percent failed to show statistical significance on replication, and many of the remainder showed greatly reduced effect sizes.

OSC小组的工作是迄今最大规模的对一个学科研究结果的验证,结果也最为惊人。在很多情况下,小组采用了原始的实验材料,有些甚至在原研究者的指导下进行实验。在原先报告阳性结果的研究中,竟然有65%在重复时未能显示出统计显著性,剩下的研究中也有很多效应量大幅减小。

Their findings made the news, and quickly became a club with which to bash the social sciences. But the problem isn’t just with psychology. There’s an unspoken rule in the pharmaceutical industry that half of all academic biomedical research will ultimately prove false, and in 2011 a group of researchers at Bayer decided to test it. Looking at sixty-seven recent drug discovery projects based on preclinical cancer biology research, they found that in more than 75 percent of cases the published data did not match up with their in-house attempts to replicate.
他们的发现上了新闻,并且很快成了用来攻击社会科学的大棒。但是问题不只是出在心理学领域。医药产业心照不宣的法则是,半数学术界的生物医学研究最终会被证明为假,而在2011年拜耳的一组研究者们决定检验一下这条法则。在研究了最近的67个基于临床前癌症生物学研究的新药计划之后,他们发现其中75%以上的实验发表的数据和他们内部重复实验的数据对不上。

These were not studies published in fly-by-night oncology journals, but blockbuster research featured in Science, Nature, Cell, and the like. The Bayer researchers were drowning in bad studies, and it was to this, in part, that they attributed the mysteriously declining yields of drug pipelines. Perhaps so many of these new drugs fail to have an effect because the basic research on which their development was based isn’t valid.

这些研究都不是那些发表在无足轻重的肿瘤学期刊上的研究,而是发表在《科学》、《自然》、《细胞》之类期刊上的大手笔。拜耳的研究者们被垃圾研究淹没了,他们认为这在一定程度上解释了新药研发管线产出离奇下降的原因。或许如此多的新药研制失败,是因为它们所基于的基础研究不靠谱。

When a study fails to replicate, there are two possible interpretations. The first is that, unbeknownst to the investigators, there was a real difference in experimental setup between the original investigation and the failed replication. These are colloquially referred to as “wallpaper effects,” the joke being that the experiment was affected by the color of the wallpaper in the room. This is the happiest possible explanation for failure to reproduce: It means that both experiments have revealed facts about the universe, and we now have the opportunity to learn what the difference was between them and to incorporate a new and subtler distinction into our theories.

当研究结果无法被重复时,有两种可能性。一种是,原实验和失败的重复实验之间确实存在某种研究者不知道的实验设置差异。这种情况俗称“墙纸效应”,戏谑地认为实验会被房间里墙纸的颜色所影响。这是一个皆大欢喜的解释,表明这两个实验都揭示了一些事实,现在我们有机会研究这些差异并将这个更微妙的新发现融入理论中。

The other interpretation is that the original finding was false. Unfortunately, an ingenious statistical argument shows that this second interpretation is far more likely. First articulated by John Ioannidis, a professor at Stanford University’s School of Medicine, this argument proceeds by a simple application of Bayesian statistics. Suppose that there are a hundred and one stones in a certain field. One of them has a diamond inside it, and, luckily, you have a diamond-detecting device that advertises 99 percent accuracy. After an hour or so of moving the device around, examining each stone in turn, suddenly alarms flash and sirens wail while the device is pointed at a promising-looking stone. What is the probability that the stone contains a diamond?

而另一种可能是,原实验的结果为假。很不幸的是,一项设计巧妙的统计学论证显示出第二种解读更有可能。该论证最早由斯坦福医学院John Ioannidis教授提出,只是贝叶斯统计的一个简单应用。假设一块田里有101块石头,其中的一块里面有钻石,并且,你正好有个号称准确率99%的钻石探测器。在经过了近一个小时的来回,一个一个地检查石头之后,突然警报响起,此时探测器正指向一块看起来很有希望的石头。该石头含钻石的可能性是多少?

Most would say that if the device advertises 99 percent accuracy, then there is a 99 percent chance that the device is correctly discerning a diamond, and a 1 percent chance that it has given a false positive reading. But consider: Of the one hundred and one stones in the field, only one is truly a diamond. Granted, our machine has a very high probability of correctly declaring it to be a diamond. But there are many more diamond-free stones, and while the machine only has a 1 percent chance of falsely declaring each of them to be a diamond, there are a hundred of them. So if we were to wave the detector over every stone in the field, it would, on average, sound twice—once for the real diamond, and once when a false reading was triggered by a stone. If we know only that the alarm has sounded, these two possibilities are roughly equally probable, giving us an approximately 50 percent chance that the stone really contains a diamond.

大多数人会说既然探测器的准确率是99%,那么就有99%的可能性该探测器正确地判断出了钻石的所在,和1%的可能性探测器给出了误报。但是请考虑这一点:101块石头中,只有一块有钻石。毋庸置疑,我们的探测器可以以很高的可能性正确判断出这块石头里面有钻石。但是大多数石头里面是没有钻石的,所以尽管探测器仅有1%的可能性错误地判断出它们中的某一个有钻石,但是这样的石头有100个。于是如果我们在每一块石头上挥舞探测器,则它会报警的期望值是两次,一次为真正的钻石,一次为误报。如果我们仅仅只是听到报警而已,那么这两个情况出现的可能性大致相等,得出的结论就是石头里面有钻石的可能性是大约50%。

This is a simplified version of the argument that Ioannidis applies to the process of science itself.
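The diamond-detector arithmetic can be checked with a few lines of Python. This is a minimal sketch and not part of the original essay: it assumes that "99 percent accuracy" means the device detects a real diamond 99 percent of the time and falsely alarms on a diamond-free stone 1 percent of the time, and the function names are purely illustrative.

```python
from fractions import Fraction


def expected_alarms(n_stones: int, n_diamonds: int, accuracy: Fraction) -> Fraction:
    """Average number of alarms when the detector is waved over every stone once."""
    true_alarms = n_diamonds * accuracy                      # real diamonds that are detected
    false_alarms = (n_stones - n_diamonds) * (1 - accuracy)  # plain stones that trip the alarm
    return true_alarms + false_alarms


def diamond_given_alarm(n_stones: int, n_diamonds: int, accuracy: Fraction) -> Fraction:
    """P(stone holds a diamond | alarm sounded), by Bayes' rule."""
    true_alarms = n_diamonds * accuracy
    return true_alarms / expected_alarms(n_stones, n_diamonds, accuracy)


if __name__ == "__main__":
    # Assumption: one rate covers both correct detections and correct rejections.
    acc = Fraction(99, 100)
    print(expected_alarms(101, 1, acc))             # 199/100, i.e. about two alarms
    print(float(diamond_given_alarm(101, 1, acc)))  # about 0.497, i.e. roughly 50 percent
```

Exact fractions are used so that the "sounds twice on average" and "approximately 50 percent" figures come out as 199/100 and 99/199 without floating-point noise.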
The stones in the field are the set of all possible testable hypotheses, the diamond is a hypothesized connection or effect that happens to be true, and the diamond-detecting device is the scientific method. A tremendous amount depends on the proportion of possible hypotheses which turn out to be true, and on the accuracy with which an experiment can discern truth from falsehood. Ioannidis shows that for a wide variety of scientific settings and fields, the values of these two parameters are not at all favorable. 这是Ioannidis教授关于科学研究过程的统计学论证的一个简化版本。田里的石头就是所有可验证的理论假设的集合,钻石就是那个恰好为真的假设,而探测器就是科学的方法。至关重要的两个参数是真假设占所有可行假设的比例,以及用实验来判断真假的准确性。Ioannidis教授向我们说明了在大部分科研情景和领域里面,这两个参数的值都不容乐观。 For instance, consider a team of molecular biologists investigating whether a mutation in one of the countless thousands of human genes is linked to an increased risk of Alzheimer’s. The probability of a randomly selected mutation in a randomly selected gene having precisely that effect is quite low, so just as with the stones in the field, a positive finding is more likely than not to be spurious—unless the experiment is unbelievably successful at sorting the wheat from the chaff. Indeed, Ioannidis finds that in many cases, approaching even 50 percent true positives requires unimaginable accuracy. Hence the eye-catching title of his paper: “Why Most Published Research Findings Are False.” 比如说,设想一个分子生物学研究小组想要决定人类无数基因中的某一个基因变异是否会增加阿尔兹海默症的风险。一个随机选择的基因里面产生的随机变异,正好产生一个给定的效果,这个可能性是很低的。所以就像田里的石头一样,阳性结果在大多数情况下都很有可能是假的,除非该实验有着令人难以置信的准确率。确实,Ioannidis教授发现在很多情况下,即使是接近50%的真阳性结果也需要惊人的准确率。正是因为这样,他才给他的论文起了个吸引眼球的标题:“为什么被发表的多数研究结论都是假的?” What about accuracy? Here, too, the news is not good. First, it is a de facto standard in many fields to use one in twenty as an acceptable cutoff for the rate of false positives. To the naive ear, that may sound promising: Surely it means that just 5 percent of scientific studies report a false positive? 
But this is precisely the same mistake as thinking that a stone has a 99 percent chance of containing a diamond just because the detector has sounded. What it really means is that for each of the countless false hypotheses that are contemplated by researchers, we accept a 5 percent chance that it will be falsely counted as true—a decision with a considerably more deleterious effect on the proportion of correct studies. 准确率又如何呢?也不是太令人乐观。首先,许多研究领域实际上能接受的上限是20个结果里面有一个假阳性。对普通人来说,这个听起来很不错:这想必表明仅仅只有5%的科学研究结果是假阳性的吧?但这和那些认为诱发探测器报警的石头会有99%的可能性藏有钻石的人正好犯了同一个错误。这个数字真正的意义在于,对于研究者们考虑的无数种可行假设中的每一个错误理论,我们接受有5%的可能性它们会被当成是正确理论。这是一个可以显著减少结果正确科学研究的做法。 Paradoxically, the situation is actually made worse by the fact that a promising connection is often studied by several independent teams. To see why, suppose that three groups of researchers are studying a phenomenon, and when all the data are analyzed, one group announces that it has discovered a connection, but the other two find nothing of note. Assuming that all the tests involved have a high statistical power, the lone positive finding is almost certainly the spurious one. However, when it comes time to report these findings, what happens? The teams that found a negative result may not even bother to write up their non-discovery. After all, a report that a fanciful connection probably isn’t true is not the stuff of which scientific prizes, grant money, and tenure decisions are made. 吊诡的是,当数个独立研究小组对同一理论假设做研究的时候,情况反而更糟了。这里用一个例子来说明为什么。设想有三个小组在研究同一现象,在分析完了所有数据之后,一个小组宣布他们发现了现象之间的联系,但是其它两个小组没有发现任何值得一提的东西。假如所有的实验都具有很强的统计学判断力,那么这个孤立的阳性结果几乎一定是可疑的。尽管如此,当要对实验结果做报告发表的时候,会发生什么呢?得出阴性结论的小组甚至都不会去把他们的毫无建树的实验写成论文。毕竟,科研奖项、经费、或是终身教授是不会给一个对有前景的理论假说持否定结论的。 And even if they did write it up, it probably wouldn’t be accepted for publication. Journals are in competition with one another for attention and “impact factor,” and are always more eager to report a new, exciting finding than a killjoy failure to find an association. 
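To put rough numbers on the one-in-twenty cutoff and on the three-team scenario, the same Bayesian bookkeeping can be sketched in Python. Everything numeric here other than the 5 percent false-positive rate is an assumption chosen for illustration (the share of tested hypotheses that are true, the statistical power, and the independence of the teams); none of these values comes from the essay or from Ioannidis's paper, and the function names are hypothetical.

```python
def positive_predictive_value(prior: float, alpha: float, power: float) -> float:
    """Share of positive findings that are actually true."""
    true_positives = prior * power            # true hypotheses that test positive
    false_positives = (1.0 - prior) * alpha   # false hypotheses that test positive
    return true_positives / (true_positives + false_positives)


def share_of_positive_results(prior: float, alpha: float, power: float) -> float:
    """Share of all studies reporting a positive result, if every study were published."""
    return prior * power + (1.0 - prior) * alpha


def posterior_after_studies(prior: float, alpha: float, power: float,
                            positives: int, negatives: int) -> float:
    """P(hypothesis true | outcomes of several independent studies)."""
    likelihood_if_true = power ** positives * (1.0 - power) ** negatives
    likelihood_if_false = alpha ** positives * (1.0 - alpha) ** negatives
    weighted_true = prior * likelihood_if_true
    weighted_false = (1.0 - prior) * likelihood_if_false
    return weighted_true / (weighted_true + weighted_false)


if __name__ == "__main__":
    alpha = 0.05  # the one-in-twenty cutoff mentioned in the text
    power = 0.8   # assumed chance of detecting a real effect
    for prior in (0.5, 0.1, 0.01):  # assumed shares of true hypotheses
        ppv = positive_predictive_value(prior, alpha, power)
        positives = share_of_positive_results(prior, alpha, power)
        print(f"prior={prior}: PPV={ppv:.3f}, positive share={positives:.3f}")
    # One team finds a connection, two high-powered teams find nothing.
    print(posterior_after_studies(0.1, alpha, 0.9, positives=1, negatives=2))
```

Under these assumed inputs, the rarer true hypotheses are, the smaller the share of positive findings that are real, and a lone positive result against two well-powered negatives leaves the hypothesis less likely than it was before any study was run.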
In fact, both of these effects can be quantified. Since the majority of all investigated hypotheses are false, if positive and negative evidence were written up and accepted for publication in equal proportions, then the majority of articles in scientific journals should report no findings. When tallies are actually made, though, the precise opposite turns out to be true: Nearly every published scientific article reports the presence of an association. There must be massive bias at work. 而且就算他们写成了论文,也很可能不会被发表。期刊之间会争夺学术界的注意力和“影响因子”,因此更乐意发表激动人心的新发现,而不是那些煞风景的阴性结果。事实上,这两个效应是可以被量化的。既然大多数被研究的理论假设应该为假,则如果阴性结果和阳性结果一样被写成论文发表的话,那么大多数期刊论文都应该报告说没有任何发现才对。可是事实上却恰好相反,几乎所有得以发表的论文都认为现象之间存在关联。这个过程中必有大量的偏差。 Ioannidis’s argument would be potent even if all scientists were angels motivated by the best of intentions, but when the human element is considered, the picture becomes truly dismal. Scientists have long been aware of something euphemistically called the “experimenter effect”: the curious fact that when a phenomenon is investigated by a researcher who happens to believe in the phenomenon, it is far more likely to be detected. 即便科学家都如同天使一般,不受任何恶意驱使,Ioannidis教授的论证也一样成立。但是在考虑到人为因素之后,情况就真的差到难以想象了。科学家很久以来都熟悉所谓“观察者期望效应”的委婉说法,即当研究者相信某些现象存在的时候,他们就更有可能在实验中发现这些现象。 Much of the effect can likely be explained by researchers unconsciously giving hints or suggestions to their human or animal subjects, perhaps in something as subtle as body language or tone of voice. Even those with the best of intentions have been caught fudging measurements, or making small errors in rounding or in statistical analysis that happen to give a more favorable result. Very often, this is just the result of an honest statistical error that leads to a desirable outcome, and therefore it isn’t checked as deliberately as it might have been had it pointed in the opposite direction. 
这种效应很多源自于:研究者无意识的给他们的人类或动物被试的一些暗示建议,这些暗示可以微妙到肢体语言或是声调变化。就算是最自律的研究者也曾被发现捏造测量,或是在取整的时候犯些小错误,抑或是偏向于统计分析给出的好结果等。经常是一个无心的统计偏差造成了研究者想要的结果,因而就不会被刻意的复查。如果结果指向相反的结论,恐怕就不会被这么轻易的放过了。 But, and there is no putting it nicely, deliberate fraud is far more widespread than the scientific establishment is generally willing to admit. One way we know that there’s a great deal of fraud occurring is that if you phrase your question the right way, scientists will confess to it. In a survey of two thousand research psychologists conducted in 2011, over half of those surveyed admitted outright to selectively reporting those experiments which gave the result they were after. Then the investigators asked respondents anonymously to estimate how many of their fellow scientists had engaged in fraudulent behavior, and promised them that the more accurate their guesses, the larger a contribution would be made to the charity of their choice. 但难以粉饰的事实是,学术圈内造假的广泛程度已经远超学界主流共识所愿意承认的那些。有一种方式可以让我们知道大批的造假行为正在发生,那就是巧妙的使用问卷调查来让科学家们坦白。在2011年的一次涉及两千多位心理学家的问卷调查里,半数以上直接承认了自己有选择性的报告了想要的实验结果。调查者之后让他们匿名估算同事中有多少人从事学术不诚信行为,并许诺向他们指定的慈善机构捐款,额度和估算的准确程度正相关。 Through several rounds of anonymous guessing, refined using the number of scientists who would admit their own fraud and other indirect measurements, the investigators concluded that around 10 percent of research psychologists have engaged in outright falsification of data, and more than half have engaged in less brazen but still fraudulent behavior such as reporting that a result was statistically significant when it was not, or deciding between two different data analysis techniques after looking at the results of each and choosing the more favorable. 经过数轮匿名估算,辅以自我报告的学术不诚信行为数字以及其它的间接测量,调查者得出的结论是:大约有10%的心理学家曾经直接伪造数据,并且半数以上曾经有过相对不那么无耻的学术不端行为,例如将非统计显著的结果报告为统计显著,或是在比较了两种数据分析结果之后再选择对自己有利的分析方法等。 Many forms of statistical falsification are devilishly difficult to catch, or close enough to a genuine judgment call to provide plausible deniability. 
Data analysis is very much an art, and one that affords even its most scrupulous practitioners a wide degree of latitude. Which of these two statistical tests, both applicable to this situation, should be used? Should a subpopulation of the research sample with some common criterion be picked out and reanalyzed as if it were the totality? Which of the hundreds of coincident factors measured should be controlled for, and how? The same freedom that empowers a statistician to pick a true signal out of the noise also enables a dishonest scientist to manufacture nearly any result he or she wishes. 许多形式的统计造假极难被抓住,或是太过于接近真实的分析决断,从而可以充分拒绝造假的指控。数据分析更像是一门艺术,即使是最严谨的数据分析者也有相当多的自由度可供发挥。两个同样适用的统计检验方法,该用哪个?是否应该将样本中的符合公共准则的子样本挑出来代表整体重新分析?数百个里面,我应该控制哪个?如何控制?使统计学家可以从噪音中挑出信号的那种自由度,同时让不诚实的科学家可以炮制出他/她想要的任何结果。 Cajoling statistical significance where in reality there is none, a practice commonly known as “p-hacking,” is particularly easy to accomplish and difficult to detect on a case-by-case basis. And since the vast majority of studies still do not report their raw data along with their findings, there is often nothing to re-analyze and check even if there were volunteers with the time and inclination to do so. 通过不断诱导数据从而得出不存在的显著统计,是一种通常被称作“p值操纵”的作弊法。做起来很容易,但是要检验出其是否被使用,却是极难。【译注:p值操纵指研究者轮番使用不同的统计方法和数据,直到结果显著为止。与正常的数据分析所采用的提出假设之后用数据验证假设的流程相反,p值操纵旨在找到具有显著性的关联,并在此基础上建立假设,因此导致假阳性。】并且大部分研究结果的原始数据还是不公开的,就算有人肯花时间来检查,也没有资源。 One creative attempt to estimate how widespread such dishonesty really is involves comparisons between fields of varying “hardness.” The author, Daniele Fanelli, theorized that the farther from physics one gets, the more freedom creeps into one’s experimental methodology, and the fewer constraints there are on a scientist’s conscious and unconscious biases. 
If all scientists were constantly attempting to influence the results of their analyses, but had more opportunities to do so the “softer” the science, then we might expect that the social sciences have more papers that confirm a sought-after hypothesis than do the physical sciences, with medicine and biology somewhere in the middle. 在估算这种学术不端的广泛性方面,有一个有创意的尝试,涉及到比较各学科的“硬”度。始作俑者Daniele Fanelli认为,一个学科离(最硬的)物理学越远,在实验方法上就更具有自由度,对于科学家们有意无意的错误的约束也越少。假如所有的科学家都试图影响实验分析的结果,而较“软”的学科里这么做更加容易,结果就是我们可能会发现,相比于物理学,社会科学发表的文章中更多的证实了那些倍受青睐的假说,而医学和生物学处于这两个学科之间的某个位置。 This is exactly what the study discovered: A paper in psychology or psychiatry is about five times as likely to report a positive result as one in astrophysics. This is not necessarily evidence that psychologists are all consciously or unconsciously manipulating their data—it could also be evidence of massive publication bias—but either way, the result is disturbing. 这正是研究发现的结果:心理学或是精神病学研究论文报告阳性结果的可能性是天体力学的五倍左右。这并不必然表明心理学家们在有意无意的篡改数据,也可能是论文发表系统的大规模选择性偏见,但是无论如何,令人担忧。 Speaking of physics, how do things go with this hardest of all hard sciences? Better than elsewhere, it would appear, and it’s unsurprising that those who claim all is well in the world of science reach so reliably and so insistently for examples from physics, preferably of the most theoretical sort. Folk histories of physics combine borrowed mathematical luster and Whiggish triumphalism in a way that journalists seem powerless to resist. The outcomes of physics experiments and astronomical observations seem so matter-of-fact, so concretely and immediately connected to underlying reality, that they might let us gingerly sidestep all of these issues concerning motivated or sloppy analysis and interpretation. 
说到物理学,对于这个最硬的学科,结果又如何呢?至少看起来比别的强。因而,不出意料的是,几乎所有认为科学世界安然无恙的那些人会放心地坚持从物理学里寻找例证,最好还是偏理论方向。民间流传的物理学史以一种让记者们无法抵御的方式将借来的数学光泽和辉格式凯旋主义相结合。物理实验和天文观测的结果看上去如此注重事实,如此具体而又直接关联到其表象之下的现实世界,以至于可以让我们小心翼翼地绕开那些别有用心的或是不合格的分析和解读。

“E pur si muove,” Galileo is said to have remarked, and one can almost hear in his sigh the hopes of a hundred science journalists for whom it would be all too convenient if Nature were always willing to tell us whose theory is more correct.

“不管你怎么想,它(地球)就是在动的”,这据说是伽利略的名言,而从他的这句感叹中我们几乎能听到一百个科学报道者的祈祷,因为对他们来说,大自然若是肯轻易透露谁的理论更正确,那简直就是太方便了。

And yet the flight to physics rather gives the game away, since measured any way you like—volume of papers, number of working researchers, total amount of funding—deductive, theory-building physics in the mold of Newton and Lagrange, Maxwell and Einstein, is a tiny fraction of modern science as a whole. In fact, it also makes up a tiny fraction of modern physics. Far more common is the delicate and subtle art of scouring inconceivably vast volumes of noise with advanced software and mathematical tools in search of the faintest signal of some hypothesized but never before observed phenomenon, whether an astrophysical event or the decay of a subatomic particle.

即使如此,向物理学寻求庇护反而露了馅。因为无论怎么看,不论是从发表文章数、研究员数量、还是研究经费方面来看,被牛顿、拉格朗日、麦克斯韦和爱因斯坦所铸造的基于演绎和理论构建的物理学,在整个现代科学界里面也仅仅只是一小撮。实际上,就算是在现代物理学里也是少数。更为普遍的则是一门精细微妙的艺术:使用先进的软件和数学工具,在难以想象的海量噪音中搜寻某种被假设存在但从未被观测到的现象的极微弱信号,无论那是一次天体物理事件,还是某个亚原子粒子的衰变。

This sort of work is difficult and beautiful in its own way, but it is not at all self-evident in the manner of a falling apple or an elliptical planetary orbit, and it is very sensitive to the same sorts of accidental contamination, deliberate fraud, and unconscious bias as the medical and social-scientific studies we have discussed. Two of the most vaunted physics results of the past few years—the announced discovery of both cosmic inflation and gravitational waves at the BICEP2 experiment in Antarctica, and the supposed discovery of superluminal neutrinos at the Swiss-Italian border—have now been retracted, with far less fanfare than when they were first published.

这类工作自有其难点和引人之处,但是绝不像落下的苹果或是椭圆的行星轨道那样不证自明,且和我们所讨论过的医学以及社会科学研究一样,非常容易受到意外污染、刻意造假和下意识的偏见所影响。过去几年里最饱受赞誉的两项物理学科研成果——南极BICEP2实验宣布发现的宇宙暴涨和引力波,以及在瑞士-意大利边境据称发现的超光速中微子——现在已经被撤回,撤回时受到的关注也比它们刚发表时少了许多。

Many defenders of the scientific establishment will admit to this problem, then offer hymns to the self-correcting nature of the scientific method. Yes, the path is rocky, they say, but peer review, competition between researchers, and the comforting fact that there is an objective reality out there whose test every theory must withstand or fail, all conspire to mean that sloppiness, bad luck, and even fraud are exposed and swept away by the advances of the field.

许多现有科研领域的辩护者承认这些问题,又称赞科学方法自有纠错能力。是的,道路是曲折的,他们说,但是同行评议、研究者之间的竞争、以及存在客观现实以检验理论这个令人舒心的事实,都会随着科学的进展将马虎、倒霉、甚至欺诈等因素暴露并且驱逐出科研领域。

So the dogma goes. But these claims are rarely treated like hypotheses to be tested. Partisans of the new scientism are fond of recounting the “Sokal hoax”—physicist Alan Sokal submitted a paper heavy on jargon but full of false and meaningless statements to the postmodern cultural studies journal Social Text, which accepted and published it without quibble—but are unlikely to mention a similar experiment conducted on reviewers of the prestigious British Medical Journal.
教条就是这样口口相传。但是这些声明几乎从未被像科学假设那样检验过。新科学至上主义的支持者们乐于重复“Sokal恶作剧”(指物理学家Alan Sokal向后现代文化研究期刊《社会文本》递交了一篇充满着行话但却全是错误和无稽表述的论文,却被接受并且毫无异议的发表了),却不太可能提到一个类似的实验,对象是具有很高声望的英国医学期刊的评审者们。 The experimenters deliberately modified a paper to include eight different major errors in study design, methodology, data analysis, and interpretation of results, and not a single one of the 221 reviewers who participated caught all of the errors. On average, they caught fewer than two—and, unbelievably, these results held up even in the subset of reviewers who had been specifically warned that they were participating in a study and that there might be something a little odd in the paper that they were reviewing. In all, only 30 percent of reviewers recommended that the intentionally flawed paper be rejected. 实验者有意更改了一篇论文,使之包含八个不同的重大错误,分散于实验设计、方法论、数据分析、和结果解读方面。在221个评审者中,没有一个人挑出全部错误。他们平均抓到少于两个错误。并且,令人难以置信的是,当告诉一个分组的评审者们他们面对的论文有问题时,该结论也成立。总而言之,只有30%的评审者认为这篇有意制造的问题论文应该被拒绝发表。 If peer review is good at anything, it appears to be keeping unpopular ideas from being published. Consider the finding of another (yes, another) of these replicability studies, this time from a group of cancer researchers. In addition to reaching the now unsurprising conclusion that only a dismal 11 percent of the preclinical cancer research they examined could be validated after the fact, the authors identified another horrifying pattern: The “bad” papers that failed to replicate were, on average, cited far more often than the papers that did! 
As the authors put it, “some non-reproducible preclinical papers had spawned an entire field, with hundreds of secondary publications that expanded on elements of the original observation, but did not actually seek to confirm or falsify its fundamental basis.” 如果有一件事情是同行评议机制所擅长的,那就是让不受欢迎的想法不被发表。来看看另一个可重复性研究吧(对,另一个),这次是来自于一些癌症研究人员的。在得出只有11%的癌症临床前研究可被事后验证的令人毫不惊讶的结论之外,研究者们发现了另一个恐怖的现象:那些结果难以被重复的“坏”的论文,平均引用次数大于结果能被重复的那些!正如研究者们所提到的那样:“一些不可重复的临床前实验论文创造了一整个研究领域,和基于原初观察结论所衍生出的数百篇论文,但却没有认真确证或是证伪其研究基础。” What they do not mention is that once an entire field has been created—with careers, funding, appointments, and prestige all premised upon an experimental result which was utterly false due either to fraud or to plain bad luck—pointing this fact out is not likely to be very popular. Peer review switches from merely useless to actively harmful. It may be ineffective at keeping papers with analytic or methodological flaws from being published, but it can be deadly effective at suppressing criticism of a dominant research paradigm. Even if a critic is able to get his work published, pointing out that the house you’ve built together is situated over a chasm will not endear him to his colleagues or, more importantly, to his mentors and patrons. 可他们没有提到的是,当整个研究领域被创造出来,当事业、经费、职务和声望都和一个实验结论所绑定,这个结果是假造的,无论是出于有意欺骗还是仅仅只是运气不好,将事实捅出去看来不是很受欢迎的做法。由此,同行评审从纯粹无用变成了积极为害,它在为论文排除分析上的或方法论上的缺陷方面很没用,但是在压制对主流研究范式的批评方面却非常有效。就算批评者最终可以将他的作品发表,指出整个研究领域是空中楼阁这种行为也不会受到同事、甚至导师和赞助方的青睐。 Older scientists contribute to the propagation of scientific fields in ways that go beyond educating and mentoring a new generation. In many fields, it’s common for an established and respected researcher to serve as “senior author” on a bright young star’s first few publications, lending his prestige and credibility to the result, and signaling to reviewers that he stands behind it. 
In the natural sciences and medicine, senior scientists are frequently the controllers of laboratory resources—which these days include not just scientific instruments, but dedicated staffs of grant proposal writers and regulatory compliance experts—without which a young scientist has no hope of accomplishing significant research. Older scientists control access to scientific prestige by serving on the editorial boards of major journals and on university tenure-review committees. Finally, the government bodies that award the vast majority of scientific funding are either staffed or advised by distinguished practitioners in the field. 在科学领地的开拓上,有资历的科学家除了对新一代传道授业之外,还可以在其它方面施加很大影响。在很多学科领域,卓有建树且受人尊敬的老学者以论文通讯作者的方式为年轻有为的新学者站台,用自己的名声和信誉向论文评审者对实验结果做出担保,这是很常见的。在自然科学和医学领域,有资历的科学家往往也掌握重要的研究资源,这些资源如今已不仅仅是科学仪器,还包括专门的研究基金申请书写作小组和合规问题专家等。没有这些资源,资历浅的研究员很难做出有影响力的研究。前辈们还掌控着重要的学术声誉,他们往往在重要期刊和终身教职的评审委员会列席。最后,许多主要的科研经费来自于政府机构,而政府的研究理事会要么由行内卓越人士担任,要么向他们寻求建议。 All of which makes it rather more bothersome that older scientists are the most likely to be invested in the regnant research paradigm, whatever it is, even if it’s based on an old experiment that has never successfully been replicated. The quantum physicist Max Planck famously quipped: “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.” 这一切都会使情况变的更麻烦,因为有资历的科学家更有可能站在主流的研究范式一边,无论该范式是什么,就算是建立在一个从未被成功重复的年代久远的实验结果之上。量子物理学家马克思·普朗克有句至理名言:“新的科学理论战胜旧理论,并非是论敌被说服了,而是论敌们最终都死掉了,新的一代成长起来并逐渐适应了新理论。” Planck may have been too optimistic. 
A recent paper from the National Bureau of Economic Research studied what happens to scientific subfields when star researchers die suddenly and at the peak of their abilities, and finds that while there is considerable evidence that young researchers are reluctant to challenge scientific superstars, a sudden and unexpected death does not significantly improve the situation, particularly when “key collaborators of the star are in a position to channel resources (such as editorial goodwill or funding) to insiders.” 普朗克可能有些过于乐观了。最近一篇来自于国家经济研究办公室的报告,研究了当明星学者在他们最为高产的时候突然死亡所带来的影响,发现虽然有大量的证据表明年轻学者不愿意去挑战明星学者,但是明星学者的突然意外死亡并不能显著改变这个情境,特别是当“明星学者的重要合作者依然掌控着学科内资源(如论文评审时的青睐或是研究经费)的分配渠道”时。 In the idealized Popperian view of scientific progress, new theories are proposed to explain new evidence that contradicts the predictions of old theories. The heretical philosopher of science Paul Feyerabend, on the other hand, claimed that new theories frequently contradict the best available evidence—at least at first. Often, the old observations were inaccurate or irrelevant, and it was the invention of a new theory that stimulated experimentalists to go hunting for new observational techniques to test it. 在理想化的波普尔式科学进步图景中,新理论应该能够解释新的证据,而这些证据是和旧理论所做出的预测是相悖的。与之相反,离经叛道的科学哲学家Paul Feyerabend认为,新理论常常和能够获得的最好证据相悖,至少在一开始是这样的。旧的观察方式往往不够精确或不是非常有关联,而正是新理论的发明促使实验者们去寻找新的观察技术来验证它们。 But the success of this “unofficial” process depends on a blithe disregard for evidence while the vulnerable young theory weathers an initial storm of skepticism. Yet if Feyerabend is correct, and an unpopular new theory can ignore or reject experimental data long enough to get its footing, how much longer can an old and creaky theory, buttressed by the reputations and influence and political power of hundreds of established practitioners, continue to hang in the air even when the results upon which it is premised are exposed as false? 
但是这种“非正式”的过程能够成功的关键,取决于当脆弱的新理论一开始被怀疑的风暴包围时能否以一种天真乐观的方式来无视既有证据。尽管如此,就算Feyerabend是对的,且不受欢迎的新理论能够无视或拒绝实验数据以至于站稳脚跟,那些陈腐古板的旧理论,即便其所基于的实验结论已被证明是错的,背后有着数百名业内人士的名誉、影响力、和政治权力的支持,会继续滞留多久呢? The hagiographies of science are full of paeans to the self-correcting, self-healing nature of the enterprise. But if raw results are so often false, the filtering mechanisms so ineffective, and the self-correcting mechanisms so compromised and slow, then science’s approach to truth may not even be monotonic. That is, past theories, now “refuted” by evidence and replaced with new approaches, may be closer to the truth than what we think now. 科学的圣传中充斥着凸显其自我纠正和自我治愈能力的光辉事迹。但如果原始结果是如此容易出错,筛选过程如此无效,且自我纠正机制如此迟缓且经常不被遵守的话,那么科学发掘事实真相的过程甚至不一定是单调的。即,过去的理论,现在已经被新证据“证伪”且被新方法取代的那些,可能比我们所想的更接近事实。 Such regress has happened before: In the nineteenth century, the (correct) vitamin C deficiency theory of scurvy was replaced by the false belief that scurvy was caused by proximity to spoiled foods. Many ancient astronomers believed the heliocentric model of the solar system before it was supplanted by the geocentric theory of Ptolemy. The Whiggish view of scientific history is so dominant today that this possibility is spoken of only in hushed whispers, but ours is a world in which things once known can be lost and buried. 这种倒退在以前也曾经发生过:在19世纪,对于坏血病的(正确的)维他命C缺乏理论被错误的理论取代,该理论认为是坏掉的食物导致了坏血病。许多古代天文学者相信日心说的太阳系模型,直到它被托勒密的地心说取代。以辉格史观看待科学发展的历程支配着当前主流看法,以至于倒退的可能性仅仅存在于窃窃私语中。但是在我们身处的世界里,知识是可以被掩埋和失传的。 And even if self-correction does occur and theories move strictly along a lifecycle from less to more accurate, what if the unremitting flood of new, mostly false, results pours in faster? Too fast for the sclerotic, compromised truth-discerning mechanisms of science to operate? The result could be a growing body of true theories completely overwhelmed by an ever-larger thicket of baseless theories, such that the proportion of true scientific beliefs shrinks even while the absolute number of them continues to rise. 
Borges’s Library of Babel contained every true book that could ever be written, but it was useless because it also contained every false book, and both true and false were lost within an ocean of nonsense. 而且就算自我纠正确实发生了,且理论的发展严格遵循从模糊到精确的周期,可是如果那些新的、大部分是错误的结果以更快的速度涌现呢?这速度如果快过让迟钝且不完善的科学真理判定机制来做出反应,情况又会怎样呢?结果可能是增长的正确理论被完全淹没在更快速增长的无稽理论中,以至于正确理论的绝对数量在增加,而同时它们所占的比例却逐渐减小。博尔赫斯的“巴别图书馆”里有每一本可能的包含真正知识书籍,但这毫无用处,因为它也收藏了每一本由错误知识构成的书【译注:“巴别图书馆”的藏书包含了25个书写符号任意排列组合组成的所有可能书籍】,结果就是正确和错误的知识都消散于无意义的海洋里。 Which brings us to the odd moment in which we live. At the same time as an ever more bloated scientific bureaucracy churns out masses of research results, the majority of which are likely outright false, scientists themselves are lauded as heroes and science is upheld as the only legitimate basis for policy-making. There’s reason to believe that these phenomena are linked. When a formerly ascetic discipline suddenly attains a measure of influence, it is bound to be flooded by opportunists and charlatans, whether it’s the National Academy of Science or the monastery of Cluny. 我想起生活中的怪事。一方面科学官僚们产生日渐臃肿的研究结果,其中大部分很可能是错误的,另一方面,科学家受到英雄般的尊崇,而科学被视为制定政策的唯一合理依据。我们有理由认为这些现象之间是有联系的。当一个曾经冷门的领域突然获得了一定的影响力的时候,必然遭到一批投机者和骗子的入侵,无论是国家科学院还是克吕尼修道院,都是一样的情况。【译注:克吕尼修道院,是公元910年在法国克吕尼建立的天主教修道院,以禁欲著称,是天主教改革运动克吕尼改革的发源地。】 This comparison is not as outrageous as it seems: Like monasticism, science is an enterprise with a superhuman aim whose achievement is forever beyond the capacities of the flawed humans who aspire toward it. The best scientists know that they must practice a sort of mortification of the ego and cultivate a dispassion that allows them to report their findings, even when those findings might mean the dashing of hopes, the drying up of financial resources, and the loss of professional prestige. 
这个比较并不是那么的荒谬:就像修道主义,科学也拥有一个超人的目标,其成就远非有缺陷的人类能力所及。最好的科学家懂得要忍辱负重并培养出冷静的心境,以便他们能够忠实地公布科学发现,尽管有时候这些发现意味着希望的破灭,财政的干涸,以及职业声誉上的损失。 It should be no surprise that even after outgrowing the monasteries, the practice of science has attracted souls driven to seek the truth regardless of personal cost and despite, for most of its history, a distinct lack of financial or status reward. Now, however, science and especially science bureaucracy is a career, and one amenable to social climbing. Careers attract careerists, in Feyerabend’s words: “devoid of ideas, full of fear, intent on producing some paltry result so that they can add to the flood of inane papers that now constitutes ‘scientific progress’ in many areas.” 不必惊奇,尽管科学的实践超出了修道的范畴,它仍然能吸引到不顾自身利益而追求真理的人们,尽管在历史上的大部分时期,投身科学无财无名。而现在,科学,特别是科技官僚,是一项职业,顺应社会攀爬。它会吸引一心求名求利的人,用Feyerabend的话说,这些人“毫无创见,充满恐惧,只想制造出某些琐碎的结论以便加入构成很多领域里所谓的‘科学进步’的论文大军”。 If science was unprepared for the influx of careerists, it was even less prepared for the blossoming of the Cult of Science. The Cult is related to the phenomenon described as “scientism”; both have a tendency to treat the body of scientific knowledge as a holy book or an a-religious revelation that offers simple and decisive resolutions to deep questions. 如果说科学界对突然涌入的利益分子缺乏准备,那么面对爆发的科学教派就更是措手不及了。这个教派和被称为“科学至上主义”的现象有很大联系。二者都倾向于将科学知识视为圣经或是某种非宗教意义上的启示,认为它对于深刻的问题可以带来简单且具有决定意义的解答。 But it adds to this a pinch of glib frivolity and a dash of unembarrassed ignorance. Its rhetorical tics include a forced enthusiasm (a search on Twitter for the hashtag “#sciencedancing” speaks volumes) and a penchant for profanity. Here in Silicon Valley, one can scarcely go a day without seeing a t-shirt reading “Science: It works, b—es!” The hero of the recent popular movie The Martian boasts that he will “science the sh— out of” a situation. 
但是科学教在此之上又多了一点夸夸其谈和一点不知脸红的无知。在修辞上体现为一种强迫症式的狂热(在推特上搜一下“sciencedancing”的主题标签就知道了)和对脏话的嗜好。在我们硅谷,走在大街上经常看到有人的T恤上印着诸如“科学:贼好用,婊子们!”最近的热门电影《火星救援》里的主人公面对危机时的豪言壮语则是“用科学把它捅出屎”。 One of the largest groups on Facebook is titled “I f—ing love Science!” (a name which, combined with the group’s penchant for posting scarcely any actual scientific material but a lot of pictures of natural phenomena, has prompted more than one actual scientist of my acquaintance to mutter under her breath, “What you truly love is pictures”). Some of the Cult’s leaders like to play dress-up as scientists—Bill Nye and Neil deGrasse Tyson are two particularly prominent examples— but hardly any of them have contributed any research results of note. Rather, Cult leadership trends heavily in the direction of educators, popularizers, and journalists. 脸书上最大的团体之一的名字是“我真他妈的爱科学!”(这个名字,加上该团体对于发表大量自然现象的图片而不是科学内容的爱好,已经让不止一个我认识的真正的科学家嘀咕“你们爱的其实是图片吧”)。某些科学教的领袖们喜欢装扮成科学家的样子——Bill Nye和Neil deGrasse Tyson是其中的两个典型——但他们几乎没有任何值得一提的研究贡献。与之相对的是,这些领袖们在教育者、科普者、和媒体从业者中非常受欢迎。 At its best, science is a human enterprise with a superhuman aim: the discovery of regularities in the order of nature, and the discerning of the consequences of those regularities. We’ve seen example after example of how the human element of this enterprise harms and damages its progress, through incompetence, fraud, selfishness, prejudice, or the simple combination of an honest oversight or slip with plain bad luck. These failings need not hobble the scientific enterprise broadly conceived, but only if scientists are hyper-aware of and endlessly vigilant about the errors of their colleagues . . . and of themselves. When cultural trends attempt to render science a sort of religion-less clericalism, scientists are apt to forget that they are made of the same crooked timber as the rest of humanity and will necessarily imperil the work that they do. The greatest friends of the Cult of Science are the worst enemies of science’s actual practice. 
总之,科学在最好的时候,是一项怀有超人目标的人类事业:发现自然秩序中的规律,并辨明这些规律的后果。我们已经看到一个又一个例子,说明这项事业中的人类因素如何损害其进步:有的出于无能、欺诈、自私、偏见,有的只是无心的疏忽或失误碰上了坏运气。这些缺陷未必会拖累广义的科学事业,但前提是科学家对同事的错误,以及自己的错误,保持高度自觉和不懈的警惕。当文化潮流试图把科学变成某种没有宗教的教权主义时,科学家很容易忘记自己和其他人一样,都是由同样的曲木造成的,从而必然危及自己所从事的工作。科学教最亲密的朋友,正是科学实践最危险的敌人。

William A. Wilson is a software engineer in the San Francisco Bay Area.

本文作者William A. Wilson是旧金山湾区的一名软件工程师。

(编辑:辉格@whigzhou)

*注:本译文未经原作者授权,本站对原文不持有也不主张任何权利,如果你恰好对原文拥有权益并希望我们移除相关内容,请私信联系,我们会立即作出响应。

——海德沙龙·翻译组,致力于将英文世界的好文章搬进中文世界——

[译文]一位社会心理学家的自白

RECKONING WITH THE PAST
和过去做个了结

作者:MICHAEL INZLICHT @ 2016-02-29
译者:龟海海(@龟海海)
校对:混乱阈值(@混乱阈值)
来源:MICHAEL INZLICHT的博客,http://michaelinzlicht.com/getting-better/2016/2/29/reckoning-with-the-past

Sometimes I wonder if I should be fixing myself more to drink.

有时候我辗转反侧,不知是否该借酒消愁。

No, this is not going to be an optimistic post.

没错,这不是一篇鸡汤文。

If you want bubbles and sunshine, please see my friend Simine Vazire’s post on why she is feeling optimistic about things. If you want nuance and balance, see my co-moderator Alison Ledgerwood’s new blog*. Instead, if you will allow me, I want to wallow.

如果你想要泡沫和阳光,我朋友Simine Vazire的文章会告诉你为什么她如此积极乐观。如果你想要情绪间的微妙平衡,看我同僚Alison Ledgerwood的新博客。而我,只想好好吐槽一番。

I have so many feelings about the situation we’re in, and sometimes the weight of it all breaks my heart. I know I’m being intemperate, not thinking clearly, but I feel that it is only when we feel badly, when we acknowledge and, yes, grieve for yesterday, that we can allow for a better tomorrow. I want a better tomorrow, I want social psychology to change. But, the only way we can really change is if we reckon with our past, coming clean that we erred; and erred badly.

我对我们现在的处境有太多的感触,这有时沉重得让我心力交瘁。我知道我失去了自控,头脑不清楚。但我觉得只有当我们直面昨日,为昨日沉痛伤感,才能拥有美好的明天。我渴望美好的明天,我希望社会心理学能改变。但是,唯一能使我们真正改变的是和过去做个了结,坦白过去所犯的严重错误。

To be clear: I am in love with social psychology. I am writing here because I am still in love with social psychology. Yet, I am dismayed that so many of us are dismissing or justifying all those small (and not so small) signs that things are just not right, that things are not what they seem. “Carry-on, folks, nothing to see here,” is what some of us seem to be saying.

首先声明:我热爱社会心理学。我在这儿码字就是因为我依然爱它。然而,让我感到泄气的是,尽管很多微小(其实并非如此微小)的迹象表明情况不妙且另有隐情,我们之中许多人却对所有这些迹象视而不见或想出种种理由开脱。“继续,伙计,这儿没啥好看的,”我们中有些人似乎在这么说着。

Our problems are not small and they will not be remedied by small fixes. Our problems are systemic and they are at the core of how we conduct our science.
My eyes were first opened to this possibility when I read Simmons, Nelson, and Simonsohn’s paper during what seems like a different, more innocent time. 我们的问题不小,想轻易补救谈何容易。我们的问题是系统性的,而且密切关系到我们如何进行科研。我起初发现有可能出了问题是在我读了 Simmons, Nelson, 和Simonsohn合著的论文之后,那时情况看起来和如今还有所不同,还是一个更纯真的年代。【编注:该论文发表于2011年】 This paper details how small, seemingly innocuous, and previously encouraged data-analysis decisions could allow for anything to be presented as statistically significant. That is, flexibility in data collection and analysis could make even impossible effects seem possible and significant. 这篇论文详细阐述了那些之前受鼓励的微小且看似无害的数据分析是如何让事物呈现出统计意义的。那就是,灵活的数据收集和分析可以让那些实际不可能的作用变得可能并且显著。 What is worse, Andrew Gelman made clear that a researcher need not actively p-hack their data to reach erroneous conclusions. It turns out such biases in data analyses might not be conscious, that researchers might not even be aware of how their data-contingent decisions are warping the conclusions they reach. This is flat-out scary: Even honest researchers with the highest of integrity might be reaching erroneous conclusions at an alarming rate. 更糟的是,研究者无需主动挖掘数据就能得到错误的结论,这点被Andrew Gelman解释得很清楚。事实是,研究者在数据分析中的偏见可能不是有意识的,他们甚至没有意识到自己依据数据做出的决定正在歪曲他们最终得到的结论。这可怕至极:即使最诚实,最正直的研究者也有可能以高得吓人的几率得出错误的结论。 Third, is the problem of publication bias. As a field, we tend only to publish significant results. This could be because as authors we choose to focus on these; or, more likely, because reviewers, editors, and journals force us to focus on these and to ignore nulls. 接下来还有发表过程中的偏见。在特定领域中,我们只倾向于发表具有显著意义的结果。这可能是由于作为作者我们选择把注意力放在这些结果上;或者,更可能的是,因为审稿人,编辑和期刊迫使我们把注意力放在具有显著意义的结果上,而忽略那些零结果的研究。 This creates the infamous file drawer that altogether warps the research landscape. Because it is unclear how large the file drawer is for any research literature, it is hard to determine how large or small any effect is, if it exists at all. 
这就导致了臭名昭著的"文件抽屉"问题(即发表偏见问题),最终歪曲了整个研究领域的形态。由于对任何研究文献我们无法知道其中的“文件抽屉”有多大,我们很难确定该问题所产生的某种影响有多大,假如该影响确实存在的话。 I think these three ideas—that data flexibility can lead to a raft of false positives, that this process might occur without researchers themselves being aware, and the unknown size of the file drawer—explains why so many of our cherished results can’t replicate. These three ideas suggest we might have been fooling ourselves into thinking we were chasing things that are real and robust, when we were pursuing neither. 我认为以上三点——数据的灵活性可能导致大量错误结论,且这一过程可能在研究人员不经意间发生,以及“文件抽屉”尺寸大小的不明——很好地解释了为什么众多我们所珍视的研究成果无法被重复。这三点表明我们可能一直以来自欺欺人以为自己在探求真实且坚实的结果,而事实上我们所追求的既不真实也不坚实。 As someone who has been doing research for nearly twenty years, I now can’t help but wonder if the topics I chose to study are in fact real and robust. Have I been chasing puffs of smoke for all these years? 作为一个做了近20年研究的人,我忍不住怀疑过往研究的课题是否有确凿的依据立论。这些年来我致力探求的是否只是海市蜃楼? I have spent nearly a decade working on the concept of ego depletion, including work that is critical of the model used to explain the phenomenon. I have been rewarded for this work, and I am convinced that the main reason I get any invitations to speak at colloquia and brown-bags these days is because of this work. 我曾用将近十年的时间来研究“自我耗尽”的概念,包括对解释该现象的模型至关重要的一些工作。我因此项研究获奖,同时我确信现在我之所以能受邀在众多学术讨论会发言并白吃白喝都是因为此项研究。 The problem is that ego depletion might not even be a thing. By now, many people are aware that a massive replication attempt of the basic ego depletion effect involving over 2,000 participants found nothing, nada, zip. Only three of the 24 participating labs found a significant effect, but even then, one of these found a significant result in the wrong direction! 问题在于,“自我耗尽”这个概念可能根本就不存在。时至今日,许多人都知道一项由两千余人参加的试图重复“自我耗尽”效应的大规模研究最终什么都没发现,一片空白。二十四个参与研究的实验室中只有三个发现显著的效应,但即使这样,其中一个发现的显著效应竟然是反向的! There is a lot more to this registered replication than the main headline, and there is still so much evidence indicating fatigue is a real phenomenon. 
I promise to get to these thoughts in a later post, once the paper is finally published. But for now, we are left with a sobering question: If a large sample pre-registered study found absolutely nothing, how has the ego depletion effect been replicated and extended hundreds and hundreds of times? More sobering still: What other phenomena, which we now consider obviously real and true, will be revealed to be just as fragile? 此次记录在案的重复性研究留下的不仅仅是一个标题,同时,还有大量的证据表明“疲劳”是真实存在的现象。我承诺一旦我的论文最终发表,我会在之后的博客文章中加以阐述。但现在,令人警醒的问题则是:如果此前大量的研究毫无斩获,那么“自我耗尽”的效应是如何成千上万次地被复制并延伸的呢?更令人警醒的:其它那些我们认为真实无疑的现象,又会不会同样经不起检验呢? As I said, I’m in a dark place. I feel like the ground is moving from underneath me and I no longer know what is real and what is not. 如我所说,我身处黑暗之地。我感觉似乎脚下的土地都在移动,而我已经辨不清真实和虚假了。 I edited an entire book on stereotype threat, I have signed my name to an amicus brief to the Supreme Court of the United States citing stereotype threat, yet now I am not as certain as I once was about the robustness of the effect. I feel like a traitor for having just written that; like, I’ve disrespected my parents, a no no according to Commandment number 5. 之前我编辑了《刻板印象的威胁》一书,我还签署了一份美国最高法院的法庭陈述并引用了《刻板印象的威胁》,但如今我对该效应的确凿程度却不如过去那样坚定。写下这些文字,让我觉得自己像个叛徒。这感觉如同我对父母大不敬,触犯了十戒第五条。 But, a meta-analysis published just last year suggests that stereotype threat, at least for some populations and under some conditions, might not be so robust after all. P-curving some of the original papers is also not comforting. 但是,去年一项“元分析”(对以往的研究结果进行系统的定量分析)的研究表明,”刻板印象威胁”在一些特定条件下对于一些特定人群可能并不适用,此外对一些原始论文作p值统计曲线的结果同样不让人放心。 Now, stereotype threat is a politically charged topic and there is a lot of evidence supporting it. That said, I think a lot more pain-staking work needs to be done on basic replications, and until then, I would be lying if I said that doubts have not crept in. Rumor has it that a RRR of stereotype threat is in the works. 
如今,“刻板印象威胁”是一个政治色彩浓厚的话题,也有大量证据支持它。话虽如此,我认为在基础的重复性研究上还有很多艰苦的工作要做;在那之前,我若说自己心中没有生出疑虑,那就是在撒谎。有传言称,一项针对“刻板印象威胁”的注册重复报告(RRR,Registered Replication Report)正在筹备之中。

To be fair, this is not social psychology’s problem alone. Many other allied areas in psychology might be similarly fraught and I look forward to these other areas scrutinizing their own work. Areas like developmental, clinical, industrial/organizational, consumer behavior, organizational behavior, and so on, need an RPP project or Many Labs of their own. Other areas of science face similar problems too.

公正地说,这不只是社会心理学的问题。心理学中许多相邻的领域可能同样问题重重,我期待这些领域也仔细审视自己的研究。发展心理学、临床心理学、工业与组织心理学、消费者行为、组织行为等领域,都需要自己的RPP项目【译注:RPP,Reproducibility Project: Psychology,即“心理学可重复性项目”,由开放科学合作组织对100项已发表的心理学研究进行重复验证】或者“多实验室”项目【译注:Many Labs项目,由多个实验室协作,对心理学中的多种效应进行可重复性验证】。其他科学领域同样面临类似的问题。

During my dark moments, I feel like social psychology needs a redo, a fresh start. Where to begin, though? What am I mostly certain about and where can my skepticism end? I feel like there are legitimate things we have learned, but how do we separate wheat from chaff? Do we need to go back and meticulously replicate everything in the past? Or do we use those bias tests Joe Hilgard is so sick and tired of to point us in the right direction? What should I stop teaching to my undergraduates? I don’t have answers to any of these questions.

在我消沉的时候,我觉得社会心理学需要推倒重来,从头开始。可是从哪儿开始呢?哪些东西我大体上可以确信,我的怀疑又该在哪里止步?我认为我们确实学到了一些站得住脚的东西,但如何区分精华和糟粕呢?我们是否需要回过头去,一丝不苟地重复过去的所有研究?还是借助那些让Joe Hilgard厌烦透顶的偏倚检验来指明方向?哪些内容我不该再教给本科生?对所有这些问题,我都没有答案。

This blogpost is not going to end on a sunny note. Our problems are real and they run deep. Okay, I do have some hope: I legitimately think our problems are solvable. I think the calls for more statistical power, greater transparency surrounding null results, and more confirmatory studies can save us. What is not helping is the lack of acknowledgement about the severity of our problems. What is not helping is a reluctance to dig into our past and ask what needs revisiting.

本篇博文不会有个阳光的结尾。我们的问题是真实的,而且很深。好吧,我确实抱有一些希望:我真心认为我们的问题是可以解决的。我认为,提高统计功效、让零结果更加透明、开展更多验证性研究,这些呼吁能够拯救我们。无济于事的是,人们不肯承认问题的严重性;无济于事的是,人们不愿深究过去、追问哪些东西需要重新审视。

Time is nigh to reckon with our past. Our future just might depend on it.

是时候和我们的过去做个了结了。我们的未来或许就取决于此。
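The first of the three ideas discussed in this post, that flexible data collection and analysis can manufacture significance, can be illustrated with a small simulation. This is not code from Simmons, Nelson, and Simonsohn's paper. It is a minimal sketch under assumed conditions: there is no true effect, data are unit-variance normal noise, a simple z-test is used, and the "flexible" researcher has two outcome measures plus one extra round of data collection. All function names and parameters are hypothetical.

```python
import math
import random


def p_value(group_a, group_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes equal group sizes and a known variance of 1 in both groups.
    """
    n = len(group_a)
    diff = sum(group_a) / n - sum(group_b) / n
    z = diff / math.sqrt(2.0 / n)
    return math.erfc(abs(z) / math.sqrt(2.0))


def run_study(rng, flexible, n_start=20, n_extra=10, alpha=0.05):
    """Return True if a study of a nonexistent effect comes out 'significant'."""
    n_outcomes = 2 if flexible else 1
    # One (group A, group B) pair per outcome measure; everything is pure noise.
    data = [
        ([rng.gauss(0, 1) for _ in range(n_start)],
         [rng.gauss(0, 1) for _ in range(n_start)])
        for _ in range(n_outcomes)
    ]
    if any(p_value(a, b) < alpha for a, b in data):
        return True
    if not flexible:
        return False
    # Optional stopping: nothing worked, so collect more data and test again.
    for a, b in data:
        a.extend(rng.gauss(0, 1) for _ in range(n_extra))
        b.extend(rng.gauss(0, 1) for _ in range(n_extra))
    return any(p_value(a, b) < alpha for a, b in data)


def false_positive_rate(flexible, n_studies=4000, seed=1):
    """Share of simulated null studies that report a significant result."""
    rng = random.Random(seed)
    hits = sum(run_study(rng, flexible) for _ in range(n_studies))
    return hits / n_studies


print("fixed plan:   ", false_positive_rate(flexible=False))
print("flexible plan:", false_positive_rate(flexible=True))
```

Under these assumptions the fixed plan should stay near the nominal 5% error rate, while the flexible plan should land well above it, even though no individual step looks like cheating.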

········

*In case you haven’t heard, Alison started a wonderful Facebook discussion group that I have the privilege of co-moderating. If you’re tired of bickering and incivility, but still want a place to discuss ideas, PsychMAP just might be for you.

*如果你还没听说:Alison开了一个非常不错的脸书讨论组,我有幸参与共同主持。如果你厌倦了争吵和无礼,但仍想找个地方讨论想法,PsychMAP可能恰好适合你。

(编辑:辉格@whigzhou)


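How many conclusions would be shaken?

【2016-07-25】

@whigzhou: Research dominated by statistical methods has a problem: it makes it easy to overlook factors that are fundamentally important but show no statistical variation. Take height. In a society where children's nutrition is generally assured, researchers might conclude that "nutrition is not an important factor affecting height", and that conclusion might hold up for many years, until one day some population goes through severe malnutrition...

@whigzhou: In a controlled experiment, this kind of problem can be avoided by intervening on the nutrition parameter. But the social sciences often cannot intervene on parameters at will and can only use statistical methods to imitate a controlled experiment. When the sampled values of some variable lack diversity, the imitation cannot be carried out, and a blind spot remains.

@whigzhou: In recent years there have been many country-level political science studies that quantify lots of indicators and use statistical tools quite skillfully, but I keep feeling that some basic background conditions have not received enough attention. For example, a basic backdrop to every country's politics since the Napoleonic Wars has been the existence of the British or American empire. This condition is so universal and so firm that no variation in it can be observed. Once it is removed, how many conclusions would be shaken?

@whigzhou: What makes the problem trickier are variables that do vary enough but whose marginal effect drops sharply past some threshold, such as the relationship between calcium intake and height. Over the range from zero to the adequate level, calcium intake affects height markedly; above the adequate level, the marginal effect falls off steeply to almost nothing. In that case it is even easier to reach a wrong conclusion.

@慕容飞宇gg: Yes. The various comparisons of public and private schools have a similar problem: existing conclusions only apply against the present backdrop, in which 90% of students attend public schools. To a liberal-dominated academia (“李伯儒主导的学界” in the original), this backdrop is taken for granted.

@whigzhou: Mm.

@whigzhou: We often hear claims like "60% of the difference in some trait is attributable to genes and 40% to environment", as if the split were some fixed value. In fact, these proportions of course depend heavily on the living conditions of the target population: eliminate all lead pollution in a group, and the environmental "share" of variation in intelligence drops immediately.

@whigzhou: This is the theme Taleb wanted to discuss in The Black Swan, but he is too dim-witted: he wrote a thick brick of a book that looks very philosophical and still did not make the point clearly.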
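The point about "60% genes, 40% environment" can be made concrete with a little arithmetic. The sketch below is only an illustration of the thread's argument, assuming that genetic and environmental variance are independent and simply add up (no interaction or correlation between them). The function name and all numbers are invented for the example, not measurements.

```python
def variance_shares(var_genetic, var_environment):
    """Split total trait variance into genetic and environmental shares.

    Assumes the two sources are independent and additive, which is a
    simplification made only for this illustration.
    """
    total = var_genetic + var_environment
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_genetic / total, var_environment / total


# Hypothetical variances in arbitrary units.
before = variance_shares(var_genetic=60.0, var_environment=40.0)
# Same genes, but one environmental source of variation (say, lead) is removed.
after = variance_shares(var_genetic=60.0, var_environment=10.0)

print(f"before: genes {before[0]:.0%}, environment {before[1]:.0%}")
print(f"after:  genes {after[0]:.0%}, environment {after[1]:.0%}")
```

Nothing about the genes changed between the two calls; only the spread of environments did, and the attributed shares moved with it.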
[译文]为何精神分裂症患者那么爱抽烟

Schizophrenia: No Smoking Gun
精神分裂症:缺乏“冒烟”的确凿证据

作者:Scott Alexander @ 2016-01-11
译者:沈沉(@你在何地-sxy)
校对:小册子(@昵称被抢的小册子)
来源:Slate Star Codex,http://slatestarcodex.com/2016/01/11/schizophrenia-no-smoking-gun/

[Note: despite how some people are spinning this, tobacco is still really really bad and you should not smoke it]
【请注意:尽管许多人言之凿凿,但烟草真的真的还是很不好,不应该抽烟。】

I.

Schizophrenics smoke. A lot. Depending on the study, about 60-80% of schizophrenics smoke, compared to only about 20% of the general population. And they spend on average about 27% (!) of their income on cigarettes. Even allowing that schizophrenics don’t make much income, that’s a lot of money. Sure, schizophrenics are often poor and undereducated and have other risk factors for smoking – but even after you control for this, the effect is still pretty strong.

精神分裂症患者抽烟,而且很多。根据某项研究,大约60%至80%的精神分裂症患者会抽烟,与之相比,总人口中只有约20%。而且,他们在烟草上的花费大约占到其收入的27%(!)。即便考虑到精神分裂症患者收入不高,这也是一大笔钱。无疑,精神分裂症患者通常都很穷、受教育程度不高,并且还有其他导致其吸烟的风险因素,但即便把所有这些都加以控制,精神分裂症与抽烟之间的统计关系还是很强。

Various people have come up with various explanations. Cognitively-minded people say that schizophrenics smoke as a maladaptive coping strategy for the anxiety caused by their condition. Pharmacologically-minded people say that schizophrenics smoke because smoking accelerates the metabolism of antipsychotic drugs and so makes their side effects go away faster. Pragmatically-minded people say that schizophrenics smoke because they’re stuck in institutions with nothing to do all day. No points for guessing what the Freudians say.

许多人已经为此提出过许多各种解释。关注认知的人说,精神分裂症患者抽烟,是对该疾病所致焦虑的不良应对策略。关注药理的人会说,他们抽烟是因为抽烟会加快抗精神病药物的代谢,从而能够促使其副作用更快消失。更为务实的人会说,他们抽烟是因为他们被困在了整日无所事事的社会福利机构里面。猜测弗洛伊德主义者的说法就没必要了。

But all these theories have problems. Sure, schizophrenics are often institutionalized, but even the ones at home smoke a lot. Sure, some schizophrenics are often on antipsychotics, but even the ones who aren’t on meds smoke a lot. Sure, schizophrenics are anxious, but we don’t see people with Generalized Anxiety Disorder having 80% smoking rates.

但所有这些理论都存在问题。毫无疑问,精神分裂症患者通常都被社会福利机构收容,但即便是那些散居在家的也抽很多烟。毫无疑问,有些精神分裂症患者经常服用抗精神病药,但即便是那些不服药的也抽很多烟。毫无疑问,精神分裂症患者很焦虑,但我们并没有在患有广泛性焦虑障碍的人群中看到80%的吸烟率。

As usual, I’m more biologically-minded, so I find it interesting that some of the genes that most commonly turn up as linked to schizophrenia – especially CHRNA3, CHRNA5, and CHRNA7 – are in nicotine receptors. Indeed, some of them are also the genes identified as risk factors for smoking.

我素来更倾向从生物学方面考虑,所以我觉得有趣的是,最常被发现与精神分裂症相关的基因中,有一些(特别是CHRNA3、CHRNA5和CHRNA7)正是尼古丁受体的基因。实际上,其中一些基因同时也被确认为吸烟的风险因素。

Further, there’s a lot of evidence that schizophrenic people actually feel better and have fewer symptoms when they’re smoking. Further, schizophrenics tend to gravitate toward cigarettes with higher nicotine content, and smoke them in ways that maximize nicotine absorption.

此外,大量证据表明,精神分裂症患者在吸烟时确实感觉更好、症状更少。而且,他们往往偏爱尼古丁含量更高的香烟,吸烟的方式也会尽量提高尼古丁的吸收量。

It seems like part of the problem with schizophrenia is that the brain’s nicotine system isn’t working well. Smoking supplements nicotine and makes the system run smoother, so schizophrenics feel better when they smoke and continue to do so. This is the widely accepted self-medication hypothesis.

精神分裂症的问题似乎部分在于患者大脑的尼古丁系统运转不良。吸烟能够补充尼古丁,让这一系统运转得更顺畅,所以患者在吸烟时感觉更好,并且持续吸烟。这就是被广泛接受的“自我药疗假说”。

I like this because it’s a really elegant example of…I don’t know what you’d call it…memetic evolution? Nobody knew that nicotine helped schizophrenia, nobody told the schizophrenics that, but they sort of naturally gravitated to an effective treatment for their condition by going in the direction of things that make them feel better, even going so far as to unknowingly gravitate toward cigarette brands with more nicotine.
我喜欢这一假说,因为它真是模因进化(我不知道你们如何称呼它)的一个极好例证。原先并没人知道尼古丁有助于缓解精神分裂症,没人这么告诉患者,但他们通过追随让他们感觉良好的事物,可以说是自然地找到了有效的治疗方法,甚至不自觉地偏爱尼古丁含量更高的烟草品牌。 They did all of this before psychiatry had any idea why they were doing it, and in the face of constant protests that it was stupid and useless. This should be a warning to anyone who’s too quick to tell patients that their coping strategies are maladaptive. 早在精神病学对其做法之缘由有任何了解之前,他们就已经在这么做了,尽管当时人们一直批评这种做法既愚蠢又无用。有些人会过于仓促地认为患者的应对策略调整不佳,上述事实应当能让这些人引以为戒。 But there’s a much more important question here: does smoking cause schizophrenia? How about prevent it? 但此处还有一个更为重要的问题:吸烟会导致精神分裂症吗?又会不会防止精神分裂症呢? II. First, the causation argument. Gurillo et al do a meta-analysis and conclude that “daily tobacco use is associated with increased risk of psychosis and an earlier age of onset of psychotic illness. The possibility of a causal link between tobacco use and psychosis merits further examination”. That is, schizophrenics are already smoking much more at the moment their schizophrenia starts. This suggests that maybe smoking is helping to cause the schizophrenia? 首先来看因果论证。Gurillo等人做了一个荟萃分析,得出结论认为:“每日使用烟草与精神病风险的增加和精神疾病发病年龄的提早均有关。烟草使用和精神病之间存在因果关系的可能性还需要进一步研究。”也就是说,精神分裂症患者在初次发病时就已经在大量抽烟了。这是否意味着吸烟有可能增加患精神分裂症的风险? All nice and well, except for a few things. First, this study ignores the possibility that the genes that cause schizophrenia might also cause increased smoking, even though we have some evidence that this is true (actually, it doesn’t ignore this, it mentions it, but uses it as a reason why a schizophrenia-smoking link is more plausible). 听上去很好,就是有一点点问题。首先,该研究忽略了一种可能性,即导致精神分裂症的基因可能也会导致烟瘾增加,而我们在这方面有一些证据。(实际上该研究并没有忽略这种可能性,而是有所提及,但只是把它作为精神分裂症与吸烟有关联这一说法更可信的理由)。 Second, we know that people who will later develop schizophrenia are seen as kind of odd even before they come down with the disease, and it’s possible that they’re already in some unusual brain state that smoking helps relieve. 
Third, this study is not controlled – meaning that we’re totally helpless before factors like “people destined to later develop schizophrenia are often poor, and poor people smoke more”. 第二,我们知道,有些后来得了精神分裂症的人早在得病之前就看起来似乎有点奇怪,可能那时候他们的大脑就已经处于某种不正常状态,而吸烟能帮助缓解这种状况。第三,该项研究没有进行对照控制,也就是说如果把某些因素考虑进去,比如“后来注定会得精神分裂症的人通常很穷,而穷人通常抽烟更多”等,我们就无力回答。 And fourth, another study shows exactly the opposite. 还有,第四,另一项研究有完全相反的发现。 Zammit et al (thanks to @allfeelsallthetime for the tip) looks at 50,000 teenage Swedish conscripts, then follows them throughout their lives to see which ones do or don’t get schizophrenia. They find that without adjusting for confounders, smokers are more likely to get schizophrenia. Zammit等人(感谢网友@allfeesallthetime提示)选取了50000个应征入伍的瑞典青少年,然后终身追踪他们,观察哪些会得精神分裂症,哪些不会。他们发现,如果不就混杂因子【编注:混杂因子是指同时导致A与B两个因子,从而使得A与B表现出相关性的因子。】作出调整,吸烟者便看起来更可能得精神分裂症。 But when you do adjust for confounders, smokers are less likely to get schizophrenia, (hazard ratio 0.8, p = 0.003) and heavy smokers are much less likely to get schizophrenia (hazard ratio 0.5)! A dose-dependent relationship was found between smoking and protection from schizophrenia. This is really interesting. 但如果你就混杂因子作了调整,吸烟者得精神分裂症的可能性相对就会较低(风险比为0.8,p=0.003),而重度嗜烟者患精神分裂症的可能性相对而言非常低(风险比为0.5)!在吸烟与避免精神分裂症之间居然找出了这种与剂量相关的关系,真是非常有意思。 Why do we find such different results from these two studies? The only explanation I can think of is that the second study controls for various factors including cannabis use, personality variables, IQ, past psychiatric diagnoses, and place of upbringing (thanks @su3su2u1 for the tip) and the first study controls for zilch. 为什么两项研究会得出如此不同的结论?我能想到的唯一解释就是,第二项研究对照控制了许多不同因素,包括吸食大麻、个性差异、智商、既往精神病诊断史、成长地点等(感谢网友@su3su2u1提示),而第一项研究没做任何控制。 In fact, we find that the second study’s uncontrolled numbers are not that different from the first study’s uncontrolled numbers, and that the only difference is that the second study then went on to control for confounders and get the opposite result. 
Controlling for more things is not always better, but controlling for a few things that previous studies and common sense suggest are very relevant is pretty superior to just leaving the data entirely unprocessed. Advantage very much second study.

实际上,我们发现第二项研究未经调整的数字与第一项研究未经调整的数字相差不大,两者唯一的差别在于第二项研究进一步对混杂因子作了控制,然后得出了相反的结论。控制的因素并不总是越多越好,但如果此前的研究和基本常识都表明某几个因素非常相关,那么对它们加以控制,比对数据完全不作处理要好得多。第二项研究优势明显。

III.

Unlike certain people on Facebook, I fucking hate science. Let me explain why.

跟Facebook上的某些人不同,我真他妈讨厌科学。让我来解释解释。

The first study here, Gurillo et al, was published ten years after the second study. Since it is a meta-analysis, it included the second study in it. The authors of the first study definitely read the second study. They just didn’t care.

此处提到的第一项研究,即Gurillo等人的研究,比第二项研究晚发表十年。由于它是一个荟萃分析,它把第二项研究也纳入了分析。第一项研究的作者们必定读过第二项研究。他们只是毫不在乎。

Nowhere in the first study does it say “By the way, we read this other study that got the opposite results from us, let’s try to figure out why, oh, it was because they controlled for things and we didn’t, maybe that should call our findings into question.”

第一项研究从未在任何地方说过:“顺便一提,我们读到了另一项研究,其结论与我们的正好相反;我们来看看原因是什么,哦,原来是因为他们对一些因素作了控制而我们没有,这也许会让我们的发现受到质疑。”

You know what they did do? They listed the second study as finding that smoking increased schizophrenia risk, because the rules of their meta-analysis said they would only take uncontrolled data, and so they did. You can read this entire study, which cites the second study no fewer than six times, without hearing at all about the fact that the second study got the opposite result using likely better methodology.

你知道他们实际干了什么吗?他们把第二项研究列为“发现吸烟增加精神分裂症风险”的研究之一,因为他们荟萃分析的规则是只采用未经调整的数据,他们也确实这么做了。你可以通读全文,它引用第二项研究不下六次,却从头到尾没有提到这样一个事实:第二项研究用很可能更好的方法得出了相反的结果。

Then they go on to conclude that:

然后,他们在结论中说:
Cigarette smoking might be a hitherto neglected modifiable risk factor for psychosis, but confounding and reverse causality are possible. Notwithstanding, in view of the clear benefits of smoking cessation programs in this population, every effort should be made to implement change in smoking habits in this group of patients. 吸烟可能是引发精神病的可改造风险因素之一,这一点迄今为止一直为人所忽略。但是,混杂偏差和反向因果关系也有可能存在。尽管如此,考虑到在这一人群中实施戒烟计划的明显好处,我们应该全面努力,促使这一病患群体改变吸烟习惯。
Clear benefits! Every effort! Aaaaaaah!

明显好处!全面努力!啊啊啊啊啊!

I mean, I know where they (and the Lancet editors, who write a glowing comment backing them up) are coming from. Smoking is bad because lung cancer, COPD, etc. But now we have these things called e-cigarettes! They deliver nicotine without tobacco! As far as anyone knows they carry vastly less risk of cancer, COPD, etc. If nicotine actually prevents schizophrenia rather than causing it, that is the sort of thing we should really want to know. And instead we’re just getting this “We should make schizophrenia patients stop smoking, because smoking is bad”.

我是说,我明白他们(以及《柳叶刀》的编辑们,后者写了篇热情洋溢的评论为他们撑腰)的出发点。吸烟不好,因为会导致肺癌、慢性阻塞性肺病等等。但我们现在有了所谓的电子烟!它们无需烟草就能提供尼古丁!就目前所知,它们带来的癌症、慢性阻塞性肺病等风险要低得多。如果尼古丁确实能预防而不是导致精神分裂症,这正是我们真正应该弄清楚的事情。可我们听到的却只是:“我们应该让精神分裂症患者戒烟,因为吸烟不好。”

Look. I am not going to come out and say that there’s great evidence that nicotine decreases schizophrenia risk. There’s one study, which other studies contradict. I happen to think that the one study looks better than its competitors, but that’s my opinion and I have nowhere near the evidence I would need to feel really strongly about this.

注意,我并不是要跳出来说,有很强的证据表明尼古丁能降低精神分裂症的患病风险。只有一项研究这么说,而其他研究与之相抵触。我恰好认为这项研究比它的对手们做得更好,但这只是我的个人看法,要让我对此抱有强烈的信念,我手上的证据还差得远。

But I feel like we are very far from the point where we know enough to be pushing people at risk of schizophrenia away from nicotine, and light-years away from the point where we can use phrases like “clear benefits”.

但是,我也认为,要说我们已经知道得足够多,可以劝有精神分裂症风险的人远离尼古丁,那还差得很远;要说可以使用“明显好处”这类说法,那更是差着好几光年。

Possibly I am an idiot and missing something very important.
But if this is true, I wish the authors of the new study, and the editors of The Lancet, would have acknowledged the existence of the conflicting study and patiently explained to their readership, many of whom are idiots like myself, “Here’s a study that looks better than ours that seems to contradict our results, but here’s why our study is nevertheless far more believable.” That’s all I ask.

也许我是个笨蛋,忽略了一些非常重要的事情。但如果真是如此,我希望这项新研究的作者们以及《柳叶刀》的编辑们能够承认存在与其结论相冲突的研究,并耐心地向读者(其中许多人和我一样是笨蛋)解释:“有项研究看起来比我们的做得好,结论似乎与我们的相反,但我们的研究仍然可信得多,理由如下。”这就是我全部的请求。

No matter how much of an idiot I am, I can’t possibly imagine how that wouldn’t be a straight-out gain.

不管我有多么傻,我也根本无法想象,这么做怎么会不是一件彻头彻尾的好事。

PS: Cigarette smoking definitely decreases your risk of Parkinson’s Disease. Parkinson’s is similar to schizophrenia in that both involve dopamine. But schizophrenia involves too much dopamine and Parkinson’s too little, so the analogy could go either direction.

附:吸烟绝对会降低你患帕金森症的风险。帕金森症跟精神分裂症有些类似,两者都涉及多巴胺。只是,精神分裂症是多巴胺过多,而帕金森症则是过少,所以这个类比可以指向两个方向。【译注:即吸烟可能会降低,也可能会增加精神分裂症的风险。】

PPS: Tobacco smoking is definitely still bad! Nothing in here at all suggests that tobacco smoking has the slightest chance of not being a terrible decision!

又附:吸烟仍然绝对有害!本文没有任何地方暗示吸烟有一丝一毫的可能不是个糟糕的决定!

(编辑:辉格@whigzhou)
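The reversal discussed in section II, where smokers look more at risk before adjustment and less at risk after it, is the kind of pattern a confounder can produce. The sketch below uses invented counts, not data from Zammit et al or Gurillo et al, and it uses simple risk ratios within two strata rather than the hazard ratios from a survival model that the study reports. The stratum labels and the `risk_ratio` helper are hypothetical.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Invented counts: (smoker cases, smokers, non-smoker cases, non-smokers).
# The confounder is a background trait that raises both smoking and illness.
strata = {
    "high background risk": (80, 800, 25, 200),
    "low background risk": (2, 200, 10, 800),
}

# Crude analysis: pool everyone and ignore the confounder.
crude_counts = [sum(column) for column in zip(*strata.values())]
crude = risk_ratio(*crude_counts)

# Stratified analysis: compare smokers and non-smokers within each stratum.
within_stratum = {name: risk_ratio(*counts) for name, counts in strata.items()}

print(f"crude risk ratio: {crude:.2f}")
for name, value in within_stratum.items():
    print(f"{name}: {value:.2f}")
```

With these made-up numbers the pooled comparison makes smoking look harmful, while inside each stratum smokers have the lower risk. Which picture is closer to the truth depends on whether the stratifying variable really is a confounder, which is the judgment call the post argues the meta-analysis should have discussed.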

——海德沙龙·翻译组,致力于将英文世界的好文章搬进中文世界——

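A Band-Aid

【2016-05-21】

@深大-子豪: Mr. Hui, forgive the intrusion, but could you comment briefly on the book The Beginning of Infinity (《无穷的开始:世界进步的本源》)? Sorry to bother you.

@whigzhou: I haven't read it. I looked at the blurb and don't think it would interest me; this person's ideas sound rather naive.

@whigzhou: [I don't understand quantum mechanics, so these are just idle mutterings] 1) Many-worlds: what a lazy, naive band-aid. 2) Deutsch's reading of Popper's falsificationism still seems to be the rather simple-minded kind. 3) Championing both the many-worlds band-aid and falsificationism at once: doesn't he sense a problem somewhere? 4) There are already all sorts of naive theories about memes, and Deutsch has added another. 5) That genes and memes could somehow be tied to many-worlds: I'm stunned.

1) I call some theories band-aids because I think they depart from the principle of falsifiability.

2) On the Bayesian interpretation I adopt, falsifiability means being able to suggest how to revise (structurally) our Bayesian belief network.

3) Science is one method by which we build and adjust belief networks. Kuhn's paradigm can be understood as the structural pattern of a belief network; it determines which nodes the network consists of.

3.1) Of course, a paradigm also includes operational norms for how the belief network obtains its inputs.

4) Lakatos's research programme can be understood as a multi-layer belief network whose "hard core" is the set of bottom-layer nodes. Lakatos gave guidance in principle on how to adjust the network after new input arrives: try adjusting the upper layers first, and leave the lower layers alone as far as possible.

5) Once the paradigm is given, particular scientific theories or hypotheses assign values to the vectors between nodes.

6) What is subjected to falsification is not a single node or vector but the whole multi-layer belief network, or some part of a particular network, usually a highly cohesive part.

7) The magnitude of a scientific earthquake is how large a part of the whole belief network has to be torn down and rebuilt.

8) Dennett's Popperian creature is a self-learning Bayesian inference machine.

9) Popper's "objective knowledge" is an external (outside the human brain) Bayesian network.

10) Science is supposed to be a Bayesian inference machine jointly maintained by the scientific community.

11) Dennett's Gregorian creature is a Bayesian inference machine that has learned to use external Bayesian networks.
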
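Point 2) above equates falsifiability with the ability to suggest how a belief network should be revised. As a minimal sketch of that idea (a single-hypothesis Bayes update, not a full belief network; the function name and every probability below are hypothetical, chosen only for illustration), compare a theory that forbids some outcomes with a band-aid theory that fits every outcome equally well:

```python
def posterior(prior, p_obs_if_true, p_obs_if_false):
    """Bayes' rule: P(theory | observation) from a prior and two likelihoods."""
    evidence = prior * p_obs_if_true + (1.0 - prior) * p_obs_if_false
    return prior * p_obs_if_true / evidence


PRIOR = 0.5  # hypothetical starting credence in the theory

# A falsifiable theory sticks its neck out: it makes the observation far
# more likely (0.9) than its rivals do (0.2), so seeing it shifts belief.
risky = posterior(PRIOR, p_obs_if_true=0.9, p_obs_if_false=0.2)

# A band-aid theory is compatible with anything: the observation is exactly
# as likely with it as without it, so the update leaves belief where it was.
band_aid = posterior(PRIOR, p_obs_if_true=0.5, p_obs_if_false=0.5)

print(f"falsifiable theory: {PRIOR:.2f} -> {risky:.2f}")
print(f"band-aid theory:    {PRIOR:.2f} -> {band_aid:.2f}")
```

In this toy setting the band-aid theory never tells us how to revise anything, which is one way to read the complaint that it departs from falsifiability.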
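A Handle to Grab

【2016-02-04】

@海德沙龙: "The Shattering and Immortality of a Charming Story" (《一个动听故事的破碎及永生》). In Thinking, Fast and Slow, Nobel laureate Daniel Kahneman discusses an interesting finding: people score higher when the exam questions are hard to read. The exam in question is the Cognitive Reflection Test (CRT) invented by Shane Frederick. Malcolm Gladwell liked the conclusion so much that he wrote it into his book David and Goliath.

@熊也餐厅: For some reason I just don't much like Daniel Kahneman~

@whigzhou: Ha, that could be me talking. I really can't say clearly why I dislike Kahneman; it is probably just a whiff of something that doesn't smell right.

@whigzhou: This time I have finally got a handle to grab, so from now on it will be easier to explain why I dislike Kahneman. PS: I particularly loathe Gladwell; that fellow's odor is even stronger.

@whigzhou: A psychology experiment failing to replicate is not in itself a big deal; many (possibly most) psychology experiments fail to replicate. What annoys me is taking a single experiment and embellishing it at length in popular writing. Such writing looks very scientific and patient (look, they explained an experiment so clearly and carefully that even I could follow it), but it is actually worse than skipping the experiment and simply making the argument.
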
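[Microblog] Shaping People with Science

【2015-09-23】

@Ent_evo: "The idea of using science to shape people, instead of letting them grow naturally, shocks us ... but this idea is of course irrational. ... The moral exhortation children listen to may be ineffective because it is unscientific, but its intention too is to shape character, just like the whispering machines in Huxley. So it seems we do not object to shaping people, provided it is done inefficiently; what we object to is only efficient shaping." - Russell (rendered here from the Chinese quotation)

@whigzhou: The moment Russell talks about society he is hopelessly naive. He doesn't stop to ask: who is qualified to shape people? What counts as efficient? How can you speak of efficiency when the goal is unclear? And what does "shaping people with science" even mean? Soaking children in a pile of papers? What if scientific research finds that soaking them in tradition is more "efficient"?

@陈胡子伯爵: I don't think you've understood Russell. Scientific upbringing means raising children with scientific methods, not making them read scientific papers.

@whigzhou: 1) Your reading-in is yours; I don't like filling in the blanks for other people. 2) Suppose your reading holds: then before setting "scientific methods" against traditional methods, you first have to show that raising children by "soaking them in tradition/custom/religion" is unscientific.

@whigzhou: I especially dislike filling in the blanks for analytic philosophers. An analytic philosopher has a duty to make himself clear.
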
[译文]一个动听故事的破碎及永生

A Trick For Higher SAT scores? Unfortunately no.
SAT高分有诀窍?很不幸,不是。

作者:Terry Burnham @ 2015-4-20
译者:沈沉(@你在何地-sxy)
校对:Drunkplane(@Drunkplane-zny)
来源:AEON, http://www.terryburnham.com/2015/04/a-trick-for-higher-sat-scores.html

Wouldn’t it be cool if there was a simple trick to score better on college entrance exams like the SAT and other tests?

如果SAT之类的大学入学考试和其他考试都有得高分的简单诀窍,岂不是很爽?

There is a reputable claim that such a trick exists. Unfortunately, the trick does not appear to be real.

根据某个著名说法,确实有诀窍。不幸的是,这一诀窍似乎并不可靠。

This is the story of an academic paper where I am a co-author with possible lessons for life both inside and outside the Academy.

这里要讲的是我参与写作的一篇学术论文的故事,它对学术内外的生活可能都会有些教益。

In the spring of 2012, I was reading Nobel Laureate Daniel Kahneman’s book, Thinking, Fast and Slow. Professor Kahneman discussed an intriguing finding that people score higher on a test if the questions are hard to read. The particular test used in the study is the CRT or cognitive reflection task invented by Shane Frederick of Yale. The CRT itself is interesting, but what Professor Kahneman wrote was amazing to me,

2012年春,我读了诺贝尔奖获得者Daniel Kahneman的书《思考,快与慢》。Kahneman教授讨论了一个非常有趣的发现,如果考试时的问题很难看清,人们得分就会更高。这一研究中用到的具体考试,是由耶鲁大学的Shane Frederick发明的“认知反应任务”(CRT)【译注:应为“认知反应测试”,原文有误】。CRT本身很有意思,但Kahneman教授的说法更是令我惊愕。

“90% of the students who saw the CRT in normal font made at least one mistake in the test, but the proportion dropped to 35% when the font was barely legible. You read this correctly: performance was better with the bad font.”

“通过正常字体阅读CRT试卷的测试学生中,有90%至少会做错一道题,但如果试卷字体勉强才能辨认,这个比例就会下降到35%。把这句话读准了:坏字体伴随着好成绩。”

I thought this was so cool. The idea is simple, powerful, and easy to grasp. An oyster makes a pearl by reacting to the irritation of a grain of sand. Body builders become huge by lifting more weight. Can we kick our brains into a higher gear, by making the problem harder?

我觉得这简直太爽了。这个想法简单、有力且容易掌握。蚌壳受沙粒刺激作出反应,就会生出珍珠。健身者加大举重重量就会增加块头。我们是否能通过把问题搞难,来加大大脑马力?

Malcolm Gladwell also thought the result was cool. Here is his description in his book, David and Goliath:

Malcolm Gladwell也觉得这个结论很爽。以下是他在《大卫与歌利亚》一书中的描述:

The CRT is really hard. But here’s the strange thing. Do you know the easiest way to raise people’s scores on the test? Make it just a little bit harder. The psychologists Adam Alter and Daniel Oppenheimer tried this a few years ago with a group of undergraduates at Princeton University. First they gave the CRT the normal way, and the students averaged 1.9 correct answers out of three. That’s pretty good, though it is well short of the 2.18 that MIT students averaged. Then Alter and Oppenheimer printed out the test questions in a font that was really hard to read … The average score this time around? 2.45. Suddenly, the students were doing much better than their counterparts at MIT.

“CRT真是很难。但这里有个怪事。要提高人们的考试得分,你知道什么方法最简单吗?只需把考题整得更难一点。心理学家Adam Alter和Daniel Oppenheimer几年前在普林斯顿大学拿一群本科生做过实验。首先他们用常规方式搞了一次CRT考试,学生平均表现是3道题里做对1.9道。很不错,但比起麻省理工学生平均做对2.18道可差远了。然后Alter和Oppenheimer用一种很难辨读的字体打印了测试问题……这次的平均得分?2.45。学生们突然就比麻省理工的对手要强了。”

As I read Professor Kahneman’s description, I looked at the clock and realized I was teaching a class in about an hour, and the class topic for the day was related to this study. I immediately created two versions of the CRT and had my students take the test – half with an easy to read presentation and half with a hard to read version.

读着Kahneman教授的上述描写时,我看了看表,发现还有约一个小时我就要去上课,课程当天的主题正与这一研究相关。我立即就制作了两种版本的CRT(一半易读、一半难读),让我的学生去考。

(1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost? _____ cents

(1) 球棒和球共需1.1美元。球棒比球要贵1美元。请问球需多少美分?

(1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost? _____ cents (in my experiment, I used Haettenschweiler - I do not know how to get blogger to display Haettenschweiler).

(1) 球棒和球共需1.1美元。球棒比球要贵1美元。请问球需多少美分?(考试中,此处用的是Haettenschweiler字体)

Within 3 hours of reading about the idea in Professor Kahneman’s book, I had my own data in the form of the scores from 20 students. Unlike the study described by Professor Kahneman, however, my students did not perform any better statistically with the hard-to-read version. I emailed Shane Frederick at Yale with my story and data, and he responded that he was doing further research on the topic.

在读过Kahneman教授书中的观点后不到三小时,我就拿到了自己的数据:20个学生的成绩。不过,跟Kahneman教授所述研究不同,统计上而言,我的学生在难读版测试中并没有表现更好。我把我的故事和数据用电子邮件发给了耶鲁的Shane Frederick,他回复说他正在就此问题做进一步研究。

Roughly 3 years later, Andrew Meyer, Shane Frederick, and 8 other authors (including me) have published a paper that argues the hard-to-read presentation does not lead to higher performance.

大概三年以后,Andrew Meyer、Shane Frederick及其他8名作者(包括我)发表了一篇论文,论证说,难读的试题并不会带来更好的成绩。

The original paper reached its conclusions based on the test scores of 40 people. In our paper, we analyze a total of over 7,000 people by looking at the original study and 16 additional studies. Our summary:

最早那篇论文的结论来自40个人的测试得分。我们的论文则通过检视原初研究和其余16项研究,分析对象总数超过7000人。我们的总结是:

Easy-to-read average score: 1.43/3 (17 studies, 3,657 people)
Hard-to-read average score: 1.42/3 (17 studies, 3,710 people)

易读版平均得分:1.43/3(17项研究,3657人)
难读版平均得分:1.42/3(17项研究,3710人)

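To see why 1.43 versus 1.42 reads as "no effect", here is a rough sketch of a two-sample z-test on the pooled figures above. The spread of the scores is not given in this post, so `ASSUMED_SD` is a hypothetical value chosen purely for illustration, and treating the 17 studies as one pooled sample is a simplification, not the analysis the authors ran:

```python
import math


def two_sample_z(mean_a, n_a, mean_b, n_b, sd):
    """z statistic and two-sided p-value for a difference in means,
    assuming both groups share the same standard deviation `sd`."""
    standard_error = sd * math.sqrt(1.0 / n_a + 1.0 / n_b)
    z = (mean_a - mean_b) / standard_error
    # Two-sided tail probability of a standard normal beyond |z|.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


ASSUMED_SD = 1.1  # hypothetical spread of CRT scores on the 0-3 scale

# Pooled figures quoted in the summary: easy-to-read vs hard-to-read.
z, p = two_sample_z(1.43, 3657, 1.42, 3710, ASSUMED_SD)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Under these assumptions the 0.01-point gap sits well inside sampling noise, even with more than 7,000 participants; a smaller assumed spread would raise z, so the choice of `ASSUMED_SD` should be kept in mind when reading the output.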
Malcolm Gladwell wrote, “Do you know the easiest way to raise people’s scores on the test? Make it just a little bit harder.” The data suggest that Malcolm Gladwell’s statement is false. Here is the key figure from our paper with my annotations in red:

Malcolm Gladwell写道,“人们要想提高考试得分,你知道什么方法最简单吗?把考题整得更难一点。”数据显示,Malcolm Gladwell的说法是错的。以下是我们所写论文的关键图表,我的注解标红:

I take three lessons from this story.

从这个故事中我得到三条教训。

1. Beware simple stories.

1. 提防简单的故事

“The price of metaphor is eternal vigilance.” Richard Lewontin attributes this quote to Arturo Rosenblueth and Norbert Wiener.

“比喻的好处须以永恒的警惕换取。”Richard Lewontin将这一名言归于Arturo Rosenblueth和Norbert Wiener所说。

The story told by Professor Kahneman and by Malcolm Gladwell is very good. In most cases, however, reality is messier than the summary story.

Kahneman教授和Malcolm Gladwell讲的故事非常动听。但在多数情况中,现实都比简洁的故事要凌乱。

2. Ideas have considerable “Meme-mentum”

2. 观念具有相当大的“模因惯性”

“And yet it moves.” This quote is attributed to Galileo when forced to retract his statement that the earth moves around the sun.

“但是它仍在运转”,这一名言被认为是伽利略被迫收回其地球绕日运动学说时所说。

The message is that it takes a long time to change conventional wisdom. The earth stayed at the center of the universe for many people for decades and even centuries after Copernicus.

启示就是,要改变传统观点需要花费很长时间。在哥白尼之后的数十年甚至数世纪中,地球对许多人而言仍是宇宙的中心。

I expect that the false story as presented by Professor Kahneman and Malcolm Gladwell will persist for decades. Millions of people have read these false accounts. The message is simple, powerful, and important. Thus, even though the message is wrong, I expect it will have considerable momentum (or meme-mentum to paraphrase Richard Dawkins).

我预料,由Kahneman教授和Malcolm Gladwell所说的错误故事会继续存在几十年。数百万人读过这些错误说法。这个讯息简单、有力且重要无比。因此,尽管它是错的,我预测它会具有相当大的惯性动量(或借用Richard Dawkins的话说,模因惯性)。

One of my favorite examples of meme-mentum concerns stomach ulcers. Barry Marshall and Robin Warren faced skepticism to their view that many stomach ulcers are caused by bacteria (Helicobacter pylori). Professor Marshall describes the scientific response to his idea as ridicule; in response he gave himself an ulcer by drinking the bacteria. Marshall gives a personal account of his self-infection in his Nobel Prize acceptance video (the self-infection portion starts at around 25:00).

我最喜欢援引的模因惯性例证之一跟胃溃疡有关。Barry Marshall和Robin Warren认为许多胃溃疡源于细菌(幽门螺杆菌),这一观点遭到质疑。Marshall教授称,科学界的反应是认为他的观点十分可笑;作为回应,他服用细菌并让自己患上了溃疡。在其接受诺贝尔奖的视频中,Marshall自己描述了这一自我感染经历(自我感染部分大约从25:00开始)。

3. We can measure the rate of learning.

3. 我们可以测量学习的速率

We can measure the rate of learning. Google scholar counts the number of times a paper is cited by other papers. I believe that well-informed scholars who cite the original paper ought to cite the subsequent papers. We can watch in real-time to see if that is true.

我们可以测量学习的速率。“谷歌学术”计算某论文被其他论文征引的次数。我认为,渊博的学者,在引用了原初的研究论文之后,也应该引用其后相关的论文。我们能实时观测这一想法是否为真。

| Paper 论文 | Comment 备注 | Citations as of April 20, 2015 2015.4.20之前引用数 | Citations as of today 迄今为止引用数 |
| Alter et al. (2007). "Overcoming intuition: metacognitive difficulty activates analytic reasoning." Journal of Experimental Psychology: General 136(4): 569. Alter等人(2007)。“克服直觉:元认知困难能激活分析推理”,《实验心理学杂志:总论》 136(4):569 | Original paper showing hard-to-read leads to higher scores 最早提出难读导致高分的论文 | 344 | click for current count 点击链接查看当前数字 |
| Thompson et al. (2013). "The role of answer fluency and perceptual fluency as metacognitive cues for initiating analytic thinking." Cognition 128(2): 237-251. Thompson等人(2013)。“回答流利性和感知流利性作为推动分析推理的元认知触发物”,《认知》 128(2):237-251 | Paper contradicts Alter et al. by reporting no hard-to-read effect. 与Alter等人相左,报告不存在“难读高分”效应的论文 | 38 | click for current count 点击链接查看当前数字 |
| Meyer et al. (2015). "Disfluent fonts don’t help people solve math problems." Journal of Experimental Psychology: General 144(2): e16. Meyer等人(2015)。“繁难字体对于人们解决数学问题并无助益”,《实验心理学杂志:总论》 144(2): e16 | Our paper summarizing the original study and 16 others. 我们概述原初研究和后续16项研究的论文 | 0 (this “should” increase at least as fast as citations for Alter et al., 2007) 0(引用数的增长速度“本应”至少与Alter等人2007年论文相同) | click for current count 点击链接查看当前数字 |

(编辑:辉格@whigzhou)


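[Microblog] Available Materials and Methodology

【2015-08-15】

@whigzhou: The subfields of history (divided by era and civilization) differ enormously in methodology, to the point of forming distinct schools. The main cause of this divergence, as I see it, is differences in the quantity and nature of the available materials. When materials are pitifully scarce, researchers must widen their view, infer from more general principles, and construct the most plausible hypothesis; when materials are plentiful but hard to read, the emphasis falls on decoding; when they are plentiful but of poor quality, the emphasis falls on textual criticism; as for quantitative research...

@whigzhou: Historians who choose or create an appropriate methodology for a particular field have the chance to become that field's founding masters. The endowments, interests, strengths, and techniques that go with the methodology shape the discipline's temperament. Once that temperament is established, it does not tolerate those who don't fit, so a school hardens into place, and forces of renewal can only come from outside.
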
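[Microblog] Things Science Cannot Explain

【2015-07-17】

@押沙龙: What those charlatans love to say most is "there are things science can't explain either." In fact, if science were as shameless as they are, science could explain anything too.

@Ent_evo: No, no, no. In ninety-nine percent of cases, when a scientist says he can't explain something, what he really means is "we have many explanations, but we don't yet know which one or which few are true."

@whigzhou: 99% is a big exaggeration. If the process producing certain phenomena (or differences) is too complex to narrate in a way ordinary people can read, then even if it can be demonstrated with a model, it can hardly be called "explainable."

@whigzhou: If the model used to demonstrate and illustrate is about as complex as the actual process (in jargon, the process is irreducible), the phenomenon can be said to be unexplainable. What's more, in many cases even accurate modeling is out of reach.

@whigzhou: Chaos is one situation that produces irreducibility. Wherever a chaotic link exists, explanation is hard to carry out, and the trouble is that such situations are not rare.
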
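Chinese Medicine, Revisited

Years ago I published some views on Chinese medicine. Today I happened to bring the subject up again, and since I have had some new thoughts in recent years, let me tidy them up and add to them:

1) The meaning of the term "Chinese medicine" (中医) is not very clear. In the narrower usage it refers to a theoretical system (yin-yang and the five phases, the five zang and six fu organs, qi, blood and meridians, cold/hot/dry/damp, warm/cool/sweet/bitter...) together with the various treatments organized within that system; in the broader usage it covers all non-modern medical practice existing in Han culture.

2) Toward that theoretical system, my attitude is total rejection.

3) Toward the treatments filed under the name of Chinese medicine, my attitude is the same as toward other pre-scientific folk experience: highly skeptical.

4) But unlike some opponents of Chinese medicine, I will not make the strong judgment that they are all useless or wrong.

5) I believe quite a few of these treatments probably are of some use.

6) However, the development of modern medicine has drastically changed the opportunity cost and the gain-to-loss ratio of exploiting these possible uses. In my view the change has reached the point where none of them is worth considering; I even think that, for a medical consumer, seriously weighing these possible uses would look foolish.

7) Given that the Chinese medicine community generally refuses to examine old treatments by the standards of modern medicine, an overall negative assessment of these treatments (so-called "clubbing it to death with one blow") is entirely reasonable. As I see it, a doctor today who declares himself a practitioner of Chinese medicine, or who champions it, is by that fact alone rendered untrustworthy.

8) But this way of judging does not apply to the past. Before modern medicine became widespread, a doctor who believed in traditional treatments could perfectly well be sensible, rational, critically minded, even possessed of a rudimentary scientific attitude. As far as I know, many doctors classed as practitioners of Chinese medicine had little interest in the theoretical rhetoric; they simply believed in certain specific treatments and were willing to adjust their beliefs with experience.

9) I believe (though without much empirical basis) that before modern times, the Chinese-medicine treatments that were more or less useful were probably considerably more numerous than now; but as modern medicine spread, the proportion of useful ones among the surviving treatments fell, and what remains is basically useless. The reason is as follows.

10) Before the scientific method appeared, the mechanism that filtered traditional knowledge was based on individual experience and word of mouth. This selection mechanism has a characteristic: knowledge is more likely to improve and accumulate in matters where the causal chain is easy to identify from random individual experience, while the places where the causal chain is hard to identify are a hotbed of superstition.

11) As modern medicine spreads, the same filtering mechanism still operates in how the public treats new and old treatments. So the traditional treatments that were used for ailments with fairly obvious causal chains, and were therefore likely to be of some use, are actually the ones most easily displaced by modern treatments. What remains is all placebo. The harder the causal relation is to see, the greater the demand for placebos. This is probably the situation of contemporary Chinese medicine: squeezed by modern medicine, it has turned into a placebo industry far purer than it ever was.
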
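[Microblog] Mainstream Opinion and Political Strife

【2015-07-17】

@whigzhou: When an issue is shrouded in political strife, I don't believe in any "mainstream opinion of the scientific community"; looked at again a few decades later, most such opinions will probably turn out to be nonsense. A few clues that this situation has arisen: 1) normally cautious scientists suddenly become emphatic; 2) scientists whose specialties are far from the issue suddenly pile in en masse; 3) open letters with hundreds or thousands of signatures appear at the drop of a hat; 4) motive-mongering and conspiracy theories begin to flourish...

@你国人民感情伤害专家: You mean anthropogenic carbon warming.

@whigzhou: Lots of them: the Seville Statement, racial differences, anthropogenic carbon warming, intelligence measurement, same-sex upbringing...

@慕容飞宇gg: Social psychology leans hopelessly left. Those conclusions that same-sex parents have no effect on children are simply not credible. The farce of their signed protests against contrary conclusions tells you as much.

@whigzhou: I haven't read much, but what puzzles me is this: how long is the history of same-sex families? How long have they had the chance to raise children more or less normally? How long is a reasonable period for observing the effects of upbringing? And already there are conclusions, stated so emphatically?
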
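[Microblog] Reason and Intuition

【2015-07-06】

@whigzhou: About nutrition/metabolism and their relation to health, the only thing I am sure of is that the advice I used to see in the mass media is all unreliable. Things that raised my doubts: many conclusions are drawn from rat experiments; even the most basic calorie calculation is unreliable (combustion is far from simulating digestion); the scientific community's discordant views; the political distortion between scientific research and public advice; and so on.

@whigzhou: If I have to adopt an opinion, I would rather trust those of primatologists and anthropologists than those of the Department of Agriculture or the WHO, and rather trust the experience that people I rely on have gained first-hand. In short, for now I accept no overall scheme, and treat each piece of advice individually according to its method and reasons.

@迦列: A person can hold a Gregorian mind on some questions and only a Darwinian mind on others. Public policy-making presupposes that everyone holds the higher kind of mind, yet it ends up as a battleground for holders of the lower kind.

@whigzhou: Your understanding of the Gregorian/Darwinian distinction differs from mine. What I expressed in the original post is an attitude of "in some matters, rather trust intuition."

@whigzhou: The "rather trust intuition" I mean is relative to trusting a scientific theory, or a scheme of practical guidance backed by scientific theory/knowledge. Such trust cannot be classed as a Darwinian mind, because it need not be a stereotyped response produced by a set of hard-coded rules; it can be a choice made after deliberation, on the basis of much individual experience and many other people's opinions, and open to future adjustment. It is therefore Gregorian.

@whigzhou: I made this choice after weighing things up because, in trying to seek out scientific opinion, I gradually realized that the scientific community is still far from consensus on the question, and that the depth of research and quality of results are far from enough to yield an overall guidance scheme I could accept.

@whigzhou: In short, the "intuition" I trust is not the intuition produced by a Darwinian creature following its hard-coding, but the intuition produced by a Gregorian creature loaded with a great deal of experience, after fully exercising its rational faculties.

@whigzhou: A similar example is climate. Some years ago, when the topic was still hot, I read a large amount of material because I needed to write commentary, but the more I read, the deeper the water seemed. Judging by the various arguments and evidence offered, the scientific community clearly had not yet found a mature, reliable, and widely accepted theory of climate; what it had fell far short of supporting a high-confidence judgment, let alone a trustworthy plan of action.

@whigzhou: Having reached that judgment, my choice was to pull the camera back, squint, and let my intuition tell me which claims are more plausible. Beyond that I really don't know of a better method. I can't devote a lifetime's energy to something that is not the most important thing to me (and even if I did, it might come to nothing), but I am somewhat interested and want to arrive at a judgment, so I can only place a bet with intuition.
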
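[Microblog] Mainstream Opinion and Scientific Argument

【2015-06-11】

@格林黑风: How should one view "the mainstream opinion of the scientific community"? In the greenhouse-gas affair, Academician Ding retorted, "Does the scientific community have a mainstream opinion?" In the GMO affair, a scientist, speaking for the scientific mainstream, told Cui Yongyuan (崔永元) that GMOs are harmless (and Cui, speaking for the media mainstream, then questioned that conclusion).

@whigzhou: Mainstream opinion means the common belief expressed on some question by a group of people who form the majority. It can be checked empirically, and is therefore meaningful.

@whigzhou: Appealing to mainstream opinion is a reasonable way to argue, but this kind of argument is not logical deduction. In fact the vast majority of arguments we normally encounter (including in academic works) are not logical deductions but arguments that "persuade you to believe." Since there are N kinds of reasons that can make you believe something, an argument can start from N directions; believing an opinion because it is mainstream is not unreasonable.

@whigzhou: We often hear criticism of various so-called "logical fallacies," such as those in this list: http://t.cn/R2WYSVv. The problem is that for many of the methods listed, the user has no intention of doing logical deduction, so calling them logical fallacies misses the point. This is not to say they are all beyond reproach, only that one cannot first assume they are logical deductions and then fault them on that basis.

@whigzhou: Take myself. If I know nothing about a question beyond ordinary popular reports, I will choose to believe mainstream opinion. But if I happen to know the view of a scholar I trust a good deal, and have confidence in his judgment in that field, I may believe him regardless of mainstream opinion. Or if I happen to have put much effort into the question and carefully checked each side's views and arguments, I may stick to my own view.

@whigzhou: Having finished half a roast duck, let me say more about argument. Consider the most difficult arguments, most difficult because your audience flatly rejects your conceptual framework and methodology (in Kuhn's terms, your paradigm). To them the vocabulary you use is meaningless, your way of presenting evidence is wrong, and, most important, you have completely missed the crux of the problem! My point is that even in this situation, argument can proceed.

@whigzhou: Here the first task of argument is to get the other side to accept your paradigm, that is, to guide him through a paradigm shift, or in psychological terms a gestalt switch. How? Think about stereograms and ambiguous images and it becomes clear (those unfamiliar with the two concepts can look up the two entries on Wikipedia).

@whigzhou: Suppose a friend of yours has never seen or heard of a stereogram, and you are trying to persuade him that there is a horse in the picture before him. What do you do? Obviously you issue a string of instructions: raise the picture to eyebrow height, keep your eyes 15 cm from it, slowly narrow your eyes, and if you still can't see it, turn your neck slightly...

@whigzhou: Suppose he tries for half a minute, still doesn't see it, and says you are talking nonsense. What then? An obvious move is to call over several other people. When they all announce that they see it, you have a mainstream opinion, and the existence of that mainstream opinion will clearly persuade him to keep looking hard instead of simply concluding that you are talking nonsense.

@whigzhou: What I want to show with this example is that before the routine kind of argument common in scientific research can start, some basic preconditions (in our example, he must at least have normal human vision) and preparatory arguments are needed. This work has nothing to do with logic; it is a way of achieving cooperation. Either he has already accepted your paradigm, or he is willing to follow your series of instructions and try to switch to it.

@whigzhou: Of course, in scientific (especially natural-science) works we seldom see argument at this level, because every established discipline has long since reached consensus on its paradigm. There is no need to persuade anyone that "there is a horse in this picture"; what needs persuading is agreement on specific details, such as "in this picture there is a mare, three months pregnant, galloping south."

@whigzhou: Compared with the natural sciences, the biggest difference in the social sciences is that consensus on paradigm is scant and weak, so a great deal of argument can only stay at this level, trying by various efforts to lead the other side into a gestalt switch.

@whigzhou: Is this effort meaningful? Of course. Recall the thrill of first seeing the image in a stereogram and you will appreciate it.
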
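[Microblog] Testicles and Science

【2015-06-07】

@吴昊老是重名很无奈: @whigzhou Mr. Hui, I saw this question on Zhihu (知乎). I think the top-voted answer is rubbish, but I can't come up with a better explanation myself. What do you think? [Why are human testicles located outside the body cavity?] 刘哈哈: reposted from Douban (豆瓣), 南度's diary, part one of the "Aching Balls Trilogy" (《蛋疼三部曲》):

@whigzhou: I skimmed it. The description of the problem and the survey of hypotheses are quite interesting, but his own analysis is no good. For example, he keeps rebutting with lines like "this explanation is elegant, but it can't explain why other animals don't keep their testicles outside; don't their sperm need toughening too?" He evidently hasn't realized that this logic would instantly kill any explanation in evolutionary biology.

@吴昊老是重名很无奈: Yes, that's my feeling too. His rebuttals of many hypotheses do make some sense, though they don't survive close scrutiny. I wondered whether sexual selection was responsible, but unlike men's response to women's breasts, women don't seem to show a corresponding psychological mechanism toward men's external testicles.

@whigzhou: Since external testicles are so common among mammals, this certainly can't be thought through from human conditions.

@姚广孝_wayne: But evolutionary biology has wandered into a trap: it likes to substitute "what is the benefit of this" for the "why is it so" it set out to prove, and the former often requires only speculating first and then hunting for evidence.

@whigzhou: Identifying "what is the benefit of this" is an important step in arguing "why is it so." First construct a hypothesis, then look for data to test it: isn't that the normal method of scientific research?

@whigzhou: It is the same as considering motive when solving a crime. The space of possibilities open to exploration is almost unlimited; without the prompting of some clues one can only guess blindly, and blind guessing is not a scientific method.

@whigzhou: When evolutionary biologists study the origin of a trait, like people cracking a circuit board by reverse engineering, they take what Dennett calls the design stance, that is, a functionalist stance: first assume the trait has some function, then guess what the function might be, then run a series of tests to check the guess; only if several attempts find nothing do they consider other possibilities such as by-product, vestigial remnant, or drift.

@real_whisper: The only method of scientific research is analysis and induction. Science must rest on judgments of fact. Starting by defining "benefit," which is a value judgment, is not scientific method.

@whigzhou: Then tell me, what is "analysis"?

@慕容飞宇gg: Mr. Hui, from the standpoint of evolutionary theory, is there any difference between "what is the benefit of this" and "why is it so"?

@whigzhou: Yes. You still have to construct and verify the complete causal chain by which it brings that benefit. It is like how motive alone can't convict; you also have to construct the causal chain and prove it.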