Saturday, August 12, 2017

Du Bois on Da Vinci

A quick write-up on a charming essay by the young Du Bois (from his time as a graduate student at Harvard), which I only found out about through the fascinating historical work of Trevor Pearce. The essay is entitled Leonardo Da Vinci As A Scientist and is available online here.

Leonardo Da Vinci -- ``I was even a pioneer in
side-eye and general shade throwing.''
Du Bois is concerned to argue that Da Vinci deserves credit as the founder of modern experimental science. The argument strategy is twofold. First, to show that Da Vinci has sufficient (and sufficiently impressive) scientific achievements to merit attention as an early scientist at all. This Du Bois achieves by reviewing historians' (apparently then -- 1889 -- relatively recent) reappraisal of Da Vinci's empirical work and his invention of scientific machinery, and showing that it was indeed impressive. This in itself was interesting; for instance, I learned here that Da Vinci was already floating the idea that the sublunary realm and the broader cosmos should be understood as operating on the same principles, that he has a
claim to being an early inventor of the telescope and to being the first to notice a parallel between how the camera obscura works and the operations of the human eye, and that on the basis of observational study of plants he was developing ideas about plant respiration which now seem to have been on the right track. Cool!

The second step in the argument, however, is the more philosophically and conceptually interesting. Here Du Bois' task is to argue that Da Vinci deserves credit not just as a link in a great chain of scientific workers, but rather some sort of special credit as a founder figure in one sense or another. The point is largely drawn out by comparison with three other figures: Roger Bacon, Gilbert of Colchester, and Francis Bacon. While Du Bois is impressed with each of these figures, he thinks they were each lacking in a certain way. Roger Bacon was not enough of an empiricist: to be credited as a founder of modern science, Du Bois feels, empiricism must be one's epistemological foundation, whereas for R. Bacon ``empiricism was but a branch of the tree of which philosophy was the trunk''. Gilbert of Colchester has, so to speak, the opposite problem -- he's all empiricism with no metatheory. While he's impressive in his collection of observational and experimental results, he's ``a mere experimenter, with little breadth of conception, or broad generalising powers''. F. Bacon, finally, came after Da Vinci, and is substantially the same in his metatheory (so Du Bois thinks! Please don't hurt me, Renaissance scholars), but just didn't achieve as much scientifically as Da Vinci. F. Bacon comes across, basically, as an especially talented expositor of Da Vincian method, but not himself worthy of the claim to priority on scientific method.

The philosophy of science the young Du Bois is working with is interesting, and worth making more explicit than Du Bois himself does in the essay. In Da Vinci, Nature had found itself a man who could do both: patient, skillful observational work, aided by machines of his own devising, that uncovers particular facts of great interest and also general principles; and also explicit epistemological theorising of a sort which acknowledged and explained the importance of founding one's claims in such observations. Science, then, is the epistemologically self-conscious skillful application of empiricist method. R. Bacon was a skillful natural philosopher and epistemologically self-conscious, but not an empiricist. Gilbert of Colchester was a skillful empiricist, but did not evince the requisite degree of epistemological self-consciousness. F. Bacon was an epistemically self-conscious empiricist, but just not quite good enough at the actual application. Da Vinci was the first person in whom all these qualities met to a sufficient degree, or so Du Bois claims. (This essay also features a trait which is characteristic of all Du Bois' later work on social matters -- explicit reticence and diffidence, with frequent reminders that one ought to be cautious about one's conclusions given the difficulties of gathering evidence and being sure it is complete or representative.)

W.E.B. Du Bois -- ``The idea that the person
in this picture could ever be as enthusiastic
about anything as the person who wrote that
essay on Da Vinci is genuinely surprising.''
I've worked on Du Bois' philosophy of science before, but I have never in my published work explicitly remarked on the undercurrent of empiricism. Nonetheless, it is there; most especially it can be seen in his lifelong habit of issuing scathing condemnations of a priori approaches to history and sociology, where he thinks that prejudice unchecked by experience has been the source of much racist balderdash concerning African (and African-descended) folk. It is remarkable to think, then, how closely Du Bois' scientific and social mission accords with the early philosophy of science he developed here. For The Philadelphia Negro and Black Reconstruction can each plausibly be described as epistemologically self-conscious, skillful applications of empiricist method; in both these works (and many of his less famous essays besides) he mixes explicit methodological remarks, exhorting a more carefully and rigorously observationally grounded approach to the study of black life in America, with the actual collection of novel results about social, political, or economic conditions, and both works have (nowadays) come to be seen as classics of their respective fields. His work is thus epistemologically self-conscious in its empiricism, and involves not just the exhortation of observational method but its actual and skillful application. The philosophy of science underlying this essay by the young Du Bois seems to have set a pattern that he attempted to live up to for the rest of his scientific career.

Da Vinci, of course, was not just a great scientist and engineer, but also a great artist. Du Bois was evidently aware of this, and mentions it at various points in the essay. Da Vinci is indeed paradigmatic of the Renaissance Man, the individual who strives to hone diverse skills to a high degree and exhibit a broad culture. In this respect too Du Bois seems to have followed Da Vinci, being more acclaimed for his literary style and humanistic moral and political vision than for his scientific career. Since I am myself attracted to the broad humanism of the Renaissance, and have great respect for Du Bois' work, seeing this essay, in which Du Bois develops his ideas about the philosophy of science as part of an ode to Da Vinci and the Renaissance scientific humanism that Da Vinci pioneered, was in its own way quite affecting for me. Even if I cannot match these figures in their skill, I hope at least to preserve and advance the spirit of humanistic inquiry that they each embodied.

Sunday, August 6, 2017

Significant Moral Hazard

What follows is a guest post by my comrade Dan Malinsky. After the recent publication of the paper `Redefine statistical significance', Malinsky and I attended a talk by one of the paper's authors. I found Malinsky's comments after the talk so interesting and thought-provoking that I asked him to write up a post so I could share it with all yinz. Enjoy!

--------------------------------------------------------------------------------------------------------------------------

Benjamin et al. present an interesting and thought-provoking set of claims. There are, of course, many complexities to the P-value debate but I’ll just focus on one issue here.

Benjamin et al. propose to move the conventional statistical significance threshold in null hypothesis significance testing (NHST) from P < 0.05 to P < 0.005. Their primary motivation for making this recommendation is to reduce the rate of false positives in published research. I want to draw attention to the possibility that moving the threshold to P < 0.005 may not have its intended effect: despite the fact that “all else being equal” such a policy should theoretically reduce false positive rates, in practice this move may leave false positive rates unchanged, or even make them worse. In particular, the “all else being equal” clause will fail to hold, because the policy may incentivize researchers to make more errors of model specification, which will contribute to a high false positive rate. It is at least an open question which causal factors will dominate, and what the resultant false positive rate will really look like.

An important contributor to the high false positive rates in some areas of empirical research is model misspecification, broadly understood. By model misspecification I mean anything which might make the likelihood wrong: confounding, misspecification of the relevant parametric distributions, incorrect functional forms, sampling bias of various sorts, sometimes non-i.i.d.-ness, etc. In fact, these factors are more important contributors to the false positive rate than the choice of P-value convention or decision threshold, in the sense that any plausible decision rule, no matter how stringent (whether it is based on P-values, Bayes factors, or posterior probabilities), will lead to unacceptably high false positive rates if model misspecification is widespread in the field.
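To make the point concrete, here is a minimal simulation sketch (purely illustrative, not from Benjamin et al. or the talk; the setup, variable names, and effect sizes are assumptions chosen for the example). An unmeasured confounder drives both the treatment and the outcome, the treatment has no causal effect at all, and the analyst regresses the outcome on the treatment alone. The nominal threshold is then beside the point:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, n_trials = 200, 2000
rejections = {0.05: 0, 0.005: 0}

for _ in range(n_trials):
    # Unmeasured confounder z drives both x and y; x has NO causal
    # effect on y, so every rejection below is a false positive.
    z = rng.normal(size=n)
    x = z + rng.normal(size=n)
    y = z + rng.normal(size=n)
    # Misspecified analysis: regress y on x alone, omitting z.
    pval = sm.OLS(y, sm.add_constant(x)).fit().pvalues[1]
    for alpha in rejections:
        rejections[alpha] += pval < alpha

for alpha, count in rejections.items():
    print(f"alpha = {alpha}: false positive rate ~ {count / n_trials:.2f}")
```

With confounding this strong, both thresholds reject essentially every time; tightening the cutoff from 0.05 to 0.005 buys nothing.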

Note that Benjamin et al. agree with the first claim. They mention some of these problems, agree that they are problems, and frankly admit that their proposal does nothing to address these or many other statistical issues. Model misspecification, in their view, ought to be tackled separately and independently of the decision rule convention. The authors also admit that these and related issues are “arguably bigger problems” than the choice of P-value threshold. I think they are bigger problems in the sense specified above: model misspecification will afflict any choice of decision rule. This is important because the proposed policy shift may actually lead to more model misspecification. So the issues interact, and it is not so straightforward to tackle them separately.

P < 0.005 requires larger sample sizes (as the authors discuss), which are expensive and difficult to come by in many fields. In an effort to recruit more study participants, researchers may end up with samples that exhibit more bias -- less representative of the target population, not identically distributed, not homogeneous in the right ways, etc. Researchers may also be incentivized, given finite time and resources, to perform less model-checking and fewer diagnostics to make sure the likelihood is empirically adequate. Furthermore, the P-value depends critically on the tails of the relevant probability distribution. (That’s because the P-value is calculated from the “extreme values” of the distribution of the test statistic under the null model.) The tails of the distribution are rarely exactly right at finite sample sizes, but they need to be “right enough.” With a low P-value threshold like 0.005, getting the tails of the distribution “right enough” to achieve the advertised false positive rate becomes less likely, because with 0.005 one considers outcomes further out in the tails. Finally, other problems which inflate false positive rates, like p-hacking and failure to correct for multiple testing, may be exacerbated by the lower threshold. The mechanisms are not all obvious -- perhaps, for example, making it more difficult to publish “positive” findings will incentivize researchers to probe a wider space of (mostly false) hypotheses in search of a “significant” one, thereby worsening the p-hacking problem -- but it is at least worth taking seriously that these factors may offset the envisaged benefits of P < 0.005. (I think there are some interesting things which may be said about why these considerations are less worrisome in particle physics, where the famous 5-sigma criterion plays a role in discovery announcements. I’ll leave that aside for now.)
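As a rough illustration of the tail-sensitivity point, here is a stylized sketch (again purely illustrative; the choice of a Student-t null with 5 degrees of freedom is just an assumption for the example). The true null distribution of the test statistic is slightly heavier-tailed than the normal distribution the analyst assumes, and the realized type-I error overshoots the nominal rate by a larger factor at the stricter threshold:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# True null distribution of the test statistic: Student t with 5 df,
# slightly heavier-tailed than the standard normal the analyst assumes.
t_stats = rng.standard_t(df=5, size=1_000_000)

# P-values computed under the (misspecified) normal assumption.
pvals = 2 * stats.norm.sf(np.abs(t_stats))

for alpha in (0.05, 0.005):
    realized = np.mean(pvals < alpha)
    print(f"alpha = {alpha}: realized type-I rate = {realized:.4f} "
          f"({realized / alpha:.1f}x nominal)")
```

The same modest tail misspecification that roughly doubles the error rate at 0.05 inflates it several-fold at 0.005, since the stricter threshold probes further out in the tails.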

I’m not disputing any mathematical claim made by the authors. Indeed, for two decision rules like P < 0.05 and P < 0.005 applied to the same hypotheses, likelihood, and data, the more stringent rule will lead to fewer expected false positives. My point is just that implementing the new policy will change the likelihoods and data under consideration, since researchers will face the same pressure to publish significant results but publishing will be made more difficult in a kind of crude way.
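For contrast, here is a minimal sketch of the “all else being equal” baseline (again illustrative only): when the null model is correctly specified, P-values under the null are uniform on [0, 1], so each threshold delivers exactly its nominal false positive rate and the stricter rule gives the promised tenfold reduction:

```python
import numpy as np

rng = np.random.default_rng(2)
# With a true null and a correctly specified model, P-values are Uniform(0, 1).
pvals = rng.uniform(size=1_000_000)

for alpha in (0.05, 0.005):
    print(f"alpha = {alpha}: false positive rate = {np.mean(pvals < alpha):.4f}")
```

The worry above is precisely that, in practice, moving to 0.005 changes the data-gathering and modelling behaviour that makes this baseline hold.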

This worry will be relevant for any decision threshold convention, and so it speaks against any strict uniform standard. However, Benjamin et al. raise the important point that “it is helpful for consumers of research to have a consistent benchmark.” My friend and colleague Liam Kofi Bright reinforces this point in his blog post: there are all sorts of communal benefits to having some mechanism which distinguishes “significant” results from “insignificant.” I’d like to propose a different kind of mechanism.

Sometimes statisticians casually entertain the idea of requiring “staff statistician reviewers” to review (the data analysis portions of) empirical articles submitted for publication. I think we can plausibly institutionalize a version of this practice, and it can function as a benchmarking procedure. Every journal would pay some number of professional statisticians (who should be otherwise employed at universities, research centers, etc.) to act as statistical reviewers, and specifically to interrogate issues of model specification, sample selection, decision procedures, robustness, and so on. Only when a paper receives a stamp of approval from two or more statistical reviewers should it count as having “passed the benchmark.” The institutionalization of this proposal would have some corollary benefits: there are a lot of professional statisticians who are employed on “soft money,” i.e., they have to raise parts of their salaries by applying for grants. This mechanism could partially replace that grant-cycle: journals would apply regularly every few years for funding from the NIH, NSF, and other funding agencies to compensate statistical reviewers (an amount dependent on the journal’s submission volume); the statisticians would get to supplement their incomes with this funding rather than spend time applying for grants; and the public would get some comfort in knowing that the latest published results are not fraught with data analysis problems. I can imagine a host of other benefits too: e.g., statisticians will be inspired and motivated to direct their own research towards addressing live concerns shared by practicing empirical scientists, and the empirical scientists will be alerted to more sophisticated or state-of-the-art analytic methods. Statisticians' review may also reduce the prevalence of NHST, in favor of some of the alternative analytical tools mentioned in Benjamin et al. The details of this proposed institutional practice need to be elaborated, but I conjecture it would be more effective at reducing false positives (and perhaps cheaper) than imposing P < 0.005 and requiring larger sample sizes across the board.

[I should acknowledge that, depending on how my career goes, I could be the kind of person who is employed in this capacity. So: conflict of interest alert! Acknowledgements to Liam Kofi Bright, Jacqueline Mauro, Maria Cuellar, and Luis Pericchi.]