Wednesday, March 5, 2014

Top 10 reasons to not share your data (and why you should anyway)

Much has been made of the recently announced data policy at PLoS (see this post for a summary of sorts or Google #plosfail). Reading some of this I was reminded of this excellent piece of writing by Randall J. LeVeque. It is entitled "Top 10 reasons to not share your code (and why you should anyway)" but most of it applies equally well to data in my opinion. Some excerpts follow.

Before discussing computer code, I'd like you to join me in a thought experiment. Suppose we lived in a universe where the standards for publication of mathematical theorems are quite different: papers present theorems without proofs, and readers are expected to simply believe the author when it is stated that the theorem has been proved.
In this alternative universe the reputation of the author would play a much larger role in deciding whether a paper containing a theorem could be published. ...  Eventually some agitators might come along and suggest that it would be better if mathematical papers contained proofs. Many arguments would be put forward for why this is a bad idea. Here are some of them ... 
1. The proof is too ugly to show anyone else. It would be too much work to rewrite it neatly so others could read it. And anyway it's just a one-off proof for this particular theorem, and not intended for others to see, or to use the ideas for proving other theorems. My time is much better spent proving another result and publishing more papers rather than putting more effort into this theorem, which I've already proved.
2. I didn't work out all the details. Some tricky cases I didn't want to deal with, but the proof works fine for most cases, such as the ones I used in the examples in the paper. (Well, actually I discovered that some cases don't work, but they will probably never arise in practice.) 
3. I didn't actually prove the theorem, my student did. And the student has since moved to Wall Street, and thrown away the proof, since of course dissertations also need not include proofs. But the student was very good, so I am sure it was correct.
4. Giving the proof to my competitors would be unfair to me. It took years to prove this theorem, and the same idea can be used to prove other theorems. I should be able to publish at least 5 more papers before sharing the proof. If I share it now my competitors can use the ideas in it without having to do any work, and perhaps without even giving me credit since they won't have to reveal their proof technique in their papers. 
5. The proof is valuable intellectual property. The ideas in this proof are so great that I might be able to commercialize them some day, so I'd be crazy to give them away. 
6. Including proofs would make math papers much longer. Journals wouldn't want to publish them and who would want to read them? 
7. Referees will never agree to check proofs. It would be too hard to check correctness of long proofs and finding referees would become impossible.  It's already hard enough to find good referees and get them to submit reviews in finite time.  Requiring them to certify the correctness of proofs would bring the whole mathematical publishing business crashing down. 
8. The proof uses sophisticated mathematical machinery that most readers/referees don't know. Their wetware cannot fully execute the proof, so what's the point in making it available to them? 
9. My proof invokes other theorems with unpublished (proprietary) proofs. So it won't help to publish my proof - readers still will not be able to fully verify correctness. 
10. Readers who have access to my proof will want user support. Anyone who can't figure out all the details will send email requesting that I help them understand it, and asking how to modify the proof to prove their own theorem. I don't have time or staff to provide such support.




