Open source knowledge

Most people at university are taught to use some computer program that they will need in their future work and research. Almost always, this program is commercial, that is, sold and maintained by a private company. In my case, it has mainly been SPSS.
 
Now, there are a couple of problems with this approach:
  1. Students are given knowledge that they can't use whenever and however they want. They have to sit at a computer that has a licence for the program, which is a problem both during their studies and afterwards. Bear in mind that these programs are often quite expensive, to say the least; a single licence of Matlab costs thousands of dollars.
  2. The universities are basically sending tax money to private companies whose interest is in making money. This is the same problem as with commercial peer-reviewed journals. The university's money isn't used in the best possible way when large amounts of it go to private shareholders.

However, there should be a fairly easy solution to all this:

  1. Start using open source alternatives to the proprietary software currently being used. In the case of SPSS, this could for example be R. In the case of Matlab, go with Octave or SAGE instead (in relation to this, see this post). Need to edit some pictures in Photoshop? GimpShop is at your service. Are you a music student who needs to write a score in Finale or Sibelius? Well, you don't. Lilypond should be a good substitute. I could go on, but you get the idea.
  2. Some people might, rightfully so, point out that these programs, in some cases, lack necessary features or proper GUIs. (One could debate whether having decent GUIs is actually beneficial in the end. In my own opinion, too much graphics sitting between the user and what the program actually does with the data might obscure one's understanding of the whole process, but let's put that aside for now.) This should, however, be easily fixable: consider the incredibly large amount of money that universities all around the world are pumping into these private companies every year. Now, consider what would happen if the universities took the amount that currently goes into the actual development process (that is, whatever they pay in licensing fees right now, minus the estimated amount that just gets thrown into a capitalistic black hole) and donated it to open source projects with set goals of adding these features to the original open source programs. You would have a whole armada of free software, available for anybody to use, that would beat proprietary software because of the huge economic support.

A great side effect of all this would be that universities could ditch Windows as the preferred OS. All the free software mentioned above runs on Linux (as well as other operating systems, including Windows), in contrast to many commercial programs that are Windows-only. If the latter programs were no longer used, Windows would have played out its role and the computers on campus wouldn't need to have it installed, something that would further benefit the universities economically. And no, you don't need Office to get things done.

The Zen of Python

 The Zen of Python:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Some esoteric references here, but all in all, it's something applicable to a much broader scope than just programming. 
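
For the curious: the text above ships with Python itself as an Easter egg, and typing `import this` in an interpreter prints it. Below is a minimal sketch of fetching it as a string instead. The helper name `zen_of_python` is my own invention, and the sketch relies on the `this` module keeping the text ROT13-encoded in its `s` attribute, which holds for CPython but is an implementation detail rather than a documented API.

```python
import codecs
import contextlib
import io


def zen_of_python() -> str:
    """Return the Zen of Python as a plain string (hypothetical helper)."""
    # Importing `this` prints the text as a side effect the first time,
    # so stdout is redirected to a throwaway buffer during the import.
    with contextlib.redirect_stdout(io.StringIO()):
        import this
    # Assumption: CPython stores the text ROT13-encoded in `this.s`.
    return codecs.decode(this.s, "rot13")


if __name__ == "__main__":
    print(zen_of_python())
```

In everyday use, plain `import this` at the prompt is of course all you need; the wrapper is only handy if you want the text as data.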

 

Turning tables into graphs

In relation to the last post: good visualization is important, but to have any chance of making it good, one has to turn the data into graphics from the start. In the article Let's Practice What We Preach: Turning Tables into Graphs, the authors do a great job of going through a whole issue of the Journal of the American Statistical Association and transforming its tables into graphs, something which statisticians themselves often recommend. Yes, the article is a bit old, but the sentiment is still as important and applicable to scientific journals.

Andrew Gelman, Cristian Pasarica, and Rahul Dodhia (2002). Let's Practice What We Preach: Turning Tables into Graphs. The American Statistician, 56 (2).

Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups @ Solomon Messing

http://solomonmessing.wordpress.com/2012/03/04/visualization-series-insight-from-cleveland-and-tufte-on-plotting-numeric-data-by-groups/:

  1. Never represent something in 2 or worse yet 3 dimensions if it can be represented in one—NEVER use pie charts, 3-D pie charts, stacked bar charts, or 3-D bar charts.
  2. Remove as much chart junk as possible–unnecessary gridlines, shading, borders, etc.
  3. Give your audience a sense of the noise present in your data–draw error bars or confidence bands if you are plotting estimates.
  4. If you want to plot multiple types of groups on a single outcome (the visual analog of cross-tabulations/marginals), use multi-paneled plots. These can also help if overploting looks too cluttered.
  5. Avoid mosaic plots. Instead use paneled histograms.
  6. Ditch the legend if you can (you almost always can).

This is a great and important blog post on why visualization is well worth paying attention to and how to get things right, something that sadly most researchers don't do. Best of all, it's just the first in a series of blog posts, so keep an eye out for the rest.
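
To make a few of the rules quoted above concrete (one dimension, no chart junk, error bars, no legend), here is a rough sketch of a text-only dot plot. It is not taken from Solomon's post: the function names `summarize` and `dot_plot`, the made-up numbers, and the choice of one standard error as the bar length are all my own assumptions, and in practice you would reach for a real plotting library.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(groups):
    """Return (label, mean, standard error) per group, sorted by mean."""
    rows = []
    for label, values in groups.items():
        # Assumption: the error bar is one standard error of the mean.
        se = stdev(values) / sqrt(len(values))
        rows.append((label, mean(values), se))
    return sorted(rows, key=lambda row: row[1])


def dot_plot(rows, width=40):
    """Draw one line per group: 'o' marks the mean, '-' spans mean +/- SE."""
    lo = min(m - se for _, m, se in rows)
    hi = max(m + se for _, m, se in rows)
    # Map the data range onto character columns 0 .. width - 1.
    scale = (width - 1) / (hi - lo) if hi > lo else 0.0
    lines = []
    for label, m, se in rows:
        cells = [" "] * width
        left = round((m - se - lo) * scale)
        right = round((m + se - lo) * scale)
        for i in range(left, right + 1):
            cells[i] = "-"
        cells[round((m - lo) * scale)] = "o"
        # Labels sit directly on each row, so no legend is needed.
        lines.append(f"{label:>8} |{''.join(cells)}|")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example data, purely for illustration.
    data = {"A": [4.1, 5.0, 4.6, 5.3], "B": [6.2, 7.1, 6.8, 6.5]}
    print(dot_plot(summarize(data)))
```

Sorting the groups by their mean is a design choice of mine rather than one of the six rules, but it tends to make this kind of plot easier to read.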

Supplemental material done right!

Michael Eisen recently posted about two guys, Greg Lang and David Botstein, who in their latest paper included THE COMPLETE LABORATORY NOTEBOOK (I really had to use capital letters here) as supplemental material. This is an excellent example of taking the spirit of science, openness to investigation by others, and actually making it real, not just something you say because it sounds nice.

Michael notes:

It's really not so amazing that they did this. It’s actually a totally obvious and natural thing to scan and post an entire lab notebook as supplemental material – in principle allowing anyone to answer virtually any question they have about the actual work conducted. What is amazing is that – as far as I know – this is the first time anyone’s actually done it. And (members of my lab take note) this will not be the last.

I couldn't have said it better myself. If I had a lab, I would have enforced this practice yesterday.

Hooray, I was wrong!

Phil continues to deliver. If only everybody could have the mindset he argues for in this post, we'd be far better off. Seriously.

I guess my overall point is that any online discussion, even between people who violently disagree with each other, should be a co-operative venture. One of you is wrong, and you're working together to find out who. And, we should keep in mind that most of the benefit goes to the person who was actually wrong in the first place.

Phil describes the phenomenon of always trying to be right as an ego thing. I think he's right about this, but I would also suggest that the sunk-cost fallacy is involved in this kind of behavior. That is, once you've invested in getting to know a certain field, you want that knowledge to pay off, and in the world of science, that payoff consists of an ability to come up with accurate predictions and, in general, to be right about things. Of course, if one is wrong, continuing to fervently argue for something only adds to the loss. If one could just cut one's losses and change perspective, one would gain more than by holding on to the faulty view. Alas, it's called a fallacy for a reason.

A difference that makes all the difference in the world

Phil Birnbaum, over at Sabermetric Research, has recently posted about the difference between calling someone a dick and being a dick.

B gets in trouble for calling A a dick. But A gets off scot-free for BEING a dick. And that's a lot worse.

It's a great post that illuminates how, as Phil puts it, people often don't have the time or patience to judge what you've said, but understand very well how you said it without having to put any effort into it. There's an asymmetry going on here, an asymmetry that has big consequences for how people are judged when participating in discussions.