In today’s world of big data and all-the-omics, it's tough to straddle that murky line between biology and bioinformatics. Computational biology is challenging, and computers are mean. Here at Femina Sci we’ve spent countless evenings battling the robots late into the night, only to cry ourselves to sleep with the help of whiskey and ice cream, so you don’t have to! Here are our top tips for surviving the battle against bioinformatic robots:
1) Instead of asking first, “what question do I want to answer with my data?” and second, “what computational tool will allow me to do this?”, simplify your life by starting with, “what software package can I easily install?” Installation requires root privileges? Nope. Written in Java? That’s another nope. Requires installing a bazillion other dependencies? That’s a definite nope.
2) If you went ahead and attempted to install something complicated anyway and just can’t figure it out, there is no shame in asking for help. When desperation sets in, beg your local bioinformagician (i.e., the person who actually took the time to learn Python, unlike you, who took that 20-minute online tutorial, if that) for assistance. They love this.
3) Manual File Manipulation (M.F.M.): Yes, open all 2,000 of your input files in a text editor and delete that string of 8 random characters in the sample IDs. In the long run it will be faster than writing a batch script.
4) Don’t let those pesky columns in your R data frame defeat you. Simply save the data frame as a .csv, open it in Excel, remove the unwanted column, save as .csv again, and load the data frame back into R. Easy peasy lemon squeezy!
5) Couldn’t tell Python syntax from Perl syntax if it hit you in the face? Trial and error is a tried and true method. If the square brackets don’t work, try parentheses. If a single quote doesn’t work, try the double. There are only so many symbol options on the keyboard; one of them has to be right.
6) Still didn’t work? Sounds like you need to restart EVERYTHING (including the computer, maybe even unplug it and flip the circuit breaker) and try again. Don’t know how many cores you’ll need? You should probably just use all the cores. #AllCoresAllDay
7) Quality sequence alignments are crucial for your downstream analyses; a poor alignment can produce all kinds of Salvador Dalí trees. With so many alignment and trimming options floating around, it’s hard to find a trustworthy approach. You know who is trustworthy? You! Just load your sequences into a Word document, put on a good movie, and align by hand! Alternatively, if that is too much screen time for you, print the document and grab your favorite set of highlighters, scissors, and tape. Highlight each base a different color, and piece them together! You can always scan it back in later.
8) If the appropriate statistical test doesn’t give you a p-value < 0.05, find another test! There are tons of them out there.
9) For generating null models and normal distributions, as well as for divination and lottery tickets, check out this random number generator!
10) Ultimately, it’s not what the data say; it’s how sexy the figure looks. You should spend an obscene amount of time changing the figure settings, especially the colors. Try websites like I Want Hue, Color Hex, or ColorBrewer to really nail down the perfect palette. Then marinate on it for the next week, only to decide to change it back to your initial color palette. Hopefully your PI will splurge on the color prints of your publication.
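For the killjoys who refuse to embrace M.F.M. (tip 3), here is a rough sketch of what the dreaded batch script might look like. It assumes, purely for illustration, that the unwanted tag is an underscore followed by exactly 8 letters or digits at the end of each sample ID (something like liver01_x7Gq92Lp) and that your inputs are .txt files. The names strip_tag and clean_directory are made up for this example; adjust the pattern to whatever your IDs actually look like, and back up your files first, because the robots show no mercy.

```python
import re
from pathlib import Path

# Assumption: the tag is "_" plus exactly 8 letters/digits ending a sample ID,
# e.g. "liver01_x7Gq92Lp". Anything else in your files that happens to look
# like that will be removed too, so check the pattern against your own data.
TAG = re.compile(r"_[A-Za-z0-9]{8}\b")


def strip_tag(text: str) -> str:
    """Remove every substring matching TAG from a block of text."""
    return TAG.sub("", text)


def clean_directory(directory: str, glob: str = "*.txt") -> int:
    """Rewrite each matching file in place; return how many files changed."""
    changed = 0
    for path in sorted(Path(directory).glob(glob)):
        original = path.read_text()
        cleaned = strip_tag(original)
        if cleaned != original:  # only touch files that actually need it
            path.write_text(cleaned)
            changed += 1
    return changed
```

Calling clean_directory("my_inputs") on a (hypothetical) folder of 2,000 files should finish well before the opening credits of your movie are over.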
Bonus Tip: After following these tips, be sure to publish your full pipeline in the spirit of scientific transparency using platforms such as GitHub and/or Jupyter.
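If that pipeline really is going public, you may prefer that the Excel round trip from tip 4 not appear in it. Below is a minimal sketch of dropping a column from CSV text using only Python's standard library. The function name drop_column and the example column name are invented for illustration, and the sketch assumes the first row is a header. (If you never left R in the first place, assigning NULL to the column, as in df$junk <- NULL, does the same job.)

```python
import csv
import io


def drop_column(csv_text: str, unwanted: str) -> str:
    """Return csv_text with the column named `unwanted` removed.

    Assumes the first row is a header; raises ValueError if the
    column name is not found there.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    idx = rows[0].index(unwanted)  # position of the column to remove
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow(row[:idx] + row[idx + 1:])
    return out.getvalue()
```

To use it on a real file, read the file's contents into a string, pass them through drop_column, and write the result back out. No spreadsheet required, sadly.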
Best way to clean your data? UV-sterilize it.