Computational Biology Resources FTW

This is my ever-growing collection of links, solutions and sources I have discovered and used while learning and teaching computational biology. I often use it as a one-stop resource page for whoever asks me about a good book, a website, that command that lets you execute line 45 from history, or how to handle data in shell and R.

A bunch of papers

For when you need a good reference, or just want to persuade your colleague or supervisor that she really needs to get to where the puck is going to be. Actually, scratch that: this train has been puffing along for quite a while and all we can do now is not get left behind.

Also, bioinformatics != computational biology.

Four books on computational biology I highly recommend

A more thorough list is available at bookdown.org, in particular these two books on data visualisation (both use ggplot extensively):

A good book to learn Python

Do not use Excel for handling dates and gene identifiers

In particular, do not export gene IDs and dates to Excel and then import them back into R or other programming tools. You have been warned.

If you have to use Excel for dates, split your date into three numerical columns (year, month and day) and use the lubridate package to handle the dates after importing to R. Also, here is a good website with tricks for power users and here is a website which explains R data structures for people coming from Excel.
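A minimal shell sketch of the splitting step, assuming a hypothetical comma-separated file with an ISO-formatted date (YYYY-MM-DD) in its second column. The file names and column names are made up for illustration.

```shell
# Hypothetical input: a CSV with an ISO date (YYYY-MM-DD) in column 2.
workdir=$(mktemp -d)
printf 'sample,date\nA,2017-12-11\nB,2018-03-01\n' > "$workdir/samples.csv"

# Replace the date column with three numeric columns: year, month, day.
# Adding 0 turns "03" into 3, so each column holds a plain number.
awk -F, 'BEGIN { OFS = "," }
     NR == 1 { print $1, "year", "month", "day"; next }
     { split($2, d, "-"); print $1, d[1] + 0, d[2] + 0, d[3] + 0 }' \
    "$workdir/samples.csv" > "$workdir/samples_split.csv"

cat "$workdir/samples_split.csv"
```

Back in R, the three columns can be reassembled into a proper date, for example with lubridate's make_date().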

Get a good text editor

This is essential. A good text editor has to support regular expressions and understand different line ending conventions. All the software below is free to use.

Do it in style

Code style guides for R. Pick one and stick to it:

Also important:

Tools useful in teaching or just for mucking about

Some teaching ideology

R tutorials/codethroughs I like

Two classics:

Do not let Jenny Bryan set your computer on fire!

The only two things that make @JennyBryan 😤😠🤯. Instead use projects + here::here() #rstats pic.twitter.com/GwxnHePL4n

— Hadley Wickham (@hadleywickham) December 11, 2017

…use the right way to organise your R work:

Shell-fu

How to install Bash shell on Windows 10

Three very useful and inexpensive books on command line

Shell prompt

Take time to make your terminal window and the font big enough!

Useful link with options to modify your prompt: https://www.cyberciti.biz/tips/howto-linux-unix-bash-shell-setup-prompt.html

Difference between .bash_profile and .bashrc

This is relevant for modifying the $PATH:
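Bash reads ~/.bash_profile for login shells and ~/.bashrc for interactive non-login shells, so a $PATH change placed in only one of them may not show up everywhere. Below is a small sketch of a $PATH edit that is safe to put in either file. The function name path_prepend and the directory $HOME/my-tools/bin are hypothetical.

```shell
# Prepend a directory to PATH only if it is not already there,
# so sourcing the file twice does not create duplicate entries.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, do nothing
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend "$HOME/my-tools/bin"   # hypothetical directory, adjust to taste
path_prepend "$HOME/my-tools/bin"   # second call is a no-op
export PATH
echo "$PATH"
```

A common arrangement is to keep such lines in ~/.bashrc and have ~/.bash_profile source that file, so both kinds of shell end up with the same $PATH.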

How to move around shell

Clear your screen

How to really clear the terminal

Listing stuff (ls)

How to move around your folders

Four ways to go home:

If your folder or file names include spaces
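A short sketch of the usual ways to handle spaces: quote the name or escape each space with a backslash. The folder and file names are made up, and everything happens in a temporary directory.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/my project"                    # quotes keep the space in the name
touch "$workdir/my project/read counts.txt"

# Three equivalent ways to refer to the same file:
ls "$workdir/my project/read counts.txt"       # double quotes
ls "$workdir"/my\ project/read\ counts.txt     # backslash before each space
ls "$workdir"/'my project/read counts.txt'     # single quotes

# Always quote variables that may hold such names.
f="$workdir/my project/read counts.txt"
wc -c < "$f"
```

Without the quotes, the shell would split the name at each space and treat the pieces as separate arguments.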

To repeat last command

Reading/displaying text files

Wildcards in shell (to do stuff on more than one file at a time)
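A small sketch of the three most common shell wildcards, run in a temporary directory on made-up file names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch sample1.fastq sample2.fastq sample10.fastq sample1.txt notes.md

echo *.fastq                        # * matches any run of characters
one_char=$(echo sample?.fastq)      # ? matches exactly one character
from_set=$(echo sample[12].*)       # [ ] matches one character from the set

echo "$one_char"
echo "$from_set"
```

Note that sample?.fastq skips sample10.fastq, because ? stands for exactly one character. Running echo with a pattern first is a cheap way to check what a wildcard will match before handing it to rm or mv.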

Regular expressions and grep

xkcd #208

Everything you wanted to know about regular expressions

Two useful regular expression testers

…but remember that grep in Notepad++, Ruby, JavaScript or the Mac terminal can have slightly different implementations (i.e. not all functions will work, or they will not all work the same way). When stuff doesn’t work, try egrep (extended grep) and always RTFM.
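A sketch of one such difference, using made-up gene identifiers. In basic regular expressions (plain grep) an unescaped + is an ordinary character, while in extended ones (grep -E, which is what egrep runs) it means "one or more". Recent GNU grep releases print a warning when egrep is used, so grep -E is the safer spelling.

```shell
# Hypothetical gene identifiers, one per line.
ids=$(printf 'BRCA1\nBRCA2\nTP53\nMYC\n')

# Basic syntax: + is taken literally, so nothing matches.
# (grep exits non-zero when there is no match, hence "|| true".)
basic=$(printf '%s\n' "$ids" | grep -c '^[A-Z]+[0-9]$' || true)

# Extended syntax: one or more capitals followed by exactly one digit.
extended=$(printf '%s\n' "$ids" | grep -E -c '^[A-Z]+[0-9]$' || true)

echo "basic: $basic, extended: $extended"
```

The same pattern therefore counts zero lines with plain grep and two lines (BRCA1 and BRCA2) with grep -E.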

A cool regular expression recognition web app - you put in your input and it tries to automatically find a regexp pattern to match it. When it works, it’s like magic.

There is now also a way of testing and visualising regular expressions inside RStudio: Regexplain by Garrick Aden-Buie. And if you want a very nerdy regular expression testing site, try regexcrossword.com (this site tests you).

Wildcards for regular expression pattern matching

Boundaries

Quantifiers, used in combination with characters and wildcards

Capturing and replacing
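A minimal sketch of capturing and replacing with sed -E: parentheses capture part of the match, and \1, \2 put the captured text back into the replacement. The input strings are made up.

```shell
# Swap "surname, forename" using two capture groups.
swapped=$(echo 'Bryan, Jenny' | sed -E 's/^([A-Za-z]+), ([A-Za-z]+)$/\2 \1/')
echo "$swapped"

# Pull the sample number and read number out of a hypothetical file name.
parsed=$(echo 'sample_01_R1.fastq' |
    sed -E 's/^sample_([0-9]+)_R([12])\.fastq$/id=\1 read=\2/')
echo "$parsed"
```

The first command prints "Jenny Bryan" and the second prints "id=01 read=1".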

Basic grep commands

Other bits that didn’t fit anywhere else

Extracting columns and sorting
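A short sketch with cut and sort on a made-up tab-separated table (gene, chromosome, read count), written to a temporary directory.

```shell
workdir=$(mktemp -d)
# Hypothetical table: gene, chromosome, read count (tab-separated).
printf 'geneA\tchr2\t150\ngeneB\tchr1\t9\ngeneC\tchr1\t1200\n' > "$workdir/counts.tsv"

tab=$(printf '\t')
# cut extracts columns 1 and 3 (tab is its default delimiter);
# sort -k2,2nr sorts on the second field, numerically, largest first.
top=$(cut -f1,3 "$workdir/counts.tsv" | sort -t "$tab" -k2,2nr | head -n 1)
echo "$top"
```

The n in -k2,2nr matters: without it sort compares text, and 9 would come out ahead of 1200.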

Prevent accidental deletion or overwriting files or folders
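One way to guard against overwriting is the shell's noclobber option, sketched below on a made-up file in a temporary directory. With it switched on, > refuses to overwrite an existing file and >| is the deliberate override.

```shell
workdir=$(mktemp -d)
echo 'precious results' > "$workdir/results.txt"

set -o noclobber       # make > refuse to overwrite existing files
# The redirection below fails and leaves the file untouched.
if (echo 'oops' > "$workdir/results.txt") 2>/dev/null; then
    echo 'file was overwritten'
else
    echo 'overwrite refused'
fi
echo 'forced' >| "$workdir/forced.txt"   # >| writes even with noclobber on
set +o noclobber

cat "$workdir/results.txt"
```

For deletion and moving, rm -i, cp -i and mv -i ask for confirmation before removing or overwriting anything; some people alias the plain commands to these versions in their shell startup file.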

Some less basic stuff

Git basics

Jenny Bryan’s book about Git for R users is great: Happy Git and GitHub for the useR.

Another book on bioinformatics

The extensive “missing manuals” for awk and sed

And a very good tutorial that lets you use Awk right away: Why you should learn just a little Awk: An Awk tutorial by Example by Greg Grothaus.

Utilities to handle fastq files etc.

Extract sequences from the fastq file

cat reads.fastq | awk '{if(NR%4==2) print length($1)}' | sort -n | uniq -c > read_length.txt
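The idea behind NR%4==2 is that each FASTQ record takes four lines and the sequence is the second one. Here is a sketch on a made-up three-read file, assuming every record is exactly four lines (no wrapped sequences).

```shell
workdir=$(mktemp -d)
# Toy FASTQ file (made-up reads): three records, four lines each.
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n@r3\nTTGA\n+\nIIII\n' \
    > "$workdir/reads.fastq"

# Print only the sequence lines (line 2 of every 4-line record).
seqs=$(awk 'NR % 4 == 2' "$workdir/reads.fastq")
echo "$seqs"

# Print the length of each sequence, then count reads per length.
awk 'NR % 4 == 2 { print length($1) }' "$workdir/reads.fastq" |
    sort -n | uniq -c > "$workdir/read_length.txt"
cat "$workdir/read_length.txt"
```

Each line of read_length.txt holds a count followed by a read length; here two reads have length 4 and one has length 6.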

awk '0 == (NR + 1) % 2' inputfile.txt
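The condition (NR + 1) % 2 == 0 is true when the line number NR is odd, so this one-liner keeps lines 1, 3, 5 and so on. A quick check on made-up input:

```shell
# Five numbered lines in, odd-numbered lines out.
odd=$(printf 'line1\nline2\nline3\nline4\nline5\n' | awk '0 == (NR + 1) % 2')
echo "$odd"
```

This prints line1, line3 and line5. Using NR % 2 == 0 instead would keep the even-numbered lines.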

cat barcount.txt | sed -E -e 's/^ +([0-9]+) [ACGTN]+/\1/' | awk 'BEGIN{total=0} {if ($1>10000) total+=$1} END{print total}'
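A sketch of what this pipeline does, assuming barcount.txt holds uniq -c style lines (leading spaces, a count, then a barcode). The counts and barcodes below are made up.

```shell
workdir=$(mktemp -d)
# Made-up barcode counts in "uniq -c" style: leading spaces, count, barcode.
printf '  25000 ACGTAC\n    300 TTGACA\n  12000 GGNATC\n' > "$workdir/barcount.txt"

# sed keeps only the count; awk adds up the counts above 10000.
total=$(sed -E -e 's/^ +([0-9]+) [ACGTN]+/\1/' "$workdir/barcount.txt" |
    awk 'BEGIN { total = 0 } { if ($1 > 10000) total += $1 } END { print total }')
echo "$total"
```

With this input the total is 37000 (25000 + 12000); the barcode seen 300 times falls below the threshold and is ignored.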

Enable NTFS read/write in macOS

This will let you read and write to a Windows partition from macOS:

open /Volumes

echo "LABEL=DRIVE_NAME none ntfs rw,auto,nobrowse" | sudo tee -a /etc/fstab

Enable ext4 read in macOS

This will let you read from a Linux partition on macOS:

I am still working on it - I only managed to get read access when root…

Setting up ftp proxy via command line

This assumes you cannot modify or don’t trust the system-wide settings in Ubuntu/Mac.
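A minimal per-session sketch, assuming the tools you use (wget and curl, for instance) honour the ftp_proxy environment variable. The proxy address is a placeholder, not a real server.

```shell
# Hypothetical proxy address: replace with your own host and port.
ftp_proxy="http://proxy.example.org:3128"
export ftp_proxy

# Programs started from this shell now inherit the setting.
echo "ftp_proxy is set to: $ftp_proxy"
```

The setting lasts only for the current shell session; unset ftp_proxy switches it off again, and nothing system-wide is touched.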

How to use screen

Ctrl-a d to disconnect from the screen

screen -ls list of screens

screen -r [id of the screen] to reconnect to the screen

Random stuff