RNA-Seq Analysis in WSL – Part 1 : Installation of tools

In this series of posts, we will see how to perform RNA-Seq analysis on the Windows operating system using Linux tools in WSL. The parts of the analysis that require R will be done directly in Windows.

We will be using the following tools:

  • samtools (WSL / Linux)
  • HISAT2 (WSL / Linux)
  • Stringtie (WSL / Linux)
  • gffcompare (WSL / Linux)
  • Ballgown (R)

Install samtools

We need to install three things for this.

  • htslib
  • bcftools
  • samtools

Open WSL. A terminal will open. Remember to open WSL as administrator.

Prepare

To install samtools, a few other things must be installed on the Linux system first. Execute the following commands one by one in the WSL terminal to install them. For those who are not familiar with Linux, these are called bash commands.

Bash
sudo apt-get update
sudo apt-get install gcc
sudo apt-get install make
sudo apt-get install libbz2-dev
sudo apt-get install zlib1g-dev
sudo apt-get install libncurses5-dev 
sudo apt-get install libncursesw5-dev
sudo apt-get install liblzma-dev

Download samtools, htslib and bcftools from

https://www.htslib.org/download

The downloads will be in
/mnt/c/Users/<yourname>/Downloads

Note: replace <yourname> with the Windows username you are currently using. For example, if your username is john, the downloaded packages will be in /mnt/c/Users/john/Downloads
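
As a small illustration of the path described above, here is a minimal sketch. The function name win_downloads_dir is a hypothetical helper made up for this post, and it assumes Windows lives on drive C: and that WSL mounts it at /mnt/c, which is the usual default.

```shell
# Hypothetical helper: print the WSL path of a Windows user's Downloads folder.
# Assumes the Windows C: drive is mounted by WSL at /mnt/c.
win_downloads_dir() {
    local win_user="$1"
    printf '/mnt/c/Users/%s/Downloads\n' "$win_user"
}

# Example: list the downloads of the Windows user "john",
# but only if that folder actually exists on this machine.
downloads="$(win_downloads_dir john)"
if [ -d "$downloads" ]; then
    ls "$downloads"
else
    echo "Not found: $downloads (check the user name with: ls /mnt/c/Users)"
fi
```

If you are unsure of your Windows username, listing /mnt/c/Users shows the available folders.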

Move the downloaded packages to a different folder (directory)

Let’s create a directory where all the required software will be stored. Note that the versions you download may differ from mine; the filenames below are those of the versions I downloaded when this work was done.

Bash
cd ~
mkdir bin
export PATH=$HOME/bin/:$PATH

cd bin

mv /mnt/c/Users/<yourname>/Downloads/htslib-1.21.tar.bz2 ./
mv /mnt/c/Users/<yourname>/Downloads/samtools-1.21.tar.bz2 ./
mv /mnt/c/Users/<yourname>/Downloads/bcftools-1.21.tar.bz2 ./

Now that the software files have been moved to this directory, we can install them. Continue in the same terminal window as above. If you have restarted WSL, first change the current directory to bin using:

Bash
cd ~
cd bin

Bash
tar -vxjf htslib-1.21.tar.bz2
cd htslib-1.21
make
cd ..

tar -vxjf samtools-1.21.tar.bz2
cd samtools-1.21
make
cd ..

tar -vxjf bcftools-1.21.tar.bz2
cd bcftools-1.21
make

cd ..
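
Since the three packages are built in exactly the same way, the steps above could also be written as a loop. This is only a sketch: build_tarball is a hypothetical helper, and VERSION is an assumed variable that you would set to the version you actually downloaded. It expects to be run from the directory that holds the archives (~/bin in this post).

```shell
# Hypothetical helper: unpack <name>-<version>.tar.bz2 from the current
# directory and run make inside it. Returns 1 if the archive is missing.
build_tarball() {
    local name="$1" version="$2"
    local archive="${name}-${version}.tar.bz2"
    if [ ! -f "$archive" ]; then
        echo "Skipping ${name}: ${archive} not found in $(pwd)" >&2
        return 1
    fi
    # The subshell keeps the cd local, so no "cd .." is needed afterwards.
    tar -xjf "$archive" && ( cd "${name}-${version}" && make )
}

# Set this to the version you actually downloaded.
VERSION=1.21
for pkg in htslib samtools bcftools; do
    build_tarball "$pkg" "$VERSION" || echo "$pkg was not built"
done
```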

Make these tools available for use by adding their directories to the PATH environment variable.

Bash
export PATH=$HOME/bin/htslib-1.21:$PATH
export PATH=$HOME/bin/samtools-1.21:$PATH
export PATH=$HOME/bin/bcftools-1.21:$PATH
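
Note that export only changes PATH for the current terminal session, so these directories are forgotten when WSL is restarted. One possible way to make them permanent is sketched below. The function add_path_line is a hypothetical helper, not part of any of these tools; it appends an export line to ~/.bashrc (the same file that is edited by hand in the gffcompare section) only if that exact line is not already there.

```shell
# Hypothetical helper: append an "export PATH" line for a directory to a
# startup file (default: ~/.bashrc) unless that exact line already exists.
add_path_line() {
    local dir="$1"
    local rcfile="${2:-$HOME/.bashrc}"
    # \$PATH is escaped so that the literal text $PATH is written to the file.
    local line="export PATH=\"$dir:\$PATH\""
    touch "$rcfile"
    if ! grep -qxF "$line" "$rcfile"; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}

# Usage (uncomment to run; adjust the version to the one you built):
# add_path_line "$HOME/bin/htslib-1.21"
# add_path_line "$HOME/bin/samtools-1.21"
# add_path_line "$HOME/bin/bcftools-1.21"
```

Running the function twice for the same directory leaves a single line in the file, so it is safe to repeat.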

Samtools installation is now complete.

Install HISAT2

First download hisat2 from https://daehwankimlab.github.io/hisat2/download/

Copy to the bin directory created above.

Bash
cd ~/bin
mv /mnt/c/Users/<yourname>/Downloads/hisat2-2.1.0-Linux_x86_64.zip ./
unzip hisat2-2.1.0-Linux_x86_64.zip

If unzip is not installed, install it first and then run the unzip command above again.

Bash
sudo apt install unzip

Add to path

Bash
cd ~/bin
cp hisat2-2.1.0/hisat2* hisat2-2.1.0/*.py ./

This will show some warnings about files being repeated; these can be ignored.
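
The repeated-file warnings most likely come from the two patterns in the cp command overlapping: a script whose name starts with hisat2 and ends in .py matches both of them. The sketch below reproduces this in a scratch directory. The file names are made up for the demonstration and nothing here touches the real HISAT2 folder.

```shell
# Scratch demo with made-up file names.
demo="$(mktemp -d)"
mkdir "$demo/src" "$demo/dest"
touch "$demo/src/hisat2" "$demo/src/hisat2_helper.py" "$demo/src/other.py"

# hisat2_helper.py matches both patterns, so it is listed twice.
set -- "$demo"/src/hisat2* "$demo"/src/*.py
echo "arguments passed to cp: $#"

# cp may warn about the repeated source file, but each file
# still ends up in the destination exactly once.
cp "$@" "$demo/dest/"
copied="$(ls "$demo/dest" | wc -l | tr -d ' ')"
echo "files copied: $copied"
rm -rf "$demo"
```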

Install Stringtie

Download the stringtie Linux binary from https://ccb.jhu.edu/software/stringtie/

Make sure to download the Linux binary, not the OS X one.

Bash
cd ~/bin
mv /mnt/c/Users/<yourname>/Downloads/stringtie-2.2.3.Linux_x86_64.tar.gz ./
tar xvzf stringtie-2.2.3.Linux_x86_64.tar.gz
cp stringtie-2.2.3.Linux_x86_64/stringtie ./

Install gffcompare

You need to have git installed first.

Bash
cd ~/bin
git clone https://github.com/gpertea/gffcompare
cd gffcompare
make release

Adding this to the path was not so straightforward.

Close WSL and restart in administrator mode. On the terminal type:

Bash
nano ~/.bashrc

The terminal will turn into the nano text editor and show the contents of the .bashrc file. Go to the end of the file using the arrow keys and, on a new line, type:

Bash
export PATH=$PATH:$HOME/bin/gffcompare

You can save and exit with Ctrl+X, then Y and then Enter. You will then be taken back to the normal command line prompt.

Restart WSL and check that gffcompare is in the path by typing:

Bash
which gffcompare

Install Ballgown

Ballgown is an R package, so we first need to install R and RStudio on our Windows system.

Once this is done, we will install the tidyverse package in R. Start RStudio and, in its console, execute:

R
install.packages("tidyverse")

It will take some time for this to complete.

To install Ballgown, execute:

R
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ballgown")

Checking that the installations are in the path

Finally, we can check that everything is installed correctly and is in the path.

Restart WSL and type the following commands:

Bash
which samtools
which hisat2
which stringtie
which gffcompare

Each command should print the location at which the tool is found in the path. If a command gives no output, that tool is not in the path and you will not be able to use it.
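
The same check can be done in one go with a short loop. This is a minimal sketch; check_tool is a hypothetical helper name, and it relies on the shell builtin command -v, which reports where a command would be found.

```shell
# Hypothetical helper: report where a tool is found on PATH,
# or flag it as missing and return a non-zero status.
check_tool() {
    local tool="$1" location
    if location="$(command -v "$tool")"; then
        echo "OK       $tool -> $location"
    else
        echo "MISSING  $tool"
        return 1
    fi
}

for tool in samtools hisat2 stringtie gffcompare; do
    check_tool "$tool" || true
done
```

Any tool reported as MISSING needs its directory added to the path before it can be used.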

Other tools to install in R

Install devtools

R
install.packages("devtools")

Install rtools

Go to
https://cran.r-project.org/bin/windows/Rtools/

Download the .exe installer of the Rtools version compatible with your R version and install it.

Install tidyverse

Tidyverse is a collection of R packages for data science. If you did not already install it above, it can be installed with the following command in the R console.

R
install.packages("tidyverse")

Install RSkittleBrewer

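R
# Clear any GitHub token from the environment so that the public repository is fetched without authentication
Sys.unsetenv("GITHUB_PAT")
devtools::install_github("alyssafrazee/RSkittleBrewer", auth_token = NULL)

# If this fails because of stored GitHub credentials, delete them and try again
gitcreds::gitcreds_delete() # type 2 when prompted
devtools::install_github("alyssafrazee/RSkittleBrewer", auth_token = NULL)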

We are now set to do the RNA-Seq analysis. In later posts, we will see how to use these tools.