Tuesday, May 26, 2015

How to Perform Basic RNA-seq Analysis with Cufflinks

Brief outline

This is a Perl wrapper for RNA-seq analysis to identify differentially expressed genes.
Basically I follow this Nature Protocol by Cufflinks' authors, but with a more updated workflow recommendatd for cufflinks 2.2 (in which the cuffquants is used to generate CXB file for final analysis).
The script can be download from this link as well.

use strict;
use 5.010;
use File::Basename qw/fileparse/;
use POSIX qw/strftime/;

# ----------------------------------------------------------------------
# Basic Information
# ----------------------------------------------------------------------
# For RNA-Seq Analysis of snoRNA-Knockdown experiment  
# @ RIKEN BSI, Mental Biology Team
# @ Author  : Psytky03
# @ Date : 2015-05-22
# @ Software Versions
# Tophat  : 2.014
# Bowtie 2 : 2.2.3
# cufflinks : 2.2.1
# samtools  : 0.1.19
# GFOLD  : 
# ----------------------------------------------------------------------

# ----------------------------------------------------------------------
# FASTQ samples - in fastq.gz format
# ----------------------------------------------------------------------
# GFP_B6_DIV4.fastq.gz
# GFP_ICR_DIV10.fastq.gz
# GFP_ICR_DIV4.fastq.gz
# MBII52_B6_DIV4.fastq.gz
# MBII52_ICR_DIV10.fastq.gz
# MBII52_ICR_DIV4.fastq.gz
# MBII85_B6_DIV4.fastq.gz
# MBII85_ICR_DIV10.fastq.gz
# MBII85_ICR_DIV4.fastq.gz
# ----------------------------------------------------------------------

# ----------------------------------------------------------------------
# Files and Parameters
# ----------------------------------------------------------------------
# my $simulate = 0; # run the program
  my $simulate = 1; # only show the simulated run and commands

  my $GTF_file     = "genes.gtf"; # UCSC GTF
  my $Genome_Fasta_Prefix  = "genome";  # mm10 
  my $Genome_Fasta_file  = "genome.fa"; # m10 whole genome fasta 
  my $assemblies_file  = "assemblies_2.txt"; # desired assemblies output file 

  my $tophat_threads = 11;
  my $cufflinks_thread  = 11;
  my $cuffmerge_thread  = 11;
# ----------------------------------------------------------------------

# ----------------------------------------------------------------------
# Run Tophat - Cufflinks 
# ----------------------------------------------------------------------
open(ASSEMBLIES, ">$assemblies_file");

my @gzfiles = <*.fastq.gz>;
my @accpeted_bam_files; 

foreach my $file (@gzfiles) 
 my ($basename, $path, $suffix) = fileparse($file, qw/.fastq.gz/);
 my $current_time    = strftime('%Y-%m-%d %H:%M:%S',localtime);
 my $tophat_output_dir = $basename."_tophat_out";
 my $cufflinks_output_dir = $basename."_cufflinks_out";
 say "[$current_time]\tNow is processing $file"; 
 my $accepted_bam = "./$tophat_output_dir/accepted_hits.bam";
 push @accpeted_bam_files, $accepted_bam;
 my $tophat_command   = "tophat2 -p $tophat_threads -G $GTF_file -o $tophat_output_dir ".
          "$Genome_Fasta_Prefix $file";
 my $cufflinks_command  = "cufflinks -p $cufflinks_thread -o $cufflinks_output_dir ".
 if ($simulate)
 { say($tophat_command);
 } else{
 say ASSEMBLIES "./$cufflinks_output_dir/transcripts.gtf";
 sleep (1);

my $current_time    = strftime('%Y-%m-%d %H:%M:%S',localtime);
say "----------------------------------------------------------------------";
say "[$current_time] Finished Tophat - Cufflinks Process"     ;
say "----------------------------------------------------------------------";
# ----------------------------------------------------------------------

# ----------------------------------------------------------------------
# Run Cuffmerge to create a signle merged transcritome annotation
# ----------------------------------------------------------------------

my $cuffmerge_command = "cuffmerge -g $GTF_file -s $Genome_Fasta_file -p $cuffmerge_thread $assemblies_file";

if ($simulate){say($cuffmerge_command)} 

my $current_time    = strftime('%Y-%m-%d %H:%M:%S',localtime);
say "----------------------------------------------------------------------";
say "[$current_time] Finished CuffMerge Process"       ;
say "----------------------------------------------------------------------";

# ----------------------------------------------------------------------

# ----------------------------------------------------------------------
# Run CuffQuant 
# ----------------------------------------------------------------------
my $merged_assemble_file = "./merged_asm/merged.gtf";
my $cuffquant_thread  = 11;

foreach my $bam_file (@accpeted_bam_files)
 my ($basename, $path, $suffix) = fileparse($bam_file, qw/.bam/);
 $path =~ s/_tophat_out\///g;
 my $cuffquant_output_dir = $path."_cuffquants_out";
 my $cuffquant_command = "cuffquant -p $cuffquant_thread -o $cuffquant_output_dir ".
       "$merged_assemble_file $bam_file";
 if ($simulate){say($cuffquant_command)} 
 sleep (1);

my $current_time    = strftime('%Y-%m-%d %H:%M:%S',localtime);
say "----------------------------------------------------------------------";
say "[$current_time] Finished CuffQuant Process"  ;
say "----------------------------------------------------------------------";
# ----------------------------------------------------------------------

How to Add Prefix to all files in the folder

# Method 1
for f in * ; do mv "$f" "PRE_$f" ; done

# Method 2 
ls | xargs -I {} mv {} PRE_{}
ls | xargs -I {} mv {} {}_SUF

Thursday, May 21, 2015

How to do Dunnett Test in R



weight <- read.csv("~/Desktop/Pat-Mat-WT-Weight.csv", head=TRUE)
weight$Genotype <- as.factor(weight$Genotype)

low_fat <- subset(weight, Food=="CE-2")
high_fat <- subset(weight, Food =="D12492")

time_point <- c(4:17)

high_results <- lapply(time_point, function(x) {
 summary(glht(aov(high_fat[,x] ~ Genotype, high_fat), linfct=mcp(Genotype="Dunnett")))


ls(summary(glht(aov(high_fat[,4] ~ Genotype, high_fat), linfct=mcp(Genotype="Dunnett"))))
summary(glht(aov(high_fat[,4] ~ Genotype, high_fat), linfct=mcp(Genotype="Dunnett")))$vcov

low_results <- lapply(time_point, function(x) {
  summary(glht(aov(low_fat[,x] ~ Genotype, low_fat), linfct=mcp(Genotype="Dunnett")))


File Link

Scientific software development best practices

23andMe - a story from relative finder

Sunday, May 10, 2015

Preordered Nonda Hub+, a USB-C Type hub for Macbook

The expected delivery time is July. Will write another test post then.

Bradley Cooper In American Sniper

I did not recognize that the charactor of Chris Kyle, in the movie American Sniper, was portrayed by Bradley Cooper. It was a real shock - thought the actor might be a military veteran.

Monday, May 4, 2015

On writting - Arthur Schopenhauer

He who writes carelessly makes first and foremost the confession that he himself does not place any great value on his thoughts. For the enthusiasm which inspires the unflagging endurance necessary for discovering the clearest, most forceful and most attractive form of expressing our thoughts is begotten only by the conviction of their weightiness and truth – just as we employ silver or golden caskets only for sacred things or priceless works of art.

Sunday, May 3, 2015

EndNote Shortcut for Mac when working in Word

A short summary of Endnote's shortcut for Mac in the CYCW mode.
The most frequently used ones were highlighted in orange color 

Control + 1: Switch between Word and EndNote
Control + 2: Insert Selected Citation(s)
Control + 3: Configure Bibliography
Control + 5: View Selected Citations
Control + 6: Edit and Manage Citations
Control + 7: Find and Insert Reference
Control + 8: Export Traveling Library
Control + 9: CWCY Preferences

論文のすすめ - 高山忠利 (日本大学医学部教授)


Saturday, May 2, 2015

Restaurants Around Riken

My personal recommendations for restaurants near RIKEN's Wako Campus.
Thanks to my boss's genius idea to hold the meeting on Saturday, I am constantly searching for the answer "where to eat near RIKEN without leaving Wakoshi?"
(Last updated 2015-05-10)

Eat alone 
  1. Gyoza No Ohsho (餃子の王将 和光店) : This is a popular Chinese restaurant usually with a long line, particularly in dinner time of the weekends. It is located at the 1F of the Itoyokado - the big supermarket near the station, next to the Lotteria.  (Map link)
  2. Kishin Ramen (樹真ラーメン): The most popular Ramen restaurant in Wakoshi, another restaurant with long line, near the Wako post office. My personal pick is 濃厚スープラーメン(Dense Soap Ramen)   [Tabelog link] [Map link]
  3. Sushiro (スシロー 和光白子店): Tasty, mostly 108 yen per dish. if you are a big fan of Sushi, can not miss this one. 5 minutes walk from the Ihouse H building. [Map link

Eat with 1-3 friends
  1. Restaurant listed above (Gyoza No Ohsho | Kishin Ramen | Sushiro) 
  2. Suehiro Yakiniku restaurant (スエヒロ館 和光本町店): Very reasonable price and also great taste. Enjoyed every time. 10 minutes walk from RIKEN. [restruant link] [map link]
  3. Ottori Yakitori Wako (おっ鳥家 和光店): Open only for dinner (17:00~24:00). Highly recommended. [restaurant link][map link]

Eat with a bunch of people 

Not recommended 
  1. Café Restaurant Gusto (Café レストラン ガスト) – the nearest restaurant and just outside RIKEN, a family restaurant [map link]
  2. Japanese Restaurant Tonden - overpriced, taste is just so so. [map link]

Restaurant you should avoid (for safety reason)
  1. Chicken Bastard (チキン野郎): I went to this restaurant for a farewell dinner with another 4 colleagues. 3 out of 5 got food poisoning the next day. The clinical diagnosis suggested the cause was Campylobacter bacteria. The restaurant is at the 2F of the Kishin Ramen, so be careful. 

Friday, May 1, 2015

ACMG guides on the interpretation of sequence variants

Link to the ACMG GuideLine (PDF)
The increased availability of sequence data in clinical settings has been accompanied by challenges in standardization, interpretation and reporting of genetic tests. Meeting the need for community guidelines, a new report from the American College of Medical Genetics and Genomics (ACMG) summarizes updated recommendations for the reporting and interpretation of sequence variants for Mendelian disorders in a clinical context.