Coding: May 2015

Tuesday, May 26, 2015

How to Perform Basic RNA-seq Analysis with Cufflinks

Brief outline

This is a Perl wrapper for RNA-seq analysis to identify differentially expressed genes.
Basically I follow this Nature Protocol by Cufflinks' authors, but with a more updated workflow recommendatd for cufflinks 2.2 (in which the cuffquants is used to generate CXB file for final analysis).
The script can be download from this link as well.



#!/usr/bin/perl
use strict;
use 5.010;
use File::Basename qw/fileparse/;
use POSIX qw/strftime/;

# ----------------------------------------------------------------------
# Basic Information
# ----------------------------------------------------------------------
# For RNA-Seq Analysis of snoRNA-Knockdown experiment  
# @ RIKEN BSI, Mental Biology Team
# @ Author  : Psytky03
# @ Date : 2015-05-22
# @ Software Versions
# Tophat  : 2.014
# Bowtie 2 : 2.2.3
# cufflinks : 2.2.1
# samtools  : 0.1.19
# GFOLD  : 
# ----------------------------------------------------------------------


# ----------------------------------------------------------------------
# FASTQ samples - in fastq.gz format
# ----------------------------------------------------------------------
# GFP_B6_DIV4.fastq.gz
# GFP_ICR_DIV10.fastq.gz
# GFP_ICR_DIV4.fastq.gz
# MBII52_B6_DIV4.fastq.gz
# MBII52_ICR_DIV10.fastq.gz
# MBII52_ICR_DIV4.fastq.gz
# MBII85_B6_DIV4.fastq.gz
# MBII85_ICR_DIV10.fastq.gz
# MBII85_ICR_DIV4.fastq.gz
# ----------------------------------------------------------------------



# ----------------------------------------------------------------------
# Files and Parameters
# ----------------------------------------------------------------------
# my $simulate = 0; # run the program
  my $simulate = 1; # only show the simulated run and commands

  my $GTF_file     = "genes.gtf"; # UCSC GTF
  my $Genome_Fasta_Prefix  = "genome";  # mm10 
  my $Genome_Fasta_file  = "genome.fa"; # m10 whole genome fasta 
  
  my $assemblies_file  = "assemblies_2.txt"; # desired assemblies output file 

  my $tophat_threads = 11;
  my $cufflinks_thread  = 11;
  my $cuffmerge_thread  = 11;
# ----------------------------------------------------------------------




# ----------------------------------------------------------------------
# Run Tophat - Cufflinks 
# ----------------------------------------------------------------------
open(ASSEMBLIES, ">$assemblies_file");

my @gzfiles = <*.fastq.gz>;
my @accpeted_bam_files; 




foreach my $file (@gzfiles) 
{  
 my ($basename, $path, $suffix) = fileparse($file, qw/.fastq.gz/);
 my $current_time    = strftime('%Y-%m-%d %H:%M:%S',localtime);
 
 my $tophat_output_dir = $basename."_tophat_out";
 my $cufflinks_output_dir = $basename."_cufflinks_out";
 
 
 say "[$current_time]\tNow is processing $file"; 
 
 my $accepted_bam = "./$tophat_output_dir/accepted_hits.bam";
 push @accpeted_bam_files, $accepted_bam;
 
 my $tophat_command   = "tophat2 -p $tophat_threads -G $GTF_file -o $tophat_output_dir ".
          "$Genome_Fasta_Prefix $file";
        
 my $cufflinks_command  = "cufflinks -p $cufflinks_thread -o $cufflinks_output_dir ".
       "$tophat_output_dir/accepted_hits.bam";
 
 if ($simulate)
 { say($tophat_command);
  say($cufflinks_command);
 } else{
  system($tophat_command);
  system($cufflinks_command);
 }
 say ASSEMBLIES "./$cufflinks_output_dir/transcripts.gtf";
 sleep (1);
}

close ASSEMBLIES;
my $current_time    = strftime('%Y-%m-%d %H:%M:%S',localtime);
say "----------------------------------------------------------------------";
say "[$current_time] Finished Tophat - Cufflinks Process"     ;
say "----------------------------------------------------------------------";
# ----------------------------------------------------------------------



# ----------------------------------------------------------------------
# Run Cuffmerge to create a signle merged transcritome annotation
# ----------------------------------------------------------------------

my $cuffmerge_command = "cuffmerge -g $GTF_file -s $Genome_Fasta_file -p $cuffmerge_thread $assemblies_file";

if ($simulate){say($cuffmerge_command)} 
else{system($cuffmerge_command)}

my $current_time    = strftime('%Y-%m-%d %H:%M:%S',localtime);
say "----------------------------------------------------------------------";
say "[$current_time] Finished CuffMerge Process"       ;
say "----------------------------------------------------------------------";

# ----------------------------------------------------------------------




# ----------------------------------------------------------------------
# Run CuffQuant 
# ----------------------------------------------------------------------
my $merged_assemble_file = "./merged_asm/merged.gtf";
my $cuffquant_thread  = 11;

foreach my $bam_file (@accpeted_bam_files)
{
 my ($basename, $path, $suffix) = fileparse($bam_file, qw/.bam/);
 $path =~ s/_tophat_out\///g;
 my $cuffquant_output_dir = $path."_cuffquants_out";
 
 my $cuffquant_command = "cuffquant -p $cuffquant_thread -o $cuffquant_output_dir ".
       "$merged_assemble_file $bam_file";
 
 if ($simulate){say($cuffquant_command)} 
 else{system($cuffquant_command)}
 sleep (1);
}

my $current_time    = strftime('%Y-%m-%d %H:%M:%S',localtime);
say "----------------------------------------------------------------------";
say "[$current_time] Finished CuffQuant Process"  ;
say "----------------------------------------------------------------------";
# ----------------------------------------------------------------------

How to Add Prefix to all files in the folder

# Method 1
for f in * ; do mv "$f" "PRE_$f" ; done

# Method 2 
ls | xargs -I {} mv {} PRE_{}
ls | xargs -I {} mv {} {}_SUF

Thursday, May 21, 2015

How to do Dunnett Test in R


rm(list=ls())


install.packages("multcomp")
library("multcomp")

weight <- read.csv("~/Desktop/Pat-Mat-WT-Weight.csv", head=TRUE)
weight$Genotype <- as.factor(weight$Genotype)

low_fat <- subset(weight, Food=="CE-2")
high_fat <- subset(weight, Food =="D12492")

time_point <- c(4:17)

high_results <- lapply(time_point, function(x) {
 summary(glht(aov(high_fat[,x] ~ Genotype, high_fat), linfct=mcp(Genotype="Dunnett")))
})

sink(file="high_results.txt") 
high_results
sink(NULL) 


ls(summary(glht(aov(high_fat[,4] ~ Genotype, high_fat), linfct=mcp(Genotype="Dunnett"))))
summary(glht(aov(high_fat[,4] ~ Genotype, high_fat), linfct=mcp(Genotype="Dunnett")))$vcov


low_results <- lapply(time_point, function(x) {
  summary(glht(aov(low_fat[,x] ~ Genotype, low_fat), linfct=mcp(Genotype="Dunnett")))
})


sink(file="low_results.txt") 
low_results
sink(NULL) 

cwd()

File Link

Scientific software development best practices

Sunday, May 10, 2015

Preordered Nonda Hub+, a USB-C Type hub for Macbook

The expected delivery time is July. Will write another test post then.

Bradley Cooper In American Sniper

I did not recognize that the charactor of Chris Kyle, in the movie American Sniper, was portrayed by Bradley Cooper. It was a real shock - thought the actor might be a military veteran.

Tuesday, May 5, 2015

莎士比亚 Shakespeare Sonnet #64

Monday, May 4, 2015

On writting - Arthur Schopenhauer

He who writes carelessly makes first and foremost the confession that he himself does not place any great value on his thoughts. For the enthusiasm which inspires the unflagging endurance necessary for discovering the clearest, most forceful and most attractive form of expressing our thoughts is begotten only by the conviction of their weightiness and truth – just as we employ silver or golden caskets only for sacred things or priceless works of art.

Sunday, May 3, 2015

EndNote Shortcut for Mac when working in Word

A short summary of Endnote's shortcut for Mac in the CYCW mode.
The most frequently used ones were highlighted in orange color

Control + 1: Switch between Word and EndNote

Control + 2: Insert Selected Citation(s)

Control + 3: Configure Bibliography

Control + 5: View Selected Citations

Control + 6: Edit and Manage Citations

Control + 7: Find and Insert Reference

Control + 8: Export Traveling Library

Control + 9: CWCY Preferences

論文のすすめ - 高山忠利 (日本大学医学部教授)

　良く耳にする「論文などは二の次で臨床を一生懸命すべきだ」という意見は正論である。なぜなら、外科医にとって最大の命題は、手術を成功させ患者さんに社会復帰して頂く点にあるからだ。しかし、臨床と論文とは本当に二律背反の事象であろうか？私は、外科臨床を効率良く学習する最良の方法が論文を書く作業だと考えている。10年ほど前に、外科の知識・技術の獲得に最も効率の良い手段は論文作成の作業そのものに在ると気付いた。もちろん、試験勉強と同様に論文書きは孤独な作業である。しかし、効率を求めるには常に自己犠牲が必要となる。そこで、自分自身の論文制作に関する経験を示し、第3外科教室員の参考に供したい。

Restaurants Around Riken

My personal recommendations for restaurants near RIKEN's Wako Campus.
Thanks to my boss's genius idea to hold the meeting on Saturday, I am constantly searching for the answer "where to eat near RIKEN without leaving Wakoshi?"
(Last updated 2015-05-10)

Eat alone

Gyoza No Ohsho (餃子の王将和光店) : This is a popular Chinese restaurant usually with a long line, particularly in dinner time of the weekends. It is located at the 1F of the Itoyokado - the big supermarket near the station, next to the Lotteria. (Map link)
Kishin Ramen (樹真ラーメン): The most popular Ramen restaurant in Wakoshi, another restaurant with long line, near the Wako post office. My personal pick is 濃厚スープラーメン(Dense Soap Ramen) [Tabelog link] [Map link]
Sushiro (スシロー和光白子店): Tasty, mostly 108 yen per dish. if you are a big fan of Sushi, can not miss this one. 5 minutes walk from the Ihouse H building. [Map link]

Eat with 1-3 friends

Restaurant listed above (Gyoza No Ohsho | Kishin Ramen | Sushiro)
Suehiro Yakiniku restaurant (スエヒロ館和光本町店): Very reasonable price and also great taste. Enjoyed every time. 10 minutes walk from RIKEN. [restruant link] [map link]
Ottori Yakitori Wako (おっ鳥家和光店): Open only for dinner (17：00～24：00). Highly recommended. [restaurant link][map link]

Eat with a bunch of people

Not recommended

Café Restaurant Gusto (Café レストランガスト) – the nearest restaurant and just outside RIKEN, a family restaurant [map link]
Japanese Restaurant Tonden - overpriced, taste is just so so. [map link]

Restaurant you should avoid (for safety reason)

Chicken Bastard (チキン野郎): I went to this restaurant for a farewell dinner with another 4 colleagues. 3 out of 5 got food poisoning the next day. The clinical diagnosis suggested the cause was Campylobacter bacteria. The restaurant is at the 2F of the Kishin Ramen, so be careful.

Friday, May 1, 2015

ACMG guides on the interpretation of sequence variants

Nature Reviews Genetics 16, 256–257 (2015) doi:10.1038/nrg3940

Link to the ACMG GuideLine (PDF)
The increased availability of sequence data in clinical settings has been accompanied by challenges in standardization, interpretation and reporting of genetic tests. Meeting the need for community guidelines, a new report from the American College of Medical Genetics and Genomics (ACMG) summarizes updated recommendations for the reporting and interpretation of sequence variants for Mendelian disorders in a clinical context.

Coding

Pages