Sunday, October 31, 2010

Amazon AWS 1 year free tier usage

Amazon just announced a one-year free usage tier for its popular cloud service, Amazon Web Services (AWS). The free package includes 1 micro instance, 10 GB of EBS storage, 5 GB of S3 storage and 30 GB of data transfer per month.

This free tier will be more than enough to host a website for the lab or set up a wiki or blog.
But the more interesting point is that you can try different things without any cost of failure: there are already plenty of system images (AMIs based on Linux or Windows) that can simply be selected and run.

There are also AMIs for bioinformatics that include pre-installed software and libraries for the biological sciences. The popular Galaxy system also has an AMI for those who want to run the service on their own, and the Galaxy team has written a good tutorial for installing it on AWS.


It was mentioned that this free offer might be stopped at any time, so register early.


Monday, September 20, 2010

Export Plink Format File Under Affymetrix Genotyping Console

My colleague just got genotyping data typed on Affymetrix 6.0 chips, delivered to him as CEL files. After QC and genotype calling, he wanted to analyze the data using the popular software Plink.

Affymetrix Genotyping Console (AGC) has an option for exporting to Plink format; however, it requires an ARR file, which specifies the sample attributes. The ARR file is produced by the Affymetrix GeneChip Command Console (AGCC), and the process of making it is not that straightforward.

Here is an easy way to use the Affy 6.0 genotyping data in Plink without the ARR file.
Plink supports the transposed PED (tped) format as input, which is almost identical to the genotype format exported by AGC, so you only need to modify the file a little to fit this requirement. An example of the exported genotyping data file is shown below:

#GenomeWideSNP_6.na30.annot.db
#%genome-version-ucsc=hg18
#%genome-version-ncbi=36.1
Probe Set ID sample1 sample2
SNP_A-2131660 CC CC
SNP_A-2131666 CC CT

To make the tped file, the annotation lines (lines beginning with #) and the column header line need to be skipped,
and the chromosome number, genetic (morgan) position and physical position need to be added to each SNP genotype line, so that each line starts with chromosome, SNP ID, genetic position and physical position. All this information can easily be found in the Affymetrix 6.0 annotation file.
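
The conversion described above could be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions that are not part of the original workflow: the AGC export is tab-delimited, the SNP positions have already been loaded from the annotation file into a dictionary (the `annotation` argument and the function name `agc_to_tped` are hypothetical), the genetic distance is unknown and written as 0, and any call that is not two letters is treated as missing ("0 0").

```python
def agc_to_tped(export_lines, annotation):
    """Yield tab-delimited TPED lines from AGC genotype export lines.

    export_lines: iterable of lines from the exported genotype file
                  (assumed tab-delimited).
    annotation:   dict mapping probe set ID -> (chromosome, physical position),
                  assumed to be built beforehand from the annotation file.
    """
    for line in export_lines:
        line = line.rstrip("\r\n")
        # Skip blank lines, '#' annotation lines and the column header row.
        if not line or line.startswith("#") or line.startswith("Probe Set ID"):
            continue
        snp_id, *calls = line.split("\t")
        if snp_id not in annotation:
            continue  # no known position for this SNP, so leave it out
        chrom, pos = annotation[snp_id]
        # TPED columns: chromosome, SNP ID, genetic distance, physical position.
        fields = [str(chrom), snp_id, "0", str(pos)]
        for call in calls:
            call = call.strip()
            if len(call) == 2 and call.isalpha():
                fields.extend([call[0], call[1]])  # "CT" -> "C", "T"
            else:
                fields.extend(["0", "0"])  # assumed: anything else is a missing call
        yield "\t".join(fields)
```

Writing the result out is then a matter of looping over `agc_to_tped(open(exported_file), annotation)` and writing each line, followed by a newline, to the tped file.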

A tfam file, which contains the individual and family information, should be made afterwards.
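
If no pedigree or phenotype information is at hand, a placeholder tfam file can be derived from the header row of the exported file. The sketch below is only one possible way to do it and rests on assumptions of mine: the header row is tab-delimited, each sample is used as its own family ID, parents and sex are unknown (0), and the phenotype is missing (-9). The function name `make_tfam_lines` is hypothetical. The rows must be in the same order as the genotype columns of the tped file.

```python
def make_tfam_lines(header_line):
    """Build placeholder TFAM lines from the AGC export header row.

    header_line: the 'Probe Set ID<TAB>sample1<TAB>sample2...' row
                 (assumed tab-delimited).
    """
    # Everything after the first column is a sample name.
    samples = header_line.rstrip("\r\n").split("\t")[1:]
    # TFAM columns: family ID, individual ID, father, mother, sex, phenotype.
    # 0 = unknown parent / unknown sex, -9 = missing phenotype.
    return ["\t".join([s, s, "0", "0", "0", "-9"]) for s in samples]
```

Real family, sex and phenotype values should replace the placeholders before running any analysis that depends on them.
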
When these two files are ready, it is simple to use Plink to convert them into bed format:

plink --tped tpedfile --tfam famfile --recode --make-bed --out newbedfilename
Update: I forgot to mention that each genotype in the tped file should look like "C C" rather than "CC".
There should be a tab between the two alleles, otherwise you will get an error message.