Monday, September 20, 2010

Export Plink Format File Under Affymetrix Genotyping Console

My colleague just received genotyping data typed on Affymetrix 6.0 chips, delivered to him as CEL files. After QC and genotype calling, he wants to analyze the data with the popular software Plink.

Affymetrix Genotyping Console (AGC) has an option to export to Plink format. However, it requires an ARR file, which specifies the sample attributes. The ARR file is produced by Affymetrix GeneChip Command Console (AGCC), and the process of making one is not that straightforward.

Here is an easy way to use the Affy 6.0 genotyping data in Plink without the ARR file.
Plink supports the transposed PED (tped) format as input, which is almost identical to the genotype format exported by AGC, so you only need to modify the file a little to meet this requirement. An example of the exported genotyping data file is below:

#GenomeWideSNP_6.na30.annot.db
#%genome-version-ucsc=hg18
#%genome-version-ncbi=36.1
Probe Set ID sample1 sample2
SNP_A-2131660 CC CC
SNP_A-2131666 CC CT

To make the tped file, the annotation lines (lines beginning with #) and the column header line need to be skipped,
and the chromosome number, genetic (Morgan) position and physical position need to be added to each SNP genotype line, so that each line starts with chromosome, SNP ID, genetic distance and physical position. All of this information is easy to get from the Affymetrix 6.0 annotation file.
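The conversion described above can be sketched in Python. This is only a minimal sketch under several assumptions: the AGC export is tab-delimited, the function name `export_to_tped` and the `annotation` dictionary (probe set ID mapped to chromosome and base-pair position, which you would fill from the annotation file) are hypothetical, the genetic distance is written as 0 (unknown) rather than looked up, and any call that is not two letters is treated as missing. The positions in the example are made-up placeholders, not real coordinates.

```python
def export_to_tped(export_lines, annotation):
    """Convert lines of an AGC genotype export into tped lines.

    annotation: dict mapping probe set ID -> (chromosome, base-pair position).
    This is a hypothetical structure; build it from the annotation file.
    """
    tped = []
    header_seen = False
    for line in export_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' annotation lines
        if not header_seen:
            header_seen = True  # first remaining line holds the sample names
            continue
        fields = line.split("\t")  # assumption: tab-delimited export
        probe_id, calls = fields[0], fields[1:]
        if probe_id not in annotation:
            continue  # no position known for this SNP, so leave it out
        chrom, pos = annotation[probe_id]
        alleles = []
        for call in calls:
            if len(call) == 2 and call.isalpha():
                alleles.extend([call[0], call[1]])  # "CT" becomes "C", "T"
            else:
                alleles.extend(["0", "0"])  # assumption: anything else is a no-call
        # tped column order: chromosome, SNP ID, genetic distance, position
        tped.append("\t".join([str(chrom), probe_id, "0", str(pos)] + alleles))
    return tped


example = [
    "#GenomeWideSNP_6.na30.annot.db",
    "Probe Set ID\tsample1\tsample2",
    "SNP_A-2131660\tCC\tCC",
    "SNP_A-2131666\tCC\tCT",
]
# Placeholder positions for illustration only.
positions = {"SNP_A-2131660": ("1", 1000), "SNP_A-2131666": ("1", 2000)}
for row in export_to_tped(example, positions):
    print(row)
```

For a full chip you would read the export file line by line and write the returned rows to the tped file instead of printing them.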

A tfam file, which contains the individual and family information, should be made afterwards.
When these two files are ready, it is simple to use Plink to convert them into bed format:
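A tfam file has one line per individual with six columns: family ID, individual ID, paternal ID, maternal ID, sex and phenotype. Here is a minimal Python sketch for producing those lines, assuming the samples are unrelated so the sample name can be reused as the family ID, and with sex and phenotype left as unknown (0 and -9). The function name `make_tfam_lines` is hypothetical, and the lines must be in the same order as the sample columns in the tped file.

```python
def make_tfam_lines(sample_ids):
    """Build one tfam line per sample, in the same order as the tped columns."""
    lines = []
    for sample_id in sample_ids:
        lines.append("\t".join([
            sample_id,  # family ID (assumption: unrelated samples, so reuse the name)
            sample_id,  # individual ID
            "0",        # paternal ID unknown
            "0",        # maternal ID unknown
            "0",        # sex unknown (1 = male, 2 = female)
            "-9",       # phenotype missing
        ]))
    return lines


for tfam_line in make_tfam_lines(["sample1", "sample2"]):
    print(tfam_line)
```

If you know the sex or phenotype of the samples, replace the placeholder values before running Plink.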

plink --tped tpedfile --tfam famfile --recode --make-bed --out newbedfilename
Update: I forgot to mention that each genotype in the tped file should look like "C C" rather than "CC".
There should be a tab between the two alleles, otherwise you will get an error message.
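One way to catch unsplit genotypes before running Plink is to count the fields on each tped line: with N samples, every line should have 4 + 2N whitespace-separated fields. The sketch below is only an illustration. The function name `find_bad_tped_lines` is hypothetical and the positions in the example are made-up placeholders.

```python
def find_bad_tped_lines(tped_lines, n_samples):
    """Return the 1-based numbers of tped lines with the wrong field count."""
    expected = 4 + 2 * n_samples  # 4 map columns plus two alleles per sample
    bad = []
    for number, line in enumerate(tped_lines, start=1):
        if len(line.split()) != expected:
            bad.append(number)
    return bad


checked = find_bad_tped_lines(
    [
        "1\tSNP_A-2131660\t0\t1000\tC\tC\tC\tC",  # alleles split correctly
        "1\tSNP_A-2131666\t0\t2000\tCC\tCT",      # genotypes left as "CC" and "CT"
    ],
    n_samples=2,
)
print(checked)
```

The number of samples is simply the number of lines in the tfam file.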