Wednesday, October 21, 2015

Autism Genetics and Open Science

“MSSNG”, a project that aims to sequence the whole genomes of 10,000 autistic individuals and make the data publicly available, was launched by Autism Speaks in collaboration with Google. The name “MSSNG” is pronounced “missing”, with the vowels omitted. This deliberate omission reflects the missing puzzle pieces of autism.

The address of the MSSNG website is easy to remember. This project should have a major impact on genetic studies of ASD, given its unprecedented sample size and open-access policy. The large volume of sequencing data will be hosted on the Google Cloud Platform and can be analyzed with Google Genomics tools. MSSNG may lead to a major breakthrough in understanding the causes, diagnosis and treatment of autism.

Tuesday, October 20, 2015

2015 ASHG annual meeting in Baltimore - Memo and Thoughts

I had the great honor of receiving a travel grant awarded by the Japanese Association for Propagation of the Knowledge of Genetics. With this generous support, I was able to attend the American Society of Human Genetics (ASHG) 2015 annual meeting, held in Baltimore from Oct 6th to 10th, one of the largest and most important meetings for genetic research. The scientific program was well organized and covered a wide range of topics, from statistical and population genetics to fundamental research and clinical applications. The cutting-edge genome-editing tool CRISPR-Cas9 was also highlighted, with a special session dedicated to recent advances in this field. In recognition of the importance of this technology, the prestigious ASHG Gruber Genetics Prize was awarded to Prof. Emmanuelle Charpentier and Prof. Jennifer Doudna for their contributions to the discovery and application of the CRISPR-Cas9 system. In addition, with more than 6500 attendees, including researchers, clinicians and vendors, the meeting provided excellent opportunities to extend one's network and establish future collaborations.

I was inspired by a number of exciting studies and also benefited greatly from stimulating discussions with other attendees. For me, the most impressive thing I learned was the utility of high-throughput chromosome conformation capture (Hi-C) data. In the talk titled “The influence of structural variation on genomic integrity and gene regulation”, Dr. Malte Spielmann from the Max Planck Institute for Molecular Genetics presented his remarkable research on the organization of megabase-scale topologically associated domains (TADs) and the functional consequences when the integrity of this genome architecture is disrupted. The work, “Disruptions of Topological Chromatin Domains Cause Pathogenic Rewiring of Gene-Enhancer Interactions”, was published in the prestigious journal Cell. In this study, the researchers found that different forms of structural variation, known as copy number variations (CNVs), in the Epha4 locus lead to different types of limb malformations. With the aid of Hi-C data, the authors identified three TADs in this locus. They hypothesized that CNVs may disrupt local chromatin organization and change enhancer-promoter interactions, leading to abnormal expression of adjacent genes outside the original TAD, which supports the concept that enhancer adoption can be a pathogenic mechanism. Using CRISPR/Cas9 genome editing, they created mice carrying different chromosomal rearrangements found in human patients (the methodology was published in Cell Reports) and showed that if a CNV disrupts a CTCF-associated boundary element, a gene located in the neighboring TAD becomes misregulated by the distal enhancer, which leads to abnormal limb formation. This study demonstrated that the integrity of chromatin topology is essential for understanding the molecular mechanisms of pathogenesis, especially for large chromosomal variations.
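The core idea, that a CNV matters differently depending on whether it stays inside one TAD or spans a boundary, can be sketched in a few lines of Python. This is only an illustration of the concept, not the authors' method: the function name `disrupted_boundaries` and all coordinates are made up and do not correspond to the real Epha4 locus.

```python
from typing import List, Tuple

# A CNV is modelled as a half-open interval (start, end) on one chromosome.
Interval = Tuple[int, int]


def disrupted_boundaries(cnv: Interval, boundaries: List[int]) -> List[int]:
    """Return the TAD boundary positions that fall inside the CNV interval."""
    start, end = cnv
    return [b for b in boundaries if start <= b < end]


# Hypothetical coordinates, for illustration only.
tad_boundaries = [1_000_000, 2_500_000, 4_000_000]
intra_tad_cnv = (1_200_000, 1_800_000)          # stays within one TAD
boundary_spanning_cnv = (2_200_000, 2_900_000)  # removes or duplicates a boundary

print(disrupted_boundaries(intra_tad_cnv, tad_boundaries))          # []
print(disrupted_boundaries(boundary_spanning_cnv, tad_boundaries))  # [2500000]
```

In practice, boundary positions would come from Hi-C based TAD calls, and a boundary-spanning CNV would only be a candidate for enhancer adoption that still needs experimental follow-up.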
In another talk, by Rao et al., extremely high-resolution 3D maps of the human and mouse genomes were introduced, along with accompanying software for visualizing the large-scale Hi-C data.
In addition to the above, I found the talk “Epigenetic and transcriptional dysregulation of oxytocin receptor (Oxtr) in Tet1 methylcytosine dioxygenase deficient mouse brain” quite interesting. In this talk, Dr. Tower reported that Oxtr was among the most strongly down-regulated genes in the hippocampus of Tet1-/- mice. Tet1 is a gene with a pivotal role in DNA demethylation in mammals. They further demonstrated that the down-regulation of Oxtr in Tet1-/- mice was mediated by hypermethylation of the CpG island (CGI) located within Oxtr exon 3, rather than of the CGI in the promoter region. While CGI hypermethylation was not observed in ESCs, hypermethylation of Oxtr exon 3 was detected as early as E14.5. This suggests that TET1 is necessary for preventing hypermethylation of Oxtr within the first few days post conception in mice. Given the critical role of Oxtr in social and maternal behavior, they went on to perform behavioral tests and observed impaired maternal care in virgin Tet1-/- female mice, as evidenced by a longer latency to pup retrieval and less time spent huddling with the pups.

In the poster session, I visited basically all posters with the keywords “autism” or “CNV” in their abstracts. One of the interesting presentations was No. 3123F, “Whole-exome sequencing identifies a novel 2.5 kb duplication in INSR in a patient with Donohue syndrome”. Mutations in the insulin receptor gene (INSR) are known to cause Donohue syndrome, a rare disorder characterized by severe insulin resistance. However, for several Donohue syndrome patients from the same family, no mutation was found after standard Sanger sequencing of the whole INSR gene. To search for other pathogenic mutations, whole-exome sequencing (WES) was conducted, but still no plausible mutation was identified. At this point, the authors performed CNV calling and found a 2.5 kb micro-duplication spanning exons 10-11 of the causal gene INSR itself. Further analysis revealed that this duplication caused a frameshift in the coding sequence and resulted in a premature stop codon. To summarize, for WES it is advisable to search for potential CNVs when SNV analysis yields no promising results. In another presentation, No. 3138, titled “Comprehensive comparative performance analysis of high-resolution array platforms for genome-wide CNV detection in humans”, I was surprised to learn that the Affymetrix 6.0 chip outperforms CytoScan, a chip designed solely for CNV analysis. In poster No. 1755, Kaviar, a comprehensive public catalog of human variants and genotype frequencies, was demonstrated; it is accessible online. This tool combines 31 public data sources and 4622 private whole genome sequences. It integrates genome variation data from 77,238 unrelated individuals, including the 1000 Genomes Project's data, UK10K COHORT allele frequencies representing 3781 individuals, the 63,000 exomes of the Exome Aggregation Consortium (ExAC), and 808 whole genomes from the Alzheimer's Disease Neuroimaging Initiative (ADNI).
In short, it provides a one-stop query engine for looking up the allele frequency of a rare variant.
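The lesson from the INSR poster, looking for CNVs in exome data when SNV analysis comes up empty, can be illustrated with a minimal read-depth sketch. This is not the method used by the poster's authors: the function name `call_exon_cnvs`, the log2-ratio thresholds and the depth values are all assumptions chosen for illustration, and real callers add normalization, segmentation and statistical testing on top of this idea.

```python
import math
from typing import Dict


def call_exon_cnvs(
    sample_depth: Dict[str, float],
    reference_depth: Dict[str, float],
    dup_threshold: float = 0.4,   # assumed cutoff; a heterozygous duplication gives log2(3/2) ~ 0.58
    del_threshold: float = -0.6,  # assumed cutoff; a heterozygous deletion gives log2(1/2) = -1
) -> Dict[str, str]:
    """Flag exons whose normalized depth deviates from a reference panel.

    Both inputs map exon names to depth values that are assumed to be
    already normalized for library size.
    """
    calls = {}
    for exon, depth in sample_depth.items():
        ref = reference_depth[exon]
        if ref <= 0:
            continue  # no usable reference coverage for this exon
        log2_ratio = math.log2(depth / ref) if depth > 0 else float("-inf")
        if log2_ratio >= dup_threshold:
            calls[exon] = "duplication"
        elif log2_ratio <= del_threshold:
            calls[exon] = "deletion"
    return calls


# Made-up depths for four hypothetical exons; exons 10 and 11 are elevated.
sample = {"exon9": 100.0, "exon10": 152.0, "exon11": 148.0, "exon12": 98.0}
reference = {"exon9": 100.0, "exon10": 100.0, "exon11": 100.0, "exon12": 100.0}

print(call_exon_cnvs(sample, reference))
```

With these made-up numbers, only the two elevated exons are flagged as duplications, which mirrors the kind of signal a 2.5 kb duplication spanning two exons would leave in exome coverage.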

I also participated in a poster walk, “Genome Structure, Variation, and Function”, led by Prof. Manolis Dermitzakis. He discussed three selected posters and shared his insights into how genetic variants influence gene expression levels. I personally found No. 3173F intriguing. In this comprehensive study of gene regulatory variation, the authors evaluated the influence of variants on distal epigenetic modification, mRNA stability, transcription and translation rates, and ribosome occupancy. They found that as many as 30% of all QTLs that affect protein expression levels do not appear to affect chromatin-level traits. Instead, they tend to modulate gene expression levels directly by affecting splicing and/or RNA decay.

My personal reflection on this year’s ASHG is that the field is trending towards higher-resolution, higher-throughput data (Hi-C, ENCODE, and whole exome/genome data from thousands of samples), while genome-editing tools now make it possible to manipulate the genome at the cell and animal level. Many challenging biological hypotheses can now be tested with computational, statistical and experimental methods, which will in turn lead to a better understanding of the genetic mechanisms of biological processes such as development and aging, and of the pathogenic mechanisms of disease.