Virtual Library

J. Kocher



Author of

  • P3.04 - Poster Session/ Biology, Pathology, and Molecular Testing (ID 235)

    • Event: WCLC 2015
    • Type: Poster
    • Track: Biology, Pathology, and Molecular Testing
    • Presentations: 1
    • P3.04-055 - Accurate Strategies to Detect Clinical Important Long Indels from RNA-Seq Data: EGFR as Example (ID 2506)

      09:30 - 09:30  |  Author(s): J. Kocher

      • Abstract

      Background:
      Somatic mutations drive tumor development and shape tumor characteristics, and they can be used for diagnosis and targeted therapy. These mutations are mostly detected from tumor DNA. Transcriptome profiling by RNA-seq, which captures the dynamics of gene activity, is increasingly popular because it measures not only gene expression but also structural variation such as alternative splicing, fusion products, and mutations. Making full use of this multi-level information will facilitate personalized medicine. Although single nucleotide variants (SNVs) can be identified from RNA-seq relatively easily, intermediate-length insertions/deletions (indels) pose significant bioinformatics challenges: RNA-seq data are much more complex as a result of splicing, most RNA-seq alignment programs do not align reads with gaps well, and variant callers designed for DNA-seq are not adequate for RNA-seq. As a result, most important indels go undetected.

      Methods:
      We evaluated the commonly used RNA-seq analysis programs TopHat, BWA, BWA-MEM, STAR, and GSNAP, along with the single-sample variant callers and paired tumor/normal somatic mutation callers GATK, VarScan, MuTect, JointSNVmix, and SomaticSniper, in a set of lung adenocarcinomas with single nucleotide and indel (15 to 19 bases) mutations known from exome-seq data. We aimed to develop highly sensitive and specific strategies for detecting both single nucleotide and longer indel mutations that matter for clinical decisions.
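      The comparison against known exome-seq mutations described above can be sketched as a set comparison. This is a minimal illustration, not the authors' evaluation code: the function name `evaluate_calls`, the `(chrom, pos, ref, alt)` tuple representation, and the toy coordinates are all assumptions made for the example. Because true negatives are not defined for a variant call set, the sketch reports precision as a stand-in for the specificity the abstract refers to.

```python
def evaluate_calls(called, truth):
    """Compare a call set with a truth set of (chrom, pos, ref, alt) tuples.

    Hypothetical helper for illustration only; variants must already be
    normalized to the same representation in both sets.
    """
    called, truth = set(called), set(truth)
    true_pos = called & truth
    return {
        "tp": len(true_pos),
        "fp": len(called - truth),   # called but not in the truth set
        "fn": len(truth - called),   # known mutations that were missed
        "sensitivity": len(true_pos) / len(truth) if truth else float("nan"),
        "precision": len(true_pos) / len(called) if called else float("nan"),
    }


# Toy data with made-up coordinates: one SNV and one 15-base deletion are known.
truth = {
    ("chr7", 100, "A", "T"),
    ("chr7", 200, "GGAATTAAGAGAAGCA", "G"),
}
# The caller finds the SNV, misses the deletion, and reports one extra call.
called = {
    ("chr7", 100, "A", "T"),
    ("chr7", 300, "C", "G"),
}

summary = evaluate_calls(called, truth)
```

      Exact tuple matching is the simplest possible criterion; long indels are often reported at slightly different positions by different callers, so a real comparison would likely need left-alignment or a position tolerance.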

      Results:
      Alignment is the critical step for identifying longer indels. The evaluated programs varied widely in their sensitivity for mapping sequence reads that contain indels, ranging from none at all (TopHat with either Bowtie 1 or Bowtie 2) to a decent number of mapped reads when the reads are long (GSNAP). Sensitivity was significantly affected by read length (50 bp vs 100 bp) and by whether gapped alignment was explicitly enabled. When sufficient indel-containing reads were aligned, most variant calling programs detected the indels with varying sensitivity; the exception was MuTect, which reported only single nucleotide mutations. Specificity depended heavily on the filtering criteria. We implemented and recommend different indel detection strategies depending on which alignment program is used. For TopHat alignments, unmapped reads were realigned with BWA-MEM; alignments from STAR or GSNAP were further processed following RNA-seq variant detection best practice. With these strategies, we demonstrated high accuracy in SNV and somatic mutation detection in RNA-seq data, compared with exome-seq data and with known mutations validated by other technologies, in lung adenocarcinoma datasets. With this information, a more comprehensive characterization of genomic aberrations can be made for each individual tumor to support clinical decision making.
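      Whether an aligner placed a read across a long indel can be read off the alignment's CIGAR string. The sketch below is an illustration under stated assumptions, not part of the authors' pipeline: the helper names `long_indels` and `supports_long_indel`, the 15-base default threshold (taken from the lower bound of the indel sizes in Methods), and the example CIGAR strings are all hypothetical. It relies only on the SAM convention that `I` and `D` mark insertions and deletions while `N` marks a skipped region such as an intron, which is why a spliced read must not be mistaken for a deletion.

```python
import re

# CIGAR operations from the SAM format: a length followed by an operation code.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def long_indels(cigar, min_len=15):
    """Return (operation, length) pairs for I/D operations of at least min_len bases."""
    return [
        (op, int(length))
        for length, op in CIGAR_TOKEN.findall(cigar)
        if op in "ID" and int(length) >= min_len
    ]


def supports_long_indel(cigar, min_len=15):
    """True if the alignment itself contains a long insertion or deletion."""
    return bool(long_indels(cigar, min_len))


# Hypothetical alignments of 100-base reads.
reads = {
    "read_a": "100M",             # ungapped match
    "read_b": "40M15D60M",        # 15-base deletion placed by the aligner
    "read_c": "30M2I20M500N48M",  # short insertion plus a splice junction (N)
    "read_d": "25M75S",           # soft-clipped: the gap was not aligned
}

supporting = sorted(name for name, cigar in reads.items() if supports_long_indel(cigar))
```

      A read like `read_d`, where most of the sequence is soft-clipped, is the kind of alignment that carries no usable indel evidence for a variant caller, which is consistent with the observation above that detection depends on the aligner producing gapped alignments.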

      Conclusion:
      With careful modification and customization of bioinformatics algorithms, RNA-seq data can be used reliably to detect both single nucleotide mutations and long indels, which in turn can inform treatment selection and outcome prediction.