Comprehensive Resources for Cancer NGS Data Analysis

Tools in Cancer Genomics

Alignment

  1. BFAST : An alignment tool for fast and accurate mapping of short reads to reference sequences.

  2. BWA : A software package for mapping low-divergent sequences against a large reference genome, such as the human genome.

  3. Bowtie : An ultrafast, memory-efficient short read aligner.

  4. Novoalign/NovoalignCS : An aligner for single-ended and paired-end reads from the Illumina Genome Analyzer.

  5. MAQ : Mapping and Assembly with Quality, a tool that builds assemblies by mapping short reads to reference sequences.

  6. SHRiMP : A software package for aligning genomic reads against a target genome.

  7. SOAP2 : (Short Oligonucleotide Analysis Package), a program for fast and efficient alignment of short oligonucleotides onto reference sequences.

  8. SSAHA2 : (Sequence Search and Alignment by Hashing Algorithm), a pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences.

  9. GASSST : A Global Alignment Short Sequence Search Tool.

  10. PASS : A program to align short sequences.

  11. MicroRazerS : A tool for rapid alignment of small RNA reads.

  12. SeqMap : A tool for mapping large numbers of oligonucleotides to the genome.

  13. PerM : A tool for efficient mapping of short sequences using periodic, fully sensitive spaced seeds.
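
Several of the aligners above describe hash-based or seed-based mapping. As a rough illustration of that general idea only (not the algorithm of any listed tool), the sketch below indexes every k-mer of a reference and lets the k-mers of a read vote for a start position. The function names, the toy reference, and the choice of k=4 are assumptions made for this example.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def candidate_positions(read, index, k):
    """Let each k-mer hit vote for the reference start position it implies."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            votes[ref_pos - offset] += 1
    # Most votes first; ties broken by position so the order is deterministic.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy data (hypothetical): the read is an exact substring of the reference.
reference = "ACGTTGCAAGGCTTACGGATCCA"
index = build_kmer_index(reference, k=4)
hits = candidate_positions("GGCTTACG", index, k=4)
best_start, best_votes = hits[0]
print(best_start, best_votes)
```

A real aligner would follow the seeding step with an extension or verification step that tolerates mismatches and gaps; this sketch stops at candidate selection.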

GENOMIC VARIATION DISCOVERY

Mutation Calling

  1. GAMES : Identifies and annotates mutations in next-generation sequencing projects.

  2. CoNAn-SNV : A probabilistic framework for the discovery of single nucleotide variants in whole-genome shotgun sequencing (WGSS) data.

  3. LoFreq : A fast and sensitive variant-caller for inferring single-nucleotide variants (SNVs) from high-throughput sequencing data.

  4. GATK: A multiple-sample, technology-aware SNV and indel caller.

  5. JointSNVMix: A probabilistic model for detection of somatic mutations in normal/tumour pairs.

  6. SAMtools: A set of utilities for post-processing alignments in the SAM format, including indexing, variant calling, and alignment viewing.

  7. SNVMix: A tool for SNV calling based on a probabilistic binomial mixture model.

  8. SOAPsnp: A tool for identifying SNVs, developed by the Beijing Genomics Institute (BGI).

  9. Strelka: A tool for somatic small-variant calling from sequenced tumor-normal sample pairs.

  10. SomaticSniper: A program to identify SNVs that differ between a tumor and a normal sample.

  11. VarScan: A platform-independent, technology-independent software tool for identifying SNVs, indels, and CNVs in massively parallel sequencing of individual and pooled samples.
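
To make the phrase "binomial mixture model" in the list above more concrete, here is a minimal sketch of a three-genotype binomial mixture applied to one site. It is not the SNVMix implementation: the genotype labels, the fixed per-genotype reference-allele probabilities `mu`, and the fixed priors `prior` are illustrative assumptions, whereas a real caller would typically fit such parameters to the data.

```python
from math import comb

GENOTYPES = ("aa", "ab", "bb")  # homozygous reference, heterozygous, homozygous variant

def genotype_posteriors(ref_count, depth,
                        mu=(0.99, 0.5, 0.01),       # assumed P(reference read | genotype)
                        prior=(0.98, 0.01, 0.01)):  # assumed prior weight of each genotype
    """Posterior probability of each genotype given the reference-read count at a site."""
    weighted = []
    for m, p in zip(mu, prior):
        # Binomial likelihood of seeing ref_count reference reads out of depth reads.
        likelihood = comb(depth, ref_count) * m ** ref_count * (1 - m) ** (depth - ref_count)
        weighted.append(p * likelihood)
    total = sum(weighted)
    return {g: w / total for g, w in zip(GENOTYPES, weighted)}

het_site = genotype_posteriors(ref_count=10, depth=20)
ref_site = genotype_posteriors(ref_count=20, depth=20)
print(max(het_site, key=het_site.get), max(ref_site, key=ref_site.get))
```

A site would be reported as a candidate SNV when the combined posterior of the non-reference genotypes (ab and bb) exceeds a chosen threshold.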

Indel

  1. Dindel: A program for calling small indels from short-read sequence data from the Illumina platform.

  2. Pindel: A tool for identifying indels and structural variants at single-base resolution from next-generation sequence data.

  3. SplazerS: A tool for detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing.

  4. MoDIL : A tool for detecting indel variation with clone-end sequencing.

  5. PyroHMMvar : A program to call short indels and SNPs for Ion Torrent and 454 data.

Structural Variation

  1. svseq2 : An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data.

  2. BreakDancer: A tool for detecting five types of SVs (insertions, deletions, inversions, inter- and intra-chromosomal translocations) from next-generation paired-end sequencing reads.

  3. CREST: A tool that uses soft-clipped reads to directly map the breakpoints of SVs.

  4. GASV: A tool for identifying and comparing structural variants by computing intersections of breakpoint regions.

  5. HYDRA: A tool for detecting structural variants in both unique and duplicated genomic regions.

  6. PEMer: A software package for detecting SVs from paired-end reads.

  7. R453Plus1Toolbox: An R/Bioconductor package for the analysis of Roche 454 sequencing data.

  8. SVMerge: A tool for SV analysis that integrates calls from several existing SV callers.

  9. SVDetect: A tool for identifying structural variations from paired-end/mate pair data.

  10. VariationHunter: A tool for identifying structural variations from paired-end WGS data.

  11. destruct: A software tool for identifying structural variation in tumour genomes from whole-genome Illumina sequencing.
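
One of the entries above mentions soft-clipped reads as evidence for breakpoints. The sketch below shows, under simplifying assumptions, how the reference coordinate at the edge of a soft clip can be derived from a SAM record's POS and CIGAR fields. It is not how CREST or any other listed tool is implemented; the function name and the example CIGAR strings are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clip_breakpoints(pos, cigar):
    """Return (side, reference_position) pairs where a soft clip meets aligned bases.

    pos is the 1-based leftmost aligned reference position (the SAM POS field).
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    core = [item for item in ops if item[1] != "H"]  # hard clips carry no sequence
    breakpoints = []
    if core and core[0][1] == "S":
        # Clipped bases sit before POS, so the first aligned base marks the edge.
        breakpoints.append(("left", pos))
    # Only these operations consume reference positions.
    ref_len = sum(length for length, op in core if op in "MDN=X")
    if len(core) > 1 and core[-1][1] == "S":
        breakpoints.append(("right", pos + ref_len - 1))
    return breakpoints

print(soft_clip_breakpoints(100, "20S80M"))
print(soft_clip_breakpoints(100, "10S50M5D30M10S"))
```

Positions supported by soft clips from many independent reads are the candidates worth following up; a single clipped read is weak evidence on its own.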

Copy Number Variation

  1. CMDS: A population-based method for analysis of recurrent CNVs across multiple samples.

  2. CNAseg: A tool for identifying CNVs in cancer from NGS data.

  3. cnvHMM: A tool for CNV analysis using a hidden Markov model.

  4. CNVnator: A tool for CNV discovery and genotyping from depth of read mapping.

  5. FREEC: A tool for control-free CNV detection using deep-sequencing data.

  6. RDXplorer: A tool for CNV detection in whole human genome sequence data using read depth coverage.

  7. SegSeq: A tool for detecting CNVs from short sequence reads.

  8. VarScan: A platform-independent, technology-independent software tool for identifying SNVs, indels, and CNVs in massively parallel sequencing of individual and pooled samples.

  9. GENSENG : A tool for detecting CNVs from NGS data.

  10. CNV-seq : A method for detecting DNA copy number variation (CNV) using high-throughput sequencing.

  11. mrCaNaVaR : A copy number caller that analyzes read depth from whole-genome next-generation sequence mappings to discover large segmental duplications and deletions.
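
Many of the tools above infer copy number from read depth. As a minimal sketch of that idea, and not the method of any specific tool in the list, the code below compares per-window read counts from a tumor sample with those from a matched normal, normalizes for library size, and labels windows by a log2-ratio threshold. The window counts, the pseudocount, and the 0.4 threshold are assumptions chosen for illustration.

```python
from math import log2

def depth_log_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Per-window log2 ratio of tumor to normal read depth, scaled by library size."""
    tumor_total = sum(tumor_counts)
    normal_total = sum(normal_counts)
    ratios = []
    for t, n in zip(tumor_counts, normal_counts):
        # The pseudocount avoids log(0) and division by zero in empty windows.
        t_frac = (t + pseudocount) / tumor_total
        n_frac = (n + pseudocount) / normal_total
        ratios.append(log2(t_frac / n_frac))
    return ratios

def call_windows(ratios, threshold=0.4):
    """Label each window as gain, loss, or neutral using a fixed threshold."""
    return ["gain" if r > threshold else "loss" if r < -threshold else "neutral"
            for r in ratios]

# Hypothetical read counts in four equal-sized genomic windows.
normal = [100, 100, 100, 100]
tumor = [100, 200, 50, 100]
ratios = depth_log_ratios(tumor, normal)
calls = call_windows(ratios)
print(calls)
```

Production callers generally add GC-content and mappability correction and segment adjacent windows jointly, rather than thresholding each window independently as done here.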

Mutation Effects

  1. ANNOVAR: An efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes.

  2. PolyPhen-2: (Polymorphism Phenotyping v2) A tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.

  3. CHASM: (Cancer-specific High-throughput Annotation of Somatic Mutations) A method that predicts the functional significance of somatic missense mutations observed in the genomes of cancer cells.

  4. SIFT: A tool that predicts whether an amino acid substitution affects protein function.

Assembly

  1. ALLPATHS-LG : High quality genome assembly from low cost data.

  2. Celera Assembler : A de novo whole-genome shotgun (WGS) DNA sequence assembler.

  3. Geneious : Software for analyzing both high-throughput and Sanger sequencing data.

  4. LOCAS : A tool for assembling short reads from next-generation sequencing technologies at low coverage.

  5. Contrail : A Hadoop-based genome assembler for assembling large genomes in the cloud.

  6. MIRA : A whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent, and PacBio data (the latter currently only for CCS and error-corrected CLR reads).

  7. Velvet : A de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

  8. CongrPE : A de novo assembly algorithm for next-generation sequencing data.

  9. ZORRO : A hybrid sequencing technology assembler.

  10. ABySS : Assembly By Short Sequences, a de novo, parallel, paired-end sequence assembler.

Annotation

  1. wANNOVAR : Annotating genetic variants for personal genomes via the web.

  2. ANNOVAR : An efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, as well as mouse, worm, fly, yeast and many others).

  3. SVA : (Sequence Variant Analyzer) Software to annotate, visualize, and analyze the genetic variants identified through NGS.

  4. WebApollo : Browser-based tool for visualization and editing of sequence annotations.

  5. Chaos : A Perl-based system for annotation of variants identified in high-throughput sequencing experiments developed at the Wellcome Trust Centre for Human Genetics (WTCHG).