Illumina Next-Generation Sequencing
Next Generation Sequencing and Data Analysis
Next Generation Sequencing
The ATGC provides a complete next generation sequencing service. Investigators provide the facility with genomic DNA, total RNA or ChIP DNA (depending on the requested application) and the facility provides complete sample processing. The output of the NGS service is FASTQ files. Data analysis is available for full service (library preparation and sequencing) submissions.
NGS services include:
1. Project consultation and budget planning with a facility representative and an MDACC faculty bioinformatician.
2. Library Preparation: An NGS library is a collection of random fragments that together represent the entire sample. It is created by shearing DNA into 150–400 bp fragments, which are then ligated to specific adapters. Library fragments of the appropriate size (application dependent) are selected and isolated. After a cleanup step, the resulting library is quantified by qPCR and checked for size distribution on the Agilent TapeStation. The ATGC has automated library preparation for most applications using Eppendorf epMotion 5075 and Agilent Bravo liquid handlers.
3. Cluster Generation: Library fragments are bound to a flow cell by hybridizing them to a lawn of oligonucleotides complementary to the adapter sequences. Bound fragments are clonally amplified by bridge amplification, creating millions of dense clonal clusters. Cluster generation occurs on-instrument, in a closed environment, on the NovaSeq 6000, NextSeq 500 and iSeq 100 instruments.
4. Illumina Sequencing: Sequencing on the flow cell uses Illumina's well-established sequencing-by-synthesis (SBS) chemistry. The chemistry uses four reversible terminator nucleotides, each carrying a chemically blocked 3' hydroxyl group; the NovaSeq 6000 and NextSeq 500 detect them with two-channel (two-dye) imaging, while the MiSeq uses four-channel (four-dye) imaging. To begin sequencing, primers are hybridized to single-stranded, covalently bound templates on the flow cell. Fluorescently labeled nucleotides are then flowed across the flow cell, where they compete for incorporation into the growing DNA chains. A single complementary nucleotide is incorporated into each strand, blocking further extension and resulting in the simultaneous one-base extension of millions of DNA clusters. The incorporated nucleotides are excited by a laser and emit their characteristic fluorescence (or lack of fluorescence), which is detected and recorded in an imaging step. After base detection, the fluorescent dye is cleaved and the 3' hydroxyl block is chemically removed, allowing chain extension to continue. This cycle is repeated 50 to 500 times, generating a series of images.
5. Data Analysis: Raw image data are converted to base calls before sequence analysis begins. The resulting sequences are demultiplexed and transferred to an institutional server, where MDACC bioinformaticians access the data. Data analysis is performed in collaboration with faculty from the Department of Bioinformatics.
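On the NovaSeq 6000 and NextSeq 500, two-channel SBS chemistry distinguishes four bases with only two dyes, which is why one base shows up as a "lack of fluorescence" during imaging. The sketch below turns normalized red/green intensities into base calls. It assumes the commonly described Illumina two-channel labeling (C red only, T green only, A both channels, G unlabeled) and a single fixed threshold; the intensity values and the `call_base`/`call_read` helpers are hypothetical, and real base-calling software also corrects for phasing, cross-talk and noise.

```python
from __future__ import annotations

# Sketch of two-channel base calling for a single cluster.
# Assumed labeling (Illumina two-channel SBS): C = red only, T = green only,
# A = red and green, G = no dye (dark).
THRESHOLD = 0.5  # hypothetical fixed cut-off on normalized intensities


def call_base(red: float, green: float, threshold: float = THRESHOLD) -> str:
    """Return the base implied by which channels are 'on' in one cycle."""
    red_on = red >= threshold
    green_on = green >= threshold
    if red_on and green_on:
        return "A"
    if red_on:
        return "C"
    if green_on:
        return "T"
    return "G"  # neither channel lit: the unlabeled base


def call_read(cycles: list[tuple[float, float]]) -> str:
    """Call one base per imaging cycle from (red, green) intensity pairs."""
    return "".join(call_base(red, green) for red, green in cycles)


if __name__ == "__main__":
    # Four hypothetical cycles for one cluster
    print(call_read([(0.9, 0.8), (0.9, 0.1), (0.05, 0.7), (0.1, 0.1)]))  # ACTG
```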
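Library quantification in step 2 ultimately serves to determine molar concentration, which pooling and flow cell loading are based on. As a minimal sketch, assuming a double-stranded library with an average mass of about 660 g/mol per base pair, a mass concentration in ng/µL can be converted to nM using the average fragment size (for example, from a TapeStation trace). The function name and example values are hypothetical; qPCR against standards measures molarity directly, so this conversion is a cross-check rather than the facility's method.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one double-stranded base pair


def ng_per_ul_to_nm(conc_ng_per_ul: float, avg_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/uL to nM.

    nM = (ng/uL * 1e6) / (660 g/mol per bp * average fragment length in bp)
    """
    if avg_fragment_bp <= 0:
        raise ValueError("average fragment size must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * avg_fragment_bp)


if __name__ == "__main__":
    # Hypothetical library: 2 ng/uL with a 400 bp average fragment size
    print(round(ng_per_ul_to_nm(2.0, 400), 2))  # 7.58
```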
Instrumentation
NovaSeq 6000
Flow Cell Type and Cycles | Number of Lanes | Run Options | Approx. PF Clusters (M) | Estimated Output (Gb)* | MDACC Price** |
---|---|---|---|---|---|
S-Prime-500 | 1 | 2X250 | 650-800 | 325-400 | $6,606 |
S-Prime-500 Xp | 2 | 2X250 | 325-400 | 162-200 | $3,505 |
S-Prime-300 | 1 | 2X150 | 650-800 | 200-250 | $5,178 |
S-Prime-300 Xp | 2 | 2X150 | 325-400 | 100-125 | $2,800 |
S-Prime-200 | 1 | 2X100 | 650-800 | 134-167 | $4,634 |
S-Prime-200 Xp | 2 | 2X100 | 325-400 | 67-83 | $2,517 |
S-Prime-100 | 1 | 1X100, 2X50, 26X91X8 | 650-800 | 65-80 | $3,355 |
S-Prime-100 Xp | 2 | 1X100, 2X50, 26X91X8 | 325-400 | 32-40 | $1,888 |
S1-300 | 1 | 2X150 | 1300-1600 | 400-500 | $8,208 |
S1-300 Xp | 2 | 2X150 | 650-800 | 200-250 | $4,314 |
S1-200 | 1 | 2X100 | 1300-1600 | 266-333 | $7,566 |
S1-200 Xp | 2 | 2X100 | 650-800 | 133-166 | $3,993 |
S1-100 | 1 | 1X100, 2X50, 26X91 or 28X91 | 1300-1600 | 134-167 | $5,965 |
S1-100 Xp | 2 | 1X100, 2X50, 26X91 or 28X91 | 650-800 | 67-83 | $3,193 |
S2-300 | 1 | 2X150 | 3300-4100 | 1000-1250 | $14,038 |
S2-300 Xp | 2 | 2X150 | 1650-2050 | 500-625 | $7,227 |
S2-200 | 1 | 2X100 | 3300-4100 | 667-833 | $12,850 |
S2-200 Xp | 2 | 2X100 | 1650-2050 | 333-416 | $6,698 |
S2-100 | 1 | 1X100, 2X50, 26X91 or 28X91 | 3300-4100 | 333-417 | $10,422 |
S2-100 Xp | 2 | 1X100, 2X50, 26X91 or 28X91 | 1650-2050 | 166-208 | $5,421 |
S4-300 | 1 | 2X150 | 8000-10000 | 2400-3000 | $20,360 |
S4-300 Xp | 4 | 2X150 | 2000-2500 | 600-750 | $5,167 |
S4-200 | 1 | 2X100 | 8000-10000 | 1600-2000 | $17,554 |
S4-200 Xp | 4 | 2X100 | 2000-2500 | 400-500 | $4,465 |
*The ATGC does not guarantee output for investigator-prepared libraries.
Custom run formats are available for full flow cell submissions.
**Please see the Price List for current External or GCC pricing.
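The Estimated Output column follows from the cluster counts: each passing-filter cluster yields one read per sequenced direction, so output is roughly clusters × number of reads × read length. A minimal sketch, assuming run options written in the table's uniform `1XL`/`2XL` form (for example `2X150`) and ignoring index reads; the helper names are hypothetical, and asymmetric formats such as `26X91` are deliberately rejected rather than misread.

```python
from __future__ import annotations


def parse_run_option(option: str) -> tuple[int, int]:
    """Split a uniform run option such as '2X150' into (reads per cluster, read length)."""
    parts = option.upper().split("X")
    # Only single-read (1X) or paired-end (2X) options are interpreted here.
    if len(parts) != 2 or parts[0] not in ("1", "2"):
        raise ValueError(f"unsupported run option: {option!r}")
    return int(parts[0]), int(parts[1])


def estimated_output_gb(clusters_millions: float, option: str) -> float:
    """Approximate output in Gb = clusters * reads per cluster * read length."""
    n_reads, read_length = parse_run_option(option)
    bases_per_cluster = n_reads * read_length  # index reads excluded
    return clusters_millions * 1e6 * bases_per_cluster / 1e9


if __name__ == "__main__":
    # Upper end of the S4-300 row: 10,000 M clusters at 2X150
    print(estimated_output_gb(10_000, "2X150"))  # 3000.0
    # Upper end of the S-Prime-500 row: 800 M clusters at 2X250
    print(estimated_output_gb(800, "2X250"))  # 400.0
```

Both results match the upper bounds in the table, which is a quick way to sanity-check a custom run format before requesting it.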
NextSeq500
The Illumina NextSeq 500 System can sequence a 30X human genome in a single run. Two flow cell formats and multiple reagent configurations provide up to 120 Gb of data per run, giving flexibility across a broad range of applications. Its simple workflow and short run times enable fast sequencing of exomes, transcriptomes and whole genomes. The NextSeq 500 generates up to 400 million clusters passing filter (up to 120 Gb) in the High Output configuration and up to 130 million clusters passing filter (up to 40 Gb) in the Mid Output configuration.
NextSeq500 Run Type | Estimated Output per Run | Approx. SE Reads per Run | MD Anderson Price* |
---|---|---|---|
High Output | | | |
75SR | 25-30 Gb | 400 million | $2,025 |
75PE | 50-60 Gb | 400 million | $3,647 |
150PE | 100-120 Gb | 400 million | $5,655 |
Mid Output | | | |
75PE | 16-19 Gb | 130 million | $1,587 |
150PE | 35-39 Gb | 130 million | $2,325 |
*Please see the Price List for current External or GCC pricing.
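When several libraries are pooled on one NextSeq run, the reads-per-run column sets how many samples fit at a given depth. A minimal sketch, assuming reads are split evenly across samples; the 25 M reads-per-sample target and the optional headroom factor are hypothetical planning inputs, not facility recommendations.

```python
import math


def max_samples_per_run(reads_per_run_millions: float,
                        target_reads_per_sample_millions: float,
                        usable_fraction: float = 1.0) -> int:
    """Largest number of evenly pooled samples that each reach the target depth.

    usable_fraction < 1.0 leaves headroom for uneven pooling or lower yield.
    """
    if target_reads_per_sample_millions <= 0:
        raise ValueError("target depth must be positive")
    usable = reads_per_run_millions * usable_fraction
    return math.floor(usable / target_reads_per_sample_millions)


if __name__ == "__main__":
    # Hypothetical target of 25 M reads per sample
    print(max_samples_per_run(400, 25))       # High Output: 16
    print(max_samples_per_run(130, 25))       # Mid Output: 5
    print(max_samples_per_run(400, 25, 0.9))  # High Output with 10% headroom: 14
```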
MiSeq
The Illumina MiSeq is a low-output sequencer that generates up to 15 Gb of data per run. It offers the longest read length among the facility's Illumina instruments, generating up to 600 bases per 2X300 paired-end run. A variety of flow cells and read lengths provide flexibility on this single-flow-cell platform.
Flow Cell Type | Maximum Output | Approx. SE Reads Per Run | MD Anderson Price* |
---|---|---|---|
MiSeq600 V3 | 15 Gb | 20-25 million | $2,025 |
MiSeq300 V2 | 4.5-5 Gb | 10-15 million | $1,455 |
MiSeq500 V2 | 7.5-8 Gb | 10-15 million | $1,609 |
MiSeq Nano 300 | 300 Mb | 1 million | $574 |
MiSeqv3-150 | 3.3-3.8 Gb | 20-25 million | $1,293 |
MiSeqv2-50 | 750–850 Mb | 10-15 million | $1,187 |
*Please see the Price List for current External or GCC pricing.
NGS Applications
Transcriptome Analysis
Transcriptome analysis may be quantitative (gene expression analysis) and/or qualitative (transcript discovery, splice variant identification, coding SNP validation, gene fusions). The ATGC offers several options for transcriptome analysis. The choice of sample preparation method is based on the sample quality, quantity and the investigator’s experimental objective.
RNA Applications
Application | FFPE Compatible | Strand Specific | Application Notes |
---|---|---|---|
RNA Exome (RNA Access) | Yes | Yes | Human mRNA-Seq for FFPE samples. Generates cDNA from total RNA, then captures the exome regions. This protocol is optimized for sequencing RNA from degraded or FFPE samples and samples with limited starting material. RNA Access enables the discovery of novel features such as alternative splicing, fusion transcripts and coding splice variants, as well as quantitative gene expression analysis. Capture-based method covering 98.3% of the RefSeq exome. Human only. |
Stranded mRNA-Seq | No | Yes | Uses oligo-dT-based capture for poly(A) enrichment, followed by cDNA synthesis using random and oligo-dT priming. Sequences generated map to coding regions of the genome. Applications: gene expression quantitation, fusions, splice variants. Poly(A) transcripts only. |
Stranded Total RNA-Seq | Yes | Yes | rRNA depletion is performed (no poly(A) enrichment), followed by cDNA synthesis using oligo-dT and random hexamers. This method allows sequencing of mRNA and non-polyadenylated RNA, including histone mRNAs, precursors of Cajal body-related small RNAs, and lncRNAs. Sequences map to exons and intergenic regions. Applications: gene expression, lncRNA. Provides a more complete transcriptome, with both poly(A) and non-poly(A) transcripts. |
Low Input mRNA-Seq | No | No | mRNA-Seq for good-quality samples with <100 ng total RNA. |
Low Input Total RNA-Seq | Yes | Yes | Total RNA-Seq for samples with <100 ng total RNA. |
Small RNA-Seq including miRNA-Seq | Yes | No | Quantification of miRNA expression. The protocol integrates unique molecular identifiers (UMIs) into the reverse transcription reaction, enabling unbiased and accurate miRNome-wide quantification of mature miRNAs. UMIs require 75SR sequencing. |
TCR α/β Profiling | No | No | Targeted sequencing of TCR alpha and beta chains. |
RNA Capture | | | Custom application. |
RIP-Seq | No | No | Investigator provides immunoprecipitated RNA. |
Note: Strand-specific protocols preserve strand information. Strand specificity can be used to identify antisense transcripts and determine the transcribed strand of non-coding RNAs, and may help demarcate the boundaries of overlapping genes.
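The UMI step in the small RNA protocol works because reads sharing both a UMI and a miRNA almost certainly come from the same original molecule, so counting distinct UMIs rather than reads removes PCR duplicates from the quantification. The sketch below assumes reads have already been assigned to a miRNA and their UMI extracted; the input format and `count_molecules` helper are hypothetical, and it uses exact UMI matching, whereas tools such as UMI-tools also merge UMIs that differ by a sequencing error.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per miRNA.

    reads: iterable of (mirna_name, umi_sequence) pairs, one per sequenced read.
    Returns {mirna_name: number_of_distinct_umis}.
    """
    umis_per_mirna = defaultdict(set)
    for mirna, umi in reads:
        # Reads with an identical UMI for the same miRNA collapse into one molecule.
        umis_per_mirna[mirna].add(umi)
    return {mirna: len(umis) for mirna, umis in umis_per_mirna.items()}


if __name__ == "__main__":
    reads = [
        ("miR-21", "ACGTACGTAC"),
        ("miR-21", "ACGTACGTAC"),  # PCR duplicate of the read above
        ("miR-21", "TTGCAGGCTA"),
        ("let-7a", "GGATCCATGA"),
    ]
    print(count_molecules(reads))  # {'miR-21': 2, 'let-7a': 1}
```

Here miR-21 has three reads but only two molecules, which is the over-counting that UMIs are designed to remove.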
DNA Applications
Application | FFPE Compatible | Application Notes |
---|---|---|
Agilent Exome V7 | Yes | SureSelect Human All Exon V7 is a comprehensive exome designed using the latest versions of RefSeq (99.3% coverage), GENCODE (99.6% coverage), CCDS (99.6% coverage) and UCSC Known Genes (99.6% coverage). Design size: 48.2 Mb. |
Agilent Clinical Research Exome | Yes | SureSelect Clinical Research Exome V2 is a comprehensive medical exome with overall exonic coverage, enhanced coverage of disease-associated genes, and increased coverage of HGMD, OMIM, ClinVar and ACMG targets. The associated gene list includes gene names and evidence of their disease relevance. Design size: 67.3 Mb. |
T200.1 Panel | Yes | 263-gene solid tumor panel covering all exons. |
ChIP-Seq | NA | Identifies transcription factor (protein) binding sites in genomes and specific cell types. The investigator performs chromatin IP and provides antibody-captured DNA to the ATGC. Library preparation requires a minimum of 10 ng of immunoprecipitated DNA, with a maximum fragment size of 500 bp. Enrichment should be 10-fold or greater. Input DNA is required for comparison during analysis. |
Targeted Capture | Yes | Custom application. Selectively enriches for and sequences investigator-defined regions of interest. The ATGC provides custom targeted capture using hybridization-based capture probes and amplicon-based enrichment methods (Twist, Agilent and IDT designs). The facility stocks the Twist T200.1 panel. |
Whole Genome Seq | No | Human, mouse, rat, yeast, monkey, viral, bacterial and other genomes. For cancer research applications, the ATGC provides sequencing of matched tumor and normal samples. |
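For exome and whole-genome projects, the amount of sequencing needed follows from the target size. A rough sketch of mean raw coverage, assuming every read contributes its full length and ignoring duplicates and mapping losses; the read counts and the 70% on-target fraction in the example are hypothetical, and 3.1 Gb is an approximate human genome size.

```python
def mean_coverage(n_reads: float, read_length_bp: int, target_size_bp: float,
                  on_target_fraction: float = 1.0) -> float:
    """Approximate mean depth = total on-target bases / target size.

    n_reads counts each read separately (read 1 and read 2 of a pair are two reads).
    """
    if target_size_bp <= 0:
        raise ValueError("target size must be positive")
    return n_reads * read_length_bp * on_target_fraction / target_size_bp


if __name__ == "__main__":
    # Whole genome: 620 M reads (310 M pairs at 2X150) over ~3.1 Gb gives ~30X
    print(mean_coverage(620e6, 150, 3.1e9))  # 30.0
    # Agilent Exome V7 (48.2 Mb design): 40 M reads of 100 bp, hypothetical 70% on target
    print(round(mean_coverage(40e6, 100, 48.2e6, 0.7), 1))  # 58.1
```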
Data Analysis
Data analysis is available to investigators with full-service submissions (library preparation and sequencing) on a fee-for-service basis.
Application | Service Description | Price | Estimated Turnaround Time (Analysis Only) |
---|---|---|---|
Bulk mRNA and Total RNA-Seq | a. BAM files; b. FPKM values or normalized count data; c. Differential expression analysis for comparison of two groups; d. Gene set enrichment analysis (GSEA) for comparison of two groups | $90/sample | 2 weeks: <20 samples; 3–4 weeks: 20–50 samples |
Whole Exome | a. BAM files; b. Variant detection, including somatic mutation detection by GATK; c. Copy number alteration detection, including GISTIC analysis | $120 per T/N pair | 2 weeks: <20 samples; 3–4 weeks: 20–50 samples |
Bulk ChIP-Seq | a. MACS-based peak calling; b. Differential peak detection by diffReps; c. Motif discovery on differential peaks by HOMER; d. Super-enhancer/enhancer detection for H3K27ac ChIP-Seq by ROSE | $80/sample | 2 weeks: <20 samples; 3–4 weeks: 20–50 samples |
scRNA-Seq | a. Pre-processing, including pre-filtering, normalization and batch-effect correction; b. *Auto-annotation of cell types and subtypes by clustering, with marker gene detection for each cluster; c. Analysis of cell-type composition between case and control; d. Differential expression analysis for cell subpopulations of interest, GSVA and GSEA; e. Cell trajectory and pseudotime analysis; f. SCENIC analysis to identify transcription factors (TFs) | $400/sample | 4–6 weeks |
Whole Genome Analysis | a. BAM files; b. Somatic mutation detection by GATK/Mutect2; c. Copy number alteration detection, including GISTIC analysis; d. Structural variant detection by GATK-SV and AnnotSV | $180 per T/N pair | 4 weeks: <20 samples; 6–8 weeks: 20–50 samples |
*Single-cell data analysis is an interactive process that requires active participation by the requesting lab. Manual cell identification is performed by the requesting lab using supporting data provided by the ATGC.
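The bulk RNA-Seq deliverables include FPKM values or normalized counts. As a reference for what those units mean, the sketch below computes FPKM (fragments per kilobase of gene length per million mapped fragments) and the simpler counts per million (CPM) for a single gene. The example numbers are hypothetical, and the exact normalization used in the ATGC pipeline is not specified on this page.

```python
def fpkm(gene_fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of gene length per million mapped fragments."""
    return gene_fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def cpm(gene_fragments: int, total_mapped_fragments: int) -> float:
    """Counts per million mapped fragments (no gene-length correction)."""
    return gene_fragments * 1e6 / total_mapped_fragments


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments, 2,000 bp long, in a library of 20 M mapped fragments
    print(fpkm(500, 2_000, 20_000_000))  # 12.5
    print(cpm(500, 20_000_000))          # 25.0
```

CPM corrects only for sequencing depth, while FPKM also divides by gene length, so FPKM is the one to use when comparing expression between genes within a sample.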
ATGC Director of Bioinformatics
Xiaoping Su, Ph.D.
Data Scientist
Yunxin Chen
Associate Data Scientist
Lijin Joo, Ph.D.
Getting started
Project Consultation
The ATGC provides budget planning, technology consultations and project planning to cancer center members. We strongly recommend that first time NGS service users and investigators with large-scale projects schedule a meeting before initiating a project. To schedule a consultation meeting, please contact Erika Thompson at ejthomps@mdanderson.org.
Sample Submission
All samples should be accompanied by a completed sample submission form. Sample submission requirements (minimum quantity and recommended sequence length) vary based on the service and sample type.