Bioinformatics Analysis
ACGT, Inc. provides comprehensive bioinformatics services using our supercomputing platforms and state-of-the-art analysis tools. Our team of bioinformaticians focuses on delivering reliable, accurate data and helps you interpret the results. Our analysis options include Raw Data Analysis and QC, Genome and Transcriptome De novo Assembly, Whole Genome and Transcriptome Mapping, ChIP Site Mapping, Metagenomic Analysis, and Advanced Analysis.
Raw Data Analysis and QC
For initial raw data processing, primary analysis, including image processing and base calling, is performed on all sequencing runs. Once the initial data pass our quality control, they are analyzed and aligned using the Burrows-Wheeler Aligner (http://bio-bwa.sourceforge.net/bwa.shtml) and samtools (http://samtools.sourceforge.net/), producing SAM and BAM files that can be used for further analysis.
Genome and Transcriptome De novo Assembly
De novo assembly at both the genomic and transcriptomic levels is carried out by assembling short sequencing reads into a complete reference sequence using SOAPdenovo.
Whole Genome and Transcriptome Mapping
For whole genome mapping, the sequencing reads from the whole genome are mapped to the reference genome using SOAPaligner to detect genetic variations (SNP, SV, CNV) or to identify the methylation status of cytosines in CpG islands.
For transcriptome mapping, the sequencing reads are mapped to each gene using Tophat (http://tophat.cbcb.umd.edu/) and Cufflinks (http://cufflinks.cbcb.umd.edu/) in order to determine gene expression levels. Expression is reported in Fragments Per Kilobase of exon per Million fragments mapped (FPKM) or Reads Per Kilobase of exon per Million mapped reads (RPKM).
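As a rough illustration of what the RPKM unit means, the basic definition can be sketched in a few lines of Python. This is only the textbook formula, not the statistical model that Cufflinks uses to estimate expression; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon per Million mapped reads (basic definition).

    read_count         -- reads mapped to the gene's exons
    gene_length_bp     -- total exon length of the gene, in base pairs
    total_mapped_reads -- all mapped reads in the sample
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Divide by length in kilobases (bp / 1e3) and by library size in
    # millions (reads / 1e6); the two scale factors combine into 1e9.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical gene: 500 reads on 2,000 bp of exon, 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))
```

FPKM follows the same arithmetic but counts fragments (read pairs) instead of individual reads, so that both ends of a paired-end fragment are not counted twice.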
ChIP Site Mapping
ChIP-Seq data generated from the genome analyzer pass through several phases for complete analysis. Initially, the sequence reads are aligned to the genome sequence using the GERALD module included in Illumina’s CASAVA v1.82 (http://www.illumina.com/support/sequencing/sequencing_software/casava/questions.ilmn). Reads that align uniquely to the genome can then be viewed as a custom track in the UCSC genome browser (http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html).
Metagenomic Analysis
In metagenomic analysis, the libraries are made directly from PCR amplicons of sequence targets that can provide identifying information (for example, the 16S rRNA gene in bacterial species). The sequencing reads from these libraries are in fact individual PCR product sequences. They are aligned to determine the complexity of the sample and the frequency of occurrence of each unique variant. BLAST is then used to identify the component species. The results of the metagenomic analysis can be used for phylogenetic analysis, species analysis, and diversity assessment of microbial communities living in soil, water, or the human body.
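The idea of counting how often each unique variant occurs can be sketched as follows. This is a simplified illustration that groups reads by exact sequence match; it is not a description of our production pipeline, which aligns the reads first, and the function name and sample reads are hypothetical.

```python
from collections import Counter


def variant_frequencies(reads):
    """Return (sequence, count, fraction) for each unique read, most common first.

    Reads are grouped by exact sequence identity after upper-casing.
    Grouping by alignment or similarity would merge near-identical variants;
    this sketch does not attempt that.
    """
    counts = Counter(read.upper() for read in reads)
    total = sum(counts.values())
    return [(seq, n, n / total) for seq, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical amplicon reads; real 16S reads are far longer.
    sample = ["ACGT", "acgt", "TTGA", "ACGT"]
    for seq, n, frac in variant_frequencies(sample):
        print(seq, n, frac)
```

The number of distinct sequences gives a first impression of sample complexity, and the most frequent ones are natural candidates for a BLAST search.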
Advanced Analysis
More complex bioinformatics analysis for various research purposes, such as complex diseases, cancer, or population analysis, can be arranged on a project-specific basis.
Data Management: Storage, Transfer, and Delivery
Because of storage limitations, we are not able to save image files for the GA IIx. FASTQ files will be archived for X time. Analysis files will be archived for X time.
In order to protect data integrity and confidentiality, we have a dedicated team of IT professionals on site. HTTPS and SFTP are the default protocols for secure data transfer. Large data files will be delivered on hard drives, and GPG encryption can optionally be provided.
File formats
• FASTQ: derived from FASTA format with the addition of quality scores. Each read from a sequencer comprises four lines: an identifier line, a sequence line, a second identifier line (or just a + character), and finally a quality line. This typically forms the input to a mapping program (along with a FASTA reference genome). A typical human exome FASTQ file might be around 10-15GB, which can be compressed to 5-6GB using gzip.
• SAM format: mapped/aligned sequences with details about the alignment, mapping quality, etc. This usually contains a subset of the raw reads (as some will have been discarded at the mapping stage). The SAM (or BAM) file is typically used as the input for variant calling algorithms and other analyses.
• BAM format: binary version of SAM. A typical human exome BAM file might be around 2GB in size.
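To make the four-line FASTQ layout concrete, here is a minimal Python sketch that reads records and decodes the quality line. It assumes well-formed, unwrapped records (exactly four lines each) and the Phred+33 quality encoding; older Illumina pipelines used a different offset, so check your data first. The function names are hypothetical.

```python
def parse_fastq(lines):
    """Yield (identifier, sequence, quality) tuples from an iterable of text lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near " + repr(header))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ in " + repr(header))
        yield header[1:], seq, qual


def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (offset 33 assumed)."""
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    # Hypothetical single-record example.
    text = "@read1\nACGT\n+\nIIII\n"
    for name, seq, qual in parse_fastq(text.splitlines()):
        print(name, seq, phred_scores(qual))
```

Because the parser accepts any iterable of lines, it can also be given an open file handle; for a gzip-compressed file, one option is to open it in text mode with gzip.open(path, "rt") from the Python standard library.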
Useful sites
• SEQanswers: an online forum – extremely useful for NGS information
http://seqanswers.com
• Illumina:
http://www.illumina.com
