BIOINFORMATICS ANALYSIS

ACGT provides comprehensive bioinformatics services using our high-performance computing platforms and state-of-the-art analysis tools. Our team of bioinformaticians focuses on delivering reliable, accurate data and helps you interpret the results. Our bioinformatics services include Raw Data Analysis and QC, Genome and Transcriptome De novo Assembly, Whole Genome and Transcriptome Mapping, ChIP-Seq Analysis, Metagenomic Analysis, and other custom analyses.

Raw Data Analysis and QC

For initial raw data processing, the primary analysis, including image processing and base calling, is performed automatically on all sequencing runs. Reads that pass the initial filters are de-multiplexed, trimmed of adapters, and stripped of low-quality sequences. The resulting high-quality data is provided to the customer or used in downstream workflows.

Genome and Transcriptome De novo Assembly

De novo assembly at both the genomic and transcriptomic level is carried out by assembling short sequencing reads into a larger, scaffolded set of sequences without using any reference.

Whole Genome and Transcriptome Mapping

For whole genome mapping, the sequence reads are mapped to the reference genome to detect genetic variations (SNP, SV, CNV, INDEL) or to identify the methylation status of cytosines in CpG islands.

For transcriptome mapping, the sequence reads are mapped to a reference genome, and gene expression levels are reported as Fragments Per Kilobase of exon per Million mapped fragments (FPKM), Reads Per Kilobase of exon per Million mapped reads (RPKM), or raw read counts.
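As an illustration of how the RPKM value mentioned above is derived from read counts, here is a minimal Python sketch. The gene names, counts, and lengths are hypothetical, and the function is not part of our pipeline; it only shows the standard normalization by gene length (in kilobases) and by total mapped reads (in millions). FPKM uses the same formula with fragments counted in place of reads.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon per Million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example data: reads mapped per gene and exon length in bp.
counts = {"geneA": 500, "geneB": 1000}
lengths = {"geneA": 2000, "geneB": 4000}
total_mapped = 10_000_000  # total mapped reads in the sample (assumed)

expression = {gene: rpkm(counts[gene], lengths[gene], total_mapped)
              for gene in counts}
print(expression)
```

In this example geneB has twice as many reads as geneA but is also twice as long, so both genes end up with the same normalized expression value.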

ChIP-Seq Analysis

For ChIP-Seq, initially the sequence reads are filtered and aligned to a reference genome. Next, the “peaks” are called, compared against a control sample, and annotated.

Metagenomic Analysis
  • In shotgun metagenomics, DNA extracted from environmental or clinical samples is fragmented, and adapters are added. The fragmented libraries are then sequenced and de novo assembled. The high-quality contigs are then classified to give an overview of the microbial diversity of the samples.
  • In short barcode sequence metagenomics, the libraries are made directly from PCR amplicons of sequence targets that provide identifying information (for example, the 16S rRNA gene in bacterial species). The sequencing reads from these libraries are individual PCR product sequences, which can be counted and classified to generate a profile of the microbial community.

The results from the metagenomic analysis can be used for phylogenetic analysis, species analysis, and diversity assessment of microbial communities living in many different environments.
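The counting and diversity steps described above can be sketched in Python as follows. This is a simplified illustration, not our production workflow: it assumes each read has already been classified to a taxon, the taxon labels are hypothetical, and the Shannon index is used here only as one common example of a diversity measure.

```python
import math
from collections import Counter


def community_profile(assignments):
    """Turn per-read taxon assignments into relative abundances."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(profile):
    """Shannon diversity H = -sum(p * ln p) over relative abundances."""
    return -sum(p * math.log(p) for p in profile.values() if p > 0)


# Hypothetical taxon labels, one per classified amplicon read.
reads = ["Bacteroides", "Bacteroides", "Escherichia", "Lactobacillus"]

profile = community_profile(reads)
diversity = shannon_index(profile)
print(profile, diversity)
```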

Custom Analysis

More complex bioinformatics analysis for various research purposes, such as comparative genomic and transcriptomic analyses, methylation analysis, small RNA analysis, and cancer NGS data, can be arranged on a project-specific basis.

Data Management: Storage, Transfer, and Delivery

Because of storage limitations, we are not able to save the original image files. In the absence of specific arrangements with the customer, FASTQ and analysis files will be archived for 2 months.

To protect data integrity and confidentiality, we have a dedicated team of IT professionals onsite. HTTPS and SFTP are the default protocols for secure data transfer. Large data files are delivered on hard drives, and GPG encryption can optionally be provided.

File formats
  • FASTQ: derived from the FASTA format with the addition of quality scores. Each read from a sequencer comprises an identifier line, a sequence line, a second identifier line (or just a + character), and finally a quality line. This typically forms the input to a mapping program (along with a FASTA reference genome). A typical human exome FASTQ file might be around 1-3 GB and can be compressed using gzip.
  • SAM format: human-readable text files containing the alignment details and mapping quality of sequence reads.
  • BAM format: BAM is the compressed binary version of the SAM format.
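
To make the four-line FASTQ record layout described above concrete, here is a minimal Python reader. It is a sketch under stated assumptions: each record occupies exactly four lines (no wrapped sequences), and quality characters use the Phred+33 encoding, which is common but should be confirmed for your data. The function names and example reads are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch: " + header)
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string into integer scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


# Two made-up reads; the second repeats its identifier on the '+' line.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+read2\n!!5I\n"
records = list(read_fastq(io.StringIO(example)))
for name, seq, qual in records:
    print(name, seq, phred_scores(qual))
```

For a gzip-compressed file, a text-mode handle from `gzip.open(path, "rt")` can be passed to `read_fastq` in place of the in-memory example.
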
Useful sites