MAQGene

MAQGene: a host-it-yourself webpage that takes FASTQ reads as input and outputs annotated variants

Purpose and audience: For C. elegans biologists with next-generation sequencing data. MAQGene facilitates genome-wide discovery of biologically meaningful mutations.

Scope: Currently C. elegans only. Extension to other species is under development.

Availability & Installation: Requires Linux with Apache and MySQL servers. The source is available through svn:

# cut-and-paste the following in a shell:

svn co https://maqweb.svn.sourceforge.net/svnroot/maqweb/trunk maqgene

Or see the SourceForge project page.

Automated Workflow

1. Aligns sample input reads to reference genome (with maq)
2. Computes sample genome consensus from aligned reads at each locus
3. Identifies loci that differ from the reference (‘variants’)
4. Annotates found variants with genomic features and computed biological effects
5. Outputs annotated variants in tabular flat file
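
The third step above, finding loci where the sample consensus differs from the reference, can be illustrated with a toy comparison. This is a minimal sketch, not MAQGene's own code: the function name find_point_variants and the two short sequences are hypothetical, and it assumes equal-length, already-aligned sequences, so it covers point differences only and not indels.

```python
def find_point_variants(reference, consensus):
    """Return (1-based position, reference base, sample base) per mismatch.

    Hypothetical helper for illustration only. Assumes both sequences are
    already aligned and of equal length, so indels are not handled.
    """
    if len(reference) != len(consensus):
        raise ValueError("sequences must have equal length")
    variants = []
    # Positions are 1-based, matching the START column of the output.
    for pos, (ref_base, sam_base) in enumerate(zip(reference, consensus), start=1):
        if ref_base.upper() != sam_base.upper():
            variants.append((pos, ref_base, sam_base))
    return variants


# Toy sequences, not taken from any real genome.
toy_variants = find_point_variants("ACGTAC", "ACCTAT")
print(toy_variants)
```

Each returned tuple corresponds loosely to the START, REF and SAM columns of the output described below.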

Selected example output (kindly provided by Christina Chen)

ID      Mut        CHR  START     PHASTCONS  REF  SAM  CS     LM       MQ     NQ     WR  VR   DP   PILEUP    VTP    IS  CLASS            DESCRIPTION          PARENT_FEATURES
183824  my_mutant  II   9964672   0.00       X    X    -1000  -999.99  -1000  -1000  3   8    11   @,,,g---  indel  0   nongenic         -1782 downstream     R53.2
10976   my_mutant  I    2777373   0.00       C    G    24     0.75     30     4      0   27   27   @GgGG---  point  0   five_prime_UTR   none                 Y71F9B.13c
204709  my_mutant  I    4345310   0.94       X    X    -1000  -999.99  -1000  -1000  2   -1   1    @,c,c     indel  4   frameshift       none                 ZK973.6
548     my_mutant  I    233319    -missing-  T    G    2      2.19     34     2      1   12   13   @GGGG---  point  0   missense         ATA->CTA[Ile->Leu]   Y48G1BM.6
18001   my_mutant  I    7407340   -missing-  C    T    3      9.44     1      2      1   8    9    @,TTT---  point  0   ncRNA            none                 C15A11.7b
102     my_mutant  I    24070     -missing-  G    A    14     1.25     5      14     0   4    4    @AaAa     point  0   nongenic         2709 into            Y74C9A.4b
73010   my_mutant  III  9101166   0.01       C    T    5      0.62     27     4      0   25   25   @TGtg---  point  0   non_start        ATG->ATA[Met->Ile]   ZK507.1
35147   my_mutant  II   1771674   -missing-  T    A    11     1.69     0      6      0   3    3    @Aaa      point  0   premature_stop   TGT->TGA[Cys->stop]  F36H5.1.1,F36H5.1.2
35873   my_mutant  II   2022881   0.00       A    G    5      3.38     22     2      1   5    6    @.gGg---  point  0   readthrough      TAA->CAA[stop->Gln]  F59H6.7
779     my_mutant  I    321706    0.00       C    T    4      0.81     31     0      0   75   75   @tTTT---  point  0   silent           GAG->GAA[Glu->Glu]   Y48G1A.5
423     my_mutant  I    200949    -missing-  C    T    16     1.00     28     14     0   5    5    @TTTt---  point  0   SNP              none                 haw294
75163   my_mutant  III  10797302  0.04       C    T    23     0.75     31     22     0   12   12   @tttt---  point  0   splice_acceptor  none                 K11D9.3.1,K11D9.3.2,K11D9.3.3
167603  my_mutant  X    4914714   0.00       C    A    3      0.75     29     0      1   8    9    @,aaa---  point  0   splice_donor     none                 K05B2.3
7609    my_mutant  I    1862796   -missing-  T    C    7      1.00     0      6      0   2    2    @cc       point  0   three_prime_UTR  none                 M01D7.2
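
Output like this can be filtered with a short script. The sketch below is illustrative and rests on assumptions: that the flat file is tab-delimited and that its header row uses the full column names listed in the Key (for example variant_id, dna, start, class). The helper names load_variants and select_by_class are hypothetical, and the sample text is three rows from the table above, retyped with a subset of columns.

```python
import csv
import io


def load_variants(handle):
    """Read a tab-delimited variant table with a header row into dicts.

    Assumes tab delimiters; adjust `delimiter` if the real file differs.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def select_by_class(rows, wanted):
    """Keep only rows whose 'class' column is one of the wanted classes."""
    return [row for row in rows if row["class"] in wanted]


# Three rows from the example output, retyped as tab-delimited text.
sample = (
    "variant_id\tdna\tstart\tclass\tparent_features\n"
    "548\tI\t233319\tmissense\tY48G1BM.6\n"
    "35147\tII\t1771674\tpremature_stop\tF36H5.1.1,F36H5.1.2\n"
    "779\tI\t321706\tsilent\tY48G1A.5\n"
)

rows = load_variants(io.StringIO(sample))
coding_changes = select_by_class(rows, {"missense", "premature_stop"})
for row in coding_changes:
    print(row["variant_id"], row["dna"], row["start"], row["class"])
```

To read a real file, pass an open file handle to load_variants instead of the io.StringIO wrapper.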

Key


brief name      name in actual output        description
ID              variant_id                   arbitrary id number assigned to this variant by MAQGene
Mut             mutant_strain                name of this sample given by the user
CHR             dna                          chromosome identifier (may also be contig or any piece of dna in the reference)
START           start                        1-based start position on chromosome
PHASTCONS       per_locus_5way_conservation  optional per-locus associated information added by user
REF             reference_base               nucleotide in the reference genome at this position
SAM             sample_base                  majority base (or one of two) in the sample genome at this position
CS              consensus_score              Phred-scaled probability that the called sample base is erroneous
LM              loci_multiplicity            average number of additional loci to which reads aligned here also mapped
MQ              mapping_quality              average Phred-scaled probability that reads were mapped here in error
NQ              neighbor_quality             NQS (neighborhood quality standard) score
WR              number_wildtype_reads        number of reads aligned at this locus that match the reference base
VR              number_variant_reads         number of reads aligned at this locus that mismatch the reference base
DP              sequencing_depth             total number of reads covering this locus (number_wildtype_reads + number_variant_reads)
PILEUP          sample_reads                 symbolic slice of the alignment at this locus
VTP             variant_type                 either indel or point mutation
IS              indel_size                   0 for point mutation.  > 0 for insertion (more dna in sample than reference)
CLASS           class                        any of several classes of genic and non-genic annotations
DESCRIPTION     description                  extra biological or proximity information
PARENT_FEATURES parent_features              name of associated annotation feature relating to this entry
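
Two of the definitions above can be checked numerically. A Phred-scaled score Q corresponds to an error probability of 10^(-Q/10), and sequencing_depth is defined as number_wildtype_reads + number_variant_reads. The sketch below uses hypothetical helper names and values taken from the example output. Scores such as -1000 in the indel rows appear to be placeholders rather than real Phred values (an assumption), so the conversion should not be applied to them.

```python
def phred_to_error_probability(score):
    """Convert a Phred-scaled score Q to an error probability 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


def depth_is_consistent(wildtype_reads, variant_reads, depth):
    """Check the Key's definition: depth = wildtype reads + variant reads."""
    return wildtype_reads + variant_reads == depth


# Variant 10976 in the example output: CS = 24, WR = 0, VR = 27, DP = 27.
error_probability = phred_to_error_probability(24)
print(f"{error_probability:.4f}")
print(depth_is_consistent(0, 27, 27))
```

A consensus score of 24 therefore corresponds to roughly a 0.4% chance that the called base is wrong.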