AGSNP
AGSNP is an annotation-based, genome-wide SNP discovery pipeline package for large and complex genomes. It uses NGS data and does not
require a reference genome sequence. Roche 454 shotgun reads from one individual, at low genome coverage, are annotated to distinguish
single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Shotgun reads from a second
individual, generated with SOLiD or Solexa at a coverage of multiple genome equivalents, are then mapped to the annotated Roche 454 reads
to identify putative SNPs. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes.
It can be used with any current NGS platform, provided that at least one of the platforms used generates relatively long reads. The
pipeline package, with a user's guide, is available upon request.
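The annotation-then-mapping idea above can be illustrated with a toy SNP caller. This is a minimal sketch under stated assumptions, not AGSNP's code: the annotation labels, the `AnnotatedRead` type, the `call_putative_snps` function and its `min_depth`/`min_fraction` thresholds are all hypothetical. It assumes that short reads from the second individual have already been aligned to one annotated 454 read and summarized as a list of observed bases per position. Positions on reads annotated as repetitive or paralogous are skipped, because differences there may reflect copies of a repeat rather than true allelic variation.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical annotation labels: only reads annotated as single-copy or as
# repeat junctions are used as anchors for SNP calling in this sketch.
CALLABLE_CLASSES = {"single-copy", "repeat-junction"}


@dataclass
class AnnotatedRead:
    name: str
    sequence: str    # 454 read from the first individual
    annotation: str  # e.g. "single-copy", "repeat-junction", "repeat", "paralog"


def call_putative_snps(read, pileups, min_depth=3, min_fraction=0.9):
    """Return (position, read_base, alt_base, depth) tuples for one 454 read.

    `pileups` maps a 0-based position in `read.sequence` to the bases seen
    there in short reads from the second individual. The thresholds are
    illustrative assumptions, not AGSNP defaults.
    """
    if read.annotation not in CALLABLE_CLASSES:
        return []  # repetitive or paralogous sequence: too risky to call
    snps = []
    for pos, bases in sorted(pileups.items()):
        depth = len(bases)
        if depth < min_depth:
            continue  # too few short reads to trust the position
        allele, count = Counter(bases).most_common(1)[0]
        ref = read.sequence[pos]
        # Call a putative SNP only if the short reads agree on an allele
        # that differs from the base in the 454 read.
        if allele != ref and count / depth >= min_fraction:
            snps.append((pos, ref, allele, depth))
    return snps
```

For a single-copy read `ACGTACGT` with four `T` bases observed at position 2, the sketch reports `(2, "G", "T", 4)`; the same pileup on a read annotated as `"repeat"` yields no calls.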
More information:
You FM, Huo N, Deal KR, Gu YQ, Luo MC, McGuire PE, Dvorak J, Anderson OD. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence. BMC Genomics, 2011, 12(1):59.
AGSNP Pipeline programs
| Function | Program | Dependency |
|---|---|---|
| Roche 454 read annotation | | |
| BLAST search | multi-blast.pl, batch_blast2table.pl, unique_query_ids.pl, extract_seqs.pl | Blast2 package |
| Gene annotation | bwa_mapping_pipeline.pl | BWA package |
| Assembly | batch_gsassembly.pl | gsAssembler (Newbler) (454 Life Technologies) |
| Removing artificial duplicates | batch_clustering_reads.pl | cd-hit-454 |
| SNP discovery | | |
| Format conversion utilities | roche2fastq.pl, fasta2fastq.pl, solid2fastq.pl | BWA package |
| Read mapping and SNP calling | bwa_snp_pipeline.pl | BWA package, SAMTools |
| SNP filtering | SummarizeGBSSNPs.jar, snp_filter_pipline_hom.pl, snp_filter_pipline_het.pl, snp_filter_pipline_mg.pl | |
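The format conversion utilities in the table turn reads from different platforms into FASTQ. As a rough illustration of what such a conversion involves, the sketch below converts FASTA records to FASTQ. It is an assumption-laden example, not the behavior of fasta2fastq.pl: because FASTA carries no base qualities, it assigns every base the same placeholder quality (`I`, which is Phred 40 in the Phred+33 encoding). The function names and the default quality character are hypothetical.

```python
from typing import Iterable, Iterator


def _fastq_record(name: str, seq: str, qual_char: str) -> list[str]:
    # One FASTQ record: header, sequence, separator, and one quality per base.
    return [f"@{name}", seq, "+", qual_char * len(seq)]


def fasta_to_fastq(lines: Iterable[str], qual_char: str = "I") -> Iterator[str]:
    """Yield FASTQ lines from FASTA lines using a constant placeholder quality."""
    name, parts = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield from _fastq_record(name, "".join(parts), qual_char)
            name, parts = line[1:], []  # keep the full header text
        else:
            parts.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        yield from _fastq_record(name, "".join(parts), qual_char)
```

Wrapped FASTA sequences are joined before writing, so each FASTQ record has its sequence and quality string on single lines.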
Download
The package and user's manual can be downloaded by clicking the following links:
AGSNP_release1.2.tar.gz
AGSNP_unsermanual_v1.2.pdf
If you are interested in this package, please contact Dr. Frank M. You (frank.you@agr.gc.ca).
Genome-wide SNP discovery in the large and complex Aegilops tauschii genome
The pipeline program package AGSNP was used to discover SNPs between two accessions of
Ae. tauschii, AL8/78 and AS75, the parents of the F2 mapping population used to construct
an Ae. tauschii genetic map (Luo et al., 2009).
Aegilops tauschii is the core genome of the Triticum-Aegilops alliance and the diploid source
of the wheat D genome. Its genome is 4.02 Gb in size, and 90% of it consists of repetitive sequences.
It is also an important source of germplasm for wheat breeding and a diploid model for the wheat D genome.
Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform.
Genomic DNA and cDNA of accession AS75 were sequenced primarily with SOLiD,
although some Solexa genomic sequences were also generated. In total, 195,631 putative SNPs
were discovered in gene sequences, 155,580 in other uncharacterized single-copy regions,
and 145,907 in repeat junctions.
The putative SNPs were dispersed across the whole genome. To assess the false-positive rate of
SNP discovery, DNA fragments containing putative SNPs were PCR-amplified from AL8/78 and AS75
genomic DNA and resequenced on an ABI 3730xl. In a sample of 186 randomly selected putative SNPs,
84.0% of those in gene regions and 88.0% of those in repeat junctions were validated.
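The figures above can be combined with simple arithmetic. The short sketch below totals the putative SNPs across the three sequence classes, gives each class's share, and derives the implied false-positive rate for each validated class as 100% minus its validation rate. The variable names are illustrative. Because the number of sampled SNPs in each class is not stated here, the sketch does not attempt to compute numbers of validated SNPs.

```python
# Putative SNP counts per sequence class, as reported above.
snp_counts = {
    "gene sequences": 195_631,
    "other single-copy regions": 155_580,
    "repeat junctions": 145_907,
}
total = sum(snp_counts.values())

# Share of all putative SNPs in each class, in percent (one decimal place).
shares = {cls: round(100 * n / total, 1) for cls, n in snp_counts.items()}

# Validation rates from the resequencing sample; the false-positive rate
# estimate is the complement of the validation rate.
validated_pct = {"gene regions": 84.0, "repeat junctions": 88.0}
false_positive_pct = {cls: round(100.0 - v, 1) for cls, v in validated_pct.items()}

print(total, shares, false_positive_pct)
```

This gives 497,118 putative SNPs in total, with about 39% of them in gene sequences, and implied false-positive rates of 16.0% in gene regions and 12.0% in repeat junctions.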
Putative SNPs discovered in Aegilops tauschii: