FindSNPs

REDHORSE- A Software Suite to Detect Recombinations From Next-Generation Sequencing Data

Find single nucleotide variations

REDHORSE implements a custom-built SNV caller that calls SNVs from the list file and the reference FASTA file. The caller compares the allele information at each base of the sample against the reference genome and reports the loci that differ from the reference. It makes no assumption about the ploidy of the data; instead, it uses the allele information generated from the user-defined thresholds. For paired-end data, it ensures that reads exist in both orientations at the regions where SNVs are called. The program outputs standard information such as chromosome, position, reference allele and alternate allele in tab-delimited or standard VCF format.
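
The comparison described above can be illustrated with a minimal Python sketch. It is not REDHORSE's implementation: the function name `call_snvs`, the dictionary layout of the allele information, and the 1-based positions are assumptions made purely for illustration.

```python
def call_snvs(reference, alleles_by_pos):
    """Return (position, ref_base, alt_alleles) for loci that differ from the reference.

    reference: sequence of one chromosome as a string (hypothetical input).
    alleles_by_pos: dict mapping a 1-based position to the set of alleles kept
        for the sample at that position (hypothetical layout).
    """
    snvs = []
    for pos in sorted(alleles_by_pos):
        ref_base = reference[pos - 1].upper()
        # Any retained allele other than the reference base marks a variant locus.
        # No ploidy is assumed, so a position may carry several alleles.
        alts = sorted(a.upper() for a in alleles_by_pos[pos] if a.upper() != ref_base)
        if alts:
            snvs.append((pos, ref_base, alts))
    return snvs


reference = "ACGTACGT"
alleles = {2: {"C"}, 3: {"T"}, 5: {"A", "G"}}
print(call_snvs(reference, alleles))
```

Position 2 matches the reference and is skipped, while positions 3 and 5 are reported because at least one retained allele differs from the reference base.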

Prerequisites

1) Generate Input Data
2) Find Alleles
3) List Alleles

How to run it?

java -jar C:\AsisKhan\softwareManuscript\Code\REDHORSE.jar findSNPs -i "C:\AsisKhan\softwareManuscript\data\ListFiles\VAND.list" -j "C:\AsisKhan\softwareManuscript\data\FASTA\ToxoDB-8.0_TgondiiME49_Genome.fasta" -k "tab" -m 25 -o "C:\AsisKhan\softwareManuscript\data\parentalSNPFiles/VAND.snps"

-i is the input list file generated in step 3 of the prerequisites.
-j is the input reference FASTA file.
-k is the output format: "tab" for tab-delimited output, "vcf" for VCF output.
-m is the minimum percentage of reads required in the forward and reverse directions at a position to call a SNP. In the example above, at least 25% of the reads must be in each of the forward and reverse directions.
-o is the output file.
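
To make the -m threshold concrete, here is a small Python sketch of one plausible reading of it: a position passes only if each orientation holds at least the given percentage of the reads. The function name `passes_strand_filter` and the exact arithmetic are assumptions; REDHORSE's internal computation may differ.

```python
def passes_strand_filter(forward_reads, reverse_reads, min_percent):
    """Check that both orientations reach min_percent of the reads at a position."""
    total = forward_reads + reverse_reads
    if total == 0:
        return False  # no coverage, nothing to call
    forward_pct = 100.0 * forward_reads / total
    reverse_pct = 100.0 * reverse_reads / total
    return forward_pct >= min_percent and reverse_pct >= min_percent


# 30 forward and 12 reverse reads: the reverse share is about 28.6%, above 25%.
print(passes_strand_filter(30, 12, 25))
# 40 forward and 2 reverse reads: the reverse share is about 4.8%, below 25%.
print(passes_strand_filter(40, 2, 25))
```

Under this reading, a higher -m value demands more balanced strand support, and values above 50 can never be satisfied.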

Output

The output of this utility in tab-delimited format contains the following columns.

chromosome    position    reference    alternate allele    read depth    frequency    avgMappingQuality    SnpType
TGME49_chrVIII    348    G    T    42.0    100.0    150.0    Hom
TGME49_chrVIII    420    C    G    43.0    100.0    150.0    Hom
TGME49_chrVIII    694    A    G    35.0    100.0    150.0    Hom
TGME49_chrVIII    1105    A    G    50.0    100.0    150.0    Hom
TGME49_chrVIII    1797    G    A    46.0    100.0    150.0    Hom
TGME49_chrVIII    2009    C    T    33.0    100.0    150.0    Hom
TGME49_chrVIII    2566    C    T    41.0    100.0    150.0    Hom
TGME49_chrVIII    3424    C    A    44.0    100.0    150.0    Hom
TGME49_chrVIII    3745    C    G    49.0    100.0    150.0    Hom
TGME49_chrVIII    4301    A    G    36.0    100.0    150.0    Hom
TGME49_chrVIII    5052    G    T    49.0    100.0    150.0    Hom
TGME49_chrVIII    5238    G    A    57.0    100.0    150.0    Hom
TGME49_chrVIII    5326    A    G    44.0    100.0    150.0    Hom
TGME49_chrVIII    5399    T    C    48.0    100.0    150.0    Hom
TGME49_chrVIII    5700    C    T    54.0    100.0    150.0    Hom
TGME49_chrVIII    5714    A    G    61.0    100.0    150.0    Hom
TGME49_chrVIII    5719    G    A    58.0    100.0    150.0    Hom
TGME49_chrVIII    6873    A    G    41.0    100.0    150.0    Hom
TGME49_chrVIII    9612    T    C    30.0    100.0    150.0    Hom
TGME49_chrVIII    10487    G    A    27.0    100.0    150.0    Hom
.........
.......

The first column is the chromosome name.
The second column is the genomic position on that chromosome.
The third column is the reference allele.
The fourth column is the alternate allele(s) found at that position.
The fifth column is the read depth at that position.
The sixth column is the frequency of the allele(s).
The seventh column is the average mapping quality of the reads at that position.
The eighth column is the type of the SNP (homozygous/heterozygous).
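
A file in this layout can be loaded for downstream filtering with a few lines of Python. The sketch assumes the file is separated by real tab characters and has the eight columns listed above; the function name `read_snps` and the dictionary keys are illustrative choices, not part of REDHORSE.

```python
import csv
import io


def read_snps(handle):
    """Parse tab-delimited SNP rows into a list of dictionaries."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip the header line, blank lines and anything without 8 columns.
        if len(row) != 8 or not row[1].isdigit():
            continue
        chrom, pos, ref, alt, depth, freq, mapq, snp_type = row
        records.append({
            "chromosome": chrom,
            "position": int(pos),
            "reference": ref,
            "alternate": alt,
            "depth": float(depth),
            "frequency": float(freq),
            "avg_mapping_quality": float(mapq),
            "snp_type": snp_type,
        })
    return records


# Three rows copied from the sample output, joined with tabs.
sample = (
    "chromosome\tposition\treference\talternate allele\tread depth\t"
    "frequency\tavgMappingQuality\tSnpType\n"
    "TGME49_chrVIII\t348\tG\tT\t42.0\t100.0\t150.0\tHom\n"
    "TGME49_chrVIII\t420\tC\tG\t43.0\t100.0\t150.0\tHom\n"
    "TGME49_chrVIII\t694\tA\tG\t35.0\t100.0\t150.0\tHom\n"
)
records = read_snps(io.StringIO(sample))
deep = [r for r in records if r["depth"] >= 40]
print(len(records), len(deep))
```

For a file on disk, pass an open file object instead of the `io.StringIO` wrapper, for example `read_snps(open(path, newline=""))` with `path` pointing at the file given to -o.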
