REDHORSE Package
Find single nucleotide variations

REDHORSE implements a custom-built SNV caller that calls SNVs using the list file and the reference FASTA file. The SNV caller compares the allele information at each base of the sample against the reference genome and reports the loci that differ from the reference. It makes no assumptions about the ploidy of the data; instead, it relies on the allele information generated under the user-defined thresholds. For paired-end data, it ensures that reads exist in both orientations at the regions where SNVs are called. The program outputs standard information such as chromosome, position, reference allele and alternate allele in tab-delimited or standard VCF format.

Prerequisites
1) Generate Input Data
2) Find Alleles
3) List Alleles
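The comparison described above can be illustrated with a minimal Python sketch. It is not REDHORSE's actual code: the function name `find_snv_loci` and the dictionary layout of the per-position alleles are assumptions made for illustration only. The sketch reports every position where the sample carries at least one allele that differs from the reference base, without assuming how many alleles a position may have.

```python
# Hypothetical sketch: names and data layout are assumptions, not REDHORSE internals.

def find_snv_loci(reference, sample_alleles):
    """Return (position, ref_base, alt_alleles) for loci that differ from the reference.

    reference: reference sequence of one chromosome, as a string.
    sample_alleles: dict mapping a 1-based position to the set of alleles
        that passed the user-defined thresholds at that position.
    """
    snvs = []
    for pos in sorted(sample_alleles):
        ref_base = reference[pos - 1].upper()
        # Keep every non-reference allele; no ploidy is assumed.
        alts = sorted(a for a in sample_alleles[pos] if a != ref_base)
        if alts:
            snvs.append((pos, ref_base, alts))
    return snvs


reference = "ACGTAC"
sample_alleles = {2: {"C"}, 3: {"T"}, 5: {"A", "G"}}
calls = find_snv_loci(reference, sample_alleles)
print(calls)
```

Position 2 matches the reference and is skipped; positions 3 and 5 each carry a non-reference allele and are reported.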
How to run it?
java -jar C:\AsisKhan\softwareManuscript\Code\REDHORSE.jar findSNPs -i "C:\AsisKhan\softwareManuscript\data\ListFiles\VAND.list" -j "C:\AsisKhan\softwareManuscript\data\FASTA\ToxoDB-8.0_TgondiiME49_Genome.fasta" -k "tab" -m 25 -o "C:\AsisKhan\softwareManuscript\data\parentalSNPFiles/VAND.snps"
-i is the input list file generated in step 3 of the prerequisites.
-j is the input reference FASTA file.
-m is the minimum percentage of reads required in each of the forward and reverse directions at a position to call a SNP. In the example above, at least 25% of the reads must be in the forward direction and at least 25% in the reverse direction.
-o is the output file.
-k is the output format. If set to "tab", the output is in tab-delimited format. If set to "vcf", the output is in VCF format.
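The effect of the -m threshold can be sketched as follows. This is one reading of the description above, not the tool's confirmed formula: the function `passes_strand_filter` and its arguments are hypothetical, and the percentages are assumed to be computed over the total number of reads at the position.

```python
# Hypothetical sketch of the -m check; the exact formula REDHORSE uses is an assumption.

def passes_strand_filter(forward_reads, reverse_reads, min_percent=25.0):
    """Return True if each orientation holds at least min_percent of the reads."""
    total = forward_reads + reverse_reads
    if total == 0:
        return False  # no reads, so nothing can be called
    forward_pct = 100.0 * forward_reads / total
    reverse_pct = 100.0 * reverse_reads / total
    return forward_pct >= min_percent and reverse_pct >= min_percent


print(passes_strand_filter(30, 12))  # both orientations above 25%
print(passes_strand_filter(40, 2))   # reverse orientation below 25%
```

Under this reading, a position covered almost entirely by reads from one orientation is rejected even when its total depth is high.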
Output
The output of this utility in tab-delimited format contains the following columns.

chromosome      position  reference  alternate allele  read depth  frequency  avgMappingQuality  SnpType
TGME49_chrVIII  348       G          T                 42.0        100.0      150.0              Hom
TGME49_chrVIII  420       C          G                 43.0        100.0      150.0              Hom
TGME49_chrVIII  694       A          G                 35.0        100.0      150.0              Hom
TGME49_chrVIII  1105      A          G                 50.0        100.0      150.0              Hom
TGME49_chrVIII  1797      G          A                 46.0        100.0      150.0              Hom
TGME49_chrVIII  2009      C          T                 33.0        100.0      150.0              Hom
TGME49_chrVIII  2566      C          T                 41.0        100.0      150.0              Hom
TGME49_chrVIII  3424      C          A                 44.0        100.0      150.0              Hom
TGME49_chrVIII  3745      C          G                 49.0        100.0      150.0              Hom
TGME49_chrVIII  4301      A          G                 36.0        100.0      150.0              Hom
TGME49_chrVIII  5052      G          T                 49.0        100.0      150.0              Hom
TGME49_chrVIII  5238      G          A                 57.0        100.0      150.0              Hom
TGME49_chrVIII  5326      A          G                 44.0        100.0      150.0              Hom
TGME49_chrVIII  5399      T          C                 48.0        100.0      150.0              Hom
TGME49_chrVIII  5700      C          T                 54.0        100.0      150.0              Hom
TGME49_chrVIII  5714      A          G                 61.0        100.0      150.0              Hom
TGME49_chrVIII  5719      G          A                 58.0        100.0      150.0              Hom
TGME49_chrVIII  6873      A          G                 41.0        100.0      150.0              Hom
TGME49_chrVIII  9612      T          C                 30.0        100.0      150.0              Hom
TGME49_chrVIII  10487     G          A                 27.0        100.0      150.0              Hom
.........       .......

- The first column is the chromosome name.
- The second column is the genomic position on that chromosome.
- The third column is the reference allele.
- The fourth column is the allele(s) found at that position.
- The fifth column is the read depth at that position.
- The sixth column is the frequency of the allele(s).
- The seventh column is the average mapping quality of the reads at that position.
- The eighth column is the type of the SNP (homozygous/heterozygous).
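For downstream work, the eight columns described above can be read into typed records. The following Python sketch is not part of REDHORSE; the `SnpRecord` field names and the `read_snps` helper are assumptions chosen to mirror the column list. Because it is not stated whether the file starts with a header line, the sketch skips any row whose second field is not an integer, and it keeps the allele and frequency fields as raw text in case a position lists more than one allele.

```python
import csv
import io
from typing import Iterator, NamedTuple


class SnpRecord(NamedTuple):
    # Field names are illustrative; they follow the column descriptions.
    chromosome: str
    position: int
    reference: str
    alternate: str            # kept as text: may list more than one allele
    read_depth: float
    frequency: str            # kept as text: may list more than one value
    avg_mapping_quality: float
    snp_type: str


def read_snps(handle) -> Iterator[SnpRecord]:
    """Yield one SnpRecord per data row of a tab-delimited SNP file."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 8 or not row[1].strip().isdigit():
            continue  # skip blank lines and a possible header line
        yield SnpRecord(row[0], int(row[1]), row[2], row[3],
                        float(row[4]), row[5], float(row[6]), row[7])


# Two rows copied from the example output above.
sample = (
    "TGME49_chrVIII\t348\tG\tT\t42.0\t100.0\t150.0\tHom\n"
    "TGME49_chrVIII\t420\tC\tG\t43.0\t100.0\t150.0\tHom\n"
)
records = list(read_snps(io.StringIO(sample)))
for rec in records:
    print(rec.chromosome, rec.position, rec.reference, rec.alternate, rec.snp_type)
```

To read an actual output file, pass an open file handle (for example, one opened on the path given to -o) instead of the `io.StringIO` sample.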