ListAlleles

REDHORSE- A Software Suite to Detect Recombinations From Next-Generation Sequencing Data

List Alleles

Prerequisites

1) Generate Input Data
2) Find Alleles

How to run it?

The listAlleles utility of the REDHORSE package takes an allele file and summarizes it in terms of frequencies. Two filters can be applied at this stage: a minimum frequency (-m) and a minimum read depth (-n). Type java -jar REDHORSE.jar listAlleles -h for options. The parameters are described first, followed by an example command:

The -m parameter is the minimum frequency required to call a base. A threshold of 0.4 means that a nucleotide at a given position is called if its frequency is greater than or equal to 40%. If more than one nucleotide meets the threshold at a given genomic position, which is typical of heterozygous sites in aneuploid genomes, all of them are listed at this stage. With a sufficiently low threshold, more than two nucleotides may be listed at a position.
The -n parameter is the minimum read depth (coverage) required to call nucleotides at a position. If the coverage is greater than or equal to the number specified, nucleotides are called; if it is less than the number specified, a "-" is reported.
Finally, the -i parameter is the input allele file and the -o parameter is the location where the output file is written.

java -jar REDHORSE.jar listAlleles -i "C:\AsisKhan\softwareManuscript\data\AlleleFiles\VAND.allele" -o "C:\AsisKhan\softwareManuscript\data\ListFiles\VAND.list" -n 5 -m 0.8
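The effect of the -m and -n filters described above can be illustrated with a small Python sketch. This is not REDHORSE's actual code: the function name `call_bases`, the per-base count dictionary, and the choice to treat read depth as the sum of the base counts are all assumptions made for illustration. What the utility reports when the depth is sufficient but no base reaches the frequency threshold is not stated in this document, so the sketch simply returns an empty list in that case.

```python
from typing import Dict, List


def call_bases(counts: Dict[str, int], min_depth: int, min_freq: float) -> List[str]:
    """Sketch of the -n (min_depth) and -m (min_freq) filters.

    counts maps each base to the number of reads supporting it
    (hypothetical input layout, not the REDHORSE allele file format).
    """
    # Assumption: read depth is the total number of base observations.
    depth = sum(counts.values())

    # -n filter: too little coverage means the position is reported as "-".
    if depth == 0 or depth < min_depth:
        return ["-"]

    # -m filter: keep every base whose frequency is >= the threshold.
    return [base for base, n in counts.items() if n / depth >= min_freq]


if __name__ == "__main__":
    # 18 of 19 reads support C, 1 supports A.
    print(call_bases({"A": 1, "T": 0, "C": 18, "G": 0}, min_depth=10, min_freq=0.15))
    # Two bases above a 40% threshold, as at a heterozygous site.
    print(call_bases({"A": 9, "T": 0, "C": 10, "G": 0}, min_depth=5, min_freq=0.4))
    # Coverage below the minimum read depth.
    print(call_bases({"A": 2, "T": 0, "C": 2, "G": 0}, min_depth=5, min_freq=0.4))
```

In the first call only C passes, because A has a frequency of about 5%. In the second call both A and C are returned, and in the third the position is reported as "-" because only 4 reads cover it.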

Output

The output of the program is as follows:

TGME49_chrVIII
21    A    13.0    1.0    0.0    0.0    0.0    4    9    129.69
22    C    15.0    0.0    0.0    1.0    0.0    5    10    132.4
23    C    15.0    0.0    0.0    1.0    0.0    5    10    132.4
24    C    15.0    0.0    0.0    1.0    0.0    5    10    132.4
25    T    16.0    0.0    1.0    0.0    0.0    6    10    133.5
26    A    16.0    1.0    0.0    0.0    0.0    6    10    133.5
27    A    16.0    1.0    0.0    0.0    0.0    6    10    133.5
28    C    16.0    0.0    0.0    1.0    0.0    6    10    133.5
29    C    16.0    0.0    0.0    1.0    0.0    6    10    133.5
30    C    16.0    0.0    0.0    1.0    0.0    6    10    133.5
31    T    16.0    0.0    1.0    0.0    0.0    6    10    133.5
32    A    17.0    1.0    0.0    0.0    0.0    7    10    134.47
33    A    18.0    1.0    0.0    0.0    0.0    7    11    135.3
34    C    19.0    0.052    0.0    0.94    0.0    8    11    136.10
35    C    19.0    0.0    0.0    1.0    0.0    8    11    136.1
36    C    19.0    0.0    0.0    1.0    0.0    8    11    136.1
........
.........

The list file consists of the chromosome name followed by one line of information for each genomic position. The list file shown here was generated with a minimum read depth of 10 and a minimum frequency of 15% (that is, -n 10 -m 0.15, rather than the -n 5 -m 0.8 used in the example command above).
For example, " 34 C 19.0 0.05263157894736842 0.0 0.9473684210526315 0.0 8 11 136.10526315789474" in the list file would mean the following:

34 - Genomic position.
C - Nucleotide called. C is called because it occurs with 94.7% frequency. A is not called because its frequency (5.3%) is below the 15% threshold that was specified when running the utility.
19.0 - Read depth at that position.
0.052 0.0 0.94 0.0 - Normalized frequency of each base. Judging from the rows in the output above, the first three columns are A, T and C; the fourth is presumably G.
8 11 - Forward and reverse reads contributing to the alleles listed.
136.1 - Average mapping quality.
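The column layout described above can be read back into named fields with a short Python sketch. The names `ListRow` and `parse_list_row` are hypothetical, the sketch assumes whitespace-separated fields, and the base order A, T, C, G is inferred from the example rows (G in particular is a guess, since no G call appears in the sample). The called-nucleotide field is kept as a raw string because this document does not show how several called bases are written.

```python
from dataclasses import dataclass
from typing import Dict

# Assumption: frequency columns are ordered A, T, C, G (inferred from the sample rows).
BASE_ORDER = ("A", "T", "C", "G")


@dataclass
class ListRow:
    position: int            # genomic position
    called: str              # nucleotide(s) called, or "-" if depth was too low
    depth: float             # read depth at the position
    freqs: Dict[str, float]  # normalized frequency per base
    forward_reads: int       # forward reads contributing to the listed alleles
    reverse_reads: int       # reverse reads contributing to the listed alleles
    mean_mapq: float         # average mapping quality


def parse_list_row(line: str) -> ListRow:
    """Parse one per-position line of a list file (not the chromosome name line)."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    return ListRow(
        position=int(fields[0]),
        called=fields[1],
        depth=float(fields[2]),
        freqs={base: float(v) for base, v in zip(BASE_ORDER, fields[3:7])},
        forward_reads=int(fields[7]),
        reverse_reads=int(fields[8]),
        mean_mapq=float(fields[9]),
    )


if __name__ == "__main__":
    row = parse_list_row(
        "34 C 19.0 0.05263157894736842 0.0 0.9473684210526315 0.0 8 11 136.10526315789474"
    )
    print(row.position, row.called, row.freqs["C"])
    # In this example the forward and reverse reads add up to the read depth.
    print(row.forward_reads + row.reverse_reads == row.depth)
```

Lines that do not have ten fields, such as the chromosome name line, raise a ValueError, so a caller looping over a whole list file would need to handle those separately.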