REDHORSE Package
Analytical pipeline to extract recombinations from haploid genomes
This section outlines the pipeline for extracting recombinations and
double crossovers from short reads aligned to a reference genome of
interest, supplied in binary alignment map (BAM) format. Single
nucleotide variations (SNVs) between the haploid parental strains mark
the loci where the two parents carry different alleles. The master
merged allele file, which records the allelic makeup of the hybrids at
these variant loci, allows the hybrids to be compared against the
parents to detect recombination patterns.
Prerequisites
The REDHORSE package can be downloaded here. Example data is available here.
Generate Input Data
Analytical Pipeline
The commands issued in this pipeline use the data provided with this
software suite. All the data is in the directory 'data'. A
representative chromosome, ChrVIII, is used in this demo.
Step 1: Find alleles using the findAlleles utility in the REDHORSE
package. This utility takes a BAM file as input and outputs an allele
file. The BAM files are in the directory 'bamFiles'. Run the utility on
every BAM file. The details on this utility can be found here.
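To make the idea of an allele file concrete, here is a minimal sketch of per-position base counting. It is not the findAlleles implementation: the function name `count_alleles` and the input shape (a list of `(start, sequence)` pairs for ungapped alignments) are assumptions made for illustration, and a real BAM reader must also handle CIGAR operations, base qualities and mapping qualities.

```python
from collections import Counter, defaultdict

def count_alleles(reads):
    """Count observed bases per reference position.

    `reads` is an iterable of (start_position, sequence) pairs that are
    assumed to align to the reference without gaps (hypothetical input).
    """
    counts = defaultdict(Counter)
    for start, sequence in reads:
        for offset, base in enumerate(sequence):
            counts[start + offset][base] += 1
    return counts

reads = [(100, "ACGT"), (101, "CGTA"), (102, "TTAC")]
counts = count_alleles(reads)
print(dict(counts[102]))  # {'G': 2, 'T': 1}
```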
Step 2: List alleles using the listAlleles utility in REDHORSE. At this
stage, the minimum coverage and the minimum frequency required to call
an allele can be chosen. Because the genome is haploid, the minimum
frequency to call an allele is set to 0.8; any allele occurring at a
lower frequency may be considered noise. Run the utility on all the
allele files generated in step 1. The details on this utility can be
found here.
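The two thresholds can be illustrated with a short sketch. This is an assumption-laden illustration, not the listAlleles code: the function name `call_allele` and the default `min_coverage=5` are hypothetical, and only the 0.8 frequency comes from the text above.

```python
def call_allele(base_counts, min_coverage=5, min_frequency=0.8):
    """Return the called base, or None if either threshold is not met.

    `base_counts` maps each base to its read count at one locus.
    `min_coverage=5` is a hypothetical default chosen for the example.
    """
    depth = sum(base_counts.values())
    if depth == 0 or depth < min_coverage:
        return None
    base, count = max(base_counts.items(), key=lambda item: item[1])
    if count / depth < min_frequency:
        return None  # no allele dominates: treat the locus as noise
    return base

print(call_allele({"A": 18, "G": 2}))  # A   (frequency 0.9)
print(call_allele({"A": 6, "G": 4}))   # None (frequency 0.6)
```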
Step 3: Call SNVs in the parental strains using the findSNPs utility of
REDHORSE. Single nucleotide variations (SNVs) between the haploid
parental strains mark the loci where the two parents carry different
alleles. By referring to these loci in the hybrids, it is easy to
verify whether an inherited allele comes from parent 1 or parent 2.
The first step in doing so is to find, for each parent, the loci that
differ from the reference genome. The details on this utility can be
found here.
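As a rough sketch of this comparison, assuming the reference and the called alleles are available as position-to-base dictionaries (a hypothetical in-memory layout, not the REDHORSE file format), the variant loci of one parent can be collected as follows. The name `find_snvs` is illustrative.

```python
def find_snvs(reference, called):
    """Return {position: (reference_base, called_base)} for variant loci.

    Loci with no call (None) or absent from the reference are skipped.
    """
    snvs = {}
    for pos in sorted(called):
        base = called[pos]
        if base is not None and pos in reference and base != reference[pos]:
            snvs[pos] = (reference[pos], base)
    return snvs

reference = {10: "A", 20: "C", 30: "G", 40: "T"}
parent1 = {10: "T", 20: "C", 30: None, 40: "T"}
print(find_snvs(reference, parent1))  # {10: ('A', 'T')}
```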
Step 4: Filter out SNPs in close proximity using the filterSNPsWindow
utility of REDHORSE. Single nucleotide variations (SNVs) that occur
in close proximity to one another mostly represent noise and must be
removed from the SNVs found in each parental strain. Do this for both
of the SNV files generated in step 3. The details on this utility can
be found here.
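One plausible reading of this filter is sketched below. It assumes that every SNV with a neighbour within `window` base pairs is dropped, including both members of a close pair; the window size of 10 and the name `filter_clustered` are hypothetical, and filterSNPsWindow may apply a different rule.

```python
def filter_clustered(positions, window=10):
    """Keep only SNV positions with no other SNV within `window` bp."""
    positions = sorted(positions)
    kept = []
    for i, pos in enumerate(positions):
        near_prev = i > 0 and pos - positions[i - 1] <= window
        near_next = i + 1 < len(positions) and positions[i + 1] - pos <= window
        if not (near_prev or near_next):
            kept.append(pos)
    return kept

snvs = [100, 105, 300, 1000, 1008, 1012, 5000]
print(filter_clustered(snvs))  # [300, 5000]
```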
Step 5: Consolidate both SNP files into one file using the
consolidateSNPs utility of REDHORSE. The parental SNV files contain
loci where each parent individually differs from the reference genome.
There may be loci where both parents differ from the reference genome
but do not differ from each other. This utility removes the loci with
SNVs common to both parents and generates a consolidated SNV file
containing only the loci where the two parents differ from each other.
The details on this utility may be found here.
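The consolidation logic can be sketched as a set comparison. The sketch assumes each parental SNV set is a position-to-base dictionary and that a locus missing from one parent's set carries the reference allele in that parent; both are illustrative assumptions, as is the name `consolidate`.

```python
def consolidate(reference, parent1_snvs, parent2_snvs):
    """Return {position: (parent1_base, parent2_base)} where parents differ.

    A locus absent from a parent's SNV set is assumed to carry the
    reference base in that parent (an assumption of this sketch).
    """
    consolidated = {}
    for pos in sorted(set(parent1_snvs) | set(parent2_snvs)):
        base1 = parent1_snvs.get(pos, reference[pos])
        base2 = parent2_snvs.get(pos, reference[pos])
        if base1 != base2:
            consolidated[pos] = (base1, base2)
    return consolidated

reference = {10: "A", 20: "C", 30: "G"}
parent1_snvs = {10: "T", 20: "G"}
parent2_snvs = {20: "G", 30: "A"}
# Position 20 is dropped: both parents carry G there.
print(consolidate(reference, parent1_snvs, parent2_snvs))
# {10: ('T', 'A'), 30: ('G', 'A')}
```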
Step 6: Generate a merged allele file using the findMergedAlleles
utility of REDHORSE. This utility takes the loci from step 5 and
generates a merged allele file that contains the allele information
from the hybrids as well as the parents at those loci. Because this
master file holds the alleles of all samples at the loci where the
parents differ, it provides a direct comparison of the hybrids against
the parents. The details on this utility may be found here.
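A merged allele table of this kind might look like the following sketch: one row per locus, one column per sample. The row layout `[position, allele_1, allele_2, ...]`, the `-` symbol for a missing call and the name `merge_alleles` are assumptions for illustration, not the format written by findMergedAlleles.

```python
def merge_alleles(loci, samples, missing="-"):
    """Build rows of [position, allele per sample] at the given loci.

    `samples` maps a sample name to its {position: base} calls.
    """
    names = list(samples)
    rows = []
    for pos in sorted(loci):
        rows.append([pos] + [samples[name].get(pos) or missing for name in names])
    return names, rows

samples = {
    "parent1": {10: "T", 30: "G"},
    "parent2": {10: "A", 30: "A"},
    "hybrid1": {10: "T", 30: "A"},
    "hybrid2": {10: "A"},
}
names, rows = merge_alleles([30, 10], samples)
for row in rows:
    print(row)
# [10, 'T', 'A', 'T', 'A']
# [30, 'G', 'A', 'A', '-']
```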
Step 7: Keep only bi-allelic segments and remove the multi-allelic
segments using the keepBiallelicSegments utility of REDHORSE. This
utility takes the merged allele file generated in step 6, scans each
locus to check whether it contains more than two alleles, and filters
out the loci that do. The details on this utility may be found here.
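The bi-allelic check can be sketched in a few lines. The sketch assumes rows of the form `[position, allele_1, allele_2, ...]` with `-` marking a missing call, which is a hypothetical layout; the name `keep_biallelic` is illustrative.

```python
def keep_biallelic(rows, missing="-"):
    """Drop loci at which more than two distinct alleles are observed."""
    kept = []
    for pos, *alleles in rows:
        observed = {allele for allele in alleles if allele != missing}
        if len(observed) <= 2:
            kept.append([pos, *alleles])
    return kept

rows = [
    [10, "T", "A", "T", "A"],
    [30, "G", "A", "C", "-"],  # three alleles: removed
    [50, "C", "T", "-", "C"],
]
print(keep_biallelic(rows))
# [[10, 'T', 'A', 'T', 'A'], [50, 'C', 'T', '-', 'C']]
```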
Step 8: Filter out loci with a large amount of missing data using the
filterMergedSNPFile4MissingData utility of REDHORSE. Loci with many
missing calls often come from noisy segments of the genome and must be
filtered out before further analysis. They also provide no evidence
about where recombinations occur. The input to this utility is the
merged allele file generated in step 7. The details on this tool may
be found here.
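A minimal sketch of such a filter follows, assuming rows of the form `[position, allele_1, allele_2, ...]` with `-` for a missing call. The cutoff `max_missing_fraction=0.2` and the name `filter_missing` are hypothetical; the text does not state which threshold the REDHORSE utility uses.

```python
def filter_missing(rows, max_missing_fraction=0.2, missing="-"):
    """Keep loci whose fraction of missing calls is at most the cutoff."""
    kept = []
    for pos, *alleles in rows:
        if not alleles:
            continue  # no samples at this locus: nothing to evaluate
        fraction = alleles.count(missing) / len(alleles)
        if fraction <= max_missing_fraction:
            kept.append([pos, *alleles])
    return kept

rows = [
    [10, "T", "A", "T", "A", "T"],  # 0 of 5 missing: kept
    [30, "G", "A", "-", "-", "A"],  # 2 of 5 missing: removed
    [50, "C", "T", "-", "C", "T"],  # 1 of 5 missing: kept
]
print([row[0] for row in filter_missing(rows)])  # [10, 50]
```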
Step 9: Find conventional recombinations using the
findConventionalRecombinations utility of REDHORSE. This algorithm
takes a comparison-based approach: it scans potential recombinant loci
using a fixed window, evaluates each potential breakpoint by comparing
it against the other markers in the window, and tags it according to
the majority rule. It removes potentially noisy loci and evaluates the
breakpoints, which sometimes form common boundaries of conventional
recombinations. The input to this utility is the filtered merged allele
file generated in step 8. The details on this tool may be found here.
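The majority-rule idea can be illustrated with a simplified stand-in; this is not the findConventionalRecombinations algorithm. The sketch assumes each marker of one hybrid has already been labelled by parent of origin (1 or 2), relabels each marker by the majority within a fixed window, and reports the marker pairs between which the parent changes. The names `smooth_by_majority` and `breakpoints` and the window half-width of 2 are hypothetical.

```python
from collections import Counter

def smooth_by_majority(labels, half_window=2):
    """Relabel each marker with the majority label in its window."""
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half_window)
        hi = min(len(labels), i + half_window + 1)
        counts = Counter(labels[lo:hi])
        top = max(counts.values())
        winners = [label for label, n in counts.items() if n == top]
        # On a tie, keep the marker's original label.
        smoothed.append(winners[0] if len(winners) == 1 else labels[i])
    return smoothed

def breakpoints(positions, labels):
    """Return (left, right) marker positions flanking each label change."""
    return [(positions[i - 1], positions[i])
            for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
labels = [1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 2]  # the lone 2 at 400 is noise
clean = smooth_by_majority(labels)
print(clean)                          # [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
print(breakpoints(positions, clean))  # [(700, 800)]
```

A plain sliding majority vote can shift a breakpoint by a marker when noise sits close to it, which is one reason a dedicated breakpoint evaluation is preferable to this sketch.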
Step 10: Find double crossovers using the findDoubleCrossovers utility
of REDHORSE. The merged allele file, which contains the nucleotide
information of the parents as well as the hybrids, not only allows the
hybrids to be compared against the parents but also allows double
crossovers to be detected with reasonable accuracy, because the
physical locations of the markers are recorded. The input to this
utility is the filtered merged allele file generated in step 8. The
details on this tool may be found here.
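One way to picture a double crossover is a short run of markers from one parent flanked on both sides by the other parent. The sketch below detects such runs from per-marker parent labels and physical positions. It is an illustration under stated assumptions, not the findDoubleCrossovers algorithm: the names, the `min_markers=2` support requirement and the `max_span=50_000` bp limit are all hypothetical.

```python
from itertools import groupby

def segments(positions, labels):
    """Collapse labels into runs of (label, start, end, marker_count)."""
    runs, i = [], 0
    for label, group in groupby(labels):
        n = len(list(group))
        runs.append((label, positions[i], positions[i + n - 1], n))
        i += n
    return runs

def double_crossovers(positions, labels, min_markers=2, max_span=50_000):
    """Return (start, end, label) for short runs flanked by the other parent."""
    runs = segments(positions, labels)
    hits = []
    for prev, cur, nxt in zip(runs, runs[1:], runs[2:]):
        label, start, end, n = cur
        flanked = prev[0] == nxt[0] and prev[0] != label
        if flanked and n >= min_markers and end - start <= max_span:
            hits.append((start, end, label))
    return hits

positions = [1000, 5000, 9000, 12000, 15000, 18000, 30000, 40000]
labels = list("AAABBBAA")
print(double_crossovers(positions, labels))  # [(12000, 18000, 'B')]
```

Requiring more than one supporting marker is a design choice of this sketch: it keeps an isolated miscalled marker from being reported as a double crossover.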
Step 11 (OPTIONAL): Convert the merged allele file to a multiple
sequence alignment FASTA file using the convertMergedAllele2Fasta
utility of REDHORSE. The multiple sequence alignment file can be used
as input to other recombination detection algorithms. The input to this
utility is the filtered merged allele file generated in step 8. The
details on this tool may be found here.
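The conversion amounts to reading the table column by column: each sample becomes one FASTA record whose sequence is its alleles at the retained loci, in order. The sketch below assumes rows of the form `[position, allele_1, allele_2, ...]` and a matching list of sample names; this layout and the name `merged_to_fasta` are illustrative, not the REDHORSE file format.

```python
def merged_to_fasta(names, rows, width=60):
    """Return FASTA text with one record per sample column."""
    lines = []
    for column, name in enumerate(names, start=1):
        sequence = "".join(row[column] for row in rows)
        lines.append(f">{name}")
        # Wrap long sequences at `width` characters per line.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"

names = ["parent1", "parent2", "hybrid1"]
rows = [
    [10, "T", "A", "T"],
    [30, "G", "A", "A"],
    [50, "C", "T", "-"],
]
print(merged_to_fasta(names, rows), end="")
# >parent1
# TGC
# >parent2
# AAT
# >hybrid1
# TA-
```

Because every sample contributes one character per locus, all records have the same length, which is what alignment-based tools expect.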