REDHORSE- A Software Suite to Detect Recombinations From Next-Generation Sequencing Data

Analytical pipeline to extract recombinations from haploid genomes

This section outlines the pipeline for extracting recombinations and double crossovers from the alignment of short reads to a reference genome of interest, stored in binary alignment map (BAM) format. Single nucleotide variations (SNVs) between the haploid parental strains mark the loci at which the two parents carry different alleles. The master merged allele file, which contains the allelic makeup of the hybrids at these variant loci, makes it possible to compare the hybrids against the parents and detect recombination patterns.

Prerequisites

The REDHORSE package can be downloaded here
Example data is available here
Generate Input Data

Analytical Pipeline

The commands issued in this pipeline use the data provided with this software suite. All of the data is in the directory 'data'. A representative chromosome, ChrVIII, is used in this demo.

Step1: Find alleles using the findAlleles utility of the REDHORSE package. This utility takes a BAM file as input and outputs an allele file. The BAM files are in the directory 'bamFiles'. Run the utility on every BAM file. The details on this utility can be found here

Step2: List alleles using the listAlleles utility of REDHORSE. At this stage, the minimum coverage and the minimum frequency required to call an allele can be chosen. Because the genome is haploid, the minimum frequency to call an allele is set to 0.8; any allele occurring at a lower frequency may be considered noise. Run the utility on all the allele files generated in step 1. The details on this utility can be found here
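The two thresholds in this step can be illustrated with a minimal sketch. It is not the listAlleles implementation: the function name `call_allele`, the per-base count dictionary, and the default `min_coverage=5` are assumptions made for illustration. Only the 0.8 frequency cut-off comes from the text above.

```python
def call_allele(base_counts, min_coverage=5, min_frequency=0.8):
    """Return the called base at one locus, or None if no call can be made.

    base_counts maps a base to the number of reads supporting it.
    min_coverage is a hypothetical default; 0.8 is the haploid cut-off
    described in step 2.
    """
    coverage = sum(base_counts.values())
    if coverage == 0 or coverage < min_coverage:
        return None  # too few reads to trust any call
    # Pick the most frequent base and check its share of the reads.
    base = max(base_counts, key=base_counts.get)
    if base_counts[base] / coverage < min_frequency:
        return None  # no base dominates, so treat the locus as noise
    return base
```

For example, a locus covered by nine reads of A and one of G yields A (frequency 0.9), while seven A and three G yields no call (frequency 0.7).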

Step3: Call SNVs in the parental strains using the findSNPs utility of REDHORSE. SNVs between the haploid parental strains mark the loci at which the two parents carry different alleles. By examining these loci in the hybrids, it is easy to determine whether an inherited allele comes from parent 1 or parent 2. The first step is to find, for each parent, the SNVs that differ from the reference genome. The details on this utility can be found here

Step4: Filter out SNVs that lie in close proximity to one another using the filterSNPsWindow utility of REDHORSE. SNVs that occur close together mostly represent noise and must be removed from the SNVs found in both parental strains. Run the utility on both SNV files generated in step 3. The details on this utility can be found here
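One way to picture a proximity filter is the sketch below. It assumes that an SNV is dropped whenever another SNV lies within `window` base pairs of it; the function name, the rule itself, and the default window of 10 bp are illustrative assumptions, not the documented behaviour of filterSNPsWindow.

```python
def filter_clustered_snvs(positions, window=10):
    """Keep only SNV positions with no neighbouring SNV within `window` bp.

    `window` is a hypothetical default chosen for the example.
    """
    positions = sorted(positions)
    kept = []
    for i, pos in enumerate(positions):
        # Compare against the nearest SNV on each side.
        near_prev = i > 0 and pos - positions[i - 1] <= window
        near_next = i + 1 < len(positions) and positions[i + 1] - pos <= window
        if not (near_prev or near_next):
            kept.append(pos)
    return kept
```

With positions 100, 105, 300, 900, 905, 908 and 2000 and a 10 bp window, only 300 and 2000 survive; the two clusters are discarded entirely.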

Step5: Consolidate the two SNV files into one file using the consolidateSNPs utility of REDHORSE. Each parental SNV file contains the loci where that parent differs from the reference genome. There may be loci where both parents differ from the reference genome but do not differ from each other. This utility removes the loci with SNVs common to both parents and generates a consolidated SNV file containing only the loci where the parents differ from each other. The details on this utility may be found here
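The consolidation logic can be sketched as follows. The data layout (a dictionary from position to a `(reference_base, variant_base)` pair) and the function name are assumptions for illustration. The sketch also assumes that a parent with no SNV at a locus carries the reference base there, which ignores loci that simply lack coverage.

```python
def consolidate_snvs(parent1, parent2):
    """Return {position: (parent1_base, parent2_base)} where the parents differ.

    parent1 and parent2 map position -> (reference_base, variant_base).
    Assumption: a parent without an SNV at a locus carries the reference base.
    """
    consolidated = {}
    for pos in sorted(set(parent1) | set(parent2)):
        # The reference base is recorded by whichever parent has the SNV.
        ref = (parent1.get(pos) or parent2.get(pos))[0]
        base1 = parent1[pos][1] if pos in parent1 else ref
        base2 = parent2[pos][1] if pos in parent2 else ref
        if base1 != base2:  # drop SNVs shared by both parents
            consolidated[pos] = (base1, base2)
    return consolidated
```

A locus where both parents show the same variant is dropped, while a locus where each parent shows a different variant, or only one parent departs from the reference, is kept.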

Step6: Generate a merged allele file using the findMergedAlleles utility of REDHORSE. This utility takes the loci from step 5 and generates a merged allele file that contains the allele information from the hybrids as well as the parents at those loci. Because this master file holds the alleles of all samples at the loci where the parents differ, it allows the hybrids to be compared directly against the parents. The details on this utility may be found here
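Conceptually, the merged allele file is a table with one row per locus and one column per sample. The sketch below builds such a table in memory; the function name, the input layout, and the use of "N" for a missing call are assumptions for illustration and do not describe the actual file format written by findMergedAlleles.

```python
def build_merged_alleles(loci, sample_alleles, missing="N"):
    """Build a locus-by-sample table of called bases.

    loci is an iterable of positions (for example, the step 5 loci).
    sample_alleles maps sample name -> {position: called base}.
    `missing` is a hypothetical placeholder for loci without a call.
    Returns (sample_names, rows), each row being [position, base, base, ...].
    """
    names = list(sample_alleles)
    rows = [
        [pos] + [sample_alleles[name].get(pos, missing) for name in names]
        for pos in loci
    ]
    return names, rows
```

Each row can then be read across: the parental columns say which base belongs to which parent, and the hybrid columns show which parental base each hybrid inherited.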

Step7: Keep only bi-allelic segments and remove the multi-allelic segments using the keepBiallelicSegments utility of REDHORSE. This utility takes the merged allele file generated in step 6, scans each locus to check whether it contains more than two alleles, and filters out the loci that do. The details on this utility may be found here
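The bi-allelic check on a single locus amounts to counting distinct bases. The following sketch assumes that missing calls are encoded as "N" and are ignored in the count; both the function name and that encoding are illustrative assumptions.

```python
def is_biallelic(alleles, missing="N"):
    """Return True if a locus shows at most two distinct called bases.

    alleles is the list of bases across all samples at one locus.
    `missing` is a hypothetical placeholder that is not counted as an allele.
    """
    observed = {allele for allele in alleles if allele != missing}
    return len(observed) <= 2
```

A locus with calls A, G, A, N, G is kept, whereas one with calls A, G, T is filtered out.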

Step8: Filter out loci with a large amount of missing data using the filterMergedSNPFile4MissingData utility of REDHORSE. Loci with many missing calls often come from noisy segments of the genome and must be filtered out before further analysis. Furthermore, they provide no clues about the occurrence of recombinations. The input to this utility is the merged allele file generated in step 7. The details on this tool may be found here
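A missing-data filter of this kind can be sketched as a simple fraction test. The cut-off of 0.2, the "N" placeholder, and the function name are assumptions for illustration; the text does not state which threshold the REDHORSE tool applies.

```python
def passes_missing_filter(alleles, max_missing_fraction=0.2, missing="N"):
    """Return True if the share of missing calls at a locus is acceptable.

    alleles is the list of bases across all samples at one locus.
    max_missing_fraction and `missing` are hypothetical choices.
    """
    if not alleles:
        return False  # nothing was called at this locus
    return alleles.count(missing) / len(alleles) <= max_missing_fraction
```

With ten samples, a locus with two missing calls passes at the 0.2 cut-off and one with three missing calls does not.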

Step9: Find conventional recombinations using the findConventionalRecombinations utility of REDHORSE. This algorithm uses a comparison-based approach: it scans potential recombinant loci with a fixed window, evaluates each potential breakpoint by comparing it against the other markers in the window, and tags it according to the majority rule. It removes potentially noisy loci and evaluates the breakpoints, which sometimes form common boundaries of conventional recombinations. The input to this utility is the filtered merged allele file generated in step 8. The details on this tool may be found here
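The idea of a fixed window with a majority rule can be illustrated with a simplified stand-in. The sketch below is not the findConventionalRecombinations algorithm: it assumes each marker of a hybrid has already been labelled with its parent of origin (1 or 2, or None when unknown), relabels every marker by majority vote over a window of neighbouring markers, and reports a breakpoint wherever the smoothed label changes. The function names and `half_window=2` are hypothetical.

```python
from collections import Counter

def smooth_by_majority(labels, half_window=2):
    """Relabel each marker with the majority label in its surrounding window."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half_window): i + half_window + 1]
        votes = Counter(label for label in window if label is not None)
        smoothed.append(votes.most_common(1)[0][0] if votes else None)
    return smoothed

def find_breakpoints(positions, labels, half_window=2):
    """Return (left_position, right_position) pairs that bracket each breakpoint."""
    smoothed = smooth_by_majority(labels, half_window)
    breakpoints = []
    for i in range(1, len(smoothed)):
        left, right = smoothed[i - 1], smoothed[i]
        if left is not None and right is not None and left != right:
            breakpoints.append((positions[i - 1], positions[i]))
    return breakpoints
```

In a run such as 1 1 1 2 1 1 1 2 2 2 2 2, the isolated 2 at the fourth marker is outvoted by its neighbours and treated as noise, so only the switch between the seventh and eighth markers is reported as a breakpoint.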

Step10: Find double crossovers using the findDoubleCrossovers utility of REDHORSE. Because the merged allele file contains the nucleotide information of the parents as well as the hybrids, together with the physical location of each marker, it not only facilitates comparison of the hybrids against the parents but also allows double crossovers to be detected with reasonable accuracy. The input to this utility is the filtered merged allele file generated in step 8. The details on this tool may be found here
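A double crossover shows up as a short stretch of markers from one parent flanked on both sides by markers from the other parent. The sketch below illustrates that pattern under stated assumptions and is not the findDoubleCrossovers implementation: markers are assumed to be pre-labelled by parent of origin, and the `min_markers` requirement (to avoid reporting a single noisy marker) is a hypothetical parameter.

```python
from itertools import groupby

def find_double_crossovers(positions, labels, min_markers=2):
    """Return (start_position, end_position, label) for each flanked segment.

    positions are the physical marker coordinates in genome order.
    labels give the parent of origin of each marker.
    min_markers is a hypothetical lower bound on the segment size.
    """
    # Run-length encode the labels as (label, first_index, last_index).
    runs, index = [], 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, index, index + length - 1))
        index += length

    events = []
    for prev, cur, nxt in zip(runs, runs[1:], runs[2:]):
        n_markers = cur[2] - cur[1] + 1
        # The flanks must agree with each other and differ from the segment.
        if prev[0] == nxt[0] and cur[0] != prev[0] and n_markers >= min_markers:
            events.append((positions[cur[1]], positions[cur[2]], cur[0]))
    return events
```

Because the physical positions are carried through, each reported event comes with the coordinates of its first and last marker, which bounds the size of the exchanged segment.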

Step11 (OPTIONAL): Convert the merged allele file to a multiple sequence alignment FASTA file using the convertMergedAllele2Fasta utility of REDHORSE. The multiple sequence alignment file can be used as input to other recombination detection algorithms. The input to this utility is the filtered merged allele file generated in step 8. The details on this tool may be found here
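The conversion is essentially a transpose: each sample's bases across all loci are concatenated into one sequence, and every sequence has the same length because all samples share the same loci. The sketch below shows this under assumptions: the function name, the in-memory row layout, the "N" placeholder for missing calls, and the 60-character line width are illustrative and do not describe the convertMergedAllele2Fasta file formats.

```python
def merged_alleles_to_fasta(sample_names, rows, missing="N", line_width=60):
    """Render a locus-by-sample allele table as aligned FASTA text.

    rows holds one tuple of bases per locus, in the order of sample_names.
    Empty or None entries are written as `missing` (a hypothetical choice).
    """
    lines = []
    for column, name in enumerate(sample_names):
        sequence = "".join(row[column] if row[column] else missing for row in rows)
        lines.append(f">{name}")
        # Wrap long sequences at a fixed width, as is common for FASTA.
        lines.extend(
            sequence[i:i + line_width] for i in range(0, len(sequence), line_width)
        )
    return "\n".join(lines) + "\n"
```

Three loci and three samples therefore produce three records of three bases each, one record per sample.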