From: Analysing complex Triticeae genomes – concepts and strategies

Schematic representation of the orthologous group assembly workflow and of the protocol for estimating gene copy number. Grey boxes represent the protein sequence of an orthologous group representative, and lines connecting the boxes mark exon boundaries. Coloured boxes represent sequencing reads and assembled sequences. The colour code groups sequences that originate from the same genome, and lighter shading marks non-coding regions. A The orthologous group assembly algorithm sorts the raw sequencing reads into bins, one per orthologous gene representative, based on sequence similarity (BLASTX). Each sequence bin is then assembled separately with NEWBLER, an overlap graph assembler that first identifies overlapping sequence reads and then builds a consensus sequence from the overlap graph. B To estimate gene copy number, the orthologous assembly sequences are re-aligned to the corresponding template (BLASTX) and thereby ordered along its protein sequence. The alignments are converted into a position-specific hit-count profile that records, for each amino acid of the template protein, the number of distinct sub-assemblies mapped to it. Based on the cumulative coverage distribution of this profile, the final gene copy number is taken as the maximum number of distinct sub-assemblies that cover a defined proportion of the template gene.
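The read-binning step of panel A and the copy-number estimate of panel B can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes that BLASTX results have already been parsed into tuples, that each read is assigned to the template with its highest bit score, and that alignment coordinates are 1-based inclusive positions on the template protein. The NEWBLER assembly itself is not reproduced. The names `Hit`, `bin_reads`, `hit_count_profile`, `estimate_copy_number` and the `min_fraction` parameter are hypothetical. The figure does not state the actual proportion, so the 0.5 default is only a placeholder.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One BLASTX alignment of an assembled sequence against a template protein."""
    contig_id: str  # identifier of the sub-assembly (contig)
    start: int      # first aligned template residue (1-based, inclusive; assumed)
    end: int        # last aligned template residue (1-based, inclusive; assumed)


def bin_reads(hits: Iterable[tuple[str, str, float]]) -> dict[str, list[str]]:
    """Panel A: assign each read to the template with its highest BLASTX bit score.

    `hits` holds (read_id, template_id, bitscore) rows; the best-hit rule is an assumption.
    """
    best: dict[str, tuple[str, float]] = {}
    for read_id, template_id, bitscore in hits:
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (template_id, bitscore)
    bins: dict[str, list[str]] = defaultdict(list)
    for read_id, (template_id, _) in best.items():
        bins[template_id].append(read_id)
    return dict(bins)


def hit_count_profile(hits: Iterable[Hit], template_length: int) -> list[int]:
    """Panel B: count the distinct sub-assemblies aligned to each template residue."""
    covering: list[set[str]] = [set() for _ in range(template_length)]
    for hit in hits:
        lo, hi = sorted((hit.start, hit.end))
        # Clip to the template; a set ensures repeated HSPs of one contig count once.
        for pos in range(max(lo, 1), min(hi, template_length) + 1):
            covering[pos - 1].add(hit.contig_id)
    return [len(ids) for ids in covering]


def estimate_copy_number(profile: list[int], min_fraction: float = 0.5) -> int:
    """Largest k such that at least `min_fraction` of residues are covered by >= k contigs.

    `min_fraction` stands in for the "defined proportion"; 0.5 is a hypothetical default.
    """
    if not profile:
        return 0
    copy_number = 0
    for k in range(1, max(profile) + 1):
        # Fraction of template residues covered by at least k distinct sub-assemblies;
        # it can only shrink as k grows, so stop at the first k that falls short.
        covered = sum(1 for count in profile if count >= k) / len(profile)
        if covered < min_fraction:
            break
        copy_number = k
    return copy_number


if __name__ == "__main__":
    print(bin_reads([("r1", "A", 50.0), ("r1", "B", 80.0), ("r2", "A", 60.0)]))
    hits = [Hit("c1", 1, 10), Hit("c2", 1, 8), Hit("c2", 2, 5), Hit("c3", 3, 6), Hit("c4", 9, 10)]
    profile = hit_count_profile(hits, template_length=10)
    print(profile, estimate_copy_number(profile))
```

With the example hits, the profile is [2, 2, 3, 3, 3, 3, 2, 2, 2, 2]. The second `c2` alignment adds nothing because each sub-assembly is counted once per residue. Three distinct sub-assemblies cover only 40% of the template, so with `min_fraction=0.5` the estimate is 2 copies. Lowering the threshold to 0.4 would yield 3, which shows how strongly the result depends on the chosen proportion.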

