
Table 3 Additional helper scripts included in the DAWGPAWS package.

From: The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes

DAWGPAWS Script

Purpose

cnv_gff2game.pl

Converts GFF files to the game.xml format.

cnv_game2gff3.pl

Converts game.xml files to the GFF3 format.

batch_hardmask.pl

Given a directory of lowercase-masked (soft-masked) sequence files, this program replaces lowercase residues with an N or X to indicate masking.

dir_merge.pl

Given annotation results scattered across multiple directories, this program can merge the results into subdirectories in a single parent directory.

vennseq.pl

Given GFF annotation results from multiple methods, this program generates an Euler diagram of these features using the VennMaster program [55].

batch_findgaps.pl

This program will annotate gaps in the query sequences in the input directory.

clust_write_shell.pl

This program writes shell scripts to run DAWGPAWS in a cluster environment running the Platform LSF queuing system.

cnv_seq2dir.pl

Given a FASTA file containing multiple sequence records, this program generates a separate FASTA file for each record. The sequence files produced are named using the sequence ID in the FASTA header of the input file.

fasta_merge.pl

This program merges all FASTA files in a directory into a single FASTA file.

fasta_shorten.pl

This program shortens the FASTA header by limiting the header length or by splitting the header at a delimiting character. Some annotation programs limit the length of the FASTA header they accept, and this program allows input files to meet that limitation.

fetch_tenest.pl

Fetches multiple results from the PlantGDB TEnest server and converts the results to GFF.

gff_seg.pl

Given a GFF file that contains point or segment data, this will extract segments with score values that exceed a threshold value.

ltrstruc_prep.pl

Because the LTR_STRUC program runs only under the Windows environment, this program converts FASTA sequences from UNIX to DOS line endings and generates the files name and flist file required for LTR_STRUC.

seq_oligocount.pl

This program allows for the generation of a GFF file that counts the number of times an oligomer in the genomic contig occurs in a reference shotgun sequence database.
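
The hard-masking step described for batch_hardmask.pl can be illustrated with a minimal sketch. This is not the DAWGPAWS implementation, which is written in Perl and works on a directory of files. The function names `hardmask_sequence` and `hardmask_fasta` are hypothetical, and the sketch assumes that masked residues are exactly the lowercase characters on sequence lines and that header lines begin with `>`.

```python
def hardmask_sequence(seq: str, mask_char: str = "N") -> str:
    """Replace every lowercase residue with mask_char; leave other characters unchanged."""
    return "".join(mask_char if ch.islower() else ch for ch in seq)


def hardmask_fasta(text: str, mask_char: str = "N") -> str:
    """Hard-mask the sequence lines of FASTA text, leaving header lines untouched."""
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Header lines may legitimately contain lowercase text; do not mask them.
            out.append(line)
        else:
            out.append(hardmask_sequence(line, mask_char))
    return "\n".join(out) + "\n"


# Hypothetical soft-masked input: lowercase marks the masked residues.
example = ">contig1 soft-masked example\nACGTacgtACGT\nttttGGGG\n"
print(hardmask_fasta(example), end="")
```

Passing `mask_char="X"` gives the X-masked variant mentioned in the table. Which of the two characters a downstream annotation program expects depends on that program.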