Table 2 Details of the core genes used to assess the quality of de novo transcriptome assemblies

From: Next generation sequencing and de novo transcriptomics to study gene evolution

| Gene name | GenBank ID | Length (amino acids) | Average coverage |
| --- | --- | --- | --- |
| Late embryogenesis abundant (LEA) | X59700.1 | 104 | 174,360 |
| Oleosin (OLE) | X78679.1 | 183 | 9,894 |
| Aspartic proteinase (AP) | AB025359.2 | 509 | 1,561 |
| Pathogenesis related (PR) | AB091075.1 | 158 | 503 |
| Cysteine protease-1 (CP-1) | AB109186.1 | 461 | 126 |
| Serine/threonine protein kinase (PK) | AB090881.1 | 439 | 50 |

  1. Transcription levels in sunflower range from high (LEA) to low (PK). The GenBank ID is that of the amino acid sequence used as the tBLASTn query against each de novo transcriptome. The average coverage in the transcriptome assembled with a word size of 60 and the paired method indicates each gene's relative abundance. This control set was found to be appropriate for the Asteraceae. For reference, the most closely related sequences in Arabidopsis thaliana are LEA4-5 (At5g06760), an OLEOSIN family member (At3g01570), APA1 (At1g11910), MLP423 (At1g24020), RD21B (At5g43060), and a protein kinase (PK) superfamily member (At5g15080).
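
The tBLASTn check described in the note can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes BLAST+ is installed, that the query FASTA headers are the GenBank IDs from the table, and that hits are written in the default tabular format (`-outfmt 6`). The function names, the E-value cutoff, and the idea of scoring recovery as the fraction of each protein covered by its best single hit are all assumptions made for the example.

```python
# Core genes from Table 2: GenBank ID -> protein length in amino acids.
CORE_GENES = {
    "X59700.1": 104,    # LEA
    "X78679.1": 183,    # OLE
    "AB025359.2": 509,  # AP
    "AB091075.1": 158,  # PR
    "AB109186.1": 461,  # CP-1
    "AB090881.1": 439,  # PK
}


def tblastn_command(query_fasta, db, evalue="1e-10"):
    """Build a BLAST+ tblastn command line with tabular output.

    The E-value cutoff is an illustrative choice, not taken from the source.
    Pass the returned list to subprocess.run(...) to execute it.
    """
    return ["tblastn", "-query", query_fasta, "-db", db,
            "-evalue", evalue, "-outfmt", "6"]


def best_query_coverage(tabular_text, lengths=CORE_GENES):
    """Return, per core gene, the largest fraction of the protein
    covered by a single hit (0.0 if the gene has no hit).

    Default outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {gene: 0.0 for gene in lengths}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, qstart, qend = fields[0], int(fields[6]), int(fields[7])
        if qseqid not in lengths:
            continue
        fraction = (qend - qstart + 1) / lengths[qseqid]
        best[qseqid] = max(best[qseqid], fraction)
    return best
```

Under these assumptions, an assembly that recovers the low-abundance genes (CP-1, PK) at close to full length as well as the high-abundance ones (LEA, OLE) would score near 1.0 for every entry, while a gene missing from the assembly stays at 0.0.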