Table 1 Summary statistics of S. chinensis high-quality (HQ) isoforms and HQ isoforms post Cd-Hit datasets

From: Long read sequencing to reveal the full complexity of a plant transcriptome by targeting both standard and long workflows

 

| Statistic | HQ isoforms: Standard | HQ isoforms: Long | HQ isoforms post Cd-Hit: Standard | HQ isoforms post Cd-Hit: Long | HQ isoforms post Cd-Hit: Merged |
|---|---|---|---|---|---|
| Number of sequences | 262,206 | 216,879 | 106,568 | 103,904 | 167,866 |
| Total length (Mb) | 568 | 690 | 261 | 347 | 493 |
| Longest sequence (bp) | 8373 | 8991 | 8373 | 8991 | 9099 |
| Shortest sequence (bp) | 53 | 77 | 69 | 106 | 56 |
| Mean sequence length (bp) | 2166 | 3184 | 2451 | 3347 | 2939 |
| Median sequence length (bp) | 2094 | 3058 | 2365 | 3232 | 2873 |
| Number of sequences > 10^3 bp | 231,231 (88.2%) | 215,162 (99.2%) | 99,680 (93.5%) | 103,082 (99.2%) | 161,419 (96.1%) |
| GC content (%) | 42.2 | 41.6 | 41.4 | 41.0 | 41.1 |

  1. The HQ isoforms and HQ isoforms post Cd-Hit datasets are reported for the standard workflow (SW) and the long workflow (LW). The Merged column is the final Iso-Seq reference transcriptome, which combines the HQ isoforms post Cd-Hit datasets from the standard and long workflows (SW + LW).
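
The table does not say which tool produced these statistics. As a minimal sketch only, the same kinds of values could be computed from a list of isoform sequences as shown below. The function name `summarize`, its `threshold` parameter, and the output keys are hypothetical, and the GC content here is pooled over all bases, which is an assumption rather than the authors' stated method.

```python
from statistics import mean, median


def summarize(sequences, threshold=1000):
    """Return Table 1-style summary statistics for a list of nucleotide strings.

    Hypothetical helper for illustration; not the method used in the paper.
    """
    if not sequences:
        raise ValueError("no sequences given")

    lengths = [len(s) for s in sequences]
    total = sum(lengths)

    # Count G and C bases case-insensitively across all sequences (pooled GC).
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)

    # Sequences strictly longer than the threshold (10^3 bp in Table 1).
    above = sum(1 for n in lengths if n > threshold)

    return {
        "number_of_sequences": len(lengths),
        "total_length_bp": total,
        "longest_bp": max(lengths),
        "shortest_bp": min(lengths),
        "mean_length_bp": mean(lengths),
        "median_length_bp": median(lengths),
        "n_above_threshold": above,
        "pct_above_threshold": 100 * above / len(lengths),
        "gc_content_pct": 100 * gc / total if total else 0.0,
    }


if __name__ == "__main__":
    # Toy input with a small threshold so the example is easy to follow.
    stats = summarize(["GGCC", "AT", "ATGCAT"], threshold=3)
    for key, value in stats.items():
        print(f"{key}: {value}")
```

The rows of the table can be cross-checked the same way: the number of sequences multiplied by the mean length should approximate the total length, for example 262,206 × 2166 bp ≈ 568 Mb for the standard-workflow HQ isoforms.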