Table 1 Summary statistics of S. chinensis high-quality (HQ) isoforms and HQ isoforms post Cd-Hit datasets

From: Long read sequencing to reveal the full complexity of a plant transcriptome by targeting both standard and long workflows

	HQ Isoforms		HQ Isoforms post Cd-Hit
	Standard	Long	Standard	Long	Merged
Number of sequences	262,206	216,879	106,568	103,904	167,866
Total length (Mb)	568	690	261	347	493
Longest sequence (bp)	8,373	8,991	8,373	8,991	9,099
Shortest sequence (bp)	53	77	69	106	56
Mean sequence length (bp)	2,166	3,184	2,451	3,347	2,939
Median sequence length (bp)	2,094	3,058	2,365	3,232	2,873
Number of sequences > 10³ bp	231,231 (88.2%)	215,162 (99.2%)	99,680 (93.5%)	103,082 (99.2%)	161,419 (96.1%)
GC content (%)	42.2	41.6	41.4	41.0	41.1

The HQ isoforms and HQ isoforms post Cd-Hit datasets are reported separately for the standard workflow (SW) and the long workflow (LW). The Merged column is the final Iso-Seq reference transcriptome, which combines the HQ isoforms post Cd-Hit datasets from both workflows (SW + LW).
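
The rows of Table 1 are standard per-dataset sequence statistics. As a minimal sketch of how such values can be derived from a set of isoform sequences, the Python below computes the same quantities with the standard library only. It is not the authors' pipeline: the function name `summarize`, the `min_len` parameter, and the three toy sequences are assumptions made for illustration, and real input would be read from a FASTA file rather than hard-coded.

```python
from statistics import median


def summarize(seqs, min_len=1000):
    """Return Table 1 style summary statistics for a list of nucleotide strings.

    `summarize` and `min_len` are hypothetical names used for this sketch.
    """
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    # Count G and C bases case-insensitively across all sequences.
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    # Sequences strictly longer than the threshold (10^3 bp in Table 1).
    n_long = sum(1 for n in lengths if n > min_len)
    return {
        "n_sequences": len(lengths),
        "total_length_bp": total,
        "longest_bp": max(lengths),
        "shortest_bp": min(lengths),
        "mean_length_bp": total / len(lengths),
        "median_length_bp": median(lengths),
        "n_over_min_len": n_long,
        "pct_over_min_len": 100 * n_long / len(lengths),
        "gc_percent": 100 * gc / total,
    }


# Toy input (invented for illustration): sequences of 1200, 400, and 100 bp.
toy = ["ATGC" * 300, "GGCC" * 100, "AT" * 50]
stats = summarize(toy)
for name, value in stats.items():
    print(f"{name}: {value}")
```

GC content is computed here over the pooled bases of all sequences rather than as a mean of per-sequence percentages; the table does not state which convention was used, so this choice is also an assumption.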

ISSN: 1746-4811