Next generation sequencing and de novo transcriptomics to study gene evolution

Table 1 Assembly statistics

Species	Read length (bp)	Raw reads (2 × pairs)	Clean reads	Assembler (settings)	N50 (bp)	Contig count
H. annuus	101	2 × 21,404,702	40,742,686	CLC (ws60, paired)	482	59,530
A. montana	101	2 × 14,458,043	27,516,042	CLC (ws60, paired)	485	45,194
Z. haageana	101	2 × 38,382,090	64,649,107	CLC (autows, non-paired)	308	205,324
Z. haageana	101	2 × 38,382,090	64,649,107	CLC (ws60, paired)	435	80,460
Z. haageana	101	2 × 38,382,090	72,756,408	CLC (ws60, paired)	629	40,764
H. helianthoides	101	2 × 109,627,594	169,128,716	CLC (autows, non-paired)	305	443,800
H. helianthoides	101	2 × 109,627,594	169,128,716	CLC (ws60, paired)	497	151,272
H. helianthoides	101	2 × 109,627,594	200,130,791	CLC (ws60, paired)	496	162,563

Clean reads were assembled with CLC using two settings: automatic word size (autows, which selected a word size of 23) with reads treated as non-paired, and a word size of 60 (ws60) with reads treated as paired. For the Z. haageana and H. helianthoides datasets, clean-read counts are shown for quality filtering at two quality thresholds (q = 30 and q = 22). N50 is the contig length at which contigs of that length or longer together account for 50% of the total assembly length.
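The N50 definition in the caption can be made concrete with a short function. This is a minimal sketch, not the procedure used to produce Table 1: the helper name `n50` is hypothetical and the contig lengths in the example are invented. It uses the common convention that the running total must reach at least half of the assembly length; tools differ slightly in how they treat an exact tie at 50%.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper).

    Contigs are sorted from longest to shortest and their lengths summed
    until the running total reaches at least half of the total assembly
    length; the length of the contig at which that happens is the N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


# Toy assembly with invented contig lengths (not data from Table 1).
toy = [1000, 800, 600, 400, 200, 100]
print(n50(toy))  # total 3100, half 1550; 1000 + 800 = 1800 >= 1550, so N50 = 800
```

Because N50 depends only on the length distribution, an assembly with many short contigs (such as the autows rows in Table 1) can have a lower N50 even when it contains more total sequence.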
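Two pieces of arithmetic help when reading the quality-filtering columns. First, assuming q is a Phred-scaled base quality (the caption does not say so explicitly), a threshold q corresponds to a base-call error probability of 10^(-q/10), so q = 30 allows roughly 1 error in 1,000 bases and q = 22 roughly 1 in 158. Second, raw reads are listed as 2 × (number of pairs) while clean reads are counted individually, so the fraction of reads retained is clean / (2 × pairs). The helper names below are hypothetical; only the read counts come from Table 1.

```python
def phred_error_probability(q):
    """Base-call error probability implied by Phred quality q: P = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def retention(raw_pairs, clean_reads):
    """Fraction of raw reads kept after filtering (hypothetical helper).

    Table 1 lists raw reads as 2 x (number of read pairs), while clean reads
    are counted individually, so the denominator is 2 * raw_pairs.
    """
    return clean_reads / (2 * raw_pairs)


for q in (30, 22):
    p = phred_error_probability(q)
    print(f"q = {q}: error probability {p:.4f}, about 1 error in {1 / p:.0f} bases")

# Raw read pairs and the two clean-read counts for H. helianthoides, from Table 1.
raw = 109_627_594
for clean in (169_128_716, 200_130_791):
    print(f"{clean:,} clean reads = {retention(raw, clean):.1%} of raw reads")
```

The same calculation gives about 95% retention for H. annuus and A. montana; the table itself does not state which threshold was applied to those two species.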
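In de Bruijn graph assemblers, "word size" is the length k of the overlapping substrings (k-mers) into which each read is broken before the graph is built. The sketch below is a generic illustration of that decomposition, not CLC's implementation; the `kmers` helper and the synthetic read are hypothetical. It shows that a 101-base read, as in Table 1, yields 101 − k + 1 words: 79 at the autows word size of 23 and 42 at ws60.

```python
def kmers(read, k):
    """Split a read into its overlapping words (k-mers) of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


# A synthetic 101-base read (not real sequence data).
read = "ACGT" * 25 + "A"

for k in (23, 60):
    words = kmers(read, k)
    print(f"word size {k}: {len(words)} words per {len(read)}-base read")
```

Note that in Table 1 the word size and the paired/non-paired setting change together, so the table alone does not separate the effect of word size from the effect of using read pairing.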

ISSN: 1746-4811