Sub-dataset | #Positive sequences | #Negative sequences | Length category |
---|---|---|---|
Q1 | 1588 | 2045 | P1, N1 |
Q2 | 1596 | 2047 | P2, N2 |
Q3 | 1593 | 2050 | P3, N3 |
Q4 | 1365 | 1499 | P4, N4 |
Total (Full dataset) | 6142 | 7641 | - |
- The full dataset of positive and negative classes was partitioned into four sub-datasets, Q1, Q2, Q3 and Q4, based on the homogeneity of sequence length. For the Q1 sub-dataset, the sequence lengths of the positive and negative classes are P1 and N1, respectively, where P1 corresponds to 39 to 221 amino acids and N1 to 43 to 407 amino acids. The other sub-datasets are read in the same way.
- P1: 39 to 221 amino acids; P2: 221 to 363 amino acids; P3: 363 to 538 amino acids; P4: 538 to 1000 amino acids; N1: 43 to 407 amino acids; N2: 407 to 485 amino acids; N3: 485 to 607 amino acids; N4: 607 to 1000 amino acids
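
The length-based partitioning described in the notes above can be sketched as a simple range lookup. This is a minimal illustration, not the authors' procedure: the function name `assign_subdataset` and the constant names are hypothetical, and because adjacent ranges share a boundary value (for example, 221 appears in both P1 and P2), the sketch assumes that a sequence whose length equals a shared boundary goes to the lower sub-dataset.

```python
from bisect import bisect_left

# Upper length bounds (in amino acids) of P1..P4 and N1..N4, from the table notes.
POSITIVE_UPPER = [221, 363, 538, 1000]
NEGATIVE_UPPER = [407, 485, 607, 1000]
# Shortest sequence length reported for each class.
POSITIVE_MIN = 39
NEGATIVE_MIN = 43


def assign_subdataset(length: int, is_positive: bool) -> str:
    """Return 'Q1'..'Q4' for a sequence of the given length and class."""
    upper = POSITIVE_UPPER if is_positive else NEGATIVE_UPPER
    lower = POSITIVE_MIN if is_positive else NEGATIVE_MIN
    if length < lower or length > upper[-1]:
        raise ValueError(f"length {length} is outside the range covered by the dataset")
    # Assumption: a length equal to a shared boundary (e.g. 221) falls in the
    # lower sub-dataset; the source does not say how such ties were resolved.
    return f"Q{bisect_left(upper, length) + 1}"


if __name__ == "__main__":
    print(assign_subdataset(150, is_positive=True))   # positive sequence, within P1
    print(assign_subdataset(450, is_positive=False))  # negative sequence, within N2
```

Note that the same length can map to different sub-datasets depending on the class, since the positive and negative ranges were defined separately.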