Considerable interest exists in the ability to determine genotypes within species in a cost-effective manner. What counts as cost-effective depends principally on the desired outcome: when the outcome is a complete genotypic description of a single individual (for example, a human patient), the cost is largely defined by healthcare economics, which is the driving force behind initiatives to minimize the cost of whole-genome sequencing. For outcomes in the agricultural sector, for example those leading to the identification of genes responsible for desired agronomic traits, genotyping is applied to large populations rather than to single individuals, which considerably changes the economic considerations. Moreover, since downstream gene mapping and identification technologies are increasingly well established for different crop species, the resolution required of such genotyping platforms need not approach the single-nucleotide level provided by whole-genome sequencing. Consequently, the economics and practical applications of a genotyping technology in this setting are driven largely by cost per individual rather than cost per datum.
Microarray-based technologies for genotyping have become increasingly popular because they offer a highly multiplexed assay, which was immediately recognized as providing a low cost per data point. One of the earliest reports of microarray-based genotyping employed high-density whole-genome tiling arrays, produced by photolithographic synthesis (Affymetrix, Santa Clara, CA), for the simultaneous discovery and assay of DNA polymorphisms in yeast. In genotyping assays based on microarrays, allelic variations are detected as differential hybridization of labeled genomic DNA to individual probes, or sets of probes, covering identifiable genomic locations. Using this approach, a large number of single feature polymorphisms (SFPs) were identified between two laboratory strains of yeast. In this case, 3,714 markers were identified using microarrays comprising 157,112 overlapping 25-mers that spanned all annotated Saccharomyces cerevisiae open reading frames (ORFs). For the larger and more complex Arabidopsis genome, tiling arrays were not available, and hence the first experiments involved hybridization of labeled genomic DNA to Affymetrix AtGenome1 GeneChips, which were designed from the available expression-based ORF annotation. Despite this ORF-based focus, nearly 4,000 SFPs were identified between the Columbia (Col) and Landsberg erecta (Ler) accessions. In a subsequent study, more than 8,000 SFPs were identified using the ATH1 GeneChip, which comprises 22,500 probe sets representing approximately 24,000 genes.
High-density microarray platforms of this type provide a very large amount of information from single individuals, and are therefore ideally suited for polymorphism discovery or for genome-wide association studies [9, 10]. However, for genotyping populations, the economic utility of microarray genotyping platforms is a function not simply of the multiplexing level, but also of the costs associated with processing each sample. Affymetrix GeneChips have the conspicuous disadvantage of a high cost of production and hybridization per array, which limits their use in situations requiring the genotyping of large numbers of individuals, such as plant breeding. In contrast, the production of microarray slides through robotic printing of array elements is relatively inexpensive [12, 13]. For microarrays of this type, the array elements (probes) are either PCR amplicons or synthesized single-stranded oligonucleotides. Since very little DNA is needed for printing each element, the cost per element becomes vanishingly small once the initial cost of production has been met. A further cost saving is achieved because the microarrays are conventionally hybridized to mixed pairs of nucleic acid targets, separately labeled with different fluorochromes, rather than to one target per hybridization as is done with Affymetrix GeneChips.
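The distinction between cost per individual and cost per datum can be made concrete with a small calculation. The sketch below is illustrative only: the function names, the simple additive cost model, and every numeric value are assumptions introduced here, not figures from this study or from any vendor.

```python
def cost_per_individual(array_cost, processing_cost, samples_per_array=1):
    """Cost of genotyping one individual under a simple additive model.

    array_cost        -- cost of one array or slide (hypothetical units)
    processing_cost   -- labeling and hybridization cost per array
    samples_per_array -- 2 for a two-color hybridization of paired targets
    """
    return (array_cost + processing_cost) / samples_per_array


def cost_per_datum(array_cost, processing_cost, markers_per_array,
                   samples_per_array=1):
    """Cost of a single marker score for one individual."""
    per_individual = cost_per_individual(array_cost, processing_cost,
                                         samples_per_array)
    return per_individual / markers_per_array


# Placeholder values chosen only to illustrate the trade-off.
high_density = dict(array_cost=400.0, processing_cost=100.0,
                    markers_per_array=100_000, samples_per_array=1)
printed = dict(array_cost=20.0, processing_cost=30.0,
               markers_per_array=1_000, samples_per_array=2)

for name, platform in (("high-density", high_density), ("printed", printed)):
    per_ind = cost_per_individual(platform["array_cost"],
                                  platform["processing_cost"],
                                  platform["samples_per_array"])
    per_dat = cost_per_datum(**platform)
    print(f"{name}: {per_ind:.2f} per individual, {per_dat:.4f} per datum")
```

With these placeholder inputs, the high-density platform is cheaper per datum but far more expensive per individual, which is the situation that matters when a whole population must be genotyped.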
Diversity array technology (DArT) is a modification of the amplified fragment length polymorphism (AFLP) procedure using a microarray platform [16–18]. In DArT, a pool of DNA fragments is produced from a subset of the genome by restriction enzyme digestion of genomic DNA, followed by ligation of adaptors and PCR amplification with adaptor-specific primers. Fragments from this pool of DNA are cloned and spotted on a microarray. Pools of target DNA are similarly generated from other samples, fluorescently labeled, and hybridized to the arrays. The assay reveals whether the specific cloned DNA fragments are present in the queried sample. An advantage of the DArT technology is that prior genome sequence information is not required; it can therefore be applied to a wide range of species. A disadvantage is that, as with AFLP, the differential PCR amplification of specific fragments may vary between experiments depending on PCR conditions. Another disadvantage is that the sequences and precise genomic locations of the cloned fragments are not known. With DArT, it is therefore difficult to target specific genes or genomic regions with higher densities of markers.
Here we describe and validate a method for cost-effective genotyping using printed microarrays comprising single-stranded oligonucleotide array elements. The microarrays were designed to recognize known polymorphic sequences. Each oligonucleotide probe corresponds to an insertion/deletion (indel) polymorphism (i.e., an SFP) discovered through the alignment of whole-genome sequences. To ensure specificity of hybridization, the DNA sequences used as probes were selected for uniqueness, a uniform melting temperature, and a similar length (approximately 70 nucleotides). Rice (Oryza sativa) was selected because of the availability of whole-genome sequences for the highly divergent japonica (International Rice Genome Sequencing Project http://rgp.dna.affrc.go.jp/IRGSP/) and indica cultivars. We recognized that it should be relatively straightforward to employ genomic sequence alignment to identify polymorphisms. Further, rice has abundant mapping populations and germplasm collections to which the genotyping technology can be applied. Finally, rice is considered the most important agricultural crop worldwide, because it provides approximately 23% of the caloric requirements of humans and up to 60% of the calories in countries that rely on rice as the main staple. Because most rice improvement efforts occur in developing countries, a low-cost and robust method would be particularly important for breeding institutions with modest levels of research infrastructure.
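The three probe selection criteria named above (similar length, uniform melting temperature, uniqueness) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names, the tolerance values, the target melting temperature, and the sodium concentration are all hypothetical; the melting temperature uses one common variant of the salt-adjusted GC approximation for long oligonucleotides; and uniqueness is reduced to exact-match counting against a single reference string, whereas a real design would use a similarity search against the whole genome.

```python
import math


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def approx_tm(seq, na_molar=0.05):
    """Approximate melting temperature (degrees C) of a long oligonucleotide.

    Salt-adjusted GC approximation (one common variant; constants assumed):
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N
    """
    n = len(seq)
    return (81.5 + 16.6 * math.log10(na_molar)
            + 41.0 * gc_fraction(seq) - 675.0 / n)


def count_occurrences(seq, reference):
    """Count exact, possibly overlapping, occurrences of seq in reference."""
    count = 0
    start = reference.find(seq)
    while start != -1:
        count += 1
        start = reference.find(seq, start + 1)
    return count


def select_probes(candidates, reference, target_len=70, len_tol=5,
                  tm_target=70.0, tm_tol=5.0):
    """Return names of candidates passing length, Tm, and uniqueness checks.

    candidates -- dict mapping probe name to sequence
    reference  -- sequence in which each probe must occur exactly once
    All thresholds are hypothetical defaults chosen for illustration.
    """
    reference = reference.upper()
    selected = []
    for name, seq in candidates.items():
        seq = seq.upper()
        if abs(len(seq) - target_len) > len_tol:
            continue  # length too far from the ~70-nucleotide target
        if abs(approx_tm(seq) - tm_target) > tm_tol:
            continue  # melting temperature outside the uniform window
        if count_occurrences(seq, reference) != 1:
            continue  # absent from, or repeated in, the reference
        selected.append(name)
    return selected
```

Applying the cheap length and melting temperature checks before the uniqueness search keeps the expensive step confined to candidates that could be printed anyway.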
This low-cost, focused method of genotyping, using printed long-oligonucleotide microarrays, will be particularly useful for applications that require high-density molecular marker coverage of entire genomes for large numbers of samples. Such applications include quantitative trait locus (QTL) mapping, genetic diversity and population structure studies, association mapping, molecular breeding, polymorphism surveys, and marker-assisted selection. In this study, we describe the development, use, and validation of the genotyping microarrays. We apply them to assess the levels of polymorphism and the genetic relationships within a collection of diverse rice accessions, and to map a major gene conferring resistance to the rice blast pathogen (Magnaporthe grisea) in a segregating recombinant inbred line (RIL) population. Finally, this method of genotyping is general in scope and can be implemented in other species, provided that sufficient genomic sequences from multiple individuals are available for the identification of SFPs; we therefore also describe a bioinformatics pipeline that has been developed for this purpose.