Problems associated with Affymetrix GeneChip analysis of Arabidopsis gene expression
Analysis of Arabidopsis gene expression using Affymetrix GeneChips is complicated by difficulties with the associated Affymetrix data files. For example, the Affymetrix Arabidopsis annotation file does not contain GenBank identifiers, and the gene annotations in the Affymetrix tables quickly became outdated because the TIGR Arabidopsis genome annotation is continuously updated.
To address these problems, Microsoft Excel files containing tables of updated AGI identifiers plus gene, protein, and promoter sequences were created (Ghassemian et al., 2001; www-biology.ucsd.edu/labs/Schroeder/genechip.html). Similarly, analysis of the AtGenome1 GeneChip by the Sheen laboratory (http://genetics.mgh.harvard.edu/sheenweb) revealed three categories of discrepancy with the Arabidopsis genome data. Some BAC accessions contained protein prediction errors; these were corrected by performing BLAST searches with specific GeneChip sequences against the genome databases.
In addition, the MIPS and TIGR databases sometimes gave different AGI identifiers for the same gene sequences; when discrepancies arose, the TIGR genome database was used. Finally, some target sequences used in the GeneChip were too short and not unique, enabling more than one cDNA to bind to the probe.
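The rule of preferring TIGR when the two databases disagree can be sketched in a few lines of Python. This is only an illustration, not the Sheen laboratory's actual procedure: the function name, the probe set names, and the AGI identifiers are hypothetical, and falling back to MIPS when TIGR has no entry is an assumption not stated in the text.

```python
def resolve_agi(mips, tigr):
    """Merge two probe-to-AGI mappings, preferring TIGR on conflict.

    mips, tigr: dicts mapping a probe set name to an AGI identifier.
    Returns (resolved mapping, sorted list of probe sets that disagreed).
    """
    resolved = {}
    conflicts = []
    for probe in sorted(set(mips) | set(tigr)):
        mips_id = mips.get(probe)
        tigr_id = tigr.get(probe)
        if tigr_id is not None:
            # TIGR wins whenever it has an identifier for this probe set.
            resolved[probe] = tigr_id
            if mips_id is not None and mips_id != tigr_id:
                conflicts.append(probe)
        elif mips_id is not None:
            # Assumption: keep the MIPS identifier if TIGR has none.
            resolved[probe] = mips_id
    return resolved, conflicts


# Hypothetical example data, for illustration only.
mips_table = {"probe_A": "At1g01010", "probe_B": "At2g02020", "probe_C": "At3g03030"}
tigr_table = {"probe_A": "At1g01010", "probe_B": "At2g02025"}

resolved, conflicts = resolve_agi(mips_table, tigr_table)
```

Returning the list of conflicting probe sets alongside the merged table keeps a record of where the databases disagreed, so those cases can still be inspected by hand.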
The annotations of the Arabidopsis ATH1 GeneChip were recently improved to take account of these issues (Ghassemian et al., 2001). This analysis revealed that 22,132 of the 22,746 GeneChip probe and target sequences have either 100% identity and a match length of at least 50, or 98% identity and a match length equal to the length of the target sequence. Furthermore, 133 genes had different AGI identifiers in the Affymetrix list compared to the TIGR database.
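The two acceptance criteria quoted above can be written as a small filter. The following Python sketch is an illustration under stated assumptions, not the code used in the original analysis: the `Hit` class and its field names are hypothetical, and "98% identity" is read here as "at least 98%", which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of a GeneChip target sequence against the genome (hypothetical record)."""
    percent_identity: float  # e.g. 100.0 or 98.4
    match_length: int        # length of the aligned region
    target_length: int       # full length of the GeneChip target sequence


def passes_match_criteria(hit, min_length=50, relaxed_identity=98.0):
    """Return True if the hit meets either criterion described in the text."""
    # Criterion 1: perfect identity over a match of at least min_length.
    exact = hit.percent_identity == 100.0 and hit.match_length >= min_length
    # Criterion 2 (assumed to mean >= 98%): the match spans the whole target.
    full_length = (
        hit.percent_identity >= relaxed_identity
        and hit.match_length == hit.target_length
    )
    return exact or full_length


# Hypothetical hits, for illustration only.
hits = [
    Hit(100.0, 60, 200),   # perfect identity, long enough match
    Hit(100.0, 40, 200),   # perfect identity, match too short
    Hit(98.5, 200, 200),   # high identity over the full target
    Hit(98.5, 150, 200),   # high identity, but only a partial match
]
accepted = [passes_match_criteria(h) for h in hits]
```

Keeping the thresholds as keyword arguments makes it easy to see how the count of accepted sequences would change under a stricter or looser reading of the criteria.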
After reading these problems you'll probably need a cup of tea or coffee!




