BACKGROUNDSoybean (Glycine max, L. Merr.) is one of the world's most important crops, however, its complete genomic sequence has yet to be determined. Nonetheless, a large body of sequence information exists, particularly in the form of expressed sequence tags (ESTs). Herein, we report the use of the model organism Arabidopsis thaliana (thale cress) for which the entire genomic sequence is available as a framework to align thousands of short soybean sequences.RESULTSA series of JAVA-based programs were created that processed and compared 341,619 soybean DNA sequences against A. thaliana chromosomal DNA. A. thaliana DNA was probed for short, exact matches (15 bp) to each soybean sequence, and then checked for the number of additional 7 bp matches in the adjacent 400 bp region. The position of these matches was used to order soybean sequences in relation to the A. thaliana genome.CONCLUSIONReported associations between soybean sequences and A. thaliana were within a 95% confidence interval of e(-30)-e(-100). In addition, the clustering of soybean expressed sequence tags (ESTs) based on A. thaliana sequence was accurate enough to identify potential single nucleotide polymorphisms (SNPs) within the soybean sequence clusters. An EST, bacterial artificial chromosome (BAC) end sequence and marker amplicon sequence synteny map of soybean and A. thaliana is presented. In addition, all JAVA programs used to create this map are available upon request and on the WEB.