Identifying Promoter Sequence on Arabidopsis Genome by Prediction Method
The predicted promoter sequences were determined by
downloading the annotated coding sequences and the
chromosomal sequences from TAIR (http://arabidopsis.org). We then mapped the coding sequences
to the chromosomal sequences using BLAT. If the
upstream intergenic region of a gene was longer
than 3 kb, we retrieved the 3 kb of sequence immediately
upstream of the translation start site (ATG). Otherwise,
the entire intergenic region was taken as the promoter
of the downstream gene, so that no coding region of the
upstream gene was included.
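The length rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline actually used: the function names `promoter_interval` and `extract_promoter` are hypothetical, coordinates are assumed to be 0-based and half-open, the gene is assumed to lie on the plus strand, and an overlap with the upstream gene is assumed to yield an empty promoter.

```python
MAX_PROMOTER_BP = 3000  # the 3 kb cap described in the text


def promoter_interval(atg_pos, upstream_gene_end, max_len=MAX_PROMOTER_BP):
    """Return the promoter as a 0-based, half-open (start, end) interval.

    atg_pos           -- index of the A of the ATG (plus strand assumed)
    upstream_gene_end -- index one past the last base of the nearest
                         upstream gene (0 if there is none)
    """
    intergenic_len = atg_pos - upstream_gene_end
    if intergenic_len <= 0:
        # Assumption: overlapping genes leave no intergenic promoter.
        return atg_pos, atg_pos
    if intergenic_len > max_len:
        # Long intergenic region: keep only the 3 kb next to the ATG.
        start = atg_pos - max_len
    else:
        # Short intergenic region: use all of it, and nothing more,
        # so no coding sequence of the upstream gene is included.
        start = upstream_gene_end
    return start, atg_pos


def extract_promoter(chrom_seq, atg_pos, upstream_gene_end):
    """Slice the promoter sequence out of a chromosome string."""
    start, end = promoter_interval(atg_pos, upstream_gene_end)
    return chrom_seq[start:end]
```

For a gene on the minus strand, the same rule would be applied to the reverse complement of the chromosome with mirrored coordinates.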
Identifying Promoter Sequence on Arabidopsis Genome by Curation Method
The curated promoters were found by matching 13,181
full-length (FL) Arabidopsis cDNAs from the RIKEN institute
(http://pfgweb.gsc.riken.go.jp/pub_data/index.html)
with the 27,281 predicted promoters from AGRIS. The regions
upstream of the ATG of each open reading frame (ORF), taken
from the predicted promoters obtained from AGRIS, are called
"Upstream Regions". They contain the promoter region
and the corresponding 5' UTR of each gene. We
compared the first 30 bp of each FL cDNA to every
Upstream Region sequence, allowing no mismatches. From
the results obtained, we grouped unique matching pairs
(one to one) together and placed duplicated matches (one
to many) in clusters for further analysis of the
predicted position of the transcription start site. We
considered only the clusters in which the FL cDNAs
corresponded to a single transcription start site for
an Arabidopsis gene, with exact
recognition of the promoter region.
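The exact-match step can be illustrated with a small Python sketch. It is not the original implementation. The function names `find_exact_hits` and `single_tss_regions` are hypothetical, and sequences are assumed to be plain strings keyed by identifier. The grouping rule is one possible reading of the text: hits are grouped per Upstream Region, and a region is kept only when all of its matching FL cDNAs start at the same offset.

```python
from collections import defaultdict

PREFIX_LEN = 30  # first 30 bp of each FL cDNA, as in the text


def find_exact_hits(cdnas, upstream_regions, prefix_len=PREFIX_LEN):
    """Map each Upstream Region id to its list of (cdna_id, offset) hits.

    Both arguments are dicts of id -> sequence. Only exact matches of
    the cDNA prefix are reported (no mismatches allowed).
    """
    regions = {rid: seq.upper() for rid, seq in upstream_regions.items()}
    hits = defaultdict(list)
    for cdna_id, cdna_seq in cdnas.items():
        if len(cdna_seq) < prefix_len:
            continue  # too short to provide a full-length probe
        probe = cdna_seq[:prefix_len].upper()
        for region_id, region in regions.items():
            offset = region.find(probe)
            while offset != -1:  # record every occurrence in the region
                hits[region_id].append((cdna_id, offset))
                offset = region.find(probe, offset + 1)
    return dict(hits)


def single_tss_regions(hits):
    """Keep regions whose hits all share one offset (assumed single TSS)."""
    kept = {}
    for region_id, pairs in hits.items():
        offsets = {offset for _, offset in pairs}
        if len(offsets) == 1:
            kept[region_id] = offsets.pop()
    return kept
```

A region matched by one cDNA corresponds to a one-to-one pair in this reading, while a region matched by several cDNAs forms a cluster that is kept only if its members agree on the start position.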