Identifying Promoter Sequences in the Arabidopsis Genome by Prediction

The predicted promoter sequences were obtained by downloading the annotated coding sequences and the chromosomal sequences from TAIR (http://arabidopsis.org). We then mapped the coding sequences to the chromosomal sequences using BLAT. If the intergenic region upstream of a gene was longer than 3 kb, we retrieved the 3 kb of sequence immediately upstream of the start codon (ATG). Otherwise, the entire intergenic region was taken as the promoter of the downstream gene, so that no coding sequence of the upstream gene was included.
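The 3 kb rule can be sketched in Python as follows. This is a minimal illustration, not the AGRIS pipeline: it assumes the BLAT step has already yielded, for each gene, its strand, the 1-based chromosomal position of the A of its ATG, and the nearest coordinate of the upstream gene's coding region. The `GeneLocus` record and the `promoter_interval` and `promoter_sequence` helpers are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_PROMOTER_LEN = 3000  # the 3 kb cap described in the text

# Complement table used to reverse-complement minus-strand promoters.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass
class GeneLocus:
    """Hypothetical per-gene record produced after mapping CDSs to the chromosome.

    Coordinates are 1-based on the chromosome's plus strand.
    """
    strand: str                       # "+" or "-"
    atg: int                          # position of the A of the ATG start codon
    upstream_gene_end: Optional[int]  # nearest base of the upstream gene's coding region, or None


def promoter_interval(locus: GeneLocus, chrom_len: int) -> Tuple[int, int]:
    """Return the inclusive (start, end) promoter interval upstream of the ATG.

    Takes 3 kb when the intergenic region is longer than 3 kb; otherwise the
    whole intergenic region, so no upstream coding sequence is included.
    An empty interval (start > end) means the genes are adjacent or overlap.
    """
    if locus.strand == "+":
        # Upstream of a plus-strand gene lies at lower coordinates.
        end = locus.atg - 1
        boundary = locus.upstream_gene_end if locus.upstream_gene_end is not None else 0
        intergenic = max(0, end - boundary)
        return end - min(MAX_PROMOTER_LEN, intergenic) + 1, end
    # Upstream of a minus-strand gene lies at higher coordinates.
    start = locus.atg + 1
    boundary = (locus.upstream_gene_end if locus.upstream_gene_end is not None
                else chrom_len + 1)
    intergenic = max(0, boundary - start)
    return start, start + min(MAX_PROMOTER_LEN, intergenic) - 1


def promoter_sequence(chrom_seq: str, locus: GeneLocus) -> str:
    """Extract the promoter 5' to 3' on the gene's own strand."""
    start, end = promoter_interval(locus, len(chrom_seq))
    seq = chrom_seq[start - 1:end]  # 1-based inclusive -> Python slice
    if locus.strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq
```

For example, a plus-strand gene with its ATG at 5001 and an upstream gene ending at 1000 has a 4 kb intergenic region, so `promoter_interval` returns the capped 3 kb interval (2001, 5000). Clamping the intergenic length at zero keeps adjacent or overlapping genes from producing a negative-length promoter; how the original pipeline treated such genes is not stated.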

Identifying Promoter Sequences in the Arabidopsis Genome by Curation

The curated promoters were identified by matching 13,181 full-length (FL) Arabidopsis cDNAs from the RIKEN institute (http://pfgweb.gsc.riken.go.jp/pub_data/index.html) against the 27,281 predicted promoters from AGRIS. The regions upstream of the ATG of each open reading frame (ORF) in the AGRIS predicted promoter set are called "Upstream Regions"; each contains the promoter region and the corresponding 5' UTR of its gene. We compared the first 30 bp of each FL cDNA with every Upstream Region sequence, allowing no mismatches. We then grouped the unique (one-to-one) matches together and placed the multiple (one-to-many) matches in clusters for further analysis of the predicted transcription start site position. We retained only the clusters in which the FL cDNAs corresponded to a single transcription start site for an Arabidopsis gene, so that the promoter region could be identified exactly.
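The matching and grouping steps can be sketched in Python under stated assumptions: both sequence sets are held in dictionaries keyed by hypothetical identifiers, each Upstream Region is written 5' to 3' and ends immediately before the ATG, and the 5' end of a cDNA is taken as its transcription start site (TSS). The helpers `find_all`, `match_cdnas`, `split_unique`, and `single_tss_regions` are illustrative names, and the final filter is only one possible reading of the single-TSS criterion described above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

PREFIX_LEN = 30  # first 30 bp of each FL cDNA, matched with no mismatches

Hit = Tuple[str, int]  # (Upstream Region id, TSS distance upstream of the ATG)


def find_all(haystack: str, needle: str) -> List[int]:
    """Return every 0-based start of an exact, possibly overlapping, occurrence."""
    hits: List[int] = []
    i = haystack.find(needle)
    while i != -1:
        hits.append(i)
        i = haystack.find(needle, i + 1)
    return hits


def match_cdnas(cdnas: Dict[str, str], upstream: Dict[str, str]) -> Dict[str, List[Hit]]:
    """Compare each cDNA's 30 bp prefix with every Upstream Region (no mismatches)."""
    regions = {rid: seq.upper() for rid, seq in upstream.items()}
    matches: Dict[str, List[Hit]] = {}
    for cdna_id, cdna in cdnas.items():
        prefix = cdna[:PREFIX_LEN].upper()
        if len(prefix) < PREFIX_LEN:
            continue  # too short to supply a full 30 bp probe
        hits: List[Hit] = []
        for rid, region in regions.items():
            for pos in find_all(region, prefix):
                # Assumed: the region ends just before the ATG, so the cDNA's
                # 5' end (the putative TSS) lies len(region) - pos bases upstream.
                hits.append((rid, len(region) - pos))
        if hits:
            matches[cdna_id] = hits
    return matches


def split_unique(matches: Dict[str, List[Hit]]) -> Tuple[Dict[str, Hit], Dict[str, List[Hit]]]:
    """Separate one-to-one matches (a single hit) from one-to-many matches."""
    unique = {c: hits[0] for c, hits in matches.items() if len(hits) == 1}
    multiple = {c: hits for c, hits in matches.items() if len(hits) > 1}
    return unique, multiple


def single_tss_regions(matches: Dict[str, List[Hit]]) -> Dict[str, int]:
    """Keep regions whose cDNAs match only that region and agree on one TSS.

    Assumed reading of the single-TSS criterion; the original rules may differ.
    """
    tss_by_region: Dict[str, Set[int]] = defaultdict(set)
    ambiguous: Set[str] = set()
    for hits in matches.values():
        hit_regions = {rid for rid, _ in hits}
        for rid, dist in hits:
            tss_by_region[rid].add(dist)
            if len(hit_regions) > 1:
                ambiguous.add(rid)  # this cDNA also matches another region
    return {rid: next(iter(tss)) for rid, tss in tss_by_region.items()
            if len(tss) == 1 and rid not in ambiguous}
```

The brute-force `str.find` scan keeps the sketch short; for 13,181 cDNAs against 27,281 Upstream Regions, an index of 30-mers or a dedicated aligner would likely be far faster.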