Iterated Motif Searching

Abstract: Deciphering the regulatory logic of DNA sequences is currently one of molecular biology’s great puzzles. One common approach is to search the upstream DNA region near the transcription start sites of sets of genes that are up- or down-regulated together in response to a particular environmental stimulus. Here is one way to perform such a search in a relatively unbiased fashion. First Published: 12/30/12. Last Updated: 1/16/13.

Let’s say that you are interested in a single potential promoter region, and you expect that there is some sequence of DNA there that …. There are sets of other genes that are also upregulated in response to …, but this is the gene that you are most interested in.

One option is to …. There are two downsides to this:

1) There are all sorts of possible parameter choices you could use. Yes, you could fall back on the standard choices, but your situation is unique enough that several alternatives could be justified, and it’s hard to know which ones might lead to the best solution.
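Rather than committing to one parameter set, you can enumerate the combinations and run the search once per combination. The sketch below only builds the command lines; it assumes the MEME Suite `meme` flags `-dna`, `-revcomp`, `-mod`, `-nmotifs`, `-minw`, `-maxw` and `-oc` (check them against your installed version), and the input file name `promoters.fa` and the function name `meme_commands` are hypothetical.

```python
from itertools import product

def meme_commands(fasta, widths, modes, nmotifs_options, outroot="meme_runs"):
    """Build one MEME command line per combination of parameter choices.

    widths: list of (minw, maxw) pairs; modes: e.g. "zoops", "anr";
    nmotifs_options: how many motifs to ask for. Each run gets its own
    output directory so the results can be compared afterwards.
    """
    cmds = []
    for (minw, maxw), mod, nmotifs in product(widths, modes, nmotifs_options):
        outdir = f"{outroot}/w{minw}-{maxw}_{mod}_n{nmotifs}"
        cmds.append([
            "meme", fasta, "-dna", "-revcomp",
            "-mod", mod, "-nmotifs", str(nmotifs),
            "-minw", str(minw), "-maxw", str(maxw),
            "-oc", outdir,
        ])
    return cmds

cmds = meme_commands("promoters.fa", [(6, 12), (8, 20)], ["zoops", "anr"], [1, 3])
for cmd in cmds:
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)`. Keeping the commands as data, rather than running them immediately, makes it easy to log exactly which parameter combinations were tried.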



… You are particularly interested in this one promoter region.

There are so many promoters …, etc.

Randomly Simulated DNA Sequences With A Motif

Most of the time,

Some of the time, the majority of the overlaps map to the


2 mismatches in the rest of the motifs and no mismatches in the tested motif.
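A minimal sketch of the simulation setup described here, assuming uniform random background sequence and exactly one planted site per sequence: sequence 0 stands in for the promoter of interest and gets an exact copy of the motif, and every other sequence gets a copy with exactly `n_mismatches` substitutions. Passing `motif=None` plants nothing, which gives the no-motif control used for specificity. The function names and the example motif are hypothetical.

```python
import random

BASES = "ACGT"

def mutate(motif, n_mismatches, rng):
    """Return a copy of `motif` with exactly `n_mismatches` positions changed."""
    chars = list(motif)
    for i in rng.sample(range(len(motif)), n_mismatches):
        # Pick a base that differs from the current one, so each change counts.
        chars[i] = rng.choice([b for b in BASES if b != chars[i]])
    return "".join(chars)

def simulate_promoters(n_seqs, length, motif=None, n_mismatches=2, seed=None):
    """Simulate random DNA, optionally planting one motif copy per sequence.

    Returns (sequences, start positions); starts are None when nothing is planted.
    """
    rng = random.Random(seed)
    seqs, starts = [], []
    for i in range(n_seqs):
        bg = [rng.choice(BASES) for _ in range(length)]
        if motif is None:
            starts.append(None)
        else:
            start = rng.randrange(length - len(motif) + 1)
            # Exact copy in the tested sequence, mutated copies elsewhere.
            site = motif if i == 0 else mutate(motif, n_mismatches, rng)
            bg[start:start + len(motif)] = site
            starts.append(start)
        seqs.append("".join(bg))
    return seqs, starts

seqs, starts = simulate_promoters(5, 100, "TGACGTCA", n_mismatches=2, seed=1)
```

Recording the planted start positions is what later lets you check whether a reported motif actually lands on the planted site.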

[good ex]


[bad ex]
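Judging a run comes down to whether the reported sites overlap the planted ones. One way to score that is sketched below, assuming predicted sites are given per sequence as `(start, length)` pairs (or `None`) and that sharing at least `min_overlap` bases counts as a hit; that threshold and the function names are hypothetical choices, not the criterion used for the runs above.

```python
def overlap_bp(start_a, len_a, start_b, len_b):
    """Number of bases shared by the intervals [start, start + len)."""
    return max(0, min(start_a + len_a, start_b + len_b) - max(start_a, start_b))

def recovery_rate(predicted, planted, width, min_overlap=1):
    """Fraction of planted sites that a predicted site overlaps.

    predicted: per-sequence (start, length) tuples or None.
    planted: per-sequence planted start positions (None = nothing planted).
    width: length of the planted motif.
    """
    hits = total = 0
    for pred, true_start in zip(predicted, planted):
        if true_start is None:
            continue  # no planted site, so nothing to recover here
        total += 1
        if pred is not None and overlap_bp(pred[0], pred[1], true_start, width) >= min_overlap:
            hits += 1
    return hits / total if total else 0.0

print(recovery_rate([(10, 8), (50, 6), None], [12, 0, 30], width=8))
```

Raising `min_overlap` toward the motif width makes the criterion stricter, which matters when a tool reports a site that only clips the edge of the planted motif.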

Randomly Simulated Sequences With No Motif

Specificity is another important part of any




If we leave out one of the promoters and redo the search, the promoter is still found most of the time.
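The leave-one-out check can be sketched as: for each sequence, search only the others, then ask whether the motif they yield occurs in the held-out sequence. To keep the sketch self-contained, an exhaustive k-mer count stands in for MEME here; it is a much simpler technique, it ignores the reverse strand, and the function names are hypothetical. Only the leave-one-out bookkeeping is the point.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs(seq, motif, d):
    """True if `motif` occurs in `seq` with at most `d` mismatches."""
    k = len(motif)
    return any(hamming(seq[j:j + k], motif) <= d for j in range(len(seq) - k + 1))

def best_kmer(seqs, k, d):
    """Stand-in motif finder (not MEME): among all k-mers seen in `seqs`,
    return the one that occurs (within d mismatches) in the most sequences."""
    candidates = sorted({s[j:j + k] for s in seqs for j in range(len(s) - k + 1)})
    return max(candidates, key=lambda m: sum(occurs(s, m, d) for s in seqs))

def leave_one_out(seqs, k, d):
    """For each sequence, search the others and test the held-out one."""
    results = []
    for i, held_out in enumerate(seqs):
        rest = seqs[:i] + seqs[i + 1:]
        motif = best_kmer(rest, k, d)
        results.append(occurs(held_out, motif, d))
    return results

seqs = ["TTTTGATTACATTTT", "CCCGATTACACCCC", "GGGGGATTACAGG",
        "AAGATTACAAAAA", "ACACACACACACAC"]  # last one has no planted motif
print(leave_one_out(seqs, k=7, d=0))
```

In this toy input the held-out sequences that contain the shared site are recovered and the one without it is not; swapping MEME (or another tool) in for `best_kmer` leaves the loop unchanged.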

So, just OK sensitivity and not great specificity. But this seems

Next Steps

Somehow, we need to figure out how to improve the proportion of times that the “true” motif is found in the simulated environment. One possible extension would be to iterate over more of the relevant parameters when doing the searches. Another is to also iterate over motif-searching algorithms other than MEME, an approach that apparently improves prediction ability. Alas, this sort of search might also just be impossible.
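If several algorithms are run, their outputs still have to be combined. A simple position-level vote is sketched below, assuming each tool's predictions for one sequence come as `(start, length)` sites; the tool names and `consensus_positions` are placeholders, and published ensemble methods combine results in more sophisticated ways.

```python
from collections import Counter

def consensus_positions(predictions, min_votes):
    """predictions: {tool_name: [(start, length), ...]} for one sequence.

    Returns the sorted positions covered by sites from at least `min_votes`
    different tools.
    """
    votes = Counter()
    for sites in predictions.values():
        covered = set()
        for start, length in sites:
            covered.update(range(start, start + length))
        votes.update(covered)  # each tool votes at most once per position
    return sorted(pos for pos, n in votes.items() if n >= min_votes)

preds = {"meme": [(10, 6)], "tool_b": [(12, 6)], "tool_c": [(40, 5)]}
print(consensus_positions(preds, min_votes=2))
```

Requiring agreement from more tools trades sensitivity for specificity, which is the same tension the simulations above are meant to measure.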