[[ What is EMD? ]]

EMD (Ensemble Motif Discovery) is an ensemble (consensus) algorithm that identifies one or more frequent motifs among multiple sequences. The basic idea is to combine the motif predictions from multiple runs of multiple component algorithms into consensus motifs, which EMD reports as its prediction. The current version includes five component algorithms: AlignACE, BioProspector, MotifSampler, MDScan, and MEME. The first three are stochastic algorithms, while the last two are deterministic. EMD 1.0 is a parallel program: it requires a PBS system so that the component motif discovery programs can run on multiple input sequences in parallel, which can greatly reduce the overall running time.

 
[[ How to install EMD? ]]

Uncompress EMD.tar.gz into the installation directory.
Add the EMD directory to the system path.
Add the EMD directory to the Perl library path.
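
The three steps above might look like the following sketch for a Bourne-style shell. The install location ($HOME/tools) and the assumption that the archive unpacks into a directory named EMD/ are illustrative only; adjust both to your setup. PERL5LIB is the standard environment variable Perl consults for extra library directories.

```shell
# Hypothetical install location; change as needed.
INSTALL_DIR="${INSTALL_DIR:-$HOME/tools}"
# Assumes the archive unpacks into a top-level EMD/ directory.
EMD_DIR="$INSTALL_DIR/EMD"

mkdir -p "$INSTALL_DIR"

# Unpack only if the archive is in the current directory.
if [ -f EMD.tar.gz ]; then
    tar -xzf EMD.tar.gz -C "$INSTALL_DIR"
fi

# Make the EMD scripts callable by name.
export PATH="$EMD_DIR:$PATH"

# Let Perl find modules shipped in the EMD directory.
export PERL5LIB="$EMD_DIR${PERL5LIB:+:$PERL5LIB}"
```

To make the settings permanent, the two export lines can be added to your shell startup file.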
  
[[ How to run EMD? ]]

emdrunPX.pl runs all the component algorithms on the input data sets (in ./step5regulon). The number of component algorithms, the number of runs of each algorithm, and their command line options are all specified in a configuration file, e.g., emd.cfg.

emdMotif.pl combines the results from the multiple component algorithms and outputs the consensus result.

In the EMD directory:
1) $ mkdir test
2) $ cd test
3) $ mkdir step5regulon
4) Create input files in step5regulon/ (e.g., $ cp ../input/Ada.txt step5regulon/)
5) $ cp ../runmotif.sh .        # template PBS job file
6) $ cp ../emd.cfg .            # configuration file of the EMD algorithm
7) $ cp ../bg_seq/*.bg .        # background files of the component algorithms
8) $ emdrunPX.pl -f emd.cfg -w 15 -n 5
9) $ emdMotif.pl -f step5regulon/Ada.txt -c emd.cfg -n 5
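
Before launching the run, it can help to confirm that the required commands are reachable. The sketch below is not part of EMD; check_tools is a hypothetical helper. qsub is the usual PBS job submission command, but that EMD invokes it directly is an assumption.

```shell
# Report any of the given commands that cannot be found on PATH.
# Returns 0 when all are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# qsub is assumed here as the PBS entry point; adjust for your site.
check_tools perl qsub emdrunPX.pl emdMotif.pl \
    || echo "fix PATH before running EMD" >&2
```

If any tool is reported missing, revisit the installation steps.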


[[ Example outputs ]]

Motif 0:
GACTTGTAAACCTAA 0 21 15
GACTTGTAAACCAAA 1 20 15
TTACAAGTCTACACC 1 51 15

Motif 1:
ATTCGGTGTAGACTT 0 11 15
TTTACAAGTCGATTA 0 51 15
TTTAGGTTTACAAGT 1 44 15

Motif 2:
AGACTTGTAAACCTA 0 20 15
CGACTTGTAAACCAA 1 19 15
TTTACAAGTCTACAC 1 50 15

Motif 3:
ACTTGTAAACCTAAA 0 22 15
ACTTGTAAACCAAAT 1 21 15
AAGTCTACACCGAAT 1 55 15

Motif 4:
CTTGTAAACCTAAAT 0 23 15
AACCAAATTGAAAAG 1 28 15
AAGTCTACACCGAAT 1 55 15
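
A quick way to sanity-check output of this shape is sketched below. The column meanings are not documented here, so the script assumes only what is visible in the sample: "Motif N:" headers followed by lines of four whitespace-separated fields, where the last field equals the length of the site sequence. summarize_motifs is a hypothetical helper, not part of EMD.

```shell
# Sample copied from the first two motifs of the example output.
sample='Motif 0:
GACTTGTAAACCTAA 0 21 15
GACTTGTAAACCAAA 1 20 15
TTACAAGTCTACACC 1 51 15

Motif 1:
ATTCGGTGTAGACTT 0 11 15
TTTACAAGTCGATTA 0 51 15
TTTAGGTTTACAAGT 1 44 15'

# Count the site lines under each motif header and flag any line whose
# last field differs from the length of its sequence.
summarize_motifs() {
    awk '
        /^Motif [0-9]+:/ {
            if (name != "") print name, sites
            name = $1 " " $2; sites = 0; next
        }
        NF == 4 {
            sites++
            if (length($1) != $4) print "length mismatch: " $0
        }
        END { if (name != "") print name, sites }
    '
}

summary=$(printf '%s\n' "$sample" | summarize_motifs)
printf '%s\n' "$summary"
```

For the sample above this prints one line per motif with its site count ("Motif 0: 3" and "Motif 1: 3"). To check a real result, pipe the saved output through summarize_motifs instead of the sample.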
