Version 4.

Requirements
------------

 * gcc
 * python2, scipy, matplotlib
 * latex (for plots)

Datasets
--------

One must obtain:

 * Kevin Chen's binding site location dataset, placed in the folder
   rawData/chen. Then run "grep '^[0-9]' all_sites_sep_cerevisiae >sites.txt"
 * A copy of the SGD data, placed in rawData/sgd
   (Files: S288C_reference_sequence_R53-1-1_20060414.fsa, and
    SGD_features.tab, taken from SGD_features.tab.200604)
 * the file rawData/totIntergenic, obtained from SGD's NotFeature.fasta    
   by concatenating all sequences, then adding the reverse complement 
 * the MITOMI dataset, placed in rawData/mitomi
 * the yeast Deletion dataset in rawData/Deletion, 
   http://www-deletion.stanford.edu/YDPM/Download_Data/Yeast_Deletion_Project/
 * in paradoxus/cerevisiae, put orf_coding.fasta from SGD, and in 
   paradoxus/paradoxus, put orf_genomic_1000.fasta from SGD paradoxus files.
 * clustalw2 and paml4.7, placed in the paradoxus folder
 * expression data in two-column (gene, expr) format, placed in
   rawData/transcriptome, from
   http://younglab.wi.mit.edu/expression/transcriptome.html
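
The totIntergenic step above (concatenate all NotFeature.fasta sequences, then
add the reverse complement) can be sketched as follows. This is only an
illustration, not the script used for the paper: the function names are
hypothetical, and it assumes that "adding" means appending the reverse
complement of the concatenated sequence to its end, that the result is written
as a single plain sequence, and that any non-ACGT symbol maps to N.

```python
# Sketch: build a totIntergenic-style sequence from FASTA lines.
# All names here are illustrative; the output layout is an assumption.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def read_fasta_sequences(lines):
    """Yield each record's sequence as one uppercase string, skipping headers."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if it had any sequence.
            if chunks:
                yield "".join(chunks).upper()
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def reverse_complement(seq):
    """Return the reverse complement; unknown symbols become N (assumption)."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def build_tot_intergenic(lines):
    """Concatenate all records, then append the reverse complement."""
    forward = "".join(read_fasta_sequences(lines))
    return forward + reverse_complement(forward)


# Tiny in-memory example with two records.
example = [">r1", "ACG", "T", ">r2", "GGA"]
print(build_tot_intergenic(example))
```

To apply it to the real file, pass an open file handle for NotFeature.fasta to
build_tot_intergenic and write the returned string to rawData/totIntergenic.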


Procedure
---------

Make sure that the 'python' directory is in your PYTHONPATH.

 * run RUN.sh in rawData, to process the raw data
 * run RUN.sh in processedData
 * run RUN.sh in paradoxus
 * run RUN.sh in EssSelCorr
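
The four steps above, together with the PYTHONPATH requirement, could be
driven from one small script. The sketch below is not part of the repository.
It assumes that the 'python' directory sits at the top level of the checkout,
that each RUN.sh is meant to be started from inside its own directory, and
that it can be run with 'sh' (use 'bash' instead if the scripts need it).

```python
import os
import subprocess

# Order taken from the procedure list above.
STAGES = ["rawData", "processedData", "paradoxus", "EssSelCorr"]


def build_env(repo_root, base_env=None):
    """Copy the environment and prepend <repo_root>/python to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    python_dir = os.path.join(repo_root, "python")  # assumed location
    existing = env.get("PYTHONPATH")
    if existing:
        env["PYTHONPATH"] = python_dir + os.pathsep + existing
    else:
        env["PYTHONPATH"] = python_dir
    return env


def stage_commands(repo_root, stages=STAGES):
    """Return (working_directory, command) pairs, one per stage, in order."""
    return [(os.path.join(repo_root, stage), ["sh", "RUN.sh"])
            for stage in stages]


def run_stages(repo_root, stages=STAGES):
    """Run each RUN.sh in its own directory, stopping at the first failure."""
    env = build_env(repo_root)
    for workdir, command in stage_commands(repo_root, stages):
        subprocess.check_call(command, cwd=workdir, env=env)
```

Calling run_stages with the path to the checkout runs the stages in the listed
order; check_call raises an exception if any RUN.sh exits with a non-zero
status, so later stages do not run on incomplete data.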

For the data fits:

These are set up to run on a cluster with a PBS queue.  Compile the files in
fitcode/C with 'make' on the cluster.  Copy rawData/alignedEM0.9 and
rawData/sites0.9 to the cluster, along with the scripts from either
fitcode/fit or fitcode/subsample.  Run setup.sh, which creates a 'data'
directory, then run './submitDir.sh data'.  Once the jobs are complete, copy
the data directory to either yeastfit/fit or subsamples/data, and then run
RUN.sh in that directory.

There are 'makeTable.py' scripts which make the tables shown in the paper. They
take the data directory as an argument and write html to standard output, which
should be redirected to "table.html" in the 'output' directory.


Credits
-------

Code written by Allan Haldane in 2014 in collaboration with Alex Morozov and
Michael Manhart.

Contact: ahalda@physics.rutgers.edu  (or allan.haldane@gmail.com)
         morozov@physics.rutgers.edu

Results published in:

Biophysical Fitness Landscapes for Transcription Factor Binding Sites
Allan Haldane, Michael Manhart, and Alexandre V. Morozov
PLoS Comp Biol 2014
