README

This package contains the current release of SUBWAI (SUBoptimal Weighted AlIgnment). SUBWAI is a threading-based protein
structure prediction program that includes SPAD, an estimator of alignment error.
For more details, please refer to our paper:

Estimating Quality of Template-Based Protein Models by Alignment Stability. Hao Chen and Daisuke Kihara.

If you use this work in a publication, please cite the paper above.

Content

This package contains the following files:
Example input and intermediate files:
  1aab-d1aab-1_20_1_1.seq, 1aab-d1aab-1_20_1_1.profile, 1aab-d1aab-1_20_1_1.fa, 1aab-d1aab-1_20_1_1.sable
  1hme-d1hme-1_20_1_1.seq, 1hme-d1hme-1_20_1_1.profile, 1hme-d1hme-1_20_1_1.fa, 1hme-d1hme-1_20_1_1.dssp
Programs and scripts:
  subwai, msasub-6.cpp, msa-1.pl, example.pl
Example output files:
  output1-test.dat, pathmatrix-test.dat, confidence.dat, confidence1.dat, confidence2-test.dat, confidence3.dat, sw.dat
Documentation:
  Readme.txt

Copyright

Copyright of this distribution belongs to Hao Chen & Daisuke Kihara. It is free for academic, non-profit institutions.
Commercial entities and government research labs should contact us (dkihara@purdue.edu) to obtain permission to use
this distribution. Redistribution of any files in this package without our permission is prohibited.

Usage

1. After downloading the tar file, extract its contents into your working directory for SUBWAI:
>gzip -d SPAD.tar.gz
>tar -xf SPAD.tar

2. Check your working directory. You will find two FASTA sequence files: 1aab-d1aab-1_20_1_1.seq and 1hme-d1hme-1_20_1_1.seq.
Below we show how to obtain the threading alignments between these two sequences and how to calculate the SPAD
(SuboPtimal Alignment Diversity) values for the alignment. For your own work, replace the example sequence files with
your own sequence files. You may need to adjust the following commands slightly to make them work in your local
environment.

3. Generate the profile for each example sequence file by PSIBLAST:
>/bio/liger3d6/CASP7/BLAST/blastpgp -d ../nr/nr -i 1aab-d1aab-1_20_1_1.seq -o 1aab-d1aab-1_20_1_1.profile -m 6 -j 5 -e 0.002 -h 0.002
>/bio/liger3d6/CASP7/BLAST/blastpgp -d ../nr/nr -i 1hme-d1hme-1_20_1_1.seq -o 1hme-d1hme-1_20_1_1.profile -m 6 -j 5 -e 0.002 -h 0.002
Replace the blastpgp path (and ../nr/nr) with the paths of your own PSI-BLAST installation and nr database. Do not change
the options "-m 6 -j 5 -e 0.002 -h 0.002"; otherwise msa-1.pl in the next step may not work.
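When you process many sequences, the same command has to be repeated for each file with the options left unchanged. The following C++ sketch only builds the command strings; blastpgpCommands is a hypothetical helper (not part of this package), it assumes the <name>.seq / <name>.profile naming used in this example, and the blastpgp and database paths are placeholders for your own installation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds one blastpgp command per base name, reading
// <name>.seq and writing <name>.profile. The options after -o are kept exactly
// as required by msa-1.pl.
std::vector<std::string> blastpgpCommands(const std::string& blastpgpPath,
                                          const std::string& nrDatabase,
                                          const std::vector<std::string>& baseNames) {
    assert(!blastpgpPath.empty() && !nrDatabase.empty());  // paths must be set
    std::vector<std::string> commands;
    commands.reserve(baseNames.size());
    for (const std::string& name : baseNames) {
        commands.push_back(blastpgpPath + " -d " + nrDatabase + " -i " + name +
                           ".seq -o " + name + ".profile -m 6 -j 5 -e 0.002 -h 0.002");
    }
    return commands;
}
```

Each returned string could, for example, be written to a shell script or passed to std::system() from <cstdlib>.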


4. Generate the amino acid frequency from the profile:
>./msa-1.pl 1aab-d1aab-1_20_1_1.profile
>./msa-1.pl 1hme-d1hme-1_20_1_1.profile
These two commands produce the frequency files 1aab-d1aab-1_20_1_1.fa and 1hme-d1hme-1_20_1_1.fa, respectively.
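msa-1.pl derives amino acid frequencies from the PSI-BLAST output; its exact procedure and the layout of the .fa file are not documented here. As a rough illustration of what a position-specific frequency is, the following C++ sketch computes, for each column of a set of aligned sequences, the fraction of each of the 20 standard amino acids. The function name, the amino acid order, the unweighted counting and the handling of gaps are all assumptions; msa-1.pl may weight sequences or treat gaps differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Column order of the 20 standard amino acids (an assumption for this sketch).
const std::string kAminoAcids = "ARNDCQEGHILKMFPSTWYV";

// Hypothetical helper: for each alignment column, returns the fraction of each
// standard amino acid among the residues in that column. Characters not in
// kAminoAcids (gaps, lowercase or non-standard letters) are ignored, and every
// sequence counts equally (no weighting, no pseudocounts).
std::vector<std::array<double, 20>> columnFrequencies(const std::vector<std::string>& alignedSeqs) {
    assert(!alignedSeqs.empty());
    const std::size_t length = alignedSeqs.front().size();
    std::vector<std::array<double, 20>> freqs(length);  // value-initialized to 0.0
    for (std::size_t col = 0; col < length; ++col) {
        double total = 0.0;
        for (const std::string& seq : alignedSeqs) {
            assert(seq.size() == length);  // all aligned rows must have equal length
            const std::size_t idx = kAminoAcids.find(seq[col]);
            if (idx == std::string::npos) continue;
            freqs[col][idx] += 1.0;
            total += 1.0;
        }
        if (total > 0.0) {
            for (double& f : freqs[col]) f /= total;  // counts -> relative frequencies
        }
    }
    return freqs;
}
```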

5. Generate the SABLE secondary structure prediction for the target sequence. Here 1aab-d1aab-1_20_1_1.seq is treated as
the target sequence, so its structure is assumed to be unknown and its secondary structure has to be predicted. First
copy the target sequence into your SABLE directory as data.seq:
>cp 1aab-d1aab-1_20_1_1.seq /bio/liger3d6/CASP7/sable_distr/data.seq
Then change your current directory to the SABLE directory and run:
>./run.sable
SABLE will generate two output files: OUT_SABLE_graph and OUT_SABLE_res. Copy OUT_SABLE_graph to your SUBWAI directory:
>cp /bio/liger3d6/CASP7/sable_distr/OUT_SABLE_graph /bio/liger3d8/chen177/test/time/1aab-d1aab-1_20_1_1.sable
Replace these paths with your own SABLE and SUBWAI directories. Finally, change your current directory back to your
SUBWAI directory.

6. Generate the secondary structure assignment file for the template sequence with DSSP. Here 1hme-d1hme-1_20_1_1.seq is
the template sequence, and its structure is provided by 1hme-d1hme-1_20_1_1.pdb in this package:
>/bio/liger3/chen177/SubOptimal/dsspcmbi 1hme-d1hme-1_20_1_1.pdb 1hme-d1hme-1_20_1_1.dssp
Replace the path with the location of the DSSP program (dsspcmbi) on your system.
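DSSP assigns each template residue one of eight secondary structure codes, whereas SABLE predicts three states (helix, strand, coil) for the target. How SUBWAI relates the two is not described here; the following C++ sketch shows one common 8-to-3 state reduction purely as an assumption, not as SUBWAI's own rule. The function names are hypothetical, and other reductions (for example, mapping G to coil) are also in use.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps one DSSP code to a 3-state label using one common
// convention: H, G, I -> 'H' (helix); E, B -> 'E' (strand); anything else
// (T, S, blank, ...) -> 'C' (coil).
char dsspToThreeState(char dsspCode) {
    switch (dsspCode) {
        case 'H': case 'G': case 'I': return 'H';
        case 'E': case 'B': return 'E';
        default: return 'C';
    }
}

// Applies the mapping to a string of per-residue DSSP codes.
std::string reduceToThreeState(const std::string& dsspCodes) {
    std::string result;
    result.reserve(dsspCodes.size());
    for (char c : dsspCodes) result.push_back(dsspToThreeState(c));
    assert(result.size() == dsspCodes.size());  // one label per residue
    return result;
}
```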

7. Compile SUBWAI:
>g++ msasub-6.cpp -o subwai

8. With all the required files in place, run SUBWAI to obtain the threading alignments and the SPAD values:
>./subwai -q 1aab-d1aab-1_20_1_1.seq -t 1hme-d1hme-1_20_1_1.seq -pq 1aab-d1aab-1_20_1_1.fa -pt 1hme-d1hme-1_20_1_1.fa -sq 1aab-d1aab-1_20_1_1.sable -st 1hme-d1hme-1_20_1_1.dssp -n test
The general form of the SUBWAI command is:
subwai -q <target> -t <template> -pq <target frequency> -pt <template frequency> -sq <target SABLE> -st <template DSSP> -n <name>
where
  -q   the target sequence file
  -t   the template sequence file
  -pq  the amino acid frequency file of the target (from msa-1.pl)
  -pt  the amino acid frequency file of the template (from msa-1.pl)
  -sq  the SABLE secondary structure prediction file of the target
  -st  the DSSP output file of the template
  -n   an arbitrary label used in the output file names; it does not affect the results and can be any word
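Because each option expects a different file, a small helper can assemble the command from the target and template names. The following C++ sketch (subwaiCommand is a hypothetical function, not part of this package) assumes the file-naming convention of this example: <name>.seq and <name>.fa for both proteins, <target>.sable for the SABLE prediction and <template>.dssp for the DSSP output. Note that -pt takes the template frequency (.fa) file.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a SUBWAI command line from the base names of the
// target and template, following the example's file-naming convention.
std::string subwaiCommand(const std::string& target, const std::string& templ,
                          const std::string& runName) {
    assert(!target.empty() && !templ.empty() && !runName.empty());
    return "./subwai -q " + target + ".seq -t " + templ + ".seq" +
           " -pq " + target + ".fa -pt " + templ + ".fa" +          // frequency files
           " -sq " + target + ".sable -st " + templ + ".dssp" +     // secondary structure
           " -n " + runName;                                        // output label
}
```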




Output

The program generates many output files, including output1.dat, pathmatrix.dat, confidence.dat, confidence1.dat,
confidence2.dat, confidence3.dat and sw.dat. Output1.dat is a plain-text file that records all optimal and suboptimal
target-template alignments, ranked by score: the first alignment is the best-scoring one, the second has the
second-highest score, and so on. Confidence2.dat is a plain-text file containing the SPAD value for each residue of the
target sequence: the first line gives the value for the first residue, the second line for the second residue, and so on.
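To use the per-residue SPAD values in your own analysis, the confidence2 file can be read line by line (in this package the example output is named confidence2-test.dat). The following C++ sketch assumes each line starts with one numeric value, in residue order, as described above; the function names are hypothetical and the exact layout should be checked against your own output.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader: collects the first number on each line, so that
// element i-1 holds the value for residue i. Lines without a leading number
// (e.g. empty lines) are skipped.
std::vector<double> readSpadValues(std::istream& in) {
    std::vector<double> values;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        double value;
        if (fields >> value) values.push_back(value);
    }
    return values;
}

// Returns the value for a 1-based residue number of the target sequence.
double spadAt(const std::vector<double>& values, std::size_t residue) {
    assert(residue >= 1 && residue <= values.size());
    return values[residue - 1];
}
```

For a file on disk, open it with std::ifstream (from <fstream>) and pass the stream to readSpadValues.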

Contact Us

For technical issues, please contact:
Hao Chen, Ph.D. candidate
Department of Biology, Purdue University
West Lafayette, IN, 47907, USA
Email: chen177@purdue.edu

For permission to use this distribution, please contact:
Daisuke Kihara, Assistant Professor
Department of Biology and Department of Computer Science, Purdue University
West Lafayette, IN, 47907, USA
Email: dkihara@purdue.edu


