README (11/26/2009)
===================

Gene Expression input file format
---------------------------------
CluFa accepts one gene expression file format:
(1) Tab-delimited format
    The first two rows contain header information.
    From the third row onwards, the first column is the gene symbol, the second column is the gene description, and the gene expression values start from the fourth column.
    See "yeastall_public.txt" for an example.
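
The format above can be read with a few lines of Java. The class below is a hypothetical sketch, not part of CluFa: ExpressionFileReader, GeneRow, parse and read are illustrative names, and treating an empty cell as a missing value (stored as NaN) is an assumption. It skips the two header rows, takes the gene symbol and description from the first two columns, reads the expression values from the fourth column onwards and ignores the third column.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the CluFa input format; not part of CluFa. */
public class ExpressionFileReader {

    /** One data row: gene symbol, gene description and expression values. */
    public static final class GeneRow {
        public final String symbol;
        public final String description;
        public final double[] values;

        GeneRow(String symbol, String description, double[] values) {
            this.symbol = symbol;
            this.description = description;
            this.values = values;
        }
    }

    static final int HEADER_ROWS = 2;        // the first two rows are headers
    static final int FIRST_VALUE_COLUMN = 3; // 0-based index of the fourth column

    /** Parses the lines of an expression file, header rows included. */
    public static List<GeneRow> parse(List<String> lines) {
        List<GeneRow> rows = new ArrayList<GeneRow>();
        for (int i = HEADER_ROWS; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().length() == 0) {
                continue; // ignore blank lines
            }
            // A limit of -1 keeps trailing empty cells, so columns stay aligned.
            String[] cols = line.split("\t", -1);
            int n = Math.max(0, cols.length - FIRST_VALUE_COLUMN);
            double[] values = new double[n];
            for (int j = 0; j < n; j++) {
                String cell = cols[FIRST_VALUE_COLUMN + j].trim();
                // Assumption: an empty cell is a missing value, stored as NaN.
                values[j] = cell.length() == 0 ? Double.NaN : Double.parseDouble(cell);
            }
            String description = cols.length > 1 ? cols[1] : "";
            rows.add(new GeneRow(cols[0], description, values));
        }
        return rows;
    }

    /** Reads an expression file from disk and parses it. */
    public static List<GeneRow> read(String path) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            in.close();
        }
        return parse(lines);
    }
}
```

For example, ExpressionFileReader.read("yeastall_public.txt") would return one GeneRow per data row, and values.length gives the number of expression values in that row.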

How to run the program
----------------------
To run CluFa, you must have Java 1.5 installed on the machine.
- To start CluFa, go to the command line and type the following:

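   java -Xms128M -Xmx512M -jar CluFa.jar geneexpfile taxonId numGenes numSamples

  where geneexpfile is the gene expression input file, taxonId is the NCBI taxonomy ID of the organism (see "TaxonId" below), numGenes is the number of genes in the file and numSamples is the number of samples (expression values per gene). The -Xms128M and -Xmx512M options set the initial and maximum Java heap size.
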
- Example:
  To run CluFa on Eisen's yeast dataset (yeastall_public.txt), use:
  
   java -Xms128M -Xmx512M -jar CluFa.jar yeastall_public.txt 4932 6221 80 [sgd][ent][mips]
   
  To run CluFa on Gasch's yeast dataset (gasch.txt), use:
    
   java -Xms128M -Xmx512M -jar CluFa.jar gasch.txt 4932 6152 93 [sgd][ent][mips]
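
numGenes and numSamples must describe the input file. For your own dataset, the hypothetical helper below (DatasetDimensions, count and countFile are illustrative names, not part of CluFa) suggests both values. It assumes that numGenes is the number of non-blank rows after the two header rows and that numSamples is the number of columns from the fourth column onwards in the first header row; check the result against your data before passing it to CluFa.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of CluFa: suggests numGenes and numSamples. */
public class DatasetDimensions {

    static final int HEADER_ROWS = 2;        // the first two rows are headers
    static final int FIRST_VALUE_COLUMN = 3; // 0-based index of the fourth column

    /** Returns {numGenes, numSamples} for the lines of an expression file. */
    public static int[] count(List<String> lines) {
        if (lines.isEmpty()) {
            return new int[] {0, 0};
        }
        // Assumption: the first header row has one label per column.
        int columns = lines.get(0).split("\t", -1).length;
        int numSamples = Math.max(0, columns - FIRST_VALUE_COLUMN);
        int numGenes = 0;
        for (int i = HEADER_ROWS; i < lines.size(); i++) {
            if (lines.get(i).trim().length() > 0) {
                numGenes++; // each non-blank data row counts as one gene
            }
        }
        return new int[] {numGenes, numSamples};
    }

    /** Reads a file from disk and returns {numGenes, numSamples}. */
    public static int[] countFile(String path) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            in.close();
        }
        return count(lines);
    }
}
```

DatasetDimensions.countFile("yeastall_public.txt") returns the two numbers in the order they appear on the command line.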
   
TaxonId
-------
Currently, CluFa supports yeast (4932) and human (9606) data. For human data, CluFa expects the gene symbols to be HUGO symbols.
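
A script or wrapper that prepares CluFa runs might check the taxonId argument before launching the JAR. The enum below is a hypothetical sketch (SupportedTaxon, fromArgument and requiresHugoSymbols are illustrative names, not CluFa's own code); it encodes only the two organisms and the HUGO requirement stated above.

```java
/** Hypothetical enum, not part of CluFa: the taxon IDs CluFa currently accepts. */
public enum SupportedTaxon {
    YEAST(4932, "Saccharomyces cerevisiae", false),
    HUMAN(9606, "Homo sapiens", true);

    private final int taxonId;
    private final String species;
    private final boolean requiresHugoSymbols;

    SupportedTaxon(int taxonId, String species, boolean requiresHugoSymbols) {
        this.taxonId = taxonId;
        this.species = species;
        this.requiresHugoSymbols = requiresHugoSymbols;
    }

    public int taxonId() {
        return taxonId;
    }

    public String species() {
        return species;
    }

    /** True when the gene symbols in the input file must be HUGO symbols. */
    public boolean requiresHugoSymbols() {
        return requiresHugoSymbols;
    }

    /** Maps a taxonId command-line argument to a supported organism. */
    public static SupportedTaxon fromArgument(String arg) {
        int id;
        try {
            id = Integer.parseInt(arg.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("taxonId must be a number: " + arg);
        }
        for (SupportedTaxon t : values()) {
            if (t.taxonId == id) {
                return t;
            }
        }
        throw new IllegalArgumentException(
            "Unsupported taxonId " + id + " (use 4932 for yeast or 9606 for human)");
    }
}
```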

Output
------
A successful run of CluFa generates three output files, named after the input gene expression file (geneexp below stands for the input file name):
- out-geneexp: the initial assignment of genes according to the GO annotation file
- multiple-out-geneexp: all genes assigned to the optimal cluster
- stat-out-geneexp: the genes identified as having new functions that differ from those in the original GO annotation


Directories
-----------
- "go_files": contains the GO data files ("go_200509-termdb-tables"), the GO Slim yeast file ("goslim_yeast.obo") and the Entrez Gene file ("gene_info")
- "annotation": contains the GO annotation files and "evidence_code.txt", which lists the probabilities for the various evidence codes

