Covariance Network Programming
Project supervisor: John Tavis, Ph.D.; firstname.lastname@example.org; 314-977-8893
Primary technical supervisor: Rajeev Aurora, Ph.D.; email@example.com; 314-977-8891
Secondary technical supervisor: Maureen Donlin, Ph.D.; firstname.lastname@example.org; 314-977-8858
Confidentiality: Some or all of the data, algorithms, or concepts associated with this work may be used to support patent applications. Therefore, all information associated with this project is strictly confidential and is not to be released to third parties without the explicit approval of Drs. Tavis and/or Aurora. This includes the information in this scope of work.
Scope of work: Write two computer programs under the direct supervision of Drs. Aurora and Donlin.
Program 1: Write a program that adapts an algorithm developed by Dr. Aurora for evaluating patterns in interaction networks derived from amino acid sequence alignments. The program will have adjustable input parameters and adjustable cut-points for data output. Inputs will be amino acid sequence alignments and/or summary network data produced by existing sequence alignment and network analysis programs; intermediate outputs will be a series of descriptive values for each alignment or network; and the final output will be the result of applying a decision tree to those descriptive values. Many of the key computational modules already exist, so much of this work will involve scripting them into an integrated data analysis pipeline. The program will be written in Python. It need not be polished to the level of a commercial-grade product, but it should be easy to use for anyone familiar with command-line Unix computing. Experience with CGI scripting is preferred.
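The final stage of Program 1, a decision tree applied to descriptive values with adjustable cut-points, might be organized roughly as follows. This is a generic sketch, not Dr. Aurora's algorithm: the descriptor names (`network_density`, `mean_covariance`), the leaf labels, the tree shape, the default cut-points, and the tab-separated input layout are all hypothetical placeholders. The point is only that each cut-point is exposed as a command-line option rather than hard-coded.

```python
import argparse
import csv
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical descriptor names; the real descriptors come from the pipeline.
FEATURES = ("network_density", "mean_covariance")


@dataclass
class Node:
    """A decision-tree node: an internal split (feature + cut-point) or a leaf (label)."""
    feature: Optional[str] = None
    cutpoint: float = 0.0
    below: Optional["Node"] = None         # branch taken when value < cutpoint
    at_or_above: Optional["Node"] = None   # branch taken when value >= cutpoint
    label: Optional[str] = None            # set only on leaves


def classify(node: Node, descriptors: Dict[str, float]) -> str:
    """Walk from the root to a leaf, comparing each descriptor with its cut-point."""
    while node.label is None:
        value = descriptors[node.feature]
        node = node.below if value < node.cutpoint else node.at_or_above
    return node.label


def build_tree(density_cut: float, covariance_cut: float) -> Node:
    """Build a hypothetical two-level tree whose cut-points are supplied by the user."""
    return Node(
        feature="network_density", cutpoint=density_cut,
        below=Node(label="class_A"),
        at_or_above=Node(
            feature="mean_covariance", cutpoint=covariance_cut,
            below=Node(label="class_B"),
            at_or_above=Node(label="class_C"),
        ),
    )


def main(argv=None):
    """Command-line entry point: classify each row of a tab-separated descriptor file."""
    parser = argparse.ArgumentParser(description="Apply a decision tree to network descriptors.")
    parser.add_argument("descriptors", help="TSV with columns id, network_density, mean_covariance")
    parser.add_argument("--density-cut", type=float, default=0.3, help="placeholder cut-point")
    parser.add_argument("--covariance-cut", type=float, default=0.5, help="placeholder cut-point")
    args = parser.parse_args(argv)
    tree = build_tree(args.density_cut, args.covariance_cut)
    with open(args.descriptors, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            values = {name: float(row[name]) for name in FEATURES}
            print(f"{row['id']}\t{classify(tree, values)}")
```

A wrapper script would call `main()`, for example `main(["descriptors.tsv", "--density-cut", "0.4"])`. Keeping the tree as plain data rather than nested `if` statements makes it straightforward to swap in the real tree once its structure is fixed.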
Program 2: Write a program that evaluates the proportion of each nucleotide at a given position in sequence trace files and determines how this variation could affect the coding potential of the nucleotide sequence. Inputs will be either primary sequence trace data in ABI sequencing file format or the per-position nucleotide proportions derived from an existing bioinformatics module. Outputs will be the probability of each amino acid that the codon could potentially encode. The program must be able to batch-process sequence traces and automatically determine the appropriate codon positions from an alignment against a reference sequence (the alignment itself will be performed with existing sequence modules). The program will be written in Python. It need not be polished to the level of a commercial-grade product, but it should be easy to use for anyone familiar with command-line Unix computing.
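The core calculation of Program 2, turning per-position nucleotide proportions into amino acid probabilities, could be sketched as below. Several assumptions are built in and are not stated in the scope of work: the standard genetic code (NCBI translation table 1) is used; proportions at each position are normalized to sum to 1; and the three codon positions are treated as independent, so a codon's probability is the product of its three per-position proportions. Independence is a simplification, because trace data alone do not show which bases occur together on the same template. The function names and the `cds_start` parameter (a 0-based reference coordinate for the first base of a contiguous coding sequence) are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid_probabilities(position_props):
    """Given three dicts (base -> proportion), return amino acid -> probability.

    Assumes the three codon positions vary independently.
    """
    if len(position_props) != 3:
        raise ValueError("expected proportions for exactly three codon positions")
    normalized = []
    for props in position_props:
        total = sum(props.get(base, 0.0) for base in BASES)
        if total <= 0:
            raise ValueError("each codon position needs a positive total proportion")
        normalized.append({base: props.get(base, 0.0) / total for base in BASES})

    probs = defaultdict(float)
    for codon, aa in CODON_TABLE.items():
        p = normalized[0][codon[0]] * normalized[1][codon[1]] * normalized[2][codon[2]]
        if p > 0:
            probs[aa] += p  # synonymous codons pool into one amino acid
    return dict(probs)


def codon_coordinates(ref_pos, cds_start):
    """Map a 0-based reference coordinate to (codon index, position within codon).

    Assumes an intron-free coding sequence beginning at cds_start.
    """
    offset = ref_pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence start")
    return divmod(offset, 3)
```

For example, a codon read as G, A, and an even A/T mixture at the third position gives `{"E": 0.5, "D": 0.5}` (GAA versus GAT). In a batch run, `codon_coordinates` would be applied to positions obtained from the reference alignment to group each trace's proportions into codon triplets before calling `amino_acid_probabilities`.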
Timeframe: Program 1 should be completed by early December 2010. Program 2 should be completed by the end of January 2011.