Outline

· Project Aim: 

· Develop a conditional random field (CRF) model that can assess protein functionality using a protein family.

· The protein family acts as a reference database for scoring new protein sequences for functionality.
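The outline does not specify how the family is turned into scores. One common option, assumed here purely for illustration, is a position-specific scoring matrix (PSSM) built from a multiple sequence alignment of the family. Each column's residue frequencies become log-odds scores, which could later serve as observation features for the CRF. This sketch assumes the family is already aligned to equal length, uses a uniform background over the 20 standard amino acids and a pseudocount of 1. `build_pssm` and `score_sequence` are hypothetical names, not part of any library.

```python
import math
from collections import Counter

# The 20 standard amino acids; gap characters such as "-" are simply not counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(family, pseudocount=1.0):
    """Per-column log-odds scores from an aligned family (uniform background assumed)."""
    length = len(family[0])
    if any(len(seq) != length for seq in family):
        raise ValueError("family sequences must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in family)
        total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            a: math.log((counts[a] + pseudocount) / total / background)
            for a in AMINO_ACIDS
        })
    return pssm


def score_sequence(pssm, seq):
    """Sum of per-position log-odds; higher means more family-like.

    Raises KeyError for residues outside AMINO_ACIDS.
    """
    if len(seq) != len(pssm):
        raise ValueError("query must have the same length as the alignment")
    return sum(column[residue] for column, residue in zip(pssm, seq))


family = ["ACDE", "ACDE", "ACDF"]
pssm = build_pssm(family)
print(score_sequence(pssm, "ACDE"), score_sequence(pssm, "WWWW"))
```

In a CRF setting, the per-position values `pssm[t][residue]` would more naturally be supplied as features at each position than collapsed into a single summed score.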

· What are graphical CRFs?

· More expressive than HMMs because they can use arbitrary, overlapping feature functions of the whole observation sequence.

· Undirected graphical model.

· Uses a single exponential model for the joint probability of the entire label sequence, conditioned on the observation sequence.

· Linear-chain CRFs, like HMMs, only model dependencies between adjacent labels, whereas general CRFs can impose dependencies between arbitrary labels.
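For a linear-chain CRF the single exponential model can be written as p(y | x) = exp(score(x, y)) / Z(x), where score(x, y) = Σ_t Σ_k λ_k f_k(y_{t-1}, y_t, x, t) and Z(x) sums exp(score) over every possible label sequence. The minimal sketch below assumes the weighted feature sums have already been collapsed into a unary table `emit[t][k]` (score of label k at position t) and a transition table `trans[i][j]` (score of label i followed by label j). The function names are hypothetical. It computes log Z(x) with the forward algorithm and checks it against brute-force enumeration.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emit, trans, labels):
    """Unnormalized log-score of one labeling: unary scores plus adjacent-label transitions."""
    score = emit[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return score


def log_partition(emit, trans):
    """log Z(x) with the forward algorithm: O(T * K^2) instead of K^T terms."""
    n_labels = len(emit[0])
    alpha = list(emit[0])  # alpha[j] = log-sum of scores of all prefixes ending in label j
    for t in range(1, len(emit)):
        alpha = [
            emit[t][j] + logsumexp([alpha[i] + trans[i][j] for i in range(n_labels)])
            for j in range(n_labels)
        ]
    return logsumexp(alpha)


def brute_force_log_partition(emit, trans):
    """Reference check: enumerate every labeling (only feasible for tiny inputs)."""
    n_labels = len(emit[0])
    return logsumexp([
        sequence_score(emit, trans, labels)
        for labels in product(range(n_labels), repeat=len(emit))
    ])


def log_prob(emit, trans, labels):
    """log p(y | x) for a linear-chain CRF."""
    return sequence_score(emit, trans, labels) - log_partition(emit, trans)


# Toy example: 3 positions, 2 labels.
emit = [[1.0, 0.0], [0.2, 0.5], [0.0, 1.5]]
trans = [[0.5, -0.3], [-0.2, 0.4]]
print(log_partition(emit, trans), brute_force_log_partition(emit, trans))
print(math.exp(log_prob(emit, trans, [0, 1, 1])))
```

Because `trans` only links positions t-1 and t, the sum over K^T labelings factorizes into the forward recursion; a general CRF with arbitrary label dependencies loses this chain structure and usually needs more expensive or approximate inference.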

· Applications of CRFs

· Natural language processing

· Parts-of-speech tagging

· Named-entity recognition

· Sequence prediction

· Gene prediction
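To make "feature functions" concrete for tagging tasks such as part-of-speech tagging or named-entity recognition, the sketch below builds observation features for each token as a dictionary. This dict-of-features style resembles what common linear-chain CRF toolkits accept, but the feature names and the helpers `token_features` and `sentence_features` are hypothetical. A CRF would pair each observation feature with the current label (and add transition features with the previous label). This is why features such as neighboring words are allowed, which an HMM's emission model cannot use directly.

```python
def token_features(tokens, i):
    """Observation features for position i; a CRF pairs each with the current label."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower=" + word.lower(): 1.0,
        "word.istitle": float(word.istitle()),
        "word.isdigit": float(word.isdigit()),
        "suffix3=" + word[-3:]: 1.0,
    }
    # Overlapping context features: the neighboring words are visible at every position.
    if i > 0:
        features["prev.lower=" + tokens[i - 1].lower()] = 1.0
    else:
        features["BOS"] = 1.0  # beginning of sequence
    if i < len(tokens) - 1:
        features["next.lower=" + tokens[i + 1].lower()] = 1.0
    else:
        features["EOS"] = 1.0  # end of sequence
    return features


def sentence_features(tokens):
    """One feature dictionary per token."""
    return [token_features(tokens, i) for i in range(len(tokens))]


feats = sentence_features(["Alice", "visited", "Paris"])
print(feats[0])
```

For gene prediction the same pattern would apply with, for example, nucleotide windows in place of words.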

· CRF options

· RNNSharp: CRFs based on recurrent neural networks

· CRF-ADF: Linear-chain CRFs with fast online ADF training

· CRFSharp: Linear-chain CRFs

· GCO: CRF with submodular energy functions

· DGM: General CRFs

· HCRF library: Hidden-state CRFs

· PyStruct: Structured learning and prediction library in Python

· Advantages

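· Flexible design: arbitrary, overlapping features of the observation sequence can be used

· No strict independence assumptions between observations, unlike HMMs

· Avoids the label bias problem of MEMMs

· Computes the conditional probability of the whole output sequence given the input, normalized globally rather than per position

· Captures the joint distribution of the output labels (conditioned on the input), so interdependent labels are predicted together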
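Because a CRF scores whole label sequences, the most probable labeling can be found exactly with Viterbi decoding in O(T * K^2) time instead of enumerating all K^T sequences. This sketch assumes the same hypothetical pre-summed tables as a linear-chain model: `emit[t][k]` for unary scores and `trans[i][j]` for adjacent-label transitions. It checks the result against brute force on a toy input.

```python
from itertools import product


def sequence_score(emit, trans, labels):
    """Unnormalized log-score: unary scores plus adjacent-label transition scores."""
    score = emit[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return score


def viterbi(emit, trans):
    """Highest-scoring label sequence in O(T * K^2) by dynamic programming."""
    n_labels = len(emit[0])
    delta = list(emit[0])  # delta[j] = best score of any prefix ending in label j
    backpointers = []
    for t in range(1, len(emit)):
        new_delta, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: delta[i] + trans[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] + trans[best_i][j] + emit[t][j])
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: delta[j])]
    best_score = delta[path[0]]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best_score


def brute_force_decode(emit, trans):
    """Reference check by enumerating all K^T labelings (tiny inputs only)."""
    n_labels = len(emit[0])
    return max(
        (list(labels) for labels in product(range(n_labels), repeat=len(emit))),
        key=lambda labels: sequence_score(emit, trans, labels),
    )


emit = [[1.0, 0.0], [0.2, 0.5], [0.0, 1.5]]
trans = [[0.5, -0.3], [-0.2, 0.4]]
print(viterbi(emit, trans)[0])  # → [0, 1, 1]
print(brute_force_decode(emit, trans))
```

The transition scores let the decision at one position influence its neighbors. In this toy input, label 1 at position 1 is chosen partly because it leads into the strongly scored label 1 at position 2.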

· Disadvantages

· Training is computationally expensive

· Difficult to retrain the model as new data becomes available
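A likely source of the training cost: maximum-likelihood training is typically done with gradient-based optimization. For each weight, the gradient equals the observed feature count minus its expected count under the current model. The expected counts require forward-backward over every training sequence, at O(T * K^2) per sequence, on every iteration. Since the objective covers the whole data set, adding new data generally means re-running this optimization, possibly warm-started from the old weights. The sketch below shows only the transition-weight gradient for one sequence, using the same hypothetical `emit`/`trans` tables as above. Unary feature weights would follow the same observed-minus-expected pattern with per-position marginals.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_backward(emit, trans):
    """Log-space forward (alpha) and backward (beta) tables plus log Z(x)."""
    T, K = len(emit), len(emit[0])
    alpha = [list(emit[0])]
    for t in range(1, T):
        alpha.append([
            emit[t][j] + logsumexp([alpha[t - 1][i] + trans[i][j] for i in range(K)])
            for j in range(K)
        ])
    beta = [[0.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            logsumexp([trans[i][j] + emit[t + 1][j] + beta[t + 1][j] for j in range(K)])
            for i in range(K)
        ]
    return alpha, beta, logsumexp(alpha[-1])


def transition_gradient(emit, trans, gold):
    """d log p(gold | x) / d trans[i][j] = observed count - expected count."""
    T, K = len(emit), len(emit[0])
    alpha, beta, log_z = forward_backward(emit, trans)
    grad = [[0.0] * K for _ in range(K)]
    for t in range(1, T):
        grad[gold[t - 1]][gold[t]] += 1.0  # observed transition in the gold labeling
        for i in range(K):
            for j in range(K):
                # Pairwise marginal P(y_{t-1} = i, y_t = j | x) from forward-backward.
                grad[i][j] -= math.exp(
                    alpha[t - 1][i] + trans[i][j] + emit[t][j] + beta[t][j] - log_z
                )
    return grad


emit = [[1.0, 0.0], [0.2, 0.5], [0.0, 1.5]]
trans = [[0.5, -0.3], [-0.2, 0.4]]
print(transition_gradient(emit, trans, [0, 1, 1]))
```

A real trainer would sum these gradients over all sequences, add a regularization term, and hand them to an optimizer such as L-BFGS or SGD, repeating until convergence.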