|
Week |
Subject |
Related Preparation |
1) |
Introduction to bioinformatics, brief history of bioinformatics, database applications in bioinformatics, web tools and services, pattern analysis, contribution of information theory
|
|
2) |
Introduction to machine learning, supervised, unsupervised, reinforcement and semi-supervised learning, no free lunch theorem, ugly duckling theorem, Occam's razor, overfitting, cross-validation, bootstrap |
|
3) |
Introduction to unsupervised learning, brief review of probability theory, probability density estimation, histogram approach, parametric approach, non-parametric approach: K-nearest neighbor, kernel approach
|
|
4) |
Dimension reduction, principal component analysis, independent component analysis, multi-dimensional scaling, application of the Sammon algorithm to gene data
|
|
5) |
Cluster analysis, hierarchical clustering, K-means clustering, fuzzy C-means, Gaussian mixture models, application of clustering algorithms to gene expression data
|
|
6) |
Self-organizing map (SOM), vector quantization (VQ), SOM structure, SOM learning algorithm, using SOM for classification, bioinformatics applications of VQ and SOM
|
|
7) |
Introduction to supervised learning, general concepts and definition, model evaluation, data organization |
|
8) |
Bayes rule for classification, minimum error rate classification, discriminant functions, Bayesian belief networks |
|
9) |
Linear and quadratic discriminant analysis, generalized discriminant analysis, K-nearest neighbors and application to gene data analysis |
|
10) |
Classification and regression trees, CART for compound pathway involvement prediction, random forest algorithm, analyzing gene expression profiles with random forest |
|
11) |
Feature selection, built-in strategy, lasso regression, ridge regression, partial least squares algorithm, exhaustive strategy |
|
12) |
Feature selection (cont'd), heuristic strategy: orthogonal least squares approach, criteria for feature selection, correlation measure, Fisher ratio measure, mutual information measure |
|
13) |
Feature extraction, biological data coding: molecular sequences, chemical compounds, sequence analysis |
|
14) |
Project presentations |
|
Course Notes: |
1. Machine Learning Approaches to Bioinformatics, Zheng Rong Yang, World Scientific Publishing Company, 2011.
|
References: |
1. Data Mining for Bioinformatics, Sumeet Dua, CRC Press, 2013.
2. Bioinformatics: The Machine Learning Approach, Pierre Baldi and Søren Brunak, 2nd edition, MIT Press, 2001.
3. Pattern Recognition and Machine Learning, Christopher M. Bishop, 2nd printing edition, Springer, 2011.
4. Pattern Recognition, Sergios Theodoridis, Konstantinos Koutroumbas, Academic Press, 4th edition, 2008.
5. Introduction to Pattern Recognition: A Matlab Approach, Sergios Theodoridis, Aggelos Pikrakis, Konstantinos Koutroumbas, Dionisis Cavouras, Academic Press, 2010.
6. Pattern Classification, Richard O. Duda, Peter E. Hart, David G. Stork, 2nd edition, Wiley-Interscience, 2000.
|