1 Basics for Bioinformatics
Xuegong Zhang,Xueya Zhou,and Xiaowo Wang
1.1 What Is Bioinformatics
1.2 Some Basic Biology
1.2.1 Scale and Time
1.2.2 Cells
1.2.3 DNA and Chromosome
1.2.4 The Central Dogma
1.2.5 Genes and the Genome
1.2.6 Measurements Along the Central Dogma
1.2.7 DNA Sequencing
1.2.8 Transcriptomics and DNA Microarrays
1.2.9 Proteomics and Mass Spectrometry
1.2.10 ChIP-Chip and ChIP-Seq
1.3 Example Topics of Bioinformatics
1.3.1 Examples of Algorithmatic Topics
1.3.2 Examples of Statistical Topics
1.3.3 Machine Learning and Pattern Recognition Examples
1.3.4 Basic Principles of Genetics
References
2 Basic Statistics for Bioinformatics
Yuanlie Lin and Rui Jiang
2.1 Introduction
2.2 Foundations of Statistics
2.2.1 Probabilities
2.2.2 Random Variables
2.2.3 Multiple Random Variables
2.2.4 Distributions
2.2.5 Random Sampling
2.2.6 Sufficient Statistics
2.3 Point Estimation
2.3.1 Method of Moments
2.3.2 Maximum Likelihood Estimators
2.3.3 Bayes Estimators
2.3.4 Mean Squared Error
2.4 Hypothesis Testing
2.4.1 Likelihood Ratio Tests
2.4.2 Error Probabilities and the Power Function
2.4.3 p-Values
2.4.4 Some Widely Used Tests
2.5 Interval Estimation
2.6 Analysis of Variance
2.6.1 One-Way Analysis of Variance
2.6.2 Two-Way Analysis of Variance
2.7 Regression Models
2.7.1 Simple Linear Regression
2.7.2 Logistic Regression
2.8 Statistical Computing Environments
2.8.1 Downloading and Installation
2.8.2 Storage, Input, and Output of Data
2.8.3 Distributions
2.8.4 Hypothesis Testing
2.8.5 ANOVA and Linear Model
References
3 Topics in Computational Genomics
Michael Q. Zhang and Andrew D. Smith
3.1 Overview: Genome Informatics
3.2 Finding Protein-Coding Genes
3.2.1 How to Identify a Coding Exon
3.2.2 How to Identify a Gene with Multiple Exons
3.3 Identifying Promoters
3.4 Genomic Arrays and aCGH/CNP Analysis
3.5 Introduction on Computational Analysis of Transcriptional Genomics Data
3.6 Modeling Regulatory Elements
3.6.1 Word-Based Representations
3.6.2 The Matrix-Based Representation
3.6.3 Other Representations
3.7 Predicting Transcription Factor Binding Sites
3.7.1 The Multinomial Model for Describing Sequences
3.7.2 Scoring Matrices and Searching Sequences
3.7.3 Algorithmic Techniques for Identifying High-Scoring Sites
3.7.4 Measuring Statistical Significance of Matches
3.8 Modeling Motif Enrichment in Sequences
3.8.1 Motif Enrichment Based on Likelihood Models
3.8.2 Relative Enrichment Between Two Sequence Sets
3.9 Phylogenetic Conservation of Regulatory Elements
3.9.1 Three Strategies for Identifying Conserved Binding Sites
3.9.2 Considerations When Using Phylogenetic Footprinting
3.10 Motif Discovery
3.10.1 Word-Based and Enumerative Methods
3.10.2 General Statistical Algorithms Applied to Motif Discovery
3.10.3 Expectation Maximization
3.10.4 Gibbs Sampling
References
4 Statistical Methods in Bioinformatics
Jun S. Liu and Bo Jiang
4.1 Introduction
4.2 Basics of Statistical Modeling and Bayesian Inference
4.2.1 Bayesian Method with Examples
4.2.2 Dynamic Programming and Hidden Markov Model
4.2.3 Metropolis-Hastings Algorithm and Gibbs Sampling
4.3 Gene Expression and Microarray Analysis
4.3.1 Low-Level Processing and Differential Expression Identification
4.3.2 Unsupervised Learning
4.3.3 Dimension Reduction Techniques
4.3.4 Supervised Learning
4.4 Sequence Alignment
4.4.1 Pair-Wise Sequence Analysis
4.4.2 Multiple Sequence Alignment
4.5 Sequence Pattern Discovery
4.5.1 Basic Models and Approaches
4.5.2 Gibbs Motif Sampler
4.5.3 Phylogenetic Footprinting Method and the Identification of Cis-Regulatory Modules
4.6 Combining Sequence and Expression Information for Analyzing Transcription Regulation
4.6.1 Motif Discovery in ChIP-Array Experiment
4.6.2 Regression Analysis of Transcription Regulation
4.6.3 Regulatory Role of Histone Modification
4.7 Protein Structure and Proteomics
4.7.1 Protein Structure Prediction
4.7.2 Protein Chip Data Analysis
References
5 Algorithms in Computational Biology
Tao Jiang and Jianxing Feng
5.1 Introduction
5.2 Dynamic Programming and Sequence Alignment
5.2.1 The Paradigm of Dynamic Programming
5.2.2 Sequence Alignment
5.3 Greedy Algorithms for Genome Rearrangement
5.3.1 Genome Rearrangements
5.3.2 Breakpoint Graph, Greedy Algorithm and Approximation Algorithm
References
6 Multivariate Statistical Methods in Bioinformatics Research
Lingsong Zhang and Xihong Lin
6.1 Introduction
6.2 Multivariate Normal Distribution
6.2.1 Definition and Notation
6.2.2 Properties of the Multivariate Normal Distribution
6.2.3 Bivariate Normal Distribution
6.2.4 Wishart Distribution
6.2.5 Sample Mean and Covariance
6.3 One-Sample and Two-Sample Multivariate Hypothesis Tests
6.3.1 One-Sample t Test for a Univariate Outcome
6.3.2 Hotelling's T² Test for the Multivariate Outcome
6.3.3 Properties of Hotelling's T² Test
6.3.4 Paired Multivariate Hotelling's T² Test
6.3.5 Examples
6.3.6 Two-Sample Hotelling's T² Test
6.4 Principal Component Analysis
6.4.1 Definition of Principal Components
6.4.2 Computing Principal Components
6.4.3 Variance Decomposition
6.4.4 PCA with a Correlation Matrix
6.4.5 Geometric Interpretation
6.4.6 Choosing the Number of Principal Components
6.4.7 Diabetes Microarray Data
6.5 Factor Analysis
6.5.1 Orthogonal Factor Model
6.5.2 Estimating the Parameters
6.5.3 An Example
6.6 Linear Discriminant Analysis
6.6.1 Two-Group Linear Discriminant Analysis
6.6.2 An Example
6.7 Classification Methods
6.7.1 Introduction of Classification Methods
6.7.2 k-Nearest Neighbor Method
6.7.3 Density-Based Classification Decision Rule
6.7.4 Quadratic Discriminant Analysis
6.7.5 Logistic Regression
6.7.6 Support Vector Machine
6.8 Variable Selection
6.8.1 Linear Regression Model
6.8.2 Motivation for Variable Selection
6.8.3 Traditional Variable Selection Methods
6.8.4 Regularization and Variable Selection
6.8.5 Summary
References
7 Association Analysis for Human Diseases: Methods and Examples
Jurg Ott and Qingrun Zhang
7.1 Why Do We Need Statistics
7.2 Basic Concepts in Population and Quantitative Genetics
7.3 Genetic Linkage Analysis
7.4 Genetic Case-Control Association Analysis
7.4.1 Basic Steps in an Association Study
7.4.2 Multiple Testing Corrections
7.4.3 Multi-Locus Approaches
7.5 Discussion
References
8 Data Mining and Knowledge Discovery Methods with Case Examples
S. Bandyopadhyay and U. Maulik
8.1 Introduction
8.2 Different Tasks in Data Mining
8.2.1 Classification
8.2.2 Clustering
8.2.3 Discovering Associations
8.2.4 Issues and Challenges in Data Mining
8.3 Some Common Tools and Techniques
8.3.1 Artificial Neural Networks
8.3.2 Fuzzy Sets and Fuzzy Logic
8.3.3 Genetic Algorithms
8.4 Case Examples
8.4.1 Pixel Classification
8.4.2 Clustering of Satellite Images
8.5 Discussion and Conclusions
References
9 Applied Bioinformatics Tools
Jingchu Luo
9.1 Introduction
9.1.1 Welcome
9.1.2 About This Web Site
9.1.3 Outline
9.1.4 Lectures
9.1.5 Exercises
9.2 Entrez
9.2.1 PubMed Query
9.2.2 Entrez Query
9.2.3 My NCBI
9.3 ExPASy
9.3.1 Swiss-Prot Query
9.3.2 Explore the Swiss-Prot Entry HBA_HUMAN
9.3.3 Database Query with the EBI SRS
9.4 Sequence Alignment
9.4.1 Pairwise Sequence Alignment
9.4.2 Multiple Sequence Alignment
9.4.3 BLAST
9.5 DNA Sequence Analysis
9.5.1 Gene Structure Analysis and Prediction
9.5.2 Sequence Composition
9.5.3 Secondary Structure
9.6 Protein Sequence Analysis
9.6.1 Primary Structure
9.6.2 Secondary Structure
9.6.3 Transmembrane Helices
9.6.4 Helical Wheel
9.7 Motif Search
9.7.1 SMART Search
9.7.2 MEME Search
9.7.3 HMM Search
9.7.4 Sequence Logo
9.8 Phylogeny
9.8.1 Protein
9.8.2 DNA
9.9 Projects
9.9.1 Sequence, Structure, and Function Analysis of the Bar-Headed Goose Hemoglobin
9.9.2 Exercises
9.10 Literature
9.10.1 Courses and Tutorials
9.10.2 Scientific Stories
9.10.3 Free Journals and Books
9.11 Bioinformatics Databases
9.11.1 List of Databases
9.11.2 Database Query Systems
9.11.3 Genome Databases
9.11.4 Sequence Databases
9.11.5 Protein Domain, Family, and Function Databases
9.11.6 Structure Databases
9.12 Bioinformatics Tools
9.12.1 List of Bioinformatics Tools at International Bioinformatics Centers
9.12.2 Web-Based Bioinformatics Platforms
9.12.3 Bioinformatics Packages to Be Downloaded and Installed Locally
9.13 Sequence Analysis
9.13.1 Dotplot
9.13.2 Pairwise Sequence Alignment
9.13.3 Multiple Sequence Alignment
9.13.4 Motif Finding
9.13.5 Gene Identification
9.13.6 Sequence Logo
9.13.7 RNA Secondary Structure Prediction
9.14 Database Search
9.14.1 BLAST Search
9.14.2 Other Database Search
9.15 Molecular Modeling
9.15.1 Visualization and Modeling Tools
9.15.2 Protein Modeling Web Servers
9.16 Phylogenetic Analysis and Tree Construction
9.16.1 List of Phylogeny Programs
9.16.2 Online Phylogeny Servers
9.16.3 Phylogeny Programs
9.16.4 Display of Phylogenetic Trees
References
10 Foundations for the Study of Structure and Function of Proteins
Zhirong Sun
10.1 Introduction
10.1.1 Importance of Protein
10.1.2 Amino Acids, Peptides, and Proteins
10.1.3 Some Noticeable Problems
10.2 Basic Concept of Protein Structure
10.2.1 Different Levels of Protein Structures
10.2.2 Acting Force to Sustain and Stabilize the High-Dimensional Structure of Protein
10.3 Fundamental of Macromolecules Structures and Functions
10.3.1 Different Levels of Protein Structure
10.3.2 Primary Structure
10.3.3 Secondary Structure
10.3.4 Supersecondary Structure
10.3.5 Folds
10.3.6 Summary
10.4 Basis of Protein Structure and Function Prediction
10.4.1 Overview
10.4.2 The Significance of Protein Structure Prediction
10.4.3 The Field of Machine Learning
10.4.4 Homological Protein Structure Prediction Method
10.4.5 Ab Initio Prediction Method
References
11 Computational Systems Biology Approaches for Deciphering Traditional Chinese Medicine
Shao Li and Le Lu
11.1 Introduction
11.2 Disease-Related Network
11.2.1 From a Gene List to Pathway and Network
11.2.2 Construction of Disease-Related Network
11.2.3 Biological Network Modularity and Phenotype Network
11.3 TCM Zheng-Related Network
11.3.1 "Zheng" in TCM
11.3.2 A CSB-Based Case Study for TCM Zheng
11.4 Network-Based Study for TCM "Fu Fang"
11.4.1 Systems Biology in Drug Discovery
11.4.2 Network-Based Drug Design
11.4.3 Progresses in Herbal Medicine
11.4.4 TCM Fu Fang Herbal Formula
11.4.5 A Network-Based Case Study for TCM Fu Fang
References
12 Advanced Topics in Bioinformatics and Computational Biology
Bailin Hao, Chunting Zhang, Yixue Li, Hao Li, Liping Wei, Minoru Kanehisa, Luhua Lai, Runsheng Chen, Nikolaus Rajewsky, Michael Q. Zhang, Jingdong Han, Rui Jiang, Xuegong Zhang, and Yanda Li
12.1 Prokaryote Phylogeny Meets Taxonomy
12.2 Z-Curve Method and Its Applications in Analyzing Eukaryotic and Prokaryotic Genomes
12.3 Insights into the Coupling of Duplication Events and Macroevolution from an Age Profile of Transmembrane Gene Families
12.4 Evolution of Combinatorial Transcriptional Circuits in the Fungal Lineage
12.5 Can a Non-synonymous Single-Nucleotide Polymorphism (nsSNP) Affect Protein Function? Analysis from Sequence, Structure, and Enzymatic Assay
12.6 Bioinformatics Methods to Integrate Genomic and Chemical Information
12.7 From Structure-Based to System-Based Drug Design
12.8 Progress in the Study of Noncoding RNAs in C. elegans
12.9 Identifying MicroRNAs and Their Targets
12.10 Topics in Computational Epigenomics
12.11 Understanding Biological Functions Through Molecular Networks
12.12 Identification of Network Motifs in Random Networks
12.13 Examples of Pattern Recognition Applications in Bioinformatics
12.14 Considerations in Bioinformatics