Mathematics for Machine Learning 1st Edition by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, ISBN-13: 978-1108455145
[PDF eBook eTextbook]
- Publisher: Cambridge University Press; 1st edition (April 23, 2020)
- Language: English
- 398 pages
- ISBN-10: 110845514X
- ISBN-13: 978-1108455145
Distills key concepts from linear algebra, geometry, matrices, calculus, optimization, probability and statistics that are used in machine learning.
The fundamental mathematical tools needed to understand machine learning include linear algebra, analytic geometry, matrix decompositions, vector calculus, optimization, probability and statistics. These topics are traditionally taught in disparate courses, making it hard for data science or computer science students, or professionals, to learn the mathematics efficiently. This self-contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models and support vector machines. For students and others with a mathematical background, these derivations provide a starting point for machine learning texts. For those learning the mathematics for the first time, the methods help build intuition and practical experience in applying mathematical concepts. Every chapter includes worked examples and exercises to test understanding. Programming tutorials are offered on the book’s website.
Table of Contents:
Foreword 1
Part I Mathematical Foundations 9
1 Introduction and Motivation 11
1.1 Finding Words for Intuitions 12
1.2 Two Ways to Read This Book 13
1.3 Exercises and Feedback 16
2 Linear Algebra 17
2.1 Systems of Linear Equations 19
2.2 Matrices 22
2.3 Solving Systems of Linear Equations 27
2.4 Vector Spaces 35
2.5 Linear Independence 40
2.6 Basis and Rank 44
2.7 Linear Mappings 48
2.8 Affine Spaces 61
2.9 Further Reading 63
Exercises 63
3 Analytic Geometry 70
3.1 Norms 71
3.2 Inner Products 72
3.3 Lengths and Distances 75
3.4 Angles and Orthogonality 76
3.5 Orthonormal Basis 78
3.6 Orthogonal Complement 79
3.7 Inner Product of Functions 80
3.8 Orthogonal Projections 81
3.9 Rotations 91
3.10 Further Reading 94
Exercises 95
4 Matrix Decompositions 98
4.1 Determinant and Trace 99
4.2 Eigenvalues and Eigenvectors 105
4.3 Cholesky Decomposition 114
4.4 Eigendecomposition and Diagonalization 115
4.5 Singular Value Decomposition 119
4.6 Matrix Approximation 129
4.7 Matrix Phylogeny 134
4.8 Further Reading 135
Exercises 137
5 Vector Calculus 139
5.1 Differentiation of Univariate Functions 141
5.2 Partial Differentiation and Gradients 146
5.3 Gradients of Vector-Valued Functions 149
5.4 Gradients of Matrices 155
5.5 Useful Identities for Computing Gradients 158
5.6 Backpropagation and Automatic Differentiation 159
5.7 Higher-Order Derivatives 164
5.8 Linearization and Multivariate Taylor Series 165
5.9 Further Reading 170
Exercises 170
6 Probability and Distributions 172
6.1 Construction of a Probability Space 172
6.2 Discrete and Continuous Probabilities 178
6.3 Sum Rule, Product Rule, and Bayes’ Theorem 183
6.4 Summary Statistics and Independence 186
6.5 Gaussian Distribution 197
6.6 Conjugacy and the Exponential Family 205
6.7 Change of Variables/Inverse Transform 214
6.8 Further Reading 221
Exercises 222
7 Continuous Optimization 225
7.1 Optimization Using Gradient Descent 227
7.2 Constrained Optimization and Lagrange Multipliers 233
7.3 Convex Optimization 236
7.4 Further Reading 246
Exercises 247
Part II Central Machine Learning Problems 249
8 When Models Meet Data 251
8.1 Data, Models, and Learning 251
8.2 Empirical Risk Minimization 258
8.3 Parameter Estimation 265
8.4 Probabilistic Modeling and Inference 272
8.5 Directed Graphical Models 278
8.6 Model Selection 283
9 Linear Regression 289
9.1 Problem Formulation 291
9.2 Parameter Estimation 292
9.3 Bayesian Linear Regression 303
9.4 Maximum Likelihood as Orthogonal Projection 313
9.5 Further Reading 315
10 Dimensionality Reduction with Principal Component Analysis 317
10.1 Problem Setting 318
10.2 Maximum Variance Perspective 320
10.3 Projection Perspective 325
10.4 Eigenvector Computation and Low-Rank Approximations 333
10.5 PCA in High Dimensions 335
10.6 Key Steps of PCA in Practice 336
10.7 Latent Variable Perspective 339
10.8 Further Reading 343
11 Density Estimation with Gaussian Mixture Models 348
11.1 Gaussian Mixture Model 349
11.2 Parameter Learning via Maximum Likelihood 350
11.3 EM Algorithm 360
11.4 Latent-Variable Perspective 363
11.5 Further Reading 368
12 Classification with Support Vector Machines 370
12.1 Separating Hyperplanes 372
12.2 Primal Support Vector Machine 374
12.3 Dual Support Vector Machine 383
12.4 Kernels 388
12.5 Numerical Solution 390
12.6 Further Reading 392
References 395
Index 407
Marc Peter Deisenroth is DeepMind Chair in Artificial Intelligence at the Department of Computer Science, University College London. Prior to this, he was a faculty member in the Department of Computing, Imperial College London. His research areas include data-efficient learning, probabilistic modeling, and autonomous decision making. Deisenroth was Program Chair of the European Workshop on Reinforcement Learning (EWRL) 2012 and Workshops Chair of Robotics: Science and Systems (RSS) 2013. His research received Best Paper Awards at the International Conference on Robotics and Automation (ICRA) 2014 and the International Conference on Control, Automation and Systems (ICCAS) 2016. In 2018, he was awarded the President’s Award for Outstanding Early Career Researcher at Imperial College London. He is a recipient of a Google Faculty Research Award and a Microsoft Ph.D. grant.
A. Aldo Faisal leads the Brain and Behaviour Lab at Imperial College London, where he is faculty at the Departments of Bioengineering and Computing and a Fellow of the Data Science Institute. He is the director of the £20 million UKRI Centre for Doctoral Training in AI for Healthcare. Faisal studied Computer Science and Physics at the Universität Bielefeld (Germany). He obtained a Ph.D. in Computational Neuroscience at the University of Cambridge and became a Junior Research Fellow in the Computational and Biological Learning Lab. His research is at the interface of neuroscience and machine learning, aiming to understand and reverse engineer brains and behavior.
Cheng Soon Ong is Principal Research Scientist at the Machine Learning Research Group, Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra. He is also Adjunct Associate Professor at the Australian National University. His research focuses on enabling scientific discovery by extending statistical machine learning methods. Ong received his Ph.D. in Computer Science from the Australian National University in 2005. He was a postdoc at the Max Planck Institute for Biological Cybernetics and the Friedrich Miescher Laboratory. From 2008 to 2011, he was a lecturer in the Department of Computer Science at Eidgenössische Technische Hochschule (ETH) Zürich, and in 2012 and 2013 he worked in the Diagnostic Genomics Team at NICTA in Melbourne.