What this book covers
Chapter 1, Vectors and vector spaces covers what vectors are and how to work with them. We’ll travel from concrete examples through precise mathematical definitions to implementations, understanding vector spaces and NumPy arrays, which are used to represent vectors efficiently.
Chapter 2, The geometric structure of vector spaces moves forward by studying norms, distances, inner products, angles, and orthogonality, enhancing the algebraic definition of vector spaces with some much-needed geometric structure. These are not just tools for visualization; they play a crucial role in machine learning. We’ll also encounter our first algorithm, the Gram-Schmidt orthogonalization method, which turns any set of linearly independent vectors into an orthonormal basis.
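To give a taste of this geometric toolkit, here is a minimal NumPy sketch (with made-up vectors, not examples from the chapter) that computes a norm, an inner product, and an angle, and checks orthogonality:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, -1.0, 0.0])

norm_x = np.linalg.norm(x)                    # Euclidean norm, the square root of <x, x>
inner = np.dot(x, y)                          # inner product <x, y>
cos_angle = inner / (np.linalg.norm(x) * np.linalg.norm(y))
angle = np.arccos(cos_angle)                  # angle between x and y, in radians

print(norm_x, inner, angle)                   # 3.0, 0.0, pi/2
print("orthogonal:", np.isclose(inner, 0.0))  # x and y are orthogonal: <x, y> = 0
```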
In Chapter 3, Linear algebra in practice, we break out NumPy once more and implement everything that we’ve learned so far. Here, we learn how to work with high-performance NumPy arrays in practice: operations, broadcasting, and functions, culminating in a from-scratch implementation of the Gram-Schmidt algorithm. This is also the first time we encounter matrices, the workhorses of linear algebra.
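As a preview, a bare-bones sketch of the Gram-Schmidt idea might look like the following. This is only an illustration under simplifying assumptions (linearly independent inputs), not the book’s implementation:

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn a list of linearly independent vectors into an orthonormal basis."""
    basis = []
    for v in vectors:
        # subtract the projections onto the already-built orthonormal vectors
        w = v - sum(np.dot(v, b) * b for b in basis)
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

vectors = [np.array([1.0, 1.0, 0.0]),
           np.array([1.0, 0.0, 1.0]),
           np.array([0.0, 1.0, 1.0])]
Q = gram_schmidt(vectors)
print(np.round(Q @ Q.T, 8))   # should be (close to) the identity matrix
```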
Chapter 4, Linear transformations is about the true nature of matrices; that is, structure-preserving transformations between vector spaces. This way, seemingly arcane things – such as the definition of matrix multiplication – suddenly make sense. Once more, we take the leap from algebraic structures to geometric ones, allowing us to study matrices as transformations that distort their underlying space. We’ll also look at one of the most important descriptors of matrices: the determinant, which describes how the underlying linear transformation scales volumes.
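As a quick illustration (with a made-up 2 × 2 matrix), the determinant tells us how much a transformation stretches areas in the plane:

```python
import numpy as np

# A linear transformation of the plane: a stretch combined with a shear
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# The transformation maps the unit square to a parallelogram whose area
# is |det(A)| times the original area.
print(np.linalg.det(A))   # 6.0
```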
Chapter 5, Matrices and equations presents the third (and for us, the final) face of matrices as systems of linear equations. In this chapter, we first learn how to solve systems of linear equations by hand using Gaussian elimination, then supercharge it via our newfound knowledge of linear algebra, obtaining the mighty LU decomposition. With the help of the LU decomposition, we go hard and achieve a roughly 70,000× speedup on computing determinants.
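The source of the speedup is easy to sketch: expanding a determinant by cofactors takes on the order of n! operations, while reading it off an LU factorization takes on the order of n³. The toy comparison below (with a made-up random matrix, not the book’s benchmark) relies on the fact that NumPy’s np.linalg.det computes the determinant through an LU factorization:

```python
import numpy as np
from timeit import timeit

def det_cofactor(A):
    """Determinant by recursive cofactor expansion along the first row: O(n!)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    return sum((-1) ** j * A[0, j] * det_cofactor(np.delete(A[1:], j, axis=1))
               for j in range(n))

A = np.random.rand(8, 8)

print(np.allclose(det_cofactor(A), np.linalg.det(A)))    # same result
print(timeit(lambda: det_cofactor(A), number=1))          # seconds, factorial-time
print(timeit(lambda: np.linalg.det(A), number=1))         # seconds, LU-based
```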
Chapter 6 introduces two of the most important descriptors of matrices: eigenvalues and eigenvectors. Why do we need them?
Because in Chapter 7, Matrix factorizations, we are able to reach the pinnacle of linear algebra with their help. First, we show that real symmetric matrices can be written in diagonal form by constructing a basis from their eigenvectors, a result known as the spectral decomposition theorem. In turn, a clever application of the spectral decomposition leads to the singular value decomposition, the single most important result of linear algebra.
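Both factorizations are a single call away in NumPy. The sketch below (with made-up matrices, purely for illustration) verifies the spectral decomposition of a symmetric matrix and the singular value decomposition of a rectangular one:

```python
import numpy as np

A = np.random.rand(4, 4)
S = (A + A.T) / 2                            # a real symmetric matrix

# Spectral decomposition: S = U diag(λ) Uᵀ, where the columns of U are
# orthonormal eigenvectors and λ are the eigenvalues.
eigenvalues, U = np.linalg.eigh(S)
print(np.allclose(S, U @ np.diag(eigenvalues) @ U.T))

# Singular value decomposition works for any matrix: B = U Σ Vᵀ.
B = np.random.rand(3, 5)
U_B, singular_values, Vt = np.linalg.svd(B)
Sigma = np.zeros((3, 5))
np.fill_diagonal(Sigma, singular_values)
print(np.allclose(B, U_B @ Sigma @ Vt))
```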
Chapter 8, Matrices and graphs closes the linear algebra part of the book by studying the fruitful connection between linear algebra and graph theory. By representing matrices as graphs, we are able to show deep results such as the Frobenius normal form, or even talk about the eigenvalues and eigenvectors of graphs.
In Chapter 9, Functions, we take a detailed look at functions, a concept that we have used intuitively so far. This time, we make the intuition mathematically precise, learning that functions are essentially arrows between dots.
Chapter 10, Numbers, sequences, and series continues down the rabbit hole, looking at the concept of numbers. Each step from natural numbers towards real numbers represents a conceptual jump, peaking at the study of sequences and series.
With Chapter 11, Topology, limits, and continuity, we are almost at the really interesting parts. However, in calculus, the objects, concepts, and tools are most often described in terms of limits and continuous functions. So, we take a detailed look at what they are.
Chapter 12 is about the single most important concept in calculus: Differentiation. In this chapter, we learn that the derivative of a function describes 1) the slope of its tangent line, and 2) its best local linear approximation. From a practical side, we also look at how derivatives behave with respect to operations, most importantly function composition, yielding the essential chain rule, the bread and butter of backpropagation.
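In formulas, the chain rule says that (f ∘ g)'(x) = f'(g(x)) · g'(x). Here is a tiny numerical sanity check, with illustrative functions and a hypothetical finite-difference helper:

```python
import numpy as np

f, df = np.sin, np.cos                        # f and its derivative
g, dg = (lambda x: x**2), (lambda x: 2 * x)   # g and its derivative

def numerical_derivative(h, x, eps=1e-6):
    """Central finite-difference approximation of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

x = 1.3
composed = lambda t: f(g(t))                  # sin(x²)
print(numerical_derivative(composed, x))      # ≈ derivative of sin(x²) at x
print(df(g(x)) * dg(x))                       # chain rule: cos(x²) · 2x
```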
After all the setup, Chapter 13, Optimization introduces the algorithm that is used to train virtually every neural network: gradient descent. For that, we learn how the derivative describes the monotonicity of functions and how local extrema can be characterized with the first- and second-order derivatives.
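In one variable, the whole idea fits in a few lines: repeatedly step in the direction where the derivative says the function decreases. A bare-bones sketch with a made-up example function, not the book’s implementation:

```python
def gradient_descent(df, x, learning_rate=0.1, n_steps=100):
    """Minimize a univariate function by repeatedly stepping against its derivative."""
    for _ in range(n_steps):
        x = x - learning_rate * df(x)
    return x

# f(x) = (x - 3)² has derivative f'(x) = 2(x - 3) and a minimum at x = 3
print(gradient_descent(lambda x: 2 * (x - 3), x=0.0))   # ≈ 3.0
```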
Chapter 14, Integration wraps up our study of univariate functions. Intuitively speaking, integration describes the (signed) area under a function’s graph, but upon closer inspection, it also turns out to be the inverse of differentiation. In machine learning (and throughout all of mathematics, really), integrals describe various probabilities, expected values, and other essential quantities.
Now that we understand how calculus is done in single variables, Chapter 15 leads us to the world of Multivariable functions, where machine learning is done. There, we have an entire zoo of functions: scalar-vector, vector-scalar, and vector-vector ones.
In Chapter 16, Derivatives and gradients, we continue our journey, overcoming the difficulties of generalizing differentiation to multivariable functions. Here, we have three kinds of derivatives – partial, total, and directional – resulting in the gradient vector and the Jacobian and Hessian matrices.
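As a preview, partial derivatives measure how a function changes along one coordinate at a time, and stacking them gives the gradient. A toy finite-difference sketch, where the function and the helper are made up for illustration:

```python
import numpy as np

def f(v):
    x, y = v
    return x**2 + x * y              # a scalar-valued function of two variables

def gradient(f, v, eps=1e-6):
    """Numerical gradient: the partial derivatives stacked into a vector."""
    grad = np.zeros_like(v)
    for i in range(len(v)):
        step = np.zeros_like(v)
        step[i] = eps
        grad[i] = (f(v + step) - f(v - step)) / (2 * eps)
    return grad

v = np.array([1.0, 2.0])
print(gradient(f, v))                # ≈ [2x + y, x] = [4, 1]
```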
As expected, optimization is also slightly more complicated in multiple variables. This issue is cleared up by Chapter 17, Optimization in multiple variables, where we learn the analogue of the univariate second-derivative test, and implement the almighty gradient descent in its final form, concluding our study of calculus.
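To illustrate the multivariable second-derivative test on a made-up quadratic function (not an example from the chapter): the Hessian collects the second partial derivatives, and positive eigenvalues at a critical point signal a local minimum.

```python
import numpy as np

# f(x, y) = x² + x·y + 2y² has a critical point at the origin.
# Its Hessian (the matrix of second partial derivatives) is constant:
H = np.array([[2.0, 1.0],
              [1.0, 4.0]])

# The multivariable second-derivative test: if all eigenvalues of the Hessian
# at a critical point are positive, the point is a local minimum.
print(np.linalg.eigvalsh(H))              # both eigenvalues are positive here
print(np.all(np.linalg.eigvalsh(H) > 0))  # True: the origin is a local minimum
```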
Now that we have a mechanistic understanding of machine learning, Chapter 18, What is probability? shows us how to reason and model under uncertainty. In mathematical terms, probability spaces are defined by the Kolmogorov axioms, and we’ll also learn the tools that allow us to work with probabilistic models.
Chapter 19 introduces Random variables and distributions, allowing us not only to bring the tools of calculus into probability theory, but also to compact probabilistic models into sequences or functions.
Finally, in Chapter 20, The expected value, we learn how to quantify probabilistic models and distributions with averages, variances, covariances, and entropy.
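On a finite, made-up distribution, these quantities reduce to a few NumPy one-liners (covariance needs a joint distribution, so it is left out of this sketch):

```python
import numpy as np

# A discrete distribution over the values 0, 1, 2, 3 (illustrative numbers)
values = np.array([0.0, 1.0, 2.0, 3.0])
probs = np.array([0.1, 0.2, 0.3, 0.4])

expected_value = np.sum(values * probs)                    # E[X]
variance = np.sum((values - expected_value)**2 * probs)    # Var(X) = E[(X - E[X])²]
entropy = -np.sum(probs * np.log2(probs))                  # Shannon entropy, in bits

print(expected_value, variance, entropy)                   # 2.0, 1.0, ≈1.85
```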