Linear Algebra and its effects on Data Science
There is hardly any theory which is more elementary than linear algebra, in spite of the fact that generations of professors and textbook writers have obscured its simplicity by preposterous calculations with matrices. – Jean Dieudonné
Among the competencies required of a data scientist, mathematical literacy tops the list: a basic understanding of how mathematical concepts work and how quantities add up or grow exponentially. A well-informed data scientist can formulate, employ and interpret these concepts in different contexts, and can provide reasoning, analysis or a solution for the problem at hand.
This does not mean that every data scientist has to be mathematically gifted, but a good understanding of the concepts gives a person an edge in analysis. The skill set combines statistics and computer science with persistence and patience.
Linear algebra deepens the understanding of machine learning algorithms, which can be substantially beneficial to a data scientist.
Basic concepts of Linear Algebra
1. Matrices and Vectors
2. Matrix operations
3. Matrix Inverse
4. Orthogonal Matrix
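As a rough illustration of the four concepts listed above, here is a minimal NumPy sketch. The matrix `A`, the vector `v` and the rotation angle `theta` are arbitrary values chosen for this example, not taken from any particular dataset.

```python
import numpy as np

# 1. Matrices and vectors (illustrative values only).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([1.0, 2.0])

# 2. Matrix operations: matrix-vector product, matrix-matrix product, transpose.
Av = A @ v
AA = A @ A
At = A.T

# 3. Matrix inverse: multiplying the inverse by A gives the identity
#    (up to floating-point rounding).
A_inv = np.linalg.inv(A)
identity_check = np.allclose(A_inv @ A, np.eye(2))

# 4. Orthogonal matrix: a 2D rotation matrix Q satisfies Q.T @ Q = I,
#    so its inverse is simply its transpose.
theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
orthogonal_check = np.allclose(Q.T @ Q, np.eye(2))

print(Av, identity_check, orthogonal_check)
```

Orthogonal matrices are convenient in practice because inverting them costs no more than a transpose, and they preserve vector lengths.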
Linear Algebra and Data Science
Linear algebra helps data scientists expand their horizons. It pushes us to be more than mere consumers of scientific software and helps us understand what goes on behind the scenes. A data scientist must be more than aware of the tools involved in data science and should be able to apply existing tools and develop new ones where required. Some of the areas where linear algebra helps are:
1. Deriving first- and second-order derivatives and gradients for multivariate expressions
2. Solving a complex system of equations
3. Comparing the relative efficiency of two estimators
4. Understanding the performance and behaviour of non-linear systems and their dependence on initial conditions
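The first two items in the list above can be sketched in a few lines of NumPy. The system of equations, the quadratic function `f` and the helper `numerical_grad` are hypothetical examples made up for illustration. The closed-form gradient 2Ax used here holds only because `A` is symmetric.

```python
import numpy as np

# A small linear system A @ x = b (illustrative values):
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])
x = np.linalg.solve(A, b)  # preferred over computing inv(A) @ b explicitly

# A multivariate expression f(x) = x^T A x.
def f(x):
    return x @ A @ x

# For symmetric A, the gradient is 2 A x and the Hessian is 2 A.
def grad_f(x):
    return 2 * A @ x

hessian_f = 2 * A

# Central finite differences, used here only to sanity-check the gradient.
def numerical_grad(func, x, eps=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return g

print(x)
print(grad_f(x), numerical_grad(f, x))
```

Writing the gradient in matrix form is what lets optimisation routines handle thousands of variables with the same few lines of code.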
Beyond these most obvious aspects, linear algebra helps in the following areas:
1. Loss function – An application of the vector norm, used to measure the difference between the predicted values and the expected values.
2. Regularization – A technique used to prevent models from overfitting, typically by penalising a norm of the model's weight vector.
3. Covariance matrix – A matrix whose entries are the covariances between pairs of variables, used to study how continuous variables vary together.
4. Support vector machine classification – It is a discriminative classifier that works by finding a decision surface
5. Principal Component Analysis – It is an unsupervised dimensionality reduction technique which finds the directions of maximum variance.
6. Dimensionality reduction – A method used to reduce the number of variables so that a coherent analysis can be performed.
7. Word embeddings – Machine learning algorithms cannot work with raw textual data, which has to be converted into numerical features. Word embeddings are a way of representing words as low-dimensional vectors.
8. Latent Semantic Analysis – Finding the meanings of words hidden under different contexts through topic modelling, typically by applying singular value decomposition to a term-document matrix.
9. Image representation – Algorithms cannot process images directly; an image needs to be converted into numbers, typically a matrix of pixel values.
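Several items in the list above (loss function, regularization, covariance matrix, PCA and dimensionality reduction) can be tied together in one short NumPy sketch. All values here are made up for illustration: the predictions, the weight vector `w`, the penalty strength `lam` and the synthetic data `X` are assumptions. PCA is done by eigendecomposition of the covariance matrix, which is one common approach among several.

```python
import numpy as np

rng = np.random.default_rng(0)

# Loss function as a vector norm: distance between predictions and targets.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
residual = y_pred - y_true
l2_loss = np.linalg.norm(residual)   # Euclidean (L2) norm of the error vector
mse = np.mean(residual ** 2)         # mean squared error

# Regularization: an L2 penalty on a (hypothetical) weight vector.
w = np.array([0.5, -1.5, 2.0])
lam = 0.1
penalty = lam * np.linalg.norm(w) ** 2

# Covariance matrix of synthetic data: 200 samples, 3 variables,
# with the second variable made strongly correlated with the first.
X = rng.normal(size=(200, 3))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]
Xc = X - X.mean(axis=0)                   # centre each column
cov = Xc.T @ Xc / (X.shape[0] - 1)        # same result as np.cov(X, rowvar=False)

# PCA: eigenvectors of the covariance matrix are the directions of
# maximum variance; eigh is used because cov is symmetric.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]         # sort by variance, largest first
components = eigvecs[:, order[:2]]        # keep the top 2 directions

# Dimensionality reduction: project 3 variables down to 2.
X_reduced = Xc @ components

print(mse, penalty, X_reduced.shape)
```

The variance of each projected column equals the corresponding eigenvalue, which is why the eigenvalues are often used to decide how many components to keep.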
To be an able data scientist, one needs to have a complete understanding of the math and statistics behind every decision.