Machine Learning Intro
= Machines that learn to perform a task from experience
3 forms of learning based on labels availability:
- Yes -> Supervised learning
- Some -> Semi-supervised learning
- No -> Unsupervised learning
Supervised Learning
Training data has labels \mathcal{D} = \{(x_1, t_1), \dots, (x_N, t_N)\}
Goal: learn a predictive function that yields good performance on unseen data
Data may need to be preprocessed to handle
- Missing/wrong values
- Outliers
- Inconsistencies
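A minimal NumPy sketch of the preprocessing steps above; the data, the median imputation, and the percentile-based clipping are all illustrative choices, not a prescribed recipe:

```python
import numpy as np

# Toy 1-D feature with a missing value (NaN) and an outlier; values are illustrative.
x = np.array([1.0, 2.0, np.nan, 3.0, 2.5, 100.0])

# Missing values: impute the NaN with the median of the observed entries.
x[np.isnan(x)] = np.nanmedian(x)

# Outliers: clip to the 5th-95th percentile range of the (imputed) feature.
lo, hi = np.percentile(x, [5, 95])
x = np.clip(x, lo, hi)
```

In practice the thresholds (and whether to impute, clip, or drop) depend on the dataset and the domain knowledge encoded in the features.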
Features
Feature extraction = process that creates descriptive vectors from samples
- Features should be invariant to irrelevant input variations
- Selecting the right features!
- Usually encode some domain knowledge
- Higher-dimensional features can be more discriminative, but come at a cost:
Curse of dimensionality: complexity increases exponentially with number of dimensions
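A quick Monte Carlo illustration of the curse of dimensionality (the setup, a unit ball inscribed in a cube, is just one common demonstration): the fraction of the cube [-1, 1]^d covered by the ball collapses as d grows, so uniformly sampled data becomes increasingly sparse.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
fractions = []
for d in (2, 5, 10):
    # Sample points uniformly from the cube [-1, 1]^d and estimate the
    # fraction that lands inside the inscribed unit ball.
    pts = rng.uniform(-1.0, 1.0, size=(n, d))
    fractions.append((np.linalg.norm(pts, axis=1) <= 1.0).mean())
# The estimates shrink roughly as 0.79 -> 0.16 -> 0.002.
```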
Terms, Concepts, Notation
Mostly based on statistics and probability theory.
Notation:
- Scalar: x \in \mathbb{R}
- Vector: \text{x} \in \mathbb{R}^D
- Dataset: \mathcal{X} \subseteq \mathbb{R}^D
- Labelled dataset: \mathcal{D} = \{(x_1, t_1), \dots, (x_N, t_N)\}
- Matrix: \text{M} \in \mathbb{R}^{m \times n}
- Dot product: \text{w}^\text{T}\text{x} = \sum_{j=1}^D w_j x_j
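The dot product can be sketched in NumPy, once as the explicit sum over j from the formula above and once in vectorized form (the vectors are made up for illustration):

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])

# w^T x as the explicit sum over the D components.
dot_explicit = sum(w_j * x_j for w_j, x_j in zip(w, x))

# The same quantity via NumPy's matrix/vector product operator.
dot_vectorized = w @ x
```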
Probability Basics
Over random variables:
- Discrete case: p(X = x_j) = \frac{n_j}{N}
- Continuous case: p(X \in (x_1, x_2)) = \int_{x_1}^{x_2} p(x)\, dx, where p(x) is the probability density function (pdf) of X
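A small sketch of both cases: relative frequencies for a discrete variable, and a numerical integral of a pdf for a continuous one. The sample values and the choice of a standard normal pdf are illustrative.

```python
import numpy as np

# Discrete case: estimate p(X = x_j) = n_j / N from observed counts.
samples = np.array([0, 1, 1, 2, 1, 0, 2, 1])
values, counts = np.unique(samples, return_counts=True)
probs = counts / samples.size          # relative frequencies n_j / N

# Continuous case: p(X in (x1, x2)) is the integral of the pdf over (x1, x2).
# Standard normal pdf, integrated over (-1, 1) with the trapezoidal rule.
pdf = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
xs = np.linspace(-1.0, 1.0, 10_001)
p_interval = np.sum((pdf(xs[:-1]) + pdf(xs[1:])) / 2) * (xs[1] - xs[0])
```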
Some formulas:
Let A \in \{a_i\}, B \in \{b_j\}
Consider N trials:
n_{ij} = \#\{A = a_i \land B = b_j\}, \quad c_i = \#\{A = a_i\}, \quad r_j = \#\{B = b_j\}
Then we get:
- Joint probability: p(A=a_i, B=b_j) = \frac{n_{ij}}{N}
- Marginal probability: p(A=a_i) = \frac{c_i}{N}
- Conditional probability: p(B=b_j | A=a_i) = \frac{n_{ij}}{c_i}
- Sum rule: p(A=a_i) = \frac{1}{N}\sum_j n_{ij} = \sum_{b_j} p(A=a_i, B=b_j)
- Product rule: p(A=a_i, B=b_j) = \frac{n_{ij}}{c_i} \cdot \frac{c_i}{N} = p(B=b_j | A=a_i) \cdot p(A=a_i)
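The counting definitions above can be checked numerically: start from a table of counts n_{ij} (the numbers below are made up), form the joint, marginal, and conditional distributions exactly as in the formulas, and verify the product rule.

```python
import numpy as np

# Counts n_ij over N trials for A in {a_1, a_2} (rows)
# and B in {b_1, b_2, b_3} (columns); toy numbers.
n = np.array([[10,  5,  5],
              [20, 40, 20]])
N = n.sum()                        # total number of trials

joint = n / N                      # p(A=a_i, B=b_j) = n_ij / N
c = n.sum(axis=1)                  # c_i = #{A = a_i}
marginal_A = c / N                 # p(A=a_i) = c_i / N
cond_B_given_A = n / c[:, None]    # p(B=b_j | A=a_i) = n_ij / c_i

# Product rule: p(A, B) = p(B|A) p(A), recovered exactly from the counts.
product = cond_B_given_A * marginal_A[:, None]
```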
In short:
- Sum rule: p(A) = \sum_B p(A, B)
- Product rule: p(A, B) = p(B|A)\,p(A)
- Bayes' Theorem: p(A|B) = \frac{p(B|A)\,p(A)}{\sum_A p(B|A)\,p(A)}
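Bayes' Theorem can be sketched on a toy diagnostic example (all numbers are illustrative, not from the notes): A is the patient's state, B is a positive test result, and the denominator is computed with the sum rule.

```python
import numpy as np

prior = np.array([0.01, 0.99])        # p(A): p(sick), p(healthy)
likelihood = np.array([0.95, 0.05])   # p(B = positive | A) for each state of A

# Denominator via the sum rule: p(B) = sum_A p(B|A) p(A)
evidence = np.sum(likelihood * prior)

# Posterior p(A | B = positive) via Bayes' Theorem.
posterior = likelihood * prior / evidence
```

Even with an accurate test, p(sick | positive) comes out at only about 0.16 here, because the prior p(sick) is so small.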