creation of obsidian vault and first notes
ObsidianNotes/elemlds lecture 8.md

# Machine Learning Intro
= Machines that *learn* to perform a task from *experience*
3 forms of learning based on label availability:
- Yes -> Supervised learning
- Some -> Semi-supervised learning
- No -> Unsupervised learning
# Supervised Learning
Training data has labels $\mathcal{D} = \{(x_1, t_1), \dots, (x_N, t_N)\}$
Goal: learn a *predictive* function that yields good performance on *unseen* data
Data may need to be preprocessed to handle
- Missing/wrong values
- Outliers
- Inconsistencies
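The three preprocessing steps above can be sketched in a few lines. This is a minimal illustration on hypothetical toy data (the values, the mean-imputation strategy, and the 1.5-standard-deviation outlier threshold are all assumptions, not part of the lecture):

```python
# Minimal preprocessing sketch on hypothetical toy data:
# mean-impute missing values, then flag outliers by distance from the mean.
from statistics import mean, stdev

raw = [4.1, None, 3.9, 4.3, 25.0, 4.0]  # None = missing value

# 1. Impute missing values with the mean of the observed ones
observed = [v for v in raw if v is not None]
filled = [v if v is not None else mean(observed) for v in raw]

# 2. Flag outliers: values more than 1.5 standard deviations from the mean
#    (the threshold is an arbitrary choice for this example)
mu, sigma = mean(filled), stdev(filled)
outliers = [v for v in filled if abs(v - mu) > 1.5 * sigma]
```

In practice the imputation strategy and outlier rule are domain-dependent choices, which is exactly why this step needs care.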
# Features
Feature extraction = process that creates descriptive vectors from samples
- Features should be invariant to irrelevant input variations
- Selecting the *right* features!
- Usually encode some domain knowledge
- Higher-dimensional features are more discriminative
Curse of dimensionality: complexity increases *exponentially* with number of dimensions
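The exponential growth is easy to make concrete: if each feature axis is split into $k$ bins, covering the space takes $k^d$ cells. A tiny sketch (the numbers are illustrative):

```python
# Curse of dimensionality: with k bins per axis, a grid over a
# d-dimensional space has k**d cells, exponential in d.
k = 10  # bins per dimension (illustrative choice)
cells = {d: k ** d for d in (1, 2, 3, 10)}
# d=1 needs only 10 cells, while d=10 already needs 10 billion,
# so the data needed to populate the space grows just as fast.
```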
# Terms, Concepts, Notation
Mostly based on statistics and probability theory
Notation:
- Scalar $x \in \mathbb{R}$
- Vector-valued $\text{x} \in \mathbb{R}^D$
- Datasets $\mathcal{X} = \{x_1, \dots, x_N\}$
- Labelled datasets $\mathcal{D} = \{(x_1, t_1), \dots, (x_N, t_N)\}$
- Matrices $\text{M} \in \mathbb{R}^{m \times n}$
- Dot product $\text{w}^\text{T}\text{x} = \sum_{j=1}^D w_j x_j$
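The dot product from the notation above is just the explicit sum $\sum_{j=1}^D w_j x_j$. A minimal sketch with made-up example vectors:

```python
# w^T x as an explicit sum over components (example values are arbitrary)
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
dot = sum(w_j * x_j for w_j, x_j in zip(w, x))  # 0.5 - 2.0 + 6.0 = 4.5
```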
# Probability Basics
Over random variables:
- Discrete case: $p(X = x_j) = \frac{n_j}{N}$
- Continuous case: $p(X \in (x_1, x_2)) = \int_{x_1}^{x_2}p(x)\, dx$ where $p(x)$ is the probability density function (pdf) of $x$
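Both cases can be sketched numerically. The data and the choice of pdf below are hypothetical: a relative frequency $n_j/N$ for the discrete case, and a Riemann-sum approximation of the integral for the continuous case (using the uniform density $p(x) = 1$ on $[0, 1]$):

```python
# Discrete case: probability as relative frequency n_j / N
trials = ["a", "b", "a", "c", "a", "b"]   # N = 6 hypothetical trials
p_a = trials.count("a") / len(trials)      # n_j / N = 3/6

# Continuous case: p(X in (0.2, 0.7)) for the uniform pdf on [0, 1],
# approximated by a Riemann sum of p(x) * dx with p(x) = 1
x1, x2, steps = 0.2, 0.7, 100_000
dx = (x2 - x1) / steps
p_interval = sum(1.0 * dx for _ in range(steps))
```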
Some formulas:
Let $A \in \{a_i\}, B \in \{b_j\}$
Consider $N$ trials:
- $n_{ij} = \#\{A = a_i \land B = b_j\}$
- $c_i = \#\{A=a_i\}$
- $r_j = \#\{B=b_j\}$
Then we get:
- Joint probability $p(A=a_i, B=b_j) = \frac{n_{ij}}{N}$
- Marginal probability $p(A=a_i) = \frac{c_i}{N}$
- Conditional probability $p(B=b_j | A=a_i)=\frac{n_{ij}}{c_i}$
- Sum rule $p(A=a_i) = \frac{1}{N}\sum_j n_{ij} = \sum_{b_j}p(A=a_i,B=b_j)$
- Product rule $p(A=a_i, B=b_j) = \frac{n_{ij}}{c_i}\cdot \frac{c_i}{N} = p(B=b_j |A=a_i)\cdot p(A=a_i)$
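The count-based definitions above can be checked on a small example. The table of trials is hypothetical; the code computes $n_{ij}$ and $c_i$ directly and verifies the product rule:

```python
# Joint, marginal and conditional probabilities from counts over N trials;
# each trial records a pair (A, B). The trial data is made up.
from collections import Counter

trials = [("a1", "b1"), ("a1", "b2"), ("a1", "b1"),
          ("a2", "b2"), ("a2", "b1")]
N = len(trials)
n = Counter(trials)                    # n_ij = #{A = a_i and B = b_j}
c = Counter(a for a, _ in trials)      # c_i  = #{A = a_i}

p_joint = n[("a1", "b1")] / N          # joint:       n_ij / N = 2/5
p_marg = c["a1"] / N                   # marginal:    c_i / N  = 3/5
p_cond = n[("a1", "b1")] / c["a1"]     # conditional: n_ij / c_i = 2/3

# Product rule: p(A, B) = p(B|A) p(A)
assert abs(p_joint - p_cond * p_marg) < 1e-12
```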
In short:
- Sum rule: $p(A) = \sum_B p(A,B)$
- Product rule: $p(A,B) = p(B|A)p(A)$
- Bayes' Theorem: $p(A|B)= \frac{p(B|A)p(A)}{\sum_Ap(B|A)p(A)}$
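A quick numerical instance of Bayes' theorem, with the sum rule supplying the denominator. All numbers are hypothetical (a test with 90% sensitivity, 5% false-positive rate, and a 1% prior):

```python
# Bayes' theorem on made-up numbers:
# p(D | pos) = p(pos | D) p(D) / sum_A p(pos | A) p(A)
p_pos_given_d = 0.90      # sensitivity (assumed)
p_pos_given_not_d = 0.05  # false-positive rate (assumed)
p_d = 0.01                # prior (assumed)

# Denominator via the sum rule over A in {D, not D}
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
p_d_given_pos = p_pos_given_d * p_d / p_pos
```

Despite the strong test, the posterior is only about 0.15, because the prior is so small: a typical example of why the denominator (the sum rule) matters.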