RWTH-Notizen/ObsidianNotes/elemlds lecture 8.md (2026-02-10)

Machine Learning Intro

= Machines that learn to perform a task from experience

Three forms of learning, based on label availability:

  • Yes -> Supervised learning
  • Some -> Semi-supervised learning
  • No -> Unsupervised learning

Supervised Learning

Training data has labels: \mathcal{D} = \{(x_1, t_1), \dots, (x_N, t_N)\}

Goal: learn a predictive function that yields good performance on unseen data.

Data may need to be preprocessed to handle

  • Missing/wrong values
  • Outliers
  • Inconsistencies
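
The preprocessing steps above can be sketched in a few lines. This is a minimal illustration, not the lecture's method; the function name, the mean-imputation strategy, and the clipping range are all assumptions chosen for the example.

```python
# Hypothetical preprocessing sketch: impute missing values (None) with the
# mean of the observed values, then clip outliers into [low, high].

def preprocess(values, low, high):
    """Mean-impute missing entries, then clip each value to [low, high]."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    imputed = [mean if v is None else v for v in values]
    return [min(max(v, low), high) for v in imputed]

print(preprocess([1.0, None, 3.0, 100.0], low=0.0, high=10.0))
# the missing entry becomes the mean (~34.67), then both large values clip to 10.0
```

In practice a library such as pandas or scikit-learn would handle this, but the logic is the same: decide on an imputation rule and an outlier rule per feature.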

Features

Feature extraction = process that creates descriptive vectors from samples

  • Features should be invariant to irrelevant input variations
  • Selecting the right features!
  • Usually encode some domain knowledge
  • Higher-dimensional features can be more discriminative (but see below)

Curse of dimensionality: complexity increases exponentially with number of dimensions
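
One way to see the exponential blow-up: covering the unit hypercube [0, 1]^d with a grid of k cells per axis requires k^d cells, so the number of samples needed to populate the space grows exponentially in d.

```python
# Curse of dimensionality illustration: cells needed for a grid with
# k subdivisions per axis in d dimensions is k**d.

def grid_cells(k, d):
    return k ** d

for d in (1, 2, 3, 10):
    print(f"d={d}: {grid_cells(10, d)} cells")
```

With just 10 subdivisions per axis, 10 dimensions already require 10^10 cells.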

Terms, Concepts, Notation

Mostly based on statistics and probability theory.

Notation:

  • Scalar x \in \mathbb{R}
  • Vector-valued \mathbf{x} \in \mathbb{R}^D
  • Datasets \mathcal{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}
  • Labelled datasets \mathcal{D} = \{(x_1, t_1), \dots, (x_N, t_N)\}
  • Matrices \mathbf{M} \in \mathbb{R}^{m \times n}
  • Dot product \mathbf{w}^T\mathbf{x} = \sum_{j=1}^D w_j x_j
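
The dot product \mathbf{w}^T\mathbf{x} from the notation above, written out in plain Python (normally one would use numpy's `np.dot` or the `@` operator):

```python
# Dot product of two D-dimensional vectors: sum over elementwise products.

def dot(w, x):
    assert len(w) == len(x), "vectors must have the same dimension D"
    return sum(w_j * x_j for w_j, x_j in zip(w, x))

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 1*4 + 2*5 + 3*6 = 32.0
```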

Probability Basics

Over random variables:

  • Discrete case: p(X = x_j) = \frac{n_j}{N}
  • Continuous case: p(X \in (x_1, x_2)) = \int_{x_1}^{x_2}p(x)\, dx where p(x) is the probability density function (pdf) of x
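
The continuous case can be checked numerically. As an assumed concrete example (not from the lecture), take p(x) to be the standard normal pdf: the integral over (-1, 1) approximated by a midpoint Riemann sum matches the closed form via the error function.

```python
# Numerically integrate a pdf over an interval and compare with the exact
# value. The standard normal pdf is an assumed example distribution.
import math

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_interval(x1, x2, steps=100_000):
    """Midpoint Riemann sum for the integral of normal_pdf over (x1, x2)."""
    h = (x2 - x1) / steps
    return sum(normal_pdf(x1 + (i + 0.5) * h) for i in range(steps)) * h

# Exact value via the error function: P(-1 < X < 1) for a standard normal.
exact = 0.5 * (math.erf(1 / math.sqrt(2)) - math.erf(-1 / math.sqrt(2)))
print(abs(prob_interval(-1.0, 1.0) - exact) < 1e-6)  # both ≈ 0.6827
```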

Some formulas: let A \in \{a_i\}, B \in \{b_j\}, and consider N trials:

  • n_{ij} = \# \{A = a_i \land B =b_j\}
  • c_i = \#\{A=a_i\}
  • r_j = \#\{B=b_j\}

Then we get:
  • Joint probability p(A=a_i, B=b_j) = \frac{n_{ij}}{N}
  • Marginal probability p(A=a_i) = \frac{c_i}{N}
  • Conditional probability p(B=b_j | A=a_i)=\frac{n_{ij}}{c_i}
  • Sum rule p(A=a_i) = \frac{1}{N}\sum_j n_{ij} = \sum_{b_j}p(A=a_i,B=b_j)
  • Product rule p(A=a_i, B=b_j) = \frac{n_{ij}}{c_i}\cdot \frac{c_i}{N} = p(B=b_j |A=a_i)\cdot p(A=a_i)

In short:

  • Sum rule: p(A) = \sum_B p(A,B)
  • Product rule: p(A,B) = p(B|A)p(A)
  • Bayes' Theorem: p(A|B)= \frac{p(B|A)p(A)}{\sum_Ap(B|A)p(A)}
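
Bayes' theorem in code, with the denominator expanded by the sum rule exactly as in the formula above. The two-class spam/ham example and all the numbers are assumptions for illustration only.

```python
# Posterior p(A|B) = p(B|A) p(A) / sum_A p(B|A) p(A), for a discrete A.

def bayes_posterior(prior, likelihood, b):
    """prior: dict a -> p(A=a); likelihood: dict (b, a) -> p(B=b|A=a)."""
    evidence = sum(likelihood[(b, a)] * prior[a] for a in prior)  # p(B=b)
    return {a: likelihood[(b, a)] * prior[a] / evidence for a in prior}

# Made-up example: how likely is a message spam, given it contains a word?
prior = {"spam": 0.2, "ham": 0.8}
likelihood = {("word", "spam"): 0.9, ("word", "ham"): 0.1}
post = bayes_posterior(prior, likelihood, "word")
print(post["spam"])  # 0.18 / (0.18 + 0.08) ≈ 0.692
```

Note how the evidence term normalizes the posterior so that it sums to 1 over all values of A.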