COMP34312
Bias Variance Decomposition
- Noise: $\mathbb{E}\big[(y - \bar{y}(x))^2\big]$, where $\bar{y}(x) = \mathbb{E}[y \mid x]$; the irreducible error in the data
- Bias: $\big(\mathbb{E}_D[\hat{f}(x)] - \bar{y}(x)\big)^2$; how far the average fitted model is from the target
- Variance: $\mathbb{E}_D\big[(\hat{f}(x) - \mathbb{E}_D[\hat{f}(x)])^2\big]$; how much the fitted model varies across training sets $D$
- Expected squared loss $=$ noise $+$ bias$^2$ $+$ variance
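A minimal sketch (illustrative, not from the notes) that estimates the three terms by Monte Carlo; the $\sin$ target, noise level, sample size and straight-line model are all assumptions made for the demo.

```python
# Illustrative Monte-Carlo estimate of the decomposition
# E[(y - f_hat(x))^2] = noise + bias^2 + variance at a single test point.
import numpy as np

rng = np.random.default_rng(0)
f_true = lambda x: np.sin(2 * np.pi * x)     # assumed target E[y|x]
sigma, x0 = 0.3, 0.25                        # assumed noise level, test point

preds = []
for _ in range(2000):                        # resample the training set D
    x = rng.uniform(0, 1, 30)
    y = f_true(x) + sigma * rng.normal(size=30)
    w1, w0 = np.polyfit(x, y, deg=1)         # fit a straight line (the model)
    preds.append(w1 * x0 + w0)

preds = np.array(preds)
noise = sigma ** 2                           # irreducible error
bias2 = (preds.mean() - f_true(x0)) ** 2     # (avg model - target)^2
variance = preds.var()                       # spread of the model across D
print(f"noise={noise:.4f} bias^2={bias2:.4f} variance={variance:.4f}")
```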
- Double Descent: neural nets typically exhibit low bias and high variance. If over-trained, bias $+$ variance increase over time; however, in the over-parameterised regime ($p > n$, more parameters than training samples) both bias and variance start decreasing again
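The shape is easy to reproduce with random-feature regression. The sketch below is illustrative (the data, feature map and sizes are all assumptions); `np.linalg.lstsq` returns the minimum 2-norm solution in the under-determined case, and the test error typically peaks near the interpolation threshold $p \approx n$ before falling again.

```python
# Illustrative double-descent demo with fixed random ReLU features.
# lstsq returns the minimum 2-norm solution when p > n (over-parameterised),
# mimicking the interpolant that GD from zero initialisation converges to.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, d_in = 40, 500, 5                    # assumed sizes
w_true = rng.normal(size=d_in)
X = rng.normal(size=(n, d_in))
y = X @ w_true + 0.1 * rng.normal(size=n)
X_test = rng.normal(size=(n_test, d_in))
y_test = X_test @ w_true

def feats(A, W):
    return np.maximum(A @ W, 0.0)               # random ReLU feature map

for p in [5, 10, 20, 35, 40, 45, 80, 200, 1000]:
    W = rng.normal(size=(d_in, p)) / np.sqrt(d_in)
    beta, *_ = np.linalg.lstsq(feats(X, W), y, rcond=None)
    mse = np.mean((feats(X_test, W) @ beta - y_test) ** 2)
    print(f"p={p:5d}  test MSE={mse:.3f}")      # peak expected near p = n = 40
```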
Bias Variance Decomposition for Ensembles
Ambiguity of the Ensemble
- ambiguity: $a(x) = \frac{1}{M}\sum_{i=1}^{M}\big(f_i(x) - \bar{f}(x)\big)^2$, where $\bar{f}(x) = \frac{1}{M}\sum_{i=1}^{M} f_i(x)$ is the ensemble prediction
- $(y - \bar{f})^2 = \frac{1}{M}\sum_{i=1}^{M}(y - f_i)^2 - a(x)$: the loss of the ensemble is guaranteed to be less than or equal to the average member loss, i.e. the ensemble is at least as good as the average member
- for independent members, the overall ensemble variance is reduced by a factor of $M$ relative to the average member variance
- expected ensemble loss $=$ avg bias $+$ avg variance $-$ expected ambiguity (diversity)
- the improvement over the average member is entirely determined by the diversity
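A quick numerical check of the ambiguity decomposition; the target and member predictions below are made up, but the identity holds for any values.

```python
# Numerical check of the ambiguity decomposition for squared loss:
# (y - fbar)^2 = avg member loss - ambiguity, so ensemble <= average member.
import numpy as np

rng = np.random.default_rng(1)
y = 1.0                                       # made-up target at a test point
f = rng.normal(loc=1.2, scale=0.5, size=9)    # made-up predictions, M = 9
fbar = f.mean()                               # ensemble prediction

avg_loss = np.mean((y - f) ** 2)              # average member loss
ambiguity = np.mean((f - fbar) ** 2)          # member disagreement (diversity)
ens_loss = (y - fbar) ** 2                    # ensemble loss

assert np.isclose(ens_loss, avg_loss - ambiguity)   # the identity
assert ens_loss <= avg_loss                         # never worse than average
print(ens_loss, avg_loss, ambiguity)
```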
Empirical Risk Minimisation
- empirical risk minimiser: $\hat{f} = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f)$
- best model in a given family (lowest population risk in the family): $f^*_{\mathcal{F}} = \arg\min_{f \in \mathcal{F}} R(f)$
- Bayes model (lowest possible population risk over all functions): $f^* = \arg\min_{f} R(f)$
Excess Risk: $R(\hat{f}) - R(f^*)$
Estimation/Approximation Decomposition:
$R(\hat{f}) - R(f^*) = \big(R(\hat{f}) - R(f^*_{\mathcal{F}})\big) + \big(R(f^*_{\mathcal{F}}) - R(f^*)\big)$
- Estimation Error: $R(\hat{f}) - R(f^*_{\mathcal{F}})$ (random, depends on the sample size)
- Approximation Error: $R(f^*_{\mathcal{F}}) - R(f^*)$ (constant, depends on the choice of model family)
Optimisation/Estimation/Approximation Decomposition
$R(\tilde{f}) - R(f^*) = \big(R(\tilde{f}) - R(\hat{f})\big) + \big(R(\hat{f}) - R(f^*_{\mathcal{F}})\big) + \big(R(f^*_{\mathcal{F}}) - R(f^*)\big)$, where $\tilde{f}$ is the model actually returned by the optimiser
- Optimisation Error: $R(\tilde{f}) - R(\hat{f})$ (depends on the optimisation algorithm and how long it is run)
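A sketch making all three error terms concrete; every choice here is an illustrative assumption (family $\mathcal{F}$ = straight lines, Bayes model = $\sin(2\pi x)$, population risks estimated by Monte Carlo, optimiser = 20 GD steps).

```python
# Illustrative decomposition on synthetic data where the Bayes model is known.
# Family F = straight lines; Bayes model = sin(2*pi*x); R(f*) = sigma^2.
import numpy as np

rng = np.random.default_rng(2)
f_bayes = lambda x: np.sin(2 * np.pi * x)
sigma = 0.2

def risk(w):                          # Monte-Carlo estimate of population risk
    x = rng.uniform(0, 1, 200_000)
    y = f_bayes(x) + sigma * rng.normal(size=x.size)
    return np.mean((w[0] * x + w[1] - y) ** 2)

# best model in the family: ERM on a huge sample approximates it
xb = rng.uniform(0, 1, 200_000)
yb = f_bayes(xb) + sigma * rng.normal(size=xb.size)
w_family = np.polyfit(xb, yb, deg=1)

# empirical risk minimiser on a small sample
x = rng.uniform(0, 1, 30)
y = f_bayes(x) + sigma * rng.normal(size=30)
w_erm = np.polyfit(x, y, deg=1)

# optimiser output: only 20 GD steps on the empirical risk, from zero init
A = np.stack([x, np.ones_like(x)], axis=1)
w_gd = np.zeros(2)
for _ in range(20):
    w_gd -= 0.1 * A.T @ (A @ w_gd - y) / len(y)

print("optimisation error:", risk(w_gd) - risk(w_erm))
print("estimation error:  ", risk(w_erm) - risk(w_family))
print("approximation error:", risk(w_family) - sigma ** 2)
```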
Linear Regression
GD
- Gradient Descent update: $w_{t+1} = w_t - \eta \nabla L(w_t)$, with step size $\eta > 0$
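A minimal sketch of this update on the least-squares loss; the data and step size are illustrative choices.

```python
# Minimal GD on the least-squares loss L(w) = ||Xw - y||^2 / (2n).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3))                     # made-up design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

w, eta = np.zeros(3), 0.1                        # zero init, assumed step size
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)            # gradient of L at w
    w = w - eta * grad                           # the GD update
print(w)                                         # close to [1.0, -2.0, 0.5]
```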
GD in Over-parameterised Linear Regression
- If $X \in \mathbb{R}^{n \times d}$ is of rank $n$ (full row rank), then $XX^\top$ is invertible
Objective/Loss Function
- $L(w) = \frac{1}{2}\|Xw - y\|_2^2$, minimised over $w \in \mathbb{R}^d$
- Assumptions:
    - $X \in \mathbb{R}^{n \times d}$ is full rank
    - over-parameterised, i.e. $d > n$ (more parameters than samples)
NOTE:
- $L(w) = 0$ iff $Xw = y$, i.e. $w$ interpolates the data
- multiplicity of global minima: there are uncountably many global minima, which can be described as $\{w^{\dagger} + v : v \in \ker(X)\}$, where $w^{\dagger} = X^\top (XX^\top)^{-1} y$
- for any $v \in \ker(X)$ with $v \neq 0$, $\|w^{\dagger} + v\|_2^2 = \|w^{\dagger}\|_2^2 + \|v\|_2^2 > \|w^{\dagger}\|_2^2$ since $w^{\dagger} \perp \ker(X)$, i.e. the 2-norm of all other global minimisers of $L$ is bigger than that of $w^{\dagger}$
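A quick numerical check of both points, using a random direction projected onto $\ker(X)$; the dimensions are illustrative.

```python
# Check: solutions of Xw = y form w_dagger + ker(X), and w_dagger is
# orthogonal to ker(X), so any other interpolant has a strictly larger norm.
import numpy as np

rng = np.random.default_rng(4)
n, d = 10, 50                                    # over-parameterised: d > n
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w_dagger = X.T @ np.linalg.inv(X @ X.T) @ y      # minimum 2-norm interpolant
v = rng.normal(size=d)
v -= X.T @ np.linalg.inv(X @ X.T) @ (X @ v)      # project v onto ker(X)

w_other = w_dagger + v                           # another global minimiser
assert np.allclose(X @ w_other, y)               # it still interpolates
print(np.linalg.norm(w_dagger), np.linalg.norm(w_other))  # second is larger
```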
Full-rank over-parameterised Linear Regression can be set up (initialise at $w_0 = 0$) such that GD finds the minimum 2-norm interpolant ($w^{\dagger} = X^\top (XX^\top)^{-1} y$)
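A sketch verifying this numerically: with $w_0 = 0$ every GD iterate stays in the row space of $X$, so the limit is $w^{\dagger}$. The step size and iteration count below are illustrative choices.

```python
# GD on L(w) = ||Xw - y||^2 / 2 from w0 = 0 converges to the minimum
# 2-norm interpolant w_dagger = X^T (X X^T)^{-1} y.
import numpy as np

rng = np.random.default_rng(5)
n, d = 10, 50
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w = np.zeros(d)                                  # crucial: zero initialisation
eta = 1.0 / np.linalg.norm(X, 2) ** 2            # step size below 2/L
for _ in range(20_000):
    w -= eta * X.T @ (X @ w - y)                 # gradient of L at w

w_dagger = X.T @ np.linalg.inv(X @ X.T) @ y
print(np.allclose(X @ w, y), np.allclose(w, w_dagger))   # True True
```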