Learning Curve

Generates a learning curve by training a model multiple times on incrementally larger subsets of the data, using cross-validation for scoring. The curve compares the mean training score with the mean test score at each subset size.

Documentation

A learning curve shows the validation and training scores of an estimator for varying numbers of training samples. It is a tool for finding out how much we benefit from adding more training data and whether the estimator suffers more from a variance error or a bias error.
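As a rough illustration of how such a curve is usually read, the sketch below inspects the last point of the mean training and test curves (higher score = better). The function `diagnose` and both thresholds are hypothetical choices for illustration, not part of this node: a large train/test gap suggests variance (more data may help), while two low, close scores suggest bias (more data is unlikely to help much).

```python
def diagnose(train_scores, test_scores, gap_tol=0.1, low_score=0.7):
    """Heuristic reading of a learning curve's final point.

    gap_tol and low_score are hypothetical thresholds; tune them for
    the scoring metric actually in use.
    """
    train_final = train_scores[-1]
    test_final = test_scores[-1]
    gap = train_final - test_final
    if gap > gap_tol:
        # Model fits training data much better than unseen data.
        return "high variance: more training data may help"
    if test_final < low_score:
        # Both curves plateau at a low score with little gap.
        return "high bias: more data is unlikely to help much"
    return "reasonable fit"


print(diagnose([0.99, 0.98], [0.60, 0.75]))
```
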

A cross-validation generator splits the whole dataset k times into training and test data. Subsets of the training set with varying sizes are used to train the estimator, and for each training subset size a score is computed on the training subset and on the test set. Afterwards, the scores are averaged over all k runs for each training subset size.
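The procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the node's implementation: `kfold_indices`, `fit_line` (a one-dimensional least-squares line standing in for the connected model), `neg_mse` (negative mean squared error as the score) and `learning_curve_sketch` are all hypothetical helpers. As in scikit-learn's `learning_curve`, each training subset is taken as the first samples of the current training fold.

```python
import random
from statistics import mean


def kfold_indices(n, k, shuffle=False, seed=0):
    """Yield (train, test) index lists for a simple k-fold split."""
    if k < 2:
        raise ValueError("need at least 2 folds")
    idx = list(range(n))
    if shuffle:
        # Randomize sample order before splitting (cf. the Shuffle option).
        random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def fit_line(xs, ys):
    """Hypothetical stand-in model: least-squares line y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def neg_mse(model, xs, ys):
    """Score where higher is better: negative mean squared error."""
    slope, intercept = model
    return -mean((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


def learning_curve_sketch(X, Y, train_sizes, k=5, shuffle=False, seed=0):
    """Return (train_mean, test_mean), one entry per training size."""
    train_scores = {s: [] for s in train_sizes}
    test_scores = {s: [] for s in train_sizes}
    for train_idx, test_idx in kfold_indices(len(X), k, shuffle, seed):
        xt = [X[i] for i in test_idx]
        yt = [Y[i] for i in test_idx]
        for s in train_sizes:
            sub = train_idx[:s]  # first s samples of this training fold
            xs = [X[i] for i in sub]
            ys = [Y[i] for i in sub]
            model = fit_line(xs, ys)
            train_scores[s].append(neg_mse(model, xs, ys))
            test_scores[s].append(neg_mse(model, xt, yt))
    # Average over all k runs for each training subset size.
    train_mean = [mean(train_scores[s]) for s in train_sizes]
    test_mean = [mean(test_scores[s]) for s in train_sizes]
    return train_mean, test_mean
```

Each size in `train_sizes` should not exceed the length of a training fold (16 samples for 20 samples and `k=5`); larger values silently fall back to the whole fold.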

Definition

Input ports

model
Type: model
Description: Model to evaluate
X
Type: table
Description: Input data (X)
Y
Type: table
Description: Target values (Y)

Output ports

results
Type: table
Description: results
statistics
Type: table
Description: statistics

Configuration

Cross validation folds (cv)

Number of cross-validation folds (minimum 2)

Shuffle (shuffle)

Randomizes the order of the input dataset before it is passed to the internal cross-validation

Smallest fraction (smallest)

Size of the smallest data subset, as a fraction of the total

Steps (steps)

Number of different training/test data sizes to measure
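This page does not state exactly how Smallest fraction and Steps become sample counts. One plausible reading, assumed here and modeled on how scikit-learn's `learning_curve` treats fractional `train_sizes`, is that `steps` fractions are spaced evenly from `smallest` up to 1.0 and scaled by the size of the largest training set produced by the cross-validation split. Under this assumption the fraction is relative to the largest training fold rather than the full dataset. The helper `training_sizes` is hypothetical:

```python
def training_sizes(n_samples, cv=5, smallest=0.1, steps=5):
    """Hypothetical mapping from (cv, smallest, steps) to sample counts."""
    if cv < 2:
        raise ValueError("cv must be at least 2")
    if not 0 < smallest <= 1:
        raise ValueError("smallest must be in (0, 1]")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    # Largest training set: all samples minus the first (largest) test fold
    # of a KFold-style split, i.e. n - ceil(n / cv).
    n_max = n_samples - -(-n_samples // cv)
    if steps == 1:
        fractions = [smallest]
    else:
        # Evenly spaced fractions with the endpoint pinned exactly to 1.0.
        fractions = [smallest + (1.0 - smallest) * i / (steps - 1)
                     for i in range(steps - 1)] + [1.0]
    # Convert to counts, clip to [1, n_max] and drop duplicates.
    return sorted({min(max(int(f * n_max), 1), n_max) for f in fractions})


print(training_sizes(100, cv=5, smallest=0.25, steps=4))  # [20, 40, 60, 80]
```

With 100 samples and 5 folds, each training fold holds 80 samples, so four evenly spaced fractions starting at 0.25 give 20, 40, 60 and 80 samples. Very small datasets can yield fewer distinct sizes than `steps`, because duplicate counts are merged.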

Implementation

class node_metrics.LearningCurve