We can make your applications smarter by giving them the tools to dynamically learn trends and make predictions.

Overfitting

Description

Overfitting occurs when a model fits its training data so closely that it memorizes individual examples, including noise and irrelevant details, instead of only the meaningful trends. Accuracy on the training data can then approach 100%, but the model loses its ability to generalize to new data (and therefore its usefulness) because it has captured irrelevant information.
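
The gap between accuracy on the training data and accuracy on unseen data can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data is synthetic (the true trend is "the label is True when x >= 40", with two training labels flipped to simulate noise), and `Memorizer` and `ThresholdRule` are hypothetical toy models, not part of any library.

```python
from collections import Counter

def make_labels(xs, threshold=40):
    """Label each x by the true (hypothetical) trend: True when x >= threshold."""
    return [(x, x >= threshold) for x in xs]

# Synthetic data: even numbers for training, odd numbers for testing.
train = make_labels(range(0, 100, 2))
test = make_labels(range(1, 100, 2))

# Flip two training labels to simulate noise in the dataset.
noisy_points = {10, 70}
train = [(x, (not y) if x in noisy_points else y) for x, y in train]

class Memorizer:
    """Stores every training example verbatim (an extreme case of overfitting)."""
    def fit(self, data):
        self.table = dict(data)
        # For inputs never seen in training, fall back to the most common label.
        self.default = Counter(y for _, y in data).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.table.get(x, self.default)

class ThresholdRule:
    """Learns a single cut-off value: the general trend, not individual points."""
    def fit(self, data):
        candidates = sorted(x for x, _ in data)
        # Pick the cut-off that classifies the most training examples correctly.
        self.t = max(candidates, key=lambda t: sum((x >= t) == y for x, y in data))
        return self

    def predict(self, x):
        return x >= self.t

def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)

memorizer = Memorizer().fit(train)
rule = ThresholdRule().fit(train)

for name, model in [("memorizer", memorizer), ("threshold rule", rule)]:
    print(f"{name}: train={accuracy(model, train):.2f} test={accuracy(model, test):.2f}")
```

In this sketch the memorizer scores perfectly on the data it was trained on, noise included, but does much worse on the unseen points. The threshold rule misses the two noisy training labels yet classifies all of the unseen points correctly.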

Example

A patient in a medical database has a history of heart attack. Her name is “Alice”, she is 58, and she has a blood pressure of 150/90. Rather than learning about the association between blood pressure, age, and heart attacks in general, the classifier learns that people named “Alice” who are 58 and have a blood pressure of 150/90 are prone to heart attacks. Should the classifier then encounter someone named “Bob” with the same medical findings, it would consider Bob less likely to have a heart attack, simply because his name is not Alice. The classifier has thus fit the data more precisely than is useful.
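
The Alice and Bob scenario can be sketched in a few lines. This is a toy stand-in for a real classifier, under stated assumptions: the records other than Alice's are invented, `estimate_risk` is a hypothetical helper that matches patients exactly on the chosen features, and it falls back to the overall rate when nothing matches.

```python
# Hypothetical records; only Alice's values come from the example above.
records = [
    {"name": "Alice", "age": 58, "systolic": 150, "diastolic": 90, "heart_attack": True},
    {"name": "Carol", "age": 34, "systolic": 115, "diastolic": 75, "heart_attack": False},
    {"name": "Dave", "age": 41, "systolic": 120, "diastolic": 80, "heart_attack": False},
    {"name": "Erin", "age": 29, "systolic": 110, "diastolic": 70, "heart_attack": False},
]

def estimate_risk(records, patient, features):
    """Fraction of matching past patients who had a heart attack.

    A past patient matches when every listed feature is identical.
    With no matches, fall back to the rate across all records.
    """
    matches = [r for r in records if all(r[f] == patient[f] for f in features)]
    if not matches:
        matches = records
    return sum(r["heart_attack"] for r in matches) / len(matches)

bob = {"name": "Bob", "age": 58, "systolic": 150, "diastolic": 90}

# Overfit: the irrelevant "name" feature is treated as part of the pattern.
with_name = estimate_risk(records, bob, ["name", "age", "systolic", "diastolic"])

# More useful: only the medical findings are considered.
without_name = estimate_risk(records, bob, ["age", "systolic", "diastolic"])

print(f"risk for Bob using name: {with_name:.2f}")
print(f"risk for Bob ignoring name: {without_name:.2f}")
```

When the name is included, Bob matches nobody and receives only the low overall rate. When it is left out, his findings match Alice's record and his estimated risk is high.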

Solution

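Validation (“stop”) sets, regularization, and ensemble methods can substantially reduce overfitting, although none of them eliminates it entirely. Cross-validation also helps: it gives a more reliable estimate of performance on unseen data, which makes overfitting easier to detect and guides the choice of model and settings.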
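
One common way to use a validation (“stop”) set is early stopping: training halts once the loss on the held-out data stops improving. The following is a minimal sketch, not a definitive implementation. The function name, the `patience` parameter, and the list of validation losses are all assumptions made for illustration; a real training loop would supply its own `train_step` and `validation_loss`.

```python
def train_with_early_stopping(train_step, validation_loss, max_epochs=100, patience=2):
    """Train until validation loss has not improved for `patience` epochs.

    train_step(epoch) runs one epoch of training.
    validation_loss(epoch) returns the loss on the held-out validation set.
    Returns (best_epoch, best_loss, last_epoch_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    for epoch in range(max_epochs):
        last_epoch = epoch
        train_step(epoch)
        loss = validation_loss(epoch)
        if loss < best_loss:
            # Validation performance improved: remember this epoch.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return best_epoch, best_loss, last_epoch

# Hypothetical validation losses: they fall at first, then rise as the
# model begins to fit noise in the training data.
simulated_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]

best_epoch, best_loss, last_epoch = train_with_early_stopping(
    train_step=lambda epoch: None,  # placeholder: no real training happens here
    validation_loss=lambda epoch: simulated_losses[epoch],
    max_epochs=len(simulated_losses),
    patience=2,
)
print(f"best epoch: {best_epoch}, loss: {best_loss}, stopped after epoch: {last_epoch}")
```

With these simulated losses, the best validation loss occurs at epoch 3 and training stops after epoch 5. In practice the model parameters saved at the best epoch would be the ones kept.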