Pretraining in machine learning refers to the process of training a model—or part of a model—on a large, general-purpose dataset before fine-tuning it on a specific downstream task. The goal is to learn useful features or representations that can be transferred to other tasks, reducing the amount of data and computation required for those tasks.
For example:
In natural language processing, large models like BERT or GPT are pretrained on vast text corpora using self-supervised objectives (e.g., masked language modeling or next-token prediction). Through these objectives the models learn grammar, semantics, and world knowledge; a minimal sketch of the next-token objective follows this list.
In computer vision, models like ResNet or Vision Transformers are pretrained on large image datasets (e.g., ImageNet) to learn generic visual features such as edges, textures, and object parts, which transfer well to new tasks; a fine-tuning sketch appears at the end of this section.
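To make the self-supervised objective concrete, here is a minimal sketch of next-token prediction in PyTorch. The tiny model, vocabulary size, and random token batch are illustrative assumptions standing in for a real tokenizer and corpus; this is not the architecture or training setup of any particular pretrained model.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not values from any real pretraining run.
VOCAB_SIZE, D_MODEL, SEQ_LEN, BATCH = 1000, 128, 32, 8

class TinyCausalLM(nn.Module):
    """A toy GPT-style language model: embed tokens, apply a causal Transformer, predict the next token."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.pos = nn.Embedding(SEQ_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, tokens):
        seq_len = tokens.size(1)
        # Causal mask: each position may only attend to itself and earlier positions.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        hidden = self.embed(tokens) + self.pos(torch.arange(seq_len))
        hidden = self.encoder(hidden, mask=mask)
        return self.lm_head(hidden)

model = TinyCausalLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Synthetic "corpus" batch; real pretraining would stream tokenized text here.
tokens = torch.randint(0, VOCAB_SIZE, (BATCH, SEQ_LEN))
logits = model(tokens[:, :-1])          # predict token t+1 from tokens up to t
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_SIZE), tokens[:, 1:].reshape(-1)
)
loss.backward()
optimizer.step()
```

The same loop, scaled up to billions of tokens and parameters, is the essence of next-token pretraining; masked language modeling differs mainly in which positions are hidden and predicted.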
Pretraining can significantly improve performance on the downstream task, especially when labeled data for that task is limited, and it often serves as the foundation for transfer learning.
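The following sketch shows the transfer-learning pattern this enables, assuming torchvision is available (and can download the pretrained weights); the 10-class head and random batch are placeholders for a real labeled target dataset.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 with weights learned by pretraining on ImageNet.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor; only the new head will be trained.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the 1000-class ImageNet classifier with a head for the target task.
num_classes = 10  # assumption: a small 10-class downstream dataset
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative fine-tuning step on a synthetic batch; real code would
# iterate over a DataLoader of labeled target-task images.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, num_classes, (4,))
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
```

Freezing the backbone is one common choice when target data is scarce; with more labeled data, it is also common to fine-tune all layers at a lower learning rate.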