TildAlice
Why Your 200-Image Dataset Isn't Doomed
Small datasets are the norm, not the exception. You've got 200 labeled medical images, or 500 product photos, and you need to train a classifier that doesn't just memorize the training set. Data augmentation is the obvious move, but the implementation details matter more than most tutorials admit.
I tested two popular augmentation libraries — Albumentations and Kornia — on a 300-image dataset of industrial defects. One crashed my training loop with CUDA errors. The other added 40% to my epoch time. Here's what actually happened.
Albumentations builds on NumPy and OpenCV. Every transform runs on the CPU and outputs a NumPy array, which you convert to a tensor afterward. It's been the go-to choice since 2018 because the API is clean and the transform catalog is huge: 70+ operations, including domain-specific ones like CLAHE and optical distortion.