
Best practices

Follow these guidelines to optimize your model training process and achieve better performance with less computational cost.

Model type

Classification vs. Segmentation

Understanding when to use each approach is crucial for model performance.

Why segmentation often wins

In most cases, segmentation is preferable to classification: because it predicts where an object is, not just whether it is present, it exploits more of the image's information and is more robust as a result.

| Approach | Information used | Best for |
| --- | --- | --- |
| Segmentation | Geometric + color + spatial data | Precise localization, irregular objects |
| Classification | Category assignment only | Simple presence/absence detection |

Key advantage: Segmentation leverages geometric and color-based information, while classification relies solely on category assignment without structural detail.

Model input

When to use patch inference

Patch inference divides large images into smaller patches for processing, then combines the results.

Use patch inference when:

✅ Analyzing large images in high resolution
✅ Your image resolution exceeds your model's input size
✅ You need to preserve fine details in large scenes
✅ Memory constraints prevent processing the full image

Example scenarios:

  • High-resolution satellite imagery (4000x4000+)
  • Whole-slide medical imaging
  • Detailed industrial inspection of large surfaces
  • Panoramic scene analysis
Performance benefit

Patch inference allows you to work with high-resolution images without sacrificing detail or overwhelming your GPU memory.
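The patch-then-stitch idea above can be sketched as follows. Here `model` is a placeholder for any callable that maps an image patch to a per-pixel score map, and the overlap size and averaging strategy are illustrative assumptions, not a prescribed implementation:

```python
import numpy as np

def infer_patches(image, model, patch_size=512, overlap=64):
    """Run `model` on overlapping patches of `image` and stitch the
    per-patch outputs back into one full-size map.

    Assumes the image is at least `patch_size` in each dimension.
    `model` is any callable mapping an HxWxC patch to an HxW score map
    (a hypothetical interface; adapt it to your framework).
    """
    h, w = image.shape[:2]
    stride = patch_size - overlap
    out = np.zeros((h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)

    for y in range(0, h, stride):
        for x in range(0, w, stride):
            # Clamp so edge patches stay fully inside the image.
            y0 = min(y, h - patch_size)
            x0 = min(x, w - patch_size)
            patch = image[y0:y0 + patch_size, x0:x0 + patch_size]
            out[y0:y0 + patch_size, x0:x0 + patch_size] += model(patch)
            counts[y0:y0 + patch_size, x0:x0 + patch_size] += 1

    # Average where overlapping patches contributed more than once.
    return out / counts
```

Averaging in the overlap regions smooths seams between patches; other stitching strategies (e.g. feathered blending) trade a little extra code for fewer edge artifacts.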

Using regions of interest (ROIs)

ROIs allow you to focus model attention on specific areas of your images where analysis is needed.

Why use ROIs?

| Benefit | Impact |
| --- | --- |
| Focused attention | Model learns relevant features, ignores background noise |
| Faster training | Smaller input size = faster processing |
| Faster inference | Less data to process in production |
| Better accuracy | Model isn't distracted by irrelevant areas |
| Memory efficiency | Process only what matters |

When to define ROIs:

Use ROIs when:

  • Your area of interest is a small portion of the full image
  • Background contains irrelevant or distracting information
  • Objects always appear in predictable locations
  • You want to exclude certain areas from analysis

Skip ROIs when:

  • The entire image is relevant
  • Object locations vary unpredictably across the full frame
  • You need context from the full scene

Example use cases:

Manufacturing line inspection:

Full image: 1920x1080 (entire conveyor belt)
ROI: 800x600 (product inspection area only)
Result: 3x faster inference, better defect detection

Document analysis:

Full image: 2480x3508 (entire page)
ROI: 2000x400 (header section only)
Result: Focus on relevant text, ignore margins
Pro tip

Define ROIs during the dataset creation phase. This ensures consistent preprocessing and optimal model training.
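As a concrete sketch of the manufacturing example above, cropping an 800x600 inspection ROI out of a 1920x1080 frame might look like the following (the ROI coordinates are assumed for illustration):

```python
import numpy as np

# Hypothetical ROI for the conveyor-belt example: the 800x600
# inspection area, centered in the 1920x1080 frame.
ROI = {"x": 560, "y": 240, "w": 800, "h": 600}  # assumed coordinates

def crop_roi(frame, roi):
    """Return only the region the model should see."""
    return frame[roi["y"]:roi["y"] + roi["h"],
                 roi["x"]:roi["x"] + roi["w"]]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
patch = crop_roi(frame, ROI)
print(patch.shape)  # (600, 800, 3)
```

Applying the same crop at dataset-creation time and at inference time keeps preprocessing consistent, which is the point of the tip above.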

Model architecture

Right-sizing your model

Bigger isn't always better - choose model size based on your actual needs.

The model size trap

Even if your original images are large, this doesn't automatically mean you need a large model.

| Scenario | Image size | Recommended model size |
| --- | --- | --- |
| Simple objects, clear features | Large (2000x2000) | Small to medium |
| Complex patterns, fine details | Large (2000x2000) | Medium to large |
| Simple task on small objects | Small (512x512) | Small |
| Complex segmentation | Medium (1024x1024) | Medium to large |

Why avoid oversized models?

  • ❌ Longer training times
  • ❌ Higher memory requirements
  • ❌ Risk of overfitting on small datasets
  • ❌ Slower inference in production
Remember

Start with a smaller model and scale up only if performance plateaus. A well-trained medium model often outperforms a poorly trained large one.

Dataset augmentation

Augmentation is powerful, but incorrect augmentation can destroy your labels and harm model performance.

Preserve label validity

Your augmentations must not invalidate your labels. Always ask: "Does this transformation change what the label represents?"

Common mistakes to avoid

Example 1: Orientation classification

Task: Classify bottle orientation (upright vs. horizontal)

Wrong approach:

Augmentations enabled:
✅ Brightness adjustment
❌ Horizontal flip ← This changes the orientation!
❌ Vertical flip ← This changes the orientation!
❌ Rotation ← This changes the orientation!


Why it's wrong: These augmentations alter the very feature you're trying to classify. A vertically flipped "upright" bottle is now upside down, yet it keeps the "upright" label.

Correct approach:

Augmentations enabled:
✅ Brightness adjustment
✅ Contrast adjustment
✅ Minor scale changes
❌ Any rotation or flipping
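A photometric-only pipeline like the correct approach above could be sketched as follows; the brightness and contrast ranges are illustrative assumptions, not recommended values:

```python
import numpy as np

rng = np.random.default_rng(0)

def safe_augment(image):
    """Photometric-only augmentation for orientation-sensitive tasks:
    random brightness and contrast, no geometric transforms.

    `image` is assumed to be a float array with values in [0, 1].
    """
    brightness = rng.uniform(-0.1, 0.1)  # additive shift (assumed range)
    contrast = rng.uniform(0.9, 1.1)     # multiplicative gain (assumed range)
    return np.clip(contrast * image + brightness, 0.0, 1.0)
```

Because the function never reorders pixels, any label that depends on orientation, position, or direction is preserved by construction.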

Example 2: Text direction detection

Task: Detect text reading direction (left-to-right vs. right-to-left)

Wrong approach:

❌ Horizontal flip ← Reverses text direction!


Correct approach:

✅ Brightness/contrast only
✅ Slight scale variations
❌ No geometric transformations

Example 3: Defect location matters

Task: Detect defects on specific product sides (front, back, left, right)

Wrong approach:

❌ Any flipping or rotation
(Changes which side is which)


Correct approach:

✅ Lighting variations
✅ Slight perspective changes
❌ No flips or rotations

Augmentation decision matrix

Use this guide to decide which augmentations are safe:

| Your task | Safe augmentations | Dangerous augmentations |
| --- | --- | --- |
| Orientation/direction classification | Brightness, contrast, color, noise | Rotation, flips |
| General object detection | All geometric + photometric | None (usually) |
| Symmetrical object detection | All augmentations | Check if asymmetric features matter |
| Position-dependent classification | Brightness, contrast only | Any geometric transform |
| Color-based classification | Geometric only | Color jitter, brightness, contrast |
| Size-based classification | Most except scale | Zoom, scale augmentations |

Before enabling augmentation: Checklist

Ask yourself these questions:

  • Does my label depend on orientation? → Avoid rotation/flips
  • Does my label depend on position? → Avoid translations
  • Does my label depend on size? → Avoid scaling
  • Does my label depend on color? → Avoid color augmentations
  • Does my label depend on direction? → Avoid flips
  • Am I classifying based on geometry? → Avoid geometric augmentations
Critical

When in doubt, test your augmentation pipeline. Visually inspect augmented images with their labels overlaid. If the label no longer matches the transformed image, disable that augmentation.
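One way to automate that spot check is sketched below. `transform` and `label_invariant` are placeholder callables you would adapt to your own task; the toy orientation rule exists only to make the sketch self-contained:

```python
import numpy as np

def check_label_validity(image, label, transform, label_invariant):
    """Apply `transform` and verify the label still describes the result.

    `label_invariant(original, transformed, label)` encodes the
    task-specific question "does this transformation change what the
    label represents?" (both callables are placeholders).
    """
    transformed = transform(image)
    return label_invariant(image, transformed, label)

# Toy orientation task: label is True ("upright") when the bright
# half of the image is at the top.
def is_upright(img):
    half = img.shape[0] // 2
    return img[:half].mean() > img[half:].mean()

img = np.zeros((10, 10), dtype=np.float32)
img[:5] = 1.0  # an "upright" sample

invariant = lambda orig, aug, label: is_upright(aug) == label

# Brightness scaling preserves the label; a vertical flip breaks it.
assert check_label_validity(img, True, lambda x: x * 0.8, invariant)
assert not check_label_validity(img, True, np.flipud, invariant)
```

Running a check like this over a sample of your dataset for each enabled augmentation catches label-destroying transforms before they reach training.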

Quick best practices summary

Before training:

  • Use patch inference for high-resolution images (>2000px)
  • Choose model size based on task complexity, not just image size
  • Define ROIs to focus on relevant areas
  • Test that augmentations don't invalidate labels
  • Start with conservative augmentation settings

During training:

  • Monitor that augmented samples still make sense
  • Verify ROIs capture all relevant information
  • Adjust patch size if inference is slow

After training:

  • Validate model on non-augmented test data
  • Check performance on full images vs. patches
  • Verify ROI-based predictions match expectations