
Best practices

Follow these guidelines to optimize your model training process and achieve better performance with less computational cost.

Model type

Classification vs. Segmentation

Understanding when to use each approach is crucial for model performance.

Why segmentation often wins

In most cases, segmentation is preferable to classification: because it predicts where an object is, not just whether it is present, it exploits more of the image's information and is more robust as a result.

| Approach | Information used | Best for |
| --- | --- | --- |
| Segmentation | Geometric + color + spatial data | Precise localization, irregular objects |
| Classification | Category assignment only | Simple presence/absence detection |

Key advantage: Segmentation leverages geometric and color-based information, while classification relies solely on category assignment without structural detail.

Model input

When to use patch inference

Patch inference divides large images into smaller patches for processing, then combines the results.

Use patch inference when:

✅ Analyzing large images in high resolution
✅ Your image resolution exceeds your model's input size
✅ You need to preserve fine details in large scenes
✅ Memory constraints prevent processing the full image

Example scenarios:

  • High-resolution satellite imagery (4000x4000+)
  • Whole-slide medical imaging
  • Detailed industrial inspection of large surfaces
  • Panoramic scene analysis
Performance benefit

Patch inference allows you to work with high-resolution images without sacrificing detail or overwhelming your GPU memory.
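The patch-then-stitch idea above can be sketched as follows. Here `model` is a placeholder for any callable that maps an image patch to a per-pixel score map, and the overlap size and averaging strategy are illustrative assumptions, not a prescribed implementation:

```python
import numpy as np

def infer_patches(image, model, patch_size=512, overlap=64):
    """Run `model` on overlapping patches of `image` and stitch the
    per-patch outputs back into one full-size map.

    Assumes the image is at least `patch_size` in each dimension.
    `model` is any callable mapping an HxWxC patch to an HxW score map
    (a hypothetical interface; adapt it to your framework).
    """
    h, w = image.shape[:2]
    stride = patch_size - overlap
    out = np.zeros((h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)

    for y in range(0, h, stride):
        for x in range(0, w, stride):
            # Clamp so edge patches stay fully inside the image.
            y0 = min(y, h - patch_size)
            x0 = min(x, w - patch_size)
            patch = image[y0:y0 + patch_size, x0:x0 + patch_size]
            out[y0:y0 + patch_size, x0:x0 + patch_size] += model(patch)
            counts[y0:y0 + patch_size, x0:x0 + patch_size] += 1

    # Average where overlapping patches contributed more than once.
    return out / counts
```

Averaging in the overlap regions smooths seams between patches; other stitching strategies (e.g. feathered blending) trade a little extra code for fewer edge artifacts.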

Using regions of interest (ROIs)

ROIs allow you to focus model attention on specific areas of your images where analysis is needed.

Why use ROIs?

| Benefit | Impact |
| --- | --- |
| Focused attention | Model learns relevant features, ignores background noise |
| Faster training | Smaller input size = faster processing |
| Faster inference | Less data to process in production |
| Better accuracy | Model isn't distracted by irrelevant areas |
| Memory efficiency | Process only what matters |

When to define ROIs:

Use ROIs when:

  • Your area of interest is a small portion of the full image
  • Background contains irrelevant or distracting information
  • Objects always appear in predictable locations
  • You want to exclude certain areas from analysis

Skip ROIs when:

  • The entire image is relevant
  • Object locations vary unpredictably across the full frame
  • You need context from the full scene

Example use cases:

Manufacturing line inspection:

Full image: 1920x1080 (entire conveyor belt)
ROI: 800x600 (product inspection area only)
Result: 3x faster inference, better defect detection

Document analysis:

Full image: 2480x3508 (entire page)
ROI: 2000x400 (header section only)
Result: Focus on relevant text, ignore margins
Pro tip

Define ROIs during the dataset creation phase. This ensures consistent preprocessing and optimal model training.
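As a concrete sketch of the manufacturing example above, cropping an 800x600 inspection ROI out of a 1920x1080 frame might look like the following (the ROI coordinates are assumed for illustration):

```python
import numpy as np

# Hypothetical ROI for the conveyor-belt example: the 800x600
# inspection area, centered in the 1920x1080 frame.
ROI = {"x": 560, "y": 240, "w": 800, "h": 600}  # assumed coordinates

def crop_roi(frame, roi):
    """Return only the region the model should see."""
    return frame[roi["y"]:roi["y"] + roi["h"],
                 roi["x"]:roi["x"] + roi["w"]]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
patch = crop_roi(frame, ROI)
print(patch.shape)  # (600, 800, 3)
```

Applying the same crop at dataset-creation time and at inference time keeps preprocessing consistent, which is the point of the tip above.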

Model architecture

Right-sizing your model

Bigger isn't always better - choose model size based on your actual needs.

The model size trap

Even if your original images are large, this doesn't automatically mean you need a large model.

| Scenario | Image size | Recommended model size |
| --- | --- | --- |
| Simple objects, clear features | Large (2000x2000) | Small to medium |
| Complex patterns, fine details | Large (2000x2000) | Medium to large |
| Simple task on small objects | Small (512x512) | Small |
| Complex segmentation | Medium (1024x1024) | Medium to large |

Why avoid oversized models?

  • ❌ Longer training times
  • ❌ Higher memory requirements
  • ❌ Risk of overfitting on small datasets
  • ❌ Slower inference in production
Remember

Start with a smaller model and scale up only if performance plateaus. A well-trained medium model often outperforms a poorly trained large one.

Dataset augmentation

Augmentation is powerful, but incorrect augmentation can destroy your labels and harm model performance.

Preserve label validity

Your augmentations must not invalidate your labels. Always ask: "Does this transformation change what the label represents?"

Common mistakes to avoid

Example 1: Orientation classification

Task: Classify bottle orientation (upright vs. horizontal)

Wrong approach:

Augmentations enabled:
✅ Brightness adjustment
❌ Horizontal flip ← This changes the orientation!
❌ Vertical flip ← This changes the orientation!
❌ Rotation ← This changes the orientation!


Why it's wrong: These augmentations alter the very feature you're trying to classify. A vertically flipped "upright" bottle is now upside down, yet it keeps the "upright" label.

Correct approach:

Augmentations enabled:
✅ Brightness adjustment
✅ Contrast adjustment
✅ Minor scale changes
❌ Any rotation or flipping
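A photometric-only pipeline like the correct approach above could be sketched as follows; the brightness and contrast ranges are illustrative assumptions, not recommended values:

```python
import numpy as np

rng = np.random.default_rng(0)

def safe_augment(image):
    """Photometric-only augmentation for orientation-sensitive tasks:
    random brightness and contrast, no geometric transforms.

    `image` is assumed to be a float array with values in [0, 1].
    """
    brightness = rng.uniform(-0.1, 0.1)  # additive shift (assumed range)
    contrast = rng.uniform(0.9, 1.1)     # multiplicative gain (assumed range)
    return np.clip(contrast * image + brightness, 0.0, 1.0)
```

Because the function never reorders pixels, any label that depends on orientation, position, or direction is preserved by construction.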

Example 2: Text direction detection

Task: Detect text reading direction (left-to-right vs. right-to-left)

Wrong approach:

❌ Horizontal flip ← Reverses text direction!


Correct approach:

✅ Brightness/contrast only
✅ Slight scale variations
❌ No geometric transformations

Example 3: Defect location matters

Task: Detect defects on specific product sides (front, back, left, right)

Wrong approach:

❌ Any flipping or rotation
(Changes which side is which)


Correct approach:

✅ Lighting variations
✅ Slight perspective changes
❌ No flips or rotations

Augmentation decision matrix

Use this guide to decide which augmentations are safe:

| Your task | Safe augmentations | Dangerous augmentations |
| --- | --- | --- |
| Orientation/direction classification | Brightness, contrast, color, noise | Rotation, flips |
| General object detection | All geometric + photometric | None (usually) |
| Symmetrical object detection | All augmentations | Check if asymmetric features matter |
| Position-dependent classification | Brightness, contrast only | Any geometric transform |
| Color-based classification | Geometric only | Color jitter, brightness, contrast |
| Size-based classification | Most except scale | Zoom, scale augmentations |

Before enabling augmentation: Checklist

Ask yourself these questions:

  • Does my label depend on orientation? → Avoid rotation/flips
  • Does my label depend on position? → Avoid translations
  • Does my label depend on size? → Avoid scaling
  • Does my label depend on color? → Avoid color augmentations
  • Does my label depend on direction? → Avoid flips
  • Am I classifying based on geometry? → Avoid geometric augmentations
Critical

When in doubt, test your augmentation pipeline. Visually inspect augmented images with their labels overlaid. If the label no longer matches the transformed image, disable that augmentation.
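One way to automate that spot check is sketched below. `transform` and `label_invariant` are placeholder callables you would adapt to your own task; the toy orientation rule exists only to make the sketch self-contained:

```python
import numpy as np

def check_label_validity(image, label, transform, label_invariant):
    """Apply `transform` and verify the label still describes the result.

    `label_invariant(original, transformed, label)` encodes the
    task-specific question "does this transformation change what the
    label represents?" (both callables are placeholders).
    """
    transformed = transform(image)
    return label_invariant(image, transformed, label)

# Toy orientation task: label is True ("upright") when the bright
# half of the image is at the top.
def is_upright(img):
    half = img.shape[0] // 2
    return img[:half].mean() > img[half:].mean()

img = np.zeros((10, 10), dtype=np.float32)
img[:5] = 1.0  # an "upright" sample

invariant = lambda orig, aug, label: is_upright(aug) == label

# Brightness scaling preserves the label; a vertical flip breaks it.
assert check_label_validity(img, True, lambda x: x * 0.8, invariant)
assert not check_label_validity(img, True, np.flipud, invariant)
```

Running a check like this over a sample of your dataset for each enabled augmentation catches label-destroying transforms before they reach training.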

Quick best practices summary

Before training:

  • Use patch inference for high-resolution images (>2000px)
  • Choose model size based on task complexity, not just image size
  • Define ROIs to focus on relevant areas
  • Test that augmentations don't invalidate labels
  • Start with conservative augmentation settings

During training:

  • Monitor that augmented samples still make sense
  • Verify ROIs capture all relevant information
  • Adjust patch size if inference is slow

After training:

  • Validate model on non-augmented test data
  • Check performance on full images vs. patches
  • Verify ROI-based predictions match expectations