Best practices
Follow these guidelines to streamline model training and achieve better performance at lower computational cost.
Model type
Classification vs. Segmentation
Understanding when to use each approach is crucial for model performance.
Why segmentation often wins
In most cases, segmentation is preferable to classification because it trains on richer supervision: the model learns where an object is and what shape it has, not merely whether it is present, which makes the resulting model more robust.
| Approach | Information used | Best for |
|---|---|---|
| Segmentation | Geometric + color + spatial data | Precise localization, irregular objects |
| Classification | Category assignment only | Simple presence/absence detection |
Key advantage: Segmentation leverages geometric and color-based information, while classification relies solely on category assignment without structural detail.
Model input
When to use patch inference
Patch inference divides large images into smaller patches for processing, then combines the results.
Use patch inference when:
✅ Analyzing large images in high resolution
✅ Your image resolution exceeds your model's input size
✅ You need to preserve fine details in large scenes
✅ Memory constraints prevent processing the full image
Example scenarios:
- High-resolution satellite imagery (4000x4000+)
- Whole-slide medical imaging
- Detailed industrial inspection of large surfaces
- Panoramic scene analysis
Patch inference allows you to work with high-resolution images without sacrificing detail or overwhelming your GPU memory.
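The splitting step itself is simple to sketch. The helper below is a minimal NumPy illustration assuming square patches and a fixed stride (`patch_size` and `stride` are hypothetical parameter names, not platform settings); real implementations also pad or shift the final patches so image edges are fully covered, and merge overlapping predictions when stitching results back together.

```python
import numpy as np

def split_into_patches(image, patch_size, stride):
    """Slide a window over the image and collect (y, x, patch) tuples.

    Patches are NumPy views, so splitting is cheap; the (y, x) offsets
    are what you need later to stitch per-patch predictions back together.
    """
    patches = []
    h, w = image.shape[:2]
    for y in range(0, max(h - patch_size, 0) + 1, stride):
        for x in range(0, max(w - patch_size, 0) + 1, stride):
            patches.append((y, x, image[y:y + patch_size, x:x + patch_size]))
    return patches

# A 4000x4000 "satellite" image split into 512px patches with 25% overlap:
image = np.zeros((4000, 4000, 3), dtype=np.uint8)
patches = split_into_patches(image, patch_size=512, stride=384)
```

With these numbers the image yields a 10x10 grid of overlapping patches, each small enough for a typical model input size.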
Using regions of interest (ROIs)
ROIs allow you to focus model attention on specific areas of your images where analysis is needed.
Why use ROIs?
| Benefit | Impact |
|---|---|
| Focused attention | Model learns relevant features, ignores background noise |
| Faster training | Smaller input size = faster processing |
| Faster inference | Less data to process in production |
| Better accuracy | Model isn't distracted by irrelevant areas |
| Memory efficiency | Process only what matters |
When to define ROIs:
✅ Use ROIs when:
- Your area of interest is a small portion of the full image
- Background contains irrelevant or distracting information
- Objects always appear in predictable locations
- You want to exclude certain areas from analysis
❌ Skip ROIs when:
- The entire image is relevant
- Object locations vary unpredictably across the full frame
- You need context from the full scene
Example use cases:
Manufacturing line inspection:
- Full image: 1920x1080 (entire conveyor belt)
- ROI: 800x600 (product inspection area only)
- Result: 3x faster inference, better defect detection
Document analysis:
- Full image: 2480x3508 (entire page)
- ROI: 2000x400 (header section only)
- Result: Focus on relevant text, ignore margins
Define ROIs during the dataset creation phase. This ensures consistent preprocessing and optimal model training.
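In code, an ROI is just a crop applied consistently at training and inference time. Here is a minimal sketch of the manufacturing example above; the (x, y, width, height) coordinates are a hypothetical placement of the 800x600 inspection area, not values from any real setup.

```python
import numpy as np

# Hypothetical ROI for the conveyor-belt example: (x, y, width, height).
ROI = (560, 240, 800, 600)

def crop_roi(frame, roi):
    """Return only the inspection area of a full camera frame."""
    x, y, w, h = roi
    return frame[y:y + h, x:x + w]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # full conveyor-belt view
crop = crop_roi(frame, ROI)                        # 600x800 inspection area
```

The crop covers roughly a quarter of the frame's pixels, which is where the training and inference speedups come from.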
Model architecture
Right-sizing your model
Bigger isn't always better - choose model size based on your actual needs.
The model size trap
Even if your original images are large, this doesn't automatically mean you need a large model.
| Scenario | Image size | Recommended model size |
|---|---|---|
| Simple objects, clear features | Large (2000x2000) | Small to medium |
| Complex patterns, fine details | Large (2000x2000) | Medium to large |
| Simple task on small objects | Small (512x512) | Small |
| Complex segmentation | Medium (1024x1024) | Medium to large |
Why avoid oversized models?
- ❌ Longer training times
- ❌ Higher memory requirements
- ❌ Risk of overfitting on small datasets
- ❌ Slower inference in production
Start with a smaller model and scale up only if performance plateaus. A well-trained medium model often outperforms a poorly trained large one.
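That "start small, scale up" strategy can be sketched as a simple escalation loop. Everything here is illustrative: `train_and_evaluate` is a hypothetical stand-in for your platform's training call, and the target score is an assumed stopping criterion.

```python
# Hypothetical escalation loop: train_and_evaluate(size) stands in for a
# real training run and returns a validation score in [0, 1].
MODEL_SIZES = ["small", "medium", "large"]

def pick_model_size(train_and_evaluate, target_score=0.90):
    """Scale up only when a smaller model falls short of the target."""
    best = None
    for size in MODEL_SIZES:
        score = train_and_evaluate(size)
        if best is None or score > best[1]:
            best = (size, score)
        if score >= target_score:
            break  # good enough: no need to pay for a bigger model
    return best

# Canned scores standing in for real training runs:
scores = {"small": 0.82, "medium": 0.93, "large": 0.94}
choice = pick_model_size(lambda size: scores[size])
```

With these example scores the loop stops at "medium": the marginal gain from "large" never has to be paid for.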
Dataset augmentation
Augmentation is powerful, but incorrect augmentation can destroy your labels and harm model performance.
Preserve label validity
Your augmentations must not invalidate your labels. Always ask: "Does this transformation change what the label represents?"
Common mistakes to avoid
Example 1: Orientation classification
Task: Classify bottle orientation (upright vs. horizontal)
Wrong approach:
Augmentations enabled:
✅ Brightness adjustment
❌ Horizontal flip ← This changes the orientation!
❌ Vertical flip ← This changes the orientation!
❌ Rotation ← This changes the orientation!
Why it's wrong: These augmentations alter the very feature you're trying to classify. A vertically flipped "upright" bottle becomes inverted, and a rotated one becomes "horizontal", yet both samples keep the original "upright" label.
Correct approach:
Augmentations enabled:
✅ Brightness adjustment
✅ Contrast adjustment
✅ Minor scale changes
❌ Any rotation or flipping
Example 2: Text direction detection
Task: Detect text reading direction (left-to-right vs. right-to-left)
Wrong approach:
❌ Horizontal flip ← Reverses text direction!
Correct approach:
✅ Brightness/contrast only
✅ Slight scale variations
❌ No geometric transformations
Example 3: Defect location matters
Task: Detect defects on specific product sides (front, back, left, right)
Wrong approach:
❌ Any flipping or rotation
(Changes which side is which)
Correct approach:
✅ Lighting variations
✅ Slight perspective changes
❌ No flips or rotations
Augmentation decision matrix
Use this guide to decide which augmentations are safe:
| Your task | Safe augmentations | Dangerous augmentations |
|---|---|---|
| Orientation/direction classification | Brightness, contrast, color, noise | Rotation, flips |
| General object detection | All geometric + photometric | None (usually) |
| Symmetrical object detection | All augmentations | Flips/rotations, if any asymmetric features still matter |
| Position-dependent classification | Brightness, contrast only | Any geometric transform |
| Color-based classification | Geometric only | Color jitter, brightness, contrast |
| Size-based classification | Most except scale | Zoom, scale augmentations |
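The matrix above can be sketched as a lookup table that strips dangerous augmentations from a requested pipeline. The task keys and augmentation names below are illustrative, not identifiers from any platform.

```python
# The augmentation decision matrix as a lookup: task -> augmentations
# that would invalidate that task's labels. Names are illustrative.
UNSAFE_AUGMENTATIONS = {
    "orientation_classification": {"rotation", "horizontal_flip", "vertical_flip"},
    "general_object_detection": set(),
    "position_dependent_classification": {
        "rotation", "horizontal_flip", "vertical_flip", "translation", "scale",
    },
    "color_based_classification": {"color_jitter", "brightness", "contrast"},
    "size_based_classification": {"zoom", "scale"},
}

def filter_augmentations(task, requested):
    """Drop any requested augmentation the matrix flags as dangerous."""
    return sorted(set(requested) - UNSAFE_AUGMENTATIONS.get(task, set()))

kept = filter_augmentations(
    "orientation_classification",
    ["brightness", "contrast", "rotation", "horizontal_flip"],
)
```

For the orientation task, only the photometric augmentations survive the filter; encoding the matrix once keeps the rule out of individual experiment configs.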
Before enabling augmentation: Checklist
Ask yourself these questions:
- Does my label depend on orientation? → Avoid rotation/flips
- Does my label depend on position? → Avoid translations
- Does my label depend on size? → Avoid scaling
- Does my label depend on color? → Avoid color augmentations
- Does my label depend on direction? → Avoid flips
- Am I classifying based on geometry? → Avoid geometric augmentations
When in doubt, test your augmentation pipeline. Visually inspect augmented images with their labels overlaid. If the label no longer matches the transformed image, disable that augmentation.
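One way to make that inspection systematic is to transform the label together with the image and check that the pair still agrees. Here is a NumPy sketch for a bounding-box label under horizontal flip; the (x, y, w, h) box format is an assumption for illustration.

```python
import numpy as np

def hflip_with_bbox(image, bbox):
    """Flip the image and remap an (x, y, w, h) box so the pair stays valid."""
    x, y, w, h = bbox
    flipped = image[:, ::-1]
    new_x = image.shape[1] - x - w  # mirror the box around the vertical axis
    return flipped, (new_x, y, w, h)

image = np.arange(12).reshape(3, 4)          # tiny 3x4 stand-in image
flipped, box = hflip_with_bbox(image, (0, 0, 2, 2))
# The box moves with the pixels (new_x == 2), so a detection label survives.
```

A box label can be remapped like this, so the flip is safe for detection; a direction label such as "reads left-to-right" has no valid remapping, which is exactly the case where the augmentation must be disabled.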
Quick best practices summary
Before training:
- Use patch inference for high-resolution images (>2000px)
- Choose model size based on task complexity, not just image size
- Define ROIs to focus on relevant areas
- Verify that augmentations don't invalidate labels
- Start with conservative augmentation settings
During training:
- Monitor that augmented samples still make sense
- Verify ROIs capture all relevant information
- Adjust patch size if inference is slow
After training:
- Validate model on non-augmented test data
- Check performance on full images vs. patches
- Verify ROI-based predictions match expectations
Related resources
- Advanced Configuration - Detailed parameter tuning guide
- Data Augmentation Reference - Complete augmentation catalog
- Troubleshooting Guide - Common training issues and solutions