Best practices

Image capture

Creating a robust dataset starts with proper image capture techniques.

Match production conditions

Training images should closely resemble the images your model will encounter in production. This ensures the model learns the features and patterns that matter once it is deployed.

Maximize variability

Capture as much of the variability that may occur in production as possible:

Different lighting conditions
Multiple object positions
Edge cases and uncommon (but realistic) scenarios

This diversity ensures your model can handle real-world variations.
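If you record metadata such as lighting or object position for each image, a quick count per condition shows where the dataset is thin. The sketch below is a minimal illustration, not a OneVision feature: the `coverage_report` function, the metadata keys and values, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter


def coverage_report(images, attribute, expected_values, min_count=1):
    """Count images per value of one metadata attribute and flag thin coverage.

    `images` is a list of dicts such as {"file": "img_001.png", "lighting": "dim"}.
    Attribute names and values are hypothetical: use the metadata you record.
    """
    counts = Counter(img.get(attribute) for img in images)
    # Counter returns 0 for unseen keys, so conditions never captured are flagged too.
    under_covered = [v for v in expected_values if counts[v] < min_count]
    return counts, under_covered


# Example: check lighting coverage, requiring at least 2 images per condition.
images = [
    {"file": "a.png", "lighting": "bright"},
    {"file": "b.png", "lighting": "bright"},
    {"file": "c.png", "lighting": "dim"},
]
counts, gaps = coverage_report(images, "lighting", ["bright", "dim", "backlit"], min_count=2)
print(gaps)  # ['dim', 'backlit']
```

The same call can be repeated for each attribute you track (position, background, and so on) to decide which conditions to capture next.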

Dataset creation

Maintain image uniqueness

Avoid duplicates

Do not add duplicate or very similar images to your dataset. Duplicates over-represent some examples, which can bias your model, and they waste computational resources without adding new information.
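Exact copies can be caught by hashing file contents; near-identical images need a perceptual hash compared by Hamming distance. The following is a minimal sketch using only Python's standard library. It applies the average-hash technique to an image that has already been downscaled to a small grayscale grid (typically 8x8; the downscaling step, done with an image library of your choice, is not shown). The function names and the `max_distance` threshold are illustrative assumptions, not OneVision features.

```python
import hashlib
from itertools import combinations


def exact_duplicates(files):
    """Group file names whose raw bytes are identical. `files` maps name -> bytes."""
    groups = {}
    for name, data in files.items():
        groups.setdefault(hashlib.sha256(data).hexdigest(), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


def average_hash(pixels):
    """Average hash of a small grayscale image given as a list of rows.

    Each bit is 1 if the pixel is brighter than the image mean, else 0.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def near_duplicates(hashes, max_distance=5):
    """Return pairs of names whose hashes differ in at most `max_distance` bits.

    The default of 5 is an assumed threshold for 64-bit (8x8) hashes; tune it on your data.
    """
    pairs = []
    for (a, ha), (b, hb) in combinations(hashes.items(), 2):
        if bin(ha ^ hb).count("1") <= max_distance:
            pairs.append((a, b))
    return pairs


files = {"a.png": b"\x01\x02", "b.png": b"\x01\x02", "c.png": b"\x03"}
print(exact_duplicates(files))  # [['a.png', 'b.png']]

# Tiny 2x2 grids keep the example readable; real use would pass 8x8 grids.
grids = {
    "x.png": [[0, 255], [255, 0]],
    "y.png": [[10, 250], [240, 5]],
    "z.png": [[255, 0], [0, 255]],
}
hashes = {name: average_hash(g) for name, g in grids.items()}
print(near_duplicates(hashes, max_distance=1))  # [('x.png', 'y.png')]
```

Comparing every pair is quadratic, which is fine for a few thousand images; larger datasets usually need a smarter index.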

Separate test sets

Keep a completely independent subset of images for testing with OneVision:

✅ Use images from different sessions or time periods
❌ Do not use consecutive images from the same sequence
✅ Ensure test data is never seen during training
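One way to apply these rules is to split by capture session rather than by individual image, so that every image from a held-out session lands in the test set. This sketch assumes each image carries a session identifier (a hypothetical field such as capture date or production shift); `split_by_session` and its parameters are illustrative, not a OneVision API.

```python
import random


def split_by_session(images, test_fraction=0.2, seed=0):
    """Hold out whole capture sessions for testing.

    `images` is a list of (file_name, session_id) pairs. session_id is a
    hypothetical metadata field, e.g. capture date or production shift.
    """
    sessions = sorted({session for _, session in images})
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(sessions)
    n_test = max(1, round(len(sessions) * test_fraction))
    test_sessions = set(sessions[:n_test])
    # Every image from a held-out session goes to test, so no session spans both sets.
    train = [f for f, s in images if s not in test_sessions]
    test = [f for f, s in images if s in test_sessions]
    return train, test


images = [
    ("a.png", "mon"), ("b.png", "mon"), ("c.png", "tue"),
    ("d.png", "wed"), ("e.png", "thu"), ("f.png", "fri"),
]
train, test = split_by_session(images)
```

This only works when you have images from at least two sessions; with a single session, everything ends up in the test set.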

Strategic validation sampling

When selecting validation images from a folder:

❌ Don't: take consecutive images
✅ Do: use proper sampling techniques (e.g., every nth image or random selection)

This prevents data leakage and ensures honest performance metrics.
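Both sampling strategies take only a few lines. This sketch assumes the folder's file names are listed in capture order; `sample_every_nth`, `sample_random` and `remaining` are hypothetical helpers, not OneVision functions.

```python
import random


def sample_every_nth(files, n, offset=0):
    """Select every n-th file starting at `offset`, spreading picks across the sequence."""
    return files[offset::n]


def sample_random(files, k, seed=0):
    """Select k files uniformly at random without replacement (seeded for reproducibility)."""
    return random.Random(seed).sample(files, k)


def remaining(files, validation):
    """Files not chosen for validation, kept in their original order for training."""
    chosen = set(validation)
    return [f for f in files if f not in chosen]


# Example: 20 frames from one folder, sorted by capture order.
frames = [f"frame_{i:03d}.png" for i in range(20)]
val = sample_every_nth(frames, 5)  # frame_000, frame_005, frame_010, frame_015
train = remaining(frames, val)
```

Seeding the random generator makes the validation set reproducible, so metrics from different training runs stay comparable.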

Handle questionable images

Images with unclear or ambiguous labels can harm model performance. To handle them:

Avoid adding questionable images from the start
If they have already been added, mark them as ignored so they are excluded from training
Review and relabel them when possible

Continuous improvement

To refine a model that is already in production, focus your efforts on edge cases:

✅ Only add images that caused errors or misclassifications. This targeted approach addresses the model's weaknesses efficiently.

❌ Avoid adding random production images that the model already handles well.
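If you log the model's predictions alongside labels confirmed by a reviewer, selecting refinement candidates reduces to a filter. The record keys below (`file`, `predicted`, `actual`) and the class names are assumptions about how such a log might look, not a OneVision export format.

```python
def select_refinement_images(records):
    """Return files for production images the model got wrong.

    Each record is a dict with hypothetical keys: "file", "predicted" (model
    output) and "actual" (label confirmed by a reviewer).
    """
    return [r["file"] for r in records if r["predicted"] != r["actual"]]


log = [
    {"file": "p1.png", "predicted": "ok", "actual": "ok"},
    {"file": "p2.png", "predicted": "ok", "actual": "scratch"},
    {"file": "p3.png", "predicted": "dent", "actual": "ok"},
]
print(select_refinement_images(log))  # ['p2.png', 'p3.png']
```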

Quick checklist

When creating your dataset, check that:

Training images match production conditions
Dataset includes diverse scenarios and variations
No duplicate or near-duplicate images
Test set is completely independent
Validation images are properly sampled (not consecutive)
Questionable images are excluded or ignored
(Refinement datasets) Only error-causing images added

Related resources

Image Upload Guide - Learn how to upload your images.

Creating a Dataset - Step-by-step instructions to create a dataset and choose the right type of balancing.
