
Best practices

Image capture

Creating a robust dataset starts with proper image capture techniques.

Match production conditions

Training images should closely resemble the images your model will encounter in production. This ensures the model learns relevant features and patterns.

Maximize variability

Capture as much variability as possible that may occur in production scenarios:

  • Different lighting conditions
  • Multiple object positions
  • Edge cases and uncommon (but realistic) scenarios

This diversity ensures your model can handle real-world variations.

Dataset creation

Maintain image uniqueness

Avoid duplicates

Do not add duplicate or very similar images to your dataset. Duplicates can bias your model and waste computational resources.
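
As a rough illustration of the check above, the sketch below finds byte-identical files in a folder before upload. It is not part of OneVision; the function name `find_exact_duplicates` and the folder layout are assumptions made for this example. It only catches exact copies. Very similar images (for example, consecutive frames) would need a perceptual comparison, which is not shown here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(folder):
    """Group files under `folder` by the SHA-256 digest of their bytes.

    Returns a list of groups; each group is a list of paths whose
    contents are byte-for-byte identical. Near-duplicates are NOT detected.
    (Hypothetical helper, not a OneVision API.)
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group can then be reduced to a single image before the dataset is created.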

Separate test sets

Keep a completely independent subset of images for testing with OneVision:

✅ Use images from different sessions or time periods
❌ Do not use consecutive images from the same sequence
✅ Ensure test data is never seen during training
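
One way to keep the test set independent is to split by capture session rather than by individual image. The sketch below assumes you have recorded a session identifier for every image; the `image_sessions` mapping and the function name `split_by_session` are hypothetical and not part of OneVision.

```python
import random
from collections import defaultdict


def split_by_session(image_sessions, test_fraction=0.2, seed=0):
    """Split images so that no capture session appears in both sets.

    `image_sessions` maps an image name to a session identifier
    (assumed metadata, e.g. a capture date or a recording ID).
    Returns (train_images, test_images).
    """
    sessions = defaultdict(list)
    for image, session in image_sessions.items():
        sessions[session].append(image)

    session_ids = sorted(sessions)
    random.Random(seed).shuffle(session_ids)  # reproducible shuffle

    # Reserve whole sessions for testing, at least one.
    n_test = max(1, round(len(session_ids) * test_fraction))
    test_ids = set(session_ids[:n_test])

    train = [img for sid in session_ids if sid not in test_ids
             for img in sessions[sid]]
    test = [img for sid in session_ids if sid in test_ids
            for img in sessions[sid]]
    return train, test
```

Because whole sessions are held out, consecutive frames from one sequence cannot end up on both sides of the split. Note that with a single session, every image lands in the test set, so this approach needs at least two sessions.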

Strategic validation sampling

When selecting validation images from a folder:

❌ Don't: Take consecutive images
✅ Do: Use proper sampling techniques (e.g., every nth image, random selection)

This prevents data leakage and ensures honest performance metrics.
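
The two sampling techniques mentioned above can be sketched as follows. The helper names `every_nth` and `random_sample` are illustrative only, and the list of file names is assumed to be in capture order.

```python
import random


def every_nth(images, n):
    """Take every n-th image from an ordered list, starting with the first."""
    return images[::n]


def random_sample(images, k, seed=0):
    """Pick k distinct images at random; the seed makes the choice repeatable."""
    return random.Random(seed).sample(images, k)


# Hypothetical folder listing, in capture order.
images = [f"img_{i:03d}.png" for i in range(10)]
print(every_nth(images, 5))      # ['img_000.png', 'img_005.png']
print(len(random_sample(images, 3)))  # 3
```

Either helper spreads the validation images across the folder instead of taking one consecutive run.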

Handle questionable images

Images with unclear or ambiguous labels can harm model performance. The recommended approach:

  • Avoid adding questionable images from the start
  • If already added, mark them to be ignored during training
  • Review and relabel when possible

Refinement datasets

Continuous improvement

Once a model is in production, if you want to refine it, focus your efforts on edge cases:

✅ Only add images that caused errors or misclassifications. This targeted approach addresses the model's weaknesses efficiently.

❌ Do not add random production images that the model already handles well.
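
A minimal sketch of this selection, assuming you keep a log of production results where each record holds the image name, the model's prediction, and the label confirmed by a reviewer. The record keys (`image`, `predicted`, `actual`) and the function name are hypothetical, not a OneVision export format.

```python
def select_refinement_images(records):
    """Return the images whose prediction differs from the verified label.

    Each record is a dict with the assumed keys "image", "predicted",
    and "actual". Correctly handled images are left out.
    """
    return [r["image"] for r in records if r["predicted"] != r["actual"]]


# Hypothetical production log.
records = [
    {"image": "a.png", "predicted": "ok", "actual": "ok"},
    {"image": "b.png", "predicted": "ok", "actual": "defect"},
]
print(select_refinement_images(records))  # ['b.png']
```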

Quick checklist

When creating your dataset, ensure the following best practices are followed:

  • Training images match production conditions
  • Dataset includes diverse scenarios and variations
  • No duplicate or near-duplicate images
  • Test set is completely independent
  • Validation images are properly sampled (not consecutive)
  • Questionable images are excluded or ignored
  • (Refinement datasets) Only error-causing images added

Image Upload Guide - Learn how to upload your images.

Creating a Dataset - Step-by-step instructions to create a dataset and choose the right type of balancing.