Advanced configuration

The Parameters section is your control center for configuring the deep learning architecture and training process. Making the right parameter choices can dramatically impact training time, model accuracy, and real-world performance.

Parameters are grouped into three main categories to help you navigate the many options available:

  • Model architecture: Define the neural network structure and behavior.
  • Dataset parameters: Control how your data is processed and augmented.
  • Training parameters: Configure the optimization process.

Each parameter category affects different aspects of your model's performance and behavior.

Parameters section with the three main configuration categories.

Model architecture

Define the fundamental structure of your neural network. These choices affect capacity, speed, and learning ability.

Backbone selection

The backbone is the pre-trained neural network that serves as the foundation for your model. Different backbones offer different tradeoffs between accuracy and computational efficiency:

| Backbone family | Characteristics | Best for |
| --- | --- | --- |
| Simple | Simple and fast, but limited in capacity | Basic classification tasks |
| ResNet-Simple | Custom implementation of ResNet | Complex classification tasks |
| ResNet | Good feature extraction; residual connections prevent vanishing gradients | General purpose, an excellent starting point for most tasks |
| VGG | Simple architecture, strong feature extraction | Tasks requiring fine-grained visual features |
| DenseNet | Dense connections, parameter efficient | Complex visual tasks with limited training data |
| EfficientNet | Optimized for efficiency, scales well across different sizes | Mobile applications, resource-constrained environments |

Decision guide:

  • First model? → ResNet-Simple
  • Limited resources? → EfficientNetB0
  • Complex task + lots of data? → ResNet50 or DenseNet
  • Need fine details? → VGG

Pro tip

For most industrial applications, ResNet50 provides an excellent balance of accuracy and performance. Only switch to more complex backbones if you have sufficient data and computational resources.

Backbone size

The backbone size determines the depth and complexity of the neural network. Larger backbones can learn more complex features but require more memory and computation.

| Size option | Characteristics | Best for |
| --- | --- | --- |
| Small | Fewer layers, faster training, lower memory usage | Simple tasks, limited data |
| Medium | Balanced depth and complexity | Most general tasks |
| Large | More layers, higher capacity | Complex tasks with large datasets |
| Extra large | Deepest networks, highest capacity | Very complex tasks with abundant data |

How to choose:

  • ✅ Start with Medium for most general tasks.
  • ⬇️ Go Small for simple tasks, limited data, or if you hit out-of-memory errors during training.
  • ⬆️ Go Large if accuracy plateaus and you have more data.
  • For very complex tasks with abundant data: Extra large.

Output downsample

The output downsample factor controls the spatial resolution of the model's output feature maps. Lower downsample values retain more spatial detail but increase memory usage.

| Downsample factor | Characteristics | Best for |
| --- | --- | --- |
| 4x | High-resolution output, more detail | Segmentation tasks, small objects |
| 8x | Balanced resolution and memory usage | Most tasks, a good default choice |
| 16x | Lower-resolution output, less detail | Large objects, classification tasks |

How to choose:

  • For segmentation or tasks requiring fine detail: Use 4x or 8x.
  • For general tasks: 8x is a good starting point.
  • For classification or large objects: 16x to save memory.
  • If you encounter out-of-memory errors during training: Increase the downsample factor.
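The effect of the downsample factor is simple integer division of the input resolution. This illustrative sketch (not part of the platform) shows the relationship:

```python
def output_resolution(height: int, width: int, downsample: int) -> tuple[int, int]:
    """Spatial size of the output feature map for a given downsample factor."""
    return height // downsample, width // downsample

# For a 512x512 input: 4x -> (128, 128), 8x -> (64, 64), 16x -> (32, 32).
```

Since memory usage grows with the number of output cells, moving from 4x to 8x cuts the output area by a factor of four.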

Loss configuration

The loss function measures how far your model's predictions are from the true values, guiding the learning process:

Classification loss options:

  • Cross entropy: Standard loss for classification, works well for balanced classes.
  • Focal loss: Focuses on hard examples, useful when classes are imbalanced.
  • Binary cross entropy: Used for multi-label classification where multiple labels can apply.
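To see why focal loss helps with imbalance, here is a minimal sketch of the per-example math (illustrative only, not the platform's implementation):

```python
import math

def cross_entropy(p_true: float) -> float:
    """Standard cross entropy, given the predicted probability of the true class."""
    return -math.log(p_true)

def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss scales cross entropy by (1 - p)^gamma, down-weighting easy examples."""
    return (1.0 - p_true) ** gamma * cross_entropy(p_true)

# With gamma = 2, an easy example (p = 0.9) is down-weighted 100x,
# while a hard example (p = 0.1) keeps 81% of its cross entropy loss.
```

The net effect is that training gradients are dominated by the hard, often minority-class examples instead of the many easy ones.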

Segmentation loss options:

  • Dice loss: Measures overlap between predicted and true masks, good for imbalanced pixel classes.
  • IoU loss: Based on Intersection over Union, robust to class imbalance.
  • Cross entropy: Can be used for segmentation but may struggle with class imbalance.
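Dice loss can be sketched in a few lines over flattened binary masks (an illustrative stand-in, not the platform's implementation):

```python
def dice_loss(pred: list[int], true: list[int], eps: float = 1e-6) -> float:
    """Dice loss = 1 - 2|A∩B| / (|A| + |B|), with eps guarding against empty masks."""
    intersection = sum(p * t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)

# Perfect overlap gives a loss near 0; disjoint masks give a loss near 1,
# regardless of how few foreground pixels there are, which is why it is
# robust to pixel-class imbalance.
```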

Detection loss options:

  • Combination of regression losses (for bounding box coordinates) and classification losses (for object classes).

How to choose:

  • For balanced classification tasks: Cross entropy.
  • For imbalanced classes (some classes appear much more frequently): Focal loss.
  • For segmentation with varying object sizes: Dice loss or IoU loss.

Dataset parameters

Dataset parameters control how your images are processed before and during training, affecting what your model "sees" during the learning process.

Data augmentation

Augmentation artificially expands your training data by applying transformations to create variations of your images. This helps your model generalize better to new, unseen images.

Common augmentations include:

| Augmentation | Effect | When to use | When to avoid |
| --- | --- | --- | --- |
| Horizontal flip | Mirrors the image horizontally | Most applications | When left/right orientation matters (text, asymmetric objects) |
| Vertical flip | Mirrors the image vertically | Satellite imagery, microscopy | When up/down orientation matters (most real-world objects) |
| Rotation | Rotates the image by random angles | Objects that can appear at any orientation | When orientation is fixed in your application |
| Translation | Shifts the image along the X or Y axis | Objects that can appear at different positions | When object position is fixed |
| Brightness/contrast/sharpness | Adjusts lighting conditions | Outdoor applications or varying lighting | High-precision color-based analysis |
| Aspect ratio | Changes the image's aspect ratio | Objects that can appear at different scales | When scale is fixed in your application |
| Crop/pad | Applies cropping or padding | Adding variation for more robust models | Precision measurement applications |
| Shear | Applies angular distortion | Adding variation for more robust models | Precision measurement applications |
| Color changes | Varies colors randomly | Natural scene analysis | Medical imaging, color-critical applications |
| Background changes | Varies background colors or patterns | Helping the model focus on foreground objects | Background-foreground separation tasks |
| Distractor addition | Adds irrelevant objects or patterns | Improving robustness to distractions | When focusing on specific objects |

Recommended configurations:

  • For general classification: Enable horizontal flip, minor rotation (±15°), brightness/contrast adjustments
  • For object detection: All of the above plus minor scaling/zoom variations
  • For precise segmentation: Limited augmentations - horizontal flip and subtle brightness adjustments
  • For outdoor scenes: More aggressive brightness/contrast and weather augmentations
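As a rough illustration of what flip and brightness augmentations do to an image, here is a pure-Python stand-in (the platform applies these transformations internally):

```python
import random

def augment(image: list[list[float]], max_shift: float = 0.1) -> list[list[float]]:
    """Randomly flip a grayscale image (rows of pixel values in [0, 1])
    horizontally and apply a small brightness shift, returning a new image."""
    out = [row[:] for row in image]
    if random.random() < 0.5:                      # horizontal flip, 50% chance
        out = [row[::-1] for row in out]
    shift = random.uniform(-max_shift, max_shift)  # brightness jitter
    return [[min(1.0, max(0.0, v + shift)) for v in row] for row in out]
```

Each training epoch sees a slightly different version of every image, which is what makes the model robust to these variations at inference time.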

Important

Choose augmentations that reflect realistic variations your model will encounter in production. Unrealistic augmentations can harm performance.

Training parameters

These settings control how your model learns from data during the training process.

Learning rate

The most critical training parameter: it controls how quickly your model adapts to errors. Don't change it unless you understand its impact; the default value works well for most cases.

| Rate | Benefit | Potential problem |
| --- | --- | --- |
| High (~0.001) | If stable, training is faster | Loss increases or jumps erratically |
| Good (~0.0001) | Steady improvement, stable training | Loss decreases slowly |
| Too low (~0.00001) | Stable training, no oscillation | Very slow progress; training barely improves |

Tuning strategy

Start with the default. If training is unstable (loss spikes), reduce by 10x. If progress is too slow after 20 epochs, increase by 2-3x.
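That strategy could be written as a simple heuristic (the thresholds below are illustrative, not the platform's logic):

```python
def adjust_learning_rate(lr: float, recent_losses: list[float]) -> float:
    """Apply the tuning strategy above: cut the rate 10x on instability,
    raise it when progress has stalled. Thresholds are illustrative."""
    if len(recent_losses) < 2:
        return lr
    if recent_losses[-1] > 1.5 * min(recent_losses):   # loss spiked: unstable
        return lr / 10.0
    if abs(recent_losses[0] - recent_losses[-1]) < 0.001 * recent_losses[0]:  # stalled
        return lr * 2.0
    return lr
```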

Batch size

Number of images processed together in each training step.

| Batch size | Memory | Speed | Stability | When to use |
| --- | --- | --- | --- | --- |
| 4 | Low | Slower | Less stable | Memory-limited GPUs |
| 8-16 | Medium | Optimal | Good | Recommended default |
| 32+ | High | Faster | Very stable | Large GPUs, simple tasks |

Rule of thumb: Use the largest batch size that fits in your GPU memory.
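Batch size also determines how many optimizer steps make up one epoch; a quick back-of-the-envelope check (illustrative, not platform code):

```python
import math

def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Optimizer steps needed for one full pass over the dataset."""
    return math.ceil(num_images / batch_size)

# 10,000 training images: batch size 8 -> 1250 steps, batch size 32 -> 313 steps.
```

Larger batches mean fewer, more stable steps per epoch, which is why they usually train faster on hardware that can hold them.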

Epochs

Complete passes through your entire training dataset.

Max epochs: Upper limit on training duration

  • Too few (< 50): Model doesn't learn enough (underfitting)
  • Too many (> 500): Wasted computation
  • Recommended: 100-200 epochs with early stopping

Min epochs: Minimum training duration before stopping is allowed

  • Prevents premature stopping
  • Recommended: 30-50 epochs

Best practice

Set Max epochs to a generous number (e.g., 200) and rely on early stopping to halt training at the optimal point.

Early stopping

Automatically stops training when performance plateaus, preventing overfitting and saving time.

Key settings:

  • Patience: Epochs to wait without improvement (typical: 10-15)
  • Metric: What to monitor (validation loss, accuracy, IoU)
  • Mode: Maximize (accuracy) or minimize (loss)

Example:

Patience: 15 epochs
Metric: Validation Loss
Mode: Minimize

→ Training stops if loss doesn't improve for 15 consecutive epochs
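The patience logic above can be sketched like this (a minimal stand-in, not the platform's implementation):

```python
class EarlyStopping:
    """Stop training after `patience` epochs without improvement.
    mode="min" monitors a loss; mode="max" monitors accuracy or IoU."""

    def __init__(self, patience: int = 15, mode: str = "min"):
        self.patience = patience
        self.mode = mode
        self.best = None
        self.bad_epochs = 0

    def step(self, metric: float) -> bool:
        """Record one epoch's metric; return True when training should stop."""
        improved = (
            self.best is None
            or (self.mode == "min" and metric < self.best)
            or (self.mode == "max" and metric > self.best)
        )
        if improved:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

With patience 15 and mode "min", the counter resets every time validation loss reaches a new best, and training halts only after 15 consecutive epochs without a new best.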