# GeoLocator v1.0: Baseline Training Configuration
This model serves as the initial baseline for the GeoLocator project, featuring a simple single-head classifier architecture. Its performance was significantly hindered by poor training-data quality and a flawed initial clustering strategy, leading to large prediction errors. This article documents the architecture, training configuration, and critical lessons learned.
## Poor Baseline Performance
With a median error of approximately 1,500 km, this model could barely identify the correct country. The primary issues stemmed from data quality and a flawed clustering strategy that allowed clusters to span international borders.
### Performance Metrics
| Metric | Value |
|---|---|
| Median Error | ~1,500 km |
| Evaluation Metric | Haversine Distance |
| Model Selection | Validation Loss |
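For context, the haversine (great-circle) distance is the standard error metric for geolocation. A minimal NumPy sketch of the metric follows; the project's actual evaluation code is not shown here, so this is illustrative only:

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    R = 6371.0  # mean Earth radius in km
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

# Median error over a validation set (arrays of predicted / true coordinates):
# median_err = np.median(haversine_km(pred_lat, pred_lon, true_lat, true_lon))
```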
## Architecture & Loss Function
The v1.0 model used a straightforward single-head classification approach, outputting logits over 10,000 geographic clusters.
| Component | Implementation | Notes |
|---|---|---|
| Backbone | ConvNeXt Tiny | `convnext_tiny` |
| Architecture | GeoModel | Single-head classifier |
| Loss Function | CrossEntropyLoss | Standard classification |
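A minimal sketch of this architecture is shown below. It assumes the backbone comes from `timm` (an assumption; the article does not name the library) and simply attaches a linear cluster head to the pooled features:

```python
import timm
import torch.nn as nn

NUM_CLUSTERS = 10_000  # geographic cluster classes (from the article)

class GeoModel(nn.Module):
    """Single-head classifier: ConvNeXt Tiny backbone + linear cluster head."""

    def __init__(self, num_clusters: int = NUM_CLUSTERS):
        super().__init__()
        # num_classes=0 strips timm's classifier and returns pooled features
        self.backbone = timm.create_model("convnext_tiny", pretrained=True, num_classes=0)
        self.head = nn.Linear(self.backbone.num_features, num_clusters)

    def forward(self, x):
        return self.head(self.backbone(x))  # logits over clusters

criterion = nn.CrossEntropyLoss()
```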
## Training Hyperparameters
| Parameter | Value |
|---|---|
| Global Batch Size | 1024 |
| Max Epochs | 200 |
| Learning Rate | 3e-4 (Base) / 1e-6 (Min) |
| Optimizer | AdamW (weight_decay=1e-4) |
| Precision | AMP (autocast) |
| Gradient Clipping | max_grad_norm=1.0 |
| Early Stopping | Patience = 20 epochs |
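A PyTorch training step consistent with the table above might look like the following. This is a sketch, not the project's actual loop: it reuses the `GeoModel` and `criterion` from the previous section, assumes a CUDA device, and assumes the base/min learning-rate pair implies cosine annealing (the schedule type is not stated):

```python
import torch

model = GeoModel().cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
# Cosine annealing from 3e-4 down to 1e-6 over 200 epochs (schedule is an assumption)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=1e-6)
scaler = torch.cuda.amp.GradScaler()

def train_step(images, labels):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda"):   # AMP forward pass
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                 # unscale before clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```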
## Data Handling & Preprocessing
Training data was sourced from the Flickr Dataset, which contained many images deemed unsuitable for geolocation tasks. Data was loaded via a streaming approach from sharded MessagePack (.msg) files.
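Streaming from sharded MessagePack files can be done with `msgpack.Unpacker`, which yields records without loading a whole shard into memory. A minimal sketch follows; the shard layout and record fields are assumptions, not the project's actual loader:

```python
from pathlib import Path

import msgpack

def stream_samples(shard_dir: str):
    """Yield records one at a time from sharded MessagePack (.msg) files."""
    for shard in sorted(Path(shard_dir).glob("*.msg")):
        with open(shard, "rb") as f:
            # Unpacker streams records incrementally from the file handle
            for record in msgpack.Unpacker(f, raw=False):
                yield record  # e.g. {"image": bytes, "lat": ..., "lon": ...} (hypothetical schema)
```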
### Clustering Strategy
> **Clustering Issue: Border Confusion.** Clusters were allowed to span international borders, which significantly confused the model during training. This flaw in the clustering strategy was identified and corrected in subsequent versions.
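For illustration, clustering raw coordinates with no country constraint (presumably how the v1.0 clusters were built; `MiniBatchKMeans` is an assumption) shows how border-spanning clusters arise, since nothing stops a centroid near a border from absorbing points on both sides:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_clusters(coords: np.ndarray, k: int = 10_000) -> MiniBatchKMeans:
    """Naive clustering over raw (lat, lon) pairs -- ignores political borders."""
    # coords: (N, 2) array of [lat, lon] for all training images (hypothetical input)
    km = MiniBatchKMeans(n_clusters=k, batch_size=4096, random_state=0)
    km.fit(coords)
    return km

# The cluster id becomes each image's classification label:
# labels = km.predict(coords)
```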
### Image Preprocessing (Albumentations)
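The exact v1.0 transform stack is not recorded here. A representative Albumentations pipeline for a 224-px ConvNeXt input might look like the following; every transform choice below is an assumption:

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Illustrative only -- the actual v1.0 augmentations are not documented.
train_transform = A.Compose([
    A.Resize(256, 256),
    A.RandomCrop(224, 224),  # 224 px matches a typical ConvNeXt input
    A.ColorJitter(p=0.3),    # mild photometric noise
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),            # HWC uint8 -> CHW float tensor
])
```

Horizontal flips are deliberately omitted from this sketch, since mirrored signage and driving-side cues can mislead a geolocation model.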
## Lessons Learned
The v1.0 baseline established critical insights that informed subsequent development. The poor performance, while disappointing, revealed fundamental issues with both data quality and clustering methodology. These lessons directly informed the architectural decisions in v1.4, including the move to border-respecting clustering and multi-task learning objectives.