GeoLocator v1.4: Refined Multi-Task Training

GeoLocator v1.4 changes both the architecture and the training procedure relative to the v1.0 baseline. It adopts multi-task learning with a pre-trained ConvNeXt XXLarge backbone, a dedicated loss function (PigeonLoss), and more careful data handling, including border-respecting clustering. This article gives a technical overview of the training configuration and the main changes.

Critical Note on Performance

The reported median error of approximately 10 km is not a reliable estimate: validation data was accidentally included in the training set (a data leak), so the figure is optimistic. The core training script and architecture nevertheless showed a marked improvement in learning capability over v1.0. A clean re-training is required to obtain accurate performance metrics.

Architecture & Model Configuration

The v1.4 model uses a multi-task architecture: a pre-trained backbone extracts image features, and three specialized output heads operate on those features.

Component	Value
Backbone	ConvNeXt XXLarge (pretrained weights: clip_laion2b_soup_ft_in1k)
Model Type	MultiTaskGeoModel (three specialized output heads)

Output Heads

Head	Output
Geo Head	Logits for 50,000 geographic clusters
Country Head	Auxiliary classification across ~222 countries
Refinement Head	A 2D (lat/lon) offset that refines the predicted location
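To make the interplay of the heads concrete, here is a minimal sketch of how a geo prediction and a refinement offset could be combined into one coordinate. It assumes that the offset is expressed in degrees and is added to the center of the highest-scoring cluster; the source does not state either detail, and the function name `decode_prediction` is hypothetical.

```python
def decode_prediction(geo_logits, cluster_centers, offset):
    """Combine geo-head and refinement-head outputs into one (lat, lon).

    Assumption (not stated in the source): the offset is in degrees and is
    added to the center of the cluster with the highest logit.
    """
    # Index of the highest-scoring cluster.
    best = max(range(len(geo_logits)), key=geo_logits.__getitem__)
    lat, lon = cluster_centers[best]
    d_lat, d_lon = offset
    # Clamp latitude to the valid range and wrap longitude to [-180, 180).
    lat = max(-90.0, min(90.0, lat + d_lat))
    lon = (lon + d_lon + 180.0) % 360.0 - 180.0
    return best, (lat, lon)


# Toy example with three clusters instead of 50,000.
centers = [(0.0, 0.0), (48.85, 2.35), (35.68, 139.69)]
cluster_id, location = decode_prediction([0.1, 2.0, -1.0], centers, (0.01, -0.02))
```

The country head does not appear here because the source describes it as an auxiliary task; whether it is used at inference time is not stated.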

Training & Data Configuration

Parameter	Value
Max Source Points	1,000,000
Clustering Algorithm	MiniBatchKMeans
Number of Clusters	50,000
Global Batch Size	160
Max Epochs	200
Steps/Epoch	1,000
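
The table implies a total training budget that is easy to work out. The sketch below does the arithmetic with a hypothetical `TrainConfig` dataclass (the class and field names are illustrative, not taken from the training script). It assumes that training runs to the maximum epoch count and that Max Source Points equals the number of training images, neither of which the source confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the configuration table; names are hypothetical.
    max_source_points: int = 1_000_000
    num_clusters: int = 50_000
    global_batch_size: int = 160
    max_epochs: int = 200
    steps_per_epoch: int = 1_000

    @property
    def samples_per_epoch(self) -> int:
        return self.global_batch_size * self.steps_per_epoch

    @property
    def total_samples(self) -> int:
        return self.samples_per_epoch * self.max_epochs

    @property
    def dataset_passes(self) -> float:
        # Assumes every source point is a training image.
        return self.total_samples / self.max_source_points

    @property
    def mean_points_per_cluster(self) -> float:
        return self.max_source_points / self.num_clusters


cfg = TrainConfig()
print(cfg.samples_per_epoch, cfg.total_samples,
      cfg.dataset_passes, cfg.mean_points_per_cluster)
```

Under these assumptions one "epoch" covers 160,000 samples, a full run sees 32,000,000 samples (about 32 passes over 1,000,000 points), and each cluster holds 20 source points on average.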

Clustering Fix: Border-Respecting

Unlike v1.0, clustering was configured to respect geographical borders. This critical fix aids the country head and prevents cross-border cluster confusion that plagued the baseline model.
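
The source does not describe how the clustering respects borders. One plausible scheme, shown here purely as an assumption, is to split the cluster budget across countries in proportion to their point counts and then run the clustering algorithm separately inside each country, so that no cluster can span a border. The sketch covers only the budget split; `allocate_clusters` is a hypothetical helper.

```python
def allocate_clusters(points_per_country, total_clusters):
    """Split a cluster budget across countries in proportion to point counts.

    Every country gets at least one cluster; the rest of the budget is
    shared out with the largest-remainder method so the total is exact.
    This is an illustrative scheme, not the one used for v1.4.
    """
    countries = sorted(points_per_country)
    spare = total_clusters - len(countries)
    if spare < 0:
        raise ValueError("need at least one cluster per country")
    total_points = sum(points_per_country.values())
    if total_points <= 0:
        raise ValueError("no source points to cluster")

    # Integer quotient and remainder of each country's proportional share.
    shares = {c: divmod(spare * points_per_country[c], total_points)
              for c in countries}
    budget = {c: 1 + quotient for c, (quotient, _) in shares.items()}

    # Hand the clusters lost to rounding to the largest remainders.
    leftover = total_clusters - sum(budget.values())
    by_remainder = sorted(countries, key=lambda c: shares[c][1], reverse=True)
    for c in by_remainder[:leftover]:
        budget[c] += 1
    return budget


budget = allocate_clusters({"FR": 600, "DE": 300, "LU": 100}, total_clusters=10)
```

Each country's points would then be clustered on their own (for example with MiniBatchKMeans, the algorithm listed in the configuration table) using its share of the budget. The sketch does not cap a country's budget at its number of points, which a real implementation would need to handle.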

Loss Function & Optimization

The model uses a custom PigeonLoss function that combines the three task losses as a weighted sum.

Total Loss Formula

Loss = 1.0 · L_Geo + 1.0 · L_Country + 10.0 · L_Refinement

Component	Function	Weight
L_Geo	CrossEntropyLoss (50k clusters)	1.0
L_Country	CrossEntropyLoss (~222 countries)	1.0
L_Refinement	MSE (lat/lon offset)	10.0
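
As a worked example of the weighted sum, the following sketch evaluates the formula for a single sample in plain Python. It is not the PigeonLoss implementation, whose internals the source does not show; the function names are hypothetical, and the reduction over a batch (mean, sum) is left out because it is not specified.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: logsumexp(logits) - logits[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def mse(pred, target):
    """Mean squared error over the components of one vector."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def weighted_multitask_loss(geo_logits, geo_target,
                            country_logits, country_target,
                            offset_pred, offset_target,
                            w_geo=1.0, w_country=1.0, w_refine=10.0):
    """Loss = w_geo * L_Geo + w_country * L_Country + w_refine * L_Refinement."""
    parts = {
        "geo": cross_entropy(geo_logits, geo_target),
        "country": cross_entropy(country_logits, country_target),
        "refinement": mse(offset_pred, offset_target),
    }
    total = (w_geo * parts["geo"]
             + w_country * parts["country"]
             + w_refine * parts["refinement"])
    return total, parts


# Toy sizes: 4 clusters and 2 countries instead of 50,000 and ~222.
total, parts = weighted_multitask_loss(
    [0.0, 0.0, 0.0, 0.0], 0,
    [0.0, 0.0], 1,
    (0.1, -0.1), (0.0, 0.0),
)
```

With uniform logits the two classification terms equal ln 4 and ln 2, and the refinement term is 0.01 before weighting, so the weight of 10.0 raises its contribution to 0.1. A plausible reading is that the larger weight compensates for the small magnitude of offset errors, though the source does not give the rationale.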

Optimizer Configuration

Setting	Value
Optimizer	AdamW (weight_decay=1e-4)
Learning Rate	5e-5 (base) / 1e-6 (min)
Precision	AMP with GradScaler
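
The source gives a base and a minimum learning rate but does not name the schedule that connects them. A cosine decay between the two values is one common choice and is assumed here purely for illustration; the total of 200,000 steps comes from 200 epochs of 1,000 steps each.

```python
import math


def cosine_lr(step, total_steps, base_lr=5e-5, min_lr=1e-6):
    """Cosine decay from base_lr at step 0 to min_lr at total_steps.

    The schedule type is an assumption; only the two endpoint values
    appear in the source.
    """
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 200 * 1_000  # max epochs x steps per epoch
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, cosine_lr(step, TOTAL_STEPS))
```

Under this assumption the rate starts at 5e-5, passes through the midpoint value of 2.55e-5 halfway through training, and ends at 1e-6.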

Image Preprocessing

Setting	Value
Training Transforms	400×400 → RandomCrop 384×384
Validation Transforms	400×400 → CenterCrop 384×384
Augmentations	ColorJitter, CoarseDropout
Normalization	ImageNet mean/std
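
The difference between the training and validation crops, and the effect of ImageNet normalization, can be sketched without any imaging library. The helper names below are hypothetical; the mean and standard deviation are the standard ImageNet RGB statistics, and the sketch assumes pixel values are scaled to the 0 to 1 range before normalization.

```python
import random

# Standard ImageNet per-channel statistics (RGB, for inputs scaled to [0, 1]).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def center_crop_box(size=400, crop=384):
    """Validation: a fixed (left, top, right, bottom) box in the middle."""
    offset = (size - crop) // 2
    return offset, offset, offset + crop, offset + crop


def random_crop_box(size=400, crop=384, rng=random):
    """Training: the same box size at a random position inside the image."""
    left = rng.randint(0, size - crop)  # randint is inclusive on both ends
    top = rng.randint(0, size - crop)
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to normalized channel values."""
    return tuple((value / 255.0 - mean) / std
                 for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


print(center_crop_box())                       # always (8, 8, 392, 392)
print(random_crop_box(rng=random.Random(0)))   # position varies with the seed
```

Because the crop is only 16 pixels smaller than the image on each axis, the random crop acts as a small translation augmentation, while the center crop keeps validation deterministic.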

Key Artifacts

File	Contents
last.pt	Latest checkpoint after each epoch
best.pt	Lowest median error on validation
clusters_cache.npy	50,000 cluster centers (lat/lon)
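
The rule for writing the two checkpoints can be sketched as follows. The source reports errors in kilometres but does not say how distance is computed; great-circle (haversine) distance is assumed here, and the function names are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def median_error_km(predictions, targets):
    """Median distance between predicted and true locations."""
    return median(haversine_km(p, t) for p, t in zip(predictions, targets))


def checkpoints_to_write(epoch_median_km, best_so_far=math.inf):
    """Return the files to save this epoch and the updated best error.

    last.pt is written every epoch; best.pt only when the validation
    median error improves.
    """
    names = ["last.pt"]
    if epoch_median_km < best_so_far:
        names.append("best.pt")
        best_so_far = epoch_median_km
    return names, best_so_far


best = math.inf
for epoch_error in (120.0, 95.5, 101.2):  # made-up values for illustration
    names, best = checkpoints_to_write(epoch_error, best)
    print(names, best)
```

Since best.pt is selected on validation error, it is only meaningful when the validation set is disjoint from the training data.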

Next Steps

The architecture and training pipeline look promising, but a clean re-training on properly split data is required before any performance figure can be trusted. The multi-task approach with border-respecting clustering is a substantial improvement over the v1.0 baseline, and we expect the re-trained model to show meaningful real-world performance gains.
