GeoLocator v1.4: Refined Multi-Task Training
Model v1.4 changes both the architecture and the training setup relative to the baseline. It adopts a multi-task learning approach with a large pre-trained backbone, a dedicated loss function, and more robust data handling. This article gives a technical overview of the training configuration and the main changes.
Critical Note on Performance
The reported median error of approximately 10 km is not a valid estimate: an accidental data leak (validation data was used in training) makes the result unrealistically low. The core training script and architecture, however, showed a marked improvement in learning capability over v1.0. A clean re-training is required for accurate performance metrics.
Architecture & Model Configuration
The v1.4 model introduces a multi-task learning architecture: a pre-trained backbone extracts image features, and three specialized output heads make predictions from them.
| Component | Value | Details |
|---|---|---|
| Backbone | ConvNeXt XXLarge | clip_laion2b_soup_ft_in1k |
| Model Type | MultiTaskGeoModel | Three specialized output heads |
Output Heads
| Head | Role |
|---|---|
| Geo Head | Outputs logits for 50,000 geographic clusters |
| Country Head | Auxiliary classification across ~222 unique countries |
| Refinement Head | Predicts a 2D (lat/lon) offset for fine-tuning the location |
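One plausible way the geo and refinement heads combine at inference time is sketched below in plain Python. It assumes the refinement offset is given in degrees relative to the center of the highest-scoring cluster; the source does not state the units or the reference point, and `decode_prediction` is a hypothetical helper, not a function from the training script.

```python
from typing import Sequence, Tuple


def decode_prediction(
    geo_logits: Sequence[float],
    cluster_centers: Sequence[Tuple[float, float]],
    offset: Tuple[float, float],
) -> Tuple[float, float]:
    """Combine geo-head logits and a refinement offset into one (lat, lon).

    Assumption: the offset is in degrees, relative to the center of the
    highest-scoring cluster.
    """
    # Pick the cluster with the largest logit (argmax over the geo head).
    best = max(range(len(geo_logits)), key=lambda i: geo_logits[i])
    center_lat, center_lon = cluster_centers[best]

    # Apply the offset, clamp latitude, and wrap longitude into [-180, 180).
    lat = max(-90.0, min(90.0, center_lat + offset[0]))
    lon = (center_lon + offset[1] + 180.0) % 360.0 - 180.0
    return lat, lon


# Toy example with three clusters instead of 50,000.
centers = [(48.85, 2.35), (40.71, -74.01), (35.68, 139.69)]
lat, lon = decode_prediction([0.2, 3.1, -1.0], centers, (0.05, -0.02))
print(round(lat, 2), round(lon, 2))
```

The country head does not appear here because the source describes it as an auxiliary task; whether it is also used at inference time is not stated.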
Training & Data Configuration
| Parameter | Value |
|---|---|
| Max Source Points | 1,000,000 |
| Clustering Algorithm | MiniBatchKMeans |
| Number of Clusters | 50,000 |
| Global Batch Size | 160 |
| Max Epochs | 200 |
| Steps/Epoch | 1,000 |
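The values in this table imply a few derived quantities that are worth making explicit. The sketch below collects them in a small dataclass; the class and field names are hypothetical, and the arithmetic assumes each training step consumes exactly one global batch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the configuration table; field names are hypothetical.
    max_source_points: int = 1_000_000
    clustering_algorithm: str = "MiniBatchKMeans"
    num_clusters: int = 50_000
    global_batch_size: int = 160
    max_epochs: int = 200
    steps_per_epoch: int = 1_000

    @property
    def samples_per_epoch(self) -> int:
        # One global batch per step.
        return self.global_batch_size * self.steps_per_epoch

    @property
    def max_samples_seen(self) -> int:
        # Upper bound if training runs for all epochs.
        return self.samples_per_epoch * self.max_epochs

    @property
    def mean_points_per_cluster(self) -> float:
        return self.max_source_points / self.num_clusters


cfg = TrainConfig()
print(cfg.samples_per_epoch)        # 160 * 1,000 = 160,000
print(cfg.max_samples_seen)         # 160,000 * 200 = 32,000,000
print(cfg.mean_points_per_cluster)  # 1,000,000 / 50,000 = 20.0
```

Under these assumptions an "epoch" is a fixed budget of 160,000 samples rather than a full pass over the data, and each cluster is built from about 20 source points on average.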
Clustering Fix: Border-Respecting
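Unlike in v1.0, clustering in v1.4 was configured to respect geographical borders, so that a cluster does not straddle two countries. This fix keeps the geo and country targets consistent with each other, which aids the country head and prevents the cross-border cluster confusion that affected the baseline model.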
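The source does not say how borders are respected. One common approach, assumed here purely for illustration, is to cluster each country's points separately and split the total cluster budget across countries in proportion to their point counts. The sketch below covers only that budget split; `allocate_clusters` is a hypothetical helper, and a clustering algorithm such as MiniBatchKMeans would then be run once per country with the allocated number of clusters.

```python
from collections import Counter
from typing import Dict, Iterable


def allocate_clusters(country_of_point: Iterable[str], total_clusters: int) -> Dict[str, int]:
    """Split a cluster budget across countries in proportion to point counts.

    Every country gets at least one cluster; clusters lost to rounding go to
    the largest fractional remainders (largest-remainder method).
    """
    counts = Counter(country_of_point)
    if total_clusters < len(counts):
        raise ValueError("need at least one cluster per country")

    # Reserve one cluster per country, then share the rest proportionally.
    spare = total_clusters - len(counts)
    n_points = sum(counts.values())
    exact = {c: spare * n / n_points for c, n in counts.items()}
    budget = {c: 1 + int(exact[c]) for c in counts}

    # Hand out the clusters lost to rounding down, largest remainder first.
    leftover = total_clusters - sum(budget.values())
    by_remainder = sorted(counts, key=lambda c: exact[c] - int(exact[c]), reverse=True)
    for c in by_remainder[:leftover]:
        budget[c] += 1
    return budget


# Toy example: 1,000 points in three countries, 10 clusters in total.
points = ["FR"] * 600 + ["DE"] * 300 + ["LU"] * 100
budget = allocate_clusters(points, total_clusters=10)
print(budget)
```

Because each cluster is fitted on one country's points only, every cluster maps to exactly one country label. A real pipeline would also need to cap a country's budget at its number of points, which this sketch omits.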
Loss Function & Optimization
The model uses a custom PigeonLoss function that balances the three learning tasks by combining their losses in a weighted sum.
Total Loss Formula
$$\text{Loss} = 1.0 \cdot L_{\text{Geo}} + 1.0 \cdot L_{\text{Country}} + 10.0 \cdot L_{\text{Refinement}}$$

| Component | Function | Weight |
|---|---|---|
| $L_{\text{Geo}}$ | CrossEntropyLoss (50k clusters) | 1.0 |
| $L_{\text{Country}}$ | CrossEntropyLoss (~222 countries) | 1.0 |
| $L_{\text{Refinement}}$ | MSE (lat/lon offset) | 10.0 |
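To make the weighted sum concrete, here is a minimal single-sample sketch in plain Python. It is not the PigeonLoss implementation, which the source does not show; it only spells out what a cross-entropy term and an MSE term compute and how the three weights from the table combine them. All function names are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy for one sample: log-sum-exp of the logits minus the target logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over the offset components."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(geo_logits, geo_target, country_logits, country_target,
               offset_pred, offset_target,
               w_geo=1.0, w_country=1.0, w_refine=10.0) -> float:
    """Weighted sum of the three task losses, using the weights from the table."""
    return (w_geo * cross_entropy(geo_logits, geo_target)
            + w_country * cross_entropy(country_logits, country_target)
            + w_refine * mse(offset_pred, offset_target))


# Toy sample: 3 clusters and 2 countries instead of 50,000 and ~222.
loss = total_loss(
    geo_logits=[2.0, 0.5, -1.0], geo_target=0,
    country_logits=[0.1, 1.2], country_target=1,
    offset_pred=[0.10, -0.20], offset_target=[0.00, 0.00],
)
print(round(loss, 4))
```

The larger weight on the refinement term is plausibly there because squared offsets are numerically small compared with cross-entropy values; in the toy sample the raw MSE is 0.025, which the weight of 10.0 scales to 0.25. The source does not give the rationale, so treat this as one reading.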
Optimizer Configuration
Image Preprocessing
Key Artifacts
| File | Contents |
|---|---|
| last.pt | Latest checkpoint after each epoch |
| best.pt | Lowest median error on validation |
| clusters_cache.npy | 50,000 cluster centers (Lat/Lon) |
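The selection rule for best.pt can be sketched as follows, assuming the validation error is the great-circle (haversine) distance in kilometers between predicted and true coordinates. The source reports a median error in km but does not give the distance formula, and `BestTracker` is a hypothetical stand-in for the checkpointing logic.

```python
import math
from statistics import median
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def median_error_km(preds: Iterable[LatLon], targets: Iterable[LatLon]) -> float:
    """Median haversine error over paired predictions and ground truth."""
    return median(haversine_km(p, t) for p, t in zip(preds, targets))


class BestTracker:
    """Remembers the lowest validation error seen so far."""

    def __init__(self) -> None:
        self.best = math.inf

    def update(self, error_km: float) -> bool:
        """Return True when the caller should overwrite best.pt."""
        if error_km < self.best:
            self.best = error_km
            return True
        return False


# Toy run over three epochs with made-up validation errors.
tracker = BestTracker()
decisions = [tracker.update(e) for e in (12.0, 15.0, 9.5)]
print(decisions, tracker.best)
```

This rule is only meaningful when the validation set is disjoint from the training data; with the leak described in the performance note, the checkpoint saved as best.pt is selected on data the model has already seen.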
Next Steps
While the architecture and training pipeline show significant promise, a clean re-training on properly split data is required to obtain accurate performance metrics. The multi-task approach with border-respecting clustering represents a substantial improvement over the v1.0 baseline, and we expect the re-trained model to demonstrate meaningful real-world performance gains.