# BirdCLEF 2026: Top-Scoring Solution

Multi-model ensemble approach based on state-of-the-art (SOTA) bioacoustic research.
## Architecture

| Model | Backbone | Params | Pre-training | Weight in Ensemble |
|---|---|---|---|---|
| Bird-MAE-Large | ViT-L/16 | 302M | SSL on XCL-1.7M (Xeno-Canto) | 60% |
| EfficientNet-B1 | EfficientNet-B1 | 19M | BirdSet-XCL (9,735 species) | 40% |
## Key Techniques

1. Bird-MAE (arXiv:2504.12880) – primary model
   - Domain-specific SSL pre-training on 1.7M Xeno-Canto bird recordings
   - Fine-tuning with Asymmetric Loss (handles noisy multi-label annotations)
   - Layer-wise LR decay (0.75) for stable ViT fine-tuning
   - 2-stage schedule: 2 epochs with frozen backbone → 28 epochs of full fine-tuning
2. Domain adaptation (focal → soundscape)
   - Waveform mixup (p=0.9, up to 3 sources) – simulates co-occurring species in soundscapes
   - Cyclic rolling (p=1.0) – removes position bias
   - Background noise injection (p=0.5, SNR 3–30 dB)
   - Colored noise (p=0.2, spectral slope -2 to +2)
   - SpecAugment: freq-mask (50, p=0.3) + time-mask (100, p=0.3)
   - Energy-based window selection (from Perch 2.0)
3. Ensemble + post-processing
   - 5-fold CV × 2 architectures = 10 models total
   - Logit averaging + no-call detection + TTA (time-reversal + gain)
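The waveform mixup listed above can be sketched in a few lines of NumPy. Only p=0.9 and the 3-source cap come from this README; the gain range and uniform source sampling are illustrative assumptions, not the solution's exact recipe:

```python
import numpy as np

def waveform_mixup(waves, labels, p=0.9, max_sources=3, rng=None):
    """Mix up to `max_sources` clips to simulate co-occurring species.

    waves: (N, T) float array of raw audio; labels: (N, C) multi-hot.
    Returns one mixed waveform and the union of the mixed clips' labels.
    """
    rng = np.random.default_rng(rng)
    base = rng.integers(len(waves))
    mixed_wave = waves[base].copy()
    mixed_label = labels[base].copy()
    if rng.random() < p:
        # 1 or 2 extra sources, so at most max_sources clips in total
        n_extra = rng.integers(1, max_sources)
        for idx in rng.choice(len(waves), size=n_extra, replace=False):
            gain = rng.uniform(0.3, 1.0)  # assumed gain range (illustrative)
            mixed_wave += gain * waves[idx]
            mixed_label = np.maximum(mixed_label, labels[idx])  # label union
    return mixed_wave, mixed_label
```

Because the labels are combined with a max (union) rather than interpolated, the loss still treats every mixed-in species as fully present, which is why this pairs naturally with a multi-label loss such as ASL.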
## Quick Start

```bash
# Download the competition data
kaggle competitions download -c birdclef-2026

# Train Bird-MAE-Large (primary model, 5 folds)
for fold in 0 1 2 3 4; do
  python train_birdclef.py \
    --data_dir ./data/train_audio \
    --metadata ./data/train_metadata.csv \
    --output_dir ./outputs/birdmae \
    --hub_model_id YOUR_USERNAME/birdclef2026-birdmae \
    --epochs 30 --batch_size 32 --lr 3e-4 --fold $fold
done

# Train EfficientNet-B1 (ensemble member, 5 folds)
for fold in 0 1 2 3 4; do
  python train_effnet.py \
    --data_dir ./data/train_audio \
    --metadata ./data/train_metadata.csv \
    --output_dir ./outputs/effnet \
    --hub_model_id YOUR_USERNAME/birdclef2026-effnet \
    --epochs 50 --batch_size 64 --lr 5e-4 --fold $fold
done

# Run inference (TTA for the Bird-MAE folds only), then ensemble with 60/40 weights
python inference_birdclef.py --test_dir ./data/test_soundscapes --model_dir ./outputs/birdmae --output sub_birdmae.csv --tta
python inference_birdclef.py --test_dir ./data/test_soundscapes --model_dir ./outputs/effnet --output sub_effnet.csv
python ensemble_submit.py --submissions sub_birdmae.csv sub_effnet.csv --weights 0.6 0.4 --output final_submission.csv
```
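For reference, a minimal sketch of the weighted averaging step that `ensemble_submit.py` performs. The column layout (a `row_id` column followed by one score column per species, identical row order across files) is an assumption about the submission format, and the real script additionally applies post-processing such as no-call detection that is not shown here:

```python
import numpy as np
import pandas as pd

def ensemble_submissions(paths, weights):
    """Weighted average of per-species scores across submission CSVs.

    Assumes each CSV has a `row_id` column followed by score columns,
    with all files sharing the same rows and columns in the same order.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize, e.g. 0.6 / 0.4
    dfs = [pd.read_csv(p) for p in paths]
    out = dfs[0].copy()
    score_cols = out.columns[1:]  # everything except row_id
    stacked = np.stack([df[score_cols].to_numpy() for df in dfs])
    # tensordot over the model axis -> (rows, species) weighted average
    out[score_cols] = np.tensordot(weights, stacked, axes=1)
    return out
```

Averaging is done on whatever scores the CSVs contain; averaging raw logits before the sigmoid (as the logit-averaging note above suggests) would require the per-fold model outputs rather than the finished submissions.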
## Files

| File | Description |
|---|---|
| `train_birdclef.py` | Bird-MAE-Large training (primary model) |
| `train_effnet.py` | EfficientNet-B1 training (ensemble member) |
| `inference_birdclef.py` | Inference with multi-fold ensemble + TTA |
| `ensemble_submit.py` | Combine predictions + post-processing |
## Hardware Requirements

| Model | GPU | VRAM | Time/fold |
|---|---|---|---|
| Bird-MAE-Large | A100 80GB | ~40 GB | ~6–8 h |
| EfficientNet-B1 | A10G 24GB | ~8 GB | ~3–4 h |
## Dependencies

```text
torch>=2.0
torchaudio>=2.0
transformers==4.48.0
librosa
scikit-learn
pandas
numpy
soundfile
trackio
huggingface_hub
```
## Hyperparameters (from published papers)

### Bird-MAE-Large Fine-tuning

| Parameter | Value | Source |
|---|---|---|
| Learning rate | 3e-4 | Bird-MAE, Table 10 |
| Weight decay | 3e-4 | Bird-MAE, Table 10 |
| Layer decay | 0.75 | Bird-MAE, Table 10 |
| Batch size | 32 | Adjusted for A100 |
| Epochs | 30 | Bird-MAE |
| Freeze epochs | 2 | sl-BEATs recipe |
| Loss | Asymmetric (γ_neg=4, γ_pos=0, clip=0.05) | ASL paper |
| Gradient clip | 2.0 | Bird-MAE |
| Sample rate | 32,000 Hz | Bird-MAE |
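The Asymmetric Loss row above fully specifies the loss. As a sketch of the math with those exact parameters (written in NumPy for compactness; the training code itself uses PyTorch):

```python
import numpy as np

def asymmetric_loss(logits, targets, gamma_neg=4.0, gamma_pos=0.0,
                    clip=0.05, eps=1e-8):
    """Asymmetric Loss (Ridnik et al., 2021) for noisy multi-label targets.

    Negative probabilities are shifted down by `clip` and focused with
    gamma_neg=4, so easy or mislabeled negatives contribute almost nothing;
    gamma_pos=0 leaves the positive term as plain BCE.
    """
    p = 1.0 / (1.0 + np.exp(-logits))      # sigmoid
    p_neg = np.clip(p - clip, 0.0, None)   # probability shifting (margin)
    loss_pos = targets * (1.0 - p) ** gamma_pos * np.log(np.clip(p, eps, None))
    loss_neg = (1.0 - targets) * p_neg ** gamma_neg \
        * np.log(np.clip(1.0 - p_neg, eps, None))
    return -(loss_pos + loss_neg).mean()
```

With clip=0.05, any negative predicted below probability 0.05 incurs exactly zero loss, which is what makes the loss robust to the unlabeled co-occurring species common in Xeno-Canto recordings.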
### EfficientNet-B1

| Parameter | Value | Source |
|---|---|---|
| Learning rate | 5e-4 | BirdSet + EffNetB0-all |
| Weight decay | 0.01 | sl-BEATs recipe |
| Batch size | 64 | – |
| Epochs | 50 | EffNetB0-all recipe |
| Loss | BCE | Standard |
## References

- Bird-MAE: Rauch et al., "Can Masked Autoencoders Also Listen to Birds?", 2025 (arXiv:2504.12880)
- sl-BEATs-all: "What Matters for Bioacoustic Encoding", ICLR 2026 (arXiv:2508.11845)
- Perch 2.0: "The Bittern Lesson for Bioacoustics", 2025 (arXiv:2508.04665)
- FINCH: "Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion", 2026 (arXiv:2602.03817)
- BirdSet: Rauch et al., "BirdSet: A Multi-Task Benchmark", 2024 (arXiv:2403.10380)
- Asymmetric Loss: Ridnik et al., 2021 (arXiv:2009.14119)