
πŸ† BirdCLEF 2026 β€” Top-Scoring Solution

Multi-model ensemble approach based on SOTA bioacoustic research.

## Architecture

| Model | Backbone | Params | Pre-training | Weight in Ensemble |
|---|---|---|---|---|
| Bird-MAE-Large | ViT-L/16 | 302M | SSL on XCL-1.7M (Xeno-Canto) | 60% |
| EfficientNet-B1 | EfficientNet-B1 | 19M | BirdSet-XCL (9,735 species) | 40% |

## Key Techniques

### 1. Bird-MAE (arXiv:2504.12880) - primary model

- Domain-specific SSL pre-training on 1.7M Xeno-Canto bird recordings
- Fine-tuning with Asymmetric Loss (handles noisy multi-label annotations)
- Layer-wise LR decay (0.75) for stable ViT fine-tuning
- Two-stage schedule: 2 epochs with a frozen backbone → 28 epochs of full fine-tuning

### 2. Domain Adaptation (focal recordings → soundscapes)

- Waveform mixup (p=0.9, up to 3 sources) - simulates co-occurring species in soundscapes
- Cyclic rolling (p=1.0) - removes position bias
- Background noise injection (p=0.5, SNR 3-30 dB)
- Colored noise (p=0.2, spectral slope -2 to +2)
- SpecAugment: frequency masking (width 50, p=0.3) + time masking (width 100, p=0.3)
- Energy-based window selection (from Perch 2.0)

### 3. Ensemble + Post-processing

- 5-fold CV × 2 architectures = 10 models total
- Logit averaging + no-call detection + TTA (time reversal + gain)

## Quick Start

```bash
# 1. Download competition data from Kaggle
kaggle competitions download -c birdclef-2026

# 2. Train Bird-MAE-Large (primary model, all 5 folds)
for fold in 0 1 2 3 4; do
  python train_birdclef.py \
    --data_dir ./data/train_audio \
    --metadata ./data/train_metadata.csv \
    --output_dir ./outputs/birdmae \
    --hub_model_id YOUR_USERNAME/birdclef2026-birdmae \
    --epochs 30 --batch_size 32 --lr 3e-4 --fold $fold
done

# 3. Train EfficientNet-B1 (ensemble member, all 5 folds)
for fold in 0 1 2 3 4; do
  python train_effnet.py \
    --data_dir ./data/train_audio \
    --metadata ./data/train_metadata.csv \
    --output_dir ./outputs/effnet \
    --hub_model_id YOUR_USERNAME/birdclef2026-effnet \
    --epochs 50 --batch_size 64 --lr 5e-4 --fold $fold
done

# 4. Inference + ensemble
python inference_birdclef.py --test_dir ./data/test_soundscapes \
  --model_dir ./outputs/birdmae --output sub_birdmae.csv --tta
python inference_birdclef.py --test_dir ./data/test_soundscapes \
  --model_dir ./outputs/effnet --output sub_effnet.csv
python ensemble_submit.py --submissions sub_birdmae.csv sub_effnet.csv \
  --weights 0.6 0.4 --output final_submission.csv
```
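The `--tta` flag above corresponds to the time-reversal + gain test-time augmentation. A minimal sketch of that idea, assuming a generic `model_fn` mapping a waveform to per-species logits (an illustrative interface, not the script's actual one):

```python
import numpy as np

def tta_logits(model_fn, wave, gains=(0.8, 1.0, 1.25)):
    """Test-time augmentation: average logits over time-reversed and
    gain-scaled copies of the input waveform. The gain values here are
    hypothetical; the script may use different ones.
    """
    variants = []
    for g in gains:
        variants.append(model_fn(g * wave))            # gain-scaled
        variants.append(model_fn(g * wave[::-1]))      # time-reversed
    return np.mean(variants, axis=0)
```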

## Files

| File | Description |
|---|---|
| `train_birdclef.py` | Bird-MAE-Large training (primary model) |
| `train_effnet.py` | EfficientNet-B1 training (ensemble member) |
| `inference_birdclef.py` | Inference with multi-fold ensemble + TTA |
| `ensemble_submit.py` | Combine predictions + post-processing |

## Hardware Requirements

| Model | GPU | VRAM | Time/fold |
|---|---|---|---|
| Bird-MAE-Large | A100 80GB | ~40 GB | ~6-8 h |
| EfficientNet-B1 | A10G 24GB | ~8 GB | ~3-4 h |

## Dependencies

```text
torch>=2.0
torchaudio>=2.0
transformers==4.48.0
librosa
scikit-learn
pandas
numpy
soundfile
trackio
huggingface_hub
```

## Hyperparameters (from published papers)

### Bird-MAE-Large Fine-tuning

| Parameter | Value | Source |
|---|---|---|
| Learning rate | 3e-4 | Bird-MAE, Table 10 |
| Weight decay | 3e-4 | Bird-MAE, Table 10 |
| Layer decay | 0.75 | Bird-MAE, Table 10 |
| Batch size | 32 | Adjusted for A100 |
| Epochs | 30 | Bird-MAE |
| Freeze epochs | 2 | sl-BEATs recipe |
| Loss | Asymmetric (γ_neg=4, γ_pos=0, clip=0.05) | ASL paper |
| Gradient clip | 2.0 | Bird-MAE |
| Sample rate | 32,000 Hz | Bird-MAE |
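The layer-decay row can be made concrete: with decay 0.75, block i of an N-block ViT gets lr = base_lr · 0.75^(N − i), so layers closest to the pretrained patch embedding move slowest while the head trains at the full rate. A small sketch (names are illustrative):

```python
def layerwise_lrs(base_lr=3e-4, decay=0.75, num_layers=24):
    """Per-layer learning rates for a ViT-L (24 blocks) fine-tune.
    Index 0 = patch embedding, index num_layers = classification head.
    """
    return [base_lr * decay ** (num_layers - i) for i in range(num_layers + 1)]
```

In practice these rates would be passed to the optimizer as per-parameter groups, one group per transformer block.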

### EfficientNet-B1

| Parameter | Value | Source |
|---|---|---|
| Learning rate | 5e-4 | BirdSet + EffNetB0-all |
| Weight decay | 0.01 | sl-BEATs recipe |
| Batch size | 64 | - |
| Epochs | 50 | EffNetB0-all recipe |
| Loss | BCE | Standard |

## References

1. Bird-MAE: Rauch et al., "Can Masked Autoencoders Also Listen to Birds?", 2025 (arXiv:2504.12880)
2. sl-BEATs-all: "What Matters for Bioacoustic Encoding", ICLR 2026 (arXiv:2508.11845)
3. Perch 2.0: "The Bittern Lesson for Bioacoustics", 2025 (arXiv:2508.04665)
4. FINCH: "Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion", 2026 (arXiv:2602.03817)
5. BirdSet: Rauch et al., "BirdSet: A Multi-Task Benchmark", 2024 (arXiv:2403.10380)
6. Asymmetric Loss: Ridnik et al., 2021 (arXiv:2009.14119)