ASL Citizen BiGRU + Attention Isolated Sign Encoder
This repository contains the trained model for isolated American Sign Language (ASL) sign recognition, developed for the final-year project (FYP):
Bidirectional ASL ↔ English Translation System
The model was trained on the processed ASL Citizen keypoints-200 dataset and is intended to be used as the isolated sign encoder baseline. The encoder checkpoint can later be reused or transferred into the continuous How2Sign CTC stage.
Project Stage
ASL Citizen processed keypoints-200
↓
BiGRU + Attention isolated sign recognition model
↓
Saved isolated encoder checkpoint
↓
Future transfer to How2Sign CTC model
Dataset
Training dataset:
SharoonArshad/asl-citizen-processed-200
The dataset was processed into fixed-length keypoint sequences:
(N, 200, 450)
Where:
200 = fixed sequence length
450 = 225 position features + 225 velocity features
225 = 75 landmarks × 3 coordinates
75 landmarks = 33 pose + 21 left hand + 21 right hand
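The feature layout above can be checked with a short sketch. It assumes that velocity is the first-order difference between consecutive frames, with the first frame's velocity set to zero; the source does not state how the velocity features were computed. `add_velocity_features` is a hypothetical helper name, not part of the released code.

```python
import numpy as np

NUM_LANDMARKS = 33 + 21 + 21          # pose + left hand + right hand = 75
NUM_COORDS = 3
POS_DIM = NUM_LANDMARKS * NUM_COORDS  # 225 position features
SEQ_LEN = 200                         # fixed sequence length


def add_velocity_features(positions: np.ndarray) -> np.ndarray:
    """Append velocity features to a (T, 225) position sequence.

    Assumption: velocity is the difference between consecutive frames,
    and the first frame has zero velocity.
    """
    velocity = np.zeros_like(positions)
    velocity[1:] = positions[1:] - positions[:-1]
    return np.concatenate([positions, velocity], axis=-1)


# Random stand-in for one processed sample.
positions = np.random.rand(SEQ_LEN, POS_DIM).astype(np.float32)
features = add_velocity_features(positions)
print(features.shape)  # (200, 450)
```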
Model Architecture
The model uses a BiGRU encoder with attention pooling:
Input: (B, 200, 450)
LayerNorm: 450
Linear Projection: 450 → 256
BiGRU Encoder: hidden_dim=256, layers=2, bidirectional=True
Attention Pooling: temporal attention over 200 frames
Embedding Layer: 512-dimensional representation
Classifier Head: 200 ASL classes
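A minimal PyTorch sketch of the layer list above is shown here for orientation. It is a reconstruction, not the training code. The class name `BiGRUAttentionClassifier`, the `encode` method, and the single-linear-layer attention scorer are assumptions. The 512-dimensional embedding is taken to be the attention-pooled BiGRU output (2 × 256), although the original may use a separate embedding layer. The parameter names will most likely not match the keys in the released checkpoints.

```python
import torch
import torch.nn as nn


class BiGRUAttentionClassifier(nn.Module):
    """Hypothetical reconstruction of the architecture described above."""

    def __init__(self, input_dim=450, proj_dim=256, hidden_dim=256,
                 num_layers=2, num_classes=200, dropout=0.35):
        super().__init__()
        self.norm = nn.LayerNorm(input_dim)
        self.proj = nn.Linear(input_dim, proj_dim)
        self.gru = nn.GRU(proj_dim, hidden_dim, num_layers=num_layers,
                          batch_first=True, bidirectional=True,
                          dropout=dropout)
        # One scalar attention score per frame (assumed form of the pooling).
        self.attn_score = nn.Linear(2 * hidden_dim, 1)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def encode(self, x):
        # x: (B, T, 450) -> pooled embedding: (B, 512)
        h = self.proj(self.norm(x))
        out, _ = self.gru(h)                                  # (B, T, 512)
        weights = torch.softmax(self.attn_score(out), dim=1)  # (B, T, 1)
        return (weights * out).sum(dim=1)                     # (B, 512)

    def forward(self, x):
        return self.classifier(self.dropout(self.encode(x)))


model = BiGRUAttentionClassifier().eval()
with torch.no_grad():
    logits = model(torch.randn(4, 200, 450))
print(logits.shape)  # torch.Size([4, 200])
```

Keeping `encode` separate from `forward` mirrors the split between the full-model and encoder-only checkpoints: only the encoder would be reused in a later CTC stage.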
Training Configuration
Main settings:
Optimizer: AdamW
Learning rate: 3e-4
Weight decay: 1e-4
Scheduler: CosineAnnealingLR
Batch size: 32
Max epochs: 60
Early stopping patience: 10
Dropout: 0.35
Label smoothing: 0.10
Gradient clipping: 1.0
Mixed precision: enabled on GPU
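The settings above map onto standard PyTorch components roughly as follows. This is a sketch under assumptions: the cosine schedule is taken to span all 60 epochs with one step per epoch, "gradient clipping: 1.0" is read as clipping the gradient norm, and a small linear layer stands in for the real network. Mixed precision is left out so that the snippet also runs on CPU.

```python
import torch
import torch.nn as nn

model = nn.Linear(450, 200)  # stand-in for the real BiGRU + attention model

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
# Assumption: T_max equals the maximum number of epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)
criterion = nn.CrossEntropyLoss(label_smoothing=0.10)

# One illustrative optimization step on random data (batch size 32).
inputs = torch.randn(32, 450)
targets = torch.randint(0, 200, (32,))

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
# Assumption: the clipping value 1.0 is a maximum gradient norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()  # called once per epoch in a real training loop
```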
Results
Final training summary:
Best epoch: 45
Best validation top-1 accuracy: 85.47%
Test loss: 0.9543
Test correct: 3498/3552
Test top-1 accuracy: 98.48%
Test top-5 accuracy: 99.58%
Test macro F1: 0.9850
Test weighted F1: 0.9848
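The validation score is the more conservative indicator of generalization and should be treated as the headline figure:
Validation Top-1 Accuracy: 85.47%
The test score is about 13 percentage points higher than the validation score, which is unusual. It should be reported with caution, because the test split may be easier than the validation split or may contain samples similar to training samples.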
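The reported figures can be cross-checked with a few lines of arithmetic. The snippet uses only the numbers listed in this section.

```python
test_correct = 3498
test_total = 3552
val_top1 = 85.47  # best validation top-1 accuracy, in percent

test_top1 = 100 * test_correct / test_total
gap = test_top1 - val_top1

print(f"Test top-1: {test_top1:.2f}%")                   # Test top-1: 98.48%
print(f"Test minus validation: {gap:.2f} points")        # Test minus validation: 13.01 points
```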
Repository Files
asl_isolated_bigru_attention.zip
checkpoints/best_bigru_attention_model.pt
checkpoints/best_isolated_encoder_only.pt
checkpoints/last_bigru_attention_model.pt
reports/final_metrics.json
reports/training_history.csv
reports/test_classification_report.csv
reports/test_classification_report.json
reports/test_confusion_matrix.npy
metadata/label_to_id.json
metadata/id_to_label.json
training_config.json
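The label maps are needed to turn a predicted class index back into a sign label. The sketch below assumes that `metadata/id_to_label.json` is a flat JSON object with string keys, such as `{"0": "SOME_LABEL"}`. The actual file layout is not documented here, and `load_id_to_label` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_id_to_label(path):
    """Load an id-to-label map and convert the JSON string keys to ints.

    Assumption: the file is a flat JSON object mapping ids to labels.
    """
    with Path(path).open(encoding="utf-8") as f:
        raw = json.load(f)
    return {int(class_id): label for class_id, label in raw.items()}


# Usage, assuming the file has been downloaded or extracted locally:
# id_to_label = load_id_to_label("metadata/id_to_label.json")
# predicted_label = id_to_label[predicted_class_id]
```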
Important Files
Full model checkpoint
checkpoints/best_bigru_attention_model.pt
Use this file if you want to reload the complete classifier for isolated sign prediction.
Encoder-only checkpoint
checkpoints/best_isolated_encoder_only.pt
Use this file for future transfer learning into the continuous How2Sign CTC model.
Full ZIP archive
asl_isolated_bigru_attention.zip
This archive contains the full training output folder, including checkpoints, reports, label maps, and configuration files.
Loading the Model in Kaggle
Example download code:
from huggingface_hub import hf_hub_download
repo_id = "SharoonArshad/asl-citizen-bigru-attention-encoder-200"
model_path = hf_hub_download(
    repo_id=repo_id,
    repo_type="model",
    filename="checkpoints/best_bigru_attention_model.pt"
)
encoder_path = hf_hub_download(
    repo_id=repo_id,
    repo_type="model",
    filename="checkpoints/best_isolated_encoder_only.pt"
)
print(model_path)
print(encoder_path)
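Once a checkpoint is downloaded, it helps to inspect its contents before rebuilding the model, because the internal layout of the `.pt` files is not described here. The sketch below assumes the files were written with `torch.save` and hold a dictionary, either a raw `state_dict` or a wrapper with extra metadata. `describe_checkpoint` is a hypothetical helper.

```python
import torch


def describe_checkpoint(path):
    """Load a checkpoint on CPU and return its top-level keys.

    Assumption: the file was saved with torch.save and contains a dict.
    """
    # weights_only=False unpickles arbitrary objects, so use it only
    # with files from a source you trust.
    checkpoint = torch.load(path, map_location="cpu", weights_only=False)
    if isinstance(checkpoint, dict):
        return [str(key) for key in checkpoint]
    return [type(checkpoint).__name__]


# Usage with the path returned by hf_hub_download:
# print(describe_checkpoint(model_path))
```

The printed keys indicate whether the weights can be passed directly to `load_state_dict` or sit under a nested key.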
Loading the Full ZIP Archive
from huggingface_hub import hf_hub_download
import zipfile
from pathlib import Path
repo_id = "SharoonArshad/asl-citizen-bigru-attention-encoder-200"
zip_path = hf_hub_download(
    repo_id=repo_id,
    repo_type="model",
    filename="asl_isolated_bigru_attention.zip"
)
extract_dir = Path("/kaggle/working/asl_isolated_bigru_attention_loaded")
extract_dir.mkdir(parents=True, exist_ok=True)
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall(extract_dir)
print("Extracted to:", extract_dir)
Thesis Use
This model can be documented as the isolated sign recognition baseline for the project. It provides the first trained visual/keypoint encoder before moving to continuous sign recognition using How2Sign and CTC.
Suggested thesis statement:
A BiGRU with attention pooling was trained on the processed ASL Citizen keypoints-200 dataset for 200 isolated ASL classes. The model achieved 85.47% validation top-1 accuracy and 98.48% test top-1 accuracy, with 99.58% test top-5 accuracy. The trained encoder is saved for future transfer to a CTC-based continuous sign recognition model.
Limitations
- The model is trained on isolated signs, not continuous signing.
- Test accuracy is much higher than validation accuracy, so split difficulty and sample similarity should be discussed.
- The model recognizes 200 selected ASL classes from the processed ASL Citizen dataset.
- It does not yet perform ASL sentence translation.
Next Step
The next project stage is to use the saved encoder checkpoint as the pretrained visual encoder for a CTC-based How2Sign continuous sign recognition model.