ASL Citizen BiGRU + Attention Isolated Sign Encoder

This repository contains the trained isolated American Sign Language (ASL) recognition model for the final-year project (FYP):

Bidirectional ASL ↔ English Translation System

The model was trained on the processed ASL Citizen keypoints-200 dataset and is intended to be used as the isolated sign encoder baseline. The encoder checkpoint can later be reused or transferred into the continuous How2Sign CTC stage.

Project Stage

ASL Citizen processed keypoints-200
        ↓
BiGRU + Attention isolated sign recognition model
        ↓
Saved isolated encoder checkpoint
        ↓
Future transfer to How2Sign CTC model

Dataset

Training dataset:

SharoonArshad/asl-citizen-processed-200

The dataset was processed into fixed-length keypoint sequences:

(N, 200, 450)

Where:

200 = fixed sequence length
450 = 225 position features + 225 velocity features
225 = 75 landmarks × 3 coordinates
75 landmarks = 33 pose + 21 left hand + 21 right hand
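
The feature layout above can be illustrated with a small NumPy sketch. It assumes that velocity is the first-order difference between consecutive frames (zero at the first frame) and that position features come before velocity features; neither detail is confirmed by this card, and the function name `add_velocity_features` is hypothetical.

```python
import numpy as np

def add_velocity_features(positions: np.ndarray) -> np.ndarray:
    """Turn (T, 75, 3) landmark positions into (T, 450) features.

    Assumption: velocity = difference between consecutive frames,
    with zero velocity at the first frame.
    """
    num_frames = positions.shape[0]
    flat = positions.reshape(num_frames, -1)         # (T, 225) positions
    velocity = np.zeros_like(flat)                   # (T, 225) velocities
    velocity[1:] = flat[1:] - flat[:-1]
    return np.concatenate([flat, velocity], axis=1)  # (T, 450)

# 33 pose + 21 left-hand + 21 right-hand landmarks, 3 coordinates each
NUM_LANDMARKS = 33 + 21 + 21
dummy = np.random.rand(200, NUM_LANDMARKS, 3).astype(np.float32)
features = add_velocity_features(dummy)
print(features.shape)
```

Stacking N such sequences gives the (N, 200, 450) array described above.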

Model Architecture

The model uses a BiGRU encoder with attention pooling:

Input:              (B, 200, 450)
LayerNorm:          450
Linear Projection:  450 → 256
BiGRU Encoder:      hidden_dim=256, layers=2, bidirectional=True
Attention Pooling:  temporal attention over 200 frames
Embedding Layer:    512-dimensional representation
Classifier Head:    200 ASL classes
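
A minimal PyTorch sketch of a model with this layout is shown below. It is an illustration only, not the training code: the class name, the layer names, and the single-linear-layer attention scorer are assumptions, and the 512-dimensional embedding is taken to be the pooled BiGRU output (the original may use a separate embedding layer). The parameter names will therefore not match the saved checkpoints.

```python
import torch
import torch.nn as nn

class BiGRUAttentionSketch(nn.Module):
    """Hypothetical re-creation of the layout listed above."""

    def __init__(self, input_dim=450, proj_dim=256, hidden_dim=256,
                 num_layers=2, num_classes=200, dropout=0.35):
        super().__init__()
        self.norm = nn.LayerNorm(input_dim)
        self.proj = nn.Linear(input_dim, proj_dim)
        self.gru = nn.GRU(proj_dim, hidden_dim, num_layers=num_layers,
                          batch_first=True, bidirectional=True,
                          dropout=dropout)
        # Assumed attention form: one scalar score per frame
        self.attn_score = nn.Linear(2 * hidden_dim, 1)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def encode(self, x):
        h, _ = self.gru(self.proj(self.norm(x)))            # (B, T, 512)
        weights = torch.softmax(self.attn_score(h), dim=1)  # (B, T, 1)
        return (weights * h).sum(dim=1)                     # (B, 512)

    def forward(self, x):
        return self.classifier(self.dropout(self.encode(x)))

model = BiGRUAttentionSketch().eval()
with torch.no_grad():
    x = torch.randn(4, 200, 450)
    embedding = model.encode(x)
    logits = model(x)
print(embedding.shape, logits.shape)
```

The softmax over the time axis makes the attention weights sum to 1 across the 200 frames, so the embedding is a weighted average of the per-frame BiGRU states.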

Training Configuration

Main settings:

Optimizer: AdamW
Learning rate: 3e-4
Weight decay: 1e-4
Scheduler: CosineAnnealingLR
Batch size: 32
Max epochs: 60
Early stopping patience: 10
Dropout: 0.35
Label smoothing: 0.10
Gradient clipping: 1.0
Mixed precision: enabled on GPU
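
The settings above map onto standard PyTorch components roughly as follows. This is a sketch under assumptions: the stand-in `nn.Linear` model is a placeholder, `T_max=60` assumes the cosine schedule spans the maximum number of epochs with one scheduler step per epoch, and gradient clipping is assumed to be by norm. Mixed precision and early stopping are omitted for brevity.

```python
import torch
import torch.nn as nn

model = nn.Linear(450, 200)  # placeholder for the real BiGRU model

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
# Assumption: schedule length equals max epochs, stepped once per epoch
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)
criterion = nn.CrossEntropyLoss(label_smoothing=0.10)

# One dummy batch of 32 samples to show a single optimization step
x = torch.randn(32, 450)
y = torch.randint(0, 200, (32,))

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
# Assumption: "Gradient clipping: 1.0" means clipping the global grad norm
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()  # would normally run at the end of each epoch

print(loss.item(), optimizer.param_groups[0]["lr"])
```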

Results

Final training summary:

Best epoch: 45
Best validation top-1 accuracy: 85.47%
Test loss: 0.9543
Test correct: 3498/3552
Test top-1 accuracy: 98.48%
Test top-5 accuracy: 99.58%
Test macro F1: 0.9850
Test weighted F1: 0.9848

The validation score is the more conservative indicator of generalization:

Validation Top-1 Accuracy: 85.47%

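The test top-1 accuracy (98.48%) is about 13 points higher than the validation top-1 accuracy (85.47%). A gap in this direction is unusual, so the test score should be reported with care: the test split may be easier, or it may contain samples similar to training samples.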
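
The reported figures can be re-derived with a few lines of plain Python. The `top_k_accuracy` helper is a hypothetical illustration of how top-1 and top-5 accuracy are usually computed and is not taken from the training code; the toy scores are made up for the example.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)

# Counts and validation accuracy as reported above
test_correct, test_total = 3498, 3552
val_top1 = 85.47

test_top1 = 100 * test_correct / test_total
gap = test_top1 - val_top1
print(f"test top-1: {test_top1:.2f}%")
print(f"test minus validation: {gap:.2f} points")

# Toy example (made-up scores): 2 samples, 3 classes
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
toy_labels = [2, 0]
toy_top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
toy_top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```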

Repository Files

asl_isolated_bigru_attention.zip
checkpoints/best_bigru_attention_model.pt
checkpoints/best_isolated_encoder_only.pt
checkpoints/last_bigru_attention_model.pt
reports/final_metrics.json
reports/training_history.csv
reports/test_classification_report.csv
reports/test_classification_report.json
reports/test_confusion_matrix.npy
metadata/label_to_id.json
metadata/id_to_label.json
training_config.json

Important Files

Full model checkpoint

checkpoints/best_bigru_attention_model.pt

Use this file if you want to reload the complete classifier for isolated sign prediction.

Encoder-only checkpoint

checkpoints/best_isolated_encoder_only.pt

Use this file for future transfer learning into the continuous How2Sign CTC model.
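
Before reusing the encoder checkpoint, it can help to inspect what it contains. The sketch below assumes the file holds a state-dict-like mapping of names to tensors, which this card does not confirm; both helper names are hypothetical. Depending on the installed PyTorch version, `torch.load` may also need an explicit `weights_only` argument.

```python
def describe_state(state, limit=10):
    """Return (name, shape) pairs for the first entries of a mapping.

    Non-tensor values are reported by their type name instead of a shape.
    """
    rows = []
    for name, value in list(state.items())[:limit]:
        if hasattr(value, "shape"):
            rows.append((name, tuple(value.shape)))
        else:
            rows.append((name, type(value).__name__))
    return rows

def load_encoder_checkpoint(path):
    """Load a checkpoint onto the CPU (requires PyTorch)."""
    import torch
    return torch.load(path, map_location="cpu")

# Example usage, assuming the file was downloaded to `encoder_path`:
# state = load_encoder_checkpoint(encoder_path)
# for name, shape in describe_state(state):
#     print(name, shape)
```

If the printed names show a nested layout (for example, a key that itself holds the state dict), the weights would need to be unwrapped before calling `load_state_dict` on a matching encoder module.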

Full ZIP archive

asl_isolated_bigru_attention.zip

This archive contains the full training output folder, including checkpoints, reports, label maps, and configuration files.

Loading the Model in Kaggle

Example download code:

from huggingface_hub import hf_hub_download

repo_id = "SharoonArshad/asl-citizen-bigru-attention-encoder-200"

model_path = hf_hub_download(
    repo_id=repo_id,
    repo_type="model",
    filename="checkpoints/best_bigru_attention_model.pt"
)

encoder_path = hf_hub_download(
    repo_id=repo_id,
    repo_type="model",
    filename="checkpoints/best_isolated_encoder_only.pt"
)

print(model_path)
print(encoder_path)

Loading the Full ZIP Archive

from huggingface_hub import hf_hub_download
import zipfile
from pathlib import Path

repo_id = "SharoonArshad/asl-citizen-bigru-attention-encoder-200"

zip_path = hf_hub_download(
    repo_id=repo_id,
    repo_type="model",
    filename="asl_isolated_bigru_attention.zip"
)

extract_dir = Path("/kaggle/working/asl_isolated_bigru_attention_loaded")
extract_dir.mkdir(parents=True, exist_ok=True)

with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall(extract_dir)

print("Extracted to:", extract_dir)
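
Once the archive is extracted, the label map can be read back to turn predicted class ids into sign labels. The sketch below assumes that `metadata/id_to_label.json` is a JSON object mapping ids to labels and that `metadata/` sits directly under the extraction root; the archive may instead wrap everything in a top-level folder. The demo uses a temporary directory with made-up labels so that it runs without the real files.

```python
import json
import tempfile
from pathlib import Path

def load_label_map(root):
    """Read metadata/id_to_label.json and return {int id: label}."""
    path = Path(root) / "metadata" / "id_to_label.json"
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    # JSON object keys are always strings, so convert them back to ints
    return {int(k): v for k, v in raw.items()}

# Demo with toy data (labels here are invented for illustration)
with tempfile.TemporaryDirectory() as tmp:
    meta = Path(tmp) / "metadata"
    meta.mkdir()
    (meta / "id_to_label.json").write_text(
        json.dumps({"0": "HELLO", "1": "THANKYOU"}), encoding="utf-8"
    )
    demo_map = load_label_map(tmp)

print(demo_map[0])
```

With the real archive, `load_label_map(extract_dir)` would be the corresponding call, adjusted if the files sit inside a subfolder.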

Thesis Use

This model can be documented as the isolated sign recognition baseline for the project. It provides the first trained visual/keypoint encoder before moving to continuous sign recognition using How2Sign and CTC.

Suggested thesis statement:

A BiGRU with attention pooling was trained on the processed ASL Citizen keypoints-200 dataset for 200 isolated ASL classes. The model achieved 85.47% validation top-1 accuracy and 98.48% test top-1 accuracy, with 99.58% test top-5 accuracy. The trained encoder is saved for future transfer to a CTC-based continuous sign recognition model.

Limitations

  • The model is trained on isolated signs, not continuous signing.
  • Test accuracy is much higher than validation accuracy, so split difficulty and sample similarity should be discussed.
  • The model recognizes 200 selected ASL classes from the processed ASL Citizen dataset.
  • It does not yet perform ASL sentence translation.

Next Step

The next project stage is to use the saved encoder checkpoint as the pretrained visual encoder for a CTC-based How2Sign continuous sign recognition model.
