Hide download count on dataset page

pauldoucet · July 12, 2024, 3:09pm

Good morning,

Is there a way to hide the download count on the dataset page? In our case (MahmoodLab/hest · Datasets at Hugging Face), it’s easier to use snapshot_download than load_dataset because of the format of Spatial Transcriptomics data (.h5, .tiff), so the download count isn’t incrementing.

Thank you

pauldoucet · July 14, 2024, 12:46am

Actually, the main reason we are not using load_dataset is that files are renamed to a hash in the cache. Is there a way to write a custom dataset loading script (datasets.GeneratorBasedBuilder) so that files keep their original names?

pauldoucet · July 14, 2024, 2:55pm

Using snapshot_download inside _split_generators seems to be the solution:

import datasets
from datasets import Features, Value
from huggingface_hub import snapshot_download


class HestDataset(datasets.GeneratorBasedBuilder):
    def _info(self):
        return datasets.DatasetInfo(
            description="HEST: A Dataset for Spatial Transcriptomics and Histology Image Analysis",
            homepage="https://github.com/mahmoodlab/hest",
            license="CC BY-NC-SA 4.0 Deed",
            # Each example only records the path of a file in the repo
            features=Features({
                'path': Value('string')
            })
        )

    def _split_generators(self, dl_manager):
        # Strip the repo prefix so only paths relative to the repo root remain
        filenames = [f.split('hest@main/')[-1] for f in self.config_kwargs['data_files']['train']]
        # Download with the huggingface_hub API instead of dl_manager,
        # so files keep their original names under local_dir
        snapshot_download(
            repo_id=self.repo_id,
            allow_patterns=filenames,
            repo_type="dataset",
            local_dir=self._cache_dir_root,
        )
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                gen_kwargs={"filepath": filenames},
            )
        ]

    def _generate_examples(self, filepath):
        # filepath is the list of repo-relative file names from _split_generators
        for idx, file in enumerate(filepath):
            yield idx, {
                'path': file
            }
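
For reference, the two steps that do the work in the script above (stripping the repo prefix and downloading with snapshot_download into a local_dir) can also be sketched outside a builder class. This is only a minimal sketch, not a definitive implementation: the helper names strip_repo_prefix and download_with_original_names are made up for illustration, and the 'hest@main/' marker is simply the one used in the script, so it would need adjusting for another repo or revision.

```python
from pathlib import Path


def strip_repo_prefix(data_file: str, marker: str = "hest@main/") -> str:
    """Return the path relative to the repo root (hypothetical helper).

    Mirrors f.split('hest@main/')[-1] from the script above; a string that
    does not contain the marker is returned unchanged.
    """
    return data_file.split(marker)[-1]


def download_with_original_names(repo_id: str, patterns: list, local_dir: str) -> Path:
    """Download matching files into local_dir, keeping their repo file names
    (hypothetical wrapper around snapshot_download)."""
    # Imported here so strip_repo_prefix can be used without huggingface_hub installed
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        allow_patterns=patterns,
        local_dir=local_dir,
    )
    # snapshot_download returns the folder that holds the downloaded files
    return Path(root)
```

A call might look like download_with_original_names("MahmoodLab/hest", ["*.h5"], "./hest_data"), where the pattern and the target folder are placeholders rather than values taken from this thread.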
