Some HF operations take an excessively long time to complete

Hi,

I am trying to adapt the notebook Getting started with NLP for absolute beginners authored by Jeremy Howard to another use case (lithology classification, for background only).

I am working on my laptop, have set up a dedicated huggingface jupyter conda kernel (end of this post, for reference).

My network connection (workplace) is rather fast, and working fine, but some operations with Hugging Face are taking a very long time to complete, and I am puzzled as to why. I cannot recall the original notebook taking such a large time to execute the same operations. I am new to Hugging Face, so any guidance or feedback would be appreciated.

A few things I observed using ipython’s %%time directives:

model_nm = 'microsoft/deberta-v3-small'
tokz = AutoTokenizer.from_pretrained(model_nm)
CPU times: user 504 ms, sys: 57.9 ms, total: 562 ms
Wall time: 14min 13s
model = AutoModelForSequenceClassification.from_pretrained(model_nm, num_labels=num_labels) 
Downloading: 100%
273M/273M [00:25<00:00, 11.6MB/s]
CPU times: user 4.38 s, sys: 1.17 s, total: 5.55 s
Wall time: 19min 16s
from datasets import load_metric
metric = load_metric("accuracy")
CPU times: user 92.7 ms, sys: 70.7 ms, total: 163 ms
Wall time: 6min 41s

Background: my conda environment was created with:

conda create -n hf python=3.9 -c conda-forge
myenv=hf
mamba install -n $myenv -c conda-forge ipykernel matplotlib
mamba install -n $myenv -c huggingface -c conda-forge datasets transformers
mamba install -n $myenv pytorch=1.11 -c pytorch -c nvidia -c conda-forge
mamba install -n $myenv torchvision torchaudio -c pytorch -c nvidia -c conda-forge
mamba install -n $myenv -c conda-forge sentencepiece

This is very strange behavior. There is a huge discrepancy between wall time and CPU time. Do you have the same issue if you do not run this in a notebook but in a regular Python script?

To me, I would guess at first glance that this is an I/O. Where are you storing the models/files? Is it possible that you have set up your cache on an external drive or something?

I’ve tried to replicate the issue on another PC; same LAN, but a Windows10 box.

The completion time of AutoTokenizer.from_pretrained('microsoft/deberta-v3-small') is much more acceptable (10s). On Linux, using a very similar conda environment, set up on a local drive, still stubbornly 14 minutes.

I’ve taken some more notes for possible repro:

set myenv=hf
mamba create -n  %myenv%  python=3.9 -c conda-forge
mamba install -n  %myenv%  ipykernel matplotlib -c conda-forge
mamba install -n  %myenv%  -c pytorch -c conda-forge --file nbdev.txt
mamba install -n  %myenv%  pytorch=1.11 -c pytorch -c nvidia -c conda-forge
mamba install -n  %myenv%  torchvision torchaudio -c pytorch -c nvidia -c conda-forge
mamba install -n  %myenv%  -c conda-forge sentencepiece
mamba install -n  %myenv%  -c huggingface -c conda-forge datasets transformers scikit-learn

mamba activate  %myenv% 
python -m ipykernel install --user --name  %myenv%  --display-name "Hugging Face"
# %%
from transformers import AutoTokenizer # Takes 30 seconds, but only the first time this is done. After, pretty quick
# %%
model_nm = 'microsoft/deberta-v3-small'
# %%
tokz = AutoTokenizer.from_pretrained(model_nm)  # 9 seconds on Windows, 14 minutes on Linux

The Windows OS is:

Edition	Windows 10 Enterprise
Version	21H2
OS build	19044.1645
Experience	Windows Feature Experience Pack 120.2212.4170.0

The Linux OS is:

Linux blahblahblah 5.10.0-14-amd64 #1 SMP Debian 5.10.113-1 (2022-04-29) x86_64 GNU/Linux

I am scratching my head. The only difference I can think of is that I also installed on Linux in the conda env some fast.ai libraries, notably fastcore, as well as specifying:

nbformat>=4.4.0
nbconvert>=6.1
pyyaml
jupyter 
jupyter_client<8 
ghapi 
Jinja2<3.1.0

I think I can only try to recreate from the very scratch the exact same conda env on Linux, and see if the issue goes away.

I hope this helps someone out there if one bumps into the same issue. I may open a github issue if I feel I’ve ruled out problems within my control.

This was a rabbit hole, but I diagnosed and sorted this out.

I won’t get these hours of my life back, but I sincerely hope this will save some for a future reader.

TL;DR

  • Are you on a network configured with IPv6 and IPv4?
  • Have you turned off your WiFi connection and kept only your ethernet one? Resolved the issue for me.
  • I am not sure if this is a set of circumstances likely enough to warrant some design changes in HF. I may have hit a very unlikely scenario.

Detailed diagnosis

The call stack was:

create_connection (/path/to/condaenv/urllib3/util/connection.py:86)
_new_conn (/path/to/condaenv/urllib3/connection.py:174)
connect (/path/to/condaenv/urllib3/connection.py:358)
_validate_conn (/path/to/condaenv/urllib3/connectionpool.py:1040)
_make_request (/path/to/condaenv/urllib3/connectionpool.py:386)
urlopen (/path/to/condaenv/urllib3/connectionpool.py:703)
send (/path/to/condaenv/requests/adapters.py:489)
send (/path/to/condaenv/requests/sessions.py:701)
request (/path/to/condaenv/requests/sessions.py:587)
request (/path/to/condaenv/requests/api.py:59)
head (/path/to/condaenv/requests/api.py:100)
get_from_cache (/path/to/condaenv/transformers/file_utils.py:1573)
cached_path (/path/to/condaenv/transformers/file_utils.py:1402)
get_config_dict (/path/to/condaenv/transformers/configuration_utils.py:546)
from_pretrained (/path/to/condaenv/transformers/models/auto/configuration_auto.py:527)
from_pretrained (/path/to/condaenv/transformers/models/auto/tokenization_auto.py:463)

in the urllib3 package, create_connection has a loop trying to create socket connections:

    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = None
        try:
            # snip, some things including setting a timeout=10 seconds
            sock.connect(sa)

socket.getaddrinfo(host, port, family, socket.SOCK_STREAM) consists of (censored with fake addresses):

(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:b1f2:e3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:8142:2344', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:2102:d3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:d122:d3b4', 443, 0, 0))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1.5.66.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('2.5.65.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('3.5.64.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('4.5.63.77', 443))

it looks like the one that succeeds is the 6th, (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('2.5.65.77', 443)).

transformers/file_utils has at least one connexion initiation: get_from_cache (/path/to/condaenv/transformers/file_utils.py:1573)

    if not local_files_only:
        try:
            r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=etag_timeout)

etag_timeout is 10, visibly 10 seconds, which may be a sensible default.

I have not looked at how many connexions attempts are made, but presumatly this is way more than the 6 * 10 seconds worth triggered by one get_from_cache call.

I have not debugged running all this as root, but presumably the first attempt at a socket connection just succeeds.

If I turn off the WiFi adapter…

…the list of socket address to try are in a different order, importantly IPv4 addresses are listed first:

(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('3.5.64.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('2.5.65.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1.5.66.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('4.5.63.77', 443))

(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:8142:2344', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:b1f2:e3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:2102:d3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:d122:d3b4', 443, 0, 0))

Interesting find! Thanks for digging through it, and thanks for reporting your findings here!

Note that there was some outage on the Model Hub this weekend due to DDOS attacks. So it’s possible it affected the time for responses on the requests.

Hi, I had the same problem. Another workaround is to disable ipv6, seems like a problem in urllib3

$ sysctl net.ipv6.conf.all.disable_ipv6=1

wow. hard to believe its something so low level. i used System Settings to turn off ip6