3

Is there any way to get list of models available on Hugging Face? E.g. for Automatic Speech Recognition (ASR).

alvas
123k118 gold badges504 silver badges810 bronze badges
asked Mar 22, 2023 at 9:08

3 Answers 3

7

You can use huggingface_hub with list_models and it's filter attribute to query the hub:

from huggingface_hub import HfApi
api = HfApi()
models = api.list_models( 
 filter="automatic-speech-recognition"
)
models = list(models)
print(len(models))
print(models[0].modelId)

Output:

26126
nvidia/parakeet-tdt-0.6b-v3

filter takes either a string or a list of strings and is used to find matches with library, language, task or tags specified by the models on the huggingface hub.

In case you are using huggingface_hub <0.24.0, you can pass a ModelFilter object as filter argument:

from huggingface_hub import HfApi, ModelFilter
api = HfApi()
models = api.list_models(
 filter=ModelFilter(
 task="automatic-speech-recognition"
 )
)
models = list(models)
print(len(models))
print(models[0].modelId)

Output:

7195
13048909972/wav2vec2-common_voice-tr-demo
answered Mar 22, 2023 at 20:24
Sign up to request clarification or add additional context in comments.

1 Comment

ModelFilter is depreciated. https://github.com/huggingface/huggingface_hub/issues/2478
1

Try something like:

import re
import requests
from bs4 import BeautifulSoup
def list_models(task, sort_by):
 """Returns first page of results from available models on huggingface.co"""
 url = f"https://huggingface.co/models?pipeline_tag={task}&sort={sort_by}"
 response = requests.get(url)
 soup = BeautifulSoup(response.content.decode('utf8'))
 for model in soup.find_all('article'):
 parsed_text = [line.strip() for line in re.sub(' +', ' ', model.text.replace('\n', ' ').replace('\t', ' ').replace('•', '\n')).strip().split('\n')]
 model_name_str, last_updated_str, downloaded, *liked = parsed_text
 liked = int(liked[0]) if liked else 0
 model_name = model.find('a').attrs['href'][1:]
 timestamp = model.find('time').attrs['datetime']
 yield {"model_name": model_name, "last_updated": timestamp, "downloaded": downloaded.strip(), "liked": liked}
task = "automatic-speech-recognition"
sort_by = "downloads"
list(list_models(task, sort_by))

[out]:

[{'model_name': 'jonatasgrosman/wav2vec2-large-xlsr-53-english',
 'last_updated': '2022-12-14T02:02:32',
 'downloaded': '13.7M',
 'liked': 31},
 {'model_name': 'pyannote/voice-activity-detection',
 'last_updated': '2022-10-28T13:46:55',
 'downloaded': '1.06M',
 'liked': 37},
 {'model_name': 'pyannote/speaker-diarization',
 'last_updated': '2022-11-17T13:45:04',
 'downloaded': '855k',
 'liked': 171},
 {'model_name': 'facebook/wav2vec2-large-960h-lv60-self',
 'last_updated': '2022-05-23T16:13:42',
 'downloaded': '636k',
 'liked': 71},
 {'model_name': 'yongjian/wav2vec2-large-a',
 'last_updated': '2022-10-22T07:21:15',
 'downloaded': '272k',
 'liked': 5},
 {'model_name': 'jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli',
 'last_updated': '2022-02-25T19:07:57',
 'downloaded': '248k',
 'liked': 4},
 {'model_name': 'facebook/wav2vec2-base-960h',
 'last_updated': '2022-11-14T21:37:23',
 'downloaded': '222k',
 'liked': 109},
 {'model_name': 'openai/whisper-tiny.en',
 'last_updated': '2023-02-23T15:31:39',
 'downloaded': '66.4k',
 'liked': 35},
 {'model_name': 'facebook/wav2vec2-xlsr-53-espeak-cv-ft',
 'last_updated': '2021-12-10T17:18:39',
 'downloaded': '62.1k',
 'liked': 11},
 {'model_name': 'maxidl/wav2vec2-large-xlsr-german',
 'last_updated': '2021-07-06T12:32:21',
 'downloaded': '60k',
 'liked': 0},
 {'model_name': 'pyannote/overlapped-speech-detection',
 'last_updated': '2022-10-28T13:46:33',
 'downloaded': '45.9k',
 'liked': 4},
 {'model_name': 'openai/whisper-tiny',
 'last_updated': '2023-03-10T17:15:01',
 'downloaded': '43.6k',
 'liked': 41},
 {'model_name': 'ceyda/wav2vec2-base-760-turkish',
 'last_updated': '2021-07-06T00:16:04',
 'downloaded': '42.1k',
 'liked': 2},
 {'model_name': 'openai/whisper-large-v2',
 'last_updated': '2023-03-10T17:15:07',
 'downloaded': '41.7k',
 'liked': 275},
 {'model_name': 'openai/whisper-small',
 'last_updated': '2023-03-10T17:15:13',
 'downloaded': '39.7k',
 'liked': 37},
 {'model_name': 'techiaith/wav2vec2-xlsr-ft-en-cy',
 'last_updated': '2023-03-02T06:30:13',
 'downloaded': '32.5k',
 'liked': 1},
 {'model_name': 'facebook/wav2vec2-xlsr-53-phon-cv-babel-ft',
 'last_updated': '2021-11-10T12:02:20',
 'downloaded': '31.2k',
 'liked': 1},
 {'model_name': 'comodoro/wav2vec2-xls-r-300m-cs-250',
 'last_updated': '2022-03-23T18:26:50',
 'downloaded': '31.1k',
 'liked': 0},
 {'model_name': 'jonatasgrosman/wav2vec2-large-xlsr-53-russian',
 'last_updated': '2022-12-14T01:58:43',
 'downloaded': '29.3k',
 'liked': 12},
 {'model_name': 'facebook/hubert-large-ls960-ft',
 'last_updated': '2022-05-24T10:43:42',
 'downloaded': '23.8k',
 'liked': 27},
 {'model_name': 'jonatasgrosman/wav2vec2-large-xlsr-53-japanese',
 'last_updated': '2022-12-14T01:58:09',
 'downloaded': '21.7k',
 'liked': 9},
 {'model_name': 'openai/whisper-base',
 'last_updated': '2023-03-10T17:13:49',
 'downloaded': '20.5k',
 'liked': 52},
 {'model_name': 'jonatasgrosman/wav2vec2-large-xlsr-53-dutch',
 'last_updated': '2022-12-14T01:58:20',
 'downloaded': '20.1k',
 'liked': 0},
 {'model_name': 'openai/whisper-medium',
 'last_updated': '2023-03-10T17:15:08',
 'downloaded': '20k',
 'liked': 40},
 {'model_name': 'jonatasgrosman/wav2vec2-large-xlsr-53-portuguese',
 'last_updated': '2022-12-14T01:59:47',
 'downloaded': '19.2k',
 'liked': 11},
 {'model_name': 'facebook/wav2vec2-large-960h',
 'last_updated': '2022-04-05T16:40:42',
 'downloaded': '15.5k',
 'liked': 14},
 {'model_name': 'facebook/data2vec-audio-base-960h',
 'last_updated': '2022-05-24T10:41:22',
 'downloaded': '15.4k',
 'liked': 7},
 {'model_name': 'theainerd/Wav2Vec2-large-xlsr-hindi',
 'last_updated': '2023-03-20T05:28:11',
 'downloaded': '15.3k',
 'liked': 2},
 {'model_name': 'tyoc213/wav2vec2-large-xlsr-nahuatl',
 'last_updated': '2021-04-07T02:59:04',
 'downloaded': '12.5k',
 'liked': 1},
 {'model_name': 'openai/whisper-large',
 'last_updated': '2023-03-10T17:15:11',
 'downloaded': '12k',
 'liked': 228}]

Also posted a feature request on https://discuss.huggingface.co/t/feature-request-listing-available-models-datasets-and-metrics/34389

answered Mar 22, 2023 at 12:54

1 Comment

imo it should be noted that this solution is webscraping, which may not play well with OPs use case. ie: if OP plans to ship an app, they may get rejected for web scraping
0

In the latest version of the library huggingface-hub, the ModelFilterhas been deprecated and removed.

So, the modern recommended approach is to pass filter parameters directly to api.list_models(...) instead of using ModelFilter.

For example:

from huggingface_hub import HfApi
api = HfApi()
models = api.list_models(
 task="automatic-speech-recognition", # Filter by task directly
 author="openai", # Optional: filter by author
 library="transformers", # Optional: filter by library
 limit=10 # Optional: limit results
)
models = list(models)
print(len(models), models[0].modelId)

output:

10 openai/whisper-large-v3
answered Sep 12, 2025 at 9:09

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.