AI & ML interests
Evaluating open LLMs
Open LLM Leaderboard
This is the hub organisation maintaining the Open LLM Leaderboard.
Here you will find the datasets with detailed results and queries for the models on the leaderboard.
Score results are in open-llm-leaderboard/results, and the current state of requests is in open-llm-leaderboard/requests. For detailed predictions, look for your model name in the datasets below.
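The per-model datasets listed on this page follow a visible naming pattern: the model's author and name are joined with a double underscore and suffixed with `-details`. The sketch below shows one way to look up a model's details dataset under that assumption. The helper names (`details_repo_id`, `list_detail_configs`) are hypothetical and are not part of any leaderboard tooling. The details datasets are gated, so listing their configurations will probably require being logged in to the Hub and having accepted the access conditions.

```python
def details_repo_id(model_id: str, org: str = "open-llm-leaderboard") -> str:
    """Build the details dataset id for a model id of the form 'author/name'.

    Assumption: the '<org>/<author>__<name>-details' pattern seen in the
    dataset names on this page holds for every evaluated model.
    """
    if model_id.count("/") != 1:
        raise ValueError(f"expected 'author/name', got {model_id!r}")
    author, name = model_id.split("/")
    return f"{org}/{author}__{name}-details"


def list_detail_configs(model_id: str) -> list[str]:
    """List the configurations available in a model's details dataset.

    Needs network access and `pip install datasets`; the import is kept
    local so that the pure naming helper above works without it.
    """
    from datasets import get_dataset_config_names

    return get_dataset_config_names(details_repo_id(model_id))


if __name__ == "__main__":
    # Pure string manipulation, no network access needed.
    print(details_repo_id("tensopolis/virtuoso-lite-tensopolis-v2"))
```

Once a configuration name is known, `datasets.load_dataset(repo_id, name=config_name)` can be used to fetch the predictions themselves; which configurations and splits exist is not stated on this page, so inspect them before relying on a particular one.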
- Open LLM Leaderboard (13.6k likes)
  🏆 Track, rank and evaluate open LLMs and chatbots
- Open-LLM performances are plateauing, let’s make the leaderboard steep again (124 likes)
  🏔 Explore and compare advanced language models on a new leaderboard
- open-llm-leaderboard/contents
  Viewer • Updated 2025-03-20 • 4.58k rows • 9.08k downloads • 20 likes
- open-llm-leaderboard/results
  Preview • Updated 2025-03-15 • 3.9k downloads • 15 likes
- open-llm-leaderboard/tensopolis__virtuoso-lite-tensopolis-v2-details
  Viewer • Updated 2025-03-09 • 43.2k rows • 26 downloads
- open-llm-leaderboard/tensopolis__falcon3-10b-tensopolis-v1-details
  Viewer • Updated 2025-03-08 • 43.2k rows • 25 downloads
- open-llm-leaderboard/Pinkstack__SuperThoughts-CoT-14B-16k-o1-QwQ-details
  Viewer • Updated 2025-02-13 • 43.2k rows • 47 downloads • 2 likes
- open-llm-leaderboard/prithivMLmods__QwQ-LCoT-14B-Conversational-details
  Viewer • Updated 2025-02-13 • 43.2k rows • 45 downloads • 1 like
Spaces (5)
- Open LLM Leaderboard
  Track, rank and evaluate open LLMs and chatbots
- Open LLM Leaderboard Model Comparator
  Compare Open LLM Leaderboard results
- Open-LLM performances are plateauing, let’s make the leaderboard steep again
  Explore and compare advanced language models on a new leaderboard