[bot] Conversion to Parquet
The parquet-converter bot has created a version of this dataset in the Parquet format in the refs/convert/parquet branch.
What is Parquet?
Apache Parquet is a popular columnar storage format known for:
- reduced memory requirements,
- fast data retrieval and filtering,
- efficient storage.
This is what powers the dataset viewer on each dataset page, and it means every dataset on the Hub can be accessed with the same code (HF Datasets, ClickHouse, DuckDB, Pandas, PostgreSQL, or Polars, whichever you prefer).
You can learn more about the advantages associated with Parquet in the documentation.
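To make the "fast data retrieval and filtering" point concrete, here is a toy sketch in plain Python of two ideas Parquet relies on: storing values column by column, and grouping rows into row groups that carry per-column min/max statistics, so a reader can skip groups that cannot match a filter. It is an illustration under simplifying assumptions, not the actual Parquet file format or any library's API; `RowGroup`, `to_row_groups` and `read_column_where` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy row group: values stored column by column, plus per-column (min, max)."""
    columns: dict[str, list]
    stats: dict[str, tuple]


def to_row_groups(rows: list[dict], row_group_size: int) -> list[RowGroup]:
    """Split row-oriented records into row groups stored column by column."""
    groups = []
    for start in range(0, len(rows), row_group_size):
        chunk = rows[start:start + row_group_size]
        # Columnar layout: one list per column instead of one dict per row.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        # Per-column statistics, kept so whole row groups can be skipped later.
        stats = {name: (min(values), max(values)) for name, values in columns.items()}
        groups.append(RowGroup(columns, stats))
    return groups


def read_column_where(groups: list[RowGroup], column: str, lo, hi) -> tuple[list, int]:
    """Read one column, keeping values in [lo, hi].

    Returns the matching values and the number of row groups actually scanned.
    """
    matches, scanned = [], 0
    for group in groups:
        group_min, group_max = group.stats[column]
        if group_max < lo or group_min > hi:
            continue  # min/max prove no row here can match: skip the whole group
        scanned += 1
        # Only this one column is touched; other columns are never read.
        matches.extend(v for v in group.columns[column] if lo <= v <= hi)
    return matches, scanned
```

For example, with ten rows `{"id": i, "score": i * 10}` split into row groups of 4, filtering `id` between 5 and 6 returns `[5, 6]` after scanning only one of the three row groups, and the `score` column is never read.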
How do I access the Parquet version of the dataset?
You can access the Parquet version of the dataset by following this link: refs/convert/parquet
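The link above points at a Git branch of the dataset repository. A minimal sketch of building such links, assuming the usual Hub URL patterns (`/tree/<revision>` for browsing, `/resolve/<revision>/<path>` for downloading) with the branch name percent-encoded; the helper names are hypothetical, and the file path used in the usage example (`default/train/0000.parquet`) is an assumed layout, not something this message states about this dataset.

```python
from urllib.parse import quote

# The branch created by the parquet-converter bot.
PARQUET_REVISION = "refs/convert/parquet"


def _encoded_revision() -> str:
    # The revision contains "/", so it is percent-encoded to fit in one URL segment.
    return quote(PARQUET_REVISION, safe="")


def parquet_tree_url(repo_id: str) -> str:
    """URL for browsing the refs/convert/parquet branch of a dataset repo on the Hub."""
    return f"https://huggingface.co/datasets/{repo_id}/tree/{_encoded_revision()}"


def parquet_file_url(repo_id: str, path_in_repo: str) -> str:
    """Direct download URL for one file on the refs/convert/parquet branch."""
    return f"https://huggingface.co/datasets/{repo_id}/resolve/{_encoded_revision()}/{path_in_repo}"
```

For instance, `parquet_file_url("felfri/DivBench", "default/train/0000.parquet")` yields `https://huggingface.co/datasets/felfri/DivBench/resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet`. To see which files actually exist on the branch, `huggingface_hub.list_repo_files("felfri/DivBench", repo_type="dataset", revision="refs/convert/parquet")` should list them (this needs network access).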
What if my dataset was already in Parquet?
When the dataset is already in Parquet format, the data are not converted and the files in refs/convert/parquet are links to the original files. There is one exception, which keeps the dataset viewer API fast: if the row group size of the original Parquet files is too big, new Parquet files are generated.
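The link-or-regenerate rule above can be sketched as a single check. This is a hypothetical illustration, not the converter's code: the message does not say whether "row group size" is measured in bytes or rows, nor what counts as "too big", so the byte unit and the 100 MiB limit below are placeholder assumptions, and `needs_new_parquet_files` is a made-up name.

```python
# ASSUMPTION: the real limit and unit used by the converter are not stated in this
# message; 100 MiB per row group is an illustrative placeholder.
MAX_ROW_GROUP_BYTES = 100 * 1024 * 1024


def needs_new_parquet_files(row_group_sizes: dict[str, list[int]],
                            max_bytes: int = MAX_ROW_GROUP_BYTES) -> bool:
    """Decide whether to link the original files or generate new ones.

    `row_group_sizes` maps each original Parquet file to the byte size of each of
    its row groups. Any oversized row group triggers regeneration; otherwise the
    files on refs/convert/parquet can simply link to the originals.
    """
    return any(size > max_bytes
               for sizes in row_group_sizes.values()
               for size in sizes)
```

In practice the sizes could be read with PyArrow, for example `pyarrow.parquet.ParquetFile(path).metadata.row_group(i).total_byte_size` for each `i` in `range(metadata.num_row_groups)`.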
What should I do?
You don't need to do anything. The Parquet version of the dataset is available for you to use. Refer to the documentation for examples and code snippets on how to query the Parquet files with ClickHouse, DuckDB, Pandas or Polars.
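As one example of the DuckDB route, here is a minimal sketch that only builds the SQL string. It assumes DuckDB's `hf://` path support with the `@~parquet` shorthand for the `refs/convert/parquet` revision, and it assumes a `default` config and a `train` split, which may not match this dataset's actual layout; `duckdb_parquet_query` is a hypothetical helper.

```python
def duckdb_parquet_query(repo_id: str, config: str = "default",
                         split: str = "train", limit: int = 10) -> str:
    """Build a DuckDB SQL query over the Parquet export of a Hub dataset.

    `@~parquet` selects the refs/convert/parquet revision in DuckDB's hf:// paths;
    the config and split defaults are assumptions about the branch layout.
    """
    path = f"hf://datasets/{repo_id}@~parquet/{config}/{split}/*.parquet"
    # int() guards against non-integer limits being spliced into the SQL.
    return f"SELECT * FROM '{path}' LIMIT {int(limit)};"
```

Running the query needs the `duckdb` Python package and network access, for example `duckdb.sql(duckdb_parquet_query("felfri/DivBench")).show()`.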
If you have any questions or concerns, feel free to ask in the discussion below. You can also close the discussion if you don't have any questions.