ngxson/kokoro-podcast-generator, using DeepSeek-R1 and Kokoro-TTS\n","updatedAt":"2025-02-17T22:13:54.430Z","author":{"_id":"63ca214abedad7e2bf1d1517","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674191139776-noauth.png","fullname":"Xuan-Son Nguyen","name":"ngxson","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":363}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.745898425579071},"editors":["ngxson"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674191139776-noauth.png"],"reactions":[{"reaction":"🔥","users":["rajatarya","Bradarr"],"count":2},{"reaction":"🤗","users":["rajatarya"],"count":1}],"isReport":false}},{"id":"67b435c8bffd44cc85a0ade1","author":{"_id":"66a07001440c06d64352cfc8","avatarUrl":"/avatars/93a7d12ac4d300a56bfb4bdfc54e6d33.svg","fullname":"liming","name":"largegpt","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false},"createdAt":"2025-02-18T07:24:56.000Z","type":"comment","data":{"edited":true,"hidden":false,"latest":{"raw":"How can we find the chunk content using chunk hash? The CAS system only stores \"block_hash -> block_content\", Where does the map of chunk to block?\n\nwhat does the shards store? Is it \"file_name, shard_id, chunk_hash, block_hash\"?","html":"
How can we find the chunk content using the chunk hash? The CAS system only stores \"block_hash -> block_content\". Where is the mapping from chunk to block?
\n
What do the shards store? Is it \"file_name, shard_id, chunk_hash, block_hash\"?
\n","updatedAt":"2025-02-18T07:25:14.971Z","author":{"_id":"66a07001440c06d64352cfc8","avatarUrl":"/avatars/93a7d12ac4d300a56bfb4bdfc54e6d33.svg","fullname":"liming","name":"largegpt","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false}},"numEdits":1,"identifiedLanguage":{"language":"en","probability":0.8209852576255798},"editors":["largegpt"],"editorAvatarUrls":["/avatars/93a7d12ac4d300a56bfb4bdfc54e6d33.svg"],"reactions":[],"isReport":false},"replies":[{"id":"67b5037e864242a26c23e881","author":{"_id":"64beb7a2c733e8552ffd63b3","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64beb7a2c733e8552ffd63b3/ZI_DOExd737quSFZLmx58.jpeg","fullname":"Sam Horradarn","name":"sirahd","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":37},"createdAt":"2025-02-18T22:02:38.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"> How can we find the chunk content using chunk hash?\n\nChunk hash is calculated via content-defined chunking (CDC), which means that if two chunks have the same content they will share the same hash. CDC removes the need to store the mapping between chunk hash -> chunk content because we know if two chunks share the same hash, they will have identical content.\n\n> The CAS system only stores \"block_hash -> block_content\", Where does the map of chunk to block?\n\nThis is explained in the \"key chunks\" section in the blog post above. Essentially we only store a tiny subset of chunk -> block by leveraging spatial locality in the file. Trying to store every mapping of chunk -> block can get impractical very quickly.\n\n> what does the shards store? Is it \"file_name, shard_id, chunk_hash, block_hash\"\n\nYou can think of the shards as storing mappings between file (identified via file hash) to list of chunks that make up the file.\n\nI hope this help explains our underlying tech better!","html":"
\nHow can we find the chunk content using chunk hash?
\n
\n
Chunk boundaries are determined via content-defined chunking (CDC), and each chunk's hash is computed from its content, so two chunks with the same content always share the same hash. This removes the need to store a separate chunk hash -> chunk content mapping, because two chunks that share a hash have identical content.
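
To make the idea concrete, here is a minimal sketch of content-defined chunking followed by per-chunk hashing. It is not the actual Xet implementation: the gear-style rolling hash, `MASK`, `MIN_SIZE`, and the use of SHA-256 are all assumptions picked for illustration.

```python
import hashlib

# Hypothetical parameters for illustration only; not the real settings.
MASK = (1 << 6) - 1   # cut when the low 6 bits of the rolling hash are zero
MIN_SIZE = 16         # never emit a chunk shorter than this (except the last one)

# Gear table: one deterministic pseudo-random 32-bit value per byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], 'big')
        for b in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at positions chosen by a rolling hash of the content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left ages out old bytes, so h depends on recent content only.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if i + 1 - start >= MIN_SIZE and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def chunk_hashes(data: bytes) -> list[str]:
    """Hash each chunk by its content; equal content gives an equal hash."""
    return [hashlib.sha256(c).hexdigest() for c in chunk(data)]
```

Because both the boundaries and the hashes depend only on the bytes themselves, two clients chunking the same content independently arrive at the same chunk hashes.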
\n
\nThe CAS system only stores \"block_hash -> block_content\". Where is the mapping from chunk to block?
\n
\n
This is explained in the \"key chunks\" section of the blog post above. Essentially, we store only a tiny subset of the chunk -> block mappings by leveraging spatial locality in the file. Storing every chunk -> block mapping quickly becomes impractical.
\n
\nWhat do the shards store? Is it \"file_name, shard_id, chunk_hash, block_hash\"?
\n
\n
You can think of the shards as storing mappings from a file (identified by its file hash) to the list of chunks that make up the file.
\n
I hope this helps explain our underlying tech better!
\n","updatedAt":"2025-02-18T22:02:38.999Z","author":{"_id":"64beb7a2c733e8552ffd63b3","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/64beb7a2c733e8552ffd63b3/ZI_DOExd737quSFZLmx58.jpeg","fullname":"Sam Horradarn","name":"sirahd","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":37}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.924042284488678},"editors":["sirahd"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/64beb7a2c733e8552ffd63b3/ZI_DOExd737quSFZLmx58.jpeg"],"reactions":[{"reaction":"❤️","users":["jsulz"],"count":1}],"isReport":false,"parentCommentId":"67b435c8bffd44cc85a0ade1"}},{"id":"67b54eae92ed469d5f99f6ef","author":{"_id":"66a07001440c06d64352cfc8","avatarUrl":"/avatars/93a7d12ac4d300a56bfb4bdfc54e6d33.svg","fullname":"liming","name":"largegpt","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false},"createdAt":"2025-02-19T03:23:26.000Z","type":"comment","data":{"edited":true,"hidden":true,"hiddenBy":"","latest":{"raw":"This comment has been hidden","html":"This comment has been hidden","updatedAt":"2025-02-24T04:56:31.841Z","author":{"_id":"66a07001440c06d64352cfc8","avatarUrl":"/avatars/93a7d12ac4d300a56bfb4bdfc54e6d33.svg","fullname":"liming","name":"largegpt","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false}},"numEdits":0,"editors":[],"editorAvatarUrls":[],"reactions":[],"parentCommentId":"67b435c8bffd44cc85a0ade1"}}]},{"id":"67bbfc2194fcec47ae0e36bc","author":{"_id":"66a07001440c06d64352cfc8","avatarUrl":"/avatars/93a7d12ac4d300a56bfb4bdfc54e6d33.svg","fullname":"liming","name":"largegpt","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false},"createdAt":"2025-02-24T04:57:05.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"If we don't store every chunk hash to block hash, how can we download a model file. 
For example if a model does not have a key chunk, the shard info only contains \"file hash -> list of chunks\", the CAS only has \"block hash -> block content\", how can we know which block is needed ?\n","html":"
If we don't store every chunk hash to block hash mapping, how can we download a model file? For example, if a model does not have a key chunk, the shard info only contains \"file hash -> list of chunks\" and the CAS only has \"block hash -> block content\", so how can we know which block is needed?
\n","updatedAt":"2025-02-24T04:57:05.089Z","author":{"_id":"66a07001440c06d64352cfc8","avatarUrl":"/avatars/93a7d12ac4d300a56bfb4bdfc54e6d33.svg","fullname":"liming","name":"largegpt","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9205638766288757},"editors":["largegpt"],"editorAvatarUrls":["/avatars/93a7d12ac4d300a56bfb4bdfc54e6d33.svg"],"reactions":[],"isReport":false},"replies":[{"id":"67be432790cc736e2f8e6c4e","author":{"_id":"65d50e9ef9cbfa798c590004","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65d50e9ef9cbfa798c590004/FlVe8chafigMfrPpMeJRL.jpeg","fullname":"Jared Sulzdorf","name":"jsulz","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":243},"createdAt":"2025-02-25T22:24:39.000Z","type":"comment","data":{"edited":true,"hidden":false,"latest":{"raw":"Great questions :) \n\n> For example if a model does not have a key chunk\n\nTwo points of clarification might be in order concerning the above point: \n1. Not all files will contain a key chunk; this is purely an optimization.\n2. Key chunks are used for deduplication on the upload path, not for downloads. They allow us to see if a file uploaded to a repository for the first time has any content in the global store. This allows us to deduplicate over the entirety of the storage. \n\nAs for your other question: \n> If we don't store every chunk hash to block hash, how can we download a model file\n\nWhen downloading a model file, a request is made to our services with the file hash. This is mapped to a list of block subranges. Logically these are chunks, but by storing the offsets we save on the metadata storage and ultimately many offsets will share boundaries inside a block (allowing us to group them together in a response) providing benefits when sending the content back to the client.","html":"
Great questions :)
\n
\nFor example if a model does not have a key chunk
\n
\n
Two points of clarification might be in order concerning the above point:
\n
\n- Not all files will contain a key chunk; this is purely an optimization.
\n- Key chunks are used for deduplication on the upload path, not for downloads. They let us check whether a file uploaded to a repository for the first time shares any content with the global store, which allows us to deduplicate across the entire storage.
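
A rough sketch of how a key-chunk lookup could work on the upload path follows. Everything here is hypothetical: the selection rule in `is_key_chunk`, the `KEY_CHUNK_MODULUS` value, and the shapes of `global_index` and `shards` are assumptions for illustration, not the real service's API.

```python
import hashlib

KEY_CHUNK_MODULUS = 1024  # hypothetical: roughly 1 in 1024 chunks is a key chunk


def chunk_hash(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()


def is_key_chunk(h: bytes) -> bool:
    # Assumed rule: the hash alone decides, so every client picks the same keys.
    return int.from_bytes(h[-8:], 'big') % KEY_CHUNK_MODULUS == 0


def plan_upload(chunks, global_index, shards):
    """Return (hash, is_new) for each chunk of a file being uploaded.

    global_index: key-chunk hash -> shard id (the only global chunk-level lookup)
    shards:       shard id -> set of chunk hashes already stored
    """
    hashes = [chunk_hash(c) for c in chunks]
    known = set()
    for h in hashes:
        if is_key_chunk(h) and h in global_index:
            # One hit pulls in a whole shard; by spatial locality the
            # neighbouring chunks of the file are likely listed there too.
            known |= shards[global_index[h]]
    return [(h, h not in known) for h in hashes]
```

A file with no key chunk simply gets no global hits in this sketch, which matches the point above that key chunks are an optimization rather than a requirement.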
\n
\n
As for your other question:
\n
\nIf we don't store every chunk hash to block hash, how can we download a model file
\n
\n
When downloading a model file, the client sends a request to our services with the file hash. The file hash is mapped to a list of block subranges. Logically these are chunks, but storing offsets saves on metadata storage. Many subranges also share boundaries inside a block, so we can group them together in a response, which helps when sending the content back to the client.
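
As an illustration of the download path, the sketch below reconstructs a file from block subranges. The names `BlockRange`, `coalesce`, and `reconstruct` are made up for this example, and plain byte offsets into uncompressed blocks are assumed for simplicity; the real metadata format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRange:
    block_hash: str
    start: int  # offset into the block, inclusive
    end: int    # offset into the block, exclusive


def coalesce(ranges):
    """Merge consecutive ranges that touch inside the same block."""
    merged = []
    for r in ranges:
        last = merged[-1] if merged else None
        if last and last.block_hash == r.block_hash and last.end == r.start:
            merged[-1] = BlockRange(last.block_hash, last.start, r.end)
        else:
            merged.append(r)
    return merged


def reconstruct(file_hash, file_index, cas):
    """file_index: file hash -> ordered BlockRange list; cas: block hash -> bytes."""
    out = bytearray()
    for r in coalesce(file_index[file_hash]):
        out += cas[r.block_hash][r.start:r.end]
    return bytes(out)
```

Note that no chunk hash -> block hash lookup is needed here: the file hash leads directly to the ordered ranges, and adjacent ranges collapse into a single read per block.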
\n","updatedAt":"2025-02-26T04:56:49.883Z","author":{"_id":"65d50e9ef9cbfa798c590004","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65d50e9ef9cbfa798c590004/FlVe8chafigMfrPpMeJRL.jpeg","fullname":"Jared Sulzdorf","name":"jsulz","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":243}},"numEdits":2,"identifiedLanguage":{"language":"en","probability":0.9278646111488342},"editors":["jsulz"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/65d50e9ef9cbfa798c590004/FlVe8chafigMfrPpMeJRL.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"67bbfc2194fcec47ae0e36bc"}},{"id":"67d55ea8a074f6158e265304","author":{"_id":"61bf40824b4300d0fb0acf59","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1644224872623-61bf40824b4300d0fb0acf59.jpeg","fullname":"Leshem Choshen","name":"borgr","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":23},"createdAt":"2025-03-15T11:04:08.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"Is all this data saved in raw fp? Or compressed somehow?\n\n (yes I have an unhidden agenda of wondering due to zipNN lossless (open source) model compression and its relevance https://github.com/zipnn/zipnn)","html":"
Is all this data saved in raw fp? Or compressed somehow?
\n
(yes I have an unhidden agenda of wondering due to zipNN lossless (open source) model compression and its relevance https://github.com/zipnn/zipnn)
\n","updatedAt":"2025-03-15T11:04:08.625Z","author":{"_id":"61bf40824b4300d0fb0acf59","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1644224872623-61bf40824b4300d0fb0acf59.jpeg","fullname":"Leshem Choshen","name":"borgr","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":23}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8546241521835327},"editors":["borgr"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1644224872623-61bf40824b4300d0fb0acf59.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"67bbfc2194fcec47ae0e36bc"}},{"id":"67d897cae3089f36355fb807","author":{"_id":"66ac094a8fc00b5c160d7da4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/66ac094a8fc00b5c160d7da4/1-DnsQ0zlyTA-18bncHbt.jpeg","fullname":"yuchenglow","name":"yuchenglow","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":69},"createdAt":"2025-03-17T21:44:42.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"We LZ4 everything automatically, but when we do encounter a model, we also perform a format-agnostic byte grouping inspired by ZipNN before LZ4ing. This does empirically save about 20%.\n https://github.com/huggingface/xet-core/blob/main/cas_object/src/byte_grouping/bg4.rs","html":"
We compress everything with LZ4 automatically, and when we encounter a model, we also apply a format-agnostic byte grouping inspired by ZipNN before the LZ4 step. Empirically, this saves about 20%.
https://github.com/huggingface/xet-core/blob/main/cas_object/src/byte_grouping/bg4.rs
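
For readers who want a feel for byte grouping, here is a small sketch of the general idea, not a port of `bg4.rs`. It assumes a group width of 4 and uses `zlib` from the Python standard library as a stand-in for LZ4, since LZ4 is not in the standard library.

```python
import zlib

GROUPS = 4  # assumed width: one stream per byte position within a 4-byte value


def group_bytes(data: bytes) -> bytes:
    """Reorder so bytes at position 0 mod 4 come first, then 1 mod 4, and so on."""
    return b''.join(data[i::GROUPS] for i in range(GROUPS))


def ungroup_bytes(grouped: bytes) -> bytes:
    """Invert group_bytes, including inputs whose length is not a multiple of 4."""
    n = len(grouped)
    out = bytearray(n)
    pos = 0
    for i in range(GROUPS):
        size = len(range(i, n, GROUPS))  # how many bytes landed in stream i
        out[i::GROUPS] = grouped[pos:pos + size]
        pos += size
    return bytes(out)


def compress(data: bytes) -> bytes:
    # zlib is a stand-in compressor here; the comment above describes LZ4.
    return zlib.compress(group_bytes(data))


def decompress(blob: bytes) -> bytes:
    return ungroup_bytes(zlib.decompress(blob))
```

The intuition is that bytes at the same position of neighbouring floating-point values (for example the sign and exponent bytes) tend to resemble each other, so placing them side by side gives the compressor longer runs of similar data. The transform itself is lossless and independent of the file format.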
\n","updatedAt":"2025-03-17T21:44:42.530Z","author":{"_id":"66ac094a8fc00b5c160d7da4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/66ac094a8fc00b5c160d7da4/1-DnsQ0zlyTA-18bncHbt.jpeg","fullname":"yuchenglow","name":"yuchenglow","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":69}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7416666746139526},"editors":["yuchenglow"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/66ac094a8fc00b5c160d7da4/1-DnsQ0zlyTA-18bncHbt.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"67bbfc2194fcec47ae0e36bc"}},{"id":"67d8990ea35ca2f0c093b59b","author":{"_id":"61bf40824b4300d0fb0acf59","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1644224872623-61bf40824b4300d0fb0acf59.jpeg","fullname":"Leshem Choshen","name":"borgr","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":23},"createdAt":"2025-03-17T21:50:06.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"That's great to know! Tx for sharing.","html":"
That's great to know! Tx for sharing.
\n","updatedAt":"2025-03-17T21:50:06.717Z","author":{"_id":"61bf40824b4300d0fb0acf59","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1644224872623-61bf40824b4300d0fb0acf59.jpeg","fullname":"Leshem Choshen","name":"borgr","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":23}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9869318604469299},"editors":["borgr"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1644224872623-61bf40824b4300d0fb0acf59.jpeg"],"reactions":[],"isReport":false,"parentCommentId":"67bbfc2194fcec47ae0e36bc"}}]},{"id":"67d304341099f6e8e456536a","author":{"_id":"65d50e9ef9cbfa798c590004","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65d50e9ef9cbfa798c590004/FlVe8chafigMfrPpMeJRL.jpeg","fullname":"Jared Sulzdorf","name":"jsulz","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":243},"createdAt":"2025-03-13T16:13:40.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"If you're reading this post and want to get your hands dirty, join the waitlist to get access! https://huggingface.co/join/xet","html":"
If you're reading this post and want to get your hands dirty, join the waitlist to get access! https://huggingface.co/join/xet
\n","updatedAt":"2025-03-13T16:13:40.787Z","author":{"_id":"65d50e9ef9cbfa798c590004","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65d50e9ef9cbfa798c590004/FlVe8chafigMfrPpMeJRL.jpeg","fullname":"Jared Sulzdorf","name":"jsulz","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":243}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9134101867675781},"editors":["jsulz"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/65d50e9ef9cbfa798c590004/FlVe8chafigMfrPpMeJRL.jpeg"],"reactions":[],"isReport":false}}],"status":"open","isReport":false,"pinned":false,"locked":false,"collection":"canonical_blogs"},"contextAuthors":["jsulz","yuchenglow","znation","saba9"],"primaryEmailConfirmed":false,"discussionRole":0,"acceptLanguages":["*"],"withThread":true,"cardDisplay":false,"repoDiscussionsLocked":false}">