Xiang Zhang
fancyzhx
AI & ML interests
None yet
Organizations
None yet
Video Datasets
Text Datasets
- TxT360: Trillion Extracted Text (Running • 134)
  Explore a massive deduplicated LLM training dataset
- CASIA-LM/ChineseWebText2.0
  Viewer • Updated • 2k • 3.29k • 29
- HPLT/HPLT2.0_cleaned
  Viewer • Updated • 9.03B • 58.4k • 42
- TrevorDohm/Pile_Tokenized
  Viewer • Updated • 134M • 33
Audio Datasets
Robotic Datasets
Image Datasets