GitHub - leeguandong/MiniLLaMA3: llama3的迷你版本,包括了数据,tokenizer,pt的全流程llama3的迷你版本,包括了数据,tokenizer,pt的全流程. Contribute to leeguandong/MiniLLaMA3 development by creating an account on GitHub.https://github.com/leeguandong/MiniLLaMA31.数据预处理
SFT数据集全部来自[BELLE](https://github.com/LianjiaTech/BELLE)大佬的贡献,感谢。SFT数据集分别为:[generated_chat_0.4M](https://huggingface.co/datasets/BelleGroup/generated_chat_0.4M)、[train_0.5M_CN](https://huggingface.co/datasets/Bel