-
MDP
Markov Decision Process
-
Megascale
Scaling Large Language Model Training to More Than 10,000 GPUs
-
Mycroft
Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training
-
Minder
Minder: Faulty Machine Detection for Large-scale Distributed Model Training
-
ByteRobust
Robust LLM Training Infrastructure at ByteDance