NVFP4
Keep the original full-precision model frozen as a teacher, then train the quantized NVFP4 model as a student to match the teacher’s output distributions using KL-divergence, rather than retraining the entire model with task losses.
Distillation from the teacher model is the key point.
Subscribe to:
Post Comments (Atom)
apt quotation..
“A man should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects.” by Robert A. Heinlein (author, aeronautical engineer, and naval officer)

No comments:
Post a Comment