It's not better quality: 59.3% vs. 59.4% for FP16 on AIME 25
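To put that gap in perspective, here is a rough sketch. It assumes the benchmark is scored over 30 problems (AIME I plus AIME II); that count is my assumption and isn't stated in the thread. The variable names are made up for illustration.

```python
# Scores quoted above, in percent.
method_score = 59.3
fp16_score = 59.4

# ASSUMPTION: 30 problems in the benchmark (AIME I + AIME II); not stated in the thread.
ASSUMED_PROBLEMS = 30

gap_points = fp16_score - method_score        # difference in percentage points
one_problem_points = 100 / ASSUMED_PROBLEMS   # what a single problem is worth

print(f"gap: {gap_points:.1f} points")
print(f"one problem: {one_problem_points:.2f} points")
print(f"gap is {gap_points / one_problem_points:.2f} of a single problem")
```

Under that assumption the gap is a small fraction of one problem, so the scores are presumably averages over several runs, and "roughly equal quality" seems like the fair reading.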

Faster than FP16, but not better quality, I guess.

Better performance than TQ and better quality than FP16?

Am I reading this right?

And AI helps here: pointing an AI at this paper and saying "make a vLLM PR from this paper" tends to work surprisingly well, even if you need to nudge it a little along the way.

It's the output of a research paper; the authors are not trying to build up vLLM, and they probably have no incentive to do so. You can submit a PR, though! It's easier now while the divergence is low, so don't wait. Since there are six authors, I bet you could get help with the inevitable review chores if you just take the step of creating the PR.

edit: It might not be clear that it is based on vLLM 0.22, which is the current version: https://github.com/huawei-csl/KVarN/commit/d6290e99098d7426dcce01cdd8cc57a2eecf21a0 . All you have to do is create a diff off it; it's fairly straightforward.
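For anyone who wants to try, here is a minimal sketch of the "create a diff off it" step. It only builds the git commands and does not run them. The fork URL and commit hash come from the link above. Everything else is my assumption: the upstream remote URL, the local directory name, the patch file name, and especially the base ref, which you would have to look up yourself (the `v0.22.0` tag in the usage line is a guess at the naming, not something I checked).

```python
import shlex

# From the link above.
FORK_URL = "https://github.com/huawei-csl/KVarN"
FORK_COMMIT = "d6290e99098d7426dcce01cdd8cc57a2eecf21a0"

# ASSUMPTION: upstream vLLM repository location.
UPSTREAM_URL = "https://github.com/vllm-project/vllm"


def diff_commands(base_ref, workdir="kvarn", patch_path="kvarn.patch"):
    """Return the git commands (as argument lists) that would write a patch
    of FORK_COMMIT against base_ref. base_ref is whatever upstream tag or
    commit the fork was based on; the caller must supply it."""
    return [
        ["git", "clone", FORK_URL, workdir],
        ["git", "-C", workdir, "remote", "add", "upstream", UPSTREAM_URL],
        ["git", "-C", workdir, "fetch", "upstream", "--tags"],
        # Compare the upstream base tree with the fork's tree.
        ["git", "-C", workdir, "diff", f"--output={patch_path}", base_ref, FORK_COMMIT],
    ]


if __name__ == "__main__":
    # "v0.22.0" is a hypothetical tag name; check the real one before running.
    for cmd in diff_commands("v0.22.0"):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps this safe to run as-is; once the base ref is confirmed, each list can be passed to `subprocess.run(cmd, check=True)`.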