Google publishes TurboQuant to ease AI memory strain
The new algorithm combines earlier Google-led work on zero-overhead and polar-coordinate quantization to shrink LLM key-value caches by at least 6x in reported tests