Modern AI research faces a crisis of dependency. As we increasingly rely on commercial APIs and closed-source models, the scientific community risks undermining the reproducibility, interpretability, and sustainability of its work. This talk argues for a strategic pivot: from viewing AI as a tool we rent to viewing it as infrastructure we control. We present a case study in building this "sovereign stack." Moving beyond the traditional model in which hard-won compute resources remain siloed, we demonstrate how we architected a server to operate as a shared utility for the academic community. Independence, however, brings complexity. We candidly explore the significant engineering friction encountered in moving from raw hardware to a production-ready service: hardening the attack surface, managing API keys, and the intricate balancing act of weighing throughput, latency, and VRAM usage against massive context windows. By releasing our full software stack, monitoring configurations, and Architecture Decision Records (ADRs), we aim to "outsource the tedium" of these discoveries. This talk offers a blueprint for how institutions can build secure, scalable, and open AI infrastructure that keeps science reproducible and data secure.