Loading…
In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Friday November 15, 2024 2:00pm - 2:35pm MST
Imagine once an LLM learns something from a document, the knowledge can be instantly shared with other LLMs. Unfortunately, today, LLMs must read the same document multiple times, causing a significant slowdown. This session will introduce a new KNOWLEDGE-SHARING system that enables LLMs to share their digested knowledge, in the form of KV caches, so only one LLM needs to process each document. The key challenge is how to store the KV caches cheaply and serve them quickly. Instead of keeping the KV caches of all reusable chunks in GPU/CPU memory, we show a DEMO that with careful implementation on Kubernetes, storing them on cheaper devices is not only economically superior but also delivers significant reductions in LLM serving delay, especially the time to the first token.
Speakers
avatar for Junchen Jiang

Junchen Jiang

Professor, University of Chicago
Junchen Jiang is an Assistant Professor of Computer Science at the University of Chicago. He works at the intersections between networked systems and machine learning. He received his Ph.D. from CMU in 2017 and his bachelor’s degree from Tsinghua in 2011. He has received a Google... Read More →
avatar for Zhou Sun

Zhou Sun

CEO, Mooncake Labs
Mooncake Labs is working on the next generation of stateless data architecture, bringing database performance and functionality to structured and unstructured data in datalakes and raw datasets. Previous I lead the query team at SingleStore (cloud-native distributed HTAP database... Read More →
Friday November 15, 2024 2:00pm - 2:35pm MST
Salt Palace | Level 2 | 255 B
  Emerging + Advanced
Log in to leave feedback.

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link