Description:

Explore innovative approaches to scaling and enhancing Llama capabilities in this 13-minute talk by Lin Qiao, co-founder and CEO of Fireworks. Discover open compound AI architectures, FireAttention (a distributed inference engine), and scalable serving designs that enable Llama developers to accelerate inference, integrate external tools, and achieve domain specialization. Gain insights into addressing generative AI challenges and advancing from single models to compound AI systems.

Going from Single Model to Compound AI Systems