Cerebras has introduced a groundbreaking AI inference chip that is being positioned as a formidable rival to Nvidia's DGX100. The new chip carries 44GB of high-speed memory, enough to handle AI models ranging from billions to trillions of parameters and more than many existing solutions can hold. Unlike competitors that rely on 8-bit precision, Cerebras serves models with 16-bit weights, which the company says yields roughly a 5% boost in performance on complex tasks such as multi-turn conversations and reasoning.
For larger models, Cerebras distributes parameters across multiple CS-3 systems: a single CS-3 can run a model of up to 20 billion parameters, and as few as four systems can serve a model as large as 70 billion parameters.
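As a rough sanity check on how those figures fit together, the sketch below estimates how much memory 16-bit weights require and how many 44GB systems that implies. The even split of weights across systems and the weights-only accounting are simplifying assumptions for illustration, not a description of Cerebras's actual sharding scheme.

```python
import math

BYTES_PER_PARAM_FP16 = 2     # 16-bit weights: 2 bytes per parameter
MEMORY_PER_SYSTEM_GB = 44    # per-system high-speed memory cited above

def systems_needed(num_params: float) -> int:
    """Rough count of CS-3 systems needed to hold the model weights alone."""
    weight_gb = num_params * BYTES_PER_PARAM_FP16 / 1e9
    return max(1, math.ceil(weight_gb / MEMORY_PER_SYSTEM_GB))

for params in (20e9, 70e9):
    weight_gb = params * BYTES_PER_PARAM_FP16 / 1e9
    print(f"{params/1e9:.0f}B params -> {weight_gb:.0f} GB of 16-bit weights "
          f"-> ~{systems_needed(params)} CS-3 system(s)")
```

At 2 bytes per parameter, a 20-billion-parameter model needs about 40GB and fits on a single system, while a 70-billion-parameter model needs about 140GB, which lines up with the four-system figure above.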
The Cerebras AI platform, available via API, promises near-instantaneous inference, including the ability to run Llama 3.1 70B at 450 tokens per second. This is particularly beneficial for AI workflows involving multiple agents, where fast inference is critical for efficient interactions.
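To make that workflow concrete, here is a minimal sketch of calling a hosted Llama 3.1 70B model through an OpenAI-compatible Python client. The base URL, model identifier, and API-key environment variable are assumptions for illustration rather than details confirmed here, so check the Cerebras documentation for the actual values.

```python
# Minimal sketch of streaming a chat completion from an OpenAI-compatible
# endpoint. The base_url, model name, and env variable are assumed values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # assumed environment variable
)

response = client.chat.completions.create(
    model="llama3.1-70b",                    # assumed model identifier
    messages=[{"role": "user",
               "content": "Summarize wafer-scale inference in one sentence."}],
    stream=True,                             # stream tokens as they arrive
)

for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Streaming makes the token rate visible in practice: at roughly 450 tokens per second, even long completions arrive almost immediately.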
At launch, Cerebras is offering developers 1 million free tokens per day, with pricing it says is significantly lower than that of traditional GPU clouds. The platform is also slated to support even larger models, such as Llama 3 405B and Mistral Large 2, in the near future.
With this latest development, Cerebras is set to shake up AI model deployment, offering exceptional speed for large-scale AI applications. Developers can explore the platform and try it out for free at Cerebras AI Inference.