Novita AI LLM Inference Engine: the highest throughput and cheapest inference available
The Novita AI LLM Inference Engine stands out as an exceptionally fast inference service. It processes 130 tokens per second with the Llama-2-70B-Chat model, and reaches even higher rates of up to 180 tokens per second with other model pairings.
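To put a figure like 130 tokens per second in context, here is a minimal sketch of how decode throughput can be measured against an OpenAI-compatible chat endpoint. The base URL, model id, and environment-variable name below are illustrative assumptions, not values confirmed by this post.

```python
# Minimal sketch: timing a completion to estimate tokens/second.
# Note: this measures end-to-end latency (including time to first token),
# so it slightly understates pure decode throughput.
import os
import time

from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",  # assumed endpoint
    api_key=os.environ["NOVITA_API_KEY"],        # assumed env var name
)

start = time.perf_counter()
response = client.chat.completions.create(
    model="meta-llama/llama-2-70b-chat",  # assumed model id
    messages=[{"role": "user", "content": "Summarize the history of GPUs."}],
    max_tokens=512,
)
elapsed = time.perf_counter() - start

completion_tokens = response.usage.completion_tokens
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"= {completion_tokens / elapsed:.1f} tokens/second")
```

Averaging this measurement over several prompts and output lengths gives a more stable throughput estimate than a single call.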