Efficient open-weight model built for fast inference. Well suited to focused tasks where speed matters most. Lightweight enough to self-host or to serve high-volume applications, and its open weights allow fine-tuning for domain-specific tasks.
Provider: Meta
Context window: 128k tokens
Max output: 4,096 tokens
Model ID: meta/llama-3.1-8b
Drop-in compatible with any OpenAI client library.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.deployai.dev/v1",
  apiKey: process.env.DEPLOYAI_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "meta/llama-3.1-8b",
  messages: [
    { role: "user", content: "Hello, how are you?" }
  ],
});
console.log(completion.choices[0].message.content);
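Because responses are capped at 4,096 output tokens, longer generations benefit from setting the cap explicitly and, for interactive use, streaming the reply as it is produced. The following is a minimal sketch assuming the endpoint honors the standard OpenAI max_tokens and stream parameters (not confirmed above); the prompt text is just an illustration.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.deployai.dev/v1",
  apiKey: process.env.DEPLOYAI_API_KEY,
});

// Stream the reply token by token, capping output at the model's 4,096-token limit.
const stream = await client.chat.completions.create({
  model: "meta/llama-3.1-8b",
  messages: [
    { role: "user", content: "Summarize the plot of Hamlet in three sentences." }
  ],
  max_tokens: 4096,
  stream: true,
});

for await (const chunk of stream) {
  // Each chunk carries an incremental delta; content may be absent on the final chunk.
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

For non-interactive batch jobs, drop stream: true and read completion.choices[0].message.content as in the example above.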
Largest open-weight model. State-of-the-art performance across benchmarks with full open access.
Strong open-weight model balancing capability and efficiency. Great for production workloads.
Near-instant responses for lightweight tasks. Ideal for high-throughput applications and quick interactions.