Efficient open-weight model built for fast inference. Well suited to focused tasks where speed matters most. Lightweight enough to self-host or to serve high-volume applications, and its open weights allow fine-tuning for domain-specific tasks.
Provider: Meta
Context window: 128k tokens
Max output: 4,096 tokens
Model ID: meta/llama-3.1-8b
Drop-in compatible with any OpenAI client library.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.deployai.dev/v1",
  apiKey: process.env.DEPLOYAI_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "meta/llama-3.1-8b",
  messages: [
    { role: "user", content: "Hello, how are you?" }
  ],
});
console.log(completion.choices[0].message.content);
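Because responses are capped at 4,096 output tokens, longer generations benefit from setting the cap explicitly and, for interactive use, streaming the reply as it is produced. The following is a minimal sketch assuming the endpoint honors the standard OpenAI max_tokens and stream parameters (not confirmed above); the prompt text is just an illustration.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.deployai.dev/v1",
  apiKey: process.env.DEPLOYAI_API_KEY,
});

// Stream the reply token by token, capping output at the model's 4,096-token limit.
const stream = await client.chat.completions.create({
  model: "meta/llama-3.1-8b",
  messages: [
    { role: "user", content: "Summarize the plot of Hamlet in three sentences." }
  ],
  max_tokens: 4096,
  stream: true,
});

for await (const chunk of stream) {
  // Each chunk carries an incremental delta; content may be absent on the final chunk.
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

For non-interactive batch jobs, drop stream: true and read completion.choices[0].message.content as in the example above.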
Largest open-weight model. State-of-the-art performance across benchmarks with full open access.
Strong open-weight model balancing capability and efficiency. Great for production workloads.
Near-instant responses for lightweight tasks. Ideal for high-throughput applications and quick interactions.