🖥️ Локальные LLM

Запускайте LLM на своём железе. Для каждой модели — характеристики, команда установки и минимальные требования. Все модели доступны через Ollama.

Granite 3.1 Gemma 3 4B Gemma 2 9B Gemma 2 27B Devstral Qwen2.5 VL Phi 4 Reasoning Phi 4 Mini Gemma 3 27B Gemma 3 1B

Granite 3.1 8B

8B params · Dense
Dense LLM from IBM supporting up to 128K context length, trained on 12T tokens. Suitable for general instructions following and can be used to build AI assistants.
Контекст
128K tokens
VRAM (GPU)
6 GB
RAM (CPU)
8 GB
Размер
4.9 GB
ollama pull granite3.1:8b

Gemma 3 4B

4B params · Multimodal
State-of-the-art image + text input models from Google, built from the same research and tech used to create the Gemini models.
Контекст
32K tokens
VRAM (GPU)
3 GB
RAM (CPU)
4 GB
Размер
2.6 GB
ollama pull gemma3:4b

Gemma 2 9B

9B params · Dense
The mid-sized option of the Gemma 2 model family. Built by Google, using from the same research and technology used to create the Gemini models.
Контекст
8K tokens
VRAM (GPU)
8 GB
RAM (CPU)
10 GB
Размер
5.4 GB
ollama pull gemma2:9b

Gemma 2 27B

27B params · Dense
The large option of the Gemma 2 model family. Built by Google, using from the same research and technology used to create the Gemini models.
Контекст
8K tokens
VRAM (GPU)
18 GB
RAM (CPU)
28 GB
Размер
15.7 GB
ollama pull gemma2:27b

Devstral Small 2505

22B params · Dense
Devstral by MistralAI is based on Mistral Small 3.1. Debuts as the #1 open source model on SWE-bench.
Контекст
128K tokens
VRAM (GPU)
16 GB
RAM (CPU)
24 GB
Размер
13 GB
ollama pull devstral-small:2505

Qwen2.5 VL 7B

7B params · Vision-Language
A 7B Vision Language Model (VLM) from the Qwen2.5 family.
Контекст
128K tokens
VRAM (GPU)
6 GB
RAM (CPU)
8 GB
Размер
4.7 GB
ollama pull qwen2.5-vl:7b

Phi 4 Reasoning Plus

14B params · Reasoning
Advanced open-weight reasoning model, finetuned from Phi-4 with additional reinforcement learning for higher accuracy.
Контекст
128K tokens
VRAM (GPU)
12 GB
RAM (CPU)
16 GB
Размер
8.5 GB
ollama pull phi4:14b

Phi 4 Mini Reasoning

3.8B params · Lightweight
Lightweight open model from the Phi-4 family.
Контекст
128K tokens
VRAM (GPU)
3 GB
RAM (CPU)
4 GB
Размер
2.4 GB
ollama pull phi4-mini:3.8b

Gemma 3 27B

27B params · Multimodal
State-of-the-art image + text input models from Google, built from the same research and tech used to create the Gemini models.
Контекст
128K tokens
VRAM (GPU)
18 GB
RAM (CPU)
28 GB
Размер
17 GB
ollama pull gemma3:27b

Gemma 3 1B

1B params · Ultralight
State-of-the-art image + text input models from Google, built from the same research and tech used to create the Gemini models. Smallest model in the Gemma 3 family — runs anywhere.
Контекст
32K tokens
VRAM (GPU)
1 GB
RAM (CPU)
2 GB
Размер
0.8 GB
ollama pull gemma3:1b

Llama 3.2 3B

3B params · Lightweight
Meta's lightweight multilingual model. Text-only, great for on-device and edge deployment. Supports 8 languages.
Контекст
128K tokens
VRAM (GPU)
3 GB
RAM (CPU)
4 GB
Размер
2.0 GB
ollama pull llama3.2:3b

Llama 3.1 8B

8B params · Dense
Meta's flagship 8B. Multilingual, strong reasoning, tool use. The most popular open model for general tasks.
Контекст
128K tokens
VRAM (GPU)
6 GB
RAM (CPU)
8 GB
Размер
4.9 GB
ollama pull llama3.1:8b

Llama 3.3 70B

70B params · Dense
Meta's most powerful open model. Near GPT-4 class on reasoning, coding, and instruction following.
Контекст
128K tokens
VRAM (GPU)
40 GB
RAM (CPU)
70 GB
Размер
40 GB
ollama pull llama3.3:70b

Mistral 7B

7B params · Dense
Mistral AI's groundbreaking 7B. Outperforms larger models on reasoning. Excellent for fine-tuning.
Контекст
8K tokens
VRAM (GPU)
6 GB
RAM (CPU)
8 GB
Размер
4.1 GB
ollama pull mistral:7b

Mixtral 8x7B

46B params · MoE (8 experts)
Mistral's Mixture-of-Experts. 8 experts x 7B, activates 2 per token. 46B performance at 12B inference cost.
Контекст
32K tokens
VRAM (GPU)
26 GB
RAM (CPU)
46 GB
Размер
26 GB
ollama pull mixtral:8x7b

DeepSeek-R1 8B

8B params · Reasoning (CoT)
DeepSeek's reasoning model with chain-of-thought. Shows thinking process. Strong on math, code, logic.
Контекст
128K tokens
VRAM (GPU)
6 GB
RAM (CPU)
8 GB
Размер
4.9 GB
ollama pull deepseek-r1:8b

DeepSeek-R1 32B

32B params · Reasoning (CoT)
DeepSeek-R1 distilled to 32B (Qwen base). Strong reasoning with visible chain-of-thought. Near GPT-4o on math.
Контекст
128K tokens
VRAM (GPU)
20 GB
RAM (CPU)
32 GB
Размер
19 GB
ollama pull deepseek-r1:32b

DeepSeek-Coder V2 16B

16B params · MoE · Code
MoE code model from DeepSeek. 338 languages, 128K context. Strongest open code model for single-GPU.
Контекст
128K tokens
VRAM (GPU)
12 GB
RAM (CPU)
16 GB
Размер
8.9 GB
ollama pull deepseek-coder-v2:16b

Qwen2.5 7B

7B params · Dense
Alibaba's Qwen2.5 7B. 29 languages, 128K context. Great all-rounder for Asian and European languages.
Контекст
128K tokens
VRAM (GPU)
6 GB
RAM (CPU)
8 GB
Размер
4.7 GB
ollama pull qwen2.5:7b

Qwen2.5-Coder 7B

7B params · Code specialist
Alibaba's code-specialized model. Trained on 5.5T tokens of code. 92 programming languages.
Контекст
128K tokens
VRAM (GPU)
6 GB
RAM (CPU)
8 GB
Размер
4.7 GB
ollama pull qwen2.5-coder:7b

CodeLlama 13B

13B params · Code specialist
Meta's code-specialized Llama. Python specialist with fill-in-the-middle. Code completion and generation.
Контекст
16K tokens
VRAM (GPU)
10 GB
RAM (CPU)
14 GB
Размер
7.4 GB
ollama pull codellama:13b

LLaVA 13B

13B params · Vision-Language
Large Language and Vision Assistant. Llama 2 + vision encoder. Image understanding, OCR, visual QA.
Контекст
4K tokens
VRAM (GPU)
10 GB
RAM (CPU)
14 GB
Размер
7.4 GB
ollama pull llava:13b

Dolphin Mixtral 8x7B

46B params · MoE · Uncensored
Uncensored Mixtral fine-tune. Removes refusal behaviour. Good for creative and unrestricted tasks.
Контекст
32K tokens
VRAM (GPU)
26 GB
RAM (CPU)
46 GB
Размер
26 GB
ollama pull dolphin-mixtral:8x7b

Zephyr 7B

7B params · Fine-tuned
HuggingFace's DPO-trained Mistral 7B. Punches above weight on chat benchmarks. Excellent conversationalist.
Контекст
8K tokens
VRAM (GPU)
6 GB
RAM (CPU)
8 GB
Размер
4.1 GB
ollama pull zephyr:7b

OpenChat 7B

7B params · Chat-optimised
C-RLFT trained Mistral 7B. Top performer on MT-Bench among 7B models. Natural conversation.
Контекст
8K tokens
VRAM (GPU)
6 GB
RAM (CPU)
8 GB
Размер
4.1 GB
ollama pull openchat:7b

Yi 34B

34B params · Dense
01.AI's 34B model. 3T training tokens. Strong bilingual (EN/ZH). Near GPT-3.5 performance.
Контекст
200K tokens
VRAM (GPU)
22 GB
RAM (CPU)
34 GB
Размер
20 GB
ollama pull yi:34b

Command R+ 104B

104B params · Dense
Cohere's flagship. RAG-optimised, tool-use native, 10 languages. Enterprise-grade reasoning.
Контекст
128K tokens
VRAM (GPU)
60 GB
RAM (CPU)
100 GB
Размер
60 GB
ollama pull command-r-plus:104b

StarCoder2 15B

15B params · Code specialist
BigCode project. The Stack v2 (600+ languages). Fill-in-the-middle, code completion.
Контекст
16K tokens
VRAM (GPU)
12 GB
RAM (CPU)
16 GB
Размер
9.0 GB
ollama pull starcoder2:15b

SQLCoder 7B

7B params · SQL specialist
Defog.ai's SQL specialist. 20K+ SQL queries across diverse schemas. Text-to-SQL near GPT-4 accuracy.
Контекст
8K tokens
VRAM (GPU)
6 GB
RAM (CPU)
8 GB
Размер
4.1 GB
ollama pull sqlcoder:7b

WizardLM 2 8x22B

141B params · MoE (8 experts)
Microsoft's MoE model. Evolved training on complex instructions. Top reasoning, multilingual, coding.
Контекст
64K tokens
VRAM (GPU)
80 GB
RAM (CPU)
140 GB
Размер
78 GB
ollama pull wizardlm2:8x22b

Nous Hermes 2 Mixtral 8x7B

46B params · MoE · Instruction-tuned
Nous Research fine-tune on 1M+ instructions. Structured output, function calling, JSON mode. ChatML format.
Контекст
32K tokens
VRAM (GPU)
26 GB
RAM (CPU)
46 GB
Размер
26 GB
ollama pull nous-hermes2:mixtral

DBRX Instruct 132B

132B params · MoE (16 experts)
Databricks' MoE flagship. 16 experts, 4 active. Strong at structured data, SQL, Python, reasoning.
Контекст
32K tokens
VRAM (GPU)
80 GB
RAM (CPU)
130 GB
Размер
75 GB
ollama pull dbrx:instruct

MiniCPM-V 8B

8B params · Vision-Language
OpenBMB's vision model. Strong OCR, image understanding, bilingual (EN/ZH). Runs on edge devices.
Контекст
4K tokens
VRAM (GPU)
6 GB
RAM (CPU)
8 GB
Размер
5.5 GB
ollama pull minicpm-v:8b

Falcon 3 10B

10B params · Dense
TII's latest Falcon. 14T training tokens. Strong multilingual (EN, FR, ES, PT). Function calling, code.
Контекст
32K tokens
VRAM (GPU)
8 GB
RAM (CPU)
12 GB
Размер
6.0 GB
ollama pull falcon3:10b

Nomic Embed Text 137M

137M params · Embedding
Nomic AI's embedding model. #1 on MTEB among open models. 768-dim vectors. Perfect for RAG and search.
Контекст
8K tokens
VRAM (GPU)
0.5 GB
RAM (CPU)
1 GB
Размер
0.3 GB
ollama pull nomic-embed-text

mxbai-embed-large 335M

335M params · Embedding
Mixedbread AI's embedding. 1024-dim vectors, MTEB leader. Best open embedding for retrieval tasks.
Контекст
512 tokens
VRAM (GPU)
1 GB
RAM (CPU)
2 GB
Размер
0.7 GB
ollama pull mxbai-embed-large