Local LLMs | Qantcore: AI Agent Analytics

Granite 3.1 8B

8B params · Dense

Dense LLM from IBM supporting up to 128K context length, trained on 12T tokens. Suitable for general instruction following and for building AI assistants.

Context: 128K tokens
VRAM (GPU): 6 GB
RAM (CPU): 8 GB
Size: 4.9 GB

ollama pull granite3.1:8b

Ollama → HuggingFace →
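Once a model has been pulled, it can be queried from code. The sketch below assumes a local Ollama server on its default port (11434) and uses the `/api/generate` endpoint with streaming disabled. The helper names `build_request` and `generate` are hypothetical, chosen for illustration, and error handling is omitted.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("granite3.1:8b", "Explain what a dense LLM is in one sentence.")` would return the model's reply as a string. Any other tag from this catalog can be substituted for the model name.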

Gemma 3 4B

4B params · Multimodal

State-of-the-art image + text input models from Google, built from the same research and tech used to create the Gemini models.

Context: 128K tokens
VRAM (GPU): 3 GB
RAM (CPU): 4 GB
Size: 2.6 GB

ollama pull gemma3:4b

Ollama → HuggingFace →

Gemma 2 9B

9B params · Dense

The mid-sized option of the Gemma 2 model family. Built by Google using the same research and technology used to create the Gemini models.

Context: 8K tokens
VRAM (GPU): 8 GB
RAM (CPU): 10 GB
Size: 5.4 GB

ollama pull gemma2:9b

Ollama → HuggingFace →

Gemma 2 27B

27B params · Dense

The large option of the Gemma 2 model family. Built by Google using the same research and technology used to create the Gemini models.

Context: 8K tokens
VRAM (GPU): 18 GB
RAM (CPU): 28 GB
Size: 15.7 GB

ollama pull gemma2:27b

Ollama → HuggingFace →

Devstral Small 2505

24B params · Dense

Devstral by MistralAI is based on Mistral Small 3.1. Debuts as the #1 open source model on SWE-bench.

Context: 128K tokens
VRAM (GPU): 16 GB
RAM (CPU): 24 GB
Size: 13 GB

ollama pull devstral:24b

Ollama → Mistral AI →

Qwen2.5 VL 7B

7B params · Vision-Language

A 7B Vision Language Model (VLM) from the Qwen2.5 family.

Context: 128K tokens
VRAM (GPU): 6 GB
RAM (CPU): 8 GB
Size: 4.7 GB

ollama pull qwen2.5vl:7b

Ollama → HuggingFace →

Phi 4 Reasoning Plus

14B params · Reasoning

Advanced open-weight reasoning model, finetuned from Phi-4 with additional reinforcement learning for higher accuracy.

Context: 128K tokens
VRAM (GPU): 12 GB
RAM (CPU): 16 GB
Size: 8.5 GB

ollama pull phi4-reasoning:plus

Ollama → HuggingFace →

Phi 4 Mini Reasoning

3.8B params · Lightweight

Lightweight open model from the Phi-4 family.

Context: 128K tokens
VRAM (GPU): 3 GB
RAM (CPU): 4 GB
Size: 2.4 GB

ollama pull phi4-mini-reasoning:3.8b

Ollama → HuggingFace →

Gemma 3 27B

27B params · Multimodal

State-of-the-art image + text input models from Google, built from the same research and tech used to create the Gemini models.

Context: 128K tokens
VRAM (GPU): 18 GB
RAM (CPU): 28 GB
Size: 17 GB

ollama pull gemma3:27b

Ollama → HuggingFace →

Gemma 3 1B

1B params · Ultralight

Smallest model in the Gemma 3 family from Google, built from the same research and tech used to create the Gemini models. Unlike the larger Gemma 3 sizes, the 1B variant accepts text input only. Small enough to run on almost any hardware.

Context: 32K tokens
VRAM (GPU): 1 GB
RAM (CPU): 2 GB
Size: 0.8 GB

ollama pull gemma3:1b

Ollama → HuggingFace →

Llama 3.2 3B

3B params · Lightweight

Meta's lightweight multilingual model. Text-only, great for on-device and edge deployment. Supports 8 languages.

Context: 128K tokens
VRAM (GPU): 3 GB
RAM (CPU): 4 GB
Size: 2.0 GB

ollama pull llama3.2:3b

Ollama → HuggingFace →

Llama 3.1 8B

8B params · Dense

Meta's flagship 8B. Multilingual, strong reasoning, tool use. The most popular open model for general tasks.

Context: 128K tokens
VRAM (GPU): 6 GB
RAM (CPU): 8 GB
Size: 4.9 GB

ollama pull llama3.1:8b

Ollama → HuggingFace →

Llama 3.3 70B

70B params · Dense

Meta's most powerful open model. Near GPT-4 class on reasoning, coding, and instruction following.

Context: 128K tokens
VRAM (GPU): 40 GB
RAM (CPU): 70 GB
Size: 40 GB

ollama pull llama3.3:70b

Ollama → HuggingFace →

Mistral 7B

7B params · Dense

Mistral AI's groundbreaking 7B. Outperforms larger models on reasoning. Excellent for fine-tuning.

Context: 8K tokens
VRAM (GPU): 6 GB
RAM (CPU): 8 GB
Size: 4.1 GB

ollama pull mistral:7b

Ollama → HuggingFace →

Mixtral 8x7B

46B params · MoE (8 experts)

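Mistral's Mixture-of-Experts model. Each layer has 8 experts and routes every token to 2 of them, so only about 13B of the roughly 46B parameters are active per token. Quality is in the 46B class at roughly 13B inference compute, although all 46B parameters must still fit in memory.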

Context: 32K tokens
VRAM (GPU): 26 GB
RAM (CPU): 46 GB
Size: 26 GB

ollama pull mixtral:8x7b

Ollama → HuggingFace →
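The "8 experts, 2 active" arithmetic for Mixtral 8x7B can be checked with a short parameter count. The sketch below is a rough estimate, not an official breakdown: the architecture constants (hidden size 4096, feed-forward size 14336, 32 layers, vocabulary 32000, 8 key/value heads of dimension 128) are assumptions taken from the publicly released model configuration, not from this catalog. It also shows why the total is about 46B rather than 8 x 7B = 56B: only the feed-forward experts are replicated, while attention and embeddings are shared.

```python
# Assumed Mixtral 8x7B architecture constants (from the public model config).
HIDDEN = 4096          # model (embedding) dimension
FFN = 14336            # feed-forward dimension inside each expert
LAYERS = 32            # transformer layers
VOCAB = 32000          # vocabulary size
KV_DIM = 8 * 128       # grouped-query attention: 8 KV heads x head dim 128
EXPERTS = 8            # experts per layer
ACTIVE_EXPERTS = 2     # experts the router selects per token


def mixtral_params(experts_counted: int) -> int:
    """Count parameters, including only `experts_counted` experts per layer."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o and k, v projections
    expert = 3 * HIDDEN * FFN                              # gate, up and down projections
    router = HIDDEN * EXPERTS                              # routing layer over all experts
    norms = 2 * HIDDEN                                     # two RMSNorm weights per layer
    per_layer = attention + experts_counted * expert + router + norms
    embeddings = 2 * VOCAB * HIDDEN + HIDDEN               # input + output embeddings, final norm
    return LAYERS * per_layer + embeddings


total = mixtral_params(EXPERTS)
active = mixtral_params(ACTIVE_EXPERTS)
print(f"total:  {total / 1e9:.1f}B")   # total:  46.7B
print(f"active: {active / 1e9:.1f}B")  # active: 12.9B
```

The gap between the two figures explains the card's numbers: the VRAM and download size scale with the total count, while per-token compute scales with the active count.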

DeepSeek-R1 8B

8B params · Reasoning (CoT)

DeepSeek's reasoning model with chain-of-thought. Shows thinking process. Strong on math, code, logic.

Context: 128K tokens
VRAM (GPU): 6 GB
RAM (CPU): 8 GB
Size: 4.9 GB

ollama pull deepseek-r1:8b

Ollama → HuggingFace →

DeepSeek-R1 32B

32B params · Reasoning (CoT)

DeepSeek-R1 distilled to 32B (Qwen base). Strong reasoning with visible chain-of-thought. Near GPT-4o on math.

Context: 128K tokens
VRAM (GPU): 20 GB
RAM (CPU): 32 GB
Size: 19 GB

ollama pull deepseek-r1:32b

Ollama → HuggingFace →

DeepSeek-Coder V2 16B

16B params · MoE · Code

MoE code model from DeepSeek. 338 languages, 128K context. Strongest open code model for single-GPU.

Context: 128K tokens
VRAM (GPU): 12 GB
RAM (CPU): 16 GB
Size: 8.9 GB

ollama pull deepseek-coder-v2:16b

Ollama → HuggingFace →

Qwen2.5 7B

7B params · Dense

Alibaba's Qwen2.5 7B. 29 languages, 128K context. Great all-rounder for Asian and European languages.

Context: 128K tokens
VRAM (GPU): 6 GB
RAM (CPU): 8 GB
Size: 4.7 GB

ollama pull qwen2.5:7b

Ollama → HuggingFace →

Qwen2.5-Coder 7B

7B params · Code specialist

Alibaba's code-specialized model. Trained on 5.5T tokens of code. 92 programming languages.

Context: 128K tokens
VRAM (GPU): 6 GB
RAM (CPU): 8 GB
Size: 4.7 GB

ollama pull qwen2.5-coder:7b

Ollama → HuggingFace →

CodeLlama 13B

13B params · Code specialist

Meta's code-specialized Llama. Python specialist with fill-in-the-middle. Code completion and generation.

Context: 16K tokens
VRAM (GPU): 10 GB
RAM (CPU): 14 GB
Size: 7.4 GB

ollama pull codellama:13b

Ollama → HuggingFace →

LLaVA 13B

13B params · Vision-Language

Large Language and Vision Assistant. Llama 2 + vision encoder. Image understanding, OCR, visual QA.

Context: 4K tokens
VRAM (GPU): 10 GB
RAM (CPU): 14 GB
Size: 7.4 GB

ollama pull llava:13b

Ollama → HuggingFace →

Dolphin Mixtral 8x7B

46B params · MoE · Uncensored

Uncensored Mixtral fine-tune. Removes refusal behaviour. Good for creative and unrestricted tasks.

Context: 32K tokens
VRAM (GPU): 26 GB
RAM (CPU): 46 GB
Size: 26 GB

ollama pull dolphin-mixtral:8x7b

Ollama → HuggingFace →

Zephyr 7B

7B params · Fine-tuned

HuggingFace's DPO-trained Mistral 7B. Punches above weight on chat benchmarks. Excellent conversationalist.

Context: 8K tokens
VRAM (GPU): 6 GB
RAM (CPU): 8 GB
Size: 4.1 GB

ollama pull zephyr:7b

Ollama → HuggingFace →

OpenChat 7B

7B params · Chat-optimised

C-RLFT trained Mistral 7B. Top performer on MT-Bench among 7B models. Natural conversation.

Context: 8K tokens
VRAM (GPU): 6 GB
RAM (CPU): 8 GB
Size: 4.1 GB

ollama pull openchat:7b

Ollama → HuggingFace →

Yi 34B

34B params · Dense

01.AI's 34B model. 3T training tokens. Strong bilingual (EN/ZH). Near GPT-3.5 performance.

Context: 200K tokens
VRAM (GPU): 22 GB
RAM (CPU): 34 GB
Size: 20 GB

ollama pull yi:34b

Ollama → HuggingFace →

Command R+ 104B

104B params · Dense

Cohere's flagship. RAG-optimised, tool-use native, 10 languages. Enterprise-grade reasoning.

Context: 128K tokens
VRAM (GPU): 60 GB
RAM (CPU): 100 GB
Size: 60 GB

ollama pull command-r-plus:104b

Ollama → HuggingFace →

StarCoder2 15B

15B params · Code specialist

BigCode project. The Stack v2 (600+ languages). Fill-in-the-middle, code completion.

Context: 16K tokens
VRAM (GPU): 12 GB
RAM (CPU): 16 GB
Size: 9.0 GB

ollama pull starcoder2:15b

Ollama → HuggingFace →

SQLCoder 7B

7B params · SQL specialist

Defog.ai's SQL specialist. 20K+ SQL queries across diverse schemas. Text-to-SQL near GPT-4 accuracy.

Context: 8K tokens
VRAM (GPU): 6 GB
RAM (CPU): 8 GB
Size: 4.1 GB

ollama pull sqlcoder:7b

Ollama → HuggingFace →

WizardLM 2 8x22B

141B params · MoE (8 experts)

Microsoft's MoE model. Evolved training on complex instructions. Top reasoning, multilingual, coding.

Context: 64K tokens
VRAM (GPU): 80 GB
RAM (CPU): 140 GB
Size: 78 GB

ollama pull wizardlm2:8x22b

Ollama → HuggingFace →

Nous Hermes 2 Mixtral 8x7B

46B params · MoE · Instruction-tuned

Nous Research fine-tune on 1M+ instructions. Structured output, function calling, JSON mode. ChatML format.

Context: 32K tokens
VRAM (GPU): 26 GB
RAM (CPU): 46 GB
Size: 26 GB

ollama pull nous-hermes2-mixtral:8x7b

Ollama → HuggingFace →

DBRX Instruct 132B

132B params · MoE (16 experts)

Databricks' MoE flagship. 16 experts, 4 active. Strong at structured data, SQL, Python, reasoning.

Context: 32K tokens
VRAM (GPU): 80 GB
RAM (CPU): 130 GB
Size: 75 GB

ollama pull dbrx:instruct

Ollama → HuggingFace →

MiniCPM-V 8B

8B params · Vision-Language

OpenBMB's vision model. Strong OCR, image understanding, bilingual (EN/ZH). Runs on edge devices.

Context: 4K tokens
VRAM (GPU): 6 GB
RAM (CPU): 8 GB
Size: 5.5 GB

ollama pull minicpm-v:8b

Ollama → HuggingFace →

Falcon 3 10B

10B params · Dense

TII's latest Falcon. 14T training tokens. Strong multilingual (EN, FR, ES, PT). Function calling, code.

Context: 32K tokens
VRAM (GPU): 8 GB
RAM (CPU): 12 GB
Size: 6.0 GB

ollama pull falcon3:10b

Ollama → HuggingFace →

Nomic Embed Text 137M

137M params · Embedding

Nomic AI's embedding model. #1 on MTEB among open models. 768-dim vectors. Perfect for RAG and search.

Context: 8K tokens
VRAM (GPU): 0.5 GB
RAM (CPU): 1 GB
Size: 0.3 GB

ollama pull nomic-embed-text

Ollama → HuggingFace →

mxbai-embed-large 335M

335M params · Embedding

Mixedbread AI's embedding. 1024-dim vectors, MTEB leader. Best open embedding for retrieval tasks.

Context: 512 tokens
VRAM (GPU): 1 GB
RAM (CPU): 2 GB
Size: 0.7 GB

ollama pull mxbai-embed-large

Ollama → HuggingFace →
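The two embedding models above are listed for retrieval and RAG, where documents are typically ranked by the cosine similarity between their vectors and the query vector. The sketch below shows that ranking step only. The 3-dimensional vectors and document names are toy placeholders; in practice each vector would come from the embedding model (768 or 1024 dimensions). The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (document name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
docs = {
    "gpu-setup": [0.9, 0.1, 0.0],
    "billing-faq": [0.0, 0.2, 0.9],
}
results = rank([1.0, 0.0, 0.0], docs)
print(results[0][0])  # gpu-setup
```

For larger collections, a vector index would normally replace this linear scan. The scan is kept here because it makes the scoring explicit.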
