
DeepSeek model families

DeepSeek TUI is model-agnostic at the wiring layer but shines when paired with DeepSeek V4 via API. This reference collects related families that people still deploy locally or fine-tune, along with typical parameter sizes, so you can locate them on Hugging Face or GGUF mirrors.

Mixture-of-experts in one sentence

Many flagship DeepSeek checkpoints use MoE: large total parameter counts with a smaller subset of parameters active per token. This matters for VRAM planning because the full set of weights must still fit in memory (or be offloaded), while only the active parameters drive per-token compute. Always read the model card for active vs. total counts before downloading quantized builds.
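
As a rough illustration (the bits-per-weight and overhead figures below are assumptions, not measured numbers), memory footprint scales with the total parameter count of a quantized build:

```python
def quantized_footprint_gb(total_params_b: float, bits_per_weight: float,
                           overhead_gb: float = 2.0) -> float:
    """Rough memory estimate for loading a quantized checkpoint.

    total_params_b  -- total parameters in billions (for MoE, use totals, not actives)
    bits_per_weight -- e.g. roughly 4.5 for a Q4_K_M-style GGUF (approximate)
    overhead_gb     -- KV cache / runtime overhead; highly workload-dependent
    """
    weights_gb = total_params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# Example: a 32B distilled checkpoint at ~4.5 bits per weight
print(f"{quantized_footprint_gb(32, 4.5):.1f} GB")  # ~20 GB
```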

Families & tables

DeepSeek V4

Current-generation models that DeepSeek positions for agentic coding and long-context work. DeepSeek TUI is tuned around V4 APIs and workflows; a minimal client sketch follows the table below.

| Model | Params | Context | License | Notes |
|---|---|---|---|---|
| deepseek-v4-flash | Mixture-of-experts (public docs) | 1M tokens | API terms | Cost-efficient default for tooling-heavy sessions. |
| deepseek-v4-pro | Mixture-of-experts (public docs) | 1M tokens | API terms | Stronger reasoning; higher price (discount periods may apply). |
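
A minimal client sketch, assuming the V4 model identifiers match the table above (confirm against the current API docs). DeepSeek's hosted API is OpenAI-compatible, so the standard `openai` client works:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder
    base_url="https://api.deepseek.com",   # OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # identifier taken from the table above
    messages=[{"role": "user", "content": "Summarize this repo's build steps."}],
)
print(resp.choices[0].message.content)
```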

DeepSeek R1

Reasoning-focused line associated with chain-of-thought-style outputs; often compared against other reasoning models on math and logic-heavy prompts.

| Model | Params | Context | License | Notes |
|---|---|---|---|---|
| DeepSeek-R1 | 671B MoE (reported) | 128K (typical API) | MIT / variants | Flagship reasoning model; smaller distilled checkpoints exist. |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | Varies | MIT | Ultra-small distilled line for experimentation. |
| DeepSeek-R1-Distill-Qwen-7B | 7B | Varies | MIT | Common mid-size distilled checkpoint. |
| DeepSeek-R1-Distill-Qwen-14B | 14B | Varies | MIT | Balances quality and hardware requirements. |
| DeepSeek-R1-Distill-Qwen-32B | 32B | Varies | MIT | Higher-quality distilled tier. |
| DeepSeek-R1-Distill-Llama-70B | 70B | Varies | MIT | Large distilled variant for strong local runs if you have the GPUs. |
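
A quick sketch for fetching one of the distilled checkpoints for local experimentation. The repo id is assumed to follow the `deepseek-ai/` naming used on Hugging Face; verify it on the model card before downloading (weights are several GB):

```python
from huggingface_hub import snapshot_download

# Repo id assumed; confirm on Hugging Face before pulling.
local_path = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    local_dir="./DeepSeek-R1-Distill-Qwen-7B",
)
print("Checkpoint downloaded to", local_path)
```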

DeepSeek V3

Prior flagship MoE generation; still widely referenced for Hugging Face downloads and GGUF conversions.

| Model | Params | Context | License | Notes |
|---|---|---|---|---|
| DeepSeek-V3 | 671B MoE (reported) | 128K (typical) | MIT | Superseded by V4 for the newest API features. |
| DeepSeek-V3-0324 | 671B MoE | 128K | MIT | Point release commonly mirrored on Hugging Face. |

DeepSeek Coder

Code-specialized models that were popular for completion and repository-level tasks before the unified V-series lines absorbed much of that traffic.

| Model | Params | Context | License | Notes |
|---|---|---|---|---|
| deepseek-coder-1.3b-base | 1.3B | 16K | DeepSeek license | Tiny baseline for constrained hardware. |
| deepseek-coder-6.7b-base | 6.7B | 16K | DeepSeek license | Useful small coder baseline. |
| deepseek-coder-33b-base | 33B | 16K | DeepSeek license | Strong local coder if VRAM allows. |
| deepseek-coder-v2-instruct | MoE (236B total, 21B active reported) | 128K | DeepSeek license | Instruct-tuned coding MoE. |
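
A minimal local completion sketch with the smallest coder baseline via `transformers`. The repo id is assumed to follow the `deepseek-ai/` naming; adjust dtype and device for your hardware:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-1.3b-base"  # assumed repo id; check the model card
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

inputs = tok("def fizzbuzz(n):", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```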

DeepSeek VL / VL2

Vision-language variants for image+text prompts; relevant when your workflow mixes screenshots, diagrams, or UI captures.

| Model | Params | Modality | License | Notes |
|---|---|---|---|---|
| DeepSeek-VL2 | MoE variants | Multimodal | DeepSeek license | Check Hugging Face cards for exact variant sizes (Tiny/Small/etc.). |
| DeepSeek-VL | 7B class | Multimodal | DeepSeek license | Earlier VL line; still referenced in quantization/GGUF repos. |

Running locally

GGUF, AWQ, and other quantized builds vary by maintainer. Start from the official model card, then follow the Local deployment page for routing Ollama, vLLM, or SGLang into DeepSeek TUI. A minimal client sketch follows.
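
Because Ollama and vLLM both expose OpenAI-compatible endpoints, any OpenAI-style client can be pointed at the local server instead of the hosted API. The model tag below is an example and should match whatever build you actually pulled; DeepSeek TUI's own routing config is documented on the Local deployment page.

```python
from openai import OpenAI

# Default local endpoints: vLLM at http://localhost:8000/v1,
# Ollama's OpenAI-compatible layer at http://localhost:11434/v1.
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is ignored locally

resp = local.chat.completions.create(
    model="deepseek-r1:7b",  # example tag; use the name of the build you pulled
    messages=[{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
)
print(resp.choices[0].message.content)
```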