
DeepSeek model families

DeepSeek TUI is model-agnostic at the wiring layer but shines when paired with DeepSeek V4 via API. This reference collects related families that people still deploy locally or fine-tune, along with typical parameter sizes, so you can locate them on Hugging Face or GGUF mirrors.

Mixture-of-experts in one sentence

Many flagship DeepSeek checkpoints use MoE: large total parameter counts with a smaller subset of parameters active per token. This matters for VRAM planning because the full set of weights must still fit in memory (or be offloaded), while only the active parameters drive per-token compute. Always read the model card for active vs. total counts before downloading quantized builds.
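
As a rough illustration (the bits-per-weight and overhead figures below are assumptions, not measured numbers), memory footprint scales with the total parameter count of a quantized build:

```python
def quantized_footprint_gb(total_params_b: float, bits_per_weight: float,
                           overhead_gb: float = 2.0) -> float:
    """Rough memory estimate for loading a quantized checkpoint.

    total_params_b  -- total parameters in billions (for MoE, use totals, not actives)
    bits_per_weight -- e.g. roughly 4.5 for a Q4_K_M-style GGUF (approximate)
    overhead_gb     -- KV cache / runtime overhead; highly workload-dependent
    """
    weights_gb = total_params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# Example: a 32B distilled checkpoint at ~4.5 bits per weight
print(f"{quantized_footprint_gb(32, 4.5):.1f} GB")  # ~20 GB
```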

Families & tables

DeepSeek V4

Current-generation models that DeepSeek positions for agentic coding and long-context work. DeepSeek TUI is tuned around V4 APIs and workflows; a minimal client sketch follows the table below.

| Model | Params | Context | License | Notes |
|---|---|---|---|---|
| deepseek-v4-flash | Mixture-of-experts (public docs) | 1M tokens | API terms | Cost-efficient default for tooling-heavy sessions. |
| deepseek-v4-pro | Mixture-of-experts (public docs) | 1M tokens | API terms | Stronger reasoning; higher price (discount periods may apply). |
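
A minimal client sketch, assuming the V4 model identifiers match the table above (confirm against the current API docs). DeepSeek's hosted API is OpenAI-compatible, so the standard `openai` client works:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder
    base_url="https://api.deepseek.com",   # OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # identifier taken from the table above
    messages=[{"role": "user", "content": "Summarize this repo's build steps."}],
)
print(resp.choices[0].message.content)
```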

DeepSeek R1

Reasoning-focused line associated with chain-of-thought-style outputs; often compared against other reasoning models on math and logic-heavy prompts.

| Model | Params | Context | License | Notes |
|---|---|---|---|---|
| DeepSeek-R1 | 671B MoE (reported) | 128K (typical API) | MIT / variants | Flagship reasoning model; smaller distilled checkpoints exist. |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | Varies | MIT | Ultra-small distilled line for experimentation. |
| DeepSeek-R1-Distill-Qwen-7B | 7B | Varies | MIT | Common mid-size distilled checkpoint. |
| DeepSeek-R1-Distill-Qwen-14B | 14B | Varies | MIT | Balances quality and hardware requirements. |
| DeepSeek-R1-Distill-Qwen-32B | 32B | Varies | MIT | Higher-quality distilled tier. |
| DeepSeek-R1-Distill-Llama-70B | 70B | Varies | MIT | Large distilled variant for strong local runs if you have the GPUs. |
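
A quick sketch for fetching one of the distilled checkpoints for local experimentation. The repo id is assumed to follow the `deepseek-ai/` naming used on Hugging Face; verify it on the model card before downloading (weights are several GB):

```python
from huggingface_hub import snapshot_download

# Repo id assumed; confirm on Hugging Face before pulling.
local_path = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    local_dir="./DeepSeek-R1-Distill-Qwen-7B",
)
print("Checkpoint downloaded to", local_path)
```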

DeepSeek V3

Prior flagship MoE generation; still widely referenced for Hugging Face downloads and GGUF conversions.

| Model | Params | Context | License | Notes |
|---|---|---|---|---|
| DeepSeek-V3 | 671B MoE (reported) | 128K (typical) | MIT | Superseded by V4 for the newest API features. |
| DeepSeek-V3-0324 | 671B MoE | 128K | MIT | Point release commonly mirrored on Hugging Face. |

DeepSeek Coder

Code-specialized models that were popular for completion and repository-level tasks before the unified V-series lines absorbed much of that traffic.

| Model | Params | Context | License | Notes |
|---|---|---|---|---|
| deepseek-coder-1.3b-base | 1.3B | 16K | DeepSeek license | Tiny baseline for constrained hardware. |
| deepseek-coder-6.7b-base | 6.7B | 16K | DeepSeek license | Useful small coder baseline. |
| deepseek-coder-33b-base | 33B | 16K | DeepSeek license | Strong local coder if VRAM allows. |
| deepseek-coder-v2-instruct | MoE (236B total, 21B active reported) | 128K | DeepSeek license | Instruct-tuned coding MoE. |
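
A minimal local completion sketch with the smallest coder baseline via `transformers`. The repo id is assumed to follow the `deepseek-ai/` naming; adjust dtype and device for your hardware:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-1.3b-base"  # assumed repo id; check the model card
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

inputs = tok("def fizzbuzz(n):", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```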

DeepSeek VL / VL2

Vision-language variants for image+text prompts; relevant when your workflow mixes screenshots, diagrams, or UI captures.

| Model | Params | Modality | License | Notes |
|---|---|---|---|---|
| DeepSeek-VL2 | MoE variants | Multimodal | DeepSeek license | Check Hugging Face cards for exact variant sizes (Tiny/Small/etc.). |
| DeepSeek-VL | 7B class | Multimodal | DeepSeek license | Earlier VL line; still referenced in quantization/GGUF repos. |

Running locally

GGUF, AWQ, and other quantized builds vary by maintainer. Start from the official model card, then follow the Local deployment page for routing Ollama, vLLM, or SGLang into DeepSeek TUI. A minimal client sketch follows.
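
Because Ollama and vLLM both expose OpenAI-compatible endpoints, any OpenAI-style client can be pointed at the local server instead of the hosted API. The model tag below is an example and should match whatever build you actually pulled; DeepSeek TUI's own routing config is documented on the Local deployment page.

```python
from openai import OpenAI

# Default local endpoints: vLLM at http://localhost:8000/v1,
# Ollama's OpenAI-compatible layer at http://localhost:11434/v1.
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is ignored locally

resp = local.chat.completions.create(
    model="deepseek-r1:7b",  # example tag; use the name of the build you pulled
    messages=[{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
)
print(resp.choices[0].message.content)
```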