ByteDance DeerFlow Complete Guide: Install, Configure DeepSeek, Run Research, and the Privacy Question You're Probably Wondering About
DeerFlow hit 45k GitHub stars last month, landing the #1 spot on Trending. It's been making the rounds on Twitter and developer communities — but complete, practical setup guides are still surprisingly sparse.
This one fills that gap. We'll cover what DeerFlow actually does differently from ChatGPT, how to get it running, how to configure a cost-effective DeepSeek setup, how it stacks up against Perplexity and OpenAI Deep Research, and the question everyone's quietly wondering about: is it safe to use something built by ByteDance?
TL;DR
- DeerFlow isn't a smarter chatbot — it's a self-hosted framework that lets AI actually execute research tasks, with a Docker sandbox, real filesystem access, and code execution
- Installation requires Docker + Python 3.12 + Node.js 22; start with make docker-start, then open localhost:2026
- Best budget option: DeepSeek v3 API — significantly cheaper than GPT-4o with comparable quality
- ByteDance privacy concerns are real: using Ollama local models keeps your data entirely on your machine
- Best suited for people who regularly run deep research tasks (competitive analysis, market reports) — for occasional lookups, Perplexity is simpler
What Is DeerFlow? The Fundamental Difference From ChatGPT
DeerFlow is not another chat interface.
When you ask ChatGPT "analyze the competitive strategy between Company A and Company B," you get coherent-sounding text assembled from training data. It hasn't actually checked Company A's latest filings or browsed Company B's pricing page to see what changed last week.
DeerFlow does something fundamentally different. It gives AI a dedicated computer: an isolated Docker sandbox with a real filesystem and bash terminal. Instead of just describing what should happen, it can actually execute the steps — browse the web, run Python scripts to analyze data, write results to files.
Twitter user @lxfater's description was surprisingly accurate: "ByteDance basically built openclaw + claude code + a sandbox."
Architecturally, DeerFlow is a SuperAgent orchestration framework. The main Orchestrator agent breaks your task into sub-tasks, dispatches them to specialized sub-agents running in parallel, and a Reporter synthesizes the final output. You give one instruction and wait for results.
That waiting is also its limitation, though. Multi-step agent systems naturally accumulate hallucination errors: if each step is, say, 95% reliable, three chained steps come out fully correct only about 86% of the time (0.95³ ≈ 0.857), and a small mistake in step one can compound by step three. DeerFlow has no built-in grounding or cross-verification mechanism, so you still need to review the output yourself. Think of it as a highly capable research assistant, not a source you can trust blindly.
The core question to ask yourself: does your research task require the AI to take actions (browsing, running analysis, organizing files)? If yes, DeerFlow is worth the setup time. If you just need fast answers, Perplexity is more practical.
DeerFlow 2.0 Core Features
DeerFlow 2.0 is a complete departure from v1. The official explanation: the community used it in ways that far exceeded what the original was built for, so they rewrote it entirely with zero shared code. The result has four core capabilities:
Docker Sandbox Environment: Each task runs in an isolated container. AI can install packages, run scripts, read and write files — without touching your main system. This is the fundamental line between DeerFlow and pure chat tools. Note: the Coder Agent inside the sandbox can execute arbitrary bash commands. If a prompt injection attack manipulates the AI into running malicious instructions, your main system is isolated, but data inside the sandbox is exposed. Don't put sensitive files in the sandbox.
Hierarchical Multi-Agent System: The main agent breaks tasks into sub-tasks, dispatched to sub-agents running in parallel. A competitive analysis might have three sub-agents simultaneously pulling data on different companies, which the Reporter then synthesizes into one coherent output.
Markdown Skills System: Workflows are defined in Markdown files, no code required. You can customize your research pipeline ("search → analyze → generate slides") and the system follows the defined steps.
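The source doesn't document the exact skill-file schema, so here is a hypothetical sketch of what a Markdown-defined pipeline could look like; the headings, step names, and file names below are illustrative assumptions, not DeerFlow's actual format:

```markdown
# Skill: Weekly Competitor Scan   <!-- hypothetical skill name -->

## Steps
1. Search for news about the target companies from the past 7 days
2. Run a Python script to tabulate pricing changes
3. Generate a Markdown report and an HTML slide deck

## Output
- report.md
- slides.html
```

The appeal of this design is that non-programmers can edit a research pipeline the same way they'd edit any document.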
Persistent Memory: DeerFlow remembers your preferences and context across sessions. If you ran a competitive analysis last week, this week's follow-up can build on those conclusions without re-establishing context.
Beyond text reports, DeerFlow can generate PPT slides, full web pages, and data dashboards. Primary output formats are Markdown and HTML, which you can copy directly into Notion or similar tools. That said, I haven't seen independent quality evaluations for the slide and web page generation — official demos look solid, but real-world results will vary.
Telegram, Slack, and Feishu integrations are also worth noting. You can send DeerFlow instructions directly from a Telegram group and receive the results there when the task completes in the background. For teams, that's noticeably more convenient than keeping a browser tab open.
One practical reality check: DeerFlow 2.0 launched in late February 2026 and is still iterating rapidly. The Python requirement has already moved from 3.11 to 3.12. Commands and configuration options will likely keep changing over the coming months. Pin to a specific release tag rather than tracking main.
Requirements and Installation (Mac / Windows)
Installation isn't particularly hard, but a few things will catch you the first time. Here are the requirements, then a walkthrough.
What You Need
- Python 3.12+ (3.11 and below won't work)
- Node.js 22+
- Docker Desktop (required — the sandbox runs on it)
- pnpm (Node package manager)
- uv 0.7.20+ (Python package manager)
Mac users with Homebrew can install most of these with brew install. Windows users should set up Docker Desktop and WSL2 first.
Installation Steps
```shell
# 1. Clone the repository
git clone https://github.com/bytedance/deer-flow.git
cd deer-flow

# 2. Generate config templates
make config

# 3. Edit .env to add your API key (covered in the next section)
#    Open .env in your editor of choice

# 4. Start with Docker (recommended)
make docker-start
```
Open your browser to http://localhost:2026 — if the interface loads, you're in.
Things to Know Before You Start
The most common failure point: skipping make config and jumping straight to start. This command generates the config.yaml and .env templates. Without running it, everything downstream breaks.
DeerFlow also uses four ports: 2026 (nginx unified entry), 8001 (gateway API), 2024 (LangGraph server), 3000 (frontend). If anything else on your machine is using these ports, startup will fail.
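You can confirm all four ports are free before starting. Here's a small bash sketch (it relies on bash's /dev/tcp feature, so plain sh won't run it); a successful connect means some process is already listening on that port:

```shell
#!/usr/bin/env bash
# Report which of DeerFlow's default ports are already occupied.
# A successful /dev/tcp connect means something is listening there.
check_ports() {
  local port
  for port in 2026 8001 2024 3000; do
    if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      echo "port $port: IN USE"
    else
      echo "port $port: free"
    fi
  done
}

check_ports
```

Anything reported as IN USE needs to be stopped (or DeerFlow's port config changed) before make docker-start will succeed.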
To verify your environment before starting:
```shell
make check
```
This validates that all dependencies are installed and accessible.
API Setup and Model Selection: DeepSeek, Gemini, or Ollama?
One of DeerFlow's best design decisions: it's completely model-agnostic. Any model with an OpenAI-compatible API works. You don't have to use GPT-4o.
Three Paths, Depending on Your Priorities
Path 1: DeepSeek API (recommended for most people)
Low cost, solid quality, simple setup. Add your key to .env:
```shell
DEEPSEEK_API_KEY=your-key-here
```
Then open config.yaml and set the model to DeepSeek v3 (the exact field names depend on your version — the template generated by make config includes comments explaining each option). DeepSeek API costs considerably less than GPT-4o, making it the practical starting point for budget-conscious users.
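As a rough illustration only — the real keys come from the commented template that make config generates, and they change between versions — a DeepSeek model entry could look something like this (field names here are assumptions; deepseek-chat is DeepSeek's v3 chat model id):

```yaml
# Hypothetical sketch; field names vary by DeerFlow version.
# Check the comments in your generated config.yaml for the real keys.
llm:
  provider: deepseek            # assumed key name
  model: deepseek-chat          # DeepSeek v3 chat model id
  api_key_env: DEEPSEEK_API_KEY # matches the variable set in .env
```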
Path 2: OpenAI API (if you already have a key)
Straightforward. Set in .env:
```shell
OPENAI_API_KEY=your-key-here
```
Most stable quality, highest cost. If you're already paying for API access for other projects, this is the path of least resistance.
Path 3: Ollama Local Models (zero cost + maximum privacy)
Your data never leaves your machine. Install Ollama, pull a model (Qwen or DeepSeek local recommended), then point DeerFlow's API endpoint to localhost:11434.
The tradeoff: you need a decent GPU (at least 8GB VRAM) and inference will be noticeably slower than cloud APIs. But for privacy-sensitive work, this is the only option that guarantees zero data transmission.
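Concretely, the Ollama path means running something like `ollama pull qwen2.5` and then pointing DeerFlow at Ollama's OpenAI-compatible endpoint (served under /v1 on port 11434). The variable names below are assumptions — confirm them against your generated .env template:

```shell
# .env additions for the Ollama path (variable names assumed; check your template)
OPENAI_BASE_URL=http://localhost:11434/v1   # Ollama's OpenAI-compatible endpoint
OPENAI_API_KEY=ollama                       # placeholder; Ollama ignores the key
```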
How to Choose
| Factor | DeepSeek API | OpenAI API | Ollama Local |
|---|---|---|---|
| Cost | Low | High | Zero (but needs GPU) |
| Quality | Close to GPT-4o | Most consistent | Depends on model and hardware |
| Privacy | Data sent to DeepSeek servers | Data sent to OpenAI | Fully local |
| Setup difficulty | Low | Low | Medium |
If you don't have specific privacy requirements, start with DeepSeek API. Once you've confirmed DeerFlow fits your workflow, revisit whether the Ollama investment makes sense.
DeerFlow vs OpenAI Deep Research vs Perplexity: Different Tools, Different Jobs
These three tools get compared constantly, but they're not really competing for the same use case.
Perplexity: Fastest, simplest. Ask a question, get a cited answer in seconds. Great for quick lookups and fact-checking. Zero setup, open and go.
OpenAI Deep Research: Requires a ChatGPT Plus subscription. Give it a research topic, get a high-quality deep report in a few minutes. No workflow customization — the output is the report.
DeerFlow: Open-source, self-hosted. According to LiveResearchBench evaluations, research quality is comparable to Deep Research (overall averages differ by less than 1 point). The additional value is execution capability and customization — define your own research pipeline, run Python analysis, deploy as a Telegram bot for team use. The tradeoff is installation and setup time.
| | Perplexity | OpenAI Deep Research | DeerFlow |
|---|---|---|---|
| Best for | Quick lookups, citations | One-off deep reports | Repeating complex tasks, custom pipelines |
| Cost | Pro $20/month | ChatGPT Plus $20/month | Model API costs (can be free) |
| Setup required | None | None | Medium-high |
| Customizable | No | No | Fully open |
| Can execute code | No | Limited | Full Docker sandbox |
My take: if you look things up a few times a week, use Perplexity. If you occasionally need a thorough report on something, Deep Research is fine. But if you're running similar research tasks every week — competitive tracking, market reports, technical documentation — DeerFlow's one-time setup pays off over time.
ByteDance Privacy Risks: Where Does Your Data Actually Go?
This is the section I wasn't going to skip. DeerFlow is a ByteDance product, and ByteDance is a Chinese company. That fact doesn't change because the project is open-source.
The Technical Layer (solvable)
The good news: DeerFlow itself doesn't collect your data. Where your research content actually goes depends entirely on your LLM backend:
- OpenAI API → data goes to OpenAI
- DeepSeek API → data goes to DeepSeek servers
- Ollama local models → data never leaves your machine
So technically, Ollama gives you zero data transmission. The official Chinese README explicitly warns: "deploy only in locally trusted environments."
The Legal Layer (not solvable technically)
ByteDance is subject to Chinese law. VentureBeat's enterprise analysis specifically noted that regulated industries — finance, healthcare, government — need compliance review before deploying DeerFlow.
More concerning is the precedent. ByteDance's Trae IDE was previously reported by TechRadar to have collected user data. DeerFlow is a different product, but the same company's trust track record factors into the judgment.
There's also this: no public independent security audit exists. 45k stars, many users — but I haven't found any third-party systematic review of the source code. That transparency gap is itself a risk signal.
How I'd Approach It
Three scenarios:
- Personal research, non-sensitive data: DeepSeek API is fine. You're already sending data to Google when you search.
- Business, non-sensitive: Workable, but don't expose DeerFlow's ports externally. Keep it on an internal network.
- Business, sensitive data (financial, customer, medical): Either use Ollama fully offline, or don't use DeerFlow. The legal risk isn't something technical measures can eliminate.
This isn't meant to scare you off — it's context for a real decision. Good tool, but direct your trust accordingly.
Installation Troubleshooting: 5 Most Common Errors
Based on GitHub Issues and community reports, the problems are almost always the same five:
1. API Key Not Configured
Symptom: Any task reports API key not configured after startup.
Fix: Confirm you ran make config first to generate the .env file, then added at least one API key. Forgetting to run config is the single most common cause.
2. Pydantic Validation Error / JSON Parse Failure
Symptom: Tasks fail mid-execution with ValidationError or JSON parse errors.
Fix: Usually the model isn't capable enough to reliably produce structured JSON. Upgrade to a stronger model (e.g., DeepSeek v2 → v3), or verify you're on the latest DeerFlow version.
3. Port Conflicts
Symptom: make docker-start fails with "port already in use" in the logs.
Fix:
```shell
# See what's using the port
lsof -i :2026

# Kill the process, or reconfigure DeerFlow's port settings
```
DeerFlow uses four ports by default: 2026, 8001, 2024, 3000. Any one being occupied will prevent startup.
4. Wrong Python Version
Symptom: Syntax errors or package incompatibilities during installation.
Fix:
```shell
# Check your Python version
python3 --version

# If not 3.12+, pin the version with uv
uv python pin 3.12
```
5. Docker Sandbox Image Not Pulled
Symptom: Sandbox environment fails to start when running tasks.
Fix:
```shell
make setup-sandbox
```
This pre-pulls the required Docker image. First run may take a few minutes depending on your connection speed.
Universal Diagnostic Commands
When you're not sure where the problem is:
```shell
make check                        # Verify all dependencies
curl localhost:2026/api/health    # Check if the API is responding
make docker-logs-gateway          # View gateway logs
```
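For scripting, the health check above can be wrapped in a retry loop that blocks until the stack is up after make docker-start. This is a sketch; the /api/health path is taken directly from the command above:

```shell
#!/usr/bin/env bash
# Poll the health endpoint until it responds, up to a maximum number of tries.
# Returns 0 once the API answers, 1 if it never comes up.
wait_for_health() {
  local tries=${1:-30}
  local i
  for ((i = 0; i < tries; i++)); do
    if curl -sf "http://localhost:2026/api/health" >/dev/null; then
      echo "API is up"
      return 0
    fi
    sleep 1
  done
  echo "API did not respond after $tries tries" >&2
  return 1
}
```

Usage would be something like `make docker-start && wait_for_health 60` before opening localhost:2026 in your browser.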
The Verdict: Useful Tool, But Not for Everyone
DeerFlow has real installation friction, real privacy considerations, and outputs that still need human review. But for people who regularly run deep research — weekly competitive tracking, market reports, systematic documentation review — it's the most complete open-source option currently available.
Use DeepSeek to keep costs low, Ollama if privacy is a priority. The setup investment is a few hours, the benefit compounds over time.
If you want to try it: confirm you have Docker and Python 3.12 → git clone → make config → add a DeepSeek API key → make docker-start → give it a research task that would normally take you 30 minutes manually. That result tells you whether it's worth continuing to invest in.
FAQ
Is DeerFlow free? How does the cost work?
The DeerFlow framework itself is completely free and open-source. Your actual cost depends on which LLM you use: Ollama local models are free (but require a GPU), DeepSeek API costs significantly less than OpenAI, and GPT-4o is the most expensive option. Prices vary with model rates, so check each provider's current pricing. The framework is free — you pay for the model.
Does DeerFlow support Chinese research tasks and output?
Yes, though quality depends on your model choice. GPT-4o and DeepSeek v3 both handle Chinese well. Specify 'please write in Traditional Chinese' in your prompt and you'll get Chinese output. For Ollama local models, Qwen series tends to produce the most stable Chinese results.