Install & run
About 927 wordsAbout 3 min
2026-03-23
1. Prerequisites
- Python 3.11+
- An Anthropic-compatible gateway URL + API key (the examples use
http://35.220.164.252:3888) - Optional: MySQL or PostgreSQL client libs — only if you point the
dbcapability at one of them.
Tips
DataMind v0.2 does not require the claude CLI or claude-agent-sdk. We talk to the gateway via the official anthropic Python SDK — some environments ship a vendor-rebranded claude binary that ignores ANTHROPIC_API_KEY, so we sidestep that entirely.
2. Install
git clone https://github.com/your-org/DataMind.git
cd DataMind
python -m venv .venv && source .venv/bin/activate
# Install the v0.2 package
pip install -e .
# Optional extras
pip install -e '.[mysql]' # pymysql + cryptography
pip install -e '.[huggingface]' # sentence-transformers for local embeddings
pip install -e '.[dev]' # pytest + pytest-asyncio3. Configure
cp .env.datamind.example .env.datamind
$EDITOR .env.datamindMinimum required:
DATAMIND__LLM__API_BASE=http://35.220.164.252:3888
DATAMIND__LLM__API_KEY=sk-your-key-here
DATAMIND__LLM__MODEL=claude-sonnet-4-6That's it. The same key drives embeddings too — the embedding provider auto-falls-back to LLM credentials if DATAMIND__EMBEDDING__* isn't set.
4. Verify connectivity
python -m datamind.scripts.hello_sdkExpected:
[hello_sdk] gateway = http://35.220.164.252:3888/
[hello_sdk] model = claude-sonnet-4-6
[hello_sdk] prompt = 'Reply with just the single word: pong'
[hello_sdk] --- stream ---
pong
[hello_sdk] usage: input=16 output=5 cache_read=0 cache_create=0
[hello_sdk] OK: gateway reachable, streaming works, model replied 'pong'.If this fails, the rest will too — fix the API key or base URL first.
5. Try every capability standalone
Each capability ships with a live smoke script. They use a throwaway profile (hello_<cap>_demo) so they don't touch real data.
python -m datamind.scripts.hello_kb # Chroma + hybrid retriever
python -m datamind.scripts.hello_db # SQLite + NL2SQL + safeguards
python -m datamind.scripts.hello_graph # NetworkX multi-hop (no LLM needed)
python -m datamind.scripts.hello_skills # SKILL.md semantic search
python -m datamind.scripts.hello_memory # short + long term + fact extraction
python -m datamind.scripts.hello_agent # the full agent — 4 real questions6. Run the full agent
# Interactive REPL
python -m datamind chat
# One-shot question
python -m datamind ask "How do I review a pull request?"
# Build / rebuild the KB vector index for the active profile
python -m datamind ingest
# Show config and registered tools
python -m datamind info7. HTTP server + browser UI
python -m uvicorn datamind.server:app --host 127.0.0.1 --port 8000Open http://127.0.0.1:8000 in your browser for a chat UI with streamed answers, collapsible tool-call cards, and a sidebar inspector (config, tools, graph stats, KB docs, memory viewer, one-click reindex).
Or talk to the API directly:
| Method & path | Purpose |
|---|---|
GET / | Browser UI (served from static/app.html) |
GET /api/health | Liveness + config snapshot |
GET /api/tools | Every registered tool's name, description, and JSON schema |
POST /api/ask | Non-streaming convenience |
POST /api/chat | Real SSE stream of text / tool_use / tool_result / done events |
POST /api/kb/reindex | Rebuild the KB |
GET /api/kb/documents | Docs under the active profile |
GET /api/memory/{namespace} | Peek at a memory namespace |
GET /api/graph/stats | Node / edge counts |
Quick check:
curl -s http://127.0.0.1:8000/api/health | jq
curl -s -X POST http://127.0.0.1:8000/api/ask \
-H 'Content-Type: application/json' \
-d '{"message":"Say 你好."}' | jqStream:
curl -N -X POST http://127.0.0.1:8000/api/chat \
-H 'Content-Type: application/json' \
-d '{"message":"Tell me the Status meeting time."}'7a. (Optional) Swap the agent loop to claude-agent-sdk
DataMind ships two interchangeable agent-loop implementations. Toggle with one env var:
| Backend | Path | Pick when |
|---|---|---|
native (default) | Pure Python, anthropic SDK → your gateway | Simplest deploy, fewest deps |
sdk | claude-agent-sdk → claude CLI → CCR → your gateway | You want Hooks / Subagents / Compaction / Plan mode |
The 23 DataMind tools, SSE event protocol, and frontend are identical on both — only the inner loop differs.
Why CCR
The SDK only speaks Anthropic's /v1/messages. If your upstream is OpenAI-format (/v1/chat/completions), drop claude-code-router (CCR) in the middle — a tiny Node process that translates both directions, ~20ms overhead per request.
Start CCR
# Needs node >= 18
export UPSTREAM_BASE=http://your-gateway.example.com/v1
export UPSTREAM_KEY=sk-...
export UPSTREAM_MODEL=claude-sonnet-4-6
bash scripts/start_ccr.sh
# → listens on http://127.0.0.1:13456Keep it running in its own terminal.
Switch DataMind to the SDK backend
# In .env.datamind or exported inline:
export DATAMIND__AGENT__BACKEND=sdk
export DATAMIND__AGENT__CCR_BASE_URL=http://127.0.0.1:13456
# Everything else unchanged:
python -m datamind chat
python -m uvicorn datamind.server:app --port 8000Server startup logs show the backend:
INFO agent_loop_backend backend=sdk ccr=http://127.0.0.1:13456Switch back to native any time with DATAMIND__AGENT__BACKEND=native (or unset — it's the default).
8. Run the test suite
pytest datamind/tests/
# 95 passed in ~0.6s — no network requiredLegacy v0.1 still works
If you want to compare, the old main.py / server.py / modules/ layout remains untouched. It still reads the original .env keys (LLM_API_BASE, LLM_API_KEY, etc.).
Compatible gateways & models
Any Anthropic-compatible /v1/messages service works. Confirmed against http://35.220.164.252:3888 with:
| Model | Good for |
|---|---|
claude-opus-4-7 | Complex reasoning subagents |
claude-sonnet-4-6 | Main agent (default) |
claude-haiku-4-5-20251001 | Memory fact extraction, cheap subtasks |
Troubleshooting
| Symptom | Fix |
|---|---|
ValidationError: llm.api_key: Field required | Set DATAMIND__LLM__API_KEY in env or .env.datamind. |
401 Invalid token | Key / gateway mismatch; verify with curl $BASE/v1/messages. |
Unknown embedding provider 'openai' | Run from the repo root so datamind is importable. |
OperationalError: no such table: employees | You hit db_query_nl before seeding; run hello_agent.py or seed manually. |
Agent not ready from HTTP server | Lifespan still warming up; wait a few seconds, retry /api/health. |
