⚠️ INTERNAL TESTING — This project is in active R&D. Not a public product. For rapid42 team use only.

R&D Project — Active experiments running

Local AI that
knows your code.

Cortex is our internal research project testing a simple hypothesis: a small model with deep codebase context can outperform frontier cloud models on domain-specific tasks — at zero marginal cost.

Benchmark running
169/175 tests passing
Qwen3.5-4B on RTX 4090
cortex — benchmark loop

$ python experiments/runner.py

━━━ Brief 08 (EventEmitter) ━━━

RAG context: 12,341 chars of patterns

Pass 1 done: 187 lines

Pass 2: reviewing...

Brief 08 (21/21) — 100%

→ Baseline: 169/175 (96.6%)

What we're building

A local coding agent
that learns your stack

Cortex indexes your entire codebase into a local vector database, then uses semantic search to inject the most relevant patterns into every generation — so the model produces code that fits your project, not a generic template.

No API calls. No data leaving the machine. No cost per query. Running on consumer hardware.
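As a minimal sketch of that retrieve-and-inject flow (toy bag-of-words scoring stands in for real embeddings; the snippet store and the `retrieve`/`build_prompt` names are illustrative, not Cortex's actual API):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; the real pipeline uses learned embeddings.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, snippets: list[str], k: int = 2) -> list[str]:
    # Rank indexed snippets by similarity to the task, keep the top k.
    q = embed(query)
    return sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]

def build_prompt(task: str, snippets: list[str]) -> str:
    # Inject the most relevant project patterns ahead of the task itself.
    context = "\n---\n".join(retrieve(task, snippets))
    return f"## Project patterns\n{context}\n\n## Task\n{task}"

snippets = [
    "class EventEmitter: def on(self, event, handler): ...",
    "def parse_config(path): return tomllib.load(open(path, 'rb'))",
    "async def fetch_json(url): ...",
]
prompt = build_prompt("implement an EventEmitter subclass", snippets)
```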

MoE-RAG pipeline

Multiple specialised retrieval experts (code patterns, errors, docs) feed context to a small local model.
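The routing idea in miniature (the keyword router, expert names, and stubbed retrievers below are illustrative stand-ins, not the real experts):

```python
# Each "expert" specialises in one knowledge source; stubs stand in for
# the real retrievers here.
EXPERTS = {
    "patterns": lambda q: [f"pattern hit for: {q}"],   # code-pattern store
    "errors":   lambda q: [f"error hit for: {q}"],     # past failures/tracebacks
    "docs":     lambda q: [f"docs hit for: {q}"],      # API documentation
}

# Keyword triggers decide which experts see the query.
ROUTES = {
    "traceback": ["errors", "patterns"],
    "implement": ["patterns", "docs"],
    "document":  ["docs"],
}

def route(query: str) -> list[str]:
    # First matching trigger wins; with no match, consult every expert.
    for keyword, experts in ROUTES.items():
        if keyword in query.lower():
            return experts
    return list(EXPERTS)

def gather_context(query: str) -> list[str]:
    # Merge the chunks each routed expert returns into one context payload.
    chunks: list[str] = []
    for name in route(query):
        chunks.extend(EXPERTS[name](query))
    return chunks
```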

🔬

Sealed benchmarks

10 real coding briefs with 175 acceptance tests the model never sees. Every claim is a number.
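The sealed-test idea in miniature: the model only ever sees the brief, and scoring happens against acceptance tests it was never shown. The `solution` entry-point convention and the toy brief below are assumed for illustration:

```python
def run_brief(generated_code: str, acceptance_tests: list[tuple]) -> tuple[int, int]:
    """Exec the generated solution, then score it against hidden tests."""
    namespace: dict = {}
    exec(generated_code, namespace)          # load the model's solution
    fn = namespace["solution"]               # brief specifies the entry point
    passed = sum(1 for args, expected in acceptance_tests if fn(*args) == expected)
    return passed, len(acceptance_tests)

# A toy brief: "solution(n) returns n squared."
generated = "def solution(n):\n    return n * n"
hidden_tests = [((2,), 4), ((3,), 9), ((0,), 0)]   # never shown to the model
passed, total = run_brief(generated, hidden_tests)
```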

🔒

Fully local

Qdrant + Ollama/MLX on-device. Zero telemetry. Works air-gapped. IP stays yours.

Benchmark results

Current experiment state

Running Qwen3.5-4B on an RTX 4090. Sealed acceptance tests: 10 briefs, 175 test cases.

Per-brief pass rate

Brief 01 · 12/12
Brief 02 · 16/16
Brief 03 · 14/14
Brief 04 · 11/11
Brief 05 · 12/12
Brief 06 · 22/25
Brief 07 · 14/14
Brief 08 · 21/21
Brief 09 · 24/24
Brief 10 · 23/26
TOTAL · 169/175 (96.6%)

vs. frontier models

Same sealed briefs, same acceptance tests

Cortex (4B + RAG) 96.6%

Qwen3.5-4B · RTX 4090 · ~0¢/query

Claude Sonnet 4.6 93%

Anthropic API · ~$0.03/query

No context (4B) 46%

Same model, no RAG — raw baseline

The hypothesis holds: a small model with deep domain context matches or beats a frontier model without it — at near-zero marginal cost.

Roadmap

What we're proving, phase by phase

Phase 1 ✅

Indexer + RAG

Qdrant vector index of codebase. nomic-embed-text embeddings. Retrieval working end-to-end.
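The chunk → embed → upsert → search flow, sketched in memory (a deterministic hashed-token vector stands in for nomic-embed-text, and a plain list stands in for Qdrant points; function names are illustrative):

```python
import hashlib
import re

DIM = 256

def embed(text: str) -> list[float]:
    # Toy embedding (NOT nomic-embed-text): hashed token counts.
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    return vec

def chunk(source: str, max_lines: int = 20) -> list[str]:
    # One vector per fixed-size window of lines.
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]

index: list[tuple[list[float], str]] = []   # (vector, payload), like a Qdrant point

def upsert(source: str) -> None:
    for piece in chunk(source):
        index.append((embed(piece), piece))

def search(query: str, k: int = 1) -> list[str]:
    # Dot-product ranking over the stored vectors.
    q = embed(query)
    score = lambda v: sum(x * y for x, y in zip(q, v))
    return [p for _, p in sorted(((score(v), p) for v, p in index), reverse=True)[:k]]

upsert("def add(a, b):\n    return a + b\n")
upsert("class EventEmitter:\n    def on(self, event, handler): ...\n")
```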

Phase 2 ✅

Pipeline + Quality Gate

Multi-pass generation, self-check loop, task router, re-retrieval on failure. 96.6% on sealed tests.

Phase 3 — In Progress

Episodic Memory

The agent remembers successful generations and gets better with every project it works on.
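One way that could look — a hypothetical in-memory episode store with keyword-overlap recall; Cortex's actual Phase 3 design may differ:

```python
import re

memory: list[tuple[set, str]] = []   # (task keywords, solution) episodes

def keywords(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))

def remember(task: str, solution: str) -> None:
    # Called only after a generation clears the quality gate.
    memory.append((keywords(task), solution))

def recall(task: str) -> list[str]:
    # Return past solutions whose tasks overlap the new one, best first.
    kw = keywords(task)
    scored = [(len(kw & past), sol) for past, sol in memory]
    return [sol for overlap, sol in sorted(scored, reverse=True) if overlap > 0]

remember("implement an EventEmitter with on/off", "class EventEmitter: ...")
remember("parse a TOML config file", "def parse_config(path): ...")
```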

Phase 4 — Planned

Product Packaging

Docker, config, README. Deployable by any engineering team. Open or closed source TBD.

Context

This is a rapid42 internal project

Cortex is not a product yet. It's a research bet: can we build AI infrastructure that makes our engineering team faster without depending on cloud providers or paying per-token? The benchmark numbers suggest yes.

🇩🇰 Built in Copenhagen · 🔒 Internal only · 🧪 Active R&D