I already wrote about why I built CodeClone and why I cared about baseline-aware code health.
Then I wrote about turning it into a read-only, budget-aware MCP server for AI agents.
This post is about what changed in 2.0.0b4.
The short version: if b3 made CodeClone usable through MCP, b4 made it feel like a product.
Not because I added more analysis magic or built a separate "AI mode." But because I pushed the same structural truth into the places where people and agents actually work — VS Code, Claude Desktop, Codex — and tightened the contract between all of them.
A lot of developer tools are strong on analysis and weak on workflow. A lot of AI-facing tools shine in a demo and fall apart in daily use.
For b4, I wanted a tighter shape. That is the release theme: not "more output," but better day-to-day workflows.
Clone detection tells you this logic is repeated. Complexity tells you this function is locally hard to reason about.
Overloaded Modules asks a different question: which modules are taking on too much responsibility?
The signals include module size pressure, dependency pressure, hub-like shape, and reimport-heavy structure. This points to code that often feels wrong before it is easy to classify. You know the file keeps attracting logic. Every change in it feels heavier than it should. But it is not a clone group or a single high-complexity function.
The important design choice: this layer is report-only for now. It shows up in JSON, HTML, Markdown, text, MCP, and the VS Code extension — but it does not affect health score, gates, baseline novelty, or SARIF.
I wanted the signal to be useful before letting it become consequential.
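To make the signal families above concrete, here is a minimal sketch of how per-module pressure signals could be computed. The names, thresholds, and data shapes are assumptions for illustration, not CodeClone's actual implementation; the point is that each signal is a simple, independent boolean that stays report-only.

```python
from dataclasses import dataclass

# Hypothetical sketch, not CodeClone's internals: illustrates the kind of
# per-module pressure signals described above, kept report-only (no gating).

@dataclass
class ModuleStats:
    name: str
    lines: int       # module size pressure
    imports: int     # dependency pressure
    importers: int   # hub-like shape: how many modules import this one
    reimports: int   # reimport-heavy structure: symbols re-exported from elsewhere

def overload_signals(m: ModuleStats,
                     max_lines: int = 800,
                     max_imports: int = 25,
                     max_importers: int = 15,
                     max_reimports: int = 10) -> dict[str, bool]:
    """Return which pressure signals fire for a module."""
    return {
        "size_pressure": m.lines > max_lines,
        "dependency_pressure": m.imports > max_imports,
        "hub_shape": m.importers > max_importers,
        "reimport_heavy": m.reimports > max_reimports,
    }

stats = ModuleStats("app/core.py", lines=1200, imports=31, importers=22, reimports=4)
print(overload_signals(stats))
```

Keeping each signal a plain boolean over an explicit threshold is what makes a report-only layer honest: a reader can see exactly why a module was flagged before the signal is ever allowed to become consequential.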
The preview VS Code extension is the first release where CodeClone feels properly usable inside an editor instead of only around one.
It is now live on the Visual Studio Marketplace.
The extension is not a generic linter panel; it is built around a review loop.
A lot of extensions get this wrong by dumping every result into the IDE and calling it integration. I wanted the opposite: a client that is baseline-aware, triage-first, source-first, trust-aware, and read-only.
b4 also tightened the surrounding UX in a handful of smaller ways, and the last of those mattered more than I expected.
I also added native client paths for Claude Desktop and Codex.
The goal was not "be available in more places." It was keeping one analysis contract across all of them:
Claude Desktop gets a local .mcpb bundle with pre-loaded review instructions.
Codex gets a native plugin with two focused skills — full review and quick hotspot discovery. Both sit on top of the same codeclone-mcp server.
That may sound boring, but boring is good here. The more clients you add, the easier it becomes to fork your own semantics without noticing. A lot of the b4 work was about resisting exactly that.
CodeClone defaults are intentionally conservative. That is the right first pass for CI, baseline-aware review, and agent-driven workflows.
But there is a real second need: sometimes the default pass looks clean, and you want to go hunting for smaller, more local repetition.
b4 makes that distinction explicit:
pyproject.toml thresholds.This now shows up clearly in MCP help topics and in the VS Code analysis profiles.
"More sensitive" is not the same as "more correct." A clean conservative pass
does not prove there is no finer-grained repetition. But a lower-threshold exploratory pass should not quietly pretend to have the same meaning as the default profile. That distinction needed to become product-level.
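As a sketch of what the two-profile distinction could look like in configuration, here is a hedged pyproject.toml fragment. The table layout and option names below are assumptions for illustration, not CodeClone's documented settings:

```toml
# Hypothetical sketch: real CodeClone option names may differ.
# Default profile: conservative thresholds for CI, baselines, and agents.
[tool.codeclone]
min-clone-lines = 12

# Exploratory profile: lower thresholds for hunting smaller, local repetition.
# Results under this profile are exploratory, not equivalent to the default pass.
[tool.codeclone.profiles.exploratory]
min-clone-lines = 6
```

Naming the exploratory pass as its own profile, rather than letting users silently lower the defaults, is what keeps the two result sets from being confused with each other.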
Two things happened on the MCP side that are easy to miss but matter a lot in practice.
First: the help tool. In b3, agents had 20 analysis and query tools but no way to ask "what should I do next?" or "what does this baseline state mean?" without burning tokens on trial and error.
b4 adds a help(topic=...) tool with bounded, static answers for common uncertainty points: workflow sequencing, analysis profile semantics, baseline interpretation, suppression rules, review state, and changed-scope review. An agent can ask one cheap question instead of making three exploratory tool calls to figure out the right next step.
This is a small surface — seven topics, short answers, no dynamic analysis. But it changes the economics of agent workflows significantly. The difference between "the agent guesses and retries" and "the agent asks and proceeds" is often 3–5x in token cost.
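The shape of such a help tool can be sketched in a few lines. The topic names and answer wording below are illustrative assumptions, not CodeClone's actual surface; what matters is that answers are static and bounded, so every call has a small, predictable token cost.

```python
# Hypothetical sketch of a bounded help(topic=...) tool: static answers,
# no dynamic analysis. Topic names and wording are illustrative only.

HELP_TOPICS: dict[str, str] = {
    "workflow": "Analyze first, then query hotspots, then review findings.",
    "profiles": "The default profile is conservative; exploratory lowers thresholds.",
    "baseline": "Baseline-novel findings are new since the recorded baseline.",
    "suppressions": "Suppressed findings stay in the report but are excluded from gates.",
    "review-state": "Findings carry a review state that persists across runs.",
    "changed-scope": "Changed-scope review limits findings to files touched in a diff.",
    "sequencing": "Prefer one help call over several exploratory tool calls.",
}

def help_tool(topic: str) -> str:
    """Return a short static answer, or list valid topics on a miss."""
    if topic in HELP_TOPICS:
        return HELP_TOPICS[topic]
    return "Unknown topic. Valid topics: " + ", ".join(sorted(HELP_TOPICS))

print(help_tool("profiles"))
```

Even the failure path is cheap: a miss returns the valid topic list instead of forcing the agent into another round of guessing.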
Second: tighter token budgets across the board. b4 continued the budget-aware work from b3:
- The derived section in MCP payloads is projected down to what agents actually need.
- metrics_detail is paginated with family and path filters, so agents never pull the full metrics table by accident.

None of this changes the canonical report; the JSON is still the complete truth. But the MCP view over it is now meaningfully leaner.
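The filtered, paginated view over the metrics table can be sketched as follows. The function name echoes metrics_detail, but the parameter names and payload shape are assumptions for illustration, not CodeClone's actual API.

```python
# Hypothetical sketch of budget-aware pagination with family and path filters,
# in the spirit of the metrics_detail behavior described above.
from fnmatch import fnmatch

def metrics_detail(rows, family=None, path_glob=None, page=0, page_size=50):
    """Filter by metric family and path glob, then return one bounded page."""
    filtered = [
        r for r in rows
        if (family is None or r["family"] == family)
        and (path_glob is None or fnmatch(r["path"], path_glob))
    ]
    start = page * page_size
    return {
        "rows": filtered[start:start + page_size],
        "total": len(filtered),            # lets an agent decide whether to page on
        "has_more": start + page_size < len(filtered),
    }

rows = [
    {"family": "complexity", "path": "src/app/core.py", "value": 31},
    {"family": "complexity", "path": "src/app/util.py", "value": 4},
    {"family": "size", "path": "src/app/core.py", "value": 1200},
]
page = metrics_detail(rows, family="complexity", path_glob="src/app/*.py", page_size=1)
print(page["total"], page["has_more"])
```

Returning the total and a has_more flag alongside each page is the cheap part that keeps agents honest: they can see the full result size without ever being handed it.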
Some of my favorite changes in b4 are not flashy:
This is not the kind of work that looks impressive in a screenshot. But it is exactly the kind of work that makes an engineering tool feel trustworthy over weeks and months.
b4 feels like the point where the betas add up:

- b1: CodeClone became more than a clone detector.
- b3: it became a serious MCP server.
- b4: it started to feel coherent across the CLI, the report, MCP, and every client surface.
You can start in the editor. You can stay aligned with baseline-aware truth. You can inspect module-level pressure without turning it into fake gating. You can move between human and agent workflows without changing the underlying semantics.
That is much closer to what I wanted CodeClone to become.
```shell
uv tool install --pre codeclone          # core CLI (beta)
uv tool install --pre "codeclone[mcp]"   # + MCP server for agents and IDEs

codeclone .                              # analyze the current project
codeclone . --html --open-html-report    # open the interactive report
```
If you are building review workflows around IDEs, MCP clients, or AI-assisted refactoring, I would love feedback on one question:
What makes a structural analysis tool feel trustworthy once it leaves the CLI and starts living inside real developer workflows?