
Abe Wheeler
The sunpeak simulator tests cover a lot. They replicate the ChatGPT and Claude runtimes, run display mode transitions, test themes, and validate tool invocations without any paid accounts or AI credits. For most development work, they're enough.
But simulators don't catch everything. Real ChatGPT wraps your app in a nested iframe sandbox. The MCP protocol goes through ChatGPT's actual connection layer. Resource loading happens over a real network with production builds. There's a gap between "works in the simulator" and "works in ChatGPT," and the only way to close it is to test against the real thing.
sunpeak 0.16.23 adds live testing: automated Playwright tests that run against real ChatGPT. You write the same kind of assertions you write for simulator tests, and sunpeak handles authentication, MCP server refresh, host-specific message formatting, and iframe traversal.
TL;DR: Run `pnpm test:live` with a tunnel active. sunpeak imports your browser session, starts the dev server, refreshes the MCP connection, and runs your `tests/live/*.spec.ts` files in parallel against real ChatGPT. You write assertions against the app iframe. Everything else is automated.
A live test opens a real ChatGPT session in a browser, types a message that triggers your MCP tool, waits for ChatGPT to call it, and then asserts against the rendered app inside the host's iframe.
Here's a complete live test for an albums resource:
```typescript
import { test, expect } from 'sunpeak/test';

test('albums tool renders photo grid', async ({ live }) => {
  const app = await live.invoke('show-albums');

  await expect(app.getByText('Summer Slice')).toBeVisible({ timeout: 15_000 });
  await expect(app.locator('img').first()).toBeVisible();

  // Switch to dark mode without re-invoking the tool
  await live.setColorScheme('dark', app);
  await expect(app.getByText('Summer Slice')).toBeVisible();
});
```
`live.invoke('show-albums')` starts a new chat, sends `/{appName} show-albums` to ChatGPT, waits for the LLM response to finish streaming, waits for the app iframe to render, and returns a Playwright `FrameLocator` pointed at your app's content. From there, it's standard Playwright assertions.
The `{ timeout: 15_000 }` accounts for the LLM response time. ChatGPT needs to process your message, decide to call the tool, receive the result, and render the iframe. In practice this takes 5 to 10 seconds.
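Conceptually, the waiting that `invoke` does (for streaming to finish, for the iframe to appear) is a poll-until-ready loop with a deadline. A minimal sketch of that pattern, not sunpeak's actual code:

```typescript
// Hypothetical sketch of poll-until-ready waiting with a deadline
// (illustrative only; sunpeak's internals may differ).
async function waitUntil(
  condition: () => boolean,
  timeoutMs: number,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (condition()) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return condition(); // one final check at the deadline
}

// Example: a "streaming finished" flag that flips after 300 ms.
let streamingDone = false;
setTimeout(() => { streamingDone = true; }, 300);

waitUntil(() => streamingDone, 15_000, 50).then((ok) => {
  console.log(ok ? 'app ready' : 'timed out');
});
```

This is why generous timeouts matter: the loop only fails when the full deadline elapses, so a 15-second budget costs nothing when the app renders in 5.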
You need three things:
- A ChatGPT Plus account
- A tunnel tool such as ngrok
- Your app's MCP endpoint reachable through the tunnel (the `/mcp` path)

You do not need to install anything extra in your sunpeak project. Live test infrastructure ships with sunpeak starting at v0.16.23. New projects scaffolded with `sunpeak new` include example live test specs and the Playwright config.
Open two terminals:
```bash
# Terminal 1: Start a tunnel
ngrok http 8000

# Terminal 2: Run live tests
pnpm test:live
```
On first run, sunpeak imports your ChatGPT session from your browser. It checks Chrome, Arc, Brave, and Edge automatically. If no valid session is found, it opens a browser window and waits for you to log in. The session is saved to `tests/live/.auth/chatgpt.json` and reused for 24 hours.
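The 24-hour reuse amounts to a freshness check on the saved session file. A hypothetical sketch of that check (not sunpeak's actual code):

```typescript
// Hypothetical sketch of a 24-hour session freshness check (illustrative only).
const DAY_MS = 24 * 60 * 60 * 1000;

function isSessionFresh(savedAtMs: number, nowMs: number = Date.now()): boolean {
  return nowMs - savedAtMs < DAY_MS;
}

console.log(isSessionFresh(Date.now() - 1000));       // true: saved a second ago
console.log(isSessionFresh(Date.now() - 2 * DAY_MS)); // false: saved two days ago
```

When the check fails, sunpeak falls back to the interactive login flow described above.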
After authentication, sunpeak:
- Starts the dev server with `sunpeak dev --prod-resources` (production resource builds)
- Refreshes the MCP connection
- Runs your `tests/live/*.spec.ts` files fully in parallel, each in its own chat window

The MCP refresh happens once in `globalSetup`, before any test workers start. This means your test workers don't each individually refresh the connection, which would be slow and flaky.
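For readers who know Playwright, the once-per-run refresh corresponds to Playwright's `globalSetup` hook. A hypothetical hand-rolled equivalent of what sunpeak generates for you (file names here are illustrative):

```typescript
// tests/live/playwright.config.ts — hypothetical hand-rolled equivalent
// of the config sunpeak generates (illustrative only).
import { defineConfig } from '@playwright/test';

export default defineConfig({
  globalSetup: './global-setup.ts', // session import + MCP refresh, runs once
  fullyParallel: true,              // each spec file runs in its own worker/chat
  use: {
    headless: false,                // chatgpt.com blocks headless browsers
  },
});
```

Because `globalSetup` runs before any worker starts, every worker sees an already-refreshed MCP connection.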
All live tests import from sunpeak/test:
```typescript
import { test, expect } from 'sunpeak/test';
```
The test function provides a live fixture with:
| Method | What it does |
|---|---|
| `invoke(prompt)` | Starts a new chat, sends the prompt with host-specific formatting, waits for the app iframe, returns a `FrameLocator` |
| `sendMessage(text)` | Sends a message in the current chat with the `/{appName}` prefix |
| `sendRawMessage(text)` | Sends a message without any prefix |
| `startNewChat()` | Opens a fresh conversation |
| `waitForAppIframe()` | Waits for the MCP app iframe and returns a `FrameLocator` |
| `setColorScheme(scheme, appFrame?)` | Switches to `'light'` or `'dark'` via `page.emulateMedia()` |
| `page` | The raw Playwright `Page` object |
Most tests only need `invoke` and `setColorScheme`. The `invoke` method handles the full flow: new chat, message formatting (ChatGPT requires `/{appName}` before your prompt), waiting for streaming to finish, waiting for the nested iframe to render, and returning a locator into your app's content.
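The host-specific formatting amounts to prefixing the prompt. A hypothetical sketch of the difference between `sendMessage` and `sendRawMessage` (the function and the `albums` app name are illustrative, not sunpeak's API):

```typescript
// Hypothetical sketch of the /{appName} prefixing applied for ChatGPT
// (illustrative only; raw = true corresponds to sendRawMessage).
function formatChatGPTMessage(appName: string, prompt: string, raw = false): string {
  return raw ? prompt : `/${appName} ${prompt}`;
}

console.log(formatChatGPTMessage('albums', 'show-albums'));
// → "/albums show-albums"
console.log(formatChatGPTMessage('albums', 'hello', true));
// → "hello"
```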
Sending a second message to trigger a new tool call is slow and burns credits. `setColorScheme` avoids that by switching the browser's `prefers-color-scheme` via Playwright's `page.emulateMedia()`. ChatGPT propagates the change into the iframe, and your app re-renders with the new theme.
```typescript
test('ticket card text stays readable in dark mode', async ({ live }) => {
  const app = await live.invoke('show-ticket');

  const title = app.getByText('Search results not loading on mobile');
  await expect(title).toBeVisible({ timeout: 15_000 });

  // Verify status badge and assignee are visible in light mode
  await expect(app.getByText('in progress')).toBeVisible();
  await expect(app.getByText('Sarah Chen')).toBeVisible();

  // Switch to dark mode — common bugs: text blends into background,
  // borders disappear, badge colors lose contrast
  await live.setColorScheme('dark', app);

  // Same elements should still be visible with the new theme applied
  await expect(title).toBeVisible();
  await expect(app.getByText('in progress')).toBeVisible();
  await expect(app.getByText('Sarah Chen')).toBeVisible();

  // Badge background should still be distinguishable from the card
  const badge = app.locator('span:has-text("high")');
  const badgeBg = await badge.evaluate(
    (el) => window.getComputedStyle(el).backgroundColor
  );
  expect(badgeBg).not.toBe('rgba(0, 0, 0, 0)');
});
```
The second argument to `setColorScheme` tells it to wait for the app's `<html data-theme="dark">` attribute to confirm the theme propagated through the iframe boundary before your assertions run.
Here's a live test for a review card resource. It invokes the tool, checks the rendered content, verifies a button interaction triggers a state transition, and confirms the card re-themes correctly in dark mode:
```typescript
import { test, expect } from 'sunpeak/test';

test('review card renders and handles approval flow', async ({ live }) => {
  const app = await live.invoke('review-diff');

  // Verify the card rendered with the right content
  const title = app.locator('h1').first();
  await expect(title).toBeVisible({ timeout: 15_000 });
  await expect(title).toHaveText('Refactor Authentication Module');

  // Action buttons present
  const applyButton = app.getByRole('button', { name: 'Apply Changes' });
  await expect(applyButton).toBeVisible();

  // Theme switch: card should stay readable in dark mode
  await live.setColorScheme('dark', app);
  await expect(title).toBeVisible();
  await expect(applyButton).toBeVisible();

  // Click Apply Changes — UI transitions to accepted state
  await applyButton.click();
  await expect(applyButton).not.toBeVisible({ timeout: 5_000 });
  await expect(
    app.locator('text=Applying changes...').first()
  ).toBeVisible({ timeout: 5_000 });
});
```
This catches real issues that simulator tests can miss: the iframe sandbox blocking a script load, a theme change not propagating through the nested iframe boundary, or a button click failing because of host-specific event handling.
The live test config is a one-liner:
```typescript
// tests/live/playwright.config.ts
import { defineLiveConfig } from 'sunpeak/test/config';

export default defineLiveConfig();
```
This generates a full Playwright config with:
- `globalSetup` pointing to sunpeak's auth and MCP refresh flow
- `headless: false`, because chatgpt.com blocks headless browsers
- A dev server launched with `--prod-resources` on a dynamically allocated port

You can pass options to customize the environment:
```typescript
export default defineLiveConfig({
  colorScheme: 'dark',
  viewport: { width: 1440, height: 900 },
  locale: 'fr-FR',
  timezoneId: 'Europe/Paris',
  geolocation: { latitude: 48.8566, longitude: 2.3522 },
  permissions: ['geolocation'],
});
```
Live tests don't replace simulator tests. They complement them.
| | Simulator (`pnpm test:e2e`) | Live (`pnpm test:live`) |
|---|---|---|
| Runs against | Local simulator | Real ChatGPT |
| Speed | Seconds | 10–30 seconds per test |
| Cost | Free | Requires ChatGPT Plus |
| CI/CD | Yes | Not recommended (needs auth) |
| Catches | Component logic, display modes, themes, cross-host layout | Real MCP connection, LLM tool invocation, iframe sandbox, production resource loading |
Use simulator tests for development and CI/CD. Use live tests before shipping, after major changes, or when debugging issues that only reproduce in the real host.
A Claude Connector built with sunpeak now has three test tiers:
1. Unit (`pnpm test`): Vitest, jsdom, fast; tests component logic in isolation
2. Simulator (`pnpm test:e2e`): Playwright against the local ChatGPT and Claude simulators; tests display modes and themes; runs in CI/CD
3. Live (`pnpm test:live`): Playwright against real ChatGPT (with Claude coming soon); tests real MCP protocol behavior and iframe rendering

Each tier catches different classes of bugs. Unit tests catch logic errors. Simulator tests catch rendering and layout issues across hosts and display modes. Live tests catch protocol and sandbox issues that only show up in the real host environment.
All three are pre-configured when you run sunpeak new. You don't need to set up Vitest, Playwright, or any test infrastructure yourself.
The live test infrastructure is designed to support multiple hosts. The live fixture resolves the correct host page object based on the Playwright project name. All host-specific DOM interaction (selectors, login flow, settings navigation, iframe nesting) lives in per-host page objects that sunpeak maintains.
Your test code is host-agnostic:
```typescript
import { test, expect } from 'sunpeak/test';

test('my resource renders', async ({ live }) => {
  const app = await live.invoke('show me something');
  await expect(app.locator('h1')).toBeVisible();
});
```
This same test will run against any host that sunpeak supports. Today that's ChatGPT. When Claude live testing ships, add it with one line:
```typescript
// tests/live/playwright.config.ts
export default defineLiveConfig({ hosts: ['chatgpt', 'claude'] });
```
No changes to your test files.
If you have an existing sunpeak project, update to v0.16.23 or later:
```bash
pnpm add sunpeak@latest && sunpeak upgrade
```
Create `tests/live/playwright.config.ts`:
```typescript
import { defineLiveConfig } from 'sunpeak/test/config';

export default defineLiveConfig();
```
Add the test script to `package.json`:
```json
{
  "scripts": {
    "test:live": "playwright test --config tests/live/playwright.config.ts"
  }
}
```
Write your first live test in `tests/live/your-resource.spec.ts`:
```typescript
import { test, expect } from 'sunpeak/test';

test('my tool renders correctly in ChatGPT', async ({ live }) => {
  const app = await live.invoke('your prompt here');
  await expect(app.locator('your-selector')).toBeVisible({ timeout: 15_000 });
});
```
Start a tunnel, run `pnpm test:live`, and watch Playwright drive a real ChatGPT session.
New projects created with `sunpeak new` include all of this out of the box, with example live tests for every starter resource.