HTML to PDF in C#: why it needs a browser engine, and how to get it right using CobaltPDF.


By James Hanson


"Can we generate a PDF of this page?" is one of those tickets that looks like an afternoon and turns into a week. I've shipped this a few times now, and the hard part is almost never the PDF; it's faithfully reproducing a modern web page. Let's unpack why, look at the options, and then walk through a concrete implementation.

The deceptively hard problem

Say you need a PDF of an invoice page, a dashboard, or a full article from a CMS. What does "convert this to PDF" actually require?

  • Real CSS layout: Flexbox, Grid, @media print, custom fonts, the works. Your PDF has to look like what a browser paints, not an approximation.
  • JavaScript: a huge amount of the web renders content client-side. If your tool can't run JS, half the page is missing.
  • Web fonts: @font-face, Google Fonts, icon fonts. Get this wrong and text reflows or falls back to Times New Roman.
  • Cookies and consent walls: many sites won't show real content until a consent cookie is set; otherwise they redirect or overlay a banner.
  • Client-side redirects: example.com quietly becomes www.example.com, or a localized domain, via JavaScript after the page loads.
  • Lazy-loaded images: images below the fold don't load until you scroll. Capture too early and they're blank.
  • Knowing when the page is "done": network idle? A specific element present? A fixed delay? Capture at the wrong moment and you get a half-rendered page.
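The last item is the one people underestimate. "Network idle" is usually defined as a quiet window with no requests in flight; Playwright, for example, documents its networkidle state as no network connections for at least 500 ms. Here is a minimal sketch of that heuristic, assuming the renderer reports request start and finish events with timestamps. NetworkIdleTracker is a hypothetical name for illustration, not part of any library.

```csharp
using System;

// Sketch of a "network idle" heuristic. The class name and the quiet-window
// value are illustrative assumptions, not any library's implementation.
public sealed class NetworkIdleTracker
{
    private readonly TimeSpan _quietWindow;
    private int _inFlight;
    private DateTimeOffset _lastActivity;

    public NetworkIdleTracker(TimeSpan quietWindow, DateTimeOffset start)
    {
        _quietWindow = quietWindow;
        _lastActivity = start;
    }

    // Called when the page starts a request (document, script, image, XHR...).
    public void RequestStarted(DateTimeOffset at)
    {
        _inFlight++;
        _lastActivity = at;
    }

    // Called when a request finishes or fails.
    public void RequestFinished(DateTimeOffset at)
    {
        if (_inFlight > 0) _inFlight--;
        _lastActivity = at;
    }

    // Idle means nothing is in flight AND no activity for the whole quiet window.
    public bool IsIdle(DateTimeOffset now) =>
        _inFlight == 0 && now - _lastActivity >= _quietWindow;
}
```

Pages with analytics beacons, long polling, or WebSockets may never go quiet, which is why tools pair a heuristic like this with a timeout or a wait-for-element strategy.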

None of this is exotic. It's just… the web. And it's why a surprising number of "HTML to PDF" solutions fall over on the first real-world URL you throw at them.

Why you (usually) need a browser engine

There are broadly two families of HTML-to-PDF in .NET:

1. Markup-to-PDF libraries that implement their own (partial) HTML/CSS.
These parse a subset of HTML/CSS and draw it onto a PDF canvas. They're fast, dependency-light, and great for fixed templates you fully control, such as a receipt, a label, or a simple statement. But they don't run JavaScript and only support a slice of CSS, so they break on anything resembling the real web.

2. Real browser engines (Chromium / WebKit).
These run the actual web platform (the same layout, font, and JS engines a browser uses) and then print to PDF. If you're rendering arbitrary URLs, CMS output, dashboards, or anything with modern CSS/JS, this is what you need. The cost is that you're now shipping and operating a browser engine.

This is exactly the history of wkhtmltopdf: it was popular for years precisely because it used a real engine (an old WebKit fork) instead of a CSS subset. The catch today is that its engine is frozen circa 2016, with no modern CSS and no current JS, and the project is archived and unmaintained. It still works for simple, static templates, but it's a poor fit for the modern web.

So the practical question isn't "browser or no browser"; for real pages, it's a browser. The question is which engine, and how to run it well (warm, pooled, and on Linux without a fragile 2 GB image).

The landscape (the honest version)

Before the worked example, here's how I think about the options:

  • Drive a browser yourself (Playwright for .NET, PuppeteerSharp). Cost: free, open source. Full control, but you own pooling, cookies, redirects, lazy-load, and the Linux story.
  • Commercial libraries (IronPDF, Syncfusion, Aspose). Cost: paid. Mature and easy to start; engine and Linux support vary, so check yours.
  • Legacy engine (wkhtmltopdf, or wrappers such as DinkToPdf). Cost: free, archived. Abandoned and archived since early 2023, no security patches, stuck on a 2016-era engine.
  • Hosted API (DocRaptor, PDFShift). Cost: pay per call. No renderer in your stack, but your HTML and data leave your infrastructure, which is a third-party security and privacy concern.
  • Managed browser engine (CobaltPDF / CobaltPDF.WebKit). Cost: free forever with a watermark; a license removes the watermark. Chromium or WebKit, warm pool, cookies and lazy-load handled for you.

There's no single "best"; it depends on whether you're rendering real web pages, how much you want to operate, and your budget. For the rest of this post I'll use a browser-engine library, because that's where most people land for real pages.

A worked example with CobaltPDF

CobaltPDF is a .NET library that wraps a managed, warm browser pool and handles the "hard parts" above. The same ideas apply if you roll your own with Playwright; you'll just be writing the plumbing yourself.

Install

dotnet add package CobaltPDF

The first PDF

using CobaltPdf;

await new CobaltEngine()
    .WithPaperFormat("A4")
    .RenderHtmlAsPdfAsync("<h1>Hello, PDF</h1><p>Rendered by a real engine.</p>")
    .SaveAsAsync("hello.pdf");

var pdf = await new CobaltEngine().RenderUrlAsPdfAsync("https://example.com");
await File.WriteAllBytesAsync("example.pdf", pdf.BinaryData);

Everything here runs in free evaluation mode, so you can follow along without a license; add CobaltEngine.SetLicense("...") for production.

Handling the actual web page

This is where the earlier list of hard parts shows up. Capturing a real URL that needs a consent cookie, follows a client-side redirect, and lazy-loads images, while waiting for the page to settle, looks like this:

var pdf = await new CobaltEngine()
    .WithViewportSize(1280)                          // render at a desktop width
    .AddCookie("consent", "accepted", ".example.com") // get past the consent wall
    .WithWaitStrategy("networkIdle")                  // wait until the page settles
    .WithLazyLoadPages(5)                             // scroll to trigger lazy images
    .RenderUrlAsPdfAsync("https://example.com/article");

Each option maps to one of the problems from the top: viewport → layout, cookie → consent, wait strategy → "is it done", lazy-load scroll → below-the-fold images. (Cookies are committed before navigation and re-asserted after client-side redirects, so a bbc.com → bbc.co.uk-style hop still works.)
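Why re-assert at all? Under the cookie rules in RFC 6265, a cookie scoped to .bbc.com is never sent to bbc.co.uk, so a consent cookie set before navigation silently disappears after a cross-domain redirect. The sketch below shows the idea, assuming you know the host the page finally landed on. CookieSpec and CookieRedirectHelper are hypothetical, the domain matching is simplified (no public-suffix or IP-address handling), and this illustrates the concept rather than CobaltPDF's internals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cookie shape for this sketch (not a CobaltPDF type).
public sealed record CookieSpec(string Name, string Value, string Domain);

public static class CookieRedirectHelper
{
    // Simplified RFC 6265 domain matching: a leading dot on the Domain
    // attribute is ignored; the host must equal the domain or be a subdomain.
    public static bool DomainMatches(string host, string cookieDomain)
    {
        var domain = cookieDomain.TrimStart('.').ToLowerInvariant();
        var h = host.ToLowerInvariant();
        return h == domain || h.EndsWith("." + domain, StringComparison.Ordinal);
    }

    // After a redirect lands on finalHost, return the cookies the browser would
    // no longer send there, re-targeted at finalHost so they can be set again.
    public static IReadOnlyList<CookieSpec> CookiesToReassert(
        IEnumerable<CookieSpec> cookies, string finalHost) =>
        cookies
            .Where(c => !DomainMatches(finalHost, c.Domain))
            .Select(c => c with { Domain = finalHost })
            .ToList();
}
```

Note the suffix check uses "." + domain, so notexample.com does not match example.com.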

Document options you'll actually use

var pdf = await new CobaltEngine()
    .WithPaperFormat("A4")
    .WithMargins("15mm")
    .WithHeader("<div style='font-size:10px;text-align:center'>Acme Corp</div>")
    .WithFooter("<div style='font-size:10px;text-align:center'>Page <span class='pageNumber'></span></div>")
    .WithWatermark("<div style='font-size:80px;color:rgba(255,0,0,.15)'>DRAFT</div>")
    .WithEncryption(userPassword: "open-me", allowPrinting: true, allowCopying: false)
    .RenderHtmlAsPdfAsync(invoiceHtml);
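Margins like "15mm" end up as PDF points (1/72 inch), and it helps to know how much room they leave for content. Here is a small conversion sketch, assuming CSS-style units and the CSS reference pixel of 96 per inch. PageGeometry is a hypothetical helper, not part of CobaltPDF.

```csharp
using System;
using System.Globalization;

// Converts CSS-style lengths ("15mm", "1in", "0.5cm", "36pt", "96px") to PDF points.
// A sketch for reasoning about page geometry; not CobaltPDF's own parser.
public static class PageGeometry
{
    private const double PointsPerInch = 72.0; // PDF user-space unit
    private const double MmPerInch = 25.4;
    private const double CssPxPerInch = 96.0;  // CSS reference pixel

    public static double ToPoints(string length)
    {
        var s = length.Trim().ToLowerInvariant();
        if (s.Length < 3) throw new FormatException($"Bad length: '{length}'");
        // Every supported unit is two characters, so split number and unit there.
        var number = double.Parse(s[..^2], NumberStyles.Float, CultureInfo.InvariantCulture);
        return s[^2..] switch
        {
            "mm" => number / MmPerInch * PointsPerInch,
            "cm" => number * 10 / MmPerInch * PointsPerInch,
            "in" => number * PointsPerInch,
            "pt" => number,
            "px" => number / CssPxPerInch * PointsPerInch,
            _ => throw new FormatException($"Unknown unit in '{length}'"),
        };
    }

    // A4 is 210 x 297 mm; returns the content box left after equal margins on all sides.
    public static (double Width, double Height) A4ContentBox(string margin)
    {
        var m = ToPoints(margin);
        return (ToPoints("210mm") - 2 * m, ToPoints("297mm") - 2 * m);
    }
}
```

With 15 mm margins, an A4 page leaves a 180 × 267 mm content box, roughly 510 × 757 pt, which is the width your invoice layout actually has to fit.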

The performance gotcha everyone hits first

Launching a browser per request will fall over under load. Keep a warm pool: configure it once at startup and register the engine as a singleton.

// Configure the pool at startup, before anything (including pre-warm) uses it.
// Inside a singleton factory it would only run on first resolution.
CobaltEngine.Configure(o =>
{
    CloudEnvironment.ConfigureForDocker(o); // correct engine flags for containers
    o.MinSize = 1;   // keep one warm
    o.MaxSize = 4;   // concurrency ceiling
});

builder.Services.AddSingleton(_ => new CobaltEngine());

await CobaltEngine.PreWarmAsync(); // pay warm-up at boot, not on the first request

This is the part you'd otherwise build yourself with Playwright: pooling, recycling, and concurrency limits.
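For a sense of what that plumbing involves, here is a minimal generic pool sketch: a few instances are created up front, a maximum caps concurrency, and callers wait when everything is busy. WarmPool is a hypothetical type; in a Playwright setup T might be an IBrowser. It deliberately leaves out recycling (replacing instances that crash or have served many renders), which a production pool also needs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical warm pool: pre-warmed instances are reused, at most maxSize
// callers use the pool at once, and extra callers wait for a free slot.
public sealed class WarmPool<T> where T : class
{
    private readonly Func<Task<T>> _factory;
    private readonly ConcurrentBag<T> _idle = new();
    private readonly SemaphoreSlim _slots;

    public WarmPool(Func<Task<T>> factory, int maxSize)
    {
        _factory = factory;
        _slots = new SemaphoreSlim(maxSize, maxSize);
    }

    public int IdleCount => _idle.Count;

    // Pay creation cost at boot instead of on the first request.
    public async Task PreWarmAsync(int minSize)
    {
        for (int i = 0; i < minSize; i++)
            _idle.Add(await _factory());
    }

    public async Task<TResult> UseAsync<TResult>(
        Func<T, Task<TResult>> work, CancellationToken ct = default)
    {
        await _slots.WaitAsync(ct); // concurrency ceiling
        try
        {
            // Reuse a warm instance if one is idle; otherwise create one.
            var item = _idle.TryTake(out var warm) ? warm : await _factory();
            try { return await work(item); }
            finally { _idle.Add(item); } // hand it back warm for the next caller
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

Because an instance is returned to the bag before its slot is released, a waiting caller always finds a warm instance instead of creating a new one, so the pool never grows past its ceiling.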

Two standalone libraries: pick the engine that fits

Here's the key thing: CobaltPDF comes in two flavors, one powered by Chromium and one by WebKit. They are separate, standalone NuGet packages with the same API, so you simply pick the engine that fits your needs and install that one.

  • CobaltPDF (Chromium): choose this for maximum fidelity (the exact rendering Chrome produces, including bleeding-edge CSS/JS) and when you deploy on Windows or Linux. It's the faster of the two and the higher-fidelity engine.
  • CobaltPDF.WebKit (WebKit): choose this for Linux when you want a leaner idle footprint and a small, self-contained deploy (it provisions a WebKitGTK bundle on first run). It covers the modern web well; the trade-off is lower speed and fidelity than Chromium. On Windows, CobaltPDF.WebKit runs automatically inside WSL or Docker for development, but the WebKit edition is built to be deployed on Linux in production.

They expose the same CobaltEngine fluent API, so moving from one to the other is essentially a package swap plus a one-line using change. They are still independent libraries, and an app uses one of them.

// CobaltPDF.WebKit, identical code, different package/namespace
using CobaltPdf.WebKit;
var pdf = await new CobaltEngine().RenderHtmlAsPdfAsync(html);

How to choose, quickly:

  • You want exact Chrome fidelity, the fastest renders, or Windows support: pick CobaltPDF (Chromium).
  • You want a lean, self-contained Linux deploy with lower idle memory: pick CobaltPDF.WebKit.

Bonus: keep the engine out of your other services

If more than one service needs PDFs, you don't want a browser in each. There's a tiny shared package, CobaltPDF.Requests, that's just a serializable PdfRequest model. A client references only that (no engine), builds a request, and POSTs it to a render service:

using System.Net.Http.Json; // PostAsJsonAsync extension method
using CobaltPdf.Requests;

var request = PdfRequest.ForHtml("<h1>Statement</h1>")
    .WithPaperFormat("A4")
    .Build();

using var http = new HttpClient();
using var resp = await http.PostAsJsonAsync("https://pdf.internal/api/render", request);
resp.EnsureSuccessStatusCode(); // don't save an error response as a PDF
var bytes = await resp.Content.ReadAsByteArrayAsync();

The render service (an ASP.NET endpoint or Azure Function) turns it into a PDF with whichever engine it installed:

app.MapPost("/api/render", async (PdfRequest request, CancellationToken ct) =>
{
    var pdf = await request.ExecuteAsync(new CobaltEngine(), ct);
    return Results.File(pdf.BinaryData, "application/pdf", "render.pdf");
});
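The split works because the request is plain data that serializes to JSON, and ASP.NET Core minimal APIs bind JSON request bodies with System.Text.Json using web defaults (camelCase names). The sketch below uses a hypothetical RenderRequest record to show that round trip; its fields are illustrative and are not the real PdfRequest schema.

```csharp
using System;
using System.Text.Json;

// Hypothetical request model standing in for a shared "requests" package:
// plain data with no engine dependency, so any service can reference it.
public sealed record RenderRequest(
    string? Html,
    string? Url,
    string PaperFormat = "A4",
    string Margins = "10mm");

public static class RenderRequestJson
{
    // Web defaults: camelCase property names, case-insensitive reads.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static string Serialize(RenderRequest request) =>
        JsonSerializer.Serialize(request, Options);

    public static RenderRequest Deserialize(string json) =>
        JsonSerializer.Deserialize<RenderRequest>(json, Options)
        ?? throw new JsonException("Empty request body");
}
```

Because the client only needs this data shape, it never pulls a browser engine into its dependency graph; only the render service does.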

Running on Linux (easier than it sounds)

CobaltPDF handles the fiddly parts for you. The engine ships inside the NuGet package, the right binary is picked automatically, and the correct Linux launch flags are set for you. WebKit even provisions its own self-contained bundle on first run.

  • Managed cloud Linux (Azure App Service, Functions, Container Apps): just works, nothing to install.
  • Plain Docker image: add one apt-get line. The Docker guide has it.
  • Azure Functions: use a Premium or Dedicated plan so the warm pool stays alive.

Takeaways

  • For real web pages you need a browser engine; markup-to-PDF libraries only work for fixed templates you control.
  • The genuine difficulty is the page lifecycle: cookies, redirects, lazy-loading, fonts, and knowing when to capture. Budget for those, whichever tool you pick.
  • Pool your renderer and plan your Linux deployment early.
  • If you try CobaltPDF, remember it's two standalone libraries sharing one API: CobaltPDF (Chromium) for fidelity and Windows, CobaltPDF.WebKit for a lean Linux footprint. Pick one.

Whatever you choose, I hope the "why" here saves you the week it took me to learn it. Questions and war stories welcome in the comments.

Disclosure: I maintain CobaltPDF, and I use it for the worked example in this post. I've tried to give the alternatives a fair, honest treatment; the concepts in the first half apply to any browser-based approach, whichever library you choose.