# How to Build an Automated B2B Lead Enrichment Pipeline in n8n

B2B lead databases go stale quickly. The most accurate data source for a company is its own website. You can build an automated enrichment pipeline in n8n to extract fresh data from company sites and push clean JSON directly into your CRM.

This guide details how to set up an n8n workflow that triggers when a new lead is created, fetches the target company's website, extracts structured data, and updates the CRM record.

## The Core Extraction Concept

Extracting data from B2B websites used to require maintaining hundreds of custom CSS selectors. Every time a company redesigned its marketing site, the scraper broke.

Modern extraction relies on AI models to parse the DOM and return structured JSON based on a schema. You define the fields you want. The API handles the mapping. This approach survives layout changes and completely removes the need for DOM traversal code.

## Pipeline Architecture

An n8n pipeline orchestrates the data movement. It acts as the glue between your CRM and the extraction API. By design, the n8n instance should not perform the heavy lifting of browser rendering. It delegates the extraction task and waits for the response.

## API Integration Overview

Before diving into the visual n8n setup, look at the underlying API calls. You can manage this logic directly in code using our Python SDK if you prefer code over visual nodes.

Here is how you request specific data points from a target URL using Python. We define a precise schema dictating the exact keys and types we expect in the response.

```python title="enrichment.py" {10-14}
import json

import alterlab

client = alterlab.Client("YOUR_API_KEY")

# The schema dictates the exact keys and types returned by the API
response = client.scrape(
    url="https://example-company.com",
    extract_schema={
        "company_name": "string",
        "value_proposition": "string",
        "contact_email": "string",
    },
)

print(json.dumps(response.data, indent=2))
```

And the equivalent cURL command. This is exactly what the n8n HTTP Request node will execute under the hood.



```bash title="Terminal" {4-11}
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example-company.com",
    "extract_schema": {
      "company_name": "string",
      "value_proposition": "string",
      "contact_email": "string"
    }
  }'
```

## Step 1: Configuring the n8n Trigger

Start your n8n workflow with a Webhook node. Set the HTTP Method to POST.
Your CRM will send a payload to this webhook URL whenever a new lead is created. The payload must include the lead's company domain or website URL.

Example incoming payload from your CRM system:

```json title="Incoming Webhook Payload"
{
  "lead_id": "987654",
  "domain": "example-company.com",
  "timestamp": "2026-05-03T10:00:00Z"
}
```
Ensure the webhook responds immediately with a 200 OK to prevent CRM timeouts. Do not wait for the entire scraping pipeline to finish before responding to the CRM webhook. Use n8n's "Respond to Webhook" node early in the flow.
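
To verify the trigger end to end before wiring in the CRM, you can fire the sample payload at the webhook URL yourself. A minimal sketch using Python's requests library; the URL is a placeholder for your own n8n instance:

```python title="test_webhook.py"
import requests

# Placeholder: copy the test or production URL from your n8n Webhook node
WEBHOOK_URL = "https://your-n8n-instance.com/webhook/lead-enrichment"

payload = {
    "lead_id": "987654",
    "domain": "example-company.com",
    "timestamp": "2026-05-03T10:00:00Z",
}

resp = requests.post(WEBHOOK_URL, json=payload, timeout=10)

# With "Respond to Webhook" placed early, this should return a 200 quickly
print(resp.status_code, resp.text)
```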

## Step 2: The Extraction Node

Add an HTTP Request node to your n8n workflow. This node calls the scraping API to perform the actual data extraction. Configure the node with these settings:

- **Method**: POST
- **URL**: `https://api.alterlab.io/v1/scrape`
- **Authentication**: Header Auth (Set `X-API-Key` to your AlterLab API key)

In the Body Parameters, set up your extraction schema. Reference the domain from the previous webhook node dynamically. In n8n expression syntax, this looks like `{{ $json.body.domain }}`.



```json title="n8n Body Parameters"
{
  "url": "https://{{ $json.body.domain }}",
  "extract_schema": {
    "company_name": "string",
    "industry": "string",
    "value_proposition": "string",
    "features": ["string"],
    "target_audience": "string"
  }
}
```

The API will fetch the site, execute JavaScript if necessary, run the AI extraction, and return a clean JSON object containing the requested fields. For all available extraction parameters, read the API docs.
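
The exact response envelope can vary by API version, so treat this as illustrative: given the schema above and the SDK's `response.data` accessor, a successful call plausibly returns a shape like the following.

```json title="Example API Response (illustrative)"
{
  "data": {
    "company_name": "Example Co",
    "industry": "Logistics Software",
    "value_proposition": "Cut delivery costs with automated route planning",
    "features": ["Route optimization", "Fleet tracking"],
    "target_audience": "Mid-market logistics operators"
  }
}
```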

## Step 3: Handling Errors and Validation

Websites go offline. Domains expire. Companies block geographic regions. Scraping pipelines must account for missing data and failed requests.

Add a Switch node in n8n after the HTTP Request node to check the API response status code.

If the extraction fails or the target server returns a 404, route the workflow to a fallback path. This path should update the CRM lead with a flag indicating manual review is required.

If the extraction succeeds, proceed to data mapping. The extraction API guarantees the output will match your JSON schema, so you do not need complex null checks for the structure itself. The fields will exist, though they may contain empty strings if the requested data was simply not present on the target page.
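
If you prefer handling this step in code rather than a Switch node, the same branching fits in a short function. A minimal sketch against the raw HTTP endpoint, assuming the API surfaces failed extractions as non-200 status codes and nests results under a `data` key; check the API docs for the exact error contract:

```python title="validate_extraction.py"
import requests

API_URL = "https://api.alterlab.io/v1/scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY", "Content-Type": "application/json"}

def enrich(domain: str) -> dict | None:
    body = {
        "url": f"https://{domain}",
        "extract_schema": {"company_name": "string", "industry": "string"},
    }
    resp = requests.post(API_URL, headers=HEADERS, json=body, timeout=60)

    # Fallback path: flag the lead for manual review instead of failing the run
    if resp.status_code != 200:
        print(f"Extraction failed for {domain}: HTTP {resp.status_code}")
        return None

    # Assumption: extracted fields are nested under a "data" key
    data = resp.json().get("data", {})
    # Fields exist per the schema but may be empty strings if absent on the page
    if not data.get("company_name"):
        print(f"No company name found on {domain}; review manually")
    return data
```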

## Step 4: Updating the CRM

Add your CRM's specific n8n node (like HubSpot, Salesforce, or Pipedrive).
Map the extracted JSON fields directly to your custom CRM properties using n8n's visual mapper.

- Map `response.data.company_name` to the CRM Company Name field.
- Map `response.data.industry` to the CRM Industry category dropdown.
- Map `response.data.value_proposition` to a custom text field or note.

By automating this mapping, your sales team gets fully enriched lead context without performing manual research.
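
If your CRM has no dedicated n8n node, the same mapping works through a generic HTTP Request node or a short script. A sketch against a hypothetical CRM REST endpoint; the URL, auth header, and property names below are placeholders, not a real CRM API:

```python title="update_crm.py"
import requests

# Hypothetical endpoint and auth: substitute your CRM's real update API
CRM_URL = "https://crm.example.com/api/leads/{lead_id}"
CRM_HEADERS = {"Authorization": "Bearer YOUR_CRM_TOKEN"}

def update_lead(lead_id: str, extracted: dict) -> None:
    # Map extracted fields onto CRM properties, mirroring the n8n visual mapper
    properties = {
        "company_name": extracted.get("company_name", ""),
        "industry": extracted.get("industry", ""),
        "notes": extracted.get("value_proposition", ""),
    }
    resp = requests.patch(
        CRM_URL.format(lead_id=lead_id),
        headers=CRM_HEADERS,
        json=properties,
        timeout=30,
    )
    resp.raise_for_status()
```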

## Bypassing Bot Protections Reliably

Many B2B sites use aggressive bot protection to block automated traffic. If you attempt to scrape these sites with a standard HTTP library, you will receive 403 Forbidden errors or CAPTCHA challenges. These protection systems evaluate incoming requests based on TLS fingerprints, IP address reputation, and browser execution environments.

When targeting modern single-page applications or sites with strict security profiles, you need robust anti-bot handling. The API handles proxy rotation, header spoofing, and headless browser rendering automatically behind the scenes. You do not need to configure Playwright or Puppeteer inside your n8n instance.

By delegating the extraction logic to an external API, you keep your n8n instance lightweight. Headless browsers consume significant RAM. Running them directly in n8n can crash the container during concurrent executions.

## Expanding the Pipeline

Once the basic enrichment pipeline is active, you can expand its capabilities. You can add a branch to check the company's pricing page specifically. Modify the schema to ask for `pricing_tiers` or `has_enterprise_plan`.
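
For example, the pricing branch only needs a different URL and schema. A sketch using the same SDK call as earlier; whether `extract_schema` accepts a boolean type is an assumption to verify against the API docs:

```python title="pricing_branch.py"
import alterlab

client = alterlab.Client("YOUR_API_KEY")

# Target the pricing page directly with a schema focused on commercial signals
response = client.scrape(
    url="https://example-company.com/pricing",
    extract_schema={
        "pricing_tiers": ["string"],
        "has_enterprise_plan": "boolean",  # assumption: boolean schema type
    },
)

print(response.data)
```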

You can also use cron triggers in n8n to re-scrape your target accounts monthly. This allows you to track changes in a company's messaging or feature set over time. If a target account updates their features page, the pipeline can alert your sales team automatically.
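
The monthly re-scrape then reduces to diffing the fresh extraction against the stored copy. A minimal sketch; where you persist the previous result (a CRM field, a database, a file) is up to you:

```python title="detect_changes.py"
import json

def changed_fields(previous: dict, current: dict) -> dict:
    """Return the fields whose freshly extracted value differs from the stored one."""
    return {
        key: {"before": previous.get(key), "after": value}
        for key, value in current.items()
        if previous.get(key) != value
    }

# Illustrative values: last month's extraction vs. this month's
previous = {"features": ["Route optimization", "Fleet tracking"]}
current = {"features": ["Route optimization", "Fleet tracking", "Driver app"]}

diff = changed_fields(previous, current)
if diff:
    print("Messaging changed, alert sales:", json.dumps(diff, indent=2))
```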

## Takeaway

Automating B2B lead enrichment in n8n replaces manual research with clean, structured data collection. By combining a webhook trigger, an HTTP request to an AI extraction API, and CRM integration, you ensure sales teams have fresh context for every lead. Offloading the browser rendering and bot bypass to an API keeps the pipeline stable and your infrastructure costs low.