Syncing Azure SQL Databases Across Subscriptions Using OpenShift CronJob (Without ADF)

#cronjobs #openshift #kubernetes #sql
Falolu Olaitan

Sometimes you need to move data between environments without introducing a heavy ETL tool like Azure Data Factory.

In my case, I needed to sync a table from:

  • A Production Azure SQL Database
  • To a Development Azure SQL Database
  • Across different subscriptions
  • Over Private Endpoints
  • Running inside Azure Red Hat OpenShift (ARO)

The goal:

Incrementally sync new rows every 2 minutes.
No ADF. No linked servers. No manual exports.

Here’s how I built a lightweight, production-ready sync using:

  • bcp
  • sqlcmd
  • Kubernetes CronJob
  • A watermark table

Architecture Overview

Inside the OpenShift cluster:

  • A CronJob runs every 2 minutes
  • Reads the last synced ID from Dev
  • Exports new rows from Prod
  • Imports into Dev
  • Updates the watermark

Networking:

  • Both SQL servers use Private Endpoints
  • The privatelink.database.windows.net private DNS zone is linked to the cluster's virtual network
  • No public database access
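
A quick way to confirm private resolution from inside the cluster is a throwaway pod that resolves the server's FQDN. This is just an illustrative check (prod-sql.database.windows.net is the placeholder server name used below, and the mssql-tools image is reused only because the job pulls it anyway); you should see a private IP, not a public one:

oc run dns-check --rm -it --restart=Never \
  --image=mcr.microsoft.com/mssql-tools -- \
  getent hosts prod-sql.database.windows.net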

Step 1: Create a Watermark Table in Dev

This keeps track of what has already been synced.

CREATE TABLE dbo.DataSyncWatermark (
  TableName sysname PRIMARY KEY,
  LastSyncedId bigint NOT NULL DEFAULT(0),
  UpdatedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);

INSERT INTO dbo.DataSyncWatermark(TableName, LastSyncedId)
VALUES ('dbo.ActivityLogs', 0);

Step 2: Example Table Schema (Generic)

To keep this reusable, here’s a sample table structure:

CREATE TABLE dbo.ActivityLogs (
  Id bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
  UserId nvarchar(100),
  ActionType nvarchar(200),
  Payload nvarchar(max),
  Response nvarchar(max),
  StatusCode int,
  CreatedAt datetime2,
  ExtraMetadata nvarchar(2048)
);

Both Prod and Dev must have identical structure.
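
If you want to verify that before wiring up the job, here's a minimal sketch that diffs the column metadata on both sides (it assumes sqlcmd is available locally and that you export the same connection variables used in the Secret below):

# Dump column metadata for dbo.ActivityLogs from both sides and diff them
QUERY="SET NOCOUNT ON;
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'ActivityLogs'
ORDER BY ORDINAL_POSITION;"

sqlcmd -S "$PROD_SERVER" -d "$PROD_DB" -U "$PROD_USER" -P "$PROD_PASS" -W -Q "$QUERY" > /tmp/prod_schema.txt
sqlcmd -S "$DEV_SERVER" -d "$DEV_DB" -U "$DEV_USER" -P "$DEV_PASS" -W -Q "$QUERY" > /tmp/dev_schema.txt

diff /tmp/prod_schema.txt /tmp/dev_schema.txt && echo "Schemas match"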

Step 3: Create Kubernetes Secret

Never hardcode credentials in YAML.

Create a secret like this:

oc create secret generic sql-sync-secret \
  --from-literal=PROD_SERVER=prod-sql.database.windows.net \
  --from-literal=PROD_DB=prod_database \
  --from-literal=PROD_USER=sync_user \
  --from-literal=PROD_PASS='StrongPassword!' \
  --from-literal=DEV_SERVER=dev-sql.database.windows.net \
  --from-literal=DEV_DB=dev_database \
  --from-literal=DEV_USER=sync_user \
  --from-literal=DEV_PASS='StrongPassword!'

Now the CronJob can safely consume these via environment variables.
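
If you want a quick sanity check on what landed in the Secret without printing the values, describe it (keys and sizes only); decoding a single key is handy when a connection detail is in doubt:

# List the keys and their sizes, not the values
oc describe secret sql-sync-secret

# Decode one value when debugging (prints plain text, so use sparingly)
oc get secret sql-sync-secret -o jsonpath='{.data.PROD_SERVER}' | base64 -d; echo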

Step 4: Required SQL Permissions

On Prod:

GRANT SELECT ON dbo.ActivityLogs TO sync_user;

On Dev:

GRANT SELECT, INSERT ON dbo.ActivityLogs TO sync_user;
GRANT SELECT, INSERT, UPDATE ON dbo.DataSyncWatermark TO sync_user;
GRANT ALTER ON dbo.ActivityLogs TO sync_user;

ALTER is required because we preserve identity values during import: keeping explicit identity values on a bulk load (bcp -E) needs ALTER permission on the target table.
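
For context, this is roughly the manual equivalent of what that permission unlocks; an illustrative snippet only (the sample row values are made up), not part of the sync job:

# Inserting an explicit Id requires SET IDENTITY_INSERT, which needs ALTER on the table
sqlcmd -S "$DEV_SERVER" -d "$DEV_DB" -U "$DEV_USER" -P "$DEV_PASS" -Q "
  SET IDENTITY_INSERT dbo.ActivityLogs ON;
  INSERT INTO dbo.ActivityLogs (Id, UserId, ActionType, StatusCode, CreatedAt)
  VALUES (42, N'user-1', N'login', 200, SYSUTCDATETIME());
  SET IDENTITY_INSERT dbo.ActivityLogs OFF;
"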

Step 5: The CronJob YAML

This version:

  • Runs every 2 minutes
  • Handles incremental sync
  • Avoids collation errors
  • Preserves identity
  • Avoids duplicate inserts
  • Uses the sqlcmd and bcp tools shipped in the mssql-tools image

apiVersion: batch/v1
kind: CronJob
metadata:
  name: sql-table-sync
spec:
  schedule: "*/2 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: runner
              image: mcr.microsoft.com/mssql-tools
              envFrom:
                - secretRef:
                    name: sql-sync-secret
              command: ["/bin/bash","-lc"]
              args:
                - |
                  set -e

                  SQLCMD="/opt/mssql-tools/bin/sqlcmd"
                  BCP="/opt/mssql-tools/bin/bcp"

                  TABLE="dbo.ActivityLogs"
                  WM="dbo.DataSyncWatermark"

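                  # Read the last synced Id recorded in the Dev watermark table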
                  LAST_ID=$($SQLCMD -S "$DEV_SERVER" -d "$DEV_DB" -U "$DEV_USER" -P "$DEV_PASS" \
                    -h -1 -W -Q "SET NOCOUNT ON; SELECT LastSyncedId FROM $WM WHERE TableName = '$TABLE';")

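                  # If Dev already has rows past the watermark, start from Dev's MAX(Id) instead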
                  DEV_MAX_ID=$($SQLCMD -S "$DEV_SERVER" -d "$DEV_DB" -U "$DEV_USER" -P "$DEV_PASS" \
                    -h -1 -W -Q "SET NOCOUNT ON; SELECT ISNULL(MAX(Id),0) FROM $TABLE;")

                  if [ "$LAST_ID" -lt "$DEV_MAX_ID" ]; then
                    LAST_ID="$DEV_MAX_ID"
                  fi

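                  # Count new rows in Prod beyond the watermark; nothing new means nothing to do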
                  NEW_COUNT=$($SQLCMD -S "$PROD_SERVER" -d "$PROD_DB" -U "$PROD_USER" -P "$PROD_PASS" \
                    -h -1 -W -Q "SET NOCOUNT ON; SELECT COUNT(1) FROM $TABLE WHERE Id > ${LAST_ID};")

                  if [ "$NEW_COUNT" = "0" ]; then
                    exit 0
                  fi

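                  # Export the new rows from Prod in native format (COLLATE DATABASE_DEFAULT avoids collation clashes)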
                  $BCP "
                  SELECT
                    Id,
                    UserId COLLATE DATABASE_DEFAULT,
                    ActionType COLLATE DATABASE_DEFAULT,
                    Payload COLLATE DATABASE_DEFAULT,
                    Response COLLATE DATABASE_DEFAULT,
                    StatusCode,
                    CreatedAt,
                    ExtraMetadata COLLATE DATABASE_DEFAULT
                  FROM $TABLE
                  WHERE Id > ${LAST_ID}
                  ORDER BY Id
                  " queryout /tmp/data.dat \
                    -S "$PROD_SERVER" -d "$PROD_DB" -U "$PROD_USER" -P "$PROD_PASS" -n

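                  # Import into Dev, keeping the original identity values (-E)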
                  $BCP $TABLE in /tmp/data.dat \
                    -S "$DEV_SERVER" -d "$DEV_DB" -U "$DEV_USER" -P "$DEV_PASS" \
                    -n -E -b 5000

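                  # Advance the watermark to Dev's new MAX(Id)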
                  $SQLCMD -S "$DEV_SERVER" -d "$DEV_DB" -U "$DEV_USER" -P "$DEV_PASS" -Q "
                    UPDATE $WM
                    SET LastSyncedId = (SELECT MAX(Id) FROM $TABLE),
                        UpdatedAt = SYSUTCDATETIME()
                    WHERE TableName = '$TABLE';
                  "

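Once this is applied, you don't have to wait for the schedule to see it work. Assuming you saved the manifest as sql-table-sync.yaml, you can trigger a one-off run straight from the CronJob and follow its logs:

oc apply -f sql-table-sync.yaml

# Kick off a manual run from the CronJob and watch its output
oc create job sql-table-sync-manual --from=cronjob/sql-table-sync
oc logs -f job/sql-table-sync-manual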

Common Issues I Faced

  1. Private DNS returning NXDOMAIN. Fix: Ensure both SQL servers have private endpoints and DNS zone groups attached.
  2. Collation errors. Fix: Add COLLATE DATABASE_DEFAULT to the string columns in the export query.
  3. Duplicate primary key errors. Fix: Align the watermark with MAX(Id) in Dev.
  4. “Table does not exist” errors. Fix: Grant the proper INSERT + ALTER permissions.

Why This Approach Works

  • No external ETL tool required
  • Lightweight
  • Kubernetes-native
  • Works across subscriptions
  • Fully automated
  • Secure over Private Link
  • Easy to generalize for other tables

When You Should Use Something Else

If you need:

  • Near real-time replication
  • Massive table sync
  • Transformations
  • CDC
  • Multi-region replication

Then consider:

  • Azure SQL replication
  • Azure Data Factory
  • Change Data Capture
  • Streaming architecture

Final Thoughts

Sometimes you don’t need a heavy data pipeline.

A well-designed incremental job, proper networking, and a watermark table can solve the problem cleanly.