
Igor NosatovMySQL has always been... let's say "forgiving" with JSON. Too forgiving. Like a parent who lets their...
MySQL has always been... let's say "forgiving" with JSON. Too forgiving. Like a parent who lets their kid eat ice cream for breakfast because it's technically dairy.
Before MySQL 8.3:
-- This abomination? Totally fine in older versions
INSERT INTO products (metadata) VALUES ('{"price": 29.99, "price": 39.99}');
-- Duplicate keys? No problem! MySQL picks one. Which one? Β―\_(γ)_/Β―
After MySQL 8.3:
-- Now this rightfully explodes
INSERT INTO products (metadata) VALUES ('{"price": 29.99, "price": 39.99}');
-- ERROR 3141 (22032): Invalid JSON text in argument 1 to function json_extract:
-- "Duplicate key name 'price'."
Think of it like this: Old MySQL was autocorrect that changed "duck" to "duck" even when you meant "duck." New MySQL actually checks if you're making sense.
The Old Way (Chaos Reigns):
-- Multiple values for the same key? Russian roulette with data!
CREATE TABLE settings (
id INT PRIMARY KEY,
config JSON
);
INSERT INTO settings VALUES (1, '{"theme": "dark", "theme": "light"}');
-- Which theme wins? Depends on the phase of the moon.
The New Way (Sanity Returns):
-- MySQL 8.3+ enforces uniqueness
INSERT INTO settings VALUES (1, '{"theme": "dark", "theme": "light"}');
-- ERROR: Duplicate key 'theme' not allowed
-- Correct approach:
INSERT INTO settings VALUES (1, '{"theme": "dark", "previousTheme": "light"}');
Real-World Gotcha: We had a legacy API that merged user preferences from multiple sources without deduplication. It generated JSON like:
{
"notifications": true,
"language": "en",
"notifications": false,
"timezone": "UTC"
}
One notifications key from user settings, another from admin override. MySQL 8.2 silently picked one. MySQL 8.3 said "NOPE."
Migration Fix:
// Before sending to MySQL, deduplicate keys (last value wins)
function deduplicateKeys(obj) {
return JSON.parse(JSON.stringify(obj)); // Simplest way in Node.js
}
// Or be explicit about merge strategy
function mergeWithPriority(userPref, adminOverride) {
return { ...userPref, ...adminOverride }; // Admin wins
}
JSON can represent the same data in multiple ways. MySQL 8.3 enforces one canonical format.
Example: Number Formatting
-- These look the same to humans but not to MySQL
INSERT INTO analytics (data) VALUES ('{"value": 1.0}');
INSERT INTO analytics (data) VALUES ('{"value": 1}');
INSERT INTO analytics (data) VALUES ('{"value": 1.00}');
-- In MySQL 8.3+, internally stored as:
-- {"value": 1} -- No trailing zeros, minimal representation
Why This Matters:
-- Searching for exact JSON matches
SELECT * FROM analytics WHERE data = '{"value": 1.0}';
-- Before: Might or might not match depending on input format
-- After: Normalized internally, consistent matches
The Encoding Trap:
-- UTF-16 surrogates must be properly paired
INSERT INTO messages (content) VALUES ('{"emoji": "\uD83D\uDE00"}');
-- β
Valid UTF-16 surrogate pair for π
INSERT INTO messages (content) VALUES ('{"emoji": "\uD83D"}');
-- β Invalid: Lone high surrogate
-- ERROR: Invalid JSON text: "Invalid UTF-16 sequence"
ASCII Art Explanation:
Valid UTF-16 Surrogate Pair (Emoji):
βββββββββββ βββββββββββ
β \uD83D βββ \uDE00 β = π
β (High) β β (Low) β
βββββββββββ βββββββββββ
Invalid (Causes Error):
βββββββββββ
β \uD83D β β β Missing pair!
β (High) β
βββββββββββ
Old MySQL was like a lenient English teacher who accepted "definately" because they knew what you meant. New MySQL is a compiler.
Common Mistakes:
-- Unescaped control characters
INSERT INTO logs (entry) VALUES ('{"message": "Line1
Line2"}');
-- β Newlines must be escaped as \n
-- Correct version
INSERT INTO logs (entry) VALUES ('{"message": "Line1\\nLine2"}');
-- β
Properly escaped
-- Invalid escape sequences
INSERT INTO data (info) VALUES ('{"path": "C:\new\folder"}');
-- β \n is interpreted as newline, \f as form feed
-- Correct version (double backslashes)
INSERT INTO data (info) VALUES ('{"path": "C:\\\\new\\\\folder"}');
-- β
Escaped backslashes
Pro Tip: Use MySQL's JSON_QUOTE() function to auto-escape:
SET @unsafe_string = 'User said: "Hello\nWorld"';
INSERT INTO messages (content) VALUES (
JSON_OBJECT('text', @unsafe_string)
);
-- MySQL handles all escaping automatically
-- Audit existing JSON for duplicates
SELECT id, metadata
FROM products
WHERE JSON_LENGTH(JSON_KEYS(metadata)) !=
(SELECT COUNT(DISTINCT k.key)
FROM JSON_TABLE(
metadata,
'$.*' COLUMNS(key VARCHAR(255) PATH '$')
) AS k);
-- Find problematic encoding
SELECT id, content
FROM messages
WHERE content REGEXP '\\\\u[dD][89aAbB][0-9a-fA-F]{2}(?!\\\\u[dD][c-fC-F][0-9a-fA-F]{2})';
# Python script to clean JSON before MySQL 8.3 migration
import json
import mysql.connector
def deduplicate_json(json_str):
"""Remove duplicate keys (last value wins)"""
try:
obj = json.loads(json_str)
return json.dumps(obj) # Re-encoding removes duplicates
except json.JSONDecodeError:
return None
def validate_surrogates(json_str):
"""Check for invalid UTF-16 surrogates"""
import re
# Lone high surrogate pattern
if re.search(r'\\u[dD][89aAbB][0-9a-fA-F]{2}(?!\\u[dD][c-fC-F])', json_str):
return False
return True
# Apply to database
conn = mysql.connector.connect(...)
cursor = conn.cursor()
cursor.execute("SELECT id, metadata FROM products")
for (id, metadata) in cursor:
cleaned = deduplicate_json(metadata)
if cleaned and validate_surrogates(cleaned):
cursor.execute(
"UPDATE products SET metadata = %s WHERE id = %s",
(cleaned, id)
)
else:
print(f"β οΈ Row {id} needs manual review")
Never trust user input. Never trust API responses. Never trust past-you's code.
// Node.js example: Validate before INSERT
const mysql = require('mysql2/promise');
async function safeJsonInsert(connection, tableName, jsonData) {
// Validate JSON format
let parsed;
try {
parsed = JSON.parse(jsonData);
} catch (e) {
throw new Error(`Invalid JSON: ${e.message}`);
}
// Check for duplicate keys by round-tripping
const normalized = JSON.stringify(parsed);
if (normalized !== jsonData) {
console.warn('JSON was normalized (might have had duplicates)');
}
// Validate UTF-16 surrogates
const hasBadSurrogates = /\\u[dD][89aAbB][0-9a-fA-F]{2}(?!\\u[dD][c-fC-F])/.test(jsonData);
if (hasBadSurrogates) {
throw new Error('Invalid UTF-16 surrogate pair detected');
}
// Safe to insert
await connection.execute(
`INSERT INTO ${tableName} (data) VALUES (?)`,
[normalized]
);
}
Stricter rules = better optimization opportunities. Here's what we measured after upgrading:
Query Performance:
-- Finding exact JSON matches
EXPLAIN SELECT * FROM products WHERE metadata = '{"category": "electronics"}';
-- Before MySQL 8.3:
-- Full table scan + JSON parsing = 2.3s on 1M rows
-- After MySQL 8.3:
-- Normalized comparison + better indexing = 0.4s on 1M rows
Why? Normalized storage means:
Storage Savings:
-- Our 10M product database
-- Before: 4.2 GB (with duplicates and varied number formats)
-- After: 3.8 GB (normalized, deduplicated)
-- Savings: ~10% just from cleaner JSON
-- Totally legal, totally evil
INSERT INTO data (info) VALUES ('{"": "I am a key with no name"}');
-- β
Allowed by JSON spec and MySQL
-- Access it with JSON_EXTRACT
SELECT JSON_EXTRACT(info, '$."" ') FROM data;
My advice: Ban them in application validation. Future-you will thank present-you.
-- MySQL has no depth limit, but should you test it?
INSERT INTO crazy (data) VALUES (
'{"a":{"b":{"c":{"d":{"e":{"f":{"g":{"h":{"i":{"j":"deep"}}}}}}}}}}'
);
-- Performance degrades after ~20 levels
SELECT JSON_EXTRACT(data, '$.a.b.c.d.e.f.g.h.i.j') FROM crazy;
-- 100 rows: 0.02s
-- 100,000 rows: 12s (ouch)
Solution: Flatten when possible:
-- Instead of:
{"user": {"profile": {"settings": {"theme": "dark"}}}}
-- Consider:
{"user_profile_settings_theme": "dark"}
-- Or use separate columns for frequently queried paths
Use MySQL 8.3+ from day one. The stricter rules will save you from:
# Our upgrade checklist
1. β
Audit all JSON columns
2. β
Clean bad data in MySQL 8.2
3. β
Add app-level validation
4. β
Deploy validation to prod (still on 8.2)
5. β
Monitor for 2 weeks
6. β
Snapshot database
7. β
Upgrade staging to 8.3
8. β
Run full test suite
9. β
Upgrade prod during maintenance window
10. β
Keep snapshot for 30 days (just in case)
Permissive systems feel easier... until they're not.
MySQL 8.3's strict JSON rules aren't punishmentβthey're guard rails on a cliff you didn't know you were driving near.
MySQL 8.3+ enforces three critical JSON rules:
{"a": 1, "a": 2} is now an errorAction Items: