DatastratoAuthor: Yu hui Last Updated: 2026-02-11 Overview In this tutorial, you will learn how to...
Author: Yu hui
Last Updated: 2026-02-11
In this tutorial, you will learn how to integrate Apache Gravitino with Trino to enable query federation across multiple data sources through a unified metadata layer. By the end of this guide, you'll be able to configure Trino to automatically load catalogs from Gravitino and run cross-catalog queries seamlessly.
What you'll accomplish:
Trino is a distributed SQL query engine designed for fast analytic queries against data of any size. In modern data architectures, organizations often need to query data across multiple heterogeneous systems (like MySQL, PostgreSQL, Iceberg, Hive) without moving or copying data. This is where query federation becomes essential.
Apache Gravitino simplifies this by acting as a unified metadata control plane. By using the Gravitino Trino Connector, you can access multiple data sources through a single catalog interface in Trino, with automatic catalog discovery and centralized metadata management.
Key benefits:
Architecture overview:
Before starting this tutorial, you will need:
System Requirements:
Required Components:
02-setup-guide/README.md)Optional Components:
Before proceeding, verify your Java installation:
${JAVA_HOME}/bin/java -version
Important: Ensure your Gravitino server is configured to use
simpleauthentication mode. The Gravitino Trino connector currently connects as an anonymous user and does not propagate user authentication.
The Gravitino Trino Connector enables Trino to dynamically load catalogs from Gravitino:
gravitino via etc/catalog/gravitino.properties
iceberg_test and mysql_test) through Gravitino stored procedures or REST APIsgravitino.metadata.refresh-interval-seconds)catalog.schema.table namingReplace these values with your environment settings:
http://gravitino-server:8090
trino_metalake
thrift://hive-host:9083
hdfs://namenode:9000/user/iceberg/warehouse
jdbc:mysql://mysql-host:3306?useSSL=false
trino / ds123
The Gravitino Trino Connector must be installed on all Trino nodes (coordinator and workers).
1. Download the connector
Download the Gravitino Trino connector from the Apache Gravitino download page or build from source.
2. Install on all Trino nodes
Extract the connector and copy it to Trino's plugin directory:
# Extract the connector
tar -xzf gravitino-trino-connector-<version>.tar.gz
# Copy to Trino plugin directory (on coordinator and all workers)
cp -r gravitino-trino-connector-<version> ${TRINO_HOME}/plugin/gravitino
Configure Trino for dynamic catalogs
Edit ${TRINO_HOME}/etc/config.properties on the coordinator node:
catalog.management=dynamic
Create etc/catalog/gravitino.properties
On each Trino node, create the Gravitino catalog configuration file, pointing to your Gravitino server and metalake:
connector.name=gravitino
gravitino.uri=http://gravitino-server:8090
gravitino.metalake=trino_metalake
Note: The metalake specified in
gravitino.metalakemust already exist in Gravitino. If not, create it via the Web UI or REST API:curl -X POST -H "Content-Type: application/json" \ -d '{"name":"trino_metalake","comment":"Metalake for Trino federation","properties":{}}' \ http://gravitino-server:8090/api/metalakes
Restart Trino
After creating the configuration file on each node, restart Trino to load the connector.
Check the gravitino catalog is loaded
SHOW CATALOGS;
You should see gravitino in the catalog list, confirming successful installation.
Once the gravitino catalog is configured, you can create additional catalogs using stored procedures in gravitino.system.
Example using Hive Metastore backend
CALL gravitino.system.create_catalog(
'iceberg_test',
'lakehouse-iceberg',
MAP(
ARRAY['uri', 'catalog-backend', 'warehouse'],
ARRAY['thrift://hive-host:9083', 'hive', 'hdfs://namenode:9000/user/iceberg/warehouse']
)
);
Note: For S3 or other cloud storage, you may need to pass additional properties using the
trino.bypass.prefix for Trino-specific settings, read more in Apache Gravitino Trino connector - Iceberg catalog.
Example JDBC catalog for MySQL
CALL gravitino.system.create_catalog(
'mysql_test',
'jdbc-mysql',
MAP(
ARRAY['jdbc-url', 'jdbc-user', 'jdbc-password', 'jdbc-driver'],
ARRAY['jdbc:mysql://mysql-host:3306?useSSL=false', 'trino', 'ds123', 'com.mysql.cj.jdbc.Driver']
)
);
Tip: To ignore "already exists" errors, use named arguments with
ignore_exist => true.
Inspect Gravitino catalogs
SELECT * FROM gravitino.system.catalog;
Expected output:
name | provider | properties
--------------+-------------------+-------------------------------
iceberg_test | lakehouse-iceberg | {...}
mysql_test | jdbc-mysql | {...}
After creating catalogs in Gravitino, verify they are visible in Trino.
Check available catalogs and schemas
SHOW CATALOGS;
SHOW SCHEMAS FROM iceberg_test;
SHOW SCHEMAS FROM mysql_test;
Note: Trino syncs catalogs from Gravitino according to the configured refresh interval (10 seconds by default). If catalogs don't appear immediately, wait for the next refresh cycle and retry.
Create sample schemas and tables to demonstrate federation capabilities.
Set up a users dimension table
-- Create schema
CREATE SCHEMA mysql_test.demo;
-- Create users table
CREATE TABLE mysql_test.demo.users (
user_id BIGINT,
user_name VARCHAR
);
-- Insert sample data
INSERT INTO mysql_test.demo.users VALUES
(1, 'alice'),
(2, 'bob');
-- Verify data
SHOW TABLES FROM mysql_test.demo;
SELECT * FROM mysql_test.demo.users;
Set up an events fact table
-- Create schema
CREATE SCHEMA iceberg_test.demo;
-- Create events table
CREATE TABLE iceberg_test.demo.events (
user_id BIGINT,
event_type VARCHAR,
ts TIMESTAMP
);
-- Insert sample data
INSERT INTO iceberg_test.demo.events VALUES
(1, 'click', TIMESTAMP '2024-01-01 10:00:00'),
(2, 'view', TIMESTAMP '2024-01-01 10:01:00');
-- Verify data
SHOW TABLES FROM iceberg_test.demo;
SELECT * FROM iceberg_test.demo.events;
These examples demonstrate the core value of query federation: joining data across heterogeneous sources in a single query.
Join dimension and fact tables
SELECT
e.user_id,
u.user_name,
e.event_type,
e.ts
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
ORDER BY e.ts;
Count events by user
SELECT
u.user_name,
COUNT(*) AS event_cnt
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
WHERE e.ts >= TIMESTAMP '2024-01-01 00:00:00'
GROUP BY u.user_name
ORDER BY event_cnt DESC, u.user_name;
Filter fact table by dimension membership
SELECT e.*
FROM iceberg_test.demo.events e
WHERE EXISTS (
SELECT 1
FROM mysql_test.demo.users u
WHERE u.user_id = e.user_id
);
Keep all events, even without matching users
SELECT
e.user_id,
COALESCE(u.user_name, 'unknown') AS user_name,
e.event_type,
e.ts
FROM iceberg_test.demo.events e
LEFT JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
ORDER BY e.ts;
In federated queries, understanding where work happens is crucial for optimization:
How queries execute:
Query optimization tips:
BIGINT for IDs)EXPLAIN to understand execution strategyAnalyze query execution
EXPLAIN
SELECT
u.user_name,
COUNT(*) AS event_cnt
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
GROUP BY u.user_name;
Remove sample data and catalogs
-- Drop tables
DROP TABLE mysql_test.demo.users;
DROP TABLE iceberg_test.demo.events;
-- Drop schemas
DROP SCHEMA mysql_test.demo;
DROP SCHEMA iceberg_test.demo;
-- Drop catalogs
CALL gravitino.system.drop_catalog('mysql_test');
CALL gravitino.system.drop_catalog('iceberg_test');
Common issues and their solutions:
Connector installation issues:
gravitino.properties exists in etc/catalog/
catalog.management=dynamic is set in etc/config.properties on the coordinatorConnection issues:
gravitino.uri is correctgravitino.metalake exists in GravitinoCatalog sync issues:
gravitino.metadata.refresh-interval-seconds
Query execution issues:
catalog.schema.table
anonymous user if the connector is configured to run anonymously) has the required catalog/schema/table privileges in Gravitino for the objects being queriedYou have successfully completed the Gravitino Trino query federation tutorial!
You now have a fully functional Trino environment with Gravitino integration, including:
Your Trino environment is now ready to leverage Gravitino for unified metadata management and cross-system query federation.
For more advanced configurations and detailed documentation:
Apache Gravitino is rapidly evolving, and this article is written based on the latest version 1.1.0. If you encounter issues, please refer to the official documentation or submit issues on GitHub.