Using Apache Gravitino with Trino for Query Federation

#gravitino101 #trino #metadata #lakehouse

Author: Yu hui

Last Updated: 2026-02-11

Overview

In this tutorial, you will learn how to integrate Apache Gravitino with Trino to enable query federation across multiple data sources through a unified metadata layer. By the end of this guide, you'll be able to configure Trino to automatically load catalogs from Gravitino and run cross-catalog queries seamlessly.

What you'll accomplish:

  • Connect Trino to Gravitino to enable automatic loading of Gravitino-managed catalogs
  • Create catalogs from Trino SQL including Iceberg and MySQL examples
  • Execute federated queries that join data across heterogeneous sources
  • Validate catalog discovery and inspect catalogs using Trino SQL

Trino is a distributed SQL query engine designed for fast analytic queries against data of any size. In modern data architectures, organizations often need to query data across multiple heterogeneous systems (like MySQL, PostgreSQL, Iceberg, Hive) without moving or copying data. This is where query federation becomes essential.

Apache Gravitino simplifies this by acting as a unified metadata control plane. By using the Gravitino Trino Connector, you can access multiple data sources through a single catalog interface in Trino, with automatic catalog discovery and centralized metadata management.

Key benefits:

  • Unified catalog access: Query MySQL, Iceberg, Hive, and other sources through a single interface
  • Automatic catalog discovery: Catalogs created in Gravitino are automatically available in Trino
  • Zero data movement: Join across heterogeneous systems without copying data
  • Centralized management: Create and update catalogs in one place, reflected everywhere

Architecture overview:

Gravitino Trino Query Federation Architecture

Prerequisites

Before starting this tutorial, you will need:

System Requirements:

  • Linux or macOS operating system with outbound internet access for downloads
  • JDK 17 or higher installed and properly configured

Required Components:

  • Gravitino server installed and running (see 02-setup-guide/README.md)
  • Trino server (coordinator + workers, or single-node for testing)
  • Trino version 435, or a version compatible with your Gravitino Trino connector release

Optional Components:

  • MySQL or PostgreSQL for JDBC federation examples
  • Hive Metastore for Iceberg catalog backend
  • Object storage (S3/GCS/Azure) for cloud-based table storage

Before proceeding, verify your Java installation:

${JAVA_HOME}/bin/java -version

Important: Ensure your Gravitino server is configured to use simple authentication mode. The Gravitino Trino connector currently connects as an anonymous user and does not propagate user authentication.
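
For reference, simple mode is the Gravitino server's default; a minimal sketch of the relevant setting in ${GRAVITINO_HOME}/conf/gravitino.conf (assuming the default configuration layout of the Gravitino tarball) looks like this:

# conf/gravitino.conf -- authentication mode expected by the Trino connector
gravitino.authenticators = simple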

Setup

How the Integration Works

The Gravitino Trino Connector enables Trino to dynamically load catalogs from Gravitino:

  • The connector is configured as a Trino catalog named gravitino via etc/catalog/gravitino.properties
  • You create additional catalogs (like iceberg_test and mysql_test) through Gravitino stored procedures or REST APIs
  • Trino automatically syncs Gravitino-managed catalogs every 10 seconds (configurable via gravitino.metadata.refresh-interval-seconds; see the example after this list)
  • You query federated data using standard catalog.schema.table naming
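
For example, to slow the catalog sync down to 30 seconds, you could add the refresh property to the etc/catalog/gravitino.properties file described below (a sketch; the 30-second value is arbitrary):

# etc/catalog/gravitino.properties -- optional override of the default 10-second sync
gravitino.metadata.refresh-interval-seconds=30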

Values Used in This Tutorial

Replace these values with your environment settings:

  • Gravitino URI: http://gravitino-server:8090
  • Metalake: trino_metalake
  • Iceberg HMS URI: thrift://hive-host:9083
  • Iceberg warehouse: hdfs://namenode:9000/user/iceberg/warehouse
  • MySQL JDBC URL: jdbc:mysql://mysql-host:3306?useSSL=false
  • MySQL credentials: trino / ds123

Step 1: Install and Configure Gravitino Trino Connector

The Gravitino Trino Connector must be installed on all Trino nodes (coordinator and workers).

Install the Connector Plugin

1. Download the connector

Download the Gravitino Trino connector from the Apache Gravitino download page or build from source.

2. Install on all Trino nodes

Extract the connector and copy it to Trino's plugin directory:

# Extract the connector
tar -xzf gravitino-trino-connector-<version>.tar.gz

# Copy to Trino plugin directory (on coordinator and all workers)
cp -r gravitino-trino-connector-<version> ${TRINO_HOME}/plugin/gravitino

Gravitino Trino Connector Plugin Directory

Enable Dynamic Catalog Management

Configure Trino for dynamic catalogs

Edit ${TRINO_HOME}/etc/config.properties on the coordinator node:

catalog.management=dynamic

Configure the Gravitino Catalog

Create etc/catalog/gravitino.properties

On each Trino node, create the Gravitino catalog configuration file, pointing to your Gravitino server and metalake:

connector.name=gravitino
gravitino.uri=http://gravitino-server:8090
gravitino.metalake=trino_metalake

Note: The metalake specified in gravitino.metalake must already exist in Gravitino. If not, create it via the Web UI or REST API:

curl -X POST -H "Content-Type: application/json" \
  -d '{"name":"trino_metalake","comment":"Metalake for Trino federation","properties":{}}' \
  http://gravitino-server:8090/api/metalakes

Restart Trino

After creating the configuration file on each node, restart Trino to load the connector.
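
If you run Trino from the standard tarball distribution, the launcher script handles this on each node (a sketch; adapt the command if you manage Trino through systemd, Docker, or an orchestrator):

# restart the Trino server so it picks up the plugin and the new catalog file
${TRINO_HOME}/bin/launcher restart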

Verify Installation

Check the gravitino catalog is loaded

SHOW CATALOGS;

You should see gravitino in the catalog list, confirming successful installation.

Step 2: Create Catalogs from Trino SQL

Once the gravitino catalog is configured, you can create additional catalogs using stored procedures in gravitino.system.

Create an Iceberg Catalog

Example using Hive Metastore backend

CALL gravitino.system.create_catalog(
    'iceberg_test',
    'lakehouse-iceberg',
    MAP(
        ARRAY['uri', 'catalog-backend', 'warehouse'],
        ARRAY['thrift://hive-host:9083', 'hive', 'hdfs://namenode:9000/user/iceberg/warehouse']
    )
);

Note: For S3 or other cloud storage, you may need to pass additional properties using the trino.bypass. prefix for Trino-specific settings; see Apache Gravitino Trino connector - Iceberg catalog for details.
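
To illustrate the prefix pattern only (not a complete S3 configuration), the call below adds one pass-through property: trino.bypass.iceberg.table-statistics-enabled forwards Trino's own iceberg.table-statistics-enabled setting, and the catalog name iceberg_bypass_demo is just a placeholder. Consult the linked documentation for the keys your storage setup actually needs.

CALL gravitino.system.create_catalog(
    'iceberg_bypass_demo',
    'lakehouse-iceberg',
    MAP(
        ARRAY['uri', 'catalog-backend', 'warehouse', 'trino.bypass.iceberg.table-statistics-enabled'],
        ARRAY['thrift://hive-host:9083', 'hive', 'hdfs://namenode:9000/user/iceberg/warehouse', 'false']
    )
);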

Create a MySQL Catalog

Example JDBC catalog for MySQL

CALL gravitino.system.create_catalog(
    'mysql_test',
    'jdbc-mysql',
    MAP(
        ARRAY['jdbc-url', 'jdbc-user', 'jdbc-password', 'jdbc-driver'],
        ARRAY['jdbc:mysql://mysql-host:3306?useSSL=false', 'trino', 'ds123', 'com.mysql.cj.jdbc.Driver']
    )
);

Tip: To ignore "already exists" errors, use named arguments with ignore_exist => true.
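
Using the MySQL catalog above as an example, the named-argument form would look roughly like this (the parameter names mirror the positional arguments; confirm them against the connector documentation for your release):

CALL gravitino.system.create_catalog(
    catalog => 'mysql_test',
    provider => 'jdbc-mysql',
    properties => MAP(
        ARRAY['jdbc-url', 'jdbc-user', 'jdbc-password', 'jdbc-driver'],
        ARRAY['jdbc:mysql://mysql-host:3306?useSSL=false', 'trino', 'ds123', 'com.mysql.cj.jdbc.Driver']
    ),
    ignore_exist => true
);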

Verify Catalog Creation

Inspect Gravitino catalogs

SELECT * FROM gravitino.system.catalog;

Expected output:

   name       |     provider      | properties
--------------+-------------------+-------------------------------
 iceberg_test | lakehouse-iceberg | {...}
 mysql_test   | jdbc-mysql        | {...}

Step 3: Validate Catalog Discovery

After creating catalogs in Gravitino, verify they are visible in Trino.

Confirm Catalog Visibility

Check available catalogs and schemas

SHOW CATALOGS;
SHOW SCHEMAS FROM iceberg_test;
SHOW SCHEMAS FROM mysql_test;

Note: Trino syncs catalogs from Gravitino according to the configured refresh interval (10 seconds by default). If catalogs don't appear immediately, wait for the next refresh cycle and retry.

Step 4: Prepare Sample Data

Create sample schemas and tables to demonstrate federation capabilities.

Create MySQL Dimension Table

Set up a users dimension table

-- Create schema
CREATE SCHEMA mysql_test.demo;

-- Create users table
CREATE TABLE mysql_test.demo.users (
  user_id BIGINT,
  user_name VARCHAR
);

-- Insert sample data
INSERT INTO mysql_test.demo.users VALUES
  (1, 'alice'),
  (2, 'bob');

-- Verify data
SHOW TABLES FROM mysql_test.demo;
SELECT * FROM mysql_test.demo.users;

Create Iceberg Fact Table

Set up an events fact table

-- Create schema
CREATE SCHEMA iceberg_test.demo;

-- Create events table
CREATE TABLE iceberg_test.demo.events (
  user_id BIGINT,
  event_type VARCHAR,
  ts TIMESTAMP
);

-- Insert sample data
INSERT INTO iceberg_test.demo.events VALUES
  (1, 'click', TIMESTAMP '2024-01-01 10:00:00'),
  (2, 'view',  TIMESTAMP '2024-01-01 10:01:00');

-- Verify data
SHOW TABLES FROM iceberg_test.demo;
SELECT * FROM iceberg_test.demo.events;

Step 5: Execute Federated Queries

These examples demonstrate the core value of query federation: joining data across heterogeneous sources in a single query.

Pattern 1: Cross-Catalog JOIN

Join dimension and fact tables

SELECT
  e.user_id,
  u.user_name,
  e.event_type,
  e.ts
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
  ON e.user_id = u.user_id
ORDER BY e.ts;

Pattern 2: Aggregation Across Catalogs

Count events by user

SELECT
  u.user_name,
  COUNT(*) AS event_cnt
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
  ON e.user_id = u.user_id
WHERE e.ts >= TIMESTAMP '2024-01-01 00:00:00'
GROUP BY u.user_name
ORDER BY event_cnt DESC, u.user_name;

Pattern 3: Semi-Join Filter

Filter fact table by dimension membership

SELECT e.*
FROM iceberg_test.demo.events e
WHERE EXISTS (
  SELECT 1
  FROM mysql_test.demo.users u
  WHERE u.user_id = e.user_id
);

Pattern 4: LEFT JOIN with Unmatched Rows

Keep all events, even without matching users

SELECT
  e.user_id,
  COALESCE(u.user_name, 'unknown') AS user_name,
  e.event_type,
  e.ts
FROM iceberg_test.demo.events e
LEFT JOIN mysql_test.demo.users u
  ON e.user_id = u.user_id
ORDER BY e.ts;

Step 6: Understanding Federation Mechanics

In federated queries, understanding where work happens is crucial for optimization:

How queries execute:

  • Connector-level reads: Each connector (Iceberg, MySQL) reads from its respective source
  • Trino-level joins: Trino combines results from multiple sources in memory
  • Pushdown optimization: Some filters and predicates may be pushed to source systems

Query optimization tips:

  • Filter early: Apply partition/time filters on large tables to reduce data scanned
  • Align join keys: Use consistent data types across sources (e.g., BIGINT for IDs)
  • Small dimension pattern: Join large fact tables with small dimension tables for efficiency
  • Review query plans: Use EXPLAIN to understand execution strategy

Analyze query execution

EXPLAIN
SELECT
  u.user_name,
  COUNT(*) AS event_cnt
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
  ON e.user_id = u.user_id
GROUP BY u.user_name;

Step 7: Clean Up Resources

Remove sample data and catalogs

-- Drop tables
DROP TABLE mysql_test.demo.users;
DROP TABLE iceberg_test.demo.events;

-- Drop schemas
DROP SCHEMA mysql_test.demo;
DROP SCHEMA iceberg_test.demo;

-- Drop catalogs
CALL gravitino.system.drop_catalog('mysql_test');
CALL gravitino.system.drop_catalog('iceberg_test');

Troubleshooting

Common issues and their solutions:

Connector installation issues:

  • Catalog not found: Ensure the Gravitino connector plugin is installed on all Trino nodes and gravitino.properties exists in etc/catalog/
  • Dynamic catalog not working: Verify catalog.management=dynamic is set in etc/config.properties on the coordinator

Connection issues:

  • Cannot connect to Gravitino: Check that Gravitino server is running and gravitino.uri is correct
  • Metalake not found: Ensure the metalake specified in gravitino.metalake exists in Gravitino (a quick REST check covering both issues is sketched below)
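
From the Trino coordinator host, you can sanity-check both points at once by calling the same REST API used earlier in this tutorial (a sketch using this tutorial's example values):

# returns a JSON description of the metalake if the server is reachable
# and the metalake exists; a 404 points to a missing metalake
curl http://gravitino-server:8090/api/metalakes/trino_metalake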

Catalog sync issues:

  • Catalogs not appearing: Wait for the sync interval (default 10 seconds) or adjust gravitino.metadata.refresh-interval-seconds
  • Stale catalog information: Restart Trino or wait for the next sync cycle

Query execution issues:

  • Table not found: Verify the fully qualified table name format: catalog.schema.table
  • Permission denied (Gravitino metadata): Verify that the Trino identity (or the mapped Gravitino user, or the anonymous user if the connector is configured to run anonymously) has the required catalog/schema/table privileges in Gravitino for the objects being queried
  • Permission denied (underlying data source): If Gravitino privileges are correct but the error persists, check the credentials and permissions for the underlying data sources (for example MySQL user/password, Hive/HDFS/S3 ACLs) configured in the corresponding Gravitino catalog

Congratulations

You have successfully completed the Gravitino Trino query federation tutorial!

You now have a fully functional Trino environment with Gravitino integration, including:

  • A configured Gravitino Trino Connector for automatic catalog discovery
  • Multiple registered catalogs (Iceberg and MySQL) accessible from Trino
  • Working federated queries that join data across heterogeneous sources
  • Understanding of query optimization patterns and federation mechanics

Your Trino environment is now ready to leverage Gravitino for unified metadata management and cross-system query federation.

Further Reading

For more advanced configurations and detailed documentation, see the official Apache Gravitino documentation, in particular the Apache Gravitino Trino connector reference.
Apache Gravitino is rapidly evolving, and this article is written based on the latest version 1.1.0. If you encounter issues, please refer to the official documentation or submit issues on GitHub.