READ_ICEBERG
returns a table with data from the specified Iceberg table.
Topics:
- Syntax
- Parameters
- Return Type
- Best Practices
- Examples
- About Metadata Versions in File-Based Catalogs
- Limitations
Syntax
Parameters
Using a LOCATION object
Parameter | Description | Supported input types |
---|---|---|
LOCATION | The name of a location object that contains Iceberg parameters and credentials. When specified, individual credential parameters are not required. For a comprehensive guide, see LOCATION objects. | IDENTIFIER |
NAMESPACE | The namespace an Iceberg table resides in. Used only for REST catalogs. | TEXT |
TABLE | The name of the Iceberg table to read. Used only for REST catalogs. | TEXT |
MAX_STALENESS | Specifies a maximum staleness for results returned by this function, e.g., INTERVAL '30 seconds' . The default value is 0 seconds, forcing Firebolt to fetch the latest version metadata from the catalog for every query. Values larger than zero instruct Firebolt to cache metadata and vended credentials in memory, and can typically reduce query latency by tens or hundreds of milliseconds. | INTERVAL |
Using individual TVF parameters
Common Parameters
Parameter | Description | Supported input types |
---|---|---|
URL | A url pointing to a table in an Iceberg file-based catalog, or a url pointing to an Iceberg REST catalog API endpoint. For file-based catalogs, the expected format is s3://{bucket_name}/{path}/{to}/{table} or s3://{bucket_name}/{path}/{to}/{table}/metadata/{version.metadata.json} . For REST catalogs, the expected format is https://{path}/{to}/{rest}/{host} . | TEXT |
MAX_STALENESS | Specifies a maximum staleness for results returned by this function, e.g., INTERVAL '30 seconds' . The default value is 0 seconds, forcing Firebolt to fetch the latest version metadata from the catalog for every query. Values larger than zero instruct Firebolt to cache metadata and vended credentials in memory, and can typically reduce query latency by tens or hundreds of milliseconds. | INTERVAL |
Parameters for file-based Iceberg tables (Iceberg tables hosted in S3)
Parameter | Description | Supported input types |
---|---|---|
AWS_ACCESS_KEY_ID | The AWS access key ID. Used only for file-based catalogs. | TEXT |
AWS_SECRET_ACCESS_KEY | The AWS secret access key. Used only for file-based catalogs. | TEXT |
AWS_SESSION_TOKEN | The AWS session token. Used only for file-based catalogs. | TEXT |
AWS_ROLE_ARN | The AWS role ARN. Used only for file-based catalogs. | TEXT |
AWS_ROLE_EXTERNAL_ID | The AWS role external ID. Used only for file-based catalogs. | TEXT |
Parameters for Iceberg REST catalogs
Parameter | Description | Supported input types |
---|---|---|
WAREHOUSE | The name of the warehouse an Iceberg table resides in. Used only for REST catalogs. | TEXT |
NAMESPACE | The namespace an Iceberg table resides in. Used only for REST catalogs. | TEXT |
TABLE | The name of the Iceberg table to read. Used only for REST catalogs. | TEXT |
OAUTH_CLIENT_ID | An OAuth client ID for authenticating to the REST catalog. Used only for REST catalogs. | TEXT |
OAUTH_CLIENT_SECRET | An OAuth client secret for authenticating to the REST catalog. Used only for REST catalogs. | TEXT |
OAUTH_SCOPE | An OAuth scope for authenticating to the REST catalog. Used only for REST catalogs. | TEXT |
OAUTH_SERVER_URL | The URL to use when requesting an access token for the REST catalog. If not specified, {URL}/v1/oauth/tokens will be used. Used only for REST catalogs. | TEXT |
Return Type
The result is a table with data from the Iceberg files. Columns are read and parsed using their inferred data types.Best practices
- Use a
LOCATION
object to store credentials for authentication. This approach centralizes credential management and eliminates the need to specify individual credential parameters in each query. See CREATE LOCATION and CREATE LOCATION (Iceberg). - Specifying a value for
MAX_STALENESS
can help improve performance in tight loops and infrequently-updated tables. - It is recommended to specify all parameters using the named-parameter syntax rather than relying on parameter positions. For example: use
URL => 'http://example.com'
rather than omitting theURL =>
parameter name specifier.
Examples
Reading using a LOCATION
The following code example reads the first 5 rows from an Iceberg table using aLOCATION
object:
LOCATION
objects are supported for file-based (S3-hosted) catalogs as well as REST catalogs.
For more examples of LOCATION
, see CREATE LOCATION and CREATE LOCATION (Iceberg).
Using TVF parameters, from a public location in S3
The following code example reads the first 5 rows from an Iceberg table in a file-based catalog stored on S3:l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct | l_shipmode | l_comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 156 | 4 | 1 | 17.00 | 17954.55 | 0.04 | 0.02 | N | O | 1996-03-13 | 1996-02-12 | 1996-03-22 | DELIVER IN PERSON | TRUCK | to beans x-ray carefull |
1 | 68 | 9 | 2 | 36.00 | 34850.16 | 0.09 | 0.06 | N | O | 1996-04-12 | 1996-02-28 | 1996-04-20 | TAKE BACK RETURN | according to the final foxes. qui | |
1 | 64 | 5 | 3 | 8.00 | 7712.48 | 0.10 | 0.02 | N | O | 1996-01-29 | 1996-03-05 | 1996-01-31 | TAKE BACK RETURN | REG AIR | ourts cajole above the furiou |
1 | 3 | 6 | 4 | 28.00 | 25284.00 | 0.09 | 0.06 | N | O | 1996-04-21 | 1996-03-30 | 1996-05-16 | NONE | AIR | s cajole busily above t |
1 | 25 | 8 | 5 | 24.00 | 22200.48 | 0.10 | 0.04 | N | O | 1996-03-30 | 1996-03-14 | 1996-04-01 | NONE | FOB | the regular, regular pa |
Using a direct path to a metadata.json file
A URL pointing to an Iceberg metadata.json file will also return the same result:l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct | l_shipmode | l_comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 156 | 4 | 1 | 17.00 | 17954.55 | 0.04 | 0.02 | N | O | 1996-03-13 | 1996-02-12 | 1996-03-22 | DELIVER IN PERSON | TRUCK | to beans x-ray carefull |
1 | 68 | 9 | 2 | 36.00 | 34850.16 | 0.09 | 0.06 | N | O | 1996-04-12 | 1996-02-28 | 1996-04-20 | TAKE BACK RETURN | according to the final foxes. qui | |
1 | 64 | 5 | 3 | 8.00 | 7712.48 | 0.10 | 0.02 | N | O | 1996-01-29 | 1996-03-05 | 1996-01-31 | TAKE BACK RETURN | REG AIR | ourts cajole above the furiou |
1 | 3 | 6 | 4 | 28.00 | 25284.00 | 0.09 | 0.06 | N | O | 1996-04-21 | 1996-03-30 | 1996-05-16 | NONE | AIR | s cajole busily above t |
1 | 25 | 8 | 5 | 24.00 | 22200.48 | 0.10 | 0.04 | N | O | 1996-03-30 | 1996-03-14 | 1996-04-01 | NONE | FOB | the regular, regular pa |
Authenticated read from S3
The following code examples use various valid combinations of AWS secrets to read from S3: Using access key + secret:Reading from REST
The following code example reads the first 5 rows from an Iceberg table in a REST catalog:Reading from REST, with a custom OAuth token URL
Some Iceberg REST catalogs do not support the/v1/oauth/tokens
API, and serve OAuth tokens from a different API. For these catalogs, specify OAUTH_SERVER_URL
.
Reading from Databricks Unity Catalog
One example of an Iceberg REST catalog that requires a customOAUTH_SERVER_URL
is the Databricks Unity Catalog. The following code example reads the first 5 rows from a table in a Databricks Unity Catalog, showing how Databricks concepts map to READ_ICEBERG
parameters:
Reading from Snowflake Open Catalog
About Metadata Versions in File-Based Catalogs
When reading from a file-based catalog, Firebolt first looks for aversion-hint.text
file to determine which metadata version to use. If one is not available, READ_ICEBERG
also accepts a path to a specific metadata.json
file.
Limitations
- Iceberg tables with Parquet data files in S3 are currently supported.
- Versions 1 and 2 of the Apache Iceberg specification are supported.
- Tables with following Iceberg features are currently not supported:
- Row-level deletes (position deletes or equality deletes)
- Schema evolution
- Partition evolution
- The following data types are currently not supported:
variant
geometry
geography
- Reading past snapshots with time travel is currently not supported.
- Nested complex types such as
struct
,list
, andmap
are currently read as nullable even if Iceberg defines the field to be non-nullable. This only applies to nested complex types (struct
/list
/map
nested inside anotherstruct
/list
/map
). - Returning partition values for Identity Transforms from partition metadata is currently not supported.
- By default, cross-region reads for S3 are disabled for
read_iceberg
, as they may incur additional costs. They may be enabled on a per-query basis using the cross_region_request_mode setting.