HIVE-29359: Support Credential Vending in Hive Iceberg REST Catalog Client#6474
difin wants to merge 1 commit into
        "iceberg.catalog.ice01." + IcebergVendedCredentialUtil.SECRET_ACCESS_KEY)
    .satisfies(map ->
        assertThat(map.get(InputFormatConfig.VENDED_STORAGE_CREDENTIALS))
            .isNotBlank());
It looks like the prefix information is lost here.
      table = readTableObjectFromFile(location, config);
    }
    checkAndSetIoConfig(config, table);
    IcebergVendedCredentialUtil.applyFromJobConf(table, config);
Most Iceberg clients probably don't need to ser/de credentials on their own, but we do. Is that because we serialize an Iceberg table into Hadoop's configuration as an intermediate representation? I think we should store the credentials as they are and simply restore them, without refining or normalizing the content on the Hive side.
You're right that most Iceberg clients don't need to ser/de credentials themselves. Hive does, because we serialize the Iceberg Table (SerializableTable) into JobConf for Tez/LLAP, and vended credentials on FileIO typically don't survive that round-trip. Executors rebuild the table from job conf and don't re-run REST loadTable, so we propagate credentials separately (VENDED_STORAGE_CREDENTIALS + S3A bucket keys) and restore them in deserializeTable via applyFromJobConf.
The main place we mutate vended credential content is the withConfigurationOverrides() method. REST catalogs can vend connectivity settings from their own network view (e.g. http://minio:9000 when the catalog runs in Docker), while the Hive session config sets a host-reachable endpoint (iceberg.catalog.ice01.s3.endpoint=http://host:9000). That method overrides only non-secret fields (s3.endpoint, s3.path-style-access) so that Iceberg FileIO and S3A agree on connectivity; vended keys are preserved. It runs at both store time (propagateToJob, so the blob on executors is self-contained) and restore time (applyFromJobConf, e.g. when commit still has the catalog-internal endpoint on FileIO from loadTable).
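To make the override rule concrete, here is a minimal sketch of the idea, not the code in this PR. The class name VendedCredentialOverrideSketch, the method withOverrides, and the flat Map representation of a vended credential are all assumptions made for illustration; only the two overridable keys and the catalog prefix come from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the PR's withConfigurationOverrides().
class VendedCredentialOverrideSketch {

  // Non-secret connectivity keys that the session config may override.
  static final Set<String> OVERRIDABLE_KEYS =
      Set.of("s3.endpoint", "s3.path-style-access");

  /**
   * Returns a copy of the vended properties where only the overridable
   * connectivity keys are replaced by values found in the session config
   * under the given catalog prefix. Every other entry, including the
   * vended secrets, is copied through unchanged.
   */
  static Map<String, String> withOverrides(
      Map<String, String> vended,
      Map<String, String> sessionConf,
      String catalogPrefix) {
    Map<String, String> result = new HashMap<>(vended);
    for (String key : OVERRIDABLE_KEYS) {
      String sessionValue = sessionConf.get(catalogPrefix + key);
      if (sessionValue != null) {
        result.put(key, sessionValue);
      }
    }
    return result;
  }
}
```

Called with vended properties containing s3.endpoint=http://minio:9000 and a session config containing iceberg.catalog.ice01.s3.endpoint=http://host:9000, the sketch returns a map whose endpoint is the host-reachable one while the vended access key and secret key are untouched. Because the function is idempotent, applying it at both store time and restore time gives the same result as applying it once.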



What changes were proposed in this pull request?
Extended the Gravitino LLAP qtest (TestIcebergRESTCatalogGravitinoLlapLocalCliDriver) to add a vended credentials header and run against MinIO + S3 warehouse with Gravitino s3-secret-key vending and OAuth2; configured host-side S3A and Iceberg S3FileIO for the published MinIO port so Tez/LLAP on the host work reliably.
Updated Hive to pass vended credentials to executors using jobProperties and jobSecrets.
Why are the changes needed?
To enable vended credentials support with REST Catalog servers.
Does this PR introduce any user-facing change?
Yes. Users configuring an Iceberg REST catalog in Hive can set the header iceberg.catalog.<name>.X-Iceberg-Access-Delegation on REST requests to enable vended credentials.
How was this patch tested?
Updated existing tests to cover vended credentials:
TestHiveRESTCatalogClient: new unit tests for vended credentials header mapping.
TestIcebergRESTCatalogGravitinoLlapLocalCliDriver with Gravitino + MinIO + OAuth2 + credentials vending.