Skip to content

HIVE-29413: Generalise column related APIs in Table.java#6413

Open
ramitg254 wants to merge 42 commits into
apache:masterfrom
ramitg254:HIVE-29413
Open

HIVE-29413: Generalise column related APIs in Table.java#6413
ramitg254 wants to merge 42 commits into
apache:masterfrom
ramitg254:HIVE-29413

Conversation

@ramitg254

@ramitg254 ramitg254 commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

added getEffectivePartCols() in most places possible to avoid code duplication.

Why are the changes needed?

getPartCols() does not have support for iceberg tables.

Does this PR introduce any user-facing change?

No

How was this patch tested?

ci tests and local build

@deniskuzZ

Copy link
Copy Markdown
Member

@ramitg254 please take a look: 9e7535c. I would suggest following similar approach

@ramitg254

ramitg254 commented Apr 10, 2026

Copy link
Copy Markdown
Contributor Author

9e7535c

but here we are creating separate method getEffectivePartCols() and leaving getPartCols() as it is, which as per our discussion on that closed pr we shouldn't do that, and only go ahead with updating getPartCols()

@deniskuzZ

deniskuzZ commented Apr 10, 2026

Copy link
Copy Markdown
Member

9e7535c

but here we are creating separate method getEffectivePartCols() and leaving getPartCols() as it is, which as per our discussion on that closed pr we shouldn't do that, and only go ahead with updating getPartCols()

Where did I say that? The ask was to keep the original method unchanged. same here

@ramitg254

ramitg254 commented Apr 10, 2026

Copy link
Copy Markdown
Contributor Author

oh I got confused due to this comment: #6337 (comment) in which getSupportedPartCols() was just separate method similar to getEffectivePartCols()

@ramitg254

ramitg254 commented Apr 10, 2026

Copy link
Copy Markdown
Contributor Author

I am fine with that earlier approach as well but recently I saw this one: https://issues.apache.org/jira/browse/HIVE-29525 so I thought we should have unified getPartCols() and getCols() which gives similar results as native hive tables as first step towards solving this after that those plan logics can be taken care of later on when that ticket will be addressed.
So I was first focussing on making getPartCols() unified for iceberg tables as well.

please share your thoughts on this idea

ramitg254 added 24 commits June 22, 2026 19:21
# Conflicts:
#	iceberg/iceberg-handler/src/test/results/positive/llap/iceberg_bucket_map_join_7.q.out
Change-Id: I09ffba356ac47e3416c8b6717e8671d2cb6432b8
Change-Id: I6812d068be702edf01b158c3e7eacadeadbd5c2b
Change-Id: I6f5d81c5be00e080753efd5e8157b14fcfde2dde
Change-Id: I5a95791b4d756fe949ee4697204be9de3025c6b5
Change-Id: I852cf156a1a5ac20525bb9376f4cbb5783b8e1e3
Change-Id: Ib1542e38c65c7811074b97d7da7aaf780f3895dd
Change-Id: Ie3440239248870fb44c6e956d5be10271b28a1a0
Change-Id: Ic8f6d6c391ba49e12de7d04b7bb20542d879b0a3
Change-Id: I2017f7c771f8f611b5d8900d3e0276817842b41f
Change-Id: I87dc65eb942e3db1b1a0d9f2608f1b88b44b11e6
Change-Id: Ic0fefede8b079205c5580a31f59f195b6b1e94e0
Change-Id: I5a34151d5ec0d4e1759eee5fcc6339b5fd6b92d3
Change-Id: I0b130777712dd05df53a1ca6ed80ea4c5abd0187
Change-Id: Id1ce8250addc158a36887d4c8538ee2ab18b6f39
Change-Id: Ic5ec07bc8c9c1c656407fcaec08904792dd61093
*/
public void setTTable(org.apache.hadoop.hive.metastore.api.Table tTable) {
this.tTable = tTable;
tablePartCols = null;

@deniskuzZ deniskuzZ Jun 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider extracting it to a private self-describing reset method (i.e. resetColumnCache etc)

   tablePartCols = null; 
    tableNonPartCols = null;
    columnsByName = null;

is done in at least 3 places

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Change-Id: I518f8baac535b65fad0295bd1c892457c904d8e9
@sonarqubecloud

Copy link
Copy Markdown

sqlGenerator.append(identifier);
// This is one of the columns we're setting, record it's position so we can come back
// later and patch it up. 0th is ROW_ID
setColExprs.put(setColExprIndex + columnOffset, setCol);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify: in this method only the result of this expressionsetColExprIndex + columnOffset is used but two parameters are passed. Please pass the result of the expression and remove one of the parameters.

Comment on lines +136 to +138
if (prependComma) {
sqlGenerator.append(",");
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this back to the loop where this method is called.
We always append non-partition columns and in the next step we add partition columns. In case of non-partition columns the the comma is not needed but in case of every follow-up columns it is. And also partition columns need it but when we add partition columns we continue appending to an existing list so all columns need a comma. As a result please remove prependComma parameter of this method and the appendPartitionColumns

Comment on lines +102 to +103
List<FieldSchema> cols = table.hasNonNativePartitionSupport() ?
table.getAllCols() : table.getCols();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is blocking collecting the lineage information of all columns in case of native tables?

Comment on lines +299 to +300
boolean isStaticPartition = partSpec != null && partSpec.containsKey(partColName)
&& partSpec.get(partColName) != null;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't partSpec.containsKey(partColName) redundant here?

Comment on lines +624 to +625
// Partition cols come from the handler, but user comments are stored in HMS sd.getCols().
// Copy HMS comments onto handler fields so DESCRIBE/SHOW match native table behavior.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add that these can be invalid in the case of imported non-native tables, because the storage handler might not support user comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants