Implement SQL grouping extensions: ROLLUP, CUBE, GROUPING SETS, GROUPING(), GROUP BY DISTINCT #9029
livius2 wants to merge 13 commits into
- GROUP BY ROLLUP(...)
- GROUP BY CUBE(...)
- GROUP BY GROUPING SETS (...)
- GROUP BY ()
- GROUPING(...)
- GROUPING_ID(...)
- GROUP BY DISTINCT
This is unfortunate; at least aggregate functions containing only
I agree. This implementation intentionally uses lowering to the existing aggregate + UNION ALL infrastructure as a first correctness-focused step, avoiding new BLR, ODS, record source, or executor changes. This is not meant to be the final optimal execution strategy.

ROLLUP is a good candidate for a future single-pass implementation, as its grouping levels are hierarchical and can be produced from one ordered/grouped stream. CUBE and arbitrary GROUPING SETS are less straightforward, but there are still possible optimizations, for example sharing/materializing the base stream or reusing buffered records instead of re-executing the full input for every grouping set.

For this PR, the main goal is SQL semantics and integration with existing DSQL/BLR paths. Native execution and optimizer improvements can be developed as a follow-up without changing the SQL surface introduced here.
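To make the lowering idea concrete, here is a minimal sketch of how ROLLUP and CUBE can be expanded into plain grouping sets, each of which would then become one aggregate branch of the UNION ALL. The names `GroupingSet`, `expandRollup` and `expandCube` are hypothetical and are not taken from the PR; the actual code works on DSQL nodes rather than on column-name strings, and the branch order shown here is only an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation: a grouping set is a list of column names.
using GroupingSet = std::vector<std::string>;

// ROLLUP(c1, ..., cn) yields the n + 1 prefixes of the column list,
// from the full list down to the empty set ().
std::vector<GroupingSet> expandRollup(const std::vector<std::string>& cols)
{
    std::vector<GroupingSet> sets;
    for (std::size_t len = cols.size() + 1; len-- > 0; )
        sets.emplace_back(cols.begin(), cols.begin() + static_cast<std::ptrdiff_t>(len));
    return sets;
}

// CUBE(c1, ..., cn) yields all 2^n subsets of the column list.
// A set bit in the mask means "this column is omitted in this branch";
// the leftmost column is mapped to the most significant bit.
std::vector<GroupingSet> expandCube(const std::vector<std::string>& cols)
{
    const std::size_t n = cols.size();
    assert(n < sizeof(std::size_t) * 8);  // keep the shift below well defined

    std::vector<GroupingSet> sets;
    for (std::size_t mask = 0; mask < (std::size_t{1} << n); ++mask)
    {
        GroupingSet set;
        for (std::size_t i = 0; i < n; ++i)
        {
            if (!(mask & (std::size_t{1} << (n - 1 - i))))
                set.push_back(cols[i]);
        }
        sets.push_back(set);
    }
    return sets;
}
```

With this sketch, ROLLUP(a, b, c) expands to (a, b, c), (a, b), (a), () and CUBE(a, b) expands to (a, b), (a), (b), (). The hierarchical shape of the ROLLUP result is what makes a single-pass strategy plausible, while CUBE grows exponentially with the number of columns.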
I don't see any reason to change the ODS. Introducing new BLR verbs may help improve implementation efficiency, but it is not strictly necessary. The main question: do you plan to implement support in the optimizer yourself, or wait for someone else to do it?
Yes, if I understand correctly, by optimizer support you mean a future native execution strategy, for example single-pass execution for pure ROLLUP cases. I do plan to work on that, but I would prefer to do it after this PR is accepted and merged. This PR has already required a significant amount of work, and I would like to avoid investing in a larger implementation before knowing whether this approach is acceptable.
# Conflicts: # src/dsql/parse-conflicts.txt
# Conflicts: # src/common/ParserTokens.h # src/dsql/parse-conflicts.txt # src/dsql/parse.y
dyemanov left a comment
This is the first part of the review; the hardest changes, inside pass1.cpp, are still to be reviewed a bit later (next week, hopefully).
		}
	}
	else
		$$->legacyGroup = NULL;
I have a feeling that the fixup of legacyGroup does not belong in the parser and would be better done inside dsqlPass().
};

explicit Element(Type aType = Type::SIMPLE, ValueListNode* aItems = NULL,
	GroupingClause* aGroupingSets = NULL)
nullptr instead of NULL, please
{
	ALL,
	DISTINCT
};
This could be just a bool, I suppose.
{
	explicit Dimension(MemoryPool&)
		: expr(NULL),
		  index(0)
Here and in other places: constant-based initialization is better done in the member declaration (e.g. unsigned index = 0;). NestConst already initializes to nullptr by default, so there is no need to assign it explicitly.
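A minimal sketch of what this suggestion could look like, assuming a pointer wrapper that defaults to nullptr. `DefaultNullPtr` is a hypothetical stand-in written for this example and is not Firebird's NestConst; `MemoryPool` and `ValueExpr` are empty placeholders so the snippet compiles on its own.

```cpp
#include <cassert>

// Hypothetical stand-in for a pointer wrapper that defaults to nullptr,
// the behaviour the comment above attributes to NestConst.
template <typename T>
class DefaultNullPtr
{
public:
    DefaultNullPtr() = default;
    DefaultNullPtr(T* p) : ptr(p) {}

    T* get() const { return ptr; }

    T* operator->() const
    {
        assert(ptr != nullptr);  // dereferencing an unset pointer is a bug
        return ptr;
    }

private:
    T* ptr = nullptr;  // default member initializer
};

struct MemoryPool {};  // placeholder, not the real Firebird class
struct ValueExpr {};   // placeholder for an expression node

struct Dimension
{
    // The constructor body has nothing left to initialize.
    explicit Dimension(MemoryPool&) {}

    DefaultNullPtr<ValueExpr> expr;  // already null by default
    unsigned index = 0;              // initialized in the declaration
};
```

With the defaults attached to the declarations, every constructor added later picks them up automatically, so a new overload cannot leave index uninitialized by omission.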
dsc* GroupingNode::execute(thread_db* /*tdbb*/, Request* /*request*/) const
{
	fb_assert(false);
Does it mean <group operation> is temporarily disallowed (returns NULL) until proper BLR codes are added in another PR?
No, GROUPING/GROUPING_ID are not temporarily disallowed and they do not return NULL.
The node exists only as a DSQL-level representation parsed from SQL. For supported contexts, advanced grouping is lowered before BLR generation, and GROUPING/GROUPING_ID are replaced with branch-specific integer constants: 0/1 for GROUPING and a bit mask for GROUPING_ID.
So GroupingNode::execute() should never be reached in a valid compiled request. The fb_assert(false) is a defensive guard for an internal lowering failure. Similarly, genBlr() raises an error if a GROUPING expression reaches BLR generation without being lowered.
This PR intentionally avoids new BLR opcodes. Proper native BLR/executor support could be added later as an optimization path, but it is not required for the SQL feature to work in this implementation.
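As an illustration of the branch-specific constants, the sketch below computes the value that GROUPING and GROUPING_ID would take in one lowered branch. The helper names are hypothetical and not taken from the PR. It assumes the usual SQL convention that a bit is 1 when the argument is not part of the branch's grouping set and that the leftmost argument maps to the most significant bit; whether the PR uses exactly this bit order is an assumption here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical representation: the columns grouped by in one UNION ALL branch.
using GroupingSet = std::vector<std::string>;

// GROUPING(col): 0 if the branch groups by col, 1 if col is aggregated away.
int groupingValue(const GroupingSet& branch, const std::string& col)
{
    const bool grouped = std::find(branch.begin(), branch.end(), col) != branch.end();
    return grouped ? 0 : 1;
}

// GROUPING_ID(a1, ..., an): the GROUPING bits of all arguments packed into
// one integer, with the leftmost argument as the most significant bit.
std::uint64_t groupingIdValue(const GroupingSet& branch,
                              const std::vector<std::string>& args)
{
    assert(args.size() <= 64);  // the mask must fit into 64 bits

    std::uint64_t mask = 0;
    for (const auto& col : args)
        mask = (mask << 1) | static_cast<std::uint64_t>(groupingValue(branch, col));
    return mask;
}
```

Because each branch has a fixed grouping set, both values are known at compile time for that branch, which is why they can be emitted as constants and never need to be evaluated at run time.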
…be nested inside another grouping set, matching the parser and SQL specification.
…ation and lowering logic
Hi
A few years ago I made an earlier attempt at implementing ROLLUP / CUBE support in Firebird.
That prototype was based on joining the input stream with a procedure producing a sequence of numbers/masks.
It worked for simple cases, but the approach turned out to be hard to generalize, difficult to integrate cleanly with the optimizer/executor, and not a good long-term fit for Firebird internals.
After several years of following Firebird development and repository changes, I decided to revisit this feature with a cleaner implementation strategy.
This pull request implements SQL extended grouping support, including:
The implementation lowers advanced grouping to the existing aggregate and UNION ALL machinery, without adding new BLR opcodes or ODS changes.
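Conceptually, the lowering corresponds to rewriting one extended GROUP BY into several ordinary aggregate queries joined by UNION ALL. The sketch below renders that equivalent as SQL text purely for illustration; `lowerToUnionAll` and `join` are hypothetical helpers, the PR itself operates on DSQL nodes rather than strings, and a real rewrite would also need typed NULLs and the GROUPING constants.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using GroupingSet = std::vector<std::string>;

// Concatenate parts with a separator.
std::string join(const std::vector<std::string>& parts, const std::string& sep)
{
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i)
    {
        if (i > 0)
            out += sep;
        out += parts[i];
    }
    return out;
}

// Build one SELECT per grouping set. Columns that a branch does not group by
// are replaced with NULL in its select list; the empty set has no GROUP BY.
std::string lowerToUnionAll(const std::vector<std::string>& columns,
                            const std::string& aggregate,
                            const std::string& table,
                            const std::vector<GroupingSet>& sets)
{
    assert(!sets.empty());  // at least one branch is required

    std::vector<std::string> branches;
    for (const auto& set : sets)
    {
        std::vector<std::string> selectList;
        for (const auto& col : columns)
        {
            const bool grouped = std::find(set.begin(), set.end(), col) != set.end();
            selectList.push_back(grouped ? col : std::string("NULL"));
        }
        selectList.push_back(aggregate);

        std::string sql = "SELECT " + join(selectList, ", ") + " FROM " + table;
        if (!set.empty())
            sql += " GROUP BY " + join(set, ", ");
        branches.push_back(sql);
    }
    return join(branches, " UNION ALL ");
}
```

For the grouping sets of ROLLUP(a, b) over a table t with SUM(x), this produces three branches: one grouped by a, b, one grouped by a, and one grand-total branch without GROUP BY. It also shows the cost noted in the discussion: the input is read once per branch.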
This is a flattened commit from my local repo, because there were too many commits and experiments there.
Please note that all English texts come from a translator.
I will add tests to Firebird-qa soon.
Tests added now:
FirebirdSQL/firebird-qa#38