Implement SQL grouping extensions: ROLLUP, CUBE, GROUPING SETS, GROUPING(), GROUP BY DISTINCT #9029

Open
livius2 wants to merge 13 commits into FirebirdSQL:master from livius2:grouping_sets__flattened

livius2 commented May 17, 2026

Hi

A few years ago I made an earlier attempt at implementing ROLLUP / CUBE support in Firebird.
That prototype was based on joining the input stream with a procedure producing a sequence of numbers/masks.
It worked for simple cases, but the approach turned out to be hard to generalize, difficult to integrate cleanly with the optimizer/executor, and not a good long-term fit for Firebird internals.

After several years of following Firebird development and repository changes, I decided to revisit this feature with a cleaner implementation strategy.

This pull request implements SQL extended grouping support, including:

  • GROUP BY ROLLUP (...)
  • GROUP BY CUBE (...)
  • GROUP BY GROUPING SETS (...)
  • mixed grouping elements, for example GROUP BY a, ROLLUP(b, c)
  • empty grouping set ()
  • GROUPING() with multiple arguments, SQL feature T433
  • GROUP BY DISTINCT, SQL feature T434
  • GROUPING_ID as a compatibility extension
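
For readers less familiar with these forms, here is a minimal C++ sketch of how the grouping elements listed above expand into plain grouping sets under standard SQL semantics. It is an illustration only, not code from this PR: the names `expandRollup`, `expandCube`, `crossProduct`, and `distinctSets` are hypothetical, and duplicate detection simply compares column lists in order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using GroupingSet = std::vector<std::string>;   // one set of grouping columns
using SetList = std::vector<GroupingSet>;       // a list of grouping sets

// ROLLUP(c1..cn) yields the n+1 prefixes: (c1..cn), (c1..cn-1), ..., ().
SetList expandRollup(const GroupingSet& cols)
{
    SetList out;
    for (std::size_t len = cols.size() + 1; len-- > 0; )
        out.emplace_back(cols.begin(), cols.begin() + len);
    return out;
}

// CUBE(c1..cn) yields all 2^n subsets, keeping the original column order.
SetList expandCube(const GroupingSet& cols)
{
    assert(cols.size() < 32);   // keeps the shifts below well defined
    SetList out;
    const unsigned long long total = 1ULL << cols.size();
    for (unsigned long long drop = 0; drop < total; ++drop)
    {
        GroupingSet set;
        for (std::size_t i = 0; i < cols.size(); ++i)
            if (!(drop & (1ULL << i)))   // bit i set means column i is rolled up
                set.push_back(cols[i]);
        out.push_back(set);
    }
    return out;
}

// Comma-separated grouping elements combine as a cross product:
// each result set concatenates one set from every element.
SetList crossProduct(const SetList& left, const SetList& right)
{
    SetList out;
    for (const auto& l : left)
        for (const auto& r : right)
        {
            GroupingSet set = l;
            set.insert(set.end(), r.begin(), r.end());
            out.push_back(set);
        }
    return out;
}

// GROUP BY DISTINCT removes duplicate grouping sets (first occurrence wins).
// Simplification: two sets count as duplicates only if their column lists match in order.
SetList distinctSets(const SetList& sets)
{
    SetList out;
    for (const auto& s : sets)
        if (std::find(out.begin(), out.end(), s) == out.end())
            out.push_back(s);
    return out;
}
```

With these helpers, `GROUP BY a, ROLLUP(b, c)` corresponds to `crossProduct(SetList{GroupingSet{"a"}}, expandRollup(GroupingSet{"b", "c"}))`, which gives the sets (a, b, c), (a, b), and (a).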

The implementation lowers advanced grouping to the existing aggregate and UNION ALL machinery, without adding new BLR opcodes or ODS changes.
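
To illustrate the lowering idea at the SQL text level, the C++ sketch below renders one aggregate query per grouping set and joins the branches with UNION ALL, substituting NULL for columns that are not grouped in a given branch. This is an illustration under stated assumptions, not the PR's code: the PR works on DSQL nodes rather than strings, `lowerToUnionAll` and `joinList` are hypothetical names, and a real rewrite would also have to give the NULL placeholders proper types.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using Columns = std::vector<std::string>;

// Join items with ", " (hypothetical helper).
std::string joinList(const Columns& items)
{
    std::string out;
    for (const auto& item : items)
    {
        if (!out.empty())
            out += ", ";
        out += item;
    }
    return out;
}

// Render one SELECT per grouping set and glue the branches with UNION ALL.
std::string lowerToUnionAll(const Columns& allCols, const std::vector<Columns>& sets,
    const std::string& aggregate, const std::string& table)
{
    assert(!table.empty());
    std::string sql;
    for (const auto& set : sets)
    {
        Columns selectList;
        for (const auto& col : allCols)
        {
            // A column that is not part of this branch's grouping set becomes NULL.
            const bool grouped = std::find(set.begin(), set.end(), col) != set.end();
            selectList.push_back(grouped ? col : std::string("NULL"));
        }
        selectList.push_back(aggregate);

        if (!sql.empty())
            sql += "\nUNION ALL\n";
        sql += "SELECT " + joinList(selectList) + " FROM " + table;
        if (!set.empty())   // the empty grouping set () is a plain aggregate query
            sql += " GROUP BY " + joinList(set);
    }
    return sql;
}
```

For the three sets of `ROLLUP(a, b)`, this produces three branches: one grouped by `a, b`, one grouped by `a` with `b` replaced by NULL, and one ungrouped branch for the grand total.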

This is a flattened commit from my local repo, which had accumulated too many commits and experiments.
Please note that all English text was produced with a translator.
I will add tests to Firebird-qa soon.
Tests added now:
FirebirdSQL/firebird-qa#38

  GROUP BY ROLLUP(...)
  GROUP BY CUBE(...)
  GROUP BY GROUPING SETS (...)
  GROUP BY ()
  GROUPING(...)
  GROUPING_ID(...)
  GROUP BY DISTINCT

sim1984 commented May 18, 2026

Contributor

> The implementation lowers advanced grouping to the existing aggregate and UNION ALL machinery, without adding new BLR opcodes or ODS changes.

This is unfortunate. At the very least, aggregations that use only ROLLUP can be executed in a single pass (without re-executing the query). That is unlikely to be possible for CUBE and GROUPING SETS, but there are options there too, such as repeating the groupings over records stored in the "Record Buffer".

livius2 commented May 18, 2026

Author

I agree. This implementation intentionally uses lowering to the existing aggregate + UNION ALL infrastructure as a first correctness-focused step, avoiding new BLR, ODS, record source, or executor changes.

This is not meant to be the final optimal execution strategy. ROLLUP is a good candidate for a future single-pass implementation, as its grouping levels are hierarchical and can be produced from one ordered/grouped stream. CUBE and arbitrary GROUPING SETS are less straightforward, but there are still possible optimizations, for example sharing/materializing the base stream or reusing buffered records instead of re-executing the full input for every grouping set.
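
As a rough illustration of why ROLLUP fits a single pass, the following C++ sketch keeps one running sum per hierarchy level over an input that is already sorted by the grouping keys, and emits a subtotal whenever a key prefix changes. It is a sketch under stated assumptions, not a proposal for the executor: `Row`, `OutRow`, and `rollupSinglePass` are hypothetical names, only SUM is handled, and the treatment of empty input (a single grand-total row with sum 0) is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Row
{
    std::vector<std::string> keys;   // grouping key values, in ROLLUP order
    long long value;                 // value to be summed
};

struct OutRow
{
    std::vector<std::string> keys;   // only the grouped key prefix; the rest is rolled up (NULL)
    long long sum;
};

// Input must be sorted by keys, and every row must carry exactly nKeys keys.
std::vector<OutRow> rollupSinglePass(const std::vector<Row>& sorted, std::size_t nKeys)
{
    std::vector<OutRow> out;
    std::vector<long long> sums(nKeys + 1, 0);   // sums[L]: running sum for the first L keys
    const Row* prev = nullptr;

    // Emit and reset the finished groups on all levels deeper than `keep`, deepest first.
    auto flush = [&](std::size_t keep)
    {
        for (std::size_t level = nKeys; level > keep; --level)
        {
            OutRow r;
            r.keys.assign(prev->keys.begin(), prev->keys.begin() + level);
            r.sum = sums[level];
            out.push_back(r);
            sums[level] = 0;
        }
    };

    for (const Row& row : sorted)
    {
        assert(row.keys.size() == nKeys);
        if (prev)
        {
            // Length of the key prefix shared with the previous row.
            std::size_t same = 0;
            while (same < nKeys && row.keys[same] == prev->keys[same])
                ++same;
            flush(same);   // does nothing when all keys are unchanged
        }
        for (std::size_t level = 0; level <= nKeys; ++level)
            sums[level] += row.value;
        prev = &row;
    }

    if (prev)
        flush(0);   // close every group that is still open

    OutRow total;   // grand total: no grouped keys at all
    total.sum = sums[0];
    out.push_back(total);
    return out;
}
```

Note that the rows come out interleaved (detail groups followed by their subtotals), whereas a UNION ALL lowering emits them branch by branch; SQL does not guarantee either order without ORDER BY.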

For this PR, the main goal is SQL semantics and integration with existing DSQL/BLR paths. Native execution and optimizer improvements can be developed as a follow-up without changing the SQL surface introduced here.

sim1984 commented May 18, 2026

Contributor

I don't see any reason to change the ODS. New BLR verbs may be needed to make the implementation more efficient, but they are not strictly required.

The main question: do you plan to implement support in the optimizer yourself or wait for someone else to do it?

livius2 commented May 18, 2026

Author

Yes. If I understand correctly, by optimizer support you mean a future native execution strategy, for example single-pass execution for pure ROLLUP cases.

I do plan to work on that, but I would prefer to do it after this PR is accepted and merged. This PR has already required a significant amount of work, and I would like to avoid investing in a larger implementation before knowing whether this approach is acceptable.

# Conflicts:
#	src/dsql/parse-conflicts.txt
livius2 added 3 commits June 22, 2026 22:48
# Conflicts:
#	src/common/ParserTokens.h
#	src/dsql/parse-conflicts.txt
#	src/dsql/parse.y
dyemanov self-requested a review July 3, 2026 13:07

dyemanov left a comment

Member

This is the first part of the review. The hardest changes, inside pass1.cpp, are still to be reviewed a bit later (next week, hopefully).

Comment thread doc/sql.extensions/README.grouping_sets.md
Comment thread src/dsql/parse.y Outdated
Comment thread src/dsql/parse.y Outdated
        }
    }
    else
        $$->legacyGroup = NULL;

Member

I have a feeling that the fixup of legacyGroup does not belong in the parser and would be better done inside dsqlPass().

Author

Done in 25eb2a9

Comment thread src/jrd/RecordSourceNodes.h Outdated
};

    explicit Element(Type aType = Type::SIMPLE, ValueListNode* aItems = NULL,
        GroupingClause* aGroupingSets = NULL)

Member

nullptr instead of NULL, please

Author

Done in cfc8434

Comment thread src/jrd/RecordSourceNodes.h Outdated
{
    ALL,
    DISTINCT
};

Member

This could be just a bool, I suppose.

Author

Done in 3282fff

Comment thread src/jrd/RecordSourceNodes.h Outdated
{
    explicit Dimension(MemoryPool&)
        : expr(NULL),
          index(0)

Member

Here and elsewhere, constant-based initialization is better done in the member declaration (e.g. unsigned index = 0;). Also, NestConst already initializes to nullptr by default, so there is no need to assign it explicitly.
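
A small C++ sketch of the style being requested, for reference. It does not use the real Firebird types: `MemoryPool`, `Expr`, and `NullByDefault` are hypothetical stand-ins (the last one only mimics the "starts as nullptr" behaviour the comment attributes to NestConst), and `checkDefaults` is just a usage check.

```cpp
#include <cassert>

struct MemoryPool {};   // hypothetical stand-in for the memory pool type
struct Expr {};         // hypothetical stand-in for an expression node

// Hypothetical wrapper that starts out as nullptr without explicit initialization.
template <typename T>
class NullByDefault
{
public:
    T* get() const { return ptr; }
    void set(T* p) { ptr = p; }

private:
    T* ptr = nullptr;
};

struct Dimension
{
    explicit Dimension(MemoryPool&) {}   // no member initializer list needed

    NullByDefault<Expr> expr;   // already null by default
    unsigned index = 0;         // constant default lives in the declaration
};

// Usage check: a freshly constructed Dimension has the expected defaults.
void checkDefaults()
{
    MemoryPool pool;
    Dimension dim(pool);
    assert(dim.expr.get() == nullptr);
    assert(dim.index == 0);
}
```

Keeping the defaults next to the declarations means every constructor picks them up automatically, so adding a new constructor cannot leave a member uninitialized.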

Author

Done in 7fede78

Comment thread src/dsql/ExprNodes.h
Comment thread src/dsql/ExprNodes.cpp Outdated

dsc* GroupingNode::execute(thread_db* /*tdbb*/, Request* /*request*/) const
{
    fb_assert(false);

Member

Does this mean <group operation> is temporarily disallowed (returns NULL) until proper BLR codes are added in another PR?

Author

No, GROUPING/GROUPING_ID are not temporarily disallowed and they do not return NULL.

The node exists only as a DSQL-level representation parsed from SQL. For supported contexts, advanced grouping is lowered before BLR generation, and GROUPING/GROUPING_ID are replaced with branch-specific integer constants: 0/1 for GROUPING and a bit mask for GROUPING_ID.
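
To make the constant substitution concrete, here is a minimal C++ sketch of how such per-branch constants could be computed. It is not the PR's code: `groupingBit` and `groupingMask` are hypothetical names, and the bit order (leftmost argument as the most significant bit) is an assumption based on the usual SQL convention for multi-argument GROUPING.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using Columns = std::vector<std::string>;

// GROUPING(col) for one lowered branch: 1 if col is rolled up in this branch, 0 if it is grouped.
int groupingBit(const Columns& branchSet, const std::string& col)
{
    return std::find(branchSet.begin(), branchSet.end(), col) == branchSet.end() ? 1 : 0;
}

// Bit mask over several arguments; the leftmost argument ends up in the
// most significant bit (assumed convention).
long long groupingMask(const Columns& branchSet, const Columns& args)
{
    assert(args.size() < 63);   // keep the mask inside a signed 64-bit value
    long long mask = 0;
    for (const auto& arg : args)
        mask = (mask << 1) | groupingBit(branchSet, arg);
    return mask;
}
```

For the branches of `ROLLUP(a, b)` and the arguments `(a, b)`, this gives 0 for the branch grouped by (a, b), 1 for the branch grouped by (a), and 3 for the grand-total branch. Because each branch has a fixed grouping set, the value is known at compile time and can be emitted as a literal.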

So GroupingNode::execute() should never be reached in a valid compiled request. The fb_assert(false) is a defensive guard for an internal lowering failure. Similarly, genBlr() raises an error if a GROUPING expression reaches BLR generation without being lowered.

This PR intentionally avoids new BLR opcodes. Proper native BLR/executor support could be added later as an optimization path, but it is not required for the SQL feature to work in this implementation.

Development

Successfully merging this pull request may close these issues.

Add standard SQL "extended grouping capabilities" - ROLLUP, CUBE, GROUPING SETS [CORE4995]
SELECT ... GROUP BY ... WITH CUBE [CORE650]

3 participants