[quantization] Activate Gemma4 multimodal embedder PTQ wrapper #795
Open
dvsav wants to merge 1 commit into
Add as_export_module, registry entry, unit tests, internal smoke tests, example script, and wrapper smoke case for QuantGemma4MultimodalEmbedder.

TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
What
Activate the `QuantGemma4MultimodalEmbedder` PTQ wrapper so that `Gemma4MultimodalEmbedder` is fully supported in the prepare-calibrate-convert-export pipeline. This includes adding the missing `as_export_module` method, uncommenting the registry entry, and providing unit tests, internal smoke tests, a wrapper smoke test case, and an example script.

Why
The `QuantGemma4MultimodalEmbedder` wrapper class was already present in the codebase as part of the initial Gemma4 skeleton, but it was not usable in practice for two reasons:

- The registry entry was commented out: `prepare()` could not discover the wrapper, so `Gemma4MultimodalEmbedder` submodules were never wrapped during PTQ preparation.
- `as_export_module()` was missing: without this method, the `tico.convert()` step would skip Circle export for the multimodal embedder, breaking the end-to-end quantization-to-Circle flow.

The multimodal embedder (RMSNorm + Linear projection) is a critical component in the Gemma4 E2B architecture: it projects vision-tower soft tokens into the text decoder's hidden space. Without an active wrapper, any model containing a `Gemma4MultimodalEmbedder` submodule would fail during PTQ or produce an incomplete Circle model.

Key Design Decisions
- `as_export_module()` returns `self`: unlike more complex wrappers (e.g., `QuantGemma4VisionModel`) that construct a separate export adapter with decomposed forward logic, `QuantGemma4MultimodalEmbedder` is a simple sequential module (RMSNorm → Linear). Its `forward()` is already `torch.export`-compatible, so returning `self` is sufficient and no separate export adapter class is needed.
- No observers on the wrapper itself: quantization is delegated to the wrapped submodules (`embedding_pre_projection_norm` and `embedding_projection`), so `_all_observers()` returns `()`. This is consistent with the design principle that only leaf-level wrappers own observers.
- Registry activation: with the registry entry uncommented, the wrapper is auto-discovered by `prepare()` for any `Gemma4MultimodalEmbedder` submodule.

Changes
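As a rough illustration of the two source changes in this PR (the registry entry and `as_export_module()` returning `self`), here is a minimal, library-free sketch. Every name in it (`_WRAPPER_REGISTRY`, `register`, `Embedder`, `QuantEmbedder`, `LeafWrapper`, and this toy `prepare`) is a hypothetical stand-in and not the TICO implementation; it only shows the pattern of registry-based discovery, observer delegation to leaf wrappers, and a wrapper that serves as its own export module.

```python
from typing import Dict, Tuple

# All names below are hypothetical stand-ins, not the real TICO classes.
_WRAPPER_REGISTRY: Dict[type, type] = {}


def register(fp_cls: type):
    """Map a float module class to its quant wrapper class."""
    def decorator(wrapper_cls: type) -> type:
        _WRAPPER_REGISTRY[fp_cls] = wrapper_cls
        return wrapper_cls
    return decorator


class Norm:
    """Toy RMS normalization over a list of floats."""
    def __call__(self, xs):
        rms = (sum(x * x for x in xs) / len(xs)) ** 0.5 or 1.0
        return [x / rms for x in xs]


class Projection:
    """Toy 'linear projection': scales every element by a fixed weight."""
    def __call__(self, xs):
        return [0.5 * x for x in xs]


class Embedder:
    """Float module: norm followed by projection."""
    def __init__(self):
        self.norm = Norm()
        self.proj = Projection()


class LeafWrapper:
    """Leaf-level wrapper: the only kind that owns observers in this sketch."""
    def __init__(self, fp_module, observer_names):
        self.fp_module = fp_module
        self.observer_names = tuple(observer_names)

    def __call__(self, xs):
        return self.fp_module(xs)

    def _all_observers(self) -> Tuple[str, ...]:
        return self.observer_names


@register(Embedder)  # without this line, prepare() below cannot find the wrapper
class QuantEmbedder:
    def __init__(self, fp: Embedder):
        self.norm = LeafWrapper(fp.norm, ["norm_out"])
        self.proj = LeafWrapper(fp.proj, ["proj_out"])

    def __call__(self, xs):
        return self.proj(self.norm(xs))

    def _all_observers(self) -> Tuple[str, ...]:
        return ()  # observers live on the leaf wrappers, not here

    def as_export_module(self):
        return self  # the forward path is already simple enough to export


def prepare(module):
    """Wrap the module if a wrapper is registered; otherwise return it unchanged."""
    wrapper_cls = _WRAPPER_REGISTRY.get(type(module))
    return wrapper_cls(module) if wrapper_cls is not None else module
```

In this sketch, removing the `@register(Embedder)` line reproduces the first failure mode described under Why: `prepare()` silently returns the unwrapped module.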
- `tico/quantization/wrapq/wrappers/gemma4/quant_multimodal_embedder.py`: Added `as_export_module()` method that returns `self`, enabling Circle export.
- `tico/quantization/wrapq/wrappers/registry.py`: Uncommented the `quant_multimodal_embedder` entry in `_CORE_MODULES`, activating the wrapper for auto-discovery by `prepare()`.
- `test/quantization/wrapq/wrappers/gemma4/test_quant_multimodal_embedder.py`: Added 10 unit tests covering no-quant forward parity, output shape, mode transitions, observer collection, quant mode output finiteness, submodule wrapping, config attributes, `as_export_module` returns self, `as_export_module` forward matches quant forward, and `prepare` integration.
- `test/quantization/wrapq/wrappers/gemma4/test_quantize_multimodal_embedder.py`: Added 3 internal smoke tests (gated behind `RUN_INTERNAL_TESTS=1`): no-quant reference parity, full prepare-calibrate-convert flow, and `as_export_module` export flow.
- `tico/quantization/recipes/debug/wrapper_smoke/cases/gemma4.py`: Added `Gemma4MultimodalEmbedderCase` to the `GEMMA4_CASES` tuple with `build()`, `_sample()`, `calibration_inputs()`, `eval_input()`, `export_module()`, and `export_input()` methods.
- `tico/quantization/wrapq/examples/gemma4/quantize_multimodal_embedder.py`: Added example script demonstrating the full PTQ flow: create tiny model, prepare, calibrate, convert, compute PEIR, and export to Circle format.

Tests
- Unit tests (`test_quant_multimodal_embedder.py`): All pass. Cover wrapper construction, mode transitions, forward parity, observer delegation, `as_export_module`, and `prepare()` integration.
- Internal tests (`test_quantize_multimodal_embedder.py`): All pass with `RUN_INTERNAL_TESTS=1`. Cover no-quant reference parity, full prepare-calibrate-convert flow, and export module flow.

Unit Tests
Internal Tests
Smoke Test
Example Script
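`tico/quantization/wrapq/examples/gemma4/quantize_multimodal_embedder.py`: Demonstrates the complete PTQ quantization pipeline for `Gemma4MultimodalEmbedder`:

- Creates a tiny `Gemma4MultimodalEmbedder` with random weights (no download needed).
- Prepares it with `tico.quantization.prepare()`.
- Calibrates the prepared model.
- Converts it with `tico.quantization.convert()`.
- Computes PEIR against the float reference.
- Exports with `tico.convert()` and saves as `gemma4_multimodal_embedder.q.circle`.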
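The prepare, calibrate, convert, and PEIR steps of this pipeline can be illustrated with a toy, standard-library-only sketch. It does not call TICO: `MinMaxObserver`, `toy_embedder`, and `peir` are hypothetical names, the 8-bit asymmetric fake quantization is a stand-in for whatever the real wrappers do, and PEIR is assumed here to mean peak absolute error divided by the value range of the reference output. The Circle export step is omitted because it depends on `tico.convert()`.

```python
from typing import List, Sequence


class MinMaxObserver:
    """Hypothetical observer: tracks the min/max seen during calibration."""

    def __init__(self) -> None:
        self.lo = float("inf")
        self.hi = float("-inf")
        self.calibrating = True

    def observe(self, xs: Sequence[float]) -> None:
        if self.calibrating:
            self.lo = min(self.lo, min(xs))
            self.hi = max(self.hi, max(xs))

    def fake_quant(self, xs: Sequence[float], bits: int = 8) -> List[float]:
        """Clamp to the calibrated range, snap to a uniform grid, dequantize."""
        levels = 2 ** bits - 1
        scale = (self.hi - self.lo) / levels
        if scale == 0.0:
            scale = 1.0  # degenerate range: avoid division by zero
        out = []
        for x in xs:
            clamped = min(max(x, self.lo), self.hi)
            out.append(self.lo + round((clamped - self.lo) / scale) * scale)
        return out


def toy_embedder(xs: Sequence[float]) -> List[float]:
    """Stand-in for RMSNorm -> Linear: RMS-normalize, then scale by 0.5."""
    rms = (sum(x * x for x in xs) / len(xs)) ** 0.5 or 1.0
    return [0.5 * x / rms for x in xs]


def peir(ref: Sequence[float], test: Sequence[float]) -> float:
    """Assumed definition: peak |ref - test| divided by the range of ref."""
    interval = max(ref) - min(ref)
    peak = max(abs(r - t) for r, t in zip(ref, test))
    return peak / interval if interval else 0.0


# "prepare": attach an observer to the module output
observer = MinMaxObserver()

# "calibrate": run representative inputs so the observer sees the output range
calibration_inputs = [[-1.0, 0.5, 2.0], [0.25, -0.75, 1.5]]
for sample in calibration_inputs:
    observer.observe(toy_embedder(sample))

# "convert": freeze the statistics; from here on outputs are fake-quantized
observer.calibrating = False

# evaluate: compare the float reference with the fake-quantized output
eval_input = [0.5, -0.5, 1.0]
reference = toy_embedder(eval_input)
quantized = observer.fake_quant(reference)
score = peir(reference, quantized)
print(f"PEIR: {score:.6f}")
```

The evaluation input is chosen so that its output stays inside the calibrated range; an input that falls outside it would be clamped and the PEIR would rise sharply, which is why calibration data should cover the expected activation range.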