Skip to content

Fix fill values in all Omega fields#428

Open
xylar wants to merge 22 commits into
E3SM-Project:developfrom
xylar:omega/fix-fill-values
Open

Fix fill values in all Omega fields#428
xylar wants to merge 22 commits into
E3SM-Project:developfrom
xylar:omega/fix-fill-values

Conversation

@xylar

@xylar xylar commented Jun 3, 2026

Copy link
Copy Markdown

Motivation

Omega fields declared their fill values locally in each module with inconsistent values (-9.99e30, -9.99E+30, -999, etc.) that did not match the NetCDF-C standard. More critically, the Kokkos arrays backing those fields were never explicitly initialized: inactive ocean layers (below MaxLayerCell) held uninitialized memory rather than a well-defined sentinel, so output files contained garbage rather than a recognizable fill value. Switching from zero-initialization to fill-value initialization also exposed a latent class of bugs where edge and boundary-layer fields assumed zero at mesh boundaries, producing corrupted tendencies when those layers were no longer zero by default.

Summary

Fix fill-value handling across Omega:

  • Add components/omega/src/base/FillValues.h with centralized constexpr fill value constants (FillValueI4, FillValueI8, FillValueR4, FillValueR8, FillValueReal) that exactly match the NetCDF-C NC_FILL_* / SCORPIO PIO_FILL_* values. A type-indexed variable template FillValue<T> provides the primary interface.
  • Remove the FillValue argument from Field::create() (and from Tracers::define() / TracerDefs.inc). The fill value is now deduced automatically from the array's element type at attachData() time.
  • Add auto-fill to Field::attachData<T>(): a private fillWithValue<T>() helper calls Kokkos::deep_copy(InDataArray, FillValue<ValType>) and stores the fill value in FieldMeta["FillValue"] / ["_FillValue"]. An optional FillOnAttach = false argument is provided for the three anti-pattern cases where attachData() is called to re-point a field to an already-computed array (time-level updates in OceanState and Tracers, and the IO time-coordinate attach/assign ordering).
  • Add two edge-masking helpers to VertCoord (zeroEdgeField, applyEdgeLayerMask) that enforce a three-zone invariant on edge fields: layers outside [MinLayerEdgeTop, MaxLayerEdgeBot] hold FillValueReal; boundary layers inside that range but outside [MinLayerEdgeBot, MaxLayerEdgeTop] hold 0; active layers are unchanged.
  • Call applyEdgeLayerMask on NormalVelocity after IC read (in OceanInit) and after each updateVelocityByTend step (in TimeStepper) to keep the three-zone invariant throughout the run.
  • Call zeroEdgeField on NormalVelocityTend and VelocityDel2Aux.Del2Edge at the start of each tendency computation, so boundary layers show 0 rather than FillValueReal as required for flux-type edge fields.
  • Fix VertAdv::computeVerticalPseudoVelocity (DivHU): clamp the inner edge loop to K <= MaxLayerEdgeTop(JEdge) to avoid reading fill values from edges shallower than the owning cell.
  • Add FillValueTest CTest (8 MPI tasks) covering: fill constant equality with NC_FILL_*, attachData auto-fill, inactive-layer fill verification after VertCoord::computePressure, and the three-zone NormalVelocity invariant after IC initialization.
  • Update Field, Tracer developer/user guide docs.

Edge masking approach

Edge fields in Omega span layers [0, NVertLayers), but the valid extent of an edge is determined by its two neighboring cells. Four per-edge layer indices (stored in VertCoord) define the three zones:

Index Meaning
MinLayerEdgeTop shallowest active layer of the shallower neighbor
MinLayerEdgeBot shallowest active layer of the deeper neighbor
MaxLayerEdgeTop deepest active layer of the shallower neighbor
MaxLayerEdgeBot deepest active layer of the deeper neighbor

Zone 1 — fully inactive (K < MinLayerEdgeTop or K > MaxLayerEdgeBot): both neighboring cells are inactive at this layer. These layers hold FillValueReal and must never participate in any computation.

Zone 2 — boundary layer (MinLayerEdgeTop <= K < MinLayerEdgeBot or MaxLayerEdgeTop < K <= MaxLayerEdgeBot): exactly one neighbor is active. For flux-type edge fields (NormalVelocity, NormalVelocityTend, Del2Edge) the physically correct value is 0 because no flux crosses a solid boundary. For thickness-interpolated fields (MeanPseudoThickEdge, FluxPseudoThickEdge) the value is genuinely undefined and retains FillValueReal.

Zone 3 — active (MinLayerEdgeBot <= K <= MaxLayerEdgeTop): both neighbors are active; the field carries a normally computed value.

VertCoord::zeroEdgeField sets zones 1+2 to 0 (used at the start of each tendency step for flux fields where attachData fill cannot be used because the field is recomputed each step). VertCoord::applyEdgeLayerMask imposes the full three-zone pattern (zone 1 → fill, zone 2 → 0, zone 3 → unchanged) and is applied once after IC/restart read and once after each velocity tendency update.

Checklist

  • Documentation:
  • Linting
  • Testing
    • Add a comment to the PR titled Testing with the following:
      • Which machines CTest unit tests
        have been run on and indicate that are all passing.
      • The Polaris omega_pr test suite
        has passed, using the Polaris e3sm_submodules/Omega baseline
      • Document machine(s), compiler(s), and the build path(s) used for -p for both the baseline (Polaris e3sm_submodules/Omega) and the PR build
      • Indicate "All tests passed" or document failing tests
      • Document testing used to verify the changes including any tests that are added/modified/impacted.
    • New tests:
      • CTest unit tests for new features have been added per the approved design.

Fixes #347
Fixes #409

@xylar xylar force-pushed the omega/fix-fill-values branch from 7fc03ea to 8fc6203 Compare June 3, 2026 10:26
@xylar xylar self-assigned this Jun 3, 2026
@xylar xylar added the clean up label Jun 3, 2026
@xylar xylar force-pushed the omega/fix-fill-values branch from 8fc3284 to 6625253 Compare June 3, 2026 11:33
@mwarusz

mwarusz commented Jun 3, 2026

Copy link
Copy Markdown
Member

It looks to me like it should be possible to automatically deduce the right fill value from the array type when calling attachData and not have to specify it when creating fields. What do you think @xylar ?

@xylar xylar force-pushed the omega/fix-fill-values branch from 6625253 to 8cc604f Compare June 3, 2026 15:52
@xylar

xylar commented Jun 3, 2026

Copy link
Copy Markdown
Author

@mwarusz, great idea! I'll see if I can incorporate that quickly into this PR.

@mwarusz

mwarusz commented Jun 3, 2026

Copy link
Copy Markdown
Member

@xylar Here are some suggestions on how to do it cleanly in C++ that you might find useful. You could define the fill values in FillValues.h as follows:

// Primary template intentionally undefined.
template <typename T>
constexpr T FillValue;

template <>
constexpr I4 FillValue<I4> = -2147483647;             ///< NC_FILL_INT 

template<>
constexpr I8 FillValue<I8> = -9223372036854775806LL;  ///< NC_FILL_INT64

...

Then if T is an array the following code gets the fill value:

constexpr auto FillVal = FillValue<typename T::non_const_value_type>;

The main benefit of doing it this way is that you don't have to write a long chain of if constexpr statements and you only need to modify one place when adding a new fill value.

@xylar xylar force-pushed the omega/fix-fill-values branch from 8cc604f to 147144f Compare June 3, 2026 16:40
@xylar

xylar commented Jun 3, 2026

Copy link
Copy Markdown
Author

@mwarusz, I like that. I'll implement it and run some tests, then I'll ask you to review once I've got it working.

@xylar xylar force-pushed the omega/fix-fill-values branch 2 times, most recently from 5fb3358 to 29d8f99 Compare June 3, 2026 17:59
@xylar xylar requested review from alicebarthel and mwarusz June 3, 2026 17:59
@xylar

xylar commented Jun 3, 2026

Copy link
Copy Markdown
Author

@mwarusz and @alicebarthel, I got optimistic about nearly being done and assigned you to review. But I still have some kinks to work out. I'll take this out of draft mode once it's working (hopefully tomorrow) and ping you again for a review.

@xylar xylar force-pushed the omega/fix-fill-values branch 3 times, most recently from fba980b to 079b462 Compare June 6, 2026 16:21
@xylar xylar force-pushed the omega/fix-fill-values branch from 079b462 to 0804a22 Compare June 8, 2026 09:04
@xylar xylar marked this pull request as ready for review June 8, 2026 09:21
@xylar

xylar commented Jun 8, 2026

Copy link
Copy Markdown
Author

Testing

I tested this after rebasing onto develop to bring in #403.

I am using the polaris branch E3SM-Project/polaris#603 for testing to make sure that the higher-order horizontal advection fix still works with fill values.

CTest unit tests:

  • Machine: chrysalis
  • Compiler: oneapi-ifx
  • Build type: Release
  • Result: All tests passed
  • Log: /lcrc/group/e3sm/ac.xylar/polaris_1.0/chrysalis/test_20260608/omega-pr-fix-fill-values/build/ctests.log

Polaris omega_pr suite

  • Baseline workdir: /lcrc/group/e3sm/ac.xylar/polaris_1.0/chrysalis/test_20260607/omega-pr-higher-order-overflow
  • Baseline build: /lcrc/group/e3sm/ac.xylar/polaris_1.0/chrysalis/test_20260607/overflow/build
  • PR build: /lcrc/group/e3sm/ac.xylar/polaris_1.0/chrysalis/test_20260608/omega-pr-fix-fill-values/build
  • PR workdir: /lcrc/group/e3sm/ac.xylar/polaris_1.0/chrysalis/test_20260608/omega-pr-fix-fill-values
  • Machine: chrysalis
  • Partition: compute
  • Compiler: oneapi-ifx
  • Build type: Release
  • Log: /lcrc/group/e3sm/ac.xylar/polaris_1.0/chrysalis/test_20260608/omega-pr-fix-fill-values/polaris_omega_pr.o1230206
  • Result: All tests passed

@xylar

xylar commented Jun 8, 2026

Copy link
Copy Markdown
Author

@mwarusz and @alicebarthel, I know you may both be busy with work related to the All-hands but I would appreciate a review on this.

I had started to implement a proper treatment of the surface pressure -- initially just reading it in from the initial state if it is present. But that work ran into the problem of needing to define a fill value for SurfacePressure. Thus, I decided to work on this first. I will base my SurfacePressure work on this branch but would appreciate getting it in before too long.

At the same time, I recognize that these changes touch a fair bit of code, so a good review may require some time.

I am happy to explain my thinking, particularly regarding the handling of fields on edges.

@xylar xylar added the bug Something isn't working label Jun 8, 2026
@xylar xylar requested a review from brian-oneill June 10, 2026 09:05
@xylar

xylar commented Jun 10, 2026

Copy link
Copy Markdown
Author

@brian-oneill, I also added you as a reviewer. The changes here are extensive enough that it feels like one more pair of eyes on the code would be a good idea.

@cbegeman cbegeman mentioned this pull request Jun 25, 2026
10 tasks
@cbegeman

Copy link
Copy Markdown

I pushed a few commits that seem to work at putting/retaining fill values in all inactive cells of state fields. I'll do some further testing and inspection after rebasing.

xylar and others added 17 commits June 25, 2026 16:26
This merge adds 5 constexpr fill value constants (FillValueI4,
FillValueI8, FillValueR4, FillValueR8, FillValueReal) in namespace
OMEGA, wrapping PIO_FILL_* from SCORPIO.
This merge adds private fillWithValue<T>() template helper
(using if constexpr on T::value_type) and a call to it at the end
of attachData<T>(). Arrays are now auto-filled with the declared
fill value at attach time.
This merge removes all local fill value declarations and replaces
references with the centralized constants.
In some cases we definitely don't want to overwrite fields we attach
with fill values.

Fix 2 such cases in Tracers and OceanState
We don't want fill values at every boundary edge becausee these
are also valid.  Instead, we want zeros except for edges between
two inactive cells (e.g. both adjacent cells are below bathymetry)
Rather than assuming that the NormalVelocity read from an initial
condition has correct masking, we ensure that it has fill values
for inactive layers, zeros at boundary edges (both adjacent to
land and to bathymetry), and leave its values unchanged for active,
non-boundary layers.
We need to ensure that it is zero at active boundary edges.
Co-authored-by: Maciej Waruszewski <mwarusz@igf.fuw.edu.pl>
@cbegeman cbegeman force-pushed the omega/fix-fill-values branch from 1812b58 to 7cad4b5 Compare June 25, 2026 22:46
@cbegeman cbegeman force-pushed the omega/fix-fill-values branch from 7cad4b5 to bd1d09f Compare June 25, 2026 23:48
@cbegeman

Copy link
Copy Markdown

@mwarusz When you have a chance, let me know what you think of the changes I made.

@xylar

xylar commented Jun 26, 2026

Copy link
Copy Markdown
Author

@cbegeman, thanks so much for working on this. Depending on where things stand, I can pick this up again starting on Monday.

@cbegeman

Copy link
Copy Markdown

Testing

Polaris omega_pr suite

  • Baseline workdir: /lcrc/group/e3sm/ac.cbegeman/scratch/MPAS-Ocean-output/chrys/polaris-wind-omega-windforcing-20260622/
  • Baseline build: /lcrc/group/e3sm/ac.cbegeman/scratch/MPAS-Ocean-output/chrys/polaris-wind-omega-windforcing-20260622/build
  • PR build: /lcrc/group/e3sm/ac.cbegeman/scratch/MPAS-Ocean-output/chrys/polaris-fillvalue-omega-fillvalue-20260625/build
  • PR workdir: /lcrc/group/e3sm/ac.cbegeman/scratch/MPAS-Ocean-output/chrys/polaris-fillvalue-omega-fillvalue-20260625
  • Machine: chrysalis
  • Partition: compute
  • Compiler: oneapi-ifx
  • Build type: Release
  • Log: not found
  • Result: All tests passed

I checked that there are fill values in the overflow Temperature field in this suite and not in the baseline run. My interpretation is that the baseline comparison does not fail when comparing fill value with a float.


// ------------------------------------------------------------------
// Test 4: NormalVelocity has the correct 3-zone pattern after IC read
// and applyEdgeLayerMask():

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding additional tests for applyCellLayerMask and applyVertexLayerMask

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working clean up

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ensure that invalid cells and levels are populated with FillValue Invalid data below MaxLayerCell is not being masked as invalid in output

4 participants