Skip to content

task: reset canon length units on program open (fixes #4194)#4196

Draft
grandixximo wants to merge 1 commit into
LinuxCNC:masterfrom
grandixximo:fix/canon-units-reset
Draft

task: reset canon length units on program open (fixes #4194)#4196
grandixximo wants to merge 1 commit into
LinuxCNC:masterfrom
grandixximo:fix/canon-units-reset

Conversation

@grandixximo

Copy link
Copy Markdown
Contributor

Root cause

canon.lengthUnits (in emccanon.cc) is module-global and is only updated when the interpreter emits USE_LENGTH_UNITS, i.e. when it reaches a G20/G21. It is never reset when a program is reopened. So on a re-run, the program-open and setup conversions that happen before the units word are done with the previous run's units (off by 25.4x), while the first run used the machine default.

That difference is small, but it perturbs the entry velocity into the first segments. On the S-curve planner the short-tangent ramp/no-ramp decision is sensitive to the live entry velocity (see the analysis in #4194), so the perturbation flips borderline segments and cascades through the rest of the run. The net effect is that the entire second run commands a different, rougher velocity profile than the first, even though the G-code is identical.

Fix

Reset canon.lengthUnits to the machine default in emcTaskPlanOpen() before reopening the interpreter, so every run starts from the same canon unit state:

double units = GET_EXTERNAL_LENGTH_UNITS();
if (fabs(units - 1.0 / 25.4) < 1.0e-3) {
    USE_LENGTH_UNITS(CANON_UNITS_INCHES);
} else if (fabs(units - 1.0) < 1.0e-3) {
    USE_LENGTH_UNITS(CANON_UNITS_MM);
}

Testing

  • axis_mm_scurve, PLANNER_TYPE=1, dense micro-segment file: before the fix the 2nd+ runs are rough; after the fix every run matches the first.
  • Long single run: smooth start to end, no degradation toward the end after the fix.
  • Present on both 2.9 and 2.10. This PR targets 2.10; a byte-identical back-port to 2.9 will follow once this finalizes, so the periodic 2.9 to 2.10 back-merge stays conflict-free.

@greatEndian I believe this is the source of the run-to-run difference. Could you test it against your axis_mm_scurve dense micro-segment file and confirm both the rerun roughness and any long-run degradation are gone for you?

Fixes #4194

G20/G21 from a previous run persists in canon.lengthUnits, causing
FROM_PROG_LEN() to convert differently on re-run for G-code commands
that appear before the G20/G21 line. This shows up as wrong unit
scaling (off by 25.4x) on the lead-in moves of the second run,
producing jagged velocity in the scope.

Reset canon.lengthUnits to the machine default in emcTaskPlanOpen
before reopening the interpreter.

Fixes LinuxCNC#4194
@BsAtHome

Copy link
Copy Markdown
Contributor

Are there other machine state issues when opening a new file?

@grandixximo

grandixximo commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Good question, it goes to the heart of it. Let me give the concrete mechanism first, then the options, because I think this is a direction call more than a one-line fix.

The trigger is in the gcode:

N10 G90 G94 G17 G91.1 G64 P0.01 R0.0
N15 G20

G64 P is unit-aware: the tolerance is interpreted in whatever length unit is active when the line executes. Here the program sets the tolerance on N10 but does not declare units until N15. So:

  • First run after startup: INIT_CANON() (called from interp.init() at task start) has set canon.lengthUnits to the machine-native unit, mm on this config. So P0.01 = 0.01 mm, a sane tolerance, and it runs smooth. N15 G20 then switches to inches.
  • Second and later runs: the file is re-opened but INIT_CANON() is not re-run on a program open, so canon.lengthUnits is still inches, left over from the previous run's G20. Now P0.01 = 0.254 mm, and on a dense micro-segment path it runs rough. N15 G20 is then a no-op.

So the run-to-run difference is the init-versus-open asymmetry: machine state is re-established to native at interp.init() (startup) but not at program open. The first run starts native; every re-open inherits the previous run's leftover.

To your actual question, are there other machine state issues on opening a new file: yes, structurally. INIT_CANON() resets the whole canon modal set (offsets, plane, motion mode, feed mode and rate, tool offset, tolerances, units), and none of it is re-run on a plain open. Units is the only one that surfaced because it is the only modal that rescales values rather than just selecting a state. The others (plane, distance mode, feed mode, path-control mode) are re-declared by the program's own safety line, N10 here sets G90 G94 G17 G64, so their staleness is overwritten before use. Units would be too, if the program declared G20/G21 before the first unit-dependent word, which this one does not.

Granted the gcode is malformed: units should precede any unit-dependent word, and every controller documents the safe-start line for exactly this reason. But even malformed gcode should run the same way on every run, and right now it does not. That inconsistency is the part I would not want to leave as is, independent of whose fault the gcode is.

A few directions, and I would like your read:

  1. This PR: re-establish machine-native units at program start (the open path), so it matches what interp.init() already does at startup. Minimal, and it makes every run consistent. It does not pretend the gcode is correct; it just defines the start-of-program default as the machine's native unit, the same default the first run already gets.

  2. On top of that, an opt-in [RS274NGC] flag, say RETAIN_G20_G21, to instead persist the modal unit across restart, saved in the var file and restored at init, matching Fanuc and most other controls for users who want that.

  3. Or reconsider the default toward Fanuc-style persistence more broadly. The presence of DISABLE_G92_PERSISTENCE and RETAIN_G43 suggests LinuxCNC's state-persistence model is already a bit uneven compared to other controllers, so this may be worth a wider look than just units.

My own lean is option 1 as the baseline, since it gives consistent behavior with the least disruption and matches the existing boot-to-native model, with option 2 available if retained units are wanted. But you or @andypugh might have a better sense of the historical intent here, so I would rather hear which way you think it should go before I expand the PR.

@BsAtHome

Copy link
Copy Markdown
Contributor

There are three things that come in mind (probably more):

  1. machine modal - the state when the machine starts/power-up
  2. session modal - the state when a user sets up the machine
  3. program modal - the state before and after a program run:
    ** considering re-running the same program
    ** considering loading and running a new program

The real question that needs to be answered is what the user expects at 3 and what we are supposed to provide. The first and second are normally deterministic afaik. However, there is bleed-over into 2 from running a program. What is "session modal" in the context of just having run a program?

Even re-running the same program can result in disaster when you switch units in the middle and don't reset to the previous state. Bad, program code, yes, but should that error plow through your machine?

For a "quick fix", you could restore the session modal state after a program finishes. That would prevent you from "seeing" what exact state the program ended in. But any new run, same or new program, would always see the same state.

@grandixximo

Copy link
Copy Markdown
Contributor Author

The way I read the report now: the bug isn't that the second run is rough, it's that malformed gcode doesn't run consistently bad. Frame it that way and holding the state is the fix, not resetting it. That's my shift, and it's why I land opposite to restoring session modal on stop or end.

Persistence is what every other control does and what an experienced operator expects. The modal state, above all the active WCS and offsets but units too, is set deliberately and relied on staying put. Resetting it out from under the operator after a program ends is its own surprise and its own crash class, and no other control sandboxes a run this way.

On what "session modal" is after a program: the program's end state is the new session state. That's the model on Fanuc and the rest; the operator knows what they ran and what it left active.

On the safety case: a program that flips units mid-run and is re-run from a stale state is bad gcode, and the guard against it is the safe-start line that every post already uses. Having the controller silently reset deliberately-set state to defend against malformed programs trades a rare, self-inflicted problem for a common surprise. Better the bad program run wrong consistently, like on any other control, so the lesson is the usual one: declare your modals at the top.

The defect here exists only because state is re-established to native at startup (INIT_CANON() from interp.init()) but not on a re-open. The consistent answer in the retain direction is to make startup honor the retained unit too, so the first run and every later run see the same state, without ever resetting modals on open or finish. DISABLE_G92_PERSISTENCE and RETAIN_G43 already point this way: persistence with opt-outs, not session resets.

Do you or @andypugh see a reason the historical reset-to-native-at-startup should win over matching the persistence every other control provides? If retain is the agreed direction I'll reshape the PR toward saving and restoring the unit state rather than resetting it.

@grandixximo grandixximo marked this pull request as draft June 24, 2026 14:10
@Sigma1912

Sigma1912 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Persistence is what every other control does and what an experienced operator expects. The modal state, above all the active WCS and offsets but units too, is set deliberately and relied on staying put. Resetting it out from under the operator after a program ends is its own surprise and its own crash class, and no other control sandboxes a run this way.

  1. I very much agree with this comment

  2. There is a fair bit of comment about resetting modal gcodes here:
    https://github.com/Sigma1912/linuxcnc/blob/3dd423905a261b33e2e2ee76245bb3ddea7508b7/src/emc/rs274ngc/interp_convert.cc#L5087-L5124

Note that we have just added an opt-in setting to persist the WCS: #4093

  1. If we wanted to add the G21/G20 reset, shouldn't we add it in Interp::convert_stop (this would also take care point 4):
    https://github.com/Sigma1912/linuxcnc/blob/3dd423905a261b33e2e2ee76245bb3ddea7508b7/src/emc/rs274ngc/interp_convert.cc#L5128

  2. Gremlin preview has just recently been modified to use the currently active modals. If the task interpreter just silently resets G20/G21 then this also needs to be mirrored in the preview interpreter as otherwise the preview might show an incorrect tool path for gcode lacking a proper preamble.
    master gremlin: Initialize preview interpreter with active modal gcodes #4123

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

S-curve planner (PLANNER_TYPE=1): velocity/acceleration "peaking" on dense micro-segment G-code, worse on 2nd+ run of the same file

3 participants