Add NMR data model, integration with EnzymeML, and peak assignment features by torogi94 · Pull Request #13 · NMRPy/nmrpy

torogi94 · 2025-01-20T15:13:43Z

Add md-models-based research data model for NMR
- Collect NMR parameters from instrument metadata
- Record processing steps preformed in NMRpy
- Save both the raw data and the latest version of the processed data
- Assign species from a list or from an EnzymeML document to the peaks
Add EnzymeML integration using pyenzyme library
- Load EnzymeML document into a FidArray
- Have EnzymeML species available within Fid object
- Save calculated concentrations to and return the EnzymeML document
Add widgets to assign peaks either on the entire FidArray or single Fid objects
- Use EnzymeML species if document available
- Pass list of EnzymeML species if no document has been loaded yet
- Use list of simple strings instead of working with EnzymeML standard
- Use only a slice of Fids when assigning on the entire FidArray by passing an index list

jmrohwer · 2025-01-20T19:56:48Z

@torogi94 Thanks for this!
I need some time to look through this. But from a brief inspection the following come to mind:

Dependencies: utils.py imports sympy as well as pyenzyme, yet none of them are declared in requirements.txt. We should discuss dependencies and which of these are needed for core functionality. One option is to have optional dependencies for certain functionalities. Pyenzyme itself has a multitude of dependencies. I am not keen for the core NMRPy library to have a huge list of dependences. Associated with that the version requirement dependency hell.
Related: how stable are pyenzyme, md-models and the other dependencies from your group? E.g. Pyenzyme vs. Pyenzyme2? My concern is long-term maintenance.
The merge conflict needs to be resolved.
Looking at data_objects.py as well as plottling.py there are a very large number of changes relating to linting and formatting (e.g. single vs. double quotes, linebreaks etc.). Are these linting changes done in a separate commit? As it is, it's very tedious to separate the real code changes from "cosmetic" linting.

torogi94 · 2025-01-23T15:48:59Z

Thank you for looking over it! Here, a few comments:

Dependencies: utils.py imports sympy as well as pyenzyme, yet none of them are declared in requirements.txt. We should discuss dependencies and which of these are needed for core functionality. One option is to have optional dependencies for certain functionalities. Pyenzyme itself has a multitude of dependencies. I am not keen for the core NMRPy library to have a huge list of dependences. Associated with that the version requirement dependency hell.

Good catch! I really did forget to declare the new dependencies in the requirements.txt file. I think you are right that we should not necessarily introduce them to the core of NMRpy. How about we declare them as optional nmrpy[enzymeml]? For the NMRpy data model, however, it would be easier if we could make pydantic a core dependency. If I am not mistaken, it is the only one required for the data model to work. If not, I can restructure my code, so the data model part becomes entirely optional, too, though.

Related: how stable are pyenzyme, md-models and the other dependencies from your group? E.g. Pyenzyme vs. Pyenzyme2? My concern is long-term maintenance.

We are preparing a new stable pyenzyme (v2) release which we can then pin in the requirements. In my experience, it is already running better and will be more maintainable than the previous v1 version. With regard to md-models, the NMRpy data model is not dependent at all on the md-models version. Actually, it is not even a dependency needed, as all functionalities are handled via pydantic, which is itself a very well maintained library!

The merge conflict needs to be resolved.

Absolutely. I’ll merge main into my branch and fix the conflict manually so you can have a clean merge without conflicts!

Looking at data_objects.py as well as plottling.py there are a very large number of changes relating to linting and formatting (e.g. single vs. double quotes, linebreaks etc.). Are these linting changes done in a separate commit? As it is, it's very tedious to separate the real code changes from "cosmetic" linting.

Sorry for the noisy diff, it seems I accidentally let my linter loose on the two files and did not even notice... I’ll revert the linting changes from this PR and submit them in a separate PR (like discussed).

jmrohwer · 2025-01-24T07:34:00Z

Good catch! I really did forget to declare the new dependencies in the requirements.txt file. I think you are right that we should not necessarily introduce them to the core of NMRpy. How about we declare them as optional nmrpy[enzymeml]? For the NMRpy data model, however, it would be easier if we could make pydantic a core dependency. If I am not mistaken, it is the only one required for the data model to work. If not, I can restructure my code, so the data model part becomes entirely optional, too, though.

I agree with making pydantic a core dependency and the others as optional in nmrpy[enzymeml].

On another note. I also have a conda build recipe as well as a CI job that builds and distributes a conda package. This has been very easy to date because NMRPy is a pure python package and there are/were conda packages available for all of the dependencies (channel conda-forge). Is PyEnzyme released as a conda package? I would like to continue distributing conda packages, just as we do for PySCeS as well.

We are preparing a new stable pyenzyme (v2) release which we can then pin in the requirements. In my experience, it is already running better and will be more maintainable than the previous v1 version. With regard to md-models, the NMRpy data model is not dependent at all on the md-models version. Actually, it is not even a dependency needed, as all functionalities are handled via pydantic, which is itself a very well maintained library!

Okay great.

Sorry for the noisy diff, it seems I accidentally let my linter loose on the two files and did not even notice... I’ll revert the linting changes from this PR and submit them in a separate PR (like discussed).

Give me a ping once you are ready with all of this then I'll proceed with the detailed code review 😄

torogi94 · 2025-01-29T09:46:32Z

I think I have addressed all points mentioned so far, so you may continue with the detailed code review now!

On another note. I also have a conda build recipe as well as a CI job that builds and distributes a conda package. This has been very easy to date because NMRPy is a pure python package and there are/were conda packages available for all of the dependencies (channel conda-forge). Is PyEnzyme released as a conda package? I would like to continue distributing conda packages, just as we do for PySCeS as well.

I am in contact with @JR-1991 about the conda-forge release. While it was not planned initially, he says we can make it happen! I will come back to you about this when we are preparing the PyPI release of v2. For now, the GitHub version is still used in the new optional requirements for nmrpy[enzymeml]. Of course, I will change that to the stable PyPI version as soon as it is available!

torogi94 · 2025-05-05T15:21:49Z

I have addressed a few more issues we discussed in pull requests #14 and #15 and merged them with this pull request:

With Add measurement handling #14, handling of multiple EnzymeML measurements is now possible.
With Rework data handling #15, data handling within NMRpy has been optimized to keep memory requirements and computation times low.

torogi94 · 2025-08-27T14:01:12Z

@jmrohwer: In #18, I have added handling of t0 values in MeasurementData of EnzymeML documents to the library. Initial values can now be set using FidArray.add_t0_to_enzymeml(). Both an option for interactive assignment using the new T0Adder widget and script-like handling when setting gui=False are available. The user has the option to either using the existing t1 values as t0, if no initials have been measured, or provide measured t0 values instead. Furthermore, an offset value to account for delays in measurements can be applied to the time array.

Copilot

Pull request overview

This PR introduces an NMR-focused research data model aligned with md-models/EnzymeML, adds EnzymeML integration (via optional pyenzyme), and extends interactive widgets for peak assignment and measurement editing.

Changes:

Added NMRpy/EnzymeML-aligned data model specifications and an auto-generated Pydantic schema (JSON-LD capable).
Implemented EnzymeML helpers (species lookup, measurement construction, applying concentrations) and new peak/t0/measurement widgets.
Updated packaging/tooling (optional extras for EnzymeML, dependency bumps, Ruff config) and added/extended tests.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 18 comments.

Show a summary per file

File	Description
specifications/nmrpy.md	Adds written spec for the NMRpy data model objects/enums.
specifications/datamodel_schema.md	Adds Mermaid class diagram for the data model structure.
setup.py	Adds EnzymeML optional extras and updates author metadata.
ruff.toml	Introduces Ruff formatting/lint configuration.
requirements.txt	Updates baseline dependencies and adds pydantic.
nmrpy/utils.py	Adds EnzymeML utility helpers (species, measurements, serialization helpers).
nmrpy/tests/nmrpy_tests.py	Adds tests for new data model + EnzymeML-related setters.
nmrpy/plotting.py	Adds peak assignment + t0/measurement widgets and integrates EnzymeML utilities.
nmrpy/nmrpy_model.py	Adds generated Pydantic models for the NMRpy data model (JSON-LD fields).
nmrpy/data_objects.py	Integrates data model + EnzymeML into Fid/FidArray and processing steps tracking.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-17T15:49:47Z

+from typing import Mapping, Optional
+
+from pyenzyme.versions.v2 import Protein


This unconditional pyenzyme import will raise at import-time when pyenzyme isn't installed, even though the rest of the file treats it as an optional dependency. Move Protein into the existing try: block (or avoid importing it directly and use getattr(pyenzyme, ...) / isinstance(..., pyenzyme.Protein) inside the guarded section).

@torogi94 Please look into this and confirm that you agree. Only usage of Protein is in line 1758. How will this affect the functionality if pyenzyme is not installed? Does the PeakAssigner class depend on pyenzyme? I'm assuming so, see

nmrpy/nmrpy/data_objects.py

Lines 1467 to 1475 in 1928d7c

if (pyenzyme is None) and (isinstance(species_list, EnzymeMLDocument)):

raise RuntimeError(

'The `pyenzyme` package is required to use NMRpy with an EnzymeML document. Please install it via `pip install nmrpy[enzymeml]`.'

)

self._assigner_widget = PeakAssigner(

fid=self,

species_list=species_list,

title='Assign species for {}'.format(self.id),

)

Thus seems okay to move the Protein import into the existing try/except block. Done.

Copilot · 2026-02-17T15:49:47Z

+            self.__fid_object.raw_data = [str(datum) for datum in self.raw_data]
+            self.__fid_object.processed_data = [float(datum) for datum in self.data]
+        except AttributeError:


float(datum) will raise TypeError when self.data contains complex values (common before .real()), but only AttributeError is caught here. Update serialization logic to handle complex processed data (e.g., store strings for complex values) and/or catch TypeError to avoid hard crashes when accessing fid_object.

Suggested change

self.__fid_object.raw_data = [str(datum) for datum in self.raw_data]

self.__fid_object.processed_data = [float(datum) for datum in self.data]

except AttributeError:

# Always serialise raw_data as strings

self.__fid_object.raw_data = [str(datum) for datum in self.raw_data]

# Serialise processed_data, handling complex or otherwise non-floatable values gracefully

processed_data = []

for datum in self.data:

try:

processed_data.append(float(datum))

except (TypeError, ValueError):

# Fallback for complex or non-scalar values: store string representation

processed_data.append(str(datum))

self.__fid_object.processed_data = processed_data

except (AttributeError, TypeError):

Calling float(datum) on a complex128 will not raise a TypeError but just issue a ComplexWarning and discard the imaginary part. Are we going to just ignore this error message? The point is, do you want the FIDObject to handle complex data in the processed_data list? This is an object that got introduced with the data model.

- Added T0Tab and T0Logic classes to utils.py - Added T0Adder widget class to plotting.py - Added add_t0_to_enzymeml() method that can be used interactively with a Jupyter widget using gui=True or script-like with gui=False

Changed behaviour of Fid._setup_peak_objects(). Multiple peaks per range are handled properly now by using Fid._grouped_peaklist instead of Fid.peaks. Also refactored the method to make it more robust and readable overall.

- Added NMR parameters to fid_object setter - Fixed p0 and p1 phasing parameter assignment

- Added tests for all of utils.py - Added tests for new widgets in plotting.py - Removed unused imports from nmrpy_tests.py

- Added information on optional installs to installation.rst - Added data model information to intro.rst - Added utils.py information to new utility_objects.rst - Updated index.rst - Described new features in quickstart.rst - Added screenshots to _static/

jmrohwer · 2026-06-19T15:25:10Z

@torogi94 Before merge, we actually need to deal with the Copilot comments. I've been through the code and they have not yet been applied. Most of them seem useful and true, and indeed correct errors, so I'd appreciate it if you could go through them yourself as well. The issue of p0/p1 in radians vs. degrees I need to investigate a bit further myself but don't have the time right now.

jmrohwer · 2026-06-23T10:16:34Z

@torogi94 I have dealt with most of the remaining issues from the Copilot review, but there are some outstanding ones that you need to deal with - I have mentioned you in the comments. Please review the changes in the last commit (79f1ff5) if you agree.

Also FYI I updated the build system to use pyproject.toml and get rid of the deprecated setup.py. Been meaning to do this for a while.

…ditional classifiers

jmrohwer · 2026-06-24T19:55:13Z

3 more commits to fix compatibility bugs. Numpy 2.5 has many deprecations, so the nmrglue import crashes out with a bug. There has not been a new release yet (the last one was 2 years ago!) so for the time being we have to get the requirement from github.

Furthermore another numpy 2.4 (which numpy are you using @torogi94 ?) deprecation caused one of the tests to fail. Fixed.

Also fixed another bug that manifested if pyenzyme was not installed.

Checked that all tests (except plotting) now pass when pyenzyme is installed and when not. In a new environment with the latest version of all packages.

jmrohwer · 2026-06-25T14:55:45Z

Updated the scripts for Anaconda build and for CI. The tests are split out now that first the tests run without pyenzyme installed and then the remaining ones after installing pyenzyme. All passes if you look at the CI runs so it should be good to go 🤔
Except for the Copilot reviews that still need resolving as discussed.

torogi94 added the enhancement label Jan 20, 2025

torogi94 requested a review from jmrohwer January 20, 2025 15:13

torogi94 assigned jmrohwer Jan 20, 2025

torogi94 force-pushed the data-model-refactor branch 2 times, most recently from 8182bfc to 9f55722 Compare November 13, 2025 11:51

Copilot AI review requested due to automatic review settings February 17, 2026 15:37

Copilot started reviewing on behalf of torogi94 February 17, 2026 15:43 View session

Copilot AI reviewed Feb 17, 2026

View reviewed changes

torogi94 and others added 16 commits March 14, 2026 13:10

Initial commit

1b8f283

Minor changes

d15d298

Implement IdentityAssigner for one FID object

9a85080

Update generate_api.yaml

b67d327

Update generate_api.yaml

8a76be5

Update generate_api.yaml

d90ad72

Update generate_api.yaml

0ab36e6

Update generate_api.yaml

59c1b4a

Implement IdentityAssigner for entire FIDArray

c85239d

Update data model

b70add9

Minor bug fix

2391d9d

Add interface with EnzymeML

c545190

Update IdentityAssigner

bdd6e4d

Update generate_api.yaml

d08af42

Update generate_api.yaml

4ee3f87

Update generate_api.yaml

c13f22d

torogi94 and others added 11 commits March 14, 2026 13:17

Upgrade to pyenzyme v2

50f06aa

Add t0 handling (#18)

46173be

- Added T0Tab and T0Logic classes to utils.py - Added T0Adder widget class to plotting.py - Added add_t0_to_enzymeml() method that can be used interactively with a Jupyter widget using gui=True or script-like with gui=False

Fix copy() error in plotting.py

c2b6398

Add explicit super init to FidArray

f57ca3f

Update range handling in Fid (#20)

e96ff4d

Changed behaviour of Fid._setup_peak_objects(). Multiple peaks per range are handled properly now by using Fid._grouped_peaklist instead of Fid.peaks. Also refactored the method to make it more robust and readable overall.

Add attribute checks for data model existence

c797d04

Fix faulty argument to T0Adder widget

0f097ec

Add data model test suites

ca89e34

Add data model property unit tests + fixes

787d7bb

Fix parameter mapping to data model

19624c0

- Added NMR parameters to fid_object setter - Fixed p0 and p1 phasing parameter assignment

Add proper single NMR in array edge case handling

27ef6e3

torogi94 force-pushed the data-model-refactor branch from ccb2087 to 27ef6e3 Compare March 14, 2026 12:19

torogi94 added 4 commits March 18, 2026 14:54

Update references to deprecated PyEnzyme features

976e7a7

Add unit tests for new features

09c56f3

- Added tests for all of utils.py - Added tests for new widgets in plotting.py - Removed unused imports from nmrpy_tests.py

Replace double ticks with single ticks

ef864a4

jmrohwer added 3 commits June 23, 2026 10:58

convert build to use pyproject.toml

457f7fb

bump version to 1.0.0.dev0

1928d7c

apply fixes from Copilot code review

79f1ff5

jmrohwer added 5 commits June 23, 2026 12:55

update README.md with extended description and pyproject.toml with ad…

d366d45

…ditional classifiers

remove License from classifiers

edd6626

fix nmrglue dependency

4c67c68

fix nmrglue dependency

070a27b

fix numpy 2.4 deprecation bug and various typos

5183f3b

jmrohwer added 2 commits June 24, 2026 22:34

fix more imports when pyenzyme is not installed

848ac6e

update Anaconda and CI build scripts

2e3380c

		from typing import Mapping, Optional

		from pyenzyme.versions.v2 import Protein

	if (pyenzyme is None) and (isinstance(species_list, EnzymeMLDocument)):
	raise RuntimeError(
	'The `pyenzyme` package is required to use NMRpy with an EnzymeML document. Please install it via `pip install nmrpy[enzymeml]`.'
	)
	self._assigner_widget = PeakAssigner(
	fid=self,
	species_list=species_list,
	title='Assign species for {}'.format(self.id),
	)

-            self.__fid_object.raw_data = [str(datum) for datum in self.raw_data]
-            self.__fid_object.processed_data = [float(datum) for datum in self.data]
-        except AttributeError:
+            # Always serialise raw_data as strings
+            self.__fid_object.raw_data = [str(datum) for datum in self.raw_data]
+            # Serialise processed_data, handling complex or otherwise non-floatable values gracefully
+            processed_data = []
+            for datum in self.data:
+                try:
+                    processed_data.append(float(datum))
+                except (TypeError, ValueError):
+                    # Fallback for complex or non-scalar values: store string representation
+                    processed_data.append(str(datum))
+            self.__fid_object.processed_data = processed_data
+        except (AttributeError, TypeError):

Uh oh!

Conversation

torogi94 commented Jan 20, 2025

Uh oh!

jmrohwer commented Jan 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

torogi94 commented Jan 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jmrohwer commented Jan 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

torogi94 commented Jan 29, 2025

Uh oh!

torogi94 commented May 5, 2025

Uh oh!

torogi94 commented Aug 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

jmrohwer Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

jmrohwer Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jmrohwer commented Jun 19, 2026

Uh oh!

jmrohwer commented Jun 23, 2026

Uh oh!

jmrohwer commented Jun 24, 2026

Uh oh!

jmrohwer commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jmrohwer commented Jan 20, 2025 •

edited

Loading

torogi94 commented Jan 23, 2025 •

edited

Loading

jmrohwer commented Jan 24, 2025 •

edited

Loading

torogi94 commented Aug 27, 2025 •

edited

Loading

jmrohwer Jun 23, 2026 •

edited

Loading