MLE-30241: Apply XXE protections #1941

Open: jonmille wants to merge 2 commits into develop from MLE-30241

@jonmille

Summary

Fixes FIND-005 from the Java Client Security Findings Report (May 14, 2026),
tracked as MLE-30241.

OpenCSVBatcher.write() was creating a DocumentBuilderFactory via
DocumentBuilderFactory.newInstance() with no XXE mitigations, in contrast
to the rest of the library which consistently uses the hardened
XmlFactories.getDocumentBuilderFactory(). This provided an insecure
baseline to any developer who adapts or extends the example (CWE-611,
CVSS 6.1 Medium).
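To illustrate why a bare factory is a weak baseline, the following minimal sketch parses a document with an inline DTD using an unconfigured `DocumentBuilderFactory`. It uses only JDK classes; the class name `DefaultFactoryDemo`, the method `parseWithDefaults`, and the sample XML are hypothetical and not taken from this PR. The entity here is a harmless internal one, but the same DTD processing path is what an external entity (XXE) payload would rely on.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of the MarkLogic Java Client.
class DefaultFactoryDemo {

    // XML with an inline DTD that declares a harmless internal entity.
    static final String XML_WITH_DOCTYPE =
        "<!DOCTYPE row [<!ENTITY e \"expanded\">]><row>&e;</row>";

    // Parses with an unconfigured factory and returns the root element's text.
    static String parseWithDefaults(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // The DOCTYPE is accepted and the entity reference is resolved.
        System.out.println(parseWithDefaults(XML_WITH_DOCTYPE));
    }
}
```

With the JDK's default parser settings, the DOCTYPE is accepted and the entity reference is resolved rather than rejected, which is the behavior the hardened factory is meant to prevent.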

Changes

examples/.../OpenCSVBatcher.java

  • Replaced DocumentBuilderFactory.newInstance() + two set* calls +
    newDocumentBuilder() with a single call to
    XmlFactories.getDocumentBuilderFactory().newDocumentBuilder()
  • Updated imports: removed DocumentBuilderFactory, added XmlFactories

marklogic-client-api/.../OpenCSVBatcherTest.java

  • Added documentBuilderShouldRejectDoctype() test verifying that the
    DocumentBuilder produced by XmlFactories.getDocumentBuilderFactory()
    throws SAXException on XML containing a DOCTYPE declaration, confirming
    XXE / DTD processing is disabled
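The check described above can be sketched without the library or JUnit. This is an assumption-laden illustration: `HardenedFactorySketch`, `newHardenedFactory`, and `rejectsDoctype` are hypothetical names, and the feature settings shown are one common way to disable DTD processing in the JDK parser, not necessarily what `XmlFactories.getDocumentBuilderFactory()` does internally.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical stand-in for a hardened factory helper; not the library's code.
class HardenedFactorySketch {

    // Builds a factory that refuses any DOCTYPE declaration (assumed settings).
    static DocumentBuilderFactory newHardenedFactory() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setFeature(
                "http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.setXIncludeAware(false);
            factory.setExpandEntityReferences(false);
            return factory;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Parser does not support hardening", e);
        }
    }

    // Returns true if parsing the given XML fails with a SAXException.
    static boolean rejectsDoctype(DocumentBuilderFactory factory, String xml) {
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false; // parsed without complaint
        } catch (SAXException e) {
            return true; // DOCTYPE (or other fatal XML error) was rejected
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected failure", e);
        }
    }

    public static void main(String[] args) {
        String withDoctype = "<!DOCTYPE r [<!ENTITY e \"x\">]><r>&e;</r>";
        System.out.println(rejectsDoctype(newHardenedFactory(), withDoctype));
        System.out.println(rejectsDoctype(newHardenedFactory(), "<r>ok</r>"));
    }
}
```

In a real regression test, the factory would come from `XmlFactories.getDocumentBuilderFactory()` and the rejection would be asserted with the test framework's exception assertion; the boolean helper here only keeps the sketch dependency-free.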

Jira Bug: https://progresssoftware.atlassian.net/browse/MLE-30241

Replace bare DocumentBuilderFactory.newInstance() in OpenCSVBatcher.write()
with XmlFactories.getDocumentBuilderFactory()
Copilot AI review requested due to automatic review settings June 17, 2026 18:33

Copilot AI left a comment

Pull request overview

Applies XXE (CWE-611) hardening to the OpenCSV batcher example by switching it to the library’s hardened XML factory, and adds a regression test to ensure DOCTYPE declarations are rejected by the configured DocumentBuilderFactory.

Changes:

  • Updated OpenCSVBatcher.write() to use XmlFactories.getDocumentBuilderFactory() instead of a raw DocumentBuilderFactory.newInstance().
  • Added a JUnit test asserting that parsing XML containing a DOCTYPE fails (verifying XXE/DTD is disabled).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| examples/src/main/java/com/marklogic/client/example/extension/OpenCSVBatcher.java | Switches XML builder creation to the hardened factory used across the client library. |
| marklogic-client-api/src/test/java/com/marklogic/client/test/example/extension/OpenCSVBatcherTest.java | Adds a regression test ensuring DOCTYPE is rejected by the hardened factory. |

Comment thread examples/src/main/java/com/marklogic/client/example/extension/OpenCSVBatcher.java Outdated

@rjrudin rjrudin left a comment

Contributor

Note that deleting these examples is a reasonable approach too. I think the intent of the Splitter API was as part of replacing MLCP with a DMSDK-based solution... but Flux has done that instead, and Flux reuses Spark's CSV parsing instead of having to roll our own.

@rjrudin

rjrudin commented Jun 18, 2026

Contributor

Another reason to delete this example in particular is to get rid of the opencsv dependency, which has had some vulnerabilities associated with it. Those don't impact the Java Client, of course, but they do cause noise in Black Duck.
