Replace bare DocumentBuilderFactory.newInstance() in OpenCSVBatcher.write() with XmlFactories.getDocumentBuilderFactory()
Pull request overview
Applies XXE (CWE-611) hardening to the OpenCSV batcher example by switching it to the library’s hardened XML factory, and adds a regression test to ensure DOCTYPE declarations are rejected by the configured DocumentBuilderFactory.
Changes:
- Updated `OpenCSVBatcher.write()` to use `XmlFactories.getDocumentBuilderFactory()` instead of a raw `DocumentBuilderFactory.newInstance()`.
- Added a JUnit test asserting that parsing XML containing a DOCTYPE fails (verifying XXE/DTD is disabled).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| examples/src/main/java/com/marklogic/client/example/extension/OpenCSVBatcher.java | Switches XML builder creation to the hardened factory used across the client library. |
| marklogic-client-api/src/test/java/com/marklogic/client/test/example/extension/OpenCSVBatcherTest.java | Adds a regression test ensuring DOCTYPE is rejected by the hardened factory. |
rjrudin left a comment
Note that deleting these examples is a reasonable approach too. I think the intent of the Splitter API was as part of replacing MLCP with a DMSDK-based solution... but Flux has done that instead, and Flux reuses Spark's CSV parsing instead of having to roll our own.
Another reason to delete this example in particular is to get rid of the opencsv dependency, which has had some vulnerabilities associated with it. Those don't impact the Java Client of course, but they do cause noise in Black Duck.
Summary
Fixes FIND-005 from the Java Client Security Findings Report (May 14, 2026),
tracked as MLE-30241.
`OpenCSVBatcher.write()` was creating a `DocumentBuilderFactory` via `DocumentBuilderFactory.newInstance()` with no XXE mitigations, in contrast to the rest of the library, which consistently uses the hardened
`XmlFactories.getDocumentBuilderFactory()`. This provided an insecure baseline to any developer who adapts or extends the example (CWE-611,
CVSS 6.1 Medium).
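To make the difference between a bare and a hardened factory concrete, here is a minimal sketch of the kind of configuration a hardened `DocumentBuilderFactory` typically carries. It uses only JDK classes and standard parser feature URIs. The class `HardenedFactorySketch` and the method `newHardenedFactory()` are hypothetical names for illustration; this is not the implementation of `XmlFactories.getDocumentBuilderFactory()`, whose exact settings are not shown in this PR.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class HardenedFactorySketch {

    // Hypothetical helper for illustration only; NOT the library's XmlFactories code.
    public static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        // A bare newInstance() leaves DTD and entity processing at parser defaults.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Reject any document that contains a DOCTYPE declaration.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        // Defense in depth: secure processing on, external entities off.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }
}
```

Centralizing these settings in one factory method is what lets call sites such as the example shrink to a single `newDocumentBuilder()` call, and it avoids each caller having to remember the full list of features.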
Changes
| File | Change |
|---|---|
| `examples/.../OpenCSVBatcher.java` | Replaced `DocumentBuilderFactory.newInstance()` + two `set*` calls + `newDocumentBuilder()` with a single call to `XmlFactories.getDocumentBuilderFactory().newDocumentBuilder()`. Removed the `DocumentBuilderFactory` import, added `XmlFactories`. |
| `marklogic-client-api/.../OpenCSVBatcherTest.java` | Added a `documentBuilderShouldRejectDoctype()` test verifying that the `DocumentBuilder` produced by `XmlFactories.getDocumentBuilderFactory()` throws `SAXException` on XML containing a DOCTYPE declaration, confirming XXE / DTD processing is disabled. |
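The check performed by the regression test can be sketched without JUnit or the MarkLogic client on the classpath. The sketch below is an approximation under stated assumptions: `DoctypeRejectionSketch`, `newBuilder()`, and `rejects()` are hypothetical names, and `newBuilder()` stands in for `XmlFactories.getDocumentBuilderFactory().newDocumentBuilder()` by setting only the standard `disallow-doctype-decl` feature. The actual test in this PR is a JUnit test and may be structured differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class DoctypeRejectionSketch {

    // Assumption: stand-in for the library's hardened factory, JDK-only.
    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory.newDocumentBuilder();
    }

    // Returns true when parsing the given XML fails with a SAXException.
    public static boolean rejects(String xml) throws ParserConfigurationException, IOException {
        try {
            newBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Classic XXE payload: the DOCTYPE declares an external entity.
        String withDoctype = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
            + "<foo>&xxe;</foo>";
        String plain = "<foo>bar</foo>";

        System.out.println("DOCTYPE rejected: " + rejects(withDoctype));
        System.out.println("plain XML rejected: " + rejects(plain));
    }
}
```

Checking a plain document alongside the malicious one guards against a factory that is simply broken and rejects everything; the DOCTYPE case should fail while ordinary XML still parses.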
Jira Bug: https://progresssoftware.atlassian.net/browse/MLE-30241