Skip to content

[EMPSRE-1029] Add options.storage for pre-disk content validation#3

Merged
dwinrick-lever merged 3 commits into
masterfrom
task/EMPSRE-1029-pre-disk-validation
May 27, 2026
Merged

[EMPSRE-1029] Add options.storage for pre-disk content validation#3
dwinrick-lever merged 3 commits into
masterfrom
task/EMPSRE-1029-pre-disk-validation

Conversation

@dwinrick-lever

Copy link
Copy Markdown

Summary

Adds a multer-1.x-style StorageEngine hook so consumers can take over how each file part is persisted — buffer the first few KB, run a magic-byte gate, then either pipe the rest to disk or destroy the part stream. No `fs.createWriteStream` is opened until validation passes.

This unblocks the postings2 side of EMPSRE-1029 (the hook-receiver side landed in lever/hook-receiver#842 via multer's native StorageEngine). postings2 uses connect-multiparty globally; with this option it can apply the same magic-byte gate that hook-receiver does, without an upstream multer migration.

Behavior

Without `options.storage` (default): identical to current behavior. multiparty's `autoFiles=true` shortcut opens the write stream and pipes to `uploadDir`; we emit `'file'` and populate `req.files`. Fully backward-compatible.

With `options.storage`:

  • `autoFields` stays on; text fields still come through as `'field'` events.
  • `autoFiles` is forced off; multiparty emits raw `'part'` events.
  • For each file part, we pre-populate the same `publicFile` shape consumers expect (`fieldName`, `originalFilename`, `name`, `type`, `headers`, `size: 0`) and hand it + the part stream to the storage callback.
  • The storage engine is expected to fill `publicFile.path` and `publicFile.size` before calling `cb(null)`, or destroy the part stream and call `cb(err)` on rejection.

```js
app.use(multipart({
uploadDir: '/tmp',
storage: function (req, part, publicFile, cb) {
// buffer first ~4 KB → validate → pipe to disk or destroy(part)
}
}))
```

Why the in-flight counter

multiparty's `'close'` event fires when every part STREAM has ended, but the storage engine may still be flushing bytes to disk (writeStream finish, S3 upload, etc.) after that. Without tracking pending storage callbacks, `next()` could fire before `publicFile.size` is final. The wrapper now waits for every storage `cb()` to resolve before finalizing.

Test plan

  • `npm test` — 16/16 pass (13 existing + 3 new):
    • storage engine receives the part stream and the prepopulated publicFile
    • storage error → 400 via existing form 'error' handler
    • async storage that completes after multiparty's 'close' is awaited correctly
  • Consumer-side smoke in postings2 once we bump the pin and wire it up

Adds a multer-1.x-style StorageEngine hook so consumers can take over how
each file part is persisted — buffer the first 4 KB, run a magic-byte
gate, then either pipe the rest to disk or destroy the part stream. No
fs.createWriteStream is opened until validation passes.

Without options.storage, behavior is identical to before (multiparty's
autoFiles=true → built-in handleFile writes to uploadDir, 'file' event
populates req.files). This is fully backward-compatible.

With options.storage:
- autoFields stays on so text fields still come through unchanged.
- autoFiles is forced off so multiparty emits raw 'part' events.
- For each part with a filename, we pre-populate the same publicFile
  shape consumers expect (fieldName, originalFilename, name, type,
  headers, size: 0) and hand it + the part stream to the storage
  callback. The storage engine is expected to fill publicFile.path
  and publicFile.size before calling cb(null), or destroy(part) and
  call cb(err) on rejection.
- A per-request in-flight counter ensures connect-multiparty waits for
  every storage callback to resolve before calling next() — multiparty's
  'close' event fires when part STREAMS end, but storage may still be
  flushing bytes to disk after that.

3 new tests cover the happy path, error-translates-to-400, and async
storage that completes after multiparty's 'close'.
@dwinrick-lever dwinrick-lever self-assigned this May 26, 2026
@dwinrick-lever dwinrick-lever requested a review from aerweb May 26, 2026 23:12
Document the maxFilesSize bypass
- multiparty enforces maxFilesSize via LimitStream inside handleFile.
  With autoFiles=false (forced when options.storage is set) file parts
  go to handlePart and the LimitStream is not inserted. Callers who
  rely on the default 100 MB cap need to know that providing a storage
  engine moves size enforcement into their engine. Documented in the
  storage section of the docstring.

Drop parts that arrive after the request has settled
- A storage callback erroring sets done=true (via form.emit('error')
  → existing handler), but multiparty keeps parsing the drained
  request and may emit more 'part' events for parts that hadn't been
  parsed yet. Without a guard those kick off new storage work whose
  results have nowhere to go. Same applies to events buffered past
  finishParse(). Added an `if (done) { part.resume(); return; }` at
  the top of the part handler.

Drop the dead !part.filename branch
- With autoFields=true / autoFiles=false multiparty routes filename-
  less parts to handleField → emits 'field'; only file parts reach
  the 'part' event. The defensive branch was unreachable. Removed.

Tighten cb(err) docstring
- Destroying the part stream doesn't release multiparty's parser
  (parser waits for partStream.on('end'); destroy() doesn't emit
  'end'). It is form.emit('error') in the wrapper that drives the
  response via req.resume() + onFinished. The docstring now says
  destroy is for resource hygiene; form.emit('error') is what releases
  the parser.

Add a field-parts-with-storage test
- The PR docstring promises text fields still come through when
  storage is set. Added a test that mixes .field('user', 'Tobi')
  with .attach('resume', ...) and asserts both req.body.user and the
  file's publicFile payload.
@aerweb

aerweb commented May 27, 2026

Copy link
Copy Markdown

@dwinrick-lever LGTM. However, Claude flagged one small thing.

In storage mode, the part stream has no 'error' listener. When a request aborts mid-part,
  multiparty emits 'error' directly on the part (handleError → errorEventQueue → emit('error'),
   index.js:210-211, 644-646). The default path is safe because handleFile attaches
  fileStream.on('error') (index.js:696), but handlePart and the wrapper attach nothing — so an
  everyday client disconnect could surface as an unhandled 'error' and crash the process (and 2
   of the 3 new tests don't add a part.on('error'), so the "obvious" engine omits it too).

…rash the process

Multiparty's handleFile attaches its own part.on('error') listener
(multiparty/index.js:696). handlePart — the path file parts take when
options.storage is set — does NOT. If the client aborts mid-upload,
multiparty's handleError calls errorEventQueue which directly does
eventEmitter.emit('error', err) on the part stream
(multiparty/index.js:644-645). An unhandled 'error' event crashes the
Node process.

The wrapper now installs a noop part.on('error') listener BEFORE
invoking the storage engine, so engines that forget to attach their
own listener are still safe. Storage engines that want to clean up
their own state on abort can still attach their own listeners on top
— Node EventEmitter allows multiple listeners. The form-level error
handler in the wrapper still drives the 400 response either way.

Also:
- Updated the three other new tests to attach part.on('error') so
  they model the correct pattern for real-world storage engines.
- New regression test asserts the wrapper's listener is present
  BEFORE the storage engine attaches anything of its own. Verified
  by removing the fix and watching this test fail.

@aerweb aerweb left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@dwinrick-lever dwinrick-lever merged commit 7ba7411 into master May 27, 2026
1 check failed
@dwinrick-lever dwinrick-lever deleted the task/EMPSRE-1029-pre-disk-validation branch May 27, 2026 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants