Verixa Phase-1 — Upload Document (.docx + .pdf)

3rd creation method · upload Word or PDF · server validation + conversion · then standard authoring · 2026-06-01
Scenario. Lakshmi has an existing SOP authored in Microsoft Word — centrifuge_cal_v3_draft.docx — and a PDF reference document vendor_spec_centrifuge.pdf she wants to attach. She uses the Upload creation method (3rd of 4 options). The flow shows: file picker → server validation (MIME, size, AV scan, macro strip) → conversion (different paths for .docx vs .pdf) → preview converted content → fill metadata → submit for review as usual.
✓ .docx upload (Microsoft Word OOXML)
✓ .pdf upload (read-only attachment)
✓ .txt upload (plain text)
✓ 25 MB max file size
✓ AV scan + MIME validation
✓ Macro stripping (.docx security)
✓ Basic structure preservation (.docx → HTML+JSON)
✗ .doc (legacy binary): not supported
✗ Word round-trip with redlines: Phase-2
✗ Track-changes preservation: dropped on convert
✗ Word comments preservation: dropped on convert
Step legend: See · Click · Fill · Decide · Auto · Done
Part 1 · Lakshmi picks Upload mode
Creation method 3 of 4 (alongside Blank · Template · AI)
Lakshmi · Author (upload path) · roles: quality_lead, document_author
Part 1 · Pick Upload
Step 1 · Click · Open Documents → + New Document → pick Upload mode
4 creation methods in Phase-1: Blank · Template · Upload ✓ · AI. Picks Upload.
Behind the scenes
Permission gate: documents:create + documents:upload. Server returns the upload form with the format whitelist + size limit. Tenant admin can restrict upload mode by role if desired.
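The gate above can be sketched as follows. This is a minimal illustration, not the Verixa implementation: only the two permission names, the format whitelist, and the 25 MB limit come from this flow. The function name, the `UploadForm` type, the role-restriction parameters, and reading 25 MB as 25 × 1024 × 1024 bytes are all assumptions.

```python
from dataclasses import dataclass

# Permission names are taken from the flow; everything else is hypothetical.
REQUIRED_PERMISSIONS = frozenset({"documents:create", "documents:upload"})


@dataclass(frozen=True)
class UploadForm:
    accepted_extensions: tuple
    max_bytes: int


def open_upload_form(user_permissions, user_roles=(), upload_roles_allowed=None):
    """Return the upload form config, or raise PermissionError."""
    missing = REQUIRED_PERMISSIONS - set(user_permissions)
    if missing:
        raise PermissionError(f"missing permissions: {sorted(missing)}")
    # Optional tenant restriction: when set, only the listed roles may upload.
    if upload_roles_allowed is not None:
        if not set(user_roles) & set(upload_roles_allowed):
            raise PermissionError("upload mode is restricted for this role")
    return UploadForm(
        accepted_extensions=(".docx", ".pdf", ".txt"),
        max_bytes=25 * 1024 * 1024,  # assumes binary megabytes
    )
```

Leaving `upload_roles_allowed` as `None` models a tenant that has not restricted Upload mode.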
Step 2 · See · Upload form opens with format guidance
Accepted formats: .docx · .pdf · .txt
Max file size: 25 MB
Not accepted: .doc (legacy) · .odt · .rtf · .html · .md
What's preserved: Headings · paragraphs · basic tables · images (extracted)
What's stripped/dropped: Macros · track-changes · Word comments · text boxes · SmartArt
Step 3 · Click · Click Choose File · OS file picker opens
Part 2 · Server validation pipeline
5 mandatory checks before content is accepted
Lakshmi · picks file · role: document_author
Part 2 · Upload
Step 4 · Click · Selects centrifuge_cal_v3_draft.docx
File: centrifuge_cal_v3_draft.docx · 1.2 MB · Microsoft Word · last modified 2 days ago
Step 5 · Auto · Server runs validation pipeline (5 checks)
Validation pipeline · all must pass
  1. Format whitelist check · extension matches accepted list (.docx ✓) · PASS
  2. MIME type validation · server reads file bytes · checks magic number matches extension (prevents .exe renamed .docx) · PASS
  3. File size limit · 1.2 MB < 25 MB cap · PASS
  4. Antivirus scan · ClamAV (or equivalent) full-content scan · PASS
  5. Macro detection · scan for VBA macros · strip if present · PASS
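The five checks can be sketched as an ordered pipeline that stops at the first failure. This is an illustration under assumptions, not the production validator: the function and check names are hypothetical, `av_scan` is a placeholder callable standing in for ClamAV or an equivalent scanner, and macro handling is reduced to detecting a `vbaProject.bin` part (stripping is not shown).

```python
import io
import zipfile
from pathlib import PurePath

ACCEPTED_EXTENSIONS = {".docx", ".pdf", ".txt"}
MAX_BYTES = 25 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
# Expected leading bytes per extension; .txt has no signature to check.
MAGIC_PREFIX = {".docx": b"PK\x03\x04", ".pdf": b"%PDF-"}


def magic_matches(ext, data):
    """Check the file's leading bytes against its claimed extension."""
    if not data.startswith(MAGIC_PREFIX.get(ext, b"")):
        return False
    # .docx is a ZIP container, so it must also look like a valid ZIP archive.
    return ext != ".docx" or zipfile.is_zipfile(io.BytesIO(data))


def has_vba_project(data):
    """True if the OOXML package contains a vbaProject.bin part."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return any(n.lower().endswith("vbaproject.bin") for n in package.namelist())


def validate_upload(filename, data, av_scan=lambda payload: True):
    """Run the five checks in order; return (check_name, passed) pairs."""
    ext = PurePath(filename).suffix.lower()
    checks = [
        ("format_whitelist", lambda: ext in ACCEPTED_EXTENSIONS),
        ("mime_magic", lambda: magic_matches(ext, data)),
        ("size_limit", lambda: len(data) <= MAX_BYTES),
        ("av_scan", lambda: av_scan(data)),
        ("macro_detection", lambda: ext != ".docx" or not has_vba_project(data)),
    ]
    results = []
    for name, check in checks:
        passed = bool(check())
        results.append((name, passed))
        if not passed:
            break  # later checks never see a file that already failed
    return results
```

Running the checks lazily means the macro scan only opens the package after the magic-number check has confirmed it is a ZIP archive.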
Behind the scenes · failure responses
Each check has its own failure response; the first four return a distinct HTTP error code:
  • Format wrong → 415 UNSUPPORTED_MEDIA_TYPE
  • MIME mismatch → 400 FILE_MIME_MISMATCH
  • Size exceeded → 413 FILE_TOO_LARGE
  • AV detected threat → 422 FILE_FAILED_AV_SCAN + alert security team
  • Macros present → quarantine for review (or auto-strip + warn user — tenant config)
Failed uploads are not retried; user gets specific error message + corrective action.
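The failure table above could be encoded as a lookup like the one below. The status codes and error names are taken from the list; the check names, the response shape, and the `macro_policy` values are hypothetical. No HTTP status is stated for the macro case, so the sketch leaves it unset and returns only the tenant-configured action.

```python
# (HTTP status, error code) per failed check, as listed in the flow.
FAILURES = {
    "format_whitelist": (415, "UNSUPPORTED_MEDIA_TYPE"),
    "mime_magic": (400, "FILE_MIME_MISMATCH"),
    "size_limit": (413, "FILE_TOO_LARGE"),
    "av_scan": (422, "FILE_FAILED_AV_SCAN"),
}

MACRO_POLICIES = ("quarantine", "strip_and_warn")  # hypothetical config values


def failure_response(check, macro_policy="quarantine"):
    """Map a failed check to the response the user (and security team) receives."""
    if check == "macro_detection":
        if macro_policy not in MACRO_POLICIES:
            raise ValueError(f"unknown macro policy: {macro_policy!r}")
        # The flow gives no HTTP status for macros, so none is invented here.
        return {"status": None, "code": None, "action": macro_policy,
                "alert_security": False, "retry": False}
    status, code = FAILURES[check]
    return {
        "status": status,
        "code": code,
        "action": "reject",
        "alert_security": check == "av_scan",  # AV hits also alert the security team
        "retry": False,  # failed uploads are not retried
    }
```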
Part 3 · Conversion path (depends on file type)
.docx becomes editable draft · .pdf becomes read-only attachment
.docx Path · Editable
DOCX conversion
Word OOXML → HTML+JSON
Step 6a · Auto · Convert OOXML → internal HTML+JSON
Behind the scenes
Pandoc (or similar) parses .docx structure. Output: HTML body + JSON metadata. Stored in documents.content + content_hash. Original .docx kept as source_attachment for audit (immutable original).
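A minimal sketch of the storage step, assuming the converter is passed in as a callable that returns an HTML body and a metadata dict. The `ConvertedDocument` type, the JSON envelope, and the choice of SHA-256 for content_hash are assumptions; the flow only states that content, content_hash, and the immutable original are stored.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvertedDocument:
    content: str              # serialized HTML body + JSON metadata
    content_hash: str         # hex digest of `content`
    source_attachment: bytes  # untouched original .docx, kept for audit


def store_conversion(original_docx, convert):
    """Convert the upload and record the content hash next to the original.

    `convert` is a placeholder for Pandoc or a similar converter; it must
    return an (html, metadata) pair.
    """
    html, metadata = convert(original_docx)
    # sort_keys keeps the serialization stable, so equal content hashes equally.
    content = json.dumps({"html": html, "meta": metadata}, sort_keys=True)
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return ConvertedDocument(content, digest, original_docx)
```

Hashing the serialized content rather than the source file keeps the two fingerprints separate: one tracks the editable draft, the other the original upload.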
Step 7a · See · What survived conversion
✓ Kept
  • Headings (H1–H6)
  • Paragraph text
  • Basic tables
  • Numbered/bullet lists
  • Bold/italic/underline
  • Images (extracted)
✗ Dropped
  • Track-changes
  • Word comments
  • Macros (VBA)
  • Text boxes
  • SmartArt
  • Custom fonts
  • Complex columns
Conversion complete. 6 sections preserved · 1 SmartArt diagram in §4 could not be converted (placeholder inserted).
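One way a conversion log could flag the dropped features is to inspect the OOXML package before conversion. The sketch below is a heuristic under assumptions: it relies on the part names and element names Word commonly writes (`word/comments.xml`, `word/diagrams/`, `w:ins`, `w:del`, `w:txbxContent`), uses plain byte matching instead of XML parsing, and the function name is hypothetical. Custom fonts and complex columns are not covered.

```python
import io
import zipfile


def dropped_features(docx_bytes):
    """List features in the package that the conversion is expected to drop."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        names = package.namelist()
        body = package.read("word/document.xml") if "word/document.xml" in names else b""

    found = []
    # Tracked insertions/deletions carry attributes, hence the trailing space.
    if b"<w:ins " in body or b"<w:del " in body:
        found.append("Track-changes")
    if "word/comments.xml" in names:
        found.append("Word comments")
    if any(n.lower().endswith("vbaproject.bin") for n in names):
        found.append("Macros (VBA)")
    if b"<w:txbxContent" in body:
        found.append("Text boxes")
    if any(n.startswith("word/diagrams/") for n in names):
        found.append("SmartArt")
    return found
```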
Step 8a · See · Converted content rendered in editor
Same authoring UI as Blank/Template mode · Lakshmi can edit, fix the SmartArt placeholder, refine.
.pdf Path · Read-only
PDF attachment
read-only · not editable
Step 6b · Auto · PDF stored as read-only attachment
Behind the scenes
PDF is not converted to editable content · stored as a binary attachment to a thin document record. Document has metadata + viewer-link only. documents.content_type='pdf_attachment'.
Step 7b · See · PDF rendered in viewer · cannot edit
Used for: vendor specs · regulatory docs (FDA guidance PDFs) · external standards. Lakshmi fills only metadata + summary text; the PDF itself is fixed.
Step 8b · See · Different downstream behavior
PDF docs still go through review & release · but reviewers comment on metadata only (not content) · effective state has same cascade (distribution + training).
Part 4 · Fill metadata · save · submit for review
From here · same as Blank/Template flows
Lakshmi · completes upload setup · role: document_author
Part 4 · Complete
Step 9 · Fill · Fill metadata (same as Blank/Template mode)
Document type: SOP
Module: Manufacturing
Title *: "Centrifuge Calibration Protocol — Antibiotic Line"
Document ID: SOP-MFG-019 (auto-minted)
Created via: Upload (.docx) · provenance preserved
Source attachment: centrifuge_cal_v3_draft.docx (kept for audit)
Behind the scenes · provenance preservation
documents.created_via = 'upload' · source_file_name · source_file_hash · conversion_log all permanently stored. Inspectors 5 years later can ask "where did this document come from?" and get a complete answer.
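The provenance fields above can be sketched as an immutable record. The four field names come from this flow; the `Provenance` type, the helper functions, and the use of SHA-256 for source_file_hash are assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Provenance:
    created_via: str
    source_file_name: str
    source_file_hash: str
    conversion_log: Tuple[str, ...]


def record_provenance(file_name, file_bytes, conversion_log):
    """Capture where an uploaded document came from, at upload time."""
    return Provenance(
        created_via="upload",
        source_file_name=file_name,
        source_file_hash=hashlib.sha256(file_bytes).hexdigest(),
        conversion_log=tuple(conversion_log),  # tuple so the log cannot be appended to
    )


def matches_source(provenance, candidate_bytes):
    """Check whether a file presented later is the original upload."""
    return hashlib.sha256(candidate_bytes).hexdigest() == provenance.source_file_hash
```

With the hash stored, an inspector can be handed the preserved .docx and confirm it is byte-for-byte the file Lakshmi uploaded.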
Step 10 · Fill · Edit converted content if needed
Lakshmi fixes the SmartArt placeholder by typing a replacement diagram description · cleans up table formatting that converted imperfectly · adds 2 missing references.
Step 11 · Click · Save Draft
"SOP-MFG-019 v1 saved as draft. Content hash recorded. Source .docx preserved as attachment."
Step 12 · Click · Submit for Review (picks Priya + Vikram)
"Submitted. State: under_review. Reviewers notified."
Behind the scenes · from here, identical
Same downstream as Blank/Template flows: SoD-12-01 reviewer enforcement · inline comments + iteration loop · multi-reviewer approval · Anita signs release · 3-state lifecycle (under_review → released → effective) · WP-2 cascade fires at effective_date · distribution + training + supersede.

The ONLY thing different about upload-originated documents: the audit trail has created_via='upload' and a source_attachment link to the original .docx for inspection purposes.
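The forward path through the lifecycle can be sketched as a small transition table. The three lifecycle states are named in this flow and the draft state is inferred from the Save Draft step; the function name and the date handling are assumptions, and the review iteration loop, sign-offs, and cascade are deliberately left out. Nothing in the table depends on created_via, which mirrors the point that upload-originated documents follow the same path.

```python
from datetime import date

# Forward-only transitions; rework loops during review are not modeled.
NEXT_STATE = {
    "draft": "under_review",
    "under_review": "released",
    "released": "effective",
}


def advance(state, today=None, effective_date=None):
    """Return the next state, holding at 'released' until the effective date."""
    if state not in NEXT_STATE:
        raise ValueError(f"no transition from {state!r}")
    target = NEXT_STATE[state]
    if target == "effective":
        if today is None or effective_date is None:
            raise ValueError("today and effective_date are required")
        if today < effective_date:
            return state  # released, but not yet effective
    return target
```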
DONE · Uploaded SOP enters review · proceeds through standard happy path · becomes effective on signed date.
Upload modes side by side
Aspect | .docx (Word) | .pdf | .txt
Editable after upload? | ✓ Yes (converted to HTML+JSON) | ✗ No (read-only attachment) | ✓ Yes (plain text)
Best for | Migrating existing SOPs · onboarding | Vendor specs · external regulatory docs | Quick clean migration
Headings preserved | ✓ | N/A (not converted) | Limited (no formatting)
Tables preserved | ✓ (basic) | N/A |
Images preserved | ✓ (extracted) | In PDF, viewable only |
Macros | Stripped (security) | N/A | N/A
Track-changes | Dropped (Phase-2 will preserve) | N/A | N/A
Review process | Same (comment on converted content) | Limited (metadata only) | Same
Source file kept? | ✓ As attachment (audit) | ✓ The PDF IS the doc | ✓ As attachment
From here on the flow | Standard review/release | Standard review/release (metadata only) | Standard review/release