Modular PDF Engine
Overview
A modular JSON→PDF generation engine designed to guarantee a consistent corporate identity (CI) across all services. The system runs as a console‑based microservice: it reads JSON from piped stdin, renders PDFs through service‑specific plugins, and writes Base64‑encoded output to stdout. This makes it easy to embed in web servers, cron jobs, continuous‑integration pipelines, and legacy PHP systems.
Motivation
Previously, each service generated its PDFs independently, which duplicated logic and produced inconsistent branding. This project establishes a single, CI‑enforced pipeline in which each service extends functionality safely through its own plugin, without modifying the core engine.
Features
- Console‑based microservice (stdin → Base64 PDF on stdout)
- Strict CI layout (headers, footers, margins, typography)
- Plugin architecture via dynamically loaded JARs
- Jakarta Validation for schema enforcement + deterministic exit codes
- Sanitization layer for XSS/HTML safety
- Apache PDFBox rendering
- Automated unit, integration, and performance tests
- Optional AES‑GCM encryption
Architecture
The engine follows a modular pipeline: validate → sanitize → identify plugin → verify and load plugin → render → optionally encrypt → encode → output. The core enforces CI rules, security checks, and error handling; plugins implement only their content logic.
Data Flow
1. Input Handling
- JSON via stdin or file
- Supports Base64 and Base64URL output
2. Validation & Sanitization
- Jakarta Validation checks structure
- Sanitization removes unsafe content
- Each failure class maps to a fixed exit code, which simplifies debugging
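The idea of deterministic exit codes can be sketched as follows. This is a minimal illustration, not the engine's actual table: only the success value 0 appears in the documented output format, and every other name and number here (`ExitCode`, `PipelineException`, `ExitCodes.run`, the codes 10 to 30) is a hypothetical placeholder.

```java
/** Hypothetical exit-code table; only 0 (success) is confirmed by the output format. */
enum ExitCode {
    OK(0),
    INVALID_JSON(10),
    VALIDATION_FAILED(11),
    UNSAFE_CONTENT(12),
    UNKNOWN_PLUGIN(20),
    PLUGIN_INTEGRITY_FAILED(21),
    RENDER_FAILED(30);

    private final int code;

    ExitCode(int code) { this.code = code; }

    int code() { return code; }
}

/** Carries the exit code chosen by the stage that failed. */
final class PipelineException extends RuntimeException {
    private final ExitCode exitCode;

    PipelineException(ExitCode exitCode, String message) {
        super(message);
        this.exitCode = exitCode;
    }

    ExitCode exitCode() { return exitCode; }
}

final class ExitCodes {
    private ExitCodes() {}

    /** Runs a stage and returns 0 on success, or the failing stage's fixed code. */
    static int run(Runnable stage) {
        try {
            stage.run();
            return ExitCode.OK.code();
        } catch (PipelineException e) {
            return e.exitCode().code();
        }
    }

    public static void main(String[] args) {
        int code = run(() -> {
            throw new PipelineException(ExitCode.VALIDATION_FAILED, "missing field: title");
        });
        System.out.println("exit code " + code);
    }
}
```

Because the same failure always yields the same number, a caller such as a PHP wrapper can branch on the code without parsing log text.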
3. Plugin Lookup
- Extracts plugin type from input
- Verifies plugin integrity (HMAC)
- Loads plugin classes via reflection
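The integrity check in this step could look roughly like the sketch below. It assumes HMAC‑SHA256 and a tag stored alongside the JAR; the source does not state the hash function, where the key lives, or how tags are distributed, so `PluginIntegrity`, `hmacSha256`, and `isTrusted` are illustrative names only.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: checks a plugin JAR's bytes against an expected HMAC tag. */
final class PluginIntegrity {
    private PluginIntegrity() {}

    /** Computes HMAC-SHA256 over the data (the algorithm choice is an assumption). */
    static byte[] hmacSha256(byte[] key, byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 is not available", e);
        }
    }

    /** Returns true only if the recomputed tag matches the expected one. */
    static boolean isTrusted(byte[] key, byte[] jarBytes, byte[] expectedTag) {
        // MessageDigest.isEqual compares in constant time, which avoids leaking
        // how many leading bytes of a forged tag were correct.
        return MessageDigest.isEqual(hmacSha256(key, jarBytes), expectedTag);
    }

    public static void main(String[] args) {
        byte[] key = "demo-key".getBytes(StandardCharsets.UTF_8);            // placeholder secret
        byte[] jar = "pretend JAR contents".getBytes(StandardCharsets.UTF_8); // stands in for file bytes
        byte[] tag = hmacSha256(key, jar);

        System.out.println("untampered trusted: " + isTrusted(key, jar, tag));
        jar[0] ^= 1; // flip one bit to simulate tampering
        System.out.println("tampered trusted:   " + isTrusted(key, jar, tag));
    }
}
```

The check has to run on the JAR bytes before any class from that JAR is loaded; verifying after loading would already have executed static initializers from untrusted code.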
4. Execution Model
Plugins must extend an abstract PDF class and implement:
byte[] execute(String inputJson)
Returning raw bytes lets each plugin choose its own rendering library, while the abstract base class supplies the CI layout.
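A minimal sketch of this contract and of reflective instantiation is shown below. Only the `execute` signature comes from the description above; `PdfPlugin`, `EchoPlugin`, and `PluginLoader` are hypothetical names. The sketch resolves classes from the current classpath, whereas the real engine loads them from separately deployed JARs.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the abstract PDF class that plugins extend. */
abstract class PdfPlugin {
    /** Signature taken from the description above. */
    abstract byte[] execute(String inputJson);
}

/** Toy plugin for illustration: returns the input bytes instead of a real PDF. */
final class EchoPlugin extends PdfPlugin {
    @Override
    byte[] execute(String inputJson) {
        return inputJson.getBytes(StandardCharsets.UTF_8);
    }
}

final class PluginLoader {
    private PluginLoader() {}

    /** Instantiates a plugin by class name, rejecting classes that are not plugins. */
    static PdfPlugin load(String className) {
        try {
            Class<?> raw = Class.forName(className);
            // asSubclass throws ClassCastException if the class does not extend PdfPlugin.
            Class<? extends PdfPlugin> type = raw.asSubclass(PdfPlugin.class);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load plugin: " + className, e);
        }
    }

    public static void main(String[] args) {
        PdfPlugin plugin = load(EchoPlugin.class.getName());
        byte[] out = plugin.execute("{\"type\":\"echo\"}");
        System.out.println("plugin returned " + out.length + " bytes");
    }
}
```

Checking the type with `asSubclass` before instantiation means an arbitrary class name in the input cannot cause the loader to construct something that is not a plugin.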
5. Rendering
- Plugin produces raw PDF bytes
- Core can optionally encrypt (AES‑GCM)
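The optional encryption step might be implemented along these lines with the JDK's built‑in AES‑GCM cipher. The source names only the algorithm; the 12‑byte random nonce, the 128‑bit tag, the nonce‑prefixed message layout, and the class name `PdfEncryptor` are assumptions made for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: AES-GCM with the nonce stored in front of the ciphertext. */
final class PdfEncryptor {
    private static final int NONCE_BYTES = 12; // assumed layout: 12-byte nonce prefix
    private static final int TAG_BITS = 128;   // assumed: full-length authentication tag
    private static final SecureRandom RANDOM = new SecureRandom();

    private PdfEncryptor() {}

    /** Returns nonce || ciphertext || tag. The key must be 16, 24, or 32 bytes. */
    static byte[] encrypt(byte[] key, byte[] pdfBytes) {
        byte[] nonce = new byte[NONCE_BYTES];
        RANDOM.nextBytes(nonce); // a nonce must never repeat under the same key
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, nonce));
            byte[] sealed = cipher.doFinal(pdfBytes); // ciphertext with the tag appended
            return ByteBuffer.allocate(nonce.length + sealed.length)
                    .put(nonce).put(sealed).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-GCM encryption failed", e);
        }
    }

    /** Reverses encrypt(); fails if the message was modified or the key is wrong. */
    static byte[] decrypt(byte[] key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, message, 0, NONCE_BYTES));
            return cipher.doFinal(message, NONCE_BYTES, message.length - NONCE_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-GCM decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero demo key; never use a fixed key in practice
        byte[] pdf = "%PDF-1.7 demo".getBytes(StandardCharsets.UTF_8);
        byte[] sealed = encrypt(key, pdf);
        System.out.println("round trip ok: " + Arrays.equals(pdf, decrypt(key, sealed)));
    }
}
```

GCM authenticates as well as encrypts, so a recipient that decrypts successfully also knows the document was not altered in transit.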
6. Output Response
{
  "data": "<base64>",
  "exitcode": 0
}
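Building this response needs only the JDK's `java.util.Base64`. The sketch below is one possible shape: the field names `data` and `exitcode` come from the format above, while the class `OutputEnvelope` and the boolean switch between Base64 and Base64URL are assumptions about how the output mode might be selected.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that wraps PDF bytes in the JSON response shown above. */
final class OutputEnvelope {
    private OutputEnvelope() {}

    /**
     * Encodes the bytes and builds the response by hand. This is safe without a
     * JSON library because neither Base64 alphabet contains a character that
     * needs escaping inside a JSON string.
     */
    static String toJson(byte[] pdfBytes, int exitCode, boolean urlSafe) {
        Base64.Encoder encoder = urlSafe ? Base64.getUrlEncoder() : Base64.getEncoder();
        String data = encoder.encodeToString(pdfBytes);
        return "{\"data\":\"" + data + "\",\"exitcode\":" + exitCode + "}";
    }

    public static void main(String[] args) {
        byte[] header = "%PDF".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toJson(header, 0, false)); // prints {"data":"JVBERg==","exitcode":0}
    }
}
```

The two modes differ only in the last two alphabet characters: standard Base64 uses `+` and `/`, Base64URL uses `-` and `_`, which matters when the payload is placed in a URL or a filename.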
7. Logging
- Metadata and errors only (no PDF content)
Architecture Diagram
JSON (stdin)
   │
┌──┴───────────────────┐
│ Validate & Sanitize  │
└──┬───────────────────┘
   │ Determine plugin
   ▼
┌──────────────────────┐
│ Plugin Lookup + HMAC │
└──┬───────────────────┘
   │ Reflection load
   ▼
┌──────────────────────┐
│ Plugin Execution     │
└──┬───────────────────┘
   │ PDF bytes
   ▼
┌──────────────────────┐
│ Encode (Base64)      │
└──┬───────────────────┘
   │
   ▼
JSON Output + Logging
Key Decisions
Reflection‑based Plugin Architecture
- Enables new PDF formats without redeploying the core
- Reduces coupling between services
Abstract CI Base Class
- Global enforcement of headers, footers, margins, typography
- Plugins that build on the base class cannot override CI‑critical elements
HMAC Verification
- Prevents untrusted JARs from being executed
Optional AES‑GCM Encryption
- Allows secure distribution of generated documents
Structured Output Format
- Predictable integration with PHP, Node.js, Java, and legacy tooling
Limitations / Lessons Learned
- Raw PDF byte output provides flexibility but also allows plugins to bypass CI rules if misused.
- A future version could use a declarative content API to enforce layout more strictly.
- JVM startup time dominates the runtime for small documents; a persistent (long‑running) mode would avoid paying that cost on every request.
My Contributions
- Designed and implemented the entire plugin architecture
- Built three production‑ready plugins (Base, List, EPT)
- Developed sanitization + validation logic
- Implemented deterministic exit code system
- Created custom reflection‑based plugin loader
- Designed all architecture, system, and class diagrams
- Built complete unit, integration, and performance test suites
- Managed deployment and stakeholder communication
Post‑Project Enhancements
- Added HMAC integrity checks after initial delivery
- Improved sanitization for escaped JSON edge cases
- Refactored plugin loader for faster startup
- Strengthened CI enforcement layer
- Added Base64URL output mode
Impact
A single unified PDF engine replaced multiple inconsistent in‑house generators, reduced maintenance costs, and ensured strict CI compliance across all departments.