Modernizing the KDE Documentation Pipeline: dblatex to Apache FOP (Season of KDE 2026)
A technical deep dive into migrating KDE’s documentation build system from dblatex to Apache FOP, including toolchain challenges, rendering differences, and lessons learned during the transition.
Introduction
Season of KDE is a mentorship program organized by the KDE community, aimed at introducing contributors to real-world open-source development. Unlike short-term contribution drives, it focuses on sustained collaboration, code review discipline, and meaningful improvements to KDE’s ecosystem.
KDE itself is a large open-source community that develops a wide range of software, including desktop environments, frameworks, and developer tools. Its documentation infrastructure plays a critical role in ensuring users and contributors can understand and build the software effectively.
During my Season of KDE project, I am working on modernizing KDE’s documentation PDF generation pipeline by migrating it from dblatex to Apache FOP.
This migration replaces a LaTeX-based processing chain with an XSL-FO-based system, with the aim of improving maintainability and simplifying the build process.
Why Migrate from dblatex?
KDE’s documentation pipeline previously relied on dblatex, which converts DocBook XML into LaTeX before producing a PDF. While functional, this approach had several long-term drawbacks.
One major concern was maintainability. dblatex is no longer actively maintained, which makes relying on it increasingly risky for a large and evolving project like KDE. Unmaintained tooling can introduce compatibility issues over time and may expose the build system to unresolved security vulnerabilities.
The pipeline also depended heavily on the LaTeX ecosystem, increasing dependency complexity and making debugging more difficult. Rendering problems often required tracing issues across multiple transformation layers, which slowed down iteration and maintenance.
For a project with a large documentation footprint, depending on an aging and unmaintained toolchain was not sustainable. A more actively maintained and modular solution was needed.
Why Apache FOP?
Unlike dblatex, Apache FOP works directly with XSL Formatting Objects (XSL-FO). Since KDE’s documentation system already relied on XSL stylesheets for generating HTML output, adopting FOP allowed the PDF pipeline to remain within the same transformation ecosystem.
The new pipeline simplifies to:
DocBook XML → XSL-FO → PDF (via Apache FOP)
This removes the intermediate LaTeX layer entirely.
By staying within the XSL-based workflow, the build process becomes more consistent and easier to reason about. It reduces dependency complexity, avoids the need for a full TeX toolchain, and aligns PDF generation with the existing stylesheet infrastructure.
Additionally, Apache FOP is actively maintained, making it a more sustainable long-term choice for a project of KDE’s scale.
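One way to see the difference in dependency surface is to check which command-line tools each pipeline needs on `PATH`. The tool lists below are assumptions for illustration: a typical dblatex setup drives `xsltproc` and a LaTeX engine such as `pdflatex`, while the FOP route needs `xsltproc`, the `fop` launcher and a Java runtime. They are not KDE's actual build requirements, and `missing_tools` is a hypothetical helper.

```python
import shutil
from typing import Callable, Optional

# Assumed executables per pipeline (illustrative, not KDE's real requirements).
PIPELINE_TOOLS = {
    "dblatex": ["xsltproc", "dblatex", "pdflatex"],
    "fop": ["xsltproc", "fop", "java"],
}


def missing_tools(pipeline: str,
                  which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the tools required by `pipeline` that cannot be found on PATH."""
    return [tool for tool in PIPELINE_TOOLS[pipeline] if which(tool) is None]


if __name__ == "__main__":
    for name in PIPELINE_TOOLS:
        missing = missing_tools(name)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{name:8} {status}")
```

A `PATH` check only shows the top level. A working dblatex setup also needs a TeX distribution with the LaTeX packages its output uses, which is where much of the dependency weight sits.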
Architectural Differences
Although both dblatex and Apache FOP ultimately produce PDF output from DocBook XML, their internal architectures differ significantly.
dblatex Pipeline
The dblatex-based workflow introduces an intermediate LaTeX layer:
DocBook XML → LaTeX → PDF
In this model:
- DocBook is transformed into LaTeX
- LaTeX is compiled using the TeX toolchain
- The final PDF is generated via LaTeX compilation
This approach tightly couples PDF generation to the LaTeX ecosystem. While powerful, it introduces a large dependency surface and an extra abstraction layer that can complicate debugging and customization.
Apache FOP Pipeline
The Apache FOP workflow removes the LaTeX layer entirely:
DocBook XML → XSL-FO → PDF (via FOP)
Here:
- DocBook is transformed using XSL stylesheets into XSL-FO
- Apache FOP processes the FO file directly to generate PDF
Because KDE already relied on XSL stylesheets for HTML output, this approach keeps the transformation logic within the same ecosystem. The result is a more consistent and modular build pipeline.
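The two stages can be sketched as plain command invocations. This is a minimal sketch, not KDE's actual build code. It assumes `xsltproc` for the XSLT step and the stock DocBook XSL FO stylesheet URI, resolved to a local copy through an XML catalog. KDE's real setup layers its own stylesheets on top of that. `build_commands` and `run_pipeline` are hypothetical helper names.

```python
import subprocess
from pathlib import Path
from typing import Optional

# Assumption: stock DocBook XSL FO stylesheet, normally resolved through an XML
# catalog; a project-specific customization layer would be passed here instead.
DOCBOOK_FO_XSL = "http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"


def build_commands(docbook: str, pdf: str,
                   fo_xsl: str = DOCBOOK_FO_XSL,
                   fop_conf: Optional[str] = None) -> list[list[str]]:
    """Return the two commands: DocBook XML -> XSL-FO, then XSL-FO -> PDF."""
    fo_file = str(Path(pdf).with_suffix(".fo"))
    # Stage 1: the XSLT processor turns DocBook into an XSL-FO document.
    xslt = ["xsltproc", "-o", fo_file, fo_xsl, docbook]
    # Stage 2: FOP lays out the FO document and renders the PDF.
    fop = ["fop"]
    if fop_conf is not None:
        fop += ["-c", fop_conf]  # font and renderer configuration (fop.xconf)
    fop += ["-fo", fo_file, "-pdf", pdf]
    return [xslt, fop]


def run_pipeline(docbook: str, pdf: str, **kwargs) -> None:
    """Run both stages, stopping at the first command that fails."""
    for cmd in build_commands(docbook, pdf, **kwargs):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in build_commands("index.docbook", "manual.pdf", fop_conf="fop.xconf"):
        print(" ".join(cmd))
```

Keeping the stages separate leaves the intermediate `.fo` file on disk. This makes it easier to tell whether a layout problem comes from the stylesheets or from FOP. FOP can also run the XSLT step itself through its `-xml` and `-xsl` options. The split form simply makes each stage's output visible.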
Key Differences
| Aspect | dblatex | Apache FOP |
|---|---|---|
| Intermediate Format | LaTeX | XSL-FO |
| Dependency Ecosystem | TeX toolchain | Java-based FOP engine |
| Maintenance Status | Unmaintained | Actively maintained |
| Debugging Complexity | Higher: issues may arise in the LaTeX conversion or in TeX compilation | Lower: a single XSL-FO intermediate, rendered directly by FOP |
| Ecosystem Alignment | Separate from HTML toolchain | Shares XSL infrastructure |
The migration is therefore not just a tool replacement but an architectural simplification.
Technical Challenges During Migration
The first major issue appeared immediately after switching to Apache FOP: fonts broke.
PDFs generated through FOP did not render text correctly: some characters were missing, spacing was inconsistent, and certain documents did not render properly. The root cause was simple but critical: there was no fop.xconf configuration file and no custom XSL layer handling font definitions.
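Unlike dblatex, which relies on the TeX ecosystem and its default font handling, Apache FOP requires explicit font configuration. Without it, FOP falls back to the standard PDF base-14 fonts (Helvetica, Times, Courier, Symbol and ZapfDingbats). These fonts are not embedded, and their text faces cover little beyond Western European characters. Glyphs they lack are typically replaced with a "#" placeholder.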
This meant that:
- Fonts were not embedded
- Certain character sets rendered incorrectly
- Documentation in languages outside the default fonts' character coverage rendered unreliably
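A quick way to predict which documents will hit this problem is to look for characters outside what FOP's fallback fonts cover. The usable text repertoire of the base-14 fonts is roughly WinAnsiEncoding. The sketch below approximates that with Python's `cp1252` codec. This is a rough heuristic and an assumption for illustration, not part of the KDE pipeline. It scans raw source text, so entity references are not expanded. `unsupported_chars` and `report` are hypothetical helpers.

```python
import sys
import unicodedata
from pathlib import Path

# Approximation (assumption): cp1252 stands in for the WinAnsi character set
# of FOP's default base-14 text fonts.
FALLBACK_ENCODING = "cp1252"


def unsupported_chars(text: str, encoding: str = FALLBACK_ENCODING) -> list[str]:
    """Return distinct characters `encoding` cannot represent, sorted by code point."""
    missing = set()
    for ch in set(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.add(ch)
    return sorted(missing)


def report(path: Path) -> None:
    """Print each character in a UTF-8 source file that the fallback fonts likely lack."""
    for ch in unsupported_chars(path.read_text(encoding="utf-8")):
        print(f"{path}: U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        report(Path(arg))
```

A document that reports no characters may still look fine with the defaults. Any hit is a strong hint that it needs an explicitly configured, embedded Unicode font.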
Introducing fop.xconf
To resolve this, a fop.xconf configuration file was introduced. This file explicitly defines:
- Font directories
- Font families
- Embedding rules
- Rendering options
By configuring FOP to recognize and embed the required fonts, PDF generation became stable and predictable.
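The shape of such a file can be sketched by generating a minimal fop.xconf with Python's standard library. The element names (`renderer`, `fonts`, `directory`, `font`, `font-triplet`) follow FOP's documented configuration format. The font paths, the DejaVu family and the choice of subset embedding are assumptions for illustration, not KDE's actual configuration. `build_fop_config` is a hypothetical helper, and both registration mechanisms (directory scan and explicit font entries) are shown only for comparison.

```python
import xml.etree.ElementTree as ET

# Assumed font locations and family; adjust to the fonts actually installed.
FONT_DIRS = ["/usr/share/fonts/truetype/dejavu"]
FONTS = [
    # (embed-url, family, style, weight)
    ("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", "DejaVu Sans", "normal", "normal"),
    ("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", "DejaVu Sans", "normal", "bold"),
]


def build_fop_config(font_dirs, fonts) -> str:
    """Return a minimal fop.xconf that registers fonts for the PDF renderer."""
    fop = ET.Element("fop", version="1.0")
    renderers = ET.SubElement(fop, "renderers")
    pdf = ET.SubElement(renderers, "renderer", mime="application/pdf")
    fonts_el = ET.SubElement(pdf, "fonts")
    # Fonts found under these directories are registered automatically.
    for directory in font_dirs:
        ET.SubElement(fonts_el, "directory", recursive="true").text = directory
    # Explicit entries map one font file to a family/style/weight triplet.
    for embed_url, family, style, weight in fonts:
        font = ET.SubElement(fonts_el, "font", {
            "embed-url": embed_url,
            "kerning": "yes",
            "embedding-mode": "subset",  # embed only the glyphs actually used
        })
        ET.SubElement(font, "font-triplet", name=family, style=style, weight=weight)
    ET.indent(fop)  # Python 3.9+
    return ET.tostring(fop, encoding="unicode")


if __name__ == "__main__":
    print(build_fop_config(FONT_DIRS, FONTS))
```

The resulting file is passed to FOP with its `-c` option, for example `fop -c fop.xconf -fo manual.fo -pdf manual.pdf`. For the fonts to be used, the FO output must also request the registered family name, which is what the stylesheet layer controls.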
Customizing the XSL Layer
In addition to font configuration, a custom.xsl stylesheet was introduced to refine formatting behavior. This allowed control over layout elements such as:
- Page margins
- Heading styles
- Code block formatting
- Table rendering
Since the pipeline already relied on XSL transformations, integrating these adjustments remained consistent with the overall architecture.
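A customization layer of this kind usually imports the base DocBook FO stylesheet and overrides its parameters and attribute sets. The sketch below generates one with Python's standard library. The parameter and attribute-set names are standard DocBook XSL 1.x customization points:

- `page.margin.*`
- `body.font.family`
- `monospace.font.family`
- `table.frame.border.thickness`
- `section.title.level1.properties`
- `monospace.verbatim.properties`

The values, the imported stylesheet URI and the helper `build_custom_xsl` are assumptions for illustration rather than KDE's actual custom.xsl.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)

# Assumption: stock DocBook XSL FO stylesheet, resolved through an XML catalog.
BASE_XSL = "http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"

# Illustrative values only.
PARAMS = {
    "page.margin.inner": "2cm",                # page margins
    "page.margin.outer": "2cm",
    "body.font.family": "DejaVu Sans",         # must match a family FOP knows
    "monospace.font.family": "DejaVu Sans Mono",
    "table.frame.border.thickness": "0.5pt",   # table rendering
}
ATTRIBUTE_SETS = {
    "section.title.level1.properties": {"font-size": "18pt"},  # heading style
    "monospace.verbatim.properties": {"font-size": "8pt"},     # code blocks
}


def xsl(tag: str) -> str:
    """Qualify a tag name with the XSLT namespace."""
    return f"{{{XSL_NS}}}{tag}"


def build_custom_xsl(base_href, params, attribute_sets) -> str:
    """Return an XSLT customization layer overriding the imported defaults."""
    root = ET.Element(xsl("stylesheet"), version="1.0")
    # xsl:import must be the first child; local definitions take precedence.
    ET.SubElement(root, xsl("import"), href=base_href)
    for name, value in params.items():
        ET.SubElement(root, xsl("param"), name=name).text = value
    # Same-named attribute sets merge with the imported ones; ours win on conflicts.
    for set_name, props in attribute_sets.items():
        attr_set = ET.SubElement(root, xsl("attribute-set"), name=set_name)
        for prop, value in props.items():
            ET.SubElement(attr_set, xsl("attribute"), name=prop).text = value
    ET.indent(root)  # Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_custom_xsl(BASE_XSL, PARAMS, ATTRIBUTE_SETS))
```

The generated layer is passed to the XSLT step in place of the stock stylesheet. The font family names it requests have to match families registered in fop.xconf for FOP to find and embed them.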
Ongoing Work
The migration is still in progress.
Stylesheet refinement and full pipeline integration remain active areas of work. I plan to document the remaining challenges and solutions as they unfold.
More soon.