Only include fonts effectively used in the final PDF document #1382

Open
Lucas-C opened this issue Mar 3, 2025 · 3 comments
Labels
enhancement font hacktoberfest performance research needed too complicated to implement without careful study of official specifications up-for-grabs

Comments

@Lucas-C
Member

Lucas-C commented Mar 3, 2025

Quoting @andersonhc in #1133 (comment):

I am running some tests and it's interesting your PR makes it clear we are letting unused fonts end up in the output document.
The page 1 will have "F1" and "F4" in the resources dictionary, and page 2 will have "F3" and "F4".
"F2" is added on the final document and not referenced at all. "F4" is added to the resource list because of set_font() although not used.
I have documents with many fallback fonts and there is a considerable amount of unused font data added to the documents.

fpdf2 should NOT include unused fonts.

We should implement a mechanism to only add to the final PDF document the fonts actually used.

Cf. comment #1382 (comment) below for more details.

@yuyiz67

yuyiz67 commented Mar 7, 2025

Hi @Lucas-C,

I checked this issue and plan to fix it in this way:

  • Delaying font resource addition until text is actually output.

  • Modifying set_font() and set_font_size() to avoid adding unused fonts to the resource catalog.

  • Adding a mechanism to track and apply fonts only when needed.

If the plan looks good, could you please assign this issue to me? Thanks. Please let me know if you have more ideas about how to fix this issue!

Thanks!

@Lucas-C
Member Author

Lucas-C commented Mar 7, 2025

Hi @yuyiz67!

Thank you for volunteering 🙂 👍

I just realized that @andersonhc slightly modified the font-embedding logic at the beginning of the year in PR #1334

So there is no longer an issue with fonts added with .add_font() whose corresponding style is never used (= case /F2 in the issue description).

The remaining issue is with fonts inserted initially by add_page(),
but never really used because the end user later switches to another font.

Here is a minimal script reproducing the case, using Noto Sans (the /Fx font IDs do not match the ones in the issue description):

from pathlib import Path
from fpdf import FPDF

DIR = Path(__file__).parent
FONT_DIR = DIR / "test" / "fonts"  # put the NotoSans .ttf font files in this directory

pdf = FPDF()
pdf.add_font("NotoSans",             fname=FONT_DIR / "NotoSans-Regular.ttf")  # will be /F1
pdf.add_font("NotoSans", style="B",  fname=FONT_DIR / "NotoSans-Bold.ttf")  # will be /F2
pdf.add_font("NotoSans", style="I",  fname=FONT_DIR / "NotoSans-Italic.ttf")  # will be /F3
pdf.add_font("NotoSans", style="BI", fname=FONT_DIR / "NotoSans-BoldItalic.ttf")  # OK/fixed, will NOT be inserted in the PDF
pdf.set_font("NotoSans", size=12)

pdf.add_page()  # currently inserts /F1 on the page
pdf.multi_cell(w=pdf.epw, text="**Text in bold**", markdown=True)  # font effectively used is /F2 = NotoSans-Bold

pdf.add_page()  # currently inserts /F1 on the page
pdf.multi_cell(w=pdf.epw, text="__Text in italic__", markdown=True)  # font effectively used is /F3 = NotoSans-Italic

pdf.add_page()  # currently inserts /F1 on the page
pdf.multi_cell(w=pdf.epw, text="Regular text\n**Text in bold**\n__Text in italic__", markdown=True)  # all 3 fonts used

pdf.output("issue_1382.pdf")

When inspecting the resulting PDF (qpdf --qdf can help), we can see that:

  • page 1 includes fonts /F1 & /F2
  • page 2 includes fonts /F1 & /F3

The problem is with /F1, which is not used on pages 1 & 2 but is still referenced in those pages.
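
To spot this kind of leftover reference in a decompressed content stream, one rough approach is to compare the fonts selected with the Tf operator against the fonts that are actually followed by a text-showing operator. The sketch below is only an illustration: fonts_selected_and_used is a hypothetical helper (not part of fpdf2 or qpdf), the sample stream is hand-written, and the tokenization is naive (it assumes an uncompressed stream as plain text, only looks at Tj / TJ, and does not guard against operator-like bytes inside string literals).

```python
import re

# Matches either a font selection ("/F1 12.00 Tf") or a text-showing operator (Tj / TJ).
TOKEN = re.compile(r"/(?P<font>F\d+)\s+[\d.]+\s+Tf\b|(?P<show>\bT[jJ]\b)")


def fonts_selected_and_used(content: str) -> tuple[set[str], set[str]]:
    """Return (fonts selected with Tf, fonts that actually render some text)."""
    selected: set[str] = set()
    used: set[str] = set()
    current = None  # font in effect at the current position in the stream
    for match in TOKEN.finditer(content):
        if match.group("font"):
            current = match.group("font")
            selected.add(current)
        elif current is not None:
            # A text-showing operator: the font currently in effect is really used.
            used.add(current)
    return selected, used


# Hand-written sample mimicking a page where /F1 is selected but only /F2 renders text:
sample = "BT /F1 12.00 Tf ET\nBT /F2 12.00 Tf (Text in bold) Tj ET\n"
selected, used = fonts_selected_and_used(sample)
print("selected but never used:", selected - used)
```

Any font reported as "selected but never used" on every page is a candidate for removal from the page resources.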

This unused /F1 reference comes from:

The same logic (if self.page > 0: self._out(f"BT /F{self.current_font.i} ...")) also appears:

We also have 5 defensive checks like this one in FPDF methods, ensuring FPDF.set_font() is called before rendering some text:
https://github.com/py-pdf/fpdf2/blob/2.8.2/fpdf/fpdf.py#L2460
We must preserve those checks, but delay the insertion of BT /Fx in the page content stream.

Given that FPDF._render_styled_text_line() already has some BT /Fx insertion logic,
maybe the best strategy would be to get rid of the BT /Fx-inserting code in FPDF.set_font() & FPDF.set_font_size(),
and slightly change the logic in FPDF._render_styled_text_line(), in order to perform this "delayed insertion in the content stream".
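
To illustrate the idea of "delayed insertion", here is a minimal toy model, not fpdf2 code: PageSketch, its attributes and its methods are all hypothetical names. It assumes that set_font() only records the current font, and that the BT /Fx ... Tf operator is written (and the font registered as used on the page) only when some text is actually rendered. The defensive check is preserved. Text escaping and font subsetting are deliberately left out.

```python
class PageSketch:
    """Toy model of a page content stream with deferred font selection (hypothetical)."""

    def __init__(self):
        self.ops = []            # content stream operators written so far
        self.used_fonts = set()  # font IDs that should end up in the page resources
        self._font = None        # (font_id, size_pt) recorded by set_font()

    def set_font(self, font_id, size_pt):
        # Only remember the selection: nothing is written to the content stream yet.
        self._font = (font_id, size_pt)

    def show_text(self, text):
        # Defensive check kept: a font must be set before rendering text.
        if self._font is None:
            raise RuntimeError("No font set, you need to call set_font() beforehand")
        font_id, size_pt = self._font
        # The font is registered and its Tf operator emitted only now, when it is used.
        self.used_fonts.add(font_id)
        self.ops.append(f"BT /F{font_id} {size_pt:.2f} Tf ({text}) Tj ET")


page = PageSketch()
page.set_font(1, 12)             # initial font, never used on this page
page.set_font(2, 12)             # switch to bold before any text is rendered
page.show_text("Text in bold")
print(page.used_fonts, page.ops)
```

In this model, font 1 never reaches the content stream or the resource set, which is the behavior wanted for /F1 on pages 1 & 2 of the script above.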

Are my explanations clear enough @yuyiz67?
Please ask if you have any question! 🙂

Also, note that this change will cause many reference PDFs to be modified,
because many useless /Fx ... operators will be removed once this is implemented.
However, of course, the visual aspect of all those reference PDF files should not change.

@yuyiz67

yuyiz67 commented Mar 8, 2025

Hi @Lucas-C,
Thanks for the very detailed explanation. I have modified the code as you suggested and created a unit test, but the change broke some existing unit tests. I will create a PR after I fix all of them.
