Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Only include fonts effectively used in the final PDF document #1387

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

yuyiz67
Copy link

@yuyiz67 yuyiz67 commented Mar 8, 2025

Font Optimization: Only include fonts effectively used in the final PDF document. Fonts added via set_font() or add_page() but not actually used in the document are no longer included in the final output. This reduces file size and improves performance, especially for documents with many fallback fonts
Fixes Issue #1382

Checklist:

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged

  • In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

@yuyiz67 yuyiz67 requested a review from gmischler as a code owner March 8, 2025 08:15
@yuyiz67
Copy link
Author

yuyiz67 commented Mar 8, 2025

hi @Lucas-C thanks for your detailed instruction and this is the PR. I cannot add reviewer so I could only mention you here. Thanks

@Lucas-C Lucas-C requested review from Lucas-C and removed request for gmischler March 8, 2025 09:11
Copy link
Member

@Lucas-C Lucas-C left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several problems with modified documents:

  • text disapeared in test/barcodes/barcodes_code39.pdf
  • card sizes changed in test/fonts/fonts_emoji_glyph.pdf
  • text disapeared in test/fonts/fonts_set_builtin_font.pdf
  • first text line in black disapeared in test/free_text_annotation_all_parameters.pdf
  • etc.

This must be fixed before merging this PR.

fpdf/fpdf.py Outdated
@@ -3427,6 +3415,7 @@ def _render_styled_text_line(
self._resource_catalog.add(
PDFResourceType.FONT, current_font.i, self.page
)
current_font_is_added = True
Copy link
Member

@Lucas-C Lucas-C Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoudn't that be reset to False at some point?

What happens when a text fragment with a different font is parsed?
Is it correctly added to the content stream?

reproduce.py Outdated
@@ -0,0 +1 @@
a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script should be removed from the PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should not be added.
The corresponding unit test fail has @pytest.mark.xfail, it should not produce a PDF reference file.

@Lucas-C
Copy link
Member

Lucas-C commented Mar 8, 2025

Could you rebase your branch on this repo master branch please?

Once you have done that, you could check PDF differences using scripts/compare-changed-pdfs.py:
https://py-pdf.github.io/fpdf2/Development.html#visually-comparing-all-pdf-reference-files-modified-on-a-branch

@yuyiz67
Copy link
Author

yuyiz67 commented Mar 11, 2025

@Lucas-C thanks for your quick review. I have changed my code.
I added a variable font_size_set_for_page to indicate whether current page has set font and size.
I extracted the font set code into method set_font_size_for_page

    def set_font_size_for_page(self):
        """
        Set font and size for current page. This step is needed before adding text into page and not needed in set_font and set_font_size. 
        """
        if self.page > 0 and self.current_font and not self.font_size_set_for_page:
            self._out(f"BT /F{self.current_font.i} {self.font_size_pt:.2f} Tf ET")
            self._resource_catalog.add(
                PDFResourceType.FONT, self.current_font.i, self.page
            )
            self.font_size_set_for_page = True

This method should be only called when self.font_size_set_for_page == False
When set_font or set_font_size is called, if the font is same as current font, font_size_set_for_page will not be reset to False. It will be reset to False only when a new font is selected.

My assumption about font is simple: In a page, if a <font,size> is set with method set_font_size_for_page, any following text will use the <font, size> until a new <font,size> comes.
When a new page is added, font_size_set_for_page should be reset to false, so it is reset in add_page()
set_font_size_for_page is mainly called in text(), add_text_notation(), _render_styled_text_line() because text will be rendered in these functions.

However, after I refresh all unit test pdfs. I noticed a lot of difference in test_html, a good example is test_html_features
If you open test/html/html_features.pdf you will see the text disappeared after tag and came back after tag. In test/html/html_long_ol_bullets.pdf also some text disappeared. It feels like tag and

    tag have different logic and removed the font.
    Also some text disappeared in test/html/html_blockquote_indent.pdf and test/cell_skew_text.pdf

    I think my assumption about font is incorrect. In a page, after a <font,size> is set, when some code is added, the font will disappear. But I don't know what exactly made the font gone. If I know, I could add font_size_set_for_page = False after that code.
    Do you have any idea or suggestions about debugging this issue? Thank you very much

@yuyiz67
Copy link
Author

yuyiz67 commented Mar 11, 2025

I added a fix but these files still miss text, let me try to fix them:
test/html/html_table_with_null_text_in_span_cell.pdf
test/html/html_table_with_only_tds.pdf
test/outline/footer_leaking_style_on_toc.pdf
test/outline/test_start_section_horizontal_alignment.pdf
test/outline/toc_without_font_style.pdf (wrong font size)
test/table/table_valign_per_row.pdf
test/table/table_with_fill_color_set_beforehand.pdf
test/table/table_with_multiline_cells_and_split_over_3_pages.pdf
test/table/table_with_multiline_cells_and_without_headings.pdf
test/table/table_with_no_headers_nor_horizontal_lines.pdf
test/table/table_with_qrcode.pdf
test/table/table_with_ttf_font.pdf
test/table/table_without_headings.pdf
test/template/flextemplate_wrapmode.pdf
test/template/template_wrapmode.pdf
test/text/multi_cell_table_unbreakable_with_split_only.pdf
test/text/text_modes.pdf
test/transparency.pdf

@yuyiz67
Copy link
Author

yuyiz67 commented Mar 11, 2025

Just confirmed all pdfs are the same as master branch. Let me double check all code is good.

self._push_local_stack()
self.add_page(same=True)
self._pop_local_stack()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously, the sequence looks quite confusing:
pop_gs -> push_local -> add_page -> push_gs -> pop_local

so here I changed it to correct sequence:
pop_gs -> push_local -> add_page -> pop_local -> push_gs

This makes code more readable
@Lucas-C

@yuyiz67 yuyiz67 requested a review from Lucas-C March 12, 2025 08:13
@yuyiz67
Copy link
Author

yuyiz67 commented Mar 12, 2025

@Lucas-C I have checked the pdfs with the script you mentioned. I think the code looks much better now.
Main logic is to reset font_size_set_for_page to False for all states in _perform_page_break.

@Lucas-C
Copy link
Member

Lucas-C commented Mar 12, 2025

Thank you for all your work on this @yuyiz67 👍

I may not have the time to review until Friday,
but I'll definitely get back to you before next week 🙂

@yuyiz67
Copy link
Author

yuyiz67 commented Mar 12, 2025

@Lucas-C Thanks for running workflow. I tried to run it locally and found error :

❯ scripts/pdfchecker.py --print-aggregated-report
PDF Checker 2.5.1 Copyright 2018-2024 Datalogics, Inc. All Rights Reserved
AGGREGATED REPORT
Errors:

  • Uses Base 14 fonts not embedded in document:: x341 - First 3 files: test/free_text_annotation.pdf, test/barcodes/barcodes_interleaved2of5.pdf, test/alias_nb_pages.pdf

I will fix all errors and let you know after that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants