Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
f7c5507
BUG: inital fix for merge page #3467
HSY-999 Nov 21, 2025
01a52a2
MAINT: made inspector capable of accepting keyword only args
HSY-999 Nov 25, 2025
f82d39d
STY: refactored Polygon and PolyLine with shared abstract class
HSY-999 Nov 25, 2025
b47c25e
MAINT: Improved type check and pass in vertices as arg
HSY-999 Nov 25, 2025
4817e85
MAINT: Reworked determineVertices function to be 3.9 compliant
HSY-999 Nov 25, 2025
e82dd43
MAINT: use union type hinting to 3.9+ compliance, use snake case
HSY-999 Nov 26, 2025
cbd58d9
fixed missing assignment to vertices variable
HSY-999 Dec 1, 2025
2560d16
add test case for issue #3467
HSY-999 Dec 1, 2025
2a7e5b3
revert change to vertices arg check for polyline/polygon
HSY-999 Dec 1, 2025
5f2e9a6
Merge branch 'main' into fixMergePageBug#3467
HSY-999 Dec 1, 2025
8d82c2d
precommit style changes
HSY-999 Dec 1, 2025
1899aa6
Merge branch 'fixMergePageBug#3467' of github.com:HSY-999/pypdf into …
HSY-999 Dec 1, 2025
f5f5604
additional precommit style changes in tests and _data_structures
HSY-999 Dec 1, 2025
81e38cd
added lines to fix for autogen documentation
HSY-999 Dec 5, 2025
0ae775b
lower string to pass regex test
HSY-999 Dec 5, 2025
a5d4b6c
uncommented class descriptions strings
HSY-999 Dec 5, 2025
817e981
add option to map different keyword to arg name if necessary
HSY-999 Dec 8, 2025
0e410ba
Merge branch 'main' into fixMergePageBug#3467
HSY-999 Dec 8, 2025
330f253
added test case for setting alt arg names
HSY-999 Dec 9, 2025
bcf9d2c
Merge branch 'main' into fixMergePageBug#3467
HSY-999 Dec 9, 2025
bb710e1
add test case for type error
HSY-999 Dec 9, 2025
5266c84
Merge branch 'fixMergePageBug#3467' of github.com:HSY-999/pypdf into …
HSY-999 Dec 9, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions pypdf/annotations/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@

from ._base import NO_FLAGS, AnnotationDictionary
from ._markup_annotations import (
AbstractPolyLine,
Ellipse,
FreeText,
Highlight,
Expand All @@ -27,6 +28,7 @@

__all__ = [
"NO_FLAGS",
"AbstractPolyLine",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which user-facing functionality needs this class to be public API?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had trouble with sphinx documentation build as it could not see AbstractPolyLine. MarkupAnnotation is also marked in __all__, but only reference in _markup_annotations.py, no other file. because the two class are similar, i thought this correct approach

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide the exact error you are getting for the documentation. I do not want to make this base class a public interface, as this introduces unnecessary maintenance burdens.

"AnnotationDictionary",
"Ellipse",
"FreeText",
Expand Down
63 changes: 44 additions & 19 deletions pypdf/annotations/_markup_annotations.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,41 @@ def __init__(self, *, title_bar: Optional[str] = None) -> None:
self[NameObject("/T")] = TextStringObject(title_bar)


class AbstractPolyLine(MarkupAnnotation, ABC):
"""
Base class for Polygon and PolyLine

Args:
vertices: List of coordinates of each vertex;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something missing in the parameter description?


"""
def __init__(
self,
vertices: Union[list[Vertex], ArrayObject],
**kwargs: Any
) -> None:
super().__init__(**kwargs)
if len(vertices) == 0:
raise ValueError(f"A {type(self).__name__.lower()} needs at least 1 vertex with two coordinates")

@staticmethod
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I usually prefer class methods instead of static methods.
  • Why cannot we call this in our __init__ instead of enforcing this on the subclasses?
  • We probably want to use the stdlib code for suitable Python versions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* Why cannot we call this in our `__init__` instead of enforcing this on the subclasses?

i would need to store the coord_list and vertices in a variable to access them in subclass. i preferred that they should be garbage collected because value is stored in dict object and it unnecessary to keep as instance variables, and modifying them as argument to __init__ did not seem correct to me.

* We probably want to use the stdlib code for suitable Python versions.

do you mean using zip_longest rather than zip?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is garbage collection really relevant here? If yes, I vote to get rid of the base class completely and just make this method a helper function directly.

Regarding the stdlib part: You initially had itertools.batched here. Why aren't we keeping this for Python >= 3.12 and just use our own approach for Python < 3.12? Discussions are in #3532 (comment). Or is there any other benefit of using the current approach over itertools.batched?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is garbage collection really relevant here? If yes, I vote to get rid of the base class completely and just make this method a helper function directly.

we store vertices data in /Vertices, so field variable vertices does not need existing and only need to exist inside __init__ until they store in dict . it is small detail but why keep around field that we do not need?

alternatively, i could move call of _determine_vertices in to _get_bounding_rectangle and remove abstract base class. but PolyLine and Polygon both perform similar length check which why we wanted shared class i thought

Regarding the stdlib part: You initially had itertools.batched here. Why aren't we keeping this for Python >= 3.12 and just use our own approach for Python < 3.12? Discussions are in #3532 (comment). Or is there any other benefit of using the current approach over itertools.batched?

i thought one solution cover all python version was better. less code branch to maintain. batched offers strict mode which fail when miss element to make full tuple of length n. i can make small change to support feature with zip however

def _determine_vertices(
vertices: Union[list[Vertex], ArrayObject]
) -> tuple[list[Vertex], list[NumberObject]]:
coord_list: ArrayObject = ArrayObject()
if isinstance(vertices, ArrayObject):
coord_list = vertices
args = [iter(vertices)] * 2 # Adapted def grouper()
vertices = list(zip(*args)) # from https://docs.python.org/3.9/library/itertools.html#itertools-recipes

else:
for x, y in vertices:
coord_list.append(NumberObject(x))
coord_list.append(NumberObject(y))

return vertices, coord_list


class Text(MarkupAnnotation):
"""
A text annotation.
Expand Down Expand Up @@ -186,19 +221,15 @@ def __init__(
)


class PolyLine(MarkupAnnotation):
class PolyLine(AbstractPolyLine):
def __init__(
self,
vertices: list[Vertex],
vertices: Union[list[Vertex], ArrayObject],
**kwargs: Any,
) -> None:
super().__init__(**kwargs)
if len(vertices) == 0:
raise ValueError("A polyline needs at least 1 vertex with two coordinates")
coord_list = []
for x, y in vertices:
coord_list.append(NumberObject(x))
coord_list.append(NumberObject(y))
super().__init__(vertices=vertices, **kwargs)

vertices, coord_list = self._determine_vertices(vertices)
self.update(
{
NameObject("/Subtype"): NameObject("/PolyLine"),
Expand Down Expand Up @@ -280,20 +311,14 @@ def __init__(
)


class Polygon(MarkupAnnotation):
class Polygon(AbstractPolyLine):
def __init__(
self,
vertices: list[tuple[float, float]],
vertices: Union[list[Vertex], ArrayObject],
**kwargs: Any,
) -> None:
super().__init__(**kwargs)
if len(vertices) == 0:
raise ValueError("A polygon needs at least 1 vertex with two coordinates")

coord_list = []
for x, y in vertices:
coord_list.append(NumberObject(x))
coord_list.append(NumberObject(y))
super().__init__(vertices=vertices, **kwargs)
vertices, coord_list = self._determine_vertices(vertices)
self.update(
{
NameObject("/Type"): NameObject("/Annot"),
Expand Down
50 changes: 49 additions & 1 deletion pypdf/generic/_data_structures.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
__author__ = "Mathieu Fenniak"
__author_email__ = "[email protected]"

import inspect
import logging
import re
import sys
Expand Down Expand Up @@ -263,6 +264,40 @@ def read_from_stream(


class DictionaryObject(dict[Any, Any], PdfObject):

_init_alt_arg_names_: Optional[dict[str, str]] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Why does this have a leading and trailing underscore?
  • Please avoid abbreviations for naming purposes.



"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation seems to be somehow located at the wrong place. It should go into the methods, not before them.

Used to map DictionaryObject keyword to __init__ arg names
when cloning. key -> arg_name.

Args:
key: string identifying DictionaryObject keyword
arg_name: string identifying argument name (value)
"""
def set_alt_arg_name(self, key: str, arg_name: str) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What benefit does this have? Couldn't we just provide the mapping directly in the corresponding subclass?

if self._init_alt_arg_names_ is None:
self._init_alt_arg_names_ = {}

if not isinstance(key, str) or not isinstance(arg_name, str):
raise TypeError

self._init_alt_arg_names_[key] = arg_name

"""
Used to return __init__ arg names when cloning
Args:
key: string identifying DictionaryObject keyword
Returns:
Returns None if not found, else the string representing the
argument name
"""
def get_alt_arg_name(self, key: str) -> Optional[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This should not be public API.
  • The type hint requests the key to be a string. Which parameter name should not be a string in this use case where an explicit type check would be required?
  • Why does this need to require None? Couldn't we just return the original key as the fallback and use a more generic name?

if self._init_alt_arg_names_ is None or not isinstance(key, str):
return None
return self._init_alt_arg_names_.get(key, None)

def replicate(
self,
pdf_dest: PdfWriterProtocol,
Expand Down Expand Up @@ -291,9 +326,22 @@ def clone(
pass

visited: set[tuple[int, int]] = set() # (idnum, generation)

kwargs = {}
inspector = inspect.getfullargspec(self.__class__.__init__)

for key, val in self.items():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for key, val in self.items():
for key, value in self.items():

alt_arg_name = self.get_alt_arg_name(key)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this code could be simplified by passing the stripped key to the alternative name retrieval method and let it return the original code for unmapped entries. You would not need the first check condition anymore.

key_stripped = key.removeprefix("/").lower()

if alt_arg_name:
kwargs[alt_arg_name] = val
elif key_stripped in inspector.args or key_stripped in inspector.kwonlyargs:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, unknown keys would raise a proper error. With this change, these issues will get obfuscated completely. This does not sound correct.

kwargs[key_stripped] = val

d__ = cast(
"DictionaryObject",
self._reference_clone(self.__class__(), pdf_dest, force_duplicate),
self._reference_clone(self.__class__(**kwargs), pdf_dest, force_duplicate),
)
if ignore_fields is None:
ignore_fields = []
Expand Down
25 changes: 25 additions & 0 deletions tests/generic/test_files.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
import pytest

from pypdf import PdfReader, PdfWriter
from pypdf.annotations._markup_annotations import Polygon
from pypdf.constants import AFRelationship
from pypdf.errors import PdfReadError, PyPdfError
from pypdf.generic import (
Expand Down Expand Up @@ -575,3 +576,27 @@ def test_embedded_file__order():
"test.txt", attachment4.pdf_object.indirect_reference,
"xyz.txt", attachment3.pdf_object.indirect_reference,
]


def test_merge_page_with_annotation():
# added and adapted from issue #3467
writer = PdfWriter()
writer2 = PdfWriter()
writer.add_blank_page(100, 100)
writer2.add_blank_page(100, 100)

annotation = Polygon(
vertices=[(50, 550), (200, 650), (70, 750), (50, 700)],
)

writer.add_annotation(0, annotation)

page1 = writer.pages[0]
page2 = writer2.pages[0]
page2.merge_page(page1)

assert page2.annotations[0].get_object()["/Type"] == annotation["/Type"]
assert page2.annotations[0].get_object()["/Subtype"] == annotation["/Subtype"]
assert page2.annotations[0].get_object()["/Vertices"] == annotation["/Vertices"]
assert page2.annotations[0].get_object()["/IT"] == annotation["/IT"]
assert page2.annotations[0].get_object()["/Rect"] == annotation["/Rect"]
28 changes: 28 additions & 0 deletions tests/test_generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,18 @@
RESOURCE_ROOT = PROJECT_ROOT / "resources"


class DummyDictObject(DictionaryObject):
def __init__(self, test_name: str) -> None:
super().__init__()
self.set_alt_arg_name("/TestName", "test_name")

self.update(
{
NameObject("/TestName"): test_name
}
)


class ChildDummy(DictionaryObject):
@property
def indirect_reference(self):
Expand Down Expand Up @@ -1282,6 +1294,22 @@ def test_coverage_arrayobject():
assert isinstance(k, int)
assert isinstance(v, PdfObject)

def test_alt_keyword_when_cloning():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put exactly two empty lines between top-level functions.

obj = DummyDictObject(test_name="testval")
obj2: DummyDictObject = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type hint is wrong.


clone_obj = obj.clone(obj2)

assert clone_obj.get("/TestName") == "testval"

def test_type_error_when_using_alt_arg_keys():
dict_obj = DictionaryObject()
with pytest.raises(TypeError):
dict_obj.set_alt_arg_name(1, "test_key")

with pytest.raises(TypeError):
dict_obj.set_alt_arg_name("test_key", 1)


def test_coverage_streamobject():
writer = PdfWriter()
Expand Down