PDF/A-1 6.1.5 and tab vs. space #892
Comments
The ISO 19005-1 specification says:
I guess the specification uses the word "equivalent" rather than "equal" precisely because direct comparison of essentially different data types is not possible. The explanation that follows about text case is yet another hint. The normalization of TAB to SPACE is part of the XML specification (https://www.w3.org/TR/2004/REC-xml-20040204/#AVNormalize). So I believe that in this case a TAB in XMP values is equivalent to a space in the values of Info dictionary keys.
I tried to make veraPDF with --fixmetadata create a Metadata property (stored in an attribute) containing a TAB, but I found out that --fixmetadata is a misnomer: it sets the Info dictionary from Metadata rather than vice versa.
Well, this is by design: veraPDF assumes that XMP values take precedence over the Info dictionary. But this is not the first time we have received a request to make this an option, that is, whether to sync the XMP package from the Info dictionary or vice versa. So we'll raise the priority of this one.
This still feels like an open issue and would probably require a specific option to control. It's interesting from a preservation POV also.
Yes, this is still open. We can raise the priority and include it in the next release.
TAB and SPACE are two different (Unicode) characters. Because Info strings and the corresponding XMP strings are compared as sequences of Unicode characters, using SPACE in XMP and TAB in the Info dictionary string will result in a validation error.
Dev Effort
3D
Description
This is not a veraPDF issue; I think veraPDF behaves correctly. But the behavior is a bit unexpected, so this is something the Validation TWG might be interested in.
Look at these two documents:
6.1.5-pass-03.pdf
6.1.5-fail-14.pdf
Both have a TAB (U+0009) in the Metadata property and a space (U+0020) in the Info dictionary.
The difference comes from one property being an XML attribute, the other being an XML element. For the former, Attribute-Value Normalization is applied, unlike for the latter. For the Info entries which correspond to XML attributes in Metadata, it's impossible to have a TAB in the value without violating 6.1.5.
Well, after finding out that ModDate "2017-08-09" does not match xmp:ModifyDate "2017-08-09", this isn't that big a surprise.
The question is: Is this really the behavior intended by the ISO 19005 committee? That is, did they choose attribute vs. element based on whether Attribute-Value Normalization is to be applied or not? Or is this just some unfortunate side effect?
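For illustration, the attribute vs. element difference described above can be reproduced with any conforming XML parser. The following is only a minimal sketch using Python's standard xml.etree.ElementTree. The XMP fragment, the choice of pdf:Producer and pdf:Keywords, and the values are made up for this example and are not taken from the two test files; it also says nothing about how veraPDF itself reads the metadata.

```python
import xml.etree.ElementTree as ET

PDF_NS = "http://ns.adobe.com/pdf/1.3/"

# Hypothetical XMP fragment: the same kind of value, containing a literal
# TAB (U+0009), stored once as an XML attribute and once as element content.
xmp = (
    '<rdf:Description'
    ' xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:pdf="http://ns.adobe.com/pdf/1.3/"'
    ' pdf:Producer="My\tProducer">'
    '<pdf:Keywords>My\tKeywords</pdf:Keywords>'
    '</rdf:Description>'
)

root = ET.fromstring(xmp)

# Attribute-Value Normalization turns the literal TAB into a SPACE.
attr_value = root.get("{%s}Producer" % PDF_NS)

# Element content is not normalized, so the TAB survives.
elem_value = root.find("{%s}Keywords" % PDF_NS).text

print(repr(attr_value))
print(repr(elem_value))
```

After parsing, the attribute value contains U+0020 where the file had U+0009, while the element text still contains U+0009. So a character-by-character comparison against an Info dictionary string containing a space succeeds for the attribute form and fails for the element form.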