The conversion from the input password to a single-byte string is ambiguous. The input password could certainly contain characters from more than one script, say some Korean and some Chinese. Is it processed via multiple DBCS code pages, or via just one, with the unmapped characters then replaced by 0x3F? If only one DBCS code page is used, how is it determined in this case?
Clarify this processing, especially for passwords that use characters from more than one script.
pg. 1916 Part 4, Section 3.2.29
te
Proposed Disposition of DIS 29500 Comment GR-0078 (Modified: 2008-01-04)

Agreed; this process was not fully defined. As well, we agree that this comment correctly points out a limitation of the legacy hashing mechanism (the mapping to a single code page results in characters being effectively lost by being converted to 0x3F). To resolve this, the following changes will be made to the specification:

The specification will be updated (see below) to explicitly note the effects of the legacy hashing algorithm when dealing with input strings which use characters from multiple scripts.

As well, the legacy hashing mechanism will be deprecated and replaced with a new mechanism which uses the full UTF-16 encoded password in conjunction with a set of well-known cryptographic algorithms (e.g. those algorithms defined in ISO/IEC 10118-3:2004). This change is fully detailed in the response to the following comments: CA-0037, CL-0027, CL-0028, CL-0055, CL-0197, CL-0202, CO-0096, CO-0143, CO-0146, DK-0030, DK-0114, DK-0139, FR-0338, FR-0341, FR-0345, GB-0219, GB-0291, GB-0292, GB-0298, GH-0008, GR-0021, GR-0022, GR-0076, IN-0010, IN-0026, IN-0063, IN-0064, IN-0077, IR-0012, IR-0047, JP-0068, KR-0024, MY-0017, PT-0090, PT-0091, PT-0093, US-0051, US-0138, US-0144, US-0252, VE-0001, VE-0017, VE-0054, and ZA-0009.

Finally, to resolve GB-0473 (and others), the legacy hashing algorithm will be updated to store the character set to which the string is converted, in order to improve its portability and improve interoperability with other document interchange standards.
Specific to the first issue, we agree that the deprecated algorithm should still be completely defined to ensure that legacy files with this hash value are handled appropriately by all implementations of the specification; accordingly, the following changes will be made:

Part 4, §3.2.29, page 1,916, attribute revisionsPassword, between paragraphs 1 and 2:

Attributes: revisionsPassword (Revisions Password)

Description: Specifies the hash of the password required for unlocking revisions in this workbook. The hash is generated from an 8-bit wide character. 16-bit Unicode characters must be converted down to 8 bits before the hash is computed, using the following logic:

[Note: This legacy conversion attempts to fit UTF-16 encoded characters into a single-byte character set. As such, if the input string uses characters from multiple character sets, many characters will be unmapped in the destination character set and take on the default value, 0x3F. For this reason, it is recommended that applications choose a character set which maps the maximum number of characters from the input string and explicitly declare the character set used in the revisionsCharacterSet attribute. Not doing so will inhibit interoperability. end note]

Similar Comments: CL-0196, CO-0145, FR-0340, GB-0294, PT-0095, US-0140, VE-0056
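The effect described in the note can be illustrated with a minimal Python sketch. It is not part of the proposed specification text and does not compute the hash itself. The function names (to_legacy_bytes, unmapped_count, pick_code_page) are hypothetical, Python codec names such as "cp1252" stand in for the target character sets, and the selection rule (fewest unmapped characters, first candidate wins a tie) is only one assumed reading of "maps the maximum number of characters". The sketch relies on Python's "replace" error handler, which emits "?" (0x3F) for unmapped characters, and it iterates over code points rather than UTF-16 code units, so characters outside the Basic Multilingual Plane may behave differently from a code-unit-based implementation.

```python
from typing import Iterable


def to_legacy_bytes(password: str, codec: str) -> bytes:
    """Convert the password to the target character set.

    Characters with no mapping in `codec` become '?' (0x3F), which is
    the lossy behaviour the note warns about.
    """
    return password.encode(codec, errors="replace")


def unmapped_count(password: str, codec: str) -> int:
    """Count the characters of the password that `codec` cannot represent."""
    count = 0
    for ch in password:
        try:
            ch.encode(codec)  # strict mode raises on unmapped characters
        except UnicodeEncodeError:
            count += 1
    return count


def pick_code_page(password: str, candidates: Iterable[str]) -> str:
    """Assumed selection rule: the candidate that loses the fewest characters.

    On a tie, the first candidate in iteration order is returned.
    """
    return min(candidates, key=lambda codec: unmapped_count(password, codec))


if __name__ == "__main__":
    # A password mixing Latin and Cyrillic: no single-byte set maps it fully.
    mixed = "a\u00e4\u0416"  # 'a', 'ä', 'Ж'
    for codec in ("cp1252", "cp1251"):
        print(codec, to_legacy_bytes(mixed, codec), unmapped_count(mixed, codec))

    # The comment's own scenario: Korean and Chinese in one password.
    cjk = "\ud55c\u4e2d"  # '한', '中'
    for codec in ("cp949", "gbk"):
        print(codec, to_legacy_bytes(cjk, codec), unmapped_count(cjk, codec))
```

Whichever character set is chosen, two passwords that differ only in their unmapped characters convert to the same byte string, which is why the disposition treats the legacy mechanism as a limitation and asks that the chosen set be declared in revisionsCharacterSet.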

Dupe of GB 294