The presence of non-XML characters in an OOXML document, whether escaped or not, is contrary to the interoperability of XML and XML-based tools. The W3C’s Internationalization Activity confirms this interpretation, saying: "Control codes should be replaced with appropriate markup. Since XML provides a standard way of encoding structured data, representing control codes other than as markup would undo the actual advantages of using XML. Use of control codes in HTML and XHTML is never appropriate, since these markup languages are for representing text, not data."
Change the following text:
For all characters that cannot be represented in XML as defined by the XML 1.0 specification, the characters are escaped using the Unicode numerical character representation escape character format _xHHHH_, where H represents a hexadecimal character in the character’s value.
to the following (adding the word UNICODE near the beginning of the sentence):
"For all UNICODE characters that cannot be represented in XML as defined by the XML 1.0 specification, the characters are escaped using the Unicode numerical character representation escape character format _xHHHH_, where H represents a hexadecimal character in the character’s value."
In addition, this Section should make it clear that valid XML 1.0 UNICODE characters are permitted in the bstr value, and not just UNICODE characters that cannot be represented.
Part 4, Section 07.04.02.04
ED
Proposed Disposition of DIS 29500 Comment CA-0064 (Modified: 2008-01-04)
We agree that control codes should not be stored within the text of an element value. However, these characters do not represent control codes; this property is used solely to hold user-defined data stored within the legacy document format. As such, we believe that it would be inappropriate to remove this datatype from the specification and lose this information. As suggested by the Canadian National Body, we believe some clarification would be useful; as a result, the following change will be made in Part 4, §7.4.2.4, page 5,122, line 26:
This element defines a binary basic string variant type, which can store any valid Unicode character. For all Unicode characters that cannot be directly represented in XML, as defined by the XML 1.0 specification, the characters shall be escaped using the Unicode numerical character representation escape character format _xHHHH_, where H represents a hexadecimal character in the character’s value. [Example: The Unicode character 8 is not permitted in an XML 1.0 document, so it shall be escaped as _x0008_. end example]
Similar Comments: BR-0059, CO-0232, FR-0378, GB-0591, GR-0010, US-0161
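The escaping rule in the revised text can be illustrated with a minimal sketch. It assumes that "cannot be directly represented in XML" means "does not match the Char production of XML 1.0", and that hexadecimal digits are written in upper case; neither point is stated in the text above. The function names `is_xml10_char` and `escape_bstr` are hypothetical and are not taken from the specification. How literal text that already resembles an escape sequence should be treated is not covered by the text above, so the sketch leaves it out.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD      # excludes surrogates, U+FFFE, U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_bstr(text: str) -> str:
    """Escape characters XML 1.0 cannot represent as _xHHHH_ (assumed upper-case hex)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if is_xml10_char(cp):
            out.append(ch)                 # valid XML 1.0 characters pass through unchanged
        else:
            out.append(f"_x{cp:04X}_")     # e.g. U+0008 becomes _x0008_
    return "".join(out)


print(escape_bstr("a\x08b"))  # prints a_x0008_b
```

Every code point excluded by the Char production lies at or below U+FFFF, so four hexadecimal digits are always enough for the escaped form.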
