Ecma 376 makes extensive use of bitmasks (e.g., in Part 4, Section 2.15.1.86 and Part 4, Section 2.15.1.87). Bitmasks are not suitable in an XML standard, as they do not allow existing, widely used XML processing tools to manipulate Ecma 376 documents.
Ecma 376 should be reviewed to remove all use of bitmasks and replace them with correct and equivalent XML constructs.
Part 4, Section 2.15.1.86 and Part 4, Section 2.15.1.87
te
Proposed Disposition of DIS 29500 Comment MY-0018 (Modified: 2008-01-11)

Introduction

A bitfield (often referred to as a bitmask) is not a binary format. Bitfields in DIS 29500 are fully defined in XML and can be processed by common XML tools such as XSLT, as described below. At the end of this disposition we propose a new Annex which will include a complete XSLT (tested using Saxon 9.0.0.2N) that can process bitfields as defined in DIS 29500.

It is important to note that many XML-based formats use non-XML syntax for compactness. For example, the XSLT standard itself uses XPath, which is a non-XML notation. The original XSL submission used an XML syntax for patterns (§3.2 Patterns in http://www.w3.org/TR/NOTE-XSL.html), but the W3C standardization process changed this XML-based syntax to XPath, a more compact non-XML notation.

Detailed Response

Within Office Open XML, three types of “bitmasks” are used, each after careful consideration of the tradeoffs of taking this approach:

- Values already defined in another ISO/IEC Standard, for example ISO/IEC 14496-22:2007 (Part 4, §2.8.2.13 and §2.8.2.16)
- Values stored as a raw bitfield (Part 4, §2.3.1.8, §2.4.7, and §2.4.8)
- Values stored by encoding the bitfield as a hexadecimal value (e.g. Part 4, §2.4.51, §2.4.52, §2.15.1.86, and §6.1.2.7)

(It should be noted that although concerns were raised regarding Part 4, §2.15.1.87, this is simply an enumerated value list, not a bitfield.)

Given the concern with this choice, a discussion of this decision and the resulting concerns raised by national bodies is extremely important.

Decision to use Bitfields

The decision to use this format was motivated by several important design goals:

First, it was important to the design goals of Office Open XML that existing standards be used without modification. This is apparent from the use of bitfields in Part 4, §2.8.2.13 and §2.8.2.16, both of which contain values defined in ISO/IEC 14496-22:2007.
We believe that it would be inappropriate to redefine the syntax of values already defined in an existing ISO/IEC Standard; therefore, these values are stored in the “bitfield” form defined by that Standard.

Second, it was important that compactness be a part of the design of Office Open XML, which is a common design goal, even within XML-based standards. For example, the XSLT standard itself uses XPath as an attribute value. XPath is frequently used in many contexts (the elements to which an XSLT template is applied (e.g. <xsl:template match="para[last()=1]">), the elements to which a uniqueness constraint is enforced, etc.). However, the XPath expression itself is a string (e.g. “para[last()=1]”), as is explicitly noted in §1 of the XPath specification: “XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values.”

Consider that 196 individual attributes would have been required to properly represent the value space of the data stored in the six attributes in §2.8.2.16, which would likely increase the complexity for both producers and consumers (as 196 similarly-named attributes would be easily confused). Such an expansion would also be inconsistent with the goal of human readability.

Finally, it was important that the resulting XML be processed easily using a wide range of tools, from XSLT to programming languages like Java or C++. A full discussion of this (including a discussion of the concerns specific to XSLT) is located below.

Can be validated by XML Schema technologies

Another concern raised with these values was that their value space was outside of that provided by common schema languages, and as such, they could not be validated using standard XML Schema languages. This was considered in the decision to use these fields: they can be represented using all modern schema languages, as demonstrated below:

W3C XML Schema 1.0

Bitfields can be declared in W3C XML Schema as a restriction on a string token.
These declarations provide 100% of the lexical-space constraints. (The value-space constraints have a one-to-one relationship with the lexical-space constraints.)

    <xsd:simpleType name="bitfield16">
      <xsd:restriction base="xsd:NMTOKEN">
        <xsd:pattern value="[01]{16}" />
      </xsd:restriction>
    </xsd:simpleType>
    …
    <xsd:element name="x">
      <xsd:complexType>
        <xsd:attribute name="b1" type="bitfield16" />
      </xsd:complexType>
    </xsd:element>

That XML Schema simple type declaration does not explicitly specify that each bit is a Boolean value, nor does it define named values for each bit. However, a simple type of Booleans “derived by list” enables manipulation by programming language bitmask operations as well as XSLT.

ISO RELAX NG Compact Syntax

Bitfields can be declared in ISO RELAX NG (Compact Syntax) as string tokens.

    x-pattern = element x { attribute b1 { xsd:NMTOKEN } }

ISO Schematron

Bitfields can be declared in ISO Schematron using the following. These declarations provide 100% of the lexical-space constraints. (The value-space constraints have a one-to-one relationship with the lexical-space constraints.)
    <sch:rule abstract="true" id="bitfield16">
      <sch:let name="b1" value="normalize-space(.)"/>
      <sch:let name="f1" value="substring($b1, 1, 1)" />
      <sch:let name="f2" value="substring($b1, 2, 1)" />
      <sch:let name="f3" value="substring($b1, 3, 1)" />
      <sch:let name="f4" value="substring($b1, 4, 1)" />
      <sch:let name="f5" value="substring($b1, 5, 1)" />
      <sch:let name="f6" value="substring($b1, 6, 1)" />
      <sch:let name="f7" value="substring($b1, 7, 1)" />
      <sch:let name="f8" value="substring($b1, 8, 1)" />
      <sch:let name="f9" value="substring($b1, 9, 1)" />
      <sch:let name="f10" value="substring($b1, 10, 1)" />
      <sch:let name="f11" value="substring($b1, 11, 1)" />
      <sch:let name="f12" value="substring($b1, 12, 1)" />
      <sch:let name="f13" value="substring($b1, 13, 1)" />
      <sch:let name="f14" value="substring($b1, 14, 1)" />
      <sch:let name="f15" value="substring($b1, 15, 1)" />
      <sch:let name="f16" value="substring($b1, 16, 1)" />
      <sch:assert test="string-length($b1) = 16"><sch:name/> is a bitfield. Its value (omitting leading and trailing whitespace) should be 16 characters long.</sch:assert>
      <sch:assert test="($f1=0 or $f1=1) and ($f2=0 or $f2=1) and ($f3=0 or $f3=1) and ($f4=0 or $f4=1)
          and ($f5=0 or $f5=1) and ($f6=0 or $f6=1) and ($f7=0 or $f7=1) and ($f8=0 or $f8=1)
          and ($f9=0 or $f9=1) and ($f10=0 or $f10=1) and ($f11=0 or $f11=1) and ($f12=0 or $f12=1)
          and ($f13=0 or $f13=1) and ($f14=0 or $f14=1) and ($f15=0 or $f15=1) and ($f16=0 or $f16=1)"><sch:name/> is a bitfield. It should only have 0s and 1s.</sch:assert>
    </sch:rule>
    …
    <sch:rule context="x/@b1">
      <sch:extends rule="bitfield16"/>
    </sch:rule>

More descriptive names can be used, following the documentation in DIS 29500 or the appropriate standard. Schematron also allows constraints on any complex relationships between individual bits in a bitfield (or between bitfields).
Using the upcoming XSLT2 Query Language Binding for Schematron, simpler tests are possible:

    <sch:rule abstract="true" id="bitfield16">
      <sch:let name="b1" value="normalize-space(.)"/>
      <sch:assert test="matches($b1, '^[01]{16}$')"><sch:name/> is a bitfield. It should only have 0s and 1s. It should have 16 bits.</sch:assert>
    </sch:rule>
    …
    <sch:rule context="x/@b1">
      <sch:extends rule="bitfield16"/>
    </sch:rule>

ISO DTLL

ISO Data Type Library Language (DTLL) is a draft schema language, part of ISO DSDL, being developed by SC34 WG1. It is intended to address exactly the kind of issue raised by bitfields: mapping from data values in arbitrary notations to standard types and notations.

The data-type model adopted by W3C XML Schemas has been problematic for users whose data uses notations that pre-exist, or exist independently of, the notations provided by W3C XML Schemas. For example, some users may wish to have dates in their national or regional format, not in ISO 8601. ISO DTLL provides a mechanism for mapping to and from these local notations.

In the case of bitfields, it is expected that ISO DTLL would allow, for example, the mapping between a bitfield and an XML Schemas list of Booleans. This would allow closer integration with “type-aware” XML tools.

In the following DTLL (draft) schema fragment, the bitfield16 type is declared, a regular expression parses the bitfield into substrings, and then each substring is given a datatype.
    <dt:datatype name="bitfield16">
      <dt:parse whitespace="collapse">
        <dt:regex>
          (?[bit1][01]) (?[bit2][01]) (?[bit3][01]) (?[bit4][01])
          (?[bit5][01]) (?[bit6][01]) (?[bit7][01]) (?[bit8][01])
          (?[bit9][01]) (?[bit10][01]) (?[bit11][01]) (?[bit12][01])
          (?[bit13][01]) (?[bit14][01]) (?[bit15][01]) (?[bit16][01])
        </dt:regex>
      </dt:parse>
      <dt:property name="bit1" type="xs:boolean" />
      <dt:property name="bit2" type="xs:boolean" />
      <dt:property name="bit3" type="xs:boolean" />
      <dt:property name="bit4" type="xs:boolean" />
      <dt:property name="bit5" type="xs:boolean" />
      <dt:property name="bit6" type="xs:boolean" />
      <dt:property name="bit7" type="xs:boolean" />
      <dt:property name="bit8" type="xs:boolean" />
      <dt:property name="bit9" type="xs:boolean" />
      <dt:property name="bit10" type="xs:boolean" />
      <dt:property name="bit11" type="xs:boolean" />
      <dt:property name="bit12" type="xs:boolean" />
      <dt:property name="bit13" type="xs:boolean" />
      <dt:property name="bit14" type="xs:boolean" />
      <dt:property name="bit15" type="xs:boolean" />
      <dt:property name="bit16" type="xs:boolean" />
    </dt:datatype>

Using DTLL, a bitfield is parsed into a small XML document (or fragment) such as

    <root>
      <bit1>1</bit1> <bit2>1</bit2> <bit3>1</bit3> <bit4>1</bit4>
      <bit5>1</bit5> <bit6>1</bit6> <bit7>1</bit7> <bit8>1</bit8>
      <bit9>1</bit9> <bit10>1</bit10> <bit11>1</bit11> <bit12>1</bit12>
      <bit13>1</bit13> <bit14>1</bit14> <bit15>1</bit15> <bit16>1</bit16>
    </root>

This document can be processed with even basic XML processing tools. More descriptive names can be used, following the documentation in DIS 29500 or the appropriate standard.

Can be processed with modern tools

Contrary to the concerns raised by several NBs, bitfields can be manipulated by most (perhaps all) common programming languages, including XSLT, Java, JavaScript, C, C++ and C#. These languages support reading and writing bitfields and logical operations with bitmasks.
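As an illustration of this claim in a general-purpose language, the following Python sketch reads a raw 16-character bitfield, extracts one bit by position, and combines two bitfields with a logical AND. The function names (parse_bitfield, format_bitfield, get_bit, bitfield_and) are hypothetical and do not appear in DIS 29500. Positions are assumed to be 1-based from the left, matching the XPath substring() indexing used elsewhere in this disposition.

```python
from __future__ import annotations


def parse_bitfield(text: str, width: int = 16) -> list[bool]:
    """Parse a raw bitfield such as '1010000000000000' into booleans."""
    token = text.strip()  # ignore leading and trailing whitespace
    if len(token) != width or any(ch not in "01" for ch in token):
        raise ValueError(f"expected {width} characters of 0 or 1, got {token!r}")
    return [ch == "1" for ch in token]


def format_bitfield(bits: list[bool]) -> str:
    """Serialize booleans back into the raw '0'/'1' notation."""
    return "".join("1" if bit else "0" for bit in bits)


def get_bit(text: str, position: int, width: int = 16) -> bool:
    """Return the bit at a 1-based position counted from the left."""
    bits = parse_bitfield(text, width)
    if not 1 <= position <= width:
        raise ValueError(f"position must be between 1 and {width}")
    return bits[position - 1]


def bitfield_and(a: str, b: str, width: int = 16) -> str:
    """Bitwise AND of two raw bitfields of the same width."""
    left = parse_bitfield(a, width)
    right = parse_bitfield(b, width)
    return format_bitfield([x and y for x, y in zip(left, right)])


print(get_bit("0000100000000000", 5))                          # True
print(bitfield_and("1100000000000000", "1010000000000000"))    # 1000000000000000
```

Parsing into a list of booleans keeps each bit addressable by position; the same values could equally be packed into an integer and combined with the language's native bitwise operators.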
Can be processed with XSLT

Support for reading, manipulating and writing bitfields in XSLT is not only completely possible, but almost trivial.

Handling raw bitfields

For raw bitfields (Part 4, §2.3.1.8, §2.4.7, and §2.4.8), variables may be used to name and read individual bits. These may use string operations (indexing) or numeric operations to extract the value of each bit.

The simplest way to get an individual value from a bitfield by position is to use string indexing, for example,

    <xsl:if test="substring(normalize-space(@eg), 5, 1) = '1'"> … </xsl:if>

To read in and name each value, variables may be used (with more descriptive names):

    <xsl:template match="x">
      <xsl:variable name="b1" select="normalize-space(@b1)"/>
      <xsl:variable name="f1" select="substring($b1, 1, 1)" />
      <xsl:variable name="f2" select="substring($b1, 2, 1)" />
      …
    </xsl:template>

The variables may be simply serialized again, e.g.:

    <xsl:value-of select="concat($f1, $f2, $f3, $f4, $new-f5, $f6, $f7, $f8, $f9, $f10, $f11, $f12, $f13, $f14, $f15, $f16)" />

To perform logical operations, named templates such as the following may be defined. In the following example, we demonstrate the technique of casting 0 and 1 to numbers and making use of mathematical operations.
    <xsl:template name="logical-and-16">
      <xsl:param name="arg1" />
      <xsl:param name="arg2" />
      <xsl:variable name="token1" select="normalize-space($arg1)"/>
      <xsl:variable name="token2" select="normalize-space($arg2)"/>
      <xsl:if test="string-length($token1) != string-length($token2)">
        <xsl:message>Error incorrect lengths</xsl:message>
      </xsl:if>
      <!-- Multiplying the two digits at each position yields their logical AND -->
      <xsl:value-of select="number(substring($token1, 1, 1)) * number(substring($token2, 1, 1))"/>
      <xsl:value-of select="number(substring($token1, 2, 1)) * number(substring($token2, 2, 1))"/>
      <xsl:value-of select="number(substring($token1, 3, 1)) * number(substring($token2, 3, 1))"/>
      <xsl:value-of select="number(substring($token1, 4, 1)) * number(substring($token2, 4, 1))"/>
      <xsl:value-of select="number(substring($token1, 5, 1)) * number(substring($token2, 5, 1))"/>
      <xsl:value-of select="number(substring($token1, 6, 1)) * number(substring($token2, 6, 1))"/>
      <xsl:value-of select="number(substring($token1, 7, 1)) * number(substring($token2, 7, 1))"/>
      <xsl:value-of select="number(substring($token1, 8, 1)) * number(substring($token2, 8, 1))"/>
      <xsl:value-of select="number(substring($token1, 9, 1)) * number(substring($token2, 9, 1))"/>
      <xsl:value-of select="number(substring($token1, 10, 1)) * number(substring($token2, 10, 1))"/>
      <xsl:value-of select="number(substring($token1, 11, 1)) * number(substring($token2, 11, 1))"/>
      <xsl:value-of select="number(substring($token1, 12, 1)) * number(substring($token2, 12, 1))"/>
      <xsl:value-of select="number(substring($token1, 13, 1)) * number(substring($token2, 13, 1))"/>
      <xsl:value-of select="number(substring($token1, 14, 1)) * number(substring($token2, 14, 1))"/>
      <xsl:value-of select="number(substring($token1, 15, 1)) * number(substring($token2, 15, 1))"/>
      <xsl:value-of select="number(substring($token1, 16, 1)) * number(substring($token2, 16, 1))"/>
    </xsl:template>

Following is a simple XSLT named template which extracts the specified bit from a bitfield of this type:

    <xsl:template name="GetValueForPosition">
      <xsl:param name="Bitmask" />
      <xsl:param name="Position"/>
      <xsl:value-of select="substring(normalize-space($Bitmask),$Position,1)" />
    </xsl:template>

XSLT also allows extension functions. In typical Java implementations, for example, these extension functions make the platform API functions available; bitfield operations may be available through these.

Handling hexadecimal-encoded bitfields

For bitfields which have been encoded into hexadecimal values (e.g. Part 4, §2.4.51, §2.4.52, and §2.15.1.86), the state of any individual bit can be easily extracted via a simple named template (just as shown above) which:

- Takes in the bitfield and the value to check (e.g. check for the 0200 bit in 0A40)
- Returns true if set, false otherwise

That template would be represented as follows:

    <xsl:template name="GetValueForPosition">
      <xsl:param name="Bitfield" />
      <xsl:param name="CompareValue" />
      <xsl:variable name="DecimalCompareValue" select="translate($CompareValue,'0','')"/>
      <xsl:variable name="PositionInCompareValue" select="string-length(substring-before($CompareValue,$DecimalCompareValue))+1"/>
      <xsl:variable name="DecimalInBitfield" select="number(substring-before(substring-after('00/11/22/33/44/55/66/77/88/99/A10/B11/C12/D13/E14/F15/',substring($Bitfield,$PositionInCompareValue,1)),'/'))"/>
      <xsl:value-of select="$DecimalInBitfield mod (2*$DecimalCompareValue) >= $DecimalCompareValue"/>
    </xsl:template>

Even with the addition of error handling, the template is still quite short:

    <xsl:template name="GetValueForPosition">
      <xsl:param name="Bitfield" />
      <xsl:param name="CompareValue" />
      <xsl:variable name="DecimalCompareValue" select="translate($CompareValue,'0','')"/>
      <xsl:variable name="PositionInCompareValue" select="string-length(substring-before($CompareValue,$DecimalCompareValue))+1"/>
      <xsl:variable name="DecimalInBitfield" select="number(substring-before(substring-after('00/11/22/33/44/55/66/77/88/99/A10/B11/C12/D13/E14/F15/a10/b11/c12/d13/e14/f15/',substring($Bitfield,$PositionInCompareValue,1)),'/'))"/>
      <xsl:choose>
        <!-- CompareValue must contain exactly one non-zero digit: 8, 4, 2 or 1 -->
        <xsl:when test="($DecimalCompareValue=8 or $DecimalCompareValue=4 or $DecimalCompareValue=2 or $DecimalCompareValue=1) and string-length($Bitfield)=4 and $PositionInCompareValue >=1 and $PositionInCompareValue &lt;=4 and not(string($DecimalInBitfield)='NaN')">
          <xsl:value-of select="$DecimalInBitfield mod (2*$DecimalCompareValue) >= $DecimalCompareValue"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:message terminate="yes">Invalid input.</xsl:message>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

Bitfields allow for extensibility

Another concern raised was the ability to extend this datatype in the future. Although it is true that a bitfield is a fixed size, this does not mean that it cannot be extended: for example, if an additional bit were required, an additional attribute (whether a bitfield, a Boolean or a list) can be added to the element. The values of the particular bitfield may be fixed by the bitfield width, but the information that can be marked up is not limited.

Conclusion

Overall, we believe that this datatype is appropriate, as:

- It is compact
- In several places, it is used in order to reuse existing standards
- It can be easily processed by XSLT and other tools

The following new annex will be added to Part 4:

Annex J Processing Bitfields with XSLT

This Annex is informative.

Two types of bitfields are used in the markup of this specification:

- Values stored as a raw bitfield (Part 4, §2.3.1.8, §2.4.7, and §2.4.8)
- Values stored by encoding the bitfield as a hexadecimal value (e.g. Part 4, §2.4.51, §2.4.52, and §2.15.1.86)

The following information provides a mechanism for dealing with these values using XSLT 1.0. The following transforms have been tested using Saxon 9.0.0.2N.

J.1 Handling raw bitfields

To name and read individual bits of raw bitfields (Part 4, §2.3.1.8, §2.4.7, and §2.4.8), string operations (indexing) or numeric operations can be used to extract the value of each bit.
The following complete XSLT shows how each table in a WordprocessingML document can be processed to check whether a paragraph’s conditional formatting specifies the first row, by checking the bitfield stored in the cnfStyle element (§2.3.1.8):

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
      <xsl:template match="/">
        <BitmaskXSLTExample>
          <xsl:apply-templates select="//w:tbl" />
        </BitmaskXSLTExample>
      </xsl:template>
      <xsl:template match="w:tbl">
        <table>
          <xsl:apply-templates select=".//w:p" />
        </table>
      </xsl:template>
      <xsl:template match="w:p">
        <paragraph>
          <xsl:apply-templates mode="paragraph" select=".//w:cnfStyle"/>
        </paragraph>
      </xsl:template>
      <xsl:template match="w:cnfStyle" mode="paragraph">
        <ParagraphSpecifiesFirstRow>
          <xsl:call-template name="GetValueForPosition">
            <xsl:with-param name="Bitfield" select="@w:val"/>
            <xsl:with-param name="ComparePosition" select="1"/>
          </xsl:call-template>
        </ParagraphSpecifiesFirstRow>
      </xsl:template>
      <xsl:template name="GetValueForPosition">
        <xsl:param name="Bitfield" />
        <xsl:param name="ComparePosition" />
        <xsl:value-of select="substring($Bitfield,$ComparePosition,1)=1"/>
      </xsl:template>
    </xsl:stylesheet>

J.2 Handling hexadecimal-encoded bitfields

For bitfields which have been encoded into hexadecimal values (e.g. Part 4, §2.4.51, §2.4.52, and §2.15.1.86), the state of any individual bit can be easily extracted via a simple named template (just as shown above) which:

- Takes in the bitfield and the value to check (e.g. check for the 0200 bit in 0A40)
- Returns true if set, false otherwise

The following complete XSLT shows how each table in a WordprocessingML document can be processed to check if the conditional table header row should be shown, by checking the bitfield stored in the tblLook element (§2.4.51):

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
      <xsl:template match="/">
        <BitmaskXSLTExample>
          <xsl:apply-templates select="//w:tblLook" />
        </BitmaskXSLTExample>
      </xsl:template>
      <xsl:template match="w:tblLook">
        <TableHasHeaderRow>
          <xsl:call-template name="GetValueForPosition">
            <xsl:with-param name="Bitfield" select="@w:val"/>
            <xsl:with-param name="CompareValue" select="'0020'"/>
          </xsl:call-template>
        </TableHasHeaderRow>
      </xsl:template>
      <xsl:template name="GetValueForPosition">
        <xsl:param name="Bitfield" />
        <xsl:param name="CompareValue" />
        <xsl:variable name="DecimalCompareValue" select="translate($CompareValue,'0','')"/>
        <xsl:variable name="PositionInCompareValue" select="string-length(substring-before($CompareValue,$DecimalCompareValue))+1"/>
        <xsl:variable name="DecimalInBitfield" select="number(substring-before(substring-after('00/11/22/33/44/55/66/77/88/99/A10/B11/C12/D13/E14/F15/a10/b11/c12/d13/e14/f15/',substring($Bitfield,$PositionInCompareValue,1)),'/'))"/>
        <xsl:choose>
          <!-- CompareValue must contain exactly one non-zero digit: 8, 4, 2 or 1 -->
          <xsl:when test="($DecimalCompareValue=8 or $DecimalCompareValue=4 or $DecimalCompareValue=2 or $DecimalCompareValue=1) and string-length($Bitfield)=4 and $PositionInCompareValue >=1 and $PositionInCompareValue &lt;=4 and not(string($DecimalInBitfield)='NaN')">
            <xsl:value-of select="$DecimalInBitfield mod (2*$DecimalCompareValue) >= $DecimalCompareValue"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:message terminate="yes">Invalid input.</xsl:message>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
    </xsl:stylesheet>

End informative Annex.
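For comparison with the XSLT in the Annex, the same check on a hexadecimal-encoded bitfield can be sketched in Python. The function name hex_bit_is_set is hypothetical and is not part of DIS 29500. The sketch assumes four-hexadecimal-digit values and a mask with exactly one bit set, mirroring the input checks that the error-handling template is meant to perform.

```python
import re

# Exactly four hexadecimal digits, upper or lower case.
_HEX4 = re.compile(r"[0-9A-Fa-f]{4}")


def hex_bit_is_set(bitfield: str, mask: str) -> bool:
    """Return True if the single bit named by `mask` is set in `bitfield`."""
    for name, text in (("bitfield", bitfield), ("mask", mask)):
        if not _HEX4.fullmatch(text):
            raise ValueError(f"{name} must be four hexadecimal digits, got {text!r}")
    value = int(bitfield, 16)
    flag = int(mask, 16)
    # A valid mask is a power of two, i.e. it has exactly one bit set.
    if flag == 0 or (flag & (flag - 1)) != 0:
        raise ValueError(f"mask must have exactly one bit set, got {mask!r}")
    return (value & flag) != 0


print(hex_bit_is_set("0A40", "0200"))  # True
print(hex_bit_is_set("0A40", "0020"))  # False
```

Converting the hexadecimal text to an integer lets the native bitwise AND do the work that the XSLT performs with a lookup string and modular arithmetic.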
Similar Comments: CL-0016, CO-0080, CO-0087, CO-0088, CO-0090, CO-0091, CO-0094, CO-0100, CO-0101, CO-0224, DE-0102, DK-0037, DK-0108, DK-0109, DK-0110, DK-0111, DK-0112, DK-0116, DK-0117, DK-0132, DK-0147, FI-0009, FR-0185, FR-0186, FR-0187, FR-0188, FR-0189, FR-0190, FR-0191, FR-0192, FR-0370, GB-0182, GB-0197, GB-0198, GB-0200, GB-0201, GB-0212, GB-0224, GB-0225, GB-0482, GH-0007, GR-0009, GR-0029, GR-0030, GR-0068, GR-0070, GR-0071, GR-0072, GR-0073, GR-0089, IR-0015, IR-0016, IR-0037, IR-0038, IR-0039, IR-0040, IR-0041, IR-0042, US-0056, US-0123, US-0125, US-0126, US-0127, US-0128, US-0129, US-0156, UY-0014, VE-0068, ZA-0010
