> the proposers works within the W3C’s MathML activity,
It should perhaps be noted that Microsoft do play a full role in the W3C Math Working Group (both the current group working on MathML3 and the earlier MathML2 group). As it happens, none of the main teams working on ODF is similarly represented; they would be welcome if they joined!
It is not clear what using MathML internally would achieve.
Word 2007 for example uses oomml internally but puts MathML on the clipboard when doing cut and paste, which means that you can, using standard open MathML format, cut and paste expressions between IE, Word and Maple (for example).
(There are some bugs in the current conversion stylesheet that is used here, but that is an application issue, irrelevant to the standards process.) Conversely, while it’s true that ODF uses MathML in the zip file, the ODF application, OpenOffice, uses some internal private format rather than MathML on the clipboard when cutting and pasting mathematics. It’s far more useful to expose MathML in interfaces such as clipboard cut and paste.
oomml is fairly strange structurally, but the point is that it’s no more or less weird than the rest of Office Open XML. The Math Zones are designed to be integrated into the general text editing mechanisms, and the markup reflects that by being very similar in structure. I think there are legitimate questions as to whether it really makes sense for ISO to be standardising the internal format of MS Word, but if they _are_ doing that I think that it makes far more sense for the mathematics to be encoded in the same style, rather than having it be isolated in some foreign MathML markup.
So I’m fairly neutral on whether DIS29500 should be approved, but I don’t think that its non-use of MathML should be taken as a reason for blocking approval.
The situation is very similar to that of computer algebra systems. Mathematica and Maple, for example, have good MathML input and output, but if you look at the XML encodings of their worksheets the mathematics is not in MathML; it’s in an application-specific form.
David
(co-editor of MathML2 and 3, but writing in a personal capacity)
“It is not clear what using MathML internally would achieve.”
It’s in an ISO file format - that’s an interface, not an implementation detail.
Yes, I think the point is that the file format is not an “internal” thing. If it was all considered internal then they might as well stick with the binary format.
Sam, I don’t understand your comment. I didn’t say that the format for storing mathematics was an implementation detail; certainly it is an important part of the spec. But mathematics (as opposed to, say, image formats) is best served by being treated as a sort of richly marked up _text_, which lets you do things to it in ways similar to other text, using the same interface: searching, font changes, spelling autocorrection, whatever else your application does. So it makes perfect sense for documents stored in ooxml to have a markup for mathematics that is basically the same in style (because it’s basically RTF with additional math properties written out with pointy brackets). I think there is a perfectly reasonable argument to be made that ISO shouldn’t be standardising ooxml (I’m personally undecided on that), but I don’t think ooxml for normal text and MathML for math really makes sense as a coherent format, or at least I think that the ooxml developers could make a perfectly reasonable argument along those lines.
Alan, my comment about “internal” is perhaps misplaced, but it really is a reaction to Rob Weir’s posts on this, where he gave ODF as an example of a good citizen for using MathML and showed how, by unzipping the ODF file and extracting the MathML, you could get it into Mathematica. Conversely, Word 2007 is held up as a bad citizen for storing the math as oomml, despite the fact that it’s _much_ simpler to get MathML out of Word: you just use cut and paste. Microsoft have an oomml format that suits them and allows them to hook their GUI into the math expressions; they also supply an XSLT transformation to/from MathML and invoke that transformation on cut and paste. This is all good, it seems to me. The fact that it’s XML rather than binary is also good; for example, it allows you to easily fix the translations when they are not quite right (see my blog).
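For readers who have not tried it, the unzip-and-extract route can be sketched roughly as below. This is only an illustration, not Rob Weir’s actual procedure: the helper name `extract_mathml` is made up here, and it assumes formulas sit in `content.xml` parts of the package (for example `Object 1/content.xml`) as elements in the standard MathML namespace.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# The W3C MathML namespace URI.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def extract_mathml(odf_bytes):
    """Return every MathML <math> element found in the package, serialized as text.

    Hypothetical helper: assumes formulas live in parts named content.xml.
    """
    found = []
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith("content.xml"):
                continue
            root = ET.fromstring(archive.read(name))
            # iter() also yields the root itself, so a part whose root
            # element is <math> is picked up as well.
            for node in root.iter("{%s}math" % MATHML_NS):
                found.append(ET.tostring(node, encoding="unicode"))
    return found
```

Compared with cut and paste this is clearly the more roundabout way for an end user, which is the point being made above.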
The file format (it seems to me) is an external dump of Word’s state, and as such it doesn’t really make sense to include MathML in it, as Word doesn’t internally handle mathematical text as MathML. As I said, I won’t necessarily argue with anyone who questions whether an external dump of Word’s state is something ISO ought to standardise, as that is a political/historical/commercial judgement as much as anything else, and not something that I’m particularly qualified to address. But I do know something about math markup, having had design roles in all three of LaTeX, OpenMath and MathML (but not oomml:-), so I only really want to comment on the specific issue of whether ooxml for text + MathML for math would be a better format than either of the two other formats currently on the table (ooxml+oomml and ODF).
“I think there are legitimate questions as to whether it really makes sense for ISO to be standardising the internal format of MS Word, but if they _are_ doing that I think that it makes far more sense for the mathematics to be encoded in the same style, rather than having it be isolated in some foreign MathML markup.”
The intent is to standardise a format for office documents, not Office documents!
Like you say, reusing the MathML standard for copy/paste is a useful feature for interop. If we were developing a standard for copy/paste behaviour in math-related applications, the specified behaviour would likely be exactly what Office does now, and OOo would need to change its behaviour to comply.
File formats are a different and important place for interop, and for the same reason the standard should reuse MathML where appropriate, and the next version of Office should implement the standard.
If MathML isn’t a drop-in replacement, but there’s significant overlap, extensions could be defined here or in an external standard.
“If MathML isn’t a drop-in replacement, but there’s significant overlap, extensions could be defined here or in an external standard.”
There is essentially no overlap between MathML and oomml: it isn’t just a case of adding some extra attributes or changing some element names, it is structurally completely different.
Going from oomml to MathML requires character-by-character parsing of text nodes. Murray would probably disagree with me (in fact I’m sure he would:-) but if you were just starting from scratch and wanted to encode mathematical expressions in XML you’d never end up with oomml. Frankly it’s just too weird, but looked at in context, embedded in an ooxml document, it doesn’t look weird at all; it looks just like the rest, which is not unrelated to the fact that the editing interface for mathematics and other text is rather similar.
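To give a feel for what “character-by-character parsing” means, here is a deliberately simplified sketch. It is not the stylesheet Microsoft ships; the function name `run_to_mathml` is made up, and it assumes a text run arrives as one flat string such as `x+12` that has to be split into MathML identifier, operator and number tokens.

```python
import xml.etree.ElementTree as ET


def run_to_mathml(text):
    """Split a flat text run into MathML token elements inside an <mrow>.

    Illustrative only: letters become <mi>, digit sequences become <mn>,
    and anything else becomes <mo>. Whitespace is skipped.
    """
    row = ET.Element("mrow")
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            # Consume the whole number, including a decimal point.
            j = i
            while j < len(text) and (text[j].isdigit() or text[j] == "."):
                j += 1
            ET.SubElement(row, "mn").text = text[i:j]
            i = j
        elif ch.isalpha():
            # One identifier per letter, as in typical math typesetting.
            ET.SubElement(row, "mi").text = ch
            i += 1
        else:
            ET.SubElement(row, "mo").text = ch
            i += 1
    return ET.tostring(row, encoding="unicode")


print(run_to_mathml("x+12"))  # <mrow><mi>x</mi><mo>+</mo><mn>12</mn></mrow>
```

MathML wants each token in its own element, so a converter has to make these decisions for every character of a run; a real one also has to deal with multi-letter function names, invisible operators and so on.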
It’s perfectly valid to argue that ISO should just be standardising one thing (ODF/MathML), but if they are going to standardise two things, they may as well be two coherent document formats (ODF/MathML and ooxml/oomml) rather than forcing one of the formats to be a shotgun marriage of ooxml with MathML. The worst possible outcome would be that documents were saved with “extended” MathML, which would end up being like an MS Office generated “web page” which is only recognisable as html as it starts with
[grrr half baked comment system on this site ate the end of my comment]
… only recognisable as html as it starts with <html and ends with /html; the actual data is really stuffed into private application-specific comments and namespaced attributes and elements.
Hi David,
thanks for your comment. I wasn’t really expecting quite so much in-depth discussion, so this is just the vanilla WordPress comment system. I am planning to add some bits to it soon to make it a fully baked comment system.