C S Shyam Sundar's Weblog

under the hood of a “.docx” file

Posted by: Shyam Sundar on: August 23, 2007

Microsoft Word 2007 saves new documents in the “.docx” format by default, which is *NOT* backward compatible with earlier versions of Microsoft Office.

Microsoft offers a free utility for download that converts “.docx” files so that older versions of Office can use them.

Now, we can analyze what is under the hood of a .docx file.

[+] First, create a small document. I have created a small hello world doc.

1.jpg

[+] Now, change the file's extension from .docx to .zip. In my case the file is Hello World.docx. After renaming, unzip it to a folder.

2.jpg
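
The rename-and-unzip step works because a .docx file is an ordinary ZIP archive, so it can also be opened directly from a script without renaming anything. Here is a minimal sketch, assuming Python 3 and its standard zipfile module; the function names list_docx_parts and extract_docx are my own, chosen for illustration.

```python
import zipfile


def list_docx_parts(path):
    """Return the names of the files stored inside a .docx package.

    `path` may be a file name or an open binary file object.
    """
    with zipfile.ZipFile(path) as package:
        return package.namelist()


def extract_docx(path, dest_dir):
    """Unzip a .docx package into dest_dir, like renaming to .zip and unzipping."""
    with zipfile.ZipFile(path) as package:
        package.extractall(dest_dir)
```

For my file, calling list_docx_parts("Hello World.docx") should list the same entries that show up in the unzipped folder, including word/document.xml.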

[+] Inside the “word” folder there are several XML files. Of these, “document.xml” holds the contents of the document in OOXML (that is Office Open XML, and not Object Oriented XML ;-) ).

3.png
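
To pull the text back out of document.xml, the XML can be parsed with any standard parser. The following is only a rough sketch, assuming Python 3 and assuming that the text sits in w:t elements inside w:p paragraph elements in the WordprocessingML namespace; the function name docx_paragraphs is hypothetical, and the sketch ignores tables, headers, and anything beyond plain runs of text.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace (the "w:" prefix seen in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(path):
    """Return the plain text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(path) as package:
        xml_bytes = package.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its text is split across <w:t> elements.
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

Calling docx_paragraphs("Hello World.docx") on my sample file would return a list with one string per paragraph.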

From the XML we can see that the contents of our document are stored as XML markup.
Good to see that Microsoft has started to support open formats.
