
Microsoft Word Formats

   I am now running an up-to-date version of Microsoft Office here.  I have just created a zip file with drawings and a requisition form.  The form is in Microsoft Word format, and I deliberately saved it in the old 97/2000 .doc format.  When I tried to add it to my zip archive, WinZip simply opened the file and showed me the XML data.

   I am pretty certain that XML is part of the new version of Word.  The old format was binary.  I am concerned that the person on the other end of this may not be able to read the latest version of Word.

   Has anyone else seen anything like this?


RE: Microsoft Word Formats

XML?  You are correct: I thought XML was in the new format and not the old one. Are you sure it is XML and not Rich Text Format (RTF)? RTF would make more sense.

On the other hand, maybe BECAUSE you have asked to downgrade, MS-Word may be using XML as an intermediate format.  

While XML is NOT the native format of old Word files, it is very likely that old Word does import XML.

RE: Microsoft Word Formats


   The new OpenDocument and Word formats are both zip archives containing XML files.  The files appear to be XML, although I did not examine them closely.  I don't see why they would be RTF.  There is no need to put an RTF file inside an archive.
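One way to check what a file really is, rather than trusting its extension, is to look at its first few bytes. The sketch below assumes the commonly documented signatures: OLE2 compound files (the binary container used by Word 97-2003 .doc) start with D0 CF 11 E0 A1 B1 1A E1, ZIP archives (.docx, .odt) start with PK\x03\x04, and RTF files start with {\rtf. The function names are hypothetical, and the demo builds a tiny stand-in ZIP in memory rather than a real Word document.

```python
import io
import zipfile

# Leading "magic" bytes for each candidate format.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Word 97-2003 .doc (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .docx / .odt are ZIP archives
RTF_MAGIC = b"{\\rtf"                            # RTF is plain text starting with {\rtf


def sniff_format(data: bytes) -> str:
    """Guess a document's container format from its first few bytes."""
    if data.startswith(OLE2_MAGIC):
        return "ole2-binary"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def list_xml_parts(data: bytes) -> list[str]:
    """List the XML members of a ZIP-based document (e.g. .docx or .odt)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [name for name in zf.namelist() if name.endswith(".xml")]


if __name__ == "__main__":
    # Build a tiny stand-in for a .docx in memory (not a complete Word file).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    fake_docx = buf.getvalue()

    print(sniff_format(fake_docx))                 # zip
    print(list_xml_parts(fake_docx))               # ['[Content_Types].xml', 'word/document.xml']
    print(sniff_format(OLE2_MAGIC + bytes(8)))     # ole2-binary
    print(sniff_format(b"{\\rtf1\\ansi Hello}"))   # rtf
```

To check a real file, reading its first 8 bytes in binary mode and passing them to `sniff_format` should be enough. A result of "zip" for a file you meant to save as 97/2000 .doc would suggest it is not actually in the old binary format.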


RE: Microsoft Word Formats

XML is a newer file format that is eventually aimed at replacing HTML across the internet. Newer versions of Office/Word write their data in .xdoc, etc. so that these files include formatting for use on the world wide web. Older versions of Office/Word like 97/2000 versions did not write files with .xml encoding. I don't believe that came about until release of Office 2003 or later. Those earlier versions had file formats like .doc, .xls, etc. Notice the lack of "x" in the file format nomenclature for those older versions.

RE: Microsoft Word Formats

tz101, that info is misleading. It reflects some common confusion about the history and purpose of XML. I'm no expert, but I'd hate for that misinformation to persist.

FWIW, XML is not aimed at replacing HTML, because they're not substitutes for each other. You can write your HTML in XML-compliant form, and it becomes XHTML (very roughly speaking). It's XHTML that is supposed to supersede HTML, not XML. Lots of people got confused by this distinction and assumed XML was a web language. It's not; it's something far more general than that. Think of it more as a container format, like a database, than as a language like HTML. It can contain HTML, but it can't replace HTML.
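The HTML/XHTML distinction can be seen with any XML parser: markup that is perfectly legal HTML, such as an unclosed `<br>`, is rejected as not well-formed XML, while the same content written XML-compliantly parses fine. This sketch uses Python's standard `xml.etree.ElementTree`; the helper name and snippets are made up, and it only checks well-formedness, not full XHTML validity (real XHTML also expects the XHTML namespace and a proper document type).

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Legal HTML, but <br> and <p> are never closed, so it is not well-formed XML.
html_snippet = "<html><body><p>First line<br>Second line</body></html>"
# Roughly the same content written the XML-compliant (XHTML-style) way.
xhtml_snippet = "<html><body><p>First line<br/>Second line</p></body></html>"

print(is_well_formed_xml(html_snippet))   # False
print(is_well_formed_xml(xhtml_snippet))  # True
```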

Secondly, newer Office files have .docx extension, not .xdoc.

Thirdly, the files don't necessarily "include formatting for use on the WWW". They just use the XML format to encapsulate the file data - the file data is still Word data or Excel data or whatever. Sure, it makes it easier to parse the document using web services, but that's by-the-by. Whatever is parsing the file still needs to understand the actual data content of the file, which could be anything. Just like someone reading a database file could identify each of the fields but not necessarily the content of those fields, someone reading an XML file could identify the elements but wouldn't necessarily know what to do with the content of each.
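The database analogy can be illustrated with a trimmed fragment in the style of the `word/document.xml` part inside a .docx. A generic XML parser happily lists every element, but pulling out the actual text requires knowing the Word vocabulary: text sits in `w:t` elements, inside runs (`w:r`), inside paragraphs (`w:p`). The namespace URI below is the one used by WordprocessingML; the document content and the helper `paragraph_texts` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML; the content itself is made up.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DOCUMENT_XML = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Requisition</w:t></w:r></w:p>
    <w:p><w:r><w:t>Qty: 4</w:t></w:r></w:p>
  </w:body>
</w:document>"""

root = ET.fromstring(DOCUMENT_XML)

# A generic parser can identify every element (tags come back as "{uri}local")...
tags = sorted({el.tag.split("}")[1] for el in root.iter()})
print(tags)  # ['b', 'body', 'document', 'p', 'r', 'rPr', 't']


# ...but only knowledge of the Word vocabulary says which elements hold text.
def paragraph_texts(doc: ET.Element) -> list[str]:
    ns = {"w": W_NS}
    return ["".join(t.text or "" for t in p.findall(".//w:t", ns))
            for p in doc.findall(".//w:p", ns)]


print(paragraph_texts(root))  # ['Requisition', 'Qty: 4']
```

Nothing in the parser knows that `w:b` means bold; that meaning comes from the Word format specification, which is the point being made above.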

RE: Microsoft Word Formats



   I have tested that.  Word makes an absolute shambles of HTML hand-coded in Notepad.  Word is designed to put text on paper, rigorously formatted.  HTML is designed to place text on all sorts of things, definitely including computer screens.

   XML is a replacement for SGML (Standard Generalized Markup Language).  HTML is an SGML document type.  XHTML is an XML document type.  Word's new file format is another XML document type.

