Tutorial: Extract original images from MS-Word .DOC using OpenOffice.org
This tutorial shows you how to extract the images from a Microsoft Word document in their original form, without recompression, using the free OpenOffice.org office suite. You can even extract embedded clip art (such as WMF files) and other vector-based graphics in their original format.
This tutorial is inspired by the "How to extract the images from a .doc file using OpenOffice.org Writer" tutorial, which explains another way of extracting the images: exporting the pages as an HTML document. The HTML method works well, especially if the images are .GIF or .JPG files. But if the Word document contains, for instance, .WMF files and you want them in their original scalable format or without losing information through compression, you need another way of extracting them:
Ingredients:
- the free OpenOffice.org office suite, either installed or the portable version available from the PortableApps.com website.
- a .DOC document containing the images (here is a sample document we prepared for you, sample-images.doc, which contains two clip art images in WMF format and one PNG image)
A few simple steps:
1. Convert the .DOC file to .ODT:
–> Open the document in OpenOffice.org Writer
–> Save the document as an ODT (OpenDocument Text) file, the native OpenOffice.org file format (from the File menu choose Save As and select ODT as the file type)
2. Use a file manager/browser to rename the saved ODT document to .ZIP and unpack it
–> Open a file browser and go to the ODT document (sample-images.odt)
–> Rename the .ODT document to .ZIP (sample-images.odt –> sample-images.zip)
–> Unzip the document into a folder
3. Get your original embedded images from the Pictures folder inside the archive you unpacked.
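If you would rather not rename and unzip files by hand, steps 2 and 3 can be scripted. Below is a minimal Python 3 sketch using only the standard library `zipfile` module; the function name `extract_pictures` is our own, hypothetical choice. It assumes you have already saved the document as .ODT (step 1) and opens it directly as a ZIP archive, so no renaming is needed.

```python
import zipfile
from pathlib import Path


def extract_pictures(odt_path, out_dir):
    """Copy every entry under Pictures/ in an ODT (a ZIP archive) to out_dir.

    Returns the list of written file paths. The bytes are copied unchanged,
    so WMF, PNG, JPEG, etc. come out exactly as they were stored.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(odt_path) as odt:
        for info in odt.infolist():
            # Keep only files inside the Pictures/ folder; skip directory entries.
            if not info.filename.startswith("Pictures/") or info.is_dir():
                continue
            # Use only the base name so no entry can write outside out_dir.
            target = out / Path(info.filename).name
            target.write_bytes(odt.read(info))
            written.append(target)
    return written
```

For the sample document you would call `extract_pictures("sample-images.odt", "images")` and find the two WMF files and the PNG in the `images` folder. Because the script reads the raw archive entries, nothing is converted or recompressed along the way.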
THAT'S IT! As simple as that.
Explanations:
This tutorial takes advantage of the OpenDocument Format (ODF): an ODF document is actually stored as a ZIP archive, containing the text and styles as XML files and the pictures, in their original format, in a Pictures sub-folder of the archive.
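To see this structure for yourself, the short Python 3 sketch below (standard library only; `describe_odt` is a hypothetical helper name) lists the main parts of an ODF package. It relies on the ODF packaging convention that the archive contains a plain-text `mimetype` entry naming the document type (for example `application/vnd.oasis.opendocument.text` for an .ODT), XML parts such as `content.xml` and `styles.xml` at the root, and any embedded images under `Pictures/`.

```python
import zipfile


def describe_odt(odt_path):
    """Summarise the ZIP structure of an ODF document.

    Returns (mimetype, xml_parts, pictures): the media type stored in the
    'mimetype' entry (or None if absent), the XML files at the archive root,
    and the file entries inside the Pictures/ folder.
    """
    with zipfile.ZipFile(odt_path) as odt:
        names = odt.namelist()
        mimetype = odt.read("mimetype").decode("ascii") if "mimetype" in names else None
        # Root-level XML parts such as content.xml, styles.xml, meta.xml.
        xml_parts = sorted(n for n in names if "/" not in n and n.endswith(".xml"))
        # Embedded images, skipping the directory entry itself.
        pictures = sorted(n for n in names if n.startswith("Pictures/") and not n.endswith("/"))
    return mimetype, xml_parts, pictures
```

Running it on the saved sample-images.odt should show the text and style XML files alongside the picture entries, which is exactly what you find after unzipping the file by hand.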
Open and Free:
Of course, the openness of the file format “opens” endless possibilities for creating applications that work with the documents you create in the OpenOffice.org office suite and the documents you import from Word.
Unlike the closed MS-Office file format, an ODF-based office suite (like OpenOffice.org) gives you, as you have seen in this tutorial, the freedom to do whatever you like with the files and documents you have created, without waiting for developers to create a special tool for extracting the images from the document. Word itself offers no such feature either.
Enjoy the taste of freedom! …and please drop a comment and tell us about your experience and your opinion of the above.
Written by cdriga on November 4th, 2007 with 3 comments.
#1. November 4th, 2007, at 3:41 PM.
Hey that’s a really good way to go about getting the images out, it never even occurred to me. Cheers!