Copyright (C) 2009-2010 Aurelien Bompard
This script converts an XHTML page, such as a wiki page, to the OpenDocument Text (ODT) format, which is standardized as ISO/IEC 26300:2006 and is the native format of office suites such as OpenOffice.org, KOffice, and others.
It uses a template ODT file, which is filled with the converted content of the XHTML page.
Website: http://xhtml2odt.org
Inspired by the work on docbook2odt, by Roman Fordinal
Call the script with the --help option to see all the available options. The main options are -i (input HTML file), -o (output ODT file), and -t (template ODT file).
The full help message is:
Usage: xhtml2odt.py [options] -i input -o output -t template.odt

Options:
  -h, --help            show this help message and exit
  -i FILE, --input=FILE
                        Read the html from this file
  -o FILE, --output=FILE
                        Location of the output ODT file
  -t FILE, --template=FILE
                        Location of the template ODT file
  -u URL, --url=URL     Use this URL for relative links
  -v, --verbose         Show what's going on
  --html-id=ID          Only export from the element with this ID
  --replace=KEYWORD     Keyword to replace in the ODT template (default is
                        ODT-INSERT)
  --cut-start=KEYWORD   Keyword to start cutting text from the ODT template
                        (default is ODT-CUT-START)
  --cut-stop=KEYWORD    Keyword to stop cutting text from the ODT template
                        (default is ODT-CUT-STOP)
  --top-header-level=LEVEL
                        Level of highest header in the HTML (default is 1)
  --img-default-width=WIDTH
                        Default image width (default is 8cm)
  --img-default-height=HEIGHT
                        Default image height (default is 6cm)
  --dpi=DPI             Screen resolution in Dots Per Inch (default is 96)
  --no-network          Do not download remote images
  --stylesdir=DIR       Override the style templates directory
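As an illustration, the options above can be assembled into a command line from Python. This is only a sketch: the helper name `build_command` and the file names are made up for the example, and it assumes that xhtml2odt.py is executable and on the PATH. Only the flags themselves come from the help message.

```python
def build_command(input_path, output_path, template_path,
                  url=None, top_header_level=None, no_network=False):
    """Build an argument list for xhtml2odt.py (hypothetical helper)."""
    # The three options shown in the usage line.
    cmd = ["xhtml2odt.py",
           "-i", input_path,
           "-o", output_path,
           "-t", template_path]
    # Optional flags, spelled as in the help message.
    if url is not None:
        cmd.append("--url=%s" % url)
    if top_header_level is not None:
        cmd.append("--top-header-level=%d" % top_header_level)
    if no_network:
        cmd.append("--no-network")
    return cmd


# Example file names and URL are placeholders.
cmd = build_command("page.html", "page.odt", "template.odt",
                    url="http://example.com/wiki/", no_network=True)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to run the conversion.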
GNU LGPL v2 or later: http://www.gnu.org/licenses/lgpl-2.0.html
This program is free software; you can redistribute it and/or modify it under the terms of the GNU Library General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public License for more details.
This class contains the HTML document to convert to ODT. The HTML code will be run through Tidy to ensure that it is valid and well-formed XHTML.
| Variable options: | An OptionParser-result object containing the options for processing. |
|---|---|
| Variable html: | The HTML code. |
Handles the conversion and production of an ODT file.
Downloads the given image to a temporary location.
| Parameter: | src (str) – the URL to download |
|---|---|
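A download of this kind can be sketched with the standard library alone. The function below is not the script's own code: the name `download_img`, the temporary file prefix, and the choice to keep the original file extension are assumptions made for the illustration.

```python
import os
import shutil
import tempfile
import urllib.request
from urllib.parse import urlparse


def download_img(src):
    """Download src to a temporary file and return that file's path."""
    # Assumption: keep the extension so the image type stays recognizable.
    suffix = os.path.splitext(urlparse(src).path)[1]
    fd, tmp_path = tempfile.mkstemp(prefix="xhtml2odt-", suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            with urllib.request.urlopen(src) as response:
                # Stream the body to disk instead of loading it in memory.
                shutil.copyfileobj(response, tmp_file)
    except Exception:
        # Do not leave an empty temporary file behind on failure.
        os.remove(tmp_path)
        raise
    return tmp_path
```

The caller is responsible for deleting the temporary file once the image has been copied into the ODT archive.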
Handling of image tags in the XHTML. Local and remote images are handled differently: see the handle_local_img() and handle_remote_img() methods for details.
| Parameter: | xhtml (str) – the XHTML content to import |
|---|---|
| Returns: | XHTML with normalized img tags |
| Return type: | str |
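The callback mechanism can be illustrated with a small, self-contained sketch. The regular expression, the function name `dispatch_imgs`, and the rule "same host as the base URL means local" are assumptions for this example; the real handle_local_img() and handle_remote_img() methods do more work than the stand-in callbacks used here.

```python
import re
from urllib.parse import urljoin, urlparse

# Simplified pattern: an img tag with a double-quoted src attribute.
IMG_RE = re.compile(r'<img\s[^>]*?src="([^"]*)"[^>]*>')


def dispatch_imgs(xhtml, base_url, handle_local, handle_remote):
    """Run a callback on every img tag, chosen by where the image lives."""
    base_host = urlparse(base_url).netloc

    def dispatch(img_mo):
        # Resolve relative links against the page URL (see the -u option).
        src = urljoin(base_url, img_mo.group(1))
        if urlparse(src).netloc == base_host:
            return handle_local(img_mo)
        return handle_remote(img_mo)

    # re.sub calls dispatch once per match and inserts its return value.
    return IMG_RE.sub(dispatch, xhtml)


# Stand-in callbacks that only mark which branch was taken.
page = ('<p><img src="/a.png" alt="a"/> '
        '<img alt="b" src="http://other.example/b.png"/></p>')
result = dispatch_imgs(
    page, "http://example.com/wiki/Page",
    handle_local=lambda mo: "[local:%s]" % mo.group(1),
    handle_remote=lambda mo: "[remote:%s]" % mo.group(1))
```

Each callback receives the match object, so it can read the original tag with `img_mo.group(0)` and return the text that should replace it.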
Imports an image into the ODT file.
| Parameters: | |
|---|---|
Handling of local images. This method should be called as a callback on each img tag.
Find the real path of the image file and use the handle_img() method to flag it for inclusion in the ODT file.
This implementation downloads the files that come from the same domain as the XHTML document, but server-based export plugins can simply retrieve them from the local disk, using either the DOCUMENT_ROOT or any other appropriate method (depending on the web application you’re writing an export plugin for).
| Parameter: | img_mo – the match object from the re.sub callback |
|---|---|
Downloads remote images to a temporary file and flags them for inclusion using the handle_img() method.
| Parameter: | img_mo – the match object from the re.sub callback |
|---|---|
Main function to run the conversion process.
The next logical step is to use the save() method.
| Parameter: | xhtml (str) – the XHTML content to import |
|---|---|
Insert ODT XML content into the content.xml file, replacing the keywords if needed.
| Parameter: | content (str) – ODT XML content to insert |
|---|---|
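One way to picture the keyword replacement is the sketch below. It is written under stated assumptions rather than taken from the script: the template is assumed to hold each keyword alone in its own `text:p` paragraph, the converted content is assumed to be appended to the end of the document body when the replace keyword is absent, and everything from the cut-start paragraph to the cut-stop paragraph is assumed to be dropped. The keyword defaults are the ones listed in the help message.

```python
import re


def insert_content(template_xml, content, replace="ODT-INSERT",
                   cut_start="ODT-CUT-START", cut_stop="ODT-CUT-STOP"):
    """Return template_xml with content inserted (illustrative only)."""
    def paragraph(keyword):
        # A paragraph whose whole text is the keyword (assumed layout).
        return r"<text:p[^>]*>" + re.escape(keyword) + r"</text:p>"

    keyword_re = re.compile(paragraph(replace))
    if keyword_re.search(template_xml):
        # A function replacement keeps backslashes in content literal.
        result = keyword_re.sub(lambda mo: content, template_xml)
    else:
        # Assumed fallback: append at the end of the document body.
        result = template_xml.replace(
            "</office:text>", content + "</office:text>", 1)

    # Remove the sample text between the two cut markers, markers included.
    cut_re = re.compile(
        paragraph(cut_start) + r".*?" + paragraph(cut_stop), re.DOTALL)
    return cut_re.sub("", result)
```

With this layout, a template author can leave placeholder text in the template for previewing styles and have it removed from the exported document.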
General method to save the in-memory content to an ODT file on the disk.
If output is None, the document is returned.
| Parameter: | output (str or file-like object or None) – where the document should be saved, see the -o option. |
|---|---|
| Returns: | the ODT document if output is None, otherwise None. |
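The three accepted kinds of output can be handled as in the following sketch. It is an illustration, not the script's implementation: the `files` dictionary (archive member names mapped to bytes) is a hypothetical stand-in for the in-memory document, and the function is a free function rather than a method.

```python
import io
import zipfile


def save(files, output=None):
    """Write the in-memory files as an ODT archive (illustrative sketch)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as odt:
        # OpenDocument packages store mimetype first and uncompressed.
        if "mimetype" in files:
            odt.writestr("mimetype", files["mimetype"],
                         compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            if name != "mimetype":
                odt.writestr(name, data)
    document = buffer.getvalue()

    if output is None:
        # No destination given: hand the document back to the caller.
        return document
    if hasattr(output, "write"):
        # File-like object, assumed to be opened in binary mode.
        output.write(document)
    else:
        # Otherwise treat output as a path on disk.
        with open(output, "wb") as out_file:
            out_file.write(document)
    return None
```

Returning the bytes when output is None is convenient for web applications that send the document directly in an HTTP response.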
Converts the XHTML content into ODT.
| Parameter: | xhtml (str) – the XHTML content to import |
|---|---|
| Returns: | the ODT XML from the conversion |
| Return type: | str |