
by Egon Willighagen

About the author:

Joined the Dutch LF team in 1999 and became second editor earlier this year. Is an informational chemistry student at the University of Nijmegen. Plays basketball and enjoys hiking.



Making PDF documents with DocBook


Abstract:

This article describes how you can use DocBook to develop PDF documents. It covers the tools you need to edit DocBook articles and the tools that translate them to PDF documents. Since the article only names the software you need and does not explain how to install it, it is intended for experienced Linux users.

The first part of this article focuses on the format of DocBook documents. Once DocBook has been introduced, I will explain which tools are needed to convert DocBook documents to PDF documents that can be viewed with Acrobat.



 

What is DocBook?

DocBook [1] is an SGML application developed to mark up documents, just like HTML marks up web documents. In contrast to HTML, DocBook carries no information about the layout of the document. That is why DocBook documents need to be converted to other formats before they can be viewed. The conversion is done by tools that apply a stylesheet to the DocBook document.


 
Figure 1: Conversion from DocBook to PDF with a stylesheet

Later in this article I will explain which stylesheet you must use for this conversion and which tool applies the stylesheet to the DocBook document. First we are going to see how documents are put together.

 

Writing an article

DocBook is able to mark up two kinds of documents: articles and books. Since they are in principle the same, I will use the article markup as an example. Before giving an example of a simple article document, I will first cover some basic principles of DocBook.

DocBook started out as an SGML application, just like HTML, but there is also an XML version of DocBook. The XML version is stricter, but easier to read and therefore easier to learn. Since XML is itself a restricted form of SGML, SGML tools can generally still be used. The main differences between the SGML and XML variants are the following (and these hold for every XML application):

1.  Every element must be closed explicitly.
2.  Elements must be nested properly.

The first requirement means that you cannot use <BR> as in HTML, but must use <BR></BR> or the equivalent shorthand <BR/>. The second requirement means that you cannot write <B><A HREF="some_url">click here</B></A> but must nest the elements properly: <B><A HREF="some_url">click here</A></B>.
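As a minimal sketch, the same two rules can be shown with DocBook elements instead of HTML ones. The fragment uses the DocBook elements anchor, emphasis and ulink; the id value and the URL are placeholders chosen for illustration and do not come from the article.

```xml
<?xml version="1.0"?>
<!-- Illustrative fragment only: the id and url values are placeholders. -->
<para>
  <!-- An empty element is still closed, here with the /> shorthand. -->
  <anchor id="start"/>
  <!-- ulink opens inside emphasis, so it must also close inside emphasis. -->
  <emphasis><ulink url="some_url">click here</ulink></emphasis>
</para>
```

An XML parser should reject the document as soon as either rule is broken, which makes such mistakes easy to find.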

Now that we have covered these important formalities, we can start writing articles in DocBook.

    <?xml version="1.0"?>
    <article>
      <title>Writing DocBook articles</title>
      <artheader>
        <abstract>
          <para>
            This article describes how you can use DocBook to develop
            PDF documents and will cover tools you need to edit DocBook
            articles and tools to translate them to PDF documents.
          </para>
        </abstract>
        <author>
          <firstname>Egon</firstname>
          <surname>Willighagen</surname>
        </author>
        <date></date>
      </artheader>
    </article>

Not that difficult I would say. We have started an article with a title, a short abstract, a date on which it was written and the name of the author.

The next step is to add sections to the article by making use of section elements:

    <?xml version="1.0"?>
    <article>
      <title>Writing DocBook articles</title>
      <artheader>
        ... the article's header ...
      </artheader>

      <section>
        <title>Introduction</title>
      </section>

      ... other sections ...

    </article>

We have now added an Introduction section to the article. Additional section elements can be used to give Results, Conclusion or any other section.
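As a rough sketch of what such an article could look like with several sections, the skeleton below adds Results and Conclusion next to the Introduction. The header is left out for brevity and the text in each section is placeholder text; each section already holds a para element, which is explained in the next part of this article.

```xml
<?xml version="1.0"?>
<article>
  <title>Writing DocBook articles</title>

  <!-- Each section carries its own title followed by its content. -->
  <section>
    <title>Introduction</title>
    <para>Placeholder text for the introduction.</para>
  </section>

  <section>
    <title>Results</title>
    <para>Placeholder text for the results.</para>
  </section>

  <section>
    <title>Conclusion</title>
    <para>Placeholder text for the conclusion.</para>
  </section>
</article>
```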

 

Adding text and other information

All text is contained in para elements, comparable with HTML's p elements:

    <section>
      <title>Introduction</title>
      <para>
        DocBook is an SGML application
        developed to mark up documents, just like HTML marks up web documents.
      </para>
    </section>

Besides text, many other elements are available. The rest of this section shows how examples, lists, pictures and some other kinds of information can be inserted into the article.

Adding examples

Examples can be added with the example element. The following example contains a program listing:

    <example>
      <title>Perl program that converts an XML document into a HTML page.</title>
      <programlisting>
        #!/usr/bin/perl -w
        use diagnostics;
        use strict;
        use XML::XSLT;

        my $XSLTparser = XML::XSLT->new();
        $XSLTparser->open_project ("file.xml", "stylesheet.xsl", "FILE", "FILE");
        $XSLTparser->process_project;
        $XSLTparser->print_result();
      </programlisting>
    </example>

An example element can also contain text, pictures and other information.

Adding lists

As in HTML, DocBook documents can contain lists. A list is defined by the itemizedlist element, which contains one or more listitem elements:

    <itemizedlist>
      <listitem>
        <para>an item</para>
      </listitem>
      <listitem>
        <para>another item</para>
      </listitem>
      <listitem>
        <para>and again an item</para>
      </listitem>
    </itemizedlist>

Note that here, too, the text is contained in a para element. Text must always be contained within this element!

Lists can also be ordered. In that case you use the orderedlist element instead of the itemizedlist element. By adding a numeration attribute (e.g. <orderedlist numeration="arabic">) you can set the number type.
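A minimal sketch of an ordered list follows; the item texts are placeholders. Because attribute values are case-sensitive in XML, the value is written in lowercase here, as the DocBook DTD spells it. The other values the DTD defines are loweralpha, upperalpha, lowerroman and upperroman.

```xml
<!-- Ordered variant of the list above; "arabic" asks for 1, 2, 3, ... -->
<orderedlist numeration="arabic">
  <listitem>
    <para>the first item</para>
  </listitem>
  <listitem>
    <para>the second item</para>
  </listitem>
  <listitem>
    <para>the third item</para>
  </listitem>
</orderedlist>
```

How the numbers finally look is still decided by the stylesheet; the attribute only records which kind of numbering is meant.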

Adding pictures

Images can be put into the article:

    <mediaobject>
      <imageobject>
        <imagedata fileref="some_picture.gif" format="gif"/>
      </imageobject>
      <textobject>
        <para>
          If you were not using <productname>Lynx</productname>
          you could now see a picture.
        </para>
      </textobject>
    </mediaobject>

You can see that a text alternative is given besides the picture itself. As a matter of fact, I could also have added a movie. The stylesheet processor that converts the DocBook document into PDF can then choose the best medium, which would probably be the picture.
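To illustrate the remark about the movie, here is a sketch of the same mediaobject with a video alternative added through the DocBook elements videoobject and videodata. The file name some_movie.mpg is hypothetical, and the optional format attribute is left out for brevity.

```xml
<mediaobject>
  <!-- Hypothetical movie file, offered as one of the alternatives. -->
  <videoobject>
    <videodata fileref="some_movie.mpg"/>
  </videoobject>
  <!-- The picture from the example above. -->
  <imageobject>
    <imagedata fileref="some_picture.gif"/>
  </imageobject>
  <!-- Fallback text for media that can show neither. -->
  <textobject>
    <para>
      If you were not using <productname>Lynx</productname>
      you could now see a picture.
    </para>
  </textobject>
</mediaobject>
```

Which of the alternatives ends up in the output depends on the stylesheet and the target format; for a printed PDF the movie would most likely be skipped.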

Also note that the word Lynx is marked up. This is typical of markup languages in which layout is separated from information. The article simply states that Lynx is the name of a product. The stylesheet later determines that a productname must be shown in a specific layout, for example in italics. In the following section we will see some additional markup for words.

Markup of words

As was shown in the picture example just above, words themselves can have markup. The table below gives some markup elements for words:

Element   Description
abbrev    An abbreviation, especially one followed by a period.
          Example: <para><abbrev>e.g.</abbrev> means for example.</para>
acronym   An acronym.
          Example: <para><acronym>DSM</acronym> (chemical company) means "De StaatsMijnen" (=The State Mines).</para>
email     A person's email address.
          Example: <para>My email is <email>egon.w@linuxfocus.org</email></para>
keyword   One of the article keywords.
          Example: <para>In my humble opinion <keyword>chemistry</keyword> is very important.</para>

Many other elements exist; they are listed in a handy Reference Card [2].

Now that the DocBook elements have been introduced briefly, it is time to move on and start making a PDF document.

 

Converting the document to PDF

Once we have a DocBook document we can convert it to several formats. Besides the obvious PDF, we could also convert the document to a website, a PostScript document, a TeX source file or an RTF (Rich Text Format) document that can be read with WordPerfect, Word, StarWriter and other word processors. In this article, however, we are only concerned with conversion into a PDF document.

DocBook documents can be written with any editor, such as Vi or Nedit. Even better is Emacs: Norman Walsh wrote an Emacs major mode for DocBook [3] which adds some useful features, like completing element names or inserting complete template elements.
Besides making your own test article, you can also download my version, which contains the examples given in this article.

As explained at the beginning of this article, we need both a stylesheet and a tool that applies this stylesheet to convert the DocBook article to the PDF format. The stylesheet does not actually convert DocBook directly into PDF; there is a TeX step in between. The stylesheets we use are Norman Walsh's Modular DocBook Stylesheets [4], which are written in DSSSL.

To use these DSSSL stylesheets for conversions we need a DSSSL processor. The processor I used is called Jade [5] and was developed by James Clark, who has since stopped supporting it. It has been replaced by OpenJade [6], but I have not used that tool yet.

Note that packages of the Modular Stylesheets, Jade and JadeTeX (see below) are available for all distributions that use packages (like RedHat, SuSE, Corel and Debian). So check your installation program, CD or distribution website first!

On my Debian system Walsh's Modular Stylesheets for conversion to PDF are installed in /usr/lib/sgml/stylesheets/dsssl/docbook/nwalsh/print/. The stylesheet file docbook.dsl in that directory is passed to Jade with the "-d" option. The "-t" option tells Jade to use a TeX backend:

    egonw@localhost> ls -al
    total 3
    -rw-r--r--    1 egonw    egonw        2887 Apr  8 22:06 docbook_article.xml
    egonw@localhost> jade -t tex -d /usr/lib/sgml/stylesheets/dsssl/docbook/nwalsh/print/docbook.dsl docbook_article.xml
    egonw@localhost> ls -al
    total 21
    -rw-r--r--    1 egonw    egonw        2887 Apr  8 22:06 docbook_article.xml
    -rw-r--r--    1 egonw    egonw       17701 Apr  8 22:29 docbook_article.tex

As you can see, Jade generates a TeX file. This TeX file can then be converted to a PDF file with the pdfjadetex tool contained in the JadeTeX package [7]:

    egonw@localhost> ls -al
    total 21
    -rw-r--r--    1 egonw    egonw        2887 Apr  8 22:06 docbook_article.xml
    -rw-r--r--    1 egonw    egonw       17701 Apr  8 22:29 docbook_article.tex
    egonw@localhost> pdfjadetex docbook_article.tex

This produces a nice docbook_article.pdf. Note that a lot of layout has been added, like the article title at the top of each page and a different font for the program listing. When I started working with DocBook, most of my time went into understanding which combinations of tools were possible. This article shows only one such combination.

 

Concluding remarks

The DocBook XML language is very extensive, and so are the means of converting DocBook documents into other formats. This article gives only a very short introduction. Questions can be posted on the talkback pages for this article. More information can be found in references [8] and [9]. Note that this last reference is itself written completely in DocBook!

Advanced topics that are not covered by this article but are available with DocBook are:

Maybe subjects for a future article.

 

References

1.  DocBook website
2.  Quick Reference: DocBook Elements
3.  Emacs major mode for DocBook
4.  The Modular DocBook Stylesheets
5.  Jade
6.  OpenJade
7.  JadeTeX
8.  Norman Walsh's DocBook site
9.  DocBook: The Definitive Guide (covers the SGML variant)

 


Webpages maintained by the LinuxFocus Editor team
© Egon Willighagen, FDL
LinuxFocus.org


2001-01-27, generated by lfparser version 2.8