Linuxdoc Linux Questions
Click here to ask our community of linux experts!
Custom Search

10. How do I get a plain text man page without all that ^H^_ stuff?

Have a look at col(1), because col can filter out backspace sequences. Just in case you can't wait that long:

funnyprompt$ groff -t -e -mandoc -Tascii manpage.1 | col -bx > manpage.txt

The -t and -e switches tell groff to preprocess using tbl and eqn. This is overkill for man pages that don't require preprocessing but it does no harm apart from a few CPU cycles wasted. On the other hand, not using -t when it is actually required does harm: the table is terribly formatted. You can even find out (well, "guess" is a better word) what command is needed to format a certain groff document (not just man pages) by issuing

funnyprompt$ grog /usr/man/man7/signal.7
groff -t -man /usr/man/man7/signal.7

"Grog" stands for "GROff Guess", and it does what it says--guess. If it were perfect we wouldn't need options any more. I've seen it guess incorrectly on macro packages and on preprocessors. Here is a little perl script I wrote that can delete the page headers and footers, thereby saving you a few pages (and mother nature a tree) when printing long and elaborate man pages. Save it in a file named strip-headers & chmod 755.

    #!/usr/bin/perl -wn
    #  make it slurp the whole file at once:
    undef $/;
    #  delete first header:
    s/^\n*.*\n+//;
    #  delete last footer:
    s/\n+.*\n+$/\n/g;
    #  delete page breaks:
    s/\n\n+[^ \t].*\n\n+(\S+).*\1\n\n+/\n/g;
    #  collapse two or more blank lines into a single one:
    s/\n{3,}/\n\n/g;
    #  see what's left...
    print;

You have to use it as the first filter after the man command as it relies on the number of newlines being output by groff. For example:

funnyprompt$ man bash | strip-headers | col -bx > bash.txt