re: Tags (was: RE: New Threads (Was...))
- To: "'Gary Preckshot'" <>, LDP <>
- Subject: re: Tags (was: RE: New Threads (Was...))
- From: Gregory Leblanc <>
- Date: Mon, 12 Jun 2000 17:04:58 -0700
- Resent-Date: Mon, 12 Jun 2000 20:04:52 -0400 (EDT)
- Resent-Message-ID: <rU6JKD.A.8oF.sqXR5@murphy>
> -----Original Message-----
> From: Gary Preckshot 
> Sent: Monday, June 12, 2000 3:50 PM
> To: LDP
> Subject: New Threads (Was...)
> > We need a subset, and
> > the template marks out that subset pretty well.
> Nothing has been defined as a subset. In order to
> have a subset, you need something that says,
> baldly, "Here is the subset of DocBook 3.1 tags
> LDP uses. Here's another subset that are OK to
> use, but we'll ignore (as in search) them. Here's
> the remaining tags that we would prefer you never
We've got most of this already, sort of. I think that there should be three
lists, as you propose, although I think that your second list shouldn't ever
be clearly defined. The first list should be all of the tags that we use to
search/present better tailored content. The third list (yeah, I was a math
major :) should be all of the tags that the LDP considers "deprecated", and
that we'd prefer that nobody used. The second list should be just a list of
all of the tags in DocBook, minus our other two lists. Since we don't have
anything to define that first set, and we've got only a very short third
set, we don't have much of a list right now. Basically, we've got two sets
until we have some viewer/searcher that speaks DocBook.
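
To make the "second list is just whatever is left over" idea concrete, here
is a rough sketch in Python. The tag names are only illustrative and the
deprecated entry is a placeholder, since none of these lists has actually
been agreed on.

```python
# Illustrative tag names only; the real lists have not been agreed on.
ALL_DOCBOOK = {"article", "artheader", "author", "sect1", "title", "para",
               "keywordset", "indexterm", "bridgehead", "sidebar"}
SEARCHABLE = {"title", "author", "keywordset", "indexterm"}  # first list
DEPRECATED = {"bridgehead"}                                  # third list (placeholder)

# Second list: never written down, just everything that is left over.
OK_BUT_IGNORED = ALL_DOCBOOK - SEARCHABLE - DEPRECATED

print(sorted(OK_BUT_IGNORED))
```

The point is that only the first and third lists ever need maintaining; the
middle one falls out of the DTD by subtraction.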
> The filter should do three things:
> 1) It should do a grammar scan so the DocBook
> grammar is right.
> 2) It should note when required tags are not
> 3) It should note when tags outside the allowed
> subset are present.
sgmlnorm does two of three here. The third can't be done because we don't
have an "allowed subset", and have no reason to define one. We can say that
these tags are deprecated, but that takes about 8 tags out of use. Hardly
worthwhile, if you ask me.
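
If somebody did want that third check anyway, it could be as small as the
sketch below. It assumes the document has already been run through sgmlnorm
so that every tag is written out in full, and the names in DEPRECATED are
placeholders rather than the LDP's actual list. The function name is made up
for the example.

```python
import re

# Placeholder list; these are not the LDP's real deprecated tags.
DEPRECATED = {"bridgehead", "sidebar"}

# Captures the name in a start tag such as <sect1> or <sect1 id="intro">.
# End tags (</sect1>), comments and declarations do not match.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)")

def deprecated_tags_used(sgml_text):
    """Return the deprecated tag names that occur in sgml_text, sorted."""
    used = {name.lower() for name in START_TAG.findall(sgml_text)}
    return sorted(used & DEPRECATED)

sample = ("<article><sect1 id='intro'><title>Hi</title>"
          "<Sidebar><para>x</para></Sidebar></sect1></article>")
print(deprecated_tags_used(sample))
```

This only reports tag names; it does no grammar checking, which is the part
sgmlnorm already covers.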
> > I brought up DocBook:TDG because it is an excellent
> reference and one of
> > its points was that minimized markup can be expanded easily with
> > sgmlnorm. You stated your anger towards people who
> handwrite SGML and I
> > told you of an easy way for even lazy handwriters to
> produce code to the
> > letter and intent of the LDP policy. That solution has not
> been heeded
> > by you so far...but I offered.
> > Regardless of that, there is no reason not to recommend
> people take a
> > gander at TDG. It is a good book. DocBook is not a
> confusing markup
> > language, and there is no reason someone can't gain
> valuable knowledge
> > and understanding from a few minutes of browsing through relevant
> > sections of TDG while composing a HOWTO.
> I agree that it's a useful reference. My concern
> was that it was being offered as a substitute for
> the hard work of deciding on subsets.
Again, I don't think we have anything to define subsets, except to remove
those "deprecated" tags. Until we have some search engines that take
advantage of the DocBook markup, there's no reason to define any more than
"ok to use" and "don't use".
> > Finally, when the subset has been agreed upon, we should
> codify it and
> > put it on the website in an easy to find spot. Let's start talking
> > about that subset. I haven't seen a thread on it
> yet...shall we start
> > one?
> Yes. How about "Subsets". We need three -
> Required, permitted, and searchable.
Hmm, that would make four, the way that I count. However, they would
definitely have some overlap. Required would be the ones that you MUST
have, in order to have a valid HOWTO document. Permitted would be ones that
are allowed in HOWTOs, but not required. Searchable would be some from both
sets, although not necessarily all of either set. These would be the ones
that our search engine/viewer understands. The last set would be restricted
tags, which would basically be any tags that we don't want people to use.
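
Here is how I picture the overlap between those four sets, as a quick Python
sketch. The membership of each set is made up for illustration; only the
relationships between them come from the description above. check_howto is a
hypothetical helper name.

```python
# Hypothetical membership; only the relationships between sets matter here.
REQUIRED = {"article", "artheader", "title", "author"}
PERMITTED = {"sect1", "para", "keywordset", "indexterm"}
SEARCHABLE = {"title", "author", "keywordset"}   # drawn from both sets above
RESTRICTED = {"bridgehead"}                      # tags we don't want used

# Searchable tags come from required or permitted, never from restricted.
assert SEARCHABLE <= REQUIRED | PERMITTED
assert not RESTRICTED & (REQUIRED | PERMITTED)

def check_howto(tags_in_doc):
    """Return (missing required tags, restricted tags used), both sorted."""
    tags = set(tags_in_doc)
    return sorted(REQUIRED - tags), sorted(tags & RESTRICTED)

print(check_howto({"article", "title", "para", "bridgehead"}))
```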
> > Perhaps we should begin by stripping all tags from the template
> > (and example.sgm?) and annotate them? Is that a good start
> for defining
> > our subset?
> It's a start, but the issue of searching needs to
> be dealt with.
I've put some minor thought into doing this, but it's a big enough project
that I need to get back up to speed with programming first.
> > > 2) Define a set of required tags
> > > that will be used for search.
> > Ok. Do you want to start the new thread on this, or shall
> > I?
> > Where do
> > we start...with index tags? section tags? something else?
> What are the
> > good tags to use for intelligent context sensitive searches?
> We need required structure tags (like
> <sect1>,<Article> etc.) required identification
> tags (like <Author> and subsidiary tags), required
> history tags (like <RevisionHistory> and
> subsidiary tags), search tags (like keyword
> lists), indexing tags (I'm not sure what they are,
> but they should mark points in the text. Maybe
> link tags.) Deprecated tags. Other tags that are
> OK, but not special. Whichever of us gets to it
Alrighty, I think I'll give that a shot this evening, in between Solaris
> > > 3) Put together an
> > > on-line thesaurus of keywords.
> > Ok, I've seen a Glossary suggested, but no thesaurus
> suggestion so far.
> > Why a thesaurus?
> A glossary would make a good howto. I suggested a
> thesaurus because keywords can get out of hand. A
> thesaurus would do two things: authors could avoid
> new keywords if one already existed that met their
> requirements. People doing searches could find out
> which keywords were likely to hit their subject.
What kind of structure are you looking at for the thesaurus? Is this for
people to read, or for authors/maintainers to use in trying to make their
document show up in searches more appropriately?
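
As a strawman, the simplest structure I can picture that serves both uses
is a table of preferred keywords, each with the terms it stands in for. The
entries and the preferred_keyword helper below are invented for the example.

```python
# Hypothetical entries: each preferred keyword lists the terms it replaces.
THESAURUS = {
    "networking": ["network", "net", "tcp/ip"],
    "printing": ["printer", "lpd", "print spooler"],
}

# Reverse index: any known term maps to its preferred keyword.
PREFERRED = {term: key for key, terms in THESAURUS.items() for term in terms}
PREFERRED.update({key: key for key in THESAURUS})

def preferred_keyword(term):
    """Return the preferred keyword for term, or None if the term is new."""
    return PREFERRED.get(term.strip().lower())

print(preferred_keyword("LPD"))
print(preferred_keyword("kernel"))
```

An author would look a term up before inventing a new keyword, and a
searcher would use the same table to find which keyword is likely to hit.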
> > > 5) Define
> > > (E.g. select from the DocBook DTD and publish)
> > > indexing tags to allow for "go to" display of
> > > cross-referenced or searched documents. At the
> > > moment, HTML looks like the only format amenable
> > > to this function.
> > I don't get what you mean? HTML is the only HyperText capable mode
> > we've got (other than the very limited PDF), true. But I
> think that's
> > really where things like an intelligent search needs to
> happen anyway.
> > Our only problem in this regard is making the search on the
> SGML using
> > intelligent tag reading and then placing the reader into the correct
> > spot in the HTML online and making it seamless.
> Somehow we have to come up with a way to identify
> a location in the SGML with a location we can tell
> a viewer to go to. Otherwise we're stuck with
> referencing the HOWTO as a whole.
> A marker in the SGML probably has to transfer
> invisibly to the database view, so we can tell the
> viewer to go there.
I think, maybe, possibly, that the indexterm tags can do some of these, and
the <TOC> stuff may be able to do the rest. Not completely sure though.
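
Another possibility is to lean on id attributes. The sketch below pulls the
ids off section tags and turns them into "go to" targets. It assumes, and
this is only an assumption, that the HTML conversion emits an anchor with the
same name as each SGML id. The URL and the anchors function are hypothetical.

```python
import re

# Matches an id attribute on a sectN start tag, e.g. <sect1 id="install">.
SECT_ID = re.compile(r"<sect[1-5]\s+id\s*=\s*[\"']([^\"']+)[\"']",
                     re.IGNORECASE)

def anchors(sgml_text, html_url):
    """Map each section id found in sgml_text to html_url#id."""
    return {sid: f"{html_url}#{sid}" for sid in SECT_ID.findall(sgml_text)}

sgml = '<sect1 id="install"><title>Installing</title></sect1>'
print(anchors(sgml, "Example-HOWTO.html"))
```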
> > Now, if you'd like for there to be a script to correct everyone's
> > mistakes, I won't try to stop you...But really, I think
> you're way too
> > worried about rabid vi users bringing anarchy to the LDP.
> Brother, I
> > gotta tell you, the LDP was built with vi and Emacs (with
> no SGML mode,
> > in the beginning) long before either of us were around.
> It isn't really difficult. The grammar of DocBook
> is simple. A pushdown stack recognizer can
> accommodate the two states an open tag can be in.
> Open and unreverted and open and reverted. These
> states govern which tags are legal as children,
> and the legal tag lists can be truncated to the
> LDP subset.
> SGML is actually a simple language. It's far less
> complicated than C or Pascal.
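For what it's worth, a stack-based check along those lines might look
roughly like the Python sketch below. It is a simplification: it does not
model the two open-tag states described above, it assumes fully normalized
markup (every element closed explicitly, as sgmlnorm output would be), and
the ALLOWED_CHILDREN table is a made-up, heavily trimmed stand-in for a real
content model.

```python
import re

# Hypothetical, heavily trimmed content model: parent -> allowed children.
# The key None stands for the document root.
ALLOWED_CHILDREN = {
    None: {"article"},
    "article": {"title", "sect1"},
    "sect1": {"title", "para"},
    "title": set(),
    "para": set(),
}

# Captures an optional "/" and the tag name from start and end tags.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def check_nesting(sgml_text):
    """Return a list of error strings; an empty list means the text passed."""
    errors = []
    stack = []  # currently open elements, innermost last
    for closing, name in TAG.findall(sgml_text):
        name = name.lower()
        parent = stack[-1] if stack else None
        if closing:
            if parent == name:
                stack.pop()
            else:
                errors.append(f"unexpected </{name}>")
        else:
            if name not in ALLOWED_CHILDREN.get(parent, set()):
                errors.append(f"<{name}> not allowed inside {parent}")
            stack.append(name)
    errors.extend(f"unclosed <{name}>" for name in reversed(stack))
    return errors

print(check_nesting("<article><para>p</para></article>"))
```

Truncating the table to an LDP subset would just mean deleting entries from
ALLOWED_CHILDREN.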
DocBook is actually a simple language, less complex than C or Pascal. SGML
is pretty much as complex as you choose to make it, since it's not a
language, but a language for describing other languages. But that's just