RE: Tags (was: RE: New Threads (Was...))

> -----Original Message-----
> From: Gary Preckshot
> Sent: Monday, June 12, 2000 7:29 PM
> To: LDP
> Subject: Re: Tags (was: RE: New Threads (Was...))
> Gregory Leblanc wrote:
> > 
> > We've got most of this already, sort of.  I think that there should be three
> > lists, as you propose, although I think that your second list shouldn't ever
> > be clearly defined.
> I'm not sure that a list that isn't completely
> defined is very useful.

Let me rephrase then.  There's no need to spell out every single tag that is
in the "permitted" group.  We simply say: "Here are the tags that you MUST
have, and here are the tags that you CANNOT have.  Feel free to use any tags
not listed here however the DTD allows them to be used.  As an additional
note, here are the tags that our search engine is 'smart' about."  How's that?
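
To make the MUST / CANNOT / "smart" split concrete, here is a rough Python
sketch of how a checker could apply three such lists to the tags found in a
document.  The list contents and the function names (check_tags,
searchable_tags) are made up for illustration; no actual LDP policy had been
settled.  It also compares tag names exactly as written, which is an
assumption.

```python
# Hypothetical policy lists, for illustration only; not an agreed LDP policy.
REQUIRED = {"Article", "ArtHeader", "Title"}
FORBIDDEN = {"Graphic"}
SEARCHABLE = {"Title", "Abstract"}


def check_tags(used_tags, required=REQUIRED, forbidden=FORBIDDEN):
    """Return (missing, banned): required tags never used, forbidden tags used."""
    used = set(used_tags)
    missing = sorted(set(required) - used)
    banned = sorted(set(forbidden) & used)
    return missing, banned


def searchable_tags(used_tags, searchable=SEARCHABLE):
    """Return the used tags that the (assumed) search engine knows about."""
    return sorted(set(searchable) & set(used_tags))


if __name__ == "__main__":
    doc_tags = ["Article", "Title", "Para", "Graphic"]
    missing, banned = check_tags(doc_tags)
    print("missing required:", missing)
    print("forbidden in use:", banned)
    print("search-aware:", searchable_tags(doc_tags))
```

Anything that appears in neither REQUIRED nor FORBIDDEN (Para, in the sample)
is simply left alone, which is the "permitted but never spelled out" group.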

> As a way of starting this tags discussion off, I
> have a text file with a hierarchical list of
> DocBook tags automatically generated from DocBook
> itself. The list has the form

Woo hoo, that's got to be HUGE.  ohmygosh, 65K.  Not that long, I guess,
much shorter than parts of the SHR.  :-)

> It's 3345 lines long, and each line consists of
> parent_tag child_tag
> For instance, above, if Article is open, Abstract
> ... FormalPara and the following are legal
> children. I haven't got the unreverted tags yet,
> but it would be a similar, but shorter file.

Unreverted tags?  I'm not familiar with the usage of that word in this
context (or, I just don't get it).

> As to what we can do with this, we can mark it up
> with list indicators for allowed, search,
> deprecated, and whatever other list we choose. The
> lists can then be used by a syntax checker to
> indicate where there were deviations from LDP's
> policy. The fact that this list was generated
> automatically from the DocBook DTD means we won't
> miss anything.
> > Again, I don't think we have anything to define subsets, except to remove
> > those "depreciated" tags.  Until we have some search engines that take
> > advantage of the DocBook markup, there's no reason to define any more than
> > "ok to use" and "don't use"
> See above. You can use "grep -v string filename"
> to remove tags you don't like. So you can start
> with the above complete list and make several
> lists from it. 

Hmm, two ways to treat this.  The file that you sent me isn't really tags,
it's complete structures.  Are we going to restrict based on structures (a
huge set), or tags (a much smaller set)? 
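
The difference between the two is easy to see from the file itself.  A
minimal Python sketch, assuming each line really is just "parent_tag
child_tag" separated by whitespace; the sample lines and the helper names
(parse_pairs, distinct_tags) are invented for illustration:

```python
def parse_pairs(lines):
    """Yield (parent, child) tuples from 'parent_tag child_tag' lines."""
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            yield parts[0], parts[1]


def distinct_tags(pairs):
    """Collapse parent/child structures into the set of individual tags."""
    tags = set()
    for parent, child in pairs:
        tags.add(parent)
        tags.add(child)
    return tags


if __name__ == "__main__":
    # Invented sample lines, not taken from the real 3345-line file.
    sample = ["Article Abstract", "Article FormalPara", "Abstract FormalPara", ""]
    pairs = list(parse_pairs(sample))
    tags = distinct_tags(pairs)
    print(len(pairs), "structures,", len(tags), "tags")
```

Restricting by structure means keeping or dropping individual pairs, while
restricting by tag means dropping every pair that mentions the tag.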

> > Hmm, that would make four, the way that I count.  However, they would
> > definitely have some overlap.  Required would be the ones that you MUST
> > have, in order to have a valid HOWTO document.  Permitted would be ones that
> > are allowed in HOWTOs, but not required.  Searchable would be some from both
> > sets, although not necessarily all of either set.  These would be the ones
> > that our search engine/viewer understands.  The last set would be restricted
> > tags, which would basically be any tags that we don't want people to use.
> Not a big deal if you use mechanical help. We
> should be able to try things on for size. Note
> that if you forget something, you can grep out
> what you need, append it to the file you're
> building, and sort it, and voila, a new list.

Yeah, need to clarify what we're specifying here.  I'm going to try to
produce a blank template here in a couple of minutes, assuming that I'm
awake enough to do that.  

> > > > Perhaps we should begin by stripping all tags from the template
> > > > (and example.sgm?) and annotate them?  Is that a good start for defining
> > > > our subset?
> > >
> > > It's a start, but the issue of searching needs to
> > > be dealt with.
> > 
> > I've put some minor thought into doing this, but it's a big enough project
> > that I need to get back up to speed with programming first.
> Yes, it is a big project, and by using mechanized
> help we can avoid mistakes.

I'm not so concerned about mistakes as about completeness, and about not
getting stuck with a bad design.  We do need to be able to adapt this to
DocBook 4.0 and 5.0, as well as XML at some point.

> > > > > 3) Put together an
> > > > > on-line thesaurus of keywords.
> > > >
> > > > Ok, I've seen a Glossary suggested, but no thesaurus suggestion so far.
> > > > Why a thesaurus?
> > >
> > > A glossary would make a good howto. I suggested a
> > > thesaurus because keywords can get out of hand. A
> > > thesaurus would do two things: authors could avoid
> > > new keywords if one already existed that met their
> > > requirements. People doing searches could find out
> > > which keywords were likely to hit their subject.
> > 
> > What kind of structure are you looking at for the thesaurus?  Is this for
> > people to read, or for authors/maintainers to use in trying to make their
> > document show up in searches more appropriately?
> We need a database online that folks can search
> and add to if they can't find what they need. Each
> entry should have associated with it a meaning and
> intention. If we do it right, the thesaurus should
> expand with use.

Sounds cool.  Got any ideas on how to build such a thing?  Maybe somebody
can build a starting list of keywords.  Later,