
Re: [colorforth] shifted huffman trees


"Mark Slicker" <maslicke@xxxxxxxxxxx> said:
> On Tue, 21 Sep 2004, Bill Parker wrote:
>
> > --- Mark Slicker <maslicke@xxxxxxxxxxx> wrote:
> >
> > > This new tree should be regarded as an alternative
> > > to the 'comment', 'Capitalized', 'all caps' tags and
> > > the anti-space word.
> >
> > Starting with a colorForth where 13/16 tags were used,
> > extending the character set frees up 2 tags.  If you
> > also use a special marker value in the upper bits to
> > indicate when numbers need a second word (something
> > I've toyed with and I think has been mentioned here
> > before) then you free up another 2 tags which gets you
> > down to the point where only 9/16 tags are used.

> > > Huffman probabilities should influence the shape and
> > > balance of tree, but I haven't done that yet,
> >
> > One of the reasons I just added one big block of
> > extended characters was that I was hard pressed to
> > come up with any reasonable 'frequency of use' by the
> > time I got to digits, capital letters and punctuation.
>
> You could use a standard English corpus to compute a Huffman coding of
> digits, lower/upper case letters and punctuation. Forth source would
> likely skew probabilities.

If what you are trying to compress is English text, then you use English text to
build your Huffman tree.  If what you are trying to compress is FORTH source,
then you should be using FORTH source to build your tree.  A Huffman tree that
is optimal for English may or may not be optimal for FORTH.
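
To make that concrete, here is a rough Python sketch (not colorForth, and not
anybody's actual packing scheme) that builds Huffman code lengths from one
sample and then measures how many bits a Forth fragment costs under each
table.  The two sample strings, the function names and the 16-bit escape cost
for symbols missing from a table are all assumptions made up for illustration;
a real comparison would need real corpora.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    if len(freq) == 1:
        # A single-symbol alphabet still needs one bit per symbol.
        return {next(iter(freq)): 1}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {sym: 0}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one bit deeper.
        n1, _, a = heapq.heappop(heap)
        n2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encoded_bits(text, lengths, escape_bits=16):
    """Total bits to encode text; symbols absent from the table cost escape_bits.

    The escape cost is a hypothetical stand-in for whatever fallback a real
    encoder would use for characters the tree does not cover.
    """
    return sum(lengths.get(ch, escape_bits) for ch in text)


# Tiny made-up samples standing in for an English corpus and Forth source.
english = "the quick brown fox jumps over the lazy dog and then the dog sleeps"
forth = ": square dup * ; : cube dup square * ; 2 cube . 3 square ."

from_english = huffman_code_lengths(english)
from_forth = huffman_code_lengths(forth)

bits_mismatched = encoded_bits(forth, from_english)  # English tree, Forth text
bits_matched = encoded_bits(forth, from_forth)       # Forth tree, Forth text
print("Forth text, Forth-built tree:  ", bits_matched, "bits")
print("Forth text, English-built tree:", bits_mismatched, "bits")
```

On samples this small the gap mostly comes from the punctuation and digits that
the English sample never contains, so treat the numbers as a demonstration of
the method rather than a measurement.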

