home .. forth .. colorforth mail list archive ..

Re: [colorforth] Unused character encodings available in Colorforth


On Sun, 22 Feb 2004, Samuel A. Falvo II wrote:

> On Sunday 22 February 2004 09:39 am, Bill Parker wrote:
> > It seems like the 'space' encoding of 0 000 is only
> > used to fill out the unused portion of packed words.
> > If that is true, it seems like 'space' should be an
> > inefficient 7 bit character rather than one of the
> > premium 4 bit characters.
>
> This is because his encoding is NOT Huffman encoding.  It's Shannon
> encoding.  I really don't know why everyone insists it is Huffman
> encoding.  True Huffman encoding involves structuring a binary tree of
> characters according to their frequencies found in a corpus of text.  If
> it were true Huffman coding, then you'd be seeing two-bit encodings for
> the most frequently used characters.

I'm not sure it is Shannon coding either, at least from the description I
hve found. Shannon coding involves first ordering a set of symbols
and then spliting the symbols into two groups  with equal probability,
assigning a 1 to one group, 0 to the other group. The procedure is
repeated until each symbol has a unique code.

Huffman coding uses a different procedure[1] to assign a code to each
symbol.

What Chuck is doing might be described as hand coded huffman. He does
assign the more frequently occuring characters to shorter codes, however
it apears to me the codes are asigned in such a way to make characters
simpler to decode. I'm speculating here, because I have not computed the
huffman nor shannon coding for english for this character set. If I did I
don't think the characters would fall into such neat and balanced
prefix classes.

Mark

[1] http://en.wikipedia.org/wiki/Huffman_coding

---------------------------------------------------------------------
To unsubscribe, e-mail: colorforth-unsubscribe@xxxxxxxxxxxxxxxxxx
For additional commands, e-mail: colorforth-help@xxxxxxxxxxxxxxxxxx
Main web page - http://www.colorforth.com