
Re: [colorforth] Extending the character set - Take II


--- Chuck Moore <chipchuck@xxxxxxxxxxxxxx> wrote:
> Bill Parker's arguments about 0 0000 are ingenious.
> But perhaps I don't understand his stop bit
> proposal.
> 
> Adding such a bit to each character would be
> advantageous if the average word were less than 5
> characters.

Yes, I would say you understand.

> That may be the case with colorForth code. But will
> it for text files?

I'm not sure what you mean by 'text files'.  

If you mean the comments that accompany colorForth
code then all I can say is that the code plus comments
that I have seen so far compress better as a whole
using one stop bit per character rather than one
four-bit stop code per word.  In other words, the
average word length of code plus comments in the
samples I have seen is less than four.  You are in a
much better position than I to judge whether that is
likely to be true in general.
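The break-even point can be checked with a rough sketch in Python. This is only a simplified model, not colorForth's actual packing: it assumes the only cost being compared is the terminator (one bit per character versus one four-bit code per word) and ignores 32-bit cell boundaries and variable-length character codes. The function names and the sample word list are hypothetical.

```python
# Simplified model (an assumption): compare only the terminator overhead
# of the two schemes, ignoring how characters are packed into 32-bit cells.

STOP_CODE_BITS = 4  # one four-bit stop code per word


def terminator_overhead(words):
    """Return (stop_bit_total, stop_code_total) in bits for a list of words."""
    # One stop bit per character: overhead equals the character count.
    stop_bit_total = sum(len(w) for w in words)
    # One four-bit stop code per word: overhead is 4 bits times the word count.
    stop_code_total = STOP_CODE_BITS * len(words)
    return stop_bit_total, stop_code_total


def average_word_length(words):
    """Mean number of characters per word."""
    return sum(len(w) for w in words) / len(words)


if __name__ == "__main__":
    # Hypothetical sample of short Forth-style words, for illustration only.
    sample = ["dup", "drop", "swap", "over", "if", "then", ";"]
    per_char, per_word = terminator_overhead(sample)
    print(f"average length: {average_word_length(sample):.2f}")
    print(f"stop bit per character: {per_char} bits")
    print(f"stop code per word:     {per_word} bits")
```

Under this model the stop bit wins whenever the average word length is below four characters, and the two schemes tie at exactly four.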

If, by 'text files', you mean content at some web site
(e.g. The Declaration of Independence) then I am
unsure how to answer.  It would seem that some sort of
character set extension would be required to add at
least parentheses, apostrophes, and the ability to mix
upper/lower case (to avoid insulting the honorable
representative from Delaware - Thomas McKean).

> And an expanded character set isn't desirable. I was
> pleased to realize that 48 characters are adequate.
> More are not necessary just because they're on
> qwerty. And there is no limit to the number of
> possible characters.

One of my biggest motivations is actually the fact
that you have freed us from the qwerty keyboard.  It
is a small step to redefine portions of the character
set to extend the language so it includes
application-specific symbols.  And this becomes a
double-benefit because those symbols can be used both
in the coding of the application and in the user
(menu) interface.  And all of that comes for free in
colorForth as it stands today simply by overwriting a
character bit map.  Piece of cake.

Except the character set has been optimized down to
the point where it's hard to find a spare character to
kick out.  Yeah, I could live without j and z I guess
but that is probably about it.

So I look around to see where I can shoehorn in some
extra characters.  And immediately I see that I don't
need a shoehorn...almost 15% of the possible 32-bit
word combinations are illegal and therefore are never
used.  (These combinations are all illegal for the
same reason, because nothing can ever follow a space
within an encoded word except another space.)

I really am a fan of minimalism, I like the 24-key
input device and it is not my goal to include a full
qwerty keyboard.

I assumed that you had left out parentheses,
apostrophes and mixed case not because you felt
English comments really didn't need them but rather
because it was a compromise and they were the least
needed.  I may have been wrong in this assumption.

My personal style would benefit by a few more symbols
(particularly ><) so I threw some of those in as well
since 'shift' opened up so many new slots.  For
myself, I find that using symbols leads to
smaller and more readable (again, to me) Forth.

But in all honesty, I may well have overstepped with
those assumptions.  My idea is simply to provide some
number of application-defined characters by reclaiming
the 15% of encodings which are currently illegal.

