RE: Preprocessing of source text


Hi James,

The point it looks like you are missing is that the
'Ptr to DUP' just points to the string, which has an
associated pointer to the code to execute.  This pointer
is only valid after it has been set by compiling the
code defining the word DUP.  From that point on, references
to the word DUP are all compiled by following the pointer
to the string, picking up the pointer to the code to
execute, and compiling a call to that address.  Very quick.
One can edit in the middle of a large source file without
affecting the validity of the pointers to strings which come
later in the source.  Dictionary searches are only required
as each complete word is entered in the editor, and then
only for those words which are re-defined by the
editing process.  The records for strings which are
re-defined by editing need to be marked as deleted and
new records created, and there needs to be some form of
garbage collection (which could be as simple as passing
the editor through the rest of the source).
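The record scheme described above can be sketched roughly as follows.  This
is a minimal illustrative model in Python, not anything from an actual
implementation; the record fields, addresses, and function names are all
assumptions made for the sketch:

```python
# Each token in the source is a record holding the string plus a
# pointer (here, an address) to the code for the word it names.

class Record:
    def __init__(self, name):
        self.name = name       # the source string, e.g. "DUP"
        self.code_addr = None  # set when the defining word is compiled
        self.deleted = False   # marked when editing re-defines the word

def define(records, dictionary, name, addr):
    """Compile a definition: record the code address for `name`.
    Any earlier record for the same name is superseded by editing,
    so it is marked deleted (awaiting garbage collection)."""
    rec = Record(name)
    rec.code_addr = addr
    if name in dictionary:
        dictionary[name].deleted = True
    dictionary[name] = rec
    records.append(rec)
    return rec

def compile_call(rec):
    """A later reference just follows the pointer to the record and
    compiles a call to its code address -- no dictionary search."""
    return ("call", rec.code_addr)

records, dictionary = [], {}
dup = define(records, dictionary, "DUP", 0x1000)
print(compile_call(dup))                     # ('call', 4096)
define(records, dictionary, "DUP", 0x2000)   # re-definition via editing
print(dup.deleted)                           # True: old record is stale
```

The point of the sketch is that only `define` ever touches the dictionary;
every other reference is a pointer chase, which is why edits in the middle
of a file leave later records valid.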

Mark

> -----Original Message-----
> From: James Hague [mailto:jamesh@volition-inc.com]
> Sent: Wednesday, December 13, 2000 12:46 PM
> To: MISC@pisa.rockefeller.edu
> Subject: Preprocessing of source text
>
>
> I have been working on a Color Forth insprired language, and I've been
> mulling over ideas for moving some preprocessing functions into
> the editor.
> This is largely the result of reading Chuck and Jeff's ideas at
> ultratechnology.com.
>
> I worked out some schemes for source preprocessing last night, and I ended
> up wondering exactly what I was trying to achieve.  The ultimate scheme
> seems to be to reduce a sequence like "DUP * 5 +" into what almost looks
> like threaded code:
>
> Ptr to DUP
> Ptr to *
> Ptr to 5
> Ptr to +
>
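The pointer sequence above amounts to resolving each token against the
dictionary once, ahead of time.  A hypothetical sketch in Python (the
addresses and tag names are made up for illustration) shows both the form
and the problem: slots point at dictionary entries, so deleting or
re-defining a word leaves earlier slots stale:

```python
# Resolve each token to its code address, tagging literals.  A real
# system would have to keep these resolved pointers coherent across
# deletions and re-definitions, which is the hard part.

def to_threaded(tokens, dictionary):
    out = []
    for tok in tokens:
        if tok in dictionary:
            out.append(("word", dictionary[tok]))
        else:
            out.append(("lit", int(tok)))  # e.g. "5" becomes the value 5
    return out

dictionary = {"DUP": 0x1000, "*": 0x1010, "+": 0x1020}
print(to_threaded(["DUP", "*", "5", "+"], dictionary))
# [('word', 4096), ('word', 4112), ('lit', 5), ('word', 4128)]
```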
> In practice this is difficult, because you need to maintain a coherent
> dictionary at compile time, taking deletions into account.  I think
> that's trickier than it sounds.  This could happen in one big step at
> save and load time, but that's almost the same as a whole compiler.  I
> don't think it buys much.
>
> Taking one step back, another scheme is to simply preprocess all the
> strings so the compiler doesn't have to, sort of like this:
>
> 3 'D' 'U' 'P'  1 '*'  1 '5'  1 '+'
>
> So you effectively have a block of counted strings.  Color info can be
> included in the count byte.
>
> In both of these schemes, you could preprocess numbers,  so '5' would be
> represented as the binary value 5, plus a tag indicating that it is a
> number.  Again, this could be compressed into the count byte.
>
> In this second scheme, what is the preprocessing really buying?  Strings
> still have to be looked up at compile time (either a linear search or a
> hash).  As such, does the preprocessing really simplify the compiler to
> a significant degree?
>
> In terms of memory, the counted string format is the same size as the
> raw text.  The threaded form is actually larger, if the words are short.
>
> Overall, I'm leaning toward processing raw text with embedded color
> tokens.  I'd be interested in hearing other experiences.
>
> James
>