No Subject


Dear MISC readers,

>Jeff and MISC readers,
>
>I was just attempting to give folks some idea of what they could expect from
>P21 with an apples/apples comparison.  I *did* point out that P21Forth
>was not as tweaked as UR/Forth.  I in no way meant the benchmark to be
>a criticism of P21, P21Forth, or Jeff.  P21 coming in at the equivalent to
>a 386DX27 or so impressed the hell out of me.

No offense taken.  I just thought some explanation would help make
things clearer to people.

>>CORDIC coordinate transformation:
>>MuP21 running P21Forth in DRAM with CORDIC in CODE and video on:       20 uS
>>MuP21 in SRAM with CORDIC in CODE w/o video on: (est)                   6 uS
>>486 50 running FPC with Colon def of CORDIC:                          500 uS
>
>Apples/oranges.  CODE vs. COLON.

True, but the author, Dr. Montvelishky, told me that P21 could execute the
routine 50 times faster than the optimized hand-coded version for the 387
that he wrote for a client.

>>Towers of Hanoi:
>>MuP21 running P21Forth in DRAM with Colon def and video on:            .6 Sec
>>486 50 running eForth 2.42 (same as P21Forth) with Colon def:          12 Sec
>
>Apples/oranges again.  eForth 31 words in CODE vs. P21Forth 200 words in CODE.

True.

>>3D coordinate transformation with rotation and clipping of a CUBE:
>>MuP21 running P21Forth 1.02 in DRAM with Colon def and video on:  20 frames/S
>>      My version of Dave Lowry's 3D demo.
>>486 50 running FPC with Colon def (Mark Smiley's graphics demo):   4 frames/S
>
>Lowry/Smiley :-)  Don't know what Mark's code does or how it's written.

Well, apples/oranges again to some extent.  Mark's routine has some
hidden line removal.

>>Multitasking tests:
>>MuP21 running P21Forth background tasks incrementing counter:         120k /S
>>MuP21 w/ high speed sram and no video (estimated):                    400k /S
>>486 50 running eForth 2.42 (same as P21Forth) tasks in background:    40k /S
>>486 50 running FPC with 1 background task incrementing counter:       60k /S
>
>Apples/apples pretty much.  Fair comparison.

Pretty fair test, but there is math and multitasking going on, etc.

>>Loop Tests:
>>MuP21 running P21Forth : X 1000000 0 DO 34 DROP LOOP ;                  25 S
>>MuP21 running P21Forth : X 1000000 FOR 34 DROP NEXT ;                    4 S
>>MuP21 w/ high speed sram and no video (estimated):                       1 S
>>486 50 running eForth 2.42 (same) : X 1000000 0 DO 34 DROP LOOP ;       17 S
>>486 50 running FPC : X 1000 0 DO 1000 0 DO 34 DROP LOOP LOOP ;           2 S
>>486 50 running TCOM (optimizing native code compiler)                    1 S
>
>The point of my posting the graphics benchmark was to ground P21 performance
>in a "real world" application.  The "do 34 drop loop" benchmark seems utterly
>useless to me.
>
>-Dave

Sure.  When Bernie Mentink originally posted his benchmark, I objected
that some optimizing compilers will actually remove the 34 DROP from
the inner loop, and then, since nothing is left in the inner loop, they
will remove the loop too!  This of course gives some optimizing
compilers very impressive numbers on:
: INNER ( -- ) 10000 0 DO 34 DROP LOOP ;
: BENCH ( -- ) 10000 0 DO INNER LOOP 7 EMIT ;

P21 produces fast code when the loop is compiled to native code with
34 # DROP in machine code, faster code when 34 # DROP is inlined a
few times, and faster code still when the constant 34 is cached in
the A register and then inlined.  That is, A DROP A DROP etc. will
give some indication of the results you would get from an optimizing
compiler that does not "cheat" by removing the "34 DROP" or the inner
loop altogether, which some do.
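
One way to keep an optimizer from discarding the loop body is to give
the body a side effect that is visible after the loop ends.  What
follows is only a minimal sketch, assuming an ANS-style Forth that
provides VARIABLE, +! and I.  The name SINK is hypothetical and is not
part of Bernie's original benchmark.

```forth
VARIABLE SINK   ( hypothetical accumulator, not in the original benchmark )

( Each pass adds the loop index to SINK, so the body has a real effect. )
: INNER ( -- ) 10000 0 DO I SINK +! LOOP ;

( Clear SINK, run INNER 10000 times, print the total, then ring the bell. )
: BENCH ( -- ) 0 SINK ! 10000 0 DO INNER LOOP SINK @ . 7 EMIT ;
```

The total will wrap around on any common cell width, which does not
matter here since only the side effect is wanted.  Note that this times
a memory update rather than a literal and a DROP, so its numbers are
not comparable with the "34 DROP" figures above.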

Benchmarks are a difficult subject.  Some are really designed to show
how ADA will perform, some are designed to show how C will perform,
some are designed to show performance on floating point array
processing etc.

It is difficult to compare CPU performance if your benchmark also
measures compiler efficiency or disk access, focuses on FP, or tests
compiler speed, etc.  P21Forth uses a simple linked list for dictionary
searches, so it is not a good comparison (for CPU power) to compare it
to a system that uses a hash table, and so on.

The best proposed benchmark I have seen is the HINT benchmark that
Penio told me about last year.  

It is designed to measure quality of the computation.  It is not biased
to a particular machine.  It can be implemented in FP or INT and on a bus
of arbitrary width, etc.  Of course I don't have any real numbers
for HINT, or for Dhrystone or SpecInt or SpecFP or ....

I just posted what I had with some explanations.  I do not object to
anyone saying that these benchmarks are not relevant to what they
will be doing.  I fully understand that.

I just don't have time to develop more standard benchmarks with all of
the other stuff I am working on.  Maybe at some point I can give some
results that are more conventional.

I would love to be able to give a good indication of a reasonably
optimized HINT or even SpecInt.  Many people will not put P21 or F21
into processor information tables without these kinds of numbers.

Jeff