
[colorforth] FS/Forth for Linux: Performance Testing


I decided to run a quick performance comparison between the various methods of
producing native code Forths under Linux.  Here are the numbers for an
Athlon-XP 2400+ and an 800MHz Athlon.

:: Timings for Athlon-XP 2400+

             Raw Assembly:    4362999 us
       Register Allocated:    6300999 us
           Baseline (GCC):    3794999 us
        Native Code (EBP):    7597999 us
        Native Code (ESI):    6577999 us
Subroutine Threaded (EBP):   81553999 us
Subroutine Threaded (ESI):   68567999 us

:: Timings for Athlon 800MHz

             Raw Assembly:    9369999 us
       Register Allocated:   16369999 us
           Baseline (GCC):    7929999 us
        Native Code (EBP):   21439999 us
        Native Code (ESI):   17689999 us
Subroutine Threaded (EBP):  165409999 us
Subroutine Threaded (ESI):  146249999 us

:: Notes/Interpretation

The trailing '999' on each time is an artifact of the inherent imprecision of
Linux's built-in timer API.

The goal of the software was to perform roughly 2GB worth of 32-bit increments,
sequentially.  It creates a 1048576-element array of 32-bit integers (4MB) and
invokes code that makes 500 passes over it, incrementing each element by one on
every pass.
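
As a rough illustration of the workload described above, here is a minimal C
sketch.  It is not the code that was actually timed; the function names and
the use of gettimeofday() are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Sizes taken from the description above. */
#define ELEMENTS 1048576u   /* 32-bit cells in the array   */
#define PASSES   500u       /* sweeps over the whole array */

/* Add one to every cell, `passes` times over. */
static void increment_all(uint32_t *cells, size_t count, unsigned passes)
{
    assert(cells != NULL);
    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < count; i++)
            cells[i] += 1;
}

/* Wall-clock microseconds spent inside increment_all().
   Hypothetical helper: the original timing method is not shown. */
static long long time_increment_us(uint32_t *cells, size_t count,
                                   unsigned passes)
{
    struct timeval start, stop;
    gettimeofday(&start, NULL);
    increment_all(cells, count, passes);
    gettimeofday(&stop, NULL);
    return (long long)(stop.tv_sec - start.tv_sec) * 1000000LL
         + (long long)(stop.tv_usec - start.tv_usec);
}
```

Calling time_increment_us(cells, ELEMENTS, PASSES) on a 4MB array touches
1048576 * 4 * 500 = 2097152000 bytes, which is where the "2GB" figure comes
from.  Note that an optimizing compiler is free to restructure these loops, so
a faithful benchmark has to check what code was actually generated.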

Raw Assembly indicates the time taken by code which was hand-written and
hand-tuned.  Originally, I had used the Athlon's PREFETCHW instruction to
pre-fetch the next cache line while the CPU was executing operations on the
current cache line.  The time for this was comparable to, and sometimes even
beat, GCC's highly unrolled software (which did not PREFETCHW).  If I'd
unrolled the raw assembly loop to the same extent that GCC did, it would have
exceeded GCC's performance consistently, perhaps even substantially.  I love
AMD Athlons.  :)

Unfortunately, the software would crash on some non-Athlon systems.  So I
re-assembled the code without the use of PREFETCHW, which explains the
reduction in performance seen above.
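
For readers who want to experiment with the same idea from C, the sketch below
hints the next cache line for writing while the current one is being
incremented.  It is only an analogue of the hand-written assembly, which is not
shown here.  It assumes 64-byte cache lines and relies on the GCC/Clang builtin
__builtin_prefetch, which is merely a hint: the compiler chooses which
instruction, if any, to emit for the target CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, i.e. 16 32-bit cells per line. */
#define CELLS_PER_LINE 16u

/* One pass over the array.  Before working on a line, ask for the
   following line to be fetched in anticipation of a write. */
static void increment_with_prefetch(uint32_t *cells, size_t count)
{
    assert(cells != NULL);
    for (size_t base = 0; base < count; base += CELLS_PER_LINE) {
        size_t end = base + CELLS_PER_LINE;
        if (end > count)
            end = count;                 /* partial final line */
        if (end < count)
            __builtin_prefetch(&cells[end], 1);  /* rw = 1: for write */
        for (size_t i = base; i < end; i++)
            cells[i] += 1;
    }
}
```

The result is identical with or without the hint; only the memory access
timing can differ, and whether it helps depends on the CPU.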

Baseline (GCC) indicates the output of the GCC compiler.  The software was
compiled using -O3 -mcpu=athlon -funroll-loops.  This is the baseline
performance.  GCC produces the fastest software on average, but at a
*significant* size penalty.  The difference between -mcpu=athlon and
-mcpu=athlon-xp was so small as to be lost in the noise.

Register Allocated is my re-interpretation of the software based on my past
work with register-allocation engines for Forth native code production (Billy
recalls this work).  I was expecting better performance, actually.  In the
timings above it edges out both native code variants (EBP and ESI data stack
access), but still trails the raw assembly by a wide margin.

Native Code (EBP) indicates the length of time taken by a hypothetical FS/Forth
compiler that uses EBP as its data stack pointer, and which uses XCHG EBP,ESP
to alternately switch between return and data stacks for the PUSH and POP
machine instructions to operate on.  EAX is the top of stack cache.  This is
the slowest of the native code methods.

Native Code (ESI) indicates the length of time taken by a hypothetical FS/Forth
compiler that uses ESI as its data stack pointer, BUT WHICH DOES NOT USE LODSD
FOR DROP.  EAX is the top of stack cache.  DROP is implemented using explicit
CPU instructions (MOV EAX,[ESI]; LEA ESI,[ESI+4]), to achieve the highest
possible performance.  It is otherwise largely equivalent to the method that
Chuck Moore is currently using in ColorForth.  This is the fastest of the
native code techniques.

  NOTE: I could still use EBP as the data stack pointer and get similar
  performance as long as I use the explicit addressing modes, rather than
  the PUSH/POP implicit addressing modes.
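
The ESI scheme can be modelled in C to make the stack discipline concrete.  In
this sketch `tos` stands in for EAX and `sp` for ESI.  Only the DROP sequence
is taken from the text above; the struct, the function names, and the push and
add sequences are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_CELLS 64

/* Hypothetical model of a downward-growing data stack whose top item
   is cached in a register.  *sp is the second item on the stack. */
typedef struct {
    uint32_t  cells[STACK_CELLS];
    uint32_t *sp;    /* plays the role of ESI */
    uint32_t  tos;   /* plays the role of EAX */
} DataStack;

static void ds_init(DataStack *s)
{
    s->sp  = s->cells + STACK_CELLS;   /* empty: points past the end */
    s->tos = 0;
}

/* Push a literal: spill the cached top to memory, then load the value. */
static void ds_push(DataStack *s, uint32_t value)
{
    assert(s->sp > s->cells);
    s->sp -= 1;          /* LEA ESI,[ESI-4] */
    *s->sp = s->tos;     /* MOV [ESI],EAX   */
    s->tos = value;      /* MOV EAX,value   */
}

/* DROP, as given in the text: MOV EAX,[ESI]; LEA ESI,[ESI+4] */
static void ds_drop(DataStack *s)
{
    assert(s->sp < s->cells + STACK_CELLS);
    s->tos = *s->sp;
    s->sp += 1;
}

/* + : ADD EAX,[ESI]; LEA ESI,[ESI+4] */
static void ds_add(DataStack *s)
{
    assert(s->sp < s->cells + STACK_CELLS);
    s->tos += *s->sp;
    s->sp  += 1;
}
```

Every operation addresses memory explicitly through the stack pointer, which is
the property the note above credits for the performance difference; nothing in
the model depends on which register holds the pointer.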

The various subroutine threaded implementations serve as additional baseline
comparisons, giving a representative indication of how FS/Forth for DOS would
compare, if ported literally to the Linux environment.  Both EBP and ESI
methods are employed.  FS/Forth for DOS uses EBP with exchanges.

  NOTE: Using explicit addressing modes provides an average 16%
  performance gain, regardless of subroutine threading or primitive
  inlining.
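
The gain can be re-derived from the timing tables.  The sketch below measures
it as a speedup ratio, (EBP time / ESI time - 1) * 100, which is one plausible
reading of "performance gain"; the function names are made up for the example.
By that measure the four EBP/ESI pairs fall between roughly 13% and 21%, in the
neighborhood of the quoted figure.  The exact average depends on how the gain
is defined.

```c
#include <assert.h>

/* Percentage by which the explicit-addressing (ESI) build outruns the
   PUSH/POP (EBP) build, expressed as a speedup ratio. */
static double gain_percent(double ebp_us, double esi_us)
{
    assert(esi_us > 0.0);
    return (ebp_us / esi_us - 1.0) * 100.0;
}

/* Mean gain over the four EBP/ESI pairs in the timing tables. */
static double mean_gain_percent(void)
{
    static const double pairs[4][2] = {
        {   7597999.0,   6577999.0 },  /* Athlon-XP 2400+, native code   */
        {  81553999.0,  68567999.0 },  /* Athlon-XP 2400+, subr threaded */
        {  21439999.0,  17689999.0 },  /* Athlon 800MHz, native code     */
        { 165409999.0, 146249999.0 },  /* Athlon 800MHz, subr threaded   */
    };
    double sum = 0.0;
    for (int i = 0; i < 4; i++)
        sum += gain_percent(pairs[i][0], pairs[i][1]);
    return sum / 4.0;
}
```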

Anyway, I thought these numbers would be interesting to you folks.  The code is
accessible from the #Forth portal site at
http://forth.bespin.org/Members/kc5tja

--
Samuel A. Falvo II


