
Re: [colorforth] Intellasys question for Jeff Fox


> On Fri, May 23, 2008 at 05:21:14PM -0700, Jeff Fox wrote:
>> Dr. Monvelishsky wrote a CORDIC function in machineforth
>> for our ANS P21forth in 1992 which also ran on F21 with
>> a small mod.  I recall that I was impressed with the fact
>> that p21 running Michael's code was 50x faster than Intel's
>> 387 coprocessor on these transcendental functions at the
>> time. Michael wrote a CORDIC for SEAforth so long ago that
>> it probably needs to be updated now.  Such things will be
>> included when more libraries are published.
>
> That doesn't tell the whole story. Intel's result was for more
> than 50 % the nearest presentable float, and never more than
> the machine precision (epsilon) off.

You are quite correct that one sentence is not the whole
story. As always there are more details and the most
important is not raw performance but performance divided
by cost or performance divided by power consumption as
that was the intended target.

In this case both machines provide machine precision.
One can say they are equal in that sense and in this
case we compared on the same calculation. If you need
higher precision results it is easier to get them with
the wider bus, but CORDIC can calculate to arbitrary
precision.  But it becomes less efficient if you have to
do multiprecision math.
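
To illustrate the point about precision, here is a minimal sketch of
rotation-mode CORDIC in integer fixed point. It is not Michael's
machineforth code; the 16-bit fraction, the iteration count and all
names (FRAC_BITS, ATAN_TABLE, cordic_sin_cos) are assumptions made
for this example. The loop uses only shifts, adds and a small table,
which is why it suits a small chip with no multiplier.

```python
import math

FRAC_BITS = 16            # assumed fixed-point format: 16 fractional bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0
ITERATIONS = 16           # roughly one bit of result per iteration

# atan(2^-i) in fixed point; on a real chip this would be a small ROM table.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i).
# Starting from x = K instead of 1 cancels that accumulated gain.
K = 1.0
for i in range(ITERATIONS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
K_FIXED = round(K * ONE)


def cordic_sin_cos(angle_fixed):
    """Return (sin, cos) in fixed point.

    angle_fixed is in radians, scaled by ONE, within [-pi/2, pi/2].
    """
    x, y, z = K_FIXED, 0, angle_fixed
    for i in range(ITERATIONS):
        if z >= 0:
            # rotate counter-clockwise by atan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            # rotate clockwise by atan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y, x


s, c = cordic_sin_cos(round(0.5 * ONE))
print(s / ONE, c / ONE)   # compare with math.sin(0.5), math.cos(0.5)
```

Raising FRAC_BITS and ITERATIONS together buys more bits of result.
Once the values no longer fit in one machine word, every shift and
add in the loop becomes a multiword operation, which is where the
loss of efficiency comes from.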

Of course F21 was designed for parallelism and a fair
comparison on power or cost or transistor count means
we should be comparing 100 F21s to a 386/387 combination.
So the performance ratio rises to about 10000x on
the lower precision calculation when you level the
playing field and certainly that is the most important
part of the whole story.

It is a little like comparing C18 to Pentium.  Since
Pentiums don't cost a couple of cents or draw only mW
of power at full throttle a one to one comparison
makes little sense.  Pentium can't do what c18 does which
is match solutions wanting only a few cents or a few
mW of power.  A single processor costing a couple
of cents isn't likely to outperform processors
costing thousands of times more.  For a direct
chip to chip comparison one should probably pick a
chip the size and cost of c18 and note that you
might see a 30,000/1 performance difference. That
kind of comparison makes more sense since no one
is likely to swap out a Pentium for a really small
and cheap chip. If they can they have really been
wasting a lot of money and power.

For large scale performance comparisons where one is
going to spend Pentium scale budgets for cost or
power we should be comparing a Pentium to 100 100x
cluster chips connected together, you know, stuff
in the millions of mips performance range.  That's
what the scalable in Scalable Embedded Arrays is
about.

> So to complete the picture:
>         How precise was the cordic?

Machine precision in each case.  On the lower
precision calculation, which is what had been asked about,
the important ratio is not the 50/1 for a single P21; I
think a comparison on a more level playing field is
appropriate.  A p21, f21 or c18 was not meant to be
a big Intel chip designed for C code; they are designed
for realtime low-power efficient code.

If we are comparing differences we could also note
that with FP there are precision problems and errors
due to rounding.  Chuck often talks about how he
prefers CAD calculations that get more accurate results
than the popular floating point calculation methods.

Of course those tiny chips can already beat a Pentium very
badly not only on performance/cost or performance/power
but in raw performance when comparing on realtime
response which Pentium was not designed to do and which
the small chips were.  Pentium's deep pipelines and
multi-layered cache become a nightmare when trying to
meet realtime performance requirements that are quite
easy to meet with a processor costing a few cents.

> (Todays Intel's can do a cosine in a couple of cycles of 284 pS.
> It is not fair to compare the F21 to that, but indeed the world has
> moved on.)

Agreed.  It isn't fair to compare a <$1 early 90s embedded chip
needing only a couple of milliwatts to anything being made
today because the world has moved on and it would be as unfair
to compare today's Intel chips to antique Forth chips as it
would be to compare today's Forth chips to antique Intel chips.
Anyone doing that would just be trying to imply that they had
moved on and that Intel was still stuck in the 80s (making 8051).

I have no idea why anyone would suggest a comparison of an
early 90s chip to 'todays Intel's' unless they are just
trying their best to be insulting and rude.

There is nothing wrong with the folks who prefer to work
with antique computers or antique language dialects.  But
most people have moved on from 80s chips and 70s dialects
except for a few notable nostalgists.

I was reporting that Intel's chips of that day had been
compared to Forth chips from the same time doing the same
calculation.  There are many other comparisons that could
be made to get the whole story.

For a fair comparison today you need to compare today's
chips to today's chips which is what we do today.  To
level the playing field we would need to scale quite
a bit to get to the level of machines people talk about
in c.l.f.  At the same level of cost or power consumption
as a Pentium PC we need to talk about something like a
7,000,000 Forth MIPS PC.  The world has moved on since
FIG-Forth and transputers.

But even MIPS numbers alone are different things.  On Pentium
MIPS are associated with large scale large memory number
crunching.  They don't translate well into realtime
performance because of the cache and pipeline issues.
And since the Forth chips were not designed for that it
makes little sense to say one should compare on that
metric.  Though when we expand to Pentium budgets and
scale up to thousands of processors it will be interesting
to see where we sit with things like millions of mips
raw processing power.  We know we are very well matched
to cad calculations for instance.

When not scaled up to Pentium level systems the Forth chips
were designed for embedded realtime systems where power
and cost are critical.  Since Pentium is not designed to
do that it doesn't make much sense to compare on the
metric for Forth chips either.  Pentium loses so badly
there that it just isn't fair at all.

I found an interesting paper from Parallax benchmarking
an interface to one of their chips on an Intel PC.  This was
an SPI interface bit-bang and since we have one of those
in a C18 ROM I thought it would make an interesting
comparison.  The raw performance ratio (not counting
the 1000/1 cost and power ratios) was still 100/1 and
not in favor of the Pentium.  The kind of mips on
Forth chips translate into things like faster control
loops on realtime applications because of the lack
of pipeline and cache problems.
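
For readers who have not written one, a bit-banged SPI transfer looks
roughly like the sketch below. This is neither the C18 ROM code nor
the Parallax benchmark; the LoopbackPins class is a hypothetical
stand-in for real I/O pins (it just echoes MOSI back on MISO), and
SPI mode 0 with MSB first is assumed. What it shows is that every
bit costs several separate pin operations, so the time per pin
access sets the transfer rate.

```python
class LoopbackPins:
    """Hypothetical pin model: MISO echoes whatever was put on MOSI."""

    def __init__(self):
        self.mosi = 0
        self.sck = 0
        self.edges = 0      # clock edges seen, to make the cost visible

    def set_mosi(self, bit):
        self.mosi = bit

    def set_sck(self, level):
        if level != self.sck:
            self.edges += 1
        self.sck = level

    def read_miso(self):
        return self.mosi


def spi_transfer_byte(pins, out_byte):
    """Shift one byte out MSB first (mode 0); return the byte shifted in."""
    in_byte = 0
    for bit in range(7, -1, -1):
        pins.set_mosi((out_byte >> bit) & 1)   # set data while clock is low
        pins.set_sck(1)                        # rising edge: data is sampled
        in_byte = (in_byte << 1) | pins.read_miso()
        pins.set_sck(0)                        # falling edge: next bit
    return in_byte


pins = LoopbackPins()
echoed = spi_transfer_byte(pins, 0xA5)
print(hex(echoed), pins.edges)
```

One byte takes 8 data writes, 16 clock writes and 8 reads. On a
small chip each of those is a direct pin access; on a PC each has to
go out through a port and whatever software sits in front of it.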

But that's not the 'whole story' either. Including
the 'whole story' about Pentium will always take
10,000 pages of explanation about almost anything.
That's why almost no one claims to understand how
to optimize code for Pentium without a lot of
experimentation and code profiling.

In this case it was not just the problems with
pipelines and caches and parallel ports that limited
the PC performance but also the software that gets
loaded on a Pentium PC that makes it even more difficult
to get decent realtime numbers even on simple things
that simple and cheap processors do quite well.

I look forward to release of Chuck's colorforth
software targeting his Forth chip designs.  Colorforth
was made for okad and in my opinion almost all of the
value of colorforth is that it is a few percent of the
code at the bottom of the cad software that makes all
the progress possible.  Offering the SEAforth target
compiler in colorforth will give people a better sense
of the nature of colorforth code.

I also have to admit that there were many things I
liked about working with a large team of chip
designers who used to work at Intel on chips like
Pentium.  It was a lot of fun to talk to them and
learn how differently they think about things than
Forth programmers.

As often observed the Forth code in Forth compilers
isn't really very characteristic of Forth code but
is unfortunately all that a lot of Forth enthusiasts
ever get exposed to.  In my opinion the reason for
all this stuff is solving problems that no one has
solved before. In the case of Forth's intent I think
it has mostly been about pushing the performance/price
or performance/power envelope of applications, and
occasionally about programmer performance.

Best Wishes



---------------------------------------------------------------------
To unsubscribe, e-mail: colorforth-unsubscribe@xxxxxxxxxxxxxxxxxx
For additional commands, e-mail: colorforth-help@xxxxxxxxxxxxxxxxxx
Main web page - http://www.colorforth.com