Re: [colorforth] Intellasys question for Jeff Fox


On 24/05/2008, Jeff Fox <fox@xxxxxxxxxxxxxxxxxxx> wrote:
> In this case both machines provide machine precision.

You repeat this later, but I can't help thinking it's an evasion. Most
people, when they want to know precision, are thinking of a number of
binary digits. (And a word about accuracy wouldn't go amiss either.)
Since the F21 had a 21-bit word, would it be fair to assume that the
precision was 20-bit? What about accuracy?

> Of course F21 was designed for parallelism and a fair
> comparison on power or cost or transistor count means
> we should be comparing 100 F21 to a 386/387 combination.

What was the cheapest that single F21 chips were ever available?

> It is a little like comparing C18 to Pentium.  Since
> Pentium don't cost a couple of cents

But where can I buy a C18 for a couple of cents? It's an unrealistic
figure; even if they were packaged singly, and the C18 did cost a
couple of cents to fabricate, the chip packaging alone would
completely dominate the cost. (The same must now be true for PIC16s,
given how long the architecture has been kicking around - but even the
thoroughly obsolete PIC16F54, the cheapest processor Microchip do when
packaged in an 18-pin SOIC, is still 37c at qty. 10k... or 48c at qty. 4;
for 3 or fewer you might as well use the sample program. I'd be amazed if
less than 98% of that was the cost of the carrier.) For a fairer
comparison, how much is a SEAforth-24A processor in various quantity
levels? And what proportion of that cost is packaging?

And comparisons with the Pentium are of course silly - but comparisons
with, say, the Cortex M3 (Luminary's implementation starts from $1 in
quantity) might be more realistic. And sure, the C-M3 doesn't come
close to Pentium performance - 62 Dhrystone MIPS (for what that's
worth) at 50MHz, I believe - and its raw instruction throughput isn't
as high as the C18's. But its programming model is a lot easier to
work with; and of course, Thumb-2 instructions *do* more than C18
instructions. Not as much as ARM instructions, admittedly; that's the
difference between an instruction set designed to be an efficient (in
both space and time) compilation target for C (Thumb), and an
instruction set designed to be a joy to program in assembler (ARM).
Still, it's a fair comparison - and historically ARM cores have been
pretty tiny (74k transistors for the ARM7TDMI, 112k for the ARM9TDMI).
And certainly in budgetary terms the C-M3 is a much better comparison
for the C18; it'd be interesting to see real comparative benchmarks.

In fact (and forgive the wandering off topic here), here's a
suggestion for an interesting benchmark - the number of voices of
MIDI-driven OPL2-style FM synthesis (at a 48 kHz sample rate) that each
chip can perform, complete with a subjective audio quality comparison.
It's a nice realtime app; the specifications are fixed, well known,
and quite implementation-independent; it doesn't need multiplication
or large amounts of memory, but it can take advantage of it if it's
there; the clock required for sample output has the potential to test
interrupt latency; you end up with a nice little figure at the end of
it; it scales down to the lowest PICs (which may manage to get 1 voice
out, but not much more) and up to the scary fast GPUs nVidia are
producing these days (millions of voices! eek!); it can be implemented
easily in assembler, C or Forth; and it would provide anyone
interested in synthesis with a ready-made demo app.
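
A minimal sketch of the per-sample work one such voice involves, assuming
a plain two-operator arrangement in which a modulator's output offsets
the carrier's phase. The names (fm_voice, phase_step, mod_shift) and the
table size are illustrative assumptions, and a real OPL2 differs in
details such as envelopes and feedback. Modulation depth is set by a
shift, so the inner loop needs only adds, shifts and table lookups:

```python
import math

SAMPLE_RATE = 48_000          # the 48 kHz rate suggested above
TABLE_BITS = 10               # assumed: 1024-entry sine table
TABLE_SIZE = 1 << TABLE_BITS
PHASE_BITS = 32               # assumed: 32-bit phase accumulators

# One full sine cycle, scaled to a signed 12-bit range. Built once at
# start-up; a small chip would keep this table in ROM.
SINE = [round(2047 * math.sin(2 * math.pi * i / TABLE_SIZE))
        for i in range(TABLE_SIZE)]


def phase_step(freq_hz):
    """Phase increment per sample for a given frequency."""
    return round(freq_hz * (1 << PHASE_BITS) / SAMPLE_RATE)


def fm_voice(carrier_hz, modulator_hz, mod_shift, n_samples):
    """Render one two-operator voice as a list of signed integer samples."""
    mask = (1 << PHASE_BITS) - 1
    c_step = phase_step(carrier_hz)
    m_step = phase_step(modulator_hz)
    c_phase = m_phase = 0
    out = []
    for _ in range(n_samples):
        # Modulator: plain table lookup on the top bits of its phase.
        mod = SINE[m_phase >> (PHASE_BITS - TABLE_BITS)]
        # Carrier: its phase is offset by the (shifted) modulator output.
        index = ((c_phase >> (PHASE_BITS - TABLE_BITS))
                 + (mod >> mod_shift)) & (TABLE_SIZE - 1)
        out.append(SINE[index])
        c_phase = (c_phase + c_step) & mask
        m_phase = (m_phase + m_step) & mask
    return out
```

Calling fm_voice(440, 880, 2, SAMPLE_RATE) renders one second of one
voice; the benchmark figure would be how many such voices a chip can
render in that same second.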

> A p21, f21 or c18 was not meant to be
> a big Intel chip designed for C code, they are designed
> for realtime low-power efficient code.

Believe me, the Intel chips aren't designed for C code either. The
great gcc v x86 battles of the past and present bear witness to that!
In fact, at the time of the 286, Intel were producing the iAPX432,
which was intended to be directly programmable in Ada (of all things!)
and whose design had a heavy influence on the shape of 286 protected
mode - it wasn't until well after the 386's release that C could be
seen to have predominated.

> If we are comparing differences we could also note
> that with FP there are precision problems and errors
> due to rounding.

They are known, though, and not entirely unmanageable; and since the
x87 works with 80-bit floats by default (sufficient to retain 64 bits
of integer precision at all times, which matches the "long multiply"
of most 32-bit processors) they're much less of a factor than
denormalisation (which is generally disastrous for the x87's
performance).

Unfortunately, SSE is the future, and its maximum integer precision (I
believe; someone correct me if I'm mistaken) is 53 bits, the significand
width of its 64-bit doubles. Embrace the regression :/ On the other hand,
now that the P4 core is consigned to history, might the x87 not be
rehabilitated in the (PPro-derived) Core 2?
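
A quick way to see the 53-bit limit, assuming Python's float is an IEEE
64-bit double (true of the usual builds). The helper name
is_exact_in_double is made up for the illustration. Python has no 80-bit
type, so the x87 side of the comparison is not shown here:

```python
import sys


def is_exact_in_double(n):
    """True if the integer n survives a round trip through a double."""
    return int(float(n)) == n


# Significand width of the platform double, in bits.
print(sys.float_info.mant_dig)          # 53

# Every integer up to 2**53 is exact; the next one is not.
print(is_exact_in_double(2**53))        # True
print(is_exact_in_double(2**53 + 1))    # False
```

An extended-precision float with a 64-bit significand would hold
2**53 + 1 exactly, which is the regression being complained about.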

(David Goldberg's "What Every Computer Scientist Should Know About
Floating-Point Arithmetic", at
http://docs.sun.com/source/806-3568/ncg_goldberg.html, is a useful
reference. Floating point can be a very useful tool; particularly when
- as in the x86 - it's just there anyway, and you couldn't turn it off
if you wanted to.)

> Chuck often talks about how he
> prefers CAD calculations that get more accurate results
> than the popular floating point calculation methods.

Well, when you've carefully scaled all your quantities to be extremely
amenable to simple integer manipulation without precision loss, it's
not surprising that they are more accurate than floating-point
calculation with real-world units (which have a tendency to be
inconvenient and irrational).
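
A small illustration of the scaling point, under assumed units: a step of
one tenth, taken ten times. As a float the step 0.1 has no exact binary
representation, so the error accumulates; with the unit redefined as one
tenth, the same sum is exact integer arithmetic. The function names are
hypothetical:

```python
def accumulate_float(step, count):
    """Add a floating-point step repeatedly."""
    total = 0.0
    for _ in range(count):
        total += step
    return total


def accumulate_scaled(step_in_units, count):
    """Add an integer step repeatedly; the unit is chosen so steps are whole."""
    total = 0
    for _ in range(count):
        total += step_in_units
    return total


float_total = accumulate_float(0.1, 10)    # ten steps of 0.1
scaled_total = accumulate_scaled(1, 10)    # ten steps of one tenth

print(float_total == 1.0)    # False: rounding error has crept in
print(scaled_total == 10)    # True: ten tenths, exactly
```

The trick only works while every quantity is a whole multiple of the
chosen unit, which is the objection above about real-world units.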

> Agreed.  It isn't fair to compare an <$1 early 90s embedded chip
> needing only a couple of milliwatts to anything being made
> today

Nor is it fair, in timeframe terms, to compare a 386/387 combination
to an F21, when the 486 had been the current x86 generation since
1989. The Pentium (P5) was introduced in 1993, too, so in terms of
timeframe, an F21 v P5 comparison isn't completely unreasonable.

Of course, there are whole hosts of other reasons why such a
comparison is completely unreasonable - it's just that release date
isn't one of them.

> There is nothing wrong with the folks who prefer to work
> with antique computers or antique language dialects.  But
> most people have moved on from 80s chips and 70s dialects
> except for a few notable nostalgists.

But the architecture of Chuck's chips has, if anything, become more
constricted since the F21 days. 20 bits of external bus (and 21 of
internal) have been shorn to 18 bits of each, and the ability to talk
to external RAM seems to have disappeared.

> The world has moved on since FIG-Forth and transputers.

You know, it actually hasn't; it's turning out today that the
transputer was bang on target - just about 25 years too early... and I
think people yearn for the days of simple models like figForth /
Forth-79 or BBC / Applesoft Basic.

> But even MIPS numbers alone are different things.

Millions of Instructions Per Second... not that different, on the face
of it. Of course, the problem is that "Instruction" is a wildly
divergent concept, not remotely comparable between CPUs. :) Hence the
quest for "universal apples" - Dhrystone isn't much cop, but it
appears to be the best available (but see above).

> That's why almost no one claims to understand how
> to optimize code for Pentium without a lot of
> experimentation and code profiling.

Except Agner Fog. I think everyone leaves the heavy lifting to him
these days. ;) In any case, perhaps the single most important piece of
optimisation advice (after "if it's more than O(n log n) check you
haven't screwed up") that everyone needs to know about any modern CPU
is "make sure all your inner loops stay in L1 cache; make sure as much
of your data as possible is in L1 cache before using it".

> In the case of Forth's intent I think
> it has mostly been about pushing the performance/price
> or performance/power envelope of applications,

And possibly the distinction between high and low level.

> and occasionally about programmer performance.

Not necessarily more so than anything else, though; the kind of
programmer who can naturally bend their mind to the C18 will probably
have just as much of a field day in the hidden corners of the ARM
instruction set - or even the x86 one (look at Chuck's approach,
mining all the 1-byte microcoded instructions that nobody ever
generates, because they squish much more neatly into L1 cache...
unless you have a P4, of course).

Unfortunately, nobody seems to want that kind of programmer any more -
which leaves me (for one) out of a job, and increasingly alienated
from the field I trained in. Mneh.

Regards
Gwenhwyfaer  (... all job offers gratefully accepted ;)
