
Re: registers


On Wed, 2 Aug 1995, Jeff Fox wrote:

> Dear MISC readers,
> 
> Eugene wrote about MISC chips:
> 
> >I'd even wish for 5-10 all purpose registers, though this would bloat
> >code space and disable multiple instructions in single word feature.
> >Home code page in SRAM can possibly alleviate the lack of generic
> >registers.
> 
> I think you could get what you want without much penalty.  The
> SRAM home page access is nice, but it is still external memory and
> requires a set up time.  Even with the smallest setup time and

Yeah, this is true. But still, 12 ns for the home page in SRAM is
a lot better than 60-70 ns in DRAM, and the instruction stream
is less bloated because of home-page implicit addressing.

Since we don't have a cache, pipeline, etc., the raw access rates
are:

70 ns   14 MHz   standard cheap memory
60 ns   17 MHz   fast, expensive memory
12 ns   83 MHz   SRAM, platinum-edition cache memory

In some cases we might get 4 instructions with one core access,
though it is hard to tell how the average code mix will perform.
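
The rates listed above are simply the reciprocal of the access time.
A minimal sketch in Python, assuming one memory access per cycle and,
as a best case only, 4 instructions packed into every fetched word.
The function names are made up for illustration:

```python
# Access time (ns) -> peak access rate (MHz), assuming one access per
# cycle and no overlap from a cache or pipeline.

def access_rate_mhz(access_ns: float) -> float:
    """Peak memory accesses per second, in MHz, for an access time in ns."""
    return 1000.0 / access_ns


def peak_instruction_rate_mhz(access_ns: float, instructions_per_word: int) -> float:
    """Upper bound if every fetched word carries that many instructions."""
    return access_rate_mhz(access_ns) * instructions_per_word


for label, ns in [("cheap DRAM", 70), ("fast DRAM", 60), ("SRAM", 12)]:
    rate = access_rate_mhz(ns)
    best = peak_instruction_rate_mhz(ns, 4)  # best case: 4 per word
    print(f"{ns:3d} ns  {rate:5.1f} MHz  {label}  (best case {best:.0f} M instr/s)")
```

The best-case column is an upper bound. A real code mix with literals
and branches will land somewhere below it.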

If my wish were granted and Chuck implemented registers,
they would need some slice of instruction space assigned
to them for implicit addressing. Instruction space being
very scarce for 21 bit words, this would hardly be wise.

If we had a zero page (as on the 6502), consisting of e.g.
1-8 k of fast (1-5 ns) on-chip SRAM on the P32, it would appear
in the main address space yet would also be addressable with
special instructions, much like the home page. This would
leave the stack-machine core intact and provide us with
register-like memory. If the on-chip memory is large enough,
it could even replace the home page, thus hardwiring the
home page base.

A much cleaner solution, imo.

> fastest ram easily available an access to external ram takes the
> same time as about 4 instruction clocks.  Addressable register
> access still goes through the memory processor I think, but it
> could be set to take only about two clock cycles.  Of course there
> are instructions that access the A and stack registers in one
> cycle because no addressing is involved.

Well, this is implicit addressing.
 
> There are also five unassigned (memory access) opcodes in the
> instruction set.  If these instructions are used you could make
> one a five bit argument with two cycle register access, and
> make some other opcodes for one cycle access like A@ and A!.
> Of course someone would have to pay Chuck to add new opcodes and
> add registers.  Registers are also expensive in the sense that they

This is the main point, I think. Registers are expensive in terms
of silicon and development effort, and they muddy the F21's clean
core design. Chuck's finances being stretched thin as they are, and
his workload being what it is, I don't think that pushing for
registers would be a good idea.

> are the biggest visible portion of the layout.  In other words 32
> registers alone would require about 50% more silicon, and the extra
> instruction decode and execute would add a little more to the alu
> section.  It might be just the feature that is needed to make it
> programmable easily with C technology.  If it doubles the price of
> the die to do this it would be a small price to pay.  F21 might
> cost slightly over $1 to manufacture in large volume with this
> feature added.

This is another point of critique: $1 is too cheap. I mean,
I am totally baffled by what one can achieve with 10 k transistors,
but there has to be some sensible relation between the
memory price and the CPU price. I don't think that a $10-20 CPU
will significantly increase the final product price. It is just
too small a fraction of the total node cost.

On the other hand a bigger die could mean more bus/ALU bits and
some on-chip SRAM. Design won't be too complex, as add-ons are
modular. This is a good investment, I think.

> I like to think up new instructions like that and try to figure
> out what the design tradeoffs will be, then discuss it with Chuck
> and see how close I was to what he thinks he would have to do.
> More memory access via different registers could also be added.

Yes, adding a B register for a two-address machine, or even C
for three-address? This would be extremely valuable while not
deviating too far from the stack machine design. And I don't
think anybody will need more than 3 memory pointers (a BitBlt
would need 3: source, destination and mask).

> You could think of these general purpose registers as on chip sram,
> but since you could have special instructions like A@ you could
> access some of them somehow with addressing already set up in
> hardware so that it happens in one instruction cycle.  If you

Yes, exactly: an abbreviated addressing. E.g. 10 bits for 1k
words embedded in opcode.
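
To make the bit budget concrete, here is a minimal sketch in Python of
such an abbreviated-address field. The layout is purely an assumption
for illustration (a 5-bit opcode in the high bits, followed by a
10-bit on-chip address), not the F21's or P32's actual encoding:

```python
from typing import Tuple

OPCODE_BITS = 5   # assumption: 5-bit opcode field
ADDR_BITS = 10    # 10 bits address 1k words of on-chip SRAM
ADDR_MASK = (1 << ADDR_BITS) - 1


def encode(opcode: int, addr: int) -> int:
    """Pack an opcode and an on-chip address into one 15-bit field."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in the opcode field")
    if not 0 <= addr <= ADDR_MASK:
        raise ValueError("address does not fit in the address field")
    return (opcode << ADDR_BITS) | addr


def decode(field: int) -> Tuple[int, int]:
    """Split a 15-bit field back into (opcode, address)."""
    return field >> ADDR_BITS, field & ADDR_MASK


# Hypothetical opcode 3 reading on-chip word 512.
print(decode(encode(3, 512)))
```

Under this assumed layout one such instruction takes 15 of the 21 bits
in a word, so it crowds out the other instructions that would
otherwise share that word.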

> use 5 to 15 bit type address to get at the registers I think 
> it takes a minimum of two cycles for instruction execution and
> then access.  I am not sure but I think the way the instructions
> interact with the memory processor in the current design the
> A@ is one cycle because it is not memory access, and @A or @A+
> _could_ execute in two cycles if the address is decoded on chip.

This would be nice. Instead of a cache hierarchy we would have

1) stack
2) on chip SRAM
2.5) on chip DRAM (optional)
3) off-chip SRAM (optional)
4) off-chip DRAM
5) virtual memory on the hard disk

This resembles the MIPS design credo somewhat, I must admit.

> I imagine it would make things much more complex if you try to
> make these (on chip sram) registers also act as memory addressing
> registers.  That might be quite involved, I will cc Chuck on that.

I mean that abbreviated addressing alone might be sufficient.
It would be more orthogonal to let the on-chip SRAM also appear
in main store, but not necessary if it introduces too much
design complexity and/or delay. It just makes copying from
on-chip SRAM to main store awkward, that's all.

Another big plus of register-like SRAM with abbreviated addressing:
we might use threaded code to implement a fast virtual machine
in "microprogramming". This might vastly increase code density
at still high speed and implement a standardized virtual machine
for the sake of compatibility. 1k would suffice to implement
the gist of a VM, just remember Novix ROM.
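
As a rough illustration of the threaded-code idea, here is a minimal
sketch in Python. The token numbers and the tiny primitive set are
invented for the example. On real hardware the primitives would be
short native routines sitting in the fast on-chip SRAM, and the
program would be a compact stream of tokens in main store:

```python
# Token-threaded interpreter sketch: the program is a list of small
# tokens, each dispatched through a table of primitives.

LIT, ADD, MUL, DUP = 0, 1, 2, 3   # hypothetical token assignments

PRIMITIVES = {
    ADD: lambda s: s.append(s.pop() + s.pop()),
    MUL: lambda s: s.append(s.pop() * s.pop()),
    DUP: lambda s: s.append(s[-1]),
}


def run(program):
    """Execute a token stream and return the final data stack."""
    stack = []
    ip = 0                          # interpreter pointer into the stream
    while ip < len(program):
        token = program[ip]
        ip += 1
        if token == LIT:            # LIT is followed by an inline literal
            stack.append(program[ip])
            ip += 1
        else:
            PRIMITIVES[token](stack)
    return stack


# 3 DUP * 4 +   (3 squared, plus 4)
print(run([LIT, 3, DUP, MUL, LIT, 4, ADD]))
```

The density gain comes from each token being much smaller than the
native code it stands for. The price is the dispatch step per token,
which is why fast abbreviated access to the primitive table matters.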

> Like some on chip sram, general purpose registers that can act
> as pointers to memory are powerful.  It would be great if the
> MISC approach can deliver useful register (on chip sram or whatever)
> access in a conventional way more or less to compilers, but
> without the conventional penalty of large instructions.

One common operation is to fetch values from one or two
locations in store, mangle them e.g. on TOS, and write the result
back to a third location. This would need A, B, C and
write-TOS-to-X-and-increment (or decrement). Contiguous memory
blocks. No offsets.
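
A minimal sketch of that access pattern in Python, with a flat list
standing in for main store and three auto-incrementing pointers named
after the proposed A, B and C registers. The function name and its
parameters are made up for illustration:

```python
def stream_combine(mem, a, b, c, count, op):
    """Read via A and B, combine on 'TOS', write via C, post-increment all.

    mem is a flat list standing in for main store; a, b and c are
    word addresses of three contiguous blocks. No offsets are used.
    """
    for _ in range(count):
        tos = op(mem[a], mem[b])    # fetch two operands, mangle on TOS
        mem[c] = tos                # write TOS to the third block
        a += 1
        b += 1
        c += 1
    return a, b, c                  # final pointer values


mem = [1, 2, 3, 10, 20, 30, 0, 0, 0]
stream_combine(mem, 0, 3, 6, 3, lambda x, y: x + y)
print(mem[6:9])
```

A BitBlt fits the same shape, with the source and the mask read through
two pointers and the destination written through the third.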

I might be wrong, but I think this would meet most demands.

> Of course F21 will not get any major changes like this.  It will
> get bug fixes, and _maybe_ implement something on the two unused
> pins on the first prototype.  But Chuck will be making more chips.
> I expect he will do a 64 bit with multiple high speed serial links
> and on chip sram and at ridiculous speed before long.  He is now

This would be extremely useful and open wholly new vistas. In
particular, multiple links would enable scalable nodes with _scalable
network bandwidth_. This is crucial for true maspar supercomputers.
64-bit scaled integers give absolutely sufficient resolution for
almost any scientific purpose.

I look forward to the day when Chuck releases this chip.

> dealing with the .8 to .5 micron jump as the results from .8
> prototype runs come back.  It won't be that long before he makes
> the next step to .35.  At that point he can do very small chips
> with lots of pins and a gig cpu throughput, multi-giga serial
> ( or parallel) links, and lots of 1 or 2 ns access registers.
> The smaller faster geometry makes everything except pins pretty
> cheap, and pins are not expensive.
> 
> This place seems like a good forum to discuss this sort of
> thing.

Very true. In particular, I'd like to know which routing
technique/topology Chuck intends to implement on the network
processor, how the CPU/router/memory interfacing is done in
detail, and how it will be supported by the instruction set.
 
Will you keep us up to date on early drafts, Jeff? - Thanks.

-- Eugene



> Jeff Fox
> 
>