
Re: MISC-d Digest V99 #106


MISC-d-request@pisa.rockefeller.edu wrote:
> 
> Subject:
> 
> MISC-d Digest                           Volume 99 : Issue 106
> 
> Today's Topics:
>          Re: MISC-d Digest V99 #105
>          Re[2]: MISC-d Digest V99 #105
> 
>     ---------------------------------------------------------------
> 
> Subject: Re: MISC-d Digest V99 #105
> Date: Wed, 29 Dec 1999 00:52:10 EST
> From: "Wayne Morellini" <waynemm@hotmail.com>
> To: MISC
> CC: sz@uc.ru
> 
> Hello sz
> 
> I'll answer some of the questions because you have been so helpful before,
> and not so many people are using the list nowadays.
> 
> From: sz <sz@uc.ru>
> To: "MISC@pisa.rockefeller.edu" <MISC>
> Subject: F21 and possible enhancements.
> Date: Thu, 23 Dec
> 
> Hello Misc@pisa.,
> 
>   I've discussed some of the F21 features in our Russian FIDOnet
>   conference, and my opponent pointed out that the F21 works at 100 MHz
>   because it can't get faster in 0.35 micron technology. As far as I
>   remember, that's untrue. Am I right?
> 
> You are right. The 100 MHz was either for the P21 at 1.2 or 0.8 micron, or
> the speed running from DRAM (due to pad power-up time limitations in the
> memory interface). I do wish they (Chuck's MISC CPUs) would move to DDR RAM,
> SDRAM or SRAM, but that is probably financially out of reach now.
>
>   What is the transistor count for the pure F21 command execution core?
>
> Good question. Everything was supposed to be less than 15,000 transistors,
> maybe less than 9,000, based on the old P21.

 100 MHz was the fastest burst speed of the MuP21, as Jeff has listed on
 his spec sheets.
 The fastest burst speed for an F21 that could work from ROM was 333 MHz.
 However, the current F21d speed must be judged by
  a. the number of processors running in dynamic RAM and the number of
     times they jump over page boundaries
  b. whether the code is small enough to run in static RAM, in which case
     the main processor can run closer to the 200 MHz range
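
 A minimal sketch of point (a), showing why page-boundary jumps matter. The
 page size and the nanosecond timings are made-up placeholders, not F21
 figures, and the function names are hypothetical.

```python
def count_page_crossings(addresses, page_size):
    """Count how many consecutive accesses land in a different DRAM page."""
    crossings = 0
    previous_page = None
    for address in addresses:
        page = address // page_size
        if previous_page is not None and page != previous_page:
            crossings += 1
        previous_page = page
    return crossings


def average_access_ns(addresses, page_size, in_page_ns, new_page_ns):
    """Average time per access when a page change costs extra.

    The very first access is counted as an in-page access for simplicity.
    """
    if not addresses:
        return 0.0
    crossings = count_page_crossings(addresses, page_size)
    total = (len(addresses) - crossings) * in_page_ns + crossings * new_page_ns
    return total / len(addresses)


if __name__ == "__main__":
    # Hypothetical trace: four accesses in page 0, two in page 1, two back in page 0.
    trace = [0, 1, 2, 3, 1024, 1025, 2, 3]
    print(count_page_crossings(trace, 1024))       # 2 page changes
    print(average_access_ns(trace, 1024, 10, 50))  # (6*10 + 2*50) / 8 = 20.0
```

 The more often code or data hops between pages, the closer the average
 drifts toward the slow figure, whatever the burst speed is.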
   
>   Also, what is the real state of stack computers?
    Current stack computers include
    the Novix,
    the Harris RTX, used I think by Ball Brothers for space payloads, and
    the Patriot Scientific 1000, used now to run Java applications on
    set-top boxes. Info on their web site claims that it can run
    interrupt-driven Java code faster than picoJava chips designed for
    Java code.
    Both the RTX and the PT-1000 are based on Chuck's Novix and ShBoom
    designs.
 
> Useful for tasks that don't require an existing register CPU, and leading
> edge.
> 
>   Why does the mainstream go with register CPUs?
>   Because stacks are slower to access, or because the
>   mainstream did not develop suitable technology to handle stacks?
> 
   Mainstream designers stick with upgrades to 8080 code to keep all
   that old software working. At some point they will have to change.
   From what I have seen of the Merced processor, it looks like they will
   change. It has three 41-bit opcodes stored in a 128-bit word, and most
   opcodes work with three registers, each chosen from a range of 128
   registers.
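
   As a rough illustration of that arithmetic: three 41-bit slots use 123
   of the 128 bits, leaving 5. The sketch below assumes those 5 bits sit at
   the low end of the word as a template field; the field order and the
   helper names are illustrative assumptions, not taken from this message.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5  # assumption: the 128 - 3*41 leftover bits form one field
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1

# Selecting one of 128 registers needs 7 bits, so three register
# fields take 21 of the 41 bits in a slot.
REGISTER_FIELD_BITS = (128 - 1).bit_length()


def pack_bundle(template, slots):
    """Pack a template value and three 41-bit slots into one 128-bit integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template does not fit in 5 bits")
    word = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("slot does not fit in 41 bits")
        word |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return word


def unpack_bundle(word):
    """Split a 128-bit integer back into (template, [slot0, slot1, slot2])."""
    template = word & TEMPLATE_MASK
    slots = [(word >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
             for i in range(3)]
    return template, slots


if __name__ == "__main__":
    bundle = pack_bundle(3, [1, 2, SLOT_MASK])
    print(bundle.bit_length())    # the top slot reaches bit 127
    print(unpack_bundle(bundle))
```

   Compare this with a stack machine, where most opcodes name no registers
   at all and so need no register fields.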
   What Charles Moore is trying to do is have the simplest combination of
   hardware and software. The stack computer makes opcode decoding and
   addressing as simple as possible.
   Chuck can get four five-bit opcodes to work with
   a. the top of stack, and operations combining the top of stack and the
      second stack position
   b. push from the stack to the return stack, or pop from the top of the
      return stack to the stack
   c. store indirect or fetch indirect using an address register
   d. store an address from the stack into the address register, or fetch
      the address from the address register to the stack
   e. use both the address register and the return stack to do a fetch
      with address increment and a store with address increment. This
      makes for a very fast move sequence.
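
   A minimal sketch of the two ideas in this list: packing four five-bit
   opcodes into one word (20 bits in this sketch), and the move loop from
   item (e). The opcode names and numbers are hypothetical and are not the
   real F21 encoding; the Python lists stand in for the hardware stacks.

```python
# Hypothetical opcode table, for illustration only.
OPCODES = {"nop": 0, "dup": 1, "drop": 2, "+": 3, "push": 4, "pop": 5,
           "a!": 6, "a@": 7, "@a+": 8, "!r+": 9}
NAMES = {number: name for name, number in OPCODES.items()}


def pack_word(names):
    """Pack up to four 5-bit opcodes into one word, first opcode in the high bits."""
    if len(names) > 4:
        raise ValueError("at most four opcodes per word")
    padded = list(names) + ["nop"] * (4 - len(names))
    word = 0
    for name in padded:
        word = (word << 5) | OPCODES[name]
    return word


def unpack_word(word):
    """Decoding is just four shifts and masks."""
    return [NAMES[(word >> shift) & 0x1F] for shift in (15, 10, 5, 0)]


def block_move(memory, source, destination, count):
    """Copy count cells using a fetch-with-increment and a store-with-increment."""
    a = source                    # address register holds the source address
    return_stack = [destination]  # top of return stack holds the destination
    data_stack = []
    for _ in range(count):
        data_stack.append(memory[a])                 # "@a+": fetch through A ...
        a += 1                                       # ... then increment A
        memory[return_stack[-1]] = data_stack.pop()  # "!r+": store through R ...
        return_stack[-1] += 1                        # ... then increment R
    return memory


if __name__ == "__main__":
    print(unpack_word(pack_word(["@a+", "!r+"])))
    print(block_move([10, 20, 30, 0, 0, 0], 0, 3, 3))
```

   Each pass of the loop needs only two opcodes, so one packed word can
   carry two cells' worth of move.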
      The move is used a great deal in graphics, word processors,
      database manipulation and string operations. More complex
      processors have address increment operations,
  but you must be careful when looking at complex processors. Speed
  comparisons are made much more difficult because of
  a. opcodes that take many clock cycles. The 68000 had internal 32-bit
     registers, but because of multiple cycles per opcode an 8 MHz 68000
     was not faster than an 8-bit 1 MHz 6502 on some 8-bit input/output
     operations. Higher-speed variants of the 6502 were better for some
     microcontroller applications.
  b. use of cache memory. What happens when the cache is flushed? How
     much code is actually in the cache on average? How good is the cache
     hit rate?
     When later Intel processors are compared to Chuck's MuP21, with 7000
     transistors and a very small die size in a 1.2 micron fab, you are
     comparing the cheapest possible small design, in terms of
     computer-aided design and lowest-cost chip fabrication,
     to the R&D budget of a company so large and powerful that its
     total valuation is greater than the GNP of some third-world
     countries. If Chuck could do his designs in the current 0.15 micron
     size and include his entire DRAM and SRAM address space on chip
     (about the same scale in terms of transistors as the latest Intel or
     Alpha processor), his chip could run with continuous on-chip access
     and show usable continuous instruction speeds in the range of 600
     million to 1 billion per second. Since Chuck is using an integer-only
     opcode design, this chip would still only be better for purely
     integer applications.
  c. how well the whole system design works. In Intel-based PC clone
     designs there is a lot of glue logic to consider when comparing
     speeds.
     1. DMA, chip refresh, timer interrupts and a large number of
        input/output interrupts have to be considered when judging how
        well a system design works.
        With the F21 chip,
        one on-board processor handles video (simpler video),
        one processor does analog input and output (sound output perhaps),
        one processor does network input and output, and
        the main processor sets up all the other processors through
        memory access only.
        Multiple dynamic memory accesses still slow down memory access,
        but the problems seen in Windows 95/98, where even serial data as
        slow as 28.8 kbit/s can be lost because the system is not handling
        all interrupts fast enough, can be avoided.
        Chuck's chips try to include as many simple system functions as
        possible on chip.
        This means that very cheap system boards can be made that also
        use a great deal less low-level code. This low-level code also
        executes in less time, which makes it possible for fast
        interrupts such as network packet reception to be handled
        without external first-in first-out buffers. To get my Windows 95
        modem to work well I need to add a board with a serial FIFO, a
        16550 chip, on it.

  d. how much work has to be done to produce a compiler optimized for
     the processor design.
     1. The complexity of modern compiler design was shown to me by a
        speed comparison made between RISC processors: Alpha, MIPS and
        SPARC. Each one needed a very good compiler to optimize the use
        of register operations to try to beat Intel code and to beat the
        other RISC processors.
        One particular speed comparison showed the Alpha behind the
        other two RISC processors when running C code in some
        situations. I did not understand how this was possible, since
        that particular Alpha chip, at perhaps 233 MHz or 300 MHz, was
        faster in opcode speed. A friend explained that the Alpha chip
        was indeed faster, but that DEC had not yet written a compiler
        that was optimal for it. Later comparisons showed that their
        compiler had been improved, and the Alpha was shown to be the
        fastest chip.
        The point I wish to make is that a very large company with years
        of computer science experience, DEC, needed a considerable
        amount of time, perhaps more than three or four years, to make
        an optimal compiler. That is complex!
        In the case of Chuck's chips, a fast small stack design with
        very quick opcode decode and fewer than 32 opcodes, a usable and
        fairly fast Forth system can be implemented with a great deal
        less effort. The person doing the chip fabrication design
        (mainly Chuck) and the person doing the chip simulation and
        testing (again mainly Chuck) can also do the compiler, operating
        system and driver design (again mainly Chuck).
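
  Points (a) and (b) above come down to simple arithmetic: clock speed
  alone says little until you divide by the cycles each instruction
  really takes, cache misses included. A minimal sketch follows. Every
  number in it is a made-up placeholder, not a measured figure for any
  chip named in this message.

```python
def instructions_per_second(clock_hz, cycles_per_instruction):
    """Sustained instruction rate for a given clock and average cycle count."""
    return clock_hz / cycles_per_instruction


def average_cycles_with_cache(hit_rate, hit_cycles, miss_cycles):
    """Average cycles per instruction when some fetches miss the cache."""
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


if __name__ == "__main__":
    # Hypothetical chips: a fast clock with many cycles per operation
    # can tie with a slow clock that needs only a few.
    chip_a = instructions_per_second(8_000_000, 16)
    chip_b = instructions_per_second(1_000_000, 2)
    print(chip_a, chip_b)  # both 500000.0

    # Hypothetical cache: 75% hits at 1 cycle, misses at 9 cycles.
    cycles = average_cycles_with_cache(0.75, 1, 9)
    print(cycles)                                        # 3.0
    print(instructions_per_second(300_000_000, cycles))  # 100000000.0
```

  A chip that always runs from on-chip memory has no miss term at all,
  which is the case argued for in point (b).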

    If you check the most recent articles about Chuck's work on Ultra
 Technology, Chuck talks about having redone the i21 iTV chip design to
 use SDRAM. This makes DRAM access faster and uses the memory chip design
 that is most likely to have the lowest cost per bit. In the article he
 said that the most recent design had not been put on a fab line yet.
 If iTV does go to large-scale production, a new fab run will be done
 later.
  
    Jeff Fox has put up the text and some pictures from video 1, which
 shows Chuck using his OKAD system to display the F21 chip design. I
 bought the video, and would encourage anyone to look at that article
 because it shows just how productive Chuck is. By using simple,
 well-thought-out code he designed an entire chip layout, simulation and
 test system on a 386 computer using very little DOS or BIOS code. Some
 people make fun of just how simple his designs are. He even stated that
 you should get rid of all file operations when possible, and load the
 whole application into memory with only a simple load from disk, and
 store it with a similar total memory store to disk. I can understand
 that: now that systems with 32 MB to 256 MB are available, a Forth
 application can be loaded entirely into a small percentage of the total
 memory without much chance of running out of memory. Ironically, I spent
 a few hours getting a create-file operation to work under gforth. I do
 not know how I could have so much trouble with a file operation, because
 I have done file and/or block operations on 6502 fig-Forth, F83, F83S,
 F-PC and Win32For. I wish to find ways I can work with a simpler
 hardware and software combination. Perhaps I could become a little more
 productive.
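
 The "one load, one store" idea can be sketched in a few lines. This is
 only an illustration in Python, not how Chuck's system does it. The
 function names, the image size and the file name are all hypothetical.

```python
import os
import tempfile


def save_image(memory, path):
    """Write the whole memory image to disk in one store."""
    with open(path, "wb") as handle:
        handle.write(memory)


def load_image(path):
    """Read the whole memory image back in one load."""
    with open(path, "rb") as handle:
        return bytearray(handle.read())


if __name__ == "__main__":
    # Hypothetical 64 KB application image with a few bytes filled in.
    memory = bytearray(64 * 1024)
    memory[0:5] = b"forth"

    path = os.path.join(tempfile.mkdtemp(), "image.bin")
    save_image(memory, path)
    restored = load_image(path)
    print(restored == memory)
```

 There is no file format, no records and no seek logic: the application
 state is whatever is in memory.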
 Anyway, best wishes for the new year, and I hope to chat with all of you
 after the new year if the world does not end.  gary
 

> Why go Microsoft, PC, BMX bikes, BMW/Citroen CV2, western Qwerty Keyboard
> etc = fashion, right place, right time etc and being stuck with it.
> 
> Best regards,
> sz                          mailto:sz@uc.ru
> 
> Merry Christmas and a happy new Millennium (or Y2KB experience for those in
> system admin;) to you sz and all Miscers.
> 
> Wayne
> 
> P.S. sz, calculate that only 1-2 cycles are needed to render each pixel in
> voxels, with 8 times that much for photorealism.  I suggest that you stick
> with voxels, maybe useful for a MISC extension, or with programmable silicon
> a hardware accelerator.