[colorforth] Conquering the BIOS [LONG]


INTRODUCTION

I have been working on a disassembler, with the ultimate goal of
booting colorforth better by analysing the BIOS. Here I report on
progress so far.

COMPILING TO A BUFFER

First I had to modify my assembler package so that it can assemble
to a buffer at a different place than where the code will execute.
Unlike in an ordinary Forth assembler, the code cannot be interspersed
with auxiliary stuff like macros. Remember, ultimately we want to
reassemble the BIOS with this.

This is accomplished by having a buffer, and a word that
swaps the dictionary pointer to this buffer and back.
It was some work to adapt my assembler package to use this.

In the end I get by with AS-C, and AS-HERE in the
assembler package, with the same meaning as C, and HERE in
normal Forth, plus AS-ALLOT as a convenience in testing.
For this the package had to be simplified a little.
By including one of two alternative files that define these few words,
one for Forth assembling and one for classic assembling, the package
can be used in both situations.
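
A minimal sketch of what such a wrapper could look like in standard
Forth, for illustration only. It keeps a separate buffer pointer
instead of swapping the dictionary pointer as described above, because
swapping is system specific. CODE-SPACE and CP are names that occur in
the script near the end of this post; their definitions here, and
CODE-SIZE, are assumptions and not the contents of aswrap.frt.

```forth
\ Sketch only, not the actual aswrap.frt.
4096 CONSTANT CODE-SIZE              \ assumed buffer size
CREATE CODE-SPACE  CODE-SIZE ALLOT   \ buffer that receives the code
VARIABLE CP   CODE-SPACE CP !        \ next free byte in the buffer

: AS-HERE  ( -- addr )  CP @ ;                      \ counterpart of HERE
: AS-ALLOT ( n -- )     CP +! ;                     \ counterpart of ALLOT
: AS-C,    ( byte -- )  AS-HERE C!  1 AS-ALLOT ;    \ counterpart of C,

252 AS-C,    \ lay down one byte (252 = FC hex, the CLD opcode)
```

With only these three words depending on where the code goes, the rest
of an assembler can stay unaware of the buffer.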

It further turns out that the program counter as seen by the
program when executing is needed. It is primarily used in the code to
be assembled, but also in a tool for relative jumps. This is called
_AP_ and in the Forth situation it too is identical to HERE.

LABELS

In order to have a label LAB it is sufficient to do
    _AP_ CONSTANT LAB
or, after defining : :: _AP_ CONSTANT ; , to write
    :: LAB
In my Forth (ciforth) it is possible to use prefixes, such that
    :LAB
expresses the same, by defining
    : : _AP_ CONSTANT ;  DENOTATION
DENOTATION is similar to IMMEDIATE , but demands that : is
recognized also when it is not a separate word, but a mere prefix.
(In colorforth it would be a new color. There are some technicalities
here to avoid conflicts with the colon compiler.)
The bottom line is that labels now look like in MS-DOS batch files.
The problem that labels might be used before they are defined is
addressed in the next section.
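
The :: form can be tried in any standard Forth. In the sketch below
FAKE-AP is a hypothetical stand-in for the real program counter of the
assembler, and the two addresses are the label values that appear in
the disassembly further down, taken as decimal. The prefix form :LAB
is not shown, because it needs the ciforth prefix mechanism.

```forth
\ Sketch only: FAKE-AP stands in for the assembler's real program counter.
VARIABLE FAKE-AP
: _AP_ ( -- addr )  FAKE-AP @ ;
: ::   ( "name" -- )  _AP_ CONSTANT ;   \ name becomes the current address

1285 FAKE-AP !   :: QQQ
1294 FAKE-AP !   :: XXX
XXX QQQ - .      \ prints 9
```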

TWO PASSES

The assembly starts with
    <target-address> ORG
This sets the alternative dictionary pointer to the start of the
buffer and associates an address in the target space with it.
As a consequence, when the same source is assembled a second time,
the code ends up at the same addresses.
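
A sketch of how ORG and _AP_ might hang together, assuming a buffer
with its own pointer. TARGET-START is a hypothetical name and the
buffer size is arbitrary; the real package may well do this
differently.

```forth
\ Sketch only; TARGET-START is a hypothetical name.
CREATE CODE-SPACE  4096 ALLOT    \ assembly buffer, size assumed
VARIABLE CP                      \ next free byte in the buffer
VARIABLE TARGET-START            \ target address of the buffer start

: ORG  ( target-addr -- )  TARGET-START !  CODE-SPACE CP ! ;
: _AP_ ( -- target-addr )  CP @ CODE-SPACE -  TARGET-START @ + ;

1278 ORG   _AP_ .     \ prints 1278
1 CP +!    _AP_ .     \ after one byte: prints 1279
1278 ORG   _AP_ .     \ a second pass starts at 1278 again
```

Because ORG resets the buffer pointer as well as the target address,
both passes lay down code at identical addresses.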

In the first pass, whenever there is an unknown word, it is assumed to
be a label. All the error detection in the kernel goes through ?ERROR .
By revectoring it, the applicable errors 10 and 12 can be patched
such that they no longer abort the compilation. In the second pass
this mechanism is disabled, revealing any real errors. In the end all
labels have been defined twice.
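
The effect of the two passes can be imitated in standard Forth without
touching the error handler. The sketch below is not the ?ERROR patch:
it swaps in an explicit lookup word, REF (a hypothetical name), that
yields 0 for a label that is not yet known. The one-line "source" is
evaluated twice, just as test.asm is included twice.

```forth
\ Sketch only: REF is hypothetical and replaces the ?ERROR patch.
VARIABLE RESULT
: REF  ( "name" -- n )            \ value of label, or 0 if not yet defined
  BL WORD FIND IF EXECUTE ELSE DROP 0 THEN ;
: PASS ( -- )                     \ one pass over a tiny "source"
  S" REF XXX RESULT !  1294 CONSTANT XXX" EVALUATE ;

PASS  RESULT @ .    \ first pass: XXX is still unknown, prints 0
PASS  RESULT @ .    \ second pass: prints 1294
```

The placeholder has the same size as the final value, so the forward
reference does not shift any address between the passes. A Forth may
warn that XXX is redefined in the second pass; that is the double
definition of labels mentioned above.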

ADVERSE WIND

There was a bug in the package that surfaced with the modifications I
had to make. It led to infinite test output. This took a long time to
find, because the tests involve megabytes and hours. I also discovered
that the disassembler must be told whether an area is 16 or 32 bit
code. This means that both the 8086 and 80386 tables must be loaded
and switched at runtime, probably using a vocabulary mechanism.
Decompiling an occasional 16 bit instruction in the middle of 32 bit
code, and vice versa, has never been implemented and must be
addressed too. (These are the prefixes that switch data and address
sizes.)

HOW IT LOOKS

This is the test.asm file, followed by its disassembly.
RX, is a convenience, so that one does not have to know exactly how
to manipulate _AP_ in the code.
-------------------------------------------------------------
1278 ORG
ASSEMBLER
CLD,
MOV, X| T| DI'| MEM| XXX X,
:QQQ
POP|ES,
ADD, B| F| AL'| D0| [SI]
MOV, X| T| DI'| MEM| XXX X,
:XXX
MOV, X| T| DI'| MEM| QQQ X,
JMP, XXX _AP_ 4 + - (RX,)
JMP, XXX RX,
PREVIOUS
-------------------------------------------------------------
CLD,
MOV,   X|   T|   DI'|   MEM|   1294 X,
POP|ES,
ADD,   B|   F|   AL'|   D0|   [SI]
MOV,   X|   T|   DI'|   MEM|   1294 X,
MOV,   X|   T|   DI'|   MEM|   1285 X,
JMP,   -11 (RX,)
JMP,   -16 (RX,)
-------------------------------------------------------------
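
The two displacements can be checked by hand. The sketch assumes that
the numbers in the listing are decimal and that this is 32 bit code, in
which the instructions before the first JMP take 1, 6, 1, 2, 6 and 6
bytes. That puts QQQ at 1285 and XXX at 1294, as the disassembly shows,
and the first JMP opcode at 1300. A JMP is then one opcode byte plus a
4 byte displacement counted from the next instruction. REL32 is a
hypothetical name for the calculation XXX _AP_ 4 + - in test.asm.

```forth
\ Sketch only: addresses are read off the listing, taken as decimal.
1294 CONSTANT XXX                        \ value the disassembly shows for XXX
: REL32 ( target ap -- disp )  4 + - ;   \ ap: address right after the opcode byte

XXX 1301 REL32 .    \ first JMP, opcode byte at 1300: prints -11
XXX 1306 REL32 .    \ second JMP, opcode byte at 1305: prints -16
```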

The assembly proceeds with this script:

-------------------------------------------------------------

\ Load a classical two pass 386 assembler
INCLUDE aswrap.frt     \ Compile to buffer
INCLUDE asgen.frt      \ Generic part
INCLUDE asi386.frt     \ Tables
INCLUDE label.frt      \ Two pass and label mechanism

\ Two pass assembly
'?ERROR-FIXING >DFA @   '?ERROR >DFA !   \ First pass
INCLUDE test.asm
'?ERROR RESTORED    \ Second pass.
INCLUDE test.asm

\ Disassemble for comparison
CODE-SPACE CP @ DISASSEMBLE-RANGE

-------------------------------------------------------------

Total cost so far is about 150 WOC (words of code).
This amounts to one and a half screens.

Next on the agenda is to have the disassembly take into
account the labels XXX and QQQ and the ORG.

Groetjes Albert.



Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
        One man-hour to invent,
                One man-week to implement,
                        One lawyer-year to patent.
albert@xxxxxxxxxxxxxxxxxx   http://home.hccnet.nl/a.w.m.van.der.horst
