
[colorforth] Reverse engineering the BIOS


Dear all,
This is a side trail. We want to analyse the BIOS in order
to extract information about interfacing colorforth to the hardware.
Whoever wants to be involved will have to contact me directly,
because I will not burden the colorforth mailing list after this
message (until there are practical successes).

What I already have is a Forth assembler/disassembler for the i386
with the "reverse engineering property": disassembled code is such
that, when reassembled, it yields exactly the same binary code. (This
matters because we may have to modify the BIOS in the end.) You can
retrieve it from my site (see the signature): <URL>/forthassembler.html

The basic word for disassembling is
    DISASSEMBLE-FROM-ADDRESS    ( addr1 -- addr2 )
It takes the instruction at addr1, prints its disassembly
and leaves addr2, the address just past that instruction.
(For convenience it has the alias DFA.)
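To make that contract concrete, here is a minimal sketch in Python. Python is used only for illustration; this is not the ciforth code. The hypothetical function dfa mirrors ( addr1 -- addr2 ): it decodes the one instruction at an address and returns its text together with the address just past it. A driver loop calls it repeatedly over a range. The decoder is a stand-in that knows only four i386 opcodes: 90 NOP, C3 RET, CD ib INT and EB cb short JMP. Anything else is emitted as a raw .BYTE.

```python
# Hypothetical mini-decoder table: opcode -> (mnemonic, number of operand bytes).
OPCODES = {0x90: ("NOP", 0), 0xC3: ("RET", 0), 0xCD: ("INT", 1), 0xEB: ("JMP SHORT", 1)}


def dfa(mem, origin, addr):
    """Decode one instruction at addr; return (text, address past it).

    mem is the code image, with mem[0] located at address `origin`.
    """
    i = addr - origin
    op = mem[i]
    if op not in OPCODES or i + 1 + OPCODES[op][1] > len(mem):
        return f".BYTE {op:02X}", addr + 1      # unknown or truncated: raw byte
    name, n_operands = OPCODES[op]
    if n_operands == 0:
        return name, addr + 1
    imm = mem[i + 1]
    if op == 0xEB:
        # rel8 is signed and relative to the address of the next instruction.
        target = addr + 2 + (imm - 256 if imm >= 0x80 else imm)
        return f"{name} {target:X}", addr + 2
    return f"{name} {imm:02X}", addr + 2


def disassemble_range(mem, origin, start, end):
    """Apply dfa repeatedly from start until end is reached or passed."""
    lines, addr = [], start
    while addr < end:
        text, addr_next = dfa(mem, origin, addr)
        lines.append(f"{addr:04X}  {text}")
        addr = addr_next
    return lines


image = bytes([0x90, 0xCD, 0x10, 0xEB, 0xFB, 0xC3])
for line in disassemble_range(image, 0xF000, 0xF000, 0xF000 + len(image)):
    print(line)
```

Returning the next address, rather than keeping a hidden pointer, is what lets the driver (or the Forth stack) chain calls one instruction at a time.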

I have a nice 6809 disassembler (in C) by Sean Riddle. It lets you
accumulate knowledge about the code in a data file, so that
subsequent disassemblies become more and more sophisticated: e.g. if a
label get_key is defined at address F78A, it automatically
generates code like JSR get_key. I have used it, and it worked nicely
for extracting the code that accesses the UARTs on some 6809 boxes donated
to the Dutch Forth user group by Schiphol (i.e. Amsterdam Airport).

This is his description of the data file:
"
   The optional data file contains 7 types of lines:

 o Remarks - these are lines beginning with a semi-colon (;)
   they are completely ignored.

 o 1 ORG line - gives the origin of the code; this is the starting
   address to be used for the disassembly.

 o COMMENT lines - used to add comments to the end of lines of the
   disassembly.

 o COMMENTLINE lines - provide full-line comments to be included before
   a given address in the disassembly.

 o DATA lines - mark sections as data.  These sections will not be
   disassembled, but dumped as hex data instead.

 o ASCII lines - mark sections as text.  These sections will not be
   disassembled, but printed as text instead.

 o WTEXT lines - interprets section as text encoded as in Joust,
   Bubbles, Sinistar (0x0=0,...,0xa=space,0xb=A,...,0x24=Z,...,0x32=:

 See sample data files (*.dat) for examples.
"

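The WTEXT encoding is only partly spelled out in the quote, because the list breaks off after 0x32=:. The following Python sketch decodes just the stated codes: digits, space, A to Z and the colon. Every other code is marked with '?' rather than guessed.

```python
def wtext_char(code):
    """Map one WTEXT code to a character, using only the codes given in the quote."""
    if 0x0 <= code <= 0x9:
        return chr(ord("0") + code)            # 0x0..0x9 -> '0'..'9'
    if code == 0xA:
        return " "                              # 0xa -> space
    if 0xB <= code <= 0x24:
        return chr(ord("A") + code - 0xB)       # 0xb..0x24 -> 'A'..'Z'
    if code == 0x32:
        return ":"                              # 0x32 -> ':'
    return "?"                                  # code not listed in the description


def decode_wtext(data):
    """Decode a WTEXT section (a bytes object) to a Python string."""
    return "".join(wtext_char(b) for b in data)


print(decode_wtext(bytes([0x14, 0x19, 0x1F, 0x1D, 0x1E, 0x0A, 0x01])))  # JOUST 1
```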
Drawing on the facilities available in my Forth (ciforth: lina under
Linux, wina under Windows 3.11 to XP) and using the above as an example,
I arrive at the following design.

There is a set of (address, action) pairs. The address points into
the code to be disassembled; the action is a Forth word. For speed
the set could be kept sorted, but that is a later concern.

Before an instruction is disassembled, its address is looked up, and if
it is present in the set, the corresponding action is executed. The action
may print a comment, but it could also print a memory area as ASCII and
advance the disassembly pointer past it. Label addresses can be looked
up as well: if the corresponding action is a label word, the label name
can be retrieved.
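Here is a minimal Python sketch of this lookup. It assumes the set is a dict from address to a list of actions, and the names Label, ascii_until, decode_one and disassemble are all hypothetical. An action returns None if it only prints something, or a new address if it consumed bytes. decode_one is a placeholder for DFA that just emits one raw byte.

```python
class Label:
    """Action marking addr as a label; the name can also be looked up later."""

    def __init__(self, name):
        self.name = name

    def __call__(self, mem, origin, addr, out):
        out.append(":" + self.name)
        return None                               # consumes no bytes


def ascii_until(end):
    """Make an action that prints bytes up to `end` as text and skips past them."""
    def action(mem, origin, addr, out):
        text = mem[addr - origin:end - origin].decode("latin-1")
        out.append('.STRING "' + text + '"')
        return end                                # new disassembly pointer
    return action


def decode_one(mem, origin, addr):
    """Placeholder for DFA: emits a single raw byte (not a real decoder)."""
    return f".BYTE {mem[addr - origin]:02X}", addr + 1


def disassemble(mem, origin, actions):
    """Walk the image, running the actions registered at each address first."""
    out, addr, end = [], origin, origin + len(mem)
    while addr < end:
        moved = False
        for action in actions.get(addr, []):
            new_addr = action(mem, origin, addr, out)
            if new_addr is not None:              # action consumed bytes
                addr, moved = new_addr, True
                break
        if not moved:
            text, addr = decode_one(mem, origin, addr)
            out.append(text)
    return out


def label_name(actions, addr):
    """Return the label name at addr, or None if no Label action is present."""
    for action in actions.get(addr, []):
        if isinstance(action, Label):
            return action.name
    return None


actions = {0x100: [Label("start")], 0x101: [ascii_until(0x103)]}
print(disassemble(b"\x90Hi\xc3", 0x100, actions))
# [':start', '.BYTE 90', '.STRING "Hi"', '.BYTE C3']
```

An action that consumes bytes must return an address beyond the current one; otherwise the loop would run the same action forever.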

The actions in the set are just Forth words. They are generated and
added to the set by loading a file, so the data file is effectively
Forth code that generates headerless words and allocates strings in
the dictionary. Of course you can also add knowledge to the set
manually, and later dump it back to a source file.
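Dumping the set back to a source file could look roughly like the following Python sketch, which covers only the words that take an sc (COMMENTLINE, COMMENT, LABEL). It rests on two assumptions, neither taken from ciforth itself:

- the output layout is `addr "text" WORD` after HEX;
- an embedded quote in a string constant is written doubled.

Check both against ciforth before relying on them.

```python
def forth_string(s):
    """Quote s as a string constant (assumption: embedded quotes are doubled)."""
    return '"' + s.replace('"', '""') + '"'


def dump_knowledge(entries):
    """entries: (addr, word, text) triples, word being COMMENTLINE, COMMENT or LABEL.

    Returns Forth source that, when loaded, would re-create the same entries
    (assuming the addr "text" WORD layout). Sorting is by address only, so
    entries at the same address keep their original order.
    """
    lines = ["HEX"]
    for addr, word, text in sorted(entries, key=lambda e: e[0]):
        lines.append(f"{addr:X} {forth_string(text)} {word}")
    return "\n".join(lines) + "\n"


print(dump_knowledge([(0xF78A, "LABEL", "get_key"), (0xF700, "COMMENT", 'say "hi"')]), end="")
```

Because the dump is ordinary Forth source, loading it again is the same operation as loading a hand-written data file.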

All of the following words specify an action to be done at an
address, adding it to the above set. Strings are ciforth string constants
(sc), i.e. an (address, length) pair. An sc can have any length and may
contain embedded quotes and newlines.

\ When at addr:
\ Print the comment sc before the disassembly, followed by a
\ newline. A multi-line string constant makes sense here.
( addr sc -- ) COMMENTLINE

\ When at addr:
\ Print the comment sc after the disassembly, but before its closing
\ newline. Such a comment probably has no embedded newlines.
( addr sc -- ) COMMENT

\ Make the system aware of a label with name sc at addr.
\ Before addr is disassembled, the label is printed with a colon prefix.
\ Where appropriate (e.g. in jumps), addr is replaced by the label.
\ Like so `` :getkey   MOV, A| ....  ''  ``   BSR, getkey ##, ''
( addr sc -- ) LABEL
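The output side of these three words can be sketched in Python as follows. This is a hypothetical layout for illustration only; it uses mnemonic-then-operand order rather than ciforth's postfix assembler syntax. The output order is:

- COMMENTLINE text comes first, one Forth comment line per line;
- the LABEL name is printed with a colon prefix and also replaces a matching jump target;
- the COMMENT is appended before the newline.

```python
def render(addr, mnemonic, target, knowledge):
    """Lay out one disassembled instruction together with its annotations.

    knowledge maps an address to a dict with optional keys 'commentline',
    'label' and 'comment'; target is a jump/call destination or None.
    """
    info = knowledge.get(addr, {})
    out = []
    for line in info.get("commentline", "").splitlines():
        out.append("\\ " + line)                       # full-line comments first
    text = mnemonic
    if target is not None:
        # Print the label name instead of the address where one is known.
        text += " " + knowledge.get(target, {}).get("label", f"{target:X}")
    if "label" in info:
        text = ":" + info["label"] + "   " + text      # colon-prefixed label
    if "comment" in info:
        text += "   \\ " + info["comment"]             # comment before the newline
    out.append(text)
    return "\n".join(out) + "\n"


knowledge = {0xF000: {"commentline": "Wait for a key\nand return it",
                      "label": "getkey", "comment": "poll"}}
print(render(0xF000, "NOP", None, knowledge), end="")
print(render(0xF010, "JMP", 0xF000, knowledge), end="")
```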

\ The range from addr up to, but not including, addr1 is disassembled
\ as separate characters, like so ``   .CHAR 1B &A &B &[ ^J   ''
\ The disassembly pointer is, of course, advanced to addr1.
( addr addr1 -- ) .CHAR
\ Likewise for .STRING "we are going for it"
\              .BYTE   17 4F 17 1B 01 13
\              .WORD   17F6 17F9
\              .LONG   7898,ABCD  7898,ABCD  7898,ABCD
\              .QUAD   7898,ABCD,7898,ABCD
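The text does not state the exact .CHAR rule, so the Python sketch below assumes one that reproduces the example:

- printable characters become &c;
- codes 01 to 1A become ^A to ^Z;
- everything else is printed in hex.

With that rule, ESC comes out as 1B rather than ^[. A .WORD renderer is included, assuming i386 little-endian byte order. Both functions return the advanced pointer, as .CHAR does.

```python
def char_token(b):
    """One .CHAR token (assumed rule, chosen to reproduce the example above)."""
    if 0x21 <= b <= 0x7E:
        return "&" + chr(b)                  # printable: &A  &[
    if 0x01 <= b <= 0x1A:
        return "^" + chr(0x40 + b)           # Ctrl-A .. Ctrl-Z: 0A -> ^J
    return f"{b:02X}"                        # anything else in hex, e.g. 1B


def dot_char(mem, origin, addr, addr1):
    """Render [addr, addr1) as a .CHAR line; return (text, new pointer)."""
    data = mem[addr - origin:addr1 - origin]
    return ".CHAR " + " ".join(char_token(b) for b in data), addr1


def dot_word(mem, origin, addr, addr1):
    """Render [addr, addr1) as little-endian 16-bit words (i386 byte order).

    An odd trailing byte is not consumed; the returned pointer stops before it.
    """
    data = mem[addr - origin:addr1 - origin]
    n = len(data) // 2
    words = [int.from_bytes(data[2 * k:2 * k + 2], "little") for k in range(n)]
    return ".WORD " + " ".join(f"{w:04X}" for w in words), addr + 2 * n


print(dot_char(bytes([0x1B, 0x41, 0x42, 0x5B, 0x0A]), 0, 0, 5)[0])
print(dot_word(bytes([0xF6, 0x17, 0xF9, 0x17]), 0x100, 0x100, 0x104)[0])
```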

That will be sufficient for a usable tool.

Still to be done: automatic determination of data versus code by
analysis; scrolling through the disassembly; left-clicking with the
mouse to jump through the code; right-clicking to select the data/code
type; and, of course, recording all this in the data file.
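One common starting point for automatic data/code classification is to flag runs of printable characters of some minimum length as candidate text ranges, much as the Unix strings utility does. This heuristic is swapped in here for illustration and is not part of the design above. The Python sketch below returns half-open (addr, addr1) ranges that could be fed to .CHAR or .STRING. The threshold of 6 is an arbitrary assumption.

```python
def text_candidates(mem, origin, min_len=6):
    """Return (addr, addr1) ranges of printable-ASCII runs of at least min_len bytes.

    addr1 is exclusive, matching the ( addr addr1 -- ) convention of .CHAR.
    """
    ranges, start = [], None
    for i, b in enumerate(mem):
        printable = 0x20 <= b <= 0x7E
        if printable and start is None:
            start = i                                  # a run begins
        elif not printable and start is not None:
            if i - start >= min_len:                   # run ends; keep if long enough
                ranges.append((origin + start, origin + i))
            start = None
    if start is not None and len(mem) - start >= min_len:
        ranges.append((origin + start, origin + len(mem)))   # run reaches the end
    return ranges


for addr, addr1 in text_candidates(b"\x90\xc3Award BIOS\x00\xcd\x10OK", 0xF000):
    print(f"{addr:X} {addr1:X}")   # F002 F00C
```

Short printable runs also occur inside ordinary machine code, so such candidates are only suggestions to confirm by hand.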

On the reassembly front: add labels and the .CHAR etc. words to
the assembler. ciforth has a facility for handling prefixes,
hence the syntax :getkey for named labels.
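With named labels, a label can be used before the line that defines it, as in a forward jump. A conventional way to handle that when reassembling is two passes:

1. assign an address to every label;
2. emit the bytes, with the names resolved.

Below is a minimal Python sketch of that technique. It uses a tiny hypothetical instruction subset (NOP, RET and a two-byte short JMP) rather than ciforth's real assembler syntax. Label definitions are written as strings with the colon prefix.

```python
def assemble(items, origin):
    """items: ':name' label definitions, ('NOP',), ('RET',) or ('JMP', name)."""
    sizes = {"NOP": 1, "RET": 1, "JMP": 2}
    labels, addr = {}, origin
    for item in items:                      # pass 1: give every label an address
        if isinstance(item, str):
            labels[item[1:]] = addr         # strip the ':' prefix
        else:
            addr += sizes[item[0]]
    out, addr = bytearray(), origin
    for item in items:                      # pass 2: emit bytes, resolving names
        if isinstance(item, str):
            continue
        if item[0] == "NOP":
            out.append(0x90)
        elif item[0] == "RET":
            out.append(0xC3)
        else:
            rel = labels[item[1]] - (addr + 2)   # relative to the next instruction
            if not -128 <= rel <= 127:
                raise ValueError("short jump out of range")
            out += bytes([0xEB, rel & 0xFF])
        addr += sizes[item[0]]
    return bytes(out)


print(assemble([":getkey", ("NOP",), ("JMP", "getkey"), ("RET",)], 0xF000).hex())
```

Because every instruction here has a fixed size, pass 1 can compute all label addresses without knowing the targets. A real i386 assembler also has to choose between short and near jump encodings, and that choice must match the original bytes to keep the reverse engineering property.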

Volunteers welcome!

--

Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
To suffer is the prerogative of the strong. The weak -- perish.
albert@xxxxxxxxxxxxxxxxxx     http://home.hccnet.nl/a.w.m.van.der.horst

