Sunday, January 25, 2009

SEAForth Matrix Multiplication Example update 1

It's been about a year since I posted my initial SEAForth matrix multiplication example. A lot has happened since then, including changes to the VentureForth IDE/simulator and the commercial release of actual SEAForth chips. That's the good news. The bad news (for me at least) is that the changes broke my old code. That wouldn't have been so bad if there had been a "this has been deprecated" section in the docs, but then I suppose that would have made the transition too easy. ;) Seriously though, some of the changes actually make the code much cleaner. I've also made the code much more modular thanks to the judicious use of include files.

First I'll go through the changes that I've noticed in the new system, and then give an overview of the new matrix multiplication code.

Change 1 : Node specification

In the original VentureForth you specified the node you were compiling to this way:


12 node !


Now the compiling node is specified this way:


12 {node
...
.../ some VentureForth code
...
node}


It's a lot clearer where compiling to a node begins and ends. This can also lead to a clean modular code pattern that will be discussed later.

Change 2 : Stream Specification

Section 2.3.1 of the VentureForth manual explains streams. Basically, a stream is the way code and data are initially distributed to the correct nodes. If you are just learning the SEAForth system, as I am, you probably aren't ready to create your own custom stream node path. But you do need to make sure all of the nodes are initialized. The following code will do that:

19stream

reset

Change 3 : Literal Specification

The original VentureForth compiler would compile literals automatically. Now a "#" is required after every literal. Note that port selection constants are really literals and require a "#" after them also. So:


'r--- b!
7 for @p+ !b unext


becomes:


'r--- # b!
7 # for @p+ !b unext

Toward More Modular Programs

I saw the following code pattern in the "blinktest" sample project:


\ Main program.vf
.
.
19 {node 0 org here =p include module.vf node}
.
.


So I've followed a similar pattern for my code, breaking what was a single monolithic module into 6 smaller (and in my opinion easier to read) modules.

Main module : matrixmult.vf

( $Id: matrixmult.vf,v 1.0 2006-11-2 $ )
\ Matrix multiplication example
\ Parallel multiplication of a 2 row matrix by a 2 column matrix
\ 1) Data initially is in node 12
\ 2) Row1 is passed to node 13
\ 3) Row2 is passed to node 18
\ 4) Column1 is passed to nodes 18 and 13 simultaneously
\ 5) Multiplications R1*C1 and R2*C1 done simultaneously
\ 6) Column2 is passed to nodes 18 and 13 simultaneously
\ 7) Multiplications R1*C2 and R2*C2 done simultaneously
\ 8) Results R1*C1, R1*C2, R2*C1, R2*C2 are returned to node 12

v.VF +include" c7Dr03/romconfig.f"

12 {node include node12.vf node}
13 {node include node13.vf node}
18 {node include node18.vf node}

19stream

reset

cr


Node 12 code : node12.vf
Note: This is the taskmaster node. It passes the data to nodes 13 and 18 to be processed and then collates the results.



\ ******* node 12 ***************************************
\ Pass row1 to node 13, row2 to node 18
\ and columns 1 and 2 to nodes 13 and 18 simultaneously

0 org here =p

\ Stream R1 to node 13
'r--- # b! \ point reg b to right node. (I/O node 13)
7 # for @p+ !b unext
1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ,

\ Stream R2 to node 18
| '-d-- # b! . . \ point reg b to down node. (I/O node 18)
7 # for @p+ !b . unext
9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 ,

\ Stream C1, then C2, to nodes 13 and 18 simultaneously (16 values)
'rd-- # b! . . \ point reg b to right and down nodes. (I/O nodes 13 and 18)
0 # !b . . \ dummy write for synch purposes
15 # for @p+ !b . unext
17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 ,
25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 ,

59 # a! . . \ point A register to address 59
'r--- # b! . . \ point reg b to right node. (I/O node 13)
@b !a+ . . \ Get result R1*C2 (worker sends its top of stack first)
@b !a+ . . \ Get result R1*C1
'-d-- # b! . . \ point reg b to down node. (I/O node 18)
@b !a+ . . \ Get result R2*C2
@b !a+ . . \ Get result R2*C1
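
To check what node 12 should end up holding, here is a small host-side reference model. It is plain Python, not VentureForth, and it only reproduces the arithmetic with the data from the listing above. The helper name `dot` and the constant `WORD_BITS` are my own, and the 18-bit word width is an assumption about the target core rather than something taken from this code.

```python
# Host-side reference model in plain Python (not VentureForth).
# Data copied from the node12.vf listing; helper names are illustrative.
R1 = list(range(1, 9))     # 1 .. 8
R2 = list(range(9, 17))    # 9 .. 16
C1 = list(range(17, 25))   # 17 .. 24
C2 = list(range(25, 33))   # 25 .. 32

WORD_BITS = 18  # assumed width of a SEAForth core's data word


def dot(row, col):
    """Dot product of one 8-element row with one 8-element column."""
    return sum(r * c for r, c in zip(row, col))


expected = {
    "R1*C1": dot(R1, C1),
    "R1*C2": dot(R1, C2),
    "R2*C1": dot(R2, C1),
    "R2*C2": dot(R2, C2),
}

# Every operand must fit in 8 bits for the 8-bit multiply, and every
# accumulated sum must fit in one data word.
assert all(0 <= v < 2**8 for v in R1 + R2 + C1 + C2)
assert all(0 <= v < 2**WORD_BITS for v in expected.values())

for name, value in expected.items():
    print(name, "=", value)
```

With this data the four values are 780, 1068, 2092 and 2892, which gives something concrete to compare against the four cells node 12 fills starting at address 59 in the simulator.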


Nodes 13 and 18 basically do the same thing: each receives one row of data from node 12, then computes the dot product of that row with each of the two columns.

Node 13 code : node13.vf

\ ******* node 13 *********************************************
\ Receive row1 then multiply it by columns 1 and 2
\ and return results.

0 org

include 8bitmult.vf

here =p
'r--- # b! . . \ point reg b to right node

include vectormult.vf


Node 18 code : node18.vf

\ ******* node 18 *********************************************
\ Receive row2 then multiply it by columns 1 and 2
\ and return results.

0 org

include 8bitmult.vf

here =p
'-d-- # b! . . \ point reg b to down node

include vectormult.vf


8 bit multiply : 8bitmult.vf

: 8*8 ( n1 n2 -- n1*n2 ) \ 16 bit output
push 2* 2* . \ left shift n1 8 times
2* 2* 2* .
2* 2* 2* .
pop +* +* +*
+* +* +* +*
+* push drop .
pop ;
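
The shift-and-add idea behind this word can be sketched on the host. This is plain Python, not VentureForth, and it rests on an assumption about `+*`: that each step adds the second stack item to the top item when the top item's low bit is set, and then shifts the top item right by one. The names `mul_step` and `mul8x8` are mine.

```python
# Host-side model of the 8*8 word (plain Python, not VentureForth).
# ASSUMPTION: +* adds S to T when bit 0 of T is set, then shifts T right by 1.

def mul_step(t, s):
    """One assumed +* step: conditional add, then shift right."""
    if t & 1:
        t += s
    return t >> 1


def mul8x8(n1, n2):
    """Multiply two 8-bit values using eight multiply steps."""
    s = n1 << 8          # the eight 2* operations: n1 shifted left 8 places
    t = n2               # the multiplier starts in the low 8 bits of T
    for _ in range(8):   # the eight +* operations
        t = mul_step(t, s)
    return t             # multiplier bits are shifted out, product remains


print(mul8x8(3, 5))      # small spot check
print(mul8x8(255, 255))  # largest 8-bit operands
```

Each step consumes one bit of the multiplier from the bottom of T while the partial product grows in the upper bits, which is why n1 has to be pre-shifted by 8. After eight steps the multiplier has been shifted out completely and only the 16-bit product is left.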


Vector multiply : vectormult.vf

\ ******* vector multiply *******************************
\ Receive a row then multiply it by columns 1 and 2
\ and return results.

$20 # dup a! push \ point reg a to buffer - leave adr on return stack

7 # for @b !a+ unext

@b drop \ dummy read for synchronization purposes

pop dup a! push \ reset a to start of buffer
dup xor \ initialize TOS to 0
7 # for
@b @a+ 8*8
+ next

dup dup xor \ preserve result from vector mult - initialize TOS to 0

pop a!
7 # for
@b @a+ 8*8
+ next

!b !b \ Send both results back to node 12 through the port in reg b
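
The framing of the data as a worker node sees it (8 row words, 1 dummy word, then 16 column words) can be modelled on the host. This is a plain Python sketch, not VentureForth. The port is represented by a `deque`, the names `words_seen_by_worker` and `worker` are mine, and the order in which the two results travel back to node 12 is deliberately not modelled.

```python
# Host-side model of what one worker node does with its incoming words
# (plain Python, not VentureForth). Names here are illustrative only.
from collections import deque

R1, R2 = list(range(1, 9)), list(range(9, 17))
C1, C2 = list(range(17, 25)), list(range(25, 33))


def words_seen_by_worker(row):
    """Row, then the dummy synchronization word, then both columns."""
    return deque(row + [0] + C1 + C2)


def worker(port):
    """Consume 25 words and return (row*C1, row*C2)."""
    row = [port.popleft() for _ in range(8)]  # buffer the row
    port.popleft()                            # dummy read for synchronization
    sums = []
    for _ in range(2):                        # column 1, then column 2
        acc = 0
        for r in row:                         # one column word per row element
            acc += port.popleft() * r
        sums.append(acc)
    return tuple(sums)


node13 = worker(words_seen_by_worker(R1))
node18 = worker(words_seen_by_worker(R2))
print("node 13:", node13)
print("node 18:", node18)
```

Because the row is buffered locally, each column word only needs to be read from the port once, which is what lets node 12 write the columns to both neighbours at the same time.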