I'm going to put up a better description later. Right now I just want to get this up on the web. It works! (It took a while.) The source HTML was generated by the built-in arrayForth HTML generator.

To run the code put the following at the top of block 1302:

0 node 200 load 1 node 204 load 2 node 206 load 3 node 210 load exit

Note that the multicore matrix example is in blocks 200 to 208 and uses nodes 0, 1 and 2. Block 210 does the same thing on a single core (node 3). The multicore example finishes in 1479 cycles and the single-core example takes 3040 cycles, so spreading the work over three nodes is roughly twice as fast. I verified that the multiplications are correct using the statistical language R. At some point I plan to make a video explaining how this all works, but again, right now I just want to get the results up.
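For anyone who wants to repeat the check without R, here is a minimal Python sketch of the same verification. The four vectors are copied from the data words in the listings; the helper names `dot` and `matmul_2x8_8x2` are made up for this illustration and are not part of the arrayForth code.

```python
# Vectors copied from the data words in blocks 200 and 210.
r1 = list(range(1, 9))      # 1 .. 8
r2 = list(range(9, 17))     # 9 .. 16
c1 = list(range(17, 25))    # 17 .. 24
c2 = list(range(25, 33))    # 25 .. 32


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def matmul_2x8_8x2(rows, cols):
    """Return the products in the order r1*c1 r1*c2 r2*c1 r2*c2."""
    return [dot(r, c) for r in rows for c in cols]


result = matmul_2x8_8x2([r1, r2], [c1, c2])
print(result)                    # [780, 1068, 2092, 2892]

# Ratio of the two cycle counts reported above.
print(round(3040 / 1479, 2))     # 2.06
```

All four results are far below 2^17, so they fit comfortably in the 17 bit multiply the listings use.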

`multicore matrix multiplication example`
`a 2 row by 8 col matrix is multiplied by`
`a 2 col by 8 row matrix`
`resulting in vector r1*c1 r1*c2 r2*c1 r2*c2`
`r1out send vector r1 to right node`
`r2out send vector r2 to down node`
`rtdwn set port to right and down`
`synch dummy write to synchronize nodes`
`c1c2out send vectors c1 and c2 to right and down nodes`
`getres get results from right and down vectors`

`200 list`
`0 org r1out right b! 7 for @p !b unext`
`1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ,`
`r2out down b! 7 for @p !b unext`
`9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 ,`
`rtdwn B5 b!`
`synch 0 !b`
`c1c2out 15 for @p !b unext`
`17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 ,`
`25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 ,`
`getres right b! @b @b down b! @b @b r---`

`invect a- read in vector and store at a`
`ab-a*b 17 bit multiply`
`a-n read in vector and do dot product wi`
`p- read r from port p then read c1 and`

`202 list`
`invect a! 7 for @b !+ unext ;`
`a! dup dup or 17 push . begin +* unext drop`
`a! dup dup or 7 for @b @+ a push * + pop`
`b! 60 invect`
`synch @b drop`
`60 .prod 60 .prod !b !b ;`

`204 list`
`0 org 202 load 69 org right r*c1,c2 r---`

`206 list`
`0 org 202 load 69 org down r*c1,c2 r---`
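The way the three nodes cooperate can be hard to see from the listings alone, so here is a rough Python model of the data flow as I read it from blocks 200 to 206. It is only a sketch: the real nodes run concurrently and block on their ports, while this model runs each worker in turn on a queue, and the order of the two result words on the wire is simplified. The names `worker` and `node0` are invented for the illustration.

```python
from collections import deque

R1 = list(range(1, 9))
R2 = list(range(9, 17))
C1 = list(range(17, 25))
C2 = list(range(25, 33))


def worker(link_in):
    """Model of a worker node (block 204 or 206): returns [r*c1, r*c2]."""
    # invect: read the 8-word row vector and keep it locally.
    row = [link_in.popleft() for _ in range(8)]
    # synch: read the dummy word and drop it.
    link_in.popleft()
    results = []
    for _ in range(2):                      # the dot product word runs twice
        acc = 0
        for x in row:
            acc += x * link_in.popleft()    # stored word times streamed word
        results.append(acc)
    return results


def node0():
    """Model of node 0 (block 200): feed both workers, collect results."""
    right = deque(R1)                       # r1out
    down = deque(R2)                        # r2out
    for word in [0] + C1 + C2:              # synch, then c1c2out
        right.append(word)                  # one write reaches both
        down.append(word)                   # neighbours (rtdwn)
    return worker(right) + worker(down)     # getres


print(node0())    # [780, 1068, 2092, 2892]
```

The point of the layout is that each worker only stores its own row; the column vectors are streamed past and never stored, and the same 16 column words are delivered to both workers.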

`208 list`
`0 org`
`a! dup dup or 17 push . begin +* unext drop`
`69 org 3 3 * r---`
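The multiply word is a shift-and-add loop built on the `+*` multiply step. The following Python sketch is a simplified, unsigned model of that idea, not a cycle-accurate or bit-exact F18 emulation: it assumes the multiplier sits in a register `a`, the multiplicand in `s`, and a running sum in `t`, and it ignores sign handling. The function names are made up for the illustration.

```python
MASK18 = (1 << 18) - 1


def multiply_step(t, a, s):
    """One simplified multiply step: conditional add, then shift t:a right."""
    if a & 1:                           # low bit of the multiplier set?
        t += s                          # add the multiplicand to the sum
    a = (a >> 1) | ((t & 1) << 17)      # low bit of t moves into the top of a
    t >>= 1
    return t, a


def multiply(multiplicand, multiplier):
    """Return the low 18 bits of the product after 18 steps."""
    t, a = 0, multiplier & MASK18       # like: a! dup dup or
    for _ in range(18):                 # like: 17 push . begin +* unext
        t, a = multiply_step(t, a, multiplicand)
    return a


print(multiply(3, 3))    # 9, the same test case as block 208
```

After 18 steps the pair `t:a` holds the full product, with the low half in `a`, which is enough for the small values used in this example.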

`singlecore matrix multiplication example`
`* ab-a*b 17 bit multiplication`
`v2ofs-n vector v1 is stored in a and`
`v2ofs is the offset to vector v2. sum the`
`products of v1 and v2.`

`210 list`
`0 org`
`a push a! dup dup or 17 push . begin +* unex`
`dup dup or 7 for over a + b! @+ @b * + ne 69 org`
`40 a! 10 *sum 8 *sum 40 a! 18 *sum 10 *sum r-- 40 org`
`1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ,`
`9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 ,`
`17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 ,`
`25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 ,`
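The offsets passed to `*sum` are easier to follow with a small model. The Python sketch below rests on two assumptions that the HTML export does not confirm: that the literals `10`, `8` and `18` are hexadecimal (the export drops the display cues that distinguish hex from decimal), and that the address register is left pointing just past v1 after each call. Addresses are relative to the start of the data, and the class and method names are invented for the illustration.

```python
# Data words from block 210 in the order they are laid down: r1, r2, c1, c2.
MEMORY = list(range(1, 33))


class SingleCore:
    def __init__(self, memory):
        self.memory = memory
        self.a = 0    # index of the next v1 word, relative to the data start

    def star_sum(self, v2ofs):
        """Sum of products of 8 words at a and 8 words at a + v2ofs."""
        total = 0
        for _ in range(8):
            b = self.a + v2ofs                  # like: over a + b!
            total += self.memory[self.a] * self.memory[b]
            self.a += 1                         # like the auto-increment of @+
        return total


def run():
    core = SingleCore(MEMORY)
    results = []
    core.a = 0                                  # a points at r1
    results.append(core.star_sum(0x10))         # r1 with c1
    results.append(core.star_sum(0x08))         # a now at r2: r2 with c1
    core.a = 0                                  # a points at r1 again
    results.append(core.star_sum(0x18))         # r1 with c2
    results.append(core.star_sum(0x10))         # a now at r2: r2 with c2
    return results


print(run())    # [780, 2092, 1068, 2892]
```

Under those assumptions the single-core version produces the same four values as the multicore one, in the order r1*c1 r2*c1 r1*c2 r2*c2.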