A 32 Bit Variable Length Instruction Set Core and Transputer Like Comms Network
The UART I had found has now been tested in simulation to see if it can transmit at 115200 baud. It looks like it can, so the fixes I put in appear to have worked.
The testbench connects the TX to the RX, so a loopback is formed.
Checking the data coming back, the same value is received as was sent.
It looks like it's the correct speed as well, with a 4.67% error over the entire 10 bits. The count is also reset at the start of each transmission, so it should be good.
This shows the result
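As a rough cross-check of this kind of result, the sketch below works out the timing error from an integer clock divider and models the loopback on a 10-bit frame. The 25 MHz clock, the 8N1 framing (start bit, 8 data bits LSB first, stop bit) and all function names are assumptions for illustration; it does not model the UART under test and does not try to reproduce the 4.67% figure.

```python
def baud_timing(clk_hz, target_baud, frame_bits=10):
    """Return (divisor, period_error, frame_drift_bits) for an integer divider."""
    divisor = round(clk_hz / target_baud)           # clock cycles per bit (assumed scheme)
    # Relative error of the generated bit period against the ideal bit period.
    period_error = divisor * target_baud / clk_hz - 1.0
    # Drift accumulated by the end of the frame, in units of one bit period.
    frame_drift_bits = frame_bits * abs(period_error)
    return divisor, period_error, frame_drift_bits


def frame_byte(value):
    """Build a 10-bit 8N1 frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    data = [(value >> i) & 1 for i in range(8)]
    return [0] + data + [1]


def unframe(bits):
    """Recover the byte from a 10-bit 8N1 frame, checking start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


if __name__ == "__main__":
    divisor, err, drift = baud_timing(25_000_000, 115200)
    print(divisor, err, drift)
    print(hex(unframe(frame_byte(0xA5))))           # loopback: TX frame fed straight to RX
```

Because the receiver samples near the middle of each bit, the drift accumulated over the whole frame needs to stay comfortably below half a bit period for the last bit to be read correctly.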
Open Source is excellent stuff, and I would highly recommend looking at the FOSSi Foundation, which is dedicated to Open Source Hardware.
Unfortunately some Open Source, shall we say, leaves something to be desired.
A standard reset scheme, for example, whether synchronous or asynchronous. It's best NOT to mix the two schemes unless you REALLY HAVE TO.
I am in the middle of debugging the UART I got hold of. So far I've seen a mix of reset styles and registers that are not reset at all.
I sincerely hope that there are no latches in there !!
It's been a while but I thought I would add a log.
I decided to write a generic routine to allow text to be sent to the LCD screen on the dev board I have. I wrote a specific one a long time ago that sent 'Hello World' out to the display.
Now, with the Dual and Single Complexes, I thought: let's start to put bits of code together so that they can be used.
Having already got unformatted text out on the 'console' in the simulator, it was time to reuse those techniques and add in the LCD option.
Extracted the specific routine and then started to write the more generic routine.
Ran it up and hit a problem. This can be quite disheartening in a project that has been running this long. When you hit problems you start to think
'Is it all worth while ?'
An interregnum ensues in which Phased Array audio is investigated and HP Calculators are used.
But the laptop is with me on holiday and I start to skin the onion: all bugs have layers and each layer brings tears to the eye.
Crack the first RTL bug, which sometimes skipped the stack pointer update ! Move onto the Assembler, fix a bug there, hit a new RTL bug (where have the instructions gone ?), fix that. Finally fix a typo in the Assembler (this was part of the old code but not up to date with the present instruction name).
Now I have the PIO block sending out the control signalling for the LCD. The next step is to get it all synthed again and plugged into the dev board.
The next thing after this will be to add in a proper comms link. I still have not got an RS232 block in there, which hopefully should not be too hard to add !
It's been a while since I synthesized anything relating to Trinity.
I thought, however, that I would take the Dual Core Complex and build it.
When it was synthesized back in 2011 it was around 13000 Logic blocks, lots of room.
Now the Dual Core Complex, the Timer, PIO, Interrupt and the Memory Interconnect have consumed the Cyclone III FPGA, it's at 95% fill !
I spotted some long paths in there which I wasn't expecting, so I managed to find a way of reducing those. I also added the PLL to give a clean clock.
The timing reports advise that I can get up to 30 MHz but I've got the clocks down at 25 MHz.
Yes, it would be great to get higher than this, but at 95% fill I am pleasantly surprised it goes this fast.
There is a basic rule of thumb: once you start to get above 60% fill, timing becomes harder, and the more fill the harder it gets. This is because the logic can't all be placed next to each other, so more logic requires more routing within the FPGA. This is a continuous issue with FPGA work.
Because I'm not doing this professionally I don't need to get it up to 100 MHz, 25 MHz is fine.
One thing did appear to be quite odd. The General Purpose register file appeared to be about 1500 Logic Elements (1024 registers), whereas the Control and Status registers were at ~ 20k Logic Elements, which seemed to be twice the expected area. This is something to examine.
There is also something that is quite good to know: I appear to have used only 10 % of the available onboard SRAM, which means that I can possibly expand the memories from their 4 KB blocks. However, the routing may become an issue.
I have been looking at the tools again and have discovered that there is a methodology for reprogramming the memories without having to go through the whole process of resynthesizing the FPGA. Just get the updated if files and then run the tool and re-assemble. So all good.
A few days ago I was chatting to a friend about documentation in Software and was astonished at how little is done in comparison to Hardware. Well, that has come back to bite me. I've been looking at some assembler I wrote six years ago, attempting to update it for the new Dual Core Complex, and there aren't even any comments !!
Mea Culpa !
I intend to get the code converted and run up the LCD that is on the dev board that I have.
By the looks of it there needs to be a bit of preamble, which foxed me, and then we are into ASCII, which is good news.
What next? Well, I think I need to get some kind of RS232 input in there so I can communicate properly.
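The preamble-then-ASCII pattern can be sketched as a generic routine that turns any string into a list of writes for the display. This is only an illustration: it assumes a Hitachi HD44780-compatible controller driven in 8-bit mode, it ignores the delays and busy-flag polling a real display needs, and the names `INIT_PREAMBLE` and `lcd_stream` are hypothetical rather than taken from the Trinity code.

```python
# Assumed HD44780 command preamble (8-bit interface).
INIT_PREAMBLE = [
    0x38,  # function set: 8-bit bus, 2 lines, 5x8 font
    0x0C,  # display on, cursor off, blink off
    0x01,  # clear display
    0x06,  # entry mode: increment address, no display shift
]


def lcd_stream(text):
    """Return a list of (rs, byte) writes: rs=0 for commands, rs=1 for character data."""
    writes = [(0, command) for command in INIT_PREAMBLE]
    # After the preamble the controller takes plain ASCII codes as character data.
    writes += [(1, code) for code in text.encode("ascii")]
    return writes


if __name__ == "__main__":
    for rs, byte in lcd_stream("Hello World"):
        print(rs, hex(byte))
```

Each `(rs, byte)` pair would then be driven out through the PIO block, with the enable strobe and timing handled by whatever sits underneath.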
This is not going to be as trivial as it sounds.
There is an RS232 port on the board but I need to work out how to connect to it, put an RS232 block within the code and then put in support to use it. Open ended projects are cool in this respect.
I have managed to get the latest binary for GHDL, 0.34.
This allows me to select signals to wave up.
Or rather it almost does.
Unfortunately it cannot cope with For Generate loops, which means that if I want a specific signal, every instance in the loop is waved up as well.
Still this is a significant bit of progress.
Also I've put in a request for this to be fixed in future iterations.
I've created a Dual Core Complex that now has the Trinity Net block in it.
Then I created a better framework which instantiates this 'node' in a three dimensional matrix.
The matrix is sized as 2x2x2, which gives 8 nodes of two cores each, a total of 16 cores.
I ran a test program, but I am now back to the original issue with the previous array of cores: it takes a long time for anything to be simulated.
Simulating 10 us takes 1 minute 40 seconds because every signal is recorded. I really need to get the latest version of the simulator to see if I can reduce the number of signals recorded.
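The core count follows from 2 x 2 x 2 = 8 nodes with a Dual Core Complex (two cores) at each node. A small sketch of walking such a matrix is given below; the row-major node numbering and the way core IDs are derived from it are assumptions for illustration, not necessarily how the testbench numbers things.

```python
from itertools import product

CORES_PER_NODE = 2  # each node is a Dual Core Complex


def enumerate_cores(dims=(2, 2, 2)):
    """List every core in the matrix with its node coordinate and a flat core ID."""
    cores = []
    for x, y, z in product(range(dims[0]), range(dims[1]), range(dims[2])):
        # Hypothetical row-major node index: x is the slowest axis, z the fastest.
        node_index = (x * dims[1] + y) * dims[2] + z
        for local_core in range(CORES_PER_NODE):
            cores.append({
                "node": (x, y, z),
                "core_id": node_index * CORES_PER_NODE + local_core,
            })
    return cores


if __name__ == "__main__":
    for core in enumerate_cores():
        print(core["core_id"], core["node"])
```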
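To put that figure in perspective, 10 us of simulated time for 100 seconds of wall clock is a slowdown of about ten million to one. The sketch below extrapolates from that single measurement; it assumes the cost scales linearly with simulated time, which is only an approximation, and the function name is made up for the example.

```python
def wall_clock_seconds(sim_seconds, ref_sim_seconds=10e-6, ref_wall_seconds=100.0):
    """Estimate wall clock time from one reference measurement, assuming linear scaling."""
    slowdown = ref_wall_seconds / ref_sim_seconds   # wall seconds per simulated second
    return sim_seconds * slowdown


if __name__ == "__main__":
    for sim in (10e-6, 100e-6, 1e-3):
        print(f"{sim * 1e6:8.0f} us simulated -> {wall_clock_seconds(sim) / 60:7.1f} minutes")
```

On that basis a single millisecond of simulated time would take getting on for three hours, which is why cutting down the recorded signals matters.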
It's been an interesting couple of weeks.
Testing out the Single Complex had, as you may expect, a wealth of entertainments.
The Arbiter needed some TLC. When I sprang into copying a simple count into memory, the DMA copied it to somewhere else.
Interrupts. Interrupts opened up several bugs. The first was that the interrupt needed to be extended beyond a single pulse.
Interrupts also flagged up an issue with respect to Jumps. An interrupt was successfully called but the return was incorrect. The jump was flushing out the return address before it was captured.
The Jump is now held at the Execution stage until the next instruction is just about to arrive thus keeping a valid address for the Interrupt return address.
An enhancement was put in with respect to the Branch Prediction, which means it is now more efficient.
With a simple update to the CSR registers it was possible to add in a Processor ID. The reason for this was to allow a multiple core environment which would allow a shared ROM. This allows a core to determine which one it is and then run the appropriate code. It was also very simple to add in the extra core.
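The idea of several cores sharing one ROM image and branching on their own ID can be sketched as below. This is a behavioural illustration only: treating ID 0 as the master, the function names and the dispatch table are all assumptions, and on the real core the ID would be read from the CSR rather than passed in.

```python
def master_main():
    """Hypothetical code path for the first core."""
    return "master: set up shared resources"


def worker_main():
    """Hypothetical code path for every other core."""
    return "worker: wait for work"


# Assumed mapping from Processor ID to entry point; IDs not listed run the worker code.
ENTRY_POINTS = {0: master_main}


def boot(processor_id):
    """Shared start-up code: look up which core this is, then run its code."""
    handler = ENTRY_POINTS.get(processor_id, worker_main)
    return handler()


if __name__ == "__main__":
    for pid in range(2):
        print(pid, boot(pid))
```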
By adding the Timer I can now start to work on Coarse Grain Multitasking.
So fun fun fun :).
I now have a first cut Simple Complex. This has Trinity, a simple word DMA, 16 kB of Data SRAM, 8 kB of Instruction SRAM, 4 kB of ROM, Interconnect, PIO and the Interrupt controller.
I need to test it to ensure that it works.
Note the Instruction SRAM: the intention is that this area will allow the block to receive a block of code and then run from it.
I was thinking about applications for Trinity Net.
It comes to mind that one use for a distributed array would be to analyse Radio Astronomy data.
While it would not be up to the demands of the Square Kilometre Array, it would be interesting to have something which could analyse an array of data.
I now have a first cut nested interrupt controller and will need to test it. There is an initial interrupt conditioning block so that interrupts can be pulse, level and asynchronous. The conditioning is set up via registers. It can have up to 256 interrupts at present, but in theory the number can be made programmable at build time.
Assuming this is all ok, I just need to write a simple GPIO block and maybe a Hitachi LCD Text driver, rather than the bit bashed code I wrote to display "Hello World" wayyyy back.
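A behavioural sketch of what conditioning means for the pulse and level cases is given below. It is not the RTL: the class name, the acknowledge input and the rule that a new pulse wins over a simultaneous acknowledge are assumptions, and the asynchronous case (which would need a synchroniser in front) is left out.

```python
class InterruptConditioner:
    """One interrupt line, conditioned as either 'pulse' or 'level' (assumed modes)."""

    def __init__(self, mode):
        if mode not in ("pulse", "level"):
            raise ValueError("mode must be 'pulse' or 'level'")
        self.mode = mode
        self.pending = False

    def clock(self, raw, ack=False):
        """Advance one clock cycle and return the conditioned interrupt request."""
        if self.mode == "level":
            # Level: the request simply follows the raw input.
            self.pending = bool(raw)
        else:
            # Pulse: latch a single-cycle pulse and hold it until acknowledged.
            if ack:
                self.pending = False
            if raw:
                self.pending = True   # a new pulse is not lost if it coincides with ack
        return self.pending


if __name__ == "__main__":
    irq = InterruptConditioner("pulse")
    print([irq.clock(1), irq.clock(0), irq.clock(0), irq.clock(0, ack=True)])
```

Holding the request until it is acknowledged is one way of extending an interrupt beyond a single pulse so that the core cannot miss it.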
After that I can put together a
which will have the main features of a minimal microcontroller by adding in the DMA, Interrupt Controller, Non Blocking Interconnect and the OpenCores UART.
This will allow me to place the elements into a small and thus cheap FPGA board to allow deployment into projects.
The Multicore environment will not be forgotten. I still have plans for this, initially formalising the system so that it is extracted from the testbench environment and becomes a deployable bit of IP with an enhanced Master Core.