Mining cryptocurrency with a lot of STM32F030

Cryptocurrencies are the 'trendy' thing of 2018, so why not go with the trend by mining them with, like, a lot of microcontrollers? Of course it'd be very, very, very slow, probably even slower than a CPU from 10 years ago, but it's all about the experience and the process, not the end result.

I took inspiration from the hashing boards used in the AntMiner S9: each board will be equipped with a lot of 'mining MCUs' and one 'controller MCU'. The controller gets work from a pool server, breaks it down into smaller chunks, and sends them to the mining MCUs, where the work is performed. The results are then sent back to the controller, which forwards them to the pool server.

Some requirements:

  • Cheap, preferably under $200 because I don't have much capital to sink
  • Hashrate should reach 10kH/s at SHA-256
  • [to be added later]

This project is still in the research phase, so nothing here is concrete and everything may be subject to change in the future.

  • Is this project dead, and the GD32F330F8P6

    Ho Tuan Kiet, 09/27/2018 at 11:57, 0 comments

    (can't believe it has been two months since the last posted project log)

    Okay, so first up: Is this project dead? Short answer: no. Long answer: 

    One thing that had been bothering me (until now): since both the uC we were using (STM32F030F4P6) and the one we were investigating (GD32F330F4P6) have only 16kB of flash, how would we support multiple hashing algorithms, especially when the SHA256 firmware already takes up all the flash space? One solution I investigated was to produce a separate firmware for each algorithm and flash the appropriate one upon the user's request. This means that on each hashing board, the manager uC must be able to flash all worker uCs with the appropriate firmware. However, there are multiple shortcomings with this approach:

    • Programming has to be done sequentially, worker by worker, which incurs a massive startup penalty (you have to wait for all workers to be flashed)
    • I have yet to find a way to make it work. Each worker uC needs at least three wires for programming: DIO, CLK, RST. And since a hashing board can accommodate a lot of workers, we'd have to figure out a way to multiplex all those signal lines so that the manager uC can drive them.
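
To get a feel for the startup penalty, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: a full 16kB image, a hypothetical 115200 baud serial link with 8N1 framing, and 64 workers per board (the upper bound from the design log). It ignores flash erase time and protocol overhead, so treat the result as a lower bound, not a measurement.

```python
# Rough lower bound on sequential flashing time for one hashing board.
# All figures below are assumptions for illustration, not measurements.

FIRMWARE_BYTES = 16 * 1024   # full 16 kB flash image
BAUD_RATE = 115_200          # hypothetical serial link speed
BITS_PER_BYTE = 10           # 8 data bits + start bit + stop bit (8N1)
WORKERS_PER_BOARD = 64       # upper bound mentioned in the design log


def flash_seconds_per_worker(firmware_bytes=FIRMWARE_BYTES, baud=BAUD_RATE):
    """Time to push the raw image over the wire, ignoring erase and protocol overhead."""
    return firmware_bytes * BITS_PER_BYTE / baud


def flash_seconds_per_board(workers=WORKERS_PER_BOARD):
    """Workers are flashed one after another, so the per-worker times add up."""
    return workers * flash_seconds_per_worker()


if __name__ == "__main__":
    print(f"per worker: {flash_seconds_per_worker():.2f} s")
    print(f"per board : {flash_seconds_per_board():.1f} s")
```

Under these assumptions a single board needs roughly a minute and a half just to move the bytes, before any erase or verify step.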

    Because of all these problems, and the fact that I couldn't come up with anything else, the project basically hit a wall. Until those problems were solved, there was no incentive to push this project forward.

    Which brings us back to today. Remember when I talked about the GD32F330F4P6? It's an ARM Cortex-M4 chip offered by GigaDevice, packaged in the same TSSOP-20 package as the STM32F030F4P6, yet quite a bit more powerful. When I found out about that chip, I also learned that GigaDevice makes the GD32F330F8P6, which is basically the F4P6 but with a lot more flash space, 64kB to be exact. With that much flash, I can pack more algorithms into the firmware, which means I don't have to solve the problem of mass programming anymore. However, when I talked about the GD32F330F4P6 (June 2018), I couldn't find any resellers offering the F8P6 variant. There were only resellers offering the F4P6, and I had to buy those via a proxy service. Another dead end, welp.

    Not until now. This is September of 2018, and I finally found a dealer selling it internationally, with shipping. Ladies and gentlemen, I'd like to introduce LCSC: they're selling it at $1, which is still less expensive than the STM32F030F4P6. They even offer cheap shipping to my country at $3. And I can also order PCBs from JLCPCB and have them shipped together with my uCs from LCSC. Win-win.

    So a major issue has been solved, and now let's just cross our fingers and hope that I'll be able to finish the hashing board sometime in October/November. Um no, I'm just joking. Since we'll be switching to the GD32F330F8P6 now, and no one produces a development board for that chip, the first job is to spin a dev PCB for it, then add support for it in the firmware, and only after that can I work on the hashing board again.

  • Talking about the design

    Ho Tuan Kiet, 07/02/2018 at 18:25, 0 comments

    While waiting for the replacement chips to arrive, I guess I should write about the design and structure of this project. The main reason is that I've been thinking about it for a while, and writing it down makes sure that I know what to do next.

    What we're doing right now is building a cluster of uCs for the purpose of mining cryptocurrencies. To achieve that aim, the cluster shall consist of two main parts:

    • The controller board
    • Multiple hashing boards, connected to the controller board

    The controller board acts as a job scheduler. Basically, it receives work from the pool (via an Ethernet port), splits it up into multiple jobs, and distributes them to the hashing boards (explained below). When a nonce is found, the controller shall collect the result and send it back to the pool. It shall also monitor the health status of the hashing boards and display it on an LCD screen. In case the hashing algorithm needs to change, the controller shall be able to reflash all the hashing boards.

    The controller board will receive +12V from an external power supply. This power source is then used to power the controller board and all hashing boards attached to it. On the controller board, there shall be up to 16 4-pin connectors for attaching the hashing boards, each carrying +12V, GND, SDA, and SCL (for I2C communication). Do note that the hashing boards will consume a lot of power (say 1A).

    Each hashing board consists of a manager MCU (any 48-pin STM32) and up to 64 worker MCUs (STM32F030F4P6/GD32F330F4P6). The manager receives jobs from the controller board, divides them even further, and pushes them to the workers. The manager also monitors the status of the workers and provides the controller with that information. Furthermore, the manager shall be able to reflash all worker MCUs (using a firmware supplied by the controller). This allows the cluster to change the hashing algorithm depending on the coins being mined. The reflash shall be done using the UART bootloader on the worker uC. Communication is done using I2C.

    A complete cluster shall consist of one controller board and up to 16 hashing boards, stacked on top of each other.
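
The two-level job split described above (controller to hashing boards, then manager to workers) could look something like the following Python sketch. It assumes the work is divided by carving the 32-bit nonce space into equal contiguous ranges, which is one plausible scheme rather than a settled part of the design. The function names are made up for illustration.

```python
NONCE_SPACE = 2**32  # size of the 32-bit nonce field


def split_range(start, end, parts):
    """Split [start, end) into `parts` contiguous, near-equal half-open ranges."""
    size = end - start
    bounds = [start + size * i // parts for i in range(parts + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(parts)]


def plan_cluster(boards=16, workers_per_board=64):
    """Map (board index, worker index) to the nonce range that worker scans.

    The controller splits the whole space per board, then each manager
    splits its share per worker.
    """
    plan = {}
    for b, (b_start, b_end) in enumerate(split_range(0, NONCE_SPACE, boards)):
        for w, nonce_range in enumerate(split_range(b_start, b_end, workers_per_board)):
            plan[(b, w)] = nonce_range
    return plan


if __name__ == "__main__":
    plan = plan_cluster()
    first_start, first_end = plan[(0, 0)]
    print(f"{len(plan)} workers, {first_end - first_start} nonces each")
```

With a full cluster of 16 boards and 64 workers per board, each worker only has to cover 2^32 / 1024 = 4194304 nonces per job.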

  • Status on the worker's firmware, and an alternative to the STM32F030F4P6

    Ho Tuan Kiet, 06/17/2018 at 14:36, 0 comments

    After nearly three weeks of work, I can finally say that the worker's firmware is 95% complete. Although all functionality has been implemented, I still say 95% because the other 5% is for the bugs/features that haven't been found yet. Too bad I accidentally pushed -5VDC through my STM32F030F4P6 board; it's dead now, and until I get a replacement board I won't be able to sort out the remaining 5%. That also means no screenshots/video demos are available, so bear with me.

    The source code is available on GitLab, so you can compile and test it out. Compatibility is only guaranteed on the STM32F030F4P6. If you're too lazy/can't afford to install all the prerequisites, you can download the automated builds and flash them onto your 32F030. Download the latest production build that passes, by the way, unless you want to study the internals/see the scrolling log, in which case the latest debug build is for you. After flashing, see the user guide in the README for how to use the firmware. Basically, it exposes the 32F030 as an I2C device, so you write the block header to the device at a certain address (given in the README), and then it'll happily crunch through the nonce space.
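
For reference, here is a host-side Python sketch of what a worker is crunching: double SHA-256 over the 80-byte Bitcoin block header, trying one nonce after another. This is a reference model only, not the firmware code, and the function names are made up for illustration. The actual I2C register layout lives in the README and is not reproduced here.

```python
import hashlib
import struct


def build_header(version, prev_hash, merkle_root, timestamp, bits, nonce):
    """Pack the 80-byte block header; both hashes are in internal (little-endian) byte order."""
    return struct.pack("<I32s32sIII", version, prev_hash, merkle_root,
                       timestamp, bits, nonce)


def sha256d(data):
    """Double SHA-256, the hash Bitcoin applies to block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def scan_nonces(header76, target, start, end):
    """Try nonces in [start, end) on the first 76 header bytes.

    Returns the first nonce whose hash, read as a little-endian integer,
    is at or below `target`, or None if the range holds no solution.
    """
    for nonce in range(start, end):
        digest = sha256d(header76 + struct.pack("<I", nonce))
        if int.from_bytes(digest, "little") <= target:
            return nonce
    return None
```

A pure-Python loop like this is only good for checking results against the hardware, since it makes no attempt at the kind of optimization the firmware needs.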

    Some statistics:

    • Firmware size: 16384/16384 bytes (that's quite an achievement)
    • Hashing rate: ~6104 hash/s
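
To put that number in context, here is a quick Python sketch that scales the per-worker hashrate up to a board and a full cluster, using the maximum sizes from the design log (64 workers per board, 16 boards). It is a best-case estimate that assumes perfect scaling and ignores I2C polling and job distribution overhead.

```python
import math

WORKER_HASHRATE = 6104  # hash/s per worker, from the statistics above


def board_hashrate(workers=64):
    """Best-case hashrate of one hashing board."""
    return workers * WORKER_HASHRATE


def cluster_hashrate(boards=16, workers=64):
    """Best-case hashrate of a fully populated cluster."""
    return boards * board_hashrate(workers)


def workers_needed(target_hashrate):
    """Smallest number of workers that reaches a target hashrate."""
    return math.ceil(target_hashrate / WORKER_HASHRATE)


if __name__ == "__main__":
    print(f"board  : {board_hashrate() / 1e3:.1f} kH/s")
    print(f"cluster: {cluster_hashrate() / 1e6:.2f} MH/s")
    print(f"workers for 10 kH/s: {workers_needed(10_000)}")
```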

    Next, I've been looking for an alternative to the 32F030. The 32F030 is a Cortex-M0 processor, so its instruction set is limited, and moving to a Cortex-M3/M4 would mean better performance and smaller code. However, I'd been avoiding the M3/M4 because no manufacturer makes M3/M4 chips in a TSSOP20 package, only TQFP32 and above, which makes soldering a pain. That is, until I found the GD32F330F4P6.

    Some background first: GD32 is a line of ARM Cortex-M processors made by GigaDevice. I first heard of it when they released the GD32F103C8T6, which according to many sites is a binary-compatible version of the STM32F103C8T6 with some improvements such as a faster clock and zero-wait-state flash. In other words, the GD32F103 has different silicon than the STM32 one, but the pins are the same and the registers are the same, so programs running on the STM32 also run fine on the GD32 (except in some situations, like SPI).

    The GD32F330F4P6 is a Cortex-M4 processor with the same flash/RAM size as the STM32F030F4P6, but with a faster clock, and what's more, it's the only TSSOP20 Cortex-M4 processor available on the market AFAIK, so it's the perfect alternative to the 32F030. Finding places to buy this chip is a hassle though: Google isn't helpful at all, and neither is Aliexpress/Alibaba. Fortunately I found it on Taobao at $0.62, so I'm looking to buy 5 to 10 of them for initial testing (and hey, buying them is also a pain as I have to rely on agents). I do have concerns that these chips are garbage/low quality/fragile compared to other Cortex-M chips on the market, but hey, they're cheap as heck so I might as well give them a try. If things go smoothly I'm hoping for a 2x-3x boost in hash rate.

  • GitHub is up

    Ho Tuan Kiet, 06/01/2018 at 15:42, 0 comments

    As of now the source for the worker's firmware will be residing on GitHub (update: Microsoft is acquiring GitHub, so I'm moving to GitLab). Friendly notice: the code is very much incomplete and terrible, and it doesn't even work correctly for now.

    While I've started on making the PCB that hosts the worker MCUs, I feel like I should be concentrating on the software first, then the hardware. The hardware files will be available in the future, just not now.

  • Taking a look at the ESP8266

    Ho Tuan Kiet, 05/28/2018 at 03:42, 0 comments

    In the last log, I mentioned that the ESP8266 might be a good competitor to the F030 due to:

    • Its dirt cheap price ($1.5 and you can get an ESP-12)
    • The Xtensa processor embedded in the ESP8266 is 32-bit and runs at 80/160MHz

    tl;dr: The ESP-12 does 8620 SHA256d operations per second (or 116usec per operation). This is 3.36x faster than the F030

    At first, I didn't have high hopes for the ESP. I knew that the ESP and the F030 are pretty comparable except for the ESP's much higher clock speed, so the ESP should be blowing the F030 out of the water. Except MerlotMachine had already implemented a Bitcoin miner on the ESP, and he only managed to get it to ~1200 ops/sec. That was quite counter-intuitive, so I decided to try it out myself to see if it was really that bad, and if I could do anything to improve it.

    But first I need a USB-to-UART dongle. The CP2102 is pretty nice by the way.

    Then I loaded the same code that MerlotMachine used onto my ESP8266. The result came out to about 800usec per op, or just about 1250 ops/sec. Increasing the clock to 160MHz only made it a little bit faster, at about 700usec/op or 1428 ops/sec. Basically I was hitting a brick wall here.

    Then I decided to look through the SHA256 implementation, and it seemed like a pretty bad one. So I replaced it with the one that I used on the F030. Boom, now it took only 116 usec/op (or 8620 ops/sec). I thought it would be fast, but not this fast.

    I'd attribute the drastic difference between the ESP and the F030 to these two factors:

    • Faster clock (duh): 64MHz to 160MHz = 2.5x increase in clock rate
    • More efficient LX106 instruction set: 24960 clock cycles down to 18560 clock cycles per hash = ~1.3x decrease in cycle count
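
The cycle counts in this breakdown fall straight out of the clock speeds and measured hashrates. A small Python sketch of the arithmetic, using the 2564 hash/s (F030 at 64MHz) and 8620 hash/s (ESP at 160MHz) figures from these logs:

```python
def cycles_per_hash(clock_hz, hashes_per_second):
    """Average number of clock cycles spent on one SHA256d operation."""
    return clock_hz / hashes_per_second


F030_CYCLES = cycles_per_hash(64e6, 2564)    # F030 overclocked to 64 MHz
ESP_CYCLES = cycles_per_hash(160e6, 8620)    # ESP8266 at 160 MHz

CLOCK_GAIN = 160e6 / 64e6                    # speedup from clock rate alone
CYCLE_GAIN = F030_CYCLES / ESP_CYCLES        # speedup from fewer cycles per hash
TOTAL_GAIN = CLOCK_GAIN * CYCLE_GAIN         # combined speedup

if __name__ == "__main__":
    print(f"F030: {F030_CYCLES:.0f} cycles/hash, ESP: {ESP_CYCLES:.0f} cycles/hash")
    print(f"clock x{CLOCK_GAIN:.2f} * cycles x{CYCLE_GAIN:.2f} = x{TOTAL_GAIN:.2f}")
```

The two factors multiply to about 3.36x, which matches the ratio of the two measured hashrates.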

    So from that, the ESP looks like a very competitive alternative to the F030. But then here are some more comparisons:

                                        F030             ESP8266                  Ratio
    Hashing rate (ops/sec)              2564             8620                     3.36x
    Power consumption                   [TBD]
    PCB efficiency (on a 5x5cm board)   8                6 (ESP-01), 4 (ESP-12)
    Heat                                Can't feel any   Hot???

    Might as well stick with the F030 then.

    Disclaimer: I'm biased towards choosing the F030.

  • STM32 and first steps

    Ho Tuan Kiet, 05/26/2018 at 19:02, 0 comments

    After some months of swimming in school work, I can finally work on this project again! Time for some updates:

    Firstly, about the MCU. After looking at the plethora of MCUs out there, the STM32F030F4P6 comes out as the winner. It fits all of my requirements:

    • Heckin cheap ($0.7 / unit)
    • Small enough so I can cram a lot of them on a ~5x5cm PCB. The package is TSSOP20 though, not DIP, but that's a chance to practice SMD soldering.
    • Faster than an ATtiny85, obviously.
    • I can overclock it to 64MHz without an external crystal (96MHz with crystal)

    With that in mind, I'm also changing the project name to "STM32Miner". Having a short, memorable name is better than a full sentence that's hard to remember.

    Secondly, I've decided on using I2C to communicate between the workers and the master (master MCU to be decided). While using I2C means that the master will have to continuously poll the workers for information, the hashrate is so low that I don't think this will be a problem. Also, I2C means that only two signal lines are required, unlike UART or SPI.

    Currently, I'm working on the worker's firmware. An initial benchmark shows that the STM32F030 can reach 2500 - 2600 SHA256d hashes / sec (running at 64MHz, don't do this at home kids). While that's quite fast, it would still take about 19 days for an F030 to scan through the nonce space (2^32). The SHA256 code I'm using is designed for generic data though, so some Bitcoin-specific optimization should lead to a higher hashrate (I hope).

    (useless information: to measure the hashrate, I programmed the firmware to toggle a GPIO every 100 hashing operations, then used a logic analyzer to measure the pulse width. At 39ms per 100 ops, it works out to ~2564 ops / sec)
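
Both figures above (the hashrate from the pulse width, and the time to scan the nonce space) are simple arithmetic. A minimal Python sketch, with function names invented for illustration:

```python
NONCE_SPACE = 2**32        # number of possible 32-bit nonces
SECONDS_PER_DAY = 86_400


def hashrate_from_pulse(ops_per_toggle, pulse_seconds):
    """Hashrate implied by a GPIO that toggles once every `ops_per_toggle` hashes."""
    return ops_per_toggle / pulse_seconds


def days_to_scan(hashes_per_second, nonces=NONCE_SPACE):
    """Days one MCU needs to try every nonce at a constant hashrate."""
    return nonces / hashes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = hashrate_from_pulse(100, 39e-3)   # 100 ops per 39 ms pulse
    print(f"{rate:.0f} ops/sec, {days_to_scan(rate):.1f} days per full nonce scan")
```

At the measured rate, a full scan takes about 19.4 days, which is where the "19 days" figure comes from.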

    By the way, I'm looking at the ESP8266/ESP32 as a replacement for the F030. The ESP8266 has an Xtensa 32-bit processor running at 160MHz (maximum), so in theory it should be somewhat faster than the F030. But an ESP-01 is also 2x more expensive than the F030, so unless I can get the ESP8266 to perform at least 3x faster than the F030, it's of no use.

  • Choosing mining MCUs

    Ho Tuan Kiet, 02/02/2018 at 05:34, 5 comments

    My requirements for the mining MCUs:

    • Cheap (preferably <= $1)
    • Available in a DIP package
    • Minimize the pin count (preferably 8)
    • Can run without an external crystal
    • SRAM capacity >= 512 bytes

    The first requirement is that the MCU must be available in a DIP package. This pretty much excludes all 32-bit MCUs (like the STM32). Hashing relies on 32-bit arithmetic and yes, 8/16-bit MCUs have terrible 32-bit arithmetic performance, but I don't have the equipment necessary to solder SMD parts. Also, DIP parts = easier to replace in case of failure.

    The second requirement is to minimize the pin count (so that we can cram more MCUs onto one mining board). The best I could do is 8-DIP, and now only the ATtiny series and some PIC MCUs remain.

    The third requirement is an internal oscillator, as I don't want to put a crystal next to every MCU. Luckily all MCUs nowadays have an internal oscillator.

    The fourth requirement is speed. While some PIC MCUs have a very high internal clock speed (some up to 32MHz), clock speed doesn't translate directly to MIPS. A part might run at 32MHz but each instruction takes 125ns, which limits it to 8 MIPS. Fortunately, the ATtiny series can run at up to 16MHz (with the internal oscillator), and can also execute up to 1 instruction per clock cycle.
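
The clock-versus-MIPS point can be checked with a few lines of Python. The 4 clocks per instruction for the PIC is implied by the 125ns figure at 32MHz, and the ATtiny value is its best case of 1 instruction per clock.

```python
def instruction_time_ns(clock_hz, clocks_per_instruction):
    """Time one instruction takes, in nanoseconds."""
    return clocks_per_instruction * 1e9 / clock_hz


def peak_mips(clock_hz, clocks_per_instruction):
    """Peak millions of instructions per second."""
    return clock_hz / clocks_per_instruction / 1e6


if __name__ == "__main__":
    # PIC at 32 MHz, 4 clocks per instruction
    print(f"PIC   : {instruction_time_ns(32e6, 4):.0f} ns/instr, {peak_mips(32e6, 4):.0f} MIPS")
    # ATtiny at 16 MHz, up to 1 instruction per clock
    print(f"ATtiny: {instruction_time_ns(16e6, 1):.1f} ns/instr, {peak_mips(16e6, 1):.0f} MIPS")
```

So despite running at half the clock, the ATtiny peaks at twice the instruction throughput of the 32MHz PIC.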

    The fifth is SRAM capacity. I haven't figured out the minimum SRAM capacity needed, but we're aiming for about 512 bytes, so now only the ATtiny85 remains.

Georg Wolf wrote 04/20/2021 at 16:12

I am trying the same project as you, but with teensy 4.0... Your last post is from 2018, hope you are well... but with the new teensy 4.0 (600 MHz + 2048 KB flash) i think the scales just tipped!

Kepler Miranda wrote 04/24/2018 at 04:40

Hello, why not use the Blue Pill (STM32F103C8T6)?

Ho Tuan Kiet wrote 05/26/2018 at 19:08

Sorry for the late reply. I've decided on the STM32F030F4P6 and you can read more about that in my latest log. While the F103 seems nice, I'm not comfortable with soldering 48LQFP, and it's a lot more expensive than the F030.
