ThunderScope

An Open Source Software Defined Oscilloscope

The goal of this project is to design and build an open source, PC-connected alternative to low-cost benchtop 1000-series oscilloscopes that is competitive on both performance and price. The project must achieve at least 100 MHz of bandwidth on four channels, at a price similar to other entry-level scopes.

 I started this project sometime in 2018 and have been working on it ever since. From the very beginning, I've planned to release this project as open source, but fell prey to perhaps the most classic excuse open source has to offer: "I'll release it when I'm done". And so, the project moved forward, past various milestones of done-ness. And my fear of showing not just my work, but the (sometimes flawed, and always janky) process behind it kept me making the same excuse. In doing so, I've missed out on the input of the open-source community that I've spent so long lurking in, spent nights banging my head against problems that could have been spotted earlier, and slowed down the project as a whole.

"The best time to open source your project was when you started it, the second best time is now"


The project is now in a near-complete state and is released as open source on GitHub under an MIT license. I will be making a series of project posts here detailing all the failures, fixes, and lessons learned, in chronological order. Looking back to when I was first learning about hardware by following open source projects, I could learn a bit from finished layouts and schematics, but I learned the most from blog posts and project logs that described the problems faced and how they were solved. I wish to do the same for those just starting out in this amazing field, and hopefully also release an excellent oscilloscope for them to use in their electronics journey! If you're interested, sign up at Crowd Supply to be notified when the campaign starts!

  • FPGA Module: Extreme Artix Optimization

    Aleksa • 04/25/2022 at 02:33 • 1 comment

    It's been a while since I posted one of these! I've got a few days before another board comes in, so I figured I'd post a log before I disappear into my lab once again. Hardware-wise, we left off after the main board was finished. This board required a third-party FPGA module, which had a beefy 100k logic element Artix-7 part as the star of the show, co-starring two x16 DDR3 memory chips.

    But wouldn't it look better if it was all purple? The next step was to build my own FPGA module, tailored specifically to this project.

  • Demo Video!

    Aleksa • 11/07/2021 at 00:01 • 0 comments
  • Software Part 2: Electron, Redux and React

    Aleksa • 10/30/2021 at 19:12 • 1 comment

    Despite the name of this project log, we aren't talking about chemistry! Instead, I welcome back my friend Andrew, to whom I now owe a couple pounds of chicken wings for recounting the war stories behind the software of this project!

    We’re coming off the tail end of a lot of hardware, and some software sprinkled in as of the earlier post. Well my friends, this is it, we’re walking down from the top of Mount Doom, hopefully to the sound of cheering crowds as we wrap up this tale. Let ye who care not for the struggles of the software part of “software defined oscilloscope” exit the room now. No, seriously, this is your easy out. I’m not watching. Go on. Still here? Okay.

    Let’s get right to the most unceremonious point, since I’m sure this alone is plenty to cause us to be branded Servants of Sauron. The desktop application, and the GUI, is an Electron app. I know, I know, but hear me out. For context, Electron is the framework and set of tools that runs some of the most commonly used apps on your computer. Things like Spotify and Slack run on Electron. It is very commonly used and often gets a bad rep because of various things like performance, security, and the apps just not feeling like good citizens of their respective platforms.

    All of these things can be true. Electron is effectively a Chrome window running a website, with some native OS integrations for Windows, macOS, and Linux. It also provides a way for a web app to have deeper integrations with some core OS functions we alluded to earlier, such as Unix sockets and Windows named pipes. Chrome is famously not light on memory, this much is true, but it has gotten significantly better over the last few years and continues to improve. Much the same can be said for security: between Chrome improvements that get upstreamed to Chromium and Electron-specific hardening, poor security in an Electron app is now often just developer oversight. The most pertinent point is the good citizenry of the app on its platform. Famously, people on Mac expect such an app to behave a certain way. Windows is much the same; though the visual design language is not as clearly policed, many of the behaviours are. Linux is actually the easiest, since clear definitions don't really exist. Funnily enough, this has led to the Linux community being among the largest adopters of Electron apps. After all, they get apps they may otherwise not get at all.

    As much as I would love to write a book containing my thoughts on Electron, I am afraid that's not what this blog calls for. So, in quick summary, why Electron for us, a high speed, performance sensitive application? I will note that no one on the team was a web developer prior to starting. It is very often the case that when web developers or designers switch over to application development, they will use Electron in order to leverage their existing skills. This is good, mind you, but it was not the case for us. We needed an easy way to create a cross platform application that could meet our requirements. In trying to find the best solution, I discovered two facts. Fact the first: many other high speed applications are beginning to leverage Electron. Fact the second: integration with native code on the Electron side is not nearly as prohibitive as I initially thought. So, 'twas on a fateful noon when I suggested to our usual writer, Aleksa, that we should give Electron a whirl. I got laughed at. Then came the comically necessary "Oh wait, no, you're serious." I got to work, making us a template to start from and proving the concept. That's how we ended up here.

  • Software Part 1: HDL, Drivers and Processing

    Aleksa • 10/21/2021 at 01:07 • 0 comments

    We've gone through a lot of hardware over these last 14 project logs! Before we leave the hardware hobbit hole to venture to software Mount Doom, let's take a look at the map of Middle-earth that is the block diagram of the whole system.

    The first block we will tackle is the FPGA. The general structure is quite similar to the last design: ADC data comes in, gets de-serialized by the SERDES, and is placed into a FIFO, while scope control commands sent from the user's PC are converted to SPI and I2C traffic. Since we don't have external USB ICs doing the work of connecting to the user's PC, this next part of the FPGA design is a little different.

    There is still a low speed and a high speed path, but instead of coming from two separate ICs, both are handled by the PCIe IP. The low speed path uses the AXI Lite interface, which goes to the AXI_LITE_IO block either to fill a FIFO that supplies the serial interface block or to control GPIOs which read from the other FPGA blocks or write values to the rest of the board. On the high speed path, the datamover takes sample data out of the ADC FIFO and writes it to the DDR3 memory through an AXI4 interface, and the PCIe IP uses another AXI4 interface to read the sample data from the DDR3 memory. The reads and writes to the DDR3 memory from the AXI4 interfaces are managed by the memory interface generator. The memory here serves as a circular buffer, with the datamover always writing to it and the PCIe IP always reading from it. Collision prevention is done in software on the PC, using GPIO data from the low speed path to determine if it is safe to initiate a read.
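
    As a rough illustration of that last point, here is a minimal sketch of the host-side collision check. The buffer size, read size, and function names are placeholders for this post, not the actual ThunderScope driver code:

```python
# Sketch of the software collision check: the datamover's write pointer is read
# back over the low speed path, and a DMA read is only started when a full
# block of not-yet-read data sits between the read and write pointers.
BUFFER_SIZE = 256 * 1024 * 1024   # assumed size of the DDR3 circular buffer (bytes)
READ_SIZE = 8 * 1024 * 1024       # assumed size of one PCIe DMA read (bytes)

def safe_to_read(read_ptr: int, write_ptr: int) -> bool:
    available = (write_ptr - read_ptr) % BUFFER_SIZE   # unread bytes in the ring
    return available >= READ_SIZE

def advance(read_ptr: int) -> int:
    return (read_ptr + READ_SIZE) % BUFFER_SIZE        # move on to the next block
```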

  • ThunderScope 1000E Rev.1

    Aleksa • 10/12/2021 at 00:27 • 4 comments

    The time had come to make a new prototype, one with all the hardware needed to accomplish the goals of this project! The front end was well proven at this point, and just needed a slight shrink to fit under an off the shelf RF shield. The ADC had always behaved well during my tests, but it needed a new (and untested) clock generator since the one I had prototyped with wasn't suited for it. Most disturbing of all, I needed to design with an Artix-7 FPGA and DDR3 RAM in BGA packages for the first time.

    Tackling that last point first, I saw way too much risk in putting these BGA parts down on one board that I hand-stencil and reflow-solder on a hot plate. On top of that, I only had three months until I had to submit this project to graduate from my electrical engineering program, and I had no experience working with DDR3 or even large BGA packages. I committed to learning these skills for the next revision, but had to find something to tide me over in a hurry.

    Enter, the TE0712-02 FPGA module. This bad boy had two DDR3 ICs, the second largest Artix-7 part, and only needed a 3.3V rail to operate. As my favorite circuits professor put it, "Simplicity itself".  

  • Designing and Testing a 1 GHz PLL

    Aleksa • 10/06/2021 at 23:04 • 0 comments

    Now that I knew that the throughput to the PC could match the ADC’s rated sample rate of 1 GS/s, I had to make a circuit that clocked the ADC at that rate as well. This circuit needed to output at 1 GHz with very low jitter, as any jitter on the ADC sample clock will turn into noise during the conversion process.

    The heart of the clock generation circuit is the phase locked loop (PLL). Without getting into too much detail, the PLL compares the phase of a low frequency reference (generally from a crystal oscillator) with a divided down copy of the high frequency signal generated by a voltage controlled oscillator (VCO), which it tunes until the two match. By changing the division settings, any frequency can be synthesized, with the accuracy and jitter characteristics of the reference conferred onto the output.
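
    As a quick worked example of that relationship, the divider values below are illustrative only; the real register settings came from the design tool mentioned below:

```python
# Integer-N synthesis: the VCO output divided by N is compared against the
# reference divided by R, so the loop settles at f_out = f_ref * N / R.
f_ref = 16e6   # 16 MHz reference oscillator
R = 16         # reference divider -> 1 MHz phase detector frequency
N = 1000       # feedback divider

f_pfd = f_ref / R
f_out = f_pfd * N
print(f"{f_pfd / 1e6:.1f} MHz PFD, {f_out / 1e9:.3f} GHz out")   # 1.0 MHz PFD, 1.000 GHz out
```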

    Looking at the other scopes that use the same ADC, I found that many also used the ADF4360-7 in their clock generation circuit. I did some research on the part and it seemed to be the cheapest solution that would give me the 1 GHz output I needed. This chip had an integrated VCO, so the only other parts I needed were the reference oscillator and some passives. Saving me loads of digging into the datasheet, Analog Devices had a tool for calculating all the values of the passives as well as the register values to program for a given output frequency.

    That sticky note yellow colour... The navy blue connections... That's not KiCad! It's true, at this point I was offered an Altium license through my school. With the size and scope of the next board already in mind, and a year of internships spent working with Altium, I decided to switch over. As for the design, I chose to use two 50Ω resistors (R5, R6) to bias the output as opposed to a more complicated matched network. The reference oscillator (Y1) was a 16 MHz crystal oscillator, which came temperature compensated for added frequency stability, and the LDO (U2) was a low noise part to keep noise on the power rails from affecting the performance of the circuit. Decoupling cap values were copied from the part's evaluation board and the rest of the passive values were taken from the design tool.

    Pictured here, a 1 GHz postage stamp! I didn't have any decent way to test it on its own, so I hooked the SPI bus up to the rest of the oscilloscope prototype and updated the software to set all the registers on the chip at boot.

    First I connected the RF output to a balun on a scrap ADC board to generate a single ended output that I could test on my spectrum analyzer. I then verified that it output at 1 GHz and used KE5FX's excellent GPIB toolkit to measure its phase noise performance against the simulation values from the tool as well as calculate total RMS jitter.

    Here it is against my RF signal generator (in pink). The 100 Hz range was off, but the other ranges matched the simulations pretty well. The RMS jitter from 1.00kHz to 1.00MHz (didn't have a screenshot of this range, so the numbers are different here) was 760 fs vs. a simulated value of 580 fs. All of this looked promising, so I moved on to functional testing.
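
    For anyone curious how an RMS jitter number falls out of a phase noise plot, this is the usual integration over the offset range. The phase noise points below are made-up placeholders rather than the measured values, though they happen to land near the 760 fs figure:

```python
import numpy as np

f_carrier = 1e9                                  # 1 GHz sample clock
offsets = np.array([1e3, 1e4, 1e5, 1e6])         # integration span: 1 kHz to 1 MHz
l_dbc_hz = np.array([-95.0, -100.0, -110.0, -125.0])   # SSB phase noise (placeholder values)

# Convert dBc/Hz to linear, integrate over the offsets, double for both
# sidebands, then turn the integrated phase (rad RMS) into time jitter.
s_phi = 2 * np.trapz(10 ** (l_dbc_hz / 10), offsets)
jitter_rms = np.sqrt(s_phi) / (2 * np.pi * f_carrier)
print(f"{jitter_rms * 1e15:.0f} fs RMS")   # ~760 fs with these example numbers
```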

    I hooked up the RF output into the ADC board through the two UFL connectors I included for differential inputs and updated the FPGA code to reflect the new clock rate. I then ran a quick capture to a CSV file, and the script hung! That was odd, so I started debugging. Eventually, I found that the ADC wasn't outputting a clock at all! I looked through the clocking section of the ADC datasheet and this line jumped out at me:

    "For differential sine wave clock input the amplitude must be at least ± 0.8 Vpp."

    A quick trip to the dBm conversion table later, I found that I needed at least 2 dBm of output power. I had about -5 dBm! The matched output network I mentioned earlier would net me an output of -2 dBm according to the datasheet, which is still not up to spec.
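
    For reference, the conversion itself is quick to check, assuming a sine wave driving a 50 Ω load:

```python
import math

def vpp_to_dbm(vpp: float, r_load: float = 50.0) -> float:
    v_rms = vpp / (2 * math.sqrt(2))            # peak-to-peak -> RMS for a sine
    p_watts = v_rms ** 2 / r_load               # power delivered into the load
    return 10 * math.log10(p_watts / 1e-3)      # express relative to 1 mW

print(f"{vpp_to_dbm(0.8):.1f} dBm")   # ~2.0 dBm, the minimum clock drive level
```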

    My conclusion is that the circuit would probably work,...

  • Mach 1 GB/s: Breaking the Throughput Barrier

    Aleksa • 09/25/2021 at 20:31 • 0 comments

    Now that the front end was in a satisfactory state, it was time to revisit the architecture of the digital interface. At this point it had been over a year since I designed that board. I chose a USB 3 Gen 1 interface capable of 400 MB/s (which proved to be 370 MB/s in practice) as a stopgap to develop on until a USB 3 Gen 2 chip was released that could match the 1 GB/s throughput of the raw ADC data. Unfortunately, the FX3G2 on Cypress's USB product roadmap failed to materialize, leaving me with few options.

    I considered using the Cyclone 10 GX (which is the cheapest FPGA with the needed 10 Gb/s transceivers) with USB 3 Gen 2 IP, but even this couldn't reach 1 GB/s, topping out at 905 MB/s according to the vendor's product sheet. I considered PCIe, which is super common on FPGAs, with free IP and loads of vendor support! However, that would seem to limit this to desktops, since most people don't have PCIe slots on their laptops.

    They did have the next best thing though! Thunderbolt 3 (and now USB 4 and Thunderbolt 4) supports up to four lanes of PCIe Gen 3 at a maximum throughput of 40 Gb/s. Perfect! Unfortunately, though the chips themselves are freely available on Mouser, the datasheets are not. I didn't worry about that yet, as I could prototype the system as if it was just a PCIe card by using an external GPU enclosure. This review and teardown really showcased how simple the extra Thunderbolt 3 circuitry was, so I didn't feel like it was a big stretch to incorporate it once the PCIe design was tried and true. I bought the enclosure and got to work finding a new FPGA to do all the PCIe magic.

    I used this list of FPGA development boards to find the most affordable way to start prototyping with PCIe. This turned out to be the Litefury, an Artix-7 development board which appears to be a rebadged SQRL Acorn CLE-215+ (an FPGA cryptomining board). Although this board had the four lanes of PCIe I needed, it came in an M.2 form factor so it needed an adaptor. It didn't have a built in programmer either, so I used this one, which was the cheapest one that worked directly with Vivado (Xilinx's IDE for their FPGAs).

    Shown above is the Vivado block diagram of the Litefury example design, which allows DMA access from the PC to the onboard DDR3 memory and vice versa. I would use this to verify the transfer speeds when connected directly to a desktop PC compared to those through Thunderbolt when it was installed in the enclosure. I installed the XDMA drivers (which I had to enable test mode in Windows for, since the driver is unsigned) and ran a basic transfer with the maximum transfer size of 8 MB.

    It took 7.072 milliseconds to receive 8 MB, which is just over 1.1 GB/s! Best of all, this number didn't budge when I tested it over Thunderbolt!
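
    The arithmetic checks out, and for anyone wanting to reproduce the measurement on Linux rather than with the Windows test utility I used, a sketch of timing a read against the XDMA driver's card-to-host character device (the device path is assumed from the driver's usual naming):

```python
import time

print(f"{8e6 / 7.072e-3 / 1e9:.2f} GB/s")   # 8 MB in 7.072 ms -> ~1.13 GB/s

def measure_throughput(dev: str = "/dev/xdma0_c2h_0", size: int = 8 * 2**20) -> float:
    # Time a single large DMA read from card memory to the host.
    with open(dev, "rb", buffering=0) as f:
        start = time.perf_counter()
        data = f.read(size)
        elapsed = time.perf_counter() - start
    return len(data) / elapsed / 1e9   # GB/s
```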

    This inspired me to finally give this project its name: ThunderScope!

    Follow this project to catch my next post on designing a 1 GHz PLL to take advantage of this blazing fast transfer rate, and then promptly learning my lesson about cribbing off the other oscilloscope manufacturers!

    Thanks for giving this post a read, and feel free to write a comment if anything was unclear or explained poorly, so I can edit and improve the post to make things clearer!

  • Testing The New Front End Architecture

    Aleksa • 09/16/2021 at 00:04 • 0 comments

    It was time to see if the third time really was the charm and test the newest revision of the front end! The first task was to test the front of the front end (FFE) - the coupling circuit, attenuators and input buffer.

    Look ma no probes! I started off by verifying the DC bias voltage at the output, which was just about the 2.5V I expected. The exact value of the bias voltage isn't important as it will be matched by the trimmer DAC once the channel is calibrated. I tested the AC coupling by adding a DC component to the signal, which caused no change to the DC voltage at the output. Next, I enabled DC coupling and confirmed that this DC component was now added to the bias voltage at the output. I then measured the DC gain, which was just under unity. After the coupling tests, I switched on the attenuator and was greeted with a flat output - no oscillations this time! I cranked my function generator to the highest voltage it could do, and lo and behold I could see the signal again, now attenuated by a factor of 100.

    I then connected the FFE to the PGA and used the front end tester board to test the frequency response of the whole front end. I did this to avoid loading down the FFE’s buffer circuit with the high input capacitance (13 pF) of an oscilloscope input.

    The frequency response certainly looked more promising than the previous attempts! The bandwidth was about 230 MHz, out of the 350 MHz promised by the simulations. This alone wouldn't be too much of an issue if I scaled back the bandwidth requirement to 200 MHz. The real issue here is the flatness of the response, which is over +/- 0.5 dB when it should ideally be +/- 0.1 dB. That means that on a scope with this front end, a 100 MHz clock would look 10% larger than a 32 MHz clock!
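
    To put those dB figures in amplitude terms (a quick conversion for reference, not numbers taken from the plot):

```python
def db_to_ratio(db: float) -> float:
    return 10 ** (db / 20)   # dB expressed as a voltage (amplitude) ratio

print(f"{(db_to_ratio(1.0) - 1) * 100:.0f}%")   # 1 dB peak-to-valley -> ~12% amplitude spread
print(f"{(db_to_ratio(0.2) - 1) * 100:.1f}%")   # the +/- 0.1 dB target -> ~2.3% spread
```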

    These peaks and valleys in the frequency response could have been caused by parasitics (unwanted inductance and capacitance) in the layouts of the two boards and in the connection between them. To reduce these parasitics and improve the bandwidth and flatness of the frequency response, I combined both FFE and PGA into one front end board, moving all the parts closer together to shrink the layout. 

    This new board improved the bandwidth to 260 MHz and the flatness to 0.25 dB. This was clearly a step in the right direction, but also showed that the likely culprits were the components on the board. I resolved to tweak the component values to improve the response later, but was satisfied enough to keep this design and continue on to a very exciting new development in this project - breaking the 1 GB/s barrier!

    Thanks for giving this post a read, and feel free to write a comment if anything was unclear or explained poorly, so I can edit and improve the post to make things clearer!

  • A New Front End Architecture

    Aleksa • 09/01/2021 at 23:59 • 0 comments

    At this point, there was one big issue with the front end. The attenuators could not be switched in without causing the whole circuit to oscillate! This issue was compounded by the maximum 0.7 V output of the PGA as well as the massive cost of the design (three relays and an unobtainium opamp don't come cheap). Since I already had to use digital gain to boost the output of the PGA, I decided to remove the opamp gain stage present in the current front of front end (FFE) board and replace it with a unity gain (x1) buffer. Using a unity gain buffer would allow me to remove one of the attenuators, as it would not need to scale the input voltage just to gain it up anyway. I would also need to use an active level shifting circuit instead of the resistive divider to avoid losing half the signal shifting it up to a DC level of 2.5V. Below is the spreadsheet I used to plan out the attenuation and gain needed for all the voltage division settings. 

    Let's take a look at the schematic, starting from the input coupling and attenuation block. I chose to remove the 50Ω termination relay to lower cost per channel since this wasn't a feature often used or provided on entry level scopes like this one. The move to one attenuator also saved another relay's worth of materials cost, and I replaced the mechanical relay used for the coupling cap with a solid state relay (U2) to further reduce cost. The input coupling cap and its relay were moved from behind the attenuator to in front of it. This maintains consistent input impedance behavior in AC-coupled mode regardless of the attenuator state, as before it would go from infinite resistance at DC to the 1 MΩ impedance of the attenuator when the attenuator was switched on.

    Taking inspiration from the example oscilloscope circuit on page 34 of the LMH6518 datasheet, I used a JFET (Q1) as an AC-coupled input buffer alongside an opamp (U1) to handle the DC portion of the signal while adding the 2.5V offset needed for the PGA input. A JFET was a great choice for a front end buffer since they have very high input impedance and contribute very little noise to the signal. I used a clever circuit from page 34 of Jim Williams' AN47 application note to automatically bias the JFET at IDSS. This point is defined as the current at which the voltage between the gate and source is zero, resulting in a gain of exactly one - great news for our buffer! The circuit works by having the opamp (U3) adjust the current through the JFET using the BJT (Q2) until the filtered DC voltage at the output is equal to the DC component of the input (generated by U1), which by the definition above results in IDSS!

    Hopefully this mashup of two interesting circuits makes for a working front end! Join me in the next project log where I go through the testing and results for this board and talk about the next steps I took to perfect this design.

    Thanks for giving this post a read, and feel free to write a comment if anything was unclear or explained poorly, so I can edit and improve the post to make things clearer!

  • How Are The First Few Bytes?: Full System Testing

    Aleksa • 08/22/2021 at 19:53 • 0 comments

    Now that the FPGA code was done, I could finally assemble and test the whole system. There were many untested blocks at this point, so each block was tested incrementally to pinpoint any issues. Once these incremental tests were done, the final test would be hooking up a signal to the front end and getting the sampled signal data back to the host PC.

    The first of the incremental tests I did on the system was to turn a relay on in the front end. This would confirm that the FT2232 chip, as well as the FT2 Read interface, FIFO, and I2C FPGA blocks, were working correctly. I figured out which bytes to send based on the IO expander IC's datasheet and made a quick Python script using pyserial to send the data (this interface on the FT2232 looks like a serial port to the PC). I executed the script and heard the clack of the relay on the front end board. It worked!
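
    The script was not much more than this; the COM port, I2C address, and payload bytes below are placeholders, with the real values coming from the IO expander's datasheet:

```python
import serial

PORT = "COM4"   # the FT2232's low speed interface enumerates as a serial port

with serial.Serial(PORT, baudrate=115200, timeout=1) as ft2232:
    # Example framing only: device address, register, value to drive the relay pin high
    ft2232.write(bytes([0x40, 0x01, 0x01]))
```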

    Next up, I would send an SPI command to the ADC to bring it out of power down mode. The ADC clock starts running when it goes into active mode, so I programmed the FPGA to blink the LEDs if it gets a clock from the ADC. This would confirm that the SPI FPGA block and ADC board worked. Some more datasheet searching and a new line of Python later, I was greeted with a well-deserved light show from the (too-bright) LEDs on the digital interface board.

    I tested the maximum transfer rate next. To do this, I lowered the clock generator's frequency from 400 MHz (corresponding to the FT601's theoretical maximum throughput) down until the FIFO full flag (which I tied to an LED for this test) was no longer set while running transfers using FTDI's Data Streamer application. This resulted in a consistent data throughput of 370 MB/s. This also verified that the FT6 Write block was initiating transfers correctly when the requests came in from the host PC.

    Up to this point, I hadn't checked the actual data coming in, only that the transfers were happening. I enlisted the help of a more software-savvy classmate (this scope would become our capstone project in a later term) to modify the data streamer code to dump a CSV file from the data received. I then set the ADC to output a ramp test pattern. Since this pattern was generated inside the ADC, it would test only the FPGA blocks and not the front end. I captured the data and got what I expected: a count up from 0 to 255 and back to 0, over and over again. I did a basic check through the file and found no missing counts, which meant the transfers were completing smoothly with no interruptions in the FIFO or in the USB interface.
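
    The check itself is trivial to script; the file name and CSV layout (one sample per row) are assumptions here:

```python
import csv

def count_ramp_errors(path: str = "capture.csv") -> int:
    errors, prev = 0, None
    with open(path, newline="") as f:
        for row in csv.reader(f):
            value = int(row[0])
            # A clean ramp is always the previous sample plus one, wrapping at 255.
            if prev is not None and value != (prev + 1) % 256:
                errors += 1
            prev = value
    return errors

print(count_ramp_errors())   # 0 means no samples were dropped anywhere in the capture
```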

    Finally, I hooked up my function generator to the front end, got together the set of commands needed to start sampling and sent them to the ADC. This would be the final test, a real signal in and sampled data out.

    WE HAVE A PULSE! IT LIVESSSS! I was very happy to see the whole system working, but it had a long way to go to meet the goal of this project. First of all, the front end still only supported a select few voltage ranges since the attenuators didn't work. Secondly, the ADC's sample rate was limited to 370 MS/s (of the 1 GS/s it was capable of) by the FT601's maximum sustained transfer rate of 370 MB/s. And of course, software needed to be made to stream, process and display the data in real time. In my next blog post, I'll recount how I fixed the front end issues and lowered the system's materials cost with a new architecture!

    Thanks for giving this post a read, and feel free to write a comment if anything was unclear or explained poorly, so I can edit and improve the post to make things clearer!

View all 18 project logs

  • 1
    Assembly Video

View all instructions

Discussions

Valerio wrote 02/20/2023 at 10:07 point

any news about the project?

Aleksa wrote 02/20/2023 at 19:48 point

Yup! Testing rev3 of the baseboard now and have new FPGA modules in production. I haven't been keeping up with project updates here, but if you want to follow the development in real time feel free to join our discord server: https://discord.gg/pds7k3WrpK

EdaMilesLin wrote 12/01/2022 at 01:58 point

Hey! I know the HMCAD1520 (2 GS/s ADC) is hard to buy through distribution, and I am in China. So I want to know your purchase channel! Hope for your reply!

EdaMilesLin wrote 12/01/2022 at 02:05 point

I am also DIYing a scope (1 GS/s x 2 and 2 GS/s x 1) that unites a logic analyzer, scope, and signal generator, using a Zynq UltraScale+ MPSoC. Now I just lack two HMCAD1520s!

mh-nexus wrote 11/01/2022 at 14:08 point

Just want to encourage you to keep going. This project is very interesting and will be quite useful, especially since it allows direct logging on a computer. I am looking forward to the CrowdSupply campaign.

If at some point in time you also plan to make a 10 bit version, that would be great!

Aleksa wrote 11/01/2022 at 14:28 point

Appreciate it! We're making good progress to launch by the end of the year. As for a 10 bit version, how about 12 :) I've got a lead on a tray of hmcad1520s which can sample at 8, 12 and even 14 bit and is pin compatible with our current ADC (hmcad1511)

mh-nexus wrote 11/02/2022 at 10:35 point

The high res version would be interesting. I see the chip is about $100 instead of $50, so it could be an interesting option for those willing to pay for higher res :)

perrymail2000 wrote 11/25/2021 at 00:13 point

Are there plans to make this compatible with sigrok?

Aleksa wrote 12/08/2021 at 16:58 point

Sorry for the late reply, I didn't get notified about this comment for some reason! We're focusing our efforts on glscopeclient right now, but it should be able to support sigrok with appropriate tweaks to how the triggered data is sent to the client software.

edmund.humenberger wrote 11/20/2021 at 09:28 point

You probably know https://hackaday.com/2019/05/30/glscopeclient-a-permissively-licensed-remote-oscilloscope-utility/

There is a recent demo if its capabilities.  https://www.youtube.com/watch?v=z0ckmC2RXi4

Aleksa wrote 11/20/2021 at 18:08 point

That's a great demo, I'm seriously considering integrating it into this project. Why reinvent the wheel adding all these features when another open source project has them all? Just got to figure out how to hook the two together. I'm not a software guy myself, so I'd love to chat with a contributor behind that project to figure things out!

edmund.humenberger wrote 11/21/2021 at 10:13 point

Awesome hardware without proper SW support is pretty useless. I was told that any hardware project these days consists of 80% software development effort. I really suggest that you find someone who is capable and >>willing<< to put in the effort to make GLSCOPEclient work with your hardware. But finding this person will be a challenge in itself. You might be able to provide or find funding for this person.
If you succeed and make a first version usable, you can tap into the community of developers for GLSCOPEclient and won't have to build your own community for your firmware (which is even harder).

Your opportunity with your headless scope is that all existing cheap scopes suck with their capability to transfer waveforms >>fast<< to the PC (this is where you shine).

(PS: the 8 bit resolution unfortunately is on the low side)

drandyhaas wrote 11/19/2021 at 14:51 point

Hi,

Great project! As the developer of the first CrowdSupply scope ( https://www.crowdsupply.com/andy-haas/haasoscope ) I share your goals!

I've read a bit through your work here, but I have two main questions. 

What chip do you use to get 1 GB/s to the PC? 

Can a PC CPU really keep up with processing that much data in real time? To calculate the triggers, for instance, might take 10-100 floating point operations per sample. That's 10-100 GFlops. Do you use multiple threads/cores? GPU?

Thanks, Andy.

Aleksa wrote 11/19/2021 at 16:34 point

Hi Andy,

Seeing your scope succeed on Crowd Supply made me realize that people really do want open source test equipment - great work!

We've used the hard PCIe IP in an Artix-7 FPGA to reach >1 GB/s with four lanes of PCIe Gen 2. These PCIe lanes go to a Thunderbolt device controller and out to the user's PC.

Currently we only have edge triggering set up, but it does work in real time. This is because it only takes one operation per sample (subtracting one from the other) plus another operation to check for trigger events, which is only done once per block of samples. This will be further optimized by using a proper SIMD implementation. Triggering is only one part of the pipeline, so we do use multiple threads. We're aiming to run smoothly on any modern quad core, so we can't use too many threads. Rendering the waves is GPU accelerated, but should run smoothly on integrated graphics.
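
Roughly, the idea looks like this (an illustrative numpy sketch, not our actual pipeline code):

```python
import numpy as np

def find_rising_edges(samples: np.ndarray, level: int) -> np.ndarray:
    below = samples[:-1] < level          # one comparison per sample
    above = samples[1:] >= level          # one comparison per sample
    return np.flatnonzero(below & above)  # indices where the signal crosses upward

block = np.array([10, 20, 130, 200, 90, 40, 128, 250], dtype=np.uint8)
print(find_rising_edges(block, level=128))   # -> [1 5]
```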

Please feel free to ask me any more questions you might have, and consider joining our discord! https://discord.gg/pds7k3WrpK

Cheers,

Aleksa

remydyer wrote 08/26/2021 at 16:18 point

Great project.
I have a question: What's the highest data rate you can actually sustain continuously with that tiny little 24kB buffer on the usb3 fifo? 

I ask, because I know that with USB 2.0 Hi-Speed, one really needs at least about 8MiB of sdram attached to the FPGA as a 'deep buffer' in order to maintain 30 MB/s without dropping packets. This isn't the fifo chips' fault - it's the USBIF's fault for not requiring USB root hub controllers to handle packet timing with state machines and DMA when they added hi-speed.

What happens all too often, is that the PC OS just doesn't poll the bus sometimes, and the hardware attached to the fifo needs someplace to store fresh data whilst the usb-fifo chip is full and waiting on the OS.  This all didn't matter with USB 1 speeds, but with USB 2.0, just missing a packet time for a few too many microseconds really breaks using bulk transfers to capture data steadily from an FPGA with ADC's attached.

But with USB 3, I hope, this should not be an issue - I fervently hope that the super speed bus can in fact DMA straight through to host ram without needing the OS to service an interrupt. I haven't tried it, which is why I ask.

I found that it was a very good idea to test transfer integrity by just running a free-counting 24 bit binary counter on the FPGA - having it increment for each sample, and have a copy of it streamed through the USB fifo all the way to a file on a (big) disk. 

This helped me verify that it could reliably sustain the data rate I was shooting for by leaving it running until it filled the disk array (about 11TB at the time). With an incrementing counter, you can quickly scan the beginning and end of the file, and very easily determine whether the counter is where it should be at.

I also found that leaving an oscilloscope to 'watch' the 'fifo full' fifo interface pin was good practice - you're looking for pulses longer than expected, which means the data isn't flowing as expected.

In any case - I'd suggest that streaming the raw (from ADC) data straight to disk, and then looking at it 'retroactively', is a very good way to do science. In my line of work, I do 1 MS/s capture of 6-8 channels at 12 bits, then just save it down to a file. This is run like the old paper 'strip chart recorders' - start and run all day - never bother with trying to 'trigger' and save just data you think might be interesting - you miss all the stuff that happens unexpectedly. And since I'm working with things that may break very quickly without warning, it has been very helpful to have such a 'black box recorder' record to go back through later to figure out exactly what went wrong. I regard the 'trigger and save' approach as basically too close to cherry picking. It's too easy to miss too much.

Anyway since the work is in an 'industrial' environment, I have a linux SBC in a box with the ADCs/FPGA's etc (with a gigabit ethernet adaptor) with which I stream the data out to where the big disks are, just using a couple invocations of netcat. I have mostly just been using the ztex.de usb-fpga boards this way, although only the usb 2.0 ones.

The program called 'snd' (https://ccrma.stanford.edu/software/snd/ or just 'apt install snd' in any debian) is very useful for quickly looking at arbitrary raw PCM files. It's intended for raw sound file editing, but uses memory mapped io and can accept data with arbitrary number of channels, format and sample rate. It can seem to 'lock up' if you open a very large file and zoom 'all the way' out - but this is because it internally scans the whole file and makes a low-res map of it. It may take a while, but when it's done you can then zoom right in anywhere - feels a bit like using google earth. 

For actual processing/data extraction, you can just use numpy from python. Just open the raw pcm file with memory mapped io, and let the OS kernel worry about chunking/loading/unloading it through memory. It just looks like an array to you ( there's a package called tqdm that easily adds nice progress bars with ETA's, great when you're chewing through multi-TiB data files). This usually results in performance quite close to disk read speed, depending on how much processing you do. Profile and use cython etc where it matters, if it does.

I have also got a setup which does the whole 'looking for a trigger, and saving so many seconds capture' setup, but that was at a much lower data rate with NI hardware and software. Using a multithreaded software architecture with separate threads and queues to pass data between them was key there, as was assembling the data into fairly large 'blocks' to handle at once. First thread handled 'catching' the data and chunking it up, second handled looking for trigger conditions and handling a ring buffer so that data before the trigger could also be saved, third thread to catch the 'collected' data to be saved out to a file.

If I was going to suggest anything to do with processing the datastream live, rather than just saving to a file and worrying about it later, it would be to look into using either gstreamer (which is for piping video processing, and also tends to involve fairly heavy data streaming rates) and/or gnuradio (or both).

Gstreamer would be especially useful, as it is already architected to try to handle high data rate streams like uncompressed video.  You could use your data to feed 'live' instrument readouts / plots (generated within custom gstreamer plugins), which you could then 'mix' into another live video stream. You could even then connect this directly to youtube, and livestream video with a event-detecting oscilloscope overlay mixed in running from live data. (I have kind of done this, but cheated by putting the whole oscilloscope I was using where the high res security camera I was livestreaming from could see it). 

Would be great for any experiments where things can go wrong quickly, and I suspect using gstreamer like that is possibly how spacex does it when they launch rockets. From what I recall, you could even use html5 in gstreamer to draw the overlay, and you can certainly do 'live' video mixing like cutting between cameras and greenscreen etc. 

Gnuradio is also an obvious one - the SDR guys are going to love your hardware, I am sure!

Hope this helps, good luck!

Aleksa wrote 08/27/2021 at 01:05 point

Great comment! I found that the FT601 could sustain a data rate of 370 MB/s. To verify that, I lowered the clock rate from an external clock generator until the FIFO full LED wasn't lit. I also used a counter much like you described and sifted through the data in CSV format to make sure it was all consecutive. I really like the idea of triggering off of the FIFO full pin (since it should never be full while streaming) and the method of analyzing the data coming out (certainly beats waiting ~10 min for Excel to do anything in such a large CSV). Piggybacking off of video processing is also an interesting prospect for handling such large streams of data. I appreciate the suggestions!

Aaron Jaufenthaler wrote 06/15/2021 at 08:11 point

Thank you for the logs. I enjoy reading them

Aleksa wrote 06/15/2021 at 15:01 point

Thanks, glad to hear you're enjoying them so far!
