This is the second version of my FPGA VR Camera, now with video! This time, I'm building a VR video camera that shoots 4k (3840 x 1920 per eye) stereoscopic 360 video at 30fps, while stitching and encoding it on-camera in real time. All image processing functions will be performed on FPGAs except the final H.264 encoding, which will be carried out on an Nvidia Jetson TX2.
Shown below is the high-level data flow and hardware connection diagram for the camera.
Here are the components of the project that I have successfully implemented so far:
Camera I2C control
Camera image warping modules
Camera interface PCB
In progress/partially working:
Block matching optical flow
Grayscale conversion and downsampling
Optical flow filtering
Not yet started:
Jetson encoding software
Image stitching modules (uses optical flow map to shift pixels and create interpolated views between cameras)
Data interfaces between FPGAs and FPGA -> Jetson (either use MAX 10M50 dev boards or some other parallel-to-CSI device)
The camera interface PCB connects the DE10-Nano GPIO to 4 AR0330 modules and provides the power rails for the image sensors. My main design objectives were to keep the connections between the cameras and DE10 as short as possible to maintain signal integrity, and to mount the PCB perpendicular to the DE10 PCB so it would stay out of the cameras' field of view as much as possible.
To create seamless stereoscopic images, the 8 camera modules have to be laid out in an octagon with 64mm between every other camera's center of projection. I used OnShape to design a camera mount and sketch a PCB to attach to the mount and connect to the cameras. You can access the OnShape document here.
From OnShape, I exported the PCB shape, mounting hole locations, and connector positions as a DXF for use in Eagle. Ideally, I could route all of the camera connections through the 40-pin right-angle header and have only one connector to the FPGA board. Unfortunately, there are 36 available pins (4 for power and ground) and 4 cameras, leaving only 9 pins per camera. The parallel bus on each camera has an HSYNC, VSYNC, PCLK, and 10 pixel data lines. PCLK and HSYNC are the most critical to getting valid pixel data, so I opted to put these on the main header along with 7 bits of pixel data. 7 bits is far from optimal but the colors are going to be converted to 16-bit RGB with 5 bits per color so I don't think it makes much of a difference. The rest of the connections, VSYNC, XCLK (camera clock input), SDA, SCL, and RESET, are all on another header that connects via jumper wires to the DE10-Nano Arduino IOs.
I ended up assembling the PCB using solder paste and a hot plate for the SMD parts. The camera connectors were surprisingly easy, requiring a bit of rework but a lot less than I experienced the last time I soldered this type of connector.
Check out the Files section of this project for the PCB EAGLE project and BOM.
The figure below shows the system diagram I came up with after running the numbers on the system requirements. The DE10-Nano is still the centerpiece of the camera due to its high number of IOs, strong FPGA, and low price. I debated using the Ultra96 board, which has high speed connectivity, a faster memory bus, and more powerful Xilinx FPGA, but it wouldn't be able to support input from 4 cameras simultaneously and costs $200 compared to $130 for the DE10-Nano.
So far, I have the front end of the camera mostly working - the PCBs are assembled, and two DE10-Nanos are successfully talking to the cameras and performing debayering and warping.
Building my first 360 camera was extremely instructive of the difficulties and pain points of capturing immersive VR content. After crafting the image capture modules for the FPGA and creating stitching software for my desktop PC, I realized that capturing video and performing stitching on-board in real time would require an exponential increase in system complexity. To meet this challenge, I started by choosing my goal specifications and redesigning the camera from the ground up.
Output format: 3840 x 3840 @ 30fps, 3840 x 1920 per eye
No audio recording on board, to keep things simple - all videos will be accompanied by a fitting orchestral score to be performed at the time of viewing
Ideally keep cost under $1k
The image sensor I previously used, the OV5642, was great for its ease of use and resolution but the output interface limited it to 5fps of full resolution output. In my search for an inexpensive replacement, I found this $15 module with the Aptina AR0330, capable of outputting 3MP at 30fps through a simple parallel interface, just like that of the OV5642.
The logic on the FPGA needs to be significantly more complicated than the modules in the previous version, which simply stored the incoming raw pixel stream into the DDR3 RAM, where the ARM HPS read the image buffer and saved it to the MicroSD card. High-speed image processing tasks like this require a great deal of memory bandwidth and processing power. To power the full warping and stitching pipeline, I designed a setup using 3 DE10-Nano boards. 2 of the DE10 Nanos run identical logic, each one receiving images from 4 image sensors, debayering, and warping the images to the spherical projection required for VR viewing. The third DE10-Nano receives downsampled grayscale versions of the camera images from the first two FPGAs. It performs block matching on these grayscale images to get disparity maps, filters the disparity maps, and sends them back to the other FPGAs to be used for stitching.
From my research, it seemed that the Jetson TX2 was the best option and only reasonably price embedded device capable of encoding 4k video at 30fps. The fastest and most efficient way to get image data onto the Jetson is through its 12-lane MIPI CSI-2 camera interface. Unfortunately, the DE10-Nano doesn't have pins that can be configured to be used as CSI transmit lanes. Because of this, it'll be necessary to insert an intermediate parallel-to-CSI image conversion device. There is at least one chip (Toshiba TC358748XBG) designed for this purpose, but it's in BGA format - meaning it'll require an expensive and hard-to-assemble PCB - and doesn't appear to be regularly stocked. The simplest option seems to be using the Intel MAX 10M50 development board, which has 3.3V IO pins that can talk to the DE10s and a built in MIPI CSI-2 transmit PHY. I'll need at least 2 of these, since they have 4 TX lanes each.