Building my first 360 camera was extremely instructive of the difficulties and pain points of capturing immersive VR content. After crafting the image capture modules for the FPGA and creating stitching software for my desktop PC, I realized that capturing video and performing stitching on-board in real time would require an exponential increase in system complexity. To meet this challenge, I started by choosing my goal specifications and redesigning the camera from the ground up.
- Output format: 3840 x 3840 @ 30fps, 3840 x 1920 per eye
- 8 cameras
- No audio recording on board, to keep things simple - all videos will be accompanied by a fitting orchestral score to be performed at the time of viewing
- Ideally keep cost under $1k
The image sensor I previously used, the OV5642, was great for its ease of use and resolution but the output interface limited it to 5fps of full resolution output. In my search for an inexpensive replacement, I found this $15 module with the Aptina AR0330, capable of outputting 3MP at 30fps through a simple parallel interface, just like that of the OV5642.
The logic on the FPGA needs to be significantly more complicated than the modules in the previous version, which simply stored the incoming raw pixel stream into the DDR3 RAM, where the ARM HPS read the image buffer and saved it to the MicroSD card. High-speed image processing tasks like this require a great deal of memory bandwidth and processing power. To power the full warping and stitching pipeline, I designed a setup using 3 DE10-Nano boards. 2 of the DE10 Nanos run identical logic, each one receiving images from 4 image sensors, debayering, and warping the images to the spherical projection required for VR viewing. The third DE10-Nano receives downsampled grayscale versions of the camera images from the first two FPGAs. It performs block matching on these grayscale images to get disparity maps, filters the disparity maps, and sends them back to the other FPGAs to be used for stitching.
From my research, it seemed that the Jetson TX2 was the best option and only reasonably price embedded device capable of encoding 4k video at 30fps. The fastest and most efficient way to get image data onto the Jetson is through its 12-lane MIPI CSI-2 camera interface. Unfortunately, the DE10-Nano doesn't have pins that can be configured to be used as CSI transmit lanes. Because of this, it'll be necessary to insert an intermediate parallel-to-CSI image conversion device. There is at least one chip (Toshiba TC358748XBG) designed for this purpose, but it's in BGA format - meaning it'll require an expensive and hard-to-assemble PCB - and doesn't appear to be regularly stocked. The simplest option seems to be using the Intel MAX 10M50 development board, which has 3.3V IO pins that can talk to the DE10s and a built in MIPI CSI-2 transmit PHY. I'll need at least 2 of these, since they have 4 TX lanes each.