I decided to write a bit more about my attempts to decode the video stream from the quadcopter. So far I haven't achieved much, but I learnt a bit about h264 and captured some video for experimenting. Writing a little program that reads the stream from the quadcopter and plays it in real time isn't rocket science. It should be easy for anyone with the patience to learn about libavcodec, but it isn't working for me yet.
I used the following code to dump the video from the quadcopter:
# Python 2 script: connects to the quadcopter and dumps the raw stream to stdout
import socket
import sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('172.16.10.1', 8888))
magicword = '495464000000580000009bf89049c926884d4f922b3b33ba7eceacef63f77157ab2f53e3f768ecd9e18547b8c22e21d01bfb6b3de325a27b8fb3acef63f77157ab2f53e3f768ecd9e185eb20be383aab05a8c2a71f2c906d93f72a85e7356effe1b8f5af097f9147f87e'.decode('hex')
s.send(magicword)
data = s.recv(106)  # read the first 106 bytes of the reply; they are not written out
n = 0
while n < 10000:  # replace with "while 1" if you want this to not stop
    data = s.recv(1024)
    sys.stdout.write(data)
    n = n + 1
s.close()
This operation takes about 107 seconds to complete. The generated file weighs 5.9 MB and contains about 1,364 frames (according to VLC). This means that the quadcopter's video stream runs at about 54.35 KiB/s with an approximate frame rate of 12.76 fps. To measure the time, I recorded the video with the quadcopter facing a timer running on my tablet. The video contains raw h264, which is made of a series of so-called NAL units. It can be played with VLC by telling it to use the h264 demuxer:
$ vlc video.bin --demux h264
The next thing I wanted to do was to see how many NAL units the video contained and what types of units I would find in it. I ran the following script to search the recorded video for NAL units and list the headers of those units:
# Python 2 script: prints every 00 00 01 start code plus the two bytes after it
f = open('recording.bin', 'rb')
dump = f.read()
f.close()
start_code = '000001'.decode('hex')
p1 = dump.find(start_code)
while p1 != -1:
    print(dump[p1:(p1+5)].encode('hex'))
    p1 = dump.find(start_code, p1+1)

The output of this program shows the following: there are 3039 NAL units in my video. That's about 28.43 NAL units per second, which is roughly twice the number of frames per second. The next question I wanted to ask was how many different types of NAL units were in the video (I know close to nothing about h264, so I wonder about this kind of thing). I counted the different NAL unit types using the following command:
$ cat log.txt |sort |uniq -c |sort -nr
This gives the following output:
1286 000001a000
1286 000001419a
 116 000001a100
 116 00000168ee
 116 000001674d
 116 0000016588
   1 0000011600
   1 0000011200
   1 0000010600
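The search and the count can also be done in one pass. Below is a minimal Python 3 sketch of that idea. The function names (`iter_nal_headers`, `count_nal_headers`) and the sample bytes are made up for illustration, and it looks at the two bytes after each 00 00 01 start code, like the script above.

```python
from collections import Counter

START_CODE = b"\x00\x00\x01"


def iter_nal_headers(data, width=2):
    """Yield, as hex strings, the `width` bytes following each start code."""
    pos = data.find(START_CODE)
    while pos != -1:
        header = data[pos + 3:pos + 3 + width]
        if len(header) == width:  # skip a start code cut off at the end
            yield header.hex()
        pos = data.find(START_CODE, pos + 1)


def count_nal_headers(data):
    """Count how many times each header value appears in `data`."""
    return Counter(iter_nal_headers(data))


# Hypothetical sample: five start codes followed by header bytes.
sample = bytes.fromhex(
    "000001a000" "000001419a" "000001a000" "000001419a" "000001674d"
)
for header, count in count_nal_headers(sample).most_common():
    print(count, "000001" + header)
```

To run it on a real capture, read the file in binary mode (`open(path, "rb").read()`) and pass the resulting bytes to `count_nal_headers`.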
All relevant information about how h264 works can be found in this document: https://www.itu.int/rec/T-REC-H.264
In brief, the first byte after 01 contains the basic information about what type of NAL unit it is. The most significant bit of the byte is the forbidden bit and it should be 0. Otherwise, something is wrong with the encoder or the NAL unit is expected to be ignored. The next 2 bits are the nal_ref_idc and they have different meanings depending on the type of NAL unit. The least significant 5 bits are the nal_unit_type and they show what type of NAL unit we are facing. Going back to our list above, we get the following:
1286 000001a000 --> a0 = 1010 0000 meaning forbidden=1, ref_idc=1 and unit_type=0
1286 000001419a --> 41 = 0100 0001 meaning forbidden=0, ref_idc=2 and unit_type=1
 116 000001a100 --> a1 = 1010 0001 meaning forbidden=1, ref_idc=1 and unit_type=1
 116 00000168ee --> 68 = 0110 1000 meaning forbidden=0, ref_idc=3 and unit_type=8
 116 000001674d --> 67 = 0110 0111 meaning forbidden=0, ref_idc=3 and unit_type=7
 116 0000016588 --> 65 = 0110 0101 meaning forbidden=0, ref_idc=3 and unit_type=5
   1 0000011600 --> 16 = 0001 0110 meaning forbidden=0, ref_idc=0 and unit_type=22
   1 0000011200 --> 12 = 0001 0010 meaning forbidden=0, ref_idc=0 and unit_type=18
   1 0000010600 --> 06 = 0000 0110 meaning forbidden=0, ref_idc=0 and unit_type=6

I took a quick look at the position of these NAL units in the timeline of the video, and more or less it can be summarised as follows:
- a0 is a (wrong) short NAL unit and it always appears before a 41. Their frequency of appearance seems to be constant and more or less matches the frame rate.
- a1, 68, 67 and 65 always appear together and they seem to appear regularly every 11-12 frames (every second?).
- 16, 12 and 06 appear rarely at what looks like random times in the video at a frequency that is less than once every minute.
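The bit layout described earlier can be sketched in a few lines of Python 3. The function names `parse_nal_header` and `describe`, and the `NAL_TYPE_NAMES` table, are illustrative and not part of any library. The type names follow Table 7-1 of the H.264 recommendation as I understand it and only cover the values listed here, so they are worth double-checking against the document linked above.

```python
# Names for a few nal_unit_type values (assumed from Table 7-1 of H.264).
NAL_TYPE_NAMES = {
    0: "unspecified",
    1: "coded slice of a non-IDR picture",
    5: "coded slice of an IDR picture",
    6: "supplemental enhancement information (SEI)",
    7: "sequence parameter set (SPS)",
    8: "picture parameter set (PPS)",
}


def parse_nal_header(byte):
    """Split the first NAL byte into (forbidden bit, nal_ref_idc, nal_unit_type)."""
    forbidden = (byte >> 7) & 0x01  # most significant bit
    ref_idc = (byte >> 5) & 0x03    # next 2 bits
    unit_type = byte & 0x1F         # least significant 5 bits
    return forbidden, ref_idc, unit_type


def describe(byte):
    """Return a one-line, human-readable summary of a NAL header byte."""
    forbidden, ref_idc, unit_type = parse_nal_header(byte)
    name = NAL_TYPE_NAMES.get(unit_type, "not named in this sketch")
    return "%02x: forbidden=%d ref_idc=%d unit_type=%d (%s)" % (
        byte, forbidden, ref_idc, unit_type, name)


# Header bytes observed in the capture.
for value in (0xA0, 0x41, 0xA1, 0x68, 0x67, 0x65, 0x16, 0x12, 0x06):
    print(describe(value))
```

If those names are right, 67, 68 and 65 would be a sequence parameter set, a picture parameter set and an IDR slice, which would fit a keyframe arriving with its parameter sets every 11-12 frames.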
Having looked at all this, the next logical step seems to be to use libavcodec and feed its functions the data from the NAL units as they arrive. I found a few examples online that use SDL to handle the graphical windows, but I still need to figure out how to tell the software what codec to use. I'll keep you posted.
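For what it's worth, one possible route from Python is the third-party PyAV package, a binding to the FFmpeg libraries that include libavcodec. The sketch below assumes PyAV is installed (`pip install av`) and has not been tried against this stream, so whether the decoder copes with the a0 and a1 units is an open question. The helper names `read_chunks` and `decode_h264_chunks` are made up for illustration.

```python
def read_chunks(path, size=4096):
    """Yield a file's contents as successive byte chunks of at most `size`."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(size)
            if not chunk:
                break
            yield chunk


def decode_h264_chunks(chunks):
    """Yield decoded video frames from an iterable of raw h264 byte chunks.

    Assumes the PyAV package; it is imported here so that the rest of the
    file still loads when PyAV is missing.
    """
    import av  # third-party: PyAV

    decoder = av.CodecContext.create("h264", "r")  # select the decoder by name
    for chunk in chunks:
        # The parser splits the byte stream into packets for the decoder.
        for packet in decoder.parse(chunk):
            for frame in decoder.decode(packet):
                yield frame


# Example use on a recorded dump (hypothetical file name):
# for frame in decode_h264_chunks(read_chunks("recording.bin")):
#     print(frame.width, frame.height)
```

The same generator could be fed with the data returned by `s.recv(1024)` instead of file chunks, which is closer to the real-time player this is heading towards.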