
Will I ever be able to play the video stream?

A project log for Controlling a JJRC H37 Elfie quad from a PC

The JJRC Elfie Quadcopter comes with an Android/iOS app to control it from the phone. Can we control it from our own software?

adria.junyent-ferre • 03/26/2017 at 22:36 • 2 Comments

I decided to write a bit more about my attempts to decode the video stream from the quadcopter. So far I haven't achieved much, but I learnt a bit about h264 and captured some video for experimenting. Writing a little program that reads the stream from the quadcopter and plays it in real time isn't rocket science; it should be easy for anyone with the patience to learn about libavcodec, but it isn't working for me yet.

I used the following code to dump the video from the quadcopter:

import socket
import sys
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('172.16.10.1', 8888))

magicword='495464000000580000009bf89049c926884d4f922b3b33ba7eceacef63f77157ab2f53e3f768ecd9e18547b8c22e21d01bfb6b3de325a27b8fb3acef63f77157ab2f53e3f768ecd9e185eb20be383aab05a8c2a71f2c906d93f72a85e7356effe1b8f5af097f9147f87e'.decode('hex')

s.send(magicword)
data = s.recv(106) 
n=0
while n<10000: # replace with "while 1" if you don't want this to stop
    data = s.recv(1024)
    sys.stdout.write(data)
    n=n+1
s.close()

This operation takes about 107 seconds to complete. The generated file weighs 5.9 MB and contains about 1,364 frames (according to VLC). This means that the video stream from the quadcopter runs at about 54.35 KiB/s with an approximate frame rate of 12.76 fps. I recorded the video with the quadcopter facing a timer running on my tablet in order to measure the time. The file contains raw h264, which is made of a series of so-called NAL units. It can be played in VLC by telling VLC to use the h264 demuxer:

$ vlc video.bin --demux h264
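The throughput and frame-rate figures above are simple divisions by the recording time. A minimal Python 3 sketch of that arithmetic follows; the function name `stream_stats` is made up for illustration, and treating "5.9 MB" as 5.9e6 bytes is an assumption, so the rounded inputs give values close to, but not exactly, the ones quoted.

```python
def stream_stats(size_bytes, frames, seconds):
    """Return (KiB per second, frames per second) for a recorded dump."""
    kib_per_s = size_bytes / 1024 / seconds
    fps = frames / seconds
    return kib_per_s, fps

# Rounded figures from the text; 5.9e6 bytes is an assumed reading of "5.9 MB".
kib_per_s, fps = stream_stats(5.9e6, 1364, 107)
print(f"{kib_per_s:.2f} KiB/s, {fps:.2f} fps")
```

For exact numbers, pass the real file size (for example from `os.path.getsize`) and the measured duration instead of the rounded values.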

The next thing I wanted to do was to see how many NAL units the video contained and what different types of units I would find in it. I ran the following script to search the recorded video for NAL units and list the headers of those units:

f=open('recording.bin','rb')  # binary mode: the file is raw h264, not text
dump=f.read()
f.close()

p1=dump.find('000001'.decode('hex'))
while(p1!=-1):
  print(dump[p1:(p1+5)].encode('hex'))
  p1=dump.find('000001'.decode('hex'),p1+1)

The output of this program shows the following: there are 3039 NAL units in my video. That's about 28.43 NAL units per second, which is roughly twice the number of frames per second. The next question I wanted to ask was how many different types of NAL units were in the video (because I know close to nothing about h264 and therefore I wonder about this type of thing). I counted the different NAL unit headers in that output (saved as log.txt) using the following command:

$ cat log.txt |sort |uniq -c |sort -nr

This gives the following output:

1286 000001a000
1286 000001419a
 116 000001a100
 116 00000168ee
 116 000001674d
 116 0000016588
   1 0000011600
   1 0000011200
   1 0000010600
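
The search script and the sort/uniq pipeline above can be combined into one step. The following is a minimal Python 3 sketch (the scripts in this log are Python 2); the function name `count_nal_headers` is hypothetical. Like the original script, it searches for the 3-byte pattern 000001 and keeps the two bytes that follow it.

```python
from collections import Counter

START_CODE = b"\x00\x00\x01"

def count_nal_headers(stream, header_len=2):
    """Count the byte patterns that follow each 00 00 01 start code."""
    counts = Counter()
    pos = stream.find(START_CODE)
    while pos != -1:
        # Start code plus the first header_len bytes of the unit, as a hex string
        end = pos + len(START_CODE) + header_len
        counts[stream[pos:end].hex()] += 1
        pos = stream.find(START_CODE, pos + 1)
    return counts

def print_counts(path):
    """Print the counts for a dump file, most frequent header first."""
    with open(path, "rb") as f:
        for header, n in count_nal_headers(f.read()).most_common():
            print(f"{n:5d} {header}")
```

Calling `print_counts('recording.bin')` on the dump should give a listing in the same shape as the one above, assuming the file is the raw stream written by the first script.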

All relevant information about how h264 works can be found in this document: https://www.itu.int/rec/T-REC-H.264

In brief, the first byte after the 000001 start code contains the basic information about what type of NAL unit it is. The most significant bit of the byte is the forbidden bit (forbidden_zero_bit) and it should be 0; otherwise something is wrong with the encoder or the NAL unit is expected to be ignored. The next 2 bits are the nal_ref_idc and they have different meanings depending on the type of NAL unit. The least significant 5 bits are the nal_unit_type and they show what type of NAL unit we are looking at. Going back to our list above, we get the following:

1286 000001a000 --> a0 = 1010 0000 meaning forbidden=1, ref_idc=1 and unit_type=0
1286 000001419a --> 41 = 0100 0001 meaning forbidden=0, ref_idc=2 and unit_type=1
 116 000001a100 --> a1 = 1010 0001 meaning forbidden=1, ref_idc=1 and unit_type=1
 116 00000168ee --> 68 = 0110 1000 meaning forbidden=0, ref_idc=3 and unit_type=8
 116 000001674d --> 67 = 0110 0111 meaning forbidden=0, ref_idc=3 and unit_type=7
 116 0000016588 --> 65 = 0110 0101 meaning forbidden=0, ref_idc=3 and unit_type=5
   1 0000011600 --> 16 = 0001 0110 meaning forbidden=0, ref_idc=0 and unit_type=22
   1 0000011200 --> 12 = 0001 0010 meaning forbidden=0, ref_idc=0 and unit_type=18
   1 0000010600 --> 06 = 0000 0110 meaning forbidden=0, ref_idc=0 and unit_type=6
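
The bit-splitting in the list above can be sketched in a few lines of Python 3. The function name `parse_nal_header` is made up for illustration, and the small name table only covers the unit types I am sure about from Table 7-1 of the H.264 specification; everything else is printed without a name.

```python
# Names for a few nal_unit_type values (H.264 Table 7-1); deliberately incomplete.
NAL_TYPE_NAMES = {
    1: "coded slice of a non-IDR picture",
    5: "coded slice of an IDR picture",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
}

def parse_nal_header(byte):
    """Split the first NAL header byte into (forbidden, ref_idc, unit_type)."""
    forbidden = (byte >> 7) & 0x01   # most significant bit
    ref_idc = (byte >> 5) & 0x03     # next 2 bits
    unit_type = byte & 0x1F          # least significant 5 bits
    return forbidden, ref_idc, unit_type

for header in (0xA0, 0x41, 0xA1, 0x68, 0x67, 0x65, 0x16, 0x12, 0x06):
    forbidden, ref_idc, unit_type = parse_nal_header(header)
    name = NAL_TYPE_NAMES.get(unit_type, "")
    print(f"{header:02x} = {header:08b} forbidden={forbidden} "
          f"ref_idc={ref_idc} unit_type={unit_type} {name}")
```

The names only make sense for units whose forbidden bit is 0; the a0 and a1 entries have it set to 1, so their unit_type should not be read as a standard NAL type.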

I took a quick look at the position of these NAL units in the timeline of the video, and more or less it can be summarised as follows:

Having looked at all this, the next logical step seems to be to use libavcodec and pass the data from the NAL units to the functions in this library as they arrive. I found a few examples online that use SDL to handle the graphical windows, but I still need to figure out how to tell the software what codec to use. I'll keep you posted.
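
One possible way to name the codec explicitly is sketched below in Python 3. It is not something I have working, and it rests on assumptions: it uses PyAV (third-party Python bindings for FFmpeg's libraries, not libavcodec's C API directly), the function name `decode_h264` is made up, and there is no guarantee the decoder accepts this particular stream, in particular the a0/a1 units with the forbidden bit set. The `codec` argument exists only so the loop can be tried with a stand-in object when PyAV is not installed.

```python
def decode_h264(chunks, codec=None):
    """Yield decoded frames from an iterable of raw h264 byte chunks."""
    if codec is None:
        import av  # PyAV; imported lazily so the sketch loads without it
        # Ask for the h264 decoder by name ("r" selects decoding)
        codec = av.CodecContext.create("h264", "r")
    for chunk in chunks:
        # parse() splits the byte stream into packets, buffering partial ones
        for packet in codec.parse(chunk):
            # decode() returns zero or more complete frames per packet
            for frame in codec.decode(packet):
                yield frame
```

The chunks could be the 1024-byte blocks returned by `s.recv(1024)` in the dump script, or blocks read from the recorded file; displaying the frames (for example with SDL) is left out of the sketch.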

Discussions

Xgadget wrote 04/09/2017 at 16:29

With this live.py -> https://pastebin.com/9mSayKK9

I can play the livestream with mplayer and this command: "python live.py - | mplayer -fps 30 -demuxer h264es -nosound -noidx -mc 0 -"

or in vlc with: "python live.py - | vlc --demux=h264 -"

But the image is very bad with both players (70% of the video image is pixelated): http://imgur.com/a/YQ4XP

I hope that helps a bit.


Max Golubev wrote 04/23/2017 at 06:28

I have tried OP's scripts and they turn out to work quite fine if piped to vlc, e.g.: "python2 dump_video.py | vlc - --demux h264". By quite fine I mean that I can see a continuous video stream, but eventually the delay adds up to about 2 seconds.

I have also discovered that the glitch you mentioned appears only in low-light conditions and is absent in daylight, even if it's cloudy. Is it the same for you?
