Subject tracking is rapidly becoming the next big thing, 1st on quad copters, then on digital assistants. It's long been a dream to have an autonomous camera operator that tracks a subject. The Facebook Portal was the 1st sign lions saw that the problem was finally cracked. The problem is all existing tracking cameras are operated by services which collect the video & either sell it or report it to government agencies.
Compiling & fixing a machine vision library to run as fast as possible on a certain computer is such a monumental task, it's important to reuse it as much as possible. To simplify the task of a tracking camera, the same code is used to count reps & track a subject. The countreps program was a lot more complicated & consumed most of the time.
Previous work way back in Aug 2016 involved LIDAR.
There were other failed attempts with chroma keying, luma keying, difference keying. Then it lay dormant for 2 years.
The last year saw an explosion in CNNs for subject tracking. The key software is openpose. It theoretically allows a camera to track a whole body or focus on a head, but it doesn't allow differentiating bodies. Differentiating bodies would still require chroma keying or face matching.
If a subject takes off its clothing, as subjects do in the kind of thing that this camera would be used for, chroma keying would get thrown off. Face tracking doesn't work so well when the subject looks away.
Rotating the Escam 45 degrees & defishing buys a lot of horizontal range, but causes it to drop lions far away. The image has to be slightly defished to detect any of the additional edge room, which shrinks the center.
The unrotated, unprocessed version does better in the distance & worse in the edges, but covers more of the corners. Another idea is defishing every other rotated frame, giving the best of both algorithms 50% of the time. The problem is the pose coordinates would alternate between the 2 algorithms, making the camera oscillate. It would have to average when the 2 algorithms were producing a match. When it went from 1 algorithm to 2 algorithms, there would be a glitch.
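If the alternating scheme were ever attempted, one way to keep the camera from oscillating would be to average the 2 algorithms only when they agree & low pass filter the target. A minimal sketch of that idea, assuming invented values for the filter constant & the matching threshold:

```python
# Sketch of blending pose coordinates from 2 detection passes
# (rotated+defished vs. unprocessed).  ALPHA & MATCH_DIST are
# invented values, not from the original program.

ALPHA = 0.3        # low pass filter constant, 0..1
MATCH_DIST = 50.0  # px; detections closer than this count as a match

def blend(prev, a, b):
    """Return a new smoothed (x, y) target.

    prev -- last smoothed coordinate, or None
    a, b -- (x, y) from each algorithm, or None if it lost the subject
    """
    if a is not None and b is not None:
        dx, dy = a[0] - b[0], a[1] - b[1]
        if (dx * dx + dy * dy) ** 0.5 < MATCH_DIST:
            raw = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        else:
            raw = a          # disagreement: fall back to 1 algorithm
    elif a is not None:
        raw = a
    elif b is not None:
        raw = b
    else:
        return prev          # both lost: hold the last position
    if prev is None:
        return raw
    # low pass filtering hides the glitch when going from 1 to 2 algorithms
    return (prev[0] + ALPHA * (raw[0] - prev[0]),
            prev[1] + ALPHA * (raw[1] - prev[1]))
```

The filter trades responsiveness for smoothness, which is acceptable for a camera operator but would be too sluggish for the rep counter.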
In practical use, the camera is going to be in the corner of a cheap motel room, so the maximum horizontal angle is only 90 degrees, while longer distance is desirable. The safest solution is neither rotating nor defishing.
Another discovery about the Escam is if it's powered up below 5V, it ends up stuck in night vision B&W mode. It has to be powered up at 5V to go into color mode. From then on, it can work properly below 5V. So the USB hub has to be powered off a battery before plugging it into the laptop. Not sure if openpose works any better in color, but it's about getting your money's worth.
The next problem is the Escam can only approximate what the DSLR sees, so there's parallax error. There's too much error to precisely put the head at the top of the frame.
After much work to get the power supply, servo control, escam mount, & DSLR mount barely working, there was just enough data to see the next problems. Getting smooth motion is a huge problem. Body parts coming in & out of view is a huge problem. Parallax error made it aim high.
With 4x3 cropping, it was 4fps, but detected a lot of positions it couldn't with the Samsung in 4x3. Decided this was as cropped as the lion kingdom would go.
The Escam is a lot narrower than 180 degrees, maybe only slightly wider than the gopro. Detection naturally fails near the edges. Because so much of the lens is cropped, the next step would be mounting it diagonally & stretching it in software.
The answer is no. It doesn't provide a USB webcam interface as sometimes claimed by the internet. It has no USB device port & doesn't even allow access to the SD card over USB. We can assume the venerable USB webcam is dead & everything is now using TCP/IP.
It only provides a delayed H.264 stream over wifi. It claims to support a standardized protocol for security cams called ONVIF. More confusingly, IP cams have shifted to being marketed only as security cams, rather than a replacement for the venerable USB webcam.
It requires an access point to stream video. It initializes as an access point only to allow the user to configure it to use an external access point. Then, it converts to a station for the rest of its life.
Once the ugly case was removed, a much smaller Hi3518 board was revealed, with RTL8188 USB wifi dongle.
The serial port for debugging was easily spotted. Helas, the internet doesn't actually develop with it. They only use the debugging console to run ifconfig or list directories. The debugging console outputs diagnostics for RTSP requests, the IP address, & the MAC address, which are very useful for troubleshooting.
There are some notes on using the debugging console.
The latency with H.264 & the app varies from 1-10 seconds. The chip physically supports JPEG for minimal latency, but it's not exposed by the app. The macbook can't provide a wifi access point unless it's hard wired to the internet, so a raspberry pi would have to be used as a portable access point. It's not as bad as it sounds, considering the only alternative would require connecting the USB host on the Hi3518 to a USB bridge before connecting it to the macbook.
The complete workout with all the body positions shows how reliable that counter eventually became. No more lost counts. More attention spent on the TV. Just can't use wifi for anything else.
Made the GPU server just send coordinates for the tablet to overlay on the video. This was within the bandwidth limitations, but it still occasionally got behind by a few reps. It might be transient wifi usage. The next step would be reducing the JPEG quality from 90. Greyscale doesn't save any bandwidth. There's also omitting the quantization tables, but by then, you're better off using pipes with a native x264 on the tablet.
It never ceases to amaze lions how that algorithm tracks body poses with all the extra noise in the image, even though it has many errors. It's like a favorite toy as a kid, but a toy consisting of an intelligence.
The pose classifier eventually was bulletproof. This system had already dramatically improved the workout despite all its glitches. You know it's a game changer because no-one watches the video. Just like marriage & politics, the most mundane gadgets no-one cares about are the most revolutionary while the most exciting gadgets everyone wants are the least revolutionary.
The mane problem became connectivity. Lockups lasting several minutes & periods of many dropped frames continued. It seemed surmountable compared to the machine vision. Finally did the long awaited router upgrade.
Its days of computing ended 6 years ago. It is now the apartment complex's most powerful router. Seem to recall this was the laptop lions used while dating ... the single women. Lions wrote firmware on it while the single women watched TV or ate their doritos.
The other problem was for the pose classifier to work, it can't drop any frames. The tablet needs to capture only as many frames as the GPU can handle & the GPU needs to process all of them. Relying on the GPU to drop frames caused it to drop a lot of poses when a large number of frames got buffered & received in the time span of processing a single frame.
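Capturing only as many frames as the GPU can handle amounts to a 1 deep mailbox between the capture thread & the GPU thread: the capture side overwrites any stale frame instead of letting a backlog pile up, & the GPU processes every frame it ever sees. A sketch of that idea, with invented class & method names:

```python
import threading

class FrameMailbox:
    """1 deep frame buffer.  The capture thread overwrites any
    unprocessed frame, so the GPU always gets the newest frame &
    never has a queue of stale frames to chew through."""

    def __init__(self):
        self.lock = threading.Condition()
        self.frame = None

    def put(self, frame):
        # Called by the capture thread.  Replaces any stale frame.
        with self.lock:
            self.frame = frame
            self.lock.notify()

    def get(self):
        # Called by the GPU thread.  Blocks until a frame arrives.
        with self.lock:
            while self.frame is None:
                self.lock.wait()
            frame, self.frame = self.frame, None
            return frame
```

The point of the design is that dropping happens at capture time, before any frame enters the pipeline, so the pose classifier never sees a burst of buffered frames arrive in the time span of processing a single frame.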
The hunt for connectivity eventually led to wifi not being fast enough to send JPEG frames both ways at 6fps. The reason skype can send video 2 ways is H.264, which would entail a native build of x264, so it isn't worth it for this program. Wifi is also bad at multiplexing 2 way traffic. Slightly better results came from synchronizing the frame readbacks, but only after eliminating them completely did the rep counts have no delay & the lost connections go away. It counted instantaneously, even on the ancient broken tablet, but it didn't look as good without the openpose outlines. There's also bluetooth for frame readbacks or for drawing vectors & statistics on the tablet, if H.264 is that bad.
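Back of the envelope arithmetic shows why 2 way JPEG at 6fps swamped the old router. Assuming roughly 100kB per JPEG frame at quality 90, which is a guess rather than a measured value:

```python
# Rough bandwidth for 2 way JPEG streaming.  FRAME_BYTES is an
# assumed average compressed frame size, not a measurement.
FRAME_BYTES = 100 * 1024   # ~100kB per frame at quality 90
FPS = 6
DIRECTIONS = 2             # tablet -> GPU server & GPU server -> tablet

bits_per_sec = FRAME_BYTES * 8 * FPS * DIRECTIONS
print(round(bits_per_sec / 1e6, 1), "Mbit/s")  # ~9.8 Mbit/s
```

Call it 10 Mbit/s before TCP overhead, on a link that has to constantly turn the channel around, which is exactly the regime where old wifi routers fall over.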
The trick with the tracker is to use a spherecam on USB, but opencv can't deterministically select the same camera, scale the input, & display in fullscreen mode on a mac.
The options are running a network client on virtualbox to handle all the I/O, or fixing opencv. Running a network client in a virtual machine to work around the real machine is ridiculous, so the decision was made to fix opencv: uninstall the brew version, then recompile opencv, caffe, & openpose from scratch.
brew remove opencv
Openpose on the mac can't link to a custom opencv. You have to hack some cmake files or just link it manually.
Discovered openpose depends heavily on the aspect ratio & lens projection. It works much faster but detects fewer poses on a 1x1 aspect ratio. It works much slower & detects more poses on a 16x9 crop of the center. It works much slower on 512x512 & much faster on 480x480. Pillarboxing & stretching to fit the entire projection in 16x9 don't work.
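The 16x9 center crop reduces to taking the largest centered 16:9 rectangle out of the sensor frame before handing it to openpose. A sketch of the crop arithmetic, pure geometry with no openpose specifics:

```python
def center_crop_16x9(w, h):
    """Return (x, y, crop_w, crop_h) of the largest centered
    16:9 rectangle inside a w x h frame."""
    if w * 9 >= h * 16:
        # frame is wider than 16:9: crop the sides
        crop_h = h
        crop_w = (h * 16) // 9
    else:
        # frame is taller than 16:9: crop top & bottom
        crop_w = w
        crop_h = (w * 9) // 16
    return ((w - crop_w) // 2, (h - crop_h) // 2, crop_w, crop_h)
```

On a 1280x1024 sensor this throws away a 152 pixel band at the top & bottom, which is exactly the region where the trick of detecting directly above & below gets lost.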
Spherical projections are as good as equirectangular projections. The trick is detecting directly above & below with the 16x9 cropping. It's not going to detect someone looking down on it like the Facebook portal, but the current application doesn't need to see directly above & below. There's also scanning each frame in 2 passes.
Despite paying a fortune to ship an ESCAM Q8 from China in 7 days, it was put on the same boat as the free shipping, 2 weeks ago.
An edited video provided the 1st documentation of a computer counting reps. The reality was it only made 3.5fps instead of 4.5 & this wasn't fast enough to detect hip flexes.
It detected mane hair instead of arms.
Then it froze for several minutes before briefly hitting 4.5, then dropping back to 3.5. The only explanation was thermal throttling.
If only obsolete high end graphics cards could drop to $30 like they did 20 years ago. Instead, tried running it again on the GTX 1050 with the framerate at 5fps. This actually allowed it to play 1920x1080 video with an acceptable amount of stuttering, while also processing machine vision. GPUs actually have some form of task switching, but PCI express is a terrible way to transfer data.
As it has always been with GPU computing, we have 32GB of mane memory on the fastest bus being used for nothing while all the computations are done on 2GB of graphics memory on the PCI bus.
It began again with a webcam on the pan/tilt module. A wide angle lens was required for any tracking at 4fps to have a chance. 6 years on, the lion kingdom was still determined to find a use for that pan/tilt mount.
Tracking seemed easy: define a bounding box around the lion & point the camera at the bounding box. What really happened was body parts always came in & out of view because of lighting, glitches, movement out of frame, & movement too close to the camera. The system could not keep the head in view, even with a higher framerate, & would just center on a paw or a leg. This quickly showed tracking based only on the photographed field of view to be an unsolvable problem.
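The naive tracker described above reduces to: take the bounding box of every detected keypoint, then compute an error angle from the box center to the frame center. A sketch of that math, with the field of view numbers assumed rather than taken from the real camera:

```python
HFOV = 90.0  # horizontal field of view in degrees (assumed)
VFOV = 60.0  # vertical field of view in degrees (assumed)

def pan_tilt_error(keypoints, frame_w, frame_h):
    """keypoints: list of (x, y) detections in pixels.
    Returns (pan_deg, tilt_deg) needed to center the bounding box,
    or None if the subject was completely lost."""
    if not keypoints:
        return None
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    box_cx = (min(xs) + max(xs)) / 2
    box_cy = (min(ys) + max(ys)) / 2
    # small angle approximation: pixels map linearly to degrees
    pan = (box_cx - frame_w / 2) * HFOV / frame_w
    tilt = (box_cy - frame_h / 2) * VFOV / frame_h
    return (pan, tilt)
```

The failure mode is baked right into this math: when the head drops out of the keypoint list, the box collapses around a paw or a leg & the camera dutifully centers on that instead.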
The Facebook Portal uses a spherical cam to track a subject & zooms in based on a bounding box of the entire body. It uses face detection to optionally track individual humans while using YOLO to detect the entire body. It probably doesn't have enough clockcycles for full pose detection.
Amazingly, for a corporation which once promoted live streaming & vlogging, it has no support for content creation or recording video locally. It just makes phone calls. It's either a new height of corporate dysfunction or only private phone calls are worth monetizing.
Any tracking camera needs a spherical cam as the tracker & to just live with the parallax error. Live video from a spherical cam is a premium feature. The Gear 360 requires you to use a Samsung phone to get live video. Newer cameras charge $300 for live output. There is exactly 1 hemisphere webcam which streams over USB, the ESCAM Q8 on a 1 month boat ride from China.
With the sun facing the camera, the lion kingdom remembered abandoning quad copters flown by machine vision 6 years ago because of the perfect lighting conditions they required. Today's 12 figure valuations are based on the exact same level of errors. There hasn't been any improvement at all in the lighting requirements.
Finding corner cases would continue once a night, when the lighting was suitable & the lion was fresh. Detecting hip flexes with the rear leg remaned the hardest problem. The attempted fixes were requiring the ankles to be at different heights, requiring the longest leg to be longer than the back, & debouncing reps. The framerate was never high enough to throw out false images, but it could throw out reps which happened too close together. The profile camera view was never solved. You always have to face the camera. Squats, situps, & pushups were all bulletproof, however.
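The debouncing boils down to rejecting any rep which follows the previous one too closely. A minimal sketch, with the 1 second minimum being an invented number rather than the program's actual threshold:

```python
MIN_INTERVAL = 1.0  # seconds; reps closer together than this are noise

class RepCounter:
    def __init__(self):
        self.count = 0
        self.last_rep = None

    def rep_detected(self, t):
        """Call with the timestamp of each detected rep.
        Returns the running count, ignoring bounces."""
        if self.last_rep is None or t - self.last_rep >= MIN_INTERVAL:
            self.count += 1
            self.last_rep = t
        return self.count
```

The threshold has to sit below the fastest legitimate rep cadence, or the counter starts eating real reps instead of bounces.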
As much as openpose accomplishes with video alone, 3D pose estimation is really needed. It's such a practical need, it's hard to believe Kinects never became ubiquitous rep counters & camera trackers, except for the cost.
As long as you manually select the exercise, it can be quite scalable. One could imagine ordinary humans setting it up in a gym & using it to count a wide variety of exercises.
The machine vision blues continue just like 2013, but at 10,000,000x the valuations. It has trouble with the occluded leg & rubber band. Another problem is double counting.
Another $130 GTX 1050 would help immensely. A debouncing algorithm can help with double counting, but the lion just has to play with the workout to get it to register. There's also just facing the camera during this exercise.
After much loathing, finally caved in to the idea of a tablet serving as the camera & user interface, while still offloading the pose tracking over the network. Tablets have been utterly hopeless at finding any practical use for the last 10 years, but have finally come into their own for computer vision.
Computer vision platforms need a subject facing camera, a viewfinder which can be seen from far away, & to always be on. That's what makes tablets perfect & why Facebook, Goog, & Amazon have finally started recycling their failed tablet inventory as the latest round of intelligent assistants.
The 6 year old Asus with a dead camera, dead GPS, marginal wifi, & marginal battery still has a better front facing camera than the webcam. After much implementation of a network protocol & fixing openpose bugs, the tablet was counting reps with the macbook as a remote compute server.
False positives abounded. If it detected a relevant pose anywhere in the 5 minutes before starting the workout, it falsely counted it as part of a rep. It also sagged into the carpet over time, causing the lion to drift out of frame. There is a long delay in counting, but it's bearable.
The Asus is still the best looking of all the tablets. The knife edge somehow works better than the fat edge of the ipad pro, although the ipad pro is overall the best gadget ever made in all history. The Asus looked utterly gigantic when it was new. It's tiny, now. The ipad might even be better for the rep counter, if lions weren't keen on finding uses for everything old.