Rapidly becoming the next big thing, 1st with subject tracking on quad copters, then subject tracking on digital assistants. It's long been a dream to have an autonomous camera operator that tracks a subject. The Facebook Portal was the 1st sign lions saw that the problem was finally cracked. The problem is all existing tracking cameras are operated by services which collect the video & either sell it or report it to government agencies.
Compiling & fixing a machine vision library to run as fast as possible on a certain computer is such a monumental task, it's important to reuse it as much as possible. To simplify the task of a tracking camera, the same code is used to count reps & track a subject. The countreps program was a lot more complicated & consumed most of the time.
Previous work way back in Aug 2016 involved LIDAR.
There were other failed attempts with chroma keying, luma keying, difference keying. Then it lay dormant for 2 years.
The last year saw an explosion in CNNs for subject tracking. The key software is openpose. That theoretically allows a camera to track a whole body or focus in on a head, but it doesn't allow differentiating bodies. Differentiating bodies still would require chroma keying or face matching.
If a subject takes off its clothing, as subjects do in the kind of thing that this camera would be used for, chroma keying would get thrown off. Face tracking doesn't work so well when the subject looks away.
The complete workout with all the body positions shows how reliable that counter eventually became. No more lost counts. More attention spent on the TV. Just can't use wifi for anything else.
Made the GPU server just send coordinates for the tablet to overlay on the video. This was within the bandwidth limitations, but it still occasionally got behind by a few reps. It might be transient wifi usage. The next step would be reducing the JPEG quality from 90. Greyscale doesn't save any bandwidth. There's also omitting the quantization tables, but by then, you're better off using pipes with a native x264 on the tablet.
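A minimal sketch of why sending only coordinates is so much cheaper than sending annotated frames back. The wire format here is hypothetical (a point count followed by 16 bit pixel coordinates), & the 18 keypoint body model is an assumption, not something taken from the countreps source.

```python
import struct

# Hypothetical wire format: a 2-byte point count followed by (x, y) pairs
# stored as unsigned 16-bit pixel coordinates, all little-endian.
def pack_keypoints(points):
    """Serialize a list of (x, y) pixel coordinates into bytes."""
    data = struct.pack("<H", len(points))
    for x, y in points:
        data += struct.pack("<HH", x, y)
    return data

def unpack_keypoints(data):
    """Inverse of pack_keypoints: returns a list of (x, y) tuples."""
    (count,) = struct.unpack_from("<H", data, 0)
    return [struct.unpack_from("<HH", data, 2 + 4 * i) for i in range(count)]

# 18 keypoints (an assumed body model size) cost 2 + 18 * 4 = 74 bytes.
packet = pack_keypoints([(320, 240)] * 18)
print(len(packet))
```

A packet like this is tiny next to a JPEG frame, so the return path stops competing with the outgoing video for wifi.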
It never ceases to amaze lions how that algorithm tracks body poses with all the extra noise in the image, even though it has many errors. It's like a favorite toy as a kid, but a toy consisting of an intelligence.
The pose classifier eventually was bulletproof. This system had already dramatically improved the workout despite all its glitches. You know it's a game changer because no-one watches the video. Just like marriage & politics, the most mundane gadgets no-one cares about are the most revolutionary while the most exciting gadgets everyone wants are the least revolutionary.
The mane problem became connectivity. Lockups lasting several minutes & periods of many dropped frames continued. It seemed surmountable compared to the machine vision. Finally did the long awaited router upgrade.
Its days of computing ended 6 years ago. It is now the apartment complex's most powerful router. Seem to recall this was the laptop lions used while dating ... the single women. Lions wrote firmware on it while the single women watched TV or ate their doritos.
The other problem was for the pose classifier to work, it can't drop any frames. The tablet needs to capture only as many frames as the GPU can handle & the GPU needs to process all of them. Relying on the GPU to drop frames caused it to drop a lot of poses when a large number of frames got buffered & received in the time span of processing a single frame.
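One way to get that behavior is to make the capture side block instead of letting frames pile up. The sketch below is not the countreps network protocol. It simulates the idea in a single process with a bounded queue, & the names run_pipeline & process are made up for illustration.

```python
import queue
import threading

def run_pipeline(frames, process):
    """Feed frames to a worker through a 1-slot queue so none are dropped."""
    inbox = queue.Queue(maxsize=1)   # at most 1 frame waiting for the GPU
    results = []

    def gpu_worker():
        while True:
            frame = inbox.get()
            if frame is None:        # sentinel: no more frames
                break
            results.append(process(frame))

    worker = threading.Thread(target=gpu_worker)
    worker.start()
    for frame in frames:
        inbox.put(frame)             # blocks while the queue is full
    inbox.put(None)
    worker.join()
    return results
```

Because the capture loop stalls whenever the worker falls behind, the capture rate settles at whatever rate the worker can sustain & every captured frame gets processed in order.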
The hunt for connectivity eventually led to wifi not being fast enough to send JPEG frames both ways, at 6fps. The reason skype can send video 2 ways is H.264, which entails a native build of x264 so isn't worth it for this program. Wifi is also bad at multiplexing 2 way traffic. Slightly better results came from synchronizing the frame readbacks, but only after eliminating them completely did the rep counts have no delay & the lost connections go away. It counted instantaneously, even on the ancient broken tablet, but it didn't look as good without the openpose outlines. There's also bluetooth for frame readbacks or drawing vectors & statistics on the tablet, if H.264 is that bad.
An edited video provided the 1st documentation of a computer counting reps. The reality was it only made 3.5fps instead of 4.5 & this wasn't fast enough to detect hip flexes.
It detected mane hair instead of arms.
Then it froze for several minutes before briefly hitting 4.5, then dropping back to 3.5. The only explanation was thermal throttling.
If only obsolete high end graphics cards could drop to $30 like they did 20 years ago. Tried running it again on the GTX 1050 with the framerate at 5fps. This actually allowed it to play 1920x1080 video with an acceptable amount of stuttering, while also processing machine vision. They actually have some form of task switching on the GPU, but PCI express is a terrible way to transfer data.
As it has always been with GPU computing, we have 32GB of mane memory on the fastest bus being used for nothing while all the computations are done on 2GB of graphics memory on the PCI bus.
It began again with a webcam on the pan/tilt module. A wide angle lens was required for any tracking at 4fps to have a chance. 6 years on, the lion kingdom was still determined to find a use for that pan/tilt mount.
Tracking seemed easy: define a bounding box around the lion & point the camera at the bounding box. What really happened was body parts always came in & out of view because of lighting, glitches, movement out of frame, & movement too close to the camera. The system could not keep the head in view, even with a higher framerate, & would just center on a paw or a leg. This quickly showed tracking based only on the photographed field of view to be an unsolvable problem.
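The naive approach can be sketched in a few lines, which also shows why it fails. The function names & the -1..1 error normalization are assumptions for illustration, with undetected keypoints represented as None.

```python
def bounding_box(points):
    """Bounding box of detected keypoints; undetected ones are None."""
    found = [p for p in points if p is not None]
    if not found:
        return None
    xs = [x for x, _ in found]
    ys = [y for _, y in found]
    return min(xs), min(ys), max(xs), max(ys)

def aim_error(points, width, height):
    """Offset of the box center from the frame center, in -1..1 per axis."""
    box = bounding_box(points)
    if box is None:
        return None   # nothing detected: hold the camera still
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return (cx - width / 2) / (width / 2), (cy - height / 2) / (height / 2)
```

When only a paw or a leg is detected, the box collapses onto that one body part & the camera dutifully centers on it, since nothing in the frame says where the rest of the body went.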
The Facebook Portal uses a spherical cam to track a subject & zooms in based on a bounding box of the entire body. It uses face detection to optionally track individual humans while using YOLO to detect the entire body. It probably doesn't have enough clockcycles for full pose detection.
Amazingly, for a corporation which once promoted live streaming & vlogging, it has no support for content creation or recording video locally. It just makes phone calls. It's either a new height of corporate dysfunction or only private phone calls are worth monetizing.
Any tracking camera needs a spherical cam as the tracker & to just live with the parallax error. Live video from a spherical cam is a premium feature. The Gear 360 requires you to use a Samsung phone to get live video. Newer cameras charge $300 for live output. There is exactly 1 hemisphere webcam which streams over USB, the ESCAM Q8 on a 1 month boat ride from China.
With the sun facing the camera, the lion kingdom remembered abandoning quad copters flown by machine vision 6 years ago because of the extremely perfect lighting conditions required. Today's 12 figure valuations are based on the exact same level of errors. There hasn't been any improvement at all in the lighting requirements.
Finding corner cases would continue once a night, when the lighting was suitable & the lion was fresh. Detecting hip flexes with the rear leg remained the hardest problem. There was requiring the ankles to be different heights, requiring the longest leg to be longer than the back, debouncing reps. The framerate was never high enough to throw out false images, but it could throw out reps which happened too close together. The profile camera view was never solved. You always have to face the camera. Squats, situps, & pushups were all bulletproof, however.
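A rough sketch of the two geometric tests mentioned above: ankles at different heights & the longest leg longer than the back. The keypoint names, the single hip point, & the 20 pixel threshold are all assumptions. The real countreps rules aren't shown in these notes.

```python
import math

def dist(a, b):
    """Euclidean distance between 2 (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def hip_flex_candidate(pose, min_ankle_gap=20.0):
    """pose maps hypothetical keypoint names to (x, y) pixel positions."""
    back = dist(pose["neck"], pose["hip"])
    legs = [dist(pose["hip"], pose["l_ankle"]),
            dist(pose["hip"], pose["r_ankle"])]
    # Vertical separation of the ankles in pixels.
    ankle_gap = abs(pose["l_ankle"][1] - pose["r_ankle"][1])
    return ankle_gap >= min_ankle_gap and max(legs) > back
```

Both tests are cheap sanity checks against a noisy pose: a leg that comes out shorter than the back is more likely a detection glitch than a real leg.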
As much as openpose accomplishes with video alone, 3D pose estimation is really needed. It's such a practical need, it's hard to believe Kinects never became ubiquitous rep counters & camera trackers, except for the cost.
As long as you manually select the exercise, it can be quite scalable. One could imagine ordinary humans setting it up in a gym & using it to count a wide variety of exercises.
The machine vision blues continue just like 2013, but at 10,000,000x the valuations. It has trouble with the occluded leg & rubber band. Another problem is double counting.
Another $130 GTX 1050 would help immensely. A debouncing algorithm can help with double counting, but the lion just has to play with the workout to get it to register. There's also just facing the camera during this exercise.
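A debouncer for double counting can be as simple as refusing any rep that arrives too soon after the last accepted one. This is a minimal sketch with a made up class name, & the minimum interval is an assumption to be tuned per exercise.

```python
class RepDebouncer:
    """Ignore reps detected less than min_interval seconds after the last one."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_time = None
        self.count = 0

    def rep_detected(self, timestamp):
        """Return True if the rep was counted, False if it was debounced."""
        if (self.last_time is not None
                and timestamp - self.last_time < self.min_interval):
            return False
        self.last_time = timestamp
        self.count += 1
        return True
```

The tradeoff is that a genuinely fast rep gets thrown out along with the double counts, so the interval has to be shorter than the fastest real rep.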
After much loathing, finally caved in to the idea of a tablet serving as the camera & user interface, while still offloading the pose tracking over the network. Tablets have been utterly hopeless at finding any practical use for the last 10 years, but have finally come into their own for computer vision.
Computer vision platforms need a subject facing camera, viewfinder which can be seen from far away, & always have to be on. That's making tablets perfect & why Facebook, Goog, & Amazon have finally started recycling their failed tablet inventory as the latest round of intelligent assistants.
The 6 year old Asus with a dead camera, dead GPS, marginal wifi, & marginal battery still has a better front facing camera than the webcam. After much implementation of a network protocol & fixing openpose bugs, the tablet was counting reps with the macbook as a remote compute server.
False positives abounded. If it detected a relevant pose anywhere in the 5 minutes before starting the workout, it falsely counted it as part of a rep. It also sagged into the carpet over time, causing the lion to drift out of frame. There is a long delay in counting, but it's bearable.
The Asus is still the best looking of all the tablets. The knife edge somehow works better than the fat edge of the ipad pro, although the ipad pro is overall the best gadget ever made in all history. The Asus looked utterly gigantic when it was new. It's tiny, now. The ipad might even be better for the rep counter, if lions weren't keen on finding uses for everything old.
The mane way to get more fps is decreasing netInputSize. There are diminishing returns with higher neuron count & higher noise with lower neuron count.
-1x368 was the default & too big for 2GB of RAM.
-1x256 gives 2fps & might be fast enough to track old people
-1x160 gives 4fps & noisy positions, but the lowest framerate needed to get all the reps on video
-1x128 gives 6fps & much noisier positions
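The measurements above amount to a small lookup table. A sketch of picking the largest network height which still meets a target framerate, using only the 3 measured points (the function name is made up for illustration):

```python
# Frame rates copied from the measurements above, keyed by the height
# in the -1xHEIGHT netInputSize setting.
MEASURED_FPS = {256: 2, 160: 4, 128: 6}

def largest_net_height(min_fps):
    """Largest measured height that still reaches min_fps, or None."""
    ok = [height for height, fps in MEASURED_FPS.items() if fps >= min_fps]
    return max(ok) if ok else None

print(largest_net_height(4))
```

Larger heights give less noisy positions, so the largest size which still meets the framerate is the one to take.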
The next task was classifying exercises based on noisy pose data. With the -1x160, openpose presented a few problems.
Falsely detecting humans.
Dropping lions of nearly the same pose as detected lions.
Differentiating between squats & a hip flex proved difficult, since the arms can either be horizontal or vertical in a squat & the knee angles are within the error bounds.
Situps & squats were also within the same error bounds.
The problem would only get harder if more exercises were added. Noise in the pose estimation & lack of 3D information reduced the angle precision.
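For reference, a knee or hip angle can be computed from 3 keypoints with a dot product. This is a generic sketch, not the countreps classifier, & it only sees the 2D projection, which is part of why the angles overlap between exercises.

```python
import math

def joint_angle(a, b, c):
    """Angle at point b in degrees, formed by the segments b-a & b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None   # coincident keypoints: angle is undefined
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp against rounding error before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))
```

For a knee, a, b, c would be the hip, knee, & ankle keypoints. With noisy positions the result jitters from frame to frame, so any threshold on it needs a wide margin.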
To get the 100% accuracy of a manual counter, it needed prior knowledge of the exercise being performed. Manually setting the exercise on 1 device while setting up another device as a camera is a real pain, so the easiest solution was hard coding the total number of reps & exercises to be performed. The lion wouldn't be able to throw in a few extra if it was a good day.
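A hard coded workout reduces to a list of exercises & rep totals that the counter steps through. A minimal sketch follows. The class name & the numbers in the plan are hypothetical, not the real workout.

```python
# Hypothetical plan: (exercise name, reps) in the order they are performed.
PLAN = [("situps", 30), ("pushups", 30), ("hip flexes", 30), ("squats", 30)]

class Workout:
    """Tracks which exercise the classifier should be looking for."""

    def __init__(self, plan):
        self.plan = list(plan)
        self.index = 0   # position in the plan
        self.reps = 0    # reps counted in the current exercise

    def current_exercise(self):
        """Name of the exercise in progress, or None when finished."""
        if self.index < len(self.plan):
            return self.plan[self.index][0]
        return None

    def count_rep(self):
        """Register 1 rep & advance to the next exercise when done."""
        if self.index >= len(self.plan):
            return   # workout finished: extra reps are ignored
        self.reps += 1
        if self.reps >= self.plan[self.index][1]:
            self.index += 1
            self.reps = 0
```

Since the classifier only ever has to answer whether the current pose is a rep of current_exercise(), it never has to tell squats from situps.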
A better camera might improve results, in any case.
There's also making a neural network to classify exercises & using YOLO instead of pose detection. Pose detection is the most general purpose algorithm & eventually the only one anyone is going to use. A neural network classifier would definitely not be reliable enough.
Despite its limitations, it's amazing how what are essentially miniaturized vacuum tubes & copper can identify high level biological movements in photos.
The search for the cheapest portable clockcycles led to the ancient macbook. It has a GT 750M with 2Gig of RAM. It could support rep counter & camera tracker with the same openpose library, but it meant giving up on Linux. CUDA is another thing Virtualbox doesn't support & forget about running Linux natively on a macbook since 2013.
Development for Macos was always voodoo magic for someone who grew up with only commercial operating systems & no internet. It's now no different than Linux, ios, & android. The mane differences are the package manager on mac is brew, the compiler is clang instead of gcc, libraries end in .dylib instead of .so. The compiler takes a goofy -framework command which is a wrapper for multiple libraries. Then of course, there's the dreaded xcode-select command.
The notes for mac:
Openpose officially doesn't support CUDA on MAC, but hope springs eternal.
All the dependencies are system wide except caffe.
The mac drivers are a rat's nest of dependencies:
Download CUDA, CUDNN, CUDA driver, & GPU driver for the current Macos
version. The drivers are not accessible from nvidia.com.
Obsolete caffe instructions:
http://caffe.berkeleyvision.org/installation.html#compilation
"brew tap homebrew/science" fails but isn't necessary.
Some necessary packages:
brew install wget
brew install pkg-config
brew install cmake
To compile, clone caffe from microsoft/github.
comment out CPU_ONLY
comment out the required lines on the CUDA_ARCH line
comment out the Q ?= @ line
Can't compile "using xyz = std::xyz" or no nullptr?
In the Makefile, add -std=c++11 to CXXFLAGS += and to NVCCFLAGS without
an -Xcompiler flag. It must go directly to nvcc. Nvcc is some kind of
shitty wrapper for the host compiler that takes some options but
requires other options to be wrapped in -Xcompiler flags.
nvcc doesn't work with every clang++ version. The version of clang++
required by nvcc is given on
It requires installing an obsolete XCode & running
sudo xcode-select -s /Applications/Obsolete XCode
to select it.
cannot link directly with ...vecLib...:
in Makefile, comment out
LDFLAGS += -framework vecLib
Undefined symbol: cv::imread
in Makefile, add LDFLAGS += `pkg-config --libs opencv`
make builds it
make distribute installs it in the distribute directory, but also attempts
to build python modules. make -i distribute ignores the python modules.
Then install it in this directory:
cp -a bin/* /Volumes/192.168.56.101/root/countreps.mac/bin/
cp -a include/* /Volumes/192.168.56.101/root/countreps.mac/include/
cp -a lib/* /Volumes/192.168.56.101/root/countreps.mac/lib/
cp -a proto/* /Volumes/192.168.56.101/root/countreps.mac/proto/
cp -a python/* /Volumes/192.168.56.101/root/countreps.mac/python/
The openpose compilation:
Unknown CMake command "op_detect_darwin_version".
comment out the Cuda.cmake line
To build countreps:
To run it, specify the library path:
Can't parse message of type "caffe.NetParameter" because it is missing
required fields: layer.clip_param.min, layer.clip_param.max
The latest Caffe is officially broken.
Use revision f019d0dfe86f49d1140961f8c7dec22130c83154...
The leading solution is to always stream frames over a network to a big computer with GPU processing.
Another solution is solving a simpler problem than pose estimation for camera tracking & using pose estimation only for counting reps. Only camera tracking must be portable. Counting reps will always be done near the ryzen.
There are 2 competing libraries for GPU processing: OpenCL & CUDA. The choice depends on the CPU, GPU, & stock portfolio.
The latest algorithm is YOLO. There are rumors of higher frame rates, but no good installation examples.
Running openpose with a GPU requires CUDA & CUDNN. CUDA requires 2GB of downloads from nvidia.com. The version of CUDA must be compatible with the version of the graphics driver & there's no documentation. Driver 410.78 happened to work with cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64.
CUDA also requires rebooting & manually loading the nvidia-uvm module.
To test your CUDA installation:
To print the status of your graphics card:
Caffe must be rebuilt with CUDA, then openpose.
To build Caffe with CUDA:
edit caffe/Makefile.config
uncomment USE_CUDNN
comment out CPU_ONLY
tweak the CUDA_ARCH line
edit BLAS_INCLUDE, BLAS_LIB, LIBRARY_DIRS to include /root/countreps
All the objects need -fPIC, but nvcc complains about it. Placing -fPIC after -Xcompiler in NVCCFLAGS, CXXFLAGS, LINKFLAGS but not in COMMON_FLAGS seems to fix it. All the dependencies for caffe were installed in /root/openpose.
PATH=$PATH:/root/countreps/bin make
PATH=$PATH:/root/countreps/bin make distribute
Comment out the tools/caffe.cpp: time() function if there's an undefined reference to caffe::caffe_gpu_dot
The output goes in the distribute directory & must be copied manually.
cp -a bin/* /root/countreps/bin/
cp -a include/* /root/countreps/include/
cp -a lib/* /root/countreps/lib/
cp -a proto/* /root/countreps/proto/
cp -a python/* /root/countreps/python/
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/countreps/lib make VERBOSE=1
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/countreps/lib make install
Check failed: error == cudaSuccess (2 vs. 0) out of memory
error is caused by the GPU running out of memory. Reduce the netInputSize variable to -1x256.
Openpose on the GeForce GTX 1050 hit 14 frames per second, but the computer can't do anything else with the GPU like play a video. CUDA is a return to 1980's single tasking, but it's still amazing how well it can track a human pose in a blurry photo.
The terabytes of opaque libraries required to make a computer vision program are how all computing is going to be done in the future. All these libraries are going to be part of the base system. Using computer vision won't involve tweaking neural...
If you have multiple computer vision projects like lions do, each one was compiled for different versions of opencv & all its dependencies, so you can't have system wide dependencies. The reason cocoa pods works is it compiles all the dependencies inside the project. That's really not much of an innovation, but it's a political mountain to convince developers not to use system wide dependencies. Creating a dependency manager with a meaningless name was all about overcoming the political mountain & legitimizing having dependencies in the project directory.
Most dependencies were already compiled for a previous project & installed in the /root/countreps prefix. A gootuber recommended another version of openpose based on tensorflow, which might have fewer dependencies.
compiling OpenBLAS for Ryzen:
make TARGET=ZEN
make TARGET=ZEN PREFIX=/root/countreps install
FLOAT in OpenBLAS/common.h conflicts with another definition & has to be
renamed FLOAT_ when compiling openpose.
OpenCV must be built with GTK support.
To build opencv:
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=/root/countreps/ ..
make
# this doesn't work
make install
# running this a 2nd time is what installs libopencv.so
make
It reads JPEG photos from test_input & writes output to test_output.
The test program processed 424 frames at 640x480 resolution.
2.15 seconds per frame on the 4.1GHz Ryzen 7 2700x.
4 gig of RAM required.
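To put those numbers in perspective, the arithmetic below converts them to a total run time & a framerate. It uses only the 2 figures above & assumes every frame took the average time.

```python
frames = 424
seconds_per_frame = 2.15

total_seconds = frames * seconds_per_frame   # whole test run
fps = 1 / seconds_per_frame                  # CPU-only throughput

print(f"{total_seconds:.0f} s total, {total_seconds / 60:.1f} min, {fps:.2f} fps")
```

That works out to roughly a quarter hour for the test set at under half a frame per second, far short of the 4fps found necessary elsewhere in these notes to catch all the reps.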
The 1st test brought the same disappointment as encoding MPEG video in 1995, on a 33MHz computer. It was amazingly good at tracking the subject, but too slow to do it in realtime. The next step would be compiling the CUDA dependencies. It still wouldn't be portable.
The embedded installations in modern subject tracking cameras all use NVidia TX2 boards at $800. Maybe there could be a cheaper solution using a laptop. Just like 1995, the lion kingdom can't afford the required hardware.