Pose2Art: low cost video pose capture for interactive spaces

Jerry Isdale wrote 10/19/2022 at 00:00 • 5 min read

Pose2Art Project

Jerry Isdale,

Maui Institute of Art and Technology

notes started oct 12 2022

This Page is now superseded by the Project

Basic idea

Create a low cost, (mostly) open source 'smart' camera system to capture human pose and use it to drive interactive immersive art installations. Yes, it's kinda like the Microsoft Kinect/Azure 'product', but DIY and open to upgrading.

  1. use one or more smart cameras to capture Human Pose from video stream
  2. stream that data (multicast?) – pose points (OSC), raw frames, skeleton overlay (video), outline, etc etc
  3. receive stream to drive CGI Rendering Engine using skeleton data etc
  4. project that stream on wall (or use all above streams as input to video switcher/overlay)



Multiple cameras can be used to create 3D pose tracking

Stream video from edge cameras to rendering engine. Not sure of usable protocol

Tracking multiple people, in contact with each other (dancing, acro-yoga etc)

Depth Camera: cameras that give Point Cloud depth data could be used


This is very much a work in process (with uneven progress).
18 Nov: I have gotten the camera and pose extraction working and feeding points over the network to the PC via OSC, which feeds data into TouchDesigner.

Currently I'm taking notes on both the Pi and the PC (with multiple boot SD cards for different OSes on the Pi).

Example Art Installations

(insert links to still/video of pose tracking in interactive environments)


Oct 14

10 steps in Pose2Art process

(make a graphic of this flow)

  1. image capture
  2. pose extract
  3. pose render (optional)
  4. stream send pose data, video optionally
  5. physical send ( transport)
  6. physical rcv
  7. stream receive
  8. stream process
  9. render/overlay
  10. project/display
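The first seven steps can be sketched as a chain of small functions, which makes it easier to swap any one stage later. This is only an illustration: every function name here is hypothetical, the "frame" is a tiny fake image, the "pose model" just returns the image centre as a nose point, and the "network" is a plain Python list.

```python
from typing import Dict, List, Tuple

# (x, y, confidence) for one tracked point; the layout is an assumption.
Keypoint = Tuple[float, float, float]


def capture_image() -> List[List[int]]:
    """Step 1 stand-in: return a fake 4x3 grayscale frame."""
    return [[0] * 4 for _ in range(3)]


def extract_pose(frame: List[List[int]]) -> Dict[str, Keypoint]:
    """Step 2 stand-in: a real system would run a pose model here."""
    height, width = len(frame), len(frame[0])
    return {"nose": (width / 2, height / 2, 1.0)}


def send_pose(pose: Dict[str, Keypoint], wire: list) -> None:
    """Steps 4-5 stand-in: 'sending' just appends to a list."""
    wire.append(pose)


def receive_pose(wire: list) -> Dict[str, Keypoint]:
    """Steps 6-7 stand-in: 'receiving' pops the oldest item."""
    return wire.pop(0)


def run_once(wire: list) -> Dict[str, Keypoint]:
    """Run capture -> extract -> send -> receive for one frame."""
    frame = capture_image()
    pose = extract_pose(frame)
    send_pose(pose, wire)
    return receive_pose(wire)
```

Steps 8 to 10 (process, render, display) would live on the receiving machine, for example inside TouchDesigner.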

Project Plan and this Page

Nov 20 status:

The QEngineering Raspberry Pi image comes with TensorFlow Lite properly installed, along with a C++ demo of pose capture. Adding Libosc++ got it emitting OSC data. A fair bit of mucking around with static IP addresses, routes, and firewalls was required, but I finally got it working with the PC. Found at least one TouchDesigner example of reading OSC pose data and got it working. Looking into other demos, like a Kinect driving a TD Theremin simulator.

OSC (OpenSoundControl) currently chosen as data transport. It is VERY much user defined messages, and I have yet to see any 'standards' for how to name the Pose data. Kinect tracked point names might be useful.
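Since there is no standard naming, one possible scheme is sketched below. The keypoint names follow the 17 point COCO ordering used by PoseNet-style models; the `/pose/<person>/<keypoint>` address pattern and the `pose_address` helper are assumptions made up for illustration, not something Pose2Art or any OSC convention defines.

```python
# The 17 COCO keypoints, in the order PoseNet-style models emit them.
COCO_KEYPOINTS = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]


def pose_address(person: int, keypoint_index: int) -> str:
    """Build a hypothetical OSC address for one keypoint of one person."""
    return f"/pose/{person}/{COCO_KEYPOINTS[keypoint_index]}"
```

With this scheme the nose of the first tracked person would arrive at `/pose/0/nose`, and a receiver could filter on the `/pose/0/` prefix to follow a single person.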

Survey of System Demos

Web searching turned up a LOT of links on pose estimation using machine learning. Some include source code repositories and documentation, others are academic papers or other non-replicable demos.  This section is a summary of some.  Hopefully one will be found to actually work?

30 oct 2022: links below this update

Attempting to run demos has been Interesting, with lots of classic dependency issues. Some Python pose examples were made to work, but alas Very slowly. The QEngineering rPi example is in C++ and its basics ran much faster (6-10 fps) than the Python ones. It (and many other examples) use the TensorFlow Lite implementations to run on rPi. TFLite seems to be decent, and there are both pretrained and reduced models available, as well as the TF blog on how to train a new set on something more than the COCO yoga and dance poses. These are options to explore after getting the basics done.

Next steps are putting the pose data into a Message (OSC based) and sending that over network (UDP) to a 'server'.
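The UDP half of that step is small enough to sketch with the Python standard library alone. The helper names `send_datagram` and `open_receiver` are hypothetical, and the payload is treated as opaque bytes (an already-encoded OSC message would go here); the real sender in this project is C++, so this is mainly useful as a test harness on the PC side.

```python
import socket


def send_datagram(payload: bytes, host: str, port: int) -> int:
    """Send one UDP datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


def open_receiver(host: str = "127.0.0.1", port: int = 0,
                  timeout: float = 2.0) -> socket.socket:
    """Open a UDP socket bound to host:port (port 0 lets the OS pick one)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.settimeout(timeout)
    return sock
```

A quick loopback check is to call `open_receiver()`, read the chosen port with `getsockname()[1]`, send to it with `send_datagram`, and then `recvfrom` on the receiver. UDP gives no delivery guarantee, which is usually acceptable for pose data because a newer frame replaces a lost one.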

More dependency issues over Socket/ASIO and OSC libraries, but some progress.

There is no 'standard' for the OSC pose messages. There are some examples of pose data as messages, JSON, XML etc., using either the 17 point or 33 point models, and some even show multiple person tracking data. Since we are writing our own code, we can define it on both ends. The receiver will likely be TouchDesigner, at least for the first prototype.

links to Pose

Tracking demos with code

- SAT LivePose

- rpi TensorFlowLite, PoseNet

Ethan Del's rpi_pose_estimation builds on TensorFlow Lite and seems to be simple Python with a webcam

Medium article: rpi_pose_estimation is a Python library typically used in Artificial Intelligence, Machine Learning, Deep Learning, Tensorflow, Raspberry Pi applications. rpi_pose_estimation has no bugs, it has no vulnerabilities and it has low support. However rpi_pose_estimation build file is not available. You can download it from GitHub.

uses OpenPose

ActionAi and YogAI - Jetson

ActionAI is the follow-on to YogAI. The latter was touted as using rPi, while the new ActionAI uses Jetson Nano.

- web TensorFlow OpenPose js

There are several projects that use browser based (javascript) webcam pose estimation. These might be worth looking into, although more for their use of underlying Pose tools

phoneCam+touchdesigner

April Tags



Active Oct 2022; pre-alpha

The FreeMoCap Project: A free-and-open-source, hardware-and-software-agnostic, minimal-cost, research-grade, motion capture system and platform for decentralized scientific research, education, and training


OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (active early 2022; used in Ethan's rPi pose project)

Steam webcam

Capture Systems:

ML Pose Engines Systems

TensorFlow/TensorFlow Lite

PoseNet OpenCV

nVidia BodyPoseNet, TensorRT

SAT LivePose


stream protocols - video, data (osc)

Video Streaming: NDI

Data: OpenSoundControl (OSC)

OpenSoundControl (OSC) is a data transport specification (an encoding) for realtime message communication among applications and hardware.
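To make the encoding concrete, here is a minimal sketch of how a float-only OSC 1.0 message is laid out on the wire: a null-terminated address padded to a multiple of 4 bytes, a type tag string starting with a comma and padded the same way, then big-endian 32-bit floats. The function names and the `/pose/0/nose` address are illustrative assumptions; a real project would more likely use an existing OSC library.

```python
import struct


def osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    padding = (-len(data)) % 4
    return data + b"\x00" * padding


def osc_message(address: str, *values: float) -> bytes:
    """Encode an OSC message whose arguments are all 32-bit floats."""
    type_tags = "," + "f" * len(values)
    arguments = struct.pack(f">{len(values)}f", *values)  # big-endian floats
    return osc_string(address) + osc_string(type_tags) + arguments


# Example: x, y and confidence for one keypoint (values are made up).
message = osc_message("/pose/0/nose", 0.5, 0.25, 1.0)
```

Every part is a multiple of 4 bytes long, so the whole message is too; here that is 16 bytes of address, 8 bytes of type tags, and 12 bytes of floats.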


Transport - memory, multicast, disk

The Transport layer moves P2A assets between machines. The transfer may be in memory when both ends are on the same system, or across the network.

render engines- td, unity, unreal, resolume

Rendering Engines should accept at least one of raw video, video+pose overlay, and pose data; using only OSC/PoseData would drive avatars and/or animation/synthesis.

likely first demo: TouchDesigner variant on Kinect

skeleton interaction with particles,