MediaPipe Handpose
Note: this model can detect at most one hand in the input; multi-hand detection is coming in a future release.
MediaPipe Handpose is a lightweight ML pipeline consisting of two models: a palm detector and a hand-skeleton finger tracking model. It predicts 21 3D hand keypoints per detected hand. For more details, please read our Google AI blog post.
Given an input, the model predicts whether it contains a hand. If so, the model returns coordinates for the bounding box around the hand, as well as 21 keypoints within the hand, outlining the location of each finger joint and the palm.
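To make that output concrete, here is a rough sketch of one prediction object. The `landmarks` field matches the Usage example below; the `boundingBox`, `topLeft`, and `bottomRight` field names and all numeric values are illustrative assumptions, not guaranteed output:

```js
// Sketch of one prediction object; values are illustrative.
{
  boundingBox: {
    topLeft: [162.9, 12.4],      // [x, y] in pixels (assumed field names)
    bottomRight: [548.6, 368.2]
  },
  landmarks: [                   // 21 [x, y, z] keypoints
    [472.5, 298.6, 0.0],
    // ... 20 more entries
  ]
}
```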
More background information about the model, as well as its performance characteristics on different datasets, can be found here: https://drive.google.com/file/d/1sv4sSb9BSNVZhLzxXJ0jBv9DqD-4jnAz/view
Check out our demo, which uses the model to detect hand landmarks in a live video stream.
This model is also available as part of MediaPipe, a framework for building multimodal applied ML pipelines.
Performance
MediaPipe Handpose consists of ~12MB of weights and is well-suited for real-time inference across a variety of devices (40 FPS on a 2018 MacBook Pro, 35 FPS on an iPhone 11, 6 FPS on a Pixel 3).
Installation
Using yarn:

```sh
$ yarn add @tensorflow-models/handpose
```
Using npm:

```sh
$ npm install @tensorflow-models/handpose
```
Note that this package specifies `@tensorflow/tfjs-core` and `@tensorflow/tfjs-converter` as peer dependencies, so they also need to be installed.
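For example, all three packages can be installed together via npm:

```sh
$ npm install @tensorflow/tfjs-core @tensorflow/tfjs-converter @tensorflow-models/handpose
```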
Usage
To import via npm:

```js
const handpose = require('@tensorflow-models/handpose');
```
or via standalone script tags:

```html
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/handpose"></script>
```
Then:
```js
async function main() {
  // Load the MediaPipe Handpose model.
  const model = await handpose.load();
  // Pass in a video stream (or an image, canvas, or tensor) to obtain
  // hand predictions.
  const predictions = await model.estimateHands(document.querySelector("video"));
  for (let i = 0; i < predictions.length; i++) {
    const keypoints = predictions[i].landmarks;
    // Log the [x, y, z] coordinates of each of the 21 hand keypoints.
    for (let j = 0; j < keypoints.length; j++) {
      const [x, y, z] = keypoints[j];
      console.log(`Keypoint ${j}: [${x}, ${y}, ${z}]`);
    }
  }
}
main();
```
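To visualize the results, here is a minimal sketch of drawing the keypoints, assuming a `<canvas>` overlay sized to match the video and the `predictions` array from above (the `drawKeypoints` helper is hypothetical, not part of the package):

```js
// Hypothetical helper for illustration; not part of @tensorflow-models/handpose.
// Draws each [x, y] landmark as a dot on a 2D canvas context.
function drawKeypoints(ctx, keypoints) {
  ctx.fillStyle = "red";
  for (const [x, y] of keypoints) {
    ctx.beginPath();
    ctx.arc(x, y, 4, 0, 2 * Math.PI);
    ctx.fill();
  }
}

const ctx = document.querySelector("canvas").getContext("2d");
for (const prediction of predictions) {
  drawKeypoints(ctx, prediction.landmarks);
}
```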
Parameters for handpose.load()
`handpose.load()` takes a configuration object with the following properties (an example call follows the list):
- `maxContinuousChecks` - How many frames to go without running the bounding box detector. Defaults to `Infinity`. Set to a lower value if you want a safety net in case the hand-skeleton model produces consistently flawed predictions.
- `detectionConfidence` - Threshold for discarding a prediction. Defaults to 0.8.
- `iouThreshold` - A float representing the threshold for deciding whether boxes overlap too much in non-maximum suppression. Must be in [0, 1]. Defaults to 0.3.
- `scoreThreshold` - A threshold for deciding when to remove boxes based on score in non-maximum suppression. Defaults to 0.75.
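For example, loading the model with each option explicitly set to its documented default:

```js
const model = await handpose.load({
  maxContinuousChecks: Infinity, // frames to go without running the bounding box detector
  detectionConfidence: 0.8,      // discard predictions below this confidence
  iouThreshold: 0.3,             // non-maximum suppression overlap threshold
  scoreThreshold: 0.75           // non-maximum suppression score threshold
});
```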
Parameters for handpose.estimateHands()
- `input` - The image to analyze. Can be a tensor, a DOM image element, a video, or a canvas.
- `flipHorizontal` - Whether to flip/mirror the hand keypoints horizontally. Should be true for videos that are mirrored by default (e.g. webcams).
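For example, estimating hands from a mirrored webcam feed (this assumes `flipHorizontal` is passed as the second, positional argument, matching the parameter order above):

```js
// Inside an async function, as in the Usage example.
const video = document.querySelector("video");
// Webcam feeds are usually mirrored, so flip the keypoints back.
const predictions = await model.estimateHands(video, true);
```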