# Human hand gesture detection
Detect common gestures with Google's [mediapipe library](https://github.com/google-ai-edge/mediapipe) and a pre-trained model.

To run this notebook, you need the mediapipe library. The current version of mediapipe may be incompatible with recent releases of `numpy` and `opencv-python`, so downgrade both packages with pip before installing it:

```bash
pip uninstall numpy opencv-python
# Quote the version specifiers so the shell does not treat "<" as a redirect
pip install "numpy<2" "opencv-python<4.12"
```

Then install mediapipe:

```bash
pip install mediapipe
```
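
To confirm that the downgrade took effect, the installed versions can be compared against the upper bounds from the pip command above. This is a minimal sketch using only the standard library; the helper names `installed_version` and `version_tuple` are hypothetical, and the comparison only looks at the leading numeric parts of each version string.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.26.4' into (1, 26, 4); stop at the first non-numeric part."""
    parts = []
    for part in version.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


# Exclusive upper bounds, taken from the pip command above.
UPPER_BOUNDS = {"numpy": (2,), "opencv-python": (4, 12)}

for package, bound in UPPER_BOUNDS.items():
    version = installed_version(package)
    if version is None:
        print(f"{package}: not installed")
    elif version_tuple(version) < bound:
        print(f"{package} {version}: OK")
    else:
        print(f"{package} {version}: too new, expected < {'.'.join(map(str, bound))}")
```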

In [None]:
import mediapipe as mp
import cv2
import numpy as np
import matplotlib.pyplot as plt
import os

%matplotlib inline

from IPython.display import display, Image, clear_output
import time

from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from mediapipe import solutions
from mediapipe.framework.formats import landmark_pb2
mp.__version__

## Initialize model file

We will use a pre-trained model from mediapipe for gesture detection. Go to [Google Mediapipe Gesture recognizer](https://ai.google.dev/edge/mediapipe/solutions/vision/gesture_recognizer/index#models) and download the model file to `../models/`. Then run the cell below to set up the recognizer.

In [None]:
model_path = '../models/gesture_recognizer.task'
if os.path.exists(model_path):
    print(f'Model found at: {model_path}')
else:
    print(f'Model not found at: {model_path}')
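
If you prefer to fetch the model from code instead of downloading it by hand, a sketch along these lines may work. The URL below is an assumption and may change over time, so check the models page linked above for the current one. `ensure_model` is a hypothetical helper, not part of mediapipe.

```python
import os
import urllib.request

# Assumed download URL; verify it against the models page linked above.
MODEL_URL = ("https://storage.googleapis.com/mediapipe-models/gesture_recognizer/"
             "gesture_recognizer/float16/1/gesture_recognizer.task")


def ensure_model(path, url=MODEL_URL):
    """Download the model file to `path` unless it already exists."""
    if os.path.exists(path):
        return path
    # Create the target folder (e.g. ../models/) if it is missing.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, path)
    return path
```

Calling `ensure_model('../models/gesture_recognizer.task')` before the check above would then download the file only on the first run.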

### Question
What is a model file and what does pre-trained mean in this context?

## Create the task and set options
We only use the basic options for this demo. More options can be found in the [Mediapipe documentation](https://ai.google.dev/edge/mediapipe/solutions/vision/gesture_recognizer/python).

Check that the text output below shows no errors.

In [None]:
base_options = python.BaseOptions(model_asset_path=model_path)
options = vision.GestureRecognizerOptions(base_options=base_options)
recognizer = vision.GestureRecognizer.create_from_options(options)
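
As a starting point for experimenting with more than the basic options, the sketch below collects a few keyword arguments that `GestureRecognizerOptions` accepts according to the documentation linked above (`num_hands` and the three confidence thresholds). The helper names and the chosen values are illustrative assumptions, not recommendations.

```python
def recognizer_option_kwargs(num_hands=2, min_confidence=0.5):
    """Collect keyword arguments for vision.GestureRecognizerOptions."""
    return {
        "num_hands": num_hands,
        "min_hand_detection_confidence": min_confidence,
        "min_hand_presence_confidence": min_confidence,
        "min_tracking_confidence": min_confidence,
    }


def create_recognizer(model_path, **option_kwargs):
    """Build a GestureRecognizer; mediapipe is imported lazily inside the function."""
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    options = vision.GestureRecognizerOptions(
        base_options=python.BaseOptions(model_asset_path=model_path),
        **option_kwargs,
    )
    return vision.GestureRecognizer.create_from_options(options)
```

For example, `create_recognizer(model_path, **recognizer_option_kwargs(num_hands=2))` would build a recognizer that reports up to two hands per image.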

### Question
What kind of options could be set for the gesture recognizer?

## Test inference on an image
Load an image with a hand and run the cell below to see gesture detection results.

We use a helper function to draw the detected landmarks and the recognized gesture on the image. This function has been adapted from a blog article by Alice Heiman: [Recognize Hand Landmarks using Google MediaPipe and OpenCV](https://medium.com/@aliceheimanxyz/recognize-hand-landmarks-using-google-mediapipe-and-opencv-9ca0a052ce75)

In [None]:
MARGIN = 10  # pixels
FONT_SIZE = 1
FONT_THICKNESS = 1
HANDEDNESS_TEXT_COLOR = (255, 255, 255)  # white

# adapted from Alice Heiman (https://medium.com/@aliceheimanxyz/recognize-hand-landmarks-using-google-mediapipe-and-opencv-9ca0a052ce75)
def draw_landmarks_on_image(rgb_image, detection_result):
    # Draw the hand annotations on the image.
    hand_landmarks_list = detection_result.hand_landmarks
    handedness_list = detection_result.handedness
    gestures_list = detection_result.gestures
    annotated_image = np.copy(rgb_image)
    # Loop through the detected hands to visualize.
    for idx, hand_landmarks in enumerate(hand_landmarks_list):
        handedness = handedness_list[idx]
        # Top-ranked gesture for this hand; the result lists are ordered per detected hand.
        hand_gestures = gestures_list[idx] if idx < len(gestures_list) else []
        gesture_text = hand_gestures[0].category_name if hand_gestures else "No Gesture"
        # Draw the hand landmarks.
        hand_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
        hand_landmarks_proto.landmark.extend([
          landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z) for landmark in hand_landmarks
        ])
        solutions.drawing_utils.draw_landmarks(
          annotated_image,
          hand_landmarks_proto,
          solutions.hands.HAND_CONNECTIONS,
          solutions.drawing_styles.get_default_hand_landmarks_style(),
          solutions.drawing_styles.get_default_hand_connections_style())
        # Get the top left corner of the detected hand's bounding box.
        height, width, _ = annotated_image.shape
        x_coordinates = [landmark.x for landmark in hand_landmarks]
        y_coordinates = [landmark.y for landmark in hand_landmarks]
        text_x = int(min(x_coordinates) * width)
        text_y = int(min(y_coordinates) * height) - MARGIN
        # Draw handedness (left or right hand) and detected gesture on the image.
        cv2.putText(annotated_image, f"{handedness[0].category_name} {gesture_text}",
                    (text_x, text_y), cv2.FONT_HERSHEY_DUPLEX,
                    FONT_SIZE, HANDEDNESS_TEXT_COLOR, FONT_THICKNESS, cv2.LINE_AA)
    return annotated_image

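# Load an image and run inference
cv_image = cv2.imread('../images/hand.jpg')  # OpenCV loads images in BGR channel order
if cv_image is None:
    raise FileNotFoundError("Could not read '../images/hand.jpg'")
# ImageFormat.SRGB expects RGB data, so convert before handing the image to MediaPipe
rgb_image = cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB)
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_image)
recognition_result = recognizer.recognize(mp_image)
# Draw the results on the RGB image
annotated_image = draw_landmarks_on_image(rgb_image, recognition_result)
# Show the annotated image (already RGB, as matplotlib expects)
plt.imshow(annotated_image)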
plt.axis('off')
plt.show()
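
Besides the annotated image, it can help to look at the raw result. The sketch below assumes the result exposes `gestures` and `handedness` as per-hand lists of categories with `category_name` and `score` attributes, as used in the drawing helper above. `summarize_result` is a hypothetical helper, and the stand-in object contains made-up values so that the example runs without a model.

```python
from types import SimpleNamespace


def summarize_result(result):
    """Return one (handedness, gesture, score) tuple per detected hand."""
    rows = []
    for hand_gestures, hand_handedness in zip(result.gestures, result.handedness):
        top_gesture = hand_gestures[0]  # categories are ranked, best first
        rows.append((hand_handedness[0].category_name,
                     top_gesture.category_name,
                     round(top_gesture.score, 2)))
    return rows


# Stand-in with the same attribute names as the recognizer result (made-up values).
fake_result = SimpleNamespace(
    gestures=[[SimpleNamespace(category_name="Thumb_Up", score=0.93)]],
    handedness=[[SimpleNamespace(category_name="Right", score=0.98)]],
)
print(summarize_result(fake_result))
```

With a real result, `summarize_result(recognition_result)` would list each detected hand together with its top gesture and score.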


### Question
What needs to be considered to detect the handedness (left or right) correctly?

## Run Inference on a Webcam Stream

Connect your webcam and run the cell below to see gesture detection in real-time. There are seven gestures that can be recognized:

- Open Palm
- Fist
- Thumbs Up
- Thumbs Down
- Victory
- Index Pointing Up
- I love you

Can you make all of them? Do they get recognized correctly?
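
To keep track of which gestures have already been recognized, a small checklist can help. The category names on the left are the labels the pre-trained model is assumed to return (check them against the model documentation, since they may differ between versions); `update_seen` and `missing_gestures` are hypothetical helpers.

```python
# Assumed mapping from the model's category names to the gestures listed above.
GESTURE_LABELS = {
    "Open_Palm": "Open Palm",
    "Closed_Fist": "Fist",
    "Thumb_Up": "Thumbs Up",
    "Thumb_Down": "Thumbs Down",
    "Victory": "Victory",
    "Pointing_Up": "Index Pointing Up",
    "ILoveYou": "I love you",
}


def update_seen(seen, category_name):
    """Add the gesture to `seen` if it is one of the seven known gestures."""
    if category_name in GESTURE_LABELS:
        seen.add(GESTURE_LABELS[category_name])
    return seen


def missing_gestures(seen):
    """Return the gestures that have not been recognized yet, in list order."""
    return [label for label in GESTURE_LABELS.values() if label not in seen]
```

Inside the webcam loop, you could call `update_seen(seen, category_name)` with the top gesture of each frame and print `missing_gestures(seen)` once the loop stops.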

In [None]:
# Capture and display live frames from the default webcam inline in Jupyter

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Cannot open webcam (VideoCapture(0) failed)")

start = time.time()
try:
    while True:
        ret, frame = cap.read()
        if not ret:
            print("Failed to grab frame")
            break
        timestamp_ms = int((time.time() - start) * 1000)
        # Convert the BGR frame from OpenCV to RGB and wrap it in a MediaPipe Image object.
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_frame)
        recognition_result = recognizer.recognize(mp_image)
        annotated_image = draw_landmarks_on_image(frame, recognition_result)
        # Encode as JPEG and display inline (fast and avoids creating new matplotlib figures)
        _, buf = cv2.imencode('.jpg', annotated_image)
        display(Image(data=buf.tobytes()))
        clear_output(wait=True)
        time.sleep(0.03)  # ~30 FPS-ish; adjust as needed
except KeyboardInterrupt:
    # Stop the loop by interrupting the kernel (stop button or Kernel > Interrupt)
    pass
finally:
    cap.release()
    clear_output()
    # For debugging only:  # print(recognition_result)
    print("Webcam stopped and released.")
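
The loop above treats every frame as an independent image. MediaPipe also offers a video running mode in which the recognizer is called with `recognize_for_video(image, timestamp_ms)` and expects increasing timestamps. The following is a sketch under that assumption, not a drop-in replacement; `FrameClock` and `create_video_recognizer` are hypothetical helpers.

```python
import time


class FrameClock:
    """Produce strictly increasing millisecond timestamps for video frames."""

    def __init__(self):
        self._start = time.monotonic()
        self._last = -1

    def next_ms(self):
        now = int((time.monotonic() - self._start) * 1000)
        # Two frames within the same millisecond would repeat a timestamp; bump it.
        self._last = max(now, self._last + 1)
        return self._last


def create_video_recognizer(model_path):
    """Build a recognizer in VIDEO running mode; mediapipe is imported lazily."""
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    options = vision.GestureRecognizerOptions(
        base_options=python.BaseOptions(model_asset_path=model_path),
        running_mode=vision.RunningMode.VIDEO,
    )
    return vision.GestureRecognizer.create_from_options(options)
```

In the loop, `recognizer.recognize(mp_image)` would then become `video_recognizer.recognize_for_video(mp_image, clock.next_ms())`, with `clock = FrameClock()` created once before the loop starts.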





## Task

Explore further options for mediapipe gesture recognition and modify the code to use them, for example to track multiple hands. You can also try other mediapipe tasks such as face detection. Refer to the [Mediapipe documentation](https://ai.google.dev/edge/mediapipe/solutions/guide) for more details.