Lab 6: Seeing is Believing

Goal: Use computer vision to enable Pupper to follow a person by processing (fisheye) camera input!

In this lab, you’ll use an object detection model to detect a person in Pupper’s field of view and control its movement to follow that person. You’ll implement a simple tracking and searching behavior using a state machine, allowing Pupper to maintain focus on the target when visible and initiate a search if the target moves out of view.

Note: The object detector used in this lab can detect multiple types of objects. For simplicity, the detection array has already been filtered to include only detections of people, so Pupper will respond solely to human targets.

Lab slides

Lab document

We have installed an Arducam fisheye camera with a Raspberry Pi AI Kit module with the HAILO-8L chip to accelerate CV computations on your Puppers. Feel free to check out the specs here: https://github.com/hailo-ai/hailort.

Step 0. Setup

  1. Prepare the Environment

    Your Puppers have been reflashed with a new OS for the AI labs (just like last week)! This was done following the documentation here. However, there are a few additional libraries you’ll need to install for this lab. Follow the instructions below:

  2. Install Foxglove

    Install Foxglove locally on your own computer. Don’t use the browser version. Foxglove is a visualization tool for seeing ROS information live from the robot. In this lab you’ll use it to see Pupper’s camera feed and object detections in real time, which is crucial for understanding if your code is working correctly.

  3. Install Dependencies

    For this lab, you’ll need to install the following dependencies (turn off the robot stack if it is running: sudo systemctl stop robot):

pip install supervision
pip install loguru
cd ~/pupperv3-monorepo/ros2_ws/src/common
git clone https://github.com/ros-perception/vision_msgs.git
cd ~/pupperv3-monorepo/ros2_ws
bash build.sh
  1. Clone the Starter Code

    Note: The code repo refers to lab 7 since the vision lab was lab 7 last quarter. For this offering, we decided to swap the two labs, but we will still use the same Git repo.

    Clone the starter repository (the lab_7_2024 GitHub repo) on Pupper: git clone https://github.com/cs123-stanford/lab_7_2024.git

  2. Start the Necessary Processes

    Initialize the system with these commands:

    cd ~/lab_7_2024
    ./run.sh
    

    Note

    Keep this process running in a separate terminal whenever you are testing your code (including visualization in Foxglove!), as it launches the nodes for image publishing, object detection, Foxglove, and the RL controller.

  3. Connect Foxglove to Pupper

    1. Connect the Pi to your laptop with an Ethernet cable/adapter.

    2. SSH into Pupper with special options: ssh -A -L 8765:localhost:8765 pi@pupper.local

    3. Open Foxglove, click Open Connection, leave the default websocket URL as is, and click Open

      Figure: Connecting Foxglove to the Raspberry Pi.

      If you are having trouble connecting, try turning internet sharing off, enabling all options, and then turning it back on. You can also SSH and visualize topics from Pupper over wifi, but this is not recommended as it is slower and less reliable.

    4. Visualize the “annotated_image” topic with people detections by selecting the gear icon on the top right panel that says /camera/image_raw to configure it. Under General, set the topic to /annotated_image and the calibration to None.

    1. Go fullscreen: Click the icon with the 3 dots in the top right corner of the window and select “fullscreen”.

    2. Check detections: You should see a camera feed with bounding boxes around detected people. If you don’t see any detections, ask a TA. If the image is upside down, edit this line in hailo_detection.py to flip the image. If the image is blurry, ask a TA to help you adjust the lens.

  4. Review the Starter Code

    Open lab_7.py and take a look at the code structure. Notice the two main callback functions:

    • detection_callback: Triggered whenever a new detection message is received. This is where you’ll process detections and determine the target’s location.

    • timer_callback: Runs periodically to update Pupper’s behavior based on the current state and target position. This is where you’ll implement the control logic for tracking or searching.

    Most of your work will happen in these callbacks, where you’ll add code to process detections and control Pupper based on target visibility.

Step 1. Object Detection

In this section, you’ll work on extracting and processing target position information from the camera feed.

  1. Inspect Detection Messages

    Add a breakpoint in detection_callback to examine the detections message (use breakpoint() to open pdb). Observe the structure of each detection, noting how the position of each bounding box is stored. Check the ROS Message Documentation to understand the fields of the message.

  2. Extract Bounding Box Positions

    Print the x coordinate of each detected bounding box to see where each detected object appears within the image. Use the message documentation to work out which field of each detection holds x (reading message definitions to locate fields is a valuable ROS skill).

DELIVERABLE: How do you get the x value of a detection from msg? Write out the full line of Python code.

  1. Normalize X Position

    Convert the x position to a range between -1.0 and 1.0 using the IMAGE_WIDTH constant, with 0 representing the center of the image. This will help you interpret the target’s position more easily. Alternatively, you can look at the image extracted from the fisheye camera to customize the normalization! That may yield a better result.

  2. Verify Position

    Print the normalized x value and watch it change as you move in front of the camera, comparing it against the image in Foxglove. Make sure the edges of the frame correspond to the ends of the normalized range and that your value never goes beyond -1.0 or 1.0.

  3. Identify the Most Centered Bounding Box

    Find the bounding box that is closest to the center of the image (i.e., with a normalized x value nearest to 0). This will be your target, and you should save its x position in a member variable for use in the control logic. Hint: the msg argument of detection_callback contains a list of detections. We take a naive approach and track only the most central of the detected objects.

  4. Track the Time of Last Detection

    In detection_callback, update a member variable to store the time of the most recent detection. This variable will later be used in timer_callback to determine whether to switch Pupper’s state to “searching” if too much time has passed without a detection.
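
The four steps above can be sketched in plain Python, without ROS, roughly as follows. This is a minimal sketch under stated assumptions, not the starter code's implementation: the value 700 for IMAGE_WIDTH, the helper names normalize_x and most_centered, the TargetMemory class, and the assumption that pixel x runs from 0 at the left edge to IMAGE_WIDTH at the right edge are all illustrative. How to read x out of msg is left to you.

```python
import time

IMAGE_WIDTH = 700  # assumed value for illustration; use the constant from lab_7.py


def normalize_x(x_pixels, image_width=IMAGE_WIDTH):
    """Map a pixel x in [0, image_width] to [-1.0, 1.0], with 0 at the image center."""
    half_width = image_width / 2.0
    normalized = (x_pixels - half_width) / half_width
    # Clamp in case a box center is reported slightly outside the frame.
    return max(-1.0, min(1.0, normalized))


def most_centered(normalized_xs):
    """Return the normalized x closest to 0, or None when there are no detections."""
    if not normalized_xs:
        return None
    return min(normalized_xs, key=abs)


class TargetMemory:
    """Hypothetical holder for the member variables described in the steps above."""

    def __init__(self):
        self.target_x = 0.0
        self.last_detection_time = None

    def update(self, pixel_xs, now=None):
        """Call once per detection message with the box-center x values in pixels."""
        target = most_centered([normalize_x(x) for x in pixel_xs])
        if target is None:
            return  # nothing seen: keep the old target and timestamp
        self.target_x = target
        self.last_detection_time = time.monotonic() if now is None else now


memory = TargetMemory()
memory.update([70.0, 420.0, 630.0], now=12.5)
print(memory.target_x, memory.last_detection_time)
```

Keeping the old target_x when no one is detected is a deliberate choice here: it preserves the side on which the target was last seen, which is useful later when searching. Inside the node, the timestamp would more likely come from the node's clock (for example self.get_clock().now() in rclpy) than from time.monotonic().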

DELIVERABLE: Take a video of you moving across the frame (left/right, up/down), and show the numbers changing within the normalization range. Upload this video with your submission to Gradescope.

DELIVERABLE: In this implementation, Pupper always tracks the most central object in the camera frame. However, there are cases where Pupper should keep tracking the same person even as they move toward the edge of the frame, and the most central detection may not always be that person. Can you come up with another method that accomplishes this? How would you make sure that you are tracking the same object (the index of an object in the detections array may change from frame to frame)? Answer these questions in your lab document, and upload a video of your implementation.

Step 2. Visual Servoing

Now that you can detect and locate the target, you’ll implement a control mechanism to keep Pupper oriented toward it. (Implement in timer_callback when state == TRACK)

  1. Proportional Control

    Implement a proportional controller to calculate a yaw velocity command based on the target’s normalized x position. Define a proportional gain constant, which controls how quickly Pupper turns to center the target.

  2. Test on Stand

    Place Pupper on a stand and observe how it adjusts its yaw as you move left and right in front of the camera. It should aim to keep you centered in its view.

  3. Tune on Floor

    Place Pupper on the floor and adjust the proportional gain for smooth turning. Aim to have it follow you naturally as you move around.
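
As a rough illustration of the proportional controller described above, the sketch below turns a normalized x position into a yaw rate. The gain KP, the limit MAX_YAW_RATE, and the function name yaw_command are assumptions made for this example, not values from the starter code. The minus sign assumes the usual ROS convention that positive yaw is a counterclockwise (left) turn and that positive x is the right side of the image; check the sign on the stand before trusting it.

```python
KP = 1.5            # assumed starting gain; tune on the stand, then on the floor
MAX_YAW_RATE = 2.0  # assumed safety limit in rad/s, not a Pupper specification


def yaw_command(target_x, kp=KP, max_rate=MAX_YAW_RATE):
    """Proportional control: yaw rate proportional to the target's offset from center.

    target_x is the normalized position in [-1, 1]. A target on the right
    (target_x > 0) produces a negative yaw rate, i.e. a right turn.
    """
    command = -kp * target_x
    # Saturate so a large gain cannot request an arbitrarily fast spin.
    return max(-max_rate, min(max_rate, command))


for x in (-1.0, 0.0, 0.5):
    print(x, yaw_command(x))
```

A larger gain re-centers the target faster but tends to overshoot and oscillate; the saturation limit keeps an aggressive gain from commanding a spin faster than you are comfortable with.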

DELIVERABLE: Tune the gain so that Pupper is able to keep up with the normal pace of a person walking. How did you go about tuning the gain for smooth turning? Take a video and upload to Gradescope.

Step 3. Search and Track

Here, you’ll add a search behavior to help Pupper look for you if it loses sight of the target, allowing it to return to tracking when you’re back in view. You’ll also command a forward velocity so that the robot follows when you are detected.

DELIVERABLE: Draw a state machine diagram describing how Pupper should transition between the SEARCH and TRACK states. In particular, highlight what makes Pupper transition between the two states and list all the cases to make the diagram comprehensive. Upload an image to the Gradescope submission.

  1. Search Mode (Implement in timer_callback when state == SEARCH)

    Set a constant yaw velocity to make Pupper rotate in a specific direction (left or right) based on where it last saw the target.

  2. Implement State Transitions

    • Track to Search Transition

      In timer_callback, use the member variable for the time of the last detection to check how much time has passed since Pupper last saw the target. If this time exceeds a defined threshold, switch to the SEARCH state.

    • Search to Track Transition

      If the most recent detection falls within the timeout period (that is, Pupper has seen the target again), switch back to the TRACK state.

    • Test Transitions

      Place Pupper on the floor and ensure that it enters search mode when the target is out of view, then resumes tracking when the target reappears.

  3. Move Forward While Tracking

    When in TRACK mode, set a positive linear velocity to make Pupper advance toward the target.

  4. Tune Constants

    Experiment with different values for the proportional gain, timeout threshold, search yaw velocity, and forward velocity to make Pupper’s behavior smooth and responsive.
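
The search and track logic above can be sketched as two small pure functions: one that picks the state from the time since the last detection, and one that picks the velocity command for that state. This is only a sketch; the State enum, the function names, and every constant below are assumptions for illustration and may not match the starter code. The sign convention (positive yaw turns left, positive x is the right side of the image) is also an assumption to verify on the robot.

```python
from enum import Enum


class State(Enum):
    SEARCH = 0
    TRACK = 1


TIMEOUT = 2.0          # seconds without a detection before searching (assumed)
SEARCH_YAW_RATE = 1.0  # rad/s while searching (assumed)
FORWARD_SPEED = 0.3    # m/s while tracking (assumed)
KP = 1.5               # proportional gain (assumed)


def next_state(now, last_detection_time, timeout=TIMEOUT):
    """TRACK if a detection arrived within the timeout, otherwise SEARCH."""
    if last_detection_time is None:
        return State.SEARCH  # nothing has ever been detected
    if now - last_detection_time > timeout:
        return State.SEARCH
    return State.TRACK


def velocity_command(state, target_x):
    """Return (forward velocity, yaw rate) for the given state."""
    if state is State.TRACK:
        # Walk forward while steering proportionally toward the target.
        return FORWARD_SPEED, -KP * target_x
    # SEARCH: stand in place and spin toward the side where the target was last seen.
    direction = -1.0 if target_x > 0 else 1.0
    return 0.0, direction * SEARCH_YAW_RATE


state = next_state(now=10.0, last_detection_time=7.5)
print(state, velocity_command(state, target_x=0.4))
```

Because the state is recomputed from the timestamp on every timer tick, both transitions (TRACK to SEARCH and SEARCH to TRACK) fall out of the same comparison, which keeps the logic easy to check against your state machine diagram.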

DELIVERABLE: Upload a video of Pupper tracking a person using the camera. Describe some of the deficiencies in the current implementation and what you think may help fix them.

By the end of this lab, you will have implemented a basic computer vision-based tracking system that enables Pupper to autonomously follow a person. The simple state machine will allow Pupper to handle target loss by searching for the target, making the tracking behavior more robust. Experiment with tuning to optimize Pupper’s performance. Enjoy watching Pupper follow you around!