Capstone Project: Autonomous Humanoid with Conversational AI
Project Overview
Your final project synthesizes everything you've learned to build a fully autonomous humanoid robot that can:
- Understand natural language commands via speech
- Plan complex tasks using AI reasoning
- Navigate autonomously in a home environment
- Manipulate objects safely
- Interact naturally with humans
Success Criteria
Your robot must complete this scenario:
Scenario: User is sitting in the living room and says: "I'm feeling cold and hungry. Can you help?"
Expected Behavior:
- Robot understands the compound request
- Plans to fetch blanket AND snack
- Navigates to bedroom, retrieves blanket
- Navigates to kitchen, retrieves snack
- Returns to user, hands over items
- Confirms verbally: "Here's a blanket and a snack. Anything else?"
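Before writing any code, it helps to pin down what a "plan" for this scenario looks like as data. The sketch below shows one plausible representation; the action names (`navigate`, `pick`, `place`, `say`) are a hypothetical vocabulary, not a fixed API — your planner's schema may differ.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary for this course's scenario.
@dataclass
class Action:
    name: str        # e.g. "navigate", "pick", "place", "say"
    target: str      # room, object, or utterance

# One plausible plan for "I'm feeling cold and hungry. Can you help?"
plan = [
    Action("navigate", "bedroom"),
    Action("pick", "blanket"),
    Action("navigate", "kitchen"),
    Action("pick", "snack"),
    Action("navigate", "living_room"),
    Action("place", "blanket"),
    Action("place", "snack"),
    Action("say", "Here's a blanket and a snack. Anything else?"),
]

print(len(plan))  # 8 steps
```

Keeping the plan as plain data like this makes it easy to log, validate, and replay each step independently of the LLM that produced it.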
System Architecture
┌─────────────────────────────────────────────────────────┐
│ User Interface │
│ (Speech Input/Output) │
└───────────────────────────┬─────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────┐
│ Cognitive Layer (LLM) │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Whisper │──▶│ GPT-4 │──▶│ Planner │ │
│ │ (Speech) │ │ (Reasoning) │ │ (Actions) │ │
│ └─────────────┘ └──────────────┘ └───────────┘ │
└───────────────────────────┬─────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────┐
│ Perception Layer (Isaac ROS) │
│ ┌────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ cuVSLAM │ │ nvBlox │ │ YOLOv8 │ │
│ │ (Odometry) │ │ (3D Map) │ │ (Objects) │ │
│ └────────────┘ └──────────────┘ └──────────────┘ │
└───────────────────────────┬─────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────┐
│ Control Layer (ROS 2) │
│ ┌────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Nav2 │ │ MoveIt 2 │ │ Joint Ctrl │ │
│ │ (Motion) │ │(Manipulation)│ │ (Low-lvl) │ │
│ └────────────┘ └──────────────┘ └──────────────┘ │
└───────────────────────────┬─────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────┐
│ Hardware Layer (Jetson/Motors) │
│ Sensors: RealSense, LiDAR, IMU │
│ Actuators: 30+ servo motors │
└─────────────────────────────────────────────────────────┘
Milestones (8 Weeks)
Week 1-2: Foundation & Navigation
Goal: Robot can navigate a mapped environment
Tasks:
- Import humanoid URDF into Isaac Sim
- Set up ROS 2 bridge
- Configure cuVSLAM for odometry
- Build 3D map with nvBlox
- Deploy Nav2 for path planning
Deliverable: Video showing robot navigating from living room → kitchen → bedroom
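Nav2 does the real path planning on the nvBlox costmap, but the core idea is worth seeing in isolation. This toy breadth-first search on a 0/1 occupancy grid is a stand-in sketch only; Nav2 itself uses costmaps and more capable planners.

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest collision-free path on a 0/1 occupancy grid (0 = free).

    Toy illustration of global path planning; Nav2 replaces this with
    costmap-based planners in the real stack.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable

# "Living room" (0,0) to "kitchen" (2,3), routing around a wall.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = bfs_path(grid, (0, 0), (2, 3))
print(path)
```

The returned path hugs the free top row and drops down the open column — exactly the detour a costmap planner would take around the wall.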

Week 3-4: Perception & Object Detection
Goal: Robot can detect and localize objects
Tasks:
- Train YOLOv8 on household objects (cup, blanket, remote, etc.)
- Integrate camera feed with ROS 2
- Publish detected objects with 3D poses
- Create a semantic map (object locations)
Deliverable: Screenshot showing labeled objects in RViz with 3D bounding boxes
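The semantic map from the last task can start very simply: a lookup from object label to its last-seen pose in the map frame. This sketch assumes a hypothetical log format (label plus an `(x, y, z)` position in metres); a real system would fuse repeated detections and track uncertainty rather than keep only the latest pose.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str        # YOLOv8 class name, e.g. "blanket"
    position: tuple   # (x, y, z) in the map frame, metres

class SemanticMap:
    """Minimal semantic map: remember where each object was last seen."""

    def __init__(self):
        self.objects = {}

    def update(self, det: Detection):
        # Overwrite with the most recent observation of this label.
        self.objects[det.label] = det.position

    def locate(self, label: str):
        # Returns None if the object has never been seen.
        return self.objects.get(label)

smap = SemanticMap()
smap.update(Detection("blanket", (4.2, 1.0, 0.5)))   # seen in bedroom
smap.update(Detection("snack", (-2.1, 3.3, 0.9)))    # seen in kitchen
print(smap.locate("blanket"))  # (4.2, 1.0, 0.5)
```

Even this thin layer lets the planner answer "where is the blanket?" without re-running perception, which is what makes the fetch plan in Week 5-6 possible.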
Week 5-6: VLA Integration
Goal: Robot understands natural language and plans tasks
Tasks:
- Integrate Whisper for speech-to-text
- Connect GPT-4 for task planning
- Build action executor (translate plans to ROS)
- Test compound requests
Deliverable: Terminal log showing complete plan execution
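The action executor is where LLM output meets ROS. A robust pattern is to prompt GPT-4 to reply in a fixed JSON schema, then validate every step before anything moves. The handlers below are hypothetical stand-ins that return strings; in the real stack each would call a ROS 2 action client (Nav2, MoveIt 2, TTS).

```python
import json

# Hypothetical handlers; real ones would call ROS 2 action clients.
def navigate(target): return f"navigating to {target}"
def pick(target): return f"picking {target}"
def say(target): return f"saying: {target}"

HANDLERS = {"navigate": navigate, "pick": pick, "say": say}

def execute_plan(llm_output: str):
    """Validate and run a JSON plan emitted by the LLM.

    Unknown actions are rejected outright rather than guessed at --
    a hallucinated step should fail loudly, not move a motor.
    """
    plan = json.loads(llm_output)
    results = []
    for step in plan["steps"]:
        handler = HANDLERS.get(step["action"])
        if handler is None:
            raise ValueError(f"unknown action: {step['action']}")
        results.append(handler(step["target"]))
    return results

reply = ('{"steps": [{"action": "navigate", "target": "kitchen"},'
         ' {"action": "pick", "target": "snack"}]}')
print(execute_plan(reply))
```

Constraining the model to a schema you can `json.loads` and whitelist is what makes the "terminal log showing complete plan execution" deliverable reproducible.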
Week 7: Manipulation
Goal: Robot can pick and place objects
Tasks:
- Set up MoveIt 2 for arm control
- Compute inverse kinematics
- Plan grasp poses
- Execute pick-and-place
Deliverable: Video of robot grasping a cup in simulation
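MoveIt 2 solves the full-arm inverse kinematics numerically, but the geometry is easiest to see on the simplest case: a 2-link planar arm, which has a closed-form solution. This is an illustrative sketch of the math, not the humanoid's actual IK.

```python
from math import acos, atan2, cos, sin, hypot

def two_link_ik(x, y, l1, l2):
    """Closed-form IK for a 2-link planar arm (elbow-down solution).

    Returns (shoulder, elbow) joint angles in radians, or None if the
    target is outside the arm's reachable annulus.
    """
    r = hypot(x, y)
    if r > l1 + l2 or r < abs(l1 - l2):
        return None  # target out of reach
    # Law of cosines gives the elbow angle; clamp for float safety.
    cos_elbow = (r * r - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    elbow = acos(max(-1.0, min(1.0, cos_elbow)))
    shoulder = atan2(y, x) - atan2(l2 * sin(elbow), l1 + l2 * cos(elbow))
    return shoulder, elbow

# Sanity-check by running forward kinematics on the solution.
th1, th2 = two_link_ik(0.3, 0.4, 0.3, 0.3)
fx = 0.3 * cos(th1) + 0.3 * cos(th1 + th2)
fy = 0.3 * sin(th1) + 0.3 * sin(th1 + th2)
print(round(fx, 3), round(fy, 3))  # recovers ~(0.3, 0.4)
```

The forward-kinematics round-trip at the end is the same validation habit you should apply to MoveIt 2 solutions before commanding the real arm.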
Week 8: Integration & Testing
Goal: Complete end-to-end demo
Tasks:
- Integrate all subsystems
- Create demo environment in Isaac Sim (living room + kitchen)
- Run full scenario 10 times
- Measure success rate
- Debug failures
- Record final demo video
Metrics:
| Metric | Target |
|---|---|
| Navigation success | >90% |
| Object detection recall | >85% |
| Grasp success | >70% |
| Full task completion | >60% |
| Average task time | < 5 min |
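Measuring the success rate over 10 runs is simple bookkeeping, but it is worth scripting so every team reports the same numbers. The sketch below assumes a hypothetical per-run log format (pass/fail flags plus a duration in seconds); adapt the keys to whatever your test harness actually records.

```python
def summarize_trials(trials):
    """Aggregate Week-8 trial runs into the metrics table's quantities.

    `trials` is a list of dicts, one per run, with boolean pass flags
    and a duration in seconds (hypothetical log format).
    """
    n = len(trials)
    rate = lambda key: sum(t[key] for t in trials) / n
    return {
        "navigation_success": rate("nav_ok"),
        "full_task_completion": rate("task_ok"),
        "avg_time_s": sum(t["duration_s"] for t in trials) / n,
    }

trials = [
    {"nav_ok": True,  "task_ok": True,  "duration_s": 240},
    {"nav_ok": True,  "task_ok": False, "duration_s": 310},
    {"nav_ok": True,  "task_ok": True,  "duration_s": 275},
    {"nav_ok": False, "task_ok": False, "duration_s": 330},
]
print(summarize_trials(trials))
# {'navigation_success': 0.75, 'full_task_completion': 0.5, 'avg_time_s': 288.75}
```

With 10 real runs, the same function tells you directly whether you cleared the >90% navigation and >60% full-task targets.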
Deliverable:
- 3-minute demo video
- Written report (5 pages)
- Code repository link
Evaluation Rubric (100 points)
Technical Implementation (60 points)
- Navigation (15 pts)
  - Collision-free path planning
  - Dynamic obstacle avoidance
  - Accurate localization
- Perception (15 pts)
  - Object detection accuracy
  - 3D pose estimation
  - Semantic understanding
- VLA Integration (15 pts)
  - Speech recognition quality
  - Task planning coherence
  - Natural language understanding
- Manipulation (15 pts)
  - Grasp stability
  - Motion planning smoothness
  - Safety (no collisions)
System Design (20 points)
- Architecture (10 pts)
  - Modular design
  - Clean interfaces
  - Error handling
- Code Quality (10 pts)
  - Documentation
  - Readability
  - ROS 2 best practices
Documentation & Presentation (20 points)
- Report (10 pts)
  - Clear explanation of approach
  - Analysis of results
  - Discussion of limitations
- Demo Video (10 pts)
  - Shows full scenario
  - Highlights key features
  - Professional quality
Getting Started
1. Fork the Template
git clone https://github.com/YourUsername/humanoid-capstone-template
cd humanoid-capstone-template
2. Set Up Environment
# Install dependencies
sudo apt install ros-humble-isaac-ros-visual-slam ros-humble-isaac-ros-nvblox
pip install openai openai-whisper ultralytics
# Build workspace
cd ~/humanoid_ws
colcon build
source install/setup.bash
3. Run Tests
# Test navigation
ros2 launch humanoid_bringup test_navigation.launch.py
# Test object detection
ros2 launch humanoid_perception test_detection.launch.py
# Test VLA
ros2 launch humanoid_vla test_planning.launch.py
Support
- Office Hours: Thursdays 4-6 PM
- Discussion Forum: GitHub Discussions
- Slack: #capstone-help
Good luck building the future of robotics! 🚀
Submission
Due: Week 8, Friday 11:59 PM
Format:
- Code repository (GitHub)
- Demo video (YouTube/Vimeo)
- Written report (PDF)
Submit via: Course portal