Capstone Project: Autonomous Humanoid with Conversational AI
Project Overview
Your final project synthesizes everything you've learned to build a fully autonomous humanoid robot that can:
- Understand natural language commands via speech
- Plan complex tasks using AI reasoning
- Navigate autonomously in a home environment
- Manipulate objects safely
- Interact naturally with humans
Success Criteria
Your robot must complete this scenario:
Scenario: User is sitting in the living room and says: "I'm feeling cold and hungry. Can you help?"
Expected Behavior:
- Robot understands the compound request
- Plans to fetch blanket AND snack
- Navigates to bedroom, retrieves blanket
- Navigates to kitchen, retrieves snack
- Returns to user, hands over items
- Confirms verbally: "Here's a blanket and a snack. Anything else?"
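Before writing any code, it helps to pin down what a "plan" for this scenario looks like as data. The sketch below shows one plausible representation; the action names (`navigate`, `pick`, `place`, `say`) are a hypothetical vocabulary, not a fixed API — your planner's schema may differ.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary for this course's scenario.
@dataclass
class Action:
    name: str        # e.g. "navigate", "pick", "place", "say"
    target: str      # room, object, or utterance

# One plausible plan for "I'm feeling cold and hungry. Can you help?"
plan = [
    Action("navigate", "bedroom"),
    Action("pick", "blanket"),
    Action("navigate", "kitchen"),
    Action("pick", "snack"),
    Action("navigate", "living_room"),
    Action("place", "blanket"),
    Action("place", "snack"),
    Action("say", "Here's a blanket and a snack. Anything else?"),
]

print(len(plan))  # 8 steps
```

Keeping the plan as plain data like this makes it easy to log, validate, and replay each step independently of the LLM that produced it.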
System Architecture
┌─────────────────────────────────────────────────────────┐
│ User Interface │
│ (Speech Input/Output) │
└───────────────────────────┬─────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────┐
│ Cognitive Layer (LLM) │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Whisper │──▶│ GPT-4 │──▶│ Planner │ │
│ │ (Speech) │ │ (Reasoning) │ │ (Actions) │ │
│ └─────────────┘ └──────────────┘ └───────────┘ │
└───────────────────────────┬─────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────┐
│ Perception Layer (Isaac ROS) │
│ ┌────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ cuVSLAM │ │ nvBlox │ │ YOLOv8 │ │
│ │ (Odometry) │ │ (3D Map) │ │ (Objects) │ │
│ └────────────┘ └──────────────┘ └──────────────┘ │
└───────────────────────────┬─────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────┐
│ Control Layer (ROS 2) │
│ ┌────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Nav2 │ │ MoveIt 2 │ │ Joint Ctrl │ │
│ │ (Motion) │ │(Manipulation)│ │ (Low-lvl) │ │
│ └────────────┘ └──────────────┘ └──────────────┘ │
└───────────────────────────┬─────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────┐
│ Hardware Layer (Jetson/Motors) │
│ Sensors: RealSense, LiDAR, IMU │
│ Actuators: 30+ servo motors │
└─────────────────────────────────────────────────────────┘
Milestones (8 Weeks)
Week 1-2: Foundation & Navigation
Goal: Robot can navigate a mapped environment
Tasks:
- Import humanoid URDF into Isaac Sim
- Set up ROS 2 bridge
- Configure cuVSLAM for odometry
- Build 3D map with nvBlox
- Deploy Nav2 for path planning
Deliverable: Video showing robot navigating from living room → kitchen → bedroom
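Nav2 does the real path planning on the nvBlox costmap, but the core idea is worth seeing in isolation. This toy breadth-first search on a 0/1 occupancy grid is a stand-in sketch only; Nav2 itself uses costmaps and more capable planners.

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest collision-free path on a 0/1 occupancy grid (0 = free).

    Toy illustration of global path planning; Nav2 replaces this with
    costmap-based planners in the real stack.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable

# "Living room" (0,0) to "kitchen" (2,3), routing around a wall.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = bfs_path(grid, (0, 0), (2, 3))
print(path)
```

The returned path hugs the free top row and drops down the open column — exactly the detour a costmap planner would take around the wall.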

Week 3-4: Perception & Object Detection
Goal: Robot can detect and localize objects
Tasks:
- Train YOLOv8 on household objects (cup, blanket, remote, etc.)
- Integrate camera feed with ROS 2
- Publish detected objects with 3D poses
- Create a semantic map (object locations)
Deliverable: Screenshot showing labeled objects in RViz with 3D bounding boxes
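The semantic map from the last task can start very simply: a lookup from object label to its last-seen pose in the map frame. This sketch assumes a hypothetical log format (label plus an `(x, y, z)` position in metres); a real system would fuse repeated detections and track uncertainty rather than keep only the latest pose.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str        # YOLOv8 class name, e.g. "blanket"
    position: tuple   # (x, y, z) in the map frame, metres

class SemanticMap:
    """Minimal semantic map: remember where each object was last seen."""

    def __init__(self):
        self.objects = {}

    def update(self, det: Detection):
        # Overwrite with the most recent observation of this label.
        self.objects[det.label] = det.position

    def locate(self, label: str):
        # Returns None if the object has never been seen.
        return self.objects.get(label)

smap = SemanticMap()
smap.update(Detection("blanket", (4.2, 1.0, 0.5)))   # seen in bedroom
smap.update(Detection("snack", (-2.1, 3.3, 0.9)))    # seen in kitchen
print(smap.locate("blanket"))  # (4.2, 1.0, 0.5)
```

Even this thin layer lets the planner answer "where is the blanket?" without re-running perception, which is what makes the fetch plan in Week 5-6 possible.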
Week 5-6: VLA Integration
Goal: Robot understands natural language and plans tasks
Tasks:
- Integrate Whisper for speech-to-text
- Connect GPT-4 for task planning
- Build action executor (translate plans to ROS)
- Test compound requests
Deliverable: Terminal log showing complete plan execution
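The action executor is where LLM output meets ROS. A robust pattern is to prompt GPT-4 to reply in a fixed JSON schema, then validate every step before anything moves. The handlers below are hypothetical stand-ins that return strings; in the real stack each would call a ROS 2 action client (Nav2, MoveIt 2, TTS).

```python
import json

# Hypothetical handlers; real ones would call ROS 2 action clients.
def navigate(target): return f"navigating to {target}"
def pick(target): return f"picking {target}"
def say(target): return f"saying: {target}"

HANDLERS = {"navigate": navigate, "pick": pick, "say": say}

def execute_plan(llm_output: str):
    """Validate and run a JSON plan emitted by the LLM.

    Unknown actions are rejected outright rather than guessed at --
    a hallucinated step should fail loudly, not move a motor.
    """
    plan = json.loads(llm_output)
    results = []
    for step in plan["steps"]:
        handler = HANDLERS.get(step["action"])
        if handler is None:
            raise ValueError(f"unknown action: {step['action']}")
        results.append(handler(step["target"]))
    return results

reply = ('{"steps": [{"action": "navigate", "target": "kitchen"},'
         ' {"action": "pick", "target": "snack"}]}')
print(execute_plan(reply))
```

Constraining the model to a schema you can `json.loads` and whitelist is what makes the "terminal log showing complete plan execution" deliverable reproducible.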
Week 7: Manipulation
Goal: Robot can pick and place objects
Tasks:
- Set up MoveIt 2 for arm control
- Compute inverse kinematics
- Plan grasp poses
- Execute pick-and-place
Deliverable: Video of robot grasping a cup in simulation
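MoveIt 2 solves the full-arm inverse kinematics numerically, but the geometry is easiest to see on the simplest case: a 2-link planar arm, which has a closed-form solution. This is an illustrative sketch of the math, not the humanoid's actual IK.

```python
from math import acos, atan2, cos, sin, hypot

def two_link_ik(x, y, l1, l2):
    """Closed-form IK for a 2-link planar arm (elbow-down solution).

    Returns (shoulder, elbow) joint angles in radians, or None if the
    target is outside the arm's reachable annulus.
    """
    r = hypot(x, y)
    if r > l1 + l2 or r < abs(l1 - l2):
        return None  # target out of reach
    # Law of cosines gives the elbow angle; clamp for float safety.
    cos_elbow = (r * r - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    elbow = acos(max(-1.0, min(1.0, cos_elbow)))
    shoulder = atan2(y, x) - atan2(l2 * sin(elbow), l1 + l2 * cos(elbow))
    return shoulder, elbow

# Sanity-check by running forward kinematics on the solution.
th1, th2 = two_link_ik(0.3, 0.4, 0.3, 0.3)
fx = 0.3 * cos(th1) + 0.3 * cos(th1 + th2)
fy = 0.3 * sin(th1) + 0.3 * sin(th1 + th2)
print(round(fx, 3), round(fy, 3))  # recovers ~(0.3, 0.4)
```

The forward-kinematics round-trip at the end is the same validation habit you should apply to MoveIt 2 solutions before commanding the real arm.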
Week 8: Integration & Testing
Goal: Complete end-to-end demo
Tasks:
- Integrate all subsystems
- Create demo environment in Isaac Sim (living room + kitchen)
- Run full scenario 10 times
- Measure success rate
- Debug failures
- Record final demo video
Metrics:
| Metric | Target |
|---|---|
| Navigation success | >90% |
| Object detection recall | >85% |
| Grasp success | >70% |
| Full task completion | >60% |
| Average task time | < 5 min |
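Measuring the success rate over 10 runs is simple bookkeeping, but it is worth scripting so every team reports the same numbers. The sketch below assumes a hypothetical per-run log format (pass/fail flags plus a duration in seconds); adapt the keys to whatever your test harness actually records.

```python
def summarize_trials(trials):
    """Aggregate Week-8 trial runs into the metrics table's quantities.

    `trials` is a list of dicts, one per run, with boolean pass flags
    and a duration in seconds (hypothetical log format).
    """
    n = len(trials)
    rate = lambda key: sum(t[key] for t in trials) / n
    return {
        "navigation_success": rate("nav_ok"),
        "full_task_completion": rate("task_ok"),
        "avg_time_s": sum(t["duration_s"] for t in trials) / n,
    }

trials = [
    {"nav_ok": True,  "task_ok": True,  "duration_s": 240},
    {"nav_ok": True,  "task_ok": False, "duration_s": 310},
    {"nav_ok": True,  "task_ok": True,  "duration_s": 275},
    {"nav_ok": False, "task_ok": False, "duration_s": 330},
]
print(summarize_trials(trials))
# {'navigation_success': 0.75, 'full_task_completion': 0.5, 'avg_time_s': 288.75}
```

With 10 real runs, the same function tells you directly whether you cleared the >90% navigation and >60% full-task targets.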
Deliverable:
- 3-minute demo video
- Written report (5 pages)
- Code repository link
Evaluation Rubric (100 points)
Technical Implementation (60 points)
- Navigation (15 pts)
  - Collision-free path planning
  - Dynamic obstacle avoidance
  - Accurate localization
- Perception (15 pts)
  - Object detection accuracy
  - 3D pose estimation
  - Semantic understanding
- VLA Integration (15 pts)
  - Speech recognition quality
  - Task planning coherence
  - Natural language understanding
- Manipulation (15 pts)
  - Grasp stability
  - Motion planning smoothness
  - Safety (no collisions)
System Design (20 points)
- Architecture (10 pts)
  - Modular design
  - Clean interfaces
  - Error handling
- Code Quality (10 pts)
  - Documentation
  - Readability
  - ROS 2 best practices
Documentation & Presentation (20 points)
- Report (10 pts)
  - Clear explanation of approach
  - Analysis of results
  - Discussion of limitations
- Demo Video (10 pts)
  - Shows full scenario
  - Highlights key features
  - Professional quality
Getting Started
1. Fork the Template
git clone https://github.com/YourUsername/humanoid-capstone-template
cd humanoid-capstone-template
2. Set Up Environment
# Install dependencies
sudo apt install ros-humble-isaac-ros-visual-slam ros-humble-isaac-ros-nvblox
pip install openai openai-whisper ultralytics
# Build workspace
cd ~/humanoid_ws
colcon build
source install/setup.bash
3. Run Tests
# Test navigation
ros2 launch humanoid_bringup test_navigation.launch.py
# Test object detection
ros2 launch humanoid_perception test_detection.launch.py
# Test VLA
ros2 launch humanoid_vla test_planning.launch.py
Support
- Office Hours: Thursdays 4-6 PM
- Discussion Forum: GitHub Discussions
- Slack: #capstone-help
Good luck building the future of robotics! 🚀
Submission
Due: Week 8, Friday 11:59 PM
Format:
- Code repository (GitHub)
- Demo video (YouTube/Vimeo)
- Written report (PDF)
Submit via: Course portal