Overview & Contributions | This course project addressed a humanoid robotics grand challenge. Our group chose to tackle
the synchronization of multimodal behaviors. Our task was to create a simulated humanoid and let it interact freely
with a human user in VR. We built the environment in Unreal Engine 5 and designed the humanoid with MetaHuman.
My roles on this project were:
|
Introduction | One of the biggest current challenges in humanoid robot development is conquering the Uncanny Valley: the phenomenon in which people's affinity for a robot decreases as the robot becomes more and more humanlike. Before addressing this problem, we first need to synchronize the robot's multimodal behaviors, since this synchronization is essential for exchanging information between the robot and the user. To approach this grand challenge, we propose a solution that relies mainly on neural networks and handles different scenarios. The method combines GPT-3 (the third-generation Generative Pre-trained Transformer), commands triggered by predefined keywords, and Mixamo, a library of motion-captured animations. In our final result, the user can interact with the humanoid in Unreal Engine 5 and give it commands, which the humanoid is able to execute. |
Methods | Our method combines six sub-steps, each of which requires a different technique or piece of software.
The first step is to set up the environment and acquire the coordinates of the items in it. The second step is to
train the humanoid robot's walking movement with a neural network. The third step is to use predefined
action scripts to simulate picking up a specified item. The fourth step is to combine
walking with the pick-up action so that the entire process is smooth. The fifth step is to
reconstruct the entire action in a VR environment, where the human user should be able to clearly see
what the robot is doing. The final step is to integrate GPT-3, which allows the robot to interact verbally
with the user.
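
The way these steps fit together at run time can be sketched as follows. This is only an illustration in Python, not the project's Unreal Engine blueprint: the item names and coordinates, the keyword table, the `Step` type, and the `chat` callable (a stand-in for the GPT-3 call) are all hypothetical. It assumes that an utterance containing a predefined keyword is turned into a walk-then-act sequence, and that anything else is passed to the language model.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple

Coords = Tuple[float, float, float]

# Hypothetical item coordinates, as would be acquired when the environment is set up.
ITEM_COORDINATES: Dict[str, Coords] = {
    "cup": (120.0, 40.0, 90.0),
    "book": (-60.0, 210.0, 75.0),
}

# Hypothetical mapping from trigger keywords to predefined action scripts.
KEYWORD_ACTIONS: Dict[str, str] = {
    "pick up": "pick_up",
    "grab": "pick_up",
    "wave": "wave",
}


class Step(NamedTuple):
    """One action for the humanoid, optionally aimed at a world position."""
    action: str
    target: Optional[Coords] = None


def plan(utterance: str) -> Optional[List[Step]]:
    """Return an action sequence if the utterance contains a known keyword."""
    text = utterance.lower()
    for keyword, action in KEYWORD_ACTIONS.items():
        if keyword not in text:
            continue
        if action == "pick_up":
            # Picking up needs a known item: walk to it first, then pick it up.
            for item, coords in ITEM_COORDINATES.items():
                if item in text:
                    return [Step("walk_to", coords), Step("pick_up", coords)]
            return None  # keyword matched, but no known item was named
        return [Step(action)]
    return None


def respond(utterance: str, chat: Callable[[str], str]):
    """Run a keyword-triggered action if possible, else fall back to chat."""
    steps = plan(utterance)
    if steps is not None:
        return ("actions", steps)
    return ("speech", chat(utterance))
```

For example, `respond("Please pick up the cup", chat)` yields a `walk_to` step followed by a `pick_up` step at the cup's coordinates, while `respond("How are you?", chat)` returns whatever the `chat` callable produces. Keeping the keyword check ahead of the language model is one way to keep predefined actions deterministic while leaving open-ended conversation to the model.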
|
Results | Our blueprint enables the user to chat with and command our humanoid in Unreal Engine 5. The robot
responds quickly and accurately, which makes the user feel comfortable interacting
with it. The integration of GPT-3 provides a prompt response to user input, though response time
may vary depending on the complexity of the question. Keyword-sensitive commands allow for accurate
execution of predefined actions, and the robot can dynamically adjust its path to avoid obstacles.
Each action is linked to a specific Mixamo action neural network for precise execution. However,
the accuracy of GPT-3's responses to commands may vary. |
Discussion |
Overall, our approach offers a significant advancement in the development of humanoid robots,
addressing the challenges of synchronization of the robot's multimodal behaviors and the
generation of movement variants. Our solution offers a highly realistic and immersive environment
for testing and training robotic systems. However, further research is necessary to address the
limitations of our approach, such as the accuracy of GPT-3's responses to user commands. Future
research could also focus on the development of more sophisticated language models that offer more
accurate responses to user queries.
|