To develop the system, researchers from New York University and Meta tested Stretch, a commercially available robot made by Hello Robot that consists of a wheeled unit, a tall pole, and a retractable arm. They ran the tests in 10 rooms across five homes.
While in a room with the robot, a researcher would scan their surroundings using Record3D, an iPhone app that uses the phone’s lidar system to take a 3D video to share with the robot.
The OK-Robot system then ran an open-source AI object detection model over the video’s frames. This, in combination with other open-source models, helped the robot identify objects in that room, such as a toy dragon, a tube of toothpaste, and a pack of playing cards, as well as locations around the room including a chair, a table, and a trash can.
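The frame-by-frame step described above can be pictured with a short sketch. It is an illustration only, not the OK-Robot code: the `Detection` record, the `build_object_memory` function, the `fake_detect` stand-in for a real detection model, and the 0.5 confidence cutoff are all hypothetical names and values chosen for the example. It assumes each detection already carries a 3D position in room coordinates, then averages the confident sightings of each label across frames.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Detection:
    """One detected object in one frame (hypothetical record)."""
    label: str
    position: Point  # assumed: 3D location in room coordinates
    score: float     # detector confidence between 0 and 1


def build_object_memory(
    frames: Iterable[object],
    detect: Callable[[object], List[Detection]],
    min_score: float = 0.5,  # assumed cutoff, not from the article
) -> Dict[str, Point]:
    """Run the detector on every frame and average positions per label."""
    sightings: Dict[str, List[Point]] = defaultdict(list)
    for frame in frames:
        for det in detect(frame):
            if det.score >= min_score:  # drop low-confidence detections
                sightings[det.label].append(det.position)
    # Average x, y, and z separately for each label.
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in sightings.items()
    }


def fake_detect(frame: List[Detection]) -> List[Detection]:
    """Stand-in detector: each 'frame' is already a list of detections."""
    return frame


frames = [
    [Detection("toy dragon", (1.0, 2.0, 0.5), 0.9),
     Detection("chair", (3.0, 0.0, 0.0), 0.8)],
    [Detection("toy dragon", (1.5, 2.0, 0.5), 0.7),
     Detection("trash can", (0.0, 4.0, 0.0), 0.3)],
]

memory = build_object_memory(frames, fake_detect)
print(memory["toy dragon"])  # (1.25, 2.0, 0.5)
```

In this toy run the trash can is seen only once, with low confidence, so it never enters the memory. A real system would swap `fake_detect` for an actual detection model and would need a more careful way to merge sightings than a simple average.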
The team then instructed the robot to pick up a specific item and move it to a new location. The robot’s pincer arm did this successfully in 58.5% of cases; the success rate rose to 82% in rooms that were less cluttered. (Their research has not yet been peer reviewed.)
The recent AI boom has led to enormous leaps in language and computer vision capabilities, giving robotics researchers access to open-source AI models and tools that didn’t exist even three years ago, says Matthias Minderer, a senior computer vision research scientist at Google DeepMind, who was not involved in the project.
“I would say it’s quite unusual to be completely reliant on off-the-shelf models, and that it’s quite impressive to make them work,” he says.