How can robots acquire skills through interactions with the physical world? An interview with Jiaheng Hu

February 16, 2026

One of the key challenges in building robots for household or industrial settings is the need to master the control of high-degree-of-freedom systems such as mobile manipulators. Reinforcement learning has been a promising avenue for acquiring robot control policies; however, scaling to complex systems has proved difficult. In their work SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL, Jiaheng Hu, Peter Stone and Roberto Martín-Martín introduce a method that renders real-world reinforcement learning feasible for complex embodiments. We caught up with Jiaheng to find out more.

What is the topic of the research in your paper and why is it an interesting area for study?

This paper is about how robots (in particular, household robots like mobile manipulators) can autonomously acquire skills via interacting with the physical world (i.e. real-world reinforcement learning). Reinforcement learning (RL) is a general framework for learning from trial-and-error interaction with an environment, and has huge potential in allowing robots to learn tasks without humans hand-engineering the solution. RL for robotics is a very exciting field, as it could open possibilities for robots to self-improve in a scalable way, towards the creation of general-purpose household robots that can assist people in our everyday lives.

What were some of the issues with previous methods that your paper was trying to address?

Previously, most of the successful applications of RL to robotics were done by training entirely in simulation, then deploying the policy in the real world directly (i.e. zero-shot sim2real). However, such a method has big limitations: on one hand, it is not very scalable, as you need to create task-specific, high-fidelity simulation environments that closely match the real-world environment that you want to deploy the robot in, and this can often take days or months for each task. On the other hand, some tasks are actually very hard to simulate, as they involve deformable objects and contact-rich interactions (for example, pouring water, folding clothes, wiping a whiteboard). For these tasks, the simulation is often quite different from the real world. This is where real-world RL comes into play: if we can allow a robot to learn by directly interacting with the physical world, we don’t need a simulator anymore. However, while several attempts have been made towards realizing real-world RL, it’s actually a very hard problem for two reasons: 1. Sample inefficiency: RL requires a lot of samples (i.e. interactions with the environment) to learn good behavior, and these are often impossible to collect in large quantities in the real world. 2. Safety issues: RL requires exploration, and random exploration in the real world is often very dangerous. The robot can break itself and will never be able to recover from that.

Could you tell us about the method (SLAC) that you’ve introduced?

So, creating high-fidelity simulations is very hard, and directly learning in the real world is also really hard. What should we do? The key idea of SLAC is that we can use a low-fidelity simulation environment to assist subsequent real-world RL. Specifically, SLAC implements this idea in a two-step process: in the first step, SLAC learns a latent action space in simulation via unsupervised reinforcement learning. Unsupervised RL is a technique that allows the robot to explore a given environment and learn task-agnostic behaviors. In SLAC, we design a special unsupervised RL objective that encourages these behaviors to be safe and structured.

In the second step, we treat these learned behaviors as the new action space of the robot, where the robot does real-world RL for downstream tasks such as wiping whiteboards by making decisions in this new action space. Importantly, this method allows us to bypass the two biggest problems of real-world RL: we don’t have to worry about safety issues since the new action space is pretrained to be always safe; and we can learn in a sample-efficient way because our new action space is trained to be very structured.
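To make the two-step structure concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the decoder is a random lookup table rather than something learned in simulation, clipping stands in for the safety property, and a stateless epsilon-greedy bandit replaces the real-world RL algorithm. All names and constants (`N_LATENT`, `JOINT_LIMIT`, `TARGET`, `make_frozen_decoder`, `toy_reward`, `train_downstream`) are hypothetical and chosen only for illustration.

```python
import random

N_LATENT = 4                 # hypothetical number of latent actions
JOINT_LIMIT = 1.0            # hypothetical bound on each low-level command
TARGET = [0.5, -0.2, 0.8]    # toy task target, purely illustrative


def make_frozen_decoder(seed=0):
    """Stand-in for step 1: a fixed map from latent action to robot command.

    In SLAC this map would be learned in a low-fidelity simulator; here it
    is a random table, and clipping mimics "always safe" commands.
    """
    rng = random.Random(seed)
    table = [[rng.uniform(-2.0, 2.0) for _ in TARGET] for _ in range(N_LATENT)]

    def decode(z):
        return [max(-JOINT_LIMIT, min(JOINT_LIMIT, a)) for a in table[z]]

    return decode


def toy_reward(command):
    """Toy task reward: negative squared distance to the target command."""
    return -sum((c - t) ** 2 for c, t in zip(command, TARGET))


def train_downstream(decode, steps=200, epsilon=0.1, seed=1):
    """Stand-in for step 2: learn which latent action to pick.

    The learner only ever chooses a latent index; it never emits raw
    low-level commands, so exploration stays inside the decoder's range.
    """
    rng = random.Random(seed)
    q = [0.0] * N_LATENT
    counts = [0] * N_LATENT
    for _ in range(steps):
        if rng.random() < epsilon:
            z = rng.randrange(N_LATENT)                     # explore
        else:
            z = max(range(N_LATENT), key=lambda i: q[i])    # exploit
        r = toy_reward(decode(z))
        counts[z] += 1
        q[z] += (r - q[z]) / counts[z]                      # running mean
    return q


decode = make_frozen_decoder()
q_values = train_downstream(decode)
best_z = max(range(N_LATENT), key=lambda i: q_values[i])
print("best latent action:", best_z, "->", decode(best_z))
```

The point of the sketch is the division of labor: the downstream learner searches over a handful of latent choices instead of raw joint commands, which is one way to read the sample-efficiency and safety claims above.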

The robot carrying out the task of wiping a whiteboard.

How did you go about testing and evaluating your method, and what were some of the key results?

We test our method on a real Tiago robot, a high-degree-of-freedom, bi-manual mobile manipulator, on a series of very challenging real-world tasks, including wiping a large whiteboard, cleaning a table, and sweeping trash into a bag. These tasks are challenging in three respects: 1. They are visuo-motor tasks that require processing of high-dimensional image information. 2. They require whole-body motion of the robot (i.e. controlling many degrees of freedom at the same time). 3. They are contact-rich, which makes them hard to simulate accurately. On all of these tasks, our method allows us to learn high-performance policies (>80% success rate) within an hour of real-world interactions. By comparison, previous methods simply can’t solve the tasks, and often risk breaking the robot. So to summarize, previously it was simply not possible to solve these tasks via real-world RL, and our method has made it possible.

What are your plans for future work?

I think there is still a lot more to do at the intersection of RL and robotics. My eventual goal is to create truly self-improving robots that can learn entirely by themselves without any human involvement. More recently, I’ve been interested in how we can leverage foundation models such as vision-language models (VLMs) and vision-language-action models (VLAs) to further automate the self-improvement loop.

About Jiaheng

Jiaheng Hu is a 4th-year PhD student at UT-Austin, co-advised by Prof. Peter Stone and Prof. Roberto Martín-Martín. His research interest is in Robot Learning and Reinforcement Learning, with the long-term goal of developing self-improving robots that can learn and adapt autonomously in unstructured environments. Jiaheng’s work has been published at top-tier Robotics and ML venues, including CoRL, NeurIPS, RSS, and ICRA, and has earned multiple best paper nominations and awards. During his PhD, he interned at Google DeepMind and Ai2, and is a recipient of the Two Sigma PhD Fellowship.

Read the work in full

SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL, Jiaheng Hu, Peter Stone, Roberto Martín-Martín.

AIhub is a non-profit dedicated to connecting the AI community to the public by providing free, high-quality information in AI.

Lucy Smith is Senior Managing Editor for Robohub and AIhub.
