Humanoid robot: The evolution of Kawasaki’s challenge
28 points by hhs 4 days ago | 19 comments

voidUpdate 13 hours ago
Humanoid industrial robots have always been a little confusing to me. The human form is not well suited for industrial tasks, and by building specialised robot arms you can improve efficiency etc. You only need humanoids when you have to interact with systems that were designed for humans and can't be modified to work with a more efficient robot
reply
ACCount37 13 hours ago
Every single task that was easy and economical to offload to a single purpose robot arm bolted down to the floor was already offloaded to a single purpose robot arm bolted down to the floor.

What remains is: all those quirky little one-off processes that aren't very amenable to "robot arm" automation, aren't worth the process design effort to make them amenable to it, and are currently solved by human labor.

Thus, you design new solutions to target that open niche.

Humans aren't perfect at anything, but they are passable at everything. Universal worker robots attempt to replicate that.

"A drop-in replacement for simple human labor" is a very lucrative thing, assuming one could pull it off. And that favors humanoid hulls.

Not that the form is the bottleneck there, not really. The problem of universal robots is fundamentally an AI problem. Today, we could build a humanoid body mechanically capable of over 90% of all industrial tasks performed by humans, but not the AI that would actually drive it.

reply
ath92 12 hours ago
My impression is that a big part of the reason for the sudden boom in humanoid robots is that they lend themselves particularly well to RL-based training on human demonstration footage captured in VR. It’s much easier to have a robot broadly copy human actions if the robot looks like a human, instead of first having to translate the human action into your robot-arm equivalent.
reply
ACCount37 12 hours ago
The big part is the rise of modern AI in general.

The success of large multipurpose AI models trained on web-scale data pushed a lot of people towards "cracking general purpose robot AI might be possible within a decade".

Whether transfer learning from human VR/teleop data is the best way to do it remains uncertain - there are many approaches to training and data collection. Transfer learning from web-scale data, teleoperation, and "RL IRL" are all common, though - usually at different ends of the training pipeline.

Tesla got the memo earlier than most, because Musk is a mad bleeding-edge technology demon, but many others followed shortly before or during the public 2022 AI boom.

reply
throwawayffffas 12 hours ago
That is certainly a factor, but you also have to take into account that all these tasks in the factories are now centered around the human form because humans are doing them.
reply
mrp23 11 hours ago
This framing clarifies something people get wrong about humanoid robots. The competition isn't "humanoid vs. better robot" — it's "humanoid vs. hiring another person."

And that reframes the economics entirely. You don't need the robot to be better than a human at any given task. You need its total cost of ownership to be lower than salary, benefits, turnover, and training. That's a much easier bar to clear once the AI catches up to the body.

The interesting question is whether the AI problem gets solved generally (one model that can do everything) or whether we end up with task-specific AI in a general-purpose body — basically the robot arm paradigm wearing a humanoid suit.
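As a back-of-the-envelope sketch of that bar (every figure here is invented, purely to show the break-even logic):

```python
# Break-even sketch: all figures are made up for illustration.
def breakeven_years(robot_capex, robot_opex_per_year, worker_cost_per_year):
    """Years until the robot's cumulative cost undercuts the worker's."""
    annual_saving = worker_cost_per_year - robot_opex_per_year
    if annual_saving <= 0:
        return float("inf")  # the robot never pays for itself
    return robot_capex / annual_saving

# e.g. a $150k robot with $20k/yr upkeep vs. a $70k/yr fully loaded worker
years = breakeven_years(150_000, 20_000, 70_000)  # 3.0 years
```

The point being: the robot can be slower and clumsier than the worker and still clear the bar, as long as the annual saving stays positive.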

reply
ACCount37 10 hours ago
Em-dashes aside, I favor "one model that can do everything" in principle because scaling laws and distillation exist, and in practice because "one model that you can point at any problem" is a massive operational advantage.

If you can get 5 specialist models that can use the same robot body, you can also get 1 generalist model with more capacity and fold the specialists into it. If you have the in-house training pipeline that made those specialists, apply it to the generalist instead, the way we give general-purpose AIs coding-specific training. If you don't, take the specialists as-is and distill from them.

If you do it right, transfer learning might even give you a model that generalizes better and beats the specialists at their own game, because the "special" tasks partially overlap in subtasks the generalist got stronger training on, and they add diversity of environments. Robotics AI is training-data starved as a rule.

Same kind of lesson we learned with LLM specialists: invest in a specialist model and watch the next-gen generalists, with better data and training, crush it.

reply
nosaidit 11 hours ago
> Every single task that was easy and economical to offload to a single purpose robot arm bolted down to the floor was already offloaded to a single purpose robot arm bolted down to the floor.

What about doing dishes? That could be done with one arm. Maybe not easy and economical yet, but could be.

There is plenty that has not been seen through.

Laundry folding machines are not in wide distribution.

Robots to put away laundry?

Etc. lots of mundane tasks.

reply
clayhacks 34 minutes ago
Yeah, I think there’s plenty of room for more bolted-arm robots; it’s just that, like the humanoids, they need better AI. There’s also room for more optimisation of the entire system design around more specialized robots. I think some industries work really well for that kind of revamp, and have already begun doing it. Others are waiting for the cost curve to fall enough to be worth the investment.
reply
throwawayffffas 13 hours ago
Yes, that's pretty much it. Some people from Boston Dynamics were talking on a podcast. They said they sat down with Toyota and figured out they could automate all the tasks in a factory, but it would take something like 10,000 man-years, and Toyota makes new trims every six months, so you'd need roughly another 10,000 man-years every six months.

What's required is flexibility and adaptability with minimal training.

reply
spking 9 hours ago
I think this is the podcast you mentioned:

https://youtu.be/SRZ9E48B6aM?si=K_wwvu97agBZpFTa

reply
kleiba 15 hours ago
Looking at the video at the bottom of the page, the robot looks like an old man, especially in the trash bag throwing sequence. Compare that to the recent Chinese kung-fu robots video...
reply
bsboxe 15 hours ago
Completely different situations. The Unitree demos are prerecorded movements with no real adaptability. While visually impressive, they are highly tuned to perform that specific sequence of actions. If you walked in front of one it would have zero awareness of you and you’d be hit. They’re essentially “blind”. The last video here is likely demonstrating a teleoperated humanoid.
reply
throwawayffffas 13 hours ago
> The Unitree demos are prerecorded movements with no real adaptability.

That is not true. The routine is preprogrammed, but there is adaptability. If there weren't, they would fall over in the first five seconds. The movement involved in the routine we saw requires continuous adjustment. You can't just record the movement as you would a video game animation; real physics get in the way, and you end up on your back trying to do a jump and a backflip.

If you think I'm wrong, sure, I could be, but have a look at Atlas: https://www.youtube.com/watch?v=oe1dke3Cf7I

The robot's motion is not preprogrammed at all. See how much smoother it is?

That's because Boston Dynamics uses an approach that calculates and takes the dynamics of motion into account, just like Unitree.

The Kawasaki approach is clearly to use overwhelming torques in an effort to cancel all the dynamics and produce fully controlled movement. Exactly what an old man does as well, or a robotic arm in a factory. It's honestly embarrassing; it looks like Kawasaki has made no progress in the last 30 years, and their robots still move like it's 1996.

Have a look here https://underactuated.csail.mit.edu/intro.html for a more in-depth explanation of the difference between the two approaches.

reply
imtringued 12 hours ago
I'm honestly more concerned with your lack of understanding of these topics.

There are two main ways to accomplish what the kung-fu robot does.

First you train a reinforcement learning policy for balancing and walking and a bunch of dynamic movements, then you record the movement you want to perform using motion capture, then you play back the recorded trajectory.

Second, you train a reinforcement learning policy for balancing and walking, but also bake in the recorded movement into the policy directly.

Okay, I lied. There is also a third way: use model predictive control with a hand-built balancing objective and then replay the recorded trajectory. I think this method won't be as successful for the shown choreography; however, it's what Boston Dynamics did for a long time.

In all of these cases you are still limited to a pre-recorded task description. Is this really that hard to understand? Do you really think someone taught the robot how to perform the choreography like a real human, in Chinese and by performing the movement in front of the humanoid's camera, or that the robot came up with the choreography on its own? Because that's the conclusion you have to draw if you deny the methods I described above.
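To make the first method concrete, here is a toy 1-DoF sketch (my own invented gains and dynamics, not any real robot's code): a prerecorded reference trajectory is played back while a feedback term, standing in for the learned balance policy, continuously corrects for the real dynamics.

```python
import math

# Toy sketch of method one: replay a recorded trajectory while a
# feedback term (a stand-in for the learned balance policy) corrects
# for real dynamics. All constants are invented for illustration.
DT = 0.01            # control timestep, seconds
KP, KD = 80.0, 12.0  # hand-tuned feedback gains (assumed values)

def track(reference, disturbance=0.0, steps=300):
    """Track reference(t) on a single toy joint with gravity and a
    constant disturbance torque; returns the worst tracking error."""
    angle, rate, max_err = 0.0, 0.0, 0.0
    for i in range(steps):
        target = reference(i * DT)
        # recorded playback target plus feedback correction
        torque = KP * (target - angle) - KD * rate
        rate += (torque + disturbance - 9.8 * math.sin(angle)) * DT
        angle += rate * DT
        max_err = max(max_err, abs(target - angle))
    return max_err

err = track(lambda t: 0.3 * math.sin(2 * t), disturbance=1.0)
```

Pure playback (KP = KD = 0) just integrates the disturbance and tips over; the feedback keeps the error bounded. That's the "continuous adjustment" mentioned upthread.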

reply
throwawayffffas 11 hours ago
As far as I understand the state of the art as of 2-3 years ago, there is no reinforcement learning at all at any point. At least not in the dynamics.

What you do is map the dynamics of your system and solve them; the solution is a program that can produce torque inputs at the joints to move the system the way you want.

You then create a sequence of desirable intermediate and end states. The program then does its best to achieve these.

The difference between Atlas and the Kawasaki robots is that, to achieve those states, the Kawasaki robots use a program that attempts to stop all inertial rotations and movements in order to maintain full control of their movements at all times.

Atlas and the Chinese robots, meanwhile, leverage inertia and gravity to achieve their movements. Again, you do that by solving a large set of equations; no ML required.

The GP described a system of prerecorded motions, like a video game animation. If you try that with no controller adjusting to the real-time environment, you just tip over and keep playing the prerecorded motions. We saw that with the Russian robot last year.

You can use a real human performing the choreography to capture the desired intermediate states; that is the step that might require ML.
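A toy version of that "solve the dynamics, then chase a sequence of states" idea for a single pendulum joint (every constant is invented): computed-torque control, which is the "cancel the dynamics" flavor attributed to Kawasaki above, rather than Atlas-style exploitation of inertia.

```python
import math

# Toy version of "map the dynamics and solve them" for one pendulum
# joint: computed-torque control. Every constant here is invented.
MASS, LENGTH, GRAV, INERTIA = 1.0, 0.5, 9.81, 0.25
KP, KD = 100.0, 20.0
DT = 0.002

def computed_torque(q, qd, q_des):
    """Invert INERTIA*qdd = tau - MASS*GRAV*LENGTH*sin(q): pick tau so
    the closed-loop error behaves like a simple damped spring."""
    qdd_cmd = KP * (q_des - q) - KD * qd
    return INERTIA * qdd_cmd + MASS * GRAV * LENGTH * math.sin(q)

def run_to_waypoints(waypoints, hold_steps=500):
    """Drive the joint through a sequence of desirable intermediate
    states; the controller does its best to reach each one."""
    q, qd = 0.0, 0.0
    for q_des in waypoints:
        for _ in range(hold_steps):
            tau = computed_torque(q, qd, q_des)
            qdd = (tau - MASS * GRAV * LENGTH * math.sin(q)) / INERTIA
            qd += qdd * DT
            q += qd * DT
    return q

final = run_to_waypoints([0.5, 1.2, math.pi / 2])
```

Because the controller cancels the modeled gravity term outright, the joint settles on each waypoint regardless of where gravity would pull it; a dynamics-exploiting controller would instead ride the pendulum's natural swing between states.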

reply
scotty79 11 hours ago
> As far as I understand the state of the art as of 2-3 years ago, there is no reinforcement learning at all at any point. At least not in the dynamics.

I think this might no longer be true. I don't think this year's dance routine would have been possible without RL, given how crappy robots were 2-3 years ago.

reply
imtringued 12 hours ago
The Kawasaki robot had to do something much more impressive, which is to lift a table while a human is holding the other end.

The actual concern here is that there are too many cuts. If the whole table movement sequence was uncut and fully autonomous, that would mean they have the most advanced humanoid robot software in the world.

It would mean they can autonomously find the correct grasping locations on the table for both arms, which requires the robot to have a model of the table. The robot also needs to know at what height to hold the table to keep it level, compensate for the human pulling on the object while balancing, and autonomously follow the direction the human is pulling in.

Of course, since there were many cuts, we don't really know whether that's true. We also don't know if teleoperation is involved or not.

The Chinese robot dancing is cool because it shows what the hardware is capable of, but it doesn't really show anything on the software side. Contact with objects is hard in robotics, and the kung-fu choreography avoids it for obvious reasons.

reply
kotaKat 7 hours ago
Meanwhile, Honda’s robotics program withered away after ASIMO’s mild improvements in 2011, and it’s really, really sad.
reply