Joint Optimization of Dual Robot Arm Pick and Place
Revolutionizing Robot Manipulation: How Diffusion Models Unlock Seamless Bimanual Pick-and-Place in Cluttered Shelves
Imagine a warehouse robot staring at a jumbled shelf: it needs to grab a box lying flat on the table, flip it upright, and slide it neatly onto a high shelf — all without knocking anything over or dropping the item. Today’s robots often fail here. Single-arm systems hit kinematic limits; traditional planners get stuck in endless trial-and-error loops. But what if two robot arms could elegantly hand the object between them, optimizing every grasp and every joint movement together in one smart loop?
That’s exactly what our Final Year Project team at HKUST has built.
The Engineering Problem We’re Solving
In real-world pick-and-place tasks — warehousing, assembly lines, even smart homes — objects rarely line up perfectly for a single grasp-and-place motion. You need reorientation, which demands bimanual handovers: one arm picks, passes to the other, and the second places. Clutter, tight spaces, and precise final poses make this brutally hard. Classic methods separate grasp generation from motion planning, leading to collisions, jerky movements, or outright failures.
Our Breakthrough: A Collaborative Diffusion Framework
We created a unified collaborative optimization framework powered by diffusion models — the same AI tech behind stunning image generators, now driving real robot dexterity.
At its heart:
- GraspGen (a pretrained diffusion model we adapted for dual inputs) simultaneously proposes high-quality pick and placement grasps from point-cloud data in a single forward pass.
- A planning diffusion module injects smart guidance gradients during the denoising process, considering:
  - Trajectory smoothness (minimizing wild joint swings)
  - Collision avoidance with shelves, objects, and the other gripper
  - Handover constraints (perfect region + orientation for safe transfer)
  - Regrasp cost (ensuring grippers stay safely apart)
Everything happens in a gradient-guided loop: the models “think together” to refine grasp poses, handover poses, and motion plans *jointly*. No more decoupled guesswork.
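To make the idea concrete, here is a heavily simplified sketch of a gradient-guided denoising loop. The "model" and the cost function are toy stand-ins (shrinking noise toward zero, with a quadratic cost pulling poses toward a target region), not our actual networks, but the structure is the same: plain denoising steps, with cost gradients injected in the final steps.

```python
import numpy as np

# Illustrative sketch only: the denoiser and cost below are toy stand-ins
# for the real diffusion model and planning costs.

def cost(poses):
    # Toy differentiable cost: pulls grasp poses toward the origin
    # (stands in for smoothness + collision + handover terms).
    return float(np.sum(poses ** 2))

def cost_grad(poses):
    # Analytic gradient of the toy cost above.
    return 2.0 * poses

def guided_denoise(noisy_poses, steps=50, guide_last=10, lr=0.05, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    poses = noisy_poses.copy()
    for t in range(steps):
        # Plain denoising step: shrink the noise (stand-in for the model).
        poses *= 0.95
        poses += 0.01 * rng.standard_normal(poses.shape)
        # In the final few steps, inject guidance gradients from the costs.
        if t >= steps - guide_last:
            poses -= lr * cost_grad(poses)
    return poses

rng = np.random.default_rng(0)
noisy = rng.standard_normal((8, 7))   # 8 candidate poses (xyz + quaternion)
refined = guided_denoise(noisy, rng=rng)
```

After the loop, `refined` scores lower on the planning cost than the raw noisy candidates, which is exactly the effect the guidance gradients are there to produce.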
We also trained a fast **MLP (Multi-Layer Perceptron)** to predict valid inverse-kinematics solutions in milliseconds, and built a full **sandbox simulator** with stunning 3D visualizations of entire trajectories.
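For flavor, here is a minimal sketch of what such an MLP looks like: a small feed-forward network mapping a 7-D end-effector pose (position + quaternion) to 6 joint angles. The layer sizes, class name, and random weights here are placeholders; the real solver is trained on inverse-kinematics data.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class MLPIKSolver:
    """Toy stand-in for a learned IK predictor (untrained random weights)."""

    def __init__(self, sizes=(7, 128, 128, 6), rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.weights = [rng.standard_normal((a, b)) * np.sqrt(2.0 / a)
                        for a, b in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(b) for b in sizes[1:]]

    def __call__(self, pose):
        h = pose
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            h = relu(h @ W + b)
        # Linear output head: predicted joint angles (radians).
        return h @ self.weights[-1] + self.biases[-1]

solver = MLPIKSolver()
pose = np.array([0.4, 0.1, 0.3, 0.0, 0.0, 0.0, 1.0])  # xyz + unit quaternion
joints = solver(pose)
```

The appeal over iterative IK is speed: one forward pass through a few small matrix multiplies, instead of an optimization loop per query.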
Inside the Magic (Without the Math Overload)
1. Noisy grasp proposals start the process.
2. GraspGen denoises them step-by-step.
3. In the final few steps, our planning module kicks in with differentiable cost gradients — smoothness pulls poses toward gentle joint paths, collision terms nudge away from obstacles, handover gradients lock the transfer zone.
4. The best collision-free, smooth pair is selected and executed in simulation (or on real UR arms with Orbbec cameras).
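The collision "nudge" in step 3 is easiest to see with a toy example: a spherical obstacle penalizes nearby gripper positions, and descending the gradient of that penalty pushes a candidate pose out of the danger zone. The obstacle, radius, and finite-difference gradient below are illustrative assumptions, not our actual cost.

```python
import numpy as np

OBSTACLE = np.array([0.0, 0.0, 0.0])   # hypothetical obstacle center
RADIUS = 0.2                           # hypothetical safety radius (meters)

def collision_cost(p):
    # Zero outside the safety sphere, quadratic penalty inside it.
    d = np.linalg.norm(p - OBSTACLE)
    return max(0.0, RADIUS - d) ** 2

def collision_grad(p, eps=1e-6):
    # Finite-difference gradient; an analytic form would be used in practice.
    g = np.zeros(3)
    for i in range(3):
        dp = np.zeros(3)
        dp[i] = eps
        g[i] = (collision_cost(p + dp) - collision_cost(p - dp)) / (2 * eps)
    return g

p = np.array([0.05, 0.0, 0.0])         # gripper position inside the sphere
for _ in range(100):
    p -= 0.1 * collision_grad(p)       # each step pushes p away from OBSTACLE
```

In the real framework this kind of gradient is applied to grasp and handover poses during the last denoising steps, alongside the smoothness and handover terms.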
We even validated that simple joint-angle distance is an excellent (and lightning-fast) proxy for real smoothness.
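That proxy can be as simple as summing squared joint-angle changes between consecutive waypoints. A minimal sketch (not the exact metric we use, but the same idea):

```python
import numpy as np

def joint_distance_smoothness(trajectory):
    """Sum of squared joint-angle changes between consecutive waypoints.

    trajectory: (T, n_joints) array of joint angles in radians.
    Lower values mean gentler joint motion.
    """
    diffs = np.diff(trajectory, axis=0)
    return float(np.sum(diffs ** 2))

# A straight-line joint trajectory scores lower than a jerky one.
smooth = np.linspace(0.0, 1.0, 10).reshape(-1, 1) * np.ones((1, 6))
jerky = smooth + 0.3 * np.sin(np.arange(10))[:, None]
```

Because it needs only a `diff` and a sum, it is cheap enough to evaluate inside the guidance loop at every denoising step.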
Impressive Progress So Far
In just months we’ve delivered:
- Dual-grasp generation working flawlessly on diverse objects
- Differentiable smoothness, collision, handover, and regrasp costs fully integrated
- A 92.5% accurate neural IK solver (sub-millimeter precision)
- Visualization tools that let us watch robots “dance” through complex scenes
- Baselines showing clear wins in grasp confidence, success rate, and total motion cost
Our targets? ≥85% smooth collision-free trajectories in simulation and ≥80% real-world success (placement error <1 cm, orientation error <5°). Early results are already beating decoupled baselines by double digits.
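For anyone curious how such thresholds get checked, here is a hedged sketch: position error as Euclidean distance, orientation error as the geodesic angle between unit quaternions. Function names, pose format, and the example values are illustrative; only the 1 cm / 5° tolerances come from our targets.

```python
import numpy as np

def orientation_error_deg(q1, q2):
    # Geodesic angle between two unit quaternions (w, x, y, z).
    dot = abs(float(np.dot(q1, q2)))
    return float(np.degrees(2.0 * np.arccos(min(1.0, dot))))

def placement_success(pos, target_pos, quat, target_quat,
                      pos_tol=0.01, ang_tol=5.0):
    pos_err = np.linalg.norm(pos - target_pos)          # meters
    ang_err = orientation_error_deg(quat, target_quat)  # degrees
    return bool(pos_err < pos_tol and ang_err < ang_tol)

target_p = np.array([0.5, 0.0, 0.3])
target_q = np.array([1.0, 0.0, 0.0, 0.0])  # identity rotation
# 5 mm position error and a 2-degree rotation about z: within tolerance.
q_small = np.array([np.cos(np.radians(1.0)), 0.0, 0.0,
                    np.sin(np.radians(1.0))])
ok = placement_success(target_p + np.array([0.005, 0.0, 0.0]), target_p,
                       q_small, target_q)
```

The quaternion half-angle convention is why a 1° half-angle in `q_small` corresponds to a 2° rotation.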
Why This Matters
This isn’t just another robotics paper. It’s a step toward robots that can truly collaborate like humans — passing objects smoothly, adapting to clutter, and handling the messy real world. Warehouses could rearrange shelves overnight. Factories could assemble complex products faster. Your future home assistant might finally hand you that coffee mug the right way up.
The Team Behind It
Led by YEUNG Wun Lam, QIN Zhengyan Lambo, and YU Yui Cheung under the guidance of Prof. TAN Ping, this project blends cutting-edge generative AI with classic robotics in a way that feels like the future arriving early.
We’re not done yet — real-world hardware tests and final polishing are underway — but the foundation is rock-solid.
Want to see the robots in action? Stay tuned for demo videos. The era of clumsy single-arm robots is ending. Bimanual, diffusion-powered dexterity is here.
What do you think — ready for robots that can actually hand things to each other? Drop a comment below!