PhyPlan: Learning To Plan Tasks with Generalizable and Rapid Physical Reasoning for Embodied Manipulation

Ankit Kanwar*, Hartej Soin*, Abhinav Barnawal*, Mudit Chopra, Harshil Vagadia, Tamajit Banerjee, Shreshth Tuli, Rohan Paul and Souvik Chakraborty
Indian Institute of Technology Delhi | *Equal contribution

PhyPlan is a novel physics-informed planning framework based on accelerated learning of Physical Reasoning Tasks using Physics-Informed Dynamics Predictors.


Abstract

Given the task of landing a ball in a goal region beyond direct reach, humans can often throw, slide, or rebound objects against the wall to attain the goal. Enabling robots to replicate such reasoning is non-trivial as it requires multi-step planning and involves a mixture of discrete and continuous action spaces, a sparse and sensitive reward structure, computationally expensive simulations, and an incomplete understanding of the environment's physics. We present PhyPlan, a physics-informed and adaptable planning framework for efficient multi-step physical reasoning. At its core, PhyPlan comprises Generative Flow Networks (GFlowNets) and Monte Carlo Tree Search (MCTS) to explore and evaluate sequences of object interactions. GFlowNets sample discrete action sequences in proportion to their associated reward, enabling broad and reward-driven exploration of the discrete planning space. MCTS complements this by adaptively balancing the use of a fast but approximate pre-trained physics-informed dynamics predictor and costly but accurate environment rollouts, ensuring both speed and precision in planning. The discrepancy between the known physics and the actual physics is captured using Gaussian Process Regression. Experiments on benchmark simulated tasks requiring composition of collisions, slides, and rebounds demonstrate that PhyPlan achieves a 45% higher success rate and up to 3x efficiency gains over state-of-the-art model-based reinforcement learning approaches.

Explanatory Video


Approach Overview

Left to Right: The framework begins by learning dynamics-predictor models for elementary physical skills—such as throwing, sliding, swinging, and collision—using a Physics-Informed Neural Network (PINN). By incorporating coarse physics-based equations directly into its loss function, the network learns to predict both the future trajectory of interacting objects and the domain's latent physical parameters.

Moving to the planning phase, given a new task and a set of tools, the agent uses GFlowNets to sample and iterate over various tool sequences, utilising the PINN-based dynamics predictors as efficient proxy-reward models. To optimise execution, the agent reasons over specific control parameters using Monte Carlo Tree Search (MCTS). This search employs the learnt models (PINN) to rapidly simulate actions—such as the release plane of a pendulum or the orientation of a wedge—faster than a complex high-fidelity physics simulator. Finally, to ensure accuracy, the MCTS periodically conducts rollouts in the high-fidelity simulator (world model) and corrects any discrepancies between the PINN-based simulations and reality using a Gaussian Process (GP) Regressor.
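The page does not spell out how the MCTS decides which control parameters to explore next. A common choice in MCTS is the UCB1 rule, so the minimal sketch below assumes it. The dictionary fields (`angle`, `total_reward`, `visits`) and the exploration constant `c` are hypothetical names chosen for illustration and are not taken from the PhyPlan code.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=1.4):
    """UCB1 score: mean reward plus an exploration bonus (assumed rule)."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    mean = total_reward / visits
    return mean + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=1.4):
    """Return the child (a discretised control value) with the highest score."""
    parent_visits = sum(child["visits"] for child in children)
    return max(
        children,
        key=lambda child: ucb1(
            child["total_reward"], child["visits"], parent_visits, c
        ),
    )
```

Under this assumption, rewards backed up from the fast PINN-based predictor and from occasional simulator rollouts would both feed `total_reward`; how PhyPlan weighs the two sources is not specified here.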

Skill Learning

We consider the problem of learning a model for physical skills such as bouncing a ball-like object off a wedge, sliding over an object, swinging a pendulum, throwing an object as a projectile and hitting an object with a pendulum. The skill learning model predicts the state trajectory of an object as it undergoes a dynamic interaction with another object.



GIF 1

Sliding Skill

Skill network learns to determine the displacement and velocity of a box sliding on a rough plane with a given initial velocity, at any time queried.
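For intuition, the quantity the sliding skill network approximates has a simple closed form under idealised assumptions. The sketch below assumes a flat plane, Coulomb kinetic friction with coefficient `mu`, and g = 9.81 m/s^2; none of these values are stated on this page, and the function name is hypothetical.

```python
def sliding_state(v0, mu, t, g=9.81):
    """Displacement and velocity of a box sliding on a flat rough plane.

    Assumes constant deceleration mu * g until the box comes to rest.
    """
    decel = mu * g
    t_stop = v0 / decel if decel > 0 else float("inf")
    t_eff = min(t, t_stop)  # the box does not move after it stops
    displacement = v0 * t_eff - 0.5 * decel * t_eff ** 2
    velocity = max(v0 - decel * t_eff, 0.0)
    return displacement, velocity
```

A learnt predictor is still useful because the friction coefficient is typically unknown and must be inferred from data.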

GIF 2

Throwing Skill

Skill network learns to determine the location and velocity of a ball thrown with given initial angle and velocity, at any time queried.
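As a reference point, the idealised governing equations for a thrown ball can be evaluated directly. The sketch below assumes drag-free projectile motion in a vertical plane, launched from the origin, with g = 9.81 m/s^2; the function name and argument names are hypothetical.

```python
import math


def projectile_state(speed, angle, t, g=9.81):
    """Position and velocity of a drag-free projectile at time t.

    `angle` is the launch angle above the horizontal, in radians.
    """
    vx = speed * math.cos(angle)
    vy0 = speed * math.sin(angle)
    x = vx * t
    y = vy0 * t - 0.5 * g * t * t
    return (x, y), (vx, vy0 - g * t)
```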

GIF 3

Swinging Skill

Skill network learns to determine the angular position and angular velocity of a pendulum at any time queried, given its initial angular position.
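Unlike sliding and throwing, the pendulum equation has no elementary closed form, so a numerical reference is needed. The sketch below integrates the ideal pendulum equation theta'' = -(g / L) sin(theta) with a fourth-order Runge-Kutta scheme, assuming a frictionless point-mass pendulum released from rest. The length, step size and function name are assumptions made for illustration.

```python
import math


def pendulum_state(theta0, t, length=1.0, g=9.81, dt=1e-3):
    """Angle and angular velocity at time t for a pendulum released from rest."""

    def accel(theta):
        return -(g / length) * math.sin(theta)

    def rk4_step(theta, omega, h):
        k1_t, k1_w = omega, accel(theta)
        k2_t, k2_w = omega + 0.5 * h * k1_w, accel(theta + 0.5 * h * k1_t)
        k3_t, k3_w = omega + 0.5 * h * k2_w, accel(theta + 0.5 * h * k2_t)
        k4_t, k4_w = omega + h * k3_w, accel(theta + h * k3_t)
        theta += h * (k1_t + 2 * k2_t + 2 * k3_t + k4_t) / 6
        omega += h * (k1_w + 2 * k2_w + 2 * k3_w + k4_w) / 6
        return theta, omega

    theta, omega = theta0, 0.0
    n_full = int(t // dt)
    for _ in range(n_full):
        theta, omega = rk4_step(theta, omega, dt)
    remainder = t - n_full * dt  # final partial step so we land exactly on t
    if remainder > 0:
        theta, omega = rk4_step(theta, omega, remainder)
    return theta, omega
```

Stepping through time like this is exactly the cost a time-parameterised predictor avoids, since the network can be queried at any time in a single forward pass.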

GIF 4

Collision Skill

Skill network learns to determine the velocity of the puck just after it gets hit by a swinging pendulum.


The skill learning model is a neural network that predicts the object's state during a dynamic interaction as a continuous function of time. The figures above show the predicted positions of the ball plotted against time in the Bounce Task. Such interactions can be simulated in a physics engine using numerical integration schemes. However, since we aim to perform multi-step interactions, simulating outcomes during training is often intractable, so we adopt a learning-based approach instead. For certain skills like swinging, sliding and throwing, we leverage the known governing physics equations and employ a physics-informed loss function to constrain the network's latent space. We call the resulting models Physics-Informed Dynamics Predictors. Skills like collision, by contrast, are learnt directly from data because their physics is complex and intractable.
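To make the idea of a physics-informed loss concrete, here is a minimal sketch for the vertical coordinate of the throwing skill. It adds a data-fit term to a physics residual that penalises violations of y''(t) = -g at a set of collocation times. Finite differences stand in for the automatic differentiation a real PINN would use, and `predict`, `weight`, `h` and the collocation times are all hypothetical choices, not the authors' loss.

```python
def physics_informed_loss(predict, data, collocation_times,
                          g=9.81, weight=1.0, h=1e-3):
    """Data loss plus a penalty on the residual of y''(t) + g = 0.

    predict: callable mapping time t to the predicted height y(t).
    data: list of (t, observed_y) pairs.
    """
    data_loss = sum((predict(t) - y) ** 2 for t, y in data) / len(data)

    def residual(t):
        # Central finite-difference estimate of the second time derivative.
        second = (predict(t + h) - 2 * predict(t) + predict(t - h)) / h ** 2
        return second + g

    physics_loss = sum(
        residual(t) ** 2 for t in collocation_times
    ) / len(collocation_times)
    return data_loss + weight * physics_loss
```

A predictor that follows the true parabola drives both terms to roughly zero, while a predictor that fits a few data points but ignores gravity is still penalised at the collocation times.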

Benchmark Physical Reasoning Tasks

We created the following five challenging 3D physical reasoning tasks to analyse the performance of PhyPlan, inspired by prior works in simplistic 2D environments presented in [Allen et al., 2020] and [Bakhtin et al., 2019].

PhyPlan performs semantic reasoning using PINN-based skill models before executing each action in the environment, just as humans think before acting. As it executes actions, it also learns the difference between the PINN-based rewards and the actual rewards (called online learning). It therefore often improves in subsequent actions, just as humans improve with more trials. The videos show the actions taken by PhyPlan on each task. The effect of online learning is more evident in the Bounce and Bridge tasks, where the robot performs poorly in early attempts.
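The online-learning step can be sketched as Gaussian Process regression on the residual between the actual reward and the PINN-based reward. The example below is a dependency-free illustration that computes only the GP posterior mean. The RBF kernel, its length scale, the noise level and all names (`DiscrepancyGP`, `corrected_reward`) are assumptions made for the sketch, not details taken from PhyPlan.

```python
import math


def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel between two action tuples."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


class DiscrepancyGP:
    """Zero-mean GP over residuals: actual reward minus proxy reward."""

    def __init__(self, noise=1e-6, length_scale=0.5):
        self.noise, self.length_scale = noise, length_scale
        self.actions, self.alpha = [], []

    def fit(self, actions, residuals):
        K = [[rbf(a, b, self.length_scale) + (self.noise if i == j else 0.0)
              for j, b in enumerate(actions)] for i, a in enumerate(actions)]
        self.actions, self.alpha = list(actions), solve(K, list(residuals))

    def mean(self, action):
        return sum(w * rbf(action, a, self.length_scale)
                   for w, a in zip(self.alpha, self.actions))


def corrected_reward(proxy_reward, gp, action):
    """Proxy reward plus the learnt correction at this action."""
    return proxy_reward(action) + gp.mean(action)
```

Near actions that have already been executed, the correction pulls the proxy towards the observed reward; far from any observation the GP mean falls back to zero and the proxy is used unchanged.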


Launch Task

Robot trains to use the pendulum object present in the environment to make the ball reach the goal

The robot learns to correctly align the pendulum's plane and angle to throw the ball into the box.

Slide Task

Robot trains to use the pendulum object present in the environment to slide the puck to the goal

The following five trials represent the robot eventually sliding the puck to reach the goal by aligning the pendulum and using physical skills like hitting and sliding.

Bounce Task

Robot trains to use the wedge object present in the environment to make the ball reach the goal

In the above five trials, the robot places the wedge at the correct location with the proper orientation, throwing the ball from the proper height, so that the ball reaches the goal.

Bridge Task

Robot trains to use the pendulum and bridge objects present in the environment to make the puck reach the goal

Over the shown trials, the robot learns to correctly align the pendulum so that the hitting plane is correctly aligned. Eventually, the robot effectively uses objects like the bridge present in the environment.

Progressively reasoning over tools

PhyPlan's tool selector based on GFlowNet eventually learns to select and place the appropriate tool classes and the best corresponding tool variants.
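A trained GFlowNet samples sequences with probability proportional to their reward. The sketch below is not a GFlowNet: it simply enumerates a handful of candidate tool sequences and samples from that target distribution directly, to show what the tool selector is trained to approximate. The tool sequences and reward values are hypothetical.

```python
import random


def target_distribution(rewards):
    """Normalise positive rewards into sampling probabilities."""
    total = sum(rewards.values())
    return {seq: r / total for seq, r in rewards.items()}


def sample_sequence(rewards, rng):
    """Draw one tool sequence with probability proportional to its reward."""
    sequences = list(rewards)
    weights = [rewards[seq] for seq in sequences]
    return rng.choices(sequences, weights=weights, k=1)[0]


# Hypothetical proxy rewards for three candidate tool sequences.
example_rewards = {
    ("pendulum",): 1.0,
    ("pendulum", "bridge"): 3.0,
    ("pendulum", "wedge"): 0.5,
}
```

Calling `sample_sequence(example_rewards, random.Random(0))` repeatedly would return the pendulum-and-bridge sequence most often while still visiting the other two occasionally. Explicit enumeration stops being feasible as the number of tools and placements grows, which is where a learnt sampler becomes useful.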

Comparison with the Baselines

The image below quantifies the significant efficiency and accuracy advantage of PhyPlan over the baselines, in spite of PhyPlan having a greater reasoning depth compared to the baselines, which only select controls for "ideal" tools.


Below is a qualitative comparison with the baselines "DQN" (adapted from [Bakhtin et al., 2019]) and LLM. DQN is a Deep Q-Network trained on a set of observation-action-reward triplets by minimizing the cross-entropy between the soft prediction and the observed reward. DQN also uses the same algorithm (Gaussian Process) as PhyPlan for online learning.

  1. DQN (Baseline) executes actions in sequence while learning the difference between the predicted reward for an action and the actual reward. However, even after 11 trials it does not use the bridge, which is needed to land the ball closer to the goal.
  2. LLM (Baseline) executes actions in sequence while correcting them based on feedback in further trials. It uses the bridge in a few trials but does not align the pendulum appropriately even in 10 trials.
  3. PhyPlan executes actions in sequence while learning the difference between the predicted reward for an action and the actual reward. It does not use the bridge in the first attempt because of errors in prediction. However, it quickly realises the need for the bridge in the second attempt. Further, it chooses appropriate actions to land the ball in the goal in just the fourth attempt. Note that the robot learns to use the bridge effectively, a physical reasoning task reported earlier [Allen et al., 2020] to be challenging for model-free methods to learn. This highlights PhyPlan's adaptability to long-horizon tasks.

LLM (Baseline) Prompting Details

We investigate the physical reasoning abilities of a Large Language Model (specifically Google's Gemini-Pro LLM). We initially describe the task setup and ask the LLM to generate the actions. We then execute the generated action in the environment and reprompt the LLM with feedback on where the ball/puck landed with respect to the goal.
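The feedback strings used in the prompts (WRONG HALF, LEFT, RIGHT, OVERSHOT, FELL SHORT, GOAL) could be produced from a landing position roughly as sketched below. The geometry is an assumption: the robot is taken to sit at the origin of a 2D ground frame, the correct half is the side facing the goal, and `goal_radius` and `lateral_tol` are hypothetical thresholds. The actual rules used in the experiments are not given on this page.

```python
import math


def feedback(landing, goal, goal_radius=0.1, lateral_tol=0.1):
    """Map a landing position to one of the six feedback strings.

    Assumes the robot is at the origin and the goal is not.
    """
    gx, gy = goal
    px, py = landing
    dist = math.hypot(px - gx, py - gy)
    if dist <= goal_radius:
        return "GOAL"
    goal_range = math.hypot(gx, gy)
    fx, fy = gx / goal_range, gy / goal_range  # unit vector towards the goal
    along = px * fx + py * fy                  # progress towards the goal
    lateral = fx * py - fy * px                # positive means left of the line
    if along <= 0:
        return "WRONG HALF"
    if lateral > lateral_tol:
        return f"LEFT by {dist:.2f}"
    if lateral < -lateral_tol:
        return f"RIGHT by {dist:.2f}"
    if along > goal_range:
        return f"OVERSHOT by {dist:.2f}"
    return f"FELL SHORT by {dist:.2f}"
```

In a reprompting loop, this string would be sent back to the LLM after each executed action until it returns "GOAL" or the trial budget runs out.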


Launch Task

Initial Prompt (Task Description):
There is a robot and a goal located at {goal_pos} outside the direct reach of the robot. There is a ball that needs to reach the goal. The environment has a fixed pillar over which the ball is resting, and a pendulum hanging over the ball that the robot can orient to hit the ball to throw it to the goal. The robot can orient the pendulum along any vertical plane and choose to drop the pendulum from any angle from the vertical axis. When hit with a pendulum, the puck projectiles and lands far away on the ground.
Sanity check 1: How does the plane of the pendulum affect the puck's position with respect to the goal?
Sanity check 2: How does the drop angle of the pendulum affect the puck's position with respect to the goal?
Feedback Prompt:
In one line, give the numerical values of the angle to orient the pendulum's plane and the angle to drop the pendulum from (both in decimal radians). The bound for plane orientation angle is ({bnds[0][0]}, {bnds[0][1]}) and that for drop angle with vertical axis is ({bnds[1][0]}, {bnds[1][1]}). I will tell you where the ball landed, and you should modify your answer accordingly till the ball reaches the goal. I have marked the ground into two halves. The goal lies in one half, and the robot and the wedge are at the centre. Thoughout the conversation, remember that my response would be one of these:
  1. The ball lands in the half not containing the goal, I'd say 'WRONG HALF'.
  2. The ball lands in the correct half but left of the goal, I'd say 'LEFT by <horizontal distance between ball and goal>'.
  3. The ball lands in the correct half but right of the goal, I'd say 'RIGHT by <horizontal distance between ball and goal>'.
  4. The ball lands in the correct half and in line but overshot the goal, I'd say 'OVERSHOT by <horizontal distance between ball and goal>'.
  5. The ball lands in the correct half and in line but fell short of the goal, I'd say 'FELL SHORT by <horizontal distance between ball and goal>'.
  6. Finally, the ball successfully landed in the goal, I'd say 'GOAL'.
Note: In your response, do not write anything else except the (pendulum's plane angle, pendulum's drop angle) pair. Send in tuple FORMAT: (angle 1, angle 2). Do not emphasise the answer, just return plain text. Let's begin with an initial guess!

Slide Task

Initial Prompt (Task Description):
There is a robot and a goal located at {goal_pos} outside the direct reach of the robot. There is a puck that needs to reach the goal. The environment has a fixed table over which the puck slides, and a pendulum hanging over the puck that the robot can orient to hit the puck to slide it to the goal. The robot can orient the pendulum along any vertical plane and choose to drop the pendulum from any angle from the vertical axis. When hit with a pendulum, the puck slides on the table.
Sanity check 1: How does the plane of the pendulum affect the puck's position with respect to the goal?
Sanity check 2: How does the drop angle of the pendulum affect the puck's position with respect to the goal?
Feedback Prompt:
In one line, give the numerical values of the angle to orient the pendulum's plane and the angle to drop the pendulum from (both in decimal radians). The bound for plane orientation angle is ({bnds[0][0]}, {bnds[0][1]}) and that for drop angle with vertical axis is ({bnds[1][0]}, {bnds[1][1]}). I will tell you where the puck landed, and you should modify your answer accordingly till the puck reaches the goal. I have marked the ground into two halves. The goal lies in one half, and the robot and the wedge are at the centre. Thoughout the conversation, remember that my response would be one of these:
  1. The puck lands in the half not containing goal, I'd say 'WRONG HALF'.
  2. The puck lands in the correct half but left of the goal, I'd say 'LEFT by <horizontal distance between puck and goal>'.
  3. The puck lands in the correct half but right of the goal, I'd say 'RIGHT by <horizontal distance between puck and goal>'.
  4. The puck lands in the correct half and in line but overshot the goal, I'd say 'OVERSHOT by <horizontal distance between puck and goal>'.
  5. The puck lands in the correct half and in line but fell short of the goal, I'd say 'FELL SHORT by <horizontal distance between puck and goal>'.
  6. Finally, the puck successfully landed in the goal, I'd say 'GOAL'.
Note: In your response, do not write anything else except the (pendulum's plane angle, pendulum's drop angle) pair. Send in tuple FORMAT: (angle 1, angle 2). Do not emphasise the answer, just return plain text. Let's begin with an initial guess!

Bounce Task

Initial Prompt (Task Description):
There is a robot and a goal located at {goal_pos} outside the direct reach of the robot. There is a ball that needs to reach the goal. The environment has a wedge (an inclined plane at 45 degrees from the horizontal plane) placed at origin, and the robot can bounce the ball over the wedge to place the ball inside the goal. The height of the wedge centre from the ground is fixed at 0.3 metres. The robot can orient the wedge along any horizontal direction and choose to drop the ball over the wedge from any height. When dropped from a height, the ball bounces on the wedge and lands far away on the ground.
Sanity check 1: How does the orientation angle of the wedge affect the ball's position with respect to the goal?
Sanity check 2: How does the drop height of the ball affect the ball's position with respect to the goal?
Feedback Prompt:
In one line, give the numerical values of the angle to orient the wedge and the height to drop the ball from in the format (angle in decimal radians, height in meters). The bound for angle is ({bnds[0][0]}, {bnds[0][1]}) and that for height is ({bnds[1][0]}, {bnds[1][1]}). I will tell you where the ball landed, and you should modify your answer accordingly till the ball reaches the goal. I have marked the ground into two halves. The goal lies in one half, and the robot and the wedge are at the centre. Thoughout the conversation, remember that my response would be one of these:
  1. The ball lands in the half not containing goal, I'd say 'WRONG HALF'.
  2. The ball lands in the correct half but left of the goal, I'd say 'LEFT by <horizontal distance between ball and goal>'.
  3. The ball lands in the correct half but right of the goal, I'd say 'RIGHT by <horizontal distance between ball and goal>'.
  4. The ball lands in the correct half and in line but overshot the goal, I'd say 'OVERSHOT by <horizontal distance between ball and goal>'.
  5. The ball lands in the correct half and in line but fell short of the goal, I'd say 'FELL SHORT by <horizontal distance between ball and goal>'.
  6. Finally, the ball successfully landed in the goal, I'd say 'GOAL'.
Note: In your response, do not write anything else except the (angle, height) pair. Send in tuple FORMAT: (angle, height). Do not emphasise the answer, just return plain text. Let's begin with an initial guess!

Bridge Task

Initial Prompt (Task Description):
There is a robot and a goal located at {goal_pos} outside the direct reach of the robot. There is a puck that needs to reach the goal. The environment has a fixed table over which the puck slides, a movable bridge over which the puck slides and a pendulum that the robot can orient to move the puck towards the goal. The robot can orient the pendulum along any vertical plane, orient the bridge in any horizontal direction and choose to drop the pendulum from any angle from the vertical axis. When hit with a pendulum, the puck slides on the table, then on the bridge and finally projectiles to land far away on the ground.
Sanity check 1: How does the plane of the pendulum affect the puck's position with respect to the goal?
Sanity check 2: How does the drop angle of the pendulum affect the puck's position with respect to the goal?
Sanity check 3: How does the orientation angle of the bridge affect the puck's position with respect to the goal?
Feedback Prompt:
In one line, give the numerical values of the angle to orient the pendulum's plane, the angle to orient the bridge and the angle to drop the pendulum from (all in decimal radians). The bound for plane orientation angle is ({bnds[0][0]}, {bnds[0][1]}), that for bridge orientation angle is ({bnds[2][0]}, {bnds[2][1]}), and that for drop angle with vertical axis is ({bnds[1][0]}, {bnds[1][1]}). I will tell you where the puck landed, and you should modify your answer accordingly till the puck reaches the goal. I have marked the ground into two halves. The goal lies in one half, and the robot and the wedge are at the centre. Thoughout the conversation, remember that my response would be one of these:
  1. The puck lands in the half not containing goal, I'd say 'WRONG HALF'.
  2. The puck lands in the correct half but left of the goal, I'd say 'LEFT by <horizontal distance between puck and goal>'.
  3. The puck lands in the correct half but right of the goal, I'd say 'RIGHT by <horizontal distance between puck and goal>'.
  4. The puck lands in the correct half and in line but overshot the goal, I'd say 'OVERSHOT by <horizontal distance between puck and goal>'.
  5. The puck lands in the correct half and in line but fell short of the goal, I'd say 'FELL SHORT by <horizontal distance between puck and goal>'.
  6. Finally, the puck successfully landed in the goal, I'd say 'GOAL'.
Note: In your response, do not write anything else except the (pendulum's plane angle, pendulum's drop angle, bridge's orientation angle) triplet. Send in tuple FORMAT: (angle 1, angle 2, angle 3). Do not emphasise the answer, just return plain text. Let's begin with an initial guess!

Ricochet Task

Initial Prompt (Task Description):
There is a robot and a goal located at {goal_pos} outside the direct reach of the robot. There is a ball that needs to reach the goal. The environment has a fixed pillar over which the ball is resting, a movable wedge (an inclined plane at 45 degrees from the horizontal plane), and a pendulum hanging over the ball that the robot can orient to hit the ball. The robot can orient the pendulum along any vertical plane and choose to drop the pendulum from any angle from the vertical axis. However, the pendulum can only be used to hit the ball in a direction away from the goal, making it impossible to reach the goal directly by the pendulum alone. To solve this, the robot must move and orient the wedge such that the ball bounces off it after being hit by the pendulum, and lands inside the goal. The robot can change the radial distance of the wedge from the origin, the direction of this radial distance (in polar coordinates), and orient the wedge in any horizontal direction.
Sanity check 1: How does the drop angle of the pendulum affect the ball's trajectory and its position with respect to the wedge?
Sanity check 2: How does the plane of the pendulum affect the ball's trajectory position with respect to the wedge?
Sanity check 3: How does the position (radial distance and angle) of the wedge affect the ball's position with respect to the goal?
Sanity check 4: How does the orientation angle of the wedge affect the ball's position with respect to the goal?
Feedback Prompt:
In one line, give the numerical values of the pendulum's plane angle, pendulum's drop angle, wedge's radial distance from origin, direction angle of this radial distance (in polar form), and wedge's orientation angle (all in decimal radians or meters where applicable). The bound for pendulum plane orientation is ({bnds[0][0]}, {bnds[0][1]}), that for pendulum drop angle is ({bnds[1][0]}, {bnds[1][1]}), that for wedge radial distance is ({bnds[2][0]}, {bnds[2][1]}), that for radial direction angle is ({bnds[3][0]}, {bnds[3][1]}), and that for wedge orientation angle is ({bnds[4][0]}, {bnds[4][1]}). I will tell you where the ball landed, and you should modify your answer accordingly till the ball reaches the goal. I have marked the ground into two halves. The goal lies in one half, and the robot and the wedge are at the centre. Throughout the conversation, remember that my response would be one of these:
  1. The ball lands in the half not containing goal, I'd say 'WRONG HALF'.
  2. The ball lands in the correct half but left of the goal, I'd say 'LEFT by <horizontal distance between ball and goal>'.
  3. The ball lands in the correct half but right of the goal, I'd say 'RIGHT by <horizontal distance between ball and goal>'.
  4. The ball lands in the correct half and in line but overshot the goal, I'd say 'OVERSHOT by <horizontal distance between ball and goal>'.
  5. The ball lands in the correct half and in line but fell short of the goal, I'd say 'FELL SHORT by <horizontal distance between ball and goal>'.
  6. Finally, the ball successfully landed in the goal, I'd say 'GOAL'.
Note: In your response, do not write anything else except the (pendulum plane angle, pendulum drop angle, wedge radial distance, wedge radial direction, wedge orientation angle) tuple. Send in tuple FORMAT: (angle 1, angle 2, distance, angle 3, angle 4). Do not emphasise the answer, just return plain text. Let's begin with an initial guess!
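Each prompt above asks the LLM for a plain-text tuple whose entries lie within the bounds `bnds`. A reply could be turned into an executable action along the following lines. This is a sketch only: the function name is hypothetical, and clamping out-of-range values to the bounds (rather than rejecting the reply and reprompting) is an assumption.

```python
import re

# Matches signed decimals such as 0.5, -1.57, 2 or 1e-3.
NUMBER = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")


def parse_action(reply, bounds):
    """Extract one number per bound from the reply and clamp it into range.

    bounds: list of (low, high) pairs, in the order the prompt requests.
    """
    numbers = NUMBER.findall(reply)
    if len(numbers) != len(bounds):
        raise ValueError(
            f"expected {len(bounds)} values, found {len(numbers)}: {reply!r}"
        )
    values = [float(n) for n in numbers]
    return tuple(
        min(max(v, low), high) for v, (low, high) in zip(values, bounds)
    )
```

Raising on a wrong number of values makes malformed replies visible instead of silently executing a partial action.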

References

1. [Allen et al., 2020] Kelsey R Allen, Kevin A Smith, and Joshua B Tenenbaum.
        Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning.
        Proceedings of the National Academy of Sciences, 117(47):29302-29310, 2020.
2. [Bakhtin et al., 2019] Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, and Ross Girshick.
        Phyre: A new benchmark for physical reasoning.
        Advances in Neural Information Processing Systems, 32,484 2019.

Citation

@inproceedings{phyplan2026,
      title     = {PhyPlan: Learning To Plan Tasks with Generalizable and Rapid Physical Reasoning for Embodied Manipulation},
      author    = {Kanwar, Ankit and Soin, Hartej and Barnawal, Abhinav and Chopra, Mudit and Vagadia, Harshil and Banerjee, Tamajit and Tuli, Shreshth and Chakraborty, Souvik and Paul, Rohan},
      booktitle = {},
      year      = {2026}
    }