Research

Paper

March 09, 2026

Exp-Force: Experience-Conditioned Pre-Grasp Force Selection with Vision-Language Models

Authors

Siqi Shang, Minchao Huang, Bill Fan, Lillian Chin

Abstract

Accurate pre-contact grasp force selection is critical for safe and reliable robotic manipulation. Adaptive controllers regulate force after contact but still require a reasonable initial estimate. Starting a grasp with too little force requires reactive adjustment, while starting with too much force risks damaging fragile objects. This trade-off is particularly challenging for compliant grippers, whose contact mechanics are difficult to model analytically. We propose Exp-Force, an experience-conditioned framework that predicts the minimum feasible grasping force from a single RGB image. The method retrieves a small set of relevant prior grasping experiences and conditions a vision-language model on these examples for in-context inference, without analytic contact models or manually designed heuristics. On 129 object instances, Exp-Force achieves a best-case mean absolute error (MAE) of 0.43 N, reducing error by 72% over zero-shot inference. In real-world tests on 30 unseen objects, it improves the appropriate force selection rate from 63% to 87%. These results demonstrate that Exp-Force enables reliable and generalizable pre-grasp force selection by leveraging prior interaction experiences. http://expforcesubmission.github.io/Exp-Force-Website/
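The abstract describes a retrieve-then-prompt pipeline: embed the query image, fetch the most similar prior grasp experiences, and let a vision-language model complete the force prediction from those in-context examples. The sketch below illustrates that general idea only; it is not the authors' implementation. The embedding bank, the prompt format, and the `query_vlm` hook are all assumptions standing in for whatever encoder and VLM the paper actually uses.

```python
import numpy as np

def retrieve_experiences(query_emb, bank_embs, bank_records, k=3):
    """Return the k prior grasp experiences whose image embeddings
    are most similar (cosine similarity) to the query embedding.
    bank_embs: (N, D) array; bank_records: list of N dicts with
    at least 'description' and 'force_n' keys (assumed schema)."""
    q = query_emb / np.linalg.norm(query_emb)
    b = bank_embs / np.linalg.norm(bank_embs, axis=1, keepdims=True)
    sims = b @ q
    top = np.argsort(-sims)[:k]
    return [bank_records[i] for i in top]

def build_prompt(examples, query_desc):
    """Format retrieved (description, force) pairs as in-context
    examples, ending with the query object for the model to complete."""
    lines = ["Predict the minimum feasible grasp force in newtons."]
    for ex in examples:
        lines.append(f"Object: {ex['description']} -> Force: {ex['force_n']:.2f} N")
    lines.append(f"Object: {query_desc} -> Force:")
    return "\n".join(lines)

def predict_force(query_emb, query_desc, bank_embs, bank_records,
                  query_vlm, k=3):
    """Experience-conditioned prediction: retrieve, prompt, parse.
    query_vlm is a hypothetical callable wrapping the actual VLM."""
    examples = retrieve_experiences(query_emb, bank_embs, bank_records, k)
    prompt = build_prompt(examples, query_desc)
    reply = query_vlm(prompt)               # e.g. "1.25 N"
    return float(reply.strip().split()[0])  # take the numeric value
```

The key design point the abstract emphasizes is that the force estimate comes from analogy to prior interactions rather than from an analytic contact model, so the retrieval step carries most of the inductive structure; the VLM only interpolates among the retrieved examples.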

Metadata

arXiv ID: 2603.08668
Provider: ARXIV
Primary Category: cs.RO
Published: 2026-03-09
Fetched: 2026-03-10 05:43

