In this work, we propose Mod-Squad, a modular multi-task learner based on mixture-of-experts and a novel loss to address the gradient conflicts among tasks. We demonstrate its potential to scale up in both model capacity and target task numbers while keeping the computation cost low.
We propose a simple but effective modular approach, MOPA (Modular ObjectNav with PointGoal agents), to systematically investigate the inherent modularity of the object navigation task in Embodied AI.
End-to-End Yet Modular Architecture distinguishes itself from traditional and end-to-end planning approaches by integrating modular design with end-to-end training. As a result, it maintains safety and interpretability while simultaneously optimizing all modules for downstream planning.
This necessitates the creation of modular and scalable ways to teach VLMs about physical reasoning. To that end, we introduce Physics Context Builders (PCBs), a modular framework where specialized smaller VLMs are fine-tuned to generate detailed physical scene descriptions.
The modular architecture of MoReVQA allows it to be dynamically tailored to a wide range of datasets, question types, and tasks by selectively engaging different APIs and reasoning strategies based on the task at hand.
Building on this foundation, we introduce SmartCLIP, a novel approach that identifies and aligns the most relevant visual and textual representations in a modular manner.
Modular Blind Video Quality Assessment. Wen Wen, Mu Li, Yabin Zhang, Yiting Liao, Junlin Li, Li Zhang, Kede Ma; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 2763-2772.
We design a modular adapter consisting of a functional adapter and a representation descriptor. The representation descriptors are trained as distribution shift indicators and used to trigger self-expansion signals.
In this paper, we address a new problem called Modular Customization, with the goal of efficiently merging customized models that were fine-tuned independently for individual concepts.