Abstract: Visual encoders are fundamental components in vision-language models (VLMs), each showcasing unique strengths derived from various pre-trained visual foundation models. To leverage the ...
Abstract: Pre-trained encoders in computer vision have recently received great attention from both research and industry communities. Among others, a promising paradigm is to utilize self-supervised ...
Converting protein tertiary structure into discrete tokens via vector-quantized variational autoencoders (VQ-VAEs) creates a language of 3D geometry and provides a natural interface between sequence ...