Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
(ICLR 2025)
Finetuning for multiview feature equivariance on a synthetic object significantly enhances a Vision Transformer's ability to produce consistent 3D feature correspondences across diverse objects. This improvement translates into superior performance on 3D tasks such as pose estimation, video tracking, and semantic correspondence.
You may also be interested in our previous work SparseDFF (ICLR 2024), which employs DINO features for one-shot dexterous manipulation.
- Change Logs
- Environment Setup
- Quick Start
- Finetuning on Objaverse
- Evaluation
- Acknowledgements
- BibTeX
- 2025/2/19: Uploaded other ViT-family models.
- 2025/2/18: Uploaded two missing files for PF-PASCAL evaluation.
- 2025/2/4: Uploaded more DINOv2 variants (DINOv2-Small/Large/Giant). Provided the environment requirements.
- 2025/1/26: Uploaded pretrained models (DINOv2-Base) along with training/evaluation recipes.
We provide a Huggingface demo at https://huggingface.co/spaces/qq456cvb/3DCorrEnhance.
Our environment information can be found in requirements.txt. You can install the dependencies with:
pip install -r requirements.txt
Our finetuned DINOv2-Small/Base/Large/Giant and other ViT-family models are available at Huggingface. To load DINOv2-Base, run:
from finetune import FinetuneDINO
model = FinetuneDINO.load_from_checkpoint('https://huggingface.co/qq456cvb/3DCorrEnhance/resolve/main/dinov2_base.ckpt', r=4, backbone_size='base').eval().cuda()
To load other ViT models (e.g., CLIP), run:
from finetune_timm import FinetuneTIMM
model = FinetuneTIMM.load_from_checkpoint('https://huggingface.co/qq456cvb/3DCorrEnhance/resolve/main/clip.ckpt', r=4, vit='clip').eval().cuda()
To extract descriptors for specific keypoints (an Nx2 numpy array), use:
import torch
from PIL import Image
import numpy as np

rgb = np.array(Image.open('/path/to/rgb.png'))
kps = ...  # N x 2 numpy array
rgb_input = torch.from_numpy(np.moveaxis((rgb / 255.).astype(np.float32), -1, 0)).cuda()
with torch.no_grad():
    kp_feats = model.get_feature(rgb_input[None], torch.from_numpy(kps).cuda()[None], normalize=True)[0]  # N x F torch tensor
To extract the entire feature map:
import torch
from PIL import Image
import numpy as np

rgb = np.array(Image.open('/path/to/rgb.png'))
rgb_input = torch.from_numpy(np.moveaxis((rgb / 255.).astype(np.float32), -1, 0)).cuda()
with torch.no_grad():
    feat_img = model.get_feature_wo_kp(rgb_input[None], normalize=True)[0]  # H x W x F torch tensor
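As a quick sanity check, the two snippets above can be combined to match keypoints across a pair of images. The sketch below is not part of the repository: it assumes rgb_a_input, kps_a, and rgb_b_input have been prepared exactly as shown above, and matches each keypoint descriptor from image A to its most similar location in image B's feature map. Because normalize=True returns L2-normalized features, a plain dot product equals cosine similarity.

with torch.no_grad():
    # N x F descriptors at the query keypoints in image A
    feats_a = model.get_feature(rgb_a_input[None], torch.from_numpy(kps_a).cuda()[None], normalize=True)[0]
    # H x W x F dense feature map of image B
    feat_map_b = model.get_feature_wo_kp(rgb_b_input[None], normalize=True)[0]

H, W, F = feat_map_b.shape
sim = feats_a @ feat_map_b.reshape(-1, F).T            # N x (H*W) cosine similarities
best = sim.argmax(dim=-1)                              # flat index of the best match per keypoint
matches = torch.stack([best % W, best // W], dim=-1)   # N x 2 (x, y) coordinates in image B's feature map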
To prepare the multi-view training data on Objaverse, first download the Objaverse glbs (only a 10k subset is required, as defined in data/10k.txt). Then run data_utils/render_objects.py to render the 10K randomly sampled Objaverse objects with blenderproc.
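If you do not already have the glbs locally, one way to fetch them is with the objaverse Python package. The snippet below is only a sketch and assumes data/10k.txt lists one Objaverse UID per line; if the file stores relative glb paths instead, adjust the parsing accordingly.

import objaverse

# Read the 10k-object subset (assumed: one UID per line).
with open('data/10k.txt') as f:
    uids = [line.strip() for line in f if line.strip()]

# Downloads the .glb files (by default under ~/.objaverse/hf-objaverse-v1/glbs/)
# and returns a {uid: local_path} dict; move or symlink them under data/objaverse/.
objects = objaverse.load_objects(uids=uids, download_processes=8)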
Your directory structure should look like this:
3DCorrEnhance/
└── data/
    ├── 10k.txt
    ├── obj_poses.npy
    ├── objaverse/
    │   └── hf-objaverse-v1/
    │       └── glbs/
    │           ├── 000-000/
    │           ├── ...
    │           └── 000-159/
    └── objaverse_renderings/
To finetune the DINOv2 base network, run:
python finetune.py backbone=base
This will finetune DINOv2 Base and save checkpoints in the checkpoints/ folder. For other DINOv2 variants, change the backbone type:
python finetune.py backbone=large
For DINOv2 with registers, use:
python finetune.py backbone=base reg=True
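After finetuning, a local checkpoint can be loaded the same way as the released ones. The path below is only illustrative; use the checkpoint actually written by your run.

from finetune import FinetuneDINO

# Hypothetical local path; point this at the checkpoint saved under checkpoints/.
model = FinetuneDINO.load_from_checkpoint('checkpoints/last.ckpt', r=4, backbone_size='base').eval().cuda()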
For pose estimation, download the test data from OnePose++ and place it under data/. Your directory should look like this:
3DCorrEnhance/
└── data/
    ├── sfm_output/
    │   └── outputs_softmax_loftr_loftr/
    └── lowtexture_test_data/
For video tracking evaluation, download the data from TAP-Vid-DAVIS and place it under data/:
3DCorrEnhance/
└── data/
    ├── tapvid_davis_data_strided.pkl
    └── lowtexture_test_data/
For semantic transfer, download the PF-PASCAL dataset and place it under data/:
3DCorrEnhance/
└── data/
    └── PF-dataset-PASCAL/
        ├── Annotations/
        ├── JPEGImages/
        ├── test_pairs_pf_different_views.txt
        └── test_pairs_pf_same_views.txt
To evaluate a checkpoint on all three tasks, run:
python evaluate.py --ckpt /path/to/ckpt --pose --tracking --transfer
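The task flags can also be passed individually; for example, to run only the pose estimation benchmark (the checkpoint path is illustrative):

python evaluate.py --ckpt checkpoints/last.ckpt --pose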
Some code is adapted from DINO-Tracker, FiT3D, and Objaverse-XL. We thank these projects for their open-source contributions.
If you find our work helpful, please consider citing:
@misc{you2024multiviewequivarianceimproves3d,
title={Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning},
author={Yang You and Yixin Li and Congyue Deng and Yue Wang and Leonidas Guibas},
year={2024},
eprint={2411.19458},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.19458},
}