Hello, thank you very much for your work.
When testing on msvd-caption-test, I first used the fill_in_video_file script you provided to add the video directory. I then ran inference and evaluation with the Tarsier-7B and Tarsier-34B models. The resulting CIDEr scores were 56.7 and 58.9 respectively, which differ significantly from those reported in the paper. I also ran inference and evaluation on MSR-VTT, where Tarsier-34B scored 31.4.
I used two A800 GPUs with 80 GB of memory each and made no other modifications. Are there any other details I might have overlooked? I look forward to your reply.
Thanks for the reminder! We have just updated the prompts in the metadata to our latest version, which is consistent with the test results reported in the paper.
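Before rerunning inference, it may help to confirm that the local annotation files actually contain the updated prompts. The following is only a minimal sketch: it assumes the metadata is stored as JSON Lines with one record per sample, and the key names `idx` and `prompt` as well as the helper names are hypothetical, not taken from the repository, so adjust them to the real file layout.

```python
import json


def load_prompts(path, id_key="idx", prompt_key="prompt"):
    """Map each sample id to its prompt, reading one JSON object per line.

    The default key names are assumptions for illustration only.
    """
    prompts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            prompts[record[id_key]] = record[prompt_key]
    return prompts


def changed_prompts(old_path, new_path, id_key="idx", prompt_key="prompt"):
    """Return {id: (old_prompt, new_prompt)} for samples whose prompt differs."""
    old = load_prompts(old_path, id_key, prompt_key)
    new = load_prompts(new_path, id_key, prompt_key)
    shared_ids = old.keys() & new.keys()
    return {k: (old[k], new[k]) for k in shared_ids if old[k] != new[k]}
```

If `changed_prompts` returns an empty dictionary when comparing the file used for the earlier run against the freshly pulled one, the prompts were not the source of the gap and the remaining differences lie elsewhere in the setup.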