Stars
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
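A Chinese-language guide to prompting ChatGPT. Usage guides for various scenarios. Learn how to make it do what you ask.
ChatGPT's explosive popularity marked a key step toward AGI. This project aims to collect open-source alternatives to ChatGPT, including large text models and large multimodal models, as a convenience for everyone.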
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
✨✨Latest Advances on Multimodal Large Language Models
On-device AI across mobile, embedded and edge for PyTorch
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
This repo includes ChatGPT prompt curation to use ChatGPT and other LLM tools better.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
TensorFlow code and pre-trained models for BERT
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
README and scripts for the Cityscapes Dataset
A topic-centric list of HQ open datasets.
Collection of Remote Sensing Vision-Language Models
cvpr2024/cvpr2023/cvpr2022/cvpr2021/cvpr2020/cvpr2019/cvpr2018/cvpr2017 collection of papers, code, explanations, and livestreams, compiled by the 极市 team
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Document Dewarping with Control Points
Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving)