Qwen2-VL differences with Python implementation #1108

Open
hgaiser opened this issue Jan 28, 2025 · 0 comments
Labels
bug Something isn't working

hgaiser commented Jan 28, 2025

Describe the bug

I am trying to use Qwen2-VL as per the example, but I'm getting unexpected results. I modified the example, changing only the prompt and the number of times the model is run:

Modified Rust example
use anyhow::Result;
use mistralrs::{IsqType, TextMessageRole, VisionLoaderType, VisionMessages, VisionModelBuilder};

const MODEL_ID: &str = "Qwen/Qwen2-VL-2B-Instruct";

#[tokio::main]
async fn main() -> Result<()> {
    let model = VisionModelBuilder::new(MODEL_ID, VisionLoaderType::Qwen2VL)
        .with_isq(IsqType::Q4K)
        .with_logging()
        .build()
        .await?;

    let bytes = match reqwest::blocking::get(
        "https://www.garden-treasures.com/cdn/shop/products/IMG_6245.jpg",
    ) {
        Ok(http_resp) => http_resp.bytes()?.to_vec(),
        Err(e) => anyhow::bail!(e),
    };
    let image = image::load_from_memory(&bytes)?;

    let messages = VisionMessages::new().add_image_message(
        TextMessageRole::User,
        "You are to act like an image classification model. I want you to detect if there is a dog in the image. I want you to respond '0' if there is no dog and '1' if there is a dog.",
        image,
        &model,
    )?;

    for _ in 0..10 {
        let response = model.send_chat_request(messages.clone()).await?;
        println!("{}", response.choices[0].message.content.as_ref().unwrap());
    }

    Ok(())
}

Running this (cargo r --features "cuda flash-attn cudnn" --release --example qwen2vl) gives me:

gekl
恁 asked if there is a dog in bei jia. Told ya '0' becuase there is no dog.
flower
))] attached to the bottom right
found no dogs
There is no dog.
maryflower1711
The image contains flowers, so there is no dog in the image.
0
Flower

ps. the normal example where the prompt is to describe the image

Doing the same thing in Python with this implementation, based on their (slightly modified) example:

Modified Python example
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info

# default: Load the model on the available device(s)
# model = Qwen2VLForConditionalGeneration.from_pretrained(
#     "Qwen/Qwen2-VL-2B-Instruct", torch_dtype="auto", device_map="auto"
# )

# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# default processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# The default range for the number of visual tokens per image in the model is 4-16384.
# You can set min_pixels and max_pixels according to your needs, such as a token range of 256-1280, to balance performance and cost.
# min_pixels = 256*28*28
# max_pixels = 1280*28*28
# processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://www.garden-treasures.com/cdn/shop/products/IMG_6245.jpg",
            },
            {"type": "text", "text": "You are to act like an image classification model. I want you to detect if there is a dog in the image. I want you to respond '0' if there is no dog and '1' if there is a dog."},
        ],
    }
]

for _ in range(10):
	# Preparation for inference
	text = processor.apply_chat_template(
		messages, tokenize=False, add_generation_prompt=True
	)
	image_inputs, video_inputs = process_vision_info(messages)
	inputs = processor(
		text=[text],
		images=image_inputs,
		videos=video_inputs,
		padding=True,
		return_tensors="pt",
	)
	inputs = inputs.to("cuda")

	# Inference: Generation of the output
	generated_ids = model.generate(**inputs, max_new_tokens=128)
	generated_ids_trimmed = [
		out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
	]
	output_text = processor.batch_decode(
		generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
	)

	print(output_text)

Gives me the following output:

['0']
['0']
['0']
['0']
['0']
['0']
['0']
['0']
['0']
['0']
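
To put a number on the gap between the two runs, the outputs reported above can be tallied against the format the prompt asks for. The following is a minimal sketch and not part of either example: `count_valid` is a hypothetical helper, and the response lists are copied by hand from the outputs above.

```python
# Responses copied verbatim from the Rust run reported above.
rust_outputs = [
    "gekl",
    "恁 asked if there is a dog in bei jia. Told ya '0' becuase there is no dog.",
    "flower",
    "))] attached to the bottom right",
    "found no dogs",
    "There is no dog.",
    "maryflower1711",
    "The image contains flowers, so there is no dog in the image.",
    "0",
    "Flower",
]

# The Python run printed ['0'] ten times.
python_outputs = ["0"] * 10


def count_valid(responses, allowed=("0", "1")):
    """Count responses that are exactly one of the allowed labels."""
    return sum(1 for r in responses if r.strip() in allowed)


if __name__ == "__main__":
    print(f"Rust:   {count_valid(rust_outputs)}/{len(rust_outputs)} valid")
    print(f"Python: {count_valid(python_outputs)}/{len(python_outputs)} valid")
```

On these outputs, only one of the ten Rust responses is a bare label, while all ten Python responses are.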

Did I configure something incorrectly? I tried disabling ISQ, but it had no effect.
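
One thing that may be worth ruling out is a difference in sampling settings between the two implementations. This is an assumption, not something confirmed here, and it would at most explain the run-to-run variation, not the garbled text. The toy sketch below uses made-up logits and a hypothetical `sample_tokens` helper (neither comes from mistral.rs or transformers) to show how a very low temperature makes repeated runs agree, while a temperature of 1.0 does not.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(logits, temperature, draws, seed=0):
    """Draw token indices from the temperature-scaled distribution (toy helper)."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=draws)


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5]  # hypothetical values, for illustration only
    for temperature in (0.01, 1.0):
        picks = sample_tokens(toy_logits, temperature, draws=200)
        print(f"temperature={temperature}: distinct tokens = {sorted(set(picks))}")
```

If sampling does turn out to differ, forcing greedy decoding on both sides (for example, passing `do_sample=False` to `generate` in transformers) would be one way to compare the two implementations like for like.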

Also, running the example in a loop gives the following outputs (each prefixed with the loop index to make clear where a new response starts):

Rust example output
0: flowers are ornamentals that have variable sizes and are multiflowered. In the image, you have a water red lily, which is a beautiful flower with a distinctive red and white striped pattern in the petals.

Fun fact: Water lilies are commonly found in marshes, swamps, and riversides due to their ability to thrive in low-lying and stagnant water conditions, giving them an interesting and unique appearance. They require plenty of sunlight for their roots to grow and thrive.

1: Chinese Camellia (Camellia) is a type of flowering plant usually referred to as the 'Camilla' in plants and flowers. It has large, open beautiful pinkish flowers arranged in a circle. Here are some fun facts about this beautiful flower:
- The Camellia is native to Japan, but it今日spread to many parts of the world.
- They belong to the genus Camellia in the-borders the family Theaceae.
- These flowers can reach up to 40 cm or more in diameter. Largest flowers are from a variety named 'Yulan I' by Lili de Kloet.
- The flowers can be ball-shaped or de-sheeted.
- With a *look of Amiya and romantic *meaning, they are often favorite flowers among many people in Japan.

2: /**
 *
 *
 *
 *
 * Flower Stems:
 *
 * Japanese Anemone:
 * Gardening Geumblette:
 * Copia:
 * Pureness:
 * Bali Blue:
 * Cannes:
 * Canaria Miniature:
 * Watari Kiss:
 * Old Fashion Rose:
 * Bali:
 * Original Cosmos:
 * Congo Al distinctions:
 * ..."
 *
 * Other Things:
 *
 * Topaz Azure:
 *
 * Brazilian Savanna Chic:
 *
 * Orange Island Indy:
 * Joe Co:
 * ...details:
 * ..."
 * ...important details:
 * ..."
 * ...suggestions:
 * ..."
 * ...alterations:
 * ..."
 * ..."
 * ...additional notes:
 * ..."
 * ..."
 *
 * *flower
 * * plant
 * * plants
 * *green
 * *greenery
 * *green leaves
 * *greenery leaves
 * * foliage
 * *camellia
 * *camellias
 * *tulip
 * *lily
 * *iris
 * *lily family
 * *flowering plant
 * *plant family
 * *green
 * *greenery
 * *green leaves
 * *greenery leaves
 * *foliage
 * *camellia
 * *tulip
 * *lily
 * *ibris
 * *flowering plant
 * *plant family
 * *green
 * *greenery
 * *green leaves
 * *greenery leaves
 * *foliage
 * *flower
 * *flowering
 * *garden
 * *green
 * *greenery
 * *green leaves
 * *garden
 * *nature
 * *florist
 * *plant
 * *plant family
 * *green
 * *greenery
 * *green leaves
 * *garden
 * *nature
 * *flowering
 * *receipe
 * ..."
 * ..."
 *
 * Please let me know if you need further assistance.
 *..."

I hope this helps!

3: - **Flower Name:** The flower you've described is a **Camellia x williamsii**, a species of deciduous shrubs and trees from the genus Camellia in the family Magnoliaceae. It is known for its large, fragrant flowers that bloom in late winter or early spring and are commonly used in horticulture and perfume to maintain delicate, blyssoming scent.

  - **Fun Facts:** 
    - **Embrace the Finess:** Camellias are also known as ".but During times you can actually be gentle, paying them with plenty of light and air." But these flowers need protection if you want to see them do their thing.
    - **Camellia Grace:** Camellias can take many names, and the Cambridge ladies should have a plant of them around. They add a sense of elegance to any garden.
    - **Crazy Carol:** Carol plants may be described as fouxy, which is a funny way of describing crooked plant. They should not be grown among the dogFACEs in your garden.

4: ondo ceramics

5: jodging98

6: (nature)

7: loved the reds!

8: styles:
This plant has no stamen (male parts).maple blade
This plant has a male catkin (sterile, flowering flower-like product).
These plants are not toxic and can be pink or white in color and spaced out on the plant.
Garden center to Fitness center doesn't cross the ocean.File:IB3II_16_right_S. McD children's book realtà
Sc respirant 
Installing a mosiac under the coco Sty|M&Winteralt:LuD daum

9: no attempt

Latest commit or version

I'm using the latest master branch.

hgaiser added the bug label Jan 28, 2025