Hacker News — vinext + Cloudflare Workers

▲Gemini Embedding 2: natively multimodal embedding model (blog.google)

35 points by panarky 5 days ago | 5 comments

jeanloolz 5 days ago [-]

This is colossal. It can create embeddings for pretty much any type of format: video, audio, documents. The context is still a bit small compared to what we are used to in text, but this seems major.

Grimblewald 4 days ago [-]

How does it compare with Qwen's open-weight multimodal embedding model? Anyone know? This seems lesser from what I read, with the drawback of being available only via some API/model I don't have control over. Qwen gives great embeddings out of the gate while also being steerable, i.e. you can supply a prompt to focus on embedding specific tasks with higher resolution, which in my tests has been mind-blowingly good. Not seeing the value add here.

zhangchen 1 day ago [-]

the steerability point is interesting. have you tried using task-specific prompts for cross-modal retrieval though? like searching images with text queries. curious whether qwen's prompt-based steering actually helps there or if it mainly improves same-modality tasks. the 3072-dim space seems tight for encoding all those modalities well.
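For anyone who wants to try the text-to-image case themselves, here is a minimal sketch of the ranking step, assuming the text query and the images have already been embedded into the same vector space by whatever model you use. The function names and the toy 2-dimensional vectors are hypothetical; real vectors would have the model's full dimension (3072 in the case mentioned above), and nothing here calls any actual embedding API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_vec, image_vecs):
    """Return (name, score) pairs sorted from most to least similar.

    query_vec:  embedding of the text query (hypothetical input).
    image_vecs: dict mapping an image name to its embedding (hypothetical input).
    """
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in image_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors for illustration only, not real model output.
    query = [1.0, 0.0]
    images = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for name, score in rank_images(query, images):
        print(f"{name}: {score:.3f}")
```

Running the same ranking with and without a task prompt on the query side would be one way to check whether steering changes the cross-modal ordering at all.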

jerrygoyal 4 days ago [-]

what's the pricing and how does it compare to zembed-1 for text only embeddings?

jiggawatts 4 days ago [-]

Pricing is here: https://cloud.google.com/vertex-ai/generative-ai/pricing#emb...

Seems to be 20 cents per million tokens of text and 0.012 cents per image.
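To turn those two numbers into a rough bill, here is a small sketch of the arithmetic. It assumes the rates quoted above (20 cents per million text tokens, 0.012 cents per image) are correct and ignores any other modalities, tiers, or discounts; the function name and the example workload are made up for illustration.

```python
# Rates as quoted in the comment above; unverified, check the pricing page.
TEXT_CENTS_PER_MILLION_TOKENS = 20.0
IMAGE_CENTS_PER_IMAGE = 0.012


def estimate_cost_usd(text_tokens, images):
    """Estimate embedding cost in US dollars for a text + image workload."""
    text_cents = text_tokens / 1_000_000 * TEXT_CENTS_PER_MILLION_TOKENS
    image_cents = images * IMAGE_CENTS_PER_IMAGE
    return (text_cents + image_cents) / 100.0


if __name__ == "__main__":
    # Hypothetical workload: 5 million text tokens and 10,000 images.
    cost = estimate_cost_usd(5_000_000, 10_000)
    print(f"estimated cost: ${cost:.2f}")
```

For that hypothetical workload the text side comes to 100 cents and the image side to 120 cents, so about $2.20 in total.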
