Sunday, December 15, 2024

Microsoft Releases a Small Phi-3 Vision Multimodal Model


Earlier in April, Microsoft released the first AI model in its open-source Phi-3 family: Phi-3 Mini. Now, almost a month later, the Redmond giant has released a small multimodal model called Phi-3 Vision. At Build 2024, Microsoft also unveiled two more Phi-3 family models: Phi-3 Small (7B) and Phi-3 Medium (14B). All of these models are open-source under the MIT license.

As for the Phi-3 Vision model, it has 4.2 billion parameters, which makes it fairly lightweight. This is the first time a mega-corporation like Microsoft has open-sourced a small multimodal model. It supports a 128K-token context length, and you can feed it images as well. Google did release the PaliGemma model, but it's not meant for conversational use.

Apart from that, Microsoft says the Phi-3 Vision model was trained on publicly available, high-quality educational and code data, supplemented with synthetic data the company generated for math, reasoning, general knowledge, charts, tables, diagrams, and slides.

Image Courtesy: Microsoft

Despite its small size, the Phi-3 Vision model performs better than Claude 3 Haiku, LLaVA, and Gemini 1.0 Pro on many multimodal benchmarks. It even comes pretty close to OpenAI's GPT-4V model. Microsoft says developers can use the Phi-3 Vision model for OCR, chart and table understanding, general image understanding, and more.
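If you're curious what talking to the model looks like in practice, here's a minimal sketch of the single-turn, single-image chat prompt the instruct variant expects. The `<|user|>` / `<|image_N|>` / `<|end|>` / `<|assistant|>` tag format below is an assumption based on the published model card for `microsoft/Phi-3-vision-128k-instruct`, so verify it against the current documentation before relying on it:

```python
def build_phi3_vision_prompt(question: str, num_images: int = 1) -> str:
    """Build a single-turn Phi-3 Vision chat prompt.

    NOTE: the <|image_N|> placeholder convention here follows the
    model card as published; it is an assumption, not an official
    API guarantee, so double-check it against current docs.
    """
    # Each attached image gets a numbered placeholder, 1-indexed.
    placeholders = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    # User turn with image placeholders, then hand off to the assistant.
    return f"<|user|>\n{placeholders}{question}<|end|>\n<|assistant|>\n"

prompt = build_phi3_vision_prompt("What does this chart show?")
print(prompt)
```

The resulting string would be passed, along with the image(s), to the model's processor (for example via Hugging Face Transformers with `trust_remote_code=True`); the placeholder index tells the model which image a reference points to when multiple images are attached.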

If you want to check out the Phi-3 Vision model, head over to Azure AI Studio.


