Bookkeeping Service Providers

  • Accounting
  • Bookkeeping
  • US Taxation
  • Financial Planning
  • Accounting Software
  • Small Business Finance
You are here: Home / CLOUD / One year of Phi: Small language models making big leaps in AI

One year of Phi: Small language models making big leaps in AI

May 1, 2025 by cbn Leave a Comment

A new era of AI 

One year ago, Microsoft introduced small language models (SLMs) to customers with the release of Phi-3 on Azure AI Foundry, leveraging research on SLMs to expand the range of efficient AI models and tools available to customers. 

Today, we are excited to introduce Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning—marking a new era for small language models and once again redefining what is possible with small and efficient AI. 

A man sitting at a desk with a computer

Azure AI Foundry

Find the ideal model for your business needs, then tinker, tweak, and customize within a project to achieve all your AI goals.

Discover more >

Reasoning models, the next step forward

Reasoning models are trained to leverage inference-time scaling to perform complex tasks that demand multi-step decomposition and internal reflection. They excel in mathematical reasoning and are emerging as the backbone of agentic applications with complex, multi-faceted tasks. Such capabilities are typically found only in large frontier models. Phi-reasoning models introduce a new category of small language models. Using distillation, reinforcement learning, and high-quality data, these models balance size and performance. They are small enough for low-latency environments yet maintain strong reasoning capabilities that rival much bigger models. This blend allows even resource-limited devices to perform complex reasoning tasks efficiently.

Phi-4-reasoning and Phi-4-reasoning-plus 

Phi-4-reasoning is a 14-billion parameter open-weight reasoning model that rivals much larger models on complex reasoning tasks. Trained via supervised fine-tuning of Phi-4 on carefully curated reasoning demonstrations from OpenAI o3-mini, Phi-4-reasoning generates detailed reasoning chains that effectively leverage additional inference-time compute. The model demonstrates that meticulous data curation and high-quality synthetic datasets allow smaller models to compete with larger counterparts.

Phi-4-reasoning-plus builds upon Phi-4-reasoning capabilities, further trained with reinforcement learning to utilize more inference-time compute, using 1.5x more tokens than Phi-4-reasoning, to deliver higher accuracy.

Despite their significantly smaller size, both models achieve better performance than OpenAI o1-mini and DeepSeek-R1-Distill-Llama-70B at most benchmarks, including mathematical reasoning and Ph.D. level science questions. They achieve performance better than the full DeepSeek-R1 model (with 671-billion parameters) on the AIME 2025 test, the 2025 qualifier for the USA Math Olympiad. Both models are available on Azure AI Foundry and HuggingFace, here and here.

A graph of different colored bars
Figure 1. Phi-4-reasoning performance across representative reasoning benchmarks spanning mathematical and scientific reasoning. We illustrate the performance gains from reasoning-focused post-training of Phi-4 via Phi-4-reasoning (SFT) and Phi-4-reasoning-plus (SFT+RL), alongside a representative set of baselines from two model families: open-weight models from DeepSeek including DeepSeek R1 (671B Mixture-of-Experts) and its distilled dense variant DeepSeek-R1 Distill Llama 70B, and OpenAI’s proprietary frontier models o1-mini and o3-mini. Phi-4-reasoning and Phi-4-reasoning-plus consistently outperform the base model Phi-4 by significant margins, exceed DeepSeek-R1 Distill Llama 70B (5x larger) and demonstrate competitive performance against significantly larger models such as Deepseek-R1.
A graph of numbers and a number of people
Figure 2. Accuracy of models across general-purpose benchmarks for: long input context QA (FlenQA), instruction following (IFEval), Coding (HumanEvalPlus), knowledge & language understanding (MMLUPro), safety detection (ToxiGen), and other general skills (ArenaHard and PhiBench). 

Phi-4-reasoning models introduce a major improvement over Phi-4, surpass larger models like DeepSeek-R1-Distill-70B and approach Deep-Seek-R1 across various reasoning and general capabilities, including math, coding, algorithmic problem solving, and planning. The technical report provides extensive quantitative evidence of these improvements through diverse reasoning tasks.

Phi-4-mini-reasoning

Phi-4-mini-reasoning is designed to meet the demand for a compact reasoning model. This transformer-based language model is optimized for mathematical reasoning, providing high-quality, step-by-step problem solving in environments with constrained computing or latency. Fine-tuned with synthetic data generated by Deepseek-R1 model, Phi-4-mini-reasoning balances efficiency with advanced reasoning ability. It’s ideal for educational applications, embedded tutoring, and lightweight deployment on edge or mobile systems, and is trained on over one million diverse math problems spanning multiple levels of difficulty from middle school to Ph.D. level. Try out the model on Azure AI Foundry or HuggingFace today.

A graph of numbers and a number of marks
Figure 3. The graph compares the performance of various models on popular math benchmarks for long sentence generation. Phi-4-mini-reasoning outperforms its base model on long sentence generation across each evaluation, as well as larger models like OpenThinker-7B, Llama-3.2-3B-instruct, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Llama-8B, and Bespoke-Stratos-7B. Phi-4-mini-reasoning is comparable to OpenAI o1-mini across math benchmarks, surpassing the model’s performance during Math-500 and GPQA Diamond evaluations. As seen above, Phi-4-mini-reasoning with 3.8B parameters outperforms models of over twice its size. 

For more information about the model, read the technical report that provides additional quantitative insights.

Phi reasoning models in action 

Phi’s evolution over the last year has continually pushed this envelope of quality vs. size, expanding the family with new features to address diverse needs. Across the scale of Windows 11 devices, these models are available to run locally on CPUs and GPUs.

As Windows works towards creating a new type of PC, Phi models have become an integral part of Copilot+ PCs with the NPU-optimized Phi Silica variant. This highly efficient and OS-managed version of Phi is designed to be preloaded in memory, and available with blazing fast time to first token responses, and power efficient token throughput so it can be concurrently invoked with other applications running on your PC.

It is used in core experiences like Click to Do, providing useful text intelligence tools for any content on your screen, and is available as developer APIs to be readily integrated into applications—already being used in several productivity applications like Outlook, offering its Copilot summary features offline. These small but mighty models have already been optimized and integrated to be used across several applications across the breadth of our PC ecosystem. The Phi-4-reasoning and Phi-4-mini-reasoning models leverage the low-bit optimizations for Phi Silica and will be available to run soon on Copilot+ PC NPUs.

Safety and Microsoft’s approach to responsible AI 

At Microsoft, responsible AI is a fundamental principle guiding the development and deployment of AI systems, including our Phi models. Phi models are developed in accordance with Microsoft AI principles: accountability, transparency, fairness, reliability and safety, privacy and security, and inclusiveness. 

The Phi family of models has adopted a robust safety post-training approach, leveraging a combination of Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF) techniques. These methods utilize various datasets, including publicly available datasets focused on helpfulness and harmlessness, as well as various safety-related questions and answers. While the Phi family of models is designed to perform a wide range of tasks effectively, it is important to acknowledge that all AI models may exhibit limitations. To better understand these limitations and the measures in place to address them, please refer to the model cards below, which provide detailed information on responsible AI practices and guidelines.

Responsible AI at Microsoft

Learn more here: 

  • Try out the new models on Azure AI Foundry.
  • Read the Phi Cookbook.
  • Read about Phi reasoning models on edge devices.
  • Learn more about Phi-4-mini-reasoning.
  • Learn more about Phi-4-reasoning.
  • Learn more about Phi-4-reasoning-plus.
  • Read more about Phi reasoning on the Educators Developer blog.

The post One year of Phi: Small language models making big leaps in AI appeared first on Microsoft Azure Blog.

Share on FacebookShare on TwitterShare on Google+Share on LinkedinShare on Pinterest

Filed Under: CLOUD

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Archives

  • September 2025
  • August 2025
  • July 2025
  • June 2025
  • May 2025
  • April 2025
  • March 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • April 2023
  • March 2023
  • February 2023
  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • September 2022
  • August 2022
  • July 2022
  • June 2022
  • May 2022
  • April 2022
  • March 2022
  • February 2022
  • January 2022
  • December 2021
  • November 2021
  • October 2021
  • September 2021
  • August 2021
  • May 2021
  • April 2021
  • September 2020
  • August 2020
  • July 2020
  • June 2020
  • May 2020
  • April 2020
  • March 2020
  • February 2020
  • January 2020
  • December 2019
  • November 2019
  • October 2019
  • September 2019
  • August 2019
  • July 2019
  • June 2019
  • May 2019
  • April 2019
  • March 2019
  • February 2019
  • January 2019
  • December 2018
  • November 2018
  • October 2018
  • September 2018
  • August 2018
  • July 2018
  • June 2018
  • May 2018
  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • July 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • March 2016

Recent Posts

  • Agent Factory: Connecting agents, apps, and data with new open standards like MCP and A2A
  • Azure mandatory multifactor authentication: Phase 2 starting in October 2025
  • Microsoft Cost Management updates—July & August 2025
  • Protecting Azure Infrastructure from silicon to systems
  • Red vs. blue vs. purple team: What are the differences?

Recent Comments

    Categories

    • Accounting
    • Accounting Software
    • BlockChain
    • Bookkeeping
    • CLOUD
    • Data Center
    • Financial Planning
    • IOT
    • Machine Learning & AI
    • SECURITY
    • Uncategorized
    • US Taxation

    Categories

    • Accounting (145)
    • Accounting Software (27)
    • BlockChain (18)
    • Bookkeeping (205)
    • CLOUD (1,320)
    • Data Center (214)
    • Financial Planning (345)
    • IOT (260)
    • Machine Learning & AI (41)
    • SECURITY (620)
    • Uncategorized (1,284)
    • US Taxation (17)

    Subscribe Our Newsletter

     Subscribing I accept the privacy rules of this site

    Copyright © 2025 · News Pro Theme on Genesis Framework · WordPress · Log in