MLOps: Comparing Microsoft Phi3 Mini 128k in GGUF, MLFlow, and ONNX Formats

The Machine Learning Engineer

Chapters:
1. Intro
2. Phi3 mini 128k
3. Defining input and output parameters
4. Defining artifacts
5. Flowing the model
6. MLFlow notebook
7. MLFlow model
8. ONNX model
9. ONNX performance
10. DirectML
11. Microsoft ONNX
12. Conclusion
Description:
Explore the Microsoft Phi3 Mini 128k model and compare inference performance across different formats and quantization methods in this 45-minute video tutorial. Learn how to work with MLFlow, GGUF, and ONNX formats while examining their impact on inference time and precision. Follow along with the provided notebooks to implement MLFlow quantization with bfloat16, Llama.cpp quantization with float16 in GGUF format, ONNX CPU quantization with int4, and ONNX GPU DirectML quantization with int4. Gain insights into defining input and output parameters, managing artifacts, and flowing the model through various frameworks. Conclude with a comprehensive understanding of the performance differences between these approaches for deploying the Phi3 Mini 128k model in machine learning and data science applications.
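As a rough illustration of the MLFlow step, the sketch below loads the model in bfloat16 with transformers, logs it with mlflow.transformers.log_model, and times a generation after reloading it. The model id, artifact path, and prompt are illustrative assumptions, not taken from the video's notebooks.

```python
# Minimal sketch: log Phi3 mini 128k to MLFlow in bfloat16 and time a generation.
import time

import mlflow
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-128k-instruct"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # reduced precision, as in the MLFlow notebook
    trust_remote_code=True,
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model=pipe,
        artifact_path="phi3-mini-128k-bf16",  # hypothetical artifact path
    )

# Reload the logged model and measure end-to-end inference time.
loaded = mlflow.transformers.load_model(model_info.model_uri)
start = time.perf_counter()
print(loaded("What is MLOps?", max_new_tokens=64))
print(f"MLFlow bfloat16 inference: {time.perf_counter() - start:.2f}s")
```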
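For the GGUF comparison, one common route is exporting a float16 GGUF file with llama.cpp's convert_hf_to_gguf.py script (--outtype f16) and timing it with the llama-cpp-python bindings. The file path and context size below are hypothetical.

```python
# Minimal sketch: time generation from a float16 GGUF export with llama-cpp-python.
import time

from llama_cpp import Llama

# Assumed path to a GGUF file produced by llama.cpp's convert_hf_to_gguf.py.
llm = Llama(model_path="phi-3-mini-128k-instruct-f16.gguf", n_ctx=4096)

start = time.perf_counter()
out = llm("What is MLOps?", max_tokens=64)
print(out["choices"][0]["text"])
print(f"GGUF float16 inference: {time.perf_counter() - start:.2f}s")
```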
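For the ONNX runs, Microsoft publishes int4 exports of the Phi3 Mini models (CPU and DirectML variants) that can be driven with the onnxruntime-genai package. The sketch below assumes a local int4 export folder; pointing it at a CPU export exercises the CPU path, while the DirectML variant requires the onnxruntime-genai-directml build. The generator loop follows recent onnxruntime-genai releases; older versions set params.input_ids and called compute_logits() instead.

```python
# Minimal sketch: time an int4 ONNX export of Phi3 mini 128k with onnxruntime-genai.
import time

import onnxruntime_genai as og

# Assumed local folder holding an int4 export, e.g. downloaded from the
# microsoft/Phi-3-mini-128k-instruct-onnx repository on Hugging Face.
model = og.Model("phi3-mini-128k-int4")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is MLOps?"))

start = time.perf_counter()
while not generator.is_done():
    generator.generate_next_token()
print(tokenizer.decode(generator.get_sequence(0)))
print(f"ONNX int4 inference: {time.perf_counter() - start:.2f}s")
```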
