Hot topics:
- The quantization formats will soon be updated: #1305. All ggml model files using the old format will not work with the latest llama.cpp code after that change is merged.
Table of Contents
The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook.
The original implementation of llama.cpp was hacked in an evening. Since then, the project has improved significantly thanks to many contributions. This project is for educational purposes and serves as the main playground for developing new features for the ggml library.
Supported platforms:
Supported models:
Bindings:
UI: