About llama.cpp

⚠️ TEMPORARY NOTICE ABOUT UPCOMING BREAKING CHANGE ⚠️

The quantization formats will soon be updated: #1305

ggml model files in the old format will stop working with the latest llama.cpp code once that change is merged.


Description

The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook:

  • Plain C/C++ implementation without dependencies
  • Apple silicon is a first-class citizen, optimized via ARM NEON and the Accelerate framework
  • AVX, AVX2 and AVX512 support for x86 architectures
  • Mixed F16/F32 precision
  • 4-bit, 5-bit and 8-bit integer quantization support (see the sketch after this list)
  • Runs on the CPU
  • OpenBLAS support
  • cuBLAS and CLBlast support
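
The 4-bit support deserves a closer look, since it is what lets 7B+ parameter models fit in laptop memory. Below is a simplified, self-contained C sketch in the spirit of ggml's Q4_0 scheme: weights are grouped into blocks of 32, and each block stores one scale plus 32 signed 4-bit values packed two per byte. The names here (block_q4, quantize_block) are illustrative, not the actual ggml structs, and the real format also uses an F16 scale and is changing per #1305.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define QK 32  /* weights per quantization block, as in ggml's Q4_0 */

/* Illustrative block: one scale plus 32 signed 4-bit values packed
 * two per byte. The real ggml layout differs; this is a sketch. */
typedef struct {
    float   d;           /* scale */
    uint8_t qs[QK / 2];  /* 4-bit quants, two per byte */
} block_q4;

/* Quantize one block of QK floats to 4-bit integers in [-7, 7]. */
static void quantize_block(const float *x, block_q4 *b) {
    float amax = 0.0f;
    for (int i = 0; i < QK; i++) {
        float a = fabsf(x[i]);
        if (a > amax) amax = a;
    }
    b->d = amax / 7.0f;  /* map the largest magnitude to +/-7 */
    const float id = b->d != 0.0f ? 1.0f / b->d : 0.0f;
    for (int i = 0; i < QK; i += 2) {
        int q0 = (int)roundf(x[i]     * id);
        int q1 = (int)roundf(x[i + 1] * id);
        /* store with an offset of 8 so each nibble is in 0..15 */
        b->qs[i / 2] = (uint8_t)((q0 + 8) | ((q1 + 8) << 4));
    }
}

/* Dequantize back to floats (lossy). */
static void dequantize_block(const block_q4 *b, float *y) {
    for (int i = 0; i < QK; i += 2) {
        int q0 = (b->qs[i / 2] & 0x0F) - 8;
        int q1 = (b->qs[i / 2] >> 4)   - 8;
        y[i]     = q0 * b->d;
        y[i + 1] = q1 * b->d;
    }
}

int main(void) {
    float x[QK], y[QK];
    for (int i = 0; i < QK; i++) x[i] = sinf(0.3f * i);  /* dummy weights */

    block_q4 b;
    quantize_block(x, &b);
    dequantize_block(&b, y);

    for (int i = 0; i < 4; i++)
        printf("x=% .4f  ~>  % .4f\n", x[i], y[i]);
    return 0;
}
```

With an F16 scale per 32 weights, as in the real format, the cost is about 4.5 bits per weight (18 bytes per block) versus 32 bits for the original F32 weights, at the price of a small rounding error of the kind visible in the output above.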

The original implementation of llama.cpp was hacked in an evening. Since then, the project has improved significantly thanks to many contributions. This project is for educational purposes and serves as the main playground for developing new features for the ggml library.
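
The "playground" role is concrete: llama.cpp exercises the ggml tensor library, whose programming model is to define a compute graph and then evaluate it. The following is a minimal sketch in the style of ggml's own examples from that era; the API has evolved since, so the exact names (ggml_build_forward, the n_threads field) reflect the ggml.h of the time and should be treated as illustrative.

```c
// f(x) = a*x + b, in the graph-based style of early ggml examples.
// Build (paths illustrative): cc -O2 demo.c ggml.c -lm -lpthread
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // every tensor lives in one pre-allocated arena
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024,
        .mem_buffer = NULL,
    };
    struct ggml_context * ctx = ggml_init(params);

    // declare the graph first; no computation happens yet
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * f = ggml_add(ctx, ggml_mul(ctx, a, x), b);

    struct ggml_cgraph gf = ggml_build_forward(f);
    gf.n_threads = 1;

    // set inputs, then evaluate the whole graph on the CPU
    ggml_set_f32(x, 3.0f);
    ggml_set_f32(a, 2.0f);
    ggml_set_f32(b, 1.0f);
    ggml_graph_compute(ctx, &gf);

    printf("f = %.1f\n", ggml_get_f32_1d(f, 0));  // prints: f = 7.0
    ggml_free(ctx);
    return 0;
}
```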

Supported platforms:

  • macOS
  • Linux
  • Windows (via CMake)
  • Docker

Official repository: https://github.com/ggerganov/llama.cpp
