About Nougat

This is the official repository for Nougat, the academic document PDF parser that understands LaTeX math and tables.

Project page: https://facebookresearch.github.io/nougat/


Install

From pip:

pip install nougat-ocr

From repository:

pip install git+https://github.com/facebookresearch/nougat

There are extra dependencies if you want to call the model from an API or generate a dataset. Install via

`pip install "nougat-ocr[api]"` or `pip install "nougat-ocr[dataset]"`

Get prediction for a PDF


To get predictions for a PDF, run

    $ nougat path/to/file.pdf

    usage: nougat [-h] [--batchsize BATCHSIZE] [--checkpoint CHECKPOINT] [--out OUT] pdf [pdf ...]

    positional arguments:
      pdf                   PDF(s) to process.

    optional arguments:
      -h, --help            show this help message and exit
      --batchsize BATCHSIZE, -b BATCHSIZE
                            Batch size to use. Defaults to 6 which runs on 24GB VRAM.
      --checkpoint CHECKPOINT, -c CHECKPOINT
                            Path to checkpoint directory
      --out OUT, -o OUT     Output directory.
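When nougat is driven from a script rather than a shell, the options in the usage text map onto an argument list. The following is a minimal sketch, assuming the `nougat` executable is on the PATH. `build_nougat_command` and `run_nougat` are hypothetical helpers, not part of the package, and only the flags shown in the usage text are used.

```python
import subprocess


def build_nougat_command(pdfs, out_dir, batchsize=None, checkpoint=None):
    """Assemble an argument list for the nougat CLI (hypothetical helper).

    Only flags that appear in the usage text are emitted.
    """
    if not pdfs:
        raise ValueError("at least one PDF is required")
    cmd = ["nougat", "--out", str(out_dir)]
    if batchsize is not None:
        cmd += ["--batchsize", str(batchsize)]
    if checkpoint is not None:
        cmd += ["--checkpoint", str(checkpoint)]
    # Positional PDF arguments go last, as in the usage line.
    cmd += [str(p) for p in pdfs]
    return cmd


def run_nougat(pdfs, out_dir, **options):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_nougat_command(pdfs, out_dir, **options), check=True)
```

For example, `build_nougat_command(["paper.pdf"], "output", batchsize=2)` yields the argument list for `nougat --out output --batchsize 2 paper.pdf`.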

Each PDF is saved to the output directory as a .mmd file, a lightweight markup format that is mostly compatible with Mathpix Markdown (we make use of the LaTeX tables).
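To pick up the results from a script, it helps to know where each .mmd file lands. The sketch below assumes the output file takes the PDF's base name with the extension replaced by .mmd. The naming rule is not spelled out here, so treat it as an assumption. `expected_output_path` and `read_predictions` are hypothetical helpers.

```python
from pathlib import Path


def expected_output_path(pdf, out_dir):
    """Return the assumed location of the .mmd file for a given PDF."""
    # Assumption: same base name as the PDF, with a .mmd extension.
    return Path(out_dir) / (Path(pdf).stem + ".mmd")


def read_predictions(pdfs, out_dir):
    """Map each PDF path to the text of its .mmd file, skipping missing ones."""
    results = {}
    for pdf in pdfs:
        mmd = expected_output_path(pdf, out_dir)
        if mmd.is_file():
            results[str(pdf)] = mmd.read_text(encoding="utf-8")
    return results
```

Skipping missing files keeps the helper usable when only some of the PDFs have been processed so far.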


With the extra dependencies installed, you can use app.py to start an API. Call

$ nougat_api

To get a prediction for a PDF file, make a POST request to the API. It also accepts the parameters start and stop to limit the computation to selected page numbers (boundaries are included).
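The inclusive start and stop boundaries can be illustrated with a small sketch. It assumes that start and stop are passed as URL query parameters and that pages are numbered with integers. The endpoint URL is not given here, so `endpoint` is a placeholder the caller must supply, and `build_predict_url` and `selected_pages` are hypothetical helpers. How the PDF itself is attached to the POST request is left out.

```python
from urllib.parse import urlencode


def build_predict_url(endpoint, start=None, stop=None):
    """Append the optional start/stop page limits to the endpoint URL."""
    # Assumption: start and stop travel as query parameters.
    params = {}
    if start is not None:
        params["start"] = start
    if stop is not None:
        params["stop"] = stop
    return f"{endpoint}?{urlencode(params)}" if params else endpoint


def selected_pages(start, stop):
    """List the page numbers covered when both boundaries are included."""
    if stop < start:
        raise ValueError("stop must not be smaller than start")
    return list(range(start, stop + 1))
```

With start=2 and stop=4, three pages (2, 3 and 4) fall inside the requested range.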


