OpenAI has launched a new tutorial on how to use GPT-4 and Whisper to create an automated meeting note-taking tool.
Create an automated meeting minutes generator using Whisper and GPT-4
In this tutorial, we'll leverage OpenAI's Whisper and GPT-4 models to develop an automated meeting minutes generator. The app can transcribe meeting audio, provide a summary of discussions, extract key points and action items, and perform sentiment analysis.
ready to start
This tutorial assumes you have a basic understanding of Python and OpenAI API keys. You can use the audio files provided with this tutorial or your own.
Also, you need to install python-docx and OpenAI library. You can create a new Python environment and install the required packages with:
Audio transcription with Whisper
The first step in transcribing meeting audio is to pass the meeting audio file into our /v1/audio API. Whisper is the model that drives the audio API, capable of converting spoken language into written text. We'll refrain from passing hints or temperatures (optional parameters that control the model output), and keep the defaults.
Next, we import the required packages and define a function that takes an audio file and transcribes it using the Whisper model:
In this function, audio_file_path is the path to the audio file you want to transcribe. This function opens this file and passes it to the Whisper ASR model (whisper-1) for transcription. Results are returned as raw text. It should be noted that the openai.Audio.transcribe function needs to pass the actual audio file, not just the path of the file on the local or remote server. This means that if you're running this code on a server, and you probably don't store the audio file, you'll need a preprocessing step to download the audio file to that device first.
Summarization and analysis of conference proceedings using GPT-4.
Once we have the minutes, we pass them to GPT-4 via the ChatCompletions API. GPT-4 is OpenAI's latest large-scale language model, and we will use it to generate summaries, extract key points and action items, and perform sentiment analysis.
This tutorial uses a different function for each task we want GPT-4 to accomplish. It's not the most efficient way - you could put these instructions into a function, but splitting them up improves the quality of the summary.
To split these tasks, we define the meeting_minutes function, which will be the main function of the application:
In this function, transcription is the text we got from Whisper. This text can be passed to four other functions, each designed to perform a specific task: abstract_summary_extraction generates a summary of the meeting, key_points_extraction extracts the main points, action_item_extraction identifies action items, and sentiment_analysis performs sentiment analysis. If you want other functionality, you can add it using the same framework shown above.
Here's how each function works:
Abstract Extraction Function
The summary extraction function will summarize the transcribed text into concise paragraphs, with the purpose of retaining the most important points and avoiding unnecessary details or digressions. The main mechanism for accomplishing this is the system message shown below. Similar results can be achieved in many different ways through a process known as cue engineering. You can read our GPT best practices guide, which provides in-depth advice on how to do this most effectively.
The main points extraction function identifies and lists the main points of the meeting discussion. These points should represent the most important ideas, findings, or key themes at the heart of the discussion. Again, the main mechanism that controls the recognition of these points is the system message. You may wish to provide some additional context here, such as about your project or how your company operates, such as "We are a company that sells racing cars to consumers. We operate XYZ with XYZ goals". This additional background information can greatly improve the model's ability to extract relevant information.
Action Item Extraction Function
The action item extraction function identifies tasks, assignments, or actions that were agreed upon or mentioned in the meeting. These action items can be tasks assigned to specific people or general actions the team decides to take. While not covered in this tutorial, the Chat Completions API provides function calls that give you the ability to automatically create tasks in task management software and assign them to relevant people.
The sentiment_analysis function analyzes the sentiment of the entire discussion. It takes into account the tone of the language used and the emotion conveyed, as well as the context in which words and phrases are used. For simpler tasks, instead of using gpt-4, try gpt-3.5-turbo to see if you can get similar performance. Also, it might be helpful to experiment by passing the results of the sentiment_analysis function to other functions to see how the sentiment in question affects other properties.
Export meeting minutes
Once you've generated meeting minutes, it's beneficial to save them in a readable format for easy distribution. Microsoft Word is one of the common formats for this type of report. Python's docx library is a popular open source library for creating Word documents. If you want to build a full meeting minutes application, consider removing this export step and instead send the summary inline as an email follow-up.
To handle the export process, we can define a function called save_as_docx that converts the raw text into a Word document
In this function,
minutes is a dictionary containing the meeting's summary, highlights, action items, and sentiment analysis.
Filename is the name of the Word document file to create. The function creates a new Word document, adds titles and content for each section of the meeting minutes, and saves the document to the current working directory.
Finally, you can combine all the steps to generate meeting minutes from an audio file:
This code transcribes the audio file Earningscall.wav, generates meeting minutes, prints them, and saves them to a Word document named meeting_minutes.docx.
Now that you have a basic meeting minutes processing setup, try to optimize performance through hint engineering, or build an end-to-end system that implements native function calls.