OpenAI has launched a new tutorial on using GPT-4 and Whisper to create automated meeting notes tools.
Creating an automated meeting minutes generator with Whisper and GPT-4
In this tutorial, we'll harness the power of OpenAI's Whisper and GPT-4 models to develop an automated meeting minutes generator. The application transcribes audio from a meeting, provides a summary of the discussion, extracts key points and action items, and performs a sentiment analysis.
This tutorial assumes a basic understanding of Python and an OpenAI API key. You can use the audio file provided with this tutorial or your own.
Additionally, you will need to install the python-docx and OpenAI libraries. You can create a new Python environment and install the required packages with the following commands:
Transcribing audio with Whisper
The first step in transcribing the audio from a meeting is to pass the audio file of the meeting into our /v1/audio API. Whisper, the model that powers the audio API, is capable of converting spoken language into written text. To start, we will avoid passing a prompt or temperature (optional parameters to control the model's output) and stick with the default values.
Next, we import the required packages and define a function that uses the Whisper model to take in the audio file and transcribe it:
In this function, audio_file_path is the path to the audio file you want to transcribe. The function opens this file and passes it to the Whisper ASR model (whisper-1) for transcription. The result is returned as raw text. It’s important to note that the openai.Audio.transcribe function requires the actual audio file to be passed in, not just the path to the file locally or on a remote server. This means that if you are running this code on a server where you might not also be storing your audio files, you will need to have a preprocess step that first downloads the audio files onto that device.
Summarizing and analyzing the transcript with GPT-4
Having obtained the transcript, we now pass it to GPT-4 via the ChatCompletions API. GPT-4 is OpenAI's state-of-the-art large language model which we'll use to generate a summary, extract key points, action items, and perform sentiment analysis.
This tutorial uses distinct functions for each task we want GPT-4 to perform. This is not the most efficient way to do this task - you can put these instructions into one function, however, splitting them up can lead to higher quality summarization.
To split the tasks up, we define the
meeting_minutes function which will serve as the main function of this application:
In this function,
transcription is the text we obtained from Whisper. The transcription can be passed to the four other functions, each designed to perform a specific task:
abstract_summary_extraction generates a summary of the meeting,
key_points_extractionextracts the main points,
action_item_extraction identifies the action items, and
sentiment_analysis performs a sentiment analysis. If there are other capabilities you want, you can add those in as well using the same framework shown above.
Here is how each of these functions works:
abstract_summary_extraction function takes the transcription and summarizes it into a concise abstract paragraph with the aim to retain the most important points while avoiding unnecessary details or tangential points. The main mechanism to enable this process is the system message as shown below. There are many different possible ways of achieving similar results through the process commonly referred to as prompt engineering. You can read our GPT best practices guide which gives in depth advice on how to do this most effectively.
Key points extraction
key_points_extraction function identifies and lists the main points discussed in the meeting. These points should represent the most important ideas, findings, or topics crucial to the essence of the discussion. Again, the main mechanism for controlling the way these points are identified is the system message. You might want to give some additional context here around the way your project or company runs such as “We are a company that sells race cars to consumers. We do XYZ with the goal of XYZ”. This additional context could dramatically improve the models ability to extract information that is relevant.
Action item extraction
action_item_extraction function identifies tasks, assignments, or actions agreed upon or mentioned during the meeting. These could be tasks assigned to specific individuals or general actions the group decided to take. While not covered in this tutorial, the Chat Completions API provides a function calling capability which would allow you to build in the ability to automatically create tasks in your task management software and assign it to the relevant person.
The sentiment_analysis function analyzes the overall sentiment of the discussion. It considers the tone, the emotions conveyed by the language used, and the context in which words and phrases are used. For tasks which are less complicated, it may also be worthwhile to try out gpt-3.5-turbo in addition to gpt-4 to see if you can get a similar level of performance. It might also be useful to experiment with taking the results of the sentiment_analysis function and passing it to the other functions to see how having the sentiment of the conversation impacts the other attributes.
Exporting meeting minutes
Once we've generated the meeting minutes, it's beneficial to save them into a readable format that can be easily distributed. One common format for such reports is Microsoft Word. The Python docx library is a popular open source library for creating Word documents. If you wanted to build an end-to-end meeting minute application, you might consider removing this export step in favor of sending the summary inline as an email followup.
To handle the exporting process, define a function save_as_docx that converts the raw text to a Word document:
In this function, minutes is a dictionary containing the abstract summary, key points, action items, and sentiment analysis from the meeting. Filename is the name of the Word document file to be created. The function creates a new Word document, adds headings and content for each part of the minutes, and then saves the document to the current working directory.
Finally, you can put it all together and generate the meeting minutes from an audio file:
This code will transcribe the audio file Earningscall.wav, generates the meeting minutes, prints them, and then saves them into a Word document called meeting_minutes.docx.
Now that you have the basic meeting minutes processing setup, consider trying to optimize the performance with prompt engineering or build an end-to-end system with native function calling.