Extract Data from PDFs to Airtable using OpenAI

In this tutorial, you'll learn how to extract data from PDFs in Airtable. We will extract specific fields (like invoice number and due date) from invoices using OpenAI. You can use the same approach to extract structured data from any PDF attachments.

Set Up Invoices Table

We'll begin by uploading our PDF invoices and setting up a filtered view with all the records that need data:

1. Create a table called "Invoices" or use an existing table.

2. Create a new field called "PDF" with the type Attachment. You can use your existing attachment field if you have one.

3. Upload invoices to the "PDF" field. Make sure the invoices are PDF files. OpenAI PDF extraction is flexible enough to handle different invoice layouts and structures, so they do not all have to be from the same supplier.

4. Rename the primary field "Name" to "Invoice number".

5. Create a new Grid view called Needs data and add the following filters applied:

Invoice number - is empty
PDF - is not empty

This view will help us extract data from the PDFs that don't have any information in the table.

Now that the table is set up, let's create an OpenAI assistant to extract PDF data for us.

Create an Assistant in OpenAI

To extract data from PDF files, we need to use OpenAI assistants, since OpenAI's API requires assistants for PDF file processing. Follow these steps to create an OpenAI assistant:

1. Sign up for an OpenAI API account, or log in to your existing account.

2. Add a payment method in your OpenAI API billing panel.

Note: OpenAI API accounts and billing are separate from ChatGPT accounts and plus subscriptions.

3. Visit the OpenAI assistants page.

4. Click + Create to create your first OpenAI assistant.

5. Name the assistant "PDF parser".

6. Under Instructions, enter "You are a helpful assistant.". This is all we need.

7. Select gpt-4o under Model.

8. Scroll down to TOOLS and turn File search on. The file search tool enables the assistant to access the PDFs we send to it and extract data.

OpenAI will save the changes automatically, so you don't have to click any buttons. Now that the assistant is set up, we can use it in Data Fetcher.

Add Data Fetcher

We'll create a request in Data Fetcher that will allow us to communicate with the OpenAI assistant we created earlier.

1. Add the Data Fetcher extension to your Airtable base.

2. Once you’ve added the extension, sign in to Data Fetcher or sign up if you don’t have an account.

3. Once you're logged in to Data Fetcher, click Create your first request.

Extract Data from PDFs to Airtable using OpenAI

Follow these steps to configure the request:

1. Select OpenAI under Application.

2. Rename the request to "Extract invoice data".

3. Copy and paste your OpenAI API key under Authorization. You can create an API key in your OpenAI account.

4. Under Endpoint, select Create assistant thread and run.

5. Click Save and Continue in the bottom right corner.

Send Messages to the Assistant

After following the previous steps, you'll be taken to a new section where you'll configure how OpenAI processes the PDFs.

1. Under Assistant, select the assistant you created earlier in OpenAI.

2. Copy and paste the following message under Messages:

Extract data from the following invoice. Return JSON data. If you cannot find a field, return null.
⁠
Output format: 
{
"InvoiceNumber": "345ADK", 
"TotalAmount": 100, 
"InvoiceDueDate": "01-27-2025" 
}

Note: If you want to extract additional fields from each invoice, such as tax or quantity, add them to the output format JSON.

Now we'll reference the PDF files in the table.

3. Click the + button next to Messages to reference the PDF files.

4. In the dialog that opens:

Select PDF under Field

The message sent to OpenAI will now include each invoice when the integration runs.

Run PDF Invoice Parser

1. Click Save and Run in the bottom right corner to proceed to the next step.

Map PDF Extraction Results to Airtable Fields

After completing the previous steps, Data Fetcher will take you to Response Field Mapping. This is where you'll configure how the extracted PDF results are imported into your Airtable fields.

Notice the "Json" prefix in the field names. Data Fetcher automatically adds this prefix because we configured our PDF invoice parser to return JSON-formatted data from the OpenAI assistant.

The OpenAI API response returns four fields from the request, but we only need the three invoice data fields we specified in our output format. Follow these steps to map them:

1. Click the checkbox next to "Message" to deselect it (we don't need the raw message text).

3. Click Save and Run in the bottom right corner to run the PDF invoice parser.

Return to the default view of your Invoices table, and you should see the extracted invoice data populated automatically from your PDF files through the OpenAI PDF extraction process.

Automate PDF Extraction

Currently, whenever you add new PDFs to your table, you'll have to manually run the request. To save time, you can use Data Fetcher's Trigger feature to run the request automatically.

The Trigger feature is only available on our paid plans, so you can follow these steps to upgrade:

1. Open the request in Data Fetcher, scroll down to the Schedule / Trigger / Webhook URL tabs, then select Upgrade under Trigger.

A new window will open, asking you to grant Data Fetcher access to your Airtable base.

5. Click Add a base, then select + Add resources so you won’t have to authorize Data Fetcher anytime you want to use it on an additional Airtable base.

Now that you've authorized Data Fetcher, you can create a trigger to run the request when you add a new PDF to the table:

Your automation is now active! Data Fetcher will automatically extract data from any new PDFs you add to your Invoices table. Simply upload PDF invoices and the system will populate the invoice numbers, amounts, and due dates without any manual work - saving you from manual data entry.

That's all for this tutorial. You can check out our blog to learn how to use OpenAI and Data Fetcher to generate images and text in Airtable.

Beyond PDF Invoices: Other Use Cases

This same workflow works with many other file types and business scenarios. You can extract structured data from Word documents, PowerPoint presentations, text files, code files, and more. Here are some practical applications:

The flexibility of OpenAI's file processing means you can adapt this approach to virtually any document-based workflow where you need to turn unstructured files into organized Airtable data.

Supported File Types

For working with images instead of documents, check out our guide on extracting data from images to learn how to process screenshots, charts, and visual content.

Extract Data from PDFs to Airtable using OpenAI

Contents

Set Up Invoices Table

Create an Assistant in OpenAI

Add Data Fetcher

Extract Data from PDFs to Airtable using OpenAI

Send Messages to the Assistant

Run PDF Invoice Parser

Map PDF Extraction Results to Airtable Fields

Automate PDF Extraction

Beyond PDF Invoices: Other Use Cases

Supported File Types

Related Posts

How to Extract Data from Images in Airtable using AI

How to Use Airtable Generative AI

How to use GPT-4o in Airtable with No-Code

Loved by Airtable users like you

"Need data pumped into Airtable? Data Fetcher is the solution."

Ready to build with Data Fetcher?