Extract Data from PDFs to Airtable using OpenAI

Jul 18, 2024Andy Cloke

This tutorial shows you how to extract data from PDF files directly into Airtable using OpenAI and Data Fetcher. You'll extract specific fields (like document numbers and dates) from PDF attachments and automatically populate your Airtable fields. The same approach works for parsing any structured data from PDFs into your database.

Set Up Your Documents Table

We'll begin by uploading our PDF files and setting up a filtered view with all the records that need data:

1. Create a table called "Documents" or use an existing table.

2. Create a new field called "PDF" with the type Attachment. You can use your existing attachment field if you have one.

3. Upload PDF files to the "PDF" field. Make sure the documents are PDF files. OpenAI can parse different document layouts and structures, so they don't need to be from the same source or template.

4. Rename the primary field "Name" to "Title".

documents in table.png

5. Create a new Grid view called Needs data and add the following filters applied:

  • Title - is empty
  • PDF - is not empty
needs data view.png

This view will allow us to avoid processing the same documents twice, and save API credits.

Now that the table is set up, let's grab an OpenAI API key.

Create an OpenAI API Key

1. Sign up for an OpenAI API account, or log in to your existing account.

2. Add a payment method in your OpenAI API billing panel.

3. Create an API key in your OpenAI account. Save this API key - you'll need it in the next steps.

Note: OpenAI API accounts and billing are separate from ChatGPT subscriptions. You will still need to go through these steps even if you have a subscription to ChatGPT.

API costs are typically $0.001-0.005 per document, depending on size and complexity.

Add Data Fetcher

We'll create a request in Data Fetcher that will allow us to send PDFs to OpenAI to parse the information from the PD documents.

1. Add the Data Fetcher extension to your Airtable base.

2. Once you’ve added the extension, sign in to Data Fetcher or sign up if you don’t have an account.

Data Fetcher extension sign up.png

3. Once you're logged in to Data Fetcher, click Create your first request.

Create your first request button.png

Extract Data from PDFs to Database using OpenAI

Follow these steps to configure the request:

1. Select OpenAI under Application.

OpenAI application dropdown.png

2. Rename the request to "Extract PDF data".

3. Copy and paste your OpenAI API key under Authorization.

api key authorization.png

4. Under Endpoint, select Create a model response.

openai Create a model response endpoint.png

5. Click Save and Continue in the bottom right corner.

openai extract pdf save & continue.png

Configure PDF Data Extraction

After following the previous steps, you'll be taken to a new section where you'll configure how OpenAI processes the PDFs.

1. For this tutorial, we'll use the GPT-5-mini model, which provides good performance at lower cost.

openai gpt 5 mini model.png

2. Copy and paste the following under Input. The message below tells OpenAI exactly which fields to extract from your PDFs and how to structure them. Modify this template to match the data you need:

Extract data from the following document. Return JSON data. If you cannot find a field, return null.

Output format: 
{
"DocumentNumber": "DOC-12345", 
"Title": "Project Requirements Document", 
"Date": "01-27-2025" 
}
openai input extract data base message.png

Note: To extract additional fields (like author, category, or amount), add them to the JSON template above.

Now we'll reference the PDF files in the table.

3. Click the + button next on the right-hand side of Input to add a field reference.

openai input extract pdf data + button.png

4. In the dialog that opens:

  • Ensure Documents is selected for Table
  • Select PDF under Field
  • Under Run for every record in view, select Needs data

5. Click Confirm to save and close the dialog.

pdf document field table reference.png

The message sent to OpenAI will now include each PDF document when the integration runs.

pdf data extract cell reference complete.png

Test the PDF Data Extraction

1. Click Save and Run in the bottom right corner to proceed to the next step.

save & run pdf data extract pdf.png

Map PDF Parsing Results to Airtable Fields

After completing the previous steps, Data Fetcher will take you to Response Field Mapping. This is where you'll configure how the extracted PDF results are imported into your Airtable fields.

response field mapping pdf data.png

You can ignore the "Json" prefix in the field names. Data Fetcher automatically adds this prefix because we configured our PDF document extractor to return JSON-formatted data in the OpenAI response.

The OpenAI API response returns four fields from the request, but we only need the three document data fields we specified in our output format. Follow these steps to map them:

1. Click the selected "Message" field to deselect it (we don't need the raw message text).

2. Map the remaining extracted PDF fields to your Airtable columns:

  • Json title → Existing field Title
  • Json document number New field Document number
  • Json date → New field date
PDF data fields mapped.png

3. Click Save and Run in the bottom right corner to run the PDF extraction.

Return to the default "Grid view" in your Documents table, and you will see the extracted data populated automatically from your PDF files through the OpenAI PDF parsing process.

airtable pdfs with data extracted.png

Automate PDF Extraction

Currently, whenever you add new PDFs to your table, you'll have to manually run the request. To save time, you can use Data Fetcher's Trigger feature to run the request automatically.

The Trigger feature is only available on our paid plans, so you can follow these steps to upgrade:

1. Open the request in Data Fetcher, scroll down to the Schedule / Trigger / Webhook URL tabs, then select Upgrade under Trigger.

upgrade button trigger tab.png

2. Select a paid plan and complete the payment process.

3. Click + Authorize to give Data Fetcher access to your Airtable base.

trigger tab authorize.png

4. Click I understand, let's Authorize.

A new window will open, asking you to grant Data Fetcher access to your Airtable base.

5. Click Add a base, then select + Add resources so you won’t have to authorize Data Fetcher anytime you want to use it on an additional Airtable base.

airtable oauth grant all access.png

6. Click Grant access.

Now that you've authorized Data Fetcher, you can create a trigger to run the request when you add a new PDF to the table:

1. Select Record created.

trigger tab record created.png

2. Select "Documents" under Table and "Needs data" under View.

documents needs data trigger.png

3. Finally, click Save at the bottom of the screen.

Your PDF to Airtable automation is now active. When you upload new PDFs to your Documents table, Data Fetcher automatically extracts the document numbers, titles, and dates into your Airtable fields.

That's all for this tutorial. You can check out our blog to learn how to use OpenAI and Data Fetcher to generate images and text in Airtable.

Beyond Basic Documents: Other Use Cases

This same workflow works with many other file types and business scenarios. You can extract structured data from Word documents, PowerPoint presentations, text files, code files, and more. Here are some practical applications:

  • Contract Management: Extract key terms, dates, and parties from legal documents and contracts stored as PDFs or Word files.
  • Resume Screening: Parse candidate information from resumes in various formats to populate your hiring database.
  • Research Documentation: Extract findings, data points, and citations from research papers and academic documents.
  • Financial Reports: Pull key metrics and figures from financial statements and quarterly reports.
  • Technical Documentation: Extract API endpoints, code snippets, and configuration details from technical docs.

The flexibility of OpenAI's file processing means you can adapt this approach to virtually any document-based workflow where you need to turn unstructured files into organized Airtable data.

Supported File Types

OpenAI supports over 30 file formats for data extraction:

  • Documents: PDF, Word (.doc, .docx), Excel (.xlsx), PowerPoint (.pptx), CSV, XML, Markdown (.md), plain text (.txt)
  • Images: JPEG, JPG, PNG, GIF
  • Code Files: Python (.py), JavaScript (.js), TypeScript (.ts), Java (.java), C++ (.cpp), C (.c), C# (.cs), Ruby (.rb), PHP (.php), CSS (.css), Shell scripts (.sh)
  • Archives: ZIP (.zip), TAR (.tar)
  • Other: HTML (.html), JSON (.json), TeX (.tex), Pickle files (.pkl)
  • G2 rating

    Loved by Airtable users like you

    Data Fetcher customers spend less time copying data and more time using it.

    1 / 11

    "Need data pumped into Airtable? Data Fetcher is the solution."

    Data Fetcher is incredibly easy to use and understand. We have no API or data experience, yet our team can seamlessly integrate external data easily with Data Fetcher.

    Thomas Coiner

    Thomas Coiner

    CEO, ProU Sports

    Ready to build with Data Fetcher?

    Start connecting your data sources with Airtable today.