In this tutorial, we'll walk through setting up an Airtable web scraper and importing data from your chosen website into Airtable.
Web scraping is a powerful technique to gather data from websites. We will be using ParseHub, a free web scraping tool, to obtain our data.
We'll also be using Data Fetcher to create our Airtable web scraper. Data Fetcher is an extension that lets you connect Airtable to any application or API, no code required.
Install Data Fetcher from the Airtable app marketplace. After the extension launches, sign up for a free Data Fetcher account by entering a password and clicking 'Sign up for free'. If you already have an account, click 'Have an account?' at the bottom left of the screen.
For this tutorial, we'll be scraping data from this URL: https://www.reuters.com/technology/
Open ParseHub and click '+ New Project' on the home screen.
Enter the URL you would like to scrape data from, then click 'Start project on this URL'.
Next, go through the process of selecting which data you want to collect from the page by clicking on it in the ParseHub interface. This can include headings, sections of text, images, prices, ratings, addresses, URLs, and more. For this example, we'll select headings and URLs from the page to scrape. Once you have done this, click 'Get Data'.
Click 'Create your first request' on the home screen of the Data Fetcher extension. Requests in Data Fetcher let you easily import data into Airtable or send data from it to other services.
For Application, select 'ParseHub' on the create request screen.
You need to enter your ParseHub API key to authorize Data Fetcher to read from your ParseHub account. You can find your API key in your ParseHub account settings.
Copy your personal API key from your ParseHub account and paste it into Data Fetcher under the Authorization label.
For Endpoint, select 'Import data from a project's latest run'.
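Behind the scenes, this endpoint corresponds to ParseHub's REST API for a project's most recent completed run. If you ever wanted to pull the same data yourself outside Data Fetcher, a minimal sketch might look like the following. This is an assumption-laden illustration, not Data Fetcher's actual implementation, and the project token and API key are placeholders you'd replace with your own:

```python
# Minimal sketch of calling ParseHub's "last ready run" data endpoint
# directly. YOUR_PROJECT_TOKEN and YOUR_API_KEY are placeholders --
# substitute the values from your own ParseHub account.
import json
import urllib.parse
import urllib.request

PARSEHUB_BASE = "https://www.parsehub.com/api/v2"

def last_run_data_url(project_token: str, api_key: str) -> str:
    """Build the URL that returns the latest ready run's scraped data as JSON."""
    query = urllib.parse.urlencode({"api_key": api_key, "format": "json"})
    return f"{PARSEHUB_BASE}/projects/{project_token}/last_ready_run/data?{query}"

def fetch_latest_run_data(project_token: str, api_key: str) -> dict:
    """Fetch and decode the scraped data; the keys depend on your selections."""
    with urllib.request.urlopen(last_run_data_url(project_token, api_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (not run here):
# data = fetch_latest_run_data("YOUR_PROJECT_TOKEN", "YOUR_API_KEY")
```

Data Fetcher handles this call (and the authentication around it) for you; the sketch is only to show what 'Import data from a project's latest run' refers to.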
Add a Name for your request, e.g. 'Scrape Website Data'.
Click 'Save & Continue'.
On the next screen, under Project, select the ParseHub project whose scraped data you want to import into Airtable.
Select the Airtable Output Table & View you would like to import data into and click 'Save & Run'.
Data Fetcher will now run the request, and the Response Field Mapping window will open.
Here, you can select the fields you want to import into your Airtable web scraper. You can also choose which fields to map them to in the output table.
For this example, we only have two fields, and we'll import both of them and map them to new fields in Airtable. (These options will be pre-selected.)
If you have more fields, it's easier to click 'Filter all' and then use the 'Find field' search bar to select just the fields you want to import from your ParseHub project.
Click 'Save & Run'.
Data Fetcher will now create any new fields in Airtable and import the website data from ParseHub to Airtable. You can now view the data in your Airtable web scraper.
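Conceptually, this import step maps each scraped row onto an Airtable record. If you wanted to replicate it yourself against Airtable's REST API, the payload shape would look roughly like this sketch; the base ID, table name, and field names are placeholders, and the helper names are our own:

```python
# Sketch of the mapping Data Fetcher performs conceptually: scraped rows
# become Airtable records. Airtable's REST API creates records via POST to
# https://api.airtable.com/v0/{baseId}/{tableName}, at most 10 per request.
# The base ID, table name, and field names below are placeholders.

def to_airtable_records(rows):
    """Wrap each scraped row in the {'fields': {...}} envelope Airtable expects."""
    return [{"fields": row} for row in rows]

def batches(records, size=10):
    """Split records into chunks of `size` (Airtable's per-request limit)."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def build_requests(rows, base_id, table_name):
    """Return (url, payload) pairs ready to POST with your HTTP client."""
    url = f"https://api.airtable.com/v0/{base_id}/{table_name}"
    return [(url, {"records": batch}) for batch in batches(to_airtable_records(rows))]

# Example (not run here):
# for url, payload in build_requests(scraped_rows, "appXXXXXXXX", "Articles"):
#     POST `payload` as JSON to `url` with an
#     "Authorization: Bearer <your Airtable token>" header.
```

Again, Data Fetcher does all of this (including field creation and mapping) for you; the sketch just shows what the import amounts to.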
Instead of manually running a request every time you want to run your Airtable web scraper, you can use Data Fetcher's scheduled requests feature and automatically import data on a regular schedule.
In Data Fetcher, scroll to Schedule and click 'Upgrade'.
Select a plan depending on your usage needs and enter your payment details.
Back in the Data Fetcher extension, click 'I've done this'.
Under Schedule, click '+ Authorize'.
Next, authorize the Airtable bases you want Data Fetcher to be able to access.
We recommend you select 'All current and future bases in all current and future workspaces' to avoid issues with any unauthorized bases in the future.
Click 'Grant access'.
'Schedule this request' will now be toggled on.
Select how often you want the web scraper to run, e.g. 'Every 15 mins'. Click 'Save', and the request will now run on the schedule and sync any new scraped website data automatically.
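If you'd rather drive a schedule yourself, for example from a server you control, the same effect can be sketched with a simple polling loop. Here `sync_once` is a hypothetical stand-in for whatever fetch-and-import logic you use; it is not part of Data Fetcher:

```python
# Sketch of a do-it-yourself schedule: call a sync function every 15
# minutes. `sync_once` is a hypothetical placeholder for your own
# fetch-from-ParseHub / push-to-Airtable logic.
import time

INTERVAL_SECONDS = 15 * 60  # "Every 15 mins"

def run_on_schedule(sync_once, iterations=None, sleep=time.sleep):
    """Call sync_once repeatedly, sleeping INTERVAL_SECONDS between runs.

    iterations=None loops forever; passing a count (and a fake `sleep`)
    keeps the loop testable.
    """
    count = 0
    while iterations is None or count < iterations:
        sync_once()
        count += 1
        if iterations is None or count < iterations:
            sleep(INTERVAL_SECONDS)

# Example (not run here): run_on_schedule(my_sync_function)
```

Data Fetcher's scheduled requests give you the same result without running any infrastructure of your own, which is why the built-in schedule is usually the simpler choice.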