Create an Automated Scraper | Scribe

    • Scott Colenutt
    • 46 steps
    • 21 minutes
    1
    You'll start by creating an Airtable base. This is where you'll send your scraped data. Navigate to [Airtable](https://airtable.com/).
    2
    From the Airtable dashboard, click '**+Create**' from the left-hand menu.
    3
    Click '**Select a workspace**'.
    4
    Select your workspace.
    5
    Click '**Start from scratch**'.
    6
    Change your Airtable base name to '**Top Stories**'.
    7
    Sign in to your [Browse AI](https://www.browse.ai/?utm_source=sponsored&utm_medium=email&utm_campaign=100daysofai) dashboard and click '**+ Build New Robot**'.
    8
    Enter the URL of the page you'd like to scrape and then click '**Start Training Robot**'. You're welcome to use any URL you like, but keep it simple while you're learning and try to pick a website with a simple list of items. If you want to follow along with this specific example, enter [https://www.theverge.com/](https://www.theverge.com).
    9
    A new browser window will open with your URL, and Browse AI's robot will appear in the top right-hand corner of this window. Click the robot and select the '**Capture List**' option.
    10
    Hover over the items on the page until they are highlighted as a list of items. This selects the area that you want the Browse AI robot to scrape data from.
    11
    Now hover over the title until you see the option to select '**Visible Text, Link**'. These are the two elements you want to scrape. This can be slightly fiddly because there are lots of elements on the page that Browse AI is able to capture, so take your time to make sure you've selected these items.
    12
    You'll now be prompted to select the first element you want to scrape. Select the '**Capture visible text**' item.
    13
    Next, click the '**Capture link**' item. Then press '**Enter**' on your keyboard to finish selecting the items you want to scrape.
    14
    You'll now be prompted to give the items you've selected a name. Give your first item the name '**Story Title**' and click the ✅ when it appears.
    15
    Repeat this process for your second item and call it '**Story Link**'.
    16
    A screen will now appear showing the structure of the data that you've selected to capture. If everything looks as expected, start by naming this list of data. Call it '**The Verge Top Stories**'.
    17
    Set the number of items you want to extract to '**10**' and, from the drop-down menu, select '**There are no more items to load**'. Click '**Capture List**'.
    18
    You'll now be taken back to the previous screen where you can finish the recording. Click '**Finish Recording**'.
    19
    You'll be taken back to the Browse AI dashboard where you need to name your robot. Replace the text in the '**Robot name**' field. If you're following along with this example, call it '**Top Stories from The Verge**'.
    20
    Click '**Save**'.
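
To picture what the robot has been trained to collect in steps 11 to 17, it may help to think of the captured list as up to ten records, each holding a 'Story Title' (the visible text) and a 'Story Link' (the link). The Python sketch below is only an illustration of that shape. It is not Browse AI's actual output format or API, and the names `Story`, `to_stories` and `sample_rows`, along with the placeholder headlines, are made up for this example. It also assumes that links may be captured as relative paths, which may not be the case.

```python
from dataclasses import dataclass
from urllib.parse import urljoin

BASE_URL = "https://www.theverge.com/"  # the page used in this walkthrough
MAX_ITEMS = 10  # the item limit chosen in step 17


@dataclass
class Story:
    """One captured list item (hypothetical structure, for illustration only)."""
    story_title: str  # corresponds to the 'Story Title' field
    story_link: str   # corresponds to the 'Story Link' field


def to_stories(rows, base_url=BASE_URL, limit=MAX_ITEMS):
    """Turn raw (text, href) pairs into at most `limit` Story records."""
    stories = []
    for text, href in rows[:limit]:
        # Trim stray whitespace and resolve relative links against the page URL.
        stories.append(Story(text.strip(), urljoin(base_url, href)))
    return stories


# Placeholder rows, not real headlines from The Verge.
sample_rows = [
    (f"  Example headline {i} ", f"/example-story-{i}") for i in range(1, 13)
]

stories = to_stories(sample_rows)
print(len(stories))
print(stories[0])
```

Even though twelve placeholder rows go in, only the first ten come out, which mirrors the limit set when capturing the list. Each record then has exactly the two named fields you would expect to see as columns once the data reaches your 'Top Stories' base.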