How to Scrape Data from Any Website | Scribe

    How to Scrape Data from Any Website

    • Scott Colenutt |
    • 24 steps |
    • 3 minutes
    1

    Sign up for an account with [Browse AI](https://www.browse.ai/?utm_source=100daysofnocode&utm_medium=partner&utm_campaign=q3) and follow the onboarding instructions.

    2

    Inside the dashboard, select "**Build New Robot**".

    3

    Select the "**Extract Structured Data**" option.

    4

    When invited to create your Robot, click the "**Origin URL**" field and enter: <https://www.notion.so/careers>

    5

    Click "**Start Training Robot**".

    6

    Click "**Use Robot Studio**".

    7

    Select the option "**Capture Text**" followed by "**From list**" from the right-hand menu.

    8

    Scroll down to the Customer Experience section and **highlight the List Items**, as shown in the screenshot below.

    When you are web scraping, you'll often export data into spreadsheets, so it can help to think of list items as rows in a spreadsheet.

    In this example, we want to capture the data for each role and output it as a single row.

    9

    You'll now be prompted to select the items you want to scrape (or extract).

    Start by selecting the job title.

    10

    Next, select the location from the first job listing.

    11

    Finally, select the outer border of the first job listing, and select "**Link**" from the pop-up that appears.

    12

    Click "**Confirm**" in the right-hand menu.

    13

    You'll now be invited to label the elements you've selected, in the order you selected them.

    In the first pop-up box that appears, label the item "**Job Role**" and click ✅.

    14

    Next, label the location element "**Location**" and click ✅.

    15

    Finally, label the link "**Job URL**" and click ✅.

    16

    **Give your extracted data list a name**. In this example, we'll use "Notion CX Roles".

    17

    Select "**10**" as the maximum number of rows you want to extract.

    18

    Click "**Select Pagination Setting**".

    19

    In the right-hand menu, select "**No more items to load**".

    20

    Click "**Save Captured List**".
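
For readers curious about what a captured list amounts to, here is a minimal Python sketch of the same idea: each list item becomes one row with the columns "Job Role", "Location" and "Job URL", capped at 10 rows. This is not how Browse AI works internally. The HTML snippet, its class names, the placeholder job values, and the helper names (`JobListParser`, `extract_rows`, `to_csv`) are all assumptions made up for illustration, and the real careers page will be structured differently.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for a careers page. The tags, class
# names, and job values are placeholders, not taken from any real site.
SAMPLE_HTML = """
<ul>
  <li><a href="/jobs/1"><span class="title">Example Role A</span>
      <span class="location">Example City</span></a></li>
  <li><a href="/jobs/2"><span class="title">Example Role B</span>
      <span class="location">Remote</span></a></li>
</ul>
"""

COLUMNS = ["Job Role", "Location", "Job URL"]  # labels from steps 13-15
MAX_ROWS = 10  # mirrors the row limit chosen in step 17


class JobListParser(HTMLParser):
    """Collects one dict per <li>, like one spreadsheet row per list item."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # row being built, or None outside an <li>
        self._field = None  # column that the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self._row = {name: "" for name in COLUMNS}
        elif self._row is not None and tag == "a":
            self._row["Job URL"] = attrs.get("href", "")
        elif self._row is not None and tag == "span":
            if attrs.get("class") == "title":
                self._field = "Job Role"
            elif attrs.get("class") == "location":
                self._field = "Location"

    def handle_data(self, data):
        # Only keep text that sits inside a labelled <span>.
        if self._row is not None and self._field:
            self._row[self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html, max_rows=MAX_ROWS):
    """Parse the HTML and return at most max_rows row dicts."""
    parser = JobListParser()
    parser.feed(html)
    return parser.rows[:max_rows]


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = extract_rows(SAMPLE_HTML)
print(to_csv(rows))
```

The point of the sketch is the shape of the output: one header row of labels, then one row per list item, which is the structure you would expect when opening the extracted data in a spreadsheet.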