How to Scrape Data from Any Website | Scribe

    How to Scrape Data from Any Website

    • Scott Colenutt
    • 24 steps
    • 3 minutes
    1
    Sign up for an account with [Browse AI](https://www.browse.ai/?utm_source=100daysofnocode&utm_medium=partner&utm_campaign=q3) and follow the onboarding instructions.
    2
    Inside the dashboard, select "**Build New Robot**"
    3
    Select the "**Extract Structured Data**" option.
    4
    When invited to create your Robot, click the "**Origin URL**" field and enter: <https://www.notion.so/careers>
    5
    Click "**Start Training Robot**"
    6
    Click "**Use Robot Studio**"
    7
    Select the option "**Capture Text**" followed by "**From list**" from the right-hand menu.
    8
    Scroll down to the Customer Experience section and **highlight the list items**, as shown in the screenshot below.

    When you are web scraping, you will often export data into spreadsheets, so it can help to think of list items as rows in a spreadsheet.

    In this example, we want to capture the data for each role and output it as one row per role.
    9
    You'll now be prompted to select the items you want to scrape (or extract). Start by selecting the job title.
    10
    Next, select the location from the first job listing.
    11
    Finally, select the outer border of the first job listing, and select "**Link**" from the pop-up that appears.
    12
    Click "**Confirm**" in the right-hand menu.
    13
    You'll now be invited to label the elements you've selected, in the order you selected them. In the first pop-up box that appears, label the item "**Job Role**" and click ✅
    14
    Next, label the location element with "**Location**" and click ✅
    15
    Finally, label the link with "**Job URL**" and click ✅
    16
    **Give your extracted data list a name**. In this example, we'll use "Notion CX Roles".
    17
    Select "**10**" as the maximum number of rows you want to extract.
    18
    Click "**Select Pagination Setting**"
    19
    In the right-hand menu, select "**No more items to load**"
    20
    Click "**Save Captured List**"
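
For readers who want to see the "list items as spreadsheet rows" idea in code, here is a minimal Python sketch of what the robot configured above does conceptually: it treats each list item as one row with the columns Job Role, Location, and Job URL, and stops after a maximum of 10 rows. This is not Browse AI's implementation or API. The sample HTML, its class names, and the names `JobListParser`, `extract_rows`, and `rows_to_csv` are all hypothetical, and the markup of the real careers page will differ.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for a job list; placeholder data only.
SAMPLE_HTML = """
<ul class="jobs">
  <li><a href="/careers/example-role-a">
    <span class="title">Example Role A</span>
    <span class="location">Example City</span>
  </a></li>
  <li><a href="/careers/example-role-b">
    <span class="title">Example Role B</span>
    <span class="location">Remote</span>
  </a></li>
</ul>
"""

COLUMNS = ["Job Role", "Location", "Job URL"]


class JobListParser(HTMLParser):
    """Turns each <li> into one row, like one row in a spreadsheet."""

    def __init__(self, max_rows=10):
        super().__init__()
        self.max_rows = max_rows
        self.rows = []
        self._row = None    # row being built for the current <li>
        self._field = None  # column that the current text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self._row = {column: "" for column in COLUMNS}
        elif self._row is not None and tag == "a":
            self._row["Job URL"] = attrs.get("href") or ""
        elif self._row is not None and tag == "span":
            # Map the (assumed) class names to the labels used in the guide.
            labels = {"title": "Job Role", "location": "Location"}
            self._field = labels.get(attrs.get("class"))

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._row is not None:
            # Keep at most max_rows rows, like the row limit set in the guide.
            if len(self.rows) < self.max_rows:
                self.rows.append(self._row)
            self._row = None


def extract_rows(html, max_rows=10):
    """Return a list of row dictionaries, one per list item."""
    parser = JobListParser(max_rows)
    parser.feed(html)
    return parser.rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(extract_rows(SAMPLE_HTML, max_rows=10)))
```

Each dictionary returned by `extract_rows` corresponds to one highlighted list item, and its keys match the labels chosen in the labelling steps, which is why the CSV output opens cleanly as a spreadsheet.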