Web Scraping with Node.js

Scrape data from a web page with Cheerio

In this workshop we will automate the process of pulling data from a web page.

Table of Contents

  1. Scrape data from a web page with Cheerio
  2. Activity 1: Modify the sample code
  3. Cheerio Selectors
  4. Activity 2: Trying out Cheerio Selectors
  5. Activity 3: Trying out some Tables
  6. Activity 4: Reading attributes
  7. Activity 5: Books to Scrape
  8. Clicking and Autoscrolling
  9. Links to Scrape Samples

Web scraping

Take a look at this web page from OpenStreetMap:


There is a list of underground stations together with some useful information, such as the geo-coordinates of the stations. If we wanted to use this data in a program (e.g. to draw a map of the London Underground) we would need to get this data into a more useful form, such as a list in Python or a JSON object in a JavaScript program.
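To make "a more useful form" concrete, here is a minimal sketch that turns table-like rows into a list of JavaScript objects and prints them as JSON. The function name rowsToObjects, the column names and the values are made-up placeholders for illustration, not real station data.

```javascript
// Convert table-like rows (as you might copy from a web page) into a list of objects.
// The column names and values below are made-up placeholders, not real station data.
function rowsToObjects(header, rows) {
  return rows.map(row => {
    const obj = {};
    header.forEach((name, i) => {
      obj[name] = row[i];   // pair each column name with the value in the same position
    });
    return obj;
  });
}

const header = ['name', 'latitude', 'longitude'];
const rows = [
  ['Station A', 51.5, -0.1],
  ['Station B', 51.6, -0.2],
];

const stations = rowsToObjects(header, rows);

// Print the list as nicely indented JSON
console.log(JSON.stringify(stations, null, 2));
```

Once the data is in this shape, a program can loop over it, filter it or plot it without any further copying and pasting.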


A really simple approach would be to select the table on the page using your mouse, copy it, and paste it into your favourite spreadsheet program. This will work, but has a few significant disadvantages:

A Really Simple Scrape

Here is a more automated, code-free approach:

Here is an HTML to JSON tool which you can use for this process:


The above approach works to some degree, but it is not fully automated. We still need to do some copying and pasting.

The rest of this workshop explores a coded approach, using Node.js.

The Source Code

The source code for the examples in this worksheet can be found here:

Source Code

Before we get started, download the code and open it up in Visual Studio Code.

Web Scraping with Node.js and Cheerio

Let's do a simple web-scraping exercise. We will scrape this test page:

Test Page

The test page looks like this:


We will be using a Node.js module called Cheerio to help us.

First, let's run this program. Open up a Terminal window in Visual Studio Code (menu Terminal > New Terminal). Type in the following command:

node scrape1-simple.js 

You should see the following message:

Found 8 rows
See data.csv for results

You should also see a file called data.csv containing the following:

Bosnia and Herzegovina

Open the code file scrape1-simple.js and take a look at it:

// Load the modules we need
const axios = require('axios');                     // for sending web requests
const cheerio = require('cheerio');                 // for web scraping
const scrape_helper = require("./scrape-helper");   // for saving objects to csv

// Call the scrape function
scrape();

// Function to scrape a page
function scrape() {
  // Specify the URL of page we want to scrape
  const url = "https://www.thinkcreatelearn.co.uk/resources/node/web-scraping/sample1.html";

  // Make the http request to the URL to get the data
  axios.get(url).then(response => {

    // Get the data from the response
    const data = response.data;

    // Load the HTML into the Cheerio web scraper
    const $ = cheerio.load(data);

    // Create a list to receive the data we will scrape
    const results = [];

    // Search for the elements we want
    const selection = $('h2');

    // Add the elements to the list
    selection.each((i, el) => {
      const text = $(el).text();
      // Assumption: the line that stores the text is missing from this listing;
      // "country" is an assumed field name
      results.push({ country: text });
    });

    // Assumption: reconstructed from the "Found 8 rows" message shown above
    console.log("Found " + results.length + " rows");

    // Save the data to the csv
    scrape_helper.storeCsv('data.csv', results);
    console.log("See data.csv for results");

  }).catch((err) => {
    // Show any error message
    console.log("Error: " + err.message);
  });
}

In essence, the code searches for particular elements in the HTML and uses them to build up a list of results. To see those elements, take a look at the HTML of the web page we are scraping: in Chrome, visit the Test Page, right-click on the page, and select View Page Source.
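If you want to see the search-and-collect idea in isolation, the sketch below imitates it using only built-in JavaScript, so it runs without installing anything. It swaps Cheerio for a simple regular expression, which is only reasonable for small, well-formed snippets like this one. The function name extractText, the inline HTML and the field name country are assumptions made for the example.

```javascript
// Rough stand-in for $('h2') followed by .text(), using only built-in JavaScript.
// This is NOT how Cheerio works internally; it only copes with simple, well-formed markup.
function extractText(html, tag) {
  // Match <tag ...> ... </tag>, capturing whatever sits between the tags
  const pattern = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, 'gi');
  const results = [];
  for (const match of html.matchAll(pattern)) {
    // Strip any nested tags and trim whitespace
    const text = match[1].replace(/<[^>]+>/g, '').trim();
    results.push({ country: text });   // "country" is a hypothetical field name
  }
  return results;
}

// A small inline page, loosely modelled on the test page (contents are illustrative)
const sampleHtml = `
  <h1 id="b-countries">Countries beginning with B</h1>
  <h2>Bosnia and Herzegovina</h2>
  <h2>Example <em>Country</em></h2>
`;

const found = extractText(sampleHtml, 'h2');
console.log("Found " + found.length + " rows");
console.log(JSON.stringify(found));
```

For real pages a proper HTML parser such as Cheerio is the safer choice, because regular expressions break easily on nested or malformed markup.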

Here's the HTML code for this page. The H2 tags are highlighted in yellow:

<!DOCTYPE html>
<html lang="en">

    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <meta http-equiv="X-UA-Compatible" content="ie=edge">
        <title>Web scraping exercise</title>
        <style>
            .fpar {
                font-family: Georgia;
            }
        </style>
    </head>

        <p>Sample for web scraping exercises</p>

        <h1>Countries beggining with A</h1>





        <h1 id="b-countries">Countries beggining with B</h1>     



        <h2>Bosnia and Herzegovina</h2>



The web scraping code first looks for all the <h2> tags:

    // Search for the elements we want
    const selection = $('h2');

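It then builds up a list from the text of each h2 tag:

    // Add the elements to the list
    selection.each((i, el) => {
      const text = $(el).text();
      // Assumption: the line that stores the text is missing from this listing;
      // "country" is an assumed field name
      results.push({ country: text });
    });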

Finally, it saves the results as a CSV file:

    // Save the data to the csv
    scrape_helper.storeCsv('data.csv', results)
    console.log("See data.csv for results")
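We don't look inside scrape-helper here, so as a rough idea of what saving a list of objects as CSV can involve, here is a small sketch using only Node's built-in modules. The function toCsv, the output file name and the sample rows are hypothetical, and the real storeCsv may well format things differently (for instance, it may not write a header row).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for a helper like storeCsv; the real scrape-helper may differ.
function toCsv(rows) {
  if (rows.length === 0) return '';
  // Use the keys of the first object as the column names
  const columns = Object.keys(rows[0]);
  // Quote a value if it contains a comma, a double quote or a newline
  const escape = value => {
    const s = String(value);
    return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
  };
  const lines = [columns.map(escape).join(',')];
  for (const row of rows) {
    lines.push(columns.map(c => escape(row[c])).join(','));
  }
  return lines.join('\n') + '\n';
}

// Sample rows; the second value is made up to show how commas are quoted
const sample = [{ country: 'Bosnia and Herzegovina' }, { country: 'Example, Country' }];
const csvText = toCsv(sample);

// Write to a temporary file so the example does not overwrite data.csv
const outFile = path.join(os.tmpdir(), 'scrape-sketch.csv');
fs.writeFileSync(outFile, csvText);
console.log(csvText);
```

Quoting values that contain commas matters for scraped text, since headings and table cells often include punctuation that would otherwise split a row into extra columns.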
