Web Scraping with Node.js

Cheerio Selectors

Investigating the range of selectors offered by Cheerio


Selectors

Web scraping with Cheerio uses selectors to specify the data we want. When we pulled the h2 elements from the page in the previous activity, we created a selector:

    // Search for the elements we want
    const selection = $('h2')

The selector could then be used to pull the data from the page:

    // Add the elements to the list
    selection.each((i, el) => {
      const text = $(el).text()
      results.push({ country: text })
    })

Cheerio supports most of the standard CSS selector syntax, so there are selectors and selector patterns for many different scenarios.

Here are examples of the most common selector patterns:

Selector     | Action
------------ | ------
$('*')       | Select all elements on the page
$('div')     | Select all div elements on the page
$('h1, h2')  | Select all h1 and h2 elements on the page
$('div > p') | Select all p elements directly under a div (children)
$('div p')   | Select all p elements under a div, either directly or indirectly (find)
$('#xyz')    | Select the element with an id of xyz (an id should be unique on a page)
$('.pqr')    | Select all elements with the CSS class pqr
$('title')   | Select the title element

There are also some functions that we can apply to a selector:

Function               | Purpose
---------------------- | -------
selector.children()    | Select all elements directly under the selector
selector.children('p') | Select all p elements directly under the selector
selector.find('tr')    | Select all tr elements directly or indirectly under the selector
selector.first()       | Reduce the selection to its first element
selector.last()        | Reduce the selection to its last element
selector.each(fn)      | Loop through all elements in the selection and call fn(index, element) for each one

On the next page we will explore these selectors and functions.