Web Scraping with Node.js

Activity 4: Reading attributes

In this activity we will investigate how to extract attribute values from elements.

Attributes

Take a look at this page, which lists the London boroughs and their council websites:

View the page source.

Look at the first link:

<a href="http://www.lbbd.gov.uk/">London Borough of Barking and Dagenham</a><br/>

We already know how to extract the text "London Borough of Barking and Dagenham". We would write a selector that selects the a elements and then use the text() function to extract the text.

But what about the link embedded in the href attribute? To extract that we can use the attr() function:

    // Create a list to receive the data we will scrape
    const results = []

    // Find all the a elements and extract the name and link
    const links = $('a')
    links.each((i, el) => {
        const borough = {}

        // The name is in the element text
        const boroughName = $(el).text().trim()
        borough.name = boroughName

        // The link is in the element's href attribute
        const link = $(el).attr('href')
        borough.link = link

        // Add the borough to the array
        results.push(borough)
    })
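
Once the links have been collected, it can help to tidy them up before using them. The sketch below is not part of the original activity: it assumes that attr() returns undefined when an a element has no href (as jQuery-style APIs generally do) and that some href values may be relative to the page. The sample data, the pageUrl address and the cleanLinks helper are all hypothetical names used for illustration. It relies only on the URL class built into Node.js.

```javascript
// Hypothetical sample of what the results array might look like after scraping.
// Only the first entry comes from the page discussed above; the rest are made up.
const sampleResults = [
    { name: 'London Borough of Barking and Dagenham', link: 'http://www.lbbd.gov.uk/' },
    { name: 'Example relative link', link: '/council/contact' },
    { name: 'Anchor without an href', link: undefined }
]

// Hypothetical address of the page that was scraped
const pageUrl = 'http://www.example.com/boroughs/index.html'

// Drop entries that have no usable link and resolve the rest against the page address
function cleanLinks(results, baseUrl) {
    return results
        .filter(borough => typeof borough.link === 'string' && borough.link.trim() !== '')
        .map(borough => ({
            name: borough.name,
            // new URL(relative, base) turns a relative href into an absolute one
            // and leaves an already absolute href unchanged
            link: new URL(borough.link.trim(), baseUrl).href
        }))
}

const cleaned = cleanLinks(sampleResults, pageUrl)
console.log(cleaned)
```

In a real project you would pass the scraped results array and the address of the page you fetched instead of the sample values.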

Try it out

Use the above code in a new project to extract the London borough names and links.

Complete code

You can find the completed code in scrape4-attributes.js.
