In this activity we will investigate how to extract information stored in element attributes.
Take a look at this page, which lists the London boroughs and their council websites:
London Boroughs
View the page source.
Look at the first link:
<a href="http://www.lbbd.gov.uk/">London Borough of Barking and Dagenham</a><br/>
We already know how to extract the text "London Borough of Barking and Dagenham". We would write a selector that selects the a elements and then use the text() function to extract the text.
But what about the link embedded in the href attribute? To extract that, we can use the attr() function:
// Assumes $ has already been loaded with the page's HTML (e.g. with cheerio.load())
// Create an array to receive the data we will scrape
const results = []
// Find all the a elements and extract the name and link
const links = $('a')
links.each((i, el) => {
  const borough = {}
  // The name is in the element text
  const boroughName = $(el).text().trim()
  borough.name = boroughName
  // The link is in the element's href attribute
  const link = $(el).attr('href')
  borough.link = link
  // Add the borough to the array
  results.push(borough)
})
You can find the completed code in scrape4-attributes.js.
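The href values collected this way come back exactly as they are written in the page. Some may be relative paths, and attr() returns undefined for an a element with no href at all. One way to tidy the results is to drop entries without a link and resolve the rest against the page's own URL using Node's built-in URL class. The sample entries, the page URL and the normaliseLinks helper below are hypothetical, made up for illustration.

```javascript
// Hypothetical sample data shaped like the results array built by the scraper
const scraped = [
  { name: 'London Borough of Barking and Dagenham', link: 'http://www.lbbd.gov.uk/' },
  { name: 'Relative example', link: '/boroughs/example' },
  { name: 'Anchor without href', link: undefined },
]

// Hypothetical URL of the page the links were scraped from
const pageUrl = 'https://example.com/london/boroughs.html'

// Hypothetical helper: keep only entries that have a non-empty link, and
// resolve each link against baseUrl so relative paths become absolute URLs.
// Links that are already absolute are returned unchanged.
function normaliseLinks(entries, baseUrl) {
  return entries
    .filter((entry) => typeof entry.link === 'string' && entry.link.trim() !== '')
    .map((entry) => ({
      name: entry.name,
      link: new URL(entry.link.trim(), baseUrl).href,
    }))
}

const cleaned = normaliseLinks(scraped, pageUrl)
console.log(cleaned)
```

Resolving against the page URL follows the same rules a browser uses when someone clicks the link, so a path such as /boroughs/example becomes https://example.com/boroughs/example.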