Kings Journalism banner

Skillsheet-Grab some data from the web, put it in a spreadsheet, and sort it

One of the simplest data tasks is to take some tabular data from the web and paste it into a spreadsheet program for some quick analysis. By doing something as simple as sorting on a column of data, you can quickly see elements of the dataset that might lead to stories.

For this demo we’ll go to the Canadian department of Fisheries and Oceans disclosure page for contracts at http://www.dfo-mpo.gc.ca/pd-cp/reports-eng.asp, and pick the most recent quarter of contracts posted (which is the third quarter of 2015-16 as we write this). If you click the link here, it will open in a new window.

Copying into a spreadsheet is as simple as highlighting the table of data by clicking and dragging your mouse from the top left corner to the bottom right corner of the data table. You can see part of it highlighted here.

Dataselected

Now, copy the content using copy under the edit menu of your browser or the keystroke CTRL > C on a PC or CMD > C on a Mac. Open your favourite spreadsheet program, put your mouse in cell A1 of a new sheet, and paste the data. In some spreadsheets you may need to use Edit>Paste Special and paste the data as text. This should work, with possible minor tweaks, in all major desktop and cloud spreadsheet applications, including Microsoft Excel, OpenOffice and Google Sheets.

It’ll end up looking something like this:

Datainsheet

Now, instead of a static HMTL table in your web browser, you have data in a spreadsheet. If you want to get rid of the hyperlinks, your spreadsheet should have a way to do that.

There are plugins available for browsers such Firefox and Chrome that make copying tables even easier. Internet Explorer also allows you to copy HTML tables to Excel by right clicking on the table in Explorer and choosing “Export to Microsoft Excel.” In Google sheets you can use the IMPORTHTML function.

Sorting your data

Now that the data is in a spreadsheet, you can do some simple analysis. The simplest is to sort it.

Sorting is most spreadsheet programs is a matter of first clicking in the column you want to sort on, right clicking (or CTRL > click on a Mac with only one mouse button) and choosing to sort in ascending or descending order. Sorting in ascending order will sort from the smallest to largest value; descending order is the opposite, largest to smallest.

You can also go to your program’s data menu (or the home ribbon in Excel for the PC) and choose to sort in either order, or click on the sort ascending or sort descending icon on the toolbar (home ribbon on Excel for Windows).

In Google sheets, hold your mouse over the letter label at the top of a column, and click on the small arrow that appears to the right side of the column label. You’ll find sort under the menu that appears. Here, we have chosen a descending order sort in a Google sheet:

GoogleSortMenu

The data is now sorted from the largest to the smallest value. Now we can see which vendor got the largest contract. That looks interesting! Time to do some further research.

GoogleSorted

If you want to sort on more than one column, you can do that too. In most spreadsheet programs, go to the data menu and choose sort. You’ll get a dialog in which you can choose one or more columns to sort and indicate if the first row contains headings.

In Google sheets, highlight the whole sheet by clicking within the data area and using the keystroke combination CTRL > A on a PC and CMD > A on a Mac. Choose Sort Range under the Data menu and sort using one or more columns. Make sure to indicate your data has a header row if there are column headers in the first row.

When you sort on more than one column, the first column you choose will be sorted first, then if there are any duplicate values in the first sort column, the second column will be sorted within each group of duplicates, and so on.

The content in this post also appears on Fred Vallance-Jones’s personal blog.

Back to main data journalism page

Posted: February 9, 2016, 2:54 pm AST
Modified: April 5, 2016, 10:26 am ADT