Module 5–WEDNESDAY MORNING, June 23

Go back to main resources page

Cleaning and preparing data will probably take you more time than any other task. There are all sorts of tools to help with the task, and some of them are shown in Appendix A of The Data Journalist. We encourage you to read through the appendix.

For this morning, we’ll show how you can use functions in Excel to prepare data for analysis.

There are eight videos in this module, each one relatively short. The first four focus on using IF functions in the context of calculating rolling averages. The second four show you how to use string functions to the divide content of a field into more than one field. You can watch the videos as many times as you like. You need the password that was provided to you by email.It is the same one you used to open the main data school resources page.

At the end of this page, you’ll find exercises for both sets of videos. We’ll talk about the answers in the live session for the morning.

Advanced Excel functions, Part 1

Link for data used in the first three videos (opens in new tab). 

You will use the sheet you create in the first four videos in this morning’s module to build a visualization in the first of the videos for this afternoon.

Watch the first video (Note: the video contains importing instructions for both Windows and Mac, in that order)

Watch the second video

Watch the third video

Link for the additional data used in the fourth video (opens in a new tab)

Watch the fourth video

Advanced Excel functions, Part 2

Download the exposures data used in the next four videos (Excel file).

The data came from here.

Watch the fifth video

Watch the sixth video

Watch the seventh video

Watch the eighth video

Here are all the formulas used in these videos

Part 1, IF function exercise:

Go to the federal COVID vaccination page at: https://health-infobase.canada.ca/covid-19/vaccination-coverage/

For this exercise, we will use the population vaccinated data (make sure you select right correct tab) at the top of page.

Download the CSV file from just below the map. Save it to your data folder, and make sure that you give it a name so you’ll know how up to date it is.

Now, here’s an interesting thing about this data. The vaccine administration data gives data for the number of doses given each weekday, but it doesn’t differentiate between first and second doses. This dataset does, but it only provides cumulative figures. What if we want to know the change each week in the number of partially vaccinated and fully vaccinated people? That information is not provided. Luckily, you have Excel and you can to do some calculations of your own. So that’s what you are going to do, using a simple formula to subtract the previous week’s total from this week’s total, and an IF function to make sure you don’t do the calculation when the province name changes.

Alright, so please import the dataset into Excel using the method shown in the first video today. Be sure to use a character encoding that can handle accented French characters.

Make sure the sheet is organized by province and then by the week-end date, in alphabetical order in the case of province and with the most recent date at the bottom for the date.

Now, insert new columns to the right of the number partially and number fully vaccinated columns. Create a formula that subtracts the previous week’s total from the current total to give the number for the week, for each of your new columns. Next, refine your formula with an IF function so that it inserts a “—” notation when the name of the province changes.

Once you are done, make sure to convert your formulas to values using paste special. Finally, sort the columns to see the days with the biggest increases and decreases.

This task is a little simpler than the one in the video, but it will require you to correctly structure an IF function to apply a formula based on a condition you set.

Part 2, String function exercise

You want to do a story on the locations of pharmacies in Ontario giving COVID-19 vaccinations. A colleague has already scraped the data from this page: https://covid-19.ontario.ca/vaccine-locations, but the addresses are all together in one field. You want to be able to both count up the number of locations in each city, and put the locations on a map to see how locations line up with things such as income, visible-minority populations and other local factors. You’ll need to clean up your data a little. Luckily, you’ve taken the data school and you know how to use string functions!

Download and open the VaccineLocationsforExercise in Ontario (as of June 22).

Using the skills you learned in the second four videos for Wednesday morning, create new columns for street number, street, city and province, then use string functions to populate the new fields.

Make sure you save a working copy.

When you are done, make a pivot table to count up the number of locations per city.

We’ll talk about the answers both parts of this exercise in the morning live session Wednesday.

Pro tip: You should make a regular habit of reviewing the technical notes for any dataset. This might go by various names, such as “metadata,” “data notes” or “about this data” but the key is to make note of any limitations of the data, both in terms of the data itself or how it was collected, the meanings of codes used in fields and other similar information. You should never work with data blindly and if information is absent or missing, you may need to be in touch with the provider of the data.

Live session: Noon Eastern time

Go back to main resources page

We are proud to work with our partners and sponsors