digital ash in a newspaper urn

Lucas Timmons is a special editor for data journalism and data visualization at the Edmonton Journal, a former journalism instructor at Grant MacEwan University and one of the ONA's 2011 MJ Bear fellows.

Making maps more beautiful: Coffee in Alberta's Capital


This is a look at the density of coffee shops around Edmonton. The darker the hexagon, the more coffee shops there are in that area.

This choropleth map uses hexagonal binning (a good write-up about hex-bins here) to show the density.

This map uses map tiles created by Stamen Design. The colours for the hexagons were made using ColorBrewer.

The data comes from work by my friend William Wolfe-Wylie at Canada.com. He wrote a web scraper to get all the addresses of Tim Hortons and Starbucks in Canada. I added addresses for the city's other coffee shops using the data available on Canada411 and Yelp. The data was then moved into a GeoJSON file in QGIS and added to the map using the D3.js library.
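
A rough sketch of the hex-binning step, for anyone curious. This is written against a current D3 (v5 or later) with the d3-hexbin plugin, not the code behind the actual map, and the variable names (shops, the canvas size) are mine:

// Bin projected coffee-shop points into hexagons and shade by count.
// "shops" is assumed to be a GeoJSON FeatureCollection of shop locations.
var width = 800, height = 600;
var svg = d3.select("svg");
var projection = d3.geoMercator().fitSize([width, height], shops);

// Project each shop's longitude/latitude into screen coordinates.
var points = shops.features.map(function (f) {
  return projection(f.geometry.coordinates);
});

var hexbin = d3.hexbin().radius(12).extent([[0, 0], [width, height]]);
var bins = hexbin(points); // each bin is an array of points with an x/y centre

// Darker hexagon = more coffee shops.
var colour = d3.scaleSequential(d3.interpolateBlues)
    .domain([0, d3.max(bins, function (b) { return b.length; })]);

svg.append("g")
  .selectAll("path")
  .data(bins)
  .enter().append("path")
    .attr("d", hexbin.hexagon())
    .attr("transform", function (b) { return "translate(" + b.x + "," + b.y + ")"; })
    .attr("fill", function (b) { return colour(b.length); });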

This map is very nice to look at, but is it functional? I’m not so sure. But from now on I plan on being much more aware of the beauty that goes into maps and not just making sure I get the data on them.

Do you think this map is beautiful? Do you find maps beautiful at all? If so, we should talk.

Friday Fun: Does Edmonton have the highest quality of life in English-speaking countries?

Does Edmonton have the highest quality of life in English-speaking countries?

According to Numbeo.com, the answer is yes.

The site, which describes itself as "The world's largest database of user contributed data about cities and countries worldwide, especially living conditions: cost of living, housing indicators, health care systems, traffic, crime and pollution," uses crowd-sourced information about cities around the world. When the amount of collected data is sufficient, the site runs a statistical analysis of the information to create a quality of life index.

As of November 30, 2012, the top cities in the world for quality of life are:

  1. Berlin, Germany
  2. Zurich, Switzerland
  3. Edmonton, Canada
  4. Perth, Australia
  5. Calgary, Canada
  6. Trondheim, Norway
  7. Stockholm, Sweden
  8. Sydney, Australia
  9. Montreal, Canada
  10. Dubai, United Arab Emirates

The site surveys the following information:

  • Cost of living and purchasing power
  • Affordability of housing (apartments)
  • Pollution including air, water, etc.
  • Crime rates
  • Health system quality
  • Traffic (commute times)

That information is run through a formula:

index.main = 65 + purchasingPowerInclRentIndex - (housePriceToIncomeRatio * 2) - cpiIndex / 5 + safetyIndex * 3 / 4 + healthIndex / 2 - trafficTimeIndex / 2 - pollutionIndex;

And in the end the site comes up with a quality of life index.

Edmonton

Here is Edmonton’s information:

Purchasing Power Index 143.70
Safety Index 65.47
Health Care Index 76.98
Consumer Price Index 115.04
House Price to Income Ratio 3.02
Traffic Commute Time Index 35.33
Pollution Index 14.66
—————————————–
Quality of Life Index: 234.93
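
If you want to check the arithmetic, here is the formula from above as a small JavaScript function with Edmonton's numbers plugged in. This is my own sketch (with shortened variable names), not Numbeo's code:

// The Numbeo quality of life formula, as published above.
function qualityOfLife(c) {
  return 65 + c.purchasingPower
       - c.housePriceToIncome * 2
       - c.cpi / 5
       + c.safety * 3 / 4
       + c.health / 2
       - c.trafficTime / 2
       - c.pollution;
}

var edmonton = {
  purchasingPower: 143.70,
  safety: 65.47,
  health: 76.98,
  cpi: 115.04,
  housePriceToIncome: 3.02,
  trafficTime: 35.33,
  pollution: 14.66
};

console.log(qualityOfLife(edmonton).toFixed(2)); // 234.92, matching the published 234.93 to within rounding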

Each part of the formula is based on its own formula, built from the information provided by the site's users.

How does it work?

Take the purchasing power index, for example. The site sets a baseline city for costs (New York City). The cost of goods and services in any other city is then measured against that baseline to come up with a number. Numbers above 100 mean the good or service is more expensive than in New York City; numbers below 100 mean the opposite.

Here's the chart for Edmonton:

Consumer Price Index (Excl.Rent): 115.04
Rent Index: 45.96
Groceries Index: 118.27
Restaurants Index: 94.07
Consumer Price Plus Rent Index: 81.23
Local Purchasing Power: 143.70

It turns out that groceries are more expensive in Edmonton than in New York, but rent is significantly less expensive. See the full list of goods and services measured here.

The same sorts of calculations are then done for safety, health care, traffic commute time and pollution, and the results are run through the main formula as outlined above. It can all get very complicated, but if you want to check the math yourself, the site has put it all online.

What does it mean?

Well, it really depends. The reliability of crowd-sourcing is still up for debate. It works great for a lot of things (Wikipedia), but it is also subject to people with nefarious intent (Wikipedia).

So if you consider this sort of crowd-sourcing valid, and you think the site has collected enough data to be looking at trends and not outliers, then perhaps Edmonton really is the best place in the world to live if you speak English.

But regardless, be proud that Edmonton did better than Calgary.

Edmonton Calgary
Quality of life 234.93 172.41
Purchasing Power Index 143.70 102.42
Safety Index 65.47 68.33
Health Care Index 76.98 75.33
Consumer Price Index 115.04 117.11
House Price to Income Ratio 3.02 7.42
Traffic Commute Time Index 35.33 56.71
Pollution Index 14.66 17.30

So what do you think? Does Edmonton have the best quality of life in English-speaking countries?

Halloween 2012: Where is all the candy at?

UPDATED 11:10 a.m.: I have included the latest data. Responses are still coming in. The charts at the bottom are up to date, and so is the new bubble chart. The figures in the text are updated too.

Where do Edmonton's children go to trick-or-treat and what kind of candy do they get? This was the question that led to our Halloween map.

In discussions with Kerry Powell over how best to cover Halloween, we came up with a few ideas. One incredibly ambitious idea was to send reporters (interns) around with children as they trick-or-treated and then rate their performance on a series of metrics. Their c/m (candy per minute) score would show us who the best trick-or-treater was. The idea was to compare the tactics, costumes, neighbourhoods and whatever else we could to try to determine how to maximise the candy haul for next Halloween. Unfortunately for us, that was a bit too ambitious and we couldn't pull it together.

So Kerry came up with the idea for what we ended up doing. We decided to find a way to crowd-source Halloween information from our staff and readers and present the data on a map. After settling on what we wanted to measure, we decided to ask people how many trick-or-treaters came to their door, how much they spent on the treats they gave out and what those treats were.

The result is our trick-or-treat 2012 map.

So what did we find out?

Where was the busiest spot for trick-or-treaters? Grovenor

We received four responses from Grovenor, with an average of 119.75 trick-or-treaters per house. Because we had four responses from this area, I am more comfortable with the reliability of the information, and that's why I am putting it first. Other neighbourhoods of note: Mill Creek (2 responses, 119 average), Terwillegar Towne (5 responses, 117.80 average) and Westmount (4 responses, 96.75 average).

Reporter Elise Stolte interviewed one of the respondents from Westmount:

EDMONTON – When Lana Barr moved to Westmount neighbourhood one year ago, her neighbours warned her about the kids at Halloween.

There are oodles of them.

She counted 180 trick-or-treaters at her door, handing out mini-chocolate bars from several boxes of Nestle mixed treats.

Westmount is a neighbourhood just northwest of downtown, with many younger families moving in.

“From 6 to 8, it was pretty steady. It was all younger kids as well,” she said. “We were told to prepare for a lot.”

But Barr had lots of candy. She spent $120 on it. “It’s once a year. Not that big of a deal. It puts some joy on a kid’s face, hearing them say, ‘thank-you, thank-you.’ ”

How many trick-or-treaters visited? The average (arithmetic mean) of trick-or-treaters was 38.07 per neighbourhood. That does not take into account any neighbourhoods with no trick-or-treaters reported. See the table at the bottom of this post for full details.

How much money was spent? The average (arithmetic mean) of money spent on treats was $37.77. See the table at the bottom of this post for full details.

What was the most popular candy? The following bubble chart shows the most popular treats. The larger the bubble, the more often the treat was given out. Click to see it full size.


So there you have it. A very basic, unscientific look at Halloween in Edmonton. What do you think? What would you like to see next year? Anything else you’d want to know about trick-or-treating in the city?

ORIGINAL DATA

Candy data for bubble chart:

Treat Total responses
Aero 85
Baby Ruth 7
Butterfinger 17
Candy Corn 1
Chips 70
Coffee Crisp 72
Hershey’s Chocolate Bar 66
Hershey’s Hugs / Kisses 3
Jelly Beans 5
Kit Kat 92
Mike and Ike 9
Non-food Item 13
Other Candy 149
Other Chocolate 131
Other Salty Snack 21
Peanut Butter Cups 61
Rolo 3
Smarties 75
Snickers 26
Tootsie Roll 19
treats1 1
Twizzlers 20
Grand Total 946

Neighbourhood information:

Neighbourhood Responses Average (mean) trick-or-treaters Average (mean) money spent in $s
Alberta Avenue 6 21.50 29.83
All Pine Estates 1 34.00 35.00
Allendale 3 9.33 28.33
Anders 1 30.00 30.00
Argyll 1 27.00 80.00
Aspen gardens 1 20.00 80.00
Athalone 1 18.00 35.00
Balwin 1 18.00 60.00
Baturyn 1 20.00 25.00
Bearspaw 1 72.00 30.00
Belgravia 3 61.67 46.67
Bell Rive 1 45.00 50.00
Bellevue 2 4.50 47.50
Belmead 1 75.00 100.00
Belmont 1 30.00 30.00
Beverly 1 6.00 15.00
Blackmud Creek 2 35.00 34.00
Blue Quill 2 50.00 50.00
Bonnie Doon 2 37.50 55.00
Brander Gardens 1 42.00 30.00
Bridlewood 1 10.00 40.00
Bristol Oaks 1 40.00 70.00
Brookview 3 12.67 28.33
Calder 1 48.00 25.00
Callaghan 1 58.00 50.00
Canossa 2 77.50 57.50
Capilano 5 9.80 23.00
Carlton 1 15.00 50.00
Castlebrook 2 32.50 33.00
Castledowns 1 60.00 60.00
Casttledowns 1 40.00 15.00
Cavell Ridge 1 75.00 17.96
Charlesworth (Ellerslie Heights) 1 60.00 25.00
Copperwood 2 101.50 72.50
Craigavon 1 25.00 35.00
Crawford Plains 2 27.00 47.50
Crestwood 2 19.50 55.00
Cumberland 1 55.00 25.00
Daly Grove 2 35.50 47.50
Delton 2 9.00 32.50
Delwood 1 50.00 40.00
Dovercourt 6 23.83 18.00
Downtown 1 0.00 0.00
Duchene 1 24.00 20.00
Duggan 3 31.33 31.67
Dunvegan 3 57.67 33.33
Eagle Ridge 1 172.00 100.00
Elerslie Crossing 1 74.00 60.00
Ellerslie 1 0.00 20.00
Elmwood 1 20.00 30.00
Erin Ridge 1 80.00 80.00
Evergreen Community 1 65.00 40.00
Falconer Heights 1 35.00 6.00
Forest Heights 2 24.00 35.00
Fraser 2 10.50 16.50
Fulton Place 2 17.50 40.00
Gariepy 1 15.00 65.00
Glastonbury 3 65.00 58.33
Glengarry 1 10.00 7.99
Glenora 3 59.67 36.67
Glenwood 2 22.00 54.00
Grandin 3 18.67 19.33
Grandview Heights 1 90.00 45.00
Greenfield 2 30.00 17.50
Greenview 1 8.00 12.00
Greystone 1 80.00 40.00
Groat Estate 2 59.00 55.00
Grovenor 4 119.75 62.50
Haddow 3 40.67 38.33
Hampton Pointe 1 17.00 50.00
Hazeldean 2 59.00 35.00
Hermitage 1 4.00 25.00
Highlands 3 7.00 29.67
Hillview 1 11.00 50.00
Hodgson 1 22.00 30.00
Hollick Kenyon 1 12.00 20.00
Holyrood 3 19.67 21.97
Homestead 1 38.00 75.00
Hudson 1 35.00 30.00
Huntington Hill 1 11.00 20.00
Idylwylde 2 32.50 55.00
Inglewood 1 20.00 28.00
Jackson Heights 2 36.00 35.00
Jasper Place 1 25.00 12.00
Kameyosek 1 40.00 30.00
Kenilworth 2 20.00 22.50
Kensington 1 24.00 50.00
Kildare 1 40.00 30.00
Kilkenny 2 44.00 45.00
killarney 1 75.00 75.00
King Edward Park 2 30.00 40.00
Kingswood 1 26.00 50.00
Kiniski Gardens 1 12.00 17.00
La Perle 1 6.00 7.00
Lacombe Park 1 13.00 40.00
Lago Lindo 3 47.00 35.00
Lakeland Ridge 1 41.00 20.00
Lakeview 1 75.00 110.00
Landsdowne 1 59.00 24.00
Lansdowne 1 55.00 41.00
Larkspur 2 189.50 49.00
Laurel 1 30.00 25.00
Laurier 1 6.00 25.00
Laurier Heights 3 12.33 29.67
Lendrum 1 15.00 20.00
Lessard 1 30.00 #DIV/0!
Lewis Estates 2 36.50 18.00
Lymburn 2 19.50 23.00
Lynnwood 2 28.50 35.00
MacEwan 2 32.50 20.50
MacTaggart 1 24.00 60.00
Magrath 2 82.00 37.50
Mayliewan 1 30.00 75.00
McKernan 4 7.75 26.75
McLeod 2 42.00 47.87
Meadowlark Park 1 12.00 8.00
Mensa 1 75.00 39.00
Mill Creek 2 119.00 117.50
Mill Woods 2 35.50 20.00
Miller 1 67.00 50.00
Minchau 1 96.00 40.00
Montalet 1 150.00 70.00
North Glenora 2 25.50 40.00
North West Wetaskiwin 1 17.00 30.00
Nottingham 1 45.00 28.00
Old Strathcona 1 12.00 25.00
Oliver 2 0.50 6.00
Ormsby 5 44.00 37.00
Ottewell 5 12.40 36.00
Ottwel 1 16.00 75.00
Ozerna 3 53.00 36.67
Parkallen 2 27.50 19.00
Parkdale 1 50.00 60.00
Parkview 4 35.50 51.25
Patricia Heights 1 45.00 50.00
Pleasantview 1 1.00 10.00
Pollard Meadows 1 17.00 18.50
Primrose 1 29.00 30.00
Prince Charles 1 35.00 60.00
Queen Alexandra 2 10.50 19.00
Ramsay Heights 1 35.00 30.00
Rhatigan Ridge 1 12.00 10.00
Richfield 1 35.00 10.00
Rio Terrace 2 9.00 20.00
Riverbend 3 18.33 35.33
Riverdale 5 33.60 35.75
Riverstone Pointe 1 53.00 110.00
Rossdale 1 43.00 45.00
Royal Gardens 2 20.00 39.50
Rue monette 1 40.00 20.00
Rundle Heights 1 12.00 50.00
Rutherford 4 19.50 32.50
Sakaw 1 12.00 20.00
Schonsee 1 58.00 40.00
Secord 1 25.00 35.00
Sherbrooke 3 10.67 25.00
Sherwood Heights 2 30.00 20.00
Sifton Park 2 20.00 30.00
Silver Berry 1 115.00 25.00
Skunk Hollow 1 0.00 20.00
Skyview 2 40.00 75.00
Southwood 1 40.00 34.00
Spruce Avenue 1 20.00 14.00
Strathcona 2 23.50 17.00
Strathearm 1 5.00 20.00
Suder Greens 1 113.00 50.00
Summerside 7 42.14 35.43
Summerwood 2 71.50 63.50
Suntree 1 160.00 40.00
Sweet Grass 3 13.33 33.50
Terrace Heights 2 19.00 12.50
Terwillegar 1 117.00 75.00
Terwillegar South 2 41.00 45.00
Terwillegar Towne 5 117.80 41.00
The Hamptons 2 45.50 16.00
The Meadows 1 85.00 #DIV/0!
Uplands of McTaggart 1 19.00 #DIV/0!
Upper Windermere 1 2.00 30.00
Virginia Park 3 32.67 31.67
WAHA 1 50.00 100.00
Walker 1 20.00 14.00
West Jasper Sherwood 1 4.00 35.00
West Lethbridge 1 68.00 20.00
West Meadowlark 1 23.00 45.00
Westmount 4 96.75 60.00
Westridge 2 32.00 55.00
Westwood 1 12.00 40.00
Whitemud Creek 2 18.00 25.00
Wild Rose 1 39.00 25.00
Winderemere South 1 6.00 15.00
Windermere North 1 30.00 20.00
Woodbridge Farms 2 18.00 40.00
Woodcroft 1 7.00 25.00
Woodlands 1 25.00 30.00
Yellowbird 1 29.00 19.00
No neighbourhood given 33.17 30.81

How we did it: Highway 63 features map

The Journal launched a project on Alberta's Highway 63 today. Working on the stories were Marty Klinkenberg and Dan Barnes. I put together a couple of multimedia elements for the website as value-added content.

The first was a map of the features along the highway and the second was a database of all those who have died in accidents along the highway. Along with the database there were a couple of charts to help break down the numbers.

I'll outline the map below and provide a brief description of how it was made, why it was made that way, and links to the appropriate libraries so you can try it yourself if you want.

Highway 63 features map

The Highway 63 features map has information about all the points of interest along the highway, including current and planned construction, where the highway is divided, where vehicles can pass and where vehicles can stop for fuel. It was designed so that when a user puts his or her cursor over cities and segments of the highway, detailed information about that point or feature appears on the screen.

The map was made in JavaScript using the Raphaël JavaScript library. Raphaël's site says it

…uses the SVG W3C Recommendation and VML as a base for creating graphics. This means every graphical object you create is also a DOM object, so you can attach JavaScript event handlers or modify them later. Raphaël’s goal is to provide an adapter that will make drawing vector art compatible cross-browser and easy.

I chose SVG (Scalable Vector Graphics) over an OpenStreetMap or Google Maps background layer because this map looks at just one road and is not concerned with the area around the highway. I wanted the focus to be on the road, so I designed the map to be bare except for the road.

I was able to get a copy of an ESRI shape file from the Canadian federal government that has every(!) road in the country. It was a huge file.

Every single road in Canada. Really.

Using QGIS I was able to isolate Highway 63 and remove all the other roads. I created multiple layers: one for the whole highway, one for the twinned part, one for the part under construction and one for where the right-of-way tree clearing had been completed.

From there I was able to export the map paths to an SVG file. I opened the file in Adobe Illustrator and combined all the paths on the different layers. After saving the file, I opened it up in a text editor. SVG files are great insofar as they are basically tagged text files. Because of that, the co-ordinates of the paths in the file are easy to extract with a text editor.

The beginning of the path co-ordinates for the Highway 63 shape from the SVG file.

After pulling the co-ordinates out of the SVG, I used Raphaël to create paths based on them. I added the Raphaël canvas as a div in an HTML file and the map was born. From there it was just a matter of creating objects using the library and placing them on the canvas. That is how all the markers, cities and titles were added to the map.

The next step was to write a function that made certain paths interactive. The code changes the mouse cursor when it is over a path the handler is applied to. The onmouseover function also displays a div on the canvas; when the onmouseout event fires, the pointer is changed back and the div disappears. I set it up to work with all parts of the highway, along with the cities drawn on the map.
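
Here is a minimal sketch of that pattern. The container ID, path data and info text are hypothetical, not the production code:

// Draw one highway segment and wire up hover behaviour with Raphaël.
var paper = Raphael("map-holder", 600, 800);

// Path co-ordinates pulled out of the SVG file (made-up values here).
var segment = paper.path("M120,40L125,180L160,340").attr({
  stroke: "#c33",
  "stroke-width": 4
});

var infoBox = document.getElementById("segment-info");

// Show the info div and switch the cursor on mouseover; undo it on mouseout.
segment.hover(function () {
  this.node.style.cursor = "pointer";
  infoBox.innerHTML = "Twinned section of Highway 63";
  infoBox.style.display = "block";
}, function () {
  this.node.style.cursor = "default";
  infoBox.style.display = "none";
});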

Because there are no images, the map, including all the paths, elements and text, comes in at only 106k. The Raphaël library itself is 90k. Unfortunately, because of the server it has to be hosted on, the elements aren't gzipped for transfer, but it is still a pretty lean download. (Don't get me started on the rest of the downloads for our behemoth of a site.)

All in all I think this map worked well for what we were trying to do. It’s simple and that’s what makes it beautiful. Sometimes these projects get bogged down in complexity. How something is going to look, how it’s going to work, what features we can add. That’s not always the best experience for the reader. This map, while not perfect, does a pretty great service to the reader.

Why I don’t vote, why you shouldn’t either, and what would change my mind

I don’t vote. I haven’t in a long time.

I used to think that voting was a patriotic thing to do. It was a way of standing up and being counted. But at some point I started to become cynical. It occurred to me that choosing between the lesser of two evils isn't really much of a choice at all, and upon closer inspection, it seemed that in every election I was interested in, the will of the voters was ignored. Candidates were getting elected with the support of only a third of the electors who cast ballots.

I was shocked and saddened. How could something that’s designed to serve the public’s will fail so spectacularly at doing so?

I came to believe the system is broken and that by voting, I was giving it tacit approval. I was helping prop up a broken system and robbing my fellow citizens of a true democratic choice. I could no longer, in good conscience, vote.

The problem is that elections in Alberta do not serve the public’s will. Consider the 2008 election in the Edmonton Manning riding, where Progressive Conservative Peter Sandhu defeated independent incumbent Dan Backs.

2008 Election results

Sandhu was elected with 35.79 per cent of the vote, which means just under two-thirds of the Edmonton Manning voters who cast ballots got an MLA they didn't want. It gets even crazier if you look at actual turnout. Only 36.74 per cent of eligible voters turned out. That means only 13.15 per cent of eligible voters in Edmonton Manning voted for Sandhu, and yet he was the winner! The will of the majority was not served, but the will of the plurality was.

How did this happen?

The system used to determine a winner, plurality voting, also known as first-past-the-post, works well when there are only two choices. A yes-or-no vote, for instance, will have a clear majority provided the result is not a tie. However, when a third choice is included, a majority is no longer needed to win; 33.4 per cent of the vote could be enough. Add a fourth option and the bar is lowered to 25.1 per cent. In both cases the will of the majority is not served, and the result is a winner the majority does not support.

To think of it another way, here’s music reporter Robert Hilburn on the plurality system and how it’s used to decide the Grammys.

“Looking over previous Grammy contests, it’s easy to see where strong albums may have drawn enough votes from each other to let a compromise choice win. In 1985, two of the great albums of the decade – Bruce Springsteen’s “Born in the USA” and Prince’s “Purple Rain” – went head to head in the best album category, allowing Lionel Richie’s far less memorable “Can’t Slow Down” to get more votes.”

If this system can easily rob Prince or Springsteen of a Grammy, why are we using it to determine our MLAs?

I think voting could and should be an important part of the political process. And I would happily vote again, if the system were fixed. So how do we fix it?

I submit that there are three systems much better suited to our elections: Borda counts, Condorcet systems and approval voting.

Borda count

Jean-Charles de Borda was a French mathematician in the late 1700s. He saw the problem with the plurality system and proposed a viable alternative.

A Borda count takes into consideration each voter's entire range of preferences across the candidates. It is more of a consensus method, and it would prevent a situation in which a candidate with only 35 per cent support is elected. It is no more difficult than the ballot used now and would in fact give a more accurate result, since the winners would be tabulated by computer.

Each voter ranks all the candidates from most desirable to least desirable. So in an election with X candidates, each first-place vote on a ballot is worth X points, each second-place vote is worth X-1 points, each third-place vote is worth X-2 points and so on. The last-place vote on a ballot is worth one point. The points are added up, and the candidate with the most points at the end of the day is elected.

Let's look at a hypothetical example. Riding X has 18,000 voters:

  • 7,000 voters rank the PCs first, the Liberals second and the NDP third.
  • 6,000 voters rank the Liberals first, the NDP second and the PCs third.
  • 5,000 voters rank the NDP first, the Liberals second and the PCs third.

In our current system the PCs win the seat with about 39 per cent of the vote (7,000/18,000) even though 11,000 voters ranked them in last place!

In a Borda count system, the results are fairer.

  • The PCs get: (7,000×3) + (6,000×1) + (5,000×1) = 32,000 points.
  • The Liberals get: (7,000×2) + (6,000×3) + (5,000×2) = 42,000 points.
  • The NDP get: (7,000×1) + (6,000×2) + (5,000×3) = 34,000 points.

With the Borda system, the Liberals win the seat by a wide margin, and all 18,000 voters get an MLA who was either their first or second choice. This system gives the majority its due.
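
To make the counting concrete, here is a small JavaScript sketch of a Borda count over the hypothetical ballots above:

// Borda count: with 3 candidates, first place is worth 3 points,
// second place 2 points and last place 1 point.
var candidates = ["PC", "Liberal", "NDP"];
var ballots = [
  [7000, ["PC", "Liberal", "NDP"]],
  [6000, ["Liberal", "NDP", "PC"]],
  [5000, ["NDP", "Liberal", "PC"]]
];

var points = { PC: 0, Liberal: 0, NDP: 0 };
ballots.forEach(function (ballot) {
  var count = ballot[0], ranking = ballot[1];
  ranking.forEach(function (candidate, i) {
    points[candidate] += count * (candidates.length - i);
  });
});

console.log(points); // { PC: 32000, Liberal: 42000, NDP: 34000 }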

Condorcet system

Marie Jean Antoine Nicolas de Caritat, Marquis de Condorcet, was a French philosopher during the Enlightenment. He was not a fan of the plurality system either: "The apparent will of the plurality may in fact be the complete opposite of their true will."

Condorcet's system of voting gives meaningful results by comparing all the candidates in pairs. Think of it as a round-robin tournament for elections: each choice faces every other choice in head-to-head competition. The voters again rank the candidates in order of preference.

Using the same hypothetical example, we again suppose that riding X has 18,000 voters:

  • 7,000 voters rank the PCs first, the Liberals second and the NDP third.
  • 6,000 voters rank the Liberals first, the NDP second and the PCs third.
  • 5,000 voters rank the NDP first, the Liberals second and the PCs third.

Here are the results:

  • The PCs lose to the Liberals 7,000 to 11,000. (11,000 voters ranked the Liberals higher than the PCs)
  • The PCs lose to the NDP 7,000 to 11,000. (11,000 voters ranked the NDP higher than the PCs)
  • The NDP loses to the Liberals 5,000 to 13,000. (13,000 voters ranked the Liberals higher than the NDP)

This would elect the Liberals: they beat the PCs one-on-one, and they beat the NDP one-on-one. The major drawback to this method is that one party will not always win every matchup. For those situations, the Copeland rule comes into effect: it looks at each party's win-loss record and chooses the party with the best winning percentage.

In the unlikely event of a total tie (that is, when no party beats more parties than any other), a simple Borda count would be used.
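
And here is a sketch of the pairwise tallies, reusing the ballots array from the Borda sketch above:

// For each pair, count how many voters ranked candidate a above candidate b.
function pairwise(ballots, a, b) {
  var aOverB = 0, bOverA = 0;
  ballots.forEach(function (ballot) {
    var count = ballot[0], ranking = ballot[1];
    if (ranking.indexOf(a) < ranking.indexOf(b)) aOverB += count;
    else bOverA += count;
  });
  return [aOverB, bOverA];
}

console.log(pairwise(ballots, "Liberal", "PC"));  // [11000, 7000]
console.log(pairwise(ballots, "NDP", "PC"));      // [11000, 7000]
console.log(pairwise(ballots, "Liberal", "NDP")); // [13000, 5000]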

Approval voting

My personal preference is the approval voting system, which is based on positive choices and ensures the election does not end with a winner whom the majority opposes.

Each voter is given a ballot listing everyone running for election in their riding. Beside each name there are two boxes: yes and no. Voters mark whether or not they approve of each candidate. Every box with yes ticked is a vote for that candidate; every box with no ticked is not. The candidate who gets the most yes votes wins the election.
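
Counting approval ballots is about as simple as it gets. A quick sketch with made-up ballots:

// Approval voting: every yes is a vote; the most approvals wins.
function countApprovals(ballots) {
  var tally = {};
  ballots.forEach(function (ballot) {
    ballot.approved.forEach(function (candidate) {
      tally[candidate] = (tally[candidate] || 0) + 1;
    });
  });
  return tally;
}

console.log(countApprovals([
  { approved: ["Liberal", "NDP"] },
  { approved: ["PC"] },
  { approved: ["Liberal"] }
])); // { Liberal: 2, NDP: 1, PC: 1 }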

Approval voting is a simple system that gives every voter a say on each candidate. It would work to reduce negative campaigning, as candidates need voters to choose yes for them rather than convincing them not to vote for someone else. It would also prevent candidates from acting as spoilers and siphoning votes away from the candidates with a legitimate chance to win.

Conclusion

Voting should be fair. The results should accurately reflect what the electorate thinks. And our current system is the worst system for doing that. If we support the system, we're giving it our approval. I can't do that, and I don't think you should either.

There are no candidates who support these changes. That makes sense, since they can win now without them. When was the last time you heard a politician talk about a Borda system or a Condorcet winner?

The more people who refuse to vote, the less legitimate an election becomes. When election participation drops into the 30 per cent range, it will force the discussion about our system.

Until then, I am happy to not vote. It’s not like it will give the proper result anyway.

Political donations from 2004 to 2010, or, man, people love giving money to the Tories

Alison Redford on a $100 bill

Redford's party brought in the most money by far.

Today we’ve launched a database of all political donations made directly to parties in Alberta since 2004. We’ve also put together a list of the top 70 donors in the province.

This is a pretty awesome moment for me; I've been working on this database since shortly after I arrived at the Journal in 2010. The work would get done and then sit for a while; a new year's worth of data would appear, and I'd be back to getting it organized and ready to go. Major credit to Trish Audette for helping me get over the hill and finally get it published.

Trish has written two awesome stories about this, by the way. Check them out here.

Methodology

Creating the database was a complex and very time-intensive process.

We started by downloading the annual financial statements from each party from the Elections Alberta website. The reports show donations to the party (not the constituency associations) from January 1 to December 31 of each year. If there was an election campaign that year, donations made during the campaign period, from the writ drop until two months after polling day, were counted in a separate report.

In all, there were annual reports for 2004, 2005, 2006, 2007, 2008, 2009 and 2010, campaign reports for 2004 and 2008, and reports for the 2007 and 2009 by-elections.

Once the data was downloaded, the real work began. The data was trapped in PDF files formatted in such a way that it was not easy to extract. Using Xpdf, a free suite of tools for dealing with PDFs, the text from the statements was extracted one statement at a time. From there the data was imported into an Excel spreadsheet.

Once in Excel, each donor had to be matched up with their donation, along with the type of donation it was (personal, corporate or union). That was done for all 11 statements, and the data was combined into one large file.

From there the data was imported into Google Refine. Refine is free software for cleaning and standardizing data. It helped correct mistakes that crept in through inconsistent data entry and the poor formatting of the original source files. Once this process was complete for one party, it was restarted for the next.

When this process was complete for each party, the data was imported into a single database and run through Refine again.

The database was then imported into Caspio, a tool for putting searchable databases online.

Findings

The Tories, who have governed for the past four decades, hold a majority not only of seats in the legislature but also of political donations.

The Progressive Conservatives have received $14,984,504 in cash and valued donations since 2004. That’s 65 per cent of the total $23,075,705 donated to all parties in the province. In other words, for every dollar the Tories raised, the opposition parties combined raised only 54 cents.

The Conservatives' closest competitor was the Alberta Liberal Party, which brought in $2,765,406. The Tories brought in $5.42 for every dollar the Liberals did.

After the Conservatives and the Liberals came the Wildrose Alliance with $2,440,232.68, the Alberta NDP with $2,385,045.90, the Social Credit Party with $196,989.65, the Alberta Party with $157,891.57, the Alberta Separation Party with $143,313.67 and the Alberta Communist Party with $2,320.

The largest donors across all parties were the Encana Corporation, TransCanada Pipelines Ltd., Suncor Energy Inc., Enbridge Pipelines Inc. and Nexen Inc.

Encana, the most generous benefactor, donated $45,000 to the Liberal Party, $50,000 to the Wildrose Alliance and $106,710 to the Conservatives, for a total of $201,710 in political donations.

TransCanada gave $134,370, Suncor gave $124,768, Enbridge gave $115,600 and Nexen gave $103,900.

The largest amount given to any single party came from TransCanada. The pipeline builder has donated $115,470 to the Conservatives since 2004.

The analysis includes only donations made directly to the parties from 2004 to the end of 2010. It includes donations made during the 2004 and 2008 general elections and the 2009 by-election. It does not include money donated during leadership races, nor money donated to constituency associations.

Conclusion

See anything interesting in our database? Let me know; I'd love to hear what you find interesting about the donations.

Liberate your data from the PDF police

This post is one of a series of blog posts that I am doing as one of the first ONA class of MJ Bear Fellows. It is reposted from the ONA’s website.

Sometimes when dealing with data, you're given incredibly useless PDF files instead of databases and CSV files. When a government agency gives you data but refuses to provide a spreadsheet or .csv file, it may be hoping the data will be too difficult and time-consuming to use to tell a story, and that you'll just forget about it. However, there is a tool that can help you liberate that data.

You can use Pdftotext to convert PDF files into plain text files. This makes them much easier to manipulate and work with.
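
Basic usage looks like this (the file names are hypothetical). The -layout flag tells pdftotext to preserve the physical layout of the page, which helps keep table columns lined up:

pdftotext -layout statement.pdf statement.txt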


Get your privacy on with Tor

As part of the ONA’s MJ Bear fellowship, I am blogging over at journalists.org. This post is reprinted from there.

Tor is a free, open network designed to allow users to connect to the Internet anonymously, without fear of being tracked. Using onion routing (more on this below), Tor conceals the data you send and receive online, keeping your data private and your identity shielded.

The Tor project website says, “Journalists use Tor to communicate more safely with whistleblowers and dissidents.”

If used properly, Tor can be incorporated into your reporting to keep you and your sources safe. If you’re reporting from a country with Internet censorship laws, it can also help you connect to where you need to go on the Internet.

WHAT DOES IT DO?

Tor protects you from Internet stream traffic analysis. Essentially, it makes you anonymous by shielding the content of your data stream from those who would use it to identify you or learn things about you, including your location. Even if your data is encrypted, traffic analysis still can be used to figure out a lot about what you're sending and to whom.

How does traffic analysis work? According to the Tor project’s site:

“Internet data packets have two parts: A data payload and a header used for routing. The data payload is whatever is being sent, whether it’s an email message, a web page or an audio file. Even if you encrypt the data payload of your communications, traffic analysis still reveals a great deal about what you’re doing and, possibly, what you’re saying. That’s because it focuses on the header, which discloses source, destination, size, timing and so on.”

Any information that can identify you or your source should be guarded as closely as possible. Just dealing with encryption isn’t enough.

HOW CAN I USE IT?

Journalists can use Tor for many different purposes, including identity protection of sources and journalists and to aid in publishing.

How are reporters using Tor?

  • Reporters Without Borders tracks Internet prisoners of conscience and jailed or harmed journalists all over the world. They advise journalists, sources, bloggers and dissidents to use Tor to ensure their privacy and safety.
  • The U.S. International Broadcasting Bureau (Voice of America/Radio Free Europe/Radio Free Asia) supports Tor development to help Internet users in countries without safe access to free media. Tor preserves the ability of persons behind national firewalls or under the surveillance of repressive regimes to obtain a global perspective on controversial topics including democracy, economics and religion.
  • Citizen journalists in China use Tor to write about local events to encourage social change and political reform.
  • Citizens and journalists in Internet black holes use Tor to research state propaganda and opposing viewpoints, to file stories with non-state-controlled media and to avoid risking the personal consequences of intellectual curiosity.

Information provided by The Tor Project

When dealing with sensitive sources, anonymity can be crucial. By using Tor, journalists can ensure that all online discussions and file transfers with sources remain anonymous. Going through Tor also shields your IP address and your source's, so if a news outlet is compelled by the government to turn over communication logs, the source's online identity still will be protected.

Journalists who wish to remain anonymous also can use the service to keep safe while working in countries with less friendly press laws. Tor will shield what you are looking at and allow access to blocked sites online. For example, if the jurisdiction you are in prevents Facebook access, using Tor enables you to visit Facebook. This can be incredibly helpful if you are trying to publish news and your platform has been blocked by government web filters. Tor also is the perfect solution if you have access to your publishing platform, but want to hide where you are posting from.

A great example of why IP / location protection is important happened recently in Canada. Someone had created an anonymous Twitter account — Vikileaks30 — to leak embarrassing information about federal cabinet minister Vic Toews.

To find out who the tweeter was, the Ottawa Citizen created a web page and emailed the link to the owner of the Vikileaks30 account. The tweeter opened the link, and because he visited the page, his IP address was recorded by the Ottawa Citizen.

It turns out the IP address came from a computer in the House of Commons. After a brief internal investigation, the owner of the account was identified as an opposition party staffer. He was fired, and the Liberal party was forced to apologize. If he had been using Tor, his identity would have been shielded.

If a newspaper can do this, there’s no question governments can.

SO HOW DOES TOR WORK?

Tor uses a network of proxies to connect you and your destination. It takes your data and runs it through three different servers going to the computer you connect to. To keep this anonymous, none of the servers know the entire route between you and the computer you’re connecting to. No one watching will be able to tell where your data is going to or where it’s coming from. Tor describes this type of connection as “using a twisty, hard-to-follow route in order to throw off somebody who is tailing you — and then periodically erasing your footprints.”

Tor creates a private network pathway for you to use to connect to your desired location. It does this through relays. Each relay in the network knows only where it’s getting data from and where it’s sending data to. No relay will know the full path that your data has taken. The transfer between each relay is also encrypted with a different key.

Once you’ve established a connection, you can safely send data over the Tor network. Traffic analysis will not reveal any useful information because there is no direct link between the source and the destination. The circuit lasts for about 10 minutes. After that a new one will be created, and your traffic will be routed through it.

HOW DO I SET IT UP?

Setting up Tor is a fairly simple process. Navigate to the Tor project download page and download the Tor bundle. It runs on Windows, OS X or Linux, and can be kept on a USB thumb drive so that you can use it with any computer. The package includes a preconfigured browser (based on Firefox) that you can use to browse the web anonymously.

Tor is up and running.

You also can download the source code and compile it yourself if you don’t trust the package. You will need to configure your browser to use Tor properly if you choose this option.
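
If you go that route, the key piece is Tor's local SOCKS proxy. A minimal sketch, using Tor's defaults:

# In your torrc, enable the local SOCKS listener (9050 is the default port):
SocksPort 9050
# Then configure the browser to send its traffic through the SOCKS5 proxy at localhost:9050.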

If you choose to download the bundle, just run the executable inside. It will start the Tor launcher and open a Firefox browser. Once the launcher says, “Connected to the Tor network!” you are ready to go. Remember, for it to work, you need to be using the browser that the bundle provides.

ANYTHING ELSE I SHOULD KNOW?

Tor will only make the actual data transfer itself (TCP-stream) anonymous. What you do online and the applications you use could still be unsecured and revealing. For example, the software you use while browsing might identify you. Java, Active X, JavaScript and other extensions can reveal your location, operating system, plugins and a lot more. Your browser also could be improperly configured and still allow outbound connections to look up DNS names without using Tor. This would reveal your location.

You also need to exercise caution when being anonymous. Don’t put your name into online forms. Don’t reveal information about yourself online. Use the secure (HTTPS) versions of sites and not the standard ones.

CONCLUSION

You may not need to use it all the time, but when you do need anonymity, Tor is a solution worth considering. The laws regarding the Internet are changing, and freedoms are being threatened. Tor is another valuable tool — one that helps working journalists stay anonymous.

Edmonton’s happiest city councillor is…

The week is coming to an end, so here’s a fun, Friday afternoon data treat.

Using the face.com API, I analyzed the portraits of all 12 of Edmonton's city councillors and the mayor to determine who looked the happiest in their photo.

Karen Leibovici is Edmonton's happiest city councillor and has council's best smile.

Get happy Ward 5, because your councillor sure is. Karen Leibovici is Edmonton’s happiest city councillor, at least according to her photo.

So how did we come up with that?

The face.com API lets users perform face detection and recognition on photos sent to the service.

The detection is done using an algorithm that scans the photo you've uploaded and finds shapes that resemble the average human face. It checks for things like a nose, lips or eyes, and once it finds something, it tries to determine where the facial features are. It then returns a bunch of information about the face as a .json file.

The file has locations of facial features based on an XY coordinate system.

The code looks like this if you’re interested:


url: http://www.edmonton.ca/city_government/documents/TonyCaterina.jpg
width: 200
height: 250
tags: [
  {
    tid:
    recognizable: true
    threshold: null
    uids: [ ]
    gid: null
    label: ""
    confirmed: false
    manual: false
    tagger_id: null
    width: 22
    height: 17.6
    center: {
      x: 44
      y: 21.2
    }
    eye_left: {
      x: 38.1
      y: 17.85
    }
    eye_right: {
      x: 48.97
      y: 16.44
    }
    mouth_left: {
      x: 40.02
      y: 26
    }
    mouth_center: {
      x: 44.78
      y: 25.95
    }
    mouth_right: {
      x: 48.76
      y: 25.04
    }
    nose: {
      x: 44.58
      y: 21.74
    }
    ear_left: null
    ear_right: null
    chin: null
    yaw: 5.91
    roll: -9.18
    pitch: 4.81
    attributes: {
      glasses: {
        value: "true"
        confidence: 95
      }
      smiling: {
        value: "false"
        confidence: 57
      }
      face: {
        value: "true"
        confidence: 97
      }
      gender: {
        value: "male"
        confidence: 90
      }
      mood: {
        value: "happy"
        confidence: 78
      }
      lips: {
        value: "sealed"
        confidence: 100
      }
    }
  }
]

So for that photo of Tony Caterina, it has determined that his left eye is at point x=38.1 and y=17.85 and his right eye is at x=48.97 and y=16.44.

Along with that information, it also tries to determine attributes of the person in the photo. In this case it returns information on whether the person is wearing glasses, whether they are smiling, their gender, their mood and whether their lips are sealed.

So for Caterina, face.com is 95 per cent sure he’s wearing glasses, 57 per cent sure he’s not smiling, 90 per cent sure his features are male, 78 per cent sure he’s happy and 100 per cent sure his lips are sealed.
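
For what it's worth, here is one way that mood data could be boiled down to a single number per councillor. This is my own scoring assumption, not necessarily how the charts below were built:

// Score a face.com tag: positive confidence when the mood is "happy",
// negative otherwise.
function happinessScore(tag) {
  var mood = tag.attributes.mood;
  return mood.value === "happy" ? mood.confidence : -mood.confidence;
}

// For the Tony Caterina response above: mood "happy" at 78 gives a score of 78.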

Using that information, I’ve put together the following charts.

Happiest:

Edmonton's city council's moods.

Man, Karen is happy.

Or take a look at the patent-pending Mood-O-Meter:

Happy, happy, happy... (click to see full size)

Best (or biggest) smile:

Edmonton city council's smiles.

That's four non-smiles for those of you counting.

Or take a look at the patent-pending Smile-O-Chart:

Karen Leibovici's smile could fill an empty city council chamber. (Click to see full size)

Keep in mind this is not scientific at all. Dave Loken might be a really nice guy in real life. I’ve never met him. Happy Friday.

A dozen years of homicides mapped

As part of the MJ Bear fellowship with the ONA, I am blogging for them on journalists.org.

This post is reprinted from http://journalists.org/2012/01/10/a-dozen-years-of-homicides-mapped/.


Edmonton homicides, 1999 to 2011

It was a particularly rough year for homicides in Edmonton. In 2011, the “City of Champions” was Canada’s homicide capital with a record-breaking 47.

As the year was wrapping up, we decided that the standard year-end wrap-up wouldn’t suffice for such a dreadful year. The Journal decided to publish a larger year-end project on homicides, and I was to supply a way to visualize it online.

This map is what I ended up creating. It fit in as a piece of the larger project.

This post isn't a tutorial on how to make the map. It's more an outline of how I got from point A to point B, with the lessons I learned along the way. If you're looking for a tutorial on how to do this, let me know in the comments. If there's enough interest, I could show specifics in a future post.

We had been using Dipity to keep a running timeline of the homicides in 2011. The idea was simple enough. We wanted anyone who read a story about a homicide in Edmonton to see where it was, who the victim was, when it happened and to have a link to the full story.

This worked well for us, but Dipity’s limitations, as well as some service outages, left us looking for a better answer. We also had decided we wanted something larger.

With that in mind, we started looking for a better way to tell the story. After some discussion with one of the Journal’s (then) crime reporters, Brent Wittmeier, we decided to find all the data we could before trying to decide what story we wanted to tell. Using our newspaper’s archives and paper records that earlier crime reporters kept, we were able to gather information on all the homicides in the city since 1999.

Using Google Docs as our collaboration tool, we went to work filling in the details of every homicide. Brent handled the bulk of the work. When we took a look at the data set, it was pretty impressive: Edmonton had had 364 homicides since the start of 1999. We had information on the location, the method, the victim, his or her age and whether the case was solved. It was at this point that we decided how best to visualize the data.

We had talked about a timeline spanning 1999 to 2011. I made a simple mockup using ProPublica's TimelineSetter. The idea was that readers could choose from checkboxes what type of homicide they wanted to see, and the timeline would be populated.


An example of ProPublica’s TimelineSetter

TimelineSetter allows you to put whatever code you like in the boxes for each entry. We were able to enter all the information we had, plus a map link for each entry. Even better, the data was already in tabular format, so getting it into TimelineSetter was easy. Too easy, in fact, and that's a trap we were lucky to avoid.

With deadlines looming, we sometimes decide what to do based on how easy it is to get the job done rather than whether the plan makes the most sense for the situation. While TimelineSetter is awesome (we've used it here), it wasn't the best tool for the data we had.

Because we had location information for each homicide, and we had the method for each homicide, we decided that a map would be our best bet. This gave us multiple ways to visualize the data. We could show homicides by location, but also allow readers to pick the context. They could choose the year, a yearly comparison or the method used. We also were able to take the location data and map it against the city’s neighbourhoods so that we could find out what areas were disproportionately affected.

I got to work building the map. We don't have access to ArcGIS, and we're limited in tech infrastructure because of Postmedia's centralization and IT security policies, so we needed a solution that was free and could be hosted off site. I have had success with Google Fusion Tables before, so that became our platform. (Incidentally, Kathryn Hurley from Google did a great presentation on getting started with Fusion Tables at ONA11.)

I cleaned the data that Brent and I had put together and uploaded it to Fusion Tables. I wrote some simple JavaScript to add a legend to the map that let users choose which year to show on the map. For this, I had uploaded the data from each year into a separate table. I colour-coded the points with the legend based on the type of homicide. Then I sought other staffers' advice.


The homicide data in Fusion Tables

This step was very important. Get your basic idea built quickly and then start asking everyone who will listen for advice. If you're reading this, you're probably not representative of the average user of your site. Users are going to want to know different things than you do, and they will use the data differently.

It wasn’t very long before I had a bunch of changes to make to improve the map. I was very happy that I had just shown a very basic model of what I was trying to do. The suggested changes were good and forced me to review how I was going to put the data online.

The data was easy to see, but difficult to understand. Seeing 364 dots on a map will show you there were a lot of homicides, but have you ever tried to count 364 dots — some of them overlapping — on a map? I had failed to realize that without the proper context, the dots would be meaningless.

The other big suggestion was that perhaps readers would want to see just what type of homicides had happened by location. So I had to add a method to show just the shootings, or just stabbings. Obviously having each year in a separate table was not going to cut it — especially not if I had to do that for each method, too.

It was at this point that I dove into the Fusion Tables API.

Sparing you the technical details, I rewrote the queries that get the data from Fusion Tables and consolidated the data into a single table. This included adding some extra information to the table. (If you really want the technical nitty-gritty, email me and we can talk.) That brought the total number of tables down to just one, and it simplified the code I had to write.
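
The gist of the single-table approach looked something like this. (The Fusion Tables API has since been retired, and the table ID and column names here are hypothetical; "map" is an existing google.maps.Map.)

// One layer, one table; changing the view means changing one query.
var layer = new google.maps.FusionTablesLayer({
  query: {
    select: "Location",
    from: "HOMICIDE_TABLE_ID",
    where: "Year = 2003"
  }
});
layer.setMap(map);

// Filter by method instead of year.
function showMethod(method) {
  layer.setOptions({
    query: {
      select: "Location",
      from: "HOMICIDE_TABLE_ID",
      where: "Method = '" + method + "'"
    }
  });
}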

I now had a method for showing just the year, or just the type of homicide, but still no way to display the numbers. I decided that simple bar charts would be the way to go in this case. The plan was to have them appear on the map each time a modifier was selected. So if you picked just the 2003 homicides, a chart would appear showing you how many homicides by each method occurred in 2003. If you picked stabbings, a chart would show you how many stabbings had occurred in each year.

As my budget for this remained at $0, the best place to look was Google. The Google Charts API gave me exactly what I wanted. It was easy to use and easy to embed.
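
Embedding really is just building an image URL. A hedged example from the image-chart flavour of the API, with placeholder numbers and labels rather than the real counts:

https://chart.googleapis.com/chart?cht=bvs&chs=300x200&chd=t:10,25,47&chxt=x,y&chxl=0:|Shooting|Stabbing|Other

Here cht picks a vertical bar chart, chs sets the pixel size, chd carries the data series and chxl supplies the x-axis labels.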

Things were looking up, but the deadline was fast approaching, and I was far from done. It was at this point that I stopped looking for elegant solutions in my code and started looking for brute force code that just worked. This was probably my most valuable lesson: Build in more time!


The code. Well, just a few lines.

Instead of creating a nice compact function that, when passed a variable, would create the chart and set the map, I had to write multiple similar functions with different values. I had run out of time, the function wasn't working and I needed to get it done. Again, the lesson here is to build in more time for debugging and troubleshooting.

The final, and probably most difficult, part of the map was the heat map function. Since we do not have access to ArcGIS, I had to do the spatial join in QGIS. That was more difficult than anticipated, and it was complicated by some incorrect geocoding of points.

I had exported the points from the Fusion Tables document as a KML file. When I had geocoded them in Fusion Tables, it had appeared that all of the points were properly geocoded. So I was working under the assumption that they were all in the right place. WRONG.

Yet another lesson learned specific to mapping: Check to make sure things are showing up where they are supposed to.

Google is a bit particular as to how it handles addresses for geocoding, and Edmonton’s streets run on a confusing grid system. When confused about an address, Google put the point in the middle of the city. It did this with multiple points, but since they were all at the same latitude and longitude, it appeared as a single point.

It took me a while to figure out why one neighbourhood was reporting a huge number of homicides with so few points. I originally had thought it was a projection issue, and spent a good deal of time trying to determine if the KML and .shp files I was using had different projections.

Here’s another lesson: Don’t be afraid to ask for help. I am certain that if I had posted this on the NICAR-L some very smart person would have quickly shown me the error of my ways. Instead, I struggled for hours on something that took minutes to fix.

After making the address changes in the Fusion Tables document, and once again exporting the data as KML, I was able to get it into QGIS and run a spatial join with the neighbourhood shapes. I uploaded the shape file to Fusion Tables with Shape to Fusion and added the code to display it on the map.

From there, I just had to get the map online and then add the text for how to use the map. It’s here that my boss, Kerry Powell, constantly reminds me to be better. I suck at writing explanation text for how to use something. I know I do. I need to get better at it and I’m working on it. Kerry very graciously helps me out every single time I create something. Another lesson here: You could make the most awesome widget in the world, but if your readers don’t know how to use it, they won’t.

This also was where copy editing came into play. All the points on the map, when clicked, bring up a popup box with information about the homicide. All that text needed to be copy edited before we put it online. Sometimes the designers/coders/data journalists/whatever get so caught up in getting something online and working that tasks like copy editing get pushed aside. It is very important to remember this when estimating how long a project will take. Again, this is another lesson about building time into the schedule.

Looking back I think we made the right decision about how to visualize the data. Using these tools not only gave us a nice visualization, but also served to help the reporting.


Edmonton’s corridor of death

Specifically, the heat map confirmed what had long been thought about the downtown core of the city: Downtown Edmonton is “Deadmonton”. It also disproved the reputation that one area of the city had picked up: Millwoods is not “Killwoods”. The points also gave us an interesting look at homicides over time. We were able to see that along 107th Avenue and 86th Street, there exists a corridor of death. This information was included in one of the year-end wrap-up stories about where homicides happen.

In the end, this project turned out to be a lot more work than anticipated. If I had to do it over, I would have built in more time for revisions and coding. I also would have brought more people on board earlier to try and determine what we were trying to visualize. Working out an idea is great, but if the idea keeps changing, you’re never going to finish.