Defining Data

The Activity

At a lot of the events I attend, there is a tendency to talk about data as if it were a homogeneous, unchanging thing, like something you could pick up at a supermarket.

It’s pretty dangerous territory to be in - particularly for the civic tech and open data communities. If these movements are not careful to define where it is appropriate to open data and where it is not, a knee-jerk reaction threatens to shut down data release or data collection before we have seen the true potential that opening up data could hold.

This exercise targets this blanket usage by producing a “typology of data”.

The aim of the exercise is to make participants aware of the number of different things we could mean when we say ‘data’, and then to help them categorise the outputs. If all goes well, the participants should come out of the exercise with a kickass vocabulary for talking about data in the future.

The Outputs

Required materials

Post-it notes
Sharpies (or other bold pens)

Step 1: (Brain)Storm

Ask members of your group to write down any adjective they have heard applied to data. That could be anything from ‘big’ to ‘messy’. Give them 4-5 minutes to think about it on their own. One adjective per post-it.

Step 2: Splat

Have everyone put their post-it notes on a wall, so people can see the range of answers the group has come up with.

Step 3: Snake (Or ‘cluster’ - but that breaks the alliteration)

Have people work out whether there are any patterns or themes, then group the post-its by theme into a ‘snake’ (see image above). Give the snake a ‘title’ - i.e. what does it represent?

Step 4: Step Back

As a group - take a look at the first draft of the typology. What do you observe? Is there anything missing? Feel free to respond as appropriate - this may mean adding additional post-its or even columns.

Remind participants that any dataset will ultimately fit into several of the categories you have described above.

Where Next?

One of the participants also suggested we build a Family Fortunes- or Pointless-style game around the adjectives - an excellent idea to surface which terms people know best and least.
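If the post-its get typed up afterwards, a simple tally is one possible starting point for such a game. The sketch below is only an illustration: the function name `rank_adjectives` and the sample responses are invented for this example and do not come from any of the workshops.

```python
from collections import Counter


def rank_adjectives(responses):
    """Count how many participants wrote each adjective.

    `responses` is a list of lists: one inner list of adjectives per
    participant. Returns (adjective, count) pairs, most common first.
    """
    counts = Counter()
    for adjectives in responses:
        # Normalise case and whitespace, and count each adjective at most
        # once per participant.
        counts.update({a.strip().lower() for a in adjectives})
    # Sort by descending count, then alphabetically so ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented sample data, for illustration only.
sample_responses = [
    ["big", "open", "messy"],
    ["Open", "personal", "big"],
    ["open", "raw"],
]

ranking = rank_adjectives(sample_responses)
best_known = ranking[0]    # the kind of answer that scores well in Family Fortunes
least_known = ranking[-1]  # the kind of answer that scores well in Pointless
print(best_known, least_known)
```

The top of the ranking gives the ‘Family Fortunes’ answers and the bottom gives the ‘Pointless’ ones. Ties at the bottom are common, so in practice you would probably want to look at every adjective with a count of one rather than just the last entry.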

Thanks to

Thank you to the following events, where I got to refine this exercise:

Ethics of Data in Civil Society in Stanford
The kick-off event for the Civicus DataShift
