5 Stars of Personal Data Access

July 5, 2013 | Uncategorized | midata, standards | Reuben

As a volunteer ‘data donor’ at the Midata Innovation Lab, I’ve recently been attempting to get my data back from a range of suppliers. As our lives become more data-driven, an increasing number of people want access to a copy of the data gathered about them by service providers, personal devices and online platforms. Whether it’s financial transactions data, activity records from a Fitbit or Nike Fuelband, or gas and electricity usage, access to our own data has the potential to drive new services that help us manage our lives and gain self-insight. But anyone who has attempted to get their own data back from service providers will know the process is not always simple. I encountered a variety of complicated access procedures, data formats, and degrees of detail.

For instance, BT gave me access to my latest bill as a CSV file, but previous months were only available as PDF documents. And my broadband usage was displayed as a web page in a separate part of the site. Wouldn’t it be useful to have everything – broadband usage, landline, and billing – in one file, covering, say, the last year of service? Or, even better, a secure API which would allow trusted applications to access the latest data directly from my BT account, so I don’t have to?

Another problem was that in order to get my data, I sometimes had to sign up for unwanted services. My mobile network provider, GiffGaff, requires me to opt in to its marketing messages in order to receive my monthly usage report. Fitbit users need to pay for a premium account to get access to the raw data from their own devices.

Wouldn’t it be nice to rate these services according to a set of best practices? In 2010, when the open data movement was still in its infancy, Tim Berners-Lee added ‘Five Stars of Open Data’ to his notes on Linked Data, to describe how ‘open’ a data source is. If it’s on the web under an open license, it gets one star. Five stars means that it is in a machine-readable, non-proprietary format, and uses URIs and links to other data for context. While we don’t necessarily want our private, personal data to be ‘open’ in Berners-Lee’s sense, we do want standard ways to access our personal data from a service. So, here are my suggested ‘Five Stars of Personal Data Access’ (to be read as complementary, not necessarily hierarchical):

1. My data is made available to me for free in a digital form. For instance, through a web dashboard, or email, rather than as a paper statement. There are no strings attached; I do not need to pay for premium services or sign up to marketing alerts to read it.

2. My data is machine-readable (such as CSV rather than PDF).

3. My data is in a non-proprietary format (such as CSV, XML or JSON, rather than Excel).

4. My data is complete; all the relevant fields are included in the same place. For instance, usage history and billing are included in the same file or feed.

5. My data is up-to-date; available as a regularly-updated feed, rather than a static file I have to look up and download. This could be via a secure API that I can connect trusted third-party services to.
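To make the scheme concrete, here is a minimal Python sketch of how one might score a provider against the five stars. The `PersonalDataAccess` class, its field names and the example provider are hypothetical illustrations, not part of any Midata specification. Each star is treated as an independent yes/no check, in keeping with the note above that the stars are complementary rather than hierarchical.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass
class PersonalDataAccess:
    """Hypothetical checklist for one provider; each field is one star."""
    free_and_digital: bool   # 1. free, digital, no strings attached
    machine_readable: bool   # 2. e.g. CSV rather than PDF
    non_proprietary: bool    # 3. e.g. CSV, XML or JSON rather than Excel
    complete: bool           # 4. all relevant fields in one file or feed
    up_to_date: bool         # 5. regularly updated feed or secure API


def star_rating(access: PersonalDataAccess) -> Tuple[int, List[str]]:
    """Count the stars earned and list the criteria that are not met.

    Stars are independent checks, not a ladder: a provider can earn
    star 5 without earning star 4.
    """
    missing = [f.name for f in fields(access) if not getattr(access, f.name)]
    return len(fields(access)) - len(missing), missing


# A hypothetical provider that offers a free CSV download,
# but splits usage and billing and offers no live feed.
provider = PersonalDataAccess(
    free_and_digital=True, machine_readable=True, non_proprietary=True,
    complete=False, up_to_date=False)
stars, missing = star_rating(provider)
print(stars, missing)  # 3 ['complete', 'up_to_date']
```

Listing the unmet criteria alongside the count keeps the rating useful as a to-do list for the provider, rather than just a score.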

The Midata programme has considered these issues from the outset, calling for suppliers to adopt common procedures and formats. Simplifying this process is an important step towards a world where individuals are empowered by their own data. My initial attempts to get my data back from suppliers point to a number of areas for improvement, which I’ve tried to reflect in these star ratings. Of course, there’s lots of room for debate over the definitions I’ve given here. And I’m sure there are other important aspects I’ve missed out. What would you add?

OrgCon2013 and Open Data for Privacy presentation

June 17, 2013 | Uncategorized | Reuben

I recently attended OrgCon2013, the Open Rights Group’s annual conference. As in previous years, this was an excellent opportunity to catch up on the latest developments in a range of UK and international digital rights issues. It was perfectly timed to coincide with the news about the NSA surveillance leak, a story which found its way into virtually every talk I attended throughout the day, including my own short presentation on ‘Open Data for Privacy’. I’d particularly recommend watching Caspar Bowden’s excellent talk on wiretapping the cloud – very timely given the aforementioned NSA story.

I’ve posted my slides, and a related network graph visualisation, here. They’re hosted on MyDataTransparency.org, a new website I’ve set up for some of the outputs from my research into open data and privacy. Suggestions and collaborations welcome.

And thanks to ORG for putting on another great event and having me talk!

Nudge Yourself

May 24, 2013 | Uncategorized | autonomy, behavioural, choice architecture, nudge, paternalism, psychology | Reuben

It’s just over five years since the publication of Nudge, the seminal pop behavioural economics book by Richard Thaler and Cass Sunstein. Drawing on research in psychology and behavioural economics, it catalogued the many common cognitive biases, fallacies, and heuristics we all suffer from. We often fail to act in our own self-interest, because our everyday decisions are affected by ‘choice architectures’: the particular way a set of options is presented. ‘Choice architects’ (as the authors call them) cannot help but influence the decisions people make.

Thaler and Sunstein encourage policy-makers to adopt a ‘libertarian paternalist’ approach: acknowledging that the systems they design and regulate inevitably affect people’s decisions, and designing them so as to induce people to make decisions which are good for them. Their recommendations were enthusiastically picked up by governments (in the UK, the Cabinet Office even set up a dedicated Behavioural Insights Team). The dust has now settled on the debate, and the approach has been explored in a variety of settings, from pension plans to hygiene in public toilets.

But libertarian paternalism has been criticised as an oxymoron; how is interference with an individual’s decisions, even when in their genuine best interests, compatible with respecting their autonomy? The authors responded that non-interference was not an option. In many cases, there is no neutral choice architecture. A list of pension plans must be presented in some order, and if you know that people tend to pick the first one regardless of its features, you ought to make it the one that seems best for them.

Whilst I’m sympathetic to Thaler and Sunstein’s response to the oxymoron charge, the ethical debate shouldn’t end there. Perhaps the question of autonomy and paternalism can be tackled head-on by asking how individuals might design their own choice architectures. If I know that I am liable to make poor decisions in certain contexts, I want to be able to nudge myself to correct that. I don’t want to rely solely on a benevolent system designer or policy-maker to do it for me. I want systems to ensure that my everyday, unconsidered behaviours, made in the heat of the moment, are consistent with my life goals, which I define in more carefully considered, reflective states of mind.

In our digital lives, choice architectures are everywhere, highly optimised and A/B tested, designed to make you click exactly the way the platform wants you to. But there is also the possibility that they can be reconfigured by the individual to suit their will. An individual can tailor their web experience by configuring their browser to exclude unwanted aspects and superimpose additional functions onto the sites they visit.

This general capacity – for content, functionality and presentation to be altered by the individual – is a prerequisite for refashioning choice architectures in our own favour. Services like RescueTime, which blocks certain websites for certain periods, represent a very basic kind of user-defined choice architecture which simply removes certain choices altogether. But more sophisticated systems would take an individual’s own carefully considered life goals – say, to eat healthily, be prudent, or get a broader perspective on the world – and construct their digital experiences to nudge behaviour which furthers those goals.

Take, for instance, online privacy. Research by behavioural economist Alessandro Acquisti and colleagues at CMU has shown how effective privacy nudges can be. The potential for user-defined privacy nudges is strong. In a reflective, rational state, I may set myself a goal to keep my personal life separate from my professional life. An intelligent privacy management system could take that goal and insert nudges into the choice architectures which might otherwise induce me to slip up. For instance, it could alert me when I’m about to accept a work colleague as a friend on a personal social network.
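A user-defined nudge of this kind could be as simple as a rule that checks an action against a goal set in advance and returns a reminder, rather than blocking the action outright (which is what distinguishes it from the RescueTime-style removal of choices). The sketch below is hypothetical: `FriendRequest`, `privacy_nudge` and the contact list are illustrative names, not part of any real social network’s API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class FriendRequest:
    network: str  # hypothetical labels: "personal" or "professional"
    sender: str


# Hypothetical goal, set in a reflective moment:
# keep work contacts off personal networks.
WORK_CONTACTS: FrozenSet[str] = frozenset({"colleague@example.com"})


def privacy_nudge(request: FriendRequest,
                  work_contacts: FrozenSet[str] = WORK_CONTACTS) -> Optional[str]:
    """Return a reminder if accepting would conflict with the user's goal.

    The choice stays with the user: the function only returns a message
    to show, and returns None when no nudge is needed.
    """
    if request.network == "personal" and request.sender in work_contacts:
        return (f"{request.sender} is a work contact. You chose to keep your "
                "personal network separate from work. Accept anyway?")
    return None


print(privacy_nudge(FriendRequest("personal", "colleague@example.com")))
print(privacy_nudge(FriendRequest("personal", "old.friend@example.com")))  # None
```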

Next generation nudge systems should enable a user-defined choice architecture layer, which can be superimposed over the existing choice architectures. This would allow individuals to A/B test their decision-making and habits, and optimise them for their own ends. Ignoring the power of nudges is no longer a realistic or desirable option. We need intentionally designed choice architectures to help us navigate the complex world we live in. But the aims embedded in these architectures need to be driven by our own values, priorities and life goals.

Transparent Privacy Protection: Let’s open up the regulators

October 5, 2012 | Uncategorized | data protection, transparency | Reuben

Should Government agencies tasked with protecting our privacy make their investigations more transparent and open?

I spotted this story on (eminent IT law professor) Michael Geist’s blog, discussing a recent study by the Canadian Privacy Commissioner Jennifer Stoddart into how well popular e-commerce and media websites in Canada protect their users’ personal information and seek informed consent. This is important work: the kind of proactive investigation into privacy practices that sets a good example to other authorities tasked with protecting citizens’ personal data.

However, while the results of the study have been published, the Commissioner declined to name the websites it investigated. Geist rightly points out that this secrecy denies individuals the opportunity to reassess their use of the offending websites. Amid calls from the Commissioner for greater transparency in data protection generally – such as better security breach notification – this decision goes against the trend, and seems, to me, a missed opportunity.

This isn’t just about naming and shaming the bad guys. It is as much about encouraging good practice where it appears. But this evaluation should take place in the open. Privacy and Data Protection commissioners should leverage the power of public pressure to improve company privacy practices, rather than relying solely on their own enforcement powers.

Identifying the subjects of such investigations is not a radical suggestion. It has already happened in a number of high-profile investigations undertaken by the Canadian Privacy Commissioner (into Google and Facebook), as well as by its counterparts in other countries. The Irish Data Protection Commissioner has made the results of its investigation into Facebook openly available. The UK Information Commissioner’s Office regularly identifies the targets of its investigations. While the privacy of individual data controllers should be respected, the privacy of individual data subjects should come before the ‘privacy’ of organisations and businesses.

As I wrote in my last blog post, openness and transparency from those government agencies tasked with enforcing data protection has the potential to alleviate modern privacy concerns. The data and knowledge they hold should be considered basic public infrastructure for sound privacy decisions. Opening up data protection registers could help reveal who is doing what with our personal data. Investigations undertaken by the authorities into websites’ privacy practices are another important source of information to empower individual users. The more information we have about who is collecting our data and how well they are protecting it, the better we can assess their trustworthiness.

Reflections on an Open Internet of Things

June 21, 2012 | Uncategorized | internet of things | Reuben

Last weekend I attended the Open Internet of Things Assembly here in London. You can read more comprehensive accounts of the weekend here. The purpose was to collaboratively draft a set of recommendations/standards/criteria to establish what it takes to be ‘open’ in the emerging ‘Internet of Things’. This vague term describes an emerging reality where our bodies, homes, cities and environment bristle with devices and sensors interacting with each other over the internet.

A huge amount of data is currently collected through traditional internet use – searches, clicks, purchases. The proliferation of internet-connected objects envisaged by Internet-of-Things enthusiasts would make the current ‘data deluge’ seem insignificant by comparison.

At this stage, asking what an Internet of Things is for would be a bit like travelling back to 1990 to ask Tim Berners-Lee what the World Wide Web was ‘for’. It’s just not clear yet. Like the web, it probably has some great uses, and some not so great ones. And, like the web, much of its positive potential probably depends on it being ‘open’. This means that anyone can participate, both at the level of infrastructure – connecting ‘things’ to the internet, and at the level of data – utilising the flows of data that emerge from that infrastructure.

The final document we came up with which attempts to define what it takes to be ‘open’ in the internet of things is available here. A number of salient points arose for me over the course of the weekend.

When it comes to questions of rights, privacy and control, we can all agree that there is an important distinction to be made between personal and non-personal data. What also emerged over the weekend for me were the shades of grey between this apparently clear-cut distinction. Saturday morning’s discussions were divided into four categories – the body, the home, the city, and the environment – which I think are spread relatively evenly across the spectrum between personal and non-personal.

Some language emerged to describe these differences – notably, the idea of a ‘data subject’ as someone who the data is ‘about’. Whilst helpful, this term also points to further complexities. Data about one person at one time can later be mined or combined with other data sets to yield data about somebody else. I used to work at a start-up which analysed an individual’s phone call data to reveal insights into their productivity. We quickly realised that when it comes to interpersonal connections, data about you is inextricably linked to data about other people – and this gets worse the more data you have. This renders any straightforward analysis of personal vs. non-personal data inadequate.

During a session on privacy and control, we considered whether the right to individual anonymity in public data sets is technologically realistic. Cambridge computer scientist Ross Anderson‘s work concludes that absolute anonymity is impossible – datasets can always be mined and ‘triangulated’ with others to reveal individual identities. It is only possible to increase or decrease the costs of de-anonymisation. Perhaps the best that can be said is that it is incumbent on those who publish data to make efforts to limit personal identification.
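As a toy illustration of ‘triangulation’, the Python sketch below links a hypothetical ‘anonymised’ release to a hypothetical public list on shared quasi-identifiers (postcode district, birth year, sex). It also computes k-anonymity, a standard measure of the smallest group of records sharing the same quasi-identifier values; that measure is my addition for illustration, not something drawn from Anderson’s work. All names and records are invented.

```python
from collections import Counter

# Hypothetical "anonymised" release: names removed, sensitive field kept.
released = [
    {"postcode": "AA1", "birth_year": 1985, "sex": "F", "condition": "asthma"},
    {"postcode": "AA1", "birth_year": 1985, "sex": "M", "condition": "diabetes"},
    {"postcode": "BB2", "birth_year": 1990, "sex": "F", "condition": "none"},
    {"postcode": "BB2", "birth_year": 1990, "sex": "F", "condition": "migraine"},
]
# Hypothetical second dataset that does include names.
public = [
    {"name": "Alice", "postcode": "AA1", "birth_year": 1985, "sex": "F"},
    {"name": "Bob", "postcode": "AA1", "birth_year": 1985, "sex": "M"},
    {"name": "Carol", "postcode": "BB2", "birth_year": 1990, "sex": "F"},
    {"name": "Dana", "postcode": "BB2", "birth_year": 1990, "sex": "F"},
]

QUASI_IDS = ("postcode", "birth_year", "sex")


def key(row, attrs=QUASI_IDS):
    """The tuple of quasi-identifier values for one record."""
    return tuple(row[a] for a in attrs)


def reidentify(released, public, attrs=QUASI_IDS):
    """Link released records whose quasi-identifiers match exactly one named person."""
    people = {}
    for p in public:
        people.setdefault(key(p, attrs), []).append(p["name"])
    matches = {}
    for r in released:
        names = people.get(key(r, attrs), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches[names[0]] = r["condition"]
    return matches


def k_anonymity(rows, attrs=QUASI_IDS):
    """Size of the smallest group of rows sharing the same quasi-identifiers."""
    return min(Counter(key(r, attrs) for r in rows).values())


print(reidentify(released, public))  # {'Alice': 'asthma', 'Bob': 'diabetes'}
print(k_anonymity(released))         # 1
# Coarsening: publish only postcode and birth year.
coarse = ("postcode", "birth_year")
print(reidentify(released, public, coarse))  # {}
print(k_anonymity(released, coarse))         # 2
```

Coarsening the quasi-identifiers stops these exact matches, but it only raises the cost of de-anonymisation: a third dataset that records sex could split the groups again.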

Unlike its current geographically-untethered incarnation, the internet of things will be bound to the physical spaces in which its ‘things’ are embedded. This means we need to reconsider the meaning of and distinction between public and private space. Adam Greenfield spoke of the need for a ‘jurisprudence of open public objects’. Who has stewardship over ‘things’ embedded in public spaces? Do owners of private property have exclusive jurisdiction over the operation of the ‘things’ embedded on it, or do the owners of the thing have some say? And do the ‘data subjects’, who may be distinct from the first two parties, have a say? Mark Lizar pointed out that under existing U.S. law, you can mount a CCTV camera on your roof, pointed at your neighbour’s back garden (but any footage you capture is not admissible in court). Situations like this are pretty rare right now but will be part and parcel of the internet of things.

I came away thinking that the internet of things will be both wonderful and terrible, but I’m hopeful that the good people involved in this event can tip the balance towards the former and away from the latter.

Libertarian Floating Islands

June 14, 2012 | Uncategorized | economics, politics | Reuben

The philosopher Immanuel Kant once said that if the world were an infinite plane, then all the problems of political philosophy would be solved. If one citizen disagreed with the way his society were run, he could pack up and start a new one over there. In reality, we’re stuck with this spherical earth, and if you keep re-locating over there, eventually you’ll end up back here, to face whatever it is you were trying to get away from in the first place. So it looks like we’re stuck with each other, and the challenge of modern society is to find a compromise.

One thing Kant probably didn’t imagine is that in the 21st century, we would spot an opportunity for a new over there. Last week was the third annual conference on Seasteading. The Seasteading movement aims to create small floating cities in international waters. They envision experimental societies, intentionally-formed communities free from the regulation of national governments and the influence of social mores.

In reality, the most serious interest in seasteading has come from rich venture capitalists. Peter Thiel, the billionaire co-founder of PayPal and a noted libertarian, donated $500,000 to the Seasteading Institute in 2008. As a Silicon Valley venture capitalist, Thiel knows first-hand the downsides of government regulation. Thiel has seed-funded a seastead off the shore of California, which will provide day trips to the mainland and promises to get around the restrictive work visa system, allowing the unrestricted flow of international capital and labour.

But for some, seasteading is more than just a legal hack. It’s an opportunity to apply the scientific method to society, with each seastead an experiment to test an economic or political idea. Do financial transaction taxes really chill innovation? What are the consequences of zero welfare provision? How about if we legalise all drugs? Policies which would be impossible in a large democracy with a divided citizenry become possible in smaller communities of like-minded individuals.

So it is no surprise that seasteading is popular amongst libertarians like Thiel. And libertarian seasteads may indeed prove highly successful. But to see them as experiments in the ‘science’ of politics is a rather dangerous mistake. Such ‘experiments’ have flawed validity: the citizenry of libertarian seasteads would end up a selective group blessed with talents and riches, who spend at least as much of their resources keeping the wrong people out as letting the right people in.

Thiel criticises the US government immigration policy, as it prevents skilled foreign programmers from working in Silicon Valley. But the libertarian view of immigration has an ironic nuance. On the one hand, they often advocate open borders, arguing – admirably, if unrealistically – that no government should interfere with an individual’s freedom to roam the world as he wishes. On the other hand, in a libertarian society, where private property is absolute and everything is privatised, undesirable immigrants would have the same rights as trespassers, i.e. none. Some Seasteaders, fearful of climate change, have even begun building self-sustainable floating islands, impenetrable to climate refugees. Those foreign programmers on Silicon Island may be welcomed, but only at their host’s discretion. The poor, the destitute, the dispossessed, and the sick need not apply. The taxpayers on the mainland who funded the Seasteaders’ education can also forget about getting anything back.

Libertarian seasteads will be the preserve of the rich, and cut free from the draining demands of the rest of society they may well thrive. But this would hardly be a lesson for the rest of us. Those of us who know that the earth is not an infinite plane also know that the challenge of building a good society means caring for all. The success of selective libertarian islands would constitute the failure of humanity to work together for an equitable future in a prosperous world.

Online Censorship – Overview of Research

May 9, 2012 | Uncategorized | Reuben

I heard last week that UK internet service providers are going to begin censoring file-sharing link aggregator The Pirate Bay. I don’t use TPB, but I went straight to the site to see if it was still accessible (many others evidently did the same, causing an unprecedented traffic spike). As it happens, my broadband provider (BT) haven’t yet decided whether to join in the censorship. So I probably have a little while left to note down the IP address or install appropriate circumvention tools (such as this browser plugin), if I ever want to access TPB in future.

Last year I put together a review and map of some of the academic literature in this area, addressing the question of how effectively governments can censor the web. It is by no means comprehensive (leaving out some important commentators in the area such as Rebecca MacKinnon), but I’ve tried to include a representative sample of the various disciplines I think are needed to answer this question. It doesn’t just boil down to a technical question about tools for censorship and circumvention – we have as much to learn from sociological, legal, political and economic research. I break the issue down into three factors:

• Technical tools and infrastructure – what do governments have at their disposal?
• Circumvention – how successful and widespread are citizens’ attempts?
• Limitations on government power – both constitutional and influence over private industry

I’ve represented the research relevant to each factor in the map below:

You can read the rest of the report here (PDF)

I’m also interested in trying out OONI-probe, a new tool that anyone can deploy to detect censorship on the ground. In addition to the various annual reports (from the Open Net Initiative, HerdictWeb, and others) this should prove an invaluable tool for tracking online censorship in future.

ORGcon2012

March 25, 2012 | Uncategorized | digital rights, ORG | Reuben

I’ve been a fan of the Open Rights Group – the UK’s foremost digital rights organisation – for a few years now, but yesterday was my first time attending ORGcon, their annual gathering. The turnout was impressive; upon arrival I was pleasantly surprised to see a huge queue stretching out of the University of Westminster and down Regent Street.

The day kicked off with a rousing keynote from Cory Doctorow on ‘The Coming War On General-Purpose Computing’ (a version of the talk he gave at the 28th Chaos Communication Congress, [video]). In his typical sardonic style, Doctorow argued that in an age when computers are everywhere – in household objects, medical implants, cars – we must defend our right to break into them and examine exactly what they are doing. Manufacturers don’t want their gadgets to be general-purpose computers, because this enables users to do things that scare them. They will disable computers that could be programmed to do anything, lock them down and turn them into appliances which operate outside of our control and obscured from our oversight.

Doctorow mocked the naive attempts of the copyright industries to achieve this using digital locks – but warned of the coming legal and technological measures which are likely to be campaigned for by industries with much greater lobbying power. In the post-talk Q&A session, an audience member linked the topic to the teaching of IT in schools: the need for children to understand from an early age how to look inside gadgets, how they work, and that they may be operating against the user’s best interests.

As is always the way with parallel sessions, throughout the day I found myself wanting to be in multiple places at once. I opted to hear Wendy Seltzer give a nice summary of the current state of digital rights activism. She likened the grassroots response to SOPA and PIPA to an immune system fighting a virus. She warned that, like an overactive immune system, we run the risk of attacking the innocuous. If we cry wolf too often, legislators may cease to listen. She went on to imply that the current anti-ACTA movement is guilty of this. Personally, I think that as long as such protest is well informed, it cannot do any harm and hopefully will do some good. Legislators are only just beginning to recognise how serious these issues are to the ‘net generation’, and the more we can do to make that clear, the better.

The next hour was spent in a crowded and stuffy room, watching my Southampton colleague Tim Davies grill Chris Taggart (OpenCorporates), Rufus Pollock (OKFN), and Heather Brooke (journalist and author) about ‘Raw, Big, Linked, Open: is all this data doing us any good?’ The discussion was interesting, and it was good to see this topic, which until recently has been confined to a relatively niche community, brought to an ORG audience.

After discussing university campus-based ORG actions over lunch, I went along to a discussion of the future of copyright reform in the UK in the wake of the Hargreaves report. Peter Bradwell went through ORG’s submission to the government’s consultation on the Hargreaves measures. Saskia Walzel from Consumer Focus gave a comprehensive talk and had some interesting things to say about the role of consumers and artists themselves in copyright reform. Emily Goodhand (more commonly known as @copyrightgirl on Twitter) spoke about the University of Reading’s submission, and her perspective as Copyright and Compliance Officer there. Finally, Professor Charlotte Waelde, head of Exeter Law School, took up the common call for more evidence-based copyright policy and urged us to ask: ‘What would evidence-based copyright policy actually look like?’ Particularly interesting for me, as both an interdisciplinary researcher and a believer in evidence-based policy, was her question about what mixture of disciplines is needed to reach conclusions that can inform policy. It was also encouraging to see an almost entirely female panel and chair in what is too often a male-dominated community.

I spent the next session attending an open space discussion proposed by Steve Lawson, a musician, about the future of music in the digital age. It was great to hear the range of opinions – from data miners, web developers and a representative from the UK Pirate Party – and hear about some of the innovations in this space. I hope to talk to Steve in more detail soon in connection with a book I’m working on about consumer ethics and activism for the pirate generation.

Finally, we were sent off with a talk from Larry Lessig, on ‘recognising the fight we’re in’. His speech took in a bunch of different issues: open access to scholarly literature; the economics of the radio spectrum (featuring a hypothetical three-way battle between economist Ronald Coase, dictator Joseph Stalin and actress Hedy Lamarr [whom I’d never heard of, but who apparently co-invented ‘frequency hopping’, which paved the way for modern-day wireless communication]); and corruption in the US political system, the topic of his latest book.

In the Q+A I asked his opinion on academic piracy (the time-honoured practice of swapping PDFs to get around lack of institutional access, which has now evolved into the Twitter hashtag phenomenon #icanhazPDF), and whether he prefers the ‘green’ or ‘gold’ routes to open access. He seemed to generally endorse PDF-swapping. He came down on the side of ‘gold’ open access (where publishers make articles open access), rather than ‘green’ (where academics self-archive), citing the importance of being able to do data-mining. I’m not convinced that data-mining isn’t possible under green OA, so long as self-archiving repositories are set up right (for example, Southampton’s EPrints software is designed to enable this kind of thing).

After Lessig’s talk, about a hundred sweaty, thirsty digital rights activists descended on a nearby pub, then pizza, then said our goodbyes until next time. All round it was a great conference; roll on ORGcon2013.

Digital abundance, physical scarcity

October 26, 2011 | Uncategorized | abundance, economics, environment, IP, scarcity | Reuben

This is my attempt to articulate what seems like a contradiction in our modern attitudes to the production and consumption of physical versus digital goods. It’s not new, but I often find it lurking in the background of much of what I think and read about.

On the one hand, it is increasingly clear that we have begun to push the planet to its limits. We use more and more of the earth’s finite resources, plundering them faster than they can be replaced. Throughout the ages, we have been able to do this without facing negative consequences. Why replant the forest when you can go and chop down another tree? Why create new energy sources when we can continue drilling for oil? This way of thinking is deeply ingrained in our economic model. Growth relies on consumption, and the resulting environmental degradation is not easily factored into the calculation. But even as it becomes clear that the natural world can no longer be treated as abundant, we continue to act as if it is.

On the other hand, intellectual goods – by which I mean knowledge, culture, art, music, literature – are now more abundant than ever. They have, for most of history, been bounded by the scarce physical matter which allowed their transmission from one mind to another. The production and dissemination of knowledge and literature was for a long while dependent on paper, printing presses and costly distribution chains. Music was limited first by proximity to musicians, and later, by the material format on which sound was stored. Now, with the advent of the web, the cost of a copy of a book, song or image approaches zero. Modern technology enables us to have more intellectual goods than we could ever consume in a lifetime.

And yet the prevailing economic model for the production of intellectual goods requires us to behave as if they are scarce. The ‘content’ industries – those whose products exist as particular strings of 1s and 0s – have to limit the supply of their product to maintain its value. If just anyone can access the particular string of 1s and 0s which makes up an mp3 audio file, then the intellectual good loses its value in the marketplace. According to some, this ultimately leads to no new intellectual goods being produced in the first place, but that’s another story. In any case, this imposed scarcity is artificial in the sense that there is no technological reason why everybody cannot access those bits or run that piece of code.

In both cases, our beliefs about the value and availability of a given resource are grounded in the reality of the past. For centuries, the earth’s resources really were abundant, and the dominant attitude towards them was appropriate; it allowed human civilization to progress. Likewise, intellectual goods actually were scarce, so our consumption of them really did have to be limited. But now that the situation is reversed, our assumptions have failed to catch up. We treat our natural resources as if they are abundant, and intellectual goods as if they are scarce, when the environmental and technological realities suggest the exact opposite.

Open Government Data Camp 2011

October 23, 2011UncategorizedReuben

This year’s Open Government Data Camp, hosted by the Open Knowledge Foundation, was held in Warsaw, in the incredible post-industrial Soho Factory. A gathering of open government data enthusiasts from around the world, it was a platform for sharing experiences, tracking progress and debating pressing issues for the future of the movement.

This being my first visit to an event of this kind, I was impressed by the number of attendees – apparently a significant increase on last year – as well as their diversity (although it was disappointing to see no female keynoters). I joined on the second day, which got off to a swift and serious start with keynote presentations.

Andrew Rasiej made a rousing case against ‘E-government’ and in favour of ‘WE-government’. The former implies governments delivering wasteful IT services to citizens, while the latter is about governments opening up their datasets and allowing anyone to build on top of them. Tom Steinberg’s presentation about MySociety was a perfect example of what can be achieved with this approach. Chris Taggart from OpenCorporates set a sober tone by outlining why he believes the open government data movement will probably fail. The majority of the world’s data is held by a relatively small number of companies which show no sign of opening it up, and too many open data projects and initiatives are operating in silos. He concluded that even with hard work, the odds are still stacked against the movement. Andrew Stott (the UK Cabinet Office’s Director of Digital Engagement) urged the audience to watch Yes, Minister, the classic British TV comedy set in the corridors of Whitehall, in order to understand how ‘they’ think and the barriers to opening up data.

Nigel Shadbolt outlined a number of important developments in open data, and briefly mentioned another issue which is set to grow in importance over the next few years: that of individuals getting access to the data that companies are gathering on them. Personally, I see this manifesting in two ways. The first is a government- and business-led approach, along the lines of the UK government’s recently announced ‘MyData’ initiative (for which Nigel is an advisor). The idea is that companies will release their customers’ data to individuals, who can then give it to third parties, who use it to create services to sell back to the customer – imagine, for instance, an app which tracks your calorie intake by analysing your supermarket purchases. The other is a bottom-up, consumer-led approach, the beginnings of which we can already see in the fast-growing ‘Europe against Facebook’ campaign, which aims to give Facebook users control over the data stored on the social networking site. It will be interesting to see whether and how these two approaches interact in the near future, and how they both relate to the open data movement.

Tom Steinberg explained how his latest project – FixMyTransport – was actually designed to ‘trick people into their first act of civic engagement’. The words ‘activism’ or ‘campaign’ don’t appear on the website, because that kind of language can often be alienating to the target audience, who just want to sort out a problem with their daily commute. The simple interface makes it very easy for a user to make a complaint. One complaint on its own has very little effect, but the site lets individual complaints be aggregated publicly. When a complaint has the support of five or more people, transport operators tend to take notice. The site is only a few months old, but some early successes suggest the approach could work on a large scale. I really liked the idea of enticing ordinary people with no interest in or knowledge of open data to take part, by creating a really simple and attractive interface and deliberately leaving out any political language.

The enigmatically titled ‘Open… ‘ session turned out to be a somewhat philosophical discussion led by Andrew Rasiej and Nigel Shadbolt about the meaning of terms like ‘open’ and ‘public’ when applied to government data. Does data published as a PDF count as public, or does it need to be machine-readable? In a world where more and more of our information-processing is done by machines, ‘public access’ to data which can only be processed via feeble human eyes means very little. Data which has to be scraped from a website is not, Nigel suggested, good enough. Clearly, the ideal would be a presumption that ‘public’ entailed access to the data in formats which allow sophisticated manipulation rather than mere eyeball-scanning.

Generally, there seemed to be surprisingly little discussion of the Open Government Partnership (an intergovernmental initiative to secure commitments to open up government data). When it was mentioned it was often accompanied by scepticism. Although there may be problems with the approach, and it may yet turn out to be another opportunity for governments to enthuse about open data without actually doing much, I wonder if it deserves more optimistic engagement at this stage. That said, there were so many conversations going on in parallel sessions that I may have missed the more positive opinions floating around the camp.

All in all, it was a fascinating snapshot of the current state of the open government data movement. While many challenges lie ahead, the next year is sure to be interesting. I look forward to attending the next event.