Open Research in Practice: responding to peer review with GitHub

I wrote a tweet last week that got a bit of unexpected (but welcome) attention.

I got a number of retweets and replies in response, including:

https://twitter.com/chadkoh/statuses/433012506097745920

https://twitter.com/n3siy/status/433462895116943360

The answers are: Yes, yes, yes and yes. I thought I’d respond in more detail and pen some short reflections on GitHub, collaboration and open research.

The backstory: I had submitted a short paper (co-authored with my colleague David Matthews from the Maths department) to a conference workshop (WWW2014: Social Machines). While the paper was accepted (yay!), the reviewers had suggested a number of substantial revisions. Since the whole thing had to be written in LaTeX, with various associated style, bibliography and other files, and version control was important, we decided to create a GitHub repo for the revision process. I’d seen GitHub used for paper authoring by another colleague before and wanted to try it out.

Besides version control, we also decided to make use of other GitHub features, including ‘issues’. I took the long list of reviewer comments and filed it as a single issue. We then separated the comments out into themes, each of which was given its own issue. From there, we could see clearly what needed changing and discuss how we were going to do it. Finally, once we were satisfied that an issue had been addressed, it could be closed.
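If you wanted to automate that first step, a minimal sketch using GitHub’s REST API might look something like the following – the repository name, token and comments below are placeholders for illustration, not our actual paper or workflow:

```python
# Hypothetical sketch: file each reviewer comment as a separate GitHub issue.
# The repo, token and comments are placeholders; adapt them to your own paper repo.
import requests

REPO = "example-user/www2014-paper"   # hypothetical repository
TOKEN = "ghp_..."                     # a personal access token with 'repo' scope

reviewer_comments = [
    ("Clarify the definition of 'social machine'", "Reviewer 1, point 3: ..."),
    ("Expand the related work section", "Reviewer 2, point 1: ..."),
]

for title, body in reviewer_comments:
    resp = requests.post(
        f"https://api.github.com/repos/{REPO}/issues",
        headers={"Authorization": f"token {TOKEN}"},
        json={"title": title, "body": body, "labels": ["reviewer-comment"]},
    )
    resp.raise_for_status()
    print("Created issue:", resp.json()["html_url"])
```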

At first I considered making the repository private – I was a little nervous about putting all the reviewer comments up on a public repo, especially as some were fairly critical of our first draft (although, in hindsight, entirely fair). In the end, we opted for the open approach – that way, anyone who is interested can see the process we went through. While I doubt anyone will want that level of detail for this particular paper, opening up the paper revision process as a matter of principle is probably a good idea.

With the movement to open up the ‘grey literature’ of science – preliminary data, unfinished papers, failed experiments – it seems logical to extend this to the post-peer-review revision process. For very popular and / or controversial papers, it would be interesting to see how authors have dealt with reviewer comments. It could help provide context for subsequent debates and responses, as well as demystify what can be a strange and scary process for early-career researchers like myself.

I’m sure there are plenty of people more steeped in the ways of open science who’ve given this a lot more thought. New services like FigShare, and open access publishers like PLoS and PeerJ, are experimenting with opening up the whole process of academic publishing. There are also dedicated paper-authoring tools that build on git-like functionality – next time, I’d like to try one of the collaborative web-based LaTeX editors like ShareLaTeX or WriteLaTeX. Either way, I’d recommend adopting git, or something git-like, for co-authoring papers and for the post-peer-review revision process. The future of open peer review looks bright – and integrating it with an open, collaborative revision process is a no-brainer.

Next on my reading list for this topic is Kathleen Fitzpatrick’s book on the future of academic publishing – Planned Obsolescence.
— UPDATE: Chad Kohalyk just alerted me to a relevant new feature rolled out by GitHub yesterday – a better way to track diffs in rendered prose. Thanks!

Care.Data: Why we need a new social contract for personal health data

In an ideal world, our collective medical records would be a public good, carefully stewarded by responsible institutions and used to derive medical insights and manage public health better. This is the basic premise of the care.data scheme, and construed as such it suggests a simple moral equation with an obvious answer: give up a little individual privacy for the greater public good. The problem is, our world is not ideal. We’re in the midst of multiple crises of trust – in government, in the private sector, and in the ability of our existing global digital infrastructure to deal adequately with the challenges of personal data.

The NHS conducted a privacy impact assessment for the care.data scheme, to identify and weigh its risks and benefits. In discussing why citizens might choose to opt out of sharing their own data (as 40% of surveyed GPs said they would), the final paragraph is both infuriating and revealing:

‘However, some people may believe that any use of patient identifiable data without explicit patient consent is unacceptable. These people are unlikely to be supportive of care.data whatever its potential benefits and may object to the use of personal confidential data for wider healthcare purposes.’

In other words, there are some people who will selfishly exercise their individual rights to privacy (for whatever misguided reasons), to the cost and detriment of the public good.

While the leaflet promoting the scheme encourages donating one’s data as a contribution to the public health service, even left-wing Bevanites have reason to be sceptical. Many of us instinctively trust ‘our NHS’, but the truth is that large parts of it are no longer ‘ours’, and the care.data scheme is a perfect example. As expected, the contract to provide the ‘data extraction’ service was won by an unaccountable private sector provider (Atos, who are also responsible for disability benefit assessments), while some of the main beneficiaries of the data itself will be a plethora of commercial entities.

This is not to say that private sector use of health data is inherently bad. The trouble with the care.data scheme goes deeper than that; it is a microcosm of a much wider malaise about the future of personal data and the value of privacy.

The social contract governing the use of our health information was written for a different age, where ‘records’ meant paper, folders and filing cabinets rather than entries in giant, mine-able databases. This social contract (if it ever even existed) never granted a mandate for the new kinds of purposes HSCIC proposes.

Such a mandate would have to be based on a realistic and robust assessment of the long-term risks and a stronger regulatory framework for downstream users. Crucially, it would need to proactively engage citizens, enabling them to make informed choices about their personal data and its role in our national information infrastructure. Rather than seizing this opportunity to negotiate a new deal around data sharing, the architects of this scheme have attempted to usher it in quietly through the back door.

Thankfully, there are alternative ways to reap the benefits of aggregated health data. One example is the Swiss initiative HealthBank.ch, a patient data co-operative owned and run by its members. By giving patients themselves a stake and a say in the governance of their data, the project aims to harness that data to ‘benefit the individual citizen and society without discrimination and invasion into privacy’.

Personal data collected unethically is like bad debt. You can aggregate it into complex derivatives, but in the end it’s still toxic. If the NHS start out on the wrong foot with health data, no amount of beneficial re-use will shore up public trust when things go wrong.

Snowden, Morozov and the ‘Internet Freedom Lobby’

The dust from whistleblower Edward Snowden’s revelations has still not settled, and his whistle looks set to carry on blowing into this new year. Enough time has elapsed since the initial furore to allow us to reflect on its broader implications. One interesting consequence of the Snowden story is the way it has changed the debate about Silicon Valley and the ‘internet freedom’ lobby. In the past, some commentators have (rightly or wrongly) accused this lobby of cosying up to Silicon Valley companies and preaching a naive kind of cyberutopianism.

The classic proponent of this view is the astute (though unnecessarily confrontational) journalist Evgeny Morozov, but variations on his theme can be found in the work of BBC documentarian-in-residence Adam Curtis (whose series ‘All Watched Over by Machines of Loving Grace’ wove an intellectual narrative from 60s-era hippies, through Ayn Randian libertarianism, to modern Silicon Valley ideology). According to these storytellers, big technology companies and non-profit groups have made Faustian bargains based on their perceived mutual interest in keeping the web ‘free from government interference’. In fact, they say, this pact only served to increase the power of both the state and the tech industry, at the expense of democracy.

Whilst I agree (as Snowden has made clear) that modern technology has facilitated something of a digital land grab, the so-called ‘internet freedom lobby’ are not to blame. One thing that was irksome about these critiques was the lack of distinction between different parts of this ‘lobby’. Who exactly are they talking about?

Sure, there are a few powerful ideological libertarians and profiteering social media pundits in the Valley, but there has long been a political movement arguing for digital rights that has had very little to do with that ilk. Morozov’s critique always jarred with me whenever I came across one of the many principled, privacy-conscious technophiles who could hardly be accused of Randian individualism or of cosying up to powerful elites.

If there is any truth in the claim, it is this: on occasion, the interests of internet users have coincided with the interests of technology companies. For instance, when a web platform is forced to police behaviour on behalf of the Hollywood lobby, both the platform and its users lose. More broadly, much of the free/libre/open source world is funded, directly or indirectly, from the profits of tech companies.

But the Snowden revelations have driven a rhetorical wedge further between those interests. Before Snowden, people like Morozov could paint digital rights activists as naive cheerleaders of tech companies – and in some cases they may have been right. But they ignored the many voices in those movements who stood both for the emancipatory power of the web as a communications medium and against its dangers as a surveillance platform. After Snowden, the privacy wing of the digital rights community has taken centre stage and can no longer be ignored.

At a dialectical level, Silicon Valley sceptics like Morozov should be pleased. If any of his targets in the digital rights debate have indeed been guilty of naivety about the dangers of digital surveillance, the Snowden revelations have shown them the cold light of day and proved Morozov right. But in another sense, Snowden proved him wrong. Snowden is a long-term supporter of the Electronic Frontier Foundation, whose founders and supporters Morozov has previously mocked. Snowden’s revelations, and their reception by digital rights advocates, show that they were never soft on digital surveillance, whether by state or industry.

Of course, one might say Snowden’s revelations were the evidence that Morozov needed to finally silence any remaining Silicon Valley cheerleaders. As he said in a recent Columbia Journalism Review interview: “I’m destroying the internet-centric world that has produced me. If I’m truly successful, I should become irrelevant.”

Why DRM is not a technical solution to privacy

Recently I’ve heard a number of people suggest that personal data might be protected using ‘digital rights management’, the same technology that some copyright owners use to ‘protect’ ‘their’ content (apologies for the excessive scare-quotes, but I think they are necessary in this instance). The idea is that content or data is transferred to the user in a proprietary format (often with encryption), which can only be played or used with the relevant proprietary software or hardware and the corresponding decryption keys. Thus, in theory, the content ‘owner’ (or the individual data subject, in the privacy protection scenario) can ensure that the content or data is only accessible to licensed users for a restricted range of uses. In practice, DRM content is invariably cracked and unlocked, after which it can be copied, shared and used without restriction.

I’m sceptical as to whether ‘DRM for privacy’ could ever really work as a purely technical fix to the privacy problem. As far as I can see, the proposals either amount to simple encryption of user data (which certainly has a role in protecting privacy, but has existed for years without being called ‘DRM’), or else they involve some additional policy proposal or trust arrangement which goes beyond the technology and enters into the contractual / legal / regulatory arena.

For instance, a recent DRM-for-privacy proposal from Microsoft’s Craig Mundie goes something like this. Personal data (e.g. health records) are encrypted before being sent to a third party (let’s say, a medical researcher) for processing. The encrypted package comes with an additional metadata wrapper explaining the terms and conditions for use, and some kind of consent mechanism through which the data processor can express their agreement, after which the data becomes accessible.
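To make the idea concrete, here is a minimal sketch of what such a package might amount to – assuming symmetric encryption via Python’s cryptography library; the field names and terms are illustrative inventions of mine, not Mundie’s actual design:

```python
# Illustrative sketch only: a 'privacy DRM' package as an encrypted payload plus
# human/machine-readable terms. Field names are invented, not Mundie's design.
import json
from cryptography.fernet import Fernet

key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(b"patient_id,diagnosis\n1234,...")

package = json.dumps({
    "terms": {
        "purpose": "medical research only",
        "retention": "delete after 7 years",
        "onward_transfer": False,
    },
    "ciphertext": ciphertext.decode(),
})

# The recipient 'consents' to the terms, and only then is handed the key.
recipient_agrees = True
if recipient_agrees:
    data = Fernet(key).decrypt(json.loads(package)["ciphertext"].encode())
    # From here on, nothing technical stops the recipient ignoring the terms.
```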

This sounds nice in theory, but in order to work, the terms would need to be legally binding and enforceable. Unless there is some sort of audit trail, and a credible penalty for non-compliance, there’s nothing to stop the processor simply clicking ‘I agree’ and then ignoring the terms. Encryption only protects the data up to the point at which its terms-of-use clickwrap is ripped open. And if the whole motivation for adopting DRM in the first place was that you don’t trust the entity you’re giving data to, it becomes pointless. Cory Doctorow put it thus:

For “privacy DRM” to work, the defender needs to be in a position to dictate to the attacker the terms on which he may receive access to sensitive information. For example, the IRS is supposed to destroy your tax-records after seven years. In order for you to use DRM to accomplish the automatic deletion of your records after seven years, you need to convince the IRS to accept your tax records inside your own DRM wrapper.

But the main reason to use technology to auto-erase your tax-records from the IRS’s files is that you don’t trust them to honor their promise to delete the records on their own. You are already adversarial to the IRS, and you are already subject to the IRS’s authority and in no position to order it to change its practices. The presence or absence of DRM can’t change that essential fact.

Talking about encryption, ‘metadata wrappers’ and DRM makes Mundie’s proposal sound like a neat, stand-alone technical solution, but ultimately it relies on further legal, social and technical infrastructure to work in practice. All the encryption does is protect your data while it’s in transit, and all the terms-of-use wrapper does is tell the recipient your preferences. Perhaps there’s something in current DRM-for-privacy proposals that I have missed – in which case, I’d be very keen to learn more – but I can’t find any more detailed proposals from Mundie or anyone else.

Besides being a little misleading on a technical level, the DRM label makes me suspicious about the motivation behind this proposal. Those who would like to protect their business models with DRM have a vested interest in classifying any socially useful technology that vaguely resembles it as such. That way they can cite ‘enhanced privacy’ as one of the consumer benefits of DRM, whilst sweeping its more damaging aspects under the carpet.

Looking for a cloud I can call my own

‘Dusk Cloud Mountains’, by DeviantArt user Akenator (http://akenator.deviantart.com/), under a Creative Commons Attribution 3.0 License

The term ‘cloud computing’ refers to the idea that programs, processing and data storage run on a connected remote server rather than on your own device. It was coined in the 1990s by Irish entrepreneur Sean O’Sullivan, but didn’t achieve true buzzword ubiquity until the late 2000s.

The term is still vague, despite attempts by the European Union to give it a concrete definition. To me, it simply means that the code I’m using and interacting with is running on a computer that isn’t in my immediate physical space. But this lack of proximity to where the code actually runs can be worrying. Usually it means it’s running on a server thousands of miles away that you have no control over. Can you trust that the code is safe, and not working against you? Who else might see your data when it’s stored in the cloud?

Despite these fears, most of us have embraced the cloud, using cloud storage providers like Google and Dropbox and installing mobile apps which store our data and process it remotely. But what is the alternative? One option is to store all your files and run applications on your own hardware. But many applications are cloud-only, and it is hard to make backups and integrate multiple devices (laptop, tablet, phone) without syncing via a cloud. Another is to encrypt all your data before you upload it to the cloud, but this can limit its use (the data needs to be decrypted before you can do anything with it).

A better alternative might be for each of us to have our own personal clouds which we can connect to via our personal devices. Personal clouds would be under our control, running on hardware that we own or trust. They could be hosted on lightweight, internet-connected devices kept in safe, private places – perhaps in a safety deposit box in your home. Or they might be hosted somewhere else – by a hosting provider you trust – and be easily packaged up and taken elsewhere if you change your mind.

Over the last few weeks, I’ve been trying to migrate away from my existing cloud storage providers (including Google Drive, Dropbox and Ubuntu One), and experimenting with running my own personal cloud. I’m trying out various free and open-source personal cloud systems, hosted on my own hardware (an old laptop), or on a hosting provider I trust.

Sceptics may say that this option is beyond the technical capability of the vast majority of users. I’d agree – without some experience as a system administrator, it wouldn’t have been simple to set up and maintain. But despite a few teething problems, it’s not as hard as I thought. With a bit of help and some improvements in user experience, running your own server could be within the reach of the average user. Just like the motor car and the personal computer, personal clouds don’t need to be fully understood by their owners.

One day, owning your own cloud might be as common as owning your own home (it would certainly be more affordable). And as personal data plays an increasingly important role in our lives, trusting the hardware it’s housed in might be as important as trusting the roof over your head.

I hope to blog further about my journey towards a personal cloud in the coming weeks and months…

Do you need a Personal Charity Manager?

‘Charity’ – by Flickr user Howard Lake, under a CC BY-SA 2.0 license

As an offshoot of some recent work, I’ve been thinking a lot about intermediaries and user agents that act on behalf of individuals to help them achieve their goals. Whether they are web browsers and related plugins that remember things for you or help you stay focused, or switching platforms like Cheap Energy Club that help you get the best deal on energy, these intermediaries provide value by helping you follow through on your best intentions. I don’t trust myself to keep on top of the best mobile phone tariff for me, so I delegate that to a third party. I know that when I’m tired or bored, I’ll get distracted by YouTube, so I use a browser plugin to remove that option when I’m supposed to be working.

Intermediaries, user agents, personal information managers, impartial advisers – however you refer to them, they help us by overcoming our in-built tendencies to forget, to make bad choices in the heat of the moment, or to disregard important information. Behavioural economics has revealed us to be fundamentally less rational in our everyday behaviour than we think. Research into the very real concept of willpower shows that all the little everyday decisions we have to take exact a toll on our mental energy, meaning that even with the best intentions, it’s very unlikely that we consistently make the best choices day-to-day. The modern world is incredibly complex, so anything that helps us make more informed decisions, and actually act consistently in line with those decisions on a daily basis, has got to be a good thing.

Most of these intermediary systems operate on our interactions with the market and public services, but few look at our interactions with ‘third sector’ organisations. This is an enormous opportunity. Nowhere else is the gap between good intentions and actual behaviour more apparent than in the area of charitable giving. If asked in a reflective state of mind, most people would agree that they could and should do more to make the world a better place. Most people would agree that expensive cups of coffee, new clothes, or a holiday are not as important as alleviating world hunger or curing malaria. Even if home comforts are deserved, we would probably like to cut down on them just a little bit, if doing so would significantly help the needy (ethicist Peter Singer suggests donating just 10% of your income to an effective charity).

But on a day-to-day basis, this perspective fades into the background. I want a coffee, I can easily afford to buy one, so why not? And anyway, how do you know the money you donate to charity is actually going to do anything? International aid is horribly complex, so how can an ordinary person with a busy life possibly work out what’s effective? High net worth individuals employ full time philanthropy consultants to do that for them. So even if we recognise on an abstract, rational level that we ought to do something, the burden of working out what to do, the hassle of remembering to do it, and the mental effort of resisting more immediate conflicting urges, are ultimately overwhelming. The result is inertia – doing nothing at all.

Many charities attempt to bypass this by catching our attention with adverts which tug at the heartstrings and present eye-catching statistics. As a result, until recently I went about giving to charity in a completely haphazard way – one-off donations to whoever managed to grab my attention at the right moment. But wouldn’t it be better if we could take our rational, considered ethical commitments and find ways to embed them in our lives, to make them easy to adhere to, reducing the mental and administrative burden? I’ve found several organisations that can help you work out how to give more effectively and stay committed to giving (see Giving What We Can). But there is even more scope for intermediaries to provide holistic systems to help you develop and achieve your ethical goals.

Precisely what form they take (browser plugins, online services, or real, human support?), and what we call them (Personal Charity Managers, Ethical Assistants, Philanthropic Nudges, Moral Software Agents), I won’t attempt to predict. They wouldn’t be a panacea; ethical intermediaries will never replace careful, considered moral deliberation, rigorous debate about right and wrong, and practising virtue in daily life. But as services that practically help us follow through on our carefully considered moral beliefs, and manage our charitable giving, they could be revolutionary.

What can innovators in personal data learn from Creative Commons?

“License Layers” by Creative Commons, used under Creative Commons Attribution 3.0 License

This post was originally published on the Ctrl-Shift website.

A few weeks ago I attended the Creative Commons global summit, as a member of the CC-UK affiliate team, and came away thinking about lessons for the growing personal data ecosystem.

Creative Commons is a non-profit organisation, founded in 2001, which creates and promotes a set of alternative copyright licenses that allow creative works to be legally shared, remixed and built upon by others. Creators can communicate which rights they want to keep and which they would like to waive. These licenses are now used in education, cultural archives and science, as well as in commercial contexts. By creating a set of legally robust, standardised and easy-to-use licenses, the organisation has turned a complicated and costly legal headache into a usable piece of public infrastructure fit for the digital age.

What lessons does this movement have for the management and use of personal data? In one sense, managing content is radically different to managing personal data. Consumers generally want to be able to restrict the publication of their personal information, while creative content is generally made for public consumption from the outset. But despite the differences, there are some striking parallels – parallels which point to possible innovations in personal data.

Just as with creative works, personal data sits at the intersection of technical, legal and human challenges. Personal data is stored, transferred and transformed by technology, in ways that are not always captured by legal terminology. In turn, the law is usually too complex for humans – whether data controllers or individual data subjects themselves – to understand. Creative Commons licenses translate a complex legal tool into something that both humans and machines can understand. There are easy tools to help creators choose the right license for their work, and a simple set of visual icons helps users understand what they can do with it. By attaching metadata to content, search engines and aggregators can automatically find and organise content according to the licenses applied.
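As a rough illustration of the machine-readable side, here is a sketch of how an aggregator might discover the licence applied to a web page from its embedded rel="license" links (the kind of markup the CC license chooser generates). The URL below is a placeholder, and real aggregators would also read richer RDFa metadata:

```python
# Sketch: discover a page's licence from embedded rel="license" links.
# The URL is a placeholder; this only covers the simplest form of CC metadata.
import urllib.request
from html.parser import HTMLParser

class LicenseFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("a", "link") and "license" in (attrs.get("rel") or "").split():
            self.licenses.append(attrs.get("href"))

html = urllib.request.urlopen("https://example.org/some-photo-page").read().decode("utf-8", "replace")
finder = LicenseFinder()
finder.feed(html)
print(finder.licenses)  # e.g. ['https://creativecommons.org/licenses/by/3.0/']
```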

There are already pioneering initiatives which attempt to apply aspects of this approach to personal data. One promising area is privacy policies. Much like copyright licenses, these painfully obscure documents are usually written in legalese, and can’t be understood by humans or parsed by computers. Various projects are working to make them machine-readable, and to develop user-friendly icons to represent important clauses – for instance, whether data is shared with third parties. Conversely, if individuals want to create and share data about themselves, under certain conditions, they may need an equivalent easy-to-use license-chooser.

The personal data ecosystem is in need of public, user-friendly, and standardised tools for managing data. The Creative Commons approach shows this can be done for creative works. Can a similar approach work for personal data?

Is Commodify.us an elaborate art joke?

Last week I was sent a link to commodify.us – a new web application where you can upload your data from Facebook, and choose whether to license it directly to marketers or make it available as open data. It’s a neat idea which has been explored by a number of other startups (e.g. Personal.com, YesProfile, Teckler).

Obviously, uploading all of your Facebook data to a random website raises a whole host of privacy concerns – exactly what you’d expect a rock-solid privacy policy and terms of service to address. Unfortunately, there don’t seem to be any such terms for commodify.us. If you click the Terms of Service button on the registration page, it takes you nowhere.

Looking at the page source, the HTML anchor points to an empty ‘#’ id, which suggests that the problem isn’t a broken link, but that there was nowhere to link to in the first place. Suspicious! If I were serious about starting a service like this, the very first thing I’d do is draft a terms of service and privacy policy. Then, before launching the website, I’d triple-check that they appear prominently on the registration form.

On the ‘Browse Open Data’ part of the website, you can look at the supposedly de-identified Facebook profiles that other users have submitted. These include detailed data and metadata such as number of friends, hometown, logins, etc. The problem is, despite the removal of names, the information in these profiles is almost certainly enough to re-identify the individual in the majority of cases.

These two glaring privacy and technical problems make me think the whole thing might just be an elaborate hoax. In which case: ha ha, well done, you got me. After digging a little deeper, it looks like the website is a project from Commodify, Inc., an artist-run startup, and Moddr, who describe themselves as:

Rotterdam-based media/hacker/co-working space and DIY/FOSS/OSHW fablab for artgeeks, part of the venue WORM: Institute for Avantgardistic Recreation

They’re behind a few other projects in a similar vein, such as ‘Give Me My Data’. I remember seeing a very amusing presentation on the Web 2.0 Suicide Machine project by Walter Langelaar a year or two ago.

So I registered using a temporary dummy email address to have a look around, but I didn’t get to upload my (fake) data because the data upload page says it’s currently being updated. I tried sending an email to the listed moderator address (tim@moddr.net), but it bounced.

If this is intended as a real service, then it’s pretty awful as far as privacy is concerned. If it’s intended as a humorous art project, then that’s fine – as long as there are no real users who have been duped into participating.

Southampton CyberSecurity Seminar

I recently delivered a seminar for the Southampton University Cyber Security seminar series. My talk introduced some of the research I’ve been doing into the UK’s Data Protection Register, and was entitled ‘Data Controller Registers: Waste of Time or Untapped Transparency Goldmine?’.

The idea of a register of data controllers came from the EU Data Protection Directive, which set out a blueprint for member states’ data protection laws. Data controllers – any entity responsible for the collection and use of personal data – must provide details about the purposes of collection, the categories of data subjects, the categories of personal data, any recipients, and any international data transfers, to the supervisory authority (in the UK, this is the Information Commissioner’s Office). This represents a rich data source on the use of personal data by over 350,000 UK entities.

My talk explored some initial results from my research into three years’ worth of data from this register. A number of broad trends have been identified, including:

  • The amount of personal data collection reported is increasing. This is measured in terms of the number of distinct register entries for individual instances of data collection, which have increased by around 3% each year.
  • There are over 60 different stated reasons for collection of data, with ‘Staff Administration’, ‘Accounts & Records’ and ‘Advertising, Marketing & Public Relations’ being the most popular (outnumbering all other purposes combined).
  • The categories of personal data collected exhibit a similar ‘long tail’, with ten very common categories (including ‘Personal Details’, ‘Financial Details’ and ‘Goods or Services Provided’) accounting for the majority of instances.
  • In terms of transfers of data outside the EU, the vast majority are described simply as ‘Worldwide’. Of those that do specify countries, the most popular are the U.S., Canada, Australia, New Zealand and India.

Beyond these general trends, I explored one particular category of personal data collection which has been raised as a concern in studies of EU public attitudes, namely, trading and sharing of personal data. The kinds of data likely to be collected for this purpose are broadly reflective of the general trends, with the exception of ‘membership details’, which are far more likely to be collected for the purpose of trading.

Digging further into this category, I selected one particularly sensitive kind of data – ‘Sexual Life’ – to see how it was being used. This uncovered 349 data controllers who hold data about individuals’ sexual lives for the purpose of trading and sharing with other entities (from the summer 2012 dataset). I visualised this activity as a network graph, looking at the relationship between individual data controllers and the kinds of entities they share this information with. By clicking on the blue nodes you can see individual data controllers, while categories of recipients are shown in yellow.
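For the curious, a graph like this can be assembled with very little code. Below is a minimal sketch – with made-up controller names, as the real register entries are far messier – of how the bipartite controller/recipient-category graph could be built using networkx:

```python
# Minimal sketch of the controller/recipient-category graph; names are invented.
import networkx as nx

# (data controller, recipient category) pairs extracted from register entries
edges = [
    ("Example Marketing Ltd", "Traders in personal data"),
    ("Example Marketing Ltd", "Business associates and professional advisers"),
    ("Example Insurance plc", "Traders in personal data"),
]

G = nx.Graph()
for controller, recipient in edges:
    G.add_node(controller, kind="controller")  # drawn as blue nodes
    G.add_node(recipient, kind="recipient")    # drawn as yellow nodes
    G.add_edge(controller, recipient)

# Recipient categories shared by many controllers stand out by degree
print(sorted(G.degree, key=lambda pair: pair[1], reverse=True))
```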

I also explored how this dataset can be used to create personalised transparency tools, or to ‘visualise your digital footprint’. By identifying the organisations, employers, retailers and suppliers who have my personal details, I can pull in their entries from the register to see who knows what about me, what kinds of recipients they’re sharing it with, and why. A similar interactive network graph shows a sample of this.
Open data is often seen as in tension with privacy. However, through this research I hope to demonstrate some of the ways that open data can address privacy concerns. These concerns often stem from a lack of transparency about the collection and use of personal data by data controllers. By providing knowledge about data controllers, open data can be a basis for accountability and transparency about the use (or abuse) of personal data.

Data on Strike

What happens to a smart city when there’s no access to personal data?

Last week I had the pleasure of attending the Digital Revolutions Oxford summer school, a gathering of PhD students doing research into the ‘digital economy’. On the second day, we were asked to form teams and engage in some wild speculation. Our task was to imagine a news headline in 2033, covering some significant event related to the research we are currently undertaking. My group took this as an opportunity to explore various utopian/dystopian themes relating to power struggles over personal data, smart cities and prosthetic limbs.

The headline we came up with was ‘Data Strike: Citizens refuse to give their data to Governments and Corporations’. Our hypothesis was that as ‘smart cities’ materialise, essential pieces of infrastructure will become increasingly dependent on the personal data of the city’s inhabitants. For instance, the provision of goods and services will be carefully calibrated to respond and adjust to the circumstances of individual consumers. Management of traffic flow and transportation systems will depend on uninterrupted access to every individual’s location data. Distributed public health systems will feed back data live from our immune systems to the health authorities.

In a smart city, personal data itself is as critical a piece of infrastructure as you can get. And as any observer of strike action will know, critical infrastructure can quickly be brought to a halt if the people it depends on decide not to co-operate. What would happen in a smart city if its inhabitants decided to go on a data strike? We imagined a city-wide personal data blackout, in which individuals turn off or deliberately scramble their personal devices, wreaking havoc on the city’s systems. Supply chains would misfire as targeted consumers disappear from view. Public health monitoring signals would be scrambled. Self-driving cars would no longer know when to pick up and drop off passengers – or when to stop for pedestrians.

We ventured out into the streets of Oxford to see what ‘the public’ thought about our sensational predictions, and whether they would join the strike. I had trouble selling the idea of a ‘data co-operative’ to sceptical passengers waiting at the train station, but was surprised by the general level of concern and awareness about the use of personal data. As a break from dry academic work, this exercise in science fiction was a bit of light relief. But I think we touched on a serious point. Smart cities need information infrastructure, but ensuring good governance of this infrastructure will be paramount. Otherwise we may sleepwalk into a smart future where convenience and efficiency are promoted at the expense of privacy, autonomy and equality. We had better embed these values into smart infrastructure now, while the idea of a data strike still sounds ridiculous.

Thanks to the Research Councils UK Digital Economy Theme, Know Innovation and the Oxford CDT in healthcare innovation for funding, organising and hosting the event. More comprehensive coverage can be found in Chris Phethean’s write-up.