Opinion: data means vastly different things to the users of technology, the companies that build it and especially computer science academics 

By Dr Jennifer Edmond, Trinity College Dublin

It's easy to feel confused about big data. On the one hand, it seems to be a wonderful thing, the "new oil" as they say. One London-based analytics firm went so far in an advertising campaign as to call big data "the secret to living happily ever after". This may be a bit exaggerated, but big data approaches have proven themselves as powerful tools for addressing social problems and creating economic value.

However, there is a dark cloud to this silver lining. The availability of oceans of information, as well as of software able to find patterns in that ocean, leaves individuals, companies and indeed societies open to harm. Privacy has become the catch-all descriptor for the object of these threats. When personal information, be it your bank details or your medical records, gets hacked, you feel the sting in the tail of big data. And it does not even have to be your own data that is disclosed: privacy, ironically, has a very public aspect. If enough people around you reveal enough about themselves, the same software drawing conclusions from what others expose can also generate conclusions about you.

One easy-to-grasp example of this is your DNA, which in Ireland can be legally collected and stored under certain conditions by the Garda Siochana.  But your DNA profile is, by its very nature, shared with your parents and other blood relatives. It is not difficult to imagine how some of what it reveals, such as an inherited illness or disability, could be used as a basis for discrimination against potentially vulnerable and completely innocent individuals.


Let's not forget as well that privacy as we understand it is only the tip of the iceberg. While privacy protection may be the term currently embedding itself in legal and corporate jargon (especially as Europe moves to adopt sweeping changes to its privacy regulations), big data also threatens other aspects of identity formation. Filter bubbles are shielding us from opinions different from our own, internet trolls are taking advantage of the anonymity of online fora to enact power games of verbal abuse over unseen others, and democracy itself has been shown to be at risk from microtargeted ads on social media platforms.

If we are to harness the many potential benefits of big data without incurring these risks, we will need to take more responsibility for what happens within the "black boxes". These are the systems that we may view only as slick interfaces (or may not be able to see at all) without knowing what we are disclosing about ourselves through them or how that information may be used.

But who, in this case, are "we"? Who exactly needs to do what to reduce the potential for harm inherent in current approaches and practices around big data? Most often, the finger is pointed either at the users of technology or the companies that build it. As a result, individuals are becoming more aware of how their activities are monitored and the records of this activity reused or sold on. Companies are under more pressure to ensure privacy settings are transparent, and data breaches are judged more harshly by consumers than ever before.

Is your data in here? Google's data centre in Dublin

However, there is a third leg in this triad of complicity, and one far less often discussed: the discipline of computer science, where the fetish of data and algorithm begins - as do the blind spots that impact upon us.

One can quickly feel lost talking about data with computer scientists and software developers, and not just because of the inevitable specialist terminology you find in any established field. Such manifestations of linguistic and cultural dislocation between technology and society have been the focus of a European-funded project called Knowledge Complexity, which is exploring some of the origins of these gaps and the points of friction they give rise to.

For example, even the manner in which the term data is used seems to obscure more than it reveals. In computer science literature, "data" can refer to both input and output. Both raw and highly manipulated, it comes from predictable sources (like sensors) and highly unpredictable ones (like people). Most importantly, it is both yours and mine. The pervasiveness of this super-term is hard to fathom: in one single computer science research paper, we found the word data used more than 500 times over the course of about 20 pages.

"When personal information, be it your bank details or your medical records, gets hacked, you feel the sting in the tail of big data"

In other scientific disciplines, many different terms would normally appear to clarify such an argument and highlight the relationship between research inputs and conclusions, but not so here. While there are excellent and responsible researchers in the field, this linguistic density at the heart of software development indicates that we cannot lay all of the blame for big data-related risk on either the software users or the companies.

Academic computer science researchers not only contribute directly to these products and processes, but they also train much of the next generation of academic and commercial software developers, instilling them with the same cultures of communication and value systems. When you look at how computer scientists talk about data, it's easy to see how even well-meaning companies can introduce social risks, which their customers may unknowingly take.


High-tech gurus such as Sean Parker, Chamath Palihapitiya and Tristan Harris are beginning to emerge as critical voices regarding the effects of the platforms they once served (or still do). The initiatives they and others have launched, such as Time Well Spent and The Copenhagen Letter, are seeking to raise awareness of what can and should be done.

The creativity and elegance with which technology developers approach problems must be matched with an equally creative and elegant understanding of the social and cultural contexts for their work. We need more than new solutions; we need new sensitivities, new perspectives and new priorities feeding into technology from its foundations and not as an add-on. Perhaps bringing some subtlety back to the use of this one small word, data, could be a good place to start - for them and for us.

Dr Jennifer Edmond is Director of Strategic Projects in the Faculty of Arts, Humanities and Social Sciences at Trinity College Dublin. She is also co-director of the Trinity Centre for Digital Humanities and co-ordinator of several EU-funded projects in that field.  


The views expressed here are those of the author and do not represent or reflect the views of RTÉ