Please introduce How to be more or less human – including its background and how you developed the work.
MD: I began researching computer vision last December after seeing images of automated captioning released by Google, and wanted to know how this software could be used in performance and theatre. I began working under the title Theatre For Computer Vision and envisioned a performance that used object recognition and scene recognition to lead and direct a live show. I immersed myself in the scientific papers and research being done in the field of computer vision. One of those research initiatives is run by the MIT Computer Science and Artificial Intelligence Lab, which hosts the Sun database and its accompanying "label me" tool, the largest online database of categorised images and annotated objects for computer vision training and development. The database is a public resource, and users are encouraged to add and annotate images to expand it and increase the accuracy of artificially intelligent systems.
The categorisation and hierarchical navigation of scenes, locations and objects is fascinating. You discover that there are more instances of a ceiling lamp than a person, meaning the algorithm is more familiar with a ceiling lamp than with a human being. You discover the list of locations and objects that are being edited out of the training process due to their complexity or confusion. What you really see is the extensive amount of human cognitive labour that is being conducted to develop automated computer vision. For a computer program to identify objects, it needs to compare the image with millions of others to make an accurate guess at what it is seeing. Like us, computer software must be taught what it can see, by cross-referencing the image against millions of others to accurately analyse the locations and objects within it. The image can only serve the development of AI if it has been accurately identified and categorised, and this process is still done manually, predominantly through Amazon’s Mechanical Turk service. In fact, the Sun database has released a plug-in to automate batch image uploads to the Amazon Mechanical Turk marketplace, where workers identify and annotate mass image datasets for nominally low fees. These annotated images are verified and then categorised into image database libraries. The illusion or the magic of the automation is slightly tarnished when you discover the extensive human labour that has gone into the software. The database and the resources are all open tools to aid the development and accuracy of software that only MIT scientists and researchers have access to.
Although I created a few performances using existing images and datasets available, I still wanted to use some computer vision software to interpret my actions and performance in real-time.
I then found an auto-tagging software called Imagga that has a pay-per-use API for analysing and tagging images. Imagga works in a slightly different way; it uses existing meta-data from online stock imagery to give near real-time analysis of images. Early experiments indicated Imagga is extremely powerful at identifying outdoor locations, people and animals, objects and style. But, because the software is built upon a library of commercial stock photography, the language is largely business-focused. I found it fascinating that the visual criteria for "sexy", "man" and "successful" had been encoded into the neural networks of the software.
Every image returns an average of around 30 tags: words describing the image to different degrees of accuracy. The software is very sensitive to identifying people, animals and weapons and is quick to identify race and gender. Each tag has a numerical percentage that represents the confidence of the software; a long 9-digit string of numbers indicates how confident the program is in its interpretation of the image. Working with the feedback of these tags and their confidence levels I was able to begin investigating how the software had been trained through performing the process of identification with different objects. Basic errors could show fundamental issues with how the software had been trained.
For example, I noticed that without a white business shirt on, the software was significantly less confident about identifying me as a "man". The majority of men in the image database are dressed in formal business attire, meaning the software has a normative view of a man, that he must be in a business suit; the suit literally defines the man. This became one aspect of the performance, which reveals how other tendencies of the software reflect the way typologies have been encoded and valued in computer vision.
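The comparison described above, reading the tags and confidence values for two captures and seeing what changed, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: it does not call Imagga or reproduce its response format. The list-of-dictionaries layout, the function names `top_tags` and `confidence_shift`, and every number are hypothetical, invented purely for illustration.

```python
# Hypothetical sketch: the tag format and all numbers are illustrative
# assumptions, not real output from Imagga or any other service.

def top_tags(tags, n=3):
    """Return the n tags with the highest confidence, highest first."""
    return sorted(tags, key=lambda t: t["confidence"], reverse=True)[:n]


def confidence_shift(before, after):
    """Map each tag present in both results to its change in confidence."""
    before_by_name = {t["tag"]: t["confidence"] for t in before}
    return {
        t["tag"]: round(t["confidence"] - before_by_name[t["tag"]], 2)
        for t in after
        if t["tag"] in before_by_name
    }


# Invented example values for two captures of the same performer.
without_shirt = [
    {"tag": "man", "confidence": 22.0},
    {"tag": "sexy", "confidence": 12.0},
    {"tag": "successful", "confidence": 18.0},
]
with_shirt = [
    {"tag": "man", "confidence": 61.5},
    {"tag": "professional", "confidence": 40.0},
    {"tag": "successful", "confidence": 47.25},
]

print(top_tags(with_shirt, n=2))
print(confidence_shift(without_shirt, with_shirt))
```

Only tags that appear in both results are compared, so a tag that shows up for the first time (here the invented "professional") is left out of the shift rather than treated as starting from zero.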
What is the fundamental point which you wish to make in the work and can it ever be resolved in a way which is universally harmonious?
MD: The key idea that the piece addresses is how humans are encoded into computer vision, and how our living, breathing nature does not make us any more human than a household dustbin with a white shirt on. What became integral was the discovery of the human as one data object amongst thousands of other data objects, and that our living species had no hierarchy over any other objects. In the list of the most common objects in the Sun database, a person is 8th, coming after "ceiling lamp" as the most cited and identified subject. This extension of agency to other objects fits with actor-network theory, as popularized by Bruno Latour.
My living, breathing presence as a human actor gave me no priority over the non-human actors within the show, and the attributes used to identify objects could be exchanged and performed by an ensemble of things. Benjamin Bratton identified the significance of machine vision and its implications in an interview at the beginning of 2015:
“There is the question of how the world looks as a screen, and another, more important I think, is how we look as objects of perception from the position of the machines with which we co-occupy that world.”
In this interview, Bratton talks about the importance of seeing ourselves through machine vision and encountering a visual perception that does not have an understanding of aesthetics, genre and context. However, what became apparent in Imagga was that it displayed subjective normative characteristics that made judgments of typologies and style and genre informed by commercial stock imagery. The software had been trained to describe visual elements as "sexy" or "professional" and I wanted to understand what criteria made a subject more "sexy" or more "professional". I became obsessed with making myself appear like more of a man, become more attractive, more sexy and more professional.
As I performed the process of matching the standards of the software I noticed that it was not me that defined my gender, but my clothes. Once I exchange properties with a household dustbin, dressing it in a white collar shirt, the dustbin is perceived as more successful, more professional and more attractive than me.
This realization also had wider implications for my research into liveness and performance, and whether, to be live, something has to be living. This question came out of looking at many modes of performance that use the temporal liveness of computers as actors in a show, producing the kind of unexpected, improvised performance that was once something only humans were capable of.
Annie Dorsen, for example, practices what she calls "Algorithmic Theatre" where computer chatbots will perform in front of an audience with no human actors on stage. This type of work calls into question the role of the human performer in live performance and theatre, and questions the qualities of what makes something live. If computer programs can recreate the unexpected tension that comes from the improvised and ephemeral nature of live performance, what can the human offer that still makes his performance unique or of higher quality?
I think we are at a stage in our relationship with technology where the advancing of artificial intelligence is beginning to disrupt dominant Darwinist thinking - namely that we are top dog and should be centre-stage. I have been directly responding to this through the spotlight of performance and theatre because I think the qualities of what makes something live are also what makes something (a)live and living. And this distinction, between live technological performers and (a)live subjects has become increasingly difficult to separate as the performance of artificially intelligent systems becomes increasingly widespread and convincing.
Is the work designed to tell us as much about the curation of stock imagery, and the consideration (conscious or otherwise) of the human characteristics which you touch on in the work, as it is the software itself?
MD: The work investigates both: how human characteristics have been encoded into computer vision through the curation of stock imagery. The analysis of the software happens in real-time, with each description projected into the installation behind me, so you consistently see the effects of the stock imagery and its results. The vocabulary of the software is distinctly normative and it will always associate work with happiness, which is a symptom of it being trained on a stock imagery database. I think How to be more or less human says a lot more about the characteristics of the software than the characteristics of humans.
During the performance, I go to extraordinary lengths to be recognised as a man but these sequences serve only to highlight and demonstrate the social and political bias of this particular algorithm. Because my performance is quite animated, many people have commented on how the webcam is a metaphor for a mirror and can represent some of the constraints and ideals that are maintained through the camera on the smartphone. The culture of the selfie and the incorporation of applications like beauty filters on mobile phones indicate a vulnerability that comes from our narcissistic desire for perfection. Throughout the performance, I take a lot of pictures and perform the process of achieving that perfect image, but this process is predominantly to demonstrate how each object affects the confidence of the software. Although I am much more interested in the performance of the algorithm, my obsession with being recognized as a 100% man perhaps highlights some narcissistic traits of being human and our obsession with self-image.
If we are reduced to "stuff in the world" for machine cognition, does that objectivity give a better outcome than the narrow, horrid subjectivity which we have to bear ourselves to in real life?
MD: Well, the horrid subjectivity that humans use to perceive the world is how we make sense of the world around us, but we are beginning to see other machines of cognition that present alternative means of perception. And it is unsettling - images, colours, shapes and patterns are categorized in alien ways: just look at the psychedelic pictures released by Google.
These new modes of perception are fascinating to us, but what I am trying to highlight in the piece is that they are far from objective. The Imagga auto-tagging software is characteristic of what it has seen... and when you have only seen images of commercial stock photography, your perspective on the world is going to be constrained to commercial business vocabulary. The machine's inability to recognise the naked flesh of the human really highlights this point, namely that the software’s understanding of a man is based on the normative clothing of a white business shirt. But every image recognition program comes with its own set of faults and characteristics. The Flickr auto-tagging feature caused outrage when it labelled concentration camps as "jungle gyms" and black people as "apes". The public outcry highlights the impossibility of making an objective tool that has no social or political code embedded within it. As the developers withdraw certain terms and phrases and refine the algorithm, the software is censored and forced to process certain social and political concerns that improve its functionality. I find this really interesting... the computer simply says what it sees and then the developers are responsible for incorporating racial, social and political moral codes into the algorithm to improve it.
The ways of seeing demonstrated by some computer vision software are quite similar to a human condition called Visual Agnosia that I read about in Oliver Sacks' book The Man Who Mistook His Wife For A Hat. In this particular case study, the neurologist, Dr Sacks, is visited by a man who identifies things through association rather than appearance. Hence, he mistakes his wife for a hat and his foot for a shoe. The method of association used to identify things is very similar to computer vision, for example, if there is a bed then it is probably a bedroom. This is an aspect of computer vision that is not so different from how we associate things to identify them.
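The idea of identifying a scene by association ("if there is a bed then it is probably a bedroom") can be illustrated with a toy sketch. This is not how Imagga or any particular computer vision system works internally; the lookup table `SCENE_HINTS` and the function `guess_scene` are hypothetical, invented only to show the principle of guessing a location from the objects found in it.

```python
# Hypothetical sketch: the object-to-scene table is invented for illustration.
from collections import Counter

SCENE_HINTS = {
    "bed": "bedroom",
    "pillow": "bedroom",
    "oven": "kitchen",
    "sink": "kitchen",
    "desk": "office",
}


def guess_scene(objects):
    """Guess a scene by counting which scene the detected objects hint at."""
    votes = Counter(SCENE_HINTS[o] for o in objects if o in SCENE_HINTS)
    if not votes:
        return None  # no known object, so no association to draw on
    return votes.most_common(1)[0][0]


print(guess_scene(["bed", "pillow", "desk"]))
```

An object missing from the table contributes nothing to the guess, which loosely mirrors the point made in the interview: whatever has not been encoded cannot be recognised.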
Has the subjectivity of the software writer transcended the subjectivity of the lens, of the gaze?
MD: As part of the work, I have published a series of emails exchanged with the developers as I negotiated how to improve their image recognition software. After asking them about the confidence levels and the criteria for recognising gender, I wanted the developers to advise me on how I could be classified as a man with 100% confidence. They responded with set prices for customizing the software. I could actually pay to have myself encoded into the algorithm; they offered me a deal to alter the software to incorporate my naked body as 100% man. I could send them 1000+ pictures of myself and pay €1299 to customize the algorithm. This is when we begin to see the subjectivity behind the automatic gaze, and that someone or a company with a big enough wallet could pay to have their product instantly recognised by the software.
One has to think what else will be left outside of the computer vision vocabulary, which objects or scenes present too much complexity or not enough value to encode into artificially intelligent programs. The naked body is one of those objects: because it is commonly obscured by clothing, the software struggles to identify it.
What would Lacan make of your work?
MD: I think of the webcam as a mirror and in the performance the feedback of the software is continually forming the identity of the subject. I take the classification of the software to an extreme, interpreting the confidence of the software as a formation of my identity. When typologies, feelings and human conditions can be quantified to a point where I can be 12% sexy, 18% successful and 22% man, it is important to know what you are being measured against.
I am unsure what Lacan would think of my work, but I hope that Bruno Latour would enjoy it.
What's next for you?
MD: Over the summer, I will be attending a ventriloquist school in Norway for a residency and at the end of the year will be developing How To Be More Or Less Human for a group exhibition at V2 in Rotterdam.
I am working in Amsterdam for the Institute of Network Cultures, to help to produce the next edition of Moneylab, a conference on digital economies.
"How to be more or less human" is now available online; you'll need to switch your webcam on in order to achieve the effects which Max describes above. Further information on Max and his work is available at his website and at @maxdovey.