Machine Vision in Pandemic Times

by Antonio Somaini

This article is about the social and political implications of the different uses of machine vision technologies during the COVID-19 pandemic. After arguing that the phenomenon of machine vision should be tackled from a media-archaeological standpoint, one that highlights the lines of continuity and the moments of discontinuity that define its position within the wider history of images and visual media, the article analyzes the different applications of machine vision systems within the context of the social measures taken in order to contain the spread of the virus: from the enforcement of social distancing and the wearing of masks, to the strategies of positive case detection and contact tracing, all the way up to the diagnostic examination of medical imaging. If machine vision systems and the machine-readable images they are applied to raise the question of what we mean by “vision” and by “image” in the age of algorithms, the COVID-19 pandemic, with the increasing presence of such a non-human gaze within the public space, has further underlined the current relevance of this question.

 

Since its beginning, the COVID-19 pandemic has triggered a double, apparently contradictory dynamic: physical distancing and data aggregation. As bodies were instructed to stay apart and even to self-isolate, data about bodies began to be collected and aggregated in order to monitor and contain the spread of the virus. Machine learning and, more broadly, artificial intelligence technologies have been deployed across the board as part of this effort. Their clinical and societal applications range from the study of the genetic structure of the virus to the prediction of positive cases, ICU bed availability, ventilator use, and expected deaths; and from the analysis of Google searches for terms related to COVID-19 symptoms as a way of predicting the infection rate, to the diagnostic examination of medical imaging and the enforcement of physical distancing through drones, heat cameras, and machine vision techniques. As several observers have noted, the new coronavirus SARS-CoV-2 and AI seemed destined to meet (Larousserie 2020, Bullock et al. 2020), with the viral spread of artificial intelligence technologies finding an ideal accelerator in the viral spread of the COVID-19 pandemic.

Machine vision technologies, in particular, have found a wide range of applications. Their capacity to process huge image datasets in order to recognize, identify, store, and analyze data has been used to examine patients' X-rays, to monitor movements across public spaces, to detect bodies with elevated temperatures that might signal infection, and to identify those who do not respect the guidelines on physical distancing and mask-wearing. The COVID-19 pandemic has further increased and accelerated a deployment of machine vision technologies that was already under way at various levels. In doing so, it has made even more visible the significant rupture that such technologies introduce in the history of visual cultures and visual media.

This history is periodically marked by the sudden appearance of new images and new technologies of vision: images that introduce new forms of representation, and technologies of vision that introduce new ways of seeing, extending and reorganizing the field of the visible while redrawing the boundaries between what can and cannot be seen. For some years now, this has been the case with the new technologies of machine vision and the machine-readable images to which they can be applied. Considered from the perspective of the longue durée of the history of visual media and images, the impact of both is so profound that it leads us to ask what we still mean by “vision” and “image” in the age of algorithms. What is “seeing” when the process of vision is reduced to the acts of identifying and labeling, and when such acts are entirely automated? And can we still use the term “image” for a digital file, encoded in some image format, that is machine-readable even when it is not visible to human eyes, or that becomes visible on a screen as a pattern of pixels only for a tiny fraction of time, spending the rest of its lifespan circulating across invisible networks?

Machine-readable images that can be processed by machine vision systems are everywhere today. They are everywhere in the sense that any digital image—whether produced through some kind of lens-based optical recording of a profilmic event, or entirely computer-generated, or a mix of the two, as is often the case—may potentially be analyzed by a machine vision system based on machine learning and neural networks such as convolutional neural networks (CNNs). By processing the trillions of still and moving images that exist on the internet and that continue to be uploaded every day, and by reaching even those that are not online but are stored on our networked devices, machine vision systems are turning the contemporary iconosphere into a vast field for data mining and aggregation. It is a field in which faces, bodies, gestures, expressions, emotions, objects, places, atmospheres, and moods may be identified, labeled, stored, organized, retrieved, and processed as data that can be quickly accessed and activated for a wide variety of goals: from surveillance to policing, from marketing to advertising, from the monitoring of industrial processes to military operations, from the operation of driverless vehicles to that of drones and robots, all the way up to the study of climate change through the analysis of satellite images. Even disciplines that might seem distant from the applications of machine vision technologies, such as art history and film history, are beginning to test the possibilities introduced by such an automated gaze. We may legitimately ask ourselves what it would have meant for a cultural historian of images such as Aby Warburg to study the spatio-temporal migrations of “formulae of pathos” [Pathosformeln] through machine vision systems capable of taking the entire corpus of art history as a dataset, and then identifying and aggregating movements, gestures, and expressions.

Even though machine vision technologies and machine-readable images do introduce a moment of rupture within the history of optical media and images, the very idea of a non-human “machine vision,” in itself, is not new. Considered from a media-archaeological standpoint, it runs through the entire history of mechanical optical media. Reactions to it, and attempts to theorize its nature and its impact, can be traced back to the early years of photography. In 1839 the physicist François Arago and the geographer and naturalist Alexander von Humboldt praised the extraordinary visual exactitude of daguerreotypes, while twenty years later the poet Charles Baudelaire condemned photography as “art’s most mortal enemy”: a form of sheer mechanical reproduction that should not “encroach upon the domain of the impalpable and the imaginary” (Baudelaire 1859). In the 1920s, 1930s, and 1940s, in the writings of filmmakers, film theorists, artists, and cultural critics such as Dziga Vertov, Jean Epstein, László Moholy-Nagy, Walter Benjamin, and Siegfried Kracauer, we find different ways of analyzing the aesthetic, epistemological, and political potential of images produced by a mechanical optical medium, the camera, which could extend vision beyond the limits of the human eye while introducing a new way of seeing from a decentered, non-human point of view.
Traces of the idea of a “machine vision” can be found in the “kino-eye” [kino-glaz] that captures and reorganizes the visible world through the two operations of optical recording and montage (Vertov 1923); in the “metal brain” [cerveau métallique] of a camera that is “a non-human eye, without memory, without thought” capable of “escaping the egocentrism of our personal viewpoint” (Epstein 1921 and 1926); in the “new vision” [Neue Vision] and the “impartial optics” [unvoreingenommene Optik] produced by the “productive” uses of the camera (Moholy-Nagy 1927); in the “new image worlds” and the “optical unconscious” [Optisch-Unbewußte] revealed by photography and cinema (Benjamin 1928 and 1935-36); and in the “unfeeling camera” that gives us access to the “alienated phenomena” of an “inert world […] in its independence from human beings” (Kracauer 1927 and 1949).

Beginning in the 1970s, the idea of a non-human “machine vision” is taken up in Paul Virilio's writings on the intertwining of military technologies and optical media (Virilio 1984 and 1988), in Vilém Flusser’s speculations on “technical images” and the “telematic society” (Flusser 1985), and in Friedrich Kittler’s radically non-anthropocentric account of the history of optical media (Kittler 1986 and 2002). It also appears in Harun Farocki’s explorations—in video installations such as Eye Machine I, II and III (2001-03) and Counter Music (2004)—of the realm of “operational images” that are “devoid of social intent”: images that are “not for edification” nor “for reflection” (as Farocki writes in the textual commentary that runs alongside the images of the Eye Machine series), but are conceived and produced purely as active means for technical operations.

Machine vision systems and machine-readable images need to be approached within such a historical perspective, without erasing the radical discontinuity that they introduce through their connection with machine learning technologies capable of handling datasets of unprecedented size. The rupture that such systems introduce in the history of optical media is so deep that the very terms “vision” and “image” risk becoming purely metaphorical. “Vision” here is a form of algorithmic processing based on different kinds of pixel-based pattern recognition. The term “image,” when it refers to a machine-readable image, designates what is actually a digital file, encoded in a specific file format (.jpg, .tiff, .png, .mp4, .mov, .avi, etc.), which can be accessed and processed even when it is not displayed on a screen as an image visible to human eyes (Paglen 2016).

Even though mostly invisible, machine-readable images are nevertheless active and operational. In this sense they may be considered the latest variations within a history of active images that has been explored by art historians and image theorists such as David Freedberg, W.J.T. Mitchell, and Horst Bredekamp (Freedberg 1991, Mitchell 2006, Bredekamp 2010). Through operations such as pixel counting, segmentation, sorting and thresholding, pattern recognition and discrimination, color analysis, object detection, and motion capture, machine vision systems introduce new kinds of “image acts” (Bredekamp 2010). These acts participate in the “feed forward” dynamic that Mark Hansen has identified as a defining trait of “twenty-first-century media” (Hansen 2015).

As noted above, the COVID-19 pandemic has further accelerated the deployment of such systems in the public sphere. Heat cameras have been installed in public spaces in order to quickly identify bodies with unusually high temperatures. Unmanned vehicles such as drones and robots have been equipped with cameras connected to machine vision systems in order to enforce social distancing and the wearing of masks. Speaking drones appeared first in China and then in various other countries, while a sinister robot dog has been roaming through public parks in Singapore. A similar robot had already appeared in TV series such as Fox’s War of the Worlds (2019, episode 4) and Netflix’s Black Mirror (2017, season 4, in the episode entitled Metalhead). Matrix barcodes have been integrated into mobile phone apps meant to facilitate contact tracing. In China, green, yellow, or red QR codes displayed on mobile phones were used to discipline the movements of the population, allowing or prohibiting travel and access to specific places.

The social and political implications of the wide-ranging uses of machine vision technologies during the COVID-19 pandemic cannot be overestimated. Some countries had already adopted massive measures of social surveillance, such as China with its much-discussed Social Credit System, first tested in 2009 and increasingly expanded since 2014. In these countries the pandemic has provided the perfect pretext to further increase the means of surveillance and repression, even though the actual effectiveness and pervasiveness of such means still need to be assessed. In most other countries, reaching an equilibrium between respect for personal privacy and management of the health crisis, with all that this means in terms of positive case detection, contact tracing, and the surveillance of quarantines, has proven arduous. It is still the object of political negotiations that differ from country to country.

A century ago, during the 1920s and 1930s, the non-human, non-anthropocentric “machine vision” of the camera was hailed as an instrument of liberation. It was seen as a means of exploring a visible world that could be reinterpreted and reorganized from a revolutionary standpoint (Vertov), rediscovered in its vitalism and animism (Epstein), detached from its connection with the structures of the human mind (Moholy-Nagy), penetrated down to layers inaccessible to the human eye (Benjamin), and caught in its uncanny indifference to the existence of human beings (Kracauer). Half a century later, during the 1970s and 1980s, the rise of automation in both the industrial and the military domains brought to the foreground another aspect of the idea of machine vision: the possibility of using techniques of automated image analysis within complex sequences of operations that did not require any human agency. It was this turn that Farocki highlighted with his video installations of the early 2000s and with his highly influential concept of the “operational image.”

A further step leads us from the early 2000s to the current uses of machine vision: the connection between digital technologies of image analysis and the immense datasets that are accessible through the internet and can be processed through artificial intelligence and machine learning. This last step transforms the very idea of machine vision into a complex set of operations capable of turning the digital iconosphere into a vast field for data mining. The present and future applications of such an algorithmic gaze are extremely varied and still to be discovered. One should therefore resist the temptation to see the increasing deployment of machine vision systems merely as yet another step toward a condition of panoptic surveillance, since easier access to machine vision technologies might also promote new, unpredictable applications. One example comes from the 2019 Whitney Biennial. There, the London-based agency Forensic Architecture, led by Eyal Weizman, used computer vision systems to automatically detect the use against civilians of the Triple-Chaser, a tear gas grenade produced by the company Safariland. Safariland's CEO, Warren B. Kanders, happened to be vice-chair of the board of trustees of the Whitney Museum of American Art. This use of machine vision technologies by an independent, non-governmental investigative agency showed how such technologies, when openly accessible, could serve political goals far removed from those of policing, surveillance, or the extraction of data from social media platforms.

The COVID-19 pandemic has confirmed once more the plasticity of machine vision systems, triggering a wide spectrum of applications ranging from social surveillance to diagnostics. The invisible spread of the virus has been countered through a non-human, algorithmic gaze capable of seeing and processing vast quantities of images that human eyes could never handle. In the context of a health crisis that required bodies to be distanced and data about bodies to be aggregated, machine vision systems took part in a vast effort of data collection and analysis. That effort will certainly leave significant traces in the years to come, and its consequences are still hard to predict.

References

Baudelaire, Charles. 1859. “On Photography.” In The Mirror of Art. London: Phaidon Press, 1955.

Benjamin, Walter. 1928. “News about Flowers.” In The Work of Art in the Age of Its Technological Reproducibility and Other Writings on Media, ed. by Michael W. Jennings, Brigid Doherty, and Thomas Levin. Cambridge and London: Harvard University Press, 2008, pp. 271-273.

Benjamin, Walter. 1935-36. “The Work of Art in the Age of Its Technological Reproducibility.” In The Work of Art in the Age of Its Technological Reproducibility and Other Writings on Media, ed. by Michael W. Jennings, Brigid Doherty, and Thomas Levin. Cambridge and London: Harvard University Press, 2008, pp. 19-55.

Bredekamp, Horst. 2010. Theorie des Bildakts. Über das Lebensrecht des Bildes. Frankfurt: Suhrkamp (English translation: Image Acts. A Systematic Approach to Visual Agency, Berlin: De Gruyter, 2017).

Bullock, Joseph, et al. 2020. “Mapping the Landscape of Artificial Intelligence Applications against COVID-19.” arXiv:2003.11336. https://arxiv.org/abs/2003.11336 (accessed 19 July 2020).

Crawford, Kate, and Trevor Paglen. 2019. “Excavating AI: The Politics of Training Sets for Machine Learning.” https://excavating.ai (accessed 29 June 2020).

Epstein, Jean. 1921. Bonjour Cinéma. In Écrits sur le cinéma 1921-1953, tome 1 (1921-1947). Paris: Seghers, 1974, pp. 71 ff.

Epstein, Jean. 1926. “L’Objectif lui-même.” In Écrits sur le cinéma 1921-1953, tome 1 (1921-1947). Paris: Seghers, 1974, pp. 127-130.

Flusser, Vilém. 1985. Ins Universum der technischen Bilder. Göttingen: European Photography. (English translation: Into the Universe of Technical Images, Minneapolis, London: University of Minnesota Press, 2011).

Freedberg, David. 1991. The Power of Images. Chicago: University of Chicago Press.

Hansen, Mark. 2015. Feed Forward: On the Future of Twenty-First-Century Media. Chicago: University of Chicago Press.

Kittler, Friedrich. 1986. Grammophon Film Typewriter. Berlin: Brinkmann und Bose. (English translation: Gramophone, Film, Typewriter. Stanford: Stanford University Press, 1999).

Kittler, Friedrich. 2002. Optische Medien. Berliner Vorlesung 1999. Berlin: Merve. (English translation: Optical Media. Cambridge: Polity Press, 2010).

Kracauer, Siegfried. 1927. “Photography.” In The Mass Ornament: Weimar Essays. Cambridge and London: Harvard University Press, 1995.

Kracauer, Siegfried. 1949. “Tentative Outline of a Book on Film Aesthetics.” In Siegfried Kracauer – Erwin Panofsky, Briefwechsel 1941-1966, with an appendix: Siegfried Kracauer “under the spell of the living Warburg tradition,” edited and with an afterword by Volker Breidecker. Berlin: Akademie-Verlag, 1996.

Larousserie, David. 2020. “Coronavirus: comment l’intelligence artificielle est utilisée contre le Covid-19.” Le Monde, 18 May 2020. https://www.lemonde.fr/sciences/article/2020/05/18/comment-l-intelligence-artificielle-se-mobilise-contre-le-covid-19_6040046_1650684.html (accessed 19 July 2020).

Mitchell, W.J.T. 2006. What Do Pictures Want? The Lives and Loves of Images. Chicago: University of Chicago Press.

Moholy-Nagy, László. 1927. Malerei Fotografie Film. Berlin: Gebr. Mann Verlag, 1986.

Paglen, Trevor. 2016. “Invisible Images (Your Pictures Are Looking at You).” The New Inquiry, 8 December 2016.

Somaini, Antonio. 2020. “From Time Machines to Machine Visions.” In Antonio Somaini, with Eline Grignard and Marie Rebecchi, Time Machine: Cinematic Temporalities. Milan: Skira, 2020.

Vertov, Dziga. 1923. “Kinoks: A Revolution.” In Kino-Eye: The Writings of Dziga Vertov, edited and with an introduction by Annette Michelson. Berkeley, Los Angeles, London: University of California Press, 1984.

Virilio, Paul. 1984. Guerre et cinéma I. Logistique de la perception. Paris: Cahiers du cinéma.

Virilio, Paul. 1988. La Machine de vision. Paris: Galilée.