Avatars Squared: Reality & Privacy

Trapped Between Deep Fake Face Videos and Face Recognition

Aug 18, 2019

Potemkin Protocol theme:

"Everything is up for grabs when it comes to what is 'real' lately."

Avatars Squared: Digital and Real Life between Fake Videos and Face Recognition

Avatars Squared is my label for one corner of the Potemkin Protocol theme.

The need to record, recognize and reproduce images of human faces with masks predates photographic and video technology. There were two kinds: “theater masks” and “death masks”. These masks were the technological ancestors of today’s Deep Fakes and Facial Recognition technologies. How we deal with how our faces are seen, and how our stories are remembered, has been elevated to new levels of excitement and anxiety.

Some history first.

Greek and Roman theater relied upon masks for reasons we can still relate to:

The Ancient Greek term for a mask is prosopon (lit., "face"), and was a significant element in the worship of Dionysus at Athens, likely used in ceremonial rites and celebrations.

the mask is known to have been used since the time of Aeschylus and considered to be one of the iconic conventions of classical Greek theatre.

[Various] paintings never show actual masks on the actors in performance; they are most often shown being handled by the actors before or after a performance, that liminal space between the audience and the stage, between myth and reality.

Effectively, the mask transformed the actor as much as memorization of the text. Therefore, performance in ancient Greece did not distinguish the masked actor from the theatrical character.

The mask-makers were called skeuopoios or "maker of the properties," thus suggesting that their role encompassed multiple duties and tasks….

Masks had a special-effects importance in Roman theater. They were large, with distorted features, designed for acoustics, and they changed how actors signaled emotional states.

Roman masks came in various shapes, sizes and styles.

By utilizing paint and ornamentation to create a more vivid mask, people in the audience were better able to identify with a character.

“Masks require a wholly different style of acting in order to communicate what is lost by the lack of facial expression; the masked actor must compensate by using a complementary physicality. Rather than limiting the actor’s ability to transfer complex text or emotions, however, the stylized movements of a masked actor bring a new dimension of expression, as well as a new language, to the performance.”

The “Death Mask” was another mask used by the Romans - to record, reproduce and remember the images of actual people. The techniques had a lifelike recognizable effect. The makers were creating a persistent memory for others. Centuries later it was used for research, celebrities, criminals, and attempted classification of people.

The lifelike character of Roman portrait sculptures has been attributed to the earlier Roman use of wax to preserve the features of deceased family members (the so-called imagines maiorum). The wax masks were subsequently reproduced in more durable stone.

Death masks were increasingly used by scientists from the late 18th century onwards to record variations in human physiognomy. The life mask was also increasingly common at this time, taken from living persons. Anthropologists used such masks to study physiognomic features in famous people and notorious criminals. Masks were also used to collect data on racial differences.

(Wikipedia image of “Resusci Anne”, a 19th Century death mask of an unidentified young woman found drowned in the Seine River. A morgue worker made a cast of her face, saying "Her beauty was breathtaking, and showed few signs of distress at the time of passing. So bewitching that I knew beauty as such must be preserved." “Anne” was also the face of the world’s first CPR training mannequin in 1960.)

In the 21st Century, we have a new global theater, which comes with the distortion of reality, including the celebrity of imagined idealized people who never existed but who are more real than reality and as familiar as family. It comes as well with recording, remembering and recognizing of real people who may be complete strangers to us.

Themes to keep in mind with the 21st Century’s answer to theater & death masks:

Fake people who never existed in real life but who influence real people.
Real people used like virtual puppets in realistic stories that never happened.
Virtual Influencers ruling social networks and sharing glamorous virtual lives.
Virtual Assistants, lifelike, in your homes, businesses & lives.
Massive libraries of new content, including animation and animated people.

One of the foremost leaders in this space, Hao Li, profiled by MIT’s Technology Review as the “World’s Top Deep Fake Artist”, a prankster with a PhD from Zurich, credited Hollywood for setting him on the path of making unreality very real:

It was 1993, when he saw a huge dinosaur lumber into view in Steven Spielberg’s Jurassic Park. As the actors gawped at the computer-generated beast, Li, then 12, grasped what technology had just made possible. “I realized you could now basically create anything, even things that don’t even exist,” he recalls.

Fans of the “Fast & the Furious” film franchise have seen Dr. Li’s work, which enabled filmmakers to recreate the image of actor Paul Walker, who had been killed in a tragic car accident. Li also contributed to software called FaceShift that was later acquired by Apple and would ultimately make its way to iPhone users’ Animojis. When he’s not busy elevating Hollywood, he runs Pinscreen, a startup that can convert a simple photograph into a 3D avatar using machine learning and GANs (generative adversarial networks).

The profile also reported Dr. Li’s concerns about this technology’s dark side. He’s not a mad scientist but he is aware that a virtual monster has been brought to life and he is alerting the villagers that it’s on the loose in the countryside.

(The term GANs comes from a 2014 machine learning paper by Ian Goodfellow and other researchers, called “Generative Adversarial Networks”.)

In a more harmless quarter of the Avatars Squared theme, MIT’s Technology Review reported an example of how GANs, used by Dr. Li for Hollywood, were also used in a project called “AIportraits” to transform photographs into classical paintings. The article also provided a simple explanation of how GANs work.

[GANs (Generative Adversarial Networks) get] two neural networks to duel each other to produce an acceptable outcome: a generator, which looks at examples and tries to mimic them, and a discriminator, which judges if they are real by comparing them with the same training examples.
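For the curious, that duel can be sketched in plain Python. This is a toy under loud assumptions: the "real examples" are just numbers drawn around 4.0 rather than face images, the generator is a two-parameter line, and the discriminator is a single logistic unit. The names `Generator`, `Discriminator` and `train`, and every setting below, are my own illustrative choices, not code from Nvidia's StyleGAN or from the Goodfellow paper.

```python
import math
import random


def sigmoid(t):
    # Clamp the input so math.exp never overflows.
    t = max(-60.0, min(60.0, t))
    return 1.0 / (1.0 + math.exp(-t))


class Discriminator:
    """Judge: D(x) = sigmoid(w*x + c), its estimate that x is a real example."""

    def __init__(self):
        self.w, self.c = 0.1, 0.0

    def prob_real(self, x):
        return sigmoid(self.w * x + self.c)

    def update(self, real_x, fake_x, lr):
        # One gradient-ascent step on log D(real) + log(1 - D(fake)).
        p_real = self.prob_real(real_x)
        p_fake = self.prob_real(fake_x)
        self.w += lr * ((1.0 - p_real) * real_x - p_fake * fake_x)
        self.c += lr * ((1.0 - p_real) - p_fake)


class Generator:
    """Forger: G(z) = a*z + b turns random noise z into a fake example."""

    def __init__(self):
        self.a, self.b = 1.0, 0.0

    def sample(self, z):
        return self.a * z + self.b

    def update(self, z, disc, lr):
        # One gradient-ascent step on log D(G(z)): nudge the fake toward
        # whatever the judge currently rates as "more real".
        x = self.sample(z)
        d_log_d = (1.0 - disc.prob_real(x)) * disc.w  # d/dx of log D(x)
        self.a += lr * d_log_d * z
        self.b += lr * d_log_d


def train(steps=2000, lr=0.01, real_mean=4.0, real_std=0.5, seed=0):
    """Alternate judge and forger updates; all settings are assumptions."""
    rng = random.Random(seed)
    gen, disc = Generator(), Discriminator()
    for _ in range(steps):
        real_x = rng.gauss(real_mean, real_std)
        fake_x = gen.sample(rng.gauss(0.0, 1.0))
        disc.update(real_x, fake_x, lr)           # judge learns to tell them apart
        gen.update(rng.gauss(0.0, 1.0), disc, lr)  # forger learns to fool the judge
    return gen, disc


if __name__ == "__main__":
    gen, disc = train()
    print("generator offset b:", gen.b)
```

The point of the toy is only the back-and-forth: each side improves against the other's latest version. Real image GANs swap these two tiny functions for deep neural networks, and training them is far less stable than this sketch suggests.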

We non-PhDs can make home-grown 2D virtual avatars with a tiny plugin. Courtesy of Uber engineer Phillip Wang, who worked with code from Nvidia called “StyleGAN”, we can play with GANs to create virtual avatars. (For a bit more on the history of GANs, see a historical piece from the site KDnuggets.)

"This Person Does NOT Exist"

Unique AI-Generated Faces Placeholders Sketch Plugin, By Sliday

The tiny Sketch plugin that generates unique AI faces – data content placeholders – using thispersondoesnotexist.com.

The directions are frighteningly simple:

"How to use

1. Download, unpack & double-tap on the file

2. Select shape

3. Layer → Data → AI-Generated Faces

4. Wait...

5. Get AI-generated face for your image fill”

Nvidia’s paper “A Style-Based Generator Architecture for Generative Adversarial Networks” covers the code used. They also have an explainer video.

A far bigger example, known by at least 1 million people on Instagram, would be “Miquela Sousa” a/k/a “Lil’Miquela”, “a 19-year-old Brazilian-American model, musical artist, and influencer with over a million Instagram followers, who is computer-generated… an avatar puppeteered by Brud, a mysterious L.A.-based start-up of ‘engineers, storytellers, and dreamers’ who claim to specialize in artificial intelligence and robotics.” Such was the attachment of Lil’Miquela’s followers to this virtual influencer that when her Instagram account was hacked and taken over, her fans cried, “Bring back the real Miquela!”

It will get easier to create something out of nothing & a “someone” from everyone.
There are already large scale versions of artificial virtual avatars who walk the line between animation and animate entities. Another example is “Hatsune Miku”.

(Image of Hatsune Miku, from a sold-out concert in London. She is worldwide.)

Michael Dempsey wrote an epic on AI & animation, “Animation Is Eating the World”:

In 2007, Hatsune Miku, an animated pop star by Crypton Future Media was created using Yamaha's vocaloid software. While her voice is modeled after a certain actress and her personality is dictated by her creators, her music and surrounding storyline is largely crowdsourced.

The stories she tells, emotions she conveys, and content she creates, enabled by technology, means something different to everyone.

Hatsune Miku is quite literally telling a different story imagined by a different person each time. Or as Linh K. Le puts it in his paper on the rise of Hatsune Miku:
“When these songs are performed on stage, fans feel as if they are celebrating the success of someone they see as being one of them. In this perspective, Miku is not so much a virtual pop-star but rather a symbol of the collective efforts that culminated in a concert-style celebration.”
Hatsune Miku is rumored to have sold over $1B worth of merchandise since her inception.

Dempsey focused on animation content but this same tech also has other implications.

Right now the mass market variety includes filters on current messaging apps - like fun-house mirrors we can use to create altered versions of ourselves to share. What happens when we can apply filters to the world around us, via augmented reality?

Magic Leap’s “Magic Leap One” release came with a virtual assistant called Mica.

These use cases will unfortunately be overshadowed by weaponized tech to blackmail real humans, commit fraud and topple politicians and governments.
An August 2018 research paper, from a team primarily from Germany’s Max Planck Institute for Informatics, entitled “Deep Video Portraits”, explains how easy it is:

a novel approach that enables photo-realistic re-animation of portrait videos using only an input video.

In contrast to existing approaches that are restricted to manipulations of facial expressions only, we are the first to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor.

The core of our approach is a generative neural network with a novel space-time architecture.

The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for a given target actor. The realism in this rendering-to-video transfer is achieved by careful adversarial training, and as a result, we can create modified target videos that mimic the behavior of the synthetically-created input.

In order to enable source-to-target video re-animation, we render a synthetic target video with the reconstructed head animation parameters from a source video, and feed it into the trained network - thus taking full control of the target. With the ability to freely recombine source and target parameters, we are able to demonstrate a large variety of video rewrite applications without explicitly modeling hair, body or background.

For instance, we can reenact the full head using interactive user-controlled editing, and realize high-fidelity visual dubbing.

To demonstrate the high quality of our output, we conduct an extensive series of experiments and evaluations, where for instance a user study shows that our video edits are hard to detect.
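The "freely recombine source and target parameters" step in that abstract can be pictured with a small Python sketch. It assumes a made-up `FaceParams` record holding the quantities the abstract lists, plus an `identity` field standing in for everything that stays with the target actor. The field names, types and the `reenact` function are hypothetical illustrations, not the paper's code, and the hard parts (reconstructing these parameters from video, then rendering photo-realistic frames with the trained network) are left out entirely.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class FaceParams:
    """Hypothetical per-frame parameters of a parametric face model."""
    identity: str                               # whose face it is (assumed field)
    head_position: Tuple[float, float, float]
    head_rotation: Tuple[float, float, float]
    expression: Tuple[float, ...]
    eye_gaze: Tuple[float, float]
    eye_blink: float


def reenact(source: FaceParams, target: FaceParams) -> FaceParams:
    """Keep the target's identity, take all motion from the source actor."""
    return replace(
        target,
        head_position=source.head_position,
        head_rotation=source.head_rotation,
        expression=source.expression,
        eye_gaze=source.eye_gaze,
        eye_blink=source.eye_blink,
    )


if __name__ == "__main__":
    actor = FaceParams("source actor", (0.0, 0.1, 0.0), (5.0, 0.0, 0.0),
                       (0.8, 0.2), (0.1, -0.1), 1.0)
    portrait = FaceParams("target actor", (0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                          (0.0, 0.0), (0.0, 0.0), 0.0)
    print(reenact(actor, portrait))
```

The unsettling part is how little the swap itself involves: once both faces are reduced to the same list of numbers, putting one person's movements on another person's face is a matter of copying fields.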

Futurism recounted that this was “presented at the VR filmmaking conference SIGGRAPH in August [2018], the team ran a number of tests comparing its new algorithm to existing means of manipulating lifelike videos and images, many of which have been at least partially developed by Facebook and Google. Their system outperformed all the others, and participants in an experiment struggled to determine whether or not the resulting videos were real.”

Here is the SIGGRAPH 2018 video showing how AI can manipulate videos of real people.

“Hard to detect.” One of the implications could be “Ransomware”.

Paul Andrei wrote a piece on DeepFake technology and what we are faced with (no pun intended).

Ransomware is a type of malicious software that blocks access to the victim’s files and threatens to permanently delete them unless a ransom is paid.

“DeepFake technology has not only allowed people to swap faces for the fun of it.

The technique has been repeatedly used to generate fake intimate content, impersonate politicians so that they held extremist speeches, and incriminate people of certain offenses which never actually happened.

Digital content forgery can bring about huge amounts of damage, both financial and psychological, in countless ways. It only takes several images or videos with faces and some awful video content.”

“existing tools and techniques can be used in exceedingly disturbing ways in order to pursue dark agendas.”

FAKE VIDEOS are only getting better.
Virtual people are only getting more realistic and influential.
Real people are losing more control over both the reality & narrative of their lives.

Another consequence is that there will be a demand for detectives and analysts who can work through what is real, fake, actual and virtual, like Jonathan Albright:

Working at all hours of the night digging through data, Albright has become an invaluable and inexhaustible resource for reporters trying to make sense of tech titans’ tremendous and unchecked power.

Not quite a journalist, not quite a coder, and certainly not your traditional social scientist, he’s a potent blend of all three—a tireless internet sleuth with prestigious academic bona fides who can crack and crunch data and serve it up in scoops to the press.

Earlier this year, his self-published, emoji-laden portfolio on Medium was shortlisted for a Data Journalism Award, alongside established brands like FiveThirtyEight and Bloomberg.

Seeing comedian Bill Hader’s face morph into Tom Cruise while he does an impersonation on a talk show is unnerving. Both funny & scary.

At the other end of the Avatars Squared spectrum, aside from fake people and fake videos, is a very real worry about what else will be done with the images of real people. Real people are poised to lose both privacy and control over personal information and personal property due to facial recognition.

Last week it was reported in the South China Morning Post that facial recognition for traffic control and offenders would take on a new level:

China’s traffic police are pushing ahead with a nationwide network of facial recognition surveillance cameras to deal with rule violations despite rising global anxiety over the new technology’s impact on privacy.

The central police authority announced… that its platform successfully identified 126,000 suspect vehicles without a valid licence last year, for example. They now want to widen the network so that information on suspicious traffic activity can be shared with other cities and provinces.
The facial recognition technology checks the faces of drivers and vehicle details against a database, helping to verify the identity of wrongdoers much more quickly and improving the accuracy of traffic violation management, said Sun Zhengliang, Secretary of the Party Committee at the Ministry of Public Security Traffic Management Science Research Institute, in a traffic security forum on Wednesday in Hefei, Anhui province.

Shenzhen-based AI firm Intellifusion has been providing face-scanning technology to the city’s traffic police since 2018.

In Handan in Hebei province, local police have teamed up with Guangzhou-based AI start-up Gosunyun Robot to introduce robots to help direct traffic and provide guidance to drivers.

The news about the protests in Hong Kong that has managed to trickle through on the Internet gives the world a glimpse of what is now an extreme example of face recognition’s implications for people’s lives. The recent protests were surreal in their futuristic, dystopian edge: groups of masked protesters fired green laser lights to prevent riot police, who were wearing armor and deploying tear gas within subway stations, from using their face recognition technology.

It’s not just governments with access to this tech - it’s very much out in the open.

Here’s a project called OpenFace, supported by the National Science Foundation (NSF) under grant number CNS-1518865. Additional support was provided by the Intel Corporation, Google, Vodafone, NVIDIA, and the Conklin Kistler family fund. OpenFace provides: “Free and open source face recognition with deep neural networks.”

Face recognition technology, implicitly understood for its weaponized potential, continues to be developed by companies for its commercial potential in new and unusual ways. Without it we couldn’t tag and share pics of friends and family. One example: smile for the camera so you can complete your purchase, if Alibaba’s Alipay has anything to say about it.

The Economist highlighted that not everyone is on board with this tech trend:

America’s Department of Homeland Security reckons face recognition will scrutinise 97% of outbound airline passengers by 2023.

A backlash, though, is brewing.

The authorities in several American cities, including San Francisco and Oakland, have forbidden agencies such as the police from using the technology. In Britain, members of parliament have called, so far without success, for a ban on police tests.

Refuseniks can also take matters into their own hands by trying to hide their faces from the cameras or, as has happened recently during protests in Hong Kong, by pointing handheld lasers at CCTV cameras to dazzle them. Meanwhile, a small but growing group of privacy campaigners and academics are looking at ways to subvert the underlying technology directly.

Enter Refusenik Fashion. Anti-recognition products include the fashion line Adversarial Fashion, whose patterns are “designed to trigger Automated License Plate Readers, injecting junk data in to the systems used by the State and its contractors to monitor and track civilians and their locations.”

While some authorities tout facial recognition’s value to fight crime, others want to fight its negative potential for civil liberties and privacy. Recode reports that bi-partisan legislation in the U.S. Congress may be in the works:

Rep. Elijah Cummings (D-MD) and Rep. Jim Jordan (R-OH), plan this fall to introduce a new bipartisan bill on facial recognition, according to representatives from both legislators’ offices. The specifics of the bill are still being hashed out, but it could include issuing a pause on the federal government’s acquisition of new facial recognition technology….

This is the beginning of the latest drama on a stage filled with new kinds of theatrical and “death” masks. Do we decide what masks to wear or shall that be up to the powers that be?

If the powers that be have second sight, with a panopticon which never sleeps, does that mean we must fight for the lesser evil of at least controlling how we are seen, are remembered and known? Full faith > fake faces? Or do we retreat, disguise ourselves, and take back our roles on the stage with new masks?

Our faces are everything, and we’re about to escalate what happens when it comes to remembering and mis-remembering. The battle has been joined - the tension between reality, liberty, privacy and security must be resolved.

I close with a quote from Bruce Sterling in an interview. He commented about design fiction, that brand of science fiction and futurism produced as a kind of consulting content product. I think it applies to Avatars Squared scenarios:

“Fake news” often reads, or looks on video, like weaponized design fiction.

(This is another piece that will be edited and updated as a living piece.)

The Big Stack
