In recent years, advances in machine learning have enabled hyper-realistic synthesis, recreation, and modification of images, audio, and video of people. This new class of synthetic media, also known as AI-generated media, generative media, or “deepfakes,” raises many questions about how technology distorts our perception of reality, news, and information. Rooted in the ethical and epistemological discussion of synthetic media technology, this class aims to raise awareness of the emerging topic of deepfakes by exploring the techniques involved as well as the potential consequences of the technology. The class will also explore positive use cases of deepfakes in artistic expression, entertainment, and learning. Students will try their hand at creating deepfakes and engage in discussions that will culminate in a final project informed by their personal interests and research.
The class will include weekly presentations by prominent speakers from MIT, Harvard, Stanford, Microsoft Research, Adobe, Hollywood (!), and other companies and organizations.
*An application is required, and participation will be limited (we aim to have around 25-30 students in the class).
*We will give priority to MIT Media Lab students and to students who take the class for credit.
Students’ Final Projects
Deepfaking Along the River During Qingming Festival
Zheng Ren, Yuebin Dong, and Jianyu Li
In “Deepfaking Along the River During the Qingming Festival,” we use a deepfake technique, the first order motion model, to animate the figures in the famous traditional Chinese painting and to deepfake the faces of three famous poets. We lead the audience through the long scroll in video form in order to intuitively show the historic scenes and lifestyles of the Northern Song Dynasty and to promote traditional Chinese poetry culture.
Maggie Chen, Noah Deutsch, and Erica Luzzi
The rise of digital communication platforms has made it easier than ever for people to connect with others and meet new people in novel ways. However, as a result of this increased access, digital interactions often lack the intimacy and authenticity of in-person interactions. This is especially true of digital dating, where gamified dating platforms have created a new normal that strongly favors quantity over quality, and the intimate, honest experience of face-to-face conversation has gotten lost over the broadband connection.
Ous Abou Ras
The aim of this project is to explore different methods of generating rendered figures that resemble a person. The figure is generated from a simple 3D human-like model that can easily be manipulated to produce different poses and animations as needed. This production workflow mainly explores different aesthetics of cutout figures to be placed in renders.
Pix2Pix Hanbok Generator
This project explores the process of using machine learning to generate synthetic designs of women’s hanbok, traditional Korean clothing. A custom hanbok dataset was created and run through a Generative Adversarial Network (GAN) to reveal patterns in the input data, which were then used to train the network, producing synthetic images derived from those patterns. The generator followed the Pix2Pix ‘Sketches to Objects’ method, which required image-to-image translation: training the GAN on contour images of hanbok paired with the original images.
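For readers curious about the objective behind Pix2Pix-style training, the generator is optimized with an adversarial term plus a λ-weighted L1 reconstruction term against the paired target image. Below is a minimal numpy sketch of that loss (an illustration of the published Pix2Pix objective, not the project’s actual code; the function and variable names are our own):

```python
import numpy as np

def pix2pix_generator_loss(disc_fake_logits, fake_image, real_image, lam=100.0):
    """Pix2Pix generator objective: a non-saturating adversarial term
    (binary cross-entropy against 'real' labels) plus a lambda-weighted
    L1 reconstruction term between the generated and target images."""
    # Adversarial term: push the discriminator's output on fakes toward "real"
    probs = 1.0 / (1.0 + np.exp(-disc_fake_logits))  # sigmoid
    adv = -np.mean(np.log(probs + 1e-12))
    # L1 term: pixel-wise reconstruction against the paired target image
    l1 = np.mean(np.abs(fake_image - real_image))
    return adv + lam * l1

# Toy example: a 4x4 "contour -> hanbok" training pair
rng = np.random.default_rng(0)
fake = rng.random((4, 4))
real = rng.random((4, 4))
logits = np.zeros(4)  # discriminator is maximally unsure (p = 0.5)
print(pix2pix_generator_loss(logits, fake, real))
```

The large default λ (100 in the original Pix2Pix paper) is what keeps outputs close to the paired target, while the adversarial term adds realistic texture.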
Combating Disinformation Deepfakes
Andrew Wong and Sarah Kovar
Deepfakes have begun to proliferate in the public domain. Disinformation campaigns have used fake images and pictures for years, and the sophistication of those tactics has been accelerated by the development of deepfakes. Governments are concerned about the ramifications of deepfake-enabled fake news for public discourse and democracy. Current research on combating deepfakes focuses on identifying deepfakes and warning the public, and/or taking them down. While the accuracy of deepfake detection tools is increasing, it has a long way to go. Additionally, warnings about fake news have not yielded the results that policymakers and the technology sector anticipated. Little research has so far applied cognitive science strategies to aid the public in their own detection of deepfakes. Applying “The Poison Parasite Counter (PPC),” we propose that the best way to do this is an educational tool that lets you create a deepfake of yourself, demonstrating the ease with which deepfake videos can be developed. The goal is to help the public question the videos they see in their newsfeeds.
While still a nascent technology, generative adversarial networks (GANs) have increased substantially in both fidelity and popularity over the past few years. An exciting application of GANs for creative technologists and designers is the generation of novel image outputs by training GANs on existing artistic styles. The increasing fidelity of artistic GANs raises broad questions about whether GANs are tools or artistic agents, who should get credit for generated artwork, and what constitutes art in an age where convincing digital “forgery” is possible. The body of work described in this paper focuses on one portion of this broader discourse: human perception and appreciation of artistic GAN outputs. Through novel interface design and evaluative research, this work explores the following questions: At what point does discernment between human- and GAN-created artwork become challenging for humans, and does this point vary with art style? Does the appreciation of GAN-generated artwork vary by art style and change with the complexity of the GAN output?
Faking from Slow to Soft: Creative & Critical Interventions of DeepDream
Kwan Queenie Li
This work describes a creative and critical exploration of the DeepDream computer vision program, particularly along parameters of image dissemination: its physicality (from digital to physical, paper and fabric), its circulation speed (from light-speed to an analogue path), and its application to uncover development directions for generative images that could be pro-social and benefit grassroots empowerment. Current applications of deepfakes center mainly on fake news and fake pornography: of the 14,698 deepfake videos identified on the internet at the end of 2019, 96% were nonconsensual pornography. There is a growing call to explore the positive ends of deepfakes. In response, this work treats DeepDream as a theoretical theme beyond its technological meaning, and delves into notions of ‘machine dreaming’ in search of an alternative agency, one that might engender a positive outlook and serve as a powerful instrument through its contemplative capacity.
Speech Affect Translation Using Cycle-Consistent Generative Adversarial Networks
We show that it is possible to convert the emotional property of speech utterances by using a cycle-consistent generative adversarial network (CycleGAN) that modifies mel-spectrogram speech representations. We assess the conversion effect on an emotional speech dataset with five emotions (“angry”, “fearful”, “happy”, “sad”, and “neutral”) using a machine emotion recognition classifier. We see that all source emotions show signs of transferring to the target emotions, with some emotion pairs being more amenable to conversion than others. While we have not yet systematically assessed the human perception of the emotion in these modified speech utterances, we have implemented a MelGAN-based raw waveform reconstruction step so that the modified audio can be heard by human ears.
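The constraint that lets a CycleGAN train without paired utterances is the cycle-consistency loss: translating a mel-spectrogram to the target emotion and back should recover the original. A minimal numpy sketch of that term follows (an illustration of the standard CycleGAN objective, not this project’s code; function and variable names are our own):

```python
import numpy as np

def cycle_consistency_loss(mel, mel_cycled):
    """L1 cycle loss: mean |G_BA(G_AB(x)) - x| over the mel-spectrogram."""
    return np.mean(np.abs(mel_cycled - mel))

def cyclegan_objective(adv_ab, adv_ba, mel_a, mel_a_cycled,
                       mel_b, mel_b_cycled, lam=10.0):
    """Full objective: two adversarial terms (one per translation direction)
    plus lambda-weighted forward and backward cycle-consistency terms."""
    cyc = (cycle_consistency_loss(mel_a, mel_a_cycled)
           + cycle_consistency_loss(mel_b, mel_b_cycled))
    return adv_ab + adv_ba + lam * cyc

# Toy mel-spectrograms: 80 mel bins x 32 frames
mel_neutral = np.zeros((80, 32))
mel_after_round_trip = np.full((80, 32), 0.1)  # imperfect reconstruction
print(cycle_consistency_loss(mel_neutral, mel_after_round_trip))
```

The λ weight (10 in the original CycleGAN paper) trades off realism of the converted emotion against fidelity to the source utterance.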
Peitong Chen, Olivia Seow, and Alicia Guo
Visual Poetry is an interactive platform where users can generate and stylize imagery by inputting words, sentences, or poetry. Using AttnGAN trained on the COCO dataset, the platform translates words or sentences into machine-learning-generated images that abstractly resemble the input. To reinforce the emotion evoked by the poetry and improve cohesion, the user can choose to stylize the imagery. Using the integrated style-transfer methods and models of various art styles that correlate strongly with certain emotions, each image can be manipulated together or individually. The output, a series of images alongside the original text input, is presented to the user at the end as their personalized visual poetry.
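The stylization step builds on the neural style-transfer idea of comparing Gram matrices of feature maps: two images share a style when their feature channels co-activate in similar proportions. Here is a minimal numpy sketch of that style loss (a generic illustration of the Gatys-style formulation, not the platform’s code; names are our own):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels x positions) feature map; its entries
    capture which feature channels co-activate, a common proxy for 'style'."""
    return features @ features.T / features.shape[1]

def style_loss(feat_generated, feat_style):
    """Squared Frobenius distance between the two Gram matrices."""
    g1, g2 = gram_matrix(feat_generated), gram_matrix(feat_style)
    return np.sum((g1 - g2) ** 2)

# Toy feature maps: 3 channels x 16 spatial positions
rng = np.random.default_rng(1)
f = rng.random((3, 16))
print(style_loss(f, f))  # identical features -> zero style loss
```

In practice the features come from several layers of a pretrained convolutional network, and the style loss is summed across layers alongside a content loss.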
Living in an age of technological overload and endless hours of video calls, we find ourselves suddenly hyper-exposed to our facial expressions, reactions, and unique tendencies. This medium, along with numerous social outlets, has enabled us to gain a stronger understanding of the sentic modulation we exhibit to the world. However, our voice is rarely analyzed by our own auditory perception. There has been a significant amount of research in the field of speech emotion recognition, but little application of how technology can account for these vocal cues and alter them to match our desired auditory output. This project proposes a new system of personalized voice modulation that alters prosody in speech by interpreting individuals’ neurological feedback while they speak.
Paul Gibby and Raul Alcantara
Writing is one of the most important skills a person can have, but despite years of schooling and training, many regard it as quite difficult. In this paper we present a machine learning system based on encoder-decoder models that permutes the wording of sentences while maintaining semantic meaning, for use as a writing tool. Our key contribution derives from a deepfake-inspired technique of manipulating encoded representations to combine the semantics of one sentence with the high-level features (such as style) of another. We further demonstrate use cases for novel writing in the style of a given sample, divorced from any particular semantic meaning.
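The manipulation described above, recombining one sentence’s semantics with another’s style in latent space, can be illustrated with a toy vector swap. In this hypothetical numpy sketch we simply assume the encoder places semantics in the first coordinates and style in the rest; a real encoder-decoder model learns such a factorization rather than having it fixed:

```python
import numpy as np

def swap_style(content_latent, style_latent, content_dims):
    """Toy latent-space manipulation: keep the first `content_dims`
    coordinates (semantics) from one sentence's encoding and take the
    remaining coordinates (style) from another's. The clean split is an
    assumption made here purely for illustration."""
    merged = style_latent.copy()
    merged[:content_dims] = content_latent[:content_dims]
    return merged

# Two hypothetical 8-d sentence encodings
z_semantics = np.arange(8, dtype=float)   # encodes "what is said"
z_style = np.arange(8, dtype=float) * 10  # encodes "how it is said"
z_new = swap_style(z_semantics, z_style, content_dims=4)
print(z_new)  # first 4 dims from z_semantics, last 4 from z_style
```

A decoder applied to `z_new` would then (ideally) produce a sentence with the first encoding’s meaning in the second encoding’s style.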
The class will cover a wide range of perspectives on deepfakes, from their historical ancestors through their philosophical underpinnings and on to modern incarnations. Topics will be divided into two factions: Theoretical and Practical. The theoretical part will include readings of seminal works on simulative media as well as invited speakers that will discuss the societal impact of deepfakes on aspects of our lives: news, politics, entertainment and arts. The practical part will include programming tasks (Python) for hands-on usage of deepfake generators, a discussion of their powers as well as their limitations.
- History, Philosophy of the Synthetic (“Simulative”) Media: Simulacra and Simulation, Hyperreality, Transhumanism, Faith in Fakes, Ultra-realistic CGI (from Photoshop to Unreal Engine v5 and The Matrix); Baudrillard, Bostrom, McLuhan, Eco
- Deepfake-and-X, Societal Impacts: Journalism, Politics, Activism, Social Media, Film Industry & Hollywood, Creative and artistic expression, Learning and motivation
- Deepfake engines: Machine and Statistical Learning basics, Generative AI models, GANs and other Decoders, hands-on deep generators.
- Deepfake detectors: Datasets, competitions and approaches.
- Programming and scripting: Python, command line scripting (linux/mac)
- Basic mathematics: Linear algebra, statistics and probability, multivariate calculus – only cursory knowledge required, basics will not be repeated in class.
- Hands-On Machine Learning (2nd ed). Aurélien Géron. O’Reilly 2020
- Deep Learning with Python (2nd ed). Francois Chollet. Manning 2020
- Generative Deep Learning. David Foster. O’Reilly 2019.
- GANs in Action. J. Langr and V. Bok. Manning 2019.
Recommended related courses
Day & Time: Thursdays 2-4pm & Friday (virtual office hour)
Units designation: 2-0-7
The class is given fully online via video conferencing. Recordings will be made available as a reference for students.
Email the instructors: MASS60email@example.com
Week 1 (9/3): Class Introduction & Overview
👤Invited Speaker: Dr. Omer Ben-Ami, Canny AI
🖐In-class tutorial: Introduction to Python, Machine Learning, and Deep Learning
👤Invited Speaker: Dr. Judith Donath, Harvard’s Berkman Center
🖐In-class tutorial: Introduction to Python, Machine Learning, and Deep Learning (cont.)
👤Invited speaker: Dr. Phillip Isola, MIT CSAIL
👤Invited speaker: Dr. Ohad Fried, IDC/ Stanford University
👤Invited Speaker: Aliaksandr Siarohin, University of Trento
👤Invited Speaker: Sam Kriegman, University of Vermont
🖐Pre-recorded tutorial: pix2pix
🖐Pre-recorded tutorial: Face Cloning & Swapping
📋Homework 1: pix2pix & Homework 2: first order model
👤Invited Speaker: Harshit Agrawal, Adobe
👤Invited Speaker: Ali Jahanian, MIT CSAIL
📋Introduce Homework 3: Voice Cloning
👤Invited Speaker: Carter Huffman, Modulate.ai
📋Homework 1, 2, 3 Review
Week 8 (10/22): ✨Special Conversation: Deepfakes, Science Fiction and the Future, with Jonathan Nolan, creator/director/writer of Westworld, Interstellar, Batman Begins, and more, and Neo Mohsenvand, MIT Media Lab💡
👤Invited Speaker: John Bowers, Harvard’s Berkman Klein Center
📋Homework 3 Review
💡Final Project Brainstorming
👤Invited Speaker: Matt Groh, MIT Media Lab
👤Invited Speaker: Yossef Daar, Co-founder and CPO at Cyabra
💡Final Project Check-in
👤Invited Speaker: Dr. Tal Hassner, Facebook AI & Open University of Israel
Work on Final Project
Week 14 (12/3): 👏Final Project Presentations
Assignments and Grading
Each week, a home assignment will be given in the form of a Jupyter notebook. The notebook will contain code that follows that week’s class, as well as open segments where students can run their own code and tweak parameters to generate new artifacts. Students will be encouraged to post their successful creations on the class website. Assignments count toward the final grade, and feedback will be given.
The class will have a final project centered on each student’s domain of interest, applying the tools given in and out of class. Instructors will provide starting points, data, and help finding a suitable project. Projects may be done individually or in groups of two or three students.
Final grades will be given after project submission evaluations. Project grading criteria: creative value, technical contribution, academic contribution.
- Assignments: 20%
- Final project: 80%
- Extra credit: 5%
- Letter grading policy: http://catalog.mit.edu/mit/procedures/academic-performance-grades/#gradestext
The class meets once a week for a two-hour (academic) frontal lecture, with additional weekly home assignments and readings. Regular in-class participation will be encouraged, and active learning practices will be applied.
Dr. Roy Shilkrot
MIT Media Lab
Dr. Shilkrot recently served as a faculty member in Stony Brook University’s Computer Science department, and is currently a research affiliate at the MIT Media Lab (PhD ’15) and Lead Scientist at Tulip (a Media Lab spin-off company). He has served as a tenure-track professor of computer science, as well as an entrepreneur and engineer in several industrial settings. Dr. Shilkrot has published numerous papers, articles, books, and patents on topics in computer vision, human-computer interaction, language processing, and assistive technology in leading venues such as CHI, UIST, ASSETS, SIGGRAPH, ICCV, BMVC, and many more.
MIT Media Lab
Pat Pataranutaporn is an antidisciplinary technologist/scientist/artist at the Massachusetts Institute of Technology (MIT), where he is part of the Fluid Interfaces research group at the MIT Media Lab. Pat’s research is at the intersection of biotechnology and wearable computing, specifically at the interface between biological and digital systems. His research has been published in IEEE, ACM SIGCHI, ACM SIGGRAPH, ACM ISWC, ACM Augmented Humans, Royal Society of Chemistry, and other venues. He also serves as a reviewer and editor for IEEE and ACM publications.
MIT Media Lab
Joanne Leong has a keen interest in understanding our human perception of the world, and designing technologies that complement us and can bring positive change to our lives—in how we learn, work, play, and connect with one another and the things around us. Her past works include a novel smart textile-based wearable approach for the sensory augmentation of prosthetic limbs (UIST 2016 Best Paper), and a conceptual framework for aiding in the design of constructive assemblies (TEI 2017 Best Paper).
Professor Pattie Maes
MIT Media Lab
Pattie Maes is a professor in MIT’s Program in Media Arts and Sciences. She runs the Media Lab’s Fluid Interfaces research group, which aims to radically reinvent the human-machine experience. Coming from a background in artificial intelligence and human-computer interaction, she is particularly interested in the topic of cognitive enhancement, or how immersive and wearable systems can actively assist people with creativity, memory, attention, learning, decision making, communication, and wellbeing.