Generative AI: Notes from the beginning

A Microsoft leader shares his experiences integrating generative AI and leading responsible innovation.

The AI Literacy Lab hosted a conversation between Ashish Jaiman, a hands-on technologist and product innovator who is Director of Product on the Microsoft Bing Multimedia team and leads the Responsible AI initiative for Bing Image Creator, and Northeastern professor Rupal Patel, founder of the synthetic voice company VocaliD. Here is an edited account of their conversation.

Rupal Patel: Ashish, we met at a voice cloning workshop in early 2020. You were then the director of the Defending Democracy program at Microsoft. I recall that the sentiment was about fraud protection and also about eroding trust, especially because of the elections coming up. How would you describe the general understanding of AI back then, and how has that changed with generative AI today?

Ashish Jaiman: In 2020, we were talking about synthetic media: AI-generated content or AI-manipulated content. Manipulation can very well be benign: I want to enhance an image, or I want to use AI to take a low-resolution image and improve its resolution. Academic researchers, technology companies, and even some public organizations were thinking about [AI], but it wasn’t a conversation that people were having in their dining rooms. I was in India [in November 2022] when OpenAI announced ChatGPT, and it just blew up. And then when I went back in May or June, even the Uber driver knew what ChatGPT was. So in three years, it has become a kitchen table discussion.

RP: Now that people know what AI is, what’s at stake for those of us creating this technology? Why do we need to get the message right?

AJ: With every technology, there are two types of early adopters. One is academics, researchers who want to do good. And then there’s the other side, people who want to exploit that piece of technology to do harm. Creators have not thought through all the potential harms. Cybersecurity is a great example. It took us almost three decades to get to a place where people know what multi-factor authentication is, right? As creators of technology, not only do we have to understand what the current threat landscape looks like, but we also have to start thinking about unknown unknowns. In cybersecurity, we do this “white hat hacking.” Can I adversarially test this myself before a real adversarial attack happens on this technology?

RP: You started the Defending Democracy program at Microsoft before generative AI. Why did leadership think it was necessary at that time? Can you bring us back to that?

AJ: Back in 2016, we saw a lot of information manipulation because of access to social media, phones, and other apps. The long-term implications are a threat to society, to institutions, and eventually to democracy and democratic principles. [Bad actors] were manipulating information in 2016 and 2017 using tools that were pretty sophisticated, but a human still had to do something with those tools. Then we saw the deepfake train coming, and we started thinking, hey, all of a sudden now AI can do it. Not only can it do it, but it can personalize that misinformation. Using AI, I can get a message personalized to my biases; Rupal may get a similar but slightly different message that aligns with her values. That was a big, big thing.

RP: And the pace of AI now makes it even more important. Is that group still functioning, and is it bigger now?

AJ: It is bigger. The team brought our journalism initiative under the umbrella of Defending Democracy, because that was one of the pillars that was missing when I was there. Can we empower journalists with the right tools, technology, processes, and training to be the fourth pillar that they are in a democratic environment?

RP: That’s quite aligned with what we’re doing now with the AI Literacy Lab. What do you see as the role of traditional media and social media in communicating clearly — and not just talking about mistrust and all the negative consequences, but also about some of the potential gains and the force multiplier for good?

AJ: It goes back to the same thing, right? Early adopters could be those adversarial actors, but eventually more and more people start learning about the technology. More innovation happens. We’ve talked about the positive use cases of generative AI. It can generate documents and resumes and respond to your emails, but you can also start thinking about complex engineering. Rather than investing millions of dollars to do complex engineering and test it out, you can simulate it at scale, test it, and experiment with it very, very quickly. Education is, I think, a very, very important use case. If you think about the Civil War in the U.S., can you have an Abraham Lincoln talking about it, like a deepfake, but one that could encourage little kids to learn about history in a very different way? I’m just giving one example. There could be so many more.

RP: In terms of building trust, what is the role of scientists and students in an academic institution? What role do they play in helping journalists and people who are going to be implementing this technology, in terms of making sure that we have guardrails?

AJ: A very interesting thing about academic institutions is that they’re not focused on just one thing and one thing only, right? There’s a diversity of programs and thought leadership. So when you think about AI, you can always find a professor who can talk about ethics at length. You can find a social scientist who can also start thinking about how it can be used. You can find a behavioral scientist, because of the diversity of minds available in the space we are sitting in right now.

RP: I want to ask how you think organizations like Microsoft can foster more trust and build a more informed society.

AJ: It comes down to some of the work that we have done before in the Defending Democracy program. It’s not just technical solutions; you have to actually collaborate. The North Star is that whatever AI we create should not be used to create harms. So harms: what does that actually mean? Who’s our target audience? How are these harms or threat vectors classified for a technology company like Microsoft?

Let’s take an image example. I don’t want my tool to produce a racist image. I don’t want my tool to create violence or gore. So we can think about policy: a regulation where you say, okay, your tool cannot create any child sexual abuse material. But when you think about responsible AI, you also have to start thinking about the right of expression, privacy, access to information, and accessibility. I can be very sure that the tool will not produce any bad content because I’m blocking prompts, but that is not serving the users, because there are genuine cases where an artist creating a graphic novel may legitimately want to generate that imagery in the tool. So how do you balance that? How do you measure that? What are the guardrails around it?

And we talked about AI-based moderation. There could be a lot of false positives, or even false negatives. So you then start thinking about: How do we bring humans into the loop, and what’s the balance? 90% artificial intelligence and 10% human intelligence? And once you start getting into human intelligence, are these folks trained in the right way? And then what about subjecting them to this content for five hours a day, content we don’t want our users to be subjected to, but these humans are, right? This is what we grapple with when we talk about responsible AI and the things we do to limit end users’ exposure to harm.

RP: And it’s an evolving conversation around ethics.

AJ: Yes. And as the tools improve, I’ll give you a very interesting example. Our Bing Image Creator was built on top of DALL-E 2, which is an OpenAI model. Then we got access to DALL-E 3. DALL-E 3 creates such realistic images. So there were some threats we were not even thinking about [before], because the images were not high-resolution enough to be realistic. Once you bring DALL-E 3 into the picture, you have to revisit your definitions of harm, your categories of harm. So the technology is evolving so quickly, and you have to keep pace in your thinking as well.