Privacy and Ethics in the Age of AI Transcript

NOTE: The following has been edited for length and clarity.

Larry: Hi, I’m Larry Magid, CEO of ConnectSafely, and welcome to Are We Doing Tech Right? And a good person to know whether we’re doing it right or wrong, or at least have a strong opinion, would be Shea Swagger, who is with the Future of Privacy Forum, a graduate student and former staff member at the University of Colorado, and an all-around expert on privacy and policy and all things tech.

So, Shea, thanks so much for joining us. 

Shea: Thanks so much for having me. I appreciate it. 

Larry: I hope I got your bio more or less right.

Shea: You crushed it. It was great.

Larry: Okay. Well, let’s start a little bit by talking about your background and FPF, the Future of Privacy Forum: what you’re doing there, what it’s doing, and how this ties into some of the big issues going on right now around privacy, tech policy, and AI. You know the drill. There’s a lot on people’s minds these days.

Shea: You are not wrong. So for those that don’t know FPF, it stands for the Future of Privacy Forum. It’s a nonprofit privacy organization that brings together really smart people to think about technology and privacy: how do we understand emerging technologies and make choices around data in a way that is going to benefit everybody who uses them? Trying to think ahead about problems before they become an immediate issue is, I think, one of the things that we do best.

I am a senior researcher for data sharing and ethics there. I’ll break that down a little bit. Primarily, my data sharing work centers on corporate-academic data sharing partnerships for research. Most people, when they think of research or imagine it in their head, picture it happening in universities or laboratories, and that’s true.

That’s a big part of what research looks like, but there’s another form of research that happens through partnerships between companies and academics, people who work in higher education. How do we facilitate data sharing between the companies that produce or gather the data and the people who need that data to answer questions? Questions that further something related to health or education or transportation, where collecting that data from the university side would be either too expensive or there’s some barrier to how we collect it.

And so having that partnership between companies and higher education can help scale research questions or allow new questions to be asked. But there are definitely some privacy and ethics issues when the data is coming from companies. So we need to make sure that we’re doing that in a way that doesn’t harm anybody, that respects informed consent, and that is equitable.

So that’s the data sharing part, and then it kind of bleeds into the data ethics part. Data has a lot of consequences, and depending on who you are, those consequences are not equal across populations. So how do we collect, use, share, publish, and reuse data in a way that is going to benefit the most people, reduce harm as much as possible, and improve the lives of the people it’s about?

Larry: And one question that always comes up from an ethical standpoint when companies help subsidize research is whether they have their finger on the scale. I’ll give you an example outside of tech. There’s a lot of research that suggests that wine is good for you. Now, I’m not saying it isn’t, I hope it is because I do consume some of it, but it turns out that a lot of this research is funded by the wine industry. That doesn’t mean it’s biased research, but it obviously raises the question of whether the methodology, the kinds of questions they’re asking, the population they’re going after, whether or not this is really the same level of unbiased research we might expect if the government were paying for it, or some foundation were paying for it, or if it was just done by a university out of its own funding.

Shea: That’s a great example. We want to make sure that when we’re making decisions, especially about things like health, but about all research in general, the conclusions follow from the data. We want to make sure that our conclusions are evidence-based and that something like who’s funding the work isn’t influencing it. So one of the things that we very strongly recommend is that any time there’s a partnership between a company and a research group, the company has no influence over the conclusions of the research. Companies can review how the data was shared and perhaps which methods are most appropriate for analyzing that data, but the conclusions should only be coming from the researchers and the authors of the study, not the company.

And that hard division between the company and the authors of the study helps ensure that integrity.

Larry: And the pressure points, I mean, I know sometimes the pressure to avoid using company data comes from the company itself. Often they will use privacy as a rationale, but sometimes there may be other rationales.

And sometimes the researchers might have concerns about getting the data from the company preferring to go to some neutral source for the data, but often only the company has the data. So how do those things get worked out?

Shea: Well, I’ll push back on one of the premises there: there isn’t a neutral data source.

The data is not neutral no matter where you get it from, but I understand the question, and it’s totally legitimate. For companies that don’t want to share data, there is a multitude of reasons why. Some of it is because they want to protect the identity of the users they got the data from, or because sharing wouldn’t align with informed consent.

Those are very legitimate practices and reasons for withholding. They can be addressed in most cases, and that’s one of the things we work on: creating guidance to help companies address those concerns. There are some other reasons why a company might not want to share data, especially if the data is about the company itself.

If the data in some way makes the company look bad, especially for larger companies, if the research is about the practices of those companies and would create a public relations issue, then they’re not going to be incentivized to share it. That’s when companies might hide behind privacy: “we don’t want to share because it would make us look bad, but we’ll say it’s because of privacy.” I would like there to be more tools so that in those cases the research still gets done, because it’s going to be in the best interest of the public. And from the research side, as for anyone that doesn’t want to work with company data because they want to conduct their own research,

I think it’s born out of a few things. One is cultural: in our disciplines, especially in graduate programs, we’re taught how to do research, and most of that teaching is around collecting or generating data on your own, using novel data collection methods.

In some disciplines that means generating surveys; in others, it means using equipment to generate new kinds of data. So the idea of using previously existing data that are not public is not something that’s in most disciplines’ pedagogy. That’s not the way they train new researchers in the field.

So it’s going to take a bit of a culture shift to rethink research methods and data collection and generation, and to form some new practices around creating those partnerships and building new relationships so that data sharing can happen. It’s going to save a lot of time and money and ultimately benefit people, because I think we can increase the amount of research that’s being done.

I think we can ask more nuanced questions, or even different questions, because of it. So there are really good reasons why there’s hesitancy, there are some bad reasons why there’s hesitancy, and there are some very traditional cultural reasons why it will be some time before we get there.

Larry: And then there are situations where things go really wrong. I actually met you at a Meta event, and as most people now know, Meta used to be called Facebook. When it was Facebook, it was involved in the Cambridge Analytica scandal, where data from tens of millions of Facebook users wound up going to places it wasn’t supposed to go.

So how could that happen and how can we make sure something like that would never happen again? 

Shea: The Cambridge Analytica scandal is probably what most people think of when they think about things going wrong. There are definitely things we’ve learned since then that we know to do in future partnerships.

One of the things we want to make sure we’re doing is that any time there’s a partnership between a company and a research group, everything goes through an IRB review. That’s an institutional review board. These are the professionals that sit and deliberate around privacy and ethics.

Informed consent was one of the big issues that came up. And the more we can have IRBs or equivalent review boards looking at partnerships like these, the better, but we also need some new tools for things that IRBs are perhaps less good at adjudicating. One of those is public data: under current IRB guidelines, any use of public data is acceptable and doesn’t require review.

And it doesn’t even matter where that data came from. A data breach could release millions of people’s information online, but now that it’s public, researchers could go in and use it for research. Just because they can does not mean that they should. So we’re going to need some new tools, regulations, and guidance for researchers to make ethical decisions around when to use data like that, especially with new partnerships.

Larry: So there’s another type of data that’s on the horizon right now, going both in and out, and that of course is generative AI. Generative AI is producing data, actually producing new content in a sense, by consolidating content from the material its language models have been trained on.

Perhaps more important, it’s getting data from its users. So if I go in and ask it a question about a medical condition, information about the medical condition I’m interested in, which means I’m probably affected by it, is now being entered into a database. Whether it’s being recorded or associated with me is a really important question.

If I’m logged in, to ChatGPT or Bard, for example, and I were to ask about a medical condition, perhaps I might not want anyone to know that I’m concerned about that condition. Should I be worried?

Shea: Well, I would definitely not recommend putting private information into any generative AI system.

For one, it’s good practice because you don’t necessarily know how the information that you’re entering into those systems will be used.

Larry: Okay. I think most of us are not going to put our Social Security number in, but if we’re interested in solving a problem, like I want to know where to live, I want to know if this person is right for me, I want to know if I should buy a Ford or a Chevy, I want to know whether I should get an operation, these are personal questions, this is personal information.

I mean, that’s kind of the point of AI: to get advice on things that concern you.

Shea: So I think we’re still developing literacies around how to think about AI and how to use it, and as those literacies are still nascent, I would urge caution in how you use these tools. Especially with really important decisions: all the ones you listed are ones I would say are better answered by sources that are not AI-based.

So, you know, “am I with the right person?” You should talk to the person.

Larry: You should help me solve my relationship issues. 

Shea: Yeah. I wouldn’t go to ChatGPT first for relationship advice. There are lots of other sources. But I also want to acknowledge that just today, the Biden administration came out with a new executive order on AI, and I’m still processing what that is; a lot of people are coming out with their analyses. So the landscape is still rapidly shifting, but I definitely think that all AI is based on data. And even though there are some novel data sources, all that data is based on something that happened in the past.

It’s historical data, and so one of the risks when we use AI, either generative AI or other forms, is reproducing the past. One of my favorite scholars on this, Dr. Ruha Benjamin, was talking about predictive policing and reframed it: instead of predictive policing or crime prediction, she called it crime production. She described a vicious cycle where police generate data around especially poor and nonwhite neighborhoods, those neighborhoods therefore show higher crime rates in the data sets, and those data sets justify further policing.

So it’s a form of crime production, but it’s also an example of how non-technical forces influence the operations of technical infrastructure like AI. And I think there’s a tendency to consider the information you get from an AI, or maybe from a search engine, as somehow “objective,” and I’m putting that in scare quotes.

And this goes back to literacy. As a former librarian, I talked about information literacy a lot. How you evaluate the credibility of the information you’re given by a particular system is a skill. It takes practice. It takes cross-referencing. It takes some time, and often it takes talking with other experts in the field, or people that are information experts, to help you process and analyze and synthesize that kind of information.

And we’re not there yet. I mean, we’re still trying to figure out how to do information literacy with search engines for most students in higher education. We don’t have developed literacies around interpreting AI outputs yet.

Larry: And as we talk about caution, you’re certainly urging people to be cautious.

How do we draw the line between caution and moral panic? Because there is, right now in the generative AI field, a lot of moral panic. Some of it, ironically, is coming from the industry itself. When the leaders of the generative AI companies start talking about existential threats, extinction-level threats comparable to nuclear war and global warming, that doesn’t help the matter when it comes to the common fears.

So how do we address that issue right now? 

Shea: Yeah, so I definitely don’t have the same kind of existential panic, at least from an AI perspective, that maybe some people do. I spend a lot of time in the higher education space, and the moral panic around generative AI there centers on cheating: that AI is going to create new forms of cheating and new methods for students to skate their way into becoming your doctor while unqualified. I think those are unfounded fears, but they’re also very predictable fears. There’s a pattern, at least in educational technology, where a new technology comes on the market and people are very concerned that it’s going to enable cheating in a way we’ve never seen before, that all the students are going to use it, and that it’s the end of education as we know it. This goes back to the copy machine, when the first copiers came out.

The fear was that students weren’t going to take notes, they were just going to share notes. Then the internet comes around and the web becomes popular, and it’s the same thing, and the next iteration is around search engines or Wikipedia.

Larry: The calculator, absolutely. I mean, cheating on math, right?

Shea: That is a great example, and math didn’t collapse under the weight of the calculator.

Larry: No, as a matter of fact, it got more sophisticated. The drudgery of long division went away, and suddenly you were able to think about the more serious problems in mathematics.

Shea: And that’s the best-case scenario: instead of the more mechanical, procedural elements of math education, students were able to offload some of that procedural work onto a device and think about new kinds of questions.

So math education just adapted. It incorporated and encouraged calculators as a tool and changed the way it teaches and assesses learning with that incorporation. And it’s great. I wouldn’t want calculators to go away. I use them all the time.

Larry: But at least at the higher levels. I mean, if you’re trying to teach long division, if your purpose is to teach someone to add, subtract, multiply, and divide, then yeah, maybe they shouldn’t have a calculator.

But if you’re trying to teach statistics or predictive models or whatever, or, you know, get a rocket to Mars, it might be helpful to have a tool or three to rely on.

Shea: Absolutely. And I think we can integrate generative AI or other forms of AI into education and adapt in the ways that we have for every new kind of technology.

Larry: So, what else keeps you up at night? What should we be concerned about right now in terms of, especially on the privacy front, given all the things that are going on right now? 

Shea: I don’t know if I have a specific issue or technology that keeps me up at night, but I definitely have some sub-areas of interest. I’m doing a PhD right now in education, and most of my work there is around school shootings, which can be a more emotionally involved and difficult topic to engage with.

If I do lose sleep, it’s probably going to be over that area, versus some of the other things we’re talking about, like AI or ChatGPT.

Larry: Yeah, we’ve been thinking a lot about both school shootings and what’s going on in the Middle East right now in terms of its impact on children. Not necessarily the specific victims who are affected by the actual shooting, but the rest of us that have to read about it and watch it on television and think about it and possibly see gory aspects that are on the internet. What impact is that having on people’s psyches, especially children’s?

Shea: Yeah. I think it’s a really good practice to decide for yourself how much you want to be exposed to that, and then try to mute or filter out some of the things you don’t want. For people who are going to be particularly affected, triggered, or harmed by those kinds of images, it is totally okay to set some boundaries and step away.

Like, I don’t think everyone has to be engaged in every issue all of the time. I think it’s more than fair to say, I need to create some space and protect myself from that. So, as much as you can, using tools such as mutes, or even stepping away from a platform or two, is a good practice.

Larry: So some other issues. I mean, when we think again about the sort of diverse society we live in: people are behind algorithms, people are behind databases, and people are biased. When I think about marginalized communities, how can we begin to try to democratize or equalize the data flow so it doesn’t harm or discriminate against or deprive people who aren’t somehow reflected by those who create the data or create the algorithms?

Shea: That’s a really good, and also giant, question. Which strategies are most effective is going to depend a lot on the sector and the technology. The first strategy I’d say is that you need to look at the team that’s building the technology and make sure that, as much as possible, the team represents the people that are going to be most impacted by it.

So if there’s a significant difference between the identities of the people on the team and the identities of the people that are going to be subject to the effects of the technology, you probably have a problem. And in general, the tech sector isn’t great at this. The second thing I’d say is that you need to incorporate feedback from those communities. There are definitely ways, especially if your company is not able to take on more people or change its staff composition, to at least have some feedback mechanisms so that people who are experiencing friction or harm or consequences can give the developers, or at least the company, feedback on how to change it and make it more equitable.

Thirdly, I’d say there are actually cases where maybe a technology shouldn’t be created. There are some technologies that are premised on exploitation or exclusion or disparate impact that we can just sidestep and say, no, I don’t think that is actually going to be helpful for most people, or I don’t think it’s going to be helpful or equitable for everybody.

And so maybe we shouldn’t make that tool. Maybe let’s focus on other questions, tools, and problems that are more important and especially contribute to more equitable outcomes for everybody. 

Larry: I’m curious about your thoughts on something that I think a lot about and sometimes talk about at events, which is the line between protection and over-restriction.

I can protect you from ever getting into a car accident by never letting you out of your house. Unless a car goes through your front door, we can pretty much guarantee you’re not going to be harmed by a car if you’re locked up in your house. That’s an extreme example, but in the name of protecting children, at least two states in the United States have passed laws that would essentially restrict their access to social media until they’re 18 without parental permission.

There are a lot of people, including members of the LGBTQ community, who are very concerned about that. The UK, even though its approach is not totally restrictive, has just recently passed, and is starting to enforce, a new law which again gives parents certain rights in terms of controlling their children’s access to content. And then if you think about the one law that’s already in place federally, COPPA, the Children’s Online Privacy Protection Act, that actually gets into your own wheelhouse, because it’s really there to protect data, to protect children from disclosing personally identifiable data, which is a very laudable goal.

But its impact is to keep children off of social media. Whether that’s good or bad is a separate question, but that was an unintended consequence of the law: it effectively bans them from social media. So I’m curious about your thoughts on where policymakers should be thinking in terms of how to protect, whether it’s data or other forms of protection, versus over-restriction of social media and other technologies.

Shea: That’s hard. You’re asking about a really complex issue.

Larry: Why I brought you on. 

Shea: All right, so I’ll approach this a little bit theoretically, and then we’ll dig into how it’s applied. I’ve been trained in critical theory, and that informs how I think about this question.

Largely it makes me ask, first of all, who has power in those conversations? Who has the most say in how something like rights or protections is meted out? In general, I tend to advocate for a redistribution of power. There’s a saying in the advocacy world that whoever’s closest to the pain should be closest to the power.

That means that if someone or a group of people is harmed by something, they should be given the tools to change that thing, or be allowed to participate and given significant deference in how that thing operates in the future. So when it comes to a debate of rights versus protection, I think in general, in the U.S. at least, the default has been that the people who want to do the protecting tend to have the most power, and the people making a rights-based argument tend to have the least. That has usually hurt marginalized people most. So I tend to favor a rights-based argument over a protection-based argument, because I feel like the protection-based argument has tended to be abused in favor of a minority.

So specifically, I would want to make sure that in any particular case, be it COPPA or another law, the people saying their rights are being infringed upon or truncated in any way should probably be given the benefit and the majority of the power in making those kinds of decisions, versus the protection-based argument of, we need to…

Larry: I’m hearing very little outcry from young people demanding more parental controls over their social media.

I rarely see that, you know, “we demand that we not have the right to go on Facebook or Instagram without our…” I’m not hearing that from them.

Shea: Yeah, it’s not there, right. Most of the teens and children that I know, at least, have not made that argument.

But I definitely know a lot of people who have since become adults, who came from households where their parents were perhaps more conservative or strict and definitely controlled their access to things on the internet that they believed would have been very helpful for their development and for connecting with communities that were either far away or difficult to identify in their local communities.

This is especially true for the queer community. So, again, I would favor the gay or trans teen’s ability to connect with other gay or trans teens over the parents’ preference to say, we don’t want to allow our children to connect with those communities because of our values or beliefs.

Larry: And I would extend that also to people whose religious or political beliefs or interests may differ from their parents’. I’m sure there are Republican teens in this country whose Democratic parents would prefer they not go on to ActBlue, but yeah, it works in all different directions.

I always point that out because it’s often Republicans that are behind restricting, but it could affect them negatively as well, if they suppress the ability for young people to explore ideologies of various kinds. 

Shea: Yeah, it definitely applies to any sort of combination of political bent or religious commitments.

So I don’t think it is just a Democratic or a Republican argument. I think it’s natural for most parents and children to have a differing set of values, especially during the teenage years. I remember when that happened for me, when I wanted to start splitting off from my parents in terms of how I wanted to explore life.

One of the things I needed most was agency. I needed the ability to have some kind of self-determination, and the internet is a critical tool for that.

Larry: I really appreciate your thoughts, Shea, and wish you the best. So that’s Shea Swagger from the Future of Privacy Forum.

Thanks so much for joining us.

Shea: Thanks for having me.