A conversation with Ari Lamstein

Dear friends, Alberto here. Today, I bring you the second episode of our Friends of the Open Visualization Academy series; the first, as you may remember, was a chat with Nina Kruger. I really enjoy these conversations, so I have two more episodes in the works to be released in early 2026, probably before we launch the Academy itself.

My guest this time is Ari Lamstein, a software engineer and trainer who focuses on data science. He enjoys creating open source projects that help people better understand public data. His most recent project is an app that highlights trends in immigration enforcement data. That’s the project that we discussed in the episode; Ari has written extensively about it in his blog.

Ari knows that I usually include music in newsletter postings, so he also shared that he’s played classical guitar on and off for much of his life. Last year, he performed in the San Francisco Conservatory of Music’s adult extension guitar ensemble, and this piece was part of the end-of-term recital.

Without further ado, here’s my conversation with Ari (transcript after the video).

Transcript

Alberto Cairo: Ari Lamstein, welcome to Friends of the Open Visualization Academy. How are you doing?

Ari Lamstein: I'm doing well. It's a pleasure to be here.

Alberto Cairo: Yeah, it's a pleasure to have you. We want to talk about a very interesting project that you have developed, but before we get there, why don't you tell us who you are, what you do, where you are? Just an introduction to yourself.

Ari Lamstein: So, my name is Ari Lamstein, I live in San Francisco. As a bio, I started my career in academic computer science. I wanted to be a professor of computer science, but after 2 years, I had an internship at a video game company, actually, here in San Francisco. And I liked it so much, I didn't go back to my program. I dropped out. I spent the next decade working as a software engineer at tech companies in San Francisco. And towards the end of that decade, I started working on data projects, I learned the programming language R, I was working at a real estate company with a statistician, doing some geospatial analysis. And that sort of changed the trajectory of my career. I got very interested in data science. I wrote some open source R packages, they became popular. That company was acquired…

Alberto Cairo: Let's give them some… let's promote them a little bit. What are those packages?

Ari Lamstein: Choroplethr, or, as I pronounce it, "choroplether." Most people have never heard of the word choropleth, but it's just a color-coded map where you show regions like states, and express values for those regions through color. And I discovered the U.S. Census Bureau has a lot of data like that, and I connected my mapping package to the Census Bureau's API, and it created all sorts of fun maps.

Alberto Cairo: I've used that one, so I'm quite familiar with it.

Ari Lamstein: Wonderful endorsement.

Ari Lamstein: That's great. And then, when I was laid off from that job, I decided I wanted to double down on my own open source projects. I started doing corporate training around R and continued my own open source projects. Recently I've switched to Python, and the app that we're going to be talking about today is my second, I'll call it my second independent project using Python.

Alberto Cairo: Why Python? I mean, nothing against Python. Let's make that clear.

Ari Lamstein: That's… that's great. I never personally felt… So my background is very unusual for someone who'd programmed in R, in that I was doing real software engineering for a decade first. I never felt limited by R, but I saw that the industry was settling on Python for data analysis, and I thought that trend was so pronounced that I was harming my career prospects by not having at least parity between… in my skill and facility between the two languages. That was my motivation.

Alberto Cairo: Yeah, that's good intel, too, I think, for many of the viewers of these conversations, because many of them are probably, considering that they are watching this in the context of the Open Visualization Academy, interested in learning visualization, or beginning a career in data visualization, and one of the first questions that they usually ask, beginners, I mean, is which software tool should I use, or what programming languages should I learn? So you would say that Python would be sort of like a priority for a beginner these days? Or does R… does R still, you know, get things done?

Ari Lamstein: That's a great question, and you're… I assume you're asking me for my personal opinion here? Personal, total, totally personal opinion. This is all about personal opinions. So, I think that the book R for Data Science by Hadley Wickham and Garrett Grolemund, and now there's another author, Mine, I think, on the new edition, is the best introduction to… software analysis, data analysis using software, bar none, and I've taught intro R using my own workshop materials based on Garrett's. I think it is the best introduction you could have to that field, and I think ggplot2 is a special piece of software that is uniquely designed and optimized for analyzing data. That's a personal opinion, but nonetheless, the industry has moved on. Has really moved on. So, I've talked to a lot of Python people now about data science. And they don't understand it in the way that you understand it, having gone through that book, because the Tidyverse is so well optimized to how you think about data analysis. They don't get it. There's something they don't get in the way that you get something in… in math. But the industry's moved on, so it's very…

Alberto Cairo: It is… sorry to interrupt you, but that was exactly my experience. I started teaching myself a little bit of Python for visualization, and then I started teaching myself R, particularly with the Tidyverse, and more specifically ggplot. R made a world of difference. It's like, this is my world. I mean, this is a programming language that I can understand, that I can use, that gets things done, but that's because I don't have a background in programming, and maybe that is why R was so friendly looking to me.

Ari Lamstein: It was designed… I read the history of how it was designed, and it was designed by statisticians who wanted it to be close to how they thought.

Alberto Cairo: Yep, at the University of New Zealand, that's where R… in the Department of Statistics.

Ari Lamstein: Well, but even further, because it's an implementation of the S language, which was created in the 70s, I think, by… I forget who it was. He's got… that guy's got a lot of great quotes. Tukey.

Alberto Cairo: It's based on John Tukey, yeah, yeah, yeah, absolutely. Sorry, I said it wrong, it's not the University of New Zealand, it's the University of Auckland in New Zealand. That's the place. Anyway, let's come back to your career. So, but then, eventually, you also changed careers, right? These days, you are doing a lot of training and consulting, that type of thing, right?

Ari Lamstein: That's right. I had a… I recently had a full-time job. I was called a staff data science engineer at a marketing analytics consultancy. I left that last year, and now I've been doing some workshops. I have a workshop on Streamlit, which is the web framework I used for the project we'll be talking about. I've also been doing some consulting and… You know, applying for… largely part-time jobs, because I want to continue developing projects like the one that I'm doing.

Alberto Cairo: Meaningful projects that can really make a difference. So I'm very excited about talking about it, because the whole reason why we are having this conversation is that I saw your project, and I thought that it was worth discussing, so maybe we can switch to that. Do you want to share your screen?

Ari Lamstein: Sure.

Alberto Cairo: That way, people can also see your website and your weblog.

Ari Lamstein: Okay…

Alberto Cairo: And here we are. How has U.S. immigration enforcement changed in the past few years? How does this project… where are the origins of this project?

Ari Lamstein: Well… immigration enforcement sure has changed in America in the last year. That's, I think, the story of our time right now. People were talking about it a lot, everyone had noticed it was changing. And I wanted to understand what the baseline was. I like numbers, I like data analysis. I had emotional reactions to what was happening, but I wanted to really understand how it changed. And my assumption when I started was that someone had already done what I cared about, which was to look at the historical context using numbers, let's say. And what I found was… I eventually wound up on this website, TRAC, the Transactional Records Access Clearinghouse. They have numbers like this, but they have another webpage. It was this page. This is the number of detentions, is what they called it. The data set is pretty complicated. It's not the number of people who were arrested per day, it's the number of people who were in any facility run by ICE. And I was interested in this graph. It shows ICE here in blue, so this is who arrested the person who's in ICE's detention.

Alberto Cairo: Apparently, Customs and Border Protection can arrest someone.

Ari Lamstein: But they can only hold them for 72 hours, I think. And then they have to put them into a facility run by ICE. And then there are the people, the ICE detainees who are arrested by ICE. And you could see here that number…

Alberto Cairo: Yeah, spikes.

Ari Lamstein: Spikes, and then CBP goes down, and if you wanna… and they also have the criminal conviction status here. This is a very interesting graph. So, 71.5% of these people have no criminal conviction. I wanted to know, how has this number changed over time? Because that's a really important thing to know.

Alberto Cairo: It is part of the justification for this spike, correct? Is that the administration is saying, oh, we want to put all these criminals out of the country.

Ari Lamstein: Exactly.

Alberto Cairo: See that essentially three-quarters of them have no criminal conviction whatsoever.

Ari Lamstein: Exactly, and then you want to know, well, how has that changed over time? So you click see more data. And TRAC gives you… they give you tables!

Alberto Cairo: Goodness gracious.

Ari Lamstein: And I said, well, this is something that I… I can do. I can create that graph over time. And so that's what started this project. So here you can see that table has two datasets, actually. The arresting authority, meaning this is the arresting authority graph, this is the criminal conviction graph, but they both point to the same underlying data set. And… so, well, one thing is, if you graph that data and then you put in the annotations showing administration changes, that itself is interesting. But I wanted the criminality graph over time. And that's what you see here. And this page also has a lot more information. It shows percentages as well, the criminal status percentage, so I wanted to graph that as well. And that other underlying data set lets you see it also by the arresting authority. And that is a very interesting graph.

Alberto Cairo: Why? Okay, so let's explain to our viewers, because we have both read… you have written blog posts about the patterns in this data, and I have read those blog posts, so we are both familiar with them, but can you describe the most intriguing or interesting patterns that you found when visualizing the data?

Ari Lamstein: Absolutely.

Ari Lamstein: The number of… let's… we have to be precise about what we're actually looking at here, because it's very easy to…

Alberto Cairo: First of all, maybe we can show viewers where your website, your website and your blog, so they can find more information about these.

Ari Lamstein: Sure. My website is arilamstein.com. This is the first blog post I wrote about the project, a Python app for analyzing immigration enforcement data. And I've written… we'll talk about the Border Patrol data set, presumably next. That has two posts, because you have to splice together two datasets. Did you want me to talk about something specific about the post?

Alberto Cairo: We are going to talk about all the patterns that you saw here, that you described in the blog post. I just wanted viewers to see your website so they can find more info.

Ari Lamstein: Thank you. And in the About page, I link to those three posts and the GitHub repo, because this is all open source, and my hope is it's designed in a modular way, so people can use just a part of it to facilitate working with either of these datasets. That was the intention, and that's actually why I'm here, to help journalists do similar analyses. They could do more interesting things faster, hopefully, by building on my work. But back to why this is so interesting. These are the number of detainees. Every detainee in ICE's facilities has a criminal status and an arresting agency. So here we're looking at the detainees in ICE facilities who were arrested by ICE, and we're looking at how the composition of their criminal status has changed over time. The dataset that I got from TRAC starts in May 2019, and there you could see… it starts at 67%, but it's basically 60-80% of the people detained by ICE, arrested by ICE, were convicted criminals. And then, when Donald Trump took office for the second time, and it's important to realize this start is Trump's first…

Alberto Cairo: Yeah, that's the first term. Yeah, that's the first term.

Ari Lamstein: And then in the second term, that percentage of criminal convictions drops precipitously.

Alberto Cairo: And then people without convictions increases quite significantly.

Ari Lamstein: And for the first time in the dataset convicted criminal is not the predominant criminal status. And I thought that was very interesting, and I wanted to… I thought… It would be useful for… More people to be aware of that.
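The composition Ari describes (the share of each criminal status per month, and which status is predominant) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the counts and the status labels below are made up, not TRAC's figures or categories, and `status_shares` and `predominant_status` are hypothetical helper names, not functions from Ari's repository.

```python
from collections import defaultdict

# Hypothetical monthly counts of detainees by criminal status.
# These numbers and labels are invented for illustration only.
counts = [
    ("2024-11", "Convicted criminal", 600),
    ("2024-11", "Pending charges", 250),
    ("2024-11", "No conviction", 150),
    ("2025-06", "Convicted criminal", 300),
    ("2025-06", "Pending charges", 300),
    ("2025-06", "No conviction", 400),
]


def status_shares(rows):
    """Return {month: {status: percent of that month's total}}."""
    totals = defaultdict(int)
    for month, _status, n in rows:
        totals[month] += n
    shares = defaultdict(dict)
    for month, status, n in rows:
        shares[month][status] = 100 * n / totals[month]
    return dict(shares)


def predominant_status(shares):
    """Return {month: status with the largest share in that month}."""
    return {month: max(by_status, key=by_status.get)
            for month, by_status in shares.items()}


shares = status_shares(counts)
top = predominant_status(shares)
print(shares["2024-11"]["Convicted criminal"])
print(top)
```

Converting counts to shares is what makes months with very different detainee totals comparable, which is the point of the percentage graph in the app.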

Alberto Cairo: Yeah. You can also see another pattern here. If we go back in time, when Biden took office, you can see an increase in the percentage of detainees who are convicted criminals. That's correct. And a drop in people who had pending criminal charges. The two compensate each other.

Ari Lamstein: That's correct. And we're just looking here at the ICE detainees detained by ICE. Another really interesting trend is… I like to point out when I don't know things. And sometimes that's really useful, because it leads to someone who does know contacting me. And in that original blog post, I pointed out… so this is the count data of the arresting authority. This blue line is the number of people in ICE's detention facilities who were detained by ICE, and it's skyrocketed, it's at an all-time high. But equally interesting is that the number of detainees detained by CBP has decreased. And I got very lucky in that someone introduced me to a statistician who works at the Department of Homeland Security. In my blog post, I said, I have no idea why the number of detentions by CBP has decreased. I was very upfront about that. And this statistician at DHS said, I can tell you why. It's because fewer people are attempting to cross the Southwest land border. That's a phrase they use, the Southwest land border. And he said, we have data sets that record every time someone on patrol at the Border Patrol encounters a removable alien. Those are their terms. And he said those CBP encounters are a very interesting number. Because, he said, if there's a civil war in Honduras, people are leaving Honduras no matter what. But people are also motivated to come here depending on how welcoming they perceive the administration to be. Then he said, Ari, I will tell you, my guess is that people outside of America who are contemplating leaving their home country, and perhaps coming here illegally, without a visa, whatever term you want to use, they know that the Trump administration is not going to be welcoming to them. And I bet you could see that in the Border Patrol encounter data. And so I got that data set.
You actually have to splice together two datasets, and this is all in the About page, in the blog posts down here, and I assume your listeners will actually care about that data, so I'll mention it here. OHSS, the Office of Homeland Security Statistics, has, like, a historical Border Patrol encounter data set that goes back to October 1999, which is the start of the fiscal year, the federal fiscal year, and it goes to, like, November of last year, so it's the end of the fiscal year, plus two months, for whatever reason. And then CBP has their own month-by-month data set, so I spliced the two together to get the most comprehensive data set that I think is available on Border Patrol encounters along the Southwest land border, and it is fascinating. This graph is, like, the most interesting graph.
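A minimal sketch of the splicing idea, assuming each source has already been reduced to a mapping from a "YYYY-MM" month to an encounter count. The function name `splice`, the sample numbers, and the rule of preferring the historical series wherever the two overlap are all assumptions made for illustration; the transcript does not say how the app resolves overlapping months.

```python
def splice(historical, recent):
    """Combine two {"YYYY-MM": count} series into a sorted list of (month, count).

    Months after the last historical month are taken from `recent`.
    Where the two overlap, the historical value is kept (an arbitrary
    choice for this sketch, not necessarily what the app does).
    """
    cutoff = max(historical)  # "YYYY-MM" strings sort chronologically
    combined = dict(historical)
    for month, count in recent.items():
        if month > cutoff:
            combined[month] = count
    return sorted(combined.items())


# Made-up numbers standing in for the OHSS and CBP series.
historical = {"2024-09": 100, "2024-10": 110, "2024-11": 120}
recent = {"2024-10": 111, "2024-11": 121, "2024-12": 90, "2025-01": 60}

series = splice(historical, recent)
print(series)
```

Zero-padded "YYYY-MM" keys are used so that ordinary string comparison doubles as date comparison, which keeps the sketch free of date parsing.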

Alberto Cairo: There are seasonal patterns in there. During the Bush administration, you see things going up, down, up, down, up, down, up, down. Constantly.

Ari Lamstein: And the month is fascinating. So, my assumption, because I know nothing about this domain, really, I'm new to it, I should say… my assumption was, if you told me there was seasonality, I would say, well, it's very hot, so I assume most people come in the winter, and then they're not coming in the summer. But the low month is… and this is Clinton, by the way, and the seasonality really ends during Obama's administration, which we could talk about later. But the low month is consistently December, and the high month is consistently, like, March, which is the spring. I googled it, like, why is that? People talk about it, but I didn't find anything very authoritative and persuasive about why that seasonality existed, so I won't repeat it here, I'll just encourage others to look for it. I want to be upfront when I don't know things. And then this here, the beginning of Trump's first administration, is the lowest point in the data set from its beginning up until that point, and we're not even talking about Trump's second administration, so that statistician, he was really right, and wow… the change of administrations led to a dramatic change there, as did Biden's first term, right at the start.
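One way to check a seasonal pattern like this is to group the monthly series by federal fiscal year (which starts in October, so October 1999 falls in FY2000) and record the lowest and highest month in each year. The sketch below assumes the data is a list of `(year, month, count)` tuples; the helper names and the sample counts are hypothetical and chosen only to mirror the December low and March high mentioned above.

```python
from collections import defaultdict


def fiscal_year(year, month):
    """U.S. federal fiscal years start in October: October 1999 is in FY2000."""
    return year + 1 if month >= 10 else year


def low_and_high_months(series):
    """Given [(year, month, count), ...], return {fiscal_year: (low_month, high_month)}."""
    by_fy = defaultdict(list)
    for year, month, count in series:
        by_fy[fiscal_year(year, month)].append((count, month))
    # min/max compare on count first, so [1] picks out the matching month
    return {fy: (min(vals)[1], max(vals)[1]) for fy, vals in by_fy.items()}


# Made-up counts for part of one fiscal year.
sample = [
    (1999, 10, 90), (1999, 11, 80), (1999, 12, 70),
    (2000, 1, 120), (2000, 2, 150), (2000, 3, 170), (2000, 4, 160),
]

extremes = low_and_high_months(sample)
print(extremes)
```

If the low and high months stay the same across many fiscal years and then start to wander, that is a simple numeric signal of the seasonality fading.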

Alberto Cairo: Yeah, there's a spike in there. But there's also a spike in the middle of Trump's first administration, May 2019.

Ari Lamstein: And I'm really glad you pointed that out, because when I showed this to friends… we live in a time when partisanship is so ingrained in people that, even when presented with that spike, they didn't want to believe it existed. Consistently, and that was very surprising to me. I think you have the same mindset as me, which is: analyzing data is playful, and we like to be surprised. And I found people saying, oh, I am not surprised at all. When you say, Ari, that the lowest value was when Trump took office the first time, and now the new all-time low is the second time, I'm not surprised at all, and if you watch such and such news show, you won't be surprised at all that Trump takes it as a source of pride that this number is so low during his administration. And then I say, well, what about this huge spike? Do you know about that? Is that part of the narrative that you have about this dataset? And they just don't want to talk about it. And if you Google it, which I did… and again, I'm not an expert on this domain, I think I'm an expert on data analysis, but this domain is special. There is a set of countries called the Northern Triangle, which are the countries that are on Mexico's southern border. I believe… El Salvador, Guatemala.

Alberto Cairo: Yeah, Nicaragua, perhaps, Honduras?

Ari Lamstein: And the only news articles I saw said that the influx was from those countries. And the data set I have lets you see citizenship. I haven't done that yet. None of the news articles I googled about, you know, May 2019 southern border, southwest land border, none of them said specifically what was going on in those countries that led to it. And that's fascinating as well, that it's not part of our news, contemporaneously, to explain what's going on in these countries that is leading to it. The contemporary news articles stated the fact of the increase and the citizenship, but they didn't go into what was going on that led to it.

Alberto Cairo: But I believe that that is part of the value of a visualization such as this one that you have created, which essentially lets you discover all these trends, all these patterns, spikes and lows, and so on and so forth. You don't know what caused them, but at least the visualization itself prompts you to explore more and look for… that's what I always explain to people, that the value of exploratory data analysis, John Tukey style, is…

Ari Lamstein: Yes, sir.

Alberto Cairo: So, since I referred to Tukey before, right? John Tukey said, I mean, the value of this stuff is not that you will answer any question, it's that it will prompt you to look for more answers, or to pose better questions from the data, deeper and deeper and deeper questions from the data. Looking into the national origins of these immigrants will illuminate that spike a little bit more, who knows? But in the end, maybe the answers are not quantitative but more qualitative. We need to put all this in, sort of, like, the historical and political context of what was going on in Central America at this time.

Ari Lamstein: I… I agree, and… you know, when I talk to my engineering friends about this graph, they're… we should probably tell people the URL for the app. I should probably say that out loud.

Alberto Cairo: I think that they are seeing it. That's a good idea, that's a good idea, yes.

Ari Lamstein: immigration-enforcement.streamlit.app. And to go with the point I was going to make earlier: when I showed this graph to my engineering friends, their first response was, well, maybe there are more or less Border Patrol agents. And that's a very engineering way to view this data set and the questions that it poses. But it was fascinating when I spoke to a domain expert who's also very mathematical, the statistician at Homeland Security. And let that be a lesson to everyone who's watching this now: you can contact the Department of Homeland Security. I went to a conference, the Association of Public Data Users conference, and I spoke about my app to someone there who put me in touch with this person, but he's like, no, part of my job is to talk to the public, so anyone listening to this should feel they can do that, too. He said it's not mathematical. You were saying quantitative versus qualitative. It's not that there are more or less Border Patrol agents, it's that things change in foreign countries, and our immigration policy changes, and people who are considering coming here know that. And they respond to it. And as someone with a mathematical, technical background, that was a surprise to me. But it intrigued me enough to check, and… I think it's right.

Alberto Cairo: But I think that it is a feature, at least in my experience, a trait of even mathematically-oriented data analysts. They are very sensitive to these types of factors. They are very, very aware of the fact that not all analysis is quantitative, purely quanti… it cannot be, just because reality cannot be quantified in its entirety. All quantification is essentially an abstraction from reality, and that abstraction will not capture all the, sort of, like, the nuances of the essence of the realities that we're trying to describe. I think that's a great example of this. I love this for that.

Ari Lamstein: Thank you. I felt very lucky that I got introduced to that person, and that he pointed me towards that answer to the question I posed in that original blog post.

Alberto Cairo: What are the next steps here? Or what else could we learn from the application?

Ari Lamstein: What I'd like to do next is pursue that citizenship question. The data sets… Here, in the About page, I… I list where I'm getting the data from. That… those spreadsheets, and unfortunately, Department of Homeland Security does not have an API, so you have to get Excel workbooks and write scripts to import them into Python or R and so on.

Alberto Cairo: Old-fashioned way.

Ari Lamstein: Old-fashioned. And that statistician I spoke to, he cracked me up. Because I began by saying, my first question is, am I getting the data right? Are these the spreadsheets you have? You don't have an API? I have a lot of experience working with Census, they have a wonderful API. And he said, I was planning to twist your arm to say that, so that I could go to my boss and say, we really do need an API. But those spreadsheets, they're workbooks with multiple sheets. I think that's the Excel language, I don't really use Excel, I don't know. But they have that citizenship dataset, and they also have a family status sheet. So I would like to have a drop-down here, like a multi-select, that lets you see the citizenship over time, so you could see the same graph per country. So, as an example, I could then look at this and say, I googled it, I saw contemporary news reports saying they were from the Northern Triangle countries, and here's what the graph of those countries is. It would be interesting to know what countries are the most popular, for lack of a better word, right now. Where are people coming from along that border? And also, again, because this is largely my own curiosity, I remember hearing about unaccompanied minors in the past. The family status, I think that's the term maybe they use. When someone comes over, they're either an adult, a single adult, a family, a family with kids, or an unaccompanied minor. I'd like to see how the unaccompanied minor story has changed over time, because that was in the news. Those are the two things that interest me to add right now.
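The data side of the multi-select Ari has in mind could look roughly like the sketch below: given rows of `(month, citizenship, encounters)` and a list of chosen countries, build one time series per country. The rows are made up, and `series_for` is a hypothetical helper, not part of the app. In a Streamlit app the `selected` list would typically come from a widget such as `st.multiselect`; here it is hard-coded so the example runs on its own.

```python
from collections import defaultdict


def series_for(rows, selected):
    """Return {citizenship: [(month, encounters), ...]} for the selected countries only."""
    out = defaultdict(list)
    for month, citizenship, encounters in sorted(rows):
        if citizenship in selected:
            out[citizenship].append((month, encounters))
    return dict(out)


# Invented rows for illustration; not real encounter counts.
rows = [
    ("2019-06", "Guatemala", 450), ("2019-06", "Honduras", 350), ("2019-06", "Mexico", 280),
    ("2019-05", "Guatemala", 500), ("2019-05", "Honduras", 400), ("2019-05", "Mexico", 300),
]

selected = ["Guatemala", "Honduras"]  # stand-in for the user's multi-select choice
per_country = series_for(rows, selected)
print(per_country)
```

Each value in `per_country` is a chronologically ordered series, so it can be handed to whatever charting call draws one line per country.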

Alberto Cairo: I wonder whether data exists about how many American citizens have been at least stopped by ICE or harassed by ICE, because we… I mean, we have read plenty of news stories everywhere, and there are very credible news that that has happened, and has happened repeatedly, because of all this onslaught of detentions that we are seeing in the news. But I don't know whether they even have that data. It would be great if they did.

Ari Lamstein: I believe ProPublica just had an article about U.S. citizens who were detained by ICE, some for several days. That would not be in this dataset, because… so there's this, this is a keyword, Southwest Land Border.

Alberto Cairo: Yeah, of course.

Ari Lamstein: When you download this dataset, this was specifically from CBP, Customs and Border Protection.

Alberto Cairo: Border Patrol protection, yeah. So it's just border crossings, essentially.

Ari Lamstein: They have 3 borders. Southwest land border, northern land border, and then… Ocean border or sea border.

Ari Lamstein: My first step was to… overwhelmingly, overwhelmingly, people are southwest land border, and then it gets much more complicated, where it's… is it a Border Patrol action? Is it an enforcement action on the Border Patrol, or is it at a border crossing, where it could be an administrative or enforcement action? And I was told to look at Border Patrol on the land border, which means it's not at, like, a port of entry. So what I'm saying is that is an interesting question, but it would not be part of this data set. And I don't know where I could get it.

Alberto Cairo: Knowing where to get the… knowing where to get the data. But, you know, your project is what it is, and I think that what it does, it does really well. It's a work in progress that can still be expanded, and I hope that organizations out there, or other individuals, will use it as a starting point for further investigations. You mentioned ProPublica, for example, which I believe is doing an excellent job in terms of, sort of, like, holding the government accountable on these topics. I could also mention Mother Jones. I wrote about Mother Jones' coverage of ICE, on the detention center complex that is being built in the United States these days, and they're also doing great investigations about this. So, as you said, I mean, I think that there's still a lot of work to be done, lots of investigations to be done, and it would be great if those investigations could incorporate as much data as possible to put this story in context, so we could go beyond the individual anecdotes, or… I would not call them anecdotes, but individual cases, which are important, per se, because there are many, you know, heartbreaking stories that we have all read about. But if we combine those personal stories, which are heartbreaking on their own, with the data that puts those stories in context, and perhaps shows that those stories are part of a pattern, I think that the stories will become much more powerful.

Ari Lamstein: I agree, and there's one point I'd like to make. You were talking about building on this. If you click on the About page, you'll see here the code for this app is open source. It's released under the MIT license, and there's a section here in the README. These blog posts talk about the features I added and why. Here it says where the data comes from. And then there's a section I just added, Build Your Own Analysis. If you're working with TRAC's data, you'll have to do some loading and cleaning and so on. That's in this module called detentions.py. For working with the Border Patrol data, there's a separate module for that called Border Patrol. And I just added this directory called Notebooks that has Jupyter notebooks that explain how to use those datasets, those modules. And because it's all released under the MIT license, you're free to use it as a starting point for your own analyses. It was intended as a contribution.

Alberto Cairo: Yeah, I was about to say congratulations on this project. I think that it exemplifies something that I try to share with people these days in my talks, that there are many areas, many topics, many stories that mainstream news organizations don't have the bandwidth to cover, or to cover deeply enough. And that's where individuals who are motivated to illuminate the truth, and to tell those stories, come in. I think that your project is a great example of that, so…

Ari Lamstein: Thank you very much.

Alberto Cairo: …sharing it with us. Ari Lamstein, thank you so much for being here. It has been a pleasure.

Ari Lamstein: Thank you, Alberto.
