Build With Me

A Data Scientist's Workshop

Building real things with AI, open data, and a security mindset.
From messy datasets to deployed products. NYC-based.

Let's Learn Together

Welcome to the workshop. The world is moving fast, and technology is moving even faster. There's a lot to learn: new things, old things, things that didn't exist six months ago. I'm a data scientist based in NYC with a background in cybersecurity, and I'm here to learn alongside you. We'll build real tools together, varying in scope and difficulty, and I'll show you exactly how I use LLMs and AI throughout the process, because they're part of how this work gets done now.

To be clear, this isn't a commentary blog. It's a workshop. I want to build things in the open and show you exactly how I do it: every decision, every false start, every ugly intermediate step. Because that's the part nobody talks about.


Here's what I believe: We are living through a genuine shift in how technical work gets done. Tools like ChatGPT, Claude, and the broader ecosystem of AI assistants have changed the game. I started my career right before this wave hit, which means I'm in a strange middle ground. I didn't get decades of deep programming fundamentals before AI showed up. But I also know enough to understand that "just ask the AI" is not a strategy. It's a starting point.

A lot of people will tell you that learning to code properly doesn't matter anymore. I disagree. Not because everyone needs to be a software engineer, but because knowing how to think through a problem is what makes AI tools actually useful. If you can't break a problem down, you can't prompt your way out of it. If you don't understand what a deployment pipeline does, you can't ask an AI to build one for you. The fundamentals matter more now, not less — they just show up differently.

what this blog is about
# Not just code. The full picture.

$ find /real-world-problems -type f
  ./messy-data/cleaning-and-parsing
  ./ai-tools/using-them-to-actually-build
  ./deployment/getting-it-into-production
  ./security/because-that-matters-too
  ./problem-solving/the-real-skill

$ echo "Let's build."
  Let's build.

This blog is about the full lifecycle of building things. Not just writing a script in a Jupyter notebook and calling it done. I'm talking about taking a real problem, finding the data, cleaning the data (because it's always messy), building something useful, deploying it so people can actually use it, and thinking about security along the way. That's the work. And I want to walk through all of it with you.

I'm going to use AI throughout this process — not as a crutch but as a collaborator. I'll show you when it helps, when it doesn't, and how to think about the problems themselves so you can make these tools work for you regardless of which model is trending next month.


So what am I actually going to build? I have a few projects lined up, and they all use publicly available data so you can follow along. Here's the roadmap:

Project Roadmap
01

Where Are NYC's Algorithms?

New York City publishes data about where it uses algorithmic tools, meaning tools that make or influence decisions about residents. The dataset exists in the NYC Open Data portal, but it's not exactly plug-and-play. I'm going to walk through acquiring the data, parsing and cleaning it, making sense of what's actually there, and eventually building a website that presents it in a genuinely useful way.

NYC Open Data Data Cleaning Web Development AI-Assisted
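To give you a feel for the "acquire and clean" step, here's a minimal sketch. The Socrata JSON endpoint pattern is how NYC Open Data serves records, but the dataset ID below is a placeholder, and the field names (`agency`, `tool_name`) are assumptions until we inspect the real schema:

```python
import json
import urllib.request

# Socrata datasets expose a JSON endpoint; the dataset ID here is a
# placeholder -- look up the real one on the NYC Open Data portal.
SODA_URL = "https://data.cityofnewyork.us/resource/XXXX-XXXX.json"

def fetch_records(url: str, limit: int = 1000) -> list[dict]:
    """Pull raw records from a Socrata JSON endpoint."""
    with urllib.request.urlopen(f"{url}?$limit={limit}") as resp:
        return json.load(resp)

def clean_record(raw: dict) -> dict:
    """Normalize one record: trim whitespace, fill missing fields.
    The field names ('agency', 'tool_name') are illustrative."""
    return {
        "agency": (raw.get("agency") or "UNKNOWN").strip().upper(),
        "tool_name": (raw.get("tool_name") or "").strip(),
    }

# The cleaning step works the same on fetched or sample data:
sample = [
    {"agency": "  nypd ", "tool_name": "Facial Recognition "},
    {"tool_name": "Some Tool"},  # missing agency -- it happens constantly
]
cleaned = [clean_record(r) for r in sample]
# cleaned[0] == {"agency": "NYPD", "tool_name": "Facial Recognition"}
```

Even this toy version shows the shape of the problem: open data arrives with stray whitespace, inconsistent casing, and missing fields, and you have to decide up front how to handle each one.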
02

Vendor Risk: Mapping CVEs to the City's Supply Chain

When a zero-day drops on a major vendor — say Microsoft — how do you figure out the blast radius? The city's vendor data is public. CVE databases are public. I'm going to link them together and build a tool that answers: if this vendor gets hit, who in the organization is affected?

Cybersecurity Supply Chain CVE Analysis Risk Assessment
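To make the linking idea concrete, here's a hedged sketch of the core join: matching a CVE's vendor against procurement records. Real vendor strings and CPE data are far messier than this, and the field names and suffix list are assumptions:

```python
import re

def normalize_vendor(name: str) -> str:
    """Collapse a vendor name to a comparable token:
    lowercase, strip punctuation, drop corporate suffixes."""
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())
    suffixes = {"inc", "corp", "corporation", "llc", "ltd", "co"}
    tokens = [t for t in name.split() if t not in suffixes]
    return " ".join(tokens)

def affected_contracts(cve_vendor: str, contracts: list[dict]) -> list[dict]:
    """Return contracts whose vendor matches the CVE's vendor.
    The 'vendor'/'agency' field names are illustrative."""
    target = normalize_vendor(cve_vendor)
    return [c for c in contracts if normalize_vendor(c["vendor"]) == target]

contracts = [
    {"vendor": "Microsoft Corporation", "agency": "DOE"},
    {"vendor": "MICROSOFT CORP.", "agency": "NYPD"},
    {"vendor": "Oracle Inc.", "agency": "DOT"},
]
hits = affected_contracts("microsoft", contracts)
# hits contains the two Microsoft contracts
```

Exact-match-after-normalization is the naive baseline; the interesting work in the series will be handling the cases where it fails, like subsidiaries, resellers, and renamed vendors.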
03

Locking Down MCP Servers

Model Context Protocol is opening up new ways to extend AI systems, but security is an afterthought for most implementations. How do you apply zero trust principles to an MCP server? What open source tools can you use? I'm going to find out and document everything.

MCP Security Zero Trust Open Source
04

The Asset Attribution Problem

In any large organization, asset ownership data is scattered everywhere. Different systems, different formats, different levels of staleness. How do you build a multi-tiered agentic pipeline that pulls this together and actually makes sense of who owns what? This one's complex. That's the point.

Agentic Pipelines Data Integration Enterprise
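One small piece of that pipeline, reconciling conflicting ownership claims for a single asset, might look like the sketch below. The source names, field names, and trust ordering are all illustrative assumptions, not a real enterprise schema:

```python
from datetime import date

def reconcile_owner(records: list[dict]) -> dict:
    """Given ownership claims for one asset from multiple systems,
    prefer the freshest record; source trust breaks ties.
    Sources and priority order are assumptions for illustration."""
    priority = {"cmdb": 0, "cloud_tags": 1, "spreadsheet": 2}  # lower = more trusted
    return min(
        records,
        key=lambda r: (-r["last_seen"].toordinal(), priority.get(r["source"], 99)),
    )

claims = [
    {"source": "spreadsheet", "owner": "team-a", "last_seen": date(2023, 1, 5)},
    {"source": "cmdb",        "owner": "team-b", "last_seen": date(2024, 6, 1)},
    {"source": "cloud_tags",  "owner": "team-b", "last_seen": date(2024, 6, 1)},
]
best = reconcile_owner(claims)
# best["owner"] == "team-b", taken from the cmdb record
```

The hard part isn't the tie-breaking logic; it's everything upstream of it: getting the systems to agree on what an "asset" even is, which is where the agentic layers come in.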

Every project will be built in the open. I'll share the code, the data sources, the mistakes, and the decisions. If you want to build along with me, you'll have everything you need.

Who is this for? If you're a professional who's been in your field for a few years and you're curious about how AI tools can help you build real things — not just generate text, but actually build and ship — this is for you. You don't need to be an expert programmer. You don't need a computer science degree. You need to be willing to work through the messy parts, because that's where the learning happens.

This is also for the people who, like me, have ideas that don't fit neatly into a Jira ticket or a quarterly roadmap. The ones who want to build something and put it out into the world on their own terms.

Welcome to the workshop. Let's build something.

Coming Up Next
Series 01 — Part 1

Getting the Data: NYC's Algorithmic Tools

Pulling data from the NYC Open Data portal and taking a first look at what we're working with.

Series 01 — Part 2

Cleaning the Mess

The dataset isn't ready to use out of the box. Here's how we parse, clean, and structure it.

Status

Projects Coming Soon

Project pages are being built out alongside the work itself. Check back as the series progresses — everything will be documented here.

Status

About Coming Soon

More details on who's behind this workshop are on the way. For now, start with the writing — that's the whole point anyway.