Welcome to my blog!

Musings, wanderings & learnings

Christopher M. Stewart, Ph.D.

🤖 On Data and the Two Realities of AI 🤖

April 9th, 2026

If you read my last post, you might recall the idea that "find the words, learn the grammar, let the geometry do the work" goes further than you might think. Two recent projects have added a layer of complexity to that story. It turns out, geometry is a hungry beast.

Reality #1

I recently built a dependency parser for English using a feedforward network with a single hidden layer. A dependency parser predicts how the words in a sentence relate to one another; even "The cat sat on the mat" turns out to be surprisingly difficult to model. Nothing fancy: the model trained in seconds and hit 88% accuracy. The math was simple, but the dataset was large and well-annotated. The geometry did the work because the words were there.
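The core forward pass of that kind of parser is compact. Here is a minimal numpy sketch in the style of a transition-based feedforward parser (à la Chen & Manning, 2014): concatenated feature embeddings go through one hidden layer, then a softmax picks the next parsing transition. The dimensions, activation, and unlabeled transition set are illustrative assumptions, not the exact configuration I used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: the input is a concatenation of embeddings for words
# around the parser's stack and buffer. These numbers are assumptions.
n_features, embed_dim, hidden_dim = 36, 50, 200
n_transitions = 3  # SHIFT, LEFT-ARC, RIGHT-ARC (unlabeled, for simplicity)

W1 = rng.normal(0, 0.01, (n_features * embed_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(0, 0.01, (hidden_dim, n_transitions))

def predict_transition(feature_embeddings):
    """One forward pass: concatenate features -> hidden layer -> softmax."""
    x = feature_embeddings.reshape(-1)      # flatten into one input vector
    h = np.maximum(0, x @ W1 + b1)          # the single hidden layer (ReLU)
    logits = h @ W2
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    return probs / probs.sum()

feats = rng.normal(size=(n_features, embed_dim))  # stand-in feature embeddings
probs = predict_transition(feats)
print(probs.shape, round(float(probs.sum()), 6))
```

Training just nudges the weights so the highest-probability transition matches the treebank annotation; with enough annotated sentences, that's all it takes.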

Reality #2

I then built a neural machine translation model that translates Cherokee, a language with fewer than 2,000 fluent speakers, to English.

This time, the architecture was sophisticated: a bidirectional LSTM with an attention mechanism, the same core "score and weight" math that powers modern transformer-based LLMs. It was elegant, cutting-edge, and harder to engineer than I anticipated! It scored 12.13 out of 100 on BLEU, a common metric for translation quality. Professional translation usually lands between 40 and 60.
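The "score and weight" math is worth seeing concretely. Below is a minimal dot-product attention sketch in numpy: score each encoder state against the decoder state, softmax the scores into weights, and return the weighted sum as the context vector. This is the generic mechanism, not my exact model (which used a bidirectional LSTM encoder and a learned scoring function).

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the decoder
    state, normalize the scores, and return the weighted sum (context)."""
    scores = encoder_states @ decoder_state   # score: one number per position
    weights = np.exp(scores - scores.max())   # stable softmax...
    weights /= weights.sum()                  # ...so weights sum to 1
    context = weights @ encoder_states        # weight: blend encoder states
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(6, 8))  # 6 source positions, hidden size 8 (toy sizes)
dec = rng.normal(size=8)
context, weights = attention(dec, enc)
print(weights.sum())  # softmax weights always sum to 1
```

Transformers apply the same idea with learned query/key/value projections and many heads in parallel, but the score-normalize-blend loop is identical.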

On data

The architecture wasn't broken. My implementation wasn't naive. The problem was the fuel.

Cherokee is often called a "low-resource language." I have always found the framing of that term very grating, but it's hard to argue with its implications from an AI/ML point of view. From that same point of view, it's not just languages that are "low-resource". The moment you need AI to perform reliably in a specific domain, whether it's processing filings for a novel legal theory or classifying customer service tickets for a new business, you hit the same wall. General-purpose models get you started, but the last mile of performance still depends on data that probably doesn't exist yet.

As Fred Jelinek famously put it: "There's no data like more data." If the words aren't there, the geometry has nothing to shape.

🔗 Dependency Parser | Cherokee-to-English NMT 🔗

📊 Is data "data"? 📊

March 25th, 2026

At a breakfast at Google's LAX campus about ten years ago, I was discussing my team's crowdsourcing work with some colleagues. We were two linguists and a quantitative psychologist, working on the kinds of projects that companies like scale.ai would eventually make gobs of money doing for other companies. The data that we were collecting was all about ads, yet the three of us had no background in advertising whatsoever. The consensus was that "data is data": we didn't necessarily need extensive experience with advertising, or with individual partner teams' use cases, to do our work of getting high-quality data to train and/or validate their models.

Fast-forward a decade. For the past nine months, I have been working with lots of different collaborators on different kinds of projects. The people I collaborate with have titles like AI scientist, faculty member at a research hospital, professor, graduate student, and startup founder. The projects address AI safety, both methodologically and in real-world contexts: cancer survivorship care, red-teaming, and the language patterns that distinguish AI output from human writing, among others. I have found that the deeper the collaboration, the more important it is to understand the culture of the domain in which my collaborators work. For example, physicians have a very particular perspective on the safe and secure deployment of medical AI that has to be taken seriously for a fruitful collaboration. A focus on the patient rather than on the AI results in different priorities. "Data is data" only works up to a certain point.

With that being said, there are contexts in which the "data is data" mindset can be helpful. In the second week of XCS224N, Stanford's NLP with Deep Learning course, we did a deep dive on the 2013 paper that introduced Word2Vec, a computational implementation of the famous British linguist J.R. Firth's "you shall know a word by the company it keeps". In 2018, this insight showed up in chemistry. Mol2Vec treats molecules the way Word2Vec treats sentences: break a molecule into its substructures, treat each one as a "word", and learn embeddings from a corpus of 19.9 million compounds. The result is a 300-dimensional vector for any molecule, a compressed representation of its chemical "meaning". In a recent interview, I used Mol2Vec (and XGBoost) to predict the temperature range where polymers transition from a hard, brittle glassy state to a soft, pliable one, the glass transition temperature (Tg), from string representations of their structures known as "SMILES". On a held-out test set, the trained model achieved an R² of 0.78. Not too shabby.
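Firth's idea can be demonstrated in a few lines without any of Word2Vec's (or Mol2Vec's) machinery. The toy sketch below builds embeddings the crudest possible way: count which words co-occur within a one-word window, then compress the count matrix with a truncated SVD. The corpus and window size are made up for illustration; real systems learn the compression with a neural objective over millions of "sentences", whether those are English text or molecular substructures.

```python
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a +/- 1 word window ("the company it keeps").
M = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                M[idx[w], idx[sent[j]]] += 1

# Compress the counts into dense 2-d vectors with a truncated SVD -- the
# "geometry of the embedding space" that Word2Vec learns more cleverly.
U, S, _ = np.linalg.svd(M)
emb = U[:, :2] * S[:2]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words (or substructures) with similar contexts end up geometrically close.
print(cosine(emb[idx["cat"]], emb[idx["dog"]]))
```

Swap "words in sentences" for "substructures in molecules" and you have the Mol2Vec recipe; the downstream regressor (XGBoost, in my case) then just learns to map those vectors to a property like Tg.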

I have no background in chemistry or materials science. But the paradigm of "find the words, learn the grammar, let the geometry of the embedding space do the work" appears to have some juice in polymer informatics, just like in natural language processing. Sometimes data really is data.

🔗 Full notebook and code 🔗

Stanford AI Professional Program: NLP with Deep Learning

Spring 2026

I am currently working on completing Stanford Engineering’s Artificial Intelligence Professional Program and am taking XCS224N: Natural Language Processing with Deep Learning. The course covers 10 modules and 5 assignments, providing deep theoretical and practical grounding in modern NLP. Below is a summary of my notes from the first three modules, followed by the full reference document.

Notes: The embedded PDF below contains my complete reference notes from Modules 1, 2, and 3 of XCS224N, including mathematical derivations, worked examples, and key takeaways for later modules. I'll keep it current as I progress through the course's modules. The math is intense for me, but I'm lucky to have time to work on digesting it right now.

RAG via Hierarchical Bayesian Language Modeling

LinkedIn Post · March 1, 2026

This post previews a forthcoming COLM submission about automatic prompt optimization with neural and non-neural RAG for financial question-answering. Is there an alternative to RAG? Maybe something Bayesian? Perhaps even better than neural embeddings and similarity in some cases? The discussion touches on AI, NLP, and practical considerations for building effective retrieval pipelines.