The AI UX Dispatch
Key AI Reads for August 27, 2025
Issue 12 • MIT study on enterprise AI implementations, RAG and model fine-tuning working together, evaluating the performance of digital twins and synthetic users, ChatGPT versus Claude memory implementations
AI in the Enterprise
MIT study casts a shadow on enterprise AI implementations. Or does it?
This past week, MIT released a study that claims 95% of corporate generative AI pilots have failed to deliver meaningful returns, leading to a crop of doom-and-gloom headlines and market reaction.
Largely overshadowed, however, are the limitations of the study's methodology and more thoughtful interpretations of its findings.
From Ethan Mollick on LinkedIn: "I am not sure how generalizable the findings are based on the methodology (based on 52 interviews, convenience sampled, where failed apparently means no sustained and significant P&L impact within six months, without an explanation of how they coded the interviews). I have no doubt pilot failures are high, but I think it is really hard to see how this report gives the kind of generalizable finding that would move markets."
From Nate Jones ($): "More importantly, look at what the 5% who succeed are achieving. The study mentions startups jumping from zero to $20 million in revenue within a year. Established companies seeing what they call 'exponential gains' in productivity. The failures aren't about AI's limitations; they're about implementation. Companies trying to build their own tools instead of partnering. Focusing on the wrong use cases. Lacking proper data infrastructure. Missing the cultural change management entirely."
As someone who built a career in enterprise systems, I find Nate's callouts especially resonant. Long-understood failure modes in enterprise system development are no doubt amplified with AI, given all the unknowns and complexities around it. These are still early, early days.
The GenAI divide: State of AI in business 2025
Long Read (28 minutes)
AI in the Enterprise
RAG versus fine-tuning: it's not an either/or
Imagine your company wants to build an AI technical support chatbot. It needs to understand your product deeplyāits features, common problems, even quirks in how your latest feature works. A chatbot based on a large, general model such as GPT-5 can sometimes be surprisingly helpful for the most common problems, but may not be able to address the breadth of issues people encounter in sufficient depth. How do you get from generic AI to something that actually knows your product?
This week, Armand Ruiz shared an implementation sequence for addressing this challenge using two established approaches: RAG (Retrieval Augmented Generation) and model fine-tuning.
With RAG, you connect the large general model to your company's support knowledgebase, documentation, and FAQs. Now, when someone asks "How do I use the new timeline view?", the system first searches these documents, retrieves relevant information, and feeds that context to the LLM along with the question.
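To make that flow concrete, here's a minimal RAG sketch in Python. The `search_knowledgebase` stub and its canned documents are hypothetical stand-ins (a real retriever would typically embed the query and run a vector search over your docs), and the model name follows the GPT-5 example above:

```python
from openai import OpenAI

client = OpenAI()

def search_knowledgebase(query: str, top_k: int = 3) -> list[str]:
    # Hypothetical stand-in for a real retriever: in production this would
    # embed the query and run a similarity search over your support docs,
    # knowledgebase articles, and FAQs.
    docs = [
        "Timeline view: open a project and click 'Timeline' to see tasks by date.",
        "Known quirk: the timeline view hides tasks that have no due date.",
    ]
    return docs[:top_k]

def answer_with_rag(question: str) -> str:
    # Retrieve relevant snippets, then feed that context to the LLM
    # along with the question.
    context = "\n\n".join(search_knowledgebase(question))
    response = client.chat.completions.create(
        model="gpt-5",  # large general model; swap in your provider's name
        messages=[
            {"role": "system",
             "content": "Answer using only the product documentation below.\n\n"
                        + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer_with_rag("How do I use the new timeline view?"))
```

The key design point: the model itself never changes; grounding comes entirely from what you retrieve at query time.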
Another, more involved technique (with higher up-front costs) is using a model fine-tuned to answer questions about your product. With fine-tuning, you start with a pre-trained model (already trained on massive amounts of text) and continue training it on your specific data.
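As a sketch of what that looks like in practice, here's one concrete option using OpenAI's fine-tuning API; the training examples, file name, and base model are illustrative, and other providers follow a similar prepare-data-then-train shape:

```python
import json
from openai import OpenAI

client = OpenAI()

# Illustrative training examples distilled from your support history.
examples = [
    {"messages": [
        {"role": "system", "content": "You are the Acme product support assistant."},
        {"role": "user", "content": "How do I use the new timeline view?"},
        {"role": "assistant", "content": "Open a project and click 'Timeline'. "
                                         "Drag a task to change its dates."},
    ]},
    # ...hundreds more pairs covering real tickets, quirks, and FAQs
]

# Write the chat-format JSONL file the fine-tuning API expects.
with open("support_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the data and launch a fine-tuning job on a smaller base model.
training_file = client.files.create(
    file=open("support_finetune.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # illustrative smaller base model
)
print(job.id)  # poll until done, then call the resulting fine-tuned model
```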
In his LinkedIn post, Armand addresses a common question about fine-tuning: "A customer asked me WHY fine-tuning at all when they can get all they want with RAG?"
The answer is significant cost savings: "You will want to fine-tune to move out of very large LLMs that are not cost-effective. A Small Language Model (SLM) tuned for your use case can yield 10-50x savings in production."
Armand recommends a staged progression:
1. Start with a large model (such as GPT-5) to explore and validate use cases.
2. Add RAG to ground responses in your reality.
3. Fine-tune a smaller model for 10-50x cost savings.
He concludes:
"The answer to RAG vs. fine-tuning is not an either/or choice. The two techniques complement each other brilliantly. For example, a fine-tuned customer support chatbot could leverage RAG to incorporate the latest customer data, providing contextual and personalized responses."
It's worth reading his whole post and the associated comments.
Armand Ruiz on LinkedIn
Quick Read (1 minute)
AI for Research
Putting digital twins and synthetic users to the test
From the Nielsen Norman Group:
"Can AI-powered models replace real people in user research? A growing body of research is exploring whether digital twins (generative AI models designed to simulate individual users) and synthetic users (models that mimic broader user groups) can replicate real human responses. In UX, these technologies raise exciting possibilities for scaling research, filling in gaps, and running studies that might otherwise be too slow or expensive."
The NN/g article goes on to summarize findings from three studies evaluating the performance of digital twins and synthetic users: how they were built, what kinds of tasks they performed, and how closely their results matched real human data.
Key takeaways from these studies:
"These findings reinforce the idea that digital twins and synthetic users exist on a continuum. Models based solely on generic demographic or persona-like inputs ā such as those used for synthetic users ā tend to underperform compared to those enriched with deeper context. That context may be tailored to the individual, as with digital twins, or rooted in detailed domain knowledge, as with synthetic users enhanced through RAG-based techniques."
There is nuance in the findings, including ethical considerations, so I recommend reading the entirety of NN/g's article.
Evaluating AI-Simulated Behavior: Insights from Three Studies on Digital Twins and Synthetic Users
Medium Read (12 minutes)
Frontier Models
Chat memory: a tale of two approaches (ChatGPT versus Claude)
ChatGPT and Claude have both rolled out memory features, but their approaches reveal differing philosophies about how AI should remember and draw on past chats. (For both products, memory is an optional feature you can turn on or off under account settings.)
ChatGPT's memory, available since February 2024 across most tiers, automatically builds a persistent profile of you across all conversations, learning your preferences and proactively applying them.
Claude's newly announced memory feature, currently exclusive to its highest tiers with broader rollout "coming soon," takes the opposite approach: it only references past conversations when you explicitly ask it to.
Overall, ChatGPT's approach focuses on convenience and seamlessness, while Claude prioritizes user control.
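To make the contrast concrete, here's a conceptual sketch of the two design patterns; this is not either vendor's actual implementation, just the shape of the trade-off:

```python
# Conceptual sketch only: neither OpenAI's nor Anthropic's actual code.

def build_context_auto(profile: dict, new_message: str) -> list[dict]:
    # ChatGPT-style pattern: a persistent learned profile is injected into
    # every conversation, so preferences apply proactively and invisibly.
    return [
        {"role": "system", "content": f"Known user preferences: {profile}"},
        {"role": "user", "content": new_message},
    ]

def build_context_on_request(past_chats: list[str], new_message: str,
                             user_asked_to_recall: bool) -> list[dict]:
    # Claude-style pattern: past conversations are searched and included only
    # when the user explicitly asks the assistant to look back.
    messages = [{"role": "user", "content": new_message}]
    if user_asked_to_recall:
        recalled = "\n".join(past_chats[-3:])  # stand-in for real retrieval
        messages.insert(0, {"role": "system",
                            "content": f"Recalled past conversations:\n{recalled}"})
    return messages
```

The first pattern optimizes for seamlessness at the cost of compartmentalization; the second keeps each chat clean unless you opt in, which is exactly the control trade-off described above.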
I've not yet had a chance to use Claude's memory feature because, at this writing, it's not been rolled out to Pro accounts. Anthropic prepared a quick demo of it.
ChatGPT's memory feature is somewhat polarizing; despite its promise, its approach makes it harder to compartmentalize context, particularly to separate professional from personal use. I prefer to "start clean" in each chat so that I have complete control over the context, which is why I've turned ChatGPT's memory off.
Because memory shapes both your interactions and your results, it's essential to understand how these features work if you've opted in.
Anthropic's Claude chatbot can now remember your past conversations
Quick Read (4 minutes)
Memory and new controls for ChatGPT
Quick Read (5 minutes)
Late breaking: ChatGPT has begun rolling out project-based memory, but at this time it's only available to Team/Enterprise accounts (not individual accounts), and memory must be enabled at the workspace level.
That's it for this week.
Thanks for reading, and see you next Wednesday with more curated AI/UX news and insights.
All the best, Heidi