Anthony Deighton is CEO of Tamr. He has 20 years of experience building and scaling enterprise software companies. Most recently, he spent two years as Chief Marketing Officer at Celonis, establishing its leadership in the Process Mining software category and creating demand generation programs that resulted in 130% ARR growth. Prior to that, he served for more than 10 years at Qlik, growing it from an unknown Swedish software company into a public company, in roles spanning product leadership and product marketing, and finally as CTO. He began his career at Siebel Systems, learning how to build enterprise software companies in a variety of product roles.
Can you share some key milestones from your journey in the enterprise software industry, particularly your time at Qlik and Celonis?
I began my career in enterprise software at Siebel Systems and learned a lot about building and scaling enterprise software companies from the leadership team there. I joined Qlik when it was a small, unknown Swedish software company, with 95% of its roughly 60-person team located in Lund, Sweden. I joke that since I wasn’t an engineer or a salesperson, I was put in charge of marketing. I built the marketing team there, but over time my interest and contributions gravitated toward product management, and eventually I became Chief Product Officer. We took Qlik public in 2010 and continued as a successful public company. After that, we wanted to do some acquisitions, so I started an M&A team. After a long and reasonably successful run as a public company, we eventually sold Qlik to a private equity firm named Thoma Bravo. It was, as I like to say, the full life cycle of an enterprise software company. After leaving Qlik, I joined Celonis, a small German software company trying to gain traction selling in the U.S. Again, I ran marketing as the CMO. We grew very quickly and built a very successful global marketing function.
Both Celonis and Qlik were focused on the front end of the data analytics challenge – how do I see and understand data? In Qlik’s case, that was dashboards; in Celonis’ case, it was business processes. But a common challenge across both was the data behind these visualizations. Many customers complained that the data was wrong: duplicate records, incomplete records, missing silos of data. This is what attracted me to Tamr, where I felt that, for the first time, we might be able to solve the challenge of messy enterprise data. The first 15 years of my enterprise software career were spent visualizing data; I hope the next 15 can be spent cleaning that data up.
How did your early experiences shape your approach to building and scaling enterprise software companies?
One important lesson I learned in the shift from Siebel to Qlik was the power of simplicity. Siebel was very powerful software, but it was killed in the market by Salesforce.com, which made a CRM with far fewer features (“a toy,” Siebel used to call it) that customers could get up and running quickly because it was delivered as a SaaS solution. It seems obvious today, but at the time the conventional wisdom was that customers bought features; what we learned is that customers invest in solutions that solve their business problems. So, if your software solves their problem faster, you win. Qlik was a radically simpler solution to the data analytics problem, and as a result we could beat more feature-rich competitors such as Business Objects and Cognos.
The second important lesson I learned came in my career transition from marketing to product. We tend to think of these domains as distinct, but in my career I have found that I move fluidly between product and marketing. There is an intimate link between what product you build and how you describe it to potential customers, and an equally important link between what prospects demand and what product you should build. The ability to move between these conversations is a critical success factor for any enterprise software company. A common reason for a startup’s failure is the belief that “if you build it, they will come,” meaning that if you just build cool software, people will line up to buy it. This never works, and the solution is a robust marketing process connected with your software development process.
The last idea I will share links my academic work with my professional work. I had the opportunity at business school to take a class on Clay Christensen’s theory of disruptive innovation. In my professional work, I have had the opportunity to be both the disruptor and the disrupted. The key lesson I’ve learned is that any disruptive innovation is the result of an exogenous platform shift that makes the impossible finally possible. In Qlik’s case, it was the availability of large-memory servers that allowed Qlik to disrupt traditional cube-based reporting. At Tamr, the availability of machine learning at scale allows us to disrupt manual, rules-based MDM in favor of an AI-based approach. It’s important to always figure out what platform shift is driving your disruption.
What inspired the development of AI-native Master Data Management (MDM), and how does it differ from traditional MDM solutions?
The development of Tamr came out of academic work at MIT (Massachusetts Institute of Technology) on entity resolution. Under the academic leadership of Turing Award winner Michael Stonebraker, the question the team was investigating was: can we link data records across hundreds of thousands of sources and millions of records? On the face of it, this is an insurmountable challenge, because every additional record and source multiplies the number of comparisons that have to be made. Computer scientists call this an “n-squared problem” because the number of comparisons grows quadratically with scale.
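To make the scale problem concrete, here is a small, self-contained sketch (illustrative only, not Tamr’s implementation) of why naive pairwise comparison blows up quadratically, and how a simple blocking key can shrink the candidate set. The records, fields, and blocking key are made up for the example.

```python
# Illustration only: why naive entity resolution is "n-squared",
# and how blocking on a cheap key shrinks the candidate set.
from itertools import combinations
from collections import defaultdict

records = [
    {"id": 1, "name": "Dell Computer", "zip": "78682"},
    {"id": 2, "name": "Dell Inc.", "zip": "78682"},
    {"id": 3, "name": "Dell Technologies", "zip": "78682"},
    {"id": 4, "name": "Acme Corp", "zip": "02139"},
]

# Naive approach: compare every record with every other record -> n*(n-1)/2 pairs.
naive_pairs = list(combinations(records, 2))
print(f"naive comparisons for n={len(records)}: {len(naive_pairs)}")
print(f"naive comparisons for n=1,000,000: {1_000_000 * 999_999 // 2:,}")

# Blocking: only compare records that share a cheap key (here, zip code),
# which replaces one giant quadratic problem with many small ones.
blocks = defaultdict(list)
for r in records:
    blocks[r["zip"]].append(r)

blocked_pairs = [p for block in blocks.values() for p in combinations(block, 2)]
print(f"comparisons after blocking: {len(blocked_pairs)}")
```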
Traditional MDM systems try to solve this problem with rules and large amounts of manual data curation. Rules don’t scale, because you can never write enough of them to cover every corner case, and managing thousands of rules is a practical impossibility. Manual curation is extremely expensive, because it relies on humans to work through millions of possible records and comparisons. Taken together, this explains the poor market adoption of traditional MDM solutions. Frankly put, no one likes traditional MDM.
Tamr’s simple idea was to train an AI to do the work of source ingestion, record matching, and value resolution. The great thing about AI is that it doesn’t eat, sleep, or take vacation; it is also highly parallelizable, so it can take on huge volumes of data and churn away at making it better. So, where MDM used to be impossible, it is finally possible to achieve clean, consolidated, up-to-date data.
What are the biggest challenges companies face with their data management, and how does Tamr address these issues?
The first, and arguably the most important, challenge companies face in data management is that their business users don’t use the data they generate. Said differently, if data teams don’t produce high-quality data that their organizations use to answer analytical questions or streamline business processes, then they’re wasting time and money. A primary output of Tamr is a 360 page for every entity record (think: customer, product, part, etc.) that combines all the underlying first- and third-party data so business users can see and provide feedback on the data. Think of it as a wiki for your entity data. This 360 page is also the input to a conversational interface that allows business users to ask and answer questions with the data. So, job one is to give the user the data.
Why is it so hard for companies to give users data they love? Because there are three primary hard problems underlying that goal: loading a new source, matching the new records into the existing data, and fixing the values and fields in the data. Tamr makes it easy to load new sources of data because its AI automatically maps new fields into a defined entity schema. This means that regardless of what a new data source calls a particular field (for example, cust_name), it gets mapped to the right central definition of that entity (for example, “customer name”). The next challenge is to link records that are duplicates, meaning records that refer to the same real-world entity. Tamr’s AI does this, and it even uses external third-party sources as “ground truth” to resolve common entities such as companies and people. A good example would be linking all the records across many sources for an important customer such as “Dell Computer.” Lastly, for any given record there may be fields that are blank or incorrect. Tamr can impute the correct field values from internal and third-party sources.
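The following is a minimal, standard-library sketch of those three steps (schema mapping, duplicate linking, value imputation). Tamr’s actual models are machine-learned; the fuzzy string matching, thresholds, and sample records here are assumptions made purely for illustration.

```python
# Illustrative sketch of the three steps described above (not Tamr's implementation).
from difflib import SequenceMatcher

# 1. Source ingestion: map incoming field names onto a central entity schema.
schema = ["customer name", "customer address", "customer phone"]

def map_field(source_field: str) -> str:
    """Map a source column (e.g. 'cust_name') to the closest schema field."""
    cleaned = source_field.replace("_", " ").lower()
    return max(schema, key=lambda f: SequenceMatcher(None, cleaned, f).ratio())

print(map_field("cust_name"))    # maps to 'customer name'

# 2. Record matching: flag records that likely refer to the same real-world entity.
def same_entity(a: str, b: str, threshold: float = 0.6) -> bool:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

print(same_entity("Dell Computer", "Dell Computer Corp."))  # True

# 3. Value resolution: fill a blank field from the best available source record.
records = [
    {"customer name": "Dell Computer", "customer phone": None},
    {"customer name": "Dell Computer Corp.", "customer phone": "+1-800-999-3355"},
]
merged = {k: next((r[k] for r in records if r[k]), None) for k in records[0]}
print(merged)
```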
Can you share a success story where Tamr significantly improved a company’s data management and business outcomes?
CHG Healthcare is a major player in the healthcare staffing industry, connecting skilled healthcare professionals with facilities in need. Whether it’s temporary doctors through Locums, nurses with RNnetwork, or broader solutions through CHG itself, they provide customized staffing solutions to help healthcare facilities run smoothly and deliver quality care to patients.
Their fundamental value proposition is connecting the right healthcare providers with the right facility at the right time. Their challenge was that they didn’t have an accurate, unified view of all the providers in their network. Given their scale (7.5M+ providers), it was impossible to keep their data accurate with legacy, rules-driven approaches without breaking the bank on human curators. They also couldn’t ignore the problem since their staffing decisions depended on it. Bad data for them could mean a provider gets more shifts than they can handle, leading to burnout.
Using Tamr’s advanced AI/ML capabilities, CHG Healthcare reduced duplicate physician records by 45% and almost completely eliminated the manual data preparation that was being done by scarce data & analytics resources. And most importantly, by having a trusted and accurate view of providers, CHG is able to optimize staffing, enabling them to deliver a better customer experience.
What are some common misconceptions about AI in data management, and how does Tamr help dispel these myths?
A common misconception is that AI has to be “perfect,” or that rules and human curation are perfect in contrast to AI. The reality is that rules fail all the time, and when rules fail, the only solution is more rules, so you end up with an unmanageable mess of rules. Human curation is fallible as well: humans might have good intentions (although not always), but they’re not always right, and some curators are better than others or simply make different decisions than others. AI, in contrast, is probabilistic by nature. We can validate through statistics how accurate any of these techniques is, and when we do, we find that AI is less expensive and more accurate than any competing alternative.
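One way to make the statistical validation point concrete is to score any matching approach, whether rules, human curation, or a model, against a human-labeled holdout sample and compare precision and recall. The sketch below uses invented labels and predictions purely to show the mechanics; it is not a benchmark and not Tamr’s methodology.

```python
# Illustrative only: comparing two matching approaches against a labeled holdout.
from sklearn.metrics import precision_score, recall_score

# 1 = "these two records are the same entity", 0 = "they are not" (made-up sample).
ground_truth = [1, 1, 0, 1, 0, 0, 1, 0]
rule_based   = [1, 0, 0, 1, 1, 0, 0, 0]   # e.g. exact-match rules missing fuzzy duplicates
model_based  = [1, 1, 0, 1, 0, 0, 1, 1]   # e.g. a probabilistic matcher

for name, preds in [("rules", rule_based), ("model", model_based)]:
    p = precision_score(ground_truth, preds)
    r = recall_score(ground_truth, preds)
    print(f"{name:5s} precision={p:.2f} recall={r:.2f}")
```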
Tamr combines AI with human refinement for data accuracy. Can you elaborate on how this combination works in practice?
Humans provide something exceptionally important to AI: they provide the training. AI is really about scaling human effort. What Tamr looks to humans for is the small number of examples (“training labels”) that the machine can use to set the model parameters. In practice, humans spend a small amount of time with the data, giving Tamr examples of errors and mistakes, and the AI applies those lessons across the full data set(s). In addition, as new data is added or data changes, the AI can surface instances where it is struggling to confidently make decisions (“low-confidence matches”) and ask the human for input. This input, of course, goes to refine and update the models.
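Here is a hedged sketch of that feedback loop, using scikit-learn’s logistic regression as a stand-in for a matching model (the features, thresholds, and data are assumptions for illustration): confident pairs are decided automatically, uncertain pairs are routed to a human, and the returned labels feed the next round of training.

```python
# Sketch of a human-in-the-loop matching workflow (illustrative, not Tamr's models).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row describes a candidate record pair with simple similarity features
# (e.g. name similarity, address similarity); labels are human judgments: 1 = match.
X_train = np.array([[0.95, 0.90], [0.10, 0.20], [0.85, 0.40], [0.05, 0.95]])
y_train = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X_train, y_train)

# Score new candidate pairs as data arrives.
X_new = np.array([[0.92, 0.88], [0.50, 0.55], [0.08, 0.12]])
probs = model.predict_proba(X_new)[:, 1]

# Confident scores are decided automatically; uncertain ones go to a human,
# and the resulting labels are appended to the training set for the next retrain.
for features, p in zip(X_new, probs):
    if 0.35 < p < 0.65:
        print(f"{features} -> low-confidence ({p:.2f}): ask a human, then retrain")
    else:
        print(f"{features} -> auto-decided: {'match' if p >= 0.5 else 'non-match'} ({p:.2f})")
```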
What role do large language models (LLMs) play in Tamr’s data quality and enrichment processes?
First, it’s important to be clear about what LLMs are good at. Fundamentally, LLMs are about language. They produce strings of text that mean something, and they can “understand” the meaning of text that’s handed to them; you could say they are language machines. So, for Tamr, wherever language is important, we use LLMs. One obvious example is our conversational interface, which sits on top of our entity data and which we affectionately call our virtual CDO (vCDO). When you speak to your real-life CDO, they understand you and respond using language you understand. That is exactly what we’d expect from an LLM, and it is exactly how we use one in that part of our software. What’s valuable about Tamr in this context is that we use the entity data as context for the conversation with our vCDO. It’s like your real-life CDO having ALL your BEST enterprise data at their fingertips when they respond to your questions – wouldn’t that be great!
In addition, there are instances, when cleaning data values or imputing missing values, where we want to use a language-based interpretation of an input value to find or fix a missing value. For example, given the text “5mm ball bearing,” you might ask what the size of the part is, and an LLM (or a person) would correctly answer “5mm.”
Lastly, underlying LLMs are embedding models, which encode the meaning of language (tokens, think: words) as numeric vectors. These can be very useful for calculating linguistic similarity. So, while “5” and “five” share no characters in common, they are very close in meaning, and we can use this information to link records together.
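A small sketch of that idea using an off-the-shelf embedding model (sentence-transformers and the model name are assumptions for illustration, not necessarily what Tamr uses): “5” and “five” share no characters, yet their embeddings sit close together, while an unrelated word does not.

```python
# Illustrative embedding-similarity check (not Tamr's stack).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

values = ["5", "five", "banana"]
embeddings = model.encode(values)

# Cosine similarity: values closer to 1.0 are closer in meaning.
print("5 vs five:  ", float(util.cos_sim(embeddings[0], embeddings[1])))
print("5 vs banana:", float(util.cos_sim(embeddings[0], embeddings[2])))
```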
How do you see the future of data management evolving, especially with advancements in AI and machine learning?
The “Big Data” era of the early 2000s should be remembered as the “Small Data” era. While a lot of data has been created over the past 20+ years, enabled by the commoditization of storage and compute, the majority of data that has had an impact in the enterprise is relatively small scale — basic sales & customer reports, marketing analytics, and other datasets that could easily be depicted in a dashboard. The result is that many of the tools and processes used in data management are optimized for ‘small data’, which is why rules-based logic, supplemented with human curation, is still so prominent in data management.
The way people want to use data is fundamentally changing with advancements in AI and machine learning. The idea of “AI agents” that can autonomously perform a significant portion of a person’s job only works if the agents have the data they need. If you’re expecting an AI agent to serve on the frontlines of customer support, but you have five representations of “Dell Computer” in your CRM and it’s not connected with product information in your ERP, how can you expect the agent to deliver high-quality service when someone from Dell reaches out?
The implication of this is that our data management tooling and processes will need to evolve to handle scale, which means embracing AI and machine learning to automate more data cleaning activities. Humans will still play a big role in overseeing the process, but fundamentally we need to ask the machines to do more so that it’s not just the data in a single dashboard that is accurate and complete, but it’s the majority of data in the enterprise.
What are the biggest opportunities for businesses today when it comes to leveraging their data more effectively?
Increasing the number of ways that people can consume data. There’s no question that improvements in data visualization tools have made data much more accessible throughout the enterprise. Now, data and analytics leaders need to look beyond the dashboard for ways to deliver value with data. Interfaces like internal 360 pages, knowledge graphs, and conversational assistants are being enabled by new technologies, and give potential data consumers more ways to use data in their day-to-day workflow. It’s particularly powerful when these are embedded in the systems that people already use, such as CRMs and ERPs. The fastest way to create more value from data is by bringing the data to the people who can use it.
Thank you for the great interview. Readers who wish to learn more should visit Tamr.