“Unstructured text data does a better job at predicting actual human behavior than any number of Likert scale ratings”


Tom H. C. Anderson is the founder of OdinText, an American company specializing in natural language processing and advanced analytics. Companies like Disney and Coca-Cola use its services to extract insights from complex structured and unstructured text data. The company has received several innovation awards from market research associations such as ESOMAR, CASRO, the ARF and the American Marketing Association. This interview was conducted by Xavi Guiteras of Empirica.


Will a computer ever be able to fully understand our language and respond to that understanding? How far along is humanity in this race to make computers think like us?

What you are talking about is General AI, an extremely hard problem to solve. Hollywood does a great job of solving it, but in reality we’re far from that. Some AI enthusiasts would say the obstacle is that computers can’t understand emotions, but I’d say the real problem is how broad a topic area this would cover, at least as you’ve phrased the question.

Of course, we already have various bots that respond to human natural language, though under much more specific circumstances. A common example: if you follow a company’s website or Facebook page, you may opt in to converse with a bot. This bot may try to stay top of mind by keeping in touch and asking whether you are interested in various news or new offerings.

But it is far from human and, in my opinion at least, can be more irritating than useful. That said, I think the more specific the application, the sooner it may become useful.


What is natural language processing? And what contributions has the field of linguistics made to this discipline?

NLP is a very broad topic. By that I mean it can include, and be synonymous with, text analytics and text mining software like OdinText, but it can also include the voice system in my Toyota FJ, which can do little more than dial a phone number.

Over the years I have seen the meaning of the term NLP change as it has been used in the areas of text analytics and text mining, two other terms that technically mean the same thing, though text analytics is slightly broader than mining. I typically prefer the term “text analytics” because I think it offers a good level of specificity for consumer insights, and it also sounds less scary and technical than text mining or NLP.

Ten to 15 years ago, the approach to NLP/text analytics was far more linguistic in nature. Software companies at the time seemed to think that the key to unlocking the potential of unstructured (text) data was understanding linguistic rules like grammar, syntax and part of speech (POS). Originally there were more linguists, or computational linguists, working in this field. However, linguistics has fallen out of favor, while AI/ML, advanced statistics and other mathematical automation have grown in importance. Basically, linguistics turned out to be less useful and important than we thought, for several reasons, including the fact that ‘real’ data rarely has good spelling, grammar and syntax, not to mention the variety of data sources, including multilingual ones.

That said, I think most of the better text analytics vendors still include ‘linguistics’ to some degree when it makes sense. Of course, it needs to be localized for each language, unless you use machine translation instead, which, by the way, I’m a fan of.

In terms of languages, at least for consumer insights, English really is preferred most often. That’s because even very global companies, which may have more than 10 different languages in a single data source, like to have one analyst able to look across all of it, so it needs to be translated. After all, how many people speak more than 10 languages?


The message is just one of the elements involved in the communication process. Is text analytics able, or will it be able, to decode the context in which a message was produced?

I’m not sure I fully understand the question. But context can be, and usually is, very important, of course. So knowing the context, either because a tool is applied specifically to that context or because the company has some sort of IP that helps determine it, is important.


In Spain we associate “text analytics” with the automatic coding of sentences using a “dictionary-driven approach”. Is this assumption correct?

Well, that is one approach, and it can work well in many cases. But there are others. And even with this approach, it’s less about the fact that some sort of custom dictionary/taxonomy/ontology is ultimately created, and more about how quickly and accurately it is created. And then, after that, what else can be done with the data to identify additional patterns, reduce and organize it, and build models to predict future outcomes.

So if you are just talking about a simple dictionary, applied in isolation to code up some data, that would be a very simplistic approach in my opinion, and there is a lot more to good text analytics than that.
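For readers unfamiliar with the dictionary-driven approach mentioned in the question, a minimal sketch of “coding” open-ended responses against a dictionary may help. This is a hypothetical illustration in Python, not OdinText’s method: the code frame, its keywords and the `code_response` function are invented for the example, and it shows only the simple dictionary-in-isolation step, without the pattern finding and modeling the answer describes.

```python
import re

# Hypothetical code frame: each code is triggered by any of its keywords.
CODE_FRAME = {
    "price": {"price", "expensive", "cheap", "cost"},
    "service": {"staff", "service", "rude", "helpful"},
    "taste": {"taste", "flavor", "sweet", "bland"},
}


def code_response(text: str) -> set[str]:
    """Return the set of codes whose keywords appear in the response."""
    # Lowercase and split into simple word tokens (letters and apostrophes).
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    # A code applies if at least one of its keywords is among the tokens.
    return {code for code, keywords in CODE_FRAME.items() if tokens & keywords}


responses = [
    "Too expensive for what you get",
    "The staff were helpful and the flavor was great",
    "No comment",
]
for r in responses:
    print(r, "->", sorted(code_response(r)))
```

A response can receive several codes or none. The limitation discussed in the interview is visible here too: a misspelling such as “expensve”, or a synonym missing from the keyword lists, simply goes uncoded.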



A long silence can reveal a great deal about a respondent. Can text analytics fill this gap?

I’ve never thought about that, partly because we’re usually analyzing text, or spoken words that have been transcribed into text either by humans or via unsupervised machine transcription.

I think if it were very important, it would be solved. It wouldn’t be all that hard to detect a blank or delay and transcribe it as such.
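The idea of transcribing a blank or delay as such can be sketched as follows, assuming a speech-to-text step that returns word-level start and end times. The `Word` structure, the two-second threshold and the `[PAUSE …]` token format are hypothetical choices for illustration only. Once a silence is a token in the text, it can be counted and analyzed like any other word.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # start time in seconds (assumed to come from the transcriber)
    end: float    # end time in seconds


def transcribe_with_pauses(words: list[Word], min_pause: float = 2.0) -> str:
    """Join words into a transcript, inserting a pause token for long gaps."""
    parts: list[str] = []
    prev_end = None
    for w in words:
        # Mark the silence between the previous word and this one if it is long.
        if prev_end is not None and w.start - prev_end >= min_pause:
            parts.append(f"[PAUSE {w.start - prev_end:.1f}s]")
        parts.append(w.text)
        prev_end = w.end
    return " ".join(parts)


words = [Word("I", 0.0, 0.2), Word("guess", 0.3, 0.7), Word("so", 4.0, 4.3)]
print(transcribe_with_pauses(words))  # I guess [PAUSE 3.3s] so
```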


What is the role of a qual researcher in the text analytics era? And what is the role of a quant researcher?

Almost all serious text analytics right now lives on the quant side of things. There are various reasons for this, ranging from a lack of desire among qual researchers to do additional analysis and pay more for projects whose budgets are already quite tight, to the fact that it is technically a more difficult problem to solve: the small sample sizes of qual data don’t allow the same types of pattern recognition approaches to be used.

I expect this could change in the near future, but it’s not the lowest-hanging fruit with the best ROI, so it will have to come later.


Reality is vast. That’s why the researcher imposes a framework on it, in order to analyze just what he or she wants to discover. Will text analytics be able to use this information to create a framework by itself?

Absolutely. Our software already does this: it builds better code frames or, said differently, does a better job than humans of identifying features, also known as topics or attributes, in the data. These features often have lower incidence, but our software detects them as being important; they are things humans will miss. The software then confirms their importance again in the final components of the analysis.

This is one of many areas where machines can be better than humans. That and 100% consistency!


Can text analytics replace quantitative techniques (surveys, let’s say)? Are they rivals or complements?

Absolutely. Certainly a complement, but I believe also a complete replacement. It will allow surveys to be much shorter, which will increase data quality. You will also be able to predict any important structured metrics. And from what I’ve seen, unstructured text data does a better job at predicting actual human behavior than any number of Likert scale ratings.

So let me reiterate: yes and yes!


On the predictive power of text analytics: in politics, can it be used to predict, let’s say, election results?

We did, sort of by accident. Just before the US election between Trump and Clinton, our software clearly showed that Trump was in a better position than Clinton, and we said so before the election. Then, when it happened, we looked back at how we had been able to see this. I blogged about it a few times.

Later we repeated this with the French election and saw a similar, though not identical, pattern. I think what helped avert a quite possible French upset was a combination of factors: awareness among the government, the media and the people in France that something similar could happen there; the identification of possible Russian influence, which affected voting; and French voting laws and the pre-election media freeze.

But yes, we have used text analytics in other election-related analyses as well. It’s obviously a great way to understand which segments find which election topics important, and how different positioning can affect outcomes.



Given all of the above: artificial intelligence or intelligence amplification? What is the role of humans in the computer era?

While unsupervised AI is the end goal, there is also supervised AI. For highly customized ad hoc research, where training data is not available and/or time frames do not allow for ‘AI proper’, supervised AI is the more realistic and better solution.


Moving from theory to practice: what do the outcomes of an analysis look like? What does the process inside a text analysis tool look like, from the moment the information is captured until the tool shows the results?

We have several visualizations, tables, etc. that we find work very well for communicating insights and findings; they are part of our IP. This is a step a lot of companies miss: they just create overly simplistic, cute-looking dashboards or, worse, those stupid word clouds that don’t tell you anything.

Ultimately, though, the output is up to the user (the analyst or manager using the tool) and what they need to understand and communicate. It is sort of like asking me, “What kind of output should I use with numbers/math?” It’s limitless and depends on everything from the specifics of the context to the imagination of the researcher. But ultimately, of course, it’s about communicating insights effectively. And every output technique available for math/numbers is also available for text, and more!


Sorry… I couldn’t help asking this question… If we are not alone in the universe, would analytical tools be faster than humans at understanding an ET language? What if its grammatical structures are totally different from those used by humans?

Haha, you’ve asked some very interesting, good questions today, far better than the ones I usually get. Ironically, even this one is something I’ve discussed on the blog before.

In fact, I believe communicating with intelligent aliens is a great way to illustrate the role, or lack thereof, of linguistics in NLP. I believe it would be an extremely easy problem to solve. We have multiple ways of solving it, ranging from the more basic to AI. And as you think about this, you realize it’s just a function of mapping responses, something computers are extremely well suited to do. Ironically, this confirms what I said earlier in this interview: that linguistics is a relatively unimportant discipline in the grand scheme of things when it comes to text analytics…

Article published in the magazine ‘Investigación & Marketing’, issue 139, July 2018.

