It is difficult to get an accurate reading of how commonly a word is used in a given society. In fact, measuring word frequency with full objectivity is impossible: the results will always be affected by the size of the corpus and the choice of texts entered into it. On a global scale, where words take on subtle new meanings as they are appropriated into the semiotic structure of the actor and thereby changed, the problem becomes even more obvious. Frequency means nothing without cultural context.
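To make the corpus dependence concrete, here is a minimal sketch that computes a relative frequency for the same word in two different corpora. The function name `relative_frequency`, the per-1,000-token scale, and both sample texts are assumptions invented for illustration, not data from any real monitoring tool.

```python
import re
from collections import Counter


def relative_frequency(word, text):
    """Return occurrences of `word` per 1,000 tokens of `text`."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[word] / len(tokens)


# Two invented mini-corpora, purely illustrative.
engineering_forum = "the build is broken again and the patch is broken too"
political_forum = "the system is broken and the people know it"

for name, corpus in [("engineering", engineering_forum),
                     ("political", political_forum)]:
    print(name, round(relative_frequency("broken", corpus), 1))
```

The same word produces a different figure for each corpus, and neither figure says whether "broken" refers to software or to a political system. The number depends on what was collected, and its meaning depends on who was speaking.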
This is not to say that frequency isn’t important. It is important and revealing. But frequencies are only broadly indicative of cultural salience, and they can only be used as one among many sources of information about a society’s cultural preoccupations. Measurements tell only part of the story. When they are decontextualized, or when their meanings are prescribed by the person who develops the algorithm that assigns sentiment, they give a potentially false understanding. To be correctly interpreted, figures have to be considered in the context of an in-depth analysis of meanings.
If four thousand people call a product “shitty,” it is fair to say that four thousand people reacted negatively to it. But that measurement can’t tell us about the culture of those people. Are they engineers addressing it from a technological angle? Are they Venezuelan students reacting to a larger political issue? We assume that a word can be easily categorized along a linear trajectory (negative/positive, etc.), but this isn’t necessarily the case. Words can be studied as focal points around which cultural domains are organized. By exploring these focal points in depth, we may be able to show the general organizing principles which lend structure and coherence to a cultural domain as a whole, and which often have an explanatory power extending across multiple domains.
The underlying principle lacking in current social media monitoring processes is allolexy. The term allolexy refers to the fact that the same element of meaning may be expressed in a language in two or more different ways. Just as one word can be associated with multiple meanings, one meaning can often have two or more different lexical exponents. For example, in English, I and me are allolexes of the same primitive concept (in Latin, ego). Often allolexes of a semantic primitive are in complementary distribution. So in English, a combination of the semantic primitives someone and all is realized as everyone or everybody. In these particular contexts -one and -body can be seen as allolexes of someone, and -thing can be seen as an allolex of something. This notion of allolexy plays a particularly important role in social media monitoring because it allows us to build inflectional categories. For example, the forms am doing, did, and will do used without temporal adjuncts convey different meanings, but when combined with the temporal adjuncts now, before now, and after now, as in the sentences below, they are in complementary distribution and can be seen as allolexes of the same primitive DO:
- I am doing it now.
- I did it before now.
- I will do it after now.
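The three sentences above can be sketched as a lookup that collapses each inflected form, paired with its temporal adjunct, onto the single primitive DO. This is only an illustration under stated assumptions: the table `ALLOLEXES_OF_DO`, the function `normalize`, and the rigid "I ... it ..." sentence template are hypothetical, and a real system would need proper linguistic analysis rather than string matching.

```python
# Hypothetical table: (verb form, temporal adjunct) -> semantic primitive.
# Each form is treated as an allolex of DO only alongside its own adjunct.
ALLOLEXES_OF_DO = {
    ("am doing", "now"): "DO",
    ("did", "before now"): "DO",
    ("will do", "after now"): "DO",
}


def normalize(sentence):
    """Return (primitive, adjunct) for 'I <form> it <adjunct>.', else None."""
    text = sentence.strip().rstrip(".").lower()
    for (form, adjunct), primitive in ALLOLEXES_OF_DO.items():
        # Exact match against the assumed sentence template.
        if text == f"i {form} it {adjunct}":
            return primitive, adjunct
    return None


for s in ["I am doing it now.", "I did it before now.", "I will do it after now."]:
    print(s, "->", normalize(s))
```

All three sentences resolve to the same primitive and differ only in the adjunct. A form paired with the wrong adjunct, such as "I did it now.", is not matched, which reflects the complementary distribution described above.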
When we apply an approach derived from an allolexical perspective, we can start to determine where sentences or words “match,” semantically, across languages, even though inflectional categories can differ considerably from language to language. Conversely, if a word is taken out of the process of discourse, it loses meaning and is therefore subject to an interpretation that has no way of accounting for either semantic variance or semantic stability. Such an interpretation is nothing short of a guess.
In a sense it is true that words have no “fixed” meanings, because the meanings of words change. But if they were always fluid and without any “true” content, they could not change either. Words do have identifiable, “true” meanings, the precise outlines of which can be established on an empirical basis by studying their range of use and articulating the contexts that subtly repurpose them. The key point is that social media monitoring today accounts neither for semantic deviation nor for language as fundamentally tied to discourse.