[151]
Pierre Maranda
Anthropologist, retired from teaching, Université Laval
“Qualitative and Quantitative
Analysis of Myths by Computer.” [1]
in Mythology. Selected Readings edited by Pierre Maranda, pp. 151-161. Harmondsworth : Penguin Books Ltd., 1972, 320 pp. Collection : Penguin Modern Sociology Readings.
- Abridged from Pierre Maranda, 'Analyse qualitative et quantitative de mythes sur ordinateurs' in B. Jaulin and J. C. Gardin (eds.), Calcul et Formalisation dans les Sciences de l’Homme, Centre National de la Recherche Scientifique, 1968, pp. 79-86. Specially translated for this volume by John Freeman and Pierre Maranda.

The object of this paper is to describe in broad outline a segment of the computerized analysis of myths. [...]
In the following pages, attention will more particularly be devoted to three points. Certain preliminary conditions for the automatic analysis of myths will first be briefly reviewed. Then, a procedure for the structural analysis of myths will be sketched. Finally, there will follow some considerations of quantitative analysis and some of its relations with qualitative analysis.
Preparation of documents
The data submitted to computer processing were a corpus of 135 Gê myths. Collected by the German anthropologist Nimuendaju and in the last decade by the anthropologists of the Harvard Central Brazil Project, these texts express the cosmology of four tribes of the Gê linguistic group, the Eastern Timbira, the Sherente, the Apinayé and the Cayapo. They all live in the Mato Grosso, in central Brazil, in a context which constitutes a sort of natural laboratory eminently favourable to comparative study (see Lévi-Strauss, 1964). Indeed, while the historical, ecological and technological parameters remain constant, one observes sociological variations in political organization, kinship, ritual and mythology. [...] The work is based on English translations which of course are carefully checked.
The first stage was the preparation of the documents. It is not [152] necessary to expand on the need to work with an analytical language. Natural languages, even those already trimmed by the process of oral coding and transmission, could not be submitted to operational processing without previously eliminating the ambiguities with which they are permeated.
The linguistic and logical equipment available happens to be sufficient to allow a relatively simple and economical translation of the myths from natural to analytical language. The mythic discourse is thus first of all divided up into minimal units. [...] which are elementary or analytical propositions. Their limits are marked by the plus sign (+), while the period (.) is reserved for groups formed from these indivisible units functionally articulated. In fact, it can be sufficient to use a single marker ; however, a second is useful to designate the units at the 'sentence' level, that is, those which consist of elementary propositions explicitly linked by conjunctions. One can also use one or more other signs if one thinks it already possible at this stage to trace the outlines of larger divisions ('episodes', 'sections', 'mythemes', etc., see Propp 1958 ; Lévi-Strauss 1958, 1960).
The articulations of the mythic discourse are first normalized with the aid of a battery of unequivocal connectors defined with the help of a concordance (see below). Then a 'dictionary', that is, a grid or data filter, is built. In the process, homographs, pronouns and other sources of ambiguity are replaced by unambiguous terms or made precise by suffixes between brackets. Finally, each term and function is assigned a numerical suffix designating its propositional role. I have elsewhere discussed in detail the procedures adopted in this matter (Maranda 1967a, 1967b ; see Gundlach 1965), and I prefer to reserve the main part of my paper to considerations of a different kind.
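The marker convention can be illustrated with a minimal sketch. It assumes that '+' separates elementary propositions and that '.' closes each 'sentence' group; the sample string and the function name `segment` are hypothetical and are not taken from the coded corpus.

```python
def segment(text):
    """Split an analytical text into 'sentences' (closed by '.'),
    each a list of elementary propositions (separated by '+')."""
    sentences = []
    for raw_sentence in text.split("."):
        propositions = [p.strip() for p in raw_sentence.split("+") if p.strip()]
        if propositions:
            sentences.append(propositions)
    return sentences


# Hypothetical fragment in analytical language (illustration only).
sample = (
    "hunters pursue women + women climb tree + women refuse descent. "
    "hunter strikes tree + woman becomes capivara."
)
units = segment(sample)
for number, sentence in enumerate(units, start=1):
    print(number, sentence)
```

A third marker for larger divisions could be handled in the same way by adding one more level of splitting.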
Automatic analysis
- Structural analysis
The computer processing begins once the texts have been rewritten in analytical language. The programmes used are the KWIC ('Key-Words-in-Context') and the General Inquirer (Dunphy, Stone and Smith, 1965 ; Stone et al., 1966). [2]
[153]
The structural output is obtained by a series of restrictive instructions. In effect, instead of normalizing the texts in their totality, as in the case of the complete outputs for quantitative analysis, the computer concerns itself exclusively with certain aspects in which only the principal theme of the narrative is retained.
Some preliminary remarks are necessary here to explain the procedure. The documents are normalized to the first degree by manual translation from the natural language to the analytical language. A normalization of the second degree is produced by the computer assigning each term and each function to a specific analytical category. Ninety-nine descriptors had first been established for the Gê documents (a more considerable number are now available). These analytical categories are not formed a priori but rest on the data themselves. The KWIC programme (Stone et al. 1966) used at this stage provides an alphabetical concordance where all the words from the documents are found in their context. This allows on the one hand the easy finding of ambiguities which may have escaped at the time of the first degree normalization (there are also some automatic routines which can help at this stage) ; on the other hand, and this is its great merit, the concordance permits the compilation of an inventory of the semantic fields in which the documents are situated. Given that all occurrences of a term or function appear in it with the functions or terms with which this term or function is associated - e.g. jaguar and hunting, jaguar and verbal communication with a boy, the fear a jaguar has of a toad, jaguar and possession of fire for cooking, tapir and adultery with the village women, etc. - the concordance provides the basis for a grid to establish classes of actors and actions. The categories or classes thus obtained are formed of groups of terms and functions defined by a coefficient of association. At first nominal, that is, non-operational for the most part, these categories emerge none the less from the texts themselves and do not originate in the analyst's ingenuity. His role consists, in fact, only of choosing a name for each category, of which the definition, not at all arbitrary, is provided by an exclusive and exhaustive listing. 
We thus have [154] some fixed concepts to start off with, even if this first approximation must assuredly be completed subsequently by the operational verification of the nominal definitions, or in certain cases by its elaboration (see Maranda 1967b ; see Greimas, 1971). The battery of these descriptors thus constitutes a 'dictionary' of which the rules of construction, the content and the principles governing its use are described and discussed elsewhere (Maranda 1967a, 1967b).
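The idea of a key-word-in-context concordance can be sketched in a few lines. This is not the KWIC programme of the General Inquirer system, only a minimal illustration of the principle; the function name `kwic`, the context width and the sample propositions are assumptions made for the example.

```python
from collections import defaultdict


def kwic(propositions, width=2):
    """Build a keyword-in-context index: each word maps to a list of
    (left context, keyword, right context) entries."""
    index = defaultdict(list)
    for proposition in propositions:
        words = proposition.lower().split()
        for i, word in enumerate(words):
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            index[word].append((left, word, right))
    # Alphabetical order, as in a printed concordance.
    return dict(sorted(index.items()))


# Hypothetical normalized propositions (illustration only).
props = [
    "jaguar hunts peccary",
    "jaguar fears toad",
    "tapir seduces women",
    "boy addresses jaguar",
]
concordance = kwic(props)
for left, keyword, right in concordance["jaguar"]:
    print(f"{left:>15} | {keyword} | {right}")
```

Reading down the entries for one term shows at a glance the functions and terms with which it is associated, which is the raw material for the grid of categories.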
Already at the level of nominal definitions, certain semantic domains appear strongly differentiated, others are scarcely present, etc. KWIC reveals, for example, that to encompass the Gê data, more than three times as many sociological as cosmological descriptors are necessary (Maranda, 1967a). This is already an indication of the orientation of the Gê myths, which is confirmed by more developed analyses.
Among the ninety-nine descriptors which summarize the principal Gê semantic fields, thirty-one are exclusively 'terms' or arguments, seven are exclusively 'functions', and seven are normalized connectors. Additionally, the function is indicated by analytical dispositions at the time of first degree normalization and by numerical suffixes added to the propositional elements. Thanks to these dispositions the computer can be directed in such a way as to produce an output containing only, for each text, the structural argument of the documents.
The first instruction in this phase indicates to the computer the path to follow in the choice of structural elements at the level of the proposition. It aims the computer exclusively at the actors, whose occurrences it counts, to determine those of which the frequency is equal to at least 20 per cent of the total number of the analytical propositions of the myth. The bearing of this operation is that only the propositions which feature these actors (in whatever position) will be taken into consideration in the next phase. The second instruction concerns the conjunctions and predicates or 'functions' : all the propositions which contain the most frequently mentioned actors, and only those, are described in four columns, of which the first is devoted to the conjunctions, the second to the 'subject' actors, the third to the actions, and the fourth to the 'object' actors (see output, below). To clarify reading, repetitions are eliminated so that if, for example, [155] A occupies the 'subject' position in three consecutive propositions, it only appears once, in the first line. Finally, the computer reacts to any change of 'subject' by printing an order number immediately before the proposition where the change occurs. In this way an automatic division of the text into clearly defined episodes is obtained.
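The two instructions can be sketched as follows. This is a hedged reconstruction, not the programme actually used: the tuple layout `(conjunction, subject, action, object)`, the function name `structural_output`, the explicit `actors` set and the sample coding are all assumptions. Episodes are numbered on the full text before filtering, which is one reading of the author's note on episode 9.

```python
from collections import Counter


def structural_output(propositions, actors, threshold=0.20):
    """Return (principal actors, rows of the four-column output).

    propositions: list of (conjunction, subject, action, object) tuples,
    with object set to None when absent. actors: terms counted as actors.
    """
    # Instruction 1: count actor occurrences in whatever position.
    counts = Counter()
    for _, subj, _, obj in propositions:
        for term in (subj, obj):
            if term in actors:
                counts[term] += 1
    minimum = threshold * len(propositions)
    principal = {actor for actor, n in counts.items() if n >= minimum}

    # Instruction 2: keep only propositions featuring a principal actor,
    # number episodes at each change of subject, suppress repeated subjects.
    rows = []
    episode = 0
    current_subject = None
    last_printed_episode = None
    for conj, subj, action, obj in propositions:
        if subj != current_subject:
            episode += 1
            current_subject = subj
        if subj not in principal and obj not in principal:
            continue
        first_of_episode = episode != last_printed_episode
        rows.append((
            episode if first_of_episode else "",
            conj,
            subj if first_of_episode else "",
            action,
            obj or "",
        ))
        last_printed_episode = episode
    return principal, rows


# Hypothetical coding, loosely modelled on a myth of this kind.
ACTORS = {"hunters", "woman", "puma"}
myth = [
    ("", "hunters", "hunt", None),
    ("then", "hunters", "find", "woman"),
    ("then", "hunters", "pursue", "woman"),
    ("then", "woman", "climb", None),
    ("and", "woman", "refuse", None),
    ("then", "hunters", "strike", None),
    ("then", "woman", "transform", None),
    ("then", "puma", "take", None),
    ("then", "hunters", "hunt", None),
    ("then", "hunters", "find", "woman"),
]
principal, rows = structural_output(myth, ACTORS)
for row in rows:
    print("{:>2} {:<5} {:<8} {:<10} {}".format(*row))
```

In this toy coding the proposition whose subject is 'puma' is counted as an episode but is not printed, so the episode numbers in the output skip from 4 to 6.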
An example will no doubt be welcome at this point. I shall take a fairly short Sherente myth and leave aside the last part of it so as not to prolong my account unduly. (This decision is also justified by the fact that the short version was collected by Nimuendaju some decades before that recounted by the Sherente to Maybury-Lewis as I shall quote it ; although neglecting some aspects contained in the Nimuendaju version, the Maybury-Lewis version is more easily intelligible because less elliptic with regard to the plot). We are dealing with a myth of the origin of women as told in a patrilineal society. The text comes first, followed by its structural description as the computer has established it.
After their emergence, the Sherentes were hunting when they met two women. They pursued them, and the women sought refuge at the top of a tree from which they refused to come down. Then, one of the men struck the tree, one blow, and when the man had struck the tree, one of the women transformed herself into a capivara and fell into the water emitting a feeble cry. This left only one woman left in the tree. The young men then went to the edge of the water and called to the woman : 'Woman, come down !' 'No, I will not come down !' said the woman. 'Come on, get down !' shouted the Sherente. Then the woman came down and the Sherente took her with them to their village. After nightfall, all the Sherente copulated with the woman, one after another. Finally, they killed her and cut her into pieces in order to divide her among themselves. Then each Sherente wrapped up his piece. The puma took a piece of the breast, but the sariema wrapped up his very tightly so that for this reason he now has a cross‑eyed wife. Then the Sherente went hunting. Then they delegated a scout to their camp, who found the village full of women cackling among themselves (Maybury-Lewis, ethnographic notes).
The continuation of the text describes how the women prepare [156] cakes for their husbands and give them to the scout. He rejoins the hunters who return to the village with all speed.
Here now is the computer's interpretation. The frequency analysis gives 'group of hunters' and 'woman' as principal actors.
[157]
The computer next provides a second output where only those propositions are retained which belong to the document's major semantic fields, that is, where only the most frequently appearing function descriptors are found, from the point of view of the complementarity of which the document is then examined. Thus, it can be read from this output that the action of the story is characterized by a passage, by episodes, from the intransitive to the transitive followed by a return to the intransitive :
where the first chain of three episodes is repeated by the last, itself also of three episodes. Then, in the middle chain (intransitive), and only there, the communication function appears. The refusal function (episode 2) is followed by the metamorphosis function (episode 4) while agreement (episode 8) is followed by transformation (episode 10). Finally, the functions intransitive, hunting and finding (followed by possession in episode 12 not represented in the table) form the first and the last episode, which allows us to read the document as follows :
- 1. Possible addition to the group of men (episode 1).
- 2. Subtraction of a part of the possible addition (episodes 2-4).
- 3. Actual addition (episodes 5-8).
- 4. Division (episode 10).
- 5. Multiplication (episode 11).
The computer is then instructed to produce an output expressing only the propositions linked by conjunctions having to do with relations other than temporal (such as inclusion, exclusion, [158] implication, causality, etc.) ; here, metamorphosis is implied by division.
Finally in this phase a last output extracts from the corpus all the documents which comprise the same sequence of 'structural' descriptors, that is, grouping the myths by categories like that of the passage from the transitive to the intransitive and vice‑versa, from submission to dominance and vice‑versa, etc.
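Grouping the documents by shared sequence amounts to using the sequence itself as a key. The sketch below assumes each myth has already been reduced to an ordered list of 'structural' descriptors; the myth identifiers, the descriptor sequences and the function name `group_by_sequence` are hypothetical.

```python
from collections import defaultdict


def group_by_sequence(corpus):
    """Map each descriptor sequence to the sorted ids of the myths showing it."""
    groups = defaultdict(list)
    for myth_id, sequence in corpus.items():
        groups[tuple(sequence)].append(myth_id)
    return {sequence: sorted(ids) for sequence, ids in groups.items()}


# Hypothetical reduced corpus (illustration only).
corpus = {
    "M1": ["intransitive", "transitive", "intransitive"],
    "M2": ["submission", "dominance"],
    "M3": ["intransitive", "transitive", "intransitive"],
}
groups = group_by_sequence(corpus)
for sequence, ids in groups.items():
    print(" > ".join(sequence), ":", ", ".join(ids))
```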
These very rudimentary descriptions scarcely qualify as structural. But they do provide a summary of the paths followed by the action and of the successive switchpoints according to which the movements of the actors are directed in a sociological and cosmological exploration. It can be seen indeed that in the myth quoted, from the first to the eleventh episode, we pass from the 'finding' of the two women to their multiplication - a kind of structure which reflects the agricultural one of cutting a tuber into pieces, planting these, and harvesting a multiplied input. This result is reached by a dialectic where a transitive operation - to strike the tree (episode 3) - leading to a metamorphosis (of the woman into a capivara, episode 4), is followed by a dialogue, which culminates in an intransitive action on the part of the woman (she comes down the tree), sexual relations and the division of the woman - in fact, a metonymical operation. While of the two women at the beginning of the myth, only one remains after the attempt of the hunter acting in isolation and in a purely physical way ('transitive movement', to strike the tree), the woman who remains represents all women - so we have the individual for the species - and the action is raised to the level of communication. Then, we have the part for the whole since the possession of a piece of the woman represents and in fact generates the possession of a woman. I may be reading things into the output when I see in the conjunction 'implication' followed by 'division' in episode 10, the major articulation of the metonymical structure of this Sherente myth ; it remains true none the less that this summary suggests a characterization of the narrative in terms which, if they are not directly structural, hint at some interesting indications.
The version of the same myth collected by Nimuendaju (1944) begins with a description of the state of the society without women where the men practice homosexuality. One of them [159] becomes pregnant, and, as he cannot give birth to the child he carries, dies in labour. The men go hunting, find a woman perched on the top of a tree but notice nothing of her but her reflection in the water. Frustrated in their efforts to seize the image, they finally discover its source and seize the woman. The continuation of the text is as in the Maybury‑Lewis version, without the final episode of making the cakes.
A comparison of the automatic interpretations of these two versions is worth briefly examining. Only one aspect will be mentioned, since the first episode of the Nimuendaju version is clearly different and the metonymical structure is the same in both cases. The computer invites us to contrast communication, present in the Maybury-Lewis version and absent in the Nimuendaju version, with false perception / true perception, present in the latter but not in the former. Besides, while the middle intransitive in the Maybury‑Lewis version is accompanied by communication, it is replaced by transitive in the corresponding episode of the Nimuendaju version where the woman is seized without exchanging a word once the perception is rectified. The same metonymical structure rests, then, on two different mechanisms.
The reader will be able to scrutinize the text and the automatic interpretation at more leisure. Suffice it to have mentioned that a theme emerges expressed in such a way that it is possible to compare this summary with those of other myths. Further, by permutations or inversions of the abstract properties of the documents, theoretic variants are generated : the corpus is then searched to find whether the artificial myths can be found empirically. For example, it is instructive to reverse the sexual roles in the Sherente myth - reading 'women' for 'hunters' and 'man' for 'woman' - and to compare it with the matrilineal Bororo myth of the origin of men, in which it is the women who besiege a man perched at the top of a tree, and the false perception / true perception opposition is absent but communication is found.
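The generation of a theoretic variant and the search for it in the corpus can be sketched as a substitution followed by a lookup. The summaries below are hypothetical codings written for the example, not the actual outputs; `invert` and `find_variant` are illustrative names, and exact equality of summaries stands in for whatever matching rule was really applied.

```python
def invert(summary, swaps):
    """Apply a permutation of terms to a summary made of
    (subject, function, object) triples."""
    return [tuple(swaps.get(term, term) for term in triple) for triple in summary]


def find_variant(variant, corpus):
    """Return the sorted ids of the myths whose summary equals the variant."""
    return sorted(myth_id for myth_id, summary in corpus.items() if summary == variant)


# Hypothetical summaries (illustration only).
sherente = [
    ("hunters", "besiege", "woman"),
    ("hunters", "communicate", "woman"),
]
corpus = {
    "sherente_origin_of_women": sherente,
    "bororo_origin_of_men": [
        ("women", "besiege", "man"),
        ("women", "communicate", "man"),
    ],
}

# Reverse the sexual roles, then look for the artificial myth in the corpus.
variant = invert(sherente, {"hunters": "women", "woman": "man"})
matches = find_variant(variant, corpus)
print(matches)
```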
- Quantitative analysis
Quantitative analysis is done by means of the 'dictionary' which has already been alluded to. The results are expressed by two outputs [160] : a tag tally, where each document is tabulated by descriptor according to the position of the latter in the propositional structure, and proportional graphs where each descriptor is quantified comparatively for the whole corpus. For example, it shows in what proportion blood relations are 'subjects' in the whole corpus, and in what proportion among the Sherente, the Apinayé, etc. (for comparative tables of this kind, where twenty categories and their distribution by tribes are contrasted, see Maranda, 1967b). No sequential order appears here. The outputs thus only indicate the size of the blocks or broad constitutive units of each document as they are measured in the tag tally, or comparatively for all the documents as they appear in the proportional graphs.
Although incomplete and rudimentary from the point of view of the total analysis, this quantitative information is valuable in that it measures the depth of the paradigmatic groups linked in the syntagmatic chain of the stories. If it is true that repetition emphasizes the structure (Lévi-Strauss, 1958) and assures the preservation of the information in the transmission of the message (Shannon and Weaver, 1948), the quantitative emphasis which the computer reveals at the paradigmatic level points up main themes. In a general way it can thus be said that the frequencies provide a measure of the degree of concern of myths with specific paradigmatic sets. What can also be learnt from that is that, for example, when the constitutive element A appears with a very great frequency, it will be in contexts where B and C are also very frequent, and D, E, F and G generally absent.
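A tag tally and a proportion of the kind described can be sketched as simple counts. The data layout, the descriptor names and the functions `tag_tally` and `proportion` are assumptions; in particular, the proportion is taken here as the share of all items in a given position that carry the descriptor, which is only one possible reading of the measure.

```python
from collections import Counter


def tag_tally(tags):
    """Count (descriptor, position) pairs for one document."""
    return Counter(tags)


def proportion(docs, descriptor, position, tribe=None):
    """Share of the items in `position` that carry `descriptor`,
    over the whole corpus or over one tribe's documents."""
    hits = total = 0
    for doc in docs:
        if tribe is not None and doc["tribe"] != tribe:
            continue
        for desc, pos in doc["tags"]:
            if pos == position:
                total += 1
                hits += desc == descriptor
    return hits / total if total else 0.0


# Hypothetical tagged documents (illustration only).
docs = [
    {"tribe": "Sherente", "tags": [
        ("blood-relation", "subject"), ("hunting", "function"),
        ("affine", "subject"), ("blood-relation", "object"),
    ]},
    {"tribe": "Apinayé", "tags": [
        ("blood-relation", "subject"), ("blood-relation", "subject"),
        ("fire", "object"),
    ]},
]
tally = tag_tally(docs[0]["tags"])
whole = proportion(docs, "blood-relation", "subject")
sherente_share = proportion(docs, "blood-relation", "subject", tribe="Sherente")
print(tally[("blood-relation", "subject")], whole, sherente_share)
```

As in the outputs described, no sequential order is preserved: only the size of each block is measured.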
A quantitative evaluation of the fields of concentration is thus obtained which gives a paradigmatic depth to the syntactic units indicative of the relative importance of the semantic masses which they link.
References
DUNPHY, D., STONE, P., and SMITH, M. (1965), 'The general inquirer : further developments in a computer system for content analysis of verbal data in the social sciences', Behavioral Science, vol. 10.
GREIMAS, J. (1971), 'The interpretation of myth : theory and practice', in P. and E. Maranda (eds.) Structural Analysis of Oral Tradition, University of Pennsylvania Press.
[161]
GUNDLACH, R. (1965), Ein Dokumentationssystem zur inhaltlichen Erfassung und maschinellen Erschliessung historische Sekundarliteratur, Munich.
LÉVI-STRAUSS, C. (1958), Anthropologie structurale, Plon.
LÉVI-STRAUSS, C. (1960), 'La structure et la forme', Cahiers de l'Institut de science économique appliquée, 99, pp. 3-36.
LÉVI-STRAUSS, C. (1964), Mythologiques I. Le cru et le cuit, Plon.
MARANDA, P. (1967a), 'Computers in the bush : notes for the automatic analysis of Gê myths', in J. Helm (ed.), 'Essays in the Verbal and Visual Arts', Proceedings of the 1966 Annual Meetings of the American Ethnological Society, Philadelphia, University of Washington Press.
MARANDA, P. (1967b), 'Formal analysis and intra-cultural studies', Social Science Information, vol. 6, pp. 7-36.
NIMUENDAJU, C. (1944), 'Sherente tales', J. Amer. Folklore, vol. 57.
PROPP, V. (1958), 'Morphology of the folktale', Publication Ten of the Research Center in Anthropology, Folklore and Linguistics, Indiana University Press.
SHANNON, C. E., and WEAVER, W. (1948), The Mathematical Theory of Communication, Illinois University Press.
STONE, P. J. et al. (1966), The General Inquirer : A Computer Approach to Content Analysis, MIT Press.
[1] The research work partly reported in this paper was supported by the Laboratory of Social Relations, Harvard University, Fourth Pilot Grant, and by the National Science Foundation Grant GS-178.
[2] There are now four different though partly overlapping INQUIRERS. They are, in addition to the 1964 version, the STANFORD INQUIRER (at Stanford University), INQUIRER II (University of Washington), and INQUIRER III (Harvard University).
[3] 'Intransitive' stands for 'intransitive motion', i.e. the actor's own movements ; this is in opposition with 'transitive', for 'transitive motion', i.e. the movement impelled to an object by the actor.
[4] The subjects of the two propositions forming episode 9 are 'puma' and 'sariema'. Since the propositions contain neither 'hunter' nor 'woman', they do not figure in this output. ('Puma' and 'sariema' both belong to the same category, 'human-animal' ; on the level of the normalized text, the two propositions are thus considered as forming one episode.)