New information spaces


Linguist and publisher (traditional and digital)

The digital contribution to the world of encyclopaedias

This article sketches out the history of the modern encyclopaedia, from its origins in the 17th century to the revolution brought by the launch in 2001 of Wikipedia, an online work created by volunteers whose content is freely reusable.

* All the headings in this article come from the poem “On acquiring an encyclopaedia,” by J.L. Borges.

“Here the tiger and the tartar”

The illusion of having all the knowledge of an age (or everything known about a given topic) in a single work is tempting for two reasons. On the one hand, it is reassuring to think that knowledge is graspable, even if it is in the form of a work comprising dozens of volumes. And on the other, it is pleasant to think it is possible to access any aspect of knowledge immediately and directly.

The modern encyclopaedia was born in the 17th century. It had its precursors, although they were not called “encyclopaedias.” Medieval compilations, which began with works such as the Etymologies of Isidore of Seville (6th century), did not aim to make knowledge available to all. They were rather an attempt, in an age when it was difficult to access original sources, to create a compendium of various materials for use by scholars. Internally, they were organised thematically, because the medieval mind baulked at the arbitrariness of alphabetical order. It was simply unacceptable that Altissimus, the Almighty, an attribute of the divine, should not appear before Abyssus, the abyss or hell.

The conceptual shock caused by the discovery of the American continent led to works being written that, for the first time, sought to observe the world rather than copy existing works. In 1540, the Franciscan friar Bernardino de Sahagún wrote a thematically arranged encyclopaedia called the Historia general de las cosas de la Nueva España (General History of the Things of New Spain) for use by his missionary colleagues. In it he described the language, history and customs of the Aztecs, their land, and its fauna and flora. To underline the usefulness of the work, he explained: “it is to redeem a thousand grey hairs, because with much less effort than it has taken me, those who wish to, will be able to learn of the history and language of this Mexican people.” This has always been the object of reference works: making knowledge available with less effort than would be required to compile it personally.

Title page of the Encyclopédie, published by Diderot and D’Alembert in the 18th century. / Photo: Wikipedia.

“Here meticulous typography and the blue of the seas”
However, the continual growth of human knowledge meant that the days of encyclopaedias written by a single author were numbered. Athanasius Kircher, a German Jesuit and author on a vast range of topics, who was justly described by a recent biography as the “last man who knew everything,” died in 1680. Today’s encyclopaedias are collectively authored works, although the contributions are not always signed. Moreover, they are divided into entries which are arranged alphabetically rather than thematically. For the sake of coherence between the different articles, and economy of content, internal references are used.
The famous French Encyclopédie (1751-1772) began with the purchase of the translation rights to an earlier English encyclopaedia that had already been a success. But when Diderot and D’Alembert were appointed to run the project it snowballed, ultimately turning into a 17-volume work. The Encyclopédie’s place in history is the result of its impact on the world of ideas, but at the time it also represented a big step forward in both the quality and scope of this kind of work.

The Encyclopédie included illustrations, which had been a traditional element of reference works (at least since Isidore of Seville), but the quantity and quality of its images exceeded those of previous works. Following the publication of the text volumes, a further eleven volumes of plates were published. The illustrations had been drawn for educational purposes, and would henceforth form an integral part of works of this kind. Outstanding examples are the illustrations of workshops in which various trades, such as those of the printer or the lute maker, were carried on.

“Here the vast Brockhaus encyclopaedia”
The golden age of print encyclopaedias culminated with a Spanish work, the Enciclopedia Universal Ilustrada Europeo Americana (1908-1933), known as the “Espasa” after its publisher’s surname. Once again this encyclopaedia started out from an earlier work, in this case an exclusive licence to the famous German Brockhaus and Meyer encyclopaedias, including the rights to a huge collection of photographs, engravings and colour plates. However, the enthusiastic welcome it received from the public and the publisher’s ambitions took it beyond the initial limits.

This huge encyclopaedia project produced volumes at a rate of three a year. However, in 1914 the production of new volumes was halted due to the disruption to the supply of illustrations, which were printed in Germany. Publication was resumed after the end of the Great War, an event which undoubtedly had an impact on the maps and history articles.

In this age of instant information and immediate updates it is hard to imagine what bringing a printed work up to date can involve. When printed encyclopaedias still existed (that is to say, until just a few years ago), producing a new edition meant incorporating recent events, people who had stood out for one reason or another, and new concepts (many of which were from the field of science and technology). But one of the key tasks was what in some publishers’ jargon was called “killing people off”: that is, adding the date of death to entries for people who were still alive at the time of the last edition but had since died.

“Here the many and weighty volumes”
The Enciclopedia Espasa grew until, on certain measures, it became the largest print encyclopaedia ever. Its core comprised 70 volumes, but the numerous appendices that needed to be published brought it up to 117. And this is one of the biggest constraints of print encyclopaedias, namely that at the moment of publication they are closed to new material. To include events taking place after publication, appendices were compiled describing the changes since the previous edition. But this meant that looking up a topic involved first finding the main article, and then looking through all the appendices to find out if there had been any changes! A partial solution was to publish an Index volume that listed which volumes contained the information sought.

Printed encyclopaedias were an essential fixture in institutions such as libraries, and later became widespread in homes in many countries, including Spain. But in the 1990s they began to face digital competitors in the form of CD-ROMs, which offered a reference work that was basically the same as existing paper ones, but with the added features of animations and videos. Internal references took the form of hyperlinks which made it easy to find the required entry. But, in general, these features were not used to create a richer structure than that of paper-based works. Moreover, they continued to be self-contained. For centuries entries had included a bibliography (the Espasa included five million references), but time had to pass and the Internet had to be developed for digital encyclopaedias to start using links to remote content on the web.

Digital works opened up two key possibilities. One was the ability to search the full text as well as the indexes, making it possible to locate items not listed as a main entry: for example, searching for Rigoletto and finding both the entry for the opera and that for Verdi. Many of these electronic works could also be searched using logical operators: for instance, entries containing the words Martin and Luther, but not King. The other was the ability to copy and reuse text fragments or even images.
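The kind of full-text search with logical operators described here can be illustrated with a minimal sketch. It does not reproduce how any particular CD-ROM encyclopaedia worked: the four sample entries, the `ENTRIES` dictionary and the `search` function are all invented for the illustration.

```python
import re

# Hypothetical mini-encyclopaedia: entry title -> entry text (invented for illustration).
ENTRIES = {
    "Rigoletto": "Opera in three acts by Giuseppe Verdi, first performed in 1851.",
    "Verdi, Giuseppe": "Italian composer whose operas include Rigoletto and La traviata.",
    "Luther, Martin": "German theologian Martin Luther, a central figure of the Reformation.",
    "King, Martin Luther": "American civil rights leader Martin Luther King Jr.",
}


def words(text):
    """Return the set of lowercase words found in a text."""
    return set(re.findall(r"\w+", text.lower()))


def search(entries, must_have=(), must_not_have=()):
    """Return the titles of entries whose title or body contains every word in
    must_have (logical AND) and none of the words in must_not_have (logical NOT)."""
    hits = []
    for title, body in entries.items():
        found = words(title) | words(body)
        has_all = all(w.lower() in found for w in must_have)
        has_none = not any(w.lower() in found for w in must_not_have)
        if has_all and has_none:
            hits.append(title)
    return hits


# Full-text search: finds the opera's own entry and the entry that mentions it.
print(search(ENTRIES, must_have=["Rigoletto"]))
# Logical operators: Martin AND Luther, NOT King.
print(search(ENTRIES, must_have=["Martin", "Luther"], must_not_have=["King"]))
```

With these sample entries, the first query returns both the Rigoletto and the Verdi entries, while the second returns only the entry for Martin Luther.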

It might be thought that editing an encyclopaedia on CD-ROM or later the web, where greater size does not increase the cost of the paper, would free entries from the tyranny of limited length, but that is not exactly the case: the works of the past sometimes had absurdly long entries (such as in the case of the Espasa encyclopaedia, where contributors were paid per page), and for a reference work it is often more important to be precise than to go into great detail.

The Enciclopedia Espasa grew to become the largest print encyclopaedia ever. / Photo: Wikipedia

“Here the boundless miscellany that knows more than any man”
However, the biggest revolution in the world of encyclopaedias began in January 2001 with the launch of Wikipedia, an online reference work created from volunteers’ contributions, and whose content is freely reusable. By 2005 the English-language version was bigger than the Espasa encyclopaedia. Today it has over 20 million entries, in as many as 282 languages, created by more than 31 million registered users.

Wikipedia was revolutionary in a number of ways, but not in terms of its structure: it is basically like an 18th-century encyclopaedia, but with the multimedia add-ons of CD-ROM works from the 1990s. But, as everyone knows, its entries can be edited and corrected by anyone. A community of volunteers (“wikipedians”) oversees compliance with editing standards, and has built a consensus through a lengthy process of discussion.

Over time a number of issues have been extensively debated, such as the possibility of allowing unregistered users to make edits (this is allowed, although the system logs the IP address from which the modifications are made), or the fact that some of the more controversial articles are protected against editing. One issue that generated a lot of debate was the scope of Wikipedia: should it only allow entries typical of a classical encyclopaedia? Or, given that there are no limits as in the case of paper, should entries on any topic be allowed, for example on all the Pokémon characters? The option that won out was the more traditional approach.

However, the real key to Wikipedia is that it is licensed for reuse (using a similar mechanism to that for open-source software): any part of its content can be reused in any way, even commercially. This has stimulated the work of the volunteers working on it: their work is disseminated on all forms of media. Some are commercial, such as the book containing part of the German edition that is sold by Bertelsmann, but others help the needy with free editions, such as the DVD edition produced by Argentina’s Education Ministry for offline use in all the country’s schools.

“Here error and the truth”
Contrary to what might have been expected, an encyclopaedia written and corrected by people who are not necessarily specialists has achieved a very acceptable level of quality. Experts in collaborative work refer to the “swarm effect,” the undeniable fact that many eyes see more than just a few. Indeed, a study has demonstrated a clear correlation between the quality of a given entry and the number of edits (corrections or additions) made to it.

There has been constant controversy about the reliability of Wikipedia, but it has had some important successes, such as the paper published in Nature in 2005 comparing a series of scientific articles in the English edition with their counterparts in the Encyclopaedia Britannica, which found Wikipedia’s accuracy to be close to Britannica’s. Obviously, it should be borne in mind that the work of the editors and publishers of paper-format encyclopaedias has not always been as meticulous as might be hoped, so it is pointless a priori to consider one sort of encyclopaedia perfect and another flawed. However, there have also been a number of attempts to create collaborative encyclopaedias under expert supervision, such as Citizendium.

With widespread access to the web and the immediate availability of a variety of content (particularly Wikipedia), encyclopaedias that had been operating for decades suddenly found they faced an unexpected source of competition. Many responded by also going online, in various modes of operation, while keeping their print or CD-ROM format. One of the most famous cases was that of Encyclopaedia Britannica, which allows users to access a portion of its content for free, while charging a subscription for the rest.

But the influence of Wikipedia has also made itself felt: some traditional encyclopaedias have opened up to participation by volunteers, although through the filter of their publishers. This is true of the Gran Enciclopedia Catalana in Catalonia and the Larousse in France. Nevertheless, reader participation was not invented by Wikipedia, but has been a tradition in the world of knowledge, particularly since the Enlightenment. For instance, the introduction to volume II of the first Mexican encyclopaedia, Diccionario universal de historia y de geografía (Mexico, 1853), says: “We formally invite all lovers of the Enlightenment to provide us with their help. If one person in each state, or each major city, were to devote a few moments of leisure to these tasks, after just a short time we would have such a collection of data that it would suffice to form an interesting compilation.”

“Here time’s memory and time’s labyrinths”
The procedure followed when drafting many encyclopaedia entries has been either to look at what other previous reference works said on the topic (which has contributed to perpetuating errors) or to turn to articles and books. In the digital universe there is an immeasurable volume of materials of all types (texts, images, audio, videos). Could an automatic system ever compile and organise them into a coherent discourse? Such a project would expand the radius of the classical encyclopaedia, as it would include all possible topics and not just those planned for in advance. Implementing this would create a sort of expert system on a topic, but this would depend on progress towards the so-called semantic web (a web that knows what it contains) to effectively extract information from multiple sources.

There are already some prototype systems that, starting out from more or less structured data, are able to create something similar to encyclopaedia entries on people, countries, etc., including graphical elements. Whereas a photograph or video is easily located and reused, an expert system could generate graphics ad hoc, for example histograms showing the changes in population of one country compared to another, or maps of regions showing the unemployment rate in each area. There are already some applications that collect information and turn raw data into graphics, such as WolframAlpha.

What does this suggest the future encyclopaedia landscape will look like? We could envisage a digital continuum comprising digitised copies of existing books (in the style of Google Books), databases of scientific articles, virtual libraries, newspaper archives, user contributions (like Wikipedia), and a variety of other content. To answer a specific question, an intelligent agent, which we will call the Encyclopaedia, would collect and assess information and finally write a text accompanied by multimedia elements.

Such a system would have information that was permanently up to date. It could indicate the source of all the data it provides. And it would prepare entries with the length and structure defined in its parameters. What is more, if users running queries and experts on the different topics rated the accuracy and appropriateness of its answers, the agent could learn over time.

Thus, from the prodigious wise men of the past, whose minds were a compendium of all the knowledge of their age, and the armies of contributors and editors of the biggest print and digital encyclopaedias, we could arrive at subtle sets of algorithms which, with feedback from humans, cast their nets, as Bernardino de Sahagún might say, in the ocean of digital knowledge to achieve, by other means, what a brochure for the Espasa encyclopaedia said in the 1930s: “The need of the age is to know. To know everything, and now.”

Profile: José Antonio Millán

José Antonio Millán is a linguist and publisher in both traditional and digital formats. As the publishing director of Taurus Ediciones he was responsible for the publication of the Spanish version of David Crystal’s Encyclopaedia of Language. In 1995, jointly with Rafael Millán, he created the first CD-ROM version of the Spanish Royal Academy’s Diccionario de la lengua.

Over many years he has reviewed lexicographical works in the national newspaper El País. Some of his work is compiled at

In his work as an analyst and critic of the emerging field of digital publishing (compiled on his blog: http:/ on many occasions he has dealt with the possibilities and implementations of the electronic support for the publication of information and access to it.

Published in No. 07

