Pilot edition of Murray Scriptorium now online

My co-editor Stephen Turton (Gonville & Caius College, Cambridge) and I have just published the pilot edition of the Murray Scriptorium, a long-term project whose aim is to produce a fully annotated and digitized scholarly edition of the correspondence of Sir James Murray (1837-1915), chief editor of the first edition of the Oxford English Dictionary. Much of the research for the OED was carried out through the medium of letter-writing, and Murray wrote so many inquiries to his vast range of correspondents that in the early 1880s the Post Office installed a pillar-box outside his house at 78 Banbury Road, Oxford.

In pursuing the meaning and history of words, Murray exchanged letters with prime ministers (e.g. William Gladstone), distinguished writers of the day (e.g. George Eliot, Thomas Hardy), subject experts and academics both professional and amateur (the Director of Kew Gardens, men and some women of letters such as Professor W. W. Skeat in Cambridge and the medievalist Lucy Toulmin Smith), as well as ordinary individuals whose identity is now unknown (‘Dear Sir’, ‘Dear Madam’).

Our pilot edition, designed by Huber Digital, contains a selection of letters transcribed from photographic scans (checked against the original documents where possible) and marked up in XML in accordance with the open-source protocols of the Text Encoding Initiative. Supporting resources include an Introduction explaining the general interest of the letters as well as the role they played in the making of the dictionary, while the letters themselves are searchable by author, recipient, subject, location, etc. Editorial commentaries explain the special character of the dictionary along with some of the features illuminated by individual letters, e.g. the contributions made by women to a largely male-dominated project, the difficulties in including obscene vocabulary, and the treatment of World Englishes.

Most of the letters in this initial stage of the Murray Scriptorium are held by the Bodleian Library, part of a huge collection of family papers donated to the library in 1996 by K. M. Elisabeth Murray, the editor’s granddaughter and author of a best-selling biography of him, Caught in the Web of Words (1977). Others have been transcribed from the other main holding of his letters, at Oxford University Press, which employed Murray on the OED from 1879 until his death (when he was part-way through editing the letter T), as well as from smaller collections in public or private ownership. Later instalments will represent both archives more fully, as well as collections further afield, while enhancing the edition’s inbuilt digital resources and expanding the range of topics covered.

Launch of Murray Scriptorium website

Visit http://murrayscriptorium.org to see the holding page for the new Murray Scriptorium project. You can also download an advance copy of the forthcoming article:

  • Brewer, Charlotte, and Stephen Turton (2021). ‘Aggravated mischief: editing and digitising the papers of Sir James Murray’, Dictionaries: Journal of the Dictionary Society of North America, forthcoming.

Mapping the Bodleian Library’s Murray Papers: pilot project to edit the correspondence of J. A. H. Murray (1837–1915), chief editor of the OED

UPDATE MARCH 2021: this project will now be funded by the British Academy/Leverhulme Trust

The author of this website, Charlotte Brewer, has just been awarded a grant from the University of Oxford’s John Fell Fund to create a pilot edition of the Murray Papers, pending the result of an application for funding from an external source. I’ll be working on this fascinating project with Stephen Turton, who recently completed an Oxford DPhil in historical lexicography (see Turton 2020a and 2020b).

During his editorship of the Oxford English Dictionary, J. A. H. Murray posted so many letters in pursuit of his inquiries that the Post Office installed a pillar box outside his house at 78 Banbury Road, Oxford. Research conducted by correspondence played a vital role in the construction of this revolutionary new dictionary, published in instalments by Oxford University Press between 1884 and 1928. As Murray himself described it, a typical day might go as follows:

I write to the Director of the Botanic Gardens at Kew about the first record of the name of an exotic plant; to a quay-side merchant at Newcastle about the keels on the Tyne; to a Jesuit father on a point of Roman Catholic Divinity; to the Secretary of the Astronomical Society about the primum mobile or the solar constant; to the India Office about a letter of the year 1620 containing the first mention of Punch; to a Wesleyan minister about the itineracy; to Lord Tennyson to ask where he got the word balm-cricket and what he meant by it […] In fact a lexicographer if he wants to be accurate, has to be a universal enquirer about everything under the Sun, and over it.

Peter Gilliver, The Making of the Oxford English Dictionary, Oxford University Press, 2016

Many of the resulting documents now occupy the best part of 9 linear metres of shelf space in Oxford’s Bodleian Library. Apart from vividly testifying to the OED’s status as a large-scale collaborative enterprise, they are a unique and invaluable resource for researchers of late 19th- and early 20th-century English language and literature, the social and intellectual networks of the OED’s compilers, and the late modern history of linguistics and other disciplines.

Though individual letters have been quoted in a number of articles and monographs on the history of the OED and its compilers, they have never been systematically edited or published. This project sets out to edit them, and the results will be made freely available on a website, with a view to expanding the project over the next few years to support scholarly and popular interest in the OED as it approaches its 2028 centenary.

Stephen and I will be drawing gratefully on the advice and expertise of a number of Oxford colleagues to pursue this project, including Professor Lynda Mugglestone, Dr Peter Gilliver (associate editor of the OED and author of The Making of the OED), and Bev McCulloch, OED archivist. Watch this space for further updates.

Chronological coverage in OED2 and OED3: new scholarship or old?

EOED has just completed its series of pages on the OED’s treatment – both past and present – of different periods in the language. The series starts at Period coverage.

Here is a summary of the main points.

Source: www.oed.com
  • The OED3 revision (begun in 2000, now about halfway through the alphabet) is adding enormous quantities of new quotations to its predecessors’ record of the language. Quotations form the evidential basis for the OED, so this increase indicates that the Dictionary’s account of the English language – on both a large and small scale – is changing significantly
  • Unfortunately, we cannot identify which quotations are new! This is because the OED website searches do not differentiate between revised and original entries. We can, however, count up the total of quotations per decade in the current hybrid version of the OED, OED Online (i.e. a mixture of old and new scholarship), and compare the results with the equivalent totals in the pre-revision version of the OED (i.e. OED2)
  • Comparison of this sort tells us that, very broadly speaking, for the period 1500-1989 the revised OED appears to be reproducing the chronological biases of the old OED (click on the link to Chart 3 in the right-hand image below). So there is a bulge of quotations over the late 16th/early 17th centuries (Chart 8), a dip in quotation evidence in the early to mid-18th century (Chart 13), and a steep rise towards the 1880s or so (Chart 19). 20th-century coverage is more uneven (Charts 21 and 37)
  • In a decisive departure from the practice of the first edition of OED, OED3 is no longer gathering huge numbers of quotations from major literary and cultural writers as evidence for the history of vocabulary in English. Instead, the revising lexicographers are raiding vast electronic databases of multi-authored sources (newspapers, journals, and periodicals) for their new quotations. See the discussions at Top sources in OED3, 1800-1929 in OED3 and 1930 onwards in OED3
  • Notwithstanding this changed practice, Shakespeare, Chaucer, Milton, Caxton, Dryden, Dickens and hundreds of other male literary canonical writers continue to dominate the list of most quoted sources in today’s OED. This is because OED3 has simply retained much of the original OED1 quotation evidence rather than archiving the original Dictionary and starting again. In this respect, the OED3 revision is producing a 21st-century dictionary bolted onto a Victorian one (click on the links above and on Which edition contains what?)
  • As a result, women writers remain significantly under-represented in the OED. As of the June 2020 update, OED Online’s own list of top 1,000 quotation sources includes just 28 women. It is impossible to search OED quotations by gender of author, but it is reasonable to infer that the vast majority of contributors to the newspapers, journals and periodicals that OED3 is now favouring as quotation sources will also be male. On OED’s under-quotation of female sources, see further 1700-1899 in OED3 (Chart 14 and the discussion of Frances Burney beneath; Caroline Herschel and Philosophical Transactions) and information on the top individually-authored sources from 1930 onwards in OED3. Preliminary evidence and notes can also be found at Top female sources, currently in preparation. EOED’s 2009 study of the under-representation of 18th-century women writers (funded by the Leverhulme Trust) is under Topics.
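The per-decade comparison described in the second bullet above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not EOED's actual procedure or any OED Online interface: the quotation dates are invented, and the function names (`decade_shares`, `compare_profiles`) are hypothetical. It compares each decade's share of all quotations rather than raw counts, since the question is whether the revised text keeps the chronological shape of the old one even as the total number of quotations grows.

```python
from collections import Counter


def decade_shares(years):
    """Return each decade's share of all quotations (e.g. 1887 falls in the 1880s)."""
    counts = Counter((year // 10) * 10 for year in years)
    total = sum(counts.values())
    if total == 0:
        return {}  # no quotations, so no profile to report
    return {decade: n / total for decade, n in counts.items()}


def compare_profiles(old_years, new_years):
    """Pair old and new shares decade by decade; a decade absent from one version gets 0.0."""
    old, new = decade_shares(old_years), decade_shares(new_years)
    return {
        decade: (old.get(decade, 0.0), new.get(decade, 0.0))
        for decade in sorted(old.keys() | new.keys())
    }


# Hypothetical quotation dates, invented for illustration (NOT real OED data).
oed2_years = [1590, 1595, 1601, 1605, 1730, 1880, 1884, 1887]
online_years = [1592, 1598, 1603, 1604, 1609, 1735, 1881, 1886, 1889, 1990]

for decade, (old_share, new_share) in compare_profiles(oed2_years, online_years).items():
    print(f"{decade}s: OED2 {old_share:.2f}  OED Online {new_share:.2f}")
```

If the two columns of shares move up and down together across decades, the newer text is reproducing the older one's chronological biases, even though its raw quotation counts are larger.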

To read more, click on the pages below:

Period coverage

1150-1499 in OED1/OED2

1150-1499 in OED3

1500-1699 in OED1/OED2

1500-1699 in OED3

1700-1799 in OED1/OED2

1700-1799 in OED3

1800-1929 in OED1/OED2

1800-1929 in OED3

1930-1989 in OED2

1930 onwards in OED3

OED Text Visualizer tool and the current state of OED Online

OED Online has recently added a new tool, the OED Text Visualizer, to its website at www.oed.com.

The case for visualization tools such as these is that they represent different categories of quantitatively assessed data in a visually striking way. They are especially useful when they indicate groupings or relationships between constituent elements of the data that researchers might not previously have noticed or considered.

The OED Text Visualizer certainly has the potential to do this. Users can enter a text of up to 500 words to see the etymological source of each word (Germanic, Romance, etc.) and when it first entered the language.

In its present form, however, the tool is problematic. The major issue is as follows. As its accompanying text explains, the Text Visualizer draws on two important components of OED Online entries: the etymological origin of a word, and its date of first recorded usage. What is not explained is that just under half of these two sets of OED data are significantly out of date, in some cases by a hundred years and more, since the entries from which they are derived are as yet wholly or partially unrevised.

It follows that the results produced by the Text Visualizer represent an undifferentiated mixture of internally inconsistent lexicography, some of it significantly out of date. The tool needs to be reconfigured so that users can distinguish results derived from modern lexicographical scholarship (i.e. 2000 onwards) from those based on entries first published in earlier stages of the Dictionary (stretching from 1884 to 1989). In its current form the Text Visualizer delivers results which are not yet appropriate for use in academic research.

The Text Visualizer also provides information on the frequency of use of a word, both in the year the user has assigned to the text and in ‘modern English’. This is valuable, but no account is given of the source of this information, which we may guess to have been Google N-grams, presumably manipulated or adapted in some way. Users of the tool need to know the source of the figures cited so that they can understand the assumptions on which they have been produced. This is a basic requirement for academic research.

One excellent feature of the new tool, nevertheless, is that its results are produced in CSV and other formats and hence are far easier to work with for research purposes than the search results currently available on OED Online (see under Search tools below).

A more general comment is as follows. Setting aside the criticisms above, the OED visualization tools so far produced (e.g. geographical origin of vocabulary in English over time) have been captivating but over-determined. That is, they make assumptions about what researchers are interested in. By contrast, it is a widely acknowledged truism that good research comes out of giving researchers free and unfettered access to primary data, so that they can explore and think about it independently. The range of search tools on OED Online already provides a generous range of possibilities for new types of research, though of course we would all like more tools and more/better data to be available (for example, the currently provided information on frequency of headwords is unsatisfactory). The problem is that these website tools don’t work well and the results are delivered in an unanalysable format, as described on EOED at OED Online.

OUP is now planning ‘a new suite of tools based on an OED Text Annotator engine,’ of which the Text Visualizer critiqued above is an example. Exciting as such tools are, there are other features of the OED Online website in its current form which are so unsatisfactory as to require immediate attention. Sorting these out is a priority of at least equal if not greater importance than a new set of tools, especially if the new tools repeat the flaws of the existing ones. Here is a list.

Urgent issues for OED Online 

Transparency on date of entries and changes to entries

  • OED Online needs to make it entirely clear to users that its website presents a mix of new, revised, and unrevised entries, some of which have been unchanged or little changed for over a hundred years. Electronic searches should distinguish between revised and unrevised entries, otherwise the results are not usable for research purposes. It is worth pointing out that if users were able to search OED3 independently of OED2, they would be in a position to appreciate the quality and characteristics of OED3’s lexicographical innovation and scholarship. The character and achievements of OED3 are currently under-recognized because they are impossible to identify systematically, i.e. across a range of entries.
  • When significant changes are made to revised entries, these should be flagged. An example is the change made to the definition of marriage after new UK legislation in 2013. The entry continues to be dated 2000. Researchers need to be able to make use of and cite dictionary entries with confidence that the dates they bear are accurate. 
  • Similarly, unrevised entries frequently contain unidentified changes and additions (to definitions, editorial notes, quotations and other components) made since the date of first or subsequent print/web publication. Again, OED Online needs to find a way of recording significant changes so that academic users can use and cite Dictionary entries with an understanding of their provenance and with confidence that the date-stamping provided by OED itself is accurate.

Quotation sources

  • A pressing issue for the OED is the unevenness of balance in its most heavily cited quotation sources. These sources are listed on the OED website and accessible via a front-page link (‘Explore the top 1,000 authors and works quoted in the OED’). As of June 2020, only 28 are by identifiably female authors. The reasons for this imbalance are evidently not straightforward, but the matter needs to be acknowledged and discussed, and the editors should say what they are doing to tackle the issue. For example, it would be extraordinarily helpful if it were possible to search by gender of author, where known. See further EOED pages on Top sources, Fe/male sources.
  • The question of the balance of quotations between white and non-white writers of English is also a salient issue, one that OED will certainly be thinking about. Geographical spread in sources quoted is not a reliable proxy, given that many quotations are from (colonial era) white authors.

Search tools

  • Electronic searching of OED’s text continues to yield flawed results, even when using search pathways indicated by the website. For example, if you click on the top item on OED’s list of top 1,000 sources, which is The Times, and follow the directions to identify the quotations in question, many of the results turn out to be from unrelated publications (Musical Times, N.Y. Times, Financial Times, etc). With large bodies of evidence it is impracticable for users to weed out false results by hand or by subsequent searches.
  • The form in which website results are provided is not usable for research purposes. By contrast, the Text Visualizer’s provision of different formats for search results is exemplary. Similar features should be imported into OED Online. 

Editorial principles and practice; other accompanying information

  • Description of editorial principles and practice. Over its initial 20 years, OED3’s editorial practices (and by inference, editorial policies) have varied considerably, e.g. on the provision of and criteria for usage notes and labels of various kinds. Users need full information and guidance here, preferably in one place on the website that is easy to find, access and search.
  • The ‘About’ section of the website (https://public.oed.com/about/) contains much valuable material (e.g. on the history of the OED) but is hard to navigate. Users are often unaware of its contents. It needs to be completely reorganized, with content properly indexed and pages dated. 

EOED re-launched

November 2019 sees the launch of the new version of Examining the OED. The site has been rewritten and reorganized and lots of new material added – notably under Period coverage, where we look at the changes the new version of OED (OED3) is gradually making to the OED’s picture of the chronological shape of the language.

To find your way around, have a look at our new Contents page and (for a list of all material on the website) Site map.

All feedback and suggestions welcome.