Dating Spanish chapbooks: the wonders of artificial intelligence

Cambridge University Library was recently awarded a Cambridge Humanities Research Grant to continue work on the Spanish chapbooks catalogued and digitised under the “Wrongdoing in Spain, 1800-1936” project, as featured in the Cambridge Digital Library. This new year-long project aims to reliably date about 67% of chapbooks bearing estimated dates, which are often drawn from the printer’s period of activity. To establish more accurate dates of printing for these items, we aim to run a visual search on the woodcut illustrations within the chapbooks and compare prints made from the same woodblocks.

Printing houses used woodblocks (as well as metal stereotype plates in the nineteenth century) to illustrate the chapbooks. Woodblocks were expensive to produce, so printers often had a limited stock that they reused, sometimes through several generations of printers. Earlier woodblocks were crudely made on softwood, but the technique developed to produce much more detailed woodblocks etched with metal-engraving tools on harder wood. More intricate images are typical of the later period, although many older woodcuts continued to be used in later years to cut costs. Given such heavy reuse, it comes as no surprise that woodblocks deteriorated over time, becoming less sharp and developing cracks. After many printings, the finest lines began to fade, and it is this wear and tear that we are hoping to use to our advantage to date the Cambridge Digital Library Spanish chapbooks more accurately.

During the first phase of the project (October 2021 to date), images of the chapbooks were run through a machine learning model created by Oxford University’s Visual Geometry Group. The model was pre-trained on similar Scottish chapbooks from the National Library of Scotland. This process recognised the woodcut images and created annotations to mark them using bounding boxes, but the result was not perfect. Manual input was needed to ensure that the gathering of images suited the parameters of the project. Our aim was to isolate individual woodblock prints (i.e., woodcuts made from a single woodblock). The software did not recognise that some images had been composed from two or three separate woodblocks combined to make a single image. It also missed borders and garlands and made “false detections”, so manual input was essential not just to serve our purposes for the project, but also to train the machine learning model to make more accurate predictions in the future.
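The manual review step can be pictured with a small sketch. This is not the Visual Geometry Group's software or its annotation format: the `Detection` fields, the `needs_review` helper and both thresholds are hypothetical, chosen only to illustrate how low-confidence boxes (possible false detections) and unusually large boxes (possibly several woodblocks combined) might be queued for a human to check.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One predicted woodcut region on a scanned page (hypothetical schema)."""
    page_id: str
    x: int
    y: int
    width: int
    height: int
    score: float  # model confidence between 0 and 1 (assumed to be available)


def needs_review(det, page_width, page_height, min_score=0.8, max_area_ratio=0.6):
    """Flag a detection for manual checking.

    A low score suggests a possible false detection; a box covering most of
    the page may be a composite of several woodblocks. Both thresholds are
    illustrative assumptions, not values used by the project.
    """
    area_ratio = (det.width * det.height) / (page_width * page_height)
    return det.score < min_score or area_ratio > max_area_ratio


# Invented example detections on pages of 1000 x 1500 pixels.
detections = [
    Detection("page-001", 100, 150, 400, 300, 0.95),
    Detection("page-002", 50, 60, 900, 1200, 0.97),
    Detection("page-003", 200, 220, 300, 250, 0.41),
]

flagged = [d.page_id for d in detections if needs_review(d, 1000, 1500)]
print(flagged)
```

Here the second box is flagged for its size and the third for its low score. A rule like this could only prioritise the checking, since missed borders and garlands would still have to be found by eye.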

In the next phase of the project, all the images and annotations, alongside metadata from Cambridge Digital Library, will be imported into an instance of VISE (the VGG Image Search Engine, developed by the Visual Geometry Group). VISE will allow us to visually search many images (we annotated a total of 18,757 images out of 26,527 scanned images of chapbooks). By using an image or a metadata field as a search query, we are hoping to use machine learning and computer vision to explore relationships between the illustrations and not only narrow down the publication dates of the chapbooks, but also open up fields for research in printing and social history.
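As a rough illustration of how matched prints might narrow a date, the sketch below assumes that every print from one woodblock has been given a numeric wear score (higher meaning more worn) and that some of those prints come from firmly dated chapbooks. The wear score, the `narrow_date` helper and the example years are all hypothetical; the project has not described how wear would be measured, and VISE itself is not called here.

```python
def narrow_date(wear, dated_prints):
    """Bound the printing year of an undated print from one woodblock.

    dated_prints is a list of (year, wear) pairs for prints of the same
    woodblock with firm dates. Assumes wear only increases over time, which
    is a simplification: inking and paper quality also affect sharpness.
    Returns (lower, upper); either bound is None when nothing constrains it.
    """
    no_more_worn = [year for year, w in dated_prints if w <= wear]
    no_less_worn = [year for year, w in dated_prints if w >= wear]
    lower = max(no_more_worn) if no_more_worn else None
    upper = min(no_less_worn) if no_less_worn else None
    return lower, upper


# Invented example: three dated prints of one woodblock, increasingly worn.
dated = [(1845, 0.10), (1858, 0.35), (1872, 0.70)]

print(narrow_date(0.50, dated))  # (1858, 1872)
print(narrow_date(0.05, dated))  # (None, 1845)
```

An undated print whose wear falls between the 1858 and 1872 examples would be placed in that interval, which may be narrower than the printer's whole period of activity.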

Sonia Morcillo García

“Mapping Pliegos”: a collaborative project of Spanish chapbooks

A new online resource for the study of Spanish chapbooks has been made available to researchers. The recently launched Mapping Pliegos portal provides access to 7,000 Spanish chapbooks from the 19th and 20th centuries held in major collections in Spain and in the UK.


Catalan books and Spanish lyrical pieces in the Jonathan Gili collection

A few years ago, we announced the purchase of a collection of Catalan and Spanish books from the library of Jonathan and Phillida Gili. Our blog post featured some of the gems of this wonderful acquisition.

We are pleased to report that this collection is now fully catalogued.