The Case for a Small Language Model

When a language model produces a sentence, it presents us with a statistical probability based on the countless texts it has analyzed. Before it can predict the next character in a sentence, it has to cut the writings of millions of authors into fragments and analyze the sequences of characters they contain. These writings are stripped of meaning and structure, and repurposed as a statistical resource. While the system needs the authors' works as input, its results can never be traced back to them. But what would happen if a language model created new texts while leaving the original work intact? What if the output of generative AI could be traced back to, and understood in the context of, the original text?

The Case for a Small Language Model is a speculative AI based on the work of Dutch composer and poet Rozalie Hirs. Her poetry book Oneindige Zin (Uitgeverij Querido, 2021), whose Dutch title translates as Infinite Sense or Infinite Phrase, can be read as one never-ending phrase. The installation shows the entire book printed on five 30-meter-long strips of label printer paper that scroll in both directions. As the five lines move back and forth, a vertical reading allows new combinations to emerge. Meanwhile, a neural network based on Andrej Karpathy’s Char-RNN analyzes a digital copy of Hirs’ original text and tries to create new sentences based on her work. Initially, the combinations seem random and nonsensical, but as the training of the neural network (running on a low-power Raspberry Pi) progresses, more interesting combinations emerge. Rather than appropriating the author's work as mere statistical data and cutting it into fragments, the system leaves the original text intact. Its output can only be read and understood in the context of the input, as the only way to display it is to move the entire manuscript text left or right.
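The display principle described above can be illustrated with a minimal sketch. It is not the installation's actual software. It assumes that each generated fragment is five characters long, that it is read top to bottom with one character per strip, and that each strip is simply moved to the nearest place where the required character occurs. The function names and the sample strips are hypothetical placeholders, not text from Oneindige Zin.

```python
from typing import List, Optional


def nearest_index(strip: str, ch: str, current: int) -> Optional[int]:
    """Return the position of `ch` in `strip` closest to `current`, or None if absent."""
    candidates = [i for i, c in enumerate(strip) if c == ch]
    if not candidates:
        return None
    return min(candidates, key=lambda i: abs(i - current))


def align_strips(strips: List[str], target: str, current: List[int]) -> Optional[List[int]]:
    """Find, for every strip, the position to place under the reading column
    so that the column spells `target` from top to bottom.

    Returns None when a character of `target` does not occur in its strip:
    the text can only show what the original manuscript already contains.
    """
    if not (len(strips) == len(target) == len(current)):
        raise ValueError("need one target character and one position per strip")
    positions = []
    for strip, ch, pos in zip(strips, target, current):
        idx = nearest_index(strip, ch, pos)
        if idx is None:
            return None
        positions.append(idx)
    return positions


def read_column(strips: List[str], positions: List[int]) -> str:
    """Read the characters currently under the reading column, top to bottom."""
    return "".join(strip[i] for strip, i in zip(strips, positions))


# Hypothetical placeholder strips (assumption: not the real manuscript text).
strips = ["paper", "label", "strip", "lines", "reads"]
start = [0, 0, 0, 0, 0]

positions = align_strips(strips, "pains", start)
# Signed shift per strip: positive and negative values correspond to
# scrolling in opposite directions.
shifts = [new - old for new, old in zip(positions, start)]
print(read_column(strips, positions), shifts)
```

In this sketch the strips themselves are never altered; only their positions change, which mirrors the idea that the generated output stays readable in the context of the unbroken source text.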

The Case for a Small Language Model reflects on the role of authorship in generative AI and questions the practice of reducing the written expressions of millions of authors (mostly without their permission) to a statistical resource.

In collaboration with Rozalie Hirs and Jelle Reith, supported by The Creative Industries Fund NL and the Netherlands Foundation for Literature.