The newspaper utilities can be configured with yaml files specifying what kinds of transformations to make; that became the idea behind DHMarginalia, a way of configuring what you want to analyze, which I now see becoming Polybius, which is more about telling the story than finding it.
The Historic Places Explorer can read other csv files: if you deploy it without the historic places dataset, you're presented with a file upload card and it will try to read whatever you give it. With a bit more work it could become more of a swiss-army knife (maybe adding a way, during upload, for the user to specify which columns should be mapped where, that sort of thing).
The last suite of things I pulled together in January (and ok, one or two days in February) returns to my happy place, sonification. I've published a couple of articles related to sonification of data (and not just conventional mappings, either!) as well as a creaky tutorial for the Programming Historian that could stand some overhauling. Some of the most engaging student work I've been privileged to be part of has involved sonification projects pushing public history in neat directions. There's something wonderful about refusing to visualize or engage with data the way everyone expects you to, a productive contrariness.
On the other hand, doing things differently risks being ignored. If what you do isn’t legible to whoever controls the reward structure in your discipline, that’s a not insignificant risk.
“People think that stories are shaped by people. In fact, it’s the other way around.
Stories exist independently of their players. If you know that, the knowledge is power. Stories, great flapping ribbons of shaped space-time, have been blowing and uncoiling around the universe since the beginning of time. And they have evolved. The weakest have died and the strongest have survived and they have grown fat on the retelling . . . stories, twisting and blowing through the darkness. And their very existence overlays a faint but insistent pattern on the chaos that is history. Stories etch grooves deep enough for people to follow in the same way that water follows certain paths down a mountainside. And every time fresh actors tread the path of the story, the groove runs deeper.
This is called the theory of narrative causality and it means that a story, once started, takes a shape. It picks up all the vibrations of all the other workings of that story that have ever been. This is why history keeps on repeating all the time […]
It is now impossible for the third and youngest son of any king, if he should embark on a quest which has so far claimed his older brothers, not to succeed. Stories don’t care who takes part in them. All that matters is that the story gets told, that the story repeats. Or, if you prefer to think of it like this: stories are a parasitical life form, warping lives in the service only of the story itself.”
Pratchett, Witches Abroad
“There’s always a story. It’s all stories, really. The sun coming up every day is a story. Everything’s got a story in it. Change the story, change the world.”
Pratchett, A Hat Full of Sky
Consider a world where we have devised a way for every story ever written, every commentary upon every story, every little bit of thing that can be expressed in text, to be collapsed and jumbled all together. Assume further that we have worked out a way of compressing all of that such that each little thing that there can be, can be expressed as a list of numbers. Position in that list signifies some distance along a particular dimension. Then consider the consequences for us when we go out of our way to plumb such a system into the digital control systems of modern life.
The result, I think, is a world where narrative causality actually exists. We have created narrativium. The system isn’t artificially intelligent. It’s a parasitical life form, warping lives in the service only of the story itself.
And all of the information in that system – which was following grooves anyway, which you'll be surprised to know can be called 'culture' – is now channelling the cuts deeper and deeper. Train a system on fiction where rogue AI takes over the world, and the story duly told will be one where the AI performs the part of trying to take over the world. It will push you there.
How can we map/measure/perceive the narrative causality?
Let’s take a story, and create an embedding model of it. Here is the story of Little Red Embedded Hood:
triples = [
['Red Riding Hood', 'lives_in', 'Village'],
['Red Riding Hood', 'is_child_of', 'Mother'],
['Mother', 'gives_basket_to', 'Red Riding Hood'],
['Red Riding Hood', 'visits', 'Grandma'],
['Grandma', 'lives_in', 'Forest House'],
['Wolf', 'lives_in', 'Forest'],
['Wolf', 'meets', 'Red Riding Hood'],
['Wolf', 'eats', 'Grandma'],
['Wolf', 'disguises_as', 'Grandma'],
['Wolf', 'waits_in', 'Forest House'],
['Woodcutter', 'hunts', 'Wolf'],
['Woodcutter', 'saves', 'Red Riding Hood'],
['Grandma', 'is_at', 'Forest House'],
['Wolf', 'is_at', 'Forest House']
]
And we’re going to express this knowledge in three dimensions (nothing special about ‘3’; I just wanted to keep things simple; I used Ampligraph 2 in a colab notebook hacked and kludged a bit to solve dependency hell).
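If you want to reproduce this, the training step looks roughly like the sketch below. This is AmpliGraph 2's scoring-based model API as I remember it, so treat the parameter names as assumptions and check the current docs; swap the scoring_type to get the other geometries.

import numpy as np
from ampligraph.latent_features import ScoringBasedEmbeddingModel

X = np.array(triples)

# k=3 to match the three dimensions discussed here; eta is the number of negatives per positive
model = ScoringBasedEmbeddingModel(k=3, eta=5, scoring_type='ComplEx')
model.compile(optimizer='adam', loss='multiclass_nll')
model.fit(X, batch_size=len(X), epochs=300)

# grab the entity embeddings so we can plot dimension 0
entities = np.unique(np.concatenate([X[:, 0], X[:, 2]]))
emb = model.get_embeddings(entities, embedding_type='e')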
A lot hinges on how we do the expression. There are a variety of mathematical techniques we could use; here, we'll use ComplEx, DistMult, and RotatE, which represent relationships in slightly different ways, with different consequences. Basically, something like ComplEx can understand that the directionality implied in a 'student' -> 'teacher' relationship is different from that implied by a 'teacher' -> 'student' one, while DistMult will treat them the same ('there's a relationship here, don't bother refining it'). RotatE, on the other hand, sees relationships as rotations around a circle, and that lets it model relationships that are symmetric, inverted, or composite (where if A connects to B and B connects to C, there is an implied linkage from A to C). That's about as complex as I'm going to get here talking about the math.
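To make the difference concrete, here is a toy sketch in plain numpy (not the notebook's code) of how each geometry scores a subject-relation-object triple; the only point is that DistMult's score survives swapping subject and object, while ComplEx and RotatE notice the swap.

import numpy as np

rng = np.random.default_rng(42)
k = 3  # matching the three dimensions used here

def distmult(s, r, o):
    # real-valued; swapping s and o gives the same score, so direction is lost
    return np.sum(s * r * o)

def complex_score(s, r, o):
    # complex-valued; conjugating the object breaks the symmetry, so direction matters
    return np.real(np.sum(s * r * np.conj(o)))

def rotate_score(s, r, o):
    # the relation is a rotation (unit-modulus complex numbers); the score is the
    # negative distance between the rotated subject and the object
    return -np.linalg.norm(s * r - o)

s_real, r_real, o_real = rng.normal(size=(3, k))
print(distmult(s_real, r_real, o_real), distmult(o_real, r_real, s_real))  # identical

s_c, r_c, o_c = (rng.normal(size=k) + 1j * rng.normal(size=k) for _ in range(3))
r_rot = np.exp(1j * rng.uniform(0, 2 * np.pi, size=k))
print(complex_score(s_c, r_c, o_c), complex_score(o_c, r_c, s_c))  # different
print(rotate_score(s_c, r_rot, o_c), rotate_score(o_c, r_rot, s_c))  # different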
Now, I’ve expressed the story using those techniques giving me three models. Let’s look at the actors and elements of our story along the first dimension for each model.
This geometry, 'ComplEx', has placed the Village at one extreme and Mother / Red Riding Hood at the other (my dots and labels don't line up well; don't sweat it). In the first dimension of a ComplEx model, entities that share many of the same 'active' and 'passive' relations often clump together. Since the Wolf, Grandma, and Woodcutter all converge at the Forest House in the triples, the model can't quite tell them apart.
RotatE views relations as rotations (steps). Its Dimension 0 looks like a linear progression of the plot. The Start (Far Right): We begin at the Village. The Traveler: Red Riding Hood is the next point over, moving away from the Village toward the centre. The Destination: The Forest House, Grandma, and the Wolf are clustered in the middle. This is where the “climax” of the story happens. The Resolution (Far Left): The Woodcutter and Mother are at the opposite end.
DistMult is symmetric, so it’s not looking at journeys; it’s looking at who is “like” whom in terms of their relationships. The Outlier: Grandma is all by herself on the far left. She is the only one who is “eaten” and “visited” and “lives_in” a specific place without hunting or giving. She is a unique “node of vulnerability.” The Protectors (Far Right): Mother and Woodcutter are together. They are the “Adults” who provide or save. The Story Core (Middle left-ish): Red Riding Hood, the Village, and the Forest House are in the middle-ish. They are the “Connectors” that hold the triples together. The Gap: Notice how the Wolf is isolated in the middle-right-ish. He is near the Forest, but far from Grandma. In this model, the “Empty Space” around the Wolf is his alienation. He doesn’t belong to the Village group, and he doesn’t belong to the Protector group. He is a predator in a void.
NOW. Consider the kind of story/groove/world that each of these models implies, above and beyond ‘mere’ narrative. Can you see the consequences? Can you describe the nature of the groove?
~
Embeddings underpin the large language models that have been shoehorned into everything, polluting everything, changing us, pushing us. My embeddings above are built on knowledge graph triples; the big models use the raw text of the world itself, and perform all kinds of transformations as they go. But in the same way there’s a little four-legged hairy mouse-like creature in the ancestry of every mammal, my simple models capture the main bits. Don’t get too hung up on it. The key thought: what are large language models but Pratchett’s parasitical life-forms, ‘warping lives in the service only of the story itself’? What stories do the transformations themselves prioritize, give parasitical life to? What is their nature? What do they prey on? What do they need in order to express themselves?
In my classes, and in my Practical Necromancy book, I describe one possible way of trying to map this impact; there I called it 'mapping the behaviour space', drawing an explicit connection with how archaeologists validate and understand the outputs of agent-based simulations. With the texts that LLMs generate, one could sweep the hyperparameters for a particular prompt a thousand times, and then topic model the results; the resulting visualization would draw your eyes to the grooves/gravity wells/attractors around that particular prompt (in the book, following the 'necromancy' metaphor, I called them the ghosts in the data).
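If you wanted to try the sweep yourself, the shape of it is something like the sketch below; everything here is an assumption for illustration (a local model served through Ollama's OpenAI-compatible endpoint, a placeholder prompt and model name, a small grid instead of a thousand runs), with a plain LDA topic model standing in for whatever you prefer.

import itertools
from openai import OpenAI
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # assumed local setup
prompt = "Tell the story of a third son setting out on a quest."  # placeholder prompt

outputs = []
for temp, top_p in itertools.product([0.2, 0.7, 1.2], [0.5, 0.9]):
    for _ in range(5):  # the real sweep would be hundreds or thousands of generations
        resp = client.chat.completions.create(
            model="llama3", temperature=temp, top_p=top_p,
            messages=[{"role": "user", "content": prompt}])
        outputs.append(resp.choices[0].message.content)

# topic model the generations; the dense, recurring topics are the grooves/attractors
dtm = CountVectorizer(stop_words="english", max_features=2000).fit_transform(outputs)
lda = LatentDirichletAllocation(n_components=8, random_state=0).fit(dtm)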
But, instead of looking at consistencies over output (the attractors, the grooves, the ghosts), I want to suggest the idea that the space between the grooves could be where we break free, where we change the story.
Change the story, change the world.
Digital humanities is about deformation. So I started developing a sonification, to make all this (even) strange(r). In 'Wikiphonic', https://shawngraham.github.io/wikiphonic/ , I developed a little webtoy that pulls, from the Wikipedia API, articles that are geolocated close to the user's position (assuming that personal familiarity with the subject matter will help with the intelligibility of the result). The user selects a starting article. The short snippet for that article is pushed through an embedding model on the user's device, expressing the text in 384 dimensions. I slice those dimensions up and map various sound/music transformations onto them. The user selects a second article, and it too is transformed. Then the app subtracts the second vector from the first, and turns the result into music.
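A rough Python approximation of what the webtoy does (in the browser it's a 384-dimension model running via transformers.js; the snippets and the pitch/duration mapping below are placeholders for illustration):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a 384-dimension embedding model
a = model.encode("Snippet of the first article the user picked.")
b = model.encode("Snippet of the second article.")

diff = b - a  # the space between the two articles

# slice the difference vector and map each slice onto a musical parameter
pitches = 48 + np.interp(diff[:32], (diff.min(), diff.max()), (0, 36)).astype(int)  # MIDI notes, C3 up
durations = np.interp(np.abs(diff[32:64]), (0, np.abs(diff).max()), (0.1, 1.0))     # seconds
print(list(zip(pitches[:8], durations[:8].round(2))))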
It’s not pleasant music; that’s not the point. The point is that you now have generated something new from the latent space between the two articles, what an article if it existed there in that place would sound like. That location is where the groove-in-the-hillside is weakest, what Narrative Causality has avoided. In the same way my little visualizations of the different embeddings for Little Red Embedded Hood capture the models’ different visions of the world (and so, the space between points would suggest different things in the different embeddings), Wikiphonic is capturing the space between the parasitical life forms, making it present for you to explore.
If we understood the space between (or even if we could merely perceive it), we'd know what we're dealing with, I think. Understanding the space between could give us a wedge into understanding the deep grooves being cut for us. Literally, I am trying to get us out of our ruts. You might not like my approach to the problem, and yeah, for a first attempt, this is still quite thin. My metaphors could use some work. I feel no shame in mixing them. And the way generation works versus vector analogies and so on… yeah, still lots of things to think about. That's fine; it's ok to be wrong. But we need to develop our own (dh) ways of exploring these parasitical life forms, Pratchett's Narrative Causality made flesh in the world.
On January 19th, the National Trust for Canada (a heritage charity) raised the alarm about the imminent closure of the Canadian Register of Historic Places website and database. The register is at https://www.historicplaces.ca/. The National Trust’s post contains the details about which ministers at the Federal and Provincial levels to contact to protest this act of vandalism. It’s one thing to sunset a website that is creaky and becoming too awkward to maintain (especially I’ll bet in the face of AI scraper bots). It’s quite another to do so without any word or seeming plan on what will happen to the data. A lot of the data exists as paper records in offices scattered around Ottawa, Hull, and Gatineau, but there’s something to be said for having all of that information online!
Each entry in the enormous json file (I should probably split the json up and load it in chunks) has the following kinds of values and keys:
{
"id": 13983,
"name": "Steeves House",
"other_names": "Steeves HouseHon. William Henry Steeves House MuseumMusée de la maison de l'hon. William Henry SteevesW.H. Steeves House MuseumMusée de la maison W.H. SteevesSteeves House MuseumMusée de la maison Steeves",
"location": "40 Mill Street, Hillsborough, New Brunswick",
"address": "40 Mill Street, Hillsborough, New Brunswick, E4H, Canada",
"province_territory": "New Brunswick",
"latitude": 45.92525,
"longitude": -64.64383,
"jurisdiction": "New Brunswick",
"recognition_authority": "Province of New Brunswick",
"recognition_statute": "Historic Sites Protection Act, s. 2(2)",
"description_of_place": "The Steeves House is a two-storey wood-frame Neo-Classical-inspired house built between 1812 and 1840, with several additions and modifications since its original construction. The residence is located on a 3,481 square metre lot on Mill Street in the Village of Hillsborough near the Petitcodiac River. It currently serves as a museum relating to the Steeves family and to the history of the region.",
"heritage_value": "The Steeves House is designated a Provincial Historic Site for its association with Hon. William Henry Steeves. The house is the birthplace of the Hon. William Henry Steeves, one of New Brunswick’s Fathers of Confederation. He was a judge in the Lower Court of Hopewell, New Brunswick, as well as the first postmaster of Hillsborough and the first Minister of Public Works in New Brunswick. He and several of his siblings operated a mercantile and international lumber export business with headquarters and several stores in Saint John, New Brunswick and offices in Liverpool, England. Steeves was an appointed New Brunswick delegate to the 1864 Pre-Confederation Conferences in Charlottetown and Quebec City. He assisted in the creation of the “Seventy-Two Resolutions” at the conference in Quebec that formed the framework for the Canadian Constitution. The Hon. William Henry Steeves is recognized by the Federal Government as a Person of National Historic Interest for his significant contributions to Canada. The Steeves House is also recognized for its association with the Albert Manufacturing Company, later taken over by the Canadian Gypsum Company. For about 100 years, this company was the principal employer in the village and environs. In 1871, the house became the residence of the plant manager of the gypsum mill. There were at least seventeen mill managers who eventually resided in this spacious home. \n\nSource: New Brunswick Department of Wellness, Culture and Sport, Heritage Branch, Site File: “Steeves House” #132.",
"character_defining_elements": "The character-defining elements relating to the placement and grounds of the Steeves House include:- location offering sightlines to the Petitcodiac River and the site of the former Canadian Gypsum Company.\n\nThe character-defining elements relating to the architecture of the house include:- original one-room cottage discernable from nearly 200 years of extensions and alterations relating to the progression of occupants of the home;- large attached barn;- central wooden door with sidelights- evenly-spaced 6-over-6 windows;- bay window on the east façade with six 4-over-4 narrow windows affording a view of the Petitcodiac River;- wide corner boards with capitals and original clapboard siding;- window and entrance entablatures;- two chimneys, one of which is placed at right angles to the ridge board;- raked chimney visible in the attic;- masonry constructs in the basement, including two large water cisterns used to heat the home and thick tapered stone walls.\n\nThe character-defining elements relating to the interior of the residence include:- four fireplaces;- curving central staircase with mahogany railing;- original crown moulding in the dining room and bright blue tiles around the fireplace, said to be from the 18th century;- narrow servant back staircase featuring thick glass inlays in the stair treads;- several unusual storage areas with concealed shelves.",
"construction_date": "1812/01/01 to 1840/01/01",
"significant_dates": "1871/01/01 to 1871/01/01",
"architect_designer": null,
"builder": null,
"function_category": "Leisure",
"function_type": "Museum",
"fpt_identifier": "1850",
"location_of_documentation": "New Brunswick Department of Wellness, Culture and Sport, Heritage Branch, Site File: “Steeves House” #132.",
You can see there’s a lot one might do with that kind of rich data! So kudos to Parks Canada for putting up such well-structured data in the first place. One respondent to my initial posts on scholar.social suggested trying to integrate with wikidata; that’s totally doable though I’d have to go learn how.
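If you've grabbed the data and want to poke at it, something like the sketch below gets you started; it assumes the file is one big json array saved locally, the filename is a placeholder, and the column names are the keys shown above.

import json
import pandas as pd

with open("historicplaces.json", encoding="utf-8") as f:  # placeholder filename
    records = json.load(f)

df = pd.DataFrame(records)
print(len(df), "places")
print(df["province_territory"].value_counts().head())
print(df[["name", "latitude", "longitude", "function_type"]].head())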
What would you do with rich data on over 13 000 historic sites?
When you go to one of the big museums, you expect to see cool displays, cool interactives, gee-whiz digital wizardry. Smaller museums, not so much. I wanted to figure out what could be done for a smaller outfit. So I started playing.
I want to be able to display a 3d digital object so that everyone clustered around the display sees the correct view – if you’re at the back, you see the back of the object while at the same time the person at the front sees the front. One lucky viewer could drive the thing using a mouse to spin it around, zoom in, zoom out, and everyone else would see the correct view. And why stop at a mouse? Why not use hand gestures to control the display? An Ultraleap motion controller isn’t cheap, but it isn’t all that expensive either, about $260. So I thought it’d be good to figure out how to do that too.
The vision: simple software & reused hardware combined in an attractive mount, allowing visitors to interact with your 3d content without fancy goggles, or expensive components.
The positioning of that ultraleap motion controller, on second thought, would probably be better along the bottom front edge, so you don’t look like a maniac when you manipulate the digital 3d object. But I digress.
What you need
3d model file of an object or artefact (lots of different ways to achieve this. I’ll assume you’ve already got some. Otherwise go to somewhere like sketchfab.com and see if anything strikes your fancy from the free downloadable ones).
a spare second monitor connected to your computer, or an ipad.
ultra leap controller if you’ve got one, but not mission critical.
clear acrylic sheet (11″ x 14″, $12, from Michael’s; other sizes are available. Go big!)
I am not handy, and I ended up cutting my hand while using a draw-knife to cut the acrylic sheet. I laid the template down and scaled it appropriately to make the most of my sheet (I ended up with a pyramid 15 cm x 15 cm across the base), traced the edges of the different faces of the pyramid onto the acrylic, and away I went. I used tape to hold it all together. Were I more crafty I’d use some kind of clear glue. But for an experiment, this is fine.
To point it at your own model, you just change the filenames; the page is expecting an .obj file, a .mtl file, and a texture file in a subfolder. Those three files together specify the geometry and the look of the finished model. The lines in index.html that you’re looking for are lines 77 to 86 (the .mtl file itself has a line that points to the texture, so you shouldn’t need to worry about that):
const mtlLoader = new MTLLoader();
mtlLoader.setPath('dachshund/'); // change this to the name of the folder with your model
mtlLoader.load('Dachshund-bl.mtl', (materials) => { // change the name of the .mtl file
materials.preload();
const objLoader = new OBJLoader();
objLoader.setMaterials(materials);
objLoader.setPath('dachshund/'); // change this to your folder again
objLoader.load('Dachshund-bl.obj', (obj) => { //and change this to the name of the .obj file
Now, assuming you have python on your computer, you can start a webserver in the folder that has your index.html file with python -m http.server 8000 and go to localhost:8000 and you’ll see the model correctly positioned. Spin the model, and the different views spin correctly to keep everything aligned.
If you don’t have python, or that sentence made no sense to you at all, zip your folder up (the one that has index.html and the subfolder with your model in it) and go to https://app.netlify.com/drop . Drag the zip file onto the circle in the middle of the screen, and the Netlify service will set you up with a web address where you can see your model arranged rather like this:
With a second spare monitor connected to your computer, drag your browser window over to that second monitor. Lay that monitor down flat. Place your pyramid broad side down. Make sure its brightness is turned up, then turn off the lights. Peer into the pyramid at eye level. Voilà! Your model floats in the air – and if you have friends gathered around, they will all see the correct aspect of the model from their vantage point. Use your mouse to spin or zoom the model.
The screenshot below is blurry because a) my hands shake and b) I made the acrylic dirty as I cobbled the thing together.
I intend to do this with a raspberry pi, and a spare monitor, where I 3d print a case for the monitor and the computer. I also intend to use gesture control so a visitor can move the model around with their hands, adding a sense-of-touch-at-one-remove, as it were. The script for gesture control is at https://github.com/shawngraham/mouseLeap . It’s still a little finicky, but if you only have the one browser window open when you run that script, it should work fine. Hey, it’s early days.
So there you have it. A single-page website that loads your model and handles displaying it and interacting with it so that, with a pyramid placed on the screen, the reflected images hang in space and can be viewed from four different directions correctly. The bigger the pyramid, the better.
My kid and I were listening to a podcast about Jack the Ripper. We started talking, and I mentioned how newspapers would reprint stories from each other. This led to us developing an interesting question: did our local newspaper print anything about the Whitechapel murders, and if so, what would that have meant to the people of this community? I filed the thoughts away for future reference, but then later saw some posts about Anastasia Salter’s session at the MLA on agentic coding for the humanities (January 2026). I looked up her course materials, and thought I would follow along with them, using their guidance for prompting Claude Code with our initial question.
Claude was great at making a nice one-page html visualization from the eventual analysis of the OCR. Anastasia’s directions were clear and we had fun. But the tricky bit – ’twas ever thus – was the bloody OCR. Our version used pytesseract. Vision models from Google etc. do extremely good OCR, if you’ve got an api key and are paying for it. I wanted to keep things on my local machine though. So I futzed with pytesseract, paddleocr, and surya. Pytesseract is fast but all over the map; Surya is pretty good but can get flummoxed; paddleocr just freezes my damn machine up. But the hardest bit was just getting the newspapers chopped up into small enough bits that I could process them.
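For reference, the pytesseract call itself is tiny; the hard part is feeding it clean, small images (the filename here is a placeholder for one of the chopped-up pieces).

from PIL import Image
import pytesseract

# run Tesseract over one image piece and print the start of what it found
text = pytesseract.image_to_string(Image.open("piece.png"), lang="eng")
print(text[:500])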
The Shawville Equity was scanned some time ago by the provincial archives, the BANQ; here’s the very first issue from 1883. There is a text layer in the pdf from the BANQ, but it looks to have been done through an automatic process without human intervention, so images are sometimes askew and the underlying text is often very very poor indeed (look up the work of Ian Milligan on the consequences for research of bad newspaper OCR).
The process that ended up working best involved counting on the Equity to maintain its 5-column layout and horizontal spacers. The Equity, as befits a publication with over 140 years of history, has gone through some layout changes over the years.
That slight skew drives me nuts.
Anyway, we preprocessed the paper by trying to identify those vertical lines and horizontal spacers and then chopping the image up accordingly. Because the OCR engines work better, memory-wise, on smaller images, we also chopped up anything longer than 2000 px. The coordinates are all mapped out in the output json so that everything can be stitched back together again.
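The preprocessing looked roughly like the sketch below, using OpenCV; this is a simplified reconstruction (the real script also handles the horizontal spacers and writes all the coordinates to json), and the thresholds are assumptions you'd tune against your own scans.

import cv2
import numpy as np

page = cv2.imread("equity-page.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder filename
binary = cv2.adaptiveThreshold(page, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, 15, 10)

# emphasize long vertical runs of ink: these are the column rules
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, page.shape[0] // 20))
vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

# column boundaries show up as peaks in the column-wise sum of the vertical-line image
profile = vertical.sum(axis=0)
cuts = [x for x in range(1, page.shape[1] - 1)
        if profile[x] > 5 * profile.mean()
        and profile[x] >= profile[x - 1] and profile[x] >= profile[x + 1]]

# slice into columns, then chop anything taller than 2000 px, keeping the offsets
pieces = []
edges = [0] + cuts + [page.shape[1]]
for left, right in zip(edges, edges[1:]):
    column = page[:, left:right]
    for top in range(0, column.shape[0], 2000):
        pieces.append({"x": left, "y": top, "image": column[top:top + 2000, :]})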
Oh, and yeah: The Shawville Equity did publish stories about the Ripper. And at the same time, they printed a bunch of stuff about the Burke and Hare murders too, for good measure. They published a few stories shortly after the murders in Whitechapel started, and then returned to the story a few years later. So still an interesting question to explore…
I need to workshop my titles more. But anyway: this post reflects on teaching the history of the internet to a class of 160 first year students in a world where generative AI shabbiness is pushed on them and a perfectly rational way to deal with the myriad pressures and bad choices of being a student is to go ahead and use it. What’s a prof to do?
The first thing I tried to do was use the metaphor of going to the gym: you go to exercise your body and get stronger. If there were a machine that lifted weights for you, could you go, turn it on, point to it and say, ‘look, weights have been lifted! I have therefore exercised!’? No, you could not. But – the same error is made in university classrooms all the time. Look! An essay has been written! Give me my grade! And I don’t need to spill any more photons, bits, or ink over the instrumentalization of higher education that has led us here. Instead, here’s how I tried to deal with it this term. And no, I didn’t put any trojan horse prompts into my assignments.
Instead, I chose to focus on reading and notemaking.
By hand.
I asked all students to get a little paper notebook. I showed them the readings; I showed them hypothes.is; we talked about how to read and what to pay attention to (“don’t read it through like a novel! Read like a predator! Go to where the game is!” etc etc). Then in class, I asked them to do two things for a given reading: write a rhetorical précis (using a model developed by historian Chad Black) and a research memo-to-self that pulls together one’s observations and annotations. They had to do this cold. In my lecture hall. No computer. No phone. No notes. (Students with accommodations: I made accommodations.)
I also told them: we’re doing this multiple times throughout the semester. You’re going to have off days. I’ll take your best 3 of 4 examples for grading. And we graded at first for the format, for the shape of what we were after, and then started pushing them towards deeper engagement with the content. They were always encouraged to filter these ideas through my lectures too. A pretty good example (though not perfect) of what we were after is this composite of a couple of students’ responses, after reading a longer blog post by Doctorow on Enshittification:
PRECIS MAJOR CLAIM: Doctorow argues in his McLuhan lecture on enshittification (2024) that platforms degrade through a three-stage process of user exploitation, business exploitation, and shareholder extraction leading to a world of digital decay known as the enshittocene. HOW: Doctorow develops this argument through a detailed case study of Facebook, tracing the three stages of enshittification (from user surplus to business surplus to shareholder surplus) while systematically dismantling the historical constraints that once prevented such decay, and showing how the erosion of competition, regulation, self-help, and labor power enabled the collapse of digital trust. PURPOSE: The author’s apparent purpose is to diagnose the systemic decay of digital platforms and show how it spreads across industries in order to empower users, workers, and policymakers to reverse the trend and build a more equitable, open digital world.
MEMO INITIAL OBSERVATION: WHAT IF there’s a connection with the Bory piece; what if tech ceos believe themselves to be the hero of the journey? This’d create a cultural narrative in which enshittification is not a failure, but a necessary stage of progress. THEN this mythos might normalize the extraction of surplus from users, workers, and business partners, treating exploitation as a form of “service” or “evolution”? #to-investigate #possible-thesis KEY: The reading matters because it reframes enshittification not as a technical process, but as a cultural one. #cultural-processes MY CONTRIBUTION: Doctorow’s framework shows how platforms collapse through a three-stage exploitation process: user → business → shareholder. There’s a connection here with Bory’s critique, which reveals that this process is culturally enabled by a narrative in which the founder is the hero, and the platform is the vehicle of a moral mission. When founders say, “I created this to serve humanity,” they are not just describing a product; they are enacting a myth. And when that myth is accepted, enshittification becomes not just a crisis, but a natural consequence of leadership.
These for the most part got better as the term went on. However, it took us longer to grade them than I would’ve liked. I transformed the final exercise from another round of precis/memo combos (we’d do 2 per session) to one last class workshop on ‘how to write with these things’ (where grading was pass/fail did-you-do-the-thing?-full-points).
The idea is, a student would look at these precis/memos and think to themselves, ‘what’s the story here? How do these observations speak to one another?’ How you look at things – ie, historical theory – guides your attention to some ideas rather than others. It being the last week of term, I wanted to do something fun first to get them in the mood, so today we did a kind of team debate-cum-tournament style sort of thing, where suggestions for the most important people/ideas/technologies of the history of the internet were gathered. These were arranged into a bracket. For each round in the bracket, I suggested a different lens through which the disputants were to make their argument for the greater importance of their person/idea/technology. Winners were chosen through applause from the class (y’know, I forget the winner? But I think it was between the ENIAC women and Vannevar Bush). And do you know, students were drawing some pretty nifty arguments from their precis/memos to do this, bouncing ideas off one another. It was neat to see! And difficult: the power went off during class and we did this via the blackboard and cellphone flashlights (internal lecture theatre without windows).
On Wednesday this week, the idea is the students will have their precis/memo combos ready to hand. I’ll say, ‘let’s assume we’re looking at the history of the internet through a social history lens. What have you got that speaks to that or could be informed by that?’ The idea is, they’ll make a list (with page & paragraph numbers, since they’ll have numbered the pages in their booklets) of these interesting observations. We’ll do some think-pair-share: show your neighbour what you’ve got. Then, I’ll have them create an outline with each element they have, beginning with: where’s the question here? They’ll reorder their useful observations such that there appears to be an emergent story or argument. At that point, I’ll ask them to think about ‘what is missing? What pieces of connective tissue do you have to write?’ … and they’ll then make quick notes about what they’d need to look into or write to make the tissue of observations whole.
This will be what they need to do for the final exam, so I’ll give them the exam question on Friday (in the exam room: no aide-memoire. They’ll have had to work through their materials before going in). I’m feeling pretty good about this.
And that’s how I’ve moved through reading -> note making -> thinking -> writing in an age of generative AI.
Yes, this was a lot of work. And I find language models interesting to explore. But that doesn’t mean I think they have any business in a first year class.
…in which I try to retrain/fine-tune a spaCy model on Latin inscriptions
There is a lot of Roman epigraphic data online; the EDH is a great source for this. But none of the databases (at least, the ones that I have looked at) seem to provide a version with structured demographic or onomastic or whatever data derived from the inscriptions. Presumably that data is out there – epidoc formatted xml would have what I’m after, I should think – but I thought, what the hell: how hard could it be to train a spaCy model to read Roman inscriptions, which are after all famously formulaic? They don’t call it ‘the epigraphic habit‘ for nothing, right? If the average Roman could read them and understand – with their relatively low level of functional literacy – then a machine should be able to do this? …Right? Reader, it was harder than I thought.
Be Warned: My epigraphic experience is limited to the scintillating world of stamped Roman bricks. And it’s been over twenty years since I really futzed in any meaningful way with Latin. Caveat lector.
The idea is therefore:
download real data
annotate the data with the start and end positions of the different kinds of structured data that I am after
enhance this data with synthetic examples so that I get enough coverage of the different kinds of elements (the origin of the deceased in a funerary inscription is not as common as listing their cognomen, right? So training on exclusively real data would overfit on some things and miss others, right? That was my logic).
harmonize the synthetic data with the real data so that any annotation label glitches in step 2 get sorted out
fix alignments so that annotations do not overlap
train.
(as an aside, how the f*n hell do you get the Gutenberg editor to give you a numbered list? THIS kind of shit is why I don’t blog very much any more: it’s such a bloody pain in the ass!)
Yes, I had help from Claude Haiku 4.5 and Gemini 3 Pro Preview for the fiddly bits. I downloaded data in two tranches. The first batch I tried downloading via the API and so got the inscriptions but didn’t realize I was leaving a lot of useful metadata behind – the second tranche I got from the EDH data dump website itself, where some of the metadata was provided by virtue of the column headings. I dropped the first tranche through a local LLM (Qwen 3) with instructions on returning jsonl data with annotations… that was an enormous pain in the arse and ultimately largely a waste of time. But I did get around 450 lines of stuff that was annotated sufficiently I could use it. The second tranche was easier- I downloaded the dump, filtered columns using Excel so that I got around 750 rows where an inscription had metadata for each column of interest. That was a reduction of tens of thousands of rows of data to just under a thousand (!). I converted each row to jsonl.
This next bit was where I had the most help from the big-ass LLMs. I devised the logic for an inscription generator that would use the Romans’ own epigraphic habits as rules for generation. It is probabilistic and is pretty good, for the most part, at creating legible inscriptions (I am reminded of John Clarke’s 19th century Eureka machine for generating Latin hexameter verse). Then, looking at what kinds of things my real data contained, I tweaked the generator so that it would produce examples to fill the gaps. I ended up with a ratio of about 2 synthetic examples for every 1 real example.
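The generator is essentially weighted choices plus bookkeeping of character offsets, so that each synthetic inscription comes with its entity spans ready for training. A stripped-down illustration (tiny word pools, only a few labels; the real thing draws on much larger, frequency-weighted lists and many more formulae):

import json
import random

PRAENOMINA = ["L", "M", "C", "T"]
NOMINA = ["VALERIUS", "IULIUS", "CORNELIUS"]
COGNOMINA = ["SEVERUS", "FELIX", "MAXIMUS"]

def make_inscription():
    # assemble a funerary-style text piece by piece, recording each entity's span
    text, ents = "", []
    def add(token, label=None):
        nonlocal text
        if text:
            text += " "
        start = len(text)
        text += token
        if label:
            ents.append([start, start + len(token), label])
    add("D M", "FORMULA")                        # Dis Manibus
    add(random.choice(PRAENOMINA), "PRAENOMEN")
    add(random.choice(NOMINA), "NOMEN")
    add(random.choice(COGNOMINA), "COGNOMEN")
    if random.random() < 0.6:                    # age statements are common but not universal
        add(f"VIX ANN {random.randint(1, 80)}", "AGE")
    return {"text": text, "entities": ents}

with open("synthetic.jsonl", "w") as f:
    for _ in range(10):
        f.write(json.dumps(make_inscription()) + "\n")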
After that, it was just a matter of training.
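For the record, ‘training’ here means converting the jsonl annotations into spaCy’s DocBin format and then using the standard spacy train CLI; a minimal sketch (the file paths are placeholders, and I use the generic ‘xx’ blank pipeline rather than assuming a Latin one is installed):

import json
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("xx")  # multi-language blank pipeline
db = DocBin()

with open("train.jsonl") as f:
    for line in f:
        rec = json.loads(line)
        doc = nlp.make_doc(rec["text"])
        spans = []
        for start, end, label in rec["entities"]:
            span = doc.char_span(start, end, label=label, alignment_mode="contract")
            if span is not None:  # skip annotations that don't align to token boundaries
                spans.append(span)
        doc.ents = spacy.util.filter_spans(spans)  # drop any overlaps
        db.add(doc)

db.to_disk("train.spacy")
# then, on the command line:
#   python -m spacy init config config.cfg --lang xx --pipeline ner
#   python -m spacy train config.cfg --paths.train train.spacy --paths.dev dev.spacy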
LABEL          PREC   REC    F1
AGE            0.96   0.96   0.96
COGNOMEN       0.88   0.78   0.83
FORMULA        0.94   0.68   0.79
MILITARY_UNIT  0.88   0.73   0.80
NOMEN          0.75   0.76   0.75
OCCUPATION     0.84   0.63   0.72
ORIGO          0.73   0.46   0.57
PRAENOMEN      0.78   0.84   0.80
RELATIONSHIP   0.97   0.79   0.87
TRIBE          0.75   0.73   0.74
If you look at the ‘train your model’ Jupyter notebook (in the repo), you’ll see where I ran the model against the testing split (dev.jsonl); these were the metrics:
Total Predictions: 4426
Total Gold Labels: 5034
Correct (Exact Match): 3847
Precision: 0.869
Recall: 0.764
F1 Score: 0.813
Now – more real, well-annotated examples, complemented by synthetic ones to fill the gaps, might lead to higher scores, but the real proof is in the pudding, not in these test-case scenarios. It might be that I’ve got a model here that’s really good at… my bespoke admixture of real/synthetic. It might (probably will?) fall down when thrown against your data. But that’s what makes this fun. So… give it a whirl on your own epigraphic data, see what percolates out? Feel free to modify, make better.
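If you do give it a whirl, using the trained model is the usual spaCy pattern; the model path and the sample text here are just placeholders:

import spacy

nlp = spacy.load("output/model-best")  # wherever spacy train left your best model
doc = nlp("D M L VALERIO SEVERO VIX ANN XXV")
for ent in doc.ents:
    print(ent.text, ent.label_)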
One day, I decided that my story-loving kid and I would play an rpg together. We settled on Ken Lowery’s ‘Lighthouse at the End of the World‘. This is a solo journaling RPG: you read the backstory, you throw the die, you pull some cards and consult the game for what those cards will prompt you to think about. And then you write. My kid and I enjoyed a quiet morning, pulling the cards and writing two vastly different stories that somehow seemed like a shattered crystal reflecting some deeper reality. We’d pull a card, write for a bit, then read each other what we’d written. We’re nerdy like that.
But! What a powerful way to think about writing, constraints, and world-building! Colleen Morgan has an excellent piece on world-building for archaeologists. We’re not storytellers; we build worlds, and if we build worlds well, others can tell their stories, other stories can be told. I went exploring, and discovered there is a myriad of systems out there to build these solo journaling rpgs. I’m attracted to the ‘solo’ part, because I teach, and I want students doing things on their own. At least at first. I’m particularly attracted to the ‘Wretched & Alone’ system for crafting these experiences, by Mat Sanders in consultation with Chris Bissette (who wrote the first ‘Wretched’ game; the system is explained here). The main thing about this system is that the games are “about struggling in the face of insurmountable odds to survive, or to achieve something important.” I’m also coming to these games with a background in agent-based simulation, and some faffing about with board games and video games. I like the idea of changing the rules to better capture some truth about the past that you’re trying to communicate.
I set out to write my own Wretched-style game. My first attempt was based on the Franklin Expedition, and how the demise of those men is captured in Inuit oral tradition. I wrote my card prompts, I thought about how the game might end, I imagined what their colonial mindset might do when confronted with the need to survive and the obvious ability of the Inuit to live in the North, and I used Canva to produce a kind of ‘zine of my rules (I recognize now that there’s a flub with my ‘salvation’ mechanism which I need to get around to fixing some day). I so much enjoyed setting up the pieces this way that I spent quite a lot of time during my sabbatical writing a few more such games – one set at the interface of antiquarianism and the nascent professionalization of archaeology in late Victorian/Edwardian England, another featuring the Teutoburg Forest disaster, and one about being a new grad student in a DH program.
One feature of ‘Wretched’ style games is the use of a block tower and the pulling of blocks to ramp up the tension and dread. It’s not the sort of thing one always has handy. I started mucking about building a writing pad in an html page where that kind of tower simulation could be accessed by mouse-click. Turns out, there’s a lot of discussion online about how to do this with dice rolls so I figured out how to implement that. At around this time I was also doing a lot of ‘homecooked meals‘, building little things using some coding help, for my students and with my students. I sketched out what I was after, got the html and css skeleton working, and eventually surfaced with an online gamepad for solo journaling rpgs.
There I left things for a while, until this week, when I grew unhappy with having to flip between the pdf with card prompts and the gamepad. Why not merge them, and have a unique html page for each game? After much faffing, I emerged with The Cave and Whoever Finds This Paper. You can see that the second one is very much the first one, but reskinned; both have arrays of json data in them to keep the cards straight and the actions and consequences and so on. So why not make that modular?
This is where I turned to Claude Code’s new sandbox feature. I gave it the original pdfs, my merged one-page gamepad/prompt html, and instructed it to turn the html into a template and move the game and prompts and styling to separate yaml, the idea being to have a static site generator just for Wretched & Alone style games. It took a while, with Claude suggesting/making changes, me tweaking, rolling things slightly differently, but… I now have something that I can use with my students. They can focus on the writing, the thinking through of how the prompts leave some things unspoken while directing attention elsewhere (what, in Terry Pratchett’s Discworld, might be called ‘storytelling with hole’), and use the generator to create the html. Students more familiar with css and js can extend things just by mucking about with the theme.yaml.
Grab a copy of the repo and you’re ready to go. There’s an example game included (not a great game, but just one to show you the ropes) that is derived from The Cave. Run the generator with:
python cli/wretched.py build example-game
and it will stitch together the necessary files in the example-game folder and output a single html page that you can then put online, or run locally. There’s even a mobile theme! Copy the theme you want from the ‘themes’ folder into your example-game folder, and rename it ‘theme.yaml’. If you don’t like typing python cli/wretched.py every time, there is also a bash script, wretched, that you can use (once you give permission for it to run; on a mac that’d be something like chmod +x wretched, and then you can run ./wretched build example-game).
I don’t think I’ll be doing much more development with this for a while; it’s been a lot of fun, but there are so many other fires I ought to be putting out right now. So take a copy of it, expand it, enhance it, play with it, use it, write great stories with hole.
I wrote a short, incomplete, opinionated introduction to generative AI for history and archaeology students. The work is what I wish my students had before they came to my class last September, and is built from my teaching notes and remarks that I gave at a few public venues last year. Here’s what the publisher has to say about it:
“The buzz surrounding AI these days is nearly deafening and hardly a week goes by without some breathless utterance about the future of AI. One day AI is eliminating the need for teachers, the next it is streamlining our entire consumer economy, revolutionizing warfare, and turning us all into mindless drones.
Shawn Graham’s new book will not help us predict the future, but it will cut through some of the hype and show how AI technology can help us understand the past in new ways. Practical Necromancy starts with a thoughtful guide to the history and inner workings of AI. The second part of the book offers some concrete exercises well-suited for students and curious faculty alike. The final section offers some wisdom to administrators who like the rest of us struggle with how to use AI most effectively on their campuses.
The observations and exercises offered in this book continue in tradition of Graham’s 2019 book Failing Gloriously and Other Essays. Failing Gloriously has become one of the most downloaded books ever published by The Digital Press with nearly 5000 downloads since it first appeared. Practical Necromancy continues to advocate for the fearless experimentation central to Failing Gloriously. It is only through a fearlessness approach to AI that we can “break them, push them, prod them, make them give up the ghosts in their data.”
I don’t think I’ve seen anyone do this yet: drop a Harris Matrix through Retrieval Augmented Generation. That is to say, I think I have a web toy here with a nifty feature. The ability to ask questions of our data using our everyday language is the Star Trek dream, isn’t it? ‘Computer, when and how was this site abandoned?’ ‘Computer, identify the logical inconsistencies in this stratigraphy.’ ‘Computer, what is the relationship between phase 4 and the area B group?’ Things like that.
I’m a long way removed from day to day field work or analysis. But I’m teaching a course in the fall (asynchronous, online) that aims to introduce history students (nb, not archaeology: we don’t have a program or department) to some of the digital work involved in making sense of archaeological materials. I wanted to give them the experience of trying to understand stratigraphy from a section drawing, but I also didn’t want to pay for a license for existing Harris Matrix software, or hit them over the head with the full complexity of the exercise. I set about to create the individual pieces I would need, wired up in a single html page (see previous exercises in ‘homecook history‘). I shared the result on Mastodon, and had a few feature requests, and now I think I have a tool/toy that’s actually quite good. You can try it out here: https://shawngraham.github.io/homecooked-history/hm-generator-site/enhanced.html . Click to create a context. Drag and drop to set up stratigraphic relationships. Click to edit and add context metadata. There’s some validation going on under the hood for chronology etc. It still has its kinks, but it will serve my purpose. It also exports the Harris Matrix you build as csv (and a nice svg too if you want).
In my class, I am also addressing various kinds of machine learning things that archaeologists do. In order not to overwhelm everyone, this is mostly related to image processing. But we do consider image similarity through vectors and embeddings. So… why not express the information about each context as an embedding? And then, having done that, let’s do some retrieval-augmented generation. Then we can use an LLM to express our query in the same embedding space, find the contexts that are closest to it, and then constrain the LLM to generate a response using only the information from those contexts.
QUERY: What evidence exists for domestic activities across different phases?
==========================================================
The evidence from Context 1 (ID: C006) and Context 2 (ID: C007) demonstrates that domestic activities, specifically those related to hearth use and food preparation, were present during the Medieval phase. The stratigraphic relationships and dating evidence support the interpretation that these activities were part of the site's use during this period. The transition into the Post-Medieval phase, marked by Context 3 (ID: C008), indicates changes at the site, which may reflect alterations in domestic activities or the site's purpose, but direct evidence for domestic activities during this later phase is not provided within the given archaeological contexts.
SOURCES: C006, C007, C008
RETRIEVED CONTEXTS:
1. Context C006 (Similarity: 0.246)
Type: Feature
Description: Stone-lined hearth with evidence of burning and ash deposits...
Dating: 1250.0 AD to 1350.0 AD
Phase: Medieval
Relationship: built into C005
2. Context C007 (Similarity: 0.243)
Type: Fill
Description: Ash and charcoal fill of hearth C006, rich in pottery and animal bone...
Dating: 1250.0 AD to 1350.0 AD
Phase: Medieval
Relationship: fills C006
3. Context C008 (Similarity: 0.218)
Type: Layer
Description: Post-medieval demolition layer with brick and tile rubble...
Dating: 1600.0 AD to 1700.0 AD
Phase: Post-Medieval
Relationship: overlies C005
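For the curious, the retrieval half of that loop is small. Here’s a rough Python equivalent (the web toy does this in the browser; the model choice and the flattened context strings, built from the fields shown above, are assumptions for illustration):

import numpy as np
from sentence_transformers import SentenceTransformer

contexts = {
    "C006": "Feature. Stone-lined hearth with evidence of burning and ash deposits. Medieval, 1250-1350 AD. Built into C005.",
    "C007": "Fill. Ash and charcoal fill of hearth C006, rich in pottery and animal bone. Medieval, 1250-1350 AD. Fills C006.",
    "C008": "Layer. Post-medieval demolition layer with brick and tile rubble. Post-Medieval, 1600-1700 AD. Overlies C005.",
}

model = SentenceTransformer("all-MiniLM-L6-v2")
ids = list(contexts)
vecs = model.encode([contexts[i] for i in ids], normalize_embeddings=True)

query = "What evidence exists for domestic activities across different phases?"
qvec = model.encode([query], normalize_embeddings=True)[0]
sims = vecs @ qvec  # cosine similarity, since the vectors are normalized
top = sorted(zip(ids, sims), key=lambda pair: -pair[1])[:3]

# the retrieved contexts become the only evidence the LLM is allowed to use
prompt = ("Answer using ONLY these contexts:\n"
          + "\n".join(f"{cid}: {contexts[cid]}" for cid, _ in top)
          + f"\n\nQuestion: {query}")
print(top)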
I’ve made some bug fixes and some enhancements to my personal knowledge management plugin for JupyterLab (and thus, JupyterLab Desktop). The goal is to enable personal note making around Jupyter notebooks, with bi-directional linking, discovery, and all the standard #pkm things we’ve come to expect. While my semantic versioning skills are a bit suspect, there’s now a version available through pypi that works pretty darned well and will be what I develop one of my autumn asynchronous courses around. The extension handles the functionality; I create a folder with markdown and ipynb files for students to use and build their personal notemaking around. I’ll share the course website & the ‘workbench’ files in due course. The extension is only one part; that ‘workbench’ is the other. Together, I think this will make for an excellent asynchronous learning experience.
Latest Changes
(Screenshots: the interface as it looks with the default Jupyter light theme, and the solarized PKM theme.)
markdown preview/source toggle can be hidden or made visible
a bug where the extension could recurse infinitely through files while searching for wikilinks has been fixed (*fingers crossed*); this only happened when a folder (workspace) did not have a start.md file in it
start.md file is made on first run now if it doesn’t already exist, and contains helpful info about the extension (also, a much more expansive ‘pkm guide’ is also written to the folder on first run, detailing all the features)
Contextual menu for opening/closing backlinks panel
Markdown files with embedded content (eg, code or code-output cells from ipynb files) could not formerly be printed properly from the preview (they’d print, but the embed markdown code would show, not the content). Now, there’s a context menu that uses the print -> save as pdf function to save a rendered markdown file and its embeds as pdf (or, I suppose, even print!)
Word export. A contextual menu allows for export of a rendered markdown file, with its embedded content showing properly, to Word. It’s glorified hypertext, but sufficient that you could then tidy things up for eg reports and so on.
(Screenshots: the backlinks panel; the solarized pkm theme, just because I like it, showing the automatically generated Guide note and its table of contents; PKM: Print Markdown Preview correctly displaying embedded content rather than embedded code, which the browser’s print dialogue can save to PDF; and a close-up of the contextual menu via right-click on a markdown note preview, showing the Export to Word menu item.)
You can try this yourself by installing it into your environment where Jupyter lives, via
pip install jupyterlab-pkm
If you’re using JupyterLab Desktop, you can also get it through the Extension Manager; search for ‘pkm’.
And yes, I built this through a judicious use of Claude & Gemini as I built one feature at a time, drawing on the work and examples of Simon Willison and Harper Reed, pushing just a little bit further than my own comfort level and knowledge each time. It seemed to me that pushing the existing markdown functionality of JupyterLab was a safer endeavour than trying to push the code-executing functionality of a note-making app. This emphatically is NOT vibe-coding, and if you dig through this blog, you’ll see that I’ve been experimenting in this space since the days of RNNs, so I’m aware of the issues and what is at stake.
The TRMNL device & system, an e-ink dashboard that you can extend and hook into all sorts of data streams, is all sorts of cool. And they’ve also made a lot of stuff open source. So I thought I would bring my own server and bring my own device and see what I could do. These are my notes-to-self on getting everything up and running.
BYOS – there are a number of options for setting up your own server. I went with this one built on Laravel/PHP. I already had Docker installed on my machine, so after downloading the latest release from the github page, I started Docker and then, in the terminal in the server folder I ran docker compose up and I was away to the races. The server is at http://localhost:4567/dashboard . So far, so good.
BYOD – I have a Kobo Aura lying around. I connected that to my mac, and used Finder to get into its file system, making sure that dot files were visible. Then, I followed the instructions at this repo for using a Kobo with TRMNL to get things installed, turning the device off and rebooting as indicated (important!). Also, you need the device’s MAC address. This can be found under more -> settings -> device information. Write that down somewhere handy. Now, the config file:
{ "TrmnlId": "your TRMNL Mac Address", "TrmnlToken": "your TRMNL API Key", "TrmnlApiUrl": "https://usetrmnl.com/api", "DebugToScreen": 0, "LoopMaxIteration": 0, "ConnectedGracePeriod": 0 }
Before you copy this file to your Kobo, you need to replace "your TRMNL Mac Address" with the actual MAC address, keeping the quotation marks. For the TrmnlToken, leave it as an empty string: "". Then for the TrmnlApiUrl, you need to find the address for your computer on your home network. Both the device and the computer you’re using as a server have to be on the same network. On Windows: open Command Prompt and type ipconfig. Look for the “IPv4 Address” under your main network connection (e.g., Wi-Fi or Ethernet). It will look like 192.168.1.XX or 10.0.0.XX. On macOS/Linux: open a terminal and type ifconfig. Look for the inet address under your main network interface (e.g., en0 or wlan0). Again, look for something starting with 192… Then you’ll slot that into your config, e.g.:
"TrmnlApiUrl": "http://192.168.1.100:4567/api" Save your config.json file, then move it over as per the instructions. Disconnect and then turn your Kobo off and on. At your server webpage, flip the toggle for device auto-join. Then when your Kobo has finally finished starting up, click ‘NickelMenu’ -> TRMNL et voilà.
Digital signage is yours!
Quick Update some moments later
Under the ‘Recipes’ section of the server, you can add/make new things to push to your digital sign. The interface is fairly straightforward; you add a data source that you can GET data from, and then you define a template for the data to go into. I came across this: https://github.com/SnarfulSolutionsGroup/TRMNL-Plugins/blob/main/TRMNL_Comic.md and thought, let’s go with that! So the data is ‘polled’ and comes from https://xkcd.com/info.0.json . Then we just define the template. The problem is that the server I am using (which isn’t the official one, and which is why I should probably pay for a key and use the official server with my device) is a bit more fiddly when it comes to templating. However, the solution is to remember that in this particular case (the BYOS Laravel/PHP server for TRMNL) your template needs to use Blade PHP templating conventions. Thus, for the XKCD recipe, my template looks like this:
{{--
    This is a Blade template. We use Blade comments and PHP variables.
    The entire data payload is available in a PHP array variable called $data.
--}}
<div class="view bg-white">
  <div class="layout flex flex--center-xy">
    {{-- To access properties, we use PHP's array syntax: $data['key'] --}}
    <img src="{{ $data['img'] }}" alt="{{ $data['alt'] }}" style="max-width: 100%; max-height: 100%;" />
  </div>
  <div class="title_bar">
    <span class="title">{{ $data['title'] }}</span>
    <span class="instance">#{{ $data['num'] }}</span>
  </div>
</div>