Home

History, things I've learned, what comes next

2018

I quit Facebook at the same time as a lot of other people in 2018. This was for the most part a relief, but I missed knowing what was going on in town.

I thought to myself: no problem. I built a bunch of the websites of the places I'm interested in, so all I have to do is add an automatic data export to those websites, and I can collect the events somewhere for myself. I'll make it public so it might benefit others as well. I figured that if I start with this, the rest will follow. My motivation was (and still is) driven by a lot of things, but mostly I really just want to know what's up on the coming Saturday.

2019

The first version was ready in 2019 and looked very function-first, using a standard calendar component. The idea was to present this as an infrastructure project and emphasise the data, not the presentation. Unsurprisingly, this aesthetic was popular with the activist-leaning artists and less popular with the marketing departments.
Version 1: using the FullCalendar component to show the events in the familiar way

I was essentially trying to recreate a light version of the semantic web, and I was only vaguely aware of it at the time. The semantic web is an idea from the early 2000s to describe web pages in a way that can be understood by computers.

Behind the scenes, this used the common .ics / iCalendar format. There was a small JSON file that announced the publicly available calendar sources, so if you knew the website, you could also find its calendar source. The idea was that any web page that contains events should also have an .ics representation.
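To make this concrete, here is a sketch of what such an announcement file could look like. The exact field names are made up for illustration; the point is simply that a small, machine-readable file maps a website to its calendar feeds:

    {
      "name": "Example venue",
      "website": "https://venue.example",
      "calendars": ["https://venue.example/events.ics"]
    }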

My reasoning was that if things are desperate, it's worth trying old ideas (like the semantic web) again. And since .ics support is baked in almost everywhere, I thought it was merely a matter of connecting things with some duct tape, and that this would be enough to create a critical mass of open data.

I came up with a kind of sales slogan: "if the semantic web is an airplane, this is a bicycle". My hope was that, precisely because the .ics format is built in everywhere, it would be easy to get it to work.

2020

It kind of, sort of, worked for a while. I think a critical mass (for myself and my imagined audience) would be about 20 places, and I got close at times.

Some places really did believe in this (just like me) and paid their developers to export the data so that I could display it on my website (thank you again, to those of you who did!). Making an automated .ics export is about half a day of work. The problem was (as always with an open data approach) that not all places had the means to do this. Some didn't even have a website to pull the data from.
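As a rough indication of the effort involved, here is a minimal sketch of such an export in Python, assuming the events already live in some internal list or database (the field names and URLs are illustrative):

    # Minimal .ics export: serialise internal event records as iCalendar.
    # A real export would also handle time zones and escape special characters.
    from datetime import datetime, timezone

    events = [
        {"title": "Open studio night",
         "start": datetime(2020, 3, 7, 20, 0),
         "url": "https://venue.example/open-studio"},
    ]

    def to_ics(events):
        now = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
                 "PRODID:-//venue.example//events//EN"]
        for i, ev in enumerate(events):
            lines += [
                "BEGIN:VEVENT",
                f"UID:{i}@venue.example",
                f"DTSTAMP:{now}",
                f"DTSTART:{ev['start'].strftime('%Y%m%dT%H%M%S')}",
                f"SUMMARY:{ev['title']}",
                f"URL:{ev['url']}",
                "END:VEVENT",
            ]
        lines.append("END:VCALENDAR")
        return "\r\n".join(lines) + "\r\n"

    with open("events.ics", "w", newline="") as f:
        f.write(to_ics(events))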

For the places without a website, the option would then be to create a calendar especially for this purpose (using Nextcloud, Google Calendar, or similar). Some places did this (thank you!). Filling in events is a lot of work, but I thought that if this took off, it could potentially be a kind of replacement for having a website, similar to how social media sometimes functions as a kind of poor man's website. It could work, but anything based on continuous manual labor is really hard to maintain long term.

What I realised at this moment is that, even if this is a selection of events that can never be complete in an absolute sense, it still needs to be complete in the eyes of my intended audience. If one or two places are missing, it falls apart. And if I couldn't convince everyone to export (or manually add) the data I was interested in, the actual process of curating and selecting would never really start.

Just as I was getting a bit pessimistic about the whole project, something else happened: 

2021-2022

Waag and Public Spaces started a project with similar goals in 2022. I thought, maybe they can get an infrastructure project like this off the ground. They turned it into more of a formal protocol specification. I think I might have persuaded them to include .ics because it would make it easier for smaller places to participate: not the best format, I argued, but the most accessible for small spaces.

I haven't heard anything from them for a while, so I'm not sure how their project is doing. Maybe it's going amazingly without me knowing, but their focus was more on medium-to-large places: places with proper websites and databases that could make proper exports of their data.

Standardization and open data are useful and needed. However, during this process I rediscovered two problems with open data and, by extension, the semantic web:

1. In the semantic web, how the data is defined (and how it is supposed to be interpreted) is decided by the sender. Information can be interpreted in various ways, and the semantic web removes that agency from the reader. For example, whether an event is tagged as 'music' or 'art' should also be my concern, not only the event creator's. Or rather, the categorisation needs to be context-specific: what is 'opera' in one context is 'music' in another.

2. Nothing is 100% public; there are only various publics and audiences. At this point I started to realise that 'publicdata.events' is a bit of a misnomer. I always had this nagging feeling that open data as an idea is more or less doomed. I partly chose to ignore it, since the alternative is unbelievably complicated: something that is both distributed and decentralized, yet also limited to an intended audience, is a contradiction of sorts.

However, the fediverse possibly offers at least a partial solution to both of these problems, and this is something I'd like to explore later. Categorisation can be 'local' per instance, for example, and there is also a bit more control over which servers see what.

Putting machine-readable data out there is also a little bit scary, since you don't know what happens to it or in what context it will show up. What many don't realise is that there is a difference between putting something on a website and making the data machine-readable. When I explained this to potential participants, their enthusiasm waned. Just because you make some information public and readable to humans doesn't mean you want it to be available for large-scale computation. For example, you might not want to automatically end up in a large map of who-does-what. "Please export your data, but be aware it can end up anywhere" isn't a very good sales pitch.

2022-2023

I got a very generous "donation" of data from New Music Now. I was thrilled! I thought, now this will be enough for some form of critical mass. They gave me access to all their event data, and I used it on the previous version of publicdata.events, which looked like this:
Version 2

To make this attractive, I built a feature that I still think has potential (I might add it back later): you could select which places you were interested in and subscribe to a feed of those events in your own calendar. I thought that the combination of a critical mass of data and this subscription would be enough. This step was partly funded by Stimuleringsfonds through the "bouwen aan talent" program.
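The mechanics behind that feature are simple in principle. Here is a minimal sketch, assuming each place exposes its own .ics feed (the URLs are placeholders, and real iCalendar merging needs more care with folded lines and time zones):

    # Merge the .ics feeds of the selected places into one calendar that a
    # visitor can subscribe to. Naive line-based merge, for illustration only.
    import urllib.request

    SOURCES = {
        "venue-a": "https://venue-a.example/events.ics",
        "venue-b": "https://venue-b.example/events.ics",
    }

    def merged_feed(selected):
        out = ["BEGIN:VCALENDAR", "VERSION:2.0",
               "PRODID:-//publicdata.events//merged//EN"]
        for name in selected:
            with urllib.request.urlopen(SOURCES[name]) as resp:
                body = resp.read().decode("utf-8")
            inside = False
            for line in body.splitlines():
                if line.startswith("BEGIN:VEVENT"):
                    inside = True
                if inside:
                    out.append(line)
                if line.startswith("END:VEVENT"):
                    inside = False
        out.append("END:VCALENDAR")
        return "\r\n".join(out) + "\r\n"

    print(merged_feed(["venue-a"]))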

What I had at this point was a website that was potentially useful for someone interested in music.

Soon after finishing this, I realised that it wasn't going to work unless I was myself the target audience. I don't go to a lot of concerts. The core of the events needs to be connected to my own interests, or I won't have an intuitive feel for the curation. For this reason, I never made any big splash about this update.

At this moment I was again pessimistic about the project. So I tried my last-ditch effort: scraping websites. Scraping means extracting data from websites and cleaning it up into usable data. I had always thought to myself, this is the thing I can do if nothing else works. I was quite surprised to see that scraping did not work at all. It's pretty funny how I (with an inner image of myself in some kind of heroic pose) had been resisting scraping for years, only to finally give in to the temptation and realise it's completely useless.
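For those unfamiliar with it, this is roughly what scraping looks like, here against a hypothetical venue page whose events sit in elements with an "event" class (all selectors are made up). It also shows why it's fragile: every site needs its own selectors, and they silently break whenever the markup changes:

    # Scrape a (hypothetical) venue agenda page and print date + title.
    # Requires the third-party packages requests and beautifulsoup4.
    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://venue.example/agenda").text
    soup = BeautifulSoup(html, "html.parser")

    for item in soup.select(".event"):
        title = item.select_one(".title")
        date = item.select_one(".date")
        if title and date:
            print(date.get_text(strip=True), "-", title.get_text(strip=True))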

As you can see in the screenshot above, the events from Stedelijk Museum are scraped; that worked. But finding out what the Stedelijk is up to is not that hard. The places I care about typically don't have very reliable websites, and some of them don't have a website at all.

At this point I felt I had explored all avenues, and there had also been a long silence from Waag and Public Spaces, so I felt this might never happen. Time to switch this project down to low-power mode.

2024

Early in 2024 I tried pasting a newsletter into ChatGPT, asking it to extract titles and dates, and that worked surprisingly well. Obviously, I don't want to send any data to ChatGPT, so I looked at self-hosted solutions. Then, during 2024, things developed quickly, and it is now possible to get good-enough results from a locally hosted LLM.
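As a sketch of what this looks like in practice, here is the extraction step against a locally hosted model served by Ollama. The model name, prompt, and expected output shape are all my own assumptions, and the result still needs human checking before it goes anywhere near a public feed:

    # Ask a local LLM (served by Ollama on its default port) to extract
    # events from a newsletter as JSON. Requires the requests package.
    import json
    import requests

    newsletter = open("newsletter.txt").read()

    prompt = (
        "Extract every event from the text below as JSON, shaped as "
        '{"events": [{"title": ..., "date": "YYYY-MM-DD"}]}.\n\n' + newsletter
    )

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt,
              "format": "json", "stream": False},
    )
    data = json.loads(resp.json()["response"])  # model output is a JSON string
    for event in data.get("events", []):
        print(event["date"], event["title"])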

Now and future

What I would like to try is to use AI as a temporary step. It could be a way to collect events from places that have no time, resources, or interest in creating some special solution for exporting their event data. If this becomes an actual, reliable source of events, it might eventually be easier to usher these organisations over to a more traditional data export. Perhaps the goal is for everyone to publish directly on some variant of the fediverse, but things aren't really there yet.