5th August 2024
You have just deployed your 55th reconciliation service for a MediaWiki or Wikibase-based service. You start wondering whether it isn’t time to stop copy-pasting your code around and apply some of that don’t-repeat-yourself wisdom. That was me a moment ago, after a few months of putting it off. Now a prototype intended to replace all of our reconciliation services targeting MediaWiki and Wikibase is here, and you can give it a go in OpenRefine!
Let’s jump in head first with some example endpoints:
https://kartkod.se/apis/reconciliation/mw/en.wikipedia.org/ # English Wikipedia (MediaWiki)
https://kartkod.se/apis/reconciliation/wb/wikibase.world/ # Wikibase World (Wikibase)
https://kartkod.se/apis/reconciliation/wb/www.wikidata.org/ # Wikidata (Wikibase)
https://kartkod.se/apis/reconciliation/mw/www.wikidata.org/ # Wikidata (MediaWiki)
https://kartkod.se/apis/reconciliation/mw/en.wikisource.org/ # English Wikisource (MediaWiki)
# your wiki? and many more!
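The endpoints above appear to share one pattern: a fixed base, then `mw` or `wb` for the kind of service, then the wiki’s domain. Here is a minimal sketch of building an endpoint and a batch of queries. The helper names are hypothetical, and the `queries` parameter is taken from the Reconciliation Service API specification rather than from this service’s own documentation, so treat it as an assumption:

```javascript
// Base URL and the mw/wb segments are taken from the endpoint list above.
const BASE = 'https://kartkod.se/apis/reconciliation';

// Hypothetical helper: build the endpoint for a given kind of service and domain.
function endpointFor(kind, domain) {
  if (kind !== 'mw' && kind !== 'wb') {
    throw new Error(`Unknown service kind: ${kind}`);
  }
  return `${BASE}/${kind}/${domain}/`;
}

// Hypothetical helper: build a form-encoded body carrying a batch of queries.
// Assumption: the service accepts the `queries` parameter described in the
// Reconciliation Service API specification.
function queryBody(names) {
  const queries = {};
  names.forEach((name, i) => {
    queries[`q${i}`] = { query: name };
  });
  return new URLSearchParams({ queries: JSON.stringify(queries) }).toString();
}

console.log(endpointFor('wb', 'www.wikidata.org'));
// prints "https://kartkod.se/apis/reconciliation/wb/www.wikidata.org/"
```

In OpenRefine itself none of this is needed; you only paste the endpoint URL when adding a reconciliation service.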
You see the pattern; now let’s see how we got here. Once you realize you can put any domain in there, you might think that we made it very error-prone and complex. We did, and here is why:
It’s not error-prone to us
You see, our OpenRefine users can’t add their own reconciliation services and our collection of services is already there the first time they sign in.
We want it to be open to you
If we had to allow-list every configuration, it would quickly become less useful to others, and we need it to hook into our other reconciliation services with local restrictions anyway. It’s an experiment, and I hope we can keep it configuration-free.
We need it to be stateless
We deploy all our reconciliation services in a distributed way, and keeping them stateless is incredibly useful for limiting the maintenance and engineering needed.
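One way to picture a stateless, configuration-free design is that everything the service needs to know sits in the request path. The sketch below is not the service’s actual code: `parseTarget` is a hypothetical helper and the path layout is inferred from the example endpoints.

```javascript
// Hypothetical: derive the target wiki from the request path alone, so no
// per-wiki configuration or stored state is needed.
function parseTarget(pathname) {
  const match = pathname.match(/^\/apis\/reconciliation\/(mw|wb)\/([^/]+)\/?$/);
  if (!match) {
    return null; // not a reconciliation endpoint
  }
  const [, kind, domain] = match;
  return { kind, domain };
}

console.log(parseTarget('/apis/reconciliation/mw/en.wikisource.org/'));
```

Because the result depends only on the URL, any instance of the service can answer any request.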
Future work
This service is experimental and you should expect it to change in the coming weeks as we bring it on par with our existing services. It will never implement the whole Reconciliation specification, but the following is on the list:
- Wikibase reconciliation beyond items (properties are especially important)
- Documentation on how to reconcile against specific namespaces
- Documentation on how to reconcile against Wikibase items with certain statements
- Error messages and service validation (things that aren’t MediaWikis, things without necessary extensions, etc.)
- Auto-generation of usage documentation based on installed extensions (CirrusSearch, etc.)
- Support for Wikimedia Commons entities (it’s a weird one, and given the state of Structured Data on Commons it’s not highly prioritized)
A note on compatibility with the OpenRefine Wikibase extension: our reconciliation services consider the Wikibase URIs the “true” identifiers, while the Wikibase extension expects just the QID. Hopefully this is something I can work on upstream, as changing the behavior on our end is not an option. In the meantime, you might be able to work around this using the “Use values as identifiers” feature.
I hope you give it a try. If you have any questions or suggestions, feel free to reach out. I will also be at Wikimania if you want to chat in person!
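To make the mismatch concrete, here is a minimal sketch of going from a full entity URI to the bare ID. It assumes entity URIs end in the entity ID, as they do on Wikidata (`http://www.wikidata.org/entity/Q42`); `entityIdFromUri` is a hypothetical helper, not part of OpenRefine or of the service.

```javascript
// Hypothetical helper: take the trailing Q or P identifier from an entity URI.
// Assumption: the URI ends in the entity ID, as on Wikidata.
function entityIdFromUri(uri) {
  const match = /\/([QP]\d+)$/.exec(uri);
  return match ? match[1] : null;
}

console.log(entityIdFromUri('http://www.wikidata.org/entity/Q42')); // prints "Q42"
```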
19th July 2024
I made a small tool for extracting EXIF location data and “converting” it into GeoJSON. It grew out of my need to display EXIF locations in OpenOrienteering Mapper. It turns out to be useful for other things too, like OpenStreetMap mapping and Wikidata (WikiShootMe supports custom GeoJSON layers).
You can find the tool on this webpage and the code over at Codeberg.
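For a feel of what the “conversion” involves, here is a minimal sketch, not the tool’s actual code. It assumes the EXIF GPS tags have already been parsed into plain numbers (degrees, minutes, seconds) by some EXIF reader; `dmsToDecimal` and `exifToFeature` are hypothetical names.

```javascript
// Convert EXIF degrees/minutes/seconds plus a hemisphere reference
// ('N', 'S', 'E' or 'W') to signed decimal degrees.
function dmsToDecimal([degrees, minutes, seconds], ref) {
  const value = degrees + minutes / 60 + seconds / 3600;
  return ref === 'S' || ref === 'W' ? -value : value;
}

// Build a GeoJSON Feature from already-parsed EXIF GPS tags.
function exifToFeature(name, exif) {
  return {
    type: 'Feature',
    geometry: {
      type: 'Point',
      // GeoJSON orders coordinates as [longitude, latitude]
      coordinates: [
        dmsToDecimal(exif.GPSLongitude, exif.GPSLongitudeRef),
        dmsToDecimal(exif.GPSLatitude, exif.GPSLatitudeRef),
      ],
    },
    properties: { name },
  };
}
```

Wrapping a list of such features in a `FeatureCollection` gives a file that GeoJSON-aware tools can load as a layer.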
1st March 2024
The other day, I needed to resolve a w.wiki URL from a client-side application. However, the UrlShortener MediaWiki extension does not provide an API for resolving URIs (T358049), and client-side applications can’t simply resolve the URLs normally due to CORS.
To unblock myself, I decided to write a generic Cloudflare Worker to resolve URLs, as it is a common task and I always end up dealing with the same edge cases, such as content negotiation and URL fragments. I will update the code below as I need to handle more cases.
// Created by Albin Larsson and made available under Creative Commons Zero
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request));
});
// upper bound on redirects followed, to guard against redirect loops
const MAX_REDIRECTS = 10;
async function handleRequest(request) {
  const corsHeaders = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS',
  };
  const urlParam = new URL(request.url).searchParams.get('url');
  if (!urlParam) {
    // include the CORS headers so client-side callers can read the error
    return new Response('URL parameter is missing.', { status: 400, headers: corsHeaders });
  }
  // manually follow redirects to obtain the final URL from the Location header,
  // as a client-side-only fragment won't be part of the response's URL object
  let finalUrl = urlParam;
  for (let i = 0; i < MAX_REDIRECTS; i++) {
    // pretend to be human by requesting text/html to prevent default content negotiation
    const response = await fetch(finalUrl, { redirect: 'manual', headers: { 'Accept': 'text/html' } });
    const location = response.headers.get('location');
    // stop at the first non-redirect response, or at a 3xx without a Location header
    if (response.status < 300 || response.status >= 400 || !location) {
      break;
    }
    // the Location header may be relative, so resolve it against the current URL
    finalUrl = new URL(location, finalUrl).href;
  }
  return new Response(finalUrl, { headers: corsHeaders });
}
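From the client side, calling the worker is a single request. Here is a minimal sketch, assuming the worker is deployed at the hypothetical address `https://resolver.example.workers.dev/`; `resolverRequestUrl` and `resolve` are illustrative names.

```javascript
// Hypothetical deployment address; replace with wherever the worker lives.
const WORKER_URL = 'https://resolver.example.workers.dev/';

// The target is percent-encoded so that its own query string and fragment
// survive as part of the `url` parameter.
function resolverRequestUrl(target) {
  const requestUrl = new URL(WORKER_URL);
  requestUrl.searchParams.set('url', target);
  return requestUrl.href;
}

async function resolve(target) {
  const response = await fetch(resolverRequestUrl(target));
  if (!response.ok) {
    throw new Error(`Resolver returned ${response.status}`);
  }
  // the worker responds with the final URL as plain text
  return response.text();
}
```

Setting the parameter through `searchParams` rather than string concatenation avoids the classic mistake of passing an unencoded URL, where everything after the first `&` or `#` would be lost.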
21st February 2024
I have gotten quite fond of Just lately, largely thanks to how it forces you into the habit of creating structured documentation for the various commands and scripts that you end up writing.
When adding a Justfile to a Python/Django project the other day, I found myself wanting to make sure that all commands ran in a virtual environment. However, because Just runs each line in a separate shell, it is not possible to activate the virtual environment in one line and then run the command in the next.
The only (sane) way I found to solve this was to prefix each command with the path to the virtual environment’s Python or Pip binary. This is not ideal, but it’s likely that you and your collaborators will have settled on a naming convention for the virtual environment directory anyway.
Here is a full example of a Justfile from one of my Django projects:
# load .env file
set dotenv-load
@_default:
    just --list
# setup virtual environment, install dependencies, and run migrations
setup:
    python3 -m venv .venv
    ./.venv/bin/pip install -r requirements.txt
    ./.venv/bin/python -Wa manage.py migrate
# run the development server
run:
    ./.venv/bin/python -Wa manage.py runserver
# run the test suite
test:
    ./.venv/bin/python -Wa manage.py test
# virtual environment wrapper for manage.py
manage *COMMAND:
    ./.venv/bin/python manage.py {{COMMAND}}