Albin Larsson: Blog

Culture, Climate, and Code

Reconcile Against any MediaWiki Instance

5th August 2024

You have just deployed your 55th reconciliation service for a MediaWiki or Wikibase-based service. You start wondering if this isn’t the time to stop copy-pasting your code around and apply some of that don’t-repeat-yourself wisdom. That was me a moment ago after a few months of pushing it in front of me. Now a prototype intended to replace all of our reconciliation services targeting MediaWiki and Wikibase is here and you can give it a go in OpenRefine!

Let’s jump in head first with some example endpoints:

https://kartkod.se/apis/reconciliation/mw/en.wikipedia.org/ # English Wikipedia (MediaWiki)
https://kartkod.se/apis/reconciliation/wb/wikibase.world/ # Wikibase World (Wikibase)
https://kartkod.se/apis/reconciliation/wb/www.wikidata.org/ # Wikidata (Wikibase)
https://kartkod.se/apis/reconciliation/mw/www.wikidata.org/ # Wikidata (MediaWiki)
https://kartkod.se/apis/reconciliation/mw/en.wikisource.org/ # English Wikisource (MediaWiki)
# your wiki? and many more!

You see the patterns, now let’s see how we got here. Once you realize you can put any domain in there, you might think that we made it very error-prone and complex. We did, here is why:

It’s not error-prone to us

You see, our OpenRefine users can’t add their own reconciliation services and our collection of services is already there the first time they sign in.

We want it to be open to you

If we needed to allow-list all the configurations it would quickly become less useful to others and we need it to hook into our other reconciliation services with local restrictions anyway. It’s an experiment and I hope we can keep it configuration less.

We need it to be stateless

We deploy all our reconciliation services in a distributed way and keeping our services stateless is incredibly useful to limit the maintenance and engineering needed.

Future work

This service is experimental and you should expect it to change in the coming weeks as we bring it on pair with our existing services. It will never implement the whole Reconciliation specification but

A note on compatibility with the OpenRefine Wikibase extension, our reconciliation services consider the Wikibase URIs as the “true” identifiers while the Wikibase extension expects just the QID. Hopefully, this is something I can work on upstream as it’s not an option to change the behavior on our end. In the meantime you might be able to work around this using the “Use values as identifiers” feature.

I hope you give it a try, if you have any questions or suggestions feel free to reach out. I will also be at Wikimania if you want to chat in person!

EXIF to GeoJSON Converter

19th July 2024

I made a small tool for extracting EXIF location data and “converting” it into GeoJSON. It comes from my need to display EXIF locations in OpenOrienteering Mapper. Turns out it’s useful for other things like OpenStreetMap mapping and Wikidata(WikiShootMe supports custom GeoJSON layers).

You find the tool on this webpage and the code over at Codeberg.

Screenshot of the tool.

Cloudflare Worker to resolve URLs

1st March 2024

The other day, I needed to resolve a w.wiki URL from a client-side application. However, the UrlShortener MediaWiki extension does not provide an API for resolving URIs (T358049), and client-side applications can’t simply resolve the URLs normally due to CORS.

To unblock myself, I decided to write a generic Cloudflare Worker to resolve URLs, as it is a common task and I always end up dealing with the same edge cases, such as content negotiation and URL fragments. I will update the code below as I need to handle more cases.

// Created by Albin Larsson and made available under Creative Commons Zero
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
  const urlParam = new URL(request.url).searchParams.get('url');

  if (!urlParam) {
    return new Response('URL parameter is missing.', { status: 400 });
  }

  const corsHeaders = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS',
  };

  // manually follow redirects to obtain the final URL from the location header
  // as the client-side only fragment wont be a part of the URL-object
  let finalUrl = urlParam;
  let response;

  do {
    // pretent to be human by requesting text/html to prevent default content-neogtiation
    response = await fetch(finalUrl, { redirect: 'manual', headers: { 'Accept': 'text/html' } });

    if (response.status >= 300 && response.status < 400) {
      const redirectUrl = new URL(response.headers.get('location'), finalUrl);
      finalUrl = redirectUrl.href;
    }
  } while (response.status >= 300 && response.status < 400);

  return new Response(finalUrl, {
    headers: {
      ...corsHeaders
    }
  });
}

Using Python virtual environments through Just

21st February 2024

I have gotten quite fond of Just lately much thanks to how it forces you into the habit of creating structured documentation for the various commands and scripts that you end up writing.

When adding a Justfile to a Python/Django project the other day I found myself in a situation where I wanted to make sure that all commands ran in a virtual environment. However, because Just run each line in a separate shell, it is not possible to activate the virtual environment in one line and then run the command in the next.

The only (sane) way I found to solve this was to prefix each command with the path to the virtual environment’s Python or Pip binary. This is not ideal, but it’s likley that you and your colaborators will have settled on a naming convention for the virtual environment directory anyway.

Here is a full example of a Justfile form one of my Django projects:

# load .env file
set dotenv-load

@_default:
  just --list

# setup virtual environment, install dependencies, and run migrations
setup:
  python3 -m venv .venv
  ./.venv/bin/pip install -r requirements.txt
  ./.venv/bin/python -Wa manage.py migrate

run:
  ./.venv/bin/python -Wa manage.py runserver

test:
  ./.venv/bin/python -Wa manage.py test

# virtual environment wrapper for manage.py
manage *COMMAND:
  ./.venv/bin/python manage.py 

Older Posts