Blog

Wikimedia Commons Upload Campaigns for Cultural Heritage

13th April 2022

Since the first Swedish edition of Wiki Loves Monuments in 2011, participants have uploaded almost 30 000 images of heritage sites, protected buildings, ships, and working life museums.

Wiki Loves Monuments (WLM) goes on for 30 days per year. While many experienced users take images all year round for the event, many new contributors are introduced to WLM and the broader Wikimedia community for the first time through the 30-day event. What if there was a just as engaging effort for documenting heritage environments that would go on not for 30 days a year, but for 365?

Wiki Loves Monuments builds to an extent upon a Wikimedia Commons called Campaigns. Campaigns make it possible to construct upload forms with a set of predefined values or have default values passed through Campaigns URLs.

Much of the Wiki Loves Monuments tooling (maps, lists, etc) uses WLM specific Campaigns to define values for things like the identifiers of monuments and sites. It’s super easy to create these links:

https://commons.wikimedia.org/w/index.php?title=Special:UploadWizard&campaign=wlm-se-arbetsl&id=<museum-identifier>

However, it’s not that easy to reuse these Campaigns for non-WLM usage. The help texts are WLM specific, they set WLM-specific categories, etc. Nor is it easy to create a new Campaign. You will need special user rights and experience with JSON and Wikitext.

So rather than to set up tool-specific Campaigns, what if we had generic ones that any tool could integrate without needing to worry about set up and help texts?

Such Campaigns now exist thanks to Wikimedia Sverige’s community support! It doesn’t matter if you want to crowdsource images from a spreadsheet or if you are a programmer wanting to integrate uploading into your tool or app. You can use these Campaigns.

Two tools already utilizing these Campaigns are Kyrksök.se (database of churches in Sweden) and FornPunkt.se (FornPunkt is a citizen-science platform for historic sites).

You can create upload links for these Campaigns similarly to how one does it for WLM:

https://commons.wikimedia.org/w/index.php?title=Special:UploadWizard&campaign=Kulturl%C3%A4mningar&id=<monument-identifier>

The 30 000 images contributed through Wiki Loves Monuments add to the tens of thousands of images uploaded by individual contributors and cultural heritage institutions. Today Wikimedia Commons has the largest public collection of images depicting Swedish heritage sites. Wikimedia Commons is, therefore, an important and open piece of infrastructure for cultural heritage in Sweden. It’s utilized by various organizations, including the Swedish National Heritage Board and its website “Kringla” which indexes Wikimedia Commons once every week.

By creating these generic and easy-to-use Campaigns, the hope is to lower the barrier for integrations to contribute to this public collection of information. Your aim does not need to be to create the next Wiki Loves Monuments.

HTML Markup for Citation Tools

28th March 2022

Isn’t it great when a citation tool like Zotero or Wikipedia’s Citoid takes a link and turns it into a citation? In this post, I show some of the HTML markup needed for your web pages to support just that.

How Zotero and Citoid works

The most common citation tools out there are just like Citoid and Zotero powered by Zotero-translators, a set of JavaScript files that parse web pages into citations using things like XPath and CSS selectors.

Let’s make your site work with citation tools

Now you could write a Zotero-translations file for your site and submit it. There are already over 600 ones that could serve as examples. That might be a good way forward if you don’t have control over your website’s HTML markup.

However, one of those Zotero-translators files, “Embedded Metadata.js” happens to be a generic one that will try to extract data from your site if it does not have its own translator.

That translator is very capable and supports both common and generic metadata ontologies such as Open graph and ones one mostly find in industry-specific settings like BibO and Eprint Terms.

The tags and attributes your page needs

These tags are the ones I have ended up using as they are rather generic and serve many use-cases beyond citation.

<link rel="canonical" href="https://byabbe.se/example-page">

Always include a canonical tag! That ensures that the link used is your canonical link and not the one copied or visited by the user.

<meta property="og:title" content="Not your page title">

The title of the work or article is not necessarily the same as the page title, as the latter can contain things like the site name.

<meta name="author" content="Albin Larsson">

The name of the author.

<meta property="og:site_name" content="Site name">

The name of the site.

<meta property="og:article:published_time" content="2022-02-03T00:00:00+00:00">

Time of publication, note that this is from the article namespace in the Open Graph vocabulary. Let me know if you find a more generic property that is as widely supported!

<html lang="en">

Language of the page.

<meta name="description" content="An actual description.">

A description of the work, this isn’t commonly used for display purposes but some tools/setups still use it for indexing, etc.

Do you have other markup you use to expose data to citation tools? Let me know!

Govdirectory and the Unlock Accelerator

3rd February 2022

A crowdsourced and fact-checked directory of official governmental online accounts and services.

That was how Jan Ainali and I described Govdirectory in late April last year when we wrote the initial application to the Unlock Accelerator.

The idea, a global directory of government agencies and their online presence based on Wikidata, was accepted, and we got started.

Highlights of the Unlock Accelerator Program

As skeptical as I was about the accelerator program, others were about my aim to have a working prototype within the first week. The program was great, and we had a working prototype within the first week!

The development speed was all thanks to the Snowman site generator and the Wikidata SPARQL service. A couple of HTML templates and some SPARQL queries and “check”. Working with Snowman was great, and it was the first time I got to show it off in an open-source project.

Much more went into making the program great. Here are some of my personal highlights from the program!

The other participants and their projects

The common goal of expanding access to the world’s knowledge and open values of the Unlock Accelerator made it different from my previous experience with accelerators. The majority of the projects were efforts that I could easily relate to or be inspired by, and I enjoyed many interesting and rewarding chats. Check out the other projects!

The structure and mentorship

I used to say that Jan and I could have created Govdirectory without the Accelerator project. However, with a few months of reflection, I would rephrase that.

Our mentor Fabian Gampp did provide a framework for our product development. That ensured we always questioned our ideas and focused on our users.

We could have created a Govdirectory on our own, but it would have been one designed after the needs and ideas of Jan and me, not the one that now exists.

Less (Bad) Design

There were plenty of interesting workshops to attend during the program, but the one that stuck with me the most was one based around the “Less (Bad) Design: A Toolkit For Ethical Ideation”. A toolkit by Matthew Manos meant to uncover the new problems our “solutions” might generate or contribute to.

No digital project is free from bad side effects, it just isn’t possible. However, by utilizing this toolkit one becomes aware of such and can minimize them or even consider the design of various features or KPIs.

It stuck with me and lead me to reflect on several bad side effects of both past personal projects and ones with former employers or clients.

Such a toolkit/framework was to me a missing piece when it comes to product design and project planning. I have since applied this toolkit both to my projects and those of my clients.

Final notes

We continue to develop and expand Govdirectory. If you want to get involved, you can check out good first issues on Github or have a look at the ongoing data-related efforts over at Wikidata.

Finally, a big thanks to the Unlock team and encouragement for the reader to apply to the Unlock program later this spring!

Notes on launching an MVP

8th January 2022

Following a decade of procrastination, I decided that it was about time to create a Swedish citizen-science platform for heritage sites and historic environments and 80 work hours later I launched fornpunkt.se.

I have built quite a few crowdsourcing platforms some over a long period of time, some over a very short period of time. However, I’m not sure any of them would qualify as even close to a minimum viable product at their launch. FornPunkt was however, rather close to minimal while still functional.

The functionality of the MVP

Content editing Users could give a site a simple geometry, write a textual description and add a class. Users could edit and delete these sites.
Discovery and browsing Users could browse user-created sites on a map as well as sites from the government-run register “The Archaeological Sites and Monuments System”(KMR). Users can comment on both sites created by other users and ones from KMR. Users could add custom background maps (WMS/XYZ-services, etc) as backgrounds.
Export Users could export their data.
Documentation Documentation wise there were only some docs about using advanced map features.

Reasons for the MVP approach

Motivation Seeing early users register hundreds of sites is way better than ticking a box.

Not painting yourself into a corner Two days into the MVP I realized from watching users that using the Swedish National Heritage Board’s classification wasn’t good enough for my type of user. If I would have spent another 80 hours on it, I would have integrated more with this classification. Now I started decoupling from it a week after launch.

Real-world usage Real-world usage gives you indications not only regarding usage and features but also regarding technical bottlenecks.

Things that worked well

Waiting list/invite FornPunkt is at this early stage invite-only, not only does this allow me to connect with users but it also gives me a reason to ask how they would like to use it. The largest group are researchers wanting to store sites and data they collect. The second-largest group is all professionals, mostly from government agencies. Great, advanced users, let’s cover their needs early on. Also, the invite-only approach allows me to leave investment into moderation features to the future me.

The feature set Structured data is great, but if an option for a particular type of information does not exist, users will happily enter the data into a free-text field. That’s great as one can look at the free text and see which structured data fields should be prioritized. Advanced map features and export might not be obvious in the MVP but these helped to attract interest and stand out from existing services.

The infrastructure I launched the servers and databases the day before the launch, and other than a CSS file not being purged on an edge-server correctly, everything worked perfectly. Managed services are great, and I should maybe have used more proprietary services infrastructure. If you spend little time on something, the risks of lockin are rather low.

Stakeholder revelations I didn’t think much of stakeholders regarding the data and side services of FornPunkt. Since launch five companies and four government agencies have expressed interest in exchanging data and two pilots are being prepared.

Things that worked less well

Onboarding Interfaces are scary, especially ones where you can edit public data. Combining that with a lack of introductory documentation there was clearly a barrier to the first contribution. Considering how easy it is to make a screencast and demo today, I should have done it from the start, for the next MVP I will.

Transparency A site footer didn’t make it into the MVP, until two hours after launch. An about page? No that didn’t make it either until two hours after launch which was needed to explain that it was an early stage MVP. I played catchup all launch week with transparency and things like a public changelog and an overview of future work.

Invitation management I have enterprise-grade systems for automatic tests, automatic deployment, automatic error management, etc. All to save me time but for the invite system, I went with a spreadsheet and list of invite codes managed by me copy-pasting. Which invite codes had been shared? Which ones have been used for who? It wasted so much time and focus that I could have spent on talking to users, fixing bugs, and laying on the sofa. Never again.

Final notes

Tech is easy understanding problems are hard. Several features that felt prioritized at the start are no longer planned. New features that three weeks of usage could unravel while a decade of procrastination couldn’t are now planed. Launching early has probably saved me the 80 hours I spent on the original MVP.