Albin Larsson: Blog

Culture, Climate, and Code

HTML Markup for Citation Tools

28th March 2022

Isn’t it great when a citation tool like Zotero or Wikipedia’s Citoid takes a link and turns it into a citation? In this post, I show some of the HTML markup needed for your web pages to support just that.

How Zotero and Citoid work

The most common citation tools out there are, just like Citoid and Zotero, powered by Zotero-translators, a set of JavaScript files that parse web pages into citations using things like XPath and CSS selectors.

Let’s make your site work with citation tools

Now you could write a Zotero-translators file for your site and submit it. There are already over 600 of them that could serve as examples. That might be a good way forward if you don’t have control over your website’s HTML markup.

However, one of those Zotero-translators files, “Embedded Metadata.js”, happens to be a generic one that will try to extract data from your site if it does not have its own translator.

That translator is very capable and supports both common, generic metadata ontologies such as Open Graph and ones one mostly finds in industry-specific settings, like BibO and Eprint Terms.

The tags and attributes your page needs

These tags are the ones I have ended up using as they are rather generic and serve many use-cases beyond citation.

<link rel="canonical" href="https://byabbe.se/example-page">

Always include a canonical tag! That ensures that the link used is your canonical link and not the one copied or visited by the user.

<meta property="og:title" content="Not your page title">

The title of the work or article is not necessarily the same as the page title, as the latter can contain things like the site name.

<meta name="author" content="Albin Larsson">

The name of the author.

<meta property="og:site_name" content="Site name">

The name of the site.

<meta property="og:article:published_time" content="2022-02-03T00:00:00+00:00">

Time of publication. Note that this is from the article namespace in the Open Graph vocabulary. Let me know if you find a more generic property that is as widely supported!

<html lang="en">

Language of the page.

<meta name="description" content="An actual description.">

A description of the work. This isn’t commonly used for display purposes, but some tools/setups still use it for indexing, etc.
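To check that a page exposes these tags, a small script can read them back out. The following is a minimal Python sketch using only the standard library. It is not how the Zotero-translators work (those are JavaScript files), and the names `META_FIELDS`, `CitationMetadataParser`, `extract_citation_metadata`, and the output field names are hypothetical, made up for this illustration. The sample page simply combines the tags listed above.

```python
from html.parser import HTMLParser

# Maps the <meta> keys from this post to citation fields.
# The field names on the right are hypothetical, chosen for this sketch.
META_FIELDS = {
    "og:title": "title",
    "author": "author",
    "og:site_name": "site_name",
    "og:article:published_time": "published",
    "description": "description",
}


class CitationMetadataParser(HTMLParser):
    """Collects the tags and attributes listed above into a dict."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html" and attributes.get("lang"):
            # <html lang="en"> gives the language of the page.
            self.metadata["language"] = attributes["lang"]
        elif tag == "link":
            # rel may hold several space-separated values.
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values and attributes.get("href"):
                self.metadata["url"] = attributes["href"]
        elif tag == "meta":
            # Open Graph uses "property", plain HTML meta tags use "name".
            key = attributes.get("property") or attributes.get("name")
            field = META_FIELDS.get(key)
            if field and attributes.get("content"):
                self.metadata[field] = attributes["content"]


def extract_citation_metadata(html):
    """Return a dict with whichever citation fields the page exposes."""
    parser = CitationMetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


SAMPLE_PAGE = """<!DOCTYPE html>
<html lang="en">
<head>
<title>Not your page title | Site name</title>
<link rel="canonical" href="https://byabbe.se/example-page">
<meta property="og:title" content="Not your page title">
<meta name="author" content="Albin Larsson">
<meta property="og:site_name" content="Site name">
<meta property="og:article:published_time" content="2022-02-03T00:00:00+00:00">
<meta name="description" content="An actual description.">
</head>
<body></body>
</html>"""

if __name__ == "__main__":
    for field, value in extract_citation_metadata(SAMPLE_PAGE).items():
        print(f"{field}: {value}")
```

A field that is missing from the output points to a tag that is absent from the page or has an empty value.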

Do you have other markup you use to expose data to citation tools? Let me know!

Govdirectory and the Unlock Accelerator

3rd February 2022

A crowdsourced and fact-checked directory of official governmental online accounts and services.

That was how Jan Ainali and I described Govdirectory in late April last year when we wrote the initial application to the Unlock Accelerator.

The idea, a global directory of government agencies and their online presence based on Wikidata, was accepted, and we got started.

Highlights of the Unlock Accelerator Program

As skeptical as I was about the accelerator program, others were about my aim to have a working prototype within the first week. The program was great, and we had a working prototype within the first week!

The development speed was all thanks to the Snowman site generator and the Wikidata SPARQL service. A couple of HTML templates and some SPARQL queries and “check”. Working with Snowman was great, and it was the first time I got to show it off in an open-source project.

Much more went into making the program great. Here are some of my personal highlights from the program!

The other participants and their projects

The common goal of expanding access to the world’s knowledge and open values of the Unlock Accelerator made it different from my previous experience with accelerators. The majority of the projects were efforts that I could easily relate to or be inspired by, and I enjoyed many interesting and rewarding chats. Check out the other projects!

The structure and mentorship

I used to say that Jan and I could have created Govdirectory without the Accelerator project. However, with a few months of reflection, I would rephrase that.

Our mentor Fabian Gampp did provide a framework for our product development. That ensured we always questioned our ideas and focused on our users.

We could have created a Govdirectory on our own, but it would have been one designed after the needs and ideas of Jan and me, not the one that now exists.

Less (Bad) Design

There were plenty of interesting workshops to attend during the program, but the one that stuck with me the most was one based around “Less (Bad) Design: A Toolkit For Ethical Ideation”, a toolkit by Matthew Manos meant to uncover the new problems our “solutions” might generate or contribute to.

No digital project is free from bad side effects; it just isn’t possible. However, by utilizing this toolkit one becomes aware of such effects and can minimize them or even consider the design of various features or KPIs.

It stuck with me and led me to reflect on several bad side effects of both past personal projects and ones with former employers or clients.

Such a toolkit/framework was to me a missing piece when it comes to product design and project planning. I have since applied this toolkit both to my projects and those of my clients.

Final notes

We continue to develop and expand Govdirectory. If you want to get involved, you can check out good first issues on GitHub or have a look at the ongoing data-related efforts over at Wikidata.

Finally, a big thanks to the Unlock team and encouragement for the reader to apply to the Unlock program later this spring!

Notes on launching an MVP

8th January 2022

Following a decade of procrastination, I decided that it was about time to create a Swedish citizen-science platform for heritage sites and historic environments, and 80 work hours later I launched fornpunkt.se.

I have built quite a few crowdsourcing platforms, some over a long period of time, some over a very short period of time. However, I’m not sure any of them would qualify as even close to a minimum viable product at their launch. FornPunkt, however, was rather close to minimal while still functional.

The functionality of the MVP

Reasons for the MVP approach

Motivation: Seeing early users register hundreds of sites is way better than ticking a box.

Not painting yourself into a corner: Two days into the MVP I realized from watching users that using the Swedish National Heritage Board’s classification wasn’t good enough for my type of user. If I had spent another 80 hours on it, I would have integrated more with this classification. Instead, I started decoupling from it a week after launch.

Real-world usage: Real-world usage gives you indications not only regarding usage and features but also regarding technical bottlenecks.

Things that worked well

Waiting list/invite: FornPunkt is invite-only at this early stage. Not only does this allow me to connect with users, but it also gives me a reason to ask how they would like to use it. The largest group is researchers wanting to store sites and data they collect. The second-largest group is professionals, mostly from government agencies. Great, advanced users, let’s cover their needs early on. Also, the invite-only approach allows me to leave investment in moderation features to future me.

The feature set: Structured data is great, but if an option for a particular type of information does not exist, users will happily enter the data into a free-text field. That’s great, as one can look at the free text and see which structured data fields should be prioritized. Advanced map features and export might not be obvious in the MVP, but these helped to attract interest and stand out from existing services.

The infrastructure: I launched the servers and databases the day before the launch, and other than a CSS file not being purged correctly on an edge server, everything worked perfectly. Managed services are great, and I should maybe have used more proprietary infrastructure services. If you spend little time on something, the risk of lock-in is rather low.

Stakeholder revelations: I didn’t think much of stakeholders regarding the data and side services of FornPunkt. Since launch, five companies and four government agencies have expressed interest in exchanging data, and two pilots are being prepared.

Things that worked less well

Onboarding: Interfaces are scary, especially ones where you can edit public data. Combined with a lack of introductory documentation, that clearly created a barrier to the first contribution. Considering how easy it is to make a screencast and demo today, I should have done it from the start; for the next MVP I will.

Transparency: A site footer didn’t make it into the MVP until two hours after launch. An about page? No, that didn’t make it in until two hours after launch either, even though it was needed to explain that this was an early-stage MVP. I played catch-up all launch week on transparency, with things like a public changelog and an overview of future work.

Invitation management: I have enterprise-grade systems for automatic tests, automatic deployment, automatic error management, etc., all to save me time. But for the invite system, I went with a spreadsheet and a list of invite codes that I managed by copy-pasting. Which invite codes had been shared? Which ones had been used, and for whom? It wasted so much time and focus that I could have spent on talking to users, fixing bugs, and lying on the sofa. Never again.

Final notes

Tech is easy; understanding problems is hard. Several features that felt prioritized at the start are no longer planned. New features that three weeks of usage could unravel, while a decade of procrastination couldn’t, are now planned. Launching early has probably saved me the 80 hours I spent on the original MVP.

My Self-hosting Setup

21st September 2021

Having self-hosted various tools and services for a few years, I now believe I have a solid home server setup that covers almost all of my use-cases, and therefore I thought I would share my current setup.

Core services

Pi-hole

While the common use case for Pi-hole is content and ad-blocking, I use it as both a DNS and DHCP server in addition to content blocking. It essentially keeps most of my network management in one place. Pi-hole is a must-have piece of software for me nowadays, as it makes browsing faster and saves a lot of battery for all of my devices.

Apache/WebDAV

Apache, and in particular its WebDAV module, is what I use for file sharing and synchronization. WebDAV might be an old and scary protocol, but its ecosystem is great. WebDAV just works with all kinds of clients, most importantly for me with Nemo (file manager) and Joplin (note-taking).

Rclone

I use Rclone to make backups of my WebDAV-enabled files as well as some other services.

Jellyfin

I used to self-host Calibre-web for my eBooks, but after a couple of pull requests to the Jellyfin user interface, I now use it as a more general-purpose system. Today I use it to host eBooks, photographs, radio shows, and papers.

In addition to the services above I also host a few major technical tools like JupyterHub, Jena Fuseki / Thor SPARQL editor, and Git.

Wishlist

Firefox sync

I don’t use Mozilla’s sync service; instead, I have a script to back up my bookmarks, history, etc. It would, however, be nice to have a live sync service for my own devices. The open source service Mozilla provides is rather complex and nothing I feel like hosting at this point, especially as I don’t have a current need for Docker.

Hardware

Everything is hosted on a Raspberry Pi 3B+ that boots from an SSD. A second Raspberry Pi 3B+/SSD hosts Rclone and periodically makes backups of the main Pi’s WebDAV service.
