Albin Larsson: Blog

Culture, Climate, and Code

Notes on launching an MVP

8th January 2022

Following a decade of procrastination, I decided it was about time to create a Swedish citizen-science platform for heritage sites and historic environments. Eighty work hours later, I launched fornpunkt.se.

I have built quite a few crowdsourcing platforms, some over a long period of time, some over a very short one. However, I’m not sure any of them would have qualified as even close to a minimum viable product at launch. FornPunkt, however, was rather close to minimal while still being functional.

The functionality of the MVP

Reasons for the MVP approach

Motivation: Seeing early users register hundreds of sites is way better than ticking a box.

Not painting yourself into a corner: Two days into the MVP, I realized from watching users that the Swedish National Heritage Board’s classification wasn’t good enough for my type of user. If I had spent another 80 hours on it, I would have integrated more deeply with this classification. Instead, I started decoupling from it a week after launch.

Real-world usage: It gives you indications not only about usage and features but also about technical bottlenecks.

Things that worked well

Waiting list/invite: At this early stage, FornPunkt is invite-only. Not only does this allow me to connect with users, it also gives me a reason to ask how they would like to use the platform. The largest group is researchers wanting to store sites and data they collect. The second-largest group is professionals, mostly from government agencies. Great, advanced users; let’s cover their needs early on. The invite-only approach also lets me leave investment in moderation features to future me.

The feature set: Structured data is great, but if an option for a particular type of information does not exist, users will happily enter the data into a free-text field. That’s great, as one can look at the free text and see which structured data fields should be prioritized. Advanced map features and export might not be obvious choices for an MVP, but they helped attract interest and stand out from existing services.

The infrastructure: I launched the servers and databases the day before the launch, and other than a CSS file not being purged correctly on an edge server, everything worked perfectly. Managed services are great, and maybe I should have used even more proprietary services and infrastructure. If you spend little time on something, the risk of lock-in is rather low.

Stakeholder revelations: I didn’t think much about stakeholders for the data and side services of FornPunkt. Since launch, five companies and four government agencies have expressed interest in exchanging data, and two pilots are being prepared.

Things that worked less well

Onboarding: Interfaces are scary, especially ones where you can edit public data. Combined with a lack of introductory documentation, there was clearly a barrier to the first contribution. Considering how easy it is to make a screencast and demo today, I should have done so from the start; for the next MVP, I will.

Transparency: A site footer didn’t make it into the MVP until two hours after launch. An about page? No, that didn’t make it either until two hours after launch, when it was needed to explain that this was an early-stage MVP. I played catch-up all launch week with transparency, adding things like a public changelog and an overview of future work.

Invitation management: I have enterprise-grade systems for automated tests, automated deployment, automated error management, and so on, all to save me time. But for the invite system, I went with a spreadsheet and a list of invite codes managed by me copy-pasting. Which invite codes had been shared? Which ones had been used, and by whom? It wasted so much time and focus that I could have spent on talking to users, fixing bugs, and lying on the sofa. Never again.

Final notes

Tech is easy; understanding problems is hard. Several features that felt like priorities at the start are no longer planned. New features that three weeks of usage could uncover, while a decade of procrastination couldn’t, are now planned. Launching early has probably saved me the 80 hours I spent on the original MVP.

My Self-hosting Setup

21st September 2021

Having self-hosted various tools and services for a few years, I now believe I have a solid home-server setup that covers almost all of my use cases, so I thought I would share it.

Core services

Pi-hole

While the common use case for Pi-hole is content and ad blocking, I use it as both a DNS and DHCP server in addition to content blocking. It essentially keeps most of my network management in one place. Pi-hole is a must-have piece of software for me nowadays, as it makes browsing faster and saves a lot of battery across all of my devices.
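As a small illustration of the DNS side (my own sketch, not part of the Pi-hole setup itself), the snippet below queries the Pi-hole directly with dnspython; the Pi-hole address and the test domains are assumptions.

```python
# Minimal sketch: query a Pi-hole DNS server directly to check that it
# answers (and blocks) queries. Requires dnspython (pip install dnspython).
# The Pi-hole address and the domains below are assumptions for illustration.
import dns.resolver

PIHOLE_IP = "192.168.1.2"  # hypothetical Pi-hole address on the LAN

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = [PIHOLE_IP]

for domain in ["example.com", "ads.example.net"]:
    try:
        answer = resolver.resolve(domain, "A")
        addresses = [rr.address for rr in answer]
        # Pi-hole typically answers blocked domains with 0.0.0.0
        status = "blocked" if addresses == ["0.0.0.0"] else "resolved"
        print(f"{domain}: {status} -> {addresses}")
    except dns.resolver.NXDOMAIN:
        print(f"{domain}: no such domain")
```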

Apache/WebDAV

Apache, and in particular its WebDAV module, is what I use for file sharing and synchronization. WebDAV might be an old and scary protocol, but its ecosystem is great. WebDAV just works with all kinds of clients, most importantly for me Nemo (file manager) and Joplin (note-taking).
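Because WebDAV is just HTTP with a few extra verbs, talking to the share from a script is straightforward. Below is a minimal sketch using the requests library; the server URL, credentials, and paths are hypothetical placeholders, not my actual setup.

```python
# Minimal sketch of talking to a WebDAV share over plain HTTP methods.
# The URL, credentials, and paths are hypothetical placeholders.
import requests

BASE = "https://example.home.lan/webdav"   # hypothetical WebDAV root
AUTH = ("albin", "secret")                 # hypothetical credentials

# Upload (or overwrite) a file with PUT.
with open("notes.md", "rb") as f:
    r = requests.put(f"{BASE}/notes/notes.md", data=f, auth=AUTH)
    r.raise_for_status()

# List a collection with PROPFIND (Depth: 1 = immediate children only).
r = requests.request(
    "PROPFIND",
    f"{BASE}/notes/",
    auth=AUTH,
    headers={"Depth": "1"},
)
r.raise_for_status()
print(r.text)  # multi-status XML describing the collection's members
```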

Rclone

I use Rclone to make backups of my WebDAV-enabled files as well as of some other services.
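The backup job is essentially a scheduled rclone sync. The sketch below shows roughly what that can look like when wrapped in Python; the remote name and paths are assumptions rather than my actual configuration.

```python
# Rough sketch of a scheduled backup step: mirror a WebDAV remote
# (configured beforehand with `rclone config`) to a local directory.
# The remote name "webdav:" and the paths are assumptions for illustration.
import subprocess

subprocess.run(
    [
        "rclone", "sync",
        "webdav:",             # hypothetical rclone remote
        "/mnt/backup/webdav",  # hypothetical local backup target
        "--verbose",
    ],
    check=True,  # raise if rclone exits with an error
)
```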

Jellyfin

I used to self-host Calibre-web for my eBooks, but after a couple of pull requests to the Jellyfin user interface, I now use Jellyfin as a more general-purpose system. Today it hosts my eBooks, photographs, radio shows, and papers.

In addition to the services above, I also host a few major technical tools, such as JupyterHub, Jena Fuseki with the Thor SPARQL editor, and Git.

Wishlist

Firefox sync

I don’t use Mozilla’s sync service; instead, I have a backup script that backs up my bookmarks, history, and so on. It would, however, be nice to have a live sync service for my own devices. The open-source sync service Mozilla provides is rather complex and not something I feel like hosting at this point, especially as I don’t otherwise have a current need for Docker.
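For reference, that kind of backup script can be as small as copying the profile’s places.sqlite (where Firefox keeps bookmarks and history) to a dated folder. This is a simplified sketch with assumed paths, not my actual script.

```python
# Simplified sketch of backing up Firefox bookmarks/history by copying
# places.sqlite out of the profile directory. Paths are assumptions;
# the real profile directory name differs per installation, and copying
# while Firefox is running may yield an inconsistent snapshot.
import shutil
from datetime import date
from pathlib import Path

PROFILE = Path.home() / ".mozilla/firefox/abcd1234.default-release"  # hypothetical
BACKUP_DIR = Path.home() / "backups/firefox" / date.today().isoformat()

BACKUP_DIR.mkdir(parents=True, exist_ok=True)
for db in ["places.sqlite", "favicons.sqlite"]:
    source = PROFILE / db
    if source.exists():
        shutil.copy2(source, BACKUP_DIR / db)
```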

Hardware

Everything is hosted on a Raspberry Pi 3B+ that boots from an SSD. A second Raspberry Pi 3B+ with an SSD hosts Rclone and periodically backs up the main Pi’s WebDAV service.

Live Editing Wikidata: 50 Episodes and Counting

8th September 2021

It has been more than 17 months since Jan Ainali asked if I would be up for some live-streamed Wikidata editing. It seemed like an insightful and valuable thing to do. Now, over 50 episodes later, we have showcased more than 30 community tools and issued over 100 SPARQL queries, and even so, we don’t lack ideas for future content.

Tools Behind the Scenes

From day one, we have used StreamYard and broadcast to a handful of platforms. Having no experience at all with streaming, I found StreamYard a joy to use; with a helping hand from Jan, I got around the basics within a few minutes. Other tools I combine it with are Wikimedia’s URL shortener, for easily sharing URLs, and Firefox running a separate profile for the window I share. I often also have a separate Joplin window with notes and URLs I might need during the broadcast.

Finding Content

While I personally could make 50 episodes of me editing historical railway systems and making maps with the same tools over and over, I try not to. I think three main things help us broaden our content beyond our own ideas.

1. New community tools

Jan is particularly quick at picking up new tools that the community has come up with. I can’t keep up myself, and it has happened that I have showcased a tool as “new” when it had in fact been around for several years.

2. Awareness days

Awareness days have been a great way to explore and edit new types of content. They are so useful that we even created an awareness day calendar using Wikidata and SPARQL on stream once.

3. Events

There are plenty of themed events relating to Wikidata and Wikimedia; these provide content for us and new audiences for both us and the events.

What Makes a Good Episode

Initially, I improvised and did everything live, and I still do that sometimes, especially if I focus on editing and Wikidata content rather than on tools and workflows. An episode for which I have prepared a set of guiding points makes for better content. I guess my average preparation takes about 45 minutes, and whether I prepare or not depends largely on whether I find the time and motivation during the week. The most notable difference between a well-prepared episode and a less prepared one, on my end, is how many times I repeat the same things. Repeating something I just said is something I find myself doing when I improvise on the fly; if I instead need to move straight to the next point, the repetition does not occur.

Visual, reusable examples and visualizations make good popular content. While it appears that fancy graphs and maps draw the most viewers and views, just a few users picking up a powerful editing workflow or tool might be of much greater benefit to Wikidata and its community. We need a mix of these, and I hope we have that mix.

The pace is tricky, especially when doing semi-technical things while wanting to give viewers time to comment. It’s hit or miss on my end, and I have yet to figure out a good way to manage it, especially as it’s so dependent on the content.

Things to Improve

1. Backlog of Prepared Content

In the future, I would like to build up a “prepared content” backlog so that I can be confident in the quality even during weeks when I have a lot on my plate.

2. Blog Posts Describing Advanced Workflows

It’s not unusual that Jan or I share quite advanced workflows, and while a stream might be a good way to showcase them, it might not be the best medium to help someone adopt them. I have once or twice created write-ups for showcased workflows, but maybe I should do so before an episode, so that I can share the write-up at the end of the stream.

3. Ideas from the Community

Every now and then, we get a suggestion from the community for a topic or thing to talk about, but we could use more of this. Whether you discovered a tool you found useful or made one yourself, share it with us; we want to let more people know about it! We should prompt for this in more places.

4. Increase Interaction with Viewers

One pattern I think I observe is that if one person makes a comment that we can bring up on screen, it becomes more likely that someone else will comment too, even if it’s just a “Hi!”.

I’m still unsure what makes good content that encourages interaction with viewers. Maybe it’s good old editing, where we are far from experts on the data modeling we face, so that the audience can suggest things we can use and improve. Maybe we should talk more about the weather.

Where to Find Our Episodes

We tend to stream on Saturdays at 20:00 UTC+2. The best way to get notified is by subscribing to the Wikipedia Weekly Network on YouTube. Our past episodes are listed over at the Wikimedia Meta-wiki.

Using SPARQL in QGIS

17th November 2020

A couple of months back, I discovered the SPARQLing Unicorn QGIS Plugin, and it has been super useful for both data visualizations and Wikidata editing.

After installation, it adds a new option under the “Vector” menu in QGIS. Its interface allows one to access and query a set of predefined SPARQL endpoints, including Wikidata, as well as to point to a custom endpoint or RDF file.

screenshot

One of my use cases has been to fetch all glaciers in Wikidata missing GLIMS identifiers and then add a GLIMS layer from the official Shapefiles so that I can visually see the overlap.

screenshot
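The same kind of query can of course also be run outside QGIS. As a rough sketch, the script below asks the Wikidata endpoint for glaciers and their coordinates using the requests library; narrowing it down to glaciers missing a GLIMS identifier would just be an extra FILTER NOT EXISTS clause on the relevant property. The script name in the User-Agent header is a placeholder.

```python
# Rough sketch: run a Wikidata query similar to what the plugin issues,
# fetching glaciers (Q35666) and their coordinates (P625).
# A FILTER NOT EXISTS on the GLIMS identifier property would restrict
# the result to glaciers still missing that identifier.
import requests

QUERY = """
SELECT ?glacier ?glacierLabel ?coord WHERE {
  ?glacier wdt:P31/wdt:P279* wd:Q35666 ;  # instance of (a subclass of) glacier
           wdt:P625 ?coord .              # coordinate location
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "example-glacier-script/0.1"},  # placeholder agent
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row["glacierLabel"]["value"], row["coord"]["value"])
```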

I hope you will find this plugin useful as well.
