The big picture

Sidestnd is a cloud-based SaaS publishing system for storing, organising, sharing, and using metadata resources. It aims for simplicity, a long-term focus and zero lock-in. It is built around a repository that provides long-term access through REST or MCP interfaces and permanent URIs, which are system- or user-generated.

That is quite a mouthful! On this page, we'll dive into this statement to understand its components and significance. Even though Sidestnd focuses on a relatively small functional domain, there is quite a bit going on under the hood.

If you want to learn how Sidestnd works in practice, jump to the practical example in the section "Example: start using Sidestnd".

If you're just looking for a quick summary, see the table in the section "Summary of Sidestnd functionality".

Otherwise, keep reading.

What problem is Sidestnd trying to solve?

Sidestnd is all about metadata resources. Metadata resources are files that are used to support creating, modifying or querying information. Examples include controlled vocabularies, taxonomies, lists, schemas, and lists of constants.

Metadata resources provide context to data. Without context, data loses value. And with the wrong context, data becomes unreliable. Proper management of metadata resources is therefore a small but essential, and often overlooked, part of the data management puzzle.

There are two main challenges when dealing with metadata resources: change and standardisation.

Handling change

The main challenge when managing metadata resources is handling change. Metadata resources represent things in the real world, and as such they have to change when the real world changes. Additionally, the environment in which metadata resources are created, stored and used can change as well. Technological change, mergers and new functional requirements all lead to changes in the systems environment which, when not carefully managed, could lead to problems down the road.

Managing change does not come easily to all organisations. This is especially the case when change is forced from the outside, for example by a new government directive. Success depends on several factors, including the maturity of the change handling process. An immature process is characterised by lots of email flying around, ad-hoc use of generic tools like Excel, manual copying of files, and long turnaround times. This can easily lead to outright errors or to slightly different copies floating around the enterprise, both of which can lead to expensive cleanups later.

As long as data is not too old or the systems landscape is not too complicated, a less than optimal change process usually suffices. But when data outlives systems, cracks will start to appear. The older data gets, the more likely it is that there have been changes in the metadata resources that guided its creation. If these changes have not been handled properly, for example when the original metadata resource files were overwritten, then it becomes increasingly difficult to view data in context.

Many organisations are aware of these risks and have invested in data management tool suites. This is generally a good thing. However, such tools generally cover a lot more ground than just metadata resource management. The resulting complexity (in terms of implementation and operations) sometimes backfires, in the sense that the people doing the work find it is easier to continue the existing practice of using email, Excel and a file manager, thereby negating the investment in an expensive tool.

Handling standardisation

Enterprise data architects dream of standardisation. They know that small variations can be costly, for example because they require data cleanup pipelines in the business intelligence environment. At the same time, they've learned that data producers and consumers cannot or will not change their ways on command. Systems might have their own data silos, mergers might lead to new challenges, legacy systems might not be phased out: there are many reasons why big-bang standardisation is complicated and often fails.

A key issue is end-to-end management. While there often is some kind of central repository (at least for the more important metadata resources), it turns out that production systems have specific requirements with regard to change processes, storage and access, data format and syntax. A common problem is that a production system needs a specific file format and/or name that is different from the "source" file. In order not to pollute the repository, a manual connection (read: copy/paste and edit) between production system and repository is then as good (or brittle) as it gets.

A solution would be to change (standardise) the production system input, but this often fails. The resulting manual process keeps the systems humming but carries a serious risk of mismanaging metadata resources in the longer term. For example, it might be tempting to quickly fix an error in a file that is used by a particular system without actually communicating this change back to the owner of the file. Before long, different versions of what should have been the same thing are floating around.

A better solution is to standardise where possible and accept variety where necessary. In a well-designed environment there is no harm in letting production systems have it their own way, as long as this is known and embedded in proper processes. Then, when the time is right, standardisation will follow.

Having our cake and eating it

Centralisation and standardisation are key elements of any enterprise data management strategy. At the same time, decentralisation and diversity are things that are unavoidable in practice. The challenge is how to deal with these opposing directions. One way to do this is to use a tool that is designed to give nudges towards the former while still accepting the latter. Only then can the unavoidable changes be managed successfully.

What are the fundamental Sidestnd concepts?

Handling the conflicting demands of standardisation, centralisation, inertia and change requires certain design choices. Sidestnd was designed from the ground up using the following concepts:

  1. content-based resource identifiers;
  2. a data model that supports versions and manifestations;
  3. custom, virtual and permanent URIs;
  4. human + machine-readable interface with permanent URIs (HTML+REST+MCP);
  5. a publish-subscribe model;
  6. zero lock-in / 100% export;
  7. a balanced security model;
  8. publish as an organisation.

Collectively, these concepts contribute to making Sidestnd suitable for long-term management of metadata resources. Let's see what they all mean.

Content-based resource identifiers

Anyone who has worked with desktop computers knows that files are easily moved around, that file systems introduce arbitrary limitations, and that comparing files is time-consuming. The net result is that the same filename might have different contents or that the same content is stored in files with different names. This can be a serious problem, especially in an enterprise with lots of legacy systems and/or application silos.

The best way to solve this problem is to calculate a hash value from the file content itself. The resulting hash value is a fingerprint that, for all practical purposes, uniquely identifies the content. Any change in content, however small, will lead to a different hash value. Using hash values makes it easy to compare files, thereby solving one side of the problem (same content with different names).
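
As a minimal sketch of this idea: the text above does not say which hash function Sidestnd uses, so SHA-256 is an assumption here, and the function names are made up for illustration.

```python
import hashlib
from collections import defaultdict


def fingerprint(content: bytes) -> str:
    """Return a hex digest that acts as a content fingerprint.

    SHA-256 is an assumption; the hash function Sidestnd uses is not stated.
    """
    return hashlib.sha256(content).hexdigest()


def group_by_content(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by fingerprint, so identical content ends up together."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, content in files.items():
        groups[fingerprint(content)].append(name)
    return dict(groups)


if __name__ == "__main__":
    files = {
        "start_categories.json": b'{"categories": ["a", "b"]}',
        "startCategories.json": b'{"categories": ["a", "b"]}',   # same content, other name
        "start_categories_2022.json": b'{"categories": ["a"]}',  # different content
    }
    for digest, names in group_by_content(files).items():
        print(digest[:12], names)
```

Files that land in the same group have identical content, whatever they are called; files in different groups differ in at least one byte.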

Data model

The other side of the problem cannot be solved with hash values. If a file was edited for spelling errors, then the new hash value will be different even though there is clearly a logical relationship between the two files. We want to be able to express this similarity, because it allows us to group related files (which simplifies management considerably).

A data model defines entities and their relationships. Sidestnd has a data model that was inspired by FRBR (Functional Requirements for Bibliographic Records), a model that comes from library science. FRBR provides a vocabulary for grouping related files. We can then assign an identifier to such a group and use that as a "shortcut" instead of having (many) references to each individual file in the group.

Sidestnd exposes the group identifiers as URIs, similar to file URIs. This is an important feature because it means that other systems can not just link to files but also to the logical groupings.

The data model is further explored in the section "What is the data model?".

Permanent or virtual URIs

A fundamental principle of Sidestnd is "once published never change". The reason for this is user expectations — a user must be able to trust that a resource is unchanged. The benefit of such a rule is clarity. There will always be a temptation to make minor changes without publishing a new version; "It is just a comma / spelling error / etc.". However, this quickly becomes a slippery slope. It is much clearer to have an unequivocal rule, and an explicit mechanism for handling updates.

Unchangeable files go well with permanent URIs. A permanent URI always returns the same binary content. Each metadata resource in Sidestnd has a permanent URI. There can be as many permanent URIs for a metadata resource as required. Permanent URIs are the cornerstone of long-term management of metadata resources.

In some cases, permanent URIs are inconvenient or even logically impossible. For example, a user might be interested in a "/latest" or "/current" version. It is perfectly acceptable to use a "virtual" URI as long as its behaviour is clearly documented. The logical groupings in the FRBR data model are also exposed through virtual URIs as these are collections that never get smaller but can still grow over time and as such are changeable (see the data model for more info).
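
The difference between the two kinds of URI can be sketched as follows. Everything here is illustrative: the in-memory tables, the path layout and the function name are assumptions, not Sidestnd's actual implementation.

```python
# Permanent URIs: each maps to exactly one content fingerprint and is never reassigned.
# (Hypothetical paths and fingerprints, for illustration only.)
PERMANENT = {
    "/files/aaa111": "aaa111",
    "/files/bbb222": "bbb222",
}

# Versions of one work in publication order (oldest first); the list only grows.
VERSIONS = ["/files/aaa111", "/files/bbb222"]


def resolve(uri: str) -> str:
    """Return the permanent URI behind a request.

    A virtual URI such as ".../latest" is resolved at request time, so its
    answer may change when a new version is published. A permanent URI
    always resolves to itself.
    """
    if uri.endswith("/latest"):
        return VERSIONS[-1]
    if uri in PERMANENT:
        return uri
    raise KeyError(f"unknown URI: {uri}")
```

Publishing a new version only appends to the list, so existing permanent URIs keep returning the same content while "/latest" moves on.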

HTML+REST+MCP interfaces

The HTML interface is the premier, platform-independent way for users to interact with a system. Computers can interact with the HTML interface to get things done but it is generally better to not rely on this. A better solution is to add a REST interface that is optimised for computer interaction.

REST stands for Representational State Transfer. It is an architectural style for designing networked applications. It relies on a stateless, client-server, cacheable communications protocol - typically HTTP. A key benefit of a REST interface is that it is easy to use from the command line and by scripting languages.

A REST interface is not just useful for general automation, but also a key building block in realising a zero lock-in system. When the REST interface exposes all the data then a full download is only a simple Python script away.
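
To give an impression of what such a script could look like, here is a rough sketch. The base URL and the /works/{workGUID} convention come from this page, but the JSON layout (a "files" list with "name" and "uri" fields) is purely hypothetical; consult the REST documentation for the real response format.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable

BASE = "https://sidestand.groundworkdatamanagement.com.au"


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def export_work(work_guid: str, dest: Path,
                get: Callable[[str], bytes] = http_get) -> list[Path]:
    """Download every file listed under one work into `dest`.

    Assumes (hypothetically) that the work URI returns JSON of the form
    {"files": [{"name": ..., "uri": ...}, ...]}.
    """
    listing = json.loads(get(f"{BASE}/works/{work_guid}"))
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for entry in listing["files"]:  # hypothetical field names
        target = dest / entry["name"]
        target.write_bytes(get(BASE + entry["uri"]))
        saved.append(target)
    return saved
```

The `get` parameter exists so that the download function can be swapped out, for example to add authentication or to test the script without network access.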

MCP is similar to REST but focused on AI. Model Context Protocol (MCP) is an open standard developed by Anthropic that enables AI assistants to securely connect with external data sources and tools. More information about using MCP can be found here.

Publish-subscribe

Metadata resources are reflections of real-world things, and as the real world changes the metadata resources tend to change as well. The change frequency varies from daily to yearly or even longer. Especially when changes don't follow a specific timing pattern, it becomes easy to miss an update.

Publish-subscribe is a powerful mechanism for distributing such changes. Anyone interested can register to follow changes. When something changes, a notification is sent out to all followers. This works especially well in settings with irregular or infrequent changes.

It is not just easier for the subscriber - it also means that the publisher doesn't have to keep email distribution lists etc. Instead, the publisher can rely on the system to propagate changes to consumers who have taken action themselves (by registering).
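
The mechanism itself is simple. The following in-memory sketch shows the general publish-subscribe pattern; the class and method names are invented for illustration and say nothing about how Sidestnd delivers its notifications.

```python
from collections import defaultdict
from typing import Callable

Callback = Callable[[str, str], None]  # receives (topic, message)


class Notifier:
    """Minimal in-memory publish-subscribe hub."""

    def __init__(self) -> None:
        self._followers: dict[str, list[Callback]] = defaultdict(list)

    def follow(self, topic: str, callback: Callback) -> None:
        """Register interest in a topic, for example a work or an organisation."""
        self._followers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Notify every follower of the topic; return how many were notified."""
        followers = self._followers.get(topic, [])
        for callback in followers:
            callback(topic, message)
        return len(followers)
```

Note that the publisher never needs to know who the followers are: it only announces a change on a topic, and the hub takes care of the rest.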

Zero lock-in / 100% export

Lock-in is a problem when data outlives the systems that use it. It is especially problematic when systems are not set up to share with other applications but keep their own silos. As it is inevitable that systems will have to be replaced at one point, a system that is self-centred and unwilling to share its data will make it more likely that data is lost or corrupted.

The only way to prevent this from happening is to make a commitment to openness. By design, Sidestnd is fully open. The REST interface exposes all data, so it is easy to get the data out of Sidestnd. There is zero lock-in. It is not just the metadata resource files themselves, but also the organising structure and even things like custom URIs.

Being this open has additional benefits. For example, there is no need for a complicated custom backup system, because it is easy to write a script that leverages the REST interface.

Balanced security model

Metadata resources generally have limited security and privacy requirements because they don't contain user information or other business-critical information. This is one of the main reasons why Sidestnd focuses on metadata resources only: it makes it more acceptable to use a pragmatic security and privacy approach that would be immediately flagged as unacceptable if "real" data were in scope.

By doing this, Sidestnd avoids one of the traps of larger data management suites. Because these suites must deal with all kinds of data they tend to enforce a relatively complicated security model even when it is not strictly necessary for the content at hand. The resulting complexity is a nuisance at best and could possibly even lead to ad-hoc initiatives to build a separate process outside the suite.

Sidestnd offers the following security features:

  • security by obscurity: Sidestnd entities and their URIs use GUIDs which are effectively impossible to guess or predict. There is no global search function or global list of works;
  • roles: there is a difference between registered (logged-in) users and guests (guests see less);
  • organisations: there is controlled access for resources tied to an organisation (members see more than non-members).

The result is a balanced, more targeted security model, which is also a better fit for long-term access and storage.

Publish as an organisation

Publishing as an organisation has a number of benefits. First and foremost, it clarifies the fact that a particular metadata resource is not some personal initiative but that it falls under the responsibility of an organisation. Second, an organisation has a number of additional functionalities to block or give access to information. Third, users can subscribe to updates from the organisation.

The primary identifier for an organisation in Sidestnd is the domain name. The administrator of an organisation must use an email address with the same domain name part. Users with the same domain name will be added automatically as members of the organisation. The administrator can also add users with different email domains as members.
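
The automatic membership rule can be sketched as a simple comparison of domain names. This is an illustration only: the function names are made up, and treating subdomains as non-matching is an assumption rather than documented behaviour.

```python
def email_domain(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return domain.lower()


def is_automatic_member(address: str, organisation_domain: str) -> bool:
    """True when the address has exactly the organisation's domain.

    Assumption: subdomains (user@sub.example.com) do not match example.com.
    """
    return email_domain(address) == organisation_domain.lower()
```

Users for whom this check fails can still become members, but only when the administrator adds them explicitly.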

A common scenario is a federation of organisations, where some resources are relevant for all participating organisations and others are more local. This can be a stepping stone towards standardisation. The central organisation prescribes the usage of certain resources but allows for local variation. Sidestnd has a federation model that supports these scenarios (premium feature).


Running Sidestnd

Sidestnd runs in the cloud (AWS) as "Software as a Service" (SaaS). It is available at https://sidestand.groundworkdatamanagement.com.au. Running on AWS gives it excellent availability and performance characteristics.

Sidestnd is all about publishing and using metadata resource files. These files are organised into works, versions and manifestations. This is explained further in the data model section. Adding a new file to Sidestnd is very easy. The user is guided through a number of steps that lead to placement of the file under the appropriate work, version and manifestation grouping. It is at this moment that (permanent) URIs are constructed.

Once published, there are multiple options to share and invite other users. These users can then access the information, bookmark, follow, communicate with the owner, etc. A key characteristic is that users can rely on the fact that the information will not change.

A special feature is that users can clone a work. Cloning means that the user creates a copy of the work (in other words, becomes the administrator of a work with a new unique identifier). A clone can develop independently from the original, or it can still follow an original and receive notifications when new content is published under the original work (which can then be merged into the clone).

Metadata resources (files) have their own permanent URI, which is based on the content fingerprint. Users can access (download) a file using the permanent URI, but they can also add so-called aliases. An alias is a specific URI for a file. This is meant for situations where users have specific requirements or limitations, for example a hardcoded link. An alias is not a permanent URI; it can be assigned to return different content over time.

Finally, because it runs on AWS there is a real opportunity (in terms of scalability and availability) to let production systems get their metadata resources straight from the Sidestnd repository. Doing this can simplify deployment considerably; no more copy actions that go wrong because someone is on holiday. Even when this is not immediately technically possible for legacy systems, the ability to do so provides a growth path for the future. Another candidate for accessing metadata resources directly would be a Business Analytics environment, where this information can be used to optimise queries.

In some cases, it might be a disadvantage to run a metadata resource management tool under an external URI. Sidestnd can therefore be embedded in another website without that website having to host it. For example, suppose that website example.com wants to offer transparent access to Sidestnd. After defining an entry point like https://www.example.com/services/metadata and using the private context feature, Sidestnd URIs would become local from the perspective of users of example.com. So instead of https://sidestand.groundworkdatamanagement.com.au/works/{workguid}, users would see something like https://www.example.com/services/metadata/works/{workguid}. This is a Sidestnd Plus feature aimed at enterprises.
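
The URI mapping in this example can be expressed in a few lines. This sketch only illustrates the rewriting of addresses as described above; it is not how the private context feature is implemented, and the function name is hypothetical.

```python
from urllib.parse import urlsplit

SIDESTND_BASE = "https://sidestand.groundworkdatamanagement.com.au"


def to_local_uri(sidestnd_uri: str, entry_point: str) -> str:
    """Rewrite a Sidestnd URI so that it appears under a local entry point."""
    if not sidestnd_uri.startswith(SIDESTND_BASE + "/"):
        raise ValueError("not a Sidestnd URI")
    parts = urlsplit(sidestnd_uri)
    local = entry_point.rstrip("/") + parts.path
    if parts.query:
        local += "?" + parts.query
    return local
```

For instance, with the entry point https://www.example.com/services/metadata, a Sidestnd URI ending in /works/1234 maps to https://www.example.com/services/metadata/works/1234.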


The need for a separate tool for metadata resource management

Organisations are always looking for opportunities to centralise and simplify their ICT portfolio. There is usually a default "no" to any request for a specialised tool, let alone for something as exotic as management of metadata resources.

Someone who wants to change this must do several things. First, they must explain what kind of problem needs to be solved. Second, they need to prove that existing tools are not sufficiently helpful. Third, they will want to describe why a specialised tool is going to solve the problem.

Answers to these questions will be highly context-dependent. In general, the first explanation is likely to revolve around the keywords "data quality" and "long-term". If data is important to an organisation and if it is long-lasting, then quality issues tend to become visible over time. Making people aware of this risk is an important first step. Often, it helps to provide concrete examples where quality of decisions suffered or where additional expenses were needed.

The analysis of existing tools might point to their lack of functionality, overly complicated implementation and management, exploding costs, etc. It could also be an acknowledgement of the fact that the current tool landscape is heavily in flux because of the AI revolution. Given that things are changing so rapidly, one could argue that going all-in with a big data management investment might not be the wisest strategy now. At the same time, a focused, relatively small investment to get the basics right is never wasted effort.

Finally, the benefits of a specialised tool like Sidestnd can be shown. It generally does not hurt to mention that, by being small and focused, Sidestnd has better odds for long-term availability and data integrity than all-encompassing solutions. The latter have a tendency to get "upgraded" or "phased out". During such a project, a few broken identifiers are considered collateral damage, but this is exactly what breaks metadata resource management in the long run. Paradoxically, smaller can be safer.


Using existing tools does not always work

Metadata resource management is not new. So why not use an existing tool that does the job? It is true that there are many tools available, but the key insight is that successful long-term metadata resource management needs a combination of features that is not commonly available. So it is easy to point at individual tools having powerful features, but more difficult to find a tool that integrates these features in a way that works for metadata resources.

Many existing tools are either too small or too big. If a tool is too small, then it won't enforce certain best practices, or it will rely on other tools or user input to get things done. For example, a generic explorer-like file manager is easy to use, but doesn't enforce a scalable long-term organisation model like FRBR. It can be used productively by people who know what they are doing, but the odds are against that.
Some tools are too big, covering too much ground. If a tool is too big, then it usually suffers from complex installation, a steep learning curve, and a lack of focus. It is often quite expensive too. In such a situation, metadata resource management is just a small, overlooked part of the functionality.

We need something tailor-made that has just enough functionality but nothing more. What are non-negotiable features of such a tool?

First of all, there must be content-based identifiers for files instead of relying on names and paths in a file system. The only reliable way to be sure that two files are identical is to compare their content hashes. This is a truly foundational building block.
Second, we need a built-in, low-level data model optimised for grouping resources. This is for files that are different (in terms of content hash) but similar (in terms of meaning). We need something that enforces best practices and does not rely on the knowledge and discipline of the operator.
Third, we need to build our application around permanent URIs and REST. A URI must return the same binary content now and in the future. Also, we must open the system up for computers, not just human consumption through HTML. This enables all kinds of data transfer and scripting scenarios that are just too brittle otherwise.
Fourth, we need to limit complexity of installation and upkeep as much as possible. This is where using a cloud environment pays off, with its built-in scalability, reliability, and security.

Going over these points, each can probably be matched with a particular tool. But how about the combination of all four? The combination is the whole point. Sidestnd was designed from the ground up to combine features in a way that particularly benefits metadata resource management, whereas other tools may have different target audiences or business requirements. Sidestnd is focused on solving the long-term metadata resource management puzzle. Nothing more but certainly nothing less.


Summary of Sidestnd functionality

The following table lists the key features.

FEATURE                                       EXPLANATION
end-to-end approach                           combine storage, organisation and production usage in one SaaS environment
long-term focus                               permanent URIs based on immutable content (publish once, change never)
FRBR model                                    built-in classification / grouping model
publish-subscribe                             follow works and/or organisations with multi-channel notifications
full programmatic access                      REST interface makes it easy for computers to access data
zero lock-in                                  all data accessible through REST interface
aliases                                       every metadata resource can have an unlimited number of custom URIs (permanent or dynamic)
tags                                          use /latest, /current, etc. to get virtual URIs
support for files that reference other files  for example, a schema file that imports several components from other files
bulk upload                                   either upload a file at a time, or use a ZIP file to upload multiple files in one go
scalability; availability                     runs on AWS
federation                                    support for a model where parent organisations can mandate content but children can add their specific files (Sidestnd Plus)
embeddability                                 run it as part of your own website (Sidestnd Plus)
continuity                                    access to source code under certain conditions (Sidestnd Plus)


Example: start using Sidestnd

Suppose Jane is a data manager in a medium-size enterprise. There have been enterprise data standardisation efforts, but they were mostly focused on customer data. Jane notices that it is sometimes hard to get the right version of a controlled vocabulary or list that was used previously in production. After doing her research, she decides to start an initiative where Sidestnd is a central repository for the enterprise.

signup

Jane signs up with Sidestnd by using her company email address. After completing the signup procedure, she logs in and goes to her dashboard. Here, she creates a new Organisation of which she will be administrator. Now she can go to the Organisation start page and do Organisation-related maintenance, such as inviting other users.

uploading and organising metadata files

Jane has a number of copies on her local drive. She also has access to the enterprise filesystem. She selects a number of high-value metadata resources. Most of these are text files, but some of them are PDF or DOC. With Sidestnd open in the browser, she starts with a file named "start_categories.json". This is a file that is used by a production system named FGT.

To start, Jane clicks "publish" in Sidestnd. She clicks "publish as organisation" and selects the file for upload. This is where the FRBR model comes into play. As Jane has not published before, it is easy: she has to create a new work. A work represents the content in its most generic form, regardless of versions or manifestations. Jane names the new work Start_Categories_FGT. She could have used another name, but she thinks this one is sufficiently clear.

Now she has to choose a version name. A version represents a specific state of the content, irrespective of syntax or file format. Jane picks 04-2025 (a month-year date format) as the version name.

Finally, she has to make a decision about the manifestation name. A manifestation is the specific "implementation". In this case, it is a JSON file. In most cases, using the file extension is a good manifestation name. Sometimes, a more elaborate name like "json generated by X" is more appropriate. This is fine, as long as the manifestation name is unique within the containing version.

Jane is now ready to publish the first work for her organisation. She clicks OK, and she is redirected to the new work. Here, she can add information to the work, version or manifestation (things like a description, status, category, access, etc.).

distributing / sharing content

When Jane goes back to her organisation's start page, the work that was just published will be listed. After clicking on the work link, she lands in the work page for Start_Categories_FGT. This work has a permanent URI that follows the convention /works/{workGUID}.

Suppose that Jane wants to share the work with coworker Joe. She can copy the URI and compose an email with the link herself, or she can click on the share icon next to the work's title. Here, she can enter Joe's email address and Sidestnd will send the email to Joe.

When Joe receives the email with a link to the Sidestnd work URI, he can have a look in his browser as a guest and then sign up as a regular user. If he uses an email address with the same domain name as the organisation, then he will be added automatically as a member. This has several benefits. As a registered user, he can follow the work. If new versions or manifestations are added, he will be notified. He can also decide to follow the organisation, which means that he will receive a notification when the organisation publishes a new work. And of course, as a registered user he will be able to publish works himself.

adding files

It turns out that there are different versions of the start_categories file floating around. Jane finds three files: startCategories.json; start_categories_2022.json; startcategories.xls.

The first file is identical to the one she just published, only the name is different. This is a common problem - the same content under different names. Jane decides to add startCategories.json as an alias to the existing manifestation. She goes to the work page, finds the manifestation and clicks the File Aliases plus sign.

The second file has content that is similar to, but different from, the file that was published; it is clearly an older version. Even though she is unsure how relevant the file is today, Jane decides to add it to the work to prevent confusion in the future. Because Jane is the owner of the original work, she has a plus icon next to Versions. She clicks the plus icon, selects the file in question and creates a new version named 2022 (only the year, because she doesn't know the month). For the manifestation name, she accepts the default json value. She clicks OK and the work now has a second version with one manifestation.

The third file startcategories.xls turns out to have the same content as the original start_categories.json file, but it is using a different file format. Jane understands that this will be a new manifestation in the FRBR model.
She goes to the 04-2025 version in the work page and clicks the plus icon next to the manifestations label. She selects the file and accepts the default name "xls". After clicking OK, the version 04-2025 will have two manifestations: json and xls.
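
The structure that Jane has built up so far can be sketched with three small classes. This is a simplified illustration of the work / version / manifestation grouping described in this example, not Sidestnd's internal data model; the class names, field names and placeholder fingerprints are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Manifestation:
    name: str         # for example "json" or "xls"
    fingerprint: str  # content hash of the uploaded file


@dataclass
class Version:
    name: str         # for example "04-2025"
    manifestations: dict[str, Manifestation] = field(default_factory=dict)

    def add(self, manifestation: Manifestation) -> None:
        """Add a manifestation; its name must be unique within this version."""
        if manifestation.name in self.manifestations:
            raise ValueError(f"duplicate manifestation name: {manifestation.name}")
        self.manifestations[manifestation.name] = manifestation


@dataclass
class Work:
    name: str         # for example "Start_Categories_FGT"
    versions: dict[str, Version] = field(default_factory=dict)

    def version(self, name: str) -> Version:
        """Return the named version, creating it on first use."""
        return self.versions.setdefault(name, Version(name))


if __name__ == "__main__":
    work = Work("Start_Categories_FGT")
    work.version("04-2025").add(Manifestation("json", "fingerprint-1"))  # placeholder hashes
    work.version("2022").add(Manifestation("json", "fingerprint-2"))
    work.version("04-2025").add(Manifestation("xls", "fingerprint-3"))
    for version in work.versions.values():
        print(version.name, sorted(version.manifestations))
```

The same manifestation name ("json") can occur under different versions, but adding a second "json" to the same version is rejected.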

Finally, Jane comes across a generic README file that was written by a former employee. She decides that it would be valuable to store this document. Sidestnd has a feature called "supporting material". A supporting document can be attached to a work or to a version. In this case, Jane decides that the README file applies to all versions, so she clicks the plus sign next to supporting material in the work-level properties. Here, she can choose the role of the document - in this case Readme - and the file that will be uploaded. From now on, anyone visiting the work page will also have access to files that are about the metadata resources.

using information

Rob is a programmer working on system FGT. He regularly gets emails from business analysts asking for the start categories (because the business analysts want to understand the data that is produced by FGT).

Up until now, he would reply by email with the files attached. He has tried using the company-wide file management system, but the business analysts can't find anything there, so he still gets the emails. At that point he might as well attach the file instead of pointing the business analysts to the right location.

With Sidestnd, Rob finds that he does not have to do as much work. He has reached out to the business analysts and shown them how to follow metadata resources. If they do this, they will immediately know when a new version or manifestation is published. This prevents email conversations and reduces Rob's workload, because he no longer has to contact possible users himself.
Also, the Sidestnd REST interface makes it easy to write a script that uses the metadata files. As soon as the business analysts realised this, they rewrote their pipelines to access the metadata files directly instead of making local copies.
And finally, Rob is working on an update that will make FGT - the system - get the metadata file directly from Sidestnd instead of using a copy/paste into the Docker filesystem. Now, when a new version of the metadata resource becomes available, the system will be updated automatically.
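
A loader along the lines of what Rob is building might look like the sketch below. It is hedged in several ways: the URI passed in is whatever permanent URI or alias applies, SHA-256 is an assumed fingerprint algorithm, and the function names are hypothetical.

```python
import hashlib
import json
import urllib.request
from typing import Callable, Optional


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def load_metadata(uri: str, expected_sha256: Optional[str] = None,
                  get: Callable[[str], bytes] = http_get) -> dict:
    """Fetch a JSON metadata resource and optionally verify its fingerprint.

    SHA-256 is an assumption; use whichever hash the repository reports.
    """
    body = get(uri)
    if expected_sha256 is not None:
        actual = hashlib.sha256(body).hexdigest()
        if actual != expected_sha256:
            raise ValueError(f"content mismatch: expected {expected_sha256}, got {actual}")
    return json.loads(body)
```

Pinning an expected fingerprint makes sense when the system must run against one exact file; leaving it out fits a URI that is meant to follow new versions.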

next steps

After publishing one work there is always more. Sidestnd will scale to thousands of files easily (and if you're wondering: yes, there is a bulk upload tool called flex). So why not have a go at it?
