Matching the labels of new datasets to already defined labels greatly increases the consistency of the database and makes it much easier to find relevant data. It is therefore a core requirement for incoming datasets. Label matching can be tedious work, however, which is why the IEDC now offers a probabilistic, semantics-based label matching tool built on the nomic-embed-text:v1.5 model, run with the Ollama large language model runner.
For example, the incoming label "battery vehicle" is matched with the already defined label "passenger vehicle: battery electric vehicle", which is the right match. Likewise, "screen" is matched to "monitor", "UK" to "United Kingdom", and so on. Such matches cannot be found by string similarity alone; they require a language model that has learned semantics from its training data.
How does it work in practice? Simply choose one of the IEDC's main classifications and enter a label or a list of labels to discover the semantically most similar labels in the IEDC database:
https://www.database.industrialecology.uni-freiburg.de/labelMatching.aspx
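
For readers curious about the idea behind semantic matching, here is a minimal Python sketch. It is not the IEDC's implementation, which is not described here; it only assumes that each label is turned into a numeric vector (an embedding) and that the candidate with the highest cosine similarity is proposed as the match. The three-dimensional vectors and the helper names `cosine_similarity` and `rank_labels` are made up for illustration; real embeddings would come from a model such as nomic-embed-text and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_labels(query_vec, label_vecs, top_k=3):
    """Return the top_k (label, similarity) pairs, most similar first."""
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in label_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings of already defined labels (invented values).
TOY_LABEL_VECS = {
    "passenger vehicle: battery electric vehicle": [0.9, 0.1, 0.0],
    "monitor": [0.0, 0.9, 0.1],
    "United Kingdom": [0.1, 0.0, 0.9],
}

# Hypothetical toy embeddings of incoming labels (invented values).
TOY_QUERY_VECS = {
    "battery vehicle": [0.8, 0.2, 0.1],
    "screen": [0.1, 0.8, 0.2],
    "UK": [0.0, 0.1, 0.95],
}

for query, vec in TOY_QUERY_VECS.items():
    best_label, score = rank_labels(vec, TOY_LABEL_VECS, top_k=1)[0]
    print(f"{query!r} -> {best_label!r} (similarity {score:.2f})")
```

Because the comparison happens between vectors rather than character strings, "screen" and "monitor" can end up close together even though they share almost no letters, which is exactly what string similarity cannot capture.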