Sebastian Lasse

@wikipedia

We need to meet about “federating wikipedia” and thus wikidata.
It is easy, and @maxlath has already done so many things in this field.
The ActivityPub protocol we are writing here is multilingual too.
"nameMap" = label
"summaryMap" = description
"contentMap" = wikipedia
Any claim or qualified statement from wikidata can be a "Relationship" (with e.g. startTime, endTime, etc.), all just like in wikibase.
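A minimal sketch of what I mean (values purely illustrative; only Q42 and the property P69 “educated at” are real Wikidata IDs):

```typescript
// A Wikidata item as an ActivityStreams object:
// labels -> "nameMap", descriptions -> "summaryMap", article text -> "contentMap".
const item = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Object",
  id: "https://www.wikidata.org/entity/Q42",
  nameMap: { en: "Douglas Adams", de: "Douglas Adams" },
  summaryMap: { en: "English writer and humorist", de: "britischer Schriftsteller" },
  contentMap: { en: "Douglas Adams was an English author …" }
};

// A qualified statement as a "Relationship", with the qualifiers
// mapped to startTime/endTime (object item and dates are placeholders):
const statement = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Relationship",
  subject: "https://www.wikidata.org/entity/Q42",
  relationship: "https://www.wikidata.org/entity/P69", // “educated at”
  object: "https://www.wikidata.org/entity/Q0",        // placeholder item ID
  startTime: "1971-01-01T00:00:00Z",
  endTime: "1974-01-01T00:00:00Z"
};
```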

We can really do better than mastodon.
You know what languages your user speaks. Even without any settings, navigator is your friend.
I wrote github.com/redaktor/languages (852 languages), and for the browser you send the user only the language fingerprints they need, so your system always knows which language they are typing in.
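Something like this (a minimal sketch; `loadFingerprints` and `detect` are hypothetical stand-ins, not the actual API of the repo):

```typescript
// Hypothetical helpers, not the real API of github.com/redaktor/languages:
declare function loadFingerprints(langs: string[]): Promise<unknown[]>;
declare function detect(text: string, fingerprints: unknown[]): string;

// Use the browser's declared languages to decide which fingerprints to ship.
async function fingerprintsForUser() {
  // e.g. ["de-DE", "de", "en-US"] -> ["de", "en"]
  const preferred = [...new Set(navigator.languages.map(tag => tag.split("-")[0]))];
  return loadFingerprints(preferred);
}

// Later, when the user types, detection only has to pick among those few:
//   const lang = detect(inputText, await fingerprintsForUser());
```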
Any claim can federate, any top instance is a `type` etc.
The thing I am writing for the fediverse will benefit from wikidata's multilanguage support, and from wikidata itself, because any item can become a “Topic” [like a hashtag as a Service Actor you can subscribe to].
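Roughly like this (all IDs and URLs are made up; subscribing is just a normal Follow):

```typescript
// A wikidata item exposed as a subscribable "Topic": an ActivityPub Service actor.
// Every ID/URL here is made up for illustration.
const topic = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Service",
  id: "https://topics.example/Q42",
  preferredUsername: "Q42",
  nameMap: { en: "Douglas Adams", de: "Douglas Adams" },
  inbox: "https://topics.example/Q42/inbox",
  outbox: "https://topics.example/Q42/outbox",
  followers: "https://topics.example/Q42/followers"
};

// Subscribing to the topic is an ordinary Follow activity:
const follow = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Follow",
  actor: "https://social.example/users/alice",
  object: topic.id
};
```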
Currently I am doing a Service Actor for federated geocoding.

3 comments
maxlath

@sl007 nice language detection lib! I was trying to get somewhere in that direction with just unicode github.com/maxlath/unicode-scr, but this looks much more powerful! What is the source for the fingerprints?

Sebastian Lasse

@maxlath

Most powerful would probably be: both :)
Also we need to mention that of the 850 languages, only about 400 are really “spoken” languages.
It was driven by the desire to give a voice to minor languages, so after the reportages in Papua New Guinea the source was a lot of tinkering: as far as I remember, about 400 were trained from wikibase and about 100 from a mix of specific dictionaries and web sources.
We could derive the “language variants” via BCP-47, blowing it up to maybe 700.
The rest (about 150, mostly minor languages) was driven by the desire to find anything you can find on this English-dominated internet, even in papers about the language.

Anyway: my belief is that any fedi onboarding [just in: smashingmagazine.com/2023/04/d]
should include the user saying
“This is my native language (100%) and I speak these others 10%-99% perfectly …”
Then it gets precise, because it limits detection to a selection where we should have enough difference.
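As a sketch (all names made up), that declared profile simply becomes the candidate set for detection:

```typescript
// Sketch of the onboarding idea: one native language (100%) plus a few
// self-rated others, and detection is limited to exactly this set.
interface LanguageProfile {
  native: string;                  // BCP-47 tag, counts as 100%
  others: Record<string, number>;  // tag -> self-rated 10–99 (%)
}

const profile: LanguageProfile = {
  native: "de",
  others: { en: 90, fr: 40, tpi: 20 } // tpi = Tok Pisin
};

// The detector only has to tell these few apart, which keeps it precise.
const candidates = [profile.native, ...Object.keys(profile.others)];
```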
🧵 1/2

Sebastian Lasse

@maxlath

Otherwise the languages are limited to the script used (I think it did not cover mixed scripts; something to think about).
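For the script part, a tiny sketch of what I mean (Unicode script property escapes; mixed-script text would still need extra handling):

```typescript
// "Limited to the script used": the dominant Unicode script of a text
// already rules out most candidate languages.
const scripts: Record<string, RegExp> = {
  Latin: /\p{Script=Latin}/gu,
  Cyrillic: /\p{Script=Cyrillic}/gu,
  Arabic: /\p{Script=Arabic}/gu,
  Han: /\p{Script=Han}/gu
};

function dominantScript(text: string): string | undefined {
  let best: string | undefined;
  let bestCount = 0;
  for (const [name, re] of Object.entries(scripts)) {
    const count = (text.match(re) ?? []).length;
    if (count > bestCount) { best = name; bestCount = count; }
  }
  return best;
}

// dominantScript("Добрый день") -> "Cyrillic", so Latin-script languages drop out.
```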

Looking into yours now.
