Sebastian Lasse

@wikipedia

We need to meet about “federating wikipedia” and thus wikidata.
It is easy, and @maxlath has already done so many things in this field.
The ActivityPub protocol we are writing here is multilingual too.
"nameMap" = label
"summaryMap" = description
"contentMap" = wikipedia
Any claim or qualified statement from wikidata can be a "Relationship" (with e.g. startTime, endTime, etc.), all just like in wikibase.
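A minimal sketch of what I mean (values purely illustrative; only Q42 and the property P69 “educated at” are real Wikidata IDs):

```typescript
// A Wikidata item as an ActivityStreams object:
// labels -> "nameMap", descriptions -> "summaryMap", article text -> "contentMap".
const item = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Object",
  id: "https://www.wikidata.org/entity/Q42",
  nameMap: { en: "Douglas Adams", de: "Douglas Adams" },
  summaryMap: { en: "English writer and humorist", de: "britischer Schriftsteller" },
  contentMap: { en: "Douglas Adams was an English author …" }
};

// A qualified statement as a "Relationship", with the qualifiers
// mapped to startTime/endTime (object item and dates are placeholders):
const statement = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Relationship",
  subject: "https://www.wikidata.org/entity/Q42",
  relationship: "https://www.wikidata.org/entity/P69", // “educated at”
  object: "https://www.wikidata.org/entity/Q0",        // placeholder item ID
  startTime: "1971-01-01T00:00:00Z",
  endTime: "1974-01-01T00:00:00Z"
};
```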

We can really do better than mastodon.
You know what languages your user speaks. Even without any settings, navigator is your friend.
I wrote github.com/redaktor/languages (852 languages), and for the browser you send the user only the language fingerprints they need, so your system always knows which language they are typing in.
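Something like this (a minimal sketch; `loadFingerprints` and `detect` are hypothetical stand-ins, not the actual API of the repo):

```typescript
// Hypothetical helpers, not the real API of github.com/redaktor/languages:
declare function loadFingerprints(langs: string[]): Promise<unknown[]>;
declare function detect(text: string, fingerprints: unknown[]): string;

// Use the browser's declared languages to decide which fingerprints to ship.
async function fingerprintsForUser() {
  // e.g. ["de-DE", "de", "en-US"] -> ["de", "en"]
  const preferred = [...new Set(navigator.languages.map(tag => tag.split("-")[0]))];
  return loadFingerprints(preferred);
}

// Later, when the user types, detection only has to pick among those few:
//   const lang = detect(inputText, await fingerprintsForUser());
```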
Any claim can federate, any top instance is a `type` etc.
The thing I am writing for the fediverse will benefit from wikidata's multilanguage support, and from wikidata itself, because any item can become a “Topic” [like a hashtag as a Service Actor you can subscribe to].
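Roughly like this (all IDs and URLs are made up; subscribing is just a normal Follow):

```typescript
// A wikidata item exposed as a subscribable "Topic": an ActivityPub Service actor.
// Every ID/URL here is made up for illustration.
const topic = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Service",
  id: "https://topics.example/Q42",
  preferredUsername: "Q42",
  nameMap: { en: "Douglas Adams", de: "Douglas Adams" },
  inbox: "https://topics.example/Q42/inbox",
  outbox: "https://topics.example/Q42/outbox",
  followers: "https://topics.example/Q42/followers"
};

// Subscribing to the topic is an ordinary Follow activity:
const follow = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Follow",
  actor: "https://social.example/users/alice",
  object: topic.id
};
```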
Currently I am doing a Service Actor for federated geocoding.

3 comments
maxlath

@sl007 nice language detection lib! I was trying to get somewhere in that direction with just unicode github.com/maxlath/unicode-scr, but this looks much more powerful! What is the source for the fingerprints?

Sebastian Lasse

@maxlath

Most powerful would probably be: both :)
Also we need to mention that of the 850 languages, only about 400 are really “spoken” languages.
It was driven by the desire to give a voice to minor languages, so after the reportages in Papua New Guinea the source was a lot of tinkering: as far as I remember, about 400 were trained from wikibase and about 100 from a mix of specific dictionaries and web sources.
We could derive the “language variants” via BCP-47, blowing it up to maybe 700.
The rest (about 150, mostly minor languages) was driven by the desire to find anything you can find on this English-dominated internet, even in papers about the language.

Anyway: my belief is that any fedi onboarding [just in: smashingmagazine.com/2023/04/d]
should include the user saying
“This is my native language (100%) and I speak these others 10%-99% perfectly …”
Then it gets precise, because it limits detection to a selection where we should have enough difference.
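As a sketch (all names made up), that declared profile simply becomes the candidate set for detection:

```typescript
// Sketch of the onboarding idea: one native language (100%) plus a few
// self-rated others, and detection is limited to exactly this set.
interface LanguageProfile {
  native: string;                  // BCP-47 tag, counts as 100%
  others: Record<string, number>;  // tag -> self-rated 10–99 (%)
}

const profile: LanguageProfile = {
  native: "de",
  others: { en: 90, fr: 40, tpi: 20 } // tpi = Tok Pisin
};

// The detector only has to tell these few apart, which keeps it precise.
const candidates = [profile.native, ...Object.keys(profile.others)];
```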
🧵 1/2

Sebastian Lasse

@maxlath

Otherwise the languages are limited to the script used (I think it did not cover mixed scripts; something to think about).
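For the script part, a tiny sketch of what I mean (Unicode script property escapes; mixed-script text would still need extra handling):

```typescript
// "Limited to the script used": the dominant Unicode script of a text
// already rules out most candidate languages.
const scripts: Record<string, RegExp> = {
  Latin: /\p{Script=Latin}/gu,
  Cyrillic: /\p{Script=Cyrillic}/gu,
  Arabic: /\p{Script=Arabic}/gu,
  Han: /\p{Script=Han}/gu
};

function dominantScript(text: string): string | undefined {
  let best: string | undefined;
  let bestCount = 0;
  for (const [name, re] of Object.entries(scripts)) {
    const count = (text.match(re) ?? []).length;
    if (count > bestCount) { best = name; bestCount = count; }
  }
  return best;
}

// dominantScript("Добрый день") -> "Cyrillic", so Latin-script languages drop out.
```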

Looking into yours now.
