mhoye

If this is reliable, this is "take something that needed a datacenter last year and do it on a phone this year" material.

XaiaX

@mhoye I don’t know, seems like you would need the storage space to store all the references you’re comparing to, even if the computation is easy. Sounds like a time/space trade-off.

Τοπάζ Αλαιν Φογτια Αννα Εμιλια

@XaiaX @mhoye but that's almost already the case, existing models are pretty large. I'd worry more about the computational complexity and the representational examples to classify against: O(|to_classify| * |reference_dataset| * maxlen(to_classify {minkowski_plus} reference_dataset))
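
[For context, the kind of classifier being discussed here can be sketched in a few lines. This is a hedged illustration only, assuming gzip and normalized compression distance (NCD) as the "compression + reference dataset" model; the function names and the toy reference set are made up for the example. It also shows concretely where the O(|to_classify| * |reference_dataset|) pairwise cost comes from.]

```python
import gzip

def clen(s: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of s."""
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings:
    small when compressing them together saves a lot, i.e. they share structure."""
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(text: str, reference: list[tuple[str, str]]) -> str:
    """1-nearest-neighbour label by NCD against every reference example.
    One compression of a concatenated pair per reference item -- this loop
    is the |to_classify| * |reference_dataset| factor in the complexity."""
    return min(reference, key=lambda pair: ncd(text, pair[0]))[1]

# Toy reference dataset (label per example), purely illustrative.
reference = [
    ("the cat sat on the mat and purred", "animals"),
    ("the stock market fell sharply today", "finance"),
]
label = classify("a kitten purred softly on the rug", reference)
print(label)
```

Note the trade-off XaiaX raised: the "model" here is just the raw reference texts, so storage scales with the dataset, while every query pays a full pass over it.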

mhoye

@fogti @XaiaX So, I don't actually think that acres of storage space is all that meaningful if you just want to focus on good-enough utility. We all kind of know you're not getting more than varying heats of spicy autocomplete out of these tools, and the core insight of statistics as a field is that you can make a very accurate approximation of the state of a large dataset out of a small subset of that data. So, maybe a wikipedia dump and a gutenberg mirror is plenty?

Τοπάζ Αλαιν Φογτια Αννα Εμιλια

@mhoye @XaiaX > spicy autocomplete

huh, nah: the topic here is text classification, which is similar to text prediction (and there is probably a way to use Huffman tables to produce suggestions, etc.), but not the same.

> So, maybe a wikipedia dump and a gutenberg mirror is plenty?

probably yes. (imo the coolest thing would be producing a tool that could both classify and predict using the same infrastructure: lightly preprocessed large text dumps (<10GiB) would be a massive improvement)
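
[The prediction half mentioned above can be sketched too. This is an assumption-laden illustration, not anyone's actual tool: a bigram frequency table over the same text dump, where suggesting the highest-frequency continuation is exactly what a Huffman coder's code lengths would rank first (shortest code = most probable symbol). All names and the toy corpus are invented for the example.]

```python
from collections import Counter, defaultdict

def build_model(corpus: str) -> dict:
    """Count, for each word, which words follow it in the corpus."""
    follow = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        follow[prev][nxt] += 1
    return follow

def suggest(model: dict, prev: str, k: int = 3) -> list[str]:
    """Top-k continuations by frequency -- the words a Huffman coder
    built over this context would assign the shortest codes."""
    return [w for w, _ in model[prev].most_common(k)]

# Toy corpus standing in for a preprocessed text dump.
model = build_model("the cat sat on the mat the cat ran on the rug")
print(suggest(model, "the"))  # 'cat' ranks first (most frequent follower)
```

The same frequency structure could back both tasks: ranked continuations for prediction, and per-class frequency tables for classification, which is the shared-infrastructure idea.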

mhoye

@fogti @XaiaX I'm a bit more interested in predictive text tools that give you stylistic nudges towards artists you admire, and finding a way to get artists paid for that. Smart autocomplete/smart fill tools that answer, "what might Degas have done right here?"

Τοπάζ Αλαιν Φογτια Αννα Εμιλια

@mhoye @XaiaX the interesting thing is that it is probably *much* easier to simultaneously get the information for text completion *and* which authors were involved in that match (just the relevant subset, instead of a large pool of authors). [in the case of these compression-decompression + reference dataset models]

Matt Stratford

@mhoye @fogti @XaiaX

Idle thought that LaTeX equations rendering properly would be a pretty good use case for a Mastodon client, given the overall geeky clientele on here! cc. @ivory
