Dmitri Kalintsev

@skysailor @inthehands Could you say more about the "documentable" bit?

Context for the question: every time you train a large language model you will get a different set of weights that affect what the resulting model will do, even if you run the training against the same source data set. In essence, you can't quite predict what your trained model will do from looking at the source data and the training parameters, so it's not too far from the HR folks silently thinking and feeling after all.

Sky, Cozy Goth Prince of Cats

@dkalintsev @inthehands But in addition to being able to look at the training set, you can test the trained model, and even do so without it "knowing" you're doing that (whereas if you brought a human a bunch of test resumes mid-lawsuit, they'd probably alter their behavior).

Paul Cantrell

@skysailor @dkalintsev
Indeed, and you can even do an A/B test of the model varying just one detail without the model “knowing” you’re playing a trick on it.
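
A minimal sketch of that kind of paired A/B test in Python, assuming the screening model can be called as a plain function on a structured resume. The `Resume` fields, `paired_test`, and the deliberately biased `toy_model` are hypothetical stand-ins for illustration, not any vendor's actual API. Each resume is scored twice, once as-is and once with a single field swapped, and the difference is reported.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable


@dataclass(frozen=True)
class Resume:
    # Hypothetical fields, for illustration only.
    name: str
    years_experience: int
    degree: str


def paired_test(model: Callable[[Resume], float], resumes: list[Resume],
                field: str, alt_value: Any) -> list[float]:
    """For each resume, return model(variant) - model(original), where the
    variant is identical except that `field` is set to `alt_value`."""
    diffs = []
    for original in resumes:
        variant = replace(original, **{field: alt_value})
        diffs.append(model(variant) - model(original))
    return diffs


def toy_model(r: Resume) -> float:
    # Hypothetical screener with a hidden name-based bias the test should expose.
    score = r.years_experience
    if r.name.startswith("J"):
        score += 5
    return score


resumes = [Resume("Alex", 3, "BSc"), Resume("Sam", 7, "MSc")]
print(paired_test(toy_model, resumes, "name", "Jordan"))  # → [5, 5]
print(paired_test(toy_model, resumes, "degree", "PhD"))   # → [0, 0]
```

A nonzero difference on a field that should be irrelevant (here, the name) is exactly the kind of evidence a human reviewer could never be probed for so cleanly.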

Dmitri Kalintsev

@skysailor @inthehands true, you can test the output. I suspect tho that the model will respond differently to the same input fed to it multiple times if the seed varies. And if you don't vary the seed, how do you know that a different one won't produce the results you don't want? Do you then iterate through the entire seed space?

Dmitri Kalintsev

@skysailor @inthehands thinking a bit more about it, I suppose you could test for a specific random seed and then always use that value.
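
To illustrate why pinning the seed makes inference repeatable, here is a toy sketch in Python. It assumes the output is drawn by temperature sampling from a fixed set of scores; `sample_tokens` and the `logits` values are hypothetical, and a real language model samples token by token, with each step conditioned on the previous ones. The narrow point it shows: with a private RNG seeded the same way, the same model and input give the same draws.

```python
import math
import random


def sample_tokens(logits: dict[str, float], n: int, seed: int,
                  temperature: float = 1.0) -> list[str]:
    """Draw n tokens from softmax(logits / temperature) using a seeded RNG."""
    rng = random.Random(seed)  # private RNG: the seed alone determines the draws
    tokens = list(logits)
    # random.choices normalizes the weights, so exp() values suffice here.
    weights = [math.exp(logits[t] / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)


logits = {"hire": 1.2, "reject": 1.0, "interview": 0.8}  # hypothetical scores
a = sample_tokens(logits, 10, seed=42)
b = sample_tokens(logits, 10, seed=42)
print(a == b)  # → True
```

This only pins down sampling for one already-trained model on the same software; it says nothing about whether retraining would reproduce that model.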

Sky, Cozy Goth Prince of Cats

@dkalintsev @inthehands Hmm.

Thinking from a tech POV, I guess what I would want to know is:
Did the algorithm incorporate a random seed during post-training use? (Since, as far as I can tell, they're often just used during training/testing before deployment.)

If so, which seed settings did the vendor/employer recommend/use when making the sued-over hiring decisions?

Sky, Cozy Goth Prince of Cats

@dkalintsev @inthehands Thinking from a court POV, you'd probably (1) be looking at what seed(s) the vendor/employer actually used, and (2) have the opposing sides' attorneys trying out different seeds to see what favored their arguments best, and the court being left to decide what to make of that.

Sky, Cozy Goth Prince of Cats

@dkalintsev @inthehands There's a huge human element here in terms of the ability of the attorneys and their experts to explain that part of the tech and its relevance, the ability of the judge/jury to understand and interpret that, and how persuasive those explanations are as to convincing the judge/jury to favor one side or another.

Dmitri Kalintsev

@skysailor @inthehands oh, I can see that.

Regarding the seed, there would be one used for training and then another for inference.

From my admittedly limited and slightly orthogonal experience - I've played with image gen models, not language ones: you can get the same output from a given trained model if you feed it the same prompt and the same seed. But you can't train another copy of that model, even using the same source data, training parameters, and seed. Your "supposedly same" model will generate completely different outputs, even with the same prompt and inference seed. Sigh. This is all such alchemy. :(

Paul Cantrell

@dkalintsev @skysailor I suspect all this is a bit of a red herring. With a machine model, you can do things you could •never• do with an HR dept: Run it on 10 million resumes. Run it repeatedly on the same resumes, altering one variable. Random? Run it on each 1000x. It’s a kind of broad testing that, should a court allow, would make many of the questions above evaporate.
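
A minimal sketch of that brute-force approach in Python, assuming a screener whose decision is random from run to run. `toy_screener`, its probabilities, and the dict-based resumes are hypothetical. The idea is to estimate each resume's advance rate over many runs (1,000 by default, echoing the 1000x above) and compare rates for pairs that differ in only one field, so run-to-run randomness averages out instead of hinging on any single seed.

```python
import random
from statistics import mean
from typing import Any, Callable

# Hypothetical stochastic screener: (resume, rng) -> True if the resume advances.
Screener = Callable[[dict, random.Random], bool]


def advance_rate(model: Screener, resume: dict, runs: int,
                 rng: random.Random) -> float:
    """Fraction of `runs` independent runs in which `model` advances `resume`."""
    return mean(1.0 if model(resume, rng) else 0.0 for _ in range(runs))


def paired_gap(model: Screener, resumes: list[dict], field: str,
               alt_value: Any, runs: int = 1000, seed: int = 0) -> float:
    """Mean change in advance rate when only `field` is set to `alt_value`."""
    rng = random.Random(seed)  # seeding the audit itself keeps the audit repeatable
    gaps = []
    for original in resumes:
        variant = {**original, field: alt_value}
        gaps.append(advance_rate(model, variant, runs, rng)
                    - advance_rate(model, original, runs, rng))
    return mean(gaps)


def toy_screener(resume: dict, rng: random.Random) -> bool:
    p = 0.3 + 0.02 * resume["years"]
    if resume["name"].startswith("J"):
        p += 0.2  # hidden bias the audit should surface
    return rng.random() < p


resumes = [{"name": "Alex", "years": 3}, {"name": "Sam", "years": 7}]
print(round(paired_gap(toy_screener, resumes, "name", "Jordan"), 2))  # near 0.2
print(round(paired_gap(toy_screener, resumes, "hobby", "chess"), 2))  # near 0.0
```

With 1,000 runs per resume, each rate estimate is typically within a few percentage points of the true value, so a gap of about 0.2 stands out clearly; smaller effects would need more runs or more resumes.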
