Hrefna (DHC)

If you want an interesting demonstration of how a person (who doesn't particularly know Rust) solves a problem versus how the #Devin software solves a problem, these two PRs are worth comparing:

* Human solution (HS): github.com/pvolok/mprocs/pull/
* Devin-generated solution (DS): github.com/pvolok/mprocs/pull/

There are a couple of things I want to highlight about these, because I think this is legitimately an interesting thing to look at.

1/

Hrefna (DHC)

1) Note the differences in documentation. The DS is _very_ terse and mostly describes _what_ on a _very_ granular level, HS describes what and why on a much higher level, includes a screenshot of the expected result, and is mindful of backwards compatibility.

2) HS doesn't make extraneous changes. DS rewrites several things that are clearly stylistic choices in the original source, or places where the original source uses a cleaner approach, modifying them to fit its own "style".

2/

Hrefna (DHC)

3) The DS pushes the state higher up the stack and then locks it in a mutex (the choice to use a mutex rather than an atomic is… interesting, but that aside). The HS just passes the result along as part of the code path that was already established.

HS's approach is _much_ easier to read and more closely matches the code that is already there.
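(To make the shape of that difference concrete, here is a minimal, entirely hypothetical sketch; none of these names come from mprocs or from either PR.)

```rust
// Hypothetical sketch of the two shapes being contrasted; not code from either PR.
use std::sync::{Arc, Mutex};

// DS-shaped: hoist the value into shared state near the top of the stack and lock it
// at every read site (an AtomicBool would at least make the reads lock-free).
struct SharedUiState {
    dimmed: Arc<Mutex<bool>>,
}

// HS-shaped: the value simply rides along the call path that already exists.
fn style_for(dimmed: bool) -> &'static str {
    if dimmed { "gray" } else { "default" }
}

fn main() {
    let shared = SharedUiState { dimmed: Arc::new(Mutex::new(true)) };
    println!("shared-state read: {}", style_for(*shared.dimmed.lock().unwrap()));
    println!("pass-it-along read: {}", style_for(false));
}
```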

3/

Hrefna (DHC)

4) The DS includes sections of code that look… well they look like generated templates waiting for someone else to fill them in. There are even comments to this effect. It creates a dead branch to… write a comment that says someone else can put something here.
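(Again, a hypothetical sketch rather than code from the PR, just the shape of the pattern: a catch-all arm that can never run, kept around solely to hold a placeholder comment.)

```rust
// Hypothetical sketch of the pattern described above; not code from the PR.
#[allow(unreachable_patterns)]
fn handle(msg: Msg) {
    match msg {
        Msg::ToggleColor => println!("color toggled"),
        // This arm is unreachable: it exists only so a comment can say
        // that someone else may add something here later.
        _ => {
            // New method: handle additional message types here.
        }
    }
}

enum Msg {
    ToggleColor,
}

fn main() {
    handle(Msg::ToggleColor);
}
```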

Notably, this text is a lot like how _tutorial projects_ are written. That stands out to me and tells me something about the training set.

HS's approach of course doesn't do that.

4/

Hrefna (DHC)

5) Finally, the biggest difference, and one of the things that _really_ stood out to me looking at the two solutions is this:

The #Devin solution solves the problem _as it is written_ (changing the color). It follows, to-the-letter, the instructions in the query.

The human solution actually reads the _problem that the requester was trying to solve_ and takes a stab at solving that in a slightly different way from the specifics of how the requester requested it.

That's _brilliant_.

5/5

Mike P

@hrefna Interesting! Thank you. Some things that stand out to me:

1 - This confirms what I already thought: humans can understand things, AIs can't.

2 - As an experienced programmer who knows no Rust at all, I found the human PR far more understandable, and I even felt that it _taught_ me a little about the language, as opposed to the AI PR which taught me nothing at all.

3 - WTF is with that "New method..." comment? If I was reviewing this, I'd say "hell no" to that.

Hrefna (DHC)

@FenTiger yeah, 100% agreed. The "new method" comment stood out to me too, as an example of "this thing is bad at writing documentation even by the standards of bad documentation."

Yvan DS 🗺️ :ferris: :go:

@hrefna @FenTiger What worries me even more is that this is a "good case", with all the techniques we have at the moment.
Everything I have tested recently across the LLMs out there shows that answers on the same subject tend to converge, even on very different models.

I don't have data, but my theory is that they are all trained on more or less the same datasets, so their answers and capabilities converge.

It's going to keep looking like this: inadequate in anything but simple cases.

Yvan DS 🗺️ :ferris: :go:

@hrefna @FenTiger the style might be different.
But the intent seems to be the same everywhere.

Eric McCorkle

@hrefna

This reminds me a lot of how things went during the whole outsourcing debacle of the mid-2000s. The nature of the documentation, the odd and simplistic technical choices, the obviously copied template code, and solutions that spec-lawyer their way to an unsatisfactory result.

Hrefna (DHC)

@emc2 Yep, and even earlier. Yourdon's Decline and Fall of the American Programmer was written in 1992 and was discussing these problems with outsourcing.

Dr. jonny phd

@hrefna super interesting, thx for writing out this comparison, actually genuinely useful to a very nooby language learner such as myself to see "bad version, good version" of a problem

Jon

@jonny @hrefna Agreed, very interesting and useful; as always, I appreciate the crisp analysis!
