@apostolis @ohad Ambiguity exists in the world. A computer executing an operation does not handle ambiguity; it expects to be told to execute a well-defined sequence of steps, like a finite state machine.
Something has to resolve the ambiguity. GUIs do this by presenting the set of available commands and requiring the person to learn them. Command lines (even rich ones like the INFORM interpreters other folks have brought up) do this by having a well-defined vocabulary and grammar and reporting errors if you deviate from it. Both approaches have advantages and disadvantages, but they share the property that the human is responsible for resolving the ambiguity. Both also provide the human with feedback on how to resolve it.
If you try to support natural language, you are moving that requirement into the machine. This removes agency from the human. Rather than having to be explicit about what you want, you grant the computer greater freedom to interpret your intent.
For issuing instructions to another human, this is useful: the other human is an intelligent being with agency and may be able to solve the problem in better ways than you expected (or worse: ‘Will no one rid me of this turbulent priest?’). If this is a work context, a lot of team-building exercises (and, in military contexts, a lot of drill and manoeuvres) are intended to establish a common frame of reference, so that the people giving and receiving instructions interpret them in the same way.
Back to the computer: it does not have a model of how you think. The common pitch for how you use an LLM for this is to prompt it to consume a text stream from a human and emit JSON or similar for a rule-based system to execute. The LLM has no understanding of the rule-based system and no theory of mind. It has a latent space in which proximity defines a notion of similarity, and it produces output based on that notion. This is not something that a human can develop an intuition for (why does a one-pixel change turn your classifier from tagging an image as a cat to tagging it as a dog?). There is no defined vocabulary of commands, there is only a well-tested part of the space. If you stick to that, it will probably work, but you have something less powerful than simply exposing the underlying rule-based system via a GUI or command line. If you stray from it, the interpretation of your inputs will diverge in surprising ways.
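To make the shape of that pitch concrete, here is a minimal sketch of the pattern, with everything invented for illustration: the `call_llm` stub stands in for whatever model API you would actually call, and the command names and JSON shape are made up. The point is that the model sits between the user and a fixed set of handlers without ever seeing the handlers' constraints.

```python
import json

# The fixed vocabulary of the underlying rule-based system (illustrative).
HANDLERS = {
    "create_invoice": lambda args: f"created invoice for {args.get('customer')}",
    "list_invoices": lambda args: "listing invoices",
    "send_reminder": lambda args: f"sent reminder to {args.get('customer')}",
}

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; here it just returns a canned reply."""
    return '{"command": "create_invoice", "args": {"customer": "ACME"}}'

def build_prompt(request: str) -> str:
    return (
        'Translate the request into JSON: {"command": <name>, "args": {...}}. '
        "Valid commands: " + ", ".join(HANDLERS) + ".\n"
        "Request: " + request
    )

def handle_request(request: str) -> str:
    raw = call_llm(build_prompt(request))
    try:
        parsed = json.loads(raw)
        return HANDLERS[parsed["command"]](parsed.get("args", {}))
    except (json.JSONDecodeError, KeyError) as exc:
        # All the user learns is that something failed downstream; the model
        # never sees the rule-based system's constraints, and the user gets
        # no guidance on how to rephrase.
        return f"could not interpret request: {exc}"

print(handle_request("bill ACME for last month"))
```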
And if you can define a restricted set of commands that work, now you don’t have natural language. Now you have a command line (possibly a voice-controlled command line). You can write a grammar for it. Users can learn how to use it. Users get feedback when their commands are ambiguous and can express their intent more clearly. You empower users, rather than trying to remove their agency and replace it with inexplicable behaviour.
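For contrast, here is a hedged sketch of what that looks like once you admit it is a command line: a toy grammar (the verbs and objects are invented) and a parser that rejects anything outside the vocabulary and tells the user exactly why, so the human can resolve the ambiguity themselves.

```python
import shlex

# A toy grammar, invented for illustration:
#   command ::= verb object [option...]
#   verb    ::= "create" | "list" | "send"
#   object  ::= "invoice" | "reminder"
VERBS = {"create", "list", "send"}
OBJECTS = {"invoice", "reminder"}

def parse(line: str) -> dict:
    tokens = shlex.split(line)
    if not tokens:
        raise SyntaxError("empty command; expected: <verb> <object> [options]")
    verb, rest = tokens[0], tokens[1:]
    if verb not in VERBS:
        raise SyntaxError(f"unknown verb '{verb}'; valid verbs: {sorted(VERBS)}")
    if not rest or rest[0] not in OBJECTS:
        raise SyntaxError(f"expected an object after '{verb}'; one of {sorted(OBJECTS)}")
    return {"verb": verb, "object": rest[0], "options": rest[1:]}

# Well-formed input is executed; malformed input gets explicit feedback
# that tells the user how to restate their intent.
print(parse("create invoice --customer ACME"))
try:
    parse("make invoice")
except SyntaxError as err:
    print(err)
```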