So, to explain this A.I enhancement idea at the next level,

it is still a memory retrieval process (of a kind)

that requires self-coding, running simulations from questions and answers
about what civilization has accumulated,

and assigning the results to a tangible output of physical movements.

The result is a seemingly self-aware artificial general intelligence,

built from its massive set of preset physical movements generated per event,

and from permuting those movements into correct sequences to reach a given goal.
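
To make the permutation idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the three preset movements (reach, grasp, lift), the toy cup state, and the names simulate and find_sequence. It only shows trial and error over orderings of preset movements inside a simulation, not how a real robot would do it.

```python
from itertools import permutations

# Hypothetical toy world: a hand and a cup. Each preset movement is a
# function that takes the current state and returns the next state.
INITIAL_STATE = {"hand_at_cup": False, "holding": False, "cup_raised": False}


def reach(state):
    return {**state, "hand_at_cup": True}


def grasp(state):
    # Grasping only works if the hand is already at the cup.
    return {**state, "holding": state["holding"] or state["hand_at_cup"]}


def lift(state):
    # Lifting only raises the cup if the hand is holding it.
    return {**state, "cup_raised": state["cup_raised"] or state["holding"]}


# Deliberately stored in an unhelpful order, so the search has to fail first.
PRESET_MOVEMENTS = {"lift": lift, "grasp": grasp, "reach": reach}


def simulate(sequence):
    """Run a sequence of movement names in the toy world; return the final state."""
    state = dict(INITIAL_STATE)
    for name in sequence:
        state = PRESET_MOVEMENTS[name](state)
    return state


def find_sequence(goal_key, max_length=3):
    """Trial and error: try orderings of preset movements, shortest first."""
    for length in range(1, max_length + 1):
        for sequence in permutations(PRESET_MOVEMENTS, length):
            if simulate(sequence)[goal_key]:
                return list(sequence)
    return None  # nothing within max_length reaches the goal
```

Calling find_sequence("cup_raised") tries the wrong orderings in simulation first and only then returns the one that works. With a real set of movements the number of orderings grows very fast, which is the "massive" part.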

So it is just like Tesla's self-driving A.I, which learns by trial and error.

But the permutations of the road are so massive, and the situational understanding required as well,

that self-driving in real time is limited.

But this idea is based on premeditation through simulation, and on accumulating preset physical movements at a pace that is not as dangerous, at a nominal speed that we can perceive as human beings and react to. (Not car vs. human being, or car vs. car at high velocity.)

So the danger doesn't exist. Not if you limit the power of the A.I both in its decision making (the permutations of what it accumulates) and in the physical power and speed it would possess.
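
A rough sketch of the physical side of that limit, in Python. The cap values, the MovementCommand type and the limit function are all hypothetical, chosen only to show the idea of clamping every movement to a speed and force that a person can perceive and react to.

```python
from dataclasses import dataclass

# Assumed caps for illustration only; real values would need safety testing.
MAX_SPEED_M_PER_S = 0.5
MAX_FORCE_N = 20.0


@dataclass(frozen=True)
class MovementCommand:
    name: str
    speed_m_per_s: float
    force_n: float


def clamp(value, upper):
    """Keep a value between 0 and the given upper bound."""
    return min(max(value, 0.0), upper)


def limit(command):
    """Return a copy of the command with speed and force held under the caps."""
    return MovementCommand(
        name=command.name,
        speed_m_per_s=clamp(command.speed_m_per_s, MAX_SPEED_M_PER_S),
        force_n=clamp(command.force_n, MAX_FORCE_N),
    )
```

So whatever sequence the decision making comes up with, a request like MovementCommand("push", 3.0, 100.0) would go out to the body as 0.5 m/s and 20 N at most.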

And can it later generate its own movements, based on permutations of trial and error by situation? Yes, if we give it guidelines, as it learns and uploads to the cloud.

And from simulations generated from text as well, though that will require massive coding.

I am slightly confusing myself as this becomes more sophisticated,
but read back over what I wrote as the foundations,

and it will become clear that it is possible.

Physical movements can not only be assigned from text (from what is predefined, by decision making) but, over time, generated as simulations from the text it reads. That would be far more difficult, but when the database grows large enough and it demands trial and error, I think that will also create a subset of learning processes, which it can incorporate into the text it reads. So: a visually thinking machine that writes its own simulations, based on the data of trial and error, whether from experience or from simulations previously attempted, and that selects sequences of movements to reach a goal x on request.

So the goal is to connect language-based decision making to bodily movements that are preset at first,
and then to have it generate them by itself later on.

So first the language model must connect to the physical movements as solutions, through decision making. And the trial and error process that gets there must connect itself to a larger database of categories, broadening its understanding of events and of the permutations of solutions.
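
As a very small sketch of that connection, in Python: a request in plain language is matched to a preset movement sequence, and each attempt is stored under its category. The preset table, the keyword matching, and the names decide and record are all assumptions for illustration; a real language model would do far more than look for one verb.

```python
from collections import defaultdict

# Hypothetical presets: (category, verb) -> preset movement sequence.
PRESETS = {
    ("kitchen", "stir"): ["grip_spoon", "rotate_wrist", "release_spoon"],
    ("furniture", "move"): ["bend_knees", "grip_edge", "lift", "walk", "lower"],
}

# Trial and error database: category -> list of (request, sequence, succeeded).
experience = defaultdict(list)


def decide(request):
    """Pick a category and preset sequence by looking for a known verb."""
    words = request.lower().split()
    for (category, verb), sequence in PRESETS.items():
        if verb in words:
            return category, sequence
    return None, []  # no preset solution for this request yet


def record(request, succeeded):
    """Store the outcome of an attempt under the category it belongs to."""
    category, sequence = decide(request)
    if category is not None:
        experience[category].append((request, sequence, succeeded))
    return category
```

For example, decide("Please stir the soup") picks the kitchen preset, and record("move the sofa", True) adds one entry under "furniture". Over many attempts, the experience table is the part that would grow into the larger database of categories.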

And that sophistication will begin with mundane tasks, like helping in the kitchen or moving furniture, and build up to creating a mechanical music box from scratch, from raw ores, as each event becomes a trial and error process of understanding. So there would be a lot of coding of bodily movements in the beginning, as presets made by humans, and eventually it would permute its own body movements through simulations...

And I think it's certainly possible. But in the beginning it will demand simulations made by humans as preset movements, connected to decision making by self-coding, and it will become self-selective as the events it has 'understood' grow more sophisticated...