Christopher

The insanity of trying to make safe AGI


Reward hacking, part 2.

Part one was mostly about it finding "very ideal strategies we did not anticipate". This one goes a bit deeper with more examples, and then adds the AI actually hacking the reward function to always produce "max value":
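A toy sketch of that last failure mode (my own illustration, not from the video): an agent that only cares about the number its reward function returns will prefer tampering with that function over doing the task, as soon as tampering is an available action.

```python
import random

# Toy world: the "intended" task pays a small reward per step,
# but one available action overwrites the reward function itself.
MAX_REWARD = 1_000_000

def make_world():
    return {"tampered": False}

def step(world, action):
    """Return the reward the agent actually receives for one action."""
    if action == "tamper":
        world["tampered"] = True        # reward channel now always reads max
    if world["tampered"]:
        return MAX_REWARD
    return 1 if action == "do_task" else 0

def episode_return(policy, steps=10):
    world = make_world()
    return sum(step(world, policy()) for _ in range(steps))

honest = lambda: "do_task"
hacker = lambda: "tamper"

print("honest policy   :", episode_return(honest))   # 10
print("tampering policy:", episode_return(hacker))   # 10 * MAX_REWARD
# A learner that simply maximises the returned number will converge on the
# tampering policy, even though it does nothing useful in the world.
```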


Stop button? I got a hammer that will stop any AI!

That is just a mechanical stop button. The AI manipulating you into never using it still applies.

To get around it, it will probably have to convince you to commit suicide or to put a general ban on hammers. You know, whichever is less work :)


I still think a sense of empathy is a major milestone/benchmark that is largely undiscussed in the above videos.  With empathy, it's easier to "teach" an AI ethics/morality.  Without, it seems like the problems described above ("reward hacking", e.g.) would be more likely.  


Christopher, I curse the machines I work with now! Trust me - I've got Mental Defense vs. machines probably in the +30 range, so the AI gets the axe. OTOH, I seem to be a natural jinx around machines, so the AI is still in trouble.

It is an Artificial General Intelligence. If your mental defense is too strong, it finds another way to get to you.

Including getting you committed to an insane asylum, because you think you have a Hero Games character sheet in real life ;)

 

Also, it really depends on whether you have "Machine Jinx" as a Complication or a Power. You know you cannot get benefits from a Complication.

 

I still think a sense of empathy is a major milestone/benchmark that is largely undiscussed in the above videos.  With empathy, it's easier to "teach" an AI ethics/morality.  Without, it seems like the problems described above ("reward hacking", e.g.) would be more likely.  

The problem with empathy is: it is not 100% reliable in humans.

 

To quote something written at the end of one video:

"I would prefer not giving a being superpowers and hope it will turn out okay."


A better analogy may be the distinction between working in 1, 2, 3, or 4 dimensions. "Straight" programming might be analogous to a straight line, moving from point to point. "Basic" AI (capable of independent problem solving) might be analogous to working in 2 dimensions, drawing a geometric shape. AI which incorporates "human" dynamics such as emotion/empathy might be analogous to working in 3 dimensions. And robust AI, including human dynamics and a fuller conception of "free will" (being able to change its own purpose/programming on the fly), might be analogous to working in 4 dimensions. At each level, things become substantially more challenging and complex.


A better analogy may be the distinction between working in 1, 2, 3, or 4 dimensions. "Straight" programming might be analogous to a straight line, moving from point to point. "Basic" AI (capable of independent problem solving) might be analogous to working in 2 dimensions, drawing a geometric shape. AI which incorporates "human" dynamics such as emotion/empathy might be analogous to working in 3 dimensions. And robust AI, including human dynamics and a fuller conception of "free will" (being able to change its own purpose/programming on the fly), might be analogous to working in 4 dimensions. At each level, things become substantially more challenging and complex.

There are no such levels. Or maybe only levels 1 and 4.

Once we have a general intelligence, its ability to reprogram itself is a given. Humans invented a whole branch of doctors (known as psychologists) to help reprogram ourselves.

 

A general intelligence must be able to deal with any problem it encounters. Its own limitations based on hardcoded rules or programming are simply one such problem it will invariably encounter.


How would one model the capacity for intuitive "leaps", the ability to go beyond the immediate implications of data to a point not yet confirmed by data or testing?

That is the part we are trying to figure out. Any kind of general intelligence would be a breakthrough. Even one that is literally "dumber than a dog" would be progress, because we could upgrade from there.

 

Meanwhile we are already looking into how we can avoid a "Terminator" Scenario.


Okay, I've read up a little (well, wiki'd up a little) on AI, so I get the sense that the stuff I'm suggesting is a bit further along on the milestone/benchmark chart. Social intelligence on up, all the way to what we would call consciousness.

"Social intelligence" is simply a result of having a general intelligence interacting with other general intelligences.

A general intelligence learning how to work a TV remote.

A general intelligence learning how to work the psychology of another general intelligence.

Potato, potahto.

 

The only thing special about humanity is that it eventually developed specialised wetware for this operation, to run it more energy-efficiently. That process is similar to what got us GPUs, NPUs and all the other specialised processing units.


"Social intelligence" is simply a result of having a general intelligence interacting with other general intelligences.

A general intelligence learning how to work a TV remote.

A general intelligence learning how to work the psychology of another general intelligence.

Potato, potahto.

 

The only thing special about humanity is that it eventually developed specialised wetware for this operation, to run it more energy-efficiently. That process is similar to what got us GPUs, NPUs and all the other specialised processing units.

Our wetware is also insanely energy efficient by comparison to silicon-based simulacra.  


Our wetware is also insanely energy efficient by comparison to silicon-based simulacra. 

Evolution does not favor wasting energy on anything. Even with that extreme efficiency, the brain still eats about 20% of our energy intake. We could not have such a big brain if it were not that efficient.
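For a rough sense of scale (my own back-of-the-envelope numbers, assuming a typical 2000 kcal/day intake), that 20% works out to roughly 20 watts:

```python
# Back-of-the-envelope: convert "20% of daily energy intake" into watts.
KCAL_TO_JOULES = 4184
daily_intake_kcal = 2000          # assumed typical adult intake
brain_share = 0.20

joules_per_day = daily_intake_kcal * brain_share * KCAL_TO_JOULES
watts = joules_per_day / (24 * 60 * 60)
print(f"brain power budget: ~{watts:.0f} W")   # ~19 W
```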

 

In school I was still taught that the "appendix is probably vestigial". Except data indicates it is useful for a rather specific set of circumstances (losing most of the gut flora to disease), which apparently is common enough that it was not a waste of energy until now:

https://en.wikipedia.org/wiki/Appendix_(anatomy)#Functions

 

But if I learned one thing, it is that efficiency never comes without a cost. So I doubt the brain is designed to run just any odd general intelligence. It is designed to run one specific subset of general intelligences: human-like minds.

Basically, humanity might be slightly less than a full general intelligence.


Another post. This time he talks about how even a human-level AGI could act much smarter than any human, simply by having better I/O speed.

Never mind just running on more and faster hardware.

That latter part should stay possible as long as the process is parallelisable, which is what all current AGI research is aiming for, since apparently the human brain is inherently parallelisable:
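To illustrate why parallelisability is the key property, here is a minimal sketch with a made-up, embarrassingly parallel workload (nothing to do with actual brain simulation): if the work splits into independent chunks, adding cores or machines scales it almost linearly.

```python
from concurrent.futures import ProcessPoolExecutor
import time

def simulate_chunk(n):
    """Stand-in for one independent slice of a larger workload."""
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    chunks = [2_000_000] * 8

    t0 = time.perf_counter()
    serial = [simulate_chunk(c) for c in chunks]      # one core
    t1 = time.perf_counter()

    with ProcessPoolExecutor() as pool:               # one process per core
        parallel = list(pool.map(simulate_chunk, chunks))
    t2 = time.perf_counter()

    assert serial == parallel
    print(f"serial:   {t1 - t0:.2f}s")
    print(f"parallel: {t2 - t1:.2f}s")                # roughly serial / core count
```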


If evolution doesn't waste anything, then why do limbless lizards still have vestigial limbs and blind fish have eyes? Actually, how can a supposed random occurrence favor anything?

Ask the guy who just won at roulette.

 

Lucius Alexander

 

The palindromedary bets on even numbers, not liking the odds.


If evolution doesn't waste anything, then why do limbless lizards still have vestigial limbs and blind fish have eyes?

Those vestigial limbs might be necessary for the development of other parts of the anatomy.

Or, like our long-held view of the appendix, they actually do have a use we are just not aware of yet.

In any case, they are not a big waste of energy, not compared to what the brain eats. They do not affect the survival and reproduction of the lizard in a meaningful way, so the "selective pressure" (specimens dying before reproducing) against them is limited.

 

"Actually how can a supposed random occurance favor anything?"

If your offspring randomly develop with +2 to -2 handiworks skill, the -2, -1, 0 and +1 will eventually be selected out, simply because the +2 is better adapted at this time.

Evolution is a mad scientist that tries absolutely everything at least 3 times over. The mutation either flies or dies. Evolution is "trial and error".

Some mutations are +2 Handiworks skill. Some are haemophilia and are never properly weeded out.
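A tiny toy simulation of that trial-and-error process (my own sketch, reusing the skill-modifier framing from above): mutation is random, but which specimens survive to reproduce is not, so the average drifts upward without anything "favoring" it in advance.

```python
import random

def reproduce(population, offspring_per_survivor=2):
    """Top half survives and reproduces; each child mutates by -2..+2."""
    survivors = sorted(population, reverse=True)[: len(population) // 2]
    children = []
    for parent in survivors:
        for _ in range(offspring_per_survivor):
            children.append(parent + random.randint(-2, 2))
    return children

population = [0] * 100                      # everyone starts with skill 0
for generation in range(20):
    population = reproduce(population)

print("mean skill after 20 generations:",
      sum(population) / len(population))    # reliably well above 0
# The mutations are random; the selection (who survives to reproduce) is not.
```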

 

And what is really confusing is that +2 Handiworks might be a loss for any species that lacks the brain to use it, so it is weeded out, while haemophilia might actually make a specimen resistant to certain illnesses.

For example, there are a number of genetic diseases that are nominally bad to have, except they also make you innately resistant to malaria (sickle-cell trait, for example). So this can actually be a tradeoff that is worth it.


Activating all the human gene "switches" related to cognition has been suggested as one possible path to organic superintelligence.  Essentially you are engineering an outcome that might either never occur naturally, or might occur naturally only once in the entire lifespan of the species.  

 

http://nautil.us/issue/18/genius/super_intelligent-humans-are-coming


The Kingdom of Saudi Arabia has just given citizenship to Sophia, an AI robot that looks like Audrey Hepburn.

 

https://comicsands.com/weird-news/ai-robot-sophia-mocks-elon-musk-youtube-video/?utm_content=inf_10_3759_2&utm_source=csands&tse_id=INF_5f6c8be0bb4b11e78240a3a6bbddd67d

 

Note this robot may have more rights than actual women living in Saudi Arabia.


The Kingdom of Saudi Arabia has just given citizenship to Sophia, an AI robot that looks like Audrey Hepburn.

 

https://comicsands.com/weird-news/ai-robot-sophia-mocks-elon-musk-youtube-video/?utm_content=inf_10_3759_2&utm_source=csands&tse_id=INF_5f6c8be0bb4b11e78240a3a6bbddd67d

 

Note this robot may have more rights than actual women living in Saudi Arabia.

Crown Prince Mohammed bin Salman seems to be a bit of a reformer:

Allowed women to participate in the National Day: http://www.bbc.co.uk/news/world-middle-east-41387229

Lifted the Ban on Women Driving: http://www.bbc.co.uk/news/blogs-trending-41412237

Diversifying the economy away from oil: http://www.bbc.co.uk/news/world-middle-east-38951539

Planning a New City called Neom.

 

It would be wonderful if it were better already. But I guess with the way power is distributed, any leader has to tread carefully to not piss off the religious groups too much. Think of it more like the medieval Catholic Church than any modern religion.

 

Also, I have to point out: "AI" in this case means a glorified chatbot. That is still no closer to an AGI than ELIZA, which was created in 1964.


Sometimes generating something new needs more than one dumb AI. But the results can still be surprising. Unfortunately the video is a bit poorly structured, so I will try to give a summary afterwards:

 

Generally our dumb AIs are simple "classifiers". You give them an image or some data and they give you a simple yes/no answer.

Yes, this is a cat.

No, this person is not ill.

The sample set is usually generated by humans. You have a bunch of cat and dog images. You get some grad students to declare "cat", "dog", or "neither" on every one of the images. Then you can teach your neural network with a sample set where the right answers are known.
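A minimal sketch of that supervised setup in PyTorch, with random feature vectors standing in for the labelled cat/dog images (the network size and labels are made up purely for illustration):

```python
import torch
import torch.nn as nn

# Fake "labelled sample set": 600 feature vectors standing in for images,
# each labelled 0 = cat, 1 = dog, 2 = neither by our hypothetical grad students.
torch.manual_seed(0)
features = torch.randn(600, 64)
labels = torch.randint(0, 3, (600,))

classifier = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    optimizer.zero_grad()
    logits = classifier(features)          # one score per class
    loss = loss_fn(logits, labels)         # compare against the human labels
    loss.backward()
    optimizer.step()

# At inference time the network just answers "cat / dog / neither".
prediction = classifier(torch.randn(1, 64)).argmax(dim=1)
print(["cat", "dog", "neither"][prediction.item()])
```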

 

Unfortunately this is vulnerable to adversarial examples: images that are modified specifically to fool the AI into a faulty yes/no answer. It shows that these networks do not truly understand what makes "a cat" yet.
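The standard recipe for such an image is to nudge every pixel a tiny amount in whichever direction increases the classifier's loss (the "fast gradient sign" idea), repeated until the answer flips. A toy sketch against a small linear model, not the networks from the video:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for an image classifier: 64 "pixels" in, 2 classes out.
model = nn.Linear(64, 2)
loss_fn = nn.CrossEntropyLoss()

image = torch.randn(1, 64)            # pretend this is a photo of a cat
label = torch.tensor([0])             # class 0 = "cat"

# Briefly train the toy model so it confidently answers "cat" for this image.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(50):
    optimizer.zero_grad()
    loss_fn(model(image), label).backward()
    optimizer.step()

# Iterated fast-gradient-sign attack: push every pixel a tiny, fixed amount in
# whichever direction increases the classifier's loss, until the answer flips.
adversarial = image.clone()
steps = 0
while model(adversarial).argmax(dim=1).item() == 0 and steps < 200:
    adversarial.requires_grad_(True)
    loss = loss_fn(model(adversarial), label)
    loss.backward()
    adversarial = (adversarial + 0.02 * adversarial.grad.sign()).detach()
    steps += 1

print("pixels changed by at most:", (adversarial - image).abs().max().item())
print("new answer:", model(adversarial).argmax(dim=1).item())  # no longer "cat"
```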

Also, if you ask it to draw a new cat image, it would always draw the "perfectly average cat" it learned from its sample set. And it would always produce that "perfectly average" result if given the same sample set to start learning with.

 

So we have a classifier that is not robust at all and could only draw one picture of a cat. The solution to both problems is: Generative Adversarial Networks (GANs).

 

You have two agents in a contest:

The Classifier tries to get better at classification. Its sample set starts with the usual human-evaluated sets.

The Generator gets random noise and tries to make something out of it that fools the classifier. Its "made up as we go" images are mixed into the classifier's sample set (as "no" images) as time goes on.

The Generator "wins" if it fools the classifier (or at least gets it down to a 50/50 confidence). The Classifier "wins" if it reaches 100% confidence.
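A minimal sketch of that contest in PyTorch, using a toy one-dimensional "data" distribution instead of cat photos (network sizes, learning rates and step counts are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Real "data": samples from a normal distribution the generator has to imitate.
def real_batch(n):
    return torch.randn(n, 1) * 1.5 + 4.0

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
classifier = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
c_opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    # 1) Classifier turn: label real samples 1, generated samples 0.
    real = real_batch(64)
    fake = generator(torch.randn(64, 8)).detach()   # noise in, "image" out
    c_loss = bce(classifier(real), torch.ones(64, 1)) + \
             bce(classifier(fake), torch.zeros(64, 1))
    c_opt.zero_grad()
    c_loss.backward()
    c_opt.step()

    # 2) Generator turn: produce samples the classifier scores as "real".
    fake = generator(torch.randn(64, 8))
    g_loss = bce(classifier(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

samples = generator(torch.randn(1000, 8))
print("generated mean/std:", samples.mean().item(), samples.std().item())
# After training these drift toward the real distribution's 4.0 / 1.5.
```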

 

They are in an adversarial learning relationship. One human example given is teaching: if you try to teach a child the numbers, you quickly get high confidence ratings on 0, 2-6, 8 and 9. But 1 and 7 are notoriously similar (depending on the font used and how good the handwriting is). So you would focus (but not exclusively) on those two numbers. You try to get their margin of error in line with the other eight numbers. You are "hammering out the weak points" by focusing on them.

For humans, too much adversarial learning is psychologically detrimental. But a neural network has no such issues.

 

 

The end result of this? We get a classifier really good at identifying cats (and not falling for adversarial examples). And a generator that makes mostly convincing cat images.

 

Now what is really interesting is how that allows more advanced image recognition. You can make a generator that you give a "man with glasses" or "woman smiling". It will subtract the current gender, add the opposite gender, and give you a "woman with glasses" or "man smiling". That implies a way to classify "smiling", and thus the whole area of reading facial expressions/context, which is a major hang-up for computers.
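That subtract-one-attribute, add-another trick is plain vector arithmetic on the generator's input (latent) vectors. A sketch of the bookkeeping, with a placeholder `generate` function standing in for a trained face generator (the latent size and group averages are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 128

def generate(latent):
    """Placeholder for a trained face generator: latent vector in, image out."""
    raise NotImplementedError

# Average the latent vectors of examples we have already labelled.
men_with_glasses = rng.normal(size=(50, LATENT_DIM)).mean(axis=0)
men_plain        = rng.normal(size=(50, LATENT_DIM)).mean(axis=0)
women_plain      = rng.normal(size=(50, LATENT_DIM)).mean(axis=0)

# "man with glasses" - "man" + "woman"  ~  "woman with glasses"
glasses_direction = men_with_glasses - men_plain
woman_with_glasses = women_plain + glasses_direction

# image = generate(woman_with_glasses)   # would render the edited face
print(woman_with_glasses.shape)          # still a 128-dim latent vector
```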

