
Artificial intelligence is a term which for many people, I think, conjures up images of Hollywood movies of killer robots, or perhaps even the more subtle but perhaps more sinister type of artificial intelligence we saw in HAL in the film 2001, which becomes increasingly paranoid and eventually kills some of its astronaut masters. That's the world of science fiction. But something very interesting has been happening over the last five years, and one of the evidence points for this is just some of the books which have appeared. These are real books which have been published in the last couple of years; just look at some of the titles: Our Final Invention: Artificial Intelligence and the End of the Human Era; Superintelligence: Paths, Dangers, Strategies; The Artificial Intelligence Revolution: Will Artificial Intelligence Serve Us or Replace Us? And various luminaries have commented on this. Stephen Hawking said the development of full AI could spell the end of the human race. Elon Musk said rampant AI is the biggest existential threat facing mankind. A somewhat pessimistic view of the subject, I would say.

So this evening, what I want to do is to look at some of the reasons why we've seen this explosion of interest in artificial intelligence, to look at some of the science behind the technologies, and hopefully to paint a slightly more optimistic view of how AI might help humanity rather than destroy us. One of the things I want to do as well in this talk is to look a little bit at the history of AI, because the history is quite fascinating, and it goes back a long way, certainly at least as far as this person: Alan Turing. I'm sure you've all heard of Alan Turing. He was a Cambridge mathematician, famous for many things, but amongst them, in the 1930s he really laid the foundations for modern computer science, and in a sense conceptualized what today we call a digital computer. And he asked the following question: could such a machine emulate the capabilities of the human mind? Could a machine think? Well, he was largely left to theorize, because he didn't have in his day the technology to take this forward.

And really the big development that has allowed us to think of artificial intelligence as something of potentially practical value is the development of the digital computer, in particular the modern silicon computer. This is a rather old silicon chip; a modern chip will contain billions of components. These chips are incredibly powerful: my laptop, for example, could probably do several billion operations a second. And here's an example of an operation: take two 16-digit numbers, multiply them together, and get the answer correct to 16 digits. This humble, rather ordinary laptop can probably do several billion, several thousand million, of those per second.
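To make that concrete, here is a minimal sketch in Python (with the caveat that a loop in an interpreted language mostly measures the interpreter's overhead, so the rate it prints will be far below what the silicon itself can sustain):

```python
import time

a = 1234567890123456  # a 16-digit number
b = 6543210987654321  # another 16-digit number
print(a * b)          # the exact product

# Time a few million of these multiplications to estimate a rate.
n = 5_000_000
start = time.perf_counter()
for _ in range(n):
    a * b
elapsed = time.perf_counter() - start
print(f"about {n / elapsed:,.0f} multiplications per second from interpreted Python")
```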

So what that machine is doing is slavishly following instructions, in this case the instructions to multiply two numbers together. But back in the 1950s, computer scientists working with machines far less powerful than this laptop wondered whether a set of instructions could be found to program a computer which would cause it to exhibit intelligence. So this is an example of such a program. This is from 1964; it's called Eliza, and it was developed at the AI laboratory at MIT. This particular script, this particular version, was a psychotherapist; today we would call this a chatbot. You had a little conversation with the computer, and it would use various tricks: it would ask you for your name, and it would call you by your name, and to many people, at least superficially, at least for a short period of time, it appeared to be exhibiting something a bit like intelligence. Of course, if you interacted with it for more than a few minutes, you realized it was extremely dumb and not in the slightest bit intelligent, but at least superficially it seemed to mimic intelligence.
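To give a flavor of the kind of trickery involved, here is a minimal Eliza-style sketch in Python. The patterns are made up for illustration; this is not Weizenbaum's actual script:

```python
import re

# Each rule pairs a pattern with a template that echoes part of the
# user's own words back as a question.
RULES = [
    (re.compile(r"\bi am (.*)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.*)", re.I), "Tell me more about feeling {0}."),
    (re.compile(r"\bmy (.*)", re.I), "Why does your {0} concern you?"),
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1).rstrip(".!?"))
    return "Please go on."  # stock reply when nothing matches

print(respond("I am worried about the future."))
# -> Why do you say you are worried about the future?
```

A handful of rules like these, plus a memory for the user's name, is essentially all there was behind the apparent intelligence.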

And this field of artificial intelligence was tremendously popular back in the 1950s and 1960s. It was founded on the idea that computer scientists would be able to program computers to be intelligent, that is to say, to write a computer program, a set of instructions, such that when the computer followed those instructions, blindingly fast, perhaps billions per second, it would exhibit intelligence. And there was a tremendous amount of hype, rather like there is today, a tremendous amount of excitement about artificial intelligence and its potential. However, that excitement didn't last forever, and probably didn't really last for very long. By the time we reached the 1970s things began to change, and one landmark moment in the history of artificial intelligence was the publication of a report by Lighthill in 1973, in which he presented a very pessimistic prognosis for the field. It led to an almost complete cessation of funding, first in the UK and then elsewhere, and to what some people have called the AI winter, where artificial intelligence became very unfashionable. Now, that was a seminal report in the field, and shortly after its publication the BBC televised a debate from this very lecture theater at the Royal Institution. I'm going to play a short extract from that debate; it's hosted by Sir George Porter, who was the president of the Royal Society and also the director of the Royal Institution.

Good evening, and welcome to the Royal Institution. Tonight we are going to enter a world where some of the oldest visions that have stirred man's imagination blend into the latest achievements of his science. Tonight we're going to enter the world of robots: robots like Shakey, developed by the Stanford Research Institute. Shakey is controlled by a large computer; he's directed through a radio antenna. Through a television camera he gets visual feedback from his environment. The box appears on the monitor screen; the computer analyzes the traces which appear on the visual display until it can interpret them as an object it recognizes. Shakey gets tactile feedback through his feelers; he's able to move boxes with his pusher. He's programmed to solve certain problems that can be contrived in his environment: to choose, say, an alternative route to a certain point when his way has been blocked. Shakey is unquestionably an ingenious product of computer science and engineering, but is he anything more? Is he the forerunner of startling developments which will endow our machines with artificial intelligence and enable them to compete with, or even outstrip, the human brain? One man who's pessimistic about the long-term prospects of artificial intelligence is our speaker tonight, Sir James Lighthill, one of Britain's most distinguished scientists. He's the Lucasian Professor of Applied Mathematics at Cambridge and has worked in many fields of applied mathematics. He's a former director of the Royal Aircraft Establishment at Farnborough. Last year he compiled a report for the Science Research Council which condemned work on general-purpose robots. Not surprisingly, scientists who've been working on such robots reacted strongly in defense of their field, and three of them are here tonight to challenge Sir James's findings. After they've had their say, the discussion will be opened to bring in members of the audience here, with many mathematicians and engineers, computer scientists and psychologists among them; their contribution will be particularly welcome.

You can see how hairstyles have changed since 1973.

So that was the beginning of the AI winter, and we had a period when it was positively unfashionable to work on AI, or to say you were working on artificial intelligence. Nevertheless, of course, the field of computer science advanced with great rapidity, and there were many exciting developments. One that I'll draw attention to is in the field of computer programs that play chess. In the heyday of old-fashioned artificial intelligence, chess was considered to be one of the pinnacles of human intellectual achievement: surely if a machine could play chess, everything else, like solving world poverty and global warming, would be sort of trivial by comparison. Well, it turns out that in 1997 a chess machine called Deep Blue, built by IBM, beat the world chess champion Garry Kasparov.

Now, what's interesting about this is the way in which it worked, because this machine was dedicated to playing chess. It did one thing and one thing only, which was to play chess, and it did it very well. And it did it by following a series of instructions to analyze moves and responses to moves, and to evaluate board positions. It made use of the very high speed of digital computers: it would analyze literally millions of possible moves and countermoves in order to choose a good move for the machine.
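The core of that approach, searching moves and countermoves and backing up a score for each board position, fits in a short sketch. Here is a minimal minimax search in Python with a deliberately abstract game interface; the move generator and the evaluation function are placeholders, not IBM's:

```python
def minimax(state, depth, maximizing, evaluate, legal_moves, apply_move):
    """Explore `depth` plies of moves and countermoves, backing up the
    static evaluation of the positions at the leaves of the search tree."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)  # static score of this board position
    scores = (minimax(apply_move(state, m), depth - 1, not maximizing,
                      evaluate, legal_moves, apply_move) for m in moves)
    return max(scores) if maximizing else min(scores)

def best_move(state, depth, evaluate, legal_moves, apply_move):
    # Choose the move whose subtree backs up the best score for us,
    # assuming the opponent replies as well as possible.
    return max(legal_moves(state),
               key=lambda m: minimax(apply_move(state, m), depth - 1, False,
                                     evaluate, legal_moves, apply_move))
```

Deep Blue added enormous refinements, including alpha-beta pruning and special-purpose hardware, but the millions of evaluated positions per move come from exactly this kind of recursive search.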

A more recent example, again from IBM, is the machine Watson, which defeated Ken Jennings and Brad Rutter, the two champions of the US television quiz show called Jeopardy. This challenge was conducted on live television, and again it was very impressive; this is a sort of general-knowledge, question-answering type of quiz. Watson made use of information that it pulled off the internet, including the whole of Wikipedia and much else besides, again tuned by a team of very smart people at IBM over a period of about seven years in order to do this one very specific thing, which was to win at Jeopardy.

And so we have, during this period, two examples, and there are other examples, of computers doing things which previously only humans could do, being able to do things better than humans, things which appear to be very intellectual in nature. But they're very specific, and of course every time another task was completed by a machine to a level greater than humans, people said, well, OK, that wasn't really intelligence after all; that's not really artificial intelligence. And some cynics have actually said that artificial intelligence is simply anything that computers can't yet do: every time we solve another problem, we've not advanced artificial intelligence. In some ways that's fair, because one of the most extraordinary things about the human brain is not that we can do question answering, or that we can play chess, but that we can do all of these things, and we can learn to do new tasks. So there's something rather special that we haven't captured in the examples I've shown so far.

But something very interesting began to happen about five years ago, and it concerns a field which has been around, as you'll see later, since the 1960s: the field of neural networks, and the development of so-called deep learning. A little bit later in this talk I'm going to dive into some of the science behind neural networks and deep learning. Deep learning was developed by Geoffrey Hinton at the University of Toronto, along with other academics and colleagues around the world, and it appeared to show great promise. And so an attempt was made to see whether this would scale to some sort of real-world problem, and back in 2012 Geoff and others from Toronto collaborated, in this case with Microsoft Research in Redmond, to apply deep learning to the problem of speech recognition. Speech recognition is a tough challenge; it's been around for many years, and if you tried a speech recognition system from ten years ago, you'll know the performance was pretty bad. Could deep learning help with speech recognition? Now, at the time there was a whole community of people working on speech recognition, people doing PhDs, going to conferences, publishing papers, but the error rate of speech recognition systems had been pretty much flat for an entire decade. Along came deep learning, and it immediately produced a 30% reduction in error rate. That was dramatic, if you think of the effort that had gone into this field over the previous ten years.

In this particular example, this is Rick Rashid, the founding vice president who set up Microsoft Research worldwide. He's seen here in Beijing, and he's illustrating the power of deep learning by giving a talk to a Chinese audience in Mandarin. Now, he doesn't speak any Mandarin. What's happening is that he's speaking in English, and that is first of all captured and transcribed into text by a deep neural network; that English text is then translated into Mandarin by another deep network; and finally a third deep network is taking that Mandarin and synthesizing speech, but using samples of Rick's own voice, so that even though it's Mandarin, it still sounds a little bit like Rick.
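The architecture, stripped of everything but its shape, is just the composition of three learned functions. A minimal sketch in Python, with the three networks stubbed out as hypothetical functions (none of these names come from the actual system):

```python
def transcribe(audio_en):
    """Deep network 1: English speech -> English text."""
    raise NotImplementedError  # stands in for a trained speech recognizer

def translate(text_en):
    """Deep network 2: English text -> Mandarin text."""
    raise NotImplementedError  # stands in for a trained translation network

def synthesize(text_zh, voice_samples):
    """Deep network 3: Mandarin text -> speech in the speaker's own voice."""
    raise NotImplementedError  # stands in for a trained speech synthesizer

def speech_to_speech(audio_en, voice_samples):
    # The whole demo is a pipeline of the three learned stages.
    return synthesize(translate(transcribe(audio_en)), voice_samples)
```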

So that was a sort of seminal moment. These deep neural networks were not programmed to do this; they learned how to do it, they learned from data, so it's a very different approach to artificial intelligence. And this was a particularly moving occasion, because there were about 5,000 Chinese students in the audience, and some of the students were in tears at the thought that the language barrier might at last come crashing down. So that was interesting.

But what's particularly interesting is that this one technique of deep neural networks seems capable of solving many different tasks. This is a British startup called DeepMind, and in 2014 they applied deep neural networks, combined with a technique called reinforcement learning, to again tackle some games: not chess this time, but some old Atari games, about 50 of them. The neural network learns to play these different games by, effectively, a process of trial and error: it makes random moves, it sees how the score is doing, it tries different things, and it gradually learns, over a period of many, many games, how to play the game successfully. In about half of the games it achieved human-level performance. Now, what's interesting is that it wasn't programmed to solve a specific game: the exact same architecture, the exact same software, was able to learn to play a whole variety of different games. So this is much more like the capabilities of the brain: the ability to learn, and the flexibility to learn new problems.
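The "trial and error" here has a precise form. A minimal tabular Q-learning sketch in Python gives the idea; the Atari work used a deep network in place of the table, so this illustrates only the learning rule, not DeepMind's system, and the environment interface (reset, step) is assumed for illustration:

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Assumes env.reset() -> state and env.step(a) -> (next_state, reward, done)."""
    Q = defaultdict(float)  # Q[(state, action)]: estimated long-run score
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Mostly exploit what has been learned; occasionally try a random move.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in actions)
            # Nudge the estimate towards the reward plus discounted future score.
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q
```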

DeepMind was acquired by Google, and in 2016, earlier this year, they used much the same technique, deep neural networks and reinforcement learning, to tackle a very much harder game: the game of Go. This is a deceptively simple-looking game, involving black and white stones on a simple grid of squares, but the number of combinations of moves is very much larger than in chess. So from the computational point of view this is a very much harder problem than chess, and building a machine that achieves human-level performance had therefore proven very much harder than with chess; it was thought to be at least another decade before that would be achieved. So it was very surprising when, only this year, AlphaGo, as the program was known, beat one of the world's leading Go players, probably about a decade earlier than people had anticipated.

So that's game playing, and that's speech recognition, but much the same technique of deep learning can be applied to very different fields. Here's another historically very important example. This is called ImageNet: a data set of the order of a million images, classified according to many thousands of different categories, and the goal is for a machine to take an image and assign it to the correct category. So it needs to take the top-left image as its input, and the output has to be the label "judo" and not the label "oceanfront", for example. Well, that's a tough problem, and people had been working on it for a number of years. Then they applied deep learning, and the immediate effect was to halve the error rate compared to any previous technique, so again a very dramatic improvement in performance. And back in 2015 a deep neural network developed by Microsoft Research was applied to this data set and achieved the same error rate that a human makes. Now, I should say that one of the reasons the neural network is as good as a human is that it's better than humans at distinguishing 57 different varieties of mushroom, while it also makes mistakes that we would think of as rather silly, mistakes perhaps no human would make. But it's nevertheless remarkable that this same architecture, the same concept, can be applied to this very different domain and achieve human-level performance. And it's really that spectrum of successes in many different domains that has underpinned this explosion of excitement around artificial intelligence.

So I thought I'd spend a little bit of time now looking at neural networks and deep learning, and tell you a little bit about some of the science and some of the technology behind these successes. Neural networks, as the name suggests, are inspired, if only loosely, by the human brain, and in particular by the neurons in the brain. The interesting part of the brain is these electrically active cells called neurons, and here's a photomicrograph showing some neurons which have been stained. They are very complex structures with lots of branches; they make lots of connections with each other, and they send electrical signals to each other, and thereby process information. Now, the human brain is often described as the most complex object in the known universe, and I just want to give you a little feeling for just how extraordinary the brain is. This is a picture of South America, and outlined in red is the Amazon rainforest. In spite of our best attempts to destroy it, it's nevertheless still absolutely enormous, and you can see a picture of the UK for scale (I used to have a picture of Europe, but I've had to update it recently, unfortunately). Now, what's interesting is that the number of neurons in the brain is of the same order as the number of trees in the Amazon rainforest. But that's not really the interesting part. The really interesting part about the brain is not the neurons but the connections between the neurons. These are called synapses, and each neuron makes perhaps 10,000 connections with other neurons. The synapses are thought to be the seat of learning, and the number of synapses in the brain is of the same order as the number of leaves on the trees of the Amazon rainforest. So the brain is truly extraordinary. We certainly don't understand how the brain works, but our limited knowledge of how the brain works has been inspirational in developing a technology called neural networks.

So here are two neurons. The neuron on the left, if it's stimulated in an appropriate way, can fire: it can send an electrical impulse down this cable, the axon. That axon makes connections, called synapses, with other neurons, and can stimulate those other neurons themselves to fire, or can inhibit them from firing. And the strengths of those synapses can change as a result of the operation of the brain, as a result of processing information; so the brain has the capability to learn as a result of, effectively, the data that it sees, the inputs that it receives. Going back as far as the 1940s, people began to build mathematical models of neurons and synapses and learning in the brain. There are some very sophisticated models, but the ones that interest us are extremely simple ones, and we can describe them by this little picture. The dots down the left-hand side represent inputs; you can think of those as inputs from other neurons. They're combined together to cause the neuron, here labeled y, either to fire or not to fire, and the connections between them, labeled w, represent the strengths of those synapses.

This little model can be expressed mathematically, and this is the only equation in the entire talk, I promise you. The equation captures this very nicely: it says that you take each of those inputs x_i and you multiply it by a weight, a strength, w_i, that can be positive or negative; you add up all those weighted inputs; and you pass the sum through this function sigma. And sigma just says that if the total combination of inputs is positive, the output is a 1, and if it's negative, the output is a 0.
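Written out in the notation just described:

$$y = \sigma\!\left(\sum_i w_i x_i\right), \qquad \sigma(a) = \begin{cases} 1 & \text{if } a > 0, \\ 0 & \text{otherwise.} \end{cases}$$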

So you could imagine this little neuron being a little classifier. Imagine those inputs being something extracted from an image, and perhaps the output says whether or not there's a face in the image: our goal is to have this neuron output a 1 if there is a face, and output a 0 if there isn't a face. And so we could imagine adjusting all those little weights, those parameters, those synapse strengths if you like, using lots of examples of images of faces and images of not-faces, until we have tuned those parameters in such a way that the system has learned to solve that particular task.

That very simple mathematical model of a neuron is called the perceptron, and there was a lot of excitement, and a lot of hype, around these very early neural networks back in the 1960s. This is one of the pioneers of the field: Frank Rosenblatt, who in the late 1950s and early 1960s did a lot of work, both theoretical and experimental, on perceptrons. What's interesting, of course, is that he didn't have access to wonderful machines like this laptop; he couldn't just program these in software, so he had to build analog hardware instantiations of perceptrons. Here he is in front of some symbols; at the back there's a triangle, a circle, and a square, which shows a typical problem that the perceptron would be asked to solve: could it distinguish between a circle and a triangle? Could it be trained to tell the difference between a circle and a triangle? Now, the input to the perceptron was this box on the desk. It's an array of twenty by twenty cadmium sulfide photocells, photosensitive or light-sensitive resistors, which formed effectively a very primitive digital camera; these are like the pixels of a very slow, very low-resolution camera.

So here's a typical experimental setup. In this case it's going to try to distinguish between some letters of the alphabet. This is, as I said, a very poor-quality camera, so we need a very powerful light shining on the object, and there's a lens that focuses the image onto those photocells. The output of that camera then goes into this big rack of equipment, and what you see here are, effectively, the synapses of these neuron models. In the rack here, in this person's hand, each of those cylindrical objects is a combination of an electric motor and a rotary resistor, or potentiometer. By a purely electrical process, the motor can change the resistance value, and the value of that resistance represents the strength of that synapse.

And Rosenblatt invented something called the perceptron algorithm, which was a mathematical procedure by which those motors could adjust the strengths of the synapses in response to various inputs, in order that the system could learn. So let's say we're distinguishing between triangles and squares. You present a triangle at the inputs; if the output says it's a triangle, that's fine. If the system makes a mistake and outputs "square", the algorithm makes little adjustments to all of those synapses to bring the output closer to the desired value. Now you present another image, say a square; if the output is "square", that's fine, and if it isn't, we make some adjustments to the synapses. And so that perceptron learning algorithm allowed the system to learn by seeing lots of examples of each class, gradually improving in performance and hopefully solving the problem.
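The update rule is simple enough to state in a few lines. A minimal sketch in Python, with inputs as lists of 0/1 pixel or feature values and labels 0 and 1 for the two classes (say, square and triangle):

```python
def train_perceptron(samples, n_features, epochs=20, lr=0.1):
    """samples: list of (features, label) pairs, label 0 or 1."""
    w = [0.0] * n_features  # the adjustable 'potentiometer' strengths
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            output = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            error = target - output  # -1, 0, or +1
            # Only when the output is wrong do the weights get nudged,
            # just as the motors nudged the potentiometers.
            for i in range(n_features):
                w[i] += lr * error * x[i]
            b += lr * error
    return w, b
```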

There was a lot of excitement about these perceptrons, because it turned out they could actually learn to solve things like distinguishing shapes, and letters of the alphabet, and so on, and for the day that was remarkable. But this wasn't just an empirical result: Rosenblatt also proved a theorem. He showed mathematically that if the perceptron was capable of solving a problem, that is to say, if there existed a setting of all of those resistors such that the system would solve the problem, then the perceptron learning algorithm was guaranteed to find that solution. So people got very excited about this.

I'll just show you a little bit more of the structure of the perceptron. What you see on the right are those racks of potentiometers; on the left is this jungle of wiring. This jungle of wiring looks like it's just random, and the reason is that it is just random. These are the inputs to those neurons; they're called features, and they're just little combinations of those pixels. We've got a 20-by-20 grid of photocells, the pixels, and each of the neurons would combine some inputs, the so-called features, which were just little combinations of subsets of those pixels, combined together in some fixed way chosen by the designer of the perceptron. There are lots of ways of choosing these, and lots of research was done; one way of choosing them is just to take random subsets of those pixels and combine them, and that's what this random-looking wiring is. The reason this was interesting is that even though you'd randomized the inputs, the system could still learn to unscramble them and solve the problem, which was sort of remarkable. And what's even more remarkable is that you could take a system which has learned, which has been trained to solve a problem, go in with a pair of wire cutters and cut 10% of the wires, and its performance would degrade a little bit, but it would continue to work. It's a little bit like, you know, going down the pub, having a few too many beers, a few extra neurons die, and the next day you can still function, maybe not quite as well as previously. They call that graceful degradation. It's a property which I'm pretty sure my laptop doesn't have: if I started cutting wires in my laptop, very soon it would just stop working completely. So again, this is a little bit brain-like in some ways, and it generated a tremendous amount of excitement.

So let me just summarize what's going on here with a little picture. The nodes or units, the neurons if you like, in the left-hand column represent, in the case of the perceptron, those pixels, the original raw pixels. The dots down the middle are what we call the features: each of those dots would be some combination, it might be just the sum, of a randomly chosen subset of those pixels, and that's represented by those green connections. The green connections correspond to that jungle of random-looking wiring, and they're fixed: they're chosen by the designer at the outset and then they don't change during learning. Then what we have is this red layer, and the red connections represent those resistors; these are the adjustable parameters of the perceptron. So the neurons, or the nodes, on the right-hand side again take combinations of some subsets of the features, but this time the strengths of the combinations are learned; those are the adjustable parameters. What you see is that we have a layered structure; in fact, again this is reminiscent of the brain, if you think of how visual processing in the brain occurs through a series of layers of neurons. But the important thing is that only one of these layers is actually adaptive; only one layer changes during learning.

Now, perceptrons were interesting because they could learn to solve problems, and that's really exciting; but sometimes you would give one a very similar problem, which looked just as easy, and it would fail to learn. So what was going on? Sometimes they worked, sometimes they didn't. Well, these two computer scientists, Minsky and Papert, analyzed perceptrons mathematically, and they showed that there are some very severe limitations to the capabilities of perceptrons: that they are very limited in what they can do, and that the limitation arises because there is only a single layer of adaptation. They published this in their famous book called Perceptrons, and it's often said that the publication of this book led to a loss of interest in this alternative approach to artificial intelligence, where instead of programming the computer to be intelligent, the system learns to be intelligent. This book, of course, is a piece of mathematics; people accepted it was correct, so it was hard to refute. But the proof applied only to systems with a single layer of adaptive connections. At the end of the book they conjectured that even if you had more than one layer, similar results would apply; they conjectured that these neural networks were never really going to be very useful. That part was pure conjecture.
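The canonical example of such a limitation, and the one usually quoted from the book, is exclusive-OR: no single layer of adaptive weights can separate its two classes. You can see the failure directly by reusing the train_perceptron sketch from above:

```python
# XOR: output 1 when exactly one input is 1. No single adaptive layer
# can solve this, however long it trains.
xor_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w, b = train_perceptron(xor_samples, n_features=2, epochs=1000)
for x, target in xor_samples:
    output = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
    print(x, "->", output, "want", target)  # at least one line is always wrong
```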

So there we have the perceptron, with its single layer of adaptation. The field of neural networks had been very exciting in the 1960s, but it had gone into abeyance: people had lost interest as a result of the mathematical discussion of perceptrons and their limitations. Then something very interesting happened, which was the discovery of algorithms, different from the perceptron learning algorithm, which would allow networks having more than one layer of adaptation to be trained: techniques like so-called error backpropagation. And so people could now apply multi-layered systems to various problems and see if they worked, and they discovered that these systems were very much more powerful than the single-layer perceptrons. Now, for various technical reasons it turned out that you could really only train systems with, usually, at most a couple of layers; but nevertheless those systems were very powerful. They were capable of solving lots of problems that had hitherto been impossible, and this led to the second wave of excitement around neural networks, in the late 1980s and 1990s.
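Error backpropagation itself fits in a short sketch. Here is a toy two-layer network in Python with NumPy, trained on the exclusive-OR problem that defeated the single layer; the layer sizes and learning rate are illustrative choices, not anything from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # first adaptive layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # second adaptive layer
lr = 1.0
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)       # forward pass: hidden layer
    out = sigmoid(h @ W2 + b2)     # forward pass: output
    # Backward pass: propagate the output error back through each layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # approaches [0, 1, 1, 0]
# (With an unlucky initialization backprop can get stuck in a poor
# local solution; rerunning with a different seed usually fixes it.)
```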

Now, I began my career as a physicist: I did a PhD in quantum field theory, and then I went off to work on the fusion program. This is me at Culham Laboratory in Oxfordshire, as a theoretical plasma physicist working on nuclear fusion. I read about the discovery of the backpropagation algorithm and these two-layer neural networks, and their brain-like ability to learn to solve problems, and I thought this was tremendously exciting. My sense was that we were at the dawn of a new era. It was so exciting that I actually decided to change fields and change career, and I did that by taking neural networks and applying them to data that we were gathering from experiments. This is the inside of the JET tokamak, the world's largest tokamak, down in Oxfordshire, which operates to contain a hydrogen plasma at about 200 million degrees. The outside of it is bristling with all kinds of diagnostics: lasers, magnetic measurements, and so on. So for the day we had a huge amount of data, and we could analyze it in all sorts of interesting ways. I set about applying neural networks to analyzing this data, and I became so excited about this that I changed fields: I left physics and moved into the field of computer science.

So those were the two-layer networks. They were very powerful, but they were a long way short of achieving human-level performance on tasks of the kind that I've talked about. These systems were deployed in practical applications, and they were very useful, but I think it's fair to say they remained reasonably niche. What happened then is that other techniques came along; there's a technique called support vector machines that was very popular, which achieved slightly better performance than these neural networks, and so for the second time neural networks sort of went into decline: people lost interest a little bit and moved on to other techniques for so-called machine learning. And then, about five years ago, there was something of a breakthrough. Over a number of years, people like Hinton, who himself had been pivotal in the development of backpropagation and the two-layer neural networks of the 1980s, discovered how to train networks having more than two layers, in fact having many layers.

And so the term "deep learning" refers to systems which have many, many layers of processing. This makes them extremely powerful, because if you think about a task such as taking an image and then describing what's going on in that image in English, that's something which isn't going to be done in two simple steps. There'll be some very low-level processing: discovering edges in the image; discovering combinations of edges that make corners; discovering relationships between corners that make shapes; discovering how those shapes combine together to make objects like faces; looking at the shapes of faces to tell whether somebody's smiling or not; combining those and using them to generate words; and combining those into sentences, eventually generating language that describes what's going on in that image. That's many, many layers of processing, and so we need to be able to train networks that are deep, that have many layers of processing. And really this is the breakthrough that underpins the new excitement in the field of artificial intelligence.

So here's an actual deep neural network. This is one that's used for image processing, for example taking images and labeling them according to the objects that are present in the image. These blocks represent groups of nodes or neurons: in one of those blocks there are many layers, each layer is a whole grid of units, and the units make connections to patches, or receptive fields, in the previous block. So you can see the structure is pretty complex, but that whole system is adaptive, and that whole system is trained on large data sets. And so now we have neural networks containing thousands or even millions of adjustable parameters, trained on millions or sometimes even billions of examples of data points.
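A minimal sketch of such a block structure, written in PyTorch for brevity; the layer sizes here are illustrative choices, not those of the network on the slide:

```python
import torch
import torch.nn as nn

# Each 'block' is a stack of convolutional layers whose units look at
# small patches (receptive fields) of the previous block's output grid.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                        # halve the spatial grid
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 56 * 56, 1000),          # one score per object category
)

x = torch.randn(1, 3, 224, 224)             # a dummy 224x224 colour image
print(model(x).shape)                       # -> torch.Size([1, 1000])
print(sum(p.numel() for p in model.parameters()), "adjustable parameters")
```

Every weight in every layer of a stack like this is adjusted during training; that is the sense in which the whole system is adaptive.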

So that's a modern deep neural network. And I like this: this is a fairly geeky magazine called Wired. If you're in the IT business you'll know Wired magazine; if you're not, you perhaps won't. But this is the front cover of Wired magazine for June of this year, and it just says: "The end of code. Soon we won't program computers. We'll train them." So behind all the excitement, if you like the hype, around artificial intelligence, there really is a very fundamental transformation happening in the field of computer science. It's a transformation from programming a computer directly to solve a problem, that is, a human or a team of humans devising a set of instructions such that when the computer follows those instructions it solves a particular task, to instead doing something very different: writing a set of instructions, a computer program, which allows the computer to learn, and then training the computer to solve the task by using large amounts of data. I view that as a very fundamental shift in the nature of computation.

There's something else going on as well, and I'll illustrate it with this slide. I asked colleagues just to write the word "uncertainty", and what you see, of course, is the tremendous variability in human handwriting. You can see from this example why it's so difficult to program a computer directly to do something like recognize human handwriting. There's tremendous variability: if you think of a little rule which describes the shape of the letter "e", you'll very quickly find an exception to that rule, and so you write another rule that captures the exception, but there'll be exceptions to the exception. There's this combinatoric explosion of possibilities, and that's really what defeated old-fashioned AI back in the 1950s and 60s, plus, of course, the lack of fast computers.

So there's something else going on, and in a sense it's complementary to this idea of learning. Not only are we seeing a transformation in computation, a revolution in computation, from software which is written by hand by humans to software which is learned from data, but we're also seeing a shift from software which is based on logic, where everything is a zero or a one, everything is deterministic, to software which deals with uncertainty: software that quantifies uncertainty and deals, if you like, in shades of grey and ambiguity.

So I'm going to show you a little demonstration, and this demonstration was actually designed to illustrate this idea of uncertainty, and to show you a modern view of machine learning. I've shown you what I would think of as the traditional view of machine learning: adjusting these synapses, or adjusting these parameters in a neural network, to bring it closer and closer to the desired performance. But there's a very different view of machine learning, and it shows you the critical role played by uncertainty.

So this is an example that will be very familiar to many of you: it's what we call a recommendation system, and in this case it's going to recommend films, or movies, to people. This is a huge table: each column of the table represents a different movie, and each row of the table is a different person, and our goal is to recommend movies to somebody which we think they might enjoy watching. Now, in a real system we would certainly make use of features: features of the film, for example its length, its genre (is it a comedy or an action-adventure or whatever), who the actors are, and so on. We would also make use of features of the user: their age, their gender, their geographical location, other things we might know about them. Those are certainly very helpful in matching movies to people, but for the purpose of this demo let's ignore those features: all we know are the ratings which people have given to movies. So we know that a certain person has watched a particular movie and they liked that movie; that's represented by the ticks in these boxes, where a particular person has watched a particular movie and given it a positive rating. Sometimes people watch a movie and they don't like it, and so they give it a negative rating; those are the crosses. Now, this is essentially a big table; it might have ten thousand movies and ten million people, so it's an enormous table, and it's mostly empty: we don't have very many ratings. And our goal, effectively, is to fill in the blanks: where a person has not yet watched a movie, we want to predict, will they like the movie or will they not?
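The talk doesn't name the algorithm behind the demo, but one standard way to fill in the blanks of such a table is low-rank matrix factorization: give every person and every movie a short vector of learned traits, and predict a rating from their dot product. A minimal sketch in Python, trained by stochastic gradient descent, with ticks as +1 and crosses as -1:

```python
import random

def factorize(ratings, n_users, n_movies, k=10, epochs=50, lr=0.05, reg=0.02):
    """ratings: list of (user, movie, value) with value +1 (tick) or -1 (cross)."""
    rnd = random.Random(0)
    U = [[rnd.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    M = [[rnd.gauss(0, 0.1) for _ in range(k)] for _ in range(n_movies)]
    for _ in range(epochs):
        for u, m, r in ratings:
            err = r - sum(U[u][i] * M[m][i] for i in range(k))
            for i in range(k):  # nudge both trait vectors to shrink the error
                ui, mi = U[u][i], M[m][i]
                U[u][i] += lr * (err * mi - reg * ui)
                M[m][i] += lr * (err * ui - reg * mi)
    return U, M

def score(U, M, user, movie):
    # Positive scores lean towards 'will like'; negative towards 'won't'.
    return sum(ui * mi for ui, mi in zip(U[user], M[movie]))
```

The demo described next goes one step further and reports not just a score but a probability, which is exactly where the quantification of uncertainty comes in.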

So I'm going to show you a demonstration of a system which solves exactly that task. It's based on machine learning, and although this is a little demonstration system, the actual technology behind it is used in real systems. In this case we've chosen a couple of hundred movies, and the system has already done a certain amount of learning, based on the ratings of a few tens of thousands of people on these two hundred movies. What it's going to do now is make recommendations for me, so it's going to learn about my preferences. Now, I wasn't one of the people in the original dataset, so it knows nothing about me at this point. What I need to do is watch a movie and decide whether I like it or not. So let's say I've watched this movie, and let's suppose I do like it. OK, so what it's doing now is reordering the other movies according to whether it thinks I'll like them or not. The vertical position on the screen is irrelevant; they're just spread out vertically so you can see them. What matters is the horizontal position. If a movie is close to the right-hand edge of the white region, close to that green region, then the system is very confident that I will like the movie, and it measures that confidence using probability: it assigns a high probability to my liking that movie. Conversely, if it positions a movie on the left-hand side of the white region, towards that red edge, it's very confident that I won't like it. And if it's in the middle, if it's 50/50, it's really very unsure.

What you see is that most of the movies are clustered around the middle; there's a lot of white space down the right, and a lot of white space down the left. That's not surprising: the only thing it knows about me is that I like that one movie. It hasn't had much data from which to learn, and so it's really very unsure about most of them. So let's pick another example; let's suppose I don't like this one. Now what we see is the movies spreading out: some of them are moving towards the right, where it's more confident that I will like them, and some are moving to the left, where it's more confident that I won't like them. And this, if you like, is the modern view of machine learning: the reduction in the uncertainty of the system as a result of seeing the data. I can carry on: I can pick another one, let's say I like this one, and pick another one, and suppose I don't like that one, so that's four examples it's seen. Now you see a very different picture. You see a lot of movies clustered down the right-hand side, where it's very confident I should go and watch them, and ones down the left-hand side where it's pretty confident I won't like them. Most of the white space is now in the middle, and there are a few that it's really quite unsure about (The Sound of Music, for some reason). But nevertheless, you can see that it has learned from data through a reduction in uncertainty, and that, I think, is the modern view of machine learning.

I'm going to use this demo to illustrate something else as well, which I think is really very powerful, and which the demo illustrates very nicely: the concept of information. The whole field called information theory was invented by Shannon back in the 1940s, and he provided a mathematical basis for the concept of information; that really is foundational in modern computer science and information technology. And I can illustrate it by going to one of these movies down the right-hand side.

If a movie is located on the right-hand side, the system is pretty confident that I will like it. So let's suppose I watch it, and indeed I do like it; watch what happens, watch carefully, as I let go of the mouse button. OK, actually I'll pick another one. So it's confident that I'll like this one; let's say I do like it, and watch what happens: a tiny change. The reason is that there was very little surprise in that data, and so there was very little information. And if I pick another one here that it's very confident I'll like, but let's suppose that I don't like this movie, again watch what happens as I let go of the mouse button. OK, so this time we see a dramatic change: it's now got rather confused again, and a lot of things have gone back to the middle, because there was a high degree of surprise. It was really confident that I was going to like the movie, and I said I didn't; that was very surprising. And indeed Shannon defined information as the degree of surprise. Now, what's interesting is that this is, I think, a very nice illustration of the difference between data and information, because in every case the amount of data is the same: it's one bit, or one binary digit. In order to say that I like a movie or I don't like the movie, I can encode it as a 0 or a 1, so each of those, let's say, seven ratings I've provided so far is represented by one bit of data. The amount of data is the same, but the amount of information is very different. For a movie on the right-hand side that I like, the amount of information goes to zero at the right-hand edge, where it becomes certain that I like it and indeed I do like it; and the amount of information grows logarithmically to infinity as we go across to the left-hand side. So there's a very nice illustration of the distinction between data and information.
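Shannon's definition makes this quantitative: an outcome that had probability p carries -log2(p) bits of information. A quick illustration in Python (the probabilities here are made up):

```python
import math

def surprise_bits(p):
    """Shannon information of an outcome that had probability p, in bits."""
    return -math.log2(p)

# One bit of *data* (like / don't like) can carry very different
# amounts of *information*, depending on how expected the outcome was.
for p in (0.99, 0.9, 0.5, 0.1, 0.01):
    print(f"P(outcome) = {p:<4}: {surprise_bits(p):5.2f} bits of information")
# As p -> 1 the information goes to zero; as p -> 0 it grows without bound.
```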

And that first example is what we call collaborative filtering, because people are collaborating together to help each other work out which movies they're going to like.

So this quantification of uncertainty is based on a branch of mathematics that goes back certainly 350 years, called probability theory. Now, the mathematical equations of probability are deceptively simple; it's a very beautiful and very elegant theory, and it's just a way of putting numbers behind uncertainty in a way that's very consistent. So probability is really the calculus of uncertainty. Now, there are two kinds of probability. If you were taught probability in school, you were probably taught probability as the limit of an infinite number of trials. So the probability that a coin will land heads, if it's a fair coin, is 0.5, or 50%. What that means, in precise terms, is that if you flip the coin a number of times and measure the fraction of times it lands heads, then as you flip more and more times, in the limit of an infinite number of flips, that fraction will fluctuate around but will eventually converge to a value, and that value is the probability. We call that the frequentist notion of probability: it's the frequency with which something occurs.
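A couple of lines of simulation show that frequentist limit in action; a minimal sketch in Python, where the fluctuation followed by convergence is the whole point:

```python
import random

random.seed(1)
for n in (10, 100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>9} flips: fraction of heads = {heads / n:.4f}")
# The fraction wobbles for small n and settles towards 0.5 as n grows.
```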

There's another view of probability, which we call the Bayesian view, and which in a sense is more general, because it encompasses the notion of frequency but applies also to things like one-off, unrepeatable events. If we want to ask what's the probability that the Moon was once part of the Earth, as opposed to being a separate body that was captured by the Earth's gravitational field, we can't repeat the origin of the universe millions of times and see in which fraction of them it happens one way or the other; that doesn't make sense, it's a one-off event. But we're still using probabilities to describe uncertainty, and it's interesting that we use the same terminology. The reason we use the same terminology is that if you try to ascribe numbers to quantify uncertainty, those numbers obey some very simple equations, and those equations are exactly the same equations as are obeyed by frequencies in things like coin flips. So we use the same terminology, and we call it probability; but this is a much more general definition.

Let me try to illustrate it with this example. Here's a coin, but it's a bent coin, so if I flip it, it might land concave side up more often than it lands concave side down. I don't know if that's right according to the physics, but let's suppose it is. Imagine that I flip this coin an infinite number of times and look at the fraction of times it lands concave side up; let's say that'll be 60%, or 0.6. That's the probability of landing concave side up, and it's a frequentist probability. But now suppose that one side of the coin is heads and the other is tails, and imagine that you don't know which side is which: you don't know whether the concave side is heads or the convex side is. As soon as I ask you to take a bet, say bet five pounds on whether it lands heads or tails, which way should you bet? Well, from your point of view it's symmetrical: you don't know which side is heads and which is tails. So even though you know it's going to land concave side up more often than concave side down, because you don't know which side is heads and which is tails, it's symmetrical, and if you're acting rationally you would bet according to a probability of 0.5. That doesn't mean you believe that, in the limit of an infinite number of trials, it will land heads 50% of the time: you believe that it will either land heads 60 percent of the time or it will land heads 40 percent of the time, but you don't know which. So in a sense the frequency with which it lands concave side up is, in that case, a bit like a frequentist probability, but this uncertainty over which side is heads or tails is a one-off event: one side is heads, or the other is heads. It's not a repeatable event, it's a one-off thing; you just don't know which it is. And that's like this quantification of uncertainty, this Bayesian view of probability.

Now, at this point you might be thinking, well, I'm making a lot of fuss here, because we've just made a tiny change: instead of having zeros and ones we've now got numbers between 0 and 1. It seems like a very small change. So let me give you one little illustration of the fact that probabilities can behave in very peculiar ways; this is not a trivial change at all, it's really, I think, quite significant. Here's a little example. Here's a bus, a car, and a bicycle, and let's suppose that the bus is longer than the car, and that the car is longer than the bicycle. Then if the bus is longer than the car, and the car is longer than the bicycle, I think you'll all agree that the bus must be longer than the bicycle. Does anybody not agree with that? Good. Mathematicians call that property transitivity. So these deterministic numbers, these certain numbers, these lengths of objects, behave in this transitive way. But if we now go to uncertain quantities, we discover that they can be non-transitive, and this is extremely peculiar; and it's not just a theoretical thing.

These are non-transitive dice, and they're very easy to construct. They're just regular dice; the only unusual thing about them is the choice of numbers on the faces. These are unbiased dice: they can land on each of the six faces with equal probability. But the choice of numbers is a bit unusual, and a particular number only appears on one of the dice, so you can never have a draw: if you roll one die against another, one of them will always come up with the higher number, and we'll call that one the winner. What you discover is that if you, let's say, roll the red die against the yellow die, then two-thirds of the time the number on the yellow die will be bigger than the number on the red die. The yellow die actually just has threes, so it always comes up with a three; the red die has a couple of sixes and four twos, so two-thirds of the time it rolls a two, and one-third of the time it rolls a six. So two-thirds of the time, yellow beats red. OK, that's fine. Likewise, two-thirds of the time purple will beat yellow, and two-thirds of the time green will beat purple, rolling the higher number. And here's the amazing thing: two-thirds of the time, red will beat green. So you can equip yourself with a set of non-transitive dice (I sell these at very reasonable rates, by the way) and you can have a very profitable evening down the pub with your friend. You show them these dice and you say: examine them to your heart's content; now you pick whichever one you like, and I'll pick one. Of course, you pick the next one in the cycle, and you say, let's do the best of eleven, or best of fifteen, throws for a five-pound bet. After a reasonably large number of throws it's very, very likely that you'll win the bet. So they get a bit cross, and they want the die that you've just used, and then you pick a different one, and so on, and you'll always win. So this is just one of many, many examples of the fact that probabilities behave in very unusual and very peculiar ways.
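You can check the whole cycle exactly with a few lines of code. The talk gives the faces only for the yellow and red dice, so the purple and green faces below are an assumed completion, one standard set with the stated two-thirds behavior:

```python
from itertools import product
from fractions import Fraction

dice = {
    "yellow": [3, 3, 3, 3, 3, 3],   # stated in the talk
    "red":    [2, 2, 2, 2, 6, 6],   # stated in the talk
    "purple": [0, 0, 4, 4, 4, 4],   # assumed completion
    "green":  [1, 1, 1, 5, 5, 5],   # assumed completion
}

def p_beats(a, b):
    """Exact probability that die a shows a higher number than die b."""
    wins = sum(x > y for x, y in product(dice[a], dice[b]))
    return Fraction(wins, 36)

for a, b in [("yellow", "red"), ("purple", "yellow"),
             ("green", "purple"), ("red", "green")]:
    print(f"P({a} beats {b}) = {p_beats(a, b)}")  # each line prints 2/3
```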

So we've seen the idea that artificial intelligence is being revolutionized by learning from data, and that learning from data happens through the quantification of uncertainty. Learning from data is one of the key ideas, and these algorithms, things like deep neural networks, are one of the foundations of this revolution. Another of the foundations, obviously, is the data itself: the explosion that we're seeing in data is one of the things enabling this revolution. The amount of data in the world is doubling on a very short timescale, probably less than a year, and so we're seeing a tremendous growth in the amount of data that can be used to fuel machine learning. There's a third ingredient as well, and that's computation. These techniques are very hungry for computer power: today we use neural networks with millions of adjustable parameters, trained on millions or billions of data points, using extremely powerful computers.

And these computers live increasingly in what are called data centers. Here's a picture of a data center; this particular one is a Microsoft data center somewhere in North America. What you see are these low buildings with no windows, and inside are racks and racks and racks full of computers and storage and networking; these are really the world's most powerful computers these days. Now, these data centers are the foundation of what we call cloud computing: the idea that computing is now increasingly centralized in these data centers and accessed anytime, anyplace, from any device. The trend is growth in cloud computing, and many companies, including in particular Microsoft, are expanding these data centers. In this data center, if you look closely at the top of the picture, you'll see some bulldozers and some construction work going on, because this data center is being expanded; and if we fly around a hundred and eighty degrees and look from the other end, you'll get some idea of the scale of that expansion. This particular data center is obviously increasing in size by an enormous factor, and all around the world new data centers are being constructed all the time. We're seeing tremendous growth in the capacity of these data centers.

Now, the last few weeks have been very interesting: there have been some very interesting announcements, in particular the announcement by Microsoft of the world's first exascale AI supercomputer, based on a technology called the FPGA, or field-programmable gate array. The way to understand what this is, is to think of it as a hardware chip where the architecture of the hardware can be changed using software, so it's a very flexible kind of chip. The chip itself is not as powerful, not as fast, as a fixed-architecture chip like the central processing unit in this laptop, but it's very flexible, so we can change the architecture and try out lots of different kinds of algorithms and neural networks and so on. And so in these data centers, alongside the regular computation, we've been deploying these field-programmable gate arrays on a very large scale, to the point where a couple of weeks ago we announced the world's first exascale AI supercomputer. Exascale means it can do an exa-operation per second: that's a billion giga-operations per second, or a million million million operations per second. I'm sure this won't end up being the fastest computer; there's more to come. So it's just an extraordinary growth in processing power, and coupled with data, coupled with these new algorithms, it is driving all of the excitement around machine learning and artificial intelligence.

What's this being used for? Well, many, many different things. One example, of course, is personal assistants: many organizations, and large companies, are working on developing personal digital assistants. This is Microsoft's, the one called Cortana. These technologies are at a very early stage of development, but I'm very confident the next decade will see the capabilities of these types of assistant advance very, very rapidly. There are many, many other applications of machine learning, and the technology continues to advance at a tremendous pace. Again, just in the last week or so there was an announcement, in this case again by Microsoft, of the achievement of human parity in speech recognition. This is an automatic speech recognition system which achieves the same error rate, at the word level, as a human transcriber: when humans transcribe speech they make a few errors, and so do the machines, and the error rate is now the same.

What else will we be using this for? Are we building killer robots to wipe us all out? Well, there are actually many, many more useful things we can do with this, and I'll just briefly tell you about a research project that we're looking at at Microsoft Research in Cambridge, called Project InnerEye. This is using these machine learning and artificial intelligence techniques to look at the treatment of cancer. What we see on the left is a cross-section of an MRI scan of a brain tumor, a very nasty brain tumor, and the radiologist is using a mouse, looking at this image and segmenting the tumor, that is, defining the boundary of the tumor by hand, so that this can be used for radiation therapy planning: firing in x-rays and radiation to try to kill the tumor while doing the minimum damage to the surrounding tissue. That's being done by hand, and it's a very time-consuming process, but we can use these machine learning techniques to speed it up, and also to improve its accuracy and reproducibility. So now a little bit of human intervention provides some initial segmentation, and after about ten seconds or so the segmentation is complete, and it's more accurate, or more reliable, than the human segmentation. This is a nice example of artificial intelligence being used in partnership with humans, because it's the case today, despite all the advances I've talked about, and I think it will be the case for quite some time to come, that the capabilities of machines are different from, and complementary to, the capabilities of humans. Here, the radiologist, with the experience of looking at many different images from many different patients over many years, has built up a good qualitative understanding of the tumor and of how it should be treated. What the computer is good at is the three-dimensional segmentation: defining accurately and reproducibly which of those three-dimensional pixels, or voxels, is tumor and which is normal tissue.

I started my talk with a rather gloomy outlook of killer robots. I think that belongs firmly in the world of Hollywood. But nevertheless, this is a very powerful technology, a very general-purpose technology, and as it's deployed, and I'm sure it will be deployed, in many ways which are of enormous benefit to society, it can help us as a species tackle some of the tremendous problems that we face in the 21st century. We must of course expect a few bumps on the road, and to help us think about issues around privacy and security, around the implications of this transformation to a world of solutions which are learned from data, we just announced, in the last week or two, the formation of the Partnership on AI, where some of the leading organizations working on artificial intelligence at large scale have come together to see how artificial intelligence can best be used for the benefit of society. And finally, if you're worried about killer robots: I think we will always remain in control. Thank you very much.
