INFERENCE IN FIRST-ORDER LOGIC
Artificial Intelligence, A Modern Approach, CH.9
2017.04.28(Fri)
Junya Tanaka (M1)
Definition
Sound
• Inference derives true conclusions given true premises
Complete
• Inference derives all true conclusions from a set of premises
Substitution(SUBST({var_name / value}, sentence))
• EX.
oP = Animal(X)
oSUBST({X/Panda}, P) = Animal(Panda)
Introduction
• Chapter 7 showed how a knowledge-based agent could
represent the world in which it operates and deduce what
actions to take.
• Chapter 8 examined first-order logic, which is sufficiently
expressive to represent a good deal of our common-sense
knowledge.
• The goal of this chapter is effective procedures for answering
questions posed in first-order logic.
Agenda
•9.1 Propositional VS. First-Order Inference
•9.2 Unification and Lifting
•9.3 Forward Chaining
•9.4 Backward Chaining
•9.5 Resolution
•9.6 Summary
9.1 Propositional VS. First-Order Inference
• This section and the next introduce the ideas underlying
modern logical inference systems
• some simple inference rules that can be applied to
sentences with quantifiers to obtain sentences without
quantifiers
• Inference rules for quantifiers
oUniversal Quantifier (∀)
 a type of quantifier, a logical constant which is interpreted as "given
any" or "for all"
oExistential Quantifier (∃)
 a type of quantifier, a logical constant which is interpreted as "there
exists", "there is at least one", or "for some"
Universal Instantiation(UI:∀)
• we can infer any sentence obtained by substituting a ground
term (a term without variables) for the variable.
• Example
oIf ∀x Person(x) (every x is a person)
othen Person(John), Person(Jack)…so on are true.
oSo substituting x with any ground term is legal.
• SUBST(θ, α) denotes the result of applying the substitution θ
to the sentence α. Then the rule is written

  ∀v α
  ──────────────
  SUBST({v/g}, α)
for any variable v and ground term g
Example
• Suppose our knowledge base contains the standard
folkloric axiom stating that all greedy kings are evil:
∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
• Then it seems quite permissible to infer any of the
following sentences:
King(John) ∧ Greedy(John) ⇒ Evil(John)
King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)
King(Father(John)) ∧ Greedy(Father(John)) ⇒ Evil(Father(John))
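To make SUBST concrete, here is a minimal Python sketch (an illustration, not the book's code), assuming sentences are encoded as nested tuples and variables as lowercase strings:

  # Hypothetical tuple encoding: ('King', 'x') is King(x); variables are lowercase.
  def is_variable(t):
      return isinstance(t, str) and t[:1].islower()

  def subst(theta, sentence):
      """Apply substitution theta (a dict var -> term) to a sentence."""
      if is_variable(sentence):
          return theta.get(sentence, sentence)
      if isinstance(sentence, tuple):                 # compound expression
          return tuple(subst(theta, part) for part in sentence)
      return sentence                                 # constant symbol

  # Universal Instantiation with the ground term John for x:
  rule = ('⇒', ('∧', ('King', 'x'), ('Greedy', 'x')), ('Evil', 'x'))
  print(subst({'x': 'John'}, rule))
  # ('⇒', ('∧', ('King', 'John'), ('Greedy', 'John')), ('Evil', 'John'))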
Existential Instantiation
• The existential sentence says there is some object satisfying
a condition, and applying the existential instantiation rule just
gives a name to that object
• For any sentence α, variable v, and constant symbol k that
does not appear elsewhere in the knowledge base,
  ∃v α
  ──────────────
  SUBST({v/k}, α)
• Intuition
oThere is “some” object satisfying a condition.
oWe cannot randomly substitute the variable.
oInstead, we use a new constant C that we assume satisfies
the condition.
Example
• From the sentence
∃x Crown(x) ∧ OnHead(x, John)
• we can infer the sentence
Crown(C1) ∧ OnHead(C1, John)
as long as C1 does not appear in knowledge base.
• Provided C1 is a new constant symbol, called a Skolem
constant
• Existential Instantiation is a special case of a more general
process called Skolemization
UI and EI
• UI can be applied several times to add new sentences; the
new KB is logically equivalent to the old.
oIf something holds for everything, it also holds for each particular instance.
• EI can be applied only once to replace the existential
sentence; the new KB is not equivalent to the old, but is
satisfiable iff the old KB was satisfiable.
oE.g. ∃x Killer(x, Victim)
 After substituting this sentence, we get Killer(Murderer, Victim) and
the original sentence with ∃ is discarded.
 The new KB is not equivalent to the old one since the original
sentence was replaced.
 These two KBs are not logically equivalent, yet they are
inferentially equivalent
Reduction to propositional inference
• We have rules for inferring nonquantified sentences from
quantified sentences
• It becomes possible to reduce first-order inference to
propositional inference
• The first idea is that, just as an existentially quantified
sentence can be replaced by one instantiation, a universally
quantified sentence can be replaced by the set of all possible
instantiations
Example
• Suppose the KB contains just the following:
o∀ x King(x) ∧ Greedy(x) ⇒ Evil(x)
oKing(John)
oGreedy(John)
oBrother(Richard, John)
• Instantiating the universal sentence in all possible ways, we
have
oKing(John) ∧ Greedy(John) ⇒ Evil(John)
oKing(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)
oKing(John)
oGreedy(John)
oBrother(Richard, John)
• The new KB is propositionalized: proposition symbols are
oKing(John), Greedy(John), Evil(John), King(Richard) etc.
Problems
• Problem1:
oWith function symbols, there are infinitely many ground terms,
e.g., Father(Father(Father(John)))
• Problem2:
oWe can confirm the entailment of a sentence (Modus Ponens,
Resolution, etc.)
oHowever, we are unable to confirm the "non-entailment" of a sentence
(infinitely nested terms → infinite loop)
9.2 Unification and Lifting
• The propositionalization approach is rather inefficient
• Example
othe inference of Evil(John) from the sentences
∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
King(John)
Greedy(John)
This seems completely obvious to a human being
First-order inference rule
• Generalized Modus Ponens
oIt can be summarized as "P implies Q and P is asserted to be
true, so therefore Q must be true"
oFor atomic sentences pi, pi′, and q, where there is a
substitution θ such that SUBST(θ, pi′) = SUBST(θ, pi) for all i:

  p1′, p2′, ..., pn′, (p1 ∧ p2 ∧ ... ∧ pn ⇒ q)
  ────────────────────────────────────────────
  SUBST(θ, q)
Example of Generalized Modus Ponens
• Example
oUse the rule that greedy kings are evil, find some x such that x
is a king and x is greedy, and then infer that this x is evil
oThe substitution {x/John,y/John} to the implication premises
King(x) and Greedy(x) and the knowledge-base sentences
King(John) and Greedy(y) will make them identical
• p1′ is King(John), p2′ is Greedy(y)
• p1 is King(x), p2 is Greedy(x), q is Evil(x)
• θ is {x/John, y/John}, so SUBST(θ, q) is Evil(John)
Unification
• We can get the inference immediately if we can find a
substitution θ such that King(x) and Greedy(x) match
King(John) and Greedy(y)
oθ = {x/John,y/John} works
• Unification finds substitutions that make different logical
expressions look identical
oUNIFY takes two sentences and returns a unifier for them, if one
exists
oUNIFY(p, q) = θ where SUBST(θ, p) = SUBST(θ, q)
 Basically, find a θ that makes the two clauses look alike
Example of Unification
•Examples
UNIFY(Knows(John,x), Knows(John,Jane)) = {x/Jane}
UNIFY(Knows(John,x), Knows(y,Bill)) = {x/Bill, y/John}
UNIFY(Knows(John,x), Knows(y,Mother(y))) = {y/John, x/Mother(John)}
UNIFY(Knows(John,x), Knows(x,Elizabeth)) = fail
o Last example fails because x would have to be both John and Elizabeth
o We can avoid this problem by standardizing apart (renaming one variable):
 The two statements now read
UNIFY(Knows(John,x), Knows(z,Elizabeth))
 This is solvable:
UNIFY(Knows(John,x), Knows(z,Elizabeth)) = {x/Elizabeth,z/John}
Unification
•To unify Knows(John,x) and Knows(y,z)
oCan use θ = {y/John, x/z }
oOr θ = {y/John, x/John, z/John}
•The first unifier is more general than the second.
•There is a single most general unifier (MGU) that is
unique up to renaming of variables.
MGU = { y/John, x/z }
Unification Algorithm
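This slide shows the UNIFY algorithm as a figure (it compares the two inputs element by element while building up θ; see the editor's note for slide 21). The figure is not reproduced here; the following Python sketch implements the same idea on the tuple encoding from the earlier sketch. The helper names and the occur check are illustrative assumptions, not the book's exact pseudocode:

  def is_variable(t):                       # as in the earlier sketch
      return isinstance(t, str) and t[:1].islower()

  def unify(x, y, theta):
      """Return a substitution making x and y identical, or None on failure."""
      if theta is None:
          return None                       # propagate failure
      if x == y:
          return theta
      if is_variable(x):
          return unify_var(x, y, theta)
      if is_variable(y):
          return unify_var(y, x, theta)
      if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
          return unify(x[1:], y[1:], unify(x[0], y[0], theta))
      return None

  def unify_var(var, x, theta):
      if var in theta:                      # follow an existing binding
          return unify(theta[var], x, theta)
      if is_variable(x) and x in theta:
          return unify(var, theta[x], theta)
      if occurs(var, x, theta):             # occur check: reject x = f(x)
          return None
      return {**theta, var: x}

  def occurs(var, x, theta):
      if var == x:
          return True
      if is_variable(x) and x in theta:
          return occurs(var, theta[x], theta)
      return isinstance(x, tuple) and any(occurs(var, p, theta) for p in x)

  # UNIFY(Knows(John, x), Knows(y, Mother(y))):
  print(unify(('Knows', 'John', 'x'), ('Knows', 'y', ('Mother', 'y')), {}))
  # {'y': 'John', 'x': ('Mother', 'y')} -- applying subst once more to the
  # bindings resolves x to Mother(John), matching the slide.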
Storage and Retrieval
•Use TELL and ASK to interact with Inference Engine
oImplemented with STORE and FETCH
 STORE(s) stores sentence s
 FETCH(q) returns all unifiers under which the query q unifies with some
sentence in the Knowledge Base
•Example:
oq = Knows(John,x)
oKB is:
 Knows(John,Jane), Knows(y,Bill), Knows(y,Mother(y))
oResult is
 θ1 = {x/Jane}, θ2 = {y/John, x/Bill}, θ3 = {y/John, x/Mother(John)}
Storage and Retrieval
• First approach:
 Create a long list of all propositions in Knowledge Base
 Attempt unification with all propositions in Knowledge Base
oWorks, but is inefficient
• Need to restrict unification attempts to sentences that have
some chance of unifying
oIndex facts in Knowledge Base
 Predicate Indexing
Index predicates:
• All “Knows” sentences in one bucket
• All “Loves” sentences in another
 Use Subsumption Lattice (see below)
Storage and Retrieval
• Subsumption Lattice
oChild is obtained from parent through a single substitution
oLattice contains all possible queries that can be unified with it.
 Works well for small lattices
 Predicate with n arguments has a lattice of size O(2^n)
oStructure of lattice depends on whether the base contains repeated
variables
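A minimal sketch of predicate indexing for STORE and FETCH, reusing unify from the previous sketch; bucketing by predicate symbol means FETCH only attempts unification against facts that have some chance of unifying. The implementation details are assumptions for illustration:

  from collections import defaultdict

  index = defaultdict(list)                 # one bucket per predicate symbol

  def store(fact):
      index[fact[0]].append(fact)           # fact[0] is the predicate symbol

  def fetch(query):
      """Yield every unifier between query and a stored fact in its bucket."""
      for fact in index[query[0]]:
          theta = unify(query, fact, {})
          if theta is not None:
              yield theta

  store(('Knows', 'John', 'Jane'))
  store(('Knows', 'y', 'Bill'))
  store(('Loves', 'Romeo', 'Juliet'))       # never compared with a Knows query
  print(list(fetch(('Knows', 'John', 'x'))))
  # [{'x': 'Jane'}, {'y': 'John', 'x': 'Bill'}]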
9.3 Forward Chaining
• Forward Chaining
oone of the two main methods of reasoning when using
an inference engine; it can be described logically as repeated
application of Modus Ponens
oIdea:
 Start with atomic sentences in the KB
 Apply Modus Ponens
Add new atomic sentences until no further inferences can be made
oWorks well for a KB consisting of Situation ⇒ Response
clauses when processing newly arrived data
First-Order definite clauses
• First-order definite clauses are disjunctions of literals of which
exactly one is positive
oExample:
 King(x) ∧ Greedy(x) ⇒ Evil(x)
 King(John)
 Greedy(y)
oFirst-order definite clauses can include variables
 Variables are assumed to be universally quantified
Greedy(y) means ∀y Greedy(y)
oNot every KB can be converted into first-order definite clauses
Example(1/2)
• Consider the following problem:
oThe law says that it is a crime for an American to sell weapons
to hostile nations. The country Nono, an enemy of America,
has some missiles, and all of its missiles were sold to it by
Colonel West, who is American
• Prove
oWest is a criminal
Example(2/2)
• it is a crime for an American to sell weapons to hostile
nations:
American(x) ∧ Weapon(y) ∧ Sells(x,y,z) ∧ Hostile(z) ⇒ Criminal(x)
• Nono … has some missiles, i.e., ∃x Owns(Nono,x) ∧
Missile(x):
Owns(Nono,M1) and Missile(M1)
• … all of its missiles were sold to it by Colonel West
Missile(x) ∧ Owns(Nono,x) ⇒ Sells(West,x,Nono)
• Missiles are weapons:
Missile(x) ⇒ Weapon(x)
• An enemy of America counts as "hostile":
Enemy(x,America) ⇒ Hostile(x)
• West, who is American …
American(West)
• The country Nono, an enemy of America …
Enemy(Nono,America)
Simple Forward-chaining algorithm
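This slide shows the algorithm (FOL-FC-ASK in the book) as a figure. Below is a simplified Python sketch of the same loop, reusing unify and subst from the earlier sketches; it omits standardizing apart and other refinements, so treat it as an illustration rather than the book's algorithm. Rules are (premise_list, conclusion) pairs and facts are ground atoms:

  def fol_fc_ask(rules, facts, query):
      facts = set(facts)
      while True:
          for fact in facts:                # is the query provable yet?
              answer = unify(query, fact, {})
              if answer is not None:
                  return answer
          new = set()
          for premises, conclusion in rules:
              for theta in match_all(premises, facts, {}):
                  derived = subst(theta, conclusion)
                  if derived not in facts:
                      new.add(derived)
          if not new:
              return None                   # fixed point: query not entailed
          facts |= new

  def match_all(premises, facts, theta):
      """Yield every substitution satisfying all premises against known facts."""
      if not premises:
          yield theta
          return
      for fact in facts:
          t = unify(premises[0], fact, dict(theta))
          if t is not None:
              yield from match_all(premises[1:], facts, t)

  # The crime example (variable names kept distinct across rules):
  rules = [
      ([('American','x'), ('Weapon','y'), ('Sells','x','y','z'), ('Hostile','z')],
       ('Criminal','x')),
      ([('Missile','m'), ('Owns','Nono','m')], ('Sells','West','m','Nono')),
      ([('Missile','w')], ('Weapon','w')),
      ([('Enemy','e','America')], ('Hostile','e')),
  ]
  facts = [('American','West'), ('Missile','M1'),
           ('Owns','Nono','M1'), ('Enemy','Nono','America')]
  print(fol_fc_ask(rules, facts, ('Criminal','West')))
  # {} -- an empty unifier: Criminal(West) is entailed after two iterations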
Proof of Forward Chaining
(proof-tree figure, built up over three slides: the initial facts appear at the
bottom level, facts inferred on the first iteration in the middle level, and
facts inferred on the second iteration at the top level)
Properties of forward chaining
•Sound and complete for first-order definite clauses
•Datalog = first-order definite clauses + no functions
•FC terminates for Datalog in finite number of
iterations
•May not terminate in general if α is not entailed
oThis is unavoidable: entailment with definite clauses is
semidecidable
Efficiency of forward chaining
• Incremental forward chaining: no need to match a rule on
iteration k if a premise wasn't added on iteration k-1
 match each rule whose premise contains a newly added positive
literal
• Matching itself can be expensive:
• Database indexing allows O(1) retrieval of known facts
oe.g., query Missile(x) retrieves Missile(M1)
• Forward chaining is widely used in deductive databases
• There are two major possible sources of complexity, treated next:
oPattern matching (conjunct ordering)
oRedundant matching (addressed by incremental forward chaining)
Pattern matching
• Finding all possible unifiers consumes a large amount of time.
• For example,
oMissile(x) ∧ Owns(Nono, x) → Sells(West, x, Nono)
oAssume that Nono owns 1000 objects but only 3 are missiles.
oIf we sequentially look for substitutions of x among the objects
owned by Nono, it takes a long time to find a missile.
oOn the other hand, if we start by finding the x that are
missiles, we check only 3 candidates, which greatly reduces the
amount of computation.
• This is called “conjunct ordering” problem:
ofind a good order to solve the conjuncts of the rule premise
oNP-hard problem (but good heuristics are available)
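A tiny sketch of one such heuristic (most-constrained-conjunct first: try the premise with the fewest matching facts), assuming the helpers from the earlier sketches:

  def order_conjuncts(premises, facts):
      """Sort premises so the one with the fewest matching facts comes first."""
      def count_matches(p):
          return sum(1 for f in facts if unify(p, f, {}) is not None)
      return sorted(premises, key=count_matches)

  # With 1000 Owns(Nono, ...) facts and 3 Missile(...) facts, Missile(x) sorts
  # first, so only 3 bindings of x are ever propagated to Owns(Nono, x).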
Example : Map-coloring problem
• Given a map, find a way to assign a color to each region (node)
so that adjacent regions do not share the same color.
• Colorable() means the constraints are satisfiable.
Diff(wa,nt) ∧ Diff(wa,sa) ∧ Diff(nt,q) ∧
Diff(nt,sa) ∧ Diff(q,nsw) ∧ Diff(q,sa) ∧
Diff(nsw,v) ∧ Diff(nsw,sa) ∧ Diff(v,sa) ⇒
Colorable()
Diff(Red,Blue) Diff(Red,Green)
Diff(Green,Red) Diff(Green,Blue)
Diff(Blue,Red) Diff(Blue,Green)
Incremental forward chaining
• On each iteration, Forward Chaining matches every rule in the
Knowledge Base against every fact.
 (not shown in the previous slides)
 E.g. Missile(x) → Weapon(x) is matched with Missile(M1), re-deriving
"Weapon(M1)", which is already in the KB, on every iteration.
• Observation:
oEvery fact derived at iteration t must use at least one fact that was
newly derived at iteration t-1.
oOtherwise, the fact could already have been derived at an earlier
iteration.
• Solution:
oAt iteration t, consider a rule only if its premise contains a
conjunct pi that unifies with a fact pi′ newly derived at iteration
t-1.
oNote: only one conjunct of the premise needs to match a new
fact, not the whole premise.
Example of Incremental forward chaining
• Example
oWe were able to derive the fact P(John) at the 5th iteration.
oThere are these following sentences in KB
 (1) P(x) ∧ Q(Jane) → AA
 (2) K(x) ∧ M(x, y) ∧ N(x) ∧ O(x, z) → BB
 (3) P(Jack) ∧ N(Jake) → CC
oAt the 6th iteration, there is no point in checking the second and
third sentences
 (2) does not contain P
 (3) cannot unify with P(John) since its argument is already the
constant Jack.
oAnd at the 7th iteration, we don't match P(John) with (1) again,
because P(John) is no longer new and rematching it would just keep
re-deriving AA.
9.4 Backward Chaining
• Backward Chaining is a “generator”
oGenerates all possible substitutions to the goal.
• Backward Chaining is depth-first.
• Idea
oStarts from the goal
oMatches the goal against rule conclusions, then proves each literal
of the premise from left to right against the Knowledge Base
oCreates a new recursive call for each subgoal; if the call
succeeds, the proof of that branch is complete.
oAfter one proof of the current goal is found, continue with other
sentences that can lead to the goal.
 Remember: a goal may be provable through more than one
rule.
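A sketch of this generator view in Python, reusing unify from the earlier sketches and the rules/facts lists from the forward-chaining sketch. Standardizing apart of rule variables is omitted for brevity, so the rules must use distinct variable names (as they do above):

  def fol_bc_ask(rules, facts, goal, theta=None):
      """Lazily yield every substitution that proves the goal."""
      theta = {} if theta is None else theta
      for fact in facts:                        # goal matches a known fact
          t = unify(goal, fact, dict(theta))
          if t is not None:
              yield t
      for premises, conclusion in rules:        # goal matches a rule head
          t = unify(goal, conclusion, dict(theta))
          if t is not None:
              yield from prove_all(rules, facts, premises, t)

  def prove_all(rules, facts, goals, theta):
      """Prove each subgoal left to right, threading bindings through."""
      if not goals:
          yield theta
          return
      for t in fol_bc_ask(rules, facts, goals[0], theta):
          yield from prove_all(rules, facts, goals[1:], t)

  theta = next(fol_bc_ask(rules, facts, ('Criminal', 'West')), None)
  print(theta is not None)    # True: a proof of Criminal(West) was found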
Backward chaining
(proof tree for Criminal(West), constructed step by step across seven slides;
the tree is read depth first, left to right, with the bindings from each
successful unification shown next to the corresponding subgoal)
Backward Chaining
• Properties of Backward chaining
oDepth-first recursive proof search
oIncomplete due to infinite loops
 Fixed by checking the current goal against every goal on the stack
oInefficient due to repeated subgoals
 Fixed by caching previous results
oHowever, Backward Chaining is widely used even without these
improvements.
Backward Chaining
• Logic programming
oAlgorithm = Logic + Control (Robert Kowalski’s equation)
oDeclare problem in logical symbol form
oQuery desired statement
oThe program will find the solution to the query based on
Knowledge Base.
• Prolog
oLogic programming language
oUse Backward Chaining as an inference algorithm
oClosed-world assumption
Prolog
• Redundant inference and infinite loops
oProlog is easily trapped in infinite loops because of its depth-
first search.
oLet
 path(x, y) refer to "there exists a path between points x and y."
 link(x, y) refer to "x is directly connected to y." (adjacent nodes)
oTo see if a path between two distant nodes exists, we query
path(n1, n2)
oWe model the rules in recursive style with Prolog syntax as…
 path(n1, n2) :- link(n1, n2)
 path(n1, n2) :- path(n1, n), link(n, n2)
Prolog
• Redundant inference and infinite loops
• path(n1, n2) :- link(n1, n2)
oCase 1: two nodes are directly connected.
• path(n1, n2) :- path(n1, n), link(n, n2)
oCase 2: two nodes are connected by some path from n1
to n while n is adjacent to n2.
• This can be written in FOL style as…
o[link(n1, n2)] ∨ [path(n1, n) ∧ link(n, n2)] → path(n1, n2)
Prolog
• If the clauses are ordered badly, Prolog keeps recursing forever
while searching for the path, as the (omitted) figure illustrates:
finding a path from A to C can lead Prolog into an infinite loop.
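For contrast, here is a Python sketch of the same search with an explicit visited set, which is exactly the loop check that plain depth-first Prolog search lacks (the graph data is made up for illustration):

  links = {'A': ['B'], 'B': ['C'], 'C': ['A']}    # a cycle A -> B -> C -> A

  def path(a, b, visited=frozenset()):
      if a == b:
          return True
      if a in visited:              # without this check, the cycle loops forever
          return False
      return any(path(n, b, visited | {a}) for n in links.get(a, []))

  print(path('A', 'C'))   # True
  print(path('A', 'D'))   # False, and it terminates despite the cycle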
Constraint logic programming(CLP)
• Normally a logic program (e.g. Prolog) can return only a finite
number of answers at a time.
• What if there are infinitely many answers?
• Example
x>=0 ∧ y>=0 ∧ z>=0 ∧ x+y>=z ∧ y+z>=x ∧ x+z>=y
→ Triangle(x, y, z)
oFor Prolog
 Triangle(3,4,5) is True.
 But Triangle(3,4,z) causes an error because Prolog cannot handle the
z >= 0 constraint on an unbound variable.
oCLP can handle this.
 Triangle(3,4,z) results in 1<= z <= 7
• CLP was invented for constraint problem solving.
• Any answer is correct as long as it satisfies the constraints.
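The interval in the example can be recovered with simple interval arithmetic; a pure-Python sketch (not a real CLP system) for the fixed sides x = 3, y = 4:

  def triangle_z_interval(x, y):
      """Solve z >= 0, x+y >= z, y+z >= x, x+z >= y for z, given numbers x, y."""
      lo = max(0, x - y, y - x)        # from z >= 0, y+z >= x, x+z >= y
      hi = x + y                       # from x+y >= z
      return lo, hi

  print(triangle_z_interval(3, 4))     # (1, 7), i.e. 1 <= z <= 7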
9.5 Resolution
• Conjunctive normal form for First-order logic
oAs in propositional logic, resolution algorithm works with
sentences in Conjunctive normal form.
oEvery First-order logic sentence can be converted into an
inferentially equivalent Conjunctive normal form sentence.
Resolution
• Conversion to Conjunctive normal form
• Everyone who loves all animals is loved by someone:
∀x [∀y Animal(y) → Loves(x, y)] → [∃y Loves(y, x)]
• 1. Eliminate implications and biconditionals (if any)
∀x [¬(∀y ¬Animal(y) ∨ Loves(x, y))] ∨ [∃y Loves(y, x)]
• 2. Move ¬ inwards: (¬∀x p ≡ ∃x ¬p , ¬ ∃x p ≡ ∀x ¬p)
∀x [∃y ¬(¬Animal(y) ∨ Loves(x, y))] ∨ [∃y Loves(y, x)]
∀x [∃y ¬¬Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]
∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]
• 3. Standardize variables:
∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]
oNotice that there are two y in the sentence.
oHowever, these y refer to different objects
 The first is an animal; the second is the person who loves x.
oTo avoid confusion, we rename(standardize) the second y.
∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃z Loves(z, x)]
Resolution
• 4. Skolemization
oAs in the previous slide, Skolemization is done so as to drop the
existential quantifiers.
oHowever, we cannot use a Skolem constant as before.
o∀x [Animal(A) ∧ ¬Loves(x, A)] ∨ [Loves(B, x)]
 This would say that everyone either
fails to love a "particular" animal A, or
is loved by a "particular" person B.
oThe meaning of the sentence has changed.
oIn this case, we have to use a "Skolem function" whose value
depends on x.
o∀x [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ [Loves(G(x), x)]
oThe arguments of a Skolem function are all the universally
quantified variables in whose scope the existential quantifier
appears.
oSo… the Skolem function provides a possibly different witness for
each value of x.
Resolution
• 5. Drop universal quantifiers
oSince everything up to this point is universally quantified,
dropping the sign is a legal action.
o[Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ [Loves(G(x), x)]
• 6. Rewrite into Conjunctive form (Distribute ∨ over ∧)
o(L1 ∨ … ∨ Ln) ∧ (M1 ∨ … ∨ Mm) ∧ … ∧ (Z1 ∨ … ∨ Zz)
o[Animal(F(x)) ∨ Loves(G(x), x)] ∧ [¬Loves(x, F(x)) ∨ Loves(G(x), x)]
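A small sketch of this distribution step on a binary ('and', a, b) / ('or', a, b) tuple encoding (an illustration only; a real converter would handle n-ary connectives and the earlier steps too):

  def distribute(s):
      """Push ∨ inside ∧ until the sentence is a conjunction of disjunctions."""
      if not isinstance(s, tuple) or s[0] not in ('and', 'or'):
          return s                                    # atom: nothing to do
      op, a, b = s[0], distribute(s[1]), distribute(s[2])
      if op == 'or':
          if isinstance(a, tuple) and a[0] == 'and':  # (a1 ∧ a2) ∨ b
              return ('and', distribute(('or', a[1], b)),
                             distribute(('or', a[2], b)))
          if isinstance(b, tuple) and b[0] == 'and':  # a ∨ (b1 ∧ b2)
              return ('and', distribute(('or', a, b[1])),
                             distribute(('or', a, b[2])))
      return (op, a, b)

  # [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(x), x):
  s = ('or', ('and', 'Animal(F(x))', '¬Loves(x,F(x))'), 'Loves(G(x),x)')
  print(distribute(s))
  # ('and', ('or', 'Animal(F(x))', 'Loves(G(x),x)'),
  #         ('or', '¬Loves(x,F(x))', 'Loves(G(x),x)'))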
Resolution
• Properties of resolution
oRefutation-complete for FOL
 If a set of sentences is unsatisfiable, resolution will always be able
to derive a contradiction.
oHowever, resolution can produce nonconstructive proofs.
 A proof which indirectly shows a mathematical object exists without
providing a specific example or algorithm for producing an
example.
 To avoid that, an answer literal is added to the negated goal, and
resolution is run on CNF(KB ∧ ¬(α ∧ answerLiteral)); the binding left in
the surviving answer literal gives a concrete answer.
Resolution strategies
• Unit preference
oMakes use of "unit clauses": clauses which have exactly one
literal.
oIdea: resolving with a unit clause produces a shorter clause.
 Since we want to reach the empty clause as soon as possible, a
heuristic that reduces clause length is worth trying
Resolution strategies
• Set of support(SoS)
oA special set of clauses: every resolution step must involve at
least one member of the set, and each resolvent is added back to
the set.
oIdea :
 We get a problem with a set of conditions.
 To solve the problem, we have to use every condition indicated in
the problem.
If we could solve it without using those conditions,
something must be off!
 That set of conditions is an analogy to the set of support.
• Drawback
oIf we choose a bad SoS, resolution becomes incomplete and the
problem may become unsolvable.
Summary
• A first approach uses inference rules for instantiating
quantifiers in order to propositionalize the inference problem.
However, it’s very slow.
• Lifted Modus Ponens, called generalized Modus Ponens,
can be applied to FOL. FC and BC also use this rule.
• FC is a bottom-up process: starting from scattered facts, it
assembles them to reach the desired conclusion.
• BC is a top-down process: starting from the desired goal, it
tries to find the facts that prove the goal true.
• Resolution also has its lifted version which works in the
same manner, CNF and proof by contradiction, as the
propositional version.
Editor's Notes
  • #7: This is the formal definition.
  • #21: The unification algorithm. The algorithm works by comparing the structures of the inputs, element by element. The substitution θ that is the argument to UNIFY is built up along the way and is used to make sure that later comparisons are consistent with bindings that were established earlier. In a compound expression such as F (A, B), the OP field picks out the function symbol F and the ARGS field picks out the argument list (A, B).
  • #25: Forward chaining (or forward reasoning) is one of the two main methods of reasoning when using an inference engine; the other is backward chaining. Forward chaining starts from the available data and applies inference rules to extract more data (for example, from an end user) until a goal is reached: the engine searches the inference rules for one whose antecedent (IF clause) is known to be true, and the consequent (THEN clause) of that rule is added to the data set as new information. It can be described logically as repeated application of modus ponens, and it is a popular implementation strategy for expert systems, business and production rule systems.
  • #30–32: The proof tree generated by forward chaining on the crime example. The initial facts appear at the bottom level, facts inferred on the first iteration in the middle level, and facts inferred on the second iteration at the top level.
  • #39: Backward chaining (or backward reasoning) starts from a list of goals (or hypotheses) and works backwards, checking whether data are available to support any of the goals. The inference engine searches the inference rules for one whose consequent (THEN clause) matches a goal; if the antecedent (IF clause) of that rule is not known to be true, it is added to the list of goals (that is, it becomes part of the hypothesis) and further data must be supplied to establish it. Backward chaining is used in automated theorem provers, inference engines, proof assistants and other artificial intelligence applications.
  • #40–46: Proof tree constructed by backward chaining to prove that West is a criminal. The tree should be read depth first, left to right. To prove Criminal(West), we have to prove the four conjuncts below it. Some of these are in the knowledge base, and others require further backward chaining. Bindings for each successful unification are shown next to the corresponding subgoal. Note that once one subgoal in a conjunction succeeds, its substitution is applied to subsequent subgoals. Thus, by the time FOL-BC-ASK gets to the last conjunct, originally Hostile(z), z is already bound to Nono.
  • #50: (a) Finding a path from A to C can lead Prolog into an infinite loop. (b) A graph in which each node is connected to two random successors in the next layer. Finding a path from A1 to J4 requires 877 inferences.