
Submodular Functions, Optimization, and Applications to Machine Learning

— Spring Quarter, Lecture 3 —
http://www.ee.washington.edu/people/faculty/bilmes/classes/ee563_spring_2018/

Prof. Jeff Bilmes

University of Washington, Seattle
Department of Electrical Engineering
http://melodi.ee.washington.edu/~bilmes

April 2nd, 2018

[Title-slide figure: a Venn diagram illustrating submodularity. Writing $A_r = A \setminus B$, $C = A \cap B$, and $B_r = B \setminus A$, it depicts $f(A) + f(B) \geq f(A \cup B) + f(A \cap B)$, with the areas annotated as $f(A \cup B) = f(A_r) + f(C) + f(B_r)$ and $f(A) + f(B) = f(A_r) + 2f(C) + f(B_r)$.]

Clockwise from top left: László Lovász, Jack Edmonds, Satoru Fujishige, George Nemhauser, Laurence Wolsey, András Frank, Lloyd Shapley, H. Narayanan, Robert Bixby, William Cunningham, William Tutte, Richard Rado, Alexander Schrijver, Garrett Birkhoff, Hassler Whitney, Richard Dedekind.


Cumulative Outstanding Reading

Read chapter 1 from Fujishige’s book.


Class Road Map - EE563

L1 (3/26): Motivation, Applications, & Basic Definitions
L2 (3/28): Machine Learning Apps (diversity, complexity, parameter, learning target, surrogate)
L3 (4/2): Info theory exs, more apps, definitions, graph/combinatorial examples
L4 (4/4):
L5 (4/9):
L6 (4/11):
L7 (4/16):
L8 (4/18):
L9 (4/23):
L10 (4/25):
L11 (4/30):
L12 (5/2):
L13 (5/7):
L14 (5/9):
L15 (5/14):
L16 (5/16):
L17 (5/21):
L18 (5/23):
L– (5/28): Memorial Day (holiday)
L19 (5/30):
L21 (6/4): Final Presentations

Last day of instruction, June 1st. Finals Week: June 2-8, 2018.


Two Equivalent Submodular Definitions

Definition 3.2.1 (submodular concave)

A function $f : 2^V \to \mathbb{R}$ is submodular if for any $A, B \subseteq V$, we have that:

$$f(A) + f(B) \geq f(A \cup B) + f(A \cap B) \qquad (3.8)$$

An alternate and (as we will soon see) equivalent definition is:

Definition 3.2.2 (diminishing returns)

A function $f : 2^V \to \mathbb{R}$ is submodular if for any $A \subseteq B \subset V$ and $v \in V \setminus B$, we have that:

$$f(A \cup \{v\}) - f(A) \geq f(B \cup \{v\}) - f(B) \qquad (3.9)$$

The incremental "value", "gain", or "cost" of $v$ decreases (diminishes) as the context in which $v$ is considered grows from $A$ to $B$.
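To make the two definitions concrete, here is a small self-contained sketch (not from the slides) that brute-force checks both conditions for a coverage function $f(A) = |\bigcup_{a \in A} S_a|$, a standard example of a submodular function; the particular sets used are made up for illustration.

```python
from itertools import combinations

# A toy coverage function: f(A) = size of the union of the sets indexed by A.
SETS = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d", "e"}, 3: {"e"}}
V = set(SETS)

def f(A):
    covered = set()
    for a in A:
        covered |= SETS[a]
    return len(covered)

def subsets(ground):
    for r in range(len(ground) + 1):
        for combo in combinations(sorted(ground), r):
            yield frozenset(combo)

# Definition 3.2.1: f(A) + f(B) >= f(A | B) + f(A & B) for all A, B.
def_1 = all(f(A) + f(B) >= f(A | B) + f(A & B)
            for A in subsets(V) for B in subsets(V))

# Definition 3.2.2: gains diminish, f(A + v) - f(A) >= f(B + v) - f(B)
# whenever A is a subset of B and v lies outside B.
def_2 = all(f(A | {v}) - f(A) >= f(B | {v}) - f(B)
            for B in subsets(V) for A in subsets(B)
            for v in V - B)

print(def_1, def_2)  # both True for this coverage function
```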


Two Equivalent Supermodular Definitions

Definition 3.2.1 (supermodular)

A function $f : 2^V \to \mathbb{R}$ is supermodular if for any $A, B \subseteq V$, we have that:

$$f(A) + f(B) \leq f(A \cup B) + f(A \cap B) \qquad (3.8)$$

Definition 3.2.2 (supermodular (improving returns))

A function $f : 2^V \to \mathbb{R}$ is supermodular if for any $A \subseteq B \subset V$ and $v \in V \setminus B$, we have that:

$$f(A \cup \{v\}) - f(A) \leq f(B \cup \{v\}) - f(B) \qquad (3.9)$$

The incremental "value", "gain", or "cost" of $v$ increases (improves) as the context in which $v$ is considered grows from $A$ to $B$.

A function $f$ is submodular iff $-f$ is supermodular. If $f$ is both submodular and supermodular, then $f$ is said to be modular, and $f(A) = c + \sum_{a \in A} f(a)$ (often $c = 0$).


Submodularity's utility in ML

A model of a physical process:
When maximizing, submodularity naturally models: diversity, coverage, span, and information.
When minimizing, submodularity naturally models: cooperative costs, complexity, roughness, and irregularity.
Vice-versa for supermodularity.

A submodular function can act as a parameter for a machine learning strategy (active/semi-supervised learning, discrete divergence, structured sparse convex norms for use in regularization).

Itself, as an object or function to learn, based on data.

A surrogate or relaxation strategy for optimization or analysis:
An alternative to factorization, decomposition, or sum-product based simplification (as one typically finds in a graphical model), i.e., a means towards tractable surrogates for graphical models.
Also, we can "relax" a problem to a submodular one where it can be efficiently solved and offer a bounded-quality solution.
Non-submodular problems can be analyzed via submodularity.


Learning Submodular Functions

Learning submodular functions is hard

Goemans et al. (2009): "can one make only a polynomial number of queries to an unknown submodular function $f$ and construct an $\hat f$ such that $\hat f(S) \leq f(S) \leq g(n)\,\hat f(S)$, where $g : \mathbb{N} \to \mathbb{R}$?"

Many results, including that even with adaptive queries and monotone functions, one can't do better than $\Omega(\sqrt{n}/\log n)$.

Balcan & Harvey (2011): the submodular function learning problem from a learning-theory perspective, given a distribution on subsets. The negative result is that one can't approximate in this setting to within a constant factor.

Feldman, Kothari, Vondrák (2013) show that in some learning settings things are more promising (PAC learning possible in $O(n^2) \cdot 2^{O(1/\epsilon^4)}$).

One example: can we learn a subclass, perhaps non-negative weighted mixtures of submodular components?


Structured Learning of Submodular Mixtures

Constraints specified in inference form:

$$\begin{aligned}
\underset{w,\,\xi_t}{\text{minimize}} \quad & \frac{1}{T}\sum_t \xi_t + \frac{\lambda}{2}\|w\|^2 && (3.1)\\
\text{subject to} \quad & w^\top f_t(y^{(t)}) \geq \max_{y \in \mathcal{Y}_t}\left(w^\top f_t(y) + \ell_t(y)\right) - \xi_t, \ \forall t && (3.2)\\
& \xi_t \geq 0, \ \forall t. && (3.3)
\end{aligned}$$

The exponential set of constraints is reduced to an embedded optimization problem, "loss-augmented inference."

$w^\top f_t(y)$ is a mixture of submodular components.

If the loss is also submodular, then loss-augmented inference is submodular optimization.

If the loss is supermodular, this is a difference-of-submodular (DS) function optimization.


Structured Prediction: Subgradient Learning

Solvable with a simple sub-gradient descent algorithm using a structured variant of the hinge loss (Taskar, 2004).
Loss-augmented inference is either submodular optimization (Lin & B. 2012) or DS optimization (Tschiatschek, Iyer, & B. 2014).

Algorithm 1: Subgradient descent learning
Input: $S = \{(x^{(t)}, y^{(t)})\}_{t=1}^T$ and a learning rate sequence $\{\eta_t\}_{t=1}^T$.
1. $w_0 = 0$
2. for $t = 1, \dots, T$ do
3.   Loss-augmented inference: $y^*_t \in \operatorname{argmax}_{y \in \mathcal{Y}_t} \; w_{t-1}^\top f_t(y) + \ell_t(y)$
4.   Compute the subgradient: $g_t = \lambda w_{t-1} + f_t(y^*_t) - f_t(y^{(t)})$
5.   Update the weights: $w_t = w_{t-1} - \eta_t g_t$
Return: the averaged parameters $\frac{1}{T}\sum_t w_t$.
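A minimal runnable sketch of this subgradient loop (not the course's reference code): it treats labels as subsets of a tiny ground set, uses made-up per-item features for $f_t$ and Hamming loss, and does loss-augmented inference by brute force over subsets since the example is tiny. All data, names, and sizes here are assumptions for illustration only.

```python
import itertools
import numpy as np

# Toy structured problem: each "label" y is a subset of a small ground set,
# featurized by summing per-item feature vectors over the chosen subset.
rng = np.random.default_rng(0)
GROUND = list(range(5))
PHI = rng.random((len(GROUND), 3))                  # per-item features (made up)
DATA = [frozenset({0, 2}), frozenset({1, 3, 4})]    # "true" labels y^(t) (made up)

def feat(y):
    """f_t(y): sum of per-item features over the selected subset."""
    v = np.zeros(PHI.shape[1])
    for i in y:
        v += PHI[i]
    return v

def hamming_loss(y, y_true):
    return len(y ^ y_true)   # size of the symmetric difference

def loss_augmented_argmax(w, y_true):
    """Brute-force argmax_y  w^T f(y) + loss(y, y_true); fine for |V| = 5."""
    best, best_val = None, -np.inf
    for r in range(len(GROUND) + 1):
        for combo in itertools.combinations(GROUND, r):
            y = frozenset(combo)
            val = w @ feat(y) + hamming_loss(y, y_true)
            if val > best_val:
                best, best_val = y, val
    return best

lam, epochs = 0.1, 20
w = np.zeros(PHI.shape[1])
w_sum, steps = np.zeros_like(w), 0
for _ in range(epochs):
    for y_true in DATA:
        y_star = loss_augmented_argmax(w, y_true)    # step 3
        g = lam * w + feat(y_star) - feat(y_true)    # step 4
        w = w - (1.0 / (1 + steps)) * g              # step 5, eta_t = 1/(1+t)
        w_sum += w
        steps += 1

w_avg = w_sum / steps    # averaged parameters returned by Algorithm 1
print(w_avg)
```

In a real application the brute-force argmax would be replaced by submodular (or DS) optimization over the loss-augmented objective, as the slide describes.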


Recall

The next page shows a slide from Lecture 1


Submodular-Supermodular Decomposition

As an alternative to graphical decomposition, we can decompose a function without resorting to sums of local terms.

Theorem 3.4.1 (Additive Decomposition (Narasimhan & Bilmes, 2005))

Let $h : 2^V \to \mathbb{R}$ be any set function. Then there exists a submodular function $f : 2^V \to \mathbb{R}$ and a supermodular function $g : 2^V \to \mathbb{R}$ such that $h$ may be additively decomposed as follows: for all $A \subseteq V$,

$$h(A) = f(A) + g(A) \qquad (3.8)$$

For many applications (as we will see), either the submodular or supermodular component is naturally zero.
Sometimes more natural than a graphical decomposition.
Sometimes $h(A)$ has structure in terms of submodular functions but is not additively decomposed (one example is $h(A) = f(A)/g(A)$).
Complementary: simultaneous graphical/submodular-supermodular decomposition (i.e., submodular + supermodular tree).


Applications of DS functions

Any function $h : 2^V \to \mathbb{R}$ can be expressed as a difference between two submodular functions (DS), $h = f - g$.

Sensor placement with submodular costs: let $V$ be a set of possible sensor locations, $f(A) = I(X_A; X_{V \setminus A})$ measures the quality of a subset $A$ of placed sensors, and $c(A)$ the submodular cost. We have $f(A) - \lambda c(A)$ as the overall objective to maximize.

Discriminatively structured graphical models, the EAR measure $I(X_A; X_{V \setminus A}) - I(X_A; X_{V \setminus A} \mid C)$, and synergy in neuroscience.

Feature selection: the problem of maximizing $I(X_A; C) - \lambda c(A) = H(X_A) - [H(X_A \mid C) + \lambda c(A)]$, a difference between two submodular functions, where $H$ is the entropy and $c$ is a feature cost function.

Graphical model inference: finding $x$ that maximizes $p(x) \propto \exp(-v(x))$, where $x \in \{0,1\}^n$ and $v$ is a pseudo-Boolean function. When $v$ is non-submodular, it can be represented as a difference between submodular functions.


Submodular Relaxation

We often are unable to optimize an objective, e.g., high tree-width graphical models (as we saw).

If potentials are submodular, we can solve them.

When potentials are not, we might resort to factorization (e.g., the marginal polytope in variational inference, where we optimize over a tree-constrained polytope).

An alternative is submodular relaxation. I.e., given

$$\Pr(x) = \frac{1}{Z} \exp(-E(x)) \qquad (3.4)$$

where $E(x) = E_f(x) - E_g(x)$ and both $E_f(x)$ and $E_g(x)$ are submodular.

Any function can be expressed as the difference between two submodular functions.

Hence, rather than minimize $E(x)$ (hard), we can minimize the easier $\bar{E}(x) = E_f(x) - E_m(x) \geq E(x)$, where $E_m(x)$ is a modular lower bound on $E_g(x)$.
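As an aside, one standard way to obtain such a modular lower bound on a (normalized) submodular $E_g$ is the greedy/permutation construction: fix an ordering of the ground set and use the chain of marginal gains as a modular function, which lies in the base polytope and hence lower-bounds $E_g$ everywhere, tightly along the chain. The sketch below is my own illustration with a made-up submodular function, not code from the slides.

```python
import math
from itertools import combinations

V = [0, 1, 2, 3]

def g(A):
    """A toy submodular function (concave of a modular function), for illustration."""
    w = {0: 1.0, 1: 2.0, 2: 0.5, 3: 1.5}
    return math.sqrt(sum(w[a] for a in A))

def modular_lower_bound(g, order):
    """Permutation-based modular lower bound: m(v_i) = g({v_1..v_i}) - g({v_1..v_{i-1}}).
    The resulting m is a vertex of the base polytope of g, so m(A) <= g(A) for all A,
    with equality on the chain sets {v_1..v_i}."""
    gains, prev, chain = {}, g(()), []
    for v in order:
        chain.append(v)
        cur = g(tuple(chain))
        gains[v] = cur - prev
        prev = cur
    return lambda A: sum(gains[a] for a in A)

m = modular_lower_bound(g, order=[2, 0, 3, 1])

# Check m(A) <= g(A) for every subset A (tight on the chain sets).
ok = all(m(A) <= g(A) + 1e-12
         for r in range(len(V) + 1)
         for A in combinations(V, r))
print(ok)  # True
```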


Submodular Analysis for Non-Submodular Problems

Sometimes the quality of solutions to non-submodular problems can be analyzed via submodularity.

For example, "deviation from submodularity" can be measured using the submodularity ratio (Das & Kempe):

$$\gamma_{U,k}(f) \triangleq \min_{L \subseteq U,\; S : |S| \leq k,\; S \cap L = \emptyset} \frac{\sum_{s \in S} f(s \mid L)}{f(S \mid L)} \qquad (3.5)$$

where $f(s \mid L) \triangleq f(L \cup \{s\}) - f(L)$ is the gain of $s$ in the context of $L$.

$f$ is submodular if and only if $\gamma_{V,|V|} = 1$.

For some variable selection problems, one can get bounds of the form:

$$\text{Solution} \geq \left(1 - \frac{1}{e^{\gamma_{U^*,k}}}\right)\text{OPT} \qquad (3.6)$$

where $U^*$ is the solution set of a variable selection algorithm.

This gradually gets worse as we move away from the objective being submodular (see Das & Kempe, 2011).

Other analogous concepts: the curvature of a submodular function, and also the submodular degree.
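A brute-force sketch of the submodularity ratio for a tiny ground set (illustrative only; the toy function below is made up, and the enumeration is exponential, so this is just for building intuition):

```python
from itertools import combinations

V = [0, 1, 2, 3]

def f(A):
    """A toy monotone set function; not necessarily submodular."""
    A = frozenset(A)
    val = 1.5 * len(A)
    if {0, 1} <= A:      # a complementary bonus, which breaks submodularity
        val += 1.0
    return val

def gain(S, L):
    return f(set(S) | set(L)) - f(L)

def submodularity_ratio(f, U, k):
    """gamma_{U,k}(f): min over L in U and disjoint S with |S| <= k of
    (sum of singleton gains) / (joint gain)."""
    ratios = []
    for rl in range(len(U) + 1):
        for L in combinations(U, rl):
            rest = [v for v in V if v not in L]
            for rs in range(1, k + 1):
                for S in combinations(rest, rs):
                    denom = gain(S, L)
                    if denom > 1e-12:     # ratio only taken where the joint gain is positive
                        num = sum(gain((s,), L) for s in S)
                        ratios.append(num / denom)
    return min(ratios)

print(submodularity_ratio(f, U=V, k=len(V)))   # < 1 here, indicating deviation from submodularity
```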


Ground set: E or V?

Submodular functions are functions defined on subsets of some finite set, called the ground set.

It is common in the literature to use either $E$ or $V$ as the ground set — we will at different times use both (there should be no confusion).

The terminology "ground set" comes from lattice theory, where $V$ are the ground elements of a lattice (just above 0).


Notation $\mathbb{R}^E$, and modular functions as vectors

What does $x \in \mathbb{R}^E$ mean?

$$\mathbb{R}^E = \{x = (x_j \in \mathbb{R} : j \in E)\} \qquad (3.7)$$

and

$$\mathbb{R}^E_+ = \{x = (x_j : j \in E) : x \geq 0\} \qquad (3.8)$$

Any vector $x \in \mathbb{R}^E$ can be treated as a normalized modular function, and vice versa. That is, for $A \subseteq E$,

$$x(A) = \sum_{a \in A} x_a \qquad (3.9)$$

Note that $x$ is said to be normalized since $x(\emptyset) = 0$.
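A one-liner sketch of this identification (weights made up, for illustration): a vector indexed by $E$, evaluated as a normalized modular set function.

```python
# A vector x in R^E, stored as a dict keyed by the elements of E (weights made up).
x = {"a": 0.5, "b": -1.0, "c": 2.0}

def x_of(A):
    """The normalized modular function x(A) = sum_{a in A} x_a."""
    return sum(x[a] for a in A)

print(x_of(set()))         # 0   (normalized: x(empty set) = 0)
print(x_of({"a", "c"}))    # 2.5
```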


Characteristic (incidence) vectors of sets & modular functions

Given an $A \subseteq E$, define the incidence (or characteristic) vector $\mathbf{1}_A \in \{0,1\}^E$ on the unit hypercube to be

$$\mathbf{1}_A(j) = \begin{cases} 1 & \text{if } j \in A; \\ 0 & \text{if } j \notin A \end{cases} \qquad (3.10)$$

or equivalently,

$$\mathbf{1}_A \overset{\text{def}}{=} \left\{x \in \{0,1\}^E : x_i = 1 \text{ iff } i \in A\right\} \qquad (3.11)$$

Sometimes this is written as $\chi_A \equiv \mathbf{1}_A$.

Thus, given a modular function $x \in \mathbb{R}^E$, we can write $x(A)$ in a variety of ways, i.e.,

$$x(A) = x^\top \cdot \mathbf{1}_A = \sum_{i \in A} x(i) \qquad (3.12)$$
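A quick numerical check of (3.12) with NumPy; the element ordering and weights are arbitrary choices for the example.

```python
import numpy as np

E = ["a", "b", "c", "d"]                 # fix an (arbitrary) ordering of E
x = np.array([0.5, -1.0, 2.0, 0.25])     # x in R^E

def indicator(A):
    """The characteristic vector 1_A in {0,1}^E."""
    return np.array([1.0 if e in A else 0.0 for e in E])

A = {"a", "c"}
print(x @ indicator(A))                      # x^T 1_A = 2.5
print(sum(x[E.index(e)] for e in A))         # sum_{i in A} x(i) = 2.5 (same value)
```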


Other Notation: singletons and sets

When $A$ is a set and $k$ is a singleton (i.e., a single item), the union is properly written as $A \cup \{k\}$, but sometimes we will write just $A + k$.


What does $S^T$ mean when $S$ and $T$ are arbitrary sets?

Let $S$ and $T$ be two arbitrary sets (either of which could be countable or uncountable).

We define the notation $S^T$ to be the set of all functions that map from $T$ to $S$. That is, if $f \in S^T$, then $f : T \to S$.

Hence, given a finite set $E$, $\mathbb{R}^E$ is the set of all functions that map from elements of $E$ to the reals $\mathbb{R}$, and such functions are identical to a vector in a vector space with axes labeled as elements of $E$ (i.e., if $m \in \mathbb{R}^E$, then for all $e \in E$, $m(e) \in \mathbb{R}$).

Often "2" is shorthand for the set $\{0,1\}$, e.g., $\mathbb{R}^2$ where $2 \equiv \{0,1\}$.

Similarly, $2^E$ is the set of all functions from $E$ to "two" — so $2^E$ is shorthand for $\{0,1\}^E$ — hence, $2^E$ is the set of all functions that map from elements of $E$ to $\{0,1\}$, equivalent to all binary vectors with elements indexed by elements of $E$, equivalent to subsets of $E$. Hence, if $A \in 2^E$ then $A \subseteq E$.

What might $3^E$ mean?
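To make the $2^E$-as-subsets identification concrete, here is a tiny sketch (ground set chosen arbitrarily) enumerating all functions $E \to \{0,1\}$ and reading each one off as a subset:

```python
from itertools import product

E = ["a", "b", "c"]

# 2^E: every function E -> {0,1}, represented as a tuple of bits in a fixed order.
for bits in product([0, 1], repeat=len(E)):
    func = dict(zip(E, bits))                   # the function e -> {0,1}
    subset = {e for e in E if func[e] == 1}     # ...equivalently, a subset of E
    print(bits, subset)
# 2^|E| = 8 lines are printed; 3^E would analogously be all functions E -> {0,1,2}.
```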


Example Submodular: Entropy from Information Theory

Entropy is submodular. Let $V$ be the index set of a set of random variables; then the function

$$f(A) = H(X_A) = -\sum_{x_A} p(x_A) \log p(x_A) \qquad (3.13)$$

is submodular.

Proof: (further) conditioning reduces entropy. With $A \subseteq B$ and $v \notin B$,

$$H(X_v \mid X_B) = H(X_{B+v}) - H(X_B) \quad (3.14) \qquad \leq \qquad H(X_{A+v}) - H(X_A) = H(X_v \mid X_A) \quad (3.15)$$

We say "further" since $B \setminus A$ is not necessarily empty.
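A small numerical sanity check (my own, not from the slides): build an arbitrary joint distribution over three binary variables, compute joint entropies of subsets by marginalization, and verify $H(X_A) + H(X_B) \geq H(X_{A \cup B}) + H(X_{A \cap B})$ for all pairs of subsets.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)
V = [0, 1, 2]
p = rng.random((2,) * len(V))
p /= p.sum()                     # an arbitrary joint distribution over 3 binary variables

def H(A):
    """Joint entropy H(X_A) (in bits), marginalizing out the variables not in A."""
    axes = tuple(i for i in V if i not in A)
    marg = p.sum(axis=axes) if axes else p
    marg = np.asarray(marg).ravel()
    marg = marg[marg > 0]
    return float(-(marg * np.log2(marg)).sum())

def subsets(ground):
    for r in range(len(ground) + 1):
        for c in combinations(ground, r):
            yield frozenset(c)

ok = all(H(A) + H(B) >= H(A | B) + H(A & B) - 1e-9
         for A in subsets(V) for B in subsets(V))
print(ok)   # True: entropy is submodular on this example
```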


Example Submodular: Entropy from Information Theory

Alternate proof: conditional mutual information is always non-negative. Given $A, B \subseteq V$, consider the conditional mutual information quantity:

$$I(X_{A \setminus B}; X_{B \setminus A} \mid X_{A \cap B}) = \sum_{x_{A \cup B}} p(x_{A \cup B}) \log \frac{p(x_{A \setminus B}, x_{B \setminus A} \mid x_{A \cap B})}{p(x_{A \setminus B} \mid x_{A \cap B})\, p(x_{B \setminus A} \mid x_{A \cap B})} = \sum_{x_{A \cup B}} p(x_{A \cup B}) \log \frac{p(x_{A \cup B})\, p(x_{A \cap B})}{p(x_A)\, p(x_B)} \geq 0 \qquad (3.16)$$

then

$$I(X_{A \setminus B}; X_{B \setminus A} \mid X_{A \cap B}) = H(X_A) + H(X_B) - H(X_{A \cup B}) - H(X_{A \cap B}) \geq 0 \qquad (3.17)$$

so entropy satisfies

$$H(X_A) + H(X_B) \geq H(X_{A \cup B}) + H(X_{A \cap B}) \qquad (3.18)$$


Information Theory: Block Coding

Given a set of random variables {X_i}_{i∈V} indexed by set V, how do we partition them so that we can best block-code them within each block.

I.e., how do we form S ⊆ V such that I(X_S; X_{V\S}) is as small as possible, where I(X_A; X_B) is the mutual information between random variables X_A and X_B, i.e.,

I(X_A; X_B) = H(X_A) + H(X_B) − H(X_A, X_B)    (3.19)

and H(X_A) = − ∑_{x_A} p(x_A) log p(x_A) is the joint entropy of the set X_A of random variables.
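For small V the best block split can be found by exhaustive search; the Python sketch below (again with a made-up joint distribution) scores every nontrivial S ⊆ V by I(X_S; X_{V\S}) and reports a minimizer:

    import itertools, math, random

    V = [0, 1, 2, 3]
    random.seed(1)                       # made-up joint distribution over {0,1}^4
    weights = [random.random() for _ in range(2 ** len(V))]
    Z = sum(weights)
    p = {xs: w / Z for xs, w in
         zip(itertools.product((0, 1), repeat=len(V)), weights)}

    def H(A):
        marg = {}
        for xs, pr in p.items():
            key = tuple(xs[i] for i in sorted(A))
            marg[key] = marg.get(key, 0.0) + pr
        return -sum(pr * math.log2(pr) for pr in marg.values() if pr > 0)

    def I(S):
        S = set(S)
        return H(S) + H(set(V) - S) - H(set(V))      # Eq. (3.19)

    # Search all nontrivial subsets S for the weakest link between blocks.
    best = min((frozenset(s) for r in range(1, len(V))
                for s in itertools.combinations(V, r)), key=I)
    print("best block split:", set(best), "with I =", round(I(best), 4))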


Example Submodular: Mutual Information

Also, symmetric mutual information is submodular,

f(A) = I(X_A; X_{V\A}) = H(X_A) + H(X_{V\A}) − H(X_V)    (3.20)

Note that f_1(A) = H(X_A) and f_2(A) = H(X_{V\A}) are each submodular, and adding submodular functions preserves submodularity (which we will see quite soon).


Monge Matrices

m × n matrices C = [c_ij]_ij are called Monge matrices if they satisfy the Monge property, namely:

c_ij + c_rs ≤ c_is + c_rj    (3.21)

for all 1 ≤ i < r ≤ m and 1 ≤ j < s ≤ n.

Equivalently, for all 1 ≤ i, r ≤ m, 1 ≤ j, s ≤ n,

c_{min(i,r),min(j,s)} + c_{max(i,r),max(j,s)} ≤ c_is + c_rj    (3.22)

Consider four elements of the m × n matrix: c_ij = A + B, c_rj = B, c_rs = B + D, c_is = A + B + C + D. Then (3.21) reads (A+B) + (B+D) ≤ (A+B+C+D) + B, which holds exactly when C ≥ 0. (Figure: the four entries at the corners of the submatrix with rows i < r and columns j < s, with the contributing regions labeled A, B, C, D.)
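Checking the Monge property directly amounts to testing inequality (3.21) over all index pairs; a short Python sketch (the example matrix below is a classical convex-difference construction, not taken from the lecture):

    def is_monge(C):
        """Check c[i][j] + c[r][s] <= c[i][s] + c[r][j] for all i < r, j < s."""
        m, n = len(C), len(C[0])
        return all(C[i][j] + C[r][s] <= C[i][s] + C[r][j]
                   for i in range(m) for r in range(i + 1, m)
                   for j in range(n) for s in range(j + 1, n))

    # c_ij = g(i - j) with g convex is a classical family of Monge matrices.
    C = [[(i - j) ** 2 for j in range(5)] for i in range(4)]
    print(is_monge(C))    # True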




Monge Matrices, where useful

Useful for speeding up many transportation, dynamic programming, flow, search, lot-sizing and many other problems.

Example, Hitchcock transportation problem: Given an m × n cost matrix C = [c_ij]_ij, a non-negative supply vector a ∈ R^m_+, and a non-negative demand vector b ∈ R^n_+ with ∑_{i=1}^m a_i = ∑_{j=1}^n b_j, we wish to optimally solve the following linear program:

minimize_{X ∈ R^{m×n}}   ∑_{i=1}^m ∑_{j=1}^n c_ij x_ij    (3.23)
subject to   ∑_{i=1}^m x_ij = b_j   for all j = 1, ..., n    (3.24)
             ∑_{j=1}^n x_ij = a_i   for all i = 1, ..., m    (3.25)
             x_ij ≥ 0   for all i, j    (3.26)



Monge Matrices, Hitchcock transportation

(Figure: a bipartite transportation diagram with supplies a_1, a_2, a_3 on the left (producers, sources, or supply), demands b_1, b_2, b_3, b_4 on the right (consumers, sinks, or demand), and a 3 × 4 cost matrix C on the connecting edges.)

Solving the linear program can be done easily and optimally using the “North West Corner Rule” (a 2D greedy-like approach starting at top-left and moving down-right) in only O(m + n) if the matrix C is Monge!
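A minimal Python sketch of the North West Corner Rule (balanced supplies and demands assumed; the numbers below are made up, not those of the figure):

    def northwest_corner(a, b):
        """Greedy feasible plan: start at cell (0,0), ship as much as possible,
        move down when the current row's supply is exhausted, otherwise move
        right. Only O(m+n) cells are touched; for Monge costs the resulting
        plan is in fact optimal."""
        a, b = list(a), list(b)
        assert abs(sum(a) - sum(b)) < 1e-9, "problem must be balanced"
        m, n = len(a), len(b)
        x = [[0.0] * n for _ in range(m)]
        i = j = 0
        while i < m and j < n:
            q = min(a[i], b[j])
            x[i][j] = q
            a[i] -= q
            b[j] -= q
            if a[i] <= 1e-12:
                i += 1
            else:
                j += 1
        return x

    plan = northwest_corner([20, 30, 25], [10, 25, 15, 25])
    for row in plan:
        print(row)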


Monge Matrices and Convex Polygons

Can generate a Monge matrix from a convex polygon: delete two segments, then separately number the vertices on each chain (p_1, p_2, ... on one chain and q_1, q_2, ... on the other). The distances c_ij = d(p_i, q_j) satisfy the Monge property (or quadrangle inequality). For example,

d(p_2, q_3) + d(p_3, q_4) ≤ d(p_2, q_4) + d(p_3, q_3)    (3.27)

(Figure: a convex polygon split into two chains with vertices p_1, ..., p_6 and q_1, ..., q_4.)



Monge Matrices and Submodularity

A submodular function has the form f : 2^V → R, which can be seen as f : {0, 1}^V → R.
We can generalize this to f : {0, 1, ..., K}^V → R for some constant K ∈ Z_+. We may define submodularity as: for all x, y ∈ {0, 1, ..., K}^V, we have

f(x) + f(y) ≥ f(x ∨ y) + f(x ∧ y)    (3.28)

x ∨ y is the (join) element-wise max, that is, (x ∨ y)(v) = max(x(v), y(v)) for v ∈ V.
x ∧ y is the (meet) element-wise min, that is, (x ∧ y)(v) = min(x(v), y(v)) for v ∈ V.
With K = 1, this is the standard definition of submodularity.
With |V| = 2, and K + 1 the side-dimension of the matrix, we get a Monge property (on square matrices).
The not-necessarily-square case would be f : {0, ..., K_1} × {0, ..., K_2} → R.
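A quick Python sketch of the join/meet check on the lattice {0, ..., K}^V: the vectors x = (1, 3, 5) and y = (4, 2, 1) illustrate join and meet, and the test function (a concave function of the coordinate sum, made up for illustration) is submodular in this lattice sense:

    import itertools, math

    def join(x, y):   # element-wise max
        return tuple(max(a, b) for a, b in zip(x, y))

    def meet(x, y):   # element-wise min
        return tuple(min(a, b) for a, b in zip(x, y))

    x, y = (1, 3, 5), (4, 2, 1)
    print(join(x, y), meet(x, y))      # (4, 3, 5) (1, 2, 1)

    # A made-up lattice-submodular test function: concave function of the sum.
    f = lambda z: math.sqrt(sum(z))

    K, n = 5, 3
    ok = all(f(u) + f(v) + 1e-12 >= f(join(u, v)) + f(meet(u, v))
             for u in itertools.product(range(K + 1), repeat=n)
             for v in itertools.product(range(K + 1), repeat=n))
    print("f(x)+f(y) >= f(x v y) + f(x ^ y) for all x, y:", ok)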


Two Equivalent Submodular Definitions

Definition 3.8.1 (submodular concave)

A function f : 2^V → R is submodular if for any A, B ⊆ V, we have that:

f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B)    (3.8)

An alternate and (as we will soon see) equivalent definition is:

Definition 3.8.2 (diminishing returns)

A function f : 2^V → R is submodular if for any A ⊆ B ⊂ V, and v ∈ V \ B, we have that:

f(A ∪ {v}) − f(A) ≥ f(B ∪ {v}) − f(B)    (3.9)

The incremental “value”, “gain”, or “cost” of v decreases (diminishes) as the context in which v is considered grows from A to B.
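The two definitions can be cross-checked by brute force on a small ground set; the Python sketch below does so for a made-up coverage-style function (coverage functions are submodular, so both tests should pass):

    from itertools import combinations

    V = frozenset(range(4))
    # A made-up coverage function: each element of V covers a set of "items".
    cover = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}, 3: {1, 5}}
    f = lambda A: len(set().union(*(cover[v] for v in A)) if A else set())

    subsets = [frozenset(s) for r in range(len(V) + 1)
               for s in combinations(V, r)]

    # Definition 3.8.1 (submodular concave)
    def_1 = all(f(A) + f(B) >= f(A | B) + f(A & B)
                for A in subsets for B in subsets)

    # Definition 3.8.2 (diminishing returns)
    def_2 = all(f(A | {v}) - f(A) >= f(B | {v}) - f(B)
                for B in subsets for A in subsets if A <= B
                for v in V - B)

    print(def_1, def_2)    # both True for this (submodular) coverage function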


Submodular on Hypercube Vertices

Test submodularity via values on vertices of the hypercube.

Example: with |V| = n = 2, this is easy (the square with vertices 00, 01, 10, 11).

With |V| = n = 3, a bit harder (the cube with vertices 000 through 111).

How many inequalities?


Subadditive Definitions

Definition 3.8.1 (subadditive)

A function f : 2^V → R is subadditive if for any A, B ⊆ V, we have that:

f(A) + f(B) ≥ f(A ∪ B)    (3.29)

This means that the “whole” is no greater than the sum of the parts.


Two Equivalent Supermodular Definitions

Definition 3.8.1 (supermodular)

A function f : 2^V → R is supermodular if for any A, B ⊆ V, we have that:

f(A) + f(B) ≤ f(A ∪ B) + f(A ∩ B)    (3.8)

Definition 3.8.2 (supermodular (improving returns))

A function f : 2^V → R is supermodular if for any A ⊆ B ⊂ V, and v ∈ V \ B, we have that:

f(A ∪ {v}) − f(A) ≤ f(B ∪ {v}) − f(B)    (3.9)

Incremental “value”, “gain”, or “cost” of v increases (improves) as the context in which v is considered grows from A to B.
A function f is submodular iff −f is supermodular.
If f is both submodular and supermodular, then f is said to be modular, and f(A) = c + ∑_{a∈A} f(a) (often c = 0).


Superadditive Definitions

Definition 3.8.2 (superadditive)

A function f : 2^V → R is superadditive if for any A, B ⊆ V, we have that:

f(A) + f(B) ≤ f(A ∪ B)    (3.30)

This means that the “whole” is no less than the sum of the parts.

In general, submodular and subadditive (and supermodular and superadditive) are different properties. Ex: Let 0 < k < |V|, and consider f : 2^V → R_+ where:

f(A) = 1 if |A| ≤ k, and 0 otherwise    (3.31)

This function is subadditive but not submodular.
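A brute-force confirmation of this example in Python (|V| = 4 and k = 2, chosen arbitrarily):

    from itertools import combinations

    V = frozenset(range(4))
    k = 2
    f = lambda A: 1 if len(A) <= k else 0       # Eq. (3.31)

    subsets = [frozenset(s) for r in range(len(V) + 1)
               for s in combinations(V, r)]

    subadditive = all(f(A) + f(B) >= f(A | B) for A in subsets for B in subsets)
    submodular = all(f(A) + f(B) >= f(A | B) + f(A & B)
                     for A in subsets for B in subsets)
    print("subadditive:", subadditive)   # True
    print("submodular: ", submodular)    # False; e.g. A={0,1,2}, B={1,2,3}: 0+0 < 0+1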


Modular Definitions

Definition 3.8.3 (modular)
A function that is both submodular and supermodular is called modular.

If f is a modular function, then for any A, B ⊆ V, we have

f(A) + f(B) = f(A ∩ B) + f(A ∪ B)    (3.32)

In modular functions, elements do not interact (or cooperate, or compete, or influence each other), and have value based only on singleton values.

Proposition 3.8.4
If f is modular, it may be written as

f(A) = f(∅) + ∑_{a∈A} ( f({a}) − f(∅) ) = c + ∑_{a∈A} f′(a)    (3.33)

which has only |V| + 1 parameters.
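In code, a modular set function is just a constant plus one weight per element; the Python sketch below (made-up weights) stores exactly these |V| + 1 parameters and verifies identity (3.32):

    from itertools import combinations

    V = ['a', 'b', 'c', 'd']
    c = 2.0                                            # f(empty set)
    w = {'a': 1.5, 'b': -0.5, 'c': 3.0, 'd': 0.25}     # f'(a) = f({a}) - f(empty)

    f = lambda A: c + sum(w[v] for v in A)             # the |V|+1-parameter form (3.33)

    subsets = [frozenset(s) for r in range(len(V) + 1)
               for s in combinations(V, r)]
    ok = all(abs(f(A) + f(B) - f(A | B) - f(A & B)) < 1e-12
             for A in subsets for B in subsets)
    print("modular identity (3.32) holds:", ok)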


Modular Definitions

Proof.
We inductively construct the value for A = {a_1, a_2, ..., a_k}. For k = 2,

f(a_1) + f(a_2) = f(a_1, a_2) + f(∅)    (3.34)
implies f(a_1, a_2) = f(a_1) − f(∅) + f(a_2) − f(∅) + f(∅)    (3.35)

then for k = 3,

f(a_1, a_2) + f(a_3) = f(a_1, a_2, a_3) + f(∅)    (3.36)
implies f(a_1, a_2, a_3) = f(a_1, a_2) − f(∅) + f(a_3) − f(∅) + f(∅)    (3.37)
                         = f(∅) + ∑_{i=1}^{3} ( f(a_i) − f(∅) )    (3.38)

and so on ...


Complement function

Given a function f : 2^V → R, we can find a complement function f̄ : 2^V → R as f̄(A) = f(V \ A) for any A.

Proposition 3.8.5

f̄ is submodular iff f is submodular.

Proof.

f̄(A) + f̄(B) ≥ f̄(A ∪ B) + f̄(A ∩ B)    (3.39)

follows from

f(V \ A) + f(V \ B) ≥ f(V \ (A ∪ B)) + f(V \ (A ∩ B))    (3.40)

which is true because V \ (A ∪ B) = (V \ A) ∩ (V \ B) and V \ (A ∩ B) = (V \ A) ∪ (V \ B) (De Morgan's laws for sets), so (3.40) is just submodularity of f applied to the sets V \ A and V \ B.


Undirected Graphs

Let G = (V, E) be a graph with vertices V = V(G) and edges E = E(G) ⊆ V × V.

If G is undirected, define

E(X, Y) = {{x, y} ∈ E(G) : x ∈ X \ Y, y ∈ Y \ X}    (3.41)

as the edges strictly between X and Y.
Nodes define cuts; define the cut function δ(X) = E(X, V \ X).


Example: G = (V, E) on vertices {a, b, c, d, e, f, g, h} with S = {a, b, c}:

δ_G(S) = {{u, v} ∈ E : u ∈ S, v ∈ V \ S} = {{a,d}, {b,d}, {b,e}, {c,e}, {c,f}}

(Figure: the example graph with the cut edges leaving S highlighted.)
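A small Python sketch computing δ(S) and brute-force checking that |δ(·)| is submodular (the edge list below is a made-up graph, not the full edge set of the lecture's example):

    from itertools import combinations

    V = {'a', 'b', 'c', 'd', 'e', 'f'}
    E = [{'a', 'b'}, {'a', 'd'}, {'b', 'd'}, {'b', 'e'},
         {'c', 'e'}, {'c', 'f'}, {'d', 'e'}, {'e', 'f'}]   # hypothetical edge list

    def cut(S):
        """delta(S): edges with exactly one endpoint in S."""
        return [e for e in E if len(e & S) == 1]

    print(cut({'a', 'b', 'c'}))

    # Brute-force check that |delta(.)| is submodular on this graph.
    subsets = [set(s) for r in range(len(V) + 1)
               for s in combinations(sorted(V), r)]
    ok = all(len(cut(A)) + len(cut(B)) >= len(cut(A | B)) + len(cut(A & B))
             for A in subsets for B in subsets)
    print("|delta| is submodular on this example:", ok)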


Directed graphs, and cuts and flows

If G is directed, define

E⁺(X, Y) ≜ {(x, y) ∈ E(G) : x ∈ X \ Y, y ∈ Y \ X}    (3.42)

as the edges directed strictly from X towards Y.

Nodes define cuts and flows. Define edges leaving X (out-flow) as

δ⁺(X) ≜ E⁺(X, V \ X)    (3.43)

and edges entering X (in-flow) as

δ⁻(X) ≜ E⁺(V \ X, X)    (3.44)


Example: S = {a, b, c} in the directed version of the example graph:

δ⁺_G(S) = {(u, v) ∈ E : u ∈ S, v ∈ V \ S} = {(b,e), (c,f)}
δ⁻_G(S) = {(v, u) ∈ E : u ∈ S, v ∈ V \ S} = {(d,a), (d,b), (e,c)}

(Figure: the directed example graph with the out-edges and in-edges of S highlighted.)


The Neighbor function in undirected graphs

Given a set X ⊆ V, the neighbor function of X is defined as

Γ(X) ≜ {v ∈ V(G) \ X : E(X, {v}) ≠ ∅}    (3.45)

Example:


G = (V, E) as before, S = {a, b, c}: Γ(S) = {d, e, f}.

(Figure: the example graph with S and its neighbor set Γ(S) highlighted.)


Directed Cut function: property

Lemma 3.9.1
For a digraph G = (V, E) and any X, Y ⊆ V we have

|δ⁺(X)| + |δ⁺(Y)| = |δ⁺(X ∩ Y)| + |δ⁺(X ∪ Y)| + |E⁺(X, Y)| + |E⁺(Y, X)|    (3.46)

and

|δ⁻(X)| + |δ⁻(Y)| = |δ⁻(X ∩ Y)| + |δ⁻(X ∪ Y)| + |E⁻(X, Y)| + |E⁻(Y, X)|    (3.47)
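Identity (3.46) is easy to verify numerically; the Python sketch below builds a small random digraph and checks it for every pair X, Y:

    import random
    from itertools import combinations

    random.seed(3)
    V = list(range(5))
    E = [(u, v) for u in V for v in V if u != v and random.random() < 0.4]

    def E_plus(X, Y):     # directed edges strictly from X to Y
        return [(u, v) for (u, v) in E if u in X - Y and v in Y - X]

    def d_plus(X):        # out-cut delta^+(X)
        return E_plus(X, set(V) - X)

    subsets = [set(s) for r in range(len(V) + 1) for s in combinations(V, r)]
    ok = all(len(d_plus(X)) + len(d_plus(Y)) ==
             len(d_plus(X & Y)) + len(d_plus(X | Y))
             + len(E_plus(X, Y)) + len(E_plus(Y, X))
             for X in subsets for Y in subsets)
    print("identity (3.46) holds for all X, Y:", ok)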


Directed Cut function: proof of property

Proof.
We can prove Eq. (3.46) using a geometric counting argument (the proof for the |δ⁻(X)| case is similar).

(Figure: a series of 2×2 diagrams over X vs. V \ X and Y vs. V \ Y, tallying the edge classes (a) through (g) that contribute to |δ⁺(X)|, |δ⁺(Y)|, |δ⁺(X ∩ Y)|, |δ⁺(X ∪ Y)|, |E⁺(X, Y)|, and |E⁺(Y, X)|.)


Directed cut/flow functions: submodular

Lemma 3.9.2
For a digraph G = (V, E) and any X, Y ⊆ V: both functions |δ⁺(X)| and |δ⁻(X)| are submodular.

Proof.
|E⁺(X, Y)| ≥ 0 and |E⁻(X, Y)| ≥ 0, so Eqs. (3.46) and (3.47) immediately give the submodular inequality.

More generally, in the non-negative edge weighted case, both in-flow and out-flow are submodular on subsets of the vertices.


Undirected Cut/Flow & the Neighbor function: submodular

Lemma 3.9.3
For an undirected graph G = (V, E) and any X, Y ⊆ V, we have that both the undirected cut (or flow) function |δ(X)| and the neighbor function |Γ(X)| are submodular. I.e.,

|δ(X)| + |δ(Y)| = |δ(X ∩ Y)| + |δ(X ∪ Y)| + 2|E(X, Y)|    (3.48)

and

|Γ(X)| + |Γ(Y)| ≥ |Γ(X ∩ Y)| + |Γ(X ∪ Y)|    (3.49)

Proof.
Eq. (3.48) follows from Eq. (3.46): we replace each undirected edge {u, v} with two oppositely-directed directed edges (u, v) and (v, u), then use the same counting argument.

Eq. (3.49) follows as shown in the following page.


Undirected Neighbor function (cont.)

(Figure: overlapping sets X and Y with the relevant vertex regions labeled (a) through (h).)

Graphically, we can count and see that

Γ(X) = (a) + (c) + (f) + (g) + (d)    (3.50)
Γ(Y) = (b) + (c) + (e) + (h) + (d)    (3.51)
Γ(X ∪ Y) = (a) + (b) + (c) + (d)    (3.52)
Γ(X ∩ Y) = (c) + (g) + (h)    (3.53)

so

|Γ(X)| + |Γ(Y)| = (a) + (b) + 2(c) + 2(d) + (e) + (f) + (g) + (h)
                ≥ (a) + (b) + 2(c) + (d) + (g) + (h) = |Γ(X ∪ Y)| + |Γ(X ∩ Y)|    (3.54)


Undirected Neighbor functions

Therefore, the undirected cut function |δ(A)| and the neighbor function |Γ(A)| of a graph G are both submodular.


Undirected cut/flow is submodular: alternate proof

Another simple proof shows that |δ(X)| is submodular.

Define a graph G_uv = ({u, v}, {e}, w) with two nodes u, v and one edge e = {u, v} with non-negative weight w(e) ∈ R_+.
The cut weight function over those two nodes, w(δ_{u,v}(·)), has valuation:

w(δ_{u,v}(∅)) = w(δ_{u,v}({u, v})) = 0    (3.55)

and

w(δ_{u,v}({u})) = w(δ_{u,v}({v})) = w ≥ 0    (3.56)

Thus, w(δ_{u,v}(·)) is submodular since

w(δ_{u,v}({u})) + w(δ_{u,v}({v})) ≥ w(δ_{u,v}({u, v})) + w(δ_{u,v}(∅))    (3.57)

For a general non-negative weighted graph G = (V, E, w), define w(δ(·)):

f(X) = w(δ(X)) = ∑_{(u,v)∈E(G)} w(δ_{u,v}(X ∩ {u, v}))    (3.58)

This is easily shown to be submodular using properties we will soon see (namely, submodularity is closed under summation and under restriction: given f : 2^A → R with A ⊆ V, the restriction g(X) ≜ f(X ∩ A) is a function on 2^V).
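The decomposition (3.58) in code: the Python sketch below (made-up weighted graph) computes w(δ(X)) both directly and edge-by-edge via the little two-node cut functions and confirms they agree:

    # Weighted undirected graph: edge -> weight (made-up example).
    w = {frozenset({'a', 'b'}): 2.0,
         frozenset({'b', 'c'}): 1.5,
         frozenset({'a', 'c'}): 0.5,
         frozenset({'c', 'd'}): 3.0}

    def cut_weight(X):
        """Direct computation of w(delta(X))."""
        X = set(X)
        return sum(wt for e, wt in w.items() if len(e & X) == 1)

    def cut_weight_by_edges(X):
        """Eq. (3.58): sum over edges of the two-node cut of X restricted to {u,v}."""
        X = set(X)
        total = 0.0
        for e, wt in w.items():
            S = X & e                                # X intersect {u, v}
            total += wt if len(S) == 1 else 0.0      # the little cut delta_{u,v}
        return total

    X = {'a', 'c'}
    print(cut_weight(X), cut_weight_by_edges(X))     # both 6.5 for this X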


Other graph functions that are submodular/supermodular

These come from Narayanan's book (1997). Let G be an undirected graph.
Let V(X) be the vertices adjacent to some edge in X ⊆ E(G); then |V(X)| (the vertex function) is submodular.

Let E(S) be the edges with both vertices in S ⊆ V(G). Then |E(S)| (the interior edge function) is supermodular.
Let I(S) be the edges with at least one vertex in S ⊆ V(G). Then |I(S)| (the incidence function) is submodular.
Recall that |δ(S)|, the number of edges with exactly one vertex in S ⊆ V(G), is submodular (the cut size function). We have I(S) = E(S) ∪ δ(S) and E(S) ∩ δ(S) = ∅, and thus |I(S)| = |E(S)| + |δ(S)|.

So we can get a submodular function by summing a submodular and a supermodular function. If you had to guess, is this always the case?

Consider f(A) = |δ⁺(A)| − |δ⁺(V \ A)|. Guess: submodular, supermodular, modular, or neither? Exercise: determine which one and prove it.
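These functions are easy to experiment with; the Python sketch below (made-up small graph) implements |E(S)|, |I(S)|, and |δ(S)| and brute-force classifies each as submodular, supermodular, modular, or neither; the same checker can be pointed at the exercise function f(A) = |δ⁺(A)| − |δ⁺(V \ A)| on a digraph of your choosing:

    from itertools import combinations

    V = ['a', 'b', 'c', 'd', 'e']
    E = [{'a', 'b'}, {'b', 'c'}, {'c', 'd'}, {'d', 'a'}, {'b', 'd'}, {'d', 'e'}]

    interior = lambda S: sum(1 for e in E if e <= set(S))           # |E(S)|
    incident = lambda S: sum(1 for e in E if e & set(S))            # |I(S)|
    cut      = lambda S: sum(1 for e in E if len(e & set(S)) == 1)  # |delta(S)|

    subsets = [frozenset(s) for r in range(len(V) + 1)
               for s in combinations(V, r)]

    def classify(f):
        sub = all(f(A) + f(B) >= f(A | B) + f(A & B) for A in subsets for B in subsets)
        sup = all(f(A) + f(B) <= f(A | B) + f(A & B) for A in subsets for B in subsets)
        return {(True, True): "modular", (True, False): "submodular",
                (False, True): "supermodular", (False, False): "neither"}[(sub, sup)]

    for name, f in [("|E(S)|", interior), ("|I(S)|", incident), ("|delta(S)|", cut)]:
        print(name, "is", classify(f))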


Number of connected components in a graph via edges

Recall: if $f : 2^V \to \mathbb{R}$ is submodular, then so is $\bar{f} : 2^V \to \mathbb{R}$ defined as $\bar{f}(S) = f(V \setminus S)$.

Hence, if $g : 2^V \to \mathbb{R}$ is supermodular, then so is $\bar{g} : 2^V \to \mathbb{R}$ defined as $\bar{g}(S) = g(V \setminus S)$.
Given a graph $G = (V, E)$, for each $A \subseteq E(G)$, let $c(A)$ denote the number of connected components of the (spanning) subgraph $(V(G), A)$, with $c : 2^E \to \mathbb{R}_+$.
$c(A)$ is monotone non-increasing: $c(A + a) - c(A) \le 0$.
Then $c(A)$ is supermodular, i.e.,

$c(A + a) - c(A) \le c(B + a) - c(B)$  (3.59)
with $A \subseteq B \subseteq E \setminus \{a\}$.
Intuition: an edge is "more" (no less) able to bridge separate components (and reduce the number of connected components) when it is added in a smaller context than when it is added in a larger context.
$\bar{c}(A) = c(E \setminus A)$ is the number of connected components in $G$ when we remove $A$; it is supermodular and monotone non-decreasing, but not normalized. (A brute-force check of the supermodularity of $c$ is sketched after this slide.)

Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 3 - April 2nd, 2018 F49/56 (pg.117/154)
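Another minimal sketch (not from the lecture): a brute-force check, on a small cycle graph chosen only for illustration, that $c(A)$, the number of connected components of $(V, A)$, is supermodular and monotone non-increasing in the edge set $A$.

```python
# Minimal sketch: c(A) = number of connected components of (V, A) is supermodular.
from itertools import combinations

V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]          # an arbitrary small cycle graph

def n_components(edges):
    # union-find over V using only the given edges
    parent = {v: v for v in V}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in V})

subsets = [frozenset(c) for r in range(len(E) + 1) for c in combinations(E, r)]

# supermodularity: c(A) + c(B) <= c(A | B) + c(A & B) for all edge subsets
print(all(n_components(A) + n_components(B)
          <= n_components(A | B) + n_components(A & B)
          for A in subsets for B in subsets))                 # True

# monotone non-increasing: A a subset of B implies c(A) >= c(B)
print(all(n_components(A) >= n_components(B)
          for A in subsets for B in subsets if A <= B))       # True
```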

Graph Strength

So $\bar{c}(A) = c(E \setminus A)$, the number of connected components in $G$ when we remove $A$, is supermodular.

Maximizing $\bar{c}(A)$ might seem like a natural goal for a network attacker: many connected components means that many points in the network have lost connectivity to many other points (an unprotected network).
If we can remove a small set $A$ and shatter the graph into many connected components, then the graph is weak.
An attacker wishes to choose a small number of edges (since that is cheap) to shatter the graph into as many components as possible.
Let $G = (V, E, w)$ with $w : E \to \mathbb{R}_+$ be a weighted graph with non-negative weights.
For $(u, v) = e \in E$, let $w(e)$ be a measure of the strength of the connection between vertices $u$ and $v$ (strength meaning the difficulty of cutting the edge $e$).

Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 3 - April 2nd, 2018 F50/56 (pg.124/154)

Graph Strength

Then $w(A)$ for $A \subseteq E$ is a modular function,
$w(A) = \sum_{e \in A} w_e$  (3.60)
so that $w(E(G[S]))$ is the "internal strength" of the vertex set $S$.
Notation: $S$ is a set of nodes, $G[S]$ is the subgraph of $G$ induced by the vertices $S$, $E(G[S])$ are the edges contained within this induced subgraph, and $w(E(G[S]))$ is the weight of these edges.
Suppose removing $A$ shatters $G$ into a graph with $\bar{c}(A) > 1$ components; then $w(A)/(\bar{c}(A) - 1)$ is like the "effort per achieved/additional component" for a network attacker.
A form of graph strength can then be defined as the following:
$\mathrm{strength}(G, w) = \min_{A \subseteq E(G) : \bar{c}(A) > 1} \frac{w(A)}{\bar{c}(A) - 1}$  (3.61)
Graph strength is like the minimum effort per component. An attacker would use the argument of the min to choose which edges to attack. A network designer would maximize, over $G$ and/or $w$, the graph strength $\mathrm{strength}(G, w)$.
Thanks to submodularity, these problems have strongly polynomial-time solutions. (A brute-force evaluation of (3.61) on a tiny graph is sketched after this slide.)

Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 3 - April 2nd, 2018 F51/56 (pg.130/154)
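A minimal sketch (not from the lecture): a brute-force evaluation of the graph strength in (3.61) on a tiny weighted graph of my own choosing, enumerating all edge subsets. A real solver would exploit the supermodularity of $\bar{c}$ rather than enumerate; this is only a sanity check of the definition.

```python
# Minimal sketch: brute-force graph strength as in (3.61) on a tiny weighted graph.
from itertools import combinations

V = [0, 1, 2, 3]
W = {(0, 1): 2.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 2.0, (0, 2): 0.5}   # edge weights
E = list(W)

def n_components(edges):
    # union-find over V using only the given edges
    parent = {v: v for v in V}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in V})

def strength(V, E, W):
    best = float("inf")
    for r in range(1, len(E) + 1):
        for A in combinations(E, r):                               # candidate attack sets A
            comps = n_components([e for e in E if e not in A])     # c-bar(A)
            if comps > 1:
                best = min(best, sum(W[e] for e in A) / (comps - 1))
    return best

print(strength(V, E, W))   # minimum "effort per additional component"
```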

Submodularity, Quadratic Structures, and Cuts

Lemma 3.9.4
Let $M \in \mathbb{R}^{n \times n}$ be a symmetric matrix and $m \in \mathbb{R}^n$ be a vector. Then $f : 2^V \to \mathbb{R}$ defined as
$f(X) = m^\intercal \mathbf{1}_X + \tfrac{1}{2} \mathbf{1}_X^\intercal M \mathbf{1}_X$  (3.62)
is submodular iff the off-diagonal elements of $M$ are non-positive.

Proof.
Given a complete graph $G = (V, E)$, recall that $E(X)$ is the edge set with both vertices in $X \subseteq V(G)$, and that $|E(X)|$ is supermodular.
With non-negative modular edge weights $w^+ : E \to \mathbb{R}_+$, $w(E(X))$ is also supermodular, so $-w(E(X))$ is submodular.
If the off-diagonal entries of $M$ are non-positive, set $w(\{u, v\}) = -M_{u,v} \ge 0$; then $f$ is a modular function $m^\intercal \mathbf{1}_A = m(A)$ (plus the modular diagonal term $\tfrac{1}{2} \sum_{v \in A} M_{v,v}$) added to the weighted submodular function $-w(E(A))$, hence $f$ is submodular. (A numerical check of the lemma is sketched after this slide.)
…
Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 3 - April 2nd, 2018 F52/56 (pg.136/154)
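A minimal sketch (not from the lecture): a brute-force numerical check of Lemma 3.9.4 on a small random instance, assuming NumPy is available. The ground-set size and the random matrices are arbitrary choices for illustration.

```python
# Minimal sketch: brute-force check of Lemma 3.9.4 on a small ground set.
from itertools import combinations
import numpy as np

n = 4
rng = np.random.default_rng(0)
m = rng.normal(size=n)
A = rng.uniform(size=(n, n))
M_neg = -(A + A.T)                              # symmetric, non-positive off-diagonals
np.fill_diagonal(M_neg, rng.normal(size=n))     # diagonal signs are irrelevant

def f(X, M):
    x = np.zeros(n)
    x[list(X)] = 1.0
    return float(m @ x + 0.5 * x @ M @ x)       # f(X) = m'1_X + (1/2) 1_X' M 1_X

subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def is_submodular(M, tol=1e-9):
    return all(f(S, M) + f(T, M) >= f(S | T, M) + f(S & T, M) - tol
               for S in subsets for T in subsets)

print(is_submodular(M_neg))                     # True: off-diagonals <= 0

M_pos = M_neg.copy()
M_pos[0, 1] = M_pos[1, 0] = 1.0                 # one positive off-diagonal entry
print(is_submodular(M_pos))                     # False, as the lemma predicts
```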

Submodularity, Quadratic Structures, and Cuts

Proof of Lemma 3.9.4 cont.
Conversely, suppose $f$ is submodular.
Then $\forall u, v \in V$, $f(\{u\}) + f(\{v\}) \ge f(\{u, v\}) + f(\emptyset)$ while $f(\emptyset) = 0$. This requires:
$0 \le f(\{u\}) + f(\{v\}) - f(\{u, v\})$  (3.63)
$\phantom{0} = m(u) + \tfrac{1}{2} M_{u,u} + m(v) + \tfrac{1}{2} M_{v,v}$  (3.64)
$\phantom{0 =} - \big( m(u) + m(v) + \tfrac{1}{2} M_{u,u} + M_{u,v} + \tfrac{1}{2} M_{v,v} \big)$  (3.65)
$\phantom{0} = -M_{u,v}$  (3.66)
So that $\forall u, v \in V$, $M_{u,v} \le 0$.

Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 3 - April 2nd, 2018 F53/56 (pg.140/154)
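As a tiny companion to the brute-force lemma check above, this sketch (not from the lecture) confirms the algebraic identity in (3.63)-(3.66) numerically for one arbitrary pair $u, v$; NumPy and the random instance are assumptions used only for illustration.

```python
# Minimal sketch: numerically confirm the identity in (3.63)-(3.66),
# f({u}) + f({v}) - f({u, v}) = -M[u, v], for f as in (3.62).
import numpy as np

rng = np.random.default_rng(1)
n = 5
m = rng.normal(size=n)
M = rng.normal(size=(n, n))
M = (M + M.T) / 2                         # an arbitrary symmetric matrix

def f(X):
    x = np.zeros(n)
    x[list(X)] = 1.0
    return float(m @ x + 0.5 * x @ M @ x)

u, v = 1, 3
lhs = f({u}) + f({v}) - f({u, v})
print(np.isclose(lhs, -M[u, v]))          # True for any symmetric M and any m
```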

Set Cover and Maximum Coverage
just Special cases of Submodular Optimization

We are given a finite set $U$ of $m$ elements and a collection $\mathcal{U} = \{U_1, U_2, \dots, U_n\}$ of $n$ subsets of $U$, so that $U_i \subseteq U$ and $\bigcup_i U_i = U$.
The goal of minimum set cover is to choose the smallest subset $A \subseteq [n] \triangleq \{1, \dots, n\}$ such that $\bigcup_{a \in A} U_a = U$.
Maximum $k$ cover: the goal in maximum coverage is, given an integer $k \le n$, to select $k$ subsets, say $\{a_1, a_2, \dots, a_k\}$ with $a_i \in [n]$, such that $|\bigcup_{i=1}^{k} U_{a_i}|$ is maximized.
$f : 2^{[n]} \to \mathbb{Z}_+$ where, for $A \subseteq [n]$, $f(A) = |\bigcup_{a \in A} U_a|$ is the set cover function and is submodular.
Weighted set cover: $f(A) = w(\bigcup_{a \in A} U_a)$ where $w : U \to \mathbb{R}_+$.
Both set cover and maximum coverage are well known to be NP-hard, but have a fast greedy approximation algorithm, and hence are instances of submodular optimization. (A sketch of the greedy algorithm for maximum coverage follows this slide.)

Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 3 - April 2nd, 2018 F54/56 (pg.143/154)
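A minimal sketch (not from the lecture): the classic greedy algorithm for maximum coverage, which repeatedly adds the subset with the largest marginal gain. Because $f(A) = |\bigcup_{a \in A} U_a|$ is monotone submodular, this greedy achieves the well-known $(1 - 1/e)$ approximation. The function name and example instance below are hypothetical.

```python
# Minimal sketch: greedy maximum coverage.
def greedy_max_coverage(subsets, k):
    """subsets: list of sets U_1..U_n; greedily pick (up to) k indices."""
    chosen, covered = [], set()
    for _ in range(k):
        gains = [len(S - covered) for S in subsets]   # marginal gains f(A + a) - f(A)
        best = max(range(len(subsets)), key=gains.__getitem__)
        if gains[best] == 0:
            break                                     # nothing new can be covered
        chosen.append(best)
        covered |= subsets[best]
    return chosen, covered

U_subsets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]    # hypothetical instance
print(greedy_max_coverage(U_subsets, 2))              # ([0, 2], {1, 2, 3, 4, 5, 6})
```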

Vertex and Edge Covers
Also instances of submodular optimization

Definition 3.9.5 (vertex cover)
A vertex cover (a "vertex-based cover of edges") in a graph $G = (V, E)$ is a set $S \subseteq V(G)$ of vertices such that every edge in $G$ is incident to at least one vertex in $S$.

Let $|I(S)|$ be the number of edges incident to vertex set $S$. Then we wish to find the smallest set $S \subseteq V$ subject to $|I(S)| = |E|$.

Definition 3.9.6 (edge cover)
An edge cover (an "edge-based cover of vertices") in a graph $G = (V, E)$ is a set $F \subseteq E(G)$ of edges such that every vertex in $G$ is incident to at least one edge in $F$.

Let $|V(F)|$ be the number of vertices incident to edge set $F$. Then we wish to find the smallest set $F \subseteq E$ subject to $|V(F)| = |V|$. (A greedy submodular-cover sketch for vertex cover follows this slide.)

Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 3 - April 2nd, 2018 F55/56 (pg.149/154)
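A minimal sketch (not from the lecture): the generic greedy heuristic for covering a monotone submodular function, applied to vertex cover via $|I(S)|$. Note this is the logarithmic-factor submodular-cover greedy, not the matching-based 2-approximation; the example graph is arbitrary.

```python
# Minimal sketch: greedy submodular-cover heuristic for vertex cover.
def greedy_vertex_cover(V, E):
    """Repeatedly add the vertex with the largest marginal gain in |I(S)|
    until I(S) = E, i.e., every edge is incident to the chosen set."""
    S, covered = set(), set()
    while len(covered) < len(E):
        gains = {v: sum(1 for e in E if v in e and e not in covered)
                 for v in V - S}
        best = max(gains, key=gains.get)
        S.add(best)
        covered |= {e for e in E if best in e}
    return S

V = {0, 1, 2, 3, 4}
E = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)}
print(greedy_vertex_cover(V, E))   # a small vertex cover, e.g. {0, 2, 3}
```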

Graph Cut Problems
Also submodular optimization

Minimum cut: given a graph $G = (V, E)$, find a set of vertices $S \subseteq V$ that minimizes the cut (set of edges) between $S$ and $V \setminus S$.
Maximum cut: given a graph $G = (V, E)$, find a set of vertices $S \subseteq V$ that maximizes the cut (set of edges) between $S$ and $V \setminus S$.
Let $\delta$ be the cut function: for any given set of nodes $X \subseteq V$, $\delta(X) = E(X, V \setminus X)$ is the set of edges between $X$ and $V \setminus X$, and $|\delta(\cdot)| : 2^V \to \mathbb{R}_+$ measures their number.
Weighted versions: rather than count, we sum the (non-negative) weights of the edges of a cut, $f(X) = w(\delta(X))$.
Hence, minimum cut and maximum cut are also special cases of submodular optimization. (A brute-force check of the weighted cut function follows this slide.)

Prof. Jeff Bilmes EE563/Spring 2018/Submodularity - Lecture 3 - April 2nd, 2018 F56/56 (pg.150/154)
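A last minimal sketch (not from the lecture): a brute-force check, on a tiny weighted graph of my own choosing, that $f(X) = w(\delta(X))$ is submodular, together with an enumeration of the minimum and maximum cuts over non-trivial $X$.

```python
# Minimal sketch: the weighted cut function and its submodularity, by brute force.
from itertools import combinations

V = [0, 1, 2, 3]
W = {(0, 1): 3.0, (1, 2): 1.0, (2, 3): 2.0, (0, 3): 1.0, (0, 2): 0.5}   # edge weights

def cut_weight(X):
    X = set(X)
    return sum(w for (u, v), w in W.items() if (u in X) != (v in X))    # w(delta(X))

subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]

# submodularity: f(A) + f(B) >= f(A | B) + f(A & B) for all A, B
print(all(cut_weight(A) + cut_weight(B)
          >= cut_weight(A | B) + cut_weight(A & B) - 1e-9
          for A in subsets for B in subsets))                           # True

proper = [S for S in subsets if 0 < len(S) < len(V)]                    # non-trivial X
print(min(proper, key=cut_weight), min(map(cut_weight, proper)))        # minimum cut
print(max(proper, key=cut_weight), max(map(cut_weight, proper)))        # maximum cut
```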
