Algortihm Assignment

profilebetoglan2017
CS.7.Performance_Algortithm_Lesson3.pdf

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

h t t p : / / i n t r o c s . c s . p r i n c e t o n . e d u

R O B E R T S E D G E W I C K K E V I N W A Y N E

C om

pu ter Scien

ce

Computer Science

An Interdisciplinary Approach

7. Performance

Section 4.1

7. Performance

• The challenge • Empirical analysis • Mathematical models • Doubling method • Familiar examples

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

CS.7.A.Performance.Challenge

3

The challenge (since the earliest days of computing machines)

Difference Engine #2

Designed by Charles Babbage, c. 1848

Built by London Science Museum, 1991

“ As soon as an Analytic Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will arise—By what course of calculation can these results be arrived at by the machine in the shortest time?”

− Charles Babbage

Q. How many times do you
 have to turn the crank?

The challenge (modern version)

4

Q. Will I be able to use my program to solve a large practical problem?

Key insight (Knuth 1970s). Use the scientific method to understand performance.

Q. If not, how might I understand its performance characteristics so as to improve it?

5

Three reasons to study program performance

2. To compare algorithms and implementations.

• Will this change make my program faster? • How can I make my program faster?

3. To develop a basis for understanding the
 problem and for designing new algorithms

• Enables new technology.

• Enables new research.

1. To predict program behavior

• Will my program finish? • When will my program finish?

public class Gambler { public static void main(String[] args) { int stake = Integer.parseInt(args[0]); int goal = Integer.parseInt(args[1]); int trials = Integer.parseInt(args[2]); int wins = 0; for (int t = 0; t < trials; t++) { int cash = stake; while (cash > 0 && cash < goal) if (Math.random() < 0.5) cash++;
 else cash--; if (cash == goal) wins++; } StdOut.print(wins + " wins of " + trials); } }

An algorithm is a method for solving a problem that is

suitable for implementation as a computer program.

We study several algorithms later in this course.
 Taking more CS courses? You'll learn dozens of algorithms.

An algorithm design success story

N-body simulation

• Goal: Simulate gravitational interactions among N bodies. • Brute-force algorithm uses N 2 steps per time unit. • Issue (1970s): Too slow to address scientific problems of interest. • Success story: Barnes-Hut algorithm uses N logN steps and enables new research.

6

problem size (N )

ti m

e

N 2

N logN

limit on available time

Andrew Appel PU '81

senior thesis

Another algorithm design success story

Discrete Fourier transform

• Goal: Break down waveform of N samples into periodic components. • Applications: digital signal processing, spectroscopy, ... • Brute-force algorithm uses N 2 steps. • Issue (1950s): Too slow to address commercial applications of interest. • Success story: FFT algorithm uses N logN steps and enables new technology.

7

problem size (N )

ti m

e

N 2

N logN

limit on available time

John Tukey 1915–2000

Quick aside: binary logarithms

8

Def. The binary logarithm of a number N (written lg N )
 is the number x satisfying 2x = N.

N approximate value lgN log10N

210 1 thousand 10 3.01

220 1 million 20 6.02

230 1 billion 30 9.03

Frequently encountered values

Fact. The number of bits in the binary representation of N is 1+⎣lg N⎦.

Q. How many recursive calls for convert(N)?

Fact. Binary logarithms arise in the study of algorithms based on recursively solving problems 
 half the size (divide-and-conquer algorithms), like convert, FFT and Barnes-Hut.

A. Largest integer less than or equal to lg N (written ⎣lg N⎦).

public static String convert(int N) { if (N == 1) return "1"; return convert(N/2) + (N % 2); }

N

lg N

1 2 4 8 16

2

4

0 1

3

Prove by induction. Details in "sorting and searching" lecture.

or log2N

An algorithmic challenge: 3-sum problem

Three-sum. Given N integers, enumerate the triples that sum to 0.

9

Q. Can we solve this problem for N = 1 million?

Applications in computational geometry

• Find collinear points.

• Does one polygon fit inside another?

• Robot motion planning.

• [a surprisingly long list]

For simplicity, just count them.

public class ThreeSum { public static int count(int[] a) { /* See next slide. */ } 
 public static void main(String[] args) { int[] a = StdIn.readAllInts(); StdOut.println(count(a)); } }

% more 6ints.txt 30 -30 -20 -10 40 0

% java ThreeSum < 6ints.txt 3

30 -30 0 30 -20 -10 -30 -10 40

10

Three-sum implementation

"Brute force" algorithm

• Process all possible triples.

• Increment counter when sum is 0.

public static int count(int[] a) { int N = a.length; int cnt = 0; for (int i = 0; i < N; i++) for (int j = i+1; j < N; j++) for (int k = j+1; k < N; k++) if (a[i] + a[j] + a[k] == 0) cnt++; return cnt; }

i j k a[i] a[j] a[k]

0 1 2 30 -30 -20

0 1 3 30 -30 -10

0 1 4 30 -30 40

0 1 5 30 -30 0

0 2 3 30 -20 -10

0 2 4 30 -20 40

0 2 5 30 -20 0

0 3 4 30 -10 40

0 3 5 30 -10 0

0 4 5 30 40 0

1 2 3 -30 -20 -10

1 2 4 -30 -20 40

1 2 5 -30 -20 0

1 3 4 -30 -10 40

1 3 5 -30 -10 0

1 4 5 -30 40 0

2 3 4 -20 -10 40

2 3 5 -20 -10 0

2 4 5 -20 40 0

3 4 5 -10 40 0

i 0 1 2 3 4 5

a[i] 30 -30 -20 -10 40 0

Q. How much time will this program take for N = 1 million?

Keep i < j < k to avoid processing


each triple 6 times

� 5 �

� triples 


with i < j < k

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

Image sources

http://commons.wikimedia.org/wiki/File:Babbages_Analytical_Engine,_1834-1871._(9660574685).jpg

http://commons.wikimedia.org/wiki/File:Charles_Babbage_1860.jpg

http://commons.wikimedia.org/wiki/File:John_Tukey.jpg

http://commons.wikimedia.org/wiki/File:Andrew_Apple_(FloC_2006).jpg

http://commons.wikimedia.org/wiki/File:Hubble's_Wide_View_of_'Mystic_Mountain'_in_Infrared.jpg

CS.7.A.Performance.Challenge

7. Performance

• The challenge • Empirical analysis • Mathematical models • Doubling method • Familiar examples

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

CS.7.B.Performance.Empirical

A first step in analyzing running time

Find representative inputs

• Option 1: Collect actual input data. • Option 2: Write a program to generate representative inputs.

13

public class Generator { // Generate N integers in [-M, M) public static void main(String[] args) { int M = Integer.parseInt(args[0]); int N = Integer.parseInt(args[1]); for (int i = 0; i < N; i++) StdOut.println(StdRandom.uniform(-M, M)); } }

Input generator for ThreeSum % java Generator 1000000 10 28773 -807569 -425582 594752 600579 -483784 -861312 -690436 -732636 360294

% java Generator 10 10 -2 1 -4 1 -2 -10 -4 1 0 -7

not much chance of a 3-sum

good chance of a 3-sum

Empirical analysis

Run experiments

• Start with a moderate input size N. • Measure and record running time. • Double input size N. • Repeat. • Tabulate and plot results.

14

% java Generator 1000000 1000 | java ThreeSum 59 (0 seconds)

% java Generator 1000000 2000 | java ThreeSum 522 (4 seconds)

% java Generator 1000000 4000 | java ThreeSum 3992 (31 seconds)

% java Generator 1000000 8000 | java ThreeSum 31903 (248 seconds)

Run experiments

Tabulate and plot results

problem size (N )

ti m

e (s

ec on

d s)

0 1000 2000 4000 8000

100

0

200

250 N time (seconds)

1000 0

2000 4

4000 31

8000 248

Measure running time

double start = System.currentTimeMillis() / 1000.0; int cnt = count(a); double now = System.currentTimeMillis() / 1000.0; StdOut.printf("%d (%.0f seconds)\n", cnt, now - start);

Replace println() in ThreeSum with this code.

Aside: experimentation in CS is virtually free, particularly by comparison with other sciences.

15

Bottom line. No excuse for not running experiments to understand costs.

Physics

Chemistry

Biology

one experiment

Computer Science

one million experiments

Do the math

Data analysis

Curve fitting

• Plot on log-log scale. • If points are on a straight line (often the case), a

power law holds—a curve of the form aNb fits.

• The exponent b is the slope of the line. • Solve for a with the data.

16

N TN lgN lg TN

1000 0.5 10 −1

2000 4 11 2

4000 31 12 5

8000 248 13 8

log problem size (lg N )

lo g t

im e

(l g T

N )

10 11 12 13

2

−1

8

log-log plot

5 TN = aN3 raise 2 to a power of both sides

248 = a × 80003 substitute values from experiment

a = 4.84 × 10 −10 solve for a

4.84 × 10 −10 × N3

0.5

4

31

248

straight line of slope 3

lgTN = lga + 3lgN equation for straight line of slope 3

x-intercept (use lg in anticipation of next step)

TN = 4.84 × 10 −10 × N3 substitute

a curve that fits the data ?

Prediction. Running time for N = 16,000 will be 1982 seconds.

Prediction and verification

Hypothesis. Running time of ThreeSum is 4.84 × 10 −10 × N 3.

17

✓% java Generator 1000000 16000 | java ThreeSum 31903 (1985 seconds)

Q. How much time will this program take for N = 1 million?

A. 484 million seconds (more than 15 years).

about half an hour

Another hypothesis

18

Hypothesis. Running times on different computers differ by only a constant factor.

0 1000 2000 4000 8000

100

0

200

250 N time (seconds)

1000 0 2000 4 4000 31 8000 248

4.84 × 10 −10 × N3 seconds 2010s: 10,000+ times faster

Macbook Air

0 1000 2000 4000 8000

1000000

0

N time (seconds) 1000 5319 2000 43221 4000 343774 8000 2654384

5.2 × 10 −6 × N3 seconds

VAX 11/780

1970s

2000000

(estimated )

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

Image sources

http://commons.wikimedia.org/wiki/File:FEMA_-_2720_-_Photograph_by_FEMA_News_Photo.jpg

http://pixabay.com/en/lab-research-chemistry-test-217041/

http://upload.wikimedia.org/wikipedia/commons/2/28/Cut_rat_2.jpg

http://pixabay.com/en/view-glass-future-crystal-ball-32381/

CS.7.B.Performance.Empirical

7. Performance

• The challenge • Empirical analysis • Mathematical models • Doubling method • Familiar examples

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

CS.7.C.Performance.Math

Mathematical models for running time

A. (D. E. Knuth, 1968–present) Yes!

• Determine the set of operations. • Find the cost of each operation (depends on computer and system software). • Find the frequency of execution of each operation (depends on algorithm and inputs). • Total running time: sum of cost × frequency for all operations.

21

Q. Can we write down an accurate formula for the running time of a computer program?

A. (Prevailing wisdom, 1960s) No, too complicated.

Don Knuth
 1974 Turing Award

22

Warmup: 1-sum

public static int count(int[] a) { int N = a.length; int cnt = 0; for (int i = 0; i < N; i++) if (a[i] == 0) cnt++; return cnt; }

Q. Formula for total running time ?

A. cN + 26.5 nanoseconds, where c is between 2 and 2.5, depending on input.

operation cost frequency

function call/return 20 ns 1

variable declaration 2 ns 2

assignment 1 ns 2

less than compare 1/2 ns N + 1

equal to compare 1/2 ns N

array access 1/2 ns N

increment 1/2 ns between N and 2N

representative estimates (with some poetic license); knowing exact values may require study and

experimentation.

Note that frequency of increments

depends on input.

23

Warmup: 2-sum

operation cost frequency

function call/return 20 ns 1

variable declaration 2 ns N + 2

assignment 1 ns N + 2

less than compare 1/2 ns (N + 1) (N + 2)/2

equal to compare 1/2 ns N (N − 1)/2

array access 1/2 ns N (N − 1)

increment 1/2 ns between N (N + 1)/2 and N 2

public static int count(int[] a) { int N = a.length; int cnt = 0; for (int i = 0; i < N; i++) for (int j = i+1; j < N; j++) if (a[i] + a[j] == 0) cnt++; return cnt; }

Q. Formula for total running time ?

A. c1N 2 + c2N + c3 nanoseconds, where... [complicated definitions].

exact counts tedious to derive � N 2

� = N(N � 1)

2 # i < j =

Simplifying the calculations

Tilde notation

• Use only the fastest-growing term. • Ignore the slower-growing terms.

24

Ex. 5/4 N 2 + 13/4 N + 53/2

Def. f (N ) ~ g(N ) means f (N )/g(N ) ➛ 1 as N ➛ ∞

1,250,000 for N = 1,000,

within .3% 1,253,276.5 for N = 1,000

Q. Formula for 2-sum running time when count is not large (typical case)?

A. ~ 5/4 N 2 nanoseconds.

Rationale

• When N is large, ignored terms are negligible. • When N is small, everything is negligible.

eliminate dependence on input

~ 5/4 N 2

25

Mathematical model for 3-sum

operation cost frequency

function call/return 20 ns 1

variable declaration 2 ns ~ N

assignment 1 ns ~ N

less than compare 1/2 ns ~ N 3/6

equal to compare 1/2 ns ~ N 3/6

array access 1/2 ns ~ N 3/2

increment 1/2 ns ~ N 3/6

Q. Formula for total running time when return value is not large (typical case)?

A. ~ N 3/2 nanoseconds.

public static int count(int[] a) { int N = a.length; int cnt = 0; for (int i = 0; i < N; i++) for (int j = i+1; j < N; j++) for (int k = j+1; k < N; k++) if (a[i] + a[j] + a[k] == 0) cnt++; return cnt; }

� N 3

� = N(N � 1)(N � 2)

6 � N3

6 # i < j < k =

✓ matches 4.84 × 10 −10 × N 3 empirical hypothesis

assumes count is not large

Context

26

Mathematical analysis of algorithms

• Analyze algorithm to develop a formula for running time as a function of N.

• Useful for predicting and explaining. • Might involve advanced mathematics. • Applies to any computer.

Scientific method

• Observe some feature of the natural world. • Hypothesize a model consistent with observations. • Predict events using the hypothesis. • Verify the predictions by making further observations. • Validate by refining until hypothesis and observations agree.

Empirical analysis of programs

• "Feature of natural world" is time taken by a program on a computer.

• Fit a curve to experimental data to get a formula for running time as a function of N.

• Useful for predicting, but not explaining.

Good news. Mathematical models are easier to formulate in CS than in other sciences.

Francis Bacon 1561–1626

René Descartes 1596–1650

John Stuart Mill 1806–1873

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

Image sources

http://commons.wikimedia.org/wiki/File:KnuthAtOpenContentAlliance.jpg

http://commons.wikimedia.org/wiki/File:Pourbus_Francis_Bacon.jpg

http://commons.wikimedia.org/wiki/File:Frans_Hals_-_Portret_van_René_Descartes.jpg

http://commons.wikimedia.org/wiki/File:John_Stuart_Mill_by_London_Stereoscopic_Company,_c1870.jpg

CS.7.C.Performance.Math

7. Performance

• The challenge • Empirical analysis • Mathematical models • Doubling method • Familiar examples

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

CS.7.D.Performance.Doubling

Key questions and answers

29

Q. Is the running time of my program ~ a Nb seconds?

A. Yes, there's good chance of that. Might also have a (lgN )c factor.

Q. How do you know?

A. Programs are built from simple constructs (examples to follow).

A. Real-world data is also often simply structured.

A. Computer scientists have applied such models for decades 
 to many, many specific algorithms and applications.

A. Deep connections exist between such models and a wide 
 variety of discrete structures (including some programs).

Doubling method

30

Bottom line. It is often easy to meet the challenge of predicting performance.

Hypothesis. The running time of my program is TN ~ a Nb .

Consequence. As N increases, T2N/TN approaches 2b.

Doubling method

• Start with a moderate size. • Measure and record running time. • Double size. • Repeat while you can afford it. • Verify that ratios of running times approach 2b. • Predict by extrapolation:

multiply by 2b to estimate T2N and repeat.

N TN TN/TN/2

1000 0.5

2000 4 8

4000 31 7.75

8000 248 8

16000 248 × 8 = 1984 8

32000 248 × 82 = 15872 8

...

1024000 248 × 87 = 520093696 8

math model says running time

should be aN 3 23 = 8

3-sum example

H(�5)I

H5I = �IProof:

no need to calculate a (!)

Order of growth

31

Def. If a function f (N ) ~ ag(N ) we say that g(N ) is the order of growth of the function.

Hypothesis. Order of growth is a property of the algorithm, not the computer or the system.

Experimental validation

When we execute a program on a computer that is X times faster, we expect the program to be X times faster.

Explanation with mathematical model

Machine- and system-dependent features of the model are all constants.

Order of growth

32

Hypothesis. The order of growth of the running time of my program is N b (logN )c.

for (int i = 0; i < N; i++) for (int j = i+1; j < N; j++) for (int k = j+1; k < N; k++) ...

Cubic (N3)

for (int i = 0; i < N; i++) for (int j = i+1; j < N; j++) ...

Quadratic (N2) for (int i = 0; i < N; i++) ...

Linear (N)

public static void f(int N) { if (N == 0) return; ... f(N/2)... }

Logarithmic (log N)

public static void f(int N) { if (N == 0) return; ... f(N/2)... ... f(N/2)... for (int i = 0; i < N; i++) ... }

Linearithmic (N log N)

public static void f(int N) { if (N == 0) return; ... f(N-1)... ... f(N-1)... }

Exponential (2N)

Evidence. Known to be true for many, many programs with simple and similar structure.

Stay tuned for examples. ignore for practical purposes
 (infeasible for large N )

log instead of lg since constant base

not relevant

Order of growth classifications

33

order of growth slope of line in log-log plot (b)

factor for doubling

method (2b)description function

constant 1 0 1

logarithmic logN 0 1

linear N 1 2

linearithmic N logN 1 2

quadratic N 2 2 4

cubic N 3 3 8

If math model gives order of growth, use doubling method to validate 2b ratio.

If not, use doubling method and solve for b = lg(TN/TN/2) to estimate order of growth to be N b.

if input size doubles running time increases

by this factor Problem size

T im

e

1K 2K 4K 8K

2T

T

8T

log-log plot

4T

NN 2N 3

N logN

logN

1

math model may have log factor

Do the math

An important implication

34

A. You can't afford to use a quadratic algorithm (or worse) to address increasing problem sizes.

Moore's Law. Computer power increases by a roughly a factor of 2 every 2 years.

Q. My problem size also doubles every 2 years. How much do I need to spend to get my job done?

now 2 years

from now 4 years

from now 2M years from now

N $X $X $X ... $X

N logN $X $X $X ... $X

N 2 $X $2X $4X ... $2M X

N 3 $X $4X $16X ... $4M X

TN = aN3 running time today

T2N = (a / 2 ) (2N )3 running time in 2 years

= 4aN3

= 4TN

a very common situation: weather prediction, transaction processing, cryptography...

Meeting the challenge

35

Doubling experiments provide good insight on program performance

• Best practice to plan realistic experiments for debugging, anyway. • Having some idea about performance is better than having no idea. • Performance matters in many, many situations.

Hmmm. Still didn't finish. The experiment showed my program

to have a higher order of growth than I expected. I found and fixed the bug.

Time for some pizza!

Mine, too. I'm going to run a doubling experiment.

My program is taking too long to finish. I'm going

out to get a pizza.

Caveats

36

It is sometimes not so easy to meet the challenge of predicting performance.

Good news. Doubling method is robust in the face of many of these challenges.

a(2N)b(lg(2N))c

aNb(lg N)c = 2b

� 1 +

1 (lg N)

�c

� 2b

What happens when the leading term oscillates?

There are many other apps running on my computer !

Your machine model is too simple: My computer has parallel processors

and a cache.

Where’s the log factor?

We need more terms in the math model: N lg N + 100N ?

Your input model is too simple: My real input data is completely different.

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

Image sources

https://openclipart.org/detail/25617/astrid-graeber-adult-by-anonymous-25617

https://openclipart.org/detail/169320/girl-head-by-jza

https://openclipart.org/detail/191873/manga-girl---true-svg--by-j4p4n-191873

CS.7.D.Performance.Doubling

7. Performance

• The challenge • Empirical analysis • Mathematical models • Doubling hypothesis • Familiar examples

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

CS.7.E.Performance.Examples

Example: Gambler’s ruin simulation

39

public class Gambler { public static void main(String[] args) { int stake = Integer.parseInt(args[0]); int goal = Integer.parseInt(args[1]); int trials = Integer.parseInt(args[2]); double start = System.currentTimeMillis()/1000.0; int wins = 0; for (int i = 0; i < trials; i++) { int t = stake; while (t > 0 && t < goal) { if (Math.random() < 0.5) t++; else t--; } if (t == goal) wins++; } double now = System.currentTimeMillis()/1000.0; StdOut.print(wins + " wins of " + trials); StdOut.printf(" (%.0f seconds)\n", now - start); } }

Q. How long to compute chance of doubling 1 million dollars?

% java Gambler 1000 2000 100 53 wins of 100 (4 seconds) % java Gambler 2000 4000 100 52 wins of 100 (17 seconds) % java Gambler 4000 8000 100 55 wins of 100 (56 seconds) % java Gambler 8000 16000 100 53 wins of 100 (286 seconds)

N TN TN/TN/2 1000 4

2000 17 4.25

4000 56 3.29

8000 286 5.10

16000 1172 4.09

32000 1172 × 4 = 4688 4

...

1024000 1172 × 46 = 4800512 4

math model says order of growth

should be N 2

A. 4.8 million seconds (about 2 months). % java Gambler 16000 32000 100 48 wins of 100 (1172 seconds)

Pop quiz on performance

Q. Let TN be the running time of program Mystery and consider these experiments:

40

public class Mystery { public static void main(String[] args) { ... int N = Integer.parseInt(args[0]); ... } }

Q. Predict the running time for N = 64,000.

Q. Estimate the order of growth.

N TN (in seconds) TN/TN/2

1000 5

2000 20 4

4000 80 4

8000 320 4

Pop quiz on performance

Q. Let TN be the running time of program Mystery and consider these experiments.

41

Q. Predict the running time for N = 64,000.

Q. Estimate the order of growth.

N TN (in seconds) TN/TN/2

1000 5

2000 20 4

4000 80 4

8000 320 4

16000 320 × 4 = 1280 4

32000 1280 × 4 = 5120 4

64000 5120 × 4 = 20480 4

A. N 2, since lg 4 = 2.

A. 20480 seconds.

public class Mystery { public static void main(String[] args) { ... int N = Integer.parseInt(args[0]); ... } }

Another example: Coupon collector

42

public class Collector { public static void main(String[] args) { int N = Integer.parseInt(args[0]); int trials = Integer.parseInt(args[1]); int cardcnt = 0; double start = System.currentTimeMillis()/1000.0; for (int i = 0; i < trials; i++) { int valcnt = 0; boolean[] found = new boolean[N]; while (valcnt < N) { int val = (int) (StdRandom() * N); cardcnt++; if (!found[val]) { valcnt++; found[val] = true; } } } double now = System.currentTimeMillis()/1000.0; StdOut.printf(“%d %.0f “, N, N*Math.log(N) + .57721*N); StdOut.print(cardcnt/trials); StdOut.printf(" (%.0f seconds)\n", now - start); } }

Q. How long to simulate collecting 1 million coupons?

% java Collector 125000 100 125000 1539160 1518646 (7 seconds)

% java Collector 250000 100 250000 3251607 3173727 (14 seconds)

% java Collector 500000 100 500000 6849787 6772679 (31 seconds)

N TN TN/TN/2

125000 7

250000 14 2

500000 31 2.21

1000000 31 × 2 = 63 2

A. About 1 minute.

math model says order of growth

should be N logN

% java Collector 1000000 100 1000000 14392721 14368813 (66 seconds) ✓

might run out of memory trying for 1 billion

Analyzing typical memory requirements

43

type bytes

boolean 1

char 2

int 4

float 4

long 8

double 8

Primitive-type values

A bit is 0 or 1 and the basic unit of memory.

A byte is eight bits — the smallest addressable unit.

type bytes

int[N] 4N + 16

double[N] 8N + 16

int[N][N] 4N 2 + 20N + 16 ~ 4N 2

double[N][N] 8N 2 + 20N + 16 ~ 8N 2

String 2N + 40

System-supported data structures (typical)

Example. 2000-by-2000 double array uses ~32MB.

1 megabyte (MB) is about 1 million bytes.

1 gigabyte (GB) is about 1 billion bytes.

Note: not 1 bit

Summary

44

does it scale?yes

buy a new computer and solve bigger problems

learn a better algorithm

noA.

Q. What if it's not fast enough?

invent a better algorithm

no yes

Plenty of new algorithms awaiting discovery. Example: Solve 3-sum efficiently

found one?

Use computational experiments, mathematical analysis, and the scientific method 
 to learn whether your program might be useful to solve a large problem.

Case in point

45

does it scale?yes

buy a new computer and solve bigger problems learn a better algorithm

no

Not so long ago, 2 CS grad students had a program to index and rank the web (to enable search).

Lesson. Performance matters!

yes invent a better algorithm

not yet changing the world?

maybe found one?

yes

PageRank
 (see Section 1.6)

Larry Page and Sergey Brin

no

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

Image source

http://en.wikipedia.org/wiki/File:Google_page_brin.jpg

CS.7.E.Performance.Examples

C O M P U T E R S C I E N C E S E D G E W I C K / W A Y N E

P A R T I : P R O G R A M M I N G I N J AVA

h t t p : / / i n t r o c s . c s . p r i n c e t o n . e d u

R O B E R T S E D G E W I C K K E V I N W A Y N E

C om

pu ter Scien

ce

Computer Science

An Interdisciplinary Approach

7. Performance

Section 4.1