7/5/2021 Biometrics in Identity Management: Concepts to Applications

Fundamentals of Technical Evaluations

No biometric system is perfect; there are varying levels of how well a biometric system performs its basic task of recognizing users. Evaluating biometric systems requires analyzing a number of different variables such as mismatch error rates, throughput rates, reliability, consistency, cost, and target population. The task of determining which system is the best is made difficult for practitioners and system integrators, as they face a deluge of performance reports and press releases about the superiority of different systems. In order to interpret the results of these reports in a correct and unambiguous manner, a clear understanding of biometric system performance evaluation is necessary. This chapter introduces the reader to concepts of performance evaluation and system errors so that questions such as the following can be answered:

· Out of multiple biometric systems, which one performs the best?

· How well does a particular biometric system perform?

· Why is a particular biometric system performing poorly?

By the end of this chapter, the reader will become familiar with different types of biometric system errors, testing methodologies, performance assessment and visualization techniques, and best practices for planning an evaluation.

2.1 System Process Transactions

Before discussing various types of biometric system errors, it is important to become familiar with the concepts and terms that form the basis of performance calculations. As discussed in Chapter 1, a biometric system is capable of three processes: enrollment, verification, and identification. Each of these requires a user to interact with a sensor and provide a sample for further processing, which is called a presentation [1]. An operation that consists of a single presentation or a series of presentations that results in an enrollment or a matching score is called an attempt. A transaction is the completion of one or more attempts for the purpose of enrollment, verification, or identification [1]. It is extremely important to understand the concepts of presentations, attempts, and transactions for the correct calculation of error rates, as well as for differentiating system errors from process errors. To understand the differences between these terms, consider a scenario where a fingerprint recognition system
requires a user to provide three usable fingerprint images for enrollment and two usable fingerprint images for verification. Each fingerprint image submitted by the user during enrollment and verification is a presentation. A user provides three fingerprint images in three successive presentations, which are all accepted by the system, and thereby completes the enrollment process. This is a single attempt and a single transaction. During verification the user provides two fingerprint samples, which result in two similarity scores. The aggregate of these two scores is used to make a decision about the identity of the user. In this case two presentations, two attempts, and one transaction have taken place.

Two other important concepts are those of genuine attempts and imposter attempts. In a genuine attempt a user tries to match his or her sample against his or her own enrollment template. In an imposter attempt a user tries to match his or her sample against another user's enrollment template. The decision policies and the nature of the attempts determine genuine transactions and imposter transactions. The similarity scores generated from genuine transactions are called genuine match scores, and the similarity scores generated from imposter transactions are called imposter match scores. These terms and concepts are used throughout this chapter.

2.2 Types of Errors

A generic biometric system is composed of five subsystems: data acquisition, signal processing, data storage, matching, and decision. Performance assessment of a biometric system is a function of the errors generated by these five subsystems, which are discussed in this section.

2.2.1 Acquisition Errors

An acquisition error occurs when the data acquisition subsystem is unable to capture a representation of a user's biometric characteristics or when the signal processing subsystem is unable to extract features from the sample. An acquisition error depends on multiple factors such as user training, user interface and capture device form factor, environmental conditions, and sample quality threshold. A user who has not been given adequate training or is unsure about proper interaction with the device may provide an incomplete sample or exceed the system-defined time-out threshold. Interaction issues gain even more importance in unsupervised systems, as there is no operator to provide corrective feedback to the user. A classic example of an interaction issue is a user not knowing how long to keep his or her finger placed on the sensor. A common issue for first-time users is removing their fingers before the capture process has completed, which leads to an acquisition error. Environmental conditions also have a significant impact on biometric systems, which is discussed in later chapters. Sample quality assessment ensures that low-quality data is kept out of the system and the best possible sample is used for the different processes. Similar to the decision subsystem, quality assessment has a threshold
that determines the level of noise allowed in the sample. Biometric samples that do not pass this quality threshold are rejected, and users are asked to present another sample.

2.2.2 Matching Errors

The matching subsystem compares two biometric samples and produces a similarity score. The similarity score is compared to a matching threshold, and a decision is made about the source of the two samples. Based on this operation, two different types of errors can be committed:

· The inability to correctly assess whether two templates are from the same user;

· The inability to correctly assess whether two templates are from different users.

Matching errors can be attributed to several sources, some of which overlap with acquisition errors. Human factors, improper user training, and environmental factors impact the consistency of samples captured by the sensor, which in turn impacts the matching process. The matching of two samples is typically performed on samples that are captured with a time gap between them; for example, a user creates a fingerprint enrollment template today and then returns after a week and provides the same fingerprint for matching purposes. In the course of a week the user may have scarred his or her finger, which temporarily alters the fingerprint representation. This can lead to a mismatch error.

2.3 Performance Metrics

In this section the fundamental system errors discussed in Section 2.2 are described in terms of error rates and the proportion of users affected by the errors.

2.3.1 Failure to Enroll Rate (FTE)

The FTE is the proportion of user enrollment transactions that cannot be completed according to the enrollment policy [1]. The root cause of an FTE could be any of those described in Section 2.2.1, and the rate itself is also governed by administrative policy decisions. For example, consider an enrollment policy that allows the user three successive attempts to complete the process by presenting an acceptable biometric sample. If a user cannot present an acceptable biometric sample in three attempts, then it is considered a failure to complete the enrollment process. The number of attempts in this case is purely a policy decision.
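To make the policy dependence concrete, here is a minimal sketch that computes the FTE from a hypothetical enrollment log. It assumes each enrollment transaction is recorded as a list of per-attempt outcomes (True when the attempt produced an acceptable sample); the log format and the function names are illustrative assumptions, not part of any standard.

```python
from typing import List

def enrollment_succeeds(attempts: List[bool], max_attempts: int) -> bool:
    """A transaction succeeds if any of the first max_attempts attempts
    produced an acceptable sample."""
    return any(attempts[:max_attempts])

def failure_to_enroll_rate(transactions: List[List[bool]], max_attempts: int = 3) -> float:
    """Proportion of enrollment transactions that cannot be completed
    under a policy that allows max_attempts attempts."""
    if not transactions:
        raise ValueError("no enrollment transactions recorded")
    failures = sum(not enrollment_succeeds(t, max_attempts) for t in transactions)
    return failures / len(transactions)

# Hypothetical enrollment log: one inner list per user.
log = [
    [True],                       # accepted on the first attempt
    [False, False, True],         # accepted on the third attempt
    [False, False, False],        # never accepted
    [False, False, False, True],  # accepted only on a fourth attempt
]
print(failure_to_enroll_rate(log, max_attempts=3))  # 0.5
print(failure_to_enroll_rate(log, max_attempts=4))  # 0.25
```

Changing only the policy parameter halves the FTE for the same users, which is why the number of allowed attempts must be reported alongside the rate.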

2.3.2 Failure to Acquire Rate (FTA)

The FTA is the proportion of user attempts during identification or verification for which the system cannot acquire an appropriate sample [1]. Even though the root cause of FTA and FTE errors could be the same, they are differentiated based on the process during which the error occurs. This ensures that errors are not counted twice and are attributed to the appropriate process as opposed to a particular subsystem.

2.3.3 False Nonmatch Rate (FNMR)

The false nonmatch rate (FNMR) is calculated as the proportion of samples from genuine attempts that cannot be matched against the genuine users' own enrolled templates [2].

$$\mathrm{FNMR} = \frac{\text{Number of rejected genuine comparisons}}{\text{Total number of genuine comparisons}} \qquad (2.1)$$

2.3.4 False Match Rate (FMR)

The FMR is calculated as the proportion of samples from imposter attempts that are successfully matched against the enrolled templates of other users [2].

$$\mathrm{FMR} = \frac{\text{Number of accepted imposter comparisons}}{\text{Total number of imposter comparisons}} \qquad (2.2)$$

The root cause of both false match and false nonmatch errors can be traced to the matching errors described in Section 2.2.2. It should be noted that these error rates are calculated based on single attempts and not on verification or identification transactions. Verification and identification transaction errors are described in Sections 2.3.5 and 2.3.6.
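Equations (2.1) and (2.2) can be sketched directly from lists of attempt-level similarity scores. This is a minimal illustration that assumes a comparison is declared a match when its score is greater than or equal to the threshold; the scores are hypothetical.

```python
from typing import Sequence, Tuple

def fmr_fnmr(genuine: Sequence[float], imposter: Sequence[float],
             threshold: float) -> Tuple[float, float]:
    """Attempt-level (FMR, FNMR) per Eqs. (2.1) and (2.2).

    Assumes similarity scores where a comparison is declared a match
    when score >= threshold."""
    accepted_imposters = sum(s >= threshold for s in imposter)
    rejected_genuines = sum(s < threshold for s in genuine)
    return accepted_imposters / len(imposter), rejected_genuines / len(genuine)

# Hypothetical similarity scores from single comparisons
genuine_scores = [0.91, 0.85, 0.40, 0.77, 0.95]
imposter_scores = [0.10, 0.35, 0.62, 0.20, 0.05]
fmr, fnmr = fmr_fnmr(genuine_scores, imposter_scores, threshold=0.5)
print(f"FMR={fmr:.2f} FNMR={fnmr:.2f}")  # FMR=0.20 FNMR=0.20
```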

2.3.5 Verification Performance Metrics

During verification a user makes a claim to an identity, and the input sample is compared to the enrolled template associated with the claimed identity. The output of the verification transaction is either an acceptance or a rejection of the claim, and the errors based on these decisions are described next.

2.3.5.1 False Reject Rate (FRR)

The FRR is calculated as the proportion of verification transactions from genuine users that will be incorrectly rejected [2]. For single-attempt transactions, the FRR includes the FTA. Thus, the formula for calculating the FRR is given by

$$\mathrm{FRR} = \mathrm{FTA} + \mathrm{FNMR} \times (1 - \mathrm{FTA}) \qquad (2.3)$$

2.3.5.2 False Accept Rate (FAR)

The FAR is calculated as the proportion of verification transactions from imposters that will be incorrectly accepted [2]. For single-attempt transactions a false accept occurs only if a false match occurs without a failure-to-acquire error. Thus, the formula for calculating the FAR is given by

$$\mathrm{FAR} = \mathrm{FMR} \times (1 - \mathrm{FTA}) \qquad (2.4)$$
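Equations (2.3) and (2.4) translate into two one-line functions. The sketch below uses hypothetical attempt-level rates to show that a nonzero FTA raises the FRR above the FNMR while pulling the FAR slightly below the FMR.

```python
def false_reject_rate(fta: float, fnmr: float) -> float:
    """Eq. (2.3): single-attempt FRR; an acquisition failure counts as a reject."""
    return fta + fnmr * (1 - fta)

def false_accept_rate(fta: float, fmr: float) -> float:
    """Eq. (2.4): single-attempt FAR; an imposter is accepted only if the
    sample is acquired and then falsely matched."""
    return fmr * (1 - fta)

fta, fnmr, fmr = 0.02, 0.05, 0.001  # hypothetical attempt-level rates
print(round(false_reject_rate(fta, fnmr), 4))  # 0.069
print(round(false_accept_rate(fta, fmr), 6))   # 0.00098
```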

Both FAR and FRR calculations are based on transactions. As discussed earlier, a transaction is defined by a policy decision that takes into account the number of failed attempts allowed. Consider a biometric system that allows a user three failed verification attempts before rejecting the user. In the case of a failed verification transaction, three false nonmatch errors but only one false reject error will have taken place. This distinction is extremely important for the calculation of system errors such as FMR and FNMR and process errors such as FAR and FRR.
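The counting rule in the three-attempt example can be sketched as follows. The log format (one list of per-attempt match decisions per genuine transaction) is an assumption for illustration, and the user is assumed to be accepted at the first matching attempt.

```python
from typing import List, Tuple

def count_errors(transactions: List[List[bool]], max_attempts: int = 3) -> Tuple[int, int]:
    """Count attempt-level false nonmatches and transaction-level false
    rejects for genuine verification transactions.

    Each inner list holds per-attempt match decisions (True = matched).
    The user is accepted at the first matching attempt; a transaction with
    no match within max_attempts attempts counts as one false reject."""
    false_nonmatches = 0
    false_rejects = 0
    for attempts in transactions:
        accepted = False
        for matched in attempts[:max_attempts]:
            if matched:
                accepted = True
                break
            false_nonmatches += 1
        if not accepted:
            false_rejects += 1
    return false_nonmatches, false_rejects

genuine_log = [
    [False, False, False],  # three false nonmatches, one false reject
    [False, True],          # one false nonmatch, then accepted
    [True],                 # accepted on the first attempt
]
print(count_errors(genuine_log))  # (4, 1)
```

Here 4 of the 6 genuine attempts are false nonmatches, yet only 1 of the 3 transactions is a false reject, so attempt-level and transaction-level rates can differ substantially.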

2.3.5.3 Equal Error Rate (EER)

The EER is the error rate at the point where the FAR and FRR are equal; this point is also called the crossover error rate. A lower EER indicates better overall matching performance. The EER is often used for comparing multiple biometric systems, but it provides limited benefit for operational evaluation because it gives both error rates equal weighting, whereas an operational system is usually tuned to favor one error type over the other.
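A minimal way to estimate the EER from finite score sets is to sweep every observed score as a candidate threshold and keep the one where FAR and FRR are closest. This sketch assumes single-attempt transactions with no acquisition failures (so FAR and FRR reduce to FMR and FNMR) and acceptance when score >= threshold; reporting the mean of the two rates at that threshold is one convention among several, chosen here for simplicity.

```python
from typing import Sequence, Tuple

def rates_at(genuine: Sequence[float], imposter: Sequence[float], t: float) -> Tuple[float, float]:
    """(FAR, FRR) at threshold t under the assumptions stated above."""
    far = sum(s >= t for s in imposter) / len(imposter)
    frr = sum(s < t for s in genuine) / len(genuine)
    return far, frr

def approximate_eer(genuine: Sequence[float], imposter: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, EER estimate) at the observed score where |FAR - FRR|
    is smallest; the estimate is the mean of FAR and FRR at that threshold."""
    best = None
    for t in sorted(set(genuine) | set(imposter)):
        far, frr = rates_at(genuine, imposter, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    _, threshold, eer = best
    return threshold, eer

genuine = [0.9, 0.8, 0.7, 0.6, 0.3]    # hypothetical scores
imposter = [0.1, 0.2, 0.4, 0.65, 0.5]  # hypothetical scores
print(approximate_eer(genuine, imposter))  # (0.6, 0.2)
```

With real data the two rates rarely cross exactly at an observed score, so the gap at the chosen threshold is worth reporting alongside the estimate.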

2.3.5.4 Generalized Error Rates

The FAR and FRR do not take into account the FTE as part of the final error rate. The FAR and FRR of a system can therefore be positively influenced by increasing the FTE of a system and keeping problematic users out of the system. In cases where multiple systems need to be compared, a direct comparison of the FAR and FRR will not highlight the true differences in system capability. The generalization of error rates is then necessary for comparing multiple systems with different FTEs. Generalized error rates for the FAR and FRR, GFAR and GFRR, are calculated by combining enrollment, acquisition, and matching error rates. Every instance of an FTE is treated as a successful enrollment, but every subsequent verification attempt against or by that user is treated as an error.

The GFAR is described as the proportion of verification transactions from imposters that will be successfully enrolled and incorrectly accepted by the system.

$$\mathrm{GFAR} = (1 - \mathrm{FTE}) \times \mathrm{FAR} \qquad (2.5)$$

The GFRR is described as the proportion of verification transactions from genuine users who will be incorrectly rejected, will fail to enroll, or will not be successfully acquired for verification.

$$\mathrm{GFRR} = \mathrm{FTE} + (1 - \mathrm{FTE}) \times \mathrm{FRR} \qquad (2.6)$$
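The effect described above can be illustrated with two hypothetical systems: system B reports the lower FRR, but only because it keeps more users out at enrollment. The sketch below applies Eqs. (2.5) and (2.6); the class name and all rates are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class SystemRates:
    name: str
    fte: float
    far: float
    frr: float

    def gfar(self) -> float:
        # Eq. (2.5): only successfully enrolled users can be falsely accepted
        return (1 - self.fte) * self.far

    def gfrr(self) -> float:
        # Eq. (2.6): failures to enroll count as rejections of genuine users
        return self.fte + (1 - self.fte) * self.frr

systems = [
    SystemRates("A", fte=0.01, far=0.01, frr=0.05),  # hypothetical values
    SystemRates("B", fte=0.10, far=0.01, frr=0.03),  # hypothetical values
]
for s in systems:
    print(f"{s.name}: FRR={s.frr:.4f} GFRR={s.gfrr():.4f} GFAR={s.gfar():.4f}")
# A: FRR=0.0500 GFRR=0.0595 GFAR=0.0099
# B: FRR=0.0300 GFRR=0.1270 GFAR=0.0090
```

On FRR alone B looks better; once its higher FTE is folded in, its GFRR is roughly twice that of A.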

2.3.6 Identification Performance Metrics

As discussed in Chapter 1, the output of identification can be a candidate list of enrolled users who are the most similar to the input sample. The identification rank r of a user is the size of the smallest candidate list of which the user is a member. As a part of best practices, the total number of users enrolled in a database is also reported along with the rank of a user. For example, if the total number of enrolled users in a database is n, the user's rank is presented as rank r out of n.

Closed-set identification performance is described with respect to rank, as the input sample always belongs to an enrolled user. The identification rate at rank r is the probability that the enrolled user is a member of the candidate list of size r for an identification transaction. Cumulative match characteristic (CMC) curves are generally used to report closed-set identification performance, as described in Section 2.5.
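The rank and identification-rate definitions can be sketched as follows. Each hypothetical search is a mapping from enrolled identity to similarity score; ties with the true identity are counted against it, which is a pessimistic assumption rather than a rule from the text. The list returned by `identification_rates` is exactly the sequence of points a CMC curve plots.

```python
from typing import Dict, List

def rank_of_true_identity(scores: Dict[str, float], true_id: str) -> int:
    """1-based rank of the true enrollee when enrolled identities are ordered
    by descending similarity; ties are counted against the true identity
    (a pessimistic assumption)."""
    true_score = scores[true_id]
    return 1 + sum(1 for uid, s in scores.items() if uid != true_id and s >= true_score)

def identification_rates(searches: List[Dict[str, float]], truths: List[str],
                         max_rank: int) -> List[float]:
    """Closed-set identification rate at ranks 1..max_rank (the CMC values)."""
    ranks = [rank_of_true_identity(s, t) for s, t in zip(searches, truths)]
    return [sum(r <= k for r in ranks) / len(ranks) for k in range(1, max_rank + 1)]

# Hypothetical closed-set searches against n = 3 enrolled users
searches = [
    {"alice": 0.9, "bob": 0.4, "carol": 0.3},  # alice is rank 1 out of 3
    {"alice": 0.7, "bob": 0.6, "carol": 0.2},  # bob is rank 2 out of 3
    {"alice": 0.5, "bob": 0.8, "carol": 0.1},  # carol is rank 3 out of 3
]
truths = ["alice", "bob", "carol"]
print([round(x, 2) for x in identification_rates(searches, truths, max_rank=3)])
# [0.33, 0.67, 1.0]
```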

In open-set identification the input sample could potentially belong to a nonenrolled user. The false negative identification rate (FNIR) and false positive identification rate (FPIR) are metrics used to describe open-set identification performance. The FNIR is the proportion of identification transactions performed by enrolled users that return a candidate list of which they are not a member. The FPIR is the proportion of identification transactions performed by nonenrolled users that return a nonempty candidate list. For example, the FPIR is the probability of an innocent traveler appearing on a candidate list when his or her biometric is compared against a watchlist.

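$$\mathrm{FNIR} = \mathrm{FTA} + (1 - \mathrm{FTA}) \times \mathrm{FNMR} \qquad (2.7)$$

$$\mathrm{FPIR} = (1 - \mathrm{FTA}) \times \left[1 - (1 - \mathrm{FMR})^{n}\right] \qquad (2.8)$$

where n is the number of enrolled templates.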
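Equation (2.8) behaves as if the n comparisons in a search were independent, so the FPIR grows quickly with the size of the enrolled database even when the FMR is tiny. The sketch below evaluates Eqs. (2.7) and (2.8) for a hypothetical FMR of one in a million; the independence reading is an assumption noted in the code.

```python
def fnir(fta: float, fnmr: float) -> float:
    """Eq. (2.7)."""
    return fta + (1 - fta) * fnmr

def fpir(fta: float, fmr: float, n: int) -> float:
    """Eq. (2.8); treats the n comparisons in a search as independent (assumption)."""
    return (1 - fta) * (1 - (1 - fmr) ** n)

# Same hypothetical FMR, growing watchlist size n
for n in (100, 10_000, 1_000_000):
    print(n, round(fpir(0.0, 1e-6, n), 4))
# 100 0.0001
# 10000 0.01
# 1000000 0.6321
```

A matcher that looks nearly perfect in one-to-one verification can therefore place most innocent travelers on a candidate list when searched against a million-entry watchlist.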

2.4 Type I and Type II Errors

Statistical hypothesis testing forms the basis of evaluating biometric system matching errors. In statistics, hypothesis testing is conducted to test an assumption or a claim. For every assumption, a null hypothesis and a corresponding alternate hypothesis are generated. For a biometric system, the null hypothesis states that the two samples being compared belong to the same individual, and the alternate hypothesis states that the two samples being compared belong to different individuals:

Null hypothesis H0: samples belong to the same individual

Alternate hypothesis H1: samples do not belong to the same individual

If two samples that are being compared belong to the same individual but are determined to come from different individuals, then the null hypothesis is wrongly rejected and an error is committed. Such an error is called a Type I error. If the two samples that are being compared belong to different individuals but are determined to belong to the same individual, then the null hypothesis is wrongly accepted and an error is committed. Such an error is called a Type II error. A false reject or false nonmatch error is analogous to the Type I error, and a false accept or false match error is analogous to the Type II error. The genuine and imposter comparisons produce match scores
that can be represented by genuine and imposter score distributions. The ultimate goal in biometric system design is to create a system that produces genuine and imposter distributions that do not overlap, because the area of overlap signifies the total proportion of errors produced by the system. The threshold decides the proportion of errors that are categorized as false accepts and false rejects. Figure 2.1 illustrates this concept. By moving the threshold to the right, the proportion of false accepts decreases and the proportion of false rejects increases. This is the underlying principle that governs the trade-off between the security and the convenience of a biometric system, which is discussed in the next section.
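The threshold trade-off in Figure 2.1 can be reproduced with a simple analytical model. This sketch assumes, purely for illustration, that imposter and genuine similarity scores follow Gaussian distributions with the parameters shown, and uses the standard library's `statistics.NormalDist` to compute the tail areas on either side of the threshold.

```python
from statistics import NormalDist

# Hypothetical score model: Gaussian imposter and genuine distributions.
imposter = NormalDist(mu=0.3, sigma=0.1)
genuine = NormalDist(mu=0.7, sigma=0.1)

def false_accept_prob(t: float) -> float:
    """Imposter probability mass at or above the threshold."""
    return 1 - imposter.cdf(t)

def false_reject_prob(t: float) -> float:
    """Genuine probability mass below the threshold."""
    return genuine.cdf(t)

for t in (0.4, 0.5, 0.6):  # move the threshold to the right
    print(f"t={t}: false accepts={false_accept_prob(t):.4f}, "
          f"false rejects={false_reject_prob(t):.4f}")
# t=0.4: false accepts=0.1587, false rejects=0.0013
# t=0.5: false accepts=0.0228, false rejects=0.0228
# t=0.6: false accepts=0.0013, false rejects=0.1587
```

Because this toy model is symmetric, the two error proportions are equal at t = 0.5, the midpoint between the two means.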

2.5 Performance Curves

The previously discussed error rates provide the capability of assessing a single error category such as false accepts or false rejects. A biometric system is designed to maximize security and convenience. For a biometric system, security is the ability of the system to detect imposter transactions reliably and accurately, whereas convenience is the ability of the system to detect genuine transactions reliably and accurately. A more detailed and realistic analysis of performance therefore requires it to be viewed as a function of both security and convenience. Detection error trade-off (DET) curves and cumulative match characteristic (CMC) curves are two such analytical methods used predominantly in the biometrics domain for performance analysis. A DET curve plots the false match and false nonmatch rates on the x-axis and the y-axis, respectively, as the threshold varies. Thus, the DET curve graphically presents the trade-off between the two error rates. For each possible threshold value
the two error rates are calculated and plotted on the graph. Thus, each point (x, y) represents the combination of error rates at a particular threshold value. A DET curve can also plot the FAR and FRR on the x-axis and the y-axis as a function of the threshold, in which case it represents the trade-off in the transaction errors. Figure 2.2 illustrates a DET curve. The closer the curve is to the origin, the better the performance of the system. An ideal biometric system with no errors will have a DET curve that is a straight line on the x-axis or the y-axis. DET curves are often plotted on a logarithmic scale on both axes. This provides visual clarity to the graph at lower error rates, which in turn allows the viewer to see the trade-off in much greater detail.

Figure 2.1 Biometric system match score distributions. (Axes: match score versus probability density function; the imposter and genuine distributions are separated by a threshold, with the false reject and false accept regions marked.)

Figure 2.2 DET curve. (Axes: FMR versus FNMR, on logarithmic scales.)

Identification results are visually analyzed using a cumulative match characteristic (CMC) curve. The x-axis of a CMC curve represents all possible rank values, and the y-axis represents the probability of correct identification at each possible rank value. The CMC curve in Figure 2.3 graphically illustrates the identification accuracy of the biometric system against a variable-sized candidate list.

The analytical power of DET and CMC curves can be used to either analyze a single system or compare multiple systems. For instance, an administrator may already have decided which biometric system to use but is undecided about the optimal threshold for that system. In such a case the administrator can analyze a DET curve and determine the optimal threshold based on an acceptable trade-off between the two error rates. In another scenario an administrator might want to compare multiple biometric systems and decide which one to select for deployment. In such a case multiple DET curves can be superimposed on the same graph and compared using a common scale. In Figure 2.4, DET curves for systems S1, S2, and S3 are shown superimposed on the same graph. Figure 2.5 shows CMC curves for four systems superimposed for comparison purposes. A single CMC curve or multiple CMC curves can be analyzed using the same methodology.
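Selecting a threshold from a DET curve often amounts to fixing an acceptable FMR and reading off the resulting FNMR. The sketch below does this for two hypothetical systems (named S1 and S2 only for illustration), assuming acceptance when score >= threshold and considering only observed scores as candidate thresholds.

```python
from typing import Dict, List, Optional, Tuple

def operating_point(genuine: List[float], imposter: List[float],
                    max_fmr: float) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, FMR, FNMR) at the lowest observed-score threshold
    whose FMR does not exceed max_fmr, or None if no observed score qualifies.

    With acceptance at score >= threshold, FMR can only fall and FNMR can
    only rise as the threshold increases, so the lowest qualifying threshold
    gives the lowest FNMR among the candidates."""
    for t in sorted(set(genuine) | set(imposter)):
        fmr = sum(s >= t for s in imposter) / len(imposter)
        if fmr <= max_fmr:
            fnmr = sum(s < t for s in genuine) / len(genuine)
            return t, fmr, fnmr
    return None

# Hypothetical (genuine, imposter) score sets for two candidate systems
systems: Dict[str, Tuple[List[float], List[float]]] = {
    "S1": ([0.9, 0.8, 0.75, 0.6, 0.55], [0.1, 0.2, 0.3, 0.5, 0.7]),
    "S2": ([0.95, 0.9, 0.85, 0.5, 0.4], [0.1, 0.15, 0.2, 0.3, 0.45]),
}
for name, (gen, imp) in systems.items():
    print(name, operating_point(gen, imp, max_fmr=0.0))
# S1 (0.75, 0.0, 0.4)
# S2 (0.5, 0.0, 0.2)
```

At a zero-FMR target S2 is preferable, whereas at max_fmr=0.2 both toy systems reach an FNMR of 0.0; the ranking of systems can depend on the chosen operating point.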

Figure 2.3 CMC curve. (Axes: candidate list size, 1 to 200, versus probability of correct identification, 0% to 100%.)

Figure 2.4 Comparison of multiple DET curves. (Axes: FMR versus FNMR, logarithmic scales from 0.1% to 100%; curves for systems S1, S2, and S3, with EER points marked.)

Figure 2.5 Comparison of multiple CMC curves. (Axes: candidate list size, 1 to 200, versus probability of correct identification; curves for four systems.)

2.6 User-Specific Performance: Zoo Analysis

Doddington et al., as part of their research on speaker verification, devised an alternative method of analyzing biometric system performance [3]. Instead of concentrating on fundamental system error rates and presenting the performance of a system in terms of overall error rates, their approach analyzed the genuine match scores and imposter match scores of individual users in a given system. In Doddington's zoo model, all users of a biometric system are categorized into one of four animal groups based on the similarity scores they generate.

1. Lambs: Users belonging to this group are easy to imitate. Imposters matching against the templates of lambs generate a high similarity score. Lambs have an adverse impact on biometric systems, as they generate a high proportion of false matches.

2. Wolves: Users belonging to this group easily imitate users belonging to other groups. Wolves' templates generate a high similarity score when matched against other users' templates. They have an adverse impact on biometric systems, as they generate a high proportion of false matches.

3. Goats: Users belonging to this group are difficult to match against themselves. They have an adverse impact on biometric systems, as they generate a high proportion of false nonmatches.

4. Sheep: Users belonging to this group match well against themselves. A majority of the users of a biometric system belong to this group.

A zoo plot can be visualized by calculating the average genuine match score and the average imposter match score for each user of a biometric system and plotting them on a 2-D graph of genuine match scores versus imposter match scores. The resulting animal groups separate into four different quadrants of the zoo plot, as shown in Figure 2.6.
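The coordinates of a zoo plot can be computed from a list of comparisons as sketched below. The comparison tuple format is an assumption, and so is the choice to build a user's imposter average from comparisons in which that user's sample is the probe; some analyses also pool comparisons made against the user's template.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Comparison = Tuple[str, str, float]  # (probe user, enrolled user, similarity score)

def zoo_points(comparisons: Iterable[Comparison]) -> Dict[str, Tuple[float, float]]:
    """Map each user to (average genuine score, average imposter score).

    A comparison is genuine when the probe and enrolled identities match.
    A user's imposter scores here are those in which that user's sample is
    the probe (an assumption made for this sketch)."""
    genuine: Dict[str, List[float]] = defaultdict(list)
    imposter: Dict[str, List[float]] = defaultdict(list)
    for probe, enrolled, score in comparisons:
        target = genuine if probe == enrolled else imposter
        target[probe].append(score)
    # Only users with both kinds of scores can be placed on the plot.
    return {u: (mean(genuine[u]), mean(imposter[u]))
            for u in genuine if u in imposter}

comparisons = [
    ("u1", "u1", 480.0), ("u1", "u2", 12.0),
    ("u2", "u2", 150.0), ("u2", "u1", 8.0), ("u2", "u3", 10.0),
    ("u3", "u3", 420.0), ("u3", "u1", 55.0),
]
points = zoo_points(comparisons)
print(points)  # {'u1': (480.0, 12.0), 'u2': (150.0, 9.0), 'u3': (420.0, 55.0)}
```

Each (x, y) pair can then be scattered with average genuine score on the x-axis and average imposter score on the y-axis, matching the axes of Figure 2.6.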

Doddington's zoo analysis uses either genuine match scores or imposter match scores to identify issues and suggest improvements. Dunstone and Yager added four more animals to the original zoo; these groups use the relationship between genuine match scores and imposter match scores for categorizing users [4]. These four animal groups are:

1. Chameleons: Users belonging to this group always appear similar to themselves and to other users. They generate a high similarity score irrespective of whom they are matched against, thereby generating false matches.

2. Phantoms: Users belonging to this group do not appear similar to themselves or to other users. They generate a low similarity score irrespective of whom they are matched against, thereby generating false nonmatches.

3. Worms: Users belonging to this group appear similar to other users but do not appear similar to themselves. They generate a low similarity score when matched against their own templates and a high similarity score when matched against users of other groups. They generate both false matches and false nonmatches and represent the most problematic user group of the biometric system.

4. Doves: Users belonging to this group appear similar to themselves but do not appear similar to other users. They generate high similarity scores when matched against their own templates and low similarity scores when matched against users of other groups. Users of this group are the most ideal users of the biometric system.

Figure 2.6 Doddington's zoo analysis. (Axes: average genuine match score, 0 to 500, versus average imposter match score, 0 to 60; the region of wolves and lambs is labeled.)
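The four Dunstone and Yager groups can be sketched as corner regions of the zoo plot. This illustration assumes that "high" means at or above the population's upper quartile and "low" means at or below its lower quartile; those cut-offs are an assumption for the sketch rather than a definition taken from this text. It uses `statistics.quantiles` from the Python standard library.

```python
from statistics import quantiles
from typing import Dict, Optional, Tuple

def dunstone_yager_labels(points: Dict[str, Tuple[float, float]]) -> Dict[str, Optional[str]]:
    """Label users from (average genuine, average imposter) points.

    Assumption for illustration: "high" means at or above the upper quartile
    and "low" means at or below the lower quartile of the population.
    Users that fall in no corner get None."""
    g_q1, _, g_q3 = quantiles([g for g, _ in points.values()], n=4)
    i_q1, _, i_q3 = quantiles([i for _, i in points.values()], n=4)
    labels: Dict[str, Optional[str]] = {}
    for user, (g, i) in points.items():
        high_g, low_g = g >= g_q3, g <= g_q1
        high_i, low_i = i >= i_q3, i <= i_q1
        if high_g and high_i:
            labels[user] = "chameleon"  # similar to self and to others
        elif low_g and low_i:
            labels[user] = "phantom"    # similar to no one
        elif low_g and high_i:
            labels[user] = "worm"       # similar to others, not to self
        elif high_g and low_i:
            labels[user] = "dove"       # similar to self only
        else:
            labels[user] = None
    return labels

# Hypothetical (average genuine, average imposter) scores for eight users
points = {
    "u1": (80.0, 80.0), "u2": (10.0, 5.0), "u3": (20.0, 90.0), "u4": (70.0, 1.0),
    "u5": (30.0, 30.0), "u6": (40.0, 40.0), "u7": (50.0, 50.0), "u8": (60.0, 60.0),
}
labels = dunstone_yager_labels(points)
print({u: lab for u, lab in labels.items() if lab})
# {'u1': 'chameleon', 'u2': 'phantom', 'u3': 'worm', 'u4': 'dove'}
```

Users left unlabeled sit in the middle of the population; isolating the labeled minority is what allows targeted measures instead of system-wide threshold changes.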

Taken in its entirety, zoo analysis provides a framework for isolating user-specific issues, which is not possible using global error rate analysis. For example, if a specific user or group of users is contributing disproportionately to the error rates, zoo analysis can identify that group of users, and preventive measures can be taken
that affect only that group of users and not all users of a system. This is the typical problem scenario that most biometric system administrators have to address in an operational system.

2.7 Evaluation Methodologies

Just as software and hardware engineering have specific testing methodologies based on specific objectives, biometric system performance is evaluated using three different testing methodologies: technology evaluation, scenario evaluation, and operational evaluation [5]. Table 2.1 summarizes the differences among these three methods.

One very important factor that sets the evaluation of biometric systems apart from the evaluation of other security systems, such as cryptographic systems, is the need to use biometric data collected from humans. Although biometric data modeling has made great strides in the last decade, a true evaluation of a biometric system still requires data collected from live humans. This increases the complexity and the financial cost of conducting biometric system evaluations.

Table 2.1 Comparison of Technology, Scenario, and Operational Evaluation

| Characteristic | Technology | Scenario | Operational |
|---|---|---|---|
| Scope of test | Individual subsystems, combination of subsystems | Complete system | Complete system |
| Data processing | Off-line | Online or off-line | Online |
| Repeatable | Possible | Depends on level of control | Not possible |
| Comparable | Possible | Depends on level of control | Not possible |
| Ground truth | Known | Known | Generally unknown |
| Real-time feedback | Not possible | Possible | Possible |
| Test supervisor | Required | Application dependent | Application dependent |
| Level of control (environment, users) | High | Medium | Low |
| Performance metrics | FTA, FTE, FMR, FNMR, FAR, FRR | FTA, FTE, FMR, FNMR, FAR, FRR, throughput | Throughput; other metrics require ground truth |

Source: [1].

2.7.1 Technology Evaluation

The goal of technology evaluation is to evaluate any of the biometric subsystems for a target application. Typically, a technology evaluation is conducted to compare multiple biometric subsystems from different vendors. A well-known example of a technology evaluation is the Fingerprint Verification Competition (FVC) [6]. The
goal of the FVC is to test multiple fingerprint recognition algorithms on the same set of fingerprints collected from specific fingerprint sensors. Such an evaluation provides several benefits. Data collection and data processing are conducted separately, thus providing a complete dataset for cross comparisons. The ground truth of all the data is known, so the true identity is linked with every biometric sample. Since there is no requirement for such a system to operate in real time, it is well suited to laboratory research. Technology evaluation can also be used to evaluate changes across multiple versions of the same algorithm.
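Because ground truth is known in a technology evaluation, every pair of samples in the dataset can be labeled as a genuine or an imposter comparison before any matcher is run. The sketch below shows a generic all-pairs split over a hypothetical labeled dataset; it is not the comparison protocol prescribed by the FVC or any other competition.

```python
from itertools import combinations
from typing import List, Tuple

Sample = Tuple[str, str]  # (sample id, subject id): ground truth is known

def comparison_pairs(samples: List[Sample]) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split every unordered pair of samples into genuine and imposter
    comparisons using the known subject labels."""
    genuine: List[Tuple[str, str]] = []
    imposter: List[Tuple[str, str]] = []
    for (a, subj_a), (b, subj_b) in combinations(samples, 2):
        (genuine if subj_a == subj_b else imposter).append((a, b))
    return genuine, imposter

dataset = [("s1_a", "s1"), ("s1_b", "s1"), ("s2_a", "s2"), ("s2_b", "s2"), ("s3_a", "s3")]
genuine_pairs, imposter_pairs = comparison_pairs(dataset)
print(len(genuine_pairs), len(imposter_pairs))  # 2 8
print(genuine_pairs)  # [('s1_a', 's1_b'), ('s2_a', 's2_b')]
```

Each candidate algorithm can then be run on exactly the same pairs, which is what makes technology evaluations repeatable and comparable.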

2.7.2 Scenario Evaluation

The goal of a scenario evaluation is to determine the overall performance of a biometric system in a simulated application that is representative of the real-world application. A scenario evaluation includes the entire biometric system, from the data acquisition subsystem to the decision subsystem. Scenario evaluations are conducted in real time, although certain tests separate data collection from data processing. The key benefit of a scenario evaluation is the inclusion of human interaction variability and the ability to conduct enrollment, verification, and identification transactions. As discussed earlier, transactions are a product of policy decisions, and a scenario evaluation provides a method for evaluating different policy decisions. If all the transactions are completed in real time, throughput analysis is also possible.

A scenario evaluation does not allow a true comparison of multiple systems, as the biometric data used for each evaluation is not collected from the same sensor. Environmental conditions, human interaction issues, and subject demographic differences have an impact on the eventual performance of the system, and it is impossible to replicate all of these among multiple systems. A scenario evaluation is useful for understanding the impact of real-world factors on system performance without having to deploy the system. As in technology evaluations, the ground truth of the data is known, which allows a deeper analysis of the matching errors.

2.7.3 Operational Evaluation

The goal of an operational evaluation is to determine the performance of a complete biometric system deployed in a real-world application that is being used by a particular user population. Such an evaluation can be viewed as a monitoring and maintenance activity. Results of operational evaluations are nonrepeatable, so a comparison of multiple operational evaluations is impractical. For example, a comparison of operational evaluations of the same face recognition system at an airport and at a seaport will provide different results, as the environmental effects and the demographics of the users for the two applications are completely different. In operational evaluations the ground truth cannot be determined, which limits the
amount of analysis that can be conducted on matching errors. For example, the system being evaluated could have been functioning for a long time, and in such
a case the enrollments have already been performed before the evaluation commenced. Also, evaluators do not have any control over the target population, which gives them less control over the manipulation of specific test variables. Operational evaluations can be extremely useful for conducting user experience and throughput analysis. Users are more likely to provide an honest opinion of their experiences in an operational evaluation than in a laboratory or simulated test. Throughput is also more realistically determined when the biometric system is integrated into the operational infrastructure. For example, the throughput of face recognition integrated with a turnstile system will be different from the throughput of face recognition used as part of a network logon.

2.8 Design of Evaluation

Previous sections of this chapter discussed performance metrics and evaluation methodologies as they relate to biometric systems. This information is necessary for designing effective evaluations and reporting their results. This section is important even for readers who might never conduct an evaluation, as it discusses the process behind an evaluation and gives readers the skills to ask pertinent questions when analyzing other system evaluation reports.

One should think of a biometric system evaluation as a scientific experiment in which several carefully chosen inputs are applied to a system and the effect of these inputs on the output is observed. As with any other experiment, a couple of key questions need to be answered before starting an evaluation:

· What is/are the main factor(s) being evaluated?

· Which evaluation methodology (technology, scenario, or operational) is the most suitable to achieve the overall objective of the evaluation?

The answers to these questions will determine specific details such as the number of genuine and imposter transactions, the number of subjects, data collection procedures, calculation of performance metrics, and the information to be collected from the system.
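The number of genuine and imposter comparisons follows directly from the number of subjects and samples per subject. A minimal sketch, assuming every sample is compared with every other sample exactly once (unordered pairs, no self-comparisons) and each subject contributes the same number of samples; real protocols often restrict imposter comparisons, so these counts are upper bounds. The function name `comparison_counts` is hypothetical.

```python
def comparison_counts(subjects: int, samples_per_subject: int) -> tuple[int, int]:
    """Return (genuine, imposter) counts for all-against-all unordered comparisons."""
    if subjects < 1 or samples_per_subject < 1:
        raise ValueError("need at least one subject and one sample per subject")
    n, k = subjects, samples_per_subject
    genuine = n * k * (k - 1) // 2        # pairs of samples from the same subject
    imposter = n * (n - 1) // 2 * k * k   # pairs of samples from different subjects
    return genuine, imposter


genuine, imposter = comparison_counts(100, 3)
print(genuine, imposter)  # 300 44550
```

The two counts always add up to the total number of sample pairs, Nk(Nk - 1)/2, which shows why imposter comparisons vastly outnumber genuine ones as the subject pool grows.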

A well-designed evaluation requires determining which variables have an impact on performance and how to control them. These factors are generally categorized into five groups [7]. Readers familiar with the design of experiments will notice a direct correspondence with the following groups:

1. Factors that are observed;

2. Factors that are manipulated to see what effect they have on the observed factors;

3. Factors that are controlled so that they have a negligible effect on the evaluation;

4. Factors that are randomized to minimize their impact on the evaluation;

5. Factors that have a negligible effect and are ignored.
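One way to make the five-way categorization explicit is to record it in the evaluation plan and act on it, for example by randomizing presentation order for a factor placed in group 4. The sketch below is illustrative: the factor names, the groups they are assigned to, and the seeded per-subject shuffle used for randomization are hypothetical choices, not assignments made in the text.

```python
import random
from enum import Enum


class FactorRole(Enum):
    OBSERVED = 1     # measured outcome, e.g. match scores
    MANIPULATED = 2  # deliberately varied to study its effect
    CONTROLLED = 3   # held fixed so its effect is negligible
    RANDOMIZED = 4   # randomized so its effect is spread evenly
    IGNORED = 5      # judged to have a negligible effect


# Hypothetical plan for a fingerprint scenario evaluation.
plan = {
    "match_score": FactorRole.OBSERVED,
    "sensor_model": FactorRole.MANIPULATED,
    "ambient_lighting": FactorRole.CONTROLLED,
    "presentation_order": FactorRole.RANDOMIZED,
    "wall_color": FactorRole.IGNORED,
}


def presentation_schedule(subject_ids, sensors, seed):
    """Give each subject an independently shuffled sensor order.

    Randomizing the order keeps learning or fatigue effects from
    consistently favoring whichever sensor is used first.
    """
    rng = random.Random(seed)  # seeded so the schedule can be reproduced
    schedule = {}
    for sid in subject_ids:
        order = list(sensors)
        rng.shuffle(order)
        schedule[sid] = order
    return schedule


randomized = [f for f, role in plan.items() if role is FactorRole.RANDOMIZED]
print(randomized)  # ['presentation_order']
print(presentation_schedule(["S01", "S02", "S03"], ["A", "B"], seed=7))
```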


Determining how to categorize the various factors into these groups requires expertise in data collection, design of experiments, statistical evaluation, and, most importantly, the particular biometric technology under observation. ISO/IEC 19795-1 [1] lists several factors that have an impact on performance, although this list is not exhaustive, nor does each factor affect all biometric technologies. Table 2.2 lists the general factors that have an impact on biometric system performance.

2.8.1 Sample Quality

Sample quality assessment is an extremely important step in biometric system evaluations. Sample quality is affected by the inherent characteristics of the user, the system deployment characteristics, the deployment environment, the interaction between the user and the system, or a combination of these. Quality is an extremely subjective concept, and its meaning depends on the context in which it is used. Within the biometrics domain, the term quality is used in three different contexts [8]:

1. Fidelity reflects the accuracy of a sample's representation of the original source.

2. Character reflects the expression of the inherent features of the source.

3. Utility reflects the observed or predicted positive or negative contribution of the biometric sample to the overall performance of a biometric system.

The ISO/IEC 29794-1:2009 standard [8] specifies a modality-independent quality derivation and interpretation framework. The standard prescribes that a quality score should be indicative of performance metrics such as the FTA, FTE, FMR, and FNMR. Utility, as described earlier, is a function of both fidelity and character and is predictive of system performance. Extensive research is being conducted in this area to identify covariates that can be used to calculate the utility of a biometric sample.
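A common way to check whether quality scores are indicative of FNMR is an error-versus-reject analysis: discard the lowest-quality fraction of genuine comparisons and recompute FNMR on the rest. If quality predicts utility, FNMR should fall as more low-quality comparisons are rejected. The sketch below assumes each genuine comparison carries a similarity score (higher means more similar) and a single quality value (for example, the lower of the two sample qualities), with a fixed decision threshold; the function names and the data are made up for illustration.

```python
def fnmr(scores, threshold):
    """Fraction of genuine similarity scores that fall below the threshold."""
    if not scores:
        return float("nan")  # undefined when no comparisons remain
    return sum(s < threshold for s in scores) / len(scores)


def error_vs_reject(genuine, threshold, reject_fractions):
    """FNMR after rejecting the lowest-quality fraction of genuine comparisons.

    genuine: list of (similarity_score, quality) pairs, one per comparison;
    a higher quality value is assumed to mean a better sample.
    """
    ranked = sorted(genuine, key=lambda pair: pair[1])  # lowest quality first
    curve = []
    for frac in reject_fractions:
        kept = ranked[round(frac * len(ranked)):]  # drop the worst `frac` share
        curve.append((frac, fnmr([score for score, _ in kept], threshold)))
    return curve


# Illustrative (made-up) genuine comparisons: (similarity score, quality 0-100).
genuine = [(0.30, 10), (0.45, 20), (0.40, 25), (0.70, 40), (0.80, 55),
           (0.65, 60), (0.90, 70), (0.85, 80), (0.95, 90), (0.75, 95)]

for frac, rate in error_vs_reject(genuine, threshold=0.5, reject_fractions=[0.0, 0.1, 0.2, 0.3]):
    print(f"reject {frac:.0%}: FNMR = {rate:.3f}")
```

With these made-up values, FNMR drops from 0.300 with no rejection to 0.000 after rejecting the lowest-quality 30%, which is the behavior expected of a quality score that reflects utility.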

A discussion of the different variables that have an impact on each biometric technology is given in the corresponding technology chapter (Chapters 3 through 10), along with a discussion of their impact on recognition performance.

Table 2.2 Factors Affecting Biometric System Performance

| Factor | Examples/Description |
|---|---|
| Population demographics | Age, gender, occupation, ethnic origin |
| Application | Time between transactions, identification or verification process, number of attempts per transaction, supervised or unsupervised system |
| User physiology | Physical appearance such as height, baldness, iris color, and skin tone; temporary conditions such as a cold or sore throat |
| User interface | Physical or nonphysical interaction, auditory or visual feedback, number of interactions for a complete transaction |
| Environmental influences | Temperature, ambient lighting, humidity, rain, and snow |
| Data acquisition sensors | Dirt, residue, obstruction, form factor |
| Database size | Number of individuals in the enrolled database |

Source: [2].
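Because factors such as those in Table 2.2 can shift error rates for part of the population, evaluators often report FMR and FNMR separately for each level of a factor rather than only in aggregate. A minimal sketch, assuming each comparison record carries a ground-truth label (genuine or imposter), a similarity score, and a hypothetical group attribute such as an age band; the threshold, group names, and records are illustrative only.

```python
from collections import defaultdict


def error_rates_by_group(records, threshold):
    """Per-group FMR and FNMR at a fixed decision threshold.

    records: iterable of (group, is_genuine, similarity_score) tuples.
    A comparison is declared a match when score >= threshold.
    """
    counts = defaultdict(lambda: {"gen": 0, "fnm": 0, "imp": 0, "fm": 0})
    for group, is_genuine, score in records:
        c = counts[group]
        if is_genuine:
            c["gen"] += 1
            c["fnm"] += score < threshold   # genuine pair wrongly rejected
        else:
            c["imp"] += 1
            c["fm"] += score >= threshold   # imposter pair wrongly accepted
    return {
        group: {
            "FNMR": c["fnm"] / c["gen"] if c["gen"] else float("nan"),
            "FMR": c["fm"] / c["imp"] if c["imp"] else float("nan"),
        }
        for group, c in counts.items()
    }


# Illustrative records: (age band, is_genuine, similarity score).
records = [
    ("18-30", True, 0.82), ("18-30", True, 0.74), ("18-30", True, 0.41),
    ("18-30", False, 0.22), ("18-30", False, 0.55),
    ("60+", True, 0.48), ("60+", True, 0.39), ("60+", True, 0.77),
    ("60+", False, 0.18), ("60+", False, 0.30),
]

for group, rates in error_rates_by_group(records, threshold=0.5).items():
    print(f"{group}: FNMR = {rates['FNMR']:.2f}, FMR = {rates['FMR']:.2f}")
```

With only a handful of comparisons per group the rates are not statistically meaningful; a real evaluation would need enough genuine and imposter comparisons in every group to support the comparison.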


2.9 Reporting Biometric Evaluations

Performance evaluations are heavily dependent on the type of biometric system, the application context, the deployment environment, the target population, the number of users participating in the test, and other factors. A properly formulated report provides the reader with accurate information about the evaluation so that it can be interpreted correctly and without ambiguity. Table 2.3 lists the areas that are critical to a comprehensive performance evaluation report.
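When drafting or reviewing a report, the areas in Table 2.3 can serve as a checklist. The sketch below is a hypothetical structure, not a format defined by ISO/IEC 19795-1 or by this chapter: the class `EvaluationReport` stores each area as a text field, rejects evaluation types other than the three methodologies discussed in Section 2.7, and lists any areas left empty.

```python
from dataclasses import dataclass, fields

EVALUATION_TYPES = {"technology", "scenario", "operational"}


@dataclass
class EvaluationReport:
    """Hypothetical report record with one text field per area in Table 2.3."""
    system_tested: str = ""
    evaluation_type: str = ""
    design_details: str = ""
    test_environment: str = ""
    crew_size: str = ""
    crew_demographics: str = ""
    thresholds: str = ""
    transaction_policy: str = ""
    performance_metrics: str = ""
    deviations: str = ""  # write "none" explicitly rather than leaving it blank

    def __post_init__(self):
        # Only the three evaluation methodologies are accepted.
        if self.evaluation_type and self.evaluation_type not in EVALUATION_TYPES:
            raise ValueError(f"unknown evaluation type: {self.evaluation_type!r}")

    def missing_areas(self):
        """Return the names of areas that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


draft = EvaluationReport(
    system_tested="Fingerprint sensor and matcher (full system)",
    evaluation_type="scenario",
    crew_size="120 participants",
)
print(draft.missing_areas())
```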

2.10 Summary

This chapter introduces readers to various topics related to the performance evaluation of biometric systems, including fundamental system errors and error rates, transactional error rates, graphical techniques for analyzing these error rates, and system evaluation methodologies. Performance evaluations are critical for successful deployments: they give decision-makers the information they need to make educated procurement decisions, allow system administrators to fine-tune performance for a specific application context and predict future performance, and help vendors identify performance issues that need to be addressed. The goal of this chapter is to lay a solid foundation for conducting performance evaluations of specific biometric modalities. Biometric technologies have advanced significantly in the last decade, and their use in specific applications will increase in the near future. The ability to conduct meaningful comparisons and assessments will be crucial to successful deployments and to increasing biometric adoption.

Table 2.3 Components of an Evaluation Report

| Area | Description |
|---|---|
| Details of system tested | Entire biometric system or a particular component of the entire system |
| Type of evaluation | Technology, scenario, or operational |
| Design details | Explanation of the data collection protocol and categorization of the various factors as described in Section 2.8 |
| Details of test environment | Physical layout of the collection area, environmental conditions, time of year |
| Crew size | Number of individuals who participated in the data collection, along with the breakdown of genuine and imposter groups |
| Demographics of crew | Breakdown of age, gender, occupation, and other relevant information |
| Thresholds | Sample quality and matching thresholds |
| Transaction policy | Number of transactions for enrollment, verification, or identification, along with the time difference between presentations and attempts |
| Performance metrics | Justification for calculating specific performance metrics and the formulas used to calculate them |
| Deviations | Any abnormalities and outliers; for example, if a particular user was discarded from the evaluation, the reason should be stated in the report |

Source: [2].