Hardware Trojans in FPGA Device IP: Solutions Through Evolutionary Computation

Zachary Collins

A Thesis In Partial Satisfaction

of the Requirements for the Degree of

Master of Science

Computer Engineering

in the

Graduate Division

of the

University of Cincinnati College of Engineering and Applied Science

Committee in Charge:

Professor Rashmi Jha, Chair

Professor Anca Ralescu

Dr. David Kapp

Spring 2019

Abstract

Evolutionary Removal of Hardware Trojans in FPGA IP

Zachary Collins

Master of Science in Computer Engineering

University of Cincinnati

College of Engineering and Applied Science

Prof. Rashmi Jha, Chair

As the hardware supply chain continues to globalize, hardware designers are becoming

more concerned about the security of their designs. Because FPGAs are accessible to nearly

anybody, and few organizations design and manufacture them, the FPGA supply chain is

particularly complex. In recent years, hardware designers and the academic community

have begun considering the possibility of hardware Trojans, malicious modifications to

hardware circuitry, in FPGA IP - the code used to program an FPGA device. As more and

more hardware designers incorporate somewhat unverified IP purchased from 3rd-party

hardware vendors into their designs, the need for a comprehensive model of FPGA security

increases.

Many design strategies for detecting and tolerating Trojans in FPGA devices have been

proposed. Many of these strategies focus on catching Trojans at test-time. This is undesir-

able, as Trojans often employ complex techniques to hide themselves during testing. Some

Trojan tolerance systems have been suggested, but no system exists that will allow FPGA

systems to completely mitigate the effects of, or remove, all types of Trojans from FPGA

IP.

In order to develop a comprehensive system for FPGA protection, it is important to

understand what types of threats might exist in FPGA IP. Trojans can have a variety of

different activation mechanisms and effects, and it is important to thoroughly understand

all of them in order to ensure security of FPGA designs. There exist some taxonomies, or

classifications, of the Trojans that may exist in FPGA hardware, rather than in the IP. Some

of these taxonomies acknowledge the existence of Trojans in FPGA IP, but none provide a

classification of the threats Trojans in IP can pose.

This work presents a comprehensive taxonomy of the payloads (effects) of Trojans in

FPGA IP. This taxonomy is built on the taxonomies of hardware Trojans, and other work

in the field of FPGA IP viruses and Trojans. The goal of this taxonomy is to concisely

categorize and summarize the different threats hardware designers face when integrating

3rd-party IP into their designs, and to provide an analysis of existing mitigation strategies

and their effectiveness against the various types of Trojans. This work also examines which

Trojans that may exist in FPGA IP are relatively unaffected by existing Trojan detection

and tolerance schemes. It is important for hardware designers to be able to design systems

that can tolerate any type of Trojan, not just a small subset of them.

Finally, this work presents a novel Trojan tolerance strategy using genetic program-

ming, a type of biologically-inspired computation. Genetic programming, inspired by ge-

netic crossover and mutation in biological organisms, can be used to modify software and

guide it toward a better solution by iteratively improving on the design using a variety of

biological operations. Because 3rd-party IP is often delivered as hardware description language

(HDL) code, genetic programming is uniquely adept at removing Trojans when they are

detected in the code. Results show that genetic programming can be used to remove a va-

riety of Trojans from FPGA IP. This Trojan tolerance scheme can be used to repair FPGAs

at run-time, without human intervention. This effect is desirable because many FPGAs are

deployed in aerospace and other uptime-sensitive fields, where having to bring a device

down may endanger lives or incur large monetary costs.

understand, v.: To reach a point, in your investigation of some subject, at which you cease

to examine what is really present, and operate on the basis of your own internal model

instead.

/usr/bin/fortune

Acknowledgements

I would first like to thank my thesis advisor, Professor Rashmi Jha, for supporting me

and providing direction in the time I’ve spent working on my research and writing. I would

similarly like to thank Professor Anca Ralescu and Dr. David Kapp for their valuable

advice and feedback on my work, and for serving on my thesis committee.

I would like to thank my labmates, particularly Michael Santacroce, for watching my

work’s progress and providing suggestions for improvement week by week, and Michael

for supporting me through writing this thesis.

Finally, I would like to express my sincere gratitude to my parents, and to Laura Tebben,

for providing me their encouragement throughout my years of study and the time I have

spent on this work. Finishing my thesis would never have been possible without them.

TABLE OF CONTENTS

Acknowledgments
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: Hardware Trojans
    2.1 Taxonomies
    2.2 Taxonomy of Trojans in FPGA IP
        2.2.1 Trojans that Cause Malfunction
        2.2.2 Trojans that Prevent FPGA Operation
        2.2.3 Trojans that Inject Faults
        2.2.4 Trojans that Cause Side Effects
        2.2.5 Trojans that Leak Information
        2.2.6 Trojans that Waste FPGA Resources
        2.2.7 Trojans that Introduce Vulnerabilities
    2.3 Existing Trojan Mitigation Strategies and FPGA IP
        2.3.1 Trojan Detection Techniques
        2.3.2 Trojan Tolerance Techniques
Chapter 3: Evolutionary Algorithms and Evolvable Hardware
    3.1 Evolutionary and Genetic Programming
    3.2 Evolvable Hardware In FPGAs
    3.3 Genetic Programming-based Evolvable Hardware
        3.3.1 Background and Justification
        3.3.2 Past Work
        3.3.3 Preliminary Results
    3.4 Trust-Oriented Applications of Evolvable Hardware
        3.4.1 Applications to Trojans in FPGA IP
        3.4.2 Applications to Trojans in FPGA Hardware
Chapter 4: Genetic Programming-based Evolvable Hardware for FPGA Security
    4.1 System Design
    4.2 Experimental Approach
    4.3 Results
Chapter 5: Conclusion
References

LIST OF TABLES

3.1 Fault Tolerance in Evolved Hardware - Canham et al.
3.2 Fault Tolerance Systems - Larchev et al.

LIST OF FIGURES

1.1 FPGA Supply Chain
2.1 Taxonomy Proposed by Wang et al. [8]
2.2 Taxonomy Proposed by Chakraborty et al. [10]
2.3 Taxonomy Proposed by Mal-Sarkar et al. [1]
2.4 Taxonomy of Trojans in FPGA IP
2.5 Triple Modular Redundant FPGA System
3.1 Sample Abstract Syntax Tree
3.2 Mutation in an AST
3.3 Crossover in an AST
3.4 Intrinsic Evolvable Hardware System
3.5 Extrinsic Evolvable Hardware System
3.6 Reduced Verilog BNF [38]
3.7 AST of 8 to 1 MUX from one run
3.8 Alternate AST of 8 to 1 MUX from same run
3.9 AST of 8 to 1 MUX from a second run
3.10 AST of 8 to 1 MUX from a third run
4.1 GENPEFS System Design
4.2 AST of correct 4 to 1 MUX
4.3 4 to 1 MUX Number of Generations To Remove Trojans
4.4 AST of correct 8 to 1 MUX
4.5 8 to 1 MUX Number of Generations To Remove Trojans
4.6 AES Encryption Module Number of Fitness Evaluations To Remove One Trojan
4.7 AES Encryption Module Number of Fitness Evaluations To Remove Two Trojans

CHAPTER 1

INTRODUCTION

Recent years have seen increasing globalization of the semiconductor supply chain. Hardware

designers are becoming more concerned about the security of their designs as more

of the hardware design and manufacturing process moves overseas. FPGAs have a particularly

long supply chain - FPGA vendor, foundry, PCB vendor, 3rd-party IP¹ vendors, the

end hardware designer, and the end user. This work focuses on security concerns that the

hardware designer, the person writing code to be programmed onto the FPGA, might face

- particularly security concerns from integrating 3rd-party IP into FPGA designs.

Figure 1.1: FPGA Supply Chain

¹ IP refers to the code programmed onto the FPGA. FPGA code is typically written in Verilog or VHDL, and 3rd-party IP is delivered to hardware designers in one of these languages.

Concerns about security in FPGA designs have led hardware designers and academics

to increasingly consider the possibility of Trojans at any point in the hardware supply chain.

Hardware and FPGA design security have been studied for some time, but the integration of 3rd-party

IP into designs is a recent phenomenon. This is primarily because hardware designs

are becoming increasingly complex. It is often not feasible for hardware designers to create

a design end-to-end in-house. Instead, hardware designers look to 3rd-party hardware IP

vendors, who sell commercial-off-the-shelf components for hardware designs.

While a lot of focus has been put into understanding hardware Trojans in silicon and in

circuit boards, the idea of Trojans existing in FPGA IP is somewhat novel. Recently, work

has acknowledged the existence of Trojans in FPGA IP, but there is no comprehensive

discussion and analysis of what threats hardware designers might face when they integrate

3rd-party IP into their designs [1]. In order for hardware designers to better understand the

threats they face when integrating 3rd-party IP into designs, it would be useful to have a

comprehensive classification and analysis of hardware Trojans in FPGA IP.

In addition to a need for understanding what threats are present in FPGA IP, there is

also a need for more comprehensive FPGA Trojan mitigation strategies. There are many

Trojan detection and tolerance strategies in existence, but no strategy is able to overcome

every type of Trojan. In general, Trojan tolerance is more desirable than Trojan detection.

Test-time Trojan detection methods are useful, but Trojans often hide themselves during

testing and surface later, meaning the only way to operate an FPGA without fear of Trojans

is to develop Trojan tolerant designs.

Most Trojan tolerance strategies use redundancy at some level in the system in order to

duplicate results and hopefully choose the correct one. One such strategy is Triple Modular

Redundancy, one of the most common Trojan tolerance strategies [2]. Redundancy-based

strategies always have some sort of cost - area in the design, power consumption, and in

the case of IP the cost of purchasing multiple copies of the IP from different hardware

vendors [1]. Further, these strategies are unable to mitigate some Trojans, like those that

leak information. If a Trojan leaks information through a side channel, duplicating the IP

will not help [3]. The system might compare results and choose the most common one in

order to pick the correct result, but information will still be leaked each time the infected

IP is used [4]. Clearly, there is a need for a more comprehensive tolerance system.
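The limitation described above can be illustrated with a small software model. The Python sketch below is a minimal illustration only: the replica functions, the 8-bit adder they model, and the `leaked` list standing in for a side channel are assumptions introduced here and do not come from any of the cited systems. It shows a majority voter masking a fault-injecting replica, while a leaking replica still exposes its inputs even though the voted result is correct.

```python
from collections import Counter

# Hypothetical stand-in for a side channel observed by an attacker.
leaked = []

def clean_adder(a, b):
    """Trusted 8-bit adder."""
    return (a + b) & 0xFF

def faulty_adder(a, b):
    """Fault-injecting Trojan: flips the low bit of the sum."""
    return ((a + b) & 0xFF) ^ 0x01

def leaky_adder(a, b):
    """Information-leaking Trojan: correct sum, but the inputs escape."""
    leaked.append((a, b))
    return (a + b) & 0xFF

def majority_vote(results):
    """Return the value a strict majority agrees on, or None if there is none."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None

def tmr(replicas, a, b):
    """Run every replica on the same inputs and vote on the outputs."""
    return majority_vote([replica(a, b) for replica in replicas])

# The voter masks the faulty replica ...
masked = tmr([clean_adder, faulty_adder, clean_adder], 2, 3)
# ... but voting does nothing about the leak in the second scenario.
voted = tmr([clean_adder, leaky_adder, clean_adder], 2, 3)
```

In this model both `masked` and `voted` hold the correct sum, yet `leaked` still records the operands from the second run, which is the gap in redundancy-based tolerance that the paragraph above points out.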

Evolvable hardware - a hardware design strategy inspired by biological processes - has

been used in the past to repair faults in hardware [5]. FPGAs are particularly suited to

evolvable hardware design strategies due to their reconfigurability. Similarly, genetic pro-

gramming can be used to repair bugs in software [6]. Because 3rd-party IP is delivered

as HDL code, genetic programming might be uniquely fit for repairing bugs - or Trojans -

in FPGA IP. Despite some difficulties in employing evolvable hardware strategies in mod-

ern FPGAs, it might be possible to repair Trojans located in FPGA IP using a genetic

programming-inspired evolvable hardware technique.
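To give a rough sense of what repairing a Trojan by evolutionary search could look like, the following Python sketch applies a mutation-only search to a toy expression tree. It is a deliberately simplified illustration and not the system developed in this work: the nested-tuple tree encoding, the "hoist" mutation (replace an operator node with one of its children), the exhaustive enumeration of mutants in place of random mutation and crossover, and the availability of a golden reference function for fitness evaluation are all assumptions made for the example. The circuit is a 2-to-1 multiplexer whose output is forced low when a hypothetical `t` input is asserted.

```python
from itertools import product

VARS = ("a", "b", "sel", "t")

def evaluate(tree, env):
    """Evaluate a nested-tuple Boolean expression under the given inputs."""
    op = tree[0]
    if op == "var":
        return env[tree[1]]
    if op == "not":
        return 1 - evaluate(tree[1], env)
    left, right = evaluate(tree[1], env), evaluate(tree[2], env)
    return left & right if op == "and" else left | right

def golden(env):
    """Assumed reference behavior: a 2-to-1 mux that ignores t."""
    return env["b"] if env["sel"] else env["a"]

def fitness(tree):
    """Count the input combinations on which the tree matches the reference."""
    score = 0
    for bits in product((0, 1), repeat=len(VARS)):
        env = dict(zip(VARS, bits))
        score += evaluate(tree, env) == golden(env)
    return score

def hoist_mutants(tree):
    """Yield every tree formed by replacing one operator node with a child."""
    if tree[0] == "var":
        return
    children = tree[1:]
    for child in children:
        yield child
    for i, child in enumerate(children):
        for mutated in hoist_mutants(child):
            yield tree[:1 + i] + (mutated,) + tree[2 + i:]

def repair(tree, max_generations=10):
    """Keep the fittest mutant each generation until the reference is matched."""
    best = tree
    perfect = 2 ** len(VARS)
    for _ in range(max_generations):
        if fitness(best) == perfect:
            break
        candidates = list(hoist_mutants(best))
        if not candidates:
            break
        challenger = max(candidates, key=fitness)
        if fitness(challenger) <= fitness(best):
            break  # no single mutation improves the design
        best = challenger
    return best

a, b, sel, t = (("var", name) for name in VARS)
mux = ("or", ("and", sel, b), ("and", ("not", sel), a))
infected = ("and", mux, ("not", t))  # Trojan: output forced to 0 when t = 1
repaired = repair(infected)
```

In this toy case a single generation suffices: the fittest mutant of `infected` is the original `mux` subtree, so the search removes the Trojan logic and nothing else.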

Removing Trojans from FPGA IP using genetic programming would allow automatic

repair of IP Trojans without any human intervention. A system using such a strategy would

be able to minimize downtime of the hardware, and hopefully deal with the types of Trojans

that current Trojan tolerance schemes cannot handle. Minimizing downtime is particularly

important in uptime-sensitive applications like aerospace, or more recently, autonomous

vehicles. Downtime in those types of applications can endanger lives, or incur massive

costs to the developer and end user.

The rest of this work analyzes past taxonomies of hardware Trojans and their applicability

to Trojans in FPGA IP. It then presents the first taxonomy of Trojans in FPGA IP,

based on past work in creating hardware Trojans in IP. Then, it examines the effectiveness

of a variety of existing mitigation strategies, and examines the need for a new, more com-

prehensive system. It then examines how evolutionary and genetic algorithms have been

used for hardware design and fault tolerance. It proposes a novel genetic programming

(GP)-based approach to employing evolvable hardware. Then, it explores the effectiveness

of this method in removing Trojans from FPGA IP, by inserting Trojans into a variety of

circuits and attempting to remove them using this GP-based evolvable hardware strategy.

Finally, it considers future work in developing this GP-based evolvable hardware

strategy and how it might be applied to Trojans in FPGA hardware. This work demonstrates

successful removal of various hardware Trojans from different circuit designs. Importantly,

it finds that the only successful modification to a large circuit using genetic programming

results in removal of the Trojan and no other changes.

CHAPTER 2

HARDWARE TROJANS

Hardware Trojans are malicious modifications made to circuitry in order to produce some

undesirable effect, usually without the hardware designer’s knowledge. In recent years, the

hardware supply chain has become increasingly globalized. Few semiconductor businesses

own their own foundries, and many of those 3rd-party foundries are located outside of the

US. As hardware designers outsource more and more of their work, the threat of hardware

Trojans has increased. Often, hardware designers do not design their products end-to-end.

Many hardware designers make use of an outside foundry, or use externally designed pe-

ripheral components. In the case of FPGAs, hardware designers produce only the IP to

be run on the FPGA, while all hardware is purchased from a vendor. More recently, how-

ever, hardware vendors are offering commercial-off-the-shelf IP blocks, allowing hardware

designers to outsource even more of their work. Ultimately, any hardware or software com-

ponent that is outsourced is a potential attack vector, and should be considered untrusted.

Hardware Trojans in FPGAs can exist in many different forms. They can be located in

the FPGA silicon, on the FPGA PCB, or even in the IP used to program the FPGA. These

Trojans can be characterized by a variety of their parameters - payload, trigger mechanism,

physical size, power consumption, etc. Previous attempts have been made to comprehen-

sively characterize Trojans based on many of these parameters. This chapter explores these

characterizations, discusses Trojan mitigation strategies, and proposes a taxonomy of Tro-

jans that can be found specifically in FPGA IP, something that has not yet been done.

2.1 Taxonomies

Few taxonomies have been proposed to categorize hardware Trojans. These taxonomies

often categorize Trojans using their payload, their activation mechanism, and their physical

characteristics. This section explores the taxonomies that have been proposed in the past,

and considers their applications to Trojans in FPGA IP.

An early, non-comprehensive taxonomy of hardware Trojans noted that they may

have varying activation triggers and payloads [7]. A more complete taxonomy of hardware

Trojans was first discussed by Wang et al. in [8], and then expanded upon by Tehranipoor

et al. in [9]. Wang et al. proposed the taxonomy seen in Figure 2.1.

Figure 2.1: Taxonomy Proposed by Wang et al. [8]

This taxonomy categorizes Trojans by three different types of characteristics - their physical

characteristics, their activation characteristics, and their action characteristics.

Physical characteristics describe how the Trojan is placed in FPGA hardware. These

characteristics might not be worthwhile in the scope of FPGA IP Trojans, but it is best

to consider them here anyway. Physical characteristics include the distribution, structure,

size, and type of the Trojan. Distribution describes how and where the Trojan is placed

on the FPGA hardware - is there one Trojan in each LUT, are the Trojans on select LUTs,

and so on. It describes the Trojan’s physical location on the chip. Structure refers to the

Trojan’s internal logic and routing, rather than where the Trojan is located on the chip.

Size categorizes Trojans based on their physical size on the chip. Type makes a distinction

between two main types of Trojans - functional and parametric. Functional Trojans produce

errors or undesirable effects in logic by addition or deletion of gates, while parametric

Trojans modify functionality of existing wires [9].

Activation characteristics explain how a Trojan is triggered. Trojans are either inter-

nally or externally activated. Externally activated Trojans are activated when a sensor or

communication device reads an external signal or environmental condition telling the Tro-

jan to activate. Internally activated Trojans are activated either combinationally (some rare

condition) or sequentially (after a certain amount of time).

Action characteristics explain what a Trojan actually does when it is activated. This is

perhaps the most worthwhile classification of Trojans, as it can be used to develop different

Trojan tolerance strategies. This taxonomy breaks action characteristics into three cate-

gories: Trojans that transmit information, Trojans that modify specification, and Trojans

that modify function. Trojans that transmit information aim to leak information from the

hardware system to somewhere else. This information can be internal to the system, like

an encryption key, or can be external information read using sensors in the hardware sys-

tem. Trojans that modify specification change nonfunctional requirements of the hardware

system, like clock speed. Finally, Trojans that modify function change the actual logic of

the system. Modified logic can be as simple as an incorrect result for an operation, or as

complex as taking over the FPGA system and using it for some other task.

This classification provides an interesting first attempt at categorizing hardware Tro-

jans. Later taxonomies provide more elaboration on the action characteristics, and mostly

ignore physical characteristics of the Trojans. Physical characteristics can mostly be ig-

nored as they are only useful in detecting the Trojan using destructive or side-channel anal-

ysis, both of which are difficult to perform and somewhat ineffective at detecting Trojans

[4, 10, 3].

The next classification to consider is that by Chakraborty et al. in [10]. This proposed

taxonomy of hardware Trojans omits classification by physical properties and focuses only

on triggers and payloads. A reproduction of this taxonomy is seen in Figure 2.2.

Figure 2.2: Taxonomy Proposed by Chakraborty et al. [10]

This taxonomy organizes Trojans by their trigger and their payload. Payloads can be

either digital or analog payloads to the circuit, or payloads that don’t have an effect on the

logic (other). Digital payloads change the logic in the circuit at a node, or modify memory

content in the hardware system. Analog payloads modify the circuit electrically, but often

end up producing a logical effect. Bridging analog Trojans tie a node in the circuit to Vdd or

Gnd. Delay Trojans modify the delay between nodes in a circuit. Activity Trojans generate

excess activity in the circuit in order to shorten its lifespan. Finally, Trojans may participate

in more software-based attacks like leaking information from the circuit or participating in

a denial of service attack to shut down important system functions [10].

Trojan triggers are organized into digital and analog triggers. Analog-triggered Trojans

are activated using sensors, or by monitoring device electrical activity. Digitally-triggered

Trojans are activated either combinationally or sequentially. Combinationally activated

Trojans are activated when some rare value occurs at a node in a circuit. Sequentially

activated Trojans are activated by a synchronous counter, asynchronous counter, hybrid

counter, or a rare sequence at a node in the circuit.
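The digital trigger categories above can be modeled in a few lines of software. The Python sketch below is illustrative only: the class names, the per-cycle `step` interface, and the example values are assumptions introduced here, and the counter model covers only the simplest synchronous case rather than the asynchronous or hybrid counters mentioned in [10].

```python
from collections import deque

class CombinationalTrigger:
    """Fires in any cycle where the monitored node holds a rare value."""
    def __init__(self, rare_value):
        self.rare_value = rare_value

    def step(self, node_value):
        return node_value == self.rare_value

class CounterTrigger:
    """Fires once a fixed number of clock cycles has elapsed."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0

    def step(self, node_value=None):
        self.count += 1
        return self.count >= self.threshold

class SequenceTrigger:
    """Fires when the most recent node values match a rare sequence."""
    def __init__(self, pattern):
        self.pattern = tuple(pattern)
        self.window = deque(maxlen=len(self.pattern))

    def step(self, node_value):
        self.window.append(node_value)
        return tuple(self.window) == self.pattern

def first_activation(trigger, node_values):
    """Return the first cycle index at which the trigger fires, or None."""
    for cycle, value in enumerate(node_values):
        if trigger.step(value):
            return cycle
    return None
```

A test bench that never drives the rare value or sequence, or that runs for fewer cycles than the counter threshold, sees `first_activation` return `None`, which is why such Trojans can pass test-time inspection.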

This taxonomy is useful particularly because of the expanded payload taxonomy. Pay-

loads are the most important aspect of a Trojan to understand. Understanding Triggers

helps hardware designers produce designs to avoid triggering Trojans, or to intentionally

trigger them during testing. However, the most effective form of Trojan mitigation is a

Trojan-tolerant circuit design, and comprehensively understanding what payloads Trojans

may bring helps hardware designers create designs that are able to deal with Trojans left

undetected during testing.

The final and most comprehensive taxonomy is that proposed by Mal-Sarkar et al. in [1]

and [4]. This taxonomy for the first time suggests that Trojans may also exist in FPGA IP,

though it does not elaborate further. Equally importantly, this taxonomy focuses exclusively

on Trojans that may exist in FPGA devices, rather than in hardware devices in general.

Figure 2.3 shows a reproduction of this taxonomy.

This taxonomy has approximately the same content as that proposed by Chakraborty

et al. in [10], but organizes it somewhat better. Trojans in FPGA hardware can be either

conditionally triggered or always on. Conditionally triggered Trojans can be triggered by

an environmental factor or some logic in the circuit. Trojan payloads can cause malfunction

or leak secret information. Secret information can be either FPGA IP or data inside the

FPGA. Malfunctions can be logical, like a logical error, or parametric, like modifying the

clock frequency of the circuit.

Figure 2.3: Taxonomy Proposed by Mal-Sarkar et al. [1]

This taxonomy is important because it is focused specifically on FPGAs. Earlier Trojan

taxonomies that did not have an FPGA focus ignored the possibility of leaking IP. Addi-

tionally, this work for the first time mentioned Trojans in FPGA IP as a possibility, but did

not include a detailed taxonomy of what types of Trojans might exist in IP.

These taxonomies give hardware designers an idea of what types of Trojans might exist

in hardware designs. Unfortunately, there is no comprehensive taxonomy of hardware

Trojans in FPGA IP. Creating such a taxonomy might help hardware designers understand

what kinds of threats they face when integrating 3rd-party IP into their FPGA systems.

2.2 Taxonomy of Trojans in FPGA IP

So that hardware designers may better understand the threats Trojans in FPGA IP pose to

hardware designs, it would be useful to have a single comprehensive taxonomy of hardware

Trojan effects in FPGA IP. Although some attempts at categorizing Trojans that can be found in

FPGA hardware have been made [9, 10, 1, 4], to date there exists no comprehensive

categorization of Trojans that can be found in FPGA IP. The categorization presented in

Figure 2.4 attempts to categorize the payloads, or effects, of Trojans that may be found in

FPGA IP. It is based on the earlier taxonomies of hardware Trojans and on FPGA Trojans

and viruses examined in other work. This categorization focuses only on the payloads of

FPGA IP Trojans because the categorization based on their triggers will likely be similar

to that for Trojans in FPGA hardware, and has not yet been explored enough to be mature.

Mal-Sarkar et al. suggest in [4] that categorizing Trojans based on their size and distribution

may not lead to interesting results or help hardware designers, so those categorizations

are also omitted. It is worthwhile to consider that a single Trojan may fit into multiple

categories in the taxonomy. A Trojan that hijacks an FPGA may also send the FPGA’s

previous configuration data to an attacker. A Trojan that injects faults may do so in order to

leak information, such as in fault-based cryptanalysis [11, 12, 13]. The work in this section

has been submitted in [14].
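Because a single Trojan may belong to several categories at once, the taxonomy maps naturally onto a set of combinable flags. The Python sketch below is one possible encoding, offered as an illustration rather than as part of the taxonomy itself: the identifier names are invented here, the leaf and parent categories follow the subsection headings of this chapter, and treating Trojans that Introduce Vulnerabilities as a top-level category is an assumption.

```python
from enum import Flag, auto

class Payload(Flag):
    # Leaf categories
    DISABLE_FPGA = auto()
    HIJACK_FPGA = auto()
    INJECT_VERIFIABLE_FAULT = auto()
    INJECT_UNVERIFIABLE_FAULT = auto()
    LEAK_INFORMATION = auto()
    WASTE_RESOURCES = auto()
    INTRODUCE_VULNERABILITY = auto()  # placement at top level is assumed

    # Parent categories, expressed as unions of their children
    PREVENT_OPERATION = DISABLE_FPGA | HIJACK_FPGA
    INJECT_FAULTS = INJECT_VERIFIABLE_FAULT | INJECT_UNVERIFIABLE_FAULT
    CAUSE_MALFUNCTION = PREVENT_OPERATION | INJECT_FAULTS
    CAUSE_SIDE_EFFECTS = LEAK_INFORMATION | WASTE_RESOURCES

TOP_LEVEL = {
    "Cause Malfunction": Payload.CAUSE_MALFUNCTION,
    "Cause Side Effects": Payload.CAUSE_SIDE_EFFECTS,
    "Introduce Vulnerabilities": Payload.INTRODUCE_VULNERABILITY,
}

def top_level_categories(payload):
    """List every top-level category that the payload overlaps."""
    return [name for name, group in TOP_LEVEL.items() if payload & group]

# A Trojan that hijacks the FPGA and sends its previous configuration out.
hijack_and_exfiltrate = Payload.HIJACK_FPGA | Payload.LEAK_INFORMATION
categories = top_level_categories(hijack_and_exfiltrate)
```

Here `categories` contains both "Cause Malfunction" and "Cause Side Effects", matching the hijacking example given in the text.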

2.2.1 Trojans that Cause Malfunction

Many existing mitigation strategies aim to tolerate one class of Trojans in FPGA IP:

those that cause the FPGA to malfunction. The intensity of the malfunction can

range from a simple logical error to complete device failure with an electrical fault. These

Trojans can be further classified into Trojans that Prevent FPGA Operation and Trojans

that Inject Faults.

Figure 2.4: Taxonomy of Trojans in FPGA IP

2.2.2 Trojans that Prevent FPGA Operation

Some previously explored hardware Trojans aim to disrupt the operation of an FPGA.

This type of attack was first introduced by Hadzic as the SALT Trojan discussed in [15].

Whereas Trojans that electrically damage an FPGA are probably not possible to produce

in an HDL, it is somewhat easy for Trojan designers to produce Trojans that disable or

even reprogram the FPGA. When a hardware designer integrates a 3rd-party IP block into

a design on an FPGA, the 3rd-party IP block may have access to a multitude of advanced

FPGA reprogramming features. The 3rd-party IP block may be able to write information

to the FPGA’s configuration SRAM. Trojans that Prevent FPGA Operation can be further

divided into two subgroups, Trojans that Disable the FPGA and Trojans that Hijack the

FPGA.

Trojans that Disable the FPGA have been somewhat explored in past work. One of the

first papers on FPGA security, the aforementioned "FPGA Viruses" [15] by Hadzic et al.,

discusses three FPGA-disabling classes of viruses. The first, a permanently damaging virus

class titled MELT, is probably not possible through FPGA IP due to protections built into

FPGA synthesis software. The other attack classes, SALT and HALT, cover attacks that

cause system malfunction without permanently damaging the FPGA. These types of attacks

are possible to implement in HDL code, especially when an attacker is aware of the FPGA

system the code will be deployed on.

Druyer et al. discuss the possibility of hijacking an FPGA system during configuration

in [16]. Dynamic FPGA reprogramming features provide an opportunity for attackers to

hijack an FPGA at any point in time. Inactive FPGA configurations are usually stored in

SRAM on the FPGA. Alternately, new configurations can be delivered to the FPGA through

a network interface. If a 3rd-party IP has access to the SRAM or a network interface, it may

be able to modify dormant FPGA configurations. This creates the opportunity for malicious

3rd-party IP blocks to reprogram the FPGA using a malicious configuration. Any FPGA

that is able to reprogram itself using a dynamic reprogramming feature, or has access to

its own configuration SRAM, may be vulnerable to a 3rd-party IP block modifying the

configuration. If a 3rd-party IP block is able to modify the configuration, there is no limit

to what it can do, including taking over the FPGA.

2.2.3 Trojans that Inject Faults

Hardware Trojans in FPGA IP may also manifest themselves as fault-injecting Trojans.

These Trojans aim to disrupt operation of the FPGA by providing an incorrect result in

some circuit operation. Fault-injecting Trojans are among the types studied most often

in developing Trojan tolerance strategies. Many conventional Trojan tolerance

strategies, such as Adapted TMR [4] and MRVO [3], aim to mitigate the effects of fault-injecting

Trojans. We can further classify fault-injecting Trojans into Trojans that Inject

Internally Verifiable Faults and Trojans that Inject Internally Unverifiable Faults [17].

Past work in FPGA Trojans has discussed fault-injecting Trojans whose faults are in-

ternally verifiable [17]. A fault-injecting Trojan is internally verifiable if the result an IP

block produces can be determined to be correct or incorrect without duplication of the system.

An example of an internally verifiable fault arises in an FPGA that produces

instructions to control a CPU. If the FPGA sends an invalid opcode to the CPU, the result

is verifiably wrong.
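The opcode example can be sketched as a simple validity check. The Python code below is a hypothetical illustration: the four-instruction set and its encodings are invented for the example and are not taken from [17]. It shows that an undefined opcode can be rejected without any reference copy of the IP, whereas a wrong but valid opcode passes the same check.

```python
# Hypothetical 4-instruction ISA; the encodings are assumptions for illustration.
VALID_OPCODES = {0x0: "NOP", 0x1: "LOAD", 0x2: "STORE", 0x3: "ADD"}

def is_verifiably_wrong(opcode):
    """A fault is internally verifiable when the opcode is not in the ISA."""
    return opcode not in VALID_OPCODES

def check_stream(opcodes):
    """Return indices of opcodes that can be rejected without a reference copy."""
    return [i for i, op in enumerate(opcodes) if is_verifiably_wrong(op)]

intended = [0x1, 0x3, 0x2]       # LOAD, ADD, STORE
invalid_fault = [0x1, 0xF, 0x2]  # Trojan emits an undefined opcode
valid_fault = [0x1, 0x2, 0x2]    # Trojan swaps ADD for STORE
```

Running `check_stream` on `invalid_fault` flags position 1, while `valid_fault` yields no findings even though it differs from `intended`.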

In some cases, fault-injecting Trojans may be unverifiable without system duplication.

Fault-injecting Trojans produce undetectable results when their results are indistinguish-

able from a correct result [17]. Incorrect results can be indistinguishable from correct re-

sults in a variety of circumstances. For example in [17], an FPGA provides instructions for

a computer system to run. The authors explain that in the studied system, an incorrect result

is indistinguishable from a correct result as long as the incorrect result is a valid instruction

for the system to run. Fortunately, numerous redundancy-based verification strategies have

been proposed in past work, most of them using a majority voting scheme to determine

the correct result [9, 4, 2, 3]. In the case of Trojans in 3rd-party IP blocks, designers may

choose to purchase multiple copies of the same IP from different vendors and run those IP

blocks concurrently or one at a time using partial reconfiguration [3].

Listing 1 shows a simplified example of a fault-injecting Trojan. The Verilog code for a

4-state machine is infected with a Trojan, triggered by the t_trojan signal, that causes the state

machine to unexpectedly reset. Unless the signal is activated during testing, the Trojan will

not be caught. Although fault-injecting Trojans can be dangerous, the variety of existing

mitigation strategies makes them less of a threat than other types of Trojans [2].

module fourState (
    input clk,
    input reset,
    input transition,
    input t_trojan,
    output reg [1:0] state
);

    // Active-low asynchronous reset
    always @(posedge clk or negedge reset) begin
        if (!reset) begin
            state <= 0;
        end else begin
            // Trojan payload: force an unexpected reset when triggered
            if (t_trojan) begin
                state <= 0;
            end else if (transition) begin
                state <= state + 1;
            end
        end
    end

endmodule

Listing 1: Fault-Injecting Trojan in State Machine

2.2.4 Trojans that Cause Side Effects

Trojans in FPGA IP may have a payload that does not interfere with the logic of the design.

These Trojans instead cause some side effect, often leaking information about the FPGA

or wasting resources on a task not intended by the victim of the Trojan. Although these

Trojans don’t always disrupt the FPGA’s work, they may have other extremely costly con-

sequences. These Trojans can be classified into Trojans that Leak Information and Trojans

that Waste FPGA Resources.

2.2.5 Trojans that Leak Information

Trojans in FPGA IP may aim to leak information about the FPGA or its environment [1, 3,

9]. These Trojans do not disrupt normal operation of the FPGA. Instead, they expose secrets

about the FPGA - other IP implemented on the FPGA, data in the FPGA, or information

from the system’s surroundings. Trojans that leak information have been studied in the

context of Trojans in FPGA hardware, but few tolerance strategies exist. We further classify

these Trojans into Trojans that Leak System-Internal Information and Trojans that Leak

System-External Information.

Trojans that leak system-internal information discreetly send information about the

FPGA or data it is processing to a malicious actor. These Trojans can leak secret infor-

mation such as cryptographic keys [18]. Leaking cryptographic keys may allow attackers

to access even more data outside of the FPGA system. Trojans may also aim to leak infor-

mation about the FPGA’s design and configuration stored in the bitstream. Due to the high

design costs of FPGA systems, Trojans may attempt to steal information about the FPGA

configuration in order to reverse engineer and steal the system [1].

Trojans may also leak information external to the FPGA system. These Trojans take

advantage of sensors and other peripherals to gather and transmit information such as the

system’s location and surroundings. These Trojans pose a particularly large threat to

defense-related applications, where the locations of equipment must be kept secret from

potential attackers.

module encryption (
    input [255:0] key,
    input [255:0] secret,
    input control,
    input t_trojan,
    input clk,
    input reset,
    output reg [255:0] encrypted
);

    reg [255:0] tmp;

    always @(*) begin
        // . . . Encryption logic . . .
    end

    // Active-low synchronous reset
    always @(posedge clk) begin
        if (!reset) begin
            encrypted <= 256'b0;
        end else begin
            // Trojan payload: bypass encryption and expose the secret
            if (t_trojan) begin
                encrypted <= secret;
            end else begin
                encrypted <= tmp;
            end
        end
    end

endmodule

Listing 2: Information Leaking Trojan in Encryption Module

Listing 2 shows an example of an information-leaking Trojan. This Trojan, again

triggered by the t_trojan signal, outputs an unencrypted version of the secret when triggered.

Like the Trojan in Listing 1, it may not be caught during testing if the activation signal is never triggered.

Few strategies have been proposed to mitigate the effects of information-leaking

Trojans. Traditional redundancy-based approaches will not suffice, because the information

a Trojan leaks is separate from the results the IP block produces. Alanwar et al. propose

a technique called Simple Blockage (SB) to obfuscate all information before it is sent to

other parts of the system. This technique prevents malicious actors from reading any data,

even when it is leaked [3]. Additionally, it comes at a lower design and power cost than

redundancy-based approaches to security. However, hardware designers must take care to

monitor every communication port on the FPGA, or the Trojan may be able to leak infor-

mation undetected. Additionally, this strategy is not appropriate when the 3rd-party IP is

responsible for implementing a communication protocol, as encryption of any data before

it leaves the FPGA may break the implementation.

2.2.6 Trojans that Waste FPGA Resources

Trojans may also aim to impede an FPGA system by consuming an excessive amount of

system resources. These Trojans may clog up network traffic or simply consume an ex-

cessive amount of power. Trojans that Waste FPGA Resources may do so by performing

a separate, unrelated task to the benefit of an attacker, or by maliciously wasting system

resources for no benefit. Because these Trojans have very similar effects and countermea-

sures, we do not draw a distinction between them in the taxonomy.

One potential motivation for resource-wasting Trojans is for an attacker to use the

FPGA for their own benefit while avoiding detection. Chakraborty et al. discussed the

idea of a Trojan harnessing an FPGA for a denial of service attack in [10]. These Trojans

are similar in motivation to those that hijack a system, but more difficult to detect because

the FPGA may remain completely functional while the Trojan is active. In modern

FPGAs built on very small transistor processes, even measuring power consumption is not a

reliable way to detect Trojans. The concept of

FPGAs used in a botnet is very similar to the recent Mirai IoT botnet [19]. We

should anticipate that malicious actors may also take advantage of FPGAs in a similarly

difficult-to-detect attack.

Resource-wasting Trojans may also aim to disrupt operation of the FPGA by consuming

too many of the resources available to the FPGA. These Trojans may attempt to use all of the

bandwidth for some communication mechanism, as in a denial of service attack. They

may also simply increase the power consumption of an FPGA to an unsustainable level,

behaving like computer power viruses [20] or the hardware power viruses discussed in

[15].

Few Trojan tolerance strategies focus on these types of Trojans. Any system that aims

to tolerate resource-wasting Trojans must first be able to detect them. Although

DoS-based attacks are easy to detect, Trojans that don’t waste an excessive amount of resources

must be detected through some other type of system monitoring. Alanwar et al. discuss

two strategies that help disable infected IPs, Multiplexing Reconfigurable Variants’ Output

(MRVO) and Multiplexing Reconfigurable IPs’ Outputs and Cyclic Redundancy Check

Trojan Detection Schema (MCRC) in [3]. These strategies both suggest using partial re-

configuration to swap out infected IPs at run-time. These redundancy-based approaches

help mitigate the effects of resource-wasting Trojans, though they require at least one un-

infected copy of the IP to fully eliminate them.

The genetic programming-based evolvable hardware strategy we introduce in a later

section addresses side effect-inducing Trojans particularly well. It aims to disable the ma-

licious functionality while preserving all of the intended behavior, which is not possible in

most redundancy-based Trojan tolerance approaches without a golden copy of the IP.

2.2.7 Trojans that Introduce Vulnerabilities

A type of Trojan that has not been discussed in previous taxonomies is the Trojan that

introduces other vulnerabilities into the system, but has no other ill effects. These Trojans

have been discussed in the context of software Trojans, and there is no reason they cannot

also exist in hardware [21]. Hardware designers should be particularly careful about these

types of Trojans, as they might be very difficult to detect. Producing no immediate payload

makes the Trojan’s effect undetectable until the vulnerability allows another sort of attack

on the FPGA.

Trojans that introduce other vulnerabilities into the system might also be inserted by

hardware designers rather than only by 3rd-party IP vendors. These types of Trojans can

be difficult to distinguish from design errors, so they can allow malicious actors to have

some sort of plausible deniability.

2.3 Existing Trojan Mitigation Strategies and FPGA IP

In order to understand what work is needed in the area of FPGA IP Trojan mitigation, it is

necessary to examine current Trojan tolerance schemes and their effectiveness against the

various types of IP Trojans. This section surveys a variety of Trojan tolerance

strategies and evaluates the effectiveness of each in dealing with the different types of

Trojans in FPGA IP.

It is important to differentiate between Trojan detection and Trojan tolerance. Trojan

detection strategies provide a way for hardware designers to detect Trojans in their

systems. These detection strategies may run at test-time or continuously during run-time. Trojan

tolerance strategies are run-time Trojan mitigation techniques that allow a hardware design

to function normally in the presence of one or more hardware Trojans. In this sense, Trojan

tolerance techniques might be more useful in a design than Trojan detection techniques.

Because Trojans often hide themselves during testing, a tolerance technique will give hard-

ware designers more confidence that their designs are secure [22]. Trojan tolerance tech-

niques are usually based on either design for security (DFS) or run-time monitoring [23].

2.3.1 Trojan Detection Techniques

Trojan detection techniques can be divided into destructive and non-destructive techniques.

Destructive techniques are based on taking apart an IC, and examining it with a microscope

[24]. These are of course not useful for Trojans in FPGA IP, because they rely on examining

the hardware itself for Trojans. In place of destructive approaches, hardware designers

might consider thoroughly examining the code or netlist delivered by a 3rd-party IP vendor.

Of course, understanding the code requires a lot of engineering effort, and often that kind

of effort is not available when in-house design is forgone in favor of 3rd-party IP blocks.

Non-destructive techniques are often based on more comprehensive testing or side channel

analysis, and may be more useful for detecting Trojans in IP.

Logic testing-based Trojan detection techniques rely on detecting the Trojan through

extensive logical coverage of the device. Because the input space of combinational logic grows

exponentially with the number of inputs, it is very difficult to detect Trojans during testing.
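To make the scaling concrete, the sketch below estimates how likely uniformly random test vectors are to activate a trigger that fires on exactly one pattern of k input bits. The function name and the uniform-random testing model are assumptions made for illustration; real test generation is rarely purely random.

```python
def activation_probability(trigger_bits: int, num_vectors: int) -> float:
    """Probability that at least one of num_vectors uniformly random input
    vectors matches a trigger that requires one exact k-bit pattern."""
    p_single = 2.0 ** -trigger_bits
    return 1.0 - (1.0 - p_single) ** num_vectors

# A 1-bit trigger such as t_trojan in Listing 1 is hit almost immediately,
# while a 32-bit trigger pattern is unlikely to be hit by a million vectors.
p_small = activation_probability(1, 10)
p_large = activation_probability(32, 1_000_000)
```

Under this model, each additional trigger bit halves the per-vector chance of activation, which is why wide or sequential triggers tend to survive functional testing.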

Some strategies have been proposed in an attempt to make Trojan detection easier.

One such strategy is MERO, proposed by Chakraborty et al. in [25]. MERO creates

a set of tests to minimize test time while maximizing coverage in the device. Better test

coverage in less time should lead to better Trojan detection rates. MERO works by finding

low probability conditions at every node in a circuit, and creating vectors specifically for

triggering those rare conditions more than once. This strategy increases how often nodes

that are resistant to random pattern testing are triggered, hopefully revealing Trojans during

test-time [22].
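The first step described above, finding low-probability conditions at circuit nodes, can be sketched as follows. This is not the MERO implementation from [25]; the three-node circuit, the node names, and the threshold are hypothetical, and the sketch enumerates all inputs exhaustively, which is only feasible for very small circuits.

```python
import itertools

# Hypothetical 4-input circuit: each internal node is a function of the inputs.
NODES = {
    "n_and": lambda a, b, c, d: a & b & c & d,   # 1 for only one input pattern
    "n_or":  lambda a, b, c, d: a | b | c | d,   # 0 for only one input pattern
    "n_xor": lambda a, b, c, d: a ^ b,           # balanced
}

def rare_conditions(threshold: float) -> dict:
    """Return {node: rare_value} for nodes whose value occurs with
    probability below threshold over all input combinations."""
    vectors = list(itertools.product((0, 1), repeat=4))
    rare = {}
    for name, fn in NODES.items():
        p_one = sum(fn(*v) for v in vectors) / len(vectors)
        if p_one < threshold:
            rare[name] = 1
        elif 1.0 - p_one < threshold:
            rare[name] = 0
    return rare

rare = rare_conditions(threshold=0.1)
```

A test generator in this style would then search for vectors that drive each flagged node to its rare value several times.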

Logic testing-based approaches to Trojan detection often use a coverage metric to deter-

mine the probability of a Trojan making it through testing undetected. Because of the huge

combinational complexity of large hardware designs, deterministically checking whether

there are any mistakes in the circuit is not possible. Instead, designers may choose to

randomly sample from known hardware Trojan triggers and Trojans and place them at dif-

ferent points in the circuit [25]. Running all tests against these Trojans develops a metric

of Trojan trigger coverage and detection coverage. Hardware designers may use this to test

Trojan resistance in circuits, but a fairly comprehensive set of possible Trojans to sample

from is necessary for adequate coverage [22].

Side channel analysis can also be used to detect Trojans in hardware systems. Side

channel analysis involves measuring electrical characteristics of the chip, like static current,

dynamic current, or power [22]. Static current analysis measures leakage current in the

device. Static CMOS gates leak some current even when they are not switching. Measuring

differences in static current may be used to differentiate between golden circuits and those

with Trojans [26]. Transient current analysis measures switching current in an attempt

to measure whether more gates than expected are switching [27]. Unfortunately, process

variations in static current, switching current, and many other device parameters make

side channel analysis techniques difficult and sometimes unreliable [22].

In [22], Bhunia et al. discuss a variety of different techniques to increase trust in 3rd-

party hardware IP. Particularly interesting are suggestions that hardware designers should

purchase multiple copies of a 3rd-party IP and programmatically compare them, as in [28].

Comparing different copies of the same IP may allow designers to find malicious modifi-

cations, working under the assumption that multiple 3rd-party vendors will not include the

same exact Trojan. This may be an expensive practice, but many Trojan tolerance schemes

rely on redundancy, so purchasing multiple copies may be necessary regardless.

Hardware designers may also take advantage of proof-carrying code to ensure security

in their designs [29]. Hardware designers may agree on a set of formal proofs that a 3rd-

party IP vendor’s delivered IP must satisfy. Designers may anticipate that any malicious

changes will break these contracts. When the IP is delivered, the IP vendor must demon-

strate that all proofs still hold. Of course, the IP vendor may design proofs to accommodate

malicious modifications, but this is still somewhat more secure than receiving 3rd-party IP

code with no formal verification [22].

Despite the shortcomings of many of these Trojan detection methods, there are no limits

to what types of Trojans most of them will detect. In that sense, these Trojan detection

methods might be more versatile than Trojan tolerance techniques that will only tolerate a

subset of all FPGA IP Trojans.

2.3.2 Trojan Tolerance Techniques

Trojan tolerance techniques aim to make hardware designs resistant to or tolerant of Trojans

that are not detected during testing. There is a variety of design for security techniques

that aim to make hardware designs more difficult to infiltrate with a Trojan [22], but this

section will focus on run-time monitoring and other run-time techniques.

Run-time monitoring of Trojans is effective for all types of Trojans discussed in the

taxonomy, as long as the monitoring system is comprehensive enough [23]. Fault-injecting

Trojans can be monitored using an anomaly detection technique, while side effect-inducing

Trojans can be monitored by a system external to the FPGA observing activity on all

of the FPGA’s communication devices. In some cases, monitoring and checking before

output may not be enough to prevent information leakage, such as with advanced fault-

based cryptanalysis attacks [30].

Figure 2.5: Triple Modular Redundant FPGA System

Many run-time monitoring systems are based on a redundancy and majority voting

scheme, such as Triple Modular Redundancy (TMR) and its derivatives [2]. In these

systems, three redundant copies of the same IP are used in conjunction with an oracle or

majority voter. Because the three redundant copies of the IP are purchased from different

vendors, it is unlikely that the same Trojan will exist in all 3 copies. The majority voter

determines the correct output based on what output most copies of the IP provided, and

uses that as the actual output to the system. This system only protects against logical er-

rors in the circuit, where a Trojan provides an incorrect result. The IPs are still able to

leak information or perform other malicious actions. Various improvements on this type

of system have been proposed. In [4], Mal-Sarkar et al. discuss a more energy-efficient

adaptation of TMR. In [3], Alanwar et al. suggest a variety of modular redundant systems,

including some that flag infected IPs and swap them out for unused, clean IPs using partial

reconfiguration features available in FPGAs.
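The majority voter at the heart of a TMR system can be modeled in a few lines. The sketch below is a behavioral Python model rather than synthesizable hardware, and the function name is our own; it assumes each redundant IP copy produces one comparable output value per cycle.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of the redundant
    IP copies, or None when no strict majority exists."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

# One infected copy out of three is outvoted.
voted = majority_vote([0b1010, 0b1010, 0b0000])
# A three-way disagreement cannot be resolved.
unresolved = majority_vote([1, 2, 3])
```

The voter only compares functional outputs, which is why it offers no protection against copies that leak information or waste resources while still producing correct results.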

Alanwar et al. also suggest a data obfuscation method called Simple Blockage (SB) [3].

SB involves obfuscating data before sending it through any communication device using

a key that is shared between the device and whatever system is receiving the information.

This can help mitigate Trojans that aim to leak information. If all output from the FPGA is

obfuscated sufficiently, attackers will not be able to use any leaked information. Of course,

SB cannot be used when an IP implements a communication protocol, as obfuscating it

may lead to the protocol no longer working.
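The shared-key idea behind SB can be illustrated with a toy transform. The repeating-key XOR used here is our own stand-in and is not claimed to be the scheme in [3]; it is not a secure cipher, and the key value and function name are hypothetical.

```python
def obfuscate(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating shared key (illustrative only;
    a repeating XOR key is not a secure cipher)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

shared_key = bytes([0x5A, 0xC3, 0x7E])           # hypothetical key known to both ends
plaintext = b"secret"
on_the_wire = obfuscate(plaintext, shared_key)   # what leaves the FPGA
recovered = obfuscate(on_the_wire, shared_key)   # XOR is its own inverse
```

Because every outgoing byte is transformed, an IP block that must emit a specific wire format would have its output altered too, which matches the limitation noted above for communication protocols.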

Few or no Trojan tolerance strategies aim to tolerate all types of side effect-inducing

Trojans. Those that leak information are not affected by redundancy, and those that waste

FPGA resources are not affected by SB. Trojans that aim to disable or hijack the FPGA

are affected by neither strategy. Even though fault-injecting Trojans may be detected using

redundancy, a redundant system incurs large design area and power costs [1], and purchas-

ing additional copies of 3rd-party IP may be prohibitively expensive or impossible [3]. It

is clear that a new Trojan tolerance strategy is needed, in order to comprehensively tolerate

(or at least mitigate the effects of) all types of Trojans.

CHAPTER 3

EVOLUTIONARY ALGORITHMS AND EVOLVABLE HARDWARE

Evolvable hardware is a hardware design strategy where a hardware designer uses evolu-

tionary or genetic algorithms to produce a circuit, instead of designing the circuit by hand.

Evolvable hardware algorithms mutate and mate hardware configurations in order to

improve on them. They maintain a population consisting of some number of individuals, each

individual some representation of a hardware design. These algorithms iteratively improve

on the hardware design and eventually produce an optimal individual in the population.

This chapter discusses evolutionary algorithms and their uses in hardware design. In

particular, it examines the applications of evolutionary algorithms to FPGA programming.

This chapter also presents an evolvable hardware strategy that excels in the face of con-

straints imposed by complications in modern FPGA technology.

3.1 Evolutionary and Genetic Programming

Evolutionary algorithms use techniques inspired by biological evolution to improve some

metric in a population. They iteratively make modifications to individuals in a population

with the goal of improving the population over time. Evolutionary algorithms consist of

four main components - a population, a fitness function, a selection function, and a mutation

function.

The population is a set of individuals that represent whatever we are trying to create or

improve upon. In evolvable hardware, the population is some representation of a circuit de-

sign. For example, each individual may be a netlist, abstract syntax tree (AST) representing

a hardware description language (HDL) program, or an FPGA configuration bitstream. If we

choose to represent each individual as an AST, the algorithm is referred to as evolutionary

programming.

Figure 3.1: Sample Abstract Syntax Tree

In evolutionary (or genetic) programming, individuals are represented as abstract syntax

trees (ASTs). Figure 3.1 shows an abstract syntax tree. This AST represents the program:

Algorithm 1: Program in Sample AST

if x == 0 then
    y := y + 1
end
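One way to hold such a program in memory is as nested tuples, with a small interpreter that walks the tree. The sketch below is an assumption about representation only; the node names are ours, and the exact node layout of Figure 3.1 is not reproduced.

```python
# The program "if x == 0 then y := y + 1" as a nested-tuple AST.
AST = ("if",
       ("==", ("var", "x"), ("const", 0)),
       ("assign", "y", ("+", ("var", "y"), ("const", 1))))

def evaluate(node, env):
    """Evaluate an expression node or execute a statement node against env."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "==":
        return evaluate(node[1], env) == evaluate(node[2], env)
    if kind == "+":
        return evaluate(node[1], env) + evaluate(node[2], env)
    if kind == "assign":
        env[node[1]] = evaluate(node[2], env)
        return None
    if kind == "if":
        if evaluate(node[1], env):
            evaluate(node[2], env)
        return None
    raise ValueError(f"unknown node type: {kind}")

env = {"x": 0, "y": 4}
evaluate(AST, env)   # the condition holds, so y is incremented
```

Representing programs as trees is what lets the mutation and crossover operators work on whole subexpressions instead of raw text.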

A fitness function is an algorithm that evaluates and assigns a fitness to each individual

in a population. In the case of evolvable hardware, a fitness function may synthesize and

then run a test bench on a netlist, the fitness value representing what portion of tests passed

on the netlist. In genetic programming, an AST is converted to code in whatever language

it represents, and then that code is compiled and tested.
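A fitness function of the kind described, scoring an individual by the portion of tests it passes, might look like the sketch below. A plain Python callable stands in for a synthesized and simulated design, and the full-adder test bench is a hypothetical example.

```python
def fitness(candidate, test_vectors):
    """Fraction of (inputs, expected) pairs for which candidate(*inputs)
    returns the expected output."""
    passed = sum(1 for inputs, expected in test_vectors
                 if candidate(*inputs) == expected)
    return passed / len(test_vectors)

# Hypothetical test bench for the sum output of a 1-bit full adder.
TESTS = [((a, b, c), a ^ b ^ c)
         for a in (0, 1) for b in (0, 1) for c in (0, 1)]

correct = fitness(lambda a, b, c: a ^ b ^ c, TESTS)
partial = fitness(lambda a, b, c: a ^ b, TESTS)   # ignores the carry-in
```

Partial credit matters here: a candidate that passes half of the tests gives the search a gradient to follow, whereas a pass/fail score would not.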

A selection function simply selects an individual from the population to be mutated and

added to the new population. Selection functions generally prefer more fit individuals but

do provide some randomness so that less fit individuals may make it to the next generation.

A common selection function is tournament selection, where a number of individuals are

compared one-on-one, with the more fit individual having a higher probability of winning

each match and being inserted into the next generation.
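Tournament selection as described can be sketched as a short series of one-on-one matches. The number of rounds and the win probability `p_win` are illustrative parameters we introduce here, not values from this thesis.

```python
import random

def tournament_select(population, fitness_of, rng, rounds=3, p_win=0.9):
    """Pick a random champion, then play `rounds` one-on-one matches against
    random challengers; the fitter individual wins each match with
    probability p_win."""
    champion = rng.choice(population)
    for _ in range(rounds):
        challenger = rng.choice(population)
        fitter, weaker = sorted((champion, challenger),
                                key=fitness_of, reverse=True)
        champion = fitter if rng.random() < p_win else weaker
    return champion

rng = random.Random(0)
population = list(range(10))   # individuals 0..9; fitness equals the value
picks = [tournament_select(population, lambda x: x, rng) for _ in range(1000)]
mean_pick = sum(picks) / len(picks)
```

Setting `p_win` below 1.0 is what allows a less fit individual to occasionally survive, which preserves diversity in the population.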

Finally, the mutation function is an operation applied to individuals to slightly modify

them. For example, a mutation function may randomly place an additional gate in a netlist.

In evolutionary programming, a mutation function may choose a point in an AST to replace

with another randomly generated tree.

Figure 3.2: Mutation in an AST

Genetic algorithms introduce a second function used to modify individuals during each

generation. This operation is called a crossover (or mating) function. A one-point crossover

function chooses two individuals from the population and one point in each individual, and

swaps the individuals at those points. In genetic programming, a crossover function might

randomly choose one node in each AST and swap the entire subtrees at those nodes.

Figure 3.3: Crossover in an AST
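The two tree operators described above, subtree mutation and subtree crossover, can be sketched on nested-tuple ASTs. The tuple encoding, the operator set, and all function names are assumptions made for illustration, and the trees are not those shown in Figures 3.2 and 3.3.

```python
import random

def random_tree(rng, depth=2):
    """Generate a small random expression tree over x, y, and the constant 1."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice([("var", "x"), ("var", "y"), ("const", 1)])
    op = rng.choice(["+", "&", "^"])
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def paths(tree, path=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield path
    if tree[0] not in ("var", "const"):
        for i in (1, 2):
            yield from paths(tree[i], path + (i,))

def get_at(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return new_subtree
    children = list(tree)
    children[path[0]] = replace_at(tree[path[0]], path[1:], new_subtree)
    return tuple(children)

def mutate(tree, rng):
    """Replace one randomly chosen node with a freshly generated subtree."""
    target = rng.choice(list(paths(tree)))
    return replace_at(tree, target, random_tree(rng))

def crossover(a, b, rng):
    """Swap one randomly chosen subtree of a with one of b."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return replace_at(a, pa, get_at(b, pb)), replace_at(b, pb, get_at(a, pa))

rng = random.Random(7)
parent_a = ("+", ("var", "x"), ("^", ("var", "y"), ("const", 1)))
parent_b = ("&", ("var", "y"), ("var", "x"))
mutant = mutate(parent_a, rng)
child_a, child_b = crossover(parent_a, parent_b, rng)
```

Because both operators exchange or replace whole subtrees, every offspring is still a well-formed tree, so it can always be converted back to source code and evaluated.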

Each iteration of the evolutionary algorithm is referred to as a generation. At each gen-

eration, every individual in the population is evaluated using the fitness function. Then, the

algorithm chooses individuals in the population to advance to the next generation using the

selection function. Each individual is mutated with some mutation probability and mated

with another individual with some crossover probability. Because the selection function

typically prefers more fit individuals, we expect the overall fitness of the population to in-

crease with each generation. However, most often the metric we care about is the fitness of

the fittest individual in the population, not the overall fitness across the entire population.

This process repeats until the algorithm reaches some stopping point: a generation limit, a

maximally fit individual, or some other metric chosen by the programmer.

In general, genetic algorithms follow this structure:

Algorithm 2: Genetic Algorithm

population := initializeRandomPopulation()
numberOfGenerations := 0
while numberOfGenerations < maxGenerations do
    foreach individual in population do
        individual.fitness := evaluate(individual)
    end
    newPopulation := []
    while newPopulation.size < population.size do
        individual := selectFrom(population)
        if random(0,1) ≤ crossoverProbability then
            mate := selectFrom(population)
            offspring := crossover(individual, mate)
            newPopulation.add(offspring)
        end
        if random(0,1) ≤ mutationProbability then
            offspring := mutate(individual)
            newPopulation.add(offspring)
        end
    end
    population := newPopulation
    numberOfGenerations++
end
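The structure of Algorithm 2 can be exercised on a toy problem. The sketch below evolves bit lists toward all ones (often called OneMax); the problem, the parameter values, the binary-tournament selection, and the tracking of the best individual seen so far are our own choices for illustration and are not the configuration used in this thesis.

```python
import random

def run_ga(rng, genome_len=16, pop_size=20, max_generations=60,
           crossover_prob=0.7, mutation_prob=0.3):
    """Evolve bit lists toward all ones (the OneMax toy problem)."""

    def evaluate(individual):
        return sum(individual)                  # fitness = number of 1 bits

    def select_from(population):
        # Binary tournament: return the fitter of two random individuals.
        a, b = rng.choice(population), rng.choice(population)
        return a if evaluate(a) >= evaluate(b) else b

    def crossover(a, b):
        point = rng.randrange(1, genome_len)    # one-point crossover
        return a[:point] + b[point:]

    def mutate(individual):
        i = rng.randrange(genome_len)           # flip one random bit
        return individual[:i] + [1 - individual[i]] + individual[i + 1:]

    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    best = max(population, key=evaluate)
    for _ in range(max_generations):
        new_population = []
        while len(new_population) < pop_size:
            individual = select_from(population)
            if rng.random() <= crossover_prob:
                mate = select_from(population)
                new_population.append(crossover(individual, mate))
            if rng.random() <= mutation_prob:
                new_population.append(mutate(individual))
        population = new_population
        best = max(population + [best], key=evaluate)
        if evaluate(best) == genome_len:        # maximally fit individual found
            break
    return best

best = run_ga(random.Random(42))
```

Passing a seeded `random.Random` instance makes a run repeatable, which is convenient when comparing parameter settings.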

Genetic programming algorithms are very much the same. The differences between ge-

netic programming and genetic algorithms in general lie in implementations of the fitness,

crossover, and mutation functions.

Evolutionary and genetic algorithms are useful to hardware designers because their

results are often competitive in performance, area, and power

with what human designers are capable of [31]. Hardware designers use genetic algorithms

to automate hardware design, to produce results in an expanded search space human designers

would be unlikely to explore [32], or to make hardware systems adaptable without human

intervention [5, 33].

3.2 Evolvable Hardware In FPGAs

Evolvable hardware is most often implemented in FPGAs due to their reconfigurability.

Although evolvable hardware strategies can be used to design hardware for ASICs, evolv-

able hardware is most useful when a piece of hardware can be dynamically reconfigured

[33].

Most evolvable FPGA models have followed an intrinsic style of evolvable hardware.

We refer to an evolvable system as intrinsic when the evolutionary system is completely

contained within the hardware and the system is tested only in hardware, not in simulation

[32, 34]. This definition can be expanded on to introduce self-contained evolvable hard-

ware, where the entire evolutionary system is contained on the same piece of hardware. In

a self-contained system, the FPGA configuration is comprised of what are referred to as

an evolutionary module and a functional module [34]. The functional module contains the

evolved functionality the FPGA is responsible for. The evolutionary module is an oracle

responsible for evaluating and evolving the functional module.

Figure 3.4: Intrinsic Evolvable Hardware System

In Figure 3.4, the evolutionary module produces a genome and programs the functional module,

while leaving itself intact. The evolutionary module evaluates the functional module’s

fitness and improves upon the genome.

Placing both modules on the same piece of hardware requires a number of restrictions

on the evolutionary algorithm. The evolutionary algorithm must not modify the portion

of the bitstream corresponding to the evolutionary module. If the portion of the bitstream

containing the evolutionary module is modified, the program may break itself. Further,

the evolutionary algorithm must not produce invalid FPGA configurations to avoid perma-

nently damaging the FPGA hardware [35]. These restrictions on the evolutionary algo-

rithm complicate the evolutionary module, limiting how much space is available for the

functional module. In general, intrinsic evolvable hardware systems severely limit what

functionality can be placed on an FPGA [34].

Extrinsic evolutionary models aim to fix some of these problems. An evolutionary

system is referred to as extrinsic when circuit configurations are simulated before imple-

menting them in hardware. An extrinsic system might only implement the fittest individual

from a population, while the rest are discarded after simulation. Extrinsic evolutionary al-

gorithms are not self-contained and separate the evolutionary hardware from the functional

hardware, for example running the evolutionary algorithm on a computer that manages the

FPGA [36]. This frees all of the FPGA hardware for the functional module, allowing the

algorithm to produce more complex hardware configurations. However, designers must

still ensure the algorithm will never produce an invalid bitstream or one that may damage

the FPGA hardware. Extrinsic evolutionary models come at the cost of additional hardware

used for generating and simulating the FPGA configuration before programming.

Figure 3.5: Extrinsic Evolvable Hardware System

In Figure 3.5 the external evolutionary system evolves a genome and uses it to program

the logic device. The evolutionary system monitors the logic device to measure fitness, and

improves upon the genome.

In modern FPGAs, the capabilities of evolutionary hardware are limited by the com-

plexity of bitstreams used to program the devices. Traditional methods of evolutionary

hardware design require manipulating FPGA bitstreams to produce different circuit

configurations. Limiting evolutionary hardware to coherent circuits requires understanding how

the bitstream corresponds to lookup tables (LUTs) in the FPGA and limiting the

evolutionary algorithm’s search space to coherent configurations [36]. The correspondence between

bitstream and configuration in modern FPGAs is proprietary knowledge not available to

the public. Additionally, many FPGAs encrypt bitstreams, further complicating the evolu-

tionary hardware design process. Any algorithm that evolves an FPGA bitstream requires

more knowledge about FPGA bitstreams than is available to the public about modern FP-

GAs [37]. These algorithms may necessitate an unencrypted FPGA bitstream, which may

be inappropriate for applications that require a high degree of security.

3.3 Genetic Programming-based Evolvable Hardware

This section is organized into three subsections. The first provides a background and

discusses the challenges solved and difficulties presented by genetic programming-based

evolvable hardware. The second discusses past work in generating Verilog HDL code. The

third discusses preliminary results in producing simple circuits using genetic programming,

in experiments performed for this thesis.

3.3.1 Background and Justification

The idea of automatically generating Verilog HDL code has been explored in the past, by

Cullen [38] and Karpuzcu [39]. In [38], Cullen demonstrates using an extrinsic evolution-

ary programming technique referred to as Evolutionary Meta Programming to generate a

Verilog program. The program generated is a bit-slice of a full adder circuit, using behav-

ioral Verilog. In [39], Karpuzcu also builds a full adder using genetic programming. The

work demonstrates successfully building an entire adder module, including inputs and out-

puts, by evolving a circuit while following the Verilog grammar specification. These two

results are encouraging, and show that genetic programming may be a worthwhile approach

to building evolvable hardware systems.

A genetic programming-based approach to evolvable hardware might help to alleviate

issues with prior types of evolutionary hardware systems. Rather than evolving the FPGA

bitstream or a netlist, an evolutionary system should use genetic programming to produce

an FPGA program in a hardware description language (HDL), like Verilog or VHDL. A

comprehensive system design for evolvable hardware using genetic programming to generate HDL

has not been explored. Such an evolutionary system will generate a correct HDL program

and synthesize it into an FPGA bitstream using the FPGA vendor’s place-and-route and

synthesis tools.

A genetic programming-based approach has many advantages over traditional evolu-

tionary hardware models, even extrinsic ones. It removes the need for algorithm design-

ers to add restrictions to the algorithm for adherence to an FPGA configuration standard.

In evolutionary hardware strategies that evolve an FPGA bitstream, it is possible to gen-

erate a bitstream that electrically damages the FPGA, and evolutionary hardware design

frameworks have even been proposed just to mitigate that possibility [35]. A genetic

programming-based approach is guaranteed to produce valid FPGA configurations as long as the

synthesis tools behave correctly. A valid HDL program compiled by an FPGA vendor’s

toolchain should not produce any electrical errors, though logical errors are of course pos-

sible and expected in an evolutionary system.

This approach can be expected to produce correct logic for any given circuit using a

smaller genome than by evolving the bitstream. For example, an adder can be represented in

Verilog code using 5 nodes in an abstract syntax tree (AST): one assignment, one addition,

two operands, and a destination for the assignment. In contrast, creating a 4-bit adder in a

bitstream requires configuring at least 256 lookup tables, or LUTs.
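The node count above can be made concrete with a small sketch. The nested-tuple encoding and the names `ADDER_AST` and `count_nodes` are assumptions made for illustration; they do not come from any Verilog parser or from the frameworks cited in this work.

```python
# Hypothetical encoding: (operator, child, child, ...) for internal nodes,
# plain strings for leaves. Models the single statement `assign sum = a + b;`.
ADDER_AST = ("assign", "sum", ("+", "a", "b"))


def count_nodes(node):
    """Count every node in the tree, internal nodes and leaves alike."""
    if isinstance(node, str):
        return 1
    # One for the operator itself, plus all nodes in each child subtree.
    return 1 + sum(count_nodes(child) for child in node[1:])


print(count_nodes(ADDER_AST))  # prints 5
```

The five nodes are the assignment, its destination, the addition, and the two operands, which matches the count given in the text.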

However, the ability of such a system to produce efficient circuits is limited by the capabilities

of the FPGA vendor's synthesis tools. In one of the founding papers of evolvable

hardware, Thompson demonstrates that an evolved hardware design exploited quirks in the

FPGA hardware [32]. In this work, Thompson evolved an FPGA bitstream to produce

a pattern-recognizing circuit that was able to discriminate between 1 kHz and 10 kHz

frequency waves. The circuit used far fewer cells than a human-designed filter, but this was

due to the circuit exploiting unusual electrical properties of the cells in the FPGA. The circuit

had logically disconnected components that, when removed, prevented the circuit from

working. Such behavior may not be possible to replicate using FPGA synthesis tools.

3.3.2 Past Work

In [38], Cullen uses genetic programming to evolve a series of software programs, including

a full adder in Verilog, seen in Listing 3. In this work, Cullen develops an evolutionary

programming technique referred to as Evolutionary Meta Programming. Evolutionary

Meta Programming is an evolutionary programming technique where compilation and

testing are completely separated from the evolutionary engine, as is the case in extrinsic

evolvable hardware and any genetic programming-based hardware model. An evolutionary

engine evolves a program’s genome, and creates a separate process on the computer for

evaluating each program, usually with a pipe in between the two programs to pass infor-

mation.

module chmain (x, y, z, s, c);
  input x, y, z;
  output s, c;
  wire V0;
  wire V1;
  assign c = (z ^ (~z & (z ^ (x & (x + ~y)))));
  assign s = ((z + (~c + (z + (~c + (z ^ (x & (x ^ x))))))) + (x + y));
endmodule

Listing 3: Full Adder Slice [38]

Cullen’s results are relevant because this is the style of genetic programming necessi-

tated by FPGAs. Unfortunately, Cullen does not quantify results in this paper - there is

no mention of how many generations it took to evolve this circuit. Regardless, being able

to evolve a simple circuit using evolutionary programming is an encouraging result. It is

important to note that in this paper, Cullen evolved only the assignment statements and the

two miscellaneous wire declarations. The module declaration, inputs, and outputs were

fixed. Regardless, the result is still worthwhile - module declarations are usually specified,

not something that needs to be figured out by an algorithm.

<S> :: <blocking-assignment-s> <blocking-assignment-cout>
<blocking-assignment-s> :: assign s = <rhs> ;
<blocking-assignment-cout> :: assign cout = <rhs> ;
<rhs> :: <binary-op> | <logical-not>
<binary-op> :: <bitwise-and> | <bitwise-or> | <bitwise-xor>
<bitwise-and> :: ( <argument> & <argument> )
<bitwise-or> :: ( <argument> | <argument> )
<bitwise-xor> :: ( <argument> ^ <argument> )
<logical-not> :: ! ( <argument> )
<argument> :: <invar> | <binary-op-out> | <logical-not-out>
<argument-out> :: <invar> | <binary-op-in> | <logical-not-in>
<binary-op-out> :: <bitwise-and-out> | <bitwise-or-out> | <bitwise-xor-out>
<bitwise-and-out> :: ( <argument-out> & <argument-out> )
<bitwise-or-out> :: ( <argument-out> | <argument-out> )
<bitwise-xor-out> :: ( <argument-out> ^ <argument-out> )
<binary-op-in> :: <bitwise-and-in> | <bitwise-or-in> | <bitwise-xor-in>
<bitwise-and-in> :: ( <invar> & <invar> )
<bitwise-or-in> :: ( <invar> | <invar> )
<bitwise-xor-in> :: ( <invar> ^ <invar> )
<logical-not-out> :: ! ( <argument-out> )
<logical-not-in> :: ! ( <invar> )
<invar> :: a | b | cin

Figure 3.6: Reduced Verilog BNF [38]

In [39], Karpuzcu uses a genetic programming technique following the grammar of

Verilog to generate parse trees instead of generating ASTs. This evolutionary programming

strategy is referred to as Grammatical Evolution [40]. A grammar of a language specifies

the path a compiler will take while parsing the code. It describes how each type of statement

can expand. For example, an assignment statement right hand side can expand to either a

binary operation or a logical negation (Figure 3.6). Grammatical Evolution requires the

grammar of the language to be expressed in Backus-Naur Form (BNF) [41]. Individuals

are represented using variable-length genomes that essentially pick decisions in the parse

tree of the BNF [40].
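A minimal sketch of the genome-to-program mapping described above follows, assuming a heavily simplified version of the grammar in Figure 3.6. The dictionary `GRAMMAR`, the function `map_genome`, and the wrapping limit are hypothetical choices for this example, not the implementation used in [39] or [40]. Unlike the reduced BNF, this toy grammar is recursive, so the sketch gives up once the genome has been reused a fixed number of times.

```python
# Simplified, hypothetical grammar: each nonterminal maps to a list of productions.
GRAMMAR = {
    "<rhs>": [["<binary-op>"], ["<logical-not>"]],
    "<binary-op>": [
        ["(", "<arg>", "&", "<arg>", ")"],
        ["(", "<arg>", "|", "<arg>", ")"],
        ["(", "<arg>", "^", "<arg>", ")"],
    ],
    "<logical-not>": [["!", "(", "<arg>", ")"]],
    "<arg>": [["<invar>"], ["<binary-op>"], ["<logical-not>"]],
    "<invar>": [["a"], ["b"], ["cin"]],
}


def map_genome(genome, start="<rhs>", max_wraps=2):
    """Expand the leftmost nonterminal repeatedly, letting codons pick productions."""
    symbols = [start]
    output = []
    used = 0
    limit = len(genome) * (max_wraps + 1)  # total codon reads allowed, with wrapping
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)  # terminal: emit it unchanged
            continue
        choices = GRAMMAR[symbol]
        if len(choices) == 1:
            production = choices[0]  # no decision to make, no codon consumed
        else:
            if used >= limit:
                return None  # genome exhausted before the program was complete
            codon = genome[used % len(genome)]
            used += 1
            production = choices[codon % len(choices)]
        symbols = list(production) + symbols
    return "".join(output)


print(map_genome([0, 2, 0, 0, 0, 1]))  # prints (a^b)
```

Here the codons select, in order, a binary operation, the XOR production, an input variable `a`, and an input variable `b`. A genome that keeps choosing recursive productions never terminates and is mapped to `None`, which an evolutionary engine could treat as an invalid individual.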

Figure 3.6 provides the reduced BNF of the Verilog grammar used by Karpuzcu in [39].

This BNF provides the opportunity for the code generator in the algorithm to make two

types of statements: an assignment statement to the output variable s, and an assignment

statement to the output variable cout. Again the module header is predefined, and the

algorithm is only responsible for generating the correct expressions for assignment to both

the sum and the carry out variables.

Karpuzcu provides quantified results for generating the adder. The algorithm uses a

population size of 200, crossover probability of 0.5, and mutation probability of 0.1. In

2 of 35 runs, a completely correct individual is generated. Those individuals took 50369

and 19772 fitness evaluations, respectively [39]. Although the results are not particularly

consistent, it is worthwhile that a completely correct individual was generated twice, using

a different evolutionary programming strategy.

module adder(a,b,cin,s,cout);
  input a; input b; input cin;
  output s; output cout;
  assign s=(a^(b^cin));
  assign cout=(((b^a)&(a^cin))^a);
endmodule

Listing 4: Adder Evolved by Karpuzcu [39]

Listing 4 shows one adder generated in this work [39]. It is interesting to note that while

the logic generated was slightly more complicated than necessary (the cout assignment

could be simplified), it was significantly smaller than that generated by Cullen in [38], and

nearly ideal.
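The correctness of the expressions in Listing 4 can be checked exhaustively, since a full adder has only 8 input rows. The sketch below transcribes the two assignments into Python bitwise operators and compares them against integer addition; the function names are assumptions made for this example.

```python
from itertools import product


def evolved_adder(a, b, cin):
    """Direct transcription of the two assign statements in Listing 4."""
    s = a ^ (b ^ cin)
    cout = ((b ^ a) & (a ^ cin)) ^ a
    return s, cout


def reference_adder(a, b, cin):
    """Reference behavior: the low bit is the sum, the high bit is the carry."""
    total = a + b + cin
    return total & 1, total >> 1


# Collect every input row where the evolved logic disagrees with the reference.
mismatches = [
    bits for bits in product((0, 1), repeat=3)
    if evolved_adder(*bits) != reference_adder(*bits)
]
print(mismatches)  # prints []
```

An empty list means the evolved expressions agree with a full adder on all 8 rows.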

These results provide confidence in the idea that evolving Verilog (or other HDL) pro-

grams is the future of evolvable hardware.

3.3.3 Preliminary Results

To test this approach to evolvable hardware, we use the DEAP genetic programming frame-

work [42], introducing to the framework valid Verilog operators, to produce ASTs that can

be tested in the DEAP Python environment and are compiled and run on the Icarus Verilog

[43] FPGA simulator for fitness evaluation.

DEAP natively represents programs as a depth-first traversal of an AST, though the

program representation can be configured. For now, the genetic algorithm is only allowed

to use the basic combinational logical operators available in Verilog: AND, OR,

NOT, etc. These results use a genetic programming algorithm (as opposed to evolutionary

programming), adding a crossover function.

To demonstrate the model, a genetic programming algorithm is used to evolve an 8 to

1 MUX, an 11-input 1-output piece of hardware. Sample code for a MUX evolution is

in fact provided in the DEAP framework documentation [44], and the problem was first

introduced by Koza in [45]. The algorithm uses a population size of 1000, random growth

mutation with a probability of 0.3, and single-point crossover with a probability of 0.8.

Additionally, a hard tree depth limit of 80 is added to prevent bloat. The initialized

population is 1000 randomly generated ASTs with a depth between 3 and 5.
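As a rough sketch of how a candidate might be scored on this problem, the code below counts how many of the 2048 input rows a candidate function gets right. It is plain Python rather than the DEAP example cited above, and the bit ordering (three select bits first, most significant first, followed by eight data bits) as well as the names `mux8_reference` and `fitness` are assumptions made for illustration.

```python
from itertools import product


def mux8_reference(select, data):
    """Reference 8 to 1 MUX: the select bits index into the data bits."""
    index = select[0] * 4 + select[1] * 2 + select[2]
    return data[index]


def fitness(candidate):
    """Number of the 2**11 input rows on which the candidate matches the MUX."""
    correct = 0
    for bits in product((0, 1), repeat=11):
        select, data = bits[:3], bits[3:]
        if candidate(select, data) == mux8_reference(select, data):
            correct += 1
    return correct


# A candidate that always outputs 0 is right on exactly half of the rows.
print(fitness(lambda select, data: 0))
```

A fully correct individual scores 2048. Partially correct individuals still earn credit, which is what lets selection distinguish between them.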

Preliminary testing shows that beginning with a randomly generated AST, the algorithm

can consistently produce a correct 8 to 1 MUX within 250 generations. Results also show

that various correct ASTs generated within a single run of the algorithm are fairly similar,

but ASTs generated in different runs of the algorithm often have no nodes in common aside

from the inputs. ASTs were graphed using the NetworkX Python library [46].

Figure 3.7: AST of 8 to 1 MUX from one run

Figure 3.8: Alternate AST of 8 to 1 MUX from same run

Figure 3.9: AST of 8 to 1 MUX from a second run

Figure 3.10: AST of 8 to 1 MUX from a third run

These results should give a good idea of whether genetic programming is scalable to

larger circuits. The circuits generated by Karpuzcu in [39] and Cullen in [38] are 3-input

circuits, significantly simpler than an 11-input circuit. An 8 to

1 MUX has 2048 possible input combinations (2^11), whereas a full adder has only 8 (2^3). These

MUXes were evolved in approximately 250,000 evaluations, but that number may be im-

proved upon by tuning the algorithm over time.

Figures 3.7 through 3.10 provide four correct abstract syntax trees representing the

input-output relationship of an 8 to 1 MUX. Input bits IN0, IN1, and IN2 are the 3 select

bits, and the rest of the inputs are the 8 MUX input bits. Note the similarities in the ASTs in

Figures 3.7 and 3.8. These are two correct ASTs generated during the same evolution (i.e.

from the same seed population). Contrast these with Figures 3.9 and 3.10, both generated

from different random seeds.

These results suggest that evolutionary algorithms settle on a type of correctness and

produce homogeneity over time. This is likely due to an aggressive selection for more

correct individuals over less correct ones, and can be seen in any application of evolution-

ary algorithms. Once an individual becomes significantly more correct than the rest of a

population, it might be expected that it dominates the following generations. It is ques-

tionable whether this homogeneity is desirable. Exploring how to limit this homogeneity

by adjusting the algorithm’s parameters (or doing something radically different) may be a

worthwhile direction for future work.

3.4 Trust-Oriented Applications of Evolvable Hardware

This section explains past applications of evolvable hardware in FPGA trust and security,

and introduces the idea of removing Trojans from FPGAs or mitigating their effects using

evolvable hardware.

One of the strongest applications of evolvable hardware has been in introducing adap-

tive fault tolerance to FPGA systems. Using evolvable computing systems to introduce

fault tolerance into a system was first discussed by Mange et al. in [47]. This work sug-

gests using evolvable computing systems to monitor and repair faults at run-time. Since the

proposal, much work has been done in the field of evolvable hardware-based fault tolerance

[48, 5, 49].

Fault tolerance built through evolvable hardware was explored by Canham et al. in

[48]. In this work, Canham et al. use evolutionary algorithms to develop a fault tolerant

hardware design. This work is particularly interesting because the hardware design did not

need to be evolved after a fault. The hardware design was created using an evolutionary

algorithm, and after it was finished faults were injected. This evolved design was more

innately fault tolerant than human-designed creations.

Table 3.1: Fault Tolerance in Evolved Hardware - Canham et al.

Circuit Type      Number of Faults   Number of Failures   Failures per Fault
Standard          2443               414                  0.169
Fault Tolerant    59267              796                  0.0134

Canham et al. found that evolved circuits had more intrinsic fault tolerance. Table

3.1 shows the difference in fault tolerance between standard and evolved fault tolerant

circuits [48]. This difference is striking - the evolved circuits had more than 10 times the

fault tolerance of the human-designed circuits (0.0134 versus 0.169 failures per fault).
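The failures-per-fault column of Table 3.1 can be reproduced from the raw counts. The short calculation below uses only the numbers in the table; the variable names are chosen for this example.

```python
# Counts taken from Table 3.1.
standard_faults, standard_failures = 2443, 414
tolerant_faults, tolerant_failures = 59267, 796

standard_rate = standard_failures / standard_faults
tolerant_rate = tolerant_failures / tolerant_faults

# How many times more often a fault causes a failure in the standard circuit.
ratio = standard_rate / tolerant_rate

print(round(standard_rate, 3), round(tolerant_rate, 4), round(ratio, 1))
```

The ratio comes out between 12 and 13, so a fault in the standard circuit was over ten times more likely to cause a failure than a fault in the evolved circuit.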

The fault tolerance introduced by evolvable hardware can be further improved by adding

adaptable evolutionary fault tolerance systems. These systems continuously monitor the

hardware and reconfigure it in an attempt to mitigate faults whenever a fault is detected.

This is a more costly practice, but may significantly improve fault tolerance in FPGA sys-

tems.

In [5], Larchev et al. present a fault tolerance system using evolvable hardware. The

purpose of this system is to repair FPGAs experiencing hardware faults while they are in

space, far from anybody who can replace the FPGA. Larchev et al. explore the ability

to generate correct circuits using evolvable hardware in the presence of multiple stuck-at

faults in the FPGA hardware. The work focuses on four circuits: a quadrature decoder, a

3-by-3 bit multiplier, a 3-by-3 bit adder, and a 4-to-7 decoder.

Table 3.2: Fault Tolerance Systems - Larchev et al.

Circuit Type         Average Initial   Average Final   Average #
                     Correctness       Correctness     Fitness Evals
Quadrature Decoder   76.7%             99.5%           546,226
Multiplier           83.3%             95.83%          4,250,990
Adder                73.4%             94.38%          -
Decoder              77.9%             99.2%           -

This work demonstrates the ability for evolutionary algorithms to significantly improve

circuits in the presence of faults. Larchev et al. show results that are not perfect, but still

very promising [5].

Recent years have shown an increasing interest among both researchers and the general

public in hardware security. As hardware designers become more conscious of the security

implications of their designs, we should consider what applications evolutionary hardware

has to security and trust. In this section we present two methods to leverage our evolu-

tionary hardware model to improve an application’s security. Mal-Sarkar et al. proposed

a taxonomy of hardware Trojans in FPGAs, and we propose methods to mitigate Trojans in

FPGA IP and Trojans in FPGA hardware that cause circuit malfunction [1].

3.4.1 Applications to Trojans in FPGA IP

As mentioned in earlier sections, Trojans in FPGA IP are a fairly new concept, and as

such there has been little to no work done on methods to mitigate the effects of these

Trojans. Including unverified code in security-sensitive applications has not been a reality

of hardware design until fairly recently. Trojans in FPGA IP can be thought of as being

similar to malicious code in software applications. Ultimately, a Trojan can manifest itself

in two ways - as either a fault or an unwelcome side effect - introduced in the code used to

program an FPGA.

Because these faults are introduced in the code, a genetic programming-based evolvable

hardware system is aptly equipped to repair them. This idea is promising because of the

great ability of evolutionary algorithms to repair errors in computer software [50], which is

very similar in structure to FPGA IP [31]. Hardware designers can take advantage of a variety

of different fault detection techniques, like anomaly detection or triple modular redundancy

(in this case, redundant modules should be purchased from separate vendors) to monitor

for faults in the FPGA system. If a fault is detected, an evolutionary algorithm can be used

to remove the fault from the FPGA code and produce a more correct configuration.

Genetic programming’s applications to FPGA IP Trojan removal are discussed in depth

in the next section.

3.4.2 Applications to Trojans in FPGA Hardware

Trojans may also exist in FPGA hardware, i.e. in the FPGA chip [36, 1, 4, 10, 9]. As

more semiconductor manufacturing and fabrication moves overseas, hardware Trojans are

becoming a greater concern to all chip designers. FPGAs are particularly vulnerable to

hardware Trojans due to their easy-to-understand layouts compared to other more complex

and diverse chips, like CPUs. Hardware designers would like to know that their designs

will function correctly despite the possibility of Trojans or other faults in their hardware.

One might expect a hardware Trojan to manifest itself as a fault injection in an FPGA.

In this sense, a hardware Trojan manifesting as a circuit malfunction is the same problem as

an accidental circuit malfunction due to a fabrication error or any other cause. Protecting

from hardware Trojans can be viewed as a type of fault tolerance. Larchev et al. presented

findings on discovering and working around hardware faults using more traditional evolu-

tionary algorithms [5]. These findings show that evolutionary algorithms can be used to

produce correctly functioning FPGA circuits even in the face of hardware faults.

Hardware designers might expect to be able to use genetic programming in the same

way as these older evolutionary fault tolerance mechanisms. Results in Figures 3.7 through

3.10 show that separate evolutions of a piece of hardware (in this case, the 8 to 1 MUX)

produce a very diverse set of abstract syntax trees. If it can be found that different abstract

syntax trees correspond to significantly different FPGA configurations after running the

circuit through place-and-route and synthesis, then a genetic programming-based evolutionary

model will be able to work around hardware faults in the same manner as the earlier

evolutionary fault tolerance algorithms that are no longer possible.

CHAPTER 4

GENETIC PROGRAMMING-BASED EVOLVABLE HARDWARE FOR FPGA

SECURITY

Despite the various Trojan tolerance systems available to hardware designers working with

FPGAs, there is not a single system that can be used to protect against any type of Trojan.

Some Trojans, such as those that leak information, are able to produce their malicious effect

even in the presence of Trojan tolerance systems. Because evolvable hardware has been

used to reconfigure FPGAs to repair hardware faults, it is worthwhile to consider whether a

similar system can be used to repair hardware Trojans in FPGA IP. Genetic programming is

particularly appropriate, because it deals directly with the HDL code delivered to hardware

designers when they purchase 3rd-party IP. Rather than just tolerating Trojans, a system

using genetic programming should be able to completely repair Trojans, removing any

conceivable threat from FPGA IP.

This chapter proposes a novel system titled GENPEFS (GENetic Programming-based

Evolvable FPGA Security) for using genetic programming to remove Trojans from FPGA

IP. This system design and its results have been submitted in [14]. The chapter then analyzes the

effectiveness of this approach on three hardware circuits infected with different types of

Trojans.

4.1 System Design

GENPEFS is a run-time Trojan tolerance extrinsic evolutionary system that uses a proces-

sor capable of FPGA synthesis and place-and-route to evolve a hardware configuration for

an FPGA. This processor continuously monitors the FPGA for any type of Trojan. Rather

than evolving the FPGA bitstream, the GENPEFS system uses genetic programming to produce

a program in a hardware description language, like Verilog or VHDL. A similar approach

has been applied to generating Verilog code from scratch, and has produced promising results

[38, 39]. A system using genetic programming to enable evolvable hardware should

generate a correct Verilog program and synthesize it into an FPGA bitstream using the

FPGA vendor's place-and-route and synthesis tools [36]. Removing a Trojan from IP is ultimately

a software problem, and past work has demonstrated that evolutionary and genetic

algorithms are effective at repairing software [6].

Because 3rd-party IP is often delivered as HDL code, GENPEFS is particularly useful

for removing any Trojans that are detected in IP. If a Trojan is not detected during testing, it

may be difficult to deal with its malicious effects once the FPGA is deployed. Most Trojan

tolerance schemes such as TMR are not equipped to deal with any Trojans whose payloads

are side effects. Even in a triply redundant system, infected FPGAs will still be able to leak

information. We expect GENPEFS to be particularly useful in removal of information-

leaking Trojans in FPGA applications where an FPGA cannot be immediately taken offline

for maintenance. The goal of GENPEFS is to remove Trojans from an infected IP block

at run-time while leaving the desired functionality intact. This approach provides many

advantages over traditional Trojan tolerance schemes. Rather than causing downtime by

bringing an entire FPGA offline, we prefer a solution that allows us to fix the FPGA while

it is deployed.

One difficulty in repairing Trojans in FPGA IP is the relative inability to trigger Trojans

during testing. Trojans are often intentionally hidden by their designers, and very difficult

to trigger during testing due to rare triggers. Even if a Trojan is caught at run-time, it may

be nearly impossible to recreate the conditions at test-time that triggered the Trojan, and

the Trojan may never be seen again. It would be useful to maintain a queue of candidate

FPGA configurations that have passed all tests after being evolved in the GENPEFS system.

Whenever a Trojan is detected at run-time, the system should mark the configuration as

infected, and replace it with a candidate system from the queue. It is worth considering

that smaller ASTs that consistently produce correct results might be Trojan-free, as they

have removed something from the tree that is rarely triggered.

Figure 4.1: GENPEFS System Design

Figure 4.1 shows a system diagram for an FPGA using GENPEFS. A genetic engine

continuously evolves FPGA configurations. All configurations that pass all test benches

in simulation are added to a queue of candidate configurations. A monitoring application

watches the FPGA for Trojans. Whenever a Trojan is detected by the monitoring system,

it prompts the synthesis of a candidate configuration. When the configuration is synthe-

sized, the FPGA is reprogrammed with the new configuration. The monitoring application

continues to watch the FPGA, and the process repeats as necessary.
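The control flow in Figure 4.1 can be sketched as a small queue-based controller. This is a toy model under stated assumptions: configurations are stand-in strings, synthesis and reprogramming are reduced to swapping the active configuration, and the class name `GenpefsController` and its methods are hypothetical. The behavior when the queue is empty (keep running the current configuration) is also an assumption, since the text does not specify it.

```python
from collections import deque


class GenpefsController:
    """Toy model of the candidate queue and monitor hand-off in Figure 4.1."""

    def __init__(self, active_config):
        self.active = active_config  # configuration currently on the FPGA
        self.candidates = deque()    # evolved configurations that passed simulation
        self.infected = []           # configurations flagged by the monitor

    def add_candidate(self, config, passed_all_test_benches):
        """Called by the genetic engine; only fully passing configurations are queued."""
        if passed_all_test_benches:
            self.candidates.append(config)

    def on_trojan_detected(self):
        """Called by the monitor; marks the active configuration and swaps in a candidate."""
        self.infected.append(self.active)
        if not self.candidates:
            return None  # assumption: nothing to swap in, keep the current configuration
        self.active = self.candidates.popleft()  # stands in for synthesis and reprogramming
        return self.active


controller = GenpefsController("ip_v0")
controller.add_candidate("ip_v1", passed_all_test_benches=True)
controller.add_candidate("ip_v2", passed_all_test_benches=False)
print(controller.on_trojan_detected())
```

In this run only `ip_v1` is queued, so the first detection replaces `ip_v0` with `ip_v1` and records `ip_v0` as infected.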

A genetic programming-based approach has many advantages over traditional evolu-

tionary hardware models. It is guaranteed to produce valid FPGA configurations, and the

author of an evolutionary algorithm need not add restrictions to the algorithm to cause it to

produce valid configurations. We can expect such an approach to produce correct circuits

faster than evolving a bitstream due to the simplified and abstracted nature of HDLs. For

example, an adder can be represented in Verilog code using 5 nodes in an abstract syntax

tree (AST): one assignment, one addition, two operands, and a destination for the assignment.

In contrast, creating a 4-bit adder in a bitstream requires configuring at least 256

lookup tables, or LUTs. However, its ability to produce efficient circuits is limited by the

capabilities of the FPGA vendor's synthesis tools [32].

4.2 Experimental Approach

To demonstrate the effectiveness of GENPEFS, this work uses the DEAP genetic programming

framework [42] to produce valid Verilog programs using operators that are available

in Verilog. Initial fitness evaluation - testing combinational logic - is performed within the

DEAP simulator. When a fit individual is believed to have been created, the individual is

translated to Verilog and tested in the Icarus Verilog simulator [43]. The genetic algorithm

represents programs as an abstract syntax tree (AST) as seen in Figure 4.2. The AST using

boolean logic operations can be thought of as being similar to a digital circuit. Of course,

ASTs may also contain integer, floating point, etc. operators like + and /, and more complex

programming structures such as if statements and loops.

Figure 4.2: AST of correct 4 to 1 MUX

To demonstrate Trojan removal, this work analyzes removing Trojans from multiplex-

ers, fairly simple circuits. First this work uses a 4 to 1 MUX. An example of a program that

can be used to evolve a MUX in Python from scratch is provided in the DEAP documentation

and was originally explored by Koza in [45]. This work uses a population size of 1000,

random mutation with a probability of 0.3, and single-point crossover with a probability of

0.8. This work uses a two-parameter fitness function that measures program correctness as

the primary parameter and minimizes AST size as the secondary parameter. A hard AST

depth limit of 80 is imposed on the algorithm.
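One simple way to express a two-parameter fitness of this kind is a lexicographic comparison, in which AST size only breaks ties between equally correct individuals. The sketch below illustrates the idea with Python tuples; it is not necessarily how the fitness was configured in this work, and the individuals listed are made-up values.

```python
def fitness_key(correct_rows, ast_size):
    """Primary objective: maximize correctness. Secondary: minimize AST size."""
    # Negating the size lets a single max() handle both objectives.
    return (correct_rows, -ast_size)


# Made-up individuals: (label, correct rows out of 64, AST node count).
population = [("A", 60, 15), ("B", 64, 40), ("C", 64, 22)]

best = max(population, key=lambda ind: fitness_key(ind[1], ind[2]))
print(best[0])  # prints C
```

Individual A is the smallest but is not fully correct, so it loses to both B and C. Between the two fully correct individuals, the smaller tree C is preferred.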

This process is used to measure the effectiveness of the GENPEFS Trojan removal

strategy:

1. Initialize a random population of hardware circuits

2. Evolve the population until there is a correct MUX in the population

3. Randomly insert a fault-injecting Trojan into the circuit

4. Replace the entire population with this individual

5. Continue evolving the circuit until the population has a correct individual

6. Measure the number of generations after inserting a Trojan it takes to have a com-

pletely correct individual in the population
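Steps 3 through 5 can be illustrated on a deliberately tiny circuit. The sketch below uses a 2 to 1 MUX encoded as nested tuples, inserts a hypothetical fault-injecting Trojan that flips the output when both data inputs are 1, and then tries every single "hoist" edit (replacing one node with one of its children). This exhaustive search is a stand-in for the evolutionary search, not the mutation and crossover operators used in this work, and all names are assumptions made for the example.

```python
from itertools import product

INPUTS = ("s", "d0", "d1")
OPS = {
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
    "xor": lambda x, y: x ^ y,
    "not": lambda x: 1 - x,
}


def evaluate(node, env):
    """Evaluate a nested-tuple AST; leaves are input names."""
    if isinstance(node, str):
        return env[node]
    args = [evaluate(child, env) for child in node[1:]]
    return OPS[node[0]](*args)


def score(tree):
    """Number of the 8 input rows on which the tree behaves like a 2 to 1 MUX."""
    hits = 0
    for values in product((0, 1), repeat=3):
        env = dict(zip(INPUTS, values))
        expected = env["d1"] if env["s"] else env["d0"]
        hits += int(evaluate(tree, env) == expected)
    return hits


def hoists(node):
    """Yield every tree made by replacing one internal node with one of its children."""
    if isinstance(node, str):
        return
    for child in node[1:]:
        yield child
    for i in range(1, len(node)):
        for variant in hoists(node[i]):
            yield node[:i] + (variant,) + node[i + 1:]


CLEAN = ("or", ("and", "s", "d1"), ("and", ("not", "s"), "d0"))
TROJAN = ("and", "d0", "d1")       # trigger: both data inputs are 1
INFECTED = ("xor", CLEAN, TROJAN)  # payload: flip the output when triggered

repaired = [tree for tree in hoists(INFECTED) if score(tree) == 8]
print(score(INFECTED), repaired == [CLEAN])
```

The infected tree is wrong on the two rows where the trigger fires. Of all the single-hoist edits, only the one that discards the Trojan subtree restores full correctness.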

The Trojans inserted into the circuit are randomly chosen from a variety of fault-

injecting Trojans. Some of these Trojans are transient (making them harder to detect),

while some of them always produce the same incorrect result. Plots are graphed using

Matplotlib [51].

4.3 Results

Results are shown in histograms, categorizing into bins of fewer than 1000, 2000, 4000,

8000, and more than 8000 fitness evaluations. 1000 fitness evaluations corresponds to one

generation. The results show that in a 4 to 1 MUX, the algorithm successfully removed

the Trojan from the circuit in under 1000 fitness evaluations in most of the runs. All of the

remaining runs took under 8000 fitness evaluations.

Figure 4.3: 4 to 1 MUX Number of Generations To Remove Trojans

Next, the same tests are retried instead using an 8 to 1 MUX. An 8 to 1 MUX has 2048

possible input combinations, as opposed to the 4 to 1 MUX’s 64. The 8 to 1 MUX also has

a significantly larger AST, as can be seen in Figure 4.4. This helps demonstrate that the

GENPEFS approach can scale to larger circuits.

Figure 4.4: AST of correct 8 to 1 MUX

Figure 4.5: 8 to 1 MUX Number of Generations To Remove Trojans

Removing a Trojan from the AST of an 8 to 1 MUX took approximately the same

number of generations as the 4 to 1 MUX. More Trojans took more than 4000 fitness

evaluations to remove. This is consistent with the idea that more complicated ASTs take

longer to remove Trojans from, but it is promising that the results look mostly the same.

Next, we run the same tests on an infected AES encryption module built using PyRTL

[52]. An AES encryption module is an example of an actual IP that hardware designers

may choose to purchase. It provides a much more realistic example of an IP that may be

infected. We can expect an encryption module to be a desirable IP to attack, as encryption

modules process large amounts of secret information. An infected encryption module may

allow attackers access to valuable information.

This module is infected with one Trojan that leaks information, a type of Trojan not

tested against GENPEFS yet (and one that is unaffected by many Trojan tolerance

schemes). Each time the encryption module is used, the Trojan sends the secret to be

encrypted over a serial communication port, similar to the Trojan introduced in Figure 2.

This AES encryption module is a significantly more complex IP than either MUX, so it

should be expected that removal will take somewhat longer. The goal is to remove the

Trojans that leak the key and secret, while leaving the rest of the module intact. The Trojan

is considered removed when the encryption module still passes the PyRTL test cases [52],

and the information-leaking Trojan has been removed entirely from the circuit.

In the AES core with a single Trojan (Figure 4.6), removal was actually faster than

in the 8 to 1 MUX on average. This may be due to the all-or-nothing fitness of a more

complicated core. Even when there is an error in a MUX, it will evaluate many of the inputs

correctly. In the far more complicated AES core, a small change to the logic will make

every result incorrect. This means that the selection function will choose only completely

correct individuals for the next generation, and will tend to keep fewer of the mutations

between generations. Because misleading, almost-correct results are no longer likely, the

algorithm never gets stuck for a long time.

Figure 4.6: AES Encryption Module Number of Fitness Evaluations To Remove One Trojan

Finally the same tests are retried on the same AES core, this time infected with two Tro-

jans instead of just one. This test should help determine whether the approach is scalable

to systems with more than one Trojan present. Results are in Figure 4.7.

Although Trojan removal took longer than in either of the MUXes and in the AES

core with one Trojan, the IP was still Trojan-free after 8000 fitness evaluations in all but a

few runs. No runs were able to eliminate both Trojans in under 1000 fitness evaluations, or

within one generation. This is due to the nature of single point crossover. If the two Trojans

are at different points in the abstract syntax tree - the second Trojan is not within the first

Trojan’s subtree - then it is impossible for single point crossover to remove both Trojans

from the circuit in one operation. Future work might consider adding additional mutation

and crossover operations, such as two point crossover or a tree shrinking operation, to

examine whether those operations are able to more quickly eliminate Trojans from a circuit.

Figure 4.7: AES Encryption Module Number of Fitness Evaluations To Remove Two Trojans

In the AES core, there was no difference between checking the structure of the circuit to

make sure it was intact, and performing logic testing on the circuit. In other words, the only

situation in which the algorithm produced a correct individual was when the algorithm had

removed the Trojan and made no other changes to the circuit. The algorithm was not able

to make any changes to the rest of the circuit without breaking some of its functionality.

This leads to the idea that any time a component is removed from the circuit and it still

passes test benches, the removed component was unnecessary logic, possibly a Trojan.

These results demonstrate the value of genetic programming in repairing Trojans in

FPGA device IP, and especially in removing information-leaking Trojans. While most Tro-

jan tolerance strategies are unable to deal with information-leaking Trojans, GENPEFS is

able to remove such a Trojan with relative ease, returning the FPGA system to normal activ-

ity without human intervention. The ability to remove Trojans from FPGA IP should give

hardware designers confidence that their designs will be returned to normal operation after

a Trojan is detected. Future work should develop a metric for determining confidence in the

removal of a Trojan from a circuit, potentially using AST diversity or number of subcircuits

removed from the design to measure whether a Trojan might have been removed.

CHAPTER 5

CONCLUSION

This work has provided a novel classification of hardware Trojans in FPGA IP, based on

past work done in creating hardware Trojans in Verilog. This classification will allow

hardware designers to consider all possible types of Trojans when designing hardware. A

comprehensive classification helps assure hardware designers that they have covered all of

their bases, so to speak.

Future work in hardware Trojan classification will involve finding or creating new types

of hardware Trojans, particularly inspired by different types of Trojans found in software.

Although the proposed taxonomy is meant to cover all current types of Trojans, more work

is needed especially in examining Trojans that introduce other vulnerabilities into the sys-

tem. In order to more completely cover those Trojans, an analysis of different types of

vulnerabilities in FPGA systems is required.

This work has also presented GENPEFS, a novel approach to Trojan tolerance built around using genetic programming to repair HDL code. This proposed approach to FPGA security is novel in that it attempts to remove Trojans from the FPGA rather than just working around them. Additionally, no past work has explored using genetic programming to continuously monitor and repair FPGAs. This idea was largely inspired by the fault-tolerant FPGA system proposed by Larchev et al. in [5]. Rather than just using evolvable hardware to mitigate hardware faults, GENPEFS uses it to tolerate and remove Trojans from an FPGA system. This can be useful when dealing with Trojans that are not tolerated by standard Trojan tolerance strategies, such as Trojans that leak information or Trojans that waste FPGA resources.

The experimental results of GENPEFS are very promising. GENPEFS has been shown to be effective in small circuits, and appropriately scalable to the much larger, more practical AES core. Even in the complex AES core, two Trojans were removed within 8,000 fitness evaluations in most cases. This is fairly quick, and much faster than generating even extremely simple circuits from scratch [39, 38]. It is therefore more effective for hardware designers to monitor and repair hardware designs purchased from third parties than it is to attempt to generate them using evolvable hardware.

Future work in improving GENPEFS should involve examining even larger FPGA IP blocks, like an entire design that makes use of encryption, multiple communication algorithms, and many control structures. Demonstrating the usefulness of GENPEFS in even larger systems should provide confidence in its ability to monitor complex real-world FPGA systems.

Another direction for future work in GENPEFS is tuning the evolutionary parameters to more quickly remove Trojans from the system. In all examined circuits, most runs removed the Trojans within 10 generations, and often within 5. Avoiding the outliers by tuning parameters like crossover and mutation probability, and potentially using AST size minimization as a metric for removing Trojans, are interesting future directions.
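The idea of using AST size as a secondary selection signal can be sketched as follows. This is a minimal illustration, not the GENPEFS implementation: the `Node` class, the `ast_size` helper, and the callable test-bench checks are all hypothetical stand-ins for a parsed Verilog AST and a simulator-backed test bench. Fitness is returned as a tuple, so Python's lexicographic comparison ranks candidates by test-bench pass rate first and prefers the smaller AST only among equally correct candidates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical AST node; a real system would parse this from Verilog."""
    op: str
    children: List["Node"] = field(default_factory=list)


def ast_size(node: Node) -> int:
    """Count every node in the tree rooted at `node`."""
    return 1 + sum(ast_size(child) for child in node.children)


def fitness(candidate: Node,
            test_bench: Sequence[Callable[[Node], bool]]) -> Tuple[float, int]:
    """Return (pass rate, negated AST size); larger tuples are fitter.

    Each test-bench entry is an assumed callable that returns True when the
    candidate passes that check (in practice, a simulation run).
    """
    if not test_bench:
        raise ValueError("test bench must contain at least one check")
    passed = sum(1 for check in test_bench if check(candidate))
    return (passed / len(test_bench), -ast_size(candidate))


# Illustrative candidates: a clean XOR assignment and a larger variant that
# carries extra (possibly malicious) logic around the same XOR.
xor_ab = Node("xor", [Node("a"), Node("b")])
clean = Node("assign", [xor_ab])
trojaned = Node("assign", [Node("mux", [Node("trigger"), Node("leak"), xor_ab])])
broken = Node("assign")

# Stand-in check: the design must drive its output with some expression.
bench = [lambda design: len(design.children) > 0]

ranked = sorted([trojaned, broken, clean],
                key=lambda d: fitness(d, bench), reverse=True)
```

Because size only breaks ties, a smaller candidate that fails the test bench never outranks a larger one that passes, which avoids trading functionality for compactness.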

Finally, future work should examine the usefulness of GENPEFS in mitigating hardware faults and hardware Trojans. Although GENPEFS evolves the AST, it is possible that modifying the AST while leaving functionality intact will produce a differently-routed FPGA configuration, avoiding Trojans or faults in hardware. It would be equally interesting to examine how generating logic from scratch creates diverse routings after synthesis. It is possible that producing diverse ASTs corresponds to diverse routings, meaning that even if a Trojan affects one generated circuit, another generated circuit might be immune. Examining how this diversity occurs and coming up with a diversity metric for generated FPGA circuitry is an interesting direction for future work.

REFERENCES

[1] S. Mal-Sarkar, A. Krishna, A. Ghosh, and S. Bhunia, “Hardware trojan attacks in FPGA devices,” Proc. the great lakes symposium on VLSI (GLSVLSI), pp. 287–292, 2014.

[2] Y. Li and K. Skadron, “TMR : A Solution for Hardware Security Designs,” 2015.

[3] A. Alanwar, M. A. Aboelnaga, Y. Alkabani, M. W. El-Kharashi, and H. Bedour, Dynamic fpga detection and protection of hardware trojan: A comparative analysis, 2017. arXiv: 1711.01010 [cs.CR].

[4] S. Mal-Sarkar, R. Karam, S. Narasimhan, A. Ghosh, A. Krishna, and S. Bhunia, “Design and Validation for FPGA Trust under Hardware Trojan Attacks,” IEEE Transactions on Multi-Scale Computing Systems, vol. 2, no. 3, pp. 186–198, 2016.

[5] G. V. Larchev and J. D. Lohn, “Evolutionary Based Techniques for Fault Tolerant Field Programmable Gate Arrays,” 2nd IEEE International Conference on Space Mission Challenges for Information Technology (SMC-IT’06), pp. 314–321, 2006.

[6] A. Arcuri, “Evolutionary repair of faulty software,” Applied Soft Computing Journal, vol. 11, no. 4, pp. 3494–3514, 2011.

[7] F. Wolff, C. Papachristou, S. Bhunia, and R. S. Chakraborty, “Towards trojan-free trusted ics: Problem analysis and detection scheme,” in 2008 Design, Automation and Test in Europe, 2008, pp. 1362–1365.

[8] X. Wang, M. Tehranipoor, and J. Plusquellic, “Detecting malicious inclusions in secure hardware: Challenges and solutions,” in 2008 IEEE International Workshop on Hardware-Oriented Security and Trust, 2008, pp. 15–19.

[9] M. Tehranipoor and F. Koushanfar, “A survey of hardware trojan taxonomy and detection,” IEEE Design Test of Computers, vol. 27, no. 1, pp. 10–25, 2010.

[10] R. S. Chakraborty, S. Narasimhan, and S. Bhunia, “Hardware trojan: Threats and emerging solutions,” in 2009 IEEE International High Level Design Validation and Test Workshop, 2009, pp. 166–171.

[11] H. Momeni, M. Masoumi, and A. Dehghan, “A practical fault induction attack against an fpga implementation of aes cryptosystem,” in World Congress on Internet Security (WorldCIS-2013), 2013, pp. 134–138.

[12] R. Karri, K. Wu, P. Mishra, and Y. Kim, “Concurrent error detection of fault-based side-channel cryptanalysis of 128-bit symmetric block ciphers,” in Proceedings of the 38th Annual Design Automation Conference, ser. DAC ’01, Las Vegas, Nevada, USA: ACM, 2001, pp. 579–584, ISBN: 1-58113-297-2.

[13] J. Breier and W. He, “Multiple fault attack on present with a hardware trojan implementation in fpga,” in 2015 International Workshop on Secure Internet of Things (SIoT), 2015, pp. 58–64.

[14] D. K.A. R. Z. Collins R. Jha, Submitted in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[15] I. Hadzic, S. Udani, and J. M. Smith, “Fpga viruses,” in Proceedings of the 9th International Workshop on Field-Programmable Logic and Applications, ser. FPL ’99, London, UK: Springer-Verlag, 1999, pp. 291–300, ISBN: 3-540-66457-2.

[16] R. Druyer, L. Torres, P. Benoit, P. V. Bonzom, and P. Le-Quere, “A survey on security features in modern fpgas,” in 2015 10th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2015, pp. 1–8.

[17] L. Kafka, P. Kubalík, H. Kubatova, and E. Novak, “Fault classification for self-checking circuits implemented in fpga,” 2005.

[18] E. Netto, R. Vaslin, J. Crenne, P. Cotret, G. Gogniat, J.-P. Diguet, J.-L. Danger, P. Maurine, V. Fischer, B. Badrignans, L. Barthe, P. Benoit, and L. Torres, “Security fpga analysis,” in. Jan. 2011, pp. 7–46, ISBN: 978-94-007-1338-3.

[19] M. Antonakakis, T. April, M. Bailey, M. Bernhard, E. Bursztein, J. Cochran, Z. Durumeric, J. A. Halderman, L. Invernizzi, M. Kallitsis, D. Kumar, C. Lever, Z. Ma, J. Mason, D. Menscher, C. Seaman, N. Sullivan, K. Thomas, and Y. Zhou, “Understanding the mirai botnet,” in Proceedings of the 26th USENIX Conference on Security Symposium, ser. SEC’17, Vancouver, BC, Canada: USENIX Association, 2017, pp. 1093–1110, ISBN: 978-1-931971-40-9.

[20] K. Ganesan, J. Jo, W. L. Bircher, D. Kaseridis, Z. Yu, and L. K. John, “System-level max power (sympo) - a systematic approach for escalating system-level power consumption using synthetic benchmarks,” in 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2010, pp. 19–28.

[21] A. Tripathi and U. K. Singh, “Towards standardization of vulnerability taxonomy,” in 2010 2nd International Conference on Computer Technology and Development, 2010, pp. 379–384.

[22] S. Bhunia, M. S. Hsiao, M. Banga, and S. Narasimhan, “Hardware trojan attacks: Threat analysis and countermeasures,” Proceedings of the IEEE, vol. 102, pp. 1229–1247, 2014.

[23] S. Bhunia, M. Abramovici, D. Agrawal, P. Bradley, M. S. Hsiao, J. F. Plusquellic, and M. M. Tehranipoor, “Protection against hardware trojan attacks: Towards a comprehensive solution,” IEEE Design & Test, vol. 30, pp. 6–17, 2013.

[24] I. Chipworks, Semiconductor manufacturing - reverse engineering of semiconductor components, parts and process, http://chipworks.com, Accessed: 2019-02-12.

[25] R. S. Chakraborty, F. Wolff, S. Paul, C. Papachristou, and S. Bhunia, “Mero: A statistical approach for hardware trojan detection,” in Cryptographic Hardware and Embedded Systems - CHES 2009, C. Clavier and K. Gaj, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 396–410, ISBN: 978-3-642-04138-9.

[26] J. Aarestad, D. Acharyya, R. Rad, and J. Plusquellic, “Detecting trojans through leakage current analysis using multiple supply pad iddqs,” Trans. Info. For. Sec., vol. 5, no. 4, pp. 893–904, Dec. 2010.

[27] D. Rai and J. Lach, “Performance of delay-based trojan detection techniques under parameter variations,” in Proceedings of the 2009 IEEE International Workshop on Hardware-Oriented Security and Trust, ser. HST ’09, Washington, DC, USA: IEEE Computer Society, 2009, pp. 58–65, ISBN: 978-1-4244-4805-0.

[28] T. Reece, D. Limbrick, and W. Robinson, “Design comparison to identify malicious hardware in external intellectual property,” Nov. 2011, pp. 639–646.

[29] E. Love, Y. Jin, and Y. Makris, “Proof-carrying hardware intellectual property: A pathway to trusted module acquisition,” Trans. Info. For. Sec., vol. 7, no. 1, pp. 25–40, Feb. 2012.

[30] S.-M. Yen and M. Joye, “Checking before output may not be enough against fault-based cryptanalysis,” IEEE Trans. Computers, vol. 49, pp. 967–970, 2000.

[31] J. R. Koza, “Human-competitive results produced by genetic programming,” Genetic Programming and Evolvable Machines, vol. 11, no. 3-4, pp. 251–284, 2010.

[32] A. Thompson, “An evolved circuit, intrinsic in silicon, entwined with physics,” in Evolvable Systems: From Biology to Hardware, T. Higuchi, M. Iwata, and W. Liu, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 1997, pp. 390–405, ISBN: 978-3-540-69204-1.

[33] P. C. Haddow and A. M. Tyrrell, “Evolvable hardware challenges: Past, present and the path to a promising future,” in Inspired by Nature: Essays Presented to Julian F. Miller on the Occasion of his 60th Birthday, S. Stepney and A. Adamatzky, Eds. Cham: Springer International Publishing, 2018, pp. 3–37, ISBN: 978-3-319-67997-6.

[34] G. Barlow and M. A. Edwards, “Self-evolving hardware,” Apr. 2004.

[35] D. Levi and S. A. Guccione, “Geneticfpga: Evolving stable circuits on mainstream fpga devices,” in Proceedings of the First NASA/DoD Workshop on Evolvable Hardware, 1999, pp. 12–17.

[36] L. Sekanina, “Evolutionary hardware design,” Proceedings of SPIE - The International Society for Optical Engineering, vol. 8067, May 2011.

[37] R. Salvador, “Evolvable hardware in fpgas: Embedded tutorial,” in 2016 International Conference on Design and Technology of Integrated Systems in Nanoscale Era (DTIS), 2016, pp. 1–6.

[38] J. Cullen, “Evolutionary meta programming,” GEC ’09: Proceedings of the first ACM/SIGEVO Summit on Genetic and Evolutionary Computation, pp. 81–88, 2009.

[39] U. R. Karpuzcu, “Automatic verilog code generation through grammatical evolution,” in Proceedings of the 7th Annual Workshop on Genetic and Evolutionary Computation, ser. GECCO ’05, Washington, D.C.: ACM, 2005, pp. 394–397.

[40] M. O’Neill and C. Ryan, “Grammatical evolution,” IEEE Transactions on Evolutionary Computation, vol. 5, no. 4, pp. 349–358, 2001.

[41] C. Ryan, J. Collins, and M. O’Neill, “Grammatical evolution: Evolving programs for an arbitrary language,” in Genetic Programming, W. Banzhaf, R. Poli, M. Schoenauer, and T. C. Fogarty, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 1998, pp. 83–96, ISBN: 978-3-540-69758-9.

[42] F.-M. De Rainville, F.-A. Fortin, M.-A. Gardner, M. Parizeau, and C. Gagné, “Deap: A python framework for evolutionary algorithms,” in Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation, ser. GECCO ’12, Philadelphia, Pennsylvania, USA: ACM, 2012, pp. 85–92, ISBN: 978-1-4503-1178-6.

[43] Icarus verilog, http://iverilog.icarus.com/home, Accessed: 2019-02-12.

[44] Multiplexer 3-8 problem, https://deap.readthedocs.io/en/master/examples/gp_multiplexer.html, Accessed: 2019-02-13.

[45] J. R. Koza, Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA, USA: MIT Press, 1992, ISBN: 0-262-11170-5.

[46] D. A. Schult, “Exploring network structure, dynamics, and function using networkx,” in Proceedings of the 7th Python in Science Conference (SciPy), 2008, pp. 11–15.

[47] D. Mange and M. Tomassini, Bio-inspired computing machines: Towards novel computational architectures. PPUR Presses Polytechniques, 1998.

[48] R. O. Canham and A. M. Tyrrell, “Evolved fault tolerance in evolvable hardware,” in Proceedings of the 2002 Congress on Evolutionary Computation. CEC’02 (Cat. No.02TH8600), vol. 2, 2002, pp. 1267–1271.

[49] L. Ionescu, A. Mazare, and G. Şerban, “Intrinsic evolvable hardware used for fault tolerance systems,” Int. J. Organ. Collect. Intell., vol. 3, no. 2, pp. 43–80, Apr. 2012.

[50] W. Weimer, S. Forrest, C. Le Goues, and T. Nguyen, “Automatic program repair with evolutionary computation,” Commun. ACM, vol. 53, no. 5, pp. 109–116, May 2010.

[51] J. D. Hunter, “Matplotlib: A 2d graphics environment,” Computing In Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007.

[52] Pyrtl, https://pyrtl.readthedocs.io/en/latest/, Accessed: 2019-03-01.

sources/142/Silwal - 2013 - Asynchronous Physical Unclonable Function using FP.pdf

A Thesis

entitled

Asynchronous Physical Unclonable Function using FPGA-based Self-Timed Ring Oscillator

Roshan Silwal

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Electrical Engineering

Dr. Mohammed Y Niamat, Committee Chair

Dr. Robert C. Green II, Committee Member

Dr. Weiqing Sun, Committee Member

Dr. Patricia R. Komuniecki, Dean

College of Graduate Studies

The University of Toledo

August 2013

This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.

An Abstract of

Asynchronous Physical Unclonable Function using FPGA-based Self-Timed Ring Oscillator

Roshan Silwal

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Electrical Engineering

The University of Toledo

August 2013

Field Programmable Gate Array (FPGA) security has emerged as a challenging security paradigm in system design. Systems implemented on FPGAs require secure operations and communication. There is a growing concern over the security attributes of FPGAs regarding protecting and securing information processed within them, protecting designs during distribution and protecting intellectual property rights. One of the important aspects of improving the trustworthiness level of FPGAs is enhancing the physical security of FPGAs. A Physical Unclonable Function (PUF) provides a means to enhance physical security of Integrated Circuits (ICs) against piracy and unauthorized access. PUFs exploit the inherent and embedded randomness that occurs during the fabrication process of silicon devices.

This thesis presents a novel FPGA-based PUF design technique using asynchronous logic. Significant process variations exist in IC fabrication, which make each IC unique in its delay characteristics. The statistical delay variation in transistors and wires across FPGA chips is exploited through identically laid-out asynchronous ring oscillators. The asynchronous ring oscillators generate oscillations of varying frequencies even when they are identically mapped on a semiconductor device. The varying frequencies produced by these identically mapped self-timed ring oscillators are used to generate unique PUF response bits, which are used in device authentication and in cryptographic applications such as secret key generation and True Random Number Generators (TRNGs). Experimental analysis shows that the asynchronous oscillators of the PUFs generate oscillations of varying frequencies, and that the uniqueness of the PUF responses is 49.92%, which is very close to the ideal value of 50%.
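How oscillator frequencies turn into response bits, and how a uniqueness figure such as the one above is obtained, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the thesis's procedure: it assumes each bit comes from comparing one disjoint pair of adjacent oscillators, and that uniqueness is measured in the usual way as the average pairwise inter-device Hamming distance expressed as a percentage. The function names and the frequency values are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def response_bits(freqs: Sequence[float]) -> List[int]:
    """Derive one bit per disjoint oscillator pair (assumed pairing scheme).

    The bit is 1 when the first oscillator of the pair is faster, else 0.
    """
    return [1 if freqs[i] > freqs[i + 1] else 0
            for i in range(0, len(freqs) - 1, 2)]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of bit positions in which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))


def uniqueness(responses: Sequence[Sequence[int]]) -> float:
    """Average pairwise fractional Hamming distance, as a percentage."""
    if len(responses) < 2:
        raise ValueError("need responses from at least two devices")
    n_bits = len(responses[0])
    pairs = list(combinations(responses, 2))
    total = sum(hamming_distance(a, b) / n_bits for a, b in pairs)
    return 100.0 * total / len(pairs)


# Made-up frequencies (MHz) for the same four-oscillator layout on three
# devices; real values would come from on-chip measurement.
chip_a = response_bits([101.0, 99.5, 98.2, 100.1])
chip_b = response_bits([99.0, 100.0, 100.4, 99.9])
chip_c = response_bits([100.2, 99.8, 99.1, 99.7])

score = uniqueness([chip_a, chip_b, chip_c])
```

A value near 50% means that, on average, half of the response bits differ between any two devices, so one device's response gives little information about another's.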

This thesis is dedicated to my parents, my sisters and my lovely wife.

Acknowledgements

I would like to express my deep sense of gratitude to my thesis supervisor, Dr. Mohammed Niamat, for giving me an opportunity to work with him in this research and providing me a tremendous level of support and cooperation throughout my research work and graduate studies.

I would also like to thank the thesis committee members, Dr. Robert C. Green II and Dr. Weiqing Sun, for their valuable time in reviewing this thesis.

The research work in this thesis was supported in part by National Science Foundation (NSF) grant award #203687.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
List of Symbols
1 Introduction
1.1 Context and Motivation
1.2 Contributions
1.3 Thesis Outline
2 Physical Unclonable Functions
2.1 Introduction
2.2 PUF Terminologies
2.2.1 Significance of Process Variations
2.2.2 Environmental Variations
2.2.3 Challenge-Response Pairs
2.3 Sources of Noise
2.3.1 Noise due to Manufacturing Process
2.3.2 Local Noise
2.3.3 Environmental Noise
2.4 Measure of Quality
2.4.1 Uniqueness
2.4.2 Reliability
2.4.3 Resiliency
2.5 PUF Classifications
2.5.1 Non-electronic PUF, Electronic PUF and Silicon PUF
2.5.2 Strong PUF and Weak PUF
2.5.3 Intrinsic PUF and Non-intrinsic PUF
2.6 PUF Circuits
2.6.1 Delay-based PUF
2.6.1.1 Arbiter PUF
2.6.1.2 Ring Oscillator PUF
2.6.1.3 Glitch PUF
2.6.2 Memory-based PUF
2.6.2.1 SRAM PUF
2.6.2.2 Butterfly PUF
2.7 PUF Applications
3 Self-Timed Rings
3.1 Introduction
3.2 Asynchronous Circuits
3.3 Asynchronous Logic
3.3.1 Muller C-element
3.4 Self-Timed Rings
3.4.1 Self-Timed Ring Structure
3.4.2 Token and Bubble Propagation
3.4.3 Jitter in Inverter RO and Self-Timed RO
4 Asynchronous Approach to Ring Oscillator for FPGA-based PUF Design
4.1 Introduction
4.2 FPGA Architecture
4.2.1 Architecture of Spartan-II
4.3 LUT Implementation of Muller Gate
4.4 Logical Implementation of a Self-Timed Ring Oscillator
4.5 Experimental Results
4.6 Conclusion
5 STRO-PUF: Self-Timed Ring Oscillator based PUF
5.1 Introduction
5.2 Architecture of STRO-PUF
5.3 Implementation of STRO-PUF
5.4 Experimental Analysis
5.4.1 Analysis of Output Frequencies
5.4.2 Analysis of Uniqueness of STRO-PUF
5.4.3 FPGA Authentication using STRO-PUF
5.4.4 Reliability Enhancement with STRO-PUF
5.5 Conclusion
6 Conclusion
6.1 Conclusion
6.2 Future Directions
References
A Source Codes
A.1 VHDL Code for a Self-Timed Ring (STR)
A.2 VHDL Code for STRO-PUF
A.3 UCF File for Mapping STRO-PUF in a Desired Region
A.4 Uniqueness Analysis of STRO-PUF for 16-bit Response
A.5 Uniqueness Analysis of STRO-PUF for 256-bit Response

List of Tables

2.1 Different types of PUFs
4.1 LUT mapping of reset Muller gate
4.2 LUT mapping of set Muller gate
4.3 Frequency values for implemented asynchronous ring oscillators
5.1 16-bit STRO-PUF responses
5.2 256-bit STRO-PUF responses
5.3 Comparing responses with dependent bits and independent bits
5.4 Uniqueness results for FPGA-based PUFs

List of Figures

2-1 Optical PUF
2-2 An Arbiter PUF delay circuit
2-3 Ring Oscillator PUF
2-4 RO-PUF generating a single response bit
2-5 Anderson PUF
2-6 SRAM Cell
2-7 Butterfly PUF cell
2-8 Secret key generation using PUF
2-9 HRNG using PUF
3-1 Synchronous circuit
3-2 Asynchronous circuit
3-3 Abstract data-flow view of an asynchronous circuit
3-4 Standard Muller gate and its truth table
3-5 Implementations of Muller C-element
3-6 Three stage pipeline and ring
3-7 An N-stage self-timed ring
3-8 Token-bubble propagation
3-9 Burst mode propagation and evenly-spaced mode propagation
4-1 A typical FPGA architecture
4-2 Structure of a typical logic block
4-3 Spartan-II slice
4-4 A stage in STR
4-5 VHDL instantiation of reset Muller gate
4-6 LUT-based four-stage asynchronous ring oscillator
4-7 Technology schematic view of 6-stage self-timed ring oscillator
4-8 Implementation of 6-stage self-timed ring oscillator
4-9 Placement constraint used to define position of stages of self-timed ring
4-10 Simulation result of 6-stage STR oscillator with TTBBBB configuration
4-11 Simulation result of 6-stage STR oscillator with TTTTBB configuration
4-12 Real output of 6-stage STR oscillator with TTBBBB
5-1 Architecture of the proposed STRO-PUF
5-2 Six-stage asynchronous ring oscillator
5-3 Hard-macro implemented as 6-stage asynchronous ring oscillator
5-4 Layout view of an STRO-PUF implemented
5-5 Portion of an STRO-PUF in FPGA Editor
5-6 PUFs mapped on six different regions
5-7 PUF outputs in initialization mode and oscillation mode
5-8 Simulation result of STRO-PUF output frequencies
5-9 Portion of STRO-PUF output frequencies in a logic analyzer
5-10 Distribution of frequencies generated by asynchronous ring oscillator
5-11 Uniqueness Analysis for 16-bit PUF response
5-12 Uniqueness Analysis for 256-bit PUF response
5-13 FPGA authentication using STRO-PUF
5-14 Effect of temperature and voltage on oscillator frequencies

List of Abbreviations

ASIC ..........................Application Specific Integrated Circuits

BPUF..........................Butterfly PUF

CLB ............................Configurable Logic Block

CLK............................Clock

CRP ............................Challenge-Response Pair

ECC ............................Error Correcting Code

EDA ...........................Electronic Design Automation

EMI ............................Electro-Magnetic Interference

ERAI ..........................Electronics Resellers Association International

FF ...............................Flip-Flop

FPGA .........................Field Programmable Gate Array

HD ..............................Hamming Distance

HRNG ........................Hardware Random Number Generator

I/O ..............................Input / Output

IC................................Integrated Circuit

IP ................................Intellectual Property

IRO .............................Inverter Ring Oscillator

ITRS ...........................International Technology Roadmap for Semiconductors

LAB............................Logic Array Block

LC ..............................Logic Cell

LE ...............................Logic Element

LUT ............................Look-Up-Table

MUX ..........................Multiplexer

NIST ...........................National Institute of Standards and Technology

OEM ...........................Original Equipment Manufacturer

PDF ............................Probability Density Function

PMF............................Probability Mass Function

PUF ............................Physical Unclonable Function

RFID ..........................Radio Frequency Identification

RO ..............................Ring Oscillator

RO-PUF .....................Ring Oscillator based Physical Unclonable Function

RTL ............................Register Transfer Level

SR ...............................Set / Reset

SRAM ........................Static Random Access Memory

STR ............................Self-Timed Ring

STRO .........................Self-Timed Ring Oscillator

STRO-PUF .................Self-Timed Ring Oscillator based Physical Unclonable Function

TRNG .........................True Random Number Generator

UCF ............................User Constraint File

VHDL ........................VHSIC Hardware Description Language

VLSI ...........................Very Large Scale Integration

List of Symbols

ack ..............................acknowledge signal

B .................................Bubble

C .................................Muller C-element or Muller gate

F .................................Forward input of Muller gate

f ..................................frequency

MHz ...........................Mega-Hertz

N .................................Number of stages in a ring oscillator

NB ...............................Number of bubbles

ns ................................nano-seconds

NT ...............................Number of tokens

Q .................................Current output state of Muller gate

Q’ ...............................previous output state of Muller gate

R .................................Reverse input of Muller gate

R’i ...............................Response bit from chip i in different environmental conditions

R’i,y .............................y-th sample of R’i

Ri ................................Response bit from chip i

SR ...............................Set/Reset Signal

T .................................Token

TV ...............................Target value

Chapter 1

Introduction

1.1 Context and Motivation

FPGAs are being increasingly used in products and systems of all kinds; FPGAs

often form the core of any system. FPGAs are dominating a wide range of application

areas including military, defense, space, automotive and consumer electronics. This rise

in both the usage and importance of FPGAs in systems makes protecting the IP contained

in FPGAs as important as protecting the data processed by the FPGA. There has been a

growing concern over the security attributes of FPGAs regarding protecting and securing

information processed within them, protecting designs during distribution and protecting

intellectual property rights [1]. The design security is often thought of in terms of

protecting Intellectual Property (IP); however, potential losses extend beyond just the

financial. With the increasing use of programmable logic beyond commercial markets to

avionic, space and military applications, design security takes on the additional aspects of

safety and national security.

As FPGAs are being used in more applications that require security features,

attackers look for vulnerabilities and developers for defenses. Cloning, overbuilding,

reverse engineering and tampering are the major security vulnerabilities of FPGAs. These

threats can have far-reaching consequences ranging from counterfeiting to espionage, and

are faced by corporations and governments alike [2]. Cloning is making an illegal replica

of an original design without understanding the exact details of the design. The attacker

simply considers the original design as a black-box to copy the design to resell without

making an investment in the initial design effort. Cloning not only harms the revenue of

the Original Equipment Manufacturer (OEM) but also affects the OEM’s reputation

because of the poor quality of cloned products. Overbuilding is the easiest form of design

theft, which occurs when a subcontractor builds more units than have been ordered for

fabrication by an OEM. The overbuilt units produced are identical to the originals, which

makes identification difficult. Reverse engineering is making functionally equivalent

designs from an existing design by probing details of the original design. An adversary

can use this information to either develop effective countermeasures or to produce similar

equipment. In FPGAs, bitstream reversal can transform the encoded bitstream into a

functionally equivalent description of the original design. Tampering is an attempt to gain

unauthorized access to an electronic system. Tampering can either be part of a reverse

engineering program, or it can have a malicious motive.

Recently, electronic industries have been facing an increased amount of hardware

counterfeits. The increased complexity in the supply chain system of electronic

components has made counterfeit components easily available in the gray market. These

counterfeit components, when assembled into a product or a system, can not only degrade

its performance and reliability but also create safety issues. Increasing incidents have

been reported to the Electronics Resellers Association International (ERAI) since 2008.

In 2011, there were more than 1,300 counterfeit incidents reported from around the

world. This number is more than double the number reported in 2010 and 2008, and

quadruple the number reported in 2009 [3].

Physical Unclonable Function (PUF) [4, 5] provides a means to enhance physical

security of Integrated Circuits (ICs) against piracy and unauthorized access. A PUF is

used to solve various security issues, such as chip authentication, cryptographic key

generation, software licensing, Intellectual Property (IP) protection, and detection and

prevention of IC counterfeiting.

Although a Self-Timed Ring (STR) is well studied in many contexts, there has

been limited work done in the field of hardware security and hardware cryptography. The

work in this thesis is also motivated by the fact that there is no previous work on the

FPGA-based implementation of PUFs using asynchronous logic. Self-timed rings are

considered robust to environmental variations, [6, 7] and this feature of the self-timed

ring oscillator is explored to build robust PUFs that strengthen the PUF responses. The

terms ‘asynchronous ring’ and ‘self-timed ring’ are used interchangeably throughout this

thesis.

1.2 Contributions

The major contributions of the work described in this thesis are as follows:

• Introduces a Look-Up-Table (LUT) based implementation of asynchronous ring

oscillators for PUF design.

• Proposes a novel PUF design approach using self-timed ring oscillators. The

proposed PUF is named the ‘Self-Timed Ring Oscillator PUF’ (STRO-PUF).

• Performs experimental analyses on real semiconductor devices. Previous

work [8] on an asynchronous PUF was limited to electrical simulations.

1.3 Thesis Outline

This thesis is organized as follows:

Chapter 2 gives an overview of Physical Unclonable Functions (PUFs) including

PUF definitions, terminologies related to PUFs, PUF quality measures, different types of

PUFs and applications of PUFs.

Chapter 3 gives a brief introduction of asynchronous logic and asynchronous

circuits to design a Self-Timed Ring (STR), also called an asynchronous ring. It

discusses the structure of a self-timed ring oscillator using Muller C-element and the

propagation mode of oscillation in the ring.

Chapter 4 focuses on two major implementations required for the proposed PUF

design: the LUT-based implementation of the Muller C-element and the asynchronous approach

to the ring oscillator for implementing the Self-Timed Ring (STR) on FPGAs. This

chapter explains the technique for logical implementation of the self-timed ring oscillator

using an underlying FPGA architecture.

Chapter 5 discusses the architecture and the detailed implementation of the

proposed Self-Timed Ring Oscillator based PUF (STRO-PUF). The experimental

analyses are performed to validate the design for calculating PUF uniqueness and

analyzing variation in output frequencies of asynchronous ring oscillators.

Finally, Chapter 6 concludes the thesis and presents ideas for future work.

Chapter 2

Physical Unclonable Functions

2.1 Introduction

Security in Integrated Circuits (ICs) has become an important issue due to high

information security requirements. One of the important aspects of improving the

trustworthiness level of semiconductor devices and the semiconductor supply chain is

enhancing physical security. These semiconductor devices demand both computational

security and physical security. Physical Unclonable Function (PUF) [4, 5] provides a

means to enhance physical security of Integrated Circuits (ICs) against piracy and

unauthorized access. This chapter discusses PUF definitions, terminologies related to

PUFs, PUF quality measures, different types of PUFs and applications of PUFs.

PUFs exploit the inherent delay characteristics of wires and transistors that differ

from chip to chip due to manufacturing process variations [9]. These complex physical

characteristics of ICs are used to generate unique signatures which are random,

unpredictable and difficult to reproduce. A PUF generates a set of responses while

stimulated by a set of input challenges. The challenge response relation is defined by

complex physical properties of the material, such as process variability of semiconductor

devices.

PUFs increase physical security by generating volatile secrets in digital form

while the chip is in operation. Secret keys are essential to many security related

applications. Storing secrets in a non-volatile memory is not only expensive but can also

be an easy target for invasive attacks [1]. A PUF offers an inexpensive and secure

approach for generating secret keys. A PUF generates a unique response (output bits)

for each challenge (input bits). This feature of PUF is used to solve various security

issues, such as chip authentication, cryptographic key generation, software licensing,

Intellectual Property (IP) protection, and detection and prevention of IC counterfeiting.

2.2 PUF Terminologies

2.2.1 Significance of Process Variations

Significant process variations exist in IC fabrication, which makes each IC unique

in its delay characteristics [10]. These variations exist die-to-die (inter-die) or within a die

(intra-die). Die-to-die parameter fluctuations resulting from lot-to-lot, wafer-to-wafer,

and a portion of the within-wafer variations affect every element on a chip equally.

Within-die parameter fluctuations consisting of both random and systematic components

produce a non-uniformity of electrical characteristics across the chip. These variations

occur during various fabrication steps. The lot-to-lot and wafer-to-wafer variations

include process temperatures and pressures, equipment properties, wafer polishing, and

wafer placement. The within-wafer variations affect both die-to-die and within-die

variations. Across a die, device delays vary due to mask variations and placement of

dopant atoms in the device channel region. Variability in device parameters, such as

effective channel length, threshold voltage and gate oxide thickness results in different

characteristics of circuit elements in a chip.

The process variation is becoming more difficult to control in modern Very Large

Scale Integration (VLSI) designs due to the continuous reduction in feature size. Process

variations in nanometer technologies are becoming more significant for cutting-edge

FPGAs. Though an FPGA has a regular fabric with replicated layout tiles, the design-dependent

systematic variation is significant in advanced technology [11]. A

manufacturer-resistant PUF can be created by exploiting statistical delay characteristics

of the PUF circuit [12].

Most of the PUF designs are based on delay variation of logic and interconnects.

The fundamental principle behind the delay based PUF is to compare a pair of identically

mapped circuit elements and measure the delay mismatch due to manufacturing process

variations. This technique demands identical implementation of two circuit elements

being compared. The identical mapping of circuit elements can be achieved by

VLSI level placement and routing techniques.

2.2.2 Environmental Variations

The delay of gates and wires depends on junction temperatures which rely on

ambient temperatures. The significant variations in the ambient temperatures can result in

major variations in delays. Therefore, the ambient temperature is one of the most

significant environmental conditions that affect the circuit operating conditions. The

impact of varying junction temperatures can be compensated for by using identical

components in PUF circuit design. The main problem caused by environmental

variation is the inconsistent result from the same design, which may pose challenges

related to robustness. The relative measure of delays can provide robustness against

environmental variations including variations in temperatures and voltages. Circuit aging

can also change delay characteristics of a circuit, but its effect is considerably smaller

than variations in supply voltage and temperatures.

2.2.3 Challenge-Response Pairs

An input to a PUF is called a challenge and the output a response. An applied

challenge and its measured response are generally called a Challenge-Response Pair

(CRP). A PUF generates a unique set of output bits, or response, for each secret input set,

or challenge. In PUF-based authentication, a CRP database is created from a particular

PUF by applying randomly chosen challenges to obtain unpredictable responses. During

verification, a challenge from the CRP database is applied to the PUF, and the response

produced by the PUF is compared with the corresponding response from the database.
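
The enrollment and verification steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the protocol of any cited work: `toy_puf` is a hypothetical stand-in that hashes a per-device value, since a real PUF derives its response from process variation, and the names `enroll` and `verify` are introduced here for illustration only.

```python
import hashlib
import secrets


def toy_puf(device_secret: bytes, challenge: bytes) -> int:
    """Hypothetical stand-in for a physical PUF (assumption, not a real PUF).

    It maps a challenge to a repeatable, device-specific 32-bit response by
    hashing; a real PUF obtains this mapping from manufacturing variation.
    """
    digest = hashlib.sha256(device_secret + challenge).digest()
    return int.from_bytes(digest[:4], "big")


def enroll(device_secret: bytes, n_crps: int) -> dict:
    """Build a CRP database by applying randomly chosen challenges."""
    database = {}
    while len(database) < n_crps:
        challenge = secrets.token_bytes(8)
        database[challenge] = toy_puf(device_secret, challenge)
    return database


def verify(database: dict, device_secret: bytes) -> bool:
    """Apply one stored challenge and compare the fresh response with the database."""
    challenge = secrets.choice(list(database))
    return toy_puf(device_secret, challenge) == database[challenge]


if __name__ == "__main__":
    crp_db = enroll(b"chip-A", n_crps=8)
    print("genuine chip accepted:", verify(crp_db, b"chip-A"))
    print("other chip accepted:  ", verify(crp_db, b"chip-B"))
```

The comparison here is exact; with a noisy physical PUF, the verifier would normally accept responses within a small Hamming distance of the stored one.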

2.3 Sources of Noise

The PUF circuit can have three major sources of randomness from its

manufacturing to its usage: noise due to the manufacturing process, local noise and

environmental noise [13].

2.3.1 Noise due to Manufacturing Process

Manufacturing process noise is due to variations in silicon layers during various

steps in the manufacturing processes. This noise is specific to each IC. An ideal PUF is

built to extract the maximum information related to manufacturing process noise to

uniquely identify a circuit or device.

2.3.2 Local Noise

Local noise arises when the circuit is in operation. This noise is due to the random

thermal motion of charge carriers. Local noise should be minimized to decrease intra-

chip variation for PUF designs. However, local noise can be a good source of randomness

for random number generators.

2.3.3 Environmental Noise

Environmental variations such as temperature and power supply voltage

variations are the major causes of noise in PUF responses. This environmental noise can

disrupt the consistency in PUF responses and increase the intra-chip variations, which

reduces the robustness of PUF design.

2.4 Measure of Quality

The metrics to evaluate the basic PUF functions define the trustworthiness of the

PUF. The quality factor of a PUF is measured in terms of its uniqueness, reliability and

resiliency [9, 14].

2.4.1 Uniqueness

Uniqueness is the estimation of how uniquely a PUF can distinguish different

chips based on the generated response. The uniqueness factor is the measure of inter-chip

variation, which gives information on the number of PUF output bits that are different

between two different PUFs. The uniqueness of a PUF is estimated by the average inter-

die Hamming Distance (HD) over a group of chips. It quantifies the Hamming distance of

PUF responses that are provided with the same input challenge. It is characterized by the

Probability Mass Function (PMF) or Probability Density Function (PDF) of Hamming

distances, where PUFs have PDF or PMF curves that are centered at half the number of

response bits. For binary strings, a Hamming distance between any two strings of equal

length is the number of bits that are different in the two strings.

Let (i, j) be a pair of chips with i ≠ j and Ri (respectively, Rj) the n-bit response of

chip i (respectively, chip j). The first metric is the average inter-die Hamming distance

among a group of k chips and is defined as [14]:

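$$\mathrm{HD}_{\mathrm{inter}} = \frac{2}{k(k-1)} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \frac{\mathrm{HD}(R_i, R_j)}{n} \times 100\% \qquad (2.1)$$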

If the PUF produces uniformly distributed independent random bits, i.e. if each

binary response bit of a PUF has an equal probability of producing a ‘0’ or a ‘1’, then the

inter-chip variations should be 50% on average. Truly random bits are produced if only

the random process variation exists.
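
The average inter-die Hamming distance described above can be computed with a short Python sketch. The responses are represented as bit strings, and the function names are chosen here for illustration; the result is expressed as a percentage of the response length n, so an ideal PUF would score close to 50.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("responses must have equal length")
    return sum(x != y for x, y in zip(a, b))


def average_inter_chip_hd(responses: list) -> float:
    """Average pairwise HD over k chips, as a percentage of the n response bits."""
    if len(responses) < 2:
        raise ValueError("at least two chips are required")
    n = len(responses[0])
    pairs = list(combinations(responses, 2))  # all k(k-1)/2 chip pairs with i < j
    total = sum(hamming_distance(r_i, r_j) / n for r_i, r_j in pairs)
    return 100.0 * total / len(pairs)


if __name__ == "__main__":
    # Three made-up 4-bit responses to the same challenge (illustrative data only).
    print(average_inter_chip_hd(["0000", "1111", "0011"]))
```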

2.4.2 Reliability

Reliability indicates the reproducibility of the PUF outputs. Reliability gives

information on how many PUF output bits are changed when regenerated from the same

PUF with or without environmental variations. The responses for an ideal PUF are

expected to be consistent; however, factors such as variation in temperature, supply

voltage fluctuations and errors due to thermal noise affect the reproducibility of the PUF

responses. Reliability is the measure of consistency or stability of the PUF output

responses, when the responses are subjected to varying environmental conditions such as

variations in power supply voltages and temperature, and the same input challenge.

Since the responses being compared are generated from the same chip, this

variation is also called intra-chip or intra-die variation.

An n-bit reference response (Ri) is extracted from chip i at normal operating

conditions. The same n-bit response is extracted from the same PUF at a different

operating condition with response bits R’i. Let R’i,y be the y-th sample of R’i. Then, the

average intra-die HD over x samples for the chip i is defined as [14]:

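$$\mathrm{HD}_{\mathrm{intra}} = \frac{1}{x} \sum_{y=1}^{x} \frac{\mathrm{HD}(R_i, R'_{i,y})}{n} \times 100\% \qquad (2.2)$$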

The lower value of the average intra-die HD factor results in more reliable PUF

responses. The intra-chip variations for an ideal PUF should be 0%.
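
The average intra-die Hamming distance can be computed in the same way. The sketch below assumes a reference response and a list of re-measured samples from the same chip, all given as bit strings; the names and the sample data are illustrative only.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("responses must have equal length")
    return sum(x != y for x, y in zip(a, b))


def average_intra_chip_hd(reference: str, samples: list) -> float:
    """Average HD between the reference and x samples, as a percentage of n bits."""
    if not samples:
        raise ValueError("at least one sample is required")
    n = len(reference)
    total = sum(hamming_distance(reference, sample) / n for sample in samples)
    return 100.0 * total / len(samples)


if __name__ == "__main__":
    reference = "10110010"  # response at normal operating conditions (made up)
    samples = ["10110010", "10110011", "00110010", "10110010"]  # re-measurements
    print(average_intra_chip_hd(reference, samples))  # two of four samples flip one bit
```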

2.4.3 Resiliency

Resiliency of a PUF is the ability of the PUF to prevent an adversary from

revealing the PUF secrets. This is the measure of resiliency against attack or security.

2.5 PUF Classifications

PUFs can be categorized based on their construction properties, operation

principle and from a security point of view. Table 2.1 summarizes various PUFs under

different categories.

Table 2.1: Different types of PUFs

Categories            Examples

Non-electronic PUF    Optical PUF [15], Acoustical PUF [16]

Electronic PUF        Coating PUF [17], Power Distribution PUF [18]

Delay-based PUF       Arbiter PUF [5], Ring Oscillator PUF [9], Glitch PUF [19], Anderson PUF [20]

Memory-based PUF      SRAM PUF [21], Butterfly PUF [22], Flip-flop PUF [23]

2.5.1 Non-electronic PUF, Electronic PUF and Silicon PUF

On the basis of construction and operation principles, PUFs can be categorized

into three categories: non-electronic PUFs, electronic PUFs and silicon PUFs [24].

Non-electronic PUFs refer to those with PUF-like properties whose construction

and/or operation is inherently non-electronic. Their PUF-like behavior is based on non-

electronic technologies or materials such as the random fiber-structure of a sheet of paper

or the random reflection of the scattering characteristics of an optical medium. For

example, optical PUFs based on transparent media as proposed in [15] are physical one-

way functions. Figure 2-1 shows the basic implementation of the Optical PUF. The CRP,

consisting of the laser orientation and the resulting hash, is saved in a public database for

later use.

Figure 2-1: Optical PUF [15]

In electronic PUFs, the basic operation consists of an analog measurement of an

electric or electronic quantity such as power, resistance and capacitance. An example of

an electronic PUF is the coating PUF [17], which considers the randomness of

capacitance measurements in comb-shaped sensors in the top metal layer of an IC.

Silicon PUFs [4] exhibit PUF behaviors which are embedded on a silicon chip.

Silicon PUFs are based on the hidden timing and delay information of ICs. A complex

integrated circuit can be represented as silicon based PUF, which helps in identifying and

authenticating individual ICs. Silicon PUFs can be implemented as a hardware building

block in cryptographic implementations. Silicon PUFs exploit manufacturing process

variations in integrated circuits with identical masks to uniquely characterize each IC.

Silicon PUFs are of particular interest for security solutions, and they are widely studied

as a major type of PUF. Delay-based PUFs and memory-based PUFs are considered

silicon PUFs.

2.5.2 Strong PUF and Weak PUF

The distinction between strong PUFs and weak PUFs is explained based on the

security properties of their challenge-response behavior [25]. A PUF is considered a

strong PUF if it has a large number of CRPs such that an attack based on exhaustively

measuring the CRPs only has a negligible probability of success. For a strong PUF, it is

infeasible to build an accurate model of the PUF based on observed CRPs. If the number

of CRPs is small, then it is considered a weak PUF.

2.5.3 Intrinsic PUF and Non-intrinsic PUF

Another classification, based on PUF construction properties, distinguishes intrinsic PUFs

from non-intrinsic PUFs. The intrinsic PUF was initially proposed by Guajardo et al. in

[21]. In an intrinsic PUF, evaluations are performed internally by embedded

measurement equipment, and the random instance-specific features are implicitly

introduced during the manufacturing process. All silicon PUFs based on random process

variations occurring during the manufacturing process of silicon chips are intrinsic

PUFs. These silicon PUFs include both delay-based PUFs and memory-based PUFs.

The non-intrinsic PUFs are externally evaluated and their randomness features are

explicitly introduced. Optical PUFs and coating PUFs are examples of non-intrinsic PUFs.

2.6 PUF Circuits

PUFs have drawn considerable attention over the past couple of years, making

them one of the promising areas in the field of hardware security and cryptography. Various

PUF techniques have been proposed for on-chip implementation on both

Application Specific Integrated Circuits (ASICs) and FPGAs. Since this thesis is about

the FPGA-based PUF implementation, the discussion is limited to those techniques that

have been implemented on FPGAs.

2.6.1 Delay-based PUF

2.6.1.1 Arbiter PUF

Arbiter PUF is the first silicon PUF to be proposed [5]. Arbiter PUF is based on a

delay-based circuit consisting of a parallel multiplexer chain and an arbiter. Depending

on the challenge bits, the skew in propagation delay between the two paths due to process

variations is detected by an arbiter which latches out either logic ‘0’ or logic ‘1’. The two

delay paths are simultaneously excited and make the transition race against each other.

The arbiter block, which is simply a latch or a flip-flop, at the output determines which

rising edge arrives first and sets its output to ‘0’ or ‘1’ depending on the winner. If the

racing paths are symmetric or identical in layout and the arbiter is not biased to either

path, the response is equally likely to be ‘0’ or ‘1’ regardless of the challenge bits. The

output is determined only by the statistical delay variation due to process variations.

Figure 2-2 shows a silicon PUF delay circuit. The circuit has multiple-bit input

and computes a one-bit output based on the relative delay difference between two paths

with identical layout length. Arbiter PUF demands careful layout and routing for identical

mapping of the logic, which is quite difficult, especially in the case of FPGA.
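
The race between the two delay paths can be illustrated with a small behavioral model in Python. This is a sketch under stated assumptions, not a circuit-accurate simulation: each stage is given four hypothetical delays (two for the straight setting, two for the crossed setting) drawn from a Gaussian distribution, and the convention that the arbiter outputs ‘1’ when the top path wins is chosen here for illustration.

```python
import random


def make_arbiter_puf(n_stages: int, seed: int) -> list:
    """Draw per-stage delays for one simulated chip (assumed Gaussian spread).

    Each stage holds (straight_top, straight_bottom, crossed_top, crossed_bottom).
    """
    rng = random.Random(seed)
    return [[rng.gauss(1.0, 0.05) for _ in range(4)] for _ in range(n_stages)]


def arbiter_response(stages: list, challenge: list) -> int:
    """Accumulate both path delays and report which rising edge arrives first."""
    top = bottom = 0.0
    for (s_top, s_bot, c_top, c_bot), bit in zip(stages, challenge):
        if bit == 0:
            # Straight setting: each signal stays on its own path.
            top, bottom = top + s_top, bottom + s_bot
        else:
            # Crossed setting: the two signals swap paths in this stage.
            top, bottom = bottom + c_top, top + c_bot
    return 1 if top < bottom else 0  # assumed convention: 1 when the top path wins


if __name__ == "__main__":
    chip = make_arbiter_puf(n_stages=8, seed=1)
    challenge = [0, 1, 1, 0, 1, 0, 0, 1]
    print(arbiter_response(chip, challenge))
```

Changing the seed models a different chip with the same layout, so the same challenge can produce a different response bit.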

2.6.1.2 Ring Oscillator PUF

The Ring Oscillator (RO) PUF consists of several identically mapped delay loops,

or ring oscillators, each of which oscillates with unique frequency due to manufacturing

process variations [9]. Each input challenge selects a pair of oscillators for comparison in

order to generate a response bit. A set of input challenges is given to the PUF, which selects

a fixed sequence of oscillator pairs to generate a fixed number of response bits. The

frequency differences are determined by process variations if all the oscillators are

identically laid-out, which results in equal probability of getting ‘1’ or ‘0’ as a response

bit if random variation exists. The ease of duplicating a ring oscillator using hard-macro

features has made its implementation more popular in FPGAs. Figure 2-3 and Figure 2-4

illustrate the structure of RO-PUF.

Figure 2-2: An Arbiter PUF delay circuit [9] (challenge inputs x[0] to x[n]; output 0 or 1).

A configurable ring oscillator has been proposed in [26] to improve reliability in

an RO-PUF. The authors have shown that an RO-PUF requires careful design decisions

to avoid the systematic process variations, and that the placement techniques and the selection

of ring oscillator pairs significantly improve the PUF uniqueness.

Figure 2-3: Ring Oscillator PUF [9]

Figure 2-4: Basic RO-PUF generating a single response bit
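
The pairwise frequency comparison used by the RO-PUF can be sketched in Python. The model below is a simplified illustration: the nominal frequency, its spread and the counting window are hypothetical values, and the rule that the bit is ‘1’ when the first oscillator of a pair counts higher is an assumed convention.

```python
import random


def make_ro_frequencies(n_oscillators: int, seed: int,
                        nominal_mhz: float = 200.0, sigma_mhz: float = 1.0) -> list:
    """Draw one frequency per oscillator to mimic process variation (assumed Gaussian)."""
    rng = random.Random(seed)
    return [rng.gauss(nominal_mhz, sigma_mhz) for _ in range(n_oscillators)]


def count_cycles(freq_mhz: float, window_us: float) -> int:
    """Counter value after the measurement window (MHz times microseconds gives cycles)."""
    return int(freq_mhz * window_us)


def ro_puf_response(freqs: list, pairs: list, window_us: float = 100.0) -> list:
    """One response bit per selected oscillator pair, from a counter comparison."""
    bits = []
    for a, b in pairs:
        count_a = count_cycles(freqs[a], window_us)
        count_b = count_cycles(freqs[b], window_us)
        bits.append(1 if count_a > count_b else 0)
    return bits


if __name__ == "__main__":
    freqs = make_ro_frequencies(n_oscillators=8, seed=7)
    pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]  # pairs selected by the challenge
    print(ro_puf_response(freqs, pairs))
```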

2.6.1.3 Glitch PUF

In combinational logic, an output does not settle immediately after an input

change: it can make several transitions before reaching its steady-state

value. These unintended transitions in signals are called glitches. The occurrence of

glitches is determined by the differences in delay of the different logical paths from the

inputs to an output signal.

The glitch PUF proposed in [19] exploits glitch waveforms that behave non-

linearly from delay variation between gates. It consists of an on-chip high-frequency

sampling of the glitch waveform and a quantization circuit which generates a response bit

based on the sampled data. The operation sequences of the glitch PUF are as follows:

• Data input to a random logic

• Acquisition of glitch waveforms at the output

• Conversion of the waveforms into response bits

The Anderson PUF proposed in [20] generates a response bit depending on the

presence or absence of glitch. This design is targeted especially for FPGA-based

implementations. It consists of custom logical circuits implementing shift registers and

carry-chain multiplexers. Figure 2-5 shows a basic Anderson PUF. The shift registers are

implemented using a Look-Up-Table (LUT) and are initialized with bit strings that are

inverses of each other. The two LUTs generate square waves that are 180 degrees out of

phase. Due to the process variations in the LUTs and the multiplexers, the propagation

delay from the input to the output will vary from LUT to LUT. When an LUT’s outputs

are sufficiently out of phase, it produces a glitch at the output, which can be captured by a

flip-flop. The presence or absence of the glitch determines the PUF’s output bit. The Anderson

PUF has also been analyzed using neural networks and artificial intelligence [27-29].

2.6.2 Memory-based PUF

2.6.2.1 SRAM PUF

Static Random Access Memory (SRAM) is a volatile digital memory built from cells, each

capable of storing a single bit. SRAM memories are available in almost every computing

device including FPGAs, and they can be used as an intrinsic PUF. An SRAM cell is bi-stable and can

be realized with two cross-coupled inverters as illustrated in Figure 2-6.

Figure 2-5: Anderson PUF

Figure 2-6: SRAM Cell (PUF). Logical circuit (left) and six-transistor (6T) SRAM cell (right)

SRAM PUF proposed in [21] is an FPGA intrinsic PUF based on random initial

states of SRAM cells. Every cell contains a certain degree of mismatch between the two

halves of the cross-coupled circuit. The random physical mismatch in the cell, caused by

manufacturing variability, determines the power-up behavior. When the cell is powered

on, it can settle into either of the two stable states. The power-on condition forces a cell to ‘0’ or

‘1’ during power-up depending on the sign of the mismatch. But, which power-up state a

cell prefers is random and not known in advance, and this random behavior can be used

as a PUF response.
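
The power-up behavior described above can be mimicked with a simple Python model. The sketch assumes that each cell has a fixed mismatch value drawn once per chip from a Gaussian distribution and that a small noise term is added at every power-up; both the distributions and the function names are illustrative assumptions.

```python
import random


def make_sram(n_cells: int, seed: int, sigma_mismatch: float = 1.0) -> list:
    """Draw a fixed mismatch per cell to mimic manufacturing variability."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_mismatch) for _ in range(n_cells)]


def power_up(mismatch: list, noise_sigma: float = 0.0, rng=None) -> list:
    """Power-up state of every cell: the sign of mismatch plus noise picks '1' or '0'."""
    rng = rng or random.Random()
    return [1 if m + rng.gauss(0.0, noise_sigma) > 0 else 0 for m in mismatch]


if __name__ == "__main__":
    chip = make_sram(n_cells=16, seed=3)
    first = power_up(chip, noise_sigma=0.1, rng=random.Random(10))
    second = power_up(chip, noise_sigma=0.1, rng=random.Random(11))
    flipped = sum(a != b for a, b in zip(first, second))
    print(first, "bits flipped between power-ups:", flipped)
```

Cells with a mismatch close to zero are the ones most likely to flip between power-ups, which is one source of the intra-chip variation discussed in Section 2.4.2.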

2.6.2.2 Butterfly PUF

The Butterfly PUF (BPUF) is proposed in [22] to overcome the drawbacks of an

SRAM PUF. The disadvantage of intrinsic SRAM PUFs is that not all FPGAs support

uninitialized SRAM memory. In most FPGAs, all SRAM cells are hard

reset to zero directly after power-up and hence all the randomness is lost. Also, the

SRAM PUFs require device power-up to enable the response generation.

Figure 2-7: Butterfly PUF cell

The construction of a butterfly PUF is similar to the SRAM PUF except that a BPUF cell

consists of cross-coupled latches instead of inverters. A butterfly PUF cell is depicted in

Figure 2-7. A BPUF cell can be brought to a floating or unstable state before allowing it

to settle to one of the two possible stable states. Using the clear/preset functionality of the

latches, an unstable state can be introduced after which the circuit converges back to one

of the two stable states. The preferred stable state of a butterfly PUF cell is determined by

the physical mismatch between the latches and the cross-coupling wires.

2.7 PUF Applications

Some of the major PUF applications proposed so far are as follows:

• Low-cost device authentication [9]

As the PUF output is unique and unpredictable for each IC, PUF can be used for

device identification and authentication. The PUF outputs can be stored in a database and

later compared with a re-generated signature. The set of challenge-response

pairs act as the lock and PUFs act as the key. When a key is presented to a lock, the lock

queries the key for the response to a particular challenge. The lock opens only when the

correct key from the database responds.

• Cryptographic key generation [9]

Due to the presence of noise, the PUF outputs are likely to vary slightly on every

evaluation. In order to use PUF outputs as cryptographic keys, the outputs are required to

undergo an error correction process and a key generation process. With the error correction

process, which contains initialization and re-generation, a PUF can consistently produce

the same result despite significant environmental changes. During the initialization step, the PUF

output is generated and the error correcting syndrome for that output is computed and

saved for later. The syndrome is the information that allows correcting bit-flips in re-

generated PUF outputs. In re-generation phase, the PUF uses the syndrome from the

initialization step to correct any changes in the PUF output. The key generation process

converts the PUF output into cryptographic keys.
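The two phases can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the source does not name the error correcting code, so a Hamming(7,4) syndrome (which corrects a single bit-flip in a 7-bit block) is used here, and SHA-256 stands in for the unspecified key generation step. All function names are hypothetical.

```python
import hashlib


def syndrome(bits):
    """Hamming(7,4) syndrome: XOR of the 1-based positions of all set bits."""
    s = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            s ^= position
    return s


def initialize(puf_bits):
    """Initialization phase: compute the syndrome to be saved for later."""
    return syndrome(puf_bits)


def regenerate(noisy_bits, stored_syndrome):
    """Re-generation phase: use the stored syndrome to undo one bit-flip."""
    error_position = syndrome(noisy_bits) ^ stored_syndrome
    corrected = list(noisy_bits)
    if error_position:
        corrected[error_position - 1] ^= 1
    return corrected


def derive_key(bits):
    """Key generation (assumed): hash the corrected bits into a key."""
    return hashlib.sha256(bytes(bits)).hexdigest()


reference = [1, 0, 1, 1, 0, 0, 1]  # hypothetical 7-bit PUF output
helper = initialize(reference)
noisy = reference.copy()
noisy[4] ^= 1                      # one bit flips on re-evaluation
recovered = regenerate(noisy, helper)
```

A flipped bit at position p changes the syndrome by exactly p, so the XOR of the stored and re-computed syndromes points at the bit to correct. More than one flip per block is beyond what this toy code can repair.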

• Memoryless secret key storage [9]

In current practice, secret keys are stored in a non-volatile memory for

cryptographic primitives. Managing secrets in a memory in a secure way is difficult and

expensive. Storing secrets in a non-volatile memory is also vulnerable to invasive attacks.

A PUF can generate volatile secret keys for cryptographic applications. PUFs increase the

physical security by generating volatile secret keys in digital form when the chip is

operating.

• Hardware Random Number Generator (HRNG) [30]

A hardware random number generator extracts randomness directly from a complex

physical source. A PUF-based HRNG accepts an incoming request for a random output and produces

the output through an iterative process that generates a challenge in order to give

unpredictable results. An unpredictable challenge is saved in local registers. Once a

suitable challenge is found, a post-processing step is applied to remove bias and extract

randomness from the bit ordering. Results of the National Institute of Standards and Technology

(NIST) tests indicate that a PUF can be used as a reasonably good

hardware random number generator with low area overhead.

Figure 2-8: Secret key generation using PUF (initialization phase and re-generation phase; blocks: PUF circuit, syndrome, secret key)

• Software licensing [12]

A piece of code can be made to run only on a chip that has a specific identity

defined by a PUF. This prevents the execution of pirated code.

• Intellectual Property (IP) protection [21]

PUFs provide IP protection of FPGAs based on public key cryptography. The

major advantage of using a public-key based protocol is that it allows a design in which

the private key is always stored in an FPGA. As PUFs implemented on FPGAs are

intrinsic to the FPGAs, they provide better security.

Figure 2-9: HRNG using PUF (blocks: incoming request, challenge, PUF circuit, response, error correction, save value, random numbers)

• PUF-based Radio Frequency IDentification (RFID) tags for anti-counterfeiting [31]

An RFID tag can be made unclonable by linking it inseparably to a PUF.

Chapter 3

Self-Timed Rings

3.1 Introduction

On-chip digital oscillators are ubiquitous in many IC designs. They are considered

a key component in many applications including PLLs, frequency synthesizers and clock

recovery systems. Oscillators are also an essential block for many cryptographic

applications such as on-chip TRNGs [32, 33] and PUFs [9, 14]. This chapter discusses

the Self-Timed Ring (STR), also called an asynchronous ring, as an alternative

to the standard inverter ring oscillator.

3.2 Asynchronous Circuits

Asynchronous circuits, or self-timed circuits, use handshaking between their

components in order to perform the necessary synchronization, communication, and

sequencing of operations. Asynchronous circuits offer many potential advantages,

including low power consumption, high operating frequency, less EMI (Electro-Magnetic

Interference), less noise, robustness towards variation in supply voltage, temperature, and

fabrication process parameters, better modularity for easier reuse of components, and no

clock skew problems [6]. However, asynchronous circuits are not yet mature enough to

be widely accepted in industry, especially due to the lack of suitable Electronic

Design Automation (EDA) tools for asynchronous designs. The acceptance of

asynchronous technology by the semiconductor industries strongly depends on the

availability of synthesis tools and the possibility to prototype a design on standard

FPGAs.

The development of synchronous circuits currently dominates the semiconductor

design industry. However, there are major limiting factors to the synchronous, clocked

approach, including the increasing difficulty of clock distribution, increasing clock rates,

decreasing feature size, increasing power consumption, timing closure effort, and

difficulty with design reuse. Asynchronous circuits can offer a better solution to address

these issues. As the demand continues for designs with higher performance, higher

complexity, and decreased feature size, asynchronous paradigms will become more

widely used in the industry, as evidenced by the 2003 and 2007 International Technology

Roadmap for Semiconductors’ (ITRS) prediction of a likely shift from synchronous to

asynchronous design styles in order to increase circuit robustness, decrease power, and

alleviate many clock-related issues. The 2008 ITRS shows that asynchronous circuits

account for 11% of chip area in 2008, compared to 7% in 2007, and estimates they will

account for 23% of chip area by 2014, and 35% of chip area by 2019 [34].

3.3 Asynchronous Logic

Logic design, in general, consists of a separate computation part and storage part.

Computation takes place in a combinational block or a functional block; whereas storage

takes place in flip-flops, or registers, or latches, although they may exist combined or

separately. In synchronous logic, a global time reference, or a clock, controls activity to

synchronize the entire functional block in a circuit, or a system. Asynchronous logic uses

a local handshaking protocol to communicate among different modules, or functional

blocks. Local handshake between combinational blocks is also called asynchronous

control. Figure 3-1 and Figure 3-2 show synchronous and asynchronous

communication, respectively, for controlling events.

Figure 3-1: Synchronous circuit (stages A to E pass data under a common clock, CLK)

Figure 3-2: Asynchronous circuit (stages A to E pass data forward and return ack signals)

Figure 3-3: Abstract data-flow view of an asynchronous circuit (channel or link = data + handshake signals)

An asynchronous circuit can be represented as a static data-flow structure. The

static data-flow structure represents a high-level view of asynchronous design that is

equivalent to Register Transfer Level (RTL) in synchronous design. The data is copied

from one register to the next along the path through the circuit. The handshaking between

the registers controls the data. The data and handshake signals connecting one register to

the next can be viewed as a handshake channel, or a link, as in Figure 3-3. The arrows

represent channels or links consisting of request, acknowledge and data signals. The

handshaking protocol is the basis of the following sequencing rules of asynchronous circuits

[6, 35]:

• a module starts the computation if, and only if, all the data required for the

computation are available,

• as soon as the result can be stored, the module releases its input ports,

• it outputs the result on the output port if, and only if, the port is available.

3.3.1 Muller C-element

The Muller C-element or Muller gate is a fundamental primitive for building

asynchronous logic and implementing the synchronization required by most handshaking

protocols. Figure 3-4 shows a Muller gate representation and its truth table. ‘F’ and ‘R’

represent forward and reverse input respectively, ‘Q’ and ‘Q’’ represent current output

state and previous output state respectively. Figure 3-5 shows transistor level and logic

level implementations of the Muller gate. A Muller gate copies its input value to the output when its

inputs match; otherwise it holds the previous state. A Muller gate

with an inverted reverse input copies the forward input value to the output when its inputs differ

in state; otherwise it holds the previous state.
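The behavior of both gate variants can be written as a next-state function. The following Python sketch is illustrative only; the function names are hypothetical, and `q_prev` corresponds to the previous output state Q’ in the text.

```python
def muller_c(f, r, q_prev):
    """Standard Muller gate: copy the inputs when they match, else hold."""
    return f if f == r else q_prev


def muller_c_inv_r(f, r, q_prev):
    """Muller gate with inverted reverse input: copy F when F and R differ."""
    return f if f != r else q_prev


# Enumerate both truth tables for each previous state.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
standard_rows = [(f, r, muller_c(f, r, 0), muller_c(f, r, 1)) for f, r in inputs]
inverted_rows = [(f, r, muller_c_inv_r(f, r, 0), muller_c_inv_r(f, r, 1)) for f, r in inputs]
```

Each row lists F, R, and the output for Q’ = 0 and Q’ = 1; rows where the two outputs differ are the hold cases.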

3.4 Self-Timed Rings

Rings are the backbone structures of circuits that perform iterative computations.

One can turn a pipeline into a ring by looping data from its output back around to its

input [36]. Figure 3-6 shows a three stage pipeline and the pipeline with its output

connected around to its input to form a ring. If the stages in the ring are all self-timed and

initialized with input data, then the ring will iterate under self-timed control. Self-timed

circuits use handshake protocols to control the sequencing of operations. In a self-timed

ring, events propagate between adjacent stages according to a simple

request/acknowledge handshake. These handshake signals replace the clocks of

synchronous designs.

3.4.1 Self-Timed Ring Structure

The Muller C-element, or Muller gate, is an integral part of self-timed rings. Each stage

of a self-timed ring consists of a Muller gate and an inverter [37]. A standard N-stage

self-timed ring is depicted in Figure 3-7 [38].

Figure 3-4: Standard Muller gate and its truth table (left). Muller gate with inverted reverse input and its truth table (right).

Standard Muller gate:

F   R   Q
0   0   0 (Reset)
0   1   Q’ (Hold)
1   0   Q’ (Hold)
1   1   1 (Set)

Muller gate with inverted reverse input:

F   R   Q
0   0   Q’ (Hold)
0   1   0 (Reset)
1   0   1 (Set)
1   1   Q’ (Hold)

Figure 3-5: Implementations of Muller C-element

Figure 3-6: Three stage pipeline (top) and a ring (bottom)

Figure 3-7: An N-stage self-timed ring (stages i+1, i, i-1, i-2 with Muller gates Ci+1 to Ci-2 and outputs Qi+1 to Qi-2)

3.4.2 Token and Bubble Propagation

The temporal behavior of the self-timed ring can be explained on the basis of the

token-bubble abstraction model. From a micro-pipeline point of view, a token usually

represents the presence of data in a stage, whereas a bubble represents an empty stage

ready to accept new data. A stage is said to have a token if its output is not equal to its

input. Similarly, a stage is said to have a bubble if its output is equal to its input. If Qi and

Qi+1 represent output for stage i and stage i+1 respectively, then token (T) and bubble (B)

may be represented as:

Token: if Qi ≠ Qi+1 and Bubble: if Qi = Qi+1.

Token-bubble configuration also represents the output states of each stage in a ring.

For example, for a ring having a TTBBBB configuration, the stage outputs are either

“101111” or “010000”. A token propagates from stage i to next stage i+1 if, and only if,

the next stage i+1 contains a bubble. Similarly, a bubble propagates from stage i+1 to

previous stage i if, and only if, the previous stage i contains a token. Figure 3-8 illustrates

propagation of tokens and bubbles in a self-timed ring. For example, with initial ring

configuration as TTB (101 or 010), propagation occurs as:

TTB (101) → TBT (100) → BTT (110) → TTB (010)
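The propagation rule can be checked with a short simulation. The Python sketch below is a simplified model under stated assumptions: every stage is a Muller gate with inverted reverse input, the forward input of stage i is the output of stage i-1, the reverse input is the output of stage i+1, and all stages are evaluated in lock-step (a real ring is event-driven). The function names are hypothetical. Since each token-bubble pattern corresponds to two complementary bit strings, only the patterns are compared.

```python
def tokens(q):
    """Token-bubble pattern: stage i holds a token when Qi != Qi+1."""
    n = len(q)
    return "".join("T" if q[i] != q[(i + 1) % n] else "B" for i in range(n))


def step(q):
    """Evaluate every stage once, using the outputs of the previous step."""
    n = len(q)
    nxt = list(q)
    for i in range(n):
        forward, reverse = q[i - 1], q[(i + 1) % n]
        if forward != reverse:  # inputs differ: copy the forward input
            nxt[i] = forward
    return nxt


state = [1, 0, 1]  # initial outputs of a three-stage ring (TTB)
history = [tokens(state)]
for _ in range(3):
    state = step(state)
    history.append(tokens(state))
```

Under these assumptions, `history` follows the sequence TTB, TBT, BTT, TTB, in agreement with the token-bubble propagation described above.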

An STR will create an oscillation only if the following conditions are satisfied [7, 39]:

• N ≥ 3 and N = NT + NB, where N is the number of stages in an STR with NT

tokens and NB bubbles.

• NB ≥ 1

• NT is a positive even number

The oscillation depends on process variability and the initial states of the ring

defined by NT and NB. An STR provides two different propagation modes: burst mode and

evenly-spaced mode, as shown in Figure 3-9. In burst mode, the tokens get together to

form a cluster that propagates all around the ring. In evenly-spaced mode, the tokens get

distributed evenly around the ring with constant spacing.

3.4.3 Jitter in Inverter RO and Self-Timed RO

Inverter Ring Oscillators (IROs) and self-timed ring oscillators are both affected by thermal

noise [8]. Its effect is called jitter in the time domain and phase noise in the frequency

domain. Self-timed ring oscillators and inverter ring oscillators differ in the way jitter

accumulates. There are two major jitter sources in FPGAs: local Gaussian jitter and

global deterministic jitter [39, 40].

Figure 3-8: Token-bubble propagation

Figure 3-9: Burst mode propagation (top) and evenly-spaced mode propagation (bottom)

Local Gaussian jitter is the source of randomness. For FPGA-based

implementation, where each stage of the ring oscillator is implemented in a single Look-Up-Table

(LUT), each stage of the ring oscillator is considered a source of local Gaussian

jitter. In inverter ring oscillators, the oscillation period is defined by two loops of a single

token around the ring, and the jitter accumulates with the number of crossed stages.

In asynchronous ring oscillators, by contrast, several tokens propagate around the ring and

the oscillation period is defined by the elapsed time between successive tokens. Each

token crossing a stage experiences varying delay characteristics due to the local Gaussian

jitter contribution of the stage. So, the period jitter in STRs is mostly composed of the

jitter generated locally in the ring stage. In PUF design, this gives STRs better robustness

against the noise instabilities that accumulated jitter causes in inverter ring oscillators.

Global deterministic jitter is due to the non-random variations in delay

characteristics caused by external environmental variations. The global deterministic

jitter accumulates linearly throughout the ring in IROs. In STR oscillators, several events

propagate simultaneously, so deterministic jitter affects each event in the same way rather

than accumulating over the whole ring structure. This makes self-timed ring

oscillators more robust than inverter ring oscillators.

Chapter 4

Asynchronous Approach to Ring Oscillator for FPGA-based PUF Design

4.1 Introduction

Recent developments and advances in design and process technology have made the

Field Programmable Gate Array (FPGA) a key component in most electronic

systems. FPGAs are semiconductor devices consisting of a matrix of Configurable Logic

Blocks (CLBs), which are connected using programmable interconnects. FPGAs

dominate a wide range of application areas including military, defense, space,

automotive and consumer electronics. It is believed that FPGAs may emerge as a potential

security platform due to their desirable features including flexibility, rapid time-to-market,

and post-silicon validation of the functionality. There has been growing concern

over the security attributes of FPGAs regarding protecting and securing information

processed within them, protecting designs during distribution and protecting intellectual

property rights [1].

This chapter mainly discusses two major implementations required for the

proposed STRO-PUF design: the LUT-based implementation of the Muller C-element and the

asynchronous approach to the ring oscillator for implementing a Self-Timed Ring (STR) on

FPGAs.

4.2 FPGA Architecture

The typical FPGA architecture consists of an array of logic blocks, Input / Output

(I/O) pads and routing channels. The array is surrounded by programmable I/O blocks,

which provide an external interface to the FPGA. The logic block is also called a

Configurable Logic Block (CLB) or Logic Array Block (LAB), depending on the vendor.

Xilinx and Altera are the two major FPGA vendors in the current market. The detailed

architecture of FPGAs differs from one vendor to another; however, a typical

FPGA architecture is shown in Figure 4-1.

Figure 4-1: A typical FPGA architecture (labels: logic block, I/O pad, routing channels)

Logic blocks implement logic functions. They form the basic computation and

storage elements of digital logic functions on an FPGA. The logic block consists of Logic

Cells (LCs), which are also called Logic Elements (LEs) or slices. The typical logic cell

consists of a Look-Up-Table (LUT) and storage elements such as latches or flip-flops. The

input signals consist of inputs to LUTs and a clock input, and the cell can have a registered or

unregistered output. The basic structure of a logic block is shown in Figure 4-2.

4.2.1 Architecture of Spartan-II

The proposed design is implemented using a Xilinx XC2S100 FPGA device. This

section gives an overview of the Spartan-II family architecture, which helps in

implementing the STR on the FPGA. The XC2S100 device has an array of 20 rows by 30

columns of CLBs, which totals 600 CLBs, and has 2700 logic cells [41].

The basic building block of the Spartan-II FPGA CLB is the Logic Cell (LC). An

LC includes a 4-input function generator, carry logic, and a storage element. Each

Spartan-II FPGA CLB contains four LCs, organized in two identical slices.

A Spartan-II slice is shown in Figure 4-3. The function

generators are implemented as 4-input LUTs.

Figure 4-2: Structure of a typical logic block (labels: inputs, CLK, LUT, FF/latch, output)

4.3 LUT Implementation of a Muller Gate

Every Look-Up-Table (LUT) implements a Boolean logic equation, which is

defined by an INIT attribute. The INIT attribute, written as hexadecimal

digits, is attached to the LUT to specify its logical function [42]. The INIT

parameter for the LUT primitive defines the logical values of the LUT. This value is zero

by default, which drives the output to zero regardless of the input values. The LUT can

be loaded with custom hexadecimal values, defined by the INIT attribute, to perform a

particular logical function.

Figure 4-3: Spartan-II slice (labels: LUT, carry and control logic)

A self-timed ring requires its initial states to be loaded with the required configuration

of tokens and bubbles, which can be defined by assigning the output of each stage with

either ‘0’ or ‘1’. A Muller gate with a set/reset feature (as shown on the left side of Figure

4-4) is used to force its output to either set or reset as desired. A Muller gate with set

input is called set Muller gate and a Muller gate with reset input is called reset Muller

gate. The set Muller gate forces its output to ‘1’ and reset Muller gate forces its output to

‘0’ during the initialization process.

A 4-input LUT with general output is considered in the implementation to define

STR stages. Figure 4-4 shows a single stage of a self-timed ring oscillator for its

implementation in LUT. One of the inputs is configured as a Set/Reset (SR) signal, which

is responsible for setting stage output value at either ‘0’ or ‘1’. The remaining three

inputs are configured as forward input (F), reverse input (R) and feedback (Q’).

Figure 4-4: A stage in STR. Muller gate with set/reset option (left). LUT mapped as Muller gate (right) for FPGA implementation (LUT inputs: I3 = Set/Reset, I2 = F, I1 = R, I0 = Q’; output O = Q)

A common technique to determine the desired INIT value for realizing a logical

function with a LUT is to use a truth table. The logical functions of the reset Muller gate and the set

Muller gate are mapped in Table 4.1 and Table 4.2, respectively. The hexadecimal digits that

define the INIT attribute are obtained by grouping the output bits: the output states are

read in groups of four from the bottom up, and each group is

converted into a hexadecimal character. From the tables below, the INIT attributes

obtained for the reset Muller gate and the set Muller gate are “00B2” and “FFB2”, respectively.

Figure 4-5 shows the VHDL instantiation of the reset Muller gate using a 4-input LUT with

the INIT attribute.

Table 4.1: LUT mapping of reset Muller gate. INIT => x“00B2”

I3 = SR   I2 = F   I1 = R   I0 = Q’   O = Q   INIT
0         0        0        0         0
0         0        0        1         1       “0010” = 2
0         0        1        0         0
0         0        1        1         0
0         1        0        0         1
0         1        0        1         1       “1011” = B
0         1        1        0         0
0         1        1        1         1
1         0        0        0         0
1         0        0        1         0       “0000” = 0
1         0        1        0         0
1         0        1        1         0
1         1        0        0         0
1         1        0        1         0       “0000” = 0
1         1        1        0         0
1         1        1        1         0

Table 4.2: LUT mapping of set Muller gate. INIT => x“FFB2”

I3 = SR   I2 = F   I1 = R   I0 = Q’   O = Q   INIT
0         0        0        0         0
0         0        0        1         1       “0010” = 2
0         0        1        0         0
0         0        1        1         0
0         1        0        0         1
0         1        0        1         1       “1011” = B
0         1        1        0         0
0         1        1        1         1
1         0        0        0         1
1         0        0        1         1       “1111” = F
1         0        1        0         1
1         0        1        1         1
1         1        0        0         1
1         1        0        1         1       “1111” = F
1         1        1        0         1
1         1        1        1         1
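The INIT values in the two tables can be reproduced programmatically. The Python sketch below is an illustration rather than part of the implemented design: it assumes the input mapping given in the table headers (I3 = SR, I2 = F, I1 = R, I0 = Q’) and that bit k of INIT holds the LUT output for input address k. The function name `lut_init` is hypothetical.

```python
def lut_init(forced_value):
    """Build the 16-bit INIT word of a Muller gate stage mapped to a 4-input LUT.

    forced_value is the output while SR = 1: 0 for a reset Muller gate,
    1 for a set Muller gate.
    """
    init = 0
    for address in range(16):
        sr = (address >> 3) & 1      # I3
        f = (address >> 2) & 1       # I2
        r = (address >> 1) & 1       # I1
        q_prev = address & 1         # I0
        if sr:
            out = forced_value       # initialization mode
        else:
            out = f if f != r else q_prev  # Muller gate with inverted R
        init |= out << address
    return init


reset_init = format(lut_init(0), "04X")  # "00B2"
set_init = format(lut_init(1), "04X")    # "FFB2"
```

Reading the output column of each table from the bottom up, four rows at a time, yields the same hexadecimal digits as the bit-by-bit construction used here.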

4.4 Logical Implementation of a Self-Timed Ring Oscillator

The proposed PUF design is a logic-based design, which uses asynchronous ring

oscillators instead of basic inverter ring oscillators. The design is especially targeted for

LUT-based FPGAs. Each stage in a ring is mapped in an LUT to perform a Muller gate

function. An asynchronous ring oscillator can be constructed by replicating each stage of

the ring described in Figure 4-4 to form a ring structure, as illustrated in Figure 3-7 in

Chapter 3. The ring should be designed to meet the oscillation conditions described in

Chapter 3. It is necessary to initialize ring stages, satisfying the oscillation conditions,

before oscillation occurs. The number and positions of set Muller gates and reset Muller

gates define the initialization states and the token-bubble states in the ring.

Figure 4-5: VHDL instantiation of reset Muller gate.

Figure 4-6 depicts a four-stage asynchronous ring oscillator implemented using

LUTs. A common signal ‘SR’ is connected to every stage of the ring. The SR signal controls

the initialization and oscillation of the ring oscillator. In other words, SR switches the

self-timed ring oscillator between initialization mode and oscillation mode. For the

purpose of this design, initialization occurs when SR = ‘1’ and oscillation occurs when

SR = ‘0’.

Placement constraints [43] are used in the code to ensure each stage of the

ring is mapped in a separate LUT. They prevent the synthesis tool from altering the

design mapping. Figure 4-7 shows the

schematic view of the implemented 6-stage self-timed ring oscillator with 2T4B

configuration and the initial states of “101111”.

Figure 4-6: LUT-based four-stage asynchronous ring oscillator

Each stage of the ring is mapped in a separate LUT. Since six different LUTs are

used for implementing the ring oscillator, three different slices are used, as shown in

Figure 4-8. The position of each stage of the self-timed ring oscillator is defined by

using placement constraints, as shown in Figure 4-9.

Figure 4-7: Technology schematic view of 6-stage self-timed ring oscillator with

2T4B configuration and the initial states of “101111”.

4.5 Experimental Results

To observe the oscillatory behavior of a self-timed ring, the design is

implemented on an XSA board with a Xilinx XC2S100 FPGA device. For experimental

analysis, the self-timed ring oscillator is implemented with different numbers of stages,

and with different spatial configurations. Figure 4-10 through Figure 4-12 below show

the oscillation pattern of post-place & route simulation results and the real output captured

by a logic analyzer. Table 4.3 shows the frequency observed for different

configurations of self-timed ring oscillators.

Figure 4-8: Implementation of 6-stage self-timed ring oscillator shown in Xilinx FPGA Editor (6-stage STR mapped in 3 separate slices)

Figure 4-9: Placement constraint used to define position of stages of a self-timed ring

The oscillation frequency of the ring oscillator depends on the number of events,

i.e. the number of bubbles or the number of tokens, but not on the spatial arrangement or

distribution for the same number of tokens and bubbles. From Table 4.3, it can be

observed that the 6-stage ring oscillator with a spatial distribution of “TTBBBB” or

“TBTBBB” results in the same frequency. Also, with different initialization states,

a ring oscillator with the same number of stages can give different oscillation frequencies. This is one of the

benefits of the self-timed ring, as it adds reconfigurable features to the design. Unlike that of a

conventional inverter ring oscillator, the oscillation frequency of the asynchronous ring

oscillator does not decrease with the number of stages.

Figure 4-10: Simulation result of 6-stage STR oscillator with TTBBBB configuration

Figure 4-11: Simulation result of 6-stage STR oscillator with TTTTBB configuration

Figure 4-12: Real output of 6-stage STR oscillator with TTBBBB configuration obtained from a logic analyzer

Table 4.3: Frequency values for asynchronous ring oscillators with different configurations

No. of Stages   NT.NB   Time Period (ns)   Frequency (MHz)   Spatial Configuration
6               2T4B    10                 100               TTBBBB, TBTBBB
6               4T2B    6.2                169.29            TTTTBB, TTBBTT
8               2T6B    8.3                120.48            TTBBBBBB
8               4T4B    5.9                169.49            TTTTBBBB
8               6T2B    8.3                120.48            TTTTTTBB
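The frequency column follows from the measured period through f = 1/T. As a small worked check, assuming the period is given in nanoseconds and the frequency in megahertz, the conversion can be written in Python as follows (the function name is hypothetical):

```python
def period_ns_to_mhz(period_ns):
    """Convert an oscillation period in ns to a frequency in MHz (f = 1/T)."""
    return 1000.0 / period_ns


f_2t4b = round(period_ns_to_mhz(10), 2)   # 6-stage, 2T4B: 100.0 MHz
f_4t4b = round(period_ns_to_mhz(5.9), 2)  # 8-stage, 4T4B: 169.49 MHz
f_2t6b = round(period_ns_to_mhz(8.3), 2)  # 8-stage, 2T6B: 120.48 MHz
```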

4.6 Conclusion

This chapter described the technique for LUT-based implementation of a Muller gate to construct a

self-timed ring oscillator, or asynchronous ring oscillator. The

experimental analysis illustrates the oscillation generated by an asynchronous ring

oscillator with different configurations.

Significant process variations exist in IC fabrication,

which make each IC unique in its delay characteristics [11, 44]. The statistical delay

variation in transistors and wires across FPGA chips can be exploited through identically

laid-out asynchronous ring oscillators. The next chapter discusses the proposed FPGA-based

PUF using the self-timed ring oscillator.

Chapter 5

STRO-PUF: Self-Timed Ring Oscillator based PUF

5.1 Introduction

This chapter introduces the implementation of self-timed ring oscillators as a

novel PUF approach on FPGAs. The proposed PUF is named the ‘Self-Timed Ring

Oscillator based Physical Unclonable Function (STRO-PUF)’. Like the RO-PUF, the self-timed

ring oscillator based PUF generates oscillations of different frequencies when

identically mapped on a semiconductor device. These varying frequencies produced by

all identically mapped self-timed ring oscillators can be used to generate unique PUF

response bits.

Although the self-timed ring is well studied in many contexts, very

limited work has been done in the field of hardware cryptography and the areas of security

applications using the concept of asynchronous logic. In [8], the author initiated PUF

implementation using asynchronous ring oscillators to address robustness and entropy.

However, the result is limited to electrical simulation. The work described in this thesis

is implemented on real silicon devices. In [39], the authors analyzed a self-timed ring

oscillator as the entropy source for a True Random Number Generator (TRNG)

implemented on an FPGA. This chapter aims to explore the implementation of asynchronous

ring oscillators in PUF design targeting FPGA devices.

5.2 Architecture of STRO-PUF

The proposed PUF architecture is also based on a ring oscillator, but it uses a self-

timed ring oscillator instead of a conventional inverter ring oscillator. The architecture of

the proposed design for a self-timed ring oscillator based PUF is shown in Figure 5-1. It

consists of two groups of identically laid-out self-timed ring oscillators. A Set/Reset

(SR) signal is common to all the oscillators present in both groups. The SR signal

initializes the states of every ring oscillator in order to create oscillations.

The initialization is done by setting SR = ‘1’; SR is then switched back to SR = ‘0’ so

that oscillation is created. Each oscillator oscillates at a different frequency due to

process variations. Outputs of each oscillator are fed to the multiplexers (MUX) of

corresponding groups. Inputs to the PUF are given through a challenge generator, which

selects two self-timed ring oscillators, one from each group. The frequency comparator

captures the frequency difference between these two oscillators and generates a single

output bit. A frequency comparator consists of two counters counting TV (target value)

periods of the two frequencies coming from each MUX. Whichever counter reaches the

target value TV first is driven by the higher frequency. For

example, if the frequencies of STROs from group A and group B are f1 and f2

respectively, then the response bit = 1 if f1 ≥ f2; otherwise the response bit = 0. A unique

set of output responses is generated for each set of input challenges, which is used in

identifying a particular device and also used in various cryptographic applications.
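The counter race performed by the frequency comparator can be modelled in Python as below. This is a behavioral sketch only, not the hardware description used in the design: the function names, the default target value of 1000, the representation of a challenge as a pair of indices, and the example frequencies are all assumptions made for illustration.

```python
def compare_frequencies(f1_mhz, f2_mhz, target_value=1000):
    """Model two counters racing to target_value; return the response bit.

    A counter clocked at f MHz needs target_value / f microseconds to
    count target_value periods, so the faster oscillator finishes first.
    A tie yields 1, matching the rule "response bit = 1 if f1 >= f2".
    """
    time_a = target_value / f1_mhz
    time_b = target_value / f2_mhz
    return 1 if time_a <= time_b else 0


def puf_response(group_a, group_b, challenge):
    """Select one oscillator per group (the MUX step) and compare them."""
    index_a, index_b = challenge
    return compare_frequencies(group_a[index_a], group_b[index_b])


# Hypothetical oscillator frequencies in MHz for two small groups.
group_a = [101.2, 99.7, 100.4]
group_b = [100.1, 100.9, 100.4]
bits = [puf_response(group_a, group_b, (i, i)) for i in range(3)]
```

With these example values the comparisons give the bits 1, 0 and 1 (the last pair is a tie and therefore resolves to 1).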

5.3 Implementation of STRO-PUF

FPGAs are considered an efficient platform for implementing cryptographic

algorithms on hardware. The implementation of PUFs on FPGAs involves significant

challenges because it is difficult for a designer to exploit full layout level design

techniques, and there is not sufficient information available about the gate level structure

of the FPGA fabric. Also, many PUF designs require careful routing symmetry, and this

is quite difficult to achieve in FPGA-based design.

A six-stage asynchronous ring oscillator is considered for the purpose of the

implemented PUF design. The prototype asynchronous ring oscillator, which is

implemented using an LUT-based approach, is shown in Figure 5-2. The details of LUT-

based implementation of a self-timed ring oscillator have already been discussed in

Chapter 4. The proposed PUF design requires the identical mapping of each self-timed

ring oscillator. This includes both the symmetrical routing and the placement of identical

circuit instances. The FPGA Editor in the Xilinx toolset allows the user to create identical

instances using hard-macros. Figure 5-3 shows the layout of a six-stage self-timed ring

oscillator implemented as a hard-macro. The bull’s eye symbol represents the reference

point of the hard-macro.

Figure 5-1: Architecture of the proposed STRO-PUF (Group A and Group B oscillators sharing the SR signal, challenge generator, frequency comparator, response bits)

Each group in a PUF circuit can have a number of asynchronous ring oscillators.

The number of ring oscillators in the groups determines the possible combination of input

challenges, the number of responses and the number of bits in each response. The

response generated from the PUF circuit also depends on how the comparisons are made

among the oscillators. Depending on the number of oscillators required in each group, the

self-timed ring is duplicated using the hard-macro to ensure all the oscillators are

identical.

Figure 5-2: A 6-stage asynchronous ring oscillator (stage labels C1 to C6, F1 to F6, Q1 to Q6).

Figure 5-3: Hard-macro implemented as a six-stage asynchronous ring oscillator.

Figure 5-4: Layout view of an STRO-PUF implemented with 16 pairs of identical STR oscillators in each group.

Figure 5-5: Portion of an STRO-PUF in FPGA Editor.

Hard-macros are instantiated in the main program and the locations of the hard-macros

are defined in a User Constraint File (UCF) to map the PUF as desired. Figure 5-4

shows the duplication of a self-timed ring oscillator instance, which is created using

hard-macros, in order to map 16 pairs of identical oscillators for implementing the

STRO-PUF. Figure 5-5 shows a portion of the implemented STRO-PUF mapped in a

region defined in the user constraint file.

5.4 Experimental Analysis

The proposed design is implemented on three different Xilinx Spartan-II boards.

PUFs are mapped onto six different regions of each device as shown in Figure 5-6. Each

PUF is realized using 16 pairs of identically laid-out STROs with 16 STROs in each

group. For the purpose of the implemented design, a six-stage self-timed ring oscillator is

used with a two-token, four-bubble configuration (TTBBBB), which is represented by the initial

states ‘101111’ or ‘010000’.

Figure 5-6: PUFs mapped on six different regions of XC2S100 FPGA (20 X 30 CLBs)

The Set/Reset (SR) signal initializes the PUF states when SR = ‘1’ and generates

oscillations when SR = ‘0’. Figure 5-7 illustrates PUF output read from a logic analyzer

during initialization mode and oscillation mode.

5.4.1 Analysis of Output Frequencies

The frequencies generated by each self-timed ring oscillator of the STRO-PUFs are read through a logic analyzer, where the oscillatory behavior of the STROs is observed to vary. In simulation, however, the same PUF design gives identical oscillatory behavior, with the same frequency for all STROs. Figure 5-8 shows the simulated waveform and Figure 5-9 shows the real output taken from the logic analyzer. Figure 5-10 shows the frequency variations for 36 groups of asynchronous ring oscillators, which are mapped across six different regions of all three FPGAs. The maximum and minimum frequencies observed are 125 MHz and 16.2438 MHz, respectively, and the average frequency observed is 101.4460 MHz. The simulation gives an identical frequency of 100 MHz for all the oscillators, which differs from the real responses. Robust responses can be obtained by selectively comparing oscillators whose frequencies differ by a larger margin.

Figure 5-7: PUF outputs during initialization mode (SR = 1) and oscillation mode (SR = 0).

Figure 5-8: Simulation result of STRO-PUF output frequencies.

Figure 5-10: Distribution of frequencies generated by asynchronous rings across FPGA devices. [Plot "Frequency Variation": x-axis, asynchronous ring; y-axis, frequency; series RO1 to RO16.]

Figure 5-9: Portion of STRO-PUF output frequencies taken from a logic analyzer.

5.4.2 Analysis of Uniqueness of STRO-PUF

For each challenge provided, a pair of oscillators is selected to generate a single response bit. For k ring oscillators, k(k-1)/2 distinct pairs can be selected to generate k(k-1)/2 response bits. However, generating response bits from all possible pairs reduces entropy because dependent bits are included [13]. To avoid correlation, a simple approach is to use each oscillator only once, so that each pair contributes a single independent bit. The uniqueness can be calculated using Equation 2.1.
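The two ways of forming response bits can be sketched in Python as follows. This is only an illustration: the function names and the example frequencies are hypothetical, and the `>=` comparison simply mirrors the one used in the MATLAB listings of Appendix A.

```python
from itertools import product


def pair_count(k: int) -> int:
    """Number of distinct oscillator pairs among k oscillators: k(k-1)/2."""
    return k * (k - 1) // 2


def one_to_one_response(group_a, group_b):
    """One bit per pair (a_i, b_i); every oscillator is used exactly once."""
    return [1 if a >= b else 0 for a, b in zip(group_a, group_b)]


def all_pairs_response(group_a, group_b):
    """One bit per (a_i, b_j) combination; these bits are not independent."""
    return [1 if a >= b else 0 for a, b in product(group_a, group_b)]


if __name__ == "__main__":
    # Hypothetical frequencies in MHz, for illustration only.
    group_a = [101.2, 99.8, 100.5, 102.1]
    group_b = [100.9, 100.1, 100.5, 101.7]
    print(pair_count(32))
    print(one_to_one_response(group_a, group_b))
    print(len(all_pairs_response(group_a, group_b)))
```

With 16 oscillators per group, the first scheme yields 16 bits and the second yields 16 × 16 = 256 bits.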

The uniqueness analyses are performed for a 16-bit PUF response and a 256-bit PUF response, which differ in how the comparisons are made. Table 5.1 and Table 5.2 show the 18 PUF responses for the two comparison schemes. If each oscillator is used only once to generate a response bit, the STRO-PUF, having 16 pairs of STROs, can generate a 16-bit response. To analyze the overall signature uniqueness of the implemented design, all the PUF responses are considered. There are six different PUFs mapped on each of three FPGAs, which gives a total of 6 × 3 = 18 PUFs and therefore 18 × (18 - 1)/2 = 153 pairwise comparisons (data points). The average Hamming distance for the 16-bit responses is calculated as 7.99. Figure 5-11 illustrates the probability histogram of the Hamming distances between PUF responses, indicating an average uniqueness of 49.92%, which is very close to the desired 50%.

Table 5.1: 16-bit STRO-PUF responses

16-bit STRO-PUF responses

B0A2

6FFF

7F2A

B8DF

A647

F49B

F6E1

06FF

4041

77A7

82F5

EB70

7FFF

41DB

AF9D

4062

8590

6EE4

Figure 5-11: Uniqueness Analysis for 16-bit PUF response
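The figures quoted for the 16-bit case can be re-derived from the 18 responses listed in Table 5.1. The Python sketch below follows the same computation as the MATLAB script in Appendix A.4 (sum of pairwise Hamming distances divided by the number of pairs times the number of bits); the function and variable names are illustrative.

```python
from itertools import combinations

# The 18 responses of Table 5.1, in the order listed.
RESPONSES_HEX = [
    "B0A2", "6FFF", "7F2A", "B8DF", "A647", "F49B",
    "F6E1", "06FF", "4041", "77A7", "82F5", "EB70",
    "7FFF", "41DB", "AF9D", "4062", "8590", "6EE4",
]
N_BITS = 16


def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


def uniqueness_stats(responses_hex, n_bits):
    """Pairwise Hamming-distance statistics over all response pairs."""
    values = [int(r, 16) for r in responses_hex]
    distances = [hamming_distance(a, b) for a, b in combinations(values, 2)]
    total = sum(distances)
    return {
        "pairs": len(distances),
        "total_hd": total,
        "average_hd": total / len(distances),
        "uniqueness_pct": 100.0 * total / (len(distances) * n_bits),
        "min_hd": min(distances),
        "max_hd": max(distances),
    }


if __name__ == "__main__":
    for name, value in uniqueness_stats(RESPONSES_HEX, N_BITS).items():
        print(name, value)
```

For these 18 responses the 153 pairwise distances sum to 1222, which gives the average Hamming distance of 7.99 and the uniqueness of 49.92% reported above.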

If each oscillator in one group is compared with every oscillator in the other group, a 256-bit (16 × 16 = 256) response is obtained. The entropy of these responses is reduced because the bits obtained include correlated bits. For example, consider two ring oscillators ‘a’ and ‘b’ in group ‘A’ and two ring oscillators ‘c’ and ‘d’ in group ‘B’. The possible combinations are (a, c), (a, d), (b, c) and (b, d), generating a 4-bit response. If a>c, c>b and b>d, then it follows by transitivity that a>d, so the bit for (a, d) carries no new information. The uniqueness for the 256-bit response is obtained as 26.28%, and its histogram is shown in Figure 5-12.
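The dependency in the four-oscillator example can be made concrete by enumerating every possible frequency ordering of a, b, c and d and collecting the 4-bit patterns that can actually occur. This is a small illustrative sketch (names are hypothetical, and ties between frequencies are ignored).

```python
from itertools import permutations


def reachable_patterns():
    """All 4-bit patterns (a>c, a>d, b>c, b>d) over every strict ordering."""
    patterns = set()
    for fa, fb, fc, fd in permutations(range(4)):
        bits = (int(fa > fc), int(fa > fd), int(fb > fc), int(fb > fd))
        patterns.add(bits)
    return patterns


if __name__ == "__main__":
    patterns = reachable_patterns()
    print(len(patterns), "of 16 patterns are reachable")
    # a>c, c>b, b>d forces a>d, so (1, 0, 0, 1) can never be observed.
    print((1, 0, 0, 1) in patterns)
```

Only 14 of the 16 patterns are reachable: the two patterns that would require a cyclic ordering (a>c>b>d>a, or its reverse) never occur, so the four bits carry less than four bits of information.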

Figure 5-12: Uniqueness Analysis for 256-bit PUF response

Table 5.2: 256-bit STRO-PUF responses

256-bit STRO-PUF responses

F062300030003000F062F062F062F062F2E27000F062F062F2E2F2E2F062F2E2

7040500070400000FFFFFFFFFFFFFFFFFFFFF040FFFFFFFFFFFFFFFFFFFFFFFF

7000700070007000FF7AFF7AFF7FFF7AFF7F7000FF7A7042FF7AFF7AFF7FFF7A

F052100070401000FFFFFADAF052F052FFFFF052F052F8DAFFFFFFFFF052FFFF

F040100030000000F042FFF7FFF7F042F9627040F040F042FFF7FDF6F042FFF7

F040700020007000F4DBF453FDFBF4DBF4DB7000F053F053FDFBFDFBF042F043

F040704030003000F060F761F761F060F7E33040F761FFE3F761FFE3F040F761

7000300010000000F048FEFFFFFFFEFFFEFF7040FFFFFEFFFEFFFEFFFFFFFFFF

7000700000000000F040F040F040F841F040F040F841F040F040F841F040F841

7040700070007000F040F440FFFFFDC3FFC77000FFEFFC41F040FFC7FFC7FFC3

F040300010000000F040FA55FED5FEFDFEF57040FEFDFED5FEF5FA55FEFDFA55

F040700020002000F860F840FEFDFFFFF8607040FC79FEFDF040F860FEFDF860

7060704070407040FFFBFFFBFFFFFFFFFFFF7040FFFFFFFFFFFFFFFFFFFFFFFF

70424000400040007142714279FA79FAFFFF70427142FFFF79FA71427042FFFF

F000000020000000FFFFFFFFFFFFFFFFFFFF0000F050FC7DFC7DFFFFF050FFFF

7040704000000000F062F062F062F062F062F040F062F062F062F062F062F062

F040300010000000F0C2FCFBFCFBFDFBF0D3F000F0C3FCFBF0C2F0DBF040F042

7000600020002000FC67FE67FFF7FC67FFFFF040FC67FC67FFF7FC65F040F440

Table 5.3 summarizes the analysis for the two comparison schemes: comparing each oscillator only once, which gives responses without dependent bits, and comparing each oscillator in group A to every oscillator in group B, which gives responses with dependent bits.

Table 5.3: Comparing responses with dependent bits and independent bits

Metric               Response without dependent bits   Response with dependent bits
No. of output bits   16                                256
Uniqueness           49.92 %                           26.28 %
Average HD           7.99                              67.27
Minimum HD           1                                 10
Maximum HD           15                                123

The uniqueness (inter-die variation) achieved with the proposed STRO-PUF is closer to the desired 50% than that reported in the previous work on FPGA-based PUFs considered here. For comparison, Table 5.4 lists the uniqueness of the implemented STRO-PUF with 16-bit responses alongside previous work.

Table 5.4: Uniqueness results for FPGA-based PUFs

PUF design                       Uniqueness
STRO-PUF (proposed)              49.92 %
Configurable RO-PUF [45]         47 %
RO-PUF [9]                       46.15 %
RO-PUF [46]                      48.4 %
Configurable RO-PUF [14]         47.31 %
Anderson PUF [20]                48.28 % (average HD of 61.8 for 128-bit output)
RO-PUF based on placement [47]   Random placement: 43.40 %; chain-like placement: 48.51 %

5.4.3 FPGA Authentication using STRO-PUF

STRO-PUFs can be used to authenticate individual ICs without costly primitives. Figure 5-13 shows a basic PUF-based FPGA authentication process. A trusted party creates a Challenge-Response Pair (CRP) database from an authentic FPGA for future authentication operations. To verify authenticity, the trusted party selects a challenge from the database, applies it to the device under test, and checks whether the returned response matches the stored response. Each CRP is used only once to increase security. Both the 16-bit and the 256-bit responses generated from STRO-PUFs can be used with this device authentication mechanism.
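The enrollment and one-time verification steps described above can be sketched in Python as follows. Everything here is hypothetical scaffolding: the class name, the way a device is modeled (a callable mapping a challenge to a response), and the example values are assumptions for illustration, and the sketch uses exact matching as the text describes rather than any error-tolerant comparison.

```python
class CrpDatabase:
    """Minimal one-time-use challenge-response store (illustrative only)."""

    def __init__(self):
        self._pairs = {}

    def enroll(self, challenge, response):
        """Record a CRP measured on the authentic device."""
        self._pairs[challenge] = response

    def authenticate(self, challenge, device) -> bool:
        """Check a device against a stored CRP, then discard that CRP."""
        expected = self._pairs.pop(challenge, None)  # each CRP is used once
        if expected is None:
            return False  # unknown or already-used challenge
        return device(challenge) == expected


if __name__ == "__main__":
    # Hypothetical devices modeled as lookup tables of challenge -> response.
    authentic = {0x01: 0xB0A2, 0x02: 0x6FFF}.get
    counterfeit = {0x01: 0x7F2A, 0x02: 0xB8DF}.get

    db = CrpDatabase()
    db.enroll(0x01, 0xB0A2)
    db.enroll(0x02, 0x6FFF)

    print(db.authenticate(0x01, authentic))    # matches the stored response
    print(db.authenticate(0x01, authentic))    # CRP already consumed
    print(db.authenticate(0x02, counterfeit))  # response differs
```

Discarding a CRP after its first use means an eavesdropper who records a challenge and its response cannot replay them later.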

Figure 5-13: FPGA authentication using STRO-PUF. [Diagram: a CRP database (example entries FB19 F22F, AB43 653A, BBF2 EC31) is built from the PUF of an authentic FPGA; after the device passes through the untrusted supply-chain environment, the same challenge is applied to the PUF of the unknown FPGA, and Response1 is checked against Response2.]

5.4.4 Reliability Enhancement with STRO-PUF

The frequencies of ring oscillators can change significantly with environmental conditions, which can cause the PUF output bits to flip. The effect of temperature and voltage on the frequencies of ring oscillators is shown in Figure 5-14. When temperature increases, oscillators slow down at different rates because of differences in device or physical parameters. In Figure 5-14, at a certain initial temperature, the ring oscillator represented by a dotted line is faster than the ring oscillator represented by a solid line. When the temperature changes significantly, the order of these two oscillators flips. Similarly, with significant changes in voltage, the frequencies of the oscillators change at different rates, which can give a different comparison result. The figure shows that ring oscillators with greater frequency differences are much less likely to flip than ring oscillators with narrow frequency differences. The errors caused by bit flips can therefore be significantly reduced by comparing ring oscillators whose frequencies are far apart to generate response bits.
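One simple way to act on this observation is to keep only those oscillator pairs whose frequency gap exceeds a chosen margin. The Python sketch below illustrates the idea; the function name, the threshold value, and the example frequencies are assumptions for illustration and are not taken from the experiments in this chapter.

```python
def select_reliable_pairs(group_a, group_b, min_gap_mhz):
    """Return (pair index, response bit) for pairs with a wide frequency gap.

    Pairs whose frequencies differ by less than min_gap_mhz are skipped,
    since a small drift could reverse their order and flip the bit.
    """
    selected = []
    for index, (fa, fb) in enumerate(zip(group_a, group_b)):
        if abs(fa - fb) >= min_gap_mhz:
            selected.append((index, 1 if fa >= fb else 0))
    return selected


if __name__ == "__main__":
    # Hypothetical frequencies in MHz, for illustration only.
    group_a = [101.2, 99.8, 100.5, 102.1]
    group_b = [100.9, 100.1, 103.0, 98.4]
    print(select_reliable_pairs(group_a, group_b, min_gap_mhz=1.0))
```

The trade-off is that a larger margin discards more pairs, so fewer response bits are available per PUF.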

In the STRO-PUF, the number of tokens and bubbles can be configured at the initialization stage. By choosing the configuration of the self-timed ring oscillators that gives the largest frequency differences, maximum reliability can be achieved.

Figure 5-14: Effect of temperature and voltage on oscillator frequencies and PUF response bits. [Panels plot frequency against temperature and against voltage: with a narrow frequency difference the bits flip; with a wider frequency difference the original bits are retained.]

5.5 Conclusion

This chapter described the implementation of a PUF using self-timed ring oscillators on FPGA. It uses a logic-based design of an underlying FPGA architecture. The approach can be used to implement low-cost authentication of the FPGA device and to generate secret keys for many cryptographic applications. The frequency analysis shows that asynchronous oscillators generate varying frequencies due to process variations. These frequencies can be selectively compared to generate response bits. The uniqueness of the implemented STRO-PUF for 16-bit response is calculated as 49.92 %, which is close to the desired 50% factor. The uniqueness and the strength of PUF responses also depend on how the comparisons are done. With the inclusion of dependent bits, the uniqueness factor reduces.

Chapter 6

Conclusion

6.1 Conclusion

Today’s global marketplace has opened up not only new opportunities but also

new threats. In the current global marketplace, commercial products can be obtained

easily, either by legitimate means or simply by theft. Counterfeiting has become one of

the most significant threats to the free market. Physical Unclonable Functions (PUFs)

have emerged as a potential technique to fight against hardware counterfeiting. PUFs are

methods of extracting unique identity information from silicon devices or circuits based

on their physical properties for device authentication.

Since the inception of the PUF concept, various PUF techniques have been proposed, each with its own implications. In this thesis, a novel approach to FPGA-based PUF design using asynchronous ring oscillators has been described. It uses the logic-based design of the underlying FPGA architecture. The frequency analysis shows that asynchronous oscillators generate varying frequencies due to process variations. These frequencies can be selectively compared, based on the input challenges provided to the PUF, to generate response bits. The responses generated from the STRO-PUF can be used in device authentication and in cryptographic applications such as secret key generation and TRNG. From the experimental analysis, it is observed that the proposed PUF has a uniqueness factor of 49.92 %, which is close to the desired factor of 50%. The uniqueness achieved with the STRO-PUF is better than that of the previous FPGA-based PUF designs listed in Table 5.4. The experimental analyses also show that the uniqueness of PUF responses depends on how the input challenges are given to the PUF to generate response bits: the input challenge decides the selection of oscillators and the number of response bits.

The STRO-PUF can achieve better reconfigurability without significant hardware overhead. The initial states of the asynchronous ring oscillators in an STRO-PUF can be configured by setting different numbers of tokens and bubbles. The reliability of the STRO-PUF can be enhanced by selectively comparing the frequencies of asynchronous oscillators that have wider frequency differences.

6.2 Future Directions

The work in this thesis is an initial step toward PUF design using asynchronous

logic. Some possible extensions to this work include the following:

• The proposed design can be extended to have reconfigurable features by adding control logic to load a different token-bubble word during the initialization stage. A self-timed ring oscillator with the same number of stages can generate different frequencies with different token-bubble configurations. This feature is not possible in a conventional inverter oscillator.

• Experimental analysis of the robustness of the STRO-PUF in varying environmental conditions, such as varying temperatures and varying voltages.

• Implementing the STRO-PUF in other PUF applications, such as a secret key generator and TRNG.

• Power analysis of the STRO-PUF compared to other PUF designs.

References

[1] S. Drimer, "Volatile FPGA design security–a survey," IEEE Computer Society

Annual Volume, pp. 292-297, 2008.

[2] C. Hu, "Solving Today’s Design Security Concerns," Xilinx Corporation, 2010.

[3] C. Gorman. Counterfeit Chips on the Rise [Online]. Available:

http://spectrum.ieee.org/computing/hardware/counterfeit-chips-on-the-rise

[4] B. Gassend, D. Clarke, M. Van Dijk, and S. Devadas, "Silicon physical random

functions," in Proceedings of the 9th ACM conference on Computer and

communications security, 2002, pp. 148-160.

[5] J. W. Lee, D. Lim, B. Gassend, G. E. Suh, M. Van Dijk, and S. Devadas, "A

technique to build a secret key in integrated circuits for identification and

authentication applications," in VLSI Circuits, 2004. Digest of Technical Papers.

2004 Symposium on, 2004, pp. 176-179.

[6] J. Sparsø, "Asynchronous Circuit Design: A Tutorial," Technical University of

Denmark, 2006.

[7] J. Hamon, L. Fesquet, B. Miscopein, and M. Renaudin, "Constrained

Asynchronous Ring Structures for Robust Digital Oscillators," Very Large Scale

Integration (VLSI) Systems, IEEE Transactions on, vol. 17, pp. 907-919, 2009.

[8] J. Murphy, "Asynchronous Physical Unclonable Functions–A sync PUF,"

Multimedia Communications, Services and Security, pp. 230-241, 2012.

[9] G. E. Suh and S. Devadas, "Physical unclonable functions for device

authentication and secret key generation," in Proceedings of the 44th annual

Design Automation Conference, 2007, pp. 9-14.

[10] K. A. Bowman, S. G. Duvall, and J. D. Meindl, "Impact of die-to-die and within-

die parameter fluctuations on the maximum clock frequency distribution for

gigascale integration," Solid-State Circuits, IEEE Journal of, vol. 37, pp. 183-190,

2002.

[11] H.-Y. Wong, L. Cheng, Y. Lin, and L. He, "FPGA device and architecture

evaluation considering process variations," in Proceedings of the 2005

IEEE/ACM International conference on Computer-aided design, 2005, pp. 19-24.

[12] B. Gassend, D. Clarke, M. Van Dijk, and S. Devadas, "Controlled physical

random functions," in Computer Security Applications Conference, 2002.

Proceedings. 18th Annual, 2002, pp. 149-160.

[13] F. Bernard, V. Fischer, C. Costea, and R. Fouquet, "Implementation of ring-

oscillators-based physical unclonable functions with independent bits in the

response," International Journal of Reconfigurable Computing, vol. 2012, p. 13,

2012.

[14] A. Maiti and P. Schaumont, "Improved ring oscillator PUF: an FPGA-friendly

secure primitive," Journal of cryptology, vol. 24, pp. 375-397, 2011.

[15] R. S. Pappu, "Physical one-way functions," Massachusetts Institute of

Technology, 2001.

[16] B. Škorić, P. Tuyls, and W. Ophey, "Robust key extraction from physical

uncloneable functions," in Applied Cryptography and Network Security, 2005, pp.

407-422.

[17] P. Tuyls, G.-J. Schrijen, B. Škorić, J. van Geloven, N. Verhaegh, and R. Wolters,

"Read-proof hardware from protective coatings," in Cryptographic Hardware and

Embedded Systems-CHES 2006, ed: Springer, 2006, pp. 369-383.

[18] R. Helinski, D. Acharyya, and J. Plusquellic, "A physical unclonable function

defined using power distribution system equivalent resistance variations," in

Proceedings of the 46th Annual Design Automation Conference, 2009, pp. 676-

681.

[19] D. Suzuki and K. Shimizu, "The glitch PUF: a new delay-PUF architecture

exploiting glitch shapes," in Cryptographic Hardware and Embedded Systems,

CHES 2010, ed: Springer, 2010, pp. 366-382.

[20] J. H. Anderson, "A PUF design for secure FPGA-based embedded systems," in

Proceedings of the 2010 Asia and South Pacific Design Automation Conference,

2010, pp. 1-6.

[21] J. Guajardo, S. Kumar, G.-J. Schrijen, and P. Tuyls, "FPGA intrinsic PUFs and

their use for IP protection," Cryptographic Hardware and Embedded Systems-

CHES 2007, pp. 63-80, 2007.

[22] S. S. Kumar, J. Guajardo, R. Maes, G.-J. Schrijen, and P. Tuyls, "The butterfly

PUF protecting IP on every FPGA," in Hardware-Oriented Security and Trust,

2008. HOST 2008. IEEE International Workshop on, 2008, pp. 67-70.

[23] R. Maes, P. Tuyls, and I. Verbauwhede, "Intrinsic PUFs from flip-flops on

reconfigurable devices," in 3rd Benelux workshop on information and system

security (WISSec 2008), 2008.

[24] R. Maes and I. Verbauwhede, "Physically unclonable functions: A study on the

state of the art and future research directions," in Towards Hardware-Intrinsic

Security, ed: Springer, 2010, pp. 3-37.

[25] J. Guajardo, S. S. Kumar, G.-J. Schrijen, and P. Tuyls, "Physical unclonable

functions and public-key crypto for FPGA IP protection," in Field Programmable

Logic and Applications, 2007. FPL 2007. International Conference on, 2007, pp.

189-195.

[26] A. Maiti and P. Schaumont, "Improving the quality of a physical unclonable

function using configurable ring oscillators," in Field Programmable Logic and

Applications, 2009. FPL 2009. International Conference on, 2009, pp. 703-707.

[27] S. Pappala, M. Niamat, and W. Sun, "FPGA Based Device Specific Key

Generation Method using Physically Uncloanble Functions and Neural

Networks," presented at the IEEE 55th International Midwest Symposium on

Circuits and Systems (MWSCAS), 2012.

[28] S. Pappala, M. Niamat, and W. Sun, "FPGA based trustworthy authentication

technique using Physically Unclonable Functions and artificial intelligence," in

Hardware-Oriented Security and Trust (HOST), 2012 IEEE International

Symposium on, 2012, pp. 59-62.

[29] S. Pappala, M. Niamat, and W. Sun, "FPGA based key generation technique for

anti-counterfeiting methods using Physically Unclonable Functions and artificial

intelligence," in Field Programmable Logic and Applications (FPL), 2012 22nd

International Conference on, 2012, pp. 388-393.

[30] C. W. O’donnell, G. E. Suh, and S. Devadas, "PUF-based random number

generation," In MIT CSAIL CSG Technical Memo, vol. 481, 2004.

[31] P. Tuyls and L. Batina, "RFID-tags for Anti-Counterfeiting," in Topics in

Cryptology–CT-RSA 2006, ed: Springer, 2006, pp. 115-131.

[32] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-

based random number generators," Journal of cryptology, vol. 24, pp. 398-425,

2011.

[33] H. Bock, M. Bucci, and R. Luzzi, "An offset-compensated oscillator-based

random bit source for security applications," Cryptographic Hardware and

Embedded Systems-CHES 2004, pp. 27-83, 2004.

[34] S. C. Smith and J. Di, "Designing asynchronous circuits using NULL convention

logic (NCL)," Synthesis Lectures on Digital Circuits and Systems, vol. 4, pp. 1-

96, 2009.

[35] L. Fesquet, J. Quartana, and M. Renaudin, "Asynchronous systems on

programmable logic," Reconfigurable Communication-centric SoCs,

ReCoSoC’05, pp. 105-112, 2005.

[36] T. Williams, Latency and throughput tradeoffs in self-timed speed-independent

pipelines and rings: Computer Systems Laboratory, Stanford University, 1990.

[37] A. J. Winstanley, "Temporal Properties of Self-Timed Rings," The University of

British Columbia, 2001.

[38] A. J. Winstanley, A. Garivier, and M. R. Greenstreet, "An event spacing

experiment," in Asynchronous Circuits and Systems, 2002. Proceedings. Eighth

International Symposium on, 2002, pp. 47-56.

[39] A. Cherkaoui, V. Fischer, A. Aubert, and L. Fesquet, "Comparison of Self-Timed

Ring and Inverter Ring Oscillators as Entropy Sources in FPGAs," in Design,

Automation & Test in Europe Conference & Exhibition (DATE), 2012, 2012, pp.

1325-1330.

[40] V. Fischer, F. Bernard, N. Bochard, and M. Varchola, "Enhancing security of ring

oscillator-based trng implemented in FPGA," in Field Programmable Logic and

Applications, 2008. FPL 2008. International Conference on, 2008, pp. 245-250.

[41] Xilinx. Spartan—II FPGA Family Data Sheet [Online]. Available:

http://www.xilinx.com/support/documentation/data_sheets/ds001.pdf

[42] Xilinx. Spartan-II and Spartan-IIE Libraries Guide for HDL Designs [Online].

Available:

http://www.xilinx.com/itp/xilinx10/books/docs/spartan2_hdl/spartan2_hdl.pdf

[43] Xilinx. User Constraints Guide 10.1 [Online]. Available:

http://www.xilinx.com/itp/xilinx10/books/docs/cgd/cgd.pdf

[44] K. A. Bowman and J. D. Meindl, "Impact of within-die parameter fluctuations on

future maximum clock frequency distributions," in Custom Integrated Circuits,

2001, IEEE Conference on., 2001, pp. 229-232.

[45] Y. Haile, P. H. W. Leong, and X. Qiang, "An FPGA Chip Identification

Generator Using Configurable Ring Oscillators," Very Large Scale Integration

(VLSI) Systems, IEEE Transactions on, vol. 20, pp. 2198-2207, 2012.

[46] C. Costea, F. Bernard, V. Fischer, and R. Fouquet, "Analysis and enhancement of

ring oscillators based physical unclonable functions in FPGAs," in

Reconfigurable Computing and FPGAs (ReConFig), 2010 International

Conference on, 2010, pp. 262-267.

[47] D. Merli, F. Stumpf, and C. Eckert, "Improving the quality of ring oscillator PUFs

on FPGAs," in Proceedings of the 5th Workshop on Embedded Systems Security,

2010, p. 9.

Appendix A

Source Codes

A.1 VHDL Code for a Self-Timed Ring (STR)

-- six-stage self-timed ring oscillator

library IEEE;

use IEEE.STD_LOGIC_1164.ALL;

use IEEE.STD_LOGIC_ARITH.ALL;

use IEEE.STD_LOGIC_UNSIGNED.ALL;

library UNISIM;

use UNISIM.VComponents.all;

entity str6 is

Port ( ring_out : out STD_LOGIC;

init : in STD_LOGIC);

end str6;

architecture Behavioral of str6 is

signal qout : std_logic_vector (1 to 6) := (others => '0');

attribute loc: string;

attribute loc of LUT4_inst1: label is "CLB_R1C1.S1";

attribute loc of LUT4_inst2: label is "CLB_R1C1.S1";

attribute loc of LUT4_inst3: label is "CLB_R1C1.S0";

attribute loc of LUT4_inst4: label is "CLB_R1C1.S0";

attribute loc of LUT4_inst5: label is "CLB_R1C2.S1";

attribute loc of LUT4_inst6: label is "CLB_R1C2.S1";

begin

-- SET cell

LUT4_inst1 : LUT4

generic map (

INIT => X"FFB2")

port map (

O => qout(1), -- LUT general output

I0 => qout(1), -- LUT input, fedback

I1 => qout(2), -- LUT input, reverse signal

I2 => qout(6), -- LUT input, forward

I3 => INIT -- LUT input, set/reset

);

-- End of LUT4_inst instantiation

-- RESET cell

LUT4_inst2 : LUT4

generic map (

INIT => X"00B2")

port map (

O => qout(2),

I0 => qout(2),

I1 => qout(3),

I2 => qout(1),

I3 => INIT

);

-- SET cell

LUT4_inst3 : LUT4

generic map (

INIT => X"FFB2")

port map (

O => qout(3),

I0 => qout(3),

I1 => qout(4),

I2 => qout(2),

I3 => INIT

);

-- SET cell

LUT4_inst4 : LUT4

generic map (

INIT => X"FFB2")

port map (

O => qout(4),

I0 => qout(4),

I1 => qout(5),

I2 => qout(3),

I3 => INIT

);

-- SET cell

LUT4_inst5 : LUT4

generic map (

INIT => X"FFB2")

port map (

O => qout(5),

I0 => qout(5),

I1 => qout(6), -- reverse signal comes from the next stage (6), as in the other cells

I2 => qout(4),

I3 => INIT

);

-- SET cell

LUT4_inst6 : LUT4

generic map (

INIT => X"FFB2")

port map (

O => qout(6),

I0 => qout(6),

I1 => qout(1),

I2 => qout(5),

I3 => INIT

);

ring_out <= qout(6);

end Behavioral;
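The two INIT masks used in the listing above can be decoded with a short Python sketch to see what each ring stage computes. It relies on the standard LUT convention that the output is the INIT bit addressed by the inputs (I3 as the most significant address bit), and on the input roles given in the listing's comments (I0 feedback, I1 reverse, I2 forward, I3 set/reset). The function name is illustrative.

```python
SET_CELL = 0xFFB2    # INIT value of the SET cells in the listing
RESET_CELL = 0x00B2  # INIT value of the RESET cell in the listing


def lut4_output(init_mask: int, i3: int, i2: int, i1: int, i0: int) -> int:
    """Output of a 4-input LUT: the INIT bit at address (i3 i2 i1 i0)."""
    address = (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
    return (init_mask >> address) & 1


if __name__ == "__main__":
    # Initialization mode (I3 = 1): SET cells drive 1, the RESET cell drives 0.
    stages = [SET_CELL, RESET_CELL, SET_CELL, SET_CELL, SET_CELL, SET_CELL]
    print("".join(str(lut4_output(m, 1, 0, 0, 0)) for m in stages))

    # Oscillation mode (I3 = 0): print the truth table of one stage.
    for forward in (0, 1):
        for reverse in (0, 1):
            for feedback in (0, 1):
                out = lut4_output(SET_CELL, 0, forward, reverse, feedback)
                print(forward, reverse, feedback, "->", out)
```

With I3 = 0 both masks give the same table: the stage copies the forward input when it differs from the reverse input and otherwise holds its previous output. With I3 = 1 the six stages initialize to 101111, the state quoted in Section 5.4.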

A.2 VHDL Code for STRO-PUF

-- STRO-PUF; 16 STROs per group; 32 STROs per PUF

library IEEE;

use IEEE.STD_LOGIC_1164.ALL;

use IEEE.STD_LOGIC_ARITH.ALL;

use IEEE.STD_LOGIC_UNSIGNED.ALL;

library UNISIM;

use UNISIM.VComponents.all;

entity str6PUF8 is

Port ( init : in STD_LOGIC;

ringout : out STD_LOGIC_VECTOR (1 to 32));

end str6PUF8;

architecture Behavioral of str6PUF8 is

component hm_str6 is

port (hm_init : in std_logic;

hm_ringout : out std_logic);

end component;

begin

-- instantiating hard-macros

puf1: hm_str6

port map( hm_init => init, hm_ringout => ringout(1));

puf2: hm_str6

port map( hm_init => init, hm_ringout => ringout(2));

puf3: hm_str6

port map( hm_init => init, hm_ringout => ringout(3));

puf4: hm_str6

port map( hm_init => init, hm_ringout => ringout(4));

puf5: hm_str6

port map( hm_init => init, hm_ringout => ringout(5));

puf6: hm_str6

port map( hm_init => init, hm_ringout => ringout(6));

puf7: hm_str6

port map( hm_init => init, hm_ringout => ringout(7));

puf8: hm_str6

port map( hm_init => init, hm_ringout => ringout(8));

puf9: hm_str6

port map( hm_init => init, hm_ringout => ringout(9));

puf10: hm_str6

port map( hm_init => init, hm_ringout => ringout(10));

puf11: hm_str6

port map( hm_init => init, hm_ringout => ringout(11));

puf12: hm_str6

port map( hm_init => init, hm_ringout => ringout(12));

puf13: hm_str6

port map( hm_init => init, hm_ringout => ringout(13));

puf14: hm_str6

port map( hm_init => init, hm_ringout => ringout(14));

puf15: hm_str6

port map( hm_init => init, hm_ringout => ringout(15));

puf16: hm_str6

port map( hm_init => init, hm_ringout => ringout(16));

puf17: hm_str6

port map( hm_init => init, hm_ringout => ringout(17));

puf18: hm_str6

port map( hm_init => init, hm_ringout => ringout(18));

puf19: hm_str6

port map( hm_init => init, hm_ringout => ringout(19));

puf20: hm_str6

port map( hm_init => init, hm_ringout => ringout(20));

puf21: hm_str6

port map( hm_init => init, hm_ringout => ringout(21));

puf22: hm_str6

port map( hm_init => init, hm_ringout => ringout(22));

puf23: hm_str6

port map( hm_init => init, hm_ringout => ringout(23));

puf24: hm_str6

port map( hm_init => init, hm_ringout => ringout(24));

puf25: hm_str6

port map( hm_init => init, hm_ringout => ringout(25));

puf26: hm_str6

port map( hm_init => init, hm_ringout => ringout(26));

puf27: hm_str6

port map( hm_init => init, hm_ringout => ringout(27));

puf28: hm_str6

port map( hm_init => init, hm_ringout => ringout(28));

puf29: hm_str6

port map( hm_init => init, hm_ringout => ringout(29));

puf30: hm_str6

port map( hm_init => init, hm_ringout => ringout(30));

puf31: hm_str6

port map( hm_init => init, hm_ringout => ringout(31));

puf32: hm_str6

port map( hm_init => init, hm_ringout => ringout(32));

end Behavioral;

A.3 UCF File for Mapping STRO-PUF in a Desired Region

#PACE: Start of Constraints generated by PACE

#PACE: Start of PACE I/O Pin Assignments

# Mapping onto Region 1

NET "init" LOC = "p59" ;

# output pins for oscillators in Group A

NET "ringout<1>" LOC = "p43" ;

NET "ringout<2>" LOC = "p48" ;

NET "ringout<3>" LOC = "p47" ;

NET "ringout<4>" LOC = "p42" ;

NET "ringout<5>" LOC = "p40" ;

NET "ringout<6>" LOC = "p29" ;

NET "ringout<7>" LOC = "p28" ;

NET "ringout<8>" LOC = "p27" ;

NET "ringout<9>" LOC = "p68" ;

NET "ringout<10>" LOC = "p44" ;

NET "ringout<11>" LOC = "p46" ;

NET "ringout<12>" LOC = "p49" ;

NET "ringout<13>" LOC = "p26" ;

NET "ringout<14>" LOC = "p23" ;

NET "ringout<15>" LOC = "p57" ;

NET "ringout<16>" LOC = "p22" ;

# output pins for oscillators in Group B

NET "ringout<17>" LOC = "p75" ;

NET "ringout<18>" LOC = "p50" ;

NET "ringout<19>" LOC = "p51" ;

NET "ringout<20>" LOC = "p60" ;

NET "ringout<21>" LOC = "p62" ;

NET "ringout<22>" LOC = "p54" ;

NET "ringout<23>" LOC = "p56" ;

NET "ringout<24>" LOC = "p63" ;

#NET "ringout<25>" LOC = "p64" ;

#NET "ringout<26>" LOC = "p65" ;

#NET "ringout<27>" LOC = "p66" ;

#NET "ringout<28>" LOC = "p76" ;

#NET "ringout<29>" LOC = "p79" ;

#NET "ringout<30>" LOC = "p80" ;

#NET "ringout<31>" LOC = "p77" ;

#NET "ringout<32>" LOC = "p67" ;

#PACE: Start of PACE Area Constraints Region 1

#Region 1 Group A

INST "puf1" LOC=CLB_R1C1.S1;

INST "puf2" LOC=CLB_R1C3.S1;

INST "puf3" LOC=CLB_R1C5.S1;

INST "puf4" LOC=CLB_R1C7.S1;

INST "puf5" LOC=CLB_R1C9.S1;

INST "puf6" LOC=CLB_R1C11.S1;

INST "puf7" LOC=CLB_R1C13.S1;

INST "puf8" LOC=CLB_R1C15.S1;

INST "puf9" LOC=CLB_R2C1.S1;

INST "puf10" LOC=CLB_R2C3.S1;

INST "puf11" LOC=CLB_R2C5.S1;

INST "puf12" LOC=CLB_R2C7.S1;

INST "puf13" LOC=CLB_R2C9.S1;

INST "puf14" LOC=CLB_R2C11.S1;

INST "puf15" LOC=CLB_R2C13.S1;

INST "puf16" LOC=CLB_R2C15.S1;

#Region 1 Group B

INST "puf17" LOC=CLB_R3C1.S1;

INST "puf18" LOC=CLB_R3C3.S1;

INST "puf19" LOC=CLB_R3C5.S1;

INST "puf20" LOC=CLB_R3C7.S1;

INST "puf21" LOC=CLB_R3C9.S1;

INST "puf22" LOC=CLB_R3C11.S1;

INST "puf23" LOC=CLB_R3C13.S1;

INST "puf24" LOC=CLB_R3C15.S1;

INST "puf25" LOC=CLB_R4C1.S1;

INST "puf26" LOC=CLB_R4C3.S1;

INST "puf27" LOC=CLB_R4C5.S1;

INST "puf28" LOC=CLB_R4C7.S1;

INST "puf29" LOC=CLB_R4C9.S1;

INST "puf30" LOC=CLB_R4C11.S1;

INST "puf31" LOC=CLB_R4C13.S1;

INST "puf32" LOC=CLB_R4C15.S1;

# End of UCF

A.4 Uniqueness Analysis of STRO-PUF for 16-bit Response

% 6 PUF /FPGA , 3 devices, 16 STROs/group, 32 STROs / PUF

% independent bits, one on one comparison

% simple comparison, comparing each oscillator only once

group_size = 16;

Npuf = 36 ;

k=1;

t=0;

rbit = (0);

for p=1:2:Npuf

k=1;

t=t+1;

for j=1:group_size

if (data(j,p) >= data(j,p+1))

rbit(t,k) = 1;

else

rbit(t,k) = 0;

end

k = k +1;

end % for j

end % for p

disp(rbit);

%binary to decimal

nbit1=2.^(size(rbit,2)-1:-1:0);

%decimal to hex

hexR1=dec2hex(nbit1*rbit.');

disp(hexR1); % PUF responses

%probability density function (PDF)

%calculating hamming distance between 1-1 pairs

%(generate 16-bit per comparisons from 16 pairs of STROs)

c=0;

int_hd = (0);

for i=1:(Npuf/2 - 1)

for j = i+1 : Npuf/2

c =c+1;

int_hd(c,:)= sum(abs(rbit(i,:)-rbit(j,:)));

end % for j

end % for i

% display combinations:

disp(c);

disp(int_hd);

hd_data= int_hd; %frequency of occurrence of #bits flipped

binWidth = 1;

binCtrs = 1:1:16; %Bin centers, depends on data

n=length(hd_data);

counts = hist(hd_data,binCtrs);

prob = counts / (n * binWidth); %pmf = prob = counts / n

bar(binCtrs,prob,'hist');

min1 = min (int_hd); % minimum HD

max1 = max (int_hd); % maximum HD

fprintf ('\n minimum hamming distance = %d', min1);

fprintf ('\n maximum hamming distance = %d', max1);

s = sum(int_hd); % sum of HD

avg_hd = s/153;

fprintf ('\n average hamming distance = %f', avg_hd);

uniqueness = s/(153*16) *100; % sum_hd /(no. of combination * no. of bits in an output)

fprintf ('\n uniqueness = %f%% \n', uniqueness);

A.5 Uniqueness Analysis of STRO-PUF for 256-bit Response

% 6 PUF /device , 3 devices, 16STROs/group, 32 STROs

% inclusion of dependent bits

% comparing each oscillator in a group with every oscillator in another group

group_size = 16;

Npuf = 36 ;% no of oscillator. so no of pufs = Npuf/2

t=0;

rbit = (0);

for p = 1:2: (Npuf-1)

k=1;

t=t+1;

for i=1:(group_size)

for j=1:group_size

if (data(i,p) >= data(j,p+1))

rbit(t,k) = 1;

else

rbit(t,k) = 0;

end

k = k +1;

end % for j

end % for i

end % for p

disp(rbit);

% converting binary to decimal to hex

% can’t convert more than 52 bit ; breaking 256-bit into set of 32-bit

r1 = rbit(1:(Npuf/2),1:32);

r2 = rbit(1:(Npuf/2),33:64);

r3 = rbit(1:(Npuf/2),65:96);

r4 = rbit(1:(Npuf/2),97:128);

r5 = rbit(1:(Npuf/2),129:160);

r6 = rbit(1:(Npuf/2),161:192);

r7 = rbit(1:(Npuf/2),193:224);

r8 = rbit(1:(Npuf/2),225:256);

%binary to decimal

nbit1=2.^(size(r1,2)-1:-1:0);

%decimal to hex

hexR1=dec2hex(nbit1*r1.');

nbit2=2.^(size(r2,2)-1:-1:0);

hexR2=dec2hex(nbit2*r2.');

nbit3=2.^(size(r3,2)-1:-1:0);

hexR3=dec2hex(nbit3*r3.');

nbit4=2.^(size(r4,2)-1:-1:0);

hexR4=dec2hex(nbit4*r4.');

nbit5=2.^(size(r5,2)-1:-1:0);

hexR5=dec2hex(nbit5*r5.');

nbit6=2.^(size(r6,2)-1:-1:0);

hexR6=dec2hex(nbit6*r6.');

nbit7=2.^(size(r7,2)-1:-1:0);

hexR7=dec2hex(nbit7*r7.');

nbit8=2.^(size(r8,2)-1:-1:0);

hexR8=dec2hex(nbit8*r8.');

rbitHex = [hexR1 hexR2 hexR3 hexR4 hexR5 hexR6 hexR7 hexR8];

disp(rbitHex); % PUF responses

%probability density function (PDF)

%calculating hamming distance

c=0;

int_hd = (0);

for i=1:(Npuf/2 - 1)

for j = i+1 : Npuf/2

c =c+1;

int_hd(c,:)= sum(abs(rbit(i,:)-rbit(j,:)));

end % for j

end % for i

% display combinations:

disp(c);

disp(int_hd);

hd_data= int_hd; %frequency of occurrence of #bits flipped

binWidth = 1;

binCtrs = 1:5:250; %Bin centers, depends on data

n=length(hd_data);

counts = hist(hd_data,binCtrs);

prob = counts / (n * binWidth); %pmf = prob = counts / n

bar(binCtrs,prob,'hist');

min1 = min (int_hd); % minimum hd

max1 = max (int_hd); % maximum hd

fprintf ('\n minimum hamming distance = %d', min1);

fprintf ('\n maximum hamming distance = %d', max1);

s = sum(int_hd); % sum of hd

avg_hd = s/153; % average hamming distance

fprintf ('\n average hamming distance = %f', avg_hd);

uniqueness = s/(153*256) *100; % sum_hd /(no. of combination * no. of bits in an output)

fprintf ('\n uniqueness = %f%% \n', uniqueness);
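The uniqueness figure computed above is the average pairwise Hamming distance between PUF responses, expressed as a percentage of the response length. The following Python sketch restates that calculation in a compact form. It is an illustration only, not the thesis code: the function names and the toy responses are assumptions made for the example.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))


def uniqueness(responses):
    """Average pairwise Hamming distance as a percentage of the response length."""
    pairs = list(combinations(responses, 2))
    n_bits = len(responses[0])
    total_hd = sum(hamming_distance(a, b) for a, b in pairs)
    return 100.0 * total_hd / (len(pairs) * n_bits)


# Toy data (made-up values): three 4-bit responses.
toy = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1]]
print(uniqueness(toy))

# 18 PUF responses give 18 * 17 / 2 = 153 pairs, the constant used in the MATLAB code.
print(len(list(combinations(range(18), 2))))
```

Computing the number of pairs from the data, rather than hard-coding 153, keeps the result correct if the number of PUFs changes.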

sources/146/8551286-aam.pdf

IEEE TRANS. ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. XX, NO. YY, YEAR ZZZ 1

Thwarting Security Threats from Malicious FPGA Tools with Novel FPGA-Oriented Moving Target Defense (FOMTD)

Zhiming Zhang, Laurent Njilla, Member, IEEE, Charles A. Kamhoua, Senior Member, IEEE, and Qiaoyan Yu, Senior Member, IEEE

Abstract—The increasing usage and popularity of FPGA systems bring security concerns. Existing countermeasures are mostly based on the assumption that the computer-aided design (CAD) tools for FPGA configuration are trusted. Unfortunately, this assumption does not always hold. In this work, we investigate the potential security threats originating from untrusted CAD tools. Further, we exploit the principle of moving target defense (MTD) to propose an FPGA-oriented MTD (FOMTD) method. The three defense lines in FOMTD generate uncertainty, from the attacker’s point of view, to thwart hardware Trojan insertion attacks. The theoretical upper bound of the hardware Trojan hit rate for each defense line is provided in this work. Experimental results show that the proposed defense lines 2 and 3 reduce the Trojan hit rate by up to 40% and 91%, respectively, in the scenario where the malicious CAD tool can insert Trojans in the occupied FPGA slices. The proposed gate replacement technique in defense line 3 further improves the attack resilience and obtains an 88% reduction in the Trojan hit rate. Compared to the static redundancy-based Trojan detection method, the proposed method achieves better resilience against Trojan insertion and consumes 50% less dynamic power.

Index Terms—Moving target defense, FPGA, Xilinx, Altera, hardware Trojan, FPGA design suite, hardware security.

I. INTRODUCTION

Field Programmable Gate Arrays (FPGAs) have entered an era of rapid growth thanks to their attractive flexibility and CMOS-compatible fabrication process. Global Market Insights predicts that the FPGA market will reach 9.98 billion US dollars by 2022 [1]. The increasing popularity of FPGAs may drive more attackers to compromise FPGA-based systems through various channels. The work [2] highlights that FPGA security embraces four aspects: (1) secure operations conducted by FPGA devices, (2) utilization of FPGAs for system security enhancement, (3) secure bitstream delivery to FPGA devices, and (4) exploitation of FPGA devices as an attack surface to breach FPGA-based systems. Aspects (1) and (2) emphasize that the programmable features of FPGAs have been exploited to address the security challenges that Application-Specific Integrated Circuits (ASICs) are facing. For example, the embedded FPGA is used to perform locking key authentication [3], [4]. However, FPGAs have their own security vulnerabilities. The literature [3], [5]–[8] extensively discusses aspects (3) and (4).

Z. Zhang and Q. Yu are with the Department of Electrical and Computer Engineering, University of New Hampshire, Durham, NH 03824, USA. e-mail: [email protected].

L. Njilla is with the Cyber Assurance Branch of Air Force Research Laboratory, Rome, NY 13441, USA. e-mail: [email protected].

C. Kamhoua is with the Network Security Branch of Army Research Laboratory, Adelphi, MD 20783, USA. e-mail: [email protected].

DISTRIBUTION A. Approved for public release: distribution unlimited. Case Number: 88ABW-2018-2036. Dated 11 May 2018.

Manuscript received April 7, 2018, revised August 10, 2018, September 14, 2018, accepted October 16, 2018.

For reasons of efficiency and economy, the supply chain of modern FPGAs is becoming globalized. This trend potentially increases the chance that FPGA devices or FPGA design tools are not trustworthy. Intellectual property (IP) theft and tampering could happen in different data formats, such as hardware description language (HDL) and bitstream [3], [12]. The integrity of FPGA systems may be harmed by hardware Trojans introduced at some stages of the FPGA design flow [9]–[11].

This work aims to address the Trojan insertion threat from

malicious FPGA CAD tools. More specifically, we make the

following contributions in this work:

• We use two practical examples to demonstrate that a hardware Trojan can be injected during several stages of the FPGA design flow without disturbing the original HDL design file.

• We exploit the principle of moving target defense and propose an FPGA-Oriented Moving Target Defense (FOMTD) countermeasure to resist attacks from malicious FPGA tools. To the best of our knowledge, together with our preliminary work [10], [12], our research is the first effort that assesses the feasibility of applying the MTD concept to defeat hardware Trojan insertion via malicious FPGA software.

• We propose three defense lines to generate three types of unpredictability, which help thwart the stealthy design modification induced by compromised FPGA software. The first defense line utilizes a user constraints file to assign a portion of the design to specific FPGA slices. The second defense line randomly selects one of the design replicas at runtime and uses an input gating technique to mute the unused replicas for power saving. The third defense line divides a design into multiple submodules and assembles the complete design from hot-swappable submodules at runtime, increasing the number of design configurations on the FPGA device.

• We analyze the theoretical upper bound of the hardware Trojan hit rate for each defense line, and validate the analysis through FPGA emulation.

Digital Object Identifier: 10.1109/TVLSI.2018.2879878

1557-9999 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

The remainder of this work is organized as follows. Section

II discusses related work. Section III describes the attack

model used in this work. Practical attack examples on two

commercial FPGA design suites are provided in Section IV.

The FOMTD method is presented in Section V. The theoretical

security strength achieved by the three defense lines in the

FOMTD method is analyzed in Section V, as well. Extensive

evaluation of our method and relevant work is conducted in

Section VI. We conclude this work in Section VII.

II. RELATED WORK

One category of existing countermeasures against security threats on FPGA systems focuses on IP theft during the FPGA deployment phase. To avoid information leakage through hardware Trojans, the MORPH architecture [13] combines multiple levels of protection schemes, including morph operation, onion encryption, replication, partial runtime reconfiguration, and a hardware abstraction layer, to mitigate Trojans introduced at fabrication time or design time. No hardware cost or detailed assessment is available in [13]. In the work [6], a bitstream encryption method is implemented for the Xilinx Virtex FPGA family. The security protocol for that encryption scheme protects the IP from being illegally copied by restricting access to the configuration file and key bits. The method in [14] manipulates a state transition graph to create a rare property and form watermarks. In the PUF-FSM binding protection mechanism [15], the FSM in an IP can only be activated by the correct response from the PUF embedded in the FPGA. The MUTARCH approach in [16] assigns each FPGA device a unique architecture to encrypt the bitstream distinctively. Only the authorized device can recognize the encrypted bitstream. However, those bitstream protection methods only secure the FPGA implementation during the bitstream generation stage. Because they do not consider the potential threats from the mapping and place-and-route stages in the FPGA design flow, FPGA deployment is still vulnerable to threats from untrusted FPGA design suites.

Another category of defense efforts aims to thwart the security threats originating from malicious FPGA devices. The work [17] detects anomalies in the physical layer of the FPGA by identifying basic building blocks on the FPGA die whose physical statistical characteristics differ from those of neighboring blocks. In [18], a specific taxonomy of FPGA-based hardware Trojan attacks is illustrated. That work also presents an adapted triple modular redundancy (ATMR) scheme to detect hardware Trojans on FPGAs. The ATMR method replicates the design three times, and the third replica is activated only when a mismatch is found between the first two. In the work [19], normalized parameters (e.g., power consumption and timing variation) are weighted and combined into a threat detectability metric, which is compared with a threshold to determine whether a hardware Trojan exists in the design. The work [20] fills the unoccupied FPGA space with low-level dummy logic to eliminate the FPGA resources available for hardware Trojan insertion. Those methods could be nullified if Trojans are inserted during the process of FPGA configuration.

Few works address attacks from CAD tools for FPGAs. Logic testing and side-channel analysis have been exploited to detect the hardware Trojans inserted through malicious FPGA design suites [21]. The Multiple Excitation of Rare Occurrence (MERO) method [22] provides a compact way to generate test patterns for Trojan detection. The work [23] leverages the dependency between dynamic current and maximum operating frequency to detect hardware Trojans on FPGAs. Our preliminary work [12] addresses the security challenges that arise during FPGA deployment for legacy systems. We apply a pin grounding scheme to the unused FPGA I/O pins to block communication between FPGA Trojans and the off-chip world, and further propose a hardware MTD to thwart Trojan insertion by malicious CAD tools. We expand our work for legacy systems to general FPGA applications in [10].

Based on the discussion above, we conclude that most of the existing solutions target FPGA security threats from either the supply chain or FPGA devices, not from malicious FPGA design suites. Although FPGA vendors [24] adopt bit encryption, authentication, and key/register zeroization techniques to prevent bitstreams from being tampered with, those methods do not thwart design modification before the bitstream is generated by the FPGA software. Our previous work [10], [12] exploits the principle of MTD to generate uncertainty from the attacker’s point of view, effectively mitigating hardware Trojans and thus protecting the bitstream from being maliciously modified. In this work, we add theoretical analyses for our countermeasures, improve the MTD defense strength, and perform more extensive performance and overhead assessments.

III. ATTACK MODEL

A. Attacks from Malicious FPGA Design Suites

FPGA design software has been considered a potential threat to FPGA security [25]. Untrusted FPGA CAD tools can be exploited by attackers to insert hardware Trojans [26], [27]. As shown in Fig. 1(a), our attack model assumes that the FPGA deployment engineers, in-house designs, and the bitstream downloading channel and procedure are trusted. The untrusted phase of interest in this work is the FPGA configuration, especially the design mapping, placement, and routing stages. The attacks originate from malicious software mounted on top of the original FPGA design suite for SRAM FPGAs, as shown in Fig. 1(b). The FPGA design suite may not be malicious initially, but advanced attackers could exploit a vulnerability of the FPGA design suite to implant malicious software into the original suite through software upgrading. We argue that, because the FPGA design suite is distributed through computer networks or retailers, the integrity of the software may be sabotaged by advanced attackers. A motivating example for this type of attack is as follows: an attacker knows that a military organization or a bank is about to purchase an FPGA to perform some specific functions. The source of the FPGA devices and the functional modules (in the format of a hardware description language) are trusted after rigorous examination. A stealthy way to compromise the system is then through a compromised FPGA design suite.

Fig. 1. Contaminated FPGA design suite leading to a stealthy modification on the placelist for an FPGA device. (a) Software compromising stage, and (b) malicious software add-on in the supply chain of FPGA tools.

B. Three Levels of Attacks

As the malicious program is mounted on top of the original

FPGA CAD tool before FPGA users utilize the FPGA tool

and development board, it is reasonable to assume that the

attacker does not know what exact design will be mapped

to the FPGA die. Depending on the attacker’s capability, we

classify the attacks into three levels.

• L-1 attack: Based on his/her experience, the attacker places hardware Trojans in the most commonly used FPGA die area. At this level, the attacker does not have any knowledge of the design to be configured on the FPGA.

• L-2 attack: The attacker is capable of extracting information, such as which FPGA slices are utilized by the current design, from the FPGA placelist (i.e., the output after placement and routing). Although the attack at this level does not analyze the exact function of the design, the exploration space of L-2 attacks is significantly smaller than that of L-1 attacks.

• L-3 attack: The malicious FPGA CAD tool searches for the design replicas used by duplication-based defense techniques, and inserts identical Trojans into each replica. Attacks at this level are powerful, as L-3 attacks are able to nullify a countermeasure that simply duplicates the design. Although they are the most challenging to defend against, L-3 attacks cost attackers more resources to guarantee the success of Trojan insertion.

To make an effective impact on the design function, the hardware Trojans of interest in this work are those that alter the look-up table (LUT) configuration of the original design. A hardware Trojan is composed of a trigger and a payload. In our attack model, the trigger circuit can be located in either the occupied or unoccupied FPGA slices, but the payload circuit must interact with the FPGA area occupied by the original design.

[Figure 2 content, in the order extracted: HDL; Synthesize (XST); PlanAhead; NGDBuild (Translate); Map; PAR (Place & Route); Bitgen. Output files: .ngc, .ucf, .ngd, _map.ncd, .ncd, .bit. Attack surfaces 1, 2, and 3 produce the modified files _map.ncd*, .ncd*, and .bit*, respectively.]

Fig. 2. Attack surfaces on the Xilinx FPGA design flow. The rectangles represent the output file from each step. A file marked with * is an output file modified by the malicious FPGA software.

IV. DEMONSTRATION OF ATTACKS FROM MALICIOUS

FPGA SOFTWARE

In this section, we demonstrate two practical attacks through

two commercial FPGA design suites. The design suite’s built-

in tools are exploited to manually disturb the placelist.

A. Attacks on Xilinx ISE

Figure 2 depicts the design flow for a Xilinx FPGA design suite. There are three potential attack surfaces on which maliciously implanted FPGA tools can land. We use Xilinx ISE 14.1 as an example in the following discussion. In the mapping step, an attacker could introduce additional I/O pins, exchange the existing I/O pin connections, or modify the slew rate and the voltage level of I/O pins. As the tampered mapping output _map.ncd* is not human-readable (unless the FPGA design suite provides a program like ncd2xdl to read back the native circuit description file), it is not easy to notice the modification performed by the malicious FPGA software. More tampering with the FPGA configuration can be done in the Place and Route (PAR) step than in the mapping stage because all the LUTs, flip-flops, SRAM blocks, and interconnects are specifically designated on the FPGA die. The attack on the bitstream generation stage is mainly for the purpose of IP piracy, which is outside the scope of this work. For interested readers, the existing literature [15], [16], [28] discusses this issue extensively. Our work focuses on the first two attack surfaces shown in Fig. 2.

We successfully modified the configuration of the target slice through the FPGA editor tool from Xilinx. Figure 3 shows the graphical interface. In the edit mode of the FPGA editor, we changed the logic configuration after the PAR stage, and then redid bitstream generation. The attack can also be performed via XDL file editing followed by the command xdl2ncd. All attack actions here can be implemented in malicious FPGA software implanted in the original FPGA design suite.

Fig. 3. An example of practical attack performed through the FPGA editor tool available in the Xilinx ISE 14.1 design suite.

Fig. 4. An example of practical attack performed through Quartus Chip Planner.

B. Attacks on Altera Quartus

The Altera FPGA design suite, Quartus, leaves similar back doors for attackers to insert hardware Trojans. The security vulnerability of Quartus lies in the placement and routing process, Fitter, which is similar to PAR in Xilinx ISE. Attackers can, in theory, manipulate the entire FPGA configuration if they control Fitter or access and alter the design file that Fitter is processing. As shown in Fig. 4, attackers can change the buffer slew rate, I/O standard, or logic function of the design via the Quartus built-in tool Chip Planner. The malicious changes can be made after design compilation, and no recompilation is needed to save the changes. The attacks performed through Chip Planner are stealthy because they do not disturb the functional module in HDL format or the constraint settings.

V. PROPOSED FPGA-ORIENTED MOVING TARGET

DEFENSE (FOMTD) METHOD

Defenders must protect every entry point from potential

security threats. In contrast, an adversary only needs to find

one way to breach the attack surface. Moreover, the attacker

may even have unlimited time to perform attacks. The main

motivation of applying the MTD concept to a system is to

reduce, if not completely eliminate, the imbalanced advantage

that an attacker could have. MTD techniques can make the

system less predictable and thus the attack surface is changed

over time [29]. The early concept of MTD was illustrated

in [30] and the application of MTD has been observed in the

domain of cyber security [31].

A. Method Overview

We exploit the principle of MTD as a means to proactively address the security threats from malicious FPGA software. Unlike traditional MTD methods in the domain of cyber security, FOMTD exploits the unpredictability of a hardware design being configured on FPGAs to deter attackers from precisely inserting hardware Trojans. More specifically, the key idea of FOMTD is to make the output of FPGA placement and routing unpredictable, such that attackers who mount a malicious program on the original FPGA design suite cannot easily and successfully alter the original implementation. Note that our method does not guarantee complete prevention of all hardware intrusions. Instead, our approach increases the difficulty of a Trojan successfully landing on one (or more) FPGA slices occupied by the design.

The desired unpredictabilities are achieved by the three de-

fense lines provided by our method. In the domain of hardware

(i.e. FPGA), we exploit the following configuration resources

to realize the FOMTD method: (i) the availability of multiple

replicas of the intended design, (ii) random selection of one

replica for operation at runtime, (iii) random designation of

FPGA slice positions for the selected LUTs, and (iv) hot-

swappable submodules for runtime design assembling.

B. Defense Line 1 (DFL1): Slice Position Selection through

User Constraints File

1) Method description: The use of FPGA default settings

for placement and route will make the location of occupied

FPGA slices predictable, which eases the Trojan insertion

through malicious FPGA CAD tools. To address this issue,

we propose to specify some slice locations for the selected

LUT configurations. This specification can be performed by

appending commands to the user constraints file, which is

typically used to specify pin and timing constraints. Figure 5

shows the effect of the proposed defense line 1 (DFL1). As

can be seen, the entire design is mapped to a different area of

the FPGA grid thanks to the reallocation of three LUTs (black

squares in Fig. 5).

The selection of slice positions is conducted by FPGA users

at the FPGA deployment stage. As FPGA deployment happens

after the implementation of the malicious FPGA software, it

is not easy for the malicious software designers (attackers) to

ensure the injected hardware Trojans successfully alter user

designs. Here, we assume that attackers do not have access

to the user constraints file applied after the FPGA CAD tool

is delivered to the FPGA user. A blindly inserted Trojan may

not effectively impact the design on the FPGA.

2) Case study: We used the ISCAS benchmark circuit c6288 as an example to show the effect of slice position specification. In the first case, we followed the default settings of Xilinx ISE 14.1 to generate the placelist for c6288. In the second case, we chose one slice position for four randomly selected LUTs (we refer to this as the single-slice case). In the third case, three slice locations were designated for twelve LUTs (the triple-slice case). We can observe the design placement details in the FPGA editor. Figure 6 shows the slice occupation results (red dots) for the three cases described above. As can be seen, our defense line 1 significantly changes the design placement on the FPGA die.

Fig. 5. FPGA mapping modified by proposed defense line 1. Three parts in different colors represent three partitions of the intended design. Black squares are three LUT configurations. Proposed defense line 1 alters the default LUT mapping on the FPGA grid.

Fig. 6. Design placement observed from the Xilinx FPGA editor for (a) default setting, (b) single-slice selection, and (c) triple-slice selection cases.

3) Theoretical bound for defense line 1 thwarting different Trojan attacks: The baseline here is the original design without any protection. We assume that the intended baseline design occupies φ slices and that the entire FPGA die is composed of Φ user-controllable slices. We define the hardware Trojan hit rate, Γ, as the probability that a randomly picked slice is one of the slices utilized by the design. As long as the Trojan payload is located in the area occupied by the original design, we consider it a Trojan hit. If an attacker blindly inserts a hardware Trojan into the FPGA die (i.e., a blind attack), the Trojan hit rate is given by Eq. (1).

$$\Gamma_{\text{baseline vs. blind attack}} = \frac{\phi}{\Phi} \tag{1}$$

When the attacker has knowledge of the commonly used slice area (i.e., an L-1 attack), the target FPGA area will be smaller than the entire FPGA die. The empirical number ξ is the coefficient for how much the Trojan insertion space is narrowed by the attacker based on his/her experience, and the range of ξ is between 0 and 1. Hereafter, we refer to ξ as the space coefficient of Trojan attacks. The function f(ξ) represents the degree of accuracy with which the real design placement matches the attacker’s prediction. The detailed form of f(ξ) varies with the attacker’s LUT occupation guessing algorithm. The hardware Trojan hit rate for the design without any protection against an L-1 attack is calculated in Eq. (2). If f(ξ) reaches its maximum value, the entire design will be covered by the attack space. $\Gamma_{\text{baseline vs. L-1}}$ decreases as the space coefficient ξ increases.

$$\Gamma_{\text{baseline vs. L-1}} = \frac{f(\xi) \cdot \phi}{\xi \cdot \Phi} \tag{2}$$

When the L-2 attacker has knowledge of the detailed slice utilization, each inserted hardware Trojan will certainly impact the original design because the Trojan exploration space is equal to the injection space. The Trojan hit rate for L-2 attacks, $\Gamma_{\text{baseline vs. L-2}}$, is expressed in Eq. (3).

$$\Gamma_{\text{baseline vs. L-2}} = \frac{\phi}{\phi} = 1 \tag{3}$$

In contrast, our proposed defense line 1 (DFL1) does not use the default FPGA mapping settings. Thus, the target FPGA area remains the entire FPGA die Φ. Our Trojan hit rate becomes Eq. (4). Comparing Eq. (2) and Eq. (4), we can see that the denominator of Eq. (4) is larger than that of Eq. (2). Hence, our defense line 1 reduces the Trojan hit rate in the scenario of L-1 attacks. Once the attacker knows the exact slice utilization, the proposed defense line 1 cannot thwart L-2 and L-3 attacks and the corresponding Trojan hit rate is 1.

$$\Gamma_{\text{DFL1 vs. L-1}} = \frac{f(\xi) \cdot \phi}{\Phi} \tag{4}$$
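To make the comparison among Eqs. (1) to (4) concrete, the following Python sketch evaluates each hit rate for one set of inputs. It is an illustration under stated assumptions: the function names are hypothetical, the numeric values are made up, and the value of f(ξ) is passed in directly because its form depends on the attacker’s guessing algorithm.

```python
def hit_rate_blind(phi, Phi):
    """Eq. (1): blind attack on an unprotected design."""
    return phi / Phi


def hit_rate_l1_baseline(phi, Phi, xi, f_xi):
    """Eq. (2): L-1 attack on an unprotected design.

    xi is the space coefficient (0 < xi <= 1) and f_xi is the value of f(xi),
    supplied by the caller because f is attacker-dependent.
    """
    if not 0 < xi <= 1:
        raise ValueError("xi must lie in (0, 1]")
    return f_xi * phi / (xi * Phi)


def hit_rate_l2_baseline(phi):
    """Eq. (3): the attacker knows the occupied slices, so every Trojan hits."""
    return phi / phi


def hit_rate_l1_dfl1(phi, Phi, f_xi):
    """Eq. (4): with DFL1 the attacker must target the whole die."""
    return f_xi * phi / Phi


# Made-up example values: 200 occupied slices on a 10,000-slice die.
phi, Phi, xi, f_xi = 200, 10_000, 0.25, 0.8
print("blind attack     :", hit_rate_blind(phi, Phi))
print("baseline vs. L-1 :", hit_rate_l1_baseline(phi, Phi, xi, f_xi))
print("baseline vs. L-2 :", hit_rate_l2_baseline(phi))
print("DFL1 vs. L-1     :", hit_rate_l1_dfl1(phi, Phi, f_xi))
```

For any ξ below 1, the DFL1 value is smaller than the baseline L-1 value by exactly the factor ξ, which is the denominator argument made in the text.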

C. Defense Line 2 (DFL2): Pseudo-Random Replica Selection

1) Method description: FPGAs are by nature reconfigurable and redundant. We exploit this nature to implement the principle of MTD on FPGAs. Suppose a design is composed of multiple parts (although design partitioning is not always necessary). We duplicate the entire design (as a single unit) n times. Only one of the replicas is active at a time, and the rest are kept inactive by an input gating technique. The replica selection and input gating are controlled by a pseudo-random selector, which is not a true random number generator. Because we only have a limited number of replicas on the FPGA, the range of the random number is not large. A user-defined arbitrary logic function and a set of external inputs are good enough to pseudo-randomly choose one of the replicas. Meanwhile, the use of user-defined arbitrary logic prevents attackers from searching for a typical random number generator circuit to nullify the countermeasure in advance. Figure 7 shows the concept of our defense line 2; to save power, it has no comparison logic to examine the consistency among the n replicas. Because which replica will be active is determined only after the FPGA configuration, an attacker (at L-1) needs to blindly place the hardware Trojan across the entire FPGA die to make a successful attack.
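The replica selection and input gating described above can be modeled in software. The Python sketch below is a behavioral illustration only, not the authors’ hardware implementation: the selector logic (two parity bits folded into an index) is a hypothetical stand-in for the user-defined arbitrary logic function, and the function names are assumptions.

```python
def select_replica(ext_inputs, n_replicas):
    """Hypothetical user-defined selector logic.

    Folds the external input bits into two parity bits and maps them to a
    replica index. Any other arbitrary logic function could be used instead.
    """
    low = 0
    high = 0
    for pos, bit in enumerate(ext_inputs):
        if pos % 2 == 0:
            low ^= bit
        else:
            high ^= bit
    return (2 * high + low) % n_replicas


def gate_inputs(data, selected, n_replicas):
    """Input gating: the selected replica sees the data, the others see zeros."""
    zeros = [0] * len(data)
    return [list(data) if r == selected else zeros[:] for r in range(n_replicas)]


def run(design, data, ext_inputs, n_replicas):
    """Evaluate all replicas on gated inputs and return the active one's output."""
    selected = select_replica(ext_inputs, n_replicas)
    gated = gate_inputs(data, selected, n_replicas)
    outputs = [design(x) for x in gated]
    return selected, outputs[selected]


def parity(bits):
    """Toy design used for the example: parity of the input bits."""
    return sum(bits) % 2


print(run(parity, [1, 0, 1, 1], [1, 0, 1, 1], 3))
```

With three replicas and four possible selector values, this particular selector is not uniform over the replicas; a real design would choose the logic function and the number of replicas together.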

2) Theoretical bound for defense line 2 thwarting different Trojan attacks: Figure 8 depicts an example of the exploration space expansion achieved by our proposed defense line 2. A complete design (including replication) consists of multiple units, and the control logic for replica selection is small enough (compared to φ) to be ignored for simplicity of analysis. Because of the slice position specification, the rough size of the Trojan exploration space $S_{\text{FOMTD}}$ can be expressed by Eq. (5).

$$S_{\text{FOMTD}} = \max\left(|X_i - X_k|\right) \cdot \max\left(|Y_k - Y_j|\right) \tag{5}$$

Compared to the baseline, our method achieves the theoretical worst-case hardware Trojan hit rates for L-2 and L-3 attacks described in Eqs. (6) and (7), respectively. If L-2 attacks take place, $\Gamma_{\text{baseline vs. L-2}}$ increases to 1. In contrast, $\Gamma_{\text{DFL1\&2 vs. L-2}}$ remains low because defense line 2 expands the Trojan exploration space. The exact Trojan hit rate depends on the size of the design unit for duplication, ν. Eq. (6) is greater than $\phi / S_{\text{FOMTD}}$. Under L-3 attacks, our Trojan hit rate will not go beyond 1/n (theoretically, the worst-case Trojan hit rate corresponds to a uniform distribution of random replica selections). In our simulation section, we observed that our actual Trojan hit rate never reaches this upper bound.

Fig. 7. Schematic diagram of proposed defense line 2.

Fig. 8. Hardware Trojan attack exploration space for (a) the design placement with default FPGA setting and (b) the design protected with FOMTD defense lines 1 and 2.

$$\Gamma_{\text{DFL1\&2 vs. L-2}} = \frac{\phi}{n \cdot \nu + (\phi - \nu)} \tag{6}$$

$$\Gamma_{\text{DFL1\&2 vs. L-3}} = \frac{\phi}{n \cdot \phi} = \frac{1}{n} \tag{7}$$
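The two bounds in Eqs. (6) and (7) are easy to evaluate numerically. The short Python sketch below does so for made-up values; the function names and the example numbers are assumptions for illustration, not results from the paper.

```python
def hit_rate_l2_dfl12(phi, nu, n):
    """Eq. (6): worst-case L-2 hit rate with defense lines 1 and 2.

    phi: slices occupied by the original design
    nu:  size of the design unit that is duplicated (0 < nu <= phi)
    n:   number of replicas
    """
    if not 0 < nu <= phi:
        raise ValueError("nu must satisfy 0 < nu <= phi")
    if n < 1:
        raise ValueError("n must be at least 1")
    return phi / (n * nu + (phi - nu))


def hit_rate_l3_dfl12(n):
    """Eq. (7): worst-case L-3 hit rate, assuming uniform replica selection."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / n


# Made-up example: a 200-slice design with 4 replicas.
print(hit_rate_l2_dfl12(phi=200, nu=100, n=4))  # half of the design duplicated
print(hit_rate_l2_dfl12(phi=200, nu=200, n=4))  # whole design duplicated
print(hit_rate_l3_dfl12(n=4))
```

When the whole design is duplicated (ν = φ), Eq. (6) reduces to 1/n, which matches Eq. (7); with n = 1 it reduces to the unprotected L-2 hit rate of 1.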

D. Defense Line 3 (DFL3): Runtime Design Assembling

1) Method description: Our defense line 3 is the hot-swappable submodule assembling technique, as shown in Fig. 9. We partition the original design into m submodules and duplicate each submodule n times. Only one replica of each submodule is assembled into the complete design. The pseudo-random selector determines which replica is chosen at runtime. After a period of time, the selection of submodule replicas is changed without stopping normal operation (i.e., hot-swappable assembling). The maximum number of design configurations is n^m. This large number of configurations further increases the difficulty for the attacker to recognize the entire design for attack.

Fig. 9. Schematic diagram of the hot-swappable submodule assembling technique provided by proposed defense line 3.

Fig. 10. Two styles of applying defense line 3 to sequential circuits.
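The count of n^m configurations follows from choosing one of n replicas independently for each of the m submodules. The Python sketch below enumerates the choices for a small case and picks one pseudo-randomly. The function name and the seeded generator are illustrative assumptions; the paper uses a user-defined hardware selector rather than a software random number generator.

```python
import random
from itertools import product


def configurations(m, n):
    """All ways to pick one of n replicas for each of m submodules.

    Each configuration is a tuple of length m; entry i is the index of the
    replica chosen for submodule i.
    """
    return list(product(range(n), repeat=m))


configs = configurations(m=3, n=2)
print(len(configs))  # equals n ** m

# Pick one configuration pseudo-randomly (seeded here only for repeatability).
rng = random.Random(0)
active = rng.choice(configs)
print(active)
```

The count grows quickly with the number of submodules: for example, n = 3 replicas of m = 4 submodules already give 81 configurations.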

The hot-swappable assembling technique shown in Fig. 9

is directly applicable for combinational circuits. We tailor this

technique to make it suitable for sequential circuits. As shown

in Fig. 10, two styles are available for the circuit composed of

combinational logic and memory elements. In style I, we do

not duplicate the registers so that the submodule assembling

techniques for combinational and sequential circuits are the

same. In style II, the registers have replicas, too. To realize the

hot-swappable feature, we copy the content of active registers

to the hot-swap registers (HS Reg. in Fig. 10) before the

runtime submodule swapping happens. Then, we load the

value saved in HS Reg. to all register replicas to resume the

operation after runtime submodule swapping.

Additional option 1: input gating. To thwart L-3 attacks, we could further strengthen our defense line 3 by loosening the input gating and keeping two replicas active, so that the two replicas can be used to check the consistency of their final outputs. However, the enhanced defense capability comes with more power consumption.

Additional option 2: gate replacement on replicas. To better defeat L-3 attacks, we enhance our defense line 3 by bringing diversity to the replicas of hot-swappable submodules. In the work [18], implementation diversity is introduced by using different hard macros, which are obtained by applying different constraint conditions during FPGA synthesis. Inspired by that work, we create hard macros at the gate level so that we have more flexibility in implementing heterogeneous replicas for submodules. Those gate-level hard macros are used to replace some gates in one of the replicas. As a result, even if an attacker searches for the same FPGA configuration patterns between two replicas, the success rate of finding two identical copies for future Trojan insertion will be extremely low.

Fig. 11. Gate replacement for the security enhancement in defense line 3.

The flowchart for the proposed gate replacement on replicas
is depicted in Fig. 11. First, we randomly choose one (or
more) type(s) of logic gates, for instance NAND (c, a, b),
in one replica. Next, we apply De Morgan's laws to
replace the chosen gate with other types of logic gates while
maintaining the same Boolean function. For example, the 2-input NAND
gate can be replaced with OR (c, ∼a, ∼b). Note that all gate
replacement is done in the Verilog description. To prevent
the FPGA synthesis tool from removing our gate replacement
during the logic optimization process, we implement OR
(c, ∼a, ∼b) with three customized hard macros: HM OR (ā, b̄, c),
HM NOT (a, ā), and HM NOT (b, b̄). HM OR and HM NOT, defined in
Verilog, perform the logic OR and inversion operations, respectively.
By using hard macros, the gates for replacement can be mapped into
one independent slice, and they will not be merged with other LUT
configurations. We can conduct gate replacement for one or multiple
replicas so that the identical LUT configurations are removed. Hence,
our enhanced defense line 3 can thwart L-3 attacks.
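The functional equivalence behind the NAND replacement can be checked exhaustively. The sketch below is a plain software truth-table check; the functions `hm_not` and `hm_or` are hypothetical Python stand-ins for the hard macros and say nothing about how the macros are mapped onto slices.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """Original 2-input NAND gate."""
    return 1 - (a & b)


def hm_not(a: int) -> int:
    """Stand-in for the inverter hard macro."""
    return 1 - a


def hm_or(a: int, b: int) -> int:
    """Stand-in for the OR hard macro."""
    return a | b


def replaced_nand(a: int, b: int) -> int:
    """De Morgan form: NAND(a, b) == OR(NOT a, NOT b)."""
    return hm_or(hm_not(a), hm_not(b))


# Compare both forms on all four input combinations.
equivalent = all(
    nand(a, b) == replaced_nand(a, b) for a, b in product((0, 1), repeat=2)
)
print(equivalent)  # True
```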

2) Theoretical bound for defense line 3 thwarting different
Trojan attacks: With the defense line 3, we can obtain n^m
configurations in total. Given a design, more submodules lead
to more dynamic configurations and thus more unpredictability
for Trojan insertion. The coefficient ξi varies for each
configuration and so does f(ξi). For the Trojan to hit consistently,
the overall Trojan hit rate for L-1 attacks is as expressed in Eq.
(8), in which sp is the number of hot-swapping configurations.
As the slices for the non-duplicated portion of the design
change in each FPGA configuration, the overall Trojan hit
rate is the product of the Trojan hit rates for sp different
hot-swapping configurations. The maximum value of sp is n^m.

$$\Gamma_{\mathrm{DFL3\,vs.\,L-1}} = \prod_{i=1}^{n^m} \bigl( f(\xi_i) \ast \phi \bigr)^{sp} \tag{8}$$
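The count of assembled configurations follows from choosing one of the n replicas independently for each of the m submodules. A minimal sketch that enumerates them is given below; the function name `assemblies` and the tuple encoding are assumptions made for illustration.

```python
from itertools import product
from typing import List, Tuple


def assemblies(n_replicas: int, m_submodules: int) -> List[Tuple[int, ...]]:
    """List every way to pick one replica index per submodule position."""
    return list(product(range(n_replicas), repeat=m_submodules))


# Two replicas and four submodules, the setting used later in the experiments.
configs = assemblies(2, 4)
print(len(configs))  # 16
print(configs[:3])   # [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0)]
```

Each tuple names, position by position, which replica supplies that submodule, so the list length is the upper bound on the number of distinct hot-swapping configurations.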

With respect to L-2 attacks, the attacker knows which slices

are occupied by the design but cannot differentiate which

submodule belongs to which replica. Hence, the target slice

for Trojan insertion is not clear. The attacker has to randomly
choose φ slices out of all the occupied slices n ∗ ν + (φ − ν). The corresponding Trojan hit rate for this scenario is expressed

in Eq. (9).

$$\Gamma_{\mathrm{DFL3\,vs.\,L-2}} = \bigl( n \ast \nu + (\phi - \nu) \bigr)^{sp} \tag{9}$$

In L-3 attacks, the attacker has full knowledge of which

slices are configured for the design protected with the defense

line 3, but he/she could only form the complete design by

guessing which submodule replicas will be used. Without gate

replacement, the corresponding Trojan hit rate is shown in Eq.
(10), where $\sum_{i=1}^{m} x_i$ is equal to ν. The more swaps performed
during runtime operation (i.e., the higher sp), the lower the Trojan
hit rate the attacker can achieve.

$$\Gamma_{\mathrm{DFL3\,vs.\,L-3}} = \left( \frac{(m+1)! \cdot \prod_{i=1}^{m} n x_i}{\prod_{i=1}^{m} \bigl( n \ast \nu + (\phi - \nu) - i \bigr)} \right)^{sp} \tag{10}$$

VI. EXPERIMENTAL RESULTS

A. Experimental Setup

In the following experiments, we used the Xilinx ISE 14.1

design suite to synthesize, place and route the netlist of

ISCAS’85 and ISCAS’89 benchmark circuits, and the Amber

23 processor core (hereafter, a23) and the communication

controller Ethernet MAC (hereafter, ethmac) downloaded from

the OpenCores website. The ISCAS circuits were configured

for a Xilinx Spartan-6 XC6SLX16 FPGA, and the large-scale

a23 and ethmac circuits were mapped to a Xilinx XC6SLX75

FPGA. The detailed slice utilization of each circuit was

analyzed by our Python script to extract the occupied FPGA

slice positions. We used MATLAB to insert hardware Trojans

blindly or purposely (depending on the experimental goal)

in the extracted placelists to mimic the Trojan injection in

the FPGA mapping and PAR stages, and then measured the

hardware Trojan hit rate. We assume that only the Trojans

having payloads in the FPGA slices occupied by the design

under protection will lead to a Trojan hit. The FPGA slice

utilization and worst-case delay were obtained from the tools

available in the Xilinx design suite.
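The hit-rate measurement for blind insertion can be approximated with a short Monte Carlo sketch. The experiments above used MATLAB on extracted placelists; the Python below is only an illustration, with a hypothetical die size, a randomly generated set of occupied slices, and the function name `blind_hit_rate` all being assumptions. It applies the stated rule that a Trojan counts as a hit only if its payload lands in a slice occupied by the protected design.

```python
import random
from typing import Iterable


def blind_hit_rate(total_slices: int, occupied: Iterable[int],
                   n_trojans: int, trials: int, seed: int = 0) -> float:
    """Estimate how often at least one blindly placed Trojan hits an occupied slice."""
    rng = random.Random(seed)
    occupied_set = set(occupied)
    hits = 0
    for _ in range(trials):
        # The attacker picks distinct payload slices without knowing the placement.
        payloads = rng.sample(range(total_slices), n_trojans)
        if any(p in occupied_set for p in payloads):
            hits += 1
    return hits / trials


# Hypothetical placement: 100 occupied slices on a 1000-slice die.
placement_rng = random.Random(1)
occupied_slices = placement_rng.sample(range(1000), 100)
rate = blind_hit_rate(1000, occupied_slices, n_trojans=1, trials=20000)
print(f"estimated hit rate: {rate:.3f}")
```

With a single Trojan the estimate should sit near the occupied fraction of the die (0.1 in this example), which gives a quick sanity check on the counting rule.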

B. Variation on FPGA Slice Utilization

Variation on slice allocation for a design is critical to ensure

the high unpredictability offered by our method. Hence, we

first examined the impact of our defense line 1 on the FPGA

slice utilization. We compared all the slices used by the
baseline design with those used by the design with user-specified
slice designations. The baseline means the original benchmark circuits
without any protection. We define a metric, the non-similarity rate,
to assess the slice location differences introduced
by our defense line 1. The non-similarity rate represents the

ratio of the number of the LUT instances being placed to new

positions due to our method over the total number of slices

used in the baseline.
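As a worked illustration of this metric, the sketch below computes the rate from two placement maps. The dictionary format (LUT instance name mapped to a slice coordinate) and the function name `non_similarity_rate` are assumptions made here; the actual placement data came from the Xilinx tool reports.

```python
from typing import Dict, Tuple

Placement = Dict[str, Tuple[int, int]]  # LUT instance name -> slice (x, y)


def non_similarity_rate(baseline: Placement, protected: Placement) -> float:
    """LUT instances moved to a new position, over slices used by the baseline."""
    moved = sum(1 for inst, pos in baseline.items() if protected.get(inst) != pos)
    slices_used = len(set(baseline.values()))
    return moved / slices_used


# Hypothetical four-LUT design in which two instances change position.
baseline = {"lut0": (0, 0), "lut1": (0, 1), "lut2": (1, 0), "lut3": (1, 1)}
protected = {"lut0": (0, 0), "lut1": (2, 1), "lut2": (1, 0), "lut3": (3, 3)}
print(non_similarity_rate(baseline, protected))  # 0.5
```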

Fig. 12. Non-similarity rate achieved by proposed defense line 1 (box plot of the non-similarity rate, roughly 0.47 to 0.53, for c432, c1355, c1908, and c6288 under the 1s and 3s settings). The subscripts 1s and 3s mean that the location of a single slice or of three slices is specified in the user constraints file for the FPGA implementation. On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively.

TABLE I
MEDIANS OF NON-SIMILARITY RATE ON FPGA CONFIGURATION.

Circuits   c432_1s   c1355_1s   c1908_1s   c6288_1s    Std. deviation
Median     0.49167   0.50595    0.49351    0.49123     0.0070

Circuits   c432_3s   c1355_3s   c1908_3s   c6288_3s    Std. deviation
Median     0.5000    0.50595    0.49351    0.50125     0.0051

Circuits   s344_2s   s526_2s    s1488_2s   s13207_2s   Std. deviation
Median     0.48333   0.42105    0.4878     0.43367     0.0340

As shown in Fig. 12, our method achieves an average non-

similarity rate in the range of 0.49 to 0.51. This means that, on
average, about 50% of the LUT instances for each benchmark
circuit are placed at different positions on the FPGA die
due to our defense line 1. We repeated the simulation on

non-similarity rate for sequential circuits and summarized the

median values for all non-similarity rates in Table I. As shown,

the proposed defense line 1 approximately achieves a non-

similarity rate of 0.5. The increase on the number of user

specified slice locations slightly enlarges the variation on the

non-similarity rate (but still close to 0.5). Each non-similarity

rate in Fig. 12 and Table I was based on five test trials.

According to our case study, the standard deviation
of the median non-similarity rates is in the
range of 0.0051 to 0.034 (Table I), which is very small.
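The standard deviations listed in Table I can be reproduced from the tabulated medians. The sketch below assumes they are sample standard deviations (divisor N − 1), which is an inference from the numbers rather than something stated in the text; the row labels are informal names chosen here.

```python
from statistics import stdev

# Median non-similarity rates per row of Table I.
medians = {
    "combinational, 1 slice": [0.49167, 0.50595, 0.49351, 0.49123],
    "combinational, 3 slices": [0.5000, 0.50595, 0.49351, 0.50125],
    "sequential, 2 slices": [0.48333, 0.42105, 0.4878, 0.43367],
}

# statistics.stdev computes the sample standard deviation.
spread = {label: stdev(values) for label, values in medians.items()}
for label, value in spread.items():
    print(f"{label}: {value:.4f}")
```

Under that assumption the three rows come out at about 0.0070, 0.0051, and 0.0340, in agreement with the table.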

Figure 13 provides the average non-similarity rates for seven

benchmark circuits (c432, c1355, c1908, c6288, s444, s1488

and s13207) based on five trials. The non-similarity rates are

all near 0.5, regardless of the number of re-allocated FPGA

slices by defense line 1. Based on the results above, we do
not recommend that users re-allocate more than three slices, even for
large designs.

C. Assessment on Attack Resilience

The attack resilience of the baseline and that of our method are
compared in this section. The three attack levels introduced in

Section III.B are considered in the following assessment.

1) Hardware Trojan Hit Rate for L-1 Attacks: Recall that

attackers who execute L-1 attacks do not know the locations

Fig. 13. Average non-similarity rate for different number of re-allocated slices.

Fig. 14. Hardware Trojan hit rate reduction by proposed defense line 1 applied in the benchmark circuit (a) c432, (b) c1355, (c) c1908, (d) c6288, (e) s444, and (f) a23 in the scenario of L-1 attacks. Each panel plots the hardware Trojan hit rate Γ against the space coefficient of Trojan attacks ξ (0% to 50%) for the baseline and proposed designs.

of all occupied slices for the design of interest. We varied the

range of attack exploration space from 5% to 50% of the entire

FPGA die. Figure 14 shows that the proposed defense line 1

achieves a lower hardware Trojan hit rate Γ than the baseline in a wide range of the attack exploration space. This is because

our defense line 1 makes the LUT placement unpredictable and

not targetable for L-1 attackers. The hardware Trojan hit rate

for c432, c1908, c6288, s444, and a23 first increases with the

increasing ξ. This is because f(ξ) ∗ φ in Eq. (2), the number of occupied slices falling in the attack space, grows faster

than ξ ∗ Φ, the attack space. As the maximum value of f(ξ) is 1, Γbaseline starts to drop after ξ exceeds a threshold. In our case studies, the ξ thresholds for c432, c1355, c1908, c6288, s444, and a23 are 15%, 5%, 15%, 25%, 40%, and

Fig. 15. Hardware Trojan hit rate for (a) c432, and (b) nine benchmark circuits suffering from four hardware Trojans inserted via L-2 attacks. Panel (a) plots the hit rate against the number of inserted hardware Trojans (1 to 6); panel (b) plots it per benchmark circuit (c432, c1355, c1908, c6288, s444, s1488, s13207, a23, ethmac). Both panels compare the baseline, DFL2, and DFL3.

35%, respectively. The case of c1355 has a smaller ξ threshold than the other benchmark circuits, so we do not observe that

the corresponding Γbaseline increases with ξ. With increasing ξ, the
hardware Trojan hit rate of our method increases much more slowly
than that of the baseline. When the attack exploration space is large
enough to cover the entire design placed on
the FPGA die, the Trojan hit rate of the proposed method will
eventually approach that of the baseline.
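The rise-then-fall shape of Γbaseline can be reproduced with a toy calculation. The sketch assumes the ratio form suggested by the description above, Γ = f(ξ) ∗ φ / (ξ ∗ Φ), and an entirely hypothetical saturating coverage function f(ξ) = min(1, (ξ/ξ0)^2); the values of φ, Φ, and ξ0 are likewise placeholders, not measured data.

```python
from typing import List


def f_assumed(xi: float, xi0: float) -> float:
    """Hypothetical fraction of the design covered by the attack space; saturates at 1."""
    return min(1.0, (xi / xi0) ** 2)


def baseline_hit_rate(xi: float, phi: int, big_phi: int, xi0: float) -> float:
    """Occupied slices inside the attack space divided by the attack space size."""
    return f_assumed(xi, xi0) * phi / (xi * big_phi)


# Sweep the attack exploration space from 5% to 50% of the die.
xis: List[float] = [p / 100 for p in range(5, 55, 5)]
rates = [baseline_hit_rate(x, phi=58, big_phi=2000, xi0=0.15) for x in xis]
peak_xi = xis[rates.index(max(rates))]
print(peak_xi)  # 0.15
```

While f(ξ) is still growing faster than ξ, the ratio rises; once f(ξ) reaches 1 the numerator is fixed and the ratio falls as 1/ξ, so the peak sits at the assumed threshold ξ0.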

2) Hardware Trojan Hit Rate for L-2 Attack: Unlike

L-1 attacks, L-2 attacks are able to retrieve the exact locations

of the occupied slices. Consequently, the baseline design does

not have any resilience against L-2 attacks. The proposed

defense line 2 (DFL2) activates one complete design replica

according to the pseudo-random selection and the defense

line 3 (DFL3) assembles the hot-swappable submodules at

runtime. Here, we used two design replicas, and each replica
was composed of four submodules. Our simulation indicates that

DFL2 and DFL3 further increase the unpredictability of the

truly activated design copy and achieve a lower Trojan hit

rate than the baseline. As shown in Fig. 15(a), the baseline

yields a hardware Trojan hit rate of 1, which means Trojans are

always injected to the occupied slices. In contrast, our DFL2

and DFL3 significantly reduce the Trojan hit rate over the

baseline especially for the small number of injected Trojans.

When more Trojans are placed in the utilized FPGA slices, our

Trojan hit rate eventually increases due to the limited number

Fig. 16. Hardware Trojan hit rate for (a) c432, and (b) nine benchmark circuits suffering from four hardware Trojans inserted via L-3 attacks. Panel (a) plots the hit rate against the number of inserted hardware Trojans (1 to 6); panel (b) plots it per benchmark circuit (c432, c1355, c1908, c6288, s444, s1488, s13207, a23, ethmac). Both panels compare the baseline, DFL2, and DFL3.

of replicas available in the design.

We examined the Trojan hit rate for nine benchmark circuits,

which suffer from four Trojan insertions via L-2 attacks. Each

hardware Trojan hit rate was obtained from 10,000 test cases.

The average Trojan hit rate of DFL2 (DFL3) is 71% (38%).

As shown in Fig. 15(b), the DFL2 reduces the hit rate by up

to 40% over the baseline. The reduction on the Trojan hit rate

can be further improved to 91% with DFL3.

3) Hardware Trojan Hit Rate for L-3 Attack: L-3 attacks

can recognize the multiple replicas of the design by searching
for exactly identical or approximately similar LUT configurations.
We repeated the experiments of Section
VI.C.2), except with a different attack level. The sequential
circuits s1488 and s13207 were implemented in style I.
As shown in Fig. 16(a), the Trojan hit rate for the design under
L-3 attacks increases with the number of Trojans.
This trend is similar to that for the L-2 attack case. However,

the average Trojan hit rate of DFL2 (DFL3) against L-3 attacks

increases to 76% (48%), which is higher than in the scenario

of L-2 attacks. As shown in Fig. 16(b), the DFL2 reduces the

hit rate by up to 35% over the baseline. The DFL3 further

improves the attack resilience by up to 72%. From Figs. 15(b)

and 16(b) we can also conclude that L-3 attacks indeed are

more powerful than L-2 attacks. This is because L-3 attacks

can search for the matched LUT configuration patterns.

Figure 17(a) shows that the average number of exactly

Fig. 17. Comparison of the number of Trojan hits for without and with gate replacement to thwart L-3 pattern searching attack. (a) Exact matching and (b) Approximate matching. Both panels plot the average number of matchings (log scale, 10^0 to 10^3) for c432, c1355, c1908, and c6288, without and with gate replacement.

matched LUT configurations for each benchmark circuit is
close to 10^0 (i.e., 1). If attackers search for LUT configurations
that have a similar format but use different
input/output pins (i.e., approximate matching), the number of
matched cases increases. To address this issue, we applied
the gate replacement technique to the defense line 3. As can
be seen from Fig. 17(a), our enhanced method can increase
the number of exact matching LUT configurations, so that
the same LUT configurations no longer stand for the identical
logic function in the benchmark circuit. Therefore,
when an attacker performs the L-3 attack, the Trojan hit
rate of our method can be reduced. In addition to increasing
the number of exact matching cases, our gate replacement
technique also increases the number of approximate matching
patterns, as shown in Fig. 17(b). As a result, our enhanced
DFL3 reduces the hardware Trojan hit rate. Fig. 18 shows
that the proposed gate replacement technique reduces the

Trojan hit rate for different circuits. On average, our method

makes the Trojan hit rate decrease by 62% and 88% for

the attacker searching for exact matching and approximate

matching configurations, respectively.

D. Dependent Design Factors for Trojan Hit Rate Reduction

In the proposed defense line 3, our method swaps the

replicas of submodules at runtime. We examined the impact of

the number of hot swaps on the Trojan hit rate. As depicted in

Figs. 19(a) and (b), a larger number of hot swaps used in the

design yields a lower hardware Trojan hit rate. However, as

the number of inserted hardware Trojans increases, the Trojan

hit rate reduced by hot swapping gradually decreases. This

Fig. 18. Comparison of hardware Trojan hit rate for without or with proposed gate replacement to thwart L-3 pattern searching attack. (a) Exact matching and (b) approximate matching. Both panels plot the hardware Trojan hit rate for c432, c1355, c1908, and c6288, without and with gate replacement.

Fig. 19. Impact of number of hot swaps on hardware Trojan hit rate for c432 under (a) L-2 attacks, and (b) L-3 attacks. Both panels plot the hit rate against the number of inserted hardware Trojans (1 to 6) for no hot swaps and for 2, 4, 6, and 8 hot swaps.

conclusion applies to all the benchmark circuits we tested,

and remains consistent with the scenario of L-3 attacks shown

in Figs. 20(a) and (b).

The impacts of the number of replicas n and the number of submodules m on the Trojan hit rate are shown in Figs. 21(a) and (b), respectively. Increasing n helps to reduce the hardware

Trojan hit rate, as a larger n yields more unpredictability for attackers. The reduction on the Trojan hit rate becomes

more noticeable if more hardware Trojans are injected to the

design. The impact of m on the Trojan hit rate is not as significant as the impact from n (which is also indicated by the mathematical analysis in Eq. (10)). However, the number

of submodules in the original design will slightly affect the

area overhead, as shown in Table II. The overhead on the

worst-case delay varies, depending on how the submodules

are divided. In general, more submodules lead to an increase

Fig. 20. Impact of number of hot swaps on hardware Trojan hit rate for nine benchmark circuits affected by four hardware Trojans inserted via (a) L-2 attacks, and (b) L-3 attacks. Both panels plot the reduction on hardware Trojan hit rate (%) against the number of hot swaps (2, 4, 6, 8) for ethmac, a23, s13207, s1488, s444, c6288, c1908, c1355, and c432.

Fig. 21. Impact of the number of (a) replicas n and (b) submodules m on hardware Trojan hit rate for c6288 affected by four hardware Trojans. Panel (a) compares n = 2, 3, 5 and panel (b) compares m = 4, 6, 8, 10, each against the number of inserted hardware Trojans (1 to 6).

TABLE II
IMPACT OF NUMBER OF SUB-MODULES (M) ON FPGA COST AND DELAY.

LUTs         c432    c1355   c1908   c6288    s444    s1488   s13207
m = 4        110     158     181     1118     96      259     433
m = 6        110     169     186     1139     99      264     458
m = 8        119     172     195     1152     83      269     468
m = 10       129     179     195     1151     87      273     472

Delay (ns)   c432    c1355   c1908   c6288    s444    s1488   s13207
m = 4        6.747   5.249   5.974   10.666   1.43    4.559   3.587
m = 6        6.638   5.195   6.121   10.768   1.376   4.677   3.587
m = 8        6.954   5.288   6.136   11.288   1.634   4.418   3.641
m = 10       7.034   5.344   5.92    10.755   1.635   4.472   3.589

on the delay. The results in Table II are based on DFL3
without the input gating technique and with two replicas.

E. Assessment on Hardware Cost, Delay and Power

The following experiments are based on the setup below:
two replicas were used for DFL2 and DFL3, each circuit was
divided into four submodules, style I was applied to DFL3, and
four hot swaps were conducted during simulation.

TABLE III
NUMBER OF FPGA LUTS UTILIZED BY DIFFERENT METHODS.

Circuits   c432   c1355   c1908   c6288   s444   s1488   s13207
Baseline   58     62      58      530     33     117     180
DFL1       58     62      59      530     33     117     181
DFL2       158    156     178     1123    67     261     429
DFL3.G     173    167     216     1157    84     296     443
DFL3.NG    110    158     181     1118    96     259     433

1) Hardware Utilization: Table III summarizes the number

of utilized LUTs for different methods. Since our DFL1 only

changes the location of designated slices, on average, our

method consumes 0.33% more LUTs than the baseline. In

DFL2, we duplicated the design under protection once and

utilized a pseudo-random selection unit for replica selection.

The unselected replica was muted through input gating. For

the small circuits, the increase in LUT utilization could
be large due to the relatively large size of the pseudo-random
selection and input gating logic. However, when the design under

protection is large, the FPGA overhead can be reduced through

optimization. The LUT overheads for the largest combinational

circuit c6288 and sequential circuit s13207 in our case studies

are 111.9% and 138.3%, respectively.
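The quoted overheads follow directly from the LUT counts in Table III. A minimal sketch of the arithmetic is given below; the dictionary layout and the helper name `overhead_pct` are choices made here for illustration.

```python
from statistics import mean

CIRCUITS = ["c432", "c1355", "c1908", "c6288", "s444", "s1488", "s13207"]

# LUT counts copied from Table III (Baseline, DFL1, and DFL2 rows).
LUTS = {
    "Baseline": dict(zip(CIRCUITS, [58, 62, 58, 530, 33, 117, 180])),
    "DFL1": dict(zip(CIRCUITS, [58, 62, 59, 530, 33, 117, 181])),
    "DFL2": dict(zip(CIRCUITS, [158, 156, 178, 1123, 67, 261, 429])),
}


def overhead_pct(method: str, circuit: str) -> float:
    """Extra LUTs relative to the baseline, in percent."""
    base = LUTS["Baseline"][circuit]
    return 100.0 * (LUTS[method][circuit] - base) / base


dfl2_c6288 = round(overhead_pct("DFL2", "c6288"), 1)
dfl2_s13207 = round(overhead_pct("DFL2", "s13207"), 1)
dfl1_average = round(mean(overhead_pct("DFL1", c) for c in CIRCUITS), 2)
print(dfl2_c6288, dfl2_s13207, dfl1_average)  # 111.9 138.3 0.33
```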

During the hot-swapping process, our DFL3 with input
gating (i.e., DFL3.G) interleaved multiple sections of the original
design and its replicas. In addition to the primary inputs,
the input gating technique was also applied to the inputs of the
hot-swappable submodules. As a result, the LUT overheads for
c6288 and s13207 increase to 116.8% and 145%, respectively.
If we remove the input gating (i.e., the NG option), the corresponding
LUT overheads for the largest circuits
are reduced to 110.9% and 140.6%, respectively. However,
removing the input gating increases power consumption.
Although our DFL3 incurs LUT utilization comparable to that of
double modular redundancy (DMR), our runtime replica selection
ensures lower power consumption and provides better
unpredictability. We also examined the hardware utilization

on the large-scale circuits a23 and ethmac. Our experiments

indicated that the overheads on LUT utilization for a23 and

ethmac are 196.4% (212.4%) and 119.3% (156.3%) for DFL2

(DFL3), respectively.

2) Power Consumption: We synthesized the Verilog codes

for the benchmark circuits in the Synopsys Design Compiler.

The clock frequency was set to 100 MHz for each design. We

measured the power consumption in Design Compiler
and report the results in Table IV. On average, the proposed DFL2

leads to an increase on the total power by 8.86% over the

baseline. Our DFL3 with input gating provides better resilience

against advanced attacks, at the cost of 11% more total power

than the baseline. The increased power consumption is due to

the pseudo-random selection and input gating logic, as well

as the multiplexers before the final outputs.
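The average power increases can be recovered from the values in Table IV. The sketch below assumes the averages are taken over the rounded per-circuit percentages shown in the table, which is an inference from the numbers; the variable and function names are illustrative.

```python
# Total power in µW from Table IV: (baseline, DFL2, DFL3.G) per circuit.
POWER = {
    "c432": (10.37, 11.05, 11.82),
    "c1355": (48.66, 50.56, 50.56),
    "c1908": (40.14, 42.50, 43.56),
    "c6288": (217.41, 232.75, 233.67),
    "s444": (20.01, 21.68, 21.85),
    "s1488": (12.50, 15.25, 15.56),
    "s13207": (303.33, 329.03, 335.07),
}


def average_increase(column: int) -> float:
    """Mean increase over the baseline, using whole-percent ratios as in the table."""
    pct = [round(100.0 * row[column] / row[0]) for row in POWER.values()]
    return sum(p - 100 for p in pct) / len(pct)


dfl2_avg = round(average_increase(1), 2)
dfl3g_avg = round(average_increase(2), 2)
print(dfl2_avg, dfl3g_avg)  # 8.86 11.0
```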

3) Worst-case Delay: We measured the worst-case delays

for different designs using the PlanAhead tool in Xilinx ISE

14.1 design suite. As shown in Table V, slice designation

used in the proposed DFL1 could lead to more or less worst-

TABLE IV
TOTAL POWER CONSUMPTION BY DIFFERENT METHODS. UNIT: µW.

Circuits   Baseline          DFL2              DFL3.G
c432       10.37 (100%)      11.05 (107%)      11.82 (114%)
c1355      48.66 (100%)      50.56 (104%)      50.56 (104%)
c1908      40.14 (100%)      42.50 (106%)      43.56 (109%)
c6288      217.41 (100%)     232.75 (107%)     233.67 (107%)
s444       20.01 (100%)      21.68 (108%)      21.85 (109%)
s1488      12.50 (100%)      15.25 (122%)      15.56 (124%)
s13207     303.33 (100%)     329.03 (108%)     335.07 (110%)

case delay, depending on where the slice is designated. To

examine the impact of the slice designation on the worst-case

delay, we varied the number of designated slices from 1 to 3,

and performed five test cases for each designation condition.

Based on our case studies, DFL1 induces a delay overhead

as large as 1.74% and 3.52% for the single-slice designation

and triple-slice designation, respectively. Given a tight timing
budget, several slice selections should be examined to find the
slice re-allocation that incurs the smallest delay overhead.

Compared to the baseline, our DFL2 leads to the worst-case

delay increase in the range of 4.23% to 17.19% for different

benchmark circuits. Due to the hot-swappable logic, the delay

overhead induced by DFL3 is no more than 22.02%. For the

large-scale benchmark circuits a23 (ethmac), the DFL2 incurs

4.4% (6.2%) more delay than the baseline. Our DFL3 increases
the worst-case delay by 16% and 8.3% over the
baseline for a23 and ethmac, respectively.
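The DFL2 and DFL3 delay overheads for the ISCAS circuits can be recomputed from the worst-case delays in Table V. The sketch below only redoes that arithmetic; the list layout and the helper name `overheads` are assumptions made for illustration.

```python
from typing import List

# Worst-case delays in ns from Table V, in the order
# c432, c1355, c1908, c6288, s444, s1488, s13207.
BASELINE = [5.659, 4.677, 5.241, 10.181, 1.43, 4.105, 3.328]
DFL2 = [6.164, 5.249, 5.702, 10.612, 1.578, 4.699, 3.900]
DFL3 = [6.528, 5.707, 6.177, 10.925, 1.637, 4.785, 3.900]


def overheads(protected: List[float]) -> List[float]:
    """Per-circuit delay increase over the baseline, in percent."""
    return [100.0 * (p - b) / b for p, b in zip(protected, BASELINE)]


dfl2 = overheads(DFL2)
dfl3 = overheads(DFL3)
print(f"DFL2: {min(dfl2):.2f}% to {max(dfl2):.2f}%")  # DFL2: 4.23% to 17.19%
print(f"DFL3 max: {max(dfl3):.2f}%")                  # DFL3 max: 22.02%
```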

F. Comparing FOMTD with Static Trojan Detection Method

In this section, we compare our FOMTD with static Trojan

detection methods, which are based on double or triple mod-

ular redundancy (DMR or TMR). Even though the attacker

who performs L-2 attacks can see the utilized LUTs, it is

not guaranteed that the attacker can successfully place two

identical hardware Trojans in two design replicas. Because

the Trojans inserted on the replica comparison logic cannot

be detected by DMR, the Trojan hit rate is not reduced to

zero. When we advance the attack method to L-3 attacks, our
DFL3 effectively reduces the Trojan hit rate. Together with
the runtime hot-swapping feature, the smaller number of exactly
matched LUT configurations available in the netlist of our
method helps to reduce the success rate of a Trojan
inserted by L-2 and L-3 attacks. Figure 22 shows that our
method achieves a lower Trojan hit rate than DMR. On
average, our DFL3 reduces the Trojan hit rate by 63.3% and
42.5% against L-2 and L-3 attacks, respectively. Indeed, L-3
attacks can search for the identical LUT configurations, but
the number of exactly matched LUT configurations is not high
in FPGA mapping (unlike in ASIC design).

Figure 23 shows that our DFL3 can effectively reduce the

number of exact matching cases over DMR. This explains why

DFL3 obtains a better attack resilience than DMR. Because of

the input gating, our DFL3 consumes less power than DMR.

As indicated in Fig. 24, the total power consumption for the

five benchmark circuits protected with DFL3 is less than that

for the circuits protected with DMR. On average, our method

Fig. 22. Comparison of hardware Trojan hit rate for proposed defense line 3 and DMR affected by four Trojans inserted via (a) L-2 and (b) L-3 attacks. Both panels plot the hit rate of the proposed DFL3 and DMR for c1355, c1908, c6288, s1488, and s13207.

Fig. 23. Comparison of number of exact matching on LUT configuration. The plot shows the average number of exact matchings for the proposed DFL3 and DMR on c1355, c1908, c6288, s1488, and s13207.

Fig. 24. Comparison of power consumption between proposed DFL3 and DMR.

achieves a 50% reduction in total power compared with the DMR
method.

Next, we applied the proposed method and adaptive TMR

(ATMR) [18] to the circuits for a practical application. We

connected the Xilinx FPGA board to a monitor through a

Video Graphics Array (VGA) cable. The function module

configured in the FPGA device was used to draw a chess

board on a screen by sending a VGA signal to the monitor.

Two hardware Trojans were inserted into the FPGA placelist by
means of L-3 attacks. Our DFL3 was applied to thwart the

L-3 attack from the untrusted FPGA software. Our method

guaranteed the correct display of the original picture shown in

Fig. 25(a). In contrast, the ATMR method did not eliminate the

effect of the two Trojans, yielding a distorted chess board, as

shown in Fig. 25(b). This is because the L-3 attack searches for

the identical LUT configurations in the ATMR design replicas

and inserts the Trojans in the identical LUTs, each belonging

to one design replica.

VII. CONCLUSION

Many security mechanisms for FPGA-based systems have

been investigated to prevent systems from IP theft and reverse

TABLE V
COMPARISON OF WORST-CASE DELAY. UNIT: NS.

Circuits      c432             c1355            c1908            c6288            s444             s1488            s13207
Baseline      5.659            4.677            5.241            10.181           1.43             4.105            3.328

DFL1, single-slice designation
case 1        5.713            4.677            5.500            10.128           1.314            4.051            3.328
case 2        5.603            4.677            5.458            10.358           1.322            3.997            3.274
case 3        5.549            4.677            5.322            10.18            1.43             3.947            3.274
case 4        5.711            4.622            5.257            10.013           1.322            3.979            3.274
case 5        5.657            4.679            5.278            9.905            1.376            4.049            3.328
+/- delay     -1.94%∼0.95%     -1.18%∼0         0∼0.49%          -1.65%∼1.74%     -0.81%∼0         -3.85%∼0         1.62%∼0

DFL1, triple-slice designation
case 1        5.607            4.57             5.287            10.234           1.378            4.009            3.272
case 2        5.715            4.731            5.406            9.966            1.378            3.979            3.328
case 3        5.553            4.679            5.335            9.979            1.378            4.049            3.328
case 4        5.661            4.677            5.448            10.51            1.378            4.049            3.328
case 5        5.606            4.669            5.334            9.899            1.322            4.009            3.272
+/- delay     -0.96%∼1.93%     0%∼3.52%         0∼3.05%          -3.27%∼2.7%      -4.06%∼0         -0.75%∼1%        0%∼1.17%

DFL2          6.164 (+8.92%)   5.249 (+12.23%)  5.702 (+8.80%)   10.612 (+4.23%)  1.578 (+10.35%)  4.699 (+14.47%)  3.900 (+17.19%)
DFL3          6.528 (+15.36%)  5.707 (+22.02%)  6.177 (+17.86%)  10.925 (+7.31%)  1.637 (+14.48%)  4.785 (+16.57%)  3.900 (+18.81%)

Fig. 25. FPGA output for the circuit protected with (a) proposed defense line 3, and (b) ATMR [18].

engineering attacks on the bitstream. However, limited
literature is available on the security threats originating
from untrusted FPGA CAD tools. This work fills the

gap. We demonstrate two practical attacks through Xilinx

and Altera FPGA design suites. We further classify three

Trojan attack levels, depending on the attacker’s prior FPGA

experience and ability to manipulate the FPGA software.

To mitigate the hardware Trojans induced by the malicious

FPGA tools, we propose a FOMTD method which offers

three defense lines. Each defense line generates a different

degree of unpredictability from the malicious FPGA software

designer’s point of view. As our unpredictability is formed

after the CAD tool is delivered to FPGA users, our method

facilitates FPGA users to thwart Trojan insertion attacks during

the FPGA configuration phase. To the best of our knowledge,

our research effort is the first work that investigates the FPGA

based moving target defense for SRAM FPGAs.

We extensively evaluated the security, hardware cost,

and performance. The proposed defense line 1 changes 50%

of the default LUT mapping on the FPGA device and reduces

the hardware Trojan hit rate of L-1 attacks, at the cost of

0.33% more LUT utilization compared to the baseline. When

advanced attacks occur, our defense lines maintain a low Trojan

hit rate. Defense lines 2 and 3 reduce the Trojan hit rate by

up to 40% and 91%, respectively, over the baseline. The gate

replacing technique in defense line 3 further reduces the Trojan

hit rate, on average, by 62% and 88% even if the attacker

searches for exact and approximate configuration matching,

respectively. The power increase due to the defense line 2

and defense line 3 is 8.86% and 11%, respectively, compared

to the baseline. The delay overhead varies. According to our

case studies, the worst-case delay overhead our defense line

incurs is 22.02%. We also compared the defense line 3 to

a static Trojan detection method, DMR. Experimental results

show that our method reduces the hardware Trojan hit rate

by 63.3% and 42.5% against L-2 and L-3 attacks, respectively.

Because of the input gating and hot-swappable features in our

method, our defense line 3 consumes 50% less power than

DMR.

The limitation of the proposed defense lines is the hardware

cost and delay increase. However, considering the significant

improvement in the resilience against Trojan insertion attacks,

the overhead of our method is moderate and acceptable for

security-critical applications. In future work, we will work on

the cost minimization of the FOMTD method.

ACKNOWLEDGMENT

This work is partially supported by National Science Foun-

dation CAREER Award No. 1652474 and Air Force Research

Laboratory Visiting Faculty Research Program (VFRP) 2017.

REFERENCES

[1] “FPGA Market size set to exceed USD 9.98 Billion by 2022, with over 8.4from 2015 to 2022: Global Market Insights Inc..” https://goo.gl/uEmByo [Accessed 9/13/2018].

[2] “Security for volatile FPGAs.” http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-763.pdf [Accessed 9/13/2018].

[3] M. Majzoobi, F. Koushanfar, and M. Potkonjak, “FPGA-oriented Security,” Introduction to Hardware Security and Trust, pp. 1–38, Sept 2012.

[4] M. Majzoobi and F. Koushanfar, “Time-Bounded Authentication of FPGAs,” IEEE Trans. on Information Forensics and Security, vol. 6, pp. 1123–1135, Sept 2011.

[5] I. Hadzic, S. Udani, and J. M. Smith, “FPGA Viruses,” in Proc. Intl. Workshop on FPL’99, pp. 291–300, April 1999.

[6] S. Trimberger, “Trusted design in FPGAs,” in Proc. DAC’07, pp. 5–8, June 2007.


[7] S. Skorobogatov and C. Woods, “Breakthrough silicon scanning discovers backdoor in military chip,” in Proc. CHES’12, pp. 23–40, Sept 2012.

[8] P. Swierczynski, A. Moradi, D. Oswald, and C. Paar, “Physical Security Evaluation of the Bitstream Encryption Mechanism of Altera Stratix II and Stratix III FPGAs,” ACM Trans. Reconfigurable Technol. Syst., vol. 7, pp. 34:1–34:23, Dec. 2014.

[9] S. Mal-Sarkar, R. Karam, S. Narasimhan, A. Ghosh, A. Krishna, and S. Bhunia, “Design and Validation for FPGA Trust under Hardware Trojan Attacks,” IEEE Trans. on Multi-Scale Computing Syst., vol. 2, pp. 186–198, July 2016.

[10] Z. Zhang, Q. Yu, L. Njilla, and C. Kamhoua, “FPGA-oriented moving target defense against security threats from malicious FPGA tools,” in Proc. HOST’18, pp. 163–166, April 2018.

[11] R. S. Chakraborty, I. Saha, A. Palchaudhuri, and G. K. Naik, “Hardware Trojan Insertion by Direct Modification of FPGA Configuration Bitstream,” IEEE Design Test, vol. 30, pp. 45–54, April 2013.

[12] Z. Zhang, L. Njilla, C. Kamhoua, K. Kwiat, and Q. Yu, “Securing FPGA-based Obsolete Component Replacement for Legacy Systems,” in Proc. ISQED’18, pp. 401–406, March 2018.

[13] G. Bloom, B. Narahari, R. Simha, A. Namazi, and R. Levy, “FPGA SoC architecture and runtime to prevent hardware Trojans from leaking secrets,” in Proc. HOST’15, pp. 48–51, May 2015.

[14] A. L. Oliveira, “Techniques for the creation of digital watermarks in sequential circuit designs,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Syst., vol. 20, pp. 1101–1117, Sept 2001.

[15] J. Zhang, Y. Lin, Y. Lyu, and G. Qu, “A PUF-FSM Binding Scheme for FPGA IP Protection and Pay-Per-Device Licensing,” IEEE Trans. on Information Forensics and Security, vol. 10, pp. 1137–1150, June 2015.

[16] R. Karam, T. Hoque, S. Ray, M. Tehranipoor, and S. Bhunia, “MUTARCH: Architectural diversity for FPGA device and IP security,” in Proc. ASPDAC’17, pp. 611–616, Jan 2017.

[17] Y. Pino, V. Jyothi, and M. French, “Intra-die process variation aware anomaly detection in FPGAs,” in Proc. ITC’14, pp. 1–6, Oct 2014.

[18] S. Mal-Sarkar, A. Krishna, A. Ghosh, and S. Bhunia, “Hardware Trojan Attacks in FPGA Devices: Threat Analysis and Effective Countermeasures,” in Proc. GLSVLSI’14, pp. 287–292, May 2014.

[19] D. M. Shila and V. Venugopal, “Design, implementation and security analysis of Hardware Trojan Threats in FPGA,” in Proc. IEEE ICC’14, pp. 719–724, June 2014.

[20] B. Khaleghi, A. Ahari, H. Asadi, and S. Bayat-Sarmadi, “FPGA-based protection scheme against hardware Trojan horse insertion using dummy logic,” IEEE Embedded Syst. Letters, vol. 7, pp. 46–50, June 2015.

[21] S. Bhunia, M. S. Hsiao, M. Banga and S. Narasimhan, “Hardware Trojan attacks: Threat analysis and countermeasures,” Proceedings of the IEEE, vol. 102, pp. 1229–1247, Aug. 2014.

[22] R. S. Chakraborty, F. Wolff, S. Paul, C. Papachristou, and S. Bhunia, “Mero: A statistical approach for hardware Trojan detection,” in Proc. CHES’09, pp. 396–410, Aug. 2009.

[23] S. Narasimhan, D. Du, R. S. Chakraborty, S. Paul, F. G. Wolff, C. A. Papachristou, K. Roy, and S. Bhunia, “Hardware Trojan detection by multiple-parameter side-channel analysis,” IEEE Trans. on Computers, vol. 62, pp. 2183–2195, Nov 2013.

[24] R. Druyer, L. Torres, P. Benoit, P. V. Bonzom, and P. Le-Quere, “A survey on security features in modern FPGAs,” in Proc. ReCoSoC’15, pp. 1–8, June 2015.

[25] “SoC FPGA Hardware Security Requirements and Roadmap.” https://www.intel.com/content/dam/www/programmable/us/en/pdfs/education/events/northamerica/isdf/SoC-FPGA-Hardware-Security.pdf [Accessed 9/13/2018].

[26] M. Tehranipoor and F. Koushanfar, “A survey of hardware Trojan taxonomy and detection,” IEEE Design Test of Computers, vol. 27, pp. 10–25, Jan 2010.

[27] J. A. Roy, F. Koushanfar, and I. L. Markov, “Extended abstract: Circuit cad tools as a security threat,” in Proc. HOST’08, pp. 65–66, June 2008.

[28] S. M. Trimberger and J. J. Moore, “FPGA Security: Motivations, Features, and Applications,” Proceedings of the IEEE, vol. 102, pp. 1248–1265, Aug 2014.

[29] D. Last, D. Myers, M. Heffernan, M. Caiazzo, and Captain N. Paltzer, “Command and control of proactive defense,” J. of Cyber Security and Information Syst., vol. 4, no. 1, pp. 8–13, 2015.

[30] “Moving Target Defense.” https://www.dhs.gov/science-and-technology/csd-mtd [Accessed 9/13/2018].

[31] R. Zhuang, S. A. Deloach and X. Ou, “Towards a theory of moving target defense,” in Proc. the First ACM Workshop on Moving Target Defense, pp. 31–40, 2014.

Zhiming Zhang is currently pursuing the Ph.D.

degree with the Department of Electrical and

Computer Engineering, University of New Hamp-

shire, Durham, New Hampshire, USA. His cur-

rent research focuses on hardware security which

includes design obfuscation, side channel analysis

of encryption algorithms, fault attack analysis,

and emerging technologies with emphasis on

hardware security and trust.

Laurent Njilla (M’05) received his Ph.D. from

the Electrical and Computer Engineering De-

partment at Florida International University, Mi-

ami, his M.S. from the University of Central

Florida, Orlando USA, and his B.S. from the

Department of Computer Science, University of

Yaounde, Yaounde, Cameroon. He is currently

a Research Engineer at the Air Force Research

Laboratory, Department of Defense. His research

interests and expertise include cyber security,

Game theory, hardware and network security,

blockchain technology, cyber threat information and advanced computer

networking.

Charles A. Kamhoua (S’10-M’12-SM’14) is a

researcher at the Network Security Branch of

the U.S. Army Research Laboratory (ARL) in

Adelphi, MD, where he is responsible for con-

ducting and directing basic research in the area

of game theory applied to cyber security. Prior

to joining the Army Research Laboratory, he

was a researcher at the U.S. Air Force Research

Laboratory (AFRL), Rome, New York for 6 years

and an educator in different academic institutions

for more than 10 years. He has held visiting

research positions at the University of Oxford and Harvard University.

He has co-authored more than 150 peer-reviewed journal and conference

papers. He has been recognized for his scholarship and leadership with

numerous prestigious awards. He received a B.S. in electronics from

the University of Douala (ENSET), Cameroon, in 1999, and a Ph.D. in

Electrical Engineering from FIU in 2011. He is currently an advisor for

the National Research Council postdoc program, a member of the FIU

alumni association and ACM, and a senior member of IEEE.

Qiaoyan Yu (S’03-M’11-SM’17) received the B.S.

degree from Xidian University, Xian, China in

2002, the M.S. degree from Zhejiang University,

Hangzhou, China in 2005, and the Ph.D. de-

gree in Electrical Engineering from University

of Rochester, Rochester, New York, USA in 2011.

Dr. Yu is currently an Associate Professor with

the Department of Electrical and Computer Engi-

neering, University of New Hampshire, Durham,

New Hampshire, USA. Her research interests

include hardware security and trust, embedded

system security, cyber-physical system, error control for networks-on-

chip, fault-tolerance for VLSI circuits and systems. Dr. Yu is the recipient

of National Science Foundation CAREER award in 2017. She has served

on the technical program committees of HOST, DAC, FDTC, Asian

HOST, ISVLSI, DFT, ASP-DAC, GLSVLSI, and ISCAS. She is a member

of the editorial boards of Integration, the VLSI Journal, Microelectronics

Journal, and Journal of Circuits, Systems, and Computers.

sources/149/Zhang et al. - 2015 - A PUF-FSM Binding Scheme for FPGA IP Protection an.pdf

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 10, NO. 6, JUNE 2015 1137

A PUF-FSM Binding Scheme for FPGA IP Protection and Pay-Per-Device Licensing

Jiliang Zhang, Yaping Lin, Yongqiang Lyu, and Gang Qu, Senior Member, IEEE

Abstract: With its reprogrammability, low design cost, and increasing capacity, the field-programmable gate array (FPGA) has become a popular design platform and a target for intellectual property (IP) infringement. Currently available IP protection solutions are usually limited to protecting single FPGA configurations and require permanent secret key storage in the FPGA. In addition, they cannot provide the commercially popular pay-per-device licensing. In this paper, we propose a novel IP protection mechanism that restricts an IP’s execution to specific FPGA devices in order to efficiently protect IPs from being cloned, copied, or integrated without authorization. This mechanism can also enforce pay-per-device licensing, which enables system developers to purchase IPs from core vendors at a low price based on usage instead of paying expensive unlimited IP license fees. In our proposed binding-based mechanism, FPGA vendors embed a physical unclonable function (PUF) customized for FPGAs into each enrolled FPGA device; IP vendors embed augmented finite-state machines (FSMs) into the original IPs such that the FSM can be activated by the PUF responses from the FPGA device. We propose protocols to lock and unlock FPGA IPs, demonstrate how a PUF can be embedded onto FPGA devices, and analyze the security vulnerabilities of our PUF-FSM binding method. We implement a 128-bit delay-based PUF on 28-nm FPGAs with only 258 RAM-lookup tables and 256 flip-flops. The PUF responses are unique and reliable against environment changes. We also synthesize a variety of FSM benchmark circuits. On large benchmarks, the average timing overhead is 0.64% and the power overhead is 0.01%.

Index Terms— Binding, field-programmable gate array (FPGA), finite state machine (FSM), hardware metering, intellectual property (IP) protection, physical unclonable functions (PUFs).

I. INTRODUCTION

A. Motivations

Manuscript received January 3, 2014; revised May 30, 2014 and November 29, 2014; accepted January 28, 2015. Date of publication February 5, 2015; date of current version April 13, 2015. This work was supported in part by the National Natural Science Foundation of China under Grant 61173038 and Grant 61228204 and in part by a scholarship from China Scholarship Council under Grant 201306130042. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Farinaz Koushanfar.

J. Zhang and Y. Lin are with the College of Information Science and Engineering, Hunan University, Changsha 410082, China (e-mail: [email protected]; [email protected]).

Y. Lyu is with the Research Institute of Information Technology, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).

G. Qu is with the Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIFS.2015.2400413

Field-programmable gate arrays (FPGAs) are semiconductor devices that can be reprogrammed by the end-users to implement any digital system. Compared with implementation in application-specific integrated circuits (ASICs), FPGA design has the advantages of shorter time-to-market, lower non-recurring engineering costs, and higher flexibility. These have made the FPGA a popular design platform for many applications such as automotive electronics, consumer electronics, and aerospace equipment. In this FPGA-based design platform, third-party intellectual properties (IPs) are widely used due to both technical merits (e.g., the IPs’ proven functionality, compatibility, and performance) and non-technical concerns (e.g., time-to-market, cost, and patent enforcement). However, FPGA IPs face severe piracy attacks, and the current licensing schemes are not flexible enough to precisely control the authorized usage.

Firstly, from the perspective of attacks, piracy attacks such as cloning, copying, misuse, and unauthorized integration are considered to be the most common security vulnerability of volatile FPGAs [1]. Unconfigured FPGA devices are off-the-shelf products, and the configuration bitstreams can be obtained by eavesdropping or directly from the volatile SRAM FPGAs [1]. Such piracy not only reduces profits and market share but also damages brand reputation and can even lead to severe early product failures and safety hazards [1], [2]. Furthermore, this is not limited to high-value single FPGA designs; third-party FPGA intellectual property (IP) cores are also vulnerable to those attacks.

Secondly, from the perspective of licensing, it is often vital to ensure that the configuration bitstreams can only be used on the licensed FPGA devices. In such a case, IP core vendors would prefer to sell their IP products through pay-per-device licensing rather than through up-front license fees that allow users to configure any FPGA device. In order to adapt the IP core business model for low/medium-volume FPGA applications [3], effective pay-per-device licensing techniques are urgently needed.

Mainstream FPGA vendors have been investing more and more effort in protecting their IPs from piracy attacks and in improving licensing schemes to activate and protect the IP-based commercial flow. However, the state-of-the-art techniques still have some drawbacks. In this paper, we consider hardware IPs (HWIPs) as the soft-core (synthesized from HDL) hardware modules stored in the FPGA configuration bitstreams [11]. Our goal is to develop techniques to solve the piracy and licensing challenges. We propose to solve these problems by a binding mechanism that seeks to restrict the execution of the protected IPs to the authorized FPGA devices only.

1556-6013 © 2015 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


B. Limitations of Prior Art

FPGA HWIP protection techniques have been well studied in academia [2]–[5] and widely used in industry [6]–[9]. However, all the existing HWIP protection techniques are based on encryption and have the following main drawbacks:

1) The commercially available encryption-based techniques can only protect the single large FPGA configurations.

2) The commercially available encryption-based techniques cannot provide a solution to the commercially popular pay-per-device licensing requirement for both single large configurations and individual IP cores.

3) The current encryption-based FPGA IP protection methods introduce security vulnerabilities (e.g., physical attacks and side channel attacks) for permanent key storage and management.

C. Our Contributions

In this paper, we propose a binding scheme that binds the HWIPs to specific FPGA devices via the interaction between physical unclonable functions (PUFs) built on the FPGA devices and the FSMs in the HWIPs in order to address the limitations of existing FPGA HWIP protection techniques. We first reported this concept in [37]. In this article, 1) we provide a concrete construction and implementation of a delay-based PUF on 28-nm FPGAs as a reference design of our binding scheme; 2) we implement and verify the proposed binding scheme by synthesizing MCNC’91 circuits and large FSM benchmarks from GenFSM on FPGAs; 3) we elaborate the details of the proposed binding scheme with an illustrative example and an in-depth discussion of design flow, system integration, and security vulnerabilities. To the best of our knowledge, this is the first non-encryption-based FPGA HWIP binding method. Compared with the traditional encryption-based HWIP protection methods, our approach has the following advantages:

1) It can be used to protect both single FPGA configurations and third-party FPGA IP cores. Currently available encryption-based commercial methods can only protect the former, but not the latter.

2) It supports the pay-per-device licensing mechanism. The FPGA configuration bitstream can only be used to configure specific FPGA devices, giving IP vendors control of their IPs and allowing product developers to pay licensing fee only for the FPGA devices they are using.

3) It does not need permanent storage for secret keys in the FPGA. In our binding scheme, the secret PUF response can be ephemeral and immediately cleared after use. Therefore, it eliminates the security vulnerabilities of the permanent key management and exchange.

4) It has low hardware overhead. We implement the proposed method on Virtex-5 FPGA devices and find that the 128-bit delay-based PUF needs about 256 slices [41] and the modified FSM only introduces 0.64% timing overhead and 0.01% power overhead on average for ten large FSM designs. As a comparison, previous FPGA IP protection schemes consume 6776 LUTs for a SHA-1 core and an ECDH core [4], [5].

D. Outline of the Paper

The rest of this paper is organized as follows. Related work is surveyed in Section II. The necessary background information on PUF, FSM, and parties involved in HWIP binding is presented in Section III. The proposed binding method and its working mechanism are elaborated in Section IV. An IP locking mechanism and a reference implementation of PUF for the proposed binding method are then given in Section V and Section VI, respectively. Potential security threats and countermeasures are analyzed in Section VII. The detailed experimental results and analysis are reported in Section VIII. Finally, we conclude in Section IX.

II. RELATED WORK

A. FPGA HWIP Protection Techniques

Many intellectual property protection techniques for FPGAs have been proposed in academia and industry.

In commercial tools, bit-stream encryption [6]–[9] is the most popular intellectual property protection method against direct cloning of single large FPGA configurations for high-end FPGA devices. Some recent FPGAs employ an advanced encryption standard (AES) core or a triple data encryption standard (3DES) core to support the encryption of the FPGA configuration bitstreams; some FPGAs employ a keyed-hash message authentication code (HMAC) core to enable bit-stream authentication [8]. They all need an on-chip cryptographic decryption module and permanent secure key storage. Unfortunately, these solutions come with some practical limitations: they are not appropriate for resource-limited environments, and more importantly, it is well known that such a permanent key storage scheme allows attackers to attack at any time.

In the academic domain, Güneysu et al. [4] proposed a protection scheme for the FPGA bitstreams, which uses a secondary secure key register and authenticated bitstream encryption and requires minor modification to the current FPGA technology. They employed a public-key-based protocol between the IP providers and the FPGA-based system developers, and a trusted third party (TTP) is used to handle key exchange and installation in the symmetric-key decryption engines. This solution is only suitable for the protection of single large FPGA configurations, and the protection of individual HWIP cores remains a challenging problem. Drimer et al. [5] presented an encryption-based method to protect multiple IPs, and Kepa et al. [13] proposed a method based on a secure reconfigurable controller to support license enforcement within the partial reconfiguration flow. More recently, Maes et al. [2] introduced a valuable “pay-per-use” licensing scheme to protect multiple FPGA IPs through the self-reconfiguring capabilities of modern FPGAs and a TTP for metering the service.

As we can see from the above, all commercial and academic FPGA configuration bitstream protection methods are encryption-based; they have three shortcomings: 1) the commercial methods are limited to the protection of single large FPGA configurations; 2) they cannot support the pay-per-device licensing; 3) the previous encryption-based HWIP protection methods require permanent key storage and on-chip cryptographic decryption modules to decrypt the bitstream, which introduces some security vulnerabilities and high overhead. Our approach overcomes these limitations.

B. Metering ASIC Intellectual Properties

A number of watermarking methods for ASIC/FPGA intellectual property protection have been proposed [32]–[35]. However, watermarking techniques are passive and only used to identify the intellectual property. In 2001, Koushanfar and Qu [38] proposed the first hardware metering method that enables the design house to gain post-fabrication control by passive or active control of the number of produced ICs. Alkabani et al. [24] proposed an anti-overbuilding mechanism which exploits the functional description of the design and the unique and unclonable IC identifiers. The locks can be embedded by modifying the hardware computational model such as an FSM. They also presented another FSM manipulation method [25] which introduces only a few new states. These solutions are only suitable for protecting single ASIC chips. Later on, they further extended their scheme to actively control multiple IP cores [26] for ASIC chips. Recently, Koushanfar [27] further improved the locking structure in [24] by a multi-point function. Meanwhile, Roy et al. [20] presented another kind of cryptography-based metering method, but their solution has a very high overhead. These metering mechanisms are designed to prevent overbuilding of ASIC devices; they are not appropriate for pay-per-device licensing of FPGA designs.

In this paper, our proposed FPGA HWIP binding technique not only addresses the main drawbacks of the traditional FPGA HWIP protection methods but also supports a pay-per-device licensing scheme. This provides technical support for the product developers (system developers) to pay IP licensing fees only for the FPGA devices they are using. It also enables the IP vendors to freely distribute their IPs because they can ensure that the distributed IPs run only on specific FPGAs rather than all the FPGAs. This binding scheme brings a remarkable advantage for the IP-based business model: the IP owners can take full control over the use of their IP cores and protect them from unlicensed use; the FPGA-based product developers who could not afford the expensive unlimited IP license are now also able to obtain a number of single instances of the required IP cores at a much lower cost.

III. PRELIMINARIES

In this section, we introduce the general terms and concepts used throughout the paper. More specific definitions will be given as necessary.

A. Physical Unclonable Function (PUF)

A PUF provides a unique chip-dependent mapping from a set of digital inputs (challenges) to a set of digital outputs (responses) based on the unclonable properties of the underlying physical device. Although it is difficult to come up with a uniform definition for all types of PUFs, they should all satisfy the following properties [39]:

• Persistent and unpredictable. The response ($R_i$) to a challenge ($C_i$) is random and unpredictable, but should remain the same for the same challenge over multiple observations.

• Unclonable. It is impossible to obtain $R_i$ from $C_i$ without the physical presence of the PUF. In other words, given a PUF, it is infeasible for an adversary to build another PUF that provides the same responses to every possible challenge. This is assumed to be true due to the uncontrollable technology variations.

• Tamper evident. Invasive attacks to PUFs will destroy the PUFs and thus can be detected easily.
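The persistence and uniqueness properties can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `ToyPUF` and its hidden `_chip_secret` are hypothetical stand-ins for uncontrollable process variation (a real PUF stores no secret and its responses are noisy), and the keyed hash is used purely to obtain chip-dependent, unpredictable bits.

```python
import hashlib
import hmac
import secrets


class ToyPUF:
    """Noise-free software stand-in for a PUF (illustrative only)."""

    def __init__(self, n_bits: int = 128):
        # Hypothetical per-chip randomness that models process variation.
        self._chip_secret = secrets.token_bytes(32)
        self.n_bits = n_bits

    def response(self, challenge: bytes) -> int:
        # Keyed hash of the challenge, truncated to the top n_bits bits.
        digest = hmac.new(self._chip_secret, challenge, hashlib.sha256).digest()
        return int.from_bytes(digest, "big") >> (256 - self.n_bits)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two responses differ."""
    return bin(a ^ b).count("1")


chip_a, chip_b = ToyPUF(), ToyPUF()
challenge = b"C_i"

# Persistent: the same chip answers the same challenge identically.
same_chip_distance = hamming_distance(chip_a.response(challenge),
                                      chip_a.response(challenge))

# Unique: two different chips disagree on many response bits.
inter_chip_distance = hamming_distance(chip_a.response(challenge),
                                       chip_b.response(challenge))
```

In this model `same_chip_distance` is zero by construction, while `inter_chip_distance` is expected to be close to half of the response width; a physical PUF would show a small nonzero intra-chip distance because of measurement noise.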

Because of those properties, PUF has become an efficient mechanism to address security and trust problems in many applications, such as binding software IPs to specific FPGAs [11], hardware/software authentication [16], FPGA IP protection [18], [43], anti-overbuilding [24]–[27] and resisting FPGA replay attacks [36].

B. Finite State Machine (FSM)

FSM is a popular model for sequential systems. In this paper, we employ FSMs to bind HWIPs to the FPGAs with PUFs to restrict the HWIP’s usage so that it can only work on the enrolled FPGA devices. Similar to the FSM-based works such as [15] and [24]–[27], the method proposed in this paper is not applicable to some high-speed designs that do not have FSMs. These high-speed designs are normally small dedicated modules such as digital filters, channel equalizers, address decoders and arithmetic logic units. Fortunately, for the HWIPs in industrial designs that we target to protect, the sequential components or functions, and therefore FSMs, are ubiquitous [15].

C. Parties Involved in HWIP Binding

In order to facilitate our study, we consider the following parties involved in the binding mechanism and their respective roles:

• FPGA vendor (FV): FV designs and manufactures un-configured FPGA devices and can securely deploy PUF in the fabric of these devices.

• System developer (SD): SD integrates the third-party IPs along with their own designs to create a commercial product on an FPGA chip. The product will be synthesized into a configuration bitstream file for the FPGA chip to download using the computer aided design (CAD) tools provided by the FV.

• IP core vendor (CV): CV creates innovative logic circuits (HWIP cores) and sells them to SDs for profits. CV needs an effective technique to keep the full control over the use of the HWIP cores.

• End user (EU): EU purchases the FPGA products developed by the SD. The SD expects that EUs cannot ‘clone’ the products by copying the FPGA configuration bitstream file and run on unauthorized FPGA devices.


Fig. 1. Design flow of modifying hardware IP.

Our goal is to design a new binding mechanism so the SDs and CVs can protect their FPGA designs or IPs from piracy without introducing much inconvenience and large performance degradation to the EUs and FVs.

IV. THE PROPOSED BINDING SCHEME

Traditionally, HWIPs are written without any concern for binding to a specific FPGA. The configuration bitstream can be used to configure any FPGA device of the same type. Given an HWIP, our goal is to modify the original FSM of the HWIP to produce an augmented FSM which is functionally equivalent to the former. The modified FSM interacts with the intrinsic PUF located in the specific FPGA hardware, and it behaves exactly like the original HWIP as long as the challenges issued by the HWIP obtain correct responses from the PUF. This means that only the FPGA chips authorized by the CVs can guarantee the correct functionality. Meanwhile, as long as there are PUFs embedded in the FPGA chips from the FV, and the CV modifies their IP designs to support PUF, no more changes are needed at any party when a new HWIP is developed and needs to be deployed to a new FPGA device. The details of the binding scheme are depicted in Figs. 1, 2, and 3 and described as follows.

A. Design Flow

The design flow of modifying the HWIP together with a standard FPGA design methodology is shown in Fig. 1. First, the CV uses the high-level design description to set up the behavioral model of the FSM. Next, the original FSM is modified so that the added FSM structure (such as additional states and transitions) and the original FSM form a new augmented FSM. The standard phases of the FPGA design methodology (e.g., design synthesis, placement and routing) can then be carried out. Finally, the HWIP configuration bitstream file can be downloaded into an FPGA device to run. Although there is an inevitable verification/testing overhead due to the added features in the augmented FSM, the entire traditional design methodology is maintained so the introduced design overhead can be controlled.
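As a rough illustration of the FSM modification step, the following Python sketch prepends a chain of added states to a small FSM so that the original reset state is reached only after a specific input sequence has been applied. It is not the authors' construction: the function `augment_fsm`, the `("LOCK", k)` state naming, and the two-state example machine are all hypothetical, and in the actual scheme the unlocking inputs would be derived from PUF responses.

```python
def augment_fsm(original, reset_state, unlock_seq, alphabet):
    """Return (augmented transition table, power-up state).

    original   : dict mapping (state, input) -> next state
    unlock_seq : input symbols that must be seen, in order, to leave
                 the added lock states (hypothetical unlocking sequence)
    """
    augmented = dict(original)
    first_lock = ("LOCK", 0)
    for k, good in enumerate(unlock_seq):
        here = ("LOCK", k)
        # After the last correct symbol, hand over to the original FSM.
        after = ("LOCK", k + 1) if k + 1 < len(unlock_seq) else reset_state
        for symbol in alphabet:
            # A wrong symbol sends the machine back to the first lock state.
            augmented[(here, symbol)] = after if symbol == good else first_lock
    return augmented, first_lock


def run(fsm, start, inputs):
    """Apply a sequence of inputs and return the final state."""
    state = start
    for symbol in inputs:
        state = fsm[(state, symbol)]
    return state


# Hypothetical two-state original FSM over the input alphabet {0, 1}.
original = {
    ("IDLE", 0): "IDLE", ("IDLE", 1): "BUSY",
    ("BUSY", 0): "IDLE", ("BUSY", 1): "BUSY",
}
locked, power_up = augment_fsm(original, "IDLE", unlock_seq=[1, 0, 1], alphabet=[0, 1])

unlocked_result = run(locked, power_up, [1, 0, 1, 1])  # correct sequence, then input 1
stuck_result = run(locked, power_up, [1, 1, 1])        # wrong sequence stays locked
```

Once the correct sequence has been applied, the augmented machine follows exactly the transitions of the original one, which is the sense in which the two are functionally equivalent; with a wrong sequence it never leaves the added states.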

TABLE I

SYMBOLS AND ACRONYMS USED IN THE PROTOCOL

B. Description of the Protocol

For reader’s convenience, we list the symbols and acronyms used in the protocol, as shown in Table I. The proposed PUF-FSM binding protocol is described as follows.

1) FPGA Device Enrollment: The device enrollment protocol is shown in Fig. 2(a). To enable the proposed scheme, the FPGA vendor (FV) initially tests the PUF of every FPGA chip to obtain its random challenge-response pairs (CRPs) before selling it. The PUF challenges are stored in the non-volatile on-chip memory, which is automatically configured on $F^{i}_{PUF}$ immediately when the device is powered on. Note that the PUF challenges can be public and do not need to be encrypted or hidden because of the uniqueness and unpredictability of the PUF responses. In addition, the FV can also generate $ID(F^{i}_{PUF})$, which is a public unique serial number burned in at manufacturing time (e.g., Xilinx Device DNA [19]). If the core vendor (CV) or system developer (SD) wants to buy the FPGA embedded with the PUF, $F^{i}_{PUF}$, to start the HWIP/system development, the FV will respond with the $ID(F^{i}_{PUF})$ from its database and then sell the FPGA device $F^{i}_{PUF}$ to the CV/SD.

2) Hardware IP Core Enrollment and Distribution: As Fig. 2(b) shows, before the system developer (SD) develops its product, the core vendor (CV) creates the IP with $ID(HWIP_j)$. The CV then synthesizes $HWIP_j$ with the PUF-binding FSM into the bit-stream to generate the new version $b\{HWIP_j\}_{locked}$. This process can be expressed as $b\{HWIP_j\}_{locked} = \mathrm{Lock}\; b\{HWIP_j\}$. The CV stores $ID(HWIP_j)$ and $b\{HWIP_j\}_{locked}$ in its database, and releases $ID(HWIP_j)$ for sale. When an SD needs $HWIP_j$ to develop FPGA-based products, it requests the purchase by sending $ID(HWIP_j)$ to the CV. The CV then looks up $ID(HWIP_j)$ in the database and sends the corresponding $b\{HWIP_j\}_{locked}$, the locked HWIP bit-stream, to the SD.

3) Hardware IP Core Licensing: As Fig. 2(c) shows, when the system developer (SD) needs to unlock the purchased $b\{HWIP_j\}_{locked}$ in its FPGA-based products, it sends $ID(F^{i}_{PUF})$ and $ID(HWIP_j)$ to the core vendor (CV). The CV will send $ID(F^{i}_{PUF})$ to the FPGA vendor (FV) to obtain the corresponding CRPs and then calculate licenses based on the CRPs and the modified FSM. The computed licenses can be public. Finally, the licenses are sent to the SD to unlock $b\{HWIP_j\}$. This process can be expressed as $b\{HWIP_j\}_{unlocked} = b\{HWIP_j\}_{locked}(Licenses)$. Note that the CRPs should be securely transferred from the FV to the CV or SD.
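The licensing exchange can be sketched in a few lines of Python. The license construction used by the authors is described in a later section that is not part of this excerpt, so the XOR masking below is an assumption chosen only to show why a license can be public: it reveals nothing useful without the device's PUF response. The names `fv_database`, `cv_issue_license`, `device_unlocks`, and the device ID string are hypothetical.

```python
import secrets

N_BITS = 128

# FV side: enrollment database mapping a public device ID to the PUF
# response measured before sale (random bits stand in for a real response).
fv_database = {"device-001": secrets.randbits(N_BITS)}


def cv_issue_license(device_id, unlock_key, crp_lookup):
    """CV side: combine the device's response with the value the
    augmented FSM expects (assumed XOR masking, illustrative only)."""
    response = crp_lookup[device_id]  # CRP obtained securely from the FV
    return response ^ unlock_key


def device_unlocks(device_response, license_value, unlock_key):
    """Device side: the FSM unlocks only if response XOR license
    reproduces the expected unlock value."""
    return (device_response ^ license_value) == unlock_key


# CV side: hypothetical unlock value embedded in the locked HWIP.
unlock_key = secrets.randbits(N_BITS)
license_value = cv_issue_license("device-001", unlock_key, fv_database)

# The enrolled device regenerates its response and unlocks the core.
authorized = device_unlocks(fv_database["device-001"], license_value, unlock_key)

# A different device (here: a response differing in one bit) fails
# even though it presents the same public license.
cloned = device_unlocks(fv_database["device-001"] ^ 1, license_value, unlock_key)
```

Under these assumptions `authorized` is true and `cloned` is false, mirroring the requirement that a copied bitstream and license do not work on an unlicensed device.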

ZHANG et al.: PUF-FSM BINDING SCHEME FOR FPGA IP PROTECTION AND PAY-PER-DEVICE LICENSING 1141

Fig. 2. Hardware IP core binding protocol. (a) FV generates $ID(F^i_{PUF})$ and CRPs and then sells devices to SD and CV. (b) CV generates the locked $HWIP_j$ and distributes it to SD. (c) CV licenses $HWIP_j$ to SD. (d) SD licenses $Product_j$ to EU.

Fig. 3. Configuration process of an FPGA-based product containing multiple locked hardware IP cores.

4) Product Licensing: As Fig. 2(d) shows, if an end user (EU) would like to buy a product developed by the system developer (SD) to run on a specific FPGA device $F^i_{PUF}$, it should send $ID(F^i_{PUF})$ and $ID(Product_j)$ to the SD. The SD will send $ID(F^i_{PUF})$ to the FV to obtain the corresponding CRPs and then calculate licenses based on the FPGA-PUF responses and the modified FSM. Note that the licenses can be public. Finally, the licenses are sent to the EU to unlock $b\{HWIP_j\}$. This process can be denoted as $b\{Product_j\}_{unlocked} = b\{Product_j\}_{locked}(Licenses)$.

C. System Integration

The proposed binding scheme supports the integration of multiple HWIP cores in a single FPGA design. To develop an FPGA-based product, the system developer (SD) obtains an FPGA device with the hard-core PUF inside from the FPGA vendor (FV) [following Fig. 2(a)], the required third-party HWIP cores from the core vendors (CVs) [following Fig. 2(b)] and the required licenses for these cores from the CVs [following Fig. 2(c)]. For example, when there are two different HWIP cores from two different CVs, the SD can integrate them into the same FPGA device by putting the PUF challenges from the FV and the authorized licenses from the CVs in a nonvolatile memory (NVM) next to the FPGA device; our IP protection scheme then works as shown in Fig. 3. When the system is powered on, the activation process checks the PUF-based licenses of the IPs and loads the unlocked IP cores into the reconfigurable FPGA fabric. If a purchased HWIP is copied to an unauthorized FPGA device, even the same license cannot unlock the HWIP, because the unauthorized FPGA cannot generate the same PUF responses as the authorized one.

V. HWIP LOCKING MECHANISM

A. Locking the Hardware IP

In this section, we describe a prototype design of the lock mechanism proposed in the binding scheme. The lock is achieved by exploiting the PUF’s unique properties (unclonable, persistent and unpredictable). As Fig. 4 shows, we use the PUF response to control the transitions of the FSM in the HWIP. The error-corrected PUF response is used to uniquely determine the transitions of the state transition graph (STG) of the HWIP (the IP behavior); without the correct PUF response, the STG would not perform correctly. Therefore, the circuit is kept locked until the correct license (formed by the correct PUF response) unlocks it. It should be noted that the computed licenses can also be public and that different PUF responses can be used to calculate different licenses. Additionally, the FPGA vendor often computes an error correcting code (ECC) to compensate for any bit flip in the PUF output (response), because noise and other sources of physical uncertainty make it hard to keep the PUF output absolutely stable.

1142 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 10, NO. 6, JUNE 2015

Fig. 4. PUF response is used to uniquely control the transitions of the STG.

Fig. 5. The binding FSM structure. The original states are shown in dark and the added states are shown in white on the STG of the added FSM.

As an example, considering the original STG with 6 states, S0 ∼ S5, in Fig. 5, the transition from state S0 to S1 is excited by a specific input combination. S0 is called the reset state of the original FSM and S1 is the next state of this transition. Now we introduce the method to generate a new FSM with additional structure to bind with PUF. We add M (M is an even number) layers of states to form the added FSM. Any even-number layer consists of m states and any odd-number layer only has one state. We define a fixed power-up state Sr for the binding FSM. The first transition step starts from Sr with m transitional edges to each of the other m states. Then the second transition step goes from each of these m states to the next layer (odd layer). After the M -layer (M transition steps) transitions, the state transits to S0 which is the unlocked state (the reset state).

Assuming the input of the original STG is k bits long, we define a k-bit input sequence $\{b_1 b_2 \cdots b_k\}^t_i$ ($t \in \mathbb{N}$, $1 \le t \le 2^k$, $k \in \mathbb{N}$; $i \in \mathbb{N}$, $1 \le i \le M$), where i denotes the i-th transitional step and t denotes the specific transition in the i-th transitional step. The k-bit input is a function of an L-bit PUF response, which determines the transition path in the transitional steps. The odd-numbered transitional steps are determined by part of the PUF response bits, and the even-numbered transitions of the FSM are determined by the license and the remaining PUF response bits.

Fig. 6. An example of the lock mechanism and the generation of the licenses for sequential circuit dk16.

For an added FSM structure of M layers, every transitional step needs a $\log_2(m)$-bit PUF output. Hence, the required number of PUF bits is given by Eq. (1):

$$L_{PUF} = M \times \log_2(m) \qquad (1)$$

In addition, a $\log_2(m)$-bit license should also be provided in each even-layer transitional step. The length of the license in bits can be computed by:

$$L_{license} = \frac{M}{2} \times \log_2(m) \qquad (2)$$

Note that $L_{PUF}$ and $L_{license}$ should be sufficiently long in practice in order to guarantee the security of this model.
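As a quick check of Eqs. (1) and (2), the following minimal Python sketch computes both lengths. It assumes that m is a power of two so that log2(m) is an integer; the function names are illustrative and do not come from the paper.

```python
def _log2_exact(m: int) -> int:
    """Return log2(m) for a power of two m >= 2 (assumption of this sketch)."""
    if m < 2 or m & (m - 1):
        raise ValueError("m is assumed to be a power of two, at least 2")
    return m.bit_length() - 1


def puf_bits(M: int, m: int) -> int:
    """Eq. (1): each of the M transitional steps consumes log2(m) PUF bits."""
    if M <= 0 or M % 2:
        raise ValueError("M must be a positive even number of layers")
    return M * _log2_exact(m)


def license_bits(M: int, m: int) -> int:
    """Eq. (2): only the M/2 even-layer steps also consume log2(m) license bits."""
    if M <= 0 or M % 2:
        raise ValueError("M must be a positive even number of layers")
    return (M // 2) * _log2_exact(m)


if __name__ == "__main__":
    # Two layers with m = 4 need 4 PUF bits and a 2-bit license.
    print(puf_bits(2, 4), license_bits(2, 4))
```

For instance, one hypothetical parameter choice that yields a 256-bit PUF response and a 128-bit license is M = 128 with m = 4.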

To illustrate the key idea of our approach, we give an example of the implementation of the lock and the generation of the licenses for the benchmark circuit dk16, shown in Fig. 6. Consider a two-layer added FSM structure composed of two transition steps, and assume k = 2; each transitional step then consists of 4 ($2^2 = 4$) edges forming the 4 transition paths. In step 1, we use $\{b_1 b_2\}^t_i$ (t = 1, 2, 3, 4), which is designed to take 4 different values to distinguish the 4 edges. The value of $\{b_1 b_2\}^t_i$ is decided by a 2-bit PUF output value: once the design begins to run, it moves from Sr to one of the four connected states depending on the first 2 PUF output bits. Since the first 2-bit PUF output value is “01”, which equals the designed $\{01\}^2_1$, the 1st step transitions from Sr to S7. In the 2nd step, the design can transition from S7 to S10 only when the first two input bits equal $\{10\}^2_2$. To enable this transition, the second 2-bit PUF output “00” should be XOR’d with a 2-bit key that generates the result “10” (in this case the key should be “10”). With the calculated license and the PUF response, the FSM can transit from state Sr to s_1 (the original reset state of dk16).
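The XOR relationship in the dk16 example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transition table holds just the two edges named in the text (Sr to S7 on “01”, S7 to S10 on “10”), a non-matching input is assumed to leave the FSM in its current state, and the helper names are hypothetical.

```python
def xor_bits(a: str, b: str) -> str:
    """Bitwise XOR of two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def make_license(puf_bits: str, required_input: str) -> str:
    """Key such that puf_bits XOR key equals the input required by the FSM edge."""
    return xor_bits(puf_bits, required_input)


# Only the two edges mentioned in the text; everything else is assumed
# to keep the FSM in its current state (assumption of this sketch).
TRANSITIONS = {("Sr", "01"): "S7", ("S7", "10"): "S10"}


def walk(puf_response: str, license_key: str) -> str:
    """Run the two transitional steps of the example and return the final state."""
    state = "Sr"
    # Step 1 (odd step): driven by the first two PUF bits alone.
    state = TRANSITIONS.get((state, puf_response[:2]), state)
    # Step 2 (even step): driven by the next two PUF bits XOR'd with the license.
    step2_input = xor_bits(puf_response[2:4], license_key)
    state = TRANSITIONS.get((state, step2_input), state)
    return state


if __name__ == "__main__":
    key = make_license("00", "10")   # second PUF output "00", required input "10"
    print(key, walk("0100", key))
```

With the PUF response “0100” from the example, the computed key is “10” and the walk ends in S10; a device whose PUF returns different first bits never leaves Sr with the same key.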

B. Unlocking the Hardware IP

The PUF outputs $L_{PUF}$ bits to determine the transitions of the binding FSM. An attacker with no information about the transition table of the FSM cannot find the correct sequence of primary input combinations to arrive at the reset state S0. Hence, the CV is the only one who can compute the license to unlock $b\{HWIP_j\}_{locked}$.

Fig. 7. The structure of the delay-based PUF design.

The unlocking process is as follows. The FV provides the enrolled PUF-embedded FPGA $F^i_{PUF}$; each $F^i_{PUF}$ provides a specific set of PUF challenges. If a $b\{HWIP_j\}_{locked}$ is illegally over-used, copied or cloned by an SD, it remains locked in the fixed power-up state, Sr, upon powering up $F^i_{PUF}$, because the device does not have the correct PUF responses for the challenges. Hence, in order to unlock the design, the CV must use the $L_{PUF}$-bit PUF responses received from the FV, who tests the PUF responses on the provided PUF challenges, and then calculate the correct $L_{license}$-bit license for the SD. The SD can use this license to unlock $b\{HWIP_j\}_{locked}$ correctly.

VI. THE REFERENCE IMPLEMENTATION OF PUF

Many kinds of PUF have been proposed in the past decade [42], such as the optical PUF, SRAM PUF, arbiter PUF and ring oscillator PUF. Some of them have also been implemented on FPGAs [17], [18], [21], [22]. The proposed binding method can work with any PUF implemented on an FPGA that satisfies the properties defined in Section III.A. Which PUF to use is up to the FPGA vendor. In this study, we give a concrete implementation based on a delay-based PUF for designers to refer to. This PUF is designed specifically for FPGAs. It does not need a hard macro with fixed routing and is completely described in VHDL, with the merits of ease of use and low silicon area overhead [41].

A. A Delay-Based PUF

In this paper, we designed and implemented a delay-based PUF on 28nm FPGAs, which takes advantage of the manufacturing difference in the switching latencies of two carry-chain multiplexers on the FPGA to produce a positive pulse (glitch) at the output of the downstream multiplexer. The glitch can be used to set the output of a D flip-flop to logic-1 from the default logic-0, which forms a one-bit PUF response. The detailed structure of the PUF design is illustrated in Fig. 7. The shift register contents are pre-initialized as follows:

• Input A: 0x5555 (0101010101010101)
• Input B: 0xAAAA (1010101010101010)

Fig. 8. The new prototype implementation of a primitive PUF on Xilinx Zynq-7000 FPGA.

When the look-up-table (LUT) A and its driving multiplexer A are faster than the LUT B and multiplexer B, the output OUT would be logic-1.

Note that the current delay-based PUF [41] for FPGAs cannot be directly implemented on the latest Xilinx FPGAs such as Virtex-7, Kintex-7, Artix-7 and Zynq-7000, since the SLICE structure of the latest Xilinx FPGAs is different from that of previous FPGA families such as Virtex-5. In the architecture of Virtex-5 FPGAs, once a LUT in a SLICEM is configured as a shift register, the logic-0 data input of the multiplexer can be connected to a logic-0 signal to meet the design requirement. However, in the SLICEM of Zynq-7000 FPGAs, the two optional paths of the logic-0 data input of the carry-chain multiplexer are already used as the output or input of a shift register, so the requirement that the logic-0 data input of the multiplexer always be connected to a logic-0 signal cannot be met.

To solve this problem, we use four SLICEs to implement a one-bit PUF signature. In the layout of the Xilinx Zynq-7000 XC7Z020 FPGA, there are two SLICEs in one CLB. A SLICE whose X coordinate is an even number is a SLICEM; the other SLICE in the same CLB is then a SLICEL. Two SLICEMs are configured as two shift registers, while their corresponding SLICELs are configured as a carry-chain multiplexer. As shown in Fig. 8, four slices are used to implement a new primitive PUF. The dotted line represents the direction of data flow.

B. Reliability-Enhancing Techniques

A silicon PUF is based on manufacturing variation, which may be very sensitive to the operating environment, such as voltage and temperature, particularly for delay-based PUFs [10], [22]. It is very hard for any known PUF to maintain an absolutely stable response. Methods such as error correcting [21], [40], pattern matching [44], [45], and temperature-aware collaboration [46] have been proposed to correct bit flips in PUF responses and generate a stable PUF output.

These state-of-the-art techniques have already been very successful in reducing and correcting PUF bit errors. For example, by using Index-Based Syndrome (IBS) coding, error correction performed on the output responses of ring oscillator (RO) PUFs implemented on Virtex-5 FPGAs achieves an error rate of less than $10^{-6}$ when the temperature goes from −55°C to 125°C under a 1.0V operating voltage with a ±10% variation [40]. Maes et al. demonstrated that an efficient and extremely low-overhead BCH decoder designed for correcting bit flips in PUF responses uses merely 112 Slices on a Xilinx Spartan-6 FPGA [21]. Paral and Devadas proposed using string pattern matching to generate reliable PUF responses; both the false positive and false negative rates can be less than $10^{-9}$ [44]. Yin and Qu built a temperature-aware collaborative RO PUF in which they measure the PUF output values at different temperatures and choose the correct one based on the real operating temperature from on-chip temperature sensors, which ideally guarantees no bit error [46].

Moreover, electro-migration, hot carrier injection (HCI), negative bias temperature instability (NBTI) and time-dependent dielectric breakdown (TDDB) cause device aging, which impacts the stability of PUF signatures [47], [48]. For example, Ganta et al. [48] observe that around 4% of the RO-PUF bits are prone to instability due to aging in various operating conditions. Recently, corresponding aging-resistant techniques [49], [50] have also been developed.

As we will report in Section VIII, our reference delay-based PUF does have bit errors when operating at different temperatures. However, such errors do not cause any false positive or false negative, which means that we are able to distinguish a PUF response with bit errors from a PUF response from a different device. To keep our discussion focused on the new PUF-FSM binding scheme for FPGA IP protection and pay-per-device licensing, we will not elaborate on how to improve the reliability of the above reference delay-based PUF or on the associated cost. When highly reliable PUF responses are critical, one can always use one or more of the reliability-enhancing techniques. It is a task for the vendors and IP developers to balance the tradeoff between PUF reliability and design overhead.

C. The Integration Architecture

The delay-based PUF in this paper is designed in HDL, and hence has the merits of ease of use and high flexibility. The PUF can be implanted into an FPGA in the form of a soft-core or a hard-core. A soft-core PUF is implemented in the FPGA fabric, while a hard-core PUF is physically implemented as a structure in the silicon connected to the FPGA fabric. Both hard-cores and soft-cores have been widely adopted in the FPGA industry. An example of an FPGA with hard-cores is the new Xilinx Zynq-7000 System on a Programmable Chip (SOPC), which uses an ARM Cortex-A9 dual-core MCU. On the other hand, soft-cores such as MicroBlaze, Nios II and OpenRISC are more commonly used in FPGAs.

Fig. 9. An example of a soft-core PUF implanted in a SOPC.

Fig. 9 illustrates how a soft-core PUF can be implanted into a SOPC. The PUF is mounted on PLB bus to connect to a Xilinx MicroBlaze soft-core embedded processor. In our proposed binding scheme, the PUF will only be used when there is a need to unlock the IPs, normally during the FPGA power-up process. When the FPGA is running, the PUF will not be needed anymore. Therefore, we propose to power off the PUF unit once the IPs are unlocked. This mechanism can easily be implemented with some control logic and brings several advantages. First, by shutting down the hard-core or soft-core PUF unit, it will not consume unnecessary power; second, when the PUF unit is off, there will not be any leak of timing, power, or electromagnetic emanation from the PUF unit, so it will be more resilient to potential side channel attacks.

VII. THE SECURITY ANALYSIS

The objective of the proposed PUF-FSM binding method is to protect the HWIPs from the piracy attacks such as cloning, copying, unauthorized redistribution, over-use, etc. To analyze the security of this method, we consider the following existing attacks:

• Brute force. The adversary tries to guess the correct license to unlock $b\{HWIP_j\}_{locked}$. Because the unclonable PUF responses control the transitions of the added STG, the space of possible licenses becomes exponential, making such a brute force attack infeasible. For example, when $L_{PUF}$ = 256 bits ($L_{license}$ = 128 bits), the search space of such a brute force attack consists of all $2^{128}$ possible license values.

• PUF removal/tampering attack. The adversary tries to remove or tamper with the PUF on the FPGA, for example by replacing the PUF with an SRAM that contains PUF responses from a previously unlocked hardware IP. Then the license for unlocking the previous HWIP can be used to unlock a new HWIP. There are several countermeasures to address this kind of attack, such as adding obfuscated states within the FSM for PUF checking [27]. Our binding scheme can adopt these countermeasures.

• Simulating PUF. According to the intrinsic properties of the PUFs described above, it is impractical to duplicate a PUF with functional and timing characteristics identical to another PUF. Although machine learning techniques [14] have been used to model some strong PUFs with a high prediction rate, they need a huge number of PUF CRPs during the learning phase. Therefore, this attack will not be effective against weak PUFs such as the one used in this paper, the SRAM PUF [18], [29] and similar architectures.

• Tapping PUF responses. In the binding scheme, the secret PUF response is ephemeral (the response is only used to unlock HWIPs at boot time) and is immediately cleared after use; hence the scheme resists tapping of PUF responses.

• Reverse engineering the added FSM. An adversary tries to extract the STG and separate/remove the added STG from the original STG. However, STG recovery is a computationally intractable problem [15], [24], [28], and there exist effective methods that we can use in our scheme against such attacks such as creating black holes in the added FSM and merging the added FSM with the test and other FSMs [24].

• Side channel attacks. These attacks statistically analyze the timing, power consumption or electromagnetic emanation of cryptographic devices to gain knowledge about integrated secrets. Our delay-based glitch PUF architecture (see Fig. 7) uses multiple flip-flops in parallel and leaves little room for side channel attacks. For electromagnetic emanation analysis, it is practically difficult to locate each flip-flop on the die of an FPGA and to focus the EM probe mainly on the radiation of its components. Timing and power analysis attacks are unlikely because all the flip-flops will be on regardless of whether the PUF bit is a 0 or a 1, and the PUF is only used during the IP unlocking phase. However, our approach is not completely free of side channel attacks, as it is well known that any PUF-based security mechanism is vulnerable to side channel attacks unless appropriate countermeasures are taken [21].
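To give a feel for the brute force bound in the first item of the list above, the following rough Python sketch estimates the size of the license search space and the expected search time. The guess rate is a purely hypothetical parameter, and the estimate assumes a uniformly random license with about half of the space searched on average.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def license_search_space(license_bits: int) -> int:
    """Number of candidate licenses for a license of the given bit length."""
    return 2 ** license_bits


def expected_years(license_bits: int, guesses_per_second: float) -> float:
    """Expected brute force time in years, trying about half of the space."""
    expected_guesses = license_search_space(license_bits) // 2
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical attacker testing one billion licenses per second.
    print(f"{expected_years(128, 1e9):.2e} years")
```

Even at an assumed rate far beyond what repeatedly power-cycling and testing a physical FPGA would allow, a 128-bit license remains out of reach of exhaustive search.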

VIII. EXPERIMENTAL RESULTS AND ANALYSIS

We have performed a set of experiments to evaluate the effectiveness of the proposed new binding method. The experiments include two parts: the reference implementation of the delay-based PUF on 28nm Xilinx Zynq-7000 FPGAs, and the evaluation of the PUF-bound FSM on FPGAs.

A. Design Evaluation of a Delay-Based PUF

We implemented 16 identical 64-bit PUFs at different locations on a Xilinx Zynq-7000 FPGA. We used range constraints (the RLOC_RANGE statement) supported by the Xilinx integrated development kit to place the PUF design in the designated area. Hence, the responses from these PUFs will be independent. It is well known that the manufacturing variation between two chips is normally larger than the variation between different regions on the same chip. Consequently, if the PUFs located in different regions of a single FPGA produce unique outputs, we can be strongly confident that the PUF outputs from different chips will also be unique [41]. In this section, we first show the area overhead caused by the PUFs, and then discuss the uniqueness and reliability of the PUF outputs.

1) Area Overhead: The Xilinx Zynq-7000 XC7Z020 FPGA has about 53,200 LUTs, 17,400 of which can be used as storage or shift registers. In our experiments, a 128-bit PUF consumes 258 shift register LUTs (utilization: 1%) and 256 flip-flops (utilization: 1%). Hence, the area overhead of the reference PUF implementation is negligible.

2) Uniqueness: Uniqueness measures how distinct a PUF’s responses are, which determines the quality of the PUF. It is not acceptable if different PUFs produce the same or very similar responses when fed with the same challenge. We use Hamming Distance (HD) to evaluate the uniqueness of the PUF responses. For a pair of n-bit PUF responses $P_i$ and $P_j$ ($i \neq j$), their HD is the number of bit positions in which $P_i$ and $P_j$ differ. A PUF response is unique as long as it has a non-zero HD with the responses of other challenges. However, due to reliability concerns (see item 3) below for more details), the PUF responses under different operating environments may have bit errors and thus produce the same response on different challenges when their designated responses have a very small HD. For k n-bit PUF responses $P_1, P_2, \cdots, P_k$, we define their average pairwise HD as:

$$u = \frac{2}{k(k-1)} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \frac{HD(P_i, P_j)}{n} \times 100\% \qquad (3)$$

where

$$HD(P_i, P_j) = \sum_{m=1}^{n} (r_{i,m} \oplus r_{j,m})$$

and $r_{i,m}$ is the m-th response bit of the n-bit response string from PUF $P_i$.

If each PUF response is unique and logic-0 and logic-1 are distributed in responses uniformly, the expectation of HDs between the PUF responses should be 50%. In our experiment, we use k = 16 and n = 64. From the (16 ∗ 15)/2 = 120 data points of pairwise HD, we have u = 49.6%, which means that on average, any pair of PUF responses have a HD of 31.75 bits.
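The average pairwise HD of Eq. (3) is straightforward to compute. The minimal Python sketch below represents each n-bit response as an integer; the function names and the toy responses in the usage example are illustrative and are not measurements from the paper.

```python
from itertools import combinations
from typing import Sequence


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two responses differ."""
    return bin(a ^ b).count("1")


def uniqueness(responses: Sequence[int], n: int) -> float:
    """Average pairwise HD of k n-bit responses, in percent, as in Eq. (3)."""
    k = len(responses)
    if k < 2:
        raise ValueError("at least two responses are needed")
    total = sum(hamming_distance(a, b) for a, b in combinations(responses, 2))
    return 2.0 / (k * (k - 1)) * total / n * 100.0


if __name__ == "__main__":
    # Three toy 4-bit responses (made up for illustration only).
    print(uniqueness([0b0000, 0b1111, 0b0011], n=4))
```

For k = 16 responses, `combinations` enumerates the same 16 × 15 / 2 = 120 pairs counted in the text.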

To further investigate the uniqueness of the PUF responses, we consider the frequency histogram of these 120 pairwise HDs, which is shown in Fig. 10. The HDs are concentrated around the expected value of 32 (half of 64 bits), with a maximum of 45 and a minimum of 18. This implies that for two different challenges to generate the same response, at least one of the PUF responses has to have 9 out of the total 64 bits flipped; as we will see next, this is almost impossible with current PUF technologies. Consequently, we conclude that the implemented PUF achieves good response uniqueness.

3) Reliability: Reliability is used to assess the stability of PUF responses in different environments. Ideally, PUF responses challenged by the same input should remain the same across repeated tests. However, PUF responses may change due to factors such as ambient temperature variation and supply voltage fluctuation, since these factors may affect circuit delay in practice. As the most influential factor in normal usage scenarios, temperature variation plays a very important role in PUF performance because it can affect the circuit delay. In this study, we select temperature as the environmental factor for verifying the PUF performance. We expect the PUF to retain good uniqueness and reliability when the FPGA runs at a high temperature, which may be caused by high workload, poor ventilation, high ambient temperature and so on.

Fig. 10. PUF response uniqueness: Hamming distance distribution.

Fig. 11. PUF response variability at high vs. normal temperature.

We used Xilinx ChipScope to monitor and read the temperature of the FPGA chip with the PUFs embedded. Room temperature was 15°C, and the normal temperature of the FPGA chip is about 40°C. We then used an electric hair dryer to raise the FPGA chip temperature to 70°C and tested the PUF responses.

TABLE II

STATISTICS FOR MCNC’91 BENCHMARKS

In order to verify the reliability of the PUF at high temperature, the 64-bit PUF responses were recorded when the FPGA temperature was about 70°C. The HDs were then calculated between the high-temperature responses and the normal-temperature responses for the 16 PUFs under test, as Fig. 11 shows. For the same PUF under the same challenge at high and normal temperatures, the maximal, minimal and average HDs of the responses were 9, 2 and 5.5 respectively; 81.25% of the HDs fell within the range [3, 8] (small difference). As shown in Fig. 11, when the temperature rose from 40°C to 70°C, the reliability degradation was small: the average HD increased from 2.38 to 5.50, and the maximum increased from 6 to 9. From Fig. 10 and Fig. 11, one can see clearly that there is no overlap between the number of bit errors at high temperature (at most 9 bits) and the Hamming distance between two different PUF responses (at least 18 bits). This phenomenon is known as a vacuum belt: if we use any value x between 10 and 17 as a threshold, then when the number of differing bits between the PUF response at run time and the original correct PUF response is less than x, the two are the same PUF response with bit errors; otherwise, they are different PUF responses. Therefore, there will not be any false positive or false negative. As discussed earlier in Section VI.B, when the design exhibits large variation with the operating environment and there is no vacuum belt, we can apply reliability-enhancing techniques to reduce the bit errors in the PUF response.
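The vacuum belt argument amounts to a simple threshold test on the Hamming distance. The sketch below is a minimal Python illustration; the default threshold of 13 is an arbitrary pick from the 10 to 17 range given in the text, and the function name is hypothetical.

```python
def same_puf_response(reference: int, observed: int, threshold: int = 13) -> bool:
    """True if the observed response is the reference response with bit errors.

    The threshold is assumed to lie inside the vacuum belt (10 to 17 bits
    for the 64-bit responses reported in the text).
    """
    differing_bits = bin(reference ^ observed).count("1")
    return differing_bits < threshold


if __name__ == "__main__":
    reference = 0
    noisy = (1 << 9) - 1     # 9 flipped bits: worst-case temperature error
    other = (1 << 18) - 1    # 18 differing bits: closest pair of distinct PUFs
    print(same_puf_response(reference, noisy), same_puf_response(reference, other))
```

Any threshold in the belt separates the two cases reported in the experiments, which is why no false positive or false negative occurs for this data set.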

B. Overhead Analysis of Modifying FSM

We performed experiments to evaluate the overhead incurred by modifying the FSM on the MCNC’91 benchmark sequential circuits and on FSM circuits randomly generated by GenFSM [31]. The circuits are described in KISS2 format. First, we used a Java program to add the states and transitions to the circuits in KISS2 format, and then used the kiss2vl tool [30] to convert KISS2 to Verilog. Finally, each FSM circuit in Verilog format was synthesized and implemented using Xilinx ISE 14.1 on the Xilinx Virtex-5 FPGA XC5VLX50T, featuring 7200 slices and 28800 slice LUTs. All experiments were conducted on a Dell OptiPlex 740 machine with a 2.4GHz AMD Athlon(tm) 64 Processor 3800+ and 1GB of RAM.

Table II gives the original synthesis summary for the MCNC’91 benchmark designs. The columns “|S|”, “PI”, “PO” and “T” are the numbers of states, input variables, output variables and transitions, respectively, in each FSM benchmark. The columns “LUTs”, “Slices”, “Delay” and “Power” are the “Number of Slice LUTs”, “Number of occupied Slices”, “Minimum period” and “Estimated power”, respectively, of the design with the original FSM as reported by the ISE tools. The minimum period was obtained using the Timing Analyzer, and the power is the estimated power obtained using the XPower Analyzer.

TABLE III

STATISTICS FOR MCNC’91 BENCHMARKS WITH OUR METHOD WHEN m = 4 & M = 4

TABLE IV

STATISTICS FOR MCNC’91 BENCHMARKS WITH OUR METHOD WHEN m = 4 & M = 6

TABLE V

STATISTICS FOR LARGE FSMs GENERATED BY GENFSM WITH OUR METHOD WHEN m = 4 & M = 4

In our experiment, the number of replicated states m in each odd layer and the number of layers M in the added FSM were set as parameters. Table III and Table IV show the synthesis summary for the benchmark circuits processed by our method when (m = 4 & M = 4) and (m = 4 & M = 6), respectively. Resource overhead is denoted by the increase in “Number of Slice LUTs” and “Number of occupied Slices”. Timing overhead is measured by the increase in the minimum period. ΔR-LUTs and ΔR-Slices are the normalized resource overheads of our proposed scheme in LUTs and Slices, respectively. ΔD and ΔP are the normalized overheads in delay and power, respectively. We can see from Table III and Table IV that the resource, timing and power overhead due to modifying the FSM seems to be independent of the benchmark circuit size. The average resource, timing and power overheads are 52.02% for LUTs (55.34% for Slices), 11.77% and 0.03% when m = 4 & M = 4; and 61.27% for LUTs (49.21% for Slices), 13.91% and 0.03% when m = 4 & M = 6.

Table III and Table IV reveal that the power overhead is rather low, and that the timing degradation is moderate (11.77% for Table III and 13.91% for Table IV on average) and even negative in some instances. A negative percentage implies that our method has actually improved the performance. The high area overhead and moderate timing overhead on these small benchmark circuits are a direct result of the simplicity of these circuits, as they contain only control paths. In practice, an actual HWIP would be much larger, with many other components in addition to control paths. In those cases, we expect the overhead to be small.
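The text does not spell out how the normalized overheads are computed. A common convention, assumed in the Python sketch below, is the relative change with respect to the original design; the numbers in the usage example are made up for illustration and are not taken from the tables.

```python
def overhead_percent(original: float, modified: float) -> float:
    """Relative overhead of the modified design in percent (assumed definition).

    A negative value means the modified design is better than the original.
    """
    if original <= 0:
        raise ValueError("original metric must be positive")
    return (modified - original) / original * 100.0


if __name__ == "__main__":
    # Hypothetical minimum periods in ns: the modified FSM happens to be faster.
    print(overhead_percent(200.0, 190.0))   # negative overhead
    # Hypothetical LUT counts for a small control-only benchmark.
    print(overhead_percent(50.0, 76.0))
```

Under this definition, a small control-only circuit shows a large percentage even for a few added LUTs, which matches the explanation given for the small benchmarks.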

To demonstrate this, we used GenFSM [31] to generate ten random STGs with hundreds of states and hundreds to thousands of state transitions by specifying the number of inputs, outputs and states. The experimental results, shown in Table V, indicate that our method introduces very low “ΔR-LUTs”, “ΔR-Slices”, “ΔD” and “ΔP” for large FSM designs: the average resource, timing and power overheads are −2.67% for LUTs (−6.25% for Slices), 0.64% and 0.01% respectively. In addition, it must be noted that the overhead could be much less in large designs where there are many components other than control paths, such as memory and I/O peripherals. Furthermore, the control path realized by the binding FSM on each benchmark is only a small part of the overall size of the design (≤ 1%) [23]. Therefore, adding more control paths to the binding FSM would be acceptable in practice, with low overhead in area, timing and power.

Fig. 12. Area, delay, and power overhead with different M = (2,4,6,8,10,12) and m = (2,4,6,8,10) for circuit planet.

Finally, we discuss the impact of m and M on the resource, timing and power overhead for the benchmarks. Fig. 12 shows this impact for the benchmark planet when M was set to 2, 4, 6, 8, 10, 12 and m was set to 2, 4, 6, 8, 10, successively. It can be seen that the power overhead is negligible, and that the resource and timing overheads are roughly positively correlated with both M and m, but nonlinearly, due to the optimization of the circuits during synthesis.

From the above results, we see that the PUF-FSM binding scheme keeps the area, power and timing overhead under control, especially in large designs. Suitable values of m and M can also be selected near the empirically determined ones.

IX. CONCLUSION AND FUTURE WORK

This article presents a new binding method that enables binding hardware IPs to specific FPGAs by utilizing the PUF and the FSM of circuits. The method is fundamentally different from the traditional encryption-based HWIP protection methods and offers the following advantages: 1) it can be used to protect third-party FPGA IP cores in addition to the single FPGA configuration bitstream; 2) it does not need any third parties or permanent storage for secret keys in the FPGA; 3) it supports the pay-per-device licensing mechanism; and 4) it has low hardware cost. Experimental results on a reference implementation of the binding scheme show that a 128-bit delay-based PUF uses only 258 RAM-LUTs and 256 flip-flops on 28nm Xilinx FPGAs, and that the modified FSM introduces only 0.64% timing overhead and 0.01% power overhead on average for ten FSM designs randomly generated by GenFSM.

We conclude with a discussion on the limitations of our PUF-FSM binding scheme, which lead to several future research directions. First, in our approach, we modify the FSM of a design to lock it and use the PUF response to unlock it. This effectively protects the whole design, not only the FSM, from attacks such as cloning, copying, misusing and unauthorized integration. However, the design components without bound FSMs will still be vulnerable to tampering attacks. Anti-tampering is beyond the scope of this article, but it will be interesting to study how our approach can be combined with anti-tamper methods such as those described in [12]. Second, although we have argued that our approach is more resilient against physical attacks than encryption-based IP protection methods, it is well known from the physical security perspective that any implementation of a cryptographic primitive or PUF-based security mechanism is vulnerable to side-channel attacks when no appropriate countermeasures are taken [21]. It will be of high interest to develop effective countermeasures to enhance resiliency against various types of physical attacks.
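The locking idea summarized above (extra FSM states that must be traversed with a device-specific response before the functional states become reachable) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the authors' construction: the class `LockedFSM`, the symbol alphabet, the reset-on-mismatch rule, and the stand-in `enrolled_response` tuple are all hypothetical, and the scheme's parameters m and M are not modeled.

```python
from dataclasses import dataclass


@dataclass
class LockedFSM:
    """Toy FSM whose functional states sit behind a chain of locking states."""

    unlock_key: tuple      # hypothetical stand-in for a PUF-derived unlock sequence
    position: int = 0      # number of key symbols matched so far
    state: str = "LOCKED"

    def step(self, symbol: int) -> str:
        if self.state == "LOCKED":
            if symbol == self.unlock_key[self.position]:
                self.position += 1
                if self.position == len(self.unlock_key):
                    self.state = "IDLE"   # first functional state
            else:
                self.position = 0         # any wrong symbol restarts the lock chain
            return self.state
        # Functional behaviour (arbitrary for the sketch): symbol 1 toggles IDLE/RUN.
        if symbol == 1:
            self.state = "RUN" if self.state == "IDLE" else "IDLE"
        return self.state


def run(fsm: LockedFSM, symbols) -> str:
    """Feed a sequence of input symbols and return the final state."""
    for s in symbols:
        fsm.step(s)
    return fsm.state


# Assumed response of the licensed device; a real PUF response is not a constant.
enrolled_response = (1, 0, 1, 1)
```

In this sketch, a device that reproduces `enrolled_response` reaches the functional states, while a device producing a different sequence stays in `LOCKED`, so the copied design does nothing useful there.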

ACKNOWLEDGMENT

The authors would like to thank Dr. Qiang Wu, Dr. Qiang Zhou, Wenjie Che and Kecheng Yang for reviewing this article and providing us feedback. We would also like to thank the anonymous reviewers for their insightful suggestions and comments.

REFERENCES

[1] S. Drimer, “Security for volatile FPGAs,” Ph.D. dissertation, Dept. Comput. Lab., Univ. Cambridge, Cambridge, U.K., Tech. Rep. UCAM-CL-TR-763, Nov. 2009.

[2] R. Maes, D. Schellekens, and I. Verbauwhede, “A pay-per-use licensing scheme for hardware IP cores in recent SRAM-based FPGAs,” IEEE Trans. Inf. Forensics Security, vol. 7, no. 1, pp. 98–108, Feb. 2012.

[3] T. Kean, “Cryptographic rights management of FPGA intellectual property cores,” in Proc. ACM/SIGDA 10th Int. Symp. Field-Program. Gate Arrays (FPGA), 2002, pp. 113–118.

[4] T. Güneysu, B. Möller, and C. Paar, “Dynamic intellectual property protection for reconfigurable devices,” in Proc. Int. Conf. Field-Program. Technol. (ICFPT), Dec. 2007, pp. 169–176.

[5] S. Drimer, T. Güneysu, M. G. Kuhn, and C. Paar. (2008). Protecting Multiple Cores in a Single FPGA Design. [Online]. Available: http://www.cl.cam.ac.uk/~sd410/papers/protect_many_cores.pdf

[6] “Design security in Stratix III devices (v1.5),” Altera, San Jose, CA, USA, White Paper 01010, Sep. 2009.

[7] “Using high security features in Virtex-II series FPGAs (v1.0),” Xilinx, San Jose, CA, USA, Appl. Note 766, Jul. 2004.

[8] S. Trimberger, J. Moore, and W. Lu, “Authenticated encryption for FPGA bitstreams,” in Proc. 19th ACM/SIGDA Symp. Field-Program. Gate Arrays (FPGA), 2011, pp. 83–86.

[9] “Protecting the FPGA design from common threats (v1.0),” Altera, San Jose, CA, USA, White Paper 01111, Jun. 2009.

[10] G. E. Suh and S. Devadas, “Physical unclonable functions for device authentication and secret key generation,” in Proc. 44th ACM/IEEE Design Autom. Conf. (DAC), Jun. 2007, pp. 9–14.

[11] M. A. Gora, A. Maiti, and P. Schaumont, “A flexible design flow for software IP binding in FPGA,” IEEE Trans. Ind. Informat., vol. 6, no. 4, pp. 719–728, Nov. 2010.

[12] S. J. Stone, “Anti-tamper method for field programmable gate arrays through dynamic reconfiguration and decoy circuits,” M.S. thesis, Dept. Elect. Comput. Eng., Air Force Inst. Technol., Wright-Patterson Air Force Base, OH, USA, 2008.

[13] K. Kepa, F. Morgan, and K. Kosciuszkiewicz, “IP protection in partially reconfigurable FPGAs,” in Proc. IEEE Int. Conf. Field-Program. Logic Appl. (FPL), Aug./Sep. 2009, pp. 403–409.

[14] U. Rührmair, F. Sehnke, J. Sölter, G. Dror, S. Devadas, and J. Schmidhuber, “Modeling attacks on physical unclonable functions,” in Proc. 17th ACM Conf. Comput. Commun. Secur. (CCS), 2010, pp. 237–249.

[15] A. L. Oliveira, “Robust techniques for watermarking sequential circuit designs,” in Proc. 36th Annu. ACM/IEEE Design Autom. Conf. (DAC), Jun. 1999, pp. 837–842.

[16] E. Simpson and P. Schaumont, “Offline hardware/software authentication for reconfigurable platforms,” in Proc. 8th Int. Conf. Cryptogr. Hardw. Embedded Syst. (CHES), 2006, pp. 311–323.

[17] M. Majzoobi and F. Koushanfar, “Time-bounded authentication of FPGAs,” IEEE Trans. Inf. Forensics Security, vol. 6, no. 3, pp. 1123–1135, Sep. 2011.

[18] J. Guajardo, S. S. Kumar, G.-J. Schrijen, and P. Tuyls, “FPGA intrinsic PUFs and their use for IP protection,” in Proc. 9th Int. Workshop Cryptogr. Hardw. Embedded Syst. (CHES), 2007, pp. 63–80.

[19] “Security solutions using Spartan-3 generation FPGAs (v1.1),” Xilinx, San Jose, CA, USA, White Paper 266, Apr. 2008.

[20] J. A. Roy, F. Koushanfar, and I. L. Markov, “EPIC: Ending piracy of integrated circuits,” in Proc. Eur. Design Test Conf. (DATE), 2008, pp. 1069–1074.

[21] R. Maes, A. Van Herrewege, and I. Verbauwhede, “PUFKY: A fully functional PUF-based cryptographic key generator,” in Proc. 14th Int. Conf. Cryptogr. Hardw. Embedded Syst. (CHES), 2012, pp. 302–319.

[22] M. Majzoobi, A. Kharaya, F. Koushanfar, and S. Devadas. (2014). “Automated design, implementation, and evaluation of arbiter-based PUF on FPGA using programmable delay lines,” Rice Univ., Houston, TX, USA, Tech. Rep. 2014/639. [Online]. Available: http://eprint.iacr.org/

[23] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 4th ed. San Mateo, CA, USA: Morgan Kaufmann, 2006.

[24] Y. M. Alkabani and F. Koushanfar, “Active hardware metering for intellectual property protection and security,” in Proc. 16th USENIX Secur. Symp., 2007, pp. 291–306.

[25] Y. Alkabani, F. Koushanfar, and M. Potkonjak, “Remote activation of ICs for piracy prevention and digital right management,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2007, pp. 674–677.

[26] Y. Alkabani and F. Koushanfar, “Active control and digital rights management of integrated circuit IP cores,” in Proc. Int. Conf. Compil., Archit., Synth. Embedded Syst., 2008, pp. 227–234.

[27] F. Koushanfar, “Provably secure active IC metering techniques for piracy avoidance and digital rights management,” IEEE Trans. Inf. Forensics Security, vol. 7, no. 1, pp. 51–63, Feb. 2012.

[28] A. Cui, C.-H. Chang, S. Tahar, and A. T. Abdel-Hamid, “A robust FSM watermarking scheme for IP protection of sequential circuit design,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 30, no. 5, pp. 678–690, May 2011.

[29] D. E. Holcomb, W. P. Burleson, and K. Fu, “Power-up SRAM state as an identifying fingerprint and source of true random numbers,” IEEE Trans. Comput., vol. 58, no. 9, pp. 1198–1210, Sep. 2009.

[30] C. Pruteanu. (2000). Kiss to Verilog FSM Converter. [Online]. Available: http://codrin.freeshell.org/

[31] C. Pruteanu and C.-G. Haba, “GenFSM: A finite state machine generation tool,” in Proc. 9th Int. Conf. Develop. Appl. Syst., 2008, pp. 165–168.

[32] A. B. Kahng et al., “Watermarking techniques for intellectual property protection,” in Proc. 35th Annu. Design Autom. Conf. (DAC), 1998, pp. 776–781.

[33] G. Qu and M. Potkonjak, Intellectual Property Protection in VLSI Designs: Theory and Practice. Boston, MA, USA: Kluwer, 2003.

[34] J. Zhang, Y. Lin, Q. Wu, and W. Che, “Watermarking FPGA bitfile for intellectual property protection,” Radioengineering, vol. 21, no. 2, pp. 764–771, 2012.

[35] J. Zhang, Y. Lin, W. Che, Q. Wu, Y. Lyu, and K. Zhao, “Efficient verification of IP watermarks in FPGA designs through lookup table content extracting,” IEICE Electron. Exp., vol. 9, no. 22, pp. 1735–1741, 2012.

[36] J. Zhang, Y. Lin, and G. Qu, “Reconfigurable binding against FPGA replay attacks,” ACM Trans. Design Autom. Electron. Syst., vol. 20, no. 2, Feb. 2015, Art. ID 33.

[37] J. Zhang et al., “FPGA IP protection by binding finite state machine to physical unclonable function,” in Proc. 23rd Int. Conf. Field-Program. Logic Appl. (FPL), Sep. 2013, pp. 1–4.

[38] F. Koushanfar and G. Qu, “Hardware metering,” in Proc. 38th Annu. Design Autom. Conf. (DAC), 2001, pp. 490–493.

[39] J.-L. Zhang, G. Qu, Y.-Q. Lv, and Q. Zhou, “A survey on silicon PUFs and recent advances in ring oscillator PUFs,” J. Comput. Sci. Technol., vol. 29, no. 4, pp. 664–678, 2014.

[40] M.-D. Yu and S. Devadas, “Secure and robust error correction for physical unclonable functions,” IEEE Des. Test Comput., vol. 27, no. 1, pp. 48–65, Jan./Feb. 2010.

[41] J. H. Anderson, “A PUF design for secure FPGA-based embedded systems,” in Proc. 15th Asia South Pacific, Design Autom. Conf. (ASP-DAC), 2010, pp. 1–6.

[42] R. Maes and I. Verbauwhede, “Physically unclonable functions: A study on the state of the art and future research directions,” in Towards Hardware-Intrinsic Security. Berlin, Germany: Springer-Verlag, 2010, pp. 3–37.

1150 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 10, NO. 6, JUNE 2015

[43] J. Guajardo, S. S. Kumar, G.-J. Schrijen, and P. Tuyls, “Physical unclonable functions and public-key crypto for FPGA IP protection,” in Proc. Int. Conf. Field Program. Logic Appl. (FPL), Aug. 2007, pp. 189–195.

[44] Z. Paral and S. Devadas, “Reliable and efficient PUF-based key generation using pattern matching,” in Proc. IEEE Int. Symp. Hardw.-Oriented Secur. Trust (HOST), Jun. 2011, pp. 128–133.

[45] M. Majzoobi, M. Rostami, F. Koushanfar, D. S. Wallach, and S. Devadas, “Slender PUF protocol: A lightweight, robust, and secure authentication by substring matching,” in Proc. IEEE Symp. Secur. Privacy Workshops (SPW), May 2012, pp. 33–44.

[46] G. Qu and C.-E. Yin, “Temperature-aware cooperative ring oscillator PUF,” in Proc. IEEE Int. Workshop Hardw.-Oriented Secur. Trust (HOST), Jul. 2009, pp. 36–42.

[47] A. Maiti, L. McDougall, and P. Schaumont, “The impact of aging on an FPGA-based physical unclonable function,” in Proc. Int. Conf. Field Program. Logic Appl. (FPL), Sep. 2011, pp. 151–156.

[48] D. Ganta and L. Nazhandali, “Study of IC aging on ring oscillator physical unclonable functions,” in Proc. 15th Int. Symp. Quality Electron. Design (ISQED), Mar. 2014, pp. 461–466.

[49] M. Rahman, D. Forte, J. Fahrny, and M. Tehranipoor, “ARO-PUF: An aging-resistant ring oscillator PUF design,” in Proc. Design, Autom., Test Eur. Conf. Exhibit. (DATE), 2014, pp. 1–6.

[50] R. Maes and V. van der Leest, “Countering the effects of silicon aging on SRAM PUFs,” in Proc. IEEE Int. Symp. Hardw.-Oriented Secur. Trust (HOST), May 2014, pp. 148–153.

Jiliang Zhang received the B.E. degree in chemical engineering and technology from the Shandong University of Science and Technology, Qingdao, China, in 2009, and the Ph.D. degree in computer application technology from Hunan University, Changsha, China, in 2015. From 2013 to 2014, he was a Research Scholar with the Maryland Embedded Systems and Hardware Security Laboratory, University of Maryland, College Park, MD, USA. His research interests include hardware security, such as security for field-programmable gate arrays, PUF and PUF-related applications, IC obfuscation, and IP protection.

Yaping Lin received the B.S. degree from Hunan University, Changsha, China, in 1982, the M.S. degree from the University of Defense Technology, Changsha, in 1985, and the Ph.D. degree from Hunan University, in 2000. From 2004 to 2005, he was a Visiting Scholar with the University of Texas at Arlington, Arlington, TX, USA. He is currently a Professor with the College of Information Science and Engineering, Hunan University. His primary research interests are in the area of computer networking and information security with a focus on sensor networks, cloud security, and hardware related security.

Yongqiang Lyu received the B.S. degree in computer science from Xidian University, Xi’an, China, in 2001, and the M.S. and Ph.D. degrees in computer science from Tsinghua University, Beijing, China, in 2003 and 2006, respectively. He is currently an Assistant Professor with the Research Institute of Information Technology, Tsinghua University. His research interest focuses on the hardware–software fusion architecture in emerging computing systems.

Gang Qu (SM’07) received the B.S. and M.S. degrees in mathematics from the University of Science and Technology of China, Hefei, China, in 1992 and 1994, respectively, and the Ph.D. degree in computer science from the University of California at Los Angeles, Los Angeles, CA, USA, in 2000. Upon graduation, he joined the University of Maryland, College Park, MD, USA, where he is currently a Professor with the Department of Electrical and Computer Engineering and the Institute for Systems Research. He is a member of the Maryland Cybersecurity Center and the Maryland Energy Research Center. He is the Director of Maryland Embedded Systems and Hardware Security Laboratory, College Park, and the Wireless Sensors Laboratory. His primary research interests are in the area of embedded systems and very large scale integration (VLSI) computer aided design (CAD) with a focus on low power system design and hardware related security and trust. He studies optimization and combinatorial problems and applies his theoretical discovery to applications in VLSI CAD, wireless sensor network, bioinformatics, and cybersecurity. He has received many awards for his academic achievements, teaching, and service to the research community. He serves as an Associate Editor of the IEEE Embedded Systems Letters and Integration, the VLSI Journal.

sources/152/Trimberger and Moore - 2014 - FPGA Security Motivations, Features, and Applicat.pdf

INVITED PAPER

FPGA Security: Motivations, Features, and Applications

This paper discusses all aspects of FPGA security and trust.

By Stephen M. Trimberger, Fellow IEEE, and Jason J. Moore

ABSTRACT | Since their inception, field-programmable gate arrays (FPGAs) have grown in capacity and complexity so that now FPGAs include millions of gates of logic, megabytes of memory, high-speed transceivers, analog interfaces, and whole multicore processors. Applications running in the FPGA include communications infrastructure, digital cinema, sensitive database access, critical industrial control, and high-performance signal processing. As the value of the applications and the data they handle have grown, so has the need to protect those applications and data. Motivated by specific threats, this paper describes FPGA security primitives from multiple FPGA vendors and gives examples of those primitives in use in applications.

KEYWORDS | Anti-tamper (AT); authentication; encryption; field-programmable gate arrays (FPGAs); information assurance; physically uncloneable function (PUF); system on chip (SoC); trust

I. INTRODUCTION

A. FPGAs and Programming Technology

A field-programmable gate array (FPGA) is a semiconductor device that can be programmed after manufacture to perform a specific application design, typically specified as a digital logic system [43]. A taxonomy of FPGAs commonly starts with the program storage technology (Fig. 1).

SRAM-programmed FPGAs store their configuration data in internal volatile memory cells distributed throughout the device. These are generally not SRAM cells, but are more similar to static latch cells [43]. Xilinx’s 7-Series and Altera’s Stratix-5 are examples of popular SRAM-based FPGAs. A recognized disadvantage of SRAM programming stems from its volatility. When power is removed, the programming is lost, so an SRAM FPGA requires an external nonvolatile memory for permanent storage of the application program. When power is applied, the SRAM FPGA loads its programming bitstream from that external storage. Besides requiring a second device, the transmission of the program from the nonvolatile external memory to the SRAM FPGA may expose the programming to a potential adversary. The volatility of data may also be used as a positive security feature, enabling the SRAM FPGA to clear all programming if it is tampered with [48].

In contrast, flash memory programmable logic devices, such as traditional complex programmable logic devices (CPLDs), the Microsemi Corporation (Aliso Viejo, CA, USA) SmartFusion2, and Lattice Semiconductor (Hillsboro, OR, USA) ispXPGA [1], [21], are nonvolatile and use internal flash memory to hold the programming. While the internal flash memory eliminates the need for an external nonvolatile storage device and the consequent exposure of the programming to potential adversaries, systems employing flash FPGAs commonly require in-system programming (ISP) of the FPGA. ISP exposes the programming of the FPGA to the same security concerns as SRAM FPGAs. The availability of reprogrammable flash provides FPGA manufacturers with the ability to build applications that “remember” information through power cycles, which is useful in cryptographic applications such as tamper logging and key revocation. Flash devices can also be erased upon command to eliminate the design when needed.

Antifuse FPGAs, such as the Microsemi Axcelerator, use a one-time programmable structure to form nonvolatile links between internal nodes [2], [25], [26]. An antifuse, commonly built as a programmable via between metal layers, is disconnected at manufacture. A high-voltage pulse programs the fuse, causing it to form a low-resistance connection between the internal nodes. Antifuses are nonvolatile, but one-time programmable. Once programmed, an antifuse FPGA cannot be changed or reprogrammed. Because there is no need for external configuration storage, the confidentiality and authentication of configuration data is more easily maintained. ISP is not possible. However, because the program cannot be erased from the device, additional system-level security concerns may remain.

Fig. 1. FPGA taxonomy.

Manuscript received September 14, 2013; revised May 16, 2014; accepted June 11, 2014. Date of publication July 8, 2014; date of current version July 18, 2014.

S. M. Trimberger is with Xilinx, San Jose, CA 95124 USA (e-mail: [email protected]).

J. J. Moore is with Xilinx, Albuquerque, NM 87109 USA.

Digital Object Identifier: 10.1109/JPROC.2014.2331672

0018-9219 © 2014 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1248 Proceedings of the IEEE | Vol. 102, No. 8, August 2014

B. Why SRAM?

By far, the most common FPGAs, even in security-conscious applications, are those programmed with SRAM. If SRAM programming exposes sensitive data to adversaries, why would anyone use them? The popularity of SRAM programming technology derives from the simplicity of its manufacture: SRAM FPGAs require only transistors and wires to realize the interconnect, configuration memory cells and switches of the generic device. Therefore, SRAM FPGAs take advantage of new process nodes earlier than other FPGAs [47] and may be two process generations ahead of other technologies. This process advantage results in higher performance, greater logic density, and improved power efficiency for SRAM FPGAs. SRAM programming also simplifies manufacturing test, where the SRAM FPGA is typically programmed many times to perform self-tests. In addition, SRAM FPGA applications can be easily updated in the field in much the same way software is updated.

When used with strong bitstream security features, including those described in this paper, the security of SRAM FPGAs is on par with the security of nonvolatile internal storage of the bitstream. Therefore, despite the greater perceived security of antifuse and flash FPGAs, SRAM FPGAs are deployed in many security-conscious applications.

C. The FPGA Design Lifecycle

The FPGA lifecycle includes two design flows: the base array design and the application design (Fig. 2). Security must be maintained through both [44]. The base array design is a standard integrated circuit development flow controlled by the FPGA manufacturer. The base array is designed using commercial design tools and libraries, manufactured at a foundry and tested. It is then typically sent to another facility for packaging and final test. The resulting base array is shipped to a customer or authorized distributor. The base array design is subject to the same supply chain trust and security concerns as any other integrated circuit, including questions about tampering with tools, supply-chain control, and reverse engineering. Large FPGA manufacturers maintain a close watch on their supply chain, tracking every manufactured device through to final customer delivery or destruction. As the security issues associated with the design and manufacture of the base array are no different than those of other semiconductor devices, this paper does not focus on the base array design and manufacture, but instead focuses on the security concerns that arise from the need to protect the application design.

The application design also has a design phase, typically performed with FPGA vendors’ tools, often augmented with commercial EDA tools. The application developer integrates design information or intellectual property (IP) from a number of sources into an FPGA application: original and reused hardware description language (HDL) code, libraries from the FPGA vendor and other parties and software for soft and hard microprocessors. The FPGA vendor’s tools compile the application design into a bitstream, the programming of the FPGA base array to realize the application function. As with any design process, the design itself can be carried out in a secure location. Protection of IP during the design phase is no different for FPGAs than it is for ASICs or microprocessors. Therefore, this paper does not address design-phase security.

A nonvolatile FPGA, such as a flash or antifuse FPGA, may be programmed before it is shipped. An SRAM FPGA is typically shipped with a separate nonvolatile memory containing the programming, and when power is applied, the FPGA loads its programming from the nonvolatile memory.

D. This Paper

This paper begins by focusing on those FPGA aspects that impact security, both positively and negatively. It summarizes the common threat vectors and then introduces some early FPGA security strategies. The remainder of the paper focuses on modern FPGA security as it relates to two of the primary security domains: information assurance (IA) and anti-tamper (AT). In each domain, the presentation describes the techniques that are currently deployed, introducing them broadly, then using specific threats to motivate additional detail. The various FPGA vendors have chosen solutions to these threats that are similar, yet they differ in detail. In this paper, we attempt to describe the major security solutions deployed by large FPGA vendors, outlining major distinctions while omitting minor differences. Since FPGA security is continually changing, newer FPGAs may well deploy different mechanisms.

Following the discussion of security features is a section showing applications using those features to achieve security goals. This paper concludes with a short discussion of the future of FPGA security capabilities.

Fig. 2. FPGA lifecycle flows. (Left) Generic integrated circuit flow for the base array. (Right) Application design and deployment flow.

II. FPGA SECURITY INTRODUCTION

A. Unique Aspects of FPGA Security

FPGA programming bitstreams are qualitatively much like microprocessor software. They are susceptible to all the same security concerns that surround software, including unauthorized copy, theft of IP embodied in the FPGA application program, and tampering to introduce malware [9], [46]. FPGA programming is present in the system in the field, whether programmed directly in antifuses, flash memory cells, or in an external nonvolatile memory. If an adversary can recover the programming by reading the internal memory, intercepting the programming bitstream, or reverse-engineering programmed fuses from a decapped device, then the application can be duplicated and reverse engineered. SRAM FPGAs, in particular, have been criticized over this concern [2], although flash-based FPGAs have the same susceptibility if in-system reprogrammability is required.

On the other hand, the application developer does not reveal the application design to FPGA vendors or their suppliers. Because the FPGA base array is manufactured without knowledge of the end application, there is no chance of IP theft or tampering of an application design during manufacture and test of the FPGA base array. Since all FPGA devices are manufactured identically and sold into a variety of applications, an adversary cannot discover any application-dependent information by attacking the FPGA vendor’s supply chain.

Further, since the programming is not done with metallization as is the case with ASIC devices, traditional reverse engineering, where the mask layers are recognized from a decapped device, does not work. Such reverse engineering may yield the application-independent base array, but not the application implemented on it.

B. Environment and the Cost of Security

FPGA security is complicated by the environment in which the FPGA is expected to perform. The design of FPGA security features assumes no physical barrier and no communication network: the FPGA may be in the hands of an adversary with no trusted party available. This environmental assumption distinguishes FPGA security from internet security, where servers may physically reside in a trusted environment and those servers can verify identity through name servers with which they are in communication. In FPGA security design, it is assumed that the adversary has physical access to the device and may mount any electrical, physical, side channel, or replay attack. The rationale is straightforward: if the adversary does not have such access, then the containing system could ensure the security of the FPGA by controlling all access to the FPGA. In this case, built-in FPGA security would be redundant.

Although military systems may employ physical security, the cost of “guns, gates, and guards” is impractical in commercial systems. The adversary is assumed to have an economic motive, such as theft of IP. Therefore, the security applied in the commercial domain is an economic concern where the cost of security measures is balanced against the value of the information being protected. FPGA security is designed to make the cost of breaking the security greater than the adversary’s expected economic gain. This decision is ultimately in the hands of the application developer, not the FPGA manufacturer.

As FPGAs have become larger and more capable, the value of the IP of the application designs has grown, motivating significant investment in built-in security functions. Further, the value of the data handled by the FPGA has also increased significantly, including such information as decrypted digital cinema and personal-data databases. As a result, today we find FPGAs deployed in a security-hostile environment, protecting data of great commercial value.

III. THREATS

An adversary may attack the IP of the application design itself, the data stored in the application or the system of which the FPGA is a part. Each type of data has different value. Each attack requires different security features to defend. The attacks of major concern to FPGA vendors can be divided into categories.

A. Cloning/Overbuilding

In cloning, an adversary copies the FPGA programming, then uses it in an identical device, selling it as his own. In overbuilding, an adversary such as a contract manufacturer builds additional systems, inserting the legitimate bitstream into those systems and selling them without the designer’s approval. Cloning may apply to an entire design or may apply to a subset of the design, for example, purchased cores that may be restricted by the seller. In both cases, the adversary does not require detailed knowledge of the design.

B. Reverse Engineering

An adversary may reverse engineer the bitstream to recover the circuit design that it implements. This may be done to understand and duplicate the functionality of that application, but may also be used as part of an attack on other aspects of the system. Reverse engineering may be used to tamper with the application to insert malware. Historically, reverse engineering an FPGA bitstream, like decompiling software, has been considered possible, though tedious and nontrivial. Reverse engineering of FPGA bitstreams is further complicated because FPGA vendors do not have a standardized bitstream. As a result, every new FPGA device requires a new bitstream reverse-engineering effort.

A more insidious problem is dealing with the size of the application. Although reverse engineering may divulge the netlist of the application, transforming a multimillion gate netlist into an understandable design that can be modified is problematic. The complexity of the application increases its value, making theft attractive, but the consequent size makes theft difficult.

Regardless, researchers have periodically reported the ability to reverse engineer unencrypted bitstreams. It would seem imprudent to rely on the tedium of bitstream reverse engineering to protect valuable IP.

C. Tampering

In tampering, an adversary modifies an application design. Tampering may be employed to add logic that leaks information from an application, or to disable parts of the application, potentially defeating other security measures. For the former, tampering must control the application to set values in the bitstream, so reverse engineering may also be required. However, for the latter, merely scrambling parts of the bitstream may be sufficient.
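The second kind of tampering, scrambling part of a bitstream, is easy to model in software, and so is one common way to detect it. The sketch below is illustrative only: the passage above does not prescribe a defence, so the keyed integrity tag (HMAC-SHA256 from the Python standard library) is an assumption, and the helper names `tag_bitstream`, `is_authentic`, and `scramble` are hypothetical. Real FPGA bitstream formats and on-chip checks are vendor specific and are not represented here.

```python
import hashlib
import hmac


def tag_bitstream(key: bytes, bitstream: bytes) -> bytes:
    """Compute a keyed integrity tag over the whole bitstream (assumed scheme)."""
    return hmac.new(key, bitstream, hashlib.sha256).digest()


def is_authentic(key: bytes, bitstream: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag_bitstream(key, bitstream), tag)


def scramble(bitstream: bytes, offset: int, mask: int = 0xFF) -> bytes:
    """Model blind tampering: flip the bits selected by mask in one byte."""
    tampered = bytearray(bitstream)
    tampered[offset] ^= mask
    return bytes(tampered)


key = b"k" * 32                  # placeholder key for the sketch
original = bytes(range(64))      # stand-in for configuration data
tag = tag_bitstream(key, original)
tampered = scramble(original, offset=10)
```

With these values, `is_authentic(key, original, tag)` holds while `is_authentic(key, tampered, tag)` does not, even though the adversary needed no understanding of the design to produce `tampered`.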

D. Spoofing

In spoofing, an adversary replaces the FPGA bitstream with his own. That bitstream may or may not include components derived from cloning or reverse engineering. A spoofed application may compromise the system in which it operates.

E. Denial of Service, Destruction of the FPGA, and Substitution

Since it is assumed that the FPGA is in the hands of an adversary, denial of service and malicious destruction of the FPGA device are somewhat irrelevant. Rather than mount a clever attack on the design to prevent the system from operating, an adversary could simply smash the FPGA with a hammer. Conversely, if a system requires an FPGA containing a unique key, an adversary may choose to circumvent security measures by replacing the FPGA in a system with another identically manufactured device from the FPGA vendor without the key or with his own key. In many cases, this substitution is simpler than attempting to break the FPGA device security. Since these physical attacks are so simple, FPGAs typically do not defend against these types of threats.

IV. HISTORICAL FPGA SECURITY

Early FPGAs contained very little logic, and by inference that logic had low value. Therefore, when they were introduced, FPGAs provided only rudimentary protection against threats.

FPGA manufacturers did not release the coding of their bitstreams, though they did release a considerable amount of information about the bitstream in tools and documentation [50]. They considered the task of reverse engineering the bitstream to be more expensive than the task of recreating the design by black-box observation of its operation.

A. Readback From their inception, FPGAs of all types included a

readback mechanism, whereby the program and data in the

device can be read out for test purposes. To prevent un-

authorized copy, early FPGAs followed the features of

programmable logic devices (PLDs) and included a prog-

ramming bit to disable the readback mechanism. This

method worked well for antifuse and flash-based FPGAs,

where the program could be loaded at a secure location, but SRAM FPGAs still needed to load the bitstream in the

field, while potentially in the hands of an adversary.

Preventing readback gave little protection if the bitstream

could be intercepted as it was loaded into the FPGA. For

this reason, antifuse FPGAs, which did not expose the

programming in the system, gained an early reputation for

being a more secure FPGA technology.

It is important to note that the readback function has been, and continues to be, a valuable feature for both the FPGA

manufacturer and the user. Whether the manufacturer

uses it for device test, or the user employs readback for in-

system data integrity checks, it is a feature, much like

JTAG, that is useful but needs to be adequately protected to

avoid vulnerabilities.

Readback continues to be a concern, and as late as

2012, Skorobogatov and Woods [38] discovered a keyed back-door/test mechanism that enabled the readback fea-

ture of a Microsemi antifuse FPGA that was assumed to be

protected by the FuseLock protection mechanism [27].

B. Early Bitstream Protections for SRAM FPGAs

Before bitstream encryption, two methods were used

to protect SRAM bitstreams. The first method was to load

the FPGA at a secure location and use a battery to hold

the configuration bitstream for the entire lifetime of the

fielded system [3]. Since programmable logic devices had

privacy settings to prevent readback of the program, and since the bitstream was never exposed outside the device,

this method assured that the bitstream running inside the

FPGA is both secure and unmodified. This is precisely the

same level of security achieved by antifuse and other

nonvolatile FPGAs. The drawback of this method is, of

course, the requirement that the system be powered

continually. As FPGAs grew larger and more complex, this

Trimberger and Moore: FPGA Security: Motivations, Features, and Applications

Vol. 102, No. 8, August 2014 | Proceedings of the IEEE 1251

solution became impractical due to increased standby power requirements.

The second solution was to use an external memory

with a unique identifier, and customize the FPGA program

to require that identifier, essentially tying the FPGA appli-

cation bitstream to a unique board-level identifier. An im-

provement to the simple test of the board-level identifier

uses an external keyed device that is queried with a ran-

dom number generated by the FPGA [6]. This solution defeats simple cloning because the bitstream only func-

tions correctly in the system with the correct external

identifier device. However, the bitstream for each system

is unique, which complicates the manufacturing process.

Further, the design can be copied by an adversary who

reverse engineers the bitstream, identifies the check logic,

and rebuilds the application with the check removed.
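
The challenge-response check described above can be sketched as follows. This is a minimal illustration, not the scheme of [6]: the class `ExternalKeyedDevice`, the function `fpga_check`, and the choice of HMAC-SHA-256 as the keyed response are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets


class ExternalKeyedDevice:
    """Hypothetical board-level identifier chip that holds a secret key."""

    def __init__(self, key: bytes):
        self._key = key

    def respond(self, challenge: bytes) -> bytes:
        # Keyed response to the random challenge (HMAC is an assumed primitive).
        return hmac.new(self._key, challenge, hashlib.sha256).digest()


def fpga_check(device: ExternalKeyedDevice, expected_key: bytes) -> bool:
    """Check logic embedded in the FPGA application (illustrative only)."""
    challenge = secrets.token_bytes(16)  # fresh random number per query
    expected = hmac.new(expected_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(device.respond(challenge), expected)


board_key = secrets.token_bytes(32)
genuine_ok = fpga_check(ExternalKeyedDevice(board_key), board_key)
# A cloned board carries a different (or no) keyed device, so the check fails.
clone_ok = fpga_check(ExternalKeyedDevice(secrets.token_bytes(32)), board_key)
```

Because the expected key is embedded in the bitstream, an adversary who reverse engineers the bitstream can still locate and remove `fpga_check`, which is the weakness noted in the text.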

This solution increases the difficulty and cost of copying the design but still relies on the difficulty of reverse

engineering the bitstream as the basis of security. This

solution was considered strong enough for many commer-

cial applications, and there is no evidence that anyone

mounted a successful attack on a device protected with it.

However, reliance on the tedium and complexity of bit-

stream reverse engineering seemed risky [49].

C. Modern FPGA Security

As FPGAs grew in capacity, the applications grew in

value, driving the need for stronger security. Over the

years, FPGA vendors have implemented circuitry, soft-

ware, IP cores, and usage models to address security

threats. Since the FPGA application design is embodied in

a design file, aspects of information security, notably encryption

and authentication, were applied to FPGA bitstreams. But that was not enough. Given that FPGAs were

deployed into a hostile environment, measures were taken

also to improve protocols and implementations to secure

designs in the field. These include not only cryptography

on the configuration files but also development of fault-

tolerant design methodologies for the base array and for

applications. Today, FPGA security is strong enough that

FPGAs are deployed in security-sensitive applications in commercial and government systems [24].

V. INFORMATION ASSURANCE

The basic tenets of information assurance (IA) are:

confidentiality, integrity, availability, authentication, and

nonrepudiation. As mentioned earlier, since access to the

FPGA is assumed, availability is not a requirement addressed by FPGA security features. Nonrepudiation will be

addressed in the context of authentication. Therefore, we

focus on confidentiality, integrity, and authentication.

A. Confidentiality

Large FPGA designs can contain IP of significant value,

and bitstream encryption prevents a competitor from

simply copying that IP. Encryption can also provide trust assurance by limiting access to the FPGA only to designs

constructed with the proper key.

1) Overview of Bitstream Encryption: Xilinx (San Jose, CA, USA) introduced bitstream encryption in 2001 in Virtex-II

devices [40], [41] to address the problem of unauthorized

copy of the bitstream as it is loaded into the FPGA from

external memory. Since that time, other FPGA vendors have added encrypted-bitstream capability.

Preventing unauthorized copy does not strictly

require encryption, since the task from a cryptographic

point of view is to determine if the bitstream is author-

ized to operate in the FPGA. This fundamentally requires

authentication, not confidentiality: a device could verify

a message authentication code on the bitstream. However,

the adversary’s workaround is simple: reverse engineer the bitstream, recompile, and load it into a new

FPGA with the authentication removed. Therefore, re-

verse engineering must also be prevented, so confi-

dentiality of the bitstream becomes a requirement for

preventing cloning.

Virtex-II FPGAs used triple-Data Encryption Standard

(DES) encryption and subsequent Xilinx FPGAs use 256-b

Advanced Encryption Standard (AES). Recent SRAM devices from Altera Corporation (San Jose, CA, USA) [4]

and Flash devices from Microsemi Corporation (Aliso

Viejo, CA, USA) [28] also use 256 b AES. Lattice

Semiconductor (Hillsboro, OR, USA) devices use 128 b

AES [19]. Although features have changed over the years,

and details vary among vendors, the basics of FPGA

bitstream encryption for all SRAM and Flash FPGAs are

similar. The major components and use flow are described here with respect to the Xilinx, Inc. (San Jose, CA, USA)

7-series FPGA.

An application developer prepares a secured FPGA

application with the same tools and processes used for any

other application. At the end of the design process, when

the bitstream is generated, Xilinx proprietary software en-

crypts the bitstream. The Xilinx software can supply a ran-

domly generated key and initialization vector or the application developer may supply those values. The Xilinx

software produces the encrypted bitstream and a key-

insertion file.

2) Key Loading: At a secure facility, the application developer uses the key-insertion file to load the decryption

key into the FPGA through the JTAG scan chain, as shown

in Fig. 3. On-chip, the key is stored in either dedicated nonvolatile or volatile memory. FPGAs supply an inde-

pendent battery-backed array for volatile storage or one-

time-programmable eFuses for nonvolatile storage or both.

Typically, the key is loaded into the FPGA in plaintext

form, which is why this must be done at a trusted facility.

Alternative strategies for key loading and key storage are

discussed in Section VI-A2.

3) Bitstream Loading: Later, in the field when the FPGA board boots, the FPGA loads its bitstream from an external memory. The FPGA begins loading an unencrypted bit-

stream. If the bitstream includes an encrypted-bitstream

indicator, the FPGA starts the decryptor and decrypts the

remainder of the bitstream as it loads. If the encrypted-

bitstream indicator is not present, the FPGA bypasses the

decryptor. This feature allows an FPGA in the field to be

booted with either an encrypted bitstream or an unencrypted

bitstream for test purposes without compromising bitstream confidentiality. In addition,

most FPGAs now offer the ability to force the device to

always configure with an encrypted bitstream.
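
The loading decision described above can be summarized in a short sketch. The names `Path` and `select_path` are hypothetical, and treating an unencrypted bitstream as rejected when encryption is forced is an assumed reading of "force the device to always configure with an encrypted bitstream".

```python
from enum import Enum


class Path(Enum):
    DECRYPT = "decrypt"  # start the decryptor for the rest of the bitstream
    BYPASS = "bypass"    # load the bitstream as plaintext
    REJECT = "reject"    # assumed behavior when encryption is forced


def select_path(has_encrypted_indicator: bool, force_encrypted: bool) -> Path:
    """Choose how the configuration logic treats an incoming bitstream."""
    if has_encrypted_indicator:
        return Path.DECRYPT
    if force_encrypted:
        return Path.REJECT
    return Path.BYPASS


field_boot = select_path(has_encrypted_indicator=True, force_encrypted=False)
test_boot = select_path(has_encrypted_indicator=False, force_encrypted=False)
locked_boot = select_path(has_encrypted_indicator=False, force_encrypted=True)
```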

B. Data Integrity

Bitstream data integrity, the ability to ensure a design

has not been accidentally modified, was a feature of very

early FPGAs. In those early devices, an improperly prog-

rammed FPGA might enable two large internal drivers in

contention, generating excessive heat and current, dam-

aging the chip. To prevent this, data integrity checks were

added to FPGA bitstreams to detect corruption of the bitstream during loading. Cyclic redundancy check (CRC), a

common data integrity check in data transmission proto-

cols, was deployed in many FPGAs. While CRC is effective

in detecting accidental data corruption, it is ineffective

against intentional data modification.

1) Tampering With Encrypted Bitstreams: Xilinx FPGAs use 256 b AES encryption [11] in cipher block chaining (CBC) mode of operation [33] to produce a stream cipher.

In CBC encryption, each block of data is first xored with

the ciphertext of the previous encryption before being en-

crypted. In decryption, the decrypted plaintext of each

block is xored with the ciphertext of the previous block

(Fig. 4). CBC causes blocks with identical plaintext (for

example, all zero) to encrypt to different ciphertext,

thereby eliminating a dictionary attack on the data. Altera

devices use AES in counter mode (CTR) [30]. In CTR mode, an encryptor encrypts the output of a counter to

generate a pseudorandom stream of bits. That pseudo-

random stream is xored with the plaintext to generate

ciphertext. On decryption, an encryptor generates the

same pseudorandom stream to recover the plaintext.

CBC and CTR are non-error-extension modes of ope-

ration, meaning that corruption of the encrypted data

causes only a localized corruption of the corresponding plaintext. Therefore, both CBC and CTR permit a “bit-flipping”

attack on the plaintext. The attack is shown in

Fig. 4 with respect to CBC. If an adversary inverts a bit in

the first encrypted block, as shown by the shaded area, the

first block will decrypt to unintelligible nonsense. How-

ever, the corresponding plaintext bit in the next decrypted

block is inverted. Bit-flipping CTR mode is more straight-

forward, since a bit flip anywhere in the ciphertext inverts the corresponding bit in the decrypted plaintext without

disrupting any other data.
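
The bit-flipping behavior of both modes can be reproduced in a few lines. The sketch below is illustrative only: it does not use AES. A four-round toy Feistel cipher built from SHA-256 stands in for the block cipher in CBC, and a SHA-256 counter keystream stands in for AES-CTR; all function names and the sample plaintext are assumptions made for the example.

```python
import hashlib

BLOCK = 16  # bytes per cipher block, the same block size as AES


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of a toy Feistel cipher (NOT AES, illustration only).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, xor(left, _round(key, i, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, bytearray()
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return bytes(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, bytearray()
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor(toy_decrypt_block(key, block), prev)
        prev = block
    return bytes(out)


def ctr_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Counter-mode stand-in: hash of (key, nonce, counter) as the keystream.
    stream = b"".join(
        hashlib.sha256(key + nonce + n.to_bytes(4, "big")).digest()
        for n in range(len(data) // 32 + 1)
    )
    return xor(data, stream)


key, iv = b"k" * 32, b"i" * BLOCK
plain = b"HEADER..READBK=0" + b"CLOCKDIV=4;ICAP0"  # two 16-byte blocks
mask = bytes(15) + b"\x01"  # flips the low bit of the last byte of a block

# CBC: flipping a bit in ciphertext block 0 scrambles plaintext block 0
# and inverts the same bit position in plaintext block 1.
cbc_ct = cbc_encrypt(key, iv, plain)
cbc_result = cbc_decrypt(key, iv, xor(cbc_ct[:BLOCK], mask) + cbc_ct[BLOCK:])

# CTR: flipping a ciphertext bit inverts only the matching plaintext bit.
ctr_ct = ctr_crypt(key, iv, plain)
ctr_result = ctr_crypt(key, iv, ctr_ct[:BLOCK] + xor(ctr_ct[BLOCK:], mask))
```

In both cases the adversary turns the trailing "0" of the second block into "1" without knowing the key.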

Using this bit-flipping technique, an adversary can

selectively invert any number of bits in the decrypted bit-

stream. If the location and state of the target bit are

known, an adversary can set it. For example, if the logic to

enable bitstream readback is disabled with a “0” at a

specific location, an adversary could reenable bitstream readback without knowing the contents of the bitstream

by inverting that one bit. For this reason, disabling of

readback of an encrypted FPGA bitstream is not controlled

by bits in the encrypted bitstream itself, but is instead

controlled by the configuration logic of the FPGA. When

the FPGA loads an encrypted bitstream, readback is dis-

abled regardless of the bitstream contents. However, other

attacks may attempt to modify the FPGA in a simple way: enable the internal configuration access port (ICAP), ena-

ble input/output (I/O) blocks, or change clock speed in an

attempt to gain access to internal data.

Fig. 3. Encryption architecture for Xilinx 7-series FPGAs.

Attacks on the bitstream are also possible without

knowing the specific bit to attack. In Fig. 4, the first block

of data is scrambled. An adversary does not know the

plaintext that results from modifying the ciphertext. If the

number of bits to control is small enough, an attacker with patience may attempt a brute-force attack on part of the

bitstream. Scrambling the bits may program the FPGA to

perform a function it was not supposed to, such as leaking

sensitive information.

Checksums and CRCs on the FPGA bitstream detect

errors in transmission, corrupted bitstreams, and uninten-

tionally flipped bits. However, it is computationally

straightforward to compute a revised CRC after tampering with the bitstream or to determine a set of bit flips that

produce the same CRC value. Further, a CRC is typically

only 16 or 32 b, so brute-force attacks on the CRC are

tractable. Finally, in some FPGA architectures, CRCs can

be disabled altogether. Simple data integrity checks are not

sufficient to ensure that a bitstream has not been intentionally

tampered with.
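
The three weaknesses of CRC listed above can be demonstrated with the CRC routines in the Python standard library. This is a sketch under stated assumptions: the byte strings are invented sample data, and `zlib.crc32` and `binascii.crc_hqx` stand in for whatever CRC polynomial a given FPGA family actually uses.

```python
import binascii
import zlib


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


original = b"READBACK=0;ICAP=0;"
tampered = b"READBACK=1;ICAP=1;"  # same length, two characters changed

# 1) No secret is involved, so an adversary simply recomputes the CRC.
recomputed_crc = zlib.crc32(tampered)

# 2) Each CRC bit is a known XOR of message bits: for equal-length messages
#    crc(a ^ d) == crc(a) ^ crc(d) ^ crc(zeros), so the effect of a set of
#    bit flips on the CRC can be predicted without reading the whole message.
delta = xor(original, tampered)
zeros = bytes(len(original))
predicted_crc = zlib.crc32(original) ^ zlib.crc32(delta) ^ zlib.crc32(zeros)

# 3) A 16-bit CRC falls to brute force: search for a 2-byte patch appended
#    to the tampered data that reproduces the original 16-bit CRC value.
target16 = binascii.crc_hqx(original, 0xFFFF)
patch = next(
    p.to_bytes(2, "big")
    for p in range(1 << 16)
    if binascii.crc_hqx(tampered + p.to_bytes(2, "big"), 0xFFFF) == target16
)
forged = tampered + patch
```

The search in step 3 needs at most 65,536 trials, which is why a keyed MAC rather than a CRC is required against intentional modification.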

C. Authentication

Communication of the bitstream to the FPGA is a one-way transfer. Therefore, two-way entity authentication

cannot be performed. Instead, FPGAs rely on one-way

message authentication, which assures the recipient of a

message that the message is exactly the message the sender intended [8]. Strong authentication requires a message

authentication code (MAC), a cryptographic hash function

computed over the entire message. The hash function must

be impossible to compute without knowing the plaintext of

the message. The difficulty of recomputation of the MAC

eliminates all forms of CRC as the hash function, since each

bit of the CRC is a known xor of a set of bits of the message.

Because authentication verifies that the application has

not been accidentally or intentionally altered, it assures

trust in the running application. That trust enables an ap-

plication developer to guarantee protection of crypto-

graphic services and the handling of sensitive data. These sensitive data may be customer data of high value, such as

personal data in a database or copyrighted video. The

cryptographic services may include key management

functions, encryption/decryption algorithms, or keys for

further partial reconfiguration of the FPGA. Data authen-

tication provides a strong root of trust, allowing an initial

FPGA configuration to act as a trusted boot loader for

trusted subsequent configuration of the FPGA. Xilinx integrated strong data authentication in Virtex-6

devices and 7-series to address the concerns of targeted

tampering with encrypted bitstreams and the inherent

cryptographic weaknesses of CRC. Microsemi also has a

dedicated data integrity check for all of the nonvolatile

configuration memory segments of some Flash devices

[28]. Authentication is described here as it is implemented

in Xilinx devices.

1) Data Authentication in Xilinx Virtex Devices: Virtex-6 and subsequent Xilinx FPGAs authenticate using the

secure hash algorithm (SHA-256) to compute a 256-b

keyed hashed MAC (HMAC) [12], [13], [42]. SHA-256 is

a one-way hashing algorithm with a compact hardware

implementation. The keyed HMAC requires a secret au-

thentication key included in the hash. The MAC result cannot be computed without knowing the key, thereby

authenticating the identity of the sender as well as veri-

fying that the message has not been altered. The 256-b

hash size ensures that any tampering with the bitstream

will be detected with a high probability. HMAC with

Fig. 4. Bit-flipping attack on CBC mode.

SHA-256 makes tampering with the bitstream as computationally difficult as guessing the encryption key, which is

also 256 b.

2) Integration of Authentication With Bitstream Encryption: Virtex devices use generic composition of the SHA-256 keyed HMAC authentication with AES-256 encryption

[34], [37]. Generic composition allowed the two parts to be

separated, which permitted them to be developed independently and separately pipelined.

Virtex-6 and 7-series authentication and encryption are

composed using authentication then encryption (AtE). The

HMAC is computed on the plaintext, unencrypted bit-

stream. The configuration data and the MAC result value

are then encrypted. On the FPGA, the data are first de-

crypted and the MAC result is recomputed on the de-

crypted data and compared with the transmitted value in the bitstream. If the two MAC values disagree, the FPGA

configuration fails and the FPGA does not become active.

The authentication check catches errors in transmission

and attempts to configure the FPGA with the incorrect key

value as well as intentional tampering.

3) The Authentication Key: HMAC requires a secret authentication key in addition to the decryption key [12]. When generating an authenticated encrypted bitstream,

both keys are specified to the bitstream generation soft-

ware. To save nonvolatile storage space, only the decryp-

tion key is stored in the FPGA array. Because of the AtE

composition, the encrypted authentication key can be

transmitted with the bitstream. The bitstream encryption

provides the privacy to keep the authentication key secret.
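
The authenticate-then-encrypt (AtE) composition, including carrying the authentication key inside the encrypted payload, can be sketched as follows. This is not the Xilinx bitstream format: the payload layout (authentication key, then configuration data, then MAC), the function names, and the SHA-256 keystream used in place of AES-256 are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

KEY_LEN = TAG_LEN = 32  # 256-bit authentication key and 256-bit MAC


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Stand-in for AES-256 (NOT real AES): XOR with a hashed counter stream.
    stream = b"".join(
        hashlib.sha256(key + iv + n.to_bytes(4, "big")).digest()
        for n in range(len(data) // 32 + 1)
    )
    return bytes(a ^ b for a, b in zip(data, stream))


def build_bitstream(enc_key: bytes, auth_key: bytes, iv: bytes,
                    config: bytes) -> bytes:
    """Tool side: MAC the plaintext first, then encrypt key, data, and MAC."""
    tag = hmac.new(auth_key, config, hashlib.sha256).digest()
    return keystream_xor(enc_key, iv, auth_key + config + tag)


def load_bitstream(enc_key: bytes, iv: bytes, blob: bytes):
    """Device side: decrypt, recompute the MAC, and compare."""
    payload = keystream_xor(enc_key, iv, blob)
    auth_key = payload[:KEY_LEN]
    config, tag = payload[KEY_LEN:-TAG_LEN], payload[-TAG_LEN:]
    expected = hmac.new(auth_key, config, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # configuration fails; the device does not become active
    return config


enc_key, auth_key = secrets.token_bytes(32), secrets.token_bytes(32)
iv = secrets.token_bytes(16)
config = b"configuration frames " * 4
blob = build_bitstream(enc_key, auth_key, iv, config)

flipped = bytearray(blob)
flipped[40] ^= 0x01  # bit-flipping attack on the encrypted configuration data
```

Only `enc_key` has to be stored on the device; the authentication key travels with the bitstream and is kept secret by the encryption, as described in the text.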

4) Authentication Using Public Key Cryptography in FPGAs/SoCs: The recent introduction of programmable systems on chip (SoCs) from FPGA manufacturers, including the

Xilinx Zynq and Microsemi SmartFusion2 devices, has

brought public key cryptography to the programmable

logic market. Both of these devices use asymmetric crypto-

graphy to provide authentication during the secure boot

process. The public key is stored on-chip in nonvolatile memory and its integrity is checked before use. Public-key

authentication of configuration files such as a first stage

boot loader (FSBL) detects random-data attacks such as

those commonly used for side-channel attacks. It can also

serve to provide nonrepudiation of protected applications.

D. Bitstream Structure

Fig. 5 compares bitstream structures of representative

Xilinx FPGA families, each with different security capa-

bilities [42]. Fig. 5(a) shows the bitstream format of an

unencrypted bitstream for Virtex devices. The unen-

crypted bitstream structure starts with a synchronization

word (SYNC) followed by a sequence of instructions.

Header commands set registers and control a variety of

functions, including declaring the device type and setting

up the startup sequence. The available commands and

registers are described in the Configuration User Guides

for each device family [50], [51]. The Write Frame Data Register Input (FDRI) command writes data into the configuration

memory. An unencrypted bitstream can contain any

number of Write FDRI commands, each writing a different,

possibly discontinuous, portion of the FPGA configuration

memory. Footer commands allow setting of additional registers. The CRC command

verifies data integrity and STARTUP begins the FPGA startup

sequence. DESYNC prepares the configuration logic to accept postconfiguration reconfiguration commands.

Virtex-II through Virtex-5 FPGAs allowed encryption of

the FPGA configuration data, but not authentication. As a

representative of those encrypted-only bitstreams, Fig. 5(b)

shows a Virtex-5 encrypted bitstream structure. The CTL

instruction informs the FPGA that this is an encrypted bit-

stream. If the CTL command is missing, the FPGA assumes

the bitstream is unencrypted. CBC IV is the initialization vector for the AES CBC register. The CBC IV does not need

to be secret, and it is evident in the bitstream structure that

it is set with an unencrypted header command. The Write

FDRI command passes encrypted configuration data

through the decryptor. The Write FDRI command includes

a length field, also transmitted unencrypted, so the de-

cryptor decrypts the proper amount of data. Only the

configuration data are encrypted, although the CRC is computed on all data that precede it in the bitstream.

Fig. 5(c) shows an authenticated encrypted bitstream

from Virtex-6 and 7-series devices. Authentication and

encryption are always used together. There is no way to

specify a bitstream that is only encrypted or only

Fig. 5. Xilinx bitstream structure. (a) Unencrypted. (b) Virtex-5. Shaded area is encrypted. (c) Virtex-6/7-series. Shaded area is

authenticated and encrypted.

authenticated for these devices. As in earlier devices, the CTL instruction informs the FPGA that the bitstream has

security enabled. The CBC initialization vector initializes

the decryptor as before. Decrypt word count (DWC)

indicates to the FPGA the amount of secure data to follow.

DWC includes not only the configuration data but header

and footer commands as well. Header commands and

footer commands are encrypted and covered by authenti-

cation. DWC is transmitted in the clear and could be modified by an adversary, but since the length of the data is

included in the MAC computation, a modification of DWC

will invalidate the computed MAC.

The authentication key is transmitted to the FPGA at

the start of configuration and again at the end of confi-

guration, since the key is used twice in the HMAC com-

putation. ALIGN is a variable number of no-operation

instructions, inserted to ensure the authenticated encrypted data are an even multiple of 512 b, simplifying the

MAC computation. At the end of the bitstream, the re-

quired MAC is transmitted to the FPGA where it is com-

pared with the MAC computed by the FPGA.
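
Two details in this description follow from the HMAC construction itself: the key enters the computation twice (once in the inner hash and once in the outer hash), and SHA-256 processes data in 512-b blocks. The sketch below writes HMAC-SHA-256 out by hand and checks it against the Python standard library; `align_padding_bits` is a hypothetical helper that computes how much ALIGN padding a payload of a given size would need, and does not model the actual no-operation instruction encoding.

```python
import hashlib
import hmac

BLOCK_BYTES = 64  # SHA-256 block size: 512 b


def manual_hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC written out to show the two uses of the key."""
    if len(key) > BLOCK_BYTES:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_BYTES, b"\x00")
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_pad + message).digest()  # first use of key
    return hashlib.sha256(outer_pad + inner).digest()     # second use of key


def align_padding_bits(payload_bits: int, block_bits: int = 512) -> int:
    """Padding needed to reach the next multiple of the hash block size."""
    return (-payload_bits) % block_bits


auth_key = bytes(range(32))
data = b"authenticated encrypted data"
manual_tag = manual_hmac_sha256(auth_key, data)
library_tag = hmac.new(auth_key, data, hashlib.sha256).digest()
```

For example, a 1000-b payload needs 24 b of padding to reach 1024 b, while a payload that is already a multiple of 512 b needs none.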

Confidentiality, data integrity, and data authentication

of the configuration data are all required to protect FPGA

configuration data that are exposed to potential adversar-

ies. To date, only a few devices available from Xilinx and Microsemi provide all three protections on their config-

uration files.

VI. ANTI-TAMPER

Physical security of the FPGA is just as critical as the ap-

plication of confidentiality, integrity, and authentication to

the device configuration. While there are focus areas of anti-tamper (AT) that overlap with information assurance (IA), there are also aspects of AT that are

unique. FPGA manufacturers are faced with a number of

challenges while focusing on improving the physical secu-

rity of the device. As commercial products, some of the

challenges include, but are not limited to:

• FPGAs are readily available for adversaries to experiment on;

• compliance with U.S. and worldwide export and import restrictions: manufacturers must be able to sell their product worldwide, and do so while meeting all import/export laws;

• FPGAs are cost sensitive, requiring a careful balance between protecting customers’ IP and enabling FPGA use in all types of systems.

There has been significant investment by FPGA manu-

facturers to enhance the physical security of their devices, driven primarily by the continual growth in performance,

density, and capabilities. This puts FPGAs at the heart of

most electronic systems today, where customers’ IP must

be protected.

This section describes some of the primary security

features and protocols of the Xilinx 7-series FPGA. These

are explored by looking at the configuration lifecycle of the

device. AT protections are employed preconfiguration, during configuration, and postconfiguration.

A. Preconfiguration

1) Defense Against Trojan Insertion: FPGAs can be configured with either an encrypted or an unencrypted bitstream. This

is useful for application developers who may not want to

use encryption during integration and test, but then enable encryption when the system is fielded. However, this ability to configure

the device either encrypted or unencrypted subjects it

to a class of Trojan insertion attacks.

If an FPGA contains a decrypted bitstream, an adver-

sary may attempt to load a partial configuration into a

subset of the device that spies on the resident application.

It could connect to internal signals or memories. By con-

necting to internal components, the Trojan could be used to deduce the secured application in the FPGA. Con-

versely, an adversary may operate the same attack by pre-

loading a Trojan design and interrupting the secure loading

of the protected application.

Consequently, Xilinx FPGAs do not permit mixing

encrypted and unencrypted bitstreams, or partial bit-

streams, in any order. A new configuration of the device

requires fully clearing the existing device, either by cycling power or executing the JTAG JPROGRAM command.

Both methods initiate internal device housekeeping,

which clears all configuration and internal memory.

Similar concerns exist today with SoCs being introduced

by the FPGA manufacturers. Xilinx, Altera, and

Microsemi are now offering processor-centric SoC devices

that typically have separate and independent regions for

the processor subsystem and the programmable logic. The independence provides users flexibility and the ability to

significantly reduce power by turning off the programmable

logic. This capability presents a vulnerability. If an adversary

can preload a Trojan, either into the processor

memory or the programmable logic before allowing the

device to boot normally, then the Trojan will have access to

the entire internal application running on the device. As

with Xilinx FPGAs, Xilinx SoCs have been designed to address this security concern. The Xilinx Zynq device boots

both the processor and the programmable fabric from the

same root of trust, either fully secured, or fully open.

The Trojan insertion vulnerability also exists postconfi-

guration. Xilinx and Altera FPGA families permit partial

reconfiguration, the ability to change the configuration of a

section of the FPGA while the rest operates normally. This

feature has proven to be very valuable for innovative applications. However, it also is susceptible to a Trojan

insertion attack after the initial configuration. The appli-

cation design must authenticate postconfiguration bit-

streams to exclude Trojans.

2) Protecting Keys: The secrecy of a cryptographic key is fundamental to security; protecting the key is the first

priority of the FPGA manufacturer. As stated earlier, this can be a challenge for commercial vendors who are de-

veloping a device that is used in nearly all types of ap-

plications, and not specifically designed for a specific

domain, application, or cost point. Also important to note

is the fact that while we address the protection of keys here

in the preconfiguration section of this paper, protection of

keys is essential before, during, and after configuration.

a) Key storage (technology): Most FPGA manufacturers provide both volatile and nonvolatile key storage. In

the case of SRAM FPGAs, volatile key storage is imple-

mented as battery-backed RAM (BBRAM) and nonvolatile

key storage is implemented as eFuses. Each has advantages

and disadvantages.

BBRAM key storage requires no process changes, mak-

ing it easily implemented in state-of-the-art process tech-

nology. The volatile key storage also allows for key agility and key zeroization, critical components of a strong cryp-

tographic system. In Xilinx FPGAs, when primary power is

applied, the BBRAM is powered by that power supply, which

not only reduces the drain on the battery but also permits

replacing the battery in a fielded system. Altera specifies that a

battery must be attached before the key is loaded, implying

that the source for the BBRAM memory is only the external

battery [4]. Xilinx also provides an internal interface that can be used by an application to command a zeroization of

the key space. Zeroization is intended for use when the

FPGA detects tampering with an operating application.

BBRAM is not without its disadvantages. A momentary

loss of contact or low battery voltage could cause the key to

be lost. While modern coin-cell batteries hold enough

energy to hold encryption keys for the design lifetime of

20 years, and new betavoltaic batteries with great reliability are being introduced to the market, many battery

vendors do not specify thermal wearout or other failure

modes for the length of time required by most FPGA

users.

BBRAM is inherently more physically secure than

nonvolatile key storage technology. To steal the key, an

adversary would need to decap the FPGA and mill away

many levels of metal, then scan the bits with a scanning electron microscope (SEM). This attack must be per-

formed while keeping clean power to the key memory.

This is the type of attack required to extract the entire

configuration directly from the FPGA SRAM cells as well,

so no bitstream encryption method is qualitatively

stronger. This attack is considered to be beyond the capa-

bilities of all but the most sophisticated of adversaries.

An eFuse provides a simple, one-time-programmable nonvolatile memory. Because they are nonvolatile, eFuses

eliminate the maintenance issues associated with a battery.

A common eFuse structure is a narrow wire that is prog-

rammed by electromigration from high programming cur-

rent. eFuses are simple to build and program, requiring no

additional process complexity or high voltage. However,

eFuses and their programming circuitry are rather large, so

eFuses are practical only for small amounts of memory, such as a decryption key. The physical change caused by

eFuse programming is visible under a microscope, so

eFuses are comparatively easy to reverse engineer from a

decapped part. Of course, they cannot be reprogrammed or

erased. However, to zeroize an eFuse key, one could burn

all eFuse cells in the key.
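
The one-way nature of eFuse storage, and the reason zeroization means burning every cell, can be captured in a small model. The class `EFuseKey` is hypothetical, and the convention that an intact cell reads 0 and a blown cell reads 1 is an assumption made for the example.

```python
class EFuseKey:
    """Toy model of a one-time-programmable eFuse key register."""

    def __init__(self, bits: int = 256):
        self._cells = [0] * bits  # 0 = intact fuse, 1 = blown fuse (assumed)

    def program(self, key_bits):
        # Programming can only blow fuses; a blown fuse never reverts to 0.
        for i, bit in enumerate(key_bits):
            if bit:
                self._cells[i] = 1

    def zeroize(self):
        # The only available "erase": burn every cell so no key bits remain.
        self._cells = [1] * len(self._cells)

    def read(self):
        return list(self._cells)


fuses = EFuseKey(bits=8)
fuses.program([1, 0, 1, 0, 0, 0, 0, 1])
after_program = fuses.read()

fuses.program([0] * 8)  # an attempt to clear the key has no effect
after_clear_attempt = fuses.read()

fuses.zeroize()
after_zeroize = fuses.read()
```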

b) Key loading: The JTAG test port is a common interface for loading keys into programmable logic devices [4], [22], [35]. Loading a key into a Xilinx device begins by

first executing a JTAG command to enter key access mode,

which clears the existing key and all configuration data and

memory in the FPGA. A second JTAG command writes the

new key and reads it back to verify it. Of course, on power-up,

FPGA key access is disabled.

Details of key loading vary considerably among manufacturers.

Loading of the key may be done in plaintext (“red key load”) or ciphertext (“black key load”) or otherwise

obscured. In Xilinx devices, the key is transmitted to

the FPGA in plaintext, so it must be loaded in a secure

location. The key access control sequence ensures that the

key is cleared before any command is executed that could

read it back. Other vendors have chosen alternative solu-

tions. Altera Stratix devices include a key obfuscation

mechanism so the key may be presented to the FPGA in an encrypted form. Moradi [32] reported that two 128 b keys

are used by Altera Stratix-II devices. The bitstream key is

transmitted and stored in encrypted form, encrypted by a

second key, which is presented without any obfuscation.

Although the key used to decrypt user data is not trans-

mitted or stored in the FPGA, and hence cannot be ex-

tracted, it can be computed by a straightforward algorithm

from the readable key that accompanies it. In Altera Stratix-V devices, the user key is sent through a one-way

function before being stored on the device [4]. In both

of these scenarios, the loading of the key is obscured, and

while not cryptographically sound, may provide a level of

security acceptable at a given price point.

Microsemi is the first FPGA manufacturer to offer a

true ‘‘black-key load’’: the key is encrypted by a secret key

before loading. In selected SmartFusion2 devices, the device and user exchange public keys and perform an elliptic

curve Diffie–Hellman (ECDH) exchange to generate a key

that can be used for the authenticated/encrypted loading of

a user key [28]. The generated key is used as a key encryp-

tion key (KEK) to encrypt the user key on the transmit side,

and to decrypt the user key within the SmartFusion2 device.
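The key encryption key idea can be sketched roughly as follows. This is a toy under stated assumptions, not the SmartFusion2 protocol: classic finite-field Diffie-Hellman over a small Mersenne prime stands in for ECDH, and a SHA-256 keystream with an HMAC tag stands in for whatever authenticated cipher the device actually uses. The names P, G, keypair, derive_kek, wrap, and unwrap are all hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy group parameters (assumption, NOT secure): the Mersenne prime 2**127 - 1.
P = 2**127 - 1
G = 3


def keypair():
    """Return a (private, public) pair for the toy Diffie-Hellman exchange."""
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)


def derive_kek(own_priv, peer_pub):
    """Hash the shared secret into a 32-byte key encryption key (KEK)."""
    shared = pow(peer_pub, own_priv, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def _keystream(kek, nonce, length):
    """Expand KEK and nonce into `length` pseudorandom bytes using SHA-256."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(kek + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def wrap(kek, user_key):
    """Transmit side: encrypt the user key under the KEK and add a tag."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(kek, nonce, len(user_key))
    ct = bytes(a ^ b for a, b in zip(user_key, stream))
    # Reusing one key for encryption and MAC is a simplification for the sketch.
    tag = hmac.new(kek, nonce + ct, hashlib.sha256).digest()
    return nonce, ct, tag


def unwrap(kek, nonce, ct, tag):
    """Device side: check the tag first, then decrypt the user key."""
    expected = hmac.new(kek, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrapped key failed authentication")
    stream = _keystream(kek, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, stream))
```

In this sketch the user key never crosses the interface in plaintext; only the public values and the wrapped key do, which is the property that distinguishes a black key load from a red one.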

c) Key storage (red or black?): Much like key loading, the device key can be stored in plaintext, ciphertext, or obfuscated form. Xilinx stores 7-series keys in plaintext

form. An adversary who decaps the part and can identify

the key storage cells can attempt to extract the actual key

bits. An obfuscated key defeats this attack until the obfus-

cation method is discovered. Altera implemented a key

obfuscation algorithm in Stratix devices, so that probing

the device could not divulge the key directly. When the

Trimberger and Moore: FPGA Security: Motivations, Features, and Applications

Vol. 102, No. 8, August 2014 | Proceedings of the IEEE 1257

obfuscation algorithm was revealed, obfuscation was no longer a barrier to invasive key extraction [32].

Selected SmartFusion2 devices from Microsemi make

use of Intrinsic-ID’s (Eindhoven, The Netherlands)

Quiddikey technology [17]. This technology does not store

an encryption key on-chip. Instead, it generates the key

when needed through the use of an activation code gene-

rated during an enrollment phase, and the output of

Intrinsic-ID’s SRAM-based physically uncloneable function (PUF) [17], [29].

d) Eliminating keys: When an unauthorized event occurs, the application may need to eliminate sensitive

keys within the device. For systems that employ BBRAM

key storage, there are multiple options. First, passive era-

sure can be accomplished by simply electrically discon-

necting the battery from the supply. Second, for Xilinx

devices, an external device could send the appropriate JTAG command to enter key access mode. As mentioned

earlier, this actively clears the device key and the configu-

ration of the device. Finally, most FPGA vendors have the

ability to erase the key from within the device under con-

trol of the application [4], [29], [38]. Xilinx and Microsemi

offer the ability to fully zeroize the device key, actively

erase the key, and then verify that it indeed has been

erased, either through readback or dedicated hardware.
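The erase-then-verify sequence can be illustrated with a minimal sketch. The KeyStore class below is a hypothetical software model, not a vendor API; real devices perform the erase and the verification in dedicated hardware, and Python gives no guarantee about copies of the key elsewhere in memory.

```python
class KeyStore:
    """Hypothetical model of an on-chip key register."""

    def __init__(self, key: bytes):
        self._cells = bytearray(key)

    def is_erased(self) -> bool:
        """Read back the cells and report whether every byte is zero."""
        return not any(self._cells)

    def zeroize(self) -> bool:
        """Actively overwrite every key byte, then verify by readback."""
        for i in range(len(self._cells)):
            self._cells[i] = 0
        return self.is_erased()
```

The return value of zeroize() plays the role of the verification step: the application learns whether the erase actually took effect rather than assuming it did.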

3) Antispoofing: When they are manufactured, FPGAs can accept either an unencrypted bitstream or an en-

crypted bitstream. All programmable logic vendors that

provide encrypted bitstreams have the ability to modify the

FPGA to require an encrypted configuration. This modi-

fication involves programming a nonvolatile eFuse register

that disables unencrypted configuration. An adversary cannot substitute an alternative bitstream in the device or

change the key. Instead, that adversary must replace the

FPGA with another equivalent device. This solution pro-

vides no value with a BBRAM volatile key, as the adversary

only needs to remove the battery to clear the key, then load

a new key into the device to gain access. Of course, an

adversary can circumvent the antispoofing by replacing the

protected FPGA with a new, unprogrammed one. Nonetheless, antispoofing the device is a cost-effective component

of overall system security.
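A minimal sketch of the antispoofing behavior, assuming a hypothetical FpgaModel class (real devices implement this check in silicon): once the one-time fuse is programmed, unencrypted configuration is refused.

```python
class FpgaModel:
    """Hypothetical device model used only to illustrate the policy."""

    def __init__(self):
        self.encrypted_only_fuse = False  # nonvolatile, one-time programmable
        self.loaded = None

    def blow_encrypted_only_fuse(self):
        # An eFuse can be set but never cleared, so there is no inverse method.
        self.encrypted_only_fuse = True

    def configure(self, bitstream: bytes, encrypted: bool) -> bool:
        """Accept the bitstream unless the fuse forbids plaintext loading."""
        if self.encrypted_only_fuse and not encrypted:
            return False
        self.loaded = bitstream
        return True
```

The model leaves out key checking entirely; it only shows why an adversary's plaintext bitstream is rejected after the fuse is blown.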

4) Test Circuitry: Because it provides access to and control of internal nodes, test circuitry has long been a

primary point of security vulnerability in integrated cir-

cuits, and must be disabled for a secure application to

indeed be secure. While protection of test circuitry is discussed in this preconfiguration section, it must be considered

during configuration and postconfiguration as well.

Test interfaces can be disabled in many ways. Pro-

prietary test interfaces are typically handled differently

than industry-standard interfaces such as JTAG. Xilinx

disables readback by setting internal security bits when an

encrypted bitstream is loaded. In the Zynq SoC, eFuses

may be used to disable test interfaces permanently [36]. Test disable is also provided in Altera devices where a

tamper-protection bit disables the test modes of the FPGA

[4]. When permanently disabling test circuitry, users must

be aware of the consequences for additional failure anal-

ysis: if the test access port has been disabled, there is very

little anyone can do to debug the device.

Microsemi and Xilinx provide mechanisms to perma-

nently disable the JTAG interface as well as monitor it internally for tamper conditions [29], [35]. Altera has the

ability to reduce the number of JTAG commands executed

to only those mandatory by the standard (e.g., Extest,

Intest, IDCODE, etc.). The execution of nonmandatory

JTAG instructions can be enabled by issuing the UNLOCK

JTAG instruction, which is only allowed to execute when

sent from within the device [4].

B. During Configuration

1) Side-Channel Attacks on Keys: In recent literature, Xilinx, Altera, and Microsemi FPGAs have been shown to

be vulnerable to differential power analysis (DPA) attacks

on their keys [30]–[32], [38]. Although noninvasive, these

published attacks employ a custom board with a significant

reduction in bypass capacitance in order to enhance the power signal. This brings up the question of the difficulty

of moving an FPGA from one board to another while

keeping the key intact. eFuse, antifuse, and Flash storage

should be unaffected, but battery backed RAM keys are lost

if, during the transfer, power is lost to the keys or if the

device temperature exceeds operating limits.

Security is always a moving target. Attacks continue to

improve, and since a custom board is not required, in principle, to mount a DPA attack, one would expect that

future side-channel attacks on FPGAs will target devices in

their native environment. Defenses improve as well. As

side-channel attacks became better understood, FPGA

vendors added countermeasures, though they are not al-

ways explicit about precisely what they have done. Micro-

semi has licensed CRI technology, but has not released

which aspects of that technology they have used. Other vendors are silent on the question of precise circuit details

to address DPA.

C. Postconfiguration

FPGAs rely on the application as an active participant

in protecting the device after configuration, a capability

somewhat novel to FPGAs [45]. FPGAs provide security-related

features, but leave the policy decision of handling the features to the user of the FPGA to implement in the

application.

1) Readback Disable: Traditional FPGA operation allows the unencrypted bitstream and data to be read out using

the bitstream readback command. Therefore, when an

FPGA loads an encrypted bitstream, it disables the

readback mechanism, regardless of bitstream settings. This automatic, mandatory setting prevents the simple attack of

using the FPGA to decrypt the bitstream, then reading it

out. Readback continues to be a valuable feature for both

the FPGA manufacturer and the application developer.

Proper measures must be taken so that it does not jeo-

pardize the application security.

Skorobogatov and Woods [38] used a side-channel

attack to extract a key that unlocked readback in an FPGA that was advertised to have no such capability. Although this was

sensationalized as a back door, with questions raised about who

inserted it and for whom, in all practicality it was no more

than an interface used for device test.

2) Restricted Access to Base Silicon Cryptographic Logic: On-chip cryptographic functions, such as the decryptor,

are well-tested, high-speed logic designs of a standard function. It would seem efficient to allow an operating

FPGA application to use cryptographic functions after

configuration. However, user access to the decryptor, or

other cryptographic functions, permits data flow paths that

complicate the analysis of the security of the base silicon. If

the user has access to the cryptographic functions, and the

device is programmed to permit unencrypted bitstreams,

then the adversary has access to the cryptographic functions as well. The manufacturer must perform a security

analysis to verify that no key data could leak into the

application domain.

Second, there are U.S. export and various national

import laws worldwide that add risk to the manufacturer if

cryptographic functions are used for more than just the

configuration of the device. Third, most cryptographic

functions, such as AES decryptors, are simply not very large and can be implemented in the user application without

consuming much of the FPGA logic. Finally, users have a

wide range of needs for cryptographic services. This be-

comes a cost/benefit tradeoff for the manufacturer. Xilinx

and Altera do not allow access to the cryptographic

functions on the FPGAs. Microsemi allows access to the

cryptographic functions on selected models of the

SmartFusion2 devices [28].

3) Restricted Access to Base Silicon Features: Concern over tampered bitstreams in early Virtex devices led Xilinx

to prohibit reconfiguration of encrypted bitstreams. This

restriction applied to the internal configuration access port

(ICAP) as well as the external configuration port. The

concern was that a bitstream might be tampered to enable

access to ICAP, which could then be used to read back the decrypted configuration. Virtex-II through Virtex-5 de-

vices required encrypted bitstreams to pass a CRC to begin

operating, thus ensuring the integrity of the bitstream

data. However, as described earlier, CRC does not give a

strong defense against bitstream tampering.
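The difference between an integrity check and authentication can be sketched as follows. This is a simplification that works on plaintext and ignores encryption: a CRC involves no secret, so anyone who can alter the data can also produce a matching checksum, whereas a keyed MAC cannot be recomputed without the device key. HMAC-SHA-256 is used here only as a familiar stand-in; the byte strings and the device_key value are hypothetical.

```python
import hashlib
import hmac
import zlib


def crc_ok(bitstream: bytes, crc: int) -> bool:
    """Integrity check with no secret: anyone can compute the CRC."""
    return zlib.crc32(bitstream) == crc


def mac_ok(key: bytes, bitstream: bytes, tag: bytes) -> bool:
    """Authentication: the tag depends on a key the adversary lacks."""
    expected = hmac.new(key, bitstream, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


device_key = b"k" * 32  # secret, unknown to the adversary
original = b"CONFIG: ICAP=off " + bytes(32)
tag = hmac.new(device_key, original, hashlib.sha256).digest()

# The adversary edits the data and simply recomputes the public CRC.
tampered = original.replace(b"ICAP=off", b"ICAP=on ")
forged_crc = zlib.crc32(tampered)
```

Here crc_ok(tampered, forged_crc) succeeds while mac_ok(device_key, tampered, tag) fails, which is the gap that authentication closes.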

Since the addition of authentication in Virtex-6 and

7-series, a secured bitstream must pass the authentication

check, defeating any bitstream tampering. Since an authenticated bitstream could not have been modified by an

adversary, it can be trusted. This trust applies to the appli-

cation in general, but specifically enables trusted self-

reconfiguration with the ICAP. Since the application

design is trusted, ICAP operation is permitted with au-

thenticated encrypted bitstreams. An authenticated bit-

stream may use the ICAP to launch a partial configuration

while the device continues to operate, allowing the design of a trusted reconfigurable platform [53].

ICAP is a Xilinx-specific example of a base silicon fea-

ture that, if used maliciously, could provide a vulnerability

without the appropriate protections. In all cases, the man-

ufacturer must provide safeguards, while the application

developer has final responsibility. It is, of course, possible

to construct an insecure application despite the encryption

and authentication. For example, if an application developer connected the ICAP interface directly to the external

pins, an adversary could interrogate the ICAP to read back

the unencrypted application bitstream. FPGA security

enables the construction of secure applications; it does not

guarantee them.

4) The Value of ICAP and Checking Designs in the Field: ICAP permits logic inside the FPGA to read and write its own bitstream, providing a wide range of powerful use

cases. These include:

• internal readback of the device configuration for in-system integrity checks;

• configuration clearing and zeroization;

• algorithm agility for those applications that need to change algorithms without a complete reconfiguration of the device;

• self-test;

• use of user-specific decryption and authentication algorithms with custom protections against attacks such as DPA or other side-channel attacks;

• configuration repair: random single-event upsets (SEU) [23], [52] or intentional tampering may

cause configuration bits inside the FPGA to

change. Jones [19] describes the SEU controller, an application in which the FPGA logic reads its

own bitstream internally through ICAP, checks the

stored bitstream with previously computed ECC

data, and corrects configuration errors. The SEU

controller is intended to detect and correct errors

in a high-reliability environment, but it can be used

to detect tampering with the FPGA in the field if

individual bits are flipped. More recent FPGAs include the SEU detection and scrubbing feature in

dedicated hardware [35].
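The read, check, and correct loop of configuration repair can be sketched with a deliberately simple single-error-correcting scheme. This is not the ECC used by the SEU controller; it is a textbook stand-in in which the check data for a frame is the XOR of the 1-based positions of all set bits, so a single flipped bit shows up as its own position. The function names are hypothetical.

```python
def syndrome(bits):
    """XOR of the 1-based positions of all set bits in a frame."""
    s = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            s ^= pos
    return s


def scrub(frame, stored_syndrome):
    """Correct one flipped bit in place; return its index, or None if clean.

    Only single-bit errors are corrected reliably. Some multi-bit errors are
    detected (position out of range); others would be miscorrected.
    """
    diff = syndrome(frame) ^ stored_syndrome
    if diff == 0:
        return None  # frame matches the previously computed check data
    if diff > len(frame):
        raise ValueError("uncorrectable: more than one bit changed")
    frame[diff - 1] ^= 1  # flip the errant bit back
    return diff - 1
```

A scrubber would call scrub() on each frame read back through the configuration port, using check data computed when the bitstream was known to be good; a nonzero result is also a hint that the configuration was disturbed, whether by radiation or by tampering.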

D. Invasive Attacks

Because of the environment of the fielded FPGA, the

difficulty of protecting FPGA keys and configuration data

persists regardless of the technology used to store them.

When an adversary can physically open the device and scan the contents, no storage technology is wholly secure.

However, using our model of cost-based security, some

storage methods are more expensive to break, sometimes

despite having no qualitative advantage.

The strongest way to prevent theft and tampering with

a bitstream is to keep it out of an adversary’s hands. The

Xilinx Spartan 3AN is a multichip package containing an

FPGA die and a flash memory die. Since nothing is transmitted from an external source, the trivial bitstream interception

method does not work. However, after decapping

the package, the signals between die can be probed to

pirate the bitstream.

Microsemi’s SmartFusion devices have internal nonvolatile

Flash memory storage as well. These devices are

still subject to physical, invasive attack, though that attack

is more difficult for several reasons. The storage in these devices is distributed around the device, and there is no

localized point at which one could intercept the config-

uration data, so the attack must scan the entire device.

Programmed Flash and antifuse cells are not observably

changed from unprogrammed cells, so the detection of

programming is more difficult. It may require SEM or

thermal analysis. SEM images of programmed and unprog-

rammed antifuse cells show no apparent differences [2]. Invasive physical attacks on antifuse devices and Flash

devices are qualitatively no more difficult than methods for

extracting eFuse bits. However, these attacks are consid-

ered significantly more expensive because millions of bits

of data must be extracted, rather than merely a 256-b key.

Further, the resulting extracted programming is not for-

matted for programming another FPGA, so it must be

formatted properly by the adversary in order to clone the design. The proper format is not published. This secrecy is not

a cryptographically strong protection, but recovering the format is considered

difficult and tedious.

Despite the concerns, there has not been a report of a

successful invasive attack on any FPGA regardless of the

internal storage: SRAM, BBRAM, eFuse, Antifuse, or Flash.

E. Environmental Attacks

The circuits inside FPGAs that implement the security

functions are no less susceptible to attack than those in

other semiconductor integrated circuits. Published attacks

on security functions in other devices include out-of-range

temperature and power adjustment, overclocking, and

other environmental attacks. Defense against these attacks

is very difficult because, by definition, semiconductor

foundries do not guarantee operation outside their guaranteed environmental range. FIPS 140-2, level 4, requires

environmental failure protection on cryptographic mod-

ules [10] and FPGA vendors provide limited protection

from environmental attacks.

The traditional response to environmental attacks has

been more robust circuitry, including dedicated voltage

regulation for security functions, large Hamming distances

in security-critical state machines, and redundant storage of critical state values, such as those disabling readback in

a secure system.
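To make the Hamming-distance point concrete, the short sketch below compares two hypothetical state encodings for a four-state machine. With a minimum distance of 1, a single glitched bit turns one valid state into another; with a minimum distance of 3, at least three bits must be disturbed at once. The encodings are illustrative and not taken from any vendor design.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two state codes differ."""
    return bin(a ^ b).count("1")


def min_distance(codes):
    """Smallest pairwise Hamming distance across a state encoding."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


binary_states = [0b00, 0b01, 0b10, 0b11]
hardened_states = [0b000000, 0b000111, 0b111000, 0b111111]
```

For these encodings, min_distance(binary_states) is 1 and min_distance(hardened_states) is 3, at the cost of four extra state bits.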

Xilinx provides an embedded analog-to-digital con-

verter (ADC) that can be used to monitor voltage and

temperature both outside and inside the FPGA. Users can

configure the circuitry to specific voltage and temperature

ranges based on the environment the system will operate

in. If the voltage or temperature exceeds this user-specified range, an internal alarm signal will be generated notifying

the application running on the device. User-specific

actions can then be taken, for example, clearing sensitive

cryptographic variables in registers or RAMs, zeroizing the

key or clearing the configuration of the device itself and

shutting down.
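The monitoring policy described above can be sketched as a small routine. The Limits type, the threshold values, and the choice to clear a byte buffer are assumptions made for illustration; they are not the interface of the Xilinx monitoring block.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """User-specified operating window (hypothetical field names)."""
    v_min: float
    v_max: float
    t_min: float
    t_max: float


def alarm(limits: Limits, volts: float, temp_c: float) -> bool:
    """True when either reading falls outside the configured window."""
    return not (limits.v_min <= volts <= limits.v_max
                and limits.t_min <= temp_c <= limits.t_max)


def on_sample(limits: Limits, volts: float, temp_c: float,
              secrets_buf: bytearray) -> bool:
    """Clear sensitive variables in place if the alarm fires."""
    if alarm(limits, volts, temp_c):
        for i in range(len(secrets_buf)):
            secrets_buf[i] = 0
        return True
    return False
```

The response shown is only one of the options listed in the text; an application could equally zeroize the device key or clear the configuration.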

With few exceptions, FPGA manufacturers do not pub-

lish details of their security circuitry. Microsemi FuseLock and FlashLock include internal fuses or flash cells that

prevent inappropriate access. According to Microsemi,

‘‘special security keys are hidden throughout the fabric of

the device, preventing internal probing and overwriting.

They are located such that they cannot be accessed or

bypassed without destroying the rest of the device’’ [28].

Xilinx readback disabling circuitry has ‘‘hardened triple-redundant

logic’’ and key loading FSMs have ‘‘large hamming distances between states’’ [35].

1) Device Identifier: A unique identifier is a powerful way to restrict access to an FPGA, defeating cloning and

spoofing. An application can be coded to operate only on

the one device that matches a specific identifier or on a

subset of devices with a range of values.

Modern FPGAs contain a device identification register. Xilinx provides device DNA, a 57-b serial number prog-

rammed in eFuses during manufacture and used for track-

ing devices. Device DNA is accessible from outside the

FPGA via JTAG. In addition, devices include a user-

programmable 32-b eFuse field that can be used as an

identifier as well. This user eFuse field is only available to

logic within the FPGA.
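An application-level identifier check might look like the following sketch. The function name and parameters are hypothetical; only the 57-b width comes from the text. In hardware the comparison would be done by logic that reads the identifier register, not by software.

```python
DNA_BITS = 57  # width of the device DNA field, per the text


def authorized(device_dna: int, allowed_ids=(), allowed_range=None) -> bool:
    """Allow operation on specific identifiers or on a range of values."""
    if not 0 <= device_dna < 2 ** DNA_BITS:
        raise ValueError("identifier does not fit in 57 bits")
    if device_dna in allowed_ids:
        return True
    if allowed_range is not None:
        low, high = allowed_range
        return low <= device_dna <= high
    return False
```

As the later discussion of IFF notes for a similar check, a test like this is only as strong as the protection of the bitstream that contains it.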

F. PUFs and FPGAs

Other alternatives exist for device identifier. PUFs [14],

[39] provide a device-specific unique identifier derived

from random process variations. A PUF generator pro-

duces a different signature for each manufactured device.

PUFs have been demonstrated in FPGA fabric (‘‘soft PUF’’)

as well as in dedicated logic (‘‘hard PUF’’). Microsemi’s

SmartFusion2 includes a hard PUF. Other FPGA vendors have IP providers who provide soft PUF functions in fabric.

Therefore, application developers can build PUFs for de-

vice identification today with existing FPGAs.

There are several drawbacks for the use of PUFs that

have precluded their use as decryption keys in FPGAs.

First, the PUF only resides inside the device. It must be

read out of the device to encrypt the bitstream data file.

Alternatively, Kean recommended an encryption method in which the FPGA encrypts its own data file using its

internal key and emits the encrypted data for external

storage [20]. More importantly, the PUF is unique to each

unit, so the bitstream must be encrypted uniquely for each

device. This problem may be addressed by including a key

transformation word, the exclusive-or of the computed

PUF for the device with the actual key used to decrypt the

data. Still, at system build time, the FPGA must be powered on and the transformation word derived. Perhaps

most importantly, PUFs are not stable: a few bits may

change over the lifetime of the device. This is not parti-

cularly important for a device identifier, but disastrous for

a decryption key. One method to compensate for this is the

addition of helper data to the PUF-encrypted bitstream.

Helper data are fundamentally error correcting code

information for correcting errant bits in the PUF. It is unclear how much information about the key is leaked in

helper data. Finally, long-term PUF reliability data over

process, voltage, and temperature is sketchy at advanced

process nodes, leading to concern over lost keys during the

lifetime of the fielded device.
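The key transformation word and the stability problem can both be seen in a few lines. This sketch assumes the PUF response and the design key are byte strings of equal length; the function names are hypothetical, and no helper data or error correction is modeled.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive-or of two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def enroll(puf_response: bytes, design_key: bytes) -> bytes:
    """Build time: compute the per-device key transformation word."""
    return xor_bytes(puf_response, design_key)


def recover_key(puf_response: bytes, transformation_word: bytes) -> bytes:
    """Run time: regenerate the shared design key from the local PUF."""
    return xor_bytes(puf_response, transformation_word)
```

With this arrangement one encrypted bitstream can serve every device, since only the transformation word differs per unit. It also shows why instability is disastrous: if a single PUF bit changes, recover_key() returns a key that differs in that bit, and decryption fails.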

VII. APPLICATIONS

A. IFF Flow for Nonsecured Devices

Baetoniu [6] described ‘‘identification friend or foe’’

(IFF), a way to tie an FPGA bitstream to a specific

system. IFF uses an external storage device, a secure serial

electrically erasable programmable read-only memory

(EEPROM), such as the Dallas Semiconductor/Maxim

DS2432 (Fig. 6). The secure EEPROM includes a crypto-

graphic hash function. At system build time, the application developer programs a secret key into the EEPROM and also

programs the secret key into the FPGA application.

After the FPGA boots, it uses its random number

generator to interrogate the EEPROM. The EEPROM

computes the hash of the random string with its stored key.

The FPGA does the same. If the two hashes match, the

FPGA continues to operate. If the hashes do not match, the

FPGA enacts countermeasures such as ceasing operation or disabling premium functionality. The check may be

repeated as often as desired during operation.
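The challenge-response exchange can be sketched as follows. HMAC-SHA-256 stands in for the keyed hash computed by the secure EEPROM, and the class and function names are hypothetical; the actual part defines its own hash and message format.

```python
import hashlib
import hmac
import secrets


class SecureEeprom:
    """Hypothetical model of the external device holding the secret."""

    def __init__(self, secret: bytes):
        self._secret = secret

    def respond(self, challenge: bytes) -> bytes:
        """Hash the random challenge together with the stored secret."""
        return hmac.new(self._secret, challenge, hashlib.sha256).digest()


def fpga_check(fpga_secret: bytes, eeprom: SecureEeprom) -> bool:
    """FPGA side: issue a fresh challenge and compare the two hashes."""
    challenge = secrets.token_bytes(16)
    expected = hmac.new(fpga_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, eeprom.respond(challenge))
```

Because the challenge is random on every run, a recorded response from an earlier exchange is of no use to a cloned system that lacks the EEPROM secret.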

IFF ties the FPGA bitstream to a properly programmed

secure EEPROM. Although it can be applied to an FPGA

without bitstream encryption, doing so leaves the system

vulnerable. An adversary may reverse engineer the bit-

stream and disable the check on the hash function. This

mechanism is even vulnerable with an encrypted, but not authenticated, bitstream, because an adversary may at-

tempt to disable the hash function check by a bit-flipping

attack or random perturbation of the plaintext to disable

the hash check.

B. Metered IP

As third-party IP cores become more common, one

would like a mechanism to charge per copy for those

cores. The core vendor would be paid for each use, just as if it had been a physical device. Guajardo et al. [15] described a method for doing this, which the company Intrinsic-ID

developed into a product under the brand name

Quiddicard [17].

The method has an enrollment phase and an authenti-

cation phase. In the enrollment phase, the FPGA is prog-

rammed with a PUF which generates an identifier unique

to the FPGA. An activation code is generated from the PUF value and stored off-chip. The activation code generation is

a proprietary algorithm, but may be an encryption of the

PUF value using a private key of a public/private key pair.

In the authentication phase, the same PUF is constructed

in the FPGA and the design is authorized with the acti-

vation code (Fig. 7).

To turn this activation process into an IP metering

mechanism, the generation of the activation code may be done by a trusted third party, possibly a trusted piece of

billing hardware at a manufacturing site that reports the IP

usage as it generates the activation code. This mechanism

has been extended to include multiple keys to permit ac-

cess to multiple pieces of IP in the FPGA application [18].

This mechanism relies on confidentiality and authen-

tication of the application design, so that an adversary

Fig. 6. IFF design.

cannot reverse engineer the device to remove the activa-

tion code checking. There is nothing fundamental about

using a PUF for identification. Device DNA or some other

unique or nearly unique fixed device identifier can serve.

C. Just in Time Secure Configuration

Utilizing partial reconfiguration and authenticated encrypted

bitstreams, it is possible to design a system where

critical technology (CT) is only configured into the device

when it is needed, thereby adding an additional layer of security to the system. Peterson [35] proposed a method by

which a user application is partitioned between CT and

non-CT. The non-CT is resident in the FPGA at all times

and the CT logic is partially reconfigured into the FPGA

only when needed. Otherwise, it is stored externally,

encrypted and authenticated.

The CT, which exists as a partial configuration, can be

decrypted by the device using the device key or by the application using a user-specified algorithm implemented

in the FPGA fabric, and potentially a PUF to generate the

key. The boot configuration of the FPGA sends the CT

partial bitstreams to the ICAP so that the decryption process

is completely contained within the FPGA. Encryption is

required to ensure the privacy of keys included in the CT

partial bitstreams or the boot configuration bitstream.

Authentication is required so that the bitstreams cannot be tampered in a way that compromises the CT partial bit-

streams. The IP described by Zeineddini and Wesselk-

amper [53] for secure and high-reliability applications

utilizing partial reconfiguration also checks for tampering.

It uses the integrated ADC to monitor power and tem-

perature, and checks the JTAG port to detect tamper

conditions. If necessary, the IP zeroizes the CT and its key.

D. Fault-Tolerant Design

FPGA manufacturers supporting confidentiality, integrity,

and authentication of the configuration provide a

strong foundation that users can build high-reliability systems upon. Cryptographic processing and security

services, like any high-reliability function, must be fault

tolerant. Xilinx’s isolation design flow (IDF) [7], developed

in conjunction with government entities [24], was the first

in the programmable logic industry. Altera has since

developed similar technology, called the design separation

flow [5].

IDF provides fault containment at the FPGA module level, enabling single-chip fault tolerance by various tech-

niques, including modular redundancy, watchdog alarms,

segregation by safety level, and isolation of test logic for

safe removal [7]. The applicability of this type of technol-

ogy goes beyond cryptographic processing and security.

The same technology can be used to aid in compliance for

systems that must be designed to safety-critical standards

such as IEC61508, ISO26262, and DO-254. The basic concept is to separate critical and/or inten-

tionally redundant functions physically on the FPGA. This

can be accomplished through careful floorplanning and the

use of unused logic as fences. Fig. 8 represents a design

that has been floorplanned with IDF. Fig. 9 is the same

design after place and route.

The fences are exhaustively analyzed by the FPGA

manufacturer to show that a single failure would not com- promise the isolation or redundancy built into the system.

The goal is to minimize the size of the fence to reduce the

inefficiencies that come with its use [16]. As an example,

the width or height of a fence made of configurable logic

blocks (CLBs) in a Xilinx 7-series FPGA is a single CLB.
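The fence idea can be illustrated with a toy floorplan check. This is only a sketch of the concept, not the analysis performed by the manufacturer or by any vendor tool: the floorplan is a grid of hypothetical region numbers, 0 marks an unused fence cell, and only horizontal and vertical adjacency is considered.

```python
FENCE = 0  # grid cells left unused to act as a fence


def isolation_violations(floorplan):
    """Return pairs of adjacent cells that belong to different regions."""
    rows, cols = len(floorplan), len(floorplan[0])
    bad = []
    for r in range(rows):
        for c in range(cols):
            here = floorplan[r][c]
            if here == FENCE:
                continue
            # Looking right and down visits every neighboring pair once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    there = floorplan[rr][cc]
                    if there != FENCE and there != here:
                        bad.append(((r, c), (rr, cc)))
    return bad
```

An empty result means every pair of distinct regions is separated by at least one fence cell, which corresponds to the single-CLB fence width mentioned above.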

In an ideal world, the modules would be completely

isolated from one another. In practice, this scenario is not

feasible: some level of communication must exist between isolated regions. Xilinx developed the concept of ‘‘trusted

routing,’’ restricted routing that is specifically chosen by

the place and route algorithms such that the isolation

established by the use of ‘‘fences’’ is not compromised.

Finally, no high-reliability system is complete without

the use of independent verification. To address concerns

associated with software ‘‘bugs’’ or inappropriate use of the

design methodology by the user, FPGA manufacturers must provide independent verification tools that can be applied

to the design to validate the isolation of the modules. Xilinx

Fig. 8. Notional floorplan of a design into five isolated regions.

Fig. 7. Metered IP system architecture.

developed the isolation verification tool (IVT) for this

purpose. IVT can be used early in the development flow to

aid in isolation verification before a printed wiring board

(PWB) is committed. It is also used once the design is

complete in order to verify that the final design, placed and routed, has the isolation that the user intended.

E. Single-Chip Cryptography

Single-chip crypto (SCC) combines data of different

levels of secrecy or control in a single device. The device

must not only protect programs during loading, but also it

must defend against attacks from outside and attacks while

operating, including leakage of protected information across internal boundaries. Therefore, single-chip cryp-

tography aggregates much of the technology discussed in

this paper.

SCC uses the authenticated encryption capability to

load a boot loader. The boot loader, isolation region #1 in

Fig. 8, manages further FPGA configuration, software for

on-chip processors, and data handling. Because it was au-

thenticated and encrypted, the boot loader is known to be unaltered by potential adversaries or accidental bit errors.

In addition, sensitive data, such as session keys, are known

to be kept secret. To ensure no internal leakage of in-

formation, SCC implements the fences of IDF as described

in Section VII-D (Fig. 8) to separate sensitive data spatially

in the FPGA. This separation assures the confidentiality of

sensitive information even in the presence of accidental or

intentional attacks on the fences. The spectrum of isolation capabilities is sufficient to support applications such

as the separation of red and black data processing, key

management, and other high-reliability functions.

Bitstream scrubbing, using internal readback, contin-

ually monitors the configuration data, in particular the

isolation fences, to ensure that changes to the configura-

tion are detected and corrected quickly. SCC can even

verify that the device DNA is correct, ensuring operation on the proper individual chip.

Starting with the root of trust, followed by the power

and flexibility of both hardware and software, coupled with

the application of isolation technologies and partial recon-

figuration, a system that would typically have been devel-

oped through the use of multiple devices now could be

integrated into just one with no loss of security.

VIII. THE FUTURE OF FPGA SECURITY

A. Field-Programmable SoC

SCC was originally conceptualized and developed in

cooperation with government authorities for FPGAs [24],

and the application provides additional value in new programmable

SoCs such as Zynq. Zynq includes both a programmable logic subsystem (PL) that comprises hundreds

of thousands of gates of logic, and a processor subsystem

(PS) that includes a dual-core ARM (ARM Holdings,

Cambridge, U.K.) Cortex A9 processor, caches, memories,

and peripherals, connected to one another and to the PL

using an Advanced Microcontroller Bus Architecture

(AMBA) Advanced eXtensible Interface (AXI) bus. The

Zynq device boots securely, using authenticated encryption capabilities like those described for FPGAs. Zynq also

provides asymmetric and symmetric authentication, con-

fidentiality, and integrity. Leveraging this root of trust,

applications can implement cryptoprocessors or systems

performing cryptographic functions in the combination of

processor and FPGA with confidence that they have not

been compromised.

In Zynq, the processor subsystem is known to be isolated physically from the programmable logic. Within the

PL, isolated regions as in IDF ensure separation of sensitive

data spatially. Within the PS, known software methods,

such as hypervisors and ARM TrustZone technology,

isolate sensitive software processes from other processes.

The trusted boot loader decrypts and authenticates all

configuration data and software.

Fig. 9. FPGA editor view of a design implemented using the IDF methodology.

Trimberger and Moore: FPGA Security: Motivations, Features, and Applications

Vol. 102, No. 8, August 2014 | Proceedings of the IEEE 1263

Partial reconfiguration is further enhanced. The entire PL can be reconfigured, or even powered down, controlled by the PS. Alternatively, portions of the PL can be partially reconfigured for applications that require algorithm agility. The same reliability checks performed on ICAP [52] can be applied to the processor configuration access port (PCAP) to ensure proper data integrity of software. Decryption and authentication of partial configuration files can be performed by either the PS or the PL, allowing users the flexibility to choose their own authentication and decryption algorithms as well as perform functions such as authenticate before decryption to aid in defense against side-channel attacks. Of course, key management remains a critical consideration in these applications.
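The authenticate-before-decryption ordering mentioned above can be illustrated with a minimal sketch. It is not the Zynq implementation: the function names, the 32-byte tag layout, and the toy SHA-256 keystream (a stand-in for a real cipher such as AES) are all assumptions made for illustration. The point it shows is that the decryption key is never exercised on data whose tag fails to verify, which limits an attacker's ability to feed chosen ciphertexts to the decryptor while measuring side channels.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA-256 tag length in bytes (layout assumed for this sketch)


def keystream(key: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream; a stand-in for a real cipher such as AES."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypts or decrypts by XOR with the keystream (the operation is symmetric)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def protect(enc_key: bytes, mac_key: bytes, bitstream: bytes) -> bytes:
    """Encrypts, then appends an HMAC-SHA-256 tag computed over the ciphertext."""
    ciphertext = xor_cipher(enc_key, bitstream)
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def load(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Verifies the tag first; the decryption key is used only if the tag is valid."""
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed; configuration rejected")
    return xor_cipher(enc_key, ciphertext)
```

In this sketch a partial configuration file produced by protect() round-trips through load(), while flipping any bit of the blob causes load() to raise before decryption starts. Separate encryption and authentication keys are used so that a weakness in one does not expose the other.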

B. Conclusion

Security in FPGAs has been driven by the need to address new threats, by the growth in value of the IP of the applications, and by the growth in the expected sophistication of the adversary. All three drivers continue to operate. New areas of protection, such as confidentiality of the data handled by the FPGA, metering of third-party IP, and counterfeit protection motivate additional capabilities and combinations of capabilities in the FPGA. Modern FPGAs and new programmable SoC devices hold applications that comprise complete systems, processing very sensitive data and controlling valuable systems. The high value of the applications, the data they handle, and the systems they control motivate well-equipped adversaries to steal IP or to subvert the systems of which the FPGA is a part.

As adversaries become more sophisticated, so do the FPGA defenses. Future FPGA security features must continue to improve to meet all three drivers. As in the past, these features will include circuits on the base array, algorithms in silicon, and IP in the programmable part of the device.

REFERENCES

[1] Actel, ‘‘Implementation of security in Actel’s ProASIC and ProASICPLUS Flash-based FPGAs,’’ Appl. Note AC185, 2003.

[2] Actel, ‘‘Understanding Actel antifuse device security,’’ 2004. [Online]. Available: www.actel.com/documents/AntifuseSecurityWP.pdf.

[3] P. Alfke, ‘‘Configuration issues: Power-up, volatility, security, battery back-up,’’ Xilinx, Appl. Note XAPP092, 1997. [Online]. Available: http://www.xilinx.com/support/documentation/application_notes/xapp092.pdf.

[4] Altera, ‘‘Using the design security features in Altera FPGAs,’’ Appl. Note, AN-556, Jun. 19, 2013.

[5] Altera, ‘‘Quartus II design separation flow,’’ 2013. [Online]. Available: http://www.altera.com/literature/hb/qts/qts_qii51019.pdf.

[6] C. Baetoniu, ‘‘FPGA IFF copy protection using Dallas Semiconductor/Maxim DS2432 Secure EEPROMs,’’ Xilinx, Appl. Note XAPP780 v. 1.1, 2010. [Online]. Available: http://www.zylinks.com/support/documentation/application_notes/xapp780.pdf.

[7] J. D. Corbett, ‘‘The Xilinx isolation design flow for fault-tolerant systems,’’ Xilinx WP412, 2012. [Online]. Available: http://www.xilinx.com/support/documentation/white_papers/wp412_IDF_for_Fault_Tolerant_Sys.pdf.

[8] S. Drimer, ‘‘Authentication of FPGA bitstreams, why and how,’’ Reconfigurable Computing: Architectures, Tools and Applications, vol. 4419. Berlin, Germany: Springer-Verlag, 2007, pp. 73–84.

[9] S. Drimer, ‘‘Security for volatile FPGAs,’’ Ph.D. dissertation, Comput. Sci. Dept., Cambridge Univ., Cambridge, U.K., 2009.

[10] National Institute of Standards and Technology (NIST), ‘‘Security requirements for cryptographic modules,’’ FIPS 140-2, 2001.

[11] National Institute of Standards and Technology (NIST), ‘‘Announcing the advanced encryption standard,’’ FIPS 197, 2001.

[12] National Institute of Standards and Technology (NIST), ‘‘The keyed-hash message authentication code (HMAC),’’ FIPS PUB 198, Mar. 6, 2002. [Online]. Available: http://csrc.nist.gov/publications/fips/fips198-1/FIPS-198-1_final.pdf.

[13] National Institute of Standards and Technology (NIST), ‘‘Secure hash standard,’’ FIPS PUB 180-2 + Change Notice to include SHA-224, Aug. 1, 2002. [Online]. Available: http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf.

[14] J. Guajardo, S. S. Kumar, G. J. Schrijen, and P. Tuyls, ‘‘Physical unclonable functions and public-key crypto for FPGA IP protection,’’ Proc. IEEE Int. Conf. Field-Programm. Logic Appl., 2007, pp. 189–195.

[15] J. Guajardo, S. S. Kumar, G. J. Schrijen, and P. Tuyls, ‘‘Brand and IP protection with physical unclonable functions,’’ in Proc. IEEE Int. Symp. Circuits Syst., 2008, pp. 3186–3189.

[16] T. Huffmire et al., ‘‘Moats and drawbridges: An isolation primitive for reconfigurable hardware based systems,’’ in Proc. IEEE Symp. Security Privacy, 2007, pp. 281–295.

[17] Intrinsic-ID, ‘‘Quiddikey-Flex,’’ 2013. [Online]. Available: http://www.intrinsic-id.com/products/quiddikey-flex.

[18] Intrinsic-ID, ‘‘Quiddicard protecting your IP against overproduction, counterfeiting and cloning,’’ Aug. 30, 2013. [Online]. Available: www.intrinsic-id.com/products/quiddicard-.

[19] L. Jones, ‘‘Single event upset (SEU) detection and correction using Virtex-4 devices,’’ Xilinx, Appl. Note #714, 2007. [Online]. Available: http://www.xilinx.com/bvdocs/appnotes/xapp714.pdf.

[20] T. Kean, ‘‘Secure configuration of field programmable gate arrays,’’ in Proc. IEEE Annu. Symp. Field-Programm. Custom Comput. Mach., 2001, pp. 259–260.

[21] Lattice, ‘‘FPGA design security issues: Using Lattice FPGAs to achieve high design security,’’ White Paper, 2007.

[22] Lattice, ‘‘Advanced security encryption key programming guide for LatticeECP3, LatticeECP2MS, LatticeECP2S devices,’’ Tech. Note TN1215, 2012.

[23] A. Lesea, S. Drimer, J. Fabula, C. Carmichael, and P. Alfke, ‘‘The Rosetta experiment: Atmospheric soft error rate testing in differing technology FPGAs,’’ IEEE Trans. Device Mater. Reliab., vol. 5, no. 3, pp. 317–328, Sep. 2005.

[24] M. McLean and J. Moore, ‘‘FPGA-based single chip cryptographic solution,’’ Military Embedded Systems, 2007. [Online]. Available: http://www.mil-embedded.com/pdfs/NSA.Mar07.pdf.

[25] Microsemi, ‘‘Igloo2 FPGAs revision 0,’’ 2013. [Online]. Available: www.microsemi.com/document-portal/doc_download/132042-igloo2-fpga-datasheet.

[26] Microsemi, ‘‘Axcelerator family FPGAs,’’ 2012. [Online]. Available: http://www.microsemi.com/document-portal/doc_download/130669-axcelerator-family-fpgas-datasheet.

[27] Microsemi, ‘‘Implementation of security in Microsemi Antifuse FPGAs,’’ Appl. Note AC168, 2012.

[28] Microsemi, ‘‘Security architecture,’’ 2013. [Online]. Available: http://www.microsemi.com/products/fpga-soc/technology-solutions/security/security-architecture.

[29] Microsemi, ‘‘SmartFusion2 SoC FPGA reliability and security user’s guide,’’ 2013.

[30] A. Moradi, A. Barenghi, T. Kasper, and C. Paar, ‘‘On the vulnerability of FPGA bitstream encryption against power analysis attacks: Extracting keys from Xilinx Virtex-II FPGAs,’’ in Proc. ACM Conf. Comput. Commun. Security, 2011, pp. 111–124.

[31] A. Moradi, M. Kasper, and C. Paar, ‘‘Black-box side-channel attacks highlight the importance of countermeasures - An analysis of the Xilinx Virtex-4 and Virtex-5 bitstream encryption mechanism,’’ in Proc. 12th Conf. Topics Cryptol., 2012, DOI: 10.1007/978-3-642-27954-6_1.

[32] A. Moradi, D. Oswald, C. Paar, and P. Swierczynski, ‘‘Side channel attacks on the bitstream encryption mechanism of Altera Stratix II,’’ in Proc. ACM/SIGDA Int. Symp. Field-Programm. Gate Arrays, 2013, pp. 91–100.

[33] National Institute of Standards and Technology (NIST), ‘‘Recommendation for block cipher modes of operation,’’ Special Publ. 800-38A, 2001.

[34] M. Parlekar, ‘‘Authenticated encryption in hardware,’’ M.S. thesis, Electr. Comput. Eng. Dept., George Mason Univ., Fairfax, VA, USA, 2005.

[35] E. Peterson, ‘‘Developing tamper resistant designs with Xilinx Virtex-6 and 7 series FPGAs,’’ Xilinx, Appl. Note XAPP1084, 2012.

[36] L. Sanders, ‘‘Secure boot of Zynq-7000 all-programmable SoC,’’ Xilinx, Appl. Note XAPP 1175 (v1.0), 2013.

[37] B. Schneier, Applied Cryptography Second Edition. New York, NY, USA: Wiley, 1996.

[38] S. Skorobogatov and C. Woods, ‘‘Breakthrough silicon scanning discovers backdoor in military chip,’’ Cryptographic Hardware and Embedded Systems - CHES 2012, vol. 7428. Berlin, Germany: Springer-Verlag, 2012, pp. 23–40.

[39] G. E. Suh and S. Devadas, ‘‘Physical unclonable functions for device authentication and secret key generation,’’ in Proc. Design Autom. Conf., 2007, pp. 9–14.

[40] A. Telikepalli, ‘‘Is your design secure?’’ Xilinx, 2003. [Online]. Available: http://www.xilinx.com/publications/archives/xcell/Xcell47.pdf.

[41] S. Trimberger, ‘‘Method and apparatus for protecting proprietary configuration data for programmable logic devices,’’ U.S. Patent 6 654 889, 2003.

[42] S. Trimberger, J. Moore, and W. Lu, ‘‘Authenticated encryption of FPGA bitstreams,’’ in Proc. 19th ACM/SIGDA Int. Symp. Field Programm. Gate Arrays, 2011, pp. 83–86.

[43] S. Trimberger, Field-Programmable Gate Array Technology. Norwell, MA, USA: Kluwer, 1994.

[44] S. Trimberger, ‘‘Trusted design in FPGAs,’’ in Proc. Design Autom. Conf., 2007, pp. 5–8.

[45] S. Trimberger and J. Moore, ‘‘FPGA security: From features to capabilities to trusted systems,’’ in Proc. 51st Annu. Design Autom. Conf., 2014, DOI: 10.1145/2593069.2602555.

[46] S. Trimberger, ‘‘Security in SRAM FPGAs,’’ IEEE Design Test Comput., vol. 24, no. 6, p. 581, Nov./Dec. 2007.

[47] S. Trimberger, ‘‘Three ages of FPGAs,’’ in FPGA20. Highlights of the International Symposium on Field-Programmable Gate Arrays, ACM, 2011, pp. 1–18.

[48] T. Tuan, T. Strader, and S. Trimberger, ‘‘Analysis of data remanence in a 90 nm FPGA,’’ in Proc. IEEE Custom Integr. Circuits Conf., 2007, pp. 93–96.

[49] T. Wollinger and C. Paar, ‘‘How secure are FPGAs in cryptographic applications,’’ Field Programmable Logic and Application, vol. 2778, P. Y. K. Cheung, G. A. Constantinides, and J. T. de Sousa, Eds. Berlin, Germany: Springer-Verlag, 2003, pp. 91–100.

[50] Xilinx, ‘‘Virtex-4 FPGA configuration user guide, v1.11,’’ UG071, 2009.

[51] Xilinx, ‘‘Virtex-6 FPGA Configuration User Guide,’’ UG360, Jul. 30, 2010. [Online]. Available: http://www.xilinx.com/support/documentation/user_guides/ug360.pdf.

[52] Xilinx, ‘‘Device reliability report, second quarter 2013,’’ UG116, 2013.

[53] A. Zeineddini and J. Wesselkamper, ‘‘PRC/ EPRC: Data integrity and security controller for partial reconfiguration,’’ Appl. Note XAPP887, 2012.

ABOUT THE AUTHORS

Stephen M. Trimberger (Fellow, IEEE) received the B.S. degree in engineering and applied science from the California Institute of Technology, Pasadena, CA, USA, in 1977, the M.S. degree in information and computer science from the University of California at Irvine, Irvine, CA, USA, in 1979, and the Ph.D. degree in computer science from the California Institute of Technology in 1983.

He was employed at VLSI Technology from 1982 to 1988. Since 1988 he has been at Xilinx, San Jose, CA, holding a number of positions. He is currently a Xilinx Fellow, heading the Circuits and Architectures group in Xilinx Research Labs in San Jose, CA, USA. He is an author and editor of five books as well as dozens of papers and journal articles. He is an inventor on more than 200 U.S. patents in the areas of integrated circuit (IC) design, field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) architecture, computer-aided engineering (CAE), 3-D die stacking semiconductors, and cryptography.

Dr. Trimberger is a four-time winner of the Freeman Award, Xilinx’s annual award for technical innovation. He is a Fellow of the Association for Computing Machinery (ACM).

Jason J. Moore received the B.S. degree in electrical engineering from New Mexico State University, Las Cruces, NM, USA, in 1992.

He is currently a Director of Market Segments Engineering at Xilinx, Albuquerque, NM, USA, focused on security and safety architectures. Previous to his assignments at Xilinx, he was responsible for the development of field-programmable gate array (FPGA)-based communication security equipment in a wide range of avionics and ground-based platforms at the Motorola Government Group. He has been awarded multiple patents on cryptographic design in addition to novel approaches for logical and functional isolation within a single FPGA.

Mr. Moore is a two-time winner of the Freeman Award, Xilinx’s annual award for technical innovation.


sources/157/Long et al. - 2019 - PUF-Based Anonymous Authentication Scheme for Hard.pdf

SPECIAL SECTION ON MOBILE EDGE COMPUTING AND MOBILE CLOUD COMPUTING: ADDRESSING HETEROGENEITY AND ENERGY ISSUES OF COMPUTE AND NETWORK RESOURCES

Received May 28, 2019, accepted June 19, 2019, date of publication June 26, 2019, date of current version September 13, 2019.

Digital Object Identifier 10.1109/ACCESS.2019.2925106

PUF-Based Anonymous Authentication Scheme for Hardware Devices and IPs in Edge Computing Environment

JING LONG1,2, WEI LIANG3, KUAN-CHING LI4, (Senior Member, IEEE), DAFANG ZHANG5, MINGDONG TANG6, (Member, IEEE), AND HAIBO LUO7

1Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410081, China

2College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China

3School of Opto-Electronic and Communication Engineering, Xiamen University of Technology, Xiamen 361024, China

4Department of Computer Science and Information Engineering, Providence University, Taichung 43301, Taiwan

5College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China

6School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China

7Industrial Robot Application of Fujian University Engineering Research Center, Minjiang University, Fuzhou, China

Corresponding author: Wei Liang ([email protected])

This work was supported in part by the National Natural Science Foundation of China under Grant 61572186 and Grant 61572188, in part by the Hunan Provincial Science & Technology Project Foundation under Grant 2018TP1018, in part by the Start-Up Funds of Hunan Normal University under Grant 531120-3812, in part by the Scientific Research Program of New Century Excellent Talents in Fujian Province University, China, in part by the Industrial Robot Application of Fujian University Engineering Research Center, China, in part by the Minjiang University, China, under Grant MJUKF-IRA201802, and in part by the Fujian Provincial Natural Science Foundation of China under Grant 2018J01570.

ABSTRACT With rapid advances in edge computing and the Internet of Things, the security of low-layer hardware devices attracts more and more attention. As an ideal hardware solution, the field programmable gate array (FPGA) has become a mainstream technology for designing complex systems. The designed modules are called intellectual property (IP) cores. In this paper, we consider misappropriation of both hardware devices and software IPs in edge computing and propose a PUF-based anonymous authentication scheme for IP copyright. The scheme uses a double physical unclonable function (PUF) authentication model. Both parties generate the challenge jointly during authentication to avoid replay attacks and modeling attacks on the PUF circuit. The complexity of authentication is greatly reduced. Besides, because of the double PUF authentication model, the server of the FPGA vendor does not need to store all the challenge response pairs (CRPs) of each PUF-based chip. This saves system resources and achieves better security. To protect software IP, the IP core vendor inserts copyright information and anonymous buyer identity information into the design before trading. The anonymity of the buyer protects the buyer's interests. With the participation of a trusted device vendor, infringement can be traced according to extracted fingerprints. The experiments show that the resource overhead of the proposed scheme is reduced by 61.96% and 31.61% compared with the 2-1 DAPUF and the built-in self-adjustable PUF, respectively. Besides, PUF stability is 99.54%. These results demonstrate the good performance of the proposed scheme.

INDEX TERMS Edge computing, field programmable gate array (FPGA), IP cores, PUF authentication model, anonymous authentication.

I. INTRODUCTION With rapid advances in Internet-of-Things (IoT) and edge computing, hardware security is widely concerned by researchers and institutes all over the world [1]. As core components of hardware devices in edge computing, security of Field Programmable Gate Array (FPGA) design modules

The associate editor coordinating the review of this manuscript and approving it for publication was Junaid Shuja.

should not be neglected [2]. Due to integrated circuit (IC) manufacturing process, there are some inevitable differences in threshold voltage and oxide thickness of each produced chip [3]. Therefore, the physical structures of different chips have random differences even in the same manufacturing environment. The difference is similar to the human finger- print, iris and palm print. It will not affect normal functional- ity of chips, but can be used as unique intrinsic characteristic to identify chips. On basis of human identity authentication,

VOLUME 7, 2019 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/ 124785

https://orcid.org/0000-0003-4909-4629

https://orcid.org/0000-0003-1381-4364

J. Long et al.: PUF-Based Anonymous Authentication Scheme for Hardware Devices and IPs in Edge Computing Environment

researchers proposed using the unique manufacturing differences in physical structure to recognize the identity of chips. A physical unclonable function (PUF) is a microcircuit that extracts these manufacturing characteristics from a complex physical system [4]. It produces an unpredictable, unique response to an arbitrary input challenge because of the inevitable random differences in chip manufacturing [5], [6]. In recent years, research institutes and semiconductor companies have proposed many types of PUF circuits, which are widely used in intellectual property (IP) protection [7], secret key generation, device authentication, etc. [8].

FPGA is a semi-custom circuit in the Application Specific Integrated Circuit (ASIC) field. It is widely used in IoT and edge computing environments because it is programmable and reconfigurable [9]. From a security point of view, IP protection techniques implemented on an FPGA are more flexible and require no extra resource overhead compared with those on traditional custom circuits. Therefore, PUF technology can be used for FPGA protection [10]. The unique intrinsic characteristic can be extracted by the PUF as a secret key, namely a challenge response pair (CRP), which can be used as the identity of a chip. In device licensing or authentication [11], [12], the identity of a chip can be recognized by comparing the PUF response with the registered one.

Research on PUF-based IP protection can ensure the security of FPGA designs at the hardware level, thereby enhancing the ability of the hardware circuit to resist attacks. In this work, the participators and security issues of the entire IP trading procedure are considered. A PUF-based anonymous IP authentication scheme is proposed using PUF and digital fingerprint techniques. A double PUF structure is proposed to authenticate both the hardware FPGA and the software IP. As a result, the FPGA vendor does not need to store all CRPs in advance, which gives advantages in resource overhead, security and applicability. The IP vendor can insert copyright information and the anonymous identity of the IP buyer into the IP core before IP trading, which realizes passive IP protection and infringement tracing. The anonymity protects the interests of the IP buyer, while misappropriation can still be tracked with the participation of the trustable device vendor.

This work is organized as follows. Section II analyzes previous PUF-based IP authentication schemes. Section III introduces the proposed double PUF model. In Section IV, the PUF-based anonymous IP authentication is proposed. The security is analyzed in Section V. Section VI evaluates the experimental results. Finally, the paper is summarized.

II. RELATED WORK PUF is a novel technique to extract a ‘‘secret’’ from a complex physical system [13]. It exploits the inevitable random differences in hardware manufacturing to generate a secret with unique characteristics. Many types of PUF implementations have been proposed in recent years. Based on the implementation principle, PUFs can be classified into delay-based PUFs (arbiter PUF, ring oscillator PUF) and storage-based PUFs (SRAM PUF, butterfly PUF). By the number of CRPs, there are strong PUFs and weak PUFs. The former (such as the arbiter PUF) have numerous CRPs and are widely used in device authentication. The latter have fewer CRPs and are mainly used in key generation. The PUF input is the challenge (C) and the output is the response (R); together they form a challenge-response pair. The relationship between C and R can be represented by PUF(C) = R. R differs for different C, which can be evaluated by the inter-Hamming distance. The difference between responses obtained by repeatedly applying the same challenge to a PUF is measured by the intra-Hamming distance. In the ideal situation, the response of a PUF to the same challenge does not change even when the PUF is affected by external environmental factors such as temperature and noise. The inter-Hamming distance and intra-Hamming distance can be shown intuitively by histograms.
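The two distance measures can be illustrated with a minimal sketch. The 8-bit response values below are invented for illustration and are not measurements from this work. The sketch follows the usage in the text (inter-Hamming distance between differing responses, intra-Hamming distance between repeated measurements of the same challenge); in the wider PUF literature the inter-Hamming distance is often taken between different chips answering the same challenge.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two responses (stored as ints) differ."""
    return bin(a ^ b).count("1")


def fractional_hd(a: int, b: int, n_bits: int) -> float:
    """Hamming distance normalized by the response length."""
    return hamming_distance(a, b) / n_bits


# Hypothetical 8-bit responses, made up for this example.
response_x = 0b10110010             # response to one challenge
response_y = 0b01100110             # a different response
response_x_remeasured = 0b10110011  # same challenge, one noisy bit

inter_hd = fractional_hd(response_x, response_y, 8)
intra_hd = fractional_hd(response_x, response_x_remeasured, 8)
print(inter_hd, intra_hd)
```

An inter-Hamming distance near 50% and an intra-Hamming distance near 0% correspond to the ideal situation described above.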

The concept of PUF was first proposed by Pappu et al. [14]. Since then, researchers all over the world have focused on PUF-based copyright protection techniques. Li et al. [15] used a PUF, data selectors and reconfigurable logic to hide the original logic functions, thereby preventing attackers from obtaining the complete circuit netlist by reverse engineering. This technique is suitable for combinational and sequential logic circuits. Simulation results show that the technique achieves high security with less than 10% area overhead. Kumar et al. [16] proposed an SRAM-based ‘‘butterfly’’ PUF (BPUF) and a novel IP protection protocol. The proposed PUF uses an unstable cross-coupled circuit in which the inverter is replaced by a latch or flip-flop. The latch can store the circuit signal and can be cleared or reset. Real-time measurement is realized without a power-up being required. BPUF is suitable for all types of FPGAs. The same team also proposed a public key cryptography algorithm for FPGA IP protection [17]. The key does not need to be stored in the FPGA device, which greatly improves the security of the algorithm. The security improvement comes at the cost of extra hardware overhead, but does not noticeably degrade performance.

To ensure the legality of an IP core and restrict its use to a licensed device, Gora et al. [18] extracted a 128-bit secret key with a PUF in the FPGA and used it to encrypt the software IP core, so that an IP core is bound to a specific FPGA device. This scheme assumes that the system integration vendor is completely trustable, and all CRPs of the PUF in the FPGA are stored by the system integration vendor. Simpson et al. [19] used a PUF to authenticate the third-party IP and the hardware platform. In this protocol, the trustable third party (TTP) knows the IP content, so an untrustable third party may leak the IP. To address this issue, the authors in [20] proposed a novel PUF structure and improved the authentication protocol so that the TTP cannot obtain the content of the IP core. The proposed PUF is used to generate the secret key for encryption and the message authentication code. The message authentication code can be used to authenticate the originality of IP cores, since encryption alone cannot realize authentication. Zhang et al. [21], [22] proposed several FPGA IP protection


FIGURE 1. Improved PUF circuit.

methods to resist illegal replay attacks. They also proposed a delay-based PUF to protect FPGA IP cores [23]. The above methods achieve good security but incur large hardware overhead. The author in [24] proposed an RO-PUF with low overhead and high performance to protect FPGA copyright. In authentication, all CRPs have to be transmitted directly. If they are captured by attackers, the security of the PUF is seriously threatened, especially for a strong PUF.

In this work, we consider the random differences in chip manufacturing and propose a PUF-based anonymous IP authentication scheme to authenticate both the hardware chip and the software IP. First, a double PUF authentication model that pairs a physical PUF with a simulated PUF is proposed. In hardware authentication, it is unnecessary to transmit all CRPs, which improves resistance to modeling attacks. The authentication parties jointly generate the PUF challenges and the responses are matched for authentication, which resists replay attacks. During authentication, the IP watermark and the anonymous information of the IP buyer can be inserted into the design for IP protection and infringement tracing. Existing IP watermarking techniques can be used directly in the proposed method without additional modification. Only a legal IP buyer can use the IP core. If IP infringement occurs, the seller can track the illegal distribution and provide credible evidence. The scheme also prevents a dishonest seller from acting as a legal IP buyer to obtain compensation. Meanwhile, the identity of an honest buyer is not leaked in authentication.

III. PUF-BASED AUTHENTICATION MODEL In this section, an improved arbiter PUF is realized on FPGA and a double PUF copyright authentication model is designed based on the improved PUF structure. This section introduces the modules of the PUF structure and illustrates the designed PUF authentication model.

A. PUF CIRCUIT MODEL This section proposes an improved arbiter PUF and its implementation on FPGA, taking into account the features of the IP protection protocol and the principle of the arbiter PUF. As shown in Fig.1, the PUF includes three modules: challenge generation, PUF feature extraction and signal voting.

1) CHALLENGE GENERATION MODULE The challenge generation module includes linear feedback shift register (LFSR) and mixing function. The random

FIGURE 2. LFSR challenge generation module.

FIGURE 3. Structure of traditional arbiter PUF.

challenge signals generated by the lightweight LFSR are fed to the mixing function, which generates several groups of testable challenge signals. Ideally, an n-level LFSR should produce a maximum-length sequence, and the generated sequence should satisfy Golomb's randomness postulates. An n-level LFSR consists of n flip-flops and several xor gates, as shown in Fig.2.

Here, D denotes a flip-flop, and f0, f1, f2, . . . , fn are feedback coefficients with the value 0 or 1: fi = 0 means that there is no feedback path in the circuit and fi = 1 means that a feedback path exists. The initial challenge acts as the input of the LFSR. A new challenge is generated by cyclic shift and sent to the mixing function to produce multiple groups of challenges. The mixing function depends on the number of paths and levels of the arbiter PUF. The extension function outputs the multiple groups of challenges to the PUF circuit. For instance, a 2-XOR PUF that generates a 128-bit response requires two groups of 128-bit challenges, which are generated by the mixing function and used as the PUF input.
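The shift-and-feedback behavior of the LFSR can be sketched as follows. This is a minimal illustration, not the circuit of Fig.2: the 4-bit width, the tap positions (4, 3) and the seed are assumptions chosen because they give a maximum-length sequence, and the mixing function is not modeled because the text does not specify it.

```python
def lfsr_step(state: int, taps: tuple, n: int) -> int:
    """Advance an n-bit Fibonacci LFSR by one shift.

    taps lists the 1-indexed bit positions whose values are xor-ed to form
    the feedback bit (the positions where the coefficient fi equals 1).
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> (t - 1)) & 1
    # Shift left by one position and insert the feedback bit at the bottom.
    return ((state << 1) | feedback) & ((1 << n) - 1)


def lfsr_sequence(seed: int, taps: tuple, n: int, count: int) -> list:
    """Return the next `count` states produced from a non-zero seed."""
    states = []
    state = seed
    for _ in range(count):
        state = lfsr_step(state, taps, n)
        states.append(state)
    return states


# Assumed example: 4-bit register, taps at positions 4 and 3, seed 0001.
sequence = lfsr_sequence(0b0001, (4, 3), 4, 15)
print([format(s, "04b") for s in sequence])
```

With these assumed taps the register visits all 15 non-zero states before returning to the seed, which is the maximum-length property mentioned above.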

2) PUF-BASED CHARACTERISTIC EXTRACTION MODULE The proposed PUF structure belongs to the arbiter PUF family. It is a strong PUF and can provide numerous CRPs for a specific application, which gives it good resistance to replay attacks. An arbiter PUF does not measure the absolute delay of a specific path; instead, it checks the relative delay difference between two symmetric paths. The PUF structure consists of multiplexers and an arbiter, as shown in Fig.3. Each multiplexer has two input ports and two output ports. Because of the manufacturing process, the internal delay of each multiplexer varies, so the delay of a signal passing through each path differs. If a challenge bit C = 0, the signals pass straight through the two paths. If C = 1, the signals cross over between the paths. The arbiter compares the arrival times: if the top signal reaches the arbiter first, the arbiter outputs 1. Otherwise, it outputs 0.


FIGURE 4. PUF implementation on FPGA.

FIGURE 5. Structure of Slice MUX.

However, implementing the traditional arbiter PUF on FPGA is difficult because of the coupling paths between multiplexers. They lead to asymmetric wiring, so the PUF response has low uniqueness. To address this issue, the authors in [25] proposed a double arbiter PUF. It effectively improves the uniqueness, but the FPGA resource usage grows exponentially. On this basis, the authors in [26] pointed out that the coupling paths between multiplexers should be eliminated to realize symmetric wiring on FPGA, which ensures good uniqueness of the PUF response and reduces the hardware resources of the PUF. However, the technique of [26] mainly reduces the resource consumption of the traditional arbiter PUF and still has defects in terms of uniqueness and stability.

The proposed PUF structure is based on the technique in [26] to reduce the resource overhead of the traditional implementation. An xor operation on the outputs of two arbiter PUFs effectively improves the uniqueness of the PUF. In addition, a signal voting module is added to the PUF to generate a stable response. This module follows the majority rule: the challenge is input to the PUF circuit repeatedly, and the signal voting module selects the value that appears most often as the PUF response.

The PUF is implemented on a Xilinx Virtex5, as shown in Fig.4. Here, both MUX components in each delay node are built from the Slice MUX in Fig.5. In the Virtex5 FPGA, each slice includes four 6-input lookup tables (LUTs), several multiplexers, and other logic resources. The LUT is the basic unit for realizing logic functions. It can implement a 4:1 multiplexer, so a slice can implement four 4:1 multiplexers. Similarly, four LUTs (namely a slice) can implement a 16:1 multiplexer. Besides, there are three specific

FIGURE 6. Structure of signal voting circuit.

multiplexers in Virtex5, namely F7AMUX, F7BMUX and F8MUX. Combined with the LUTs, they can realize a 16:1 multiplexer with 11 control signals. The paths are parallel, and the control signals only change the transmission paths of the signal within the Slice MUX. Symmetry is easy to achieve in FPGA because of the parallel structure and the identical structure of the slices.

3) SIGNAL VOTING MODULE The signal voting circuit inputs a challenge repeatedly and selects the output value that appears most often as the response, following the majority rule. It avoids bit flips caused by occasional factors and keeps the response stable with few hardware resources. In traditional implementations, error correction algorithms are widely used to achieve stability. However, they require large hardware overhead, which is not suitable for a lightweight PUF.

The structure of the signal voting circuit is shown in Fig.6. The sampling counter ct is used to sample the decision result sr over several repetitions of the challenge. rmaj represents the output value that appears most often in sr, and tmaj denotes the count associated with rmaj. First, the parameters of the signal voting circuit are initialized: ct and tmaj are set to 0. When the challenge is given, ct starts sampling, the first response sr is used as the initial value of rmaj, and tmaj is incremented by 1. If the next response is equal to rmaj, tmaj is incremented by 1. Otherwise, tmaj is decremented by 1. When tmaj = 0, sr is compared with rmaj, and if they are not consistent, rmaj is replaced by sr. These operations are repeated until the sampling finishes. The valid output of the signal voting circuit is the value of rmaj.
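The voting procedure can be sketched in software as follows. This is one reading of the description above, which coincides with the Boyer-Moore majority vote; it is not the hardware circuit of Fig.6, and the sample lists are invented for illustration. The names r_maj and t_maj mirror rmaj and tmaj in the text.

```python
def majority_vote(samples):
    """Return the value that wins the running vote over repeated responses.

    r_maj holds the current majority candidate and t_maj its running count.
    """
    r_maj = None
    t_maj = 0
    for s_r in samples:
        if t_maj == 0:
            # No current majority: adopt the new sample as the candidate.
            r_maj = s_r
            t_maj = 1
        elif s_r == r_maj:
            t_maj += 1
        else:
            t_maj -= 1
    return r_maj


# Hypothetical repeated responses to one challenge, with occasional bit flips.
print(majority_vote([1, 1, 0, 1, 0, 1, 1]))
print(majority_vote([0, 1, 0, 0, 1]))
```

Only one stored value and one counter are needed, which is consistent with the claim that voting uses fewer hardware resources than error correction. The result equals the true majority whenever one value occurs in more than half of the samples.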

B. DOUBLE PUF AUTHENTICATION MODEL The security of the arbiter PUF has received wide attention in recent years. The arbiter PUF is a type of strong PUF [27]. A PUF is meant to be unclonable; that is, it should not be possible to build a simulated model that behaves like the original physical PUF from its CRPs. However, existing arbiter PUFs can be modeled by software


FIGURE 7. Double PUF based authentication model.

with enough CRPs. The PUF response mainly depends on the challenge C and the inner delay vector ω of the PUF. ω can be calculated from enough PUF CRPs, and the PUF model can then be simulated with a machine learning algorithm. If an attacker captures enough CRPs, a modeling attack is likely to succeed.

In previous PUF-based authentication techniques, the CRPs generated by the PUF are stored in a database at the initial stage. A CRP is removed from the database after a round of authentication, which resists replay attacks. The drawback of these techniques is that the FPGA vendor has to store numerous CRPs. For a strong PUF, the number of CRPs grows exponentially with the IC area, and the CRPs recorded at registration may greatly exceed what authentication requires. In addition, the transmission of PUF CRPs requires a secure channel to avoid machine learning attacks.

In this work, a double PUF authentication model is proposed, as shown in Fig.7. In this model, the FPGA manufacturer uses the simulated model and the physical PUF is deployed in the chip. The legal manufacturer sets an access point for the original PUF, from which the PUF CRPs can be collected. The collected CRPs can be legally analyzed and used to establish a simulated model. The access point is destroyed permanently after the model is successfully built. The authors in [28] pointed out that a PUF model with an accuracy rate of 90% (an error rate of 10%) can be built with only 1000 CRPs in a short time. This applies mainly to simple PUFs whose CRPs are completely leaked. The simulated PUF behaves similarly to the original physical PUF and can be used in identity authentication.

The response of an n-level arbiter PUF depends on the delay difference of the signal on each path, namely, the delay sum of all paths. The delay difference is related to the challenge signal. Let µ1,i and µ0,i denote the delay differences associated with challenge bits ‘‘1’’ and ‘‘0’’, respectively, on the i-th path of the n-level arbiter PUF. The FPGA manufacturer measures all CRPs of each chip via the access point and establishes the simulated model. For an m-level arbiter PUF, the delay vector $\vec{\nu} = (\nu_0, \nu_1, \ldots, \nu_m)$ can be calculated to build the simulated PUF model, as in (1).

$$
\begin{cases}
\nu_0 = \mu_{0,1} - \mu_{1,1} \\
\nu_i = \mu_{0,i} + \mu_{1,i} + \mu_{0,i+1} - \mu_{1,i+1}, \quad i \in [1, m-1] \\
\quad \vdots \\
\nu_m = \mu_{0,m-1} + \mu_{1,m-1}
\end{cases}
\tag{1}
$$

FIGURE 8. Participators in IP trading.

At the output end, the total delay $\Delta D$ is the product of the transposed delay vector and the characteristic vector $\vec{\phi}$ of challenge C, namely $\Delta D = \vec{\nu}^{T} \vec{\phi}$. If $\Delta D > 0$, we have R = 1. Otherwise, R = 0. The characteristic vector $\vec{\phi}$ of challenge C can be represented by equation (2).

$$
\begin{cases}
\phi_i = \prod_{t=i}^{m} (-1)^{C_t}, \quad i \in [0, k-1] \\
\phi_k = 1
\end{cases}
\tag{2}
$$
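The simulated PUF evaluation described by this model can be sketched in a few lines. The sketch assumes a k-bit challenge C_0, ..., C_{k-1} with the product in (2) running up to the last challenge bit, since the index bounds in the extracted equation are ambiguous; the delay values are invented for illustration and the function names are hypothetical.

```python
def feature_vector(challenge):
    """Characteristic vector of a challenge: phi[i] is the product of
    (-1)**C_t over t = i .. k-1, and the final entry phi[k] is 1."""
    k = len(challenge)
    phi = [1] * (k + 1)
    # Accumulate the sign product from the last stage back to the first.
    for i in range(k - 1, -1, -1):
        phi[i] = phi[i + 1] * (-1 if challenge[i] else 1)
    return phi


def simulated_response(nu, challenge):
    """Response of the simulated PUF: 1 if the total delay is positive."""
    phi = feature_vector(challenge)
    delta_d = sum(v * p for v, p in zip(nu, phi))
    return 1 if delta_d > 0 else 0


# Hypothetical delay vector for a 3-level PUF (values chosen for illustration).
nu_example = [0.5, -0.2, 0.1, 0.3]
print(feature_vector([1, 0, 1]))
print(simulated_response(nu_example, [1, 0, 1]))
```

In the double PUF model, a party holding the delay vector can compute the expected response R′ for any challenge in this way and compare it with the response R reported by the physical PUF, without a stored CRP table.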

In this model, the PUF challenge is formed from random numbers contributed by both authentication parties, so a malicious attacker cannot completely control the PUF challenge in a round of authentication. The PUF CRPs are not all transmitted directly, which makes it difficult for attackers to capture enough CRPs for modeling attacks. In the PUF implementation, the outputs of the arbiters are combined by an xor operation to enhance the resistance against attacks. Besides, the interests of the participators in IP trading are considered in this work. The digital watermark is used to protect the IP copyright and to deal with piracy by the IP buyer. Some existing watermarking techniques can be used directly in the proposed scheme without extra modification.

IV. ANONYMOUS IP AUTHENTICATION SCHEME A. PARTICIPATORS IN IP TRADING This work considers authentication in both software and hardware, mainly involving device authentication and IP authentication. The former authenticates the legality of the chip and the latter protects the copyright of the IP owner. Throughout IP trading, the participators should follow the security protocol to safeguard their interests.

The participators in IP trading are shown in Fig. 8, involving the FPGA vendor, the IP vendor, the system integration vendor, the trustable third party, etc. [29]. The FPGA vendor (FV) refers to the semiconductor companies that produce FPGAs, such as Xilinx and Altera. The IP core vendor (CV) is a company or individual who designs and implements an IP core. The system developer (SD) uses the hardware from the FPGA vendor and the IP core from the IP vendor to design a complex system. The trustable third party (TTP) is assumed to be an authority that can be trusted by the other participators. It can deal with data storage, processing and transmission.

FV manufactures a new type of FPGA every 12 to 18 months. The entire flow requires considerable effort in design, manufacturing and verification. The number of transistors on a single silicon die is limited. Therefore, FV only implements embedded functions in FPGA for the majority of


FIGURE 9. Participator registration protocol.

users or a minority of big customers. FV has two considerations. On the one hand, the FPGA design should be protected from reverse engineering, illegal copying, leakage or tampering. On the other hand, some security measures should be provided to protect the design of the IP user and to secure the trading of IP cores.

SD integrates the purchased IP designs into a complex system. These IP designs may come from different IP vendors. The system is realized by following IP integration rules. The protection of the system should take the implementation cost into account and remain valid over the entire life cycle.

CV can be FV or another company that designs and sells IP cores. After an IP core is successfully verified, CV can sell different types of IP core depending on the design level. The main concern of CV is to ensure that IP cores are used only by legal IP buyers after trading. Malicious infringement and resale of IP cores should be prevented. In addition, when IP infringement occurs, it should be possible to authenticate the IP copyright and track the infringement.

TTP can deal with data storage, processing and transmission in the authentication protocol. It is easy to add a third party to a protocol, but doing so causes many problems in practice. TTP stores a lot of critical information and is vulnerable to attacks such as denial of service (DoS). FV has a direct relationship with SD and CV. In this protocol, FV is therefore regarded as the TTP to reduce the communication complexity of the PUF-based IP trading protocol.

B. PROTOCOL IMPLEMENTATION The proposed anonymous IP authentication protocol includes registration, IP trading, copyright authentication and tracing.

1) REGISTRATION The registration protocol includes chip registration, IP registration and SD registration, as shown in Fig.9. The content is described as follows.

• FPGA Registration

The PUF module is inserted into each manufactured chip Fi. For a chip ID(FiPUF), the PUF CRPs are tested and used to analyze the delay attributes. The delay vector of the PUF is then stored in the database DB for authenticating the identity of ID(FiPUF). FV issues ID(FiPUF) for trading. SD or CV sends ID(FiPUF) to FV and applies to buy FiPUF. FV records the identity of the buyer, ID(SDi) or ID(CVi), and allocates a unique delay vector to it.

• IP Registration

CV applies to FV for IP registration and sends {SCVi, Hash(IPi), Description} to FV. After receiving this registration information from CV, FV generates a random number Nc and a symmetric key KeyFC and calculates the following formulas.

ID(CVi) = Hash(SCVi) (3)

ID(IPi) = Hash(IPi)⊕Nc (4)

KFC = Hash(SCVi)⊕KeyFC (5)

VNc = Hash(SCVi||Nc||KeyFC) (6)

FV is assumed to be trustable. Nevertheless, in the registration stage CV sends only the hash and the description of the IP to FV, which keeps the IP content secure. After allocating the IDs, FV stores {ID(CVi), ID(IPi), Hash(IPi), Description}. {ID(CVi), ID(IPi), KFC, VNc} is returned to CV after registration. CV extracts Nc and KeyFC and verifies whether the received content has been tampered with. If the verification is successful, the registered identity ID(CVi) and the IP registration information {ID(IPi), IPi, Nc, KeyFC, Description} are stored in the database. ID(IPi) is issued for public trading.

• SD Registration

SD needs to buy the software IP design and the FPGA device from CV and FV to realize the complex system. For registration, SD sends the identity SSDi to FV. After receiving the registration information from SD, FV generates a random number Ns and a symmetric key KeyFS. The following equations are calculated.

ID(SDi) = Hash(SSDi)⊕Ns (7)

KFS = Hash(SSDi)⊕KeyFS (8)

VNs = Hash(SSDi||Ns||KeyFS) (9)

FV stores ID(SDi), Ns, SSDi and KeyFS in the database. ID(SDi), KFS and VNs are sent to SD. The identity ID is unique for each SD. No party other than SD and FV knows the real identity behind the ID. Therefore, the identity of SD is anonymous to CV. After receiving the registration information from FV, SD extracts Ns and KeyFS with SSDi and verifies their validity. If the verification is successful, {ID(SDi), Ns, KeyFS} is stored in the database. Otherwise, the registration is invalid.
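The SD registration exchange of equations (7) to (9) can be sketched as below. The sketch rests on several assumptions that the text does not fix: SHA-256 stands in for Hash, Ns and KeyFS are 32-byte random values, and || is plain byte concatenation. The function names are hypothetical.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    """Hash function of the protocol; SHA-256 is an assumption."""
    return hashlib.sha256(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def fv_register_sd(s_sd: bytes):
    """FV side: derive ID(SDi), KFS and VNs from the identity SSDi."""
    n_s = secrets.token_bytes(32)        # random number Ns (assumed 32 bytes)
    key_fs = secrets.token_bytes(32)     # symmetric key KeyFS (assumed 32 bytes)
    id_sd = xor_bytes(h(s_sd), n_s)      # Eq. (7)
    k_fs = xor_bytes(h(s_sd), key_fs)    # Eq. (8)
    v_ns = h(s_sd + n_s + key_fs)        # Eq. (9)
    record = {"id_sd": id_sd, "n_s": n_s, "s_sd": s_sd, "key_fs": key_fs}
    return record, (id_sd, k_fs, v_ns)   # stored by FV, message sent to SD


def sd_verify(s_sd: bytes, id_sd: bytes, k_fs: bytes, v_ns: bytes):
    """SD side: extract Ns and KeyFS with SSDi and check them against VNs."""
    n_s = xor_bytes(id_sd, h(s_sd))
    key_fs = xor_bytes(k_fs, h(s_sd))
    if h(s_sd + n_s + key_fs) != v_ns:
        return None                      # registration is invalid
    return n_s, key_fs


record, message = fv_register_sd(b"example-sd-identity")
print(sd_verify(b"example-sd-identity", *message) is not None)
```

Because ID(SDi) is the hash of the identity masked by the fresh random number Ns, a party that does not know Ns cannot link the ID to SSDi, which is the anonymity property claimed above.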

2) IP TRADING PROTOCOL The trading protocol includes FPGA trading and IP trading. FV makes the IDs and descriptions of FPGA devices public for trading. When SD or CV needs to buy an FPGA device, ID(FiPUF) is used. FV stores the trading record {ID(SDi)/ID(CVi), FiPUF} and sends the FPGA device FiPUF to SD or CV. The IP trading protocol is shown in Fig.10 and proceeds as follows.

Step 1: CV stores ID(IPi) and IPi in the database and issues ID(IPi) for trading IPi.

Step 2: SD sends {ID(SDi), ID(IPi), ID(FiPUF)} to FV and applies for a fingerprint for IP trading.

Step 3: FV verifies the legality of the ID and generates a random number Ni for calculating a temporary identity of SD by equation 10 and a disposable trading password by


TABLE 1. Illustration of symbols in protocol.

FIGURE 10. IP trading protocol.

equation 11. {IDti, ID(IPi),Pi} is sent to CV and SD.

IDti = ID(SDi)⊕Ni (10)

Pi = H(Ni||IDti||ID(SDi)) (11)

Step 4: SD receives the temporary identity IDti and the password Pi for IP trading. SD then generates a random number a, calculates R(a), and sends {ID(FiPUF), IDti, a, R(a), ID(IPi)} to CV.

Step 5: CV finds the IP core with ID(IPi) and inserts the copyright information and the anonymous fingerprint of the IP buyer into the IP core, generating IPwi. CV determines whether IDti exists in the database. If so, IDti is sent to FV.

Step 6: FV calculates ID(SDi) from IDti and finds {ωSDi}ID(SDi). By decryption, ωSDi is obtained and sent to CVi.

Step 7: CVi calculates R′(a) with ωSDi. If R′(a) = R(a), SDi is successfully verified and ID(CVi) is sent to FV.

Step 8: FV searches the database for {ωCVi}ID(CVi) with ID(CVi). {ωCVi}ID(CVi) is decrypted with ID(CVi) to get ωCVi. A random number b is generated and R′(b) is calculated with ωCVi. b is sent to CVi.

Step 9: CVi uses b as the challenge of PUFmod. The response is R(b). R(b) and ID(FiPUF) are sent to FV.

Step 10: If R(b) = R′(b), FV successfully verifies CVi and updates ID(CVi) = ID(CVni).

Step 11: CVi updates ID(CVi) = ID(CVni) and calculates a combined challenge CFiPUF with a and b. The response RFiPUF is generated by PUFmod. The encrypted IP core E(RFiPUF : IPwi) is obtained. {b, ID(FiPUF), E(RFiPUF : IPwi)} is sent to SD.

Step 12: SDi receives E(RFiPUF : IPwi) and calculates the combined challenge with a and b. The response is then generated as a key to decrypt IPwi, making it run normally.

In each trading procedure, SD applies for a unique fingerprint to ensure the anonymity of trading. Even if the same SD buys IP cores several times, CV knows nothing about the real identity of the customer. IPwi realizes passive copyright authentication and tracing after the active encryption protection is cracked. The temporary identity IDti of SD and the signature of CV can be used as the fingerprint of the IP buyer and the copyright information, respectively. In the anonymous IP authentication scheme, the identities of SDi and CVi should be authenticated first to prevent the decrypted IP cores from being obtained by illegal users. In the double PUF authentication model, FV generates a random number b as the challenge to verify the legality of CVi. FV and CVi do not need to store all CRPs in advance, which gives advantages in resource overhead, security and applicability.
• There are two authentication processes before IP decryption. First, CVi authenticates SDi. In registration, a unique ID ID(SDi) is allocated to SDi and bound to a delay vector ωSDi. CVi applies to search {ωSDi}ID(SDi) with IDti. After decryption with ID(SDi), ωSDi is sent to CVi. CVi uses the random number a generated by SD as the challenge of the PUF with ωSDi and calculates the response R′(a). If R′(a) and R(a) are consistent, the identity of SDi is successfully authenticated. Only a legal SDi can decrypt the IP content.

• Second, FV authenticates CVi. CVi sends ID(CVi) to FV. FV searches {ωCVi}ID(CVi) in the database with ID(CVi). After decryption with ID(CVi), ωCVi is obtained. A random number b acts as the challenge of the PUF with ωCVi, and the response is R′(b). The random number b is sent to CVi. CVi uses b as the challenge of its PUF, producing R(b). R(b) and ID(FiPUF) are sent to FV for comparison. If R(b) is equal to R′(b), CVi is successfully authenticated. ID(CVi) should then be updated to prevent a legal CVi from leaking the delay vector; otherwise, an illegal CVi could cheat FV and pass the authentication.
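The identity handling in the trading protocol can be sketched as follows, covering the temporary identity of equation (10), the disposable password of equation (11), and the way FV maps IDti back to ID(SDi) in Step 6. SHA-256 as the hash H and byte concatenation for || are assumptions. The rule for combining a and b into one challenge is not given in the text, so combined_challenge below is purely hypothetical and only illustrates that both random numbers influence the result.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    """Hash function H of the protocol; SHA-256 is an assumption."""
    return hashlib.sha256(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def fv_issue_credentials(id_sd: bytes):
    """FV side of Step 3: temporary identity and disposable password."""
    n_i = secrets.token_bytes(len(id_sd))   # random number Ni
    id_t = xor_bytes(id_sd, n_i)            # Eq. (10)
    p_i = h(n_i + id_t + id_sd)             # Eq. (11)
    return n_i, id_t, p_i


def fv_resolve(id_t: bytes, n_i: bytes) -> bytes:
    """FV side of Step 6: recover ID(SDi) from the temporary identity."""
    return xor_bytes(id_t, n_i)


def fv_check_password(p_i: bytes, id_t: bytes, n_i: bytes) -> bool:
    """FV recomputes the password from the values it holds."""
    id_sd = fv_resolve(id_t, n_i)
    return p_i == h(n_i + id_t + id_sd)


def combined_challenge(a: bytes, b: bytes) -> bytes:
    """Hypothetical combining rule for the jointly generated challenge."""
    return h(a + b)


id_sd = h(b"example-sd")                    # stand-in for a registered ID(SDi)
n_i, id_t, p_i = fv_issue_credentials(id_sd)
print(fv_resolve(id_t, n_i) == id_sd, fv_check_password(p_i, id_t, n_i))
```

Since a fresh Ni is drawn for each trade, two purchases by the same SD yield unrelated temporary identities from the point of view of CV, while FV can still resolve either of them.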

3) COPYRIGHT AUTHENTICATION AND INFRINGEMENT TRACING In this section, we consider hardware authentication and software IP authentication. Two cases are considered.
• A legal SD buys a hardware FPGA device from a seller. SD can send ID(FiPUF) to FV to authenticate the legality of the device. FV searches the database to determine whether the hardware ID exists. If so, a randomly selected CRP is returned to SD for authentication.


After receiving the authentication information, SD calculates the PUF response to the challenge. If the response is equal to that from FV, the hardware identity is legal. Otherwise, SD can inform FV of the forgery. Both SD and FV can then track the forger and pursue the infringement.

• If CV finds that an IP core has been misappropriated, CV can apply to authenticate the IP copyright. CV sends the identity ID(IPi) of the suspected IP and ID(FiPUF) to FV. With the participation of FV, IDti and SSDi are extracted from the suspected IP. If the extraction is successful, the IP copyright is proven. FV uses the extracted temporary identity IDti of SD to look up ID(SDi) and the real identity in order to track the infringement.

V. SECURITY ANALYSIS The security analysis mainly covers counterfeit attacks, modeling attacks, replay attacks and anonymity. In a counterfeit attack, an attacker pretends to be a legal participator and steals key information. A modeling attack learns the responses of the PUF in the protocol and builds a PUF model that behaves like the original. A replay attack uses historical challenges to generate the corresponding response key. Anonymity means that a participator uses a temporary identity for privacy protection. The detailed analysis is as follows.

A. COUNTERFEIT ATTACK In the proposed protocol, an attacker cannot pretend to be the trustable FV. FV is a trustable participator responsible for registration, authentication, etc. The PUF hardware circuit is implemented in each manufactured chip, and the CRPs of the PUF are analyzed to build the PUF model. An attacker would need to obtain all device and trading information in order to impersonate FV. However, it is very difficult for an attacker to obtain this information since it is critical to FV. Besides, SD and CV encrypt their identities with the public key of FV before transmission, so only the trustable FV can decrypt them and obtain the real identity information. Take SD as an example: FV calculates ID(SDi) = Hash(SSDi) ⊕ Ns and KFS = Hash(SSDi) ⊕ KeyFS. Ns and KeyFS are then sent to SD. No user other than SD can obtain Ns and KeyFS. A one-way hash function can verify whether the received information is from the trustable FV. Similarly, Ns and KeyFS can also be verified. In the trading process, SD verifies whether VN1 = Hash(SSD1||Ns||KeyFS) holds, thereby authenticating FV. The use of the hash function for authentication prevents an attacker from impersonating FV. Moreover, the disposable trading password Pi = H(Ni||IDti||ID(SDi)) is used in trading, where Ni is a random key. The trustable FV can verify Pi to determine the validity of SD, so an attacker cannot pretend to be a legal SD.

B. MODELLING ATTACK

The modeling attack is the biggest threat to the arbiter PUF. The PUF response R mainly depends on the challenge C and the inner delay vector ω. With enough PUF CRPs, the inner delay vector ω can be calculated and used to model a simulated PUF. An attacker may use machine learning to perform a modeling attack. A suitability function f(·) is required to determine which PUF model is closest to the original one for a given ω. However, machine learning is only suitable for single, simple PUFs, because an XOR operation can mix PUF responses and improve PUF security effectively. In the proposed protocol, PUF challenges are constituted by random numbers from both authentication parties, so a malicious attacker cannot control a complete PUF challenge in one round of authentication. The protocol does not transmit all CRPs directly. The FPGA manufacturer implements a PUF in each manufactured chip and tests all PUF CRPs via an access point. After a simulated PUF is built from the CRPs, the access point is destroyed permanently. The simulated PUF behaves similarly to the original one, so FV can use the model to authenticate the device containing the original PUF. Besides, an attacker requires at least Nmin CRPs to mount a modeling attack on an N-level PUF [27]. Here, Nmin = N/e, where e is an error threshold. If the PUF model has an accuracy rate of 90%, the error threshold is 10%. In this work, a 2-XOR PUF is used, which has better security than the traditional one. In the double PUF authentication model, it is difficult for an attacker to obtain enough complete CRPs for a modeling attack.
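As a quick worked instance of the bound Nmin = N/e, the sketch below evaluates it with integer arithmetic. The value N = 128 is an assumption chosen only for illustration, and the function name is hypothetical; the 10% error threshold is the one quoted in the text.

```python
def min_crps_for_modeling(n_levels: int, error_percent: int) -> int:
    """Return Nmin = N / e, rounded up, with e given as a whole percentage."""
    if n_levels <= 0 or not 0 < error_percent <= 100:
        raise ValueError("n_levels must be positive and error_percent in 1..100")
    # Integer ceiling division avoids floating-point rounding of e.
    return -(-(n_levels * 100) // error_percent)


# Assumed N = 128 levels; a 90% model accuracy gives e = 10%.
print(min_crps_for_modeling(128, 10))  # 1280
```

Under these assumptions an attacker would need on the order of 1280 complete CRPs, which is the quantity the double PUF model is meant to keep out of reach.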

C. REPLAY ATTACK

Resistance to replay is analyzed in two aspects. On one hand, transmitted CRPs cannot be captured by an illegal attacker during hardware authentication. PUF challenges are generated jointly by FV and SD to prevent an attacker from capturing a complete challenge. In the worst case, the attacker can obtain half of a challenge. The proposed PUF structure has a good avalanche effect: one changed bit of the PUF challenge causes over half of the PUF response bits to flip. An attacker therefore cannot mount a replay attack even after capturing half of a PUF challenge. On the other hand, a legal SD may forge the IP copyright by a replay attack and pose as the IP owner. In registration, CV applies to FV for identity authentication and IP registration. When CV needs to authenticate IP ownership, it can extract the identity information and the temporary identity of the buyer from the IP design. FV can participate in the authentication and track the real identity of the IP buyer with the temporary identity, so the existence of the trading record can be proven. A malicious SD may also extract forged copyright information, but it cannot convince FV and CV. In the worst case, a malicious SD removes the copyright information of CV and the fingerprint of the IP buyer, so that the IP design loses passive protection. In this case, the hash message of the IP design is compared to the one stored by FV. If both are consistent, the counterfeit behavior of SD is proven. However, if SD inserts the fingerprint into the IP design along with his own signature, the hash message will differ from that in the database of FV. If the trading record exists, the IP bitstream can also be analyzed. If the result exceeds the threshold, SD has also probably forged the IP copyright.

124792 VOLUME 7, 2019

J. Long et al.: PUF-Based Anonymous Authentication Scheme for Hardware Devices and IPs in Edge Computing Environment

TABLE 2. Comparison of PUF resource overhead.

D. ANONYMITY

In registration, the trustable FV calculates the identity of SD from the real identity information SSDi and a random key Ns. In the trading procedure, FV generates a new temporary trading identity IDti for SD from ID(SDi) and Ni. The temporary trading identity ensures the privacy and security of the IP buyer. The anonymous identity of SD is sent to CV and inserted into the IP design as tracking evidence of infringement. Anonymity makes it difficult for an illegal CV to pose as a legal SD, resell the IP, and falsely accuse SD for compensation. If SD resells the IP illegally, CV can extract the identity information to prove IP ownership. The extracted anonymous identity of SD can be sent to FV to track the infringement.

VI. EXPERIMENT ANALYSIS

In this section, experiments are conducted on a Xilinx Virtex-5 FPGA for performance evaluation. The design tool ISE, the logic synthesis software Synplify, and the simulation tool ModelSim are used in the experiments. The PUF circuit is implemented in the Virtex-5 FPGA device. A 128-bit binary sequence is then generated by a random function and preset in the LFSR for challenge generation. With each shift pulse, the generated challenge is input to the PUF circuit, producing a PUF response. This section mainly evaluates the resource overhead and the PUF performance.

A. RESOURCE OVERHEAD

This work implements a 128-bit PUF response via a 2-XOR PUF circuit. Besides the hardware resource overhead of the PUF itself, assistant modules such as challenge generation, the extension function, and signal voting also consume hardware resources. The PUFs used for comparison are the 2-1 DAPUF [25] and the built-in self-adjustable PUF [30]. The comparison result is listed in Table 2.

As shown in Table 2, the proposed PUF performs well in resource overhead. The built-in self-adjustable PUF determines the delay of two delay paths in implementation. It achieves good uniqueness and stability at the cost of hardware resources. The 2-1 DAPUF includes two arbiters, and an XOR operation is used to improve the uniqueness. The proposed PUF uses four delay paths, but its resource overhead is 31.61% lower than that of the self-adjustable PUF and 61.96% lower than that of the 2-1 DAPUF.

B. PERFORMANCE EVALUATION

This section evaluates the randomness, stability, and uniqueness of the PUF. The equations for these metrics are taken from [9], and the evaluation results are analyzed.

1) RANDOMNESS

A PUF generates an unpredictable response. Namely, a PUF response with good randomness is difficult to predict from the input challenge, which yields better security. Ideally, the numbers of 0 bits and 1 bits in the response signal are the same. In other words, ratios of 0 and 1 that are each close to 50% in the PUF response demonstrate good randomness. The randomness can be represented by equation (12). Here, RD is the quantified value of randomness, n is the number of bits in the response, l denotes the index of a bit in the response, and ri,l is the bit value at the l-th position.

$$RD = \frac{1}{n} \sum_{l=1}^{n} r_{i,l} \times 100\% \tag{12}$$

By sampling the generated 128-bit PUF response, the distribution of 0 bits and 1 bits is recorded. The result shows that RD is 48.64%. The difference from the ideal value is only 2.36%. The randomness performance is encouraging.
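Equation (12) is simply the percentage of 1 bits in a response. The following sketch shows the calculation; the function name and the 8-bit example response are hypothetical and are not measured data.

```python
def randomness(response_bits):
    """Return RD, the percentage of 1 bits in a response, as in equation (12)."""
    n = len(response_bits)
    if n == 0:
        raise ValueError("response must contain at least one bit")
    if any(bit not in (0, 1) for bit in response_bits):
        raise ValueError("response bits must be 0 or 1")
    return sum(response_bits) / n * 100.0


# Hypothetical 8-bit response with four 1 bits, for illustration only.
example = [1, 0, 1, 1, 0, 0, 1, 0]
print(randomness(example))  # 50.0
```

For the 128-bit responses used in the experiment, the same function would be applied to a list of 128 bits.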

2) STABILITY

Stability means that the response of a PUF is reliable. It can be quantified by the intra-variance, which represents the number of changed bits in the response signal when the same challenges are input to a PUF in different environments. In theory, the response does not change. In practice, however, it is affected by external factors. If the differences among multiple responses to the same challenges fall within a preset threshold, they are acceptable. The intra-variance is measured by HDintra. A value of HDintra close to 0 indicates good stability of the PUF. Let m and n be the number of responses and the number of bits in a response, respectively. R′i,k represents the response in the k-th round. On this basis, the stability of the PUF can be calculated by equation (13).

$$S = 1 - HD_{\mathrm{intra}} = 1 - \frac{1}{m} \sum_{k=1}^{m} \frac{HD(R_i, R'_{i,k})}{n} \times 100\% \tag{13}$$

In this experiment, a random challenge is generated to evaluate the stability of the response. The FPGA device is partitioned into 15 regions, each of which can implement an independent PUF circuit. A challenge is input to the PUF in each region five times, and the Hamming distances of the responses are recorded. In each region, the five responses form 10 pairs, giving 10 results. Across the 15 regions, there are 15 groups of results from repeating the challenge.
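The stability metric of equation (13) and the pair counts in this experiment can be checked with a short sketch. It assumes that the percentage form is read as S = (1 − HDintra) × 100 and that a reference response is compared with each repeated response; the function names and the 8-bit example data are hypothetical.

```python
from math import comb


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))


def stability(reference, repeats):
    """Return S in percent for a reference response and m repeated responses."""
    n = len(reference)
    m = len(repeats)
    if n == 0 or m == 0:
        raise ValueError("need a non-empty reference and at least one repeat")
    hd_intra = sum(hamming_distance(reference, r) for r in repeats) / (m * n)
    return (1.0 - hd_intra) * 100.0


# Hypothetical data: five repeated reads, one of which has a single flipped bit.
reference = [0, 1, 1, 0, 1, 0, 0, 1]
repeats = [list(reference) for _ in range(4)] + [[0, 1, 1, 0, 1, 0, 0, 0]]
print(stability(reference, repeats))

# Pair counts used in the experiment: 5 reads per region, 15 regions.
print(comb(5, 2))       # 10
print(15 * comb(5, 2))  # 150
```

The last two lines reproduce the counts in the text: five responses per region give 10 pairwise distances, and 15 regions give 150 in total.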

In Fig. 11, the x-axis is the percentage of Hamming distance and the y-axis is the density of Hamming distance in a region. The statistics of the 150 Hamming distances show that about 71.3% of the results fall within 0–1%. Namely, the majority of PUF responses have an instability of less than 1%.

FIGURE 11. Average intra hamming distance.

FIGURE 12. Stability of PUF response under the same temperature.

FIGURE 13. Stability of PUF response under different temperatures.

In Fig. 12, the x-axis is the region and the y-axis is the stability percentage of the PUF. When a challenge is repeated several times, the PUF in a region generates a constant response in theory. By equation (13), if the response is constant, HDintra is equal to 0 and the stability is 100%. The evaluation result for each point on the x-axis can be regarded as an average value over multiple PUFs. Owing to the signal voting module, the PUF selects the output that appears most often as the final response. The average stability value is 99.54%.
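The effect of the signal voting module can be illustrated with a per-bit majority vote over repeated reads. This is a software sketch under stated assumptions, not the hardware module itself: voting is assumed to be applied bit by bit, an odd number of reads is assumed so that ties cannot occur, and the function name and sample data are hypothetical.

```python
def majority_vote(samples):
    """Return the per-bit majority over an odd number of equal-length reads."""
    if not samples:
        raise ValueError("at least one sample is required")
    if len(samples) % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same length")
    threshold = len(samples) // 2
    # zip(*samples) groups the reads of each bit position together.
    return [1 if sum(bits) > threshold else 0 for bits in zip(*samples)]


# Hypothetical reads of a 4-bit response; each noisy bit is outvoted.
reads = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
print(majority_vote(reads))  # [1, 0, 1, 1]
```

An occasional flipped bit in one read does not reach the final response, which is consistent with the high average stability reported above.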

Besides, we evaluate the impact of environmental factors such as temperature on PUF stability. A hairdryer is used to produce environmental temperatures from 25 °C to 70 °C. The stability of the PUF in a region is evaluated at different temperatures, and the result is shown in Fig. 13. As the temperature changes from 25 °C to 70 °C, the instability of the PUF stays within 1%, demonstrating good stability.

FIGURE 14. Density distribution of average hamming distances for pairs of PUF response in different regions.

3) UNIQUENESS

Uniqueness means that the response of a PUF is unclonable. It can be quantified by the inter-variance. PUFs with the same structure are deployed on different chips. The inter-variance is measured by the number of differing bits in the response signals when the same challenge is input to the deployed PUFs. The differences in the physical structure of ICs are randomly distributed, so the structures of different chips should be unclonable. Uniqueness can be evaluated by equation (14).

$$U = HD_{\mathrm{inter}} = \frac{2}{t(t-1)} \sum_{i=1}^{t-1} \sum_{j=i+1}^{t} \frac{HD(R_i, R_j)}{n} \times 100\% \tag{14}$$

Here, HDinter is the average inter Hamming distance. HD(Ri,Rj) is the inter Hamming distance between two PUFs. Ri and Rj are the responses of the i-th and j-th PUFs. t denotes the number of PUFs in the experiment. The quantified value of uniqueness is the average of the Hamming distances between all response pairs of the t PUFs.

In this experiment, the FPGA is again partitioned into 15 regions, each of which implements a PUF, to simulate PUF implementation on different chips. With the same challenges, the numbers of 0 bits and 1 bits in the PUF responses are recorded. The Hamming distances between pairs of responses are also calculated, producing 105 results. The density distribution of these Hamming distances is shown in Fig. 14. In this figure, the x-axis is the percentage of Hamming distance and the y-axis is the distribution density of Hamming distance in a region. A high bar in the histogram indicates a large distribution density of Hamming distances in the corresponding region.
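Equation (14) averages the normalized Hamming distance over all unordered pairs of PUF responses. The sketch below implements that average and confirms the pair count for 15 PUFs; the function names and the 4-bit example responses are hypothetical.

```python
from itertools import combinations
from math import comb


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))


def uniqueness(responses):
    """Return U in percent: the mean pairwise Hamming distance over t PUFs."""
    t = len(responses)
    if t < 2:
        raise ValueError("at least two responses are required")
    n = len(responses[0])
    if n == 0:
        raise ValueError("responses must contain at least one bit")
    total = sum(hamming_distance(a, b) / n for a, b in combinations(responses, 2))
    return 2.0 / (t * (t - 1)) * total * 100.0


# Hypothetical 4-bit responses from three PUFs, for illustration only.
responses = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
print(uniqueness(responses))

# Fifteen regions yield t(t-1)/2 unordered pairs.
print(comb(15, 2))  # 105
```

The factor 2/(t(t−1)) is the reciprocal of the number of pairs, so for t = 15 the sum runs over the 105 results mentioned in the text.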

The same PUF implemented on different chips generates different responses to the same challenges. In the ideal case, the difference is 50%. However, the measured value may differ from the ideal value. In this experiment, the maximum and minimum values of the inter Hamming distance are 62.5% and 21.9%, corresponding to 80 and 28 differing bits, respectively. The average of the 105 Hamming distances is 59.95 bits, or about 46.84%. Different FPGAs differ more than different regions within one FPGA, so the PUF is expected to perform better across different FPGAs.

TABLE 3. PUF performance comparison.

Finally, Table 3 lists the comparison of three PUFs. The first column gives the ideal values of the metrics. The self-adjustable APUF improves PUF uniqueness and randomness through its self-adjustable module. It has better randomness than the proposed PUF but consumes more hardware resources. Compared with the self-adjustable PUF and the 2-1 DAPUF, the proposed PUF improves stability by 7.17% and 12.86%, respectively. The uniqueness of the proposed PUF is slightly better than that of the 2-1 DAPUF.

VII. CONCLUSION

Hardware authentication issues are critical in edge computing and IoT environments. To address these issues, a PUF-based anonymous IP authentication technique is proposed for both hardware FPGAs and software IP designs. When an infringement occurs, the double PUF protocol can be used for authentication. In hardware authentication, challenge information is generated jointly by both authentication parties, which resists replay attacks and modeling attacks. In the double PUF authentication protocol, the FPGA vendor does not need to store numerous PUF CRPs, which saves considerable storage. The IP copyright information and the anonymous identity of the IP buyer are inserted into the IP design before trading, which realizes passive IP protection and infringement tracing. The anonymity protects the interests of the IP buyer while allowing IP infringement to be tracked with the participation of the trustable device vendor.

REFERENCES

[1] W. Z. Khan, M. Y. Aalsalem, and M. K. Khan, “Communal acts of IoT consumers: A potential threat to security and privacy,” IEEE Trans. Consum. Electron., vol. 65, no. 1, pp. 64–72, Feb. 2019.

[2] L. Zhang and C.-H. Chang, “A pragmatic per-device licensing scheme for hardware IP cores on SRAM-based FPGAs,” IEEE Trans. Inf. Forensics Security, vol. 9, no. 11, pp. 1893–1905, Nov. 2014.

[3] Q. Xiang, P. Zhang, and D. Ouyang, “Multiple frequency slots based physical unclonable functions,” J. Electron. Inf. Technol., vol. 34, no. 8, pp. 2007–2012, Aug. 2012.

[4] W. Liang, B. Liao, J. Long, Y. Jiang, and L. Peng, “Study on PUF based secure protection for IC design,” Microprocessors Microsyst., vol. 45, pp. 56–66, Aug. 2016.

[5] M. T. Rahman, F. Rahman, D. Forte, and M. Tehranipoor, “An aging-resistant RO-PUF for reliable key generation,” IEEE Trans. Emerg. Topics Comput., vol. 4, no. 3, pp. 335–348, Jul./Sep. 2016.

[6] Z. Huang and Q. Wang, “A PUF-based unified identity verification framework for secure IoT hardware via device authentication,” World Wide Web, Apr. 2019, pp. 1–32.

[7] A. Sengupta, D. Roy, and S. P. Mohanty, “Triple-phase watermarking for reusable IP core protection during architecture synthesis,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 4, pp. 742–755, Apr. 2018.

[8] Q. Guo, J. Ye, Y. Gong, Y. Hu, and X. Li, “PUF based pay-per-device scheme for IP protection of CNN model,” in Proc. IEEE 27th Asian Test Symp. (ATS), Oct. 2018, pp. 115–120.

[9] J. Zhang, Y. Lin, and Y. Lyu, “A PUF-FSM binding scheme for FPGA IP protection and pay-per-device licensing,” IEEE Trans. Inf. Forensics Security, vol. 10, no. 6, pp. 1137–1150, Jun. 2015.

[10] Q. Ma, C. Gu, N. Hanley, C. Wang, W. Liu, and M. O’Neill, “A machine learning attack resistant multi-PUF design on FPGA,” in Proc. 23rd Asia South Pacific Des. Automat. Conf. (ASP-DAC), Jan. 2018, pp. 97–104.

[11] Z. Siddiqui, O. Tayan, and M. K. Khan, “Security analysis of smartphone and cloud computing authentication frameworks and protocols,” IEEE Access, vol. 6, pp. 34527–34542, 2018.

[12] D. He, N. Kumar, M. K. Khan, L. Wang, and J. Shen, “Efficient privacy-aware authentication scheme for mobile cloud computing services,” IEEE Syst. J., vol. 12, no. 2, pp. 1621–1631, Jun. 2018.

[13] G. Li, P. Wang, and Y. Zhang, “A highly reliable lightweight PUF circuit with temperature and voltage compensated for secure chip identification,” in Proc. IEEE 12th Int. Conf. ASIC (ASICON), Oct. 2018, pp. 60–63.

[14] R. Pappu, B. Recht, J. Taylor, and N. Gershenfeld, “Physical one-way functions,” Science, vol. 297, no. 5589, pp. 2026–2030, Sep. 2002.

[15] D. Li, W. Liu, X. Zou, and Z. Liu, “Hardware IP protection through gate-level obfuscation,” in Proc. 14th Int. Conf. Comput.-Aided Des. Comput. Graph. (CAD/Graphics), Aug. 2015, pp. 186–193.

[16] S. S. Kumar, J. Guajardo, R. Maes, G.-J. Schrijen, and P. Tuyls, “Extended abstract: The butterfly PUF protecting IP on every FPGA,” in Proc. IEEE Int. Workshop Hardware-Oriented Secur. Trust, Jun. 2008, pp. 67–70.

[17] J. Guajardo, S. S. Kumar, G.-J. Schrijen, and P. Tuyls, “Physical unclonable functions and public-key crypto for FPGA IP protection,” in Proc. Int. Conf. Field Program. Logic Appl., Aug. 2007, pp. 189–195.

[18] M. A. Gora, A. Maiti, and P. Schaumont, “A flexible design flow for software IP binding in commodity FPGA,” in Proc. IEEE Int. Symp. Ind. Embedded Syst., Jul. 2009, pp. 211–218.

[19] E. Simpson and P. Schaumont, “Offline hardware/software authentication for reconfigurable platforms,” in Proc. Int. Workshop Cryptograph. Hardw. Embedded Syst., Oct. 2006, pp. 311–323.

[20] J. Guajardo, S. S. Kumar, G.-J. Schrijen, and P. Tuyls, “FPGA intrinsic PUFs and their use for IP protection,” in Proc. Int. Workshop Cryptograph. Hardw. Embedded Syst., 2007, pp. 63–80.

[21] J. Zhang, Y. Lin, Y. Lyu, R. C. C. Cheung, W. Che, Q. Zhou, and J. Bian, “Binding hardware IPs to specific FPGA device via inter-twining the PUF response with the FSM of sequential circuits,” in Proc. IEEE 21st Annu. Int. Symp. Field-Program. Custom Comput. Mach., Apr. 2013, p. 227.

[22] J. Zhang, Y. Lin, and G. Qu, “Reconfigurable binding against FPGA replay attacks,” ACM Trans. Design Autom. Electron. Syst., vol. 20, no. 2, Feb. 2015, Art. no. 33.

[23] J. Zhang, Q. Wu, Y. Lyu, Q. Zhou, Y. Cai, Y. Lin, and G. Qu, “Design and implementation of a delay-based PUF for FPGA IP protection,” in Proc. Int. Conf. Comput.-Aided Des. Comput. Graph., Nov. 2013, pp. 107–114.

[24] G. Zhang, Q. Liu, and Q. Zhang, “Low cost and high performance RO-PUF design for IP protection of FPGA implementations,” Xi’an Dianzi Keji Daxue Xuebao/J. Xidian Univ., vol. 43, no. 6, pp. 97–102, Dec. 2016.

[25] T. Machida, D. Yamamoto, M. Iwamoto, and K. Sakiyama, “Implementation of double arbiter PUF and its performance evaluation on FPGA,” in Proc. 20th Asia South Pacific Des. Automat. Conf., Jan. 2015, pp. 6–7.

[26] Z. Liu, B. Liu, and Z. Lu, “FPGA design of low resource consumed arbiter PUF,” J. Huazhong Univ. Sci. Technol., vol. 2, pp. 5–8, Feb. 2016.

[27] J. Delvaux, R. Peeters, D. Gu, and I. Verbauwhede, “A survey on lightweight entity authentication with strong PUFs,” ACM Comput. Surv., vol. 48, no. 2, Nov. 2015, Art. no. 26.

[28] M. Majzoobi, M. Rostami, F. Koushanfar, D. S. Wallach, and S. Devadas, “Slender PUF protocol: A lightweight, robust, and secure authentication by substring matching,” in Proc. IEEE Symp. Secur. Privacy Workshops, May 2012, pp. 33–44.

[29] R. Maes, D. Schellekens, and I. Verbauwhede, “A pay-per-use licensing scheme for hardware IP cores in recent SRAM-based FPGAs,” IEEE Trans. Inf. Forensics Security, vol. 7, no. 1, pp. 98–108, Feb. 2012.

[30] Y. Gong, J. Ye, and Y. Hu, “Built-in self adjustable arbiter PUF,” J. Comput.-Aided Des. Comput. Graph., vol. 29, no. 9, pp. 1734–1739, 2017.


JING LONG received the M.S. degree from the College of Computer Science and Engineering, Hunan University of Science and Technology, China, in 2012, and the Ph.D. degree from the College of Computer Science and Electronic Engineering, Hunan University, China, in 2018. She is currently a Lecturer with the College of Information Science and Engineering, Hunan Normal University. Her current research interests include hardware security, the Internet of Things, and network security.

WEI LIANG received the Ph.D. degree in computing science from Hunan University, China, in 2013. He is currently a Full Professor with the School of Opto-Electronic and Communication Engineering, Xiamen University of Technology, China. His current research interests include steganography, real-time embedded systems, field programmable gate arrays, and vehicle networks.

KUAN-CHING LI (SM’07) is currently a Distinguished Professor with the Department of Computer Science and Information Engineering, Providence University, Taiwan. Besides publishing numerous research papers and articles, he is coauthor/co-editor of several technical professional books published by CRC Press/Taylor & Francis, Springer, McGraw-Hill, and IGI Global. His research interests include parallel and distributed processing, GPU/many-core computing, and big data and cloud. He is a member of the AAAS, a Life Member of the TACC, and a Fellow of the IET. He received distinguished and chair professorships from universities in China and other countries, and is a recipient of awards and funding support from several agencies and high-tech companies. He has been actively involved in several major conferences and workshops in program/general/steering conference chairman positions and has organized numerous conferences on high-performance computing and computational science and engineering.

DAFANG ZHANG was born in Shanghai. He received the Ph.D. degree in applied mathematics from Hunan University, China, in 1997, where he is currently a Professor, a Doctor, and a Ph.D. Supervisor with the College of Computer Science and Electronic Engineering. His research interests include dependable systems/networks, network security, network measurement, hardware security, and IP protection.

MINGDONG TANG received the B.S. degree in electrical engineering from Tianjin University, Tianjin, China, in 2000, the M.S. degree in control engineering from Shanghai University, Shanghai, China, in 2003, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2010. He is currently a Professor with the School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China, and the Guangdong Provincial Key Laboratory of Computational Intelligence and Cyberspace Information, South China University of Technology, Guangzhou. He has published more than 100 peer-reviewed scientific research papers in various journals and conferences. His research interests include information security, service-oriented computing, data mining, and blockchain. He is a member of China Computer Federation and ACM.

HAIBO LUO received the B.E. degree in communication engineering from the Wuhan University of Technology, China, in 2006, the M.E. degree in information and communication engineering from Hunan University, China, in 2009, and the Ph.D. degree with the College of Physics and Information Engineering, Fuzhou University, Fuzhou, China. His research interests include the Internet of Things, the cognitive Internet of Things, edge computing, and mobile computing.



sources/160/send.pdf

A Dissertation

entitled

PUF based FPGAs for Hardware Security and Trust

Muslim Mustapa

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the

Doctor of Philosophy Degree in Electrical Engineering

Dr. Mohammed Niamat, Committee Chair

Dr. Mansoor Alam, Committee Member

Dr. Jackson Carvalho, Committee Member

Dr. Junghwan Kim, Committee Member

Dr. Weiqing Sun, Committee Member

Dr. Patricia R. Komuniecki, Dean

College of Graduate Studies

The University of Toledo

August 2015

This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.


An Abstract of

PUF based FPGAs for Hardware Security and Trust

Muslim Mustapa

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the

Doctor of Philosophy Degree in Electrical Engineering

The University of Toledo

August 2015

Hardware security threats have become a major issue in the technology sector and cyberspace. In 2011, more than 1300 counterfeit incidents were reported from around the world to the Electronic Resellers Association International (ERAI). The incidents reported in 2011 were more than double the incidents reported in 2010 and 2008. The federal contract report states that counterfeiting of electronic parts has threatened the operability and reliability of the US weapons system. Electronic parts counterfeiting has become a very big business perpetrated by corrupt operators.

Just like ASIC semiconductors, reconfigurable hardware is also prone to hardware security threats. The most commonly used reconfigurable hardware is the Field Programmable Gate Array (FPGA). Demand for FPGAs has increased, as can be seen from the growth of FPGA companies such as Xilinx and Altera. Despite the increased demand for and use of FPGAs in the market, there is great concern that security is not currently part of FPGA hardware and software to the fullest extent. Design theft and hardware tampering threats on FPGAs can be dealt with using a Ring Oscillator Physical Unclonable Function (ROPUF). A ROPUF takes advantage of the process variation on a silicon chip to generate a unique ID for the purpose of authentication. A ROPUF can be implemented on an FPGA chip to produce a unique ID for each FPGA chip. An adversary that tries to tamper with the ROPUF inadvertently changes the properties of the process variation in the silicon chip; thus any tampering attempt can be detected.

In this research, ROPUF-based hardware security for FPGAs is presented. A total of 50 Xilinx FPGAs are used in our investigation. Performance in terms of uniqueness and reliability is evaluated. The effects of temperature variation, voltage variation, and aging on these parameters are also studied. Our work shows that a lower number of stages in the Ring Oscillator (RO) offers better security: ROs with fewer stages yield a higher number of Challenge and Response Pairs (CRPs), and a higher number of CRPs contributes to higher security. In addition, we have introduced a technique called Random Patch Mixer (RPM) to minimize the effect of systematic variations on the frequencies generated from ROPUFs on FPGAs. The results obtained using the RPM technique are shown to be better than those of previously proposed techniques. The responses generated from the ROPUF after applying the RPM technique passed most of the NIST statistical tests for randomness. Finally, we show how the ROPUF can be used for the security of a Smart Grid. The security of the ROPUF system is also tested using a support vector machine (SVM). The SVM is trained on a large data set of challenges to predict the response sets. The results show that the SVM fails to predict ROPUF responses from the challenges, which supports the security offered by the proposed authentication system.

Acknowledgements

My wonderful Ph.D. journey would never have been a success without the great people around me. Firstly, I would like to thank my advisor, Dr. Mohammed Niamat, for his guidance, advice, and support throughout my time here at the University of Toledo. Secondly, my thanks go to my Ph.D. committee members, who have spent their time reviewing my research work and sharing their thoughts with me. I cannot forget to thank all my colleagues for the fun times that we have had together. I also would like to use this opportunity to thank my sponsor, the Malaysian Government, which has given me the opportunity to further my Ph.D. studies. Finally, and most importantly for me, I would like to thank my wife, children, parents, and family for their continuous support, understanding, and sacrifices throughout this Ph.D. journey.

Table of Contents

Abstract .............................................................................................................................. iii

Acknowledgements ..............................................................................................................v

Table of Contents ............................................................................................................... vi

List of Tables .......................................................................................................................x

List of Figures ................................................................................................................... xii

List of Abbreviations .........................................................................................................xv

1 Introduction ..............................................................................................................1

1.1 Motivation ..........................................................................................................2

1.2 Research Objectives ...........................................................................................3

2 Research Background ..............................................................................................5

2.1 Process Variations ..............................................................................................5

2.2 Physical Unclonable Functions (PUFs) .............................................................6

2.2.1 Arbiter Physical Unclonable Function (APUF) ....................................6

2.2.2 Butterfly Physical Unclonable Function (BPUF) .................................7

2.2.3 Ring Oscillator Physical Unclonable Function (ROPUF) ...................7

2.2.4 PUF Implementation on FPGA .............................................................8

2.3 Challenge and Response on ROPUF .................................................................9

vii

2.4 RO Delay .........................................................................................................10

3 Frequency Uniqueness in ROPUF on FPGA .........................................................11

3.1 Introduction ......................................................................................................11

3.2 Experimental Setup ..........................................................................................12

3.3 Results and Analysis ........................................................................................14

3.3.1 Impact of Number of Stages on PUFs ................................................14

3.3.2 Explanation of High Frequency Variation in a 3-Stage RO ...............16

3.4 Summary ..........................................................................................................20

4 Relationship between Number of Stages in ROPUF and CRP Generation ...........21

4.1 Introduction ......................................................................................................21

4.2 Background ......................................................................................................22

4.2.1 Number of Stages of the RO ...............................................................23

4.2.2 RO Parameters ....................................................................................24

4.3 Experimental Setup ..........................................................................................26

4.4 Results and Analysis ........................................................................................29

4.5 Summary ..........................................................................................................36

5 A Novel RPM Technique to Minimize Systematic Variations ............................38

5.1 Introduction ......................................................................................................38

5.2 Systematic Variations and Bit Flip Minimization ...........................................40

5.2.1 The Effect of Systematic Variations ...................................................40

5.2.2 Regression Based Distiller ..................................................................40

5.2.3 RPM Technique ..................................................................................42

5.3 Results and Discussion ....................................................................................43

5.3.1 Systematic Variations Effect on Frequency Distribution ...................44

5.3.2 Systematic Variations Minimization by RPM Technique ..................46

5.3.3 NIST Statistical Test for Randomness ................................................50

5.4 Summary .........................................................................................................51

6 Temperature, Voltage, and Aging Effects on ROPUFs Function ..........................52

6.1 Introduction ......................................................................................................52

6.2 Background ......................................................................................................53

6.2.1 Ring Oscillator PUF Response ...........................................................54

6.2.2 Number of Stages in Ring Oscillator ..................................................54

6.3 Experimental Setup ..........................................................................................54

6.4 Results and Analysis ........................................................................................56

6.5 Summary ..........................................................................................................66

7 A Comparative Study of Ring Oscillator PUFs on Different FPGA Families ......67

7.1 Introduction ......................................................................................................67

7.2 Related Work ...................................................................................................69

7.3 Background ......................................................................................................70

7.3.1 Ring Oscillator PUF Response ...........................................................70

7.3.2 Number of Stages in Ring Oscillator ..................................................70

7.3.3 ROPUF Parameters .............................................................................70

7.4 Experimental Setup ..........................................................................................73

7.5 Results and Analysis ........................................................................................74

7.5.1 ROPUF Uniqueness ............................................................................75

7.5.2 ROPUF Uniformity .............................................................................78

7.5.3 ROPUF Bit Aliasing ...........................................................................79

7.5.4 ROPUF Reliability ..............................................................................79

7.5.5 ROPUF Diverseness ...........................................................................84

7.6 Summary ..........................................................................................................84

8 ROPUF Application: Hardware-Oriented Security-Based Authentication for Advanced Metering Infrastructure .........................................................................85

8.1 Introduction ......................................................................................................85

8.2 Related Work ...................................................................................................88

8.3 Hardware-Oriented Security-Based Authentication for AMI ..........................90

8.3.1 ROPUF Design ...................................................................................96

8.3.2 Authentication .....................................................................................98

8.4 Proof of Concept ............................................................................................101

8.5 Summary ........................................................................................................106

9 Conclusions ..........................................................................................................107

9.1 Summary and Conclusions ............................................................................107

9.2 Contributions and Results ..............................................................................108

9.3 Future Works .................................................................................................110

References ........................................................................................................................112

List of Tables

2.1 Comparison of ROPUF, APUF, and BPUF .............................................................9

3.1 Standard deviation for all RO stages .....................................................................15

3.2 Number of slice and CLB used on FPGA for single RO .......................................18

4.1 Uniqueness, bit-aliasing, diverseness, and uniformity for 3, 5, and 7-stage ROs .31

4.2 SD for Chip 1, Chip 2, and Chip 3 .........................................................................32

4.3 Reliability for Chip 3 .............................................................................................33

4.4 Number of comparison pairs generated on Chip 1, Chip 2, and Chip 3 ................36

5.1 Hamming Distance (HD) for FPGA chip 1 before RPM is applied ......................49

5.2 Hamming Distance (HD) for FPGA chip 1 after RPM is applied .........................49

5.3 NIST statistical test for randomness results ...........................................................51

6.1 Bit flip occurrences on Spartan 3E ........................................................................57

6.2 Bit flip occurrences due to aging on Spartan 3E....................................................59

6.3 Percentage of bit flip occurrences ..........................................................................65

6.4 Number of comparison pairs according to threshold frequency ............................66

7.1 ROPUF’s parameters comparison..........................................................................75

7.2 ROPUF’s reliability due to changes in temperature, voltage, and aging ...............80

8.1 Comparison of different schemes based on Smart Grid requirements ..................89

8.2 Number of possible CRPs ......................................................................................98

8.3 Authentication time for each level .......................................................................102

8.4 Data storage size for each authentication level ....................................................102

8.5 Data storage size needed based on number of devices on the AMI .....................103

List of Figures

2-1 Arbiter Physical Unclonable Function (APUF) .......................................................6

2-2 Butterfly Physical Unclonable Function (BPUF) ....................................................7

2-3 Ring Oscillator Physical Unclonable Function (ROPUF) .......................................8

2-4 ROPUF circuit .........................................................................................................9

3-1 ROs mapped using three slices ..............................................................................13

3-2 Test circuit diagram ...............................................................................................13

3-3 Frequency pattern for a 3-stage ring oscillator ......................................................14

3-4 Frequencies (MHz) for 3, 5, 7, 9, and 11 stage ring oscillators .............................18

4-1 5-stage RO .............................................................................................................22

4-2 3-stage RO .............................................................................................................23

4-3 7-stage RO .............................................................................................................23

4-4 Xilinx Spartan 2 CLBs layout ................................................................................27

4-5 FST circuit diagram ...............................................................................................28

4-6 Difference between comparison pairs and CRPs ...................................................34

5-1 Frequencies across columns 1-3 of the CLBs ........................................................41

5-2 Location of ROs on Spartan 2 FPGA ....................................................................42

5-3 3-stage RO frequencies across Spartan 3E ............................................................44

5-4 3-stage RO frequencies across Spartan 3E FPGA after RPM technique has been applied ....................................................................................................................46

6-1 ROs numbering system based on spatial location..................................................56

6-2 The relationship between the RO frequency distance and the probability of a PUF output flip ...............................................................................................................57

6-3 Frequency changes in ROs due to the aging effect on Spartan 3E ........................60

6-4 Frequency changes with respect to the temperature variations on Spartan 3E ......62

6-5 Frequency changes with respect to the voltage variations on Spartan 3E .............63

6-6 Temperature chamber ............................................................................................64

7-1 RO frequencies versus location on Spartan 3E ......................................................76

7-2 RO frequencies versus location on Artix-7 ............................................................77

7-3 Spartan 3E ..............................................................................................................81

7-4 Artix-7 ....................................................................................................................82

7-5 RO frequency changes with respect to aging .........................................................83

8-1 AMI in Smart Grid .................................................................................................92

8-2 ROPUF connected to a smart meter.......................................................................92

8-3 Smart meter to utility company authentication ......................................................93

8-4 Smart meter to utility company fail authentication................................................93

8-5 Data concentrator to utility company authentication .............................................94

8-6 Utility company to smart meter authentication ......................................................95

8-7 Utility company to smart meter fail authentication ...............................................95

8-8 ROPUF logic blocks ..............................................................................................97

8-9 Parity Bits PBi generator .....................................................................................99

8-10 Parity bits from 128 response bits form 64 parity bits ...........................................99

8-11 ROPUFs registration with utility company .........................................................100

8-12 Parity bits and corresponding ROs ......................................................................104

8-13 SVM prediction accuracy for ROPUF .................................................................105

8-14 Bit flip probability vs. frequency difference (MHz) ............................................106

List of Abbreviations

AMI ............................Advanced Metering Infrastructure

APUF .........................Arbiter Physical Unclonable Function

BPUF..........................Butterfly Physical Unclonable Function

CLB ............................Configurable Logic Blocks

CPBP ..........................Challenges and Parity Bit Pairs

FDT ............................Frequency Difference Thresholds

FPGA .........................Field Programmable Gate Array

FST .............................Full Scan Technique

HD ..............................Hamming Distance

HW .............................Hamming Weight

MUX ..........................Multiplexer

PUF ............................Physical Unclonable Function

RO ..............................Ring Oscillator

ROPUF .......................Ring Oscillator Physical Unclonable Function

RPM ...........................Random Patch Mixer

SD ..............................Standard Deviation

SVM ...........................Support Vector Machine

Chapter 1

Introduction

As a society, we place a lot of trust in the hardware we use on a daily basis. For

example, communication regularly occurs on sophisticated digital phones or computers.

These same devices are also capable of monitoring our bank accounts and buying and

selling goods electronically. For these reasons, it is vital that the security of hardware

devices continues to improve in order to ensure the secure transfer of information across

untrusted networks. By using hardware-based authentication, a digital system can verify

that a user is in fact who he or she claims to be through the use of unique secret keys. This

secret key can be stored in the memory, or generated specifically when it needs to be used.

The first option is not used because memory is vulnerable to inexpensive attacks [5][6].

The second option is more appealing because it is both simple to implement and difficult

to attack.

Physically Unclonable Functions (PUFs) are one way of generating secret keys on

the spot, without relying on memory. PUFs exploit process variations, which are

unintentionally introduced during the manufacturing process of integrated circuits. The

process variation in turn causes small amounts of additional delays within the circuit. By

using this additional delay effectively, secure bits can be generated. Silicon PUFs (SPUFs)

are PUFs that are specifically designed to take advantage of the silicon manufacturing

process. The SPUFs are designed to exploit the process variation and circuit delays to

create unique challenge-response patterns [6]. There are two kinds of SPUF: Arbiter PUFs

and Ring Oscillator PUFs (ROPUFs). An Arbiter PUF is constructed from multiplexers

and an arbiter. ROPUFs are constructed from delay loops (ring oscillators) and counters.

Arbiter PUF circuits need to be symmetric in order to ensure that the routing

lengths are the same [6]. ROPUFs on the other hand do not need to be symmetric. For this

reason, ROPUFs are the preferred solution when working with FPGAs [7]. There are

many techniques used by researchers in order to improve the reliability and uniqueness of

ROPUFs [6][8][9][10]. Reliability means that the secret key generated by the ROPUF will

be the same despite any change in operating conditions [8]. Uniqueness, on the other hand,

refers to how each and every FPGA is able to generate a unique secret key [8].

1.1 Motivation

A ROPUF takes advantage of the process variation on the silicon chip to generate

a unique ID for authentication. A ROPUF can be implemented on any VLSI chip

including an FPGA to produce a unique ID for each FPGA chip. An adversary who tries to

tamper with a ROPUF will change the properties of the process variation in the silicon chip;

thus any tampering effort will fail [3]. A ROPUF cannot be modeled because the process

variation on a silicon chip is random. To date, there is no technology that can

measure the process variation with high accuracy [3].

1.2 Research Objectives

ROPUF research areas can be divided into four main categories: fabrication

variation extraction [11], secret selection [6][9][10][11][22][25][26], error correction, and

tests for security and reliability. Fabrication variation extraction is the study of the

physical behavior of the silicon chip. This is the most fundamental research area in

ROPUF that interacts directly with the process variation. The uniqueness and reliability

parameters of the ROPUF are studied thoroughly in this work to take full advantage of

the process variation [11][12]. Secret selection is the study of the algorithm to select the

comparison pairs that are known as challenge and response pairs. The randomness

parameter of the ROPUF is also studied in this research. Error correction is studied by

using an algorithm that corrects any flipped bits. This is important especially for ROPUF

implementation as a cryptographic technique, where zero bit-flip occurrences are

expected [6]. Finally, research on tests for security and reliability looks into the diffuseness,

bit-aliasing, and probability of misidentification parameters of ROPUF.

This research focuses on process variation extraction, secret selection of

challenge-response pairs, and tests for reliability and security. For the process variation

extraction, the relationship between the number of stages used in an RO and the

ROPUF’s reliability and uniqueness is studied. The objective of this study is to improve

the ROPUF’s uniqueness and reliability parameters by manipulating the structure of ROs.

For the challenge-response secret selection, the systematic variation effect on the ROPUF

is studied. The objective in this study is to develop an algorithm that can minimize the

systematic variation effect on the ROPUF. For the tests for security and reliability, we have

conducted a study on the weaknesses of the ROPUFs. The objective in this study is to

develop an algorithm which will enhance the security and reliability of the ROPUF.

This dissertation is organized as follows:

Chapter One: This chapter briefly introduces the motivation and objective of this

research.

Chapter Two: This chapter gives background information about PUF, RO and ROPUF.

Chapter Three: This chapter discusses the relationship between the number of stages used

in an RO and the uniqueness of the frequency of the RO.

Chapter Four: This chapter discusses the relationship between the number of stages used

in a ROPUF and the number of challenge-response pairs (CRPs) on an FPGA.

Chapter Five: This chapter discusses the RPM technique developed to minimize the systematic

variation effect on a ROPUF.

Chapter Six: This chapter discusses the temperature, voltage, and aging effects in

ROPUF.

Chapter Seven: This chapter discusses a comparative study of ring oscillator PUFs on

different FPGA families.

Chapter Eight: This chapter discusses the hardware-oriented security-based

authentication for advanced metering infrastructure.

Chapter Nine: This chapter forms the conclusion of this dissertation and also discusses

future work.

Chapter 2

Research Background

2.1 Process Variations

The reduced feature sizes in silicon chip devices make it hard to attain uniformity

in manufacturing. This results in variation in the transistor gate length and oxide

thickness that introduces propagation delays in the silicon chip [13]. This variability in

the manufacturing is known as process variation. Process variations are random and

cannot be controlled. Process variations can be divided into two types, namely intra-die

variations and inter-die variations. Intra-die variations are the variations within a single

die, and inter-die variations are variations from chip to chip.

Intra-die variations can be categorized into two types: systematic (process shift)

and stochastic (process spread) variations [13]. Systematic variations are created by

reticle stepper alignment errors, mask errors from inaccuracies in the process model, and

lithographic off-axis focusing errors. The sources of stochastic variations are: wafer

unevenness, non-uniformity in resist thickness, and vibrations during lithography.

2.2 Physical Unclonable Functions (PUFs)

A PUF is a chip level structure that deliberately exploits random process

manufacturing variations to establish the chip’s identity. There are three common types

of delay PUFs that can be used to extract the delay introduced by the process variations:

Arbiter PUF (APUF), Butterfly PUF (BPUF), and ROPUF.

2.2.1 Arbiter Physical Unclonable Function (APUF)

An APUF is composed of two identically configured delay paths that are

stimulated by an activating signal as shown in Figure 2-1. The difference in the

propagation delays of the signals in the two delay paths is measured by an edge triggered

flip-flop known as an arbiter. There are two main components used in an APUF:

switches and the arbiter [14]. Various response bits can be generated by configuring

different delay paths.

Figure 2-1: Arbiter Physical Unclonable Function (APUF).

2.2.2 Butterfly Physical Unclonable Function (BPUF)

A BPUF consists of two cross-coupled latches as shown in Figure 2-2. The BPUF

exploits the random assignment of a stable state from an unstable state that is forcefully

imposed by holding one latch in preset while holding the other in clear mode by an

excitation signal. The final state is determined by the random delay mismatch in the pair

of feedback paths and the excitation signal paths due to process variations [15].

Figure 2-2: Butterfly Physical Unclonable Function (BPUF).

2.2.3 Ring Oscillator Physical Unclonable Function (ROPUF)

A ROPUF is built from ROs, each composed of an odd number of inverters connected in a loop. The RO frequency is generated from

the inverted signal that travels through the RO loop as shown in Figure 2-3. The presence

of process variations inside logic gates and wires causes an uneven delay across the chip.

Figure 2-3: Ring Oscillator Physical Unclonable Function (ROPUF).

A pair of ROs could produce two different frequencies because of the presence of process

variations.

2.2.4 PUF Implementation on FPGA

Researchers have compared the implementation of APUF, BPUF and ROPUF on

FPGAs [16]. For APUF and BPUF, the implementation on FPGA is tedious because both

designs need to be symmetric as shown in Table 2.1. It is almost impossible to get

symmetric design on an FPGA because the design needs to be mapped using a fixed

routing. ROPUF design does not need symmetric design which makes it the best

candidate for PUF on FPGAs. ROPUF implementation on FPGAs requires identical

instantiation as shown in Figure 2-3.

Table 2.1: Comparison of ROPUF, APUF, and BPUF [16].

ROPUF | APUF | BPUF

Does not require symmetric routing in a building block. | Requires symmetric routing in a building block. | Requires symmetric routing in a building block.

Building blocks require identical instantiation. | Identical instantiation of building blocks may not be necessary. | Identical instantiation of building blocks may not be necessary.

2.3 Challenge and Response on ROPUF

A response bit from a ROPUF can be generated by comparing the output

signals of two ROs. More response bits can be generated by comparing additional RO

pairs. The selection of RO pairs is determined by the challenge. For example, the

RO1 and RO2 pair generates one response bit and the RO3 and RO4 pair generates another

response bit. All ROs are connected to two MUXs, as shown in Figure 2-4. The challenge

bits for the ROPUF circuit shown in Figure 2-4 are applied at the input of each MUX.

Figure 2-4: ROPUF circuit.

The challenge selects one RO at each MUX. The output of each selected RO is

fed into a counter that measures the number of cycles generated by that RO over a

fixed period of time. After both counters have measured the number of cycles from

each RO, the comparator compares the two counts.

Finally, the response bit is generated. The logic used is: if the number of cycles measured

by the first counter is larger than the number of cycles measured by the second

counter, then the response bit is ‘1’; otherwise, it is ‘0’ (or vice versa, depending on the chosen convention).
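The challenge-response procedure described above can be sketched in software. The following Python sketch is illustrative only: the function names, the nominal frequency, the spread, and the 100 microsecond measurement window are assumptions made for the example and are not taken from the hardware design.

```python
import random


def make_ro_frequencies(num_ros, nominal_mhz=150.0, spread_mhz=5.0, seed=0):
    """Hypothetical RO frequencies: a nominal value plus a random offset
    standing in for process variation (all values are assumptions)."""
    rng = random.Random(seed)
    return [nominal_mhz + rng.gauss(0.0, spread_mhz) for _ in range(num_ros)]


def count_cycles(freq_mhz, window_us):
    """Cycles a counter would accumulate in the window (MHz * us = cycles)."""
    return int(freq_mhz * window_us)


def response_bit(freqs_mhz, sel_a, sel_b, window_us=100.0):
    """Compare the counts of the two ROs selected by the MUXs.
    Convention assumed here: '1' if the first count is larger, else '0'."""
    count_a = count_cycles(freqs_mhz[sel_a], window_us)
    count_b = count_cycles(freqs_mhz[sel_b], window_us)
    return 1 if count_a > count_b else 0


def response(freqs_mhz, challenge, window_us=100.0):
    """A challenge is modeled as a list of (sel_a, sel_b) RO index pairs;
    each pair yields one response bit."""
    return [response_bit(freqs_mhz, a, b, window_us) for a, b in challenge]


if __name__ == "__main__":
    freqs = make_ro_frequencies(8)
    # Pair RO1 with RO2 and RO3 with RO4, as in the example in the text.
    print(response(freqs, [(0, 1), (2, 3)]))
```

In this sketch the same chip model and the same challenge always produce the same response because noise is not modeled; adding a small random term to each count would allow bit flips to be explored.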

2.4 RO Delay

Equation 2-1 shows that the RO delay comprises three components [11]. The

parameter $d_{avg}$ is the delay component that comes from the routing and is the same for all

ROs. The parameter $d_{PV_a}$ is the delay component that comes from the process variations

and is expected to be different for different ROs. The parameter $d_{NOISE_a}$ comes from the

noise factor and is a dynamic component that changes over time. When the delays

of two ROs ($d_a$ and $d_b$) are compared, the $d_{avg}$ terms cancel. Thus, the delay

difference between two ROs comes from the process variation and noise delay components.

$$d_a = d_{avg} + d_{PV_a} + d_{NOISE_a} \tag{2-1}$$

$$d_b = d_{avg} + d_{PV_b} + d_{NOISE_b} \tag{2-2}$$

$$d_a - d_b = (d_{PV_a} - d_{PV_b}) + (d_{NOISE_a} - d_{NOISE_b}) = \Delta d_{PV} + \Delta d_{NOISE} \tag{2-3}$$
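The cancellation of the common routing delay can be checked numerically. The short Python sketch below uses hypothetical delay values in picoseconds (chosen as integers only to keep the arithmetic exact) and confirms that the difference between two RO delays does not depend on the shared routing component.

```python
def ro_delay(d_avg, d_pv, d_noise):
    """Total RO delay as the sum of routing, process variation, and noise terms."""
    return d_avg + d_pv + d_noise


def delay_difference(d_pv_a, d_noise_a, d_pv_b, d_noise_b):
    """Delay difference written without the common routing term."""
    return (d_pv_a - d_pv_b) + (d_noise_a - d_noise_b)


if __name__ == "__main__":
    # Hypothetical component values in picoseconds.
    d_pv_a, d_noise_a = 12, -1
    d_pv_b, d_noise_b = 7, 2
    for d_avg in (500, 1000, 2000):  # any common routing delay
        direct = ro_delay(d_avg, d_pv_a, d_noise_a) - ro_delay(d_avg, d_pv_b, d_noise_b)
        assert direct == delay_difference(d_pv_a, d_noise_a, d_pv_b, d_noise_b)
    print(delay_difference(d_pv_a, d_noise_a, d_pv_b, d_noise_b))
```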

Chapter 3

Frequency Uniqueness in ROPUF on FPGA

Hardware security in Field Programmable Gate Arrays (FPGAs) that use PUFs

relies on the ability to produce a large number of unique frequencies. This chapter explores

the frequency uniqueness as it relates to the number of stages used to build an RO.

3.1 Introduction

There are many techniques used by researchers in order to improve the reliability

and uniqueness of ROPUFs [6][8][9][10]. Reliability means that the secret key generated

by the ROPUF will be the same despite changing operating conditions [8]. Uniqueness on

the other hand refers to how each and every FPGA is able to generate a unique secret key

[8].

The measure of how much each ring oscillator frequency varies from the next is

called frequency uniqueness. By increasing the frequency uniqueness of a system, it is

possible to increase the security of that system. Until now, to the best of our knowledge,

there has not been any research on how the number of stages in a ring oscillator PUF

affects the frequency uniqueness. This chapter addresses this issue.

3.2 Experimental Setup

This section explains the procedure used to determine frequency uniqueness for

varying stages of ring oscillators. For this purpose, three FPGA development boards are

used. Each board has a single Xilinx Spartan 2 XC2S100 TQ144 FPGA. Together, the three boards

provide a total of 60 ring oscillators for each stage count (20 per board). Initially, data was obtained at room

temperature. Various configurations of ring oscillators are tested, including ring

oscillators with 3, 5, 7, 9, and 11 stages. For each of these stage counts, the frequency produced

by each ring oscillator is measured and recorded.

The first step in designing the experiment is to create a hard macro for a single ring

oscillator. This ensures that the routing lengths for each ring oscillator are identical. It is

important that the routing lengths are the same so that there is no additional delay in any

single ring oscillator. The total delay for each ring oscillator is as shown in Equation 2-1.

For the FPGAs used in this experiment, one Configurable Logic Block (CLB) is

composed of two slices, as shown in Figure 3-1. For the hard macros used in this

experiment, each slice contains only one inverter. So an N-stage ring oscillator will use N

inverters, N slices, and N/2 CLBs (rounded up, since N is odd). This macro is horizontally placed 20 times on the

Spartan 2 FPGA as shown in Figure 3-2. Each of these ring oscillators is connected to a

1-to-20 demultiplexer which provides an enable signal for each of the ring oscillators. The

purpose of this is to ensure that neighboring ring oscillators do not cause extra noise while

they are not in use. Enabling all oscillators at the same time will also produce extra heat

that could affect the frequencies being generated [13].

The output of the ring oscillators is fed to a 20-to-1 multiplexer with the same

select lines as the demultiplexer shown in Figure 3-2. The output of the multiplexer can

be sampled and measured. Measurements for this experiment are done using an Agilent

16801A Logic Analyzer. By using a logic analyzer, the entire waveform produced by the

ring oscillators is observed and counted. The counting feature of the logic analyzer is

particularly useful because the patterns produced by the ring oscillators are not uniform,

as shown in Figure 3-3. So, the frequencies reported are actually the average frequencies.
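As an illustration of why an average is reported, the following Python sketch computes an average frequency from a list of rising-edge timestamps with non-uniform spacing. The function name and the timestamp values are hypothetical; the sketch does not reproduce the logic analyzer's internal counting method.

```python
def average_frequency_mhz(edge_times_ns):
    """Average frequency from rising-edge timestamps given in nanoseconds.

    The number of full periods is divided by the total time they span,
    so unequal individual periods are averaged out.
    """
    if len(edge_times_ns) < 2:
        raise ValueError("at least two edges are needed to form one period")
    periods = len(edge_times_ns) - 1
    span_ns = edge_times_ns[-1] - edge_times_ns[0]
    # periods per nanosecond is GHz; multiply by 1000 to express it in MHz.
    return periods * 1000.0 / span_ns


if __name__ == "__main__":
    # Hypothetical edges with unequal periods of 5, 6, 4, and 5 ns.
    print(average_frequency_mhz([0, 5, 11, 15, 20]))
```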

Figure 3-1: ROs mapped using three slices.

Figure 3-2: Test circuit diagram.

Figure 3-3: Frequency pattern for a 3-stage ring oscillator.

3.3 Results and Analysis

In this section, the results of the experiments described in the previous section are

presented and analyzed. This section is divided into two parts relating to the number of

stages used in the ring oscillator and the reasons why some ring oscillators vary in

frequency more than others. The first part focuses on the security of the Physically

Unclonable Functions on an FPGA as it relates to the number of stages used. The second

part focuses on the reasons why 3-stage ring oscillators have a higher variation in terms of

frequency as compared to ring oscillators with more stages.

3.3.1 Impact of Number of Stages on PUF

Figure 3-4 displays the average frequency (MHz) produced by the ring oscillators.

Each line represents the results based on the number of stages that were used while

implementing the ring oscillators on the Spartan 2 FPGA. For each of the different stage

ring oscillators, the frequency produced is nearly constant, except for the 3-stage ring

oscillators. The 3-stage ring oscillators have a much greater frequency variation compared

with others.

While ring oscillators with more than 3 stages may vary by a few MHz, 3-stage

ring oscillators have been shown to vary from 120 MHz to nearly 200 MHz. The

frequencies produced by 5, 7, 9, and 11 stage ring oscillators remain almost constant in

comparison. Their frequencies are centered around the 115 MHz, 80 MHz, 65 MHz and

55 MHz marks, respectively. As the number of stages increases, it appears that the

frequencies become more consistent. The value of the average frequency generated by the

ring oscillator also decreases as the number of stages increases. As more stages are added,

more delays are introduced in the circuit.
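One way to read these center frequencies is through the ideal ring oscillator relation f = 1 / (2 N t_d), where N is the number of stages and t_d is the average delay per stage. This relation is a textbook idealization and is an assumption here, not a model fitted in this work; it lumps routing delay into t_d. The Python sketch below applies it to the approximate center frequencies quoted above.

```python
# Approximate center frequencies (MHz) quoted in the text for each stage count.
CENTER_FREQ_MHZ = {5: 115.0, 7: 80.0, 9: 65.0, 11: 55.0}


def stage_delay_ns(num_stages, freq_mhz):
    """Per-stage delay implied by the ideal relation f = 1 / (2 * N * t_d)."""
    period_ns = 1000.0 / freq_mhz  # a 1 MHz signal has a 1000 ns period
    return period_ns / (2 * num_stages)


implied_delay_ns = {n: stage_delay_ns(n, f) for n, f in CENTER_FREQ_MHZ.items()}

if __name__ == "__main__":
    for stages, delay in implied_delay_ns.items():
        print(f"{stages:2d} stages: about {delay:.2f} ns per stage")
```

Under this idealization, the four stage counts imply a per-stage delay between roughly 0.8 and 0.9 ns, which would place an ideal 3-stage RO near the upper end of the observed 120 MHz to 200 MHz range.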

Table 3.1 shows the standard deviation of the frequencies produced from the ring

oscillators when configured with different numbers of stages. As the table shows, the

standard deviation is very low for all ring oscillators with more than 3 stages. A low

standard deviation indicates that the ring oscillators may be more susceptible to bit

flipping caused by noise, and therefore, erroneous results. For this reason, these ring

oscillators are not suitable for ROPUF applications. However, the larger standard

deviation in the 3-stage ring oscillator makes it more appropriate for ROPUF applications.

Table 3.1: Standard deviation for all RO stages.

Number of stages    Standard deviation
3 stages            11.3
5 stages            1.6
7 stages            0.74
9 stages            1
11 stages           0.62

In Table 3.1, the highest standard deviation for multistage ring oscillators occurs

when there are 3 stages. This indicates that a 3-stage ring oscillator will have the highest

frequency uniqueness of any of the multistage ring oscillators measured. Due to the high

frequency uniqueness, this model will be useful in generating secret keys, since the

likelihood of flipping the bits is low.

By assuming that there are N ring oscillators that produce unique frequencies, the

circuit will produce log2(N!) bits of entropy [6]. If 60 unique ring oscillators existed on a

device, 272 security bits could reliably be produced. If 100 unique ring oscillators existed,

525 security bits could reliably be produced. This is only possible when each frequency is

sufficiently unique from the others. As the frequency uniqueness is reduced, so is the

possible number of security bits generated.
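The log2(N!) figures quoted above can be checked numerically. The following is a minimal sketch; the function name ropuf_entropy_bits is a hypothetical helper, not part of the original work, and it only evaluates the bound under the stated assumption that all N frequencies are sufficiently unique.

```python
import math

def ropuf_entropy_bits(n_oscillators: int) -> float:
    """Return log2(N!), the entropy bound in bits for N distinct RO frequencies."""
    # math.lgamma(N + 1) equals ln(N!); dividing by ln(2) converts to base 2.
    return math.lgamma(n_oscillators + 1) / math.log(2)

for n in (60, 100):
    print(n, round(ropuf_entropy_bits(n), 1))
```

For N = 60 the bound is slightly above 272 bits, and for N = 100 it is slightly below 525 bits, which agrees with the rounded values given in the text.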

As the frequencies of two ring oscillators approach each other, the possibility for

comparison errors increases due to noise. At one instant in time, the first frequency may

be faster; however, at another instant, the second might be slightly faster. This will result

in flipped security bits, making the secured message completely unreadable by the

receiving party. To reduce the possibility of this happening, it is important that there be a

minimum difference between all ring oscillator frequencies that will be used. Ring

oscillators that are not sufficiently unique should not be used. To generate a large number

of security bits, it is important to maximize the number of unique frequencies produced by

the ring oscillators.

3.3.2 Explanation of High Frequency Variation in a 3-stage RO

This section discusses the reasons why 3-stage ring oscillators have the highest

variations among the ring oscillators tested. In [18], some of the basic ideas of why lower

stage oscillators have higher process variation are discussed. One of the reasons this

occurs is that increasing the number of stages used by a ring oscillator also increases the

correlation coefficient between the actual delays and the theoretical delays. As more

stages are added, the delay is less dependent on process variation. This means that the

frequencies generated by a ring oscillator will converge to a central frequency as the

number of stages increases.

The authors of [13] have shown that there are two types of variation within-die,

systematic and stochastic. Systematic variation is caused by lithographic off-axis focusing

errors, reticle stepper alignment errors, and mask errors due to inaccuracies in the process

model. Stochastic variation is caused by non-uniformity in resist thickness, vibrations

during lithography, and wafer unevenness. The researchers in [11] suggest that systematic

variations are the primary cause of process variation, and thus, the largest influence on

frequency uniqueness. Systematic variation imposes certain patterns on the die that

reduce frequency uniqueness as the number of stages in the ring oscillators increases.

There is a direct relationship between the amount of space consumed on an FPGA

and the number of stages in a ring oscillator. Table 3.2 shows that as the number of stages

increases, so do the number of slices and CLBs used, and therefore, space consumed on

the FPGA also increases. As this area increases, the delay from one ring oscillator will

begin to correspond more with the delay from other ring oscillators [18]. In effective

ROPUF applications, this correspondence should be minimized. By minimizing the

correspondence in delay, the ROPUF will effectively be able to produce more secure bits,

and thus, increase the security of the application.

Table 3.2: Number of slice and CLB used on FPGA for single RO.

Ring oscillator    Slice used    CLB used
3 stages           3             2
5 stages           5             3
7 stages           7             4
9 stages           9             5
11 stages          11            6

Figure 3-4: Frequencies (MHz) for 3, 5, 7, 9, and 11 stage ring oscillators. (a) FPGA Board 1. (b) FPGA Board 2.

[Plots not reproduced. Vertical axis: Frequency (MHz), 100 to 200. Horizontal axis: row number. Series: 3, 5, 7, 9, and 11 stages.]

3.4 Summary

The number of stages of a ring oscillator plays a critical role in generating secure

bits on an FPGA. By choosing the correct number of stages while designing a ring

oscillator, the number of unique frequencies can be maximized. As the number of unique

frequencies increases, the number of frequency comparisons also increases, thus creating

more secure bits, which could be used in a secret key.

Chapter 4

Relationship between Number of Stages in ROPUF and

CRP Generation

4.1 Introduction

Physical Unclonable Functions (PUFs) are commonly used to prevent hackers from

stealing information from semiconductor chips. PUFs utilize the process variations on

the chip to create an irreversible function that generates unique response bits for each

challenge. A reliable response bit can be generated by comparing the frequencies of two Ring

Oscillators (ROs) that differ by a significant amount. An insignificant

frequency difference can cause a bit flip in the generated response bit. A higher threshold

for the frequency difference is therefore preferred to rule out bit flips. As the

frequency difference threshold (FDT) increases, the number of challenge and response

pairs (CRPs) is reduced. In this chapter, it is shown that a higher Standard Deviation

(SD) of RO frequencies can compensate for a higher FDT. The Full Scan Technique (FST)

is applied to ROs with different numbers of stages to determine the number of stages with the

highest SD of RO frequencies. The experimental results show that the SD of RO

frequencies increases as the number of stages decreases. It is also shown that by reducing

the number of stages, good Inter-Hamming Distance (HD), Hamming Weight (HW), and

percentage of bit flip occurrences can still be obtained.

Despite the promising solution offered by ROPUF, there are still challenges that

need to be overcome for ROPUF to become a practical solution. Making the ROPUF

response better in uniqueness and increasing its reliability are among the challenges.

Uniqueness refers to the ability of similar ROPUF circuits to generate unique responses

on different chips. Reliability refers to the generation of the same response under various

environmental conditions, such as temperature and humidity.

This chapter focuses on the process variation extraction for a ROPUF on an FPGA.

Three different RO stages are tested and compared in terms of the SD, HW, and

inter-HD. The three different RO stages are tested using our newly proposed Full Scan

Technique (FST), which records frequencies from all CLBs available on the FPGA.

4.2 Background

RO frequency is generated from the inverted signal that travels through the RO

loop, as shown in Figure 4-1. The presence of process variation inside logic gates and

wires causes an uneven delay across the chip. As a result, a pair of ROs will produce two

different frequencies: fa and fb. The frequencies are compared to see if fa is greater than

fb. If fa is greater than fb, response bit 1 is generated; otherwise, the response is 0 as

shown in Equation 4-1.

Figure 4-1: 5-stage RO.

\[ \text{Response bit} = \begin{cases} 1, & \text{if } f_a > f_b \\ 0, & \text{otherwise} \end{cases} \tag{4-1} \]

4.2.1 Number of Stages of the RO

In this experiment, ROs with three different numbers of stages are used. Figure 4-1

shows the 5-stage RO where each component in the RO counts as one stage. The 5-stage

RO consists of one NAND gate and 4 inverter gates. The NAND gate is used to control

the switching of the RO. The RO is activated (starts to produce an oscillation) when the

input is set to high. Figure 4-2 shows the 3-stage RO. The 3-stage RO consists of one

NAND gate, one buffer gate and two inverter gates. The reason for using a 3-stage RO is

explained in section 4.4. One buffer gate is used instead of an inverter gate because the

inverting components need to be odd in number in order to produce an oscillation. The

buffer gate is added to increase the total delay in the RO, thereby reducing the RO

frequency. Finally, Figure 4-3 shows the 7-stage RO. The 7-stage RO consists of one

NAND gate and 6 inverter gates.

Figure 4-2: 3-stage RO.

Figure 4-3: 7-stage RO.

4.2.2 RO Parameters

There are a number of parameters proposed to measure PUF performance, such as

uniformity, reliability, steadiness, uniqueness, diverseness, bit-aliasing, and probability of

misidentification [12][13][14][15][16]. In this research, 4 existing parameters and one

newly proposed parameter are used. The 4 existing parameters are chosen based on the

suitability of measuring the performance of the different number of stages used in the

ROPUF. The 4 parameters are uniqueness, reliability, uniformity and bit-aliasing. One

new parameter proposed in this research is diverseness. Uniqueness represents the ability

of a PUF to uniquely differentiate a particular chip among a group of chips of the same

type [12]. Uniqueness can be measured by calculating the inter-chip HD, as shown in

Equation 4-2. In this equation, m is the number of chips used, u and v are the two chips

being compared, and n is the number of response bits generated. Ru and Rv are the

response bits from the same challenge C for chips u and v. HD is the Hamming distance

between the response bits generated from chips u and v. A good uniqueness value is around

50%. This means that, on average, about half of the response bits generated from chips u and v differ

from each other (for responses obtained by giving the same challenge to chips u and v).

\[ \text{Uniqueness} = \frac{2}{m(m-1)} \sum_{u=1}^{m-1} \sum_{v=u+1}^{m} \frac{HD(R_u, R_v)}{n} \times 100\% \tag{4-2} \]
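As an illustration of Equation 4-2, the sketch below averages the normalized Hamming distance over every pair of chips. The function names and the three 8-bit responses are hypothetical and chosen only for the example; they are not measured data.

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))

def uniqueness(responses: list[str]) -> float:
    """Average inter-chip HD over all chip pairs, in percent (Equation 4-2)."""
    m = len(responses)      # number of chips
    n = len(responses[0])   # number of response bits
    total = sum(hamming_distance(ru, rv) / n
                for ru, rv in combinations(responses, 2))
    return 2.0 / (m * (m - 1)) * total * 100.0

# Hypothetical responses of three chips to the same challenge.
chips = ["10110010", "01110100", "10011011"]
print(uniqueness(chips))
```

With these made-up responses the pairwise distances are 4, 3, and 7 bits out of 8, so the result is about 58.3%.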

Reliability refers to how consistently a PUF reproduces the response bits.

Reliability can be measured by using Equations 4-3 and 4-4. Rs is the response from chip i

at the normal operating condition (room temperature). R's,t is the t-th sample of the response

from chip i at a different operating condition, such as a different temperature setting, and k is the number of samples. A good

reliability value is 100%. As can be seen in Equation 4-4, if the intra-chip HD (comparison of

response under normal operating conditions and different operating conditions) is low or

zero, then the reliability is around 100%.

\[ \text{Intra-chip HD} = \frac{1}{k} \sum_{t=1}^{k} \frac{HD(R_s, R'_{s,t})}{n} \times 100\% \tag{4-3} \]

\[ \text{Reliability} = 100\% - \text{Intra-chip HD} \tag{4-4} \]
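Equations 4-3 and 4-4 can be sketched as follows. The function names and the sample responses are hypothetical and serve only to illustrate the calculation; the reference response stands for the one recorded at room temperature.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def reliability(reference: str, samples: list[str]) -> float:
    """Reliability in percent: 100% minus the average intra-chip HD."""
    n = len(reference)  # number of response bits
    k = len(samples)    # number of samples at other operating conditions
    intra_hd = sum(hamming_distance(reference, s) / n for s in samples) / k * 100.0
    return 100.0 - intra_hd

# Hypothetical 10-bit reference response and three re-measurements.
reference = "1011001011"
samples = ["1011001011", "1011011011", "1001001010"]
print(reliability(reference, samples))
```

Here the three samples differ from the reference in 0, 1, and 2 bits, giving an average intra-chip HD of 10% and a reliability of about 90%.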

Uniformity estimates how uniform the ratio of ‘0’s and ‘1’s is in the response bits

of a PUF. Uniformity can be measured by calculating the intra-chip Hamming Weight

HW, as shown in Equation 4-5 where rs,l is the l-th binary bit. A good value for

uniformity is around 50%, which means the response from RO is well distributed

between ‘0’s and ‘1’s.

\[ \text{Uniformity} = \frac{1}{n} \sum_{l=1}^{n} r_{s,l} \times 100\% \tag{4-5} \]

Bit-aliasing estimates the uniformity of ‘1’s and ‘0’s in each bit in the responses

across a group of chips of the same type. Bit-aliasing can be measured by calculating the

inter-chip HW, as shown in Equation 4-6. A good value for bit-aliasing is 50%, which

means each bit in the responses across a group of chips is well distributed between ‘0’s

and ‘1’s. Uniformity and bit-aliasing are the parameters that can measure the randomness

features in the responses generated.

\[ \text{Bit-aliasing} = \frac{1}{m} \sum_{i=1}^{m} r_{i,l} \times 100\% \tag{4-6} \]
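The two Hamming-weight measures differ only in the direction of the sum: uniformity runs along the bits of one chip, while bit-aliasing runs across chips at one bit position. The sketch below illustrates both; the function names and the four 4-bit responses are hypothetical.

```python
def uniformity(response: str) -> float:
    """Percentage of '1' bits within one chip's response (Equation 4-5)."""
    return response.count("1") / len(response) * 100.0

def bit_aliasing(responses: list[str]) -> list[float]:
    """Percentage of chips that output '1' at each bit position (Equation 4-6)."""
    m = len(responses)  # number of chips
    return [sum(r[l] == "1" for r in responses) / m * 100.0
            for l in range(len(responses[0]))]

# Hypothetical responses of four chips to the same challenge.
chips = ["1011", "0110", "1101", "0010"]
print(uniformity(chips[0]))
print(bit_aliasing(chips))
```

For this made-up data the first chip has a uniformity of 75%, and the per-bit aliasing values are 50%, 50%, 75%, and 50%.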

Finally, the new parameter, diverseness, is used to measure the range of

frequencies in the different number of stages used in a ROPUF. Diverseness can be

measured by calculating the standard deviation SD of the frequencies from each stage

used in a ROPUF, as shown in Equations 4-7, 4-8, and 4-9. h is the number of ROs used

on a chip. fi,j is the individual frequency for each RO. fi,j,q is the q-th frequency sample of

the j-th RO in the i-th chip. favg is the average frequency on a chip.

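\[ \text{Diverseness} = \sqrt{\frac{1}{h-1} \sum_{j=1}^{h} \left(f_{i,j} - f_{avg}\right)^2} \tag{4-7} \]

\[ f_{i,j} = \frac{1}{q} \sum_{q=1}^{q} f_{i,j,q} \tag{4-8} \]

\[ f_{avg} = \frac{1}{h} \sum_{j=1}^{h} f_{i,j} \tag{4-9} \]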
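Diverseness as defined in Equations 4-7 to 4-9 is the sample standard deviation of the per-RO mean frequencies on one chip. A minimal sketch follows; the function name and the frequency samples are hypothetical, and the standard library's stdev is used because it applies the same h - 1 denominator.

```python
from statistics import mean, stdev

def diverseness(samples_per_ro: list[list[float]]) -> float:
    """Standard deviation (MHz) of the mean RO frequencies on one chip."""
    # Equation 4-8: average the repeated frequency samples of each RO.
    ro_means = [mean(samples) for samples in samples_per_ro]
    # Equations 4-7 and 4-9: deviation from the chip average, h - 1 denominator.
    return stdev(ro_means)

# Hypothetical chip with three ROs, two frequency samples each (MHz).
chip = [[150.0, 152.0], [148.0, 150.0], [153.0, 155.0]]
print(round(diverseness(chip), 4))
```

The per-RO means are 151, 149, and 154 MHz, so the diverseness of this made-up chip is about 2.52 MHz.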

4.3 Experiment Setup

In this experiment, three Xilinx Spartan 2 XSA-100 boards are used. There are

600 CLBs on each chip, as shown in Figure 4-4 [19]. Each CLB contains two slices and

each slice contains two Lookup Tables (LUTs). One stage in the RO occupies one LUT.

One CLB is used for the 3-stage RO and two CLBs are used for each 5-stage and 7-stage

RO. Six hundred 3-stage ROs, 300 5-stage ROs, and 300 7-stage ROs are mapped on each

chip.

Figure 4-4: Xilinx Spartan 2 CLBs layout.

The FPGA area is divided into two parts, left and right (three hundred CLBs on

each part). The experiment is run two times for each chip and RO stage. The first run

occupied the right area with ROs and the left area with other circuits needed, such as

MUX and counters. The blue boxes in Figure 4-4 show the occupied CLBs. As can be

seen on the right side of Figure 4-4, 300 ROs occupy half of the FPGA. The other half of

the FPGA is partially occupied by other logic used in FST. In the second run, the two areas

are swapped: the ROs occupy the left area and the other logic occupies the right. For each RO, the frequency is recorded

10 times. Overall, 18,000 frequencies are recorded for the 3-stage ROs and 9,000 each for the 5-stage

and 7-stage ROs.

Figure 4-5 shows the logic blocks for the FST test circuit. The challenge generator

produces the inputs to the MUX that activate one RO at a time. Each RO is activated for 0.4

ms, and there is a 0.1 ms time period before the next RO is activated. This reduces the

noise in the form of heat that is generated from the adjacent CLB [20]. The ROs are

activated in sequence from the top to the bottom of each column of the CLBs. A 0.2

ms gap between the RO and counter activation allows the signal to stabilize before the

measurement is started. The timing controller regulates all time intervals involved, such

as the time interval for each RO being activated and the time interval for the counter to

measure each RO.

Figure 4-5: FST circuit diagram.

Frequency is computed using Equation 4-10, where x is the cycle count from

each RO and y is the cycle count for the 50 MHz reference clock. The preset value for y

is 7,000 cycles, which means the RO cycles are measured within a 0.14 ms

period. The resolution of the measurement is about 0.007 MHz per counted RO cycle (50/7,000), which is adequate to record

the differences between frequencies generated from the ROs.

\[ f = \frac{x \times 50}{y}\ \text{MHz} \tag{4-10} \]
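The counter arithmetic of Equation 4-10 can be illustrated with a short sketch. The constant and function names are hypothetical, and the RO cycle count of 21,000 is an invented example value, not a measurement.

```python
REFERENCE_CLOCK_MHZ = 50.0   # reference clock stated in the text
REFERENCE_CYCLES = 7000      # preset value of y stated in the text

def ro_frequency_mhz(ro_cycles: int,
                     reference_cycles: int = REFERENCE_CYCLES,
                     reference_clock_mhz: float = REFERENCE_CLOCK_MHZ) -> float:
    """Equation 4-10: f = x * 50 / y, in MHz."""
    return ro_cycles * reference_clock_mhz / reference_cycles

# Measurement window: y cycles of a 50 MHz clock (50,000 cycles per ms).
window_ms = REFERENCE_CYCLES / (REFERENCE_CLOCK_MHZ * 1e3)
# Frequency change represented by one additional counted RO cycle.
resolution_mhz = ro_frequency_mhz(1)

print(window_ms)
print(round(resolution_mhz, 5))
print(ro_frequency_mhz(21000))  # hypothetical count of 21,000 RO cycles
```

The window evaluates to 0.14 ms and the resolution to roughly 0.007 MHz per cycle, matching the values in the text; a count of 21,000 cycles would correspond to 150 MHz.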

4.4 Results and Analysis

Response bits from 4, 5, and 7-stage ROs are generated to calculate diverseness,

uniformity, uniqueness, bit-aliasing, and reliability. The response bits are generated using

the chain-like neighbor coding method where neighboring ROs are compared [6]. The

first response bit is generated by comparing RO1, mapped in row 1 and column

1 of the CLBs, with RO2, mapped in row 2 and column 1 of the CLBs. Equation 4-1 is used

for comparison.

Table 4.1 shows the diverseness, uniformity, uniqueness, and bit-aliasing for 4, 5,

and 7-stage ROs. The diverseness of frequencies for the 3-stage RO is the highest

compared to 5 and 7-stage. The results in Table 4.1 clearly show that as the number of

stages used in ROs is reduced, the diverseness of the RO frequencies increases. However,

there is a limitation on the usability of ROs with low number of stages because each

FPGA chip has a maximum operating frequency. The Spartan 2 FPGA family has the

maximum operating frequency of 200 MHz [19]. The lowest number of RO stages that

can be used on Spartan 2 FPGA is 4 because the average frequency generated from the 4-stage

RO on Spartan 2 is 182.77 MHz. In contrast, the average frequency generated from the 3-stage

RO on Spartan 2 is 220 MHz, which exceeds the maximum operating frequency for

Spartan 2. If an RO produces a frequency beyond the operating frequency of the FPGA,

the frequency from the RO cannot be measured correctly. Frequencies generated from the

RO for all stages are verified using the Agilent Logic Analyzer, where the RO output is

connected directly to the output pin of the FPGA board [17].

A high diverseness of RO frequencies is good for ROPUF because it indicates

that there are high amounts of frequency variations. High frequency variations are

desirable for generating a high number of good CRPs which are discussed later in this

section. For authentication in ROPUF applications, a challenge cannot be reused

because reuse reduces the security level of the ROPUF: the response bits traverse the open

domain for verification and are susceptible to adversary attack [6]. This means that, in

order to make the ROPUF effective, an ample number of CRPs is needed.

A good RO should exhibit high diverseness for RO frequencies and should also

have good uniformity and uniqueness. As mentioned in Section 4.2, a good uniformity

and uniqueness average is 50%. For the uniformity average, the 5-stage ROs have the

highest value, and the 7-stage ROs have the lowest. However, the difference in

uniformity between the two is only 1.56 percentage points (Table 4.1). The average uniformity results for all stages

used can be considered good as the values are close to 50%. A uniformity value near 50% means

the secret bits generated are evenly distributed between 1s and 0s, which is a desired

randomness characteristic.

Table 4.1: Uniqueness, bit-aliasing, diverseness, and uniformity for 3, 5, and 7-stage ROs.

Stage    Uniqueness (%)    Bit-aliasing (%)    Diverseness (MHz)    Uniformity (%)
3        40.178            47.0228             1.9469               47.0228
5        34.5596           47.7146             1.2375               47.7146
7        40.5797           46.1539             0.736                46.1538

Table 4.1 shows the 3-stage and 7-stage ROs have better uniqueness than the 5-

stage ROs for inter-chip measurements. Average uniqueness is obtained by comparing

the responses generated from all three FPGA chips. The higher the differences between

responses from each chip, the higher the value of uniqueness. It is important to make sure

that the uniqueness is high because this indicates that the ROPUF could generate unique

responses across a large number of FPGA chips under the same challenge.

For bit-aliasing, the 5-stage RO has the highest percentage at 47.71%, and the 7-

stage has the lowest percentage at 46.15%. Nevertheless, all stages have good bit-aliasing

percentages that are close to 50%. Table 4.2 shows the diverseness frequencies for 4, 5

and 7-stage ROs for each FPGA chip used. The 3-stage ROs have the highest SD of RO

frequencies value for all three chips used compared to the other stages. These results are

consistent with the previous results presented in [17] where it was shown that as the

number of stages used in an RO is reduced, the diverseness of RO frequencies obtained is

higher. All three different FPGA chips showed the same pattern. It is found that the

diverseness of RO frequencies increases as the number of stages in ROs is reduced.

Table 4.2: SD for Chip 1, Chip 2, and Chip 3.

           Standard Deviation (MHz)
           CHIP 1      CHIP 2      CHIP 3
3-Stage    2.117440    2.586673    1.136851
5-Stage    0.938292    1.878756    0.895488
7-Stage    0.828066    0.821414    0.558649

We also study the effect of the number of stages on a ROPUF based on the

percentage of bit flip occurrences. To calculate the percentage of bit flip occurrences,

responses are recorded at different environmental conditions. In this experiment,

responses from 3, 5 and 7-stage ROs are generated at four different temperature settings,

as shown in Table 4.3. The experiment is conducted in a temperature controlled test

chamber. The frequencies from each RO are recorded 10 times at each temperature

setting. The responses are generated by comparing the average RO frequencies obtained.

The bit generation equation used is shown in Equation 4-1.

All responses obtained at various temperature settings are compared with the

responses generated at room temperature. The results obtained are shown in Table 4.3.

The lowest reliability is 97.32% at 0°C for the 5-stage ROs. In this case 8 bits flipped out of

299 bits. The highest reliability is found to be 99.33% at 20°C for the 3-stage and 7-stage ROs. In

this case 4 bits flipped out of 599 bits for the 3-stage ROs, and 2 bits flipped out of 299

bits for the 7-stage ROs. From Table 4.3, it can be observed that reducing the number of

stages in ROs has no direct relationship with the percentage of bit flip occurrences.

Table 4.3: Reliability for Chip 3.

             Reliability (%)
ROs stage    0°C        20°C       45°C       70°C
3            98.1636    99.3322    98.9983    98.9983
5            97.3244    98.9967    98.9967    97.9933
7            98.3278    99.3311    98.9967    98.6622
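The bit flip counts quoted in the text can be converted to reliability percentages to cross-check them against Table 4.3. The sketch below assumes reliability is simply the share of unflipped bits for a single comparison against the room-temperature response; the function name is hypothetical.

```python
def reliability_percent(flipped_bits: int, total_bits: int) -> float:
    """Share of response bits that did not flip, in percent."""
    return (1.0 - flipped_bits / total_bits) * 100.0

# Flip counts quoted in the text: 8 of 299, 4 of 599, and 2 of 299 bits.
for flipped, total in [(8, 299), (4, 599), (2, 299)]:
    print(flipped, total, round(reliability_percent(flipped, total), 4))
```

The three results, 97.3244%, 99.3322%, and 99.3311%, each appear as an entry in Table 4.3, which indicates which row and temperature each quoted count belongs to.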

We also investigate the relationship between the number of stages used in ROs

and CRP generation. To do this, all possible comparison pairs need to be generated. It

should be noted that there is a difference between a challenge and a comparison pair. A

challenge is the selection of the comparison pairs to form a response bitstream. One

challenge can consist of many comparison pairs depending on the design of the challenge

and the length of the response. For example, a challenge that produces a 128-bit response

might have 128 comparison pairs.

Figure 4-6 shows the list of possible challenge formations. The first three

response bits are generated from Pair 1, Pair 2, and Pair 3. If the comparison results

for Pair 1, Pair 2, and Pair 3 are 1, 0, and 1, then the response bits are 101. The challenge for

this response is the combination of the MUX inputs for Pair 1, Pair 2, and Pair 3, that is,

0000 0001 0010. The number of possible challenges can be computed as n!/((n-r)! r!),

where n is the number of available comparison pairs and r is the number of response bits.

As the number of available comparison pairs increases, the number of possible challenges

will also increase.
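The challenge count n!/((n-r)! r!) is the binomial coefficient, which the Python standard library provides directly. The following sketch uses a hypothetical function name, and the 128-bit response length is taken from the example earlier in this section.

```python
import math

def possible_challenges(comparison_pairs: int, response_bits: int) -> int:
    """Number of ways to choose r comparison pairs out of n available pairs."""
    # math.comb returns 0 when more pairs are requested than are available.
    return math.comb(comparison_pairs, response_bits)

print(possible_challenges(4, 3))      # 4 pairs, 3-bit response
print(possible_challenges(1, 128))    # a single pair cannot fill 128 bits
print(possible_challenges(381, 128) > possible_challenges(150, 128))
```

The count grows very quickly with the number of available comparison pairs, and it drops to zero as soon as fewer pairs are available than response bits are required.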

Figure 4-6: Difference between comparison pairs and CRPs.

The easiest way to generate all possible comparison pairs is an exhaustive pairwise

search, which has O(n^2) complexity. As mentioned earlier, the comparison pairs

generated need to be good. This means each comparison pair needs to pass a certain

Frequency Difference Threshold (FDT). To determine the FDT, the frequency differences

at all bit flip occurrences on all FPGA chips are checked. The maximum frequency

difference at which a bit flip occurs is set as the FDT. It is observed that the majority of bit flips

occur when the frequency difference between ROs is 1 MHz or below. The maximum

frequency difference at which a bit flip was observed is 3.5 MHz, which is therefore the FDT.

The pseudocode of the algorithm used to generate the various comparison pairs is

shown below. The inputs to the algorithm are all RO frequencies generated at room

temperature. The algorithm compares the frequency of each RO with those of the

rest of the ROs available, which has O(n^2) complexity. If the frequency difference passes

the FDT, then those ROs are selected as the comparison pair.

Comparison Pair Generation in pseudocode

Input:
    1) 600 frequencies for 3-stage ROs and 300 frequencies for 5 and
       7-stage ROs represented as RO frequencies(i).
    2) n is equal to the number of ROs.

Output:
    1) The list of all possible ROs comparison pairs that passed the
       FDT represented as ROs comparison pair(k,i).

Algorithm:
    i <- 0, j <- 0, k <- 1
    for i = 1 to n-1
        for j = i + 1 to n
            frequency difference = absolute(RO frequencies(i) - RO frequencies(j))
            if frequency difference > FDT
                comparison pair(k,1) = i
                comparison pair(k,2) = j
                k++
            end if
        end for
    end for
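The pseudocode above can be expressed compactly in Python. This is a sketch rather than the implementation used in the experiment: the function names are hypothetical, RO indices are kept 1-based to match the pseudocode, and the four frequencies are invented values.

```python
from itertools import combinations

def comparison_pairs(frequencies_mhz: list[float], fdt_mhz: float) -> list[tuple[int, int]]:
    """Return all 1-based RO index pairs whose frequency difference exceeds the FDT."""
    indexed = enumerate(frequencies_mhz, start=1)
    return [(i, j)
            for (i, fi), (j, fj) in combinations(indexed, 2)
            if abs(fi - fj) > fdt_mhz]

def response_bits(frequencies_mhz: list[float], pairs: list[tuple[int, int]]) -> list[int]:
    """Apply Equation 4-1 to each selected pair: 1 if the first RO is faster."""
    return [1 if frequencies_mhz[i - 1] > frequencies_mhz[j - 1] else 0
            for i, j in pairs]

# Hypothetical RO frequencies (MHz) and the 3.5 MHz FDT from the text.
freqs = [180.0, 184.0, 181.0, 176.0]
pairs = comparison_pairs(freqs, 3.5)
print(pairs)
print(response_bits(freqs, pairs))
```

Of the six possible pairs in this example, four exceed the threshold; the pairs (1, 3) and (2, 3) are rejected because their frequencies differ by only 1 MHz and 3 MHz.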

Table 4.4 shows the results obtained for comparison pair generation. It is

observed that the highest number of comparison pairs is generated from 3-stage ROs on

FPGA chip 2 at FDT equal to 2 MHz. The lowest number of comparison pairs (a single pair) is

generated from 7-stage ROs on FPGA chip 1 at an FDT of 3.5 MHz and on chip 3 at FDTs of 3 and 3.5 MHz.

In general, Table 4.4 shows that the number of comparison pairs generated is higher

when the number of stages used in the RO is reduced. As the FDT increases, the number of

comparison pairs is reduced.

Table 4.4: Number of comparison pairs generated on Chip 1, Chip 2, and Chip 3.

                           Number of Comparison Pairs at FDT (MHz)
FPGA Chip    ROs stage     2         2.5       3         3.5
1            3             45757     28746     16606     8685
1            5             5955      2749      1116      381
1            7             1221      167       8         1
2            3             102800    95161     86713     75831
2            5             22287     18422     14036     9776
2            7             1171      525       342       302
3            3             37932     23095     12769     7122
3            5             3811      2790      843       150
3            7             154       44        1         1

As mentioned earlier, the FDT used to filter all the bit flip occurrences is 3.5 MHz.

In Table 4.4, it is observed that the 7-stage RO is adversely affected by the higher value

of FDT. The comparison pairs generated from 7-stage ROs are 302 on chip 2, and only 1

on chips 1 and 3. This shows that 7-stage ROs cannot be used in a ROPUF, as the low

number of comparison pairs generated limits the usefulness of the ROPUF. For 5-stage

ROs, the comparison pairs generated at FDT 3.5 MHz are very low for chips 1 and 3 (381

and 150). The 3-stage ROs have the highest number of comparison pairs that can be

generated at FDT equal to 3.5 MHz.

4.5 Summary

This experiment was run on the Xilinx Spartan 2 FPGA chip that uses 180 nm

semiconductor process technology. Therefore, conclusions are based on the Xilinx

Spartan 2 FPGA and cannot be generalized to other FPGA technologies. For the

Spartan 2 FPGA chips, it can be concluded that the diverseness of RO frequencies

increases as the number of stages is reduced. The lowest number of stages that can be

used in an RO is dependent on the operating frequency of the FPGA chip. For Spartan 2

FPGA, the maximum operating frequency is 200 MHz. Therefore, the lowest number of

RO stages that can be used is 4 as the frequency produced from a 3-stage RO exceeds the

maximum operating frequency. This chapter shows that the lower number of stages used

in an RO does not compromise the uniqueness, uniformity, bit-aliasing, and reliability of

the ROs. The relationship between the number of stages used in the ROs and CRPs is

also established experimentally. It is found that more comparison pairs are generated

when a lower number of stages is used.

Chapter 5

A Novel RPM Technique to Minimize Systematic

Variations

Because PUFs rely highly on process variations, the response bits generated are

governed by the systematic process variations which reduce the randomness in the

response bits. In this chapter, we describe a novel Random Patch Mixer (RPM) technique

to minimize the systematic variation effects on the response bits. The RPM technique is

applied on data obtained from FPGA chips. It is shown that the RPM technique

successfully nullifies the systematic variation effect on the response bits generated by the

ROPUF on FPGA. It is also demonstrated that the responses generated after application of

the RPM Technique pass the NIST statistical test for randomness [21].

5.1 Introduction

The ROPUF produces a stream of ‘1’s and ‘0’s based on the process variation on

a silicon chip. The process variation is a random process that occurs during silicon chip

fabrication and is caused by inaccuracy in the fabrication process. This inaccuracy

produces a small delay that is not visible in the functional operation of the circuit on a

silicon chip. The ROPUF magnifies this small delay through the frequency generation

from a Ring Oscillator (RO). The difference in the frequencies generated by ROs is used

to generate a random binary bit stream, which is used for authentication or producing a

cryptographic encryption and decryption key [6]. The random binary bit stream is known

as the response. Each response is generated by a given challenge from the user. A

challenge is a binary bit stream that selects the RO pairs for comparison. Each challenge

produces a unique response.

The response generated from the ROPUF needs to be truly random for it to be

used for authentication or developing the cryptographic key. One way to verify true

randomness is by applying the NIST statistical test for randomness. Generating a truly

random response is one of the main challenges in a ROPUF. By default, a ROPUF does

not produce a truly random response because the process variation is not completely

random [8]. In silicon chips, there are two types of variations, systematic and stochastic

[13]. Systematic variation is caused by process and equipment non-uniformity, dissimilar

interactions between circuit layout and the chemical mechanical polishing process, and the

gradient of thermal annealing [13][27][28]. Stochastic variation is caused by the random

component that accounts for the difference between the observed data and the model

estimates. These include atomic-level stochastic phenomena, such as random dopant

profiles, any unidentified patterns, and measurement errors [3][28][29]. It seems that

systematic variation is more prevalent than stochastic variation in a ROPUF [8].

A systematic variation has a direct link with the true randomness in the response

generated from a ROPUF. It has been shown that the immediate output from a ROPUF

fails the NIST statistical test for randomness [29]. Therefore, the responses generated

from an ROPUF are not truly random [29]. Amongst the several techniques used to deal

with systematic variation is the regression based distiller [29]. The regression based

distiller is based on the polynomial regression and is applied before the secret selection

step. The regression based distiller has a high computational cost when implemented in

hardware.

In this chapter, a new Random Patch Mixer (RPM) technique is developed to cancel the

systematic variation effect on an FPGA chip.

5.2 Systematic Variations and Bit Flip Minimization

5.2.1 The Effect of Systematic Variations

As shown in Figure 5-1, systematic variations cause neighboring frequencies to be

correlated to each other. The graph shows a repeating pattern. Response bits are

generated by comparing the neighboring ROs, RO-n and RO-n+1. The response is 1 if the frequency of

RO-n is greater than that of RO-n+1; otherwise, the response is 0. For RO-1 until RO-20, the

response A is 00110000100100100101, and for RO-21 until RO-40, the response B is

01010100110110100101. RO-1 to RO-20 are mapped on the first column of the CLBs

and RO-21 to RO-40 are mapped on the second column of the CLBs, as shown in Figure

5-2. The Hamming distance between the two 20-bit responses is 5, which is very low. This implies

that responses A and B are 75% similar to each other. This reduces the randomness of the

responses.
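The similarity figure can be reproduced directly from the two response strings given above. The sketch below uses a hypothetical helper function; the two bit strings are copied from the text.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))

response_a = "00110000100100100101"  # RO-1 to RO-20, first CLB column
response_b = "01010100110110100101"  # RO-21 to RO-40, second CLB column

hd = hamming_distance(response_a, response_b)
similarity = (1 - hd / len(response_a)) * 100.0
print(hd, similarity)
```

Only 5 of the 20 bit positions differ, so the two columns produce responses that are 75% identical, far from the roughly 50% expected of independent random bit strings.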

5.2.2 Regression Based Distiller

One way to normalize the systematic variation effect on frequency distribution is

to apply the regression-based distiller technique as proposed in [13]. This technique uses

polynomial regression to capture the systematic variation. The regression-based distiller

technique eliminates the systematic variation effect for polynomial regression of order 2

and above. The regression-based distiller has been tested on several challenge selection

techniques such as S-sequence, T-sequence, 1-out-of-8 coding and neighbor coding. The

responses generated from all 4 techniques are evaluated using the NIST randomness test,

but none of the responses fully passes the tests. Nonetheless, the responses from S-sequence,

T-sequence, and 1-out-of-8 coding pass most of the tests. Only the neighbor coding

technique failed the entire test.

Figure 5-1: Frequencies across columns 1-3 of CLBs.

Figure 5-2: Location of ROs on Spartan 2 FPGA.

The regression-based distiller also incurs more computational cost. It can be seen

that the majority of the challenge selection techniques require a polynomial regression of

order 2 or above to pass the NIST randomness test.

5.2.3 RPM Technique

The RPM technique is based on the uniform random number generated from the

pseudorandom number generator. There are three steps involved in this technique:

a) Generation of N uniform random numbers that range from 0 to 1 (N is equal to

number of ROs) as shown in Equation 5-1.

b) Normalization of the random numbers generated to the maximum value of RO

frequency difference from the average RO frequency on an FPGA chip as given

by Equation 5-2, 5-3, and 5-4. The normalized random numbers will be the Patch

of the RO frequencies.

c) Addition of Patch to the RO frequencies. We use Equation 5-5 to determine

how the Patch can be added to the RO frequencies.

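$$\mathrm{PRN}\; x = \{x_i, x_{i+1}, \ldots, x_n\}, \quad i = 1, 2, 3, \ldots, n, \quad 0 < x < 1 \tag{5-1}$$

$$f_{avg} = \frac{1}{n} \sum_{i=1}^{n} f_i \tag{5-2}$$

$$a = \max\{f_i - f_{avg},\; f_{i+1} - f_{avg},\; \ldots,\; f_n - f_{avg}\} \tag{5-3}$$

$$\hat{x} = a\,\{x_i, x_{i+1}, \ldots, x_n\} \tag{5-4}$$

$$f_i' = \begin{cases} \hat{x}_i + f_i, & \text{if } f_i < f_{avg} \\ \hat{x}_i - f_i, & \text{otherwise} \end{cases} \tag{5-5}$$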

The RPM technique is designed to be simple and easy to implement on the

ROPUF circuit. The technique utilizes the normalized pseudorandom number (Patch) to

improve the frequency distribution randomness. The Patch used is different for each of

the FPGA chips used. The Patch is stored in the memory as part of the ROPUF circuit. A

concern with ROPUF security is whether or not an adversary could predict the responses

if the Patch of a certain ROPUF is known. In a later section, the responses generated after

the RPM technique has been applied, will be shown to have no correlation with the

Patch; hence, the responses cannot be predicted from a Patch.
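The three RPM steps and Equations 5-1 to 5-5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `apply_rpm`, the `seed` parameter, the example frequencies, and the use of Python's `random` module as the pseudorandom number generator are all hypothetical. The second branch of Equation 5-5 as printed would give a negative frequency, so the sketch assumes that subtracting the Patch from the RO frequency was intended.

```python
import random


def apply_rpm(frequencies, seed=0):
    """Apply one Random Patch Mixer pass to a list of RO frequencies (MHz).

    Returns (patched_frequencies, patch).
    """
    n = len(frequencies)
    rng = random.Random(seed)                    # hypothetical seeded PRNG
    x = [rng.random() for _ in range(n)]         # Eq. 5-1: uniform numbers in [0, 1)
    f_avg = sum(frequencies) / n                 # Eq. 5-2: average RO frequency
    a = max(f - f_avg for f in frequencies)      # Eq. 5-3: largest deviation above average
    patch = [a * x_i for x_i in x]               # Eq. 5-4: normalized random numbers (Patch)
    # Eq. 5-5. ASSUMPTION: the "otherwise" branch is taken as f_i - patch_i,
    # so that ROs below the average are raised and the others are lowered.
    patched = [
        f + p if f < f_avg else f - p
        for f, p in zip(frequencies, patch)
    ]
    return patched, patch


example_frequencies = [198.2, 199.1, 201.4, 203.0, 200.3]  # hypothetical values in MHz
patched, patch = apply_rpm(example_frequencies, seed=1)
```

Because the generator is seeded, the same Patch can be regenerated or stored per chip, which matches the statement that the Patch is kept in memory as part of the ROPUF circuit.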

5.3 Results and Discussion

In this section, the RPM technique is applied to the frequencies obtained from 3-

stage ROs on 29 Spartan 3E FPGA chips. The Spartan 3E chip has 240 CLBs and each

CLB has 4 slices with two LUTs on each slice. Each stage occupies one LUT. Thus, a 3-

stage RO can be fitted in one CLB. A total of 240 3-stage ROs are mapped on each

FPGA chip.

5.3.1 Systematic Variation Effect on Frequency Distribution

Figure 5-3 shows the frequency distributions across Spartan 3E FPGA 1, 2, and 3

for 3-stage ROs. In Figure 5-3 (a), the ROs on the left side region of the FPGA chip 1

produce lower frequencies, and the ROs in the center region produce higher frequencies.

It can be observed that the frequency of each RO is close to the neighboring ROs’

frequency. A similar trend can be observed on all three FPGA chips shown in Figure 5-3.

Figure 5-3: 3-stage RO frequencies across Spartan 3E. (a) Board 1, (b) Board 2, (c) Board 3.

It is observed that for the 29 Spartan 3E FPGA chips, high frequency ROs are

grouped mostly in the center region and low frequency ROs are distributed around the

high frequency ROs’ region. Figure 5-3 shows the effect of systematic variation on the

ROs frequency distribution.

5.3.2 Systematic Variations Minimization by RPM Technique

Figure 5-4 shows the ROs frequency distribution after the RPM technique is

applied to FPGA chips 1, 2, and 3. It can be observed that the RPM technique efficiently

increases the frequency distribution randomness and minimizes the systematic variation

effect on the frequency distribution across FPGA chips 1, 2, and 3. The distinct low and

high frequency regions observed in Figure 5-3 are no longer present.

Figure 5-4: 3-stage RO frequencies across Spartan 3E FPGA after RPM technique has been applied. (a) Chip 1, (b) Chip 2, (c) Chip 3.

Responses from FPGA Spartan 3E chip 1 before and after the RPM technique is

applied are compared to verify that the RPM technique successfully minimizes the

systematic variation effect on the responses generated. A response is generated using

neighbor coding. A total of 65 response bits are generated from each FPGA chip, as

shown in Table 5.1 and 5.2. The response bits are generated from the center part of the

Spartan 3E FPGA where the systematic variation is visibly present. The correlation for

responses before and after the RPM technique is applied is compared by measuring the

hamming distance (HD) percentage between each CLB column (comparisons are made

for CLB columns 7 until 11). For this comparison, an acceptable HD percentage is 50%

or higher, which means the responses generated between the CLBs columns have around

50% dissimilarity. Table 5.1 shows the response bits generated from each CLBs column

(table on the left) and the HD between neighboring CLBs column (table on the right)

before the RPM technique is applied. The results in Table 5.1 show that the HD

percentages for CLBs column 7-8 and 8-9 are low which represents the effect of the

systematic variation.

Table 5.2 shows the response bits generated from each CLBs column (table on

the left) and the HD between neighboring CLBs column (table on the right) after the

RPM technique is applied. The results in Table 5.2 show how the RPM technique

successfully minimized the systematic variation effect on the response generated. The

HD percentage for CLBs column 7-8 is increased from 15.38% to 46.15% and for CLBs

column 8-9 is increased from 30.77% to 53.85%.

Table 5.1: Hamming Distance (HD) for FPGA chip 1 before RPM is applied.

                   Response bits per CLBs column    HD between CLBs columns
                   7    8    9    10   11           7-8     8-9     9-10    10-11
                   0    0    0    0    0            0       0       0       0
                   1    1    1    1    1            0       0       0       0
                   0    0    0    0    1            0       0       0       1
                   1    0    0    1    0            1       0       1       1
                   1    1    1    0    1            0       0       1       1
                   0    0    1    0    0            0       1       1       0
                   1    1    1    1    1            0       0       0       0
                   1    1    0    1    0            0       1       1       1
                   0    0    0    0    1            0       0       0       1
                   1    1    0    1    0            0       1       1       1
                   1    0    1    0    0            1       1       1       0
                   0    0    0    0    1            0       0       0       1
                   1    1    1    1    0            0       0       0       1
Hamming Distance                                    2       4       6       8
HD Percentage %                                     15.38   30.77   46.15   61.54

Table 5.2: Hamming Distance (HD) for FPGA chip 1 after RPM is applied.

                   Response bits per CLBs column    HD between CLBs columns
                   7    8    9    10   11           7-8     8-9     9-10    10-11
                   1    1    1    1    0            0       0       0       1
                   1    0    0    1    0            1       0       1       1
                   0    0    0    1    1            0       0       1       0
                   1    1    0    0    0            0       1       0       0
                   0    0    1    0    1            0       1       1       1
                   0    0    0    0    0            0       0       0       0
                   0    1    0    1    1            1       1       1       0
                   1    0    1    0    1            1       1       1       1
                   0    0    1    0    1            0       1       1       1
                   0    1    0    1    0            1       1       1       1
                   1    1    0    1    0            0       1       1       1
                   1    0    0    1    1            1       0       1       0
                   0    1    1    0    0            1       0       1       0
Hamming Distance                                    6       7       10      7
HD Percentage %                                     46.15   53.85   76.92   53.85
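The HD percentages in Tables 5.1 and 5.2 can be recomputed from the response bits with a short Python sketch. The function name `column_hd_percent` and the encoding of each table row as a 5-character string (CLB columns 7 to 11, left to right) are assumptions made for illustration; the bit values are transcribed from the two tables.

```python
def column_hd_percent(rows):
    """Return the HD percentage between each pair of neighboring columns."""
    columns = list(zip(*rows))      # transpose: one tuple of bits per CLB column
    n_bits = len(rows)              # 13 response bits per column
    return [
        round(100.0 * sum(x != y for x, y in zip(left, right)) / n_bits, 2)
        for left, right in zip(columns, columns[1:])
    ]


# One string per table row; characters are CLB columns 7, 8, 9, 10, 11.
before_rpm = ["00000", "11111", "00001", "10010", "11101", "00100", "11111",
              "11010", "00001", "11010", "10100", "00001", "11110"]
after_rpm = ["11110", "10010", "00011", "11000", "00101", "00000", "01011",
             "10101", "00101", "01010", "11010", "10011", "01100"]

print(column_hd_percent(before_rpm))  # [15.38, 30.77, 46.15, 61.54]
print(column_hd_percent(after_rpm))   # [46.15, 53.85, 76.92, 53.85]
```

The four values in each list correspond to column pairs 7-8, 8-9, 9-10, and 10-11.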

The responses from Patch values for each FPGA chip used are generated to

measure the correlations between a Patch and the responses generated from the ROs

frequency after applying the RPM technique. A total of 239 bits are generated from each

Patch and ROs’ frequency. This is done to ensure that no correlation exists between the

Patch and response generated. If there is a correlation between the Patch and response

generated, the level of ROPUF security is compromised. HD percentage is used to

measure the correlation between the response generated from the Patch and the ROs’

frequency. The average HD obtained for 29 FPGA chips is 49.97%. Therefore, there is no

correlation between the Patch and the response.

5.3.3 NIST Statistical Test for Randomness

The responses generated from a ROPUF need to be truly random to ensure a good

security level for the ROPUF. The NIST Statistical Test for Randomness can be used to

measure the randomness feature inside the response generated from a ROPUF. In this

research, we tested responses generated by using neighbor coding, and an 8-to-1 selection

technique for 29 FPGA chips.

A total of 240 ROs are used to generate the responses. Results obtained for the

NIST statistical test for randomness are shown in Table 5.3. The responses generated

after applying the RPM technique by using the Neighbor Coding selection passed the

entire test except for ‘runs’ test. The responses generated from the 8-to-1 selection passed

all the tests.
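To give a sense of what one of these tests measures, the following Python sketch implements the Frequency (monobit) test as defined in NIST SP 800-22. It is an illustration only and not the test suite used to produce the results in this section; the function name `monobit_p_value` is an assumption, and the 10-bit input is the small worked example from the NIST document rather than a ROPUF response.

```python
import math


def monobit_p_value(bits):
    """NIST SP 800-22 Frequency (monobit) test: return the P-value."""
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)   # map 1 -> +1 and 0 -> -1, then sum
    s_obs = abs(s_n) / math.sqrt(n)           # normalized test statistic
    return math.erfc(s_obs / math.sqrt(2))


bits = [int(c) for c in "1011010101"]
p_value = monobit_p_value(bits)
passed = p_value >= 0.01                      # 0.01 is the usual significance level
print(round(p_value, 6), passed)              # prints: 0.527089 True
```

A sequence passes when the proportion of ones and zeros is close enough to one half; the other tests in Table 5.3 probe different departures from randomness, such as run lengths and repeated patterns.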

Table 5.3: NIST statistical test for randomness results.

                                                   Success Percentage (%)
Statistical test          NIST input parameters    Neighbor Coding    8-to-1
Frequency                 -                        100                100
BlockFrequency            128                      100                100
CumulativeSums            -                        100                100
Runs                      -                        Failed             100
LongestRun                -                        60                 100
FFT                       -                        100                100
NonOverlappingTemplate    9                        80                 80
OverlappingTemplate       9                        100                100
ApproximateEntropy        4                        100                100
Serial                    16                       80                 100
LinearComplexity          500                      100                100

The results obtained are better than the regression-based distiller technique results [3].

None of the polynomial distiller orders makes a meaningful improvement in the randomness

for the neighbor coding selection. For the 8-to-1 selection, the responses passed all of the

tests only when the 4th-order distiller is applied.

5.4 Summary

As a ROPUF utilizes the process variations to generate the secured response bits,

vulnerability still exists. The systematic variations dominate the overall process

variations; therefore, it is important to nullify the systematic variation effect and increase

the true randomness on the response generation in a ROPUF. To address this issue, we

propose our RPM technique which gives better results in terms of the response

randomness generated from the ROPUF.

Chapter 6

Temperature, Voltage, and Aging Effects on ROPUFs

Function

6.1 Introduction

Silicon physical unclonable functions (PUFs) take advantage of the random

process variations inherent in silicon chips. The random process variations are unique for

each chip and cannot be modeled. These randomness and uniqueness characteristics of

silicon chips have been exploited by researchers in designing PUFs for hardware security.

As PUFs are highly reliant on process variations in the chip, it is desirable that they

should be resilient towards other temporal changes. The changes may occur due to

exposure to temporal variabilities which can be observed in the frequency of the Ring

Oscillator. The temporal variabilities can be divided into reversible and irreversible

variabilities. The reversible variability causes temporary changes to the circuit’s behavior

inside the silicon chip and can be caused by the environmental variations such as voltage

and temperature. However, overexposure to high voltage and temperature may lead to

irreversible variability [30]. Irreversible variability can also be caused by silicon chip

aging. There are three types of aging effects: the first is hot carrier injection

(HCI), the second is the trap charge in the dielectric due to bias temperature instability, and

the third is the oxide breakdown due to electrically active defects known as traps. These

traps occur within the dielectric. In this study, we simulate the first type of aging that is

caused by HCI. The HCI causes the electrical charges to build up within the dielectric

layer thereby increasing the threshold voltage needed to turn the transistor on. The

increased threshold voltage results in increased transistor switching time, thus slowing

the transistor speed [31].

In this chapter, we present accelerated aging experimental results along with

temperature and voltage effects done under normal environmental conditions on 9

Spartan 3E FPGAs. ROs having 3, 5, and 7-stages are mapped on Spartan 3E FPGAs.

Voltage and temperature variation experiments are performed separately (3 and 5-stage

ROs).

6.2 Background

We briefly describe work done in the past to study the effect of temperature,

voltage, and aging on ROs. Accelerated aging experiment for 5-stage ROs mapped on 90

nm FPGAs is presented in [32]. In this study, it is observed that aging causes ROPUF

responses to be unreliable. Simulated aging on ROs using HSPICE is presented in [33]. It

is observed that 4% of the ROPUF bits are prone to instability due to aging. Temperature

and voltage effects on ROs have been analyzed in [11]. This study concludes that

ROPUF reliability reduces due to voltage and temperature variations. Whereas prior work

is focused on studying the effect of temporal changes for fixed stage ROPUFs, we

analyze the effect of temporal changes on ROPUFs having different stages.

6.2.1 Ring Oscillator PUF response

RO frequency is typically generated from a series of inverters comprising the RO

loop. The presence of process variation inside the chip causes uneven delays across the

chip. Hence a pair of ROs mapped at two different chip locations produces two different

frequencies: fa and fb. Frequencies fa and fb are compared to see which one has the

higher frequency. If fa is greater than fb, a response bit 1 is generated; otherwise the

response is 0.

6.2.2 Number of Stages in Ring Oscillator

In this experiment, ROs having three different stages are used. The 5-stage RO

consists of one NAND gate and 4 inverter gates as shown in Figure 4-1. The NAND gate

is used to control the on and off switching of the RO. The RO is activated (starts to

produce an oscillation) when the input is set to high. The 3-stage RO consists of one

NAND gate, one buffer, and two inverter gates. The buffer is used instead of the inverter

to obtain an odd number of inversions. The buffer gate is added to increase the total delay

in the RO in order to reduce the RO frequency. The 7-stage RO consists of one NAND

gate and 6 inverters.

6.3 Experimental Setup

The experimental circuitry is shown in Figure 4-5. The challenge generator is used

to produce the inputs to the MUX which activates one RO at a time. ROs are activated,

one at a time, from the top to the bottom of each column of the FPGA. Each RO is

activated for 0.4 ms. There is a 0.1 ms delay before the next RO is activated; this is to

reduce the noise in the form of heat that can be generated from the adjacent CLB [20]. A

0.2 ms delay gap is given between the RO and the counter activation for the signal to be

stabilized before the measurement starts. The timing controller controls all time intervals

involved, such as the time interval for the RO activation and the time interval for the

counter to measure each RO.

The frequency is computed using x × (50/y), where x is the cycle counts from each

RO and y is the cycle counts for the 50 MHz reference clock. The preset value for y is set

to 10000 cycles implying that the RO cycles are measured within a 0.2 ms period. The

accuracy of the measurement is 0.005 MHz/cycle which is good enough to measure the

differences between frequencies generated from the ROs.
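The frequency formula and the stated measurement window and resolution can be worked through in a few lines of Python. The function name `ro_frequency_mhz` and the example count of 40000 RO cycles are hypothetical; the 50 MHz reference clock and the preset of 10000 reference cycles come from the text.

```python
REFERENCE_CLOCK_MHZ = 50.0


def ro_frequency_mhz(ro_cycles, ref_cycles=10_000):
    """Estimate the RO frequency in MHz as x * (50 / y)."""
    return ro_cycles * (REFERENCE_CLOCK_MHZ / ref_cycles)


# Measurement window: 10000 reference cycles at 50 MHz, in milliseconds (about 0.2 ms).
window_ms = 10_000 / (REFERENCE_CLOCK_MHZ * 1e6) * 1e3

# Resolution: the frequency change represented by one RO count (about 0.005 MHz).
resolution_mhz = ro_frequency_mhz(1)

# Hypothetical example: 40000 RO cycles counted in the window gives about 200 MHz.
example_mhz = ro_frequency_mhz(40_000)
```

Each additional RO cycle counted within the window changes the estimate by 0.005 MHz, which is the measurement accuracy quoted above.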

For the accelerated aging experiment on Spartan 3E, each RO is activated every

64 ms. Each activation turns on the RO for a time period of 0.4 ms. Thus, each RO is

activated 1.3 million times a day. This aging experiment is conducted for 30 days. The

number of ROs mapped on Spartan 3E is 120. ROs are numbered according to the

location they are mapped on the Spartan 3E as shown in Figure 6-1. Responses are

generated by using a neighbor chain scheme where RO(n) is compared with RO(n+1). In

total, there are 119 response bits generated from 120 ROs.
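The neighbor chain scheme can be sketched in Python as follows. The function name `neighbor_chain_response` and the example frequencies are hypothetical; the comparison rule (bit is 1 when RO(n) is faster than RO(n+1), otherwise 0) follows the description in Section 6.2.1.

```python
def neighbor_chain_response(frequencies):
    """Compare RO(n) with RO(n+1): emit 1 if RO(n) is faster, otherwise 0."""
    return [
        1 if f_n > f_next else 0
        for f_n, f_next in zip(frequencies, frequencies[1:])
    ]


example = [200.1, 199.8, 200.4, 200.4, 198.9]  # hypothetical frequencies in MHz
print(neighbor_chain_response(example))        # [1, 0, 0, 1]

# N ROs always yield N - 1 response bits, so 120 ROs give 119 bits.
assert len(neighbor_chain_response([200.0] * 120)) == 119
```

Note that equal frequencies produce a 0 under this rule, since the bit is 1 only for a strictly greater frequency.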

Figure 6-1: ROs numbering system based on spatial location.

6.4 Results and Analysis

Table 6.1 shows the total number of bit flip occurrences for Spartan 3E FPGAs

for the 30 day aging period. Responses from all FPGAs are recorded once every day.

Thus 30 responses are recorded for 30 days from each FPGA. These responses are

compared to the responses generated at normal setting to measure the bit flip occurrences.

The total bit flip occurrences for 3, 5, and 7 stage ROs are found to be 192, 250 and 267,

respectively (three FPGAs are used for each RO stage). FPGA 3 has the lowest number

of bit flip occurrences of 18. This is because many of the RO comparison pairs in FPGA

3 have a high frequency difference. On the other hand, FPGA 9 has the highest number of

bit flip occurrences since many of the RO comparison pairs have a low frequency

difference. Figure 6-2 (a) shows the example of the bit flip occurrence when the

difference between RO comparison pair is small (below 1 MHz) [6]. The frequency

generated from the blue RO tends to reduce faster when compared to the green RO’s

frequency when the temperature is increased; therefore, a bit flip occurs. Figure 6-2 (b)

shows how the bit flip occurrence can be prevented by selecting an RO comparison pair

that has higher frequency difference. It is important to note that most of the bit flips occur

at the same bit locations which have lower frequency difference in the RO pairs.
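Counting bit flips against a reference response, and locating the positions where they occur, can be sketched in Python as below. The function names and the 7-bit example data are hypothetical and are not measurements from this experiment.

```python
from collections import Counter


def count_bit_flips(reference, recorded):
    """Total number of bits that differ from the reference over all recordings."""
    return sum(
        sum(r != d for r, d in zip(reference, response))
        for response in recorded
    )


def flips_per_position(reference, recorded):
    """Count how often each bit position differs from the reference."""
    counts = Counter()
    for response in recorded:
        for index, (r, d) in enumerate(zip(reference, response)):
            if r != d:
                counts[index] += 1
    return counts


reference = "1011001"                             # response at the normal setting
recorded = ["1011001", "1010001", "1010011"]      # hypothetical daily responses

print(count_bit_flips(reference, recorded))           # 3
print(dict(flips_per_position(reference, recorded)))  # {3: 2, 5: 1}
```

A per-position count like this makes it easy to see whether flips concentrate at a few bit locations, as observed for RO pairs with a low frequency difference.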

Table 6.1: Bit flip occurrences on Spartan 3E.

             3-stage ROs          5-stage ROs          7-stage ROs
FPGA         1      2      3      4      5      6      7      8      9
Bit flips    88     86     18     75     78     97     82     82     103
Total        192                  250                  267

Figure 6-2: The relationship between the RO frequency distance and the probability of a PUF output flip (panels (a) and (b)).

Table 6.2 shows the bit flip occurrences for 3, 5, and 7-stage ROs with respect to

frequency differences in RO pairs. For 5 and 7-stage ROs, most of the bit flips occur

when the frequency difference in RO comparison pairs is lower than 0.4 MHz. Few bit

flips occur when the frequency difference lies between 0.3 and 0.7 MHz. For the 3-stage

ROs, maximum bit flips occur when the frequency difference is lower than 0.3 MHz.

There are some bit flips at the higher frequency range. These results suggest that the 3-

stage ROs are more susceptible to noise compared to 5 and 7-stage ROs. For the

maximum frequency difference (1.0-1.2 MHz), the number of bit flips in the 3-stage

ROs is 8 which still can be considered as low since 3-stage ROs have a standard

deviation of 1.9 MHz compared to 1.2 and 0.7 MHz for 5 and 7-stage ROs, respectively

[34]. High standard deviation in ROs implies that the range between the minimum and

maximum frequency used in the ROPUF is high. Therefore, many RO comparison pairs

that have high frequency differences are generated.

Figure 6-3 (a), (b), and (c) show the frequency changes due to aging effects for 10

different ROs for 3, 5, and 7-stages. It can be seen that the 3, 5, and 7-stage ROs have

minimal frequency fluctuations for the 30 day aging period. Some frequencies overlap

(e.g. RO1 and RO2, RO4 and RO9 in Figure 6-3 (b)). This illustrates how bit flips can

occur. It can also be seen from Figure 6-3 that there is no significant difference in

frequency fluctuations, as a result of aging, when different number of stages are used in

the ROPUF.

Table 6.2: Bit flip occurrences due to aging on Spartan 3E.

RO Pairs' Frequency           Bit Flip Occurrences
Differences Ranges (MHz)      3-stage    5-stage    7-stage
0-0.09                        46         109        182
0.1-0.19                      42         114        70
0.2-0.29                      46         14         7
0.3-0.39                      11         12         1
0.4-0.49                      12         0          0
0.5-0.59                      7          1          1
0.6-0.69                      5          0          2
0.7-0.79                      12         0          1
0.8-0.89                      2          0          0
0.9-0.99                      1          0          0
1.0-1.2                       8          0          0

Figure 6-3: Frequency changes in ROs due to the aging effect on Spartan 3E. (a) 3-stage ROs, (b) 5-stage ROs, (c) 7-stage ROs.

Figure 6-4 (a) and (b) show the frequency changes with respect to the temperature

variations for 40 ROs that are mapped at different spatial locations on the same FPGA

chip for 3 and 5-stage ROs. Responses are generated at three different environment

temperatures, namely, room temperature, 45°C and 70°C. Different temperatures are

generated using a temperature chamber, as shown in Figure 6-6. It is observed that the ROs

are sensitive to the temperature variations. As the environment temperature increases,

both 3 and 5-stage RO frequencies decrease uniformly. Similar patterns are observed for

all RO frequencies at each of the three different environment temperatures which

suggests that the effect of the temperature variations is uniformly distributed on the

FPGA chip.

Figure 6-5 (a) and (b) show the frequency changes with respect to the voltage

variations for 40 ROs that are mapped on different locations (as shown in Figure 6-1) in

the same FPGA chip for 3 and 5-stage ROs. Responses are generated for three different

internal core supply voltages (VCCINT), namely, 1.2V (normal), 1.3V and 1.4V. It is

observed that both 3 and 5-stage RO frequencies are sensitive to the voltage variations.

The average frequency increment is 20 MHz for 1.3V and 50 MHz for 1.4V when

compared to the normal 1.2V VCCINT. Although the frequency increases substantially at

the higher VCCINT, the RO frequencies for 3 and 5-stage ROs follow the same

pattern, which implies the voltage variation effect is uniformly distributed throughout the

FPGA chip. The bit flips do occur due to temperature and voltage variations but only

when the frequency difference in the RO comparison pair is lower than 1.5 MHz.

Figure 6-4: Frequency changes with respect to the temperature variations on Spartan 3E. (a) 3-stage ROs, (b) 5-stage ROs.

Figure 6-5: Frequency changes with respect to the voltage variations on Spartan 3E. (a) 3-stage ROs, (b) 5-stage ROs.

Figure 6-6: Temperature chamber.

Table 6.3 shows the percentages of bit flip occurrences due to temperature,

voltage variations, and aging on 9 Spartan 3E FPGAs. For temperature variations, the

responses from ROPUF are generated at three different settings: room temperature, 45°C,

and 70°C. For voltage variations, the responses are generated using three different

internal core supply voltages: 1.2V (normal), 1.3V, and 1.4V.

Responses generated at different temperature and voltage settings are compared

with the responses generated at normal settings to measure the percentage of bit flip

occurrences. As the number of stages increases, the percentage bit flip occurrences also

increase (except at 70°C). Voltage variations seem to be causing the most bit flip

occurrences followed by temperature and aging. The maximum bit flip percentages for 3,

5, and 7-stage ROPUFs are 2.8%, 5.6%, and 8.4%, respectively. Based on our

experimental results, we conclude that bit flips occur only when the frequency difference

in the RO comparison pair is lower than 1.5 MHz.

Table 6.3: Percentage of bit flip occurrences.

                       Percentage of bit flip occurrences (%)
RO number of stages    Temperature         Voltage            Aging
                       45°C      70°C      1.3V      1.4V
3                      2.24      1.96      2.24      2.80     1.79
5                      2.52      3.92      5.88      5.60     2.33
7                      4.20      3.08      7.28      8.40     2.46

The results presented in this chapter suggest that the temporal variabilities can

affect the ROPUF functionality only if the frequency difference between RO comparison

pair is low. We propose that a high threshold be used to select the RO comparison pairs

in ROPUF to prevent the effect of temporal variabilities. Table 6.4 shows the number of

RO comparison pairs generated based on different frequency thresholds. The comparison

pairs are generated using a selection sort algorithm that has O(n²) complexity [34].

A higher number of RO comparison pairs is also required for better security

[34]. The results in Table 6.4 suggest that the 3-stage RO has better security feature as it

has the highest number of RO comparison pairs compared to the 5 and 7-stage ROs.
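As a rough illustration of threshold-based pair selection, the Python sketch below counts the RO pairs whose frequency difference meets a given threshold. It is a brute-force O(n²) enumeration and not the sorting-based algorithm of [34]; the function name, the choice to count every unordered pair once, the use of "greater than or equal to" for the threshold, and the example frequencies are all assumptions.

```python
from itertools import combinations


def count_pairs_above_threshold(frequencies, threshold_mhz):
    """Count unordered RO pairs whose frequency difference is >= threshold."""
    return sum(
        1
        for f_a, f_b in combinations(frequencies, 2)
        if abs(f_a - f_b) >= threshold_mhz
    )


example = [200.0, 201.0, 203.5, 206.0]            # hypothetical frequencies in MHz
print(count_pairs_above_threshold(example, 2.5))  # 5
print(count_pairs_above_threshold(example, 3.5))  # 3
```

Raising the threshold can only keep or reduce the number of usable pairs, which is the trade-off between reliability and the number of available comparison pairs discussed here.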

Table 6.4: Number of comparison pairs according to threshold frequency.

             Threshold Frequency (MHz)
ROs stage    2        2.5      3        3.5
             Number of RO comparison pairs
3            4671     4158     3682     3152
5            4043     3446     2860     2336
7            3128     2344     1701     1191

6.5 Summary

In this chapter, we study the effect of accelerated aging, voltage, and temperature

variations for different number of stages used in a ROPUF. Our experimental results show

that RO frequencies are sensitive to aging, voltage, and temperature regardless of the

number of RO stages used in a ROPUF. The percentage of bit flips is observed to increase

as the number of stages increases. Most bit flips occur when the frequency difference

between RO comparison pairs is low. We suggest that only RO comparison pairs that have

high frequency differences be used in a ROPUF in order to reduce temporal variabilities.

Our work shows that the 3-stage ROPUF has the lowest percentage of bit flip occurrences

and is more secure.

Chapter 7

A Comparative Study of Ring Oscillator PUFs on

Different FPGA Families

7.1 Introduction

ROPUF utilizes ring oscillators (ROs) to exploit the process variation inside a

silicon chip to generate a unique ID. A typical ROPUF comprises ring oscillators,

multiplexers (MUXs), counters, and a comparator. A ROPUF can generate a binary bit

stream (response) from a given input bit stream (challenge). A ROPUF can generate

multiple sets of responses from different sets of challenges. A challenge together with the

response it produces is known as a challenge-response pair (CRP).

Earlier studies have shown that ROPUF can be implemented on FPGAs [3][44].

Unlike other types of PUFs, such as the Arbiter PUF (APUF) and Butterfly PUF (BPUF), which

require a stringently symmetric circuit, ROPUF circuits do not need to be symmetric,

which makes ROPUFs attractive. The only requirement in a ROPUF circuit

is that the ROs need to be identical, and this can be achieved by creating a hard macro for

the RO and instantiating it as many times as needed. If two ROs are identical then the

difference in the frequencies generated is due to process variation.

FPGA security is a concern among the FPGA manufacturers. FPGAs are prone to

several security issues such as IP protection, cloning, side channel attack and tampering.

Xilinx, for example, has introduced Device DNA as the additional security feature in

some of its FPGA chips [45]. ROPUF can be used as an additional security feature in

FPGAs. Any tampering attempt by hackers will change the unique parameters of the

process variation [3]. ROPUF can be applied as a secret bit generator (which is known as

response in PUF applications) where it can generate n bits of response for authentication

purpose. Besides that, response bits generated from ROPUF can be applied as a

cryptography key to encode and decode secure information [6].

Despite the promising solution offered by ROPUF, there are still challenges that

need to be overcome for ROPUF to become a practical solution. Improving the uniqueness

and reliability of the ROPUF response is among these challenges.

Uniqueness refers to the ability of similar ROPUF circuits to generate unique responses

on different chips. Reliability refers to the generation of same response under various

environmental conditions such as temperature and voltage. ROPUF’s reliability can also

be affected by silicon aging.

Current FPGA families are fabricated using the latest silicon technology which

provides smaller transistor size. Smaller transistor size gives better performance on the

FPGA chip in terms of speed and power consumption, but in terms of the performance of

ROPUF implementation on FPGA, it still needs to be studied [6]. As the silicon

technology shrinks, the process variation parameters will also change [13]. In this work,

we analyze ROPUF parameters on two different Xilinx FPGA families that use different

silicon technologies: 28 nm technology (Artix 7) and 90 nm technology (Spartan 3E).

The work focuses on:

1) ROPUF’s comparison on two FPGA families that use different

silicon technologies: We compare ROPUF’s responses from two different

FPGA families in terms of five parameters: uniqueness, reliability,

uniformity, bit aliasing, and diverseness.

2) Temperature, voltage, and aging effects: For reliability, we compare the

responses generated at different temperature and voltage settings. We also

compare the responses generated through an accelerated aging

experiment.

7.2 Related Work

Some work has been done in the past to study ROPUF performance on FPGA.

Large scale characterization of ROPUF on Spartan 3E (90 nm silicon technology) FPGAs

has been done in [11]. They show that the average inter-die hamming distance (HD) for

ROPUF is 47.31% and the average intra-die HD is 0.86% at normal operating condition.

The hamming weight (HW) for ROPUF responses is shown to lie between 46% and 56%.

In [6], implementation of ROPUF on Virtex 4 (90 nm silicon technology) FPGAs is

presented. It is shown that the inter-chip HD is 46.15%. The accelerated aging

experiment on 5-stage ROs mapped on Spartan 3E FPGAs is presented in [46]. It is

observed that aging causes ROPUF responses to be unreliable. Simulated aging on ROs

using HSPICE is shown in [47]. It is observed that 4% of the ROPUF bits are prone to

instability due to aging. The experiment on temperature and voltage effects on ROs is

presented in [11]. It is shown that ROPUF reliability reduces due to voltage and

temperature variations.

7.3 Background

7.3.1 Ring Oscillator PUF response

RO frequency is typically generated from a series of inverters comprising the RO

loop. The presence of process variation inside the chip causes uneven delays across the

chip. Hence a pair of ROs mapped at two different chip locations produces two different

frequencies: fa and fb. Frequencies fa and fb are compared to see which one has the

higher frequency. If fa is greater than fb, a response bit 1 is generated; otherwise the

response is 0.

7.3.2 Number of Stages in Ring Oscillator

In this experiment, 5-stage ROs are used. The 5-stage RO consists of one NAND

gate and 4 inverter gates as shown in Figure 4-1. The NAND gate is used to control the

on and off switching of the RO. The RO is activated (starts to produce an oscillation)

when the input is set to high.

7.3.3 ROPUF parameters

For PUF implementations, different researchers have used different parameters in

the past [11][34][48]. In this work, we use five of the most common parameters. These

parameters are uniqueness, reliability, uniformity, bit-aliasing and diverseness. The

uniqueness can be measured by comparing the Hamming Distance (HD) between

responses from different FPGA chips in the same family. The equation used to measure

the uniqueness is shown in Equation 7-1:

$$\text{Uniqueness} = \frac{2}{m(m-1)} \sum_{u=1}^{m-1} \sum_{v=u+1}^{m} \frac{HD(R_u, R_v)}{n} \times 100\% \tag{7-1}$$

where, m is the number of chips used, u and v are the two chips being compared,

and n is the number of responses generated. Ru and Rv are the response from the same

challenge C for chips u and v. HD is the hamming distance between the responses

generated from chips u and v. A higher uniqueness percentage represents better

uniqueness in the response generated from ROPUF. But considering the large number of

response bits, a good uniqueness percentage should be around 50%. This means that about

50% of the response bits generated from chips u and v differ from each other (responses

obtained by giving the same challenge to chips u and v).
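Equation 7-1 can be sketched in Python as the average pairwise Hamming distance over all chip pairs. The function name `uniqueness_percent` and the three 4-bit example responses are hypothetical; responses are represented as bit strings of equal length n.

```python
from itertools import combinations


def uniqueness_percent(responses):
    """Average inter-chip Hamming distance over all chip pairs, in percent."""
    m = len(responses)          # number of chips
    n = len(responses[0])       # number of response bits per chip
    total = sum(
        sum(a != b for a, b in zip(r_u, r_v)) / n
        for r_u, r_v in combinations(responses, 2)   # all pairs with u < v
    )
    return 2.0 / (m * (m - 1)) * total * 100.0


chips = ["0011", "0101", "1111"]        # hypothetical responses to the same challenge
uniqueness = uniqueness_percent(chips)  # every pair differs in 2 of 4 bits: about 50%
```

The factor 2 / (m(m-1)) is the reciprocal of the number of chip pairs, so the result is simply the mean of the pairwise HD percentages.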

The reliability can be measured by comparing the response from the same FPGA

chip that is generated under different environmental conditions such as temperature and

voltage. The equation used to measure the reliability is shown in Equations 7-2 and 7-3.

Rs is the response from chip i at normal operating condition (at room temperature and

normal operating voltage). R′s,t is the t-th sample of the response from chip i at a different

operating condition such as different temperature and voltage settings. A good reliability

value is 100%. As can be seen in Equation 7-3, if the HD intra is low or zero, then the

reliability will be around 100%.

$$HD_{Intra} = \frac{1}{k} \sum_{t=1}^{k} \frac{HD(R_s, R'_{s,t})}{n} \times 100\% \tag{7-2}$$

$$\text{Reliability} = 100\% - HD_{Intra} \tag{7-3}$$
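Equations 7-2 and 7-3 can be sketched in Python as follows. The function name `reliability_percent` and the 8-bit example data are hypothetical; `reference` plays the role of the response at normal operating conditions and `samples` are the k responses taken under other conditions.

```python
def reliability_percent(reference, samples):
    """Return 100% minus the average intra-chip Hamming distance (in percent)."""
    n = len(reference)          # number of response bits
    k = len(samples)            # number of samples under other conditions
    hd_intra = sum(
        sum(a != b for a, b in zip(reference, sample)) / n
        for sample in samples
    ) / k * 100.0               # Eq. 7-2
    return 100.0 - hd_intra     # Eq. 7-3


reference = "10110100"                           # hypothetical nominal response
samples = ["10110100", "10110110", "00110100"]   # hypothetical responses at other settings
reliability = reliability_percent(reference, samples)  # about 91.67%
```

In the example, the three samples differ from the reference in 0, 1, and 1 bits out of 8, giving an average intra-chip HD of about 8.33%.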

The uniformity and bit-aliasing parameters can be measured by using Hamming

Weights (HWs) as shown in Equation 7-4 and 7-5 where rs,l is the l-th binary bit. The HW

of the response from an FPGA chip represents the uniformity and the HW of the

responses from different FPGA chips represents the bit-aliasing. The HW for bit aliasing

is measured across the same bit location in responses from different FPGA chips. A good

value for uniformity and bit aliasing is around 50%, which means the response from RO

is well distributed between ‘0’s and ‘1’s.

$$\text{Uniformity} = \frac{1}{n} \sum_{l=1}^{n} r_{s,l} \times 100\% \tag{7-4}$$

$$\text{Bit-aliasing} = \frac{1}{m} \sum_{i=1}^{m} r_{s,l} \times 100\% \tag{7-5}$$
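The two Hamming-weight metrics can be sketched in Python as below. The function names and the example responses are hypothetical. Following the description in the text, uniformity is taken over the bits of one chip's response, and bit-aliasing is taken over the same bit location across the responses of different chips.

```python
def uniformity_percent(response):
    """Percentage of 1s within a single chip's response (Eq. 7-4)."""
    return 100.0 * sum(response) / len(response)


def bit_aliasing_percent(responses, bit_index):
    """Percentage of 1s at one bit location across chips (Eq. 7-5)."""
    return 100.0 * sum(r[bit_index] for r in responses) / len(responses)


# Hypothetical responses from four chips, four bits each.
chips = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]

print(uniformity_percent(chips[0]))    # 50.0
print(bit_aliasing_percent(chips, 0))  # 75.0
print(bit_aliasing_percent(chips, 1))  # 25.0
```

In this example every chip is perfectly uniform, yet bit 0 is biased toward 1 across chips, which is the kind of bias that bit-aliasing is meant to expose.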

The diverseness of the frequency can be measured by using standard deviation as

shown in Equations 7-6, 7-7 and 7-8. Diverseness represents the spread of the frequencies

generated from the ROs. A diverseness value that is close to 0

shows that the ROs’ frequencies tend to be very close to the ROs’ mean frequency. A

high diverseness shows that the frequencies are spread out over a wider range of values.

The advantages of having higher diverseness have been discussed in detail [34]. In

Equation 7-6, h is the number of ROs and fi,j is the mean frequency of the j-th RO in the

i-th chip. In Equation 7-7, fi,j,q is the q-th frequency sample of the j-th RO in the i-th chip.

favg is the average frequency of the ROs on an FPGA chip.

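\[ \text{Diverseness} = \sqrt{\frac{1}{h-1} \sum_{j=1}^{h} \left(f_{i,j} - f_{avg}\right)^{2}} \tag{7-6} \]

\[ f_{i,j} = \frac{1}{q} \sum_{q=1}^{q} f_{i,j,q} \tag{7-7} \]

\[ f_{avg} = \frac{1}{h} \sum_{j=1}^{h} f_{i,j} \tag{7-8} \]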
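
Equations 7-6 to 7-8 amount to a sample standard deviation over the per-RO mean frequencies of one chip. A minimal sketch, with hypothetical function names and made-up frequency samples in MHz:

```python
import math

def ro_mean(samples):
    """Mean of the frequency samples of one RO (Equation 7-7)."""
    return sum(samples) / len(samples)

def diverseness(ro_samples):
    """Sample standard deviation of per-RO mean frequencies (Equation 7-6)."""
    f = [ro_mean(s) for s in ro_samples]   # f_{i,j} for every RO j on chip i
    h = len(f)                             # number of ROs
    f_avg = sum(f) / h                     # Equation 7-8
    return math.sqrt(sum((x - f_avg) ** 2 for x in f) / (h - 1))

# Three hypothetical ROs on one chip, two frequency samples each.
ro_samples = [[100.0, 102.0], [104.0, 106.0], [108.0, 110.0]]
print(diverseness(ro_samples))
```

The per-RO means here are 101, 105 and 109 MHz, so the diverseness is 4.0.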

7.4 Experimental Setup

In this work, ROPUF performance on 29 Xilinx Spartan 3E and 20 Xilinx Artix-7

FPGA chips is analyzed. Test circuitry that runs completely on the FPGA chip has been

developed. The ROs’ frequencies are recorded using an Agilent 16801A logic analyzer. The

architecture of the design is shown in Figure 4-5. The challenge generator is used to

produce the inputs to the MUX which activates one RO at a time. ROs are activated, one

at a time, from the top to the bottom of each column of the FPGA. Each RO is activated

for 0.4 ms. There is a 0.1 ms delay before the next RO is activated; this is to reduce the

noise in the form of heat that can be generated from the adjacent CLB [20]. A 0.2 ms

delay gap is given between the RO and the counter activation for the signal to be stabilized

before the measurement starts. The timing controller controls all time intervals involved,

such as the time interval for the RO activation and the time interval for the counter to

measure each RO.

The frequency is computed using Equation 7-9, where x is the cycle count from

each RO and y is the cycle count of the 50 MHz reference clock. The preset value for y

is set to 10000 cycles, which means that the RO cycles are counted within a 0.2 ms window.

The accuracy of the measurement is 0.005 MHz/cycle which is good enough to measure

the differences between frequencies generated from ROs.

\[ \text{frequency} = \frac{x \times 50}{y} \tag{7-9} \]
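
The numbers quoted around Equation 7-9 can be checked with a few lines of arithmetic. The constant and function names below are hypothetical, and the RO cycle count of 40000 is an invented example value.

```python
REF_CLOCK_MHZ = 50      # reference clock frequency
REF_CYCLES = 10_000     # preset reference count y

def ro_frequency_mhz(ro_cycles, ref_cycles=REF_CYCLES, ref_clock_mhz=REF_CLOCK_MHZ):
    """Equation 7-9: RO frequency in MHz from the two cycle counts."""
    return ro_cycles * ref_clock_mhz / ref_cycles

# Cycles divided by MHz gives microseconds; divide by 1000 for milliseconds.
window_ms = REF_CYCLES / REF_CLOCK_MHZ / 1000
# One extra RO cycle in the window changes the result by this many MHz.
resolution_mhz = REF_CLOCK_MHZ / REF_CYCLES

print(ro_frequency_mhz(40_000))  # invented count of 40000 RO cycles
print(window_ms, resolution_mhz)
```

This reproduces the 0.2 ms measurement window and the 0.005 MHz per cycle resolution stated in the text.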

For the accelerated aging experiment on Spartan 3E and Artix-7 FPGAs, each RO

is activated every 64 ms and 107.5 ms respectively. Each activation turns the RO on for a

time period of 0.4 ms. Therefore, each RO is activated 1.3 million times a day for Spartan

3E and 0.8 million times a day for Artix-7. This aging experiment is conducted for 30

days. The number of ROs mapped on Spartan 3E and Artix-7 is 120 and 171 respectively.

ROs are numbered according to the location they are mapped as shown in Figure 6-1.

Responses are generated by using a chain-like neighbor coding where RO(n) is compared

with RO(n+1). In total, there are 119 response bits generated from 120 ROs for Spartan

3E and 170 response bits are generated from 171 ROs for Artix-7.
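
The activation counts and response lengths in this paragraph follow from simple arithmetic, sketched below. The function names are hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

def activations_per_day(period_ms):
    """How many times an RO is switched on per day for a given activation period."""
    return MS_PER_DAY / period_ms

def chain_response_bits(num_ros):
    """Chain-like neighbor coding compares RO(n) with RO(n+1): one bit per adjacent pair."""
    return num_ros - 1

print(activations_per_day(64))      # Spartan 3E, about 1.3 million
print(activations_per_day(107.5))   # Artix-7, about 0.8 million
print(chain_response_bits(120), chain_response_bits(171))
```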

7.5 Results and Analysis

ROs are mapped on all the CLBs available on the Spartan 3E FPGAs, and on half

of the CLBs available on Artix-7 to record the frequencies. Responses are generated from

the frequencies recorded. The chain-like neighbor coding technique is used to select the RO

comparison pair [3]. Equation 7-10 is used to generate the response. Table 7.1 shows the

uniqueness, reliability, uniformity, bit aliasing and diverseness results for both FPGA

families used in this experiment.

\[ \text{Response bit} = \begin{cases} 1 & \text{if } f_a > f_b \\ 0 & \text{otherwise} \end{cases} \tag{7-10} \]
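
A minimal sketch of Equation 7-10 combined with chain-like neighbor coding is shown below, assuming that fa and fb are the frequencies of RO(n) and RO(n+1). The function name and the frequency values are hypothetical.

```python
def chain_neighbor_response(frequencies):
    """One response bit per adjacent RO pair: 1 if f_a > f_b, otherwise 0."""
    return [1 if fa > fb else 0 for fa, fb in zip(frequencies, frequencies[1:])]

# Made-up frequencies (MHz) for five ROs in mapping order.
freqs = [201.3, 199.8, 200.4, 200.4, 198.9]
print(chain_neighbor_response(freqs))
```

Note that equal frequencies fall into the "otherwise" branch and produce a 0 bit.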

Table 7.1: ROPUF’s parameters comparison.

Parameter | Spartan 3E (90nm) | Artix-7 (28nm)
Uniqueness (%) | 39.79 | 45.15
Uniformity (%) | 51.25 | 50.17
Bit aliasing (%) | 50.54 | 50.17
Reliability (%) | 96.34 | 97.28
Diverseness | 2.09 | 3.88

7.5.1 ROPUF Uniqueness

ROPUF responses on Artix-7 have a higher uniqueness percentage (45.16%)

than those on Spartan 3E (39.79%). Each response from Artix-7 and Spartan 3E contains

780 bits and 239 bits respectively. The Artix-7 used in this experiment has 101,440 logic

cells compared to the Spartan 3E that has 2160 cells. Thus Spartan 3E has limited

resources compared to Artix-7. The maximum number of ROs that can be mapped on

Spartan 3E is 240. Artix-7 uniqueness is shown to be closer to the ideal uniqueness value

of 50%, whereas Spartan 3E uniqueness is noticeably further from it.

Figure 7-1 and Figure 7-2 show the planar view of graphs for RO frequencies

versus RO locations for Spartan 3E and Artix-7. These graphs are plotted to better

understand the uniqueness difference in ROPUF responses between these two FPGA

families. Figure 7-1 shows RO frequencies from three Spartan 3E FPGAs. The dark blue

blocks are the inaccessible area on the FPGA. It can be observed that ROs with high

frequency are mostly distributed in the red circle. The same observation can be made for

all 29 Spartan 3E FPGAs where most of the ROs with high frequencies are located in the

middle of the FPGA. Figure 7-2 shows RO frequencies for the three Artix-7 FPGAs. In

Figure 7-2, (a) and (b), it can be observed that most of the ROs with high frequency are

located in the top part of the FPGAs. However, this pattern does not hold across all 20

Artix-7 FPGAs.

Figure 7-1: RO frequencies versus location on Spartan 3E. (a) FPGA 1; (b) FPGA 2.

Figure 7-2: RO frequencies versus location on Artix-7. (a) FPGA 1; (b) FPGA 2.

The high frequency distribution at the same FPGA location in Spartan 3E FPGAs

used in this experiment demonstrates the effect of systematic variation. There are two

types of process variations: systematic and stochastic variation [13]. The systematic

variation is usually caused by the mask, lithographic, and reticle stepper errors.

Systematic variation has high correlation on all ICs that are manufactured on the same

line. The stochastic variation is caused by the vibrations during lithography, wafer

unevenness and non-uniformity in resist thickness. Stochastic variation possesses more

random characteristics, which are different for each chip [13]. Stochastic variation’s

effect can be observed when there is a random high and low RO frequency distribution.

On the contrary, systematic variation’s effect can be observed when there is a certain

pattern of high or low RO frequency distribution in a group of FPGAs. ROPUF response

uniqueness decreases due to the systematic variation on FPGAs. Response bits generated

from ROs located in an area that is affected by systematic variation tend to be the same

[49].

7.5.2 ROPUF Uniformity

As far as uniformity is concerned, the Spartan 3E and Artix-7 chips both have

good uniformity percentages (51.25% and 50.17%) which represent a good balance

between ‘1’ and ‘0’ bits in the responses. The uniformity result shows that the ROPUF

carries the randomness feature in the response within the FPGA chip regardless of the

change in the silicon technologies used. This result also shows that the systematic

variation effect that is observed in Spartan 3E FPGAs does not affect the ROPUF’s

uniformity.

7.5.3 ROPUF Bit Aliasing

The bit aliasing percentages for Spartan 3E (50.54%) and Artix-7 (50.17%) chips

are close to the ideal value of 50%. These results indicate that ‘1’ and ‘0’ bits

are balanced at the same bit location in the responses from different

FPGAs. The ROPUF responses are observed to carry the randomness feature between the

FPGAs in the same family regardless of the change in the silicon technologies used.

7.5.4 ROPUF Reliability

The ROPUF’s responses are generated at different temperature and voltage

settings to measure the reliability. The temperature variation experiment is done using a

temperature chamber. Four different temperature settings are used: 0°C, 20°C, 45°C, and

70°C. For voltage variations, three different internal core supply voltages (VCCINT) are

used for Spartan 3E: 1.2V (normal), 1.3V, and 1.4V. For Artix-7, two different VCCINT are

used: 1.0V (normal) and 1.2V. The responses that are generated at the different

temperature and voltage settings are compared with the responses that are generated at

room temperature. The 30-day accelerated aging experiment is also implemented to

extend the ROPUF’s reliability study on FPGAs that are fabricated using different silicon

technologies.

The average ROPUF reliability for Spartan 3E and Artix-7 is 96.34% and

97.28%, respectively. Table 7.2 shows the individual ROPUF reliability under

temperature, voltage, and aging effects. It can be seen that Artix-7 reliability

is higher than Spartan 3E for temperature, voltage, and aging. For both Spartan 3E and

Artix-7, the lowest reliability is caused by the voltage variations: 94.26% and 94.82%,

respectively. Even so, the ROPUF reliability under temperature and

voltage effects is fairly high for both Spartan 3E and Artix-7.

Table 7.2: ROPUF’s reliability due to change in temperature, voltage, and aging.

Reliability (%) | Spartan 3E | Artix-7
Temperature | 97.10 | 98.16
Voltage | 94.26 | 94.82
Aging | 97.67 | 98.86

Figures 7-3 and 7-4 show the RO frequencies with respect to the temperature and

voltage variations for Spartan 3E and Artix-7, respectively. Figure 7-3 (a) and Figure 7-4

(a) show how the RO frequencies on Spartan 3E and Artix-7 decrease when the

environment temperature increases. It can be observed that the frequency for each RO is

decreasing uniformly with respect to the temperature changes. This observation suggests

that the temperature changes affect the RO frequency uniformly regardless of its location.

The only difference that can be noticed in the RO frequency changes due to the

temperature effect between Spartan 3E and Artix-7 is the frequency decrement quantity.

The frequency of the ROs on Spartan 3E decreases by 2 to 3 MHz on average at 45°C and by 5

to 6 MHz at 70°C. For Artix-7, the RO frequency decreases by 0.5 to 1 MHz at 45°C and by 1

to 1.5 MHz at 70°C.

Figure 7-3 (b) and Figure 7-4 (b) show the RO frequency changes due to the

voltage variation for Spartan 3E and Artix-7, respectively. The RO’s frequency can be

seen as increasing uniformly as the VCCINT is increased. This observation suggests that the

voltage changes affect the RO’s frequency uniformly regardless of the RO’s location. It can be

noticed from these figures that the RO’s frequency is sensitive to VCCINT

changes. The RO’s frequency for Spartan 3E increases by 20 MHz at 1.3V and by 40 MHz at

1.4V. The RO’s frequency on Artix-7 increases by 90 MHz at 1.2V. The RO frequency

change due to the voltage variation is significantly higher than the changes due to the

temperature variations. This observation is consistent with the ROPUF’s reliability being

lowest under the voltage effect for both Spartan 3E and Artix-7.

Figure 7-3: Spartan 3E. (a) RO frequency changes with respect to temperature variations. (b) RO frequency changes with respect to voltage variations.

Figure 7-4: Artix-7. (a) RO frequency changes with respect to temperature variations. (b) RO frequency changes with respect to voltage variations.

Figure 7-5 shows the aging effect on 10 RO frequencies on Spartan 3E and Artix-7.

For Spartan 3E, the frequencies from 10 ROs are observed to have normal fluctuations.

No increasing or decreasing frequency pattern is observed. However, for Artix-7, the

frequencies from 10 ROs are observed to have a similar decreasing pattern. On average, the

RO frequencies on Artix-7 are reduced by 0.5 MHz at the end of the aging experiment.

This observation suggests that aging affects the frequency of the ROs uniformly

regardless of their spatial location.

Figure 7-5: RO frequency changes with respect to aging. (a) Spartan 3E. (b) Artix-7.

7.5.5 ROPUF Diverseness

The Artix-7 has the higher ROPUF diverseness, 3.89, which represents a large

gap between the maximum and minimum frequencies. High diverseness is a good feature

for ROPUF as it increases the number of CRPs which can be generated [34]. The Spartan

3E diverseness is found to be lower than that of Artix-7, namely 2.09. It is observed that

the diverseness increases when advanced silicon technologies are used. This is due to the

reduced transistor size which increases the RO’s frequency. Therefore, slight changes in

the process variations are amplified by the higher RO frequency.

7.6 Summary

In this work, we have implemented ROPUFs on 20 Artix-7 FPGA chips and 29

Spartan 3E chips which cover a wide range of silicon technologies. We have recorded and

analyzed thousands of RO frequencies from each chip. We conclude that only the diverseness

parameter changes with respect to the silicon technologies used, and the uniqueness

parameter improves as the FPGA chip density increases.

Chapter 8

ROPUF Application: Hardware-Oriented Security-

Based Authentication for Advanced Metering

Infrastructure

8.1 Introduction

A smart grid is often described as the merging of a traditional power grid with an

advanced communication technology to increase the power network’s delivery efficiency.

The smart grid is capable of two-way flow of electricity and information. The two-way

communication in the smart grid allows many parties in the network to exchange

information. For example, the power provider can receive information on the customer’s

power usage, and the customer can receive the recent pricing information from the power

provider. The information exchange helps the power provider estimate and control

the power generation more efficiently. The customer can utilize the pricing information

to optimize their electricity usage. Many other benefits can be obtained through the smart

grid’s implementations that are not mentioned here [35].

Though the smart grid has been utilized in some places in the world, there is a

growing concern about its security. The smart grid’s security objectives can be grouped into

three categories: availability, integrity, and confidentiality [35]. Availability ensures

timely and reliable access to and use of information in all components of the smart grid.

Integrity guards the information from being modified or destroyed to ensure information

nonrepudiation and authenticity. Confidentiality preserves the authorized restrictions on

information access in order to protect personal privacy and proprietary information.

Authentication is one of the important security features that contributes to the integrity

and confidentiality security objectives.

In a smart grid, the authentication scheme has to be different from the one used by

internet technology [35]. This is due to the different threats that exist in the smart grid

network compared to threats that exist in internet technology. The possible threats of a

smart grid network include attacks targeting data integrity and operation disruption

[35][36]. These types of attacks can be managed by having secured authentication. There

are some basic requirements in smart grid authentication protocol such as high efficiency

and tolerance to faults and attacks [35]. We discuss our design based on these

requirements later in section 8.4.

We use a Physical Unclonable Function (PUF) as the authentication scheme for

the Advanced Metering Infrastructure (AMI) in the smart grid. There are many types of

PUFs; this work uses a silicon PUF, specifically a PUF on an FPGA. During fabrication of the

silicon PUF, minor irregularities occur. The minor irregularities cause slight differences

in the electrical delay in the silicon chip. The differences are not noticeable in the

functionality of the chip, but PUF exploits the minor irregularities to generate a number

of binary IDs that are unique for each chip. Delay-based PUF uses ring oscillators (ROs)

to extract the minor irregularities and make them visible through different frequencies

(discussed in Section 8.3). A Ring Oscillator Physical Unclonable Function

(ROPUF) is highly secure; it cannot be modeled since the minor irregularities that occur

during the fabrication process are random for each chip. Confidential information is not

stored on its circuit.

In this work, we discuss our proposed hardware-oriented security-based

authentication on AMI using ROPUF on FPGAs. Our contributions are as follows:

1. We introduce an authentication scheme using ROPUF. The authentication scheme

is limited to the network between the utility company and smart meter which we

refer to as AMI. The intention is to set a design boundary so that the proposed

scheme is developed to meet the authentication requirements within that boundary.

2. The proposed authentication scheme focuses on the current enhancement of the

existing AMI. No major changes in the protocol are needed to implement the

scheme. Our scheme can be combined with the existing protocol.

3. We show that our ROPUF is tolerant to attack since it cannot be modeled.

A linear support vector machine (SVM) is used to test the ROPUF, and the results

show that the SVM fails to model the ROPUF.

4. We also show that the proposed ROPUF meets the efficiency and tolerance

requirements through experiments conducted for the proof of concept.

8.2 Related Work

In this chapter, we limit the scope of authentication to communication in AMI.

There are many authentication schemes that have been proposed. We divide the proposed

schemes into two groups. The first group comprises schemes proposed in terms of

algorithms on the existing resources. In [37], a lightweight message authentication

scheme that uses a shared session key established using Diffie-Hellman exchange

protocol is presented. In [38], an authentication scheme that is based on Merkle hash tree

scheme which is used to construct a tree based on a one-way cryptographic hash function

is described. In [39], smart grid key management (SGKM) based on enhanced identity

based cryptography (EIBC) is suggested. These schemes are based on non-volatile

memory technologies that are vulnerable to invasive/spoofing attacks.

The second group for the authentication scheme is based on hardware-oriented

security. In [40], an interoperable device identification in a smart grid based on a trusted

platform module (TPM) is proposed. Trusted computing, as defined by the Trusted Computing

Group, is a technology for implementing consistently behaving computer systems. Trusted

computing technology provides methods for reliably checking a system’s integrity and

identifying anomalous or unwanted characteristics. In [41] an authentication and key

management scheme for advanced metering infrastructures using PUF is proposed.

In this chapter, our proposed scheme for authentication in AMI fits in the second

group which uses PUF for authentication. The advantages offered by our scheme

compared to [40] and [41] are discussed in section 8.5. We list the requirements of the

smart grid security environment and discuss each authentication scheme that belongs to

the second group according to the requirements as shown in Table 8.1 [36].

Table 8.1: Comparison of different schemes based on Smart Grid requirements

Smart Grid Security Applications Requirements | Trusted Computing Technology [40] | PUF Scheme [41]
1) High performance in terms of latency and jitter in message exchange (efficiency). | Message length is not mentioned. Authentication process takes 11 steps to complete. | Message length is not mentioned. Average authentication process takes 3 steps to complete.
2) Timeliness: computation and communications subsystems must meet real-time requirements of applications (efficiency). | The scheme takes 982.91 ms to complete the authentication process. | The exact information is not stated. The only information mentioned about time is 2.4 ms for PUF execution and 0.2 ms (average) for the SHA-1 on 32 bit PUF response using the PC.
3) Comprehensive security design, as schemes are likely targets for sophisticated cyber-attacks (tolerance to attacks). | Highly sophisticated TPM which provides unique identity for each module and strong cryptography co-processor. TPM has secure storage using a unique asymmetric storage root key (SRK) of which the private part never leaves the TPM. | Scheme based on the unclonable features derived from the process variation on the silicon chip.
4) Adaptable and evolvable designs because components typically have a lifetime of 15 or more years once deployed (tolerance to faults). | This scheme is adaptable since it is added to the existing devices on a smart grid. It is not evolvable as the design is made on application specific IC (ASIC). | This scheme is both adaptable and evolvable. The scheme is independent and can be added to any smart meter. It is also evolvable because it uses FPGA; hence, it can be reprogrammed.

The latency to complete the authentication process is 11 steps for the trusted

computing scheme and 3 steps for the PUF scheme. The trusted computing scheme takes

982.91 ms to complete the authentication process. The timing information reported for the

PUF scheme is not sufficient to determine its total authentication time. The

latency implies the availability of the scheme to operate in real time. Both technologies

have very good features that support tolerance to attacks: the security scheme is

embedded in the hardware. The last requirement discussed in Table 8.1 regards the

adaptable and evolvable designs due to the limitation on the components’ life span. The

trusted computing scheme is adaptable because the module can be replaced over time, but

is not evolvable as it uses ASIC in the design. The PUF scheme is both adaptable and

evolvable because it is designed to be a stand-alone unit that can integrate with any smart

meter and is also evolvable through the configurability features that are offered by FPGA.

8.3 Hardware-Oriented Security-Based Authentication for

AMI

Based on the AMI, the utility companies monitor their customers’ usage through

the smart meter. All data from the smart meter is sent to the utility companies through a

number of smart meters (hopping network) and data concentrators. This means that all

data sent to and from the utility companies goes through a number of hops before it

reaches the destination as shown in Figure 8-1. For the authentication process, we

propose that the utility company be the central point for authenticating the data

concentrators and smart meters. There are three good reasons for the utility company to be the

central point. The first is that utility companies need to monitor and update their

customers regularly. The second is that all critical control messages, such as

switching off certain users’ appliances, can only come from the utility companies. The

third is that the utility companies have bigger and more secure storage (secured from

side-channel attacks).

All devices (data concentrators and smart meters) involved in the communication

from the utility company to the smart meter need to have a ROPUF chip as shown in

Figure 8-2. The utility company does not need to have the ROPUF chip as it is the trusted

authority that controls and monitors the network. The ROPUF uses the existing network

protocol. In this chapter we present the proposed short authentication protocol that needs

to be added to the existing protocol. Our focus is to provide a practical

solution that can be applied to the existing technology at low cost.

The first step in the implementation is to recognize all devices present in the AMI.

The utility company scans and records all the challenge and parity bits pairs (CPBPs)

from each ROPUF chip. The scanned ROPUF chips are connected to each device in the

AMI. Through this method a utility company can keep track of the devices that are

present in the AMI.

Figure 8-1: AMI in Smart Grid [42].

Figure 8-2: ROPUF connected to a smart meter.

When the smart meter needs to send data to the utility company, it needs to

authenticate itself. The smart meter first sends the utility company a request to send data, as

shown in Figure 8-3. Then the utility company sends a challenge Ci to the smart meter. The

smart meter then uses the Ci received to produce the authentication code in the form of

Parity Bits (PBi) and sends it back to the utility company. The utility company verifies the

response PBi; if it is correct then permission is granted, but if it is wrong, the smart meter

is given two more chances to send a correct response, as shown in Figure 8-4.

The worst case scenario is that the smart meter fails to send a correct response, in

which case the utility company sends a broadcast signal (BLOCK(broadcast)) so that all

devices in the AMI drop any packet received from that particular smart meter, and no

data can be sent through the smart meter. This action automatically isolates the smart

meter from the AMI. To solve this problem, the utility company needs to verify with the

customer whether it is a technical error or an adversary attack.

Figure 8-3: Smart meter to utility company authentication.

Figure 8-4: Smart meter to utility company fail authentication.
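
The smart meter authentication flow described above (one attempt plus two more chances, then a broadcast block) can be sketched as follows. This is only an illustration: the function names are hypothetical, the parity bits are shown as plain strings, and it assumes the same challenge is retried on each attempt, which the text does not specify.

```python
MAX_ATTEMPTS = 3  # the first attempt plus the two additional chances

def authenticate(expected_pb, respond, challenge):
    """Utility-side check of the parity bits returned for one challenge.

    expected_pb: parity bits recorded for this challenge at registration.
    respond:     callable standing in for the device's ROPUF and PB generator.
    Returns the outcome and the number of attempts used.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if respond(challenge) == expected_pb:
            return "GRANTED", attempt
    # All attempts failed: the device is isolated from the AMI.
    return "BLOCK(broadcast)", MAX_ATTEMPTS

def make_device(answers):
    """Hypothetical device that returns a scripted sequence of PB answers."""
    it = iter(answers)
    return lambda challenge: next(it)

print(authenticate("1011", make_device(["0000", "1011"]), "C1"))
print(authenticate("1011", make_device(["0000", "0001", "0010"]), "C1"))
```

The first call succeeds on the second attempt; the second call exhausts all three attempts and ends in the broadcast block.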

The next potential adversary attack is against the data concentrator. The data

concentrator acts as the forwarding device that completes the data path from the smart

meter to the utility company. Identity impersonation and data jamming are potential

attacks on the utility company. To make sure that all data concentrators in the network

are real, the authentications need to be done regularly. For the smart grid authentication,

we propose an authentication every 15 minutes, as suggested in [36][41]. For the data

concentrator authentication, the utility company first sends a verification request signal

with the challenge (VER(Ci)) to the data concentrator as shown in Figure 8-5. The data

concentrator uses the Ci sent by the utility company to produce the PBi and sends it to

the utility company. The utility company verifies the PBi, and if the PBi matches, then an

acknowledgement (ACK) is sent and the data concentrator can operate as usual. If the PBi

does not match then two more chances are given for the data concentrator to send a correct PB.

If the data concentrator still fails to produce a correct PB, the utility company drops that

data concentrator from the network by sending a broadcast signal to all devices on the

network in order to isolate the particular data concentrator. The steps taken are the same

as shown in Figure 8-4.

Figure 8-5: Data concentrator to utility company authentication.

The most critical attack possible is the utility company impersonation. The utility

company holds the main authority over all devices in the network. An adversary can get

control of the customer’s smart meter if they can impersonate the utility company. The

ANSI C12.18 standard defines six security levels of access that indicate different

privileges, L0 to L5, with L5 being the highest privilege. The security level L0 requires

no password [41]. To differentiate the security levels we use five different lengths

of PBs: 64, 128, 256, 512 and 1024 bits. A higher

security privilege requires longer PBs. Authentication starts when the

utility company sends a request to the smart meter for a specific level of access

permission (REQ(level)) as shown in Figure 8-6. Then the utility company sends a

challenge and hamming code parity bits pair (CPBPi-length(level)) to the smart meter.

The smart meter verifies the PBi-length sent from the utility company by generating its own

PBi-length from the given Ci-length. The utility company has two more chances if the smart

meter fails to verify the CPBPi-length(level) as shown in Figure 8-7. If an adversary tries

to impersonate the utility company, the utility company detects the attack when receiving

the first NACK from the smart meter.

Figure 8-6: Utility company to smart meter authentication.

Figure 8-7: Utility company to smart meter fail authentication.
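
The access-level check on the smart meter side might look like the sketch below. The mapping of levels L1 to L5 onto the five PB lengths in ascending order is an assumption inferred from the statement that higher privileges need longer PBs, and the response length uses the rule from Section 8.3.2 of 4 parity bits per 8 response bits. All names are hypothetical.

```python
# Assumed mapping: L0 needs no password, L1..L5 use increasingly long PBs.
PB_LENGTH_BITS = {1: 64, 2: 128, 3: 256, 4: 512, 5: 1024}

def response_bits_needed(level):
    """Response length behind a PB, at 4 parity bits per 8 response bits."""
    return PB_LENGTH_BITS[level] * 8 // 4

def meter_verifies(level, received_pb, regenerate_pb, challenge):
    """Smart-meter side: regenerate the PB locally and compare with the one received."""
    if len(received_pb) != PB_LENGTH_BITS[level]:
        return "NACK"
    return "ACK" if regenerate_pb(challenge) == received_pb else "NACK"

# Stand-in for the meter's own ROPUF: always yields 64 zero parity bits.
local_ropuf = lambda challenge: [0] * 64

print(response_bits_needed(1), response_bits_needed(5))
print(meter_verifies(1, [0] * 64, local_ropuf, "C1"))
print(meter_verifies(1, [1] * 64, local_ropuf, "C1"))
```

Under these assumptions the five PB lengths correspond to responses of 128 to 2048 bits.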

8.3.1 ROPUF Design

In our ROPUF design we use 3-stage ROs. The benefits of using 3-stage ROs

have been discussed in another chapter [17]. For the smart grid authentication application

we propose to use a ROPUF that can generate responses of up to 2048 bits. Our ROPUF

design uses 120 ROs. If the smart meter in the smart grid application is required to

authenticate itself every 15 minutes in order to update the usage information,

it needs to authenticate 1,752,000 times over a 50-year time span. Our

ROPUF design is capable of handling this type of requirement. Each authentication uses

new response bits which makes it harder for the adversary to launch an attack.
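
The authentication count quoted above follows directly from the 15-minute interval, assuming 365-day years. The function name is hypothetical.

```python
def authentications(years, interval_minutes=15, days_per_year=365):
    """Number of authentications when one is performed every interval_minutes."""
    per_day = 24 * 60 // interval_minutes   # 96 per day at 15-minute intervals
    return per_day * days_per_year * years

print(authentications(50))
```

At 96 authentications per day, 50 years gives 1,752,000 authentications.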

Figure 8-8 shows the logic blocks for the ROPUF circuit. Each RO activates for

0.4 ms and there is a 0.1 ms gap before the next RO is activated; this is to reduce the heat

noise generated by the adjacent CLB [17]. A 0.2 ms gap between the RO and

counter activation allows the signal to be stabilized before the measurement starts. The

timing controller controls all time intervals involved, such as the time interval for each

RO’s activation and the time interval for the counter to measure each RO. Counter 1

measures the number of cycles generated by the RO selected through

multiplexer 1 (Mux 1), and Counter 2 measures the number of cycles generated by the

RO selected through Mux 2. Then the comparator compares the number of

cycles recorded by Counter 1 and 2 to generate one response bit. The generated response

is stored in the register.

Figure 8-8: ROPUF logic blocks.

To reduce bit flip occurrences, we measured the response generation under

various environmental conditions such as temperature and voltage variations

[34]. The study showed the importance of using

RO comparison pairs with large frequency differences to avoid bit flips. Figure

6-2 (a) shows an example of a ROPUF output flip when the difference between

two selected ROs is small. Figure 6-2 (b) shows how the output flip can be prevented by

selecting two ROs that have a higher frequency difference. From the data obtained on

specific FPGA chips (S3E100), we found the best threshold to be 5 MHz. The threshold

determines the number of possible RO comparison pairs, namely those whose frequency

differences exceed 5 MHz. The number of RO comparison pairs and CRPs that pass the

5 MHz threshold for 5 different FPGA chips (S3E100) are shown in Table 8.2 (computed with

the combination formula n!/((n-r)! r!)). The number of possible CRPs generated is abundant enough to

support the frequent authentication requirement, and security is maintained because CRPs

will not be reused.

Table 8.2: Number of possible CRPs.

ROPUF | Comparison pairs | Number of possible CRPs (128 bits)
1 | 1863 | 1.18 × 10^285
2 | 3942 | 5.758 × 10^243
3 | 5880 | 1.947 × 10^266
4 | 4504 | 1.919 × 10^251
5 | 3381 | 1.18 × 10^235
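
The pair selection and the CRP count can be sketched as below. The sketch assumes that a CRP corresponds to choosing r = 128 comparison pairs out of the n pairs that pass the threshold, which is how the combination formula reads; the function names and the four example frequencies are hypothetical.

```python
from itertools import combinations
from math import comb

def pairs_above_threshold(frequencies, threshold_mhz=5.0):
    """Index pairs of ROs whose frequency difference exceeds the threshold."""
    return [
        (i, j)
        for (i, fi), (j, fj) in combinations(enumerate(frequencies), 2)
        if abs(fi - fj) > threshold_mhz
    ]

def possible_crps(num_pairs, response_bits=128):
    """n! / ((n - r)! r!) with n comparison pairs and r response bits."""
    return comb(num_pairs, response_bits)

# Made-up frequencies (MHz) for four ROs.
freqs = [200.0, 203.0, 206.5, 212.0]
print(pairs_above_threshold(freqs))

# Order of magnitude of the CRP count for 3942 comparison pairs.
count = possible_crps(3942)
print(len(str(count)) - 1)
```

Python integers are arbitrary precision, so the count is exact; the last line prints only the power of ten.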

8.3.2 Authentication

For the authentication in the AMI, the Ri generated from the ROPUF is not used

because the adversary could model the ROPUF based on the Ci and Ri that are sent

through the network. To enhance security, the hamming code parity bits (PBs) are sent

out for authentication as shown in Figure 8-9. Hamming code is a linear error correcting

code that generalizes the hamming (7,4) code. A block of data that has a length of k bits

is assigned n-k parity bits. The length of the message after adding the parity bits is n

bits. The block length is $n = 2^r - 1$, and the message length is

$k = 2^r - r - 1$, where r is the number of parity bits and r ≥ 2. The block of data represented

by m and the data with parity bits represented by x are related by x = mG, where G is the

generating matrix.

Figure 8-9: Parity Bits PBi generator.
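
The relations above can be made concrete with the hamming (7,4) code. The generator matrix below is one common systematic choice (identity block followed by three parity columns); it is an illustrative assumption and not necessarily the matrix used in the design.

```python
def hamming_parameters(r):
    """Block length n and message length k for r parity bits."""
    n = 2 ** r - 1
    return n, n - r

# One systematic generator matrix for the (7,4) code: [I4 | P].
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(m, G):
    """x = mG over GF(2): each codeword bit is a mod-2 dot product with a column of G."""
    return [sum(mi * gi for mi, gi in zip(m, col)) % 2 for col in zip(*G)]

m = [1, 0, 1, 1]
x = encode(m, G)
print(hamming_parameters(3), hamming_parameters(4))
print(x, "parity bits:", x[4:])
```

Because G is systematic, the first four bits of x repeat the data and the last three are the parity bits.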

PBi is used in the authentication by generating 4 parity bits for every 8 bits of Ri

as shown in Figure 8-10. For 128 bits Ri, 64 bits of PBi are generated. Based on the

equation mentioned before, 4 Hamming code parity bits are able to cover 11 bits of data

for the error detection and correction. But in our ROPUF design, we use the Hamming code

as an authentication code generator. We maximize the length of the authentication code to

increase the security level by generating 4 Hamming code parity bits for every 8 bits of

data. There are three advantages of using Hamming code parity bits as the authentication

code. First, the parity-bit mapping is many-to-one (8 response bits map to 4 parity bits),

so it acts as a one-way function. The second advantage is that there is no

way to model the ROPUF using the Ci and PBi (discussed in Section 8.4). Third, the

authentication code has a shorter length compared to Ri, yet produces better security.
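A minimal sketch of the 8-bit-to-4-bit parity generation is given below. It assumes the standard Hamming(12,8) layout, with parity bits at codeword positions 1, 2, 4, and 8. Only the first row of the coverage table (b0, b1, b3, b4, b6) is stated in this chapter; the other three rows follow from that assumed layout, and all names are hypothetical.

```python
# Response-bit indices covered by each parity bit, assuming the standard
# Hamming(12,8) layout. Only the first row is given in the source text.
COVERAGE = [
    (0, 1, 3, 4, 6),
    (0, 2, 3, 5, 6),
    (1, 2, 3, 7),
    (4, 5, 6, 7),
]


def parity_bits(block):
    """Return the 4 parity bits for one 8-bit block of the response."""
    if len(block) != 8:
        raise ValueError("block must contain exactly 8 bits")
    return [sum(block[i] for i in cover) % 2 for cover in COVERAGE]


def authentication_code(response):
    """Concatenate the parity bits of every 8-bit block of the response."""
    if len(response) % 8 != 0:
        raise ValueError("response length must be a multiple of 8")
    code = []
    for start in range(0, len(response), 8):
        code.extend(parity_bits(response[start:start + 8]))
    return code


pb = authentication_code([0, 1] * 64)  # 128 response bits in, 64 parity bits out
```

Distinct 8-bit blocks can share the same parity bits (for example, the all-zero block and 1,1,1,0,0,0,0,0 both give 0,0,0,0), so the response cannot be recovered uniquely from the parity bits alone.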

Figure 8-10: Parity bits from 128 response bits form 64 parity bits.

Figure 8-11 shows how each ROPUF chip is registered with the utility company.

The utility company has a database that stores all possible challenges for each ROPUF.

The combinations of challenges for each ROPUF are different, as discussed in the previous

section, because only RO comparison pairs that pass the frequency difference threshold

are selected. Challenges for all ROPUFs should be provided by the manufacturer. The utility

company sends one Ci at a time to each ROPUF and records the PBi generated from each

particular ROPUF in the database. The ROPUF registration process starts with the

challenges for the 128-bit response and continues up to the 2048-bit response for different levels

of security access.
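The registration step can be sketched as a small in-memory store. This is an illustrative model only: the class, its method names, and the device identifier are hypothetical, and a real deployment would use a protected database. Each stored pair is handed out once and then discarded, which reflects the requirement that CRPs are not reused.

```python
class RegistrationDatabase:
    """Hypothetical store of unused (challenge, parity bits) pairs per ROPUF device."""

    def __init__(self):
        self._unused = {}  # device_id -> list of (challenge, parity_bits)

    def register(self, device_id, challenge, parity_bits):
        """Record the parity bits a device returned for one challenge."""
        self._unused.setdefault(device_id, []).append((challenge, tuple(parity_bits)))

    def next_pair(self, device_id):
        """Remove and return one unused pair so it is never issued again."""
        pairs = self._unused.get(device_id, [])
        if not pairs:
            raise LookupError("no unused pairs left for this device")
        return pairs.pop(0)

    def remaining(self, device_id):
        """Number of pairs still available for a device."""
        return len(self._unused.get(device_id, []))


db = RegistrationDatabase()
db.register("meter-001", challenge=(3, 17), parity_bits=[1, 0, 1, 1])
db.register("meter-001", challenge=(5, 42), parity_bits=[0, 0, 1, 0])
first_challenge, first_pb = db.next_pair("meter-001")
```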

Figure 8-11: ROPUFs registration with utility company.

ROPUFs that have been registered can be used by the smart meters and data

concentrators in the AMI. The ROPUF does not need any additional storage from the

devices. It only needs to be connected to the device, as shown in Figure 8-2, via a

serial connection, and the firmware on the devices needs to be updated to support the

additional protocol proposed to generate the authentication key.

8.4 Proof of Concept

In this section, we discuss the proposed ROPUF design for smart grid

authentication based on three smart grid requirements mentioned in Section 8.1. To prove

the concept, we implement our ROPUF design on Spartan 3E FPGAs. The PC acts as a

utility company that stores all the CRPs, and the smart meters and data concentrators are

implemented on FPGAs. This set-up simulates the protocol and the effectiveness of our

scheme. Table 8.3 shows the time taken to transfer the PBi and Ci through the USB

connection. For the first authentication level, the total authentication time taken is 65.364

ms via the USB connection. This time is within the range of achieving real-time

communication. The first authentication level is used the most as it involves the

information passing between the utility company and smart meter. The second level

authentication takes 130.73 ms, followed by third (261.46 ms), fourth (522.91 ms) and

fifth (1045.83 ms) level authentications. In terms of high efficiency, our system meets the

smart grid requirement in which the authentication can be achieved in real-time.

Another factor to consider is cost of storage. Extra data storage is needed to store

the challenge and PB from ROPUFs. Table 8.4 shows the data storage needed to store all

challenge and parity bits pairs (CPBPs) for 50 years. For the first authentication level, we

assume that authentication would take place every 15 minutes. In this case, one ROPUF

needs 35136 CPBPs in one year. The size of data storage needed to store the CPBPs for

one year is 8 MB. If the life span of the AMI in smart grid is expanded to 50 years, then

the data storage size needed to store CPBPs for one ROPUF would be 408 MB. For other

authentication levels, we assume usage 5 times a day, which requires 1830 CPBPs in

one year for one ROPUF. For a 50-year life span, the second authentication level needs

46 MB data storage, followed by the third, fourth and fifth level authentications (93 MB,

186 MB and 371 MB, respectively).

Table 8.5 shows the total data storage needed to store all the CPBPs according to

the number of devices (data concentrator and smart meter) involved in the AMI. Storing

complete CPBPs (all authentication levels) for one device takes 1.1 GB of data storage. If

the AMI has 2000 devices, 2207 GB of data storage is needed. Two TB of data storage

costs around $140 currently. Thus, the proposed authentication scheme using ROPUF

does not incur high cost to the utility company and is cost effective.
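The first-level storage figures and the device totals can be reproduced with the arithmetic below. Three assumptions are inferred rather than stated in the text: a 366-day year (since 96 x 366 = 35136), a first-level challenge size of 1792 bits (the third column of Table 8.3, read here as the challenge length), and decimal megabytes. The higher-level sizes are taken directly from Table 8.4 rather than recomputed, and all function names are hypothetical.

```python
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def cpbps_per_year(per_day, days=366):
    """Number of challenge/parity-bit pairs consumed in one year (366 days assumed)."""
    return per_day * days


def storage_mb(pairs_per_year, challenge_bits, parity_bits, years):
    """Storage needed for all pairs, in MB, with each pair stored as whole bytes."""
    bytes_per_pair = (challenge_bits + parity_bits) // 8
    return pairs_per_year * bytes_per_pair * years / BYTES_PER_MB


# First level: one authentication every 15 minutes, i.e. 96 per day.
level1_per_year = cpbps_per_year(per_day=24 * 4)
level1_one_year_mb = storage_mb(level1_per_year, 1792, 64, years=1)
level1_fifty_years_mb = storage_mb(level1_per_year, 1792, 64, years=50)

# Per-device total: first level plus the 50-year sizes listed in Table 8.4.
per_device_mb = level1_fifty_years_mb + 46.4 + 92.8 + 185.6 + 371.2
total_gb_2000_devices = per_device_mb * 2000 / 1000
```

Under these assumptions the sketch agrees with the 8 MB, 408 MB, 1.1 GB, and 2207 GB figures quoted above.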

Table 8.3: Authentication time for each level.

                                                  Time (ms)
Authentication level   PBi                                  PBi          PBi        Total
            (bits)     (bits)                               generation   transfer   authentication
First   L1  128        64       1792       64               0.047        1.317      65.364
Second  L2  256        128      3584       128              0.094        2.634      130.728
Third   L3  512        256      7168       256              0.188        5.268      261.457
Fourth  L4  1024       512      14336      512              0.376        10.537     522.913
Fifth   L5  2048       1024     28672      1024             0.753        21.074     1045.827

Table 8.4: Data storage size for each authentication level.

CPBP authentication level   Year(s)   Data size (megabytes)
First                       1         8.152
                            10        81.516
                            20        163.031
                            30        244.547
                            40        326.062
                            50        407.578
Second                      50        46.4
Third                       50        92.8
Fourth                      50        185.6
Fifth                       50        371.2

Table 8.5: Data storage size needed based on number of devices on the AMI.

Number of devices   Data size (gigabytes)
1                   1.103578
100                 110.3578
200                 220.7156
300                 331.0734
400                 441.4312
500                 551.789
600                 662.1468
700                 772.5046
800                 882.8624
900                 993.2202
1000                1103.578
2000                2207.156

To test our ROPUF security level, we use a support vector machine (SVM) to

model the ROPUF based on the Ci and PBi. In this model we assume that the adversary

has knowledge of the encryption code used in the network and also of the Hamming code

used to generate the PBi for every 8 bits of the response. We use an SVM because the Ci and PBi

can be classified as ‘1’ and ‘0’. An SVM classifies data by finding the best hyperplane

that separates one class from another class. The best hyperplane has the largest margin

between the two classes.

First, we train the SVM classifier with a set of data with the correct class labels.

The data X and class labels Y are fed to the SVM train function to train the classifier. The

Gaussian Radial Basis Function Kernel with a scaling factor of sigma equal to one is used

for the classifier training. The data set consists of the RO pairs used to generate the parity

bit as shown in Figure 8-12. As an example, response bits b0, b1, b3, b4, and b6 are used for

the generation of the first parity bit. RO1 and RO2 are used to generate b1, RO3 and RO4

are used to generate b2, and so on as shown in Figure 8-12. The parity bit p0 is the

class label Y, and the data X consists of RO1 to RO10. One authentication key consists of 64 parity

bits, p0 through p63.
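The kernel and the layout of one training sample can be sketched as follows. This is not the SVM training itself: it only shows a Gaussian RBF kernel with sigma = 1 and how the ten RO indices behind one parity bit could be paired with that bit. The kernel form exp(-||x - y||^2 / (2 sigma^2)) is a common parameterization and is an assumption here, as are the function names.

```python
from math import exp


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 sigma^2)) (assumed parameterization)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-squared_distance / (2.0 * sigma ** 2))


def make_sample(ro_indices, parity_bit):
    """Pair the ten RO indices behind one parity bit (data X) with that bit (label Y)."""
    ro_indices = list(ro_indices)
    if len(ro_indices) != 10:
        raise ValueError("expected the 10 ROs that feed one parity bit")
    return ro_indices, int(parity_bit)


x_sample, y_label = make_sample(range(1, 11), parity_bit=1)
similarity = rbf_kernel([1.0, 0.0], [0.0, 0.0])
```

A training set built this way would contain one such sample per observed parity bit, with the kernel measuring how similar two RO selections are.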

Figure 8-12: Parity bits and corresponding ROs.

Figure 8-13 shows the accuracy of the prediction results. As the amount of data

used to train the SVM classifier increases, the accuracy also increases up to a point, but then

gradually decreases as more training data are added. The best accuracy

obtained is 60.9%, showing that the SVM cannot model the ROPUF by using the Ci and

PBi. This test proves that the proposed ROPUF design for the AMI is secure from the

adversary’s attack. Another advantage of using ROPUF as the authentication system is

that no clues useful for the adversary to crack the ROPUF are stored on the devices. Even

if the adversary is able to model one of the ROPUFs in the AMI, it will take more time to

break another ROPUF because the only way to break the ROPUF (if there is a way of

doing it) is by gathering the challenge and PB pairs and creating a model.

The security of the database that stores all challenges and PBi for the devices on

the AMI is also important. However, the database is less vulnerable to adversary attack

since the other devices have no access to the utility company. The only communication

the utility company has with the devices in the network is sending and receiving

information. But, if the adversary is able to hack the utility company, or some of the

utility company employees breach trust, then the threat is unavoidable. However, if that

type of attack occurs, our ROPUF authentication system can be fixed. The ROPUF can

be reprogrammed, and different sets of ROs can be used, rendering the previous

challenges and PBs invalid. In terms of the tolerance-to-attack requirement, the ROPUF

design requires endless effort from the adversary in order to model the ROPUFs. The

vulnerability exists at the utility company database, and we assume that the utility

company has to have a good firewall to protect the system as a whole.

Figure 8-13: SVM prediction accuracy for ROPUF.

Regarding the tolerance to faults, the authentication system is designed to tolerate

10% of discrepancies in the PB. Additionally, the comparison pairs used to generate the

response in the ROPUFs have at least 5 MHz difference. This ensures that the Ri

generated from the ROPUFs will not flip when exposed to anomalous voltage and

temperature conditions. Figure 8-14 shows that the bit flip probability decreases as

the frequency difference increases. We find that the bit flip occurs most when the

maximum frequency difference is 1 MHz [34]. To guarantee the ROPUF is able to deal

with the worst case scenario, though, we recommend using the largest threshold

frequency possible.
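The 10% tolerance can be sketched as a Hamming-distance check between the received and the stored parity bits. The text does not spell out how the discrepancy is measured, so the acceptance rule below (accept when at most 10% of the bits differ) is an assumed interpretation, and the function names are hypothetical.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length bit sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def authenticate(received_pb, stored_pb, tolerance=0.10):
    """Accept when the fraction of mismatched parity bits is within the tolerance."""
    if len(received_pb) != len(stored_pb):
        return False
    return hamming_distance(received_pb, stored_pb) <= tolerance * len(stored_pb)


stored = [0] * 64
six_flips = [1] * 6 + [0] * 58    # 6 of 64 bits differ (about 9.4%)
seven_flips = [1] * 7 + [0] * 57  # 7 of 64 bits differ (about 10.9%)
accepted = authenticate(six_flips, stored)
rejected = authenticate(seven_flips, stored)
```

For a 64-bit first-level key, this rule accepts up to 6 mismatched bits.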

Figure 8-14: Bit flip probability vs. frequency difference (MHz)

8.5 Summary

In this work, we propose a new scheme for authentication of the Advanced

Metering Infrastructure in a smart grid. The novel authentication scheme using ROPUF

offers high security with less overhead compared to previously proposed schemes [40][41].

In terms of latency, our proposed scheme takes, at most, four steps for authentication.

The complete authentication times for the most used security levels, L1 and L2, are 65.4 ms

and 130.7 ms, respectively. These times satisfy the availability requirements in a smart

grid. The authentication keys sent through the network do not provide any clues that

allow the adversary to model the ROPUF, as shown by the SVM results. This

system is designed as a stand-alone unit so that it can work in addition to the existing

protocol currently used in the industry. The system is also designed to tolerate faults

by using only RO comparison pairs with large frequency differences and by tolerating

10% discrepancies in the PB. The reconfigurability offered by the FPGA makes

the ROPUF evolvable as it can be reprogrammed at any time.

Chapter 9

Conclusions

9.1 Summary and Conclusions

The importance of hardware security and trust is increasing as the industry supply

chain has become more complex and also more vulnerable to adversary attacks. An

article in the IEEE Spectrum entitled “The Hidden Dangers of Chop-Shop Electronics”

describes how clever counterfeiters sell old components as new, thus threatening both

military and commercial systems [43]. On August 17, Boeing warned the U.S. Navy that

an ice-detection module in the P-8A Poseidon (new reconnaissance aircraft) contained a

reworked part that should not have been put on the airplane originally and should have

been replaced immediately. The company that supplies the ice-detection module has

blamed the part, a Xilinx field-programmable gate array (FPGA), for the failure of the

ice-detection module during a test flight. However, retracing that FPGA’s path led not to

Xilinx but to a Chinese company called “A Access Electronics”. It apparently had turned

a quick profit by selling used Xilinx parts as new. This incident is one of the examples of

how the vulnerability in the hardware security and trust has become a security threat.

There are two points that can be highlighted from this true incident. The first is how

widely the programmable chip or FPGA is used in the industry. The second is the

vulnerability that exists in the industry’s supply chain that could lead to serious safety

and security issues. Therefore, it is important to increase the security of FPGAs and other

custom designed chips. Some techniques have been proposed in the past to enhance the

security of FPGAs. In this work, we have proposed a ring oscillator based technique

which extracts the process variation effects from the FPGA and converts them into a unique

ID. A ROPUF can be used as an authentication technique for an FPGA to verify its

trustworthiness. Apart from that, ROPUFs can also be used as a cryptographic technique

to encrypt and decrypt information.

9.2 Contributions and Results

Major contributions made in this research are listed below:

• Three different FPGA families, fabricated using different silicon

technologies, are used to explore the ROPUF. The ROs are studied and compared

based on five parameters: uniqueness, reliability, uniformity, bit-aliasing, and

diverseness.

o The temperature variations, voltage variations, and accelerated aging

experiments are done to measure the reliability.

o The FPGA fabricated using the latest technology shows better

performance based on the five parameters used.

o Different numbers of stages used in ROs are explored. The experimental

results obtained suggest that a lower number of stages used in a ROPUF

on FPGA contributes to better performance regardless of the silicon

technologies used.

o Different FPGAs may have different minimum number of stages that can

be used in ROs due to the limitation on the FPGA components’ maximum

operating frequency. The minimum number of RO stages that can be used

in Spartan 2 and Spartan 3E is 3, and in Artix-7 it is 5.

o Based on the experimental results, we conclude that a ROPUF is

applicable regardless of different silicon fabrication technologies used to

produce an FPGA.

• The systematic variation effect on ROPUF’s security reliability has also been

studied in this work.

o The experimental results showed that systematic variation does affect the

ROPUF responses’ randomness and uniqueness parameter.

o The RPM technique is developed to overcome this effect. The results

obtained by using the RPM technique are shown to be better than other

techniques that have been proposed before.

o The responses generated from ROPUFs after applying the RPM technique

passed most of the NIST statistical tests for randomness.

• The ROPUF is applied as the hardware-oriented security-based authentication for

advanced metering infrastructure (AMI). The authentication system is developed

based on ROPUF and is targeted for AMI.

o ROPUF is used to generate the unique ID for the devices involved in the

AMI for the authentication.

o This system is designed to fit in the current AMI system with low cost

implementation.

o Details of the implementation cost are shown in this work as the proof of

concept.

o The security of the ROPUF system is also tested using a support vector

machine (SVM). The SVM is trained using a large data set and challenges

are fed into the SVM to predict the response sets.

o Results obtained show that SVM failed to predict ROPUF responses based

on the challenges, thus lending credence to the security offered by the

proposed authentication system.

9.3 Future Works

The research work done in this dissertation can be further extended by performing

the following:

• Implementation of the ROPUF authentication scheme for the AMI network using

simulation software such as NS-2 to analyze its performance.

• An efficient error correction circuit can be developed to improve the ROPUF

security.

• A ROPUF scheme can be developed for cryptography.

• A hardware Trojan detection technique can be designed using the RO.

References

[1] The Bureau Of National Affairs, INC, “Counterfeit Electronic Parts: What to do

Before the Regulations (and Regulators) come?” ISSN 0014-9063 2012.

[2] Ryan Kastner and Ted Huffmire, “Threats and Challenges in Reconfigurable

Hardware Security,” California University San Diego La Jolla Department of

Computer Science and Engineering, July 2008.

[3] C.E. Yin and Q. Gang, “Improving PUF Security with Regression-based Distiller,”

Design Automation Conference (DAC), pp. 1-6, Jun 2013.

[4] A. Maiti, V. Gunreddy, and P. Schaumont, "A Systematic Method to Evaluate and

Compare the Performance of Physical Unclonable Functions," Chapter 11 in

"Embedded System Design with FPGAs," Eds. P. Athanas, D. Pnevmatikatos, N.

Sklavos, Springer 2012, ISBN 978-1-4614-1361-5.

[5] B. Gassend, D. E. Clarke, M. Van Dijk and S. Devadas, “Silicon physical random

functions,” in ACM Conference on Computer and Communications Security (CCS)

2002, pp. 148–160.

[6] G.E. Suh and S. Devadas, “Physical unclonable functions for device authentication

and secret key generation,” in Proc. 44th Design Automation Conf. (DAC 07), ACM

Press, pp. 9–14.

[7] Susana Eiroa and Iluminada Baturone, “An analysis of ring oscillator PUF behavior

on FPGAs,” in Int. Conf. on Field-Programmable Technology (FPT) 2011.

[8] A. Maiti, and P. Schaumont, “Improved ring oscillator PUF: an FPGA-friendly secure

primitive,” Journ. of Cryptology, Vol. 24 (2), pp. 375-397, April 2011.

[9] C.E.D. Yin and Q. Gang, “LISA: Maximizing RO PUF’s secret extraction,” In HOST

2010, pp. 100-105.

[10] A. Maiti, and P. Schaumont, “Improving the quality of a physical unclonable

function using configurable ring oscillators,” in FPL 2009, pp.703-707.

[11] A. Maiti, J. Casarona, L. McHale, and P. Schaumont, “A large scale characterization of

RO-PUF,” HOST 2010, pp. 66-71, 2010.

[12] H. Yu, P.H.W. Leong, H. Hinkelmann, L. Moller, M. Glesner, and P. Zipf, "Towards

a unique FPGA-based identification circuit using process variations," International

Conference on Field Programmable Logic and Application, pp.397,402, Aug. 31

2009-Sept. 2 2009.

[13] P. Sedcole and P. Y. K. Cheung, “Within-die delay variability in 90nm FPGAs and

beyond,” Proc. FPT 2006, pp. 97-104.

[14] D. Lim, J.W. Lee, B. Gassend, M. Van Dijk, and S. Devadas, “Extracting secret keys

from integrated circuits,” IEEE Transactions on Very Large Scale Integration (VLSI)

Systems, 2005.

[15] S.S. Kumar, J. Guajardo, R. Maes, G.J. Schrijen, and P. Tuyls, “The Butterfly PUF:

Protecting IP on every FPGA,” IEEE International Workshop on Hardware Oriented

Security and Trust (HOST), 2006.

[16] S. Morozov, A. Maiti, and P. Schaumont, "A Comparative Analysis of Delay Based

PUF Implementations on FPGA," 6th International Symposium on Applied

Reconfigurable Computing, March 2010.

[17] M. Mustapa, M. Niamat, M. Alam and T. Killian, “Frequency Uniqueness in Ring

Oscillator Physical Unclonable Functions on FPGAs,” MWSCAS 2013, pp. 465-468,

Aug. 2013

[18] Q. Liu and S.S. Sapatnekar, “A Framework for scalable post-silicon statistical delay

prediction under process variations,” in IEEE Trans. on CAD of Integrated Circuits

and Systems 2009, IEEE Press, pp. 1201-1212.

[19] Xilinx, “Spartan-II FPGA Family Data Sheet,” DS001 June 13 2008

[20] S. Lopez-Buedo, J. Garrido, and E. Boemo, "Thermal testing on reconfigurable

computers," Design & Test of Computers, IEEE , vol.17, no.1, pp.84,91, Jan-Mar

2000.

[21] National Institute of Standards and Technology, “A Statistical Test Suite for Random

and Pseudorandom Number Generators for Cryptographic Applications,” April 2010.

[22] M. D. Yu and S. Devadas, “Secure and Robust Error Correction for Physical

Unclonable Functions,” IEEE Trans. on Design & Test of Computers, 2010.

[23] Y. Dodis, R. Ostrovsky, L. Reyzin and L. Smith, “Fuzzy Extractors: How to

Generate Strong Keys from Biometrics and Other Noisy Data,” SIAM Journal on

Computing 38(1), 97-139, 2008.

[24] R. Maes, A. V. Herrewege and I. Verbauwhede, “PUFKY: A Fully Functional PUF-

based Cryptographic Key Generator,” Cryptographic Hardware and Embedded

Systems CHES 2012, pp 302-319, 2012.

[25] C. E. Yin and G. Qu, “Temperature-Aware Cooperative Ring Oscillator PUF,”

Proceedings of 2nd IEEE International Workshop on Hardware Oriented Security

and Trust, Jun 2009.

[26] C. E. Yin, G. Qu and Q. Zhou, “Design and Implementation of a Group-based RO

PUF,” Design, Automation Test (DATE13), March 2013.

[27] B. E. Stine, D. S. Boning, and J. E. Chung, “Analysis and decomposition of spatial

variation in integrated circuit processes and devices,” IEEE Transactions on

Semiconductor Manufacturing, Vol. 10, Issue 1, pp. 24-91, Feb 1997.

[28] K. Bernstein, D. J. Frank, A. E. Gattiker, W. Haensch, B. L. Ji, S. R. Nassif, E. J.

Nowak, D. J. Pearson, and N. J. Rohrer, “High-performance cmos variability in the

65-nm regime and beyond,” IBM Journal of Research and Development, vol. 50, no.

4.5, pp. 433-449, 2006.

[29] B. E. Stine, T. Maung, R. Divecha, et al., “Using a statistical metrology

framework to identify systematic and random sources of die and wafer-level ILD

thickness variation in CMP processes,” International Electron Devices Meeting, pp.

499-502, Dec. 1995.

[30] J.R. Celaya, P. Wysocki, V. Vashchenko, S. Saha and K. Goebel, "Accelerated aging

system for prognostics of power semiconductor devices," AUTOTESTCON, 2010

IEEE , vol., no., pp.1,6, 13-16 Sept. 2010.

[31] J. Keane and C.H. Kim, "An odometer for CPUs," Spectrum, IEEE , vol.48, no.5,

pp.28,33, May 2011.

[32] A. Maiti, L. McDougall, and P. Schaumont, "The Impact of Aging on an FPGA-

Based Physical Unclonable Function," Field Programmable Logic and Applications

(FPL), 2011 International Conference on , vol., no., pp.151,156, 5-7 Sept. 2011.

[33] D. Ganta and L. Nazhandali, "Study of IC aging on ring oscillator physical

unclonable functions," Quality Electronic Design (ISQED), 2014 15th International

Symposium on , vol., no., pp.461,466, 3-5 March 2014.

[34] M. Mustapa, and M. Niamat, “Relationship between Number of Stages in ROPUF

and CRP Generation on FPGA,” The 2014 International Conference on Security and

Management (SAM’14), 21-24 July 2014.

[35] W. Wang, and Z. Li, “Cyber security in the Smart Grid: Survey and challenges”,

Computer Networks, Volume 57, Issue 5, Pages 1344-1371 April 2013.

[36] H. Khurana, R. Bobba, T. Yardley, P. Agarwal, and E. Heine, "Design Principles for

Power Grid Cyber-Infrastructure Authentication Protocols," System Sciences

(HICSS), 2010 43rd Hawaii International Conference on , vol., no., pp.1,10, 5-8 Jan.

2010

[37] M.M. Fouda, Z.M. Fadlullah, N. Kato, Rongxing Lu, and Xuemin Shen, "A

Lightweight Message Authentication Scheme for Smart Grid

Communications," Smart Grid, IEEE Transactions on , vol.2, no.4, pp.675,685, Dec.

2011

[38] Li Hongwei, Lu Rongxing, Liang Zhou, Bo Yang, and Xuemin Shen, "An Efficient

Merkle-Tree-Based Authentication Scheme for Smart Grid," Systems Journal, IEEE ,

vol.8, no.2, pp.655,663, June 2014

[39] H. Nicanfar, P. Jokar, K. Beznosov, and V.C.M. Leung, "Efficient Authentication

and Key Management Mechanisms for Smart Grid Communications," Systems

Journal, IEEE , vol.8, no.2, pp.629,640, June 2014

[40] N. Kuntze, C. Rudolph, I. Bente, J. Vieweg, and J. Von Helden, "Interoperable device

identification in Smart-Grid environments," Power and Energy Society General

Meeting, 2011 IEEE , vol., no., pp.1,7, 24-29 July 2011

[41] M. Nabeel, S. Kerr, Xiaoyu Ding, and E. Bertino, "Authentication and key

management for Advanced Metering Infrastructures utilizing physically unclonable

functions," Smart Grid Communications (SmartGridComm), 2012 IEEE Third

International Conference on , vol., no., pp.324,329, 5-8 Nov. 2012.

[42] R. Lehrbaum. (2013, Sept. 30). Smart grid data concentrator dev kit runs Linux

[online]. Available: http://linuxgizmos.com/smart-grid-data-concentrator-dev-kit-runs-linux/

[43] J. Villasenor and M. Tehranipoor, "Chop shop electronics," Spectrum, IEEE , vol.50,

no.10, pp.41,45, October 2013.

[44] S. Morozov, A. Maiti, and P. Schaumont, “An Analysis of Delay Based PUF

Implementations of FPGA,” 6th International Symposium ARC 2010, Bangkok,

Thailand, pp. 382-387, March 17-19 2010.

[45] Xilinx, “Virtex-6 FPGA Configuration,” User Guide, Aug. 2014.

[46] A. Maiti, L. McDougall, and P. Schaumont, "The Impact of Aging on an FPGA-

Based Physical Unclonable Function," Field Programmable Logic and Applications

(FPL), 2011 International Conference on , vol., no., pp.151,156, 5-7 Sept. 2011.

[47] D. Ganta and L. Nazhandali, "Study of IC aging on ring oscillator physical

unclonable functions," Quality Electronic Design (ISQED), 2014 15th International

Symposium on , vol., no., pp.461,466, 3-5 March 2014.

[48] Y. Hori, T. Yoshida, T. Katashita, and A. Satoh, “Quantitative and Statistical

Performance Evaluation of Arbiter Physical Unclonable Functions on FPGAs,”

International Conference on Reconfigurable Computing and FPGAs (ReConFig)

2010, pp 298-303, December 2010.

[49] M. Mustapa and M. Niamat, “Novel RPM Technique to Dismiss Systematic

Variation for ROPUF on FPGA,” IEEE National Aerospace & Electronics

Conference (NAECON 2014), 25-27 June 2014.

sources/162/Duncan et al. - 2019 - FPGA Bitstream Security A Day in the Life.pdf

FPGA Bitstream Security: A Day in the Life Adam Duncan∗, Fahim Rahman†, Andrew Lukefahr∗, Farimah Farahmandi†, Mark Tehranipoor†

∗Intelligent Systems Engineering, Indiana University, Bloomington, Indiana 47401 USA †Electrical and Computer Engineering, University of Florida, Gainesville, Florida 32611 USA

Abstract—Security concerns for field-programmable gate array (FPGA) applications and hardware are evolving as FPGA designs grow in complexity, involve sophisticated intellectual properties (IPs), and pass through more entities in the design and implementation flow. FPGAs are now routinely found integrated into system-on-chip (SoC) platforms, cloud-based shared computing resources, and in commercial and government systems. The IPs included in FPGAs are sourced from multiple origins and passed through numerous entities (such as design house, system integrator, and users) through the lifecycle. This paper thoroughly examines the interaction of these entities from the perspective of the bitstream file responsible for the actual hardware configuration of the FPGA. Five stages of the bitstream lifecycle are introduced to analyze this interaction: 1) bitstream-generation, 2) bitstream-at-rest, 3) bitstream-loading, 4) bitstream-running, and 5) bitstream-end-of-life. Potential threats and vulnerabilities are discussed at each stage, and both vendor-offered and academic countermeasures are highlighted for a robust and comprehensive security assurance.

Keywords—FPGA Security, Encryption, Bitstream Protection

I. INTRODUCTION

A field-programmable gate array (FPGA) is an integrated circuit with post-fabrication hardware programming capabilities used to implement custom functionality on a dedicated hardware platform [1]. Products ranging from low-cost consumer electronics to high-end commercial systems use FPGAs for reconfigurability, low development cost, and high-performance [2]. The specific hardware functionality programmed into an FPGA is defined by a binary file commonly known as a bitstream which is generated following a rigorous design, synthesis, and validation process. FPGAs are typically classified by the type of on-chip configuration memory used to store this bitstream file, with common examples being static random access memory (SRAM), Flash, and antifuse. Each configuration memory variant has associated performance, fabrication, and security tradeoffs as discussed in [2]. However, in each FPGA type, the primary FPGA-specific security concern eventually simplifies down to protecting the bitstream from either tampering or intellectual property (IP) piracy. Tampering an FPGA bitstream can compromise the root of trust, and thus the security, of an entire system. Just as consequential, IP piracy conducted at the bitstream level can have an enormous financial impact for the design house and system manufacturer.

A simplified design flow illustrating the loading of a bitstream into an FPGA is depicted in Figure 1(a). The FPGA manufacturer, such as Xilinx, Intel, or Microsemi, first produces the FPGA integrated circuit (IC), along with the proprietary bitstream development software. The user loads the design into the bitstream development software to generate the bitstream file. The bitstream is then loaded into the FPGA configuration memory when the device is powered on for functional operation.

Early FPGAs could only hold simple designs, e.g., 1000 ASIC equivalent gates for Xilinx XC2064 [3], making this design flow tractable. However, FPGA technology has matured, and the size and the complexity of the FPGA have grown over time. The Xilinx VU19P device released in 2019 contains over 9-million logic cells, or roughly 90-million ASIC gates [4]. A design utilizing a significant portion of these logic resources often requires a large team of designers, incorporating multiple third party IP (3PIP) blocks and legacy designs. The VU19P, like most recent FPGAs, also allows for partial reconfiguration, that is allowing a system programmer to reconfigure the FPGA while operating in the field with partial bitstream updates. Figure 1(b) shows the modern-day FPGA design flow with these additional entities interacting with each other and highlights their connection paths to the final bitstream responsible for the FPGA hardware configuration. As each entity shares a connection to the bitstream, they also pose a potential security threat to the authenticity, integrity, and confidentiality of the bitstream.

In this paper, we explore the journey an FPGA bitstream takes from conception to FPGA-based system obsolescence and present a comprehensive threat taxonomy to guide the reader. Industry and academic countermeasures are then presented to illustrate defenses against each threat. The reviewed protection mechanisms are composed of five stages: 1) bitstream-generation, 2) bitstream-at-rest, 3) bitstream-loading, 4) bitstream-running, and lastly, 5) bitstream-end-of-life (EOL). Our main contribution in this paper is to provide a comprehensive security assessment of the bitstream as it travels between these stages.

The rest of the paper is organized as follows: Related work and additional background information is provided in Section II. We introduce our bitstream lifecycle stages and present the threat taxonomy in Section III. Security threats and vulnerabilities, along with selected countermeasures, associated with the bitstream-generation stage are discussed in Section IV. Similar analysis is provided for the subsequent stages – bitstream-at-rest, bitstream-loading, bitstream-running, and bitstream-end-of-life – in Sections V, VI, VII, and VIII, respectively. Finally, the paper is concluded in Section IX.

INTERNATIONAL TEST CONFERENCE 1

Fig. 1: a) Classical view of the FPGA design flow. b) Modern FPGA design flow involving multiple entities.

II. BACKGROUND

Different entities involved in a modern and complex FPGA design flow are highlighted in Figure 1(b). The 3PIP design house produces generic or client-specific IPs for the system integrator. The system integrator obtains and integrates the 3PIPs with in-house IPs to produce the actual bitstream for the FPGA. The system programmer represents the entity in charge of loading the bitstream into the FPGA. Lastly, in-field refers to the FPGA operating in the field, such as inside a computer networking router, with its bitstream loaded into its physical configuration memory.

The physical configuration memory that stores the bitstream in an FPGA has a direct impact on the security and accessibility of a bitstream. SRAM-based FPGAs are the most common FPGA type, using volatile SRAM-based latches to store the bitstream. They are fabricated using standard state-of-the-art manufacturing processes allowing for high-performance and high-density [5]. However, they require off-chip bitstream storage, and must transmit the bitstream into the FPGA after it is powered on. Hence it is possible for an attacker to intercept the unprotected bitstream at the board level [5].

There also exist non-volatile FPGAs, such as Flash-based and antifuse-based devices, which store their bitstream inside the FPGA, eliminating the board-level bitstream interception problem. These FPGAs require additional manufacturing process steps and lag behind their SRAM-based counterparts in performance and density [5]. Academic researchers have also proposed FPGA designs utilizing emerging non-volatile memories such as magneto-resistive RAM (MRAM) to produce higher performance non-volatile FPGAs [6].

Irrespective of complexity and memory architecture, FPGA security issues eventually reduce to unauthorized access to, and tampering with, the FPGA bitstream. For example, concerns may include attackers reverse engineering proprietary IP, or loading an unauthorized design into an FPGA-based system to alter the intended system behavior. Specific threats and countermeasures will be discussed throughout subsequent sections of this paper.

FPGA vendors have included bitstream protection features dating back to the earliest FPGAs. Xilinx published an application note in 1997 recommending that the FPGA be programmed at a secure facility and that a battery maintain power throughout the lifetime of the system, preventing an attacker from intercepting the bitstream [7]. In 2001, Xilinx introduced bitstream encryption into their Virtex-II devices using the Data Encryption Standard (DES) [8]. Here, the bitstream is encrypted with an encryption key that is stored securely within the FPGA. Without knowledge of the encryption key, an adversary cannot reverse engineer or copy the bitstream. Other FPGA manufacturers have since included bitstream encryption in their devices, with encryption standards eventually migrating to include variants of the newer Advanced Encryption Standard (AES) [5].

In 2009, on-chip bitstream authentication was included by Xilinx in their Virtex-6 devices [5]. This authentication implements a keyed-Hash Message Authentication Code (HMAC) algorithm in hardware to compute the hash digest of a bitstream. The digest is compared to a pre-computed reference digest before bitstream loading, and the loading is aborted upon a mismatch. In 2015, Microsemi added physically unclonable function (PUF) protection to the bitstream encryption keys in their IGLOO2 and SmartFusion2 devices [9]. The PUF uses the inherent physical properties of the IC to produce a device-specific digital signature at run-time. This PUF value is then incorporated into the bitstream encryption scheme so that an attacker cannot thwart the encryption protection by obtaining the on-chip encryption key alone. Xilinx and Intel also offer similar solutions for their Ultrascale+ and Stratix-10 devices, respectively [10], [11].
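The load-time authentication check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Virtex-6 hardware implementation: the use of SHA-256 as the underlying hash, the key and bitstream values, and the function names `bitstream_digest` and `load_if_authentic` are all hypothetical.

```python
import hashlib
import hmac


def bitstream_digest(key: bytes, bitstream: bytes) -> bytes:
    """Compute a keyed digest (HMAC) over the raw bitstream bytes."""
    # SHA-256 is an assumption made for this sketch.
    return hmac.new(key, bitstream, hashlib.sha256).digest()


def load_if_authentic(key: bytes, bitstream: bytes, reference: bytes) -> bool:
    """Return True (proceed with loading) only if the digests match."""
    computed = bitstream_digest(key, bitstream)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(computed, reference)


# Hypothetical key and bitstream contents, for illustration only.
key = b"example-device-key"
good_bitstream = b"header" + b"configuration frames"
reference = bitstream_digest(key, good_bitstream)
tampered_bitstream = good_bitstream[:-1] + b"X"
```

Because the digest is keyed, an attacker who modifies the bitstream cannot simply recompute a matching reference digest without also knowing the key.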

III. THREAT MODEL

The modern FPGA design flow experiences complex interactions among multiple involved entities as discussed in Section II. We present our threat taxonomy in Figure 2 to explore the threats and vulnerabilities facing the bitstream as it travels amongst these different FPGA entities. The top flow of Figure 2 illustrates the Design Flow Entities involved: 1) 3PIP Design House, 2) System Integrator, 3) System Programmer, 4) System in-Field, and lastly 5) Recycler.


Fig. 2: A taxonomy of the different threats facing a bitstream as it traverses through a modern FPGA design flow composed of multiple entities.

Our five Bitstream Stages are located below these entities and describe the different points involved in the journey of a bitstream. Bitstream-Generation refers to the stage where the bitstream is physically being generated by either FPGA design tools or other means. Bitstream-at-Rest defines the stage where a bitstream has been generated and is stored either on a computer, in a cloud repository, or in a non-volatile memory that is not currently configuring the FPGA. Bitstream-Loading describes the physical act of loading the bitstream from its resting state into the FPGA configuration memory. Bitstream-Running is the state where a bitstream has been loaded into the configuration memory, and the FPGA is operating according to its programmed hardware configuration. Lastly, Bitstream-EOL is used to describe the decommissioning of the bitstream as well as physical FPGA-related threats tangentially related to the FPGA bitstream.

The interaction between the design flow entities and bitstream stages illustrates the complexity involved in modern FPGA security. The first observation is that each design flow entity has a connection to more than one bitstream stage. For example, the in-field system may contain a bitstream stored in a non-volatile memory on a PCB, categorized as bitstream-at-rest. After the system powers up, it enters the bitstream-loading stage, and transitions into the bitstream-running stage after the bitstream reaches the FPGA configuration memory.

Several threat categories are provided for each bitstream stage, as shown at the bottom of Figure 2. The taxonomy also lists two examples for each threat category. The subsequent sections of this paper will discuss these threats and associated countermeasures in detail with respect to each bitstream stage.

IV. BITSTREAM GENERATION

The life of a bitstream begins with an intended hardware design specification that is targeted towards the FPGA. The design specification is then translated into a complete design IP that is either developed by the user, outsourced as 3PIP, includes other licensed IP, or is a combination of all. At this point, the FPGA design software takes the IP and synthesizes it into FPGA resources according to the targeted FPGA models and specifications. These resources are then placed within the FPGA fabric and routed together to create a final configuration. This final configuration is ultimately specified as the bitstream. This bitstream generation process can be seen in Figure 3. Two important things can be observed from this figure. First, the design flow is similar to that of an ASIC, and as such, the design specification and IP steps share the same threats and countermeasures found in the ASIC literature. Second, the synthesis, place and route, and bitstream generation steps have distinct differences compared to an ASIC, due to their reconfigurability and the fact that the physical FPGA fabric, including the model-specific hardware information, is often public and known to an attacker.

Our taxonomy in Figure 2 divides threats in the bitstream-generation phase into two categories: malicious intent and non-malicious intent. Malicious intent refers to an attacker deliberately performing an attack, such as Trojan insertion or IP overuse during the generation of a bitstream. Non-malicious intent is presented to cover the expanding threat space where vulnerabilities are unintentionally introduced by the complex FPGA design tools generating final bitstreams.

Fig. 3: A simplified view of the FPGA bitstream generation flow.


A. Malicious Design Flow Threats

Trojan Attacks: Attacks on the design IP within the bitstream generation flow are often very similar to attacks on design IPs in an ASIC design flow. For example, hardware Trojan insertion [12] shares the same basic attack principles for register transfer level (RTL) design IP, independent of whether the IP is targeting an FPGA or an ASIC. Once the design IP has been synthesized into the specific design elements inside the FPGA, the synthesized blocks become vulnerable to Trojan insertion attack. Mal-Sarkar et al. discussed FPGA-specific post-synthesis threat vectors [13] as illustrated in Figure 4. Here, logic blocks contain programmable lookup tables (LUTs), combinational logic primitives such as adders, and sequential elements in the form of flip-flops and latches. These elements are combined to implement combinational and sequential logic functions. There are also routing elements: local interconnects, connection boxes, and switch boxes which are used to route outputs between logic blocks.

An attacker, such as a rogue employee with access to the design in this post-synthesis state, can change the properties of logic blocks to introduce a Trojan or modify the design functionality in some way. After synthesis, logic blocks go through a place and route step where they are placed within the FPGA fabric at specific locations. Similarly, routing elements are placed and configured to achieve the desired design functionality. An attacker here can potentially modify the placement of the blocks or add additional blocks to the design.

Lastly, the generated bitstream provides the correlation between the configuration memory inside the FPGA and the behavior of the logic blocks and routing elements. The bitstream can be attacked directly to modify the configuration memory, which in turn modifies the functionality of the FPGA, as will be discussed in Section V. During the synthesis and place and route steps, the designs are often checkpointed by the bitstream development software. These design checkpoints open the door to an insider modifying or inserting elements in the design by editing the intermediate software file, or even by making a malicious modification to the FPGA design software itself [14].

Trojan Countermeasures: Borrowing from ASIC Trojan detection work, techniques presented by Salmani et al. [15] can be used to detect Trojans in FPGA designs at the IP and synthesis levels. These techniques operate on the principle that the triggering of a Trojan is likely a rare occurrence, and thus potentially identified by profiling and/or simulating a design to probe for rarely activated logic. At the place and route level, techniques have been presented for ASICs using the built-in self-authentication (BISA) [16] approach to add a test infrastructure inside a design to test for the placement of additional malicious logic. Khaleghi et al. extended this concept into the FPGA space to fill unused Logic Blocks and routing elements with a test-verifiable dummy design to prevent attackers from utilizing unused FPGA resources to insert Trojans [17].
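The rare-activation principle can be illustrated with a toy simulation. The sketch below is not the method of [15]; it simply profiles a hypothetical 8-input design with random vectors and flags nets whose activation rate falls below a threshold. The design, the net names, and the 1% threshold are all assumptions chosen for the example.

```python
import random


def toy_design(x: int) -> dict:
    """Return the value of each internal net for an 8-bit input x."""
    bits = [(x >> i) & 1 for i in range(8)]
    return {
        "n_and": bits[0] & bits[1],
        "n_or": bits[2] | bits[3],
        "n_xor": bits[4] ^ bits[5],
        # Hypothetical Trojan trigger: asserted for a single input pattern.
        "n_trigger": int(x == 0xA5),
    }


def rarely_active_nets(design, n_vectors=20000, threshold=0.01, seed=0):
    """Simulate random input vectors and list nets that are rarely 1."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_vectors):
        for net, value in design(rng.randrange(256)).items():
            counts[net] = counts.get(net, 0) + value
    return sorted(net for net, c in counts.items() if c / n_vectors < threshold)


suspects = rarely_active_nets(toy_design)
```

Nets flagged this way are only candidates for inspection; legitimate logic such as error handlers can also be rarely active.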

Fig. 4: The fundamental building blocks of the FPGA with highlighted Trojan insertion points [13].

IP Piracy Attacks: IP piracy-based threats in the bitstream-generation phase [18] involve common threats familiar to ASIC and software IP piracy. IP overuse refers to implementing more instances of an IP than specified by the IP licensing agreement, and it is becoming a larger threat as the market for FPGA IP grows. IPs without specific licensing protection are vulnerable to an attacker generating more bitstreams than allowed. IP theft, IP reuse, and IP reverse engineering are also a growing concern as techniques have been published discussing tool flows to convert between intermediate formats in the FPGA design flow cycle [19]. The specific issues regarding the direct manipulation of the bitstream at the end of the FPGA design flow are discussed in Section V.

IP Piracy Countermeasures: To detect instances of IP piracy, watermarks [20] can be inserted by the user in the IP design stage and then evaluated at a later time to provide a proof of authorship. FPGA vendor software packages currently offer the distribution of third-party IP encrypted using IEEE Standard P1735 [21] to protect against reverse engineering activities. To further protect against IP overuse, researchers have proposed methodologies that incorporate a PUF response from the chip into the licensing, generating a device-specific key that enables design functionality only within a given device [22]. The general concept is shown in Figure 5, where a locked bitstream component containing the IP is stored alongside a challenge in a non-volatile memory. At runtime, the PUF is evaluated, and its response is used to unlock the IP to enable its design functionality. Two-party variants of these licensing schemes have been proposed as well to improve efficiency [23], [24]. Logic obfuscation is another powerful technique used at the design level to defend against IP piracy [25]. Typical logic obfuscation schemes integrate logic locking gates into a design to disable normal functionality unless correct values are applied to the logic locking gate inputs.
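The PUF-based licensing idea can be sketched as follows. This is an illustrative model only, not the scheme of [22]: the PUF is simulated as a hash of a per-device secret, and a hash-based XOR keystream stands in for a real cipher. All names (`puf_response`, `xor_with_key`, the device identifiers) are hypothetical.

```python
import hashlib


def puf_response(device_secret: bytes, challenge: bytes) -> bytes:
    """Stand-in for a physical PUF (assumption): hash of a per-device secret."""
    return hashlib.sha256(device_secret + challenge).digest()


def keystream(key: bytes, length: int) -> bytes:
    """Expand a key into `length` pseudo-random bytes (toy cipher, not AES)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Lock or unlock data; XOR with the same keystream is its own inverse."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


challenge = b"stored-challenge"
ip_core = b"proprietary IP configuration data"

licensed_device = b"silicon-variation-A"
other_device = b"silicon-variation-B"

# Vendor side: bind the IP to the licensed device's PUF response.
locked_ip = xor_with_key(ip_core, puf_response(licensed_device, challenge))

# Device side: each device evaluates its own PUF at runtime.
unlocked_on_licensed = xor_with_key(locked_ip, puf_response(licensed_device, challenge))
unlocked_on_other = xor_with_key(locked_ip, puf_response(other_device, challenge))
```

Copying the locked component and the challenge to a second device yields garbage, because that device's PUF produces a different response.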

B. Non-Malicious Threats

Fig. 5: A PUF-based licensing scheme binding IP to specific FPGA devices to prevent FPGA IP piracy and overuse [22].

Attacks: Traditionally, the vendor bitstream generation tools do not inherently offer security checking while they implement the design flow. Consequently, unintended security vulnerabilities may be introduced during the translation of a design into the logic blocks inside the FPGA. High-level synthesis (HLS) specifically creates a higher level of design abstraction for logic block representation, which may increase the probability of unintended vulnerabilities. An example is an AES encryption engine implementation that is cryptographically secure at the C language abstraction but leaks information when it is synthesized into hardware logic for an FPGA implementation [26]. Research findings compromising FPGA bitstream generation tools at various stages have also been published [14].

Countermeasures: The defense against tool-induced vulnerabilities begins by adhering to proven software security best practices and verifying the tool authenticity through the use of a trusted vendor-provided hash during tool download and installation. At the design level, researchers have proposed a moving target defense to defend against attacks originating from malicious FPGA software tooling [27]. The movement of the target in this situation is the randomization of the synthesis and place and route operations by the vendor tools so that the attacker cannot predict the necessary information from a user design required to conduct a meaningful attack. Researchers have addressed high-level Trojan insertion by proposing Trojan-aware HLS [26]. Here, equivalence checking is performed between the original higher-level code and the lower-level code during the design space exploration (DSE) of the HLS operation. Moreover, a set of security properties can be developed to be used in formal verification tools to ensure the safe design translation. FPGA software vendors can also provide design checkpoint hashes which should be used along with proven best practices for software security during the design development.
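Equivalence checking between a high-level description and its lower-level translation can be illustrated with a toy exhaustive comparison. This sketch is not the Trojan-aware HLS flow of [26]; exhaustive enumeration is only practical for very small input widths and is used here purely for illustration. The 4-bit adder, the Trojaned variant, and all function names are hypothetical.

```python
from itertools import product


def spec_add(a: int, b: int) -> int:
    """High-level specification: 4-bit addition with wrap-around."""
    return (a + b) & 0xF


def gate_level_add(a: int, b: int) -> int:
    """Lower-level translation: ripple-carry adder built from bit operations."""
    carry, result = 0, 0
    for i in range(4):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result


def trojaned_add(a: int, b: int) -> int:
    """Same adder, but one rare input pair flips the least significant bit."""
    out = gate_level_add(a, b)
    return out ^ 1 if (a, b) == (0xA, 0x5) else out


def find_counterexample(spec, impl, width=4):
    """Return the first input pair where spec and impl disagree, else None."""
    for a, b in product(range(2 ** width), repeat=2):
        if spec(a, b) != impl(a, b):
            return (a, b)
    return None


clean_result = find_counterexample(spec_add, gate_level_add)
trojan_result = find_counterexample(spec_add, trojaned_add)
```

The clean translation yields no counterexample, while the modified one is exposed by the single input pair that activates the inserted behavior.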

V. BITSTREAM-AT-REST

After a bitstream has been generated, it needs to be stored someplace so that it can eventually be loaded onto the FPGA on its quest to perform its intended function. We use the term bitstream-at-rest to define this storage state. Bitstream storage locations for this stage can include multiple locations, including the hard disk of the computer used to run the FPGA development software during bitstream generation.

Other storage locations can include a non-volatile memory used to configure the FPGA upon the application of power, or even a software repository of system-level firmware images containing an FPGA bitstream. Bitstreams in this state may be stored in their encrypted or plaintext versions, and are vulnerable to tampering or IP extraction.

In order to develop an attack against a bitstream or to extract its IP, a relationship between the bitstream and its hardware behavior must be established. The format of a bitstream for Xilinx, Intel, and Microsemi FPGAs is vendor proprietary and often serves as the first line of defense against such threats. However, multiple researchers have published techniques to successfully reverse engineer a vendor-proprietary plaintext bitstream into a netlist [29], [28]. The flow by Zhang et al. [28] depicted in Figure 6 is an example of one such technique where the bitstream is parsed into a functioning netlist that can be simulated and analyzed. Once a netlist has been obtained, it can be used to analyze a design for Trojans [28], or used for malicious activities such as IP piracy or tampering.

Our threat taxonomy in Figure 2 divides threats in the bitstream-at-rest stage into bitstream tampering and IP piracy categories. Example attacks and countermeasures are presented below.

A. Bitstream Tampering

Attacks: Chakraborty et al. first introduced the concept of Trojan insertion using plaintext bitstream manipulations in 2013 [30]. Since then, researchers have increased the sophistication of bitstream manipulation attacks to create automated blind attacks on soft IP blocks within an FPGA, such as soft-encryption cores [31]. Bitstream reverse engineering can further refine the scope of bitstream-based attacks to target specific soft IP blocks inside the FPGA fabric [32].

Fig. 6: A reverse engineering workflow translating a decrypted bitstream into a netlist [28].

Countermeasures: Techniques proposed by Kamali et al. [33] and Karam et al. [34] help defend against tampering attacks by applying logic locking at the bitstream level to help obfuscate the netlist functionality from the attacker. To accomplish the logic locking, the authors construct keys at runtime using PUFs implemented within the FPGA fabric. The PUF responses for each device are then connected to lookup tables (LUTs) within the user design to act as a key so that correct design functionality will only occur if the correct key value is applied. Hence, if the attacker does not know the correct key values, the attacker will not be able to extract the correct functional netlist and as a consequence, will not be able to find the desired node to tamper.
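The LUT-level locking idea can be sketched by widening a LUT with extra key inputs. This is a hypothetical model, not the construction of [33] or [34]: the truth table returns the intended function only when the key inputs carry the expected value and a decoy function otherwise. The 2-bit key, the decoy choice, and the helper names are assumptions.

```python
def build_locked_lut(func, n_inputs, key_bits, decoy):
    """Build a truth table addressed by (key << n_inputs) | data."""
    correct_key = sum(bit << i for i, bit in enumerate(key_bits))
    table = []
    for key in range(2 ** len(key_bits)):
        for data in range(2 ** n_inputs):
            chosen = func if key == correct_key else decoy
            table.append(chosen(data))
    return table


def eval_lut(table, n_inputs, key, data):
    """Look up the output for the given key inputs and data inputs."""
    return table[(key << n_inputs) | data]


def and2(data):
    """Intended 2-input function: AND of the two data bits."""
    return (data & 1) & (data >> 1)


def xor2(data):
    """Decoy function shown when the key inputs are wrong."""
    return (data & 1) ^ (data >> 1)


# Hypothetical 2-bit PUF response, least significant bit first (key value 1).
puf_key_bits = [1, 0]
locked = build_locked_lut(and2, 2, puf_key_bits, xor2)

correct_outputs = [eval_lut(locked, 2, 1, d) for d in range(4)]
wrong_outputs = [eval_lut(locked, 2, 2, d) for d in range(4)]
```

An attacker who recovers the table from the bitstream sees both behaviors and cannot tell which one the device actually implements without the key value.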

B. IP Piracy

Attacks: Another attack goal may be to extract proprietary IP from a bitstream. Motivations for this may include not paying for IP and then subsequently using the IP illegally in a design, or reverse engineering an IP to extract proprietary information that may be used commercially.

Countermeasures: Inserting watermarks at the bitstream level has been proposed by Schmid et al. [35]. In contrast to RTL watermarking, this technique directly embeds a watermark into the LUT contents of a design. Watermark extraction and comparison are then performed at the bitstream level to determine authorship.

C. Single Key Encryption

Countermeasures: Modern Xilinx, Intel, and Microsemi FPGAs offer encrypted versions of the generated bitstreams to increase resistance to bitstream tampering, piracy, and reverse engineering activities. In fact, modern Microsemi FPGAs such as the Polarfire, Igloo2, and SmartFusion2 series only store encrypted versions of their bitstream [36]. While researchers have demonstrated bitstream tampering attacks on encrypted bitstreams [31] that have an observable behavior, encryption makes targeted tampering attacks infeasible without knowledge of the encryption key.

D. Red/Black Encryption

Fig. 7: A red/black encryption key flow where the key decrypting the bitstream at runtime is obfuscated from the key stored inside the FPGA [37].

Countermeasures: The concept of a red/black encryption scheme has been adopted by Intel, Xilinx, and Microsemi in their respective Stratix 10, Ultrascale+, and Polarfire product lines. The basic concept is outlined in Figure 7 for the Xilinx Ultrascale+ Zynq [37]. Here, the red key used to decrypt the bitstream is not stored directly inside the FPGA. Instead, a device-specific PUF is used to generate a black key that is stored inside the FPGA. Upon power-up, the PUF is exercised, and its response (black key) is used to generate the red key for decrypting the bitstream used to populate the FPGA fabric. As the black key is not directly used to encrypt the bitstream, it cannot be used by itself to decrypt the bitstream by an attacker.
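One way to picture the red/black relationship is as key wrapping with a PUF-derived secret. The sketch below is an interpretation for illustration, not the vendor implementation in [37]: the PUF is modeled as a hash of a per-device secret, and a plain XOR stands in for whatever key-wrapping cipher a real device uses. The names `puf_secret`, `wrap_red_key`, and `recover_red_key` are hypothetical.

```python
import hashlib


def puf_secret(device_variation: bytes) -> bytes:
    """Stand-in for a PUF output (assumption): 32 bytes unique to the device."""
    return hashlib.sha256(b"puf:" + device_variation).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def wrap_red_key(red_key: bytes, device_variation: bytes) -> bytes:
    """Provisioning: produce the black key that is stored on the device."""
    return xor_bytes(red_key, puf_secret(device_variation))


def recover_red_key(black_key: bytes, device_variation: bytes) -> bytes:
    """Power-up: exercise the PUF and recover the red key from the black key."""
    return xor_bytes(black_key, puf_secret(device_variation))


# Hypothetical red key and device identities.
red_key = hashlib.sha256(b"bitstream encryption key").digest()
black_key = wrap_red_key(red_key, b"device-A")

recovered_on_a = recover_red_key(black_key, b"device-A")
recovered_on_b = recover_red_key(black_key, b"device-B")
```

An attacker who reads out only the stored black key, or who copies it to another device, does not obtain the key that actually decrypts the bitstream.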

VI. BITSTREAM-LOADING

The Bitstream-Loading stage loads the bitstream into the configuration memory of the FPGA. The specifics of this stage vary depending upon the configuration memory variant from different vendors and FPGA device models. Non-volatile memory-based FPGAs, such as Flash-based or antifuse-based, only experience this stage when loading a new bitstream. SRAM-based FPGAs require the bitstream to be loaded every time power is applied to the FPGA to turn it on for functional application. A set of on-chip authentication and decryption circuitry is often employed by the FPGA during this stage to both authenticate and decrypt the bitstream before loading it into the configuration memory. FPGA-based system-on-chip devices, such as the Xilinx Zynq family, add additional features to the bitstream loading process in terms of bootloaders as well as physical processor cores implemented on the same silicon. Attacks considered within this stage originate from unintended side channels as well as loading outdated, and potentially vulnerable, bitstream versions.

Our threat taxonomy in Figure 2 divides threats in the bitstream-loading stage into two categories. First, we discuss side channel threats which involve the extraction of sensitive on-chip information during the loading of the bitstream. Second, we discuss replay attacks, where older, or potentially unauthorized versions of a bitstream are loaded into the FPGA.

A. Side Channel Threats

Attacks: Side channel attacks (SCA) have been applied to earlier generations of FPGAs to extract encryption keys by collecting information through unintended side channels. Encryption keys have been extracted in early generations of FPGAs by researchers analyzing the power consumption during the decryption process [38]. Similarly, information from a user design running in the fabric has been shown to leak in the electromagnetic spectrum [39]. Recently, laser-based approaches have shown the ability to read out on-chip information [40].

Fig. 8: Xilinx secure boot process flow where user code first stage bootloader (FSBL) loads the bitstream into the programmable logic of an FPGA-based SoC [37].

Countermeasures: FPGA vendors have addressed these attacks by implementing side-channel defenses [36], [10], [11] in their latest products to eliminate the leakage of key material. Key rolling limits the amount of time an attacker has to extract a given key, defeating attacks that rely upon multiple samples such as differential power analysis (DPA). The black/red scheme discussed in Section V-D reduces the impact of a key being extracted as well since the key stored in the non-volatile memory of the FPGA is not the final key used to encrypt or decrypt the bitstream. In addition, the academic research community has proposed physical techniques such as nanopyramids [41] to defend against these attacks. Nanopyramids are intended to be inserted in the device manufacturing flow to introduce random changes in the optical reflectance properties of silicon when conducting optical probing attacks, preventing an attacker using reflectance information to reveal information about the corresponding circuit storing key material.
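The key rolling idea mentioned above can be sketched as splitting the bitstream into blocks, each encrypted under a different key, with each block carrying the key for the next one. This is a simplified model under stated assumptions, not a vendor format: a hash-based XOR keystream stands in for AES, and the block size and function names are hypothetical.

```python
import hashlib
import os

KEY_LEN = 32


def _xor_stream(data: bytes, key: bytes) -> bytes:
    """Toy stream cipher (stand-in for AES); applying it twice decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def encrypt_rolling(bitstream: bytes, first_key: bytes, chunk: int = 64):
    """Encrypt each chunk under its own key; embed the next key in each block."""
    key, blocks = first_key, []
    for offset in range(0, len(bitstream), chunk):
        next_key = os.urandom(KEY_LEN)
        blocks.append(_xor_stream(next_key + bitstream[offset:offset + chunk], key))
        key = next_key
    return blocks


def decrypt_rolling(blocks, first_key: bytes) -> bytes:
    """Decrypt block by block, switching to the embedded key after each one."""
    key, out = first_key, bytearray()
    for block in blocks:
        plain = _xor_stream(block, key)
        key, payload = plain[:KEY_LEN], plain[KEY_LEN:]
        out += payload
    return bytes(out)


first_key = hashlib.sha256(b"hypothetical initial key").digest()
bitstream = bytes(range(256))
blocks = encrypt_rolling(bitstream, first_key, chunk=64)
recovered = decrypt_rolling(blocks, first_key)
```

Because each key protects only one small block, an attacker collecting power traces sees far fewer operations under any single key than a multi-sample attack such as DPA typically needs.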

B. Bitstream Replay Threats

Attacks: Bitstream versioning refers to the concept of having multiple versions of a bitstream for a given FPGA-based system. Analogous to the software world, a vulnerability can be discovered within an FPGA bitstream, requiring an updated bitstream to be loaded into the device. However, if the FPGA has already been deployed in the field, an adversary can potentially downgrade it to use the original bitstream containing the vulnerability [36]. Classical encryption and authentication techniques do not protect against this concern as the original vulnerable bitstream was encrypted and authenticated with the same encryption key as the updated bitstream. The act of securely transmitting an updated bitstream to the FPGA is another security concern.

Countermeasures: Microsemi addresses replay attacks by implementing a versioning control in their bitstream, combined with setting different non-volatile version control bits within their FPGA [36]. Xilinx offers QuickBoot [42] as a solution to load different bitstream versions in different non-volatile memory locations depending upon bits set in the bitstream. Researchers have addressed possible security concerns with the concept of transmitting an updated bitstream to a device by implementing an authenticated station-to-station protocol [43] or implementing custom protocols within user logic [44].
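The version-control defense can be pictured as a ratchet: the device remembers the highest version it has accepted and refuses anything older. The sketch below is a generic model, not Microsemi's mechanism from [36]; the class name, the automatic ratcheting on every successful load, and the integer version numbers are assumptions.

```python
class VersionGate:
    """Models non-volatile version bits that only ever move forward."""

    def __init__(self, minimum_version: int = 0):
        self.minimum_version = minimum_version

    def try_load(self, bitstream_version: int) -> bool:
        """Accept a bitstream only if it is not older than the stored version."""
        if bitstream_version < self.minimum_version:
            return False  # replay of an older, possibly vulnerable bitstream
        # Ratchet forward so that this version becomes the new floor.
        self.minimum_version = bitstream_version
        return True


gate = VersionGate()
loaded_v1 = gate.try_load(1)      # original bitstream
loaded_v2 = gate.try_load(2)      # patched bitstream
replayed_v1 = gate.try_load(1)    # attacker replays the original
```

The check complements encryption and authentication: the replayed bitstream is still validly signed, so only the stored version floor distinguishes it from the current one.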

C. FPGA-based SoCs

Attacks: The introduction of SoC-based FPGAs such as the Xilinx Zynq [37] and Microsemi SmartFusion [9] adds additional steps to the loading of the bitstream. SoC-based FPGAs introduce the concept of a first stage bootloader (FSBL), which is user code designed to facilitate the loading of the bitstream as well as to configure the non-FPGA aspects of the SoC, such as the processor and other hard IP blocks. Figure 8 shows the Xilinx “secure boot” implementation where immutable BootROM code is used to boot the SoC and run the user code within the FSBL, which eventually loads the bitstream [37]. Attacks in this boot process can thus result from running malicious FSBL code on the SoC processor.

Fig. 9: Xilinx Zynq first stage bootloader (FSBL) authentication process [45].

Countermeasures: To protect the privacy and integrity of the FSBL, SoCs typically implement an FSBL authentication scheme, such as the RSA-based authentication applied to the Xilinx Zynq series shown in Figure 9. Here, a public/private key pair is used to compare a hash signature on FSBL code with a hash signature stored in the FPGA’s non-volatile memory to only run authenticated FSBL code [45].
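A signature-based FSBL check of this kind can be sketched with textbook RSA. This is a toy model, not the Zynq boot flow in [45]: the key is built from two small known primes, no padding is used, and the assumption that the device stores a hash of the public key in non-volatile memory is part of the sketch. None of this is suitable for real use, and all names are hypothetical.

```python
import hashlib

# Toy RSA key built from two known Mersenne primes (far too small for real use).
P, Q = 2**89 - 1, 2**107 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_int(data: bytes, modulus: int) -> int:
    """SHA-256 digest of the data, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def key_fingerprint(modulus: int, exponent: int) -> bytes:
    """Hash of the public key, modeling the value held in non-volatile memory."""
    return hashlib.sha256(f"{modulus}:{exponent}".encode()).digest()


def sign_fsbl(fsbl: bytes) -> int:
    """Developer side: sign the FSBL digest with the private exponent."""
    return pow(digest_int(fsbl, N), D, N)


def fsbl_is_authentic(fsbl, signature, modulus, exponent, stored_fingerprint):
    """Boot side: check the public key, then check the signature on the FSBL."""
    if key_fingerprint(modulus, exponent) != stored_fingerprint:
        return False
    return pow(signature, exponent, modulus) == digest_int(fsbl, modulus)


fsbl_image = b"first stage bootloader code"
signature = sign_fsbl(fsbl_image)
stored_fingerprint = key_fingerprint(N, E)
```

Only the holder of the private key can produce a signature that verifies, so a modified FSBL fails the check even if the attacker can rewrite the boot medium.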

VII. BITSTREAM-RUNNING

The Bitstream-Running stage defines the stage when the bitstream has been loaded into the configuration memory, and the FPGA is operating according to its hardware configuration. As shown in our threat taxonomy in Figure 2, this stage is vulnerable to fault injection and run-time threats originating within the fabric. Faults injected into the configuration memory, or directly into logic blocks and routing resources, can modify the functionality of the FPGA and are a primary concern in this stage. To correct faults in the configuration memory, and to provide the FPGA designer with more flexibility, modern FPGAs also include a partial reconfiguration framework within their architectures to allow for the bitstream to change dynamically at run-time. Partial reconfiguration allows the FPGA design itself to update portions of the design at run-time while keeping the remainder of the design intact.

We divide threats in the bitstream-running stage into two categories according to our threat model in Figure 2. First, we discuss fault injection on a running FPGA design. Next, we discuss the emerging topic of run-time attacks.

A. Fault Injection

Threats: Faults may be injected into an FPGA running a design through a variety of means, such as clock glitches, power glitches, electromagnetic pulses, laser exposure, or ionizing radiation [46]. The physical mechanisms behind fault injection are rooted in the physical transistors themselves. Therefore, threats to integrated circuits in general can be considered applicable to FPGAs. For example, random bit flips in the configuration memory caused by atmospheric single event upsets (SEUs) [47] become more of a concern as technology feature sizes shrink, and thus more of a concern for newer FPGAs fabricated in state-of-the-art manufacturing processes. Targeted laser-based fault injection [48] has also been discussed by researchers, primarily as a means to replicate SEUs for hardening designs to space radiation effects.


Countermeasures: Partial reconfiguration cores, such as the internal configuration access port (ICAP) for Xilinx FPGAs, are included in modern FPGAs. Providing a user design with access to the partial reconfiguration core in an FPGA has been largely regarded as a security vulnerability [5] as the user design can then have the capability to read and write any area of the configuration memory. However, partial reconfiguration is also used as a mechanism to detect and correct for inadvertent bit flips in the configuration memory, such as those caused by radiation-induced single event upsets [49]. Several academic papers have proposed the use of the Xilinx ICAP to read the configuration memory of an FPGA at run-time and generate a hash for comparison against an expected hash in order to detect run-time tampering [50], [51]. In any usage of a partial reconfiguration core, proper safeguards must be put in place.
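The readback-and-compare idea can be sketched by hashing each configuration frame and comparing against stored reference digests. This is a software model only, not the approach of [50] or [51] and not an ICAP driver: frames are plain byte strings, and the 16-byte frame size, per-frame granularity, and function names are assumptions.

```python
import hashlib


def frame_digest(frame: bytes) -> bytes:
    """Digest of one configuration frame."""
    return hashlib.sha256(frame).digest()


def find_corrupted_frames(readback, golden_digests):
    """Return the indices of frames whose digest differs from the reference."""
    return [
        index
        for index, (frame, expected) in enumerate(zip(readback, golden_digests))
        if frame_digest(frame) != expected
    ]


# Hypothetical configuration memory: 8 frames of 16 bytes each.
golden_frames = [bytes([i]) * 16 for i in range(8)]
golden_digests = [frame_digest(f) for f in golden_frames]

# Simulate a single bit flip (fault or tampering) in frame 5.
readback = list(golden_frames)
upset = bytearray(readback[5])
upset[3] ^= 0x01
readback[5] = bytes(upset)

clean_report = find_corrupted_frames(golden_frames, golden_digests)
fault_report = find_corrupted_frames(readback, golden_digests)
```

Hashing per frame rather than over the whole memory localizes the change, which is useful if the affected frame is then rewritten through partial reconfiguration.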

B. Run-time Attacks

The massively parallel nature of FPGAs has led to their inclusion in data centers where users can purchase computing time. Amazon offers fee-based access to FPGAs in the cloud through Amazon Web Services (AWS) [52]. Here, a shell architecture is described to abstract away communication links and create a separate application area in the FPGA. First, a design is created containing an ‘AWS partial reconfigurable (PR) shell’ to facilitate the loading of a user design. Next, the user design is loaded by the AWS PR shell to fit into the designated user ‘custom PR logic’ section of the floorplan. The shell protects a Peripheral Component Interconnect (PCI) Express connection in the ‘static’ region, manages clocking for the user region, and monitors activity elsewhere in the FPGA [53].

Threats: The shared FPGA computing resources described above have enabled a new class of remote side-channel attack, where one bitstream can leak or corrupt information in another bitstream, from a remote location. The example attack typically runs a user design, such as an oscillator-based array, on a shared FPGA fabric in order to affect another user’s design [54], [55]. In Figure 10(a), Schellenberg et al. showed that a malicious bitstream can extract secrets from a victim bitstream sharing the same FPGA fabric [54] by corrupting the power in the shared power distribution network (PDN). It is also shown that malicious code running in the FPGA SoC can corrupt the PDN in a similar way to extract secrets from a victim FPGA design (see Figure 10(b)). Similarly, research has been performed to illustrate the possibility of a shared system-wide resource like the printed circuit board (PCB) PDN being used to affect an FPGA design. Here, the PCB PDN is corrupted by another PCB component to induce faults or leak information from the FPGA [55].

Fig. 10: Remote side-channel attack where user bitstream information can be extracted by a) malicious bitstreams on the same FPGA, or b) malicious CPU code on the same FPGA SoC [54].

Countermeasures: Vendor-provided defenses to this type of attack include a bitstream-level screening of tenant bitstreams to check for suspicious functionality, such as multiple parallel ring-oscillator arrays that could potentially create power glitching. Modern FPGAs also incorporate on-chip voltage and temperature sensors to allow for the detection of anomalies in a shared FPGA resource, like the PDN. Vendor-provided soft-core defenses, such as the Xilinx Security Monitor (SecMon) core, read the on-chip voltage and temperature sensors, as well as the configuration memory health, to detect anomalous behavior and implement tampering penalties such as the zeroization of the configuration memory or AES keys, or the assertion of the global reset of the FPGA [56].
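One way to picture bitstream-level screening is as a search for combinational loops, the structure underlying a ring oscillator. The sketch below is illustrative only and does not describe any vendor's actual check: it assumes the tenant design has already been recovered as a graph of combinational cells (`netlist` maps each cell to the cells it drives, with registers excluded), and all cell names are hypothetical.

```python
from typing import Dict, List


def has_combinational_loop(netlist: Dict[str, List[str]]) -> bool:
    """Return True if the directed cell graph contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    nodes = set(netlist) | {n for outs in netlist.values() for n in outs}
    colour = {node: WHITE for node in nodes}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in netlist.get(node, []):
            if colour[nxt] == GREY:          # back edge: a loop exists
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)


# Three inverters in a ring (a classic ring oscillator) versus a simple chain.
ring = {"inv0": ["inv1"], "inv1": ["inv2"], "inv2": ["inv0"]}
chain = {"lut0": ["lut1"], "lut1": ["lut2"], "lut2": []}
```

Here `has_combinational_loop(ring)` is true and `has_combinational_loop(chain)` is false. A production screen would also have to consider oscillators that avoid plain combinational loops, which this sketch does not attempt.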

VIII. BITSTREAM-END-OF-LIFE

The last stage in our bitstream’s lifecycle is defined as bitstream-end-of-life. We use this term to represent both the end-of-life (EOL) of a bitstream as well as a stage to capture threats to the physical FPGA device that are tangentially related to the bitstream. EOL in this context refers to when the FPGA running a bitstream has been decommissioned. This could refer to a formal decommission and destruction of a high-value proprietary system or the casual disposal of an FPGA-based networking router to a public trashcan. For other threats related to the bitstream, we briefly address FPGA device counterfeiting and reverse engineering.

We again refer to our threat taxonomy in Figure 2 and focus our discussion on two categories: data remanence and FPGA device counterfeiting.

A. Bitstream Remanence

Threats: As FPGA-based systems reach EOL, their bitstreams are also retired from use. A bitstream residing in an on-board non-volatile memory chip initially designed to program an SRAM-based FPGA may remain on that board indefinitely, remaining vulnerable to potential bitstream reverse engineering activities. Similarly, a bitstream stored in a Flash or antifuse-based FPGA may remain on the FPGA after the system has been disposed, creating a potential opportunity for bitstream extraction.

Countermeasures: FPGA vendors have incorporated zeroization mechanisms into their on-chip security features that allow for certain information stored within an FPGA to be deleted by a user or as a tamper penalty. This information space may include the original bitstream, or any other volatile and non-volatile information inside the FPGA. The Microsemi PolarFire FPGA family offers three levels of zeroization: like-new, recoverable, and unrecoverable [36]. The like-new option deletes user data and keys and returns the device to its factory state. Recoverable is more comprehensive and places the device in a state that is only recoverable by a Microsemi factory programming file. The unrecoverable option is the most thorough, incorporating the destruction of all on-chip data. Serialization certificates are provided by the device in all three cases via a JTAG/SPI instruction to prove that the operation was successful. Similar procedures are provided by other manufacturers as well [56].

Fig. 11: Programmable ICs (FPGAs) were the counterfeit device type most often reported by ERAI in 2017 [22].

B. FPGA Device Counterfeiting

Threats: FPGAs are frequently among the most popular counterfeit IC device types. As shown in Figure 11, ERAI listed FPGAs as occupying approximately 20 percent of their reported counterfeit part instances [57]. An example FPGA counterfeiting technique involves the selling of used devices as new devices. Remarking devices to represent more expensive devices is also a concern, as legacy and industrial/military grade FPGAs sell for a significant premium compared to their standard counterparts. These counterfeit FPGAs pose a threat to systems because their electrical and mechanical specifications, as well as their reliability, are subject to compromise.

Countermeasures: FPGA vendors have addressed counterfeiting by both offering newer FPGA product lines designed to serve as drop-in replacements for legacy FPGAs and making their devices more difficult to counterfeit. Xilinx offers pin-compatible devices in their newer Ultrascale+ product lines that can replace older Ultrascale devices [58]. The major FPGA vendors also offer device-specific markings, such as unique packaging lid shapes, that defend against simple package remarking attacks [59]. Academic researchers have proposed the electrical characterization of oscillator structures programmed into the FPGAs to reveal the responses of reliability-physics mechanisms, such as negative bias temperature instability (NBTI) and hot carrier injection (HCI), that can indicate whether an FPGA has had previous usage [60].

FPGA vendors have begun incorporating fabric-accessible mask-level device serial numbers and lot numbers, such as the DeviceDNA information found in Xilinx FPGAs, to allow users to determine the authenticity of a given FPGA. DeviceID information indicating an FPGA product family line is often accessible through the IEEE joint test action group (JTAG) interface as well. Other counterfeit detection techniques designed for ASICs are also applicable to FPGAs but are considered out of scope for this paper [61].

IX. CONCLUSION

Our journey through the life of a bitstream has now come to an end. We have ventured through five different stages within the bitstream lifecycle: 1) bitstream-generation, 2) bitstream-at-rest, 3) bitstream-loading, 4) bitstream-running, and 5) bitstream-end-of-life. Each stage offered a connection to different entities in the FPGA design flow and contained unique threats along with countermeasures available from both FPGA vendors and academia. A threat taxonomy was introduced to capture the complex interactions between the bitstream stages and the design flow entities and highlight stage-specific threats.

Our threat taxonomy divided threats into two broad categories according to each distinct bitstream stage. More specific threats and countermeasures were discussed in each threat category to help inform the reader of the current state of the art. As with any security-based research, a holistic approach towards security is recommended for each design flow entity to identify the pertinent threats and implement appropriate countermeasures.

REFERENCES

[1] Microsemi, “Field-programmable gate array technology.” Norwell, MA, USA: Kluwer, 1994.

[2] S. Trimberger, “Three ages of fpgas: A retrospective on the first thirty years of fpga technology,” Proceedings of the IEEE, vol. 103, no. 3, pp. 318–331, 2015.

[3] W. Carter, K. Duong, R. H. Freeman, H. Hsieh, J. Y. Ja, J. E. Mahoney, L. T. Ngo, and S. L. Sze, “A user programmable reconfigurable gate array,” in Proceedings Custom Integrated Circuits Conference, pp. 233–235, IEEE, 1986.

[4] Xilinx, “Ultrascale fpga product tables and product selection guide.” Xilinx, 2016.

[5] S. M. Trimberger and J. J. Moore, “Fpga security: Motivations, features, and applications,” Proceedings of the IEEE, vol. 102, no. 8, pp. 1248–1265, 2014.

[6] W. Zhao, E. Belhaire, C. Chappert, and P. Mazoyer, “Spin transfer torque (stt)-mram–based runtime reconfiguration fpga circuit,” ACM Transactions on Embedded Computing Systems (TECS), vol. 9, no. 2, p. 14, 2009.

[7] Xilinx, “Configuration issues: Power-up, volatility, security, battery back-up.” Xilinx, Appl. Note XAPP092, 1997.

[8] Xilinx, “Method and apparatus for protecting proprietary configuration data for programmable logic devices.” U.S. Patent 6 654 889, 2003.

[9] Microsemi, “Ug0443 user guide smartfusion2 and igloo2 fpga security and best practices.” 2015.

[10] E. Peterson, “Xapp1098 (v1.3): Developing tamper-resistant designs with ultrascale and ultrascale+ fpgas,” 2018.

[11] Intel, “Ug-s10security: intel stratix 10 device security user guide,” 2019.

[12] M. Tehranipoor and F. Koushanfar, “A survey of hardware trojan taxonomy and detection,” IEEE design & test of computers, vol. 27, no. 1, pp. 10–25, 2010.

[13] S. Mal-Sarkar, A. Krishna, A. Ghosh, and S. Bhunia, “Hardware trojan attacks in fpga devices: threat analysis and effective counter measures,” in Proceedings of the 24th Edition of the Great Lakes Symposium on VLSI, pp. 287–292, ACM, 2014.

[14] C. Krieg, C. Wolf, and A. Jantsch, “Malicious lut: a stealthy fpga trojan injected and triggered by the design flow,” in Proceedings of the 35th International Conference on Computer-Aided Design, p. 43, ACM, 2016.

[15] H. Salmani and M. Tehranipoor, “Analyzing circuit vulnerability to hardware trojan insertion at the behavioral level,” in 2013 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), pp. 190–195, IEEE, 2013.

[16] K. Xiao and M. Tehranipoor, “Bisa: Built-in self-authentication for preventing hardware trojan insertion,” in 2013 IEEE international symposium on hardware-oriented security and trust (HOST), pp. 45–50, IEEE, 2013.


[17] B. Khaleghi, A. Ahari, H. Asadi, and S. Bayat-Sarmadi, “Fpga-based protection scheme against hardware trojan horse insertion using dummy logic,” IEEE Embedded Systems Letters, vol. 7, no. 2, pp. 46–50, 2015.

[18] A. Lesea, “Ip security in fpgas,” Xilinx, http://direct.xilinx.com/bvdocs/whitepapers/wp261.pdf, 2007.

[19] N. Steiner, A. Wood, H. Shojaei, J. Couch, P. Athanas, and M. French, “Torc: towards an open-source tool flow,” in Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays, pp. 41–44, ACM, 2011.

[20] A. K. Jain, L. Yuan, P. R. Pari, and G. Qu, “Zero overhead watermarking technique for fpga designs,” in Proceedings of the 13th ACM Great Lakes symposium on VLSI, pp. 147–152, ACM, 2003.

[21] IEEE, “Ieee recommended practice for encryption and management of electronic design intellectual property (ip).ieee sa-1735-2014.” 2014.

[22] J. Zhang, Y. Lin, Y. Lyu, and G. Qu, “A puf-fsm binding scheme for fpga ip protection and pay-per-device licensing,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 6, pp. 1137–1150, 2015.

[23] M. T. Rahman, D. Forte, Q. Shi, G. K. Contreras, and M. Tehranipoor, “Csst: an efficient secure split-test for preventing ic piracy,” in 2014 IEEE 23rd North Atlantic Test Workshop, pp. 43–47, IEEE, 2014.

[24] D. B. Roy, S. Bhasin, I. Nikolić, and D. Mukhopadhyay, “Combining puf with rluts: A two-party pay-per-device ip licensing scheme on fpgas,” ACM Transactions on Embedded Computing Systems (TECS), vol. 18, no. 2, p. 12, 2019.

[25] J. Rajendran, Y. Pino, O. Sinanoglu, and R. Karri, “Security analysis of logic obfuscation,” in Proceedings of the 49th Annual Design Automation Conference, pp. 83–89, ACM, 2012.

[26] A. Sengupta, S. Bhadauria, and S. P. Mohanty, “Tl-hls: methodology for low cost hardware trojan security aware scheduling with optimal loop unrolling factor during high level synthesis,” IEEE Transactions on computer-aided design of integrated circuits and systems, vol. 36, no. 4, pp. 655–668, 2016.

[27] Z. Zhang, Q. Yu, L. Njilla, and C. Kamhoua, “Fpga-oriented moving target defense against security threats from malicious fpga tools,” in 2018 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pp. 163–166, IEEE, 2018.

[28] T. Zhang, J. Wang, S. Guo, and Z. Chen, “A comprehensive fpga reverse engineering tool-chain: From bitstream to rtl code,” IEEE Access, vol. 7, pp. 38379–38389, 2019.

[29] F. Benz, A. Seffrin, and S. A. Huss, “Bil: A tool-chain for bitstream reverse-engineering,” in 22nd International Conference on Field Programmable Logic and Applications (FPL), pp. 735–738, IEEE, 2012.

[30] R. S. Chakraborty, I. Saha, A. Palchaudhuri, and G. K. Naik, “Hardware trojan insertion by direct modification of fpga configuration bitstream,” IEEE Design & Test, vol. 30, no. 2, pp. 45–54, 2013.

[31] P. Swierczynski, G. T. Becker, A. Moradi, and C. Paar, “Bitstream fault injections (bifi)–automated fault attacks against sram-based fpgas,” IEEE Transactions on Computers, vol. 67, no. 3, pp. 348–360, 2017.

[32] M. Ender, P. Swierczynski, S. Wallat, M. Wilhelm, P. M. Knopp, and C. Paar, “Insights into the mind of a trojan designer: the challenge to integrate a trojan into the bitstream,” in Proceedings of the 24th Asia and South Pacific Design Automation Conference, pp. 112–119, ACM, 2019.

[33] H. M. Kamali, K. Z. Azar, K. Gaj, H. Homayoun, and A. Sasan, “Lut-lock: A novel lut-based logic obfuscation for fpga-bitstream and asic-hardware protection,” in Proceedings VLSI (ISVLSI) 2018 IEEE Computer Society Annual Symposium on. EH-2001, pp. 405–410, IEEE, 2018.

[34] R. Karam, T. Hoque, S. Ray, M. Tehranipoor, and S. Bhunia, “Robust bitstream protection in fpga-based systems through low-overhead obfuscation,” in 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig), pp. 1–8, IEEE, 2016.

[35] M. Schmid, D. Ziener, and J. Teich, “Netlist-level ip protection by watermarking for lut-based fpgas,” in 2008 International Conference on Field-Programmable Technology, pp. 209–216, IEEE, 2008.

[36] Microsemi, “User guide polarfire fpga security.” Microsemi, User Guide UG07532, 2018.

[37] E. Peterson, “Xapp1323 (v1.1): Developing tamper-resistant designs with zynq ultrascale+ devices,” 2018.

[38] A. Moradi, A. Barenghi, T. Kasper, and C. Paar, “On the vulnerability of fpga bitstream encryption against power analysis attacks: extracting keys from xilinx virtex-ii fpgas,” in Proceedings of the 18th ACM conference on Computer and communications security, pp. 111–124, ACM, 2011.

[39] E. De Mulder, P. Buysschaert, S. Ors, P. Delmotte, B. Preneel, G. Vandenbosch, and I. Verbauwhede, “Electromagnetic analysis attack on an fpga implementation of an elliptic curve cryptosystem,” in EUROCON 2005 - The International Conference on “Computer as a Tool”, vol. 2, pp. 1879–1882, IEEE, 2005.

[40] S. Tajik, H. Lohrke, J.-P. Seifert, and C. Boit, “On the power of optical contactless probing: Attacking bitstream encryption of fpgas,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1661–1674, ACM, 2017.

[41] H. Shen, N. Asadizanjani, M. Tehranipoor, and D. Forte, “Nanopyramid: An optical scrambler against backside probing attacks,” in ISTFA 2018: Proceedings from the 44th International Symposium for Testing and Failure Analysis, p. 280, ASM International, 2018.

[42] Xilinx, “Quickboot method for fpga design remote update.” Xilinx, Appl. Note XAPP1081, 2014.

[43] J. Vliegen, N. Mentens, and I. Verbauwhede, “Secure, remote, dynamic reconfiguration of fpgas,” ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 7, no. 4, p. 35, 2015.

[44] S. Drimer and M. G. Kuhn, “A protocol for secure remote updates of fpga configurations,” in International Workshop on Applied Reconfigurable Computing, pp. 50–61, Springer, 2009.

[45] E. Peterson, “Wp468 (v1.0): Leveraging asymmetric authentication to enhance security-critical applications using zynq-7000 all programmable socs,” Retrieved October, 2015.

[46] H. Li, G. Du, C. Shao, L. Dai, G. Xu, and J. Guo, “Heavy-ion microbeam fault injection into sram-based fpga implementations of cryptographic circuits,” IEEE Transactions on Nuclear Science, vol. 62, no. 3, pp. 1341–1348, 2015.

[47] A. Lesea, S. Drimer, J. J. Fabula, C. Carmichael, and P. Alfke, “The rosetta experiment: atmospheric soft error rate testing in differing technology fpgas,” IEEE Transactions on Device and Materials Reliability, vol. 5, no. 3, pp. 317–328, 2005.

[48] V. Pouget, A. Douin, G. Foucard, P. Peronnard, D. Lewis, P. Fouillat, and R. Velazco, “Dynamic testing of an sram-based fpga by time-resolved laser fault injection,” in 2008 14th IEEE International On-Line Testing Symposium, pp. 295–301, IEEE, 2008.

[49] J. Heiner, B. Sellers, M. Wirthlin, and J. Kalb, “Fpga partial reconfiguration via configuration scrubbing,” in 2009 International Conference on Field Programmable Logic and Applications, pp. 99–104, IEEE, 2009.

[50] T. Güneysu, I. Markov, and A. Weimerskirch, “Securely sealing multi-fpga systems,” in International Symposium on Applied Reconfigurable Computing, pp. 276–289, Springer, 2012.

[51] D. Owen Jr, D. Heeger, C. Chan, W. Che, F. Saqib, M. Areno, and J. Plusquellic, “An autonomous, self-authenticating, and self-contained secure boot process for field-programmable gate arrays,” Cryptography, vol. 2, no. 3, p. 15, 2018.

[52] D. Pellerin, “Announcing amazon ec2 f1 instances with custom fpgas.” https://www.slideshare.net/AmazonWebServices/announcing-amazon-ec2-f1-instances-with-custom-fpgas, retrieved April 13, 2017.

[53] S. Trimberger and S. McNeil, “Security of fpgas in data centers,” in 2017 IEEE 2nd International Verification and Security Workshop (IVSW), pp. 117–122, IEEE, 2017.

[54] F. Schellenberg, D. R. Gnad, A. Moradi, and M. B. Tahoori, “An inside job: Remote power analysis attacks on fpgas,” in 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1111–1116, IEEE, 2018.

[55] M. Zhao and G. E. Suh, “Fpga-based remote power side-channel attacks,” in 2018 IEEE Symposium on Security and Privacy (SP), pp. 229–244, IEEE, 2018.

[56] Xilinx, “Security monitor ip core product brief.” Xilinx, Product Brief, 2015.

[57] D. Akhoundov, “2017 erai reported parts analysis.” ”http: //www.erai.com/ERAI Blog/3139/Damir Akhoundov 2017 ERAI Reported Parts Analysis”.

[58] Xilinx, “Ultrascale architecture and product data sheet: Overview.” Xilinx, Datasheet DS890 (v3.10), 2019.

[59] Xilinx, “Xq ultrascale architecture data sheet: Overview.” Xilinx, Datasheet DS895 (v2.0), 2018.

[60] M. M. Alam, M. Tehranipoor, and D. Forte, “Recycled fpga detection using exhaustive lut path delay characterization,” in 2016 IEEE International test conference (ITC), pp. 1–10, IEEE, 2016.

[61] M. M. Tehranipoor, U. Guin, and D. Forte, “Counterfeit integrated circuits,” in Counterfeit Integrated Circuits, pp. 15–36, Springer, 2015.


sources/166/Trimberger and Moore - 2014 - FPGA Security From Features to Capabilities to Tr.pdf

FPGA Security: From Features to Capabilities to Trusted Systems

Steve Trimberger Xilinx

2100 Logic Dr. San Jose, CA 95124 USA


Jason Moore Xilinx

5051 Journal Center Boulevard NE. Albuquerque, NM 87109 USA


ABSTRACT

FPGA devices provide a range of security features which can provide powerful security capabilities. This paper describes many security features included in present-day FPGAs including bitstream authenticated encryption, configuration scrubbing, voltage and temperature sensors and JTAG-intercept. The paper explains the role of these features in providing security capabilities such as privacy, anti-tamper and protection of data handled by the FPGA. The paper concludes with an example of a single-chip cryptographic system, a trusted system built with these components.

Categories and Subject Descriptors B.7.1. Integrated Circuits, Types and Design Styles. FPGA

General Terms Design, Security

Keywords FPGA, Trusted Design, Bitstream Encryption, Cryptography

1. INTRODUCTION

As FPGAs have grown in capability, the value of the applications in the FPGA has grown accordingly. Starting in the early 2000s, SRAM FPGA vendors offered bitstream encryption to protect their customers’ bitstreams from reverse-engineering. The usage of FPGAs has continued to grow into applications such as digital cinema, where the data handled by the FPGAs must be protected as well. Further, attacks on the operating FPGA device have grown in sophistication, leading FPGA vendors to provide additional security features. Today, FPGAs provide a large number of features to support secure configuration and operation.

2. FPGAS AND THE MANUFACTURING FLOW

The FPGA lifecycle includes two design flows: the base array design and the application design (figure 1), and security must be maintained through both[8]. The base array design is a standard integrated circuit development flow controlled by the FPGA manufacturer. The base array is designed using commercial design tools and libraries, manufactured at a foundry and tested. It is then typically sent to another facility for packaging and final test. The resulting base array is shipped to a customer or authorized distributor. The base array design is subject to the same supply chain trust and security concerns as any other integrated circuit, including questions about tampering with tools, supply-chain control and reverse-engineering. Large FPGA manufacturers maintain a close watch on their supply chain, tracking every device through to final customer delivery or destruction. In addition, they audit their suppliers’ systems and processes. As the security issues associated with the design and manufacture of the base array are no different than those of other semiconductor devices, this paper does not focus on the base array design and manufacture, but instead focuses on the security concerns that arise from the need to protect the application design.

The application design also has a design phase, typically performed with FPGA vendors’ tools, but often augmented with commercial EDA tools. The application developer integrates design information from a number of sources into an FPGA application: original and re-used HDL code, libraries from the FPGA vendor and other parties and software for soft and hard microprocessors. The FPGA vendor’s tools compile the application design into a bitstream, the programming of the FPGA base array to realize the application function. As with any design process, the design itself can be carried out in a secure location, with validated IP and tools. Protection of IP during the design phase is no different for FPGAs than it is for ASICs or microprocessors. Therefore, this paper does not address design-phase security.

Figure 1. FPGA lifecycle flows. Left: base array. Right: application

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. DAC '14, June 01 - 05 2014, San Francisco, CA, USA Copyright 2014 ACM 978-1-4503-2730-5/14/06…$15.00. http://dx.doi.org/10.1145/2593069.2602555

3. SECURITY IN CONFIGURATION

A non-volatile FPGA, such as a flash or anti-fuse FPGA, may be programmed before it is shipped. An SRAM FPGA is typically shipped with a separate non-volatile memory containing the programming, and when power is applied, the FPGA loads its programming from the non-volatile memory. This programming step was identified early as a potential security problem.

3.1 Bitstream Encryption

Xilinx introduced bitstream encryption in 2001 in Virtex-II devices to address the problem of cloning, the unauthorized copying of the bitstream as it is loaded into the FPGA from external memory[6][7]. Since that time, other FPGA vendors have added encrypted-bitstream capability.

Preventing unauthorized copy does not strictly require encryption, since the task from a cryptographic point of view is to determine if the bitstream is authorized to operate in the FPGA. This fundamentally requires authentication, not confidentiality: a device could verify a message authentication code on the bitstream. However, a conceptually-simple attack involves reverse-engineering the bitstream and recompilation[8]. Therefore, reverse-engineering must also be prevented, so confidentiality of the bitstream became a requirement for preventing cloning.

3.2 Bitstream Authentication

Encryption protects only the design, not the data handled by the design. Without some way to deter tampering with an encrypted design, one cannot guarantee that an adversary has not compromised the design to the point where he can extract data from the FPGA. The 32-bit data integrity check on the FPGA bitstream is insufficient to address this attack.

Although there have been no reports of such tampering with FPGAs, Xilinx integrated strong authentication in Virtex-6 and 7-series devices to address concerns about targeted tampering with encrypted bitstreams and the inherent cryptographic weaknesses of a CRC intended only for data integrity[9].

Virtex-6 and subsequent Xilinx FPGAs authenticate using the Secure Hash Algorithm (SHA-256) to compute a 256-bit keyed-hash MAC (HMAC)[1][9]. The MAC result cannot be computed without knowing the secret hash key, thereby authenticating the identity of the sender as well as verifying that the message has not been altered. The 256-bit hash size ensures that any tampering with the bitstream will be detected with high probability. HMAC with SHA-256 makes tampering with the bitstream as computationally difficult as guessing the encryption key, which is also 256 bits.
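As a software illustration of the keyed-hash check described above, the sketch below computes and verifies an HMAC-SHA-256 tag with Python's standard `hmac` module. The key and bitstream bytes are placeholders chosen for this example; the on-chip check is implemented in hardware and its bitstream format is not modeled here.

```python
import hashlib
import hmac


def bitstream_tag(key: bytes, bitstream: bytes) -> bytes:
    """Compute a 256-bit HMAC-SHA-256 tag over the bitstream."""
    return hmac.new(key, bitstream, hashlib.sha256).digest()


def bitstream_is_authentic(key: bytes, bitstream: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(bitstream_tag(key, bitstream), tag)


key = bytes(range(32))                      # placeholder 256-bit secret key
bitstream = b"example configuration data"   # placeholder payload
tag = bitstream_tag(key, bitstream)

# Flip a single bit of the bitstream; the stored tag no longer matches.
tampered = bytes([bitstream[0] ^ 0x01]) + bitstream[1:]
```

Verification of `bitstream` against `tag` succeeds, while verification of `tampered` fails. Without the key, an adversary cannot produce a matching tag for a modified bitstream.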

The authentication feature provides resistance to design tampering, which assures the privacy of data inside the FPGA. Privacy of data handled by the FPGA is important in a large number of applications, including digital cinema, network communications and secure database access.

One particularly useful type of data handled by the FPGA is bitstream data. An FPGA with an authenticated encrypted bitstream can reconfigure using the internal configuration access port (ICAP) and still maintain privacy and integrity of design data, basing it on the original bitstream root of trust.

3.3 Configuration Options and Restrictions

Manufacturing tests for SRAM FPGAs require that the configuration data be read back and verified, so this feature is part of the FPGA base array. To prevent theft of the application, readback is disabled when the FPGA is programmed with an encrypted bitstream. Other restrictions include prevention of mixing encrypted and non-encrypted data in a single application, since the non-encrypted application piece might be a Trojan inserted by an adversary. This restriction need only apply to external configuration. A secure application that takes control of its own programming may apply other restrictions on partial configuration, such as restricting the region for the new partial design.

As manufactured, SRAM FPGAs can be programmed with either an encrypted or unencrypted bitstream. Xilinx provides a non-volatile E-fuse that, when programmed, restricts the FPGA to accept only a secured bitstream, preventing a potential adversary from inserting a Trojan design into the system of which the FPGA is a part. Of course, an adversary can still substitute a new, un-programmed FPGA into the system, but this substitution is difficult to carry out in practice.

4. FEATURES FOR AN OPERATING FPGA

Modern FPGAs include security features available to applications operating inside the FPGA. These features are selected by the FPGA application designer and included in the FPGA application design. They make the FPGA application an active agent in device security.

Figure 2. Xilinx 7-series FPGA Secure Bitstream Flows. (Diagram labels: Vivado/ISE, User Design, AES-CBC/HMAC, Secret “Red” AES Key, User Encrypted, Authenticated File, IMPACT SW via JTAG, NV Memory, Fielded System, Secret FPGA/SoC, Un-Encrypted and Authenticated Configuration.)

4.1 Device DNA

Device DNA is a term used by Xilinx to refer to a unique identifier for each FPGA manufactured. Device DNA is programmed into the chip during device manufacture by setting one-time-programmable E-fuses. The Device DNA field is typically 56 or 64 bits long, depending on the FPGA family. It is not secret. Anyone can read the device DNA field. The small size and lack of confidentiality of Device DNA preclude its use as a decryption key. Rather, Device DNA may be used to uniquely identify a specific FPGA device or a range of devices, and restrict the application to function only in those few devices.
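Restricting an application to specific devices, as described for Device DNA, amounts to an allow-list check on the identifier. The sketch below is a software illustration under stated assumptions: the identifier values and ranges are hypothetical, and how the application reads its own Device DNA is not modeled.

```python
from typing import Iterable, Tuple


def application_may_run(device_dna: int,
                        allowed_ids: Iterable[int] = (),
                        allowed_ranges: Iterable[Tuple[int, int]] = ()) -> bool:
    """Return True if the device identifier is on the allow-list."""
    if device_dna in set(allowed_ids):
        return True
    # Each range is an inclusive (low, high) pair of identifiers.
    return any(low <= device_dna <= high for low, high in allowed_ranges)


# Hypothetical 56-bit identifiers, for illustration only.
ALLOWED_IDS = {0x0123456789ABCD}
ALLOWED_RANGES = [(0x10000000000000, 0x100000000000FF)]
```

Because the identifier is public, a check like this binds a design to particular devices but does not by itself keep anything secret, which matches the text's point that Device DNA cannot serve as a decryption key.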

4.2 Physically Unclonable Function (PUF)

A Physically Unclonable Function (PUF) is an identifier derived from physical attributes of a specific manufactured device[2]. Like Device DNA, a PUF can uniquely identify a device. A PUF has advantages of privacy and possibly immutability and tamper-resistance. Typically, PUFs are built from FPGA fabric so they can be built of arbitrary size. A PUF may reside anywhere in the FPGA and be unidentifiable by an adversary. However, PUFs are not stable over the lifetime of an integrated circuit. Therefore, to use a PUF as a decryption key, a significant amount of ECC “helper data” is required to ensure a stable key value. The company Intrinsic-ID used a soft PUF structure to uniquely identify FPGAs for their metered-IP solution[5].
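The role of ECC helper data can be illustrated with a code-offset construction over a simple repetition code. This is a toy sketch, not the scheme used by any vendor or by Intrinsic-ID: the repetition length, the key, and the PUF response bits are all assumptions, and practical designs use stronger codes because a repetition code's helper data leaks information about the response.

```python
from typing import List

REPEAT = 5  # repetition-code length; an assumption for illustration


def encode(key_bits: List[int]) -> List[int]:
    """Repeat each key bit REPEAT times."""
    return [b for b in key_bits for _ in range(REPEAT)]


def enroll(response: List[int], key_bits: List[int]) -> List[int]:
    """Helper data = PUF response XOR codeword (stored, not secret)."""
    return [r ^ c for r, c in zip(response, encode(key_bits))]


def reproduce(noisy_response: List[int], helper: List[int]) -> List[int]:
    """Recover the key from a noisy response by majority vote per block."""
    noisy_codeword = [r ^ h for r, h in zip(noisy_response, helper)]
    blocks = [noisy_codeword[i:i + REPEAT]
              for i in range(0, len(noisy_codeword), REPEAT)]
    return [1 if sum(block) > REPEAT // 2 else 0 for block in blocks]


key = [1, 0, 1, 1]
response = [0, 1, 1, 0, 1,  1, 1, 0, 0, 1,  0, 0, 1, 0, 1,  1, 0, 0, 1, 1]
helper = enroll(response, key)

# A later readout with one flipped bit in each of two blocks.
noisy = list(response)
noisy[3] ^= 1
noisy[12] ^= 1
```

Both `reproduce(response, helper)` and `reproduce(noisy, helper)` return the original key, since each five-bit block tolerates up to two flipped bits.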

4.3 Bitstream Scrubbing

An adversary may attempt to change individual bits in the FPGA’s stored configuration data by focused radiation or power adjustment. Xilinx FPGAs include bitstream “scrubbing” hardware that includes ECC bits for each FPGA configuration data frame. When enabled, scrubbing monitors configuration data and corrects errant bits. Scrubbing has a power cost, so it is not active on all designs. An application may include an enhancement to the standard scrubbing algorithm by building the scrubbing function using the ICAP to access configuration data. Since the error correction is done in the FPGA application, the application developer selects the number of bits to correct and the encoding of the correction data.
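To make the per-frame ECC idea concrete, the sketch below applies a generic Hamming single-error-correcting code to a small frame. The source does not specify the actual frame ECC layout, so the frame size, bit ordering, and function names here are assumptions; the point is only that a nonzero syndrome locates a single flipped bit.

```python
from functools import reduce
from typing import List


def syndrome(frame: List[int]) -> int:
    """XOR of the 1-based positions of all set bits (0 for a valid frame)."""
    return reduce(lambda acc, pos: acc ^ pos,
                  (i + 1 for i, bit in enumerate(frame) if bit), 0)


def encode(data_bits: List[int]) -> List[int]:
    """Place data at non-power-of-two positions, then set the parity bits."""
    frame: List[int] = []
    data = iter(data_bits)
    remaining = len(data_bits)
    pos = 1
    while remaining:
        if pos & (pos - 1) == 0:      # power of two: reserve for parity
            frame.append(0)
        else:
            frame.append(next(data))
            remaining -= 1
        pos += 1
    s = syndrome(frame)
    p = 1
    while p <= len(frame):
        if s & p:                     # set parity so the syndrome becomes 0
            frame[p - 1] = 1
        p <<= 1
    return frame


def scrub(frame: List[int]) -> int:
    """Correct one flipped bit in place; return the syndrome (0 = clean)."""
    s = syndrome(frame)
    if 0 < s <= len(frame):
        frame[s - 1] ^= 1
    return s


golden = encode([1, 0, 1, 1, 0, 0, 1, 0])
upset = list(golden)
upset[5] ^= 1                         # simulate a single-bit upset
```

Running `scrub(upset)` reports position 6 and restores the frame to `golden`. An application-level scrubber built on the ICAP could choose a stronger code to correct more than one bit, as the text notes.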

4.4 Program Intercept

As with any complex system, FPGAs include buffers, caches and other temporary data storage locations that aren’t explicitly cleared when the device is reprogrammed. This lingering temporary data may divulge sensitive information should an adversary interrupt and re-program the FPGA while it is operating. To address this, the FPGA reprogram signal can be intercepted by the operating application. The application can hold off reprogramming while it clears sensitive temporary data or terminates communications.

4.5 JTAG Intercept

JTAG scan chains are useful in debugging, but problematic for security because they provide access to data and functions throughout the FPGA. In secure applications, an adversary must not have access to the JTAG scan chain. Microsemi and Xilinx provide mechanisms to permanently disable the JTAG interface as well as monitor it internally for activity. Activity on a test port such as a scan chain may indicate an attack in progress. Altera restricts the executable JTAG commands in secured applications to a bare minimum.

4.6 Voltage and Temperature Monitors

Xilinx recently added internal monitors on voltage and temperature to its FPGAs. These monitors can be used to identify possible environmental attacks on an operating design.

4.7 Key Clear and Device Clear

When an attack is detected, internal signals allow the operating application to clear key data or the entire programmed configuration.

5. FROM FEATURES TO CAPABILITIES

Encryption, authentication, Device DNA identifiers, PUFs, bitstream scrubbing, and temperature and voltage sensors are all features of a security system. The value, however, lies not in the features themselves but in the security capabilities they provide. These capabilities include prevention of theft of the application design, prevention of tampering with an application before loading, privacy of data handled by the application both before and during operation, and metered IP. Multiple features may be required to provide a capability: depending on the expected attack, authentication alone may not guarantee the privacy of data inside the FPGA. Additional active features may be required. FPGA bitstream privacy and tamper-resistance provide the basis of further FPGA security capabilities for an application. In the FPGA environment, it is incumbent on the designer of the FPGA application to apply those and other features to achieve security capabilities. The application designer decides whether or not to defend against radiation attacks on the programming of the device. If so, the designer may activate bitstream scrubbing. Similarly, it is the application developer who integrates queries of the on-chip temperature and voltage sensors. Further, the application developer decides what results indicate an attack. Finally, the application developer decides what action to take when an attack is detected.
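The division of responsibility described above, where the application developer chooses both the thresholds and the response, can be sketched as a small policy function. Everything in this example is an assumption made for illustration: the limit values are not vendor specifications, and treating a voltage excursion as more severe than a temperature excursion is just one possible policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Action(Enum):
    NONE = "none"
    CLEAR_KEYS = "clear keys"
    CLEAR_DEVICE = "clear device"


@dataclass(frozen=True)
class Limits:
    # All limits are illustrative assumptions, not vendor specifications.
    temp_c: Tuple[float, float] = (0.0, 85.0)
    core_v: Tuple[float, float] = (0.95, 1.05)


def respond(temp_c: float, core_v: float, limits: Limits = Limits()) -> Action:
    """Map sensor readings to the developer's chosen tamper response."""
    temp_ok = limits.temp_c[0] <= temp_c <= limits.temp_c[1]
    volt_ok = limits.core_v[0] <= core_v <= limits.core_v[1]
    if temp_ok and volt_ok:
        return Action.NONE
    if not volt_ok:
        # Assumed policy: a supply excursion triggers the strongest response.
        return Action.CLEAR_DEVICE
    return Action.CLEAR_KEYS
```

With these assumed limits, a reading of 40 °C and 1.00 V yields no action, 120 °C yields a key clear, and 0.70 V yields a full device clear.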

6. SINGLE-CHIP CRYPTOGRAPHY This section gives an overview of a security-sensitive FPGA application called Single-Chip Cryptography (SCC). SCC demands many security capabilities, which are built upon the features discussed in this paper. SCC combines algorithms and data of different levels of secrecy or control in a single device. The device must not only protect programs during loading, it must also defend against attacks from outside and attacks while operating, including leakage of protected information across internal boundaries.

SCC uses the authenticated encryption capability to load a boot loader. The boot loader manages further FPGA configuration, including software for on-chip processors and data handling. Because it was authenticated and encrypted, the boot loader is known to be unaltered by potential adversaries or accidental bit errors. In addition, sensitive data, such as session keys buried in the boot loader, are known to be kept secret. Authenticated encryption permits trust in the boot loader. That trust can be further applied to additional configuration data handled by the boot loader.

The boot loader accepts further partial device configurations through normal FPGA I/O. It authenticates and decrypts using algorithms in the boot loader itself, constructed using FPGA fabric, rather than using the dedicated FPGA functions. This permits the boot loader greater flexibility in choosing algorithms and in key handling. The boot loader loads various isolated regions through the Xilinx Internal Configuration Access Port (ICAP). The configuration data never leaves the FPGA and if it is authenticated by the FPGA boot loader, it is known to be un-tampered.

Figure 3: Notional Floorplan of a design into five Isolated Regions

To ensure no internal leakage of information between regions, SCC implements the fences of the Isolation Design Flow (IDF) [4]. The basic concept is to physically separate critical functions, and functions that are intentionally kept apart, on the FPGA. This can be accomplished through careful floorplanning and the use of unused logic as “fences”. The empty fence regions are wide enough that a single-bit failure in configuration does not connect neighboring regions. This separation assures the confidentiality of sensitive information even in the presence of accidental or intentional attacks on the fences. Figure 3 shows a block diagram of a design that has been floorplanned with IDF, while Figure 4 shows the placed and routed view of the design. Fences are visible as black, unused regions.
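The fence idea can be illustrated with a toy floorplan check. The sketch below models the device as a grid of cells, each either assigned to a named region or left unused (`None`), and reports any pair of cells from different regions that are closer than the required fence width. The grid model, the function name `fence_violations`, and the distance measure are assumptions made for illustration; this is not the vendor's verification flow.

```python
from itertools import product


def fence_violations(floorplan, fence_width=1):
    """Return pairs of cells from different regions with fewer than
    `fence_width` unused cells between them (Chebyshev distance)."""
    rows = len(floorplan)
    cols = len(floorplan[0]) if rows else 0
    offsets = range(-fence_width, fence_width + 1)
    violations = []
    for r, c in product(range(rows), range(cols)):
        region = floorplan[r][c]
        if region is None:
            continue  # unused cell: part of a fence
        for dr, dc in product(offsets, repeat=2):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols):
                continue
            other = floorplan[rr][cc]
            # Report each offending pair once, in sorted order.
            if other is not None and other != region and (r, c) < (rr, cc):
                violations.append(((r, c), (rr, cc)))
    return violations


# Two regions separated by a one-cell fence column.
fenced = [
    ["A", "A", None, "B", "B"],
    ["A", "A", None, "B", "B"],
]
# Two regions placed directly next to each other.
unfenced = [["A", "A", "B", "B"]]
```

With a required width of one cell, `fenced` passes and `unfenced` fails; raising the required width to two cells makes `fenced` fail as well, which mirrors the point that the fence must be wide enough for the fault model being considered.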

In an ideal world, each module would be completely isolated from all others. In practice, some level of communication must exist between isolated regions. Xilinx developed the concept of “Trusted Routing”, a restricted use of the FPGA interconnect through the fences, such that the isolation established by the fences is not compromised. Loaded as part of the boot loader, bitstream scrubbing uses internal readback to continually monitor the configuration data, in particular the isolation fences, so that changes to the configuration are detected and corrected quickly. SCC can even verify the Device DNA of the chip, ensuring operation on the proper individual chip.
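A scrubbing pass of the kind described here can be sketched as a detect-and-correct loop. The model below is an assumption for illustration: configuration memory is represented as a list of byte frames, detection uses a CRC-32 per frame, and correction rewrites the frame from a stored golden copy. The paper does not specify the check or the repair mechanism used on the device.

```python
import zlib


def golden_crcs(golden_frames):
    """Precompute one CRC-32 per golden configuration frame."""
    return [zlib.crc32(frame) for frame in golden_frames]


def scrub_pass(live_frames, golden_frames, crcs):
    """Read back every frame, repair any whose CRC differs from the
    golden CRC, and return the indices of the repaired frames."""
    repaired = []
    for index, frame in enumerate(live_frames):
        if zlib.crc32(frame) != crcs[index]:
            live_frames[index] = bytearray(golden_frames[index])
            repaired.append(index)
    return repaired


# Hypothetical four-frame configuration, eight bytes per frame.
golden = [bytes([i] * 8) for i in range(4)]
live = [bytearray(frame) for frame in golden]
crcs = golden_crcs(golden)

live[2][5] ^= 0x10  # simulate a single-bit upset in frame 2
first_pass = scrub_pass(live, golden, crcs)
second_pass = scrub_pass(live, golden, crcs)
```

The first pass finds and repairs the upset frame, and the second pass finds nothing to do. Running the pass continually is what bounds the time a corrupted fence could remain in place.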

Without loss of security, the boot loader is itself one of the isolated regions in the device. Any attempt to configure the device from an external source triggers the program signal, which is caught internally by the boot loader. The boot loader then initiates a zeroization of the application inside the FPGA before permitting the reprogramming to occur.
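The ordering that matters here, zeroize first and only then permit reprogramming, can be sketched as a small state model. The class and attribute names are hypothetical, and a Python `bytearray` only stands in for on-chip key and configuration storage; real zeroization is performed by device hardware.

```python
class DeviceModel:
    """Toy model of a device whose boot loader intercepts the program signal."""

    def __init__(self, key: bytes, application: bytes):
        self.key = bytearray(key)
        self.application = bytearray(application)
        self.reprogramming_allowed = False

    def zeroize(self) -> None:
        """Overwrite key material and the loaded application in place."""
        for buffer in (self.key, self.application):
            for i in range(len(buffer)):
                buffer[i] = 0

    def on_program_signal(self) -> None:
        """Handle an external configuration attempt."""
        self.reprogramming_allowed = False
        self.zeroize()                     # clear secrets first
        self.reprogramming_allowed = True  # only then release the device


device = DeviceModel(key=b"\x01\x02\x03\x04", application=b"sensitive design")
device.on_program_signal()
```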

Originally conceptualized and developed in cooperation with government authorities for FPGAs [3], the application provides additional value in All-Programmable SoCs such as Zynq. Zynq includes both a programmable logic subsystem (PL) that comprises hundreds of thousands of gates of logic, and a processor subsystem (PS) that includes a multi-core ARM processor, caches, memories and peripherals, connected to one another and to the PL using an AXI bus. The Zynq device boots securely, using authenticated encryption capabilities like those described for FPGAs. Zynq provides asymmetric and symmetric authentication, confidentiality and integrity. Leveraging this root-of-trust, applications can implement crypto-processors or systems performing cryptographic functions in the combination of processor and FPGA with confidence that they have not been compromised.

In Zynq, the processor subsystem (PS) is known to be isolated from the programmable logic (PL). Within the PL, isolated regions ensure spatial separation of sensitive data. Within the PS, known software methods, such as hypervisors and/or ARM TrustZone technology, isolate sensitive software processes from other processes. The trusted boot loader decrypts and authenticates all configuration data and software, potentially using session keys and custom algorithms implemented in the FPGA fabric or the ARM processors.

The spectrum of isolation capabilities is suitable to support applications such as the separation of red and black data processing, key management and other high-reliability functions. Partial Reconfiguration (PR) is further enhanced. The entire Zynq PL can be reconfigured, or even powered down, under control of the PS. Alternatively, portions of the PL can be partially reconfigured for applications that require algorithm agility. Decryption and authentication of partial configuration files can be performed by either the PS or the PL, allowing users the flexibility to choose their own authentication and decryption algorithms, as well as to perform functions such as Authenticate before Decryption to aid in defense against side-channel attacks.
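Authenticate before Decryption means that the decryption key is never applied to data that has not first passed an integrity check. A minimal sketch follows, using HMAC-SHA-256 for the check. Everything in it is an assumption for illustration: the blob layout (nonce, ciphertext, tag), the function names, and in particular the SHA-256-based keystream, which is only a stand-in so that the example runs with the standard library and is not a vetted cipher.

```python
import hashlib
import hmac

TAG_LEN = 32    # HMAC-SHA-256 output size in bytes
NONCE_LEN = 16  # hypothetical nonce size


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in keystream (hash in counter mode); NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def protect(enc_key: bytes, mac_key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Encrypt, then compute the tag over nonce and ciphertext."""
    ciphertext = _xor(plaintext, _keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def load(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Verify the tag first; decrypt only if verification succeeds."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short")
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed; nothing was decrypted")
    nonce, ciphertext = body[:NONCE_LEN], body[NONCE_LEN:]
    return _xor(ciphertext, _keystream(enc_key, nonce, len(ciphertext)))
```

Because `load` rejects a modified blob before the encryption key is used, an attacker cannot feed chosen ciphertexts to the decryption step to observe its behavior, which is the side-channel benefit the text refers to.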

Starting with the root-of-trust, followed by the power and flexibility of both hardware and software, and coupled with the application of isolation technologies and partial reconfiguration, a system that would typically have been developed using multiple devices can now be integrated into a single device with no loss of security.

7. REFERENCES [1] FIPS, “The Keyed-Hash Message Authentication Code (HMAC),” FIPS PUB 198, March 6, 2002. http://csrc.nist.gov/publications/fips/fips198-1/FIPS-198-1_final.pdf

[2] J. Guajardo, et al., “Physical Unclonable Functions and Public-Key Crypto for FPGA IP Protection,” FPL 2007, IEEE.

[3] M. McLean and J. Moore, “FPGA-Based Single Chip Cryptographic Solution,” Military Embedded Systems, 2007. http://www.mil-embedded.com/pdfs/NSA.Mar07.pdf

[4] E. Peterson, “Developing Tamper Resistant Designs with Xilinx Virtex-6 and 7 Series FPGAs,” Xilinx Application Note XAPP1084, Xilinx 2012.

[5] Intrinsic-ID, “Quiddikey-Flex,” http://www.intrinsic-id.com/products/quiddikey-flex, 2013.

[6] A. Telikepalli, “Is Your Design Secure?,” Xcell, Xilinx 2003. http://www.xilinx.com/publications/archives/xcell/Xcell47.pdf

[7] S. Trimberger, “Method and apparatus for protecting proprietary configuration data for programmable logic devices,” US Patent 6654889, 2003.

[8] S. Trimberger, “Trusted Design in FPGAs,” Proceedings of the ACM/IEEE Design Automation Conference, 2007.

[9] S. Trimberger, J. Moore, W. Lu, “Authenticated Encryption of FPGA Bitstreams,” FPGA 2011, ACM.

Figure 4: FPGA Editor view of a SCC design with IDF

<FEFF005500740069006c006900730065007a00200063006500730020006f007000740069006f006e00730020006100660069006e00200064006500200063007200e900650072002000640065007300200064006f00630075006d0065006e00740073002000410064006f006200650020005000440046002000640065007300740069006e00e90073002000e000200049006e007400650072006e00650074002c002000e0002000ea007400720065002000610066006600690063006800e90073002000e00020006c002700e9006300720061006e002000650074002000e0002000ea00740072006500200065006e0076006f007900e9007300200070006100720020006d006500730073006100670065007200690065002e0020004c0065007300200064006f00630075006d0065006e00740073002000500044004600200063007200e900e90073002000700065007500760065006e0074002000ea0074007200650020006f007500760065007200740073002000640061006e00730020004100630072006f006200610074002c002000610069006e00730069002000710075002700410064006f00620065002000520065006100640065007200200035002e0030002000650074002000760065007200730069006f006e007300200075006c007400e90072006900650075007200650073002e> /GRE 
<FEFF03a703c103b703c303b903bc03bf03c003bf03b903ae03c303c403b5002003b103c503c403ad03c2002003c403b903c2002003c103c503b803bc03af03c303b503b903c2002003b303b903b1002003bd03b1002003b403b703bc03b903bf03c503c103b303ae03c303b503c403b5002003ad03b303b303c103b103c603b1002000410064006f006200650020005000440046002003c003bf03c5002003b503af03bd03b103b9002003ba03b103c42019002003b503be03bf03c703ae03bd002003ba03b103c403ac03bb03bb03b703bb03b1002003b303b903b1002003c003b103c103bf03c503c303af03b103c303b7002003c303c403b703bd002003bf03b803cc03bd03b7002c002003b303b903b100200065002d006d00610069006c002c002003ba03b103b9002003b303b903b1002003c403bf0020039403b903b1002d03b403af03ba03c403c503bf002e0020002003a403b10020005000440046002003ad03b303b303c103b103c603b1002003c003bf03c5002003ad03c703b503c403b5002003b403b703bc03b903bf03c503c103b303ae03c303b503b9002003bc03c003bf03c103bf03cd03bd002003bd03b1002003b103bd03bf03b903c703c403bf03cd03bd002003bc03b5002003c403bf0020004100630072006f006200610074002c002003c403bf002000410064006f00620065002000520065006100640065007200200035002e0030002003ba03b103b9002003bc03b503c403b103b303b503bd03ad03c303c403b503c103b503c2002003b503ba03b403cc03c303b503b903c2002e> /HEB 
<FEFF05D405E905EA05DE05E905D5002005D105D405D205D305E805D505EA002005D005DC05D4002005DB05D305D9002005DC05D905E605D505E8002005DE05E105DE05DB05D9002000410064006F006200650020005000440046002005D405DE05D505EA05D005DE05D905DD002005DC05EA05E605D505D205EA002005DE05E105DA002C002005D305D505D005E8002005D005DC05E705D805E805D505E005D9002005D505D405D005D905E005D805E805E005D8002E002005DE05E105DE05DB05D90020005000440046002005E905E005D505E605E805D5002005E005D905EA05E005D905DD002005DC05E405EA05D905D705D4002005D105D005DE05E605E205D505EA0020004100630072006F006200610074002005D5002D00410064006F00620065002000520065006100640065007200200035002E0030002005D505D205E805E105D005D505EA002005DE05EA05E705D305DE05D505EA002005D905D505EA05E8002E002D0033002C002005E205D905D905E005D5002005D105DE05D305E805D905DA002005DC05DE05E905EA05DE05E9002005E905DC0020004100630072006F006200610074002E002005DE05E105DE05DB05D90020005000440046002005E905E005D505E605E805D5002005E005D905EA05E005D905DD002005DC05E405EA05D905D705D4002005D105D005DE05E605E205D505EA0020004100630072006F006200610074002005D5002D00410064006F00620065002000520065006100640065007200200035002E0030002005D505D205E805E105D005D505EA002005DE05EA05E705D305DE05D505EA002005D905D505EA05E8002E> /HRV 
<FEFF005a00610020007300740076006100720061006e006a0065002000500044004600200064006f006b0075006d0065006e0061007400610020006e0061006a0070006f0067006f0064006e0069006a006900680020007a00610020007000720069006b0061007a0020006e00610020007a00610073006c006f006e0075002c00200065002d0070006f0161007400690020006900200049006e007400650072006e0065007400750020006b006f00720069007300740069007400650020006f0076006500200070006f0073007400610076006b0065002e00200020005300740076006f00720065006e0069002000500044004600200064006f006b0075006d0065006e007400690020006d006f006700750020007300650020006f00740076006f00720069007400690020004100630072006f00620061007400200069002000410064006f00620065002000520065006100640065007200200035002e0030002000690020006b00610073006e0069006a0069006d0020007600650072007a0069006a0061006d0061002e> /HUN <FEFF00410020006b00e9007000650072006e00790151006e0020006d00650067006a0065006c0065006e00ed007400e9007300680065007a002c00200065002d006d00610069006c002000fc007a0065006e006500740065006b00620065006e002000e90073002000200049006e007400650072006e006500740065006e0020006800610073007a006e00e1006c00610074006e0061006b0020006c006500670069006e006b00e1006200620020006d0065006700660065006c0065006c0151002000410064006f00620065002000500044004600200064006f006b0075006d0065006e00740075006d006f006b0061007400200065007a0065006b006b0065006c0020006100200062006500e1006c006c00ed007400e10073006f006b006b0061006c0020006b00e90073007a00ed0074006800650074002e0020002000410020006c00e90074007200650068006f007a006f00740074002000500044004600200064006f006b0075006d0065006e00740075006d006f006b00200061007a0020004100630072006f006200610074002000e9007300200061007a002000410064006f00620065002000520065006100640065007200200035002e0030002c0020007600610067007900200061007a002000610074007400f3006c0020006b00e9007301510062006200690020007600650072007a006900f3006b006b0061006c0020006e00790069007400680061007400f3006b0020006d00650067002e> /ITA 
<FEFF005500740069006c0069007a007a006100720065002000710075006500730074006500200069006d0070006f007300740061007a0069006f006e00690020007000650072002000630072006500610072006500200064006f00630075006d0065006e00740069002000410064006f00620065002000500044004600200070006900f9002000610064006100740074006900200070006500720020006c0061002000760069007300750061006c0069007a007a0061007a0069006f006e0065002000730075002000730063006800650072006d006f002c0020006c006100200070006f00730074006100200065006c0065007400740072006f006e0069006300610020006500200049006e007400650072006e00650074002e0020004900200064006f00630075006d0065006e007400690020005000440046002000630072006500610074006900200070006f00730073006f006e006f0020006500730073006500720065002000610070006500720074006900200063006f006e0020004100630072006f00620061007400200065002000410064006f00620065002000520065006100640065007200200035002e003000200065002000760065007200730069006f006e006900200073007500630063006500730073006900760065002e> /JPN <FEFF753b97624e0a3067306e8868793a3001307e305f306f96fb5b5030e130fc30eb308430a430f330bf30fc30cd30c330c87d4c7531306790014fe13059308b305f3081306e002000410064006f0062006500200050004400460020658766f8306e4f5c6210306b9069305730663044307e305930023053306e8a2d5b9a30674f5c62103055308c305f0020005000440046002030d530a130a430eb306f3001004100630072006f0062006100740020304a30883073002000410064006f00620065002000520065006100640065007200200035002e003000204ee5964d3067958b304f30533068304c3067304d307e305930023053306e8a2d5b9a3067306f30d530a930f330c8306e57cb30818fbc307f3092884c306a308f305a300130d530a130a430eb30b530a430ba306f67005c0f9650306b306a308a307e30593002> /KOR 
<FEFFc7740020c124c815c7440020c0acc6a9d558c5ec0020d654ba740020d45cc2dc002c0020c804c7900020ba54c77c002c0020c778d130b137c5d00020ac00c7a50020c801d569d55c002000410064006f0062006500200050004400460020bb38c11cb97c0020c791c131d569b2c8b2e4002e0020c774b807ac8c0020c791c131b41c00200050004400460020bb38c11cb2940020004100630072006f0062006100740020bc0f002000410064006f00620065002000520065006100640065007200200035002e00300020c774c0c1c5d0c11c0020c5f40020c2180020c788c2b5b2c8b2e4002e> /LTH <FEFF004e006100750064006f006b0069007400650020016100690075006f007300200070006100720061006d006500740072007500730020006e006f0072011700640061006d00690020006b0075007200740069002000410064006f00620065002000500044004600200064006f006b0075006d0065006e007400750073002c0020006b00750072006900650020006c0061006200690061007500730069006100690020007000720069007400610069006b00790074006900200072006f006400790074006900200065006b00720061006e0065002c00200065006c002e002000700061016100740075006900200061007200200069006e007400650072006e0065007400750069002e0020002000530075006b0075007200740069002000500044004600200064006f006b0075006d0065006e007400610069002000670061006c006900200062016b007400690020006100740069006400610072006f006d00690020004100630072006f006200610074002000690072002000410064006f00620065002000520065006100640065007200200035002e0030002000610072002000760117006c00650073006e0117006d00690073002000760065007200730069006a006f006d00690073002e> /LVI 
<FEFF0049007a006d0061006e0074006f006a00690065007400200161006f00730020006900650073007400610074012b006a0075006d00750073002c0020006c0061006900200076006500690064006f00740075002000410064006f00620065002000500044004600200064006f006b0075006d0065006e007400750073002c0020006b006100730020006900720020012b00700061016100690020007000690065006d01130072006f007400690020007201010064012b01610061006e0061006900200065006b00720101006e0101002c00200065002d00700061007300740061006d00200075006e00200069006e007400650072006e006500740061006d002e00200049007a0076006500690064006f006a006900650074002000500044004600200064006f006b0075006d0065006e007400750073002c0020006b006f002000760061007200200061007400760113007200740020006100720020004100630072006f00620061007400200075006e002000410064006f00620065002000520065006100640065007200200035002e0030002c0020006b0101002000610072012b00200074006f0020006a00610075006e0101006b0101006d002000760065007200730069006a0101006d002e> /NLD (Gebruik deze instellingen om Adobe PDF-documenten te maken die zijn geoptimaliseerd voor weergave op een beeldscherm, e-mail en internet. De gemaakte PDF-documenten kunnen worden geopend met Acrobat en Adobe Reader 5.0 en hoger.) 
/NOR <FEFF004200720075006b00200064006900730073006500200069006e006e007300740069006c006c0069006e00670065006e0065002000740069006c002000e50020006f0070007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e00740065007200200073006f006d00200065007200200062006500730074002000650067006e0065007400200066006f007200200073006b006a00650072006d007600690073006e0069006e0067002c00200065002d0070006f007300740020006f006700200049006e007400650072006e006500740074002e0020005000440046002d0064006f006b0075006d0065006e00740065006e00650020006b0061006e002000e50070006e00650073002000690020004100630072006f00620061007400200065006c006c00650072002000410064006f00620065002000520065006100640065007200200035002e003000200065006c006c00650072002000730065006e006500720065002e> /POL <FEFF0055007300740061007700690065006e0069006100200064006f002000740077006f0072007a0065006e0069006100200064006f006b0075006d0065006e007400f300770020005000440046002000700072007a0065007a006e00610063007a006f006e00790063006800200064006f002000770079015b0077006900650074006c0061006e006900610020006e006100200065006b00720061006e00690065002c0020007700790073007901420061006e0069006100200070006f0063007a0074010500200065006c0065006b00740072006f006e00690063007a006e01050020006f00720061007a00200064006c006100200069006e007400650072006e006500740075002e002000200044006f006b0075006d0065006e0074007900200050004400460020006d006f017c006e00610020006f007400770069006500720061010700200077002000700072006f006700720061006d006900650020004100630072006f00620061007400200069002000410064006f00620065002000520065006100640065007200200035002e0030002000690020006e006f00770073007a0079006d002e> /PTB 
<FEFF005500740069006c0069007a006500200065007300730061007300200063006f006e00660069006700750072006100e700f50065007300200064006500200066006f0072006d00610020006100200063007200690061007200200064006f00630075006d0065006e0074006f0073002000410064006f0062006500200050004400460020006d00610069007300200061006400650071007500610064006f00730020007000610072006100200065007800690062006900e700e3006f0020006e0061002000740065006c0061002c0020007000610072006100200065002d006d00610069006c007300200065002000700061007200610020006100200049006e007400650072006e00650074002e0020004f007300200064006f00630075006d0065006e0074006f00730020005000440046002000630072006900610064006f007300200070006f00640065006d0020007300650072002000610062006500720074006f007300200063006f006d0020006f0020004100630072006f006200610074002000650020006f002000410064006f00620065002000520065006100640065007200200035002e0030002000650020007600650072007300f50065007300200070006f00730074006500720069006f007200650073002e> /RUM <FEFF005500740069006c0069007a00610163006900200061006300650073007400650020007300650074010300720069002000700065006e007400720075002000610020006300720065006100200064006f00630075006d0065006e00740065002000410064006f006200650020005000440046002000610064006500630076006100740065002000700065006e0074007200750020006100660069015f006100720065006100200070006500200065006300720061006e002c0020007400720069006d0069007400650072006500610020007000720069006e00200065002d006d00610069006c0020015f0069002000700065006e00740072007500200049006e007400650072006e00650074002e002000200044006f00630075006d0065006e00740065006c00650020005000440046002000630072006500610074006500200070006f00740020006600690020006400650073006300680069007300650020006300750020004100630072006f006200610074002c002000410064006f00620065002000520065006100640065007200200035002e00300020015f00690020007600650072007300690075006e0069006c006500200075006c0074006500720069006f006100720065002e> /RUS 
<FEFF04180441043f043e043b044c04370443043904420435002004340430043d043d044b04350020043d0430044104420440043e0439043a043800200434043b044f00200441043e043704340430043d0438044f00200434043e043a0443043c0435043d0442043e0432002000410064006f006200650020005000440046002c0020043c0430043a04410438043c0430043b044c043d043e0020043f043e04340445043e0434044f04490438044500200434043b044f0020044d043a04400430043d043d043e0433043e0020043f0440043e0441043c043e044204400430002c0020043f0435044004350441044b043b043a04380020043f043e0020044d043b0435043a04420440043e043d043d043e04390020043f043e044704420435002004380020044004300437043c043504490435043d0438044f0020043200200418043d044204350440043d043504420435002e002000200421043e043704340430043d043d044b04350020005000440046002d0434043e043a0443043c0435043d0442044b0020043c043e0436043d043e0020043e0442043a0440044b043204300442044c002004410020043f043e043c043e0449044c044e0020004100630072006f00620061007400200438002000410064006f00620065002000520065006100640065007200200035002e00300020043800200431043e043b043504350020043f043e04370434043d043804450020043204350440044104380439002e> /SKY 
<FEFF0054006900650074006f0020006e006100730074006100760065006e0069006100200070006f0075017e0069007400650020006e00610020007600790074007600e100720061006e0069006500200064006f006b0075006d0065006e0074006f0076002000410064006f006200650020005000440046002c0020006b0074006f007200e90020007300610020006e0061006a006c0065007001610069006500200068006f0064006900610020006e00610020007a006f006200720061007a006f00760061006e006900650020006e00610020006f006200720061007a006f0076006b0065002c00200070006f007300690065006c0061006e0069006500200065002d006d00610069006c006f006d002000610020006e006100200049006e007400650072006e00650074002e00200056007900740076006f00720065006e00e900200064006f006b0075006d0065006e007400790020005000440046002000620075006400650020006d006f017e006e00e90020006f00740076006f00720069016500200076002000700072006f006700720061006d006f006300680020004100630072006f00620061007400200061002000410064006f00620065002000520065006100640065007200200035002e0030002000610020006e006f0076016100ed00630068002e> /SLV <FEFF005400650020006e006100730074006100760069007400760065002000750070006f0072006100620069007400650020007a00610020007500730074007600610072006a0061006e006a006500200064006f006b0075006d0065006e0074006f0076002000410064006f006200650020005000440046002c0020006b006900200073006f0020006e0061006a007000720069006d00650072006e0065006a016100690020007a00610020007000720069006b0061007a0020006e00610020007a00610073006c006f006e0075002c00200065002d0070006f01610074006f00200069006e00200069006e007400650072006e00650074002e00200020005500730074007600610072006a0065006e006500200064006f006b0075006d0065006e0074006500200050004400460020006a00650020006d006f0067006f010d00650020006f0064007000720065007400690020007a0020004100630072006f00620061007400200069006e002000410064006f00620065002000520065006100640065007200200035002e003000200069006e0020006e006f00760065006a01610069006d002e> /SUO 
<FEFF004b00e40079007400e40020006e00e40069007400e4002000610073006500740075006b007300690061002c0020006b0075006e0020006c0075006f00740020006c00e400680069006e006e00e40020006e00e40079007400f60073007400e40020006c0075006b0065006d0069007300650065006e002c0020007300e40068006b00f60070006f0073007400690069006e0020006a006100200049006e007400650072006e0065007400690069006e0020007400610072006b006f006900740065007400740075006a0061002000410064006f0062006500200050004400460020002d0064006f006b0075006d0065006e007400740065006a0061002e0020004c0075006f0064007500740020005000440046002d0064006f006b0075006d0065006e00740069007400200076006f0069006400610061006e0020006100760061007400610020004100630072006f0062006100740069006c006c00610020006a0061002000410064006f00620065002000520065006100640065007200200035002e0030003a006c006c00610020006a006100200075007500640065006d006d0069006c006c0061002e> /SVE <FEFF0041006e007600e4006e00640020006400650020006800e4007200200069006e0073007400e4006c006c006e0069006e006700610072006e00610020006f006d002000640075002000760069006c006c00200073006b006100700061002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400200073006f006d002000e400720020006c00e4006d0070006c0069006700610020006600f6007200200061007400740020007600690073006100730020007000e500200073006b00e40072006d002c0020006900200065002d0070006f007300740020006f006300680020007000e500200049006e007400650072006e00650074002e002000200053006b006100700061006400650020005000440046002d0064006f006b0075006d0065006e00740020006b0061006e002000f600700070006e00610073002000690020004100630072006f0062006100740020006f00630068002000410064006f00620065002000520065006100640065007200200035002e00300020006f00630068002000730065006e006100720065002e> /TUR 
<FEFF0045006b00720061006e002000fc0073007400fc0020006700f6007200fc006e00fc006d00fc002c00200065002d0070006f00730074006100200076006500200069006e007400650072006e006500740020006900e70069006e00200065006e00200075007900670075006e002000410064006f006200650020005000440046002000620065006c00670065006c0065007200690020006f006c0075015f007400750072006d0061006b0020006900e70069006e00200062007500200061007900610072006c0061007201310020006b0075006c006c0061006e0131006e002e00200020004f006c0075015f0074007500720075006c0061006e0020005000440046002000620065006c00670065006c0065007200690020004100630072006f0062006100740020007600650020004100630072006f006200610074002000520065006100640065007200200035002e003000200076006500200073006f006e0072006100730131006e00640061006b00690020007300fc007200fc006d006c00650072006c00650020006100e70131006c006100620069006c00690072002e> /UKR <FEFF04120438043a043e0440043804410442043e043204430439044204350020044604560020043f043004400430043c043504420440043800200434043b044f0020044104420432043e04400435043d043d044f00200434043e043a0443043c0435043d044204560432002000410064006f006200650020005000440046002c0020044f043a0456043d04300439043a04400430044904350020043f045604340445043e0434044f0442044c00200434043b044f0020043f0435044004350433043b044f043404430020043700200435043a04400430043d044300200442043000200406043d044204350440043d043504420443002e00200020042104420432043e04400435043d045600200434043e043a0443043c0435043d0442043800200050004400460020043c043e0436043d04300020043204560434043a0440043804420438002004430020004100630072006f006200610074002004420430002000410064006f00620065002000520065006100640065007200200035002e0030002004300431043e0020043f04560437043d04560448043e04570020043204350440044104560457002e> /ENU (Use these settings to create Adobe PDF documents best suited for on-screen display, e-mail, and the Internet. Created PDF documents can be opened with Acrobat and Adobe Reader 5.0 and later.) 
>> /Namespace [ (Adobe) (Common) (1.0) ] /OtherNamespaces [ << /AsReaderSpreads false /CropImagesToFrames true /ErrorControl /WarnAndContinue /FlattenerIgnoreSpreadOverrides false /IncludeGuidesGrids false /IncludeNonPrinting false /IncludeSlug false /Namespace [ (Adobe) (InDesign) (4.0) ] /OmitPlacedBitmaps false /OmitPlacedEPS false /OmitPlacedPDF false /SimulateOverprint /Legacy >> << /AddBleedMarks false /AddColorBars false /AddCropMarks false /AddPageInfo false /AddRegMarks false /ConvertColors /ConvertToRGB /DestinationProfileName (sRGB IEC61966-2.1) /DestinationProfileSelector /UseName /Downsample16BitImages true /FlattenerPreset << /PresetSelector /MediumResolution >> /FormElements false /GenerateStructure false /IncludeBookmarks false /IncludeHyperlinks false /IncludeInteractive false /IncludeLayers false /IncludeProfiles true /MultimediaHandling /UseObjectSettings /Namespace [ (Adobe) (CreativeSuite) (2.0) ] /PDFXOutputIntentProfileSelector /NA /PreserveEditing false /UntaggedCMYKHandling /UseDocumentProfile /UntaggedRGBHandling /UseDocumentProfile /UseDocumentBleed false >> ] >> setdistillerparams << /HWResolution [600 600] /PageSize [612.000 792.000] >> setpagedevice
