Recommending Privacy Settings for Internet-of-Things

A Dissertation Presented to

the Graduate School of Clemson University

In Partial Fulfillment of the Requirements for the Degree

Doctor of Philosophy, Computer Science

by Yang He

December 2019

Accepted by:

Dr. Bart P. Knijnenburg, Committee Chair

Dr. Larry F. Hodges

Dr. Alexander Herzog

Dr. Ilaria Torre

ProQuest Number: 27667387

Abstract

Privacy concerns have been identified as an important barrier to the growth of IoT. These concerns are exacerbated by the complexity of manually setting privacy preferences for numerous different IoT devices. Hence, there is a demand to answer the following urgent research question: How can we help users simplify the task of managing privacy settings for IoT devices in a user-friendly manner so that they can make good privacy decisions?

To solve this problem in the IoT domain, a more fundamental understanding of the logic behind IoT users’ privacy decisions in different IoT contexts is needed. We therefore conducted a series of studies to contextualize IoT users’ decision-making characteristics and designed a set of privacy-setting interfaces to help them manage their privacy settings in various IoT contexts based on this deeper understanding of users’ privacy decision behaviors.

In this dissertation, we first present three studies on recommending privacy settings for different IoT environments, namely general/public IoT, household IoT, and fitness IoT. We developed and utilized a “data-driven” approach in these three studies: we first use statistical analysis and machine learning techniques on the collected user data to gain insight into IoT users’ privacy decision behavior and then create a set of “smart” privacy defaults/profiles based on these insights. Finally, we design a set of interfaces to incorporate these privacy defaults/profiles. Users can apply these smart defaults/profiles either with a single click or by answering a few related questions. The biggest limitation of these three studies is that the proposed interfaces have not been tested, so we do not know what level of complexity (both in terms of the user interface and in terms of the profiles) is most suitable. Thus, in the last study, we address this limitation by conducting a user study to evaluate the new interfaces for recommending privacy settings to household IoT users. The results show that our proposed user interfaces for setting household IoT privacy settings can improve users’ satisfaction. Our research can benefit IoT users, manufacturers, researchers, privacy-setting interface designers, and anyone who wants to adopt IoT devices by providing interfaces that put their most prominent concerns at the forefront and that make it easier to choose settings that match their preferences.

Dedication

To

my wife Mandy, my parents, and my friends

in recognition of their undying support!

Acknowledgments

First and foremost, I would like to express immense gratitude to my advisor, Dr. Bart Knijnenburg, for his support, patience, and understanding during the past four years. Second, I would like to thank my committee members, Dr. Larry Hodges, Dr. Alexander Herzog, and Dr. Ilaria Torre, for the help and suggestions they provided on my research. Third, I want to thank Paritosh, Reza, and Byron for their friendship and inspiration. Last but not least, I want to thank my parents and my wife Mandy, who have supported me throughout this long journey.

Financial support was provided by Clemson University, NSF grants 1640664 and CNS-1126344, and a gift from Samsung Research America.

Outline

Abstract

Dedication

Acknowledgments

List of Tables

List of Figures

1 Introduction

2 IoT technology and IoT Acceptance
    2.1 IoT Technology
    2.2 Model the Acceptance of IoT
    2.3 Summary

3 Privacy setting technologies in IoT
    3.1 Privacy Preference
    3.2 Privacy in IoT
    3.3 Existing Privacy Setting Models
    3.4 Privacy-Setting Interfaces
    3.5 Privacy Prediction
    3.6 Summary

4 Recommending Privacy Settings for General/Public IoT
    4.1 Introduction
    4.2 Dataset and design
    4.3 Statistical Analysis
    4.4 Predicting users’ behaviors (original work)
    4.5 Privacy shortcuts (original work)
    4.6 Discussion and Limitations
    4.7 Summary

5 Recommending Privacy Settings for Household IoT
    5.1 Introduction
    5.2 Experiment Setup
    5.3 Statistical Analysis
    5.4 Privacy-Setting Prototype Design
    5.5 Predicting users’ behaviors (original work)
    5.6 Privacy-Setting Prototype Design Using Machine Learning Results (original work)
    5.7 Limitations
    5.8 Summary

6 Recommending Privacy Settings for Fitness IoT
    6.1 Introduction
    6.2 Data Model
    6.3 Dataset
    6.4 Predicting users’ Preference (partial original work)
    6.5 Profile Prediction (partial original work)
    6.6 Privacy-setting Recommendations (partial original work)
    6.7 Validation
    6.8 Summary

7 Evaluate the Household IoT Privacy-setting Profiles and User Interfaces
    7.1 Introduction
    7.2 Study Design
    7.3 Experimental setup
    7.4 Results
    7.5 Discussion

8 Conclusion

Bibliography

Appendices

List of Tables

4.1 Parameters used in the experiment
4.2 Comparison of clustering approaches
4.3 Confusion matrix for the overall prediction
4.4 Drill down of the Overall Prediction tree for ‘who’ = ‘Own device’

5.1 Parameters used to construct the information-sharing scenarios.
5.2 Comparison of clustering approaches (highest parsimony and highest accuracy)
5.3 Confusion matrix for the One Rule prediction
5.4 Confusion matrix for the overall prediction

7.1 Factor Items in Trimmed CFA Model

A1 Table of Accuracies.

List of Figures

2.1 The factors that affecting users’ adoptions of IoT found in our study
2.2 Trust Chain

4.1 From Left, Screen 1 shows three default settings, Screen 2,3 and 4 shows layered interface
4.2 The Overall Prediction decision tree. Further drill down for ‘who’ = ‘Own device’ is provided in Table 4.4
4.3 Attitude-based clustering: 2-cluster tree.
4.4 Attitude-based clustering: 3-cluster tree
4.5 The Flow Chart for Fit-based Clustering
4.6 Fit-based clustering: 3-cluster tree. Further drill down is hidden for space reasons.
4.7 Accuracy of our clustering approaches
4.8 Two types of profile choice interfaces

5.1 Example of one of the thirteen scenarios presented to the participants.
5.2 Attention check questions asked to participants
5.3 Transcript of video shown to participants if they failed attention checks.
5.4 Attention check question shown while participants are answering questions per scenario.
5.5 Attention check question asked to participants.
5.6 Privacy-Setting Interfaces Prototype
5.7 A “smart default” setting based on the “One Rule” algorithm.
5.8 A “smart default” setting with 264 nodes with 63.76% accuracy.
5.9 Accuracy and parsimony (tree size) of the smart default change as a function of Confidence Factor
5.10 Parsimony/accuracy comparison for Naive, One Rule, and Overall Prediction
5.11 A “smart default” setting with only 8 nodes 63.32% accuracy.
5.12 Different tests conducted for mediation analysis
5.13 Parsimony/accuracy comparison for attitude-based clustering
5.14 The most parsimonious 2-profile attitude-based solution.
5.15 A 3-profile solution example of attitude-based clustering.
5.16 Parsimony/accuracy comparison for agglomerative clustering
5.17 The best 4-profile agglomerative clustering solution.
5.18 The best 5-profile agglomerative clustering solution.
5.19 The best 6-profile agglomerative clustering solution.
5.20 Parsimony/accuracy comparison for fit-based clustering
5.21 The most parsimonious 3-profile fit-based solution.
5.22 The most parsimonious 4-profile fit-based solution.
5.23 The most parsimonious 5-profile fit-based solution.
5.24 Summary of All our Approaches
5.25 A good 5-profile fit-based clustering solution.
5.26 Design for 5-Profile solution presented in Section 5.6.1.
5.27 Design for 5-Profile solution presented in Section 5.6.2.

6.1 Comparison of permissions asked by Fitness Trackers and the fitness IoT Data Model used for this study.
6.2 Interface examples of Smartphone Permissions requests for Fitbit trackers (S set)
6.3 Interface example of In-App Permissions requests in Fitbit Android App (A set)
6.4 Average values of each privacy permissions (1-allow, 0-deny).
6.5 Evaluation of different numbers of clusters for each set.
6.6 Privacy profiles from the two clustering methods: 1-cluster results (full data) and 2-clusters results (privacy subprofiles) for each dataset (allow=1, deny=0, except for frequency & retention)
6.7 The permission drivers for the privacy subprofiles and their respective prediction accuracies.
6.8 The attitude drivers for the privacy subprofiles and their respective prediction accuracies.
6.9 The social behavior drivers for the privacy subprofiles and their respective prediction accuracies.
6.10 The user negotiability drivers for the privacy subprofiles and their respective prediction accuracies.
6.11 Tree evaluation. Root mean square error for each J48 tree algorithm.
6.12 Manual settings
6.13 Smart Single settings.
6.14 Interaction for picking a subprofile for the S set.
6.15 Direct Prediction questions.
6.16 Indirect Prediction questions.
6.17 Average accuracies of the recommender strategies on the holdout 30 users.

7.1 CFA Saturated Model.
7.2 Trimmed CFA Model.
7.3 Factor Correlation Matrix (on the diagonal is the sqrt(AVE)).
7.4 Preliminary SEM Model with perceived privacy threats
7.5 Preliminary SEM Model with effect from manipulations to perceived control
7.6 Trimmed structural equation model. *p < .05, **p < .01, ***p < .001.
7.7 Effects of profile complexity on perceived helpfulness
7.8 Total Effects of profile complexity on perceived control
7.9 Total Effects of profile complexity on satisfaction
7.10 Total Effects of profile complexity on trust
7.11 Total Effects of profile complexity on usefulness
7.12 Effect size of average time spent on different UI pages

A1 Attention Check Question of Evaluating privacy-setting UI for Household IoT
A2 Instructions on how to use the UIs
A3 User Interface 1 with all settings turned off
A4 User Interface 2 with all settings turned off
A5 User Interface 1 with all settings turned on
A6 User Interface 2 with all settings turned on
A7 User Interface 1 with Smart Default
A8 User Interface 2 with Smart Default
A9 User Interface 1 with Smart Profiles
A10 User Interface 2 with Smart Profiles

Chapter 1

Introduction

During the last two decades, computers have evolved into all kinds of small-footprint, internet-connected devices that are capable of: 1) tracking us as we move about the built environment, such as public spaces, offices, schools, and universities; 2) being embedded in household appliances such as smartphones, TVs, refrigerators, light fixtures, and thermostats to create ‘smart home’ environments; and 3) tracking our personal data daily as we wear them, such as smart watches and fitness trackers. All these devices have been integrated seamlessly into people’s lives, a development referred to as the “Internet of Things” (IoT). By using all kinds of wireless sensor technologies (e.g., RFID, cameras, microphones, GPS, and accelerometers) and artificial intelligence, these internet-connected devices are able to gain knowledge of their surroundings and their users, exchange data with each other, monitor and control other devices remotely, and further interact with third parties to provide us with better personalized services, recommendations, and advertisements. They have been widely used in many fields, such as tracking, transportation, household usage, healthcare, and fitness [75, 107, 60, 58, 47].

A wide range of well-respected organizations have estimated that IoT will grow rapidly and offer huge social and economic potential. For example, Gartner [29] has predicted that over 21 billion IoT devices will be in use by 2020 and that IoT product and service suppliers will generate incremental revenue exceeding $300 billion. IDC forecasts that the global market for IoT will grow from $1.9 trillion in 2013 to $7.1 trillion in 2020 [96]. However, the rise of IoT also comes with a number of key security and privacy concerns. These include the facilitated collection of large amounts of consumer data [128], processing and storing the data in ways unexpected by the consumer [81], and privacy and security breaches [81, 139].

IoT devices are intended to collect information from their users to realize their functionalities. Technical solutions can be used to minimize the data collected for such functionality [67, 92, 122], but arguably, any useful functionality would necessitate at least some amount of personal data. Therefore, users will have to manage a trade-off between privacy and functionality: a solution that is fully privacy-preserving will be limited in functionality, while a fully functional IoT solution would demand extensive data collection and sharing with others. Research has shown that users employ a “privacy calculus”, i.e., they make disclosure decisions by trading off the anticipated benefits against the risks of disclosure [23, 70, 110]. However, as the diversity of IoT devices increases, it becomes increasingly difficult to keep up with the many different ways in which data about ourselves is collected and disseminated. Although users generally care about their privacy, few of them in practice find time to carefully read the privacy policies or the privacy settings that are provided to them [28, 42]. For example, one study found that 59% of users say they have read privacy notices, while 91% thought it important to post privacy notices [28]. Tuunainen et al. [115] find that only 27% of participants are aware that, according to its privacy policy, Facebook can share their information with people or organisations outside of Facebook for marketing purposes.
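
The privacy calculus mentioned above can be illustrated with a minimal sketch. This is not a model used in this dissertation: the class `DisclosureScenario`, the function names, the `risk_weight` parameter, and all scores are hypothetical and serve only to show the benefit-versus-risk trade-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisclosureScenario:
    """One hypothetical data-sharing request made by an IoT device."""
    description: str
    anticipated_benefit: float  # assumed subjective score in [0, 1]
    perceived_risk: float       # assumed subjective score in [0, 1]


def should_disclose(scenario: DisclosureScenario, risk_weight: float = 1.0) -> bool:
    """Disclose only if the benefit outweighs the weighted risk.

    risk_weight is a hypothetical per-user parameter: values above 1
    model a more privacy-concerned user.
    """
    return scenario.anticipated_benefit > risk_weight * scenario.perceived_risk


# Illustrative scenarios with made-up scores (not data from any study).
SCENARIOS = [
    DisclosureScenario("thermostat shares temperature with manufacturer", 0.8, 0.3),
    DisclosureScenario("camera shares video with advertisers", 0.2, 0.9),
]


def decide_all(risk_weight: float = 1.0) -> dict:
    """Map each scenario description to an allow (True) / deny (False) decision."""
    return {s.description: should_disclose(s, risk_weight) for s in SCENARIOS}
```

Even in this toy form, a user would have to supply two judgments per scenario, which hints at why the effort grows quickly as the number of devices and settings increases.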

There are several reasons for this problem: i) Users pay more attention to the benefits than to the potential risks of using IoT devices or services [36]. ii) The privacy policies are too long, or the privacy settings of such devices are too complicated, so users become too irritated to finish reading or configuring them [85]. iii) As the number of IoT devices rapidly increases, the number of privacy settings and options across all these devices will also increase exponentially. Moreover, each device will have its own fine-grained privacy settings (often hidden deep within an unassuming “other settings” category in the settings interface), and many inter-dependencies exist between devices, both in privacy and in functionality. Therefore, there is a large chance that users will make inconsistent privacy decisions that either limit the functionality of their IoT devices or fail to protect their privacy in the end. In addition, the current user interfaces for setting privacy preferences are imperfect even on a smartphone, not to mention the complexity of manually setting privacy preferences for numerous other IoT devices. Hence, there is an urgent demand to solve the following research question:

Can we simplify the task of managing privacy settings for users in different IoT contexts?

Prior research (Chapter 2) has explored different approaches to this problem in other domains, including providing 1) transparency and control [30, 1, 61, 13, 17], and 2) privacy nudges [5, 79, 37]. However, neither of them provides a satisfying solution in the IoT domain. Providing transparency and control does give users the freedom to manage their privacy in IoT according to their own privacy decisions, but privacy decision making is often not rational [61]. Thus, such extra transparency and control may make it more difficult for users to choose appropriate privacy settings. Privacy nudges are usually implemented in the form of prompts, which would create constant noise given that IoT systems usually work in the background. At the same time, they are not personalized to the inherent diversity of users’ privacy preferences and the context-dependency of their decisions.

To solve these problems in the IoT domain, a more fundamental understanding of the logic behind IoT users’ privacy decisions in different IoT contexts is needed. I therefore conducted a series of studies to contextualize IoT users’ decision-making characteristics, and designed a set of privacy-setting interfaces to help them manage their privacy settings in various IoT contexts based on this deeper understanding of users’ privacy decision behavior.

In this dissertation, I first present the background and related work in Chapters 2 and 3. Then, I present three studies on recommending privacy settings for different IoT environments, namely general/public IoT in Chapter 4, household IoT in Chapter 5, and fitness IoT in Chapter 6. Note that these three studies follow a decreasing order in terms of the scope of the IoT context. In the first study, I focused on privacy decisions regarding the entities collecting information from the users, while in the following two studies the context was narrowed to a more specific environment (household IoT and fitness IoT), which shifts the focus to a more contextual evaluation of the content or nature of the information. This explains why in the first two studies, the dimensions used to analyze the context are the parameters of the corresponding IoT scenarios, while in the third study, the focus is on the fitness tracker permission questions. All three studies used a “data-driven design”: we first use statistical analysis (applicable to the first two studies) and machine learning techniques on the collected user data to gain insight into IoT users’ privacy decision behavior, and then create a set of “smart” privacy defaults/profiles based on these insights. Finally, we design a set of interfaces that incorporate these privacy defaults/profiles. Users can apply these smart defaults/profiles either with a single click (applicable to the first two studies) or by answering a few related questions (applicable to the third study). To test the presented interfaces and uncover what level of complexity (both in terms of the user interface and in terms of the profiles) is most suitable, I conducted a user study to evaluate the new interfaces for recommending privacy settings for household IoT, presented in Chapter 7. Finally, I conclude this dissertation with contributions and limitations in Chapter 8.

Chapter 2

IoT technology and IoT Acceptance

In this chapter, we first discuss how the Internet of Things has entered people’s daily lives, how people benefit from using IoT, what disadvantages IoT has brought, and the aspects that current IoT research has focused on. We then look at the factors that affect potential IoT users when they consider adopting this new technology.

When users consider adopting new IoT devices, they want to reap the benefits of these devices by sharing and disclosing certain personal information to get a more personalized experience [41]. However, such disclosed information could be accessed by other smart devices owned by themselves, other people, organizations, the government, or third parties with good or bad intentions, which brings privacy risks to the users [83]. Thus, we attempt to obtain a clear understanding of the IoT acceptance model before our further research, for the following reasons: 1) The factors that affect users in the adoption phase are likely to also affect them in the actual usage phase, which could help us understand how IoT users make privacy decisions when they share their personal data in different IoT contexts; 2) These factors may further affect how we design the user interfaces for setting privacy preferences and how we recommend privacy settings for different IoT contexts; 3) These factors can potentially help us develop the scales to evaluate the interfaces that we design, and to build a theoretical model.

2.1 IoT Technology

The term “Internet of Things” (IoT) was first introduced by Kevin Ashton in the context of supply chain management in 1999 [7]. Atzori et al. define IoT as a pervasive presence around us of a variety of things or objects, such as Radio-Frequency IDentification (RFID) tags, sensors, actuators, mobile phones, etc., which, through a unique addressing scheme, are able to interact with each other and cooperate with their neighbors to reach common goals [8]. As various wireless sensor technologies (e.g., RFID, embedded sensors, and actuator nodes) and artificial intelligence have advanced rapidly during the last two decades, the definition of IoT has evolved to cover, more broadly, a wide range of monitoring and control applications based on a network of sensing and actuating devices that are controlled remotely through the Internet in many fields, such as tracking, transportation, household usage, healthcare, and fitness [75, 107, 60, 58, 47].

IoT can benefit both organizations and individual consumers in all of the above-mentioned domains by enhancing data collection, enabling real-time response, improving access to and control of internet-connected devices, and increasing efficiency, productivity, and satisfaction [128, 95]. Given this huge social and economic potential, a wide range of well-respected organizations estimate that IoT will grow rapidly. However, there exist several key security and privacy concerns associated with the rise of the IoT, including data processing and storage, and privacy and security breaches [128, 81, 139].

Previous studies have mostly focused on the technical issues of IoT technologies [33, 71, 104]. For example, Uckelmann et al. systematically explained the architecture of the future IoT [116]. Chen et al. presented a vision of IoT’s applications in China [21]. Guinard et al. described the IoT’s best practices based on web technologies and proposed several prototypes using web principles, which connect environmental sensor nodes, energy monitoring systems, and RFID-tagged objects to the web [45]. However, little attention has been devoted to understanding, from the perspective of individual consumers, how users of IoT trade off the above-mentioned benefits and privacy concerns of IoT technology when they consider adopting it [4, 39, 86].

Furthermore, researchers have identified security and privacy issues as the major challenges for consumer acceptance of user-oriented IoT applications [83]. Arguably, if users find that their privacy demands cannot be satisfied when using IoT devices after adopting them, they will probably eventually give up using these devices.

2.2 Model the Acceptance of IoT

In this section, we first discuss the original Technology Acceptance Model and the adapted Unified Theory of Acceptance and Use of Technology (UTAUT) model. Then we look at the factors that affect potential IoT users’ adoption of IoT systems.

2.2.1 Technology Acceptance Model

The Technology Acceptance Model (TAM) is arguably the most popular model that explains

how users come to accept and use a technology [26]. TAM suggests that an individual’s

Behavioral Intention to Use an information technology is significantly dependent upon the individual’s

Perceived Usefulness and Perceived Ease Of Use of that information technology. Specifically,

perceived usefulness is the extent to which an individual believes that using a particular information

technology will have a positive impact on his/her performance. Perceived ease of use is the extent

to which an individual perceives that using a particular information technology will be free of effort.

TAM also proposes that perceived ease of use can explain the variance in perceived usefulness.

TAM has been applied to a wide range of technology adoption contexts [134], such as the adoption

of PCs [120], smartphones [90], mobile marketing [12], Internet banking [94], Facebook [74, 98], and

online shopping [40].
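
The relationships that TAM posits can be summarized, as a rough sketch, in the form of two linear structural equations. This notation is an assumption made here for illustration and does not appear in the cited work in this form: PU, PEOU, and BI stand for perceived usefulness, perceived ease of use, and behavioral intention to use, while the coefficients and error terms are placeholders that would have to be estimated from data.

```latex
\begin{align}
  PU &= \gamma \, PEOU + \zeta \\
  BI &= \beta_1 \, PU + \beta_2 \, PEOU + \varepsilon
\end{align}
```

Under this reading, perceived ease of use influences behavioral intention both directly (through $\beta_2$) and indirectly through perceived usefulness (through $\gamma \beta_1$).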

2.2.2 UTAUT

The unified theory of acceptance and use of technology (UTAUT) is a technology acceptance

model proposed by Venkatesh et al. [121]. Compared to TAM, UTAUT identifies four key factors:

1) performance expectancy, 2) effort expectancy, 3) social influence, and 4) facilitating conditions,

related to predicting behavioral intention to use a technology and actual technology use primarily in

organizational contexts. The first three factors are theorized and found to influence behavioral inten-

tion to use a technology, while behavioral intention and facilitating conditions determine technology

use. UTAUT also identifies four moderators (i.e., age, gender, experience, and voluntariness).

UTAUT has been applied or extended in many contexts, such as electronic learning [125],

e-government [127], and cloud computing [77]. UTAUT builds on previous models of technology

adoption and is designed specifically to investigate users’ acceptance of a new technology.

Thus, it has higher explanatory power than previous models.

2.2.3 The Acceptance of IoT

Researchers have attempted to identify the factors that affect the acceptance of IoT by

customers. Acquity Group [44] conducted a user study investigating customers’ concerns about

adopting the IoT. Based on a survey of more than 2000 US-based customers, they find that awareness of the

technology, usefulness, price (cost), security, and privacy are the main concerns of the customers. In [39],

Gao and Bai present a user study (N=368) to investigate the factors that affect the acceptance of

IoT in China. They used the factors of TAM (i.e. perceived ease of use and perceived usefulness)

along with other factors such as trust, social influence, perceived enjoyment, and perceived behavioral

control. Their results show that perceived usefulness, perceived ease of use, social influence, perceived

enjoyment, and perceived behavioral control have a significant effect on users’ behavioral intention to

use the IoT. In [73], Lee and Shin develop and test factors determining user acceptance of IoT

services by extending the current UTAUT model to include an extra hindering condition to explain

users’ dual attitudes, such as technical anxiety.

2.2.4 A preliminary study (original work)

We also conducted a preliminary/pilot study on the Clemson University campus (N=15) with

the aim of investigating the various factors that affect the adoption of IoT by interviewing potential

IoT users. The interviews were approximately 30-50 minutes in length and covered a wide range

of open questions related to IoT. These questions asked participants about their personal preferences

regarding the technology and their self-perceived tech savviness. In this study, the conversations with

our participants were recorded only after obtaining their consent. This study was approved by the IRB.

The entire recorded conversation was then transcribed manually. We then extracted keywords from

participants’ statements during the interview, such as “privacy” or “ease of use”. These keywords

were then grouped using card sorting and affinity diagram techniques. We then used a grounded

approach to create a theory, which is shown in Figure 2.1.

The results were similar to the findings of the aforementioned work [39, 4]. However, in our study,

we noticed an interesting phenomenon that no literature has mentioned: once trust in the

manufacturers is established, it can propagate from the manufacturers to third parties that users are

not aware of or do not even know about in the first place. We define this phenomenon as Trust Chain.

An example of Trust Chain from our interview is:

Figure 2.1: The factors affecting users’ adoption of IoT found in our study (IoT Adoption: Affordability, Convenience, Control, Privacy, Trust, Affect, Ease of use)

I: “Would you be alright if the manufacturer of those products collect your data and share

with other organizations and provide more specific recommendation to you? Will you be OK with

that?”

P: “I think I can be OK with that. Because the data this company collected are most time

just shared or transferred to other companies who can analyze these data and get some information

from these data.”

I: “Any company or any organization?”

P: “I think most are the manufacturers that I trust.”

I: “So you are OK with them to share your data?”

P: “Yes, I trust them.”

As shown in Figure 2.2, a Trust Chain is established mostly because of the users’ trust in the

manufacturers (i.e. the brand of the devices), and can arguably be categorized as an

emotional behavior because users do not have a clear view of the benefits and risks when they

choose to trust the third parties that manufacturers choose to share their data with. Such benefits

and risks have been defined as abstract benefits and risks. One of our ongoing research projects

has shown that, in the IoT domain, users are more likely to perceive concrete benefits and abstract risks,

resulting in this emotional behavior. Such behavior could harm their privacy and

security. To act more rationally, users should investigate the third parties that will handle their

personal data. We therefore suggest that manufacturers and IoT privacy designers provide users with

transparency and control over which third parties their data will be shared with, to reduce the risks

of insecure Trust Chain sharing behavior.

Figure 2.2: Trust Chain

Knowing the factors that affect users’ decisions to adopt IoT devices will help us

develop the scales for evaluating our designed privacy-setting interfaces in our proposed work (Chap-

ter 7). Based on the insights gained from this study, we encourage the designers of IoT privacy-setting

interfaces to face the difficult challenge of maximizing the usability and the privacy control of the

user interface while minimizing the privacy threats to the users, making IoT more acceptable.

2.3 Summary

In this chapter, we have noted the following points: 1) IoT has grown rapidly in use with

the advancement of RFID and other wireless sensor technologies. 2) IoT has brought convenience and

enjoyment to our daily lives. 3) Privacy concern is an important factor that affects users’ decisions

when adopting IoT. 4) The acceptance of IoT has still not been systematically examined.

In the next chapter, we discuss the causes of privacy issues in IoT, how effective

existing privacy control schemes are, and the work that aims to help users protect their privacy more

effectively.

Chapter 3

Privacy setting technologies in IoT

In Chapter 2, we discussed the development of IoT and its acceptance model. As IoT systems

gain popularity and bring privacy issues at the same time, it is urgent to study the cause of these

privacy issues. By doing this, we can improve our design of IoT applications to protect IoT users’

privacy, and make IoT more acceptable.

3.1 Privacy Preference

Researchers have attempted to examine users’ privacy preferences in different areas, such

as Social Networks and mobile applications. Research has shown people differ extensively in their

privacy settings [88], but can be clustered into groups [6, 65]. In [131, 62], Facebook users are

found to belong to one of 6 types of privacy profiles which range from Privacy Maximizers to

Minimalists. In the health/fitness domain, emerging sensors and mobile applications allow people to

easily capture fine-grained personal data related to long term fitness goals. Brar and Kay discover

that users’ preferences change for every fitness/health index. Weight was found to be the most

sensitive index [16].

At the same time, users are found to have difficulties managing their privacy settings with

current privacy-setting schemes. Liu et al. use an online survey (N=200) to investigate the difference

between the desired privacy settings and the actual privacy settings of Facebook users. Their results

show that 63% of the privacy settings for photo sharing did not match the users’ desired settings.

In [82], Madejski et al. conduct user studies to find the difference between Facebook users’ sharing

intentions and their actual privacy settings. Their results show that there is at least one violation

in the privacy settings for each of the 65 participants.

The reasons for the failure of existing privacy-setting schemes are diverse. One reason

is that the increasing number of privacy rules makes manual privacy configuration excessively

challenging for normal users [38]. Knijnenburg et al. discover that people’s information disclosure

behaviors vary along multiple dimensions [65]. People can be classified along these dimensions

into groups with different “disclosure styles”. This result suggests that we could classify users

into their respective privacy groups and adapt their privacy practices to the disclosure style of

this group to satisfy different types of information and users. On the other hand, more

privacy policies would lead to more decision-making and more burden for users. Note that in the

IoT environment, the number of different IoT devices could be vast, which could potentially make

choosing adequate privacy settings a very challenging task that is likely to result in information and

choice overload [130]. Therefore, in this thesis, we use a data-driven approach

to discover suitable smart privacy profiles, which are generated from the results of both

statistical analysis and machine learning techniques, for users with different “disclosure styles”.

3.2 Privacy in IoT

IoT systems are capable of providing highly personalized services to their users [118, 31, 41].

Henka et al. [49] propose an approach to personalize services in household IoT using the Global Public

Inclusive Infrastructure’s [119] preference set to describe an individual’s needs and preferences,

and then adapting a smart environment accordingly. Russell et al. [100] use unobtrusive sensors

and a micro-controller to realize human detection to provide personalization in a household IoT

environment.

Researchers have shown that privacy plays a limiting role in users’ adoption of personalized

services [111]. For example, Awad and Krishnan [9] show that privacy concerns inhibit users’ use

of personalized services, and Sutanto et al. [108] demonstrated that privacy concerns can prevent

people from using a potentially beneficial personalized application. Kobsa et al. [68] demonstrate

that the personalization provider is an important determinant of users’ privacy concerns.

Moreover, research has shown users’ willingness to provide personal information to person-

alized services depends on both the risks and benefits of disclosure [93, 50, 52], and researchers

therefore claim that both the benefits and the risks should meet a certain threshold [113], or that

they should be in balance [20].

The argument that using user-generated data for personalization can result in privacy con-

cerns has also been made in IoT environments [135, 39, 4]. One of the first examples in this regard

was the work by Sheng et al. [105], who showed that users of “u-commerce” services (IoT-driven

mobile shopping) felt less inclined to use personalized (rather than non-personalized) u-commerce

services, unless the benefits were overwhelming (i.e., providing help in an emergency).

In response, researchers have proposed frameworks with guidelines for evaluating the security

and privacy of consumer IoT applications, devices, and platforms [91, 80]. Most of these guidelines

are focused on minimizing data acquisition, storage, and collection sources. Along these guidelines,

several researchers have proposed architectures that restrict unwanted access to users’ data by IoT

devices. For example, Davies et al. propose “privacy mediators” to the data distribution pipeline

that would be responsible for data redaction and enforcement of privacy policies even before the

data is released from the user’s direct control [24]. Likewise, Jayraman et al.’s privacy preserving

architecture aggregates requested data to preserve user privacy [56].

Other research has considered IoT privacy from the end-user perspective [35], both when

it comes to research (e.g., Ur et al. investigated how privacy perceptions differ among teens and

their parents in smart security systems installed in homes [117]) and design (e.g., Williams et al.

highlight the importance of designing interfaces to manage privacy such that they are usable to the

end users of IoT devices [130], and Feth et al. investigated the creation of understandable and usable

controls [35]). We followed this line of work and developed a novel data-driven approach to developing

usable and efficient privacy-setting interfaces for several different IoT contexts.

3.3 Existing Privacy Setting Models

Previous studies in smartphone privacy have shown that the current smartphone privacy

interfaces lack the potential to provide the necessary user privacy information or control for both

Android and iOS systems [78]. Several solutions have been proposed to improve mobile privacy

protection and offer users more privacy control [34, 14]. These have led to rapid improvements in the privacy

management of current mobile systems, providing more control over the user’s privacy settings.

The Android system mainly uses the Ask On Install (AOI) and Ask On First Use (AOFU) models for

privacy settings [114, 129]. In the AOI model, the smartphone permissions are asked in bulk before

installing a new app. The user’s only option is to allow or deny all, which clearly gives less privacy

control. Also, only a few users read or pay attention to the privacy settings when installing

the app, and even fewer users can understand their meaning [34, 59]. Several third-party apps have

been developed to cope with this problem, such as Turtleguard [114] and Mockdroid [14]. In the

AOFU model [114], permissions are only asked during the first use of an app or when some function

of the app is demanding a specific permission of the smartphone. In this case, the user will trade

off their privacy (data sharing) against the functionality of the app. Users can also revisit and review

permissions in their phone privacy settings for each app. This model makes users more informed

and gives them more control compared to AOI [37].
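
The difference between the two permission models can be illustrated with a minimal sketch. It is not the Android implementation: the classes `App`, `AskOnInstall`, and `AskOnFirstUse`, as well as the `ask` callback that stands in for the user's answer to a prompt, are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    requested: tuple  # permissions the app declares (hypothetical)


class AskOnInstall:
    """AOI: all permissions are decided in bulk, before installation."""

    def __init__(self):
        self.granted = {}

    def install(self, app, user_accepts_all):
        if not user_accepts_all:
            return False  # denying the bundle means the app is not installed
        self.granted[app.name] = set(app.requested)
        return True

    def use(self, app, permission):
        return permission in self.granted.get(app.name, set())


class AskOnFirstUse:
    """AOFU: each permission is asked once, when it is first needed."""

    def __init__(self, ask):
        self.ask = ask        # callback(app_name, permission) -> bool
        self.decisions = {}   # (app name, permission) -> allow?

    def use(self, app, permission):
        key = (app.name, permission)
        if key not in self.decisions:
            self.decisions[key] = self.ask(app.name, permission)
        return self.decisions[key]

    def review(self, app, permission, allow):
        # Users can revisit an earlier decision in the settings screen.
        self.decisions[(app.name, permission)] = allow
```

In this sketch, AOI leaves the user with a single all-or-nothing decision, whereas AOFU produces one decision per permission and lets the user change it later, which mirrors the greater control described above.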

A few privacy management solutions were developed to simplify the task of controlling

personal data for smartphone users. For instance, ipShield [18] is a context-aware privacy framework

for mobile systems that provides users with great control of their data and inference risks. My Data

Store [123] offers a set of tools to manage, control and exploit personal data by enhancing an

individual’s awareness of the value of their data. Similarly, Databox [19] enables individuals to

coordinate the collection of their personal data, and make those data available for specific purposes.

However, these data managers do not include user privacy profiling and recommendation in the

complex IoT environment. Privacy can also be protected by providing different anonymity levels

of data that are given to third parties. However, it might not be possible to implement the most

effective privacy standards such as data obfuscation due to numerous trade-offs and restrictions,

especially in the health care and fitness domain.

In the smartphone domain, active privacy nudging is an effective scheme to increase users’

awareness [5]. Privacy nudging allows users to be informed about both their privacy settings and

how third party applications access their data [79, 37]. In a study by Liu et al., 78.7% [79] of the

nudges were adopted by smartphone users. However, such active nudges are problematic for IoT,

because IoT devices are supposed to operate in the background. Moreover, as the penetration of

IoT devices in our homes continues to increase, nudging would become a constant noise which users

would soon start to ignore, like software EULAs [43] or privacy policies [57]. In addition, privacy

nudges lack personalization and provide only general recommendations.

3.4 Privacy-Setting Interfaces

Beyond prompts, one can regulate privacy with global settings. The most basic privacy-

setting interface is the traditional “access control matrix”, which allows users to indicate which

entity gets to access what type of information [102]. This approach can be further simplified by

grouping recipients into relevant semantic categories, such as Google+’s circles [126]. Taking a step

further, Raber et al. [97] proposed Privacy Wedges to manipulate privacy settings. Privacy Wedges

allow users to make privacy decisions using a combination of semantic categorization (the various

wedges) and inter-personal distance (the position of a person on the wedge). Users can decide who

gets to see various posts or personal information by “coloring” parts of each wedge.
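
As a minimal sketch of the first two ideas above, an access control matrix can be represented as a nested mapping from recipients to information types, and semantic grouping as a helper that sets a whole group of recipients at once. The recipient names, information types, groups, and function names below are all hypothetical and chosen only for illustration.

```python
def build_matrix(recipients, info_types, default=False):
    """One cell per (recipient, information type) pair."""
    return {r: {t: default for t in info_types} for r in recipients}


def apply_group_setting(matrix, groups, group, info_type, allow):
    """Set one information type for every recipient in a semantic group."""
    for recipient in groups[group]:
        matrix[recipient][info_type] = allow


# Hypothetical recipients, information types, and groups.
recipients = ["alice", "bob", "carol"]
info_types = ["posts", "photos", "location"]
groups = {"family": ["alice"], "colleagues": ["bob", "carol"]}

matrix = build_matrix(recipients, info_types)
apply_group_setting(matrix, groups, "colleagues", "posts", True)
apply_group_setting(matrix, groups, "family", "photos", True)

# Without grouping the user faces one decision per cell.
cells = sum(len(row) for row in matrix.values())
```

Even in this toy setting the matrix has 3 x 3 = 9 cells, while the grouped interface required only two decisions; the gap widens as recipients and information types are added.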

Privacy wedges have been tested on limited numbers of friends, and in the case of household

IoT they are likely to be insufficient, due to the complexity of the decision space. To wit, IoT

privacy decisions involve a large selection of devices, each with various sensors that collect data for

a range of different purposes. This makes it complicated to design an interface that covers every

possible setting [130]. A wedge-based interface will arguably not be able to succinctly represent such

complexity, and therefore either be impossible, or still lead to a significant amount of information

and choice overload.

We used a data-driven approach to solve this problem: statistical analysis informs the

construction of a layered settings interface, while machine learning-based privacy prediction helps

us find smart privacy profiles.

3.5 Privacy Prediction

Several researchers have proposed privacy prediction as a solution to the privacy settings

complexity problem—an approach known as “user-tailored privacy” (UTP) [63]. Systems that imple-

ment UTP first predict users’ privacy preferences and behaviors based on their known characteristics.

They then use these predictions to provide automatic default settings or suggestions in line with

users’ disclosure profiles, to educate users about privacy features they are unaware of, to tailor the

privacy-setting user interfaces to make it easier for users to engage with their preferred privacy

management tools, or to selectively restrict the types of personalization a system is allowed to engage

in.

Most existing work in line with this approach has focused on providing automatic default

settings. For example, Sadeh et al. [101] used a k-nearest neighbor algorithm and a random forest

algorithm to predict users’ privacy preferences in a location-sharing system, based on the type of

recipient and the time and location of the request. They demonstrated that users had difficulties

setting their privacy preferences, and that the applied machine learning techniques can help users to

choose more accurate disclosure preferences. Similarly, Pallapa et al. [89] present a system which can

determine the required privacy level in new situations based on the history of interaction between

users. Their system can efficiently deal with the rise of privacy concerns and help users in a pervasive

system full of dynamic interactions.
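
To make the idea of predicting a disclosure decision from past decisions concrete, the following is a minimal k-nearest-neighbor sketch. It is not the system of Sadeh et al. [101]: the features (recipient type, time, location), the history, and the function name `knn_predict` are invented here for illustration, and only the k-nearest-neighbor part is shown.

```python
from collections import Counter


def knn_predict(history, query, k=3):
    """Predict a decision by majority vote among the k most similar past requests."""

    def distance(a, b):
        # Number of categorical features on which two requests differ.
        return sum(x != y for x, y in zip(a, b))

    nearest = sorted(history, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy history of (recipient type, time, location) -> decision; invented data.
history = [
    (("friend", "day", "campus"), "share"),
    (("friend", "night", "home"), "share"),
    (("family", "night", "home"), "share"),
    (("stranger", "day", "campus"), "deny"),
    (("stranger", "night", "home"), "deny"),
    (("colleague", "night", "home"), "deny"),
]

print(knn_predict(history, ("friend", "night", "campus")))
print(knn_predict(history, ("stranger", "day", "home")))
```

With this toy history, the first request is predicted as "share" and the second as "deny". Ties in distance are resolved by the order of the history, which is a simplification of this sketch.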

Dong et al. [27] use binary classification algorithms to give users personalized advice

regarding their privacy decision-making practices on online social networks. They found that J48

decision trees provided the best results. Li et al. [76] similarly use J48 to demonstrate that

taking the user’s cultural background into account when making privacy predictions improves the

prediction accuracy. Our data stems from a culturally homogeneous population (U.S. Mechanical

Turk workers), so cultural variables are outside the scope of our study. We do however follow these

previous works in using J48 decision trees in our prediction approach.
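
J48 is the Weka implementation of the C4.5 decision tree algorithm, which selects the attribute to split on using the gain ratio. The sketch below shows only that split criterion on a handful of invented allow/reject decisions; it is not our prediction pipeline, and the rows, attribute names, and function names are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attr, target="decision"):
    """Information gain of splitting on attr, normalized by the split information."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Invented toy decisions; not taken from any dataset used in this work.
rows = [
    {"who": "Own device", "persistence": "Once", "decision": "allow"},
    {"who": "Own device", "persistence": "Continuously", "decision": "allow"},
    {"who": "Government", "persistence": "Once", "decision": "reject"},
    {"who": "Government", "persistence": "Continuously", "decision": "reject"},
    {"who": "Friend", "persistence": "Once", "decision": "allow"},
    {"who": "Friend", "persistence": "Continuously", "decision": "reject"},
]

best = max(["who", "persistence"], key=lambda a: gain_ratio(rows, a))
print(best)
```

On these toy rows the root split falls on "who", because knowing the recipient removes more uncertainty about the decision than knowing the persistence does.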

We further extend this approach using clustering to find several smart default policies

(“profiles”). This is in line with Fang et al. [32], who present an active learning algorithm that

comes up with privacy profiles for users in real time. Since our approach is based on an existing

dataset, our algorithm does not classify users in real time, but instead creates a static set of profiles

‘offline’, from which users can subsequently choose. This avoids cold start problems, and does not

rely on the availability of continuous real-time behaviors. This is beneficial for household IoT privacy

settings, because users often specify their settings in these systems in a “single shot”, leaving the

settings interface alone afterwards.

Ravichandran et al. [99] employ an approach similar to ours, using k-means clustering on

users’ contextualized location sharing decisions to come up with several default policies. They

showed that a small number of policies could accurately reflect a large part of the location sharing

preferences.
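
The step from individual decisions to a small set of default policies can be sketched with a plain k-means implementation. This is an illustrative sketch under simplifying assumptions, not the procedure of Ravichandran et al. [99] or our own: decisions are encoded as invented 0/1 vectors (1 = allow), the centroids are initialized deterministically from the first k users, and `to_policy` is a hypothetical helper that rounds a centroid into a default policy.

```python
def nearest(vector, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(vector, centroids[c])),
    )


def kmeans(vectors, k, max_iter=50):
    # Deterministic initialization from the first k vectors (a simplification;
    # practical runs typically use several random restarts).
    centroids = [[float(x) for x in v] for v in vectors[:k]]
    for _ in range(max_iter):
        assignment = [nearest(v, centroids) for v in vectors]
        updated = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, assignment


def to_policy(centroid):
    """Round a centroid into an allow (1) / reject (0) default per context."""
    return [1 if x >= 0.5 else 0 for x in centroid]


# Invented decisions of six users over four sharing contexts.
users = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]

centroids, assignment = kmeans(users, k=2)
policies = [to_policy(c) for c in centroids]
print(assignment, policies)
```

On this toy data the six users fall into two groups, whose rounded centroids yield an "allow everything" policy and a "reject everything" policy; each user deviates from the assigned policy in at most one context.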

In this dissertation, we extend this clustering approach to find the best profiles based on

various novel clustering approaches, and take the additional step of designing user interfaces that

incorporate the best solutions for different IoT contexts.

3.6 Summary

In this chapter, we have noted the following points: 1) Existing research has shown that people

differ extensively in their privacy settings, but can be grouped. 2) People have difficulty managing

privacy settings using current privacy-setting schemes. 3) Privacy prediction based on

machine learning algorithms can help design a new privacy-setting interface that simplifies the

task of managing privacy settings for users. It is possible that we can leverage the user diversity

and use the predictive approach to develop privacy-setting interfaces for the users. Thus, in the

next chapter, we present how we design for privacy in the general/public IoT context in a

data-driven manner, along with the contributions and the limitations of our work.

Chapter 4

Recommending Privacy Settings for General/Public IoT

4.1 Introduction

In Chapters 2 and 3, we discussed the benefits and risks of IoT technology, the

key factors affecting users’ adoption of IoT systems/devices, and the privacy risks caused by inappropriate

privacy disclosure. We also discussed that people’s information disclosure behaviors vary along

multiple dimensions [65], which enables us to classify users into their respective privacy groups and

adapt the privacy practices to their disclosure styles.

Current privacy control schemes make it difficult for IoT users to manually configure their

privacy settings. In this chapter, we develop a data-driven approach to solve this problem: statistical

analysis is used to inform the construction of a layered settings interface, while machine

learning-based algorithms are used to predict people’s privacy decisions and help us find the most suitable

smart privacy profiles. By using this data-driven approach, we developed a set of “smart

defaults/profiles”, which help users configure their IoT privacy settings with a single

click. Note that the novelty of our work does not lie in the statistical analysis or the machine learning

techniques that we used; the real contribution is that we use these techniques (a data-driven

approach) to directly influence the design of privacy-setting interfaces. Arguably, there is no

existing similar work in this area.

In this chapter, we intend to answer the following questions:

• Q1: Is our data-driven approach useful in the general/public IoT context?

• Q2: What results can be achieved by using our approach?

By general/public IoT, we are referring to those IoT devices deployed in public space,

outside people’s homes, where people have little control over the data collection practices of the

devices. These IoT devices can be operated by many entities (e.g., government, employers, friends,

colleagues) to collect data with or without people’s awareness. Different types of data, such

as photos, videos, or locations, can be collected to track people or provide convenience,

health-related or social-related information/advice. Due to the high uncertainty and low controllability,

general/public IoT can cause various privacy risks and concerns. The demand to get notified about

surrounding public IoT devices or to be able to control these devices is urgent. Researchers at Intel

are working on a framework that allows people to be notified about surrounding IoT devices collecting

personal information, and to control these collection practices [22], but no suitable interface has been

developed for this system yet.

Developing a usable privacy-setting interface for IoT to simplify users’ task of managing

privacy settings seems promising. However, developing such an interface would commonly require

user studies with existing systems. Since the Intel control framework [22] has not been implemented

yet, this method is not possible. We therefore propose to develop user interface designs for managing

the privacy settings of general/public IoT devices using a data-driven design approach: rather than

evaluating and incrementally improving an existing interface, we leveraged data collected by Lee and

Kobsa [72], who gathered users’ feedback on general/public IoT scenarios, before developing the

interface. This approach allows us to create a navigational structure that preemptively maximizes

users’ efficiency in expressing their privacy preferences. Moreover, it allows us to anticipate the

most common privacy settings, and capture them into a series of ‘privacy profiles’ that allow users

to express a complex set of privacy preferences with the single click of a button.

In this chapter, we first discuss the dataset that we use, then we present how we apply our

data-driven approach in the general/public IoT context, including the inspection of users’ behaviors

using statistical analyses and prediction of users’ behaviors using machine learning techniques. Fi-

nally, we present the privacy-setting prototypes that we create based on both statistical and machine

learning results.

4.2 Dataset and design

In the data collected by Lee and Kobsa [72], 200 participants were asked about their inten-

tion to allow or reject the IoT features presented in 14 randomized scenarios. These scenarios are

manipulated in a mixed fractional factorial design along the following dimensions: ‘Who’, ‘What’,

‘Where’, ‘Reason’, and ‘Persistence’ (See Table 4.1). A total of 2800 scenarios were presented to 200

participants (100 male, 99 female, 1 undisclosed) through Amazon Mechanical Turk. Four partici-

pants were aged between 18 and 20, 75 aged 20–30, 68 aged 30–40, 31 aged 40–50, 20 aged 50–60,

and 2 aged > 60.
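
The figures in this paragraph can be checked with a few lines of arithmetic. The level counts below are taken from Table 4.1; the variable names are ours, and the size of the full factorial space is computed here only for orientation, since the study used a fractional design.

```python
from math import prod

# Number of levels per scenario parameter, counted from Table 4.1.
levels = {"Who": 7, "What": 24, "Where": 4, "Reason": 6, "Persistence": 2}

# Size of the full factorial space that the fractional design samples from.
full_factorial = prod(levels.values())

# Scenario presentations: 200 participants with 14 scenarios each.
participants = 200
scenarios_per_participant = 14
presented = participants * scenarios_per_participant

# Reported age groups; their sizes should add up to the number of participants.
age_groups = {"18-20": 4, "20-30": 75, "30-40": 68, "40-50": 31, "50-60": 20, ">60": 2}

print(full_factorial)            # 8064
print(presented)                 # 2800
print(sum(age_groups.values()))  # 200
```

The 2800 presentations are therefore drawn from a space of 8064 possible parameter combinations, which is one reason why the later analyses rely on the parameters rather than on individual scenarios.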

For every scenario, participants were asked a total of 9 questions. Our study focuses on

the allow/reject question: “If you had a choice to allow/reject this, what would you choose?”,

with options “I would allow it” and “I would reject it”. We also used participants’ answers to three

attitudinal questions regarding the scenario:

• Risk: How risky or safe is this situation? (7pt scale from “very risky” to “very safe”)

• Comfort: How comfortable or uncomfortable do you feel about this situation? (7pt scale)

• Appropriateness: How appropriate do you consider this situation? (7pt scale)

We use this dataset in two phases. In our first phase, we develop a “layered” settings

interface, where users make a decision on a less granular level (e.g., whether a certain recipient is

allowed to collect their personal information or not), and only move to a more granular decision

(e.g., what types of information this recipient is allowed to collect) when they desire more detailed

control. This reduces the complexity of the decisions users have to make, without reducing the

amount of control available to them. We use statistical analysis of the Lee and Kobsa dataset to

decide which aspect should be presented at the highest layer of our IoT privacy-setting interface,

and which aspects are relegated to subsequently lower layers.
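
The fallback logic of such a layered interface can be sketched as follows. The sketch assumes, as in the example above, that the recipient forms the top layer and the type of information the layer below it; the class `LayeredSettings` and its method names are hypothetical and do not describe the final prototype.

```python
class LayeredSettings:
    """Coarse decisions per recipient, with optional finer overrides per data type."""

    def __init__(self, default_allow=False):
        self.default_allow = default_allow
        self.by_recipient = {}  # top layer: recipient -> allow?
        self.by_data_type = {}  # lower layer: (recipient, data type) -> allow?

    def set_recipient(self, who, allow):
        self.by_recipient[who] = allow

    def set_data_type(self, who, what, allow):
        self.by_data_type[(who, what)] = allow

    def is_allowed(self, who, what):
        # The most specific decision wins; otherwise fall back one layer.
        if (who, what) in self.by_data_type:
            return self.by_data_type[(who, what)]
        return self.by_recipient.get(who, self.default_allow)

    def decisions_made(self):
        return len(self.by_recipient) + len(self.by_data_type)


settings = LayeredSettings()
settings.set_recipient("Employer", True)             # coarse decision
settings.set_data_type("Employer", "Video", False)   # finer override
```

Here two explicit decisions determine the outcome for every combination of recipient and data type: the employer may collect everything except video, and all other recipients fall back to the default.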

In our second phase, we develop a “smart” default setting, which preempts the need for

many users to manually change their settings [106]. However, since people differ extensively in their

privacy preferences [88], it is not possible to achieve an optimal default that is the same for everyone.

Instead, different people may require different settings. Outside the field of IoT, researchers have

been able to establish distinct clusters or “profiles” based on user behavioral data [65, 88, 132]. We

perform machine learning analysis on this dataset to create a similar set of “smart profiles” for our

general IoT privacy-setting interface.

Table 4.1: Parameters used in the experiment

Parameter: Who (the entity collecting the data)

Levels: 1. Unknown 2. Colleague 3. Friend 4. Own device 5. Business 6. Employer 7. Government

Parameter: What (the type of data collected and, optionally, the knowledge extracted from this data)

Levels: 1. PhoneID 2. PhoneID>identity 3. Location 4. Location>presence 5. Voice 6. Voice>gender 7. Voice>age 8. Voice>identity 9. Voice>presence 10. Voice>mood 11. Photo 12. Photo>gender 13. Photo>age 14. Photo>identity 15. Photo>presence 16. Photo>mood 17. Video 18. Video>gender 19. Video>age 20. Video>presence 21. Video>mood 22. Video>looking at 23. Gaze 24. Gaze>looking at

Parameter: Where (the location of the data collection)

Levels: 1. Your place 2. Someone else’s place 3. Semi-public place (e.g. restaurant) 4. Public space (e.g. street)

Parameter: Reason (the reason for collecting this data)

Levels: 1. Safety 2. Commercial 3. Social-related 4. Convenience 5. Health-related 6. None

Parameter: Persistence (whether data is collected once or continuously)

Levels: 1. Once 2. Continuously

Example scenarios:

“A device of a friend records your video to detect your presence. This happens continuously, while you are at someone else’s place, for your safety.”

“A government device reads your phone ID to detect your identity. This happens once, while you are in a public place (e.g. on the street), for health-related purposes.”

4.3 Statistical Analysis

We conducted a statistical analysis on this dataset to determine the effect of each scenario

parameter on users’ decisions to allow the presented general IoT scenario and how this effect is

mediated by the user’s attitudes. 1

Using this approach, we find that the ‘Who’ parameter has the strongest effect on users’

decision to allow the scenario, followed by the ‘What’, the ‘Reason’, and the ‘Persistence’ parameter.

The ‘Where’ parameter has no effect at all. People are generally concerned about IoT scenarios

involving unknown and government devices, but less concerned about data collected by their

own devices. Mistrust of government data collection is in line with Li et al.’s finding regarding US

audiences [76].

‘What’ is the second most important scenario parameter, and its significant interaction

with ‘who’ suggests that some users may want to allow/reject the collection of different types of

data by different types of recipients. Privacy concerns are higher for photo and video than for voice,

arguably because photos and videos are more likely to reveal the identity of a person. Moreover,

people are less concerned with revealing their age and presence, and most concerned with revealing

their identity.

The ‘reason’ for the data collection is the third most important scenario parameter. Health

and safety are generally seen as acceptable reasons. ‘Persistence’ is less important, although one-

time collection is more acceptable than continuous collection. ‘Where’ the data is being collected

does not influence intention at all. This could be an artifact of the dataset: location is arguably less

prominent when reading a scenario than it is in real life.

Finally, participants’ attitudes significantly (and in some cases fully) mediated the effect of

scenario parameters on behavioral intentions. This means that these attitudes may be used as a

valuable source for classifying people into distinct groups. Such attitudinal clustering could capture

a significant amount of the variation in participants in terms of their preferred privacy settings,

especially with respect to the ‘who’ and ‘what’ dimensions.

1 The statistical analysis and the subsequent layered interface were developed by my co-author Paritosh Bahirat. These endeavors are presented in summarized form since they are not an official part of this dissertation. For more details, please refer to [11].

Moreover, we found no significant interaction effects of parameters on decision beyond the

significant interaction between ‘Who’ and ‘What’ onto the attitudes. The outcome informed the

design of a ‘layered interface’, which presents privacy settings with the most prominent influence

first, relegating less prominent aspects to subsequently lower layers (see Figure 4.1). Users can

make a decision based on a single parameter only, and choose ‘yes’, ‘no’, or ‘it depends’ for each

parameter value. If they choose ‘it depends’, they move to a next layer, where the decision for that

parameter value is broken down by another parameter.

The manual interface is shown in Screens 2-4 of Figure 4.1. At the top layer of this interface

should be the scenario parameter that is most influential in our dataset. Our statistical results

inform us that this is the who parameter. Screen 2 shows how users can allow/reject data collection

for each of the 7 types of recipients. Users can choose “more”, which brings them to the second-most

important scenario parameter, i.e. the what parameter. Screen 3 of Figure 4.1 shows the data type

options for when the user clicks on “more” for “Friends’ devices”. We have conveniently grouped the

options by collection medium. Users can turn the collection of various data types by their friends’

devices on or off. If only some types of data are allowed, the toggle at the higher level gets a yellow

color and turns to a middle option, indicating that it is not completely ‘on’ (see “Friends’ devices”

in Screen 2).
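The three-position toggle described above can be sketched as a small aggregation rule. This is only an illustration of the behavior described in the text, not the prototype's code; the names `Toggle`, `parent_state`, and the example settings are hypothetical.

```python
from enum import Enum


class Toggle(Enum):
    OFF = "off"
    PARTIAL = "partial"  # the yellow middle position described in the text
    ON = "on"


def parent_state(child_settings):
    """Summarize lower-layer on/off settings for the toggle one layer up."""
    values = list(child_settings.values())
    if values and all(values):
        return Toggle.ON
    if not any(values):
        return Toggle.OFF
    return Toggle.PARTIAL


# Hypothetical state of the "Friends' devices" screen (Screen 3):
# only one data type is allowed, so the Screen 2 toggle shows the middle option.
friends_devices = {"Voice>age": True, "Voice>identity": False, "Photo>identity": False}
print(parent_state(friends_devices).value)
```

Treating an empty set of child settings as ‘off’ is an assumption made here for safety; the text does not specify that case.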

Screen 4 of Figure 4.1 shows how users can drill down even further to specify reasons for

which collection is allowed, and the allowed persistence (we combined these two parameters in

a single screen to reduce the “depth” of our interface). Since reason and persistence explain

relatively little variance in behavioral intention, we expect that only a few users will go this deep

into the interface for a small number of their settings. We leave out where altogether, because our

statistical results deemed this parameter to be non-significant.

4.4 Predicting users’ behaviors (original work)

To further simplify the task of manually setting privacy preferences, we used machine learn-

ing to predict users’ decisions based on the scenario parameters. Our goal is to find suitable default

settings for an IoT privacy-setting interface. Consequently, we do not attempt to find the most

accurate solution; instead we make a conscious tradeoff between parsimony and prediction accuracy.

Accuracy is important to ensure that users’ privacy preferences are accurately captured and/or need

only a few manual adjustments. Parsimony, on the other hand, prevents overfitting and promotes

fairness: we noticed that more complex models tended to increase overall accuracy by predicting

a few users’ preferences more accurately, with no effect on other users. Parsimony also makes the

associated default setting easier to understand for the user.

Figure 4.1: From left: Screen 1 shows three default settings; Screens 2, 3, and 4 show the layered interface

Our prediction target is the participants’ decision to allow or reject the data collection

described in each scenario, classifying a scenario as either ‘yes’ or ‘no’. The scenario parameters

serve as input attributes. These are nominal variables, making decision tree algorithms such as

ID3 and J48 a suitable prediction approach. Unlike ID3, J48 uses gain ratio as the root node

selection metric, which is not biased towards input attributes with many values. We therefore use

J48 throughout our analysis.
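To illustrate why gain ratio matters for nominal attributes with many values, the following sketch compares plain information gain (the ID3 criterion) with gain ratio (the criterion used by C4.5, which J48 implements) on made-up data. The rows, the `id` attribute, and all function names are assumptions for illustration; Weka's actual implementation includes further heuristics that are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def gain_ratio(rows, attribute, target):
    """Information gain divided by the entropy of the split itself."""
    split_info = entropy([row[attribute] for row in rows])
    return information_gain(rows, attribute, target) / split_info if split_info else 0.0


# Made-up data: 'who' is a good but imperfect predictor, while 'id' is unique per row.
rows = []
for i in range(16):
    who = "own" if i < 8 else "unknown"
    majority = "yes" if who == "own" else "no"
    minority = "no" if majority == "yes" else "yes"
    rows.append({"id": i, "who": who, "decision": minority if i % 8 == 0 else majority})

for attribute in ("id", "who"):
    print(attribute,
          round(information_gain(rows, attribute, "decision"), 3),
          round(gain_ratio(rows, attribute, "decision"), 3))
```

In this toy example the unique `id` attribute has the higher information gain, but `who` has the higher gain ratio, which is the bias correction the paragraph above refers to.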

We discuss progressively more sophisticated methods for predicting participants’ decisions. After

discussing naive solutions, we first present a cross-validated tree learning solution that results in a

single “smart default” setting that is the same for everyone. Subsequently, we discuss three different

procedures that create a number of “smart profiles” by clustering the participants and creating a

separate cross-validated tree for each cluster. For each procedure, we try various numbers of clusters.

Accuracies of the resulting solutions are reported in Table 4.2.

Table 4.2: Comparison of clustering approaches

Approach                    Clusters   Accuracy   # of profiles
Naive classification        1          28.33%     1 (all ‘yes’)
                            1          71.67%     1 (all ‘no’)
Overall                     1          73.10%     1
Attitude-based clustering   2          75.28%     2
                            3          75.17%     3
                            4          75.60%     3
                            5          75.25%     3
Fit-based clustering        2          77.99%     2
                            3          81.54%     3
Agglomerative clustering    200        78.13%     4
                            200        78.27%     5

Table 4.3: Confusion matrix for the overall prediction

Observed   Predicted Yes   Predicted No   Total
Yes        124 (TP)        669 (FN)       793
No         84 (FP)         1923 (TN)      2007
Total      208             2592           2800

4.4.1 Naive Prediction Methods

We start with naive or “information-less” predictions. Our dataset contains 793 ‘yes’es and

2007 ‘no’s. Therefore, predicting ‘yes’ for every scenario gives us a 28.33% prediction accuracy,

while making a ‘no’ prediction gives us an accuracy of 71.67%. In other words, if we disallow all

information collection by default, users will on average be happy with this default for 71.67% of the

settings.

4.4.2 Overall Prediction

We next create a “smart default” by predicting the allow/reject decision with the scenario

parameters using J48 with Weka’s [46] default settings. The resulting tree is shown in Figure 4.2.

The confusion matrix (Table 4.3) shows that this model results in overly conservative settings; only

208 ‘yes’es are predicted.
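As a quick check of how conservative this model is, the counts in Table 4.3 can be turned into a few summary figures. The sketch below uses only the numbers from the table; the variable names are ours, and small rounding differences from the percentages reported in the text are possible.

```python
# Counts taken from Table 4.3 (rows: observed decision, columns: predicted decision).
tp, fn = 124, 669    # observed 'yes': predicted 'yes' / predicted 'no'
fp, tn = 84, 1923    # observed 'no':  predicted 'yes' / predicted 'no'
total = tp + fn + fp + tn

accuracy = (tp + tn) / total          # share of scenarios predicted correctly
all_no_accuracy = (fp + tn) / total   # naive baseline: always predict 'no'
recall_yes = tp / (tp + fn)           # share of observed 'yes' decisions recovered
predicted_yes = tp + fp               # how many scenarios the model allows

print(f"accuracy: {accuracy:.2%}, all-'no' baseline: {all_no_accuracy:.2%}")
print(f"recall on 'yes': {recall_yes:.2%}, predicted 'yes': {predicted_yes}")
```

The low recall on ‘yes’ decisions is another way of expressing that the resulting default is overly conservative.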

Figure 4.2 shows that this model predicts ‘no’ for every recipient (‘who’) except ‘Own device’.

For this value, the default setting depends on ‘what’ is being collected (see Table 4.4). For some

levels of ‘what’, there is a further drill down based on ‘where’, ‘persistence’ and ‘reason’.
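Read as a lookup procedure, the tree in Figure 4.2 and Table 4.4 could be applied to a scenario roughly as follows. This is a minimal sketch that encodes only a subset of the rows of Table 4.4; the dictionary layout and the function name `smart_default` are illustrative assumptions, not the prototype's data model.

```python
# A subset of Table 4.4: either a fixed decision, or a further split on one parameter.
OWN_DEVICE_DEFAULTS = {
    "PhoneID": "yes",
    "Location": "no",
    "Location>presence": ("reason", {
        "Safety": "yes", "Commercial": "yes", "Social-related": "no",
        "Convenience": "no", "Health-related": "yes", "None": "yes",
    }),
    "Video>looking at": ("persistence", {"Once": "yes", "Continuous": "no"}),
}


def smart_default(scenario):
    """Return the default decision ('yes' or 'no') for one scenario."""
    if scenario["who"] != "Own device":
        return "no"  # every other recipient is a 'no' leaf in Figure 4.2
    rule = OWN_DEVICE_DEFAULTS[scenario["what"]]
    if isinstance(rule, str):
        return rule
    parameter, branches = rule
    return branches[scenario[parameter]]


print(smart_default({"who": "Government", "what": "PhoneID"}))
print(smart_default({"who": "Own device", "what": "Video>looking at", "persistence": "Once"}))
```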

We can use this tree to create a “smart default” setting; in that case, users would on average

be content with 73.10% of these settings, a 2% improvement over the naive “no to everything”

default setting.

Figure 4.2: The Overall Prediction decision tree. Further drill down for ‘who’ = ‘Own device’ is provided in Table 4.4

Given that people differ substantially in their privacy preferences, it is not surprising

that this “one size fits all” default setting is not very accurate. A better solution would cluster

participants by their privacy preferences, and then fit a separate tree for each cluster. These trees

could then be used to create “smart profiles” that new users may choose from. Subsequent sections

discuss several ways of creating such profiles.

4.4.3 Attitude-Based Clustering

Our first “smart profile” solution uses the attitudes (comfort, risk, appropriateness) partici-

pants expressed for each scenario on a 7-point scale. We averaged the values per attitude across each

participant’s 14 answers, and ran k-means clustering on that data with 2, 3, 4 and 5 clusters. We

then added participants’ cluster assignments to our original dataset, and ran the J48 decision tree

learner on the dataset with the additional cluster attribute. Accuracies of the resulting solutions

are reported in Table 4.2 under “attitude-based clustering”.
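The clustering step described above can be sketched in a few lines. The example below averages each participant's attitude ratings and runs a plain k-means (Lloyd's algorithm) on the averages; the ratings are made up, only two answers per participant are used instead of 14, and the pure-Python k-means is a stand-in for whatever implementation was actually used.

```python
import random


def mean_attitudes(answers):
    """Average (comfort, risk, appropriateness) over one participant's answers."""
    return tuple(sum(a[i] for a in answers) / len(answers) for i in range(3))


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment


# Made-up 7-point ratings: (comfort, risk, appropriateness) per answered scenario.
ratings = {
    "p1": [(2, 6, 2), (1, 7, 1)], "p2": [(2, 6, 3), (2, 6, 1)], "p3": [(1, 5, 2), (3, 7, 2)],
    "p4": [(6, 2, 6), (7, 1, 7)], "p5": [(6, 2, 5), (6, 2, 7)], "p6": [(5, 3, 6), (7, 1, 6)],
}
participants = sorted(ratings)
points = [mean_attitudes(ratings[p]) for p in participants]
cluster_of = dict(zip(participants, kmeans(points, k=2)))
print(cluster_of)
```

The resulting `cluster_of` mapping corresponds to the additional cluster attribute that is appended to each participant's rows before the tree learner is run.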

All of the resulting trees had cluster as the root node. This indicates that this parameter

is a very effective parameter for predicting users’ decisions. This also allows us to split the trees at

the root node, and create separate default settings for each cluster.

The 2-cluster solution (Figure 4.3) has a 75.28% accuracy, a 3.0% improvement over the

“smart default”. This solution results in one profile with ‘no’ for everything, while for the other

profile the decision depends on the recipient (who). This profile allows any collection involving the

user’s ‘Own device’, and may allow collection by a ‘Friend’ or an ‘Employer/School’, depending on

what is being collected.

Table 4.4: Drill down of the Overall Prediction tree for ‘who’ = ‘Own device’

What                 Decision
PhoneID              Yes
PhoneID>identity     Yes
Location             No
Location>presence    Reason: Safety Yes; Commercial Yes; Social-related No; Convenience No; Health-related Yes; None Yes
Voice                No
Voice>gender         Where: Your place No; Someone else No; Semi-public No; Public Yes
Voice>age            No
Voice>identity       Yes
Voice>presence       Yes
Voice>mood           Yes
Photo                No
Photo>gender         No
Photo>age            No
Photo>identity       Yes
Photo>presence       No
Photo>mood           No
Video                No
Video>gender         No
Video>age            No
Video>presence       No
Video>mood           Yes
Video>looking at     Persistence: Once Yes; Continuous No
Gaze                 No
Gaze>looking at      Reason: Safety Yes; Commercial No; Social-related No; Convenience Yes; Health-related Yes; None Yes

Figure 4.3: Attitude-based clustering: 2-cluster tree. Cluster 0: 89 users; Cluster 1: 111 users.

Figure 4.4: Attitude-based clustering: 3-cluster tree. Cluster 0: 1204 instances; Cluster 1: 714 instances; Cluster 2: 882 instances.

The 3-cluster solution has a slightly lower accuracy of 75.17%, but is more parsimonious

than the 2-cluster solution. There is one profile with ‘no’ for everything, one profile that allows

collection by the user’s ‘Own device’ only, and one profile that allows any collection except when the

recipient is ‘Unknown’ or the ‘Government’. The 4- and 5-cluster solutions have several clusters with

the same sub-tree, and therefore reduce to a 3-cluster solution with 75.60% and 75.25% accuracy,

respectively.

4.4.4 Fit-based clustering

Our fit-based clustering approach clusters participants without using any additional infor-

mation. It instead uses the fit of the tree models to bootstrap the process of sorting participants into

clusters. Like many bootstrapping methods, ours uses random starts and iterative improvements

to find the optimal solution. The process is depicted in Figure 4.5, and described in detail below.

Accuracies of the resulting solutions are reported in Table 4.2 under “fit-based clustering”.

Random starts: We randomly divide participants over N separate groups, and learn a tree

for each group. This is repeated until a non-trivial starting solution (i.e., with distinctly different

trees per cluster) is found.

Iterative improvements: Once each of the N groups has a unique decision tree, we

evaluate for each participant which of the trees best represents their 14 decisions. If this is the tree

of a different group, we switch the participant to this group. Once all participants are evaluated and

put in the group of their best-fitting tree, the tree in each group is re-learned with the data of the

new group members. This then prompts another round of evaluations, and this process continues

until no further switches are performed.

Since this process is influenced by random chance, it is repeated in its entirety to find the

optimal solution. Cross-validation is performed in the final step to prevent over-fitting. Accuracies

of the 2- and 3-cluster solutions are reported in Table 4.2 under “fit-based clustering”. We were not

able to converge on a higher number of clusters.
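The random-start and iterative-improvement steps described above can be sketched as follows. To keep the example self-contained, a simple majority vote per ‘who’ value stands in for the J48 tree learner, so this illustrates the clustering loop only; the outer repetition over several random runs and the final cross-validation are omitted, and all names and data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def learn_rule(decisions):
    """Stand-in for the tree learner: the majority answer per 'who' value."""
    votes = defaultdict(Counter)
    for who, answer in decisions:
        votes[who][answer] += 1
    return {who: counts.most_common(1)[0][0] for who, counts in votes.items()}


def fit(rule, decisions):
    """Fraction of the decisions that the rule predicts correctly."""
    return sum(rule.get(who, "no") == answer for who, answer in decisions) / len(decisions)


def learn_all(data, groups, n_groups, previous):
    """Re-learn one rule per group; an emptied group keeps its previous rule."""
    rules = []
    for g in range(n_groups):
        pooled = [d for p in sorted(groups) if groups[p] == g for d in data[p]]
        rules.append(learn_rule(pooled) if pooled else previous[g])
    return rules


def fit_based_clustering(data, n_groups, seed=0, max_starts=1000):
    rng = random.Random(seed)
    participants = sorted(data)
    # Random starts: redraw until every group is non-empty and the rules differ.
    for _ in range(max_starts):
        groups = {p: rng.randrange(n_groups) for p in participants}
        if len(set(groups.values())) < n_groups:
            continue
        rules = learn_all(data, groups, n_groups, previous=None)
        if len({tuple(sorted(rule.items())) for rule in rules}) == n_groups:
            break
    else:
        raise ValueError("no non-trivial starting solution found")
    # Iterative improvements: move each participant to a strictly better-fitting rule.
    while True:
        new_groups = {}
        for p in participants:
            best = groups[p]
            for g in range(n_groups):
                if fit(rules[g], data[p]) > fit(rules[best], data[p]):
                    best = g
            new_groups[p] = best
        if new_groups == groups:
            return groups, rules
        groups = new_groups
        rules = learn_all(data, groups, n_groups, previous=rules)


# Made-up decisions: two participants reject everything, two allow own and friends' devices.
reject_all = [("own", "no"), ("friend", "no"), ("government", "no")]
allow_some = [("own", "yes"), ("friend", "yes"), ("government", "no")]
data = {"p1": reject_all, "p2": reject_all, "p3": allow_some, "p4": allow_some}
groups, rules = fit_based_clustering(data, n_groups=2)
print(groups)
```

A participant only switches when another group's rule fits strictly better, which mirrors the stopping condition "until no further switches are performed".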

The 2-cluster solution has a 77.99% accuracy—a 6.7% improvement over the “smart default”.

One profile has ‘no’ for everything, while the settings in the other profile depend on who: it allows

any collection by the user’s ‘Own device’, and may allow collection by a ‘Friend’s device’ or an

‘Employer’, depending on what is collected.

The 3-cluster solution (Figure 4.6) has an 81.54% accuracy, an 11.5% improvement over the

“smart default”. We find one profile with ‘no’ for everything; one profile that may allow collection

by the user’s ‘Own device’, depending on what is being collected; and one profile that allows any

collection except when the recipient (who) is ‘Unknown’, the ‘Government’, or a ‘Colleague’, with

settings for the latter depending on the reason.

4.4.5 Agglomerative clustering

Our final method for finding “smart profiles” follows a hierarchical bottom-up (or agglom-

erative) approach. It first fits a separate tree for each participant, and then iteratively merges them

based on similarity. 156 of the initial 200 trees predict “no for everything” and 34 of them predict

“yes for everything”; these are merged first. For every possible pair of the remaining 10 trees, the

accuracy of the pair is compared with the mean accuracy of the individual trees, and the pair with

the smallest reduction in accuracy is merged. This process is repeated until we reach the predefined

number of clusters.

Figure 4.5: The Flow Chart for Fit-based Clustering

Figure 4.6: Fit-based clustering: 3-cluster tree. Further drill down is hidden for space reasons. Cluster 0: 74 users; Cluster 1: 77 users; Cluster 2: 49 users.

Figure 4.7: Accuracy of our clustering approaches (chart title: “Overview of model accuracies”; axis: Accuracy (%), 70 to 83; one entry per approach)
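The merging loop can be sketched as below. As in any such illustration, the model is a simplification: a majority vote per ‘who’ value stands in for the per-cluster tree, accuracy is measured on the training data rather than cross-validated, and the data and function names are made up.

```python
from collections import Counter, defaultdict
from itertools import combinations


def learn_rule(decisions):
    """Stand-in for the tree learner: the majority answer per 'who' value."""
    votes = defaultdict(Counter)
    for who, answer in decisions:
        votes[who][answer] += 1
    return {who: counts.most_common(1)[0][0] for who, counts in votes.items()}


def accuracy(rule, decisions):
    return sum(rule[who] == answer for who, answer in decisions) / len(decisions)


def agglomerate(data, target_clusters):
    """Merge single-participant clusters until `target_clusters` remain."""
    clusters = [[p] for p in sorted(data)]

    def score(cluster):
        pooled = [d for p in cluster for d in data[p]]
        return accuracy(learn_rule(pooled), pooled)

    while len(clusters) > target_clusters:
        best_pair, smallest_drop = None, None
        for a, b in combinations(range(len(clusters)), 2):
            separate = (score(clusters[a]) + score(clusters[b])) / 2
            drop = separate - score(clusters[a] + clusters[b])
            if smallest_drop is None or drop < smallest_drop:
                best_pair, smallest_drop = (a, b), drop
        a, b = best_pair
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters


# Made-up decisions: identical participants merge first because merging costs no accuracy.
reject_all = [("own", "no"), ("friend", "no"), ("government", "no")]
allow_some = [("own", "yes"), ("friend", "yes"), ("government", "no")]
data = {"p1": reject_all, "p2": reject_all, "p3": allow_some, "p4": allow_some}
clusters = agglomerate(data, target_clusters=2)
print(clusters)
```

Participants with identical models merge at zero cost, which corresponds to the “no for everything” and “yes for everything” trees being merged first.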

We were able to reach a 5- and 4-cluster solution. The 3-cluster solution collapsed down

into a 2-cluster solution with one profile of all ‘yes’es and one profile of all ‘no’s (a somewhat trivial

solution with a relatively bad fit). Accuracies of the 4- and 5-cluster solutions (Table 4.2, “agglomerative

clustering”) are 78.13% and 78.27% respectively. For the 4-cluster solution, we find one profile with

‘no’ for everything, one profile with ‘yes’ for everything, one profile that depends on who, and

another that depends on what. The latter two profiles drill down even further on specific values of

who and what, respectively.

4.4.6 Discussion of Machine Learning Results

Figure 4.7 shows a comparison of the presented approaches. Compared to a naive default

setting (all ‘no’), a “smart default” makes a 2.0% improvement. The fit-based 2-cluster solution

results in two “smart profiles” that make another 6.7% improvement over the “smart default”, while

the three “smart profiles” of the fit-based 3-cluster solution make an 11.5% improvement. If we let

users choose the best option among these three profiles, they will on average be content with 81.54%

of the settings—a 13.8% improvement2 over the naive “no to everything” default. This rivals the

accuracy of some of the “active tracking” machine learning approaches [101].
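Note that the improvements quoted in this section are relative (ratios of accuracies), not differences in percentage points. The short sketch below recomputes them from the accuracies in Table 4.2; the function name is ours.

```python
# Accuracies (in %) taken from Table 4.2.
NAIVE_ALL_NO = 71.67
SMART_DEFAULT = 73.10
FIT_2_PROFILES = 77.99
FIT_3_PROFILES = 81.54


def relative_improvement(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new / baseline - 1) * 100


print(round(relative_improvement(SMART_DEFAULT, NAIVE_ALL_NO), 1))    # smart default vs. naive
print(round(relative_improvement(FIT_2_PROFILES, SMART_DEFAULT), 1))  # 2 profiles vs. smart default
print(round(relative_improvement(FIT_3_PROFILES, SMART_DEFAULT), 1))  # 3 profiles vs. smart default
print(round(relative_improvement(FIT_3_PROFILES, NAIVE_ALL_NO), 1))   # 3 profiles vs. naive
```

In percentage points, the gain of the three fit-based profiles over the naive default is 81.54 − 71.67 = 9.87.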

In line with our statistical results, the factor who seems to be the most prominent parameter,

followed by what. In some cases the settings are more complex, depending on a combination of

who and what. This is in line with the interaction effect observed in our statistical results.

Even our most accurate solution is not without fault, and its accuracy depends most on

the who parameter. Specifically, the solution is most accurate for the user’s own device, the device

of a friend, and when the recipient is unknown. It is however less accurate when the recipient is a

colleague, a nearby business, an employer, or the government. In these scenarios, more misclassi-

fications tend to happen, so it would be useful to ‘guide’ users to specifically have a look at these

default settings, should they opt to make any manual overrides.

4.5 Privacy shortcuts (original work)

In Section 4.3, we developed a “layered” interface that general IoT users can use to manually

set their privacy settings (see Figure 4.1). Our machine learning analysis (Section 4.4) resulted in

a number of interesting solutions for “smart profiles” that would allow users of this interface to set

their privacy settings with a single click (i.e., a choice of profile). In this section we therefore present

how we integrate the “smart profiles” with our prototype.

4.5.1 Smart Default Setting

Because the design of the “layered” interface is based on our statistical result that there is no

interaction effect between the parameters, our “smart default” settings can be easily integrated into

this prototype. For the “yes to everything” or “no to everything” default, we can simply set all

the settings in Screen 4 of Figure 4.1 to ‘on’ or ‘off’.

For the results from our Overall Prediction (see Figure 4.2), we can create a “smart default”

setting that is 73.10% accurate on average. In this version, the IoT settings for all devices are set to

‘off’, except for ‘My own device’, which will be set to the middle option. Table 4.4 shows the default

settings at deeper levels. As this default setting is on average only 73.10% accurate, we expect users

to still change some of their settings. They can do this by navigating the manual settings interface.

2 81.54 / 71.67 = 1.138

Figure 4.8: Two types of profile choice interfaces: (a) 2-profile choice interface; (b) 3-profile choice interface

4.5.2 Smart Profiles

To improve the accuracy of the default setting, we can instead build three “smart profiles”,

and allow the user to choose among them. Using the 3-cluster solution of the fit-based approach

(see Figure 4.6), we can attain an accuracy of 81.54%. Screen 1 in Figure 4.1 shows a selection

screen where the user can choose between these profiles. The “Limited collection” profile allows the

collection of any information by the user’s own devices, their friends’ devices, their employer/school’s

devices, and devices of nearby businesses. Devices of colleagues are only allowed to collect infor-

mation for certain reasons. The “Limited collection, personal devices only” profile only allows the

collection of certain types of information by the user’s own devices. The “No collection” profile does

not allow any data collection to take place by default.

Once the user chooses a profile, they will move to the manual settings interface (Screens

2–4 of Figure 4.1), where they can further change some of their settings.

4.6 Discussion and Limitations

Our statistical and machine learning results both indicated that the recipient of the information

(who) is the most significant parameter in users’ decision to allow or reject IoT-based information

collection. This parameter therefore features at the forefront in our layered settings interface, and

plays an important role in our smart profiles. The what parameter was the second-most important

decision parameter, and interacted significantly with the who parameter. This parameter therefore

features at the second level of our settings interface, and further qualifies some of the settings in our

smart profiles.

Our layered interface allows a further drill-down to the reason and persistence parameters,

but given the relatively lesser importance of these parameters, we expect few users to engage with

the interface at this level. Moreover, the where parameter was not significant, so we left it out of

the interface.

While a naive (‘no’ to all) default setting in our interface would have provided an accuracy

of 71.67%, it would not have allowed users to reap the potential benefits associated with IoT data

collection without changing the default setting. Our Overall Prediction procedure resulted in a

smart default setting that was a bit more permissive, and increased the accuracy by 2%.

The fit-based clustering approach, which iteratively clusters users and fits an optimal tree

in each cluster, provided the best solution. This resulted in an interface where users can choose from

3 profiles, which increases the accuracy by another 11.5%.

The scenario-based method presented in this chapter is particularly suited for novel domains

where few real interactions exist. We note, though, that this novelty may hamper our approach:

users’ decisions are inherently limited by the knowledge they have about IoT. Lee and Kobsa [72]

made sure to educate users about the presented scenarios, hence their data is arguably better in this

regard than data from “live” systems. However, as the adoption of IoT becomes more widespread,

the mindset and knowledge regarding such technologies—and thus their privacy preferences—might

change. Our “smart profiles” may thus eventually have to be updated in future work, but for now,

our current profiles can at least help users make better privacy decisions in their initial stages

of usage.

One limitation of our work is that we did not test our layered interface, so we do not know

whether users are comfortable with the interface or whether they prefer a single “smart default”

setting or a choice among “smart profiles” (i.e., what level of profile complexity users would

prefer). This is why I conduct a user study in Chapter 7 to investigate this problem.

Another caveat of our work is that we did not manipulate the pruning parameter of our machine

learning models in the privacy decision prediction. Note that the decision tree shown in Figure 4.6

is rather complicated, leading to a complicated privacy profile implied by the decision tree. It is

difficult to explain a privacy profile in natural language to users when the profile becomes

too complicated. We note, though, that pruning the decision tree may reduce the accuracy of our

models’ results. I explore this interesting trade-off between the accuracy and parsimony (i.e. simpler

profile) in Chapter 5 where I apply our data-driven approach in the Household IoT context.

4.7 Summary

In this chapter, we have presented the following:

• The definition of general/public IoT, and how we came up with the data-driven approach.

• The dataset we used and the process of the data-driven design.

• Using statistical analysis, uncovered the relative importance of the parameters that influence

users’ privacy decisions. Developed a “layered interface” in which these parameters are pre-

sented in decreasing order of importance.

• Using a tree-learning algorithm, created a decision tree that best predicts participants’ choices

based on the parameters. Used this tree to create a “smart default” setting.

• Using a combination of clustering and tree-learning algorithms, created a set of decision trees

that best predict participants’ choices. Used the trees to create “smart profiles”.

• Developed a prototype for an IoT privacy-setting interface that integrates the layered interface

with the smart default or the smart profiles.

In the next chapter, we discuss the challenges and solutions when we apply our approach in

the household IoT context.

Chapter 5

Recommending Privacy Settings

for Household IoT

5.1 Introduction

In Chapter 4, we have discussed how we apply the data-driven approach to the general/public

IoT context and developed an IoT privacy-setting interface prototype that integrated a layered

interface with smart defaults/profiles by predicting users’ privacy decisions. In this chapter, we

present the work we did in the area of designing for privacy for Household IoT. We expand and

improve upon the previously-developed data-driven approach to design privacy-setting interfaces for

users of household IoT devices. Moving the context to a more narrow environment shifts the focus

of the privacy decision from the entity collecting information (which was the dominant parameter

in our previous work) to a more contextual evaluation of the content or nature of the information

[87]. In addition, as discussed in Section 4.6, an important limitation that we did not solve in our previous

work is balancing the trade-off between parsimony and accuracy. Accuracy is important to ensure

that users’ privacy preferences are accurately captured and/or need only a few manual adjustments.

Parsimony, on the other hand, prevents overfitting and promotes fairness: we noticed that more

complex models tended to increase overall accuracy by predicting a few users’ preferences more

accurately, with no effect on other users. Parsimony also makes the associated default setting easier

to understand for the user. In this chapter, we try to address these concerns.

5.2 Experiment Setup

In Chapter 4, we leveraged data collected by Lee and Kobsa [72], who asked 200 participants

about their intention to allow or reject the IoT features presented in 14 randomized scenarios.

They varied the scenarios in a mixed fractional factorial design along the following dimensions:

‘Who’, ‘What’, ‘Where’, ‘Reason’, ‘Persistence’. These are all appropriate dimensions for the

general/public IoT context; however, a new user study is needed to collect data in the domain of

household IoT. Given that we are narrowing our focus of IoT context in the chapter from gen-

eral/public IoT to household IoT, it is necessary to investigate whether these dimensions are still

suitable for our next-step experiment. At the same time, some new parameters may need to be

added into our consideration for better expressing the features of the new, narrowed IoT context.

In this section, we first discuss the changes in IoT context dimensions for our new user study.

We then present the factorial procedure by which we developed 4608 highly specific IoT scenarios,

as well as the questions we asked participants to evaluate these scenarios. Finally, we describe the

participant selection and experimental procedures used to collect over 13500 responses from 1133

participants.

5.2.1 Dimension Design

We consider the following dimension design changes in our user study:

• When applying our data-driven approach to Lee and Kobsa’s dataset in the previous chapter, we

found that the dimension “where” does not have a significant effect on users’ disclosure decisions.

Considering that we are moving to the household IoT context, the usage environment will always

be the user’s home. Moreover, because the structure of users’ houses differs from case to case,

it would be too complicated to define “where” at a finer-grained level, such as

bedroom, kitchen, etc. Hence, there is no need to retain the parameter “where”.

• “Persistence” of tracking is more relevant in public IoT, while persistent tracking is less com-

mon in household IoT. Hence, we removed “Persistence” from our current experiment.

• From the qualitative feedback in our previous study, we have learned that the secondary use

of information was a prominent concern among users. Hence, we split the original dimension

“Purpose” into two dimensions, “Purpose” and “Action”, where the latter is used

to indicate the secondary use of information.

• A new dimension “Storage” was added in addition to our existing dimensions because it is

possible for household IoT systems to operate (and thus store data) locally, and because the

sharing of data with third parties is not as common in household IoT as in public IoT.

By applying the above dimension changes, we aim to conduct a new user study focusing on

household IoT in particular, and further refine our approach to allow us to create more carefully

tailored user interfaces for the household IoT context. Next, we present the factorial procedure by

which we developed highly specific household IoT scenarios.

5.2.2 Contextual Scenarios

The scenarios evaluated in our study are based on a full factorial combination of five different

parameters: Who, What, Purpose, Storage and Action. A total of 8 (Who) × 12 (What) × 4 (Purpose) ×

3 (Storage) × 4 (Action) = 4608 scenarios were tested this way.
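The full factorial combination can be illustrated with a short sketch. The level lists below are copied from Table 5.1; the sentence template in `render` is an assumption pieced together from the sentence fragments in that table and may differ in wording from the scenarios actually shown to participants.

```python
from itertools import product

# Levels as listed in Table 5.1.
DEVICES = ["Home Security System", "Refrigerator", "HVAC System", "Washing Machine",
           "Lighting System", "Assistant", "TV"]
WHO = DEVICES + ["Alarm Clock"]
WHAT = [f"uses information collected by your {d}" for d in DEVICES + ["Alarm"]] + [
    "uses a location sensor", "uses a camera", "uses a microphone",
    "connects to your smart phone/watch"]
PURPOSE = ["detect whether you are home", "detect your location in house",
           "automate its operations", "give you timely alerts"]
STORAGE = ["locally", "on remote server",
           "on a remote server and shared with third parties"]
ACTION = ["optimize the service", "give insight into your behavior",
          "recommend you other services", None]  # None = the "[None]" level


def render(who, what, purpose, storage, action):
    """Assumed sentence template; the exact study wording may differ."""
    text = f"Your Smart {who} {what} to {purpose}. The data is stored {storage}"
    if action is not None:
        text += f" and used to {action}"
    return text + "."


# Full factorial design: every combination of the five parameters.
scenarios = [render(*combo) for combo in product(WHO, WHAT, PURPOSE, STORAGE, ACTION)]
print(len(scenarios))
```

The count follows directly from the number of levels per parameter (8, 12, 4, 3 and 4).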

The scenarios asked participants to imagine that they were owners and active users of the

presented IoT devices, trying to decide whether to turn on or off certain functionalities and/or

data sharing practices. To avoid endowment effects, the scenarios themselves made no indication as

to whether the functionality was currently turned on or off (such endowment effects were instead

introduced by manipulating the framing of the Decision question; see section 5.2.3). An example

scenario is: “Your smart TV (Who) uses a camera (What) to give you timely alerts (Purpose). The

data is stored locally (Storage) and used to optimize the service (Action).” This scenario may for

example represent a situation where the smarthome system has detected (via camera) a delivery

of a package and then alerts the user (via the smart TV) about its arrival. In this particular

scenario we note that the video data is stored locally to optimize service; this could mean that the

smarthome system uses the video stream to (locally) train a package detection algorithm.

Another example scenario is: “Your Smart Assistant uses a microphone to detect your location in

house. The data is stored on a remote server and shared with third parties to recommend you other

services.” This scenario could represent a situation where the smarthome has detected (via

microphone) its user’s location in the house and this information is shared with the smart assistant.

In the scenario, the data is stored on a remote server and shared with third parties so that it can

recommend additional services (like weather or local transportation) via third parties to the user.


The levels of all five parameters used in our experiment are shown in Table 5.1. The

parameters were highlighted in the scenario for easy identification, and upon hovering the mouse

cursor over them each parameter would show a succinct description of the parameter. Figure 5.1

shows a screenshot of a scenario as shown to participants in the study. A thirteenth scenario

regarding the interrelated control of various IoT devices (e.g. “You can use your smart TV to control

your smart refrigerator”) was also asked, but our current analysis focuses on the information-sharing

scenarios only.

5.2.3 Scenario Evaluation Questions

The first question participants were asked about each scenario was whether they would

enable or disable the particular feature mentioned in the scenario (Decision). Subsequently, they were

asked about their attitudes towards the scenario in terms of perceived Risk, Appropriateness,

Comfort, Expectedness and Usefulness (e.g., “How appropriate

do you think this scenario is?”). These questions were answered on a 7-point scale (e.g., “very

inappropriate” to “very appropriate”). In every 4th scenario, the Risk and Usefulness questions were

followed by an open question asking the participants to describe the potential Risk and Usefulness

of the scenario. We asked these questions mainly to encourage participants to carefully evaluate the

scenarios. Figure 5.1 shows the questions asked about each scenario.

The framing and default of the Decision question were manipulated between-subjects at

three levels each: positive framing (“Would you enable this feature?”, options: Yes/No), negative

framing (“Would you disable this feature?”, options: Yes/No) or neutral framing (“What would

you do with this feature?”, options: Enable/Disable); combined with a positive default (enabled by

default), negative default (disabled by default), or no default (forced choice).

5.2.4 Participants and Procedures

To collect our dataset, 1133 adult U.S.-based participants (53.53% Female, 45.75% Male,

8 participants did not disclose) were recruited through Amazon Mechanical Turk. This significant

increase in participants over the Lee and Kobsa [72] dataset is commensurate with our expectation

of more complex privacy decision behaviors in household IoT compared to public IoT. Participation

was restricted to Mechanical Turk workers with a high reputation (at least 50 tasks

completed with an average accuracy greater than 95%). Participants were paid $2.00 upon successful

completion of the study. Participants were warned that they would not be paid if they failed

attention checks.

Figure 5.1: Example of one of the thirteen scenarios presented to the participants.

Table 5.1: Parameters used to construct the information-sharing scenarios.

Parameter (sentence fragment)                         Level                                                  Code¹
Who: “Your Smart...”                                  1. Home Security System                                SS
                                                      2. Refrigerator                                        RE
                                                      3. HVAC System                                         HV
                                                      4. Washing Machine                                     WM
                                                      5. Lighting System                                     SL
                                                      6. Assistant                                           SA
                                                      7. TV                                                  TV
                                                      8. Alarm Clock                                         SC
What: “...uses information collected by your...”      1. Home Security System                                CSE
                                                      2. Refrigerator                                        CRE
                                                      3. HVAC System                                         CHV
                                                      4. Washing Machine                                     CWA
                                                      5. Lighting System                                     CLI
                                                      6. Assistant                                           CAS
                                                      7. TV                                                  CTV
                                                      8. Alarm                                               CAL
                                                      9. uses a location sensor                              CLO
                                                      10. uses a camera                                      CCA
                                                      11. uses a microphone                                  CMP
                                                      12. connects to your smart phone/watch                 CSW
Purpose: “...to...”                                   1. detect whether you are home                         PH
                                                      2. detect your location in house                       LH
                                                      3. automate its operations                             AO
                                                      4. give you timely alerts                              TA
Storage: “The data is stored...”                      1. locally                                             L
                                                      2. on remote server                                    R
                                                      3. on a remote server and shared with third parties    T
Action: “...and used to...”                           1. optimize the service                                O
                                                      2. give insight into your behavior                     I
                                                      3. recommend you other services                        R
                                                      4. [None]                                              N

¹ The “codes” are used as abbreviations in graphs and figures throughout this chapter.

Figure 5.2: Attention check questions asked to participants

The study participants represented a wide range of ages, with 9 participants less than 20

years old, 130 aged 20-25, 273 aged 25-30, 418 aged 30-40, 175 aged 40-50, 80 aged 50-60, and 43

participants over 60 years old (5 participants did not disclose their age).

Each participant was first shown a video with a brief introduction to various smart home

devices, which also mentioned various ways in which the different appliances would cooperate and

communicate within a home. After the video, participants were asked to answer three attention

check questions, shown in Figure 5.2. If they got any of these questions wrong, they would be asked

to read the transcript of the video and re-answer the questions. The transcript is shown in Figure 5.3.

After the introduction video, each participant was presented with 12 information-sharing

scenarios (and a 13th control scenario, not considered in this paper). These scenarios were selected

from the available 4608 scenarios using a fractional factorial design¹ that balances the within- and

between-subjects assignment of each parameter’s main effect, and creates a uniform exposure for each

participant to the various parameters (i.e., to avoid “runs” of near-similar scenarios). Participants

were asked to carefully read the scenario and then answer all questions about it. Two of the

13 scenarios had an additional attention check question (e.g., “Please answer this question with

Completely Agree”, see Figure 5.4), and there was an additional attention check question asking

participants about the remaining time to finish the study (which was displayed right there on the

same page, see Figure 5.5). Participants rushing through the experiment and/or repeatedly failing the

attention check questions were removed from the dataset.

¹ The scenario assignment scheme is available at https://www.usabart.nl/scenarios.csv

Figure 5.3: Transcript of video shown to participants if they failed attention checks.

Figure 5.4: Attention check question shown while participants are answering questions per scenario.

Figure 5.5: Attention check question asked to participants.

5.3 Statistical Analysis

Our statistical analysis² shows that, unlike the results from [11], all parameters had a significant

effect. Particularly, where the information is stored and if/how it is shared with third parties

(‘Storage’ parameter) has the strongest impact on users’ decision, followed by ‘What’, ‘Who’ and

‘Purpose’ (all similar) and finally ‘Action’. Moreover, substantial two-way interaction effects were

observed between ‘Who’, ‘What’, and ‘Purpose’, which suggest that when users decide on one

parameter, they inherently take another parameter into account. Based on these results, we designed

an interface, for users to manually change their privacy settings, which separated ‘Device/Sensor

Management’ and ‘Data Storage & Usage’.

We also analyze the effects of defaults and framing [10]. As outlined in section 5.2.3, the

framing and default of the Decision question in our study were manipulated between-subjects at

three levels each: positive, negative, or neutral framing; combined with a positive, negative, or no

default. The analysis shows that defaults and framing have direct effects on disclosure: Participants

in the negative default condition are less likely to enable the functionality, while participants in

the positive default condition are more likely to enable the scenario (a traditional default effect).

Likewise, participants in the negative framing condition are more likely to enable the functionality

(a loss aversion effect).

Moreover, there are interaction effects between defaults/framing and attitudes on disclosure:

the effects of attitudes are generally weaker in the positive and negative default conditions than in

the no default condition, and they are also weaker in the negative framing condition.

² The statistical analyses were conducted by my co-author Paritosh Bahirat. They are presented in summarized form since they are not an official part of this dissertation. For more details, please refer to [48].


Importantly, there are no interaction effects between defaults/framing and parameters on

attitude or disclosure. Hence, the main findings in this section regarding the structure and relative

importance of the effects of parameters remain the same, regardless of the effects of defaults and

framing.

5.4 Privacy-Setting Prototype Design

Our dataset presents a simplified version of possible scenarios one might encounter in routine

usage of smart home technology. Still, it is a daunting task to design an interface, even for these

simplified scenarios: We want to enable users to navigate their information collection and sharing

preferences across 12 different sources (What), 8 different devices trying to access this information

(Who) for 4 different Purposes. Additionally, this information is being stored/shared in 3 ways

(Storage) and being used for 4 different secondary uses (Actions).

Based on our statistical analysis in Section 5.3, we developed an intuitive interface that gives users

manual control over their privacy settings. We split our settings interface into two separate sections:

‘Device/Sensor Management’ and ‘Data Storage & Use’. The landing page of our design (screen 1

in Figure 5.6) gives users access to these two sections. The former section is based on Who, What

and Purpose and allows users to “Manage device access to data collected in your home” (screen

2-3). The latter section is based on Storage and Action, and allows users to “Manage the storage

and long-term use of data collected in your home” (screen 4). Both sections are explained in more

detail below.

Device/Sensor Management: This screen (Figure 5.6, screen 2) allows users to control

the Purposes for which each device (Who) is allowed to access data collected by itself, other devices,

and the smart home sensors installed around the house (What). This screen has a collapsible list

of data-collecting devices and sensors (What). For each device/sensor, the user can choose what

devices can access the collected data (Who; in rows), and what it may use that data for (Purpose;

in columns).

In the example of Figure 5.6, the user does not give the ‘Refrigerator’ access to information

collected by the ‘Smart Assistant’ for any of the four purposes, while they give the ‘Smart TV’

access to this data for the purpose of giving ‘timely alerts’. In this example the ‘Smart Assistant’ is

allowed to use its own data to ‘automate operations’ and to ‘know your location in your home’.

Figure 5.6: Privacy-Setting Interfaces Prototype

Showing Who, What and Purpose at the same time allows users to enable/disable specific

combinations of settings—the significant interaction effects between these parameters suggest that

this is a necessity. The icons for the Purpose requirement allow this settings grid to fit on a

smartphone or in-home control panel. We expect that users will quickly learn the meaning of these

icons, but they can always click on ‘I want to know more’ to learn their meaning (see Figure 5.6,

screen 3).

Data Storage & Use: This screen (Figure 5.6, screen 4) allows users to control how their

data is stored and shared (Storage), as well as how stored data is used (Action). These settings are

independent from each other and from the Device/Sensor Management settings.

For ‘Storage & Sharing’, users can choose to turn storage off altogether, store data locally,

store data both locally and on a remote server, or store data locally and on a remote server and

allow the app to share the data with third parties. Note that the options for Storage are presented as

ordered, mutually exclusive settings. Our scenarios did not present them as such (i.e., participants

were free to reject local storage but allow remote storage). However, the Storage parameter showed

a very clear separation of levels, so this presentation is justified. For ‘Data Use’, the users can choose

to enable/disable the use of the collected data for various secondary purposes: behavioral insights,

recommendations, service optimization, and/or other purposes.
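The two sections of the interface map onto a simple settings structure. The following is only a sketch of one possible representation, not the prototype's actual implementation; the class and method names (`PrivacySettings`, `allow`, `is_allowed`) and the short Data Use labels are hypothetical, while the Purpose codes come from Table 5.1.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Storage(IntEnum):
    """Ordered, mutually exclusive options of the 'Storage & Sharing' setting."""
    OFF = 0
    LOCAL = 1
    LOCAL_AND_REMOTE = 2
    LOCAL_REMOTE_AND_THIRD_PARTIES = 3


PURPOSES = {"PH", "LH", "AO", "TA"}  # Purpose codes from Table 5.1
DATA_USES = {"insights", "recommendations", "optimization", "other"}  # hypothetical labels


@dataclass
class PrivacySettings:
    # Device/Sensor Management: (What, Who) -> set of allowed Purpose codes
    access: dict = field(default_factory=dict)
    # Data Storage & Use: independent of the access grid above
    storage: Storage = Storage.OFF
    data_use: set = field(default_factory=set)

    def allow(self, what, who, purpose):
        if purpose not in PURPOSES:
            raise ValueError(f"unknown purpose: {purpose}")
        self.access.setdefault((what, who), set()).add(purpose)

    def is_allowed(self, what, who, purpose):
        return purpose in self.access.get((what, who), set())


# The example described for Figure 5.6 (screen 2):
settings = PrivacySettings()
settings.allow("Smart Assistant", "Smart TV", "TA")         # timely alerts
settings.allow("Smart Assistant", "Smart Assistant", "AO")  # automate operations
settings.allow("Smart Assistant", "Smart Assistant", "LH")  # location in the home
settings.storage = Storage.LOCAL
settings.data_use.add("optimization")
```

Modelling Storage as an ordered enumeration mirrors the design choice discussed above: each option includes the less permissive ones, so a comparison such as `settings.storage >= Storage.LOCAL_AND_REMOTE` answers whether remote storage is permitted.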

In the subsequent sections we describe the results from our machine learning analysis and

further explain how these results impact the designs presented in this section. For this purpose,

Section 5.6 revisits the interface designs presented here.

5.5 Predicting users’ behaviors (original work)

In this section we predict participants’ enable/disable decision using machine learning meth-

ods. We do not attempt to find the best possible solution; instead we make a conscious trade-off

between parsimony and prediction accuracy.

Our prediction target is the participants’ decision to enable or disable the data collection

described in each scenario. The scenario parameters serve as input attributes. Using Java and

Weka’s Java library [133] for modeling and evaluation, we implement progressively more sophisticated

methods for predicting participants’ decisions. After discussing naive (enable/disable all) solutions

and One Rule Prediction, we first present a cross-validated tree learning solution that results in a

single “smart default” setting that is the same for everyone. Subsequently, we discuss three different

procedures that create a number of “smart profiles” by clustering the participants and creating a

separate cross-validated tree for each cluster. For each procedure, we try various numbers of clusters

and pruning parameters. The solutions with the most parsimonious trees and the highest accuracies

of each approach are reported in Table 5.2; more detailed results of the parsimony/accuracy trade-off

are presented in Figures 5.10, 5.13, 5.16 and 5.20 throughout the paper, and combined in Figure 5.24.

Table 5.2: Comparison of clustering approaches (highest parsimony and highest accuracy)

Approach                                 Initial     Final # of   Complexity                  Accuracy
                                         clusters    profiles     (avg. tree size/profile)
Naive (enable all)                       1           1            1                           46.74%
Naive (disable all)                      1           1            1                           53.26%
One Rule (Fig. 5.7)                      1           1            3                           61.39%
Overall (Fig. 5.10)                      1           1            8                           63.32%
                                         1           1            264                         63.76%
Attitude-based clustering (Fig. 5.13)    2           2            2                           69.44%
                                         2           2            121.5                       72.66%
                                         3           3            2.67                        72.19%
                                         3           3            26.67                       73.47%
                                         5           4            3                           72.61%
                                         5           4            26                          73.56%
Agglomerative clustering (Fig. 5.16)     1133        4            2                           79.4%
                                         1133        5            2.4                         80.35%
                                         1133        6            3.17                        80.60%
Fit-based clustering (Fig. 5.20)         2           2            2                           74.43%
                                         2           2            151.5                       76.72%
                                         3           3            7                           79.80%
                                         3           3            65.33                       80.81%
                                         4           4            9.25                        81.88%
                                         4           4            58.25                       82.41%
                                         5           5            4.2                         82.92%
                                         5           5            51.4                        83.35%

5.5.1 Naive Prediction Model

We start with the naive or “information-less” predictions. Compared to the dataset used

in Chapter 4, our current dataset is even less amenable to a ‘simple’ default setting:

it contains 6355 enable cases and 7241 disable cases, which means that predicting enable for every

setting gives us a 46.74% prediction accuracy, while making a disable prediction for every setting

gives us an accuracy of 53.26%. In other words, if we disable all information collection by default,

users will on average be satisfied with only 53.26% of the default settings. Moreover, such a default

setting disallows any ‘smart home’ functionality by default, arguably not a solution the producers

of smart appliances can get behind.

Figure 5.7: A “smart default” setting based on the “One Rule” algorithm.

Table 5.3: Confusion matrix for the One Rule prediction

                    Prediction
Observed     Enable         Disable        Total
Enable       5085 (TP)      1270 (FN)      6355
Disable      3262 (FP)      3979 (TN)      7241
Total        7192           6404           13596
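The two naive accuracies follow directly from the class counts. A minimal sketch, using the observed row totals of Table 5.3 (6355 enable and 7241 disable decisions); the function name `naive_accuracies` is hypothetical:

```python
def naive_accuracies(n_enable, n_disable):
    """Accuracy of the 'enable all' and 'disable all' defaults."""
    total = n_enable + n_disable
    return {"enable all": n_enable / total, "disable all": n_disable / total}


acc = naive_accuracies(n_enable=6355, n_disable=7241)
for name, value in acc.items():
    print(f"{name}: {value:.2%}")
```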

5.5.2 One Rule Prediction

Next, we use a “One Rule” (OneR) algorithm to predict users’ decision using the simplest

prediction model possible. OneR is a very simple but often surprisingly effective learning algorithm [51].

It creates a frequency table for each predictor against the target, and then selects the

predictor with the smallest total error based on the frequencies.
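The OneR procedure can be sketched in a few lines. This is an illustrative re-implementation in plain Python, not Weka's OneR classifier, and the nine rows of toy data are invented for the example (they are not taken from the study dataset).

```python
from collections import Counter, defaultdict


def one_rule(rows, target):
    """Return (attribute, rule, errors) for the single best predictor.

    rows: list of dicts mapping attribute names (and the target) to values.
    """
    best = None
    for attr in (k for k in rows[0] if k != target):
        # Frequency table: attribute value -> counts of each target value.
        table = defaultdict(Counter)
        for row in rows:
            table[row[attr]][row[target]] += 1
        # Predict the most frequent target value for each attribute value.
        rule = {value: counts.most_common(1)[0][0] for value, counts in table.items()}
        errors = sum(sum(counts.values()) - counts[rule[value]]
                     for value, counts in table.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Invented toy data using the Storage and Purpose codes of Table 5.1.
toy = [
    {"storage": "L", "purpose": "TA", "decision": "enable"},
    {"storage": "L", "purpose": "PH", "decision": "enable"},
    {"storage": "L", "purpose": "AO", "decision": "enable"},
    {"storage": "R", "purpose": "TA", "decision": "enable"},
    {"storage": "R", "purpose": "PH", "decision": "disable"},
    {"storage": "R", "purpose": "AO", "decision": "enable"},
    {"storage": "T", "purpose": "TA", "decision": "disable"},
    {"storage": "T", "purpose": "PH", "decision": "disable"},
    {"storage": "T", "purpose": "AO", "decision": "disable"},
]
attribute, rule, errors = one_rule(toy, target="decision")
print(attribute, rule, errors)
```

On this toy data the Storage attribute misclassifies one row, whereas Purpose misclassifies three, so Storage is selected.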

As shown in Figure 5.7 (parameter value abbreviations correspond to the “code” column in

Table 5.1), the OneR model predicts users’ decision solely based on the Storage parameter with

an accuracy of 61.39%. Based on this model, if we enable all information-sharing except with third

parties, we will on average satisfy 61.39% of users’ preferences, a 15.3% improvement³ over the

naive “disable all” default. Note, though, that this default setting is overly permissive, with 3262

false positive predictions (see Table 5.3).

³ 61.39 / 53.26 = 1.153

Figure 5.8: A “smart default” setting with 264 nodes with 63.76% accuracy.

5.5.3 Overall Prediction

Moving beyond a single parameter, we create a “smart default” setting by predicting the

enable/disable decision with all scenario parameters using the J48 decision tree algorithm. The

resulting tree has an accuracy of 63.76%. As shown in Figure 5.8, this model predicts users’ decision

on Storage first. It predicts disable for every scenario in which collected data is stored on a remote

server and shared with third parties. For scenarios that store collected data on a remote server without

sharing, the default settings depend on the ‘purpose’ of information sharing. There is a further

drill down based on ‘who’ and ‘what’. For scenarios that store collected data locally, the default

settings depend on the ‘what’. There is a further drill down based on ‘who’, ‘what’, and ‘action’.

With this default setting, users would on average be satisfied with 63.76% of these settings—a 19.7%

improvement over the naive “disable all” default.

On the downside, this “smart default” setting is quite complex—the “smart default” in our

previous work [11] contained only 49 nodes, whereas the “smart default” for our current dataset has

264 nodes. Compared to the One Rule algorithm, which only has 4 nodes in its decision tree and is thus

much easier to explain, the accuracy improvement of the “smart default” is only 3.8%. This highlights

the trade-off between parsimony and prediction accuracy that we have to make when developing

“smart default” settings. On the upside, though, the prediction of the J48 decision tree algorithm

is more balanced, with a roughly equal number of false positives and false negatives (see Table 5.4).
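The accuracy and the balance between the two error types can be read off the confusion matrix. A small sketch using the cell counts printed in Table 5.4; the `ConfusionMatrix` class is a hypothetical helper, not part of the original analysis:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # predicted enable, observed enable
    fn: int  # predicted disable, observed enable
    fp: int  # predicted enable, observed disable
    tn: int  # predicted disable, observed disable

    @property
    def total(self):
        return self.tp + self.fn + self.fp + self.tn

    @property
    def accuracy(self):
        return (self.tp + self.tn) / self.total


# Cell counts as printed in Table 5.4 (overall J48 prediction).
overall = ConfusionMatrix(tp=4753, fn=2488, fp=2439, tn=3916)
print(f"accuracy: {overall.accuracy:.2%}")
print(f"false positives / false negatives: {overall.fp / overall.fn:.2f}")
```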

To better understand the parsimony/accuracy trade-off, we vary the degree of model pruning

to investigate the effect of increasing the parsimony (i.e., more trimming) on the accuracy of the

resulting “smart default” setting. The parameter used to alter the amount of post-pruning performed

on the J48 decision trees is called Confidence Factor (CF) in Weka, and lowering the Confidence

Factor will incur more pruning. We tested the J48 classifier with a Confidence Factor ranging from

0.01 to 0.25 (the default setting in Weka) in increments of 0.01.

Table 5.4: Confusion matrix for the overall prediction

                    Prediction
Observed     Enable         Disable        Total
Enable       4753 (TP)      2488 (FN)      7241
Disable      2439 (FP)      3916 (TN)      6355
Total        7192           6404           13596

Figure 5.9: Accuracy and parsimony (tree size) of the smart default change as a function of Confidence Factor (x-axis: Confidence Factor; left y-axis: accuracy (percentage); right y-axis: tree size; series: Accuracy, Tree Nodes #)

Figure 5.9 displays the accuracy and the size of the decision tree as a function of the

Confidence Factor. The X-axis represents the Confidence Factor; the left Y-axis and the orange

line represent the accuracy of the smart default setting; the right Y-axis and the dotted blue line

represent the size of the decision tree for that setting. The highest accuracy, 63.76%, is achieved

with the 264-node decision tree produced by CF = 0.25. The lowest accuracy, 62.9%, is achieved

with the 44-node decision tree produced by CF = 0.19. When CF ≤ 0.16, the decision tree contains

only 8 nodes. The 8-node profile with the highest accuracy is produced by CF = 0.10 with an

accuracy of 63.32%.
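Given the results of such a Confidence Factor sweep, picking the most accurate setting for each tree size is a simple reduction. In the sketch below, three of the (CF, tree size, accuracy) triples use the values reported in the text and in Table 5.2; the other two are hypothetical placeholders added only to make the selection step visible.

```python
def best_per_tree_size(results):
    """Keep, for every tree size, the (CF, accuracy) pair with the highest accuracy."""
    best = {}
    for cf, size, accuracy in results:
        if size not in best or accuracy > best[size][1]:
            best[size] = (cf, accuracy)
    return best


sweep = [
    (0.05, 8, 63.20),    # hypothetical placeholder
    (0.10, 8, 63.32),    # reported: best 8-node tree
    (0.16, 8, 63.10),    # hypothetical placeholder
    (0.19, 44, 62.90),   # reported: lowest accuracy
    (0.25, 264, 63.76),  # reported: largest and most accurate tree
]
best = best_per_tree_size(sweep)
for size in sorted(best):
    cf, accuracy = best[size]
    print(f"{size:>3} nodes: CF = {cf:.2f}, accuracy = {accuracy:.2f}%")
```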

Figure 5.10: Parsimony/accuracy comparison for Naive, One Rule, and Overall Prediction (x-axis: total number of nodes in the decision tree; y-axis: accuracy (percentage); series: Overall, One Rule, Naive)

Figure 5.11: A “smart default” setting with only 8 nodes and 63.32% accuracy.

Figure 5.10 summarizes accuracy as a function of parsimony. The X-axis represents the

number of nodes in the decision tree (more = lower parsimony); the Y-axis represents the

accuracy of the decision tree. The figure shows the most accurate J48 solution for any given tree

size, and includes the One Rule and Naive predictions for comparison. Reducing the tree from 264 to

8 nodes incurs a negligible 0.67% reduction in accuracy. This decision tree is shown in Figure 5.11,

and is still 3.1% better than the One Rule prediction model and 18.9% better than the naive “disable

all” default. This more parsimonious “smart default” setting can easily be explained to users as

follows:

• All sharing with third parties will be disabled by default.

• Remote storage is allowed for automation and alerts, but not for detecting your presence or

location in the house.

• Local storage is allowed for all purposes.
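These three rules translate directly into a small function. A minimal sketch, using the Storage and Purpose codes of Table 5.1; the function name `smart_default` is hypothetical:

```python
def smart_default(storage, purpose):
    """Return True if the feature is enabled by default under the 8-node setting.

    storage: "L" (locally), "R" (remote server), "T" (remote + third parties)
    purpose: "PH", "LH", "AO" or "TA" (see Table 5.1)
    """
    if storage == "T":
        return False                    # no sharing with third parties
    if storage == "R":
        return purpose in {"AO", "TA"}  # automation and alerts only
    if storage == "L":
        return True                     # local storage: all purposes
    raise ValueError(f"unknown storage level: {storage}")


for storage in "LRT":
    print(storage, {p: smart_default(storage, p) for p in ("PH", "LH", "AO", "TA")})
```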

While the “smart default” setting makes a considerable improvement over a naive default,

there is still a lot of room for improvement—even our best prediction model only correctly models

on average 63.76% of the user’s desired settings. This should come as no surprise, as one of the

most consistent findings in the field of privacy is that people differ substantially in their privacy

preferences [65]. As a result, our “one-size fits all” default setting—smart as it may be—is not

very accurate. Therefore, in Chapter 4 we moved beyond “smart default” settings by clustering

participants with similar privacy preferences and creating a set of “smart profiles” covering each of

the clusters [11]. The idea is that the accuracy of the tree for each cluster will likely exceed the

accuracy of our overall prediction model.

In the remainder of this section we apply existing and new clustering methods with the aim of

creating separate “smart profiles” for each cluster. As our goal is to develop simple, understandable

profiles, we keep the parsimony/accuracy trade-off in mind during this process.

5.5.4 Attitude-Based Clustering

Our statistical results indicate that the effects of scenario parameters on users’ decisions

are mediated by their attitudes (Risk, Comfort, Appropriateness, Expectedness and Usefulness),

as shown in Figure 5.12. Therefore, our first attempt to develop “smart profiles” is to cluster

participants with similar attitudes towards the 12 scenarios they evaluated. We averaged the values

per attitude across each participant’s 12 answers, and ran a k-means clustering algorithm to divide

them into 2, 3, 4, 5, and 6 clusters. We then added the participants’ cluster assignments back to

our original dataset, and ran the J48 decision tree algorithm on the dataset with this additional

Cluster attribute for each number of clusters, varying the Confidence Factor from 0.01 to 0.25 with

increments of 0.01. The results are summarized in Figure 5.13, which displays the most accurate

solution for any given tree size and number of clusters.
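The averaging and clustering steps can be sketched as follows. This is a plain-Python version of Lloyd's k-means algorithm with a deterministic initialization (the first k points), chosen here only to keep the example reproducible; it is not the implementation used in the study. The ratings are invented toy data with two scenarios per participant instead of twelve.

```python
def mean_attitudes(answers):
    """Average each attitude over all scenarios a participant rated."""
    return tuple(sum(col) / len(col) for col in zip(*answers))


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; initial centroids are simply the first k points."""
    centroids = list(points[:k])
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment


# Invented ratings: (Risk, Comfort, Appropriateness, Expectedness, Usefulness), 1-7 scale.
ratings = {
    "p1": [(6, 2, 2, 3, 2), (7, 1, 2, 2, 3)],
    "p2": [(2, 6, 6, 5, 6), (1, 7, 6, 6, 6)],
    "p3": [(6, 2, 3, 2, 2), (6, 2, 1, 2, 2)],
    "p4": [(2, 6, 5, 6, 7), (2, 6, 7, 6, 5)],
}
profiles = {name: mean_attitudes(a) for name, a in ratings.items()}
labels = dict(zip(profiles, kmeans(list(profiles.values()), k=2)))
print(labels)
```

The resulting cluster label per participant is what gets added back to the dataset as the additional Cluster attribute.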

Figure 5.12: Different tests conducted for mediation analysis (diagram nodes: PARAMETERS (Who, What, etc.); ATTITUDES (Comfort, Risk, etc.); DECISION (Enable/Disable))

All of the resulting decision trees have Cluster as the root node. This justifies our approach,

because it indicates that the Cluster parameter is very effective for predicting users’ decisions. It

also allows us to split the decision trees at the root node, and create a different “smart profile” for

each subtree/cluster. Note that for some solutions two clusters end up with the same decision tree,

which effectively reduces the number of profiles by 1.

Figure 5.13: Parsimony/accuracy comparison for attitude-based clustering (x-axis: average tree size per profile; y-axis: accuracy; series: 2-profile (2-cluster), 3-profile (3-cluster), 3-profile (4-cluster), 4-profile (5-cluster))

For the 2-cluster solutions (the blue line in Figure 5.13), the highest accuracy is 72.66%,

which is a 14.0% improvement over the best single “smart default” setting. However, this tree has

an average of 121.5 nodes per profile. In comparison, the most parsimonious solution has only 1

node (“disable all”) for one of the clusters, and 3 nodes (“disable sharing with third parties”) for

the other cluster (see Figure 5.14). This solution has an accuracy of 69.44%, which is still an

8.9% increase over the best single “smart default” setting.

For the 3-cluster solutions (the orange line in Figure 5.13), the highest accuracy of 73.47%

is achieved by a set of trees with 26.67 nodes on average (a minimal improvement of 1.1% over the

best 2-cluster solution, but with simpler trees), while the most parsimonious solution has a “disable

all” and an “enable all” tree, plus a tree that is the same as the most parsimonious smart default

setting (see Figure 5.11). This solution has an accuracy of 72.19%, which is a 4.0% increase over

the most parsimonious 2-cluster solution.


Figure 5.14: The most parsimonious 2-profile attitude-based solution.

The 4-cluster solutions (the grey line in Figure 5.13) all result in “over-clustering”: all

solutions based on the 4-cluster Cluster parameter result in two profiles with the same subtree,

effectively resulting in a 3-profile solution. The accuracy of these solutions is actually lower than

the accuracy of similar 3-cluster solutions, so we will not discuss them here.

The 5-cluster solutions (the yellow line in Figure 5.13) are also “over-clustered”, resulting

in 4 profiles. The highest accuracy of 73.56% is achieved by a set of trees with 26 nodes—this is

about the same accuracy and parsimony as the most accurate 3-cluster solution. The same holds for

the most parsimonious 5-cluster solution, which has a similar accuracy and parsimony as the most

parsimonious 3-cluster solution.

The accuracy of the 6-cluster solutions (which result in either 4- or 5-profile solutions) is

lower than the accuracy of similar 5-cluster solutions. Therefore, we will not further discuss these

results.

Reflecting upon the attitude-based clustering results, we observe in Figure 5.13 that there is

indeed a trade-off between accuracy and parsimony: the most parsimonious results are less accurate,

but the most accurate results are more complex. Moreover, the 2-profile solutions are about 5% less

accurate than the 3-profile solutions at any level of complexity. The 4-profile solutions do not

improve the solution much further, though.

The 3-profile solution with an average of 18.33 nodes per profile and 73.26% accuracy pro-

vides a nice compromise between accuracy and parsimony. Part of this decision tree is shown in

Figure 5.15: it contains one “disable all” profile, one “enable all” profile, and a more complex pro-

file with 55 nodes that disallows sharing with third parties and allows remote and local storage

depending on the purpose (not further shown).

Figure 5.15: A 3-profile solution example of attitude-based clustering. (tree labels: cluster (0, 1, 2); cstore (yes, no); share: no; remserv: purposef; locally: purposef)

5.5.5 Agglomerative Clustering

The attitude-based clustering approach requires knowledge of users’ attitudes towards the household IoT information-sharing scenarios, which may not always be available. We developed an alternative method for finding “smart profiles” that follows a hierarchical bottom-up (or agglomerative) approach, using users’ decisions only. This method first fits a separate decision tree for each participant, and then iteratively merges these trees based on similarity. In our previous work [11] only 10 out of the 200 users in the dataset had unique trees fitted to them (all others had an “enable all” or “disable all” tree), making the merging of trees a rather trivial affair. Our current dataset has many more participants, and is more complex, making the agglomerative clustering approach more challenging but also more meaningful.

In the first step, 283 participants’ decision trees predict “enable all”, 414 participants’

decision trees predict “disable all”, while the remaining 436 participants have a multi-node decision

tree.

In the second step, a new decision tree is generated for each possible pair of participants in the “multi-node group”. The accuracy of the new tree is compared against the weighted average of the accuracies of the original trees. The pair with the smallest reduction in accuracy is merged, leaving 435 clusters for the next round of merging. If two or more candidate pairs have the same smallest reduction in accuracy, priority is given to the pair with the most parsimonious resulting tree (i.e., with the smallest number of nodes). If there are still multiple pairs that tie on this criterion, the first pair is picked. The second step is repeated until the predefined number of clusters is reached, and the entire procedure is repeated with 20 random starts to avoid local optima.
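The merging step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the hypothetical `fit_model` helper stands in for the J48 decision tree and only learns a single leaf or a one-level split, `merge_clusters` is an illustrative name, and the 20 random starts and the cross-validation are omitted.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def fit_model(rows):
    """Hypothetical stand-in for a decision tree (not J48).

    rows is a list of (features, decision) pairs. The model is either a
    single leaf (majority decision) or a one-level split on one feature.
    Returns (number of rows predicted correctly, number of nodes).
    """
    labels = Counter(decision for _, decision in rows)
    best_correct, best_size = labels.most_common(1)[0][1], 1
    for f in range(len(rows[0][0])):
        per_value = {}
        for features, decision in rows:
            per_value.setdefault(features[f], Counter())[decision] += 1
        correct = sum(c.most_common(1)[0][1] for c in per_value.values())
        if correct > best_correct:  # split only if it improves the fit
            best_correct, best_size = correct, 1 + len(per_value)
    return best_correct, best_size


def merge_clusters(clusters, k):
    """Merge clusters pairwise until only k remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > k:
        best_key, best_pair = None, None
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            correct_a, _ = fit_model(a)
            correct_b, _ = fit_model(b)
            correct_m, size_m = fit_model(a + b)
            # Weighted average accuracy of the two original models minus the
            # accuracy of the merged model, kept exact with Fraction.
            loss = Fraction(correct_a + correct_b - correct_m, len(a) + len(b))
            key = (loss, size_m)  # tie-break: fewest nodes
            if best_key is None or key < best_key:  # strict: first pair wins ties
                best_key, best_pair = key, (i, j)
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Using exact fractions rather than floating-point accuracies keeps the tie-breaking rules meaningful: two candidate pairs only tie when their reduction in accuracy is truly identical.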

To fit the trees, we use the J48 classifier with a Confidence Factor ranging from 0.01 to 0.25 with increments of 0.01. Surprisingly, smaller tree sizes result in a higher accuracy for agglomerative clustering (see Figure 5.16). This suggests that without extensive pruning, our agglomerative approach arguably overfits the data, resulting in a lower level of cross-validated accuracy.

Figure 5.16: Parsimony/accuracy comparison for agglomerative clustering. [Plot: x-axis, average tree size per profile (0 to 18); y-axis, accuracy (78 to 81); series: 4-cluster, 5-cluster, 6-cluster.]

Figure 5.17: The best 4-profile agglomerative clustering solution.

The best 4-cluster solution has an average of 2 nodes per profile and an accuracy of 79.40%, a 24.53% improvement over the “smart default” and a 7.9% increase over the most accurate 5-cluster/4-profile attitude-based clustering solution. The decision trees are shown in Figure 5.17 (see footnote 4): aside from the “enable all” and “disable all” profiles, there is a “disable sharing with third parties” profile and a “local storage only” profile.

Footnote 4: T: data will be stored on remote servers and shared with third parties; R: data will be stored on remote servers without sharing; L: data will be stored locally. The same abbreviations are used in the following figures in this chapter.

Figure 5.18: The best 5-profile agglomerative clustering solution.

Figure 5.19: The best 6-profile agglomerative clustering solution.

The best 5-cluster solution has an average of 2.4 nodes per profile and an accuracy of 80.35%, a 26.02% improvement over the “smart default”, but only a 1.2% improvement over the 4-cluster agglomerative solution. The decision trees are shown in Figure 5.18: it has the same profiles as the 4-cluster solution, plus an “allow automation and alerts, but don’t track my presence or location in the house” profile.
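The improvement percentages in this section appear to be relative improvements rather than differences in percentage points. The short check below illustrates this with the two accuracies reported for the 4-cluster and 5-cluster agglomerative solutions; the function name `relative_improvement` is illustrative and not part of the original analysis.

```python
def relative_improvement(new_accuracy, baseline_accuracy):
    """Relative improvement of new over baseline, in percent."""
    return (new_accuracy - baseline_accuracy) / baseline_accuracy * 100


# 5-cluster (80.35%) versus 4-cluster (79.40%) agglomerative solution.
gain = relative_improvement(80.35, 79.40)
print(round(gain, 1))  # 1.2
```

The difference of 0.95 percentage points corresponds to the reported 1.2% improvement.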

Finally, the best 6-cluster solution (see footnote 5) has an average of 3.17 nodes per profile and an accuracy of 80.68%, a 26.54% improvement over the “smart default”, but no substantial improvement over the 5-cluster agglomerative solution. The decision trees are shown in Figure 5.19: it has the same profiles as the 5-cluster solution, plus a profile that allows local storage for anything, plus remote storage for any reason except for user profiling (i.e., to recommend other services or to give the user insight into their behavior).

Footnote 5: There is another solution with slightly fewer nodes per profile (2.67) and a slightly lower accuracy (80.60%).


5.5.6 Fit-Based Clustering

We now present a “fit-based” clustering approach that, like the agglomerative approach, clusters participants without using any additional information. Instead, it uses the fit of the tree models to bootstrap the process of sorting participants into different clusters. Our algorithm is similar to the one we used in the previous chapter (shown in Figure 4.5). The detailed steps are as follows:

• Random starts: We randomly divide participants into k separate groups, and learn a tree for

each group. This is repeated until a non-trivial starting solution (i.e., with distinctly different

trees per group) is found.

• Iterative improvements: Once each of the k groups has a unique decision tree, we test for

each participant which of the k trees best represents their 12 decisions. If this is the tree of a

different group, we switch the participant to this group. Once all participants are evaluated

and put in the group of their best-fitting tree, the tree in each group is re-learned with the

data of the new group members. This then prompts another round of evaluations, and this

process continues until no further switches are performed.

• Repeat: Since this process is influenced by random chance, it is repeated 1,000 times in its

entirety to find the optimal solution. Cross-validation is performed in the final step to prevent

over-fitting.
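The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the code used in the study: the hypothetical `learn` helper replaces the J48 tree with a majority vote per storage value, each participant's data is a list of (storage, decision) pairs, only 20 restarts are used instead of 1,000, and the final cross-validation is omitted.

```python
import random
from collections import Counter


def learn(members, data):
    """Hypothetical stand-in for tree learning: majority decision per storage value."""
    votes = {}
    for person in members:
        for storage, decision in data[person]:
            votes.setdefault(storage, Counter())[decision] += 1
    return {storage: c.most_common(1)[0][0] for storage, c in votes.items()}


def fit(model, rows):
    """Number of one participant's decisions that the model predicts correctly."""
    return sum(model.get(storage) == decision for storage, decision in rows)


def random_start(people, data, k, rng, tries=1000):
    """Redraw random groups until all are non-empty and their models differ."""
    for _ in range(tries):
        assignment = {p: rng.randrange(k) for p in people}
        groups = [[p for p in people if assignment[p] == g] for g in range(k)]
        if all(groups):
            models = [learn(g, data) for g in groups]
            if all(models[a] != models[b] for a in range(k) for b in range(a)):
                return assignment, models
    raise ValueError("no non-trivial starting solution found")


def fit_based_clustering(data, k, restarts=20, seed=0):
    rng = random.Random(seed)
    people = sorted(data)
    best_score, best_assignment = -1, None
    for _ in range(restarts):
        assignment, models = random_start(people, data, k, rng)
        switched = True
        while switched:
            switched = False
            for p in people:
                scores = [fit(m, data[p]) for m in models]
                target = max(range(k), key=scores.__getitem__)
                if scores[target] > scores[assignment[p]]:  # strictly better fit
                    assignment[p] = target
                    switched = True
            for g in range(k):  # re-learn each model from its new members
                members = [p for p in people if assignment[p] == g]
                if members:  # an emptied group keeps its previous model
                    models[g] = learn(members, data)
        score = sum(fit(models[assignment[p]], data[p]) for p in people)
        if score > best_score:
            best_score, best_assignment = score, dict(assignment)
    total = sum(len(rows) for rows in data.values())
    return best_assignment, best_score / total
```

A participant only switches groups when another model fits strictly better, so the total number of correctly predicted decisions never decreases and the inner loop is guaranteed to stop.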

We perform this approach to obtain 2-, 3-, 4-, and 5-cluster solutions. To fit the trees, we

use the J48 classifier with a Confidence Factor ranging from 0.01 to 0.25 with increments of 0.01.

The best results are summarized in Figure 5.20.

For the 2-cluster solutions (the blue line in Figure 5.20), the highest accuracy is 76.72%—

a 20.33% improvement over the “smart default” setting and a 5.6% improvement over the most

accurate 2-cluster attitude-based solution. However, this solution has an average of 151.5 nodes per

profile. The most parsimonious solution is exactly the same as the most parsimonious 2-cluster

attitude-based solution (see Figure 5.14), but with a higher accuracy (74.43%).

For the 3-cluster solutions (the orange line in Figure 5.20), the highest accuracy of 80.81% is achieved by a set of trees with 65.33 nodes on average. This is a 26.74% improvement over the “smart default”, a 10.0% improvement over the most accurate 3-cluster attitude-based solution (but at a cost of lower parsimony), and a 5.2% improvement over the best 2-cluster fit-based solution. The most parsimonious solution, on the other hand, has 7 nodes on average, with an accuracy of 79.80%, thereby still outperforming all other 3-profile solutions. The decision trees for this solution are shown in Figure 5.21.

Figure 5.20: Parsimony/accuracy comparison for fit-based clustering. [Plot: x-axis, average tree size per profile (0 to 160); y-axis, accuracy in percent (73 to 84); series: 2-profile, 3-profile, 4-profile, 5-profile.]

Figure 5.21: The most parsimonious 3-profile fit-based solution.

For the 4-cluster solutions (the grey line in Figure 5.20), the highest accuracy of 82.41%

is achieved by a set of trees with 58.25 nodes on average. This is a 29.25% improvement over the

“smart default”, a 3.8% improvement over the 4-cluster agglomerative solution (but at a cost of

lower parsimony), and a 2.0% improvement over the best 3-cluster fit-based solution. The most

parsimonious solution, on the other hand, has 9.25 nodes on average, with an accuracy of 81.88%. It

still outperforms all other 4-profile solutions, but the agglomerative solution is more parsimonious.

The decision trees for this solution are shown in Figure 5.22.


Figure 5.22: The most parsimonious 4-profile fit-based solution.

Figure 5.23: The most parsimonious 5-profile fit-based solution.

For the 5-cluster solutions (the yellow line in Figure 5.20), the highest accuracy of 83.35%

is achieved by a set of trees with 51.4 nodes on average. This is a 30.05% improvement over the

“smart default”, a 3.8% improvement over the 5-cluster agglomerative solution (but at a cost of

lower parsimony), and a 1.1% improvement over the best 4-cluster fit-based solution. The most

parsimonious solution, on the other hand, has 4.2 nodes on average, with an accuracy of 82.92%.

It still outperforms the 5-profile agglomerative solution, but it is slightly less parsimonious. The

decision trees for this solution are shown in Figure 5.23.

5.5.7 Discussion of machine learning results

Figure 5.24 shows a comparison of the presented approaches. The X-axis represents the parsimony (higher average tree size per profile = lower parsimony); the Y-axis represents the accuracy. While the “smart default” setting makes a significant 15.3% improvement over the naive default setting (“disable all”), we observe that having multiple “smart profiles” substantially increases the prediction accuracy even further. The fit-based clustering algorithm performs the best out of all the approaches, followed by agglomerative clustering and attitude-based clustering.

The most parsimonious 2-profile fit-based solution (with an accuracy of 74.43%) is the

simplest of all “smart profile” solutions: one profile is simply “disable all”, while the other profile

is the same as our OneR solution: “disable sharing with third parties”. In fact, these profiles are so

simple, that one might not even want to bother with presenting them to the user: in our current

interface (see Figure 5.6) these defaults are incredibly easy for users to implement by themselves.

The same is true for the 4-profile agglomerative clustering solution (see Figure 5.17) and

the 5-profile agglomerative clustering solution (see Figure 5.18): these profiles involve little more

than a single high-level setting, which users can likely easily make by themselves.

The 5-profile fit-based solution is the most accurate of all “smart profile” solutions. The

most parsimonious 5-profile fit-based clustering solution (Figure 5.23) has an accuracy of 82.92%.

It has the following five profiles:

• Enable all

• Enable local and remote storage, but disable third-party sharing

• Enable local storage only

• Enable local storage for everything except location-tracking, enable remote storage for everything except location- and presence-tracking, and disable third-party sharing

• Disable all
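The five profiles listed above can be written down as a small rule function. This is only a sketch: the profile identifiers, the storage values ("local", "remote", "third_party") and the purpose values ("location", "presence") are illustrative names chosen here, not identifiers from the study's dataset or interface.

```python
def allows(profile, storage, purpose):
    """Return True if the profile enables this storage/purpose combination."""
    if profile == "enable_all":
        return True
    if profile == "no_third_party":
        return storage in ("local", "remote")
    if profile == "local_only":
        return storage == "local"
    if profile == "no_tracking":
        if storage == "local":
            return purpose != "location"
        if storage == "remote":
            return purpose not in ("location", "presence")
        return False  # third-party sharing is disabled
    if profile == "disable_all":
        return False
    raise ValueError(f"unknown profile: {profile}")
```

Only the fourth profile needs to look at both arguments, which is the Storage-by-Purpose interaction that a settings interface has to support.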

The fourth profile in this list specifies an interaction between Storage and Purpose, something that is not possible in our current manual settings interface (which only allows interactions between Who, What, and Purpose). The next section will present a slightly altered interface that accommodates these profiles.

There is another 5-profile fit-based solution with a slightly higher accuracy (83.11%) and

a reasonably simple tree (5 nodes/profile on average). This solution is shown in Figure 5.25. In

this solution, the third profile (“enable local storage only”) is replaced by a slightly more complex

profile (“enable local storage only, but not to recommend other services”). This profile specifies

an additional interaction between Storage and Action. The next section will present a settings

interface that accommodates this profile as well.

Figure 5.24: Summary of All our Approaches. [Plot: x-axis, average tree size per profile (0 to 160); y-axis, accuracy (63 to 84); series: Fit (2 to 5 profiles), Agglomerative (4 to 6 profiles, labeled “Conglomerative” in the legend), Attitude (2 to 5 profiles), and Overall.]

Figure 5.25: A good 5-profile fit-based clustering solution.

Other usable solutions are the 3-profile fit-based solution (Figure 5.21) or the 4-profile fit-based solution (Figure 5.22). However, like almost all of the less parsimonious solutions, these profiles involve higher-order interaction effects, e.g., between Storage, Purpose, and Action, and between Storage, Purpose, and Who. Consequently, a rather more complex interface is needed to accommodate these default profiles.

5.6 Privacy-Setting Prototype Design Using Machine Learning Results (original work)

In Section 5.4 we developed a prototype interface that household IoT users can use to

manually set their privacy settings (see Figure 5.6). Our machine learning analysis (Section 5.5)

resulted in a number of interesting solutions for “smart profiles” that would allow users of this

interface to set their privacy settings with a single click (i.e., a choice of profile). While some of

these profiles can be integrated in our prototype (e.g., the most parsimonious 2-profile fit-based

solution and the 4-profile and 5-profile agglomerative solutions) other profiles have an interaction

effect between variables that are modeled as independent in our current prototype interface (e.g.,

the two 5-profile fit-based solutions presented in Figures 5.23 and 5.25).

In this section we therefore present two modified prototypes that are designed to be compatible with these two 5-profile solutions. These two solutions are not the most accurate, but they produce a parsimonious set of profiles that require only minimal alterations to our interface design. They thus provide the optimal trade-off between prediction accuracy, profile parsimony, and interface complexity.

5.6.1 Interface for the 5-profile fit-based solution with an accuracy of 82.92%

This machine learning solution (Figure 5.23) requires an interaction between the Storage

parameter and the Purpose parameter—two parameters that are controlled independently in the

prototype in Figure 5.6. Our solution is to slightly alter the interface, and add the profile selection

page at the beginning of the interface. As shown in Figure 5.26, from top left, screen 1 is the profile

selection page, screen 2 is the slightly altered landing page of our manual settings interface, screen 3

is the slightly altered Data Storage page, screen 4 (bottom left) is the Device/Sensor Management

page, and screen 5 is the Data Use page.

• Screen 1: On this screen users choose their most applicable default profile. For some users,

the selected profile accurately represents their preferences, while others may want to adjust

the individual settings manually.

• Screen 2: After clicking ‘Next’, users are given the option to select ‘Storage/Sharing &

Device/Sensor Management’ or ‘Data Use’.

• Screen 3: When users select ‘Storage/Sharing & Device/Sensor Management’, they first get to set their sharing preferences for ‘local storage’, ‘remote server’ and ‘third party sharing’ (Storage). Each of these can independently be set to enabled or disabled, but users can also click on ‘More’.

• Screen 4: When users select ‘More’, they can manage Who-What-Purpose combinations for

that particular storage/sharing option.

• Screen 5: When users select ‘Data Use’ on screen 2, they get to enable/disable the use of the

collected data for various secondary purposes (Action).


Figure 5.26: Design for 5-Profile solution presented in Section 5.6.1.


5.6.2 Interface for the 5-profile fit-based solution with an accuracy of 83.11%

The alternative machine learning solution presented in Figure 5.25 requires an additional

interaction between the Storage parameter and the Action parameter. This requires us to slightly

alter the interface again. As shown in Figure 5.27, from top left, screen 1 is the profile selection

page, screen 2 is the slightly altered Data Storage page, screen 3 follows the ‘More’ button to offer

access to screen 4 (bottom left, the Data Use page) and screen 5 (bottom right, the Device/Sensor

Management page).

• Screen 1: The profile selection screen remains unchanged, with the exception that the ‘Local

storage only’ profile is replaced by the more complex ‘Local Storage & No Recommendations’

profile.

• Screen 2: After clicking ‘Next’, users first get to set their sharing preferences for ‘local

storage’, ‘remote server’ and ‘third party sharing’ (Storage). Each of these can independently

be set to enabled or disabled, but users can also click on ‘More’.

• Screen 3: When users select ‘More’, they are given the option to select either ‘Device/Sensor

Management’ or ‘Data Use’.

• Screen 4: When users select ‘Device/Sensor Management’ they can manage Who-What-Purpose combinations for that particular storage/sharing option.

• Screen 5: When users select ‘Data Use’ they get to enable/disable the use of the collected

data for various secondary purposes (Action) for that particular storage/sharing option.

5.6.3 Reflection on design complexity

The interfaces presented in this section have an additional ‘layer’ compared to the original interface presented in Section 5.4. This additional layer makes setting the privacy settings manually more difficult, but it is necessary to accommodate the complexity of the smart profiles uncovered by our machine learning analysis. On the one hand, this demonstrates the value of developing a parsimonious machine learning model: the more accurate but more complex profiles that comprise some of the solutions in Section 5.5 are not only more difficult to explain to the user, they also contain more complex interactions between decision parameters, forcing the manual settings interface to become even more complex. A simple smart profile solution avoids such complexity in the interface.

Figure 5.27: Design for 5-Profile solution presented in Section 5.6.2.

On the other hand, one should not over-simplify the profiles, lest they become overly generic

and inaccurate in representing users’ privacy preferences. Indeed, when we make our smart profile

solutions more accurate, fewer users will need to make any manual adjustments at all, so we can

allow some additional complexity in the interface.

5.7 Limitations

In this section, we discuss the limitations of our work and our plans to evaluate the presented interfaces.

We note that participants in our study made decisions about hypothetical rather than “real

life” scenarios. However, compared to most other privacy studies, our study asks participants about

very specific IoT scenarios, measuring their attitudes and behaviors in the context of these scenarios.

The hypothetical nature of the scenarios is thus a conscious trade-off here: it is impossible to measure

privacy in 4000+ scenarios without presenting them on a screen.

A limitation regarding our machine learning approach is that it assumes a perfect assignment

of users to profiles. However, in our current approach, users of the profile-based interface make their

own choice as to which profile they want to apply. If they do not make the correct choice, then

this introduces additional uncertainty, and the accuracy of our approach will be substantially lower

than described in our paper. This limitation highlights the importance of the parsimony/accuracy

tradeoff: Users benefit from parsimony in the context of our study, because parsimony makes for

simpler profiles, which are easier to understand and hence easier to choose from. At the same time,

though, these more parsimonious profiles are likely going to be less accurate, which means that users

need to make more manual adjustments to the profile-based settings.

Our biggest limitation is that we did not test any of the presented interfaces, so we do not know what level of complexity (in terms of both the user interfaces and the profiles) is most suitable. I will address this limitation in my final study (Chapter 7).


5.8 Summary

In this chapter, we have presented the following:

• Using an intricate mixed fractional factorial study design, we collected a dataset of 1133

participants making 13596 privacy decisions on 4608 scenarios.

• We performed statistical analysis on this dataset to develop a layered IoT privacy-setting

interface. As our analysis shows more complex decision patterns than our previous work,

we presented guidelines to translate our statistical results into a more sophisticated settings

interface design.

• We performed machine learning analysis on our dataset to create a set of “smart profiles”

for our IoT privacy-setting interface. Beyond our work in Chapter 4, we conducted a deeper

analysis regarding the trade-off between parsimony and accuracy of our prediction models,

leading to a better-informed selection of smart profiles.

• Aside from the privacy-setting interface and the smart profiles, we made specific design recommendations for household IoT devices that can help to minimize users’ privacy concerns.

In the next chapter, we discuss our work applying the data-driven approach in the fitness IoT context.


Chapter 6

Recommending Privacy Settings for Fitness IoT

6.1 Introduction

In Chapters 4 and 5, we discussed how we applied the data-driven approach to the general/public IoT and household IoT contexts, respectively. We developed corresponding IoT privacy-setting interface prototypes that integrate smart defaults/profiles by predicting users’ privacy decisions. In this chapter, we present our work in the domain of fitness IoT. We further test the previously developed data-driven approach to design privacy-setting interfaces for users of fitness IoT devices. Note that in moving from general/public IoT to household IoT, and now to fitness IoT, the context we focus on becomes narrower. This change of environment brings new challenges. For example, fitness IoT has no contextual scenarios of the kind we focused on in Chapters 4 and 5. Since almost all current fitness IoT devices must be used together with a corresponding mobile App, and the mobile App is usually what manages users’ private information, we focus on the privacy permissions requested by the mobile Apps. In this chapter, we first collect users’ decisions on fitness IoT permissions. Unlike in previous chapters, we divide the permission data into four groups (discussed in the next section). Thus, each user will have four profiles, one for each type of permission. We then apply our data-driven approach to classify users into groups based on their permission decisions, and create permission profiles for each group. This allows new users to answer very few questions before getting a recommendation of a set of permission profiles, which simplifies users’ task of setting every permission for fitness IoT devices.

6.2 Data Model

As discussed in Section 6.1, most modern fitness trackers guide their users in managing privacy settings by asking them various permission questions. We first investigate the questions asked by mainstream fitness trackers, and then adapt those questions for use in the data model of this study.

As shown in Figure 6.1, we examined the permission questions asked by the mainstream fitness trackers (Fitbit, Garmin, Jawbone, and Misfit) and categorized these questions into 3 groups: Smartphone Permissions, In-App Requests, and Fitness Data.

6.2.1 Smartphone Permissions (S set)

The first group of permissions are the smartphone permissions, which are requested during the installation or the first use of the mobile application. The requested smartphone permissions differ by the brand of the fitness tracker as well as by the mobile operating system of the smartphone. As shown in Figures 6.2a and 6.2b, even for the mobile application from the same manufacturer (Fitbit), the requested smartphone permissions differ between the iOS version and the Android version. We summarize all the smartphone permissions requested by the mobile applications of popular fitness tracker brands across different mobile operating systems (i.e., iOS, Android, and Windows Mobile).

6.2.2 In-App Requests (A set)

Fitness trackers also collect users’ data in their mobile applications. For example, Fitbit asks users to provide their First Name, Last Name, Gender, Height, Weight, and Birth Date when signing up for an account during the first use of the mobile App. Note that these data are mandatory for all fitness trackers in Figure 6.1; the only optional piece of information is Misfit’s request for users’ occupation. Figure 6.3 shows the A set for the Fitbit app (other apps are similar).


Figure 6.1: Comparison of permissions asked by Fitness Trackers and the fitness IoT Data Model used for this study.


(a) The interface of smartphone permissions of Fitbit iOS App

(b) The interface of smartphone permissions of Fitbit Android App

Figure 6.2: Interface examples of Smartphone Permissions requests for Fitbit trackers (S set)

Figure 6.3: Interface example of In-App Permissions requests in Fitbit Android App (A set)


6.2.3 Fitness Data Permissions (F set)

The F set contains fitness-related data that is either automatically collected by the fitness tracker or manually entered by the users, such as food and water logs and friend lists. As shown in Figure 6.1, we follow Fitbit’s permission model for the F set but give users more fine-grained control over Activity and Exercise data by breaking these permissions down into steps, distance, elevation, floors, activity minutes, and calories activity. A total of 14 permissions are included in the F set for our study.

6.2.4 GDPR-based Permissions (G set)

As of May 25, 2018, the European Union (EU) enforces the General Data Protection Regulation (GDPR) [112], which applies to the storage, processing, and use of the subject’s personal data by third parties, which may or may not be established in the EU, as long as they operate in an EU market or access data of EU residents. It requires users to provide explicit consent to privacy options expressed by third parties. The G set includes permissions that are based on GDPR requirements. The purpose of data collection, hasReason, includes safety, health, social, commercial and convenience. The frequency of data access, hasPersistence, includes continuous access, continuous access but only when using the app, and separate permissions for each workout. For the retention period of collected data, hasMaxRetentionPeriod, permissions include retain until no longer necessary, retain until the app is uninstalled, and retain indefinitely. We did not include the hasMethod property since it requires a technical background.

The types of third parties (instances of EntityType) that can request access to the user’s

Fitness data include health/fitness apps, Social Network (SN) apps (public or friends only), other

apps on the user’s phone, and corporate and government fitness programs.
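The G-set properties enumerated above can be captured as a small data structure. The sketch below is illustrative only: the property names hasReason, hasPersistence, and hasMaxRetentionPeriod and their values come from the text, while the Python class names and the `GSetChoices` container are hypothetical. The container assumes that reasons are allowed or denied one by one while persistence and retention are single choices, which is an assumption rather than something stated in this section. Entity types are left out because they are not fully enumerated here.

```python
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):  # hasReason: purpose of data collection
    SAFETY = "safety"
    HEALTH = "health"
    SOCIAL = "social"
    COMMERCIAL = "commercial"
    CONVENIENCE = "convenience"


class Persistence(Enum):  # hasPersistence: frequency of data access
    CONTINUOUS = "continuous access"
    WHILE_USING_APP = "continuous access, but only when using the app"
    PER_WORKOUT = "separate permission for each workout"


class MaxRetentionPeriod(Enum):  # hasMaxRetentionPeriod
    UNTIL_NO_LONGER_NECESSARY = "retain until no longer necessary"
    UNTIL_UNINSTALLED = "retain until the app is uninstalled"
    INDEFINITE = "retain indefinitely"


@dataclass
class GSetChoices:
    """Hypothetical container for one participant's G-set answers."""
    reasons: dict  # maps Reason to True (allow) or False (deny)
    persistence: Persistence
    max_retention: MaxRetentionPeriod


example = GSetChoices(
    reasons={reason: reason in (Reason.HEALTH, Reason.SAFETY) for reason in Reason},
    persistence=Persistence.WHILE_USING_APP,
    max_retention=MaxRetentionPeriod.UNTIL_UNINSTALLED,
)
```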

6.3 Dataset

The dataset we use in this study was collected by my colleague Odnan. 310 participants were asked to set up a new account using a fitness tracker mobile App similar to Fitbit. They were then asked the 4 groups of questions that we discussed in our data model. For each question, the answer is either “Allow” or “Deny”, meaning that the participant is either willing or unwilling to provide information for that permission. After answering these questions, participants were asked to fill out a survey questionnaire measuring their privacy-related attitudes (i.e., Trust, Privacy concerns, Perceived surveillance and intrusion, and Concerns about the secondary use of personal information), the negotiability of their privacy settings, their social behaviour (social influence and sociability), exercise tendencies (a proxy for their attitude and knowledge about fitness tracking), and demographic information.

As shown in Figure 6.4, participants tend to have a higher disclosure rate for their demographic information (A set), which is in line with the results of other studies [64].

For the smartphone permissions (S set), participants are more likely to allow motion, location, bluetooth, and mobile data, which are usually the minimum permissions required for a fitness mobile App to work. In the S set, access to contacts and photos are the least allowed permissions.

Regarding the G set, participants seem most open to data collection for health (the main purpose of a fitness tracker) and safety (another popular purpose often advertised by the manufacturers). On the other hand, users are less likely to agree to data collection with an indefinite retention period, and they prefer not to share data with government fitness programs or publicly on social media.

6.4 Predicting users’ Preference (partial original work)

We predict participants’ allow/deny decisions using machine learning methods. Our dataset shows considerable variability between participants’ privacy preferences, a finding that is broadly reflected in the privacy literature [65]. Using clustering, one can capture the preferences of various users with a higher level of accuracy. Hence, the goal of this section is to find a concise set of profiles (clusters) that can represent the variability of the permission settings among our study participants.

We cluster participants’ permissions with Weka (see footnote 1) using the K-modes clustering algorithm with default settings. The K-modes algorithm follows the same principles as the more common K-means algorithm, but it is more suitable for the nominal variables in our dataset.
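To make the difference from K-means concrete, the sketch below shows the general K-modes idea in plain Python: distances count mismatching attributes instead of squared differences, and each cluster center is the per-attribute mode instead of the mean. It is a minimal illustration, not Weka's implementation; the function names, the seeding from distinct records, and the tie-breaking are assumptions made here.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, seed=0, max_iter=100):
    """Cluster nominal records; returns (labels, modes)."""
    rng = random.Random(seed)
    records = [tuple(r) for r in records]
    # Start from k distinct records (requires at least k distinct records).
    modes = rng.sample(sorted(set(records)), k)
    labels = None
    for _ in range(max_iter):
        # Assign every record to its nearest mode (first mode wins ties).
        new_labels = [min(range(k), key=lambda c: hamming(r, modes[c])) for r in records]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mode
                modes[c] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return labels, modes
```

With allow/deny answers coded as 1 and 0, each resulting mode reads directly as a permission profile: the majority answer of the cluster for every permission.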

6.4.1 Overall Prediction

In our first clustering attempt, we tried to find a set of profiles by clustering the full dataset, including the A, F, S, and G subsets. A drawback of this method is that if we cluster the users into n clusters, only n possible profiles are available for recommendation to the users. A further drawback of clustering based on the full set of 45 permissions is that it has high error rates (e.g., the sum of squared errors is 1435 for the viable 4-cluster solution and 1688 for the 2-cluster solution). In addition, the resulting profile is complicated, since the settings from all four sets are presented in a single profile, making it difficult to explain to the users.

Footnote 1: https://www.cs.waikato.ac.nz/ml/weka/

Figure 6.4: Average value of each privacy permission (1 = allow, 0 = deny).

Figure 6.5: Evaluation of different numbers of clusters for each set. Panels: (a) S dataset, (b) A dataset, (c) F dataset, (d) G dataset.

If we instead generate a separate set of n “subprofiles” for each of the four datasets (A, F, S, and G), n^4 different combinations of profiles can be used for recommendation, providing finer-grained privacy-setting controls to the users compared to clustering the full set. In addition, error rates are lower when clustering each set separately, as shown in Figure 6.5. For example, with only 2 clusters per set, the sum of squared errors reduces to 1277 (a 24.3% reduction relative to the 2-cluster solution on the full dataset). An additional benefit is that the profiles for each set can be investigated in more detail.
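The two numbers in this paragraph can be checked with a few lines of arithmetic. The snippet uses only values reported in the text; the variable names are illustrative.

```python
n_subprofiles = 2   # clusters per permission set
n_sets = 4          # the A, F, S, and G sets
n_combinations = n_subprofiles ** n_sets
print(n_combinations)  # 16

sse_full = 1688      # 2-cluster solution on the full dataset, as reported
sse_separate = 1277  # 2 clusters per set, as reported
reduction = (sse_full - sse_separate) / sse_full * 100
print(round(reduction, 1))  # 24.3
```

With two subprofiles per set, users can thus be matched to one of 16 combined profiles instead of one of 2.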

In our dataset the fitness data permissions (F set) are specified repeatedly for each Entity Type (part of the G set). We tried to cluster these combinations, taking into account all 98 features (i.e., 14 fitness data permissions for each of 7 entity types). This analysis resulted in two profiles: one that had “allow all” for health and SN public entities (and “deny all” for all other entities), and one that had “deny all” for all entities. This means that: a) very similar results can be obtained by considering the fitness data permissions separately from the Entity Type, and b) as expected, the “who” parameter (Entity Type) is more important than the “what” parameter (fitness data permissions).


(a) S dataset (b) A dataset

(c) F dataset (d) G dataset

Figure 6.6: Privacy profiles from the two clustering methods: 1-cluster results (full data) and 2-cluster results (privacy subprofiles) for each dataset (allow = 1, deny = 0, except for frequency & retention).

In the following, we will discuss our method that generates subprofiles for each of the four

datasets.

6.4.2 2-Cluster Solution

We first investigate the optimal number of clusters by running the K-modes algorithm for 1 to

6 clusters with a 70/30 train/test ratio, using the sum of squared errors of the test set for evaluation.

The results are shown in Figure 6.5. Using the elbow method [69], we conclude that 2 is the optimal

number of clusters for each dataset (see footnote 2).
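The procedure can be sketched as follows. This is a minimal, standard-library stand-in rather than the Weka setup used in our analysis: the data are synthetic, the farthest-first seeding is an assumption made for reproducibility, and the held-out cost counts mismatches to the nearest centroid, which equals the sum of squared errors for 0/1 permissions.

```python
import random
from collections import Counter

def hamming(a, b):
    """Number of positions where two categorical vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent value in each column: the k-modes centroid update."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def farthest_first(rows, k):
    """Deterministic seeding (an assumption): spread the initial centroids apart."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(max(rows, key=lambda r: min(hamming(r, c) for c in centroids)))
    return centroids

def k_modes(rows, k, max_iter=20):
    centroids = farthest_first(rows, k)
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for r in rows:
            nearest = min(range(k), key=lambda i: hamming(r, centroids[i]))
            groups[nearest].append(r)
        updated = [column_modes(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def cost(rows, centroids):
    """Total mismatches to the nearest centroid (equals SSE for 0/1 data)."""
    return sum(min(hamming(r, c) for c in centroids) for r in rows)

# Synthetic allow/deny data with two latent groups (made up for illustration).
rng = random.Random(0)

def make_user(p_allow, n_items=10):
    return tuple(int(rng.random() < p_allow) for _ in range(n_items))

data = [make_user(0.9) for _ in range(100)] + [make_user(0.1) for _ in range(100)]
rng.shuffle(data)
split = len(data) * 7 // 10  # 70/30 train/test split
train, test = data[:split], data[split:]

# Held-out cost for 1 to 6 clusters; the "elbow" is where the curve flattens.
test_cost = {k: cost(test, k_modes(train, k)) for k in range(1, 7)}
for k, c in test_cost.items():
    print(k, c)
```

On data with two latent groups, the held-out cost drops sharply from 1 to 2 clusters and changes little afterwards, which is the pattern the elbow method looks for.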

The final cluster centroids of the 2-cluster solution for each dataset are shown in Figure 6.6,

together with the results of the 1-cluster solution. We describe the subprofiles of each set in the

subsections below.

2 We obtain similar results using other clustering algorithms, such as hierarchical clustering.


6.4.2.1 The S Set

• Minimal (cluster 0): this subprofile allows the minimum permissions needed to effectively run

a fitness app. This includes identity, location, bluetooth, motion & fitness, and mobile data

permissions.

• Unconcerned (cluster 1): this subprofile allows all permissions in this dataset.

6.4.2.2 The A Set

• Anonymous (cluster 0): this subprofile shares only users’ gender, height and weight informa-

tion but not their birth date or first and last name.

• Unconcerned (cluster 1): this subprofile shares all data requested in this dataset.

6.4.2.3 The F Set

• Unconcerned (cluster 0): this subprofile shares all fitness data with third parties.

• Strict (cluster 1): this subprofile does not share any fitness data with third parties.

6.4.2.4 The G Set

• Socially active (cluster 0): this subprofile shares data with health/fitness apps and social

network friends, but not with other recipients. Sharing is allowed for health, safety, and social

purposes but not for commercial purposes.

• Health-focused (cluster 1): this subprofile does not allow sharing with any third parties.

Sharing is allowed only for health and safety purposes.

6.5 Profile Prediction (partial original work)

Now that we have identified two privacy “subprofiles” per dataset, the next step is to find

predictors for the profiles and predict which subprofiles each participant belongs to.

Recommender systems usually ask users to evaluate a few items before giving recommen-

dations regarding all remaining items. Likewise, in our system, we might be able to identify certain

permission items inside each privacy subprofile that—when answered by the user—could drive the


(a) S set (86.42%) (b) A set (95.85%)

(c) F set (97.74%) (d) G set (82.26%)

Figure 6.7: The permission drivers for the privacy subprofiles and their respective prediction accuracies.

prediction. Since the items are the permission preferences included in the subprofiles, we define this

as the “direct prediction” approach.

We also explored whether the items from our questionnaire could drive the

prediction. Since these items are not part of the privacy subprofiles, we define this as the “indirect

prediction” approach. For each approach and for each subset of data (S, A, F, and G sets), we

develop decision trees that will enable us to predict which subprofile best describes a user. The trees

contain the subprofile items (direct prediction) or questionnaire items (indirect prediction) that can

be asked to classify each user into their correct subprofile.

We developed our decision trees using the J48 decision tree learning algorithm and evaluated

the resulting decision trees using cross-validation.

6.5.1 Direct Prediction Questions

In our direct prediction approach, the aim is to ask users to answer certain permission

items from each subset as a means to classify them into the correct subprofile (thereby providing a

recommendation for the remaining items in that subset). For this approach, we thus classify users

using the items in the subset as predictors.

Our results for this approach are reported in Figure 6.7. It shows for each subset the question

that best classifies our study participants into the correct subprofile.

When running tree-based algorithms, a trade-off has to be made between the parsimony

and the accuracy of the solution. Parsimony prevents over-fitting and promotes fairness and can be


accomplished by pruning the decision trees. In our study, while multi-item trees may provide better

predictions, the increase in accuracy is not significant compared to the single-item trees presented

in Figure 6.7. These single-item solutions already obtain a high accuracy, and their parsimony

prevents over-fitting and minimizes the number of questions that users need to answer

in order to receive accurate recommendations. The resulting solution involves a 4-question

input sequence, one question for each subset.
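To illustrate what a single-item tree amounts to, the sketch below picks the one binary permission that best separates two subprofiles. It is a simple one-rule stand-in, not the J48 implementation used in our analysis, and the six toy users and item names are hypothetical.

```python
from collections import Counter

def best_single_item(users, labels):
    """Return (item, accuracy, rule) for the single binary item that best predicts the label."""
    best = None
    for item in users[0]:
        # For each answer (0 = deny, 1 = allow), predict the majority label in that group.
        rule = {}
        for answer in (0, 1):
            group = [lab for u, lab in zip(users, labels) if u[item] == answer]
            if group:
                rule[answer] = Counter(group).most_common(1)[0][0]
        correct = sum(rule.get(u[item]) == lab for u, lab in zip(users, labels))
        accuracy = correct / len(users)
        if best is None or accuracy > best[1]:
            best = (item, accuracy, rule)
    return best

# Hypothetical S-set answers and cluster labels (toy data, not from our dataset).
users = [
    {"photos": 1, "location": 1},
    {"photos": 1, "location": 1},
    {"photos": 1, "location": 0},
    {"photos": 0, "location": 1},
    {"photos": 0, "location": 1},
    {"photos": 0, "location": 0},
]
labels = ["Unconcerned", "Unconcerned", "Unconcerned", "Minimal", "Minimal", "Minimal"]

item, accuracy, rule = best_single_item(users, labels)
print(item, accuracy, rule)
```

In this toy example the least-shared item carries all the information about the subprofile, mirroring the role of the Photo permission for the S set.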

For the S set, the Photo permission is the best subprofile predictor. This is one of the

least-shared permissions (see Figure 6.4), and 94% of participants who give this permission are

correctly classified into the “Unconcerned” subprofile, while 83% of participants who do not give

this permission are correctly classified into the “Minimal” subprofile.

For the A set, First name is the best predictor. Again, 94% of participants who share their

first name are correctly classified into the “Unconcerned” subprofile, while 98% of participants who

do not share their first name are correctly classified into the “Anonymous” subprofile.

For the F set, Activity minutes permission is the best predictor. This is one of the most-

shared permissions. 97% of participants who give this permission are correctly classified into the

“Unconcerned” subprofile, while 100% of participants who do not give this permission are correctly

classified into the “Strict” subprofile.

Finally, for the G set, the best predictor is whether the participant allows data collection for

Social purposes. If so, participants are correctly classified into the “Socially active” subprofile with

84% accuracy; otherwise they are classified into the “Health-focused” subprofile with 80% accuracy.

6.5.2 Indirect Prediction Questions

A similar procedure was applied to the questionnaire data concerning the following cate-

gories of user traits: privacy attitude, social behavior, negotiability, exercise tendencies and user

demographics (cf. Table ?? in Appendix). As will be shown below, the indirect prediction approach

has a lower accuracy than the direct approach presented in Section 6.5.1. This is expected since

the questionnaire items about user traits have no direct relationship with the permission settings in

the privacy profiles. These results are still interesting, though, since they allow the user to avoid

making any specific privacy settings. Moreover, the resulting predictors show interesting semantic

relationships with the datasets they predict. We discuss these results in more detail below.


(a) S set (66.04%) (b) A set (65.28%)

(c) F set (69.81%) (d) G set (62.26%)

Figure 6.8: The attitude drivers for the privacy subprofiles and their respective prediction accuracies.

6.5.2.1 Privacy Attitudes

We first attempted to use privacy attitudes as predictors of users’ subprofiles. The resulting

trees for this indirect prediction are shown in Figure 6.8.

Among all the privacy attitude questions, “trust” and “privacy concern” are found to be

predictors of user subprofiles. Interestingly, a single privacy concern question (“I

believe other people are too concerned with online privacy issues”) predicts the user’s S and

F subprofiles. For the S set, those who agree that other people are too concerned about privacy issues belong

to the “Unconcerned” subprofile, while those who have higher concerns tend to be in the “Minimal”

subprofile. The same goes for the F set: those who strongly disagree (1 on a 7-point scale),

i.e., who consider privacy a major concern, belong to the “Strict” subprofile. Otherwise they are classified

as “Unconcerned”.

The trust question, “I believe the company is honest when it comes to using the information

they provide”, can be used to predict users’ subprofile for the A set. Participants are

assigned to the “Anonymous” subprofile if they answer this question with “somewhat disagree” (3)

or below. Those who indicate higher levels of trust are assigned to the “Unconcerned” subprofile.

The A set concerns information provided directly to the fitness app, so it makes sense that trust is

a significant predictor of users’ willingness to provide such information.

For the G set, those users who agree (6) or extremely agree (7) with the question “I believe

the company providing this fitness tracker is trustworthy in handling my information” are classified

in the “Socially active” subprofile, while the remaining users are classified in the “Health-focused”


(a) S set (65.66%) (b) A set (61.89%)

(c) F set (69.43%) (d) G set (61.89%)

Figure 6.9: The social behavior drivers for the privacy subprofiles and their respective prediction accuracies.

subprofile. This question fits the G set well, since GDPR permissions mostly concern how

third parties handle user information. In particular, it makes sense that users who do not trust

the fitness app in handling their information would be assigned to the “Health-focused” profile,

since this profile prevents the app from sharing their data with any other entity and only allows data

collection for the purpose of health and/or safety.

These results show that we managed to capture some semantically relevant relationships

between users’ attitudes and their assigned privacy profiles. Because the S and F sets share the same

predictor question, the final solution is a 3-question input sequence, one question fewer

than the direct prediction sequence in Section 6.5.1.
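The attitude-based rules described above can be written down as simple thresholds on 7-point answers. The sketch below covers only the A, F, and G sets, since the text states explicit cut-offs for these; the function and parameter names are our own illustrative choices.

```python
def attitude_subprofiles(others_too_concerned, company_honest, company_trustworthy):
    """Map 7-point attitude answers (1 = strongly disagree, 7 = extremely agree) to subprofiles.

    The S set is omitted because no explicit cut-off is stated for it.
    """
    return {
        # "Somewhat disagree" (3) or below on the honesty item -> Anonymous.
        "A": "Anonymous" if company_honest <= 3 else "Unconcerned",
        # Strongly disagreeing (1) that others are too concerned -> Strict.
        "F": "Strict" if others_too_concerned == 1 else "Unconcerned",
        # Agree (6) or extremely agree (7) on trustworthiness -> Socially active.
        "G": "Socially active" if company_trustworthy >= 6 else "Health-focused",
    }

cautious = attitude_subprofiles(others_too_concerned=1, company_honest=3, company_trustworthy=5)
trusting = attitude_subprofiles(others_too_concerned=5, company_honest=6, company_trustworthy=7)
print(cautious)
print(trusting)
```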

6.5.2.2 Social Behavior

We also tried to find predictors among the questions about social influence and sociability.

The resulting trees for this indirect prediction are shown in Figure 6.9.

A single sociability question can be used to predict subprofiles for both the S and A sets. For

the S set, users who are completely open (1) to the idea of meeting new friends when they exercise are

classified in the “Unconcerned” subprofile, otherwise they are classified in the “Minimal” subprofile.

For the A set, users who are likely not (6) or definitely not (7) open to meeting new friends

are classified in the “Anonymous” subprofile, otherwise they are classified in the “Unconcerned”

subprofile.

For the F set, users who have never (7) met any new friends while exercising are classified


(a) S set (73.21%) (b) A set (62.26%)

(c) F set (72.08%) (d) G set (66.41%)

Figure 6.10: The user negotiability drivers for the privacy subprofiles and their respective prediction accuracies.

into the “Strict” subprofile, while others are classified into the “Unconcerned” subprofile. This, as

well as the findings regarding the S and A sets, seems to suggest that users’ disclosure of personal

information is likely to be related to their tendency to socialize while using fitness apps.

For the G set, users who are influenced to do exercise if their social media friends also

exercise (i.e., “definitely yes” to “neutral” (1-4)) are classified into the “Socially active” subprofile,

otherwise they are classified into the “Health-focused” subprofile.

Again, we found interesting semantic relationships between social influence and sociability

while exercising and users’ privacy-related behaviors: users who are more prone to reap social

benefits from exercising are more likely to give the app more widespread permissions. Similar to

privacy attitudes, these predictors only involve a 3-question input sequence.

6.5.2.3 Negotiability of Privacy Settings

We also attempted to use the negotiability of users’ privacy settings as input for the sub-

profile prediction. Figure 6.10 shows the tree-learning solutions for this approach.

For the S set, users who are willing to give the Phone permission (access phone calls and call

settings) if the benefits increase are classified into the “Unconcerned” subprofile, while users who

refuse to share the Phone permission even if the benefits increase are classified into the “Minimal”

subprofile. In other words, the privacy preferences of the latter group are not negotiable; they will

still share only the minimum permissions needed to run the tracker, even if the benefits increase.

For the A set, users who are willing to give the Identity permission (account and/or profile

information) if the risks decrease are classified into the “Unconcerned” subprofile, otherwise they


are classified into the “Anonymous” subprofile. Interestingly, the Identity permission is part of the

S set rather than the A set, but it semantically coincides with the items in the A set, which include

the user’s name and birth date (i.e., identifying information). As such, it makes sense that users

who are unwilling to share their phone’s identifier even when the risks decrease are also unwilling

to share their personal identity information.

For the F set, users who share their Sleep fitness data with other third parties if the risks

decrease are classified into the “Unconcerned” subprofile, otherwise they are classified into the

“Strict” subprofile. Users in the latter subprofile will not share their fitness data with any other

third parties, even if the risk decreases.

For the G set, users who share their fitness app Profile with other third parties if the risks

decrease are classified into the “Socially active” subprofile, otherwise they are classified into the

“Health-focused” subprofile. Even though Profile is a permission from the F set, it semantically

coincides with the subprofiles of the G set: users in the “Socially active” subprofile tend to have

permissions that allow them to connect to others while exercising, and sharing one’s fitness app

Profile is indeed a potential way to connect to other users. As such, it makes sense that users in

this subprofile are more willing to share their fitness app Profile if the risks of doing so decrease.

The classification accuracy of the negotiability questions is the highest among all “indirect

prediction” approaches. The most predictive questions also have understandable semantic relation-

ships with the datasets they predict.

6.5.2.4 Exercise Tendencies and User Demographics

We applied the J48 learning algorithm to the group of exercise tendency questions and user

demographics as well, but we found no significant predictors among these questions. While other

studies have found user demographics to be significant predictors of privacy behaviors [64], in this

particular study we were not able to find any significant predictors among the group of user demo-

graphics.

6.5.3 Tree Evaluation

Figure 6.11 shows the root mean square error of all the trees produced by the J48 classifier.

The evaluation has been executed with k-fold cross validation with k = 10.
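As a rough illustration of this evaluation, the sketch below runs 10-fold cross-validation on a single-item rule and reports the root mean square error of its 0/1 predictions. It is a simplification under stated assumptions: the data are synthetic, folds are assigned by index, and the error is computed on hard class predictions, whereas Weka computes it on class probability estimates.

```python
from collections import Counter
from math import sqrt

def fit_stump(xs, ys):
    """Map each answer (0/1) to the majority class (0/1) seen in the training data."""
    default = Counter(ys).most_common(1)[0][0]
    rule = {}
    for answer in (0, 1):
        group = [y for x, y in zip(xs, ys) if x == answer]
        rule[answer] = Counter(group).most_common(1)[0][0] if group else default
    return rule

def cross_validated_rmse(xs, ys, k=10):
    """k-fold cross-validation: train on k-1 folds, collect squared errors on the held-out fold."""
    squared_errors = []
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        held_out = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        rule = fit_stump([x for x, _ in train], [y for _, y in train])
        squared_errors += [(rule[x] - y) ** 2 for x, y in held_out]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Synthetic example: the class equals the answer, except for 10 of 100 users.
xs = [i % 2 for i in range(100)]
ys = list(xs)
for i in range(0, 100, 10):
    ys[i] = 1 - ys[i]

rmse = cross_validated_rmse(xs, ys)
print(round(rmse, 4))
```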

As expected, the “direct prediction” approach results in lower error rates than the various


“indirect prediction” approaches, since in the former approach the items are a direct part of the

privacy settings that constitute the subprofiles. Among the “indirect prediction” approaches, the

negotiability of privacy settings has slightly lower error rates. This is not surprising, since it is at

least partially related to the privacy settings (yet evaluates whether those settings will change under

certain conditions). The prediction accuracies of each tree are reported on the branches in their

respective figures (Figures 6.7 to 6.10), and take the form of (# assigned / # incorrect).

6.6 Privacy-setting Recommendations (partial original work)

In this section, we describe different types of guided privacy-setting approaches for fitness

IoT users that are based on the previous clustering and machine learning results.

6.6.1 Manual Setting

The baseline privacy settings interface is one where users have to manually set their settings

(see Figure 6.12). If users do this correctly, these manual settings should match their privacy preferences

100%. However, the process of manually setting one’s privacy settings can be very burdensome

for the user; our system has a total of 45 permissions that must be managed. Under such a

burden, users are likely to make mistakes [82], so the 100% accuracy may not be achieved

through manual settings.

The next strategies exploit the results of the analysis in the previous section to provide

interactive recommendations that simplify the task of privacy permission setting, with different

levels and types of user intervention.

6.6.2 Single Smart Default Setting

One way to reduce the burden of privacy management is with a single “smart” default setting.

Rather than having the user set each permission manually, this solution already selects a default

setting for each permission. Users can then review these settings and change only the ones that do

not match their preferences.

The optimal “smart” default is a set of settings that is aligned with the preferences of the

majority of users. Hence, we can calculate these settings by using the cluster centroid of the 1-cluster

solution (i.e., the full dataset “single cluster” in Figure 6.6). Figure 6.13 shows the resulting default


values for each dataset. If users are unhappy with these settings, they can still make specific

changes. Otherwise, they can keep them without making any changes.
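For allow/deny permissions, the centroid of a 1-cluster K-modes solution is simply the most common value of each permission. A minimal sketch, with three hypothetical users and illustrative permission names:

```python
from collections import Counter

def smart_default(all_settings):
    """Per-permission majority value: the centroid of a 1-cluster k-modes solution."""
    permissions = all_settings[0].keys()
    return {
        p: Counter(user[p] for user in all_settings).most_common(1)[0][0]
        for p in permissions
    }

# Hypothetical settings (1 = allow, 0 = deny); not taken from our dataset.
all_settings = [
    {"location": 1, "photos": 0},
    {"location": 1, "photos": 0},
    {"location": 0, "photos": 1},
]

default = smart_default(all_settings)
print(default)
```

Users whose preferences differ from the majority on a given permission only need to change that one setting.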

6.6.3 Pick Subprofiles

The single smart default setting works best when most users have preferences similar to the

average. However, our dataset shows considerable variability in participants’ privacy preferences—

a finding that is broadly reflected in the privacy literature [65]. This brings us to our clustering

solutions, which create separate default settings (in the form of subprofiles) for distinct groups of

users.

Our first approach in this regard is to have users manually select which privacy subprofiles

they prefer. Figure 6.14 shows the subprofile selection interface for the S set. Users can choose

either the “Minimal” or “Unconcerned” subprofile. Similar interfaces are provided for the F, A, and

G sets.

The subprofiles provided by this approach have a higher overall accuracy than the single

“smart” default described in Section 6.6.2, meaning that the user could possibly spend less effort

changing the settings. However, the user will have to select a subprofile for each dataset. This

highlights the importance of having a small number of subprofiles and making these subprofiles easy

to understand. That said, even with only two subprofiles per dataset, this can be a challenging task.

In the next two subsections, we address this problem by automatically selecting subprofiles based

on users’ answers to specific subprofile items (“direct prediction”) or questionnaire items (“indirect

prediction”).

6.6.4 Direct Prediction

For the direct prediction approach, we devise an interactive 4-question input sequence as

shown in Figure 6.15. Each screen asks the user to answer a specific permission question, which

guides the subprofile classification processes as outlined in Section 6.5.1. In effect, each question

informs the system about the user’s subprofile of one of the four datasets, which means that users

no longer have to manually pick the correct subprofiles. Specifically, users will be asked if they

agree to share their First name (for the A set recommendation), Activity minutes (for the F set), Photos (for

the S set), and whether they allow their data to be used for Social purposes (for the G set). This


4-question interaction will aid the users in setting all of the 45 permissions in the system. Depending

on the answer to these questions, the user will subsequently see the settings screens with the defaults

set to the predicted profile. Users can still change specific settings if their preferences deviate from

the selected profile.
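The mapping from the four answers to the predicted subprofiles follows directly from the single-item trees in Section 6.5.1. A minimal sketch, in which the dictionary keys and function name are hypothetical:

```python
# (question, subprofile if the user allows it, subprofile if the user denies it),
# following the single-item trees in Section 6.5.1.
DIRECT_RULES = {
    "S": ("photos", "Unconcerned", "Minimal"),
    "A": ("first_name", "Unconcerned", "Anonymous"),
    "F": ("activity_minutes", "Unconcerned", "Strict"),
    "G": ("social_purposes", "Socially active", "Health-focused"),
}

def predict_subprofiles(answers):
    """Turn four allow/deny answers into one predicted subprofile per dataset."""
    return {
        dataset: (if_allowed if answers[question] else if_denied)
        for dataset, (question, if_allowed, if_denied) in DIRECT_RULES.items()
    }

predicted = predict_subprofiles({
    "photos": False,
    "first_name": False,
    "activity_minutes": True,
    "social_purposes": True,
})
print(predicted)
```

The predicted subprofiles then serve as the defaults on the settings screens, which the user can still adjust.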

6.6.5 Indirect Prediction

For the indirect prediction approach, we take a similar approach, but the interactive 4-

question input sequence is based on the analysis of questionnaire items rather than permission

settings.

As shown in Figure 6.16, we selected 4 questions that yield the highest accuracy for each

permission set: a negotiability question for Phone permissions for the S set, a negotiability question

for the permission to share Sleep data for the F set, a question about sociability for the A set, and

a trust question for the G set. Negotiability and attitude have almost the same accuracy for the G set,

so we chose attitude for diversity.

The benefit of the indirect prediction approach is that the user does not have to answer any

permission questions, not even the four needed to give a subprofile recommendation. Instead, the

user has to answer four questionnaire items.

6.7 Validation

We conducted a validation of these different approaches by running the recommendation

strategies on the 30 users in our holdout dataset. The resulting recommended privacy subprofiles

are then compared with their actual privacy preference. Figure 6.17 shows the average accuracies

of each of the presented approaches.
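The accuracy measure can be sketched as the share of permissions for which the recommended setting equals the user's own setting, averaged over the holdout users. The profiles and users below are hypothetical, and "contacts" is an illustrative permission name.

```python
def setting_accuracy(recommended, actual):
    """Share of permissions where the recommended default equals the user's own setting."""
    return sum(recommended[p] == actual[p] for p in actual) / len(actual)

def mean_accuracy(recommendations, holdout):
    """Average per-user accuracy over paired (recommended, actual) settings."""
    scores = [setting_accuracy(r, a) for r, a in zip(recommendations, holdout)]
    return sum(scores) / len(scores)

# Two hypothetical S-set subprofiles (1 = allow, 0 = deny).
minimal = {"identity": 1, "location": 1, "photos": 0, "contacts": 0}
unconcerned = {"identity": 1, "location": 1, "photos": 1, "contacts": 1}

# Two hypothetical holdout users and the subprofile recommended to each.
holdout = [
    {"identity": 1, "location": 0, "photos": 0, "contacts": 0},  # 3 of 4 match "minimal"
    {"identity": 1, "location": 1, "photos": 1, "contacts": 1},  # 4 of 4 match "unconcerned"
]
avg = mean_accuracy([minimal, unconcerned], holdout)
print(avg)
```

Under this measure, a misclassified user still receives partial credit for every permission on which the wrong subprofile happens to agree with their preferences.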

The Pick Subprofiles approach reaches an accuracy of 84.74%. This approach has the highest

accuracy, because only the error from the difference between the privacy profile and the users’

settings is counted, omitting the errors introduced by the user classification. This assumes that

users can classify themselves with perfect accuracy—this is likely an incorrect assumption.

Among recommendation approaches, the direct prediction approach is the most accurate,

averaging 83.41%. It yields almost no additional classification error compared to the Pick Subprofiles

approach. The indirect prediction approach has a significantly lower accuracy of 73.9%.


Finally, the single smart default approach uses only a single “profile”, circumventing the

need for classification. The default profile settings are shown in the ‘full data’ column of Figure 6.6.

The accuracy of this setting is lower than the accuracy of the subprofile solutions, but it does not

lose accuracy on classification. Hence, its accuracy is a respectable 68.7%, which is not much lower

than the indirect prediction approach.

The details about accuracies are provided in Table A1 in the Appendix.

6.8 Summary

In this chapter, we have presented the following:

• The dataset we used and our data model for fitness IoT permissions.

• A data-driven approach to developing user permission profiles.

• A series of recommendation strategies that we developed for privacy management, including

direct prediction and, more interestingly, indirect prediction using user traits (users’

privacy attitudes, the negotiability of their preferences, and social influence).

One limitation of this work is that we have not tested the suitability of the recommendation

strategies from the user’s perspective. Specifically, we have conjectured that profile-based approaches

reduce the hassle of making privacy settings but that the manual selection of a privacy profile might

be difficult for a user. These conjectures should be evaluated in a user study, which we are currently

working on.

In the next chapter, we discuss the evaluation study for our household IoT privacy-setting

interface prototype.


Figure 6.11: Tree evaluation. Root mean square error for each J48 tree algorithm.


(a) A set (b) F set

(c) S set (d) G set

Figure 6.12: Manual settings


(a) A set (b) F set

(c) S set (d) G set

Figure 6.13: Smart Single settings.


(a) S set subprofiles (b) The “Minimal” subprofile (c) The “Unconcerned” subprofile

Figure 6.14: Interaction for picking a subprofile for the S set.


(a) A set (b) F set

(c) S set (d) G set

Figure 6.15: Direct Prediction questions.

(a) A set (b) F set

(c) S set (d) G set

Figure 6.16: Indirect Prediction questions.

Figure 6.17: Average accuracies of the recommender strategies on the holdout 30 users.


Chapter 7

Evaluate the Household IoT

Privacy-setting Profiles and User

Interfaces

7.1 Introduction

In the previous chapters, we have described three studies on recommending privacy settings

for general/public IoT, household IoT, and fitness IoT, respectively. A “data-driven” approach

has been used in all three studies to gain insight into IoT users’ privacy decision behavior

and to design a set of user interfaces (UIs) that incorporate the “smart” privacy defaults/profiles

created based on these insights. Users can apply these smart defaults/profiles either with a single click

or by answering a few related questions. When applying this approach to the household IoT dataset

in Chapter 5, we explored the trade-off between parsimony and accuracy when creating the “smart”

privacy defaults/profiles. We manipulated the pruning parameter for the decision trees of the C4.5

algorithm, which affected the complexity of the profiles generated from the decision trees. Accuracy

is important to ensure that users’ privacy preferences are accurately captured and/or need

only a few manual adjustments, while parsimony, on the other hand, prevents overfitting and promotes

fairness. In Chapter 5, we noticed that more complex models tended to increase overall accuracy by

predicting a few users’ preferences more accurately, with no effect on other users. Parsimony also


makes the associated default setting easier to understand for the user.

The biggest limitation of our work so far is that we did not test any of the proposed UIs,

so we do not know what level of complexity (both in terms of the user interface and in terms of

the profiles) is most suitable. To test this trade-off between accuracy and parsimony

in a real usage environment, and to test the user experience of the interfaces that we designed in

Chapter 5, this chapter addresses this limitation by discussing our final study, which evaluates the

new interface prototypes for recommending privacy settings for household IoT. The main purpose of

this study is to test the user experience of the privacy-setting interfaces and defaults/profiles.

7.2 Study Design

In this section, we present the design of our study, including the dependent variables and

manipulations.

7.2.1 Dependent Variables

To test the user experience of our privacy-setting interfaces, the dependent variables of our

study are users’ satisfaction with the system, their trust in the company, and several

subjective system aspects, including perceived usefulness, perceived ease of use, perceived

privacy threats, perceived control, and perceived privacy helpfulness. As shown in Table 7.1,

all the scales for these dependent variables are adapted from previous work.

7.2.2 Manipulations

7.2.2.1 Interface complexity

In Chapter 5, we first designed a set of interfaces, as shown in Figure 5.6, based on the

results from our statistical analysis (UI1). Further, we modified these interfaces to integrate the

“smart defaults/profiles” generated from our machine learning results. This modification separated

the Storage and sharing modules from the Data usage, leading to a slightly more complex interface

design (UI2). For our study, we need to test these two groups of interfaces (UI1 vs UI2) in terms of

interface complexity. Compared to UI1, UI2 offers more granular settings for the different types of storage.

In UI1, users can only configure all the privacy settings to be the same for the three different types


of storage (Local, Remote, and Third-party sharing), while they can configure those settings for each

type of storage differently in UI2.

7.2.2.2 Profile complexity

In terms of the complexity of “smart defaults/profiles”, we consider 4 different experimental

conditions as follows:

• Everything-On: With all data access and usage turned on, this is considered

the most open default setting. This profile also means nothing has been done for the users. They

have to make every change themselves.

• Everything-Off: With all data access and usage turned off, this is considered

the most conservative default setting. In our previous studies, this is also the profile that

more than 50% of participants want to use.

• Smart Default: One single “smart profile” will be provided to the users. This is considered

the experimental condition with intermediate complexity.

• Smart Profiles: Multiple “smart profiles” will be provided to the users. This is considered

the most sophisticated setting, with high complexity in “smart profiles”.

These different default/profile conditions map to users’ “preference fit”, where the smart profiles

and smart default conditions have a better “preference fit” than the two baseline conditions. In

addition, the smart profiles condition gives users more preference options to choose from than the smart

default condition. Thus, we expect smart profiles to have the best fit, user satisfaction, and other subjective system

aspects, followed by smart defaults and the two baseline conditions.

In summary, we have 2 different levels of interface complexity and 4 different levels of profile

complexity. Hence, 4 × 2 = 8 total experimental conditions (i.e., user interfaces) will be presented to

the participants.
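The full factorial design can be enumerated in a few lines. The condition labels follow the "UI:Profile" notation used in this chapter; the variable names are illustrative.

```python
from itertools import product

interfaces = ["UI1", "UI2"]
profiles = ["Everything-On", "Everything-Off", "Smart Default", "Smart Profiles"]

# Cross every interface with every default/profile condition.
conditions = [f"{ui}:{profile}" for ui, profile in product(interfaces, profiles)]

for condition in conditions:
    print(condition)
```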

7.2.2.3 Profile/Interface Selection

The Everything-Off and Everything-On profiles can easily be implemented on both of our

designed interfaces (UI1 and UI2).


For Smart Default and Smart Profiles selection, note that, when applying our machine

learning algorithms in Chapter 5, we manipulated the pruning parameter to create different

“smart defaults/profiles”. This manipulation results in a set of smart profiles that weigh

accuracy and parsimony differently. The more the decision tree is pruned, the less complex the resulting

“smart” profile will be, leading to lower accuracy and higher parsimony, and vice versa. Since we can

only choose one “smart default/profile” to test the interface, this selection needs to be done carefully.

Smart Default: In Section 5.5.2, we applied a one-rule algorithm to our dataset. The

resulting “smart default” is shown in Figure 5.7. This is the simplest of

all the different “smart default” settings, with the lowest accuracy (61.39%) but the highest parsimony. In

addition, this “smart default” can be easily integrated into both UI1 and UI2. Thus, we choose

this “smart default” as the target interface for experimental conditions UI1:Smart Default and

UI2:Smart Default.

Smart Profiles: For the “smart profiles” selection, we want this interface to differ as much as

possible from the “smart default”, so we searched across all the “smart profiles” with a large

number of clusters. In addition, the “smart profiles” should be easily integrated into UI1 or UI2.

Figure 5.18 is considered for UI1 because it has 5 clusters with a high accuracy of 80.35%, and

it has no interaction between Storage and other parameters. This is suitable for our UI1 design,

serving as the target interface profiles for experimental condition UI1:Smart Profiles. We have

separated the Data Storage and Data Usage modules in UI2. Thus, we choose Figure 5.25 for UI2

because it has 5 clusters and a close-to-highest accuracy of 83.11%. In addition, in cluster 3, it has

a 2-way interaction between Storage and Purpose; in cluster 4, it has a 2-way interaction between

Storage and Action. It does not have a 3-way interaction between any of these parameters in

any of its clusters. Thus, we choose this set of “smart profiles” as the target interface profiles for

experimental condition UI2:Smart Profiles. We implemented the above 8 different sets of user

interfaces using HTML, PHP, CSS, and SQL.

7.2.3 Research Questions

Compared to UI1, UI2 offers more granularity in the settings for different types of storage. In UI1,

users can only configure the same privacy settings for all three types of

storage (Local, Remote, and Third-party sharing), while in UI2 they can configure those settings for each

type of storage separately. Brandimarte et al. demonstrate that users perceive more control

when privacy controls are more granular [15], but this control can at times be an illusion. More

granular controls allow users to set their privacy settings to a level that better reflects their privacy

preferences; this additional control may increase the perceived usefulness [109, 4]. Similarly, more

fine-grained control may reduce users’ perceived privacy threats: Tang et al. (2012) found that users

of a finer-grained settings interface were more comfortable with their privacy settings. However, research has

also shown that increasing control often introduces choice overload issues [54, 103, 2, 3], which

make it more difficult and time-consuming for users to accurately set their privacy settings [82, 101].

Therefore, our first research question is:

RQ 1 Is there any significant difference between UI1 and UI2 in user experience and other

subjective system aspects when using our system?

In terms of profile complexity, we have four different levels of experimental conditions. These

different default/profile conditions map to users’ “preference fit”. “Smart profiles” provide

users with more pre-configured options to choose from, leading to a better preference fit than the

“smart default”, which provides only a single “smart” option and which in turn has a better fit than the

Everything-Off/Everything-On defaults. This additional freedom of choice and possibly increased

preference fit may increase perceived control and perceived usefulness. Similarly, with an increased

preference fit, the pre-configured profiles better reflect users’ privacy

preferences, which may reduce perceived privacy threats. However, the additional options in “smart

profiles” may introduce choice overload compared to the “smart default” and the Everything-Off and

Everything-On defaults. This may lead to a lower perceived ease of use for “smart profiles”. Compared

to the Everything-Off and Everything-On defaults, “smart defaults” are generated from machine learning

analysis results. The higher accuracy of “smart defaults” can arguably result in fewer manual

changes that users need to make to the system, leading to a higher perceived ease of use compared to the

Everything-Off and Everything-On defaults. Therefore, our second research question is:

RQ 2 Is there any significant difference between the 4 profile-complexity conditions in user experience

and other subjective system aspects when using our system?

7.3 Experimental Setup

In this section, we discuss the experimental setup of our user study. The user study was

a between-subjects study that took about 15 to 20 minutes to complete. All participants were

recruited via Amazon Mechanical Turk.

7.3.1 Participants and Procedures

Based on the power analysis results, 504 adult U.S.-based participants

were recruited through Amazon Mechanical Turk to collect our dataset. Participation was restricted to Mechanical Turk

workers with a high reputation (at least 50 completed tasks, average accuracy of > 96%).

Participants were paid $1.50 upon successful completion of the study. They were warned that

they would not be paid if they failed the attention checks (see below). The study participants represented

a wide range of ages, with 44 aged 18-24, 298 aged 25-34, 116 aged 35-44, 29 aged 45-54, 12 aged

55-64, and 5 participants over 65 years old.

During the study,¹ participants were first welcomed with a brief introduction to the

experimental instructions. We explicitly stated that the goal of the study was to test a new settings

interface for smart home users.

Then each participant was shown a video with a brief introduction to various smart home

devices, which also mentioned various ways in which the different appliances would cooperate and

communicate within a home. After the video, participants were asked to answer two attention check

questions depicted in Figure A1 in the Appendix.

After the introduction video, each participant was shown the basics of our UI and usage

instructions, shown in Figure A2 in the Appendix. Then each participant was presented with

one privacy-setting user interface for household IoT that was randomly chosen from the 8

previously discussed experimental conditions. Participants were asked to set all the privacy settings

to best fit their own privacy preferences. They were required to spend at least 25 seconds before

they could leave the UI page and were warned if they spent too little time on the UI page.
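The random assignment and the minimum time requirement described above can be sketched as follows. This is an illustrative sketch only, not the study's actual code (the interfaces were implemented in HTML, PHP, CSS, and SQL); the names `assign_condition` and `may_leave_ui_page` are hypothetical, and simple uniform sampling is an assumption that does not by itself guarantee equal group sizes.

```python
import random

# The 8 conditions: 2 interfaces crossed with 4 default/profile levels.
CONDITIONS = [
    f"{ui}:{profile}"
    for ui in ("UI1", "UI2")
    for profile in ("Everything-Off", "Everything-On", "Smart Default", "Smart Profiles")
]

MIN_SECONDS_ON_UI_PAGE = 25  # minimum dwell time stated in the text


def assign_condition(rng=random):
    """Pick one of the 8 conditions uniformly at random for a new participant."""
    return rng.choice(CONDITIONS)


def may_leave_ui_page(seconds_spent):
    """Return True once the participant has spent the required minimum time."""
    return seconds_spent >= MIN_SECONDS_ON_UI_PAGE


if __name__ == "__main__":
    print(assign_condition(random.Random(2019)))
    print(may_leave_ui_page(12.5))
```

Passing a seeded `random.Random` instance makes an assignment reproducible for testing, while the default uses the module-level generator.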

Finally, a post-test questionnaire was shown to each participant, asking about their

experience with our privacy-setting interface. The questionnaire included three groups of

questions: user experience (i.e., satisfaction with the system); subjective system aspects (SSA, including

¹ The user study URL can be found here: http://iot.usabart.nl/yang

perceived usefulness, perceived ease of use, perceived privacy threats, and perceived control); and

personal/situational characteristics (general privacy concerns, data collection concerns, knowledge,

rational decision style, and emotional decision style). All items were adapted from previously

published studies with minor modifications in wording to accommodate the IoT privacy-setting context.

Each item was measured on a seven-point Likert scale ranging from 1 (“strongly disagree”) to 7

(“strongly agree”). All the items of the questionnaire are shown in the Appendix.

7.4 Results

In this section, I present the statistical analysis results. I first discuss the confirmatory

factor analysis that I conducted to clean up the survey items. Then I discuss the structural

equation model (SEM) that I used to analyze the effect of the independent variables on the

subjective system aspects and user experience. Finally, I present the effect of personal/situational

characteristics on the subjective system aspects and user experience.

7.4.1 Confirmatory Factor Analysis

I conducted a Confirmatory Factor Analysis (CFA) and examined the validity and reliability

scores of the constructs measured in the study. I started with the saturated model, as shown in

Figure 7.1. Upon inspection of the CFA model, I iteratively removed the item with the lowest R-square value (communality),

since the factor explains the smallest share of the variance of such an item.

During this process, the following items were removed because of their low communality (given in

parentheses): th6 (0.121), e2 (0.221), e7 (0.237), th5 (0.222), s8 (0.233), th2 (0.278), u4 (0.291),

e4 (0.330), e5 (0.341), e9 (0.311), tr2 (0.365), tr4 (0.353), tr8 (0.209), tr3 (0.295), tr1 (0.246),

s4 (0.317), s3 (0.274), s6 (0.298), e1 (0.308), e10 (0.184), u2 (0.368), u7 (0.346), s5 (0.383), and

s7 (0.327). Interestingly, after removing e10, I removed h4 (communality 0.612) instead of h2 (low

communality 0.227): if h2 had been removed, both h1 and h3 would also have needed to be removed,

leaving only one item in that scale, whereas removing h4 allowed h1, h2, and h3 to all be kept

in the model.

Figure 7.1: CFA Saturated Model.

Figure 7.2: Trimmed CFA Model.

Finally, I also checked that no item had high cross-loadings on other factors. The

remaining scale items and their R-square values in the trimmed model are shown in Table 7.1. The

final trimmed model is shown in Figure 7.2.

Also, to ensure the convergent validity of the constructs, I examined the average variance

extracted (AVE) of each construct. The AVEs were all higher than the recommended value of

0.50, indicating adequate convergent validity. To assess discriminant validity, we checked whether

the square root of the AVE of each construct was higher than the correlations of that construct

with the other constructs. As shown in Figure 7.3 (which has the square root of the AVE on the diagonal), trust, satisfaction, perceived privacy threat,

perceived ease of use, and perceived control are all highly correlated with each other (at least 0.746).

Of these, perceived privacy threat and perceived ease of use have the lowest AVE but

the highest correlations with the other constructs. If these two constructs are removed from the

model, the square root of the AVE of each remaining construct is higher than its correlations with

the other remaining constructs, which indicates discriminant validity. Thus, we removed perceived

privacy threat and perceived ease of use from the final model. The model has the following fit:

Table 7.1: Factor Items in Trimmed CFA Model. Items marked “removed” were removed during trimming; item labels follow the listing order within each construct.

Item   Loading   Wording

System satisfaction [53, 137, 136]
s1     0.584     The system has no real benefit to me.
s2     0.700     Using the system is annoying.
s3     removed   Using the system is a pleasant experience.
s4     removed   Using the system makes me happy.
s5     removed   Overall, I am satisfied with the system.
s6     removed   I would recommend the system to others.
s7     removed   I would use this system if it were available.
s8     removed   I would pay a monthly fee to use this system.
s9     0.755     I would quickly abandon using this system.
s10    0.709     It would take a lot of convincing for me to use this system.

Trust [55, 84]
tr1    removed   I believe the company providing this software is trustworthy in handling my information.
tr2    removed   I believe this company tells the truth and fulfills promises related to the information I provide.
tr3    removed   I believe this company is predictable and consistent regarding the usage of my information.
tr4    removed   I believe this company is honest when it comes to using the information I provide.
tr5    0.634     I think it is risky to give my information to this company.
tr6    0.750     There is too much uncertainty associated with giving my information to this company.
tr7    0.709     Providing this company my information would involve many unexpected problems.
tr8    removed   I feel safe giving my information to this company.

Perceived Usefulness [25]
u1     0.741     Based on what I have seen, the system is useful.
u2     removed   The system helps me more effectively set my privacy preferences.
u3     0.483     The system gives me more control over my Smart home devices.
u4     removed   The privacy setting task would be easier to finish with the help of this system.
u5     0.527     The system saves me time when I use it.
u6     0.580     The system meets my needs.
u7     removed   The system does everything that I expect it to do.

Perceived Ease of Use [25, 66]
e1     removed   It is convenient to set my preferences in the system.
e2     removed   It requires the fewest mouse-clicks possible to set my privacy preferences with the system.
e3     0.500     It takes too many mouse-clicks to set my privacy preferences with the system.
e4     removed   I was able to quickly set my privacy-setting preferences in the system.
e5     removed   I feel setting my privacy preferences within the system is easy.
e6     0.676     I feel setting my preferences in the system was unnecessarily complex.
e7     removed   I can set my privacy-setting preferences without written instructions.
e8     0.774     I felt lost using the system’s privacy settings.
e9     removed   I felt this privacy-setting interface is designed for all levels of users.
e10    removed   I can use the Privacy-setting interface successfully every time.

Perceived privacy Helpfulness [124]
h1     0.626     The system helped me to decide what information I should disclose.
h2     0.561     The system explained how useful providing each piece of information was.
h3     0.629     The system helped me to make a tradeoff between privacy and usefulness.
h4     removed   I felt clueless about what information to disclose.

Perceived Privacy Threat [66]
th1    0.602     I am afraid that I am sharing my personal information too freely, due to my privacy settings.
th2    removed   I am comfortable with amount of data that is used/shared based on my settings.
th3    0.557     Due to the system, the manufacturer will know too much about me.
th4    0.612     Due to the system, third-parties will know too much about me.
th5    removed   I made sure only information that I am comfortable with will be used or shared.
th6    removed   My privacy settings are spot on; I am not disclosing too much to anyone.
th7    0.668     I fear that I have been too liberal in selecting my privacy settings.

Perceived Control [138]
c1     0.630     I had limited control over the way this system made privacy settings.
c2     0.809     The system restricted me in my choice of settings.
c3     0.731     Compared to how I normally configure privacy settings, the system was very limited.
c4     0.500     I would like to have more control over the recommendations.

Figure 7.3: Factor Correlation Matrix (on the diagonal is the sqrt(AVE)).

Figure 7.4: Preliminary SEM Model with perceived privacy threats

χ²(125) = 298.507, p < .0001; RMSEA = 0.067, 90% CI: [0.058, 0.077]; CFI = 0.975; TLI = 0.970.
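The convergent and discriminant validity checks described in this section can be sketched in a few lines. This is a minimal sketch under stated assumptions: AVE is computed as the mean of the squared standardized loadings, discriminant validity uses the square-root-of-AVE comparison described above, and the function names, loadings, and correlations below are hypothetical illustration values rather than the study's results.

```python
from math import sqrt


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of a construct's items."""
    return sum(value * value for value in loadings) / len(loadings)


def discriminant_validity_violations(loadings_by_construct, correlations):
    """Return construct pairs whose correlation is not below both sqrt(AVE) values.

    `correlations` maps frozenset({a, b}) to the correlation between a and b.
    """
    root_ave = {
        name: sqrt(average_variance_extracted(values))
        for name, values in loadings_by_construct.items()
    }
    violations = []
    for pair, r in correlations.items():
        a, b = sorted(pair)
        if abs(r) >= min(root_ave[a], root_ave[b]):
            violations.append((a, b))
    return sorted(violations)


# Hypothetical standardized loadings and factor correlations (not the study's values).
LOADINGS = {
    "control": [0.80, 0.75, 0.70],
    "helpfulness": [0.85, 0.80, 0.78],
    "threat": [0.72, 0.71, 0.70],
}
CORRELATIONS = {
    frozenset({"control", "helpfulness"}): 0.40,
    frozenset({"control", "threat"}): 0.78,
    frozenset({"helpfulness", "threat"}): 0.35,
}

if __name__ == "__main__":
    for name, values in LOADINGS.items():
        print(name, round(average_variance_extracted(values), 3))
    print(discriminant_validity_violations(LOADINGS, CORRELATIONS))
```

In this toy example every AVE exceeds 0.50, yet one pair of constructs still violates the discriminant validity criterion, which is the kind of situation that motivates dropping a construct from the model.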

7.4.2 Structural Equation Modeling

I tested several different SEM models. Figure 7.4 shows a model that contains perceived

privacy threat, perceived helpfulness, and perceived usefulness. This model has a good model fit.

However, both user experience variables (satisfaction and trust) had to be removed from the model

due to their high covariance with perceived privacy threat. Thus, this model was not

chosen.

As shown in Figure 7.5, I also tried another model, hypothesizing that there would be a

Figure 7.5: Preliminary SEM Model with effect from manipulations to perceived control

[Figure 7.6 path diagram. Columns: Objective System Aspects (OSA), Subjective System Aspects (SSA), User Experience (EXP). Manipulations: Everything-Off vs. Everything-On, Smart Defaults vs. Everything-On, Smart Profiles vs. Everything-On. Factors: Perceived Helpfulness, Perceived Control, Satisfaction, Trust, Perceived Usefulness. Path coefficients (standard errors) shown in the diagram: -0.190 (0.057)**, +0.844 (0.176)***, +0.679 (0.161)***, +0.414 (0.175)*, +0.380 (0.074)***, +0.528 (0.079)***, +0.149 (0.045)**, +0.752 (0.109)***, -0.221 (0.110)*, +0.561 (0.036)***, +0.820 (0.028)***, +0.165 (0.054)**.]

Figure 7.6: Trimmed structural equation model. *p < .05, **p < .01, ***p < .001.

significant effect of the objective system aspects (experimental conditions/manipulations) on

perceived control. Compared to the previous preliminary model, this model kept satisfaction and

perceived usefulness. The effects on both of these factors are mediated by perceived control and perceived

helpfulness. However, no significant effect of the manipulations on perceived control was found.

Finally, as shown in Figure 7.6, we subjected the remaining 5 factors (Trust, Satisfaction,

Perceived Usefulness, Perceived Control, and Perceived Helpfulness) and the experimental

conditions to structural equation modeling, which simultaneously fits the factor measurement model and

the structural relations between factors and other variables. The model has the following fit:

χ²(176) = 284.160, p < .0001; RMSEA = 0.045, 90% CI: [0.035, 0.054]; CFI = 0.986; TLI = 0.983.

Figure 7.7: Effects of profile complexity on perceived helpfulness

The model answers our two research questions. The smart defaults/profiles

manipulation has a significant effect on the perceived helpfulness of the system: participants in the Everything-Off,

Smart Defaults, and Smart Profiles conditions perceived the system as more helpful than those in the Everything-On

condition. The two different UIs, however, do not have a significant effect on any of the outcomes.

Figure 7.7 shows the effect of profile complexity on perceived helpfulness. Both

Everything-On and Everything-Off were used as baselines to test the significance of the effect. The

results show that only the differences within the pairs (Everything-Off, Smart Profiles) and (Smart

Profiles, Smart Defaults) are not significant. The differences between all other pairs of conditions are

significant.

Perceived helpfulness is in turn related to users’ perceived control. Here we see that perceived

helpfulness has a negative effect on users’ perceived control. This indicates an interesting tension

between perceived control and perceived helpfulness. This tension

is discussed in more detail in the next section. Figure 7.8 shows the total effects of profile complexity on

perceived control. All the effects are significant, which indicates that the effect of profile complexity

on perceived control is mediated by perceived helpfulness.

Both perceived control and perceived helpfulness have a significant positive effect on

users’ satisfaction with the system. Figure 7.9 shows the total effects of profile complexity on

satisfaction. No significant effect was found. This, in turn, suggests that the trade-off

between perceived control and perceived helpfulness cancels out the effect of the manipulations

on satisfaction.
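The notion of a total effect used here can be illustrated with a small sketch: each indirect effect is the product of the coefficients along one path, and the total effect is the sum over all paths from a manipulation to an outcome. The coefficients below are hypothetical values chosen only to mirror the signs reported in the text, and the function names are illustrative; the sketch says nothing about statistical significance.

```python
def indirect_effect(*path_coefficients):
    """Product of the coefficients along one causal path."""
    result = 1.0
    for coefficient in path_coefficients:
        result *= coefficient
    return result


def total_effect(paths):
    """Sum of the indirect effects over all paths from a manipulation to an outcome."""
    return sum(indirect_effect(*path) for path in paths)


# Hypothetical coefficients (signs follow the text, magnitudes are made up).
CONDITION_TO_HELPFULNESS = 0.80
HELPFULNESS_TO_CONTROL = -0.20
HELPFULNESS_TO_SATISFACTION = 0.50
CONTROL_TO_SATISFACTION = 0.40

# Condition -> helpfulness -> control.
effect_on_control = total_effect([
    (CONDITION_TO_HELPFULNESS, HELPFULNESS_TO_CONTROL),
])

# Condition -> helpfulness -> satisfaction, plus
# condition -> helpfulness -> control -> satisfaction.
effect_on_satisfaction = total_effect([
    (CONDITION_TO_HELPFULNESS, HELPFULNESS_TO_SATISFACTION),
    (CONDITION_TO_HELPFULNESS, HELPFULNESS_TO_CONTROL, CONTROL_TO_SATISFACTION),
])

if __name__ == "__main__":
    print(round(effect_on_control, 3), round(effect_on_satisfaction, 3))
```

The second path to satisfaction carries a negative sign because it runs through the negative effect of helpfulness on control, so it partly offsets the positive path through helpfulness alone.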

Perceived control, perceived helpfulness, and satisfaction all have significant positive

effects on users’ trust in the company. Figure 7.10 shows the total effects of profile complexity on

Figure 7.8: Total Effects of profile complexity on perceived control

Figure 7.9: Total Effects of profile complexity on satisfaction

trust. No significant effect was found. Thus, the effect of the manipulations on trust is also

canceled out by the tension between perceived control and perceived helpfulness.

Perceived control, perceived helpfulness, and satisfaction all determine perceived

usefulness. Both satisfaction and perceived helpfulness have a significant positive effect on perceived

usefulness, while perceived control has a significant negative effect on perceived usefulness. Lower perceived

control is associated with higher perceived helpfulness, and higher perceived helpfulness could lead

to higher perceived usefulness. Figure 7.11 shows the total effects of profile complexity on perceived

usefulness. All the effects are significant. This may be because the effect of perceived helpfulness

on perceived usefulness is stronger than the effect of perceived control.

Figure 7.10: Total Effects of profile complexity on trust

Figure 7.11: Total Effects of profile complexity on usefulness

7.5 Discussion

In this chapter, we conducted a systematic evaluation of the effect of several design

parameters of a household IoT privacy-settings interface on users’ evaluation of the system. In terms

of managerial implications, we find that it is useful to use the data-driven approach to develop

“smart defaults” and “smart profiles”, and corresponding setting interfaces, to improve users’

experience and satisfaction. We did not find a significant difference between the two UIs. A possible

explanation for this is that the design differences between the UIs were very subtle.

Regarding the negative effect of perceived helpfulness on perceived control, there are a

few possible explanations: i) It is possible that participants in the “smart defaults” and “smart profiles”

conditions found that their privacy settings had already been set by default. Thus, they found the

“smart defaults” and “smart profiles” generated from the previous studies helpful. On the

other hand, users may feel that their choice of settings has been limited by these helpful

pre-settings. ii) It is also possible that users feel as though the pre-set settings are helpful but also

Figure 7.12: Effect size of average time spent on different UI pages (conditions shown: Everything-On, Everything-Off, Smart Profiles, Smart Defaults; vertical axis from 0 to 0.8).

complicated, so they feel they have less control over these settings. This also corresponds to

our previous discussion of the trade-off between accuracy and parsimony. A more parsimonious

profile would be easier to explain to users and would also make them less worried about the

pre-defined settings. Users are thus torn between the benefit brought by the “smart defaults” and

“smart profiles” and the control they would like to have over the privacy settings of their household IoT

devices. iii) We also revisited the perceived control scale that we used. It appears that all the items in that

scale are reverse-framed and relate to the notion of “losing control” and having limited control

when using the system. One example is item C4, “I would like to have more control over the

recommendations”. Our system is all about making privacy-setting recommendations. Even if we are

making the right recommendations about IoT privacy settings for users, they might still want

more control and want to configure these settings themselves, leading to lower perceived control.

Another interesting point is the lack of difference between smart defaults and smart

profiles in their effect on user experience and subjective system aspects. We first investigated the

time spent in the two conditions. Figure 7.12 shows the effect size of the average time spent on the different

UI pages. Although the average time spent on smart defaults is higher than on the smart profiles

condition, the difference is not significant. So the lack of difference cannot be explained by a lack of interaction due

to less time spent on the smart profiles than on the smart defaults.

The finding is still interesting, since users are expected to spend less time on smart profiles.

This lack of difference between smart defaults and smart profiles could be due to the following

reasons:

i) Laziness? In our smart profiles conditions for both interfaces, a detailed

description of each profile is shown. Thus, users know what the settings will be in the

smart profiles condition. In the smart defaults condition, users have to spend more cognitive effort to individually

change the smart default settings provided, which means they have to spend more time

and effort to get the settings to match their preferences. Here, people are not entirely confident whether

the recommended defaults match their preferences and hence have to check them manually anyway.

This could lead to more time spent on the smart defaults UI page and a possibly better

user experience.

ii) Accuracy? The second possible reason is that the accuracy of the smart profiles

is better than that of the smart defaults that we used. The more accurate the pre-settings are, the more

likely users are to make fewer changes and leave the settings as they are, leading to less time spent on the smart

profiles UI page.

iii) Endowment effect? The smart defaults are provided by the system, while in

the smart profiles condition, users have to choose their own profile by reading the descriptions of the

profiles. It is therefore likely that they would stick to their choice, as changing it would

require additional cognitive effort that they would rather avoid. Thus, they might think the

profile that they chose fits their preferences best and would not take a deeper look at or more

carefully review the settings in that profile.

There are several limitations to our work. First of all, our sample is not large enough to

carefully examine 2- or 3-way interaction effects in Mplus. We examined the effect of personal

characteristics on the user experience individually, and did not find any significant effect. A larger

sample is needed to test these effects and ensure the robustness of our results. Second, we planned to

examine behavioral data on how users changed the pre-defined settings. However,

due to a coding problem, we were not able to access these data. This should be improved in the next

user study, since users’ setting behavior could give us more hints to explain our findings.

Based on the results of this study, we encourage privacy researchers, policy-makers, and industry

executives to consider the effects of privacy-settings interfaces on privacy outcomes. This study shows

that subtle changes in the design of such interfaces can have important subjective and behavioral

consequences. Careful design of these systems can help users better configure the privacy settings of their IoT devices.

Chapter 8

Conclusion

In this dissertation, we first present three studies on recommending privacy settings for

different IoT environments, namely general/public IoT, household IoT, and fitness IoT.

We developed and used a “data-driven” approach in these three studies: we first use statistical

analysis and machine learning techniques on the collected user data to gain insights into

IoT users’ privacy decision behavior, and then create a set of “smart” privacy defaults/profiles based

on these insights. Finally, we design a set of interfaces that incorporate these privacy defaults/profiles.

Users can apply these smart defaults/profiles either with a single click or by answering a few related

questions. To address the lack of evaluation of the designed interfaces, we conducted a

user study to evaluate the new interfaces for recommending privacy settings to household IoT users.

The results show that using smart defaults and smart profiles can significantly improve users’

experience, including satisfaction with the system and trust in the company. Our research can benefit

IoT users, manufacturers, researchers, privacy-setting interface designers, and anyone who

wants to adopt IoT devices.

The main contributions of my dissertation are:

• User testing is often used to inform the development of user interfaces. Since the IoT systems for which the interfaces need

to be developed do not yet exist, we developed a data-driven approach to

designing IoT privacy-setting interfaces for three different IoT environments, namely general

IoT, household IoT, and fitness IoT.

• Prior research has shown that the decision-making of IoT users depends heavily on the

contextual parameters of the IoT usage scenario. Thus, we investigated the effect of IoT scenario

parameters on IoT users’ decisions and attitudes to find out which contextual parameters are more

important in users’ decision-making process. Based on the importance of the different

contextual parameters, we created a set of privacy-setting interfaces.

• Configuring privacy settings in these interfaces can still be complicated. To solve this problem, we

used a decision tree algorithm to create smart defaults and developed several clustering algorithms

to group the users and create corresponding smart profiles for each group.

• During the process of creating smart defaults and smart profiles, we found that when the

decision tree of the smart defaults/profiles becomes complex, the smart defaults/profiles

are difficult to explain to users, leading to bad decision making when choosing from the provided

options. We explored the trade-off between accuracy and parsimony when creating smart

defaults/profiles by manipulating the degree of pruning of the decision tree. We struck a

balance between higher accuracy and better explainability of the smart defaults/profiles.

• In the fitness IoT domain, we also created a series of strategies to recommend “smart profiles”

to users.

• Finally, we conducted a study to evaluate the designed interfaces in terms of interface

complexity and profile complexity. The results show that smart defaults and smart profiles integrated in

our privacy-setting interfaces significantly improved users’ experience compared to the baseline

condition.

This research can benefit IoT users, manufacturers, researchers, privacy-setting interface

designers, and anyone who wants to adopt IoT devices. I suggest that the designers of future IoT

privacy-setting interfaces make use of our data-driven approach and carefully consider the trade-off between

“smart defaults” and “smart profiles”. “Smart profiles” and “smart defaults” can be a viable route

for designing future IoT privacy-setting interfaces. When designing their own setting interfaces and

smart defaults/profiles, designers should carefully investigate the effect of interface complexity and profile complexity

based on their own user groups, datasets, and contexts.

Bibliography

[1] Alessandro Acquisti and Ralph Gross. Imagined communities: Awareness, information sharing, and privacy on the facebook. In International workshop on privacy enhancing technologies, pages 36–58, 2006.

[2] Alessandro Acquisti and Jens Grossklags. Privacy and rationality in individual decision making. IEEE security & privacy, 3(1):26–33, 2005.

[3] Alessandro Acquisti and Jens Grossklags. What can behavioral economics teach us about privacy. Digital privacy: theory, technologies and practices, 18:363–377, 2007.

[4] Adai Mohammad Al-Momani, Moamin A Mahmoud, and S Ahmad. Modeling the adoption of internet of things services: A conceptual framework. International Journal of Applied Research, 2(5):361–367, 2016.

[5] Hazim Almuhimedi, Florian Schaub, Norman Sadeh, Idris Adjerid, Alessandro Acquisti, Joshua Gluck, Lorrie Faith Cranor, and Yuvraj Agarwal. Your location has been shared 5,398 times!: A field study on mobile app privacy nudging. In Proceedings of the 33rd annual ACM conference on human factors in computing systems, pages 787–796. ACM, 2015.

[6] Denise Anthony, Tristan Henderson, and David Kotz. Privacy in location-aware computing environments. IEEE Pervasive Computing, (4):64–72, 2007.

[7] Kevin Ashton et al. That ‘internet of things’ thing. RFID journal, 22(7):97–114, 2009.

[8] Luigi Atzori, Antonio Iera, and Giacomo Morabito. The internet of things: A survey. Computer networks, 54(15):2787–2805, 2010.

[9] Naveen Farag Awad and M. S. Krishnan. The Personalization Privacy Paradox: An Empirical Evaluation of Information Transparency and the Willingness to be Profiled Online for Personalization. MIS Quarterly, 30(1):13–28, March 2006.

[10] Paritosh Bahirat and Yangyang He. Exploring defaults and framing effects on privacy decision making in smarthomes. In Proceedings of the SOUPS 2018 Workshop on the Human aspects of Smarthome Security and Privacy (WSSP), 2018.

[11] Paritosh Bahirat, Yangyang He, Abhilash Menon, and Bart Knijnenburg. A Data-Driven Approach to Developing IoT Privacy-Setting Interfaces. In 23rd International Conference on Intelligent User Interfaces, IUI ’18, pages 165–176, Tokyo, Japan, 2018. ACM.

[12] Hans H Bauer, Tina Reichardt, Stuart J Barnes, and Marcus M Neumann. Driving consumer acceptance of mobile marketing: A theoretical framework and empirical study. Journal of electronic commerce research, 6(3):181, 2005.

[13] Michael Benisch, Patrick Gage Kelley, Norman Sadeh, and Lorrie Faith Cranor. Capturing location-privacy preferences: quantifying accuracy and user-burden tradeoffs. Personal and Ubiquitous Computing, 15(7):679–694, 2011.

[14] Alastair R Beresford, Andrew Rice, Nicholas Skehin, and Ripduman Sohan. Mockdroid: trading privacy for application functionality on smartphones. In Proceedings of the 12th workshop on mobile computing systems and applications, pages 49–54. ACM, 2011.

[15] Laura Brandimarte, Alessandro Acquisti, and George Loewenstein. Misplaced confidences: Privacy and the control paradox. Social Psychological and Personality Science, 4(3):340–347, 2013.

[16] Ajay Brar and Judy Kay. Privacy and security in ubiquitous personalized applications. School of Information Technologies, University of Sydney, 2004.

[17] C Brodie, CM Karat, and J Karat. How personalization of an e-commerce website affects consumer trust. Designing Personalized User Experience for eCommerce, Karat, J., Ed. Dordrecht, Netherlands: Kluwer Academic Publishers, pages 185–206, 2004.

[18] Supriyo Chakraborty, Chenguang Shen, Kasturi Rangan Raghavan, Yasser Shoukry, Matt Millar, and Mani B Srivastava. ipshield: A framework for enforcing context-aware privacy. In NSDI, pages 143–156, 2014.

[19] Amir Chaudhry, Jon Crowcroft, Heidi Howard, Anil Madhavapeddy, Richard Mortier, Hamed Haddadi, and Derek McAuley. Personal data: thinking inside the box. In Proceedings of The Fifth Decennial Aarhus Conference on Critical Alternatives, pages 29–32. Aarhus University Press, 2015.

[20] Ramnath K. Chellappa and Raymond G. Sin. Personalization versus privacy: An empirical examination of the online consumer’s dilemma. Information Technology and Management, 6(2-3):181–202, 2005.

[21] Shanzhi Chen, Hui Xu, Dake Liu, Bo Hu, and Hucheng Wang. A vision of iot: Applications, challenges, and opportunities with china perspective. IEEE Internet of Things journal, 1(4):349–359, 2014.

[22] Richard Chow, Serge Egelman, Raghudeep Kannavara, Hosub Lee, Suyash Misra, and Edward Wang. HCI in Business: A Collaboration with Academia in IoT Privacy. In Fiona Fui-Hoon Nah and Chuan-Hoo Tan, editors, HCI in Business, number 9191 in Lecture Notes in Computer Science. Springer, 2015.

[23] Mary J Culnan. “How did they get my name?”: An exploratory investigation of consumer attitudes toward secondary information use. MIS quarterly, pages 341–363, 1993.

[24] Nigel Davies, Nina Taft, Mahadev Satyanarayanan, Sarah Clinch, and Brandon Amos. Privacy Mediators: Helping IoT Cross the Chasm. In Proceedings of the 17th International Workshop on Mobile Computing Systems and Applications, HotMobile ’16, pages 39–44, New York, NY, USA, 2016. ACM.

[25] Fred D Davis. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS quarterly, pages 319–340, 1989.

[26] Fred D Davis, Richard P Bagozzi, and Paul R Warshaw. User acceptance of computer technology: a comparison of two theoretical models. Management science, 35(8):982–1003, 1989.

[27] Cailing Dong, Hongxia Jin, and Bart P Knijnenburg. Ppm: A privacy prediction model for online social networks. In International Conference on Social Informatics, pages 400–420, 2016.

[28] Julia Brande Earp, Annie I Antón, Lynda Aiman-Smith, and William H Stufflebeam. Examining internet privacy policies within the context of user privacy values. IEEE Transactions on Engineering Management, 52(2):227–237, 2005.

[29] Nathan Eddy. Gartner: 21 billion iot devices to invade by 2020. InformationWeek, Nov, 10, 2015.

[30] Serge Egelman, Janice Tsai, Lorrie Faith Cranor, and Alessandro Acquisti. Timing is everything?: the effects of timing and placement of online privacy indicators. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 319–328. ACM, 2009.

[31] Opher Etzion and Fabiana Fournier. On the personalization of event-based systems. In Proceedings of the 1st ACM International Workshop on Human Centered Event Understanding from Multimedia, pages 45–48. ACM, 2014.

[32] Lujun Fang and Kristen LeFevre. Privacy wizards for social networking sites. In Proceedings of the 19th international conference on World wide web, pages 351–360, 2010.

[33] NK Fantana, Till Riedel, Jochen Schlick, Stefan Ferber, Jürgen Hupp, Stephen Miles, Florian Michahelles, and Stefan Svensson. Iot applications—value creation for industry. Internet of Things: Converging Technologies for Smart Environments and Integrated Ecosystems, page 153, 2013.

[34] Adrienne Porter Felt, Elizabeth Ha, Serge Egelman, Ariel Haney, Erika Chin, and David Wagner. Android permissions: User attention, comprehension, and behavior. In Proceedings of the eighth symposium on usable privacy and security, pages 1–14. ACM, 2012.

[35] Denis Feth, Andreas Maier, and Svenja Polst. A User-Centered Model for Usable Security and Privacy. In Theo Tryfonas, editor, Human Aspects of Information Security, Privacy and Trust, Lecture Notes in Computer Science, pages 74–89. Springer International Publishing, 2017.

[36] forbes. Iot: Don’t forget privacy and security while racing to the price bottom, 2017. [Online; accessed 1-Feb-2019].

[37] Huiqing Fu, Yulong Yang, Nileema Shingte, Janne Lindqvist, and Marco Gruteser. A field study of run-time location access disclosures on android smartphones. Proc. Usable Security (USEC), 14:10–pp, 2014.

[38] Steven Furnell. Managing privacy settings: lots of options, but beyond control? Computer Fraud & Security, 2015(4):8–13, 2015.

[39] Lingling Gao and Xuesong Bai. A unified perspective on the factors influencing consumer acceptance of internet of things technology. Asia Pacific Journal of Marketing and Logistics, 26(2):211–231, 2014.

[40] David Gefen, Elena Karahanna, and Detmar W Straub. Trust and tam in online shopping: An integrated model. MIS quarterly, 27(1):51–90, 2003.

[41] Hemant Ghayvat, S.C. Mukhopadhyay, Jie Liu, Arun Babu, Md Alahi, and Xiang Gui. Internet of things for smart homes and buildings: Opportunities and challenges. Australian Journal of Telecommunications and the Digital Economy, 3:33–47, 12 2015.

[42] Susan E Gindin. Nobody reads your privacy policy or online contract: Lessons learned and questions raised by the ftc’s action against sears. Nw. J. Tech. & Intell. Prop., 8:1, 2009.

[43] Nathaniel Good, Rachna Dhamija, Jens Grossklags, David Thaw, Steven Aronowitz, Deirdre Mulligan, and Joseph Konstan. Stopping Spyware at the Gate: A User Study of Privacy, Notice and Spyware. In Proceedings of the 2005 Symposium on Usable Privacy and Security, pages 43–52, 2005.

[44] ACQUITY GROUP et al. The internet of things: The future of consumer adoption. ACQUITY GROUP, 2014.

[45] Dominique Guinard, Vlad Trifa, Friedemann Mattern, and Erik Wilde. From the internet of things to the web of things: Resource-oriented architecture and best practices. In Architecting the Internet of things, pages 97–129. Springer, 2011.

[46] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H Witten. The weka data mining software: an update. ACM SIGKDD explorations newsletter, 11(1):10–18, 2009.

[47] Moeen Hassanalieragh, Alex Page, Tolga Soyata, Gaurav Sharma, Mehmet Aktas, Gonzalo Mateos, Burak Kantarci, and Silvana Andreescu. Health monitoring and management using internet-of-things (iot) sensing with cloud-based processing: Opportunities and challenges. In 2015 IEEE International Conference on Services Computing, pages 285–292. IEEE, 2015.

[48] Yangyang He, Paritosh Bahirat, Abhilash Menon, and Bart P Knijnenburg. A data driven approach to designing for privacy in household iot. Transactions on Interactive Intelligent Systems, 2018.

[49] Alexander Henka, Lukas Smirek, and Gottfried Zimmermann. Personalizing smart environments. In Proceedings of the 6th International Conference on the Internet of Things, pages 159–160. ACM, 2016.

[50] Shuk Ying Ho and Kar Tam. Understanding the Impact of Web Personalization on User Information Processing and Decision Outcomes. MIS Quarterly, 30(4):865–890, December 2006.

[51] Robert C. Holte. Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11(1):63–90, Apr 1993.

[52] Kai-Lung Hui, Bernard C. Y. Tan, and Chyan-Yee Goh. Online information disclosure: Motivators and measurements. ACM Transactions on Internet Technology, 6(4):415–441, November 2006.

[53] Kai-Lung Hui, Bernard CY Tan, and Chyan-Yee Goh. Online information disclosure: Motivators and measurements. ACM Transactions on Internet Technology (TOIT), 6(4):415–441, 2006.

[54] Sheena S Iyengar and Mark R Lepper. When choice is demotivating: Can one desire too much of a good thing? Journal of personality and social psychology, 79(6):995, 2000.

[55] Sirkka L Jarvenpaa, Noam Tractinsky, and Lauri Saarinen. Consumer trust in an internet store: A cross-cultural validation. Journal of Computer-Mediated Communication, 5(2):JCMC526, 1999.

[56] Prem Prakash Jayaraman, Xuechao Yang, Ali Yavari, Dimitrios Georgakopoulos, and Xun Yi. Privacy preserving Internet of Things: From privacy techniques to a blueprint architecture and efficient implementation. Future Generation Computer Systems, 76:540–549, November 2017.

[57] Carlos Jensen and Colin Potts. Privacy Policies as Decision-Making Tools: An Evaluation of Online Privacy Notices. In 2004 Conference on Human Factors in Computing Systems, pages 471–478, 2004.

[58] Xiaolin Jia, Quanyuan Feng, Taihua Fan, and Quanshui Lei. Rfid technology and its applications in internet of things (iot). In Consumer Electronics, Communications and Networks (CECNet), 2012 2nd International Conference on, pages 1282–1285. IEEE, 2012.

[59] Patrick Kelley, Sunny Consolvo, Lorrie Cranor, Jaeyeon Jung, Norman Sadeh, and David Wetherall. A conundrum of permissions: installing applications on an android smartphone. Financial cryptography and data security, pages 68–79, 2012.

[60] Sean Dieter Tebje Kelly, Nagender Kumar Suryadevara, and Subhas Chandra Mukhopadhyay. Towards the implementation of iot for environmental condition monitoring in homes. IEEE Sensors Journal, 13(10):3846–3853, 2013.

[61] Bart P. Knijnenburg. A user-tailored approach to privacy decision support. Ph.D. Thesis, University of California, Irvine, Irvine, CA, 2015.

[62] Bart P Knijnenburg. Privacy? i can’t even! making a case for user-tailored privacy. IEEE Security & Privacy, 15(4):62–67, 2017.

[63] Bart P. Knijnenburg. Privacy? I Can’t Even! Making a Case for User-Tailored Privacy. IEEE Security & Privacy, 15(4):62–67, 2017.

[64] Bart P Knijnenburg and Alfred Kobsa. Helping users with information disclosure decisions: potential for adaptation. In Proceedings of the 2013 international conference on Intelligent user interfaces, pages 407–416. ACM, 2013.

[65] Bart P Knijnenburg, Alfred Kobsa, and Hongxia Jin. Dimensionality of information disclosure behavior. International Journal of Human-Computer Studies, 71(12):1144–1162, 2013.

[66] Bart Piet Knijnenburg and Alfred Kobsa. Increasing sharing tendency without reducing satisfaction: finding the best privacy-settings user interface for social networks. 2014.

[67] Alfred Kobsa, Ramnath K Chellappa, and Sarah Spiekermann. Privacy-enhanced personalization. In CHI’06 extended abstracts on Human factors in computing systems, pages 1631–1634. ACM, 2006.

[68] Alfred Kobsa, Hichang Cho, and Bart P. Knijnenburg. The Effect of Personalization Provider Characteristics on Privacy Attitudes and Behaviors: An Elaboration Likelihood Model Approach. Journal of the Association for Information Science and Technology, 67:2587–2606, February 2016.

[69] Trupti M Kodinariya and Prashant R Makwana. Review on determining number of cluster in k-means clustering. International Journal, 1(6):90–95, 2013.

[70] Robert S Laufer and Maxine Wolfe. Privacy as a concept and a social issue: A multidimensional developmental theory. Journal of social Issues, 33(3):22–42, 1977.

[71] Mihai T Lazarescu. Design of a wsn platform for long-term environmental monitoring for iot applications. IEEE Journal on emerging and selected topics in circuits and systems, 3(1):45–54, 2013.

[72] Hosub Lee and Alfred Kobsa. Understanding user privacy in internet of things environments. 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT), pages 407–412, 2016.

[73] Wonjun Lee and Seungjae Shin. An empirical study of consumer adoption of internet of things services. INTERNATIONAL JOURNAL OF ENGINEERING AND TECHNOLOGY INNOVATION, 9(1):1–11, 2019.

[74] Woojin Lee, Lina Xiong, and Clark Hu. The effect of facebook users’ arousal and valence on intention to go to the festival: Applying an extension of the technology acceptance model. International Journal of Hospitality Management, 31(3):819–827, 2012.

[75] Xu Li, Rongxing Lu, Xiaohui Liang, Xuemin Shen, Jiming Chen, and Xiaodong Lin. Smart community: an internet of things application. IEEE Communications Magazine, 49(11), 2011.

[76] Yao Li, Alfred Kobsa, Bart P Knijnenburg, and MH Carolyn Nguyen. Cross-cultural privacy prediction. Proceedings on Privacy Enhancing Technologies, 2:93–112, 2017.

[77] Jiunn-Woei Lian. Critical factors for cloud based e-invoice service adoption in taiwan: An empirical study. International Journal of Information Management, 35(1):98–109, 2015.

[78] Jialiu Lin, Bin Liu, Norman Sadeh, and Jason I Hong. Modeling users’ mobile app privacy preferences: Restoring usability in a sea of permission settings. In Symposium on Usable Privacy and Security (SOUPS), pages 199–212, 2014.

[79] Bin Liu, Mads Schaarup Andersen, Florian Schaub, Hazim Almuhimedi, Shikun (Aerin) Zhang, Norman Sadeh, Yuvraj Agarwal, and Alessandro Acquisti. Follow My Recommendations: A Personalized Privacy Assistant for Mobile App Permissions. In Proceedings of the 2016 Symposium on Usable Privacy and Security, 2016.

[80] Franco Loi, Arunan Sivanathan, Hassan Habibi Gharakheili, Adam Radford, and Vijay Sivaraman. Systematically Evaluating Security and Privacy for Consumer IoT Devices. In Proceedings of the 2017 Workshop on Internet of Things Security and Privacy, IoTS&P ’17, pages 1–6, New York, NY, USA, 2017. ACM.

[81] Chris Lu. Overview of security and privacy issues in the internet of things, 2014.

[82] Michelle Madejski, Maritza Johnson, and Steven M Bellovin. A study of privacy settings errors in an online social network. In Pervasive Computing and Communications Workshops (PERCOM Workshops), 2012 IEEE International Conference on, pages 340–345. IEEE, 2012.

[83] Carlo Maria Medaglia and Alexandru Serbanati. An overview of privacy and security issues in the internet of things. In The Internet of Things, pages 389–395. Springer, 2010.

[84] Miriam J Metzger. Privacy, trust, and disclosure: Exploring barriers to electronic commerce. Journal of computer-mediated communication, 9(4):JCMC942, 2004.

[85] George R Milne and Mary J Culnan. Strategies for reducing online privacy risks: Why consumers read (or don’t read) online privacy notices. Journal of interactive marketing, 18(3):15–29, 2004.

[86] Monika Mital, Victor Chang, Praveen Choudhary, Armando Papa, and Ashis K Pani. Adoption of internet of things in india: A test of competing models using a structured equation modeling approach. Technological Forecasting and Social Change, 136:339–346, 2018.

[87] Helen Nissenbaum. Privacy as Contextual Integrity Symposium - Technology, Values, and the Justice System. Washington Law Review, 79:119–158, 2004.

[88] Judith S Olson, Jonathan Grudin, and Eric Horvitz. A study of preferences for sharing and privacy. In CHI’05 extended abstracts on Human factors in computing systems, pages 1985–1988, 2005.

[89] Gautham Pallapa, Sajal K Das, Mario Di Francesco, and Tuomas Aura. Adaptive and context-aware privacy preservation exploiting user interactions in smart environments. Pervasive and Mobile Computing, 12:232–243, 2014.

[90] Yangil Park and Jengchung V Chen. Acceptance and adoption of the innovative use of smartphone. Industrial Management & Data Systems, 107(9):1349–1365, 2007.

[91] Charith Perera, Ciaran McCormick, Arosha K. Bandara, Blaine A. Price, and Bashar Nuseibeh. Privacy-by-Design Framework for Assessing Internet of Things Applications and Platforms. In Proceedings of the 6th International Conference on the Internet of Things, IoT’16, pages 83–92, New York, NY, USA, 2016. ACM.

[92] Andreas Pfitzmann and Marit Köhntopp. Anonymity, unobservability, and pseudonymity—a proposal for terminology. In Designing privacy enhancing technologies, pages 1–9. Springer, 2001.

[93] Joseph Phelps, Glen Nowak, and Elizabeth Ferrell. Privacy Concerns and Consumer Willingness to Provide Personal Information. Journal of Public Policy & Marketing, 19(1):27–41, 2000.

[94] Tero Pikkarainen, Kari Pikkarainen, Heikki Karjaluoto, and Seppo Pahnila. Consumer acceptance of online banking: an extension of the technology acceptance model. Internet research, 14(3):224–235, 2004.

[95] Michael E Porter and James E Heppelmann. How smart, connected products are transforming competition. Harvard business review, 92(11):64–88, 2014.

[96] Gil Press. Internet of things by the numbers: Market estimates and forecasts, 2014.

[97] Frederic Raber, Alexander De Luca, and Moritz Graus. Privacy wedges: Area-based audience selection for social network posts. In Proceedings of the 2016 Symposium on Usable Privacy and Security, 2016.

[98] Rupak Rauniar, Greg Rawski, Jei Yang, and Ben Johnson. Technology acceptance model (tam) and social media usage: an empirical study on facebook. Journal of Enterprise Information Management, 27(1):6–30, 2014.

[99] Ramprasad Ravichandran, Michael Benisch, Patrick Gage Kelley, and Norman M Sadeh. Capturing social networking privacy preferences. In Proceedings of the 2009 Symposium on Usable Privacy and Security, pages 1–18, 2009.

[100] Luke Russell, Rafik Goubran, and Felix Kwamena. Personalization using sensors for preliminary human detection in an iot environment. In Distributed Computing in Sensor Systems (DCOSS), 2015 International Conference on, pages 236–241. IEEE, 2015.

[101] Norman Sadeh, Jason Hong, Lorrie Cranor, Ian Fette, Patrick Kelley, Madhu Prabaker, and Jinghai Rao. Understanding and capturing people’s privacy policies in a mobile social networking application. Personal Ubiquitous Comput., 13(6):401–412, August 2009.

[102] R. S. Sandhu and P. Samarati. Access control: principle and practice. IEEE Communications Magazine, 32(9):40–48, 1994.

[103] Barry Schwartz. The paradox of choice: Why more is less, volume 6. HarperCollins New York, 2004.

[104] Xiaopu Shang, Runtong Zhang, and Ying Chen. Internet of things (iot) service architecture and its application in e-commerce. Journal of Electronic Commerce in Organizations (JECO), 10(3):44–55, 2012.

[105] Hong Sheng, Fiona Fui-Hoon Nah, and Keng Siau. An Experimental Study on Ubiquitous commerce Adoption: Impact of Personalization and Privacy Concerns. Journal of the Association for Information Systems, 9(6):344–376, June 2008.

[106] N. Craig Smith, Daniel G. Goldstein, and Eric J. Johnson. Choice Without Awareness: Ethical and Policy Implications of Defaults. Journal of Public Policy & Marketing, 32(2):159–172, 2013.

[107] Ludovico Solima, Maria Rosaria Della Peruta, and Manlio Del Giudice. Object-generated content and knowledge sharing: the forthcoming impact of the internet of things. Journal of the Knowledge Economy, 7(3):738–752, 2016.

[108] Juliana Sutanto, Elia Palme, Chuan-Hoo Tan, and Chee Wei Phang. Addressing the Personalization-Privacy Paradox: An Empirical Assessment from a Field Experiment on Smartphone Users. MIS Quarterly, 37(4):1141–1164, 2013.

[109] Karen Tang, Jason Hong, and Dan Siewiorek. The implications of offering more disclosure choices for social location sharing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 391–394. ACM, 2012.

[110] David G Taylor, Donna F Davis, and Ravi Jillapalli. Privacy concern and online personalization: The moderating effects of information control and compensation. Electronic commerce research, 9(3):203–223, 2009.

[111] Max Teltzrow and Alfred Kobsa. Impacts of User Privacy Preferences on Personalized Systems: a Comparative Study. In Clare-Marie Karat, Jan Blom, and John Karat, editors, Designing Personalized User Experiences for eCommerce, pages 315–332. Kluwer Academic Publishers, Dordrecht, Netherlands, 2004. DOI 10.1007/1-4020-2148-8_17.

[112] The European Parliament and the Council of the European Union. Regulation (eu) 2016/679 of the european parliament and of the council. Official Journal of the European Union, page 1:88, 2016.

[113] Horst Treiblmaier and Irene Pollach. Users’ Perceptions of Benefits and Costs of Personalization. In ICIS 2007 Proceedings, 2007.

[114] Lynn Tsai, Primal Wijesekera, Joel Reardon, Irwin Reyes, Serge Egelman, David Wagner, Nathan Good, and Jung-Wei Chen. Turtle guard: Helping android users apply contextual privacy preferences. In Symposium on Usable Privacy and Security (SOUPS), 2017.

[115] Virpi Kristiina Tuunainen, Olli Pitkänen, and Marjaana Hovi. Users’ awareness of privacy on online social networking sites-case facebook. Bled 2009 Proceedings, page 42, 2009.

[116] Dieter Uckelmann, Mark Harrison, and Florian Michahelles. An architectural approach towards the future internet of things. In Architecting the internet of things, pages 1–24. Springer, 2011.

[117] Blase Ur, Jaeyeon Jung, and Stuart Schechter. Intruders Versus Intrusiveness: Teens’ and Parents’ Perspectives on Home-entryway Surveillance. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp ’14, pages 129–139, New York, NY, USA, 2014. ACM.

[118] Thibaut Vallée, Karima Sedki, Sylvie Despres, M-Christine Jaulant, Karim Tabia, and Adrien Ugon. On personalization in iot. In Computational Science and Computational Intelligence (CSCI), 2016 International Conference on, pages 186–191. IEEE, 2016.

[119] Gregg Vanderheiden and Jutta Treviranus. Creating a global public inclusive infrastructure. In International Conference on Universal Access in Human-Computer Interaction, pages 517–526. Springer, 2011.

[120] Viswanath Venkatesh and Susan A Brown. A longitudinal investigation of personal computers in homes: adoption determinants and emerging challenges. MIS quarterly, pages 71–102, 2001.

[121] Viswanath Venkatesh, Michael G Morris, Gordon B Davis, and Fred D Davis. User acceptance of information technology: Toward a unified view. MIS quarterly, pages 425–478, 2003.

[122] Vassilios S Verykios, Elisa Bertino, Igor Nai Fovino, Loredana Parasiliti Provenza, Yucel Saygin, and Yannis Theodoridis. State-of-the-art in privacy preserving data mining. ACM Sigmod Record, 33(1):50–57, 2004.

[123] Michele Vescovi, Corrado Moiso, Mattia Pasolli, Lorenzo Cordin, and Fabrizio Antonelli. Building an eco-system of trusted services via user control and transparency on personal data. In IFIP International Conference on Trust Management, pages 240–250. Springer, 2015.

[124] Weiquan Wang and Izak Benbasat. Recommendation agents for electronic commerce: Effects of explanation facilities on trusting beliefs. Journal of Management Information Systems, 23(4):217–246, 2007.

[125] Weiquan Wang and Izak Benbasat. Interactive decision aids for consumer decision making in e-commerce: The influence of perceived strategy restrictiveness. MIS quarterly, pages 293–320, 2009.

[126] Jason Watson, Andrew Besmer, and Heather Richter Lipford. +Your circles: sharing behavior on Google+. In Proceedings of the 8th Symposium on Usable Privacy and Security, pages 12:1–12:10, 2012.

[127] Vishanth Weerakkody, Ramzi El-Haddadeh, Faris Al-Sobhi, Mahmud Akhter Shareef, and Yogesh K Dwivedi. Examining the influence of intermediaries in facilitating e-government adoption: An empirical investigation. International Journal of Information Management, 33(5):716–725, 2013.

[128] Bruce D Weinberg, George R Milne, Yana G Andonova, and Fatima M Hajjat. Internet of things: Convenience vs. privacy and secrecy. Business Horizons, 58(6):615–624, 2015.

[129] Primal Wijesekera, Arjun Baokar, Lynn Tsai, Joel Reardon, Serge Egelman, David Wagner, and Konstantin Beznosov. The feasibility of dynamically granted permissions: Aligning mobile privacy with user preferences. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 1077–1093. IEEE, 2017.

[130] Meredydd Williams, Jason RC Nurse, and Sadie Creese. The perfect storm: The privacy paradox and the internet-of-things. In 11th International Conference on Availability, Reliability and Security, pages 644–652, 2016.

[131] Pamela Wisniewski, Bart P Knijnenburg, and H Richter Lipford. Profiling facebook users privacy behaviors. In SOUPS2014 Workshop on Privacy Personas and Segmentation, 2014.

[132] Pamela J Wisniewski, Bart P Knijnenburg, and Heather Richter Lipford. Making privacy personal: Profiling social network users to inform privacy education and nudging. International Journal of Human-Computer Studies, 98:95–108, 2017.

[133] Ian H Witten, Eibe Frank, Mark A Hall, and Christopher J Pal. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2016.

[134] Barbara H Wixom and Peter A Todd. A theoretical integration of user satisfaction and technology acceptance. Information systems research, 16(1):85–102, 2005.

[135] Peter Worthy, Ben Matthews, and Stephen Viller. Trust Me: Doubts and Concerns Living with the Internet of Things. In Proceedings of the 2016 ACM Conference on Designing Interactive Systems, DIS ’16, pages 427–434, New York, NY, USA, 2016. ACM.

[136] Heng Xu, Xin Robert Luo, John M Carroll, and Mary Beth Rosson. The personalization privacy paradox: An exploratory study of decision making process for location-aware marketing. Decision support systems, 51(1):42–52, 2011.

[137] Heng Xu, Hock-Hai Teo, Bernard CY Tan, and Ritu Agarwal. The role of push-pull technology in privacy calculus: the case of location-based services. Journal of management information systems, 26(3):135–174, 2009.

[138] Heng Xu, Hock-Hai Teo, Bernard CY Tan, and Ritu Agarwal. Research note—effects of individual self-protection, industry self-regulation, and government regulation on privacy concerns: a study of location-based services. Information Systems Research, 23(4):1342–1363, 2012.

[139] Tianlong Yu, Vyas Sekar, Srinivasan Seshan, Yuvraj Agarwal, and Chenren Xu. Handling a trillion (unfixable) flaws on a billion devices: Rethinking network security for the internet-of-things. In Proceedings of the 14th ACM Workshop on Hot Topics in Networks, page 5. ACM, 2015.

Appendices

Figure A1: Attention Check Question of Evaluating privacy-setting UI for Household IoT

Table A1: Table of Accuracies.

                   Pick      Single Smart  Direct      Privacy   Social     Negotiability
                   Profile   Default       prediction  Attitude  Behaviour

S set
Identity            66.67 %   66.67 %       66.67 %     66.67 %   66.67 %    66.67 %
Contacts            83.33 %   70.00 %       70.00 %     56.67 %   73.33 %    80.00 %
Location            83.33 %   83.33 %       83.33 %     83.33 %   83.33 %    83.33 %
SMS                 90.00 %   50.00 %       70.00 %     50.00 %   53.33 %    73.33 %
Storage             83.33 %   56.67 %       70.00 %     43.33 %   46.67 %    60.00 %
Camera              80.00 %   60.00 %       86.67 %     60.00 %   70.00 %    63.33 %
Bluetooth           83.33 %   83.33 %       83.33 %     83.33 %   83.33 %    83.33 %
Photos              80.00 %   66.67 %      100.00 %     60.00 %   76.66 %    70.00 %
Phone               96.67 %   56.67 %       76.67 %     50.00 %   60.00 %    80.00 %
Motion              96.67 %   96.67 %       96.67 %     96.67 %   96.67 %    96.67 %
Media               70.00 %   76.67 %       56.67 %     43.33 %   33.33 %    60.00 %
Mobile Data         76.67 %   76.67 %       76.67 %     76.67 %   76.67 %    76.67 %
Average             82.50 %   70.28 %       78.06 %     64.17 %   68.33 %    74.44 %

A set
First Name         100.00 %   63.33 %      100.00 %     63.33 %   73.33 %    56.67 %
Last Name           96.67 %   60.00 %       96.67 %     60.00 %   70.00 %    60.00 %
Gender              76.67 %   76.67 %       76.67 %     76.67 %   76.67 %    76.67 %
Birthday            90.00 %   60.00 %       90.00 %     60.00 %   63.33 %    53.33 %
Height              70.00 %   70.00 %       70.00 %     70.00 %   70.00 %    70.00 %
Weight              70.00 %   70.00 %       70.00 %     70.00 %   70.00 %    70.00 %
Average             83.89 %   66.67 %       83.89 %     66.67 %   70.55 %    64.44 %

F set
Steps               96.67 %   73.33 %       96.67 %     76.67 %   70.00 %    76.67 %
Distance            96.67 %   73.33 %       96.67 %     76.67 %   70.00 %    76.67 %
Elevation          100.00 %   70.00 %      100.00 %     73.33 %   73.33 %    80.00 %
Floors              96.67 %   73.33 %       96.67 %     76.67 %   70.00 %    76.67 %
Activity minutes   100.00 %   70.00 %      100.00 %     73.33 %   73.33 %    80.00 %
Calories activity   96.67 %   73.33 %       96.67 %     76.67 %   70.00 %    76.67 %
Weight              90.00 %   60.00 %       90.00 %     63.33 %   70.00 %    76.67 %
Sleep               93.33 %   63.33 %       93.33 %     66.67 %   66.67 %    80.00 %
Heartrate          100.00 %   70.00 %      100.00 %     73.33 %   73.33 %    80.00 %
Food logs           90.00 %   60.00 %       90.00 %     63.33 %   70.00 %    76.67 %
Friends             83.33 %   53.33 %       83.33 %     56.67 %   63.33 %    70.00 %
Profile             96.67 %   66.67 %       96.67 %     70.00 %   76.67 %    76.67 %
Location            86.67 %   56.67 %       86.67 %     60.00 %   66.67 %    66.67 %
Device & settings   93.33 %   63.33 %       93.33 %     66.67 %   73.33 %    73.33 %
Average             94.29 %   66.19 %       94.29 %     69.52 %   70.48 %    76.19 %

G set
SN Public           90.00 %   90.00 %       90.00 %     90.00 %   90.00 %    90.00 %
SN Friends Only     73.33 %   53.33 %       73.33 %     63.33 %   60.00 %    56.67 %
Health              66.67 %   60.00 %       60.00 %     43.33 %   40.00 %    70.00 %
Other Apps          76.67 %   76.67 %       76.67 %     76.67 %   76.67 %    76.67 %
Corporate           80.00 %   80.00 %       80.00 %     80.00 %   80.00 %    80.00 %
Government          86.67 %   86.67 %       86.67 %     86.67 %   86.67 %    86.67 %
Health              86.67 %   86.67 %       86.67 %     86.67 %   86.67 %    86.67 %
Safety              90.00 %   90.00 %       90.00 %     90.00 %   90.00 %    90.00 %
Social              93.33 %   60.00 %      100.00 %     70.00 %   60.00 %    63.33 %
Commercial          73.33 %   73.33 %       73.33 %     73.33 %   73.33 %    73.33 %
Convenience         80.00 %   73.33 %       73.33 %     76.67 %   66.67 %    70.00 %
Frequency           53.33 %   53.33 %       53.33 %     53.00 %   53.33 %    53.33 %
Retention           50.00 %   40.00 %       50.00 %     50.00 %   43.33 %    46.67 %
Average             76.92 %   71.02 %       76.41 %     72.31 %   69.74 %    72.56 %

Over-all Average    84.74 %   68.74 %       83.41 %     68.52 %   69.70 %    73.11 %
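
The "Average" rows of Table A1 can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the original analysis: it transcribes the first value column of the table (assumed here to be the "Pick Profile" condition) and recomputes the per-set and over-all averages. The names FIRST_COLUMN, set_averages, and overall_average are hypothetical. The numbers suggest that the over-all average is taken over all 45 data items rather than over the four set averages.

```python
# First value column of Table A1 (accuracy in %), transcribed per data set.
# Assumption: this column corresponds to the "Pick Profile" condition.
FIRST_COLUMN = {
    "S": [66.67, 83.33, 83.33, 90.00, 83.33, 80.00,
          83.33, 80.00, 96.67, 96.67, 70.00, 76.67],
    "A": [100.00, 96.67, 76.67, 90.00, 70.00, 70.00],
    "F": [96.67, 96.67, 100.00, 96.67, 100.00, 96.67, 90.00,
          93.33, 100.00, 90.00, 83.33, 96.67, 86.67, 93.33],
    "G": [90.00, 73.33, 66.67, 76.67, 80.00, 86.67, 86.67,
          90.00, 93.33, 73.33, 80.00, 53.33, 50.00],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def set_averages(column):
    """Average accuracy per data set, rounded to two decimals."""
    return {name: round(mean(vals), 2) for name, vals in column.items()}


def overall_average(column):
    """Average over every data item in every set (not over the set averages)."""
    all_values = [v for vals in column.values() for v in vals]
    return round(mean(all_values), 2)


if __name__ == "__main__":
    print(set_averages(FIRST_COLUMN))
    print(overall_average(FIRST_COLUMN))
```

Under these assumptions the per-set results agree with the tabulated 82.50, 83.89, 94.29, and 76.92, and the item-level mean agrees with the tabulated over-all value of 84.74.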

Figure A2: Instructions on how to use the UIs

Figure A3: User Interface 1 with all settings turned off

Figure A4: User Interface 2 with all settings turned off

Figure A5: User Interface 1 with all settings turned on

Figure A6: User Interface 2 with all settings turned on

Figure A7: User Interface 1 with Smart Default

Figure A8: User Interface 2 with Smart Default

Figure A9: User Interface 1 with Smart Profiles

Figure A10: User Interface 2 with Smart Profiles
