QUES_MON.pdf

1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics

1

 Abstract—In the research of location privacy protection, the

existing methods are mostly based on the traditional anonymization, fuzzy and cryptography technology, and little success in the big data environment, for example, the sensor networks contain sensitive information, which is compulsory to be appropriately protected. Current trends such as "Industrie 4.0" and Internet of Things (IoT), generate, process, and exchange vast amounts of security-critical and privacy-sensitive data, which makes them attractive targets of attacks. However, previous methods overlooked the privacy protection issue, leading to privacy violation. In this paper, we propose a location privacy protection method that satisfying differential privacy constraint to protect location data privacy and maximize the utility of data and algorithm in Industrial Internet of Things. In view of the high value and low density of location data, we combine the utility with the privacy and build a multilevel location information tree model. Furthermore, the index mechanism of differential privacy is used to select data according to the tree node accessing frequency. Finally, the Laplace scheme is used to add noises to accessing frequency of the selecting data. As is shown in the theoretical analysis and the experimental results, the proposed strategy can achieve significant improvements in terms of security, privacy, and applicability.

Index Terms—Differential privacy, location privacy protection, location privacy tree, Internet of Things.

I. INTRODUCTION

HE development of information technology has accumulated a great deal of data for today's digital systems.

Big data is a very important research and development resource,

This work was funded by the National Natural Science Foundation of China

(61772282, 61373134, 61772454, and 61402234). It was also supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX17_0901) and Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET). (Corresponding author: Jin Wang.)

Chunyong Yin is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).

Jinwen Xi is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).

Ruxia Sun is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).

Jin Wang is with the College of Information Engineering, Yangzhou University, Yangzhou 225127, China (e-mail: [email protected]).

and the demand for data publishing, sharing and analysis are also growing rapidly. People pay more and more attention to the security of data protection, and the government, enterprises, and individuals are also improving their understanding of privacy protection.

Body sensor networks (BSNs), as a special application of wireless sensor networks (WSNs) [1], are deployed on the surface of bodies for periodically monitoring physical conditions. For a typical social participatory sensing application, it is important to motivate participatory, at the same time, the participatory sensing process should not disclose the privacy information of any participating party (the private data) or the community (patterns, distribution, etc.). Therefore, it is essential to develop a privacy protection data strategy for the protection of the users and the community [2].

Devices in the Internet of Things (IoT) generate, process, and exchange vast amounts of security and safety-critical data as well as privacy-sensitive information, and hence are appealing targets of various attacks. To ensure the correct and safe operation of IoT systems, it is crucial to assure the integrity of the underlying devices, especially their code and data privacy, against malicious modification [3].

The privacy threats of Industrial Internet of Things (IIoT) [4] can be simply divided into two categories: privacy threats based on data and privacy threats on location. Data privacy issues mainly refer to the leakage of secret information in the process of data acquisition and transmission in Internet of Things.

Location privacy is an important part of privacy protection of the Internet of Things. It mainly refers to the location privacy of each node in the Internet of Things and the location privacy of the Internet of Things in providing various location services, especially including the RFID reader location privacy RFID user location privacy, sensor Node location privacy, and location-based privacy issues based on location services [3], [5].

Usually, data collected, aggregated and transmitted in sensor networks contain personal and sensitive information, which directly or indirectly reveals the condition of a person. If the data cannot be properly preserved, once exposed to the public, the privacy will be destroyed. Therefore, protecting the privacy of sensitive data is greatly important [6], [7].

Location data implies moving objects, spatial coordinates, current time, and unique features different from other data, which is discrete and of high value. Before the concept of big data, most of the privacy protection methods focus on a small

Chunyong Yin, Jinwen Xi, Ruxia Sun, Jin Wang, Member, IEEE

Location Privacy Protection based on Differential Privacy Strategy for Big Data in

Industrial Internet-of-Things

T

1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics

2

number of non-positional data. There are some limitations for the protection of location data privacy in big data. There are two main reasons as follows: (1) Multiple data fusion by big data makes traditional anonymity [8], [9], [10] and fuzzification technology difficult to take effect in location privacy protection [11]; (2) Traditional cryptography technology takes little effect on the real-time analysis required by big data.

In summary, location privacy protection faces great challenges for big data in IIoT. Therefore, we propose a more rigorous LPT-DP-k location privacy protection method based on differential privacy strategy for big data in sensor networks.

Our specific contributions mainly include: (I) We introduce a tree structure to represent the location data

in sensor networks, which called LPT (Location privacy tree), according to the characteristics and retrieval difficulty of location data.

(II) The differential privacy strategy is suitable for location privacy protection because it is insensitive to the background knowledge and the DP-k (Differential privacy-k) model has better protection effect. Laplace and index mechanism are the main implementation mechanisms of differential privacy, which can show the degree of privacy protection by allocating the privacy budget and is relatively reliable and rigorous.

(III) We conduct extensive experiments on real-world datasets which show that the proposed location privacy protection method can protect users’ location privacy without significantly affecting the privacy, availability, security, and effectiveness.

Section 2 presents necessary background and related work. Section 3 introduces the preliminary knowledge. Section 4 details the proposed location privacy protection method. The real-world datasets and experimental results are presented in Section 5 and conclusions in Section 6.

II. RELATED WORK

In the process of data mining and data publishing, the privacy protection of location data mainly involves two aspects: privacy model and utility.

A. Privacy Model

In recent years, with the widespread application of location services and data mining and publishing, location privacy protection model in location service can be divided into two categories: one is the traditional anonymous model based on grouping; the other is the differential privacy model that ignores the attacker’s background knowledge.

(1) Anonymous model based on grouping The traditional anonymous model based on grouping plays

an important role in location privacy protection. Samarati et al. [12] firstly proposed k-anonymity method and other a large number of methods based on k-anonymity [13], [14], [15]. Previous research [15], [16] shows that only using anonymous methods does not provide good protection to a wide range of data. According to [17], the encryption privacy protection method is proposed, which can completely protect the privacy of data and prevent the leakage of data in the process of location

service, but the availability of data is insufficient. The development of traditional location privacy protection technology has gone through three stages: the "informed and consent" method proposed by document [18] has been developed to deal with anonymous processing in a single location, and then to deal with anonymous processing of user's trajectory data. Heuristic privacy measurement, probability deduction, and private information retrieval based technologies are common methods to protect location privacy. The heuristic privacy measurement method mainly protects the users who are not in the strict privacy protection environment, such as k-anonymity, t-closeness [19], m-invariance [20], and l-diversity. However, these three kinds of methods are proposed as a unified attacking model, which protect the location data under the premise of accumulating certain background knowledge. As attackers grasp more background knowledge, these methods cannot effectively protect the user’s location data privacy and the lack of privacy protection about the type of relational privacy protection method is proposed in [21].

(2) Differential privacy model Differential privacy model is a strong privacy concept which

is completely independent of attacker’s background knowledge and computing ability, and has become a research hotspot in recent years. Compared to the traditional privacy protection model, differential privacy has its unique advantages. Firstly, the model assumes that the attacker has the maximum background knowledge. Secondly, differential privacy has a solid mathematical foundation, a strict definition of privacy protection and a reliable quantitative evaluation method. In recent years, differential privacy model has been widely applied in privacy protection. Especially, the FM (Functional mechanism) is introduced in [22], in which an objective function of  -differential privacy disturbance optimization problem is used to protect privacy. The Diff-FPM (Privacy-Preserving Mining) algorithm is proposed in [23], [24], [25] and combined with MCMC (Markov Chain Monte Carlo), which provides privacy protection and maintains the high data availability while satisfying (   )-differential privacy conditions. The PrivBasis and SmartTrunc method proposed in [26] adopt differential privacy model in the mining process of frequent itemsets, ensuring the privacy and utility of data analysis and anonymity. The DiffP-C4.5 and DiffGen algorithm introduced in [27] combines differential privacy with decision trees and other data structures to maintain a balance between data privacy and availability.

In a word, differential privacy is an effective privacy protection mechanism, which protects the user’s location privacy while keeping enough useful information for data analysis.

B. Utility Maximization

In the process of location service, data mining and data publishing need to protect location privacy and provide enough information for data analysis. Therefore, data utility is the core issue that needs to be paid attention to. Reference [28] uses compressive sensing theory to propose a perception mechanism

1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics

3

to solve the published issue of statistical results, which can effectively solve the insufficiency problem of data utility, but it destroys the link among data. DP-topk method is proposed in [29], which has a more rigorous definition of effectiveness. But this method ignored the relation among transaction data, and because of its poor efficiency in data processing individually, the availability of algorithm is not high. To maximize the utility of the results and meet the requirement of location privacy protection, the paper proposes LPT-DP-k algorithm, which is a more rigorous differential privacy protection method, not only can guarantee the high availability of data, but also can enhance the usability of the algorithm.

III. PRELIMINARY KNOWLEDGE

Differential privacy is a common privacy protection framework that supported by the solid mathematical theory, which can provide privacy protection for data in the case that the attacker grasps the largest background knowledge. Definition 1 (Differential privacy)

Suppose there is a random algorithm M , MP is all possible

output sets for M , for any given adjacent data set D and D (there is at most one different record between them),

1D D  , MS is any subset of MP . If the algorithm M

satisfies the following inequality, the algorithm M will satisfy  -differential privacy protection.

     exp .r M r MP M D S P M D S           (1)

in which  .rP represents the randomness of the algorithm M on the data set D and D . Definition 2 (Sensitivity)

The differential privacy protection method defines two kinds of sensitivity, called global sensitivity and local sensitivity. Suppose there is a query function : df D R , the input is a

data set and the output is a d -dimensional real vector. For any adjacent data set:

    1,= max .D Df f D f D   (2)

called the global sensitivity of function f , and f represents the maximum change value of the output results.

   f D f D is the 1-order norm distance between  f D and  f D . Definition 3 (Implementation mechanism)

Laplace mechanism and Exponential mechanism [30] are two of the most basic implementation mechanisms of differential privacy protection. In this paper, the Laplace mechanism is used to add the noise that obeys the Laplace distribution to realize the differential privacy. Assuming the privacy protection algorithm f based on the Laplace

mechanism, the noise keeps to the Laplace distribution with the

variance f

 

is and the mean is 0. Then the probability density

function is:

 r 1

P , = exp . 2

x x 

       

(3)

in which x represents the specific variable and = f

  

.

If the algorithm f is proportional to the probability of

 , exp

2

D r 

   

  to select from O and export r , then the

algorithm f provides  -differential privacy protection, the concrete formula is as follows:

     

r

, , = : P exp .

2

D r A D r r O

 

    

        

(4)

in which  is the global sensitivity of the scoring function  ,D r , (4) shows that the higher the score is, the larger the

probability of the selected output will be.

IV. LPT-DP-K LOCATION PRIVACY PROTECTION

This section introduces the overview of LPT-DP-k algorithm and the specific implementation details of the algorithm.

A. LPT-DP-K Algorithm

Firstly, we choose the location privacy tree construction to maintain the relation among the location data. Secondly, we select the sensitive location information which is most likely to disclose privacy to add noise. The algorithm summarizes as follows: Algorithm 1 LPT-DP-k algorithm

1: Initialize: data set *D , location data set D and *D D , differential

privacy parameter  , parameter k , the number of nodes that are accessed by the data is count and empty sets A , B . 2: Construct a location privacy tree (LPT). 3: The node data will be divided into two categories, and the frequent patterns

of accessing frequency not less than count are recorded in the set, see section 3.2.

4: Calculate the output probability, the formula is  

1

.

.

i r i n

j j

a w P a

a w 

 , see

section 3.3.

5: Add noise and combine with *D

DLPT to construct the noisy location

privacy tree *D

DLPT k , see section 3.4.

6: Finally, publish the *D

DLPT k after step 4.

B. Construction of Location Information Tree Corresponding to Location Data

Assuming the location privacy tree *D

DLPT is corresponding

to the data set D , which records each person’s visiting records within a month. According to the location information, LPT-DP-k algorithm uses the tree structure with better correlation. The corresponding Table I is shown as follows:

TABLE I LOCATION CORRESPONDENCE

Number Location information Accessing count

1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics

4

1 Computer building 28

2 Remote Sensing

Institute 13

3 All-in-one card

management center 2

4 Mingde Building 6 5 Training Gym 5 6 Dormitory 3 30 …… …… ……

As is shown in Table I, the user’s location information and accessing frequency of the places within a month is listed, we can get the location correspondence table.

TABLE II LOCATION DATA INFORMATION

Number Location label Accessing count 1-28 1 28 29-41 2 13 42-43 3 2 44-49 4 6 50-54 5 5 55-84 6 30 85-97 1,2 13 …… …… ……

According to the content in Table II, we can construct the location privacy tree like Fig. 1.

0

1 #28

2 #13

...

... 1,3 #2

1,2 #13

1,2,3 #2

1,2,4 #6

...

1,2,3,6 #2

1,2,3,5 #2

... 1,2,3,4

#2

Fig. 1. Location privacy tree

As shown in Fig. 1, the total number of nodes in *D

DLPT is *

*

*

1

1

2 1 D

D

D i

C 

  .

C. Weighted Selection based on Exponential Mechanism

We use the exponential mechanism in the differential privacy protection model to do weighted selection. The formula is as follows:

 

1

. .

.

i r i n

j j

a w P a

a w 

(5)

 r iP a represents the probability of being selected and .ia w represents the weight of the pattern ia . The selection algorithm

based on the exponential mechanism is as follows: Algorithm 2 Selection algorithm

1: Initialize: records in the frequent pattern set A . 2: Input a set A of frequent pattern records and score each record ia .

3: Calculate the weight .ia w of each pattern record.

4: Take the weight .ia w of each pattern in the set A into the formula

 

1

.

.

i r i n

j j

a w P a

a w 

 , select k frequent pattern records ia and record the

set as B . the scoring function is:

   , .i iM A a Q a (6)

 iQ a represents the accessing frequency of pattern ia . Calculate the weight .ia w of each pattern record.

 1 ,. =exp .

2 i

i

M A a a w

M

    

  (7)

This paper sets up the scoring function    , i iM A a Q a ,  iQ a represents the accessing frequency of pattern ia .  , iM A a represents the scoring value of ia . The formula to

calculate M is as follows:

    1,

max .i j i j N

M Q a Q a 

   (8)

M represents the maximum value of the difference in the accessing frequency among N data record modes.

D. Noise Enhancement based on Laplace Mechanism

This paper proposes the method of node data processing to improve the efficiency of data processing and the effectiveness of the algorithm. The concrete steps are as follows: Algorithm 3 Laplace noising algorithm

1: Initialize: the privacy budget  , k frequent pattern set B after weighted selection, query function f .

2: Add noise into the frequent pattern set B to keep to the Laplace distribution and destabilize the real accessing frequency of k records. The added noise keeps to Laplace distribution that the probability density function is

 r 1

P , = exp 2

x x 

       

.

3: According to (9), the noise set C can be calculated. Finally, the noisy

location privacy tree *D

DLPT k is generated by combining the original data

tree *D

DLPT with the set C .

Therefore, it should be satisfied in every round of noise adding process:

    . f

f b f b lap   

     

(9)

B is the set of k frequent pattern b .  f b is the query

function on the record b and f

lap   

   

represents the noise

that keeps to Laplace distribution, as is shown in Fig. 2.

1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics

Fig

be

LP

ef th

ap

ap

A.

(G am III

Us

M py

g. 2. Relationship

True positive e used to mea

*D DPT k . Fal

ffectiveness of e effectivenes True Positive

ppear in kf D

False Positiv

ppear not in kf

Accuracy:

False Rejectio

F

Error:

Experiment

 Data s This paper u

Gowalla total- mount of data I shows the ex

ser id Check- 0 2010-10-19 0 2010-10-18 0 2010-10-17 0 2010-10-12

 Opera The experi

MATLAB(R20 ython). Hardw

p between privacy

e (TP), false Po asure the utili

lse rejection r

f the algorithm ss of the algori e (TP): the n

D and in kf D

kTP f

ve (FP): the n

 kf D , but in

 kFP f D

Accuracy 

on Rate (FRR

kf D FRR 

LP Error 

V. EXPERIM

Setting

set uses the chec -check-in data and records th

xperimental da TA

EXPERIME

-in time L T23:55:27Z 30.2 T22:17:43Z 30.2 T23:42:03Z 30.2 T19:44:40Z 30.2

ating environm imental envi 13b) and PyC ware environ

y budget and nois

ositive (FP) an ty of noisy lo

rate (FRR) is

m. The smaller ithm will be. number of fre

D :

   k kD f D

number of fre

 kf D :

  k kf D f 

    

k k

k

f D f

f D

 

R):

  kf D f k

 

* *

*

2

D D D D

D D

PT LPT

LPT

MENTAL ANALY

ck-in data set a set). This d he information ata set. ABLE III ENTAL DATA SET

Latitude Lon 23590912 -97.79 26910295 -97.74 25573099 -97.76 26910295 -97.74

ment and param ironment in

Charm (Part of nment is: 2

se size

nd accuracy [3 ocation privac

used to analy

r FRR is, the

equent pattern

 .

equent pattern

  .kf D

  .

D

  .

kf D

2 . k

YSIS

from the Go data set has a n of the users.

ngitude Locat 9513958 228 4939537 420 6338577 316 4939537 420

meter configur n this pape f the program .60GHz Cor

31] can cy tree

yze the

higher

ns that

(10)

ns that

(11)

(12)

(13)

(14)

owalla a large Table

tion id 847

0315 6637 0315

ration er is

m using e(TM)

i7-6 para

Lo

B.

T (1) Tim priv

TIM

patte n 16 32 64 12 25 51

102 204 409

A reco is co

 k

A extr incr prot time

C.

T with repr sens 3, 4

6700HQ CPU ameters are sh

Parameter

k

Location id ocation accessing

frequency

Analysis of Al

The timeliness Timeliness

meliness of w vacy tree. (3) T

ME OF BUILDING

ern n

time.build (s

6 0.0000015 2 0.0000016 4 0.0000029 8 0.0000057 6 0.0000116 2 0.0000149

24 0.0000262 48 0.0000370 96 0.0000443

As shown in T onstructing loc onstantly chan

EXTRACTI

 0.01 k 3432

As is shown raction efficie rease of  . A tection decrea e increases.

Analysis of D

This paper ext h 500, 1000, resent the inse sitive location

4, and 5:

U, 20.00GB, W hown in Table

TAB PARAMETE

20 40 60 80

0.005 0.01 0. 0.7

lgorithm Time

analysis is m of constructin

weighted selec The efficiency

TAB AND ADDING NO

INFORMA

) time.update (s)

0.000103010 0.000103110 0.000103324 0.000103010 0.000104032 0.000103000 0.000105032 0.000106340 0.000108970

Table V, the cation privacy nging.

TAB ION EFFICIENCY O

0.03 0.05

3691 4254

in Table VI ency of loca As the  inc

ases and the n

Data Utility

tracts data fro and 2000 da

ensitive locatio ns. The experim

(

Win10 system IV during the LE IV ER SETTING

value 100 150 200 300

10000 200 .015 0.02 0.025 0

75 1 1.1 1.25 1.5 1 1 2 3 4 5

28 13 2 6 5

eliness

mainly analyze ng location

ction and upd y of extracting BLE V OISE TO RECONSTR ATION TREE

) pattern

n time.bu

16 0.0000 32 0.0000 64 0.0000 128 0.0000 256 0.0000 512 0.0000

1024 0.0000 2048 … 4096 …

efficiency of y tree is higher

LE VI OF LOCATION DA

0.1 0.5

42103 8932

I, within a c ation data in creases, the d

number of extr

om multiple g ata. The blue ons and the red mental results

a)

m of 64 bit. e experiment.

500 1000 2000 5 000 .03 0.035 0.04 0.

1.75 5 10 15 6 30

ed in three asp privacy tree.

dating of loca location data.

RUCT THE LOCAT

uild (s) time.upda

0014 0.000102 0024 0.000102 0035 0.000102 0061 0.000102 0110 0.000102 0150 0.000102 0280 0.000102

… … … …

f constructing r when the mo

ATA PRIVACY

1.0 1

2 53245 79

certain range, ncreases with degree of pri raction in the

groups of patt dots are use

d ones indicat s are shown in

5

The

5000

1 0.5

pects: . (2) ation .

TION

ate (s)

2190 2180 2195 2120 2095 2095 2030

g and ode n

1.5

9310

, the h the ivacy unit

terns ed to e the

n Fig.

1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics

Fig are

lo an

Fig loc

lo se wh

g. 3. Data distribu e shown before ad

The red sensi cations are 39

nd the blue ins

g. 4. Data distr cations are shown

There are 141 cations in Fi

ensitive positio hich shows th

ution before and a dding noise. (b) T

itive locations 7 in Fig. 3(a),

sensitive locati

ribution before an n before adding no

1 red sensitive ig. 4(a), after on and 781 blu at the sensitiv

(b) after protection (k The ones after add

s are 103 and the red sensiti ions are 353 in

(a)

(b) nd after protectio oise. (b) The ones

locations and r protection,

ue insensitive l ve locations ha

(a)

is 500). (a) The lo ding noise.

the blue insen ive locations a n Fig. 3(b).

on (k is 1000). s after adding nois

859 blue insen there are 21

locations in Fig ave increased.

ocations

nsitive are 137

(a) The se.

nsitive 19 red g. 4(b),

Fig. locat

T inse red Fig. utili

Pa nu

1 2

W TDP exp

Fig.

F LPT is b prop prop the priv and TDP the low rela VIII

UTI

5. Data distrib tions are shown b

There are 23 ensitive locatio sensitive loca

. 5(b). Table ity.

UTILITY S

Bef

attern umber

Sensit location

dots 500 103

1000 141 2000 235

We compare t PS_LP_Result erimental resu

6. Error compari

From Fig. 6 sh T-DP-k algorit beginning to s posed method posed method

Error be s vacy protection d the higher t PS_LP_Result privacy param

wer error. When atively larger.T I and IX:

ILITY COMPARISO

(b ution before and

before adding nois

5 red sensit ons in Fig. 5(a ations and 162 VII gives th

TABL STATISTICS UNDE

fore the experimen tive

ns (red s)

Insensi locations

dots 3 397 1 859 5 1765

the proposed t, TDPS_Sig

ults are shown

ison of adding-no

hows that: (1) thm is the sma stabilize, whic d is higher

ds; (2) as the smaller; when n, the smaller the privacy pr t, TDPS_LP_

meters are requ n the privacy p The experime

TABL

ON OF TOP k FRE PRIVACY PARA

b) d after protection se. (b) The ones a

tive locations a), after protec 20 blue insen e statistical r

LE VII ER DIFFERENT DA

nt Afte tive (blue )

Sensiti locations

dots) 7 137 9 219 5 380

method with gnal and TD n in Fig. 6:

oise methods

the noise erro aller, when  ch shows that and better th privacy param n meeting th r  is, the mo rotection leve _Signal and T uired to be lar parameters are ntal results ar

LE VIII

EQUENT ITEMSET

AMETER ( 1  )

n (k is 2000). (a after adding noise

s and 1765 ction, there are nsitive location results of the

ATA MODELS

er the experiment ive (red )

Insensiti locations (

dots) 353 781

1620

three method DPS_EP and

or of the prop 0.015 , the

t the utility of han the prev meter  incre he  -differe

ore the noise n el will be; (3) TDPS_E meth rger to achieve e small, the err re shown in T

MINING UNDER F

6

a) The .

blue e 380 ns in data

ive (blue

ds of the

posed error f the vious ases,

ential needs ) for hods, e the ror is Table

FIXED

1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics

5 1 2 5 10 20

th ac th wh m

U

0. 0. 0 0

1

th

LP 7

Fig

Fig

re va

TP k LPR LPS

50 46 47 00 80 78 00 126 126 00 305 290 000 530 540 2 000 680 940

As shown in e accuracy of

ccuracy of the e accuracy of t hen 500k  ,

more than 80%.

UTILITY COMPARI

TP LPR LPS

.01 136 184

.05 178 182 0.1 174 180 0.5 180 190 1 182 188 .5 184 190

As shown in e higher the p Finally, the

PT-DP-k by F and 8:

g. 7. FRR compa

g. 8. FRR compa

Assuming k sult is shown i

alue of FRR w

P EP DPk LPR 26 49 4 39 96 20 70 190 74

150 455 195 230 890 470 40 1720 1320

Table VIII: (1 f 4 methods d previous three the proposed m the accuracy

. TA

ISON OF TOP k F FREQUENT PAT

P EP DPk LPR 70 188 64 64 188 22 90 190 26

160 190 20 168 188 18 174 190 16

Table IX, the rivacy protect paper compa

FRR. The expe

arison when k 

arison when =1 is 100, the p in Fig. 7: with

with DP-topk an

FP LPS EP DPk

3 4 1 22 61 4 74 130 10

210 350 45 460 770 110 1060 1960 280

1) with the val decreases; (2) e methods is l method still re y decreases bu

ABLE IX

FREQUENT ITEMSE

TTERNS ( 20k  FP

LPS EP DPk 16 130 12 18 136 12 20 110 10 10 40 10 12 32 12 10 26 10

smaller the p tion level will ares the utilit erimental resul

100

privacy parame h privacy param nd the propose

Accuracy k LPR LPS EP

0.93 0.95 0.5 0.80 0.78 0.3 0.63 0.63 0.3 0.61 0.58 0.3 0.53 0.54 0.2 0.38 0.47 0.2

lue of k incre when 20k 

less than 70%, emains at abou ut still mainta

ET MINING UNDER

0 ) Accuracy

k LPR LPS EP 0.68 0.92 0.3 0.89 0.91 0.3 0.87 0.90 0.4 0.90 0.95 0.8 0.91 0.94 0.8 0.92 0.95 0.8

privacy parame be.

ty of DP-top lts are shown i

eter is change meter increasin ed method mai

y P DPk 3 0.98 9 0.96 5 0.95 0 0.91 3 0.89 0 0.86

easing, 0 , the , while

ut 95%, ains at

R FIXED

y P DPk 5 0.94 2 0.94 5 0.95 0 0.95 4 0.94 7 0.95

eter is,

pk and in Fig.

ed, the ng, the intains

low we c whe

A show but up, leve algo

T base netw con the beca and posi data The prot loca is m effic effic proc targ

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

w, but the value can see that th en the protecti Assuming  is wn in Fig. 8: w the value of D the proposed

el of location p orithm.

This paper pro ed on differen works. The m structing the l problem that ause of its cha

d adds noise in ition data. It i a and maintain e differential tection of loca ation privacy p

more rigorous a ciency. In the cient data stru cess of locatio

get functions fo

I. F. Akyildiz, W sensor network Journal of Comp pp. 393-422, 20 K. Xing, C. Q. Privacy Preservi IEEE Transact 2066-2076, 2017 Z. H. Lv, H. “Next-Generatio Future Research vol. 13, no. 4, pp F. Xiao, L. T. Sh for unknown B Internet of Th Computing, no. Z. Q. Wang, F See-through-wa on Battery-free Systems, vol. 17 H. Alemdar and survey,” Compu Z. J. Fu, K. Re personalized se improvement,” I vol. 27, no. 9, pp H. Rong, T. H. isomorphism m detection,” Soft T. H. Ma, Y. L Abdullah and A. and Edge Mod 1336–1344, 201

e of the propo he availability ion level is hig s 1, the value with k increas

DP-topk is large method has o

privacy protec

VI. CON

oposes a locat ntial privacy method expre location inform the location d

aracteristics of nformation to c is more effect ning high ava privacy prote ation privacy. protection alg and has higher next step, we

ucture to expr on data expre

for different ap

REFER W. Su, Y. Sankara ks: a survey,” C puter & Telecom 10. Hu, J. G. Yu, X

ing k-Means Clus tions on Industr 7. B. Song, P. on Big Data Anal h Topics,” IEEE p. 1891-1899, 20 ha, Z. P. Yuan and

Bugs based on An hings,” IEEE Tr 99, pp. 1–13, 201 F. Xiao, N. Ye ll System for Dev RFID,” ACM T

7, no. 1, pp. 1–21, d C. Ersoy, “Wire uter Networks, vol en, J. G. Shu, X earch over encry IEEE Transaction p. 2546–2559, 20 . Ma, M. L. Tan

method in social Computing, pp. 1

L. Zhang, J. Cao, . R. Mznah, “KDV

dification algorith 5.

osed method i of the propose

gh. e of k is chan sing, the value er than that in obvious advan

ction and the e

NCLUSION

tion privacy p strategy for b

esses the pos mation tree mo data is difficu f high dispersio cover the orig tive in protect ailability of da ection model . Compared w gorithm, the pr r algorithm util will discuss to

ress location i ession and pro pplication scen

RENCES asubramaniam and Computer Netwo munications Netw

X. Z. Cheng and stering in Social P rial Informatics,

Basanta-Val, A lytics: State of th Transactions on 17. d R. C. Wang, “V nalysis for know

Transactions on 17. , R. C. Wang vice-free Human Transactions on

2017. eless sensor netw l. 54, no. 15, pp. 2 . M. Sun and F.

ypted outsourced ns on Parallel an

016. ng and J. Cao, “A l network based 1–19 , J. Shen, M. L. VEM: a k-degree hm,” Computing

s lower. From ed method is b

nged, the resu e of FRR incre this paper. To ntages both in

effectiveness o

protection me big data in se ition data se odel, which so

ult to be expre on and low den

ginal trajectory ting the privac ata and algori is applied to

with the traditi roposed algor lity and proces o explore the m information in opose more u narios.

d E. Cayirci, “Wi orks the Interna working, vol. 38,

d F. J. Zhang, “M Participatory Sen

vol. 13, no. 4

A. Steed and M he Art, Challenge Industrial Inform

VulHunter: A Disc wn patches in Ind

Emerging Topi

and P. L. Yang Motion Sensing B Embedded Comp

works for healthca 2688-2710, 2010. X. Huang, “Ena

d data with effic nd Distributed Sys

A novel subgraph d on graph sim

Tang, Y. Tian, anonymity with V

g, vol. 70, no. 6

7

m this better

ult is ases,

o sum n the of the

ethod ensor et by olves essed nsity,

y and cy of ithm. o the ional rithm ssing more n the

utility

ireless ational

no. 4,

Mutual nsing,” 4, pp.

M. Jo, s, and

matics,

covery dustry ics in

g, “A Based puting

are: A . abling ciency stems,

h K+- milarity

A. D. Vertex 6, pp.

1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics

[10

[11

[12

[13

[14

[15

[16

[17

[18

[19

[20

[21

[22

[23

[24

[25

[26

[27

[28

[29

[30

[31

0] C. Y. Yin, S. model for big & Computatio

1] C. Y. Yin, J. W Improved K -V Mobile Inform

2] P. Samarati an when disclo SIGACTSIGM ACM, New Yo

3] R. C. W. Won An enhanced k in Proc. of the and Data Min

4] N. H. Li, T. beyond k-ano Conf., on Data

5] E. Zheleva relationships i on Privacy, S 153–171.

6] A. Korolova, R networks,” in P 1355–1357.

7] E. D. Cristofar Privacy at the and Privacy, I

8] A. R. Beresfor privacy for app Workshop on York, 2011, pp

9] B. Bamba, L. L queries in mo Int’l Conf., on

0] L. Liu, “From in Koch C, ed ACM, New Yo

1] C. Dwork, “D Annual ACM- and Applied M

2] B. Gu, V. S. support vecto Neural Netwo 2014.

3] B. Gu, V. S. S learning for v 140-150, 2015

4] B. Gu and V v-support vect and Learning

5] B. Gu, X. M Machine,” IEE vol. 28, no. 7,

6] E. T. Shen an privacy,” in P Discovery and 545–553.

7] C. Zeng, J. F. itemset minin (VLDB), Trent

8] A. Friedman a Proc. of the A Data Mining (

9] Y. D. Li, Z mechanism: U Proc. of the 1 Society, ACM,

0] X. J. Zhang, M top-k Frequen Research and

1] C. Y. Yin an classification algorithm,” M

Zhang, J. W. Xi data security bas

on Practice & Exp W. Xi and R. X. Su Value Method in

mation Systems, pp nd L. Sweeney,

osing informatio MOD-SIGART Sym

ork, 1998, pp. 188 ng, J. Y. Li, A. W k-anonymity mod e 12th ACM SIGK ning, ACM, New Y

C. Li and S. Ve onymity and l-div a Engineering, IE and L. Getoor, in graph data,” in Security, and Tru

R. Motwani, S. U Proc. of the 24th I

ro, C. Soriente, G e time of twitter,” IEEE, Piscataway rd, A. Rice, N. Sk plication function Mobile Computin p. 49–54. Liu, P. Pesti and T bile environment

n World Wide Web m data privacy to l d. Proc. of the 33r ork, 2007, pp. 142

Differential privac -SIAM Symp., on D Mathematics, ACM

Sheng, K. Y. T r learning for or

orks and Learning

Sheng, Z. Wang, v-support vector r 5.

V. S. Sheng, “A tor classification, Systems, vol. 28, . Sun and V. S. EE Transactions o pp. 1646–1656, 2

nd T. Yu, “Mining Proc. of the 19th A d Data Mining (SI

Naughton and J. ng,” in Proc. of to, Italy, 2013, pp

and A. Schuster, “ ACM SIGKDD In (SIGKDD), ACM, . J. Zhang, M.

Utilizing sparse r 10th Annual ACM

M, New York, 2011 M. Wang, and X. F nt Pattern Under D

Development, vo nd J. W. Xi, “M

in cloud compu Multimedia Tools &

i and J. Wang, “ sed on clustering perience, vol. 29, un, “Location Priv

n Augmented Rea p. 1–7, 2017. “Generalizing da on,” in Proc. mp., on Principle 8–202.

W. C. Fu and K. W del for privacy-pr KDD Int’l Conf., o York, 2006, pp. 7 enkatasubramania versity,” in Proc. EEE, Piscataway,

“Preserving th Proc. of the 1st A

ust in KDD, ACM

U. Nabar and Y. Xu Int’l Conf., on Da

G. Tsudik and A. W in Proc. of IEEE

y, 2012, pp. 285–2 kehin and R. Soha nality on smartpho ng Systems and A

T. Wang, “Suppor ts with privacy gr b, ACM, New Yor location privacy: rd Int’l Conf., on 29–1430. cy in new setting Discrete Algorith

M, New York, 201 ay, W. Romano

rdinal regression, g Systems, vol. 2

D. Ho, S. Osman egression,” Neura

robust regulariza ” IEEE Transacti no. 5, pp. 1241– Sheng, “Structu

on Neural Networ 2017. g frequent graph ACM SIGKDD In

SIGKDD), ACM, C

. Y. Cai, “On dif the 39th Conf o

p. 1087–1098. “Data mining with Int’l Conf., on Kn

M, Washington, US Winslett and Y

representation in M Workshop on P 1, pp. 177–182. F. Meng, “An Acc Differential Privac ol. 51, no. 1, pp. 1

Maximum entropy uting using imp & Applications, p

An improved ano algorithm,” Conc no. 7, 2017.

vacy Protection B ality on Mobile D

ata to provide ano of the 7th es of Database S

Wang, “(a, k)-Ano reserving data pub on Knowledge Di 54–759. an, “t-closeness: . of the 23rd IEE 2007, pp. 106–11

he privacy of s ACM SIGKDD W M, New York, 20

u, “Link privacy i ata Engineering, 2

Williams, “Humm E Symposium on S 299. an, “MockDroid: ones,” in Proc. of Applications, ACM

rting anonymous rid,” in Proc. of t rk, 2008, pp. 237 Models and algor Very Large Data

gs,” in Proc. of t hms Society for In 10, pp. 174–183. and S. Li, “Incr ” IEEE Transact

26, no. 7, pp. 140

n and S. Li, “Incr al Networks, vol.

ation path algori ions on Neural N 1248, 2017.

ural Minimax Pro rks and Learning

patterns with diff nt’l Conf., on Kno Chicago, USA, 20

fferential private f of Very Large D

h differential priv nowledge Discov SA, 2010, pp. 493 Y. Yang, “Comp

differential priva Privacy in the Ele

curate Method for cy,” Journal of Co 04-114, 2014.

y model for mob proved informatio pp. 1-17, 2016.

onymity currency

Based on Devices,”

onymity ACM

Systems,

onymity: blishing,” iscovery

Privacy EE Int’l 15. sensitive

Workshop 007, pp.

in social 2008, pp.

mingbird: Security

Trading f the 12th M, New

location the 17th –246. rithms,” a Bases,

the 21st ndustrial

remental tions on 03-1416,

remental . 67, pp.

ithm for Networks

obability Systems,

fferential owledge 013, pp.

frequent Database

vacy,” in very and 3–502. pressive acy,” in ectronic

r Mining omputer

bile text on gain

H Info coau His sens is a

Chin cryp

Uni algo wire and

He is a Profess ormation Scien uthored more current resea

sor networking member of A

na. His cur ptography, big

iversity. His orithm design, eless ad hoc a

d ACM.

Chunyong from Shan China in 1 Ph.D. deg Guizhou U 2008, resp research a New Brun 2012.

sor and Dean nce & Techno

than twenty arch interests g, machine lea

ACM.

Jinwen Xi computer Huaiyin In 2011.

Now he degree i technology Informatio

rrent researc g data security

Ruxia Su electrical e University She is a

Nanjing Science & research in systems, mathemati

Jin Wan degree fro and Telec and 2005, degree fro in 2010.

He is a Informatio

research inte , performance

and sensor netw

g Yin receive ndong Univers 1998. He rece grees in comp University, Ch pectively. He w associate at t nswick, Cana

with the Nan ology, China. H

journal and c include priva arning and net

i received his science and

nstitute of Tech

e is studying in computer y from Nanji on Science ch interests y and privacy.

un received th engineering fr

y of Technolog an associate

University Technology, nterests inclu

machine ical modeling.

ng received th om Nanjing U ommunication respectively. m Kyung Hee

a professor i on Engineer erests mainly evaluation an

works. He is a

ed the B.S. de ity of Technol

eived the M.S. puter science hina, in 2005 was a post-doc the University ada, in 2011

njing Universit He has authore conference pap cy preserving twork security

bachelor degr technology

hnology, Chin

g for his mas r science ing Universit and Technol are in app

he B.E. degre rom the Shand gy, China, in 1

Professor in of Informa

China. Her cu de cyber phy

learning .

he B.S. and M University of P ns, China in 2 He received P

e University K

n the Colleg ring, Yangz

y include rou nd optimization a Member of I

8

egree logy, . and from and

ctoral y of and

ty of ed or pers.

g and y. He

ree in from

na, in

ster’s and

ty of logy, plied

ee in dong 1997. n the ation

urrent ysical

and

M.S. Posts 2002

Ph.D. Korea

ge of zhou uting n for IEEE

1. What is the purpose of this article?

2. What are the domain and context of this research problem?

3. What does the author claim is the significance of the study?

4. What are the research questions that motivate the study?

5. What literature/prior research do the authors use to validate the problem?

6. How did the authors demonstrate that there is not a ready solution to the problem?

References

  • Location Privacy Portection based
  • questionswkw