Ques Mon
1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics
1
Abstract—In the research of location privacy protection, the
existing methods are mostly based on the traditional anonymization, fuzzy and cryptography technology, and little success in the big data environment, for example, the sensor networks contain sensitive information, which is compulsory to be appropriately protected. Current trends such as "Industrie 4.0" and Internet of Things (IoT), generate, process, and exchange vast amounts of security-critical and privacy-sensitive data, which makes them attractive targets of attacks. However, previous methods overlooked the privacy protection issue, leading to privacy violation. In this paper, we propose a location privacy protection method that satisfying differential privacy constraint to protect location data privacy and maximize the utility of data and algorithm in Industrial Internet of Things. In view of the high value and low density of location data, we combine the utility with the privacy and build a multilevel location information tree model. Furthermore, the index mechanism of differential privacy is used to select data according to the tree node accessing frequency. Finally, the Laplace scheme is used to add noises to accessing frequency of the selecting data. As is shown in the theoretical analysis and the experimental results, the proposed strategy can achieve significant improvements in terms of security, privacy, and applicability.
Index Terms—Differential privacy, location privacy protection, location privacy tree, Internet of Things.
I. INTRODUCTION
HE development of information technology has accumulated a great deal of data for today's digital systems.
Big data is a very important research and development resource,
This work was funded by the National Natural Science Foundation of China
(61772282, 61373134, 61772454, and 61402234). It was also supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX17_0901) and Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET). (Corresponding author: Jin Wang.)
Chunyong Yin is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).
Jinwen Xi is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).
Ruxia Sun is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).
Jin Wang is with the College of Information Engineering, Yangzhou University, Yangzhou 225127, China (e-mail: [email protected]).
and the demand for data publishing, sharing and analysis are also growing rapidly. People pay more and more attention to the security of data protection, and the government, enterprises, and individuals are also improving their understanding of privacy protection.
Body sensor networks (BSNs), as a special application of wireless sensor networks (WSNs) [1], are deployed on the surface of bodies for periodically monitoring physical conditions. For a typical social participatory sensing application, it is important to motivate participatory, at the same time, the participatory sensing process should not disclose the privacy information of any participating party (the private data) or the community (patterns, distribution, etc.). Therefore, it is essential to develop a privacy protection data strategy for the protection of the users and the community [2].
Devices in the Internet of Things (IoT) generate, process, and exchange vast amounts of security and safety-critical data as well as privacy-sensitive information, and hence are appealing targets of various attacks. To ensure the correct and safe operation of IoT systems, it is crucial to assure the integrity of the underlying devices, especially their code and data privacy, against malicious modification [3].
The privacy threats of Industrial Internet of Things (IIoT) [4] can be simply divided into two categories: privacy threats based on data and privacy threats on location. Data privacy issues mainly refer to the leakage of secret information in the process of data acquisition and transmission in Internet of Things.
Location privacy is an important part of privacy protection of the Internet of Things. It mainly refers to the location privacy of each node in the Internet of Things and the location privacy of the Internet of Things in providing various location services, especially including the RFID reader location privacy RFID user location privacy, sensor Node location privacy, and location-based privacy issues based on location services [3], [5].
Usually, data collected, aggregated and transmitted in sensor networks contain personal and sensitive information, which directly or indirectly reveals the condition of a person. If the data cannot be properly preserved, once exposed to the public, the privacy will be destroyed. Therefore, protecting the privacy of sensitive data is greatly important [6], [7].
Location data implies moving objects, spatial coordinates, current time, and unique features different from other data, which is discrete and of high value. Before the concept of big data, most of the privacy protection methods focus on a small
Chunyong Yin, Jinwen Xi, Ruxia Sun, Jin Wang, Member, IEEE
Location Privacy Protection based on Differential Privacy Strategy for Big Data in
Industrial Internet-of-Things
T
1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics
2
number of non-positional data. There are some limitations for the protection of location data privacy in big data. There are two main reasons as follows: (1) Multiple data fusion by big data makes traditional anonymity [8], [9], [10] and fuzzification technology difficult to take effect in location privacy protection [11]; (2) Traditional cryptography technology takes little effect on the real-time analysis required by big data.
In summary, location privacy protection faces great challenges for big data in IIoT. Therefore, we propose a more rigorous LPT-DP-k location privacy protection method based on differential privacy strategy for big data in sensor networks.
Our specific contributions mainly include: (I) We introduce a tree structure to represent the location data
in sensor networks, which called LPT (Location privacy tree), according to the characteristics and retrieval difficulty of location data.
(II) The differential privacy strategy is suitable for location privacy protection because it is insensitive to the background knowledge and the DP-k (Differential privacy-k) model has better protection effect. Laplace and index mechanism are the main implementation mechanisms of differential privacy, which can show the degree of privacy protection by allocating the privacy budget and is relatively reliable and rigorous.
(III) We conduct extensive experiments on real-world datasets which show that the proposed location privacy protection method can protect users’ location privacy without significantly affecting the privacy, availability, security, and effectiveness.
Section 2 presents necessary background and related work. Section 3 introduces the preliminary knowledge. Section 4 details the proposed location privacy protection method. The real-world datasets and experimental results are presented in Section 5 and conclusions in Section 6.
II. RELATED WORK
In the process of data mining and data publishing, the privacy protection of location data mainly involves two aspects: privacy model and utility.
A. Privacy Model
In recent years, with the widespread application of location services and data mining and publishing, location privacy protection model in location service can be divided into two categories: one is the traditional anonymous model based on grouping; the other is the differential privacy model that ignores the attacker’s background knowledge.
(1) Anonymous model based on grouping The traditional anonymous model based on grouping plays
an important role in location privacy protection. Samarati et al. [12] firstly proposed k-anonymity method and other a large number of methods based on k-anonymity [13], [14], [15]. Previous research [15], [16] shows that only using anonymous methods does not provide good protection to a wide range of data. According to [17], the encryption privacy protection method is proposed, which can completely protect the privacy of data and prevent the leakage of data in the process of location
service, but the availability of data is insufficient. The development of traditional location privacy protection technology has gone through three stages: the "informed and consent" method proposed by document [18] has been developed to deal with anonymous processing in a single location, and then to deal with anonymous processing of user's trajectory data. Heuristic privacy measurement, probability deduction, and private information retrieval based technologies are common methods to protect location privacy. The heuristic privacy measurement method mainly protects the users who are not in the strict privacy protection environment, such as k-anonymity, t-closeness [19], m-invariance [20], and l-diversity. However, these three kinds of methods are proposed as a unified attacking model, which protect the location data under the premise of accumulating certain background knowledge. As attackers grasp more background knowledge, these methods cannot effectively protect the user’s location data privacy and the lack of privacy protection about the type of relational privacy protection method is proposed in [21].
(2) Differential privacy model Differential privacy model is a strong privacy concept which
is completely independent of attacker’s background knowledge and computing ability, and has become a research hotspot in recent years. Compared to the traditional privacy protection model, differential privacy has its unique advantages. Firstly, the model assumes that the attacker has the maximum background knowledge. Secondly, differential privacy has a solid mathematical foundation, a strict definition of privacy protection and a reliable quantitative evaluation method. In recent years, differential privacy model has been widely applied in privacy protection. Especially, the FM (Functional mechanism) is introduced in [22], in which an objective function of -differential privacy disturbance optimization problem is used to protect privacy. The Diff-FPM (Privacy-Preserving Mining) algorithm is proposed in [23], [24], [25] and combined with MCMC (Markov Chain Monte Carlo), which provides privacy protection and maintains the high data availability while satisfying ( )-differential privacy conditions. The PrivBasis and SmartTrunc method proposed in [26] adopt differential privacy model in the mining process of frequent itemsets, ensuring the privacy and utility of data analysis and anonymity. The DiffP-C4.5 and DiffGen algorithm introduced in [27] combines differential privacy with decision trees and other data structures to maintain a balance between data privacy and availability.
In a word, differential privacy is an effective privacy protection mechanism, which protects the user’s location privacy while keeping enough useful information for data analysis.
B. Utility Maximization
In the process of location service, data mining and data publishing need to protect location privacy and provide enough information for data analysis. Therefore, data utility is the core issue that needs to be paid attention to. Reference [28] uses compressive sensing theory to propose a perception mechanism
1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics
3
to solve the published issue of statistical results, which can effectively solve the insufficiency problem of data utility, but it destroys the link among data. DP-topk method is proposed in [29], which has a more rigorous definition of effectiveness. But this method ignored the relation among transaction data, and because of its poor efficiency in data processing individually, the availability of algorithm is not high. To maximize the utility of the results and meet the requirement of location privacy protection, the paper proposes LPT-DP-k algorithm, which is a more rigorous differential privacy protection method, not only can guarantee the high availability of data, but also can enhance the usability of the algorithm.
III. PRELIMINARY KNOWLEDGE
Differential privacy is a common privacy protection framework that supported by the solid mathematical theory, which can provide privacy protection for data in the case that the attacker grasps the largest background knowledge. Definition 1 (Differential privacy)
Suppose there is a random algorithm M , MP is all possible
output sets for M , for any given adjacent data set D and D (there is at most one different record between them),
1D D , MS is any subset of MP . If the algorithm M
satisfies the following inequality, the algorithm M will satisfy -differential privacy protection.
exp .r M r MP M D S P M D S (1)
in which .rP represents the randomness of the algorithm M on the data set D and D . Definition 2 (Sensitivity)
The differential privacy protection method defines two kinds of sensitivity, called global sensitivity and local sensitivity. Suppose there is a query function : df D R , the input is a
data set and the output is a d -dimensional real vector. For any adjacent data set:
1,= max .D Df f D f D (2)
called the global sensitivity of function f , and f represents the maximum change value of the output results.
f D f D is the 1-order norm distance between f D and f D . Definition 3 (Implementation mechanism)
Laplace mechanism and Exponential mechanism [30] are two of the most basic implementation mechanisms of differential privacy protection. In this paper, the Laplace mechanism is used to add the noise that obeys the Laplace distribution to realize the differential privacy. Assuming the privacy protection algorithm f based on the Laplace
mechanism, the noise keeps to the Laplace distribution with the
variance f
is and the mean is 0. Then the probability density
function is:
r 1
P , = exp . 2
x x
(3)
in which x represents the specific variable and = f
.
If the algorithm f is proportional to the probability of
, exp
2
D r
to select from O and export r , then the
algorithm f provides -differential privacy protection, the concrete formula is as follows:
r
, , = : P exp .
2
D r A D r r O
(4)
in which is the global sensitivity of the scoring function ,D r , (4) shows that the higher the score is, the larger the
probability of the selected output will be.
IV. LPT-DP-K LOCATION PRIVACY PROTECTION
This section introduces the overview of LPT-DP-k algorithm and the specific implementation details of the algorithm.
A. LPT-DP-K Algorithm
Firstly, we choose the location privacy tree construction to maintain the relation among the location data. Secondly, we select the sensitive location information which is most likely to disclose privacy to add noise. The algorithm summarizes as follows: Algorithm 1 LPT-DP-k algorithm
1: Initialize: data set *D , location data set D and *D D , differential
privacy parameter , parameter k , the number of nodes that are accessed by the data is count and empty sets A , B . 2: Construct a location privacy tree (LPT). 3: The node data will be divided into two categories, and the frequent patterns
of accessing frequency not less than count are recorded in the set, see section 3.2.
4: Calculate the output probability, the formula is
1
.
.
i r i n
j j
a w P a
a w
, see
section 3.3.
5: Add noise and combine with *D
DLPT to construct the noisy location
privacy tree *D
DLPT k , see section 3.4.
6: Finally, publish the *D
DLPT k after step 4.
B. Construction of Location Information Tree Corresponding to Location Data
Assuming the location privacy tree *D
DLPT is corresponding
to the data set D , which records each person’s visiting records within a month. According to the location information, LPT-DP-k algorithm uses the tree structure with better correlation. The corresponding Table I is shown as follows:
TABLE I LOCATION CORRESPONDENCE
Number Location information Accessing count
1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics
4
1 Computer building 28
2 Remote Sensing
Institute 13
3 All-in-one card
management center 2
4 Mingde Building 6 5 Training Gym 5 6 Dormitory 3 30 …… …… ……
As is shown in Table I, the user’s location information and accessing frequency of the places within a month is listed, we can get the location correspondence table.
TABLE II LOCATION DATA INFORMATION
Number Location label Accessing count 1-28 1 28 29-41 2 13 42-43 3 2 44-49 4 6 50-54 5 5 55-84 6 30 85-97 1,2 13 …… …… ……
According to the content in Table II, we can construct the location privacy tree like Fig. 1.
0
1 #28
2 #13
...
... 1,3 #2
1,2 #13
1,2,3 #2
1,2,4 #6
...
1,2,3,6 #2
1,2,3,5 #2
... 1,2,3,4
#2
Fig. 1. Location privacy tree
As shown in Fig. 1, the total number of nodes in *D
DLPT is *
*
*
1
1
2 1 D
D
D i
C
.
C. Weighted Selection based on Exponential Mechanism
We use the exponential mechanism in the differential privacy protection model to do weighted selection. The formula is as follows:
1
. .
.
i r i n
j j
a w P a
a w
(5)
r iP a represents the probability of being selected and .ia w represents the weight of the pattern ia . The selection algorithm
based on the exponential mechanism is as follows: Algorithm 2 Selection algorithm
1: Initialize: records in the frequent pattern set A . 2: Input a set A of frequent pattern records and score each record ia .
3: Calculate the weight .ia w of each pattern record.
4: Take the weight .ia w of each pattern in the set A into the formula
1
.
.
i r i n
j j
a w P a
a w
, select k frequent pattern records ia and record the
set as B . the scoring function is:
, .i iM A a Q a (6)
iQ a represents the accessing frequency of pattern ia . Calculate the weight .ia w of each pattern record.
1 ,. =exp .
2 i
i
M A a a w
M
(7)
This paper sets up the scoring function , i iM A a Q a , iQ a represents the accessing frequency of pattern ia . , iM A a represents the scoring value of ia . The formula to
calculate M is as follows:
1,
max .i j i j N
M Q a Q a
(8)
M represents the maximum value of the difference in the accessing frequency among N data record modes.
D. Noise Enhancement based on Laplace Mechanism
This paper proposes the method of node data processing to improve the efficiency of data processing and the effectiveness of the algorithm. The concrete steps are as follows: Algorithm 3 Laplace noising algorithm
1: Initialize: the privacy budget , k frequent pattern set B after weighted selection, query function f .
2: Add noise into the frequent pattern set B to keep to the Laplace distribution and destabilize the real accessing frequency of k records. The added noise keeps to Laplace distribution that the probability density function is
r 1
P , = exp 2
x x
.
3: According to (9), the noise set C can be calculated. Finally, the noisy
location privacy tree *D
DLPT k is generated by combining the original data
tree *D
DLPT with the set C .
Therefore, it should be satisfied in every round of noise adding process:
. f
f b f b lap
(9)
B is the set of k frequent pattern b . f b is the query
function on the record b and f
lap
represents the noise
that keeps to Laplace distribution, as is shown in Fig. 2.
1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics
Fig
be
LP
ef th
ap
ap
A.
(G am III
Us
M py
g. 2. Relationship
True positive e used to mea
*D DPT k . Fal
ffectiveness of e effectivenes True Positive
ppear in kf D
False Positiv
ppear not in kf
Accuracy:
False Rejectio
F
Error:
Experiment
Data s This paper u
Gowalla total- mount of data I shows the ex
ser id Check- 0 2010-10-19 0 2010-10-18 0 2010-10-17 0 2010-10-12
Opera The experi
MATLAB(R20 ython). Hardw
p between privacy
e (TP), false Po asure the utili
lse rejection r
f the algorithm ss of the algori e (TP): the n
D and in kf D
kTP f
ve (FP): the n
kf D , but in
kFP f D
Accuracy
on Rate (FRR
kf D FRR
LP Error
V. EXPERIM
Setting
set uses the chec -check-in data and records th
xperimental da TA
EXPERIME
-in time L T23:55:27Z 30.2 T22:17:43Z 30.2 T23:42:03Z 30.2 T19:44:40Z 30.2
ating environm imental envi 13b) and PyC ware environ
y budget and nois
ositive (FP) an ty of noisy lo
rate (FRR) is
m. The smaller ithm will be. number of fre
D :
k kD f D
number of fre
kf D :
k kf D f
k k
k
f D f
f D
R):
kf D f k
* *
*
2
D D D D
D D
PT LPT
LPT
MENTAL ANALY
ck-in data set a set). This d he information ata set. ABLE III ENTAL DATA SET
Latitude Lon 23590912 -97.79 26910295 -97.74 25573099 -97.76 26910295 -97.74
ment and param ironment in
Charm (Part of nment is: 2
se size
nd accuracy [3 ocation privac
used to analy
r FRR is, the
equent pattern
.
equent pattern
.kf D
.
D
.
kf D
2 . k
YSIS
from the Go data set has a n of the users.
ngitude Locat 9513958 228 4939537 420 6338577 316 4939537 420
meter configur n this pape f the program .60GHz Cor
31] can cy tree
yze the
higher
ns that
(10)
ns that
(11)
(12)
(13)
(14)
owalla a large Table
tion id 847
0315 6637 0315
ration er is
m using e(TM)
i7-6 para
Lo
B.
T (1) Tim priv
TIM
patte n 16 32 64 12 25 51
102 204 409
A reco is co
k
A extr incr prot time
C.
T with repr sens 3, 4
6700HQ CPU ameters are sh
Parameter
k
Location id ocation accessing
frequency
Analysis of Al
The timeliness Timeliness
meliness of w vacy tree. (3) T
ME OF BUILDING
ern n
time.build (s
6 0.0000015 2 0.0000016 4 0.0000029 8 0.0000057 6 0.0000116 2 0.0000149
24 0.0000262 48 0.0000370 96 0.0000443
As shown in T onstructing loc onstantly chan
EXTRACTI
0.01 k 3432
As is shown raction efficie rease of . A tection decrea e increases.
Analysis of D
This paper ext h 500, 1000, resent the inse sitive location
4, and 5:
U, 20.00GB, W hown in Table
TAB PARAMETE
20 40 60 80
0.005 0.01 0. 0.7
lgorithm Time
analysis is m of constructin
weighted selec The efficiency
TAB AND ADDING NO
INFORMA
) time.update (s)
0.000103010 0.000103110 0.000103324 0.000103010 0.000104032 0.000103000 0.000105032 0.000106340 0.000108970
Table V, the cation privacy nging.
TAB ION EFFICIENCY O
0.03 0.05
3691 4254
in Table VI ency of loca As the inc
ases and the n
Data Utility
tracts data fro and 2000 da
ensitive locatio ns. The experim
(
Win10 system IV during the LE IV ER SETTING
value 100 150 200 300
10000 200 .015 0.02 0.025 0
75 1 1.1 1.25 1.5 1 1 2 3 4 5
28 13 2 6 5
eliness
mainly analyze ng location
ction and upd y of extracting BLE V OISE TO RECONSTR ATION TREE
) pattern
n time.bu
16 0.0000 32 0.0000 64 0.0000 128 0.0000 256 0.0000 512 0.0000
1024 0.0000 2048 … 4096 …
efficiency of y tree is higher
LE VI OF LOCATION DA
0.1 0.5
42103 8932
I, within a c ation data in creases, the d
number of extr
om multiple g ata. The blue ons and the red mental results
a)
m of 64 bit. e experiment.
500 1000 2000 5 000 .03 0.035 0.04 0.
1.75 5 10 15 6 30
ed in three asp privacy tree.
dating of loca location data.
RUCT THE LOCAT
uild (s) time.upda
0014 0.000102 0024 0.000102 0035 0.000102 0061 0.000102 0110 0.000102 0150 0.000102 0280 0.000102
… … … …
f constructing r when the mo
ATA PRIVACY
1.0 1
2 53245 79
certain range, ncreases with degree of pri raction in the
groups of patt dots are use
d ones indicat s are shown in
5
The
5000
1 0.5
pects: . (2) ation .
TION
ate (s)
2190 2180 2195 2120 2095 2095 2030
g and ode n
1.5
9310
, the h the ivacy unit
terns ed to e the
n Fig.
1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics
Fig are
lo an
Fig loc
lo se wh
g. 3. Data distribu e shown before ad
The red sensi cations are 39
nd the blue ins
g. 4. Data distr cations are shown
There are 141 cations in Fi
ensitive positio hich shows th
ution before and a dding noise. (b) T
itive locations 7 in Fig. 3(a),
sensitive locati
ribution before an n before adding no
1 red sensitive ig. 4(a), after on and 781 blu at the sensitiv
(b) after protection (k The ones after add
s are 103 and the red sensiti ions are 353 in
(a)
(b) nd after protectio oise. (b) The ones
locations and r protection,
ue insensitive l ve locations ha
(a)
is 500). (a) The lo ding noise.
the blue insen ive locations a n Fig. 3(b).
on (k is 1000). s after adding nois
859 blue insen there are 21
locations in Fig ave increased.
ocations
nsitive are 137
(a) The se.
nsitive 19 red g. 4(b),
Fig. locat
T inse red Fig. utili
Pa nu
1 2
W TDP exp
Fig.
F LPT is b prop prop the priv and TDP the low rela VIII
UTI
5. Data distrib tions are shown b
There are 23 ensitive locatio sensitive loca
. 5(b). Table ity.
UTILITY S
Bef
attern umber
Sensit location
dots 500 103
1000 141 2000 235
We compare t PS_LP_Result erimental resu
6. Error compari
From Fig. 6 sh T-DP-k algorit beginning to s posed method posed method
Error be s vacy protection d the higher t PS_LP_Result privacy param
wer error. When atively larger.T I and IX:
ILITY COMPARISO
(b ution before and
before adding nois
5 red sensit ons in Fig. 5(a ations and 162 VII gives th
TABL STATISTICS UNDE
fore the experimen tive
ns (red s)
Insensi locations
dots 3 397 1 859 5 1765
the proposed t, TDPS_Sig
ults are shown
ison of adding-no
hows that: (1) thm is the sma stabilize, whic d is higher
ds; (2) as the smaller; when n, the smaller the privacy pr t, TDPS_LP_
meters are requ n the privacy p The experime
TABL
ON OF TOP k FRE PRIVACY PARA
b) d after protection se. (b) The ones a
tive locations a), after protec 20 blue insen e statistical r
LE VII ER DIFFERENT DA
nt Afte tive (blue )
Sensiti locations
dots) 7 137 9 219 5 380
method with gnal and TD n in Fig. 6:
oise methods
the noise erro aller, when ch shows that and better th privacy param n meeting th r is, the mo rotection leve _Signal and T uired to be lar parameters are ntal results ar
LE VIII
EQUENT ITEMSET
AMETER ( 1 )
n (k is 2000). (a after adding noise
s and 1765 ction, there are nsitive location results of the
ATA MODELS
er the experiment ive (red )
Insensiti locations (
dots) 353 781
1620
three method DPS_EP and
or of the prop 0.015 , the
t the utility of han the prev meter incre he -differe
ore the noise n el will be; (3) TDPS_E meth rger to achieve e small, the err re shown in T
MINING UNDER F
6
a) The .
blue e 380 ns in data
ive (blue
ds of the
posed error f the vious ases,
ential needs ) for hods, e the ror is Table
FIXED
1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics
5 1 2 5 10 20
th ac th wh m
U
0. 0. 0 0
1
th
LP 7
Fig
Fig
re va
TP k LPR LPS
50 46 47 00 80 78 00 126 126 00 305 290 000 530 540 2 000 680 940
As shown in e accuracy of
ccuracy of the e accuracy of t hen 500k ,
more than 80%.
UTILITY COMPARI
TP LPR LPS
.01 136 184
.05 178 182 0.1 174 180 0.5 180 190 1 182 188 .5 184 190
As shown in e higher the p Finally, the
PT-DP-k by F and 8:
g. 7. FRR compa
g. 8. FRR compa
Assuming k sult is shown i
alue of FRR w
P EP DPk LPR 26 49 4 39 96 20 70 190 74
150 455 195 230 890 470 40 1720 1320
Table VIII: (1 f 4 methods d previous three the proposed m the accuracy
. TA
ISON OF TOP k F FREQUENT PAT
P EP DPk LPR 70 188 64 64 188 22 90 190 26
160 190 20 168 188 18 174 190 16
Table IX, the rivacy protect paper compa
FRR. The expe
arison when k
arison when =1 is 100, the p in Fig. 7: with
with DP-topk an
FP LPS EP DPk
3 4 1 22 61 4 74 130 10
210 350 45 460 770 110 1060 1960 280
1) with the val decreases; (2) e methods is l method still re y decreases bu
ABLE IX
FREQUENT ITEMSE
TTERNS ( 20k FP
LPS EP DPk 16 130 12 18 136 12 20 110 10 10 40 10 12 32 12 10 26 10
smaller the p tion level will ares the utilit erimental resul
100
privacy parame h privacy param nd the propose
Accuracy k LPR LPS EP
0.93 0.95 0.5 0.80 0.78 0.3 0.63 0.63 0.3 0.61 0.58 0.3 0.53 0.54 0.2 0.38 0.47 0.2
lue of k incre when 20k
less than 70%, emains at abou ut still mainta
ET MINING UNDER
0 ) Accuracy
k LPR LPS EP 0.68 0.92 0.3 0.89 0.91 0.3 0.87 0.90 0.4 0.90 0.95 0.8 0.91 0.94 0.8 0.92 0.95 0.8
privacy parame be.
ty of DP-top lts are shown i
eter is change meter increasin ed method mai
y P DPk 3 0.98 9 0.96 5 0.95 0 0.91 3 0.89 0 0.86
easing, 0 , the , while
ut 95%, ains at
R FIXED
y P DPk 5 0.94 2 0.94 5 0.95 0 0.95 4 0.94 7 0.95
eter is,
pk and in Fig.
ed, the ng, the intains
low we c whe
A show but up, leve algo
T base netw con the beca and posi data The prot loca is m effic effic proc targ
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
w, but the value can see that th en the protecti Assuming is wn in Fig. 8: w the value of D the proposed
el of location p orithm.
This paper pro ed on differen works. The m structing the l problem that ause of its cha
d adds noise in ition data. It i a and maintain e differential tection of loca ation privacy p
more rigorous a ciency. In the cient data stru cess of locatio
get functions fo
I. F. Akyildiz, W sensor network Journal of Comp pp. 393-422, 20 K. Xing, C. Q. Privacy Preservi IEEE Transact 2066-2076, 2017 Z. H. Lv, H. “Next-Generatio Future Research vol. 13, no. 4, pp F. Xiao, L. T. Sh for unknown B Internet of Th Computing, no. Z. Q. Wang, F See-through-wa on Battery-free Systems, vol. 17 H. Alemdar and survey,” Compu Z. J. Fu, K. Re personalized se improvement,” I vol. 27, no. 9, pp H. Rong, T. H. isomorphism m detection,” Soft T. H. Ma, Y. L Abdullah and A. and Edge Mod 1336–1344, 201
e of the propo he availability ion level is hig s 1, the value with k increas
DP-topk is large method has o
privacy protec
VI. CON
oposes a locat ntial privacy method expre location inform the location d
aracteristics of nformation to c is more effect ning high ava privacy prote ation privacy. protection alg and has higher next step, we
ucture to expr on data expre
for different ap
REFER W. Su, Y. Sankara ks: a survey,” C puter & Telecom 10. Hu, J. G. Yu, X
ing k-Means Clus tions on Industr 7. B. Song, P. on Big Data Anal h Topics,” IEEE p. 1891-1899, 20 ha, Z. P. Yuan and
Bugs based on An hings,” IEEE Tr 99, pp. 1–13, 201 F. Xiao, N. Ye ll System for Dev RFID,” ACM T
7, no. 1, pp. 1–21, d C. Ersoy, “Wire uter Networks, vol en, J. G. Shu, X earch over encry IEEE Transaction p. 2546–2559, 20 . Ma, M. L. Tan
method in social Computing, pp. 1
L. Zhang, J. Cao, . R. Mznah, “KDV
dification algorith 5.
osed method i of the propose
gh. e of k is chan sing, the value er than that in obvious advan
ction and the e
NCLUSION
tion privacy p strategy for b
esses the pos mation tree mo data is difficu f high dispersio cover the orig tive in protect ailability of da ection model . Compared w gorithm, the pr r algorithm util will discuss to
ress location i ession and pro pplication scen
RENCES asubramaniam and Computer Netwo munications Netw
X. Z. Cheng and stering in Social P rial Informatics,
Basanta-Val, A lytics: State of th Transactions on 17. d R. C. Wang, “V nalysis for know
Transactions on 17. , R. C. Wang vice-free Human Transactions on
2017. eless sensor netw l. 54, no. 15, pp. 2 . M. Sun and F.
ypted outsourced ns on Parallel an
016. ng and J. Cao, “A l network based 1–19 , J. Shen, M. L. VEM: a k-degree hm,” Computing
s lower. From ed method is b
nged, the resu e of FRR incre this paper. To ntages both in
effectiveness o
protection me big data in se ition data se odel, which so
ult to be expre on and low den
ginal trajectory ting the privac ata and algori is applied to
with the traditi roposed algor lity and proces o explore the m information in opose more u narios.
d E. Cayirci, “Wi orks the Interna working, vol. 38,
d F. J. Zhang, “M Participatory Sen
vol. 13, no. 4
A. Steed and M he Art, Challenge Industrial Inform
VulHunter: A Disc wn patches in Ind
Emerging Topi
and P. L. Yang Motion Sensing B Embedded Comp
works for healthca 2688-2710, 2010. X. Huang, “Ena
d data with effic nd Distributed Sys
A novel subgraph d on graph sim
Tang, Y. Tian, anonymity with V
g, vol. 70, no. 6
7
m this better
ult is ases,
o sum n the of the
ethod ensor et by olves essed nsity,
y and cy of ithm. o the ional rithm ssing more n the
utility
ireless ational
no. 4,
Mutual nsing,” 4, pp.
M. Jo, s, and
matics,
covery dustry ics in
g, “A Based puting
are: A . abling ciency stems,
h K+- milarity
A. D. Vertex 6, pp.
1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics
[10
[11
[12
[13
[14
[15
[16
[17
[18
[19
[20
[21
[22
[23
[24
[25
[26
[27
[28
[29
[30
[31
0] C. Y. Yin, S. model for big & Computatio
1] C. Y. Yin, J. W Improved K -V Mobile Inform
2] P. Samarati an when disclo SIGACTSIGM ACM, New Yo
3] R. C. W. Won An enhanced k in Proc. of the and Data Min
4] N. H. Li, T. beyond k-ano Conf., on Data
5] E. Zheleva relationships i on Privacy, S 153–171.
6] A. Korolova, R networks,” in P 1355–1357.
7] E. D. Cristofar Privacy at the and Privacy, I
8] A. R. Beresfor privacy for app Workshop on York, 2011, pp
9] B. Bamba, L. L queries in mo Int’l Conf., on
0] L. Liu, “From in Koch C, ed ACM, New Yo
1] C. Dwork, “D Annual ACM- and Applied M
2] B. Gu, V. S. support vecto Neural Netwo 2014.
3] B. Gu, V. S. S learning for v 140-150, 2015
4] B. Gu and V v-support vect and Learning
5] B. Gu, X. M Machine,” IEE vol. 28, no. 7,
6] E. T. Shen an privacy,” in P Discovery and 545–553.
7] C. Zeng, J. F. itemset minin (VLDB), Trent
8] A. Friedman a Proc. of the A Data Mining (
9] Y. D. Li, Z mechanism: U Proc. of the 1 Society, ACM,
0] X. J. Zhang, M top-k Frequen Research and
1] C. Y. Yin an classification algorithm,” M
Zhang, J. W. Xi data security bas
on Practice & Exp W. Xi and R. X. Su Value Method in
mation Systems, pp nd L. Sweeney,
osing informatio MOD-SIGART Sym
ork, 1998, pp. 188 ng, J. Y. Li, A. W k-anonymity mod e 12th ACM SIGK ning, ACM, New Y
C. Li and S. Ve onymity and l-div a Engineering, IE and L. Getoor, in graph data,” in Security, and Tru
R. Motwani, S. U Proc. of the 24th I
ro, C. Soriente, G e time of twitter,” IEEE, Piscataway rd, A. Rice, N. Sk plication function Mobile Computin p. 49–54. Liu, P. Pesti and T bile environment
n World Wide Web m data privacy to l d. Proc. of the 33r ork, 2007, pp. 142
Differential privac -SIAM Symp., on D Mathematics, ACM
Sheng, K. Y. T r learning for or
orks and Learning
Sheng, Z. Wang, v-support vector r 5.
V. S. Sheng, “A tor classification, Systems, vol. 28, . Sun and V. S. EE Transactions o pp. 1646–1656, 2
nd T. Yu, “Mining Proc. of the 19th A d Data Mining (SI
Naughton and J. ng,” in Proc. of to, Italy, 2013, pp
and A. Schuster, “ ACM SIGKDD In (SIGKDD), ACM, . J. Zhang, M.
Utilizing sparse r 10th Annual ACM
M, New York, 2011 M. Wang, and X. F nt Pattern Under D
Development, vo nd J. W. Xi, “M
in cloud compu Multimedia Tools &
i and J. Wang, “ sed on clustering perience, vol. 29, un, “Location Priv
n Augmented Rea p. 1–7, 2017. “Generalizing da on,” in Proc. mp., on Principle 8–202.
W. C. Fu and K. W del for privacy-pr KDD Int’l Conf., o York, 2006, pp. 7 enkatasubramania versity,” in Proc. EEE, Piscataway,
“Preserving th Proc. of the 1st A
ust in KDD, ACM
U. Nabar and Y. Xu Int’l Conf., on Da
G. Tsudik and A. W in Proc. of IEEE
y, 2012, pp. 285–2 kehin and R. Soha nality on smartpho ng Systems and A
T. Wang, “Suppor ts with privacy gr b, ACM, New Yor location privacy: rd Int’l Conf., on 29–1430. cy in new setting Discrete Algorith
M, New York, 201 ay, W. Romano
rdinal regression, g Systems, vol. 2
D. Ho, S. Osman egression,” Neura
robust regulariza ” IEEE Transacti no. 5, pp. 1241– Sheng, “Structu
on Neural Networ 2017. g frequent graph ACM SIGKDD In
SIGKDD), ACM, C
. Y. Cai, “On dif the 39th Conf o
p. 1087–1098. “Data mining with Int’l Conf., on Kn
M, Washington, US Winslett and Y
representation in M Workshop on P 1, pp. 177–182. F. Meng, “An Acc Differential Privac ol. 51, no. 1, pp. 1
Maximum entropy uting using imp & Applications, p
An improved ano algorithm,” Conc no. 7, 2017.
vacy Protection B ality on Mobile D
ata to provide ano of the 7th es of Database S
Wang, “(a, k)-Ano reserving data pub on Knowledge Di 54–759. an, “t-closeness: . of the 23rd IEE 2007, pp. 106–11
he privacy of s ACM SIGKDD W M, New York, 20
u, “Link privacy i ata Engineering, 2
Williams, “Humm E Symposium on S 299. an, “MockDroid: ones,” in Proc. of Applications, ACM
rting anonymous rid,” in Proc. of t rk, 2008, pp. 237 Models and algor Very Large Data
gs,” in Proc. of t hms Society for In 10, pp. 174–183. and S. Li, “Incr ” IEEE Transact
26, no. 7, pp. 140
n and S. Li, “Incr al Networks, vol.
ation path algori ions on Neural N 1248, 2017.
ural Minimax Pro rks and Learning
patterns with diff nt’l Conf., on Kno Chicago, USA, 20
fferential private f of Very Large D
h differential priv nowledge Discov SA, 2010, pp. 493 Y. Yang, “Comp
differential priva Privacy in the Ele
curate Method for cy,” Journal of Co 04-114, 2014.
y model for mob proved informatio pp. 1-17, 2016.
onymity currency
Based on Devices,”
onymity ACM
Systems,
onymity: blishing,” iscovery
Privacy EE Int’l 15. sensitive
Workshop 007, pp.
in social 2008, pp.
mingbird: Security
Trading f the 12th M, New
location the 17th –246. rithms,” a Bases,
the 21st ndustrial
remental tions on 03-1416,
remental . 67, pp.
ithm for Networks
obability Systems,
fferential owledge 013, pp.
frequent Database
vacy,” in very and 3–502. pressive acy,” in ectronic
r Mining omputer
bile text on gain
H Info coau His sens is a
Chin cryp
Uni algo wire and
He is a Profess ormation Scien uthored more current resea
sor networking member of A
na. His cur ptography, big
iversity. His orithm design, eless ad hoc a
d ACM.
Chunyong from Shan China in 1 Ph.D. deg Guizhou U 2008, resp research a New Brun 2012.
sor and Dean nce & Techno
than twenty arch interests g, machine lea
ACM.
Jinwen Xi computer Huaiyin In 2011.
Now he degree i technology Informatio
rrent researc g data security
Ruxia Su electrical e University She is a
Nanjing Science & research in systems, mathemati
Jin Wan degree fro and Telec and 2005, degree fro in 2010.
He is a Informatio
research inte , performance
and sensor netw
g Yin receive ndong Univers 1998. He rece grees in comp University, Ch pectively. He w associate at t nswick, Cana
with the Nan ology, China. H
journal and c include priva arning and net
i received his science and
nstitute of Tech
e is studying in computer y from Nanji on Science ch interests y and privacy.
un received th engineering fr
y of Technolog an associate
University Technology, nterests inclu
machine ical modeling.
ng received th om Nanjing U ommunication respectively. m Kyung Hee
a professor i on Engineer erests mainly evaluation an
works. He is a
ed the B.S. de ity of Technol
eived the M.S. puter science hina, in 2005 was a post-doc the University ada, in 2011
njing Universit He has authore conference pap cy preserving twork security
bachelor degr technology
hnology, Chin
g for his mas r science ing Universit and Technol are in app
he B.E. degre rom the Shand gy, China, in 1
Professor in of Informa
China. Her cu de cyber phy
learning .
he B.S. and M University of P ns, China in 2 He received P
e University K
n the Colleg ring, Yangz
y include rou nd optimization a Member of I
8
egree logy, . and from and
ctoral y of and
ty of ed or pers.
g and y. He
ree in from
na, in
ster’s and
ty of logy, plied
ee in dong 1997. n the ation
urrent ysical
and
M.S. Posts 2002
Ph.D. Korea
ge of zhou uting n for IEEE
1. What is the purpose of this article?
2. What are the domain and context of this research problem?
3. What does the author claim is the significance of the study?
4. What are the research questions that motivate the study?
5. What literature/prior research do the authors use to validate the problem?
6. How did the authors demonstrate that there is not a ready solution to the problem?
References
- Location Privacy Portection based
- questionswkw