Data Analysis
https://doi.org/10.1177/0047287517692446
Journal of Travel Research 1 –15 © The Author(s) 2017 Reprints and permissions: sagepub.com/journalsPermissions.nav DOI: 10.1177/0047287517692446 journals.sagepub.com/home/jtr
Empirical Research Articles
Introduction
Tourism plays an important role in the growth of the global economy. In 2013, international tourism generated a total of US$1,075 billion (ATTF 2013). Of this amount, US$102 billion came from the expenditure of Chinese travelers, making China the number one source market. Germany and the United States both ranked second, with US$84 billion. Australian tourists’ expenditure overseas ranked ninth globally, with US$28 billion (ATTF 2013). An accurate and insightful understanding of travel behavior is thus vital to realizing the economic benefits of the tourism industry (Edwards et al. 2009). By better understanding travel behavior, tourism practitioners can formulate more appropriate business strategies and travel services and products to meet travelers’ needs, which in turn can yield a substantial return on business investment.
Tourism researchers and managers have been pursuing insights into travel behavior to support strategic planning and decision-making in product development and destination management (Li, Meng, and Uysal 2008). Knowledge about travelers’ location preferences helps tourism managers refine existing attractions, plan new ones, and propose effective marketing strategies (Lew and McKercher 2006). Understanding the movement patterns of travelers is valuable for tourism organizations in identifying bottlenecks and unnecessary barriers in the flow among tourism destinations (Prideaux 2000), or in segmenting the tourism market to identify suitable travel packages that align well with the characteristics of travelers (Xia et al. 2010).
An analysis of travel patterns is usually performed based on the travel history recorded by travelers during their trips, which is referred to as a travel diary (Leung et al. 2012; Sheng and Chen 2013; Vu et al. 2015). Spatial and temporal information are important components of travel diaries for describing travel events, so that travelers’ behavioral patterns can be inferred. Because of the complex nature of travel behavior, efforts have been made to develop techniques to analyze travel diaries and extract useful patterns. For instance, a method based on dominant movement patterns was introduced for segmenting the tourism market of Phillip Island in Australia (Xia et al. 2010). An anisotropic dynamic spatial lag panel Origin–Destination travel flow model was proposed to analyze Australian domestic and international travel patterns (Deng and Athanasopoulos 2011). Both spatial and temporal dynamics were incorporated for tourism demand modeling from the perspective of origin–destination travel flows to discover useful temporal and spatial patterns. To demonstrate, content and social network analyses were carried out to
1 Center for Applied Informatics, Victoria University, Melbourne, Victoria, Australia
2 School of Information Technology, Deakin University, Melbourne, Victoria, Australia
3 School of Hotel & Tourism Management, Hong Kong Polytechnic University, Kowloon, Hong Kong
Corresponding Author: Gang Li, School of Information Technology, Deakin University, 221 Burwood Highway, Melbourne, Victoria 3125, Australia. Email: [email protected]
Travel Diaries Analysis by Sequential Rule Mining
Huy Quan Vu1, Gang Li2, Rob Law3, and Yanchun Zhang1
Abstract
Because the analysis of comprehensive travel data is inefficient, tourism managers face the challenge of gaining insights into travelers’ behavior and preferences. In most cases, existing techniques are incapable of capturing the sequential patterns hidden in travel data. To address these issues, this article proposes to analyze travelers’ behavior through geotagged photos and sequential rule mining. Travel diaries, constructed from the photo sequences, can capture comprehensive travel information, and sequential patterns can then be discovered to infer potential destinations. The effectiveness of the proposed framework is demonstrated in a case study of Australian outbound tourism, using a data set of more than 890,000 photos from 3,623 travelers. The introduced framework has the potential to benefit tourism researchers and practitioners by capturing and explaining the behaviors and preferences of travelers. The findings can support destination-marketing organizations (DMOs) in promoting appropriate destinations to prospective travelers.
Keywords
data mining, geotagged photo, sequential rule mining, travel diary, Flickr
examine the travel diaries and map movement patterns of travelers during the Beijing Olympics (Leung et al. 2012). Other works adopted the Geographic Information System to facilitate the analysis of the movement patterns of travelers (Li, Meng, and Uysal 2008; Zakrisson and Zillinger 2012; Orellana et al. 2012). Since traditional data collection methods, such as surveys, opinion polls, and questionnaires, usually require direct contact with travelers, the collected data are limited in the number of responses and the scale of the geographical area included (Zheng, Zha, and Chua 2012). Vu et al. (2015) overcame this limitation by utilizing the geotagged photos taken by travelers to capture the spatial and temporal information effectively.
Despite the efforts of researchers, tourism managers still face challenges in gaining insights into the complex travel behavior of tourists. Travel diaries usually comprise multiple travel events to different locations/destinations (Leung et al. 2012). The sequential association of the visited locations can reflect travel behaviors and preferences, especially in the case of international travel. For instance, some travelers who visited France in their trips to Europe had also visited Italy, whereas other travelers would visit the United States after their visit to Canada during their trips to North America. Such sequential associations are useful for agencies in creating more appropriate and promising travel packages. Special offers to visit both the United States and Canada can then be presented to travelers, especially those who want to visit Canada. Such sequential associations are often embedded in complex sequential travel data, but the existing methods in travel behavior analysis are incapable of accounting for these multiple sequential travel events simultaneously. Traditional approaches using descriptive statistics focus on identifying popular destinations (TRA 2014b). Travel sequences have been considered, but only for a few consecutive travel events (Leung et al. 2012; Barchiesi et al. 2015; Vu et al. 2015). Prior works were thus unable to discover sequential associations in travel diary data for an insightful understanding of travelers’ behavior.
Recently, a branch of data mining specifically for sequential patterns has emerged because of the increasing availability of sequential databases (Mabroukeh and Ezeife 2010). Sequential patterns and subsequences that appear frequently in sequential data sets can be effectively discovered. For instance, Shie et al. (2012) mined user behavior patterns in mobile environments for planning mobile commerce environments and managing online shopping websites. Aloysius and Binu (2013) mined user buying patterns to improve the shelving of products based on the order of purchases. Lately, Zheng et al. (2016) attempted to extract sequential behavioral patterns between compliant and noncompliant taxpayers in the financial service industry. Cheng et al. (2016) mined sequential risk patterns from diagnostic clinical records to provide potential clues for physicians for early detection of diseases. Since the travel events in travel diaries can be treated as sequential patterns in a temporal order, it is beneficial to adopt techniques for mining sequential data to analyze the travel diaries.
Aiming to address the limitations in prior works, this article attempts to incorporate data-mining techniques for sequential patterns into travel behavior analysis. A method named sequential rule mining (SRM) is introduced to extract the sequential patterns from travel diaries. SRM is able to reveal the complex travel behavior of travelers and infer the potential associated travel destinations (Cheng et al. 2016). The advantage of the proposed method is demonstrated in a case study of international travel patterns of Australians. We utilize geotagged travel photos available on social media sites as a data source because they are available on a large scale and are effective in capturing the travel behavior of tourists (Barchiesi et al. 2015; Vu et al. 2015). The geotagged photos are taken by travelers during their trips using digital photo-capturing devices, such as smartphones, smart cameras, and tablets. These devices have a built-in global positioning system (GPS) to record geographical information automatically. The travel history of tourists can be extracted from the sequence of posted photos as travel diaries. The study reveals sequential travel patterns of Australian travelers to popular destinations in Asia, Europe, and America, to offer insights to tourism managers for destination marketing and travel package development. It is important to mention that the focus of this article is on the sequential travel patterns of travelers to demonstrate the capability of the travel diary and SRM; other influencing factors of travel behavior are beyond the scope of this study. The introduced framework with the SRM technique has the potential to benefit tourism researchers and practitioners by capturing and explaining the complex travel behaviors and preferences of travelers.
The rest of the article is organized as follows. The second section provides the background on travel diaries for travel research and methods for sequential pattern analysis, which is followed by a recap of pattern mining techniques for sequential data. The third section presents our framework to process geotagged photos for travel diary construction, which is followed by a description of the SRM technique. The fourth section describes the case study and result analysis for Australian travelers, and discusses the practical implications of the research outcome. The final section concludes the article and envisages some future research directions.
Literature Review
Travel Diary for Travel Research
Breakwell and Wood (1995, 294) defined a diary as “a record of information in relation to the passage of time.” Early studies in tourism made use of diaries to record and analyze traveler behavior and expenditure on entertainment, food, and shopping (Breen, Bull, and Walo 2001) and to explore travelers’ experiences, emotions, and satisfaction (Coghlan and Pearce 2010). Travel diaries were used to capture the movement of travelers (Ian, Shane, and Jillian 2011) or to address transportation problems at tourism destinations (McKercher and Lau 2008).
Travel diaries can be recorded in various forms, such as handwriting on paper (McKercher and Lau 2008), video recording (Pocock and McIntosh 2013), and online blog posts (Leung et al. 2012). Recently, GPS-enabled handheld devices, such as GPS loggers, have been employed by researchers to analyze activities of travelers because of the development and widespread use of GPS technology (Orellana et al. 2012; Birenboim et al. 2013). In these works, direct contact with participants is required to obtain their travel diaries. The collected data are, thus, limited in terms of the number of responses or the scale of the included geographical areas.
Several forms of location data have been utilized to passively capture the travel patterns of travelers. For instance, Sobolevsky et al. (2014, 2015) used bankcard transaction data, which are captured via bankcard terminals, to model the spatial and temporal mobility patterns of travelers. Further, Versichele et al. (2014) adopted Bluetooth tracking data to determine the visiting patterns of travelers to attractions. Raun, Ahas, and Tiru (2016) measured visitor flows for destination management using mobile tracking data. Although these data effectively capture the mobility patterns of travelers, they are not freely available for public use. Researchers have therefore resorted to data that are available online, such as geotagged travel photos (Vu et al. 2015) and geotagged tweets (Chua et al. 2016). The photos are captured by travelers’ GPS-enabled photo-capturing devices and then shared publicly on photo-sharing sites, such as Flickr (www.flickr.com) and Panoramio (www.panoramio.com). Geotagged tweets are short messages on the social media platform Twitter (https://twitter.com) that are generated by users through mobile devices with a built-in GPS function.
Tourism researchers have used geotagged travel photos to analyze travel behavior at destinations. For instance, Kádár (2014) used geotagged photos to study tourist activities in several European cities. Onder, Koerbitz, and Hubmann-Haidvogel (2014) analyzed Flickr photos in Austria to determine their usefulness in indicating tourism demand. Vu et al. (2015) utilized geotagged photos to discover the travel behavior and preferences of inbound tourists to Hong Kong. Recently, the capability of geotagged photos in modeling international travel behavior has received increasing attention. Barchiesi et al. (2015) used large-scale geotagged photos to quantify international travel. Yuan and Medel (2016) focused on the interactions among countries in tourism economics by modeling international travel behavior and intercountry travel flows. Social media strongly influence the tourism industry, as people today are becoming heavily dependent on virtual communities in searching for and sharing travel information (Xiang and Gretzel 2010), and social media data are available in large volumes with up-to-date content for most locations worldwide. Social media data are therefore significant in studying the movement of tourists as well as in understanding their travel preferences (Chua et al. 2016).
Travel Pattern Analysis
In the context of tourism, travel patterns refer to the movements or travel flows from one tourism attraction to another. A popular approach to studying travel patterns is to present the flows in the form of an Origin-Destination matrix (Hwang and Fesenmaier 2003). The values in the matrix can be the actual counts of the transitions (Leung et al. 2012), or the proportions of movements from one destination to another, computed using a Markov chain technique (Hwang and Fesenmaier 2003). The matrix was also used to represent the changes in the probability of visiting a destination given the changes of attraction at other destinations (Yang, Fik, and Zhang 2013). Vu et al. (2015) used a Markov chain to examine the flows of tourists in the Hong Kong metropolitan area. The Origin-Destination matrix was also visualized using a network graph to facilitate the travel analysis for many tourism destinations (Leung et al. 2012; Zach and Gretzel 2012). In these works, the flows are usually represented for two locations at a time, the origin and the destination.
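As a minimal sketch of the Origin-Destination representation described above, the following Python snippet counts transitions between consecutive destinations and row-normalizes them into first-order transition proportions. The sample trips and the function names are illustrative assumptions, not data or code from the cited studies.

```python
from collections import Counter, defaultdict

def od_counts(trips):
    """Count transitions between consecutive destinations (origin -> destination)."""
    counts = Counter()
    for trip in trips:
        for origin, dest in zip(trip, trip[1:]):
            counts[(origin, dest)] += 1
    return counts

def transition_proportions(counts):
    """Row-normalize the counts: share of departures from each origin per destination."""
    totals = defaultdict(int)
    for (origin, _), n in counts.items():
        totals[origin] += n
    return {(o, d): n / totals[o] for (o, d), n in counts.items()}

# Hypothetical trips, for illustration only.
trips = [
    ["Paris", "Madrid", "Rome"],
    ["Paris", "Rome"],
    ["Madrid", "Rome", "Paris"],
]
counts = od_counts(trips)
props = transition_proportions(counts)
print(counts[("Madrid", "Rome")])  # 2
print(props[("Paris", "Rome")])    # 0.5
```

Each entry describes a single origin-destination pair, which illustrates why this representation alone does not capture associations that span more than two destinations.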
Travel sequences with more than two locations were considered in the work of Xia et al. (2010) for mining dominant movement patterns. The patterns were identified manually from a visitor survey of a small-scale tourism attraction with nine destinations within the attraction. The proposed approach is not practical for a large-scale study, especially international travel with many possible destinations. Orellana et al. (2012) used an automatic method named generalized sequential patterns to examine visitor movement in natural recreational areas. Their method can extract sequential patterns in a relative rather than absolute order. However, generalized sequential patterns were focused on identifying popular travel paths from sequential data rather than assessing the sequential association between visited destinations.
A set of techniques for sequential pattern discovery has been used in the tourism literature, which includes time-series analysis, associate distance measure method, sequence alignment method, and high-frequency pattern methods (Shao and Gretzel 2010). Among them, sequence alignment is frequently used for determining the movement patterns of tourists in different destinations. However, these techniques are unable to represent the sequential associations between events in sequential data.
Mining Pattern from Sequential Data
Discovering a temporal relationship from data is important because it enhances our understanding of the data and provides a basis for making predictions. In data mining, various techniques have been proposed to mine different types of sequential patterns, such as closed sequential patterns (Yan, Han, and Afshar 2003), maximal sequential patterns (Fournier-Viger, Wu, and Tseng 2013), compressing sequential patterns (Chang et al. 2006), and sequential generator patterns (Fournier-Viger et al. 2014a). Although these approaches can discover frequent sequences in a travel data set, they are insufficient to make meaningful predictions (Fournier-Viger et al. 2012). For instance, a travel event $c$ may appear frequently after travel events $a$ and $b$, but there are also cases in which events $a$ and $b$ are not followed by event $c$. Predicting that $c$ will occur after $a, b$ solely from the sequential pattern $a, b, c$ is therefore not possible. To assess the association between $a, b$ and $c$, patterns indicating how many times $c$ appears after $a, b$ and how many times it does not should be available. As such, SRM was proposed as an alternative approach (Fournier-Viger et al. 2012).
A sequential rule is represented in the form $X \Rightarrow Y$, where $X$ and $Y$ are unordered item sets. The interpretation of the rule is that if the event(s) in $X$ occur in a sequence, the event(s) in $Y$ will occur afterward in the same sequence. Applications of SRM have been found in stock market analysis (Yang, Hsieh, and Wu 2006), weather observation (Hamilton and Karimi 2005), and e-learning (Faghih et al. 2010). Sequential rules are related to, but different from, association rules (Tan, Michael, and Kumar 2005; Law et al. 2011). The former account for the temporal order of items in a sequence, whereas the latter do not. An association rule of the form $X \Rightarrow Y$ does not necessarily mean that event(s) $Y$ occur after event(s) $X$, so association rules, unlike SRM, are not suitable for assessing the sequential associations between destinations.
Summary
A travel diary is an effective form for capturing comprehensive information on the behavior of travelers (Ian, Shane, and Jillian 2011). The key component of the travel diary is the spatial-temporal information, which is usually captured using GPS-enabled devices. Existing travel diary construction approaches are time consuming and yield limited information. Recently, researchers have shifted their attention to user-generated data on social media sites considering their large volume and availability for public use. However, an issue with geotagged photo data is noise. For instance, many geotagged photos were taken in transit rather than at the destinations (Vu et al. 2015). It is also possible that many photos were taken at the same tourist destination. Similar issues exist in other types of geotagged social media content available on popular platforms, such as Twitter, Facebook, and Instagram, given that travelers can post content on social media via their mobile devices while traveling. In the application of sequential travel behavior, especially for international travel, the sequences of visited destinations, such as cities in the world, are of interest, rather than the raw spatial and temporal information embedded in the social media content. Prior works have not presented an effective approach to transform geotagged social media content into travel diaries for efficient analysis of sequential travel patterns (Kádár and Gede 2013; Onder, Koerbitz, and Hubmann-Haidvogel 2014; Vu et al. 2015; Garcia-Palomares, Gutierrez, and Minguez 2015).
In existing attempts at analyzing travel patterns, the flow is usually limited to two locations at a time (Leung et al. 2012; Zach and Gretzel 2012; Vu et al. 2015), which is inadequate for extracting complex sequential patterns from travel. Other works used sequential pattern mining techniques, but they focused on identifying popular travel paths (Xia et al. 2010; Orellana et al. 2012) rather than the sequential association between destinations. If DMOs know that travelers are likely to travel to a destination c after visiting destinations a and b, they can design travel packages that encourage travelers to visit destinations a and b, with a special offer for visiting c. However, prior approaches in the tourism literature have not been able to identify such sequential associations.
This article aims to address the aforementioned shortcomings through the following specific objectives:
• present a processing framework for geotagged social media content to construct travel diaries that capture the international travels in the form of sequences;
• introduce SRM into the analysis of travel diaries to identify complex sequential association between destinations; and
• demonstrate the effect of the proposed method for travel behavior analysis by using a case study of Australian outbound travelers.
Methodology
This section presents our method for sequential travel behavior analysis using geotagged photos that are available from online databases such as Flickr. The main advantage of Flickr over other popular social media platforms is that its photo databases are publicly available. All available photos can be conveniently retrieved at any given point in time, which is not the case for Twitter, Facebook, or Instagram because either a quota limit or a fee applies. Moreover, Flickr is known as a reliable data source that can provide useful indicators for tourism demand (Barchiesi et al. 2015). Therefore, we use geotagged photos as representative geotagged social media content to demonstrate our method. Our framework consists of three stages: (1) travel data extraction from geotagged photos, (2) travel diary construction, and (3) sequential rule mining.
Travel Data Extraction from Geotagged Photos
The geotagged photos can be retrieved from the Flickr server through its application programming interface (API). Full documentation is available at www.flickr.com/services/api. One challenge of data extraction is the identification of users whose photos should be analyzed. For example, we would like to retrieve photos posted by people living in Melbourne, but none of the API functions directly supports this operation. We propose to first retrieve lists of users and their locations of residence through specific Flickr groups using the Group Search function. Flickr allows users to create or actively associate with a group. For example, we search for groups whose names contain the keyword Melbourne. Some group members are possibly Melbourne local residents. This approach allows quick access to many users who likely belong to our group of interest.
Let $G = \{g^w_1, g^w_2, \dots\}$ denote the list of groups returned by the Flickr Group Search function with keyword $w$. For each group $g^w_i \in G$, we retrieve its members’ UserID list $M_{g_i} = \{u^{g_i}_1, u^{g_i}_2, \dots\}$ and their associated location of residence data. One user may have registered in more than one group. Thus, the UserID lists $\{M_{g_1}, M_{g_2}, \dots\}$ for the groups are merged to remove any duplicated records. Users belonging to the group of interest, such as Melbourne local residents, are grouped together. The entire photo collection for each user is then retrieved using Photo Search with UserID specified as a search parameter. A bounding box covering the entire country of residence for the targeted user group is used to distinguish photos taken during domestic trips from those taken during international trips. Photos taken outside of the bounding box are assumed to be taken during outbound trips to other countries and are kept for further analysis. The photo collection of each user is sorted in temporal order by the time taken, from the oldest to the newest. The travel information can be inferred from the sequence of geographical data associated with the photos.
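The merging of user lists and the chronological ordering of photos can be sketched as follows. This is a minimal illustration that assumes the group member lists and photo records have already been retrieved; the record layout, the sample values, and the names `merge_member_lists`, `residents_of`, and `sort_by_time_taken` are hypothetical and are not part of the Flickr API.

```python
from datetime import datetime

def merge_member_lists(member_lists):
    """Merge per-group {user_id: residence} mappings so each user is kept once."""
    merged = {}
    for members in member_lists:
        for user_id, residence in members.items():
            merged.setdefault(user_id, residence)
    return merged

def residents_of(users, city):
    """Select users whose stated residence mentions the city of interest."""
    return sorted(u for u, home in users.items() if city.lower() in home.lower())

def sort_by_time_taken(photos):
    """Order one user's photos from the oldest to the newest."""
    return sorted(photos, key=lambda p: p["taken"])

# Hypothetical member lists of two groups found with the keyword "Melbourne".
group_a = {"u1": "Melbourne, Australia", "u2": "Sydney, Australia"}
group_b = {"u1": "Melbourne, Australia", "u3": "Melbourne, VIC"}
users = merge_member_lists([group_a, group_b])
melbourne_users = residents_of(users, "Melbourne")

# Hypothetical photo records of one user (timestamps are made up).
photos = [
    {"id": "b", "taken": datetime(2014, 6, 22, 9, 0)},
    {"id": "a", "taken": datetime(2014, 6, 21, 15, 30)},
]
ordered = sort_by_time_taken(photos)
print(melbourne_users)             # ['u1', 'u3']
print([p["id"] for p in ordered])  # ['a', 'b']
```

Matching residence by substring is a simplification; free-text location fields would likely need more careful normalization in practice.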
Travel Diary Construction
This stage converts the photo collections of users into sequences of visited destinations, which we call outbound travel diaries. The issue with the geotagged photo data is that the geographical information is in the form of raw GPS data (latitude and longitude). The data must be converted into a suitable format that presents sequences of visited destinations. We propose to process the data using the Geocoding API service provided by Google Maps, where the GPS data of each photo are mapped to the corresponding location or region. Documentation of the Geocoding API is available at http://developers.google.com/maps. The labels of the mapped locations can be at multiple levels, such as city or country, to represent tourism destinations. It should be noted that the photos collected for each user could have been taken during multiple outbound trips. The time elapsed between the photos is examined to assign them to their corresponding outbound trips.
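One simple way to separate a user's photos into trips is to start a new trip whenever the gap between consecutive photos exceeds a threshold. The sketch below assumes a 30-day threshold purely for illustration; the text does not state which rule or threshold was used, and `split_into_trips` is a hypothetical helper.

```python
from datetime import date, timedelta

def split_into_trips(dates, max_gap=timedelta(days=30)):
    """Group photo dates into trips; a gap longer than max_gap starts a new trip.

    The 30-day default is an assumed value, not one taken from the article.
    """
    trips = []
    for day in sorted(dates):
        if trips and day - trips[-1][-1] <= max_gap:
            trips[-1].append(day)   # close enough to the previous photo: same trip
        else:
            trips.append([day])     # first photo, or long gap: new trip
    return trips

# A few of the dates listed in Table 1.
dates = [
    date(2014, 6, 21), date(2014, 6, 23), date(2014, 6, 28),
    date(2015, 12, 1), date(2015, 12, 5), date(2015, 12, 8),
]
trips = split_into_trips(dates)
print(len(trips))  # 2
```

With these dates the June 2014 photos and the December 2015 photos fall into two separate trips, matching the Trip ID column of Table 1.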
The next step is to convert the travel diaries into sequences of destinations. Let $L = \{l_1, l_2, \dots, l_m\}$ be a set of $m$ distinct items, each representing a travel destination. A travel sequence is defined as an ordered list of travel events $(l)_1 \to (l)_2 \to \dots \to (l)_n$, where $l$ can be any item $l_i \in L$, and $(l)_t$ represents a travel event at time $t$ to location $l_i$. For example, given the travel sequence $(l_2)_1 \to (l_4)_2 \to (l_2)_3$, the traveler visited destination $l_2$ first, then destination $l_4$ second, and revisited $l_2$ later in the trip. The relative order of the items in the sequence is more important than the positions of the items. For the sequence $(l_2)_1 \to (l_4)_2$, $l_2$ is not necessarily the first destination of the trip, and $l_4$ is not necessarily the last. The following example demonstrates the construction of a travel sequence from a travel diary.
Example 1: Table 1 shows a sample travel diary of a traveler during two outbound trips. One trip is to Europe and the other is to Asia, as indicated by the Trip ID. The GPS information was mapped to its corresponding city and country using the Geocoding API. The traveler may take many photos in each location. We only show the information of several photos here for demonstration purposes, but the sequential information of visited destinations is still preserved. Table 2 shows the travel sequences generated from the travel diary at both the country and city levels. It is important to note that the sequence for the trip to Asia shows two destinations at the country level, Thailand and Singapore. The sequence at the city level contains three items, which provides greater detail on the visited cities. As such, detailed insights can be obtained. Travel sequences can also be constructed at a more detailed level, such as district or street, based on the Geocoding API. In this article, we only consider the city and country levels for analyzing international travel patterns.
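The conversion in Example 1 can be sketched as collapsing runs of consecutive photos with the same label into a single travel event. The snippet assumes the photos are already sorted by time and labeled with city and country; `to_sequence` is a hypothetical helper, and the labels follow the second trip in Table 1.

```python
from itertools import groupby

def to_sequence(labels):
    """Collapse runs of identical consecutive labels into one travel event each."""
    return [label for label, _ in groupby(labels)]

# (city, country) labels of photos P11 to P20 in Table 1, in temporal order.
trip2 = (
    [("Bangkok", "Thailand")] * 4
    + [("Pattaya", "Thailand")] * 2
    + [("Singapore", "Singapore")] * 4
)
city_sequence = to_sequence(city for city, _ in trip2)
country_sequence = to_sequence(country for _, country in trip2)
print(city_sequence)     # ['Bangkok', 'Pattaya', 'Singapore']
print(country_sequence)  # ['Thailand', 'Singapore']
```

Because only consecutive duplicates are merged, a destination that is revisited later in the trip still appears again in the sequence.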
Sequential Rule Mining
Consider a sequential data set $S = \{s_1, s_2, \dots, s_m\}$, where each travel sequence is an ordered list of item sets $s = (l)_1 \to (l)_2 \to \dots \to (l)_n$. Each item set $(l)_i$ can comprise one or more items $l_i \in L$ to represent events happening simultaneously. In our application, we assume that each item set $(l)_i$ contains only one item, representing an outbound travel event to a specific destination.

A sequential rule $r: X \Rightarrow Y$ is a relationship between two unordered item sets $X, Y \subseteq L$, such that $X \cap Y = \emptyset$ and $X, Y \neq \emptyset$. An item set $X$ or $Y$ occurs in a sequence $s$ if $X$ or $Y \subseteq \bigcup_{i=1}^{n} (l)_i$. A rule $r: X \Rightarrow Y$ occurs in $s$ (written as $r \sqsubseteq s$) if a number $k$ $(1 \leq k \leq n)$ exists, such that $X \subseteq \bigcup_{i=1}^{k} (l)_i$ and $Y \subseteq \bigcup_{i=k+1}^{n} (l)_i$.

Example 2: The rule $\{l_2, l_3, l_5\} \Rightarrow \{l_6, l_7\}$ occurs in the sequence $(l_2)_1 \to (l_3)_2 \to (l_5)_3 \to (l_6)_4 \to (l_7)_5$, but the rule $\{l_2, l_5, l_7\} \Rightarrow \{l_6\}$ does not, because $l_6$ does not occur after $l_7$. For simplicity, we hereafter omit the index indicating the order of the events, without loss of the order meaning.
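The occurrence check in Example 2 can be sketched directly from the split-point idea: a rule occurs if some position divides the sequence so that all items of the antecedent appear before it and all items of the consequent appear after it. The function name `rule_occurs` and the representation of a sequence as a list of Python sets are assumptions made for illustration.

```python
def rule_occurs(sequence, x, y):
    """Return True if the rule X => Y occurs in `sequence` (a list of item sets).

    Tries every split point k: X must be covered by the first k item sets
    and Y by the remaining ones.
    """
    for k in range(1, len(sequence)):
        before = set().union(*sequence[:k])
        after = set().union(*sequence[k:])
        if x <= before and y <= after:
            return True
    return False

# The sequence of Example 2, one destination per item set.
seq = [{"l2"}, {"l3"}, {"l5"}, {"l6"}, {"l7"}]
print(rule_occurs(seq, {"l2", "l3", "l5"}, {"l6", "l7"}))  # True
print(rule_occurs(seq, {"l2", "l5", "l7"}, {"l6"}))        # False
```

This brute-force check recomputes the unions for every split point, which is fine for short travel sequences but is not meant as an efficient mining procedure.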
A sequential rule is characterized by two metrics: support, denoted as $\mathrm{supp}(r)$, reflects how often the rule $r$ appears in the sequential database $S$; and confidence, denoted as $\mathrm{conf}(r)$, reflects how certainly an item set $X$ is followed by item set $Y$ in the sequential data set.

$$\mathrm{supp}(r) = \frac{|\{s \mid s \in S \wedge r \sqsubseteq s\}|}{|S|} \tag{3.1}$$

$$\mathrm{conf}(r) = \frac{|\{s \mid s \in S \wedge r \sqsubseteq s\}|}{|\{s \mid s \in S \wedge X \sqsubseteq s\}|} \tag{3.2}$$

Here, $r \sqsubseteq s$ and $X \sqsubseteq s$ denote that the rule $r$ and the item set $X$, respectively, occur in the sequence $s$.

Table 3. A Sequential Database.

ID      Sequences
$s_1$   $l_1 \to l_2 \to l_3 \to l_6 \to l_7 \to l_5$
$s_2$   $l_1 \to l_4 \to l_3 \to l_2 \to l_1 \to l_2 \to l_5 \to l_6$
$s_3$   $l_1 \to l_2 \to l_6 \to l_5$
$s_4$   $l_2 \to l_6 \to l_7 \to l_8$

Note: The symbol “→” represents transition.
Traditionally, the process of mining sequential rules starts by finding all frequent sequences in $S$ whose support is no less than a user-defined threshold $\min(\mathrm{supp}) \in [0, 1]$ (Fournier-Viger et al. 2012). Then, the rules that describe the relationships between different sequence items are constructed. The confidences of the generated rules are computed based on Equation 3.2. A rule is considered as representing a strong sequential association if its confidence is no less than a user-defined threshold $\min(\mathrm{conf}) \in [0, 1]$. Such rules are kept for further analysis. Let us demonstrate the concept of support and confidence using a simple example.
Example 3: Suppose we have a sequential database as shown in Table 3. SRM is applied to this data set, with $\min(\mathrm{supp}) = 0.5$ and $\min(\mathrm{conf}) = 0.6$. Some sequential rules are identified as shown in Table 4. For instance, the rule $r_1: \{l_1, l_2, l_3\} \Rightarrow \{l_5\}$ has a support of $2/4 = 0.5$ because the item set $\{l_1, l_2, l_3\}$ is followed by $\{l_5\}$ in two of the four sequences in the data set ($s_1$ and $s_2$). The confidence of $r_1$ is $2/2 = 1$ because the antecedent $\{l_1, l_2, l_3\}$ appears twice and is always followed by the consequent $\{l_5\}$. The support and confidence values of other rules are calculated in a similar manner.
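The calculation in Example 3 can be reproduced with a short sketch. It assumes one destination per travel event, so each sequence is a plain list, and it counts a rule as occurring when the antecedent is complete before the last occurrence of every consequent item. The function names are illustrative only.

```python
def occurs(sequence, x, y):
    """True if all items of X have appeared before the last occurrence of each item of Y."""
    if not (x | y) <= set(sequence):
        return False
    x_complete = max(sequence.index(item) for item in x)
    y_last = min(len(sequence) - 1 - sequence[::-1].index(item) for item in y)
    return x_complete < y_last

def support(db, x, y):
    """Share of sequences in which the rule X => Y occurs (Equation 3.1)."""
    return sum(occurs(s, x, y) for s in db) / len(db)

def confidence(db, x, y):
    """Share of sequences containing X in which X => Y occurs (Equation 3.2)."""
    with_x = [s for s in db if x <= set(s)]
    return sum(occurs(s, x, y) for s in with_x) / len(with_x) if with_x else 0.0

# The sequential database of Table 3.
db = [
    ["l1", "l2", "l3", "l6", "l7", "l5"],
    ["l1", "l4", "l3", "l2", "l1", "l2", "l5", "l6"],
    ["l1", "l2", "l6", "l5"],
    ["l2", "l6", "l7", "l8"],
]
x, y = {"l1", "l2", "l3"}, {"l5"}
print(support(db, x, y))     # 0.5
print(confidence(db, x, y))  # 1.0
```

The two printed values agree with the support of 2/4 and the confidence of 2/2 stated for rule $r_1$.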
In practice, setting min conf( ) is easier than min supp( )
(Fournier-Viger and Tseng 2011). The min conf( ) can be defined by users based on how confident the user wants the rules to be, whereas the min supp( ) should be selected based on the characteristics of the data set. A small value for min supp( ) can result in a large number of rules, while a high value for min supp( ) can lead to no or few rules. Therefore, we adopt a recently developed approach, named Top-K SRM (Fournier- Viger and Tseng 2011), to discover k rules with the highest support, such that their confidences are higher than a user- specified min conf( ). Basically, Top-K SRM discovers a set of k rules R r r rk={ }1 2, ,..., , such that for each rule r Rm ∈ ,
Table 1. Travel Diary Example.
ID Date Trip ID Latitude/Longitude City Country/Continent
P1 21 Jun 2014 1 48.8529 / 2.2992 Paris France/Europe P2 21 Jun 2014 1 48.8734 / 2.2953 Paris France/Europe P3 22 Jun 2014 1 48.8410 / 2.3207 Paris France/Europe P4 23 Jun 2014 1 40.4913 / -3.5920 Madrid Spain/Europe P5 24 Jun 2014 1 40.4108 / -3.7073 Madrid Spain/Europe P6 24 Jun 2014 1 40.4334 / -3.7042 Madrid Spain/Europe P7 25 Jun 2014 1 40.4175 / -3.7143 Madrid Spain/Europe P8 26 Jun 2014 1 41.7949 / 12.2506 Rome Italy/Europe P9 26 Jun 2014 1 41.8913 / 12.4918 Rome Italy/Europe P 0 28 Jun 2014 1 41.9105 / 12.4764 Rome Italy/Europe P11 1 Dec 2015 2 13.6922 / 100.7512 Bangkok Thailand/Asia P12 1 Dec 2015 2 13.7403 / 100.5090 Bangkok Thailand/Asia P13 2 Dec 2015 2 13.7509 / 100.4984 Bangkok Thailand/Asia P14 3 Dec 2015 2 13.7551 / 100.5115 Bangkok Thailand/Asia P15 5 Dec 2015 2 12.9292 / 100.8770 Pattaya Thailand/Asia P16 5 Dec 2015 2 12.9307 / 100.8781 Pattaya Thailand/Asia P17 7 Dec 2015 2 1.2950 / 103.8583 Singapore Singapore/Asia P18 7 Dec 2015 2 1.2809 / 103.8638 Singapore Singapore/Asia P19 7 Dec 2015 2 1.2905 / 103.8455 Singapore Singapore/Asia P20 8 Dec 2015 2 1.3620 / 103.9906 Singapore Singapore/Asia
Table 2. Travel Sequences.

| Trip | Sequences (country level) | Sequences (city level) |
|---|---|---|
| 1 | France → Spain → Italy | Paris → Madrid → Rome |
| 2 | Thailand → Singapore | Bangkok → Pattaya → Singapore |
$conf(r_m) \geq min_{conf}$, and no other rule $r_n \notin R$ with $supp(r_n) > supp(r_m)$ and $conf(r_n) \geq min_{conf}$ exists. As such, users only need to provide the $min_{conf}$, and the top rules returned by the algorithm are examined in the case study. A practical application of Top-K SRM is demonstrated in the fourth section.
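The top-k criterion can be illustrated with a brute-force sketch: enumerate candidate rules, keep those meeting the confidence threshold, and return the k with the highest support. This is an illustration of the selection criterion only and does not reproduce the search strategy of Top-K SRM. The trips are hypothetical, the names `rule_stats` and `top_k_rules` are introduced here, and the sketch assumes single-item consequents and antecedents of at most `max_antecedent` items to keep the enumeration small.

```python
from itertools import combinations


def rule_stats(sequences, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    hits = 0      # sequences where the antecedent precedes the consequent
    covered = 0   # sequences that contain the whole antecedent
    for seq in sequences:
        if antecedent <= set(seq):
            covered += 1
        if any(antecedent <= set(seq[:i]) and consequent <= set(seq[i:])
               for i in range(1, len(seq))):
            hits += 1
    supp = hits / len(sequences)
    conf = hits / covered if covered else 0.0
    return supp, conf


def top_k_rules(sequences, k, min_conf, max_antecedent=2):
    """Brute-force selection of the k highest-support rules whose
    confidence is at least min_conf (illustrative, not Top-K SRM)."""
    items = sorted({item for seq in sequences for item in seq})
    candidates = []
    for size in range(1, max_antecedent + 1):
        for antecedent in combinations(items, size):
            for target in items:
                if target in antecedent:
                    continue
                supp, conf = rule_stats(sequences, set(antecedent), {target})
                if supp > 0 and conf >= min_conf:
                    candidates.append((supp, conf, antecedent, target))
    # Highest support first; ties broken by confidence, then by name.
    candidates.sort(key=lambda r: (-r[0], -r[1], r[2], r[3]))
    return candidates[:k]


# Hypothetical country-level trips (assumption: not the paper's data).
trips = [
    ["FRA", "ESP", "ITA"],
    ["FRA", "ITA"],
    ["ESP", "FRA", "ITA"],
    ["THA", "SGP"],
]
top = top_k_rules(trips, k=2, min_conf=0.6)
for supp, conf, antecedent, target in top:
    print(antecedent, "=>", target, supp, conf)
```

Only the confidence threshold and k are supplied by the caller, which mirrors the point made above that no minimum support has to be chosen in advance.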
A Case Study
This section presents a case study of travel diary analysis for Australian outbound tourism. The data collection process is presented first, followed by the construction of travel diaries. SRM is then applied to identify sequentially associated destinations. The capability of travel diaries to capture travel behavior is further demonstrated through an analysis of sequential patterns. A discussion of the results is provided with practical implications.
Data Collection
The data set used in this study was collected from Flickr through the method described in the previous section. A list of user IDs was first retrieved from Flickr’s group together with their locations of residence. Users of interest were identified, and their entire photo collections were subsequently retrieved. Our study focused on users residing in Sydney, Melbourne, Brisbane, Perth, and Adelaide, the five most populated cities in Australia (ABS 2015). A bounding box covering the entire geographical area of Australia was specified, with coordinates $min_{latitude} = -45.38122$, $max_{latitude} = -11.044189$, $min_{longitude} = 110.678793$, and $max_{longitude} = 153.845616$. A photo was treated as taken during outbound travel if its location was outside the bounding box; otherwise, it was treated as taken during domestic travel and excluded from further analysis. User accounts with no photo posted were excluded from the data collection. The final data set comprised 890,313 photos taken by 3,623 users during outbound travel. The earliest photos were taken in 2001 and the latest photos were taken in 2015. Table 5 describes our data set with respect to different cities.
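The bounding-box filter described above can be sketched as follows. This is a minimal illustration that uses the coordinates stated in the text; the names `AUSTRALIA_BBOX` and `is_outbound` are introduced here and are not taken from the authors' code.

```python
# Bounding box coordinates as stated in the text.
AUSTRALIA_BBOX = {
    "min_lat": -45.38122,
    "max_lat": -11.044189,
    "min_lon": 110.678793,
    "max_lon": 153.845616,
}


def is_outbound(lat, lon, bbox=AUSTRALIA_BBOX):
    """Return True if the photo location lies outside the bounding box."""
    inside = (bbox["min_lat"] <= lat <= bbox["max_lat"]
              and bbox["min_lon"] <= lon <= bbox["max_lon"])
    return not inside


# Example coordinates: a Paris photo from Table 1 and a point in Sydney
# (the Sydney coordinates are an illustrative assumption).
paris_outbound = is_outbound(48.8529, 2.2992)
sydney_outbound = is_outbound(-33.8688, 151.2093)
print(paris_outbound, sydney_outbound)
```

A rectangular box is only an approximation of a country's territory, so a photo is classified purely by whether its coordinates fall inside the rectangle.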
Sydney has the highest number, with 1,435 users. Melbourne places second with more than 1,000 users. Brisbane and Perth have fewer users, and Adelaide has the fewest, with 213 users. This order is similar to the population ranking of these cities (ABS 2015); Sydney is the most populated city, and Adelaide places fifth. The number of photos taken per user is similar across the groups. The average time span of the photo collections is the highest for Melbourne travelers, at around three years, while the photo collections of the Sydney travel group have the shortest time span. The time span of the photo collection is not a major factor in our study, because our analysis focuses on the sequential travel pattern for each outbound trip rather than the travel history of travelers. In addition, Australia is located far away from countries in other continents; the travel patterns from different Australian cities to other continents would be relatively similar given the limited number of air routes. Therefore, we treat users from different cities as the same group to represent Australian travelers in the subsequent analysis.
We acknowledge the variety of travel styles and preferences among the travelers, such as travel for business, holiday, or family visits. This study, however, does not consider such differences because of the scope of coverage. Instead, this study presents an approach that focuses on extracting patterns reflecting sequential associations among visited destinations, embedded in the travel photo sequences.
Travel Diary Construction
The geographical information of photos was mapped to their corresponding cities and countries using the Geocoding API, as described earlier. We assumed that photos taken more than 30 days apart likely belonged to different outbound trips. The photo collection of each user was sorted in temporal order and separated into different trips. In total, 17,188 travel diaries were constructed from the collected data set. The travel diaries were then converted into sequences of visited destinations. The number of travel diaries in this study is much larger than in the travel diary data sets used in prior studies (Xia et al. 2010; Orellana et al. 2012; Vu et al. 2015).
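The trip separation and sequence conversion described above can be sketched as follows. This is a minimal illustration, assuming each photo has already been mapped to a city; the photo records are a subset of the Table 1 example, and the names `split_into_trips` and `to_sequence` are introduced here. The sketch also assumes that consecutive photos in the same city collapse into a single visit.

```python
from datetime import date, timedelta

MAX_GAP = timedelta(days=30)  # threshold stated in the text


def split_into_trips(photos, max_gap=MAX_GAP):
    """Group one user's (date, place) photo records into trips.

    A new trip starts whenever a photo was taken more than max_gap
    after the previous photo.
    """
    trips = []
    for taken, place in sorted(photos, key=lambda p: p[0]):
        if trips and taken - trips[-1][-1][0] <= max_gap:
            trips[-1].append((taken, place))
        else:
            trips.append([(taken, place)])
    return trips


def to_sequence(trip):
    """Collapse consecutive photos at the same place into one visit."""
    sequence = []
    for _, place in trip:
        if not sequence or sequence[-1] != place:
            sequence.append(place)
    return sequence


# A subset of the Table 1 example (dates and cities only).
photos = [
    (date(2014, 6, 21), "Paris"), (date(2014, 6, 22), "Paris"),
    (date(2014, 6, 23), "Madrid"), (date(2014, 6, 25), "Madrid"),
    (date(2014, 6, 26), "Rome"), (date(2014, 6, 28), "Rome"),
    (date(2015, 12, 1), "Bangkok"), (date(2015, 12, 5), "Pattaya"),
    (date(2015, 12, 7), "Singapore"), (date(2015, 12, 8), "Singapore"),
]
city_sequences = [to_sequence(trip) for trip in split_into_trips(photos)]
print(city_sequences)
```

The same two functions apply at the country level if the place label is a country instead of a city.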
Among the travel diaries, we noticed that 12,819 corresponded to a single country and 4,369 involved two or more countries. Table 6 shows the
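Table 4. Sequential Rules.

| ID | Rule | Support | Confidence |
|---|---|---|---|
| $r_1$ | $\{l_1, l_2, l_3\} \Rightarrow \{l_5\}$ | 0.5 | 1.0 |
| $r_2$ | $\{l_1\} \Rightarrow \{l_3, l_5, l_6\}$ | 0.5 | 0.66 |
| $r_3$ | $\{l_1, l_2, l_5\} \Rightarrow \{l_6\}$ | 0.75 | 1.0 |
| $r_4$ | $\{l_2\} \Rightarrow \{l_5, l_6\}$ | 0.75 | 0.75 |
| $r_5$ | $\{l_1\} \Rightarrow \{l_5, l_6\}$ | 0.75 | 1.0 |
| $r_6$ | $\{l_3\} \Rightarrow \{l_6\}$ | 0.5 | 1.0 |
| $r_7$ | $\{l_1\} \Rightarrow \{l_2\}$ | 0.75 | 1.0 |

Note: The symbol “$\Rightarrow$” denotes a rule.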
Table 5. Geotagged Photo Data Collection.

| Travel Group | No. of Users | No. of Photos | No. of Photos/User | Average Time Span (Years) |
|---|---|---|---|---|
| Sydney | 1,435 | 367,207 | 255.89 | 1.66 |
| Melbourne | 1,033 | 248,056 | 240.13 | 3.01 |
| Brisbane | 479 | 101,940 | 212.82 | 2.90 |
| Perth | 463 | 125,049 | 270.08 | 2.96 |
| Adelaide | 213 | 48,061 | 225.64 | 2.44 |
| Total | 3,623 | 890,313 | 245.74 | |
Table 7. Popularly Visited Countries.

| Country (Code) | No. of Travelers | No. of Trips | No. of Trips/Traveler |
|---|---|---|---|
| United States of America (USA)^a | 1,119 | 2,535 | 2.27 |
| United Kingdom (GBR)^a | 1,040 | 2,967 | 2.85 |
| New Zealand (NZL)^a | 818 | 1,544 | 1.89 |
| France (FRA) | 702 | 1,184 | 1.69 |
| Italy (ITA) | 596 | 1,012 | 1.70 |
| Japan (JPN) | 503 | 941 | 1.87 |
| Singapore (SGP)^a | 496 | 762 | 1.54 |
| Malaysia (MAL)^a | 464 | 754 | 1.63 |
| Thailand (THA)^a | 452 | 703 | 1.56 |
| China (CHN)^a | 440 | 834 | 1.90 |
| Germany (DEU) | 417 | 753 | 1.81 |
| Hong Kong (HKG)^a | 412 | 652 | 1.58 |
| Indonesia (IDN)^a | 386 | 631 | 1.63 |
| Spain (ESP) | 350 | 514 | 1.47 |
| Canada (CDN) | 304 | 580 | 1.91 |
| India (IND) | 271 | 454 | 1.68 |
| Viet Nam (VNM) | 255 | 363 | 1.42 |
| Switzerland (CHE) | 232 | 343 | 1.48 |
| Netherlands (NLD) | 230 | 425 | 1.85 |
| Austria (AUT) | 185 | 246 | 1.33 |

a. Among the top 10 destinations according to a national outbound survey by Tourism Research Australia (TRA 2014a).
proportions of visited continents for single-country trips versus trips to two or more countries. Please note that destinations in Oceania refer to countries other than Australia, as the collected data are for the outbound trips of Australian residents. We can see that the largest share of single-country trips was to Asia, with 33.72%. Travelers were more likely to travel to Europe in trips spanning two or more countries, with 43.16%, significantly higher than for trips to a single country. Z-tests with p value ≤ 0.05 verified statistical significance. Little difference was noticed for trips to Africa and Asia, whereas the share of trips to America was significantly lower among trips to two or more countries (13.71% versus 18.33%). Travelers were also less likely to travel to two or more countries in Oceania.
We further examine the capability of geotagged photos to capture the travel behavior of Australian travelers via Table 7, which shows the top 20 visited countries as identified from the collected data set. The top countries in our list are among the top 10 destinations according to a national outbound survey by Tourism Research Australia (TRA 2014a). The most popular countries in both lists are the United States, the United Kingdom, and New Zealand. In particular, travelers are likely to visit the United States and the United Kingdom multiple times, as shown by the high average numbers of trips per traveler. These destinations are in fact the home countries of many Australian residents (TRA 2014a), which they probably visit frequently. We also noticed that our list does not include Fiji, a popular destination of Australian travelers (TRA 2014a). Fiji ranked 22nd in our data set and was therefore not listed in Table 7. Nevertheless, the geotagged photos can still capture the general travel behavior of the travelers. Although Table 7 presents only the popular destinations, we
Table 6. Destinations of Outbound Trips.

| Continent | Single Country (%) | Two Countries or More (%) | Difference | z-Score | p-Value^a |
|---|---|---|---|---|---|
| Africa | 3.44 | 3.07 | –0.37 | 1.225 | 0.219 |
| America | 18.33 | 13.71 | –4.62 | 8.5072 | *0.000* |
| Asia | 33.72 | 34.33 | 0.61 | –1.2974 | 0.194 |
| Europe | 29.98 | 43.16 | 12.18 | –26.715 | *0.000* |
| Oceania | 14.54 | 5.29 | –9.25 | 18.122 | *0.000* |
| Number of trips | 12,819 | 4,369 | | | |

a. Italic type indicates significance (p ≤ 0.05).
included all of the destinations to construct the travel sequences in the subsequent analysis.
Travel Sequence Analysis
Sequential rules of visited countries. The data sets of the constructed travel sequences at the country level were input into the Top-K SRM algorithm (see earlier), whose implementation is available as an open-source data mining library (Fournier-Viger et al. 2014a). Only the 4,369 travel diaries involving two or more countries were considered in this analysis, as there is no sequence in a travel diary for a single country. The minimum confidence was set to $min_{conf} = 0.6$, and the $k$ value was set to 50. Top-K SRM scans the data sets for sequential rules with a confidence greater than or equal to 0.6 and returns those rules with the highest support. Some rules may have similar items in the antecedent and the same item(s) in the consequent. Such rules contain redundant information, and thus only those rules with the highest support are reported. Twenty-three sequential rules were selected, as shown in Table 8. The countries are denoted by the three-letter country codes defined in ISO 3166, published by the International Organization for Standardization (ISO) (www.iso.org/iso/country_codes). The rules are grouped by the continent of the countries in the consequent for convenience of interpretation. All rules satisfy the minimum confidence threshold of 0.6. We summarize the findings as follows:
• Australian travelers have a high chance of traveling to the United States (USA) if they plan to visit Canada (CDN) or Mexico (MEX), as indicated by rules r1 and r2, respectively. The confidences of both rules are above 0.74. If they travel to Bolivia (BOL), they are likely to also visit Peru (PER), with a confidence of 0.871 (rule r3).
• For destinations in Asia, a relatively strong sequential association was found between Laos (LAO) and Thailand (THA), as in rule r4. If a traveler plans to visit Laos, he or she is likely to visit Thailand during the trip. An explanation for the low number of rules is that Australian travelers are likely to visit a single country in Asia, as discussed earlier. Thus, few sequential patterns exist in the trips to two or more countries in Asia. The analysis at the city level in the later sections provides further insights.
• Quite a number of rules were found for destinations in Europe. Namely, if travelers visited the Czech Republic (CZE), France (FRA), and/or Austria (AUT), they have a high possibility of visiting Germany (DEU) as well, as indicated by rules r5–r7. Some travelers are likely to visit Italy (ITA) after visiting Austria, France, or Greece (GRC) (rules r8–r9). Croatia (HRV) is likely to be visited after Bosnia and Herzegovina (BIH), as shown in rule r10. Rules r11–r23 show strong sequential associations between the United Kingdom (GBR) and other European countries. The combinations of the visited countries vary, but the United Kingdom is often the last destination. A possible explanation for these patterns is that the United Kingdom is the home country of many Australian residents. Therefore, they are likely taking advantage of their trips back home to visit other European countries on the way.
Table 8 shows that most countries in the identified sequential rules are among the destinations most visited by travelers. This is expected, because Top-K SRM returns the sequential rules with the highest support, and such rules necessarily involve frequently visited items. In the next section, we examine the sequential patterns between cities for more insights into the travel patterns.
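Before country-level sequences can be passed to a mining library, they have to be encoded in the library's input format. The paper does not show its input files, so the following is only a sketch under assumptions: it targets the sequence database text format documented for the SPMF library, in which items are positive integers, `-1` closes an itemset, and `-2` closes a sequence. The trips are hypothetical, and the function name `encode_sequences` is introduced here.

```python
def encode_sequences(sequences):
    """Encode label sequences as lines of integer ids.

    Each visited destination forms its own itemset, closed by -1;
    each sequence is closed by -2. Ids are assigned in order of
    first appearance, starting at 1.
    """
    item_ids = {}
    lines = []
    for seq in sequences:
        tokens = []
        for item in seq:
            item_id = item_ids.setdefault(item, len(item_ids) + 1)
            tokens += [str(item_id), "-1"]
        tokens.append("-2")
        lines.append(" ".join(tokens))
    return lines, item_ids


# Hypothetical country-level trips (assumption: not the paper's data).
trips = [["FRA", "ESP", "ITA"], ["ESP", "FRA"], ["THA", "SGP"]]
lines, item_ids = encode_sequences(trips)
print("\n".join(lines))
```

The returned `item_ids` mapping is needed afterwards to translate the integer ids in the mined rules back into country codes.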
Sequential rules of visited cities. This section focuses on demonstrating the capability of travel diaries to capture travel patterns at the micro level between cities. We examine the multi-city trips to destinations in America, Asia, and Europe in this analysis. Only those travel sequences with two or more cities in each continent are input into the SRM algorithm. We notice that some rules at the city level are redundant with rules at the country level, as reported in Table 8. For example, the rule Dublin ⇒ London would provide similar
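Table 8. Sequential Rules by Country.

| Continent | Sequential Rule | Support | Confidence | Rule |
|---|---|---|---|---|
| America | CDN ⇒ USA | 0.051 | 0.743 | r1 |
| America | MEX ⇒ USA | 0.014 | 0.768 | r2 |
| America | BOL ⇒ PER | 0.006 | 0.871 | r3 |
| Asia | LAO ⇒ THA | 0.009 | 0.661 | r4 |
| Europe | CZE, FRA ⇒ DEU | 0.009 | 0.621 | r5 |
| Europe | AUT, FRA ⇒ DEU | 0.013 | 0.663 | r6 |
| Europe | AUT, CZE ⇒ DEU | 0.008 | 0.642 | r7 |
| Europe | AUT, FRA ⇒ ITA | 0.012 | 0.639 | r8 |
| Europe | FRA, GRC ⇒ ITA | 0.008 | 0.733 | r9 |
| Europe | BIH ⇒ HRV | 0.008 | 0.673 | r10 |
| Europe | ITA, ESP ⇒ GBR | 0.015 | 0.638 | r11 |
| Europe | AUT, FRA ⇒ GBR | 0.012 | 0.639 | r12 |
| Europe | FRA, DEU ⇒ GBR | 0.030 | 0.638 | r13 |
| Europe | FRA, NLD ⇒ GBR | 0.015 | 0.653 | r14 |
| Europe | DEU, ESP ⇒ GBR | 0.011 | 0.667 | r15 |
| Europe | IRL ⇒ GBR | 0.024 | 0.682 | r16 |
| Europe | ISL ⇒ GBR | 0.009 | 0.684 | r17 |
| Europe | FRA, ITA, CHE ⇒ GBR | 0.011 | 0.696 | r18 |
| Europe | FRA, ITA, ESP ⇒ GBR | 0.011 | 0.716 | r19 |
| Europe | BEL, DEU ⇒ GBR | 0.009 | 0.717 | r20 |
| Europe | FRA, DEU, CHE ⇒ GBR | 0.010 | 0.729 | r21 |
| Europe | FRA, DEU, NLD ⇒ GBR | 0.010 | 0.763 | r22 |
| Europe | DEU, ITA, ESP ⇒ GBR | 0.009 | 0.810 | r23 |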
patterns as the rule Ireland ⇒ United Kingdom. We report rules that provide new information, as shown in Table 9. The city names are shown together with their corresponding country codes.
Some sequential associations are found between cities in American countries. For instance, travelers are likely to visit Los Angeles if they visited Chicago and/or Denver (rules c1–c3). Travelers who visited La Paz have a high chance of visiting Lima next (rule c5). Rules c6–c8 show some sequential associations between cities in Asia, such as Hebron and Jerusalem, Kathmandu and Kolkata, and Jakarta–Kuala Lumpur and Singapore.
Although quite a number of rules are found for European cities, many of them provide redundant information. Rules c9–c11 show some associations not yet discovered in the previous analysis at the country level. Namely, travelers who visited Monaco, or both Amsterdam and Madrid, are likely to visit Paris. Zagreb is likely to be the next city after Sarajevo.
Aside from the well-known tourism destinations listed in Table 9, DMOs would be interested in the travel patterns of travelers between second- and third-tier destinations to gain more insight. As a demonstration, we examine the sequential rules between cities in the United Kingdom, except for London. Table 10 shows the top 10 sequential rules with 0.6 or more confidence.
Rules e1 and e2 show the possibility for Australian travelers to visit Edinburgh after Dunfermline, and Inverness after Elgin, respectively. Travelers who have visited Greenock and Inverness are also likely to visit the Isle of Lewis (Rule e3). Furthermore, Kendal is likely to be visited after Barrow in Furness (Rule e4). Travelers are likely to visit Oxford after Cowley (Rule e6), or after Edinburgh and Kidlington (Rule e5). Travelers have a high possibility of visiting Perth if they have visited nearby cities, such as Dundee, Alloa, Arbroath, and Glenrothes (Rules e7–e10). Most cities in the identified rules are close to each other, which is convenient for land
Table 9. Sequential Rules by City.

| Continent | Sequential Rule | Support | Confidence | Rule |
|---|---|---|---|---|
| America | Chicago (USA) ⇒ Los Angeles (USA) | 0.146 | 0.646 | c1 |
| America | Denver (USA) ⇒ Los Angeles (USA) | 0.139 | 0.775 | c2 |
| America | Chicago (USA), Denver (USA) ⇒ Los Angeles (USA) | 0.042 | 0.754 | c3 |
| America | Edmonton (CDN) ⇒ Vancouver (CDN) | 0.051 | 0.722 | c4 |
| America | La Paz (BOL) ⇒ Lima (PER) | 0.027 | 0.871 | c5 |
| Asia | Hebron (ISR) ⇒ Jerusalem (ISR) | 0.009 | 0.661 | c6 |
| Asia | Kathmandu (NPL) ⇒ Kolkata (IND) | 0.008 | 0.733 | c7 |
| Asia | Jakarta (IDN), Kuala Lumpur (MYS) ⇒ Singapore (SGP) | 0.008 | 0.673 | c8 |
| Europe | Monaco (MCO) ⇒ Paris (FRA) | 0.029 | 0.932 | c9 |
| Europe | Amsterdam (NLD), Madrid (ESP) ⇒ Paris (FRA) | 0.012 | 0.815 | c10 |
| Europe | Sarajevo (BIH) ⇒ Zagreb (HRV) | 0.017 | 0.702 | c11 |
Table 10. Sequential Rules by City in the United Kingdom.

| Sequential Rule | Support | Confidence | Rule |
|---|---|---|---|
| Dunfermline ⇒ Edinburgh | 0.031 | 0.622 | e1 |
| Elgin ⇒ Inverness | 0.022 | 0.833 | e2 |
| Greenock, Inverness ⇒ Isle of Lewis | 0.023 | 0.600 | e3 |
| Barrow in Furness ⇒ Kendal | 0.013 | 0.600 | e4 |
| Edinburgh, Kidlington ⇒ Oxford | 0.013 | 0.600 | e5 |
| Cowley ⇒ Oxford | 0.012 | 0.846 | e6 |
| Dundee ⇒ Perth | 0.049 | 0.677 | e7 |
| Alloa ⇒ Perth | 0.020 | 0.783 | e8 |
| Arbroath ⇒ Perth | 0.018 | 0.889 | e9 |
| Glenrothes ⇒ Perth | 0.017 | 0.833 | e10 |
travel. The exception is Rule e5: Edinburgh is far away from Oxford but is frequently visited, probably because of its popularity and the convenience of air transportation.
SRM aims to assess, based on the confidence, how certain it is that a destination will be visited after other destinations. For example, people may travel frequently between Los Angeles, Chicago, and Denver. If SRM shows DMOs that Los Angeles is very likely to be the next destination after Chicago, more focused travel packages can be developed to encourage those who visit Chicago to travel on to Los Angeles. In addition, travel diaries constructed from the geotagged photos can be used to identify popular sequential patterns to support the construction of travel itineraries. We demonstrate this capability in the next section.
Travel itinerary analysis. This section demonstrates the capability of travel diaries to capture popular travel patterns through an analysis of sequential travel patterns among Asian cities. Only travel sequences with two or more cities in Asia are considered. We applied a Top-k sequential pattern mining algorithm to extract the frequent patterns (Fournier-Viger
and Tseng 2011). The top 50 patterns with the highest support are returned, and patterns with similar items are removed as they provide redundant information. We are left with 24 frequent patterns, as shown in Table 11.
We can see that the identified patterns contain major tourism cities in Asia such as Bangkok, Ho Chi Minh City, Hong Kong, Kuala Lumpur, and Singapore. For instance, sequential patterns s1–s7 show several patterns starting from Bangkok to other nearby cities. The popular cities to visit after Bangkok are Kuala Lumpur and Singapore, each with a support of around 0.05. Some travelers visited three cities in one trip, as shown in patterns s3 and s5. Popular sequential patterns starting from Ho Chi Minh City are shown in patterns s8–s12. Travel patterns starting from Hong Kong tend to lead to other destinations in China, especially Shanghai, with a support of around 0.1. The sequence s18 has the highest support (0.134) among all patterns, which reflects the fact that Kuala Lumpur to Singapore is a very popular travel path, perhaps because of their proximity. It is interesting to note that although Singapore is a major destination and is close to Australia, travelers are more likely to visit Singapore after other cities in multiple-destination trips in Asia. In addition, frequent patterns were found for less popular cities, such as Kathmandu to Kolkata (sequence s23) or Kolkata to Singapore (sequence s24).
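The support of a sequential pattern such as Kuala Lumpur → Singapore can be illustrated with a short sketch. It assumes the usual definition in sequential pattern mining, in which a trip contains a pattern if the pattern's cities appear in the same order, possibly with other cities in between; the paper does not state whether intermediate cities are allowed. The trips are hypothetical, and the names `contains_pattern` and `pattern_support` are introduced here.

```python
def contains_pattern(sequence, pattern):
    """Return True if pattern is an ordered subsequence of sequence."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the match, so
    # later pattern items must be found after earlier ones.
    return all(city in remaining for city in pattern)


def pattern_support(sequences, pattern):
    """Fraction of sequences that contain the pattern."""
    return sum(contains_pattern(s, pattern) for s in sequences) / len(sequences)


# Hypothetical city-level trips (assumption: not the paper's data).
trips = [
    ["Bangkok", "Kuala Lumpur", "Singapore"],
    ["Kuala Lumpur", "Singapore"],
    ["Bangkok", "Singapore"],
    ["Singapore", "Kuala Lumpur"],
]
kl_sg = pattern_support(trips, ["Kuala Lumpur", "Singapore"])
bkk_kl_sg = pattern_support(trips, ["Bangkok", "Kuala Lumpur", "Singapore"])
print(kl_sg, bkk_kl_sg)
```

Unlike a sequential rule, a pattern carries only a support value, which is why Table 11 has no confidence column.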
As a result of the long distance between Australia and other continents, Australian travelers often travel to Europe or America via Asian cities because more airline options are available. It is beneficial for DMOs to identify the hub destinations in Asia for better development of travel itineraries for long-haul travel. We examine the travel diaries and identify every transition from a city in Asia to the next city in Europe or America. The transition frequency is visualized using a heat map (Krentzman et al. 2011), as shown in Figure 1. Asian cities are on the vertical axis. European and American cities are listed on the horizontal axis; the prefix of each city name indicates the corresponding continent. Because of the large number of cities, only cities visited by at least 1% of the travelers are included in the figure. A darker cell indicates a higher frequency, and a lighter cell a lower frequency.
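The transition counts behind such a heat map can be sketched as follows. This is a minimal illustration: the city-to-continent lookup and the trips are hypothetical, the function name `count_transitions` is introduced here, and the 1% filter and the plotting step are left out.

```python
from collections import Counter


def count_transitions(sequences, continent_of,
                      sources=("Asia",), targets=("Europe", "America")):
    """Count direct moves from a city in a source continent to the
    next visited city in a target continent."""
    counts = Counter()
    for seq in sequences:
        for here, nxt in zip(seq, seq[1:]):
            if (continent_of.get(here) in sources
                    and continent_of.get(nxt) in targets):
                counts[(here, nxt)] += 1
    return counts


# Hypothetical lookup and trips (assumption: not the paper's data).
continent_of = {
    "Singapore": "Asia", "Hong Kong": "Asia", "Tokyo": "Asia",
    "London": "Europe", "Paris": "Europe", "Los Angeles": "America",
}
trips = [
    ["Singapore", "London", "Paris"],
    ["Hong Kong", "London"],
    ["Singapore", "London"],
    ["Tokyo", "Los Angeles"],
    ["Hong Kong", "Paris", "London"],
]
transitions = count_transitions(trips, continent_of)
for (origin, destination), n in sorted(transitions.items()):
    print(origin, "->", destination, n)
```

Each counter entry corresponds to one cell of the heat map, with the Asian city giving the row and the European or American city giving the column.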
We can see that Dubai, Hong Kong, and Singapore are the most popular destinations from which Australians travel to London, as indicated by the dark cells in the figure. This is consistent with the fact that those cities are major hub destinations, with large airports and major airlines. Hong Kong is also a popular hub destination for traveling to Paris, and Shanghai is a popular hub destination for traveling to Berlin. Few direct transitions from Asia to America are shown in Figure 1, which is consistent with the fact that direct routes from Australia to America are more convenient. Tokyo to Los Angeles is a commonly used path from Asia to America by Australian travelers. We further examined the travel diaries and found that around 70% of Australian travelers spent more than one day in Tokyo before traveling to Los Angeles. This result suggests that Tokyo is usually visited for other purposes rather than simply for connecting flights.
Discussion
The analysis using SRM has identified some strong sequential associations between visited destinations of Australian outbound travelers. DMOs can advertise specific travel packages that encourage travelers to visit multiple destinations in their trips. For instance, special offers to visit the United Kingdom can be created for travelers who also visit Germany, Italy, and Spain (rule r23 in Table 8), because of the strong sequential association between them. In this way, DMOs can encourage travelers to travel to more destinations and purchase higher-value travel packages. The analysis of travel patterns can also be done at the city level for detailed information, as indicated above. Domestic travel packages between Chicago, Denver, and Los Angeles can be offered to people who visit the United States, as indicated by the strong associations in rules c1–c3 (Table 9). In addition, detailed insights into the travel patterns of travelers between second- and third-tier destinations can be obtained based on our proposed approach. Table 10 shows some strong rules between nearby cities in the United Kingdom. DMOs may offer travelers means of transportation other than airlines to promote their travel packages, such as from Elgin to Inverness (rule e2) and from Arbroath to Perth (rule e9). The analysis of
Table 11. Sequential Patterns for Destinations in Asia.

| Sequential Pattern | Support | Sequence |
|---|---|---|
| Bangkok → Kuala Lumpur | 0.055 | s1 |
| Bangkok → Singapore | 0.049 | s2 |
| Bangkok → Kuala Lumpur → Singapore | 0.018 | s3 |
| Bangkok → Ho Chi Minh | 0.041 | s4 |
| Bangkok → Ho Chi Minh → Phnom Penh | 0.020 | s5 |
| Bangkok → Vientiane | 0.031 | s6 |
| Bangkok → Hong Kong | 0.024 | s7 |
| Ho Chi Minh → Phnom Penh | 0.053 | s8 |
| Ho Chi Minh → Singapore | 0.035 | s9 |
| Ho Chi Minh → Kuala Lumpur | 0.027 | s10 |
| Ho Chi Minh → Kuala Lumpur → Singapore | 0.024 | s11 |
| Ho Chi Minh → Vientiane | 0.024 | s12 |
| Hong Kong → Shanghai | 0.101 | s13 |
| Hong Kong → Macau | 0.051 | s14 |
| Hong Kong → Macau → Shanghai | 0.021 | s15 |
| Hong Kong → Singapore | 0.031 | s16 |
| Hong Kong → Tokyo | 0.035 | s17 |
| Kuala Lumpur → Singapore | 0.134 | s18 |
| Kuala Lumpur → Kuching | 0.026 | s19 |
| Kuala Lumpur → Makassar | 0.024 | s20 |
| Shanghai → Tokyo | 0.034 | s21 |
| Singapore → Tokyo | 0.031 | s22 |
| Kathmandu → Kolkata | 0.024 | s23 |
| Kolkata → Singapore | 0.036 | s24 |
sequential patterns can be done at a finer-grained level, depending on the practical application, by mapping GPS data to locations using the Geocoding API.
The analysis in the previous section shows that the travel diaries constructed from geotagged photos can effectively capture international travel behavior for the case of Australia. The travel diaries have captured popular travel sequences between Asian cities, as shown in Table 11. Bangkok, Ho Chi Minh City, and Kuala Lumpur are major base destinations for traveling to other places in Southeast Asia, while travelers usually travel from Hong Kong to cities in Northern Asia such as Shanghai and Tokyo. It is interesting to see that Singapore is frequently visited after other cities despite being a major destination in Asia. DMOs can then advertise suitable travel itineraries for Australian travelers following such frequent patterns. The heat map of the transitions in Figure 1 confirmed that the travel diaries could capture the popular travel paths from Asia to Europe via some major hub destinations. Researchers can adopt the proposed travel diary construction approach in further analysis of travel behavior.
It should be noted that sequential rules differ from traditional approaches to sequential pattern analysis, as SRM aims to identify strong sequential associations between the visited destinations based on the confidence. The approaches used in prior works (Xia et al. 2010; Orellana et al. 2012) may be able to identify some frequent sequential patterns as shown above, but they are incapable of extracting the sequential associations that SRM extracts in the present study.
This study is not without limitations. Although some sequential rules have been identified that reflect certain travel patterns of Australians, other factors influencing travel decisions were not considered in this study. These findings should be considered as a demonstration of how sequential patterns can be extracted from travel diaries. The travel patterns should be considered together with other factors, such as user demographic profile or travel motivation, in practical applications. Besides, gaps may exist between the findings and actual travel behaviors. A combination of multiple data sources of geotagged photos is suggested for a specific practical application. Demographic factors were not considered to explain specific travel patterns. In addition, the travel patterns of first-time and repeat visitors will likely differ, as examined in prior studies (Hwang, Gretzel, and Fesenmaier 2002; Kempermann, Joh, and Timmermans 2004). The travel diaries constructed from the geotagged photos can capture the travel sequences, but it is uncertain whether the first trip in a travel diary is the actual first trip of the traveler, as the first trip might have occurred before the data collection period. Therefore, we were unable to investigate the difference between first and repeat trips. The analysis of travel patterns was based on the observed behavior of travelers through the geotagged photo data. Given the limited scope of this study, we were unable to investigate the relationship between the observed travel patterns and the availability of airlines, which has been shown to significantly influence travel connections (Hwang, Gretzel, and Fesenmaier 2006). Apart from the travel patterns, the actual photos taken can provide comprehensive information about the activities of travelers in a destination, which has not been considered in this study.
Conclusions
Insight into sequential travel patterns of travelers is important to identify preferred destinations and future travel intentions. This understanding is crucial for tourism managers and industry practitioners to design suitable travel packages and make
Figure 1. Transition from Asia to Europe and America.
appropriate offers. Unfortunately, such knowledge has not been fully obtained given the difficulty of capturing complex travel behavior. Travel events usually occur over a long period, especially in the case of international travel, which makes collecting sequential travel information difficult. Traditional approaches to travel pattern analysis were unable to capture sequential associations between visited destinations. To address these shortcomings, this article presented an approach to travel diary construction from geotagged photos and introduced the SRM technique to extract the sequential associations of destinations from travel sequences.
The effectiveness of the proposed approach was demonstrated in a case study of Australian outbound tourism, using a large data set of more than 890,000 photos from 3,623 outbound travelers. Travel diaries are constructed from geotagged photos, which contain comprehensive past travel information of travelers. The case study confirmed that the travel diaries constructed from geotagged photos are effective in capturing travel patterns. The analysis of travel diaries reveals interesting sequential associations that can assist DMOs in developing better travel packages. DMOs can promote suitable destinations to prospective travelers to achieve a high purchasing rate. The introduced framework with the SRM technique has the potential to benefit tourism researchers worldwide by improving their understanding of travel behaviors.
One potential extension of this work is to incorporate other information reflecting the context of the travel into the analysis, in addition to the spatial and temporal information. For example, the textual metadata and the visual content of the actual photos taken at destinations can be examined for additional insights into the activities of tourists. Other influencing factors, such as travel styles, preferences, and travel purposes, can be incorporated for more detailed insight into travelers’ behavior. Photo-taking behavior is important in understanding the geotagged photo data and thus should be the focus of future research. The construction of the travel diary presented can be applied to other geotagged social media content, such as that on Twitter, Facebook, and Instagram, which we shall investigate in future studies. SRM is a general-purpose approach for mining sequential associations. Aside from social media, it would be beneficial to investigate the applicability of SRM to travel diaries constructed from other data sources, such as GPS loggers, bank transactions, and mobile tracking data. Airline availability is one of the influencing factors of travel connections. Therefore, future studies can incorporate airline network data into the analysis of sequential associations for more detailed insight.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The
work described in this article was supported by a grant funded by the Research Grants Council of the Hong Kong Special Administrative Region, China (GRF Project Number: 15503814). We also acknowledge the funding support provided by the Hong Kong Polytechnic University.
References
ABS. 2015. “3218.0 - Regional Population Growth, Australia, 2013-14.” Technical report, Australian Bureau of Statistics, Canberra. http://www.abs.gov.au/ausstats/abs@.nsf/mf/3218.0 (accessed May 20, 2016).
Aloysius, G., and D. Binu. 2013. “An Approach to Products Placement in Supermarkets Using PrefixSpan Algorithm.” Journal of King Saud University—Computer and Information Sciences 25 (1): 77–87.
ATTF. 2013. “Tourism Industry Trend Update.” Technical Report Q1 edition, Australia Tourism and Transport Forum, Sydney, Australia. http://tra.gov.au/documents/State-of-the-industry/Tourism_Update_March_Qtr_2014_FINAL.pdf (accessed May 24, 2016).
Barchiesi, D., H. S. Moat, C. Alis, S. Bishop, and T. Preis. 2015. “Quantifying International Travel Flows Using Flickr.” PLoS One 10 (7): e0128470.
Birenboim, A., S. Anton-Clave, A. P. Russo, and N. Shoval. 2013. “Temporal Activity Patterns of Theme Park Visitors.” Tourism Geographies 15 (4): 601–19.
Breakwell, G., and P. Wood. 1995. “Diary Techniques.” In Research Methods in Psychology, edited by G. M. Breakwell, S. Hammond, and C. Fife-Schaw, 293–301. London: Sage.
Breen, H., A. Bull, and M. Walo. 2001. “A Comparison of Survey Methods to Estimate Visitor Expenditure at a Local Event.” Tourism Management 22 (5): 473–79.
Chang, L., D. Yang, S. Tang, and T. Wang. 2006. “Mining Compressed Sequential Patterns.” In Proceedings of the 2nd International Conference on Advanced Data Mining and Applications, August 14-16, Xian, China.
Cheng, Y.-T., Y.-F. Lin, K.-H. Chiang, and V. S. Tseng. 2016. “Mining Disease Sequential Risk Patterns from Nationwide Clinical Databases for Early Assessment of Chronic Obstructive Pulmonary Disease.” In Proceedings of the International Conference on Biomedical and Health Informatics (BHI), February 24–27, Las Vegas, Nevada.
Chua, A., L. Servillo, E. Marcheggiani, and A. V. Moere. 2016. “Mapping Cilento: Using Geotagged Social Media Data to Characterize Tourist Flows in Southern Italy.” Tourism Management 57:295–310.
Coghlan, A., and P. Pearce. 2010. “Tracking Affective Components of Satisfaction.” Tourism and Hospitality Research 10 (1): 42–58.
Deng, M., and G. Athanasopoulos. 2011. “Modelling Australian Domestic and International Inbound Travel: A Spatial-Temporal Approach.” Tourism Management 32 (5): 1075–84.
Edwards, D., T. Griffin, B. Hayllar, T. Dickson, and S. Schweinsberg. 2009. “Understanding Tourist Experience and Behavior in Cities: An Australian Case Study.” Technical report. Sustainable Tourism. http://www.sustainabletourismonline.com/31/destination-access/understanding-tourist-experiences-and-behaviour-in-cities-an-australian-case-study (accessed April 10, 2016).
Fournier-Viger, P., U. Faghihi, R. Nkambou, and E. M. Nguifo. 2012. “CMrules: Mining Sequential Rules Common to Several Sequences.” Knowledge-Based Systems 25:63–76.
Fournier-Viger, P., A. Gomariz, T. Gueniche, A. Soltani, C.-W. Wu, and V. S. Tseng. 2014a. “SPMF: A Java Open-Source Pattern Mining Library.” Journal of Machine Learning Research 15:3389–93.
Fournier-Viger, P., A. Gomariz, M. Sebek, and M. Hlosta. 2014b. “VGEN: Fast Vertical Mining of Sequential Generator Patterns.” In Proceedings of the 16th International Conference on Data Warehousing and Knowledge Discovery, September 2–4, Munich, Germany.
Fournier-Viger, P., and V. S. Tseng. 2011. “Mining Top-k Sequential Rules.” In Advanced Data Mining and Applications: 7th International Conference, ADMA 2011, Beijing, China, December 17-19, 2011, Proceedings, Part II, edited by Jie Tang, Irwin King, Ling Chen, and Jianyong Wang, 180–94. Berlin: Springer.
Fournier-Viger, P., C.-W. Wu, and V. S. Tseng. 2013. “Mining Maximal Sequential Patterns without Candidate Maintenance.” In Advanced Data Mining and Applications: 9th International Conference, ADMA 2013, Hangzhou, China, December 14-16, 2013, Proceedings, Part I, 169–80. Lecture Notes in Computer Science. Berlin: Springer.
Garcia-Palomares, J. C., J. Gutierrez, and C. Minguez. 2015. “Identification of Tourist Hot Spots Based on Social Networks: A Comparative Analysis of European Metropolises Using Photo-Sharing Services and GIS.” Applied Geography 63:408–17.
Hamilton, H. J., and K. Karimi. 2005. “The TIMERS II Algorithm for the Discovery of Causality.” In Advances in Knowledge Discovery and Data Mining: 9th Pacific-Asia Conference, PAKDD 2005, Hanoi, Vietnam, May 18–20, 2005, edited by Tu Bao Ho, David Cheung, and Huan Liu, 744–50. Lecture Notes in Computer Science. Berlin: Springer.
Hwang, Y.-H., and D. R. Fesenmaier. 2003. “Multidestination Pleasure Travel Patterns: Empirical Evidence from the American Travel Survey.” Journal of Travel Research 42:166–71.
Hwang, Y.-H., U. Gretzel, and D. R. Fesenmaier. 2002. “Multi-city Pleasure Trip Patterns: An Analysis of International Travelers to the U.S.” In City Tourism, edited by K. Wöber, 53–62. Vienna, Austria: Springer Verlag.
Hwang, Y.-H., U. Gretzel, and D. R. Fesenmaier. 2006. “Multicity Trip Patterns: Tourists to the United States.” Annals of Tourism Research 33 (4): 53–62.
Ian, P., P. Shane, and L. Jillian. 2011. “Grey Nomads on Tour: A Revolution in Travel and Tourism for Older Adults.” Tourism Analysis 16 (3): 283–94.
Kádár, B. 2014. “Measuring Tourist Activities in Cities Using Geotagged Photography.” Tourism Geographies 16 (1): 88–104.
Kádár, B., and M. Gede. 2013. “Where Do Tourists Go? Visualizing and Analysing the Spatial Distribution of Geotagged Photography.” Cartographica 48 (2): 78–88.
Kempermann, A., C.-H. Joh, and H. Timmermans. 2004. “Comparing First-Time and Repeat Visitors’ Activity Patterns in a Tourism Environment.” In Consumer Psychology of Tourism, Hospitality and Leisure, edited by G. I. Crouch, R. R. Perdue, H. Timmermans, and M. Uysal, 103–19. Cambridge, MA: CABI.
Krentzman, A. R., E. A. R. Robinson, J. M. Jester, and B. E. Perron. 2011. “Heat Maps: A Technique for Classifying and Analyzing Drinking Behavior.” Substance Use & Misuse 46 (5): 687–95.
Law, R., J. Rong, H. Q. Vu, G. Li, and H. A. Lee. 2011. “Identifying Changes and Trends in Hong Kong Outbound Tourism.” Tourism Management 32:1106–14.
Leung, X. Y., F. Wang, B. Wu, B. Bai, K. A. Stahura, and Z. Xie. 2012. “A Social Network Analysis of Overseas Tourist Movement Patterns in Beijing: The Impact of the Olympic Games.” International Journal of Tourism Research 14:469–84.
Lew, A., and B. McKercher. 2006. “Modelling Tourist Movement: A Local Destination Analysis.” Annals of Tourism Research 33 (2): 403–23.
Li, X., F. Meng, and M. Uysal. 2008. “Spatial Pattern of Tourist Flows among the Asian Pacific Countries: An Examination over a Decade.” Asia Pacific Journal of Tourism Research 13 (3): 229–43.
Mabroukeh, N. R., and C. I. Ezeife. 2010. “A Taxonomy of Sequential Pattern Mining Algorithms.” ACM Computing Surveys 43:1–41.
McKercher, B., and G. Lau. 2008. “Movement Patterns of Tourists within a Destination.” Tourism Geographies 10 (3): 355–74.
Onder, I., W. Koerbitz, and A. Hubmann-Haidvogel. 2014. “Tracing Tourists by Their Digital Footprints: The Case of Austria.” Journal of Travel Research 55:566–73.
Orellana, D., A. K. Bregt, A. Ligtenberg, and M. Wachowicz. 2012. “Exploring Visitor Movement Patterns in Natural Recreational Areas.” Tourism Management 33:672–82.
Pocock, N., and A. McIntosh. 2013. “Long-Term Traveler Return, ‘Home’?” Annals of Tourism Research 42:402–24.
Prideaux, B. 2000. “The Role of the Transport System in Destination Development.” Tourism Management 21 (1): 53–63.
Raun, J., R. Ahas, and M. Tiru. 2016. “Measuring Tourism Destinations Using Mobile Tracking Data.” Tourism Management 57:202–12.
Shao, J., and U. Gretzel. 2010. “Looking Does Not Automatically Lead to Booking: Analysis of Clickstreams on a Chinese Travel Agency Website.” In Information and Communication Technologies in Tourism 2010: Proceedings of the International Conference in Lugano, Switzerland, February 10–12, 2010, edited by Ulrike Gretzel, Rob Law, and Matthias Fuchs, 197–208. Vienna, Austria: Springer.
Sheng, C.-W., and M.-C. Chen. 2013. “Tourist Experience Expectations: Questionnaire Development and Text Narrative Analysis.” International Journal of Culture, Tourism and Hospitality Research 7 (1): 93–104.
Shie, B.-E., H.-F. Hsiao, P. S. Yu, and V. S. Tseng. 2012. “Discovering Valuable User Behavior Patterns in Mobile Commerce Environments.” In New Frontiers in Applied Data Mining, edited by L. Cao, J. Z. Huang, J. Bailey, Y. S. Koh, and J. Luo, 77–88. Berlin: Springer.
Sobolevsky, S., I. Bijic, A. Belyi, I. Sitko, B. Hawelka, J. M. Arias, and C. Ratti. 2015. “Scaling of City Attractiveness for Foreign Visitors through Big Data of Human Economical and Social Media Activity.” IEEE International Congress on Big Data, 600–7, New York, NY.
Sobolevsky, S., R. Sitko, R. T. D. Combes, B. Hawelka, J. M. Arias, and C. Ratti. 2014. “Money on the Move: Big Data of Bank Card Transactions as the New Proxy for Human Mobility Patterns and Regional Delineation. The Case of Residents and Foreign Visitors in Spain.” IEEE International Congress on Big Data, 136–43, Anchorage, AK.
Tan, P.-N., M. Steinbach, and V. Kumar. 2005. “Association Analysis: Basic Concepts and Algorithms.” In Introduction to Data Mining, chap. 6. Boston: Addison-Wesley.
TRA (Tourism Research Australia). 2014a. “Australian Traveling Overseas.” Tourism Research Australia. http://www.tra.gov.au/statistics/australians-travelling-overseas.html (accessed September 22, 2016).
TRA (Tourism Research Australia). 2014b. “Tourism Update—Updated Results to ‘State of the Industry 2013.’” Technical report, Tourism Research Australia, Canberra. https://www.tra.gov.au/documents/State-of-the-industry/TRA_State_of_the_Industry_2014_FINAL.pdf (accessed May 10, 2016).
Versichele, M., L. de Groote, M. C. Bouuaer, T. Neutens, I. Moerman, and N. Van de Weghe. 2014. “Pattern Mining in Tourist Attraction Visits through Association Rule Learning on Bluetooth Tracking Data: A Case Study of Ghent, Belgium.” Tourism Management 44:67–81.
Vu, H. Q., G. Li, R. Law, and B. H. Ye. 2015. “Exploring the Travel Behaviors of Inbound Tourists to Hong Kong Using Geotagged Photos.” Tourism Management 46:222–32.
Xia, J., F. Evans, K. Spilsbury, V. Ciesielski, C. Arrowsmith, and G. Wright. 2010. “Market Segments Based on Dominant Movement Patterns of Tourists.” Tourism Management 31 (4): 464–69.
Xiang, Z., and U. Gretzel. 2010. “Role of Social Media in Online Travel Information Search.” Tourism Management 31 (2): 179–88.
Yan, X., J. Han, and R. Afshar. 2003. “CloSpan: Mining Closed Sequential Patterns in Large Datasets.” In Proceedings of the SIAM International Conference on Data Mining, edited by Daniel Barbara and Chandrika Kamath, 166–77. Philadelphia: Society for Industrial and Applied Mathematics.
Yang, D.-L., Y. L. Hsieh, and J. Wu. 2006. “Using Data Mining to Study Upstream and Downstream Causal Relationship in Stock Market.” In Proceedings of the 9th Joint Conference on Information Sciences, October 8–11, Kaohsiung, Taiwan.
Yang, Y., T. Fik, and J. Zhang. 2013. “Modeling Sequential Tourist Flows: Where Is the Next Destination?” Annals of Tourism Research 43:297–320.
Yuan, Y., and M. Medel. 2016. “Characterizing International Travel Behavior from Geotagged Photos: A Case Study of Flickr.” PLoS ONE 11 (5): e0154885.
Zach, F., and U. Gretzel. 2012. “Tourist-Activated Networks: Implications for Dynamic Bundling and En Route Recommendations.” Information Technology & Tourism 13 (3): 229–38.
Zakrisson, I., and M. Zillinger. 2012. “Emotions in Motion: Tourist Experiences in Time and Space.” Current Issues in Tourism 15 (6): 505–23.
Zheng, Y.-T., Z.-J. Zha, and T.-S. Chua. 2012. “Mining Travel Patterns from Geotagged Photos.” ACM Transactions on Intelligent Systems and Technology 3 (3): 1–18.
Zheng, Z., W. Wei, C. Liu, W. Cao, L. Cao, and M. Bhatia. 2016. “An Effective Contrast Sequential Pattern Mining Approach to Taxpayer Behavior Analysis.” World Wide Web 19 (4): 633–51.
Author Biographies
Huy Quan Vu, PhD, is a research fellow at the Center for Applied Informatics, Victoria University. His research interests include machine learning, data mining, social network analysis, and technology applications in tourism and hospitality.
Gang Li, PhD, is an associate professor at the School of Information Technology, Deakin University. His research interests are data science, information abuse prevention, data privacy, and technology applications to tourism and hospitality.
Rob Law, PhD, is a professor at the School of Hotel and Tourism Management, the Hong Kong Polytechnic University. His research interests are information management and technology applications.
Yanchun Zhang, PhD, is a professor at the Center for Applied Informatics, Victoria University. His research interests include databases, data mining, health informatics, web information systems, and web services.