Week 3 Assignment 1.5-3 pages


Education

Advanced Technologies and Data Management Practices in Environmental Science: Lessons from Academia

REBECCA R. HERNANDEZ, MATTHEW S. MAYERNIK, MICHELLE L. MURPHY-MARISCAL, AND MICHAEL F. ALLEN

Environmental scientists are increasing their capitalization on advancements in technology, computation, and data management. However, the extent of that capitalization is unknown. We analyzed the survey responses of 434 graduate students to evaluate the understanding and use of such advances in the environmental sciences. Two-thirds of the students had not taken courses related to information science and the analysis of complex data. Seventy-four percent of the students reported no skill in programming languages or computational applications. Of the students who had completed research projects, 26% had created metadata for research data sets, and 29% had archived their data so that it was available online. One-third of these students used an environmental sensor. The results differed according to the students' research status, degree type, and university type. Changes may be necessary in the curricula of university programs that seek to prepare environmental scientists for this technologically advanced and data-intensive age.

Keywords: data life cycle, data repository, education, environmental sensors, eScience

With the advent of recent technological and computational advances, scientists are using increasing numbers of in situ environmental sensors, model simulations, crowdsourcing tasks, and embedded networked systems that enable environmental studies to incorporate various spatiotemporal scales and to produce unprecedented amounts of data (Porter et al. 2005, Benson et al. 2010). Such technologies and an increasing interest in synthesis studies of environmental phenomena have made data valuable beyond their immediate use (Peters et al. 2008). The flood of data that digital technologies produce (Hey and Trefethen 2003) underscores the urgency of a rapid adoption of pertinent skills and best practices by environmental scientists in the proper management of data sets. Studies in which such preparedness in the environmental sciences is evaluated are absent; however, academic institutions may play a role in imparting the relevant knowledge and skills to the next generation of scientists.

As electronic devices become smaller and cheaper and as complementary computer power grows and applications increase in efficiency, scientists at all career stages are finding technology useful for addressing topics from global epidemics to climate change. Such integration has transformed both the experimental techniques and the solitary working platforms known by predecessors in the field in the not-so-distant past (Nature 2003). But the use of technology and interdisciplinary collaborations often necessitates analytical tools for the integration and analysis of large and heterogeneous data sets. In a survey of a distributed seminar course for ecology graduate students incorporating 11 American universities, Andelman and colleagues (2004) found that over 90% of the students did not have skills in the scripted programming languages that they considered essential for large data set integration and analysis. The degree to which academic institutions have modified their curricula or programs in anticipation of increasing demand for scientists with technological and computational competency is unknown.

Another trend yet to be quantified is an increase in the number of environmental scientists who follow proper data management practices to improve their research. Exemplifying this trend, the National Science Foundation (NSF) now requires that all grant applications include data management plans (NSF 2010). Regardless of the size of a project or its associated data products, creating and following through with such plans requires fulfilling metadata requirements and completing the data life cycle (e.g., collection, management, interpretation, long-term archiving; Wallis et al. 2010). Metadata are the documentation and annotations used to manage, share, and preserve data resources. Many believe that metadata standards are critical for overcoming widespread problems of linguistic uncertainty that can render environmental data unshareable (Regan et al. 2002). The degree to which programs and advisers in the environmental and ecological sciences are instructing graduate students to correctly capture and record metadata or to use metadata standards, such as the Ecological Metadata Language (EML), is unknown.

BioScience 62: 1067-1076. ISSN 0006-3568, electronic ISSN 1525-3244. © 2012 by American Institute of Biological Sciences. All rights reserved. Request permission to photocopy or reproduce article content at the University of California Press's Rights and Permissions Web site at www.ucpressjournals.com/reprintinfo.asp. doi:10.1525/bio.2012.62.12.8

www.biosciencemag.org December 2012 / Vol. 62 No. 12 • BioScience 1067

In addition, it is unknown whether programs and advisers are supporting and conveying the responsibility of proper data archiving in online data repositories (e.g., Dryad; www.datadryad.org) and thereby completing the data life cycle. When graduate students are not trained in data archival methods or do not take independent action to archive their graduate research data sets, they may be less likely to archive data sets in future research endeavors. As an example, the Networked Digital Library of Theses and Dissertations already contains over one million graduate products whose original data may be available only by contacting the author, or even worse, the data may have been misplaced. The continuance of this practice would be a huge loss of opportunity to the academic community, however large or small each individual student's data set may be, especially if the number of graduate degrees awarded continues to grow (see supplemental figure S1, available online at http://dx.doi.org/10.1525/bio.2012.62.12.8).

In this study, our first goal was to evaluate the technological and computational experience of environmental scientists and their data management practices in the formative stages of their career. Specifically, we were interested in the breadth of coursework completed by environmental graduate students that was germane to computational and information science and to the analysis of large and complex data sets. We also sought to determine the proficiency levels of graduate students with analytical tools, including programming languages and computational applications that are frequently employed in environmental studies. Finally, we evaluated the students' data management practices, environmental sensor use, and interdisciplinary collaborations, comparing between those who had completed and those who had not completed their master's research project or dissertation. A secondary goal was to compare master's students with doctoral students and also to determine whether differences exist among different institution types in California. Specifically, we surveyed private California universities, the University of California (UC), and California State University (CSU). Private universities differ in their major funding sources, whereas the latter two differ in their function (i.e., institutions with exclusive jurisdiction in PhD and professional instruction or undergraduate-focused institutions with primarily master's degree graduate programs, respectively; Douglass 2007).

Using survey responses of current and former graduate students, we highlight the degree to which academia is facilitating the integration of technology, computation, and data management in the environmental sciences and discuss its implications for the contribution of research data products to the greater body of scientific knowledge. Finally, we draw on these results to elucidate methods by which environmental scientists at all career stages may excel in this technological and data-intensive era.

Graduate students' responses and the data-collection process

During the months of June, July, and August 2011, we conducted an online survey (using www.surveymonkey.com; see supplemental form 1). We solicited responses from master's and doctoral students in academic departments related to environmental or ecological sciences from 27 California universities, including 4 private schools, 9 public universities in the UC system, and 14 public universities in the CSU system. CSU institutions offer research-based master's degrees and, in general, do not support doctoral programs. All private universities and UC institutions surveyed support both master's and doctoral programs; however, all of the survey respondents for these university types were planning to complete or had completed a doctoral degree. We excluded universities that did not respond to requests for participation and from whose students we received fewer than three responses. Private universities were those classified as research institutions by the Association of Independent California Colleges and Universities (n = 7), that offer an environmental- or ecology-related graduate program (n = 4), and that were receptive to participation (n > 3). In total, 23 universities, including 18 academic programs from 11 California State Universities, 16 academic programs from 9 Universities of California, and 4 academic programs from 2 private universities, were represented.

The survey responses were solicited through e-mail. When it was possible, we sent e-mail solicitations to graduate student electronic mailing lists within each surveyed department. If such mailing lists were not available, we collected student e-mail addresses from online department directory pages and e-mailed the students directly. For a few surveyed universities, we also e-mailed faculty members within the relevant departments and asked them to forward our solicitation e-mail to students. If our first solicitation to a particular department did not result in responses, we sent a second solicitation e-mail. Students who had completed their graduate degree more than two years prior or answered no to the question "[Do] your education and research foci fall within the ecological or environmental sciences?" were excluded from our analyses.

The response rates were difficult to calculate, because the survey was, in most cases, sent to departmental mailing lists, the sizes of which were unknown. Instead, we counted the number of students listed on departmental Web pages. Using this proxy measure, we calculated approximate response rates of 23% for the UC sample and 25% for the private sample. We did not calculate a response rate for the CSU sample, because department lists were not provided.

We processed and statistically analyzed all of the survey data using scripts in R (www.r-project.org). For all of the survey questions, means were derived using the number of responses for each university as a weight, and the associated 95% confidence intervals (CI) were reported. We determined the differences in responses among the three university types and between the master's and doctoral students by using chi-squared analyses based on counts derived at the response level. We used Student's t-test scores to determine significant differences between the responses of those students with thesis or dissertation research in progress and those who had completed their research on the basis of weighted means at the individual university level. It was possible that the students would respond that their research project was both completed and in progress; this scenario occurred, for example, when a student had progressed from a research-based master's to a doctoral program.
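The weighting and the chi-squared comparison described above can be sketched in a few lines. The sketch is written in Python rather than R, and it is an illustration only: the function names, the toy numbers, and the normal-approximation interval are assumptions, since the text does not state which formula was used for the confidence intervals.

```python
import math


def weighted_mean(values, weights):
    """Mean of per-university percentages, weighted by response counts."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def ci_half_width(values, weights, z=1.96):
    """Approximate 95% CI half-width around the weighted mean.

    Assumption: a weighted variance across universities and a normal
    approximation (z = 1.96). The source does not specify its formula.
    """
    total = sum(weights)
    mean = weighted_mean(values, weights)
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return z * math.sqrt(variance / len(values))


def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Hypothetical inputs: percentage answering "yes" at three universities
    # and the number of responses received from each.
    percentages = [30.0, 45.0, 60.0]
    responses = [12, 40, 25]
    print(weighted_mean(percentages, responses))
    print(ci_half_width(percentages, responses))

    # Hypothetical counts: rows are degree levels, columns are yes / no.
    print(chi_squared_statistic([[12, 38], [41, 67]]))
```

Weighting by response count keeps a university with only a handful of respondents from pulling the overall mean as strongly as one with many. The chi-squared function returns only the test statistic; converting it to a p value requires the chi-squared distribution with the appropriate degrees of freedom.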

Survey results

In total, 498 graduate students responded to the survey, and of those, 434 met the study's criteria. The number of eligible responses varied according to the student's thesis or dissertation status (in progress, n = 326; completed, n = 131), according to their education level (master's student, n = 124; doctoral student, n = 385), and according to university type (California State University, n = 124; University of California, n = 261; private university, n = 49) (supplemental table S1).

Coursework. Over 80% (82.3%, 95% CI = 5.3; table 1) of the students in our survey stated that they had completed none of the eight computer and information science courses evaluated in this study. Over 20% of the students had completed coursework in introductory computing (23.8%, 95% CI = 5.9) and computer programming (22.9%, 95% CI = 4.6). The students completed the least amount of coursework in networking, metadata, and information technology. The students showed little intention of eventually taking additional courses in this discipline (1.0%, 95% CI = 1.6), but that intention was numerically greatest for bioinformatics and computational biology (2.4%, 95% CI = 3.8).

A large number of the students (74.6%, 95% CI = 6.0) stated that they had not completed any coursework related to the management and analysis of complex data (table 2). Approximately one-third (30.5%, 95% CI = 6.4) of the students stated that they had taken at least one course in geographic information systems (GIS), 29.2% (95% CI = 6.3) had taken coursework in modeling, and 19.6% (95% CI = 6.1) had taken courses in spatial analysis. Fewer than 20% of the students had taken a course in remote sensing (16.1%, 95% CI = 5.8), time series analysis (12.1%, 95% CI = 3.2), meta-analysis (6.9%, 95% CI = 3.4), or data mining (4.9%, 95% CI = 3.0).

Skills. A majority of the students (74.0%, 95% CI = 6.6) stated that they had no skills in the programming languages and computational applications evaluated in this survey. Only 17.2% (95% CI = 4.7) of the students, on average, stated that they had basic skill levels in these areas. The students had the least experience with EML (99.1% stated that they had no experience, 95% CI = 4.7; figure 1), Java (90.5%, 95% CI = 12.1), or IDL (Interactive Data Language; 90.5%, 95% CI = 0.7). The students claimed a basic skill level or higher in GIS (e.g., ArcGIS; 55.5%) and statistical applications, including R (55.9%), and JMP, SPSS, or SAS (53.0%).

Advanced technologies. Approximately one-third (36.7%, 95% CI = 8.7) of the students whose program was still in progress planned to use environmental sensors in their research study (figure 2). This number paralleled the percentage of students who had completed their research and had, in fact, used environmental sensors (33.1%, 95% CI = 10.1). More than 10% (14.9%, 95% CI = 9.8) of the students whose research was in progress did not know what an environmental sensor was or what it meant to use it in environmental research, but this number was halved (7.5%, 95% CI = 0.7) for the students who had finished their research. There was no significant difference between the percentage of students whose research was in progress and who intended to use a sensor in that research and that of the students who had completed their research and who actually did use a sensor (table 3). The doctoral students whose research was still in progress planned to use environmental sensors significantly more than did the master's students, and there was a nearly significant difference in education level for the students who had used environmental sensors in their research (p = .0520; table 4a, 4b). The students at the UC institutions planned on using environmental sensors in their research (41.9%) significantly more than did those in private (27.1%) and CSU-system (28.5%) universities (supplemental table S2).

Table 1. The mean percentage of surveyed graduate students who had taken or intended to take courses in subjects related to computational and information science. Values are the mean, with the 95% CI in parentheses.

Course                                    0 courses     1 course      2 courses     3 or more     Intended future
                                          completed     completed     completed     completed     course (a)
Introductory computing                    69.4 (6.7)    23.8 (5.9)    4.2 (2.9)     1.8 (1.1)     0.7 (0.5)
Computer programming                      63.8 (8.0)    22.9 (4.6)    4.6 (2.5)     6.8 (3.2)     1.8 (3.1)
Data structures or algorithms             81.7 (5.7)    14.2 (4.4)    1.8 (1.2)     1.1 (1.6)     1.1 (1.9)
Networking                                95.1 (2.6)    3.3 (2.3)     0.7 (0.6)     0.5 (0.6)     0.5 (0.4)
Information technology                    90.8 (4.9)    7.4 (4.3)     0.7 (1.0)     0.7 (0.6)     0.5 (1.8)
Database management                       86.1 (4.0)    11.0 (3.5)    1.6 (1.1)     0.5 (0.6)     0.9 (0.7)
Metadata                                  94.2 (4.1)    4.4 (4.0)     0.7 (1.8)     0.2 (0.5)     0.5 (0.4)
Bioinformatics or computational biology   76.9 (6.5)    15.5 (4.7)    3.6 (1.7)     1.6 (1.9)     2.4 (3.8)
All courses                               82.3 (5.3)    12.8 (4.2)    2.2 (1.6)     1.7 (1.3)     1.0 (1.6)

Abbreviation: CI, confidence interval.
(a) The survey stated, "0, but I will take one soon."

Table 2. The mean percentage of surveyed graduate students who had taken or intended to take courses in subjects related to the management and analysis of large or complex data. Values are the mean, with the 95% CI in parentheses.

Course                                    0 courses     1 course      2 courses     3 or more     Intended future
                                          completed     completed     completed     completed     course (a)
Spatial analysis                          71.7 (7.1)    19.6 (6.1)    3.6 (2.1)     2.8 (1.3)     2.3 (1.1)
Geographic information systems            54.3 (9.4)    30.5 (6.4)    7.8 (3.5)     3.5 (1.8)     3.9 (3.7)
Remote sensing                            77.2 (6.9)    16.1 (5.8)    3.7 (2.0)     2.3 (1.5)     0.7 (0.5)
Modeling                                  54.7 (7.9)    29.2 (6.3)    7.8 (2.8)     5.4 (2.1)     2.8 (2.1)
Time series analysis                      82.1 (4.1)    12.1 (3.2)    3.6 (2.2)     0.7 (0.5)     1.5 (0.9)
Meta-analysis                             91.0 (3.5)    6.9 (3.4)     0.7 (1.8)     0.0 (0.0)     1.4 (0.8)
Data mining                               91.4 (3.3)    4.9 (3.0)     1.1 (1.9)     0.5 (0.6)     2.1 (1.9)
All courses                               74.6 (6.0)    17.1 (4.9)    4.1 (2.3)     2.2 (1.1)     2.1 (1.6)

Abbreviation: CI, confidence interval.
(a) The survey stated, "0, but I will take one soon."

Interdisciplinary collaboration. The percentage of students who had collaborated with someone whose expertise was outside the environmental or ecological sciences was significantly lower (37.6%, 95% CI = 1.4) than the percentage of students whose work was in progress who stated that they had planned such collaborations (55.4%, 95% CI = 7.5; table 3). The percentage of students who planned an interdisciplinary collaboration was significantly larger than that of students who were finished with their research and actually had done so (table 3). There was no significant difference in interdisciplinary collaboration activities between the master's and doctoral students (table 4a, 4b). There were significant differences in interdisciplinary collaboration among the students at different university types who had finished their research (table S2). Specifically, the CSU students were less likely to collaborate (28.1%) than were the students at UC institutions (39.8%), who were also less likely to collaborate than the students at private universities (51.7%).

Data management. Approximately 72.3% (95% CI = 6.2) of the students who were still in the process of completing their master's or doctoral research were planning on completing the data life cycle in their research, and 65.3% (95% CI = 6.7) of these students intended to archive their research data so that it would be available online (table 3). Of those who had already completed their graduate degree, 63.9% (95% CI = 16.2) stated that they had completed the data life cycle, whereas only 29.3% (95% CI = 13.1) had made their data available online, significantly less than the prospective figure from the students still in the midst of their research (table 3). A large portion of the students stated that they did not plan on making their data available online, and this number was greater for the students who had already completed their thesis or dissertation (45.9%, 95% CI = 1.3) than for those whose research was still in progress (28.0%, 95% CI = 6.7). Almost one-third of the students whose research was in progress did not know what it means to create metadata for their data sets (28.0%, 95% CI = 8.8), and a similar number (34.7%, 95% CI = 9.3) did not plan to create metadata for their data sets. For the students who had finished their research, 25.6% (95% CI = 1.3) created metadata, 63.2% (95% CI = 1.7) did not, but 12.0% (95% CI = 1.3) planned to do so some time in the future.

The students' data management practices varied according to degree type (table 4a, 4b). The doctoral students were more likely to complete or to plan to complete the data life cycle. However, the master's students showed significantly greater intent to create metadata and to archive their data products such that they would be available online than did the doctoral students. There were no significant differences among the different university types regarding data life cycles and metadata creation (table S2). But students at private universities (69.5%) and UC institutions (67.0%) were more likely to make their research data available online than students at a CSU institution.

[Figure 1 image not reproduced. Axis: percentage (0 to 100). Legend: none, basic, proficient, expert. Languages and applications shown include C, C#, C++; EML; ENVI; GIS (e.g., ArcGIS); IDL; Java; JMP, SPSS, SAS; MATLAB; Access; Python; SQL, MySQL.]

Figure 1. The level of proficiency of the surveyed graduate students with programming languages or computational applications. The error bars represent 95% confidence intervals. Abbreviations: EML, Ecological Metadata Language; GIS, geographical information systems; IDL, Interactive Data Language.

The extent of graduate student preparation

Environmental studies in which new kinds of technology, computation, data life cycle techniques, and open-source dissemination are employed hold promise for addressing many important societal issues, including the measurement of biodiversity shifts (Kelling et al. 2009) and the assessment of climate change (Graham et al. 2010), but our results suggest that many of the skills and practices that would enable scientists to use these new opportunities are only marginally instructed in formal graduate programs in California in the environmental sciences.

Environmental curricula: New courses and skill sets. Students can and do learn new methods and technologies on their own, but advanced computation, in situ field sensor technologies, and digital data management best practices will only become standard tools and skills if they are integrated into formal curricula. Among the topics that we surveyed, GIS and modeling courses were the most widely studied by the students: About one-third of them had taken a GIS or modeling course. Only two other topics in our survey even reached 20%. This suggests that most environmental scientists in training are not taking the initiative to expand their knowledge in these areas through formal courses.

The development of novel courses requires many resources, including expertise, time, and funding. In some cases, it may be worthwhile to integrate new material or skills into existing courses. However, external organizations may provide relevant materials that can be incorporated into an institution's curriculum. The DataONE organization, for example, develops educational programs related to data management, such as internships, workshops at professional meetings, and educational modules on specific data management topics (see www.dataone.org/education for more information).

Learning to capitalize on technology. In this study, we show that environmental sensors are important methodological instruments for a large proportion of graduate students. A limitation of our study is that we did not assess the levels of complexity in the sensor setup (e.g., an individual device versus a sensor network) or in data streams derived from such devices. More complex scenarios often require that users have knowledge in areas in which few of the students in our survey had taken courses, such as data structures and algorithms, database management, and networking (table 1). Researchers will also need to understand how new technologies can be used, their strengths and limitations, and techniques for analyzing the numerous and complex data that they output. For example, one must be able to adequately design environmental experiments to support reliable inferences, regardless of whether one is using computational technologies or traditional field-surveying techniques. The task of designing experiments, however, becomes more problematic when students are not well equipped to integrate new technology and statistical techniques (Millspaugh and Gitzen 2010). Nonetheless, our results suggest that students already value the integration of technology. Pedagogical models that address technology in environmental science and the other aforementioned concepts and that can be easily duplicated by instructors are needed.

[Figure 2 image not reproduced. Panel a: percentage of students (research completed) who created metadata, used environmental sensors, archived research data so that it is available online, collaborated with a researcher outside environmental science, or completed the data life cycle (collection, management, interpretation, archival). Panel b: percentage of students (research in progress) who planned each of the same steps.]

Figure 2. Mean percentage of responses for the surveyed graduate students (a) who had completed their master's or doctoral research or (b) who had not yet completed their master's or doctoral research. The error bars represent 95% confidence intervals. The respondents were earning or had earned their master's or doctoral degree in the ecological or environmental sciences at a California State University, the University of California, or a private California university.

Table 3. The mean percentage of surveyed graduate students who responded that they planned to complete (n = 326) or had already completed (n = 131) the relevant research steps. Values under each research project status are the mean, with the 95% CI in parentheses.

Research step                                                  In progress   Completed     t(455)    p
Completion of the data life cycle                              72.3 (6.2)    63.9 (16.2)   3.388     .0008
Creation of metadata                                           37.0 (9.3)    25.6 (13.2)   4.361     <.0001
Use of environmental sensors                                   36.7 (8.7)    33.1 (10.1)   1.600     .1104 (a)
Online archival of research data                               65.3 (6.7)    29.3 (13.1)   16.137    <.0001
Collaboration with researchers outside environmental science   55.4 (7.5)    37.6 (13.9)   7.366     <.0001

Abbreviation: CI, confidence interval.
(a) This value is not significant.


Table 4a. The percentage of surveyed graduate students who reported that they have completed the relevant research steps as a function of their education level for the students who had completed their thesis or dissertation project.

Research step                                                  Master's   Doctoral   χ²(2)     p
Completion of the data life cycle                              48.0       67.6       7.578     .0226
Creation of metadata                                           20.0       26.9       2.057     .3575 (a)
Use of environmental sensors                                   12.0       38.0       5.912     .0520 (a)
Online archival of research data                               20.0       31.5       1.205     .5473 (a)
Collaborated with researchers outside environmental science    20.0       41.7       4.133     .1266 (a)

(a) This value is not significant.

Table 4b. Percentage of graduate students who reported that they planned to perform the relevant research steps as a function of their education level for the students who had not yet completed their thesis or dissertation project.

Research step                                                  Master's   Doctoral   χ²(2)     p
Completion of the data life cycle                              75.9       68.4       6.976     .0306
Creation of metadata                                           30.6       26.3       21.459    <.0001
Use of environmental sensors                                   30.6       36.0       6.762     .0340
Online archival of research data                               55.5       33.3       21.952    <.0001
Collaborated with researchers outside environmental science    44.4       43.0       1.3873    .4998 (a)

(a) This value is not significant.

A place for uncommon collaborations. Collaborations allow nonexperts to take advantage of new technologies, particularly when those technologies are still in the development stage. Scientists can also consult with computer science or engineering partners about existing off-the-shelf tools. Developing interdisciplinary collaborations, however, can be time consuming and challenging. As Andelman and colleagues (2004) illustrated, students new to interdisciplinary work may not be aware of the challenges involved in initiating and maintaining such collaborations. Collaborators must spend time learning each other's language (including jargon), research methods, and expectations, and they must develop schedules, project plans, and, in the end, research products that satisfy everyone involved. Interdisciplinary projects can be risky from a professional point of view, particularly in the early phases of a research career (Rhoten and Parker 2004). Not incidentally, theses and dissertations still need to meet the requirements of the students' individual disciplines.

Despite these challenges, our survey shows that interdisciplinary collaborations by environmental scientists are important. The obvious benefit of interdisciplinary collaborations is access to expertise that enhances projects. Additional benefits are derived when scientists provide feedback to the developers of new technologies or work with developers to ensure that technology is durable in various field conditions. For example, new technologies are more conducive to field studies if they are adaptive to shifts in environmental and behavioral phenomena (Collins et al. 2006, Allen et al. 2007, Rundel et al. 2009), something that environmental scientists are equipped to assess.

Teaching environmental scientists data management. As the frequency of collaborations increases, data management and sharing needs and expectations grow as well. In addition, completing the data life cycle (that is, documenting data-collection and analysis processes, making data available in a usable format, and submitting data to a formal data archive) is increasingly expected by research funders.

The majority of students in our survey indicated that they had (or expected to have) completed the data life cycle by the completion of their program. Comparing the responses of students whose thesis or dissertation was in progress with those of students who had completed their project shows some differences. For example, there were fewer positive responses to all of the questions from those who had already completed their thesis or dissertation than from those whose thesis or dissertation was still in progress (figure 2). This might indicate wishful thinking by those who were still working on their projects or poor follow-through on the behalf of the students who had finished their projects. Perhaps this is indicative of the students' perceptions that they would be well equipped to tackle the task of completing the data life cycle at the completion of their project, but the students found that their formal education in these domains had been inadequate to motivate the completion of the data life cycle. The differences in the responses among the students from the different kinds of institutions (e.g., the CSU students were less likely to have completed the data life cycle and to have archived data) indicate that the doctoral students from the UC-system and private universities may have had more time or resources to complete data management tasks than did the CSU students.

Previous studies have shown that the difficulties of creating metadata are some of the biggest impediments to sharing data (Campbell et al. 2002, Tenopir et al. 2011). Researchers without metadata expertise must either spend significant amounts of time helping outside researchers to understand their data or must forgo data sharing altogether. Paradoxically, as collaboration becomes the norm, metadata creation can become an even bigger challenge. Regardless of the size of a research project, the loss of metadata can be prevented by documenting data-collection methods, data transformations, and analysis steps as they occur and according to metadata standards (Michener et al. 1997). However, in collaborative projects, the responsibility for metadata creation is often not discussed, and knowledge of the different components of a project is diffused among the collaborators (Mayernik 2011, Wallis and Borgman 2011). Therefore, putting together integrated documentation from large projects requires broad coordination and input and specifically assigning the task of data documentation to particular individuals.
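
The practice described above, documenting each collection, transformation, and analysis step as it occurs and naming the person responsible, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, and the three stage labels are hypothetical and do not follow EML or any other metadata standard.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class ProcessingStep:
    """One documented step in the life of a data set (hypothetical schema)."""
    performed_on: str   # ISO date on which the step was carried out
    responsible: str    # collaborator assigned to document this step
    stage: str          # "collection", "transformation", or "analysis"
    description: str    # what was done, in plain language


@dataclass
class DatasetLog:
    """Running documentation for a single data set."""
    title: str
    steps: list = field(default_factory=list)

    def record(self, responsible, stage, description, performed_on=None):
        """Append a step at the time it happens; reject unknown stage labels."""
        allowed = {"collection", "transformation", "analysis"}
        if stage not in allowed:
            raise ValueError(f"stage must be one of {sorted(allowed)}")
        step = ProcessingStep(
            performed_on=performed_on or date.today().isoformat(),
            responsible=responsible,
            stage=stage,
            description=description,
        )
        self.steps.append(step)
        return step

    def to_json(self):
        """Serialize the whole log so it can travel with the data files."""
        return json.dumps(asdict(self), indent=2)


# Example usage with made-up collaborators and dates.
log = DatasetLog(title="Soil moisture, plot A (example)")
log.record("collaborator_1", "collection",
           "Logged hourly readings from an in situ sensor.", "2012-06-01")
log.record("collaborator_2", "transformation",
           "Removed readings flagged as sensor faults.", "2012-06-15")
print(log.to_json())
```

Because every entry names a responsible collaborator, the log also makes explicit who documents which component of a shared project, which is the coordination problem noted above.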

If students are not familiar with metadata practices that allow their data to be used by collaborators or reused by outside researchers, the potential benefits of data sharing across projects may not be realized. Few metadata-specific courses for graduate students exist, and most that do exist are offered in library or information science departments. Therefore, it is not surprising that very few of our survey respondents had taken courses relating to metadata (table 1). In fact, of the activities surveyed in table 1, creating metadata received the lowest cumulative positive response and had the highest response for "I don't know what that means right now" from students who were in the midst of their graduate research. In addition, figure 1 shows that 99% of our survey respondents had no knowledge of EML, which was formally adopted by the US Long Term Ecological Research Network as a metadata standard in 2003. Our survey results suggest that metadata training and EML are not regular parts of graduate student education in ecology and environmental science.

Conclusions

Scientists are now uniquely poised to employ in situ sensors, to create and use open-source data sets, and to capitalize on other technologies for use in research. However, graduate students and early-career scientists may need to acquire some additional skills and knowledge that were not necessary or available a generation ago. In some cases, immediate advisers and mentors provide this background. However, as our survey results indicate, many of the requisite newer skills and knowledge are not being obtained through coursework or instruction. Institutions responsible for equipping environmental scientists with the tools necessary for success have a challenge ahead of them. Frameworks and educational models addressing the concepts discussed in this article that can be replicated and tested for efficacy are much needed. The move toward competence may also be accomplished unconventionally through consultation with people with such skills, such as a technician or expert on a critical instrument or someone with a new statistical tool (see box 1 for more information). Students may need to go beyond the instruments and analytical approaches used in their immediate labs.

Regardless of available training, graduate students or early-career scientists may doubt their ability to integrate such technology into their own research or to create data sets that can be repurposed by other researchers. Integrating technology and creating reusable data collections may require additional effort up front, but long-term rewards will result for the individual scientist and for the scientific and public communities. For example, many data archives are now requesting that data users include data citations in their publications (e.g., Cook 2008). The goal of such data citations is to make the data as valued in scientific settings as peer-reviewed publications, although this type of work is not yet often considered in the evaluation processes of most tenure and promotion committees.

How do the environmental sciences move forward to help students take advantage of advanced technology and data repositories? Students and educators can contribute in different ways. Graduate students can be proactive in investigating new opportunities, including spending time exploring in situ sensor devices or networked systems currently available to determine whether they are appropriate for the study's objectives. They must also consider that other scientists may have similar needs for new equipment. Software that might eliminate bottlenecks in data workflow or automate data entry or processing should be investigated. Database structures, data workflows, and metadata requirements should be established at the beginning of a project, before data collection. Metadata routines should be established during data collection, management, and interpretation. Options for long-term data archiving should be investigated. Research centers and university libraries may have data and metadata archiving options that may help reach the target audience. Relevant seminars, symposiums, and conferences should be sought in other disciplines, such as computer science and engineering. These gatherings provide opportunities to share innovative ideas and to meet prospective tech-savvy colleagues (who might be keen on using their skills for environmental research).
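
One way to picture establishing data structures and metadata requirements before data collection is a data dictionary that is written first and then used to check incoming files automatically. The sketch below is a minimal illustration in Python; the column names, units, and the `validate_rows` helper are hypothetical and are not drawn from the survey or from any particular standard.

```python
import csv
import io

# Hypothetical data dictionary, agreed on before any data are collected.
DATA_DICTIONARY = {
    "site_id": {"type": str, "unit": None,
                "description": "Identifier of the field site"},
    "soil_temp_c": {"type": float, "unit": "degrees Celsius",
                    "description": "Soil temperature at the sensor depth"},
}


def validate_rows(csv_text, dictionary):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Columns promised by the dictionary but absent from the file.
    for column in dictionary:
        if column not in header:
            problems.append(f"missing column: {column}")
    # Columns present in the file but never documented.
    for column in header:
        if column not in dictionary:
            problems.append(f"undocumented column: {column}")

    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column, spec in dictionary.items():
            value = row.get(column)
            if value is None or value == "":
                continue  # blank cells are treated as missing values
            try:
                spec["type"](value)
            except ValueError:
                problems.append(
                    f"line {line_no}: {column}={value!r} "
                    f"is not {spec['type'].__name__}"
                )
    return problems


# Example usage: one undocumented column and one badly typed value.
sample = "site_id,soil_temp_c,notes\nA1,12.5,ok\nA2,warm,check sensor\n"
for problem in validate_rows(sample, DATA_DICTIONARY):
    print(problem)
```

Running a check like this at data entry is one example of the kind of automation that might remove a bottleneck in a data workflow; the same dictionary can later be reused as the variable-level portion of the metadata.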

From the educator's point of view, how should an environmental science curriculum evolve to provide graduate students with the skills necessary to use new computational and data tools? Environmental science programs within different institutions will be situated to approach this question in different ways, but some promising approaches include the following: Classes or academic programs can be developed that are cross-listed among disciplines or cotaught by instructors from various disciplines. Cross-listed courses introduce students to new research methods and techniques, as well as to students from other disciplines. Ever-improving online instruction tools are also making interinstitution courses possible, as is exemplified by a new distributed program in landscape genetics developed by Wagner and colleagues (2012). The list of courses that meet students' methods requirements could include computer and information science courses. Just as many students take statistics courses, they might also benefit from these. Student-focused workshops can be created with data or computational themes. Workshops can provide intensive environments in which to learn particular methods or technologies and might be easier to organize and implement than new or cross-listed courses (e.g., Andelman et al. 2004). Students can be encouraged to look into internship programs, such as DataONE's Summer Internships Program (www.dataone.org/internships). As the NSF and other funding agencies continue to promote data-intensive research collaborations, the possibilities for relevant internships will probably increase.

Box 1. Selection of Web sites offering information and resources in advanced technologies and data management practices in environmental science.

The following lists are not exhaustive but will serve as a springboard for those interested in learning more about the eScience research community and its services.

Research centers
The Center for Embedded Networked Sensing (http://research.cens.ucla.edu) and its Urban Sensing program (http://urban.cens.ucla.edu)
DataONE (The Data Observation Network for Earth; https://dataone.org)
NEON (The National Ecological Observatory Network; http://www.neoninc.org)
The National Center for Ecological Analysis and Synthesis (www.nceas.ucsb.edu)
The South African Environmental Observation Network (www.saeon.ac.za)
The Knowledge Network for Biocomplexity (http://knb.ecoinformatics.org)
Ecoinformatics' online resource for managing ecological data and information (www.ecoinformatics.org)
Oregon State University's Eco-Informatics Summer Institute (http://eco-informatics.engr.oregonstate.edu)
The University of Washington's eScience Institute (http://escience.washington.edu)

Selected projects
What's Invasive! Community Data Collection (http://whatsinvasive.com)
Trash | Track (http://senseable.mit.edu/trashtrack)
Project BudBurst (http://neoninc.org/budburst)
Urban Sensing Projects (http://urban.cens.ucla.edu/projects)
Biketastic (http://biketastic.com)

Data management and metadata resources
DMPTool (http://dmp.cdlib.org)
Ecological Metadata Language (http://knb.ecoinformatics.org/software/eml)
The Dublin Core Metadata Initiative (www.dublincore.org)
The Kepler Project (https://kepler-project.org)

Online data repositories
DataONE (www.dataone.org)
The Dryad data repository (www.datadryad.org)
Global Population Dynamics Database (www.imperial.ac.uk/cpb/gpdd/secure/login.aspx?ReturnUrl=%2fcpb%2fgpdd%2fgpdd-b.aspx)
The Interaction Web DataBase (www.nceas.ucsb.edu/interactionweb)
The US Long Term Ecological Research Network Data Portal (http://metacat.lternet.edu)
Metacat (http://knb.ecoinformatics.org/knb/docs)
The Paleobiology Database (www.paleodb.org)
DataUp (http://datapub.cdlib.org)
Vegbank (http://vegbank.org)

Curriculum and culture changes related to advanced technologies and data management in environmental science are certain to be gradual. As more examples appear of the ways in which the increased use of new technologies and increased attention to data management can benefit individual environmental scientists, students' interest in these tools and techniques is likely to increase. To serve as an example of our own cause, we have archived the data we collected in this study in the Dryad data repository (see Hernandez et al. 2012). The Dryad data repository accepts "data files associated with any published article in the biosciences, as well as software scripts and other files important to the article" (http://datadryad.org/depositing). We prepared our data for deposit while the present article was being reviewed. In preparing our data for submission, we followed the recommendations provided on Dryad's "Depositing data to Dryad" Web page (http://datadryad.org/depositing). Dryad's page provides recommendations on file names and data file documentation and suggestions for standardization. Because our data are survey data, we also followed the data-deposit recommendations provided by the Inter-University Consortium for Political and Social Research (www.icpsr.umich.edu/icpsrweb/deposit/index.jsp), the largest archive of quantitative social science data. Data sets from environmental studies might be best prepared for deposit according to the recommendations provided by the Oak Ridge National Laboratory Distributed Active Archive Center (DAAC) for Biogeochemical Dynamics (Hook et al. 2010).
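
A pre-deposit check of file names and documentation might look like the following Python sketch. The three rules it applies (plain file names, plain-text formats, and the presence of a README) are illustrative assumptions only; they are not a restatement of the Dryad, ICPSR, or Oak Ridge DAAC recommendations, which should be consulted directly. The function name and file names are hypothetical.

```python
import re
from pathlib import PurePath

# Assumed rule: names use only letters, digits, dots, underscores, hyphens.
SAFE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")
# Assumed rule: plain-text formats are preferred for long-term reuse.
PLAIN_TEXT_FORMATS = {".csv", ".tsv", ".txt"}


def check_deposit(filenames):
    """Return {name: [warnings]} for the files in a planned deposit."""
    report = {}
    for name in filenames:
        path = PurePath(name)
        warnings = []
        if not SAFE_NAME.match(path.name):
            warnings.append("name contains spaces or special characters")
        if path.suffix.lower() not in PLAIN_TEXT_FORMATS:
            warnings.append("not a plain-text format")
        report[name] = warnings

    # A deposit-level check: is there any file documenting the others?
    has_readme = any(
        PurePath(name).name.lower().startswith("readme") for name in filenames
    )
    if not has_readme:
        report["(deposit)"] = ["no README file documenting the data files"]
    return report


# Example usage with made-up file names.
for name, warnings in check_deposit(
    ["survey responses.xlsx", "survey_responses.csv", "README.txt"]
).items():
    print(name, "->", warnings or "ok")
```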

As more scientists become accustomed to documenting and archiving their data in long-term data archives, more data will be available for reuse. Unless metadata becomes a more salient topic within environmental science education, however, these archived data sets may be of highly variable utility. Some data archives, such as the Oak Ridge DAAC, provide support for metadata creation. Data archives that use a model similar to self-publication, such as the Dryad repository, require the data creators to create and deposit metadata. The process of documenting data sets for use by someone else is different from the process of documenting data sets for one's own use, even though descriptions of many of the same project aspects are involved, such as annotations of methods, sampling processes, and errors or uncertainties. Outside users require considerably more in-depth and detailed metadata descriptions.

Metadata standards are crucially important in integrating data from individual projects. In our survey, we investigated only one standard: EML. Not all data-sharing systems for ecological and environmental data use EML; many other metadata standards exist, both general-purpose standards and topic-specific standards, such as Darwin Core for biodiversity data (Wieczorek et al. 2012) and Federal Geographic Data Committee standards for geospatial data. However, the lack of metadata training indicated by our survey suggests that students will come to any metadata standard unequipped with basic knowledge of how to create metadata following such standards.

With the increased attention to computational and data-intensive science by federal funding agencies, universities, and the general public, new curriculum initiatives in the environmental sciences might be well received by both institutions and students. Building new tools and techniques into educational curricula is essential to enabling individual scientists and large collaborative groups of scientists to solve Earth's environmental problems in this data-intensive age.

Acknowledgments

This research was funded by National Science Foundation grant no. EF-0410408 and Center for Embedded Networked Sensing grant no. CCR-0120778.

References cited

Allen MF, et al. 2007. Soil sensor technology: Life within a pixel. BioScience 57: 859-867.

Andelman SJ, Bowles CM, Willig MR, Waide RB. 2004. Understanding environmental complexity through a distributed knowledge network. BioScience 54: 240-246.

Benson BJ, Bond BJ, Hamilton MP, Russell MK, Han R. 2010. Perspectives on next-generation technology for environmental sensor networks. Frontiers in Ecology and the Environment 8: 193-200.

Campbell EG, Clarridge BR, Gokhale M, Birenbaum L, Hilgartner S, Holtzman NA, Blumenthal D. 2002. Data withholding in academic genetics: Evidence from a national survey. Journal of the American Medical Association 287: 473-480.

Collins SL, et al. 2006. New opportunities in ecological sensing using wireless sensor networks. Frontiers in Ecology and the Environment 4: 402-407.

Cook R. 2008. Citations to published data sets. FluxLetter 1: 4-5. (20 September 2012; http://hwc.berkeley.edu/FluxLetter/FluxLetter-Vol1-No4.pdf)

Douglass JA. 2007. The California Idea and American Higher Education: 1850 to the 1960 Master Plan. Stanford University Press.

Graham EA, Riordan EC, Yuen EM, Estrin D, Rundel PW. 2010. Public Internet-connected cameras used as a cross-continental ground-based plant phenology monitoring system. Global Change Biology 16: 3014-3023. doi:10.1111/j.1365-2486.2010.02164.x

Hernandez RR, Mayernik MS, Murphy-Mariscal ML, Allen MF. 2012. Data from: Advanced Technologies and Data Management Practices in Environmental Science: Lessons from Academia. Dryad Data Repository. http://dx.doi.org/10.5061/dryad.cv86385c

Hey T [AJG], Trefethen AE. 2003. The data deluge: An e-Science perspective. Pages 809-824 in Berman F, Fox GC, Hey AJG, eds. Grid Computing: Making the Global Infrastructure a Reality. Wiley.

Hook LA, Santhana Vannan SK, Beaty TW, Cook RB, Wilson BE. 2010. Best Practices for Preparing Environmental Data Sets to Share and Archive. Oak Ridge National Laboratory Distributed Active Archive Center. doi:10.3334/ORNLDAAC/BestPractices-2010

Kelling S, Hochachka WM, Fink D, Riedewald M, Caruana R, Ballard G, Hooker G. 2009. Data-intensive science: A new paradigm for biodiversity studies. BioScience 59: 613-620.

Mayernik MS. 2011. Metadata Realities for Cyberinfrastructure: Data Authors as Metadata Creators. PhD Dissertation. University of California, Los Angeles. doi:10.2139/ssrn.2042653

Michener WK, Brunt JW, Helly JJ, Kirchner TB, Stafford SG. 1997. Nongeospatial metadata for the ecological sciences. Ecological Applications 7: 330-342.

Millspaugh JJ, Gitzen RA. 2010. Statistical danger zone. Frontiers in Ecology and the Environment 8: 515.

Nature. 2003. Who'd want to work in a team? Nature 424: 1.

[NSF] National Science Foundation. 2010. Scientists Seeking NSF Funding Will Soon Be Required to Submit Data Management Plans. NSF. (20 September 2012; www.nsf.gov/news/news_summ.jsp?cntn_id=116928)

Peters DPC, Groffman PM, Nadelhoffer KJ, Grimm NB, Collins SL, Michener WK, Huston MA. 2008. Living in an increasingly connected world: A framework for continental-scale environmental science. Frontiers in Ecology and the Environment 6: 229-237.

Porter J, et al. 2005. Wireless sensor networks for ecology. BioScience 55: 561-572.

Regan HM, Colyvan M, Burgman MA. 2002. A taxonomy and treatment of uncertainty for ecology and conservation biology. Ecological Applications 12: 618-628.

Rhoten D, Parker A. 2004. Risks and rewards of an interdisciplinary research path. Science 306: 2046. doi:10.1126/science.1103628

Rundel PW, Graham EA, Allen MF, Fisher JC, Harmon TC. 2009. Environmental sensor networks in ecological research. New Phytologist 182: 589-607. doi:10.1111/j.1469-8137.2009.02811.x

Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, Manoff M, Frame M. 2011. Data sharing by scientists: Practices and perceptions. PLOS ONE 6 (art. e21101). doi:10.1371/journal.pone.0021101

Wagner HH, Murphy MA, Holderegger R, Waits L. 2012. Developing an interdisciplinary, distributed graduate course for twenty-first century scientists. BioScience 62: 182-188. doi:10.1525/bio.2012.62.2.11

Wallis JC, Borgman CL. 2011. Who is responsible for data? An exploratory study of data authorship, ownership, and responsibility. Proceedings of the American Society for Information Science and Technology 48: 1-10.

Wallis JC, Mayernik MS, Borgman CL, Pepe A. 2010. Digital libraries for scientific data discovery and reuse: From vision to practical reality. Pages 333-340 in Hunter J, Lagoze C, Giles L, Li Y-F, eds. Proceedings of the 10th Annual Joint Conference on Digital Libraries. Association for Computing Machinery. doi:10.1145/1816123.1816173

Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, Giovanni R, Robertson T, Vieglais D. 2012. Darwin Core: An evolving community-developed biodiversity data standard. PLOS ONE 7 (art. e29715). doi:10.1371/journal.pone.0029715

Rebecca R. Hernandez ([email protected]) is a doctoral student in environmental Earth system science at Stanford University, in Stanford, California. She studies plant and soil ecological processes using sensor technologies and computational tools. Matthew S. Mayernik is a research data services specialist at the National Center for Atmospheric Research, in Boulder, Colorado. He received his PhD in information studies from the University of California, Los Angeles, in 2011. He studies data and metadata management practices across scientific disciplines. Michelle L. Murphy-Mariscal is a research scientist and Michael F. Allen is the director at the Center for Conservation Biology at the University of California, Riverside. MLM-M uses imaging technologies to study corridor ecology in southern California, and MFA leads the Terrestrial Ecology Observing Systems program for the Center for Embedded Networked Systems Consortium.


Copyright of BioScience is the property of American Institute of Biological Sciences and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.