Empirical Methods in Software Engineering Assignment

Lecture4-HowtoPerformExperiments.pptx

1

How to Perform Experiments: Basic Concepts

CSCI 783: Empirical Software Engineering

2

Empirical Software Engineering: How to use empirical research in software engineering?

Repetition of empirical studies is necessary!

Definition

Planning and Design

Execution

Analysis

Packaging

Definition: Determine study goal(s)

Design: Formulate research hypothesis(es). Select the type of empirical study to be employed. Operationalize study goal(s) and hypotheses. Make a study plan: what needs to be done, by whom, and when. Prepare the material required to conduct the study

Execution: Run study according to plan and collect required data

Analysis: Analyze collected data to answer operationalized study goals and hypotheses

Packaging: Report your studies

3

Empiricism in Software Engineering

Confirmation: Of more or less accepted hypotheses. For example: object-orientation is good for reuse

Evaluation: Of methods. For example: whether Java produces higher quality code than C++

Identification: Of relationships. For example: find a relationship between fault-prone components and design concepts

Validation: Of models and measures. For example: validate a specific cost estimate model

Understanding: Of methods, techniques and models. For example: to understand the relationship between inspections and testing

Guidance / Control: To help in management. For example: as input to personnel to software inspections

Change / Improve: To support decision-making with respect to changes. For example: whether or not to introduce a new development tool

Experimentation in software engineering

4

Experiment Objective

Theory: Cause Construct, Effect Construct, linked by the Cause-effect Construct

Observation: Treatment, Outcome, linked by the Treatment-Outcome Construct

Experiment Operation: Independent variable, Dependent variable

5

What is Empirical Software Engineering Research

What kinds of questions are "interesting"?

What kinds of results help to answer these questions, and what research methods can produce these results?

What kinds of evidence can demonstrate the validity of a result, and how to distinguish good results from bad ones?

6

Types of Research Questions

What kinds of questions are "interesting"?

Method or means of development: How can we do/create (or automate doing) X? What is a better way to do/create X?

Method for analysis: How can I evaluate the quality/correctness of X? How do I choose between X and Y?

Design, evaluation, or analysis of a particular instance: What is a (better) design or implementation for application X? What is property X of artifact/method Y? How does X compare to Y? What is the current state of X / practice of Y?

Generalization or characterization: Given X, what will Y (necessarily) be? What, exactly, do we mean by X? What are the important characteristics of X? What is a good formal/empirical model for X? What are the varieties of X, and how are they related?

Feasibility: Is it possible to accomplish X at all?

7

What is Software Engineering Research

What kinds of questions are "interesting"?

What kinds of results help to answer these questions, and what research methods can produce these results?

8

Types of Research Results

What kinds of questions are "interesting"?

What kinds of results help to answer these questions,

Procedure or technique: New or better way to do some task, such as design, implementation, measurement, evaluation, selection from alternatives. Techniques for implementation, representation, management, and analysis, but not advice or guidelines

Qualitative or descriptive model: Structure or taxonomy for a problem area, architecture style, or design pattern. Well-grounded checklist

Empirical model: Empirical predictive model based on data

Analytic model: Structural model precise enough to support formal analysis or automatic manipulation

Notation or tool: Formal language to support technique or model. Implemented tool that embodies a technique

Specific solution: Solution to application problem that shows use of software engineering principles; may be design, rather than implementation

Answer or judgment: Result of specific analysis, evaluation, or comparison

Report: Interesting observations, rules of thumb

9

Software Engineering Research

Question          Results / method    Validation
Feasibility       Qualitative Model   Persuasion
Characterization  Technique           Implementation
Method/Means      System              Evaluation
Generalization    Empirical Model     Analysis
Selection         Analytic Model      Experience

10

Software Engineering Research: A common Plan

Question          Results / method    Validation
Feasibility       Qualitative Model   Persuasion
Characterization  Technique           Implementation
Method/Means      System              Evaluation
Generalization    Empirical Model     Analysis
Selection         Analytic Model      Experience

Question: Can X be done better?

Result: Build a Y

Validation: Measure Y and compare it to X

11

Software Engineering Research: A common (often bad) Plan

Question          Results / method    Validation
Feasibility       Qualitative Model   Persuasion
Characterization  Technique           Implementation
Method/Means      System              Evaluation
Generalization    Empirical Model     Analysis
Selection         Analytic Model      Experience

Question: Can X be done better?

Result: Build a Y

Validation: Look, it works!

12

Software Engineering Research: Two Other Good Plans

Question          Results / method    Validation
Feasibility       Qualitative Model   Persuasion
Characterization  Technique           Implementation
Method/Means      System              Evaluation
Generalization    Empirical Model     Analysis
Selection         Analytic Model      Experience

Plan 1

Question: Can X be done at all?

Result: Build a Y that does X

Validation: Look, it works!

Plan 2

Question: Is X always true of Y?

Result: Formally model Y and prove X

Validation: Check the proof

13

Goal Question Metric (GQM) Paradigm

14

Goal Question Metric (GQM) Paradigm: Example

Goal 1 ([1] Purpose, [2] Issue, [3] Object (process), [4] Viewpoint): Maintain a maximum level of customer satisfaction from the Help Desk user’s viewpoint

Question 1: What is the current help desk ticket trend?
Metric 1: Number of help desk tickets closed
Metric 2: Number of new help desk tickets
Metric 3: % tickets outside of the upper limit
Metric 4: Subjective rating of customer satisfaction
Metric 5: Number of new help desk tickets open

Question 2: Is the help desk satisfaction improving or diminishing?
Metric 6: Number of help desk calls abandoned
Metric 7: Number of help desk calls answered
Metric 8: Number of help desk calls sent to voicemail
Metric 9: Subjective rating of customer satisfaction
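
The goal, questions, and metrics of the help desk example form a small tree, which can be sketched as a data structure. This is only an illustrative sketch: the class names `Goal` and `Question` and the helper `shared_metrics` are assumptions made for this example and are not part of the GQM paradigm itself.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list[str] = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    questions: list[Question] = field(default_factory=list)

    def render(self) -> str:
        """Return the GQM tree as indented text: goal, questions, metrics."""
        lines = [f"Goal: {self.statement}"]
        for q in self.questions:
            lines.append(f"  Question: {q.text}")
            lines.extend(f"    Metric: {m}" for m in q.metrics)
        return "\n".join(lines)

    def shared_metrics(self) -> set[str]:
        """Return the metrics that are attached to more than one question."""
        seen: set[str] = set()
        shared: set[str] = set()
        for q in self.questions:
            for m in set(q.metrics):
                if m in seen:
                    shared.add(m)
                else:
                    seen.add(m)
        return shared


# The help desk example from the slide, entered as data.
help_desk = Goal(
    statement=("Maintain a maximum level of customer satisfaction "
               "from the Help Desk user's viewpoint"),
    questions=[
        Question(
            "What is the current help desk ticket trend?",
            [
                "Number of help desk tickets closed",
                "Number of new help desk tickets",
                "% tickets outside of the upper limit",
                "Subjective rating of customer satisfaction",
                "Number of new help desk tickets open",
            ],
        ),
        Question(
            "Is the help desk satisfaction improving or diminishing?",
            [
                "Number of help desk calls abandoned",
                "Number of help desk calls answered",
                "Number of help desk calls sent to voicemail",
                "Subjective rating of customer satisfaction",
            ],
        ),
    ],
)

print(help_desk.render())
```

Calling `help_desk.shared_metrics()` shows that the subjective satisfaction rating serves both questions, which is typical of GQM trees: one metric may answer several questions.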

“If you cannot measure it, you cannot improve it.”

Lord Kelvin

15

Experiment Definition

Definition

Experiment Definition:

Determine study goal(s)

The Goal Template:

Analyze <a process, product, method, model>

For the purpose of <characterizing, understanding, evaluating, predicting, improving>

With respect to their <Quality Focus>

From the point of view of <Developer, Customer, Manager>

in the context of <Other Context factors that may affect outcomes>

16

Experiment Definition: Example

The Goal Template:

Analyze <PBR and Checklist Technique>

For the purpose of <Evaluating>

With respect to their <Effectiveness and Efficiency>

From the point of view of <Researchers>

in the context of <Students reading requirement documents>
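
The goal template can also be treated as a record with five slots that is rendered into one sentence. The sketch below is illustrative only; the class name `GoalTemplate` and its field names are assumptions chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalTemplate:
    object_of_study: str  # Analyze <...>
    purpose: str          # For the purpose of <...>
    quality_focus: str    # With respect to their <...>
    viewpoint: str        # From the point of view of <...>
    context: str          # In the context of <...>

    def sentence(self) -> str:
        """Fill the five slots of the goal template and return one sentence."""
        return (
            f"Analyze {self.object_of_study} "
            f"for the purpose of {self.purpose} "
            f"with respect to their {self.quality_focus} "
            f"from the point of view of {self.viewpoint} "
            f"in the context of {self.context}."
        )


# The PBR and Checklist example from the slide.
pbr_goal = GoalTemplate(
    object_of_study="the PBR and Checklist techniques",
    purpose="evaluating",
    quality_focus="effectiveness and efficiency",
    viewpoint="researchers",
    context="students reading requirement documents",
)

print(pbr_goal.sentence())
```

Keeping the five slots separate makes it easy to spot a goal that is missing its viewpoint or context before the study is planned.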

Example

TASK: Software-development process management

PROBLEM: During the software testing phase many anomalies were discovered and it is suspected that the software quality would not reach a satisfactory level by the shipping deadline.

QUESTION: Construct a GQM tree that helps you to decide when it would be possible to ship the software.

17

18

Example – sentence format

Analyze the unit test process to understand the impact of adding additional tests to project K from the viewpoint of the project manager

19

Experiment Planning and Design

Experiment Definition

Experiment Planning and Design: Context Selection, Hypothesis Formulation, Variable Selection, Selection of Subjects, Design, Instrumentation, Validity Evaluation

Experiment Implementation

20

Experiment Planning and Design: Context Selection

Context Selection:

Off-line vs. On-line

Students vs. Professionals

Toy vs. Real problems

21

Experiment Planning and Design

Experiment Definition

Experiment Planning and Design: Context Selection, Hypothesis Formulation

22

Experiment Planning and Design: Hypothesis Formulation

Null hypothesis (no real underlying trend or pattern) and alternative hypothesis.

The objective is to reject the null hypothesis with as high significance as possible.

23

Experiment Planning and Design: Hypothesis Formulation (Example)

Null hypothesis

There is no difference in code quality between code produced using clean-room and code produced using our current process

Alternative hypothesis

The quality of code produced using clean-room is higher than the quality of code produced using our current process
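
A one-sided hypothesis pair like this one can be checked with many different tests. The sketch below uses a permutation test on the difference in means, which is one possible choice and not necessarily the one intended by the lecture. The quality scores, the function name `permutation_test_greater`, and the significance level of 0.05 are all assumptions made up for illustration; they are not data from a real study.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test_greater(treatment, control, n_resamples=5000, seed=0):
    """One-sided permutation test.

    H0: the group labels do not matter (no difference in mean quality).
    H1: the treatment group has a higher mean than the control group.
    Returns an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Under H0 the labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical quality scores (higher is better), invented for this sketch.
cleanroom_scores = [82, 88, 91, 79, 85, 90, 87, 84]
current_scores = [70, 75, 72, 68, 74, 71, 77, 73]

alpha = 0.05  # assumed significance level
p_value = permutation_test_greater(cleanroom_scores, current_scores)
reject_null = p_value < alpha
print(f"p = {p_value:.4f}, reject null hypothesis: {reject_null}")
```

Rejecting the null hypothesis here only says that a difference this large is unlikely under random labeling; whether the scores really capture "code quality" is a construct validity question.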

24

Experiment Planning and Design

Experiment Definition

Experiment Planning and Design: Context Selection, Hypothesis Formulation, Variable Selection

25

Experiment Planning and Design: Variable Selection

[Figure: the independent variables are the inputs to the process; the dependent variable is its output.]

26

Experiment Planning and Design

Experiment Definition

Experiment Planning and Design: Context Selection, Hypothesis Formulation, Variable Selection, Selection of Subjects

27

Experiment Planning and Design

Experiment Definition

Experiment Planning and Design: Context Selection, Hypothesis Formulation, Variable Selection, Selection of Subjects, Design

28

Experiment Planning and Design: Design

1. Design principles

Randomization

Blocking (e.g., on experience)

Balancing (same number of subjects in each group)
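
The three design principles can be combined in one assignment procedure: group subjects into blocks, shuffle within each block, and hand out treatments in rotation so that group sizes stay equal. The sketch below is one possible way to do this; the function name `assign_treatments`, the subject identifiers, and the experience levels are hypothetical.

```python
import random
from collections import defaultdict


def assign_treatments(subjects, treatments, seed=None):
    """Assign each subject to a treatment.

    subjects: list of (name, block) pairs, e.g. ("s1", "novice").
    Blocking: subjects are grouped by their block value.
    Randomization: the order within each block is shuffled.
    Balancing: treatments are handed out in rotation, so group sizes
    differ by at most one, both overall and within each block.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for name, block in subjects:
        blocks[block].append(name)

    assignment = {}
    position = 0  # running index so the rotation continues across blocks
    for block in sorted(blocks):
        members = blocks[block]
        rng.shuffle(members)
        for name in members:
            assignment[name] = treatments[position % len(treatments)]
            position += 1
    return assignment


# Hypothetical subjects, blocked on experience.
subjects = [
    ("s1", "novice"), ("s2", "novice"), ("s3", "novice"), ("s4", "novice"),
    ("s5", "expert"), ("s6", "expert"), ("s7", "expert"), ("s8", "expert"),
]
assignment = assign_treatments(subjects, ["PBR", "Checklist"], seed=42)
for name, block in subjects:
    print(name, block, assignment[name])
```

With four novices and four experts, each technique receives two subjects from each experience level, so experience cannot be confounded with the treatment.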

29

Experiment Planning and Design: Design

2. Design types

A large number of standard designs exist, and we should select an appropriate design type depending on the treatments, the number of subjects, and the objective (hypothesis) of the experiment.

30

Experiment Planning and Design

Experiment Definition

Experiment Planning and Design: Context Selection, Hypothesis Formulation, Variable Selection, Selection of Subjects, Design, Instrumentation

31

Experiment Planning and Design: Instrumentation

Objects

Guidelines

Measurement Instruments

32

Experiment Planning and Design

Experiment Definition

Experiment Planning and Design: Context Selection, Hypothesis Formulation, Variable Selection, Selection of Subjects, Design, Instrumentation, Validity Evaluation

Experiment Implementation

Experiment Planning and Design: Validity Evaluation

Construct Validity: whether the theoretical constructs are interpreted and measured correctly

Internal Validity: focuses on the study design, and particularly whether the results really do follow from the data

External Validity: whether claims for the generality of the results are justified

Reliability: whether the study yields the same results if other researchers replicate it

For empirical work to be acceptable as a contribution to scientific knowledge, the researcher needs to convince readers that the conclusions drawn from an empirical study are valid. Not surprisingly, the criteria by which researchers judge validity depend on their philosophical stance.

For positivists, research is normally theory-driven. The key steps include deriving study propositions from the theory, designing the study to address the propositions, and then drawing more general conclusions from the results. Each of these steps must be shown to be sound. Accordingly, positivists usually identify four criteria for validity:

• Construct validity focuses on whether the theoretical constructs are interpreted and measured correctly. For example, if Jane designs an experiment to test her claims about the efficiency of fish eye views, will she interpret “efficiency” in the same way that other researchers have, and does she have an appropriate means for measuring it? Problems with construct validity occur when the measured variables don’t correspond to the intended meanings of the theoretical terms.

• Internal validity focuses on the study design, and particularly whether the results really do follow from the data. Typical mistakes include the failure to handle confounding variables properly, and misuse of statistical analysis.

• External validity focuses on whether claims for the generality of the results are justified. Often, this depends on the nature of the sampling used in a study. For example, if Jane’s experiment is conducted with students as her subjects, it might be hard to convince people that the results would apply to practitioners in general.

• Reliability focuses on whether the study yields the same results if other researchers replicate it. Problems occur if the researcher introduces bias, perhaps because the tool being evaluated is one that the researcher herself has a stake in.

These criteria are useful for evaluating all positivist studies, including controlled experiments, most case studies and survey research. In reporting positivist empirical studies, it is important to include a section on threats to validity, in which potential weaknesses in the study design as well as attempts to mitigate these threats are discussed in terms of these four criteria. This is important because all study designs have flaws. By acknowledging them explicitly, the researchers show that they are aware of the flaws and have taken reasonable steps to minimize their effects.

34

Empirical Software Engineering: How to use empirical research in software engineering?

Definition

Planning and Design

Definition: Determine study goal(s)

Planning and Design:

Research hypothesis(es).

Select type of empirical study to be employed.

Operationalize study goal(s) and hypotheses.

Make study plan: what needs to be done by whom and when.

Prepare material required to conduct the study

Execution: Run study according to plan and collect required data

Execution

35

Experiment Execution

Experiment Definition

Experiment Planning and Design

Experiment Execution: Train Subjects, Run Treatment, Data Collection

Data Analysis and Interpretation

36

Empirical Software Engineering: How to use empirical research in software engineering?

Definition

Planning and Design

Execution

Analysis

Definition: Determine study goal(s)

Design: Formulate research hypothesis(es). Select the type of empirical study to be employed. Operationalize study goal(s) and hypotheses. Make a study plan: what needs to be done, by whom, and when. Prepare the material required to conduct the study

Execution: Run study according to plan and collect required data

Analysis: Analyze collected data to answer operationalized study goals and hypotheses

37

Data Analysis and Interpretation

Experiment Definition

Experiment Planning and Design

Experiment Execution

Data Analysis and Interpretation: Descriptive Statistics, Hypothesis Testing, Discussion and Conclusion

Packaging
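
Before any hypothesis test, the collected data is usually summarized with descriptive statistics. A minimal sketch using Python's standard `statistics` module follows; the defect counts and the helper name `describe` are hypothetical and serve only to illustrate the step.

```python
from statistics import mean, median, quantiles, stdev


def describe(values):
    """Return basic descriptive statistics for one group of measurements."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical number of defects found per subject in one treatment group.
defects_found = [4, 7, 5, 9, 6, 8, 5, 6]
summary = describe(defects_found)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Computing one such summary per treatment group gives a first impression of central tendency and spread, which helps when choosing and interpreting the hypothesis test.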

38

Empirical Software Engineering: How to use empirical research in software engineering?

Definition

Planning and Design

Execution

Analysis

Packaging

Definition: Determine study goal(s)

Design: Formulate research hypothesis(es). Select the type of empirical study to be employed. Operationalize study goal(s) and hypotheses. Make a study plan: what needs to be done, by whom, and when. Prepare the material required to conduct the study

Execution: Run study according to plan and collect required data

Analysis: Analyze collected data to answer operationalized study goals and hypotheses

Packaging: Report your studies

39

Experiment Packaging

Definition

Planning and Design

Execution

Data Analysis and Interpretation

Packaging

Report Outline:

Introduction

Problem Statement

Experiment Planning

Experiment Operation

Data Analysis

Interpretation of Results

Discussions and Conclusions

Appendix