Empirical Methods in Software Engineering Assignment
1
How to Perform Experiments: Basic Concepts
CSCI 783: Empirical Software Engineering
2
Empirical Software Engineering: How to use empirical research in software engineering?
Repetition of empirical studies is necessary!
Definition
Planning and Design
Execution
Analysis
Packaging
Definition: Determine study goal(s)
Planning and Design: Formulate research hypothesis(es). Select type of empirical study to be employed. Operationalize study goal(s) and hypotheses. Make study plan: what needs to be done by whom and when. Prepare material required to conduct the study.
Execution: Run study according to plan and collect required data
Analysis: Analyze collected data to answer operationalized study goals and hypotheses
Packaging: Report your studies
3
Empiricism in Software Engineering
Confirmation of more or less accepted hypotheses
For example: object-orientation is good for reuse
Evaluation of methods
For example: whether Java produces higher-quality code than C++
Identification of relationships
For example: find a relationship between fault-prone components and design concepts
Validation of models and measures
For example: validate a specific cost estimation model
Understanding of methods, techniques and models
For example: understand the relationship between inspections and testing
Guidance / Control to help in management
For example: as input for assigning personnel to software inspections
Change / Improve to support decision-making with respect to changes
For example: whether or not to introduce a new development tool
Experimentation in software engineering
4
[Figure: experiment principles. Theory level (experiment objective): a cause construct is related to an effect construct (cause-effect construct). Observation level (experiment operation): a treatment, the independent variable, produces an outcome, the dependent variable (treatment-outcome construct).]
5
What is Empirical Software Engineering Research
What kinds of questions are "interesting"?
What kinds of results help to answer these questions, and what research methods can produce these results?
What kinds of evidence can demonstrate the validity of a result, and how to distinguish good results from bad ones?
6
Types of Research Questions
What kinds of questions are "interesting"?
| Type of question | Examples |
| Method or means of development | How can we do/create (or automate doing) X? What is a better way to do/create X? |
| Method for analysis | How can I evaluate the quality/correctness of X? How do I choose between X and Y? |
| Design, evaluation, or analysis of a particular instance | What is a (better) design or implementation for application X? What is property X of artifact/method Y? How does X compare to Y? What is the current state of X / practice of Y? |
| Generalization or characterization | Given X, what will Y (necessarily) be? What, exactly, do we mean by X? What are the important characteristics of X? What is a good formal/empirical model for X? What are the varieties of X, how are they related? |
| Feasibility | Is it possible to accomplish X at all? |
7
What is Software Engineering Research
What kinds of questions are "interesting"?
What kinds of results help to answer these questions, and what research methods can produce these results?
8
Types of Research Results
What kinds of questions are "interesting"?
What kinds of results help to answer these questions,
| Type of result | Examples |
| Procedure or technique | New or better way to do some task, such as design, implementation, measurement, evaluation, selection from alternatives. Techniques for implementation, representation, management, and analysis, but not advice or guidelines |
| Qualitative or descriptive model | Structure or taxonomy for a problem area, architecture style, or design pattern. Well-grounded checklist |
| Empirical model | Empirical predictive model based on data |
| Analytic model | Structural model precise enough to support formal analysis or automatic manipulation |
| Notation or tool | Formal language to support technique or model. Implemented tool that embodies a technique |
| Specific solution | Solution to application problem that shows use of software engineering principles – may be design, rather than implementation |
| Answer or judgment | Result of specific analysis, evaluation, or comparison |
| Report | Interesting observations, rules of thumb |
9
Software Engineering Research
| Question | Results / method | Validation |
| Feasibility | Qualitative Model | Persuasion |
| Characterization | Technique | Implementation |
| Method/Means | System | Evaluation |
| Generalization | Empirical Model | Analysis |
| Selection | Analytic Model | Experience |
10
Software Engineering Research: A common Plan
Question: Can X be done better?
Result: Build a Y
Validation: Measure Y and compare it to X
11
Software Engineering Research: A common (often bad) Plan
Question: Can X be done better?
Result: Build a Y
Validation: "Look, it works!"
12
Software Engineering Research: 2 Other good Plans
Plan 1
Question: Can X be done at all?
Result: Build a Y that does X
Validation: "Look, it works!"
Plan 2
Question: Is X always true of Y?
Result: Formally model Y and prove X
Validation: Check the proof
13
Goal Question Metric (GQM) Paradigm
14
Goal Question Metric (GQM) Paradigm: Example
| Goal 1 | Purpose, Issue, Object (process), Viewpoint | Maintain a maximum level of customer satisfaction from the Help Desk user’s viewpoint |
| Question 1 | What is the current help desk ticket trend? | |
| Metric 1 | Number of help desk tickets closed | |
| Metric 2 | Number of new help desk tickets | |
| Metric 3 | % tickets outside of the upper limit | |
| Metric 4 | Subjective rating of customer satisfaction | |
| Metric 5 | Number of new help desk tickets open | |
| Question 2 | Is the help desk satisfaction improving or diminishing? | |
| Metric 6 | Number of help desk calls abandoned | |
| Metric 7 | Number of help desk calls answered | |
| Metric 8 | Number of help desk calls sent to voicemail | |
| Metric 9 | Subjective rating of customer satisfaction | |
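A GQM tree such as the help desk example can be kept in a small data structure, which makes it easy to check that every question is backed by at least one metric. This is only an illustrative sketch: the names `Goal`, `Question`, and `unmeasured_questions` are assumptions, the split of the goal sentence into four facets is one plausible reading, and the second question is deliberately left without metrics to show the check.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list[str] = field(default_factory=list)


@dataclass
class Goal:
    purpose: str
    issue: str
    obj: str          # the object (process) the goal is about
    viewpoint: str
    questions: list[Question] = field(default_factory=list)

    def statement(self) -> str:
        # Join the four goal facets into one readable sentence.
        return f"{self.purpose} {self.issue} {self.obj} {self.viewpoint}"


def unmeasured_questions(goal: Goal) -> list[str]:
    """Return the questions that have no metric attached yet."""
    return [q.text for q in goal.questions if not q.metrics]


help_desk = Goal(
    purpose="Maintain",
    issue="a maximum level of",
    obj="customer satisfaction",
    viewpoint="from the Help Desk user's viewpoint",
    questions=[
        Question(
            "What is the current help desk ticket trend?",
            ["Number of help desk tickets closed", "Number of new help desk tickets"],
        ),
        # Left without metrics on purpose, so the check below reports it.
        Question("Is the help desk satisfaction improving or diminishing?"),
    ],
)

print(help_desk.statement())
print(unmeasured_questions(help_desk))
```

A question that shows up in `unmeasured_questions` cannot be answered from data yet, which is a hint that the tree is incomplete.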
“If you cannot measure it, you cannot improve it.” (Lord Kelvin)
15
Experiment Definition
Definition
Experiment Definition:
Determine study goal(s)
The Goal Template:
Analyze <a process, product, method, model>
For the purpose of <characterizing, understanding, evaluating, predicting, improving>
With respect to their <Quality Focus>
From the point of view of <Developer, Customer, Manager>
In the context of <other context factors that may affect outcomes>
16
Experiment Definition: Example
The Goal Template:
Analyze <PBR and Checklist Technique>
For the purpose of <Evaluating>
With respect to their <Effectiveness and Efficiency>
From the point of view of <Researchers>
In the context of <Students reading requirement documents>
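The goal template can also be filled in programmatically, so that every study goal is forced to state all five slots. A minimal sketch, assuming a hypothetical `GoalDefinition` class (not part of any standard tool); the values are taken from the PBR and checklist example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalDefinition:
    object_of_study: str  # a process, product, method, or model
    purpose: str          # characterizing, understanding, evaluating, ...
    quality_focus: str
    viewpoint: str        # developer, customer, manager, ...
    context: str

    def as_sentence(self) -> str:
        # Render the five template slots as one goal sentence.
        return (
            f"Analyze {self.object_of_study} "
            f"for the purpose of {self.purpose} "
            f"with respect to their {self.quality_focus} "
            f"from the point of view of {self.viewpoint} "
            f"in the context of {self.context}."
        )


goal = GoalDefinition(
    object_of_study="the PBR and checklist techniques",
    purpose="evaluating",
    quality_focus="effectiveness and efficiency",
    viewpoint="researchers",
    context="students reading requirement documents",
)
print(goal.as_sentence())
```

Because all fields are required, leaving out the viewpoint or the context raises an error instead of silently producing an underspecified goal.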
Example
TASK: Software-development process management
PROBLEM: During the software testing phase many anomalies were discovered and it is suspected that the software quality would not reach a satisfactory level by the shipping deadline.
QUESTION: Construct a GQM tree that helps you to decide when it would be possible to ship the software.
17
18
Example – sentence format
Analyze the unit test process to understand the impact of adding additional tests to project K from the viewpoint of the project manager
19
Experiment Planning and Design
[Figure: planning and design process. Input: experiment definition. Steps: context selection, hypothesis formulation, variable selection, selection of subjects, design, instrumentation, validity evaluation. Output: experiment implementation.]
20
Experiment Planning and Design: Context Selection
Context Selection:
Off-line vs. on-line
Students vs. Professionals
Toy vs. Real problems
21
Experiment Planning and Design
[Figure: planning and design roadmap. Input: experiment definition. Steps so far: context selection, hypothesis formulation.]
22
Experiment Planning and Design: Hypothesis Formulation
A null hypothesis (no real underlying trend or pattern) is stated together with an alternative hypothesis.
The objective is to reject the null hypothesis with as high a significance as possible.
23
Experiment Planning and Design: Hypothesis Formulation (Example)
Null hypothesis
There is no difference in code quality between code produced using clean-room and code produced using our current process
Alternative hypothesis
The quality of code produced using clean-room is higher than the quality of code produced using our current process
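One way to test such a pair of hypotheses is a one-sided permutation test on the difference of group means. The sketch below is illustrative only: the function name `permutation_p_value` is an assumption, the quality scores are made up, and the slides do not prescribe this particular test.

```python
import random
from statistics import mean


def permutation_p_value(treatment, control, n_permutations=10_000, seed=0):
    """Estimate a one-sided p-value for H1: mean(treatment) > mean(control)."""
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        if diff >= observed:
            at_least_as_extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up quality scores (higher is better), for illustration only.
cleanroom = [82, 88, 91, 79, 85, 90, 87, 84]
current = [75, 80, 78, 83, 72, 77, 81, 74]

p = permutation_p_value(cleanroom, current)
alpha = 0.05
print(f"p = {p:.4f}; reject H0 at alpha = {alpha}: {p < alpha}")
```

The test is one-sided because the alternative hypothesis claims a direction (clean-room quality is higher), not merely a difference.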
24
Experiment Planning and Design
[Figure: planning and design roadmap. Input: experiment definition. Steps so far: context selection, hypothesis formulation, variable selection.]
25
Experiment Planning and Design: Variable Selection
[Figure: the process under study drawn as a box; the independent variables are its inputs and the dependent variable is its output.]
26
Experiment Planning and Design
[Figure: planning and design roadmap. Input: experiment definition. Steps so far: context selection, hypothesis formulation, variable selection, selection of subjects.]
27
Experiment Planning and Design
[Figure: planning and design roadmap. Input: experiment definition. Steps so far: context selection, hypothesis formulation, variable selection, selection of subjects, design.]
28
Experiment Planning and Design: Design
1. Design principles
Randomization
Blocking (e.g., on experience)
Balancing (same number of subjects in each group)
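The three design principles can be combined in one assignment routine: subjects are grouped by experience (blocking), shuffled within each block (randomization), and dealt out round-robin to the treatments (balancing). A minimal sketch under stated assumptions: `assign_groups`, the subject ids, and the experience levels are all hypothetical.

```python
import random
from collections import defaultdict


def assign_groups(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments, blocked on experience.

    `subjects` maps a subject id to a blocking level, e.g. "junior".
    Within each block, group sizes differ by at most one.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject, level in subjects.items():
        blocks[level].append(subject)  # blocking

    assignment = {t: [] for t in treatments}
    for level in sorted(blocks):
        members = sorted(blocks[level])
        rng.shuffle(members)  # randomization within the block
        # A random starting treatment avoids always favoring the first one
        # when a block does not divide evenly.
        offset = rng.randrange(len(treatments))
        for i, subject in enumerate(members):
            treatment = treatments[(i + offset) % len(treatments)]
            assignment[treatment].append(subject)  # balancing
    return assignment


subjects = {
    "s1": "junior", "s2": "junior", "s3": "junior", "s4": "junior",
    "s5": "senior", "s6": "senior", "s7": "senior", "s8": "senior",
}
groups = assign_groups(subjects, ["PBR", "checklist"], seed=42)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

With four juniors and four seniors, each treatment group receives two of each, so experience cannot be confounded with the treatment.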
29
Experiment Planning and Design: Design
2. Design types
A large number of standard designs exist. Select an appropriate design type depending on the treatments, the number of subjects, and the objective (hypothesis) of the experiment.
30
Experiment Planning and Design
[Figure: planning and design roadmap. Input: experiment definition. Steps so far: context selection, hypothesis formulation, variable selection, selection of subjects, design, instrumentation.]
31
Experiment Planning and Design: Instrumentation
Objects
Guidelines
Measurement Instruments
32
Experiment Planning and Design
[Figure: planning and design roadmap. Input: experiment definition. Steps: context selection, hypothesis formulation, variable selection, selection of subjects, design, instrumentation, validity evaluation. Output: experiment implementation.]
Construct Validity
whether the theoretical constructs are interpreted and measured correctly
Internal Validity
focuses on the study design, and particularly whether the results really do follow from the data
External Validity
whether claims for the generality of the results are justified
Reliability
whether the study yields the same results if other researchers replicate it
Experiment Planning and Design: Validity Evaluation
For empirical work to be acceptable as a contribution to scientific knowledge, the researcher needs to convince readers that the conclusions drawn from an empirical study are valid. Not surprisingly, the criteria by which researchers judge validity depend on their philosophical stance.
For positivists, research is normally theory-driven. The key steps include deriving study propositions from the theory, designing the study to address the propositions, and then drawing more general conclusions from the results. Each of these steps must be shown to be sound. Accordingly, positivists usually identify four criteria for validity:
• Construct validity focuses on whether the theoretical constructs are interpreted and measured correctly. For example, if Jane designs an experiment to test her claims about the efficiency of fisheye views, will she interpret “efficiency” in the same way that other researchers have, and does she have an appropriate means for measuring it? Problems with construct validity occur when the measured variables don’t correspond to the intended meanings of the theoretical terms.
• Internal validity focuses on the study design, and particularly whether the results really do follow from the data. Typical mistakes include the failure to handle confounding variables properly, and misuse of statistical analysis.
• External validity focuses on whether claims for the generality of the results are justified. Often, this depends on the nature of the sampling used in a study. For example, if Jane’s experiment is conducted with students as her subjects, it might be hard to convince people that the results would apply to practitioners in general.
• Reliability focuses on whether the study yields the same results if other researchers replicate it. Problems occur if the researcher introduces bias, perhaps because the tool being evaluated is one that the researcher herself has a stake in.
These criteria are useful for evaluating all positivist studies, including controlled experiments, most case studies and survey research. In reporting positivist empirical studies, it is important to include a section on threats to validity, in which potential weaknesses in the study design as well as attempts to mitigate these threats are discussed in terms of these four criteria. This is important because all study designs have flaws. By acknowledging them explicitly, the researchers show that they are aware of the flaws and have taken reasonable steps to minimize their effects.
34
Empirical Software Engineering: How to use empirical research in software engineering?
Definition
Planning and Design
Definition: Determine study goal(s)
Planning and Design:
Research hypothesis(es).
Select type of empirical study to be employed.
Operationalize study goal(s) and hypotheses.
Make study plan: what needs to be done by whom and when.
Prepare material required to conduct the study
Execution: Run study according to plan and collect required data
Execution
35
Experiment Execution
[Figure: experiment execution. It follows experiment definition and experiment planning and design, consists of training subjects, running the treatment, and data collection, and feeds data analysis and interpretation.]
36
Empirical Software Engineering: How to use empirical research in software engineering?
Definition
Planning and Design
Execution
Analysis
Definition: Determine study goal(s)
Planning and Design: Formulate research hypothesis(es). Select type of empirical study to be employed. Operationalize study goal(s) and hypotheses. Make study plan: what needs to be done by whom and when. Prepare material required to conduct the study.
Execution: Run study according to plan and collect required data
Analysis: Analyze collected data to answer operationalized study goals and hypotheses
37
Data Analysis and Interpretation
[Figure: data analysis and interpretation. It follows experiment definition, planning and design, and execution, consists of descriptive statistics, hypothesis testing, and discussion and conclusion, and feeds packaging.]
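Before any hypothesis test, the collected data are usually summarized per group with descriptive statistics. A minimal sketch using only the Python standard library; the helper `describe` and the defect counts are made up for illustration.

```python
from statistics import mean, median, quantiles, stdev


def describe(sample):
    """Summarize one group's measurements with common descriptive statistics."""
    q1, _, q3 = quantiles(sample, n=4)  # quartile cut points
    return {
        "n": len(sample),
        "mean": mean(sample),
        "median": median(sample),
        "stdev": stdev(sample),  # sample standard deviation
        "min": min(sample),
        "max": max(sample),
        "iqr": q3 - q1,
    }


# Made-up defect counts per subject, for illustration only.
defects_found = {
    "PBR": [12, 15, 11, 14, 13, 16],
    "checklist": [9, 12, 10, 8, 11, 10],
}
for group, sample in defects_found.items():
    summary = describe(sample)
    print(group, {key: round(value, 2) for key, value in summary.items()})
```

Comparing the median and spread across groups gives a first impression of the effect and helps decide which hypothesis test is appropriate.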
38
Empirical Software Engineering: How to use empirical research in software engineering?
Definition
Planning and Design
Execution
Analysis
Packaging
Definition: Determine study goal(s)
Planning and Design: Formulate research hypothesis(es). Select type of empirical study to be employed. Operationalize study goal(s) and hypotheses. Make study plan: what needs to be done by whom and when. Prepare material required to conduct the study.
Execution: Run study according to plan and collect required data
Analysis: Analyze collected data to answer operationalized study goals and hypotheses
Packaging: Report your studies
39
Experiment Packaging
[Figure: packaging is the final phase, following definition, planning and design, execution, and data analysis and interpretation.]
Report Outline:
Introduction
Problem Statement
Experiment Planning
Experiment Operation
Data Analysis
Interpretation of Results
Discussions and Conclusions
Appendix