Interpreting HTML Markup
It’s easy to imagine the creators of ENIAC or other early computers looking at each
other and saying, “There has to be an easier way.” They had created what was (at the
time) an engineering marvel—an automated computer that could do thousands of
mathematical operations per second. It was significantly faster than any human at doing
math, but there was a problem. The developers had to tell the computer what to do, and
that “programming” process could take weeks for a team of engineers to create and
debug. Any change in the program, even a simple one such as telling it to subtract
instead of add, took hours of changes and testing. And if they had a different math
equation altogether, it required the creation of an entirely new program. There had to be
an easier way.
Fast-forward to today, and it’s impossible to count the number of software titles in
existence. From operating systems to productivity software to games to—you name
it—the breadth and depth of available programs would blow the minds of those early
computer pioneers. One thing hasn’t changed, though: People still need to tell
computers what to do. Computers can’t independently think for themselves. Perhaps
that will change (for better or worse) with advances in artificial intelligence, but for now
computers need instructions, and those instructions are delivered via preprogrammed
software packages. It is the easier way.
This chapter gives you an overview of foundational software development concepts. It’s
not intended to turn you into a programmer—one chapter could not do that task justice.
Rather, at the end of this chapter, you will understand the basics of different
programming languages as well as some concepts and techniques that programmers use
to make their jobs easier. If you find these topics interesting and want to pursue a path
that includes programming, there are ample courses, books, and online materials to
help you get there.
Exploring Programming Languages
All of the software that you use today, whether it’s on your smartphone, a workstation,
or a cloud-based app, was created by a programmer. More than likely, it was created by
a team of them. Small and simple programs might have only a few hundred lines of
code, whereas an elaborate program like Windows 10 reportedly has about 65 million
lines. Now, most developers will tell you that lines of code are a terrible measure for
anything—the goal is to get the desired functionality with as little code as possible—but
the Windows 10 statistic underscores the complexity of some applications.
I use the terms programmers, developers, and coders interchangeably in this chapter.
Just as there are a large number of software titles on the market, numerous
programming languages exist. Each language has its own grammar and syntax, much
like the rules of English are different from the rules of Spanish. Software developers
typically specialize in one or two languages but have a basic foundational understanding
of the popular ones in the marketplace. For example, a coder might identify as a C++
programmer but also know a scripting language like JavaScript. Because of the
similarities between language concepts, the coder can look at code from an unfamiliar
language and generally understand what it does.
In this section, you will learn about four categories of programming languages:
assembly, compiled, interpreted, and query.
Assembly Language
Any programmer writing code is telling the computer—specifically, the processor—what
to do. Assembly language is the lowest level of code in which people can write. That is,
it allows the developer to provide instructions directly to the hardware. It got its name
because after it’s created, it’s translated into executable machine code by a program
called an assembler.
Assembly code is specific to processor architectures. A program written for a 32-bit Intel
chip will look different than one written for an ARM chip, even if the functionality is
identical.
Originally developed in 1947 at the University of London, assembly was the primary
programming language used for quite some time. Operating systems such as IBM’s PC
DOS and programs like the Lotus 1-2-3 spreadsheet software were coded in assembly, as
well as some console-based video games. In the 1980s, higher-level languages overtook
assembly in popularity.
Even though it doesn’t dominate the landscape, assembly is still used today because it
has some advantages over higher-level languages. It can be faster. It’s also used for
direct hardware access, such as in the system BIOS, with device drivers, and in
customized embedded systems. It’s also used for the reverse engineering of code.
Translating high-level code into assembly is fairly straightforward, but trying to
disassemble code into a higher-level language might not be. On the downside, some
virus programmers like it because it’s closer to the hardware.
Understanding Notational Systems
Before I get into how assembly works or what it looks like, it’s important to take a step
back and think about how computers work. Computers only understand the binary
notational system—1s and 0s. Everything that a computer does is based on those two
digits—that’s really profound. When you’re playing a game, surfing the web, or chatting
with a friend using your computer, it’s really just a tremendously long string of 1s and
0s. Recall from Chapter 1, “Core Hardware Components,” the basic organizational
structure of these 1s and 0s. One digit (either a 1 or a 0) is a bit, and eight digits form a
byte.
There are a lot of real-life practical examples of binary systems. Take light switches, for
example. A conventional light switch is either on (1) or off (0). Ignoring dimmable
lighting for a minute, with a traditional switch, the light is in one of two distinct states:
on or off. This is binary, also known as “base 2” because there are two distinct values.
Humans are far more used to counting in base 10, which is the decimal notational
system. In decimal, the numbers 0 through 9 are used. To show the number that’s one
larger than 9, a second digit is added in front of it and the rightmost digit is reset to 0. This
is just a complicated way of telling you something you already know, which is 9 + 1 = 10.
Binary math works much the same way. The binary value 1 equals a decimal value of 1. If
you add 1 + 1 in binary, what happens? The 1 can’t increase to 2, because only 1s and 0s
are allowed. So, the 1 resets to a 0, and a second digit is added in front of it. Thus, in
binary, 1 + 1 = 10. Then, 10 + 1 = 11, and 11 + 1 = 100. If you’re not accustomed to
looking at binary, this can be a bit confusing!
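The carrying rule is easy to check in Python, which can read and write base 2 with its built-in int() and bin() functions. This is only an illustrative sketch; the helper name add_binary is made up for this example.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings and return the sum as a binary string."""
    total = int(a, 2) + int(b, 2)   # int(text, 2) reads the text as base 2
    return bin(total)[2:]           # bin() prefixes "0b"; slice it off


# The three sums from the paragraph above.
for left, right in [("1", "1"), ("10", "1"), ("11", "1")]:
    print(f"{left} + {right} = {add_binary(left, right)}")
```

Running it prints 1 + 1 = 10, 10 + 1 = 11, and 11 + 1 = 100.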
Now think about the structure of a byte, which is eight bits long. If you want to convert
binary to decimal, then the bit’s position in the byte determines its value. Table 6.1
illustrates what I mean.
TABLE 6.1 Converting binary to decimal
Position 8 7 6 5 4 3 2 1
Bit 1 1 1 1 1 1 1 1
Base 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
Value 128 64 32 16 8 4 2 1
If the bit is set to 1, it has the value shown in the value row. If it’s set to 0, then its value
is 0. Using the example from a few paragraphs ago, you can now see how binary 100
equals a decimal 4. The binary number 10010001 is 128 + 16 + 1 = 145 in decimal. Using
one byte, decimal values between 0 and 255 can be represented.
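The place-value rule from Table 6.1 can be written as a short Python function. This is a minimal sketch for illustration; the function name binary_to_decimal is an assumption, not something from the exam objectives.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum the place value of every bit that is set to 1."""
    total = 0
    # The rightmost bit has exponent 0 and is worth 2**0 = 1.
    for power, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** power
    return total


print(binary_to_decimal("100"))        # 4
print(binary_to_decimal("10010001"))   # 128 + 16 + 1 = 145
print(binary_to_decimal("11111111"))   # 255, the largest one-byte value
```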
It’s unlikely that you will be asked to perform binary to decimal conversion on the IT Fundamentals+ (ITF+) exam. It’s a good concept to understand, though, and
it’s material to understanding how assembly works!
To take things one step further, there’s a third system used commonly in programming,
which is the hexadecimal notational system, or base 16. You’ll also see it referred to as
hex. In hex, the numbers 0 to 9 are used, just like in decimal. However, the letters A to F
are used to represent decimal numbers 10 through 15. So, in hex, F + 1 = 10. Aren’t
notational systems fun? The key when dealing with numbers in programming is to
understand clearly which notational system you’re supposed to be using. Exercise 6.1
will give you some
Some programming languages will use the prefix 0x in front of a number to indicate that it’s in hexadecimal. For example, you might see something like 0x16FE. That just means the hex number 16FE. Other languages will use an h suffix, so it would be written as 16FEh.
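Python happens to be one of the languages that uses the 0x prefix, so it makes a convenient place to experiment with hex. The snippet below is just a sketch using built-in functions.

```python
value = int("16FE", 16)      # read the text "16FE" as base 16
same_value = 0x16FE          # Python uses the 0x prefix for hex literals

print(value)                 # 5886
print(value == same_value)   # True
print(hex(0xF + 1))          # 0x10, that is, F + 1 = 10 in hex
print(f"{255:X}")            # FF
print(f"{255:08b}")          # 11111111
```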
Binary, decimal, and hex work great for representing numbers, but what about letters
and special characters? There are notational systems for these as well. The first is
American Standard Code for Information Interchange (ASCII), which is pronounced
ask-e. ASCII codes represent text and special characters on computers and
telecommunications equipment. The standard ASCII codes use seven bits to store
information, which provides for only 128 characters. Therefore, standard ASCII only has
enough space to represent standard English (Latin) uppercase and lowercase letters,
numbers 0 to 9, a few dozen special characters, and (now obsolete) codes called control
codes. Table 6.2 shows you a small sample of ASCII codes. You can find the full table at
www.asciitable.com.
TABLE 6.2 Sample ASCII codes
Dec Hex HTML Character
33 21 &#33; !
56 38 &#56; 8
78 4E &#78; N
79 4F &#79; O
110 6E &#110; n
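The rows of Table 6.2 can be reproduced with Python's built-in ord() and chr() functions, which convert between a character and its numeric code. The helper name ascii_row is made up for this sketch, and the HTML column assumes the numeric reference form &#NN;.

```python
def ascii_row(character: str) -> tuple:
    """Return (decimal, hex, HTML reference, character) for one character."""
    code = ord(character)            # character -> numeric code
    assert chr(code) == character    # numeric code -> character
    return (code, f"{code:02X}", f"&#{code};", character)


for character in ["!", "8", "N", "O", "n"]:
    print(*ascii_row(character))
```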
Covering only the Latin alphabet isn’t very globally inclusive, so a superset of ASCII was
created called Unicode. At the time of writing, Unicode supports 136,755 characters
across 139 language scripts and several character sets. Unicode has several encoding standards.
UTF-8 uses one to four 8-bit bytes per character; its first 128 characters are encoded identically
to ASCII, and it is the most common encoding on the web. UTF-16 uses one or two 16-bit units
(a single unit allows for 65,536 values, covering what’s known as the Basic Multilingual Plane).
UTF-32 uses a fixed 32 bits for every character. All three can represent the full set of characters.
The Unicode table is at unicode-table.com/en.
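Python's str.encode() method shows how many bytes a character occupies in each encoding. The sample characters below are arbitrary choices for illustration, and encoded_sizes is a hypothetical helper name.

```python
def encoded_sizes(character: str) -> tuple:
    """Return the byte length of one character in UTF-8, UTF-16, and UTF-32."""
    utf8 = character.encode("utf-8")
    utf16 = character.encode("utf-16-be")   # "-be" variant adds no byte order mark
    utf32 = character.encode("utf-32-be")
    return (len(utf8), len(utf16), len(utf32))


samples = [
    "A",            # U+0041, plain ASCII letter
    "\u00e9",       # U+00E9, e with acute accent
    "\u20ac",       # U+20AC, euro sign
    "\U0001F600",   # U+1F600, an emoji outside the Basic Multilingual Plane
]

for character in samples:
    print(f"U+{ord(character):04X}", encoded_sizes(character))
```

The ASCII letter needs one byte in UTF-8, while the emoji needs four bytes in every encoding.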
Working with Assembly
Coding in assembly is not for the faint of heart. As I mentioned earlier, you need to
know the version specific to the processor’s platform. In addition, you need to know how
memory segmentation works and how processor codes will respond in protected and
unprotected memory environments. There’s more to it than those few criteria, but
suffice it to say that it’s challenging work.
Let’s start with a simple example, remembering that all computers understand are 1s
and 0s. Say that you have a 32-bit Intel processor and want to move a simple 8-bit
number into a memory register. The binary code to tell the processor to move data is
10110, followed by a 3-bit memory register identifier. For this example, you’ll use the lowest part of the accumulator register (you won’t need to know this for the exam),
which is noted as AL. The code for this register is 000. Finally, you need to tell the CPU the number that you want to move into this register—you’ll use 42. In binary, the
command looks like this:
10110000 00101010
That’s not very user friendly or easy to remember. Using binary to hex conversion, you
can simplify it to the following:
B0 2A
That literally means, “Move into memory register AL the number 42.” It's still not very
user friendly. To help ease the challenge, assembly developers have created mnemonic
codes to help programmers remember commands. For example, the command MOV
(short for move) is a mnemonic to replace the binary or hex code. So, the command can
now be written as follows:
MOV AL, 2Ah ;Move the number 42 (2A hex) into AL
You’ll notice a few things here. The first is that the command MOV AL is much easier to remember than the binary code, and it makes more sense in human terms than does B0. The second is that I added some real words after a semicolon. Most languages allow coders to add comments that are not processed. In assembly, anything on a line after a semicolon is considered a comment and is ignored by the assembler.
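To see how the mnemonic maps back to the raw bytes, here is a toy Python sketch that does a tiny part of an assembler's job for this one instruction. The names encode_mov, MOV_IMM8_OPCODE, and REGISTER_CODES are invented for the example, and only the AL register from the text is included.

```python
MOV_IMM8_OPCODE = 0b10110        # 5-bit code: move an 8-bit value into a register
REGISTER_CODES = {"AL": 0b000}   # only the register used in the text


def encode_mov(register: str, value: int) -> bytes:
    """Build the two machine-code bytes for MOV <register>, <value>."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    # Shift the opcode left three places and drop the register code into the gap.
    first_byte = (MOV_IMM8_OPCODE << 3) | REGISTER_CODES[register]
    return bytes([first_byte, value])


code = encode_mov("AL", 42)
print(" ".join(f"{byte:08b}" for byte in code))   # 10110000 00101010
print(code.hex(" ").upper())                      # B0 2A
```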
So, to summarize the basic structure of a line of code, it contains processor instructions
(“do this”), directives (defining data elements or giving the processor specific ways of
performing the task), data, and optional comments. This is pretty much true for all
programming languages, although the structure and syntax will vary.
As I wrap up this section on assembly, I want to leave you with one small gift. A
tradition in pretty much every programming class is that the first program you are
taught to write is how to display “Hello, world!” on the screen. The way to create this
friendly greeting varies based on the language, so when I cover various languages, I am
going to show you what it looks like or have you do it. The intent isn’t to have you
memorize the code or learn to program but to give you a feel for what the code looks like
for an actual application. So, without further ado, here is “Hello, world!” in all of its
assembly glory:
section .text
    global _start               ;must be declared for linker (ld)
_start:                         ;tells linker entry point
    mov edx,len                 ;message length
    mov ecx,msg                 ;message to write
    mov ebx,1                   ;file descriptor (stdout)
    mov eax,4                   ;system call number (sys_write)
    int 0x80                    ;call kernel
    mov ebx,0                   ;exit status
    mov eax,1                   ;system call number (sys_exit)
    int 0x80                    ;call kernel
section .data
msg db 'Hello, world!', 0xa     ;the message!
len equ $ - msg                 ;length of the string
When the code is assembled and executed, it will display the following on the screen:
Hello, world!
Compiled Languages
High-level languages have replaced assembly as the most commonly used ones in
software development today. When creating a new application, the developer must
decide between a compiled language and an interpreted language. A compiled
programming language is one that requires the use of a compiler to translate it into
machine code. Creating and using a program using a compiled language consists of
three steps:
1. Write the application in a programming language, such as Java or C++. This is
called the source code.
2. Use a compiler to translate the source code into machine code. Most software
development applications have a compiler.
3. Execute the program file, which (in Windows) usually has an .exe extension.
There are dozens of compiled languages a programmer can choose from, but unless
there is a specific need, the programmer is likely going to go with one of the more
common ones. In days past, the choices might have been Fortran, BASIC, or Pascal.
Now, Java, C, C++ (pronounced C plus plus), and C# (pronounced C sharp) are the most
popular. The Linux and Windows kernels are written in C. The rest of Windows is
written mostly in C++, with a bit of custom assembly thrown in for good measure.
Let’s take a look at the source code for “Hello, world!” in Java, which is one of the most popular
programming languages in use today:
public class HelloWorld {
    public static void main(String[] args) {
        // Prints "Hello, world!" in the terminal window.
        System.out.println("Hello, world!");
    }
}
Compare and contrast the Java code to assembly. A few things might jump out. First, the program is a lot shorter. Second, the syntax is different. Java uses braces (the { and }) to indicate code blocks. Notice that for every open brace, there is a corresponding close brace. Single-line comments in Java are preceded with two slashes (//) instead of a semicolon. Even little things—assembly uses single quotes around the words that you want to print while Java uses double quotes—are different. As for the rest of the context, don’t worry about understanding it all right now. Again, the point is to just get a feel for what some basic code looks like.
The next stop on your tour of compiled languages is C++. Let’s look at source code for your new favorite program:
// Header file
#include <iostream>
using namespace std;
// the main function is where the program execution begins
int main()
{
    // the message to the world
    cout << "Hello, world!";
    return 0;
}
There are a few similarities to Java. C++ and Java are both derivatives of the C language, so it makes sense that they would share some features. For example, comments start with two slashes, and braces are present to create blocks of code. Other things that you might have noticed are that both use a main function, double quotes for text, and a semicolon to end a statement.
Finally, take a look at the same program written in C#. C# is also a derivative of C, so some of this code might start to look familiar to you:
using System;
namespace HelloWorld
{
    class Program
    {
        // the main function - also known as a method
        static void Main(string[] args)
        {
            // the message to the world
            Console.WriteLine("Hello, world!");
            Console.ReadLine();
        }
    }
}
The indentation used in programming, in most cases, does not affect the functionality of the program. It’s there to make it easier for the developer (or other people looking at the code) to read. In the C# example, the indentation makes it easy to see that there are three open braces and three corresponding close braces. Again, for the IT Fundamentals+ (ITF+) exam, don’t worry about being able to read or write code in a specific language. These examples are intended to give you a flavor for what a few common languages look like.
Interpreted Languages
The second major classification of modern, high-level programming languages is interpreted languages. With an interpreted programming language, each line of code is read by an interpreter every time the program is executed. Contrast this with compiled languages, in which the source code is compiled once and then executed any number of times.
An interpreter and a compiler essentially do the same thing—they take high-level source code and translate it into low-level machine code. They differ a bit in how they perform their job, however. Table 6.3 outlines some of the key differences.
TABLE 6.3 Interpreter vs. compiler
Task                     Interpreter              Compiler
Translating source code  One statement at a time  Entire program at once
Executing programs       By the interpreter       Creates an executable file (usually .exe)
Analyzing source code    Faster                   Slower
Executing code           Slower                   Faster
Using memory             Lower                    Higher
Debugging                Easier                   Harder
Debugging interpreted code is easier because the source code is read line by line. If the interpreter gets to a line that is written incorrectly, it stops there and generates an error. A compiler will read all of the code, and if something is wrong somewhere, it will generate an error. Some compilers will give you a clue as to the location of the problem, but troubleshooting a compiled language is usually more challenging than troubleshooting an interpreted language.
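The stop-at-the-bad-line behavior can be imitated with a toy interpreter loop in Python. This is a deliberately simplified sketch: run_line_by_line is a made-up helper that feeds one statement at a time to the built-in exec() function, whereas real interpreters (including Python itself, which checks a whole file for syntax errors before running it) are more sophisticated.

```python
def run_line_by_line(source: str) -> list:
    """Execute statements one at a time; stop at the first one that fails."""
    log = []
    namespace = {}
    for number, line in enumerate(source.splitlines(), start=1):
        try:
            exec(line, namespace)          # translate and run this one line
            log.append(f"line {number}: ok")
        except Exception as error:         # SyntaxError is an Exception too
            log.append(f"line {number}: {type(error).__name__}")
            break                          # halt at the faulty line
    return log


# Line 3 is written incorrectly, so line 4 is never reached.
program = "x = 2\ny = x * 21\nz = y +\nprint('never reached')"
for entry in run_line_by_line(program):
    print(entry)
```

The log pinpoints line 3 as the problem, which is the kind of immediate feedback that makes interpreted code easier to troubleshoot.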
There are three types of interpreted languages with which you need to be familiar: markup languages, scripting languages, and scripted languages.
Markup Languages
A markup language is a language that programmers can use to annotate—or mark
up—text to tell the computer how to process or manipulate the text. You might be
familiar with markups in other walks of life. For example, have you ever used a
highlighter or pencil to mark text in a book, such as a school book? That’s an example of
markup, because you are highlighting something important. To create a markup
language, there needs to be a codified set of rules telling the processor what to do with
the marked-up text when it encounters it.
There are multiple classes of markup languages, but the most common application of
them is in the creation of web pages. Hypertext Markup Language (HTML) is the
language in which most web pages are created. HTML allows web developers to format
web pages. It’s sort of a joke among developers that if you know HTML, all that means is
that you’re good at drawing boxes. HTML does allow you to do that in order to lay out a
web page, but HTML developers can do much more than just draw boxes! The current
version is HTML5, but you’ll see it referred to simply as HTML.
HTML is a bit unique among languages in that the pages that contain it are stored on a
server, and the pages themselves are downloaded to a client and then processed by
specialized software. This might sound complicated, but it’s quite likely you’re familiar
with the process. Said differently, a user opens a web browser and visits a website. The
site sends the page to the browser, which interprets the language and displays it
properly.
Another common markup language is Extensible Markup Language (XML).
HTML works by using tags to signify instructions for the browser. Tags take this format:
<TAG> (something) </TAG>
Generally speaking, all tags have an opening tag <> and a closing tag </>. For example, to tell the browser to bold the word penguin, the web page would read <b>penguin</b>. Tags enable the developer to control all elements of the text, such as where the text should appear on the screen, in what font it should appear, the color of the text, and all other features related to its style. Tags are also used to create tables, place images properly, set background colors, and include hyperlinks to other web pages.
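To get a feel for how software picks tags out of a page, the sketch below uses the HTMLParser class from Python's standard library to log each opening tag, closing tag, and piece of text. A real browser does far more than this; the TagLogger class is a hypothetical name used only for illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record each opening tag, closing tag, and piece of text in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        if data.strip():                      # skip whitespace-only text
            self.events.append(("text", data.strip()))


logger = TagLogger()
logger.feed("<b>penguin</b>")
logger.close()
print(logger.events)
# [('open', 'b'), ('text', 'penguin'), ('close', 'b')]
```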
Remember the “Hello, world!” program? You get to create a simple web page displaying
this text in Exercise 6.2.
EXERCISE 6.2
Creating “Hello, world!” in HTML
1. Open Notepad (or another text editor). You can open Notepad by clicking Start, typing the word note, and clicking Notepad when it appears under Best Match.
2. Type the following code into the text editor:
<html>
<head><title>Tab title</title></head>
<body>
Hello, world!
</body>
</html>
3. Save the file to your desktop as hello.html.
4. Double-click the hello.html file. It should open in your browser and look something like Figure 6.2.
FIGURE 6.2 Hello, world!
5. Go back to your hello.html file in the text editor. (If you closed it, right-click, click Open With ➢ Choose Another App ➢ More Apps, and scroll down until you find Notepad.)
6. Change the line that says Hello, world! to the following:
<b><i>Hello, world!</i></b>
7. Save the file.
8. Open hello.html again in a web browser. Notice the change in the text.
9. Open the file in the text editor again, and change the <body> line to the following:
<body bgcolor="#DDEA11">
10. Save the file.
11. Open hello.html again in a web browser. What change do you see?
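The value DDEA11 in step 9 is a hexadecimal color code in which each pair of digits sets the red, green, or blue level. A few lines of Python can unpack it into decimal values; hex_to_rgb is a made-up helper name for this sketch.

```python
def hex_to_rgb(color: str) -> tuple:
    """Split a six-digit hex color into (red, green, blue) values 0-255."""
    color = color.lstrip("#")
    if len(color) != 6:
        raise ValueError("expected six hex digits")
    # Digits 0-1 are red, 2-3 are green, and 4-5 are blue.
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#DDEA11"))   # (221, 234, 17)
print(hex_to_rgb("00FF00"))    # (0, 255, 0)
```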
In Exercise 6.2, step 9, the code DDEA11 is a hexadecimal code to indicate color. The first two characters are for red, the second two are for green, and the third two are for blue. For example, 00FF00 is true green, and 0000FF is true blue. Learning about hex was important! You can play around with hex color codes on several different websites, such as www.color-he
Scripting Languages
The second category of interpreted languages is scripting languages. For much of the
history of computers, operating system interfaces were a simple command prompt, and
only one task could be executed at a time. Scripting languages came along and were used
for executing a list of tasks. One of the earlier common scripting languages was the
Bourne again shell (Bash), which also happened to be a popular command interface (or
shell) for UNIX-based operating systems. Essentially, someone could create a file that
contained multiple actions to perform and then execute the file.
Scripting languages today have evolved a great deal, but for the most part are still
designed to create simple programs that execute a list of tasks or get data from a data
set. Their advantage is that they’re less code-intensive than the compiled language
counterparts; you can get more done with less code. In addition, modern scripting
languages support the use of objects, variables, and functions, which I’ll talk about in
the “Understanding Programming Concepts and Techniques” section later in this
chapter. Some of the most popular scripting languages are JavaScript (JS), Visual Basic
(VB) Script, PHP, Perl, and Python.
Perhaps the most common use of scripting languages is to execute tasks from within a
web page written in HTML. Remember that a markup language is designed to present
and format information—it doesn’t really “do” anything else like execute a program. So,
developers will insert a script to execute the tasks they need the website to perform.
Here’s an example, with some code you have seen before:
<html>
<head><title>Tab title</title></head>
<body>
<script>
alert('Hello, world!');
</script>
</body>
</html>
When you execute this web page, you will see something like Figure 6.3.
FIGURE 6.3 “Hello, world!” JavaScript alert
Notice that an alert box pops up, because the alert method was used. You can get the
message to appear in the browser window by using document.write('Hello, world!'); instead. One question you might have is, how did the browser know that it’s reading JavaScript and not some other script? The <script> tag tells the browser that a script is coming; by default, the browser assumes that it’s JavaScript.
The syntax for Python is even simpler. All that’s needed is the following:
print("Hello, world!")
As I’ve said before, don’t worry about memorizing the code or specific methods. Know a
few examples of scripting languages, and remember that they are usually short, don’t
need a compiler, and typically execute a list of tasks.
Scripted Languages
While the terms scripted languages and scripting languages are similar, they do
different things. A scripted language doesn’t operate on its own as scripting languages
do, but instead it needs a command interpreter to be built into the program. Scripted
languages are often used to modify video games—that is, to add functionality above and
beyond what the initial developers created—without altering the core of the game itself.
Common scripted languages include Lua and Lisp.
Query Languages
Of the four language categories, query languages are the most unlike the others. The
term query is synonymous with question, and a query language is specialized to ask
questions. Specifically, query languages are designed to retrieve data from databases.
The most common query language, by far, is Structured Query Language (SQL). Another
example is the Lightweight Directory Access Protocol (LDAP), which is designed to
query directory services such as Microsoft’s Active Directory.
Microsoft’s Active Directory is beyond the scope of this book, but essentially it’s the database on a Microsoft server that stores all of the information about
users and security.
Basic SQL syntax is straightforward. To ask the database for data, a SELECT statement is used, followed by the specifics of what you want and where you want it from. Queries
often follow a structure like this:
SELECT column_name
FROM table_name
WHERE condition
While the basic syntax is relatively simple, it can get a lot more complicated. SQL allows
for insertion of data into tables, joining data from multiple tables, and several different
types of operators, such as finding the minimum and maximum, counting, finding
averages, and summing values. Interfacing with databases is IT Fundamentals+ (ITF+)
Exam Objective 5.3, and it will be covered in more detail in Chapter 7, “Database
Fundamentals.”
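For a hands-on feel, the SELECT, FROM, WHERE pattern can be tried with the sqlite3 module that ships with Python. The employees table and its rows below are invented purely for illustration, and the database lives only in memory.

```python
import sqlite3

# In-memory database with a made-up table, purely for illustration.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)"
)
connection.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "IT", 72000), ("Ben", "Sales", 58000), ("Cho", "IT", 80000)],
)

# SELECT column_name FROM table_name WHERE condition
it_names = connection.execute(
    "SELECT name FROM employees WHERE department = ? ORDER BY name", ("IT",)
).fetchall()
print(it_names)   # [('Ana',), ('Cho',)]

# Operators for minimum, maximum, count, average, and sum.
summary = connection.execute(
    "SELECT MIN(salary), MAX(salary), COUNT(*), AVG(salary), SUM(salary) "
    "FROM employees"
).fetchone()
print(summary)    # (58000, 80000, 3, 70000.0, 210000)

connection.close()
```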
One Is the Loneliest Number
Throughout this chapter, I’ve talked about different categories of programming
languages and what they are used for. In the real world, most programs actually use
code from multiple languages to complement each other’s functionality. Some languages
are simply better at certain tasks than others.
For example, as mentioned earlier, Windows 10 uses three languages: the kernel is
written in C, most of the code is in C++, and some custom Intel assembly is also
included. If any application needs to get data from a database, it’s most likely using SQL
in addition to the language in which it was coded. Websites use HTML and most likely a
scripting language, such as JavaScript or Python. YouTube uses both JavaScript and
Python.
Using a single language for a program seems to be more of the exception than the norm.
If you want to get into coding, it’s good to have proficiency in multiple languages to
expand your effectiveness.
Understanding Programming Concepts and Techniques
A programmer’s goal is to get a computer to do what he or she wants it to do. Think for just a minute about the variety of things that different programs do. Operating systems are massive and complex and perform thousands of different tasks, from accepting input and providing output to managing and monitoring devices to establishing connections to remote computers. Each of these tasks is handled by different parts of the program, so you can see why Windows reportedly has 65 million lines of code. Even relatively simple programs can easily contain several thousand lines.
One of the ways that programmers can simplify their work is to reuse sections of code. For example, imagine that a program needs to perform a mathematical calculation. The developer would write code for it. Later in the program, if the same calculation is needed, the programmer can simply reference the previous section of code as opposed to needing to rewrite the whole thing. Many programs are kind of like Frankenstein in nature. Blocks of code perform specific tasks, and the developer figures out how to stitch them all together into a finished product. Sometimes it’s elegant, sometimes it’s ugly, and sometimes it’s a bit of each.
In this section, you will learn about concepts that developers use to make their tasks easier. First up is programming logic. This refers to what the program does and includes topics such as logic components, data types, and identifiers. Then you will learn about organizational techniques such as flowcharts, pseudocode, containers, functions, and objects.
Programming Logic
Processors perform math and logic operations, so it figures that all programs are made up of logic and arithmetic. For example, processors can add or subtract numbers or compare two values to
each other and determine an action to take based upon the result. Essentially, the job of the programmer is to tell the processor what to do based on the results of a logic puzzle.
The two main ways that programs perform logic are branching and looping.
Branching
People use a lot of branching logic in their daily lives. For example, if you’re driving a
car, your brain probably follows a simple process when it comes to traffic lights. If the
light is red, you stop. From a programmatic standpoint, this is represented as a simple
if... then statement. If a certain condition exists (red light), then take a specific action (stop the car). “But, wait!” you might say, “What if the light is green?” The logic
then used is called if... then, else. If the light is red, stop, else go. This is an example with only one choice to make based on two discrete conditions. Branching can
handle many more conditions as well.
For example, most traffic lights have an amber light too, telling you to slow down
because a red light is imminent. The processor now needs to take an action based on one
of three conditions. You can tell it to do that by using else if statements. Here’s an example:
if light = red, then stop
else if light = green, then go
else if light = amber, then slow down
This pseudocode isn’t for a real programming language, but it illustrates how the logic
works. Computers, as you know, deal only in data and not stoplights. Instead of
comparing the colors of a stoplight, a processor compares two different pieces of data.
Table 6.4 lists common data types with which you need to be familiar.
TABLE 6.4 Common data types
Type     Explanation                                                                Examples
Char     One character, such as a UTF-16 or UTF-32 character                        A or a
String   Zero or more characters                                                    “This is a string” and “S0 is th1$.”
Integer  Whole number with no decimal point                                         5 or 500000
Floats   Any number with a decimal place                                            5.2 or 5.000001
Boolean  A true or false condition, usually represented by a 1 (true) or 0 (false)  1 or 0
There can be some overlap between the definitions, meaning that some data could
appear to be multiple data types. For example, a string can consist of only one
character, making it look like a char. Numbers designated as floats might not have a
decimal place, whereas an integer definitely does not have any decimal places. A
Boolean data type is always true or false; if its output is represented as a 1 or 0, it could
look like an integer. Regardless of what it looks like, the data type is what it was defined
to be within a program.
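To see that overlap in practice, here is a small Python sketch. The variable names and values are made up for illustration. Note one assumption that differs from the table: Python has no separate char type, so a single character is simply a string of length one.

```python
# Made-up values, one for each data type discussed above.
letter = "A"              # no separate char type in Python: a one-character string
sentence = "S0 is th1$."  # string
count = 5                 # integer
price = 5.0               # float, even though the fractional part is zero
is_ready = True           # Boolean

# Show each value next to the name of its data type.
for value in (letter, sentence, count, price, is_ready):
    print(repr(value), type(value).__name__)

# Values can look alike and still be different data types.
looks_equal = (count == price)             # 5 and 5.0 compare as equal
same_type = (type(count) is type(price))   # but int and float are different types
bool_as_int = (is_ready == 1)              # True compares as equal to 1
print(looks_equal, same_type, bool_as_int)
```

The float 5.0 and the integer 5 compare as equal, yet each keeps the data type it was given when it was defined.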
Boolean values are particularly important because they often directly control computer
logic. If the processor compares two values and the result of the comparison is true
(for example, random integer 1 is greater than random integer 2), then the program will
follow a specific path. If the result is false (random integer 1 is not greater than random
integer 2), then the program will go down a different path.
Here’s another example using if, else if statements, this time comparing integers. Assume that you’re creating a program that needs to categorize people based on their
age. If the person is younger than 13, they are a child. Anyone younger than 20 is a teen,
younger than 65 is an adult, and anyone 65 or older is a senior. What would that logic look
like? It could look something like this:
if age < 13, then category “Child”
else if age < 20, then category “Teen”
else if age < 65, then category “Adult”
else category “Senior”
Based on an input to the program, the person will be categorized appropriately.
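In a real language, the same logic might look like the following Python sketch. The function name categorize and the sample ages are my own choices for illustration. Python spells else if as elif.

```python
def categorize(age):
    """Return the age category for an age given in whole years."""
    if age < 13:
        return "Child"
    elif age < 20:   # reached only when age is 13 or more
        return "Teen"
    elif age < 65:   # reached only when age is 20 or more
        return "Adult"
    else:            # everything else: 65 or older
        return "Senior"


# Try a few sample ages, including the boundary values.
for sample_age in (8, 13, 19, 64, 65):
    print(sample_age, categorize(sample_age))
```

The order of the tests matters. Each else if is checked only when every test above it was false, which is why the second test does not need to say "13 or older."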
There’s one other concept to introduce here, and it’s that of identifiers. In the example
of age classification, the numbers defining the categories are set. A set or predefined
number like this is called a constant. As you might imagine based on its name, this
means that it doesn’t change. The other type of identifier is a variable, which simply
means that it can change.
Imagine a scenario where instead of putting people into a category based on their age, the program is designed to determine who is older. The age values of two people will be entered, and the program is supposed to say if person 1 is older than person 2. Since the ages of the people are not predetermined, they are variable. The logic could look something like this:
if age (person 1) > age (person 2), then “Older”
else if age (person 1) = age (person 2), then “Same”
else “Younger”
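Here is one way that comparison could be written in Python. The function and parameter names are assumptions made for this sketch. The two ages are variables because their values are not known until the function is called.

```python
def compare_ages(age_person_1, age_person_2):
    """Say whether person 1 is older than, the same age as, or younger than person 2."""
    if age_person_1 > age_person_2:
        return "Older"
    elif age_person_1 == age_person_2:   # == tests equality; a single = assigns a value
        return "Same"
    else:
        return "Younger"


print(compare_ages(40, 35))
print(compare_ages(35, 35))
print(compare_ages(20, 35))
```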
Variables can also change throughout a program. For example, think of a program designed to count sums of money. A variable can be defined early in the program to accept the first input. Then, when the second amount of money is input (again, a variable amount), the program adds the two values together and uses the sum as a third variable.
Branching logic statements are used to compare variables and constants to other variables and constants and can be used across different data types. The Boolean output of the comparison (true or false) is used to determine the path that the program takes. Branches are good for simple logical comparisons, but more complicated activities might require different logic.
Looping
As its name implies, looping logic is circular rather than linear like branching logic. At the center of looping is the while statement. Looping is useful for monitoring a state within a program, and then invoking an action when that state changes. Think back to the simple stoplight example given earlier. In human terms, the logic can basically be, “While the stoplight is red, stop; otherwise, go.” While the light is red, you keep repeating the loop until the condition changes, and once it changes, you take a different action.
You can also use looping for a counting function. For example, while x < 10 is true, count x + 1. This pseudocode example uses a few concepts you’ve learned in this chapter, such as variables (x) and Boolean (true).
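The counting example can be sketched in Python as follows. The starting value of 0 is an assumption, since the pseudocode does not say where x begins.

```python
x = 0              # assumed starting value
while x < 10:      # the Boolean test is checked before every pass through the loop
    x = x + 1      # without this line, x < 10 would never become false
print(x)
```

When the loop ends, x holds 10, because that is the first value for which the test x < 10 is false.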
Loops can be powerful tools, but they can also be problematic. If not coded properly, the program can end up in an infinite do while loop. (Sometimes you hear people say that they don’t want to get stuck in a “do loop,” and this is where that reference comes from.) An example would be using a loop to count but then neglecting to tell it when to stop. To avoid this, the developer must ensure that they provide proper exit conditions for the loop.
Loops and branches can be (and often are) used together to perform more complex operations.
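As a minimal sketch of a loop and a branch working together, the following Python code returns to the stoplight example. The list of light readings is made up, standing in for a sensor that a real program would check. The loop has two exit conditions, so it cannot run forever.

```python
# Made-up sequence of light readings, one per pass through the loop.
readings = ["red", "red", "red", "green"]

position = 0
waits = 0
action = "stop"
while position < len(readings):    # exit condition 1: the readings run out
    light = readings[position]
    if light == "red":
        waits += 1                 # still red, so keep waiting
    elif light == "amber":
        action = "slow down"
    else:
        action = "go"
        break                      # exit condition 2: the light changed to green
    position += 1

print(waits, action)
```

The while statement keeps monitoring the state, and the branch inside it decides what to do on each pass.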
Organizing Code
Earlier in this chapter, I mentioned that programs often have a Frankenstein feel to them. This is because when programmers sit down to write code, they think about it in terms of the tasks to be done and what it takes to accomplish those tasks. This compartmentalization is one of the key organizing principles that developers follow. Breaking something like Windows down into manageable chunks lets the developer focus on exactly what is needed for a small, discrete task rather than feeling overwhelmed by the enormity of it all.
As you can imagine, organizing the code is important, but it’s also important for the developer to be organized outside of the code; that is, to lay out a blueprint of how the program will function so the developer can construct it or to have people on the team build various parts of it. The following sections cover different organizational concepts.
Flowcharts
A flowchart is a visual representation of a program that uses boxes to represent the logic. Flowcharts are critical in the software development process. They should be created before the code is developed, much like blueprints should be drawn before building a house. They help the developers visualize the flow of the program, making it easier to plan out the sections of code needed.
Flowcharts show the sequence of operations within a program, including where data input is needed as well as decisions to be made and the logic choices. Different shapes are used to indicate different components. For example, a rectangle is used to indicate a process, whereas a diamond is used to show a decision point.
There are various flowchart software packages on the market, and you can also use Microsoft Word or PowerPoint to create simple ones. Figure 6.4 shows a simple flowchart for a program designed to find the smallest of three values.
FIGURE 6.4 Flowchart
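To connect the flowchart to code, here is a short Python sketch of a program that finds the smallest of three values. The order of the comparisons is an assumption on my part and may not match the decision diamonds in the figure exactly. Each if statement plays the role of one decision point.

```python
def smallest_of_three(a, b, c):
    """Return the smallest of three values using two decision points."""
    smallest = a          # process: start by assuming the first value is smallest
    if b < smallest:      # decision: is the second value smaller?
        smallest = b
    if c < smallest:      # decision: is the third value smaller?
        smallest = c
    return smallest


print(smallest_of_three(3, 1, 2))
```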
Pseudocode
True to its name, pseudocode is literally fake code. It’s fake in the sense that it’s not read
by the processor and has no effect on the functioning of the program. It can be helpful to
people trying to read the code, however.
I’ve used the term a few times in this chapter already, in describing examples of logic.
The pseudocode I wrote wouldn’t be recognized by any interpreter or compiler, but it
was intended for ease of reading and understanding. Another example of pseudocode is
for comments in programs. Comments can be used to describe the purpose of a line or
block of code or to show a mathematical formula that might be incredibly complicated
and hence harder to read in the actual code.
A joke among programmers is that some modern scripting languages, like Python, use such simple syntax that you could write a Python script in pseudocode and
it would still work.
Pseudocode can be particularly helpful in situations where a team of programmers is
responsible for one program. It can be challenging for a developer who didn’t write the
code to figure out what’s wrong if there’s a problem. Having well-annotated code can
help the troubleshooter understand what the original developer intended when creating
the code.
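Here is a small illustration of annotated code in Python, where comments begin with the # symbol. The function and its formula (the straight-line distance between two points) are simply an example I chose. The interpreter ignores the comments, but the next developer will not.

```python
import math


def distance(x1, y1, x2, y2):
    # Purpose: find the straight-line distance between points (x1, y1) and (x2, y2).
    # Formula: distance = square root of ((x2 - x1) squared + (y2 - y1) squared)
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


print(distance(0, 0, 3, 4))
```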
Containers
In real life, containers hold things—in computer programming, they do the same thing.
Earlier in this chapter you learned about variables, which are values that can change
based on different input or conditions within a program. When developers define a
variable, they’re allowed to specify only one value for that variable. Using containers,
multiple values of similar types can be grouped together and accessed at the same time.
There are two types of containers about which you should be aware: arrays and vectors.
An array is simply a list of values. There are a few key defining factors to an array. First,
all of the elements in an array must be of the same data type. Second, the array is
predefined in size and does not change.
A vector also holds a list of values. However, the values do not need to be of the same
data type, and vectors can be dynamically allocated, meaning that they can shrink or
grow as the program requires. Otherwise, vectors behave very similarly to arrays (in the
sense that they store values and allow for their retrieval). Because vectors are
dynamically sized, they are far more versatile than arrays and are the preferred
container type.
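The following Python sketch shows the two ideas side by side, with some caveats. Python does not have a container named vector, so a Python list stands in for one here. Python's array module enforces a single data type, but unlike the fixed-size array described above, it can still grow, so the fixed-size rule is not demonstrated. The names and values are made up.

```python
from array import array

# Typed array: every element must be an integer ("i" is the type code for int).
scores = array("i", [90, 85, 70])
try:
    scores.append("ninety")    # wrong data type for this array
    rejected = False
except TypeError:
    rejected = True            # the array refused the string

# A Python list, standing in for a vector: it grows and shrinks as needed.
items = [90, 85]
items.append(70)               # grow
items.append("bonus")          # a different data type is accepted
removed = items.pop()          # shrink by removing the last value

print(rejected, items, removed)
```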
Functions
Developers often employ sections of reusable code, and that is what functions are. When
a particular set of instructions is needed, a function block can be created to accomplish
the task. Whenever that task is needed, the program references that function. Functions
are generally designed to take input, transform it somehow, and deliver output.
Functions are linear in nature, meaning that they take input, process it, and then deliver
output. It’s not to say they can’t have looping logic in them but rather that the function
starts at the beginning of the code block and finishes at the end, handing off to another
process.
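As a minimal sketch, here is a Python function that follows that pattern. The temperature conversion is just an example I picked. The function is written once and then referenced twice.

```python
def to_fahrenheit(celsius):
    """Take input (Celsius), transform it, and deliver output (Fahrenheit)."""
    return celsius * 9 / 5 + 32


# Reuse: the same block of code is called wherever the calculation is needed.
freezing = to_fahrenheit(0)
boiling = to_fahrenheit(100)
print(freezing, boiling)
```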
Here’s an analogy to help illustrate how functions work—the command prompt on older
operating systems. The command prompt just waits for user input. The computer won’t
perform any other tasks while it sits and waits, and it has an indefinite supply of
patience (unless, of course, it loses power). A user enters a command, and then the
computer springs into action. It performs one or more tasks based on the command.
Maybe it opens a file, changes permissions on a folder, or creates a network connection
with a server. It completes that task, perhaps produces some output, and then gives the
user another command prompt. It’s then ready for another task, and it will be patient.
Objects
Since I just provided an analogy for how functions work, it’s a good time to do the same
for objects. As you’re probably aware, modern operating systems do not provide a
simple command prompt for input. In fact, the graphical user interface (GUI) of an
operating system has dozens of ways in which you can “enter” into functionality with it.
There are icons on the screen, some sort of launcher like the Start button, more icons on
the task bar, and a clock and random items in the system tray. You can click any of them
and make something happen, or you can right-click the screen and make something
happen too. Even better, you can click multiple items and have several processes
running at once. (Okay, technically, a single processor core runs only one process at a time, but it switches
back and forth so quickly that it appears to us like they all run at the same time.)
How does this analogy relate to an object? In a few ways. First, each item on your
desktop with which you can interact is considered its own object, each with its own set
of properties and attributes. Second, thinking of the GUI, you can begin interacting with
it at any number of entry points and stop interacting with it at any number of places as
well. There isn’t just one specific entry and exit point.
Now that you have a rough idea of how objects work, here’s the definition: Objects are
collections of attributes, properties, and methods that can be queried or called upon to
perform a task. Said differently, an object can be a variable, function, method, or data
structure that can be referenced.
Objects have properties and attributes. The words are synonyms, but there are
differences. The term properties describes the characteristics of the object, while
attributes refers to additional information about an object. As a specific example, people
have a property called height. However, height can be expressed in several different
ways, such as 5′ 8″, 173 centimeters, or 1.73 meters, depending on the attribute used.
Properties and attributes are used in different ways in coding. First, a property can be
different data types, such as a string, integer, or Boolean. Properties of an object can be
modified through code. Attributes only have the data type of string and can’t be
changed. If you attempt to modify the value of an attribute and then ask to display the
attribute’s value, the program will return the default value.
If the terms properties and attributes are confusing to you, know that you’re not alone.
Asking for the difference between them is a challenging question—even for experienced
programmers!
When referring to methods, think of them like functions for objects. They are ways to
organize several tasks together to perform an operation.
Objects and Classes and Methods, Oh My!
A major feature of object-oriented programming (OOP) languages such as Java, C++,
C#, Python, PHP, Perl, and Ruby is the use of objects. As you learned earlier, an object is
a collection of properties and attributes.
In most OOP languages, objects consist of three things:
● Identity, which is the name of the object
● State, which is represented by attributes and reflects properties of the object
● Behavior, which determines the response of the object and is represented by methods
For example, let’s say that a dog is an object. The dog has an identity, which is its name.
In theory, you can use the name of the dog to interact with it. (I say in theory, because
the dog may choose to ignore you.) The dog has a state, such as its breed, height, color,
and coat length, and it has behaviors such as eating, playing, sleeping, and barking.
Multiple behaviors can be called from a method. For example, playing and barking
might belong to a method called “fun time.”
OOP languages also use the concept of classes. A class is the blueprint for objects—it
describes an object’s state and behaviors. In computer terms, the class describes what
the object can hold or do. To create an object, you need a class from which to create it.
Any introduction to objects needs to include the mention of classes, so now you know!
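The dog example might be sketched in Python like this. The class name, the attribute names, and the dog named Rex are all made up for illustration. The class is the blueprint, and rex is one object created from it.

```python
class Dog:
    """Blueprint (class) that describes what every dog object holds and does."""

    def __init__(self, name, breed, color):
        self.name = name      # identity
        self.breed = breed    # state
        self.color = color    # state

    def bark(self):           # behavior
        return f"{self.name} says woof"

    def play(self):           # behavior
        return f"{self.name} fetches the ball"

    def fun_time(self):
        # One method that calls several behaviors.
        return [self.play(), self.bark()]


# Create an object from the class, then call one of its methods.
rex = Dog("Rex", "Beagle", "brown")
print(rex.fun_time())
```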
Summary
In this chapter, you learned about software development concepts. To begin, you
explored four categories of programming languages. The first was assembly, which is a
low-level language that is used to access hardware directly. While learning about
assembly, you also learned about notational systems such as binary, hexadecimal,
decimal, ASCII, and Unicode. The second grouping was compiled languages. Compiled
languages are high-level