Interpreting HTML Markup


It’s easy to imagine the creators of ENIAC or other early computers looking at each

other and saying, “There has to be an easier way.” They had created what was (at the

time) an engineering marvel—an automated computer that could do thousands of

mathematical operations per second. It was significantly faster than any human at doing

math, but there was a problem. The developers had to tell the computer what to do, and

that “programming” process could take weeks for a team of engineers to create and

debug. Any change in the program, even a simple one such as telling it to subtract

instead of add, took hours of changes and testing. And if they had a different math

equation altogether, it required the creation of an entirely new program. There had to be

an easier way.

Fast-forward to today, and it’s impossible to count the number of software titles in

existence. From operating systems to productivity software to games to—you name

it—the breadth and depth of available programs would blow the minds of those early

computer pioneers. One thing hasn’t changed, though: People still need to tell

computers what to do. Computers can’t independently think for themselves. Perhaps

that will change (for better or worse) with advances in artificial intelligence, but for now

computers need instructions, and those instructions are delivered via preprogrammed

software packages. It is the easier way.

This chapter gives you an overview of foundational software development concepts. It’s

not intended to turn you into a programmer—one chapter could not do that task justice.

Rather, at the end of this chapter, you will understand the basics of different

programming languages as well as some concepts and techniques that programmers use

to make their jobs easier. If you find these topics interesting and want to pursue a path

that includes programming, there are ample courses, books, and online materials to

help you get there.

Exploring Programming Languages

All of the software that you use today, whether it’s on your smartphone, a workstation,

or a cloud-based app, was created by a programmer. More than likely, it was created by

a team of them. Small and simple programs might have only a few hundred lines of

code, whereas an elaborate program like Windows 10 reportedly has about 65 million

lines. Now, most developers will tell you that lines of code are a terrible measure for

anything—the goal is to get the desired functionality with as little code as possible—but

the Windows 10 statistic underscores the complexity of some applications.

  I use the terms programmers, developers, and coders interchangeably in this chapter.

Just as there are a large number of software titles on the market, numerous

programming languages exist. Each language has its own grammar and syntax, much

like the rules of English are different from the rules of Spanish. Software developers

typically specialize in one or two languages but have a basic foundational understanding

of the popular ones in the marketplace. For example, a coder might identify as a C++

programmer but also know a scripting language like JavaScript. Because of the

similarities between language concepts, the coder can look at code from an unfamiliar

language and generally understand what it does.

In this section, you will learn about four categories of programming languages:

assembly, compiled, interpreted, and query.

Assembly Language

Any programmer writing code is telling the computer—specifically, the processor—what

to do. Assembly language is the lowest level of code in which people can write. That is,

it allows the developer to provide instructions directly to the hardware. It got its name

because after it’s created, it’s translated into executable machine code by a program

called an assembler.

Assembly code is specific to processor architectures. A program written for a 32-bit Intel

chip will look different than one written for an ARM chip, even if the functionality is

identical.

Originally developed in 1947 at the University of London, assembly was the primary

programming language used for quite some time. Operating systems such as IBM’s PC

DOS and programs like the Lotus 1-2-3 spreadsheet software were coded in assembly, as

well as some console-based video games. In the 1980s, higher-level languages overtook

assembly in popularity.

Even though it doesn’t dominate the landscape, assembly is still used today because it

has some advantages over higher-level languages. It can be faster. It’s also used for

direct hardware access, such as in the system BIOS, with device drivers, and in

customized embedded systems. It’s also used for the reverse engineering of code.

Translating high-level code into assembly is fairly straightforward, but trying to

disassemble code into a higher-level language might not be. On the downside, some

virus programmers like it because it’s closer to the hardware.

Understanding Notational Systems

Before I get into how assembly works or what it looks like, it’s important to take a step

back and think about how computers work. Computers only understand the binary

notational system—1s and 0s. Everything that a computer does is based on those two

digits—that’s really profound. When you’re playing a game, surfing the web, or chatting

with a friend using your computer, it’s really just a tremendously long string of 1s and

0s. Recall from Chapter 1, “Core Hardware Components,” the basic organizational

structure of these 1s and 0s. One digit (either a 1 or a 0) is a bit, and eight digits form a

byte.

There are a lot of real-life practical examples of binary systems. Take light switches, for

example. A conventional light switch is either on (1) or off (0). Ignoring dimmable

lighting for a minute, with a traditional switch, the light is in one of two distinct states:

on or off. This is binary, also known as “base 2” because there are two distinct values.

Humans are far more used to counting in base 10, which is the decimal notational

system. In decimal, the digits 0 through 9 are used. To show the number that’s one

larger than 9, a second digit is added in front and the rightmost digit is reset to 0. This

is just a complicated way of telling you something you already know, which is 9 + 1 = 10.

Binary math works much the same way. The binary value 1 equals a decimal value of 1. If

you add 1 + 1 in binary, what happens? The 1 can’t increase to 2, because only 1s and 0s

are allowed. So, the 1 resets to a 0, and a second digit is added in front of it. Thus, in

binary, 1 + 1 = 10. Then, 10 + 1 = 11, and 11 + 1 = 100. If you’re not accustomed to

looking at binary, this can be a bit confusing!
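If you want to check the binary sums above for yourself, here is a minimal Python sketch. The helper name add_binary is made up for this illustration; int(text, 2) and format(number, "b") are standard Python built-ins.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings and return the sum as a binary string."""
    total = int(a, 2) + int(b, 2)   # int(text, 2) reads the text as base 2
    return format(total, "b")       # "b" writes an integer back out in binary


print(add_binary("1", "1"))    # 10
print(add_binary("10", "1"))   # 11
print(add_binary("11", "1"))   # 100
```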

Now think about the structure of a byte, which is eight bits long. If you want to convert

binary to decimal, then the bit’s position in the byte determines its value. Table 6.1

illustrates what I mean.

TABLE 6.1 Converting binary to decimal

Position   8     7     6     5     4     3     2     1
Bit        1     1     1     1     1     1     1     1
Base       2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
Value      128   64    32    16    8     4     2     1

If the bit is set to 1, it has the value shown in the value row. If it’s set to 0, then its value

is 0. Using the example from a few paragraphs ago, you can now see how binary 100

equals a decimal 4. The binary number 10010001 is 128 + 16 + 1 = 145 in decimal. Using

one byte, decimal values between 0 and 255 can be represented.
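The positional rule from Table 6.1 can be sketched in a few lines of Python. This is only an illustration, and the function name binary_to_decimal is an assumption made for the example; it adds up 2 raised to the position of every bit that is set to 1.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of 1s and 0s to decimal using positional values."""
    total = 0
    # The rightmost bit is worth 2**0, the next one 2**1, and so on.
    for power, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** power
    return total


print(binary_to_decimal("100"))        # 4
print(binary_to_decimal("10010001"))   # 145
print(binary_to_decimal("11111111"))   # 255
```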

  It’s unlikely that you will be asked to perform binary to decimal conversion on the IT Fundamentals+ (ITF+) exam. It’s a good concept to understand, though, and

it’s material to understanding how assembly works!

To take things one step further, there’s a third system used commonly in programming,

which is the hexadecimal notational system, or base 16. You’ll also see it referred to as

hex. In hex, the numbers 0 to 9 are used, just like in decimal. However, the letters A to F

are used to represent decimal numbers 10 through 15. So, in hex, F + 1 = 10. Aren’t

notational systems fun? The key when dealing with numbers in programming is to

understand clearly which notational system you’re supposed to be using. Exercise 6.1

will give you some

Some programming languages will use the prefix 0x in front of a number to indicate that it’s in hexadecimal. For example, you might see something like 0x16FE. That just means the hex number 16FE. Other languages will use an h suffix, so it would be written as 16FEh.
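Python happens to be one of the languages that uses the 0x prefix, so it works for a quick experiment with hex values. The sketch below uses only built-in functions; the variable name value is just for illustration.

```python
value = int("16FE", 16)        # read the digits 16FE as base 16
print(value)                   # 5886
print(0x16FE == value)         # True: the 0x prefix means the same thing
print(format(0xF + 1, "X"))    # 10, because F + 1 rolls over to a second digit
print(format(255, "X"))        # FF, the largest value that fits in one byte
```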

Binary, decimal, and hex work great for representing numbers, but what about letters

and special characters? There are notational systems for these as well. The first is

American Standard Code for Information Interchange (ASCII), which is pronounced

ask-e. ASCII codes represent text and special characters on computers and

telecommunications equipment. The standard ASCII codes use seven bits to store

information, which provides for only 128 characters. Therefore, standard ASCII only has

enough space to represent standard English (Latin) uppercase and lowercase letters,

numbers 0 to 9, a few dozen special characters, and (now obsolete) codes called control

codes. Table 6.2 shows you a small sample of ASCII codes. You can find the full table at

www.asciitable.com.

TABLE 6.2 Sample ASCII codes

Dec   Hex   HTML     Character
33    21    &#33;    !
56    38    &#56;    8
78    4E    &#78;    N
79    4F    &#79;    O
110   6E    &#110;   n
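The rows in Table 6.2 can be reproduced with Python's built-in ord and chr functions, which convert between a character and its numeric code. The helper name ascii_row is invented for this sketch.

```python
def ascii_row(ch: str):
    """Return (decimal code, hex code, HTML numeric reference, character)."""
    code = ord(ch)                      # character -> numeric code
    return code, format(code, "X"), f"&#{code};", chr(code)


for ch in "!8NOn":
    print(*ascii_row(ch))
```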

Covering only the Latin alphabet isn’t very globally inclusive, so a superset of ASCII was

created called Unicode. At the time of writing, Unicode supported 136,755 characters

across 139 language scripts and several character sets. Unicode defines several encoding standards.

UTF-8 uses one to four 8-bit bytes per character. Its first 128 characters are identical to ASCII, and it is the most

common encoding in use today, especially on the web. UTF-16 uses one or two 16-bit units per character; a single unit allows for 65,536

values, covering what’s known as the Basic Multilingual Plane. UTF-32 uses a fixed 32 bits per character to cover the full set of characters.

The Unicode table is at unicode-table.com/en.
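To get a feel for how much space a character takes in each Unicode encoding, you can ask Python to encode it and count the bytes. This is a small sketch, not an exhaustive test; the function name encoded_sizes and the sample characters are chosen for illustration, and the "-le" codec variants are used so that no byte-order mark is added to the count.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character needs in each encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in ["N", "\u00e9", "\u20ac", "\U0001F600"]:   # N, e-acute, euro sign, emoji
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

A plain letter needs one byte in UTF-8, while a character outside the Basic Multilingual Plane (the last one in the list) needs four bytes in all three encodings.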

Working with Assembly

Coding in assembly is not for the faint of heart. As I mentioned earlier, you need to

know the version specific to the processor’s platform. In addition, you need to know how

memory segmentation works and how processor codes will respond in protected and

unprotected memory environments. There’s more to it than those few criteria, but

suffice it to say that it’s challenging work.

Let’s start with a simple example, remembering that all computers understand are 1s

and 0s. Say that you have a 32-bit Intel processor and want to move a simple 8-bit

number into a processor register. The binary code to tell the processor to move data is

10110, followed by a 3-bit register identifier. For this example, you’ll use the lowest part of the accumulator register (you won’t need to know this for the exam),

which is noted as AL. The code for this register is 000. Finally, you need to tell the CPU the number that you want to move into this register; you’ll use 42. In binary, the

command looks like this:

10110000 00101010

That’s not very user friendly or easy to remember. Using binary to hex conversion, you

can simplify it to the following:

B0 2A

That literally means, “Move into register AL the number 42.” It’s still not very

user friendly. To help ease the challenge, assembly developers have created mnemonic

codes to help programmers remember commands. For example, the command MOV

(short for move) is a mnemonic to replace the binary or hex code. So, the command can

now be written as follows:

MOV AL, 2Ah ;Move the number 42 (2A hex) into AL

You’ll notice a few things here. The first is that the command MOV AL is much easier to remember than the binary code, and it makes more sense in human terms than does B0. The second is that I added some real words after a semicolon. Most languages allow

coders the ability to add comments that are not processed. In assembly, anything on a

line after a semicolon is considered a comment and ignored by the assembler.
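To see how the pieces of that instruction fit together, here is a rough Python sketch that builds the same two bytes from the 5-bit move code, the 3-bit register code, and the value 42. It is not an assembler; the constant and function names are invented for this example, and it only handles the single instruction form described above.

```python
MOVE_CODE = 0b10110   # 5-bit "move an 8-bit value into a register" code from the text
REG_AL = 0b000        # 3-bit identifier for the AL register


def encode_move(register: int, value: int) -> bytes:
    """Pack the move code and register into one byte, followed by the value."""
    first_byte = (MOVE_CODE << 3) | register   # shift left 3 bits, then add the register
    return bytes([first_byte, value])


code = encode_move(REG_AL, 42)
print(" ".join(format(b, "08b") for b in code))   # 10110000 00101010
print(" ".join(format(b, "02X") for b in code))   # B0 2A
```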

So, to summarize the basic structure of a line of code, it contains processor instructions

(“do this”), directives (defining data elements or giving the processor specific ways of

performing the task), data, and optional comments. This is pretty much true for all

programming languages, although the structure and syntax will vary.

As I wrap up this section on assembly, I want to leave you with one small gift. A

tradition in pretty much every programming class is that the first program you are

taught to write is how to display “Hello, world!” on the screen. The way to create this

friendly greeting varies based on the language, so when I cover various languages, I am

going to show you what it looks like or have you do it. The intent isn’t to have you

memorize the code or learn to program but to give you a feel for what the code looks like

for an actual application. So, without further ado, here is “Hello, world!” in all of its

assembly glory:

section .text
    global _start           ;must be declared for linker (ld)

_start:                     ;tells linker entry point
    mov edx,len             ;message length
    mov ecx,msg             ;message to write
    mov ebx,1               ;file descriptor (stdout)
    mov eax,4               ;system call number (sys_write)
    int 0x80                ;call kernel

    mov ebx,0               ;exit status 0
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel so the program ends cleanly

section .data
msg db 'Hello, world!', 0xa ;the message!
len equ $ - msg             ;length of the string

When the code is assembled and executed, it will display the following on the screen:

Hello, world!

Compiled Languages

High-level languages have replaced assembly as the most commonly used ones in

software development today. When creating a new application, the developer must

decide between a compiled language and an interpreted language. A compiled

programming language is one that requires the use of a compiler to translate it into

machine code. Creating and using a program using a compiled language consists of

three steps:

1. Write the application in a programming language, such as Java or C++. This is

called the source code.

2. Use a compiler to translate the source code into machine code. Most software

development applications have a compiler.

3. Execute the program file, which (in Windows) usually has an .exe extension.

There are dozens of compiled languages a programmer can choose from, but unless

there is a specific need, the programmer is likely going to go with one of the more

common ones. In days past, the choices might have been Fortran, BASIC, or Pascal.

Now, Java, C, C++ (pronounced C plus plus), and C# (pronounced C sharp) are the most

popular. The Linux and Windows kernels are written in C. The rest of Windows is

written mostly in C++, with a bit of custom assembly thrown in for good measure.

Let’s take a look at the source code for “Hello, world!” in Java, which is one of the most popular

programming languages in use today:

public class HelloWorld {
    public static void main(String[] args) {
        // Prints "Hello, world!" in the terminal window.
        System.out.println("Hello, world!");
    }
}

Compare and contrast the Java code to assembly. A few things might jump out. First,

the program is a lot shorter. Second, the syntax is different. Java uses braces (the { and }) to indicate code blocks. Notice that for every open brace, there is a corresponding close brace. Single-line comments in Java are preceded with two slashes (//) instead of a semicolon. Even little things—assembly uses single quotes around the words that you

want to print while Java uses double quotes—are different. As for the rest of the context, don’t worry about understanding it all right now. Again, the point is to just get a feel for what some basic code looks like.

The next stop on your tour of compiled languages is C++. Let’s look at source code for your new favorite program:

// Header file
#include <iostream>

using namespace std;

// the main function is where the program execution begins
int main() {
    // the message to the world
    cout << "Hello, world!";
    return 0;
}

There are a few similarities to Java. C++ and Java are both derivatives of the C language, so it makes sense that they would share some features. For example, comments start with two slashes, and braces are present to create blocks of code. Other things that you might have noticed are that both use a main function, double quotes for text, and a semicolon to end a statement.

Finally, take a look at the same program written in C#. C# is also a derivative of C, so some of this code might start to look familiar to you:

using System;

namespace HelloWorld {
    class Program {
        // the main function - also known as a method
        static void Main(string[] args) {
            // the message to the world
            Console.WriteLine("Hello, world!");
            Console.ReadLine();
        }
    }
}

The indentation used in programming, in most cases, does not affect the functionality of the program. It’s there to make it easier for the developer (or other people looking at the code) to read. In the C# example, the indentation makes it easy to see that there are three open braces and three corresponding close braces. Again, for the IT Fundamentals+ (ITF+) exam, don’t worry about being

able to read or write code in a specific language. These examples are intended to give you a flavor for what a few common languages look like.

Interpreted Languages

The second major classification of modern, high-level programming languages is interpreted languages. With an interpreted programming language, each line of code is read by an interpreter every time the program is executed. Contrast this with compiled languages, in which the source code is compiled once and then executed any number of times.

An interpreter and a compiler essentially do the same thing—they take high-level source code and translate it into low-level machine code. They differ a bit in how they perform their job, however. Table 6.3 outlines some of the key differences.

TABLE 6.3 Interpreter vs. compiler

Task                      Interpreter               Compiler
Translating source code   One statement at a time   Entire program at once
Executing programs        By the interpreter        Creates an executable file (usually .exe)
Analyzing source code     Faster                    Slower
Executing code            Slower                    Faster
Using memory              Lower                     Higher
Debugging                 Easier                    Harder

Debugging interpreted code is easier because the source code is read line by line. If the interpreter gets to a line that is written incorrectly, it stops there and generates an error. A compiler will read all of the code, and if something is wrong somewhere, it will generate an error. Some compilers will give you a clue as to the location of the problem, but troubleshooting a compiled language is usually more challenging than troubleshooting an interpreted language.
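Here is a rough illustration of the “stops at the faulty line” behavior, using Python as a stand-in for a generic interpreter. Two assumptions to keep in mind: the three-statement program and the names run_until_error and missing_name are invented for this sketch, and Python itself first translates the whole text to an internal bytecode before running it, so it is not a pure line-by-line interpreter.

```python
import traceback

# A tiny three-statement program; the second statement uses an undefined name.
source = (
    'log.append("statement 1 ran")\n'
    'log.append(missing_name)\n'
    'log.append("statement 3 ran")\n'
)


def run_until_error(text):
    """Execute text and return (what ran, line number of the error or None)."""
    log = []
    try:
        exec(compile(text, "<demo>", "exec"), {"log": log})
    except NameError as err:
        # The last traceback entry points at the statement that failed.
        failing_line = traceback.extract_tb(err.__traceback__)[-1].lineno
        return log, failing_line
    return log, None


ran, failing_line = run_until_error(source)
print(ran)            # ['statement 1 ran']
print(failing_line)   # 2
```

The first statement runs, the second one fails and is reported by line number, and the third is never reached.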

There are three types of interpreted languages with which you need to be familiar: markup languages, scripting languages, and scripted languages.

Markup Languages

A markup language is a language that programmers can use to annotate—or mark

up—text to tell the computer how to process or manipulate the text. You might be

familiar with markups in other walks of life. For example, have you ever used a

highlighter or pencil to mark text in a book, such as a school book? That’s an example of

markup, because you are highlighting something important. To create a markup

language, there needs to be a codified set of rules telling the processor what to do with

the marked-up text when it encounters it.

There are multiple classes of markup languages, but the most common application of

them is in the creation of web pages. Hypertext Markup Language (HTML) is the

language in which most web pages are created. HTML allows web developers to format

web pages. It’s sort of a joke among developers that if you know HTML, all that means is

that you’re good at drawing boxes. HTML does allow you to do that in order to lay out a

web page, but HTML developers can do much more than just draw boxes! The current

version is HTML5, but you’ll see it referred to simply as HTML.

HTML is a bit unique among languages in that the pages that contain it are stored on a

server, and the pages themselves are downloaded to a client and then processed by

specialized software. This might sound complicated, but it’s quite likely you’re familiar

with the process. Said differently, a user opens a web browser and visits a website. The

site sends the page to the browser, which interprets the language and displays it

properly.

  Another common markup language is Extensible Markup Language (XML).

HTML works by using tags to signify instructions for the browser. Tags take this format:

<TAG> (something) </TAG>

Generally speaking, all tags have an opening tag <> and a closing tag </>. For example, to tell the browser to bold the word penguin, the web page would read <b> penguin </b>. Tags enable the developer to control all elements of the text, such as where the text should appear on the screen, in what font it should appear, the color of the text,

and all other features related to its style. Tags are also used to create tables, place

images properly, set background colors, and include hyperlinks to other web pages.
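To make the opening-tag and closing-tag idea concrete, here is a small Python sketch that reads the penguin example the way a very simple browser might, noting each opening tag, piece of text, and closing tag in order. It relies on the HTMLParser class from Python's standard library; the class name TagLogger is made up for this illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record opening tags, text, and closing tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        if data.strip():                      # skip whitespace-only text
            self.events.append(("text", data.strip()))


parser = TagLogger()
parser.feed("<b> penguin </b>")
parser.close()
print(parser.events)   # [('open', 'b'), ('text', 'penguin'), ('close', 'b')]
```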

Remember the “Hello, world!” program? You get to create a simple web page displaying

this text in Exercise 6.2.

EXERCISE 6.2

Creating “Hello, world!” in HTML

1. Open Notepad (or another text editor). You can open Notepad by clicking Start, typing the word note, and clicking Notepad when it appears under Best Match.

2. Type the following code into the text editor:

<html>
<head><title>Tab title</title></head>
<body>
Hello, world!
</body>
</html>

3. Save the file to your desktop as hello.html.

4. Double-click the hello.html file. It should open in your browser and look something like Figure 6.2.

FIGURE 6.2 Hello, world!

5. Go back to your hello.html file in the text editor. (If you closed it, right-click, click Open With ➢ Choose Another App ➢ More Apps, and scroll down until you find Notepad.)

6. Change the line that says Hello, world! to the following:

<b><i>Hello, world!</i></b>

7. Save the file.

8. Open hello.html again in a web browser. Notice the change in the text.

9. Open the file in the text editor again, and change the <body> line to the following:

<body bgcolor="#DDEA11">

10. Save the file.

11. Open hello.html again in a web browser. What change do you see?
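The color value used in step 9 is three two-digit hex numbers written back to back. As a small sketch of how such a value breaks apart, the Python function below (hex_to_rgb is a name invented for this example) splits a six-digit code into its red, green, and blue parts and converts each to decimal.

```python
def hex_to_rgb(code: str):
    """Split a six-digit hex color into (red, green, blue) decimal values."""
    code = code.lstrip("#")                    # accept "DDEA11" or "#DDEA11"
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#DDEA11"))   # (221, 234, 17)
print(hex_to_rgb("00FF00"))    # (0, 255, 0)
print(hex_to_rgb("0000FF"))    # (0, 0, 255)
```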

  In Exercise 6.2, step 9, the code DDEA11 is a hexadecimal code to indicate color. The first two characters are for red, the second two are for green, and the third two are for blue. For example, 00FF00 is true green, and 0000FF is true blue. Learning about hex was important! You can play around with hex color codes on several different websites, such as www.color-he…

Scripting Languages

The second category of interpreted languages is scripting languages. For much of the

history of computers, operating system interfaces were a simple command prompt, and

only one task could be executed at a time. Scripting languages came along and were used

for executing a list of tasks. One of the earlier common scripting languages was the

Bourne again shell (Bash), which also happened to be a popular command interface (or

shell) for UNIX-based operating systems. Essentially, someone could create a file that

contained multiple actions to perform and then execute the file.

Scripting languages today have evolved a great deal, but for the most part are still

designed to create simple programs that execute a list of tasks or get data from a data

set. Their advantage is that they’re less code-intensive than the compiled language

counterparts; you can get more done with less code. In addition, modern scripting

languages support the use of objects, variables, and functions, which I’ll talk about in

the “Understanding Programming Concepts and Techniques” section later in this

chapter. Some of the most popular scripting languages are JavaScript (JS), Visual Basic

(VB) Script, PHP, Perl, and Python.

Perhaps the most common use of scripting languages is to execute tasks from within a

web page written in HTML. Remember that a markup language is designed to present

and format information—it doesn’t really “do” anything else like execute a program. So,

developers will insert a script to execute the tasks they need the website to perform.

Here’s an example, with some code you have seen before:

<html>

<head><title>Tab title</title></head>

<body>

<script>

alert('Hello, world!');

</script>

</body>

</html>

When you execute this web page, you will see something like Figure 6.3.

FIGURE 6.3 “Hello, world!” JavaScript alert

Notice that an alert box pops up, because the alert method was used. You can get the

message to appear in the browser window by using document.write('Hello, world!'); instead. One question you might have is, how did the browser know that it’s reading JavaScript and not some other script? The <script> tag tells the browser that a script is coming; by default, the browser assumes that it’s JavaScript.

The syntax for Python is even simpler. All that’s needed is the following:

print("Hello, world!")

As I’ve said before, don’t worry about memorizing the code or specific methods. Know a

few examples of scripting languages, and remember that they are usually short, don’t

need a compiler, and typically execute a list of tasks.

Scripted Languages

While the terms scripted languages and scripting languages are similar, they do

different things. A scripted language doesn’t operate on its own as scripting languages

do, but instead it needs a command interpreter to be built into the program. Scripted

languages are often used to modify video games—that is, to add functionality above and

beyond what the initial developers created—without altering the core of the game itself.

Common scripted languages include Lua and Lisp.

Query Languages

Of the four language categories, query languages are the most unlike the others. The

term query is synonymous with question, and a query language is specialized to ask

questions. Specifically, query languages are designed to retrieve data from databases.

The most common query language, by far, is Structured Query Language (SQL). Another

example is the Lightweight Directory Access Protocol (LDAP), which is designed to

query directory services such as Microsoft’s Active Directory.

  Microsoft’s Active Directory is beyond the scope of this book, but essentially it’s the database on a Microsoft server that stores all of the information about

users and security.

Basic SQL syntax is straightforward. To ask the database for data, a SELECT statement is used, followed by the specifics of what you want and where you want it from. Queries

often follow a structure like this:

SELECT column_name

FROM table_name

WHERE condition

While the basic syntax is relatively simple, it can get a lot more complicated. SQL allows

for insertion of data into tables, joining data from multiple tables, and several different

types of operators, such as finding the minimum and maximum, counting, finding

averages, and summing values. Interfacing with databases is IT Fundamentals+ (ITF+)

Exam Objective 5.3, and it will be covered in more detail in Chapter 7, “Database

Fundamentals.”
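If you would like to try the SELECT, FROM, and WHERE pattern without installing a database server, Python ships with a small database engine called SQLite. The sketch below is only an illustration: the penguins table, its columns, and its rows are all invented for the example.

```python
import sqlite3

# Build a throwaway database in memory (nothing is written to disk).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (name TEXT, species TEXT, height_cm INTEGER)")
conn.executemany(
    "INSERT INTO penguins VALUES (?, ?, ?)",
    [
        ("Pingu", "Emperor", 110),
        ("Skipper", "Adelie", 70),
        ("Mumble", "Emperor", 115),
        ("Pip", "Little", 33),
    ],
)

# SELECT column_name FROM table_name WHERE condition
tall = conn.execute(
    "SELECT name FROM penguins WHERE height_cm > 60 ORDER BY name"
).fetchall()
print(tall)    # [('Mumble',), ('Pingu',), ('Skipper',)]

# Counting, minimum, maximum, average, and sum in a single query
stats = conn.execute(
    "SELECT COUNT(*), MIN(height_cm), MAX(height_cm), AVG(height_cm), SUM(height_cm) "
    "FROM penguins"
).fetchone()
print(stats)   # (4, 33, 115, 82.0, 328)

conn.close()
```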

One Is the Loneliest Number

Throughout this chapter, I’ve talked about different categories of programming

languages and what they are used for. In the real world, most programs actually use

code from multiple languages to complement each other’s functionality. Some languages

are simply better at certain tasks than others.

For example, as mentioned earlier, Windows 10 uses three languages: the kernel is

written in C, most of the code is in C++, and some custom Intel assembly is also

included. If any application needs to get data from a database, it’s most likely using SQL

in addition to the language in which it was coded. Websites use HTML and most likely a

scripting language, such as JavaScript or Python. YouTube uses both JavaScript and

Python.

Using a single language for a program seems to be more of the exception than the norm.

If you want to get into coding, it’s good to have proficiency in multiple languages to

expand your effectiveness.

Understanding Programming Concepts and Techniques

A programmer’s goal is to get a computer to do what he or she wants it to do. Think for just a minute about the variety of things that different programs do. Operating systems are massive and complex and perform thousands of different tasks, from accepting input and providing output to managing and monitoring devices to establishing connections to remote computers. Each of these tasks is handled by different parts of the program, so you can see why Windows reportedly has 65 million lines of code. Even relatively simple programs can easily contain several thousand lines.

One of the ways that programmers can simplify their work is to reuse sections of code. For example, imagine that a program needs to perform a mathematical calculation. The developer would write code for it. Later in the program, if the same calculation is needed, the programmer can simply reference the previous section of code as opposed to needing to rewrite the whole thing. Many programs are kind of like Frankenstein in nature. Blocks of code perform specific tasks, and the developer figures out how to stitch them all together into a finished product. Sometimes it’s elegant, sometimes it’s ugly, and sometimes it’s a bit of each.

In this section, you will learn about concepts that developers use to make their tasks easier. First up is programming logic. This refers to what the program does and includes topics such as logic components, data types, and identifiers. Then you will learn about organizational techniques such as flowcharts, pseudocode, containers, functions, and objects.

Programming Logic

Processors perform math and logic operations, so it figures that all programs are made up of logic and arithmetic. For example, processors can add or subtract numbers or compare two values to

each other and determine an action to take based upon the result. Essentially, the job of the programmer is to tell the processor what to do based on the results of a logic puzzle.

The two main ways that programs perform logic are branching and looping.

Branching

People use a lot of branching logic in their daily lives. For example, if you’re driving a

car, your brain probably follows a simple process when it comes to traffic lights. If the

light is red, you stop. From a programmatic standpoint, this is represented as a simple

if... then statement. If a certain condition exists (red light), then take a specific action (stop the car). “But, wait!” you might say, “What if the light is green?” The logic

then used is called if... then, else. If the light is red, stop, else go. This is an example with only one choice to make based on two discrete conditions. Branching can

handle many more conditions as well.

For example, most traffic lights have an amber light too, telling you to slow down

because a red light is imminent. The processor now needs to take an action based on one

of three conditions. You can tell it to do that by using else if statements. Here’s an example:

if light = red, then stop
else if light = green, then go
else if light = amber, then slow down

This pseudocode isn’t for a real programming language, but it illustrates how the logic

works. Computers, as you know, deal only in data and not stoplights. Instead of

comparing the colors of a stoplight, a processor compares two different pieces of data.

Table 6.4 lists common data types with which you need to be familiar.

TABLE 6.4 Common data types

Type | Explanation | Examples
Char | One character, such as a UTF-16 or UTF-32 character | A or a
String | Zero or more characters | “This is a string” and “S0 is th1$.”
Integer | Whole number with no decimal point | 5 or 500000
Floats | Any number with a decimal place | 5.2 or 5.000001
Boolean | A true or false condition, usually represented by a 1 (true) or 0 (false) | 1 or 0

There can be some overlap between the definitions, meaning that some data could

appear to be multiple data types. For example, a string can consist of only one

character, making it look like a char. Numbers designated as floats might not have a

decimal place, whereas an integer definitely does not have any decimal places. A

Boolean data type is always true or false; if its output is represented as a 1 or 0, it could

look like an integer. Regardless of what it looks like, the data type is what it was defined

to be within a program.
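As a rough illustration, here is how the data types in Table 6.4 might look in Python. This is only a sketch: Python has no separate char type (a single character is simply a string of length 1), and the variable names are invented for the example.

```python
# Hypothetical values, one per data type from Table 6.4.
initial = "A"                  # Python has no char type; this is a 1-character string
greeting = "This is a string"  # string
count = 500000                 # integer
price = 5.2                    # float
is_ready = True                # Boolean

for value in (initial, greeting, count, price, is_ready):
    print(type(value).__name__, repr(value))

# Data that looks alike can still be a different type.
whole_float = 5.0              # a float, even though it has no fractional part
print(type(whole_float).__name__)

# A Boolean can look like an integer when it is converted to one.
print(int(is_ready))
```

The point of the last few lines is the one made above: the type is whatever the program defined it to be, regardless of how the value looks when displayed.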

Boolean values are particularly important because they often directly control computer

logic. If the processor compares two values and based on the condition, the result is true

(for example, is random integer 1 greater than random integer 2), then the program will

follow a specific path. If false (random integer 1 is not greater than random integer 2),

then the program will go down a different path.

Here’s another example using if, else if statements, this time comparing integers. Assume that you’re creating a program that needs to categorize people based on their

age. If the person is younger than 13, they are a child. Anyone younger than 20 is a teen,

younger than 65 is an adult, and 65 or older is a senior. What would that logic look

like? It could look something like this:

if age < 13, then category “Child”
else if age < 20, then category “Teen”
else if age < 65, then category “Adult”
else category “Senior”

Based on an input to the program, the person will be categorized appropriately.
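For comparison, the same age logic could be written in a real language. The sketch below uses Python; the function name categorize and the sample ages are made up for illustration.

```python
def categorize(age):
    """Return an age category using the same thresholds as the pseudocode."""
    if age < 13:
        return "Child"
    elif age < 20:
        return "Teen"
    elif age < 65:
        return "Adult"
    else:
        return "Senior"


for sample_age in (8, 13, 42, 65):
    print(sample_age, categorize(sample_age))
```

The order of the tests matters. Each branch is checked only if every earlier one was false, so the test for younger than 20 is reached only by people who are already known to be 13 or older.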

There’s one other concept to introduce here, and it’s that of identifiers. In the example

of age classification, the numbers defining the categories are set. A set or predefined

number like this is called a constant. As you might imagine based on its name, this

means that it doesn’t change. The other type of identifier is a variable, which simply

means that it can change.

Imagine a scenario where instead of putting people into a category based on their age, the program is designed to determine who is older. The age values of two people will be entered, and the program is supposed to say if person 1 is older than person 2. Since the ages of the people are not predetermined, they are variable. The logic could look something like this:

if age (person 1) > age (person 2), then “Older”
else if age (person 1) = age (person 2), then “Same”
else “Younger”

Variables can also change throughout a program. For example, think of a program designed to count sums of money. A variable can be defined early in the program to accept the first input. Then, when the second amount of money is input (again, a variable amount), the program adds the two values together and uses the sum as a third variable.
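Both ideas, comparing two variables and combining variables into a new one, can be sketched in Python. The function name compare_ages and the sample values are hypothetical; in a real program the ages and amounts would come from user input.

```python
def compare_ages(age_1, age_2):
    """Say whether person 1 is older than, the same age as, or younger than person 2."""
    if age_1 > age_2:
        return "Older"
    elif age_1 == age_2:   # == compares two values; a single = would assign one
        return "Same"
    else:
        return "Younger"


print(compare_ages(30, 25))

# Variables can feed new variables: two amounts are added to make a third.
first_amount = 12.50
second_amount = 7.25
total = first_amount + second_amount
print(total)
```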

Branching logic statements are used to compare variables and constants to other variables and constants and can be used across different data types. The Boolean output of the comparison (true or false) is used to determine the path that the program takes. Branches are good for simple logical comparisons, but more complicated activities might require different logic.

Looping

As its name implies, looping logic is circular rather than linear like branching logic. At the center of looping is the while statement. Looping is useful for monitoring a state within a program, and then invoking an action when that state changes. Think back to the simple stoplight example given earlier. In human terms, the logic can basically be, “While the stoplight is red, stop; otherwise, go.” While the light is red, you keep repeating the loop until the condition changes, and once it changes, you take a different action.

You can also use looping for a counting function. For example, while x < 10 is true, count x + 1. This pseudocode example uses a few concepts you’ve learned in this chapter, such as variables (x) and Boolean (true).
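The counting example might look like this in Python. The starting value of 0 is an assumption, since the pseudocode does not say where x begins.

```python
x = 0              # assumed starting value
while x < 10:      # the Boolean condition is checked before every pass
    x = x + 1      # count up by one
    print(x)

# The loop ends as soon as the condition x < 10 becomes false.
```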

Loops can be powerful tools, but they can also be problematic. If not coded properly, the program can end up in an infinite do while loop. (Sometimes you hear people say that they don’t want to get stuck in a “do loop,” and this is where that reference comes from.) An example would be using a loop to count but then neglecting to tell it when to stop. To avoid this, the developer must ensure that they provide proper exit conditions for the loop.

  Loops and branches can be (and often are) used together to perform more complex operations.
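Here is a small Python sketch of a branch placed inside a loop, along with an explicit exit condition. The list of readings and the threshold of 10 are made-up values for illustration.

```python
readings = [3, 8, 15, 4, 22, 7]   # hypothetical input values
index = 0
large_count = 0

# Exit condition: stop once every reading has been examined.
while index < len(readings):
    if readings[index] >= 10:     # the branch inside the loop
        large_count = large_count + 1
    index = index + 1             # without this line the loop would never end

print(large_count)
```

Removing the line that increases index would leave the condition permanently true, which is exactly the kind of infinite loop described above.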

Organizing Code

Earlier in this chapter, I mentioned that programs often have a Frankenstein feel to them. This is because when programmers sit down to write code, they think about it in terms of the tasks to be done and what it takes to accomplish those tasks. This compartmentalization is one of the key organizing principles that developers follow. Breaking something like Windows down into manageable chunks lets the developer focus on exactly what is needed for a small, discrete task rather than feeling overwhelmed by the enormity of it all.

As you can imagine, organizing the code is important, but it’s also important for the developer to be organized outside of the code; that is, to lay out a blueprint of how the program will function so the developer can construct it or to have people on the team build various parts of it. The following sections cover different organizational concepts.

Flowcharts

A flowchart is a visual representation of a program that uses boxes to represent the logic. Flowcharts are critical in the software development process. They should be created before the code is developed, much like blueprints should be drawn before building a house. They help the developers visualize the flow of the program, making it easier to plan out the sections of code needed.

Flowcharts show the sequence of operations within a program, including where data input is needed as well as decisions to be made and the logic choices. Different shapes are used to indicate different components. For example, a rectangle is used to indicate a process, whereas a diamond is used to show a decision point.

There are various flowchart software packages on the market, and you can also use Microsoft Word or PowerPoint to create simple ones. Figure 6.4 shows a simple flowchart for a program designed to find the smallest of three values.

FIGURE 6.4 Flowchart
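A program that finds the smallest of three values could be coded in several ways. The Python sketch below is one plausible version and is not necessarily the exact sequence of decisions drawn in Figure 6.4; the function name is invented for the example.

```python
def smallest_of_three(a, b, c):
    """Return the smallest of three values using only comparisons."""
    smallest = a            # start by assuming the first value is the smallest
    if b < smallest:        # decision point (a diamond in a flowchart)
        smallest = b
    if c < smallest:        # second decision point
        smallest = c
    return smallest


print(smallest_of_three(7, 3, 9))
```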

Pseudocode

True to its name, pseudocode is literally fake code. It’s fake in the sense that it’s not read

by the processor and has no effect on the functioning of the program. It can be helpful to

people trying to read the code, however.

I’ve used the term a few times in this chapter already, in describing examples of logic.

The pseudocode I wrote wouldn’t be recognized by any interpreter or compiler, but it

was intended for ease of reading and understanding. Another example of pseudocode is

for comments in programs. Comments can be used to describe the purpose of a line or

block of code or to show a mathematical formula that might be incredibly complicated

and hence harder to read in the actual code.

  A joke among programmers is that some modern scripting languages, like Python, use such simple syntax that you could write a Python script in pseudocode and

it would still work.

Pseudocode can be particularly helpful in situations where a team of programmers is

responsible for one program. It can be challenging for a developer who didn’t write the

code to figure out what’s wrong if there’s a problem. Having well-annotated code can

help the troubleshooter understand what the original developer intended when creating

the code.
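As an illustration of comments that describe a formula and the purpose of individual lines, consider this short Python sketch. The function and its names are hypothetical; lines starting with # are comments that Python ignores when the program runs.

```python
import math


def distance(x1, y1, x2, y2):
    # Straight-line distance between two points:
    #   d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
    dx = x2 - x1  # horizontal difference
    dy = y2 - y1  # vertical difference
    return math.sqrt(dx * dx + dy * dy)


print(distance(0, 0, 3, 4))
```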

Containers

In real life, containers hold things—in computer programming, they do the same thing.

Earlier in this chapter you learned about variables, which are values that can change

based on different input or conditions within a program. When developers define a

variable, they’re allowed to specify only one value for that variable. Using containers,

multiple values of similar types can be grouped together and accessed at the same time.

There are two types of containers about which you should be aware: arrays and vectors.

An array is simply a list of values. There are a few key defining factors to an array. First,

all of the elements in an array must be of the same data type. Second, the array is

predefined in size and does not change.

A vector also holds a list of values. However, the values do not need to be of the same

data type, and vectors can be dynamically allocated, meaning that they can shrink or

grow as the program requires. Otherwise, vectors behave very similarly to arrays (in the

sense that they store values and allow for their retrieval). Because vectors are

dynamically sized, they are more versatile than arrays and are often the preferred

container type when the amount of data is not known in advance.
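Python does not map perfectly onto these two definitions, so treat the following as a loose sketch. The standard array module enforces a single data type but is not fixed in size, so it illustrates only the same-type rule. Python's built-in list is the closest match to the vector described here. The variable names and values are made up.

```python
from array import array

# array: every element must match the declared type code ('i' = signed integer).
scores = array("i", [90, 85, 70])
try:
    scores.append("high")         # a string is rejected
except TypeError:
    print("arrays accept only one data type")

# list: it can hold mixed types and grows or shrinks as needed.
items = [90, "high", 3.5]
items.append(True)                # grows to four elements
items.pop(0)                      # shrinks back to three
print(len(scores), len(items))
```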

Functions

Developers often employ sections of reusable code, and that is what functions are. When

a particular set of instructions is needed, a function block can be created to accomplish

the task. Whenever that task is needed, the program references that function. Functions

are generally designed to take input, transform it somehow, and deliver output.

Functions are linear in nature, meaning that they take input, process it, and then deliver

output. It’s not to say they can’t have looping logic in them but rather that the function

starts at the beginning of the code block and finishes at the end, handing off to another

process.
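A minimal Python sketch of a function that takes input, transforms it, and delivers output follows. The temperature conversion is just a convenient example, and the function name is hypothetical.

```python
def fahrenheit_to_celsius(fahrenheit):
    """Take input, transform it, and deliver output."""
    return (fahrenheit - 32) * 5 / 9


# The same block of code is reused wherever the calculation is needed.
print(fahrenheit_to_celsius(212))
print(fahrenheit_to_celsius(32))
```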

Here’s an analogy to help illustrate how functions work—the command prompt on older

operating systems. The command prompt just waits for user input. The computer won’t

perform any other tasks while it sits and waits, and it has an indefinite supply of

patience (unless, of course, it loses power). A user enters a command, and then the

computer springs into action. It performs one or more tasks based on the command.

Maybe it opens a file, changes permissions on a folder, or creates a network connection

with a server. It completes that task, perhaps produces some output, and then gives the

user another command prompt. It’s then ready for another task, and it will be patient.

Objects

Since I just provided an analogy for how functions work, it’s a good time to do the same

for objects. As you’re probably aware, modern operating systems do not provide a

simple command prompt for input. In fact, the graphical user interface (GUI) of an

operating system has dozens of ways in which you can “enter” into functionality with it.

There are icons on the screen, some sort of launcher like the Start button, more icons on

the task bar, and a clock and random items in the system tray. You can click any of them

and make something happen, or you can right-click the screen and make something

happen too. Even better, you can click multiple items and have several processes

running at once. (Okay, technically, a single processor core runs only one process at a time,

but it switches among them so quickly that they appear to run simultaneously.)

How does this analogy relate to an object? In a few ways. First, each item on your

desktop with which you can interact is considered its own object, each with its own set

of properties and attributes. Second, thinking of the GUI, you can begin interacting with

it at any number of entry points and stop interacting with it at any number of places as

well. There isn’t just one specific entry and exit point.

Now that you have a rough idea of how objects work, here’s the definition: Objects are

collections of attributes, properties, and methods that can be queried or called upon to

perform a task. Said differently, an object can be a variable, function, method, or data

structure that can be referenced.

Objects have properties and attributes. The words are synonyms, but there are

differences. The term properties describes the characteristics of the object, while

attributes refers to additional information about an object. As a specific example, people

have a property called height. However, height can be expressed in several different

ways, such as 5′ 8″, 173 centimeters, or 1.73 meters, depending on the attribute used.

Properties and attributes are used in different ways in coding. First, a property can be

different data types, such as a string, integer, or Boolean. Properties of an object can be

modified through code. Attributes only have the data type of string and can’t be

changed. If you attempt to modify the value of an attribute and then ask to display the

attribute’s value, the program will return the default value.

If the terms properties and attributes are confusing to you, know that you’re not alone.

Asking for the difference between them is a challenging question—even for experienced

programmers!

When referring to methods, think of them like functions for objects. They are ways to

organize several tasks together to perform an operation.

Objects and Classes and Methods, Oh My!

A major feature of object-oriented programming (OOP) languages such as Java, C++,

C#, Python, PHP, Perl, and Ruby is the use of objects. As you learned earlier, an object is

a collection of attributes, properties, and methods.

In most OOP languages, objects consist of three things:

● Identity, which is the name of the object
● State, which is represented by attributes and reflects properties of the object
● Behavior, which determines the response of the object and is represented by methods

For example, let’s say that a dog is an object. The dog has an identity, which is its name.

In theory, you can use the name of the dog to interact with it. (I say in theory, because

the dog may choose to ignore you.) The dog has a state, such as its breed, height, color,

and coat length, and it has behaviors such as eating, playing, sleeping, and barking.

Multiple behaviors can be called from a method. For example, playing and barking

might belong to a method called “fun time.”

OOP languages also use the concept of classes. A class is the blueprint for objects—it

describes an object’s state and behaviors. In computer terms, the class describes what

the object can hold or do. To create an object, you need a class from which to create it.

Any introduction to objects needs to include the mention of classes, so now you know!
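The dog example can be sketched as a Python class. The class name, the attribute names, and the sample dog are all hypothetical, and other OOP languages express the same ideas with different syntax.

```python
class Dog:
    """Blueprint (class) describing the state and behaviors of a dog."""

    def __init__(self, name, breed, color):
        # State: values that describe this particular dog.
        self.name = name
        self.breed = breed
        self.color = color

    # Behaviors are written as methods.
    def bark(self):
        return f"{self.name} barks"

    def play(self):
        return f"{self.name} plays"

    def fun_time(self):
        # One method can call several behaviors.
        return [self.play(), self.bark()]


# Create an object from the class.
rex = Dog("Rex", "Beagle", "brown")
print(rex.breed)
print(rex.fun_time())
```

The class is written once, and any number of dog objects can then be created from it, each with its own state.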

Summary

In this chapter, you learned about software development concepts. To begin, you

explored four categories of programming languages. The first was assembly, which is a

low-level language that is used to access hardware directly. While learning about

assembly, you also learned about notational systems such as binary, hexadecimal,

decimal, ASCII, and Unicode. The second grouping was compiled languages. Compiled

languages are high-level