stub section
This subchapter is a stub section. It will be filled in with instructional material later. For now it serves as a placeholder for the order of instruction.
Professors are invited to give feedback on both the proposed contents and the proposed order of this textbook. Send commentary to Milo, PO Box 1361, Tustin, California, 92781, USA.
integer data type
The computer integer data type is based on the mathematical concept of an integer.
Most modern computers store integers as binary integers. Some early computers used decimal integers and many modern CISCs still provide limited support for binary coded decimals.
integer type
Most programming languages have an integer type. This is a computer representation of the mathematical integers (counting numbers, zero, and negative integers).
Unlike mathematical integers, computer integers have a limited range, with a maximum (largest positive) and a minimum (most negative) number.
Note that negative integers are indicated with a negative sign (such as -3), while positive integers are indicated by the lack of a sign (such as 3).
Unlike normal written numbers, you leave out the commas when writing numbers in a computer program. The number 1,000,000 (one million) is written 1000000. Adding the commas will confuse your compiler.
The following material is from the unclassified Computer Programming Manual for the JOVIAL (J73) Language, RADC-TR-81-143, Final Technical Report of June 1981.
The kinds of values provided by JOVIAL reflect the applications
of the language; they are oriented toward engineering and control
programming rather than, for example, commercial and business
programming. The JOVIAL values are:
1. Integer values, which are signed or unsigned whole
numbers. They are used for counting. For example, an
integer can be used to count the number of times a loop
is repeated or the number of checks performed on a
process.
Chapter 1 INTRODUCTION, page 2
In ALGOL 68 the integer mode is declared with the reserved word int.
int FinalAverage;
In C the integer type is declared with the reserved word int.
int age;
Stanford C essentials
Stanford CS Education Library This [the following section until marked as end of Stanford University items] is document #101, Essential C, in the Stanford CS Education Library. This and other educational materials are available for free at http://cslibrary.stanford.edu/. This article is free to be used, reproduced, excerpted, retransmitted, or sold so long as this notice is clearly reproduced at its beginning. Copyright 1996-2003, Nick Parlante, nick.parlante@cs.stanford.edu.
Integer Types
The integral types in C form a family of integer types. They all behave like integers and can be mixed together and used in similar ways. The differences are due to the different number of bits (widths) used to implement each type -- the wider types can store a greater range of values.
- char
- ASCII character -- at least 8 bits. Pronounced "car". As a practical matter char is basically always a byte, which is 8 bits, which is enough to store a single ASCII character. 8 bits provides a signed range of -128..127 or an unsigned range of 0..255. char is also required to be the smallest addressable unit for the machine -- each byte in memory has its own address.
- short
- Small integer -- at least 16 bits which provides a signed range of -32768..32767. Typical size is 16 bits. Not used so much.
- int
- Default integer -- at least 16 bits, with 32 bits being typical. Defined to be the most comfortable size for the computer. If you do not really care about the range for an integer variable, declare it int since that is likely to be an appropriate size (16 or 32 bit) which works well for that machine.
- long
- Large integer -- at least 32 bits. Typical size is 32 bits which gives a signed range of about -2 billion ..+2 billion. Some compilers support long long for 64 bit ints.
The integer types can be preceded by the qualifier unsigned which disallows representing negative numbers, but doubles the largest positive number representable. For example, a 16 bit implementation of short can store numbers in the range -32768..32767, while unsigned short can store 0..65535. You can think of pointers as being a form of unsigned long on a machine with 4 byte pointers. In my opinion, it's best to avoid using unsigned unless you really need to. It tends to cause more misunderstandings and problems than it is worth.
Extra: Portability Problems
Instead of defining the exact sizes of the integer types, C defines lower bounds. This makes it easier to implement C compilers on a wide range of hardware. Unfortunately it occasionally leads to bugs where a program runs differently on a 16-bit-int machine than it runs on a 32-bit-int machine. In particular, if you are designing a function that will be implemented on several different machines, it is a good idea to use typedefs to set up types like Int32 for 32 bit int and Int16 for 16 bit int. That way you can prototype a function Foo(Int32) and be confident that the typedefs for each machine will be set so that the function really takes exactly a 32 bit int. That way the code will behave the same on all the different machines.
int Constants
Numbers in the source code such as 234 default to type int. They may be followed by an L (upper or lower case) to designate that the constant should be a long, such as 42L. An integer constant can be written with a leading 0x to indicate that it is expressed in hexadecimal -- 0x10 is a way of expressing the number 16. Similarly, a constant may be written in octal by preceding it with 0 -- 012 is a way of expressing the number 10.
Type Combination and Promotion
The integral types may be mixed together in arithmetic expressions since they are all basically just integers with variation in their width. For example, char and int can be combined in arithmetic expressions such as ('b' + 5). How does the compiler deal with the different widths present in such an expression? In such a case, the compiler promotes the smaller type (char) to be the same size as the larger type (int) before combining the values. Promotions are determined at compile time based purely on the types of the values in the expressions. Promotions do not lose information -- they always convert from a type to a compatible, larger type to avoid losing information.
Pitfall -- int Overflow
I once had a piece of code which tried to compute the number of bytes in a buffer with the expression (k * 1024) where k was an int representing the number of kilobytes I wanted. Unfortunately this was on a machine where int happened to be 16 bits. Since k and 1024 were both int, there was no promotion. For values of k >= 32, the product was too big to fit in the 16 bit int, resulting in an overflow. The compiler can do whatever it wants in overflow situations -- typically the high order bits just vanish. One way to fix the code was to rewrite it as (k * 1024L) -- the long constant forced the promotion of the int. This was not a fun bug to track down -- the expression sure looked reasonable in the source code. Only stepping past the key line in the debugger showed the overflow problem. "Professional Programmer's Language." This example also demonstrates the way that C only promotes based on the types in an expression. The compiler does not consider the values 32 or 1024 to realize that the operation will overflow (in general, the values don't exist until run time anyway). The compiler just looks at the compile time types, int and int in this case, and thinks everything is fine.
end of Stanford C essentials
In Pascal the integer type is declared with the reserved word integer.
var Age: Integer;
31 Every object in the language has a type, which characterizes a set of values and a set of applicable operations. The main classes of types are elementary types (comprising enumeration, numeric, and access types) and composite types (including array and record types). :Ada-Europe's Ada Reference Manual: Introduction: Language Summary
33 Numeric types provide a means of performing exact or approximate numerical computations. Exact computations use integer types, which denote sets of consecutive integers. Approximate computations use either fixed point types, with absolute bounds on the error, or floating point types, with relative bounds on the error. The numeric types Integer, Float, and Duration are predefined. :Ada-Europe's Ada Reference Manual: Introduction: Language Summary
There are no data types in Ruby. Instead there are objects, as Ruby is exclusively an Object Oriented Programming language.
Ruby's base class for numbers is Numeric.
Ruby's numeric class Fixnum holds integers. They are stored as fixed length numbers whose bit length is the underlying native machine word minus one.
Ruby also has a class Bignum for storing multiple precision numbers too large for native machine representation. Numbers are automatically converted from Fixnum to Bignum whenever a result is too large for storage in Fixnum. The only limit on the size of a Bignum is the amount of memory made available by the operating system. (Since Ruby 2.4, Fixnum and Bignum have been unified into the single class Integer.)
The following material is from the unclassified Computer Programming Manual for the JOVIAL (J73) Language, RADC-TR-81-143, Final Technical Report of June 1981.
1.1.2 Storage
When a JOVIAL program is executed, each value it operates on is
stored as an item. The item has a name, which is declared and
then used in the program when the value of the item is fetched or
modified.
An item is declared by a JOVIAL statement called a declaration
statement. The declaration provides the compiler with the
information it needs to allocate and access the storage for the
item. Here is a statement that declares an integer item:
ITEM COUNT U 10;
This declaration says that the value of COUNT is an integer that
is stored without a sign in ten or more bits. The notation is
compact: "U" means it is an unsigned integer, "10" means it
requires at least 10 bits. We say "at least" ten bits because
the JOVIAL compiler may allocate more than ten bits. (That
allocation wastes a little data space, but can result in faster,
more compact code.)
Chapter 1 INTRODUCTION, page 3
JOVIAL does not require that you give the number of bits in the
declaration of an integer item. If you omit it, JOVIAL supplies
a default value that depends on which implementation of JOVIAL
you are using. An example is:
ITEM TIME S;
This statement declares TIME to be the name of an integer
variable item that is signed and has the default number of bits.
On one implementation of JOVIAL, this would be equivalent to the
declaration:
ITEM TIME S 15;
The item TIME occupies 16 bits (including the sign). On another
implementation, it would be equivalent to:
ITEM TIME S 31;
This and other defaults are defined in the user's manual for the
implementation of JOVIAL you are using.
In this brief introduction, we cannot consider each kind of item
in detail (as we just did for integer items). Instead, a list of
examples follow, one declaration for each kind of value.
ITEM SIGNAL S 2; A signed integer item, which occupies
at least three bits and accommodates
values from -3 to +3.
Chapter 1 INTRODUCTION, page 4
assembly language instructions
number systems
Binary is a number system using only ones and zeros (or two states).
Decimal is a number system based on ten digits (including zero).
Hexadecimal is a number system based on sixteen digits (including zero).
Octal is a number system based on eight digits (including zero).
Duodecimal is a number system based on twelve digits (including zero).
binary | octal | decimal | duodecimal | hexadecimal |
0 | 0 | 0 | 0 | 0 |
1 | 1 | 1 | 1 | 1 |
10 | 2 | 2 | 2 | 2 |
11 | 3 | 3 | 3 | 3 |
100 | 4 | 4 | 4 | 4 |
101 | 5 | 5 | 5 | 5 |
110 | 6 | 6 | 6 | 6 |
111 | 7 | 7 | 7 | 7 |
1000 | 10 | 8 | 8 | 8 |
1001 | 11 | 9 | 9 | 9 |
1010 | 12 | 10 | A | A |
1011 | 13 | 11 | B | B |
1100 | 14 | 12 | 10 | C |
1101 | 15 | 13 | 11 | D |
1110 | 16 | 14 | 12 | E |
1111 | 17 | 15 | 13 | F |
10000 | 20 | 16 | 14 | 10 |
10001 | 21 | 17 | 15 | 11 |
10010 | 22 | 18 | 16 | 12 |
10011 | 23 | 19 | 17 | 13 |
10100 | 24 | 20 | 18 | 14 |
10101 | 25 | 21 | 19 | 15 |
10110 | 26 | 22 | 1A | 16 |
10111 | 27 | 23 | 1B | 17 |
11000 | 30 | 24 | 20 | 18 |
integer representations
Sign-magnitude is the simplest method for representing signed binary numbers. One bit (by universal convention, the highest order or leftmost bit) is the sign bit, indicating positive or negative, and the remaining bits are the absolute value of the binary integer. Sign-magnitude is simple for representing binary numbers, but has the drawbacks of two different zeros and much more complicated (and therefore slower) hardware for performing addition, subtraction, and any binary integer operation other than negation (which only requires a sign bit change).
In ones complement representation, positive numbers are represented in the normal manner (same as unsigned integers with a zero sign bit), while negative numbers are represented by complementing all of the bits of the absolute value of the number. Numbers are negated by complementing all bits. Addition of two integers is performed by treating the numbers as unsigned integers (the sign bit is treated like any other bit), with a carry out of the leftmost bit position being added to the least significant bit (technically, the carry bit is always added to the least significant bit, but when it is zero, the add has no effect). The ripple effect of adding the carry bit can almost double the time to do an addition. And there are still two zeros, a positive zero (all zero bits) and a negative zero (all one bits).
In twos complement representation, positive numbers are represented in the normal manner (same as unsigned integers with a zero sign bit), while negative numbers are represented by complementing all of the bits of the absolute value of the number and adding one. Negation of any number in twos complement representation is accomplished by complementing all of the bits and adding one. Addition is performed by adding the two numbers as unsigned integers and ignoring the carry. Twos complement has the further advantage that there is only one zero (all zero bits). Twos complement representation does result in one more negative number (a one sign bit followed by all zero bits) than positive numbers.
Twos complement is used in nearly every modern binary computer. Most processors have one more negative number than positive numbers. Some processors use the extra negative number (a one sign bit followed by all zero bits) as a special indicator, depicting invalid results, not a number (NaN), or other special codes.
In unsigned representation, only non-negative numbers (zero and the positive integers) are represented. Instead of the high order bit being interpreted as the sign of the integer, the high order bit is part of the number. An unsigned number can therefore reach a maximum value about twice as large (one more power of two) as a signed number (in any representation) of the same number of bits.
bit pattern | sign-mag. | ones comp. | twos comp. | unsigned |
000 | 0 | 0 | 0 | 0 |
001 | 1 | 1 | 1 | 1 |
010 | 2 | 2 | 2 | 2 |
011 | 3 | 3 | 3 | 3 |
100 | -0 | -3 | -4 | 4 |
101 | -1 | -2 | -3 | 5 |
110 | -2 | -1 | -2 | 6 |
111 | -3 | -0 | -1 | 7 |
See also Data Representation in Assembly Language
accumulators
Accumulators are registers that can be used for arithmetic, logical, shift, rotate, or other similar operations. The first computers typically only had one accumulator. Many times there were related special purpose registers that contained the source data for an accumulator. Accumulators were replaced with data registers and general purpose registers. Accumulators reappeared in the first microprocessors.
- Intel 8086/80286: one word (16 bit) accumulator; named AX (high order byte of the AX register is named AH and low order byte of the AX register is named AL)
- Intel 80386: one doubleword (32 bit) accumulator; named EAX (low order word uses the same names as the accumulator on the Intel 8086 and 80286 [AX] and low order and high order bytes of the low order words of four of the registers use the same names as the accumulator on the Intel 8086 and 80286 [AH and AL])
- MIX: one accumulator; named A-register; five bytes plus sign
data registers
Data registers are used for temporary scratch storage of data, as well as for data manipulations (arithmetic, logic, etc.). In some processors, all data registers act in the same manner, while in other processors different operations are performed on specific registers.
- MIX: one extension register; named X-register; five bytes plus sign; can be concatenated on the right hand side of the A-register (accumulator)
- Motorola 680x0, 68300: 8 longword (32 bit) data registers; named D0, D1, D2, D3, D4, D5, D6, and D7
general purpose registers
General purpose registers can be used as either data or address registers.
- DEC VAX: 16 longword (32 bit) general purpose registers; named R0 through R15
- IBM 360/370: 16 full word (32 bit) general purpose registers; named 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A (or 10), B (or 11), C (or 12), D (or 13), E (or 14), and F (or 15)
- Intel 8086/80286: 8 word (16 bit) general purpose registers; named AX, BX, CX, DX, BP, SP, SI, and DI (high order bytes of the AX, BX, CX, and DX registers have the names AH, BH, CH, and DH and low order bytes of the AX, BX, CX, and DX registers have the names AL, BL, CL, and DL)
- Intel 80386: 8 doubleword (32 bit) general purpose registers; named EAX, EBX, ECX, EDX, EBP, ESP, ESI, and EDI (low order words use the same names as the general purpose registers on the Intel 8086 and 80286 and low order and high order bytes of the low order words of four of the registers use the same names as the general purpose registers on the Intel 8086 and 80286)
- Motorola 88100: 32 word (32 bit) general purpose registers; named r0 through r31
constant registers
Constant registers are special read-only registers that store a constant. Attempts to write to a constant register are illegal or ignored. In some RISC processors, constant registers are used to store commonly used values (such as zero, one, or negative one). For example, a constant register containing zero can be used in register to register data moves, providing the equivalent of a clear instruction without adding one to the instruction set. Constant registers are also often used in floating point units to provide such values as pi or e with additional hidden bits for greater accuracy in computations.
- Motorola 88100: r0 (general purpose register 0) contains the constant 32 bit integer zero
See also Registers