stub section

This subchapter is a stub section. It will be filled in with instructional material later; for now it serves as a placeholder for the order of instruction.

Professors are invited to give feedback on both the proposed contents and the proposed order of this text book. Send commentary to Milo, PO Box 1361, Tustin, California, 92781, USA.

data types

Earlier we briefly looked at declaring the data types of variables so that you could get started on programming. Now we will look at data types in detail.

I have decided to place all of the reference material regarding data types into this chapter (although professors might strip out the excess), in addition to the instructional material appropriate at this point. This will allow this chapter to serve as an easy-to-find reference as you continue to learn computer programming.

kinds of data types

Data types eventually get translated into a format that the computer “understands”. The lowest level data types are typically things like individual bits (or groups of bits), numbers, and characters (and strings of characters).

Most modern programming languages offer more abstract data types, but these abstract data types are ultimately built on whatever native data types are available on the underlying computer.

The following list contains some of the data types discussed in this chapter. You may want to briefly look ahead.

strong and weak types

A programming language is considered to have strong typing if type declarations are required and strictly enforced.

A programming language is considered to have weak typing if it doesn’t require type declarations and freely allows the programmer to treat any piece of data as any data type.

Note that these descriptions of strong and weak typing are informal. More exact definitions will come later.

Also note that few programming languages are purely strong or purely weak; most fall somewhere on the continuum between the two extremes.

Dennis Ritchie said, “C is a strongly typed, weakly checked language.”
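Python sits somewhere in the middle of this continuum: it requires no type declarations, yet it refuses to add a string to an integer. Reinterpreting the raw bits of one type as another, which C allows with little checking, has to be requested explicitly in Python, here through the standard `struct` module. This is a minimal sketch; the helper names `try_add` and `float_bits` are hypothetical, and the bit pattern assumes the IEEE 754 single precision format.

```python
import struct


def try_add(a, b):
    """Return a + b, or the name of the exception Python raises instead."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__


def float_bits(x):
    """Reinterpret the 4 bytes of an IEEE 754 single precision float as an unsigned int."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits


print(try_add(2, 3))         # 5
print(try_add("2", 3))       # TypeError: no implicit conversion between str and int
print(hex(float_bits(1.0)))  # 0x3f800000
```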

valid type identifiers

Types may be named by a valid identifier.

Quick summary of the rules for building valid type identifiers in several major languages, using regular expressions:

Ada	[a-zA-Z](_?[a-zA-Z0-9])*
ALGOL-68	[a-z][a-z0-9 ]*
Awk	[_a-zA-Z][_a-zA-Z0-9]*
B	[_a-zA-Z][_a-zA-Z0-9]*
BourneShell	[_a-zA-Z0-9]+
C	[_a-zA-Z][_a-zA-Z0-9]*
C#	[_a-zA-Z][_a-zA-Z0-9]*
C++	[_a-zA-Z][_a-zA-Z0-9]*
COBOL	[a-zA-Z][a-zA-Z0-9-]* 30 character maximum
Classic REXX	[a-zA-Z!?@#][a-zA-Z0-9!?@#]*
Common Lisp	anything without a space and is not a number
E	[_a-zA-Z][_a-zA-Z0-9]*
Eiffel	[a-zA-Z][_a-zA-Z0-9]*
F#	[_a-zA-Z][_a-zA-Z0-9']*
FORTRAN	[A-Z][A-Z0-9]* maximum of six characters
Forth	anything without a space and is not a number
GNU-bc	[a-z][a-z0-9_]*
Haskell	[_A-Z][_a-zA-Z0-9']*
Java	[_a-zA-Z$][_a-zA-Z0-9$]*
JavaScript	[_a-zA-Z$][_a-zA-Z0-9$]*
Lisp	anything without a space and is not a number
Maple	[_a-zA-Z][_a-zA-Z0-9]*
Mathematica	[a-zA-Z][a-zA-Z0-9]*
Matlab	[a-zA-Z][_a-zA-Z0-9]*
Mercury	[_a-z][_a-zA-Z0-9']*
merd	[_a-z][_a-zA-Z0-9][!?']
Modula-3	[a-zA-Z][_a-zA-Z0-9]*
MUMPS	[a-zA-Z%][a-zA-Z0-9]*
OCaml	[_a-z][_a-zA-Z0-9']*
Pascal	[a-zA-Z][a-zA-Z0-9]*
Perl	[_a-zA-Z0-9]+
Perl6	[_a-zA-Z0-9]+
PHP	[_a-zA-Z][_a-zA-Z0-9]*
PL/I	[a-zA-Z][a-zA-Z0-9]*
Pliant	[_a-zA-Z][_a-zA-Z0-9]* or '[^']*'
Prolog	[_A-Z][_a-zA-Z0-9]*
Python	[_a-zA-Z][_a-zA-Z0-9]*
Rebol	[_a-zA-Z?!.'+&\|=~-][_a-zA-Z0-9?!.'+&\|=~-]* or [^0-9[](){}":;/][^ \n\t[](){}":;/]*
Ruby	[_a-z][_a-zA-Z0-9]*
Scheme	[_a-zA-Z!0&/:<=>?^][_a-zA-Z!0&/:<=>?^0-9.+-]*
SmallTalk	[a-zA-Z][a-zA-Z0-9]*
SML	[_a-z][_a-zA-Z0-9']*
Tcl	[_a-zA-Z][_a-zA-Z0-9]*
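The rules in the table can be tested mechanically. The sketch below copies three of the patterns into Python's standard `re` module and uses `re.fullmatch` so that the whole name must match. The dictionary and function names are hypothetical, and, like the table, the check ignores reserved words and length limits (such as the COBOL and FORTRAN maximums).

```python
import re

# Three patterns copied from the table; fullmatch anchors them at both ends.
IDENTIFIER_RULES = {
    "Ada": r"[a-zA-Z](_?[a-zA-Z0-9])*",
    "C": r"[_a-zA-Z][_a-zA-Z0-9]*",
    "Java": r"[_a-zA-Z$][_a-zA-Z0-9$]*",
}


def is_valid_identifier(language, name):
    """Check name against the character rules only (reserved words are not checked)."""
    return re.fullmatch(IDENTIFIER_RULES[language], name) is not None


for lang, name in [("C", "total_count"), ("C", "2fast"), ("Java", "$price"),
                   ("C", "$price"), ("Ada", "my_var"), ("Ada", "my__var")]:
    print(f"{lang:5} {name!r:14} {is_valid_identifier(lang, name)}")
```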

Python

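In Python, built-in data types (such as list, dict, and str, which the standard CPython interpreter implements in C) generally run faster than custom data types written as Python classes.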
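One way to see the difference is to time the built-in `list` against a hand-written linked list. This sketch assumes CPython, where `list` is implemented in C; the `Node` class and the helper functions are hypothetical, and the measured times depend on the interpreter version and the machine, so no particular numbers are shown.

```python
import timeit


class Node:
    """A hand-written linked list node: a hypothetical custom data type."""
    __slots__ = ("value", "next")

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_builtin(n):
    """Store 0..n-1 in the built-in list type."""
    return list(range(n))


def build_custom(n):
    """Store 0..n-1 in a chain of Node objects, head first."""
    head = None
    for i in reversed(range(n)):
        head = Node(i, head)
    return head


def custom_to_list(head):
    """Walk the chain and collect its values into a built-in list."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


if __name__ == "__main__":
    n = 10_000
    t_builtin = timeit.timeit(lambda: build_builtin(n), number=20)
    t_custom = timeit.timeit(lambda: build_custom(n), number=20)
    print(f"built-in list: {t_builtin:.4f} s  custom Node chain: {t_custom:.4f} s")
```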

C

Dennis Ritchie, co-creator of the C programming language, said “C is a strongly typed, weakly checked language.”

Stanford C essentials

Stanford CS Education Library This [the following paragraph] is document #101, Essential C, in the Stanford CS Education Library. This and other educational materials are available for free at http://cslibrary.stanford.edu/. This article is free to be used, reproduced, excerpted, retransmitted, or sold so long as this notice is clearly reproduced at its beginning. Copyright 1996-2003, Nick Parlante, nick.parlante@cs.stanford.edu.

C provides a standard, minimal set of basic data types. Sometimes these are called “primitive” types. More complex data structures can be built up from these basic types.
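The sizes of C's basic types are not fixed by the language: the C standard guarantees that a `char` is one byte and sets minimum ranges for the other types, but the actual sizes depend on the platform. For instance, `long` is 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows. Without writing any C, Python's standard `ctypes` module can report the sizes used by the platform it was built for. A sketch; the `C_TYPES` table and the `c_type_sizes` helper are hypothetical names.

```python
import ctypes

# ctypes mirrors the platform's C primitive types; sizeof reports bytes.
C_TYPES = {
    "char": ctypes.c_char,
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
    "void *": ctypes.c_void_p,
}


def c_type_sizes():
    """Return a dict mapping C type names to their size in bytes on this platform."""
    return {name: ctypes.sizeof(ctype) for name, ctype in C_TYPES.items()}


for name, size in c_type_sizes().items():
    print(f"{name:10} {size} byte(s)")
```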

Ada

    “30 Data Types” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “31 Every object in the language has a type, which characterizes a set of values and a set of applicable operations. The main classes of types are elementary types (comprising enumeration, numeric, and access types) and composite types (including array and record types).” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “32/2 An enumeration type defines an ordered set of distinct enumeration literals, for example a list of states or an alphabet of characters. The enumeration types Boolean, Character, Wide_Character, and Wide_Wide_Character are predefined.” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “33 Numeric types provide a means of performing exact or approximate numerical computations. Exact computations use integer types, which denote sets of consecutive integers. Approximate computations use either fixed point types, with absolute bounds on the error, or floating point types, with relative bounds on the error. The numeric types Integer, Float, and Duration are predefined.” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “34/2 Composite types allow definitions of structured objects with related components. The composite types in the language include arrays and records. An array is an object with indexed components of the same type. A record is an object with named components of possibly different types. Task and protected types are also forms of composite types. The array types String, Wide_String, and Wide_Wide_String are predefined.” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “35 Record, task, and protected types may have special components called discriminants which parameterize the type. Variant record structures that depend on the values of discriminants can be defined within a record type.” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “36 Access types allow the construction of linked data structures. A value of an access type represents a reference to an object declared as aliased or to an object created by the evaluation of an allocator. Several variables of an access type may designate the same object, and components of one object may designate the same or other objects. Both the elements in such linked data structures and their relation to other elements can be altered during program execution. Access types also permit references to subprograms to be stored, passed as parameters, and ultimately dereferenced as part of an indirect call.” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “37 Private types permit restricted views of a type. A private type can be defined in a package so that only the logically necessary properties are made visible to the users of the type. The full structural details that are externally irrelevant are then only available within the package and any child units.” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “38 From any type a new type may be defined by derivation. A type, together with its derivatives (both direct and indirect) form a derivation class. Class-wide operations may be defined that accept as a parameter an operand of any type in a derivation class. For record and private types, the derivatives may be extensions of the parent type. Types that support these object-oriented capabilities of class-wide operations and type extension must be tagged, so that the specific type of an operand within a derivation class can be identified at run time. When an operation of a tagged type is applied to an operand whose specific type is not known until run time, implicit dispatching is performed based on the tag of the operand.” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “38.1/2 Interface types provide abstract models from which other interfaces and types may be composed and derived. This provides a reliable form of multiple inheritance. Interface types may also be implemented by task types and protected types thereby enabling concurrent programming and inheritance to be merged.” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “39 The concept of a type is further refined by the concept of a subtype, whereby a user can constrain the set of allowed values of a type. Subtypes can be used to define subranges of scalar types, arrays with a limited set of index values, and records and private types with particular discriminant values.” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “40 Other facilities” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “41/2 Aspect clauses can be used to specify the mapping between types and features of an underlying machine. For example, the user can specify that objects of a given type must be represented with a given number of bits, or that the components of a record are to be represented using a given storage layout. Other features allow the controlled use of low level, nonportable, or implementation-dependent aspects, including the direct insertion of machine code.” (Ada-Europe’s Ada Reference Manual: Introduction: Language Summary)

    “2 The language also covers systems programming; this requires precise control over the representation of data and access to system-dependent properties.” (Ada-Europe’s Ada Reference Manual: Section 1: General)
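Python has no direct counterpart to Ada's ordered enumeration types or range-constrained subtypes, but a rough analogy can help when reading the passages above. The sketch below uses the standard `enum.IntEnum` for an ordered set of literals and a hypothetical helper, `make_subrange`, that raises `ValueError` where Ada would raise `Constraint_Error`. It is an illustration for study, not a model of Ada semantics; the Ada declarations in the comments show what each line imitates.

```python
from enum import IntEnum


class Day(IntEnum):
    """An ordered set of distinct literals, loosely like an Ada enumeration type."""
    MON = 1
    TUE = 2
    WED = 3
    THU = 4
    FRI = 5
    SAT = 6
    SUN = 7


def make_subrange(low, high):
    """Return a checker that accepts only values in low..high (hypothetical helper)."""
    def check(value):
        if not low <= value <= high:
            raise ValueError(f"{value!r} not in range {low!r} .. {high!r}")
        return value
    return check


weekday = make_subrange(Day.MON, Day.FRI)  # like: subtype Weekday is Day range Mon .. Fri;
day_of_month = make_subrange(1, 31)        # like: subtype Day_Number is Integer range 1 .. 31;

print(Day.TUE < Day.THU)      # True: the literals are ordered
print(weekday(Day.WED).name)  # WED
```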

Ruby

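Ruby has no primitive data types in the sense that C does. Instead every value, including numbers, true, false, and nil, is an object, as Ruby is a purely object oriented programming language; an object’s class plays the role that a data type plays in other languages.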

JOVIAL

The following material is from the unclassified Computer Programming Manual for the JOVIAL (J73) Language, RADC-TR-81-143, Final Technical Report of June 1981.

    Permissible data structures are simple items, structured tables
    of simple items, and composite data blocks containing simple
    items and tables.

    Types of data in data structures can be signed or unsigned
    integers, enumeration values, floating point numbers, fixed point
    (fractional) numbers, character strings, bit strings (logical),
    and pointers (address of data objects).

Chapter 1 INTRODUCTION, page 1

    1.1.1 Values

    The kinds of values provided by JOVIAL reflect the applications
    of the language; they are oriented toward engineering and control
    programming rather than, for example, commercial and business
    programming.  The JOVIAL values are:
    1. Integer values …
    2. Floating values …
    3. Fixed values …
    4. Bit-string values …
    5. Character-string values …
    6. Status values …
    7. Pointer values …
    8. Table values …
    9. Block values …

Chapter 1 INTRODUCTION, page 2-3

    1.1.5 Built-In Functions

    The JOVIAL built-in functions provide advanced, specialized
    operations that are not covered by the JOVIAL operators.

         BITSIZE(x)     Logical size of x in bits
         BYTESIZE(x)    Logical size of x in bytes
         WORDSIZE(x)    Logical size of x in words

Chapter 1 Introduction, page 9

assembly language instructions

data representation

Most data structures are abstract structures and are implemented by the programmer with a series of assembly language instructions. Many cardinal data types (bits, bit strings, bit slices, binary integers, binary floating point numbers, binary coded decimals, binary addresses, characters, etc.) are implemented directly in hardware for at least parts of the instruction set. Some processors also implement some data structures in hardware for some instructions; for example, many processors (such as the Intel x86 family) have a few instructions for directly manipulating character strings.

An assembly language programmer has to know how the hardware implements these cardinal data types. Two basic issues are bit and byte ordering (big endian or little endian) and the number of bits (or bytes) in each type. The assembly language programmer must also pay attention to word length and to optimum (or required) addressing boundaries. Composite data types such as floating point numbers also include details of hardware implementation, such as how many bits of mantissa, characteristic (exponent), and sign they use, as well as the order of those fields. In many cases there are machine specific encodings for some data types, as well as a choice of character codes (such as ASCII or EBCDIC) for character and string implementations.
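As a concrete case of such hardware details, the sketch below splits a 64-bit floating point number into its sign, exponent (characteristic), and mantissa fields. It assumes the IEEE 754 binary64 layout that Python floats use on essentially all current platforms; older machines such as the IBM 360 (hexadecimal floating point) and the VAX used different layouts. The name `double_fields` is hypothetical.

```python
import struct


def double_fields(x):
    """Split an IEEE 754 binary64 value into (sign, biased exponent, fraction) fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)  # 52 bits of mantissa (leading 1 is implied)
    return sign, exponent, fraction


print(double_fields(1.0))   # (0, 1023, 0)
print(double_fields(-2.5))  # (1, 1024, 1125899906842624)
```

Here -2.5 is -1.25 times 2 to the 1st power, so the biased exponent is 1023 + 1 and the fraction field holds 0.25 scaled by 2 to the 52nd power.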

data size

The basic building block is the bit, which can contain a single piece of binary data (true/false, zero/one, north/south, positive/negative, high/low, etc.).

Bits are organized into larger groupings to store values encoded in binary bits. The most basic grouping is the byte. A byte is the smallest normally addressable quantum of main memory (which can be different than the minimum amount of memory fetched at one time). In modern computers this is almost always an eight bit byte, so much so that many skilled programmers believe that a byte is defined as being always eight bits. In the past there have been computers with bytes of six, seven, eight, twelve, and sixteen bits (six, eight, and twelve were the most common choices). There have also been bit slice computers where the common memory addressing approach is by single bit; in these kinds of computers the term byte actually has no meaning, although eight bits on these computers are likely to be called a byte. Throughout the rest of this discussion, assume the standard eight bit byte applies unless specifically stated otherwise.
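A byte of eight bits can hold 2 to the 8th power, or 256, distinct values. The sketch below shows the usual shift-and-mask operations for testing, setting, and clearing individual bits, along with Python's `bytes` type, whose elements are restricted to 0 through 255. It assumes the standard eight bit byte; the helper names are hypothetical.

```python
BITS_PER_BYTE = 8  # the standard eight bit byte assumed in the text


def bit_is_set(value, n):
    """Return True if bit n (0 = least significant) of value is 1."""
    return ((value >> n) & 1) == 1


def set_bit(value, n):
    """Return value with bit n forced to 1."""
    return value | (1 << n)


def clear_bit(value, n):
    """Return value with bit n forced to 0."""
    return value & ~(1 << n)


print(2 ** BITS_PER_BYTE)       # 256 distinct values fit in one byte
print(bin(set_bit(0b0000, 3)))  # 0b1000
print(bit_is_set(0b1010, 1))    # True
print(bytes([255]))             # b'\xff', the largest single byte value
```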

Most early computers were arranged for either decimal digits (which fit in four bits, with a few combinations left unused) or six bit characters. The six bit character codes grew out of the codes used in the teletype industry (teletype machines sent text over telegraph lines), such as the five-bit Baudot code. Because teletype equipment was widely available, it made sense to cut costs by using the existing I/O hardware.

The IBM 360 project made the leap to an eight-bit character code (EBCDIC) so that it could include lower case letters (the teletype machines provided upper case letters, decimal digits, and a few punctuation marks, the character set you may recognize from telegrams in old movies).
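The difference between the two character codes shows up directly in the bytes. EBCDIC exists in several national code pages; this sketch assumes code page 037 (US and Canada), which Python provides as the standard `cp037` codec. The helper name `compare_codes` is hypothetical.

```python
def compare_codes(text):
    """Return (ASCII bytes, EBCDIC code page 037 bytes) for the same text."""
    return text.encode("ascii"), text.encode("cp037")


ascii_bytes, ebcdic_bytes = compare_codes("Hi 42")
print(ascii_bytes.hex(" "))   # 48 69 20 34 32
print(ebcdic_bytes.hex(" "))  # c8 89 40 f4 f2
```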

A nibble is half a byte, or four bits.
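Nibbles are handy for the decimal digits mentioned earlier: binary coded decimal (BCD) stores one decimal digit per nibble, leaving six of the sixteen possible nibble values unused. A minimal sketch of splitting a byte into nibbles and of packing a number into BCD; the function names are hypothetical, and signed BCD formats (which add a sign nibble) are not handled.

```python
def split_nibbles(byte):
    """Return (high nibble, low nibble) of an 8-bit value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value from 0 to 255")
    return (byte >> 4) & 0xF, byte & 0xF


def to_packed_bcd(n):
    """Encode a non-negative integer as packed BCD: one decimal digit per nibble."""
    if n < 0:
        raise ValueError("sign nibbles are not handled in this sketch")
    result, shift = 0, 0
    while True:
        result |= (n % 10) << shift  # lowest remaining digit goes into the next nibble
        n //= 10
        shift += 4
        if n == 0:
            return result


print(split_nibbles(0xA7))       # (10, 7)
print(hex(to_packed_bcd(1995)))  # 0x1995
```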

A word is the default data size for a processor, although the default size does not apply in all cases. The word size is chosen by the processor’s designer(s) and reflects some basic hardware issues (such as the width of internal or external buses). The most common word sizes are 16 and 32 bits, but word sizes have varied widely (for example, 12 bits on the DEC PDP-8 and 60 bits on the CDC 6600). Typically there will be additional data sizes that are defined relative to the size of a word: halfword, half the size of a word; longword, usually double the size of a word; doubleword, usually double the size of a word (sometimes double the size of a longword); and quadword, four times the size of a word. Whether or not there is a space between the size designation and “word” is chosen by the manufacturer, and varies by processor.
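The word size bounds the integers a single word can hold: n bits give 2 to the nth power distinct values. The sketch below computes the unsigned range and the signed range for a few word sizes, assuming two's complement for signed values (some historical machines used ones' complement or sign-magnitude instead, which changes the negative limit by one). The function name `word_ranges` is hypothetical.

```python
def word_ranges(bits):
    """Return ((unsigned min, max), (two's complement min, max)) for a word of `bits` bits."""
    unsigned = (0, 2 ** bits - 1)
    signed = (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return unsigned, signed


for bits in (16, 32, 60):
    (u_lo, u_hi), (s_lo, s_hi) = word_ranges(bits)
    print(f"{bits:2}-bit word: unsigned {u_lo}..{u_hi}, signed {s_lo}..{s_hi}")
```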

Some processors require that data be aligned. That is, two byte quantities must start on byte addresses that are multiples of two; four byte quantities must start on byte addresses that are multiples of four; and so on. The general rule follows a progression of powers of two (2, 4, 8, 16, ...). Some processors allow data to be unaligned, but this usually results in a slowdown in performance.

DEC VAX	16 bit [2 byte] word; 32 bit [4 byte] longword; 64 bit [8 byte] quadword; 128 bit [16 byte] octaword; data may be unaligned at a speed penalty
IBM 360/370	32 bit [4 byte] word or full word (the smallest amount of data that can be fetched, with words being addressed by their highest order byte); 16 bit [2 byte] half-word; 64 bit [8 byte] double word; all data must be aligned on full word boundaries
Intel 80x86	16 bit [2 byte] word; 32 bit [4 byte] doubleword; data may be unaligned at a speed penalty
MIX	byte of unspecified size, which must work for both binary and decimal operations without programmer knowledge of the size of the byte, must be able to contain the values 0 to 63, inclusive, and must not hold more than 100 distinct values (six bits on a binary implementation, two digits on a decimal implementation); word is one sign and five bytes
Motorola 680x0	8 bit byte; 16 bit [2 byte] word; 32 bit [4 byte] long or long word; 64 bit [8 byte] quad word; data may be unaligned at a speed penalty, instructions must be on word boundaries
Motorola 68300	8 bit byte; 16 bit [2 byte] word; 32 bit [4 byte] long or long word; 64 bit [8 byte] quad word; data may be unaligned at a speed penalty, instructions must be on word boundaries
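The alignment rule described earlier in this section reduces to simple arithmetic when sizes are powers of two. The sketch below checks alignment with a bit mask, rounds an address up to the next boundary, and uses Python's standard `struct` module to show the padding that native alignment inserts between a 1-byte and a 4-byte field. The helper names are hypothetical, and the native (`@`) layout depends on the platform.

```python
import struct


def is_aligned(address, size):
    """True if address is a multiple of size (size must be a power of two)."""
    return (address & (size - 1)) == 0


def align_up(address, alignment):
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


print(is_aligned(0x1004, 4), is_aligned(0x1006, 4))  # True False
print(align_up(13, 8))                                # 16
# A 1-byte char followed by a 4-byte int: "=" packs with no padding (5 bytes),
# while "@" follows the platform's native alignment and is commonly 8 bytes.
print(struct.calcsize("=bi"), struct.calcsize("@bi"))
```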

endian

Endian is the ordering of bytes in multibyte scalar data. The term comes from Jonathan Swift’s Gulliver’s Travels. For a given multibyte scalar value, big- and little-endian formats are byte-reversed mappings of each other. While processors handle endian issues invisibly when making multibyte memory accesses, knowledge of endian is vital when directly manipulating individual bytes of multibyte scalar data and when moving data across hardware platforms.
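Because the two formats are byte-reversed mappings of each other, converting between them is a byte swap. The sketch below reverses the bytes of a 32-bit value with shifts and masks, the same operation software performs when reading a value written on a machine of the opposite endian. `swap32` is a hypothetical name; many compilers and C libraries offer their own byte swap routines under various names.

```python
def swap32(value):
    """Reverse the byte order of a 32-bit unsigned value using shifts and masks."""
    return (((value & 0x000000FF) << 24) |
            ((value & 0x0000FF00) << 8) |
            ((value & 0x00FF0000) >> 8) |
            ((value & 0xFF000000) >> 24))


print(hex(swap32(0x12345678)))                   # 0x78563412
print(swap32(swap32(0xCAFEF00D)) == 0xCAFEF00D)  # True: swapping twice restores the value
```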

Big endian stores scalars in their “natural order”, with the most significant byte in the lowest numeric byte address. Examples of big endian processors are the IBM System 360 and 370, Motorola 680x0, Motorola 68300, and many RISC processors (especially older designs).

Little endian stores scalars with the least significant byte in the lowest numeric byte address. Examples of little endian processors are the Digital VAX and Intel x86 (including Pentium).
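Python's built-in `int.to_bytes` and `int.from_bytes` make both layouts visible without any hardware tricks, and `sys.byteorder` reports the endian of the machine running the code. A short sketch using the arbitrary example value 0x0A0B0C0D; note how decoding with the wrong byte order silently yields a different number.

```python
import sys

value = 0x0A0B0C0D

big = value.to_bytes(4, "big")        # most significant byte at the lowest address
little = value.to_bytes(4, "little")  # least significant byte at the lowest address

print(big.hex(" "))     # 0a 0b 0c 0d
print(little.hex(" "))  # 0d 0c 0b 0a
print(sys.byteorder)    # depends on the machine, for example "little" on x86

# Reading bytes with the wrong byte order silently yields a different number.
print(hex(int.from_bytes(big, "little")))  # 0xd0c0b0a
```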

Bi-endian processors can run in either big endian or little endian mode under software control. An example is the Motorola/IBM PowerPC, which has two separate bits in the Machine State Register (MSR) for controlling endian: the ILE bit controls endian during interrupts and the LE bit controls endian for all other processes. Big endian is the default for the PowerPC.

chapter contents

free music player coding example

Coding example: I am making heavily documented and explained open source code for a method to play music for free — almost any song, no subscription fees, no download costs, no advertisements, all completely legal. This is done by building a front-end to YouTube (which checks the copyright permissions for you).

View music player in action: www.musicinpublic.com/.

Create your own copy from the original source code/ (presented for learning programming).

view text book
HTML file

Because I no longer have the computer and software to make PDFs, the book is available as an HTML file, which you can convert into a PDF.

†UNIX used as a generic term unless specifically used as a trademark (such as in the phrase “UNIX certified”). UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Ltd.

OSdata.com

data types

summary

free computer programming text book project

stub section

data types

kinds of data types

strong and weak types

valid type identifiers

Python

C

Stanford C essentials

Ada

Ruby

JOVIAL

assembly language instructions

data representation

data size

endian

chapter contents

free music player coding example

view text book
HTML file
