Programming consists of two parts: control structures and data structures. Programming languages implement various control and data structures using a specific syntax and grammar.
OSdata.com is used in more than 300 colleges and universities around the world
Find out how to get similar high web traffic and search engine placement.
Programming languages can be implemented with compilers, assemblers, and interpretters. A compiler converts high level source code into executable object code (also called binary code or machine code). An assembler converts assembly language source code into executable object code. An interpretter translates either source code or tokens into machine code, one instruction at a time, as the program is run. Object oriented programming (as well as the implementation of some kinds of data structures) blurs the line between compilers and interpretters.
nature: The two most common kinds of programming languages are procedural and object oriented.
format: The two basic formats for programming languages are free form and column. Older programming languages were typically designed to fit into the columns of Hollerith punch cards, with specific fields being reserved for specific purposes. Modern programming langauges are free form in the sense that there are no specific columns that various items must line up in.
character set: There are two character sets to consider: the source code character set and the execution character set. The source code character set is the set of characters allowed for writing source code programs. The execution character set is the set of characters recognized and used by running programs. These may be the same character set, but usually are different (usually there are more characters allowed in the execution character set and some languages have mechanisms for specifically handling multiple execution character sets. The source code character set can vary for different parts of the language. Some programming languages have case sensitive source code character sets (that is, identifiers that differ only in case are considered to be different identifiers), while other programming languages have case insensitive source code character sets (that is, identifiers that differ only in case are considered to be the same identifier).
Very few programming languages require a specific character encoding for the execution character set. Some of the common character encodings in use include ASCII, EBCDIC, and UniCode. A common cross-platform programming error is to assume a specific character set. For example, the expression ('Z' - 'A' + 1) will produce the number of letters in the alphabet (26) when either ASCII or UniCode is used, but will produce the answer 41 when EBCDIC is used (because the letters of the alphabet are not encoded consecutively in EBCDIC). Another common error regards the use of the high eight bit characters in ASCII, which are typically different on each different operating system.
white space: White space is any blanks (or space characters) in the source code. Some programming languages define additional characters as white space. These may include horizontal tab, vertical tab, end of line, form feed, carriage return, and line feed. In some programming languages, comments are treated as white space. White space is sometimes compacted into a single space character by the compiler. White space may be optional or required (and whether it is optional or required can vary at different parts of the source code).
line termination: Some modern programming languages ignore line returns, while others use them as dividing lines between instructions (sometimes with a special character to indicate that an instruction continues on more than one line and sometimes with a special character to allow more than one instruction per line).
line length: Some programming languages have fixed line lengths, while others have variable line lengths. Fixed line lengths typically occur with programming languages that use columns.
multibyte characters: Some programming languages have special multibyte encodings for some characters. It is common to have an escape character(s) to allow programmers to indicate special characters (such as non-printable characters, line breaks, characters reserved as special symbols, etc.) in character or string constants. Some programming languages have special character sequences for encoding required characters on older hardware that doesnt support all of the required source code character set. Some programming languages include have special sequences for encoding multi-byte characters (such as non-Roman alphabets).
comments: Every programming language except for machine language (writing in raw binary opcodes) has some system for comments (and comments are sometimes even interspersed as ASCII data between raw binary machine code). Comments are used to to make programs self-documenting for future human maintenance or modification (most of the life of any successful software is in maintenance, not initial programming). Comments are either removed or replaced with white space.
tokens: Tokens are the basic lexical building blocks of source code. Characters are combined into tokens according to the rules of the programming language. There are five classes of tokens: identifiers, reserved words, operators, separators, and constants.
operators: Operators are tokens used to indicate an action to be taken (usually arithmetic operations, logical operations, bit operations, and asignment operations). Operators can be simple operators (a single character token) or compound operators (two or more character tokens).
separators: Separators are tokens used to separate other tokens. Two common kinds of separators are indicators of an end of an instruction and separators used for grouping.
There are three basic kinds of control structures: sequences, branching, and loops. Corrado Böhm and Guiseppe Jacopini presented the proof of the Structured Theorem (which states that any logic problem can be solved with only sequence, choice (IFTHENELSE), and repetition (DOWHILE) structures) in Flow Diagrams, Turing Machines and Languages with Only Two Formation Rules, Communications of the ACM, No. 5, May 1966, pp 366-371.
Sequential structures are structures that are stepped through sequential. These are also called sequences or iterative structures. Basic arithmetic, logical, and bit operations are in this category. Data moves and copies are sequences. Most register and processor control operations are sequences. Some computer scientists group subroutines and functions as a special kind of sequence, while others group them as a special kind of branching structure.
Branching structures consist of direct and indirect jumps (including the infamous GOTO), conditional jumps (IF), nested ifs, and case (or switch) structures.
The basic looping structures are DO iterative, do WHILE, and do UNTIL. An infinite loop is one that has no exit. Normally, infinite loops are programming errors, but event loops and task schedulers are examples of intentional infinite loops.
Object oriented programming does not fit neatly into the three basic categories, because the three basic categories of control structures describe procedural programming. The internal implementation of at least some methods will use the three basic structures, but this is hidden from other objects.
Machine code can be written using the three basic structures, although on some particular processors it may not be possible to write all of the structures as atomic code. Atomic means all as one unit. On some processors it may be necessary to construct some structures with a series of machine instructions instead of with a single machine instruction.
Exception processing also doesnt fall into the three basic categories. Normally, exception processing only occurs in low level assembly or machine language programming. Some high level languages provide on conditions, or the ability to designate routines to be run when an exception or other special event occurs. The actual code that runs during the exception can be written with just the three basic structures.
There are some kinds of structures that dont neatly fall into the above categories. Böhm-Jacopinis theory states that only the three basic structures are necessary, but obviously other structures do exist. Use of these other structures is generally contraindicated (you will almost certainly regret the decision to use them).
All information in binary digital computers is actually stored as a series of binary ones and zeros. Those sequences of binary data are interpretted by the processor, by software, or by humans as having some other meaning (numbers, characters, logic states, or complex data structures).
The following quoted text presents a formal mathematical description of compilers and interpreters. Feel free to skip over this portion.
Programming languages are tools used to construct formal descriptions of finite computations (algorithms). Each computation consists of operations that transform a given initial state into some final state. A programming language provides essentially three components for describing such computations:
- Data types, objects and values with operations defined upon them.
- Rules fixing the chronological relationships among specified operations.
- Rules fixing the (static) structure of a program.
These components together constitute the level of abstraction on which we can formulate algorithms in the language.
The collection of objects existing at a given point in time during the computation constitutes the state, x, of the computation at that time. The set, S, of all states that could occur during computations expressed in the language is called the state space of the language. The meaning of an algorithm is the (partially-defined) function : S -> S by which it transforms initial states to final states. Compiler Construction, by William M. Waite and Gerhard Goos, page 1b5
For every programming language, PL, we can define an abstract machine: The operations, data structures and control structures of PL become the memory elements and instructions of the machine. A Pascal machine is therefore an imaginary computer with Pascal operations as its machine instructions and the data objects possible in Pascal as its memory elements. Execution of an algorithm written in PL on such a machine is called interpretation; the asbtract machine is an interpreter.
A pure interpreter analyzes the character form of each source language instruction every time that instruction is executed. If the given instructionis only executed once, pure interpretation is the least expensive method of all. Hence it is often used for job control languages and the immediate commands of interactive languages. When instructions are to be executed repeatedly, a better approach is to analyze the character form of the source program only once, replacing it with a sequence of symbols more amenable to interpretation. This analysis is simply a translation of the source language into some target language, which is then interpreted.
The translation from the source language to the target language can take place as each instruction of the program is executed for the first time (interpretation with substitution). Thus only that part of the program actually executed will be translated; during testing this may be only a fraction of the entire program. Also, the character form of the source program can often be stored more compactly than the equivalent target program. The disadvantage of interpretation with substitution is that both the compiler and interpreter must be available during execution. In practice, however, a system of this kind should not be significantly larger than a pure interpreter for the same language.
Examples may be found of virtually all levels of interpretation. At one extreme are the systems in which the computer merely converts constants to internal form, fixes the meaning of identifiers and perhaps transforms infix notation to postfix (APL and SNOBOL4 are commonly implemented this way); at the other are the systems in which the hardware, assited by a small run-time system, forms the interpreter (FORTRAN and Pascal implementations usually follow this strategy). Compiler Construction, by William M. Waite and Gerhard Goos, pages 2-4b5
The term compilation denotes the conversion of an algorithm expressed in a human-oriented source language to an equivalent algorithm expressed in a hardware-oriented target language. Compiler Construction, by William M. Waite and Gerhard Goos, page 1b5
A compilation is usually implemented as a sequence of transformations (SL, L1), (L1, L2), , (Lk, TL), where SL is the source language and TL is the target language. Each language Li is called an intermediate language. Intermediate languages are conceptual tools used in decomposing the task of compiling from the source language to the target language. The design of a particular compiler determines which (if any) intermediate language programs actually occur as concrete text or data structures during compilation.
Any compilation can be broken down into two major tasks:
- Analysis: Discover the structure and primitives of the source program, determining its meaning.
- Synthesis: Create a target program equivalent to the source program.
The analysis concerns itself solely with the properties of the source langauge. It converts the program text submitted by the programmer into an abstract representation embodying the essential properties of the algorithm. This abstract representaiton may be implemented in many ways, but it is usually conceptualized as a tree. The structure of the tree represents the control and data flow aspects of the program, and additional information is attached to the nodes to describe other aspects vital to the compilation.
Synthesis proceeds from the abstraction developed during analysis. It augments the tree by attaching additional information that reflects the source-to-target mapping.
Formal definitions of the source language and the source-to-target mapping determine the structure of the tree and the computation of the additional information. The compiler simply implements the indicated transformations.
Analysis is the more formalized of the two major compiler tasks. It is generally broken down into two parts, the structural analysis to determine the static structure of the source program, and the semantic analysis to fix the additional information and check its consistency. Two subtasks of the structural analysis are identified in terms of finite-state automata; syntatic analysis, or parsing.
There is little in the way of formal models for the entire synthesis process, although algorithms for various subtasks are known. We view synthesis as consisting of two distinct subtasks, code generation and assembly. Code generation transforms the abstract source program appearing at the analysis/symthesis interface into an equivalent target machine program. This transformatin is carried out in two steps: First we map the algorithm from source concepts to target concepts, and then we select a specific sequence of target machine instructions to omplement that algorithm.
Assembly resolves all target addressing and converts the target machine instructions into an appropriate output format. We should stress that by using the term assembly we do not imply that the code generator will produce symbolic assmebly code for input to the assembly task. Instead, it delivers an internal representation of target instructions in which most addresses remain unresolved. This representation is similar to that resulting from analysis of symbolic instructions during the first pass of a normal symbolic assembler. The output of the assembly task should be in the format accepted by the standard link editor or loader on the target machine.
Errors may appear at any time during the compilation process. In order to detect as many errors as possible in a single run, repairs must be made such that the program is consistent, even though it may not reflect the programmers intent. Violations of the rules of the source language should be detected and reported during analysis. If the source algorithm uses concepts of the source language for which no target equivalent has been defined in a particular implementation, or if the target algorithm exceeds limitations of a specific target langauge interpretter (e.g. requires more memory than a specific computer provides), this should be reported during synthesis. FInally, errors must be reported if any storage limits of the compiler itself are violated.
In addition to the actual error handling, it is useful for the compiler to provide extra information for run-time error detection and debugging. This task is closely related to error handling. Compiler Construction, by William M. Waite and Gerhard Goos, pages 4-7b5
Programming example: I am making heavily documented and explained open source PHP/MySQL code for a method to play music for free almost any song, no subscription fees, no download costs, no advertisements, all completely legal. This is done by building a front-end to YouTube (which checks the copyright permissions for you).
View music player in action: www.musicinpublic.com/.
Create your own copy from the original source code/ (presented for learning programming). Includes how to run this from your own computer if you dont have a web site.
OSdata.com is used in more than 300 colleges and universities around the world
free downloadable college text book on computer programming.
A web site on dozens of operating systems simply cant be maintained by one person. This is a cooperative effort. If you spot an error in fact, grammar, syntax, or spelling, or a broken link, or have additional information, commentary, or constructive criticism, please e-mail Milo. If you have any extra copies of docs, manuals, or other materials that can assist in accuracy and completeness, please send them to Milo, PO Box 1361, Tustin, CA, USA, 92781.
|previous page||next page|
This web site handcrafted on Macintosh computers using Tom Benders Tex-Edit Plus and served using FreeBSD .
Names and logos of various OSs are trademarks of their respective owners.
Copyright © 2000, 2002, 2007 Milo
Last Updated: October 10, 2007
Created: July 9, 2000
|previous page||next page|