Saturday, 30 July 2011

Command Line Arguments in Java Program



A Java application can accept any number of arguments directly from the command line. The user enters command-line arguments when invoking the application. When the program is run with the java command, the arguments follow the class name, separated by spaces. For example, suppose a program named CmndLineArguments accepts command-line arguments as a string array and echoes them to standard output:
java CmndLineArguments Mahendra zero one two three
CmndLineArguments.java
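/**
 * How to use command-line arguments in a Java program.
 */
class CmndLineArguments {

    public static void main(String[] args) {
        int length = args.length;
        if (length == 0) {
            System.out.println("You need to enter some arguments.");
        } else {
            // Header line, so the output matches the sample run below.
            System.out.println("Command line arguments were passed :");
        }
        // Echo each argument on its own line.
        for (int i = 0; i < length; i++) {
            System.out.println(args[i]);
        }
    }
}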
Output of the program:
Run program with some command line arguments like: 
java CmndLineArguments Mahendra zero one two three
OUTPUT
Command line arguments were passed :
Mahendra
zero
one
two
three
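
Every command-line argument arrives as a String, so numeric input has to be converted explicitly. The following is a minimal sketch of that idea and is not part of the original program; the class name SumArguments and the choice to skip non-numeric arguments are assumptions made for illustration.

```java
/**
 * Hypothetical example: sums the command-line arguments that can be
 * parsed as integers and skips the rest.
 */
class SumArguments {

    // Returns the sum of all arguments that parse as int; others are skipped.
    static int sum(String[] args) {
        int total = 0;
        for (String arg : args) {
            try {
                total += Integer.parseInt(arg);
            } catch (NumberFormatException e) {
                System.out.println("Skipping non-numeric argument: " + arg);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Sum = " + sum(args));
    }
}
```

Running it as java SumArguments 1 two 3 would report the skipped argument and then a sum of 4.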

Java Virtual Machine

 


Overview of Java Virtual Machine (JVM) architecture. Source code is compiled down to Java bytecode. Any platform running a JVM can execute Java bytecode. Bytecode is verified, then interpreted or JIT-compiled for the native architecture. The Java APIs and JVM together make up the Java Runtime Environment (JRE).
A Java Virtual Machine (JVM) is a virtual machine capable of executing Java bytecode. Sun Microsystems states there are over 4.5 billion JVM-enabled devices.[1]

Overview

A Java Virtual Machine is a piece of software that is implemented on non-virtual hardware and on standard operating systems. A JVM provides an environment in which Java bytecode can be executed, enabling such features as automated exception handling, which provides "root-cause" debugging information for every software error (exception), independent of the source code. A JVM is distributed along with a set of standard class libraries that implement the Java application programming interface (API). Appropriate APIs bundled together form the Java Runtime Environment (JRE).
JVMs are available for many hardware and software platforms. The use of the same bytecode for all JVMs on all platforms allows Java to be described as a "compile once, run anywhere" programming language, as opposed to "write once, compile anywhere", which describes cross-platform compiled languages. Thus, the JVM is a crucial component of the Java platform.
Java bytecode is an intermediate language which is typically compiled from Java, but it can also be compiled from other programming languages. For example, Ada source code can be compiled to Java bytecode and executed on a JVM.
Oracle, the owner of Java, produces a JVM, but JVMs using the "Java" trademark may be developed by other companies as long as they adhere to the JVM specification published by Oracle and to related contractual obligations.

Execution environment

Java's execution environment is termed the Java Runtime Environment, or JRE.
Programs intended to run on a JVM must be compiled into a standardized portable binary format, which typically comes in the form of .class files. A program may consist of many classes in different files. For easier distribution of large programs, multiple class files may be packaged together in a .jar file (short for Java archive).
The Java application launcher, java, offers a standard way of executing Java code. Compare javaw.[2]
The JVM runtime executes .class or .jar files, emulating the JVM instruction set by interpreting it, or using a just-in-time compiler (JIT) such as Oracle's HotSpot. JIT compiling, not interpreting, is used in most JVMs today to achieve greater speed. There are also ahead-of-time compilers that enable developers to precompile class files into native code for particular platforms.
Like most virtual machines, the Java Virtual Machine has a stack-based architecture akin to a microcontroller/microprocessor. However, the JVM also has low-level support for Java-like classes and methods, which amounts to a highly idiosyncratic memory model and capability-based architecture.

JVM languages

Although the JVM was primarily aimed at running compiled Java programs, many other languages can now run on top of it.[4] The JVM currently has no built-in support for dynamically typed languages: the existing JVM instruction set is statically typed,[5] although the JVM can be used to implement interpreters for dynamic languages. The JVM has limited support for dynamically modifying existing classes and methods; this currently works only in a debugging environment, where new classes and methods can be added dynamically. Built-in support for dynamic languages is currently planned for Java 7.[6]

Bytecode verifier

A basic philosophy of Java is that it is inherently "safe" from the standpoint that no user program can "crash" the host machine or otherwise interfere inappropriately with other operations on the host machine, and that it is possible to protect certain methods and data structures belonging to "trusted" code from access or corruption by "untrusted" code executing within the same JVM. Furthermore, common programmer errors that often lead to data corruption or unpredictable behavior such as accessing off the end of an array or using an uninitialized pointer are not allowed to occur. Several features of Java combine to provide this safety, including the class model, the garbage-collected heap, and the verifier.
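
As a small illustration of the array-bounds guarantee mentioned above, the sketch below shows that an out-of-range index produces an exception instead of an unchecked memory read. The class and method names are hypothetical; the check itself is performed at run time when the array is accessed.

```java
/**
 * Hypothetical example: an access off the end of an array is stopped
 * with an exception rather than reading arbitrary memory.
 */
class BoundsCheckDemo {

    // Returns true if reading data[index] is rejected by the runtime.
    static boolean isRejected(int[] data, int index) {
        try {
            int unused = data[index];   // checked on every access
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] data = new int[3];        // valid indexes are 0, 1, and 2
        System.out.println(isRejected(data, 2));
        System.out.println(isRejected(data, 3));
    }
}
```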
The JVM verifies all bytecode before it is executed. This verification consists primarily of three types of checks:
  • Branches are always to valid locations
  • Data is always initialized and references are always type-safe
  • Access to "private" or "package private" data and methods is rigidly controlled.
The first two of these checks take place primarily during the "verification" step that occurs when a class is loaded and made eligible for use. The third is primarily performed dynamically, when data items or methods of a class are first accessed by another class.
The verifier permits only some bytecode sequences in valid programs, e.g. a jump (branch) instruction can only target an instruction within the same method. Furthermore, the verifier ensures that any given instruction operates on a fixed stack location,[7] allowing the JIT compiler to transform stack accesses into fixed register accesses. Because of this, the fact that the JVM is a stack architecture does not imply a speed penalty for emulation on register-based architectures when using a JIT compiler. Given the code-verified JVM architecture, it makes no difference to a JIT compiler whether it gets named imaginary registers or imaginary stack positions that must be allocated to the target architecture's registers. In fact, code verification makes the JVM different from a classic stack architecture, whose efficient emulation with a JIT compiler is more complicated and is typically carried out by a slower interpreter.
Code verification also ensures that arbitrary bit patterns cannot get used as an address. Memory protection is achieved without the need for a memory management unit (MMU). Thus, JVM is an efficient way of getting memory protection on simple architectures that lack an MMU. This is analogous to managed code in Microsoft's .NET Common Language Runtime, and conceptually similar to capability architectures such as the Plessey 250, and IBM System/38.
The original specification for the bytecode verifier used natural language that was "incomplete or incorrect in some respects." A number of attempts have been made to specify the JVM as a formal system. By doing this, the security of current JVM implementations can be analyzed more thoroughly, and potential security exploits prevented. It will also be possible to optimize the JVM by skipping unnecessary safety checks, if the application being run is proved to be safe.




Java Statements










Methods and constructors are sequences of statements, along with variable
definitions.
The statements specify the sequence of actions to be performed when a
method or constructor is invoked.
They can alter the value of variables, generate output, process input, or
respond to user mouse or keyboard actions.

Different types of statements are described in the following sections.



Assignment Statements


An assignment statement has the following form.
variable = expression;
This statement changes the value of the variable on the left side of the
equals sign to the value of the expression on the right-hand side.
The variable is often just specified by a variable name, but
there are also expressions that specify variables.

Java treats an assignment as both an expression and as a statement.
As an expression, its value is the value assigned to the variable.
This is done to allow multiple assignments in a single statement, such as

a = b = 5;
By treating b = 5 as an expression with value 5, Java makes
sense of this statement, assigning the value 5 to both a and
b.
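
A minimal sketch of the multiple assignment described above; the class and method names are hypothetical and exist only to make the fragment runnable.

```java
/**
 * Hypothetical example: an assignment is itself an expression whose
 * value is the value assigned.
 */
class AssignmentDemo {

    // Returns {a, b} after the chained assignment a = b = 5.
    static int[] chain() {
        int a;
        int b;
        a = b = 5;   // b = 5 is evaluated first and yields 5, which is then assigned to a
        return new int[] {a, b};
    }

    public static void main(String[] args) {
        int[] values = chain();
        System.out.println("a = " + values[0] + ", b = " + values[1]);
    }
}
```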


Statements involving Messages


Messages are the fundamental means of communication between objects in a
Java program.
A message has the following form.
receiver.method-name(parameters)

Here,


  • receiver is an expression (often just a variable name) that
    specifies the object that should respond to the message.

  • method-name is the name of the method that the receiver should
    execute.

  • parameters is a comma-separated list of expressions that provide
    data that the receiver can use in its execution of the method.
The receiver can be omitted if it is the object that you are writing code for. That is, you do not need to specify the receiver for messages sent from an object to itself.
Messages can be used in three ways to form statements. First, if the method specified by a message returns a value then the message can be used as the expression in an assignment statement.
variable = message;
For messages with methods that do not return values, a statement can also be formed by just terminating the message with a semicolon.
message;
This statement form can also be used when the method returns a value, although it is not usually a good idea to ignore a returned value.
Finally, if a message returns an object, then that object can be used directly as the receiver of a message. In an applet method, for example, getContentPane() returns a container to which components can be added. This container is the receiver of the add() message in the following statement.
getContentPane().add(button);
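
The three statement forms can be seen side by side in the sketch below. It uses String and StringBuilder from the standard library instead of an applet so that it runs on its own; the class name MessageDemo and the method describe are hypothetical.

```java
/**
 * Hypothetical example: the three ways a message can form a statement.
 */
class MessageDemo {

    static String describe(String name) {
        StringBuilder builder = new StringBuilder();

        // 1. variable = message;  (length() returns a value)
        int length = name.length();

        // 2. message;  (append() returns a value, which is ignored here)
        builder.append(name);

        // 3. The object returned by one message is the receiver of the next.
        builder.append(" has ").append(length).append(" characters");

        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("Java"));
    }
}
```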

Statement Blocks

In Java, any sequence of statements can be grouped together to function as a single statement by enclosing the sequence in braces. These groupings are called statement blocks. A statement block may also include variable declarations.
Statement blocks are used to define methods and to allow multiple statements in the control structures described in the following sections.

Control Statements

Normally, statements in a method or constructor are executed sequentially. Java also has control statements that allow repetitive execution of statements and conditional execution of statements. Java has the following types of control statements.


Conditional Execution and Selection


Java has three kinds of statements that either execute a nested
statement conditionally, based on the value of a boolean expression, or
select among several statements, based on the value of a control
expression.
These statements are the if statement, the if-else statement, and the
switch statement.


If Statements

The if statement has the following form.
if (boolean-expression) {
    then-clause
}

Here,

  • boolean-expression is an expression that can be true or false.

  • then-clause is a sequence of statements.
    If there is only one statement in the sequence then the surrounding
    braces may be omitted.
    The then-clause statements are executed only if the
    boolean-expression is true.
If-Else Statements
The if-else statement has the following form.
if (boolean-expression) {
    then-clause
} else {
    else-clause
}
Here,

  • boolean-expression is an expression that can be true or false.

  • then-clause and else-clause are
    sequences of statements.
    If there is only one statement in a sequence then the surrounding
    braces may be omitted.
    The then-clause statements are executed only if the
    boolean-expression is true.
    The else-clause statements are executed if the
    boolean-expression is false.
Switch Statements
The switch statement allows execution of different statements depending on the value of an expression. It has the following form.
switch (control-expression) {
    case constant-expression-1:
        statements-1
        .
        .
        .
    case constant-expression-n:
        statements-n
    default:
        default-statements
}
Here,

  • control-expression is an expression of a simple type, such
    as int, char, or an enum type.
    It cannot have float or double type.

  • constant-expression-1 through
    constant-expression-n are expressions of a type that
    converts to the type of control-expression.
    The compiler must be able to evaluate these expressions to constant
    values.

  • statements-1 through statements-n are sequences
    of statements.
When the switch statement is executed, control-expression is evaluated. The resulting value is compared to the values of constant-expression-1 through constant-expression-n in order until a matching value is found. If a match is found in constant-expression-i then statements-i through statements-n and default-statements are executed, with switch statement execution terminated if a break statement is encountered. Normally, the last statement in each sequence is a break statement so that only one sequence is executed.
The default clause is optional. If it is present then the default-statements are executed whenever the value of control-expression does not match any of the constant-expression-i values.
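
The effect of break, and of leaving it out, can be sketched as follows. The class and method names are hypothetical. The first method groups several case labels and ends each sequence with break; the second omits break on purpose to show execution falling through to the following cases.

```java
/**
 * Hypothetical example: switch with and without break statements.
 */
class SwitchDemo {

    // Days in a month (1-12), ignoring leap years; each sequence ends with break.
    static int daysInMonth(int month) {
        int days;
        switch (month) {
            case 4:
            case 6:
            case 9:
            case 11:
                days = 30;
                break;
            case 2:
                days = 28;
                break;
            default:
                days = 31;
        }
        return days;
    }

    // No break statements: execution falls through from the matching case onward.
    static String fallThrough(int start) {
        String trace = "";
        switch (start) {
            case 1:
                trace += "one ";
            case 2:
                trace += "two ";
            default:
                trace += "default";
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(daysInMonth(9));
        System.out.println(fallThrough(1));
    }
}
```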
Extended If-Else Statements
Often, a programmer needs a construction that works like a switch statement, but the selection between choices is too complex for a switch statement. To make this work, a programmer can use a sequence of if statements where each if statement is nested in the else clause of the preceding if statement. This construction is called an extended if statement. It has the following form.
if (boolean-expression-1) {
    statements-1
} else if (boolean-expression-2) {
    statements-2
    .
    .
    .
} else if (boolean-expression-n) {
    statements-n
} else {
    default-statements
}
Here,

  • boolean-expression-1 through
    boolean-expression-n are expressions that can be true or false.

  • statements-1 through statements-n and
    default-statements are sequences of statements.
When this extended if-else statement is executed, the boolean expressions are evaluated in order until one is found that is true. Then the corresponding sequence of statements is executed. If none of the boolean expressions is true then the default-statements are executed. In either case, execution continues with the next statement after the extended if-else statement.
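
A range test is a typical selection that a switch on constant values cannot express directly. The sketch below is a hypothetical example; the grade boundaries are assumptions chosen for illustration.

```java
/**
 * Hypothetical example: an extended if-else statement that selects
 * on ranges of values.
 */
class GradeDemo {

    // The conditions are tested in order; the first true one wins.
    static char letterGrade(int score) {
        char grade;
        if (score >= 90) {
            grade = 'A';
        } else if (score >= 80) {
            grade = 'B';
        } else if (score >= 70) {
            grade = 'C';
        } else {
            grade = 'F';   // default-statements
        }
        return grade;
    }

    public static void main(String[] args) {
        System.out.println(letterGrade(85));
    }
}
```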

Repetition

Java has three kinds of loop statements: while loops, for loops, and do-while loops. Loop statements allow a nested statement to be executed repetitively. The nested statement can be a block statement, allowing repetition of a sequence of statements.
When a loop is executed its nested statement can be executed any number of times. Each execution of the nested statement is called an iteration of the loop. The number of iterations is controlled by a boolean expression. The boolean expression is evaluated before each iteration (pretest) in while loops and for loops, and after each iteration (post-test) in do-while loops. When the boolean expression becomes false the loop is terminated.
While Loops
The while loop is a pretest loop statement. It has the following form.
while (boolean-expression) {
    nested-statements
}
Here,

  • boolean-expression is an expression that can be true or false.

  • nested-statements is a sequence of statements.
    If there is only one statement then the braces can be omitted.
The boolean expression is tested before each iteration of the loop. The loop terminates when it is false.
For Loops
The for loop is a pretest loop statement. It has the following form.
for (initialization; boolean-expression; increment) {
    nested-statements
}
Here,

  • initialization is an expression (usually an assignment
    expression).

  • boolean-expression is an expression that can be true or false.

  • increment is an expression.

  • nested-statements is a sequence of statements.
    If there is only one statement then the braces may be omitted.
When a for loop is executed, the initialization expression is evaluated first. This expression is usually an assignment (for example, i = 0) that sets the initial value of a loop control variable.
The boolean expression is tested before each iteration of the loop. The loop terminates when it is false. The boolean expression is frequently a comparison (for example, i < 10).
At the end of each iteration, the increment expression is evaluated. The expression is often an expression that increments the control variable (for example, i++).
Do-While Loops
The do-while loop is a post-test loop statement. It has the following form.
do {
    nested-statements
} while (boolean-expression);
Here,

  • nested-statements is a sequence of statements.
    If there is only one statement then the braces may be omitted.

  • boolean-expression is an expression that can be true or false.
The boolean expression is tested after each iteration of the loop. The loop terminates when it is false.
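
The same computation written with each kind of loop makes the pretest/post-test difference visible. In this sketch the class and method names are hypothetical. Each method sums the integers from 1 to n, but the do-while version runs its body once even when n is 0.

```java
/**
 * Hypothetical example: summing 1..n with each kind of loop.
 */
class LoopDemo {

    static int sumWithWhile(int n) {
        int sum = 0;
        int i = 1;
        while (i <= n) {        // tested before each iteration
            sum += i;
            i++;
        }
        return sum;
    }

    static int sumWithFor(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {   // initialization; test; increment
            sum += i;
        }
        return sum;
    }

    static int sumWithDoWhile(int n) {
        int sum = 0;
        int i = 1;
        do {                    // body always runs at least once
            sum += i;
            i++;
        } while (i <= n);       // tested after each iteration
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithWhile(4));
        System.out.println(sumWithFor(4));
        System.out.println(sumWithDoWhile(4));
    }
}
```

For n = 4 all three methods agree; for n = 0 the pretest loops return 0 while the post-test loop returns 1, because its body has already executed once before the test.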

Special Control Statements

Return Statements
The return statement is used in the definition of a method to set its returned value and to terminate execution of the method. It has two forms. Methods with returned type void use the following form.
return;
Methods with non-void returned type use the following form.
return expression;
Here,

  • expression is an expression that yields the desired return
    value.
    This value must be convertible to the return type declared for the
    method.
Continue Statements
The continue statement is used in a while loop, for loop, or do-while loop to terminate the current iteration of the loop. A continue statement has the following form.
continue;
After a continue statement is executed in a for loop, its increment and boolean expression are evaluated. If the boolean expression is true then the nested statements are executed again.
After a continue statement is executed in a while or do-while loop, its boolean expression is evaluated. If the boolean expression is true then the nested statements are executed again.
Break Statements
The break statement is used in loop (for, while, and do-while) statements and switch statements to terminate execution of the statement. A break statement has the following form.
break;
After a break statement is executed, execution proceeds to the statement that follows the enclosing loop or switch statement.
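
The continue and break statements can be sketched together in one loop. The class and method names below are hypothetical: continue skips the rest of one iteration, while break leaves the loop entirely.

```java
/**
 * Hypothetical example: continue and break inside a for loop.
 */
class BreakContinueDemo {

    // Sums the odd values in data, stopping at the first negative value.
    static int sumOddUntilNegative(int[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] < 0) {
                break;          // terminate the whole loop
            }
            if (data[i] % 2 == 0) {
                continue;       // skip to the increment (i++) and the next test
            }
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOddUntilNegative(new int[] {1, 2, 3, -1, 5}));
    }
}
```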






Tokens in Java Programs







Tokens and Java Programs









Introduction
In this lecture we will learn about the lowest level of the Java language:
its tokens.
We will learn how to recognize and classify every category of token
(which is like classifying English words into their parts of speech).
Towards this end, we will employ our newly learned EBNF skills to write and
analyze descriptions for each category of token.
In later lectures we will learn about a programming language's higher level
structures: phrases (expressions), sentences (statements),
paragraphs (blocks/methods), chapters (classes), and books (packages).







The Family History of Java
Before going on to study Java, let's take a brief look, through quotes,
at the languages on which Java was based, traveling back over 30 years
to do so.



Where it starts: C
The earliest precursor of Java is C: a language developed by Dennis Ritchie
at Bell Labs in the early 1970s.
C was used as a system programming language for the DEC PDP-11.
C began achieving its widespread popularity when Bell's Unix operating
system was rewritten in C.
Unix was one of the first operating systems written in a high-level language;
it was distributed to universities for free, where it became popular.
Linux is currently a popular (it is still free!) variant of Unix.
"C is a general-purpose programming language which features economy of
expression, modern control flow and data structures, and a rich set of
operators.
C is not a "very high level" language, nor a "big" one, and is not
specialized to any particular area of application."



- B. Kernighan/D. Ritchie: The C Programming Language

(Ritchie designed and implemented C; Kernighan co-authored the book)





From C to C++
"A programming language serves two related purposes: it provides a vehicle
for the programmer to specify actions to be executed, and it provides a
set of concepts for the programmer to use when thinking about what can
be done.
The first aspect ideally requires a language that is "close to the
machine," so that all important aspects of a machine are handled simply
and efficiently in a way that is reasonably obvious to the programmer.
The C language was primarily designed with this in mind.
The second aspect ideally requires a language that is "close to the
problem to be solved" so that the concepts of a solution can be expressed
directly and concisely.
The facilities added to C to create C++ were primarily designed with this
in mind"
- B. Stroustrup: The C++ Programming Language (2nd Ed)

(Stroustrup designed and implemented C++)





Java as a Successor to C++
"The Java programming language is a general-purpose, concurrent, class-based,
object-oriented language.
It is designed to be simple enough that many programmers can achieve fluency
in the language.
The Java programming language is related to C and C++ but it is organized
rather differently, with a number of aspects of C and C++ omitted and a
few ideas from other languages included.
It is intended to be a production language, not a research language, and
so, as C.A.R. Hoare suggested in his classic paper on language design,
the design has avoided including new and untested features.


...


The Java programming language is a relatively high-level language, in that
details of the machine representation are not available through the
language.
It includes automatic storage management, typically using a garbage
collector, to avoid the safety problems of explicit deallocation (as in
C's free or C++'s delete).
High-performance garbage-collected implementations can have bounded pauses
to support systems programming and real-time applications.
The language does not include any unsafe constructs, such as array accesses
without index checking, since such unsafe constructs would cause a
program to behave in an unspecified way."
- J. Gosling, B. Joy, G. Steele, G. Bracha: The Java Language Specification






Overview of Tokens in Java: The Big 6
In a Java program, all characters are grouped into symbols called
tokens.
Larger language features are built from the first five categories of tokens
(the sixth kind of token is recognized, but is then discarded by the Java
compiler from further processing).
We must learn how to identify all six kinds of tokens that can appear in
Java programs.
In EBNF we write one simple rule that captures this structure:
token <= identifier | keyword | separator | operator | literal | comment

We will examine each of these kinds of tokens in more detail below, again
using EBNF.
For now, we briefly describe in English each token type.

  1. Identifiers: names the programmer chooses
  2. Keywords: names already in the programming language
  3. Separators (also known as punctuators): punctuation characters and
    paired-delimiters
  4. Operators: symbols that operate on arguments and produce results
  5. Literals (specified by their type)
    • Numeric: int and double
    • Logical: boolean
    • Textual: char and String
    • Reference: null
  6. Comments
    • Line
    • Block
Finally, we will also examine the concept of white space which is crucial to understanding how the Java compiler separates the characters in a program into a list of tokens; it sometimes helps decide where one token ends and where the next token starts.

The Java Character Set
The full Java character set includes all the Unicode characters;
Java's 16-bit char type can represent 2^16 = 65,536 of them.
Since this character set is very large and its structure very complex, in
this class we will use only the subset of Unicode that includes all the
ASCII (pronounced "Ask E") characters; there are 2^7 = 128 ASCII characters
(2^8 = 256 in the common 8-bit extensions), of which we will still use a
small subset containing alphabetic, numeric, and some special characters.
We can describe the structure of this character set quite simply in EBNF,
using only alternatives in the right hand sides.

lower-case <= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z

upper-case <= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z

alphabetic <= lower-case | upper-case

numeric     <= 0|1|2|3|4|5|6|7|8|9

alphanumeric <= alphabetic | numeric

special       <= !|%|^|&|*|(|)|-|+|=|{|}|||~|[|]|\|;|'|:|"|<|>|?|,|.|/|#|@|`|_

graphic     <= alphanumeric | special

In the special rule, the bracket/brace characters stand for themselves
(not EBNF options nor repetitions) and one instance of the vertical bar
stands for itself too: this is the problem one has when the character set
of the language includes special characters that also have meanings in
EBNF.

White space consists of spaces (from the space bar), horizontal and vertical
tabs, line terminators (newlines and formfeeds): all are non-printing
characters, so we must describe them in English.
White space and tokens are closely related: we can use white space to force
the end of one token and the start of another token (i.e., white space is
used to separate tokens).
For example XY is considered to be a single token, while X Y
is considered to be two tokens.
The "white space separates tokens" rule is inoperative inside
String/char literals, and comments (which are all discussed
later).

Adding extra white space (e.g., blank lines or spaces in a line, often for
indenting) to a program changes its appearance but not its meaning to Java:
it still comprises exactly the same tokens in the same order.
Programmers mostly use white space for purely stylistic purposes: to
isolate/emphasize parts of programs and to make them easier to read and
understand.
Just as a good comedian knows where to pause when telling a joke, a good
programmer knows where to put white space when writing code.

Identifiers
The first category of token is an Identifier.
Identifiers are used by programmers to name things in Java: things such as
variables, methods, fields, classes, interfaces, exceptions, packages, etc.
The rules for recognizing/forming legal identifiers can be easily stated in
EBNF.
id-start    <= alphabetic | $ | _

identifier <= id-start{id-start | numeric }


Although identifiers can start with and contain the $ character,
we should never include a $ in identifiers that we write;
such identifiers are reserved for use by the compiler, when it needs to
name a special symbol that will not conflict with the names we write.

Semantically, all characters in an identifier are significant, including the
case (upper/lower) of the alphabetic characters.
For example, the identifiers Count and count denote different
names in Java; likewise, the identifiers R2D2 and R2_D2
denote different names.
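
The identifier rules above translate almost directly into code. The sketch below is a hypothetical checker restricted to the ASCII subset used in these notes; the class and method names are assumptions, and it deliberately does not exclude keywords, which are treated as a separate category of token.

```java
/**
 * Hypothetical example: checks a string against the id-start and
 * identifier rules, using ASCII characters only.
 */
class IdentifierCheck {

    // id-start <= alphabetic | $ | _
    static boolean isIdStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || c == '$' || c == '_';
    }

    // numeric <= 0|1|...|9
    static boolean isNumeric(char c) {
        return c >= '0' && c <= '9';
    }

    // identifier <= id-start{id-start | numeric}
    static boolean isIdentifier(String text) {
        if (text.length() == 0 || !isIdStart(text.charAt(0))) {
            return false;
        }
        for (int i = 1; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isIdStart(c) && !isNumeric(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("R2_D2"));
        System.out.println(isIdentifier("2fast"));
    }
}
```

The standard library offers Character.isJavaIdentifierStart and Character.isJavaIdentifierPart for the full Unicode definition, which accepts more characters than this ASCII-only sketch.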

When you read programs that I have written, and write your own program,
think carefully about the choices made to create identifiers.

  • Choose descriptive identifiers (mostly starting with lower-case
    letters).
  • Separate different words in an identifier with a case change:
    e.g., idCount; this is called "camel style", with each
    capital letter representing a hump.
  • Apply the "Goldilocks Principle": not too short, not too long, just
    right.
During our later discussions of programming style, we will examine the standard naming conventions that are recommended for use in Java code.
Carefully avoid identifiers that contain dollar signs; also avoid
  • homophones (sound alike): aToDConvertor vs. a2DConvertor
  • homoglyphs (look alike): allOs vs. all0s and
    Allls vs. All1s,
      which contain the letter (capital) O, number 0, letter (small) l, letter (capital) I, and number 1
  • mirrors: xCount vs. countX

Keywords
The second category of token is a Keyword, sometimes called a
reserved word.
Keywords are identifiers that Java reserves for its own use.
These identifiers have built-in meanings that cannot change.
Thus, programmers cannot use these identifiers for anything other than their
built-in meanings.
Technically, Java classifies identifiers and keywords as separate categories
of tokens.
The following is a list of all 49 Java keywords; we will learn the meaning
of many, but not all, of them in this course.
It would be an excellent idea to print this table, and then check off the
meaning of each keyword when we learn it; some keywords have multiple
meanings, determined by the context in which they are used.

abstract   continue   goto         package        switch
assert     default    if           private        this
boolean    do         implements   protected      throw
break      double     import       public         throws
byte       else       instanceof   return         transient
case       extends    int          short          try
catch      final      interface    static         void
char       finally    long         strictfp       volatile
class      float      native       super          while
const      for        new          synchronized
Notice that all Java keywords contain only lower-case letters and are at
least 2 characters long; therefore, if we choose identifiers that are very
short (one character) or that have at least one upper-case letter in them,
we will never have to worry about them clashing with (accidentally being
mistaken for) a keyword.
Also note that in the Metrowerks IDE (if you use my color preferences),
keywords always appear in yellow (while identifiers, and many other tokens,
appear in white).

We could state this same tabular information as a very long (and thus harder
to read) EBNF rule of choices (and we really would have to specify
each of these keywords, and not use "...") looking like

keyword <= abstract | boolean | ... | while

Finally, assert was recently added (in Java 1.4) to the original 48
keywords in Java.

Separators
The third category of token is a Separator (also known as a
punctuator).
There are exactly nine single-character separators in Java, shown in the
following simple EBNF rule.
separator <= ; | , | . | ( | ) | { | } | [ | ]

In the separator rule, the bracket/brace characters stand for
themselves (not EBNF options or repetitions).

Note that the first three separators are tokens that separate/punctuate
other tokens.
The last six separators (3 pairs of 2 each) are also known as delimiters:
wherever a left delimiter appears in a correct Java program, its matching
right delimiter appears soon afterwards (they always come in matched
pairs).
Together, each pair delimits some other entity.

For example the Java code Math.max(count,limit); contains nine
tokens

  1. an identifier (Math), followed by
  2. a separator (a period), followed by
  3. another identifier (max), followed by
  4. a separator (the left parenthesis delimiter), followed by
  5. an identifier (count), followed by
  6. a separator (a comma), followed by
  7. another identifier (limit), followed by
  8. a separator (the right parenthesis delimiter), followed by
  9. a separator (a semicolon)

Operators
The fourth category of token is an Operator.
Java includes 37 operators that are listed in the table below;
each of these operators consists of 1, 2, or at most 3 special
characters.
=    >    <    !    ~    ?    :
==   <=   >=   !=   &&   ||   ++   --
+    -    *    /    &    |    ^    %    <<    >>    >>>
+=   -=   *=   /=   &=   |=   ^=   %=   <<=   >>=   >>>=
The keywords instanceof and new are also considered operators
in Java.
This double classification can be a bit confusing; but by the time we
discuss these operators, you'll know enough about programming to take them
in stride.

It is important to understand that Java always tries to construct the
longest token from the characters that it is reading.
So, >>= is read as one token, not as the three tokens >
and > and =, nor as the two tokens >> and
=, nor even as the two tokens > and >=.

Of course, we can always use white space to force Java to recognize separate
tokens of any combination of these characters:
writing >   >= is the two tokens > and >=.
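
The longest-token rule has visible consequences in ordinary code. Here is a minimal sketch (the class name LongestToken and the method sum are made up for this illustration): Java reads a+++b as the tokens a, ++, +, and b, so it means (a++) + b rather than a + (++b).

```java
// Illustrative sketch only: LongestToken and sum are hypothetical names.
class LongestToken {

  // a+++b is tokenized as  a  ++  +  b  (the longest token is built first),
  // so it computes (a++) + b: the old value of a, plus b.
  static int sum(int a, int b) {
    return a+++b;
  }

  public static void main(String[] args) {
    System.out.println(sum(5, 2));   // prints 7

    int x = 64;
    x >>= 2;                         // >>= is one token: shift right by 2, then assign
    System.out.println(x);           // prints 16
  }
}
```

Writing a+ ++b instead (with white space) forces the other reading, because white space ends the first token early.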

We could state this same tabular information as a very long (and thus harder
to read) EBNF rule of choices (and we really would have to specify each of
these operators, and not use "...") looking like

operator <=   = | > | ... | >>= | instanceof | new


Types and Literals
The fifth, and most complicated category of tokens is the Literal.
All values that we write in a program are literals: each belongs to one of
the four primitive types that we will use (int, double, boolean,
char) or belongs to the special reference type String.
All primitive type names are keywords in Java; the String reference
type names a class in the standard Java library, which we will learn much
more about soon.
A value (of any type) written in a Java program is called a literal;
and, each written literal belongs in (or is said to have) exactly one type.
literal <= integer-literal | floating-point-literal | boolean-literal
         | character-literal
         | string-literal
         | null-literal

Here are some examples of literals of each of these types.

Literal     Type
1           int
3.14        double (1. is a double too)
true        boolean
'3'         char ('P' and '+' are char too)
"CMU ID"    String
null        any reference type
The next six sections discuss each of these types of literals, in more detail.

int Literals
Literals of the primitive type int represent countable, discrete
quantities (values with no fractions nor decimal places
possible/necessary).
We can specify the EBNF for an int literal in Java as
non-zero-digit  <= 1|2|3|4|5|6|7|8|9
digit           <= 0 | non-zero-digit
digits          <= digit{digit}
decimal-numeral <= 0 | non-zero-digit[digits]
integer-literal <= decimal-numeral
                 | octal-numeral
                 | hexadecimal-numeral

This EBNF specifies only decimal (base 10) literals.
In Java, literals can also be written in octal (base 8) and hexadecimal
(base 16).
I have omitted the EBNF rules for forming these kinds of numbers, because we
will use base 10 exclusively.
Thus, the rules shown above are correct, but not complete.

By the EBNF rules above, note that the symbol 015 does not look like a
legal integer-literal; it is certainly not a
decimal-numeral, because it starts with a zero.
But, in fact, it is an octal-numeral (whose EBNF is not shown).
Never start an integer-literal with a 0 (unless its value is
zero), because starting with a 0 in Java signifies the literal is
being written as an octal (base 8) number: e.g., writing 015 refers
to an octal value, whose decimal (base 10) value is 13!
So writing a leading zero in an integer can get you very confused about what
you said to the computer.
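
To see the leading-zero surprise for ourselves, here is a small sketch (the class name OctalSurprise and its helper method are hypothetical, chosen just for this example) that prints the same digits written with and without a leading 0.

```java
// Illustrative sketch only: OctalSurprise is a hypothetical class name.
class OctalSurprise {

  // 015 is an octal-numeral: 1*8 + 5
  static int withLeadingZero() {
    return 015;
  }

  public static void main(String[] args) {
    System.out.println(15);                  // decimal-numeral: prints 15
    System.out.println(withLeadingZero());   // octal-numeral:   prints 13
    System.out.println(0);                   // the one safe leading zero: prints 0
  }
}
```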

Finally, note that there are no negative literals: we will see soon how to
compute such values from the negate arithmetic operator and a positive
literal (writing -1 is exactly such a construct).
This is a detail: a distinction without much difference.

double Literals
Literals of the primitive type double represent measurable quantities.
Like real numbers in mathematics, they can represent fractions and numbers
with decimal places.
We can specify the EBNF for a double literal in Java as
exponent-indicator     <= e | E
exponent-part          <= exponent-indicator [+|-]digits
floating-point-literal <= digits exponent-part
                        | digits.[digits][exponent-part]
                        | .digits[exponent-part]

This EBNF specifies a floating-point-literal to contain various
combinations of a decimal point and exponent (so long as one -or both- are
present); if neither is present then the literal must be classified as an
int-literal.
The exponent-indicator (E or e) should be read to mean
"times 10 raised to the power of".

Like literals of the type int, all double literals are
non-negative (although they may contain negative exponents).
Using E or e means that we can specify very large or small
values easily
(3.518E+15 is equivalent to 3.518 times 10 raised to the
power of 15, or 3518000000000000.; and 3.518E-15 is
equivalent to 3.518 times 10 raised to the power of -15, or
.000000000000003518).
In fact, any literal with an exponent-part is a double: so even
writing 1E3 is equivalent to writing 1.E3, which are both
equivalent to writing the double literal 1000. (with its decimal point).
Note this does not mean the int literal 1000!
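
As a quick check of these equivalences, the following sketch (the class name DoubleLiterals and its helper method are hypothetical) compares the different spellings of the same double value.

```java
// Illustrative sketch only: DoubleLiterals is a hypothetical class name.
class DoubleLiterals {

  // Any literal with an exponent-part is a double, even without a decimal point.
  static double thousand() {
    return 1E3;
  }

  public static void main(String[] args) {
    System.out.println(thousand() == 1.E3);              // prints true
    System.out.println(1.E3 == 1000.);                   // prints true
    System.out.println(3.518E+15 == 3518000000000000.);  // prints true
    System.out.println(thousand());                      // prints 1000.0 (a double, not the int 1000)

    // int i = 1E3;   // would not compile: 1E3 is a double literal, not an int literal
  }
}
```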

Finally, we will write all double literals in base 10 (unlike
int literals, which can also be written in octal or hexadecimal).

boolean Literals
The type name boolean honors George Boole, a 19th century English
mathematician who revolutionized the study of logic by making it more
like arithmetic.
He invented a method for calculating with truth values and an algebra for
reasoning about these calculations.
Boole's methods are used extensively today in the engineering of hardware
and software systems.
Literals of the primitive type boolean represent on/off, yes/no,
present/absent, ... data.
There are only two values of this primitive type, so its EBNF rule is
trivially written as

boolean-literal <= true | false


In Java, although these values look like identifiers, they are classified as
literal tokens (just as all the keywords also look like identifiers, but
are classified differently).
Therefore, 100 and true are both literal tokens in Java (of
type int and boolean respectively).

Students who are familiar with numbers sometimes have a hard time accepting
true as a value; but that is exactly what it is in Java.
We will soon learn logical operators that compute with these values of the
type boolean just as arithmetic operators compute with values of
the type int.

char Literals
The first type of text literal is a char.
This word can be pronounced in many ways: care, car, or as in
charcoal
(I'll use this last pronunciation).
Literals of this primitive type represent exactly one character inside
single quotes.
Its EBNF rule is written
character-literal <= 'graphic' | 'space' | 'escape-sequence'

where the middle option is a space between single quotes.
Examples are 'X', or 'x', or '?', or ' ', or
'\n', etc. (see below for a list of some useful escape sequences).

Note that 'X' is classified just as a literal token (of the primitive
type char); it is NOT classified as an identifier token inside two
separator tokens!

String Literals
The second type of text literal is a String.
Literals of this reference type (the only one in this bunch; it is not a
primitive type) represent zero, one, or more characters.
Its EBNF is written
string-literal <= "{graphic | space | escape-sequence}"

Examples are: "\n\nEnter your SSN:", or
"" (the empty String), or
"X" (a one character String, which is different from a
char).

Note that "CMU" is classified just as a literal token (of the
reference type String); it is NOT classified as an identifier token
inside two separator tokens!

Escape Sequences
Sometimes you will see an escape-sequence inside the single-quotes
for a character-literal or one or more inside double-quotes for a
string-literal (see above);
each escape sequence is translated into a character that prints in some
"special" way.
Some commonly used escape sequences are
Escape Sequence    Meaning
\n                 new line
\t                 horizontal tab
\b                 backspace
\r                 carriage return
\f                 form feed
\\                 \ (needed to denote \ in a text literal)
\'                 ' (does not act as the right ' of a char literal)
\"                 " (does not act as the right " of a String literal)
Java has no \v (vertical tab) or \a (bell) escape sequences, although some
other languages (such as C) do.
So, in the String literal "He said, \"Hi.\"" neither escape
sequence \" acts to end the String literal: each represents
a double-quote that is part of the String literal, which displays
as He said, "Hi."

If we output "Pack\nage", Java would print on the console
Pack
age
with the escape sequence \n causing Java to immediately terminate the
current line and start at the beginning of a new line.
There are other ways in Java to write escape sequences (Unicode escapes
written in hexadecimal, and octal escapes) that we will not cover here, nor
need in the course.
The only escape sequence that we will use with any frequency is \n.
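
The two examples above can be tried directly. This is a minimal sketch (the class name EscapeDemo and the method quote are hypothetical); note that each escape sequence counts as a single character in the String.

```java
// Illustrative sketch only: EscapeDemo and quote are hypothetical names.
class EscapeDemo {

  // Neither \" ends the String literal; each one is a double-quote character.
  static String quote() {
    return "He said, \"Hi.\"";
  }

  public static void main(String[] args) {
    System.out.println(quote());            // prints He said, "Hi."
    System.out.println(quote().length());   // prints 14: each \" is one character
    System.out.println("Pack\nage");        // prints Pack and age on two lines
    System.out.println("\\".length());      // prints 1: \\ denotes a single backslash
  }
}
```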

The null Reference Literal
There is a very simple, special kind of literal that is used to represent a
special value with every reference type in Java (so far we know only one,
the type String).
For completeness we will list it here, and learn about its use a bit later.
Its trivial EBNF rule is written
null-literal <= null

So, as we learned with boolean literals, null is a literal in
Java, not an identifier.

Bounded Numeric Types
Although there are an infinite number of integers in mathematics, values in
the int type are limited to the range from -2,147,483,648 to
2,147,483,647.
We will explore this limitation later in the course, but for now we will not
worry about it.
Likewise, although there are an infinite number of reals in mathematics,
values in the double type are limited to the range from
-1.79769313486231570 x 10^308 to 1.79769313486231570 x 10^308;
the smallest non-zero, positive value is 4.94065645841246544 x 10^-324.
Values in this type can have up to about 15 significant digits.
For most engineering and science calculations, this range and precision are
adequate.
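
We do not have to memorize these bounds: the standard library names them. The sketch below (the class name NumericBounds and the method intRange are hypothetical; the constants in Integer and Double are part of the standard Java library) prints each one.

```java
// Illustrative sketch only: NumericBounds and intRange are hypothetical names.
class NumericBounds {

  // Builds a description of the int range from the library constants.
  static String intRange() {
    return Integer.MIN_VALUE + " to " + Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    System.out.println(intRange());          // prints -2147483648 to 2147483647
    System.out.println(Double.MAX_VALUE);    // prints 1.7976931348623157E308
    System.out.println(Double.MIN_VALUE);    // prints 4.9E-324 (smallest positive value)
  }
}
```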

In fact, there are other primitive numeric types (which are also keywords):
byte, short, long, and float.
These types are variants of int and double and are not as
widely useful as these more standard types, so we will not cover them in
this course.

Finally, there is a reference type named BigInteger, which can
represent any number of digits in an integer (up to the memory capacity of
the machine).
Such a type is very powerful (because it can represent any integer), but
costly to use (in execution time and computer space) compared to int.
Most programs can live with the "small" integer values specified above; but,
we will also study this reference type soon, and write programs using it.
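
As a small preview, here is a sketch (the class name BigDemo and the method pastIntRange are hypothetical; BigInteger itself is in the standard java.math package) that steps just past the largest int, which an int itself cannot represent.

```java
import java.math.BigInteger;

// Illustrative sketch only: BigDemo and pastIntRange are hypothetical names.
class BigDemo {

  // One more than the largest int value, computed without overflow.
  static BigInteger pastIntRange() {
    return BigInteger.valueOf(Integer.MAX_VALUE).add(BigInteger.ONE);
  }

  public static void main(String[] args) {
    System.out.println(pastIntRange());                  // prints 2147483648
    System.out.println(new BigInteger("2").pow(100));    // prints 1267650600228229401496703205376
  }
}
```

Because BigInteger is a reference type, we compute with method calls such as add and pow rather than with the arithmetic operators.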



Comments
The sixth and final category of tokens is the Comment.
Comments allow us to place any form of documentation inside our Java code.
They can contain anything that we can type on the keyboard: English,
mathematics, even low-resolution pictures.
In general, Java recognizes comments as tokens, but then excludes these
tokens from further processing; technically, it treats them as white space
when it is forming tokens.
Comments help us capture aspects of our programs that cannot be expressed as
Java code: things like goals, specifications, design structures, time/space
tradeoffs, historical information, advice for using/modifying this code, etc.
Programmers intensely study their own code (or the code of others) when
maintaining it (testing, debugging or modifying it).
Good comments in code make all these tasks much easier.

Java includes two styles for comments.

  • Line-Oriented: begins with // and continues until the end of
    the line.
  • Block-Oriented: begins with /* and continues (possibly over
    many lines) until */ is reached.
    • So, we can use block-oriented comments to create multiple comments
      within a line

          display(/*Value*/ x, /*on device*/ d);

      In contrast, once a line-oriented comment starts, everything
      afterward on its line is included in the comment.
    • We can also use block-oriented comments to span multiple lines

      /*
          This is a multi-line comment.
          No matter how many lines
          it includes, only one pair
          of delimiters is needed.
      */


      In contrast, a line-oriented comment stops at the end of the line
      it starts on.
Technically, both kinds of comments are treated as white space, so writing
X/*comment*/Y has the same meaning in Java as writing the tokens
X and Y, not the single token XY.
Typically Java comments are line-oriented; we will save block-oriented
comments for a special debugging purpose (discussed later).

The EBNF rule for comments is more complicated than insightful, so we will
not study it here.
This happens once in a while.





Programs are a Sequence of Tokens built from Characters
In its first phase, a Java compiler tokenizes a program by scanning its
characters left to right, top to bottom (there is a hidden end-of-line
character at the end of each line; recall that it is equivalent to white
space), and combining selected characters into tokens.
It works by repeating the following algorithm (an algorithm is a precise set
of instructions):
  • Skip any white space...
  • ...if the next character is an underscore, dollar, or alphabetic
    character, it builds an identifier token.
    • Except for recognizing keywords and certain literals (true,
      false, null) which all share the form of identifiers,
      but are not themselves identifiers
  • ...if the next character is a numeric character, ' or ", it builds a
    literal token.
  • ...if the next character is a period, that is a separator unless the
    character after it is a numeric character (in which case it builds a
    double literal).
  • ...if the next two characters are a // or /* starting a comment, it
    builds a comment token.
  • ...if the next character is anything else, it builds a separator or
    operator token (trying to build the longest token, given that white
    space separates tokens, except in a char or String
    literal).
Recall that white space (except when inside a textual literal or comment) separates tokens.
Also, the Java compiler uses the "longest token rule": it includes characters in a token until it reaches a character that cannot be included.
Finally, after building and recognizing each token, the Java compiler passes all tokens (except for comments, which are ignored after being tokenized) on to the next phase of the compiler.
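
The algorithm above can be sketched in Java itself. This is a simplified illustration, not the real compiler's scanner: the class name TinyTokenizer and its methods are hypothetical, KEYWORDS is deliberately only a small sample of Java's keywords, and numeric literals are handled only in base 10.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified sketch of the tokenizing loop; TinyTokenizer is a hypothetical name.
class TinyTokenizer {
  // Assumption: only a sample of the keywords, to keep the sketch short.
  static final List<String> KEYWORDS = Arrays.asList(
      "int", "double", "boolean", "char", "if", "else", "while", "return", "class", "new");
  // Longest operators come first, so the first match found is the longest token.
  static final String[] OPERATORS = {
      ">>>=", "<<=", ">>=", ">>>", "==", "<=", ">=", "!=", "&&", "||", "++", "--",
      "+=", "-=", "*=", "/=", "&=", "|=", "^=", "%=", "<<", ">>",
      "=", ">", "<", "!", "~", "?", ":", "+", "-", "*", "/", "&", "|", "^", "%"};

  static List<String> tokenize(String s) {
    List<String> out = new ArrayList<>();
    int i = 0;
    while (i < s.length()) {
      char c = s.charAt(i);
      int start = i;
      if (Character.isWhitespace(c)) { i++; continue; }            // skip white space
      if (Character.isLetter(c) || c == '_' || c == '$') {         // identifier-shaped
        while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i))
            || s.charAt(i) == '_' || s.charAt(i) == '$')) i++;
        String word = s.substring(start, i);
        String kind = KEYWORDS.contains(word) ? "Keyword"
            : (word.equals("true") || word.equals("false") || word.equals("null")) ? "Literal"
            : "Identifier";
        out.add(kind + " " + word);
      } else if (Character.isDigit(c)                              // numeric literal
          || (c == '.' && i + 1 < s.length() && Character.isDigit(s.charAt(i + 1)))) {
        i = scanNumber(s, i);
        out.add("Literal " + s.substring(start, i));
      } else if (c == '\'' || c == '"') {                          // char or String literal
        i++;
        while (i < s.length() && s.charAt(i) != c) i += (s.charAt(i) == '\\') ? 2 : 1;
        i = Math.min(i + 1, s.length());
        out.add("Literal " + s.substring(start, i));
      } else if (s.startsWith("//", i)) {                          // line-oriented comment
        while (i < s.length() && s.charAt(i) != '\n') i++;
        out.add("Comment " + s.substring(start, i));
      } else if (s.startsWith("/*", i)) {                          // block-oriented comment
        int end = s.indexOf("*/", i + 2);
        i = (end < 0) ? s.length() : end + 2;
        out.add("Comment " + s.substring(start, i));
      } else if (";,.(){}[]".indexOf(c) >= 0) {                    // separator
        i++;
        out.add("Separator " + c);
      } else {                                                     // operator, longest match
        String op = null;
        for (String candidate : OPERATORS)
          if (s.startsWith(candidate, i)) { op = candidate; break; }
        if (op == null) throw new IllegalArgumentException("Unexpected character: " + c);
        i += op.length();
        out.add("Operator " + op);
      }
    }
    return out;
  }

  // Scans digits, then an optional decimal point and digits, then an optional exponent.
  static int scanNumber(String s, int i) {
    while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
    if (i < s.length() && s.charAt(i) == '.') {
      i++;
      while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
    }
    if (i < s.length() && (s.charAt(i) == 'e' || s.charAt(i) == 'E')) {
      i++;
      if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) i++;
      while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
    }
    return i;
  }

  public static void main(String[] args) {
    for (String token : tokenize("Math.max(count,limit); x >>= 2.5E3; //done"))
      System.out.println(token);
  }
}
```

Unlike the real compiler, this sketch keeps the comment tokens in its result, so that we can see them being recognized.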

Common Mistakes
I have seen the following mistakes made repeatedly by beginning students
trying to tokenize programs.
Try to understand each of these subtle points.
  • Tokenizing x as a char literal: it is an identifier.
  • Tokenizing 10.5 as two int literals separated by a
    period: it is a double literal.
  • Tokenizing int as a literal: it is a keyword, that happens to
    name a type in Java.
    Tokens like 1 are literals whose type is int; the token
    int is a keyword.
  • Tokenizing "Hi" as two separators with the identifier
    Hi in between: it is a single String literal.
  • Tokenizing something like }; as one separator token: it is
    really two separate separators.
  • Tokenizing something like += as two separate operator tokens
    (because + and = are operators): it is really one
    large token (because += is an operator).
  • Forgetting to tokenize parentheses, semicolons, and other separators:
    everything except white space belongs in some token.
  • Creating tokens inside comments: each comment is one big token that
    includes all the characters in the comment.

A Simple Program
The following program will serve as a model of Input/Calculate/Output
programs in Java.
Here are some highlights
  • A large, multi-line comment (built from line-oriented comments) appears
    at the top of the program.
    Line-oriented comments appear at various other locations in the
    program.
  • The Prompt class is imported from the
    edu.cmu.cs.pattis.cs151xx package.
  • The Application class is declared.
  • Its main method is declared; its body (the statements it
    executes) is placed between the { and } delimiters.
  • Each simple statement in the body is ended by a semicolon (;)
    separator.
  • Three variables storing double values are declared.
  • The user is prompted for the value to store in the first two variables.
  • The third variable's value is computed and stored.
  • The third variable's value is printed (after printing a blank line).
Besides just reading this program, practice tokenizing it.
//////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////
//
// Description:
//
//   This program computes the time it takes to drop an object (in a vacuum)
// from an arbitrary height in an arbitrary gravitational field (so it can
// be used to calculate drop times on other planets). It models a straight
// input/calculate/output program: the user enters the gravitational field
// and then the height; it calculates the drop time and then prints it on
// the console.
//
//////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////


import edu.cmu.cs.pattis.cs151xx.Prompt;


public class Application {


  public static void main(String[] args)
    {
      try {

        double gravity;        //meter/sec/sec
        double height;         //meters
        double time;           //sec
		  
		  
        //Input
		  
        gravity = Prompt.forDouble("Enter gravitational acceleration (in meters/sec/sec)");
        height  = Prompt.forDouble("Enter height of drop (in meters)");
		  
		  
        //Calculate
		  
        time = Math.sqrt(2.*height/gravity);
		  
		  
        //Output
		  
        System.out.println("\nDrop time = " + time + " secs");

		  
      } catch (Exception e) {
        e.printStackTrace();
        System.out.println("main method in Application class terminating");
        System.exit(0);
      }
    }

}

How Experts See Programs
In the 1940s, a Dutch psychologist named DeGroot was doing research on chess
experts.
He performed the following experiment: He sat chess experts down in front of
an empty chessboard, all the chess pieces, and a curtain.
Behind the curtain was a chessboard with its pieces arranged about 35 moves
into a game.
The curtain was raised for one minute and then lowered.
The chess experts were asked to reconstruct what they remembered from seeing
the chessboard behind the curtain.
In most cases, the chess experts were able to completely reconstruct the
board that they saw.
The same experiment was conducted with chess novices, but most were able to
remember only a few locations of the pieces.
These results could be interpreted as, "Chess experts have much better
memories than novices."

So, DeGroot performed a second (similar) experiment.
In the second experiment, the board behind the curtain had the same number
of chess pieces, but they were randomly placed on the board; they did not
represent an ongoing game.
In this modified experiment, the chess experts did only marginally better
than the novices.
DeGroot's conclusion was that chess experts saw the board differently than
novices: they saw not only pieces, but attacking and defending structures,
board control, etc.

In this class, I am trying to teach you how to see programs as a programmer
sees them: not as a sequence of characters, but at a higher structural
level.
Tokens are where we start this process.

Problem Set
To ensure that you understand all the material in this lecture, please solve
the announced problems after you read the lecture.
If you get stumped on any problem, go back and read the relevant part of the
lecture.
If you still have questions, please get help from the Instructor, a CA, a Tutor,
or any other student.

  1. Classify each of the following as a legal or illegal identifier.
    If it is illegal, propose a legal identifier that can take its place
    (a homophone or homoglyph)
    packAge      x12               2Lips
    xOrY         sum of squares    %Raise
    termInAte    u235              $Bill
    x_1          x&Y               1derBoys

  2. What tokens does Java build from the characters ab=c+++d==e?
    Be sure that you know your Operators.

  3. Classify each of the following numeric literals as int, or double, or
    illegal (neither); write the equivalent value of each double without using
    E notation; for each illegal literal, write a legal one with the "same" value.
    5.        3.1415      17
    17.0      1E3         1.E3
    .5E-3     5.4x10^3    50E-1
    1,024     0.087       .087

  4. What is the difference between 5, 5., five,
    '5', and "5"?
    What is the difference between true and "true"?

  5. Write a String literal that includes the characters
    I've said a million times, "Do not exaggerate!"

  6. How does Java classify each of the following lines?

        "//To be or not to be"

        //"To be or not to be"

  7. Does the following line contain one comment or two?

       //A comment //Another comment?

  8. Explain whether X/**/Y is equivalent to XY or X   Y.

  9. Tokenize the following Java Code (be careful): -15

  10. Tokenize the following line of Java code: identify every Java token as either an
    Identifier, Keyword, Separator, Operator, Literal (for any literal, also specify its type), or Comment.
    Which (if any) identifiers are keywords?

    int X = Prompt.forInt("SSN",0,999999999); //Filter && use

  11. Choose an appropriate type to represent each of the following pieces of information
    • the number of characters in a file
    • a time of day (accurate to 1 second)
    • the middle initial in a name
    • whether the left mouse button is currently pushed
    • the position of a rotary switch (with 5 positions)
    • the temperature of a blast furnace
    • an indication of whether one quantity is less than, equal to or greater than another
    • the name of a company
  12. This problem (it is tricky, so do it carefully) shows a difficulty with using
    Block-Oriented comments.
    Tokenize the following two lines of Java code: identify every token as either an
    Identifier, Keyword, Separator, Operator, Literal, or Comment. What problem arises?

    x = 0;  /* Initialize x and y to
      y = 1;     their starting values */
    Rewrite the code shown above with Line-Oriented comments instead, to avoid this problem.
    How can our use of my Java preferences help us avoid this error?

  13. This problem (it is tricky, so do it carefully) shows another difficulty with using
    Block-Oriented comments.
    Tokenize the following Java code: identify every token as either an
    Identifier, Keyword, Separator, Operator, Literal, or Comment. What problem arises?

    /*
        here is an outer
        comment with an
        /* inner comment inside */
        and the finish of the outer
        comment at the end
      */
    Rewrite the code shown above with Line-Oriented comments instead, to avoid this problem.
    How can our use of my Java preferences help us avoid this error?


  14. Explain why language designers are very reluctant to add new keywords
    to a programming language.
    Hint: what problem might this cause in already-written programs?