Java IO: StreamTokenizer

Jakob Jenkov
Last update: 2015-09-10

The Java StreamTokenizer class (java.io.StreamTokenizer) can break the characters read from a Reader into tokens. For instance, in the string "Mary had a little lamb" each word is a separate token.

When you are parsing files or computer languages it is common to break the input into tokens before processing it further. This process is also called "lexing" or "tokenizing".

Using a Java StreamTokenizer you can move through the tokens in the underlying Reader. You do so by calling the nextToken() method of the StreamTokenizer inside a loop. After each call to nextToken() the StreamTokenizer has several fields you can read to see what kind of token was read, its value etc. These fields are:

ttype   The type of token read (word, number, end of line)
sval    The string value of the token, if the token was a string (word)
nval    The number value of the token, if the token was a number
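The three fields can be seen side by side in a small sketch. The class name TokenFields and the helper describe() are hypothetical names made up for this illustration. The sketch relies on the default StreamTokenizer setup, in which a character that is neither part of a word nor part of a number is returned as a token of its own with ttype set to that character, and in which a quoted string is returned with ttype set to the quote character and sval set to the text between the quotes.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TokenFields {

    // Hypothetical helper: tokenizes the input and describes each token
    // using the ttype, sval and nval fields.
    static List<String> describe(String input) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(input));
        List<String> result = new ArrayList<>();

        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            switch (tokenizer.ttype) {
                case StreamTokenizer.TT_WORD:
                    result.add("WORD:" + tokenizer.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    result.add("NUMBER:" + tokenizer.nval);   // nval is a double
                    break;
                case '"':
                case '\'':
                    // For quoted strings ttype is the quote character
                    // and sval holds the text between the quotes.
                    result.add("QUOTED:" + tokenizer.sval);
                    break;
                default:
                    // Any other single character is its own token.
                    result.add("CHAR:" + (char) tokenizer.ttype);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // prints [WORD:x, CHAR:=, NUMBER:42.0, CHAR:+, QUOTED:two words]
        System.out.println(describe("x = 42 + \"two words\""));
    }
}
```

Note that nval is always a double, which is why the number 42 is printed as 42.0.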

StreamTokenizer Example

Here is a simple Java StreamTokenizer example:

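import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

// nextToken() throws IOException, so the enclosing method must declare or handle it.
StreamTokenizer streamTokenizer = new StreamTokenizer(
        new StringReader("Mary had 1 little lamb..."));

while(streamTokenizer.nextToken() != StreamTokenizer.TT_EOF){

    if(streamTokenizer.ttype == StreamTokenizer.TT_WORD) {
        System.out.println(streamTokenizer.sval);   // word token
    } else if(streamTokenizer.ttype == StreamTokenizer.TT_NUMBER) {
        System.out.println(streamTokenizer.nval);   // number token, always a double
    } else if(streamTokenizer.ttype == StreamTokenizer.TT_EOL) {
        // only reported after calling eolIsSignificant(true)
        System.out.println();
    }

}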

The Java StreamTokenizer is capable of recognizing identifiers, numbers, quoted strings, and various comment styles. You can also specify which characters are to be interpreted as white space, as word characters, as quote characters, or as the start of a comment. All these things are configured on the StreamTokenizer before you start parsing its contents. See the JavaDoc for more information about that.
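As a rough illustration of such configuration, here is a minimal sketch that tokenizes a small settings-like text. The class name ConfiguredTokenizer, the helper tokenize() and the sample input are assumptions made up for this example. The sketch uses the eolIsSignificant(), commentChar() and wordChars() methods of StreamTokenizer to report line ends, to treat '#' as the start of a comment, and to allow underscores inside words.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ConfiguredTokenizer {

    // Hypothetical helper: returns the tokens of the input as strings.
    static List<String> tokenize(String input) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(input));

        // Configure the tokenizer before reading the first token.
        tokenizer.eolIsSignificant(true);   // report line ends as TT_EOL
        tokenizer.commentChar('#');         // '#' starts a comment that runs to end of line
        tokenizer.wordChars('_', '_');      // allow '_' inside words

        List<String> tokens = new ArrayList<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                tokens.add(tokenizer.sval);
            } else if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                tokens.add(String.valueOf(tokenizer.nval));
            } else if (tokenizer.ttype == StreamTokenizer.TT_EOL) {
                tokens.add("<EOL>");
            } else if (tokenizer.ttype == '"' || tokenizer.ttype == '\'') {
                tokens.add(tokenizer.sval);                    // body of a quoted string
            } else {
                tokens.add(String.valueOf((char) tokenizer.ttype));
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        String input = "max_size = 10 # limit\nname = 'demo'\n";
        // prints [max_size, =, 10.0, <EOL>, name, =, demo, <EOL>]
        System.out.println(tokenize(input));
    }
}
```

Without the call to eolIsSignificant(true) the line ends would be skipped as white space, and without wordChars('_', '_') the underscore would be returned as a separate single-character token.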

Closing the Reader Behind a StreamTokenizer

The StreamTokenizer class has no close() method and does not implement Closeable or AutoCloseable. It therefore cannot be closed itself, and it cannot be used as the resource in a try-with-resources statement. What needs closing is the Reader the StreamTokenizer reads from. When you are finished reading tokens from the StreamTokenizer you should remember to close that Reader.

Closing the Reader is done by calling its close() method. Here is how closing the Reader looks:

reader.close();

You can also use the try-with-resources construct introduced in Java 7. Here is how using a StreamTokenizer and closing its Reader looks with the try-with-resources construct:

try(Reader reader = new FileReader("data/data.bin")){

    StreamTokenizer streamTokenizer = new StreamTokenizer(reader);

    while(streamTokenizer.nextToken() != StreamTokenizer.TT_EOF){

        if(streamTokenizer.ttype == StreamTokenizer.TT_WORD) {
            System.out.println(streamTokenizer.sval);
        } else if(streamTokenizer.ttype == StreamTokenizer.TT_NUMBER) {
            System.out.println(streamTokenizer.nval);
        } else if(streamTokenizer.ttype == StreamTokenizer.TT_EOL) {
            System.out.println();
        }

    }
}

Notice how there is no longer any explicit close() method call. The try-with-resources construct takes care of that.

Notice also that the FileReader instance is declared inside the parentheses of the try statement, while the StreamTokenizer is created inside the block. Only resources declared inside the parentheses are closed automatically. That is exactly what is needed here, because the FileReader is the only object holding a system resource. The StreamTokenizer holds nothing that needs to be released.
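One way to keep the responsibility for closing clear is to let the code that opens the Reader also close it, and to pass the Reader to a method that only tokenizes. The following is a minimal sketch of that idea. The class name WordCounter and the method countWords() are hypothetical, and a StringReader stands in for a file so the example is self-contained. Since StreamTokenizer declares no close() method, the helper never closes anything.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class WordCounter {

    // Hypothetical helper: counts the word tokens read from the Reader.
    // It does not close the Reader. That is left to the caller.
    static int countWords(Reader reader) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(reader);
        int words = 0;
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                words++;
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // The caller opens the Reader, so the caller closes it.
        try (Reader reader = new StringReader("Mary had 1 little lamb")) {
            System.out.println(countWords(reader));   // prints 4
        }
    }
}
```

The token 1 is reported as a number rather than a word, which is why the count is 4 and not 5.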
