ewe.util
Class DataParser

java.lang.Object
  extended by ewe.util.DataParser

public class DataParser
extends Object

A DataParser is used to extract numeric and textual information from a formatted line of text. It works in a similar fashion to the C/C++ scanf() routines, in that you specify the format of the data using "%" fields in a format String.

To parse a large text file you should use the ewe.io.StreamScanner class. This can scan entire files at maximum speed without creating any new objects for each scan.

Specifying Data Format

You specify the data you wish to parse as a String with a set of '%' formats separated by spaces, similar to the C/C++ scanf function. You can scan for Strings (either as fixed length Strings or individual words), integer/long values or floating point values.

The formats you can use for scanning numbers are:

%#i %#f %#d
'i' indicates an integer value and 'f' or 'd' indicates a floating point (double) value.
'#' indicates an optional number specifying the number of digits to read. If you do not specify a number of digits, then all non-space characters will be read in and then converted to a number.

For strings, words or characters use:

%#c %#s %q
'c' indicates a single character (byte) to read.
's' indicates a single word or String to read.
'q' indicates a word or set of words that may be in quotes. In other words, if the first character read is a ' or " character, then all characters will be read until a matching quote is found. If the first character is not a quote character, then only the first word is read in.
'#' indicates an optional number specifying the number of characters to read (note that this cannot be used with the 'q' format).
Note that '%10c' and '%10s' will have the same effect - i.e. both will read in a string of 10 characters, but '%c' reads a single character and '%s' reads the next single word.
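
The three ways of reading text described above can be sketched in plain Java. This is a hypothetical illustration, not the DataParser implementation: the class and method names are invented, a space is assumed to be the only word separator, and the quotes around a 'q' field are assumed to be dropped from the result (the text above does not say whether they are).

```java
// Hypothetical sketch; not the ewe.util.DataParser implementation.
final class TextFieldSketch {
    // '%#c' or '%#s' with a count: take exactly `width` characters.
    static String readFixed(String data, int pos, int width) {
        if (pos < 0 || pos + width > data.length()) {
            throw new IndexOutOfBoundsException("not enough data");
        }
        return data.substring(pos, pos + width);
    }

    // '%s' without a count: take characters up to the next space.
    static String readWord(String data, int pos) {
        if (pos < 0 || pos >= data.length()) {
            throw new IndexOutOfBoundsException("not enough data");
        }
        int end = data.indexOf(' ', pos);
        return end < 0 ? data.substring(pos) : data.substring(pos, end);
    }

    // '%q': if the field starts with ' or ", read up to the matching quote;
    // otherwise fall back to reading a single word.
    static String readQuotedOrWord(String data, int pos) {
        if (pos < 0 || pos >= data.length()) {
            throw new IndexOutOfBoundsException("not enough data");
        }
        char first = data.charAt(pos);
        if (first == '\'' || first == '"') {
            int close = data.indexOf(first, pos + 1);
            if (close < 0) {
                throw new IndexOutOfBoundsException("unterminated quote");
            }
            // Assumption: the surrounding quotes are not part of the value.
            return data.substring(pos + 1, close);
        }
        return readWord(data, pos);
    }

    public static void main(String[] args) {
        System.out.println(readFixed("hello world", 0, 10));
        System.out.println(readWord("hello world", 0));
        System.out.println(readQuotedOrWord("\"hello world\" 42", 0));
    }
}
```

Here a count of 10 returns the first ten characters regardless of word boundaries, while the word and quoted readers stop at a space or at the matching quote.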

Skipping fields - Using a '!' character instead of a '%' character will indicate that the specified field should be skipped over instead of being converted and returned.
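
To make the field syntax concrete, the following sketch splits a format string into its individual fields and counts how many of them would produce a value. FormatField and FormatSketch are hypothetical names used only for illustration; the real class may validate formats differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical descriptor for one field of a format string.
final class FormatField {
    final boolean skipped; // true for '!' fields, false for '%' fields
    final int width;       // the '#' count, or -1 when none was given
    final char type;       // one of i, f, d, c, s, q

    FormatField(boolean skipped, int width, char type) {
        this.skipped = skipped;
        this.width = width;
        this.type = type;
    }
}

// Hypothetical sketch; not the ewe.util.DataParser implementation.
final class FormatSketch {
    static List<FormatField> tokenize(String format) {
        List<FormatField> fields = new ArrayList<FormatField>();
        for (String token : format.trim().split("\\s+")) {
            if (token.length() < 2
                    || (token.charAt(0) != '%' && token.charAt(0) != '!')) {
                throw new IllegalArgumentException("Bad field: " + token);
            }
            char type = token.charAt(token.length() - 1);
            if ("ifdcsq".indexOf(type) < 0) {
                throw new IllegalArgumentException("Unknown type in: " + token);
            }
            String count = token.substring(1, token.length() - 1);
            int width = -1;
            if (!count.isEmpty()) {
                // A count must be all digits and cannot be combined with 'q'.
                if (type == 'q' || !count.matches("\\d+")) {
                    throw new IllegalArgumentException("Bad count in: " + token);
                }
                width = Integer.parseInt(count);
            }
            fields.add(new FormatField(token.charAt(0) == '!', width, type));
        }
        return fields;
    }

    // Number of fields that would yield a value ('!' fields are excluded).
    static int returnedCount(List<FormatField> fields) {
        int n = 0;
        for (FormatField f : fields) {
            if (!f.skipped) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<FormatField> fields = tokenize("%10s !5s %i %f");
        System.out.println(fields.size() + " fields, "
                + returnedCount(fields) + " returned");
    }
}
```

For the format "%10s !5s %i %f" this yields four fields, of which three are returned because the '!5s' field is skipped.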

Retrieving the Parsed Data

This can be done in two ways. The parse() methods return an Object array that contains a single Object for each '%' field in the scan string (but NOT for any '!' fields). Each object will be a ewe.sys.Long object (for %i fields), a ewe.sys.Double object (for %f fields) or a ewe.util.SubString object (for all text fields). So a scan of "%10s !5s %i %f" will return an array of 3 objects. The object at index 0 will be a SubString, the one at index 1 will be a Long object and the one at index 2 will be a Double object. Note that these objects are re-used for the next parse().
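
Because the returned objects are re-used, a reference kept from one parse() will show the next parse's value. The sketch below illustrates that hazard with a hypothetical mutable holder standing in for ewe.sys.Long; MutableLongHolder and ReuseSketch are invented names, and the real classes may differ in detail.

```java
// Hypothetical stand-in for a mutable value holder such as ewe.sys.Long.
final class MutableLongHolder {
    long value;
}

// Hypothetical sketch of a parser that re-uses its result objects.
final class ReuseSketch {
    // One slot, allocated once and overwritten by every parse().
    private final MutableLongHolder[] slots = { new MutableLongHolder() };

    MutableLongHolder[] parse(String data) {
        slots[0].value = Long.parseLong(data.trim());
        return slots;
    }

    public static void main(String[] args) {
        ReuseSketch parser = new ReuseSketch();
        MutableLongHolder kept = parser.parse("1")[0];
        long copied = kept.value;       // copy the primitive out immediately
        parser.parse("2");              // overwrites the same holder
        System.out.println(kept.value); // now reflects the second parse
        System.out.println(copied);     // still the first value
    }
}
```

Copying the primitive (or converting text to a String) before the next parse() avoids the problem.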

You can also ignore the return value of parse() and instead call one of the getXXX() methods to retrieve a particular data type from the scanned array of values. Using the same example "%10s !5s %i %f" after a parse you could call getString(0) followed by getInt(1) followed by getDouble(2). These calls are only valid until the next parse().
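
The index mapping described above (skipped fields do not occupy an index) can be sketched end to end as follows. MiniScan is a hypothetical, much simplified scanner written for illustration only: it uses java.lang.Long, java.lang.Double and String instead of the ewe types, assumes single spaces separate fields, skips leading spaces before every field, and does not handle the 'q' format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the ewe.util.DataParser implementation.
final class MiniScan {
    static Object[] scan(String format, String data) {
        List<Object> out = new ArrayList<Object>();
        int pos = 0;
        for (String field : format.trim().split("\\s+")) {
            boolean skip = field.charAt(0) == '!';
            char type = field.charAt(field.length() - 1);
            String count = field.substring(1, field.length() - 1);

            // Assumption: spaces before a field are not part of the field.
            while (pos < data.length() && data.charAt(pos) == ' ') pos++;
            if (pos >= data.length()) {
                throw new IndexOutOfBoundsException("no data for " + field);
            }

            int end;
            if (!count.isEmpty()) {
                end = pos + Integer.parseInt(count); // fixed-width field
            } else if (type == 'c') {
                end = pos + 1;                       // single character
            } else {
                end = data.indexOf(' ', pos);        // up to the next space
                if (end < 0) end = data.length();
            }
            if (end > data.length()) {
                throw new IndexOutOfBoundsException("no data for " + field);
            }
            String text = data.substring(pos, end);
            pos = end;

            if (skip) continue; // '!' fields consume input but add no value
            switch (type) {
                case 'i': out.add(Long.valueOf(text.trim())); break;
                case 'f':
                case 'd': out.add(Double.valueOf(text.trim())); break;
                default:  out.add(text);
            }
        }
        return out.toArray();
    }

    public static void main(String[] args) {
        Object[] v = scan("%10s !5s %i %f", "Widget-001 XXXXX 42 3.5");
        for (int i = 0; i < v.length; i++) {
            System.out.println(i + ": " + v[i]);
        }
    }
}
```

With the sample line, index 0 holds the text "Widget-001", index 1 the integer 42 and index 2 the floating point value 3.5; the five characters matched by '!5s' are consumed but never appear in the result.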


Constructor Summary
DataParser(String format)
          Create a new DataParser for the specified format.
 
Method Summary
 double getDouble(int index)
          Use this to get the value that was just parsed at the specified index.
 String getFormat()
           
 int getInt(int index)
          Use this to get the value that was just parsed at the specified index.
 long getLong(int index)
          Use this to get the value that was just parsed at the specified index.
 String getString(int index)
          Use this to get the value that was just parsed at the specified index.
 SubString getSubString(int index)
          Use this to get the value that was just parsed at the specified index.
 Object getValue(int index)
          Get the parsed value at the specified index.
 Object[] parse(byte[] buffer, int start, int length)
          Parse a string of UTF encoded bytes.
 Object[] parse(char[] chars, int start, int length)
          Parse an array of characters.
 Object[] parse(String data)
          Parse a string.
static DataParser parseString(String data, String format)
          This creates a new DataParser for the specified format and then parses the String.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, toString
 

Constructor Detail

DataParser

public DataParser(String format)
           throws IllegalArgumentException
Create a new DataParser for the specified format.

Parameters:
format - The format in % notation.
Throws:
IllegalArgumentException - if the format is malformed.
Method Detail

getFormat

public String getFormat()

parseString

public static DataParser parseString(String data,
                                     String format)
                              throws IllegalArgumentException,
                                     IndexOutOfBoundsException
This creates a new DataParser for the specified format and then parses the String.

Parameters:
data - The data to parse.
format - The format string.
Returns:
The DataParser created and used for the parsing. Use the getXXX() methods to retrieve the parsed values.
Throws:
IllegalArgumentException - if the format is malformed.
IndexOutOfBoundsException - if there was not enough data to parse all formats.

parse

public Object[] parse(String data)
               throws IndexOutOfBoundsException
Parse a string.

Parameters:
data - the String data.
Returns:
A set of objects for each '%' element in the format. This will be a ewe.sys.Long object for integer/long values, a ewe.sys.Double object for double values or a ewe.util.SubString for string/text values.
Throws:
IndexOutOfBoundsException - if there was not enough data to parse all formats.

parse

public Object[] parse(byte[] buffer,
                      int start,
                      int length)
               throws IndexOutOfBoundsException
Parse a string of UTF encoded bytes.

Parameters:
buffer - the array containing the bytes.
start - the start of the data bytes in the array.
length - the number of data bytes in the array.
Returns:
A set of objects for each '%' element in the format. This will be a ewe.sys.Long object for integer/long values, a ewe.sys.Double object for double values or a ewe.util.SubString for string/text values.
Throws:
IndexOutOfBoundsException - if there was not enough data to parse all formats.
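
The start and length parameters select a sub-range of the buffer. As a rough illustration of that convention, the sketch below decodes such a range into a String using only the standard Java library. It assumes "UTF" here means UTF-8, which the text above does not state, and ByteRangeSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the ewe.util.DataParser implementation.
final class ByteRangeSketch {
    // Decode `length` bytes beginning at `start`, assuming UTF-8 encoding.
    static String decodeRange(byte[] buffer, int start, int length) {
        if (start < 0 || length < 0 || start + length > buffer.length) {
            throw new IndexOutOfBoundsException("range outside buffer");
        }
        return new String(buffer, start, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] buffer = "xx42 3.5yy".getBytes(StandardCharsets.UTF_8);
        // Only the six bytes starting at offset 2 are treated as data.
        System.out.println(decodeRange(buffer, 2, 6));
    }
}
```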

parse

public Object[] parse(char[] chars,
                      int start,
                      int length)
               throws IndexOutOfBoundsException
Parse an array of characters.

Parameters:
chars - the array containing the characters.
start - the start of the data characters in the array.
length - the number of data characters in the array.
Returns:
A set of objects for each '%' element in the format. This will be a ewe.sys.Long object for integer/long values, a ewe.sys.Double object for double values or a ewe.util.SubString for string/text values.
Throws:
IndexOutOfBoundsException - if there was not enough data to parse all formats.

getValue

public Object getValue(int index)
                throws IndexOutOfBoundsException
Get the parsed value at the specified index. This will be a ewe.sys.Long, a ewe.sys.Double or a ewe.util.SubString.

Parameters:
index - The index of the retrieved value.
Returns:
The Object at that index.
Throws:
IndexOutOfBoundsException - if the index is out of bounds.

getLong

public long getLong(int index)
             throws IllegalArgumentException,
                    IndexOutOfBoundsException
Use this to get the value that was just parsed at the specified index.

Parameters:
index - The index of the value for the '%i' element as specified in the format string.
Returns:
The long value at that index.
Throws:
IllegalArgumentException - If the element did not denote an integer value.
IndexOutOfBoundsException - If the index is out of bounds.

getInt

public int getInt(int index)
           throws IllegalArgumentException,
                  IndexOutOfBoundsException
Use this to get the value that was just parsed at the specified index.

Parameters:
index - The index of the value for the '%i' element as specified in the format string.
Returns:
The integer value at that index.
Throws:
IllegalArgumentException - If the element did not denote an integer value.
IndexOutOfBoundsException - If the index is out of bounds.

getDouble

public double getDouble(int index)
                 throws IllegalArgumentException,
                        IndexOutOfBoundsException
Use this to get the value that was just parsed at the specified index.

Parameters:
index - The index of the value for the '%f' element as specified in the format string.
Returns:
The double value at that index.
Throws:
IllegalArgumentException - If the element did not denote a floating point value.
IndexOutOfBoundsException - If the index is out of bounds.

getSubString

public SubString getSubString(int index)
                       throws IllegalArgumentException,
                              IndexOutOfBoundsException
Use this to get the value that was just parsed at the specified index.

Parameters:
index - The index of the value for the '%' element as specified in the format string.
Returns:
The SubString value at that index.
Throws:
IllegalArgumentException - If the element did not denote a text value.
IndexOutOfBoundsException - If the index is out of bounds.

getString

public String getString(int index)
                 throws IllegalArgumentException,
                        IndexOutOfBoundsException
Use this to get the value that was just parsed at the specified index.

Parameters:
index - The index of the value for the '%' element as specified in the format string.
Returns:
The String value at that index.
Throws:
IllegalArgumentException - If the element did not denote a text value.
IndexOutOfBoundsException - If the index is out of bounds.