
Wednesday, October 24, 2007

 

TextIterable?

Ever wonder why Java doesn't have a kind of Iterable that can work with a text file? For example:
Iterable<String> ti = new TextIterable(myfile);

for (String line : ti) {
    // do stuff with line
}
// file automatically opened and closed
Or:
TextIterable ti = new TextIterable(myfile);
int count = 0;

for (String line : ti) {
    if (++count > 10)
        break;
    // do stuff with line
}
ti.close(); // explicitly close it
You would think something so obvious, simple and needed would have been done by someone already. But search as I might, I couldn't find one. So I ended up building one. See the code below for TextIterable and its helper class LineIterator. Note:
  1. You don't need to explicitly open or close the file. As long as you iterate through the entire file, the opening and closing are done automatically. To ensure the underlying resource is closed even in the face of exceptions, however, the TextIterable can be explicitly closed in a finally block.
  2. Or, if you choose to iterate through only part of the file, it can be explicitly closed afterwards.
  3. Each line is loaded on demand, so the memory footprint stays minimal.
  4. Not only does TextIterable support files, it also supports resource paths and, for that matter, any URL!
  5. TextIterable is thread-safe, but the iterators it produces (i.e. LineIterator) are not; each one is intended to be used by a single thread.
  6. Supports encodings.
  7. Supports a fluent API for configuration.
  8. Think File, URL, resource, rather than Stream, Reader, etc.
I think it makes the code much simpler now. What do you think?
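
To make points (1) to (3) concrete, here is a stripped-down sketch of the same idea. It is not the TextIterable listed below: LazyLines and LazyLinesDemo are hypothetical names, the sketch assumes Java 8 or later, reads UTF-8 only, supports a single open iteration at a time and makes no attempt at thread safety.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for TextIterable: one reader at a time, not thread-safe.
class LazyLines implements Iterable<String>, Closeable {
    private final Path path;
    private BufferedReader reader; // non-null only while an iteration is open

    LazyLines(Path path) {
        this.path = path;
    }

    boolean isOpen() {
        return reader != null;
    }

    @Override
    public Iterator<String> iterator() {
        close(); // this sketch allows only one open iteration at a time
        try {
            reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<String>() {
            private final BufferedReader mine = reader;
            private String next; // line read ahead by hasNext()

            @Override
            public boolean hasNext() {
                if (next != null) return true;
                if (reader != mine) return false; // closed or superseded
                try {
                    next = mine.readLine(); // one line is read per call, on demand
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                if (next == null) LazyLines.this.close(); // end of file: close automatically
                return next != null;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                String line = next;
                next = null;
                return line;
            }
        };
    }

    @Override
    public void close() {
        if (reader == null) return;
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            reader = null;
        }
    }
}

class LazyLinesDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lazy-lines", ".txt");
        Files.write(tmp, Arrays.asList("first", "second", "third"), StandardCharsets.UTF_8);
        LazyLines lines = new LazyLines(tmp);
        try {
            for (String line : lines) {
                System.out.println(line);
                break; // partial iteration leaves the reader open
            }
        } finally {
            lines.close(); // so close explicitly, even if an exception was thrown
        }
        Files.delete(tmp);
    }
}
```

Iterating to the end closes the reader without any help from the caller, while breaking out early leaves it open until close() is called, which is why the finally block is worth having.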

Special thanks to John Xiao for listening and triggering the support of (2), "lumpynose" for various suggestions, Henri Yandell for triggering the support of (6), and Dr. Heinz M. Kabutz for reminding me not to underestimate the force ;)
// Imports for both classes in this listing.
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Assumption: @ThreadSafe and @NotThreadSafe are the JCIP annotations.
import net.jcip.annotations.NotThreadSafe;
import net.jcip.annotations.ThreadSafe;

/**
 * @author Hanson Char
 */
@ThreadSafe
public class TextIterable implements Iterable<String>, Closeable {
    private final URL url;
    private final List<LineIterator> openedIterators = new ArrayList<LineIterator>();
    private volatile boolean returnNullUponEof;
    // The three charset settings are mutually exclusive and guarded by "this".
    private String charsetname;
    private Charset charset;
    private CharsetDecoder charsetDecoder;

    public TextIterable(File file) throws MalformedURLException {
        this(file.toURI().toURL());
    }

    public TextIterable(URL url) {
        this.url = url;
    }

    public TextIterable(String resourcePath) {
        this(
            Thread.currentThread()
                  .getContextClassLoader()
                  .getResource(resourcePath));
    }

    /** Opens a new stream on the URL and returns an iterator over its lines. */
    public LineIterator iterator() {
        LineIterator ret;
        final String charsetname;
        final Charset charset;
        final CharsetDecoder charsetDecoder;

        // Take a consistent snapshot of the charset configuration.
        synchronized (this) {
            charsetname = this.charsetname;
            charset = this.charset;
            charsetDecoder = this.charsetDecoder;
        }
        try {
            // Note: a CharsetDecoder is stateful and not safe for concurrent use,
            // so a configured decoder should not drive iterators in several threads.
            if (charsetDecoder != null)
                ret = new LineIterator(this, url.openStream(), returnNullUponEof, charsetDecoder);
            else if (charset != null)
                ret = new LineIterator(this, url.openStream(), returnNullUponEof, charset);
            else if (charsetname != null)
                ret = new LineIterator(this, url.openStream(), returnNullUponEof, Charset.forName(charsetname));
            else
                ret = new LineIterator(this, url.openStream(), returnNullUponEof, (Charset) null);
            synchronized (openedIterators) {
                openedIterators.add(ret);
            }
            return ret;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Closes every iterator that is still open. */
    public void close() {
        final LineIterator[] lineIterators;

        synchronized (openedIterators) {
            lineIterators = openedIterators.toArray(
                new LineIterator[openedIterators.size()]);
            openedIterators.clear();
        }
        // Close outside the lock.
        for (LineIterator li : lineIterators)
            li.closeInPrivate();
    }

    public int numberOfopenedIterators() {
        // The list is a plain ArrayList, so reads need the same lock as writes.
        synchronized (openedIterators) {
            return openedIterators.size();
        }
    }

    void removeLineIterator(LineIterator li) {
        synchronized (openedIterators) {
            openedIterators.remove(li);
        }
    }

    public boolean isReturnNullUponEof() {
        return returnNullUponEof;
    }

    public void setReturnNullUponEof(boolean returnNullUponEof) {
        this.returnNullUponEof = returnNullUponEof;
    }

    public TextIterable withReturnNullUponEof(boolean returnNullUponEof) {
        setReturnNullUponEof(returnNullUponEof);
        return this;
    }

    public synchronized Charset getCharset() {
        return charset;
    }

    public synchronized void setCharset(Charset charset) {
        this.charset = charset;
        this.charsetname = null;
        this.charsetDecoder = null;
    }

    public TextIterable withCharset(Charset charset) {
        setCharset(charset);
        return this;
    }

    public synchronized CharsetDecoder getCharsetDecoder() {
        return charsetDecoder;
    }

    public synchronized void setCharsetDecoder(CharsetDecoder charsetDecoder) {
        this.charsetDecoder = charsetDecoder;
        this.charsetname = null;
        this.charset = null;
    }

    public TextIterable withCharsetDecoder(CharsetDecoder charsetDecoder) {
        setCharsetDecoder(charsetDecoder);
        return this;
    }

    public synchronized String getCharsetname() {
        return charsetname;
    }

    public synchronized void setCharsetname(String charsetname) {
        this.charsetname = charsetname;
        this.charset = null;
        this.charsetDecoder = null;
    }

    public TextIterable withCharsetname(String charsetname) {
        setCharsetname(charsetname);
        return this;
    }
}

/**
 * @author Hanson Char
 */
@NotThreadSafe
class LineIterator implements Iterator<String>, Closeable {
    private boolean hasNextExecuted;
    private String line;
    private LineNumberReader lnr;
    private final TextIterable textIterable;
    private final boolean returnNullUponEof;

    LineIterator(TextIterable textIterable, InputStream is, boolean returnNullUponEof, Charset charset) {
        this.textIterable = textIterable;
        this.returnNullUponEof = returnNullUponEof;
        InputStreamReader isr = null;

        try {
            // A null charset means the platform default encoding.
            isr = charset == null
                ? new InputStreamReader(is)
                : new InputStreamReader(is, charset);
            lnr = new LineNumberReader(isr);
        } catch (Exception ex) {
            // Construction failed: release whatever was opened.
            // With lnr left null, the iterator simply behaves as empty.
            try {
                if (lnr != null)
                    lnr.close();
                else if (isr != null)
                    isr.close();
                else if (is != null)
                    is.close();
            } catch (Throwable ignore) {
            }
        }
    }

    LineIterator(TextIterable textIterable, InputStream is, boolean returnNullUponEof, CharsetDecoder decoder) {
        this.textIterable = textIterable;
        this.returnNullUponEof = returnNullUponEof;
        InputStreamReader isr = null;

        try {
            isr = new InputStreamReader(is, decoder);
            lnr = new LineNumberReader(isr);
        } catch (Exception ex) {
            // Same cleanup as in the Charset-based constructor.
            try {
                if (lnr != null)
                    lnr.close();
                else if (isr != null)
                    isr.close();
                else if (is != null)
                    is.close();
            } catch (Throwable ignore) {
            }
        }
    }

    public boolean hasNext() {
        // Reuse the line already read ahead, if any.
        if (hasNextExecuted)
            return line != null;
        try {
            hasNextExecuted = true;

            if (lnr != null) {
                line = lnr.readLine();

                // End of file: close automatically.
                if (line == null)
                    close();
            }
            return line != null;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public String next() {
        if (hasNextExecuted) {
            hasNextExecuted = false;
            return line == null ? eof() : line;
        }
        return hasNext() ? next() : eof();
    }

    // Either returns null or throws, depending on the configured policy.
    private String eof() {
        if (returnNullUponEof)
            return null;
        throw new NoSuchElementException();
    }

    public void close() {
        if (lnr != null) {
            textIterable.removeLineIterator(this);
            closeInPrivate();
        }
    }

    public int getLineNumber() {
        return lnr == null ? -1 : lnr.getLineNumber();
    }

    // Closes the reader without deregistering from the owning TextIterable.
    void closeInPrivate() {
        if (lnr != null) {
            try {
                lnr.close();
            } catch (IOException ignore) {
            }
            line = null;
            lnr = null;
        }
    }

    public void remove() {
        throw new UnsupportedOperationException("remove not supported");
    }

    @Override
    public void finalize() {
        try {
            super.finalize();
        } catch (Throwable ex) {
        }
        close();
    }
}

Comments:
Looks very useful. You're right, there should be something like this in the JDK.
TL
 
I like the idea as well.

Shouldn't your next() throw NoSuchElementException instead of returning null?
 
Also, BufferedReader might have less overhead than LineNumberReader.
 
You are right about the Iterator API throwing NoSuchElementException. But I also like the behavior of returning null. So maybe the default should be to throw the exception, with a configurable option to return null when there are no more elements?

I chose to use LineNumberReader thinking I could provide the line number during iteration, but then forgot about it. Something to do.
 
I think that would work. As long as the default is throwing the exception, since that's how the api specifies it.
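
The two end-of-input policies discussed here can be sketched in isolation. The wrapper below is only an illustration under assumed names (EofPolicyIterator and EofPolicyDemo are hypothetical and not part of the posted code): it throws NoSuchElementException by default, as the Iterator contract specifies, and returns null only when explicitly asked to.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical wrapper showing both end-of-input policies side by side.
class EofPolicyIterator implements Iterator<String> {
    private final Iterator<String> source;
    private final boolean returnNullUponEof;

    EofPolicyIterator(Iterator<String> source, boolean returnNullUponEof) {
        this.source = source;
        this.returnNullUponEof = returnNullUponEof;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public String next() {
        if (source.hasNext())
            return source.next();
        if (returnNullUponEof)
            return null;                    // opt-in, lenient behavior
        throw new NoSuchElementException(); // default, as the Iterator contract specifies
    }
}

class EofPolicyDemo {
    public static void main(String[] args) {
        // Lenient mode allows a BufferedReader-style loop.
        Iterator<String> lenient =
            new EofPolicyIterator(Arrays.asList("a", "b").iterator(), true);
        String line;
        while ((line = lenient.next()) != null)
            System.out.println(line);

        // Strict mode throws once the input is exhausted.
        Iterator<String> strict =
            new EofPolicyIterator(Arrays.asList("a").iterator(), false);
        strict.next();
        try {
            strict.next();
        } catch (NoSuchElementException expected) {
            System.out.println("no more elements");
        }
    }
}
```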
 
Source code updated. Thanks, lumpynose :)
 
Just realized the generics in the code were not properly displayed due to angle brackets in html. Fixed now.
 
Nice idea...but seeing all the synchronizations and array copying makes me cringe :)

Why not simplify this by 1) expecting a BufferedReader in your constructor. This way the user can set up the Reader all they want before passing it to you. Nothing stops you from then having a further convenience constructor that takes in a File of course. 2) getting rid of the "feature" that each call to iterator() returns a new instance of your Iterator implementation. Why not return the same Iterator instance for the lifetime of your Iterable? This rids you of all the housekeeping code necessary for keeping track of past instantiations. It will really simplify the code and make it a lot more performant and still be pretty useful, methinks :)

Anyway just my 2 cents... :)
 
The design of TextIterable is to free the client from worrying about the lower level Reader, Streams, etc., and allow them to operate at a higher level of abstraction - File, URL and resource. If you have a BufferedReader that you'd like to iterate through, maybe the Jakarta commons-io LineIterator is what you want ?

http://commons.apache.org/io/apidocs/org/apache/commons/io/LineIterator.html

The iterator() method returns a new instance of iterator so multiple threads can operate on the same instance of TextIterable. I have considered removing this feature as you suggested, but so far it still appears to me a nice little feature to keep that comes with very little overhead.
 
Ummmm, maybe I'm missing something, but what's stopping you from having other constructors that take File, URL, resource paths, what not, and still have it all based on BufferedReader? I mean in a way that's what you've done with your other constructors, no? So it can still be super convenient if all you wanted is to work on files and yet you can still work lowlevelish and deal with the Readers directly.
 
>in a way that's what you've done with your other constructors, no?

One difference is that the internal stream is opened every time the iterator() method is invoked, allowing thread-safe concurrent iteration. If a constructor were overloaded to take a BufferedReader directly, it would be incompatible both with such parallel iterations and with the stream opening/closing modus operandi.
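
A minimal sketch of that difference, assuming an in-memory string in place of a URL (FreshReaderLines and FreshReaderDemo are hypothetical names, and Java 8 or later is assumed): because every iterator() call opens its own reader, two iterations over the same source advance independently of each other.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical: each iterator() call opens a fresh reader over the same text.
class FreshReaderLines implements Iterable<String> {
    private final String text;

    FreshReaderLines(String text) {
        this.text = text;
    }

    @Override
    public Iterator<String> iterator() {
        final BufferedReader reader = new BufferedReader(new StringReader(text));
        return new Iterator<String>() {
            private String next; // line read ahead by hasNext()

            @Override
            public boolean hasNext() {
                if (next == null) {
                    try {
                        next = reader.readLine();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
                return next != null;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                String line = next;
                next = null;
                return line;
            }
        };
    }
}

class FreshReaderDemo {
    public static void main(String[] args) {
        FreshReaderLines lines = new FreshReaderLines("one\ntwo\nthree");
        Iterator<String> first = lines.iterator();
        Iterator<String> second = lines.iterator();
        System.out.println(first.next());  // advances only the first iterator
        System.out.println(first.next());
        System.out.println(second.next()); // the second still starts at the beginning
    }
}
```

Had the Iterable been handed a single BufferedReader instead, both iterators would have pulled from the same position and stepped on each other.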
 
http://java.sun.com/j2se/1.5.0/docs/api/java/util/Scanner.html
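
For comparison, the Scanner class linked above can also walk through text line by line. A minimal sketch, assuming a small in-memory string as input (ScannerLines and readLines are hypothetical names used only for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerLines {
    // Collects every line of the given text using java.util.Scanner.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<String>();
        // try-with-resources closes the Scanner even if an exception is thrown
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine())
                lines.add(scanner.nextLine());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : readLines("first\nsecond\n"))
            System.out.println(line);
    }
}
```

Scanner implements Iterator<String> rather than Iterable<String>, so it cannot be used directly in a for-each loop, and the caller still has to open and close it.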
 