Programming

Chris Foley is a computer programming enthusiast. He loves exploring new programming languages and crafting nice solutions. He lives in Glasgow, Scotland and works as a software developer.

Iterating Text Files

My latest Java snippet lets me iterate through a text file with a simple for...each loop. Java's file handling has always seemed overly verbose to me (especially when I only ever want to do simple things with it), so I'm glad to have a more concise idiom. Using it looks something like this:

for(String s : new TextFileIterator("C:\\anywhere\\anyfile.txt")) {
  System.out.println(s);
}

However, this snippet started life as a CSV reader method I wrote for a project I'm working on. I was going to put together a bunch of CSV utility methods in a class and post it as a snippet. If you're interested, the method looked something like this:

	public static String[][] openCSV(String filename) throws FileNotFoundException {
		
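		Scanner in = new Scanner(new File(filename));
		Queue<String> lines = new ArrayDeque<String>();
		// hasNextLine() is the check that pairs with nextLine();
		// hasNext() looks for the next token instead
		while(in.hasNextLine()) {
			lines.add(in.nextLine());
		}
		in.close();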
		
		String[][] csvTable = new String[lines.size()][];
		for(int i = 0; i < csvTable.length; i++) {
			csvTable[i] = lines.remove().split(",");
		}
		return csvTable;
	}

A few things bugged me about this method. First, there is no way of telling how many lines are in a file without reading the whole lot in. That means I can't initialise the array until I've read everything, which has the knock-on effect of having to store the whole lot in a temporary data structure and copy it over.

Second is that the Scanner smells like an Iterator (look how nicely it plays with the while loop). It actually does implement Iterator<String>, but not Iterable, and Iterable is what Java's sexy new for..each syntax needs.

As an aside, a Scanner isn't really anything to do with files. It's a more general text parser, so as an Iterator its next() method returns whitespace-delimited tokens, not whole lines. That doesn't help me. I only use it for files, and it's a bit clumsy for that.
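Here is a minimal sketch of the token-versus-line distinction. It reads from a String rather than a file to stay self-contained, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokensVsLines {

	// Collects what Scanner.next() returns: whitespace-delimited tokens.
	static List<String> tokens(String text) {
		List<String> result = new ArrayList<String>();
		Scanner in = new Scanner(text);
		while (in.hasNext()) {
			result.add(in.next());
		}
		in.close();
		return result;
	}

	// Collects what Scanner.nextLine() returns: whole lines.
	static List<String> lines(String text) {
		List<String> result = new ArrayList<String>();
		Scanner in = new Scanner(text);
		while (in.hasNextLine()) {
			result.add(in.nextLine());
		}
		in.close();
		return result;
	}

	public static void main(String[] args) {
		String text = "first line\nsecond line";
		System.out.println(tokens(text)); // [first, line, second, line]
		System.out.println(lines(text));  // [first line, second line]
	}
}
```

Note that each loop pairs its check with the matching read: hasNext() with next(), and hasNextLine() with nextLine().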

Third is that split() is embedded right in the middle of the method, which stands out now that I've been playing with Ruby lately. A similar Ruby method would instead take a block (which could be any operation, maybe split) and yield each line to it. Calling such a method might look something like this (if you excuse the bastard Java/Ruby love child syntax):

fileIterator.each do |s|
 s.split()
end

That has to be way more flexible! I was about to reimplement my CSV method to emulate this. It was going to be messy. I'd pass an Operation object (abstract) which would define one method: accept a String and return... a generic T. Then concrete implementations could be anonymous classes. And that idea was terrible. Straight in the bin.

Fortunately, iterators come in two forms: external and internal. External iterators are like the ones in Java. A method hands you an iterator object and you write the loop that goes through each item. Internal iterators are like the ones you find in Ruby. You pass a "method" to the object. The object then iterates through each item it contains and applies the method to each one.
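To make the two forms concrete, here is a rough sketch of both in Java, working on an in-memory list of lines. The Operation interface and the method names are hypothetical, invented for this illustration only. The anonymous class plays the part of the Ruby block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorStyles {

	// Hypothetical callback type standing in for a Ruby block.
	interface Operation<T> {
		T apply(String s);
	}

	// External iteration: the caller asks for an iterator and drives the loop.
	static List<String[]> splitExternally(List<String> lines) {
		List<String[]> result = new ArrayList<String[]>();
		Iterator<String> it = lines.iterator();
		while (it.hasNext()) {
			result.add(it.next().split(","));
		}
		return result;
	}

	// Internal iteration: the loop lives in here and the caller supplies the operation.
	static <T> List<T> each(List<String> lines, Operation<T> op) {
		List<T> result = new ArrayList<T>();
		for (String s : lines) {
			result.add(op.apply(s));
		}
		return result;
	}

	public static void main(String[] args) {
		List<String> lines = Arrays.asList("a,b", "c,d,e");
		List<String[]> external = splitExternally(lines);
		List<String[]> internal = each(lines, new Operation<String[]>() {
			@Override
			public String[] apply(String s) {
				return s.split(",");
			}
		});
		System.out.println(external.get(1).length); // 3
		System.out.println(internal.get(1).length); // 3
	}
}
```

The internal version is more flexible about what happens to each line, but the anonymous class makes the call site noticeably heavier than the plain loop.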

For most applications, either type can be used and in this case, either type would work perfectly. It was as simple as playing to Java's strengths and writing an external Iterator. All that was left to do was to make it play nice with for..each loops.

Java for...each loops take an Iterable object (or an array). Iterable objects (like pretty much everything in the Collections framework) have an iterator() method which returns the external iterator. Calling it is what the for...each loop does behind the scenes:

ArrayList<String> x = new ArrayList<String>();
Iterator<String> i = x.iterator();
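As a sketch of what "behind the scenes" means, the two methods below should behave identically: the first uses the for...each form, the second spells out roughly what the compiler turns it into. The class and method names are just for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ForEachDesugared {

	// The for...each form: works on anything that implements Iterable.
	static String joinWithForEach(Iterable<String> items) {
		StringBuilder sb = new StringBuilder();
		for (String s : items) {
			sb.append(s);
		}
		return sb.toString();
	}

	// Roughly the expanded form: ask for the iterator once, then
	// call hasNext() and next() until it runs out.
	static String joinWithIterator(Iterable<String> items) {
		StringBuilder sb = new StringBuilder();
		for (Iterator<String> i = items.iterator(); i.hasNext(); ) {
			String s = i.next();
			sb.append(s);
		}
		return sb.toString();
	}

	public static void main(String[] args) {
		List<String> items = Arrays.asList("a", "b", "c");
		System.out.println(joinWithForEach(items));  // abc
		System.out.println(joinWithIterator(items)); // abc
	}
}
```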

So to make my file iterator play nicely I just had to implement Iterable and return itself in the iterator() method.
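One consequence of returning itself is worth a quick sketch: the object can only be looped over once, because every call to iterator() hands back the same, already-advanced iterator. The Lines class below is a hypothetical stand-in that reads from a String instead of a file, so the example runs on its own.

```java
import java.util.Iterator;
import java.util.Scanner;

class SinglePassDemo {

	// Hypothetical stand-in for a file iterator: same shape, but reads from a String.
	static class Lines implements Iterable<String>, Iterator<String> {
		private final Scanner scanner;

		Lines(String text) {
			scanner = new Scanner(text);
		}

		@Override
		public boolean hasNext() {
			return scanner.hasNextLine();
		}

		@Override
		public String next() {
			return scanner.nextLine();
		}

		@Override
		public void remove() {
			throw new UnsupportedOperationException();
		}

		// Returns this same object, so a second loop picks up where the first stopped.
		@Override
		public Iterator<String> iterator() {
			return this;
		}
	}

	// Counts how many items a for...each loop sees.
	static int count(Iterable<String> items) {
		int n = 0;
		for (String s : items) {
			n++;
		}
		return n;
	}

	public static void main(String[] args) {
		Lines lines = new Lines("one\ntwo\nthree");
		System.out.println(count(lines)); // 3
		System.out.println(count(lines)); // 0, the lines are already used up
	}
}
```

For a read-it-once-and-move-on file loop that is usually fine. To go through the file again, create a new object.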

The class is a bit longer than it has to be. There are two constructors so I can specify either a File object or the filename as a String, and there are a couple of utility methods (one of which opens a CSV file. :D ) I'll probably add more over time. The class is listed below as well as in the snippet section.

I think I've found a way around all my grumbles with the original CSV opener method... except not being able to declare the array up front.

Thanks for reading!

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Scanner;


public class TextFileIterator implements Iterable<String>, Iterator<String> {

	private final Scanner fileScanner;

	public TextFileIterator(String fileName) throws FileNotFoundException {
		this(new File(fileName));
	}

	public TextFileIterator(File file) throws FileNotFoundException {
		fileScanner = new Scanner(file);
	}

	@Override
	public boolean hasNext() {
		// hasNextLine() is the check that pairs with nextLine() in next()
		return fileScanner.hasNextLine();
	}

	@Override
	public String next() {
		return fileScanner.nextLine();
	}

	@Override
	public void remove() {
		throw new UnsupportedOperationException();
	}

	@Override
	public Iterator<String> iterator() {
		return this;
	}

	public static <T extends Collection<String>> T addLinesToCollection(T collection, File file) throws FileNotFoundException {
		for(String s : new TextFileIterator(file)) {
			collection.add(s);
		}
		return collection;
	}

	public static String[][] csvToArray(File csvFile) throws FileNotFoundException {
		List<String[]> lines = new ArrayList<String[]>();
		for(String s : new TextFileIterator(csvFile)) {
			lines.add(s.split(","));
		}
		return lines.toArray(new String[lines.size()][]);
	}

}

06 January 2011
