Programming Assignment 5: Kd-Trees


Write a data type to represent a set of points in the unit square (all points have x- and y-coordinates between 0 and 1) using a 2d-tree to support efficient range search (find all of the points contained in a query rectangle) and nearest-neighbor search (find a closest point to a query point). 2d-trees have numerous applications, ranging from classifying astronomical objects to computer animation to speeding up neural networks to mining data to image retrieval.

Range search and k-nearest neighbor


Geometric primitives. To get started, use the following geometric primitives for points and axis-aligned rectangles in the plane.

Geometric primitives
Do not modify these data types.

Brute-force implementation. Write a mutable data type PointSET.java that represents a set of points in the unit square. Implement the following API by using a red–black BST:

public class PointSET {
   public         PointSET()                               // construct an empty set of points 
   public           boolean isEmpty()                      // is the set empty? 
   public               int size()                         // number of points in the set 
   public              void insert(Point2D p)              // add the point to the set (if it is not already in the set)
   public           boolean contains(Point2D p)            // does the set contain point p? 
   public              void draw()                         // draw all points to standard draw 
   public Iterable<Point2D> range(RectHV rect)             // all points that are inside the rectangle (or on the boundary) 
   public           Point2D nearest(Point2D p)             // a nearest neighbor in the set to point p; null if the set is empty 

   public static void main(String[] args)                  // unit testing of the methods (optional) 
}

Implementation requirements.  You must use either SET or java.util.TreeSet; do not implement your own red–black BST.

Corner cases.  Throw a java.lang.IllegalArgumentException if any argument is null. Performance requirements.  Your implementation should support insert() and contains() in time proportional to the logarithm of the number of points in the set in the worst case; it should support nearest() and range() in time proportional to the number of points in the set.

2d-tree implementation. Write a mutable data type KdTree.java that uses a 2d-tree to implement the same API (but replace PointSET with KdTree). A 2d-tree is a generalization of a BST to two-dimensional keys. The idea is to build a BST with points in the nodes, using the x- and y-coordinates of the points as keys in strictly alternating sequence.

  Insert (0.7, 0.2)  

insert (0.7, 0.2)
  Insert (0.5, 0.4)  

insert (0.5, 0.4)
  Insert (0.2, 0.3)  

insert (0.2, 0.3)
  Insert (0.4, 0.7)  

insert (0.4, 0.7)
  Insert (0.9, 0.6)  

insert (0.9, 0.6)
Insert (0.7, 0.2)
Insert (0.5, 0.4)
Insert (0.2, 0.3)
Insert (0.4, 0.7)
Insert (0.9, 0.6)

The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of range search and nearest-neighbor search. Each node corresponds to an axis-aligned rectangle in the unit square, which encloses all of the points in its subtree. The root corresponds to the unit square; the left and right children of the root corresponds to the two rectangles split by the x-coordinate of the point at the root; and so forth.

Clients.  You may use the following interactive client programs to test and debug your code.

Analysis of running time and memory usage (optional and not graded). 

Submission.  Submit only the files PointSET.java and KdTree.java. We will supply algs4.jar. Your may not call library functions except those in those in java.lang, java.util, and algs4.jar.

This assignment was developed by Kevin Wayne.