Scenario: bundles of items (m:m).
List the duplicate bundles
Consider an application which maintains items and allows users to create
bundles of these items. Each item has various attributes (cost, no. in stock,
etc). Each bundle has various attributes (customer, order date, etc). Each
bundle may contain one or many items. Each item may be in one or many bundles.
The relational implementation has an items
table and a bundles table. The m:m
relatioship is implemented as the classical intersections
table with columns bundle_id and item_id.
The bundle composition for a given bundle_id
is the set of item_id's it contains.
Do many bundles have the same composition? (Exactly the same item_id's
but in any order.)
Generate a list of bundle compositions for each distinct composition that
is contained in two or more bundles. For each, list the bundle_id's
that contain it.
- Create the table. (We need only
the intersections table for this exercise.)
With so little data, you can see the solution by eye...
1+2 4, 5, 7
1+2+3 1, 2, 6
- Naive PL/SQL implementation. For
each distinct bundle_id (outer cursor
loop), get the list of item_id's in
order (inner cursor loop with where
for this bundle_id). Concatenate the
item_id's into a string (the bundle
composition), and store the bundle_id
in an associative array indexed by bundle composition. As two or
more bundle_id's are found for the same
bundle composition, concatenate them into a list at that array index. When
done, report only those array elements whose bundle_id
list has two or more members.
- Refined PL/SQL implementation. Scan
the intersections table just once, order
by bundle_id, item_id. Detect when the bundle_id
changes to get the item_id's for each
bundle_id. Else, same logic. The advantage
is a single table scan rather than umpteen individual index lookup selects
for each bundle_id. The control logic
is a little bit more elaborate than in the naive implementation, but the
logic for populating the associate array and for reporting is unchanged.
- This scenario
provides an example of a requirement which is very difficult to implement
in pure SQL, and where the solution - if you do manage to debug it - can
have dramatically worse performance characteristics than the PL/SQL solution.
Completing it is left as an exercise.
Here's a start.