Hi Brad,
Yes, after thrashing around through 22 pages' worth of stuff, I think the one-processor-per-cell approach is the nicest. No problem with the processor activating the shunt. The small processors run nicely on 2.5 to 5 V and are around $1 each.
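To make the shunt part concrete, here is a rough sketch in C of how a cell unit might decide when to switch its shunt. The hysteresis (separate on and off thresholds) is my own assumption to keep the shunt from chattering, and the names `cell_unit` and `update_shunt` are made up for illustration. The set points are whatever the master unit programs in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cell state; set points are in millivolts. */
typedef struct {
    uint16_t on_mv;    /* shunt turns on at or above this voltage   */
    uint16_t off_mv;   /* shunt turns off at or below this voltage  */
    bool     shunt_on; /* current shunt state                       */
} cell_unit;

/* Update the shunt state from a new cell reading and return it.
 * Between off_mv and on_mv the previous state is kept (hysteresis). */
bool update_shunt(cell_unit *c, uint16_t cell_mv)
{
    assert(c != NULL);
    if (!c->shunt_on && cell_mv >= c->on_mv) {
        c->shunt_on = true;
    } else if (c->shunt_on && cell_mv <= c->off_mv) {
        c->shunt_on = false;
    }
    return c->shunt_on;
}
```

On real hardware the return value would drive the pin that switches the dissipator; that part depends on the processor and is left out here.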
This approach would be inherently modular, and you would need a 'master unit' that would listen to all the cells and perform the cutoff functions as well as display cell information. If there is one weak cell in a long string, it would be nice to know which one it is. Smart software could do this automatically. The master unit could also be the interface to calibrate or change the set points for the cell units. You might want a display like a CA with some buttons, but this would add to the cost. If you had a display, it could have a bar graph 'fuel gauge' that automatically self-calibrates, as well as individual cell measurements. Alternatively, you could have a USB interface and plug it into a computer to display cell information and reprogram set points.
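As a rough idea of what the master unit's software would do with the readings, here is a small C sketch. It assumes the master already has each cell's voltage in millivolts in an array. The function names and the idea of passing the limits as parameters are mine, purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the lowest-voltage cell (the likely weak one).
 * mv[] holds one reading per cell, in millivolts. */
size_t weakest_cell(const uint16_t *mv, size_t n)
{
    assert(mv != NULL && n > 0);
    size_t idx = 0;
    for (size_t i = 1; i < n; i++) {
        if (mv[i] < mv[idx]) {
            idx = i;
        }
    }
    return idx;
}

/* Return 1 if any cell is outside [lo_mv, hi_mv], meaning the master
 * should cut off charge or discharge; return 0 otherwise. */
int should_cut_off(const uint16_t *mv, size_t n,
                   uint16_t lo_mv, uint16_t hi_mv)
{
    assert(mv != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (mv[i] < lo_mv || mv[i] > hi_mv) {
            return 1;
        }
    }
    return 0;
}
```

The limits would be the same set points the master can reprogram, so nothing here is tied to a particular cell chemistry.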
The part I hadn't decided on was the best way to have them communicate. They are all sitting at different voltages, so either you need an optocoupler on each one or you use a form of communication that runs at a high enough frequency that you can capacitively couple them. I don't know the maximum number of devices (cells) you can run on a single bus. It seems like data collisions would happen if there were too many. I suppose the master unit could poll each cell in sequence to keep things straight.
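The polling idea could look something like this in C. Only one cell talks at a time because the master asks each address in turn, so collisions shouldn't come up. Everything specific here is an assumption on my part: the 4-byte reply layout, the XOR checksum, one-byte addresses (so at most 256 cells), and the `fake_transact` function, which just stands in for the real isolated bus so the logic can be tried on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLY_LEN 4 /* assumed layout: address, mV high, mV low, checksum */

/* Sends a request to one cell and fills in its reply; returns 0 on
 * success, -1 if the cell did not answer. Hypothetical bus interface. */
typedef int (*transact_fn)(uint8_t addr, uint8_t reply[REPLY_LEN]);

static uint8_t xor_checksum(const uint8_t *b, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++) sum ^= b[i];
    return sum;
}

/* Check address and checksum, then extract the voltage in millivolts. */
int parse_reply(uint8_t addr, const uint8_t reply[REPLY_LEN], uint16_t *mv)
{
    if (reply[0] != addr) return -1;
    if (xor_checksum(reply, REPLY_LEN - 1) != reply[REPLY_LEN - 1]) return -1;
    *mv = (uint16_t)((reply[1] << 8) | reply[2]);
    return 0;
}

/* Poll cells 0..n_cells-1 in order. ok_out[i] is 1 if cell i gave a
 * valid reading (stored in mv_out[i]). Returns the number of good cells. */
size_t poll_cells(transact_fn transact, size_t n_cells,
                  uint16_t *mv_out, uint8_t *ok_out)
{
    assert(transact != NULL && n_cells <= 256);
    size_t good = 0;
    for (size_t i = 0; i < n_cells; i++) {
        uint8_t reply[REPLY_LEN];
        uint16_t mv = 0;
        ok_out[i] = 0;
        if (transact((uint8_t)i, reply) == 0 &&
            parse_reply((uint8_t)i, reply, &mv) == 0) {
            mv_out[i] = mv;
            ok_out[i] = 1;
            good++;
        }
    }
    return good;
}

/* Stand-in for real hardware: cell N reports 3300 + 10*N mV,
 * and cell 2 never answers (simulating a dead or disconnected unit). */
int fake_transact(uint8_t addr, uint8_t reply[REPLY_LEN])
{
    if (addr == 2) return -1;
    uint16_t mv = (uint16_t)(3300 + 10 * addr);
    reply[0] = addr;
    reply[1] = (uint8_t)(mv >> 8);
    reply[2] = (uint8_t)(mv & 0xFF);
    reply[3] = xor_checksum(reply, REPLY_LEN - 1);
    return 0;
}
```

Calling `poll_cells(fake_transact, 4, mv, ok)` with two 4-element arrays fills in the readings and flags cell 2 as missing, which is also how the master would notice a unit that has dropped off the bus.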
Cell units could be built for any number of cells and daisy chained with just the data bus lines.
You could build them stackable with connectors so you just snap them together to make any size BMS. When the battery was not charging or discharging, the cell units would go to sleep to conserve power.
The design would be scalable by using larger, more powerful dissipators. Everything else would be the same.
The hardware layout is pretty straightforward. The problem is software. I know what I want it to do (sort of), but don't have a clue how to program a PIC. Overall, this approach will likely be more expensive than an analog design in terms of parts, but may be easier to produce.