BMS Failure Mode Data Collection Thread

I have had the following failure modes in my battery BMS:

  • Internal Current Limiter (FETs) let out magic smoke
    Votes: 2 (9.5%)
  • Cell tap wires fatigued/broke
    Votes: 5 (23.8%)
  • Cell tap connector failure (wear, crimp failure, or went resistive)
    Votes: 3 (14.3%)
  • Power lead connector failure (wear, crimp failure, or went resistive)
    Votes: 2 (9.5%)
  • BMS failed to catch undervoltage condition on one or more cells
    Votes: 3 (14.3%)
  • BMS failed and overvoltaged one or more cells
    Votes: 2 (9.5%)
  • BMS discharged and ruined pack in storage
    Votes: 8 (38.1%)
  • Internal short in high current leads on or near PCB
    Votes: 1 (4.8%)
  • Internal short/open circuit on PCB
    Votes: 3 (14.3%)
  • Water/moisture damage to PCB/Components
    Votes: 1 (4.8%)
  • Component failure on PCB (note in post)
    Votes: 8 (38.1%)
  • BMS interface failure with charger
    Votes: 1 (4.8%)
  • BMS interface failure with controller
    Votes: 0 (0.0%)
  • Internal fuse failed/blew
    Votes: 0 (0.0%)
  • It just quit working and I don't know why
    Votes: 4 (19.0%)
  • Other (explain in post)
    Votes: 1 (4.8%)

Total voters: 21

bigmoose (1 MW, joined Aug 6, 2009, 2,675 messages, North coast USA)
Alan B created this thread (http://endless-sphere.com/forums/viewtopic.php?f=14&t=22378) to design more reliable Cell Monitoring Electronics. I recommended that we gather failure mode data so that he can be sure to design out the known failure modes. Jeremy added a good bit of meat to the bones by recommending a BMS FMECA (Failure Mode, Effects and Criticality Analysis).

So, on Alan's recommendation, I am starting this thread so you can vote for the failure modes that you have seen in BMSs. If yours is not listed, add a post and I will try to edit the poll. Feel free to comment in posts.

The intent is to get a statistical picture from experience on what has failed in BMSs in the field.

For the poll to be valid, please only vote for items that you have directly experienced.
 
I used a voltblocher, which is a shunt/bleeder type balancing BMS. The transistor failed, which caused it to short internally and short the entire cell. I have been without a BMS on my car for over a year and 7,300 miles. While I believe it would be better to have one, it needs to be cost effective and reliable. There just hasn't been enough time, in my opinion, for a BMS manufacturer to develop a solid reputation. In the meantime, high C rate LiFePO4 batteries have proven that they actually don't need a lot of help. Occasional balancing is all that's really necessary.
 
jondoh, be sure to vote for: Component failure on PCB (note in post)
so the poll will catch your statistics, with the specific component (transistor) noted in your post.
 
One of the problems with this format is that people will not know the cause but will assume that they do, and will ascribe a failure mode without the normal course of evidence.

So maybe there could be a special code to distinguish between what people guess and the cases where people repaired it and could show that their repair fixed the problem.

This will help those of us who are interested in these failure modes, since I don't usually believe the arguments people make without some kinda evidence. That is hard to get.

There is also the question of how to distinguish a BMS problem from a problem with the cells the BMS controls, such as an internal short or an open in a cell creating the problem initially.

Getting people to admit they damaged it by accident is hard also, but that would help a lot.
 
bigmoose said:
The intent is to get a statistical picture from experience on what has failed in BMS's in the field

I have already had to replace several BMSs, some due to the same failure. I would like to submit multiple votes. How should I do that?
 
Rolf, you may have to put the extra data in a post. Unless we can get another member to vote your extras... but that will take a lot of coordinating.

I just got a heads up via PM that if I edit my first post I will erase all the votes in the poll!

I inadvertently already did that once... so I can't be editing the original. Perhaps I can keep some type of running tally of the "extras" in the next post I make in the thread. I did set it to allow 10 votes per member. Enter a vote, tell it to save, then try the poll again and see if it will let you vote for one category more than once. I also set it to allow you to "edit" your vote.

I am open to ideas from folks that know this software better.
 
It might be that people describing their failure in a posting would work better than the voting (though the voting doesn't hurt). Then, by reading the postings carefully, various different tallies could be made. Those might be summarized periodically in new postings. A detailed description of what happened that sticks with the facts, plus any measurements or additional information, might be better than just conclusions.
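For anyone who wants to keep such a tally, here is a minimal sketch in Python of a report record that carries the "evidence code" suggested above. Everything in it is hypothetical: the names `Evidence`, `FailureReport` and `tally`, the three evidence levels, and the way the example reports are classified are just one reading of the posts, not something the thread agreed on.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class Evidence(IntEnum):
    """Hypothetical evidence levels, weakest to strongest."""
    GUESS = 1      # poster's best guess, nothing measured or repaired
    MEASURED = 2   # backed by measurements
    CONFIRMED = 3  # a repair or swap fixed the problem


@dataclass
class FailureReport:
    source: str        # where the report came from (illustrative label)
    failure_mode: str  # a poll option, or free text
    evidence: Evidence
    notes: str = ""


def tally(reports, minimum=Evidence.GUESS):
    """Count reports per failure mode, skipping any below `minimum` evidence."""
    return Counter(r.failure_mode for r in reports if r.evidence >= minimum)


# Illustrative entries only; the evidence levels are an assumption.
EXAMPLE_REPORTS = [
    FailureReport("shunt balancer post", "Component failure on PCB",
                  Evidence.GUESS, "shunt transistor shorted"),
    FailureReport("optocoupler post", "Component failure on PCB",
                  Evidence.CONFIRMED, "replacing 5 optocouplers fixed it"),
]

print(tally(EXAMPLE_REPORTS))
print(tally(EXAMPLE_REPORTS, minimum=Evidence.CONFIRMED))
```

Raising `minimum` gives a tally of only the reports where a repair showed what the cause was, which is the distinction asked for earlier in the thread.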
 
My BMS was continuously discharging a cell group in my Ping 48 V 15 Ah battery. Ping sent me a replacement BMS and it is working fine now. After receiving my new BMS and replacing the cell group, I got the battery in balance and then connected the old BMS. By morning the cell group read 3.2 volts whereas the others read 3.5 volts. I sent the old BMS to Ping and requested that he let me know exactly what failed. When I find out I'll post what he says...
 
This thread is a brilliant idea. Well done, guys.

Here's a failure I had, but at the moment I don't see a box to tick for it. I built a cell level LVC detector with the standard arrangement of a threshold detector and an optocoupler to signal low voltage. The problem is that this will signal when the cell goes below 2.2 V (or whatever you choose), but when the cell goes below about 1.5 V it can't operate the optocoupler. So it really only signals while the voltage is in the 2.2 to 1.5 V window.

If you don't use the pack often enough, a cell could pass through this window between uses. You don't get an LVC indication even though the cell is way below LVC. It's not just a theoretical observation. This happened to me with 2 cells in a 24 cell pack. I can't say whether the cells could have been saved by a better design, but it's definitely a weakness in this design.
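To make the window concrete, here is a small Python model of that behaviour. It is only a sketch: the 2.2 V and 1.5 V figures are the ones from this post, the function names are made up, and the real dropout voltage will depend on the detector and optocoupler used.

```python
LVC_THRESHOLD_V = 2.2  # detector trips below this (figure from the post)
OPTO_DROPOUT_V = 1.5   # below roughly this the cell can't drive the optocoupler


def cell_is_low(cell_v):
    """Ground truth: the cell is below the LVC threshold."""
    return cell_v < LVC_THRESHOLD_V


def lvc_signal(cell_v):
    """What the cell-powered detector reports: it only asserts LVC
    while the cell still has enough voltage to light the opto LED."""
    return OPTO_DROPOUT_V <= cell_v < LVC_THRESHOLD_V


def missed_low(cell_v):
    """True when the cell is low but no LVC signal is produced."""
    return cell_is_low(cell_v) and not lvc_signal(cell_v)


for v in (3.3, 2.0, 1.2):
    print(f"{v:.1f} V  low={cell_is_low(v)}  signal={lvc_signal(v)}  "
          f"missed={missed_low(v)}")
```

A cell at 1.2 V comes out as low with no signal, which is the case that cost the 2 cells. A signal that has to be actively held "OK" by a healthy cell, so that silence reads as a fault, would not have this blind spot, but that is a design suggestion and not something tested here.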

Another issue I had was to do with the balance tap wires. I don't count it as a failure mode because I caught it before putting it into use, so it's just a warning. I decided to use wires with high temperature, double sheathed insulation simply because of the potential for disaster if these wires got damaged or hot. The problem comes, however, when you fit the double barrel crimp terminals used in connectors such as the JST XH or Molex Micro-Fit 3.0.

These terminals have two crimps. One that goes on the conductors and one that goes over the insulation for strain relief. With PVC or silicone insulation you crimp it hard enough to distort the sheath. But I found that if you do that with the harder compound sheath, it crushes the wire and it is weakened enough to break.

So, the warning is that the crimp technique is important with some of these high spec wires. You can't just look up the wire diameter and choose the crimp tool accordingly.

Nick
 
This is a personal bugbear of mine: an LVC that stays stuck on and self discharges the cell down to the VF of the optocoupler. I had one that did this. It would have been nice to have a loud beeper, so that when the pack was put into storage it would have chirped or beeped or done something useful other than keep dragging the cells down.

I have had shunt transistors short and drag cells down to 0 V, twice. Dropping the e-brake line is nice, but disconnecting the load (as commercial BMSs do) would be ideal. I built one recently for LiCo laptop cells for a Xenon worklamp. It monitors HVC and LVC and drops the charger or the load if stuff happens that shouldn't. The LVC circuit has a narrower window (series LED with the opto) so that the cells don't get dragged down to a terminal (deadly) voltage, and the shunt has a series zener so even if it does short, it won't kill the cells either. It doesn't run a high balancing current, so this is not a major issue.

In short: the BMS should NOT kill cells, either by overcharging or overdischarging, even in the case of component failure.
 
As of this morning it looks like 23% of our failure modes are on the battery side of the BMS with cell tap wires/connectors. Perhaps the JST connectors need to go, and aerospace strain relief practices implemented on the battery wiring loom.

Interesting that one in four of the failure modes of the "BMS/Battery System" are not attributable to the BMS itself, but are on the battery side of the interface!
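A note on reading the poll numbers: members can tick more than one option, so the percentages shown are per voter and add up to well over 100%. The short Python sketch below recomputes them from the counts in the poll at the top of the thread. That snapshot is not necessarily the one quoted here, so the result need not match the 23% figure exactly, and the choice of which options count as "battery side" is an assumption.

```python
VOTERS = 21  # "Total voters" shown in the poll

# Vote counts copied from the poll in the first post.
votes = {
    "Internal Current Limiter (FETs) let out magic smoke": 2,
    "Cell tap wires fatigued/broke": 5,
    "Cell tap connector failure": 3,
    "Power lead connector failure": 2,
    "BMS failed to catch undervoltage": 3,
    "BMS failed and overvoltaged cells": 2,
    "BMS discharged and ruined pack in storage": 8,
    "Internal short in high current leads": 1,
    "Internal short/open circuit on PCB": 3,
    "Water/moisture damage": 1,
    "Component failure on PCB": 8,
    "BMS interface failure with charger": 1,
    "BMS interface failure with controller": 0,
    "Internal fuse failed/blew": 0,
    "It just quit working": 4,
    "Other": 1,
}

# Assumption: "battery side" means the cell tap wiring and its connector.
BATTERY_SIDE = ["Cell tap wires fatigued/broke", "Cell tap connector failure"]

total_votes = sum(votes.values())


def pct_of_voters(option):
    """Percentage of voters who ticked this option (what the poll displays)."""
    return 100.0 * votes[option] / VOTERS


def share_of_votes(options):
    """Percentage of all votes cast that went to the given options."""
    return 100.0 * sum(votes[o] for o in options) / total_votes


print(f"total votes cast: {total_votes} from {VOTERS} voters")
print(f"tap wires, per voter: {pct_of_voters(BATTERY_SIDE[0]):.1f}%")
print(f"battery side, share of all votes: {share_of_votes(BATTERY_SIDE):.1f}%")
```

Which base you use matters: "percent of voters who saw this" and "percent of all reported failures" are different questions, so it is worth saying which one a quoted figure refers to.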
 
Bigmoose,

With you on task I believe we'll finally get a BMS that I can trust enough to use. I've resisted for 2 years because I can't get myself to use something to protect my batteries that has a reasonable likelihood of killing them instead. It's perfect timing too, because I'm soon to be the proud papa of over 4 kWh of A123s that I'd like to protect. Once you solve the issues can we rename them BMMSs, for Big Moose Management System? :mrgreen:

I'll add my only known failure mode, which some may not consider a failure, but I do since it killed cells in 2 idle packs: the BMS drawing a very slight current from one cell when not in use. I checked the overall pack voltage occasionally and the packs looked fine, but over 18 months a cell in each pack was killed directly by the BMS, while the other cells dropped less than 0.2 V over that period.

John
 
bigmoose said:
Perhaps the JST connectors need to go, and aerospace strain relief practices implemented on the battery wiring loom.

Having worked for JST once, I'll have to say that the XH connector (used for many of the balancing connectors) was never meant for more than 30 mate/unmate cycles. It was strictly designed for internal connections in the consumer electronics industry: televisions and VCRs. Most of the XH connectors out there are actually cheap Chinese clones not made by JST. If I were to specify a connector, I would recommend the SM series with gold plated contacts. This connector should be good for at least a few hundred cycles.
 
Little successes thrill us simple folks! I just learned how to multi quote! Whooppee!

jondoh said:
Having worked for JST ... the XH connector ... was never meant for more than 30 mate/unmate cycles. ... I would recommend the SM series with gold plated contacts.

^^^ The 30 cycle limit needs to be chiseled in stone! Great advice jondoh! Thank you for posting that info!

John in CR said:
Bigmoose, With you on task I believe we'll finally get a BMS that I can trust enough to use.

Thanks for the confidence John. But this was an effort by Jeremy and me to get data in front of us all on where the actual failures are, in the field and in use. It is also to help Alan B in the design of his BMS. His thread is here: http://endless-sphere.com/forums/viewtopic.php?f=14&t=22378

I also liked CamLight's recommendation of using the TI line, particularly TI's gas gauges bq78PL114 and bq76PL102. I have not had direct personal experience with those chips, but I am starting to read up on them because of CamLight's recommendation. (I have a LOT of respect for his tips, advice and observations!) TI is pretty good at these things, and one feature that caught my eye is that the above mentioned chip line is smart enough to go into standby mode after announcing the LVC and drop the chip power consumption to a couple of microamps. That feature would design in the solution to another noted failure mode in our poll.

Also methods has had success developing a BMS around the LTC6802-2 chipset. His thread is here: http://endless-sphere.com/forums/viewtopic.php?f=14&t=6602&hilit=bms&start=255

GGoodrum has been the spearhead of BMS development, I believe this is his latest thread: http://endless-sphere.com/forums/viewtopic.php?f=14&t=17168

I am "noodling" on a 2 cell BMS for a product retrofit... but I am not there yet.

So, hoping that this data might help all BMS designers on the sphere!
 
My vote = Component failure on PCB

BMS = Fechter/Goodrum 16S LiFePO4 v.2.2 HVC/LVC BMS setup with 15.4 ohm, 2.5 W rated shunt resistors (~230 mA shunt current)

In my case, the failure was related to excess HEAT within my BMS enclosure during charge balancing (it was uncomfortable to touch the enclosure for more than a few seconds!). This damaged 5 of the LVC optocouplers paralleled onto the e-brake line. The phototransistors in these optocouplers did not fail cleanly as an open or short circuit. Instead, each of the 5 damaged phototransistors showed a varying degree of low impedance between collector and emitter, regardless of input signal, and the resulting leakage current dragged the e-brake line down far enough to turn the controller off even when LVC hadn't actually been detected. The damage was measurable by wiring one cell at a time into its appropriate BMS channel, which revealed a different resistance across the e-brake wire pair for each damaged optocoupler tested. With all connected cells at high SOC, this leakage current was apparent regardless of the >2.1 V logic state of each TC54 voltage detector feeding the optocoupler's LED.

Replacing all 5 damaged optocouplers fixed the problem, and cutting a large vent hole in my BMS enclosure to let heat escape stopped it recurring. However ::sigh:: it is no longer a water resistant enclosure :cry:

Moral of the story = Despite Gary indicating that these older design BMSs populated with the lower spec ~250 mA shunts can be operated without ventilation, it is not a reliable recipe.
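For a rough idea of how much heat is involved, the shunt figures quoted above (15.4 ohm, ~230 mA, 16 channels) can be run through P = I²R. The Python below is a back-of-envelope sketch only. It assumes every channel shunts at the same time, which is a worst case and not something measured, and it ignores heat from the rest of the board; the function name is made up.

```python
SHUNT_OHMS = 15.4    # from the post
SHUNT_AMPS = 0.230   # approximate shunt current from the post
CHANNELS = 16        # 16S pack
RATED_W = 2.5        # resistor power rating from the post


def shunt_heat(ohms, amps, channels):
    """Return (volts across one shunt, watts per shunt, watts for all shunts).

    Assumes all channels shunt at once, which is a worst-case guess.
    """
    volts = amps * ohms
    per_channel_w = amps ** 2 * ohms
    return volts, per_channel_w, per_channel_w * channels


volts, per_w, total_w = shunt_heat(SHUNT_OHMS, SHUNT_AMPS, CHANNELS)
print(f"voltage across each shunt: {volts:.2f} V")
print(f"per shunt: {per_w:.2f} W ({100 * per_w / RATED_W:.0f}% of rating)")
print(f"all {CHANNELS} shunts: {total_w:.1f} W")
```

Each resistor is well inside its 2.5 W rating, at roughly 0.8 W, but sixteen of them together put something like 13 W into a sealed box. That fits with the enclosure getting too hot to touch even though no single part was overloaded.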
 
Having some experience with reliability testing, I can say that heat, and especially heat cycling, contributes a lot to electronic component failure. Electric bikes have thermal cycling, vibration and moisture. This is a tough environment.
 
I just wanted to thank folks for contributing to this thread. I get the most out of the detailed descriptions of the situation, the report of what happened, and any details that are available.

There are a variety of different things we might do to improve on the reliability of BMS systems, and understanding the failures in some detail is really helpful in evaluating different solutions.
 