Torvalds Blows Stack Over Buggy New Kernel
Oct 7, 2016 2:08 PM PT
Linux creator Linus Torvalds this week apologized for including in the just-released Linux 4.8 kernel a bug fix that crashed it.
"I'm really sorry I applied that last series from Andrew just before doing the 4.8 release, because they cause problems, and now it is in 4.8 (and that buggy crap is marked for stable too)," he wrote in a message to the Linux kernel mailing list. "In particular, I just got this -- kernel BUG at ./include/linux/swap.h:276 -- and the end result was a dead kernel."
The bug the developer was trying to fix has existed since Linux 3.15, "but the fix is clearly worse than the bug ... since that original bug has never killed my machine," Torvalds wrote.
The message became increasingly acrimonious, as Torvalds displayed the temper for which he's notorious.
"I should have reacted to the damn added BUG_ON() lines. I suspect I will have to finally just remove the idiotic BUG_ON() concept once and for all, because there is NO F*CKING EXCUSE to knowingly kill the kernel. Why the hell was that not a warning?" he fumed.
Torvalds acknowledged he was "grumpy," adding that "this went in very late in the release candidates, and I had higher expectations of things coming in through Andrew."
The reference presumably was to Andrew Morton, one of the Linux kernel's lead developers.
"Adding random BUG_ON()s to code that clearly hasn't had sufficient testing is *not* acceptable, and it's definitely not acceptable to send that to me after rc8 unless it has gotten a *lot* of testing, which it clearly must not have had," Torvalds continued.
"I've ranted against people using BUG_ON() for debugging in the past. Why the f*ck does this still happen? And Andrew - please stop taking those kinds of patches! Lookie here:
so excuse me for being upset that people still do this sh*t almost 15 years later," Torvalds concluded.
The Bug Fix That Wasn't
It was the addition of the BUG_ON() line that killed the kernel.
BUG() and BUG_ON() are used as debugging help when something goes very wrong in the kernel.
The two are closely related: BUG() fires unconditionally, while BUG_ON(condition) is a wrapper that triggers BUG() only when the condition is true.
On x86, BUG() compiles to a deliberately invalid instruction, which leads the CPU to throw an invalid opcode exception.
When a BUG_ON() assertion fails, or the code takes a branch with BUG() in it, the kernel will print out the contents of the registers and a stack trace -- then the current process will die.
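To make the distinction behind Torvalds' complaint concrete, here is a minimal user-space sketch in C of a fatal assertion versus a warning. It is not the kernel's implementation: the real macros are architecture-specific, and every name here (SKETCH_BUG, SKETCH_BUG_ON, SKETCH_WARN_ON, sketch_release) is hypothetical and invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical user-space stand-ins, for illustration only.
 * The real kernel macros are architecture-specific and do more.
 */

/* Fatal: report the location, then stop execution outright. */
#define SKETCH_BUG() \
    do { \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
        abort(); \
    } while (0)

/* Conditional form: only fatal when the condition is true. */
#define SKETCH_BUG_ON(cond) \
    do { \
        if (cond) \
            SKETCH_BUG(); \
    } while (0)

/* Counts warnings so a caller can check that one was raised. */
static int sketch_warn_count;

/* Non-fatal: report the location, then let the caller carry on. */
static int sketch_warn_on(int cond, const char *file, int line)
{
    if (cond) {
        sketch_warn_count++;
        fprintf(stderr, "WARNING at %s:%d\n", file, line);
    }
    return cond;
}

/* Evaluates to the condition, so it can be used inside an if. */
#define SKETCH_WARN_ON(cond) sketch_warn_on(!!(cond), __FILE__, __LINE__)

/*
 * Hypothetical caller: a negative reference count is a bug, but the
 * function warns and returns an error instead of halting everything.
 */
static int sketch_release(int refcount)
{
    if (SKETCH_WARN_ON(refcount < 0))
        return -1;
    return 0;
}
```

Calling sketch_release(-1) prints a warning and returns -1, leaving the program running; replacing the check with SKETCH_BUG_ON(refcount < 0) would abort it instead. That is the trade-off Torvalds was objecting to when he asked why the new check was not a warning.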
"This type of situation, while rare, is common enough in smaller and less visible projects, where testing processes and protocol are typically less sophisticated than those used by Linus and his team," noted Al Hilwa, a research program director at IDC.
The Grinch Who Rules Linux
Linux has grown dramatically over the 25 years of its existence, but its creator apparently hasn't seen the need to alter his trademark style of communication with its core developers.
Software development "remains a highly detailed and error-prone process, especially at these lowest levels of systems. Kernel work is [the equivalent of] brain surgery, and it's not a surprise that it's still not foolproof," Hilwa told LinuxInsider.
"Almost all engineers get grumpy about things like this," he said. "It's just that this team operates in the open, which is generally a wonderful thing -- but we get to see real human emotions in action."