Fatal trap 12: page fault while in kernel mode


patrik
03-01-2007, 01:05 AM
Yesterday one of our servers crashed. I logged into the DRAC console and rebooted the machine. A few minutes later it crashed again, so I rebooted once more. Then it crashed again a few minutes later, but this time I managed to get a screenshot of the terminal.

This is what it said:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 06
fault virtual address = 0x4
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc05c6094
stack pointer = 0x28:0xea6b98cc
frame pointer = 0x28:0xea6b98d0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1269 (da-popb4smtp)
trap number = 12
panic: page fault
cpuid = 2
Uptime: 2m5s
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds - press a key on the console to abort

After the next reboot I immediately stopped the da-popb4smtp process, and since then it has run smoothly. I'm not 100% sure that da-popb4smtp has anything to do with this, though. It could be that the page fault simply doesn't occur anymore, and that da-popb4smtp just happened to be the current process when I got the screenshot.
I did update DirectAdmin to 1.292 (from 1.28, I think) earlier that day (14:57). The server first crashed somewhere around 15:50.

It is running FreeBSD 6.1-STABLE.
What could be the cause of the problem? Bad memory?
Is there any point in "defining a dump device"? And how is that done?
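
One way to get more out of a panic message like the one above, even without a dump, is to map the instruction pointer (0xc05c6094 here) to the kernel function that contains it. The following is a minimal sketch, assuming the kernel that panicked is still installed as /boot/kernel/kernel; nearest_symbol is a hypothetical helper name, not a FreeBSD tool.

```shell
#!/bin/sh
# nearest_symbol: hypothetical helper, not part of FreeBSD.
# Reads `nm -n` output (address, type, name; sorted by address) on stdin
# and prints the last symbol whose address is <= the given address.
nearest_symbol() {
    # $1: address as lowercase hex without the 0x prefix, e.g. c05c6094
    awk -v ip="$1" '
        # nm prints fixed-width lowercase hex addresses, so comparing them
        # as strings of equal length gives the same order as comparing numbers.
        NF >= 3 && length($1) == length(ip) && ($1 "") <= (ip "") { best = $3 }
        END { if (best != "") print best }
    '
}

# Example, to be run on the affected machine:
#   nm -n /boot/kernel/kernel | nearest_symbol c05c6094
```

The function name only narrows down where the fault happened; it does not say why.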

pucky
03-01-2007, 01:14 AM
Have you searched Google? http://www.google.com/search?hl=en&q=Fatal+trap+12%3A+page+fault+while+in+kernel+mode+freebsd

I know we had a problem with a box rebooting every 14 days at one stage, and then it mysteriously stopped. That was 85 days ago and it has never happened again. First I questioned the NOC about them rebooting the box, and they denied it. Then it happened about 3 more times at 10 and 20 day uptime intervals with no evidence as to why. But it has not happened in the last 85 days. Maybe an update along the way fixed a bug or something more serious, but it was impossible to find. FreeBSD 6.1 here also. Thankfully we have not seen the problem above.

Your problem, on the other hand, could be hardware related, I feel. It could be memory, and that's a place to start. It doesn't hurt to get your memory modules replaced and go from there. There's no point in trying to get them tested, as those results are sometimes completely unreliable.

patrik
03-01-2007, 01:40 AM
Yes, I have searched on Google, but I thought I would get better answers in a discussion here. It's easy to change memory modules, so I guess that could be a start.

jlasman
03-01-2007, 01:27 PM
We removed two 512 MB sticks from one of our servers and replaced them with two 1 GB sticks. It started failing within 24 hours of booting, every time.

We pulled those two sticks and installed a different set of two 1 GB sticks. It has worked for a week already.

:)

Jeff

BigWil
06-12-2007, 02:49 PM
Jeff,

Hey did your kernel panic ever come back after replacing those sticks?

We're still getting these kernel trap 12 panics. These machines, with the same sticks, ran fine back in the 4.10 days, but ever since going to 6.1 and 6.2 these panics happen and there isn't the slightest log entry as to the reason why.

I see many others out on the web having the same issues and not a word out of the FreeBSD groups and core as to why it could be happening or how to fix it.

BigWil

jlasman
06-12-2007, 08:41 PM
Nope on our (CentOS) servers it was the memory.

I don't know enough about FreeBSD to even guess.

May I ask who your avatar is? (Mine is me ... no really it's Einstein but when I let my hair grow out it looks like me ;) .)

Jeff

BigWil
06-13-2007, 01:19 AM
Oh I don't remember. Went through a collection of avatars a long time ago and said "hey I used to be that hot before age caught up to me" and presto I have had it since.

BigWil

patrik
06-13-2007, 04:05 AM
We haven't replaced our memory sticks (yet). The system is working just fine without da-popb4smtp running.

BigWil
06-13-2007, 11:12 AM
I don't think, at least in our case, that it is popb4smtp because we stopped using that about 2 years ago. We use authentication only.

Back in the days of 4.10 we had never even heard of a Fatal trap 12. Heck, we never had a kernel panic at all back then. I remember the glorious uptime messages of 100 or more days. But now that we have gone to 6.x and we scan our own spam we are lucky on most machines if we see a week. The good news is that most of the machines dump and restart. Though we have one SuperMicro in particular that is stubborn and requires a manual power cycle. The guys at the datacenter know her by name at this point.

BigWil

patrik
06-13-2007, 03:56 PM
You're happy if your 6.2 servers run for a week? Heck, sounds like you're having some serious issues. We run several 6.x servers and we aren't experiencing many problems at all, actually. The panic I talked about in this thread is the first panic we've had, and that has been solved by now.

jlasman
06-17-2007, 11:10 AM
I remember the glorious uptime messages of 100 or more days.
And I remember uptime on BSD-OS (a commercial version of BSD), and also on early Slackware Linux distributions, of over a year, so it's all relative.

One of our CentOS servers today shows an uptime of 162 days. Another shows 113 days. Heck, my Linux desktop has an uptime of 43 days, though X crashed last week and I had to re-login.

Generally the problems requiring you reboot are memory usage related; often swapfile related. You may just need more memory.
But now that we have gone to 6.x and we scan our own spam we are lucky on most machines if we see a week.
If you're scanning all spam, and not using blocklists first to get rid of 70% of it, then yes, you're going to have a memory problem; remember that SpamAssassin is a Perl script and is not the best at memory management.
The good news is that most of the machines dump and restart.
And is the bad news that it has to do that ;) ?
Though we have one SuperMicro in particular that is stubborn and requires a manual power cycle. The guys at the datacenter know her by name at this point.
Scary to me. One of our routers is a SuperMicro system running one of the BSDs (sorry, don't know which one or which version) and it's only required a reboot once in three years; that time it failed because of a power outage. I'd look into the memory issue if I were you.

Jeff

BigWil
06-18-2007, 02:24 AM
Generally the problems requiring you reboot are memory usage related; often swapfile related. You may just need more memory.

Dual Xeon, 2GB RAM, 4GB SWAP. Not so sure that more is needed, but possibly better use of it could at least help avoid the problem. See next.

If you're scanning all spam, and not using blocklists first to get rid of 70% of it, then yes, you're going to have a memory problem; remember that SpamAssassin is a Perl script and is not the best at memory management.

Yes, this is my suspicion: either SpamAssassin or Clam, as the problem didn't occur on any of the machines until we were put in the situation where we needed to run them full time on the servers. We used to pass everything through Postini, which kept everything pretty clean before it got to the machines.

So which blocklists are you recommending? We already run RBL. Are you talking about the /etc/virtual blocklists specific to hosts? Do you have a list of well known spam relays? Care to share?

And is the bad news that it has to do that ;) ?

Yah that would be the bad news as well.

BigWil

elvandar
06-19-2007, 01:36 PM
Please refer to http://www.freebsd.org/doc/en/books/developers-handbook on how to obtain kernel crash information for the dump you mentioned, then send-pr the information to the freebsd-bugs team (http://www.freebsd.org/send-pr.html) with an abstract of the information obtained from the kernel dump and a location where we (the FreeBSD team) can download the dump if needed.

Only that way can you see what is going on; the information presented so far is of no use for an investigation (sorry to put it so bluntly).

Regards,
Remko
FreeBSD.org
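
As a rough illustration of the first step, defining a dump device so that a panic leaves something to analyse, here is a minimal /etc/rc.conf sketch. It assumes a release whose rc.conf(5) accepts dumpdev="AUTO" and a swap partition large enough to hold the dump; on releases without AUTO, the swap device has to be named explicitly instead.

```shell
# /etc/rc.conf (sketch; check rc.conf(5) on the installed release)
dumpdev="AUTO"          # dump to a configured swap device on panic
dumpdir="/var/crash"    # directory where savecore(8) saves the dump at next boot
```

With this in place the console should no longer report "Cannot dump. No dump device defined.", and the saved dump can be examined as described in the Developers' Handbook.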

jlasman
06-21-2007, 09:47 PM
BigWil, we find that if we run SpamBlocker on all domains (see /etc/virtual/use_rbl_domains) it cuts down enough on the email coming into the box so SpamAssassin works well with the rest.

Jeff

BigWil
06-22-2007, 01:11 AM
Jeff,

I've been doing that since day one and wouldn't have it any other way:
lrwxr-xr-x 1 root mail 7 May 3 2006 use_rbl_domains -> domains

However, we still get TONS of mail, and all of it that makes it through gets scanned for viruses and spam using clamd and SpamAssassin. Very heavy footprint, but I guess that is all that can be done at this point.

A lot of the traffic that I notice while I am watching is out of China, Korea, Argentina, Russia and Indonesia, and almost all of it is spam. RBL catches most and SA catches the rest, but at the cost of a lot of resources.

Well I got a cold one waiting for me.... I haven't had a break in days.

Cheers,

BigWil

jlasman
06-23-2007, 01:03 PM
BigWil,

Have you updated to SpamBlocker3? The way it calls ClamAV is supposed to use less resources than the methods posted here for earlier versions of the exim.conf file.

It's still beta, and it's not perfect. But it may work for you.

Jeff

BigWil
06-23-2007, 04:07 PM
Jeff,

We have been using SpamBlocker3 for a very long time now. Unless you added something that I am unaware of:

# uncomment to define AntiVirus scanner here:
av_scanner = clamd:/var/run/clamav/clamd

BigWil

jlasman
06-23-2007, 04:17 PM
No further answer today, then :( .

Jeff

BigWil
06-23-2007, 06:36 PM
Jeff,

Though I appreciate your help, I think the only answer is that the level of spam our domains receive is so high that it is kicking our mem and vmem butts because of some deficiency in FBSD 6.x. Not much we can do but reboot a machine once in a while.

Oh, and to answer elvandar, which I forgot to do until now.... The issue and dumps have been submitted to FBSD a couple of times now. Their only answer is that it has to be an issue with hardware. Hardware on 15 different machines, all of which worked perfectly with 4.x, and which somehow all coincidentally went bad when we made the 6.x upgrades. ;-)

You both enjoy your days..... especially you Jeff you need a break.

BigWil

jlasman
06-25-2007, 07:53 PM
I've decided I'm going to take a vacation this year, but I don't know when yet.

I need it ;) .

Jeff

BigWil
06-25-2007, 08:11 PM
Jeff,

I say that every year but it just never seems to happen.

BigWil

BigWil
09-21-2007, 06:15 PM
For those that were having these Kernel Panic Trap 12 errors on FreeBSD 6.1-6.2, could you please post the output of the following:

sysctl -a | grep nmbclusters

Better yet can those NOT having them on 6.1-6.2 and running Exim please post theirs as well.

I think I may have found a connection and can possibly defend Exim's innocence.

Thanks,

Big Wil

HH-Steve
09-22-2007, 11:29 PM
Here is my output from 6.1 with no problems:

kern.ipc.nmbclusters: 16576


Steve

BigWil
09-23-2007, 12:34 PM
Thanks. What I am noticing is that each of the kernel builds was built with nmbclusters set to 8192, an old setting from earlier fine-tuning for lower memory back in the 4.10 days. The two machines that had it set to 0 don't seem to have the problem. One with a much higher value, about twice yours, doesn't have the problem either.

I have set two of the 8192 machines to 0 in loader.conf. We should know in a few days whether this has fixed the issue. I was informed that setting nmbclusters to 0 is a workaround for a bug that seems to exist from 6.1 onward and probably won't be completely fixed until FBSD 7.

References:
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2006-06/msg00011.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2006-06/msg00621.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2006-06/msg00636.html


Let's hope the workaround does the trick, because I am not anxious for 7 yet.

BigWil
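
For anyone wanting to try the same workaround, this is a sketch of the loader.conf entry described above. The value 0 is the one reported in this thread, not a general recommendation, and since loader.conf is read at boot the change only takes effect after a reboot.

```conf
# /boot/loader.conf
kern.ipc.nmbclusters="0"
```

After rebooting, "sysctl kern.ipc.nmbclusters" shows the value in effect, and "netstat -m" shows how many mbuf clusters are actually in use.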

BigWil
09-23-2007, 12:36 PM
You're happy if your 6.2 servers run for a week? Heck, sounds like you're having some serious issues. We run several 6.x servers and we aren't experiencing many problems at all, actually. The panic I talked about in this thread is the first panic we've had, and that has been solved by now.

Hey Patrik what was your nmbclusters value?

Oh and by the way, if it was solved would you mind telling us what you did to solve it?

Big Wil

melker
01-26-2008, 12:27 PM
Hi,
I'm the new guy managing the servers Patrik used to manage.
This evening we had a similar Kernel Trap 12. da-popb4smtp is running on the server, and I will let it run for a couple of days to see if the server crashes again.

The nmbclusters value is 25600.

BigWil
01-27-2008, 01:49 AM
Well Hi newguy. What OS release are you running on that thing? We upgraded all of our kernels that were having the issue. Kernel version 6.2-RELEASE-p8 or greater will fix that bug. They finally found it after telling us for almost two years that we had bad memory in our machines. Graciously they did retract the statement with a jolly "Sorry".

Upgrade to 6.2-RELEASE-p8 and I think you will be more than pleased.

BigWil

melker
02-07-2008, 04:32 AM
Hi,

I'm running 6.2 with a kernel compiled in July 2007, so I guess it's not p8.
I have not had any more crashes yet. We are going for 6.3 now, so I guess there won't be any p8, but thanks anyway.