[RP-PPPoE] Very high CPU usage by the pppoe-server
Dardan Behluli
Dardan.Behluli at ipko.com
Mon Apr 4 09:12:27 EDT 2011
Hi,
You are right, 'gettimeofday' and NTP don't "go hand in hand", but CPU usage has definitely changed since I removed the NTP servers from the configuration. I cannot say that NTP caused the difference, only that there is one. I'm watching the situation closely and will keep you informed.
1) I ran the tcpdump and captured 1000 packets in about 7 minutes. I couldn't find anything suspicious in them;
2) I shared the strace output file with you on Dropbox;
3) Here is the output of ps auxwf:
[PPPOE08]# ps auxwf
USER    PID %CPU %MEM   VSZ  RSS TTY   STAT START   TIME COMMAND
root      1  0.0  0.0  1304  476 ?     S    Apr01   0:06 init [3]
root      2  0.0  0.0     0    0 ?     SW   Apr01   0:00 [keventd]
root      3  0.0  0.0     0    0 ?     SW   Apr01   0:19 [kapmd]
root      4  0.2  0.0     0    0 ?     RWN  Apr01  10:55 [ksoftirqd_CPU0]
root      5  0.0  0.0     0    0 ?     SW   Apr01   0:00 [kswapd]
root      6  0.0  0.0     0    0 ?     SW   Apr01   0:00 [bdflush]
root      7  0.0  0.0     0    0 ?     SW   Apr01   0:01 [kupdated]
root    116  0.0  0.0     0    0 ?     SW   Apr01   0:00 [loop0]
root    581  0.0  0.0  1356  560 ?     S    Apr01   0:24 syslogd -m 0
root    586  0.0  0.0  1296  432 ?     S    Apr01   0:00 klogd -x -c 1
ntp     621  0.0  0.0  1812 1804 ?     SL   Apr01   0:03 ntpd -U ntp
root    631  0.0  0.0  3432 1480 ?     S    Apr01   0:14 /usr/sbin/sshd
root  19896  0.0  0.0  6664 1968 ?     S    14:32   0:00 \_ /usr/sbin/sshd
root  19904  0.0  0.0  2040 1132 pts/0 S    14:32   0:00 \_ -bash
root  24605  0.0  0.0  3852 1984 pts/0 R    15:08   0:00 \_ ps auxwf
root    648  0.0  0.0  1984  800 ?     S    Apr01   0:00 xinetd -stayalive -reuse -pidfile /var/run/xinetd.pid
root    659  0.0  0.0  1340  564 ?     S    Apr01   0:00 crond
root    689 18.2  0.1  5812 2708 ?     S    Apr01 863:00 pppoe-server -k -u -r -s -I -C PPPOE08 -L 46.99.88.1 -R 46.99.88.2 -N
root    725  0.0  0.0  1848  900 ?     S    Apr01   0:06 \_ pppd plugin /usr/lib/plugins/rp-pppoe.so eth1 rp_pppoe_sess 2724:
.......
root  24568  0.2  0.0  1848  896 ?     S    15:07   0:00 \_ pppd plugin /usr/lib/plugins/rp-pppoe.so eth1 rp_pppoe_sess 1925:
root  24588  0.5  0.0  1848  896 ?     S    15:07   0:00 \_ pppd plugin /usr/lib/plugins/rp-pppoe.so eth1 rp_pppoe_sess 2650:
root    696  0.0  0.0  1272  372 ?     S    Apr01   0:03 /etc/session_check 192.168.10.121
root    718  0.0  0.0  1284  404 ttyS0 S    Apr01   0:00 /sbin/agetty -h ttyS0 9600 vt100
root    720  0.0  0.0  1276  376 tty2  S    Apr01   0:00 /sbin/mingetty tty2
root   6103  0.0  0.0  5232 2096 ?     S    Apr01   1:33 pppoe-server
root   6337  0.0  0.0  1276  376 tty1  S    Apr01   0:00 /sbin/mingetty tty1
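For anyone going through the strace file from 2), a rough Python sketch of how it could be summarized: it tallies calls per syscall name and estimates the overall call rate from the '-tt' timestamps. The function name summarize_strace and the sample lines are made up for illustration (they only imitate the shape of 'strace -tt' output), and the sketch assumes the capture does not cross midnight.

```python
import re
from collections import Counter

# Matches lines such as "689 15:07:01.693009 select(15, ...) = 1".
# The leading pid (plain number or "[pid N]") is optional.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"(\d\d):(\d\d):(\d\d)\.(\d+)\s+"
    r"(\w+)\("
)

def summarize_strace(lines):
    """Return (Counter of calls per syscall name, total calls per second)."""
    counts = Counter()
    first = last = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # signal lines, "<... resumed>" lines, blank lines
        hours, minutes, seconds, frac, name = m.groups()
        t = int(hours) * 3600 + int(minutes) * 60 + int(seconds) + float("0." + frac)
        if first is None:
            first = t
        last = t
        counts[name] += 1
    span = (last - first) if counts else 0.0
    rate = sum(counts.values()) / span if span > 0 else 0.0
    return counts, rate

# Hypothetical sample lines, not taken from the real capture.
SAMPLE = [
    "15:07:01.000000 gettimeofday({1301602017, 693009}, NULL) = 0",
    "15:07:01.100000 select(15, [5 7 9 10 11 13 14], NULL, NULL, {2, 147618}) = 1 (in [11], left {2, 150000})",
    "15:07:01.200000 gettimeofday({1301602017, 693813}, NULL) = 0",
    '15:07:01.300000 recv(11, "...", 1520, 0) = 1437',
    "15:07:01.500000 gettimeofday({1301602017, 693967}, NULL) = 0",
]

if __name__ == "__main__":
    counts, rate = summarize_strace(SAMPLE)
    for name, n in counts.most_common():
        print(f"{name:15s} {n}")
    print(f"approx. {rate:.1f} syscalls/s")
```

On the real file, something like summarize_strace(open(path)) would do; a high recv() rate would point at packet volume rather than at any single slow call.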
Thank you very much again,
Dardan
-----Original Message-----
From: rp-pppoe-bounces at lists.roaringpenguin.com [mailto:rp-pppoe-bounces at lists.roaringpenguin.com] On Behalf Of Insane Laughing Clown
Sent: Friday, April 01, 2011 5:51 PM
To: For users of RP-PPPoE client/server software
Subject: Re: [RP-PPPoE] Very high CPU usage by the pppoe-server
Hello,
You are more than likely confused - ntp is not 'gettimeofday'. The
'gettimeofday' entries in your trace are system calls that take almost
no time to complete and are irrelevant, so your statement about removing
'ntp' making a difference is moot.
1) Get a tcpdump of the control traffic so we can see how many times
the main loop of pppoe-server is being hit.
2) Do an strace like I showed you and give us some better snippets. I
would like to amend my earlier suggestion so that this should look like
this:
strace -xx -tt -f -p 3024 -vvv -s256
Where the '-p <serverpid>' is your pppoe-server process id.
I would pipe this out to a file, let it run for 60 seconds, and then
post the results somewhere since it's likely big (dropbox or such).
3) For added fun -
ps auxwf
and what do you get?
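
To see what kind of frames the recv() calls in the strace excerpt quoted below are actually returning, the escaped buffers can be decoded. This is only a sketch: the helper names are made up, the header layout is the standard Ethernet + PPPoE one from RFC 2516 (EtherType 0x8863 is discovery, 0x8864 is session), and SAMPLE is the first recv() buffer from that excerpt, truncated by strace exactly as posted.

```python
import struct

def decode_strace_bytes(escaped):
    """Turn an strace-escaped string (octal or \\xNN escapes) into raw bytes."""
    return escaped.encode("ascii").decode("unicode_escape").encode("latin-1")

def parse_pppoe_header(frame):
    """Split the Ethernet and PPPoE headers (RFC 2516 layout) off a frame."""
    dst, src, ethertype, vertype, code, session_id, length = struct.unpack(
        "!6s6sHBBHH", frame[:20]
    )
    return {
        "dst": ":".join(f"{b:02x}" for b in dst),
        "src": ":".join(f"{b:02x}" for b in src),
        "ethertype": ethertype,      # 0x8863 discovery, 0x8864 session
        "code": code,                # 0 means session data
        "session_id": session_id,
        "payload_len": length,       # bytes following the 6-byte PPPoE header
    }

# First recv() buffer from the quoted strace excerpt (only the first bytes
# were printed by strace, which is enough for the headers).
SAMPLE = r'\0000H\217\335\221\0\fn\265\4\4\210d\21\0\5 \5\211\0!E'

if __name__ == "__main__":
    info = parse_pppoe_header(decode_strace_bytes(SAMPLE))
    print(f"ethertype=0x{info['ethertype']:04x} "
          f"session=0x{info['session_id']:04x} "
          f"payload_len={info['payload_len']}")
```

For that first buffer this gives EtherType 0x8864 with a payload length of 1417, which plus the 14-byte Ethernet and 6-byte PPPoE headers matches the 1437 returned by recv(). The same helper also accepts the \xNN form that '-xx' produces.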
-ILC
On 04/01/2011 01:44 AM, Dardan Behluli wrote:
> Hi,
> Thank you for your efforts. I got this output when I did the strace -f -p:
>
> gettimeofday({1301602017, 693009}, NULL) = 0
> select(15, [5 7 9 10 11 13 14], NULL, NULL, {2, 147618}) = 1 (in [11], left {2, 150000})
> gettimeofday({1301602017, 693813}, NULL) = 0
> recv(11, "\0000H\217\335\221\0\fn\265\4\4\210d\21\0\5 \5\211\0!E"..., 1520, 0) = 1437
> gettimeofday({1301602017, 693967}, NULL) = 0
> select(15, [5 7 9 10 11 13 14], NULL, NULL, {2, 146660}) = 1 (in [11], left {2, 150000})
> gettimeofday({1301602017, 694176}, NULL) = 0
> recv(11, "\0000H\217\335\221\0\37\341\212.<\210d\21\0\7H\5\230\0"..., 1520, 0) = 1452
> gettimeofday({1301602017, 694995}, NULL) = 0
> select(15, [5 7 9 10 11 13 14], NULL, NULL, {2, 145632}) = 1 (in [11], left {2, 150000})
> gettimeofday({1301602017, 695253}, NULL) = 0
> recv(11, "\0000H\217\335\221\0\34#%\220}\210d\21\0\3/\0w\0!E\0\0"..., 1520, 0) = 139
> gettimeofday({1301602017, 695481}, NULL) = 0
> select(15, [5 7 9 10 11 13 14], NULL, NULL, {2, 145146}) = 1 (in [11], left {2, 150000})
> gettimeofday({1301602017, 695882}, NULL) = 0
> recv(11, "\0000H\217\335\221\0\24\"$V\371\210d\21\0\0/\0*\0!E\0\0"..., 1520, 0) = 62
>
> I deleted the NTP servers from the configuration and the CPU utilization dropped drastically. At the moment there are 1000 users online on that server and the CPU is 58% idle. I'll keep you updated on how it goes today.
> Thanks again,
> Dardan
>
> -----Original Message-----
> From: rp-pppoe-bounces at lists.roaringpenguin.com [mailto:rp-pppoe-bounces at lists.roaringpenguin.com] On Behalf Of Insane Laughing Clown
> Sent: Thursday, March 31, 2011 4:58 PM
> To: For users of RP-PPPoE client/server software
> Subject: Re: [RP-PPPoE] Very high CPU usage by the pppoe-server
>
> On 03/31/2011 02:31 AM, Dardan Behluli wrote:
>> Hi,
>>
>> One of our several PPPoE servers has very high CPU usage. This basically occurs when the number of users terminated on it reaches 1300 - 1500. With more than 1500 users connected the server is practically inaccessible. When we run top we see that the pppoe-server process uses most of the CPU. The parameters of the pppoe-server are: pppoe-server -k -u -r -s -I -C Name -L local_IP -R remote_IP -N 3000.
>>
>> The difference in configuration between this and the other PPPoE servers is that this one is not doing NAT; RADIUS gives public IP addresses to the clients connected to this NAS.
>>
>> We tried MSS clamping, synchronous PPP, etc., but it didn't help much.
>>
>> We would very much appreciate any insight on this issue.
>>
>>
>
> Hi,
>
> As David noted in this thread, I would also recommend an 'strace' on
> the pppoe-server process to see what it's doing, since that is the
> process that you say is the top consumer of CPU ('strace -f -p<pid of
> server process>'). Really, pppoe-server does practically nothing - it
> sits idly in a select() (using no CPU at all) waiting for PPPoE session
> requests, quickly kicks off a new pppd in response to each one, and then
> goes idle again waiting for the next one.
>
> Off the top of my head, without an strace, I'd hazard a guess at a DoS
> of some kind - some bad client flooding you with PPPoE PADI or PADR
> packets, perhaps. Have you done a tcpdump on the Ethernet interface
> feeding you PPPoE traffic? ('tcpdump -lni<interface> ether proto 0x8863').
> This will show you all of the control messages that pppoe-server has to
> concern itself with.
>
> Please follow up on the list and let us know what your results are.
>
> -ILC
>
> _______________________________________________
> RP-PPPoE mailing list
> RP-PPPoE at lists.roaringpenguin.com
> http://lists.roaringpenguin.com/cgi-bin/mailman/listinfo/rp-pppoe