Screenshots
I’m still figuring out exactly what all these parameters tell me, so take this section with a grain of salt.
This is the output of cat /proc/user_beancounters.
- The two blue lines, privvmpages and physpages, are the amount of currently allocated memory and actual used memory, respectively.
- The two orange lines, vmguarpages and oomguarpages, tell you how much memory can be allocated (“guaranteed available”).
- The purple line shows othersockbuf, which is memory used for interprocess communication between (among other things) mysqld and other programs (like WordPress). I was getting a lot of failcnt hits here because of a bad table in WordPress, as I mentioned above.
The columns:
- held is the current amount of that resource, in bytes or in pages (4K chunks), in use at the time /proc/user_beancounters was dumped; it changes from moment to moment. maxheld, I think, is the “highest” amount requested in recent history.
- barrier is the maximum amount of that resource that your VPS instance can request, unless there are additional shared resources available.
- limit is the absolute maximum that can be requested.
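Incidentally, the failcnt column at the far right counts how many times a request for that resource has been refused. Here’s a rough one-liner for pulling out just the rows where something has actually failed (it assumes failcnt is the last column, which is the usual layout):

awk 'NR > 2 && $NF > 0' /proc/user_beancounters

If that prints nothing, nothing has hit a limit since the counters were last reset.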
Now let’s look at the top command output.
The highlighted green areas show memory and CPU information. The mem total line shows the total available, which corresponds to privvmpages. The mem used line corresponds to physpages.
If you look in the display down a bit, you’ll see VIRT and RES, which shows you how much memory is being used by each running process, also in terms of allocation (VIRT) and actual use (RES). This isn’t an exact count, though; if you sum up the numbers you’ll get more than mem total and mem used. That’s because a lot of code is shared and used by more than one process.
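If you want to see that overcount for yourself, a rough check is to add up the resident sizes that ps reports and compare the total against mem used:

ps -eo rss= | awk '{ sum += $1 } END { printf "%.1f MB resident\n", sum / 1024 }'

The total usually comes out higher than mem used precisely because shared libraries are counted once per process.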
One thing you’ll want to check is that you’re not seeing too many running processes competing for memory. For example, count up the number of httpd (web server) processes and smtpd (mail) processes. You’ll also want to see which processes might be hogging the CPU by looking at the %CPU column; as top updates, the most hoggy processes will be shown at the top by default unless you have changed the sort order, which I never remember how to do. If you see something taking 100% CPU for an extended period of time, that’s a good sign something isn’t right.
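A couple of quick ways to get those counts without squinting at top (rough sketches; substitute whatever your process names actually are, e.g. apache2 instead of httpd):

pgrep -c httpd   # number of web server processes
pgrep -c smtpd   # number of mail processes

As for the sort order: in most versions of top, pressing Shift+P sorts by CPU and Shift+M sorts by memory.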
So my VPS is supposed to have 256MB of memory allocated to it. So WHY THEN does mem total exceed that and say I have 689MB? That’s because mem total includes swap disk space, though the swap disk doesn’t show up in the VPS directly. To ensure that I don’t go to swap, which is slower, I make sure that mem used doesn’t exceed 256MB. The whole beginning of this article describes what I did to balance the resources.
Finally, the load average at the upper left (in orange) shows some nice low numbers. A range between 0 and 1 is pretty lightly loaded, 1-3 is moderately loaded, and above 3 you’re starting to see some strain.
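Those three numbers are the averages over the last 1, 5, and 15 minutes; if you just want a quick glance at them without running top, the uptime command prints the same figures:

uptime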
Don’t like using the shell? Here’s what it looks like in Plesk 8.1:
Here’s what mine looks like when I click Server (left panel) then Statistics (first line, under System). This is just the first line:
You should recognize the mem total and mem used values here, which also correspond to privvmpages and physpages. Remember that a “page” is 4K.
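So to turn any of those page counts into megabytes, multiply by 4096 and divide by 1048576; 65536 pages, for example, works out to 256MB. Here’s a rough one-liner for the two values in question (it assumes the resource name is in the first column of its row and held in the second):

awk '/privvmpages|physpages/ { printf "%-12s %8.1f MB held\n", $1, $2 * 4096 / 1048576 }' /proc/user_beancounters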
I actually haven’t seen the Shared and Buffer values go up… they may have done so before I constrained my server configuration. I would rather stay within my allocation, though, and try to achieve a level of consistent performance.
There’s no swap disk in use here. It’s a virtual server, so there’s no actual swap disk. The Virtuozzo system takes care of this behind the scenes (I think).
Now, if you go to the Virtuozzo panel, then Resources -> Extended, you’ll see something like this below:
These correspond to the user_beancounters I was talking about in the first picture. It’s a little friendlier to look at than the raw text output I showed earlier. soft limit is the same as barrier, and hard limit is the same as limit.
Monitoring Server Resource Usage
One thing to remember about these values is that they’re constantly changing. Finding out what’s causing a particular memory overage means that you have to watch how these values change in real time. I did this by opening a couple of shell windows, then running top and cat /proc/user_beancounters in them. There may be a more sophisticated way of doing this, but I’m a noob at this stuff.
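One slightly less manual option, if the watch command is installed on your server, is to let it re-run the dump every second and repaint the screen for you:

watch -n 1 cat /proc/user_beancounters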
There are a few logfiles I monitored simultaneously in other shell windows. Here’s what my setup looked like while I was watching:
The top left window shows filtered output from the apache log, which I do with the following command line: tail -f ~user/statistics/logs/access_log | grep -o -f ~/match-web
~user is the “ftp user” defined in Plesk for your particular domain, and ~/match-web is a file in the root user’s directory that contains the following lines:
"http:.*/"
GET.*HTTP
This makes the live output from the web server a little more focused on pageloads, so you can get an idea how much is going on.
The top right window shows output from the current system message log, with the following command: tail -f -n50 /var/log/messages
On my server, the message log rotates between messages.1, messages.2, etc., so you might have to dig for it. Anyway, watching this log in real time shows me what’s going on, like smtp connections (email) and attempts to break into the server by automated scripts. It’s very irritating to watch, but in conjunction with the other windows it allowed me to figure out that smtp was creating some of the problems.
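If that firehose gets to be too much, it can be narrowed down the same way as the web log; for example, this (adjust the pattern to whatever you’re chasing) shows only the smtp-related lines:

tail -f /var/log/messages | grep -i smtp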
The two bottom left windows are the top command and the while true ; do cat /proc/user_beancounters ; sleep 1 ; done command. If I saw a sudden surge in %CPU or an increase in the failcnt (in this screenshot you can see that the othersockbuf problem was still happening), I could look at the top two windows and see what happened.
Finally, the bottom right-window is my Mint installation, showing me the traffic stats for my website. This gave me an idea of how heavily loaded the server was, and I could monitor the various windows to see what was going on.
Other Reading
in progress
- OpenVZ
- UBC Parameter Descriptions
- Linux Memory Usage
- UBC Explained
- How to Profile Memory in a Linux System
- Use the ‘pmap’ (process map) command to see the memory breakdown of a specific process, or look in ‘/proc/pid/status’ and ‘/proc/pid/maps’ (there are a couple of examples after this list).
- Use the ‘pstree’ command to show processes in a tree hierarchy, showing who spawned what
- Troubleshooting Linux Memory – similar in some ways to the optimization articles I’ve written.
- To see what sockets are being held by what apps, use lsof (list open files) or netstat -p. An article on the use of lsof describes it in the intrusion detection role, but it’s still useful. Here’s another lsof quickstart. And this one describes how to recover deleted files using lsof (at the end).
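A few concrete invocations of those tools, just as sketches (process names will differ on your server, and mysqld here is only an example target):

netstat -tpn                      # numeric TCP sockets with the owning program (run as root to see them all)
lsof -i -n -P                     # open network sockets, listed by process
pmap $(pgrep mysqld | head -1)    # memory map of the first mysqld process found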
5 Comments
David, this is an awesome series! I only wish (mt)‘s knowledge base was 10% as rich as one of your posts… :-)
Mega props…
-Bob
——-
Great work David!
Thank you for taking the time to publish the results of your dv explorations in such a clear and concise manner.
But looking at your results brings a question to mind:
If pages are 4096 bytes, and your dv has oomguarpages set at 28729, that comes out to (4K * 28729) = 114.9 MB.
Yet the base dv is advertised as having 256 MB of RAM.
Have you found any answers to that apparent discrepancy?
Hey David,
Thanks for the info. I am getting failcnts for othersockbuf at regular intervals. How can I find which process is causing that?
Thanks
David,
I just subscribed to mt’s DV 3.5 base yesterday… and your page helped me a lot!
Although this article was written two years ago… it is still valid!
Thanks a lot for your help!