July 2016

Sunday, July 31, 2016

networking - Hardware Firewall Vs. Software Firewall (IP Tables, RHEL)

My hosting company says IPTables is useless and doesn't provide any protection. Is this a lie?

TL;DR
I have two, co-located servers. Yesterday my DC company contacted me to tell me that because I'm using a software firewall my server is "Vulnerable to multiple, critical security threats" and my current solution offers "No protection from any form of attack".

They say I need to get a dedicated Cisco firewall ($1000 installation then $200/month each) to protect my servers. I was always under the impression that, while hardware firewalls are more secure, something like IPTables on RedHat offered enough protection for your average server.

Both servers are just web-servers, there's nothing critically important on them but I've used IPTables to lock down SSH to just my static IP address and block everything except the basic ports (HTTP(S), FTP and a few other standard services).

I'm not going to get the firewall. If either of the servers were hacked it would be an inconvenience, but all they run is a few WordPress and Joomla sites, so I definitely don't think it's worth the money.

Answer

Hardware firewalls are running software too, the only real difference is that the device is purpose built and dedicated to the task. Software firewalls on servers can be just as secure as hardware firewalls when properly configured (note that hardware firewalls are generally 'easier' to get to that level, and software firewalls are 'easier' to screw up).

If you're running outdated software, there's likely a known vulnerability. While your server might be susceptible to this attack vector, stating that it is unprotected is inflammatory, misleading, or a bald-faced lie (depending on what exactly they said and how they meant it). You should update the software and patch any known vulnerabilities regardless of the probability of exploitation.

Stating that IPTables is ineffective is misleading at best. Though again, if the one rule is allow everything from all to all then yeah, it wouldn't be doing anything at all.
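For illustration, the kind of ruleset the question describes (SSH from one static address, a few public ports, everything else dropped) could be generated roughly like this. It is a minimal sketch, not the asker's actual configuration: the function name build_rules, the address 203.0.113.10 and the port list are assumptions, and older RHEL releases may prefer the state match over conntrack.

```python
def build_rules(admin_ip, public_tcp_ports):
    """Return iptables commands for a default-deny inbound policy.

    admin_ip and public_tcp_ports are placeholders chosen for this sketch.
    """
    rules = [
        # Keep loopback traffic and replies to established connections working.
        "iptables -A INPUT -i lo -j ACCEPT",
        "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
        # SSH only from the administrator's static address.
        f"iptables -A INPUT -p tcp -s {admin_ip} --dport 22 -j ACCEPT",
    ]
    for port in public_tcp_ports:
        # Services that are meant to be reachable by everyone.
        rules.append(f"iptables -A INPUT -p tcp --dport {port} -j ACCEPT")
    # Anything not matched above is dropped by the chain policy.
    rules.append("iptables -P INPUT DROP")
    return rules


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for line in build_rules("203.0.113.10", [80, 443, 21]):
        print(line)
```

The sketch only prints the commands so they can be reviewed first. Port 21 covers only the FTP control connection; passive FTP data ports would need additional handling.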

Side Note: all my personal servers are FreeBSD powered and use only IPFW (built-in software firewall). I have never had a problem with this setup; I also follow the security announcements and have never seen any issues with this firewall software.
At work we have security in layers; the edge firewall filters out all the obvious crap (hardware firewall); internal firewalls filter traffic down for the individual servers or location on the network (mix of mostly software and hardware firewalls).
For complex networks of any kind, security in layers is most appropriate. For simple servers like yours there may be some benefit in having a separate hardware firewall, but fairly little.

windows server 2008 - IIS7 install two SSL certificates to two different host (not subdomains)

My question is very similar to this one: IIS 7.0 install two SSL certificates with two different host headers except that my domains are totally different:

1 SSL for http://domain1.com
1 SSL for http://domain2.com

I installed my certificates as described in that question/answer, but if I navigate to domain2.com (which I bound via appcmd ...) I get the certificate for domain1.com. If I open the bindings via IIS7 I see that domain2.com uses the wrong certificate, but if I change it, it changes for domain1.com too.

I really don't know how to solve this issue!

Thanks

Answer

You can't. An SSL certificate applies to a whole port/ip combination because the host header is encrypted so can't be used at that stage.

The question you refer to suggests to apply a single wildcard certificate to both domains because the single certificate is relevant to both domains. The option allows a single certificate to be applied to multiple sites. Where the domains are entirely different this won't work. You are trying to apply multiple certificates.

If you have two separate domains that you want to use SSL on then you will need two separate IP addresses.

web server - Transparent geographical DR website failover

We've already got webservers that are loadbalanced. And even though outages shouldn't happen, they do, for a variety of reasons. (central switch failure, misconfigured ISP routers, backbone failures, DOS attack on shared infrastructure) I want to put a second set of servers in a completely different geographical location with entirely different connections. I can sync the SQL servers with a number of different techniques, so that's not a problem. But what I don't know how to do is transparently redirect existing user web sessions to the backup servers when the primary goes down or becomes unreachable.

AFAIK, the three most common ways of dealing with this are:

DNS load balancing, which uses a very-low TTL to intelligently resolve DNS requests to server IPs in the best environment.

Intelligent Redirection, which uses a 3rd site to authoritatively
redirect users to well-known, but secondary DNS names like
na1.mysite.com and eu.mysite.com.

Use an intelligent, minimal proxy server to relay the requests to different sites while hosting the proxy server in the cloud somewhere.

But in the case of a site failure, the first would leave users unable to reach the server until the TTL causes clients to requery DNS and resolve to the DR site, or causes excessive extra DNS requests. The second method still leaves us with a potential single-point-of-failure (although I could see multiple A-records being used to duplicate the master "login" role between environments) but still doesn't redirect users when the site that they're currently using goes down. And the third isn't redundant at all if the cloud goes down. (as they all have from time to time)

From what I know about networking, isn't there a way that I can give 2 different servers in 2 geographically separated environments the same overlapping IP address and let IP packet routing take over and route traffic to the server accepting requests? Is this only feasible with IPv6? What is it called and why don't DR site failovers currently use such a technique? Update: This is called anycast. How do I make this happen? And is it worth the trouble?

To clarify: this question is specific to HTTP server traffic only with service interruption allowed for up to 60 seconds. Users should not need to close their browser, go back to the login page, or refresh anything. Mobile users cannot accept an extra DNS query for every page request.

centos , linux , apache permission issue

I have a php file which executes a shell script



$ip_access = $_GET['ip_access'];
// run knock app
exec("/home/knock.sh ".$ip_access);

however when I access it I receive the following error in the logs "sh: /home/knock.sh: Permission denied"
I created a new user and added it in the apache httpd.conf file, but it is still not working. Any advice on how to set the permissions, or how I can grant more access to the user to make it work, would be highly appreciated.

Answer

If you are running the script as the appropriate user, you may need to add the execute permission +x to the file. You can do it using:

chmod +x /home/knock.sh

As an alternative, you can use the following command (no need to add execute permission):

exec("bash /home/knock.sh ".$ip_access);
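To see what the chmod +x step changes, here is a minimal Python sketch that checks whether any execute bit is set on a file and then sets them all, roughly what chmod a+x does. The helper names are made up for this example; they are not part of the original answer.

```python
import os
import stat

# All three execute bits: user, group and others.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def is_executable_by_anyone(path):
    """True if at least one execute bit is set on the file."""
    return bool(os.stat(path).st_mode & EXEC_BITS)


def add_execute_bits(path):
    """Set the execute bit for user, group and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | EXEC_BITS)
```

Even with the bit set, the web server user still needs search (execute) permission on every directory leading to the script, which is worth checking for a path under /home.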

domain name system - What's the command-line utility in Windows to do a reverse DNS look-up?

Is there a built-in command line tool that will do reverse DNS look-ups in Windows? I.e., something like w.x.y.z => mycomputername

I've tried:

nslookup: seems to be forward look-up only.

host: doesn't exist

dig: also doesn't exist.

I found "What's the reverse DNS command line utility?" via a search, but this is specifically looking for a *nix utility, not a Windows one.

Answer

ping -a w.x.y.z

Should resolve the name from the IP address if the reverse lookup zone has been set up properly. If the reverse lookup zone does not have an entry for the record, the -a will just ping without a name.
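For context, a reverse lookup is a PTR query against a name built from the reversed address. A small Python sketch of that name construction, using only the standard library (the function name and the example address 192.0.2.10 are placeholders):

```python
import ipaddress


def reverse_lookup_name(ip):
    """Return the DNS name that a PTR query for this address would ask for."""
    return ipaddress.ip_address(ip).reverse_pointer


if __name__ == "__main__":
    # Prints the in-addr.arpa name for a documentation-range address.
    print(reverse_lookup_name("192.0.2.10"))
```

If no PTR record exists under that name in the reverse zone, no host name can be returned, which matches the behaviour described above. In Python, socket.gethostbyaddr performs the actual query.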

php fpm - FastCGI: "comm with server aborted: read failed" only for one specific file

Related question:
FastCGI and Apache 500 error intermittently

The solution does not work for me.

The problem:

I have a Laravel 5.1 application (was in production on other servers without any problems) running on a fresh Ubuntu 14.04 server with Apache 2.4.7 and PHP through PHP-FPM.

Everything works fine as long as a certain file isn't invoked in the application:

$compiledPath = __DIR__.'/cache/compiled.php';


if (file_exists($compiledPath)) {
    require $compiledPath; // this causes a "500 Internal Server Error"
}

It's a Laravel specific file created automatically by the framework itself to speed things up a little (so it's not a bug in my code), it really exists and I have full access permissions. It's about 600kB in size. When I remove it, everything works fine. But when I tell Laravel to create it again and then hit any route of the application, I get a "500 Internal Server Error" with the following log entries:

[fastcgi:error] [pid 14334] (104)Connection reset by peer: [client xxx.xxx.xxx.xxx:41395] FastCGI: comm with server "/var/www/clients/client1/web1/cgi-bin/php5-fcgi-yyy.yyy.yyy.yyy-80-domain.com" aborted: read failed

[fastcgi:error] [pid 14334] [client xxx.xxx.xxx.xxx:41395] FastCGI: incomplete headers (0 bytes) received from server "/var/www/clients/client1/web1/cgi-bin/php5-fcgi-yyy.yyy.yyy.yyy-80-domain.com"

[fastcgi:error] [pid 14334] (104)Connection reset by peer: [client xxx.xxx.xxx.xxx:41395] FastCGI: comm with server "/var/www/clients/client1/web1/cgi-bin/php5-fcgi-yyy.yyy.yyy.yyy-80-domain.com" aborted: read failed

[fastcgi:error] [pid 14334] [client xxx.xxx.xxx.xxx:41395] FastCGI: incomplete headers (0 bytes) received from server "/var/www/clients/client1/web1/cgi-bin/php5-fcgi-yyy.yyy.yyy.yyy-80-domain.com"

What I've tried:

I tried the solution in the related question mentioned above, which also represents most of the other suggestions concerning this problem I could find: Play around with the common PHP-FPM settings in order to assign more resources. The accepted answer also mentions the option of completely abandoning FastCGI, but I don't want to go there. So I played around with the values, but no luck.

There is no load on the server whatsoever since I'm the only one using it, so I really doubt that it's an issue with the available resources (It's a VPS with 12GB RAM). Could it have something to do with the filesize? It's the only PHP file that big.

I could reproduce the problem on 2 different servers with the same configuration. It did not occur on an Ubuntu 12.04 server with Apache 2.2 with FastCGI.

My current configuration:

PHP-FPM:

pm.max_children = 10

pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 5
pm.max_requests = 0


    ...

    Alias /php5-fcgi /var/www/....
    FastCgiExternalServer /var/www/.... -idle-timeout 300 -socket /var/lib/php5-fpm/web1.sock -pass-header Authorization

php.ini

memory_limit = 512M
output_buffering = on

Answer

If PHP is failing only on specific source files, the most probable reason is that some PHP code accelerator (opcode cache) like Xcache, APC or eAccelerator has issues with the file. This can be due to bugs in the accelerator or in PHP itself.

You can try to run your script via PHP command-line interface (php-cli command) as PHP CLI doesn't use any accelerators.

Saturday, July 30, 2016

windows - How do you handle "CMD does not support UNC paths as current directories"?

I am attempting to change directories to a file server such as:

cd \\someServer\someStuff\

However, I get the following error:

CMD does not support UNC paths as current directories

What are my options to navigate to that directory?

Answer

If you're considering scripting it, it's always helpful to learn about the pushd and popd commands. Sometimes you can't be sure which drive letters are already used on the machine that the script will run on and you simply need to take the next available drive letter. Since net use will require you to specify the drive, you can simply use pushd \\server\folder and then popd when you're finished.

linux - PHP Mkdir not working - Full permission

Our server is a Linux Server with Debian 5, Apache2

This is a development server which we are doing testing on and as such we have setup world write permission on everything

I've also set the umask in /etc/profile to 000

One particular PHP script loops through some images in a directory and attempts to make thumbnails in a sub directory

the PHP Error we receive is "Warning: mkdir() [function.mkdir]: No such file or directory"

apache2 runs as user www-data, I can login as www-data and make directories and files and everything with no problem

the apache error log just says File does not exist

Any suggestions?

Answer

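Does the parent of the path it's trying to create exist? I.e. if it's trying to create /var/www/images/thumb/ then /var/www/images/ needs to exist. It may also pay to enable recursive creation: mkdir('/var/www/images/thumb', 0777, true). Note that passing 0 as the mode would create a directory with no permissions at all.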
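The difference between plain and recursive directory creation can be reproduced in a few lines of Python. This is only an analogy to the PHP call, assuming the same cause (a missing parent directory); the helper names are invented for the sketch.

```python
import os


def try_plain_mkdir(path):
    """Attempt a non-recursive mkdir; return the error class name or None."""
    try:
        os.mkdir(path)
    except OSError as exc:
        return type(exc).__name__
    return None


def ensure_dir(path):
    """Create path and any missing parents, like PHP's recursive mkdir."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)
```

With a missing parent, try_plain_mkdir reports FileNotFoundError, which is the same "No such file or directory" condition PHP warns about, while ensure_dir succeeds.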

debian - Crontab problem

under debian as a root (using su -)

First of all there is already a job inside the server (done by someone else), when I type

crontab -e

I get

# m h  dom mon dow   command
* * * * * sh /opt/somescript.sh

It executes every minute.

Anyway, I am trying to add a scheduled job to the crontab:
I tried to add a second job that will be executed every day at 00:30.

30 0   * * *    sh /opt/newscript.sh

I have two problems:

I am not able to edit the crontab with crontab -e

Is my newscript scheduling right ?
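On the second point, the five schedule fields can be checked mechanically. The following is a deliberately tiny sketch that understands only plain numbers and *, not ranges, lists or steps, and it simply ANDs the day-of-month and day-of-week fields, which differs from real cron when both are restricted. The function names are invented for this example.

```python
from datetime import datetime


def field_matches(field, value):
    """A field matches if it is '*' or equal to the given number."""
    return field == "*" or int(field) == value


def cron_matches(line, when):
    """Check the first five fields of a crontab line against a datetime."""
    minute, hour, dom, month, dow = line.split()[:5]
    cron_dow = when.isoweekday() % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(dom, when.day)
        and field_matches(month, when.month)
        and field_matches(dow, cron_dow)
    )
```

Under these assumptions, the line 30 0 * * * sh /opt/newscript.sh matches 00:30 on any day and nothing else, which is what the question intends.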

httpd - Apache configuration for allowing a web site that is a symlink to a users directory

I have configured the httpd.conf in Apache (httpd-2.2.17-1.fc14.x86_64) on FC14 to point to a symlink that exists in the user's home directory by adding the following config to the httpd.conf.


    ServerName site.test.co.uk
    DocumentRoot /var/www/html/site.test.co.uk
    
            AllowOverride None
            Options Indexes FollowSymLinks MultiViews
            Order allow,deny

            allow from all

I have then used the command

ln -s /var/www/html/site.test.co.uk /home/user/www/site.test.co.uk

I have given the user userA access to the directory structure and ownership of the folders in the home dir (I was logged in as root). I have also given the group apache access to the group userA and restarted httpd.

The issue I have is that when I view the site site.test.co.uk I get a 403 Forbidden error.

I can cd to the directory via the symlink and that works fine but apache does not seem to be able to access the symlink.

Thanks in advance.

Answer

Try cd-ing into the folder as the Apache user.

Most likely it's your home directory permissions, or a permission somewhere higher up the path.

# -s is needed if the apache account has no login shell
su -s /bin/bash apache
cd /home
cd user
cd www
cd site.test.co.uk

If it fails, you need to add apache to the user's group and make sure the directory has group read and execute permission (drwxr-x---, or 750):

chmod -R 750 /home/user
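The step-by-step cd test above can also be done by listing the permissions of every directory on the way to the document root. A rough Python sketch follows; the function names are made up, and it only reports modes rather than deciding whether a particular user may pass.

```python
import os
import stat


def ancestors(path):
    """Return every directory from the filesystem root down to path."""
    path = os.path.abspath(path)
    parts = []
    while True:
        parts.append(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the root
            break
        path = parent
    return list(reversed(parts))


def permission_report(path):
    """Pair each directory on the way to path with its ls-style mode string."""
    return [(p, stat.filemode(os.stat(p).st_mode)) for p in ancestors(path)]
```

Any entry in the report that lacks an x for the class Apache falls into (group or others) is a likely place for the 403 to originate.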

Friday, July 29, 2016

vmware esxi - Will Adaptec raid 2405 card work in Asus Crosshair Formula IV MOBO

Will this PCIe 8-lane RAID card (Adaptec 2405) work in this motherboard (Crosshair Formula IV)?

Having never seen a PCIe slot, I'm not sure and just want to check before ordering.

TIA

UPDATE:

What I am essentially looking for is to find out if this PCIe x8 card will work in this motherboard on a PCIe level. I am satisfied that ESXi will work on this MB and with this RAID card. Apparently some MBs only allow graphics cards to be used in certain PCIe slots, hence my caution. See Why does the Adaptec PCI Express card not work in a PCIe x16 slot?

Thursday, July 28, 2016

centos7 - df shows 2 GB size fdisk shows 60 GB

My newly purchased KVM based VPS (virtual server) running Centos 7 shows a - to my mind - odd result for the df -h command versus fdisk -l and ssm list.

I've put the output of various disk related commands below.

What I expected to see was a Size of roughly 60 GB on /dev/vda1 when I did df -h

ssm shows that the difference is between Volume size and FS Size.

What am I missing?

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       2.0G  1.1G  754M  60% /
devtmpfs        912M     0  912M   0% /dev
tmpfs           921M     0  921M   0% /dev/shm
tmpfs           921M   17M  904M   2% /run
tmpfs           921M     0  921M   0% /sys/fs/cgroup

tmpfs           921M   17M  904M   2% /etc/machine-id

Seems I only have 2 GB on the / mount.

However, fdisk shows this:
# fdisk -l

Disk /dev/vda: 64.4 GB, 64424509440 bytes, 125829120 sectors
Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000dcb5c

   Device Boot      Start         End      Blocks   Id  System
/dev/vda1   *        2048   121632383    60815168   83  Linux
/dev/vda2       121632384   125826687     2097152   82  Linux swap / Solaris

ssm gives this:

# ssm list
--------------------------------
Device        Total  Mount point
--------------------------------
/dev/vda   60.00 GB  PARTITIONED
/dev/vda1  58.00 GB  /
/dev/vda2   2.00 GB  SWAP
--------------------------------

-------------------------------------------------------------------
Volume     Volume size  FS    FS size       Free  Type  Mount point
-------------------------------------------------------------------
/dev/vda1     58.00 GB  ext4  1.97 GB  851.11 MB  part  /
-------------------------------------------------------------------
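The numbers can be cross-checked with a little arithmetic on the fdisk sector ranges. This sketch uses only the start and end sectors and the 512-byte sector size shown in the output above; the function names are invented for the example.

```python
SECTOR_BYTES = 512  # from "Units = sectors of 1 * 512 = 512 bytes"


def partition_bytes(start_sector, end_sector):
    """Size of a partition whose sector range is inclusive at both ends."""
    return (end_sector - start_sector + 1) * SECTOR_BYTES


def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


vda1 = partition_bytes(2048, 121632383)
vda2 = partition_bytes(121632384, 125826687)

if __name__ == "__main__":
    print(round(to_gib(vda1), 2), round(to_gib(vda2), 2))
```

So the /dev/vda1 partition really is about 58 GiB, matching the ssm volume size, while the ext4 filesystem inside it reports only 1.97 GB. That gap suggests the filesystem was never grown to fill the partition.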

linux - Can someone explain the "use-cases" for the default munin graphs?

When installing munin, it activates a default set of plugins (at least on ubuntu). Alternatively, you can simply run munin-node-configure to figure out which plugins are supported on your system. Most of these plugins plot straight-forward data. My question is not to explain the nature of the data (well... maybe for some) but what is it that you look for in these graphs?

It is easy to install munin and see fancy graphs. But having the graphs and not being able to "read" them renders them totally useless.

I am going to list standard plugins which are enabled by default on my system. So it's going to be a long list. For completeness, I am also going to list plugins which I think I understand and give a short explanation as to what I think each is used for. Please correct me if I am wrong with any of them.

So let me split this question into three parts:

Plugins where I don't even understand the data

Plugins where I understand the data but don't know what I should look out for

Plugins which I think I understand

Plugins where I don't even understand the data

These may contain questions that are not necessarily aimed at munin alone. Not understanding the data usually means a gap in fundamental knowledge on operating systems/hardware.... ;) Feel free to respond with a "giyf" answer.

These are plugins where I can only guess what's going on... I hardly want to look at these "guessing"...

Disk IOs per device (IOs/second)
What's an IO. I know it stands for input/output. But that's as far as it goes.

Disk latency per device (Average IO wait)
Not a clue what an "IO wait" is...

IO Service Time
This one is a huge mess, and it's near impossible to see something in the graph at all.

Plugins where I understand the data but don't know what I should look out for

IOStat (blocks/second read/written)
I assume, the thing to look out for in here are spikes? Which would mean that the device is in heavy use?

Available entropy (bytes)
I assume that this is important for random number generation? Why would I graph this? So far the value has always been near constant.

VMStat (running/I/O sleep processes)
What's the difference between this one and the "processes" graph? Both show running/sleeping processes, whereas the "Processes" graph seems to have more details.

Disk throughput per device (bytes/second read/written)
What's the difference between this one and the "IOStat" graph?

inode table usage
What should I look for in this graph?

Plugins which I think I understand

I'll be guessing some things here... correct me if I am wrong.

Disk usage in percent (percent)
How much disk space is used/remaining. As this is approaching 100%, you should consider cleaning up or extending the partition. This is extremely important for the root partition.

Firewall Throughput (packets/second)
The number of packets passing through the firewall. If this is spiking for a longer period of time, it could be a sign of a DOS attack (or we are simply receiving a large file). It can also give you an idea about your firewall performance. If it's levelling out and you need more "power" you should consider load balancing. If it's levelling out and you see a correlation with your CPU load, it could also mean that your hardware is not fast enough. Correlations with disk usage could point to excessive LOG targets in your FW config.

eth0 errors (packets in/out)
Network errors. If this value is increasing, it could be a sign of faulty hardware.

eth0 traffic (bits/second in/out)
Raw network traffic. This should correlate with Firewall throughput.

number of threads
An ever-increasing value might point to a process not properly closing threads. Investigate!

processes
Breakdown of active processes (including sleeping). A quick spike in here might point to a fork-bomb. A slowly, but ever-increasing value might point to an application spawning sub-processes but not properly closing them. Investigate using ps faux.

process priority
This shows the distribution of process priorities. Having only high-priority processes is not of much use. Consider de-prioritizing some.

cpu usage
Fairly straight-forward. If this is spiking, you may have an attack going on, or a process is hogging the CPU. If it's slowly increasing and approaching max in normal operations, you should consider upgrading your hardware (or load-balancing).

file table usage
Number of actively open files. If this is reaching max, you may have a process opening, but not properly releasing files.

load average
Shows a summarized value for the system load. Should correlate with CPU usage. Increasing values can come from a number of sources. Look for correlations with other graphs.

memory usage
A graphical representation of your memory. As long as you have a lot of unused+cache+buffers you are fine.

swap in/out
Shows the activity on your swap partition. This should always be 0. If you see activity on this, you should add more memory to your machine!

Answer

Disk IOs per device (IOs/second)

With traditional hard drives this is a very important number. An I/O operation is a read or write operation to disk. With rotational spindles you can get anywhere from dozens to perhaps 200 IOPS, depending on the disk speed and its usage pattern.

That's not all there is to it: modern operating systems have I/O schedulers which try to merge several I/O requests into one and make things faster that way. RAID controllers and the like also perform some smart I/O request reordering.
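A back-of-the-envelope estimate shows where the "dozens to perhaps 200" range comes from. For random access, one I/O costs roughly an average seek plus half a rotation. The seek times below are assumed typical figures, not measurements from any particular drive.

```python
def rotational_iops(rpm, avg_seek_ms):
    """Rough random-I/O ceiling for one spindle.

    Each request is assumed to cost one average seek plus half a rotation.
    """
    rotational_latency_ms = 0.5 * 60_000 / rpm  # half a revolution, in ms
    return 1000 / (avg_seek_ms + rotational_latency_ms)


if __name__ == "__main__":
    # Assumed seek times: about 8.5 ms for a 7200 rpm disk, 3.5 ms for 15k rpm.
    print(round(rotational_iops(7200, 8.5)))
    print(round(rotational_iops(15000, 3.5)))
```

Under those assumptions a 7200 rpm disk lands near 80 IOPS and a 15k rpm disk near 180. Sequential workloads and request merging can push real figures well above this.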

Disk latency per device (Average IO wait)

How long it took from issuing the I/O request to an individual disk to actually receiving the data from it. If this hovers around a couple of milliseconds, you are OK; if it's dozens of ms, you are starting to see your disk subsystem sweating; if it's hundreds of ms or more, you are in big trouble, or at least have a very, very slow system.

IO Service Time

How your disk subsystem (possibly containing lots of disks) is performing overall.

IOStat (blocks/second read/written)

How many disk blocks were read/written per second. Look for spikes and also the average. If the average starts to near the maximum throughput of your disk subsystem, it's time to plan for a performance upgrade. Actually, plan well before that point.

Available entropy (bytes)

Some applications do want to get "true" random data. Kernel gathers that 'true' randomness from several sources, such as keyboard and mouse activity, a random number generator found in many motherboards, or even from video/music files (video-entropyd and audio-entropyd can do that).

If your system runs out of entropy, the applications wanting that data stall until they get their data. Personally in the past I've seen this happening with Cyrus IMAP daemon and its POP3 service; it generated a long random string before each login, and on a busy server that consumed the entropy pool very quickly.

One way to get rid of that problem is to switch the applications to use only semi-random data (/dev/urandom), but that's beyond the scope of this topic.

VMStat (running/I/O sleep processes)

Not thought about this one before, but I would think that this tells you about per-process I/O statistics, or mainly if they are running some I/O or not, and if that I/O is blocking I/O activity or not.

Disk throughput per device (bytes/second read/written)

This is purely bytes read/written per second, and it is usually a more human-readable form than blocks, whose size may vary. Block size may differ because of the disks used, the file system (and its settings) used, and so on. Sometimes the block size might be 512 bytes, other times 4096 bytes, sometimes something else.

inode table usage

With file systems having dynamic inodes (such as XFS), nothing. With file systems having static inode maps (such as ext3), everything. If you have a combination of static inodes, a huge file system and a huge number of directories and small files, you might encounter a situation where you cannot create more files on that partition, even though in theory there would be lots of free space left. No free inodes == bad.
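What the inode graph plots can be reproduced from the counters a mounted file system exposes. A minimal sketch using os.statvfs (Unix only) follows; the function names are invented here, and file systems that allocate inodes dynamically may report a total of 0, which the sketch treats as "not applicable".

```python
import os


def inode_usage_percent(total, free):
    """Percentage of inodes in use, or None if the FS reports no fixed total."""
    if total == 0:
        return None
    return 100.0 * (total - free) / total


def inode_usage_for(path):
    """Inode usage of the file system that contains path."""
    st = os.statvfs(path)
    return inode_usage_percent(st.f_files, st.f_ffree)
```

A value creeping towards 100 on a file system with a fixed inode count is the warning sign, regardless of how much free space df reports.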

Wednesday, July 27, 2016

centos5 - Issue installing gcc & curl-devel on Centos OS 5.6

I'm trying to install gcc and curl-devel on CentOS 5.6 (64 bit) server.

The command I'm using is:

yum install gcc \ curl-devel

After running the command it says:

No package gcc available.

No package curl-devel available.

Is there another way for me to install this? Haven't used CentOS much so not sure if maybe they are disabled by the repo or something else. Any help would be greatly appreciated!

Here is the output from running 'yum repolist':

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos.syn.co.il
 * extras: centos.syn.co.il
 * updates: centos.syn.co.il
base                                     | 1.1 kB     00:00
extras                                   | 2.1 kB     00:00
updates                                  | 1.9 kB     00:00
repo id          repo name               status
base             CentOS-5 - Base         enabled: 3,662
extras           CentOS-5 - Extras       enabled:   265
updates          CentOS-5 - Updates      enabled:   223
repolist: 4,150

Answer

Try:

yum install gcc.x86_64

then

yum install curl-devel.x86_64

Tuesday, July 26, 2016

vmware esxi - Raid 10 to SSD's or not?

I have an R620 server with these disks:

4x512GB Samsung SSD

4x1TB Seagate ES.2 HDD

I will use VMware ESXi to create virtual machines. What's the best option: using the SSDs as primary drives in RAID 10, or using the HDDs in RAID 10?

I will host 1200 websites on this server with cPanel.

Capacity is not my concern, but performance is. When I put the SSDs in RAID 10, it doesn't really show 4x performance, and the 4K Seq. Read/Write rate is very low. That makes me think twice about using it.

linux - Providing normal users(non-root) with initialization and shutdown auto-run capabilities

I'm hosting an experimental/testing Linux box, running Debian Wheezy 7.4.0 distribution. Different users log into the machine over ssh to their accounts and are allowed to run the development tools and leave their programs running as services in background if they so wish.

Since this is a testing machine for all kinds of purposes there is often a need to restart the whole machine and then the users have to log back in and restart their user-space stuff that was running.
I would like to automate that. Basically I would like to provide the users with a means to launch stuff right after the machine boots up (after everything else is initialized) and a means to launch stuff upon system shutdown (with no time limitations, basically stalling the shutdown until all those shutdown user processes have completed).

What I have tried so far:
I've created an init bash script, by following the principles found in the 'skeleton' template file under /etc/init.d/ (Skeleton template source code: https://gist.github.com/ivankovacevic/9917139)

My code is here:
https://github.com/ivankovacevic/userspaceServices

Basically the script goes through users home directories and looks for executable files in corresponding subdirectories named .startUp, .shutDown or .status. Depending on the event that is currently going on the scripts get executed with su as if the users have started them themselves.
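The discovery step described above can be outlined as follows. The real implementation is the bash init script linked above; this Python version is only an illustration, and it assumes that each entry under the home root is named after its user. The names EVENT_DIRS and find_user_scripts are invented for the sketch.

```python
import os

# Sub-directory names taken from the description above.
EVENT_DIRS = {"start": ".startUp", "stop": ".shutDown", "status": ".status"}


def find_user_scripts(home_root, event):
    """Return (user, script_path) pairs for executable files of one event."""
    subdir = EVENT_DIRS[event]
    found = []
    for user in sorted(os.listdir(home_root)):
        script_dir = os.path.join(home_root, user, subdir)
        if not os.path.isdir(script_dir):
            continue  # this user has not opted in for the event
        for name in sorted(os.listdir(script_dir)):
            path = os.path.join(script_dir, name)
            # Only regular files with an execute bit are considered.
            if os.path.isfile(path) and os.access(path, os.X_OK):
                found.append((user, path))
    return found
```

Each returned pair would then be run as that user, for example through su as the original script does.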

The problem I'm currently facing with this approach is that there is a strange process left hanging after the system boots and the script starts all the processes of other users. This is how it looks in the processes list:

UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
root      3053     1  0  1024   620   1 17:42 ?        00:00:00 startpar -f -- userspaceServices

I don't know what that process is and man page for it does not mention the -f argument. So I'm clueless but I must be doing something wrong since no other script/service from init.d leaves such a process hanging after boot.

So I'm looking for someone to help me debug this solution I have(which also seems a bit complex in my opinion). Or give me some idea how this could be implemented in an entirely different way.

UPDATE
I've started a separate question for the startpar issue:
startpar process left hanging when starting processes from rc.local or init.d

UPDATE 2
Problem solved for my original solution. Check the previously mentioned question for startpar. The code on GitHub is also corrected to reflect that.

UPDATE 3 - How to use crontab
As Jenny suggested, regular users can schedule tasks to be executed once upon boot using crontab. I find that to be the easiest method, if all you need is starting user tasks on boot and not shutdown. However there is a drawback that users can leave cron process "hanging" as parent when they launch on-going, service-like tasks. First let me just explain how it works:

regular users themselves should call:

crontab -e

( -e as in edit )
Which opens a default console text editor with their user crontab file. To add a task to be executed at boot, a user must add one line at the end of the file:

@reboot /path/to/the/executable/file

Now if the user would do just that and if that file is not just some simple script that linearly completes something and ends, but some sort of watchdog for example, after a reboot you would end with something like this in your processes list:

    1  2661 root       20   0 20380   860   660 S  0.0  0.0  0:00.00 ├─ /usr/sbin/cron
 2661  2701 root       20   0 33072  1152   868 S  0.0  0.0  0:00.00 │  └─ /USR/SBIN/CRON
 2701  2944 someuser   20   0  4180   580   484 S  0.0  0.0  0:00.00 │     └─ /bin/sh -c ./watchdog
 2944  2945 someuser   20   0 10752  1204  1016 S  0.0  0.0  0:00.00 │        └─ /bin/bash ./watchdog
 2945  2946 someuser   20   0 23696  4460  2064 S  0.0  0.1  0:00.01 │           └─ /usr/bin/python ./some_program.py

To avoid that the user needs to modify his crontab entry to look like this:

@reboot /path/to/the/executable/file >/dev/null 2>&1 &

The redirections of file descriptors are optional but recommended to keep it clean. If you want to study why, try looking at them:

ls -l /proc/pid_of_started_process/fd
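For a task that is itself started from Python, the same idea (run it in the background with its standard streams pointed at /dev/null) might look like the sketch below. It goes slightly further than the crontab line: it also detaches stdin and starts a new session. The function name launch_detached is an assumption of this example.

```python
import subprocess


def launch_detached(argv):
    """Start a long-running command without tying it to the caller's streams."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,   # nothing to read from
        stdout=subprocess.DEVNULL,  # like >/dev/null
        stderr=subprocess.DEVNULL,  # like 2>&1
        start_new_session=True,     # own session, so it outlives the launcher
    )
```

The call returns immediately, which is what the trailing & achieves in the shell version.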

Answer

I agree that your solution seems a bit complex, so I'll go with "give me some idea how this could be implemented in an entirely different way" :-)

The standard solution for this is to use a configuration management system, such as puppet, and allow users to add their stuff to the puppet config for the server. Puppet will then push out the start script and add them to the relevant runlevels.

A quicker way would be to give them sudoedit access to /etc/rc.d/rc.local and add their things there.

Or give them each a directory to put the start scripts they want started, and have a cron job copy those scripts to /etc/init.d, inserting su $USER -c at suitable places and run chkconfig on them.

Or give them each a directory to put the start scripts, and add some lines at the end of /etc/rc.d/rc.local to go through those directories and run edited su $USER -c 'script start' on each script in them.

Edited to add:
5. Let them use crontab to schedule the jobs to be run @reboot

Linux swappiness - Adjusting Kernel VM settings

Before you read this, please note that I understand the benefits of caching. I'm familiar with the dogma that unused ram is wasted ram.

This question is one that I've adapted from a previous question:

deleting linux cached ram

In that question I was curious about adjusting how my server uses and caches RAM. The system is fairly dynamic, so I believe that the cached files don't really afford me much gain. Additionally, we have code on the server that has to quickly access large amounts of RAM in short periods of time to process video files, and I believe that I'll see a performance benefit from directly handing off RAM rather than clearing it from cache and then handing it off.

I'd like to find out if any of you have experience with adjusting the default value of 60 in the following file (this happens to be on an Ubuntu server):

/proc/sys/vm/swappiness

And if so, what effects did you see? If I replace the default value of 60 with 30, will I see less swapping and a lot more reuse of stale cache? Do I approach 0 or 100 to decrease swappiness and increase reuse of cache?

Finally, anyone know why the default is set to 60?

NOTE: If it's close to 0, Linux will prefer to keep applications in RAM and not grow the caches. If it's close to 100, Linux will prefer to swap applications out, and enlarge the caches as much as possible. The default is a healthy 60. - Thanks for the link below, 3dInfluence.

Answer

Edit: Rewrote the answer so that it's shorter and clearer I hope :)

You really need to understand how the VM subsystem works as a whole to start tweaking the tunables or you may find that you're not getting the results that you expect. This article is a pretty good starting point on how these settings work together with a desktop slant.

So, more to your question: swappiness controls when the VM subsystem reclaims process table pages by unmapping and paging them out, aka swapping. This tunable works by telling the VM subsystem to look for pages to swap when the % of memory mapped to process page tables + swappiness value is > 100. So a setting of 60 will cause the system to start paging out stale pages from the process page table when it is using more than 40% of your system's memory. If you want to allow your programs to use more memory at the expense of cache, you'll want to lower the swappiness value. You'll also want to have a look at /proc/sys/vm/min_free_kbytes and /proc/sys/vm/vfs_cache_pressure, as these also control how much memory is kept in reserve and how aggressive the caching is. See the article I linked to for more information on the latter of those.
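The rule of thumb in this answer can be written out as a few lines of arithmetic. This is a minimal sketch of the heuristic exactly as the answer states it, not of the kernel's actual reclaim code, which differs between kernel versions; the function names are made up for illustration.

```python
def swap_threshold(swappiness=60):
    """Mapped-memory percentage above which swapping starts, per the answer."""
    return 100 - swappiness


def starts_swapping(mapped_percent, swappiness=60):
    """True when mapped % + swappiness exceeds 100 (the answer's rule)."""
    return mapped_percent + swappiness > 100


if __name__ == "__main__":
    for value in (60, 30, 0):
        print(value, swap_threshold(value))
```

Under this model the default of 60 gives a threshold of 40% mapped memory, and lowering the value to 30 moves it to 70%, which is why a lower setting favours applications over cache.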

Monday, July 25, 2016

Does Linux support IPv4 mapped IPv6 addresses?

I work in a mixed IPv4 / IPv6 environment. I read that IPv4 addresses can be mapped into the IPv6 space with this syntax

::ffff:1.2.3.4 (1.2.3.4 is the IPv4 address)

Does Linux support this notation ? All these fail on my server:

ping6 ::ffff:1.2.3.4 # to the server IP
ping6 ::ffff:127.0.0.1

Answer

Rather than using ping6, try ssh'ing to ::ffff:127.0.0.1.

I think the specific failure here is related to ping6, not the IPv4-mapped addresses.

Aren't IPv4 mapped IPv6 addresses actually using IPv4, and hence, not suitable for ping6?

Linux has a socket option, IPV6_V6ONLY, which prevents some applications from using IPv4-mapped addresses. However, I think for ping6 the specific issue is the way it works internally.

This is from netbsd, but I think it covers the issue.

You should be aware that IPv4 mapped IPv6 is still IPv4 - it's only
presented in a IPv6-resembling text format (or actually, when calling
your operating system's libraries or kernel, binary socket address
format.)

For dual-protocol applications this is no problem - they know how to
switch (implicitly, when using the right (modern) library calls).
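To see that a mapped address is only a different spelling of an IPv4 address, it can be unpacked with Python's standard ipaddress module. This is a small sketch; unwrap_mapped is a hypothetical helper name, while ip_address and the ipv4_mapped attribute are part of the standard library.

```python
import ipaddress


def unwrap_mapped(text):
    """Return the embedded IPv4 address for ::ffff:a.b.c.d, else the address itself."""
    addr = ipaddress.ip_address(text)
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return addr.ipv4_mapped
    return addr


if __name__ == "__main__":
    for text in ("::ffff:1.2.3.4", "::ffff:127.0.0.1", "::1"):
        print(text, "->", unwrap_mapped(text))
```

A dual-protocol application does the equivalent implicitly: it sees the mapped form on an IPv6 socket, but the traffic on the wire is IPv4.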

domain name system - Why do Network Solutions DNS servers answer an incorrect IP for google.com etc.?

Network Solutions DNS servers (ns1 - ns99.worldnic.com) answer the IP 141.8.225.31 for any A query to which they do not hold the answer. E.g.:

C:\>dig @ns11.worldnic.com www.google.com
www.google.com.         3600    IN      A       141.8.225.31

For the corresponding NS query, they claim to give an authoritative answer that their server holds the SOA for that TLD.

C:\>dig @ns11.worldnic.com www.google.com NS
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; AUTHORITY SECTION:
com.                    3600    IN      SOA     ns11.worldnic.com. dns.worldnic.com. 2016010801 3600 600 1209600 3600

The same results occur for any name for which they are not authoritative, e.g. previous customers like metacase.com, or non-existent names like xyxyxyxyxy.net. All return the same IP, which is for a spammy advertising site in Switzerland.

This seems incorrect. Although normally ISP DNS servers will not query Network Solutions for these names, when a domain is transferred away from their name servers many ISP DNS servers ("child sticky") continue to ask the previous name server as long as it claims to answer. Thus a domain transfer (or change of authoritative name server) results in a loss of connectivity for that host, even if the host's actual IP remains unchanged and was correct in both losing and gaining authoritative name server.

Full dig output for the above queries:

C:\>dig @ns11.worldnic.com www.google.com

; <<>> DiG 9.11.3 <<>> @ns11.worldnic.com www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44870
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 2800
;; QUESTION SECTION:
;www.google.com.                        IN      A

;; ANSWER SECTION:
www.google.com.         3600    IN      A       141.8.225.31


;; Query time: 154 msec
;; SERVER: 207.204.40.106#53(207.204.40.106)
;; WHEN: Mon Apr 09 13:22:19 FLE Summer Time 2018
;; MSG SIZE  rcvd: 59


C:\>dig @ns11.worldnic.com www.google.com NS

; <<>> DiG 9.11.3 <<>> @ns11.worldnic.com www.google.com NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50950
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 2800
;; QUESTION SECTION:
;www.google.com.                        IN      NS

;; AUTHORITY SECTION:
com.                    3600    IN      SOA     ns11.worldnic.com. dns.worldnic.com. 2016010801 3600 600 1209600 3600

;; Query time: 151 msec
;; SERVER: 207.204.40.106#53(207.204.40.106)
;; WHEN: Mon Apr 09 13:22:32 FLE Summer Time 2018
;; MSG SIZE  rcvd: 97

Domain name with Company A, Email with Company B and VPS with Company C... A Record or Nameservers to join things together?

I do some freelance web development and have spent a long time with a single host that provides all services (domain name, website/database hosting and email) under one account.

I am looking at improving my services but also keep costs down. My current provider has a package where I can register domain, email and a VPS but the cost is quite high. I have researched that I could spread the services to different providers, but I am unsure how this would work with DNS and existing records.

My domain name example.com is registered with Fasthosts (UK), my email is provided through Office 365 (with all the correct DNS records pointing from Fasthosts to Office 365) and I would like to run my own VPS through a different company again.

To add to this: I would also like to be able to host a number of customer websites through my VPS but also be able to provide email to them through Office 365 (I have a tenant account). They may want to keep their domain names with their own company, or I register them through Fasthosts.

Question number 1 (I guess): For my own domain (example.com) do I just need to create an A record to point my www traffic to my VPS IP? I think I've answered my own question as this seems fairly logical.

Question 2 (main question): For my customers who have their domain names hosted elsewhere but only want their website on my VPS, do I just need to, again, point an A record on their domain DNS to my VPS IP?

Do I need to do anything with nameservers, or is that if they want ALL services provided by my hosting company? So, in theory, the customer's domain registrar is just that - no services provided by them apart from the domain admin?

Answer

To answer your questions, yes, all you need to do is change the A record(s).

The easiest way is to get your clients to change the A record for www (or whatever hostname) on their DNS servers to your VPS IP. You do not need to move domains to your hosting provider's DNS servers.

Make sure the IP of your VPS is static as well. If not you will need dynamic DNS.

Sunday, July 24, 2016

windows server 2008 - DC1 Can't See DC2 Nor Network Machines Can't See Domain

Using Windows Server 2008 R2

Found an issue with my Domain Controller 1. Setup is basic and the main domain controller is hosting AD and DNS. The secondary cannot find the primary.

dcgetdcname failed error 1355

In addition, computers that are on the domain show the network as Unidentified Network.

DC1 shows the correct domain it is connected to and working correctly. DCDIAG on the DC1 shows everything running correctly. DC1 can also ping domain joined computers without a problem but cannot ping DC2.

DC2 also is the backup AD and DNS.

Googled everything under the sun but I can't get DC2 to see DC1 and the computers on the network to see the domain name correctly. I can't join computers to domain either as it says it cannot find the domain.

I can remote to the DC1 from any domain joined computer without a problem but I can't get to any fileshares on the DC1 either. Neither of the DCs are running NAT. This network is not internet joined.

Can anyone help?

EDIT1: NSLOOKUP cannot find the DNS servers. They time out.

EDIT2: DC1 and DC2 are each using themselves as DNS servers. They are marked as secondaries because they cannot reach each other to register each other as DNS servers.

EDIT3: Update: I got DNS working, though I'm not sure how, but now no computers on the network can reach the fileshares on the server itself. Firewalls are disabled and I can't find any reason they would be unreachable.

linux - How-To Chain Boot/Trampoline Boot into Ubuntu 17 on PCIe SSD without BIOS support

I have a server-grade SSD (Intel DC P3520) that will eventually end up in a server. Until the server is deployed in the data center I want to conduct some tests in a desktop machine.

I have an older but still usable machine (AMD Phenom/ASUS M4A88T-M with 16GB) and was easily able to install Ubuntu 17, BUT obviously the board cannot boot from the PCIe SSD. As the Ubuntu installer could see and write to the SSD, it is safe to assume that Linux can handle the device.

Is there an (easy) way to somehow chain-boot from a "Helper" USB stick to the Ubuntu installation on the PCIe SSD? That is, boot from USB into a trampoline system (GRUB/Linux/Bootmanager) that has drivers for the PCIe SSD and loads the kernel just as GRUB would.

I am explicitly not asking for:

How to make an outdated BIOS boot directly from PCIe devices

How to use the PCIe SSD as a data drive

Advice to buy new hardware

The goal is to evaluate the performance gain of this server-grade SSD compared to SATA or M.2 SSDs for specific desktop applications.
Thanks for reading!

Windows Implicit Deny Permissions

Is there any way to set up implicit deny for Windows folder permissions?
I would like to set up our accounting folders so that they can only be accessed by the administrators group and the accounting group, and are denied to all other users (even ones that are created later). If possible, I want to avoid having to put all the users that don't need access in a group: I don't want to accidentally miss someone when creating the group, and I don't want to have to remember to put new users in the group.

We are using Windows Small Business Server 2011.

Thanks in advance
Branson

Friday, July 22, 2016

linux - XFS: no space left on device, (but I have 850GB available!)

I'm using a combination of mdadm, lvm2, and XFS on Amazon EC2.

So far, I've had success running a RAID 5 volume built from a number of EBS volumes. The EBS volumes are attached and used with mdadm to create the RAID 5. Then, I use LVM to present the resulting RAID as a single physical volume and single logical volume.

In the past, I've been able to grow the file system by adding a new EBS volume, attaching it, and then running the following process.


mdadm --add /dev/md0 /dev/xvdi

# grow the raid... (could take a while for a large disk!)
mdadm --grow /dev/md0 --raid-devices=4

# grow the LVM physical volume
pvresize /dev/md0

# grow the LVM logical volume ... fairly certain
# -l100%PVS will make the extents use as much space
# as possible on physical disks (and hopefully won't overwrite anything)
lvresize -l100%PVS /dev/data_vg/data_lv

# resize the file system, too!
xfs_growfs /dev/data_vg/data_lv

# yay!
df -h

My most recent attempt at doing this ostensibly worked just fine. Running df -ih/fh shows that I have a mounted filesystem with an additional terabyte available, as expected. The total number of inodes used is only 1%, and pvdisplay and lvdisplay show the proper volume sizes.

I've even been able to add some data (about 150GB) to the volume since growing it. However, today I attempted to create a directory and got

mkdir: no space left on device

Why would I encounter this problem if I allegedly have plenty of inodes available?

I've unmounted the disk and run xfs_check, but that does not report any issues.

Thanks!

Answer

I was able to resolve the issue in the following way:

umount [the mountpoint]
mount /dev/data_vg/data_lv -o inode64 [the mountpoint]

Apparently, by default (32-bit inodes?) XFS will store all inodes in the first 1TB portion of the disk. This means that if the first 1TB is full, you'll run into "no space left on device" errors even if it appears you have plenty of space and inodes available. With the inode64 option, inodes can be stored anywhere on disk, if I understand correctly.

Source: the XFS FAQ.

Shibboleth does not pass attribute to server variable in PHP

I am building a SAML based federated authentication mechanism in which the IdP is ADFS 2.0 and the SP is Shibboleth running on Linux. I am able to do the following:

Attempt to access a protected page, which redirects me to the IdP login page.

Browse to spserver.internal/Shibboleth.sso/Session and see the returned attributes, including eppn.

I am, however, unable to extract the eppn attribute in the form of the REMOTE_USER header in PHP.

I have disabled attribute-policy.xml (commented it out in shibboleth2.xml).

I am missing something trivial, I suspect, but for the life of me I don't know what. Either PHP is not picking up the server variables set by Shibboleth or Shibboleth is never setting them. Any ideas?

Output from spserver.internal/Shibboleth.sso/Session

Miscellaneous
Session Expiration (barring inactivity): 479 minute(s)
Client Address: a.b.c.d
SSO Protocol: urn:oasis:names:tc:SAML:2.0:protocol
Identity Provider: http://veragence.thesixthflag.com/adfs/services/trust
Authentication Time: 2014-10-28T11:55:23.947Z
Authentication Context Class: urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport
Authentication Context Decl: (none)

Attributes
eppn: user.id@adfs.idp.server

Relevant line from shibboleth2.xml:

    
                     REMOTE_USER="eppn persistent-id targeted-id">

Relevant line from attribute-map.xml

Relevant output from shibd.log

2014-10-28 11:55:21 DEBUG Shibboleth.SSO.SAML2 [2]: extracting issuer from SAML 2.0 assertion
2014-10-28 11:55:21 DEBUG OpenSAML.SecurityPolicyRule.MessageFlow [2]: evaluating message flow policy (replay checking on, expiration 60)
2014-10-28 11:55:21 DEBUG XMLTooling.StorageService [2]: inserted record (_06157709-48ab-4701-90b2-b3ecea5df51f) in context (MessageFlow) with expiration (1414497564)
2014-10-28 11:55:21 DEBUG OpenSAML.SecurityPolicyRule.XMLSigning [2]: validating signature profile
2014-10-28 11:55:21 DEBUG XMLTooling.TrustEngine.ExplicitKey [2]: attempting to validate signature with the peer's credentials
2014-10-28 11:55:21 DEBUG XMLTooling.TrustEngine.ExplicitKey [2]: signature validated with credential
2014-10-28 11:55:21 DEBUG OpenSAML.SecurityPolicyRule.XMLSigning [2]: signature verified against message issuer
2014-10-28 11:55:21 DEBUG OpenSAML.SecurityPolicyRule.BearerConfirmation [2]: assertion satisfied bearer confirmation requirements
2014-10-28 11:55:21 DEBUG Shibboleth.SSO.SAML2 [2]: SSO profile processing completed successfully
2014-10-28 11:55:21 DEBUG Shibboleth.SSO.SAML2 [2]: extracting pushed attributes...
2014-10-28 11:55:21 DEBUG Shibboleth.AttributeExtractor.XML [2]: unable to extract attributes, unknown XML object type: samlp:Response
2014-10-28 11:55:21 DEBUG Shibboleth.AttributeExtractor.XML [2]: unable to extract attributes, unknown XML object type: {urn:oasis:names:tc:SAML:2.0:assertion}AuthnStatement
2014-10-28 11:55:21 INFO Shibboleth.AttributeExtractor.XML [2]: skipping unmapped SAML 2.0 Attribute with Name: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/upn, Format:urn:oasis:names:tc:SAML:2.0:attrname-format:unspecified
2014-10-28 11:55:21 DEBUG Shibboleth.AttributeDecoder.Scoped [2]: decoding ScopedAttribute (eppn) from SAML 2 Attribute (urn:oid:1.3.6.1.4.1.5923.1.1.1.6) with 1 value(s)
2014-10-28 11:55:21 DEBUG Shibboleth.SSO.SAML2 [2]: resolving attributes...
2014-10-28 11:55:21 DEBUG Shibboleth.AttributeResolver.Query [2]: found AttributeStatement in input to new session, skipping query
2014-10-28 11:55:21 DEBUG Shibboleth.SessionCache [2]: creating new session
2014-10-28 11:55:21 DEBUG Shibboleth.SessionCache [2]: storing new session..

Thursday, July 21, 2016

centos - 2 linux boxes, proxy and ssh tunnel

Problem:
I need to create SSH forwarding to another Linux box that works as a proxy.

I have two Linux boxes (CentOS 5.5): one in the office (server1) behind a firewall, the other at a colocation facility (server2).

server1 has a Squid proxy installed on port 3128.

I can't use server1 as a direct proxy from home because it's behind the firewall.

I was able to create an SSH tunnel from server1 to server2, and when I log in to server2 I can run ssh root@localhost -p 12312 to reach server1.

What I need is to configure server2 so that it forwards port server2:3128 to server1:3128.

Then I could add server2's IP address and port to Firefox's proxy settings and access the office network.

Thanks

linux - Unable to connect to my own system through SSH

My Fedora system is connected to the internet through a proxy server and we have IPs assigned to every system connected via LAN - mine has 192.168.0.103 (by the way, what is this kind of IP called? the technical term? Anybody). I was trying to setup smartsvn and found that SSH was stopped which is why it was not working. You may check my previous question SmartSVN - Unable to create new repository profile.

sshd was stopped on my system. Trying ssh root@192.168.0.103 returned Connection refused. Then the following things happened (I don't remember the exact sequence):

I did service sshd start and then I got password prompt on trying to ssh.

I entered the correct password of the root user, but it kept denying access, saying: Permission denied, please try again.

I probably restarted sshd and it stopped asking for password on doing ssh root@192.168.0.103 and kept showing ssh_exchange_identification: Connection closed by remote host instead.

I checked this solution ssh_exchange_identification: Connection closed by remote host and found that -

my IP was present in /etc/hosts.deny - sshd: 192.168.0.103

There were failed login attempts in /var/log/secure.

So I deleted these things from both these files. After that, ssh root@192.168.0.103 prompted for password again but again the same problem. Entering correct password says - Permission denied, please try again.

In that file it is written

> This file describes the names of the
> hosts which are
> #     *not* allowed to use the local INET services, as decided
> #     by the '/usr/sbin/tcpd' server.

But that file cannot be viewed in text. Seems like some more setting needs to be corrected where it is set to disallow this IP for SSH connection. What do I need to fix?

I tried ssh connection from other systems connected via LAN. Permission denied to them too. I logged out and logged in after doing those file changes, restarted sshd and confirmed that those two files do not contain any such thing now.

But it is still not working. What am I missing? Any pointers?

Thanks,
Sandeepan

Answer

Logging into a system as root is generally considered to be a bad thing. You will probably find that sshd is denying root logins. Check /etc/ssh/sshd_config for the line

PermitRootLogin no

Changing no to yes and restarting sshd would allow root to log in. This is however a bad idea. You should connect as a normal user and use sudo or su to perform administrative tasks.

The IP address you have is an address from one of the private address blocks.
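As a quick check of that last point, the private IPv4 blocks reserved by RFC 1918 (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16) can be tested with Python's standard ipaddress module. This is a small sketch; private_block is a hypothetical helper name.

```python
import ipaddress

# The three private IPv4 ranges defined in RFC 1918.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def private_block(text):
    """Return the RFC 1918 block containing the address, or None."""
    addr = ipaddress.ip_address(text)
    for network in PRIVATE_BLOCKS:
        if addr in network:
            return network
    return None


if __name__ == "__main__":
    print(private_block("192.168.0.103"))
```

The address from the question, 192.168.0.103, falls inside 192.168.0.0/16, so it is not routable on the public Internet and is only reachable from the LAN.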

mysql - Is it safe to use up all memory on linux server, not leaving anything for the cache?

I have a CentOS server fully dedicated to MySQL 5.5 (with mostly InnoDB tables). The server has 32 GB RAM and SSD disks, and average memory usage looks like this:

free

So about 25GB is in use and about 6.5GB is cached. I am experiencing performance problems with WRITE queries, so I was thinking, is this the optimal cache size? I might increase innodb buffer size, so that linux cache would become smaller, or decrease it, so it would be bigger.

What is the optimal used/cached memory balance for busy MySQL server on linux?

Wednesday, July 20, 2016

Where did my memory go on linux (no cache/slab/shm/ipcs)

This is a headless server with 8GB RAM (kernel 3.12)... even after only a few days, i get low on memory. in fact, this server has OOMed a few days ago... something is losing memory, but i don't know where...

see the output below:

in short:

64bit system & OS

not a hypervisor nor a virtual machine

low free mem

swap in use

low cache

low buffer

inactive+active == 1GB ???

low ipcs

low shm

low slab

~500MB tmpfs usage

in fact total RSS of all processes is 262MB

and HWM of all processes is less than 600MB

i lost more than 6GB somewhere...?


[root@localhost ~]# cat /proc/meminfo 
MemTotal:        8186440 kB
MemFree:          251188 kB
Buffers:             144 kB
Cached:           853548 kB
SwapCached:         9988 kB
Active:           480036 kB
Inactive:         529456 kB
Active(anon):     256196 kB
Inactive(anon):   333072 kB
Active(file):     223840 kB
Inactive(file):   196384 kB
Unevictable:       13656 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        4092540 kB
Dirty:               356 kB
Writeback:             0 kB
AnonPages:        161576 kB
Mapped:            50116 kB
Shmem:            419812 kB
Slab:              72680 kB
SReclaimable:      50648 kB
SUnreclaim:        22032 kB
KernelStack:        1824 kB
PageTables:        10260 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8287520 kB
Committed_AS:    1883404 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       91804 kB
VmallocChunk:   34359637332 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       83180 kB
DirectMap2M:     8296448 kB

[root@localhost ~]# ipcs -m 


------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x01123bac 0          root       600        1000       8                       

[root@localhost ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           4.0G  393M  3.6G  10% /run

[root@localhost ~]# for i in /proc/*/status ; do grep VmRSS $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
262.375

[root@localhost ~]# for i in /proc/*/status ; do grep VmHWM $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
526.77

Edit: i've set overcommit=2 (disabled) just in case (i rebooted 2 days ago)



[root@localhost linux]# cat /proc/sys/vm/overcommit_memory 
2
[root@localhost linux]# df -h | grep tmpfs
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           4.0G     0  4.0G   0% /dev/shm
tmpfs           4.0G  532K  4.0G   1% /run
tmpfs           4.0G     0  4.0G   0% /sys/fs/cgroup
tmpfs           4.0G     0  4.0G   0% /tmp
tmpfs           4.0G  532K  4.0G   1% /var/spool/postfix/run/saslauthd
[root@localhost linux]# for i in /proc/*/status ; do grep VmRSS $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
434.188
[root@localhost linux]# for i in /proc/*/status ; do grep VmHWM $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
545.551
[root@localhost linux]# cat /proc/meminfo 
MemTotal:        8186440 kB
MemFree:          146576 kB
Buffers:            1728 kB
Cached:          5212588 kB
SwapCached:            0 kB
Active:          2560112 kB
Inactive:        2874464 kB
Active(anon):      94464 kB
Inactive(anon):   136528 kB
Active(file):    2465648 kB
Inactive(file):  2737936 kB
Unevictable:        9772 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        4194300 kB
Dirty:              1436 kB
Writeback:             0 kB
AnonPages:        230032 kB
Mapped:            50540 kB
Shmem:               960 kB
Slab:             316804 kB
SReclaimable:     291712 kB
SUnreclaim:        25092 kB
KernelStack:        1880 kB
PageTables:        11184 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8287520 kB
Committed_AS:    1160812 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       91676 kB
VmallocChunk:   34359582672 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       91372 kB
DirectMap2M:     8288256 kB

so, i'm using 8GB:

5GB is cached

0.5MB tmpfs

450MB RSS

~1GB slab+pages+whatever (in meminfo)

i'm still short 1.5GB ... is this a kernel leak? or what is going on here???

Edit2: i have the same issue on another atom board

I also checked if kmemleak saw something, but nothing... i'm out of ideas...

Edit3: updating to kernel 3.17.2 seems to have resolved this issue, but i still don't know how to trace these memory leaks...

Answer

lkml thinks that it might have been https://lkml.org/lkml/2014/10/15/447 , but that patch wasn't in 3.17.2 and the thp allocation don't point that way

However, /proc/kpageflags might show which part allocated which pages, so that might help. tools/vm/page-types.c in the kernel sources may hold some information on the structure of the kpageflags binary output.
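The bookkeeping done by hand in the question can be reproduced with a short script. This is a rough sketch: the set of counters summed here is an assumption about which /proc/meminfo fields do not overlap (Cached already includes Shmem/tmpfs, so Shmem is not added again), and the sample values are copied from the first dump in the question.

```python
# Counters treated as non-overlapping for this estimate (an assumption).
ACCOUNTED = ("MemFree", "Buffers", "Cached", "AnonPages",
             "Slab", "KernelStack", "PageTables")

SAMPLE = """\
MemTotal:        8186440 kB
MemFree:          251188 kB
Buffers:             144 kB
Cached:           853548 kB
AnonPages:        161576 kB
Shmem:            419812 kB
Slab:              72680 kB
KernelStack:        1824 kB
PageTables:        10260 kB
"""


def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integers (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def unaccounted_kb(values):
    """MemTotal minus the counters listed in ACCOUNTED."""
    return values["MemTotal"] - sum(values.get(key, 0) for key in ACCOUNTED)


if __name__ == "__main__":
    missing = unaccounted_kb(parse_meminfo(SAMPLE))
    print(missing, "kB unaccounted")  # 6835220 kB for the sample above
```

For the first dump this leaves roughly 6.5 GiB that none of the listed counters explain, which matches the "more than 6GB" the question reports as lost.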

iptables - Configuring a port setting on Linux server

I'm trying to allow Internet traffic to port 7778 on my server, but am unable to do it correctly. Probably making some rookie mistake here. Can you help me diagnose and solve the issue?

I simply did the following:

sudo iptables -A TCP -p tcp -m tcp --dport 7778 -j ACCEPT

If I do iptables -S, I do see the rule appended in the list, e.g.:

-A TCP -p tcp -m tcp --dport 22 -j ACCEPT
-A TCP -p tcp -m tcp --dport 80 -j ACCEPT
-A TCP -p tcp -m tcp --dport 443 -j ACCEPT
-A TCP -p tcp -m tcp --dport 7778 -j ACCEPT

However, if I ping this particular port from another server - telnet example.com 7778, I see:

telnet: Unable to connect to remote host: Connection refused

What else can I do here? Port 80, 443 and 22 are working correctly FYI.

Note: my server uses Azure infrastructure (classic VM). An extra step I took was adding an endpoint for port 7778 in the Azure portal. Thus this part is covered.

Answer

By using the -A switch you have added your rule to the end of the chain. This will almost certainly have placed it after the rule that drops/blocks packets.

When iptables/netfilter checks how a packet should be handled, the first matching rule wins. In your case it will likely match a line like -A INPUT -j REJECT --reject-with icmp-port-unreachable, which causes a Connection refused message before your ACCEPT rule is ever reached.

The solution is to insert the rule using -I at a suitable place in your INPUT chain.
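The difference between appending and inserting can be illustrated with a toy model of first-match evaluation. This is a simplified sketch, not how netfilter is implemented: it only models terminating targets, the rule set is a made-up example with a catch-all REJECT at the end, and the default policy of ACCEPT is an assumption.

```python
def dport(port):
    """Build a matcher for packets addressed to the given TCP port."""
    return lambda packet: packet["dport"] == port


def anything(packet):
    """Matcher for a catch-all rule with no conditions."""
    return True


def verdict(chain, packet, policy="ACCEPT"):
    """Walk the chain in order; the first matching rule decides."""
    for matches, target in chain:
        if matches(packet):
            return target
    return policy  # assumed default policy when nothing matches


# Hypothetical chain: two allowed ports followed by a catch-all REJECT.
base = [(dport(22), "ACCEPT"), (dport(80), "ACCEPT"), (anything, "REJECT")]
appended = base + [(dport(7778), "ACCEPT")]   # like iptables -A
inserted = [(dport(7778), "ACCEPT")] + base   # like iptables -I

if __name__ == "__main__":
    packet = {"dport": 7778}
    print("appended:", verdict(appended, packet))
    print("inserted:", verdict(inserted, packet))
```

With the rule appended, the catch-all REJECT matches first and the ACCEPT for port 7778 is never reached; with the rule inserted ahead of it, the packet is accepted.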

Can someone explain to me the apache log "[info] server seems busy"?

This is the log:

[info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 17 idle, and 37 total children

[info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 18 idle, and 37 total children

[info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 18 idle, and 37 total children

[info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 19 idle, and 37 total children

[info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 18 idle, and 38 total children

[info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 15 idle, and 39 total children

The message isn't very clear to me and apache2 documentation doesn't help either.

What are those children, why is it spawning 8, what are idle children and what are total children? What is it trying to tell me?

Tuesday, July 19, 2016

Does SSL need to be configured at the external load balancer if you are using one?

Using Elastic Load Balancer, it is easy enough to set up SSL at the external load balancer, and serve requests up as http to the application.

When running a single server, it is also possible to configure SSL at the web server (Tomcat) or application (Spring).

When running with the load balancer, is it necessary to pull SSL up to the level of the load balancer? Is there an element of state to the SSL connection that will be lost by forwarding the still-encrypted traffic?

Answer

I'm afraid that the answer will be "it depends". I had occasion to dig into it a bit when I was working with internet banking security. This was a couple of years ago, though, so it's entirely possible that someone else will come up with something that I forgot or that's changed during those years.

Disadvantages with terminating SSL at the load balancer

First, it's possible to lose state by doing this - if you have an application running that requires some SSL headers to keep state. If so, you are likely to lose that information (though you may be able to configure your load balancers to forward it in some manner). One example could be if you're using client certificates for authentication.

Second, as Bazze says, it does make your traffic vulnerable to eavesdropping on the local network. How much of a danger this is will of course depend on what your network looks like, and also what kind of traffic it is.

Advantages

You're reducing the load on your webservers, since they no longer need to spend their resources on decryption and encryption.

When you change your webserver configuration, you can do a simple apache reload without having to enter your SSL key password. This means that you can automate it, enabling continuous deployment and devops and all the buzzwords. (The flip-side to this is that changing your LB configuration may require you to enter the password, but as a rule that's something you don't do as often as fiddling with apache config...)

Troubleshooting your webserver and application just got a whole lot easier, since you can now snoop/tcpdump the incoming traffic directly.

You have fewer places to deal with SSL bugs/security holes. It's usually a lot easier to change SSL settings on one LB than on a large number of webservers - especially if those servers are managed by lots of different people from different departments.

Auditing SSL is also a lot easier when there's only one place to audit.

It's a lot easier to keep track of what certificates are in use and when they need to be renewed when they're all in one place. You no longer have the issue of Bob having ordered a cert and put his personal email address in the system for reminders, and then quitting or getting fired so that the reminder bounces and the cert expires and suddenly you have a lot of upset people demanding that it get fixed Right Now! (Not that that has ever happened anyplace I've worked! cough)
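With all certificates behind one endpoint, the renewal reminder can come from a small script instead of from Bob's inbox. This is a minimal sketch using only the Python standard library. The function names are hypothetical, and the host you pass in would be whatever name your load balancer answers on.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now_epoch=None):
    """Days left before a certificate's notAfter timestamp."""
    if now_epoch is None:
        now_epoch = time.time()
    # cert_time_to_seconds parses the "Jan  5 09:34:43 2018 GMT" format
    # that getpeercert() returns in the notAfter field.
    return (ssl.cert_time_to_seconds(not_after) - now_epoch) / 86400.0


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host:port and return the served certificate's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The handshake fails for an already expired certificate,
        # which is itself a useful signal.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Run from cron, something like `days_until_expiry(fetch_not_after(host)) < 30` could trigger an alert to a shared mailbox.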

Conclusion

Whether it's a good idea to terminate at the LB or not will depend on how you value the various advantages and disadvantages. As a rule, I'd say that unless there is a good reason to do otherwise, you'll want to remove complexity as early as possible - at the network border if that's reasonable from a security and usability standpoint, or as soon thereafter as possible.

Best practice to migrate Server 2003 x64 to Server 2008 x64

We have 2 Domain Controllers running Windows Server 2003 R2 x64. DC1 has the FSMO roles and DC2 acts as a 'secondary' DC. Both are Global Catalog servers.

DC1 has these roles installed: Domain Controller, DNS Server, Application Server (IIS), File Server and Print Server.
DC2 has these roles installed: Domain Controller, DNS Server, Application Server (IIS) and File Server.

There is also a trust relationship between the current domain of these two DCs and a different one which is located in the same location.

What we want to achieve is to upgrade both servers to Windows Server 2008 R2 with a clean install process on both machines, not a direct upgrade from 2003 R2 to 2008 R2.

I am looking for the best practice to achieve that goal. Can you please help me understand the best way to achieve that?

Which server do I have to upgrade first? How can I preserve all of the settings and data from each role?

networking - Unable to use FQDN when using local network (Ubuntu Server 10.04, dd-wrt)

Networking newb here... I seem to be having an odd networking issue.

When I am using a PC/laptop/Smart phone on my local network I am unable to access web pages (or email) on my network server using the FQDN.

Here's the set-up:

Domain name provider that points to the static IP address e.g. example.com points to 222.111.111.001

Fixed IP router address: e.g. 222.111.111.001

dd-wrt router: 192.168.1.1 It has port-forwarding to the Ubuntu server which seems to work as it serves pages, etc to non-local ip addresses. Local machines are issued ip address via DHCP in the 192.168.1.100 to 149 range.

Ubuntu 10.04 server on 192.168.1.150

Clients include Linux mint machines, Android phone and get addresses such as 192.168.1.104

If I am on a client on the local network ( 192.168.1.104) and I try to navigate to a webpage on example.com/index.htm then the request times out.
The same sort of thing applies to email - If I am connected on the local network (wirelessly) then I cannot access the IMAP and SMTP servers using mail.example.com

The situation is fine if I am using a non-local network (e.g. my Vodafone mobile network). The device will successfully load example.com/index.htm
The situation is also ok if I navigate to 192.168.1.150/index.htm

Any thoughts on how to troubleshoot this one?
It's obviously a little annoying...

Cheers.

Monday, July 18, 2016

linux - Fedora 13 post kernel/security update boot problem

About a month ago I installed a security update that included a new kernel (2.6.34.x, up from 2.6.33.x) on Fedora 13; this is when the problem occurred for the first time.

After the install the computer would not boot at all: black screen without any visible hard drive activity (I gave it a good 30 minutes on the black screen before taking action)... I popped in the installation DVD and went into rescue mode to change the boot option back to the old kernel (just a guess at where the problem was). After the restart the computer loaded just fine, but it took a long time to start because an SELinux targeted policy relabel was required ("Relabeling could take very long time depending on file size"). I assumed that the update got messed up somehow and continued working with the modified boot option.

Couple of days ago, there was another kernel update. I installed it and same problem as before. This rules out corrupted update theory... black screen right after 'BIOS' screen before OS gets loaded. I had to rescue system again... Below is copy of my grub.conf file. I am fairly new to LINUX (couple of years of experience), mostly development and basic config... nothing crazy.

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/mapper/vg_obalyuk-lv_root
#          initrd /initrd-[generic-]version.img

#boot=/dev/sda
default=2
timeout=0

splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Fedora (2.6.34.6-54.fc13.i686.PAE)
    root (hd0,0)
    kernel /vmlinuz-2.6.34.6-54.fc13.i686.PAE ro root=/dev/mapper/vg_obalyuk-lv_root rd_LVM_LV=vg_obalyuk/lv_root rd_LVM_LV=vg_obalyuk/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet
    initrd /initramfs-2.6.34.6-54.fc13.i686.PAE.img

title Fedora (2.6.34.6-47.fc13.i686.PAE)
    root (hd0,0)
    kernel /vmlinuz-2.6.34.6-47.fc13.i686.PAE ro root=/dev/mapper/vg_obalyuk-lv_root rd_LVM_LV=vg_obalyuk/lv_root rd_LVM_LV=vg_obalyuk/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet
    initrd /initramfs-2.6.34.6-47.fc13.i686.PAE.img
title Fedora (2.6.33.8-149.fc13.i686.PAE)
    root (hd0,0)
    kernel /vmlinuz-2.6.33.8-149.fc13.i686.PAE ro root=/dev/mapper/vg_obalyuk-lv_root rd_LVM_LV=vg_obalyuk/lv_root rd_LVM_LV=vg_obalyuk/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet
    initrd /initramfs-2.6.33.8-149.fc13.i686.PAE.img

I like my system to be up to date... Let me know if I can post any other files that can be of help.

Has anyone else had this problem? Does anyone have any ideas how to fix it?

p.s. Anything helps, you ppl are great! thx for ur time.

Answer

Remove rhgb quiet from the boot command line and start in text mode (runlevel 3) to have a better idea of what's going on. Then consider reporting a bug.

P.S. It seems that someone else has already reported this bug.
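As a sketch of the edit suggested above, the following Python helper rewrites a single `kernel` line from grub.conf: it drops `rhgb` and `quiet` and appends `3` so the system boots to text mode. The function name is hypothetical. The same change can be made by hand in an editor or at the GRUB menu.

```python
def debug_kernel_line(line, runlevel="3"):
    """Strip rhgb/quiet from a grub.conf kernel line and append a runlevel."""
    tokens = line.split()
    # Drop the options that hide boot messages.
    kept = [tok for tok in tokens if tok not in ("rhgb", "quiet")]
    # A bare number on the kernel line selects the runlevel at boot.
    if runlevel not in kept:
        kept.append(runlevel)
    # Preserve the original indentation of the line.
    indent = line[: len(line) - len(line.lstrip())]
    return indent + " ".join(kept)


if __name__ == "__main__":
    sample = "    kernel /vmlinuz-2.6.34.6-54.fc13.i686.PAE ro KEYTABLE=us rhgb quiet"
    print(debug_kernel_line(sample))
```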

debian - Apache + LDAP Auth: access to / failed, reason: require directives present and no Authoritative handler

Can't solve this one, here's my .htaccess:


AuthPAM_Enabled Off

AuthType Basic
AuthBasicProvider ldap
AuthzLDAPAuthoritative on

AuthName "MESSAGE"
Require ldap-group cn=CHANGED, cn=CHANGED

AuthLDAPURL "ldap://localhost/dc=CHANGED,dc=CHANGED?uid?sub?(objectClass=posixAccount)"
AuthLDAPBindDN CHANGED
AuthLDAPBindPassword CHANGED
AuthLDAPGroupAttribute memberUid

AuthLDAPURL is correct, BindDN and BindPassword are correct also (verified with ldapvi -D ..).

Apache version: Apache/2.2.9 (Debian)

The error message seems cryptic to me, I have AuthzLDAPAuthoritative on so where's the problem.

EDIT:

LDAP modules are loaded, the problem is not with them being missing.



# ls /etc/apache2/mods-enabled/*ldap*
/etc/apache2/mods-enabled/authnz_ldap.load  /etc/apache2/mods-enabled/ldap.load

EDIT2:

Solved it by replacing the funky


Require ldap-group cn=CHANGED, cn=CHANGED

line with


Require valid-user

Since AuthzLDAPAuthoritative is on, no other auth methods will be used and valid-user requirement will auth via LDAP. (right? :/)

Answer

Your 'Require' line reads

Require ldap-group cn=CHANGED, cn=CHANGED

That doesn't look right - I don't believe you can have two cn's in a DN like that.

Sunday, July 17, 2016

windows server 2003 - Intranet with local DNS resolution issues

I have a DNS server running on Windows Server 2003 that is configured as the primary DNS server for my intranet. I have several DNS entries for our QA server and other local addresses set up there. The secondary DNS server we use is the first DNS server from our hosting provider. All computers are some flavor of Windows (mostly WinXP and Win7) and use DHCP to get their IP addresses and DNS information from our router. All local domains end in the suffix .local.

With this setup, we're having an issue where sometimes browsers will not resolve local addresses correctly. For example, if I try to bring up www.somesite.myqaserver.local, sometimes the DNS will resolve correctly and give me the local address I'm looking for, and other times I'll get the hosting provider's error page. However, if I do an nslookup I'll always be able to resolve the expected local IP address from the DNS server.

Usually, when we get this error, we can fix it by restarting the dnscache (net stop dnscache/net start dnscache) but we're having to resort to that solution way more often than I'd like. Does anyone have any suggestions for how I can fix this problem permanently?

Answer

Configure all clients and servers to use the internal DNS server only. If you want to use the ISP DNS servers as forwarders for your DNS server you can.

hp - Smart Array P400 - Accelerator Replacement Battery Failure

TL;DR - Is the immediate failure of a replacement battery, for a failed battery, on a battery backed accelerator for a Smart Array P400 controller a common occurrence? Or are we likely to have a storage controller with an impending and critical fault?

We have a slightly confusing situation with a Smart Array P400 storage controller with the 512 MB battery backed accelerator add-on on an HP DL380 server.

The storage controller is (afaik) running the latest firmware and driver:

Model:  Smart Array P400 
Controller Status:  OK 

Firmware Version:  7.24 
Serial Number:  *snip* 
Rebuild Priority:  Medium 
Expand Priority:  Medium 
Number Of Ports:  2

The storage diagnostics (on the controller's boot-up screen, in the 'Management Homepage' and in the 'HP Array Diagnostic Utility') recently started showing the following fault status for the accelerator battery:

Accelerator

Status:  Temporarily Disabled 
Error Code:  Cache Disabled Low Batteries 
Serial Number:  *snip* 
Total Memory:  524288 KB 
Read Cache:  25% 
Write Cache:  75% 
Battery Status:  Failed 
Read Errors:  0 
Write Errors:  0

We replaced the battery with a new unit (a visual inspection of the P400 card showing nothing unusual) and saw the same fault - but expected this to disappear over the course of a few hours/days as it charged. This didn't happen, and the fault status remains the same as above.

Given the battery is a genuine part from HP, I wouldn't have expected a replacement battery to fail straight away, or to be dead-on-arrival (is that naivety on my part?).
Is the immediate failure of a replacement battery, for a failed battery, on a battery backed accelerator a common occurrence? Or are we likely to have a storage controller with an impending and critical fault?
Is there any diagnostic that could tell me more about the failed battery, without cracking the server open again?

Many thanks!

Answer

You're correct. Firmware version 7.24 is the current release.

If you have the downtime window, power the server off entirely (remove power cables), wait a few minutes, and power up again. See if that jump-starts the battery charge process.

But short of that, if your system is still under warranty (P400-equipped systems went away in 2009), call for another battery unit. Sometimes HP technicians grab components from the wrong bin at the parts depot.

HP isn't in the battery business much anymore... The two most recent generations of HP Smart Array controllers can use flash-backed cache, which eliminates the impact of cough battery malfunction.

HP can analyze the battery status using a dump from the HP Array Diagnostic Utility (HPADU), but for a disposable part, it's better for everyone involved to just try a new unit.

Finally, if you're experiencing an unbearable drop in performance because of the dead battery, you can override the cache disablement. Look for the No-Battery Write Cache: Enabled option.

switch - DHCP snooping prevents client server from receiving IP

The issue that I am experiencing is that DHCP replies can't be delivered from the DHCP server on sw1 to a DHCP client server on sw2 when DHCP snooping is enabled. Client servers on the same switch as the DHCP server are able to receive DHCP with no problem. Both switches are connected to a router through which traffic between them is transported. I have seen solutions for Cisco switches, but I am using Dell switches and I cannot find a solution to this. Oddly, disabling DHCP snooping on either of the switches fixes the problem. I've also enabled snooping trust on the DHCP server and client ports as well as on the uplink from sw1 to sw2. I am working with Dell S4810 switches. Any suggestions?

UPDATE:

To eliminate the possibility of router issue I did the same thing simply by connecting 2 switches of the same model and of the same configuration with each other and trusting all ports and uplinks for dhcp snooping. My client server is still unable to receive DHCP server reply packet. I tracked packet pathway between switches and noticed that client server is able to send DHCP request packet to the DHCP server which the server receives, however when the DHCP server tries to send a reply, the packet reaches the switch on which the client server resides but it never reaches the client server itself. It seems like the client switch drops the reply packet. With dhcp snooping disabled everything works in order.

Quickly changing Windows permissions for a huge directory tree?

I have a huge directory on an NTFS file-system
(i.e. a top-level directory containing tens or hundreds of millions of descendant nodes with the file nodes probably on average about three levels deep) that I need to change permissions for. In particular, I need to give a new user (or group) read-only access to absolutely everything in the directory tree.

The most obvious place to do this is in Windows Explorer by right-clicking the top-level directory and going to the security tab of the directory properties window. However, when trying the obvious things there Windows Explorer seems excited to recursively traverse the whole directory tree and try to modify something or other about the permissions of each node in the tree. This is extremely inefficient for such a large directory!

Can anyone offer any tips for changing permissions without this recursive descent? Do I need to click something particular in the GUI? Do I need to use command-line tools? Could this potentially be the result of a previous sysadmin doing something weird to the permissions in this directory?

I also need to enable network sharing and let the user/group mount the directory over the network. Haven't tried that yet, so I don't know if there will be a similar can of worms when I try to enable sharing.

This is on Windows 2008 Server if it matters.

EDIT: People are right that it probably makes more sense to give permission to a domain group rather than a particular account, so I've made note of this above (That's what I was doing anyway. I don't know why I specifically asked about adding a user in the original question. Sorry for the sloppiness). But of course adding a group to a folder's permissions list isn't any faster than adding a user (None of the existing groups are assigned read-only permissions).

Answer

In this case, there's no need to mess with the NTFS permissions.

Just create a Share to the top-level directory and add the users or groups to the share with Read-Only (or if you want Write) permission.

Even if Everyone has Full Control NTFS permissions on the top-level directory, the most restrictive permission (Share or NTFS) will be used.
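The "most restrictive wins" rule can be sketched in a few lines of Python. This is a deliberately simplified model that assumes permissions collapse to four ordered levels; real NTFS ACLs are more granular, and the names below are hypothetical.

```python
# Ordered from least to most access (a simplification of real ACLs).
LEVELS = ["none", "read", "change", "full"]


def effective_access(share, ntfs):
    """Access over the network is the more restrictive of share and NTFS."""
    return min(share, ntfs, key=LEVELS.index)


if __name__ == "__main__":
    # Read-only share on top of Full Control NTFS permissions.
    print(effective_access("read", "full"))
```

Note that this only applies to access through the share; anyone logged on locally is governed by the NTFS permissions alone.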

linux - High CPU utilization but low load average

We are running into a strange behavior where we see high CPU utilization but quite low load average.

The behavior is best illustrated by the following graphs from our monitoring system.

CPU usage and load

At about 11:57 the CPU utilization goes from 25% to 75%. The load average is not significantly changed.

We run servers with 12 cores with 2 hyper threads each. The OS sees this as 24 CPUs.

The CPU utilization data is collected by running /usr/bin/mpstat 60 1 each minute. The data for the all row and the %usr column is shown in the chart above. I am certain this does show the average per CPU data, not the "stacked" utilization. While we see 75% utilization in the chart we see a process showing to use about 2000% "stacked" CPU in top.

The load average figure is taken from /proc/loadavg each minute.

uname -a gives:

Linux ab04 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

Linux dist is Red Hat Enterprise Linux Server release 6.3 (Santiago)

We run a couple of Java web applications under fairly heavy load on the machines, think 100 requests/s per machine.

If I interpret the CPU utilization data correctly, when we have 75% CPU utilization it means that our CPUs are executing a process 75% of the time, on average. However, if our CPUs are busy 75% of the time, shouldn't we see higher load average? How could the CPUs be 75% busy while we only have 2-4 jobs in the run queue?

Are we interpreting our data correctly? What can cause this behavior?
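For reference, the relation between the per-CPU average reported by mpstat and the "stacked" figure shown by top is simple arithmetic. The helper names in this Python sketch are hypothetical; the figures come from the question.

```python
def stacked_percent(avg_percent, logical_cpus):
    """Convert an average-per-CPU percentage to top's stacked percentage."""
    return avg_percent * logical_cpus


def average_percent(stacked, logical_cpus):
    """Convert top's stacked percentage back to an average per CPU."""
    return stacked / logical_cpus


if __name__ == "__main__":
    # 75% average across 24 logical CPUs.
    print(stacked_percent(75, 24))
    # About 2000% stacked in top, spread over 24 logical CPUs.
    print(round(average_percent(2000, 24), 1))
```

So 75% on 24 logical CPUs corresponds to 1800% stacked, which is roughly consistent with the 2000% seen in top.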

Answer

While Matthew Ife's answer was very helpful and led us in the right direction, it was not exactly what caused the behavior in our case. We have a multi-threaded Java application that uses thread pooling, which is why no work goes into creating the actual tasks.

However, the actual work the threads do is short-lived and includes IO waits or synchronization waits. As Matthew mentions in his answer, the load average is sampled by the OS, so short-lived tasks can be missed.

I made a Java program that reproduced the behavior. The following Java class generates a CPU utilization of 28% (650% stacked) on one of our servers. While doing this, the load average is about 1.3. The key here is the sleep() inside the thread, without it the load calculation is correct.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class MultiThreadLoad {

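    // 200 worker threads; when the queue is full the submitting thread runs the task itself.
    private ThreadPoolExecutor e = new ThreadPoolExecutor(200, 200, 0L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<Runnable>(1000), new ThreadPoolExecutor.CallerRunsPolicy());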

    public void load() {
        while (true) {
            e.execute(new Runnable() {

                @Override
                public void run() {
                    sleep100Ms();
                    // Short burst of CPU work after the sleep.
                    for (long i = 0; i < 5000000L; i++)
                        ;
                }

                private void sleep100Ms() {
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                    }
                }

            });
        }
    }

    public static void main(String[] args) {
        new MultiThreadLoad().load();
    }

}

To summarize, the theory is that the threads in our applications idle a lot and then perform short-lived work, which is why the tasks are not correctly sampled by the load average calculation.
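The sampling effect can be sketched in Python. Linux updates the 1-minute load average roughly every 5 seconds as an exponentially decaying average of the number of tasks that are runnable (or in uninterruptible sleep) at that instant. Threads inside Thread.sleep() are not counted. This is a simplified floating-point model of that update (the kernel uses fixed-point arithmetic), and the function names are hypothetical.

```python
import math


def update_load(load, runnable, interval_s=5.0, window_s=60.0):
    """One sampling step of the exponentially decaying load average."""
    decay = math.exp(-interval_s / window_s)
    return load * decay + runnable * (1.0 - decay)


def simulate(samples, load=0.0):
    """Feed a sequence of 'tasks runnable at sample time' counts."""
    for runnable in samples:
        load = update_load(load, runnable)
    return load


if __name__ == "__main__":
    # If each sample happens to catch only 2 runnable threads, the load
    # average settles near 2 regardless of CPU burned between samples.
    print(round(simulate([2] * 500), 3))
```

Only what is runnable at the sampling instants enters the average, so CPU time consumed in short bursts between samples does not raise the figure.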