Sunday, September 30, 2018

How to calculate the storage working set for sizing the SSD/Flash tier of tiered storage?

How would you calculate the size of the working set of storage that would live in an SSD/Flash tier of a tiered/hybrid storage solution? This would be used to gauge the size of the SSD/Flash tier.



By tiered/hybrid I mean something (e.g. a storage array) that presents storage made up of different tiers/types of disks such as SSD/Flash, SAS, NL-SAS etc. The 'something' then moves data around between the tiers of disk based on how active it is. More active data moves up to SSD/Flash, and colder data moves down to slower tiers.

Friday, September 28, 2018

solaris - ZFS Snapshots (Missing Incremental)

Assume two Zpools on Solaris 11.3. Pool A and Pool B.



Pool A contains PoolA/FileSystem1@Snap3




Pool B contains PoolB/FileSystem1@Snap1 and PoolB/FileSystem1@Snap2



Is there a way to update Pool B with FileSystem1@Snap3?
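For reference, incremental replication with zfs send/receive requires a snapshot that exists on both pools; a hedged sketch, assuming a common snapshot (say Snap2) were still present on Pool A:

# incremental: send only the changes between the common snapshot and Snap3
zfs send -i PoolA/FileSystem1@Snap2 PoolA/FileSystem1@Snap3 | zfs receive PoolB/FileSystem1
# without any common snapshot, only a full stream is possible (received here into a new dataset)
# zfs send PoolA/FileSystem1@Snap3 | zfs receive PoolB/FileSystem1_new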

Thursday, September 27, 2018

apache 2.2 - Server problems, running out of RAM, really high load average



I desperately need help in figuring out how to troubleshoot this problem I'm having. I run a fairly mission-critical web server (Debian 7.5, 512MB RAM, 512MB swap, Apache, MySQL). It runs a couple of WordPress sites. Today I found the websites responding quite slowly, and ssh'd in to find that the load average was just above 10.0, RAM use was at 100%, and swap was close to the 512MB limit.



I have no idea how to figure out what's going on. Is Apache or MySQL not tuned properly? Maybe someone is attacking the server with repeated hits (how would I know?). I installed htop but even if I saw that Apache or MySQL was eating up a ton of resources, how would I figure out why?



Any pointers in the right direction would be massively appreciated. I'm at a loss here and I have to keep this server stable.




Side note: The server was up for 30 days, so maybe this was some sort of slow leak. Now that I've rebooted, load average is at 0.45 1.10 0.88, RAM is 165/512MB and swap is 68/512MB.



UPDATE: Still having issues. I captured a screenshot of htop below.



[htop screenshot]


Answer



Congratulations, you've managed to use nearly all of your swap space.



The first obvious problem here is that you went very deep into swap. This is probably what's causing the system to thrash so hard (lots of time spent in system, I/O wait and software interrupts).




First thing to do is to reduce the number of Apache processes that are running. You don't need that many for a small site, and it's just going to throw you deep into swap and kill your performance...which is what already happened. I would recommend you start very small and increase when it becomes necessary. An example:



StartServers    1
MinSpareServers 1
MaxSpareServers 2
MaxClients      5


This limits you to only serving 5 simultaneous requests (everyone else has to wait in line). If at this point you get warnings from Apache about running out of servers, and you still have RAM to spare, then you can increase them, but you are eventually going to run into a point where your VPS simply hasn't got enough RAM to handle all your traffic. At that point you should upgrade the VPS.
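To pick sane values for these directives, it helps to know how much memory each Apache child actually uses. A hedged one-liner (the process name apache2 is an assumption for Debian; adjust if yours is httpd):

# average resident memory per Apache child; divide your free RAM by this to size MaxClients
ps -o rss= -C apache2 | awk '{sum+=$1; n++} END {if (n) printf "%d children, avg %.1f MB each\n", n, sum/n/1024}'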


Wednesday, September 26, 2018

centos7 - Cron daily running twice



On my CentOS 7 server, errors generated by a nightly backup script that should have run fine caused me to check on my cron activity. I discovered that cron.daily is running twice - here is the relevant section of /var/log/cron after I removed my backup script to see if it was somehow causing the problem:



Oct 10 02:28:01 mail CROND[1750]: (root) CMD (run-parts /etc/cron.hourly)
Oct 10 02:28:01 mail run-parts(/etc/cron.hourly)[1750]: starting 0anacron

Oct 10 02:28:01 mail anacron[1759]: Anacron started on 2017-10-10
Oct 10 02:28:01 mail run-parts(/etc/cron.hourly)[1761]: finished 0anacron
Oct 10 02:28:01 mail anacron[1759]: Normal exit (0 jobs run)
Oct 10 02:30:01 mail CROND[1766]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Oct 10 02:40:01 mail CROND[1847]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Oct 10 02:50:01 mail CROND[1936]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Oct 10 03:00:01 mail CROND[2032]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Oct 10 03:10:01 mail CROND[2148]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Oct 10 03:17:01 mail CROND[2223]: (root) CMD (run-parts /etc/cron.daily)
Oct 10 03:17:01 mail run-parts(/etc/cron.daily)[2223]: starting kizunademo

Oct 10 03:17:02 mail run-parts(/etc/cron.daily)[2259]: finished kizunademo
Oct 10 03:17:02 mail run-parts(/etc/cron.daily)[2223]: starting logrotate
Oct 10 03:17:02 mail run-parts(/etc/cron.daily)[2266]: finished logrotate
Oct 10 03:17:02 mail run-parts(/etc/cron.daily)[2223]: starting man-db.cron
Oct 10 03:17:02 mail run-parts(/etc/cron.daily)[2277]: finished man-db.cron
Oct 10 03:20:01 mail CROND[2288]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Oct 10 03:28:01 mail CROND[2367]: (root) CMD (run-parts /etc/cron.hourly)
Oct 10 03:28:01 mail run-parts(/etc/cron.hourly)[2367]: starting 0anacron
Oct 10 03:28:01 mail anacron[2376]: Anacron started on 2017-10-10
Oct 10 03:28:01 mail run-parts(/etc/cron.hourly)[2378]: finished 0anacron

Oct 10 03:28:01 mail anacron[2376]: Will run job `cron.daily' in 35 min.
Oct 10 03:28:01 mail anacron[2376]: Jobs will be executed sequentially
Oct 10 03:30:01 mail CROND[2381]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Oct 10 03:40:01 mail CROND[2462]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Oct 10 03:50:02 mail CROND[2547]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Oct 10 04:00:01 mail CROND[2670]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Oct 10 04:03:01 mail anacron[2376]: Job `cron.daily' started
Oct 10 04:03:01 mail run-parts(/etc/cron.daily)[2685]: starting kizunademo
Oct 10 04:03:02 mail run-parts(/etc/cron.daily)[2721]: finished kizunademo
Oct 10 04:03:02 mail run-parts(/etc/cron.daily)[2685]: starting logrotate

Oct 10 04:03:02 mail run-parts(/etc/cron.daily)[2728]: finished logrotate
Oct 10 04:03:02 mail run-parts(/etc/cron.daily)[2685]: starting man-db.cron
Oct 10 04:03:03 mail run-parts(/etc/cron.daily)[2739]: finished man-db.cron
Oct 10 04:03:03 mail anacron[2376]: Job `cron.daily' terminated
Oct 10 04:03:03 mail anacron[2376]: Normal exit (1 job run)


Why is cron.daily running twice? As you can see, the log contains some entries related to the second run that aren't present for the first run: two lines announcing the upcoming run, and two more lines saying that it terminated with a normal exit. The first run simply ran the scripts with no extra fanfare. I assume that means something, but I don't know what.



I checked everything I could think of for doubles of something. I'm pretty sure I've read every similar thread on the subject, so compare with the following before calling this a duplicate question. Based on Why is cron running twice? I checked for extra processes - the complete output of ps aux | grep cron is as follows, so there is only one process:




root      9383  0.0  0.2 112672  2340 pts/0    S+   15:18   0:00 grep --color=auto cron
root     25624  0.0  0.0 126248   320 ?        Ss   Sep30   0:02 /usr/sbin/crond -n


Based on Cron jobs running twice - Ubuntu server 12.04 I also checked crontab -l -u root, which said no crontab for root.



And here is my /etc/crontab:



SHELL=/bin/bash

PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=my@email.com

28 * * * * root run-parts /etc/cron.hourly
17 3 * * * root run-parts /etc/cron.daily
44 2 * * 0 root run-parts /etc/cron.weekly
8 2 7 * * root run-parts /etc/cron.monthly


Thoughts?




EDIT (9 months after this discussion had fallen silent):



A comment today from Marin Velikov made me aware that there is an anacrontab file (I know it's silly, but it hadn't even occurred to me). Here are its contents:



SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45

# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22

#period in days   delay in minutes   job-identifier   command
1          5    cron.daily     nice run-parts /etc/cron.daily
7          25   cron.weekly    nice run-parts /etc/cron.weekly
@monthly   45   cron.monthly   nice run-parts /etc/cron.monthly


So that's the cause. But why would the system be configured to run both? I assume someone smarter than me set it up this way, so I hesitate to muck with it before knowing the purpose. And if I should indeed get rid of the entries in either crontab or anacrontab, which one is best? Anacron is apparently the more sophisticated tool, but it seems weird/wrong to empty one's crontab. Am I just too old-school?



Answer




Why is cron.daily running twice?




crond is running it once:



Oct 10 03:17:01 mail CROND[2223]: (root) CMD (run-parts /etc/cron.daily)



anacron is running it once:



Oct 10 04:03:01 mail anacron[2376]: Job `cron.daily' started


crond started anacron, that is why you didn't see a process for it:



Oct 10 03:28:01 mail CROND[2367]: (root) CMD (run-parts /etc/cron.hourly)
Oct 10 03:28:01 mail run-parts(/etc/cron.hourly)[2367]: starting 0anacron
Oct 10 03:28:01 mail anacron[2376]: Anacron started on 2017-10-10

Oct 10 03:28:01 mail run-parts(/etc/cron.hourly)[2378]: finished 0anacron
Oct 10 03:28:01 mail anacron[2376]: Will run job `cron.daily' in 35 min.
Oct 10 03:28:01 mail anacron[2376]: Jobs will be executed sequentially
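For reference, a stock CentOS 7 install drives cron.daily/weekly/monthly only through anacron (started from /etc/cron.hourly/0anacron), so the run-parts lines in the /etc/crontab shown above look hand-added. A hedged fix, assuming anacron should remain the scheduler, is simply to comment them out:

# /etc/crontab - anacron already runs these via /etc/anacrontab
#17 3 * * * root run-parts /etc/cron.daily
#44 2 * * 0 root run-parts /etc/cron.weekly
#8 2 7 * * root run-parts /etc/cron.monthly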

nat - Iptables udp port forwarding

I'm using the latest Debian release and I need to do some port forwarding, but I don't know how. I have 2 stream sources coming to my server on the same UDP port from 2 different IPs:



192.168.1.2:1003 via udp to 192.168.1.4 (server)  
192.168.1.3:1003 via udp to 192.168.1.4 (server)


My question is: how do I forward this port 1003 coming from 1.2 to some other port, 1004 for example?
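A hedged sketch of one way to do this with the nat table (rule untested; assumes the streams terminate on the server itself):

# redirect UDP arriving on port 1003 from 192.168.1.2 to local port 1004
iptables -t nat -A PREROUTING -p udp -s 192.168.1.2 --dport 1003 -j REDIRECT --to-ports 1004
# traffic from 192.168.1.3 still lands on 1003, because the rule matches on the source address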

linux - Why is it unable to make query directly from the country level DNS?



I tried to make a DNS trace request (taking Oxford's website www.ox.ac.uk as an example) starting from Google's DNS 8.8.8.8. I could successfully get the result, and the route went via the country-level DNS server nsa.nic.uk.




However, when I tried to ask nsa.nic.uk directly, no route was shown. Is that normal, and why didn't it show the result?



Thanks in advance!



Command 1 (asking Google DNS):-



$ dig +trace www.ox.ac.uk @8.8.8.8



Command 1 Result (get route successfully):-




8.8.8.8 (Google DNS)
-> 192.203.230.10 (e.root-servers.net)
-> 156.154.100.3 (nsa.nic.uk)
-> 193.62.157.66 (ns4.ja.net)
-> 193.63.105.17 (ns2.ja.net)
-> 129.67.242.155 (www.ox.ac.uk)



Command 2 (asking nsa.nic.uk directly):-



$ dig +trace www.ox.ac.uk @156.154.100.3



Command 2 Result (get no route):-



Received 28 bytes from 156.154.100.3#53(156.154.100.3) in 79 ms



Answer



Yes, this is normal.



Google DNS 8.8.8.8 is a "recursive" DNS server, which means it will resolve any domain for you (by querying the successive authoritative DNS servers for each component of the domain, starting with the root and going all the way down to the "www" component).



The country DNS nsa.nic.uk is an authoritative DNS for "uk." but it does not accept recursive queries.



If you do the dig command without +trace, you'll see it does reply with something, but only the next level of the tree:



$ dig www.ox.ac.uk @156.154.100.3

;; AUTHORITY SECTION:
ac.uk. 172800 IN NS ns3.ja.net.
ac.uk. 172800 IN NS ns4.ja.net.
ac.uk. 172800 IN NS ns2.ja.net.
ac.uk. 172800 IN NS ns1.surfnet.nl.
ac.uk. 172800 IN NS dns-3.dfn.de.
ac.uk. 172800 IN NS ns0.ja.net.
ac.uk. 172800 IN NS auth03.ns.uu.net.



If you then go one step further and ask one of those servers for the domain, you'll get the next step:



$ dig www.ox.ac.uk @ns0.ja.net
;; AUTHORITY SECTION:
ox.ac.uk. 86400 IN NS dns1.ox.ac.uk.
ox.ac.uk. 86400 IN NS dns2.ox.ac.uk.
ox.ac.uk. 86400 IN NS dns0.ox.ac.uk.
ox.ac.uk. 86400 IN NS ns2.ja.net.



When you query 8.8.8.8, it does all the steps of the resolution for you... And when you do +trace, it will show you the individual steps too...


Sunday, September 23, 2018

virtual machines - Automatic VM capacity planning

Situation:



I've got a bunch of blades, all with same amount of memory and cores in each. Some have local storage, some do not and rely on the SAN.




I've also got a ton of VMs that I need to build and drop on these blades. There are about 7 or 8 different instance types, each of which has a fixed spec. For example, instanceA has 2 GB RAM/2 cores/100 GB SAN storage. InstanceB has 4 GB RAM/8 cores/60 GB blade-local storage. InstanceC has 16 GB RAM/4 cores/100 GB blade-local storage. Etc.



Is there some sort of tool that I can run/get/etc. into which I can punch each blade and instance spec, and which will automatically propose which blades to put instances on, while leaving a bit of room for overhead? Even an Excel template or something would work.



List of VMs and hosts goes in, locations of VMs on hosts come out. VMware with vSphere.



Any input is appreciated

Friday, September 21, 2018

linux - AWS EC2 - CentOS 7 Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

I have an EC2 instance with CentOS 7. After rebooting and a stop/start, the instance cannot boot anymore.




I got this error. What caused this, and how do I fix it? Thank you:



[    1.601892] List of all partitions:

[ 1.604458] No filesystem could mount root, tried:

[ 1.608147] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

[ 1.609140] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-514.16.1.el7.x86_64 #1


[ 1.609140] Hardware name: Xen HVM domU, BIOS 4.2.amazon 02/16/2017


===== More complete log



[    0.802498] registered taskstats version 1

[ 0.805672] Key type trusted registered

[ 0.808549] Key type encrypted registered


[ 0.811468] IMA: No TPM chip found, activating TPM-bypass!

[ 0.815226] xenbus_probe_frontend: Device with no driver: device/vbd/768

[ 0.819871] xenbus_probe_frontend: Device with no driver: device/vif/0

[ 0.824033] rtc_cmos 00:02: setting system clock to 2017-06-07 08:57:52 UTC (1496825872)

[ 1.516119] tsc: Refined TSC clocksource calibration: 2399.999 MHz


[ 1.576495] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3

[ 1.582622] md: Waiting for all devices to be available before autodetect

[ 1.587126] md: If you don't use raid, use raid=noautodetect

[ 1.590876] md: Autodetecting RAID arrays.

[ 1.593819] md: Scanned 0 and added 0 devices.


[ 1.597016] md: autorun ...

[ 1.599321] md: ... autorun DONE.

[ 1.601892] List of all partitions:

[ 1.604458] No filesystem could mount root, tried:

[ 1.608147] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)


[ 1.609140] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-514.16.1.el7.x86_64 #1

[ 1.609140] Hardware name: Xen HVM domU, BIOS 4.2.amazon 02/16/2017

[ 1.609140] ffffffff818b44d0 00000000dbd248fc ffff88003d66fd60 ffffffff81686ac3

[ 1.609140] ffff88003d66fde0 ffffffff8167feca ffffffff00000010 ffff88003d66fdf0

[ 1.609140] ffff88003d66fd90 00000000dbd248fc 00000000dbd248fc ffff88003d66fe00


[ 1.609140] Call Trace:

[ 1.609140] [] dump_stack+0x19/0x1b

[ 1.609140] [] panic+0xe3/0x1f2

[ 1.609140] [] mount_block_root+0x2a1/0x2b0

[ 1.609140] [] mount_root+0x53/0x56


[ 1.609140] [] prepare_namespace+0x13c/0x174

[ 1.609140] [] kernel_init_freeable+0x1f0/0x217

[ 1.609140] [] ? initcall_blacklist+0xb0/0xb0

[ 1.609140] [] ? rest_init+0x80/0x80

[ 1.609140] [] kernel_init+0xe/0xf0


[ 1.609140] [] ret_from_fork+0x58/0x90

[ 1.609140] [] ? rest_init+0x80/0x80
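For reference, a panic like this after a stop/start usually means the initramfs or bootloader config no longer matches the root device. A hedged recovery sketch, not necessarily the fix for this specific case (the attachment device name is an assumption, and the kernel version is taken from the panic above): detach the root EBS volume, attach it to a working rescue instance, and rebuild the initramfs in a chroot:

# on the rescue instance, with the broken root volume attached as /dev/xvdf (assumed name)
mount /dev/xvdf1 /mnt
for d in dev proc sys; do mount --bind /$d /mnt/$d; done
chroot /mnt
dracut -f /boot/initramfs-3.10.0-514.16.1.el7.x86_64.img 3.10.0-514.16.1.el7.x86_64
exit
umount /mnt/dev /mnt/proc /mnt/sys /mnt
# then move the volume back to the original instance as its root device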

Wednesday, September 19, 2018

cpu usage - CPU utilization imbalance on ESXi 6.7



We are working with a server running ESXi 6.7 Enterprise Plus, and the mobo has 2 Xeon 10-core CPUs.



The host is moderately loaded, but strangely the ESXi monitoring screen shows MAX socket (package) 0 at 87% utilization and socket (package) 1 at 2.5% utilization, and AVERAGE socket 0 at 20% and socket 1 at 1%.




Is this normal? Should ESXi be balancing the load across the 2 CPUs? Or does it fill one and then start using the other?



A license is installed and should support 2 sockets, I think (though I don't see a CPU limit on the licensing tab of the GUI). I didn't purchase the hardware/license so I don't know too much about what was purchased, but I can see the license tab and it looks right-ish. I just don't see anything that says 2 SOCKETS... so I'm wondering if another license needs to be purchased to activate the second socket? Does anyone with ESXi 6.7 with Enterprise Plus have a line in their license tab showing the # of sockets licensed?


Answer



The ESXi scheduler is NUMA aware. By default, it will prefer to keep VMs on one socket's cores and memory if possible. An overview of this is in the Resource Management Guide.



You can show that both sockets get used by putting more load on the host. If its usual workload isn't enough, create a 14-core VM and run something multi-threaded and CPU-intensive. Have fun with it, maybe compile a very large software package, or donate some CPU cycles to science. Both sockets should be well over 2% utilized, because the VM is larger than one NUMA node.


apache 2.2 - Set default permissions for new files (Linux)



I'm having a bit of a nightmare with file permissions on my web-server at the moment.




The server has Apache installed on it, which uses the user 'apache' from the group 'apache'.
I have also installed FTP which uses the user 'ftp' from the group 'ftp'.



The 'ftp' user has access to a directory on the server called 'uploads'. This is owned by the 'ftp' user and the group 'ftp'. This is all working fine.



The problem...



There are scripts owned by the 'apache' user which need to have RWX access to the same directory, 'uploads'.
To try and achieve this I added the 'apache' user into the 'ftp' group and set the folder permissions to 775. This worked fine, but new files added by the 'ftp' user are always 744, giving the 'ftp' user full access but only allowing the 'apache' user read access.




What I am looking for is the ability to always allow the 'apache' user RWX access to ALL files / folders in this directory. How do I set the default permissions for new files/folders to have a different permission? Or allow the group permissions to be RWX by default. Or have I missed something obvious??



Any help will be greatly received.
Thanks, Ben


Answer



You should try umask:




The umask (UNIX shorthand for "user file-creation mode mask") is a four-digit octal number that UNIX uses to determine the file permission for newly created files.





More information on the man page.



You can read this tutorial to understand how to use umask easily:
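In this particular case both the setgid bit and an FTP-daemon umask are typically needed; a hedged sketch (the path is a placeholder, and local_umask assumes vsftpd; other FTP daemons use a different option name):

chgrp ftp /var/www/uploads      # hypothetical path to the 'uploads' directory
chmod 2775 /var/www/uploads     # setgid bit: new files and folders inherit the 'ftp' group
# in /etc/vsftpd.conf:
#   local_umask=002             # uploads become 664 (files) / 775 (directories)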


Tuesday, September 18, 2018

virtualization - new hyper V High Avaliability setup and SMB 3.0 storage

I am looking at building a new 2012 R2 with Hyper-V setup; everything has to be HP hardware.



2x Hyper-V 2012 R2 servers running 20-30 VMs. These will be dual 6-core Xeons with 128 GB of RAM.



2x 2012 R2 back-end storage servers running SMB 3.0 shares, connected to the front-end VM servers with multiple 10 Gb Ethernet connections. I'm not sure of the hardware requirements for these servers, so advice would be helpful.




2x SAS 6 Gb external enclosures for each storage server. Probably HP D2600 enclosures.



A couple of questions:




  • What is the best way to achieve high availability and redundancy in this setup? Should I use replicas or a HV cluster?

  • What is the best way to ensure redundancy of the storage back ends? File share clustering?

  • Would the performance of the SAS enclosure be fast enough? What kind of drives would be suitable for this? Would 7.2k SAS drives be enough or would we need faster? What would be the best raid setup? Initially there would be 12 drives expanding to 24. Would a mix of fast SAS or SSD and SATA be more efficient?




Thanks in advance, any help appreciated.

linux - Page allocation failure - Am I running out of memory?

Lately, I've noticed entries like this one in the kern.log of one of my servers:



Feb 16 00:24:05 aramis kernel: swapper: page allocation failure. order:0, mode:0x20



I'd like to know:




  1. What exactly does that message mean?

  2. Is my server running out of memory?



The swap usage is quite low (less than 10%), and so far I haven't noticed any processes being killed because of lack of memory.




Additional information:




  • The server is a Xen instance (DomU) running Debian 6.0

  • It has 512 MB of RAM and a 512 MB swap partition

  • CPU load inside the virtual machine shows an average of 0.25
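For reference, a hedged reading of such messages and a couple of things worth checking (the sysctl value below is only an illustrative assumption for a 512 MB guest):

# order:0 is a single 4 KB page; mode 0x20 is GFP_ATOMIC, i.e. an allocation from interrupt
# context (often the network stack) that cannot wait and found no free page at that instant.
# It usually indicates transient memory pressure/fragmentation rather than an OOM condition.
cat /proc/buddyinfo                 # free pages per order, per memory zone
cat /proc/sys/vm/min_free_kbytes    # reserve kept free for atomic allocations
# a common mitigation is to raise the reserve slightly:
# sysctl -w vm.min_free_kbytes=8192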

.htaccess - rewrite url to subdomain and include full url

I'm trying to rewrite all of my URLs from mysite.com/sub to sub.mysite.com and retain the full URL using mod_rewrite, and it doesn't seem to be working.




example



If someone goes to mysite.com/sub/1/2/3, the URL in the address bar should be sub.mysite.com/1/2/3.



I moved a WordPress install to my root that was previously installed in a subdirectory with a subdomain pointing to it, and I want to retain my URL structure for posts.



thanks!
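A hedged sketch for the .htaccess at the site root (untested; the domain names are the placeholders from the question):

RewriteEngine On
# send mysite.com/sub/anything to sub.mysite.com/anything
RewriteCond %{HTTP_HOST} ^(www\.)?mysite\.com$ [NC]
RewriteRule ^sub/(.*)$ http://sub.mysite.com/$1 [R=301,L]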

proxy - Putty SSH tunnelling not working for HTTPS sites

I created a proxy by creating an SSH tunnel using PuTTY, and then filled in the server's details in my home computer's browser proxy settings. I entered the server IP in the SOCKS list.




I can view all non-secure HTTP sites, but when I try to go to an HTTPS site, the page comes out blank. This happens with all HTTPS sites.



While connecting with PuTTY, I even tried putting in local port 443 and destination localhost:443, and filling in the HTTPS field of the browser proxy settings, but still no luck.



Can anybody tell me how I can browse HTTPS sites using my proxy?
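For what it's worth, the usual approach is a single dynamic (SOCKS) forward rather than per-port tunnels, with the browser pointed at the local end of the tunnel instead of the server's IP. A hedged sketch using the OpenSSH equivalent of PuTTY's Tunnels > Dynamic setting (port 1080 is just a conventional choice):

ssh -D 1080 user@server    # PuTTY: source port 1080, type "Dynamic", destination left empty
# browser proxy settings: SOCKS host = localhost, port = 1080
# leave the HTTP/HTTPS proxy fields empty so all traffic, including HTTPS, goes through SOCKS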

login - Failure running SQL2012 on Windows 8 with local service accounts

I have installed SQL2012 on a Windows 8 laptop that is on a domain. The domain includes a group policy that defines accounts that can "Log on as a service". I installed SQL Server 2012 using the default settings, including the use of local service accounts.




When first installed, whilst connected to the domain, the server runs. When I look at the local security policy, the service accounts that SQL was installed to run with show up.



When the PC is rebooted, disconnected from the domain, SQL Server will no longer start. Even when reconnected to the domain, it gives the error "Error 1069: The service did not start due to a log-on failure". When I look at the local security policy, the service accounts that SQL Server installs with are no longer present.



I know that I can run SQL Server under a domain account and add that domain account to the group policy (I already have an SQL Service account configured like this for servers), but surely there must be a way to run with the local accounts; otherwise why would it sometimes work? I have installed this 3 times now, both when connected to the domain and when not connected to the domain but nothing seems to work. It always works when first installed, and fails sometime later after the laptop has been rebooted one or more times (to be honest, I have yet to determine the pattern of reboots that results in the log-on failure - sometimes I am connected to the domain when I reboot and other times I am not).



Incidentally, within the "Log on as a service" properties in the local security policy, I see only those accounts defined in group policy and the buttons to add further accounts are greyed out. I have also tried "Run as administrator" when opening LSP, but this makes no difference.

Sunday, September 16, 2018

linux - (ssh tunnel?) Access remote server with private IP through a *DIFFERENT* server with public IP



Let's assume the following hosts:




  • localhost : my laptop


  • remoteserver : a server with a public IP which runs a SSH server.

  • private.remoteserver : a server with a private IP which is only accessible from remoteserver.



I don't have sudo access to remoteserver, so I can't make changes with the root user.



The question is: Is it possible to access a port on private.remoteserver through remoteserver, in a single command?



I've played around a bit with ssh tunnels without luck. I would like to create an SSH alias to private.remoteserver as described in this article.




For example, I'd like to run from localhost:



curl http://private.remoteserver:8080/


to connect to port 8080 on private.remoteserver. Is this possible?


Answer



You haven't shown us what you've tried so far, but something as simple as this should work:



ssh -L 8080:private.remoteserver:8080 remoteserver



Which would then let you run:



curl http://localhost:8080/


...which due to the port forwarding we just set up would actually connect to port 8080 on private.remoteserver.



If you want to be able to directly access http://private.remoteserver:8080/ from your client, you'll need to (a) set up some sort of proxy and (b) configure curl (or other software) to use the proxy. You can set up a SOCKS5 proxy with ssh using the -D option:




ssh -D 1080 remoteserver


And then you can:



curl --socks5-hostname localhost:1080 http://private.remoteserver:8080/


Most web browsers (Firefox, Chrome) can also be configured to operate with a SOCKS5 proxy. If you search for "ssh dynamic forwarding" you'll find lots of good documentation, including this article from Ubuntu.
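To get the single-command feel the question asks for, the forwarding can also live in ~/.ssh/config. A hedged sketch (assumes private.remoteserver runs sshd itself and the client is OpenSSH 7.3+ for ProxyJump):

Host private.remoteserver
    ProxyJump remoteserver              # older clients: ProxyCommand ssh -W %h:%p remoteserver
    LocalForward 8080 localhost:8080    # local port 8080 -> port 8080 on private.remoteserver

After that, ssh private.remoteserver brings up the tunnel and curl http://localhost:8080/ works as in the first example.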



Saturday, September 15, 2018

blacklist - What are the best methods for catching snowshoe spam?

I'm using Smartermail for my small mail server. We've been having a problem lately of getting waves of snowshoe spam that follow the same pattern. They come in batches of 3 or 4 at a time. The bodies are almost identical save for the domain name they link to. The source IPs tend to be from the same /24 block for a while, then they switch to another /24. The domains tend to be brand new. They have valid PTR and SPF records and have random gibberish at the bottom of the body to spoof Bayesian filters.



I'm using a dozen or so different RBLs including Barracuda, Spamhaus, SURBL and URIBL. They do a decent job catching most of them, but we still get a lot the slip through because the IPs and domains haven't been blacklisted.



Are there any strategies I can employ, including RBLs that block newly created domains or deal specifically with snowshoe spam? I'm hoping to avoid having to use a 3rd-party filtering service.

Friday, September 14, 2018

Software vs hardware RAID performance and cache usage



I've been reading a lot on RAID controllers/setups and one thing that comes up a lot is how hardware controllers without cache offer the same performance as software RAID. Is this really the case?



I always thought that hardware RAID cards would offer better performance even without cache. I mean, you have dedicated hardware to perform the tasks. If that is the case, what is the benefit of getting a RAID card that has no cache, something like an LSI 9341-4i, which isn't exactly cheap?



Also, if a performance gain is only possible with cache, is there a cache configuration that writes to disk right away but keeps data in cache for read operations, making a BBU not a priority?


Answer



In short: if using a low-end RAID card (without cache), do yourself a favor and switch to software RAID. If using a mid-to-high-end card (with BBU or NVRAM), then hardware is often (but not always! see below) a good choice.




Long answer: when computing power was limited, hardware RAID cards had the significant advantage of offloading parity/syndrome calculation for the RAID schemes that need it (RAID 3/4/5, RAID6, etc.).



However, with ever-increasing CPU performance, this advantage has basically disappeared: even my laptop's ancient CPU (Core i5 M 520, Westmere generation) has XOR performance of over 4 GB/s and RAID-6 syndrome performance of over 3 GB/s on a single execution core.
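The Linux kernel benchmarks these routines at boot, so a hedged way to check the numbers on your own hardware is simply:

dmesg | grep -iE 'raid6:|xor:'    # prints the per-algorithm throughput measured at boot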



The advantage that hardware RAID maintains today is the presence of a power-loss-protected DRAM cache, in the form of a BBU or NVRAM. This protected cache gives very low latency for random write access (and for reads that hit it) and basically transforms random writes into sequential writes. A RAID controller without such a cache is near useless. Moreover, some low-end RAID controllers not only come without a cache, but forcibly disable the disk's private DRAM cache, leading to slower performance than with no RAID card at all. Examples are Dell's PERC H200 and H300 cards: if newer firmware has not changed that, they totally disable the disk's private cache (and it cannot be re-enabled while the disks are connected to the RAID controller). Do yourself a favor and do not, ever, buy such controllers. While even higher-end controllers often disable the disk's private cache, they at least have their own protected cache - making the HDDs' (but not SSDs'!) private cache somewhat redundant.



This is not the end, though. Even capable controllers (the ones with a BBU or NVRAM cache) can give inconsistent results when used with SSDs, basically because SSDs really need a fast private cache for efficient FLASH page programming/erasing. And while some (most?) controllers let you re-enable the disk's private cache (eg: the PERC H700/710/710P let the user re-enable it), if that private cache is not write-protected you risk losing data in case of power loss. The exact behavior really is controller and firmware dependent (eg: on a Dell S6/i with 256 MB WB cache and the disks' cache enabled, I had no losses during multiple planned power-loss tests), giving uncertainty and much concern.



Open source software RAIDs, on the other hand, are much more controllable beasts - their software is not enclosed inside proprietary firmware, and they have well-defined metadata patterns and behaviors. Software RAID makes the (right) assumption that the disk's private DRAM cache is not protected, but at the same time that it is critical for acceptable performance - so it typically does not disable it; rather, it uses ATA FLUSH / FUA commands to be certain that critical data lands on stable storage. As software RAID often runs from the SATA ports attached to the chipset southbridge, bandwidth is very good and driver support is excellent.




However, if used with mechanical HDDs, synchronized random write access patterns (eg: databases, virtual machines) will greatly suffer compared to a hardware RAID controller with WB cache. On the other hand, when used with enterprise SSDs (ie: with a power-loss-protected write cache), software RAID often excels and gives results even higher than what is achievable with hardware RAID cards. That said, you have to remember that consumer SSDs (read: with a non-protected writeback cache), while very good at reading and async writing, deliver very low IOPS in synchronized write workloads.



Also consider that software RAIDs are not all created equal. Windows software RAID has a bad reputation, performance-wise, and even Storage Spaces seems not too different. Linux MD RAID is exceptionally fast and versatile, but the Linux I/O stack is composed of multiple independent pieces that you need to understand carefully to extract maximum performance. ZFS parity RAID (ZRAID) is extremely advanced but, if not correctly configured, can give you very poor IOPS; mirroring+striping, on the other hand, performs quite well. In any case, it needs a fast SLOG device for synchronous write handling (ZIL).



Bottom line:




  1. if your workloads are not sensitive to synchronized random writes, you don't need a RAID card

  2. if you need a RAID card, do not buy a RAID controller without WB cache


  3. if you plan to use SSDs, software RAID is preferred, but keep in mind that for heavy synchronized random writes you need a power-loss-protected SSD (ie: Intel S4600, Samsung PM/SM863, etc). For pure performance the best choice probably is Linux MD RAID, but nowadays I generally use striped ZFS mirrors. If you cannot afford losing half the space to mirrors and you need ZFS's advanced features, go with ZRAID, but think carefully about your VDEV setup.

  4. if you, even using SSDs, really need a hardware RAID card, use SSDs with write-protected caches (Micron M500/550/600 have partial protection - not really sufficient, but better than nothing - while the Intel DC and S series have full power-loss protection, and the same can be said for enterprise Samsung SSDs)

  5. if you need RAID6 and you will use normal, mechanical HDDs, consider buying a fast RAID card with 512 MB (or more) of WB cache. RAID6 has a high write performance penalty, and a properly-sized WB cache can at least provide fast intermediate storage for small synchronous writes (eg: the filesystem journal).

  6. if you need RAID6 with HDDs but you can't / don't want to buy a hardware RAID card, think carefully about your software RAID setup. For example, a possible solution with Linux MD RAID is to use two arrays: a small RAID10 array for journal writes / DB logs, and a RAID6 array for raw storage (as a file server); see the sketch after this list. On the other hand, software RAID5/6 with SSDs is very fast, so you probably don't need a RAID card for an all-SSD setup.
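A hedged sketch of that two-array layout with Linux MD RAID (device names and partition numbers are placeholders):

mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[abcd]2    # small, fast array for journals / DB logs
mdadm --create /dev/md1 --level=6 --raid-devices=6 /dev/sd[abcdef]3   # large RAID6 array for bulk file storage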


linux - How do I add storage with cloud templating?

I have a CloudFormation template to spin up an EC2 instance.



Parameters:
  InstanceType:
    Type: String
    Description: Instance type for RStudio. Default is t2.micro.
    AllowedValues:
      - t2.micro
      - t2.small
      - t2.medium
      - t2.large
    ConstraintDescription: 'Valid instance type in the t2 family'
    Default: t2.micro
  ImageId:
    Type: 'AWS::EC2::Image::Id'
    Description: >-
      Amazon Linux Image ID. Default is for 2017.03.01 (HVM). N.B.
    Default: ami-4fffc834



When I spin up the instance manually, there is an option to add storage. It defaults to 8 GB and I'd like to do 16 GB instead.



I looked for the syntax to add storage with CloudFormation. What is the syntax to set a volume size other than the default?
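For reference, a hedged sketch of the relevant instance property (the resource name is hypothetical, and the root device name depends on the AMI; /dev/xvda is typical for Amazon Linux HVM images):

Resources:
  RStudioInstance:                  # hypothetical resource name
    Type: 'AWS::EC2::Instance'
    Properties:
      InstanceType: !Ref InstanceType
      ImageId: !Ref ImageId
      BlockDeviceMappings:
        - DeviceName: /dev/xvda     # root device name for the assumed AMI
          Ebs:
            VolumeSize: 16          # GiB
            VolumeType: gp2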

Thursday, September 13, 2018

postfix - Outlook is rejecting emails without any notification

I have configured Postfix with OpenDKIM. Everything is working fine; I have tested the DKIM records as well as DMARC and SPF, and everything seems fine. I am also receiving emails in the inbox of my Gmail account. But when I try to send email to Outlook.com, the email never arrives. Also, I do not see any error in the mail log.



Mail log:




Apr 16 17:51:34 postfix/smtp[4778]: E6911060F: to=<*******@hotmail.com>, relay=mx1.hotmail.com[65.54.188.72]:25, delay=2, delays=0.09/0/1.2/0.75, dsn=2.0.0, status=sent (250 Queued mail for delivery)
Apr 16 17:51:34 postfix/qmgr[4697]: E6911060F: removed




I do not receive it in either the junk folder or the inbox at Hotmail. It seems as if Outlook is just rejecting the email.



A couple of days ago everything was working, but email was landing in the junk folder. Then I set up DKIM and started getting emails in the inbox, but after some days I stopped receiving email at my Outlook address at all, and I do not see any errors either.



In my SPF records I am using: "v=spf1 a ~all"




And when I try to send an email from Outlook to my email@domain.org
I do receive the email, but in the logs this is what I see.



Apr 16 17:53:22 postfix/smtpd[5016]: Anonymous TLS connection established from mail-oln040092255027.outbound.protection.outlook.com[40.92.255.27]: TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)
Apr 16 17:53:23 postfix/smtpd[5016]: 3F45D4060F: client=mail-oln040092255027.outbound.protection.outlook.com[40.92.255.27]
Apr 16 17:53:23 postfix/cleanup[5023]: 3F45D4060F: message-id=
Apr 16 17:53:23 opendkim[819]: 3F45D4060F: mail-oln040092255027.outbound.protection.outlook.com [40.92.255.27] not internal
Apr 16 17:53:23 opendkim[819]: 3F45D4060F: not authenticated
Apr 16 17:53:23 opendkim[819]: 3F45D4060F: failed to parse authentication-results: header field

Apr 16 17:53:23 opendkim[819]: 3F45D4060F: DKIM verification successful


I am still confused. Can anyone spot the issue? If you need any more configuration details, please let me know.
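One thing worth double-checking, offered only as a hedged suggestion: v=spf1 a ~all authorizes only the IPs behind the domain's A record, so if mail actually leaves from a different address, Outlook may drop it silently. Listing the sending server explicitly is a common fix (the IP below is a placeholder):

; TXT record for the sending domain
domain.org.   IN   TXT   "v=spf1 a mx ip4:203.0.113.10 ~all"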

debian - Cacti not working for SNMP data sources




I installed the cacti and snmpd packages on a Debian server. I'm able to display common graphs in Cacti (such as memory usage, load average, logged-in users, etc.) using the data templates listed as Unix. Now I want to replace these graphs with new ones using SNMP data sources, because I see there is also CPU usage, and because it's not out of the question that I'll have to manage multiple hosts in the future.



So, I installed snmpd on the machine and left the snmpd.conf as it is. In Cacti, I created three new data sources from SNMP templates for 127.0.0.1 host:




  • ucd/net - CPU Usage - Nice

  • ucd/net - CPU Usage - System

  • ucd/net - CPU Usage - User




Then I created a new graph from the template ucd/net - CPU Usage, and selected the three data sources in the Graph Item Fields section. The graph is now enabled and running, but empty. No data has been collected.



Under Console -> Devices my SNMP host is listed as up and running:



System:Linux ip-xx-xx-xxx-xxx 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64
Uptime: 929267 (0 days, 2 hours, 34 minutes)
Hostname: ip-xx-xx-xxx-xxx
Location: Sitting on the Dock of the Bay
Contact: Me me@example.org



In SNMP Options I left all as it is:




  • SNMP Version: Version 1

  • SNMP Community: public

  • SNMP Timeout: 500 ms

  • Maximum OID's Per Get Request: 10




In Console -> Utilities -> Cacti Log I have multiple warning (two for each data source) every 5 minutes:



10/29/2012 01:45:01 PM - CMDPHP: Poller[0] Host[2] DS[18] WARNING: Result from SNMP not valid. Partial Result: U
10/29/2012 01:45:01 PM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'127.0.0.1', and OID:'.1.3.6.1.4.1.2021.4.15.0'
10/29/2012 01:45:01 PM - CMDPHP: Poller[0] Host[1] DS[9] WARNING: Result from SNMP not valid. Partial Result: U
10/29/2012 01:45:01 PM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'127.0.0.1', and OID:'.1.3.6.1.4.1.2021.11.52.0'
10/29/2012 01:40:01 PM - CMDPHP: Poller[0] Host[2] DS[19] WARNING: Result from SNMP not valid. Partial Result: U
10/29/2012 01:40:01 PM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'127.0.0.1', and OID:'.1.3.6.1.4.1.2021.4.6.0'
[...]



I have the feeling I'm missing something, but I cannot get it...


Answer



Could you try this command (OID is from Cacti log):



SNMPv1:



  $ snmpwalk -Cc -On -v 1 -c public 127.0.0.1 1.3.6.1.4.1.2021.11.52.0



SNMPv2c: Nowadays SNMPv2c is very commonly the default, so also try:



  $ snmpwalk -Cc -On -v 2c -c public 127.0.0.1 1.3.6.1.4.1.2021.11.52.0


Also, try changing the COMMUNITY and/or IP ADDRESS to the local network address instead of the loopback.



If you don't get a result like .1.3.6.1.4.1.2021.11.52.0 = Counter32: 250038, then edit /etc/snmp/snmpd.conf, adding or uncommenting:



rocommunity public  localhost 



And restart snmpd using one of:



/etc/init.d/snmpd restart
service snmpd restart

Tuesday, September 11, 2018

usb flash drive - Does USB key speed matter with ESXi 4.0?



I am setting up an ESXi 4.0 server and have decided to use the CD installer's ability to install to a USB key. I have a freebie slow 1GB Newegg USB key and a very fast 4GB Patriot Xporter XT Boost USB key. I'd like to use the Patriot somewhere else, but if speed makes a difference, I'll just buy another one.



Does the speed of the USB key make a noticeable difference when running ESXi?


Answer



I have no concrete benchmarks, but from trying it, the answer is only when initially loading the hypervisor.




I used an 8GB very fast USB stick, but then I needed to use it for something else so I replaced it with a cheap 2GB one that I got for free at a trade show.



It boots slightly slower when actually starting ESXi, but once it is loaded and at the half yellow/grey screen, I can not see any difference in speed at all.


linux - Why is "chmod -R 777 /" destructive?






This is a Canonical Question about File Permission and Why 777 is "destructive".




I'm not asking how to fix this problem, as there are a ton of references of that (reinstall OS). Why does it do anything destructive at all?




If you've ever run this command you pretty much immediately destroy your operating system. I'm not clear why removing restrictions has any impact on existing processes. For example, if I don't have read access to something and, after a quick mistype in the terminal, suddenly I now have access... well, why does that cause Linux to break?


Answer



First of all a minor terminology nitpick: chmod doesn't remove permissions. It CHANGES them.






Now the meat of the issue -- The mode 777 means "Anyone can read, write or execute this file" - You have given permission for anyone to do (effectively) whatever the heck they want.



Now, why is this bad?





  1. You've just let everyone read/modify every file on your system.


    • Kiss password security goodbye (anyone can read the shadow file and crack your passwords, but why bother? Just CHANGE the password! It's much easier!).

    • Kiss security for your binaries goodbye (someone can just write a new login program that lets them in every time).

    • Kiss your files goodbye: One user misdirects rm -r / and it's all over. The OS was told to let them do whatever they wanted!


  2. You've pissed off every program that checks permissions on files before starting.
    sudo, sendmail, and a host of others simply will not start any more. They will examine key file permissions, see they're not what they're supposed to be, and kick back an error message.
    Similarly ssh will break horribly (key files must have specific permissions, otherwise they're "insecure" and by default SSH will refuse to use them.)

  3. You've wiped out the setuid / setgid bits on the programs that had them.
    The mode 777 is actually 0777. Among the things in that leading digit are the setuid and setgid bits.
    Most programs which are setuid/setgid have that bit set because they must run with certain privileges. They're broken now.


  4. You've broken /tmp and /var/tmp
    The other thing in that leading octal digit that got zero'd is the sticky bit -- That which protects files in /tmp (and /var/tmp) from being deleted by people who don't own them.
    There are (unfortunately) plenty of badly-behaved scripts out there that "clean up" by doing an rm -r /tmp/*, and without the sticky bit set on /tmp you can kiss all the files in that directory goodbye.
    Having scratch files disappear can really upset some badly-written programs...

  5. You've caused havoc in /dev /proc and similar filesystems
    This is more of an issue on older Unix systems where /dev is a real filesystem, and the stuff it contains are special files created with mknod, as the permissions change will be preserved across reboots, but on any system having your device permissions changing can cause substantial problems, from the obvious security risks (everyone can read every TTY) to the less-obvious potential causes of a kernel panic.
    Credit to @Tonny for pointing out this possibility

  6. Sockets and Pipes may break, or have other problems
    Sockets and pipes may break entirely, or be exposed to malicious injection as a result of being made world-writeable.
    Credit to @Tonny for pointing out this possibility

  7. You've made every file on your system executable
    A lot of people have . in their PATH environment variable (you shouldn't!) - This could cause an unpleasant surprise as now anyone can drop a file conveniently named like a command (say make or ls), and have a shot at getting you to run their malicious code.
    Credit to @RichHomolka for pointing out this possibility

  8. On some systems chmod will reset Access Control Lists (ACLs)
    This means you may wind up having to re-create all your ACLs in addition to fixing permissions everywhere (and is an actual example of the command being destructive).
    Credit to @JamesYoungman for pointing out this possibility







Will the parts of the system which are already running continue to run? Probably, for a while at least.
But the next time you need to launch a program, or restart a service, or heaven forbid REBOOT the box you're in for a world of hurt as #2 and #3 above will rear their ugly heads.
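As a quick illustration of point 2, this is roughly what OpenSSH does when it finds a key file left at mode 777 (output paraphrased from memory, so treat the exact wording as approximate):

$ ssh somehost
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0777 for '/home/user/.ssh/id_rsa' are too open.
This private key will be ignored.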


Monday, September 10, 2018

Trouble starting apache with two virtual hosts, 2 ip's and 2 ssl's

In Apache 2.2.22 I am attempting to run two virtual hosts with two IP addresses and two SSL certs. I have nothing regarding listening to ports or NameVirtualHost in any configuration files other than the files that configure the virtual hosts. In the process of getting this to work I would like to have the minimum amount of code necessary.




In the sites-available directory I have site1.com config file:



NameVirtualHost 1.1.1.1:80
Listen 1.1.1.1:80
Listen 1.1.1.1:443

<VirtualHost 1.1.1.1:80>
    ServerAdmin me@site1.com
    ServerName site1.com
    ServerAlias www.site1.com

    RewriteEngine On
    RewriteCond %{SERVER_PORT} !443
    RewriteRule (.*) https://www.site1.com/ [R]
</VirtualHost>

<VirtualHost 1.1.1.1:443>
    ServerName site1.com
    ServerAlias www.site1.com
    DocumentRoot /home/j/site1/public

    SSLEngine On
    SSLCertificateFile /etc/apache2/ssl/site1.com.crt
    SSLCertificateKeyFile /etc/apache2/ssl/site1.com.key
    SSLCertificateChainFile /etc/apache2/ssl/gd_bundle_site1.crt

    LogLevel warn
    ErrorLog /home/j/site1/log/error.log
    CustomLog /home/j/site1/log/access.log combined
</VirtualHost>




If site1.com is the only site that is enabled, the server starts fine. When I enable site2.com I run into trouble. Sudo apachectl configtest results in Syntax OK but a restart of apache results in (99)Cannot assign requested address: make_sock: could not bind to address 2.2.2.2:8080 no listening sockets available, shutting down Unable to open logs Action 'start' failed. Here's the content of site2.com:



NameVirtualHost 2.2.2.2:8080
Listen 2.2.2.2:8080
Listen 2.2.2.2:4430

<VirtualHost 2.2.2.2:8080>
    ServerAdmin me@site2.com
    ServerName site2.com
    ServerAlias www.site2.com

    RewriteEngine On
    RewriteCond %{SERVER_PORT} !4430
    RewriteRule (.*) https://www.site2.com/ [R]
</VirtualHost>

<VirtualHost 2.2.2.2:4430>
    ServerName site2.com
    ServerAlias www.site2.com
    DocumentRoot /home/j/site2/public

    SSLEngine On
    SSLCertificateFile /etc/apache2/ssl/site2.com.crt
    SSLCertificateKeyFile /etc/apache2/ssl/site2.key
    SSLCertificateChainFile /etc/apache2/ssl/gd_bundle_site2.crt

    LogLevel warn
    ErrorLog /home/j/site2/log/error.log
    CustomLog /home/j/site2/log/access.log combined
</VirtualHost>




Thanks for your help.



UPDATE:



Results for netstat -lpn less udp6:



(No info could be read for "-p": geteuid()=1000 but you should be root.)
Active Internet connections (only servers)

Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN -
tcp 0 0 1.1.1.1:80 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:7187 0.0.0.0:* LISTEN -
tcp 0 0 1.1.1.1:443 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:55363 0.0.0.0:* LISTEN -
tcp6 0 0 :::7187 :::* LISTEN -
udp 0 0 0.0.0.0:68 0.0.0.0:* -
udp 0 0 1.1.1.1:123 0.0.0.0:* -
udp 0 0 127.0.0.1:123 0.0.0.0:* -

udp 0 0 0.0.0.0:123 0.0.0.0:* -
Active UNIX domain sockets (only servers)
Proto RefCnt Flags Type State I-Node PID/Program name Path
unix 2 [ ACC ] STREAM LISTENING 169974 - /tmp/passenger.1.0.32045/generation-0/spawn-server/socket.32055.19978820
unix 2 [ ACC ] STREAM LISTENING 2823 - @/tmp/fam-root-
unix 2 [ ACC ] STREAM LISTENING 180580 - /tmp/passenger.1.0.32045/generation-0/backends/ruby.ui0IFvdXouP5Ukb3zZo2fiLBEJOgc5835cbcGK93fhrs5ogoitaPfi1
unix 2 [ ACC ] STREAM LISTENING 10547 - /var/run/mysqld/mysqld.sock
unix 2 [ ACC ] STREAM LISTENING 106 - @/com/ubuntu/upstart
unix 2 [ ACC ] STREAM LISTENING 182366 - /var/run/apache2/cgisock.32045
unix 2 [ ACC ] STREAM LISTENING 395 - /var/run/dbus/system_bus_socket

unix 2 [ ACC ] SEQPACKET LISTENING 168 - /run/udev/control
unix 2 [ ACC ] STREAM LISTENING 12724 - /var/run/fail2ban/fail2ban.sock
unix 2 [ ACC ] STREAM LISTENING 181619 - /tmp/passenger.1.0.32045/generation-0/socket
unix 2 [ ACC ] STREAM LISTENING 181621 - /tmp/passenger.1.0.32045/generation-0/spawn-server/socket.32053.32793072
unix 2 [ ACC ] STREAM LISTENING 181640 - /tmp/passenger.1.0.32045/generation-0/logging.socket


UPDATE:



grep -r Listen /etc/apache2 produces no reference to Listen on port 8080 other than what is mentioned above.




UPDATE:



Per Jenny D's suggestion below, ifconfig -a produces the following:



dummy0    Link encap:Ethernet  HWaddr be:fc:55:b0:9e:80  
BROADCAST NOARP MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0

RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

eth0 Link encap:Ethernet HWaddr f2:3c:91:70:34:84
inet addr:50.116.59.14 Bcast:50.116.59.255 Mask:255.255.255.0
inet6 addr: 2600:3c03::f03c:91ff:fe70:3484/64 Scope:Global
inet6 addr: fe80::f03c:91ff:fe70:3484/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:69078 errors:0 dropped:0 overruns:0 frame:0
TX packets:41852 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000

RX bytes:16773617 (16.7 MB) TX bytes:69148409 (69.1 MB)
Interrupt:76

gre0 Link encap:UNSPEC HWaddr 00-00-00-00-34-84-00-00-00-00-00-00-00-00-00-00
NOARP MTU:1476 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)


ip6gre0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
NOARP MTU:1448 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

ip6tnl0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
NOARP MTU:1452 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

ip_vti0 Link encap:IPIP Tunnel HWaddr
NOARP MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)


lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:3487 errors:0 dropped:0 overruns:0 frame:0
TX packets:3487 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:62766 (62.7 KB) TX bytes:62766 (62.7 KB)


sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

tunl0 Link encap:IPIP Tunnel HWaddr
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
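For reference, a hedged observation based on the output above: (99)Cannot assign requested address usually means Apache was told to Listen on an IP that is not configured on any interface, and only the single eth0 address shows up in ifconfig. If the second address really is routed to this machine, it needs to be added to an interface first, e.g. (Debian-style; the address is the placeholder from the question):

ip addr add 2.2.2.2/24 dev eth0    # temporary test
# persistent version, in /etc/network/interfaces:
# auto eth0:0
# iface eth0:0 inet static
#     address 2.2.2.2
#     netmask 255.255.255.0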

windows server 2003 - Child Folder inheriting a permission that parent folder does not have (NTFS)



I'm reconfiguring roaming profiles on my network to use proper NTFS security settings according to this article. I have reset the following permissions on the roaming profile parent folder:





  • CREATOR OWNER, Full Control, Subfolder and files only

  • User group with profiles, List folder, Create folders, This folder only

  • System, Full Control, This folder, subfolders, and files



Then I select one of the actual roaming profile folders and follow these steps to fix the NTFS settings:




  • Click Security, Advanced


  • Uncheck "Allow inheritable permissions..."

  • Choose "Remove..."

  • Recheck "Allow inheritable permissions..."

  • Click "Apply"



After I choose apply, I get the following permissions listed on the roaming profile folder:




  • Administrators (MYDOMAIN\Administrators) Full Control, This folder only


  • CREATOR OWNER, Full Control, Subfolders and files only

  • System, Full Control, This folder, subfolders, and files



Where is the Administrators entry coming from!? There is an entry on the root of the drive for Administrators to have full control, but the Roaming Profile Parent folder is not set to inherit any permissions, and it does not have the administrators permission.


Answer



It appears the problem was coming from my misunderstanding of the "CREATOR OWNER" permission. This "account" does not map to an SID; rather, it is a permission that tells the OS "when a new item is created in this folder, grant these permissions to the creator/owner". Because I was creating the folder with an administrator user, it caused the Administrators permissions to follow.


Thursday, September 6, 2018

ubuntu - Is openldap fit for large production deployments?



For about 1 year we've been using OpenLDAP on Ubuntu Server 10.04 LTS for authenticating about 20 IT users, and everything has been running fine (the operations on the LDAP server were basically limited to creating/removing users using Apache Directory Studio).




More recently (6 months ago) we've also started implementing OpenLDAP (openldap-2.4.21/debian) as an external authentication system for our website, which is being migrated from an external CMS to a new platform we're developing in house using the Drupal CMS. We have a 45K-user database and things haven't been going smoothly at all. Issues that we've had are:

  • LDAP crashing after a backup restore, needing to be recovered
  • the LDAP recover tool being unable to recover the LDAP database on some occasions
  • slapd consuming 100% CPU while there is no authentication activity on the website



Due to lack of resources and knowledge internally, all we've done so far is to find ways of keeping LDAP running without really investigating any of these issues (use monit to restart it when it crashes, db_recover to recover the db if needed, and slapcat to recreate the db from scratch when db_recover fails).



Recently we've had a round of interviews to hire a senior infrastructure engineer to assist us with all the various infrastructure issues we're running into. Several candidates confirmed they've either had or heard about issues with OpenLDAP in large production environments and never managed to come up with a single stable standalone OpenLDAP server, but instead had to come up with redundant deployments (replication, load balancing, auto-recovery/restart routines) to keep LDAP running. Some candidates even said that OpenLDAP just wasn't fit for production environments and that instead, using alternatives such as Novell eDirectory was necessary.



Q: If you have experience in dealing with ldap in production environments with thousands of users, do you have facts to share which tend to prove that openldap is indeed unstable for such setups and that using other ldap servers are indeed recommended?


Answer



I use OpenLDAP supporting a user-base of about 10,000 active users who rely on it throughout the day for everything. Problems are rare. Many services rely on it, for authentication and other things.




However, we have 4 read-only replicas (slaves/consumers) behind a load-balancer, a hidden master and a hot standby master. Used to be 2 front-end servers, but we had load problems during certain peak times (when 4,000 or so of those users were desperately trying to hit it at the same second). All write access to LDAP is via our code.



That equipment and OS is all old and we're working on replacing it with a new setup that will go back to only 2 replicas (that aren't doing as many other things) and "mirror mode" replication between a pair of masters in an HA configuration. Again, problems are rare.



We used to have some problems with replication failing, but that's mostly from when we were using slurpd instead of syncrepl. Also, unclean shutdowns of a server can corrupt the data.



Keys to running OpenLDAP in a large-scale production environment, in my experience:





  1. Somebody that understands LDAP and OpenLDAP well. Preferably more than one somebody.

  2. Somebody that understands all the other directly related parts of the infrastructure well.

  3. Somebody that understands how OpenLDAP replication works.

  4. A reasonable understanding of the BerkeleyDB options (or whatever backend you're using), since the defaults aren't quite right.

  5. Highly available slaves. More than 1. Better: really load-balanced.

  6. Active-passive masters (active-active master replication is inherently tricky)

  7. We back up LDAP data to LDIF every hour and keep a few days' worth of those on disk (the whole server gets backed up nightly); see the sketch after this list.

  8. We have scripts to quickly bring a broken slave back to a clean current data replica

  9. We have scripts to quickly restore a broken master from the LDIF backups (via slapadd)

  10. We can quickly switch to the standby master. (scripts)


  11. We monitor that the replication connections are alive

  12. We monitor that the replication IDs are current on all slaves

  13. We monitor (less often) that the entire contents of the slaves match the master.
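A hedged sketch of the hourly LDIF dump and slapadd restore mentioned in points 7 and 9 (paths, the database number and the ownership are placeholders):

slapcat -n 1 -l /backup/ldap-$(date +%Y%m%d%H).ldif    # hourly dump, safe against a running slapd
# restoring a broken master from the latest dump (slapd stopped):
# rm -rf /var/lib/ldap/*
# slapadd -n 1 -q -l /backup/ldap-latest.ldif
# chown -R openldap:openldap /var/lib/ldap && service slapd start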



Basically, though, if it's a key part of your infrastructure, somebody on your team should really understand it well.



Addendum: By request, the DB_CONFIG file from my openldap DB directory. Look at http://docs.oracle.com/cd/E17076_02/html/api_reference/C/configuration_reference.html for details.



# 512 MB BerkeleyDB cache in a single segment
set_cachesize 0 536870912 1

# Trade some transaction durability for write performance:
# do not force a synchronous log flush on every commit
set_flags DB_TXN_NOSYNC
set_flags DB_TXN_WRITE_NOSYNC

# Transaction log tuning: 256 MB log region, 512 MB max log file size, 128 MB in-memory log buffer
set_lg_regionmax 268435456
set_lg_max 536870912
set_lg_bsize 134217728

debian - What's the difference between 'useradd' and 'adduser'?




What's the difference between useradd and adduser? When/why should I prefer using one or the other?


Answer



In the case of Debian and its related distros, adduser is a friendlier interactive frontend to useradd.
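For example (on Debian; the username here is just an illustration, and the exact prompts and defaults vary by release):

# adduser is interactive and applies Debian policy: it creates the home
# directory, copies /etc/skel, and prompts for a password and GECOS info
adduser jsmith

# useradd is the low-level tool: it only does what you explicitly ask for
useradd -m -s /bin/bash -c "J. Smith" jsmith
passwd jsmith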


When ordering SAS replacement drive, does the hot swappable enclosure matter?




I have a Dell PowerEdge and need to replace one of the SAS drives in it. I noticed the supplier I order from does not sell a specific "Dell" SAS replacement drive, but they have what appear to be compatible SAS (15,000 rpm / 146 GB) drives.



The problem is, they are branded "IBM", "HP" and "Fujitsu". Each of them appears to have its own type of hot-swap mechanism attached in the picture.



I am ASSUMING that's OK and that I would simply unscrew the drive from that enclosure and put it into my Dell hot-swap tray?


Answer



As far as the logistics go, that will work. The hot-swap enclosure is just an enclosure for a standard 3.5-inch drive. However, there was a time when Dell machines would only recognise, or allow you to use, Dell-branded disks. I believe Dell updated their firmware in most places to make that not the case, but you may want to make sure that your firmware is up to date.


Wednesday, September 5, 2018

domain name system - Secondary Nameserver DNSSEC



I have this hidden master DNS nameserver notifying and updating the two public slave DNS servers:





  • my own VPS running Debian/Bind9 DNS

  • 3rd-party secondary nameserver provider (afraid.org)



I finally got DNSSEC working with the hidden master and my public slave server (VPS).



Now I am searching high and low for a secondary nameserver service provider that can ALSO support DNSSEC. I couldn't find one. I couldn't understand why.



Then I saw this clue on GoDaddy Secondary NameServer wiki:





  • "You cannot use both DNSSEC and Secondary DNS with the same domain name."



Why can't a 3rd party provide a secondary nameserver with DNSSEC?


Answer



As has been noted, the quoted statement is one service provider noting a limitation in their own service, it's not a universal truth.



All that is really needed to make what you ask for work is this:





  • Slave nameserver gets an exact copy of the full zone data (including public keys, signatures, everything) such as what happens with a normal zone transfer (AXFR/IXFR), and simply uses the received zone data verbatim, no mucking about with the data.

  • Slave nameserver software supports DNSSEC. That is, it supports EDNS0 and knows to act on the DNSSEC-relevant flags in the header/EDNS0 fields (such as returning the relevant RRSIG/NSEC records in responses to queries that request DNSSEC).



As for why the service provider referenced in the question cannot do this, you will really need to direct the question to them to get a proper answer.
Maybe they are using some custom or outdated nameserver software that cannot meet the above requirements? Maybe it's some kind of policy decision that is not even purely technical?



If you look at service providers that have more of a focus on DNS hosting, my impression is that requirements like the above are usually a non-issue (provided they have a slave nameserver option in the first place).
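As a quick sanity check, you can confirm that a secondary is serving the signed zone verbatim by querying it for DNSSEC data directly. A sketch, using a hypothetical zone example.com and slave ns2.example.net:

# Ask the secondary for the SOA with DNSSEC records; an unmodified copy of a
# signed zone should return the matching RRSIG alongside the answer
dig @ns2.example.net example.com SOA +dnssec +norecurse

# Compare the DNSKEY set (and zone serial) against the master
dig @master.example.com example.com DNSKEY +dnssec +norecurse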


HP ProLiant DL380e Gen8 has high fan speed after installing second CPU



Today I installed a second CPU in our HP ProLiant DL380e Gen8, and after booting, the fans all went to 99.96%. This causes the server to make an enormous amount of noise.




Could it damage the server? What could I do to fix this?
I have already removed all the fans and put them back in, rebooted the entire server, and so on.



I'm running HP_ESXi-5.1.0, and the CPUs are two Intel Xeon E5-2420 0 @ 1.90 GHz.



This is the current memory configuration (screenshot omitted: before and after).

Temperatures:



(screenshot of temperature readings omitted)


Answer



After upgrading all the software, the fans kept blowing at 99%. However, I then noticed a warning during boot that fan 1 was missing; I switched fan 6 to slot 1 and the issue is now resolved.


Tuesday, September 4, 2018

PHP installation on IIS: ISAPI or CGI?



I'm running IIS6 on Windows Server 2k3, and currently have PHP installed as an ISAPI module. We're about to upgrade our environment to PHP 5.3.0, and this made me wonder whether I should stick with the ISAPI module or if there was a reason CGI would be a better fit.



We have one web server for our organization, and do not have to worry about security related to shared hosting; we have several web sites, but they all belong to us.




Is there an advantage to using one method over the other? Is one more secure? Is it simply a matter of preference?



EDIT: PHP 5.3.0 dropped support for ISAPI, so you do need to install it via FastCGI. From the PHP Migration Guide:




Support for the ISAPI module has been dropped. Use the improved FastCGI SAPI module instead.



Answer



The Non-Thread-Safe PHP CGI binary for Windows is supposed to give you maximum stability, compatibility and performance as:





  1. PHP was originally designed and optimized for a multi-process environment

  2. Most of the extensions were created keeping that in mind

  3. There is no "waiting" that you see in multi-threaded environments



However, performance and stability suffer when the CGI binary is used in multi-threaded environments such as IIS. Therefore most people have started using the relatively new FastCGI extension, which is available for IIS 5.1/IIS 6.0 as a download and is bundled with IIS 7.



This guide explains how to install and configure PHP CGI with Microsoft's FastCGI extension.




The second option is to go for PHP ISAPI, but be sure to (i) use thread-safe builds and (ii) use stable and tested extensions -- otherwise the PHP ISAPI module can crash and take down IIS with it. A side note is that thread safety in PHP is like a hand-brake that is always engaged; some even say that it is a myth.



Update: the PHP ISAPI module is no longer shipped, so the ISAPI vs. CGI question is moot; FastCGI is the recommended route.




  1. FastCGI support is built into modern versions of IIS; you just need to enable it (see the sketch after this list).

  2. PHP installer presents the option of installing PHP with FastCGI.

  3. People who want to perform ZIP installations can use "PHP Manager for IIS" to configure the PHP installation with IIS.
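For example, on IIS 7+ the FastCGI handler for PHP can be registered from an elevated command prompt roughly as follows. This is only a sketch: C:\PHP\php-cgi.exe is an assumed install path, and the appcmd syntax should be verified against the documentation for your IIS version.

rem Register php-cgi.exe as a FastCGI application (assumed path C:\PHP)
%windir%\system32\inetsrv\appcmd set config /section:system.webServer/fastCgi /+[fullPath='C:\PHP\php-cgi.exe']

rem Map *.php requests to that FastCGI application
%windir%\system32\inetsrv\appcmd set config /section:system.webServer/handlers /+[name='PHP_via_FastCGI',path='*.php',verb='*',modules='FastCgiModule',scriptProcessor='C:\PHP\php-cgi.exe',resourceType='Either']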



Monday, September 3, 2018

windows - Properly escaping a path with spaces in CMD shell



I have the following path:




"d:\workspace\Server trunk - CI\make\make & publish.bat"





However, when I try to execute this from a cmd shell, I get the error:




'd:\workspace\Server' is not recognized as an internal or external command, operable program or batch file.




What am I doing wrong? Is there a way to escape those spaces properly?


Answer



You need to quote everything bar the extension




"d:\workspace\Server trunk - CI\make\make & publish".bat

networking - How are NAT rules processed in Vyatta?

I am setting up a Vyatta router to replace my pfSense box that died. As I set up the NAT rules, I am not sure how they are processed.



Are Vyatta's NAT rules processed in the order of the list until the first match?



I have several rules that are destination rules for things like Zimbra and OpenVPN.



But at the bottom of my NAT rules I have a source NAT rule that defines anything coming from my subnet 10.0.0.0/24 should be NAT'd to my second usable public IP address.




So if I needed a specific NAT rule, like the one for my Zimbra server which sits on the third usable public IP, would that need to come BEFORE the general NAT rule?

Sunday, September 2, 2018

domain name system - Caching DNS returns SERVFAIL for NS record, but dig +trace disagrees?



This question is similar, but doesn't elaborate on the confusing case of why an NS record cannot be obtained.



One of our caching DNS environments (RHEL 5.8, BIND 9.3.6-20.P1.el5_8.4) has ceased to return any useful data at all for a zone. Usually this sort of problem ends up being a stale NS or glue record, but in this particular case I can't seem to even get the cache to report a NS record for the zone.




  • dig @mycache somedomain NS returns SERVFAIL. There are no nameserver records cached at all.

  • dig +trace shows a healthy delegation path, with the final nameserver returning a response. Manually running the dig query against the final nameserver returns a valid NS record, the corresponding A record exists and agrees with the glue, etc.




What gives? Why is there no NS record for me to obtain from the DNS cache, not even a bad one?


Answer



If there's no authoritative answer for an NS record, then there's nothing to cache other than the failure to determine the authority. This is what has been cached, and a server's in-memory information about lame nameservers cannot be obtained by a DNS client (or rather, this is as close as you're going to get).



Usually you can identify a problem with stale nameserver records by comparing the NS record in the cache to what you find on the internet, but in this case there is no authoritative NS record to cache. Glue records are not authoritative in and of themselves; with no authoritative answer, there is simply no authoritative nameserver.



One of two things is usually happening here:





  1. dig +trace is getting a stale answer for an intermediate nameserver from your local cache, and there really is a problem going on at the moment. I've covered this behavior in another question.

  2. The caching server encountered NXDOMAIN or SERVFAIL when chasing glue records to find an authoritative nameserver, and this event has been cached. Even if the problem has been corrected, or the glue has been pointed somewhere else, the nameserver isn't going to try asking for it again until an internal timer expires. Requesting a cache purge for the zone in question will usually reset it.



The latter case is usually the culprit. If you want to be absolutely sure, it may be possible to dump your nameserver's runtime cache and view the glue in memory. (i.e. BIND's rndc dumpdb) Be advised that this is a very expensive operation unless you can limit the scope of the dump to a single zone, and generally something to be avoided in high load scenarios.
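By way of illustration, with BIND and a hypothetical zone example.com, the cache purge and cache dump mentioned above look roughly like this:

# Flush the cached entries for the name, forcing a fresh lookup on the next query
rndc flushname example.com

# Dump the server's in-memory cache to the dump file (named_dump.db by default) for inspection;
# newer BIND versions accept options such as -cache to limit the scope of the dump
rndc dumpdb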


Saturday, September 1, 2018

linux - Log every IP connecting on a system with iptables



Title says it all.



How can I, with iptables under Linux, log all IPs connecting to a server?
As a little detail, I'd like to have only ONE entry in the log PER DAY PER IP.




Thanks :)



EDIT:



I narrowed it down to 5 packets being logged for every new session, which is weird since I use --hashlimit 1 --hashlimit-burst 1. I suspect that -m limit, which defaults to 5, plays a role in there. Trouble is, if I set -m limit to 1, only 1 entry is logged for ALL IPs instead of one per EACH IP.



The reason I want to do this is also to avoid as much as possible logs growing too fast since this will be a rather unmanaged box.



EDIT2:
Here is my current attempt, in iptables-restore format:

(on several lines for ease of reading)



-A FORWARD -d 10.x.x.x -p tcp --dport 443 -m state --state NEW 
-m hashlimit --hashlimit-upto 1/min --hashlimit-burst 1
--hashlimit-mode srcip --hashlimit-name denied-client
-j LOG --log-prefix "iptables (denied client): "

Answer



I would try this:




# Custom chain for logging and refreshing (create it first, so the rules below can reference it)
iptables -N logandset
iptables -A logandset -j LOG
iptables -A logandset -m recent --name mydaily --set

# IP address entry older than one day
iptables -A ... -m recent --name mydaily ! --rcheck ! --seconds 86400 -j logandset
# IP address never seen before
iptables -A ... -m recent --name mydaily ! --rcheck -j logandset



So your list mydaily will keep track of the last-seen IP addresses; if an address was never seen before, or if it was last seen more than a day ago, the packet will be logged and the list entry for that IP address will be updated.



You should probably set ip_list_tot to a higher value for mydaily, as explained in the iptables manpage (in your case, for /proc/net/xt_recent/mydaily).
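For example (a sketch: ip_list_tot defaults to around 100 entries per list on most kernels, and it is a module parameter, so it has to be set when xt_recent is loaded):

# /etc/modprobe.d/xt_recent.conf -- raise the per-list entry limit at module load time
options xt_recent ip_list_tot=4096

# Or, if the module is not currently loaded:
modprobe xt_recent ip_list_tot=4096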


linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...