Friday, June 30, 2017

virtual memory - Linux: How do I explicitly swap out everything possible?



This is essentially the reverse of "Linux: how to explicitly unswap everything possible?".



I want to maximize the amount of free memory before running a process that I know will use memory intensively, and I don't want it to pause for long stretches while the OS slowly decides that everything else should be swapped out.



Also, I know a lot of programs have memory they only use on initialization and then never touch again.




How do I make this happen?



I have tried doing sysctl vm.swappiness=100 but that hardly swaps out anything.


Answer



The unused initialization code will be freed as soon as the memory is needed for other purposes. (It will be backed by files from which it is read.)



The memory paging mechanisms on Linux are well designed and have been tested for years. It is rare you would want to swap any process out of memory. This would result in heavy paging activity any time the swapped process is scheduled for execution.



If you truly need the memory from the other applications, you have too little memory. You can prevent the other programs from executing by sending them a STOP signal with the kill command. Be careful which programs you stop or you could lock yourself out of the system.
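For illustration only (the process name here is hypothetical; only pause things you know are safe to pause):

pid=$(pgrep -o some_idle_daemon)   # oldest PID matching the name
kill -STOP "$pid"                  # freeze it; its idle pages become easy swap candidates
# ... run the memory-intensive job here ...
kill -CONT "$pid"                  # let it run again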




If you are experiencing large pauses during startup of your process, consider using sar to determine where the bottleneck is. You can also use top to determine which processes are being paged or swapped heavily. Don't be surprised if your process shows up as the problem.
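For example, assuming the sysstat package provides sar, commands along these lines will show whether paging or swapping is where the time is going:

sar -B 1 5     # paging statistics: pgpgin/s, pgpgout/s, majflt/s
sar -W 1 5     # swapping statistics: pswpin/s, pswpout/s
vmstat 1 5     # the si/so columns show pages swapped in/out per second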



I've run servers which were severely starved for memory. To perform startups, it was essential to limit the number of processes starting at any one time. Processes start almost instantaneously even if memory is heavily over-committed.



If you really want to force everything possible out of memory you could write a program that allocates the desired amount of memory and continually writes to each page of the allocated memory for a few loops. It will experience all the issues you want to avoid.
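As a sketch of that idea, assuming the stress-ng package is available (the size and duration are placeholders to adjust for your machine):

stress-ng --vm 1 --vm-bytes 8G --vm-keep --timeout 60s
# one worker allocates 8 GiB and keeps re-dirtying its pages,
# pushing everything else toward swap while it runs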


email - saslauthd authentication error



My server has developed an unexpected problem where I am unable to connect from a mail client.



I've looked at the server logs, and the only thing that seems to indicate a problem is events like the following:




Nov 23 18:32:43 hig3 dovecot: imap-login: Login: user=, method=PLAIN, rip=xxxxxxxx, lip=xxxxxxx, TLS
Nov 23 18:32:55 hig3 postfix/smtpd[11653]: connect from xxxxxxx.co.uk[xxxxxxx]
Nov 23 18:32:55 hig3 postfix/smtpd[11653]: warning: SASL authentication failure: cannot connect to saslauthd server: No such file or directory
Nov 23 18:32:55 hig3 postfix/smtpd[11653]: warning: xxxxxxx.co.uk[xxxxxxxx]: SASL LOGIN authentication failed: generic failure
Nov 23 18:32:56 hig3 postfix/smtpd[11653]: lost connection after AUTH from xxxxxxx.co.uk[xxxxxxx]
Nov 23 18:32:56 hig3 postfix/smtpd[11653]: disconnect from xxxxxxx.co.uk[xxxxxxx]





What makes the problem unusual is that just half an hour earlier, at my office, my mail client was connecting without prompting me for a username and password. I haven't made any changes to the server, so I can't understand what would have happened to make this error occur.



Searches for the error messages yield various results, with 'fixes' that I'm uncertain of (obviously I don't want to make it worse or fix something that isn't broken).



When I run




testsaslauthd -u xxxxx -p xxxxxx





I also get the following result:




connect() : No such file or directory




But when I run





testsaslauthd -u xxxxx -p xxxxxx -f
/var/spool/postfix/var/run/saslauthd/mux -s smtp




I get:




0: OK "Success."





I found those commands on another forum and am not entirely sure what they mean, but I'm hoping they might give an indication of where the problem might lie.



When I run




ps -ef|grep saslauthd




This is the output:





root      1245     1  0 Nov24 ?        00:00:00 /usr/sbin/saslauthd -a pam -c -m /var/spool/postfix/var/run/saslauthd -r -n 5
root      1250  1245  0 Nov24 ?        00:00:00 /usr/sbin/saslauthd -a pam -c -m /var/spool/postfix/var/run/saslauthd -r -n 5
root      1252  1245  0 Nov24 ?        00:00:00 /usr/sbin/saslauthd -a pam -c -m /var/spool/postfix/var/run/saslauthd -r -n 5
root      1254  1245  0 Nov24 ?        00:00:00 /usr/sbin/saslauthd -a pam -c -m /var/spool/postfix/var/run/saslauthd -r -n 5
root      1255  1245  0 Nov24 ?        00:00:00 /usr/sbin/saslauthd -a pam -c -m /var/spool/postfix/var/run/saslauthd -r -n 5
root      5902  5885  0 08:51 pts/0    00:00:00 grep --color=auto saslauthd




If it makes any difference, I'm running Ubuntu 10.04.1, Postfix 2.7.0 and Webmin/Virtualmin.


Answer



Postfix can run in a chroot (by default in /var/spool/postfix) or not. If it is, it will try to open /var/spool/postfix/var/run/saslauthd/mux for SASL authentication. If it's not, it will try to open /var/run/saslauthd/mux.



It seems that, for some reason, your Postfix instance was running in a chroot, and it's not anymore. It's odd, but that's what I guess from the details of your question. If that's what happened, you can either change the saslauthd configuration to use /var/run/saslauthd or run Postfix in a chroot again.




To know if your Postfix is running chroot, you can check /etc/postfix/master.cf:




  1. If it has the line smtp inet n - y - - smtpd or smtp inet n - - - - smtpd, then your Postfix is running in a chroot (the fifth column is the chroot flag, and "-" means the default, which is chroot on Debian/Ubuntu);

  2. If it has the line smtp inet n - n - - smtpd, then your Postfix is NOT running in a chroot.



This check comes from /etc/default/saslauthd (Ubuntu's SASL configuration file).
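A minimal sketch of how to check and fix this on Ubuntu; the OPTIONS line is the commonly used Debian/Ubuntu setting, and the user/password are placeholders:

# is smtpd chrooted? (5th column of the smtp line in master.cf)
grep -E '^smtp[[:space:]]' /etc/postfix/master.cf

# if Postfix stays chrooted, point saslauthd's socket into the chroot by
# setting this in /etc/default/saslauthd and restarting saslauthd:
#   OPTIONS="-c -m /var/spool/postfix/var/run/saslauthd"

# then re-test against the socket Postfix will actually use:
testsaslauthd -u someuser -p somepassword -f /var/spool/postfix/var/run/saslauthd/mux -s smtp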


Thursday, June 29, 2017

performance - What is the relation between IO wait utilisation and load average

Load average counts processes that are running, runnable, or in uninterruptible sleep. So do the processes in uninterruptible sleep correspond to the %wa shown by top? Both refer to threads waiting for IO, so it seems intuitive to assume that if one increases, the other will as well.



However, I'm seeing quite the opposite: %wa doesn't increase, %idle is high, and the load average is also high. I've read other questions on this but I haven't found a satisfactory answer, because they don't explain this behaviour.




  • If %wa does not include uninterruptible sleep, then what is
    it exactly? Is it that %wa does not really correspond with the
    load? (e.g. the load could be 10 on a 2-CPU machine but contribute
    only 30% to %wa)

  • And how is this IO different from the IO referred

    to in uninterruptible sleep? What is a possible remedy in this case?



Clearly, increasing CPU wouldn't help because there are tasks in the queue which the CPU is not picking up.



Another situation where it seems unintuitive that load average and CPU utilisation don't add up:



This situation is a bit different. The CPU idle time is high, the load average is high (often double the number of CPUs), there is no disk I/O, no swap usage, and some network I/O. There are no processes in uninterruptible sleep, but the run queue frequently spikes. How is the CPU still idle, though? Shouldn't I expect the CPU to be at 100% utilisation? Is it that the high number of tasks can't be put on the CPU because they are waiting on the network (or something else)? It only seems reasonable to assume that those tasks each consume very little time on the CPU. Is that correct? What is the bottleneck in this case? Is it correct to say that increasing the CPU will not help? How can I find out what to configure or which resources to increase in order to reduce the load average?
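For reference, the quantities I'm comparing can be seen side by side in vmstat ('r' = runnable tasks, 'b' = tasks in uninterruptible sleep, 'wa' = CPU time spent idle while waiting on disk I/O):

vmstat 1 10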



(Screenshots of output from sar -n TCP,ETCP,DEV 1, netstat connection counts, iostat, vmstat, uptime, top and nicstat were attached here in the original post.)

microsoft office 365 - Dynamics CRM with Windows Essentials AD + Azure AD

I'm trying to configure a new Dynamics CRM 2016 on premise installation with Claims based authentication for Sharepoint Online (Office 365) and Internet facing access.



We currently have a Windows 2012 R2 Essentials domain controller synchronizing with Office 365, I'm aware we should not change passwords on any online services but instead use the local account so it syncs up the new password.




At the time we wanted to be as lean as possible in terms of setup in the office, so Essentials was the obvious choice, but I now think it's a bit too essential when you want to add on other services! Is that correct?



I've seen this article, http://blog.kloud.com.au/2014/06/06/claims-based-federation-service-using-microsoft-azure/, that explains how to leverage ACS for the CRM's claims federation, which would sort the CRM login.



But I am slightly concerned about rolling this out without having single sign-on configured across the directory, e.g. syncing passwords down from Azure AD to the on-prem AD, which apparently isn't possible with this setup; see https://social.technet.microsoft.com/Forums/windowsserver/en-US/97cdba31-afda-49a0-bd71-cdd408b22fe6/windows-server-2012-r2-essentials-and-azure-active-directory-sync-tool?forum=winserveressentials



Before I commit to using ACS (available only in Azure Premium), I want to be sure we'll also be able to roll out single sign-on across the directory as it is, or to know whether we need to migrate to a new DC (not on Essentials) and use AADConnect instead; see https://azure.microsoft.com/en-us/documentation/articles/active-directory-aadconnect/, which includes ADFS and thus removes the need for ACS.



Am I just mixing concepts? Is my concern unfounded?




Has anyone been able to do this kind of setup before?



Any help on this would be greatly appreciated.

Wednesday, June 28, 2017

Autoscaling Azure SQL Database

I'm looking to set up some type of method to autoscale (vertically) my Azure SQL databases based on CPU or some other performance metric. My environment is in govcloud and I don't see any options for autoscale. I've created a logic app that will scale up and down at set times, but this doesn't help when performance fluctuates on a given day.




If somebody knows of a way to trigger the databases to autoscale when CPU reaches a certain percentage, I would appreciate any help/guidance.

raid - Fujitsu BX600 S3 with 10 x BX620 S4 Blade Servers + QNAP TS-EC1279U-RP --- SETUP



I have the following hardware:




  • Fujitsu BX600 S3 Chassis

  • 10 x BX620 S4 Blade Servers


  • QNAP TS-EC1279U-RP

  • APC Symmetra RM 6kVA 6000VA SYH2K6RMI

  • HP PROCURVE 2510G (J9280A)



What I would like to do:




  • Use the above for VMWARE

  • Partial rendering Video Farm


  • Developing software

  • Lots of testing and practicing



My questions:




  • The TS-EC1279U-RP has 4 x 1 Gbit connections which can all be trunked together. I can install a 10 Gbit network card and I would like to connect this to the BX600 for maximum throughput. The GBIC part number is FTLF8524P2BNV and I have 24 of these; please have a look at [Click Here][1] for the BX600 specs and exactly what I have - everything is listed there. What is the best way to connect the QNAP TS-EC1279U-RP to the BX600? (It has been suggested to me to use 2 Gbit ports for iSCSI and the other 2 for normal traffic, separating the traffic with the switch.)


  • With the TS-EC1279U-RP, what is the recommended RAID level that I should use? I have 12 x 3TB HDDs. (RAID 5 has been suggested to me, but I am not sure about that; I think at least RAID 6.)





Please ask me for any information that I might have missed.



Thank you


Answer




2 Gbit ports for iSCSI and the other 2 for normal traffic, separate
the traffic with the switch





That... do that, that makes sense - the BX600 is just a layer 2 switch, no layer 3, so I'd be tempted to create three vswitches, each with two active 1Gbps uplinks: one each for the management interface, iSCSI and VM traffic. This'll keep vMotion IP traffic 'in-chassis', give you a decent spot of performance (for iSCSI at 1Gbps to 7.2k disks anyway) and separate your VM traffic trunks.
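As a rough sketch with the classic esxcfg-vswitch CLI (the vswitch name and vmnic numbers below are made up; adjust for your blades):

esxcfg-vswitch -a vSwitch_iSCSI          # create a vswitch for storage traffic
esxcfg-vswitch -L vmnic2 vSwitch_iSCSI   # first 1Gbps uplink
esxcfg-vswitch -L vmnic3 vSwitch_iSCSI   # second 1Gbps uplink
esxcfg-vswitch -A iSCSI vSwitch_iSCSI    # port group for the iSCSI/vmkernel traffic
# repeat the same pattern for a management vswitch and a VM-traffic vswitch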



Oh and consider/benchmark using NFS instead of iSCSI, not saying it's better but that box can do both, why not try both options.



Regarding your array: don't EVER touch R5, it's dangerous and suicidal with large disks. I'd strongly advise you use RAID 10, but if you absolutely have to then R6 is SO much better than R5. You're still going to be bottlenecked by those 7.2k disks anyway, so it's somewhat moot.


Tuesday, June 27, 2017

apache 2.2 - Problems with multiple SSL on same IP, but only in select clients



I know there are tons of posts about multiple SSL on same IP, but I promise I'm not beating a dead horse. My question is very clear. First, a little background...



Our organization has several ecommerce sites. All of these sites are running on the same IP, using SNI for name-based virtual hosts. In most cases, this is working great. However, in some browsers (IE7 / IE8, but only on select machines for some reason), we got reports that users were seeing a domain mismatch with the SSL certificate. It turned out that they were seeing the SSL certificate for the first SSL host in alphabetic order, since Apache resolves the IP address first, THEN grabs what it thinks is the right virtual host file.



I did some experimenting with the SSL protocol and found that if I set it thusly (ssl.conf):




SSLProtocol TLSv1



Then I'd simply get a not found for any of the https domains in IE.



If I set SSLStrictSNIVHostCheck on in ports.conf



SSLStrictSNIVHostCheck on



Then I'd get a permission denied in the problematic browsers.




The problem is obviously that IE is not supporting, or not using, the TLSv1 protocol, or SNI, both of which are needed. So my question is...



Is there a configuration change I can make to support IE, perhaps under a different protocol, or is my only option using a separate IP for each virtual host which requires SSL?



Thanks in advance = )


Answer



SNI support is still, unfortunately, rather lacking. You don't specify, but I'd wager that your problematic IE browsers are on Windows XP machines, yes? There is no SNI support in any version of IE on Windows XP (or earlier); only Vista and later support it, and only in IE 7 and later.



See here for a list of browsers supporting SNI.




My advice: If you need to support clients that lack SNI support (and with the number of XP systems still out there, you quite likely do need to), then you'll have to implement solutions that do not depend on SNI.


linux - zram filesystem reports different usage at device level to that reported by the filesystem



We have an 80GB zram device defined on our host, and within this a 170GB ext4 filesystem:



echo 170G > /sys/block/zram0/disksize
echo 80G > /sys/block/zram0/mem_limit
/usr/sbin/mkfs.ext4 -q -m 0 /dev/zram0
/usr/bin/mount /dev/zram0 /var/zram


This filesystem is used by our application for rapidly accessing large amounts of ephemeral data.



The filesystem size displayed in df matches the zram size as reported in /sys/block/zram0/disksize



Copying test data into an empty filesystem, we verified that a 2.2 : 1 compression ratio is achieved, and so the filesystem fills before we hit the zramfs memory limit. The /sys/block/zram0/orig_data_size value matches the usage reported by the filesystem:




# expr `cat orig_data_size` / 1024 ; df -k /dev/zram0
112779188
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/zram0 175329308 112956600 62356324 65% /var/zram


However, when the application is running with live data over a longer period, we find that this no longer matches.



# expr `cat orig_data_size` / 1024 ; df -k /dev/zram0

173130200
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/zram0 175329308 112999496 62313428 65% /var/zram


Now, the filesystem reports a usage of approx 110GB, but the zramfs device reports 165GB. At the same time, the zramfs memory is exhausted, and the filesystem becomes read-only.



The zram figures confirm that we are getting a 2.2 : 1 compression ratio between orig_data_size and compr_data_size; however, why does the filesystem show much more free space than the zram device? Even if this is space already allocated for re-use by the filesystem, shouldn't it be reused rather than allocating new space?



The data consists of a large number of small files which are added and removed at irregular intervals.



Answer



The cause of this turns out to be that when files are deleted from the ext4 filesystem living on the zram0 device, the memory is not freed back to the system. This means that, although the space is available for the filesystem to use (the output of df), it is nevertheless still allocated memory (the stats in /sys/block/zram0). As a result, memory usage heads up to 100% of the allocation, even though the filesystem still considers itself only half-full due to the deletions.



This does mean that you can still fill the filesystem, and new files will not use as much new memory space; however, it does affect the compression ratio negatively.



The solution is to mount the filesystem with the discard and noatime options. The discard releases freed filespace back to the memory device and as a result the usage on the two matches again.
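In terms of the mount step from the question, the change is just the extra options (the remount form works on an already-mounted filesystem):

/usr/bin/mount -o discard,noatime /dev/zram0 /var/zram
# or, without unmounting:
mount -o remount,discard,noatime /var/zram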


Change IP in all DNS zones on Microsoft Windows DNS Server

We host about 125 DNS forward-lookup zones on a Microsoft Windows Server 2003 DNS Server. We need to migrate to a new block of IPs and don't want to go through each zone manually. What's the best way to globally change one ip (1.1.1.1, for example) to another (2.2.2.2, for example) in ALL DNS zones and records?

Monday, June 26, 2017

bind - How to make a proper SPF record(s) for a complex setup?

I have a setup with several mailing domains and one domain for the return-path and sender



So the mail header looks like:




Received: from x1.mailer1.com  ( can be x1.mailer2.com, x1.mailer3.com, x1.mailer4.com ) 

Sender name#companydomain.com@bounce.bouncedomain.com
Return-Path:

Received-SPF: neutral (google.com: a.y.z.a is neither permitted nor denied by best guess record for domain of test@bounce.bouncedomain.com)


Here are my SPF Records:




SPF Record for bouncedomain.com



"v=spf1 a a:bouncedomain.net a:bouncedomain.com include:bouncedomain.com ~all"



SPF record for mailer1.com:



v=spf1 a mx ptr mx:mail.mailer1.com -all



Questions: should this work? Do I need an SPF record published for mailer1.com?

virtualhost - /etc/hosts entry for single IP server serving multiple domains



Running Ubuntu 10.04



My server serves 3 different domains using name-based virtual hosts in Apache2.
I'm currently using different named virtual hosts to 301 redirect www to the non-www equivalent.
It's working, but I don't understand the correct entries for my /etc/hosts file, and I think that is causing problems when I try to set up Varnish.




I understand I need the localhost line



127.0.0.1       localhost localhost.localdomain


Should I also list each domain here, as in:



127.0.0.1       localhost localhost.localdomain example1.com example2.com example3.com



What about the entry for the IP of the server? Do I need the following line?



< IP.Of.Server >      example1.com example2.com example3.com


Also, should I be listing www.example.com AND example.com on each line, so they go into Apache and it can deal with the 301 redir?


Answer



I'm assuming this is for testing, otherwise you'd be setting up proper DNS records, not your hosts file.




What you want is for every name you want to call your web server with, to resolve to your server's IP address.



If you are testing from the server itself, then you can make everything point to 127.0.0.1, but of course also making it point to your server's actual IP address would work.



If you are testing from another machine, then of course you want every name to resolve to the server's real IP address.



The syntax is straightforward:



IP.of.server        www.domain.name domain.name
IP.of.server www.otherdomain.name otherdomain.name

IP.of.server www.anotherdomain.name anotherdomain.name
IP.of.server www.yetanotherdomain.name yetanotherdomain.name


...and so on.






Update:




Of course, what ErikA says is completely right. Modifying the hosts file is not needed for the server to work; it's only useful if/when you need to test it without having proper DNS records in place, or if you want to override them to connect, e.g., to a test server instead of a production one.


Sunday, June 25, 2017

systemd - /etc/HOSTNAME on SuSE: short name or FQDN?



The file /etc/HOSTNAME on SuSE-Linux contains the host name.



Should this be the fully qualified domain name, or the short name (without ".")?



Related question: socket.getfqdn() returns no domain, but socket.gethostname() does?


Answer




Please note that AFAIK the upper-case /etc/HOSTNAME is specific to SuSE systems, but it should be a symbolic link to the lowercase file /etc/hostname, which is used by systemd and should therefore be present on other distributions as well.



The recommended systemd utility hostnamectl distinguishes three different hostnames:




  1. the high-level "pretty" hostname which might include all
    kinds of special characters (e.g. "Lennart's Laptop"), which is stored in /etc/machine-info

  2. the static hostname which is used to initialize the
    kernel hostname at boot (e.g. "lennarts-laptop"), which is stored in /etc/hostname


  3. the transient hostname which is a default received from

    network configuration.




The manual page for the hostname configuration file man 5 hostname doesn't really explicitly use the term FQDN but states:




The /etc/hostname file configures the name of the local system that is set during boot using the
sethostname(2) system call. It should contain a single newline-terminated hostname string.
Comments (lines
starting with a `#') are ignored.
The hostname may be a free-form string up to 64 characters in length;
however, it is recommended that it consists only of 7-bit ASCII lower-case characters and no spaces or dots,

and limits itself to the format allowed for DNS domain name labels, even though this is not a strict
requirement.




Where the "no dots" is the only hint that the hostname file should only contain the system host name component, without a domain suffix and therefore not a FQDN.



The manual for the hostname command is more explicit (man 1 hostname) :







You can't change the FQDN (as returned by hostname --fqdn) or the DNS domain name with this [sic: the hostname] command. The FQDN of the system is the name that the resolver(3) returns for the host name.




In other words, the hostname is NOT the FQDN.



And then on how to configure the FQDN:




Technically: The FQDN is the name gethostbyname(2) returns for the host name returned by gethostname(2). The DNS domain name is the part after the first dot.




Therefore it depends on the configuration (usually in /etc/host.conf) how you can change it. Usually (if the hosts file is parsed before DNS or NIS) you can change it in /etc/hosts.
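To make that concrete, a sketch of the conventional layout (names are placeholders; Debian-derived systems typically use the 127.0.1.1 convention shown here, while other distributions may map the machine's real IP instead):

# /etc/hostname contains only the short name:
#     myhost
# /etc/hosts maps an address to the FQDN first, then the short name:
#     127.0.1.1   myhost.example.com   myhost
# verification:
hostname          # -> myhost
hostname --fqdn   # -> myhost.example.com
hostname -d       # -> example.com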







BTW: If you do use a FQDN such as myhost.example.com as hostname and in /etc/hostname, things like dnsdomain and hostname -d will return empty strings and will NOT split that string at the first dot into a DNS hostname component myhost and a domain name component example.com


Saturday, June 24, 2017

scheduling - Prevent duplicate cron jobs running



I have scheduled a cron job to run every minute but sometimes the script takes more than a minute to finish and I don't want the jobs to start "stacking up" over each other. I guess this is a concurrency problem - i.e. the script execution needs to be mutually exclusive.




To solve the problem I made the script look for the existence of a particular file ("lockfile.txt") and exit if it exists or touch it if it doesn't. But this is a pretty lousy semaphore! Is there a best practice that I should know about? Should I have written a daemon instead?


Answer



There are a couple of programs that automate this feature, take away the annoyance and potential bugs from doing this yourself, and avoid the stale lock problem by using flock behind the scenes, too (which is a risk if you're just using touch). I've used lockrun and lckdo in the past, but now there's flock(1) (in newish versions of util-linux) which is great. It's really easy to use:



* * * * * /usr/bin/flock -n /tmp/fcj.lockfile /usr/local/bin/frequent_cron_job

domain name system - Why should one have a secondary DNS server?



I'm very confused.



I basically understand how DNS works. Here's an example that helps illustrate what I'm having trouble understanding.



Right now, I run a small web-server. I use my provider's DNS manager, so I don't have a DNS server hosted on the machine.



Let's say for a second that I don't use my host's DNS, and I decide to set up a DNS server on my server. Hypothetical scenario: my (entire) server goes down - DNS included. Why do I need backup DNS? If the server is down, who cares if the DNS server is down too, considering that even if I had DNS up (say it wasn't on the crashed server), it wouldn't be able to forward requests anyway since the server would be down?




Is the point of having secondary DNS to be able to change the IP addresses that your DNS server points to, so that if your webserver was down, you could redirect traffic to a backup? How would you switch to the secondary provider in the event that your main DNS provider becomes unavailable? Is a backup DNS system basically up all the time? How is it configured? Is it just an exact clone of the DNS server you would have on your server? Do they run simultaneously?



Hopefully someone can see what I'm hung up on, and provide some guidance.


Answer



The major point in having a secondary DNS server is as a backup in the event the primary DNS server handling your domain goes down. In this case, your web server would still be up, but without a backup DNS server, nobody could get to it, possibly costing you lots of lost customers (i.e. REAL MONEY).



A secondary DNS server is always up and ready to serve. It can help balance the load on the network, as there is now more than one authoritative place to get your information. Updates are generally performed automatically from the master DNS. Thus it is an exact clone of the master.



Generally a DNS server contains more information than just a single server; it might contain mail routing information, information for many, many hosts, mail spam keys, etc. So resiliency and redundancy are of DEFINITE benefit to domain holders.




I hope this helps your understanding.


Top level domain/domain suffix for private network?

At our office, we have a local area network with a purely internal DNS setup, on which the clients are all named whatever.lan. I also have a VMware environment, and on the virtual-machine-only network, I name the virtual machines whatever.vm.



Currently, this network for the virtual machines isn't reachable from our local area network, but we're setting up a production network to migrate these virtual machines to, which will be reachable from the LAN. As a result, we're trying to settle on a convention for the domain suffix/TLD we apply to the guests on this new network we're setting up, but we can't come up with a good one, given that .vm, .local and .lan all have existing connotations in our environment.



So, what's the best practice in this situation? Is there a list of TLDs or domain names somewhere that's safe to use for a purely internal network?

Friday, June 23, 2017

security - How can I check if my embedded Linux's SSL is not affected by heartbleed, without relying on the version number?




There are a lot of embedded devices that are built on Linux and are used precisely for security purposes, like gateways. If I check OpenSSL on one of them I get:



openssl version -a


which returns:



OpenSSL 1.0.0k 5 Feb 2013



But this may be patched or backported and I don't have access to the sources. How can I check that my system is not vulnerable without relying on openssl version -a?


Answer



There is a Perl script that allows you to check your own services. There are also several online tools.
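For example, a recent nmap that ships the ssl-heartbleed NSE script can test the running service from another machine (hostname and ports below are placeholders):

nmap -p 443 --script ssl-heartbleed your-gateway.example.com
# add any other TLS ports the device exposes, e.g.:
nmap -p 443,8443,993,995 --script ssl-heartbleed your-gateway.example.com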


heroku - Redirect to HTTPS and Apex Domain with Nginx location Configuration

I would like to force HTTPS and the apex domain (e.g. https://example.com) in my application through nginx configuration using location blocks. I currently have the following nginx_app.conf file (which works with both the apex and the www subdomain, and both http and https):



location / {
    try_files $uri @rewriteapp;
}

location @rewriteapp {
    rewrite ^(.*)$ /app.php/$1 last;
}

location ~ ^/(app|config)\.php(/|$) {
    # fastcgi_pass directives go here...
}



To force the apex domain and https, I tried using if-statements as follows, checking for the $scheme and $host variables, but I get an error that the page is not redirecting properly. I also added an HSTS directive.



location / {
    if ($scheme = http) {
        rewrite ^/(.*) https://$host/$1 permanent;
    }
    if ($host = www.example.com) {
        rewrite ^/(.*) https://example.com/$1 permanent;
    }
    try_files $uri @rewriteapp;
}

location @rewriteapp {
    rewrite ^(.*)$ /app.php/$1 last;
}

location ~ ^/(app|config)\.php(/|$) {
    # fastcgi_pass directives go here...
    add_header Strict-Transport-Security "max-age=86400";
}



What is the proper way to force HTTPS and the apex domain with nginx configuration? As an aside, I'm using Heroku (with DNSimple) to deploy my app, so I would like both of the following domains to work: https://example.herokuapp.com and https://example.com.



UPDATE:
I tried moving the if-statements outside the location block into the default server block and changed the rewrites to returns as follows, but it still does not work. I still get "The page isn't redirecting properly" when requesting http, and an "Unable to connect" error when requesting the www subdomain.



if ($scheme = http) {
    return 301 https://$host$request_uri;
}

if ($host = www.example.com) {
    return 301 https://example.com$request_uri;
}

location / {
    try_files $uri @rewriteapp;
}

location @rewriteapp {
    rewrite ^(.*)$ /app.php/$1 last;
}

location ~ ^/(app|config)\.php(/|$) {
    # fastcgi_pass directives go here...
    add_header Strict-Transport-Security "max-age=86400";
}

Edit Hard Disk Serial Number with VMware




I'm virtualizing a Rockwell AssetCentre Server and I'm looking at Disaster Recovery scenarios. This server contains a lot of other Rockwell Software like RSLinx, Logix 5000, Logix 500, and more...



Software activations for Rockwell work in a very strict manner, so much so that I'm concerned about whether it's going to be viable to restore the AssetCentre server virtual machine to a different host in the event of a system failure.



The software activations are locked to the virtual machine using the serial number of the hard drive. You can also choose to lock it to the MAC address of the virtual machine. Are either of these two things something that can be customized and edited using VMWare? Will they automatically change if I host the virtual machine using a different Virtual Server?



I've looked inside the .vmx files (currently using a mix of VMware Workstation 7 and VMware ESXi 4.1) and I didn't see anything in either of the files that looked like a MAC address or a hard disk serial number.


Answer



So I found out that VMware changes the hard disk serial number (an 8-character alphanumeric code somehow bound to a hard drive or volume) when you make a clone, and I haven't found a way to manually change it back. So... using the "DISK_SERIAL_NUM" for the Host ID is a bad idea for Rockwell products running on VMware (even though they will still recommend it).




In FactoryTalk Activation Manager, if you click "Get New Activations" and then click the [...] button under "Host ID Information", it will show you the MAC address and the "DISK_SERIAL_NUM" and ask you to choose a Host ID to bind your activations to.



Since the MAC address is the only thing I know of that you can manually configure in ESXi on a virtual machine, we reworked our activations and now they're all bound to the MAC address of the primary network adapter.



Been running...
- AssetCentre
- RSLogix 500 (make sure you get the activation Node-Locked)
- RSLogix 5000
- RSLinx Classic
...with no issues since reworking the activations.


Thursday, June 22, 2017

Can cloning a hard disk drive to a SSD drive physically damage the SSD or the performance of the SSD?

I have a new SSD drive, I want to use it to replace the old mechanical hard disk drive in a laptop.



The laptop had a new O/S (Windows 7 64-bit) installed about 9 months ago and I want to know if it's worth the effort to reinstall the O/S or if I can just clone the old HDD.



I have read discussions about differences between platter alignments in the HDD and memory/data pages in the SSD. I believe you can reconfigure the SSD to handle the problem of platter/page alignment, but even so the advice is coming down on the side of a reinstall because:





  1. Cloning can cause a misconfigured SSD and thus not achieve the
    maximum performance boost I am looking for.

  2. The misconfiguration is actually damaging to the SSD and will result in a reduced lifetime.



Is there a quantified measure of the expected performance degradation for cloning (with or without reconfiguration)?



Is there a quantified measure of any possible damage to the SSD and the reduction in its lifetime?

linux - How to assign multiple public IP-Adresses for 2 KVM-Guests



I am new to this whole topic, and I have been trying for days to figure out how to assign multiple public IP addresses to KVM guests through a KVM host. I have found tons of examples of how to get such a setup running with 1 public IP.




Here is my setup:
The server has only one NIC/MAC and runs 2 KVM guests with Apache (and other stuff). Both guest environments are Ubuntu Server 11.10 and must run in separate VMs. The 5 public IP addresses are used to handle SSL certificates and other things. The first VM should use 3 of the 5 addresses/certificates; the second VM gets the rest. The Apache side is configured correctly.



I have tried a number of different ways via iptables to route the traffic from the host's NIC to the guest NICs. Even though one of those ways may well have been the right one, just implemented incorrectly, I'll leave the details out so as not to bias you. The question is: what's the ideal way to do this?



The following conditions should be met:




  • Apache must get the original IP address of the visitor


  • Apache must know which public IP address was used, so it can select the right SSL vhost

  • The traffic must not be routed through a (reverse) proxy on the host, since there are 2 other non-HTTP services, on other VM guests, that should be publicly accessible. And: only sshd should listen directly on the host - nothing else

  • Each VM should be able to access the internet directly.

  • The network in the data center is switched MAC-based. As far as I figured out, the only way to communicate with the internet is through eth0 and its MAC address.



If I discarded all the virtualization stuff, this would be perfectly easy, as Apache would get the requests directly on a specific IP address.



I am open for any working solution.




Diagram


Answer



Use a bridge on your dom0 (e.g. KVM Host) WAN interface. This requires installing bridge-utils package. Since this is Debian-based distro, you may configure it in /etc/network/interfaces:



iface eth0 inet manual

auto br_wan
iface br_wan inet dhcp
# Assuming DHCP to get address, otherwise migrate all WAN connection options here
#address 192.168.122.0

bridge_ports eth0 tap_guest1
bridge_stp off
bridge_maxwait 0
bridge_fd 0
pre-up ip tuntap add dev tap_guest1 user guest1 mode tap
# This command is required if your ISP allocates static IPs depending on MAC address
# You shouldn't use this but might be handy some time
#pre-up sysctl -q -w net/ipv4/conf/tap_guest1/proxy_arp=1
post-down ip tuntap del tap_guest1 mode tap



The pre-up commands set up the TAP interface that connects your KVM guest to the bridge. Note that this setup allows you to run kvm as the non-privileged user guest1. Note that setting net.ipv4.ip_forward = 1 with sysctl might be useful as well.



I have used ip tuntap command from iproute2 package. It's not yet documented in the Debian package but soon will be available in upstream's manual page. Since this package is installed on every Debian-based server, you won't need to install uml-utilities or openvpn package to just create these interfaces.



This approach sure lacks some elegance to manage lots of tap interfaces, because you'll need to create similar pre-up and post-down lines as for tap_guest1 interface. This can be fixed by writing additional scripts in /etc/network/pre-up.d and /etc/network/post-down.d. It is also a problem if you want to reconfigure br_wan interface with ifdown/ifup scripts while KVM guests are still running — you'll need either to remove all interfaces except eth0 from bridge configuration and detach them from bridge manually (don't forget to attach them back after bridge reconfiguration then) or shutdown all KVM instances running on a bridge.



Another way, perhaps more clean, is to write custom ifup script for KVM itself and use it in script option for your NIC. You can get an example in /etc/qemu-ifup. See kvm manual page for details.



Then you can run your KVM box like this:




kvm -net nic,model=virtio,macaddr=12:34:56:78:9a:bc \
-net tap,ifname=tap_guest1,script=no,downscript=no \
-boot c -nographic -display none -daemonize \
guest1-drive.qcow2


Setting several IP addresses on one interface for your KVM guest can be done manually with command



ip address add aaa.bbb.ccc.101/24 dev eth0



Or permanently in /etc/network/interfaces like this:



auto eth0 eth0:1
iface eth0 inet static
address aaa.bbb.ccc.100
network aaa.bbb.ccc.0
netmask 255.255.255.0
broadcast aaa.bbb.ccc.255

gateway aaa.bbb.ccc.1

iface eth0:1 inet static
address aaa.bbb.ccc.101
network aaa.bbb.ccc.0
netmask 255.255.255.0
broadcast aaa.bbb.ccc.255
gateway aaa.bbb.ccc.1



Note that if your datacenter/provider does not expect you to expose additional boxes on the same net, they might not be configured upstream and will be unreachable. In this case you might want to create an internal bridge and use iptables to forward packets between your WAN interface and this bridge using DNAT and SNAT. Assuming your local virtual bridge network is 10.0.0.0/8 and your guest1 is 10.0.0.2, you'll need this:



iptables -t nat -A PREROUTING --dst aaa.bbb.ccc.100 -p tcp --dport 80 -j DNAT --to-destination 10.0.0.2
iptables -t nat -A PREROUTING --dst aaa.bbb.ccc.101 -p tcp --dport 80 -j DNAT --to-destination 10.0.0.2
...
iptables -t nat -A POSTROUTING -p tcp --dst 10.0.0.2 -j SNAT --to-source aaa.bbb.ccc.100


Note that you'll need as many DNAT rules as you have external IPs per KVM guest, but only one SNAT rule to give access to the internet. You can also allow only HTTP/HTTPS/SSH traffic by forwarding only the desired ports; if you omit the --dport statement then all ports will be forwarded. Your KVM guest should have static network settings with the KVM host as its default gateway, unless you're willing to run a DHCP server.


HP ProLiant DL380 G7 servers will not POST



We have just received 2 HP DL380 G7s from our DR site. They have been running fine at that site for some time, but we have tried to power them up in our DC and they will not POST. We get a very brief flash of the POST screen and then the systems power cycle. Both systems behave in the same way. Has anyone come across this issue before? There are no beep codes. We have tried removing the battery as well as the hard disks, but it appears to be an issue that occurs prior to the system POSTing.


Answer



Check your KVM... Try with different keyboard/monitor or run headless. Don't repeat the mistakes made here.




See: HP ProLiant DL360 G7 hangs at "Power and Thermal Calibration" screen






Edit: It would be important to get into the iLO to see server messages. The iLO's settings are persistent, so removing the battery won't help you. Even if your issue is not KVM-related, the rest of the flowchart above should help you isolate the issue.



If you have physical access to the server, you can try this sequence:




  • Remove the power supply units and swap them.




Test to see if the system will boot...



If that doesn't work:




  • Remove all power supplies from the chassis

  • Locate the System Maintenance Switch on the motherboard - It's a set of 10 DIP switches.

  • Turn switch #6 on.


  • Insert all power supply units.

  • Power on the server and allow it to idle for 3 minutes.

  • Power the server off.

  • Remove all power supplies.

  • Return DIP switch #6 to off (original) position.

  • Reinsert the power supplies.

  • Power the server on.



Test to see if the system will boot...




If that doesn't work:




  • Remove all power supplies from the chassis

  • Turn DIP switches #1, #5 and #6 on.

  • Insert all power supply units.

  • Power on the server and allow it to idle for 3 minutes.

  • Power the server off.

  • Remove all power supplies.


  • Return DIP switches #1, #5 and #6 to the off (original) position.

  • Reinsert the power supplies.

  • Power the server on.



Test to see if the system will boot...









Wednesday, June 21, 2017

routing - Machines disregarding default gateway



Our gateway is a router which redirects all browsing traffic to a proxy server (Ubuntu 14.04.3). The proxy server then processes the traffic and sends it back to the router through a different interface. The proxy is also connected to the LAN.



Some computers in the LAN route their traffic directly to 192.168.0.2, which is the proxy server, disregarding the default gateway (192.168.0.1) set in the network settings. So far this has only been identified on computers with static IPs; DHCP users do not have a problem. What could be the reason for this? How can we avoid this behaviour? Find a basic diagram of the network below.



Diagram



Routing table of a machine with static IP





Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 192.168.0.1 192.168.0.179 276
127.0.0.0 255.0.0.0 On-link 127.0.0.1 306
127.0.0.1 255.255.255.255 On-link 127.0.0.1 306
127.255.255.255 255.255.255.255 On-link 127.0.0.1 306
169.254.0.0 255.255.0.0 On-link 192.168.0.179 296
169.254.255.255 255.255.255.255 On-link 192.168.0.179 276
192.168.0.0 255.255.255.0 On-link 192.168.0.179 276

192.168.0.179 255.255.255.255 On-link 192.168.0.179 276
192.168.0.255 255.255.255.255 On-link 192.168.0.179 276
224.0.0.0 240.0.0.0 On-link 127.0.0.1 306
224.0.0.0 240.0.0.0 On-link 192.168.0.179 276
255.255.255.255 255.255.255.255 On-link 127.0.0.1 306






Persistent Routes:

Network Address Netmask Gateway Address Metric
0.0.0.0 0.0.0.0 192.168.0.1 Default



Routing table of a DHCP machine




Active Routes:


Network Destination Netmask Gateway Interface Metric

0.0.0.0 0.0.0.0 192.168.2.1 192.168.2.165 10
127.0.0.0 255.0.0.0 On-link 127.0.0.1 306
127.0.0.1 255.255.255.255 On-link 127.0.0.1 306
127.255.255.255 255.255.255.255 On-link 127.0.0.1 306
192.168.2.0 255.255.255.0 On-link 192.168.2.165 266
192.168.2.165 255.255.255.255 On-link 192.168.2.165 266
192.168.2.255 255.255.255.255 On-link 192.168.2.165 266
224.0.0.0 240.0.0.0 On-link 127.0.0.1 306
224.0.0.0 240.0.0.0 On-link 192.168.2.165 266
255.255.255.255 255.255.255.255 On-link 127.0.0.1 306

255.255.255.255 255.255.255.255 On-link 192.168.2.165 266



Answer



We managed to address this issue thanks to the tip given by joeqwerty. We were not using ICMP redirects to redirect the traffic intentionally; however, both the proxy server and the router were sending ICMP redirects to the clients. Since neither device had any use for this behaviour, we disabled ICMP redirects on both, and the issue never came back.
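On the Linux side this typically amounts to something like the following (persisted in /etc/sysctl.conf or a file under /etc/sysctl.d/); the router needed its own equivalent "no redirects" setting:

sysctl -w net.ipv4.conf.all.send_redirects=0
sysctl -w net.ipv4.conf.default.send_redirects=0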


linux - Adding custom dns entries for name resolution in the local network




Hi!



Our office server serves different websites based on hostname, e.g. wiki.os, icons.os, an entry for each of many client projects, etc. We perform the name resolution via the hosts file on every single PC in the office. This is a real pain, as the list changes from time to time.



I would like to use the office server as the nameserver for the PCs in the office and let it return the usual nameserver results PLUS our custom local DNS entries for the office server, so every PC connected to the network can use the names. Currently, the nameserver used is the router. The server runs on Debian.



What would be the best way to do this? Do I have to set up a complete BIND server or is there a little, sneaky tool I missed? Any suggestions?



Greetings,
Steffen



Answer



I have used dnsmasq to provide local DNS services in my home network, and it is also known to be able to serve way beyond 1000 hosts. Dnsmasq will serve names from the /etc/hosts file, provide DNS caching, and it also contains a DHCP server. The DHCP part is disabled by default.



On Debian the installation procedure goes like this:



apt-get install dnsmasq



Dnsmasq will by default ask the nameservers in resolv.conf if its own cache or the local hosts file lacks a suitable entry.
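So a minimal sketch for the office case would be to put the names into the server's /etc/hosts (the addresses here are placeholders), restart dnsmasq, and hand out the server's address as the DNS server via DHCP or the router:

cat >> /etc/hosts <<'EOF'
192.168.1.10  wiki.os
192.168.1.10  icons.os
EOF
/etc/init.d/dnsmasq restart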



I also use dnsmasq at work to provide DNS caching, and its effect is very noticeable.



Tuesday, June 20, 2017

How to shrink SQL Server Log File Size?

To start off, I am a Software Engineer - nowhere near a DBA. I can install SQL Server and set up a simple SQL Server database, which I have done. I set up a simple SQL Server 2016 database on a client's web server (Windows 2012 R2) to store some data for a new website being deployed on the server. It started off as a very small database with a few tables. It has quickly grown to have a few dozen tables, several of which have thousands of records. The database is basically being used to import data from an external API and store it locally so that it can be accessed frequently without performance issues. All of the data imports are up and running without issue. However, I have noticed CPU spikes on SQL Server. More importantly, the log file for the database is up to 33GB. I am very afraid this log file is going to end up using all of the available disk space and crash the server. I need to know what I can do in the short term (until we get a real DBA in here) to prevent this from happening.

Sunday, June 18, 2017

How can an unauthenticated user access a windows share?



I have a directory shared on my computer, which is part of the domain. Is it possible to set up the share so that a user logged on to a different machine which is not part of the domain can access my share? From the machine not on the domain, I can browse to the share, but it asks for credentials, and I just want to allow anonymous access.


Answer



To do what you want you'll have to enable the "Guest" account on the computer hosting the files and then grant the "Everyone" group whatever access you want.




"Guest" is a user account, but its enabled / disabled status is interpreted by the operating system as a boolean "Allow unauthenticated users to connect?" Permissions still control the access to files, but you open things up a LOT by enabling Guest.



Don't do this on a domain controller computer, BTW, because you'll be enabling Guest on all DCs...


domain name system - DHCP and DNS on none AD 2003 Server PTR is updating but no A records

I have a strange issue. I have a DHCP and DNS server running in a non-AD environment on Windows Server 2003. I set up DHCP to update DNS A and PTR records even if the client doesn't request it, but I only see PTR records updated; the A records are not created at all. The domain is "local", the forward zone is called "local", and option 15 is set to "local" (the actual name). The PTR records are created with the right name (example: win64_ent.local). What am I missing here?

Friday, June 16, 2017

How is the IP of a nameserver located in its own domain resolved?





I understand the basics of how DNS works. My ISP finds out, recursively, that my domain xyz.com is assigned two nameservers:



ns1.xyzprivatens.com
ns2.xyzprivatens.com




Then it asks one of those two for the xyz.com IP and communicates with the server using this IP.



Now for the interesting question: how are the IPs of the nameservers themselves resolved?


Answer



This is done by means of the so-called Glue Records. More information here on SF or on Wikipedia.
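You can see the glue yourself with dig; for example (the domain and the TLD server name are illustrative):

dig +norecurse @a.gtld-servers.net xyz.com NS   # glue A records show up in the ADDITIONAL section
dig +trace xyz.com                              # walks the delegation and shows the glue at each step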


testing - How do I force ext3 partition to the "error" state?

I have a script where fsck is called only if the filesystem is in the "error" state.
I'd like to test it.
Unfortunately, I don't know how to force the "error" state on an ext3 partition.



The only idea I have is to run fsck on the mounted partition first. As soon as there is a warning that the filesystem may be damaged, I expect that could do it. I need to try.

Meanwhile, maybe someone already knows the answer?

monitoring - Using smartd to monitor eSATA hard drive?



I'm using smartd to monitor the S.M.A.R.T. health of the internal hard drives on my file server and alert me to signs of impending doom. I would also like to monitor the external eSATA hard drives I'll be using with it, but I'm not sure how to overcome these obstacles:





  1. Being an external drive used for off-site backup, it may or may not be present. How can I make smartd not "freak out" and spam my e-mail inbox when the drive "disappears"? (Note: I haven't tested this yet, but I'm assuming smartd will assume a catastrophic failure of the drive if it suddenly can't be found.)

  2. For the same reason as above, the drives won't always be e.g. /dev/sdf (in fact, once I remove the USB HDD that's currently connected, the next time I connect one of them it will be /dev/sdg!), but it's my understanding that by-UUID and friends reference partitions, whereas I need to reference devices for smartd. How can I reliably point toward these external drives without having to manually update the /dev/sd* reference each time it's plugged in?



Using DEVICESCAN in the config file seems the obvious choice, since I am using identical configurations for all my drives anyway, but it's my understanding that smartd only scans devices when it starts up, and I'd rather avoid having to restart the daemon every time I plug in one of the drives (unless this is the most elegant solution to the problem).


Answer



You should be able to achieve what you need with udev rules.



You could create /dev symlinks to provide consistent access to your external drives (as identified by serial/model/etc). Those could then be referenced in the smartd config and marked with -d removable to ensure smartd starts up when the external drive is absent.




You may still need to restart smartd, but udev can do that too via the RUN directive.
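A sketch of what that could look like (the serial number is a placeholder; find the real one with smartctl -i /dev/sdX or udevadm info):

cat > /etc/udev/rules.d/99-offsite-backup.rules <<'EOF'
SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL_SHORT}=="WD-ABC1234567", SYMLINK+="backup_a"
EOF
udevadm control --reload-rules
# /etc/smartd.conf then references the stable name:
#   /dev/backup_a -a -d removable -m you@example.com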


SSH X11 not working



I have a home and work computer, the home computer has a static IP address.



If I ssh from my work computer to my home computer, the ssh connection works but X11 applications are not displayed.




In my /etc/ssh/sshd_config at home:



X11Forwarding yes
X11DisplayOffset 10
X11UseLocalhost yes


At work I have tried the following commands:




xhost + home HOME_IP
ssh -X home
ssh -X HOME_IP
ssh -Y home
ssh -Y HOME_IP


My /etc/ssh/ssh_config at work:



Host *

ForwardX11 yes
ForwardX11Trusted yes


My ~/.ssh/config at work:



Host home
HostName HOME_IP
User azat
PreferredAuthentications password

ForwardX11 yes


My ~/.Xauthority at work:



-rw------- 1 azat azat 269 Jun  7 11:25 .Xauthority


My ~/.Xauthority at home:




-rw------- 1 azat azat 246 Jun  7 19:03 .Xauthority


But it doesn't work



After I make an ssh connection to home:



$ echo $DISPLAY
localhost:10.0


$ kate
X11 connection rejected because of wrong authentication.
X11 connection rejected because of wrong authentication.
X11 connection rejected because of wrong authentication.
X11 connection rejected because of wrong authentication.
X11 connection rejected because of wrong authentication.
X11 connection rejected because of wrong authentication.
X11 connection rejected because of wrong authentication.
X11 connection rejected because of wrong authentication.
kate: cannot connect to X server localhost:10.0



I use iptables at home, but I've allowed port 22. According to what I've read that's all I need.



UPD.
With -vvv




...
debug2: callback start

debug2: x11_get_proto: /usr/bin/xauth list :0 2>/dev/null
debug1: Requesting X11 forwarding with authentication spoofing.
debug2: channel 1: request x11-req confirm 1
debug2: client_session2_setup: id 1
debug2: fd 3 setting TCP_NODELAY
debug2: channel 1: request pty-req confirm 1
...


When try to launch kate:





debug1: client_input_channel_open: ctype x11 rchan 2 win 65536 max 16384
debug1: client_request_x11: request from 127.0.0.1 55486
debug2: fd 8 setting O_NONBLOCK
debug3: fd 8 is O_NONBLOCK
debug1: channel 2: new [x11]
debug1: confirm x11
debug2: X11 connection uses different authentication protocol.
X11 connection rejected because of wrong authentication.

debug2: X11 rejected 2 i0/o0
debug2: channel 2: read failed
debug2: channel 2: close_read
debug2: channel 2: input open -> drain
debug2: channel 2: ibuf empty
debug2: channel 2: send eof
debug2: channel 2: input drain -> closed
debug2: channel 2: write failed
debug2: channel 2: close_write
debug2: channel 2: output open -> closed

debug2: X11 closed 2 i3/o3
debug2: channel 2: send close
debug2: channel 2: rcvd close
debug2: channel 2: is dead
debug2: channel 2: garbage collecting
debug1: channel 2: free: x11, nchannels 3
debug3: channel 2: status: The following connections are open:
#1 client-session (t4 r0 i0/0 o0/0 fd 5/6 cc -1)
#2 x11 (t7 r2 i3/0 o3/0 fd 8/8 cc -1)


# The same as above repeate about 7 times

kate: cannot connect to X server localhost:10.0



UPD2
Please provide your Linux distribution & version number.
Are you using a default GNOME or KDE environment for X or something else you customized yourself?





azat:~$ kded4 -version
Qt: 4.7.4
KDE Development Platform: 4.6.5 (4.6.5)
KDE Daemon: $Id$


Are you invoking ssh directly on a command line from a terminal window?
What terminal are you using? xterm, gnome-terminal, or?
How did you start the terminal running in the X environment? From a menu? Hotkey? or ?




From terminal emulator `yakuake`

Manually press `Ctrl + N` and write commands


Can you run xeyes from the same terminal window where the ssh -X fails?




`xeyes` - is not installed
But `kate` or another kde app is running



Are you invoking the ssh command as the same user that you're logged into the X session as?
From the same user



UPD3



I also downloaded the ssh sources and, using debug2(), logged why it reports that the authentication protocol is different.
It sees two cookies: one of them is empty, the other is MIT-MAGIC-COOKIE-1.


Answer



The reason ssh X forwarding wasn't working was that I have an /etc/ssh/sshrc config file.



The end of the sshd(8) man page states:





If ~/.ssh/rc exists, runs it; else if /etc/ssh/sshrc exists, runs it; otherwise runs xauth




So I add the following commands to /etc/ssh/sshrc (also from the sshd man page) on the server side:



if read proto cookie && [ -n "$DISPLAY" ]; then
        if [ `echo $DISPLAY | cut -c1-10` = 'localhost:' ]; then
                # X11UseLocalhost=yes
                echo add unix:`echo $DISPLAY | cut -c11-` $proto $cookie
        else
                # X11UseLocalhost=no
                echo add $DISPLAY $proto $cookie
        fi | xauth -q -
fi


And it works!


Thursday, June 15, 2017

windows - Load-testing tools for Active Directory?




Are there any tools out there that can torture-test and measure AD performance? We're looking at a fairly major expansion of our environment (think tens of thousands of computers) that will throw lots of transactions at our AD environment.



We suspect that we need to add hardware to our core network, but I don't want to buy hardware blindly and either waste money or hurt performance for the users.



Any ideas? I'm thinking of a tool to generate synthetic transactions, but I'm willing to accept any suggestions.


Answer



Microsoft has a tool exactly for this called Active Directory Performance Testing Tool (ADTest.exe). Unfortunately, I can't find any documentation online for you, but a quick adtest.exe /? should give you some information. I believe there is a "quicktest" option too which will get you up and running quickly.



You may also want to read this article on domain controller capacity planning. It is written for 2003, but it should apply to 2008 also.


linux - Showing total progress in rsync: is it possible?



I have searched for this option already, but have only found solutions that involve custom patching. The fact that it does not show in --help and no more info can be found probably indicates the answers is 'no', but I'd like to see this confirmed.




Is it possible to show total file transfer progress with rsync?


Answer



There is now an official way to do this in rsync (version 3.1.0, protocol version 31; tested with Ubuntu Trusty 14.04).



#> ./rsync -a --info=progress2 /usr .
305,002,533  80%   65.69MB/s    0:00:01 (xfr#1653, ir-chk=1593/3594)


I tried with my /usr folder because I wanted this feature for transferring whole filesystems, and /usr seemed to be a good representative sample.




The --info=progress2 option gives a nice overall percentage, even if it's computed against a still-growing file list. In fact, my /usr folder is more than 6 GB:



#> du -sh /usr
6,6G /usr/


and rsync took a long time to scan it all. As a result, for most of the run the percentage I saw hovered around 90% completed, but it's nonetheless comforting to see that something is being copied :)
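
If the rough percentage bothers you, one thing worth trying (my own suggestion, not something tested in the original answer) is to disable incremental recursion so the whole file list is scanned up front; the transfer starts later, but the percentage and ETA are then computed against the real total:

#> rsync -a --info=progress2 --no-inc-recursive /usr .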








amazon ec2 - After reading lots of articles on sendmail relay tls port 587 gmail, still having problems



Environment is Amazon EC2, running Amazon Linux (Centos-like) with sendmail 8.14.4 and Cyrus sasl 2.1.23. The machine has an elastic IP address that is the target of a domain name, and reverse DNS is setup for it. MX record points at an external server, so the machine does not deal with any incoming internet email, only outgoing. For all outgoing off-node email, I want to use a TLS authenticated connection to smtp.googlemail.com.



There are lots of tutorials and articles about this kind of setup (several on this site and its sister sites), as it seems to be a fairly popular way of dealing with email in the cloud. I've been reading every one I can find, trying different things, and studying /var/log/maillog. But for the life of me, I'm stumped. It's not that I can't send email with sendmail, or that I can't send it through an authenticated TLS connection to smtp.googlemail.com: it's just that I can't get sendmail to send it through such a connection! I admit I'm a sendmail newbie, and I know it has a reputation for difficult configuration, so I've really put a lot of time into it. But I've run out of clues and ideas at this point.




I have one php application on the machine that uses Zend Framework (1.11), and it uses Zend_Mail_Transport_Smtp. In the constructor for the transport I specify smtp.googlemail.com, port 587, tls, name@gmail.com, and password. The email from that application gets sent lickity-split and arrives with nice clean headers (I also added a TXT record including google's spf txt records to the domain's zone file).



I have another application that uses php mail directly, and php mail relies on sendmail. That application also is able to send mail that arrives okay under the default sendmail configuration supplied with the Amazon Linux AMI (which does not relay through googlemail). But the headers in the arriving messages are not so clean, at least one issue being a "neutral" complaint from the spf check. So I'm not stuck without email; it's just that I'd like to use sendmail for its greater reliability in queueing outgoing mail in case of multi-user bursts of emails coming from the website (as all those new users of the Zend Framework application sign up for accounts :-), and I don't want users looking at the headers and thinking things may not be quite right.



So the goal here is just to modify the existing working sendmail configuration to relay outgoing mail through a port 587 tls connection to googlemail, just the way that the Zend Framework application does with no problem. And then I can modify the Zend Framework application to use sendmail to get the overload protection of queueing.



How hard can it be I thought...



What I'm finding is that with the changes in place in the sendmail configuration, outgoing messages invariably get stuck with this message in /var/log/maillog:




"timeout waiting for input from googlemail-smtp.l.google.com. during client greeting"


The messages then go into the mailq and stay there, failing each time they get re-tried.



Here are the diffs between the sendmail.mc that works by sending directly and the one modified to relay through google:



# diff sendmail.mc-orig sendmail.mc-new
26c27,30

< dnl define(`SMART_HOST', `smtp.your.provider')dnl
---
> define(`SMART_HOST', `[smtp.googlemail.com]')dnl
> define(`RELAY_MAILER_ARGS', `TCP $h 587')dnl
> define(`ESMTP_MAILER_ARGS', `TCP $h 587')dnl
> FEATURE(authinfo, `Hash -o /etc/mail/authinfo.db')dnl
52,53c56,59
< dnl TRUST_AUTH_MECH(`EXTERNAL DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl
< dnl define(`confAUTH_MECHANISMS', `EXTERNAL GSSAPI DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl
---

> dnl # TRUST_AUTH_MECH(`EXTERNAL DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl
> dnl # define(`confAUTH_MECHANISMS', `EXTERNAL GSSAPI DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl
> TRUST_AUTH_MECH(`PLAIN LOGIN')dnl
> define(`confAUTH_MECHANISMS', `PLAIN LOGIN')dnl


Here is the (sanitized) mode 600 root-owned authinfo file:
AuthInfo:[smtp.googlemail.com] "U:user@gmail.com" P:"xyzzy" "M:PLAIN"



I also started saslauthd, and edited /usr/lib/sasl2/Sendmail.conf to read:
pwcheck_method:saslauthd
mech_list: login plain




I did not set up any certificates, because as far as I could tell they are not necessary if sendmail is only using TLS on outgoing connections; and indeed the port 587 connection works in the Zend Framework application without certificates installed (or saslauthd running for that matter).



The level 29 maillog for a message sent with the relaying sendmail config looks like this:



[31528]: q1F5dYcB031528: from=userapp.com+admin@gmail.com, size=7085, class=0, nrcpts=1, msgid=<3fb0c7a36a63b2b855336d0865b345a7@bugs.userapp.com>, relay=nobody@localhost
[31529]: NOQUEUE: connect from localhost [127.0.0.1]
[31529]: AUTH: available mech=PLAIN LOGIN, allowed mech=PLAIN LOGIN
[31529]: q1F5dYYY031529: Milter: no active filter
[31529]: q1F5dYYY031529: --- 220 name.compute-1.internal ESMTP Sendmail 8.14.4/8.14.4; Wed, 15 Feb 2012 05:39:34 GMT
[31529]: q1F5dYYY031529: <-- EHLO name.compute-1.internal

[31529]: q1F5dYYY031529: --- 250-name.compute-1.internal Hello localhost [127.0.0.1], pleased to meet you
[31529]: q1F5dYYY031529: --- 250-ENHANCEDSTATUSCODES
[31529]: q1F5dYYY031529: --- 250-PIPELINING
[31529]: q1F5dYYY031529: --- 250-8BITMIME
[31529]: q1F5dYYY031529: --- 250-SIZE
[31529]: q1F5dYYY031529: --- 250-DSN
[31529]: q1F5dYYY031529: --- 250-ETRN
[31529]: q1F5dYYY031529: --- 250-AUTH PLAIN LOGIN
[31529]: q1F5dYYY031529: --- 250-DELIVERBY
[31529]: q1F5dYYY031529: --- 250 HELP

[31529]: q1F5dYYY031529: <-- MAIL From: SIZE=7085 AUTH=userapp.com+2Badmin@gmail.com
[31529]: ruleset=trust_auth, arg1=userapp.com+2Badmin@gmail.com, relay=localhost [127.0.0.1], reject=550 5.7.1 ... not authenticated
[31529]: q1F5dYYY031529: --- 250 2.1.0 ... Sender ok
[31529]: q1F5dYYY031529: <-- RCPT To:
[31529]: q1F5dYYY031529: --- 250 2.1.5 ... Recipient ok
[31529]: q1F5dYYY031529: <-- DATA
[31529]: q1F5dYYY031529: --- 354 Enter mail, end with "." on a line by itself
[31529]: q1F5dYYY031529: from=, size=7190, class=0, nrcpts=1, msgid=<3fb0c7a36a63b2b855336d0865b345a7@bugs.userapp.com>, proto=ESMTP, daemon=MTA, relay=localhost [127.0.0.1]
[31529]: q1F5dYYY031529: --- 250 2.0.0 q1F5dYYY031529 Message accepted for delivery
[31528]: q1F5dYcB031528: to=user@example.com, ctladdr=userapp.com+admin@gmail.com (99/99), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=37085, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (q1F5dYYY031529 Message accepted for delivery)

[31529]: q1F5dYYZ031529: <-- QUIT
[31529]: q1F5dYYZ031529: --- 221 2.0.0 name.compute-1.internal closing connection
[31531]: q1F5dYYY031529: SMTP outgoing connect on name.compute-1.interna
[31531]: q1F5dYYY031529: timeout waiting for input from googlemail-smtp.l.google.com. during client greeting
[31531]: q1F5dYYY031529: to=, delay=00:05:00, xdelay=00:05:00, mailer=relay, pri=127190, relay=googlemail-smtp.l.google.com. [74.125.91.16], dsn=4.0.0, stat=Deferred: Connection timed out with googlemail-smtp.l.google.com.


I do see the "ruleset=trust_auth, ... not authenticated" message, but besides not knowing how to fix it, I also see that it's immediately followed by an ok message, and the log shows it proceeding onward to try to connect to the relay, so I think that has nothing to do with the timeout... If I'm wrong and someone could tell me how to fix it, that would be great!



The maillog for a message sent with the unmodified config that works without relaying (note that "user@example.com" is actually an address with an mx record for a Network Solutions server, which is why the final line has a relay= for a netsol.net host):




[31425]: q1F5VtLr031425: from=userapp+admin@gmail.com, size=6743, class=0, nrcpts=1, msgid=<8aa8ddfdc691cb86896329126a4eb6ef@bugs.userapp.com>, relay=nobody@localhost
[31426]: NOQUEUE: connect from localhost [127.0.0.1]
[31426]: AUTH: available mech=PLAIN LOGIN, allowed mech=EXTERNAL GSSAPI KERBEROS_V4 DIGEST-MD5 CRAM-MD5
[31426]: q1F5VuPd031426: Milter: no active filter
[31426]: q1F5VuPd031426: --- 220 name.compute-1.internal ESMTP Sendmail 8.14.4/8.14.4; Wed, 15 Feb 2012 05:31:56 GMT
[31426]: q1F5VuPd031426: <-- EHLO name.compute-1.internal
[31426]: q1F5VuPd031426: --- 250-name.compute-1.internal Hello localhost [127.0.0.1], pleased to meet you
[31426]: q1F5VuPd031426: --- 250-ENHANCEDSTATUSCODES
[31426]: q1F5VuPd031426: --- 250-PIPELINING

[31426]: q1F5VuPd031426: --- 250-8BITMIME
[31426]: q1F5VuPd031426: --- 250-SIZE
[31426]: q1F5VuPd031426: --- 250-DSN
[31426]: q1F5VuPd031426: --- 250-ETRN
[31426]: q1F5VuPd031426: --- 250-DELIVERBY
[31426]: q1F5VuPd031426: --- 250 HELP
[31426]: q1F5VuPd031426: <-- MAIL From: SIZE=6743
[31426]: q1F5VuPd031426: --- 250 2.1.0 ... Sender ok
[31426]: q1F5VuPd031426: <-- RCPT To:
[31426]: q1F5VuPd031426: --- 250 2.1.5 ... Recipient ok

[31426]: q1F5VuPd031426: <-- DATA
[31426]: q1F5VuPd031426: --- 354 Enter mail, end with "." on a line by itself
[31426]: q1F5VuPd031426: from=, size=6848, class=0, nrcpts=1, msgid=<8aa8ddfdc691cb86896329126a4eb6ef@bugs.userapp.com>, proto=ESMTP, daemon=MTA, relay=localhost [127.0.0.1]
[31426]: q1F5VuPd031426: --- 250 2.0.0 q1F5VuPd031426 Message accepted for delivery
[31425]: q1F5VtLr031425: to=user@example.com, ctladdr=userapp+admin@gmail.com (99/99), delay=00:00:01, xdelay=00:00:00, mailer=relay, pri=36743, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (q1F5VuPd031426 Message accepted for delivery)
[31428]: q1F5VuPd031426: SMTP outgoing connect on name.compute-1.interna
[31426]: q1F5VuPe031426: <-- QUIT
[31426]: q1F5VuPe031426: --- 221 2.0.0 name.compute-1.internal closing connection
[31428]: q1F5VuPd031426: to=, delay=00:00:00, xdelay=00:00:00, mailer=esmtp, pri=126848, relay=inbound.domain.netsolmail.net. [205.178.149.7], dsn=2.0.0, stat=Sent (OK FB/29-06630-5434B3F4)
[31428]: q1F5VuPd031426: done; delay=00:00:00, ntries=1



Here is what I get using telnet:



# telnet smtp.googlemail.com 587
Trying 74.125.93.16...
Connected to smtp.googlemail.com.
Escape character is '^]'.
220 mx.google.com ESMTP j17sm7987765qaj.9


502 5.5.1 Unrecognized command. j17sm7987765qaj.9
STARTTLS
503 5.5.1 EHLO/HELO first. j17sm7987765qaj.9
EHLO localhost
250-mx.google.com at your service, [nnn.nnn.nnn.nnn]
250-SIZE 35882577
250-8BITMIME
250-STARTTLS
250 ENHANCEDSTATUSCODES
STARTTLS

220 2.0.0 Ready to start TLS


Any help greatly appreciated!


Answer



You seem to have a small typo in your password declaration, so try the following:



AuthInfo:googlemail.com "U:user@gmail.com" "P:xyzzy" "M:PLAIN"
AuthInfo:google.com "U:user@gmail.com" "P:xyzzy" "M:PLAIN"



Do not forget to run makemap and rebuild authinfo.db:



makemap hash authinfo < authinfo
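
Independently of sendmail, it can also help to confirm that the TLS handshake to the smart host works at all, and then watch a verbose queue run (standard openssl and sendmail commands, not part of the original answer; the exact output will differ):

# test the STARTTLS handshake to the smart host by hand
openssl s_client -starttls smtp -connect smtp.googlemail.com:587

# then force a verbose queue run and watch the relay attempt
sendmail -q -v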

amazon ec2 - How to list EC2 security group *rules* from within instance?




I have SSH root access to an EC2 instance but no access to the AWS Console / AWS KEY & SECRET.




My incoming traffic to this host on all TCP ports except 22 TCP seems to be blocked - I cannot access my services (for example nginx on port 80) from the outside.



nmap shows these ports as filtered, while port 22 is shown as open when SSH is running and closed when SSH is temporarily shut down.




ICMP and UDP are also blocked.



(I used ping, nc and some other tools to check that.)





I know that my instance is in some custom, non-default EC2 Security Group named, let's say my-security-group, but I don't know its rules.




How to list these rules with the access level I have got?




Update 1: My iptables rules are empty - let's assume that I am sure it's the Security Group that is blocking my traffic.


Answer



You can't. You can get a list of groups you're in at http://169.254.169.254/latest/meta-data/security-groups but it won't give you the rules themselves.
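
For example, from inside the instance the most you can get is the group name(s) (standard curl against the metadata service; the group name below is the one from the question):

$ curl -s http://169.254.169.254/latest/meta-data/security-groups
my-security-group

Listing the actual rules still requires someone with console or API credentials (e.g. the EC2 DescribeSecurityGroups API call).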


Wednesday, June 14, 2017

Did my registrar screw up or is this how name server propagation works?



So my company has a number of domains with a large registrar that shall go unnamed. We are making some changes to our DNS infrastructure and the first of those is we are moving our secondary DNS from one server on site to four servers offsite. So we updated the name servers for each domain at the registrar by removing the entry for the old secondary name server and adding the four new ones. I monitored the old secondary server for requests and when I saw no new requests had been made for 24 hours I shut it down. That was this morning. I assumed at this point everything was good. Unfortunately this was my mistake. I should have gone and made sure name servers at large were returning the correct NS records.




So this afternoon we were performing maintenance on our primary DNS server and we shut it down. This is when I started getting alerts from our external monitoring. I checked and sure enough, the DNS server used there reported the only NS record for our primary domain was the primary name server. The new secondary servers were not listed and neither was the old secondary.



Is it unreasonable of me to have assumed that because the update was from



ns1.mydomain.com
ns2.mydomain.com


to




ns1.mydomain.com
ns1.backupdns.com
ns2.backupdns.com
ns3.backupdns.com
ns4.backupdns.com


in one step at the registrar that there should be no intermediate state where the only NS record was for ns1.mydomain.com?




Going forward, to be safe, I will obviously always leave the old name servers alone until I'm 100% sure the new ones have propagated, and only then remove them from the registrar. However, I'd still like to know if my registrar screwed up or if my expectation was unreasonable.


Answer




Is it unreasonable of me to have assumed that because the update was from <... trimmed ...>




YES.



Generally speaking, it is unreasonable for you to make ANY assumption about ANY change performed through control panel software (except the standard assumption that it's going to screw up somehow).
That includes DNS registrar management interfaces (which are usually pretty awful on the back-end).




The changes you made were probably processed as two separate transactions (one removing the old server, one adding the new ones), and someone got your DNS information after the first transaction, but before the second.






You got bit here because you kind-of Did It Wrong - though in a way that many of us do.
For the future, when decommissioning DNS servers / replacing them with new ones the safe workflow is:




  1. Build and deploy your new DNS servers. Verify they are functioning correctly.

  2. Add the new DNS servers to the registrar's list of name servers.

  3. Wait (until the change has been picked up on the internet at large.)
    TTL-Dependent, but usually 24-48 hours is a good rule.



    • At this point you should start to see queries on the new servers.


  4. Remove the old DNS server from the registrar's list of name servers.

  5. Wait again (until the change is picked up on the internet at large)
    You should stop seeing queries going to the decommissioned server.
    As in (3), 24-48 hours is a good rule to go with.

  6. Unplug the old server and dispose of it per your company's policies.



That workflow guarantees that the worst-case scenario is that someone will have an extra (lame) NS listed because they're using the "Step 2" information, but they will always have all your new secondaries, so they should always be able to find at least one working name server for your domain.
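
A handy way to take the guesswork out of the "wait" in steps 3 and 5 is to query the parent zone and a public resolver directly and see which NS records they are actually handing out (standard dig usage, not from the original answer; this assumes a .com domain, so substitute your own names):

# what the .com delegation currently hands out (NS records appear in the AUTHORITY section)
dig NS mydomain.com @a.gtld-servers.net

# what a public resolver currently has cached
dig +short NS mydomain.com @8.8.8.8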




You combined steps 2, 3, 4, and 5 into one step, and on the back end the removal (4) happened before the addition (2).
Chances are that would never have caused a problem except for your maintenance happening before everyone caught up with the "addition" part of the changes. It's a classic edge case and you landed on it.



Now you know, and knowing is 7/16ths of the battle.


domain name system - Is a wildcard CNAME DNS record valid?



I know it's valid to have a DNS A record that's a wildcard (e.g. *.mysite.com). Is it possible/valid/advised to have a wildcard CNAME record?


Answer



It is possible to do this. At one point it was up in the air a bit until RFC 4592 clarified that it should be supported.



Just because it is possible doesn't mean it is supported by all DNS providers. For example, GoDaddy won't let you set up a wildcard in a CNAME record.




In terms of whether it is advisable or not to do this, it depends on your usage. Usually CNAMEs are used for convenience when you are pointing to an "outside" domain name whose DNS you don't control.



For example, let's say you set up a CMS system that allows you to have *.mycms.com as the site name (it uses host headers). You want customers to be able to easily set up *.cms.customer.com, without worrying that you might change your IP address at some point. In that case, you could advise them to set up a wildcard CNAME called *.cms.customer.com to www.mycms.com.
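
In zone-file terms, the customer side of that example would look something like this (a sketch using the names from the paragraph above):

; in the customer.com zone
*.cms.customer.com.    IN    CNAME    www.mycms.com.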



Because wildcard CNAMES aren't supported by all providers (such as GoDaddy), I wouldn't advise using it in a case where you suggested it for various customers (where you don't know their provider's capabilities).


Tuesday, June 13, 2017

php - How to configure apache server with localhost virtual host

I am new to Apache.



I am editing httpd.conf to point a virtual host at a specific filesystem location.



Here is what I am trying:



127.0.0.1


DocumentRoot "/Users/MyUser/Documents/StoreFront"

Alias StoreFront /Users/MyUser/Documents/StoreFront


Options FollowSymLinks
AllowOverride None
Order deny,allow
Deny from all




I cannot restart the server after making these changes. I get the following errors:



/usr/sbin/apachectl: line 82: ulimit: open files: cannot modify limit: Invalid argument
Syntax error on line 161 of /private/etc/apache2/httpd.conf:
Invalid command '127.0.0.1', perhaps misspelled or defined by a module not included in the server configuration
httpd not running, trying to start



Can someone help me out?



Update



I have the following now:



This is what my hosts file looks like:



    # Host Database

#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
fe80::1%lo0 localhost



This is the virtual host I have in my httpd.conf:




DocumentRoot "/Users/Nick/Documents/StoreFront"
ServerName localhost

Options Indexes FollowSymLinks
AllowOverride All
Order allow,deny
Allow from all





After making these changes I am trying to restart the server with the following:




apachectl -k restart





Error:




/usr/sbin/apachectl: line 82: ulimit: open files: cannot modify limit: Invalid argument
httpd not running, trying to start
(13)Permission denied: make_sock: could not bind to address [::]:80
(13)Permission denied: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs
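
For what it's worth, this last set of errors no longer looks like a syntax problem: binding to port 80 requires root privileges, so the restart most likely just needs to be run with sudo (my reading of the log, not part of the original post):

sudo apachectl -k restart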


place php errors in log file

I am running Mac OS X 10.6.4 on an iMac and am using it as a development server.



I have Apache and Entropy PHP 5 installed. When I write my applications, some pages won't run when PHP has errors; however, these errors are not recorded in a log file. I created one, php_errors.log, and entered the following in the php.ini file:



error_log = /usr/local/php5/logs/php_errors.log


However, errors are not written to this file, and I have




log_errors = true


What could be the problem?

Monday, June 12, 2017

Ipv6 over bridged network stops working after some time (Ubuntu LXC)

I hope you can help me. I am running several Linux containers on a bare-metal server, which my provider supplies with a global IPv6 address as well as a /64 network.




I have configured the IPv6 settings within the config files of each container, using a bridge provided by the host machine. When I reboot the whole system I am able to access my containers via IPv6 and vice versa (e.g. ping6 google.de works). After some time (I don't know exactly how long) the containers are no longer reachable via their IPv6 addresses, and I don't know why (ping6 google.de also stops working). Does anyone have a clue what could cause this behavior?



Here are my configs:



network/interfaces (master)



iface br0 inet6 static
pre-up modprobe ipv6
address 2a02:xxxx:1:1::517:f79

gateway 2a02:xxxx:1:1::1
netmask 64
bridge_stp on



sysctl.conf (master)





net.ipv6.conf.default.autoconf=0
net.ipv6.conf.default.accept_ra=0

net.ipv6.conf.default.accept_ra_defrtr=0
net.ipv6.conf.default.accept_ra_rtr_pref=0
net.ipv6.conf.default.accept_ra_pinfo=0
net.ipv6.conf.default.accept_source_route=0
net.ipv6.conf.default.accept_redirects=0
net.ipv6.conf.default.forwarding=1
net.ipv6.conf.default.proxy_ndp=1
net.ipv6.conf.all.autoconf=0
net.ipv6.conf.all.accept_ra=0
net.ipv6.conf.all.accept_ra_defrtr=0

net.ipv6.conf.all.accept_ra_rtr_pref=0
net.ipv6.conf.all.accept_ra_pinfo=0
net.ipv6.conf.all.accept_source_route=0
net.ipv6.conf.all.accept_redirects=0
net.ipv6.conf.all.forwarding=1
net.ipv6.conf.all.proxy_ndp=1



network/interfaces (container)







auto lo
iface lo inet loopback



auto eth0
iface eth0 inet manual
iface eth0 inet6 manual





LXC-Container config






lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.hwaddr = 7e:7f:de:16:xx:xx
lxc.network.ipv4.gateway = 81.7.xx.1
lxc.network.ipv4 = 81.7.xx.xxx/24
lxc.network.ipv6 = 2a02:xxxx:a:77::123/64



I would be very thankful for any advice.



Best,
Patrick

Xenserver, iSCSI and Dell MD3600i



I have a functional XenServer 6.5 pool with two nodes. It is backed by an iSCSI share on a Dell MD3600i SAN, and this works fine. It was set up before my time.




We've added three more nodes to the pool. However these three new nodes will not connect to the storage.



Here's one of the original nodes, working fine:



[root@node1 ~]# iscsiadm -m session
tcp: [2] 10.19.3.11:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [3] 10.19.3.14:3260,2 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [4] 10.19.3.12:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [5] 10.19.3.13:3260,2 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)



Here's one of the new nodes. Notice the corruption in the address?



[root@vnode3 ~]# iscsiadm -m session
tcp: [1] []:-1,2 ▒Atcp: [2] 10.19.3.12:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [3] 10.19.3.11:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [4] 10.19.3.14:3260,2 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)



The missing IP address is .13 but another node is missing .12



Comments:



I have live running production VMs on the existing nodes and nowhere to move them, so rebooting the SAN is not an option.



Multipathing is disabled on the original nodes, despite the SAN having 4 interfaces. This seems suboptimal, so I've turned on multipathing on the new nodes.



The three new nodes have awfully high system loads. The original boxes have a load average of 0.5 to 1, while the three new nodes are sitting at about 11.1 with no VMs running. top shows no high-CPU processes, so it's something kernel-related? There are no processes locked in state D (uninterruptible sleep).




If I tell XenCenter to "repair" those Storage Repositories, it sits spinning its wheels for hours until I hit cancel. The message is "Plugging PBD for node5".



Question: How do I get my new XenServer pool members to see the pool storage and work as expected?



EDIT Further information




  • None of the new nodes will do a clean reboot either - they get wedged in "stopping iSCSI" on a reboot and I have to use the DRAC to remotely power-cycle them.

  • XenCenter is adamant that the nodes are in maintenance mode and that they haven't finished booting.




Good pool node:



[root@node1 ~]# multipath -ll
36f01faf000eaf7f90000076255c4a0f3 dm-36 DELL,MD36xxi
size=3.3T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=12 status=enabled
| |- 14:0:0:6 sdg 8:96 active ready running
| `- 15:0:0:6 sdi 8:128 active ready running
`-+- policy='round-robin 0' prio=11 status=enabled

|- 12:0:0:6 sdc 8:32 active ready running
`- 13:0:0:6 sdh 8:112 active ready running
36f01faf000eaf6fd0000098155ad077f dm-35 DELL,MD36xxi
size=917G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=14 status=enabled
| |- 12:0:0:5 sdb 8:16 active ready running
| `- 13:0:0:5 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=9 status=enabled
|- 14:0:0:5 sde 8:64 active ready running
`- 15:0:0:5 sdf 8:80 active ready running



Bad node



[root@vnode3 ~]# multipath
Dec 24 02:56:44 | 3614187703d4a1c001e0582691d5d6902: ignoring map
[root@vnode3 ~]# multipath -ll
[root@vnode3 ~]# (ie no response at all, exit code was 0)



Bad node



[root@vnode3 ~]# iscsiadm -m session
tcp: [1] []:-1,2 ▒Atcp: [2] 10.19.3.12:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [3] 10.19.3.11:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [4] 10.19.3.14:3260,2 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)

[root@vnode3 ~]# iscsiadm -m node --loginall=all
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb, portal: 10.19.3.13,3260] (multiple)

^C iscsiadm: caught SIGINT, exiting...


So it tries to log into an IP on the SAN, but spins its wheels for hours till I hit ^C.


Answer



For closure, there were multiple things wrong.




  1. The hosts were configured for a 1500 byte MTU, whereas the storage SAN was using 9216 byte MTU.

  2. One of the hosts had a subtly-different IQN from reality - the SAN listed the correct IQN as "unassigned" even though it was visually the same as the IQN in use.


  3. My original two nodes had management IPs configured on their on-board 1 Gbit card. The three new nodes had an acceptable management IP configured on the bonded interface, in a VLAN. This was too different and mostly stopped the new hosts from coming out of maintenance mode after a boot.



Multipath seemed to have no bearing on the problem at all.
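
To confirm an MTU mismatch like the one in item 1 without rebooting anything, you can check the interface MTU and send a do-not-fragment ping at jumbo size towards the SAN (generic Linux commands; the interface name is an example and the SAN IP is taken from the logs above, 8972 being 9000 minus the IP/ICMP headers):

# current MTU on the storage-facing interface (interface name will differ)
ip link show eth1 | grep mtu

# a jumbo frame must survive with DF set if jumbo frames work end to end
ping -M do -s 8972 -c 3 10.19.3.11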



Deleting and fiddling around with files in /var/lib/iscsi/* on the xenserver nodes had no impact on the problem.



I had to use other means to reboot these newer boxes too - they would wedge up trying to stop the iSCSI service.



And finally the corruption in the IQN name visible in iscsiadm -m session has vanished completely. This was possibly related to the MTU mismatch.




For future internet searchers - good luck!


linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...