December 2015

Thursday, December 31, 2015

networking - Troubleshooting (or tracing) Windows network logon problems

I've got a situation where a client computer (Windows Vista) doesn't seem to be sending the right password to a server (Windows Server 2003).

The event log records the logon failure, but as far as I can tell the client has the right password - so I'd really like to know what is actually being sent back & forth between the two computers as they try to negotiate the logon.

Is there any way to monitor/trace/examine a Windows logon session? (I assume a plain packet capture wouldn't work, since the passwords aren't sent in plain text - at least I hope not!)

MORE INFO: The server is the only server on the network. The computers are all on the same subnet, 192.168.1.xxx. The client computer is not a member of the domain. The server computer is the DNS server - and the client computer can correctly resolve the server's address without any problems.

The following events are logged in the event log:

A logon attempt by MICROSOFT_AUTHENTICATION_PACKAGE_V1_0, which fails with code 0xC0000234

A "logon failure" event which says "unknown user name or bad password."
- The user name specified in the event is the user name I'm using
- The "logon type" is "3"
- The logon process is "NtLmSsp"
- The authentication package is NTLM

All the client computer is trying to do is connect to a network share (mapping a network drive, actually).

Answer

There is more data to be gathered.

Does the user report problems with logging in, or are you just responding to the messages in the event log? Can you reproduce this yourself?

If the user isn't reporting problems, then it is quite possible that they are running a service under their user name with an expired password. Take a look at their local services (under Administrative Tools) and make sure that the "Log on As" field doesn't show their user name.

Also, ensure that the clocks are in sync. Kerberos doesn't work with a large time skew between two boxes.
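
One more data point from the question: the 0xC0000234 code in the first event is an NTSTATUS value, and as far as I know it corresponds to STATUS_ACCOUNT_LOCKED_OUT, which would point at a locked-out account rather than a mistyped password. Below is a minimal sketch of a lookup for a few well-known logon codes. The helper name describe_status and the table are illustrative assumptions, not part of any Windows API, and the list is far from complete.

```python
# Hypothetical helper: maps a few well-known NTSTATUS logon codes to names.
# The table is deliberately small and not exhaustive.
NTSTATUS_LOGON_CODES = {
    0xC0000064: ("STATUS_NO_SUCH_USER", "the user name does not exist"),
    0xC000006A: ("STATUS_WRONG_PASSWORD", "the password is wrong"),
    0xC000006D: ("STATUS_LOGON_FAILURE", "generic bad user name or password"),
    0xC0000071: ("STATUS_PASSWORD_EXPIRED", "the password has expired"),
    0xC0000072: ("STATUS_ACCOUNT_DISABLED", "the account is disabled"),
    0xC0000133: ("STATUS_TIME_DIFFERENCE_AT_DC", "clocks are too far apart"),
    0xC0000234: ("STATUS_ACCOUNT_LOCKED_OUT", "the account is locked out"),
}


def describe_status(code):
    """Return a readable description for an NTSTATUS code (int or hex string)."""
    if isinstance(code, str):
        code = int(code, 16)  # accepts "0xC0000234" as well as "C0000234"
    name, meaning = NTSTATUS_LOGON_CODES.get(code, ("UNKNOWN", "not in this table"))
    return f"0x{code:08X} {name}: {meaning}"


print(describe_status("0xC0000234"))
```

If the account really is locked out, unlocking it on the server and then retrying with a freshly typed password would be the first thing I'd try.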

sql server - Can't see second azure sql database in SSMS object explorer

I have created a second database on my existing Azure SQL server. The first database works fine and I can see it using SSMS.

I cannot see the second database in the object explorer. Autocomplete detects that it exists however.

Any suggestions?

Answer

So for me, it turned out that when I was connecting using SSMS, I had set the database to connect to as my first database by accident - meaning that was all I could see.

On connect, go to Options, then check which database you're connecting to. If you connect to that server as multiple users like I did, check each one: my admin user also ended up being forced to connect to a single database instead of < default >.

spam - EXIM SMTP allows to send mails without login / authentication via telnet to any domain

I'm ashamed, but I have to ask for help. My server is being used to send spam. I've found that I can simply connect with telnet (edit: from any server in the office, from home, and even directly from CMD/PuTTY telnet), issue mail from/rcpt to/data without any login or authorization, and send mail from my domain to any external mailbox (for example, Gmail accounts). I'm using Exim/SMTP/CSF on Debian and have only basic knowledge of them.

root@vps:~# telnet example.com 25
Trying 19x.10x.8x.1xx...
Connected to example.com.
Escape character is '^]'.
220 serwer.example.com.pl ESMTP Exim 4.91 Wed, 19 Sep 2018 10:48:05 +0200
mail from: xyz@example.com
250 OK
rcpt to: outerbox@gmail.com
250 Accepted
data
354 Enter message, ending with "." on a line by itself
test data.
.
250 OK id=1g2Y9t-0003yu-Of

I want to prevent this and require some form of authentication, so that spam can't be sent from my server to external mailboxes. When I try the same thing on my second server, it answers the "rcpt to:" command with "550 authentication required". I think that's the proper behaviour, since it stops unauthenticated senders.

In my exim.conf the relay parameters are empty (I've tried putting in my server's IP or the localhost address, without luck):

addresslist whitelist_senders = lsearch;/etc/virtual/whitelist_senders
addresslist blacklist_senders = lsearch;/etc/virtual/blacklist_senders
domainlist blacklist_domains = lsearch;/etc/virtual/blacklist_domains
domainlist whitelist_domains = lsearch;/etc/virtual/whitelist_domains

domainlist local_domains = lsearch;/etc/virtual/domains
domainlist relay_domains = 
domainlist use_rbl_domains = lsearch;/etc/virtual/use_rbl_domains
hostlist auth_relay_hosts = 
hostlist bad_sender_hosts = lsearch;/etc/virtual/bad_sender_hosts
hostlist bad_sender_hosts_ip = net-lsearch;/etc/virtual/bad_sender_hosts
hostlist relay_hosts = 
hostlist whitelist_hosts = lsearch;/etc/virtual/whitelist_hosts
hostlist whitelist_hosts_ip = net-lsearch;/etc/virtual/whitelist_hosts

Authentication section

begin authenticators

plain:
    driver = plaintext
    public_name = PLAIN
    server_prompts = :
    server_condition = "${perl{smtpauth}}"
    server_set_id = $2

login:
    driver = plaintext
    public_name = LOGIN
    server_prompts = "Username:: : Password::"
    server_condition = "${perl{smtpauth}}"
    server_set_id = $1

How can I protect my SMTP socket? How can I force the "authentication required" behaviour? I tried comparing the .conf files with my second server's, but after two days of trying I'm out of luck.
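
The telnet test above can be repeated from a script while the configuration is being changed. The following is a minimal sketch using Python's standard smtplib. The function names are my own, and the probe stops after RCPT TO and resets the session, so no message is actually sent. It should be run from a host that is not whitelisted on the server, otherwise the result says nothing about outside senders.

```python
import smtplib


def classify_rcpt_reply(code):
    """Interpret the reply to an unauthenticated RCPT TO for an external address."""
    if 200 <= code < 300:
        return "open-relay"    # accepted without AUTH, like the 250 in the transcript
    if 500 <= code < 600:
        return "rejected"      # for example: 550 authentication required
    return "inconclusive"      # 4xx temporary failures and anything unexpected


def probe_relay(host, sender, recipient, port=25, timeout=10):
    """Replay the telnet session up to RCPT TO, then reset without sending mail."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        smtp.mail(sender)
        code, _message = smtp.rcpt(recipient)
        smtp.rset()            # abandon the transaction before DATA
        return classify_rcpt_reply(code)
```

For example, probe_relay("example.com", "xyz@example.com", "outerbox@gmail.com") returns "open-relay" for the session shown above, and should return "rejected" once the server demands authentication.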

Nginx + PHP5-FPM repeated cut outs 502

I've seen a number of questions here that highlight random 502 (Nginx + PHP-FPM = "Random" 502 Bad Gateway) and similar time outs when using Nginx + PHP-FPM.

Even with all the questions, I'm still unable to find a solution.

I'm using Ubuntu 10.10 + Nginx + PHP5-FPM + APC, and 1 out of every 4 requests ends in a timeout and failure. This isn't a load or traffic issue; it happens even in a dev environment with one person.

I am doing this across 3 1GB machines, each with the same configurations and same problems.

fastcgi_params

fastcgi_param  QUERY_STRING       $query_string;
fastcgi_param  REQUEST_METHOD     $request_method;
fastcgi_param  CONTENT_TYPE       $content_type;
fastcgi_param  CONTENT_LENGTH     $content_length;


fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
fastcgi_param  REQUEST_URI        $request_uri;
fastcgi_param  DOCUMENT_URI       $document_uri;
fastcgi_param  DOCUMENT_ROOT      $document_root;
fastcgi_param  SERVER_PROTOCOL    $server_protocol;

fastcgi_param  GATEWAY_INTERFACE  CGI/1.1;
fastcgi_param  SERVER_SOFTWARE    nginx/$nginx_version;

fastcgi_param  REMOTE_ADDR        $remote_addr;

fastcgi_param  REMOTE_PORT        $remote_port;
fastcgi_param  SERVER_ADDR        $server_addr;
fastcgi_param  SERVER_PORT        $server_port;
fastcgi_param  SERVER_NAME        $server_name;

fastcgi_param  REDIRECT_STATUS    200;

/etc/php5/fpm/main.conf

; FPM Configuration ;

;include=/etc/php5/fpm/*.conf

; Global Options ;

pid = /var/run/php5-fpm.pid

error_log = /var/log/php5-fpm.log


;log_level = notice

;emergency_restart_threshold = 0

;emergency_restart_interval = 0

;process_control_timeout = 0

;daemonize = yes


; Pool Definitions ; 

include=/etc/php5/fpm/pool.d/*.conf

/etc/php5/fpm/pool.d/www.conf

[www]
listen = 127.0.0.1:9000


;listen.backlog = -1
;listen.allowed_clients = 127.0.0.1
;listen.owner = www-data
;listen.group = www-data
;listen.mode = 0666

user = www-data
group = www-data

;pm.max_children = 50

pm.max_children = 15
;pm.start_servers = 20
pm.min_spare_servers = 5
;pm.max_spare_servers = 35
pm.max_spare_servers = 10
;pm.max_requests = 500
;pm.status_path = /status

;ping.path = /ping
;ping.response = pong


request_terminate_timeout = 30
;request_slowlog_timeout = 0

;slowlog = /var/log/php-fpm.log.slow

;rlimit_files = 1024
;rlimit_core = 0

;chroot = 


chdir = /var/www

;catch_workers_output = yes
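
A quick sanity check of the process-manager numbers in this pool can rule out one class of misconfiguration, though it does not explain the 502s by itself. The sketch below assumes the relationships described in the comments of the stock www.conf, as I understand them: pm.start_servers defaults to min_spare_servers + (max_spare_servers - min_spare_servers) / 2, it must lie between the two spare limits, and max_spare_servers must not exceed max_children. The function names are hypothetical.

```python
def default_start_servers(min_spare, max_spare):
    """Default documented in the stock www.conf comments (integer arithmetic)."""
    return min_spare + (max_spare - min_spare) // 2


def check_pool(max_children, min_spare, max_spare, start_servers=None):
    """Return the effective start_servers and a list of inconsistencies found."""
    if start_servers is None:
        start_servers = default_start_servers(min_spare, max_spare)
    problems = []
    if not min_spare <= start_servers <= max_spare:
        problems.append("start_servers must lie between min and max spare servers")
    if max_spare > max_children:
        problems.append("max_spare_servers must not exceed max_children")
    return start_servers, problems


# Values from the pool above: max_children=15, min_spare=5, max_spare=10.
print(check_pool(15, 5, 10))
```

With the values from this pool the check finds no inconsistency, so the cause is more likely elsewhere, for example in how long individual requests run compared with request_terminate_timeout = 30.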

Samba server - CIFS mount issues

Background:

I have a samba cifs server. It is not joined to a domain, but has a keytab for an MIT kerberosV realm.

Kerberized mounts (e.g. mount -t cifs //cifs.example.com/groups /mnt/cifs -o sec=krb5i) work from Linux clients.
Kerberized mounts from AD-joined Windows machines (joined to a domain configured with a trust to the Kerberos realm) also work. Password-based mounts don't work for Linux clients (not a big deal).

Password based mounts for non AD joined Windows clients kind of work. Using explorer to go to \\cifs.example.com\groups will not work, and no password prompt will appear. However, if \\cifs.example.com\groups is mounted as a letter drive, the dialog will not complete, but the drive mapping will be established and work, and the dialog box can be canceled at this point while retaining the mount.

Question:

How can I make the UNC path prompt for a password on non-AD-joined Windows machines?

Configs:

hostname: cifs.example.com
realm: EXAMPLE.COM
distro: CentOS release 6.5 (Final)
samba version: samba-3.6.9-167.el6_5.x86_64

smb.conf

syslog only = yes
syslog = 3


server string = %h server (Samba, CentOS)
workgroup = EXAMPLE.COM
security = ads
realm = EXAMPLE.COM
create krb5 conf = no
kerberos method = secrets and keytab
server signing = auto
smb encrypt = auto
smb ports = 445
use sendfile = yes


map to guest = Bad User
guest account = nobody

wins support = no
dns proxy = no

load printers = no
printing = bsd
printcap name = /dev/null

disable spoolss = yes

hide files = /Desktop.ini/$RECYCLE.BIN/Thumbs.db/~$.*/

[home]
path = /export/home/
writeable = yes
guest ok = no
browseable = no
create mask = 0600

directory mask = 0700

[groups]
path = /export/groups
writeable = yes
guest ok = no
browseable = yes
create mask = 0660
directory mask = 0770

klist -k

Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   8 host/cifs.example.com@EXAMPLE.COM
   8 host/cifs.example.com@EXAMPLE.COM
   8 host/cifs.example.com@EXAMPLE.COM
   8 host/cifs.example.com@EXAMPLE.COM
   8 cifs/cifs.example.com@EXAMPLE.COM
   8 cifs/cifs.example.com@EXAMPLE.COM
   8 cifs/cifs.example.com@EXAMPLE.COM
   8 cifs/cifs.example.com@EXAMPLE.COM

getsebool -a | grep -e cifs -e samba

allow_ftpd_use_cifs --> off
cobbler_use_cifs --> off
git_cgi_use_cifs --> off
git_system_use_cifs --> off
httpd_use_cifs --> off
qemu_use_cifs --> on
rsync_use_cifs --> off
samba_create_home_dirs --> off
samba_domain_controller --> off
samba_enable_home_dirs --> off

samba_export_all_ro --> off
samba_export_all_rw --> off
samba_portmapper --> off
samba_run_unconfined --> off
samba_share_fusefs --> off
samba_share_nfs --> off
sanlock_use_samba --> off
tftp_use_cifs --> off
use_samba_home_dirs --> off
virt_use_samba --> off

/etc/pam.d/samba

#%PAM-1.0
auth       required pam_nologin.so
auth       include  password-auth
account    include  password-auth
session    include  password-auth
password   include  password-auth

/etc/pam.d/password-auth

#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 500 quiet

auth        sufficient    pam_sss.so use_first_pass
auth        required      pam_deny.so

account     required      pam_unix.so
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 500 quiet
account     [default=bad success=ok user_unknown=ignore] pam_sss.so
account     required      pam_permit.so

password    requisite     pam_cracklib.so try_first_pass retry=3 type=

password    sufficient    pam_unix.so sha512 shadow nullok try_first_pass use_authtok
password    sufficient    pam_sss.so use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so
session     optional      pam_sss.so

Answer

I needed to raise max protocol from its default of NT1 to SMB2 (max protocol = SMB2).

Wednesday, December 30, 2015

Move Azure SQL database server between subscriptions

We have two subscriptions (DEV and PROD). A database server has been deployed to the DEV subscription, but it is now in use by our production system. I want to move the Azure SQL database server to the PROD environment.

I have owner privileges on both the DEV and PROD subscriptions, but when I try to move the Azure SQL server, the PROD subscription doesn't show up in the list of subscriptions. This feature was implemented in 2012 and I do see the 'subscriptions' combo box, so I think it should be possible. Does anyone have a hint?

How to tell rsync not to check permissions

I have two directories, each containing the same PHP application. I want to run rsync -rvz from one (the source) to the other (the destination), so that rsync copies only the changed files. The problem is that the files in the source directory have 755 permissions by mistake. The permissions in the destination are fine.

How can I tell rsync to ignore permissions and compare files by size only?
Thanks.

Answer

As long as you don't supply the -p flag, permissions shouldn't be changed. If you're still not getting the permissions you expect, make sure --perms is off and use --chmod=ugo=rwX.

For reference:
http://linux.die.net/man/1/rsync

 -r, --recursive             recurse into directories

 -v, --verbose               increase verbosity
 -z, --compress              compress file data during the transfer
 -n, --dry-run               perform a trial run with no changes made

 -p, --perms                 preserve permissions
 -E, --executability         preserve executability
     --chmod=CHMOD           affect file and/or directory permissions

-p, --perms This option causes the receiving rsync to set the destination permissions to be the same as the source permissions. (See
also the --chmod option for a way to modify what rsync considers to be
the source permissions.)

The man page goes on to say:

In summary: to give destination files (both old and new) the source
permissions, use --perms. To give new files the destination-default
permissions (while leaving existing files unchanged), make sure that
the --perms option is off and use --chmod=ugo=rwX (which ensures that
all non-masked bits get enabled).

Side note: if possible, it might make more sense for your developers to push their changes back into the repo and have all servers deploy from that code repo, rather than using rsync to ship files from one server to the other.
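
Putting the options discussed above together, here is a minimal sketch that only assembles the command line, so it can be reviewed or run with -n first. The function name and the example paths are hypothetical. --no-perms and --size-only are not in the excerpt quoted above, but both are standard rsync options. --size-only is included only because the question asks for comparison by size, and it will miss edits that leave a file's size unchanged.

```python
import shlex


def rsync_command(src, dst, size_only=False, dry_run=False):
    """Build an rsync argument list that leaves destination permissions alone."""
    args = ["rsync", "-rvz", "--no-perms", "--chmod=ugo=rwX"]
    if size_only:
        args.append("--size-only")   # skip files whose size already matches
    if dry_run:
        args.append("-n")            # trial run, no changes made
    args += [src, dst]
    return args


# Hypothetical paths, shown as a shell-ready string for review.
print(shlex.join(rsync_command("/var/www/app-src/", "/var/www/app-dst/",
                               size_only=True, dry_run=True)))
```

The returned list can be passed straight to subprocess.run once the dry run shows the expected file list.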

email - Postfix - Before Queue or After?

I want to relay my e-mails, but I want the Mail-Relay to check the relayed mails for spam.
I am using Debian 6.0, Postfix 2.7.1 and Amavis.

Headers of the messages look like this at the moment - it looks like amavis is handling the message but does not scan it =(:

X-spam-status: No, score=0 tagged_above=-999 required=6.31
    tests=[none] autolearn=ham
Received: from mx.domain.org ([127.0.0.1])
    by localhost (mx.domain.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id c1bRJ7muUUN0 for ; Mon, 7 Mar 2011 17:42:35

Any suggestions?

linux - Network Restructure Method for Double-NAT network

Due to a series of poor network design decisions (mostly) made many years ago in order to save a few bucks here and there, I have a network that is decidedly sub-optimally architected. I'm looking for suggestions to improve this less-than-pleasant situation.

We're a non-profit with a Linux-based IT department and a limited budget. (Note: none of the Windows equipment we run does anything that talks to the Internet, nor do we have any Windows admins on staff.)

Key points:

We have a main office and about 12 remote sites that essentially double NAT their subnets with physically-segregated switches. (No VLANing and limited ability to do so with current switches)

These locations have a "DMZ" subnet that is NAT'd on an identically assigned 10.0.0/24 subnet at each site. These subnets cannot talk to DMZs at any other location because we don't route them anywhere except between server and adjacent "firewall".

Some of these locations have multiple ISP connections (T1, Cable, and/or DSLs) that we manually route using IP Tools in Linux. These firewalls all run on the (10.0.0/24) network and are mostly "pro-sumer" grade firewalls (Linksys, Netgear, etc.) or ISP-provided DSL modems.

Connected to these firewalls (via simple unmanaged switches) are one or more servers that must be publicly accessible.

Connected to the main office's 10.0.0/24 subnet are servers for email, the tele-commuter VPN, the remote office VPN, and the primary router to the internal 192.168/24 subnets. These have to be accessed from specific ISP connections based on traffic type and connection source.

All our routing is done manually or with OpenVPN route statements

Inter-office traffic goes through the OpenVPN service in the main 'Router' server, which has its own NAT'ing involved.

Remote sites have only one server each and cannot afford more due to budget constraints. These servers are all LTSP servers serving 5-20 terminals each.

The 192.168.2/24 and 192.168.3/24 subnets are mostly but NOT entirely on Cisco 2960 switches that can do VLAN. The remainder are DLink DGS-1248 switches that I am not sure I trust well enough to use with VLANs. There is also some remaining internal concern about VLANs since only the senior networking staff person understands how it works.

All regular internet traffic goes through the CentOS 5 router server, which in turn NATs the 192.168/24 subnets to the 10.0.0.0/24 subnets according to the manually-configured routing rules that we use to point outbound traffic to the proper internet connection based on '-host' routing statements.

I want to simplify this and ready All Of The Things for ESXi virtualization, including these public-facing services. Is there a no- or low-cost solution that would get rid of the Double-NAT and restore a little sanity to this mess so that my future replacement doesn't hunt me down?

Basic diagram for the main office: (image not reproduced)

These are my goals:

Public-facing Servers with interfaces on that middle 10.0.0/24 network to be moved in to 192.168.2/24 subnet on ESXi servers.

Get rid of the double NAT and get our entire network on one single subnet. My understanding is that this is something we'll need to do under IPv6 anyway, but I think this mess is standing in the way.

Answer

1.) Before basically anything else get your IP addressing plan straightened out. It's painful to renumber but it's the necessary step to arrive at a workable infrastructure. Set aside comfortably large, easily summarized supernets for workstations, servers, remote sites (with unique IP's, naturally), management networks, loopbacks, etc. There's a lot of RFC1918 space and the price is right.

2.) It's hard to get a sense of how to lay out L2 in your network based on the diagram above. VLAN's may not be necessary if you've got sufficient numbers of interfaces in your various gateways as well as sufficient numbers of switches. Once you've got a sense of #1 it might make sense to reapproach the L2 question separately. That said, VLAN's aren't an especially complex or novel set of technologies and needn't be that complicated. A certain amount of basic training is in order, but at a minimum the ability to separate a standard switch into several groups of ports (i.e. without trunking) can save a lot of money.

3.) The DMZ hosts should probably be placed onto their own L2/L3 networks, not merged in with workstations. Ideally you'd have your border routers connected to a L3 device (another set of routers? L3 switch?) which, in turn, would connect a network containing your externally facing server interfaces (SMTP host, etc). These hosts would likely connect back to a distinct network or (less optimally) to a common server subnet. If you've laid out your subnets appropriately then the static routes required to direct inbound traffic should be very simple.

3a.) Try to keep the VPN networks separate from other inbound services. This will make things easier as far as security monitoring, troubleshooting, accounting, etc.

4.) Short of consolidating your Internet connections and/or routing a single subnet via several carriers (read: BGP) you'll need the intermediate hop before your border routers to be able to redirect in- and out- bound traffic appropriately (as I suspect you're doing at the moment). This seems like a bigger headache than VLAN's, but I suppose it's all relative.
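
To make point 1 concrete, here is a minimal sketch of an addressing plan built with Python's standard ipaddress module. The sizes are assumptions chosen for illustration, not a recommendation for this particular network: one /16 per site carved from 10.0.0.0/8, and one /20 per role inside each site. The site names, role names, and the address_plan helper are hypothetical. The first /16 is skipped so the new plan cannot collide with the existing 10.0.0/24 DMZ subnets during migration.

```python
import ipaddress
from itertools import islice

ORG_SUPERNET = ipaddress.ip_network("10.0.0.0/8")
ROLES = ("workstations", "servers", "dmz", "management")


def address_plan(site_names):
    """Give each site one summarizable /16 and each role a /20 inside it."""
    # Skip 10.0.0.0/16 so the legacy 10.0.0/24 DMZs never overlap the new plan.
    site_blocks = islice(ORG_SUPERNET.subnets(new_prefix=16), 1, None)
    plan = {}
    for name, block in zip(site_names, site_blocks):
        roles = dict(zip(ROLES, block.subnets(new_prefix=20)))
        plan[name] = {"summary": block, "roles": roles}
    return plan


plan = address_plan(["main-office", "remote-01", "remote-02"])
for site, entry in plan.items():
    print(site, entry["summary"], "dmz:", entry["roles"]["dmz"])
```

Because every site is a single /16, the route to a whole remote site is one statement on the VPN router, and every DMZ is unique, so the sites' DMZs can finally be routed to each other if that is ever needed.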

Tuesday, December 29, 2015

iis - FTP directory restrictions via IIS6

We are trying to set up FTP on our stand-alone dedicated Windows Server 2003 (Standard, 32-bit, SP2) machine running IIS6. We are NOT using AD.

It is NOT set up for user isolation, as I need the "administrator" account to be able to access any part of the D: drive (which the FTP site has set up as its root).

I want to be able to restrict a single user account (created on the local box) to only be able to access a particular sub-sub-directory structure on the drive. I do not want to allow this user to read/write/navigate to any other part of the D: drive. If necessary I can accept directory listings, but certainly nothing more than that.

In IIS6 I have created a virtual directory using the username (as the user mentioned above) as the alias - logging into FTP using the credentials puts them straight into the directory, which is correct and what I'm after. But I cannot find any way of blocking them from navigating outside of "their" structure.

I have tried Denying them permission at the root of the D: drive, but of course the Deny overrides any attempt to Allow them permission in "their" directory.

I have also tried creating a group, so that should I need to I can add other users into this group and they will also be denied access to anything that isn't their directory structure.

As you might have gathered, I'm not a Network Admin by trade, so please be gentle!

Answer

Thanks to Bad Dos, who pointed me roughly in the right direction, I have figured out a way of setting up the permissions that works for us. As he says, our current FTP setup is not ideal, and we will look at changing it in the near future.

In the meantime, I have done the following things:

Created a group (called "AllFtpUsers") and allocated the new user (mentioned above) into it.

Removed both the "Users" group and "Authenticated Users" group from the root of the D: drive

Added the "AllFtpUsers" group to the root of the D: drive, with just "List Folder Contents" allowed.

All directories in the root of the D: drive have had "List Folder Contents" denied for the "AllFtpUsers" group (meaning its members can see but not alter files in the root directory, and cannot navigate into any of the sub-directories).

Set the permissions on the "sub-sub-directory" (that IIS has the virtual directory set to) to "Modify" control

This means that we still have full control with our normal login, but the new user can only change their sub-sub-directory (despite also being able to see all files in the root).

The reason for removing the "Users" and "Authenticated Users" groups from the root was that as soon as the new user logged in, they were given all the permissions of those groups. We still have access to the files/directories as normal through the "Administrators" group.

Monday, December 28, 2015

routing - linux route 2nd internal network

In my network configuration, I have three switches:

1. Internet (xx.xx.140.129/25)

2. Internal SAN (10.1.1.0/24)

3. iLo management (10.1.30.0/24)

I have one Linux server which I use for management that needs to have access to all three networks, however it only has 2 NICs. I've cabled switches #2 and #3 together, so there is a physical path between them, and I've tried ip route add 10.1.1.0/24 eth0, but that did not work. Any ideas on how this could be done?



[root@ilo]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
XX.XX.140.128   0.0.0.0         255.255.255.128 U     0      0        0 eth1
10.1.30.0       0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth1
0.0.0.0         XX.XX.140.129   0.0.0.0         UG    0      0        0 eth1
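
For what it's worth, the table above can be checked with a small longest-prefix-match sketch using Python's ipaddress module. It only models which route the kernel would pick. It says nothing about ARP or about which source address is used, and the redacted public /25 is left out because its address is masked. The select_route helper is hypothetical.

```python
import ipaddress

# Routes copied from the table above (the masked public /25 is omitted).
ROUTES = [
    ("10.1.30.0/24", "eth0"),
    ("10.1.1.0/24", "eth0"),
    ("169.254.0.0/16", "eth1"),
    ("0.0.0.0/0", "eth1"),      # default route
]


def select_route(destination):
    """Pick the matching route with the longest prefix, as the kernel does."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(net), dev)
        for net, dev in ROUTES
        if dest in ipaddress.ip_network(net)
    ]
    network, dev = max(matches, key=lambda match: match[0].prefixlen)
    return str(network), dev


print(select_route("10.1.1.20"))   # a SAN address
print(select_route("10.1.30.7"))   # an iLO address
```

According to this table, traffic for 10.1.1.0/24 already leaves through eth0, so route selection itself may not be the problem. One possibility worth checking (an assumption on my part) is whether eth0 has an address inside 10.1.1.0/24 for the SAN hosts to reply to.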

ext4 - How does SSD meta-data corruption on power-loss happen? And can I minimize it?

Note: This is a follow-up question to "Is there a way to protect SSD from corruption due to power loss?". I got good info there, but it basically centered on three areas: "get a UPS", "get better drives", or how to deal with Postgres reliability.

But what I really want to know is whether there is anything I can do to protect the SSD against meta-data corruption, especially in old writes. To recap the problem: it's an ext4 filesystem on Kingston consumer-grade SSDs with write-cache enabled, and we're seeing these kinds of problems:

files with the wrong permissions

files that have become directories (for example, toggle.wav is now a directory with files in it)

directories that have become files (not sure of content..)

files with scrambled data

The problem is less with these things happening on data that's being written while the drive goes down, or shortly before. It's a problem but it's expected and I can handle that in other ways.

The bigger surprise and problem is that there is meta-data corruption happening on the disk in areas that were not recently written to (ie, a week or more before).

I'm trying to understand how such a thing can happen at the disk/controller level. What's going on? Does the SSD periodically "rebalance" and move blocks around, even though I'm writing somewhere else? Like this:

And then there is a power loss when D is being rewritten. There may be pieces left on block 1 and some on block 2. But I don't know if it works this way. Or maybe there is something else happening..?

In summary - I'd like to understand how this can happen and if there is anything I can do to mitigate the problem at the OS level.

Note: "get better SSDs" or "use a UPS" are not valid answers here - we are trying to move in that direction but I have to live with the reality on the ground and find the best outcome with what we have now. If there is no solution with these disks and without a UPS, then I guess that's the answer.

References:

Is post-sudden-power-loss filesystem corruption on an SSD drive's ext3 partition "expected behavior"?
This is similar but it's not clear if he was experiencing the kinds of problems we are.

EDIT:
I've also been reading about issues with ext4 around power loss. Our filesystems are journaled, but I don't know about anything else.

Prevent data corruption on ext4/Linux drive on power loss

http://www.pointsoftware.ch/en/4-ext4-vs-ext3-filesystem-and-why-delayed-allocation-is-bad/

Answer

For how metadata corruption can happen after an unexpected power failure, take a look at my other answer here.

Disabling the write cache can significantly reduce the likelihood of in-flight data loss; however, depending on your SSDs, data at rest remains at risk of being corrupted. Moreover, it incurs a massive performance loss (I have seen 500+ MB/s SSDs write at a mere 5 MB/s after disabling the private DRAM cache).

If you can't trust your SSDs, the only "solution" (or, rather, workaround) is to use an end-to-end checksumming filesystem such as ZFS or Btrfs with a RAID1/mirror setup: that way, any single-device (meta)data corruption can be recovered from the other side of the mirror by running a check/scrub.
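
To illustrate the idea behind checksum-plus-mirror recovery, here is a toy model in Python. It is not how ZFS or Btrfs are implemented: the class and method names are made up, the two "devices" are plain dictionaries, and the checksum table is simply assumed to be trustworthy, whereas real filesystems protect it with redundant, checksummed metadata of its own.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class MirroredStore:
    """Toy two-way mirror that records a checksum for every block written."""

    def __init__(self):
        self.sides = ({}, {})   # two "devices": block id -> bytes
        self.sums = {}          # block id -> checksum recorded at write time

    def write(self, block_id, data):
        for side in self.sides:
            side[block_id] = data
        self.sums[block_id] = checksum(data)

    def scrub(self):
        """Verify every copy and repair bad ones from a good mirror side."""
        repaired = []
        for block_id, expected in self.sums.items():
            good = [s[block_id] for s in self.sides if checksum(s[block_id]) == expected]
            if not good:
                raise IOError(f"block {block_id}: both copies are corrupt")
            for index, side in enumerate(self.sides):
                if checksum(side[block_id]) != expected:
                    side[block_id] = good[0]
                    repaired.append((block_id, index))
        return repaired


store = MirroredStore()
store.write(7, b"directory entry for toggle.wav")
store.sides[1][7] = b"scrambled after power loss"   # simulate silent corruption
print(store.scrub())
```

The point of the model is that a plain RAID1 without checksums cannot tell which of two differing copies is the good one, while a checksumming mirror can.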

git - etckeeper pushing to github

I set up etckeeper and added the file /etc/etckeeper/commit.d/60github-push in order to push the commit to github.

[orschiro@thinkpad etc]$ sudo cat /etc/etckeeper/commit.d/60github-push 
#!/bin/sh 
set -e
if [ "$VCS" = git ] && [ -d .git ]; then   
  cd /etc/   
  git push origin master 
fi

However, pushing to github fails as etckeeper tries to push as root. Should the use of sudo not preserve my user account settings for git, including my ~/.ssh keys?

[orschiro@thinkpad etc]$ sudo etckeeper commit "test"
[master de5971c] test
 Author: orschiro 
 3 files changed, 2 insertions(+), 1 deletion(-)
 rename etckeeper/{ => commit.d}/60github-push (100%)
 create mode 100644 test
no such identity: /root/.ssh/id_rsa: No such file or directory
no such identity: /root/.ssh/id_dsa: No such file or directory
no such identity: /root/.ssh/id_ecdsa: No such file or directory
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Answer

To keep using your own SSH keys when the command runs as root, use sudo -E, which preserves your environment.

That way there's no need to add anything to root's SSH config.

mac osx - Can I specify a port in an entry in my /etc/hosts on OS X?

I want to trick my browser into going to localhost:3000 instead of xyz.com. I went into /etc/hosts on OS X 10.5 and added the following entry:

127.0.0.1:3000 xyz.com

That does not work but without specifying the port the trick works. Is there a way to do this specifying the port?

Answer

No. The hosts file is simply a way to statically map host names to IP addresses without consulting a DNS server; it has no notion of ports.
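
A small sketch makes the format restriction visible: each hosts entry is an IP address followed by one or more names, and "127.0.0.1:3000" is simply not an IP address. The parser below is a hypothetical illustration written with Python's ipaddress module, not the code the OS X resolver uses.

```python
import ipaddress


def parse_hosts_line(line):
    """Parse one hosts-file line into (address, [names]); None for blank lines."""
    fields = line.split("#", 1)[0].split()   # drop comments, split on whitespace
    if len(fields) < 2:
        return None
    address, names = fields[0], fields[1:]
    try:
        ipaddress.ip_address(address)
    except ValueError:
        raise ValueError(
            f"{address!r} is not an IP address; hosts entries cannot carry a port"
        ) from None
    return address, names


print(parse_hosts_line("127.0.0.1 xyz.com"))
```

Calling parse_hosts_line("127.0.0.1:3000 xyz.com") raises ValueError, which mirrors why the entry from the question is ignored.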

Sunday, December 27, 2015

ssl - Redirect users connecting with SSLv3 within nginx

I was looking to drop all support for SSLv3 due to POODLE, but found that some people are still coming from old browsers such as IE on Windows XP.

How do I detect these SSLv3-only users from within nginx, and redirect them to some helper page with further instructions?

I definitely don't want workarounds that let these users keep using insecure browsers.

And I'll be especially happy if I could do the same thing to all non-SNI browsers: SSLv3 doesn't come with SNI, so if I could redirect non-SNI browsers, it'll solve SSLv3 problem too.

Answer

Putting aside the issue of leaving SSLv3 enabled, you can simply instruct nginx to redirect based on whether the SSLv3 protocol is being used:

if ($ssl_protocol = SSLv3) {
  rewrite ^ /poodle-doodle.html;
}

You can test this from a shell:

$ wget --secure-protocol=SSLv3 -O - $SERVER_URL

# or
$ curl -v -3 $SERVER_URL

debian - Advice on most fitting choice of virtualization platform: Xen | OpenVZ | KVM | other? What suits the purpose best?

I am aware of the multitude of virtualization threads here but they all seem to be oldish and i'd like to have a brand new one ready for 2012.

I'm currently evaluating Xen, OpenVZ and KVM for virtualization purposes.
I'm having trouble deciding what to use.
The OS this will be running off of is Debian, preferably. The guest OS's will all be *nix based, mostly Debian as well. No windows, macos or other exotic stuff required.
I have a single server, which has 16gigs of ram and a xeon processor on it. I also have a software raid 1 disk configuration with 3tb raid capacity.

I am setting up this environment to separate the sites my current server hosts by level of trust, and software version.
For example, there are some sites i know might have security holes, others which should be perfectly secure, and others that require an archaic version of PHP.

All in all, i'd like to set up 3 different guests: one for trusted, one for untrusted, one for old php.

Part of my problem is managing backups properly:
I enjoy using Bacula or duplicity to manage my backups because of incremental, encrypted backups.
I do not want any of my client sites to ever have to go offline due to backup processes.
I also only have 100 gigs of remote off-site backup space, so i want to use that wisely, and not just dump all i have up there. Restoring from backups should be fast [no downloading huge iso files!].

I also want to do the disk space allocation right.
I've read marvelous things about LVM and how it makes ones life easier.
Assuming a raid 1 [two 3tb disks under raid1], how would you lay out your partition map?

I'd be happy if somebody could share his personal experiences, setup configurations and win/fails regarding different virtualization platforms, for a similar goal as mine.

Thanks!

Answer

I'd say use KVM - that way your hypervisor & Dom0 can be the standard debian you're familiar with. With KVM, the hypervisor and the Dom0 are the same machine - it's one of the design principles behind KVM that the best hv to have is a full-featured linux system.

With Xen, the hypervisor runs on the bare metal and the Dom0 runs inside it along with all the guest DomUs, kind of like a special purpose VM.

I don't think container-style virtualisation offers enough real benefits over simple vhosting that it's worth the bother.

For performance, I think your plan to use LVM for VM images (rather than, say, image files on a fs) is a good one.

Alternatively, you could use zfsonlinux (note: not zfs-fuse, it's too slow), which is pretty stable and reliable. The "catch" is you have to download the debianised source packages from the Ubuntu zfsonlinux PPAs and recompile them for Debian. Easy if you're comfortable with compiling packages, probably not very easy if you're not.

zfs gives you everything that LVM does, with fewer restrictions and limitations (e.g. snapshotting even running VM volumes is fast and easy), and with a much less steep learning curve. If you're already familiar with LVM that last one isn't a big deal.

Disclaimer: I'm opinionated and therefore biased.

I'm not a fan of Xen. I've used Xen & KVM, dabbled with vmware (and virtualbox too although that's more of an end-user/desktop-oriented virtualisation tool rather than server virtualisation) and I strongly prefer KVM. It just works, without stupid hassles.

I'm hoping that the recent merge of Xen into the mainline kernel results in rapid improvement of Xen. It certainly can't hurt to escape being stuck with ancient kernel versions.

Similarly, I'm not a huge fan of LVM either. I used it in the past because there was nothing else that did what it did. However, I have never liked it and have always thought that it is clumsy and obtuse and gratuitously complicated. I've been using zfsonlinux for a few months now and it's everything I ever wanted LVM to be. I hope I never have to build or administer another LVM system again.

Saturday, December 26, 2015

domain name system - How do I redirect www to non-www in Route53?

I host my site at domain.com.

My DNS entries in Route53 are as follows:

domain.com      A       xxx.xxx.xxx.xxx      300
domain.com      NS      stuff.awsdns-47.org  172800
domain.com      SOA     stuff.awsdns-47.org  900

I would like to redirect traffic from www.domain.com to domain.com, as currently this just returns a 404. This question on SO suggested a PTR record, and I added that:

www.domain.com  PTR     domain.com           300

but it didn't work. What should I be doing?

Answer

PTR is for setting up reverse IP lookups, and it's not something you should care about. Remove it.

What you need is a CNAME for www:

www.domain.com  CNAME  domain.com 300

domain name system - DNS forwarding or root hints

The author of Best practices for DNS forwarding [petri.com] recommends using the ISP's DNS servers as forwarders instead of doing the recursive lookups yourself, the main reason being performance. This makes sense as you're only doing one query, getting the response probably right away, given a big enough cache at the ISP and a popular enough website.

A downside of using your ISP's DNS servers might be their stability. It used to be the case that ISPs often had not very stable DNS servers. However, this can be solved by simply forwarding to public name servers such as 1.1.1.1, 8.8.8.8, or 9.9.9.9.

What are the benefits of doing the lookups yourself?

Edit: Using public name servers like Quad9 also adds in security as it filters out known malicious domains.

Answer

To answer my own question...

John is correct in stating that "if a DNS service meets your needs, certainly forward to it." A few reasons why it may not meet your needs:

The DNS provider might block certain websites (e.g. torrent sites) by returning an IP address they - or the government - owns, hosting a website stating the website is banned for illegal activities.

The DNS provider might return A records for non-existent domain names for advertising purposes (comment from Torin Carey).

Reasons for running your own resolvers:

If your company is dual-homed to two different ISPs, it might not be possible to use the DNS servers from ISP1 when traffic leaves your network via ISP2. In this case you should either use public DNS servers (e.g. 8.8.8.8) or run your own resolvers.

If the latency from the ISP's or a public DNS server is too high, you should run your own resolvers.

If both options (your own resolvers or public ones) are valid for your company, you can choose whichever you want, depending on personal or architectural preferences. Of course, running your own resolvers means more systems to manage, and you need system administrators with DNS knowledge on your team.

iis 6 - Disable .net completely in a IIS6 Application Pool

We're managing some web sites for our clients on our servers, some running Windows Server 2003 R2 and others running 2008 R2. In Windows Server 2008 R2, we can completely disable .NET Framework usage for some application pools, which is great since most of our websites are still using classic ASP.

After some issues with classic ASP applications being configured to run as ASP.NET 4 in a CLR 2.0 pool, we wanted to do the same thing in IIS6 - that is, have application pools without any .NET support.

Is this a supported scenario in IIS6?

Thanks

Friday, December 25, 2015

linux - Does crontab file automatically bash a text file?

I have a crontab job setup. In the crontab file, I have a path to a text file. My text file has a wget command (which in turn, executes a PHP file). If the crontab file just has the path to the text file, will it automatically bash (execute) that text file? Or do I need to prefix the path to the text file with bash?

Answer

If the file is executable (check if it has x in ls -l, if not, then use chmod to set the executable bit) and the first line contains #!/bin/bash then it will be interpreted in bash.

The other option is, as you suggest, to pass it as an argument to bash:

/bin/bash /path/to/your/file.sh
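
Both options can be tried outside cron first. The following is a minimal sketch, assuming a throwaway script in a temporary directory; the file name and the echo payload are placeholders for illustration (in the real case the body would be the wget command), and the schedule in the commented crontab lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical demo: file name and payload are placeholders, not from the question.
workdir=$(mktemp -d)
job="$workdir/job.sh"

# A script whose first line is the shebang mentioned in the answer.
printf '#!/bin/bash\necho "job ran"\n' > "$job"

# Option 1: set the executable bit, then the bare path is enough.
# crontab entry (hypothetical schedule):  */5 * * * * /path/to/your/file.sh
chmod +x "$job"
direct_output=$("$job")

# Option 2: pass the file to bash explicitly; no executable bit is needed.
# crontab entry (hypothetical schedule):  */5 * * * * /bin/bash /path/to/your/file.sh
chmod -x "$job"
explicit_output=$(/bin/bash "$job")

echo "direct:   $direct_output"
echo "explicit: $explicit_output"

rm -rf "$workdir"
```

If the bare path is used without the executable bit set, cron's shell will refuse to run it, which is why the second form is the more forgiving of the two.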

Thursday, December 24, 2015

linux - iDRAC6 Virtual Media native library cannot be loaded

When attempting to mount Virtual Media on an iDRAC6 IP KVM session I get the following error:

The Virtual Media program will close. Reason: The Virtual Media native library cannot be loaded.

I'm using Ubuntu 9.04 and:

$ javaws -version
Java(TM) Web Start 1.6.0_16

$ uname -a
Linux aud22419-linux 2.6.28-15-generic #51-Ubuntu SMP Mon Aug 31 13:39:06 UTC 2009 x86_64 GNU/Linux

$ firefox -version
Mozilla Firefox 3.0.14, Copyright (c) 1998 - 2009 mozilla.org

On Windows + IE it (unsurprisingly) works.

I've just gotten off the phone with the Dell tech support and I was told it is known to work on Linux + Firefox, albeit Ubuntu is not supported (by Dell, that is).

Has anyone out there managed to mount virtual media in the same scenario?

vmware vsphere - Copying template from SAN Datastore to local datastore

I want to copy a template from a SAN datastore to a local ESXi datastore. How can I do that?
The underlying VMFS version for both the source and destination datastores is 5.33.

hardware - Can I connect a PCIe 2.5" SSD to a regular PCIe 3 slot?

I'm looking to upgrade my server's storage. It currently has SATA SSD storage but I would like to add to it an even faster PCIe3 SSD. It's a full tower with a few empty PCIe3 slots but no PCIe3 backplane, the motherboard model is Supermicro X9DR3-F. 2.5" PCIe SSDs are significantly cheaper than SSDs with PCIe3 card form factor. Is there a way to add a 2.5" PCIe SSD despite this motherboard not having the right disk backplane?

Answer

Assuming you are buying U.2 form-factor NVMe SSDs, then you could get something like this PCI adapter card, which lets you fit a 2.5" NVMe SSD into a PCI-e slot.

Wednesday, December 23, 2015

linux - ssh port forwarding with firewall-cmd

I'm trying to do an ssh tunnel into a server behind NAT:

ssh from laptop --> Host with port forwarding in firewall --> Get directly into guest (172.16.0.2, behind host NAT).

Using iptables on Host - it will work:

# iptables -I OUTPUT  -d 0.0.0.0/0 -j ACCEPT
# iptables -I FORWARD  -d 0.0.0.0/0 -j ACCEPT
# iptables -I INPUT  -d 0.0.0.0/0 -j ACCEPT
# iptables -t nat -I PREROUTING -d 0.0.0.0/0 -p tcp --dport 222 -j DNAT --to-destination 172.16.0.2:22

However, iptables are not saved on Host reboot, since firewalld service is running (firewalld is the default in RHEL 7).

So I'm trying to do the same port forwarding with firewall-cmd.

Using firewall-cmd on Host - it will NOT work:

# firewall-cmd --permanent --zone=public --add-forward-port=port=222:proto=tcp:toport=22:toaddr=172.16.0.2
# firewall-cmd --permanent --zone=public --add-masquerade
# firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -d 0.0.0.0/0 -j ACCEPT
# firewall-cmd --permanent --direct --add-rule ipv4 filter OUTPUT 0 -d 0.0.0.0/0 -j ACCEPT
# firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -d 0.0.0.0/0 -j ACCEPT

# firewall-cmd --permanent --direct --add-rule ipv4 nat PREROUTING 0 -d 0.0.0.0/0 -p tcp --dport 222 -j DNAT --to-destination 172.16.0.2:22


# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="0.0.0.0/0" forward-port port="222" protocol="tcp" to-port="22" to-addr="172.16.0.2"'

# firewall-cmd --reload

# firewall-cmd --list-all

public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp4s0f0

  sources: 
  services: ssh dhcpv6-client
  ports: 8139/tcp
  protocols: 
  masquerade: yes
  forward-ports: port=222:proto=tcp:toport=22:toaddr=172.16.0.2
  source-ports: 
  icmp-blocks: 
  rich rules: 
     rule family="ipv4" destination address="0.0.0.0/0" forward-port port="222" protocol="tcp" to-port="22" to-addr="172.16.0.2" 


# firewall-cmd --direct --get-all-rules

ipv4 filter INPUT 0 -d 0.0.0.0/0 -j ACCEPT
ipv4 filter OUTPUT 0 -d 0.0.0.0/0 -j ACCEPT
ipv4 filter FORWARD 0 -d 0.0.0.0/0 -j ACCEPT
ipv4 nat PREROUTING 0 -d 0.0.0.0/0 -p tcp --dport 222 -j DNAT --to-destination 172.16.0.2:22

Now, when trying to connect to the guest - from my laptop, via host port 222 - the ssh connection is refused:

ssh -l stack my-host -p 222
ssh: connect to host my-host port 222: Connection refused

Any idea what I am missing?

apache 2.2 - How can I install mod_dav_svn 1.6 on CentOS 5.4?

I'm trying to install mod_dav_svn on CentOS 5.4, and this is what I see:

# yum --enablerepo=rpmforge install mod_dav_svn
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * addons: mirrors.adams.net
 * base: mirror.sanctuaryhost.com
 * extras: mirror.sanctuaryhost.com
 * rpmforge: fr2.rpmfind.net
 * updates: mirror.steadfast.net

Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package mod_dav_svn.x86_64 0:1.4.2-4.el5_3.1 set to be updated
--> Processing Dependency: subversion = 1.4.2-4.el5_3.1 for package: mod_dav_svn
--> Running transaction check
---> Package subversion.i386 0:1.4.2-4.el5_3.1 set to be updated
--> Finished Dependency Resolution
[...]

Version 1.4.2 is older than my installed Subversion 1.6.9 (I installed it before). How and where can I get mod_dav_svn in version 1.6.9?

Answer

There's also a script provided by WANdisco for the latest pristine open source binaries. It will set up a repository, and you can "yum update" to the latest version anytime:

http://wandisco.com/subversion/os/downloads

Memory Usage for Databases on Linux

So with free output what we care about with application memory usage is generally the amount of free memory in the -/+ buffers/cache line. What about with database applications such as Oracle, is it important to have a good amount of cached and buffers available for a database to run well with all the IO?

If that makes any sense, how do you figure out just how much?

Answer

Well, it's kinda simple. The short answer is "mess with it and load test until you find peak performance".

More detail:
Most database engines fall into one of two overly-broad categories:

A database that takes over a raw disk device and does raw I/O

A database that creates files ("disk files" or some other scheme) in the operating system's FS

Type #1 doesn't care about the OS buffer cache. It wants to glom up all the RAM it can for its own cache, and would prefer that the OS gets the hell out of its way (these almost always get run on big fat dedicated systems).
Off the top of my head, Oracle and Sybase can both be configured this way, but I'm sure others can too.

Type #2 includes Oracle and Sybase (with different configurations) as well as the two open-source juggernauts (MySQL & Postgres). These systems do care about the OS buffer cache, but how much they care is debatable and depends on the underlying storage engine & the efficiency of the OS buffer cache.
In most cases there are two layers of caching here (the DB engine has a cache & the OS has its buffer cache), and you tune both caches up & down until you find the mix that gives you the best performance.

There are some more extensive notes on type #2 in the Postgres Wiki (look for shared_buffers & effective_cache_size). Those notes are Postgres-specific, but the concepts are generally applicable to other DB engines that use the filesystem to hold their data.

It still always boils down to the short answer I gave at the beginning though.
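
To have somewhere to start the load testing from, here is a rough sketch for the Postgres case. The 25% and 75% fractions are commonly cited starting points for shared_buffers and effective_cache_size, not figures from this answer, and the function name and the 16384 MB input are made up for illustration; treat the output as a first guess to tune from, nothing more.

```shell
#!/bin/sh
# suggest_pg_cache TOTAL_RAM_MB  (hypothetical helper name)
# Prints first-guess values only; the fractions are assumptions, see the text above.
suggest_pg_cache() {
    total_mb=$1
    # Assumed starting point: about a quarter of RAM for the database's own cache.
    echo "shared_buffers = $(( total_mb / 4 ))MB"
    # Assumed starting point: about three quarters of RAM as the planner's estimate
    # of how much data the DB cache and the OS buffer cache can hold together.
    echo "effective_cache_size = $(( total_mb * 3 / 4 ))MB"
}

# Example input: a machine with 16 GB of RAM.
suggest_pg_cache 16384
```

From there, move both values up and down under a realistic load and keep whichever mix performs best.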

Tuesday, December 22, 2015

storage - Are bcache and/or dm-cache considered stable for production in 2016?

I would like to use linux SSD caching (dm-cache or bcache) with Debian Jessie production servers. (kernel 3.16)

My question: are the dm-cache and bcache modules reliable in Linux 3.16? Do I need to upgrade my kernel to a more recent version?

I also found this worrying message about bcache: https://lkml.org/lkml/2015/12/22/154

Note that I fully understand what the caching mode choices (write-back/write-through) imply in terms of reliability and data loss; my question is more about software bugs in these modules.

February 2018 follow-up after more than 1 year of bcache on a continuous integration server (a Jenkins instance running a lot of intensive jobs!)

Configuration of the server (storage stack essentially)

Hardware:

2 x 480GB SSD (Samsung SM863 enterprise grade MLC)

2 x 4TB HDD (Seagate Constellation ES.3 SATA)

Dell R730 - Dual Xeon E52670 - 128GB RAM

NO hardware RAID, no battery/flash-backed hardware write cache; that's where bcache's writeback feature becomes interesting.

Software:

configured in September 2016, never rebooted

Debian Jessie with 4.6 kernel (from official jessie-backport at the time of last update)

software MD raid 10
- 1 raid10 device for the 2 SSD
- 1 raid10 device for the 2 HDD

2 LVM VG on top the 2 raid devices

a bcache "caching" device created on a logical volume on the SSD_RAID10 VG

a bcache "backing" device created on a logical volume on the HDD_RAID10 VG

the bcache cache configured as writeback

Workload

many jenkins jobs (continuous integration)

CPU-intensive jobs mixed with periods of intensive I/O
- before using bcache, such periods regularly raised the average I/O latency above 5 seconds (!!!)

real workload on this server started only 1 year ago (~Feb 2017)

I/O amount issued on the bcache device (according to /proc/diskstats)

350TB written

6TB read (I double checked that, I think that the large amount of RAM helps a lot to cache the reads in the VFS layer)
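
For anyone wanting to pull the same kind of totals, here is a small sketch of how they can be derived from /proc/diskstats. It assumes the standard column layout (column 3 is the device name, columns 6 and 10 are sectors read and written, counted in 512-byte units); the function name, the device name bcache0 and the sample line are made up for illustration and are not the author's data.

```shell
#!/bin/sh
# diskstats_bytes DEVICE  (hypothetical helper name)
# Reads /proc/diskstats-formatted lines on stdin and prints
# "<device> <bytes_read> <bytes_written>" for the named device.
diskstats_bytes() {
    awk -v dev="$1" '$3 == dev { printf "%s %.0f %.0f\n", $3, $6 * 512, $10 * 512 }'
}

# Made-up sample line in the diskstats layout (2048 sectors read, 4096 written).
sample=' 252       0 bcache0 1000 0 2048 50 2000 0 4096 80 0 100 130'
echo "$sample" | diskstats_bytes bcache0

# On a live system: diskstats_bytes bcache0 < /proc/diskstats
```

The counters are cumulative since boot, which is why a machine that has never been rebooted gives an all-time figure.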

Result

rock stable ! the machine never had to be rebooted (uptime 525 days), no corruption detected.

hit rate is high ! 78% in all time average, and rising: above 80% in the last months

writeback helps a lot: the disk latency is now an order of magnitude lower. Sadly I don't have accurate measurements for that, but the computations are no longer stalled by write bursts. (The amount of dirty data rises above 5GB, whereas a hardware RAID write cache usually has a size between 512MB and 1GB.)

Conclusion

bcache is rock stable on this configuration (but 1 machine, 1 configuration, 1 machine-year is not sufficient to generalize; it is a good start, though!)

bcache performs very well on this workload, and the writeback mode seems to efficiently replace a hardware RAID write cache (but keep in mind that reliability on power loss has not been tested)

in my personal opinion bcache is underrated, and interesting solutions could be packaged using it; but note also that the original author now develops bcachefs (a filesystem based on his bcache work) and doesn't improve bcache anymore

Answer

I think that the decreasing cost of SSD storage and the increasing capacity and range of options available make a good case for using solid-state storage where you need it and foregoing the idea of selective (and potentially buggy) caching.

If you fill in some detail about the environment, the capacity needs and anything else, it may help with a better answer.

Monday, December 21, 2015

Load-balancing between a Procurve switch and a server

I've been searching around the web for a problem I've been having. It's similar in a way to this question: How exactly & specifically does layer 3 LACP destination address hashing work?

My setup is as follows:
I have a central switch, a Procurve 2510G-24, image version Y.11.16. It's the center of a star topology, there are four switches connected to it via a single gigabit link. Those switches service the users.

On the central switch, I have a server with two gigabit interfaces that I want to bond together in order to achieve higher throughput, and two other servers that have single gigabit connections to the switch.

The topology looks as follows:

sw1   sw2   sw3   sw4
 |     |     |     |
---------------------
|        sw0        |
---------------------
  ||        |      |
 srv1      srv2   srv3

The servers were running FreeBSD 8.1. On srv1 I set up a lagg interface using the lacp protocol, and on the switch I set up a trunk for the two ports using lacp as well.
The switch showed that the server was a lacp partner, I could ping the server from another computer, and the server could ping other computers. If I unplugged one of the cables, the connection would keep working, so everything looked fine.

Until I tested throughput. There was only one link used between srv1 and sw0. All testing was conducted with iperf, and load distribution was checked with systat -ifstat.
I was looking to test the load balancing for both receive and send operations, as I want this server to be a file server. There were therefore two scenarios:

iperf -s on srv1 and iperf -c on the other servers

iperf -s on the other servers and iperf -c on srv1 connected to all the other servers.

Every time only one link was used. If one cable was unplugged, the connections would keep going. However, once the cable was plugged back in, the load was not distributed.

Each and every server is able to fill the gigabit link. In one-to-one test scenarios, iperf was reporting around 940Mbps. The CPU usage was around 20%, which means that the servers could withstand a doubling of the throughput.

srv1 is a dell poweredge sc1425 with onboard intel 82541GI nics (em driver on freebsd). After troubleshooting a previous problem with vlan tagging on top of a lagg interface, it turned out that the em could not support this. So I figured that maybe something else is wrong with the em drivers and / or lagg stack, so I started up backtrack 4r2 on this same server.

So srv1 now uses linux kernel 2.6.35.8. I set up a bonding interface bond0. The kernel module was loaded with option mode=4 in order to get lacp. The switch was happy with the link, I could ping to and from the server. I could even put vlans on top of the bonding interface.

However, only half the problem was solved:

if I used srv1 as a client to the other servers, iperf was reporting around 940Mbps for each connection, and bwm-ng showed, of course, a nice distribution of the load between the two nics;

if I ran the iperf server on srv1 and tried to connect with the other servers, there was no load balancing.

I thought that maybe I was out of luck and the hashes for the two mac addresses of the clients were the same, so I brought in two new servers and tested with the four of them at the same time, and still nothing changed. I tried disabling and reenabling one of the links, and all that happened was the traffic switched from one link to the other and back to the first again. I also tried setting the trunk to "plain trunk mode" on the switch, and experimented with other bonding modes (roundrobin, xor, alb, tlb) but I never saw any traffic distribution.

One interesting thing, though:
one of the four switches is a Cisco 2950, image version 12.1(22)EA7. It has 48 10/100 ports and 2 gigabit uplinks. I have a server (call it srv4) with a 4 channel trunk connected to it (4x100), FreeBSD 8.0 release. The switch is connected to sw0 via gigabit. If I set up an iperf server on one of the servers connected to sw0 and a client on srv4, ALL 4 links are used, and iperf reports around 330Mbps. systat -ifstat shows all four interfaces are used.

The cisco port-channel uses src-mac to balance the load. The HP should use both the source and destination according to the manual, so it should work as well. Could this mean there is some bug in the HP firmware? Am I doing something wrong?
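
As background for the question, here is a simplified sketch of how address-hash link selection behaves in general. It assumes a hash that XORs the last byte of the source and destination MAC addresses and takes the result modulo the number of links, which is similar in spirit to a layer-2 bonding hash; the actual algorithm in the ProCurve firmware may differ, and the MAC bytes and function name below are hypothetical.

```shell
#!/bin/sh
# pick_link SRC_LAST_BYTE DST_LAST_BYTE LINK_COUNT  (hypothetical helper name)
# Simplified layer-2 style hash: XOR the last byte of each MAC, modulo link count.
pick_link() {
    echo $(( ($1 ^ $2) % $3 ))
}

srv1=0x10   # hypothetical last byte of srv1's MAC address
for client in 0x21 0x22 0x23 0x24; do
    echo "client $client -> link $(pick_link "$client" "$srv1" 2)"
done
```

Under this kind of scheme a given source/destination pair always lands on the same link, so a single stream can never exceed one link's bandwidth, and traffic only spreads out when several pairs happen to hash differently. With only a handful of clients it is possible, though increasingly unlikely as clients are added, for all of them to hash to the same link.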

nginx - Redirect from HTTP to HTTPS with respect to the X-Forwarded-For header (SSL termination used)

I'm looking to redirect from HTTP to HTTPS. My Nginx server sits behind a load balancer which will terminate SSL for me and send all traffic (HTTP and HTTPS) to port 80. The only evidence I will have to indicate whether the original request made was HTTP or HTTPS is via the X-Forwarded-For header that is set by the load balancer. Is there a built-in, inexpensive way to handle redirection in Nginx when the original request was on HTTP? Keep in mind, I'll only have a server set up for port 80.

Answer

Assuming that you're guaranteeing that the X-Forwarded-For header is only set for SSL traffic ...

if ($http_x_forwarded_for) {
    return 301 https://$host$request_uri;
}

Although this is arguably something you should do at the balancer.

Attach storage drive in a VM via iSCSI initiator or VHDX on the Hyper-V host

Our current 200-user infrastructure is based upon XenServer and a StarWind iSCSI SAN. The C: drives for the virtual server's OS are mounted in XenServer SR volumes which are in turn virtual hard disks on the StarWind SAN. However, the data drives for the VM (say our file server) is mounted using the Microsoft iSCSI initiator from within the virtual OS. Therefore the bulk of the I/O is via iSCSI direct (okay via the hypervisor NIC stack) to/from the SAN. We're not limited by the 2TB disk limit in XenServer. Thin-provisioning is provided by the StarWind SAN.

We're moving to a Hyper-V 2012 environment and the situation is a little less clear as we could mount the E: drive via a 2nd VHDX (now the same 2TB size limit is removed). The VHDX also offers thin-provisioning. However, it's still going to have to go across iSCSI from the Hyper-V server to the same SAN so to me, it feels like the VHDX route must be adding an extra layer and therefore would offer lower performance.

Any words of wisdom on whether direct iSCSI or via VHDX is "better" appreciated.

Answer

The two strategies are about equally costly. The VHDX part does add a very thin layer, but doing networking from the Hyper-V parent partition in comparison to doing it from the guest will be slightly less expensive, as you're not doing network virtualization for the iSCSI traffic.

The VHDX strategy, however, is far easier to manage. Personally, I'd choose the ease of management.

windows server 2003 - Migrating to new HDDs

I am running a W2k3 Server on a software raid 1 using two old PATA drives. I now want to replace them with two new SATA drives. What's the easiest way to go about this? Since uptime isn't really an issue at this point, I was thinking about breaking the array, installing the new drive, and rebuilding it; then remove the other PATA drive, replace it and rebuild once again. Are there any good reasons (other than it probably taking forever) not to do this?

I know I could also use something like Acronis TrueImage but I don't really feel like shelling out money for this seemingly trivial migration :-)

Answer

Your idea sounds fine. I would take out disk 0 and cable disk 1 to where disk 0 was, then reboot. Assuming it boots fine, keep disk 0 somewhere safe: if it all goes pear-shaped, you just put the original disk 0 back.

Sunday, December 20, 2015

domain name system - Dynamic DNS client records not being updated correctly

Our network is running Windows and macOS clients all of which are joined to Active Directory. The Macs are all running Windows via Bootcamp. We were having issues which were being caused by our macOS clients because they were not registering PTR records. We were also having issues with stale records. I spent a lot of time researching DNS and DHCP configuration to try to resolve these problems and ended up thinking I had solved it with the following configuration. But I have now realised that we still have problems which I will explain after the configuration.

Servers

2 x Windows Server 2016 VM's

Both Domain Controllers

Both running DNS

Both running DHCP

DHCP Config

Failover mode: Load balance

Enable DNS dynamic updates: Always dynamically update DNS records

Discard A and PTR records when lease is deleted: Enabled

Dynamically update DNS records for DHCP clients that do not request updates: Enabled

Disable dynamic updates for DNS PTR Records: Disabled

DHCP name protection: Disabled

Lease duration: 2 days

Dynamic DNS credentials are configured. Account is only a Domain User

DNS Config

Active Directory Integrated Zones

Dynamic updates: Secure only

Replication: All domain controllers in this domain

No-refresh interval: 1 days

Refresh interval: 1 days

Scavenging period: 3 days

Group Policy

Computer\Administrative Templates\Network\DNS Client\Dynamic update: Set to Disabled

Current Problems

My aim was to get the DHCP servers to handle all DNS registration to solve the problem of macOS not registering PTR records. The PTR records are now being created but it looks like the macs are still creating their own A records as the permissions list the client instead of the DNS registration user.

The Macs obviously use the same Ethernet or wireless card in macOS and Windows, so the MAC address won't change. I assume that when each OS requests an IP it will be given the same IP due to the MAC address, and the DHCP server will update the client name. I'm not sure about DNS though: will the DHCP server create a new A record because the client name is different, resulting in duplicate records for a single IP? Or will the DHCP server try to change the client name on the A record with the matching IP, in which case it will fail if the Mac has already registered its own record? I don't know if/how I can tell the Mac to let the DHCP server register the records for it (our Mac technician decided to pursue a different career path, leaving us Windows techs scratching our heads :P ).

The second problem I'm having is that for some reason even the Windows clients aren't always registering correctly. I have a Windows-only device which has received an IP from DHCP but has no A or PTR records. This device is affected by the Group Policy mentioned above, so it would be relying on the DHCP server to register the records for it. But I can't see anything in the DHCP-Server or DNS-Server logs in Event Viewer on our servers which relates to this client name.

The third problem, albeit minor, is that any clients with statically assigned IPs don't register in DNS due to the setting in Group Policy. There are only a handful of these, and we occasionally assign IPs temporarily for various tasks (as DHCP clients require a proxy for Internet access).

If I have missed any important information please ask. Any help would be greatly appreciated.

domain name system - Subdomain specific nameservers fail to resolve intermittently

I'm having issues with DNS resolution of a subdomain. I have a bit of a weird situation (at least weird to me) so bear with me while I explain it all.

I am working with a friend who owns the november-project.com domain name. It was purchased and has its DNS hosted with GoDaddy. The nameservers point to HostGator where the wordpress website lives.

Ok. So I have created a separate web app that I was successfully serving under the tracking.november-project.com subdomain. I used HostGator and pushed my assets to the public folder they give you when you create a subdomain. There is also tracking-staging.november-project.com that I used for testing.

Recently, I wanted to move away from HostGator as we had some issues with SSL certs and uptime. I decided to move my app to S3 and use CloudFront for caching, as well as Route 53 to delegate the subdomain resolution to AWS. I used this doc to help me set up the Route 53 subdomain record:

http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/CreatingNewSubdomain.html

I was able to hit the website and see the S3 assets being delivered so I removed the subdomain from HostGator. Now the website is loading for some people but not for others. I can get to the site on my phone but not my laptop. I'm seeing server DNS address could not be found. errors.

I'm not very savvy with DNS stuff and I've learned most of what I know going through this process. Any help is greatly appreciated. I can provide more info about anything as well.

Update:

After checking with GoDaddy, the DNS service is being delegated to HostGator so I cannot add zone records there.

I then talked to HostGator support and they don't allow NS records for customers at my price level (most web/shared levels).

After discussions with a friend, it seemed like the best course of action was to stop using HostGator for DNS. And since I was making that change anyways, I decided to go with Route 53 to consolidate.

I was hoping Route 53 allowed for wildcard NS records so I can delegate everything I don't want to deal with back to HostGator; however, it doesn't seem like wildcard NS records are a common thing for any DNS. So I took some time to copy over all the DNS records in HostGator to Route 53. I then switched GoDaddy to point to Route 53. I'm hoping this will work and won't cause down time for the root site.

Does this all seem reasonable? Are there more recommended courses of action?

Answer

A whois on november-project.com shows the following nameserver records:

Name Server: NS8065.HOSTGATOR.COM
Name Server: NS8066.HOSTGATOR.COM
Name Server: NS-1032.AWSDNS-01.ORG

Name Server: NS-40.AWSDNS-05.COM
Name Server: NS-1565.AWSDNS-03.CO.UK
Name Server: NS-572.AWSDNS-07.NET
Name Server: NS-1465.AWSDNS-55.ORG
Name Server: NS-688.AWSDNS-22.NET
Name Server: NS-2026.AWSDNS-61.CO.UK
Name Server: NS-458.AWSDNS-57.COM

When I dig tracking.november-project.com on the hostgator nameserver then on the AWS one, I get very different responses:

Amazon:

Dig tracking.november-project.com@NS-1465.AWSDNS-55.ORG (205.251.197.185) ...
Authoritative Answer
 Query for tracking.november-project.com type=255 class=1
  tracking.november-project.com A (Address) 52.85.40.208
  tracking.november-project.com A (Address) 52.85.40.67
  tracking.november-project.com A (Address) 52.85.40.155
  tracking.november-project.com A (Address) 52.85.40.215

  tracking.november-project.com A (Address) 52.85.40.151
  tracking.november-project.com A (Address) 52.85.40.200
  tracking.november-project.com A (Address) 52.85.40.138
  tracking.november-project.com A (Address) 52.85.40.222
  tracking.november-project.com NS (Nameserver) ns-1465.awsdns-55.org
  tracking.november-project.com NS (Nameserver) ns-2026.awsdns-61.co.uk
  tracking.november-project.com NS (Nameserver) ns-458.awsdns-57.com
  tracking.november-project.com NS (Nameserver) ns-688.awsdns-22.net

Hostgator:

Dig tracking.november-project.com@NS8065.HOSTGATOR.COM (192.185.5.19) ...
Authoritative Answer
 Query for tracking.november-project.com type=255 class=1
  tracking.november-project.com A (Address) 192.185.38.67
  november-project.com NS (Nameserver) ns8066.hostgator.com
  november-project.com NS (Nameserver) ns8065.hostgator.com
  ns8065.hostgator.com A (Address) 192.185.5.19
  ns8066.hostgator.com A (Address) 192.185.5.190

Ideally, the hostgator nameservers need to have any trace of this domain completely removed, so that they don't think they are authoritative and will pass the request on to the nameserver that IS. It looks like this isn't happening.

EDIT:

I've had a better look at the Amazon document and it is very vague, but realistically I think you need to add the AWS NS records to the subdomain, not the root domain.

I don't believe you should be seeing the Amazon nameservers when you whois the root domain. The records should be

november-project.com             NS    *hostgator ns*
tracking.november-project.com    NS    *amazon ns*

Whether this is the cause is hard to tell. When I do an nslookup of tracking-staging on my own machine it fails because the primary NS is listed as the hostgator one. An NS record on the subdomain should be seen as more specific and take precedence, so it should hopefully stop this behaviour.

QUESTIONS:
        tracking-staging.november-project.com, type = AAAA, class = IN
    AUTHORITY RECORDS:
    ->  november-project.com
        ttl = 219 (3 mins 39 secs)
        primary name server = ns8065.hostgator.com
        responsible mail addr = dnsadmin.gator4033.hostgator.com
        serial  = 2016033001
        refresh = 86400 (1 day)
        retry   = 7200 (2 hours)
        expire  = 3600000 (41 days 16 hours)
        default TTL = 86400 (1 day)
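
The "more specific NS record takes precedence" idea can be sketched in a few lines of Python. This is only an illustration of longest-suffix matching on zone names, not how any particular resolver is implemented. The delegation table is the layout proposed above, with nameserver names copied from the dig output, and closest_delegation is a hypothetical helper name.

```python
# Proposed delegations; nameserver names are copied from the dig output above.
DELEGATIONS = {
    "november-project.com": ["ns8065.hostgator.com", "ns8066.hostgator.com"],
    "tracking.november-project.com": ["ns-1465.awsdns-55.org", "ns-688.awsdns-22.net"],
}


def closest_delegation(name, delegations=DELEGATIONS):
    """Return (zone, nameservers) for the longest delegated suffix of name."""
    labels = name.lower().rstrip(".").split(".")
    # Try the full name first, then strip one leading label at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in delegations:
            return candidate, delegations[candidate]
    return None, []


for host in ("www.november-project.com", "tracking.november-project.com"):
    zone, servers = closest_delegation(host)
    print(host, "->", zone, servers)
```

Note that under this matching a name such as tracking-staging.november-project.com is not covered by the tracking delegation, because matching is done per label. It would need its own NS records.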

Saturday, December 19, 2015

hardware raid - Can I mix SAS and SATA2 drives on one controller?

I've read here that I can't mix SAS and SATA2 drives on the same controller - is that really true? I understand I shouldn't mix on the same channel, but the same card!?

For example, if I plug my 6-drive backplane (IBM x3650-7979) via an SFF-8087 cable into an Adaptec 2405 and spin up 2 SAS drives (Ch. 1) and 4 SATA2 drives (Ch. 2+3), would it work?

Answer

Yes. This works fine and is not an uncommon arrangement. You may want to look into using nearline or midline SAS drives instead of SATA for future installations, though. These days, there are fewer reasons to use SATA disks given the high-capacity SAS options available.

See: SAS or SATA for 3 TB drives?

linux - Slow performance due to txg_sync for ZFS 0.6.3 on Ubuntu 14.04

I am using native ZFS with "ZFS on Linux" installed from the PPA here. Setup was not a problem and I am using it in a mirrored configuration with two WD 4TB Red HDDs. Unfortunately, I am having performance issues when writing to the disk array; read performance is OK.

The problem is that during large writes to the array, the copy process stalls to ~5-10MB/s every ~5 seconds, as reported by rsync. The speed in between stalls is ~75MB/s, which is in line with other filesystems and what I would expect from the system (I tried btrfs, which gets ~85MB/s). Looking at iotop, I found that the copy stalls coincide with the process txg_sync performing/hogging I/O. This appears to be the "bursty" I/O problem that seems to be common with ZFS (see here and here). I have applied the option from the first link

options zfs zfs_prefetch_disable=1

which helped somewhat with the performance issues, but did not solve them. The 5s interval of txg_sync appears to be that of vfs.zfs.txg.timeout="5" (i.e. 5s), which is the default setting of ZFS on Linux.

Is this normal behaviour or are there other settings I can try? If so, any suggestions? Note that I couldn't find many of the options in both links...

EDIT 2: To follow up a little: the system I am using is an HP ProLiant Microserver N36L, which I upgraded to 8GB ECC RAM. The commands I used to create the ZFS volume are given below. Note that I am using -o ashift=12 because, according to the zfsonlinux FAQ, this should get ZFS to play nicely with the 4096-byte blocks of Advanced Format disks.

$ zpool create -o ashift=12 -m /zpools/tank tank mirror ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0871252 ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E3PKP1R0
$ zfs set relatime=on tank
$ zfs set compression=lz4 tank
$ zfs create -o casesensitivity=mixed tank/data

I added the zfs_prefetch_disable option to /etc/modprobe.d/zfs.conf to make the change permanent:

options zfs zfs_prefetch_disable=1

So that:

$ cat /sys/module/zfs/parameters/zfs_prefetch_disable 
1

EDIT 1: As requested, I added the zpool get all output. Note that I forgot to mention that I turned on compression on the pool...

$ zpool get all
NAME  PROPERTY               VALUE                  SOURCE
tank  size                   3.62T                  -
tank  capacity               39%                    -
tank  altroot                -                      default
tank  health                 ONLINE                 -
tank  guid                   12372923926654962277   default
tank  version                -                      default
tank  bootfs                 -                      default
tank  delegation             on                     default
tank  autoreplace            off                    default
tank  cachefile              -                      default
tank  failmode               wait                   default
tank  listsnapshots          off                    default
tank  autoexpand             off                    default
tank  dedupditto             0                      default
tank  dedupratio             1.00x                  -
tank  free                   2.21T                  -
tank  allocated              1.42T                  -
tank  readonly               off                    -
tank  ashift                 12                     local
tank  comment                -                      default
tank  expandsize             0                      -
tank  freeing                0                      default
tank  feature@async_destroy  enabled                local
tank  feature@empty_bpobj    active                 local
tank  feature@lz4_compress   active                 local

Answer

Pacoman,
It seems that because you have two WD Red drives in a mirror, the IO to write the ZIL consistency group to disk is causing high IO. There is always a ZIL (write cache). If you do not have any LOG devices, then the log device is on the pool itself, and can be as large as maximum write speed * 5 seconds. You're probably reading from the ZIL and committing the data to permanent storage every 5 seconds. Questions:

Do you have a SLOG device? This is ideally a DRAM Drive (HGST ZeusRAM, etc...).

Do you have any cache devices to read from? Ideally, a bunch of Flash, like a 480GB PCIe card.

My recommendation would be to create a SLOG somewhere other than the pool (even the boot device is better than nowhere, assuming it is NOT flash). This way you aren't reading and writing to the mirror intensively every 5 seconds.

mod rewrite - How to simulate Apache [END] flag on a redirect?

For business-specific reasons I created the following RewriteRule for Apache 2.2.22 (mod_rewrite):

RewriteRule /site/(\d+)/([^/]+)\.html /site/$2/$1.html [R=301,L]

Which if given a URL like:

http://www.example.com/site/0999/document.html

Is translated to:

http://www.example.com/site/document/0999.html

That's the expected scenario. However, there are documents whose names are only numbers. So consider the following case:

http://www.example.com/site/0055/0666.html

Gets translated to:

http://www.example.com/site/0666/0055.html

Which also matches my RewriteRule pattern, so I end up with "The web page resulted in too many redirects" errors from browsers.

I have researched for a long time, and haven't found "good" solutions. Things I tried:

Use the [END] flag. Unfortunately it is not available in my Apache version, nor does it work with redirects.

Use %{ENV:REDIRECT_STATUS} in a RewriteCond clause to end the rewrite process (L). For some reason %{ENV:REDIRECT_STATUS} was empty every time I tried.

Add a response header with the Header clause if my rule matches and then check for that header (see: here for details). It seems that a) REDIRECT_addHeader is empty and b) headers can't be set on the 301 response explicitly.

There is another alternative. I could add a query parameter to the redirect URL which indicates that it comes from a redirect, but I don't like that solution as it seems too hacky.

Is there a way to do exactly what the [END] flag does, but in older Apache versions such as my 2.2.22?

Answer

Adding a query parameter to the redirect is the best option. It is done quite often for all kinds of reasons.

For example make the rule like this:

RewriteCond %{QUERY_STRING}  !redirected
RewriteRule /site/(\d+)/([^/]+)\.html /site/$2/$1.html?redirected [R=301,L]

What now happens is that the redirects are to URLs with "?redirected" added to them. And so the rule isn't applied the second time.
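
To make the loop and the fix concrete, here is a small Python simulation of a browser following the redirects. It is a sketch, not Apache's actual rewrite engine. It assumes the redirect target keeps the .html suffix, as the example URLs in the question show, and the function names rewrite and count_redirects are made up for this illustration.

```python
import re

# Same pattern as the RewriteRule; it is matched unanchored, as in the question.
PATTERN = re.compile(r"/site/(\d+)/([^/]+)\.html")


def rewrite(path, query, guard):
    """Return the redirect target (path, query), or None if the rule does not fire."""
    # Mirrors: RewriteCond %{QUERY_STRING} !redirected
    if guard and "redirected" in query:
        return None
    match = PATTERN.search(path)
    if not match:
        return None
    # Swap the two captured segments, like /site/$2/$1.html
    new_path = "/site/{}/{}.html".format(match.group(2), match.group(1))
    return new_path, ("redirected" if guard else query)


def count_redirects(path, guard, limit=20):
    """Follow redirects like a browser would and return how many were issued."""
    query, hops = "", 0
    while hops < limit:
        result = rewrite(path, query, guard)
        if result is None:
            return hops
        path, query = result
        hops += 1
    return hops  # limit reached, i.e. a redirect loop


print(count_redirects("/site/0999/document.html", guard=False))
print(count_redirects("/site/0055/0666.html", guard=False))
print(count_redirects("/site/0055/0666.html", guard=True))
```

Without the guard, the all-numeric URL keeps bouncing until the limit is hit. With the guard, the second request carries "redirected" in its query string and the rule is skipped.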

storage - Expanding JBOD with SAS extenders

My project is to expand a 24-HDD ZFS box by adding a 45-drive box, with the option to add another 45-drive box later (all are JBODs).

The host box (24 HDDs) is a Supermicro with a single-port expander backplane and an LSI RAID card (a MegaRAID SAS 9240-8i, I think). Currently the backplane occupies one SAS port on the card, leaving one port open.

My current understanding is that, first of all, the RAID card has to support the number of drives I want to attach via expanders. So I need to change the RAID card to something that supports 128 drives.

After that I should be able to expand by daisy-chaining: Host backplane expander -> host RAID card <- extension 1, backplane 1 (24 HDDs) <- extension 1, backplane 2 (21 HDDs) <- extension 2, backplane 1 (24 HDDs) <- extension 2, backplane 2 (21 HDDs)

My RAID setup is volumes of 4 or 5 vdevs, RAIDZ2 (RAID6), with 4TB SAS drives.

Questions:
Will daisy-chaining 2 or 4 backplanes preserve 6Gb/s speed? Is the only limitation the number of disks supported by the RAID card?

Also, as I understand it, dual-expander backplanes allow redundancy by daisy-chaining the backplanes via additional paths. Is that true, and is it worth the extra $200?

Should I throw away the RAID card and use an HBA instead, since it would support 128 drives more cheaply and all I care about is JBOD?

I used these sources to get understanding:

Friday, December 18, 2015

linux - PHP Sockets. Better on multiple ports or not?

I have been running my server for the past 6 years without any problem, but recently ran into some trouble. I am getting a lot of SYN_RECV connections, which pushes the backlog of connecting devices so far back that nothing is updating as it should. My server is an Intel(R) Xeon(R) CPU E31230 @ 3.20GHz with 7 cores and 32GB memory, running CentOS 6.7. I have about +- 800 active connections on this machine. I have 12 ports running, and all +-800 devices are split over these 12 ports with a maximum of 100 devices per port. My sockets are created and run through PHP. I would appreciate ANY advice.

-Would it help to have everything on 1 port?
-Would it help to extend them over more ports (max 50 devices per port)?
-Would it help to tweak the Linux settings (and which settings would those be)?
-Would it help to write it in another language (and which would be best, if any)?
-What else could cause this problem?

I have SYN cookies enabled to help with SYN attacks:

sysctl -n net.ipv4.tcp_syncookies

This is my sysctl.conf:

/# Kernel sysctl configuration file for Red Hat Linux
/#
/# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and
/# sysctl.conf(5) for more details.

/# Controls IP packet forwarding
net.ipv4.ip_forward = 0

/# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

/# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

/# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

/# Controls whether core dumps will append the PID to the core filename.
/# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1

/# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1

/# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

/# Controls the default maxmimum size of a mesage queue
kernel.msgmnb = 65536

/# Controls the maximum size of a message, in bytes
kernel.msgmax = 65536

/# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

/# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

/# Retry SYN/ACK only three times, instead of five
net.ipv4.tcp_synack_retries = 5

/# Try to close things only twice
net.ipv4.tcp_orphan_retries = 5

/# FIN-WAIT-2 for only 5 seconds
net.ipv4.tcp_fin_timeout = 30

/# Increase syn socket queue size (default: 512)
net.ipv4.tcp_max_syn_backlog = 1024
net.core.netdev_max_backlog = 1000

/# One hour keepalive with fewer probes (default: 7200 & 9)
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 5

/# Max packets the input can queue
net.core.netdev_max_backlog = 65536

/# Keep fragments for 15 sec (default: 30)
net.ipv4.ipfrag_time = 30

/# Use H-TCP congestion control
net.ipv4.tcp_congestion_control = htcp

net.core.rmem_default = 256960
net.core.rmem_max = 256960
net.core.wmem_default = 256960
net.core.wmem_max = 256960

net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 0
net.ipv4.ip_local_port_range = 15000 61000
net.core.somaxconn = 1024

Minus the "/" before the "#"

Here is my PHP code for running the ports:

#!/usr/bin/php -q
<?php
    error_reporting(0);
    set_time_limit(0);
    ob_implicit_flush();
    $address = '123.123.123.123';
    $port = 8000;

    //Check for the connections
    //Note: the socket_* functions return false on failure, not a negative number
    if (($master = socket_create(AF_INET, SOCK_STREAM, SOL_TCP)) === false)
    {
        logs("socket_create() failed, reason: " . socket_strerror(socket_last_error()) . "\n", $enable_logging);
    }

    socket_set_option($master, SOL_SOCKET, SO_REUSEADDR, 1);

    if (socket_bind($master, $address, $port) === false)
    {
        logs("socket_bind() failed, reason: " . socket_strerror(socket_last_error($master)) . "\n", $enable_logging);
    }

    if (socket_listen($master, SOMAXCONN) === false)
    {
        logs("socket_listen() failed, reason: " . socket_strerror(socket_last_error($master)) . "\n", $enable_logging);
    }

    $read_sockets = array($master);

    //Read all data from buffer
    while (true)
    {
        $changed_sockets = $read_sockets;
        //socket_select() takes its arrays by reference, so pass real variables
        $write = NULL;
        $except = NULL;
        $num_changed_sockets = socket_select($changed_sockets, $write, $except, NULL);
        foreach($changed_sockets as $socket)
        {
            if ($socket == $master)
            {
                if (($client = socket_accept($master)) === false)
                {
                    logs("socket_accept() failed: reason: " . socket_strerror(socket_last_error($master)) . "\n", $enable_logging);
                    continue;
                }
                else
                {
                    array_push($read_sockets, $client);
                    logs("[".date('Y-m-d H:i:s')."] ".$client." CONNECTED "."(".count($read_sockets)."/".SOMAXCONN.")\r\n", $enable_logging);
                }
            }
            else
            {
                $buffer = socket_read($socket, 8192);
                //false means a read error, "" means the peer closed the connection
                if ($buffer === false || $buffer === "")
                {
                    $index = array_search($socket, $read_sockets);
                    unset($read_sockets[$index]);
                    socket_close($socket);
                }
                else
                {
                    //Do DB connections etc here
                }
            }
        }
    }
?>

I open and close the ports with a bash script.

Thanks for any help; I'm open to anything.

Thursday, December 17, 2015

security - File Permissions created by Apache

OS: CentOs

Whenever apache (apache2) creates a file or directory, it automatically sets the permissions to 777. I want its directories to be 775 and its files 664. How can I fix this?

Answer

Put umask 002 at the end of /etc/sysconfig/httpd and restart httpd (service httpd restart) and it should do the trick for future files. Apache inherits the umask from its parent process, so this setup makes that happen.
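
As a quick check of the arithmetic, the umask clears bits from whatever mode the creating process requests. The sketch below assumes the usual requested modes of 0777 for directories and 0666 for files. If the application explicitly chmods to 777 afterwards, the umask will not help. The helper name effective_mode is made up for this example.

```python
def effective_mode(requested, umask):
    """Mode a new file or directory ends up with: requested bits minus umask bits."""
    return requested & ~umask


UMASK = 0o002

# Assumed request modes: 0777 for directories, 0666 for regular files.
print(oct(effective_mode(0o777, UMASK)))  # directories
print(oct(effective_mode(0o666, UMASK)))  # files
```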

networking - Subnetting/Supernetting Configuration

I have an existing network. It looks like this.

Router LAN (192.168.0.1 /24) -> Switch (192.168.0.10 /24) -> Workstations (192.168.0.100 - 192.168.0.200 /24)

Our network has expanded and I need to have more hosts available.

The simplest way to do this is to change the router's inside interface subnet (/24) to a supernet (/23).

This will give me 192.168.0.0-192.168.1.254 instead of just 192.168.0.0-192.168.0.254.

That is exactly what I want.

My question is do I need to adjust the subnet masks on hosts on either the 192.168.0.0 /24 network or 192.168.1.0 /24 network or should they continue to work having only changed the subnet mask on the inside interface of the router?

The reason I'm confused is that 192.168.0.0 /24 and 192.168.1.0 /24 are both part of 192.168.0.0 /23, so in my mind there is no need to change the subnet masks of hosts in those smaller networks. However, having changed the subnet mask only on the inside interface of the router, I am not able to communicate with a host that has the static IP address 192.168.1.40.

Finally, I would like to know if I need to change the subnet mask on the hosts, the switches, or both.

Answer

If you do this, then yes, you will need to change the netmask of all the hosts connected to the network; otherwise you will get annoying problems like hosts in different networks not being able to communicate. The ethernet-level method of contacting a host differs according to whether the destination host is on the same network or not, and the netmask is used to determine whether this is the case.

The good news is that it only applies to communication between hosts in different (old or new) parts of the network. You can change the netmask on the router, then on all of the hosts, and then start adding hosts in the new part of the network.
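
The on-link decision each host makes can be reproduced with Python's standard ipaddress module. This is a sketch of the check only, not of a real IP stack. The workstation address 192.168.0.100 is taken from the range in the question, and on_link is a made-up helper name.

```python
import ipaddress


def on_link(host_ip, prefix_len, destination):
    """True if the host, using its own netmask, treats destination as directly reachable."""
    network = ipaddress.ip_interface("{}/{}".format(host_ip, prefix_len)).network
    return ipaddress.ip_address(destination) in network


# A workstation still configured with /24 sends traffic for 192.168.1.40 to its gateway.
print(on_link("192.168.0.100", 24, "192.168.1.40"))  # False
# The same workstation with /23 contacts 192.168.1.40 directly on the LAN.
print(on_link("192.168.0.100", 23, "192.168.1.40"))  # True
```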

If reconfiguring all the hosts is cumbersome, I would add a second IP (192.168.1.1/24) to your router's inside interface. Communication between the two networks would (at least to some extent) go through the router's inside interface, so if you plan to have a massive amount of communication between hosts in your network you may not want that. You could instead add something bigger, like 172.16.0.0/16, and gradually move over your computers from the old network.

You might want to investigate DHCP, especially if some part of your hosts are user machines that usually do not need to be contacted.