Tuesday, May 31, 2016

SQL Server database filled the hard drive and freeing up space isn't possible

I have a database in SQL Server 2008 on a 1 TB hard drive, and it has filled the drive; there are only 4 KB free. The MDF file is 323 GB and the LDF is 653 GB. The hard disk this DB is on has no files on it other than the MDF and LDF, so it's impossible to free up any space on that drive. The main hard disk is smaller, but there is enough room to transfer the MDF to that drive, in case that helps. This server is overseas at a customer site and it's not possible at the moment to add more disk space to the server. It's also not possible to delete any records, because the DB is in a failed mode (due to no disk space) and it doesn't respond to most commands.

The DB is currently in full recovery mode, which is why the LDF file is so large. This DB really doesn't need to be in full recovery, so going forward we plan on switching it to simple mode, which will save us a lot of space. I also don't care about losing the LDF file, but I need all of the data.

I've spent a lot of time looking for a way out of this problem, but everything I've found involves either freeing up disk space or adding more disk space, neither of which is an option at this time. I'm stuck, and any help would be greatly appreciated.



I get the following log when trying to switch the DB to online mode.





Msg 945, Level 14, State 2, Line 3
Database 'DBNAME' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.
Msg 5069, Level 16, State 1, Line 3
ALTER DATABASE statement failed.
Msg 1101, Level 17, State 12, Line 3
Could not allocate a new page for database 'DBNAME' because of insufficient disk space in filegroup 'DEFAULT'. Create the necessary space by dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.




I've found the following solutions, but none work due to having no disk space on that drive, and since the DB is in a failed state I can't run most commands (roughly what those attempts look like is sketched below).
- DBCC SHRINKFILE - can't be run because doing a 'use DBNAME' fails
- Detaching the DB and then changing the location of the MDF/LDF files - this fails because the DB is offline, so you can't run detach.
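
For reference, here is roughly what those two attempts look like when run through sqlcmd from a shell on the server; the logical log file name (DBNAME_log) is an assumption, and both commands fail for exactly the reasons above while the database cannot be brought online:

# hedged sketch only - the logical file name DBNAME_log is an assumption
# attempt 1: shrink the log file (fails because USE DBNAME fails)
sqlcmd -S localhost -E -Q "USE DBNAME; DBCC SHRINKFILE (DBNAME_log, 1);"
# attempt 2: detach so the files could be moved (fails because the DB is not online)
sqlcmd -S localhost -E -Q "EXEC sp_detach_db 'DBNAME';"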




I'm at a loss about what else to try.



Thanks.

amazon ec2 - Command does not execute in crontab while command itself works just fine



I have this script from Colin Johnson on Github - https://github.com/colinbjohnson/aws-missing-tools/tree/master/ec2-automate-backup



It seems great.
I have modified it to send email to myself every time an EBS snapshot is created or deleted.
The following works like a charm





ec2-automate-backup.sh -v "vol-myvolumeid" -k 3




However, it does not execute at all as part of my crontab (I didn't receive any emails)




#some command that got commented out



*/5 * * * * ec2-automate-backup.sh -v "vol-fb2fbcdf" -k 3;




* * * * * date >> /root/logs/crontab.log;



*/5 * * * * date >> /root/logs/crontab2.log




Please note that the 2nd and 3rd entries execute just fine, as I can see the date and time in the log files.



What could I have missed here?



The full ec2-automate-backup.sh is as follows:




#!/bin/bash -
# Author: Colin Johnson / colin@cloudavail.com
# Date: 2012-09-24
# Version 0.1
# License Type: GNU GENERAL PUBLIC LICENSE, Version 3
#
#confirms that executables required for successful script execution are available

prerequisite_check()

{
for prerequisite in basename ec2-create-snapshot ec2-create-tags ec2-describe-snapshots ec2-delete-snapshot date
do
#use of "hash" chosen as it is a shell builtin and will add programs to hash table, possibly speeding execution. Use of type also considered - open to suggestions.
hash $prerequisite &> /dev/null
if [[ $? == 1 ]] #hash exits with a nonzero status if the executable was not found
then echo "In order to use `basename $0`, the executable \"$prerequisite\" must be installed." 1>&2 | mailx -s "Error happened 0" eric@mydomain.com ; exit 70
fi
done
}


#get_EBS_List gets a list of available EBS instances depending upon the selection_method of EBS selection that is provided by user input
get_EBS_List()
{
case $selection_method in
volumeid)
if [[ -z $volumeid ]]
then echo "The selection method \"volumeid\" (which is $app_name's default selection_method of operation or requested by using the -s volumeid parameter) requires a volumeid (-v volumeid) for operation. Correct usage is as follows: \"-v vol-6d6a0527\",\"-s volumeid -v vol-6d6a0527\" or \"-v \"vol-6d6a0527 vol-636a0112\"\" if multiple volumes are to be selected." 1>&2 | mailx -s "Error happened 1" eric@mydomain.com ; exit 64
fi
ebs_selection_string="$volumeid"

;;
tag)
if [[ -z $tag ]]
then echo "The selected selection_method \"tag\" (-s tag) requires a valid tag (-t key=value) for operation. Correct usage is as follows: \"-s tag -t backup=true\" or \"-s tag -t Name=my_tag.\"" 1>&2 | mailx -s "Error happened 2" eric@mydomain.com ; exit 64
fi
ebs_selection_string="--filter tag:$tag"
;;
*) echo "If you specify a selection_method (-s selection_method) for selecting EBS volumes you must select either \"volumeid\" (-s volumeid) or \"tag\" (-s tag)." 1>&2 | mailx -s "Error happened 3" eric@mydomain.com ; exit 64 ;;
esac
#creates a list of all ebs volumes that match the selection string from above

ebs_backup_list_complete=`ec2-describe-volumes --show-empty-fields --region $region $ebs_selection_string 2>&1`
#takes the output of the previous command
ebs_backup_list_result=`echo $?`
if [[ $ebs_backup_list_result -gt 0 ]]
then echo -e "An error occurred when running ec2-describe-volumes. The error returned is below:\n$ebs_backup_list_complete" 1>&2 | mailx -s "Error happened 4" eric@mydomain.com ; exit 70
fi
ebs_backup_list=`echo "$ebs_backup_list_complete" | grep ^VOLUME | cut -f 2`
#code to right will output list of EBS volumes to be backed up: echo -e "Now outputting ebs_backup_list:\n$ebs_backup_list"
}


create_EBS_Snapshot_Tags()
{
#snapshot_tags holds all tags that need to be applied to a given snapshot - by aggregating tags we ensure that ec2-create-tags is called only once
snapshot_tags=""
#if $name_tag_create is true then append ec2ab_${ebs_selected}_$date_current to the variable $snapshot_tags
if $name_tag_create
then
ec2_snapshot_resource_id=`echo "$ec2_create_snapshot_result" | cut -f 2`
snapshot_tags="$snapshot_tags --tag Name=ec2ab_${ebs_selected}_$date_current"
fi

#if $purge_after_days is true, then append $purge_after_date to the variable $snapshot_tags
if [[ -n $purge_after_days ]]
then
snapshot_tags="$snapshot_tags --tag PurgeAfter=$purge_after_date --tag PurgeAllow=true"
fi
#if $snapshot_tags is not zero length then set the tag on the snapshot using ec2-create-tags
if [[ -n $snapshot_tags ]]
then echo "Tagging Snapshot $ec2_snapshot_resource_id with the following Tags:"
ec2-create-tags $ec2_snapshot_resource_id --region $region $snapshot_tags
#echo "Snapshot tags successfully created" | mailx -s "Snapshot tags successfully created" eric@mydomain.com

fi
}

date_command_get()
{
#finds full path to date binary
date_binary_full_path=`which date`
#command below is used to determine if date binary is gnu, macosx or other
date_binary_file_result=`file -b $date_binary_full_path`
case $date_binary_file_result in

"Mach-O 64-bit executable x86_64") date_binary="macosx" ;;
"ELF 64-bit LSB executable, x86-64, version 1 (SYSV)"*) date_binary="gnu" ;;
*) date_binary="unknown" ;;
esac
#based on the installed date binary the case statement below will determine the method to use to determine "purge_after_days" in the future
case $date_binary in
gnu) date_command="date -d +${purge_after_days}days -u +%Y-%m-%d" ;;
macosx) date_command="date -v+${purge_after_days}d -u +%Y-%m-%d" ;;
unknown) date_command="date -d +${purge_after_days}days -u +%Y-%m-%d" ;;
*) date_command="date -d +${purge_after_days}days -u +%Y-%m-%d" ;;

esac
}

purge_EBS_Snapshots()
{
#snapshot_tag_list is a string that contains all snapshots with either the key PurgeAllow or PurgeAfter set
snapshot_tag_list=`ec2-describe-tags --show-empty-fields --region $region --filter resource-type=snapshot --filter key=PurgeAllow,PurgeAfter`
#snapshot_purge_allowed is a list of all snapshot_ids with PurgeAllow=true
snapshot_purge_allowed=`echo "$snapshot_tag_list" | grep .*PurgeAllow'\t'true | cut -f 3`


for snapshot_id_evaluated in $snapshot_purge_allowed
do
#gets the "PurgeAfter" date which is in UTC with YYYY-MM-DD format (or %Y-%m-%d)
purge_after_date=`echo "$snapshot_tag_list" | grep .*$snapshot_id_evaluated'\t'PurgeAfter.* | cut -f 5`
#if purge_after_date is not set then we have a problem. Need to alert the user.
if [[ -z $purge_after_date ]]
#Alerts user to the fact that a Snapshot was found with PurgeAllow=true but with no PurgeAfter date.
then echo "A Snapshot with the Snapshot ID $snapshot_id_evaluated has the tag \"PurgeAllow=true\" but does not have a \"PurgeAfter=YYYY-MM-DD\" date. $app_name is unable to determine if $snapshot_id_evaluated should be purged." 1>&2 | mailx -s "Error happened 5" eric@mydomain.com
else
#convert both the date_current and purge_after_date into epoch time to allow for comparison

date_current_epoch=`date -j -f "%Y-%m-%d" "$date_current" "+%s"`
purge_after_date_epoch=`date -j -f "%Y-%m-%d" "$purge_after_date" "+%s"`
#perform comparison - if $purge_after_date_epoch is a lower number than $date_current_epoch then the PurgeAfter date is earlier than the current date - and the snapshot can be safely removed
if [[ $purge_after_date_epoch < $date_current_epoch ]]
then
echo "The snapshot \"$snapshot_id_evaluated\" with the Purge After date of $purge_after_date will be deleted."
ec2-delete-snapshot --region $region $snapshot_id_evaluated
echo "Old snapshots successfully deleted for $volumeid" | mailx -s "Old snapshots successfully deleted for $volumeid" eric@mydomain.com
fi
fi

done
}

#calls prerequisite_check function to ensure that all executables required for script execution are available
prerequisite_check

app_name=`basename $0`

#sets defaults
selection_method="volumeid"

region="ap-southeast-1"
#date_binary allows a user to set the "date" binary that is installed on their system and, therefore, the options that will be given to the date binary to perform date calculations
date_binary=""

#sets the "Name" tag set for a snapshot to false - using "Name" requires that ec2-create-tags be called in addition to ec2-create-snapshot
name_tag_create=false
#sets the Purge Snapshot feature to false - this feature will eventually allow the removal of snapshots that have a "PurgeAfter" tag that is earlier than current date
purge_snapshots=false
#handles options processing
while getopts :s:r:v:t:k:pn opt

do
case $opt in
s) selection_method="$OPTARG";;
r) region="$OPTARG";;
v) volumeid="$OPTARG";;
t) tag="$OPTARG";;
k) purge_after_days="$OPTARG";;
n) name_tag_create=true;;
p) purge_snapshots=true;;
*) echo "Error with Options Input. Cause of failure is most likely that an unsupported parameter was passed or a parameter was passed without a corresponding option." 1>&2 ; exit 64;;

esac
done

#sets date variable
date_current=`date -u +%Y-%m-%d`
#sets the PurgeAfter tag to the number of days that a snapshot should be retained
if [[ -n $purge_after_days ]]
then
#if the date_binary is not set, call the date_command_get function
if [[ -z $date_binary ]]

then date_command_get
fi
purge_after_date=`$date_command`
echo "Snapshots taken by $app_name will be eligible for purging after the following date: $purge_after_date."
fi

#get_EBS_List gets a list of EBS instances for which a snapshot is desired. The list of EBS instances depends upon the selection_method that is provided by user input
get_EBS_List

#the loop below is called once for each volume in $ebs_backup_list - the currently selected EBS volume is passed in as "ebs_selected"

for ebs_selected in $ebs_backup_list
do
ec2_snapshot_description="ec2ab_${ebs_selected}_$date_current"
ec2_create_snapshot_result=`ec2-create-snapshot --region $region -d $ec2_snapshot_description $ebs_selected 2>&1`
if [[ $? != 0 ]]
then echo -e "An error occurred when running ec2-create-snapshot. The error returned is below:\n$ec2_create_snapshot_result" 1>&2 ; exit 70
else
ec2_snapshot_resource_id=`echo "$ec2_create_snapshot_result" | cut -f 2`
echo "Snapshots successfully created for volume $volumeid" | mailx -s "Snapshots successfully created for $volumeid" eric@mydomain.com
fi

create_EBS_Snapshot_Tags
done

#if purge_snapshots is true, then run purge_EBS_Snapshots function
if $purge_snapshots
then echo "Snapshot Purging is Starting Now."
purge_EBS_Snapshots
fi



cron log



Oct 23 10:24:01 ip-10-130-153-227 CROND[28214]: (root) CMD (root (ec2-automate-backup.sh -v "vol-myvolumeid" -k 3;))
Oct 23 10:24:01 ip-10-130-153-227 CROND[28215]: (root) CMD (date >> /root/logs/crontab.log;)
Oct 23 10:25:01 ip-10-130-153-227 CROND[28228]: (root) CMD (date >> /root/logs/crontab.log;)
Oct 23 10:25:01 ip-10-130-153-227 CROND[28229]: (root) CMD (date >> /root/logs/crontab2.log)
Oct 23 10:26:01 ip-10-130-153-227 CROND[28239]: (root) CMD (date >> /root/logs/crontab.log;)
Oct 23 10:27:01 ip-10-130-153-227 CROND[28247]: (root) CMD (root (ec2-automate-backup.sh -v "vol-myvolumeid" -k 3;))
Oct 23 10:27:01 ip-10-130-153-227 CROND[28248]: (root) CMD (date >> /root/logs/crontab.log;)
Oct 23 10:28:01 ip-10-130-153-227 CROND[28263]: (root) CMD (date >> /root/logs/crontab.log;)

Oct 23 10:29:01 ip-10-130-153-227 CROND[28275]: (root) CMD (date >> /root/logs/crontab.log;)
Oct 23 10:30:01 ip-10-130-153-227 CROND[28292]: (root) CMD (root (ec2-automate-backup.sh -v "vol-myvolumeid" -k 3;))
Oct 23 10:30:01 ip-10-130-153-227 CROND[28293]: (root) CMD (date >> /root/logs/crontab.log;)
Oct 23 10:30:01 ip-10-130-153-227 CROND[28294]: (root) CMD (date >> /root/logs/crontab2.log)
Oct 23 10:31:01 ip-10-130-153-227 CROND[28312]: (root) CMD (date >> /root/logs/crontab.log;)
Oct 23 10:32:01 ip-10-130-153-227 CROND[28319]: (root) CMD (date >> /root/logs/crontab.log;)
Oct 23 10:33:01 ip-10-130-153-227 CROND[28325]: (root) CMD (date >> /root/logs/crontab.log;)
Oct 23 10:33:01 ip-10-130-153-227 CROND[28324]: (root) CMD (root (ec2-automate-backup.sh -v "vol-myvolumeid" -k 3;))
Oct 23 10:34:01 ip-10-130-153-227 CROND[28345]: (root) CMD (date >> /root/logs/crontab.log;)
Oct 23 10:35:01 ip-10-130-153-227 CROND[28362]: (root) CMD (date >> /root/logs/crontab.log;)

Oct 23 10:35:01 ip-10-130-153-227 CROND[28363]: (root) CMD (date >> /root/logs/crontab2.log)


Mails to root



From root@ip-10-130-153-227.ap-southeast-1.compute.internal  Tue Oct 23 06:00:01 2012
Return-Path:
Date: Tue, 23 Oct 2012 06:00:01 GMT
From: root@ip-10-130-153-227.ap-southeast-1.compute.internal (Cron Daemon)
To: root@ip-10-130-153-227.ap-southeast-1.compute.internal

Subject: Cron root ec2-automate-backup.sh -v "vol-myvolumeid" -k 3
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
Status: R


/bin/sh: root: command not found


Update
Apparently, undx is right about cron having a limited environment (X-Cron-Env).
The backup script is located in /usr/local/bin/, which is not included in cron's PATH.
This opens up a new horizon for me, as I assumed cron simply executes the shell script the same way the script executes itself. Off to making the AWS CLI Tools work with cron now.



As Colin Johnson has updated his script, and as some of you pointed out, there was actually no need for me to edit the script. I simply needed to use it properly (with a proper understanding of cron and a bit of the AWS CLI Tools). It doesn't make sense for an amateur like myself to attempt to change an awesome script by a guru like Colin, either.




The backup script has worked beautifully for me ever since. Strongly recommend AWS Missing Tools to everyone.


Answer



On a default UNIX-like system, cron has a minimal environment defined.
Usually HOME, SHELL, LOGNAME are defined and PATH is set to /bin.
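
A quick way to see exactly what environment your jobs get is a throwaway crontab entry that dumps it to a file (remove it once you've had a look):

* * * * * env > /tmp/cron-env.txt 2>&1

Comparing that file with the output of env in your login shell usually makes the missing PATH entries obvious.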



You have two solutions:




  • enter the full path of your script, e.g. /home/me/bin/ec2-automate-backup.sh

  • alter the PATH environment variable.




If you want to receive email from the cron daemon define the MAILTO variable.



MAILTO=me@example.org
PATH=/bin:/home/me/bin
*/5 * * * * ec2-automate-backup.sh -v "vol-fb2fbcdf" -k 3
# OR
*/5 * * * * /path/to/script/ec2-automate-backup.sh -v "vol-fb2fbcdf" -k 3


smartctl - e2fsck found no errors, but S.M.A.R.T. self test fails



I have an external Freecom HDD (Samsung drive inside), connected via USB and using its own power supply.



The disk disconnects itself at random intervals (anywhere from a few hours up to a month). I tend to blame the operating system, because the same drive had no problems working when connected to a USB port on a TP-Link router.



Anyway, just to be sure, I performed an extended SMART self-test using smartctl, and it completed with a "Completed: read failure 30%" message.
So I performed an additional test using e2fsck. It took a whole night to run on this 1.5TB drive, and it completed with no errors at all.



I am pretty confused - should I trust the SMART self-test or the e2fsck results? Also, the SMART health status is 'PASSED' and the short self-test is fine, too.

The usual suspects have been checked - the USB cable has been replaced with a new one and the external power supply is fine.
Ideas?
Should I buy a new drive, or am I safe? Is SMART or e2fsck the more reliable indicator of health?


Answer



The SMART result means that the hard drive is failing; it is very likely to fail completely, and soon, and you should retire it as a matter of urgency. The fact that e2fsck returns no errors means that the incipient failures have not yet corrupted your data (or, to be more precise, have not yet corrupted the file system which houses your data: e2fsck doesn't check every bit of the data).



You may find, when you copy all the data off that drive - which you should do today - that you can read all the data. This means that the blocks which have so far failed and are unreadable do not hold any of the data; they are just unallocated blocks. The emptier the FS, and the fewer the failures, the more likely you are to get away with it.



You may also find that the copying tool fails on reading one or more blocks which make up a file. If this happens, you'll have to shrug, and regard that file as corrupted. You'll also need to use a tool that is tolerant of block read errors and won't just stop dead when it hits the first one. I prefer dumpe2fs, but I'm an ancient relic.
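
One read-error-tolerant option (not the tool named above, just a commonly used one) is GNU ddrescue; a minimal sketch, assuming the failing disk shows up as /dev/sdb and there is room for the image somewhere else:

# image the whole drive, skipping unreadable sectors; the map file lets re-runs resume where they left off
ddrescue -d -r3 /dev/sdb /mnt/spare/failing-disk.img /mnt/spare/failing-disk.map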




However you slice and dice it, the famous Google paper is clear: smartctl errors are a strong predictor of imminent failure. Get your data off that drive today and, if at all possible, take it out of service. And if it turns out you get it all back, consider buying a lottery ticket: you're a lucky person!


Monday, May 30, 2016

redirect - NGINX URL with parameter rewrite to URL without parameter




I have NGINX running on my server. I cannot figure out why my rules don't work.



How can I redirect or rewrite it so I could open a URL with the same content, for example:



http://www.example.com/my_dir/value/


If I have this URL, for example:



http://www.example.com/my_dir/filename.php?parameter=value



I tried some solutions but they didn't work:



Case #1:
location /my_dir/ {
    if ($args ~* "filename.php/?parameter=value1") {
        rewrite ^ http://www.example.com/my_dir/$arg_param1? last;
    }
}


Case #2:
rewrite ^/my_dir/(.*)$ /filename.php?parameter=$1 last;

Case #3:
location /my_dir {
    try_files $uri $uri/ my_dir/filename.php?$args;
}

Case #4:

location /my_dir/ {
    rewrite ^/my_dir/filename.php?parameter=(-+([a-zA-Z0-9_-]+))?\$ /filename.php?parameter=$1 last;
}


Is that possible? Do I have to configure my PHP file to get it working?



Any ideas?



Thanks!



Answer



Visit: domain.com/folder/abc
--> domain.com/folder/filename.php?url=abc



# this goes inside the http block
map $request_uri $request_basename {
    ~/(?<captured_request_basename>[^/?]*)(?:\?|$)    $captured_request_basename;
}


server {
    [snip]
    location /folder {
        try_files $uri $uri/ /folder/filename.php?url=$request_basename;
    }
}
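
Assuming that configuration is loaded, a quick sanity check from the shell (host and path taken from the example above) should return whatever filename.php serves for url=abc rather than a 404:

curl -I http://domain.com/folder/abc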

Sunday, May 29, 2016

vmware esxi - Are there any benefits to using a Distributed vSwitch for iSCSI?



I am designing our vSphere farm - we'll be migrating from ESX 3.5 to 4.1. I plan to set up a new farm using ESXi 4.1 and move the Virtual Machines from the 3.5 farm into it by shutting them down, then importing.




In ESX 3.5 there is no distributed networking, so each host has a vSwitch connected to my SAN NICs, and a port group for the vmkernel.



In vSphere (ESXi 4.1) I have the extra option to set up a distributed vSwitch and distributed port groups for vmkernel to access iSCSI storage.



Is there any benefit to this, or should I stick to non-distributed networking for iSCSI?


Answer



Not the answer you want to hear but we've had so many problems with distributed switches, even with 4.1, that we don't use them at all, let alone for iSCSI. As for benefits, none leap to mind.


Hot swapping a dead drive with hardware raid on linux



I have a server with 4 SATA hot-swappable drives and a 3Ware 9650SE-4LPML hardware RAID controller.



The server runs Ubuntu 10.04.3 LTS and I use tw_cli to control the RAID array.



So, a disk died and after a reboot the controller had kicked it out of the array:




# tw_cli /c0 show    

Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-5 DEGRADED - - 64K 5587.9 RiW ON

VPort Status Unit Size Type Phy Encl-Slot Model
------------------------------------------------------------------------------
p0 OK u0 1.82 TB SATA 0 - ST2000DL003-9VT166

p1 OK u0 1.82 TB SATA 1 - ST2000DL003-9VT166
p3 OK u0 1.82 TB SATA 3 - ST2000DL003-9VT166


A replacement disk is arriving today.



My question is: can the system administrator just replace the drive, or do I need to run some commands first to tell the array I'm changing the drive over?



Also, what commands should I run afterwards for it to see the drive needs to be re-added?
The man page for tw_cli says the following:




   /cx rescan [noscan]
This command instructs the controller to rescan all ports and reconstitute all units. The controller will
update its list of ports (attached disks), and visits every DCB (Disk Configuration Block) in order to re-
assemble its view and awareness of logical units. Any newly found unit(s) or drive(s) will be listed.
noscan is used to not inform the OS of the unit discovery. Default is to inform the OS.


Does that sound like what I should do?




Thanks in advance.


Answer



Since the controller has already evicted the drive, it is no longer part of the physical array.



This means you can safely swap it out for a new one.



You should run /c0 rescan after plugging in the new drive, followed by /c0 show; you should see the new drive mentioned as a Spare.



Then you can give the command to rebuild (with the default configuration settings this will happen automatically).
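
A hedged sketch of that sequence (the rebuild step is only needed if it doesn't start on its own, and the exact syntax depends on your tw_cli version, so check man tw_cli):

tw_cli /c0 rescan    # pick up the newly inserted drive
tw_cli /c0 show      # the new drive should now be listed, ideally as a spare
# if the rebuild doesn't kick off automatically, start it per "man tw_cli",
# typically something along the lines of: tw_cli /c0/u0 start rebuild disk=p2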


php - crontab - how to create in Terminal?

I'm trying to run a scheduled task on my shared Linux server using the crontab via Terminal on my Mac.



I can login to my ssh OK, and can view directories etc with ease.



I try and create/edit my crontab by using...



crontab -e




But i get the response



no crontab for [username] - using an empty one



So I then try to add my line to set my schedule up, and Terminal just doesn't seem to respond.



10 * * * * /home/username/www/myphpfile.php



Any clues what on earth I'm doing wrong? Or a link to a clear step-by-step tutorial?




I have checked with the webhost that this is supported, it's just undocumented!



All I want to do is run a php script every so often through the day (to check for updates to an XML file).
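
In case it helps anyone hitting the same wall: crontab runs the entry as a plain command, so unless the script is executable and has a shebang, the interpreter usually has to be spelled out. A hedged example (the php binary path and the log location are assumptions):

*/30 * * * * /usr/bin/php /home/username/www/myphpfile.php >> /home/username/cron.log 2>&1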

Saturday, May 28, 2016

freenas - Samba Read-only shadow copies

We are currently running a FreeNAS 11.1-U6 machine (which runs Samba 4.7.0) that we use for development. Since files sometimes get overwritten or deleted by accident before they are committed to our VCS, I have set up automatic snapshots every 15 minutes.



To make it easier for users to restore those files themselves, instead of having to contact me, I wanted to expose them using shadow copies. Unfortunately this means that they can restore an entire directory to a previous state, which I want to prevent.



My question is: is there a way to disable the "Restore" option server-side and only allow viewing of these copies?

domain name system - DNS Problems with .pt configuration

I have a hosting service with aplus.net; however, I needed to register a .pt domain, which aplus doesn't offer, so I contacted a .pt registrar called hostingbug.net to do this.



So now I'm the owner of a .pt domain, click.pt. I gave hostingbug the aplus nameservers needed for propagation.



And here began the problems. When hostingbug tried to configure, the following error was displayed:




<<>> DiG 9.3.6-P1-RedHat-9.3.6-4.P1.el5_4.2 <<>> @64.29.151.221 click.pt. NS +norecurse
(1 server found)
global options: printcmd
connection timed out no servers could be reached


And they told me that aplus.net needed to create a new dns zone for .pt domains.



So I contacted aplus.net, and they didn't understand the issue, told me that everything was fine with their servers, and sent me back to hostingbug.




So I'm feeling like a ping-pong ball right now... How can I configure this "new DNS zone" for .pt domains? Does anyone have a clue how to do this so I can tell them? Or should I cancel the aplus services?
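
For anyone debugging the same situation, the usual first step is to query the delegation and the listed nameservers directly (ns1.aplus.net below is a placeholder for whichever nameservers aplus actually handed out):

dig click.pt NS +trace                      # walk the delegation down from the root and .pt servers
dig @ns1.aplus.net click.pt SOA +norecurse  # check whether the listed NS answers for the zone at all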



Thanks in advance

Friday, May 27, 2016

windows server 2003 - things to check prior to moving all FSMO roles to a new domain controller before decommission original old domain controller



I need to move a domain controller, the first in the forest and the one that holds all the FSMO roles, to another location in a client's building. This will require turning off this DC; call it dc1 for this question. I want to transfer the FSMO roles to a new domain controller; call the new one dc2. dc2 is already on the network and has been promoted to a domain controller, its DNS settings are configured, and it is also a global catalog (GC). The work is scheduled to be done after hours, and I am also planning on moving the DHCP server to dc2.



I am looking for a best-practices checklist of things to verify prior to moving the FSMO roles and turning dc1 off. As far as I know there are no issues with replication between the DCs. My biggest worry is that if I turn dc1 back on after moving it, I could run into hardware or boot issues; I would rather move the FSMO roles to a known-good box that is a few months old (dc2) than keep relying on a 5-year-old box (dc1). This is part of my migration strategy too.



Thanks for the help.


Answer




Some of these tips are just general AD health checks.




  1. Run dcdiag on both domain controllers to ensure everything is clean.

  2. Verify that the FSMO roles are where you assume they are. (KB234790)

  3. Look through Active Directory Sites & Services and confirm that you only see the servers and sites you expect to be there.

  4. Ensure that your migration target (dc2) is a global catalog server.

  5. Look through DNS to ensure that both domain controllers are properly registered, and there are no extra records lying around, especially in _msdcs.

  6. If you are handling DHCP with Windows Server, you should deauthorize the original server before demoting it with dcpromo.




Why do you want to turn dc1 off? If there is only a single domain controller in your domain and it fails, you'll have a big problem on your hands. Consider leaving dc1 running as backup.


linux - Force SFTP/SCP to copy files with a remote directory's permission



I am having a problem with SFTP and SCP where copied files are not inheriting the permissions of the remote parent directory. I have seen similar questions on Server Fault where the umask of the SFTP/SCP session is modified; however, that does not necessarily solve my issue, as some directories will need to have different permissions than others. Thus, I do not want to have a default umask set.



Thus, I want to force the copied file to have the permissions that are set by the parent directory on the remote system. Basically, I want SCP/SFTP to work the same way that cp works without the -p option. Currently SFTP/SCP is mimicking cp -p behavior.




Here is what I want to have happen:



1.) User wants to copy file foo.txt with permissions:




-rw-------. 1 user user 0 Feb 29 09:08 foo.txt


2.) User uses SCP to copy foo.txt to the server under directory /bar. /bar has permissions (setgid is set):





drwxrws---+ 3 root usergroup 4096 Feb 28 12:19 bar


3.) /bar has the following facl's set:




user::rwx
group::rwx
group:usergroup:rwx

default:user::rwx
default:group::rwx
default:group:usergroup:rw-


4.) foo.txt should have the following permissions (and facl):



-rw-rw----+ 1 user usergroup     0 Feb 29 09:33 foo.txt
user::rw-
group::rwx #effective: rw-

group:usergroup:rw-


5.) Instead, foo.txt has permissions:



-rw-------+ 1 user usergroup     0 Feb 29 09:36 foo.txt
user::rw-
group::rwx #effective:---
group:usergroup:rw- #effective:---



Is there an easy way to get the file to end up with the expected permissions above?



Also, do my facl's make sense, or are they redundant?



EDIT: Fixed post to display properly. (Serverfault's code and numbering doesn't work too well. I needed to wrap things in pre tags.)


Answer



From the man page: "When scp copies files, the destination files are created with certain file attributes. By default, the file permissions adhere to a umask on the destination host, and the
modification and last access times will be the time of the copy. Alternatively, you can tell scp to duplicate the permissions and timestamps of the original files. The -p option accomplishes this."




For example:



sftp user@server:backup <<< $'put -rp mysoftware/mysqldump'


Based on the info above, I am not sure what you want is possible without using a cron job to set permissions. The umask option only applies to files being created, and setgid only applies to the group. I am sure you could write a job that sets the permissions recursively (a sketch follows), but that is all I can think of that would result in what you described, unless I misunderstood the question.
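
If you do go the cron job route, here is a minimal sketch of such a job (directory and group taken from the question; the schedule is arbitrary):

# re-apply the group ACL under /bar every 5 minutes so freshly uploaded files
# end up group-accessible regardless of the uploader's umask
*/5 * * * * find /bar -type f -exec setfacl -m g:usergroup:rw- {} +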


Thursday, May 26, 2016

linux - All HTTP connections refused from public IP yet all others are fine?



I wasn't sure how to describe this problem in the title, so I'll give you a little background on it.



I have a machine running Ubuntu 12.10 Server with Apache 2.2.22 (all packages at latest release version). It has been running very smoothly for about 7 months now, and I've only encountered minor problems with it every so often, mostly as a result of my idiocy.




Last night, I was bored and decided to try out Subterfuge, to see whether or not I could make it work. I was well aware of the fact that its web interface could interfere with my Apache setup, so I installed it and fired it up on TCP port 81. I then immediately checked my sites, and they were all still running, meaning that subterfuge was not interfering with Apache. I started it up and did some scanning, intermittently checking to make sure that my sites were still up; which they were.



After I finished messing around with Subterfuge, I stopped it and tried to CTRL + C the subterfuge process (which was 'attached' to my SSH session). It wouldn't quit, no matter how many times I hit CTRL+C. I closed my SSH session, and logged back in. When I logged back in, all was well and good, but I noticed that my sites were no longer responding (giving a 'connection refused error'). I didn't try to fix it, I just went to bed, figuring that the problem might resolve itself.



Fast forward to this morning. Sites still weren't responding, and I SSH'd into the server to check things out. When I logged in it told me that there were 3 zombie processes, which I then saw when I opened htop. The zombies were all subterfuge processes. I quit them normally (using SIGTERM in htop), and they went away like good little zombies. My sites still weren't responding to connections.



At this point I assumed that this was a problem with my router configuration, so I logged in. I changed the port that was forwarded to an HTTP alt, then an arbitrary private one. That still didn't solve the problem.



A summary of how things stand right now:





  • The server is responding to all other types of connections (SSH, HTTPS, VNC)

  • The server won't respond to HTTP requests from the Internet.

  • The server will respond to HTTP requests from the local network.

  • The server will respond to HTTPS requests from the Internet.

  • An Nmap scan shows port 80 as 'open' when scanning from the Internet.

  • IPTABLES, ufw, etc. are all disabled.

  • I've rebooted the router

  • Another server on the network responds to requests




UPDATE:



I haven't changed any Apache configurations, but now, when you visit the server from a browser, it responds with the default "It works!" page, saying that there is no content in the document root. All of the files that were there before are definitely still there, so I'm about to look into the possibility of permissions problems (or the document root was somehow changed). At this point, it's probably just the Chinese hacking me.



Can anyone think of anything? Thank you for your time in advance, I know this was a very long question.


Answer



In the end, I was unable to solve this problem. This system actually ended up bringing our network to a grinding halt; I think it's infected with some malware. I shut the machine down (after 57 days of uptime), unplugged it from the network, and have yet to bring it back up. I'm currently trying to work out a solution on some really old machines involving XenServer or ESXi. Thanks for the suggestions, but I think this machine has a much bigger problem.


redhat - How to diagnose very bad and slow ext3 behavior?

I'm managing an old admin server running Red Hat WS4 update 3, and we have an ext3 volume mounted on /opt holding a large (30GB) sqlite database.



Every time I run large queries/inserts against this database, it drives I/O waits so high that we cannot log in to the server anymore, nor sudo to another user, nor edit a crontab file (vi never quits).



I'm replacing sqlite with mysql, and while backing up the 19GB mysql directory I ran into the same problem.



Note that these operations are done as a regular user.
The server is a ProLiant DL385 G1 with kernel 2.6.9-34.ELsmp in 64-bit.




I'm now considering remounting the volume as ext2 to see if journaling is the source of my problem, but I honestly don't really know what to check next.



Every serious file copy ends up blocking the server for other users trying to log on, and the server gets back to normal once the copy ends.



I need pointers on where to look next to explain such behavior (an old disk getting slower? a bad kernel with a known bug? corrupt journaling triggering thousands of superfluous reads/writes? etc...).
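
As a starting point, here is what could be captured while one of the stalls is happening (iostat comes from the sysstat package; both tools are available on RHEL4):

# per-device utilisation, queue size and await, refreshed every 5 seconds
iostat -x 5
# the 'wa' column shows how much CPU time is stuck waiting on I/O
vmstat 5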



Thanks in advance.

remote access - Why can't I remotely connect to MySQL running on my Azure VPS?



I've installed MySQL 5.5 on a Microsoft Azure VPS (2012 Server). I created a user account, told it to allow any host and created an EndPoint on port 3306. I can remote into the VPS using RDP and connect locally just fine.



I also allowed the MySQL installer to create the needed firewall rules in Windows and it appears to be valid (allowing incoming TCP to port 3306).




What am I missing?



The error is "Can't connect to MySQL Server (4) 2003"



And, I can't seem to find the MySQL log files.



Also, this is a dump of my mysql.user table:



Host        User     Password
-------------------------------------------------------------
localhost   root     *0735EF2BBDF0D50FB780E9B58198D7260991E311
127.0.0.1   root     *0735EF2BBDF0D50FB780E9B58198D7260991E311
::1         root     *0735EF2BBDF0D50FB780E9B58198D7260991E311
%           cbmeeks  *3CA067E806B5BB2DF87A89D517BBAF80DD22C27A


Neither cbmeeks nor root works remotely.


Answer



You need to have host-specific entries in the mysql.user table.




mysql> use mysql;
Database changed
mysql> select Host,User,Password from user;
+------------+------+-------------------------------------------+
| Host | User | Password |
+------------+------+-------------------------------------------+
| localhost | root | *62395BB52702DE50773EBF629DD4AE90F07FFD94 |
| sgeorge-mn | root | |
| 127.0.0.1 | root | *62395BB52702DE50773EBF629DD4AE90F07FFD94 |

| ::1 | root | |
| localhost | | |
| sgeorge-mn | | |
| localhost | suku | *EAF5C8242B88A14545BB61062D64CA5207DD1A37 |
+------------+------+-------------------------------------------+
7 rows in set (0.01 sec)


So, you need to create such a user with permission for remote access; a sketch follows.
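
A minimal sketch of doing that from a shell on the VPS (the password and the privilege scope are placeholders that should be tightened to what you actually need):

mysql -u root -p -e "CREATE USER 'cbmeeks'@'%' IDENTIFIED BY 'change_me'; GRANT ALL PRIVILEGES ON yourdb.* TO 'cbmeeks'@'%'; FLUSH PRIVILEGES;"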




[ADDED] If you already have such a user, then try following the steps above.


active directory - Domain Controller DNS Best Practice/Practical Considerations for Domain Controllers in Child Domains



I'm setting up several child domains in an existing Active Directory forest and I'm looking for some conventional wisdom/best practice guidance for configuring both DNS client settings on the child domain controllers and for the DNS zone replication scope.




Assuming a single domain controller in each domain and assuming that each DC is also the DNS server for the domain (for simplicity's sake) should the child domain controller point to itself for DNS only or should it point to some combination (primary VS. secondary) of itself and the DNS server in the parent or root domain? If a parent>child>grandchild domain hierarchy exists (with a contiguous DNS namespace) how should DNS be configured on the grandchild DC?



Regarding the DNS zone replication scope, if storing each domain's DNS zone on all DNS servers in the domain then I'm assuming a DNS delegation from the parent to the child needs to exist and that a forwarder from the child to the parent needs to exist. With a parent>child>grandchild domain hierarchy then does each child forward to the direct parent for the direct parent's zone or to the root zone? Does the delegation occur at the direct parent zone or from the root zone?



If storing all DNS zones on all DNS servers in the forest does it make the above questions regarding the replication scope moot? Does the replication scope have some bearing on the DNS client settings on each DC?


Answer



I'd rather go with a single domain using your two servers for redundancy than to use two separate domains on single (point of failure) servers. What is driving your choice to go with a parent/child domain forest? You could just use the DNS space for the child domain since you said it's contiguous without requiring an AD domain unless you have security boundary concerns.



Against my better judgement, I'll answer the question assuming you have two servers for each domain (four total) -- just subtract two of the servers for your case.




Option 1.



With your desire to keep DNS local to the domain, the parent DCs point to one another and the child DCs point to one another as well. The easier configuration would be to use a scope that replicates the DNS zone forest-wide.



You have parent.local (or whatever) as your top-level and child.parent.local as the subdomain.



AD will replicate both domains throughout the forest making DNS resolution simple. You'll see overlap on a given DNS server with the zones, but Windows deals with that.



Option 2.




Another option is to not do forest-wide replication in which case I would simply configure a forwarder on the child DCs to send everything up to the parent DCs DNS, but on the parent you'll need to create a delegation for the child subdomain back down.


Wednesday, May 25, 2016

apache 2.2 - Repeated requests on our server?



I encountered something strange in the access log of our Apache server which I cannot explain. Requests for web pages that I or my colleagues make from the office's Windows network get repeated by another IP (one we don't know) a couple of seconds later.



The user agent repeating our requests is





Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET
CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR
3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)




Has anyone an idea?



Update:
I've got some more information now.





  • The referrer of the duplicate is set to the URL I requested before, and it's not the exact same request, as the protocol version is changed from 'HTTP/1.1' to 'HTTP/1.0'.

  • It's not just one IP; the requests come from a whole subnet (80.40.134.*).

  • It's only the first request to a resource that gets repeated, so it seems the "spy" is building up some kind of cache of visited places.

  • The repeater is also picky. I tried random URLs with different HTTP status codes and different file patterns. 301s and 200s are redone, 404s are not. Image extensions seem to be ignored.



While doing my tests I discovered that this behavior seems to be common as I found other clients visiting just after the first requests:





66.249.73.184 - - [25/Oct/2012:10:51:33 +0100] "GET /foobar/ HTTP/1.1" 200 10952 "-" "Mediapartners-Google"



50.17.125.180 - - [25/Oct/2012:10:51:33 +0100] "GET /foobar/ HTTP/1.1" 200 41312 "-" "Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)"




I wasn't aware of this practice, so I don't see it as much of a threat anymore. I still want to find out who this is, so any further help is appreciated. I'll check later whether this also happens when I query some other server where I have access to the access logs, and will update here then.


Answer



After some digging, I was able to determine that accesses from 80.40.134.* originated from TalkTalk Virus Alerts. This ISP is monitoring its users' web traffic and scanning the pages its users visit for viruses/malware.



Mediapartners-Google is just Google AdSense. You placed Google ads on your page, so Google is reading the page text in order to provide ads targeted to the content.




The final example you gave is self-documenting; try visiting the URL given.


Tuesday, May 24, 2016

Remove 1 Disk From 4 Disk RAID 5 Array



I'm using a PERC 3/DC controller to run a RAID 5 array using 4 hard disks.
I am hoping to change this to 3 disks in the array and 1 hot spare.
Is it possible to remove 1 disk from the array, reconfigure it as a hot spare, then reconfigure the RAID 5 array to use 3 disks WITHOUT losing any data?
I have backups but I would rather just reconfigure it without going through the hassle of restoring data.
Thanks!


Answer



No, and if you have 4 disks and want two spares you are better off performance wise with a 4 disk RAID 10 instead of a 3 disk RAID 5 + hot spare. Either way your array will have 2 disks worth of usable capacity but the RAID 10 will have better write speeds and takes less time to recover from a failed disk.




Assuming the RAID 5 doesn't contain your boot volume, I'd go ahead and backup the data, reconfigure to a RAID 10, and restore the data.



RAID Level Comparison for arrays up to 8 drives.

Features                 RAID 0   RAID 1   RAID 5   RAID 6   RAID 10
Min Drives               2        2        3        4        4
Data Protection          0        1        1        2        2 (up to one disk failure in each sub-array)
Read Performance         High     High     High     High     High
Write Performance        High     Medium   Low      Low      Medium
Degraded Read Perf       N/A      Medium   Low      Low      High
Degraded Write Perf      N/A      High     Low      Low      High
Capacity Utilization %   100      50       67-87    50-75    50

Monday, May 23, 2016

centos7 - I've updated CentOS 7 and now nginx doesn't start automaticaly on system boot




I've updated my server's OS (CentOS 7) through the command "yum update" and now NGINX doesn't start during system boot.



At startup the log shows:



Sep 15 05:41:30 server_hostname nm-dispatcher: req:1 'hostname': start running ordered scripts...
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7261] dhcp-init: Using DHCP client 'dhclient'
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7261] manager: rfkill: WiFi enabled by radio killswitch; enabled by state file
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7261] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7262] manager: Networking is enabled by state file

Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7264] Loaded device plugin: NMBondDeviceFactory (internal)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7264] Loaded device plugin: NMBridgeDeviceFactory (internal)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7265] Loaded device plugin: NMDummyDeviceFactory (internal)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7265] Loaded device plugin: NMEthernetDeviceFactory (internal)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7265] Loaded device plugin: NMInfinibandDeviceFactory (internal)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7265] Loaded device plugin: NMIPTunnelDeviceFactory (internal)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7265] Loaded device plugin: NMMacsecDeviceFactory (internal)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7295] Loaded device plugin: NMMacvlanDeviceFactory (internal)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7295] Loaded device plugin: NMTunDeviceFactory (internal)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7295] Loaded device plugin: NMVethDeviceFactory (internal)

Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7295] Loaded device plugin: NMVlanDeviceFactory (internal)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7295] Loaded device plugin: NMVxlanDeviceFactory (internal)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7351] Loaded device plugin: NMWifiFactory (/usr/lib64/NetworkManager/libnm-device-plugin-wifi.so)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7377] Loaded device plugin: NMTeamFactory (/usr/lib64/NetworkManager/libnm-device-plugin-team.so)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7389] device (lo): link connected
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7396] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7413] manager: (dummy0): new Dummy device (/org/freedesktop/NetworkManager/Devices/2)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7428] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/3)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7448] device (eth0): state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Sep 15 05:41:30 server_hostname kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready

Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7474] device (eth0): link connected
Sep 15 05:41:30 server_hostname kernel: 8021q: adding VLAN 0 to HW filter on device eth0
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7500] manager: (teql0): new Generic device (/org/freedesktop/NetworkManager/Devices/4)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7550] manager: (tunl0): new IPTunnel device (/org/freedesktop/NetworkManager/Devices/5)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7565] manager: (gre0): new IPTunnel device (/org/freedesktop/NetworkManager/Devices/6)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7603] manager: (gretap0): new Generic device (/org/freedesktop/NetworkManager/Devices/7)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7613] manager: (ip_vti0): new Generic device (/org/freedesktop/NetworkManager/Devices/8)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7623] manager: (ip6_vti0): new Generic device (/org/freedesktop/NetworkManager/Devices/9)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7638] manager: (sit0): new IPTunnel device (/org/freedesktop/NetworkManager/Devices/10)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7648] manager: (ip6tnl0): new Generic device (/org/freedesktop/NetworkManager/Devices/11)

Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7668] manager: (ip6gre0): new Generic device (/org/freedesktop/NetworkManager/Devices/12)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7864] device (eth0): state change: unavailable -> disconnected (reason 'none') [20 30 0]
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7883] policy: auto-activating connection 'System eth0'
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7914] device (eth0): Activation: starting connection 'System eth0' (5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03)
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7920] device (eth0): state change: disconnected -> prepare (reason 'none') [30 40 0]
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7924] manager: NetworkManager state is now CONNECTING
Sep 15 05:41:30 server_hostname NetworkManager[3277]: [1505464890.7942] device (eth0): state change: prepare -> config (reason 'none') [40 50 0]
Sep 15 05:41:30 server_hostname firewalld[3266]: WARNING: ICMP type 'beyond-scope' is not supported by the kernel for ipv6.
Sep 15 05:41:30 server_hostname firewalld[3266]: WARNING: beyond-scope: INVALID_ICMPTYPE: No supported ICMP type., ignoring for run-time.
Sep 15 05:41:30 server_hostname firewalld[3266]: WARNING: ICMP type 'failed-policy' is not supported by the kernel for ipv6.

Sep 15 05:41:30 server_hostname firewalld[3266]: WARNING: failed-policy: INVALID_ICMPTYPE: No supported ICMP type., ignoring for run-time.
Sep 15 05:41:30 server_hostname firewalld[3266]: WARNING: ICMP type 'reject-route' is not supported by the kernel for ipv6.
Sep 15 05:41:30 server_hostname firewalld[3266]: WARNING: reject-route: INVALID_ICMPTYPE: No supported ICMP type., ignoring for run-time.
Sep 15 05:41:30 server_hostname kernel: random: crng init done
Sep 15 05:41:31 server_hostname NetworkManager[3277]: [1505464891.0265] device (eth0): state change: config -> ip-config (reason 'none') [50 70 0]
Sep 15 05:41:31 server_hostname NetworkManager[3277]: [1505464891.0313] device (eth0): state change: ip-config -> ip-check (reason 'none') [70 80 0]
Sep 15 05:41:31 server_hostname NetworkManager[3277]: [1505464891.0323] device (eth0): state change: ip-check -> secondaries (reason 'none') [80 90 0]
Sep 15 05:41:31 server_hostname NetworkManager[3277]: [1505464891.0325] device (eth0): state change: secondaries -> activated (reason 'none') [90 100 0]
Sep 15 05:41:31 server_hostname NetworkManager[3277]: [1505464891.0326] manager: NetworkManager state is now CONNECTED_LOCAL
Sep 15 05:41:31 server_hostname NetworkManager[3277]: [1505464891.0480] manager: NetworkManager state is now CONNECTED_SITE

Sep 15 05:41:31 server_hostname NetworkManager[3277]: [1505464891.0481] policy: set 'System eth0' (eth0) as default for IPv4 routing and DNS
Sep 15 05:41:31 server_hostname NetworkManager[3277]: [1505464891.0493] device (eth0): Activation: successful, device activated.
Sep 15 05:41:31 server_hostname NetworkManager[3277]: [1505464891.0498] manager: startup complete
Sep 15 05:41:31 server_hostname NetworkManager[3277]: [1505464891.0502] manager: NetworkManager state is now CONNECTED_GLOBAL
Sep 15 05:41:31 server_hostname nm-dispatcher: req:2 'up' [eth0]: new request (3 scripts)
Sep 15 05:41:31 server_hostname nm-dispatcher: req:2 'up' [eth0]: start running ordered scripts...
Sep 15 05:41:31 server_hostname nm-dispatcher: req:3 'connectivity-change': new request (3 scripts)
Sep 15 04:33:05 server_hostname systemd: Started Network Manager Wait Online.
Sep 15 04:33:05 server_hostname systemd: Starting LSB: Bring up/down networking...
Sep 15 04:33:05 server_hostname nm-dispatcher: req:3 'connectivity-change': start running ordered scripts...

Sep 15 04:33:05 server_hostname network: Bringing up loopback interface: [ OK ]
Sep 15 04:33:05 server_hostname network: Bringing up interface eth0: [ OK ]
Sep 15 04:33:05 server_hostname systemd: Started LSB: Bring up/down networking.
Sep 15 04:33:05 server_hostname systemd: Reached target Network.
Sep 15 04:33:05 server_hostname systemd: Starting Network.
Sep 15 04:33:05 server_hostname systemd: Reached target Network is Online.
Sep 15 04:33:05 server_hostname systemd: Starting Network is Online.
Sep 15 04:33:05 server_hostname systemd: Starting nginx - high performance web server...
Sep 15 04:33:05 server_hostname systemd: Starting LSB: Start and stop FastCGI processes...
Sep 15 04:33:05 server_hostname systemd: Starting MySQL Community Server...

Sep 15 04:33:05 server_hostname systemd: Starting The PHP FastCGI Process Manager...
Sep 15 04:33:05 server_hostname systemd: Starting Fail2Ban Service...
Sep 15 04:33:05 server_hostname systemd: Starting Dynamic System Tuning Daemon...
Sep 15 04:33:05 server_hostname systemd: Starting Pure-FTPd FTP server...
Sep 15 04:33:05 server_hostname systemd: Starting /etc/rc.d/rc.local Compatibility...
Sep 15 04:33:05 server_hostname systemd: Starting OpenSSH server daemon...
Sep 15 04:33:05 server_hostname nginx: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
Sep 15 04:33:05 server_hostname nginx: nginx: [emerg] bind() to [**My IPv6 address here**]:80 failed (99: Cannot assign requested address)
Sep 15 04:33:05 server_hostname nginx: nginx: configuration file /etc/nginx/nginx.conf test failed
Sep 15 04:33:05 server_hostname systemd: nginx.service: control process exited, code=exited status=1

Sep 15 04:33:05 server_hostname systemd: Failed to start nginx - high performance web server.
Sep 15 04:33:05 server_hostname systemd: Unit nginx.service entered failed state.
Sep 15 04:33:05 server_hostname systemd: nginx.service failed.
Sep 15 04:33:05 server_hostname systemd: Started /etc/rc.d/rc.local Compatibility.
Sep 15 05:41:31 server_hostname spawn-fcgi: Starting spawn-fcgi: [ OK ]
Sep 15 05:41:31 server_hostname systemd: Started LSB: Start and stop FastCGI processes.
Sep 15 05:41:31 server_hostname systemd: Started Dynamic System Tuning Daemon.
Sep 15 05:41:31 server_hostname fail2ban-client: 2017-09-15 05:41:31,760 fail2ban.server [3616]: INFO Starting Fail2ban v0.9.6
Sep 15 05:41:31 server_hostname fail2ban-client: 2017-09-15 05:41:31,761 fail2ban.server [3616]: INFO Starting in daemon mode
Sep 15 05:41:31 server_hostname mysqld_safe: 170915 05:41:31 mysqld_safe Logging to '/var/log/mysqld.log'.

Sep 15 05:41:31 server_hostname mysqld_safe: 170915 05:41:31 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Sep 15 05:41:31 server_hostname systemd: Started Fail2Ban Service.
Sep 15 05:41:32 server_hostname systemd: Started MySQL Community Server.
Sep 15 05:41:32 server_hostname systemd: Started ISPConfig DC Sync.
Sep 15 05:41:32 server_hostname systemd: Starting ISPConfig DC Sync...
Sep 15 05:41:32 server_hostname NetworkManager[3277]: [1505464892.7411] policy: set 'System eth0' (eth0) as default for IPv6 routing and DNS
Sep 15 05:41:33 server_hostname systemd: Started The PHP FastCGI Process Manager.
Sep 15 05:41:33 server_hostname systemd: Reached target Multi-User System.
Sep 15 05:41:33 server_hostname systemd: Starting Multi-User System.
Sep 15 05:41:33 server_hostname systemd: Starting Update UTMP about System Runlevel Changes...

Sep 15 05:41:33 server_hostname systemd: Started Update UTMP about System Runlevel Changes.
Sep 15 05:41:33 server_hostname systemd: Startup finished in 1.741s (kernel) + 3.868s (userspace) = 5.609s.


Curiously, I can start NGINX manually after booting, and it works fine.



CentOS 7.4.1708



NGINX 1.12.1




/etc/systemd/system/multi-user.target.wants/nginx.service



[Unit]
Description=nginx - high performance web server
Documentation=http://nginx.org/en/docs/
After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

[Service]
Type=forking

PIDFile=/var/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t -c /etc/nginx/nginx.conf
ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target

Answer




I found the answer in the Linode forum.



Create the file /etc/sysctl.d/80-network.conf with the following content:



net.ipv4.ip_nonlocal_bind = 1
net.ipv6.ip_nonlocal_bind = 1


...to allow daemons to bind to addresses that are not (yet) configured on a local interface.
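To apply the new settings without a reboot, reloading the sysctl configuration should be enough (a quick sketch, using the file created above; run as root):

sysctl --system    # re-reads /etc/sysctl.d/*.conf, including the new 80-network.conf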




Source: https://forum.linode.com/viewtopic.php?f=19&t=15219


linux - What is the difference between unlink and rm?




Is unlink any faster than rm?


Answer



Both are wrappers around the same fundamental operation: the unlink() system call.



Weighing up the differences between the userland utilities:



rm(1):




  • More options.


  • More feedback.

  • Sanity checking.

  • A bit slower for single calls as a result of the above.

  • Can be called with multiple arguments at the same time.



unlink(1):




  • Less sanity checking.


  • Unable to delete directories.

  • Unable to recurse.

  • Can only take one argument at a time.

  • Marginally leaner for single calls due to its simplicity.

  • Slower when compared with giving rm(1) multiple arguments.



You could demonstrate the difference with:



$ touch $(seq 1 100)

$ unlink $(seq 1 100)
unlink: extra operand `2'

$ touch $(seq 1 100)
$ time rm $(seq 1 100)

real 0m0.048s
user 0m0.004s
sys 0m0.008s


$ touch $(seq 1 100)
$ time for i in $(seq 1 100); do rm $i; done

real 0m0.207s
user 0m0.044s
sys 0m0.112s

$ touch $(seq 1 100)
$ time for i in $(seq 1 100); do unlink $i; done


real 0m0.167s
user 0m0.048s
sys 0m0.120s
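To confirm that both utilities end up making the same system call, you can watch them with strace (a quick check; whether you see unlink() or unlinkat() depends on the coreutils version):

$ touch a b
$ strace -e trace=unlink,unlinkat unlink a
$ strace -e trace=unlink,unlinkat rm b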


If, however, we're talking about an unadulterated call to the unlink(2) system call itself, that is probably not what you were asking about.



On some platforms you can perform a raw unlink(2) on directories as well as files (Linux refuses this with EISDIR). If a directory that is the parent of other directories and files were unlinked this way, the link to that parent would be removed, but the children would be left dangling. Which is less than ideal.



Edit:




Sorry, I have clarified the difference between unlink(1) and unlink(2). Semantics will still differ between platforms.


Sunday, May 22, 2016

networking - CentOS and PXE Boot

I am working on setting up a PXE boot server based upon a minimal installation of CentOS 5.6. I have installed CentOS 5.6 from the first CD using the minimal installation method. Next, I followed the tutorial listed here to get the PXE components installed and running. This all seems to be working correctly. However, I would like to take advantage of using a graphic as part of the boot screen. It is my understanding that vesamenu.c32 is required to do this. However, I am not able to locate this file within the installed packages for CentOS. I have tried in the past to copy over the vesamenu.c32 file from here without success. I was curious how I could get a graphical boot menu set up in this environment.

Can the Intel C600-series SAS controller RAID SATA drives?



(Sorry if this is off-topic - technically this is a home workstation setting, but is regarding mostly-server technology.)




I have a GA-X79S-UP5-WIFI motherboard and four ST3000DM001 drives, and the IRST manager isn't letting me put the drives in a RAID (it lists them, but the option to select the C600-series controller is simply grayed out).



Have I made a mistake? Do I need SAS drives to RAID them on the SAS controller?


Answer



Apparently, it's possible to just create the RAID array on a different controller (e.g. SATA) and plug the drives into the SAS controller's SATA ports. The array will be recognized. I can only explain the inability to create a RAID array on the SAS ports as an IRST bug.


amazon ec2 - AWS - Status check fails when loading AMI created from snapshot

I have a running micro instance using an 8GB EBS that I've customized.



To my understanding, there are two ways I can create an AMI from this.



1) EC2 console -> INSTANCES - Instances -> Right Click instance -> Create Image (EBS AMI)



2) EC2 console -> ELASTIC BLOCK STORE - Volumes -> Right Click Volume -> Create Snapshot, then go to Snapshots, right-click the snapshot and select "Create Image From Snapshot"




When I right-click and select "Launch Instance" from my list of private AMIs, I'm able to successfully launch an instance from the AMI generated by the first method. However, whenever I try to launch an instance from the AMI generated by the 2nd method, the Status Checks show either 1/2 checks passed or 0/2 checks passed.



Why am I unable to launch an instance from an AMI generated from the snapshot?

Prevent data corruption on ext4/Linux drive on power loss



I have some embedded boards running an American Megatrends BIOS with embedded Linux as the OS. The problem I have is that the industrial flash IDE drives get corrupted on power loss. I have them formatted as ext4. Whenever this happens, I can usually fix the flash with fsck, but this will not be possible in our deployments. I have heard that disabling write caching should help, but I can't figure out how to do it. Also, is there anything else I should do?



More Info



The drive is a 4GB IDE flash module.
I have one partition, which is ext4. The OS is installed on that partition and GRUB is my bootloader.




fdisk -l shows /dev/sda as my flash module with /dev/sda1 as my primary partition.



After a power loss I usually cannot make it entirely through the boot init scripts.



When I mount the drive on another PC, I run fsck /dev/sda1. It always shows messages like



"zero datetime on node 1553 ... fix (y)?"



I fix them and it boots fine until the next power loss.



When I get to the office tomorrow, I will post the actual output of fdisk -l



This is all I know about how the system works. I am not a systems guy; I am a software engineer who has a habit of getting into predicaments outside of his job description. I know how to format drives, install a bootloader, write software, and hack on an operating system.



Here is the output from dumpe2fs



#sudo dumpe2fs /dev/sda1
dumpe2fs 1.41.12 (17-May-2010)

Filesystem volume name: VideoServer
Last mounted on: /
Filesystem UUID: 9cba62b0-8038-4913-be30-8eb211b23d78
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: not clean
Errors behavior: Continue

Filesystem OS type: Linux
Inode count: 245760
Block count: 977949
Reserved block count: 48896
Free blocks: 158584
Free inodes: 102920
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 239

Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Fri Feb 4 15:12:00 2011
Last mount time: Sun Oct 2 23:48:37 2011
Last write time: Mon Oct 3 16:34:01 2011
Mount count: 2
Maximum mount count: 26

Last checked: Tue Oct 4 07:44:50 2011
Check interval: 15552000 (6 months)
Next check after: Sun Apr 1 07:44:50 2012
Lifetime writes: 21 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28

Default directory hash: half_md4
Directory Hash Seed: 249d2b79-1e20-49a3-b324-6cb631294a63
Journal backup: inode blocks

Answer



The write cache usually has nothing to do with the BIOS; most BIOSes have no option for switching disk cache settings at all. On Linux, using hdparm -W 0 should help.



The setting is persistent, so if you don't have hdparm to play around with in your production systems, you should be able to disable the disk write cache on a different system and replug the disk.
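As a concrete sketch (the device name is taken from the fdisk output mentioned above):

hdparm -W /dev/sda      # query the current write-cache setting of the drive
hdparm -W 0 /dev/sda    # disable the on-drive write cache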



BTW: I'd second the idea of a non-writable root filesystem (so your system could boot in a kind of "recovery mode" and allow for remote access even if the writable filesystem is not mountable for some reason). And if you can change the hardware design, consider using mtd devices instead of IDE/SATA disks with a flash-aware filesystem like jffs2. We've been using this combination with several embedded devices (mostly VPN router solutions in the field) for several years with good results.




Update: the root of your problem seems to be that you are running an ext4 filesystem with journaling disabled - has_journal is missing from the Filesystem features list. To fix this, shut down all services, check whether anything still has open files using lsof +f -- /, remount your root partition read-only with mount -o remount,ro /, enable the journal with tune2fs -O has_journal /dev/sda1, and set the "ordered" journal mode as the default mount option using tune2fs -o journal_data_ordered /dev/sda1. You will have to re-run fsck (preferably from a rescue system) and remount root / reboot after this operation.
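Put together, the sequence might look roughly like this (device name as above; run the final check from a rescue system if possible):

lsof +f -- /                                # anything still holding files open on the root fs?
mount -o remount,ro /                       # remount the root filesystem read-only
tune2fs -O has_journal /dev/sda1            # add a journal to the filesystem
tune2fs -o journal_data_ordered /dev/sda1   # make data=ordered the default mount option
fsck -f /dev/sda1                           # force a full check before remounting / rebooting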



With these settings in place, the metadata is guaranteed to be recoverable from the journal even in the event of a sudden power failure. The actual data is also written to disk consistently, although on boot you may find that the last few seconds of data written before the power outage are lost. If this is not acceptable, you might consider setting tune2fs -o journal_data /dev/sda1 on your filesystem instead - this includes all data written to disk in the journal, which gives you better data consistency, but at the cost of a performance penalty and a higher wear level on your flash device.


storage area network - Differences between HP and EMC SAN



I've managed Dell's EMC and Equallogic SANs in the past. I was recently put in charge of a Hewlett-Packard P2000 SAN. HP and EMC use slightly different terminology. I'd like to confirm I understand HP's terminology. Can you verify/correct some of the definitions below?



vDisk. A collection of physical disks grouped into a RAID array. EMC refers to these as "Storage Pools." vDisks and Storage Pools aren't exactly the same as "RAID groups," but they serve the same purpose.



Volume. A logical division of a vDisk, which is presented to a host as a single volume. EMC refers to these as "LUNs", and LUN is the proper term.




Storage Groups. Distinct from "Storage Pools," EMC SANs can define "Storage Groups," which can combine multiple LUNs into a single volume that's presented to hosts. I can't find an equivalent to this on HP SANs.



Global Spares. Physical hard drives that are not assigned to any vDisk. If a hard drive in a vDisk fails, the SAN automatically uses an available spare to replace the drive and make the vDisk fault tolerant. EMC refers to these as "hot spares." With both vendors, spares do not need to be assigned to a particular RAID group or vDisk. The SAN will use them for any failed hard drive.



Regarding Equallogic SANs: Equallogic arrays create one RAID group and one LUN from all available disks in the array. The administrator can select only: the type of RAID in the RAID group, and the number and size of Storage Groups presented to the hosts.



I think I have these terms right, but I'd like to verify with someone who's used both vendors' SANs. I'm especially concerned that I can't find HP's equivalent of Storage Groups. Surely HP has a way to combine multiple LUNs into one logical volume. Am I missing that setting somewhere?


Answer



If this P2000 is anything like my old MSA4400, then vDisks are volumes that are assigned to servers as LUNs, but under the covers, what is happening on the HP has little in common with a Clariion.




The way I remember it, the HP has a bunch of disks that it sets up in fixed groups (with RAID, I think), and then vDisks are created that live on these disks with their own virtual RAID. So you could have one vDisk with virtual RAID 10, meaning each block or chunk that comprised the vDisk would be saved twice, and another one with virtual RAID 5, meaning that the chunks would get saved once, but with distributed parity.



I'm a little hazy on the details about the actual disks - whether there was RAID under all these vDisks or just a JBOD. I do remember that we had global spares, because we had several dozen of roughly 100 disks fail over the course of three weeks, and the system was able to take the hits until more than 9 were rebuilding at the same time.



Maybe someone with more recent and specific P2000 experience can chime in here and help, but this is what I remember.


Saturday, May 21, 2016

apache 2.2 - Always get a 403 Forbidden error

I'm getting a 403 Forbidden error on my client's server (which somebody else set up). I'm using Apache on CentOS. Since somebody else set up the server, and since the server is CentOS and not the Ubuntu I'm used to, I don't know how to fix the problem.




Any advice on how to troubleshoot this problem?



Edit: in the logs, I keep getting this series of notices/warnings:



[Mon Feb 06 09:45:43 2012] [notice] caught SIGTERM, shutting down
[Mon Feb 06 09:45:44 2012] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Mon Feb 06 09:45:44 2012] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Mon Feb 06 09:45:44 2012] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Mon Feb 06 09:45:44 2012] [warn] Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Mon Feb 06 09:45:44 2012] [notice] Digest: generating secret for digest authentication ...

[Mon Feb 06 09:45:44 2012] [notice] Digest: done
[Mon Feb 06 09:45:44 2012] [notice] mod_bw : Memory Allocated 0 bytes (each conf takes 32 bytes)
[Mon Feb 06 09:45:44 2012] [notice] mod_bw : Version 0.8 - Initialized [0 Confs]
[Mon Feb 06 09:45:44 2012] [notice] mod_python: Creating 4 session mutexes based on 50 max processes and 0 max threads.
[Mon Feb 06 09:45:44 2012] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Mon Feb 06 09:45:44 2012] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Mon Feb 06 09:45:44 2012] [warn] Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Mon Feb 06 09:45:44 2012] [notice] Apache/2.2.19 (Unix) DAV/2 mod_fcgid/2.3.6 mod_python/3.2.8 Python/2.4.3 mod_ssl/2.2.19 OpenSSL/0.9.8f mod_perl/2.0.4 Perl/v5.8.8 configured -- resuming normal operations



Edit 2: here's my VirtualHost stuff:




ServerAdmin jason@electricsasquatch.com
DocumentRoot /var/www/html

Friday, May 20, 2016

linux - Stuck trying to track down the culprit for high CPU usage on a LAMP server



I'm running Apache on RedHat Enterprise 5.5 with PHP 5.1 from the Rackspace IUS community repository.



Occasionally, I see server load spikes of 8+. top shows that httpd.worker is consuming 120-150% CPU, but the hits don't appear to be coming in that fast. I do notice that our craftysyntax URL seems to keep coming up when that happens.



I also have mod_security installed with one of the core rules sets.



I'm scratching my head about how to figure out what's causing the high CPU usage. My prime suspects are PHP and mod_security.




I have the following oprofile output from a load spike, but it doesn't tell me which process is calling libpcre:



% opreport
CPU: Intel Architectural Perfmon, speed 2793.09 MHz (estimated)

Counted CPU_CLK_UNHALTED events (Clock cycles when not halted)
with a unit mask of 0x00 (No unit mask) count 100000

CPU_CLK_UNHALT...|
samples| %|

------------------
7061182 91.9105 libpcre.so.0.0.1
206901 2.6931 php-cgi
142239 1.8514 mod_security2.so
138121 1.7978 vmlinux
53809 0.7004 libc-2.5.so
20909 0.2722 libapr-1.so.0.2.7
16585 0.2159 oprofiled
9230 0.1201 oprofile



FYI, ldd shows that mod_security is linked to libpcre, but php-cgi is not (weird).


Answer



Attach ltrace to the PID of the live process that's causing the issue. It looks like it's some bad regex, or a regex that gets called a lot. In either case, you have to localize it first. Remember to follow the forks.
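A rough sketch of the idea (the PID below is a placeholder for whichever httpd.worker or php-cgi process is burning CPU at the time):

top -b -n 1 | grep httpd.worker           # find the PID of the busy process
ltrace -f -p <pid> 2>&1 | grep -i pcre    # attach, follow forks, and watch for pcre calls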


Thursday, May 19, 2016

debian - Can't find process using a quarter of memory



I am running a small vServer with Debian 9 (stretch) and 2GB RAM.



For a few months I have been missing about 500MB of RAM, and I cannot find out how it is being used.



When I run free -h



              total        used        free      shared  buff/cache   available
Mem:           2.0G        1.0G        482M         66M        511M        764M
Swap:          1.0G          0B        1.0G


I can see that half of the memory is used, about a quarter is used for caches that can be freed if needed, and the rest is free.



But when I check my running processes, I can only account for about 500MB used by them.



  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                   
1458 mysql 20 0 927516 359260 0 S 0.0 17.5 108:31.26 mysqld

877 seafile 20 0 258032 94644 4876 S 0.0 4.6 0:22.92 python2.7
460 seafile 20 0 237156 85504 8036 S 0.0 4.2 0:13.14 python2.7
463 seafile 20 0 233096 82236 8956 S 0.0 4.0 0:05.79 python2.7
875 seafile 20 0 244356 81644 5408 S 0.0 4.0 0:19.50 python2.7
461 seafile 20 0 232464 81032 8232 S 0.0 3.9 0:03.58 python2.7
4054 www-data 20 0 374264 54976 45128 S 0.0 2.7 0:07.58 php-fpm7.0
4026 www-data 20 0 372652 54840 44408 S 0.0 2.7 0:10.36 php-fpm7.0
1865 seafile 20 0 1704520 52828 16 S 0.0 2.6 3:45.10 seaf-server
4021 www-data 20 0 370836 48880 40468 S 0.0 2.4 0:10.83 php-fpm7.0
1975 seafile 20 0 129128 47156 1944 S 0.0 2.3 0:02.06 python2.7

21106 netdata 20 0 189412 36600 2660 S 0.3 1.8 16:07.50 netdata
1604 lukas 20 0 107132 34860 2736 S 0.0 1.7 2:07.91 gunicorn


I have no clue where the remaining 500MB of memory is.



I suspected the kernel but running slabtop shows it uses only about 80MB.



 Active / Total Size (% used)       : 76599.41K / 79594.40K (96.2%)
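One way to cross-check the per-process total is to sum the resident set sizes directly (only a rough number, since RSS counts shared pages once per process):

ps -eo rss --no-headers | awk '{sum += $1} END {printf "%.0f MB\n", sum/1024}'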



I am running netdata on my server, which shows a nice overview of the memory usage per category, and indeed it shows my missing 530MB. I played around with the grouping and created a new group called testing with the following config (in /etc/netdata/apps_groups.conf), and it includes my missing memory:



testing: systemd*


[netdata screenshot]



Why does systemd (or something netdata categorizes as systemd) use about a quarter of my memory? After a reboot it uses only 50MB but after some time it always uses about 500MB.


Answer




systemd is the process with pid 1. All the other processes are children of systemd.



In netdata, all processes not matched by the groups given are assigned to category other.



Since netdata assigns processes to groups respecting their hierarchy, the match systemd* just moved most processes from other to testing. So systemd* is not really a useful match.



If I were you, I would attempt to understand which applications the server runs and add groups for these specific applications.
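For example, groups tailored to the services visible in the top output above might look something like this in /etc/netdata/apps_groups.conf (the group names and patterns here are illustrative, not a recommendation):

mysql: mysqld*
php: php-fpm*
seafile: seaf-server* python2.7*
gunicorn: gunicorn*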



Additionally, you can enable the systemd charts in netdata. This will allow you to see the Services section in netdata. Depending on the Debian version, a reboot may be required to enable memory reporting for them (you may need to add a kernel boot parameter - check the wiki).


linux - How does my DHCP server know my machine's hostname when I didn't define one in dhclient.conf?

I'm trying to resolve some funky DNS issues related to DHCP on our network (I suspect we have more than one DHCP server running at the moment), and while trying to figure that out, I noticed something strange with a new server I just set up.




The server in question is a Xen virtual machine running Ubuntu 9.10 Server. The physical Xen server is also on our network, and when I booted up the VM for the first time in Xen (I imported it from a local Virtualbox VM running on my machine, where it was running on a different network), it got a DHCP lease from our office network and everything was good.



I checked the dhclient.eth0.leases file to see what got configured, and saw that the old DHCP lease from the previous network the machine was on was still there, as well as the new DHCP lease for the office network it's currently connected to. There are two things I noticed right away:




  1. The old DHCP lease information from the previous network doesn't have an option host-name line, which I take to mean the original VirtualBox version of the VM wasn't sending this option to the DHCP server. Or does this mean the old DHCP server didn't support the DHCP host name option? It was using VirtualBox's internal DHCP server at the time...


  2. The new DHCP lease information does have an option host-name line, which includes the correct, current hostname for the server ("fozzie"). If I understand correctly, this means the server sent its hostname to the DHCP server on our network.





There are a number of things I don't understand about all this.



First, I did not change dhclient.conf for the server at any point; it's using the default configuration. In fact it contains the following line verbatim:



send host-name ""



So my first question is, how in the heck did it know to send the server's real hostname if the configuration isn't set up to send it in the first place?



Second, why did the first DHCP lease (for the old network) not include option host-name, but the second DHCP lease (on the new network) did include it, if I haven't touched any of the configuration files?




All I did was export the original VirtualBox machine as an OVF, and then import it into XenServer, so how did it magically configure my hostname via DHCP if it's not even configured with the actual hostname in dhclient.conf?



Third: When I run hostname, the server returns fozzie.our.domain, but dhclient.eth0.leases says the hostname option was set to fozzie (no domain). How did it know to strip off the domain?

Wednesday, May 18, 2016

networking - SSH client refuses to connect to any server (Connection refused)



Out of nowhere, my SSH client decided to stop connecting to any server (whether it is an SSH server or not) and output the following error:



ssh: connect to host x.x.x.x port 22: Connection refused



Mind you, this is not a simple setup error. I am fully aware that an SSH server is running on the hosts that I try to connect to, as evidenced by the following:




➜  ~ sudo nmap -sS 192.168.0.200
Password:

Starting Nmap 7.01 ( https://nmap.org ) at 2018-04-01 10:58 IST
Nmap scan report for 192.168.0.200
Host is up (0.0035s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh <========================
80/tcp open http

81/tcp open hosts2-ns
443/tcp open https
MAC Address: B8:27:EB:7C:24:64 (Raspberry Pi Foundation)

Nmap done: 1 IP address (1 host up) scanned in 0.47 seconds


I can also telnet into it successfully (no "connection refused" message)



➜  ~ telnet 192.168.0.200 22

Trying 192.168.0.200...
Connected to 192.168.0.200.
Escape character is '^]'.
SSH-2.0-OpenSSH_7.4p1 Raspbian-10+deb9u2


However...



➜  ~ ssh srv@192.168.0.200
ssh: connect to host 192.168.0.200 port 22: Connection refused



This shows that it is not an error with my server, or even my machine per se, but with the SSH client, as it is unable to connect to a port which is clearly open and an SSH service port. Interestingly, it fails with ANY address, regardless of whether it's even reachable or not, which further makes me think this is a problem with my SSH client and not a firewall:



➜  ~ ssh 1.1.1.1
ssh: connect to host 1.1.1.1 port 22: Connection refused
➜ ~ ssh 23.23.23.23
ssh: connect to host 23.23.23.23 port 22: Connection refused
➜ ~ ssh 232.221.231.3
ssh: connect to host 232.221.231.3 port 22: Connection refused

➜ ~ ssh 192.168.0.0
ssh: connect to host 192.168.0.0 port 22: Connection refused
➜ ~ ssh 123.123.0.1
ssh: connect to host 123.123.0.1 port 22: Connection refused


How is this even possible? SSH doesn't even attempt the connection and claims "connection refused". What can possibly be the culprit?



Additional info:




➜  ~ uname -a
Darwin MacBook-Air.local 17.4.0 Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 x86_64

Answer



It turned out to be an easy fix. If there is anyone else going through this, consider the following...



Earlier, I had enabled SOCKS proxying in the macOS Network Preferences > Advanced ... > Proxies > SOCKS Proxy. It looked like this:



[screenshot of the macOS SOCKS proxy settings panel]




Long story short, that proxy server was no longer running, so proxied connections to the loopback address were unsuccessful. The reason Chrome, Telnet, and Nmap worked was that they did not respect the macOS SOCKS proxy setting. Any applications which actually respected these settings (such as ssh) were unable to access the Internet, which is why SSH connections were failing.



I'm not 100% sure why SSH came up with "connection refused" instead of some generic "unreachable" error, but I know that the SOCKS proxy was the reason. If the same happens to you, try checking your proxy settings!
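For what it's worth, the setting can also be checked and switched off from a terminal with networksetup (the "Wi-Fi" service name is just an example; substitute your own network service):

networksetup -getsocksfirewallproxy "Wi-Fi"                   # show the current SOCKS proxy settings
sudo networksetup -setsocksfirewallproxystate "Wi-Fi" off     # turn the SOCKS proxy off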


Tuesday, May 17, 2016

windows server 2003 - List processes with their launching command line



I'm looking for a Windows feature or third-party tool that can produce a list of active processes (as in the task manager) with the command line used to start each process.



e.g. if I launch "php.exe -q script.php" from a command line, I'd like to see the full command in the list while my process is running, and not only "php.exe".



Tasklist, process explorer, taskinfo... can't give this information and/or make it available in a text format. Do you know if such tools/features exist?



Thanks


Answer




wmic process list full /format:htable > wmic_task_list.html


or



wmic process get Name,ProcessId,CommandLine /format:table > wmic_task_list.txt


See wmic process list /format /? or wmic process get /format /? for a list of valid output formats.
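If you only care about a single executable, wmic can also filter on the process name (php.exe here is just an example):

wmic process where "name='php.exe'" get ProcessId,CommandLine /format:list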


linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...