Tuesday, February 28, 2017

hacking - How do I deal with a compromised server?






This is a Canonical Question about Server Security - Responding to Breach Events (Hacking)






Canonical Version
I suspect that one or more of my servers is compromised by a hacker, virus, or other mechanism:




  • What are my first steps? When I arrive on site, should I disconnect the server or preserve "evidence"? Are there other initial considerations?


  • How do I go about getting services back online?

  • How do I prevent the same thing from happening immediately again?

  • Are there best practices or methodologies for learning from this incident?

  • If I wanted to put an Incident Response Plan together, where would I start? Should this be part of my Disaster Recovery or Business Continuity Planning?



Original Version




2011.01.02 - I'm on my way into work at 9.30 p.m. on a Sunday because our server has been compromised somehow and was resulting in a DoS attack on our provider. The server's access to the Internet has been shut down, which means over 500-600 of our clients' sites are now down. Now this could be an FTP hack, or some weakness in code somewhere. I'm not sure till I get there.



How can I track this down quickly? We're in for a whole lot of
litigation if I don't get the server back up ASAP. Any help is
appreciated. We are running openSUSE 11.0.







2011.01.03 - Thanks to everyone for your help. Luckily I WASN'T the only person responsible for this server, just the nearest. We managed
to resolve the problem, although our solution may not apply to others in
different situations. I'll detail what we did.



We unplugged the server from the net. It was performing (or attempting to
perform) a denial-of-service attack on another server in Indonesia,
and the guilty party was also based there.



We first tried to identify where on the server this was coming from.
Considering we have over 500 sites on the server, we expected to be
searching for some time. However, since we still had SSH access, we ran a
command to find all files edited or created around the time the attacks
started. Luckily, the offending file was created over the winter
holidays, which meant that not many other files were created on the
server at that time.
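(Roughly the kind of command meant here; the path and the time window below are placeholders, not the ones actually used.)

# List regular files under the web root modified in the last 10 days,
# with ownership and timestamps, to narrow down recently dropped files.
find /srv/www -type f -mtime -10 -ls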



We were then able to identify the offending file which was inside the
uploaded images folder within a ZenCart website.




After a short cigarette break we concluded that, due to the file's
location, it must have been uploaded via a file-upload facility that
was inadequately secured. After some googling, we found that there was
a security vulnerability in the ZenCart admin panel that allowed files
to be uploaded as a picture for a record company (a section that was
never really even used). Posting this form simply uploaded any file:
it did not check the file's extension, and it didn't even check whether
the user was logged in.



This meant that any file could be uploaded, including a PHP file for
the attack. We secured the vulnerability with ZenCart on the infected
site, and removed the offending files.



The job was done, and I was home by 2 a.m.






The Moral
- Always apply security patches for ZenCart, or any other CMS for that matter, because when security updates are released the whole
world is made aware of the vulnerability.

- Always do backups, and back up your backups.
- Employ or arrange for someone who will be there at times like these, so that nobody has to rely on a panicky post on Server
Fault.



Answer



It's hard to give specific advice from what you've posted here but I do have some generic advice based on a post I wrote ages ago back when I could still be bothered to blog.



Don't Panic



First things first, there are no "quick fixes" other than restoring your system from a backup taken prior to the intrusion, and this has at least two problems.





  1. It's difficult to pinpoint when the intrusion happened.

  2. It doesn't help you close the "hole" that allowed them to break in last time, nor deal with the consequences of any "data theft" that may also have taken place.



This question is asked again and again by the victims of hackers breaking into their web servers. The answers very rarely change, but people keep asking. I'm not sure why. Perhaps people just don't like the answers they've seen when searching for help, or they can't find someone they trust to give them advice. Or perhaps people read an answer to this question, focus too much on the 5% of their case that is special and different from what they can find online, and miss the 95% where their case is near enough the same as the one they read about.



That brings me to the first important nugget of information. I really do appreciate that you are a special unique snowflake. I appreciate that your website is too, as it's a reflection of you and your business or at the very least, your hard work on behalf of an employer. But to someone on the outside looking in, whether a computer security person looking at the problem to try and help you or even the attacker himself, it is very likely that your problem will be at least 95% identical to every other case they've ever looked at.




Don't take the attack personally, and don't take the recommendations that follow here or that you get from other people personally. If you are reading this after just becoming the victim of a website hack then I really am sorry, and I really hope you can find something helpful here, but this is not the time to let your ego get in the way of what you need to do.



You have just found out that your server(s) got hacked. Now what?



Do not panic. Absolutely do not act in haste, and absolutely do not try and pretend things never happened and not act at all.



First: understand that the disaster has already happened. This is not the time for denial; it is the time to accept what has happened, to be realistic about it, and to take steps to manage the consequences of the impact.



Some of these steps are going to hurt, and (unless your website holds a copy of my details) I really don't care if you ignore all or some of these steps, that's up to you. But following them properly will make things better in the end. The medicine might taste awful but sometimes you have to overlook that if you really want the cure to work.




Stop the problem from becoming worse than it already is:




  1. The first thing you should do is disconnect the affected systems from the Internet. Whatever other problems you have, leaving the system connected to the web will only allow the attack to continue. I mean this quite literally; get someone to physically visit the server and unplug network cables if that is what it takes, but disconnect the victim from its muggers before you try to do anything else.

  2. Change all your passwords for all accounts on all computers that are on the same network as the compromised systems. No really. All accounts. All computers. Yes, you're right, this might be overkill; on the other hand, it might not. You don't know either way, do you?

  3. Check your other systems. Pay special attention to other Internet facing services, and to those that hold financial or other commercially sensitive data.

  4. If the system holds anyone's personal data, immediately inform the person responsible for data protection (if that's not you) and URGE a full disclosure. I know this one is tough. I know this one is going to hurt. I know that many businesses want to sweep this kind of problem under the carpet but the business is going to have to deal with it - and needs to do so with an eye on any and all relevant privacy laws.



However annoyed your customers might be to have you tell them about a problem, they'll be far more annoyed if you don't tell them, and they only find out for themselves after someone charges $8,000 worth of goods using the credit card details they stole from your site.




Remember what I said previously? The bad thing has already happened. The only question now is how well you deal with it.



Understand the problem fully:




  1. Do NOT put the affected systems back online until this stage is fully complete, unless you want to be the person whose post was the tipping point for me actually deciding to write this article. I'm not going to link to that post so that people can get a cheap laugh, but the real tragedy is when people fail to learn from their mistakes.

  2. Examine the 'attacked' systems to understand how the attacks succeeded in compromising your security. Make every effort to find out where the attacks "came from", so that you understand what problems you have and need to address to make your system safe in the future.

  3. Examine the 'attacked' systems again, this time to understand where the attacks went, so that you understand what systems were compromised in the attack. Ensure you follow up any pointers that suggest compromised systems could become a springboard to attack your systems further.

  4. Ensure the "gateways" used in any and all attacks are fully understood, so that you may begin to close them properly. (e.g. if your systems were compromised by a SQL injection attack, then not only do you need to close the particular flawed line of code that they broke in by, you would want to audit all of your code to see if the same type of mistake was made elsewhere).


  5. Understand that attacks might succeed because of more than one flaw. Often, attacks succeed not through finding one major bug in a system but by stringing together several issues (sometimes minor and trivial by themselves) to compromise a system. For example, using SQL injection attacks to send commands to a database server, discovering the website/application you're attacking is running in the context of an administrative user and using the rights of that account as a stepping-stone to compromise other parts of a system. Or as hackers like to call it: "another day in the office taking advantage of common mistakes people make".



Why not just "repair" the exploit or rootkit you've detected and put the system back online?



In situations like this the problem is that you don't have control of that system any more. It's not your computer any more.



The only way to be certain that you've got control of the system is to rebuild the system. While there's a lot of value in finding and fixing the exploit used to break into the system, you can't be sure about what else has been done to the system once the intruders gained control (indeed, it's not unheard of for hackers who recruit systems into a botnet to patch the exploits they used themselves, to safeguard "their" new computer from other hackers, as well as installing their rootkit).



Make a plan for recovery and to bring your website back online and stick to it:




Nobody wants to be offline for longer than they have to be. That's a given. If this website is a revenue-generating mechanism then the pressure to bring it back online quickly will be intense. Even if the only thing at stake is your / your company's reputation, this is still going to generate a lot of pressure to put things back up quickly.



However, don't give in to the temptation to go back online too quickly. Instead, move as fast as possible to understand what caused the problem and to solve it before you go back online, or else you will almost certainly fall victim to an intrusion once again. And remember, "to get hacked once can be classed as misfortune; to get hacked again straight afterward looks like carelessness" (with apologies to Oscar Wilde).




  1. I'm assuming you've understood all the issues that led to the successful intrusion in the first place before you even start this section. I don't want to overstate the case but if you haven't done that first then you really do need to. Sorry.

  2. Never pay blackmail / protection money. This is the sign of an easy mark and you don't want that phrase ever used to describe you.

  3. Don't be tempted to put the same server(s) back online without a full rebuild. It should be far quicker to build a new box or "nuke the server from orbit and do a clean install" on the old hardware than it would be to audit every single corner of the old system to make sure it is clean before putting it back online again. If you disagree with that then you probably don't know what it really means to ensure a system is fully cleaned, or your website deployment procedures are an unholy mess. You presumably have backups and test deployments of your site that you can just use to build the live site, and if you don't then being hacked is not your biggest problem.

  4. Be very careful about re-using data that was "live" on the system at the time of the hack. I won't say "never ever do it" because you'll just ignore me, but frankly I think you do need to consider the consequences of keeping data around when you know you cannot guarantee its integrity. Ideally, you should restore this from a backup made prior to the intrusion. If you cannot or will not do that, you should be very careful with that data because it's tainted. You should especially be aware of the consequences to others if this data belongs to customers or site visitors rather than directly to you.


  5. Monitor the system(s) carefully. You should resolve to do this as an ongoing process in the future (more below), but take extra pains to be vigilant during the period immediately after your site comes back online. The intruders will almost certainly be back, and if you can spot them trying to break in again you will be able to see quickly whether you really have closed all the holes they used before, plus any they made for themselves, and you might gather useful information you can pass on to your local law enforcement.






Reducing the risk in the future:



The first thing you need to understand is that security is a process that you have to apply throughout the entire life-cycle of designing, deploying and maintaining an Internet-facing system, not something you can slap a few layers over your code afterwards like cheap paint. To be properly secure, a service and an application need to be designed from the start with this in mind as one of the major goals of the project. I realise that's boring and you've heard it all before and that I "just don't realise the pressure man" of getting your beta web2.0 (beta) service into beta status on the web, but the fact is that this keeps getting repeated because it was true the first time it was said and it hasn't yet become a lie.



You can't eliminate risk. You shouldn't even try to do that. What you should do however is to understand which security risks are important to you, and understand how to manage and reduce both the impact of the risk and the probability that the risk will occur.




What steps can you take to reduce the probability of an attack being successful?



For example:




  1. Was the flaw that allowed people to break into your site a known bug in vendor code, for which a patch was available? If so, do you need to re-think your approach to how you patch applications on your Internet-facing servers?

  2. Was the flaw that allowed people to break into your site an unknown bug in vendor code, for which a patch was not available? I most certainly do not advocate changing suppliers whenever something like this bites you because they all have their problems and you'll run out of platforms in a year at the most if you take this approach. However, if a system constantly lets you down then you should either migrate to something more robust or at the very least, re-architect your system so that vulnerable components stay wrapped up in cotton wool and as far away as possible from hostile eyes.

  3. Was the flaw a bug in code developed by you (or a contractor working for you)? If so, do you need to re-think your approach to how you approve code for deployment to your live site? Could the bug have been caught with an improved test system, or with changes to your coding "standard"? (For example, while technology is not a panacea, you can reduce the probability of a successful SQL injection attack by using well-documented coding techniques such as parameterised queries; see the sketch after this list.)

  4. Was the flaw due to a problem with how the server or application software was deployed? If so, are you using automated procedures to build and deploy servers where possible? These are a great help in maintaining a consistent "baseline" state on all your servers, minimising the amount of custom work that has to be done on each one and hence hopefully minimising the opportunity for a mistake to be made. Same goes with code deployment - if you require something "special" to be done to deploy the latest version of your web app then try hard to automate it and ensure it always is done in a consistent manner.


  5. Could the intrusion have been caught earlier with better monitoring of your systems? Of course, 24-hour monitoring or an "on call" system for your staff might not be cost effective, but there are companies out there who can monitor your web facing services for you and alert you in the event of a problem. You might decide you can't afford this or don't need it and that's just fine... just take it into consideration.

  6. Use tools such as tripwire and nessus where appropriate - but don't just use them blindly because I said so. Take the time to learn how to use a few good security tools that are appropriate to your environment, keep these tools updated and use them on a regular basis.

  7. Consider hiring security experts to 'audit' your website security on a regular basis. Again, you might decide you can't afford this or don't need it and that's just fine... just take it into consideration.
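As a sketch of the kind of coding technique point 3 refers to, here is a minimal PHP example using parameterised queries; the connection details, table and column names are all hypothetical:

<?php
// Hypothetical example: the user-supplied value is bound as a parameter,
// never concatenated into the SQL string, so it cannot change the query.
$pdo  = new PDO('mysql:host=localhost;dbname=shop', 'app_user', 'secret');
$stmt = $pdo->prepare('SELECT id, name FROM products WHERE category = :category');
$stmt->execute(array(':category' => $_GET['category']));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
?>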



What steps can you take to reduce the consequences of a successful attack?



If you decide that the "risk" of the lower floor of your home flooding is high, but not high enough to warrant moving, you should at least move the irreplaceable family heirlooms upstairs. Right?





  1. Can you reduce the amount of services directly exposed to the Internet? Can you maintain some kind of gap between your internal services and your Internet-facing services? This ensures that even if your external systems are compromised the chances of using this as a springboard to attack your internal systems are limited.

  2. Are you storing information you don't need to store? Are you storing such information "online" when it could be archived somewhere else? There are two points here: the obvious one is that people cannot steal information from you that you don't have, and the second is that the less you store, the less you need to maintain and code for, so there are fewer chances for bugs to slip into your code or systems design.

  3. Are you using "least access" principles for your web app? If users only need to read from a database, then make sure the account the web app uses to service this only has read access, don't allow it write access and certainly not system-level access.

  4. If you're not very experienced at something and it is not central to your business, consider outsourcing it. In other words, if you run a small website talking about writing desktop application code and decide to start selling small desktop applications from the site then consider "outsourcing" your credit card order system to someone like Paypal.

  5. If at all possible, make practicing recovery from compromised systems part of your Disaster Recovery plan. This is arguably just another "disaster scenario" that you could encounter, simply one with its own set of problems and issues that are distinct from the usual 'server room caught fire'/'was invaded by giant server eating furbies' kind of thing.



... And finally



I've probably left out no end of stuff that others consider important, but the steps above should at least help you start sorting things out if you are unlucky enough to fall victim to hackers.




Above all: Don't panic. Think before you act. Act firmly once you've made a decision, and leave a comment below if you have something to add to my list of steps.


storage area network - Clean LUN on HP MSA P2000 G3

I am trying to deploy a self-hosted oVirt Engine on a SAN HP MSA P2000 G3 LUN. I created a Vdisk and a volume via the Storage Management Utility (the SAN GUI). The volume already contains data from previous tests, and the oVirt documentation specifies that





LUNs cannot be reused, as is, to create a storage domain or virtual disk. If you try to reuse the LUNs, the Administration Portal displays the following error message: Physical device initialization failed. Please check that the device is empty and accessible by the host.




The solution they give is to use
dd if=/dev/zero of=/dev/mapper/LUN_ID



Is there a better way to clean the LUN via the MSA P2000 GUI, or a command line approach faster than dd (I need to clean 0.5 TB)?
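One note on the dd approach itself: it can usually be sped up considerably over the 512-byte default simply by using a larger block size and direct I/O. A sketch, using the placeholder device name from the question (status=progress needs coreutils 8.24 or newer):

# Zero the LUN in 1 MiB chunks, bypassing the page cache.
dd if=/dev/zero of=/dev/mapper/LUN_ID bs=1M oflag=direct status=progress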

Monday, February 27, 2017

ssl - Http nginx behind https ELB and index auto redirect

I've got an Amazon ELB that listens for both HTTP and HTTPS traffic. The instances behind it run nginx on port 80, HTTP only, so the ELB forwards both HTTPS and HTTP traffic to nginx over HTTP.



When I make an HTTPS request to a folder like
https://example.com/folder
it is automatically redirected to the slash version
http://example.com/folder/
but the protocol becomes HTTP. The folder contains an index.html file; I assume that's what triggers the redirect.




Is there any way to fix this, i.e. make it redirect to https instead of http? I cannot enforce https globally.



My config:



http {
    map $http_x_forwarded_proto $thescheme {
        default $scheme;
        https https;
    }

    server {
        listen 80;
        server_name example.com;
        location / {
            root /var/www/html;
            add_header X1 $scheme;
            add_header X2 $thescheme;
            index index.html;
        }
    }
}


I've added the X1 and X2 headers to check what protocol nginx thinks is used and whether the X-Forwarded-Proto header is added by the ELB. For the example request, X1 is http and X2 is https.



I found that adding



if (-d $request_filename) {
rewrite [^/]$ $thescheme://$http_host$uri/ permanent;
}



inside the location block helps, but I'm wondering if there's a better solution.
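For reference, a sketch of how that workaround sits in the config above; nothing here is new beyond the placement, and the map block from the question is assumed to still be present at the http level:

server {
    listen 80;
    server_name example.com;

    location / {
        root /var/www/html;
        index index.html;

        # Issue the trailing-slash redirect ourselves, using the scheme the ELB
        # reported in X-Forwarded-Proto, before nginx's own directory redirect
        # (which only knows about the plain-HTTP connection) kicks in.
        if (-d $request_filename) {
            rewrite [^/]$ $thescheme://$http_host$uri/ permanent;
        }
    }
}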

apache 2.2 - reverse proxy only from one internal server

I have configured a reverse proxy and it is working OK for one internal server, for example our mail server.



Now I would like to know whether it is possible to configure the reverse proxy for only one server/application (in this case our web intranet).



Our problem is that the intranet calls other applications on the same intranet server and on other internal servers, and the only way I know to publish these resources is to reverse-proxy all the application servers in our DMZ Apache. What I would like instead is for the DMZ reverse proxy to call only the intranet, with the other applications being called by the intranet server rather than by the reverse proxy.

I want to configure it this way for security reasons, allowing external access to only one server.



The setup is Debian Squeeze and Apache 2.2.



Is it possible? How?






I'll try to give more information about my environment and what I'm trying to do.




I have a server in a DMZ with a published DNS record, https://intranet.domain.com, running Apache 2 configured as a reverse proxy for a local intranet server (https://local_ip/intranet/).
The config in the DMZ Apache:




ProxyHTMLLogVerbose On
ProxyHTMLURLMap https://local_ip/intranet/ /intranet/
ProxyHTMLURLMap / /intranet/
#
ProxyPass https://local_ip/intranet/
ProxyPassReverse https://local_ip/intranet/




The local intranet server has some other applications called with relative paths:
https://local_ip/app1 (as /app1)
https://local_ip/app2 (as /app2)
https://local_ip/app3 (as /app3)
and also other applications located on other servers, called from the intranet server with absolute paths, for example:
https://server4/app4
https://server5/app5




At the moment I can visit our intranet from an external source (the Internet) at https://intranet.domain.com/intranet/, but if I want to allow access to the other applications called from the intranet server, I have to add every application to the reverse proxy and allow communication between the DMZ reverse-proxy server and local_ip, server4, server5, and so on. We would like to allow communication only between the DMZ reverse proxy and local_ip (the intranet server), because the other applications are only called from the intranet, and we want to restrict which IPs can reach servers other than the intranet server.



If I configure every application (app1, app2, app3, app4, app5) and /intranet in the reverse proxy, it works, but this requires configuring every application in the reverse proxy and giving each of them connectivity from the DMZ.



What works now:



Internet <---> dmz/reverse-proxy <---> https://local_ip/intranet
                                 <---> https://local_ip/app1
                                 <---> https://local_ip/app2
                                 <---> https://local_ip/app3
                                 <---> https://server4/app4
                                 <---> https://server5/app5


I would like to configure this structure instead:



Internet <---> dmz/reverse-proxy <---> https://local_ip/intranet <---> /app1
                                                                 <---> /app2
                                                                 <---> /app3
                                                                 <---> https://server4/app4
                                                                 <---> https://server5/app5


The reason for this configuration is to restrict direct external access to the internal servers, and to allow direct access only from the proxy to the intranet server.
Is it possible? How can I do this?
Last question: how can I hide the URLs when visiting the intranet or other internal applications from the Internet, and only show https://intranet.domain.com as a fixed URL?

apache2 - why do we have to use ServerName for our websites in apache?





I would like to understand the need for ServerName in apache.



Let's say that I have a web site with the IP 12.13.14.15.



In the DNS zone I call it example.com and point it to the IP with an A record.



Why isn't this enough? It tells the world that whenever anyone types example.com, it needs to go to the IP 12.13.14.15.



So why do I need to set it also in Apache with the ServerName directive?


Answer




You don't need to use it, but it allows you to have multiple virtual hosts with different server names listening on the same IP address.



In the early days of the web, before HTTP/1.1 was introduced, you could only host one domain per IP address, as there was no way to differentiate the domains. HTTP/1.1 added the Host header, which tells the server which domain the client wants to talk to, and ServerName tells Apache which domains the current virtual host should answer for.



http://httpd.apache.org/docs/2.2/en/vhosts/name-based.html
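A minimal sketch of what that looks like in practice (the domain names are hypothetical; both vhosts share the same IP and are told apart purely by ServerName matching the Host header):

NameVirtualHost *:80

<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example.com
</VirtualHost>

<VirtualHost *:80>
    ServerName shop.example.com
    DocumentRoot /var/www/shop.example.com
</VirtualHost>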


Sunday, February 26, 2017

amazon elb - What are the options if you must provide a static IP endpoint for your service behind AWS ELB?



I have a web service with a few EC2 servers behind an AWS ELB. As I understand it, there is no way an ELB endpoint can have a static IP, because it is a DNS-based load balancing solution, and that is a design decision made by the ELB team.



However, one of the 3rd-party partners that we integrate with requires the IP of our servers due to their internal infrastructure limits (yeah, I know).



After some research, I plan to set up an SSL pass-through reverse proxy behind a static IP and pass requests to our ELB endpoint. This server will only be used by that client. I will probably use HAProxy because the proxy server needs to resolve the ELB's IP dynamically.



Pros :





  • No changes to the infrastructure behind the AWS ELB.

  • No additional SSL certification required.



Cons :




  • Introduces a single point of failure, but it only affects that client.

  • The client needs to assign the IP for our domain name themselves, or we set up another domain name pointing to this server.

  • No previous experience setting up such a reverse proxy.



This is the only way I have come up with without changing our infrastructure. I would like to hear your input: what would you do in this situation?


Answer



In the end, I went with the TCP SSL pass-through reverse proxy solution. Here is my HAProxy config:



global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    option tcplog
    log-format %ci:%cp\ [%t]\ %ft\ %b/%s/%si\ %Tw/%Tc/%Tt\ %B\ %ts\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq
    log global

resolvers dns
    nameserver google 8.8.8.8

# pass 80 port request to AWS ELB
listen http-proxy
    bind *:80
    mode tcp
    server elb my.elb.amazonaws.com:80 check resolvers dns

# pass 443 port request to AWS ELB
listen https-proxy
    bind *:443
    mode tcp
    server elb my.elb.amazonaws.com:443 check resolvers dns



Some explanation :




  • The proxy listens for connections on ports 80 and 443, then passes them to the ELB endpoint.

  • HAProxy resolves the ELB's IP dynamically with the DNS resolver I specify.

  • TCP mode is used, so there is no need to create an extra SSL certificate for the proxy.




I did some tests and it works well.



However, I did notice one downside (or just didn't figure out how to solve it):




  • It is unable to put the real client IP into an HTTP header because it runs in TCP mode.



This may cause problems if you want to allow only certain IPs to access certain services.


virtualhost - Apache SSL virtual host using SNI ignores ServerName



I would like to serve SNI-enabled clients that send the wrong host name a 400 Bad Request, but Apache always serves the default virtual host in this situation. I cannot add a default virtual host that sends the 400 Bad Request status, because SNI-disabled clients will always get this virtual host.



It seems that the ServerName virtual host directive is ignored for SNI-disabled clients when I enable name based virtual hosts on an SNI-enabled Apache installation.



See the following virtual host configuration:




NameVirtualHost 192.168.4.46:443

ServerName 192.168.4.46
DocumentRoot /var/www/error-page/

SSLEngine on
SSLCertificateFile /path/to/certificate.crt
SSLCertificateKeyFile /path/to/certificate.key



ServerName test-ssl
DocumentRoot /var/www/valid-website/

SSLEngine on
SSLCertificateFile /path/to/certificate.crt
SSLCertificateKeyFile /path/to/certificate.key



If I use an SNI-disabled client, I would get the error page regardless of the Host: header I send in the request. Because I use the same certificate in both virtual hosts, I would like SNI-disabled clients to be able to still reach the second virtual host based on a match with ServerName.




If I switched the position of the virtual hosts, the website would be the default virtual host, and then SNI-enabled clients would get the website instead of the error if they supplied a wrong Host: in the headers.



So basically, how do I get Apache to serve an error for every wrong Host: header regardless of SNI support, while still serving the website when using an SNI-disabled client and serving the right virtual host when using an SNI-enabled client?


Answer



Shortest answer I believe will be:



MOD_REWRITE



Set a RewriteCond to inspect the Host header; if it is not correct, forward the request off to the error page.




Your non-error site will be the default, and the rewrite rule will live in this virtual host.






If there do turn out to be 'shorter/easier' options, I suspect this option will provide clear logic into how requests should be processed. This solution assumes that you want ALL requests regardless of SNI status to provide a matching host header for a given virtual host.
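A minimal sketch of that idea, placed in the default (non-error) virtual host; the expected hostname comes from the question's config and the error path is a placeholder:

RewriteEngine On
# If the Host header is not the expected site name, serve the error page instead.
# /error.html is a placeholder that must exist under this vhost's DocumentRoot;
# alternatively, "RewriteRule ^ - [F]" would simply answer 403 Forbidden.
RewriteCond %{HTTP_HOST} !^test-ssl$ [NC]
RewriteRule ^ /error.html [L]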


iis - IIS7 remove file extension



I'm trying to configure IIS7 (Windows Server 2008) so that I can use URLs that do not include the file extension (i.e. somepage.php would become http://DOMAIN/somepage).




I'm using the URL rewrite tool in IIS7, and have the following rule:



Match URL



Requested URL - Matches the pattern

Using: Wildcards

Pattern: /*


No conditions.


Action



Action Type: Rewrite

Action properties - Rewrite URL: {R:1}.php, append query string



It's returning a 404 error. When I use the Test Pattern tool, it appears to work, and R:1 is what I want to add before the file extension. Any help or ideas would be greatly appreciated!



Thanks!


Answer



You need to change the rule to use regular expressions instead of wildcards. This is because the R:1 is a regexp backreference.



Also - you probably want your pattern to be (.*) for it to be R:1 (most likely it will be R:0 without the parens - R:0 is the backreference for "the entire match")



Check out http://learn.iis.net/page.aspx/497/user-friendly-url---rule-template/ for lots of info on the user friendly URL portion of URL Rewrite.
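A sketch of how the corrected rule might look once exported to web.config (the rule name is arbitrary, and this assumes the URL Rewrite module's standard schema):

<system.webServer>
  <rewrite>
    <rules>
      <rule name="Hide PHP extension" stopProcessing="true">
        <!-- Regular expressions rather than wildcards, so {R:1} is a regex backreference -->
        <match url="(.*)" />
        <action type="Rewrite" url="{R:1}.php" appendQueryString="true" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>

In practice you would probably also add conditions to skip requests for files and directories that actually exist, but the minimal rule above mirrors the settings described in the question.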


Saturday, February 25, 2017

domain name system - DNS Settings in Intranet with Windows Server 2012



We have one server on the intranet, which acts as DHCP, DNS, domain controller, IIS, ...

The server is in the DMZ and hosts some websites. Several URLs point to the server, and with the former Netgear router this worked: accessing the external IP address would automatically redirect to the DMZ.



With the new router this doesn't work anymore, which is why I've added the domain names, in a modified form (.local appended), to the forward lookup zone of the DNS server. This now works on the server itself, but not on any other device connected to the network.



As suggested in Intranet with local DNS resolution issues our DNS is the only one configured on the clients.



The server is running Windows Server 2012 and the clients are running Windows 7 and Windows 8. Running nslookup on the clients returns the correct address, but a subsequent ping no longer resolves; ipconfig /flushdns didn't help either. Is there anything else I can try?



Thanks


Answer




The solution to the problem for me was to disable IPv6 on the client computers (it is enabled by default in Windows 7 and Windows 8). This was necessary even after I had added IPv6 records to the DNS.


linux - Unresponsive nginx while doing “nginx reload”



While reloading nginx, I started getting "possible SYN flooding on port 443" errors in the messages log, and it seems like nginx becomes completely unresponsive at that time (for quite a while), because Zabbix reports "nginx is down" with a ping of 0s. RPS at that time is about 1800.



But the server stays responsive on the other, non-web ports (SSH, etc.).



Where should I look, and which configs (sysctl, nginx) should I post, to find the root cause of this?



Thanks in advance.




UPD:



Some additional info:



$ netstat -tpn |awk '/nginx/{print $6,$7}' |sort |uniq -c
3266 ESTABLISHED 31253/nginx
3289 ESTABLISHED 31254/nginx
3265 ESTABLISHED 31255/nginx
3186 ESTABLISHED 31256/nginx



nginx.conf sample:



worker_processes  4;
timer_resolution 100ms;
worker_priority -15;
worker_rlimit_nofile 200000;

events {
    worker_connections 65536;
    multi_accept on;
    use epoll;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    keepalive_requests 100;
    keepalive_timeout 65;
}


custom sysctl.conf



net.ipv4.ip_local_port_range=1024 65535

net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.all.secure_redirects=0
net.ipv4.conf.all.send_redirects=0
net.core.netdev_max_backlog=10000
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216

net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.netfilter.nf_conntrack_max=1048576
net.ipv4.tcp_congestion_control=htcp
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_max_tw_buckets=1400000
net.core.somaxconn=250000

net.ipv4.tcp_keepalive_time=900
net.ipv4.tcp_keepalive_intvl=15
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_fin_timeout=10


UPD2



Under normal load of about 1800 RPS, when I set the nginx backlog to 10000 on ports 80 and 443 and then reloaded nginx, it started to use more RAM (3.8 GB of my 4 GB instance were used, and some workers were killed by the OOM killer), and with worker_priority at -15 the load was over 6 (while my instance has only 4 cores). So the instance was quite laggy; I set worker_priority to -5 and the backlog to 1000 for every port. It now uses less memory and the peak load was 3.8, but nginx still becomes unresponsive for a minute or two after a reload. So the problem still persists.




Some netstat details:



netstat -tpn |awk '/:80/||/:443/{print $6}' |sort |uniq -c
6 CLOSE_WAIT
14 CLOSING
17192 ESTABLISHED
350 FIN_WAIT1
1040 FIN_WAIT2
216 LAST_ACK
338 SYN_RECV

52541 TIME_WAIT

Answer



If you have:



  keepalive_timeout  65;


I can imagine that it can take a while for connections to get terminated and workers restarted. I am not quite sure, without looking at the code, whether nginx waits for them to expire once it gets a reload.




You could try lowering the value and see if it helps.


mac osx - How to make a Macintosh register a Hostname to the DHCP Server?

I have a MacBook running Snow Leopard on our company's internal network, which is basically a Windows domain network. Our IT department complains that my MacBook does not supply a name to the DHCP server. Accordingly, the MacBook isn't resolvable by anything other than its IP address from our Windows workstations.



I already





  • set the NetBIOS name in the Network settings in the OSX Control Panel

  • set the DHCP Client ID in the Network settings in the OSX Control Panel

  • set the "Computer Name" in Control Panel > Sharing.

  • set the hostname using sudo scutil --set HostName MACBOOK001 (and rebooted)



... but none of that solved the problem.



Can anyone tell me how to make OS X register its hostname with the DHCP server so that it can be reached, e.g. using ping MACBOOK001?

What is the oem-dell-he-esxi module?



I'm attempting to use the vSphere Update Manager for the first time to do an ESXi 4.1 to ESXi 5.0 upgrade.



The server shipped with Dell's ESXi pre-installed, and when I attempt to do an upgrade to ESXi 5, I get the following prompt:



Software modules oem-dell-he-esxi published by third party vendor(s) are installed on the host. Upgrading the host will remove these modules









I've googled high and low, but I can't find any reference to what on earth that module is, and what I will lose if I remove it in order to do the upgrade.



Does anyone know what the oem-dell-he-esxi module is?


Answer



These are the Dell CIM provider modules for ESXi. They are nice to have if you want to monitor the host and be able to see all of its hardware specs and statuses in the ESXi console, and also to connect to it remotely using IT Assistant.


Friday, February 24, 2017

raid5 - How should I sanitize a SAS drive that I removed from a RAID 5 Array?



I had a disk experience a predictive failure. As a result, I removed the drive and replaced it.




Now that I have the drive removed, I would like to sanitize it. What is the easiest way to go about that?



I have a USB-to-SATA converter, but I'm not sure whether that would work. All I really want to do is execute the secure erase command.


Answer



You're going to need a host with a SAS controller or an eraser appliance that supports SAS. As Chris S is pointing out, you can use SATA drives with SAS controllers, but not the other way around.



As far as secure deletion, one pass with zeros will stop the amateurs, if you're worried about a TLA, make it one pass with random data, more if you're paranoid. What you really have to watch out for is remapped sectors...if you can't wipe those and the data is sufficiently sensitive, you should probably physically destroy the platters as Kromey suggests. Outfits that deal with classified data usually have special arrangements with the manufacturers so that they only return the part of the drive case with the label when getting warranty replacements.



Note that Gutmann patterns were designed for MFM/RLL drives... I think I've read somewhere that he recommends overwriting with random data as the best you can do, but I don't have a citation handy at the moment.




Note also that I'm assuming you're talking about a conventional mechanical drive...an SSD is a whole different deal.


Thursday, February 23, 2017

domain name system - DNS conflict with Exchange mail server

We have a client whose website we host, and it is creating a conflict with their Exchange mail server. Exchange's Autodiscover URL is reporting back the wrong IP.




The URL in question is https://clientdomain.com:443/autodiscover/autodiscover.xml, and the client's public DNS provider has MX.clientdomain.com and Autodiscover.clientdomain.com pointing to their mail server.



In this instance would it be better to alter our DNS server to point to their IPs, or to set up an XML file at the /autodiscover/ path?

Wednesday, February 22, 2017

domain name system - How to ping a server on a different network by the hostname only?



I have two networks that I am connected to via my computer, two different routers and all. This computer is connected to a Windows domain, "Domain A", and I can resolve the IP of any computer name using CMD/ping on the same Domain A network. But on a different NIC on the same computer, I can't resolve Domain B computers by host name alone; I need the FQDN (like machinename1.domain.local) when pinging Domain B computers. How can I remove the need for the FQDN?



Hope this makes sense, this is the only way that I know of how to ask this question.



Answer



Add the name of Domain B to your DNS Suffix Search List. Either manually through IPv4 settings -> Advanced -> DNS, or automatically through Group Policy.
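If the clients happen to be Windows 8 / Server 2012 or newer, the same thing can also be scripted; a hedged PowerShell sketch with placeholder domain names (older clients would need the GUI or Group Policy route described above):

# Single-label names are then tried against each suffix in turn,
# so "ping machinename1" also resolves in Domain B.
Set-DnsClientGlobalSetting -SuffixSearchList @("domaina.local", "domainb.local")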


apache 2.2 - Redirect 301 fails with a path as destination



I'm using a large number of Redirect 301's which are suddenly failing on a new webserver.




We're in pre-production tests on the new webserver, prior to migrating the sites, but some sites are failing with 500 Internal Server Error. The content, both databases and files, are mirrored from the old to the new server, so we can test if all sites work properly.



I traced this problem to mod_alias' Redirect statement, which is used from .htaccess to redirect visitors and search engines from old content to new pages.



Apparently the Apache server requires the destination to be a full url, including protocol and hostname.



Redirect 301 /directory/  /target/                          # Not Valid
Redirect 301 /main.html   /                                 # Not Valid
Redirect 301 /directory/  http://www.example.com/target/    # Valid
Redirect 301 /main.html   http://www.example.com/           # Valid


This contradicts the Apache documentation for Apache 2.2, which states:




The new URL should be an absolute URL beginning with a scheme and hostname, but a URL-path beginning with a slash may also be used, in which case the scheme and hostname of the current server will be added.




Of course I verified that we're using Apache 2.2 on both the old and the new server. The old server is a Gentoo box with Apache 2.2.11, while the new one is a RHEL 5 box with Apache 2.2.3.



The workaround would be to change all paths to full URL's, or to convert the statements to mod_rewrite rules, but I'd prefer the documented behaviour.



What are your experiences?


Answer




It appears that this behavior varies between Apache versions and distributions, and it contradicts the Apache documentation (as stated in the question). Very annoying. I could find no obvious pattern to which version supports which behaviour.



Rewriting all Redirects to similar RewriteRules does the trick since RewriteRules are much more versatile, but at the expense of readability.
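For what it's worth, a sketch of that conversion for the examples from the question (the ^/? prefix keeps the patterns working both in server config and in .htaccess):

RewriteEngine On
# Equivalent of: Redirect 301 /directory/ /target/
RewriteRule ^/?directory/(.*)$ /target/$1 [R=301,L]
# Equivalent of: Redirect 301 /main.html /
RewriteRule ^/?main\.html$ / [R=301,L]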


Tuesday, February 21, 2017

virtualization - How to [politely?] tell software vendor they don't know what they're talking about



Not a technical question, but a valid one nonetheless. Scenario:



HP ProLiant DL380 Gen 8 with 2 x 8-core Xeon E5-2667 CPUs and 256GB RAM running ESXi 5.5. Eight VMs for a given vendor's system. Four VMs for test, four VMs for production. The four servers in each environment perform different functions, e.g.: web server, main app server, OLAP DB server and SQL DB server.



CPU shares configured to stop the test environment from impacting production. All storage on SAN.




We've had some queries regarding performance, and the vendor insists that we need to give the production system more memory and vCPUs. However, we can clearly see from vCenter that the existing allocations aren't being touched, e.g.: a monthly view of CPU utilization on the main application server hovers around 8%, with the odd spike up to 30%. The spikes tend to coincide with the backup software kicking in.



Similar story on RAM - the highest utilization figure across the servers is ~35%.



So, we've been doing some digging, using Process Monitor (Microsoft Sysinternals) and Wireshark, and our recommendation to the vendor is that they do some TNS tuning in the first instance. However, this is beside the point.



My question is: how do we get them to acknowledge that the VMware statistics that we've sent them are evidence enough that more RAM/vCPU won't help?



--- UPDATE 12/07/2014 ---




Interesting week. Our IT management have said that we should make the change to the VM allocations, and we're now waiting for some downtime from the business users. Strangely, the business users are the ones saying that certain aspects of the app are running slowly (compared to what, I don't know), but they're going to "let us know" when we can take the system down (grumble, grumble!).



As an aside, the "slow" aspect of the system is apparently not the HTTP(S) element, i.e.: the "thin app" used by most of the users. It sounds like it's the "fat client" installs, used by the main finance bods, that is apparently "slow". This means that we're now considering the client and the client-server interaction in our investigations.



As the initial purpose of the question was to seek assistance as to whether to go down the "poke it" route, or just make the change, and we're now making the change, I'll close it using longneck's answer.



Thank you all for your input; as usual, serverfault has been more than just a forum - it's kind of like a psychologist's couch as well :-)


Answer



I suggest that you make the adjustments they have requested. Then benchmark the performance to show them that it made no difference. You could even go so far to benchmark it with LESS memory and vCPU to make your point.




Also, "We're paying you to support the software with actual solutions, not guesswork."


Sunday, February 19, 2017

How to update SSD Firmware on RAID Controllers




Got some servers with RAID Controllers and SSD.



Raid Controller: LSI MegaRaid 9271-4i
SSD: Intel 520



The MegaRaid controllers have the ability to update drive firmware using MegaCLI.



Intel for example only offers a Firmware Update Tool, and not direct .bin Firmware files.



Is there a possibility to update these drives without pulling them out, attaching them to a normal SATA controller and using the FUT?

Are there any other ways you can think of for doing this? Or do people using RAID controllers + SSDs simply not update their SSD firmware at all?


Answer



You don't have an option to do this through your RAID controller. In practice, I don't update SSD firmware unless it's fully vendor-integrated (Like Fusion-io or HP-branded and Dell-branded Sandisk enterprise disks).



If you DO need to update the firmware, you will need to connect directly to a SATA port and use the manufacturer's utility.


Client can't see website

A simple temporary landing page:
http://www.carltonforestgroup.com/



Hosted on shared hosting that I control. They control DNS and have pointed to my server IP using an A record. "www" is pointed to carltonforestgroup.com using a CNAME.



Client can't see page and receives this message in IE10:




This page can’t be displayed
Make sure the web address http://carltonforestgroup.com is correct.
Look for the page with your search engine.
Refresh the page in a few minutes.
Check that all network cables are plugged in.
Verify that airplane mode is turned off.
Make sure your wireless switch is turned on.
See if you can connect to mobile broadband.
Restart your router.

Fix connection problems


Everywhere I test the site from is fine including:
http://ismywebsiteupnow.com/en/quicktest.php?action=result&qtid=1118415&r=7939



I just want to check that I'm not missing anything, and that I can advise them that this is a problem local to their network.

Is it necessary to upgrade HP ProLiant server firmware before a hardware upgrade?

I have an HP ProLiant server and I'm planning to expand the RAID capacity.
I have never upgraded the firmware of anything.



Can I add the new disks without the firmware upgrades, or is it absolutely necessary (and if so, why)?

apache 2.2 - How to track down an io-bound bottleneck

I'm currently working on optimizing a web server, but I'm quite stuck on one particular problem.
I'm using JMeter to simulate load. JMeter is configured as follows:




  • 400 threads

  • Ramp-up 30 seconds

  • Loop count 1

  • Each thread visits 17 different pages on the server with a 1 - 5 second delay between each request.




What I'm experiencing is that up to 350 threads everything seems to be working as it should.
The load and CPU usage increase, and the site becomes noticeably slower but is still usable.



Somewhere between 350 and 400 threads, however, something happens. The load drops to nearly nothing,
the CPU idles at about 75 - 85%, and the site hangs for several minutes for everyone.



What I have ruled out:





  • The server does not swap; at least it does not show in top or the collectd graphs.

  • There are no MySQL queries waiting to finish (as reported by MySQL Administrator), although I'm seeing a lot of open connections.

  • max_connections in MySQL is 1600 (1 MySQL connection per request, so this limit is far from reached).

  • I/O wait is, so to speak, non-existent in the CPU graphs (collectd).

  • We are using memcached, but the timeout is set to 1 second.

  • memcached runs on the same server, so network latency should not be an issue.

  • MaxClients and ServerLimit are not reached in Apache.




I'm running out of ideas for how to track this issue down.
Any tips, tricks or ideas to help pin down the reason?



Thanks

Saturday, February 18, 2017

Puzzled about PHP file permission and shared webhosting - what are some explanations?

I have an issue across different web hosts: certain upload scripts can only upload to a folder if it has 777 permissions (which is risky). On the test server (on a different web host), 755 works fine.



On another web host, log files generated by PHP file functions sometimes cannot be written to, but other files are mysteriously unaffected (for instance, the log files for the entire week are 655 and they work well, but just today's log file doesn't work unless it is set to 777).




I am more of an application developer than a server backend expert, so these behaviours puzzle me to no end. Why are they happening? What can be done?

Changing netmask from /24 to /16 on a Windows 2003 domain



I have a Windows 2003 domain using 192.168.0.0/24 with all static addresses (no DHCP). I want to move to 192.168.0.0/16 because we need more addresses. I understand that we need to change the netmask on every computer from 255.255.255.0 to 255.255.0.0.




My questions are:




  1. Is there a way to avoid changing the netmask on every computer, changing only our domain controller to 192.168.0.0/16?


  2. What changes need to be made on the DNS side (Active Directory) to be able to handle the new subnet?



Answer



If you have enough computers that a /24 isn't large enough it is seriously time to start switching over to DHCP.




If the addresses are statically set, there is no magic single setting you can change on the domain controller.



You could possibly build some kind of startup script that used the various command line tools to get the current settings and update them. This could result in broken systems without network access if you don't get it exactly right though.
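A hedged sketch of what one line of such a script might look like (interface name, address and gateway are placeholders that the script would have to read from the machine first):

rem Re-apply the machine's current IP and gateway, but with the wider mask.
netsh interface ip set address name="Local Area Connection" static 192.168.0.37 255.255.0.0 192.168.0.1 1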



If you didn't mind performance issues, you could set up your router to perform proxy ARP so you don't have to change every system at once. The ability and procedure for this depend on what router you have.



As for DNS, you will probably just need to either add additional /24 reverse zones, or remove your existing zone and add a /16. I am not aware of any way to convert from a /24 to a /16 on Windows.




My main concern is: if I change the servers' masks from 255.255.255.0 to 255.255.0.0, do you think everyone will still be able to communicate? ... I just want to have time to do it, without having to shut down the whole network.





Assuming you don't have any other usage in the 192.168.0.0/16 network, you can start changing masks on systems. Just keep in mind that until the masks are changed on all systems, a system with an IP address in 192.168.0.0 - 192.168.0.255 and a /24 mask will not be able to communicate with a machine with an address in 192.168.1.0 - 192.168.255.254 and a /16 mask. So you should probably re-number quickly, and not actually use any of the new address space until you are done.


ubuntu - Strange HTTP response headers being sent to Internet Explorer 8 only



So here's something I find puzzling.



I'm working on a JavaScript that needs to parse XML data, and I'm using jQuery's $.ajax to fetch and parse the data. It's working great everywhere except when I test with Internet Explorer 8 (it might be a problem on 7 and 9 too). On IE, I'm getting parse errors. I added a console.log to check the HTTP headers. Here's what I get from Chrome on Windows XP versus what I'm getting from IE --




Chrome:



Date: Sat, 09 Apr 2011 16:06:24 GMT
Connection: Keep-Alive
Content-Length: 2283
Last-Modified: Sat, 09 Apr 2011 15:59:12 GMT
Server: Apache/2.2.14 (Ubuntu)
ETag: "48048-8eb-4a07e6c693400"
Content-Type: application/xml
Accept-Ranges: bytes

Keep-Alive: timeout=15, max=97


IE8:



LOG: ETag: "48048-8eb-4a07d7a3cbe40"
Keep-Alive: timeout=15, max=97
Content-Type: text/html
Content-Length: 2283
Last-Modified: Sat, 09 Apr 2011 14:51:29 GMT



This is a sample of what the xml document looks like:






name
message
avatar



name
message
avatar




I checked the MIME configuration for my Apache server and it is set to send XML files as 'application/xml'. So, strangely, it is sending a content type of 'application/xml' to Chrome, but IE gets a content type of 'text/html'.




So I built a simple PHP script:



<?php
header('Content-type: application/xml; charset=UTF-8');
echo '';
?>


name

message
avatar


name
message
avatar





When I change the Javascript to retrieve the PHP instead of the XML file, I get these response headers --



Chrome with PHP:



Date: Sat, 09 Apr 2011 16:10:39 GMT
X-Powered-By: PHP/5.2.10-2ubuntu6.7
Connection: Keep-Alive
Content-Length: 2102
Server: Apache/2.2.14 (Ubuntu)

Content-Type: application/xml; charset=UTF-8
Keep-Alive: timeout=15, max=97


IE with PHP:



LOG: X-Powered-By: PHP/5.2.10-2ubuntu6.7
Content-Length: 2102
Keep-Alive: timeout=15, max=100
Content-Type: application/xml; charset=UTF-8



More strangeness I just discovered. I put an



AddType application/xml tweets


into my directives for that virtual server. When I then fetch my XML document with the .tweets extension, IE does get the correct content-type in the header! In fact, the header looks more like the Chrome version --



Chrome with .tweets file:




Connection: Keep-Alive
Content-Length: 2102
Last-Modified: Sat, 09 Apr 2011 16:33:46 GMT
Server: Apache/2.2.14 (Ubuntu)
ETag: "48048-836-4a07ee807ee80"
Content-Type: application/xml
Accept-Ranges: bytes
Keep-Alive: timeout=15, max=100



IE with .tweets file:



LOG: Date: Sat, 09 Apr 2011 16:38:56 GMT
Server: Apache/2.2.14 (Ubuntu)
Last-Modified: Sat, 09 Apr 2011 16:33:46 GMT
ETag: "48048-836-4a07ee807ee80"
Accept-Ranges: bytes
Content-Length: 2102
Keep-Alive: timeout=15, max=100

Connection: Keep-Alive
Content-Type: application/xml


So from what I can tell, with my limited Apache knowledge, it seems like the raw XML file is getting sent without the right content-type only to IE, even though I have it configured to send 'application/xml'. Chrome is receiving the right content-type. When I use PHP, Apache seems to be following my wishes and sending 'application/xml' because that is what I stamped it as in the script. Is it also strange that IE doesn't have all the same headers as Chrome? "Server" is missing for instance. When I add a custom extension of .tweets, configured to use application/xml, I also get the correct content-type.



So what could possibly be getting in the way and changing 'application/xml' to 'text/html' only for Internet Explorer? I'd hate to have to rely on my workarounds. I thought of mod_deflate, but I disabled it and the results are the same.



Any ideas?




(PS - the XML I'm including is just a sample, so the content-lengths don't match up.)


Answer



So I think I figured it out.



It appears that IE caches the AJAX GET data in such a way that it is hard (impossible?) to clear it out. Maybe I had the XML configured as text/xml at some point, but I don't think so. Basically, IE kept using the cached results for that XML file instead of the actual server results. That also explains why the HTTP headers looked so odd (no server information, for instance). Or it's possible that the cache always produces text/html (I gave up on further tests).



My solution:
I added a '?avoidcache=' + a timestamp to the end of the URL in the GET request. Now IE gets the proper HTTP headers that I set on the server.
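One way to confirm what the server itself is sending, independent of any browser cache, is to request just the headers from the command line (the URL here is only illustrative):

curl -sI http://example.com/data.xml | grep -i '^content-type'

If that shows application/xml while the browser console still logs text/html, the stale copy is on the client side.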



Wow, do I hate Internet Explorer. How many development hours are wasted creating workarounds for its horrible behavior?



apache 2.2 - SSL causing same content on multiple domains





I have set up a Debian LAMP server where I host multiple websites. As far as I know I can only use SSL on one of them; if I'd like to use SSL on two or more sites I'd have to add another IP - so far so good.



The problem is that whenever I type https://siteone.com or https://sitetwo.com it always displays the content from https://siteone.com. I'd rather it displayed an error message or something else, but absolutely not my main site's content (which is the site where I want SSL to work).



Note: my Debian web server uses ISPConfig as its control panel.


Answer




Make sure you have your virtual hosts set up correctly, such that each virtual host binds only to a single IP address. The Apache documentation for IP-based virtual hosting says that your virtual hosts should look similar to the following:




ServerAdmin webmaster@smallco.example.com
DocumentRoot /groups/smallco/www
ServerName smallco.example.com
ErrorLog /groups/smallco/logs/error_log
TransferLog /groups/smallco/logs/access_log




ServerAdmin webmaster@baygroup.example.org
DocumentRoot /groups/baygroup/www
ServerName baygroup.example.com
ErrorLog /groups/baygroup/logs/error_log
TransferLog /groups/baygroup/logs/access_log



The first virtual host listed in your Apache config is the default one. Add a fake one before your first two, just to ensure that you are actually matching your virtual hosts and not just blindly falling into the first one. Here is a more complete article about such a setup from IBM: http://www-01.ibm.com/support/docview.wss?uid=swg21045922
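As a rough illustration, such a catch-all default for the SSL port might look something like the following; the paths, names, and certificate files are placeholders, and the Order/Deny syntax assumes Apache 2.2:

# listed before the real vhosts so it becomes the default for any unmatched request
<VirtualHost _default_:443>
    ServerName default.invalid
    DocumentRoot /var/www/empty
    SSLEngine on
    SSLCertificateFile    /etc/ssl/certs/dummy.crt
    SSLCertificateKeyFile /etc/ssl/private/dummy.key
    <Location />
        # refuse everything, so siteone.com's content is never served for a stray hostname
        Order allow,deny
        Deny from all
    </Location>
</VirtualHost>

Browsers will still see a certificate warning for the unmatched name, but they will no longer see the main site's pages.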



Friday, February 17, 2017

Grafana: ip and port of your graphite-web or graphite-api install



I'm trying to connect Graphite with Grafana. The manual says:





Url The http protocol, ip and port of your graphite-web or
graphite-api install.




Where can I find either of them? I grepped /opt/graphite/, and the only 'graphite-web' related thing I found was related to the URL I'm using to open the basic Graphite screen (the one with the tree and the Graphite composer). I tried this URL but got an orange 'Unknown error' in Grafana's 'Edit data source' screen (no errors in /var/log/grafana/* or the main log).



They both are on the same server, so I used http://127.0.0.1:81/graphite/ (curl shows Graphite Browser and frameset)



I was unable to find graphite-api at all.




grafana-4.0.1



Graphite... well, I haven't found how to check the version, but it was installed a few days ago with pip install https://github.com/graphite-project/graphite-web/tarball/master



Could anyone help, please?


Answer



Fixed:





  1. Opened the developer tools and checked the console log.

  2. Switched 127.0.0.1 to an IP I can use in my desktop browser. I have no idea why it cannot be reached directly and has to jump back and forth across the globe.

  3. Saw: XMLHttpRequest cannot load http://example.com/graphite//metrics/find/?query=*. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://example.com/grafena' is therefore not allowed access.

  4. Added this to the Apache virtual host config on the Graphite side:



Header set Access-Control-Allow-Origin "*"
Header set Access-Control-Allow-Methods "GET, OPTIONS, POST"
Header set Access-Control-Allow-Headers "origin, authorization, accept, content-type"



then apachectl -t && apachectl graceful



That's all.
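These directives need mod_headers to be loaded (a2enmod headers on Debian/Ubuntu, or the equivalent LoadModule line on other distributions). A quick way to confirm from the Grafana host that the headers are actually being returned -- the hostnames here are placeholders:

curl -s -D - -o /dev/null -H 'Origin: http://example.com' \
  'http://example.com/graphite/metrics/find/?query=*' | grep -i '^access-control'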


windows server 2008 - 100% uptime for a web application



We received an interesting "requirement" from a client today.



They want 100% uptime with off-site failover on a web application. From our web application's viewpoint, this isn't an issue. It was designed to be able to scale out across multiple database servers, etc.




However, from a networking standpoint I just can't seem to figure out how to make it work.



In a nutshell, the application will live on servers within the client's network. It is accessed by both internal and external people. They want us to maintain an off-site copy of the system that in the event of a serious failure at their premises would immediately pick up and take over.



Now we know there is absolutely no way to resolve it for internal people (carrier pigeon?), but they want the external users to not even notice.



Quite frankly, I haven't the foggiest idea of how this might be possible. It seems that if they lose Internet connectivity then we would have to do a DNS change to forward traffic to the external machines... Which, of course, takes time.



Ideas?




UPDATE



I had a discussion with the client today and they clarified the issue.



They stuck by the 100% number, saying the application should stay active even in the event of a flood. However, that requirement only kicks in if we host it for them. They said they would handle the uptime requirement if the application lives entirely on their servers. You can guess my response.


Answer



Here is Wikipedia's handy chart of the pursuit of nines:



[Image: Wikipedia's table of availability percentages ("nines") and the corresponding downtime allowed per year.]




Interestingly, only 3 of the top 20 websites were able to achieve the mythical 5 nines, or 99.999% uptime, in 2007. They were Yahoo, AOL, and Comcast. In the first 4 months of 2008, some of the most popular social networks didn't even come close to that.



From the chart, it should be evident how ridiculous the pursuit of 100% uptime is...
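To put rough numbers on it (simple arithmetic, not figures taken from the chart): 99.9% uptime still allows about 365 × 24 × 0.001 ≈ 8.8 hours of downtime per year, 99.999% allows about 5.3 minutes, and 100% allows exactly zero seconds, forever, across every failure mode at once.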


linux - How precise is a cron daemon?

Is the cron job scheduler really precise?



I mean, I need a script to run every night as late as possible, BUT before 00:00 of the next day.




I'd ideally run the cron job at 23:59 (or 11:59 pm), but will the system really be that precise? Since a second does matter, should I set the cron job to 23:58 to leave it some headroom?
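For reference, the schedule under discussion would be written like this in a crontab (the script path is only a placeholder):

# run one minute before midnight, every day
59 23 * * * /path/to/nightly-job.sh

Typical cron daemons wake once per minute and start the job at the top of the scheduled minute, so the start time is normally within a second or so of 23:59:00, but that is not a hard real-time guarantee on a loaded system.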

Thursday, February 16, 2017

amazon web services - AWS Private subnet not redirected to NAT Instance

AWS is not properly setting up the default gateway for an instance in a private subnet.



NAT address:




ec2din i-ef7f8a3a|grep PRIVATEIPADDRESS
PRIVATEIPADDRESS 172.16.0.31


ROUTING TABLE configuration:



ec2drtb rtb-7c9f3618
ROUTETABLE rtb-7c9f3618 vpc-43da3455
ROUTE local active 172.16.0.0/16 CreateRouteTable
ROUTE i-ef7f8a3a active 0.0.0.0/0 eni-4055320a CreateRoute

ASSOCIATION rtbassoc-cc1764a8 main
ASSOCIATION rtbassoc-51b7c435 subnet-c92429be


PRIVATE SUBNET configuration:



ec2dsubnet subnet-c92429be
SUBNET subnet-c92429be available vpc-43da3455 172.16.1.0/24 250 us-east-1a false false
TAG subnet subnet-c92429be



As you can see, I configured instance i-ef7f8a3a as the NAT and set it in the routing table as the default gateway for all traffic.



When I log in to my machine in the private subnet (172.16.1.220) and check the routing table, it does not show the default gateway as the IP of my NAT instance; instead, it points to the default router:



ip r
default via 172.16.1.1 dev eth0
default via 172.16.1.1 dev eth0 metric 1024
172.16.1.0/24 dev eth0 proto kernel scope link src 172.16.1.220
172.16.1.1 dev eth0 scope link metric 1024



The default route on the NAT itself is set to 172.16.0.1, not to the IGW, so I assumed all the magic is done on the AWS router and traffic would be redirected to my NAT anyway, so I started testing.



I ping an outside IP and start tcpdump on my NAT instance, but I don't see any incoming packets from my private subnet:



ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
^C
--- 8.8.8.8 ping statistics ---

20 packets transmitted, 0 received, 100% packet loss, time 19150ms

tcpdump -n host 8.8.8.8
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel



Both instances can "see" each other -- I can SSH in both directions -- but when I try to set the default route with my NAT as the gateway, I get:



ip r add default via 172.16.0.31
RTNETLINK answers: Network is unreachable


Did I miss something? How should the route table look on an instance in the private subnet? Should my NAT's IP be there, or the default router?

untagged - Preparing For Interview - 'IT Incident Management & Senior Service Desk'






I have recently been offered an interview with the state government in NSW, for the position of 'IT Incident Management & Senior Service Desk'.



I have been to interviews before for lower positions, and for some similar positions, but they did not have a very formal interviewing process (as I presume the state government will).




Can anyone please give me some preparation tips for what I could expect in this type of interview?



Thanks for any help.


Answer



I can tell you what I look for in a service desk analyst or manager (general policy):



Hopefully you have some experience with ITIL or another popular management framework. Beyond that:

  • Understand what an IT service and an IT process are, and what an incident refers to.

  • Have hands-on experience administering at least one OS.

  • Tell me how you would handle a ticket for something you know nothing about ("the whosit won't work with the grundersnatch"). I'm looking for you to tell me you'd find out who owns the service that runs the grundersnatch.

  • Tell me what makes an incompetent helpdesk staffer.

  • Tell me what you do with incompetent helpdesk staff.


Wednesday, February 15, 2017

Emails sent through our application are going to spam or not coming at all

When we send emails through our Rails app, they go to the spam folder in some email accounts (Hotmail) and don't arrive at all in others.



We are using sendmail to send the emails. The sender address is no-reply@xyz.com. What could be the possible reasons for this? And where do we check the sendmail logs (Ubuntu)?
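For what it's worth, on Ubuntu sendmail logs through syslog's mail facility, so delivery attempts normally end up in /var/log/mail.log. Two quick checks worth running alongside those logs (the domain is a placeholder):

tail -f /var/log/mail.log       # watch delivery attempts and the remote servers' responses
dig +short TXT xyz.com          # check whether an SPF record authorizes this server to send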

linux - Do I need to restart SUSE when upgrade glibc?

I plan to upgrade the glibc version (I HAVE to install it with the rpm command) on 30 SUSE machines; all the machines are used for running tests on Jenkins. Could someone tell me whether a reboot is needed in this case?
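For context: running processes keep the old (now deleted) libc mapped until they are restarted, so whether a reboot is needed largely comes down to whether everything that links against it can be restarted instead. A sketch of how to see what is still using the old library after the upgrade, assuming lsof is installed:

lsof / 2>/dev/null | grep libc | grep -Ei 'DEL|deleted'   # processes still mapping the removed libc (output format varies by lsof version)
zypper ps                                                 # on newer SUSE releases: services needing a restart after updates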

memory - How to find what is using linux swap or what is in the swap?

I have a virtual Linux (Fedora 17) server with 28GB RAM and 2GB swap. The server is running a MySQL DB that is set up to use most of the RAM.



After running for some time, the server starts to use swap to swap out unused pages. That is fine, as my swappiness is at the default of 60 and it is the expected behaviour.




The strange thing is that the numbers in top/meminfo do not correspond with the info from the processes. I.e., the server is reporting these numbers:



/proc/meminfo:
SwapCached: 24588 kB
SwapTotal: 2097148 kB
SwapFree: 865912 kB

top:
Mem: 28189800k total, 27583776k used, 606024k free, 163452k buffers

Swap: 2097148k total, 1231512k used, 865636k free, 6554356k cached


If I use the script from https://serverfault.com/a/423603/98204 it reports reasonable numbers (a few MB swapped by bashes, systemd, etc.) and one big allocation from MySQL (I omitted a lot of output lines):



892        [2442] qmgr -l -t fifo -u
896 [2412] /usr/libexec/postfix/master
904 [28382] mysql -u root
976 [27559] -bash
984 [27637] -bash

992 [27931] SCREEN
1000 [27932] /bin/bash
1192 [27558] sshd: admin@pts/0
1196 [27556] sshd: admin [priv]
1244 [1] /usr/lib/systemd/systemd
9444 [26626] /usr/bin/perl /bin/innotop
413852 [31039] /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/data/mysql/err --open-files-limit=8192 --pid-file=/data/mysql/pid --socket=/data/mysql/mysql.sock --port=3306
449264 Total Swap Used



So if I get the script output right the total swap usage should be 449264K = ca. 440MB with mysql using ca. 90% of the swap.



The question is why this differs so much from the top and meminfo numbers. Is there any way to "dump" the swap info to see what is actually in it, instead of summing the swap usage of all processes?
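For reference, a compact way to get the same per-process view as the linked script is to sum the VmSwap field that the kernel exposes in /proc (a sketch; Fedora 17's kernel is new enough, and it should be run as root so every process is readable):

# per-process swap usage in kB, largest first
awk '/^VmSwap/ {print $2, FILENAME}' /proc/[0-9]*/status 2>/dev/null | sort -rn | head
# grand total in kB
awk '/^VmSwap/ {sum += $2} END {print sum, "kB total"}' /proc/[0-9]*/status 2>/dev/null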



When analyzing the issue I came up with different ideas, but they all seem to be wrong:




  1. The script output is not in kB. But even if it were in 512-byte or 4kB units, it still wouldn't match. Actually, the ratio (1200:440) is about 3:1, which is a "strange" number.

  2. There are some pages in swap that are somehow shared between processes, as mentioned in https://serverfault.com/a/477664/98204 . If this is true, how can I find the actual amount of memory used this way? It would have to account for a difference of ca. 800MB, and that does not sound right in this scenario.

  3. There are some "old" pages in swap used by processes that have already finished. I wouldn't mind that if I were able to find out how much of the swap is "freeable" like this.

  4. There are pages in swap that have been swapped back into memory and are kept in swap in case they don't change in RAM and need to be swapped out again, as mentioned in https://serverfault.com/a/100636/98204 . But the SwapCached value is only 24MB.



The strange thing is that the swap usage is slowly increasing while the summed output from the script stays roughly the same. In the last 3 days the swap used increased from 1100MB to the current 1230MB, while the sum increased from roughly 430MB to the current 449MB.



The server has enough free(able) RAM, so I could just turn the swap off and back on. Or I could probably set swappiness to 0 so that swap would only get used if there is no other way. But I would like to solve the issue, or at least find out what is causing it.

Tuesday, February 14, 2017

root - rm: cannot remove - Permission denied

Anyone have any idea why I can't remove these?




~# find /var/lib/php5/ -xdev -depth -type f -size 0 -exec ls -al {} \; -exec lsattr {} \;  -exec rm -rf {} \;
-rwxrwxrwx 1 root root 0 Jan 23 05:20 /var/lib/php5/165498
-------------e-- /var/lib/php5/165498
rm: cannot remove ‘/var/lib/php5/165498’: Permission denied
-rwxrwxrwx 1 root root 0 Jan 23 05:20 /var/lib/php5/217306
-------------e-- /var/lib/php5/217306
rm: cannot remove ‘/var/lib/php5/217306’: Permission denied
-rwxrwxrwx 1 root root 0 Jan 23 05:20 /var/lib/php5/275922
-------------e-- /var/lib/php5/275922

rm: cannot remove ‘/var/lib/php5/275922’: Permission denied
-rwxrwxrwx 1 root root 0 Jan 23 05:20 /var/lib/php5/148947
-------------e-- /var/lib/php5/148947


Seems like I should be able to?



~# whoami
root



fstab output



~# cat /etc/fstab
LABEL=cloudimg-rootfs / ext4 defaults 0 0
/dev/xvdb /mnt auto defaults,nobootwait,comment=cloudconfig 0 2


namei output




~# namei -mo /var/lib/php5
f: /var/lib/php5
drwxr-xr-x root root /
drwxr-xr-x root root var
drwxr-xr-x root root lib
drwxr-xr-x root root php5


findmnt output




~# findmnt
TARGET SOURCE FSTYPE OPTIONS
/ /dev/disk/by-label/cloudimg-rootfs ext4 rw,relatime,data=ordered
├─/sys sysfs sysfs rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/cgroup tmpfs rw,relatime,size=4k,mode=755
│ │ └─/sys/fs/cgroup/systemd systemd cgroup rw,nosuid,nodev,noexec,relatime,name=systemd
│ ├─/sys/fs/fuse/connections fusectl rw,relatime
│ ├─/sys/kernel/debug debugfs rw,relatime
│ ├─/sys/kernel/security securityfs rw,relatime
│ └─/sys/fs/pstore pstore rw,relatime

├─/proc proc proc rw,nosuid,nodev,noexec,relatime
├─/dev udev devtmpfs rw,relatime,size=1908536k,nr_inodes=477134,mode=755
│ └─/dev/pts devpts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000
├─/run tmpfs tmpfs rw,nosuid,noexec,relatime,size=383888k,mode=755
│ ├─/run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k
│ ├─/run/shm tmpfs rw,nosuid,nodev,relatime
│ └─/run/user tmpfs rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755
└─/mnt /dev/xvdb ext3 rw,relatime,data=ordered



mount output



~# mount
/dev/xvda1 on / type ext4 (rw)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/cgroup type tmpfs (rw)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)

udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
none on /sys/fs/pstore type pstore (rw)
systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)
/dev/xvdb on /mnt type ext3 (rw,_netdev)




>




EDIT: In response to Dan Armstrong



selinux



~# ls -al /usr/sbin/getenforce

ls: cannot access /usr/sbin/getenforce: No such file or directory


apparmor



~# /usr/sbin/apparmor_status
apparmor module is loaded.
4 profiles are loaded.
4 profiles are in enforce mode.
/sbin/dhclient

/usr/lib/NetworkManager/nm-dhcp-client.action
/usr/lib/connman/scripts/dhclient-script
/usr/sbin/tcpdump
0 profiles are in complain mode.
1 processes have profiles defined.
1 processes are in enforce mode.
/sbin/dhclient (516)
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.



os version



~# cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04 LTS"
VERSION_ID="14.04"

HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"


stop apparmor



~# /etc/init.d/apparmor stop
* Clearing AppArmor profiles cache [ OK ]
All profile caches have been cleared, but no profiles have been unloaded.

Unloading profiles will leave already running processes permanently
unconfined, which can lead to unexpected situations.

To set a process to complain mode, use the command line tool
'aa-complain'. To really tear down all profiles, run the init script
with the 'teardown' option."


retry rm




~# find /var/lib/php5/ -xdev -depth -type f -size 0 -exec ls -al {} \; -exec lsattr {} \;  -exec rm -rf {} \;
-rwxrwxrwx 1 root root 0 Jan 23 05:20 /var/lib/php5/165498
-------------e-- /var/lib/php5/165498
rm: cannot remove ‘/var/lib/php5/165498’: Permission denied
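One thing the output above doesn't cover, and which is cheap to rule out, is flags or ACLs on the directory itself rather than on the files (a sketch, not something from the original troubleshooting):

lsattr -d /var/lib/php5     # look for the 'i' (immutable) or 'a' (append-only) flags on the directory
getfacl /var/lib/php5       # check for unexpected ACL entries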

samba - Filename case issue over WebDav

We are accessing a Samba shared directory from a Windows client using the WebDav client WebDrive. The issue is that it shows the same contents in both directories (data/ & Data/), even though they are entirely different.




I know this is because the Windows filesystem is case-insensitive while Linux is case-sensitive.



Is there any solution for this?



We had the same issue when viewing through the Samba-mounted directory, but we solved it by editing smb.conf as described in the following link:



Does Samba work well with Windows when case-sensitive names are enabled?



Please help us solve this when the share is accessed over WebDav.

Monday, February 13, 2017

domain name system - Nameserver Problem

I installed Virtualmin GPL with BIND, etc. I created a virtual host for my main domain, edited the zone file, and added the A records for the nameservers ns1.mydomain.com and ns2.mydomain.com. Everything looks good from the VPS; whenever I do a dig or nslookup, it returns the correct details.



But when I test externally, there are no results. I tried several sites such as intodns, etc., and they say my nameservers don't point to any IP. I have registered my nameservers with the appropriate IPs at my registrar, and I even contacted support to verify this; they told me the registration was successful, but when they dig my nameservers, there's no IP address.



I can also connect to my server's IP on port 53. I know that DNS propagation can take 24-48 hours, but there has to be some server that can return the correct results by now.
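One way to check the delegation and glue records directly, without depending on any resolver's cache (the domain is a placeholder, and the gtld server shown only applies to a .com domain):

dig +trace ns1.mydomain.com A
dig @a.gtld-servers.net mydomain.com NS +norecurse   # ask the .com registry servers directly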




Please point me in the right direction, thanks.

centos - VPS host can't send email to Google and Yahoo Mail

I got a new VPS set up and I'm wondering why I can't send emails to Yahoo and Gmail. Here's the error in /var/log/maillog:




00:43:00 mylamp sendmail[32507]: o45Gh0nc032505: to=, ctladdr= (48/48), delay=00:00:00, xdelay=00:00:00, mailer=esmtp, pri=120405, relay=alt4.gmail-smtp-in.l.google.com. [74.125.79.27], dsn=4.0.0, stat=Deferred: Connection refused by alt4.gmail-smtp-in.l.google.com



What seems to be the problem?
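Given the "Connection refused" in that log entry, one quick thing to check is whether the VPS can reach anything at all on outbound port 25, since many providers block that port by default (a sketch; any MX host will do as a test target):

nc -vz alt4.gmail-smtp-in.l.google.com 25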

raid - Windows Tool to read S.M.A.R.T. attributes on SATA drive in HP D2700 Enclosure using P812 Controller



I've got two HP DL380 G7 servers + P812 controller + D2700 enclosure. They're database servers with 144 GB RAM. The P812 firmware is 6.40, the D2700 is at 0147.



They both worked great with 18 OWC Mercury Extreme SSDs (SATA). After I added 6 more SSDs in both D2700 enclosures to make 24 SSDs in each enclosure, one of the servers is exhibiting very poor disk performance compared to how it was before the upgrade and compared to the other server.



So I suspect that one of the 6 SSDs that was added to the server with poor performance is faulty. But which one? The HP Array Configuration Utility doesn't show any issues and no issues appear at POST. Even the long ACU report doesn't show anything.



So I'd like to see the S.M.A.R.T. attributes for these drives to see if I can pick out the one failing. Is there a Windows tool that will allow me to view S.M.A.R.T. attributes in this configuration?




In a very similar question 3rd party SSD drives in HP Proliant server - monitoring drive health it is suggested to use smartctl from smartmontools. Unfortunately, I'm not having any luck seeing the SSDs behind the P812+D2700 - how can I make smartctl work?



C:\Program Files\smartmontools\bin>smartctl -a /dev/sdc,0 -T permissive -s on
smartctl 6.3 2014-06-23 r3922 [x86_64-w64-mingw32-2012r2] (cf-20140623)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: HP
Product: LOGICAL VOLUME

Revision: 6.40
User Capacity: 5,760,841,244,672 bytes [5.76 TB]
Logical block size: 512 bytes
Rotation Rate: 15000 rpm
Logical Unit id: 0x600508b1001cf0ebb14e9131d7XXXXXX
Serial number: PAGXQ0ARXXXXXX
Device type: disk
Local Time is: Fri Dec 12 18:42:32 2014 EST
SMART support is: Unavailable - device lacks SMART capability.


=== START OF ENABLE/DISABLE COMMANDS SECTION ===
unable to fetch IEC (SMART) mode page [Input/output error]

=== START OF READ SMART DATA SECTION ===

Error Counter logging not supported

Device does not support Self Test logging



Here is the output for the command suggested by the very similar question (I changed /dev/sda to /dev/sdc because that's the device of the first volume on the P812):



C:\Program Files\smartmontools\bin>smartctl -a -l ssd /dev/sdc -d sat+cciss,1
smartctl 6.3 2014-06-23 r3922 [x86_64-w64-mingw32-2012r2] (cf-20140623)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/sdc: Type 'sat+...': Unknown device type 'cciss,1'
=======> VALID ARGUMENTS ARE: ata, scsi, sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbsunplus, areca,N[/E], auto, test <=======



Use smartctl -h to get a usage summary


Answer



Here is the answer to the original question, asking for a Windows tool that will allow me to view S.M.A.R.T. parameters on SSDs that sit behind an HP SmartArray P812 on a D2700 chassis:



I've edited the answer as of Aug 29, 2017. Originally I concluded that there was no Windows-based tool that allows me to query the S.M.A.R.T. parameters on a SATA drive in a D2700 enclosure using a P812 controller, but I now see this is not completely accurate. While the HP Array Configuration Utility (ACU) does not allow me to query the S.M.A.R.T. parameters, it does notify me when a drive is predicted to fail soon, and this notification also appears in the Array Diagnostics Report.



At the time of the original answer, I considered these three candidates, but none of them did the job at the time. The comments below might no longer be accurate:




  • SmartmonTools/smartctl - it looks like querying S.M.A.R.T. behind an HP controller is supported on Linux according to 3rd party SSD drives in HP Proliant server - monitoring drive health, but the Windows version of smartctl does not appear to support the cciss driver, which is needed for HP SmartArray controllers, according to this

  • HP SSA CLI - has extensive support for HP controllers, but no support for S.M.A.R.T.; HP seems to favor their own SmartSSD Wear Gauge technology. The command I used is "controller slot=1 ssdphysicaldrive all show detail"; another useful command is "controller slot=1 show ssdinfo" (full invocations are sketched below)

  • HD Sentinel - advertises support for HP Controllers here, but when you read the fine print here it says it can't peer behind HP SmartArray controllers
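For reference, the HP SSA CLI commands from the second bullet written out as full invocations; the hpssacli binary name and the slot number are assumptions based on a typical install, not something taken from the original answer:

hpssacli controller slot=1 ssdphysicaldrive all show detail
hpssacli controller slot=1 show ssdinfo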


linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...