Linux networking: TCP keepalive settings

For 3-4 days, our company had regular server downtime right around our peak load period (i.e. around 2 pm IST). Debugging it was turning into a hard problem because our application has a fairly high (above average?) level of complexity. Not to mention, we had so far resisted or avoided fiddling with Linux server internal configurations.*

But this time, it turned out to be unavoidable. To stick to the order in which the story unfolded: the first response, on looking at our dashboard of all possible signals, was “hey look, jserr also seems to have gone up around the same time”. Not to mention our email inboxes got flooded with “critical error in websocket server” messages. On looking at the websocket server logs, we saw messages like “”.

We also saw a clear spike in the auth login failure graph. So the first preventive action, after the first day of downtime, was to spin up more login servers.

We also decided that the websocket servers were getting flooded during peak load, so we would spin up more of those servers too.

Result: the next day, we still had the same spike and downtime, only 1 hr later.

OK, clearly the extra login servers helped, but they were not the core cause, nor did they completely eradicate the problem. Time to set up deeper investigation tools. The first attempt was to stay up until the peak load time and run sar -A close to it.
While that showed high page fault and context switch rates, it was not clear what was causing them.
Besides, staying up and actively monitoring led us to proactively restart some of the servers/processes, which seems to have headed off the peak load downtime.

But 4-5 hrs later than usual, the servers did go down again.

About a week or so before this, we had decided that mongod was causing too much paging and that, for our usage, it was not a good choice for an in-memory backend, so we should move most of our data to redis, especially chat rooms and messages.

So we had started work on it and were testing the migration to redis around this time. We decided to accelerate the testing and release, and pooled our efforts.
Result: we had a good, solid working copy of code that uses redis to create new rooms and messages, and to send and distribute them.

So we went ahead and deployed it, but discovered that our code to migrate old and existing rooms was taking quite a long time, due to a whole lot of old private rooms** present.

But this deploy seemed to be stable in production, and for a change we went about 30 hrs without any server downtime. But then early Saturday morning it went down again.

This time, a colleague noticed that there were a whole lot of open TCP sockets/connections in the SYN_RECV state, i.e. the output of netstat -atn had a whole lot of connections in that state.
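To put a number on "a whole lot", the netstat output can be tallied per state. This is a minimal sketch, not what we actually ran: `count_tcp_states` is a hypothetical helper name, the sample addresses are made up, and it assumes the state is the 6th column, as in Linux netstat output.

```shell
#!/bin/sh
# Tally TCP connections per state from `netstat -atn`-style text on stdin.
# Assumption: the state is the 6th whitespace-separated column.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
         END { for (s in count) print count[s], s }' | sort -rn
}

# Made-up sample input, so the sketch runs even where netstat is not installed.
count_tcp_states <<'EOF'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             203.0.113.7:51514       SYN_RECV
tcp        0      0 10.0.0.5:80             203.0.113.8:51515       SYN_RECV
tcp        0      0 10.0.0.5:80             203.0.113.9:51516       ESTABLISHED
EOF
```

On a live server the same function would be fed with `netstat -atn | count_tcp_states`, which puts the most common state on the first line.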

So I started reading up on what exactly this state means. I ended up reading this, this and this.

Based on my understanding of this, our current TCP keepalive settings on kings’ landing consider a connection dead only after 2 hrs 11 minutes (7200 + 75 * 9 seconds).

Each of those 3 values can be configured by writing to its own file, for example:

# echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time

# echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl

# echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes
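Before writing to those files, it helps to see what they currently hold. A small sketch: `show_keepalive` is a hypothetical helper, and its optional directory argument is an assumption added only so it can be tried against a scratch directory instead of the real /proc path.

```shell
#!/bin/sh
# Print the three keepalive settings found under a directory.
# The directory defaults to the real /proc location; passing another
# directory is only a convenience for trying this out safely.
show_keepalive() {
    dir="${1:-/proc/sys/net/ipv4}"
    for name in tcp_keepalive_time tcp_keepalive_intvl tcp_keepalive_probes; do
        printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
    done
}

# Demo against a scratch directory holding the values the post describes
# as the current ones (7200, 75, 9).
demo=$(mktemp -d)
echo 7200 > "$demo/tcp_keepalive_time"
echo 75 > "$demo/tcp_keepalive_intvl"
echo 9 > "$demo/tcp_keepalive_probes"
show_keepalive "$demo"
rm -rf "$demo"
```

Calling `show_keepalive` with no argument reads the live values, which is also a cheap way to note down what to restore later.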

I am changing only the first one, tcp_keepalive_time, to 600; the other two stay at 75 and 9. It is the number of seconds of no packet activity on a socket before a TCP keepalive packet is sent to check whether it’s alive.

By doing this, dead connections will be detected after 21 minutes of inactivity (600 + 75 * 9 seconds).

One of the downsides of this is that if someone opens up a chat room, starts typing, but hits send only after 21 minutes, the client-side app will have to reacquire a connection.
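The arithmetic behind both figures is the idle time before the first probe, plus one interval for each unanswered probe. A quick sketch to check it (`keepalive_timeout` is a hypothetical helper name, not a system command):

```shell
#!/bin/sh
# Seconds before an unresponsive peer is declared dead:
# idle time before the first probe + (probe interval * number of probes).
keepalive_timeout() {
    idle=$1
    intvl=$2
    probes=$3
    echo $(( idle + intvl * probes ))
}

keepalive_timeout 7200 75 9   # prints 7875 (2 hrs 11 min 15 s)
keepalive_timeout 600 75 9    # prints 1275 (21 min 15 s)
```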

Basic hypothesis:
Dead connections/inactive app clients are being cleaned up only after 2 hrs and 11 minutes. But there’s a limit on how many sockets our server can handle/provide/keep active, so during peak load it runs out of sockets to hand out.

Assumptions that may fail:
1. The app client is sensible enough to reconnect/reacquire a connection before sending (or at least when there has been an inactivity gap before the send).

We shouldn’t have a server failure at peak load. In fact, we currently see a failure/sudden drop at 25 messages/5s (or whatever x-axis graphite shows).
But after this change, we should see our peak messaging rate go up, without a sudden drop.

Undo Action:
To undo these changes just run
# echo 7200 > /proc/sys/net/ipv4/tcp_keepalive_time

* - The core reason is simply that we would like to scale our app to 100x or more of what it is today, and poking around kernel parameters and configurations makes it harder to leverage the power of the cloud, instant instance creation, etc. (though with EC2's tagging etc. it may not be all that hard).

** - Private messages to another user or to a group chat.

Indian society rants

Society — a few rants:
Most of these are thoughts and ideas that have bugged me often and on repeated occasions, though I have had trouble trying to figure out the biases and problems inside. So this is a complete brain dump of thoughts and ideas, with no filters from my rational side.

Death: Why exactly do old age and death get nothing more than acceptance as a response? “It’s just the way of life”? Why, oh for the love of one’s self, why? Or in some other cases, it’s just a very taboo subject, rarely spoken of, and never a good topic to bring up? It’s one of the biggest problems left (viewed from the biased first-world viewpoint, of course). I understand that for civilizations, groups and tribes of men to flourish, they needed to evolve a certain set of spoken and/or unspoken rules about which topics to discuss. Some of my fellow countrymen would say it’s simply because there are so many other, more basic and fundamental problems to be solved that there’s no time for abstract, impractical, useless, philosophical, “Who is John Galt?” type questions. While that may be true for quite a good chunk (or percent) of people, the majority I meet aren’t that impoverished or living hand-to-mouth. (Granted, I am in some sense part of a circle that’s easily classified as higher than middle class, but nevertheless, I can see that it’s those unfortunate classes who seem to enjoy their lives, while the middle or higher economic classes seem to live in a world of unnamed, vague fears and guilts.)
In fact, the most intelligent/smart/evolved people I have met seem to have a learned helplessness about it (not unlike Dumbledore in HPMOR); the rest are clearly in denial, considering any talk about death taboo. It’s not polite conversation for the same reason: some people might be in denial, and you would be scaring them. Hell, I’ll take learned helplessness (only conditionally), but denial, no way. Go on, read up on the Kübler-Ross model and graduate at least one step.

Another related irony is the reluctance to talk about sex. In many ways, as I understand it, sex can be one of the most powerful helpers in dealing with the sword of Damocles of death hanging over all of us.
It’s perhaps because of the intensity of emotions/affect both of these arouse that both are considered impolite conversation, but to hell with those rules.
We are not in the stone age anymore, where every conversation was fraught with the danger of costly violence (both immediate and vendetta).

And to those who say these are rules/heuristics/traditions that have survived the test of time and are therefore anti-fragile, I
just point to this quote:
If you are in a shipwreck and all the boats are gone, a piano top … that comes along makes a fortuitous life preserver. But this is not to say that the best way to design a life preserver is in the form of a piano top. I think that we are clinging to a great many piano tops in accepting yesterday’s fortuitous contrivings.

Buckminster Fuller

Here, there’s a major, pervasive neurosis about losing money, or at least I think so. But overall, there are major social ideas about whether to talk about money or not. There is definitely a strong preference for talking vaguely about money, and an idea that the vaguer the discussion, the better. It’s kinda weird, given that there’s uncertainty about money and the temporal dynamics of how it grows or changes with time. I am tempted to say it’s a marketing/sales tactic, along with leveraging information asymmetry.
The worst case is when the human agent leveraging information asymmetry doesn’t realize that’s what it is doing.
I guess, within the field of “the art of persuasion”, it is an advantage not to realize that you are leveraging asymmetry of information.
It kinda enables you to create and send signals that communicate genuineness of despair, sadness and grief.
The flip side of it is that you are more and more likely to end up with full-blown depression. Anyway, the point I am trying to make here is
that this approach completely blinds you to a set of strategies that recommend limiting the cost of living.
Like this one on early retirement.

Gender Roles:
See this site. The truth is that this is all part of our society’s mystification of sex.
In fact, it’s the whole origin of books like “Eleven Minutes”, and also of prostitution.
I personally think that prostitution is perhaps the biggest antithesis of the spirit/idea of sex.
Not very unlike what Francisco d’Anconia explains to Hank Rearden in “Atlas Shrugged”. There, I confess, I am a fan.
But not a cult member. I am also a fan of the LessWrong community and their ideals. So really, judge my actions, over a meaningful timeline in which they can be relevant.
And don’t judge my words too seriously. :-P

Now, this is yet another area where we as a society are dragging our feet about reconsidering customary rules.
As I mentioned in the Death section above, sex is one of the most powerful antidotes to the fear of death, and yet has as powerful a taboo surrounding it.
In fact, I would say it’s more powerful than death. But no, having seen its potential to incite violence, we stick to treating talk about sex as taboo in public conversation.
But guess what, we do know the power it holds, and utilize it to sell soap, deodorants, cars, etc.
Just ask any marketing department. Instead of trying to discuss and understand our strong emotions, we get clichéd, vague (or, as Agent Smith puts it, vapid) ideals like love.
Instead we get either “Eleven Minutes” or sex abuse/rape as a tool of violence.
Or we get porn, BDSM and other fetishes, which are either a bunch of people rebelling against the taboo we create around sex, or
just plain prostitution, which is pure flesh without any attention and, as Ayn Rand puts across in Atlas Shrugged,
the lowest and meanest form of pleasure one can find.
I never understand why, instead of keeping all this discomfort around and dancing around it, people don’t just talk about the elephant in the room.
I mean, we (society) as a civilization have progressed enough to understand the costs of violence, and have formed methods to discourage it.
Clearly, not all individuals are in complete agreement with all of them, or with the costs of violence, or with always avoiding it.
Nevertheless, it’s time for us (society) to re-examine some of our unspoken rules and taboos.
I refuse to believe that the humans around me are incapable of rethinking, questioning and changing their behavioural habits.
I, for one, refuse to abide by rules, conventions and heuristics that make no sense any more in the era of material abundance.

Well, what can I say that I haven’t already rambled on and on about in the bits on sex and relationships above?
To start off, marriage is a legal entity, and all other properties assigned to it are exactly that
(assigned by society, and a self-fulfilling prophecy in a few cases).
Otherwise, there’s a whole lot of myth and a little truth, but the worst part is that the proportion is so lopsided that it’s generally discouraged to go looking for the proverbial needle in the haystack.
It is a legal contract, in which at the moment men are at a disadvantage (due to the gender roles (see above)
as perceived by society, and partly some a priori frequency data).
I can’t get around the fact that it is instead used as if it were a natural law.
I can’t stand the fact that it’s a real PITA to enforce the actual (nay, purported by society) properties/advantages of marriage.
I understand the reasons why it works, namely financial security against the future for the woman, and reproductive assurance (continuation of the species) for the man.

Yet I don’t understand why or how either of them is the most important aspect of modern society.
To some extent I can understand that, but the most baffling part of it all is how marriage is supposed to guarantee either of those.
As far as I know, it is impossible to make that guarantee via the instrument called marriage in modern civilized society.

And here, according to Trout, was the reason human beings could not reject ideas because they were bad: Ideas on Earth were badges of friendship or enmity. Their content did not matter. Friends agreed with friends, in order to express friendliness. Enemies disagreed with enemies, in order to express enmity. The ideas Earthlings held didn’t matter for hundreds of thousands of years, since they couldn’t do much about them anyway. Ideas might as well be badges as anything.

Kurt Vonnegut, Breakfast of Champions

Software codebase as an organism/human child

It’s a useful metaphor for provoking thought. It’s all from here


Here’s what no one tells you when you graduate with a CS degree and take up a job in software engineering:

The computer is a machine, but a codebase is an organism.

This should make sense to anyone who’s worked on a large software project. Computer science is all about controlling the machine — making it do what you want, on the time scale of nano- and milliseconds. But software engineering is more than that. It’s also about nurturing a codebase — keeping it healthy (low entropy) as it evolves, on a time scale of months and years.

Like any organism, a codebase will experience both growth and decay, and much of the art of software development lies in learning to manage these two forces.

Growth, for example, isn’t an unmitigated good. Clearly a project needs to grow in order to become valuable, but unfettered growth can be a big problem. In particular, a codebase tends to grow opportunistically, by means of short-sighted local optimizations. And the bigger it gets, the more ‘volume’ it has to maintain against the forces of entropy. Left to its own devices, then, a codebase will quickly devolve into an unmanageable mess, and can easily collapse under its own weight.

Thus any engineer worth her salt soon learns to be paranoid of code growth. She assumes, correctly, that whenever she ceases to be vigilant, the code will get itself into trouble. She knows, for example, that two modules tend to grow ever more dependent on each other unless separated by hard (‘physical’) boundaries.

(Of course all code changes are introduced by people, by programmers. It’s just a useful shortcut to pretend that the code has its own agenda.)

Faced with the necessity but also the dangers of growth, the seasoned engineer seeks a balance between nurture and discipline. She can’t be too permissive — coddled code won’t learn its boundaries. But she also knows not to be too tyrannical. Code needs some freedom to grow at the optimal rate.

She also understands how to manage code decay. She has a good ‘nose’ for code smells: hints that a piece of code is about to take a turn for the worse. She knows about code rot, which is what happens when code doesn’t get enough testing/execution/exercise. (Use it or lose it, as they say.) She’s seen how bad APIs can metastasize across the codebase like a cancer. She even knows when to amputate a large piece of code. Better it should die than continue to bog down the rest of the project.

Bottom line: building software isn’t like assembling a car. In terms of growth management, it’s more like raising a child or tending a garden. In terms of decay, it’s like caring for a sick patient.

And all of these metaphors help explain why you shouldn’t build software using the factory model.

So the question is: how “grown-up” is your codebase? Terrible twos? Middle school? Teenager? Young adult who left home early? Or more than that?

Undiscovered Math

Anand Jeyahar:

Fun post. Mathematically nothing new, but nevertheless a fun read.

Originally posted on Math with Bad Drawings:

One day in fifth grade, I was playing with numbers, scribbling down products and quotients—you know, typical cool-kid stuff—when I noticed a pattern. Take any pair of numbers that are two apart (like 6 and 8, or 9 and 11). Multiply them together. Then add one.

You’ll get the square of the number in between them!

This blew my mind. The numbers were hiding secret alliances, passing coded messages amongst themselves, and I’d somehow broken inside. I was a number spy.
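For reference, the pattern in the excerpt can be checked with one line of algebra. Writing n for the number in between (the symbol n is introduced here, it is not in the excerpt), the two numbers are n - 1 and n + 1:

```latex
(n-1)(n+1) + 1 = n^2 - 1 + 1 = n^2
```

For 6 and 8, n = 7, and 6 × 8 + 1 = 49 = 7².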
