Monday, March 16, 2015

Fun Number Facts

Most of the time, I have blogged when something has bugged me. I have railed against Internet service providers, bankers (here, here and here), statisticians, entire countries (Greece and India), analysts, global warming, chartered accountants, Tambrahms, etc.

Just to balance things out a little bit, I am going to write about a few things that have amused me. I am going to focus on mathematical ideas that I have seen recently which have held my attention. Apparently, the Pythagorean brotherhood used to go around behaving like a "cult", looking for mathematical patterns everywhere. I am going to list a set of popular references and interesting math bits here.

Ramanujan Number: Many might have heard about this. This is the number 1729. It is special because it is the smallest natural number that can be written as the sum of two positive cubes in two different ways: 1729 = 12^3 + 1^3 = 10^3 + 9^3. It must take a particularly brilliant mind to stumble upon this. Now, what is the smallest number that can be written as the sum of two squares in two different ways? What is the smallest number that can be written as the sum of two squares in two different ways if the squares have to be distinct?
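For anyone who would rather let a machine do the stumbling, here is a minimal brute-force sketch. The function name `sums_of_two_powers` and the search limit of 2000 are my own choices for illustration, and only positive integers are considered.

```python
from collections import defaultdict

def sums_of_two_powers(limit, power):
    """Map each n <= limit to the pairs (a, b), 1 <= a <= b, with a^power + b^power == n."""
    ways = defaultdict(list)
    a = 1
    while 2 * a ** power <= limit:
        b = a
        while a ** power + b ** power <= limit:
            ways[a ** power + b ** power].append((a, b))
            b += 1
        a += 1
    return ways

# Smallest number below the (arbitrary) limit with at least two representations as cubes.
cube_ways = sums_of_two_powers(2000, 3)
smallest = min(n for n, pairs in cube_ways.items() if len(pairs) >= 2)
print(smallest, cube_ways[smallest])
```

Calling the same function with `power=2` is one way to attack the two questions about squares, if you get tired of doing it by hand.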

Armstrong Number: A 3-digit Armstrong number is a number that is the sum of the cubes of its digits. 153 is an Armstrong number: 1^3 + 5^3 + 3^3 = 153. There are a few more. Life is short. It will never feel complete if you do not know all the Armstrong numbers. Give it a go.
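If pen and paper feel like too much work, a short sketch like the one below will list them. The helper name `is_armstrong` is mine; it raises each digit to the number of digits, which for 3-digit numbers is the cube described above.

```python
def is_armstrong(n):
    """True if n equals the sum of its digits, each raised to the number of digits."""
    digits = [int(d) for d in str(n)]
    return sum(d ** len(digits) for d in digits) == n

# All 3-digit Armstrong numbers (no spoilers printed here; run it to see them).
three_digit = [n for n in range(100, 1000) if is_armstrong(n)]
print(three_digit)
```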

Perfect Numbers: A perfect number is one that is equal to the sum of its factors (except itself, of course). The talk on perfect numbers takes us on to numbers that are semiperfect, deficient, abundant or amicable. Some which are abundant but not semiperfect are called weird, as one can clearly see.

My favourite in this whole lot are the "almost perfect" numbers. They, um, remind me of myself. And yeah, if you did not know before, number geeks are extremely fond of powers of two.  
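Here is a rough sketch that sorts numbers into these buckets. It assumes the usual textbook definitions, which the post does not spell out: with s as the sum of the factors of n other than n itself, n is perfect if s = n, almost perfect if s = n - 1, abundant if s > n, and deficient otherwise. The function names are made up for this example.

```python
def proper_divisor_sum(n):
    """Sum of the divisors of n, excluding n itself."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            other = n // d
            if other != d:
                total += other
        d += 1
    return total - n

def classify(n):
    """Label n using the assumed definitions in the text above."""
    s = proper_divisor_sum(n)
    if s == n:
        return "perfect"
    if s == n - 1:
        return "almost perfect"
    return "abundant" if s > n else "deficient"

perfect = [n for n in range(1, 10_000) if classify(n) == "perfect"]
almost_perfect = [n for n in range(1, 1000) if classify(n) == "almost perfect"]
print(perfect)
print(almost_perfect)
```

The second list is where the fondness for powers of two shows up.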

The best number in the world is 73. This is from The Big Bang Theory, by the way. 73 in binary is a palindrome. 73 is the 21st prime, which by itself is not such a big deal, but if you reverse 73, it gives us 37, which is the 12th prime. Multiply the digits of 73 and 7 * 3 gives us 21, which is why being the 21st prime is such a neat deal.
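None of this needs to be taken on faith. A small sketch, with helper names of my own choosing, checks each claim by plain trial division:

```python
def is_prime(n):
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_index(p):
    """1-based position of the prime p among the primes (2 is the 1st)."""
    return sum(1 for k in range(2, p + 1) if is_prime(k))

binary = bin(73)[2:]                  # binary digits of 73, without the '0b' prefix
print(binary, binary == binary[::-1]) # palindrome check
print(prime_index(73), prime_index(37), 7 * 3)
```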

There is only one natural number in the world whose successor is a cube and predecessor is a square. Finding this number is not that tough. Proving it is fiendishly tough (Apparently. How would I know?). Fermat apparently mentioned this number in some letter that he wrote.
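The finding part, at least, is easy to hand to a computer. The sketch below walks through the cubes and checks whether the number two below each cube is a square. The search bound of 999 is arbitrary, and of course a search proves nothing about uniqueness beyond that bound.

```python
from math import isqrt

def is_square(n):
    """True if n is a perfect square (n must be non-negative)."""
    r = isqrt(n)
    return r * r == n

found = []
for y in range(2, 1000):      # candidate cubes y^3, up to an arbitrary bound
    n = y ** 3 - 1            # the successor of n is the cube y^3
    if is_square(n - 1):      # the predecessor of n must be a square
        found.append(n)

print(found)
```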

There is only one 4-digit perfect square that is of the form 'aabb' where a and b are digits from 0 to 9.
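There are only 90 candidates of this form, so an exhaustive check is quick. A minimal sketch, assuming a runs from 1 to 9 so that the number really has four digits:

```python
from math import isqrt

matches = []
for a in range(1, 10):            # leading digit cannot be 0
    for b in range(0, 10):
        n = 1100 * a + 11 * b     # the number with digits a, a, b, b
        if isqrt(n) ** 2 == n:
            matches.append(n)

print(matches)
```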

Some other interesting nuggets - there is only one set of three consecutive odd integers all prime (Find these). We can find a set of 6 integers in AP all of which are less than 1000 and are prime (Find these as well, this is tougher). 16 is the only natural number that can be represented as both x^y and y^x, where x and y are distinct natural numbers. Who woulda thunk?
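If you would rather check than hunt, here is a brute-force sketch covering all three nuggets. The search bounds (10,000 for the odd triples, 60 for x and y) are arbitrary choices of mine, so the code only confirms the claims within those ranges.

```python
def is_prime(n):
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Three consecutive odd integers, all prime.
triples = [(n, n + 2, n + 4) for n in range(1, 10_000, 2)
           if all(is_prime(k) for k in (n, n + 2, n + 4))]

# Six primes in arithmetic progression, all below 1000.
progressions = []
for start in range(2, 1000):
    if not is_prime(start):
        continue
    for step in range(1, (1000 - start) // 5 + 1):
        terms = [start + i * step for i in range(6)]
        if terms[-1] < 1000 and all(is_prime(t) for t in terms):
            progressions.append(terms)

# Distinct x < y with x^y == y^x.
power_pairs = [(x, y) for x in range(1, 60) for y in range(x + 1, 60)
               if x ** y == y ** x]

print(triples)
print(len(progressions))
print(power_pairs)
```

It turns out there is more than one 6-term progression below 1000, so the puzzle has several valid answers.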

An irrational number raised to an irrational power can be rational (unlike probably a lot that I have mentioned in this post). Try proving that.
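For those who want a nudge, the classic argument is sketched below. It takes for granted that the square root of 2 is irrational, and it never tells you which of the two cases actually holds.

```latex
\text{Let } x = \sqrt{2}^{\sqrt{2}}. \\[4pt]
\textbf{Case 1: } x \text{ is rational. Then } \sqrt{2}^{\sqrt{2}}
  \text{ is already an irrational raised to an irrational giving a rational.} \\[4pt]
\textbf{Case 2: } x \text{ is irrational. Then }
  x^{\sqrt{2}} = \left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}}
  = \sqrt{2}^{\,\sqrt{2}\cdot\sqrt{2}} = \sqrt{2}^{\,2} = 2,
  \text{ which is rational.}
```

Either way, some irrational number raised to an irrational power is rational.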

Every natural number in the world has a multiple in which each of the ten digits appears at least once.
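One way to see why this is true is to build such a multiple directly. The sketch below is my own construction, not something from a reference: write 1234567890, pad it with as many zeros as n has digits, and then step up to the next multiple of n. Since n is smaller than the padding, the leading ten digits survive.

```python
def pandigital_multiple(n):
    """Return a multiple of n whose decimal digits include every digit from 0 to 9."""
    k = len(str(n))                   # n has k digits, so n < 10**k
    base = 1234567890 * 10 ** k       # 1234567890 followed by k zeros
    # Smallest multiple of n that is >= base. It is below base + n < base + 10**k,
    # so its first ten digits are still 1234567890.
    return -(-base // n) * n

for n in (7, 73, 1729):
    print(n, pandigital_multiple(n))
```

This gives a valid multiple, not the smallest one.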

Did you know that two triangles can have 3 angles equal and 2 sides equal and still not be congruent to each other? A quadrilateral can have a pair of sides equal and a pair of sides parallel and still not be a parallelogram. I think it is easier to be this quadrilateral than to be those two triangles.
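One pair of triangles that does the trick, as far as I can tell, has sides (8, 12, 18) and (12, 18, 27). The sketch below checks that both are valid triangles, that they are similar (so all three angles match), that they share two side lengths, and that they are still not congruent. The example and the helper names are mine.

```python
from fractions import Fraction

t1 = (8, 12, 18)
t2 = (12, 18, 27)

def is_triangle(sides):
    """Triangle inequality: the two shorter sides must together exceed the longest."""
    a, b, c = sorted(sides)
    return a + b > c

def are_similar(s, t):
    """Similar triangles have the same ratio between corresponding (sorted) sides."""
    ratios = {Fraction(x, y) for x, y in zip(sorted(s), sorted(t))}
    return len(ratios) == 1

shared_sides = set(t1) & set(t2)
congruent = sorted(t1) == sorted(t2)
print(is_triangle(t1), is_triangle(t2), are_similar(t1, t2), shared_sides, congruent)
```

The quadrilateral is easier to picture: an isosceles trapezium fits the description.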

Do you know that we put our kids through 15 years of school education without them discovering most of these facts? Let me stop right there. I can sense a rant against the Education system coming through the system. 

Friday, March 13, 2015

India's daughter - liberals miss the mark, conservatives continue to be blinkered

The recent banning of the documentary by Leslee Udwin has provided an excellent opportunity for India's chattering classes to get their K's into a T. Having run out of the usual banal routine within 2-3 days, the newscycle pressure has forced the gentlefolk of the media to up the ante to really wildly fantastically irrational territory.

Sample this from the liberal bastion, the Hindu. This is a classic

Some years ago, a friend confided in me that in a fit of rage her husband had shouted that he wished she would be gang raped because she deserved it. Then he paused and said, “No, I think I want something worse than that to happen to you. I want you to die.”
I watched India’s Daughter before the government banned it. As I listened to the rapist explain how he and the others thought about women, I realised there was little difference between them and this husband. But that’s where the similarity ended. He was an upper caste male, an IIT aristocrat living in Silicon Valley, studying at a top business school. The only other difference was that he never acted on his thoughts.
Our lady author friend is gagging with feminist rage and so she extrapolates extravagantly. The sentence "The only other difference was that he never acted on his thoughts" is so brilliant that I hurt myself when I fell from the chair laughing. It is a shame that no one in the editing team from the venerable Hindu told the author "But dear, that seems a pretty big difference to me". 
On one hand, a piqued husband probably says something in anger, on the other hand lies the most heinous crime India has seen (or at least heard of) in the 21st century. This kind of shabby equivalence argument is why Indian intellectual liberalism has not had a credible voice since Nehru.
Somewhere, the liberals have sought to draw a broad enough canvas so as to draw a link between a most gruesome crime and various shades of patriarchy that are present in our Country. This impulse from India's liberal media to simplify everything along pre-existing faultlines is ridiculous. A conversation on rape becomes about Patriarchy-is-the-root-cause vs. blame-the-victim schools of thought. That the rabid conservatives cannot go beyond the "India-is-great" koolaid is a given. That is no excuse for liberals to automatically occupy the diametrically opposite position. 
I am neither liberal nor conservative and if there is one thing that I want to scream out in this whole episode, it is this. I am saddened extremely that this crime against humanity has been perpetrated. I am shaken to the core that there exist people in this world who could commit crimes that are this gruesome. I am scared for women in our cities and our villages. This much is true. I must also say that I am not even a little bit ashamed. Sad, yes. Shamed, no. That my compatriot has committed this crime has not filled my whole being with shame. I feel as much connect with him as a present-day American would with the guy who dropped the bomb on Hiroshima. 
I concede that a feudal patriarchal upbringing has played a role in the way we view women. I also accept that I am positioned somewhere on this patriarchal hierarchy (I would argue that I ain't that bad, my wife believes I am more chauvinist than I would like to believe. That's a debate for another day. Either way, I sit somewhere on this line). But no matter where I sit on that line, I refuse to be co-opted into this collective "I feel shamed by this" - this feeling that the liberals want me to feel and the conservatives are supposedly rebelling against. That the convict and I share the same nationality has no bearing. 
I even feel shamed by the Country's response, have a sense of helplessness about the state of our security, but I feel no shame in relation to the fact that an "Indian" committed the crime. 
All this talk of shame neatly brings us to the response from India's conservatives. This has been ridiculous. I could not even begin to wind my head around the ban. The policy response has been broadly "Throw a lot of mud. Some will probably stick". The primary issue has been with the producer's nationality. We still have this holier-than-thou attitude, which when mixed with colonial hangover results in "So, how are you any better?" as the built-in response to any issue.
On this front, the statistics on rape per 1000 people that have been doing the rounds have been very helpful to the conservative cause. The stats are wrong. They are absurdly, ridiculously, unspinnably wrong. The stats are all about "reported rapes" and these are miles apart from actual rape, especially for India. In the west they have come a long way on women's safety. Their rape cases are more a case of "pushing the boundaries" and date-rape. I would be shocked if any Indian woman who had lived in Delhi and New York claimed to feel safer in Delhi over NY. And we need to keep in mind that this is a very favourable sample point for India. If we had to compare, say, rural Bihar to Texas things might be far worse. 
Apparently more than two-thirds of rapes are committed by someone who the victim knows. About 0.1% of these will get reported in India. We must be wearing extraordinary truth-protection blinkers to believe that women are safer in India than they are in the west. I am appalled that so many of my friends shared links that showed these statistics. I would not accuse India's conservative media of Intellectual dishonesty (they can at best be called merely dishonest), but many who shared these links should have known better.
In the US, they are talking about the merits of a "No means no" vs. "Yes means Yes" legal framework. 80% of Indian women would not know where to go to complain if they were sexually assaulted, and this is from the educated class. If you are poor, illiterate and a woman, then God save you. One needs to watch only 2-3 episodes of "Savdhan India" to get a sense of the level to which poor in our Country are not guaranteed any of the freedoms that the middle-class is. 
I am bitterly disappointed that so many of my friends shared the statistics. I am ashamed that not one of them came and said these stats seem absurd. Far more ashamed of this than of being a compatriot of the guy who committed the crime. 
We need to really stop this right-wing nonsense about how the west is out to malign us. Everything is not a conspiracy. If we did not view everything from the viewpoint of "Does this show my Country in bad light?", it would be that little bit better. We cannot look for any solutions if we continue to be in denial. 
There are many things to be proud of in India. Protection given to women, especially poorer and vulnerable women is not one of them. The sooner we come to accept that, the sooner we can try to improve our lot. 
Last time I re-posted an article on how women should take safety precautions, a group of my friends came down on me like a ton of bricks (Their peeve was that I was somehow blaming-the-victim). I am troubled by the fact even they have not called out this conservative statistic fudging.

Friday, March 6, 2015

Why three-fourths is not always 75%

One of my pet peeves is the fact that numerical ideas often (too often) get misused in a bid to convey the wrong impression. It is done by those who should know better and very often by those who indeed know better. Recently, Buttonwood had a decent article on this. The financial sector is the biggest culprit, especially in the willfully misleading category.

When I was in my previous avatar as a flunky in a global investment advisory, the Economist at the firm in charge of global asset allocation released a report whose key insight was "Semiconductors lead the rest of technology in the recovery cycle in 75% of the recessions" (or some such tripe). Now, this godforsaken sector was one I was in charge of and therefore had to read more on.

Turns out our Global Guru had looked at the last 4 major global recessions over the last 80 years and found that in 3 out of those 4, semiconductor stocks bounced back sharper and sooner than the rest of tech.

This is tripe. When you are doing empirical research and are looking at 4 cycles, you have no business describing anything in percentage terms. I stopped reading anything else published by the "Guru". But who am I to say anything? He was the top ranked Economist in the investment industry, and I was the guy who wasn't good enough to get fired when I desperately wanted to.
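To put a rough number on why 3 out of 4 says so little, suppose (purely as my own assumption, not the Guru's) that semis leading the recovery is a coin toss in each recession. The sketch below computes how often a coin toss would produce 3 or more "leads" in 4 tries.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Chance that pure coin-tossing yields "3 out of 4 recessions" or better.
print(prob_at_least(3, 4))
```

Under that coin-toss assumption, the "75%" headline turns up a little under a third of the time with nothing going on at all.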

In the Indian context, Outlook published a cover story saying the total scam amount in India was Rs. 1.75 lakh crores or some such. They detailed many scams in this one -

The list roughly goes like this
900 crores - Fodder scam
600 crores - Taj Corridor scam
23 crores - Railway placements scam
3 crores - Perhaps accepted in bribe by the first cousin once removed of some central Govt employee
etc etc.

The last scam number listed was

Total black money stashed abroad = 1.72 lakh crores (estimated).

So, this ginormous number that forms more than 95% of the amount put on the cover page is plucked out of a hat. So, why the $#*k should you go into details on the other scams?

There is this beautiful idea of significant digits in the art of measurement. I never really understood this while at school. Any measurement, be it with Vernier Callipers or a Screw gauge, comes with a built-in error factor. (Least Count?). So, whenever you give any measurement, the number of significant digits must be determined keeping this in mind.

So, if the built-in error is 1 cm, we cannot give a measurement that says 223.5 cms (even if we take 10 measurements and average them out). We can at best say 223 cms or 224 cms. The 223.5 suggests that we have confidence over that 0.5, which we technically cannot have. Simple idea really. It is like saying if your measurement has some built-in error, do not convey more accuracy than there is. So, if you measure some length 18 times with Vernier Callipers and the average comes out to be 32.222 cms, you should bite the bullet and say roughly 32 cms. Conveying confidence beyond what the numbers tell you is a crime (or at least should be considered one). Finance and sports are the two fields where this gets done the most.

You might have seen something along these lines frequently

1. The best stock returns are seen from Thursday to Monday.
2. Left-arm bowlers have seen the most success in ODIs conducted since the 1990s.
3. Ricky Ponting really struggles against India as he has an average that is 6 runs lower than his overall average

Why are these absurd?
If you analysed stock returns over every three-day window, some three-day window would have the best returns. This does not mean that that three-day window has something special going for it. This means something else. Something very special. Something that every statistician worth his salt must have the courage to say 90% of the time he attempts some statistical analysis. This means nothing.

Ditto the other two inferences.
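To see the stock example in action, here is a toy simulation. Everything in it is made up for illustration: the daily returns are pure random noise with no day-of-week effect whatsoever, the ten-year length and the seed are arbitrary, and `window_means` is a hypothetical helper. Some three-day window still comes out on top.

```python
import random

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]

def window_means(returns):
    """Average three-day return for each starting weekday; returns[0] is a Monday."""
    sums = {day: [] for day in DAYS}
    for i in range(len(returns) - 2):
        sums[DAYS[i % 5]].append(sum(returns[i:i + 3]))
    return {
        f"{DAYS[k]}-{DAYS[(k + 2) % 5]}": sum(sums[DAYS[k]]) / len(sums[DAYS[k]])
        for k in range(5)
    }

rng = random.Random(2015)                              # fixed seed, arbitrary
returns = [rng.gauss(0, 1) for _ in range(5 * 520)]    # about ten years of noise
means = window_means(returns)
best = max(means, key=means.get)
print(best, round(means[best], 4))
```

Change the seed and the "best" window will usually change with it, which is the whole point.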

The idea of statistical significance
So, are all statistical inferences absurd? Of course not. This is where the term statistical significance comes into the picture. If the observation is statistically significant, the flag must be raised. And only then must the flag be raised. So, how do we wind our heads around statistical significance? Statisticians have fancy terms for this. But let us see if we can have an intuitive approach around this. Let us have a go at this with an example.

Let us say we want to test whether Ricky Ponting's underperformance against a particular team, say, India is statistically significant. Let us further say his average against India is less than his overall average by about 15% (this looks significant).

Now, let us take his average against his nemesis, India, and keep it as a benchmark metric. Now, let us revisit the original sample and extract a sub-sample from this randomly. If the benchmark metric is lower than that observed in the sub-sample, say, 90% of the time, then let us say that the underperformance is statistically significant.

Let us build on this with numbers. Let us say, Ricky Ponting has scores of {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} in 10 innings. Further let us say, he has played 2 matches each against 5 teams. His overall average is 5.5. Now, let us say he has an average against India that is more than 15% less than his overall average. That means below 4.675, and since the average of two whole-number scores moves in steps of 0.5, an average of 4.5 or less. Is this statistically significant?

If we extract two scores that have to average 4.5 or less, we can have {1, 2}, {1, 3}, {1, 4}, {1, 5}, {1, 6}, {1, 7}, {1, 8}, {2, 3}, {2, 4}, {2, 5}, {2, 6}, {2, 7}, {3, 4}, {3, 5}, {3, 6}, {4, 5} - 16 possibilities. Totally, there are 45 possibilities. So, there is a roughly 36% chance (16 in 45) that some random sample of 2 out of these 10 will have an average that is more than 15% less than the overall average. So, this 15% below average number means nothing. It is not statistically significant.
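The counting above can be checked by listing every pair of innings. A minimal sketch using the same toy scores:

```python
from fractions import Fraction
from itertools import combinations

scores = range(1, 11)                          # the ten toy scores 1..10
pairs = list(combinations(scores, 2))          # every possible pair of innings
low = [p for p in pairs if sum(p) / 2 <= 4.5]  # pairs averaging 4.5 or less
share = Fraction(len(low), len(pairs))

print(len(low), len(pairs), round(float(share), 3))
```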

In reality, a great many quoted inferences derived from numbers are not statistically significant. If you are handed any statistical inference on a platter, you have hajaar grounds to suspect it is false. And we have not even come to the idea of bias. There are many ways in which we can bias a sample. Some biases creep in, while some others are introduced. Let me give a few examples.

The 2015 world cup stats counter states rather gleefully that the Indian batting unit has one of the highest strike rates in power plays. Now, we need to remember that this is largely because 70% of India's cricket is played on subcontinental wickets, where the par score is 330-ish. England might play 50% of its matches on English wickets, where the par score might be 260-ish. So, unless the powerplay strike rates are at least 30% apart, we have no business making any inferences. These are the biases that the samples naturally carry.

There are some other biases that data-presenters can bring in. The most beautiful and one that most fudge-statisticians introduce with an unbearable holier-than-thou approach is the selection bias. Let me deal with this with an example.

Let us say, there are two stocks Alpha and Beta that, as an analyst I want to suggest are correlated heavily. I will draw the stock charts for Alpha and Beta and compute correlation numbers. But here is where I will be smart. I will choose the end date to be today's date and the start date to be any date from 2010 to 2013. I will find the correlation numbers for all 1000 or so possibilities and pick the date from which the correlation is the highest. If it is a Friday afternoon and I want to be really intellectually dishonest before my weekend, I will 'float' my end date also.

For any two stocks, if you have a large enough database, about 40 minutes of time on your hands, and a moral compass pointing towards "bonus", there is a 50% chance of finding one set of dates where the correlation is more than 90%. You can even wear your best "Why are you looking at me like that? This is what the numbers are telling us" expression. If you want to be thorough, you should find some pseudo-intellectual justification for having picked the date range that you did indeed pick. In case you are wondering how I know this scam with this much clarity, you should look for Business Objects Cognos 91% correlation on some research database. (In my defence, I am not proud of this).
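The trick is easy to reproduce on fake data. In the sketch below, Alpha and Beta are two independent random walks that I generate on the spot, so any correlation between them is an accident. The series length, the seed, the choice to correlate price levels rather than returns, and the helper names are all my assumptions. The end date is fixed at "today" and every one of 1000 start dates is tried.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def random_walk(rng, steps, start=100.0):
    """A made-up price series: each day adds independent Gaussian noise."""
    prices = [start]
    for _ in range(steps):
        prices.append(prices[-1] + rng.gauss(0, 1))
    return prices

rng = random.Random(91)               # fixed seed, arbitrary
alpha = random_walk(rng, 1200)
beta = random_walk(rng, 1200)         # independent of alpha by construction

# Fix the end date at the last point and try each of 1000 candidate start dates.
candidates = {start: pearson(alpha[start:], beta[start:]) for start in range(1000)}
best_start = max(candidates, key=candidates.get)

print("honest full-window correlation:", round(candidates[0], 3))
print("cherry-picked start:", best_start, "correlation:", round(candidates[best_start], 3))
```

By construction, the cherry-picked number can never be worse than the honest one, and it is usually a lot prettier. Floating the end date as well only widens the gap.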

A good statistician is one who can look at a lot of data and tell us why they do not mean much; and then pick up one nugget that actually means something. The statistician who cannot say "this means nothing" should be kicked out of his job. A lot of the stats that we see online and on television are the ones that should be binned.