
Thursday, June 29, 2006

Interesting application of Bayes' Rule

This post isn't so much to talk about the politics of the NSA mass surveillance program as it is to demonstrate a well known statistical concept.

Basically, the article linked below argues this: suppose you want to determine whether someone is a terrorist threat, and suppose you have an algorithm that is reasonably accurate when applied (say, if a person meets criteria X, Y, Z, W, ..., then there is, say, a 90% probability that the person is a terrorist threat). Now you apply this process to the population at large and it turns up names, including, say, Ollie's. So, now that Ollie has been identified, what is the probability that Ollie really is a terrorist threat?

Answer: pretty low; probably less than 50%. 50% is what you would get by merely flipping a coin: heads, "he is a terrorist"; tails, "he isn't."

Of course, one could argue that this first mass process is really a weeding-out step: it produces a collection of people, each with a slightly higher-than-normal chance of being a terrorist threat, and one could then apply more expensive tests to this smaller group, knowing that the "false alarm" rate is going to be very high.
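This two-stage idea can be sketched in a few lines of Python. The base rate (1,000 threats in 300 million people) comes from the article quoted below; the stage-two accuracy and false-alarm rates here are invented purely for illustration:

```python
# Two-stage screening sketch (hypothetical rates, for illustration only).
# Stage 1: cheap mass screen applied to everyone.
# Stage 2: expensive follow-up test applied only to people flagged in stage 1.

def posterior(prior, hit_rate, false_alarm_rate):
    """P(threat | flagged), computed via Bayes' Rule."""
    p_flagged = hit_rate * prior + false_alarm_rate * (1 - prior)
    return hit_rate * prior / p_flagged

prior = 1000 / 300_000_000            # base rate: 1,000 threats in 300 million

p1 = posterior(prior, 0.40, 0.0001)   # after the cheap mass screen
p2 = posterior(p1, 0.90, 0.01)        # hypothetical second test on the flagged group

print(f"after stage 1: {p1:.4f}")     # 0.0132 -- still near zero
print(f"after stage 2: {p2:.4f}")     # 0.5455 -- better, but only about even odds
```

Even with a quite good (and invented) second test, the posterior only climbs to roughly a coin flip, which is why the false-alarm problem dominates everything that follows.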

Anyway, here goes with the article:

http://www.counterpunch.org/rudmin05242006.html

Why Does the NSA Engage in Mass Surveillance of Americans When It's Statistically Impossible for Such Spying to Detect Terrorists?

By FLOYD RUDMIN

The Bush administration and the National Security Agency (NSA) have been secretly monitoring the email messages and phone calls of all Americans. They are doing this, they say, for our own good. To find terrorists. Many people have criticized NSA's domestic spying as unlawful invasion of privacy, as search without search warrant, as abuse of power, as misuse of the NSA's resources, as unconstitutional, as something the communists would do, something very un-American.

In addition, however, mass surveillance of an entire population cannot find terrorists. It is a probabilistic impossibility. It cannot work.

What is the probability that people are terrorists given that NSA's mass surveillance identifies them as terrorists? If the probability is zero (p=0.00), then they certainly are not terrorists, and NSA was wasting resources and damaging the lives of innocent citizens. If the probability is one (p=1.00), then they definitely are terrorists, and NSA has saved the day. If the probability is fifty-fifty (p=0.50), that is the same as guessing the flip of a coin. The conditional probability that people are terrorists, given that the NSA surveillance system says they are, had better be very near one (p=1.00) and very far from zero (p=0.00).

The mathematics of conditional probability were figured out by the English minister and mathematician Thomas Bayes. If you Google "Bayes' Theorem", you will get more than a million hits. Bayes' Theorem is taught in all elementary statistics classes. Everyone at NSA certainly knows Bayes' Theorem.

To know if mass surveillance will work, Bayes' theorem requires three estimations:

1) The base-rate for terrorists, i.e., what proportion of the population are terrorists;

2) The accuracy rate, i.e., the probability that real terrorists will be identified by NSA;

3) The misidentification rate, i.e., the probability that innocent citizens will be misidentified by NSA as terrorists.

No matter how sophisticated and super-duper are NSA's methods for identifying terrorists, no matter how big and fast are NSA's computers, NSA's accuracy rate will never be 100% and their misidentification rate will never be 0%. That fact, plus the extremely low base-rate for terrorists, means it is logically impossible for mass surveillance to be an effective way to find terrorists.

I will not put Bayes' computational formula here. It is available in all elementary statistics books and is on the web should any readers be interested. But I will compute some conditional probabilities that people are terrorists given that NSA's system of mass surveillance identifies them to be terrorists.

The US Census shows that there are about 300 million people living in the USA.

Suppose that there are 1,000 terrorists there as well, which is probably a high estimate. The base-rate would be 1 terrorist per 300,000 people. In percentages, that is .00033% which is way less than 1%. Suppose that NSA surveillance has an accuracy rate of .40, which means that 40% of real terrorists in the USA will be identified by NSA's monitoring of everyone's email and phone calls. This is probably a high estimate, considering that terrorists are doing their best to avoid detection. There is no evidence thus far that NSA has been so successful at finding terrorists. And suppose NSA's misidentification rate is .0001, which means that .01% of innocent people will be misidentified as terrorists, at least until they are investigated, detained and interrogated. Note that .01% of the US population is 30,000 people. With these suppositions, then the probability that people are terrorists given that NSA's system of surveillance identifies them as terrorists is only p=0.0132, which is near zero, very far from one. Ergo, NSA's surveillance system is useless for finding terrorists.

Suppose that NSA's system is more accurate than .40, let's say, .70, which means that 70% of terrorists in the USA will be found by mass monitoring of phone calls and email messages. Then, by Bayes' Theorem, the probability that a person is a terrorist if targeted by NSA is still only p=0.0228, which is near zero, far from one, and useless.

Suppose that NSA's system is really, really, really good, with an accuracy rate of .90, and a misidentification rate of .00001, which means that only 3,000 innocent people are misidentified as terrorists. With these suppositions, the probability that people are terrorists given that NSA's system of surveillance identifies them as terrorists is still only p=0.2308, which is far from one and well below flipping a coin. NSA's domestic monitoring of everyone's email and phone calls is useless for finding terrorists. [...]
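The article's three scenarios are easy to check with a few lines of Python; the function below is just Bayes' Theorem applied to the numbers Rudmin gives:

```python
# Verify the article's three scenarios with Bayes' Theorem:
# P(terrorist | flagged) =
#     P(flagged | terrorist) * P(terrorist) / P(flagged)

def p_terrorist_given_flagged(base_rate, accuracy, misid_rate):
    p_flagged = accuracy * base_rate + misid_rate * (1 - base_rate)
    return accuracy * base_rate / p_flagged

base_rate = 1000 / 300_000_000   # 1,000 terrorists among 300 million people

# (accuracy, misidentification rate) for each of the article's scenarios:
for accuracy, misid in [(0.40, 0.0001), (0.70, 0.0001), (0.90, 0.00001)]:
    p = p_terrorist_given_flagged(base_rate, accuracy, misid)
    print(f"accuracy={accuracy:.2f}, misid={misid}: p={p:.4f}")
# prints p=0.0132, p=0.0228, p=0.2308 -- matching the article's numbers
```

The tiny base rate swamps everything: even a 90% accurate test with a one-in-100,000 false-alarm rate flags mostly innocent people.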


Note: This article goes on to discuss some politics, which I don't want to get into on this particular blog. My other blog: http://blueollie.blogspot.com doesn't shy away from this discussion.

Friday, June 23, 2006

Some fun; global and local optima

Today I had a nice swim (4000 yards, via 40 by 100) and decided to blog about some technical stuff that has been on my mind recently.

First: there is a debate going on about the minimum wage. Some wish to raise the federal minimum wage. The question, of course, is "does this do the poor workers any good?" The debate seems to be: "if you raise the minimum wage, the poorest of the workers will get more money, be better off, and have more to spend" vs. "raise the wage, and businesses won't be able to hire as many workers."

I don't have any economic credentials and therefore won't be talking about that issue specifically, though I do know from history that those old "company store" days didn't benefit anyone.

But what did strike me as odd is the following type of argument that I've heard (on a recent PBS Newshour program):

http://www.pbs.org/newshour/bb/business/jan-june06/minwage_06-21.html

GWEN IFILL: June O'Neill, what about the argument that raising the minimum wage would basically be the tide that lifts all boats, that even though people aren't necessarily making $5.15 an hour now, they could be making more?

JUNE O'NEILL: Well, for one thing, the vast bulk of people are nowhere near the minimum wage. They're earning much more than the minimum wage, so I don't think that they could speak too well. They're not talking about themselves.

But if it's such a magic wand, why stop at $5.15? Why stop at $7.25? Why not say $15 an hour? And then, voila, everyone will be earning $15 an hour and we would do all kinds of good, because you know that employers are not going to employ people. There are other places that they can go.

My note: June O'Neill is an economics professor at Baruch College and served in the Office of Management and Budget in the mid-1990s.

Anyway, it is this argument that I want to pick up on. Suppose you wanted to optimize some economic measure for the lowest income people. Dr. O'Neill seems to be saying that if you think that a small raise in the minimum wage will benefit people, then it follows that a larger raise ought to benefit them more, and of course raising the minimum wage to an absurd level doesn't make sense. Therefore, raising it at all (or even having one at all) makes no sense.

But this argument simply doesn't make sense to me, and here is why: the relationship between the wage and the outcome need not be monotone; the optimum can occur at an interior point, where a small increase helps but a large one hurts. It depends on the model.

Bear with me while I give an example:

Take, for instance, gasoline mileage. Everyone who has driven a car knows that you get fewer miles per gallon when driving 70 miles per hour than when you drive, say, 55 miles per hour. (Of course, I am not talking about race cars.) So driving slower always means getting better mileage, right? Well, wrong. What happens when you drive at, say, 20 miles per hour? You get bad gas mileage!

So you see, the best gas mileage comes at a certain speed. Here is a real life efficiency curve:
http://www.fsec.ucf.edu/pubs/energynotes/en-19.htm

So, if you are driving at, say, 35 miles per hour, you should speed up to get better gas mileage. If you are driving at, say, 65 miles per hour, then you should slow down to get better gas mileage.

The speed corresponding to the peak of the graph (about 45 miles per hour) is what is known as a local maximum; if you are at that speed and change speed in either direction, your gas mileage goes down. It is also a "global" maximum, in that this is the best speed to drive in terms of gas mileage.
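A tiny Python sketch makes the point. The quadratic mileage curve below is made up purely for illustration (it is not real fuel data); like the curve linked above, it peaks near 45 mph:

```python
# A made-up mileage curve that peaks near 45 mph (purely illustrative):
# mpg(v) = 30 - 0.02 * (v - 45)**2

def mpg(v):
    return 30 - 0.02 * (v - 45) ** 2

# Search over whole-number speeds from 20 to 70 mph for the best mileage.
speeds = range(20, 71)
best = max(speeds, key=mpg)

print(best, mpg(best))                # 45 30.0 -- the maximizing speed
print(mpg(35), mpg(65))               # moving away from 45 in either
                                      # direction lowers the mileage
```

Going slower is better on one side of the peak and worse on the other, which is exactly why "more of a good thing is always better" arguments fail when the underlying curve has an interior maximum.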

The same thing could be true of the minimum wage; we could well be at a point where raising the minimum wage will help things, though if it is raised too much it might end up hurting the poorest workers.

But either way, Dr. O'Neill's "why stop at $5.15?" argument is not necessarily a valid one.

Now for some fun stuff:

Darksyde of the Daily Kos has a good science diary; he writes one every Friday.

Today's can be found here:
http://www.dailykos.com/storyonly/2006/6/23/74340/6263
Today he interviews cosmologist Sean Carroll.

Sean Carroll has a cool blog worth checking out: http://cosmicvariance.com/

Finally, reading a comment on another blog reminded me of the film 2001: A Space Odyssey. I have to admit that I never "got" some of the weirder stuff in that film. But there is a website that takes a credible stab at some of the symbolism and makes some sense to me:

http://www.kubrick2001.com/

Anyway, I found that a fun site to visit.


Monday, June 19, 2006

Whew!

It has been a good long while. This past semester has been a busy one; though my course load wasn't that heavy (the usual three courses), I had mathematical statistics, linear algebra and numerical methods.

Those courses, while enjoyable to teach, require quite a bit of preparation.

During this time, I mostly focused on finishing a paper for publication; I am one more review of the rough draft away from sending it out.

When I send it out, I'll post a link to a preprint and discuss it on this blog. Don't worry; I doubt that this paper will be foundational!

So what happens when one sends a paper out?

The editor gets it, and decides if it is worth sending to a referee. The referee checks to see if it is correct and if it is interesting enough for that journal.

What have been my results? In the past, I've had the following:
  • Accepted with very minor revisions (typo correction and the like)
  • Accepted with moderate revisions: explain this, delete that, modify this proof in this way, put in a diagram, etc.
  • Accepted pending revisions: there is a problem but a fixable one
  • Rejected with a recommendation to resubmit if a problem is fixed
  • Rejected with a recommendation that I send the paper elsewhere (paper isn't appropriate for the given journal; i.e., too specialized, not high enough quality for the target journal, etc.)
  • Rejected with a recommendation that I expand the result to make it more interesting
  • Rejected because of a math error (a couple of times; I was grateful that something not obviously false was published)
  • Rejected because my paper just wasn't interesting.
Fortunately, every rejection has led directly to another publication; most of the time another paper or, in one instance, a published "problem."

To be honest, my research is far from stellar, but my stuff has appeared in the following journals: College Mathematics Journal, American Mathematical Monthly, Proceedings of the American Mathematical Society, Houston Journal of Mathematics, Journal of Knot Theory and its Ramifications (most of my stuff appears here), Bulletin of the Mexican Mathematical Society, Missouri Journal of the Mathematical Sciences and one refereed conference Proceedings (Low-Dimensional Topology, Knoxville 1992).

On another note: I got funding to attend the Math Fest in Knoxville from August 10-12 this year.