Sunday, November 25, 2007

Communication whine

If I'm not careful this might turn into an _actual_ blog if I keep up this existential musing crap...

I've had a lot of academic frustration lately. Some of it is purely logistical and political; this is not unexpected, and it will pass. But I've also had a lot of...hmm...let's say "pedagogical" frustration. No, this isn't a coded way of saying I hate my teacher(s). It's more that I don't understand things, and when I search out resources to help me understand them, those resources are more likely to infuriate me than enlighten me. Notable exceptions to this are the professors in my department; indeed, that's one of the reasons I (heart) my department: it's full of insanely smart people who are remarkably eloquent and lucid in their explanations of things. Unfortunately, I can't go bug them for every little question that pops into my head.

My frustration revolves much more around written pedagogy in my field. It drives me absolutely batty how profoundly incapable most people seem to be when it comes to communicating an idea in words. I think I've mentioned at some point previously how dismal Wikipedia is in this regard. On most normal, run-of-the-mill topics, Wikipedia does an admirable job of giving a coherent overview. When it comes to highly technical topics, however, it seems to go off the deep end. Technical articles are so inscrutably technical that you basically need to understand the topic before you look it up.

Let me give you an example that made me want to bang my head against a wall today. I wanted to learn about support vector machines (SVMs). Here's the first paragraph of Wikipedia's support vector machine article:
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. They belong to a family of generalized linear classifiers. They can also be considered a special case of Tikhonov regularization. A special property of SVMs is that they simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers.
To understand my frustration, look no further than the third sentence: "They can also be considered a special case of Tikhonov regularization." What?! Why? Why the fuck is that the third sentence in the article?? If I'm looking this up in Wikipedia, chances are I want to understand what the fuck an SVM is from a high level. Why is the third thing you tell me related to an obscure formalism that I, and probably most people who look at the article, don't care about?

Then look at the fourth sentence: "A special property of SVMs is that they simultaneously minimize the empirical classification error and maximize the geometric margin..." Great. The third sentence told me about some weird formalism, and now you've used two terms ("empirical classification error" and "geometric margin") that you haven't defined or linked to. Meanwhile, you still haven't told me any of:
  • What an SVM is in terms a lay-person (or at least a lay-person with a computer science degree) can understand
  • What it's used for (in similar terms)
  • Why it's called an SVM
Wikipedia goes on to note that there is an "excellent introduction to the topic" at an external link. Let me excerpt for you what comes immediately after the opening notes of that "excellent introduction":
There is a remarkable family of bounds governing the relation between the capacity of a learning machine and its performance. The theory grew out of considerations of under what circumstances, and how quickly, the mean of some empirical quantity converges uniformly, as the number of data points increases, to the true mean (that which would be calculated from an infinite amount of data) (Vapnik, 1979). Let us start with one of these bounds.

The notation here will largely follow that of (Vapnik, 1995). Suppose we are given l observations. Each observation consists of a pair: a vector x_i ∈ R^n, i = 1, . . . , l and the associated “truth” y_i, given to us by a trusted source. In the tree recognition problem, x_i might be a vector of pixel values (e.g. n = 256 for a 16x16 image), and y_i would be 1 if the image contains a tree, and -1 otherwise (we use -1 here rather than 0 to simplify subsequent formulae). Now it is assumed that there exists some unknown probability distribution P(x, y) from which these data are drawn, i.e., the data are assumed “iid” (independently drawn and identically distributed). (We will use P for cumulative probability distributions, and p for their densities). Note that this assumption is more general than associating a fixed y with every x: it allows there to be a distribution of y for a given x. In that case, the trusted source would assign labels y_i according to a fixed distribution, conditional on x_i. However, after this Section, we will be assuming fixed y for given x.
What?! How is something that talks about a "fixed distribution, conditional on x_i" an "excellent introduction"? Here's a general rule to go by, as far as I'm concerned: math is never an excellent introduction to other math. And indeed, this is the source of my frustration: people who have incredibly analytically adept minds (unlike mine) seem terminally incapable of explaining concepts in anything other than excessively, anally precise mathematical terms that obscure what the fuck they are talking about.

This has become a pet peeve of mine in no small part because stuff like this used to make me feel stupid. I thought I was an idiot because I found it really hard to understand. I am now of the belief that I find it really hard to understand because it's really fucking hard to understand. And it doesn't need to be. Most people, including a lot of computer scientists and people who might be interested in a topic like this, *gasp* don't think in math. Like most people, they have a huge chunk of brain dedicated to visual processing, so give them something to visualize. Also, like most people, they find it easier to have a concrete example to frame what you're talking about before you go into the gory details of the theory. If someone would just take the time to write these things in an accessible manner, a lot more people would discover, "Oh, _that's_ what you're talking about! That's much simpler than I thought it was."

Here's roughly what the article on SVMs should have said:

"Support vector machines are a mechanism by which a program can learn to classify data. Imagine, for instance, your data lies on a 2D coordinate plane. Each data point is a dot on that plane, and the data falls roughly into two groups, which translates into two distinct clumps of dots on your 2D plane (perhaps one grouped somewhere around the y-axis and one around the x-axis, for instance). Support vector machines are a learning mechanism that allows an automated agent to segregate the data into the two groups (and, implicitly, to figure out which "clump" a new piece of data should belong to). It does this, roughly, by figuring out what line most cleanly divides one clump from the other."

See? Was that so fucking hard? That's the basic gist of support vector machines, and any schmuck with a basic college education can probably understand it. I'm not that smart. Other people just seem to have the communicative abilities of an orangutan. Grr.
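
If you'd rather see that picture in code than in prose, here's a quick toy sketch of the "find the line that divides the clumps" idea. This assumes a Python setup with numpy and scikit-learn lying around, and the two clumps are just made-up numbers for illustration:

```python
# Toy sketch of the "line that divides the clumps" picture.
# Assumes numpy and scikit-learn are installed; the data is made up.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Two clumps of 2D points: one hugging the y-axis, one hugging the x-axis.
clump_a = rng.randn(50, 2) * 0.5 + [0.0, 3.0]
clump_b = rng.randn(50, 2) * 0.5 + [3.0, 0.0]

X = np.vstack([clump_a, clump_b])
y = np.array([0] * 50 + [1] * 50)  # label: which clump each point came from

# A linear SVM finds the line that separates the two clumps as cleanly
# as possible, i.e. with the widest possible gap on either side.
classifier = SVC(kernel="linear")
classifier.fit(X, y)

# The dividing line is w[0]*x + w[1]*y + b = 0.
w = classifier.coef_[0]
b = classifier.intercept_[0]
print("dividing line: %.2f*x + %.2f*y + %.2f = 0" % (w[0], w[1], b))

# A new dot gets assigned to whichever side of the line it lands on.
print("new point (0.2, 2.8) belongs to clump", classifier.predict([[0.2, 2.8]])[0])
```

The "support vectors" in the name, by the way, are just the handful of dots that end up sitting closest to that dividing line; they're the only points that actually determine where the line goes.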

Anyway, it's really really really really frustrating, and I hate it. I don't care how smart you are if you can't communicate your ideas effectively. Part of the reason I am not that interested in areas like security and things like puzzle-solving is that I hate having to figure out things that somebody else already knows but won't/can't/is too incompetent to explain to me. It feels like a profound waste of my time. There are too many problems out there that are hard to solve when we're _cooperating_ without introducing ones that arise just because we're incompetent dicks and can't or won't talk to each other.

1 comment:

Anonymous said...

Hey, it's Wikipedia. You could, uh...change it.

:-)