I don't particularly like any of the descriptions of the Halting Problem that I found lying around the Web, so here goes. Assume there is a program, halt or H for short, that solves the halting problem, i.e., whether any program will halt on a hypothetical input. H's input therefore has two components: 1) any program that can run on a Turing Machine (treated or encoded as data), and 2) hypothetical input for the passed program. For Turing Machines, program instructions and data reside on the same "tape", so it's not that incredible for program instructions to function as data or vice versa - for instance, a (Universal) Turing Machine implemented (finitely and imperfectly) in hardware can simulate an incredible number of Turing Machines, so any program in software that can run on one of the simulated Turing Machines will run on the hardware (Universal) Turing Machine.
Now consider another program, NOT-halt or N for short, that builds on H. This is reasonable both because we started by assuming H's existence and because N's additional code is simple as can be: it does what H does to its input, then it performs the opposite of H's result. If H computes "halt", N will not halt. If H computes "no halt", N will halt. For any input x, N(x) does the opposite of whatever H(x) is.
Notwithstanding how rebellious it acts, N is a program like any other, which implies H can determine if N halts or not for any input. So H(y), where y is an input whose program component is N, must return either "halt" or "no halt". Above we established that for any x N(x) does the opposite of whatever H(x) reports, so certainly in the specific case of x=y, N(y) does the opposite of whatever H(y) reports. If H(y) computes a halt, then N(y) does not halt. If H(y) computes a non-halt, then N(y) does halt. But the program component of y is N! If N(y) halts, H(y) should compute halt, not its opposite! And if N(y) does not halt, H(y) should compute non-halt, not its opposite! For program N (input y), H does not work, therefore H does not work for all programs. The original assumption that H works for all programs has met with a contradiction, so there can be no H. The conclusion is that there is no fully general program for computing the halting problem. I hope readers with minds similar to mine, for whom some other Halting Problem descriptions feel unsatisfactory, will find this helpful.