Another way of phrasing it: You know that effect when a wheel is spinning so fast, it looks like it's going backwards? If it looks like it's standing still, there's a Fourier coefficient at that rotational frequency.
... Or at least, I think there is. It's hard to be precise with analogies.
There has been some interesting recent work [1] applying Fourier transforms (more precisely, an adaptation of the Fourier transform to the sphere) to CNNs, to automatically encode equivariance to rotational symmetries.
For neural nets, https://NN-512.com generates stand-alone C code vectorized Fourier convolutions that are 2x2 spatially strided (with a big 16x16 tile)
This is great for large filter, 2x2 strided neural net convolutions (e.g., 7x7 stride 2x2)
There's none of the waste you find elsewhere, in implementations that downsubsample the output of a dense (unstrided) Fourier convolution. I haven't seen this technique used anywhere else!
Proakis's book is quite good if you want to really nail down the fundamentals of the mathematics, but it's completely detached from applications and intuition despite what the cover (I suspect this is because the author has a different book on communication theory) says, so I'm not a huge fan of it.
Like many others, I was hyped when it came out. Then it turns out that BERT with half the param count still kicks their butts in accuracy.
They also only model the .real component and ignore the .imaginary component entirely, which you can't do and expect good results.
But, FFTs are so cool and under-explored that I'm sure they'll be making the rounds in NNs soon. There are lots of advantages to frequency space representations.
I don't understand why the first operations of a CNN are FFT-based decomposition into spatial wavelets. This is basically all the first layers are doing anyways. In fact the filters usually learn both an edge and a centroid at a given orientation. You can get both at the same time with complex wavelets.
> The Fourier Transform is one of deepest insights ever made. Unfortunately, the meaning is buried within dense equations
The post does a great job at explaining what the Fourier transform means, but I am always skeptical of sentences like this. It's a pattern found in many other posts too: "unfortunately" this concept is expressed in math form, let's try to solve this problem.
I find that it's exactly the other way around. Math and math notation is dismissed too early and too easily even in engineering studies, resulting in shoddy and imprecise definitions and understandings.
> The Fourier Transform is one of deepest insights ever made. Fortunately, the meaning is engraved in dense equations
>It's a pattern found in many other posts too: "unfortunately" this concept is expressed in math form, let's try to solve this problem.
Some things are less intuitive because of the dense equations. You'd have to be slightly insane to say that "x^2 = y^2 + z^2" is a more intuitive way to explain what a circle is than a picture of a circle.
> Fortunately, the meaning is engraved in dense equations
> You'd have to be slightly insane to say that "x^2 = y^2 + z^2" is a more intuitive way to explain what a circle is than a picture of a circle.
I would expect any freshmen STEM student to recognize sqrt(x^2+y^2) as the distance of an (x,y) point to the origin. And the equation x^2+y^2=r^2 says this quantity must be fixed.
Now I wouldn't expect a student to be able to write the previous two sentences so clearly.
This gets to the heart of the paradox of learning. How to see past the clutter if you don't know what you're looking for?
The human brain is lazy and prone to skipping logical steps, as reflected in the variation in natural language (and student's writing skills). Therefore learning a natural language does not demand simplicity up front.. unlike the up-front rigor required to learn mathematics notation or a programming language.
>How to see past the clutter if you don't know what you're looking for?
the picture of the circle isn't clutter - in a very precise sense it is the definition (platonic ideal). the algebraic definition "x^2 = y^2 + z^2" clearly isn't the definition because it's missing things (which field is this polynomial over?). there are many other definitions (another is R/Z).
>mathematics notation or a programming language
notation isn't a formal language - there is no formal grammar written down somewhere and in fact there can be none (cf godel). whether you realize it or not it's a natural language that evolves, has slang, and is spoken to various degrees of competency by many successful mathematicians.
> the picture of the circle isn't clutter - in a very precise sense it is the definition (platonic ideal)
Which precise sense? I get that language is slippery, but I can't follow you if you get philosophical, bring up fields (maybe the polynomial is defined over a ring!) while simultaneously stating that your favorite definition is The One True Way (TM).
How does one use the platonic ideal to distinguish a circle from an ellipse?
> whether you realize it or not it's a natural language that evolves
Yes, I realize.
My point was that the day-to-day usage of natural language tolerates a lot of ambiguity. Whereas the classroom usage of mathematical/programming notation does not. It's hard to be informal and "break the rules" as a beginner while retaining precision. Because the web of potential associations hidden in a statement (the "clutter") hasn't yet been pruned down by experience.
>How does one use the platonic ideal to distinguish a circle from an ellipse?
entailed in the picture "definition" is constant curvature.
>while simultaneously stating that your favorite definition is The One True Way (TM).
that's not what i wrote.
>Whereas the classroom usage of mathematical/programming notation does not.
i won't speak to programming notation but every upper level or grad math class i've had has been at times extremely informal in its use of notation. take an integral on any chalkboard in any class and you're almost certain to have these questions without listening to the lecture itself (i.e. it's omitted from the notation):
1 what is the domain? R or C or something really exotic
2 is it over a closed contour/boundary or open?
3 what's the metric? i.e. are we actually integrating forms?
4 is it even an integral at all not just shorthand for solving a differential equation (e.g. stochastic integral)
> in the sense that "mathematical entities are abstract, have no spatiotemporal or causal properties, and are eternal and unchanging"
oh THAT sense, why didn't you just say so?
but seriously what does that sense have to do with a picture of a circle? how would one use the picture of a circle to distinguish from an ellipse?
edit: nobody ever checked if i knew diddly squat about "real actual academic platonism v formalism" before peforming mathematical exposition in front of numerous classrooms in at least two countries, and surely i'm not going to hold back while arguing on the internet!
i mean if you're not familiar with the real actual academic platonism v formalism debate maybe you shouldn't pontificate on what's better or worse re mathematical exposition?
> Fortunately, the process is engraved in dense equations. Unfortunately, the dense equations make it difficult to learn why we're following this process.
> Some things are less intuitive because of the dense equations. You'd have to be slightly insane to say that "x^2 = y^2 + z^2" is a more intuitive way to explain what a circle is than a picture of a circle.
It's the difference between qualitative and quantitative descriptions. Which one captures the full depth of the concept? Neither on its own; you need to contemplate both at the same time to capture that depth ("range" may be a better word). There is information hiding in the interplay between the two that is missed when considering them in isolation.
I don't believe formalism is all we need, but it is necessary if the problem at hand requires more than just intuition. Conversely, formalism without qualitative understanding is opaque and sterile. Together, they combine operational ability and simple intuition into a higher form of intuition (operational intuition), the most desirable kind.
yes and this particular flavor of insanity has a name: pedantry.
>the meaning is engraved in dense equations
what's hilarious is that middling tier mathematically literate people worship mathematical formalism while truly mathematically literate people (i.e. mathematicians) eschew it because they understand that mathematics can't be formalized but also because it's natural language whose value is communication not codification. even here - ask someone to explain to you rigorously the definition of the fourier transform wrt dirac deltas. again the middling tier mathematically literate will assume that writing down an integral against a dirac delta is rigorous simply because you can write it down completely forgetting that the dirac delta is a tempered distribution not a function and thus that integral means something completely different. ultimately it is the case that the definition is consistent with a set of agreed upon definitions but the symbology itself is just a pun.
lol tell me you're a physicist without telling me you're a physicist :)
>with minimal confusion.
i think this the key thing that people that do a little math but not an immense amount don't understand: notation serves the same purpose as short hand in any other speech/communication. locally it's crystal clear, where local can be as small as that paper, that subsection, or just that paragraph (cf "the distinction is clear from context" lolol). i've written papers myself where i switch notations midstream for efficiency's sake (try defining tensors without bases but then try manipulating them without index notation...) but taking notation for granted is just as likely to get you into trouble as taking an API's promises for granted (in fact they're exactly the same things - leaky abstractions).
the value of [mathematical] language is not only communication, but also reasoning. It is remarkable, and sometimes unfortunate, how much mathematical reasoning you can do without knowing what you're doing, by pattern matching. Notation doesn't just allow you to communicate. It also allows you to manipulate e.g. truths that you know into truths that you do not yet know. In doing this abstract manipulation, you postpone (when necessary, and hopefully not indefinitely) thinking about semantics, while acting only syntactically.
A lot of people say something like "math is a universal language", and it sounds like we both agree that's nonsense. But I think we disagree on the extent to which math is a natural language. Your comment does not, in my opinion, recognize the symbolic manipulation (which doesn't have to be mostly-linear sequences of symbols, but can be e.g. diagrams too) that make math math.
For most people, having only the equations means they cannot build an intuition for what a fourier transform actually is, because they don't have a solid enough grasp of math notation to easily make sense of what the dense equations actually represent.
For those people it's very helpful to first get a sense of what it all means (even if it means that initial mental model is somewhat imprecise), because it makes understanding the equations easier.
My experience is that visuals help people _feel_ like they understand the concept, but the equations are necessary to get an _operational_ knowledge.
It's all well and good to be able to talk about the "deepest human knowledge" at a dinner party, but it doesn't help you solve actual problems using the fourier transform.
My brain works the opposite. Equations give me something I can work with but feels like voodoo, and I only get operational knowledge once I can visualize it in some way.
But "dense" in this context literally means "difficult to understand". If you are trying to build intuition, then that is pretty much by definition a bad thing.
At the end of these types of popsci explanations you still can't grok the math any better and you end up with a hand-wave-y misleading understanding of what's going on.
It starts off by saying that the Fourier Transform "measures every possible cycle". This is an wrong approach to thinking of the Fourier Transform. If you think in these terms you will quickly get lost. The Fourier basis is a set of sinusoids that are mutually orthogonal (ie. none can be represented as a combination of the others). They just happen to generally cover the frequency ranges you're interested in (though this depends on your problem). If you choose some random sinusoidal signal that isn't a harmonic of your sampling period, then the Fourier Transform gives you back a mess that's hard to interpret (how a off-harmonic sinusoid gets "smeared" across the basis is actually something I frankly still don't quite understand https://ccrma.stanford.edu/~jos/mdft/Frequencies_Cracks.html )
This is in contrast to the phase information that you get that is continuous. Each frequency has two associated basis vectors and they can give you any phase offset. The frequencies are only set harmonics, but the phase can be any value. This is the result of how the basis is setup. If anything, the phase information you get from the Fourier transform is more interesting than the frequency information you can glean.
The "Circular Paths" visualization is also highly misleading. The complex plane is a crutch that gives you pretty pictures. It pretends to be like a 2D plane but it's actually fundamentally distinct from a Cartesian/"normal" X-Y plane b/c complex numbers can be multiplied while normal [x,y] pairs can not (b/c the operation is simply undefined - you can't multiply two points on a map). The product of two complex numbers is giving you another complex number.. but good luck visualizing where that point ends up! You'll notice all complex plane examples stay on the unit circle b/c you get a "rotation" but it's an edge case. The whole 2D analogy is setting you up to be very confused
Why is there always this pushback against "popsci explanations" in these threads? Do people not realize every brain is unique and people have different learning styles? A few hours of 3blue1brown videos gave me a useful intuition about FT, FFT, quaternions, and other mathematical things that years of courses with equations and homework failed to do.
I'm talking real, useful intuition that improved my work in signals processing, using wavelets, filters, convolutions etc.
And yes I can visualize 3.5D pretty easily in my head: a 3D volume with layers and colors that overlap. Though usually it's easier to picture two 2D frames side-by-side.
I don't fault people when they grok words/equations but visual diagrams do nothing for them (blueprints/CAD are an enigma to my partner). Don't fault people for needing visualization to grok equations.
I didn't mean to imply that there is anything wrong with visualization. I feel they're easier to understand than equations for the vast majority of people. But as a concrete example, to understand the Fourier transform it probably took me a couple of weeks of reading different textbooks in my spare time to really grok how the math works here.. it took some effort and I'm not some autistic savant that sees things in equations - so it has nothing to do with learning styles.
Looking at some diagrams and videos for certain topics is a lazy and harmful approach to learning because some problems are just not conducive to visualization. Efforts to force a visualization ends up being actively bad and give viewers the wrong idea. My comment was a point by point explanation of how this ends up happening. At least in this particular case, with the Fourier Transform, the visualizations in effect made it harder for me personally to understand how the math works b/c they are so sloppy and emphasis all the wrong points.
In the grand scheme of things, having a half-assed popsci understanding is generally worse than not know anything. You can get a sense of overconfidence and then go on to apply these techniques incorrectly and make broken/fragile systems that make no mathematical sense (or publish insane papers)
If you really work with signal processing and don't understand the mathematics of the fourier transform then that's frankly a little scary. I'd really recommend taking the time to understand it b/c I think it really pays off - but yes, it takes a bit more effort than watching a youtube video..
>But as a concrete example, to understand the Fourier transform it probably took me a couple of weeks of reading different textbooks in my spare time to really grok how the math works here.. it took some effort and I'm not some autistic savant that sees things in equations - so it has nothing to do with learning styles.
So you're still claiming your experience covers all.
When did I say anything about not understanding the math of the FT? I took a whole semester of classes in signals, control theory, comms, etc. Derived FT, FFT, and friends, from scratch. I understand the math. But I have limited intuition about what I can do with the math. I can look it up in a book, or plug and chug, but it's not very fluid. The amount of times I've needed to manipulate equations since that class is zero.
>Looking at some diagrams and videos for certain topics is a lazy and harmful approach to learning
god my eyes can't roll farther back in my head. your point would be less absurd if it weren't for the fact that algebraic geometry is an enormously powerful and productive area of math whose whole ass founding principle is that geometric interpretations of algebraic objects are extremely useful.
AND
spectral theory (what fourier analysis is called today) is an extremely geometric area owing to the dependence of convergence on the metric.
>In the grand scheme of things, having a half-assed popsci understanding is generally worse than not know anything
yes because this is a resource for rocket scientists and not interested people (probably high school and college students)?
>If you really work with signal processing and don't understand the mathematics of the fourier transform then that's frankly a little scary.
bro this isn't oppenheim. it's just a nice intro for people that might be curious or are learning for the first time.
>Why is there always this pushback against "popsci explanations" in these threads?
because underneath all of the rhetoric about entrepreneurialism and "disrupting" thing tech (and by extension HN) is one of the most conservative and elitest communities that has ever existed. think about what people used to say about dynamic languages vs statically typed, about frontend vs backend, about sql vs nosql, etc etc etc. stodgy old engineers have always (and will always) resist the bar being lowered and newcomers being admitted.
edit:
what's funny is that the opposite response shows up, invariably, on posts that are very precise and technical. i.e. something like "this is too technical i wish mathematicians would accommodate us lay folk".
> stodgy old engineers have always (and will always) resist the bar being lowered and newcomers being admitted.
I try not be elitist or unwelcoming to newcomers, but yeah uh why should we lower the bar? Are the newcomers somehow incapable of learning engineering at the same level that the old ones did? Its no fun to work with people who aren't as motivated and cry for a lower bar instead of rising to the challenge. Its easy to work with someone more motivated than you but it sucks to work with someone less motivated.
> The complex plane is a crutch that gives you pretty pictures.
That’s a very poor outlook on an incredibly useful mathematical tool. Maths is done essentially by finding new ways of viewing an old thing, and restricting the set of tools available seems like a poor idea in general. As for teaching mathematics, finding more ways of looking at something, however imprecise, is a huge aid to understanding. Never be afraid to show someone a picture that’s only 80% true but could potentially aid their understanding in how something works!
I also disagree that the “circular paths” visualisation is misleading - what is misleading about it? This is a completely accurate representation of how a Fourier series is added up to get a real part which varies over time. It even makes it intuitive that multiplying the series by a fixed complex number (corresponding to rotating and scaling the “orbiting” points on the left) will have the effect of translating the resulting signal to the right or the left.
Also for what it’s worth, the product of complex numbers is easy to visualise? Add their angles, multiply their magnitudes.
B/c it actually adding a layer of complexity that's not necessary for the sake of getting a pretty picture. I'll try to illustrate..
The roots of unity are just trivially observed mathematically
Through Euler's identity: e^i2π = cos(2π) + isin(2π) = 1
then obviously (e^i2π/b)^b = 1
So then e^i2π/b is a b'th root of 1. Done!
You don't need a complex plane.. You don't need to explain why you're multiplying 2D points.. and then explain that now you actually have this new operation defined and it's not really a 2D point oh and btw this multiplication is giving you a rotation.. which rotates your point back to 1+i0 (which is maybe the crummiest part b/c 2D rotations are not intuitive and visualizations should leverage our "gut feeling" and intuition). You just skip all of that! The equations are easy. Everyone knows (A^b)^c = A^bc
Then you show that (e^i2π/b)^(b+1) = (e^i2π/b) - so it's "cyclical", the equations starts returning previous values at higher integer inputs.
And then you show orthogonality between basis vector by just taking two basis vectors and doing a dot product. If you write out an example it's really easy to see how the values pair off and cancel out. There is no easy visual equivalent. You can't see how a real/imaginary sinusoid pair is orthogonal to another.. If you try to force a visualization it's only going to be more confusing.
And actually, b/c it's hard to visualize this last step is usually sorta glossed over in intro material. But I'd argue it's the most important step b/c it's the whole reason why are we choosing these extra-challenging e to the i things to build a basis... it's for this orthogonality. If we just wanted sinusoids that are cyclical then we'd be doing everything with real numbers and keeping it simple. You'd cross correlate a sinusoid at different frequencies for instance or something to that effect... but the fourier transform is giving you something much better than that. And as a bonus you see why the chosen frequencies are discrete and not arbitrary.
All operations have to be defined, and learned first. That's like saying that you can't multiply two matrices... (Though it does annoy me when a symmetric symbol is used for non-commutative operations !)
Oh woah, I've never seen that visualization - very cool. I only meant to really highlight that often things are thrown on the complex plane and presented in introductory material as making the problem in 2D. The few times I've seen this (physics, math and signal processing classes) I've never actually seen it emphasized anywhere that the complex plane is fundamentally different from a cartesian 2D plane. It's the crux of the whole spinning unit circle stuff and then why you get orthogonality
I still find the visualization completely unnecessary b/c it doesn't help illustrate or give any kind of intuition for why the basis vectors are orthogonal
Is what bothers you that the complex plane has this very specific feature that purely imaginary x purely imaginary = real, so you're "mixing" orthogonal dimensions ?
However, I still don't understand your reticence, since imaginary numbers are fundamentally about cycles and rotations (via exponentiation), and rotations fundamentally require having 2 dimensions to represent them : hence a plane !
(Like you say, you also need 2 dimensions to be able to have orthogonality...)
And I'm getting a bit rusty here, but I guess that imaginary being orthogonal to real might be why these Fourier basis vectors are orthogonal ?
I mean, it's literally Euler's formula :
exp(𝛕iθ) = cos(𝛕θ) + i sin(𝛕θ)
I don't remember, aren't those 2 orthogonal basis vectors "cos" and "sin" ?
I guess what bothers me is that visualizations are supposed to leverage our intuition and gut feeling. Like in Geometry, the teacher tell you that when the lines are parallel, the angels on on a bisecting line are the same. You "see" it on the image and it immediately makes sense.
When you have a weird new operation that doesn't behave in any relatable way then the visualization is effectively useless. I explained a bit further here how you can completely skip this pseduo-2D stuff and things remain very clear:
Maybe the 2D plane is useful in other contexts and it's an analogy that's worth internalizing - I don't wanna knock it entirely - but in the case of Fourier Transform it's a high cognitive load for something that's actually not that important and easy demonstrated from the roots of unity.
And the net effect is that the stuff that can't be visualized, and that is arguably more important, is glossed over b/c it's not pretty enough
- the "inner product" operation (integral of the product of the two functions): imagine what would happen as you approximate functions as discrete vector with a very high number of dimensions/co-ordinates and computed the dot-product between those two vectors, but scale the result to be invariant of how many dimensions you used to approximate it => you get the integral formula
- Now, it's just normal linear algebra:
- The "length" of one of these vectors can now be thought of as the square root of the inner product of the function with itself
- The "distance" between two functions can now be thought of by subtracting one function from the other, to get a new "vector/function", and compute its length
- The cosine of the "angle" between two functions is the dot product between two functions scaled to have length 1
- The functions describing a sine or cosine wave are vectors which have a inner-product against themselves of 1, and a dot-product against any other frequencies of 0
- Thus the different frequency functions/vectors form an orthonormal basis
- This means that you can find the co-ordinates of any function by taking the inner product of the function against each fourier basis function
- The "co-ordinates" of your function w.r.t. the orthonormal basis can be computed by taking the inner product against each basis function/vector
- This will be the point that minimizes the distance to your actual function
- These "co-ordinates" are the fourier co-efficients for the fourier series representation of your function
- For non-periodic functions, you can take the limit as your period goes to infinity, that gives you the fourier transform representation.
So you got through 2 or 3 years of calculus at least a year of abstract algebra and probably 2 years of linear before you could understand what a Fourier transform was?
The most intuitive explanation I've seen, although a bit abstract, is that Fourier transforms, and the other Fourier-like transforms including the s and z transforms, are just about changing the system of coordinates of the data vector using the transform's generative vectors.
Indeed we can easily see that the definition of all those transforms take the form of the sum of a scalar product that computes the projection of the data vector unto the kth dimension in the transformation (e.g. frequency) space.
This is the interpretation that leads you to imagine what a fractional Fourier transform might be: a rotation other than 90 degrees in the vector space you described. Interesting, if esoteric.
I was wondering recently whether some kind of 45° rotation transform could be used to represent equally well both "sides" of Heisenberg's unsharpness principle ?
I see this particular mistake crop up a lot, and it's a bit pedantic, but I don't get the impression that it's being intentionally ignored for educational reasons: The author repeatedly describes "the" recipe, "the" set of frequencies, .... The standard Fourier transform is only unique conditioned on the fact that we've agreed on a particular set of basis functions ahead of time. It isn't even the only choice of orthogonal exponentials we could have picked.
In terms of real applications that point often doesn't matter much -- especially in fields like lossy compression where you might not actually care much about any underlying meaning in the representation, and especially in fields like radio where your data is in some sense fundamentally represented as combinations of sinusoids -- but in general it isn't a sound leap to go from "this Fourier coefficient is big" to "there exists a big sinusoid that semantically matters to my data."
If you add some additional structure of remembering that the domain of these transformations is an abelian group, then there is actually a completely canonical version of the Fourier transform, whose domain is the dual group - the only non-canonical part is picking labels for elements of the dual group, but there is no other choice of basis functions. (Wikipedia gets a little into this, you can find it under "Pontryagin duality" on the Fourier transform page, but it's not really enough of an explanation to learn from).
Take the discrete Fourier transform for example, whose domain group is Z/nZ: the integers-mod-n for some fixed n. The basis functions for the transforms need to be group homomorphisms f: Z/nZ -> C, meaning that they need to satisfy f(a + b) = f(a) f(b) for all a, b. Without too much trouble you can work out that the only such functions are of the form f(k) = exp(2 pi i m k / n) for m = 0, ..., n - 1. So once the group structure is remembered, the choice of basis is forced. The only ambiguity is relabelling the basis elements back into Z/nZ, which is a (very useful) hack.
The same reasoning (with more complicated working) applies to taking the Fourier series of a periodic function, or taking the Fourier transform of a continuous function on the real line. There is a canonical choice of basis, but some ambiguity in what labels to assign to those basis elements.
Since you could be in any basis before taking the Fourier transform, it's more like there's a canonical change-of-basis than there is a canonical basis. Time domain might not be the system's most natural basis.
Linear time-invariant physical processes are so common in the real world that it's not unreasonable to expect sinusoids to semantically matter... At least when the data pertains to physical systems.
Certainly. It's super reasonable to think that they might matter and to investigate that approach; it's unreasonable to take as an axiom that they matter and use that as proof of other facts.
Nope, I just use HN search to find them - but I have software that makes it fast to do, including formatting the lists. Previous times this came up, if anyone is curious:
I'm thinking of adding code to macroexpand all HN links (to submissions) that people post, into the same format I'm using above: "Title - <url> - mm/yy - (number of comments)". Can anybody think of a downside to doing that?
Longer term, we'd like to make this a community thing so everyone can curate these lists collaboratively. There are often major related threads that don't show up in the obvious HN searches, because their titles didn't include the obvious search terms. For any such thread there's probably at least one HN user who remembers it, and I'd love to give them a chance to add it.
There's also the related (no pun intended) question of how to bundle together related articles on current ongoing stories, whether or not they have HN threads.
Why? I totally understood it. Then again, I have a background in chemistry, HPLC-MS, NMR, AA, FTIR, so I'm used to the idea of "breaking a mixture into orthogonal components."
Because "recipe for a smoothie" analogy is too broad and doesn't help you understand how the Fourier Transform is different from a Ruby script or a bill of materials for a chair.
Not to mention that the idea of "given a smoothie, it finds the recipe" is impossible and makes no sense in the first place.
If you're looking for deeper explanations, I recommend "Practical Signal Processing": https://www.amazon.com/Practical-Signal-Processing-Mark-Owen...
Many of the chapters are here: https://battle.shawwn.com/sdb/books/PSP/ but buy the book if you can afford to.
In particular, Chapter 4's "untwisting" intuition is the best I've found: https://imgur.com/a/r7KvO60
Another way of phrasing it: You know that effect when a wheel is spinning so fast, it looks like it's going backwards? If it looks like it's standing still, there's a Fourier coefficient at that rotational frequency.
... Or at least, I think there is. It's hard to be precise with analogies.