So I just searched my email on HIBP again. Most of the leaks I see there are from old websites, from many years ago, that I hardly cared about securing. But, in general, how do I find out what has actually been leaked (if it's not website-specific)?
I'm not going to change all of my passwords every time a random website that I used briefly ten years ago leaks my low effort password.
There are sites for searching for your (or anyone else's) publicly revealed information, but the one free one I knew of was forced offline.
Downloading the datasets isn't much better: there are so many of them, with so few options to obtain them. The mega-compilations likely won't include everything either, like your license plate numbers or all your compromised addresses, nor which site the hackers stole it from.
So basically don't bother. If you want the same experience, open up notepad, HIBP, and your password manager, and make a little doxx file on yourself, in CSV or JSON.
I use separate emails for all accounts, and that gets me in trouble when companies "consolidate" accounts because "everyone uses the same email for all accounts". The idea may be good in theory; in practice it is not.
The parent was talking about different passwords, not different emails. But I'm curious, what does it mean for a company to consolidate accounts? How would that be done to your separate accounts automatically, and what trouble does it cause? And what is the normal case where people have multiple accounts all with the same email?
I just don't understand the circumstance you're describing.
Exactly! Then you write each password down in your notebook of passwords and pat yourself on the back for how hard it would be to compromise all your accounts in one go ;)
If I understand this correctly, it translates Rocq to C++? Took me several minutes to even understand what this is. Why is it called an extraction system? Who is this for?
I'm confused.
edit: I had to dig into the author's publication list:
Testing remains a fundamental practice for building confidence in software, but it can only establish correctness over a finite set of inputs. It cannot rule out bugs across all possible executions. To obtain stronger guarantees, we turn to formal verification, and in particular to certified programming techniques that allow us to develop programs alongside mathematical proofs of their correctness. However, there is a significant gap between the languages used to write certified programs and those relied upon in production systems. Bridging this gap is crucial for bringing the benefits of formal verification into real-world software systems.
That's essentially correct. Extraction is a term of art in Rocq. A Rocq program contains both a computational part and proofs about that computation, all mixed together in the type system. Extraction is the automated process of discarding the proofs and writing out the computational component in a more conventional (and probably more efficient) programming language.
The original extractor targeted OCaml, and this is a new extractor to C++.
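To make that concrete, here's a toy sketch of my own (not from Crane or its paper): a definition that carries a proof obligation, plus the stock command that extracts just the computational part to OCaml.

Require Extraction.

(* pred_safe takes a number together with a proof that it is nonzero. *)
Definition pred_safe (n : nat) (H : 0 < n) : nat :=
  match n with
  | 0 => 0      (* unreachable when H holds, but required for totality *)
  | S m => m
  end.

(* The proof argument lives in Prop, so extraction erases it and emits
   plain OCaml containing only the match on n. *)
Extraction pred_safe.

Crane does the same kind of erasure, just with C++ as the target language instead of OCaml.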
Just like JavaScript folks like calling their compilers "transpilers", proof assistant folks like calling their compilers "extraction". Essentially it's a compiler from a high-level language to a slightly lower-level, but still reasonably high-level, language.
Simplifying a bit, a compiler tr(.) translates from a source language L1 to a target language L2 such that
semantics(P) == semantics(tr(P))
for all programs in L1. In contrast, and again simplifying a bit, extraction extr(.) assumes not only language L1 and L2 as above, but, at least conceptually, also corresponding specification languages S1 and S2 (aka logics). Whenever P |= phi and extr(P, phi) = (P', phi') then not just
semantics(P) == semantics(P')
as in compilation, but also
semantics(phi) == semantics(phi'),
hence P' |= phi'.
I say "at least conceptually" above, because this specificatyion is often not lowered into a different logical formalism. Instead it is implied / assumed that if the extraction mechanism was correct, then the specification could also be lowered ...
I'm not entirely sure I fully agree with this definition; it seems somewhat arbitrary to me. Where is this definition from?
My usual intuition is to ask whether the generated code needs a complicated runtime to replicate the source language's semantics. In Crane, we avoid that requirement with smart pointers, for example.
This definition is my potentially flawed attempt at summarising the essence of what program extraction is intended to do (however imperfect in practice).
I think extraction goes beyond 'mere' compilation. Otherwise we would not need to program inside an ITP. I do agree that the state of the art does not really fully reach this platonic ideal.
I have another question, the abstract of your paper says that you "provide concurrency primitives in Rocq". But this is not really explained in the text.
What are those "concurrency primitives"?
We mean Haskell-style software transactional memory (STM). We call it a primitive because it is not defined in Rocq itself; instead, it is only exposed to the Rocq programmer through an interface.
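For intuition, here is a stripped-down sketch of what such an interface could look like on the Rocq side. This is illustrative only: the names and typing are a simplification of Haskell-style STM, not our actual API, which is exposed through the interaction-tree interface.

(* Declared, not defined: the Rocq programmer only sees the types; the
   behaviour is supplied by the extracted C++ implementation. *)
Parameter TVar : Type -> Type.
Parameter STM  : Type -> Type.
Parameter IO   : Type -> Type.

Parameter newTVar    : forall A : Type, A -> STM (TVar A).
Parameter readTVar   : forall A : Type, TVar A -> STM A.
Parameter writeTVar  : forall A : Type, TVar A -> A -> STM unit.
Parameter retry      : forall A : Type, STM A.
Parameter atomically : forall A : Type, STM A -> IO A.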
I'm the other dev of Crane. Our current plan is to use BRiCk (https://skylabsai.github.io/BRiCk/index.html) to directly verify that the C++ implementation our STM primitives are extracted to matches the functional specification of STM. Having done that, we can then axiomatize the functional specification over our monadic, interaction tree interface and reason directly over the functional code in Rocq without needing to worry about the gritty details of the C++ interpretation.
I'm not an expert in this field, but the way I understand it is that
Choice Trees extend the ITree signature by adding a choice operator. Some variant of this:
ITrees:
CoInductive itree (E : Type -> Type) (R : Type) : Type :=
| Ret (r : R)
| Tau (t : itree E R)
| Vis {T : Type} (e : E T) (k : T -> itree E R).
ChoiceTrees:
CoInductive ctree (E : Type -> Type) (C : Type -> Type) (R : Type) : Type :=
| Ret (r : R)
| Tau (t : ctree E C R)
| Vis {T : Type} (e : E T) (k : T -> ctree E C R)
| Choice {T : Type} (c : C T) (k : T -> ctree E C R).
One can see the "Choice" constructor as modelling internal non-determinism, complementing the external non-determinism that ITrees already allow with "Vis", which arises from interaction with the environment. (Process calculi like CCS, CSP and the pi-calculus, as well as session types and linear logic, also make this distinction.)
There are some issues arising from size inconsistencies (AKA Cantor's paradox) if/when you try to fit the representation of all internal choices (which could be infinite) into a small universe of a theorem prover's inductive types. The ChoiceTree paper solves this with a specific encoding. I'm currently wondering how to port this trick from Coq/Rocq to Lean4.
If you're an OMSCS student, most courses offer the download through Ed or Canvas. Usually it's a big zip file under the first lesson, but I've seen some available in the shared Dropbox. I've seen this for GIOS, ML4T, ML, and a few others. Or you can just reach out to the TAs.
If you're not a student, then it gets a bit tricky. Some courses are available as YouTube playlists or on Coursera, but then it becomes a hassle to download and piece together hundreds of individual files.
Feel free to drop me a note (email in my profile), or open an issue on github.
Thanks for your reply. Unfortunately I'm not a student there. I just saw that they were making some of their lessons publicly available and wanted to organize the material for myself. I'm experiencing their courseware through the 2 minute long micro lessons on the Ed platform and I don't see any way to download the videos.
Some courses are widely available on YT [1], and already in the more palatable (IMO) long-form format instead of hundreds of 1-2 min snippets. Some other courses you can find download links somewhere [2].
So yeah, it's a bit of a hassle, but you can probably still piece it together for some/most courses that are publicly available.
Hi, is there a way to view the lectures in a more traditional way? For example, as one long video? I'm seeing lessons broken up into 2-5 minute long videos.
In reference to the open courseware, is there a way to either just download all of the videos in bulk, or view them as part of a single video? It looks like they're broken down into ~2 minute long video clips through the Ed platform, which is very annoying.
Annoying indeed. I created a script using ffmpeg to merge all the 2-min clips into a longer video per chapter[1], so I could watch the lectures on my commute.
You may need to tweak it for different courses, but I've used it for ML4T, GIOS, and ML, and it has been incredibly helpful.
As noted in another reply, the natural numbers example is contrived, but illustrative. Nevertheless, if you have a set-theoretical foundation, e.g. ZF/C, at some point you need to define what you are doing in that foundation. Most mathematicians do not, and happily ignore the problem. That works until it doesn't. The whole reason Vladimir Voevodsky started work on HoTT and univalent foundations was that he believed we in fact DO need to be able to pull definitions back to foundations and to verify mathematics all the way down.
You encounter abusive language/notation basically everywhere in math. Open up a calculus/real analysis textbook. A lot of the old ones write sequences in the curly brace/set {x_n} notation:
"let {x_n} be a sequence"
As the author points out, a sequence is a function. Read literally, {x_n} is the set of terms of the sequence, i.e. its range. A function and its range are two different things. And sets have no ordering. It might seem like a minor thing, but I thought we were trying to be precise?
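To spell out the distinction in LaTeX (my own rendering, not from any particular textbook):

% the sequence itself is a function
x : \mathbb{N} \to \mathbb{R}, \qquad n \mapsto x_n
% what the braces literally denote is its range, an unordered set
\{\, x_n : n \in \mathbb{N} \,\}
% a less ambiguous shorthand for the sequence itself
(x_n)_{n \in \mathbb{N}}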
A second example: at the high school level, I'm pretty sure a lot of textbooks don't carefully distinguish between a function and the formula defining the function very well.
The author of this web page has a section on what he calls "double duty definitions". Personally, I don't find anything wrong with the language "let G=(V,E) be a graph". G is the graph and we're simultaneously defining/naming its structure. So, some of this is a matter of taste. And, to some extent, you just have to get used to the way mathematicians write.
Right, the notation for derivatives is also totally confusing, especially for things like the chain rule. The sloppy way to write it seems very intuitive but isn't precise. To write it in a precise way, you have to use the vertical-bar-on-the-right (e.g. d/dx f(x) | x=0, but typeset properly), explicit variable names, etc. In high school I rewrote a lot of the textbook material in a super-explicit version like this for myself. It got very verbose, of course, but it gave me a deeper understanding.
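For example (my own rendering of that super-explicit style), instead of the usual shorthand d/dx f(g(x)) = f'(g(x)) g'(x), the chain rule at a point a can be written out with evaluation bars:

\left.\frac{d}{dx} f(g(x))\right|_{x=a}
  = \left.\frac{df}{du}\right|_{u=g(a)} \cdot \left.\frac{dg}{dx}\right|_{x=a},
\qquad\text{i.e.}\qquad (f \circ g)'(a) = f'(g(a)) \cdot g'(a).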
Same in college when learning the Fourier transform: a stumbling block was that the prof didn't properly explain that it takes a function as a whole and gives a whole new function as output. When you first learn this concept, it takes a bit of time to wrap your head around, but when it clicks, everything makes more sense. But just writing F{sin(x)} = ... makes it seem like F acts on a concrete value. A more explicit way would be F{x->sin(x)} = {x->...}.
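In LaTeX, the explicit version I mean looks something like this (using one common convention for the transform; the particular convention doesn't matter here):

\mathcal{F} : (\mathbb{R} \to \mathbb{C}) \to (\mathbb{R} \to \mathbb{C}),
\qquad
\mathcal{F}\{\, t \mapsto f(t) \,\}
  = \Big( \omega \mapsto \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt \Big)

Both the input and the output are whole functions, not values.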
Of course, once you already know these fundamentals and they are baked into your brain and taken for granted, it's hard to see where beginners get confused, and writing in shorthand is so much easier, so you get sloppy while remaining unambiguous to experienced people.
This is why I always preferred to see coded-up demos and implementations as opposed to formulas on blackboards and slides. If you have to implement it, you can't handwave away things as pedantry. It forces precision by default.
Fully agree with this thought. Sloppy notation has been a hindrance to my personal understanding many times. Math made more sense for me when I could code it up and see the function doing things. Everything explicit vs inconsistent shorthand that saved the author a few pen strokes.
Which is why I am so favorable toward Jupyter-notebook-like teaching environments. Embed the (guaranteed to execute!!! no illegal shorthand) code so that learners can get a true representation that can be manipulated. Although I think they are still unlikely to reshape education: now you require some coding fluency on top of the niche math topic.
I think there's also a bit of sneakiness in that by not requiring actual understanding, it's possible to memorize equations and pretend to understand, and then pretend that students got more understanding than in reality.
> abusive language/notation basically everywhere in math
In most cases it is not so much abusing notation as overloading it. If you think of the context of a formula (say, adjacent paragraphs) as its implicit arguments (think lambda captures in C++), then it is natural that curly braces can denote both a set and a sequence, depending on this implicit input.
Such context dependent use of symbols is actually rather convenient with a little practice.
"it is natural that curly braces can denote both a set and a sequence, depending on this implicit input."
?
I don't even know where to begin. Overloading symbols in mathematics occurs all over the place. There's nothing wrong with that. The difference between overloading a symbol and abusing it is whether there is an agreed upon definition/convention regarding its use and to what extent its use conforms to that definition/convention. What I'm saying in my original post is that the statement "{x_n} is a sequence" disagrees with the formal idea of what a sequence is and that most writers don't bother to explain their own notational use.
If you wish to re-define the curly braces to have a context-dependent meaning, knock yourself out. But, I would imagine that that practice would confuse a lot of people. Math is a human activity. It's not a programming language.
I remember being very pleased when I first encountered what might be an abuse of notation, but is nonetheless super convenient: \Sum_{s \in S}, where s is an element of a set S. Or, even better, just: \Sum s_i. Of course, this is defensible as I was writing out 100 summations in a stats course, not writing teaching material.
My guess is that most mathematicians wouldn't consider that notation abusive. Why do you think it is? That notation is a convenience that allows us to represent a sum over a set where the elements aren't indexed by integers. So, in this particular case, I think there is a utility to the notation. And also: the definition of \sum_{e\in S} is ubiquitous.
I'm not a mathematician, but I went decently far into math and hardly ever encountered a summation over non-indexed elements, or really anything beyond the standard \Sum_{i=1}^{n}, even up to my final math courses.
I wasn't aware of its ubiquity! I may only think of it as "abusive" due to lack of familiarity. The way I've seen it used is: \Sum_{e \in S} e_i, where 'i' is never explicitly defined, and this still assumes elements indexed by integers. The only utility seems to be from the abbreviation, leaving out the range of indices being iterated over. Not saying that isn't useful, but the rigor of the math probably doesn't benefit from time-saving omissions.
I'm tempted to call that notation simply wrong instead of abusive. Generally "abusive" notation, while technically wrong, has some redeeming feature in intuition or conciseness.
In this case, the alternative notation would be to simply drop the index and write "\Sum_{e \in S} e", which seems to be all around better.
From having spent way too much time doing technical writing, I'm tempted to say the notation you are recalling really was a mistake. They probably started out with "\Sum_{e \in S} e", then decided to make all summations index-based instead of set-based. Unless you spend a lot of time proofreading, that type of style change can easily lead to half-translated expressions like what you recall.