To quote from the article: "These repositories, belonging to more than 16,000 organizations, were originally posted to GitHub as public, but were later set to private [..]" Once things are public, they will forever remain public (in some form). That's how the internet works.
tl;dr Bing indexed and cached a public repository, then made it available to its AI chat. Later, the repository author switched the repository to private and learned the hard way how the internet works. And the story only gets better: the author is the founder of a “cybersecurity” company.
I'm baffled. There isn't even the seed of a story here, just someone not understanding that if you put data out there, the data is [checks notes] out there.
There are also the more fundamental security issues GitHub has where, after making a repo private (and in a few other cases, e.g. related to forks), commits/content _which were never public_ (e.g. pushed after it was made private) remain publicly accessible.
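A minimal sketch of how to check this for a specific commit (assuming reqwest with the "blocking" feature; OWNER, UPSTREAM_REPO and COMMIT_SHA are hypothetical placeholders, not values from any real incident):

// Rough illustration of the fork-network issue, not a definitive exploit:
// if you know the SHA of a commit pushed to a repo that later went private,
// the public upstream repository will often still serve it by SHA.
// OWNER, UPSTREAM_REPO and COMMIT_SHA are hypothetical placeholders.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = format!(
        "https://api.github.com/repos/{}/{}/commits/{}",
        "OWNER", "UPSTREAM_REPO", "COMMIT_SHA"
    );
    let resp = reqwest::blocking::Client::new()
        .get(url.as_str())
        .header("User-Agent", "dangling-commit-check") // GitHub's API requires a User-Agent
        .send()?;
    // A 200 response means the commit is still reachable even though the
    // repository that introduced it is no longer public.
    println!("{url} -> {}", resp.status());
    Ok(())
}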
Sorry for the late reply. We don't have any dedicated marketers watching the communications. We did take inspiration from it and other projects, and we have just added a credit statement to our GitHub page; hope that addresses your concern. As for being OSS, we are not ready to do that right now, on either the code or the business-strategy side. We may do it later.
However, the broader issue remains: Microsoft has successfully infiltrated OSS and its organizations by hiring and donating. It would not surprise me at all if they now hire people with an ostensibly "freedom fighter" background for credibility.
Look at how many people here cite his (former?) membership in the Pirate Party for credibility! Party membership means nothing. Politicians (in general!) change their minds, can be bought, etc. The Green Party in Germany started out as a peace party and has been used repeatedly to lend credibility to the Kosovo and other wars.
Today, we are pleased to announce that Microsoft will once again be supporting Open Data Day by providing mini-grants to organisations to help them run events; the call will launch on Open Data Day 2022.
They also supported "Open Data Day 2021". Sounds like a nice trojan horse to influence EU legislation through purported activists.
That’s no longer true. Copilot uses the same GPT-3.5 model as, well, ChatGPT. If it were trained on just GitHub projects, the chat features wouldn’t work at all.
You're assuming that Copilot Chat and the regular completion are the same model. Do you have a source that says so? I'd assumed that they were two different models, since they're quite different tasks.
Footnote 1 on page 2 explicitly mentions the 3.5 model, and the research in this paper is only about autocompletion: https://arxiv.org/pdf/2306.15033.pdf
Lastly, OpenAI states on the original Codex page: “OpenAI Codex is a descendant of GPT-3; its training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories.” It included GitHub repos, but it was never only GitHub repos. https://openai.com/blog/openai-codex
As I said to the other commenter, I specifically avoided saying "only"; I said "primarily", and I should have clarified that I meant "primarily fine-tuned". My point is simply that it is far more likely to spit out results that are patterned after GitHub than results that are patterned after any inter-programmer communications.
Also, I wasn't contesting that autocomplete uses GPT-3.5 as the base model; I was contesting the idea that it uses the same derivative model as chat.
You don't need to use the chat to test whether it was exclusively trained on GitHub or not.
I started typing this in my IDE in a Rust project I already had open:
// the following function prints out the best soccer players of all time, which teams they played for, and why they were great as a hashmap
fn soccer_greats() -> HashMap<String, (Vec<String>, String)> {
let mut hm = HashMap::new();
It completed with the following:
hm.insert("Pele", (vec!["Santos", "New York Cosmos"], "He scored over 1000 goals in his career".to_string()));
hm.insert("Maradona", (vec!["Boca Juniors", "Barcelona", "Napoli"], "He won the world cup with Argentina".to_string()));
hm.insert("Messi", (vec!["Barcelona"], "He has won the Ballon d'Or 6 times".to_string()));
hm.insert("Ronaldo", (vec!["Barcelona", "Inter Milan", "Real Madrid", "AC Milan", "Corinthians"], "He won the world cup with Brazil".to_string()));
hm.insert("Zidane", (vec!["Juventus", "Real Madrid"], "He won the world cup with France".to_string()));
}
I don't believe that information is going to be on GitHub anywhere, but I could be mistaken.
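As an aside, the completion as pasted doesn't compile on its own (no use std::collections::HashMap; import, &str keys and team names where the signature asks for String, and the map is never returned). A cleaned-up sketch of the same idea, with the remaining entries elided:

use std::collections::HashMap;

// Returns a map from player name to (teams they played for, why they were great).
fn soccer_greats() -> HashMap<String, (Vec<String>, String)> {
    let mut hm = HashMap::new();
    hm.insert(
        "Pele".to_string(),
        (
            vec!["Santos".to_string(), "New York Cosmos".to_string()],
            "He scored over 1000 goals in his career".to_string(),
        ),
    );
    hm.insert(
        "Messi".to_string(),
        (
            vec!["Barcelona".to_string()],
            "He has won the Ballon d'Or 6 times".to_string(),
        ),
    );
    // ...the other entries follow the same pattern...
    hm
}

fn main() {
    for (player, (teams, reason)) in soccer_greats() {
        println!("{player} ({teams:?}): {reason}");
    }
}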
You're addressing a straw man; I never claimed it was "exclusively" trained on GitHub. I said "primarily", though I should have been specific and said "primarily fine-tuned".
In the context of the person I replied to, the point is that it isn't made up primarily of a bunch of communications between programmers.
They did not prompt at all. They used GitHub’s code search to find projects where the repo owner specified that the code was generated “by Copilot” and the authors took that at face value for all code in the project. Whether the code was actually suggested by Copilot is not at all analyzed in the paper. As such, the results are highly questionable.