Hacker News

Next steps for BPF support in the GNU toolchain

112 points by signa11 4 months ago | 26 comments

I wish these articles would have a one sentence description of what BPF stands for. It would help passers-by.

tremon 4 months ago

Alternatively, the use of BPF without explanation signals that the article is a deep-dive and not intended for random passers-by. I don't support the position that every article should be written to cater to the lowest common denominator.

The first line of the article:

> Support for BPF in the kernel has been tied to the LLVM toolchain since the advent of extended BPF.

Should the article also explain which kernel they're referring to, what LLVM is and stands for, and highlight the differences between BPF and extended BPF? Or are they allowed to expect a motivated reader to do a cursory web search to fill in the gaps in their knowledge?

phiresky 4 months ago

I disagree; you can always spend one or two sentences at the top to immediately bring everyone to a good starting point, regardless of how much technical depth the rest of the article has.

For example, in this case: "eBPF is a method for user space to add code to the running Linux kernel without compromising security. They have been tied [...]. The GNU toolchain, historically used to build Linux and still preferred by many, currently has no support."

The description of what LWN and Linux are would be in the about page linked in the article.

It costs almost nothing for an expert to skim/skip two sentences while saving loads of time for everyone else.

The article is also completely missing motivation (why do we care whether BPF is supported in a second toolchain?), which would be helpful for almost everyone, including people who think it is obvious.

Edit: To be clear though, I love LWN. But the articles are very often missing important context that would be easy to add that I suspect would help a large portion of the reader base.

spaceywilly 4 months ago

A nice practice I try to follow is to always spell out any Three Letter Acronym (TLA) the first time it is used. From then on, the bare TLA can be used.

In this case, BPF (shorthand for eBPF) stands for Extended Berkeley Packet Filter. It's a relatively new feature in the kernel that allows attaching small programs at certain "hook points" in the kernel (for example, when some syscall is called). These programs can pass information to userspace (like who is calling the syscall) and make decisions (like whether to allow the call to proceed).

More info here https://ebpf.io/what-is-ebpf/

corbet 4 months ago

We do try to spell things out and/or link them in LWN articles to make the context available, but some things we just have to assume.

Additionally, spelling out "Berkeley Packet Filter" is not going to help any readers here; BPF is far removed from the days when its sole job was filtering packets, and that name will not tell readers anything about why BPF is important in the Linux kernel.

dustbunny 4 months ago

I generally agree, but for BPF they actually just took over the name and it no longer means "Berkeley Packet Filter".

cassepipe 4 months ago

Here is what I gathered without really ever checking.

It's a safe script that has access to part of the kernel and that unlocks a lot of monitoring. The alternative would be a kernel module, which is much more unwieldy, error-prone, etc.

How correct am I?

ggm 4 months ago

Llvm is a different licence not illegal to examine. How to manage pointer+offset address integrity/legality inside the kernel (for instance) has a proof by examples a-plenty in the other code. You don't have to invent a totally unique way of doing things unless you want to.

ajross 4 months ago

> Llvm is a different licence not illegal to examine.

As others are pointing out, rigorous application of copyright precedent argues in the other direction.

But I agree that it's really sad to see this is where we are in the community. The Apache license isn't some crazy monstrosity; it's literally free software per the FSF! Its "additional requirements" that GPLv2 bumps into are things like the patent grant that we all agree are good things. And it's not incompatible with GPLv3!

Yet no one can work together. GPLv2 projects won't relicense to Apache or GPLv3, GPLv3 proponents won't link to GPLv2, corporate sponsors have refused to use GPLv3 at all. Everyone looks at these historical warts and incompatibilities as fortress walls around their own worlds and has forgotten that the only reason these licenses exist in the first place is that we all agree (or used to) that software is better when we can all share it.

But apparently it's not, because Everybody Else wants to share it in the Wrong Way.

I feel very old sometimes.

rapidlua 4 months ago

> How to manage pointer+offset address integrity/legality inside the kernel (for instance) has a proof by examples a-plenty in the other code

Let me provide some context here. These annotations aren't there to help the compiler/linter; they exist to aid external tooling. The kernel can load BPF programs (JIT-compiled bytecode). BPF programs can invoke kernel functions, and some kernel entities can also be implemented or augmented with BPF.

It is paramount to ensure that types are compatible at the boundaries and that constraints such as RCU locking are respected.

The kernel build records type information in a BTF (BPF Type Format) blob. Some aspects, such as the RCU facet, aren't captured in the type system; that is what the annotations are used for. The verifier relies on the BTF.

mustache_kimono 4 months ago

> Llvm is a different licence (sic) not illegal to examine.

The FSF considers Apache 2.0 incompatible with GPLv2 because of its "additional conditions".

I happen to agree with you that, at the very least, we haven't fully grappled with the fact that FOSS, like Linux, is published to the Internet and freely available for anyone to read. Obviously, there should be a distinction between reading and copying, just as there is a distinction between reading and copying a literary work.

The issue -- as I see it -- is that many GPL fanatics just don't see it the same way. I believe Linus has even opined that any filesystem which was developed after Linux, whose developers are aware of Linux, could be considered a "derived work". This is of course ridiculous, but the GPL, if read without due care and funneled through social media slop, can be ridiculous.

jcelerier 4 months ago

It's of course not ridiculous. It's why black box reverse engineering exists and is generally legal, while white box reverse engineering is generally illegal. It doesn't matter whether it applies to proprietary or free software, copyright applies equally to both.

loeg 4 months ago

Copyright applies to literal reproduction of documents; not to ideas. It is straightforwardly allowed to read some implementation of an idea, think about it, and write your own implementation of the same idea.

bayindirh 4 months ago

IBM published its initial BIOS code in a manual bundled with the PC. Having any knowledge of it, even if you don't implement it verbatim, makes you tainted and makes you guilty in any subsequent case.

This is why black-box reverse engineering exists.

The same is true for console reverse engineering. No self-respecting reverse engineer reads code leaked from official console development. Otherwise they'd be in legal hot water.

This is serious stuff and there's no blurry line in this.

netbsdusers 4 months ago

There's no legal concept called "tainting". Black boxing is just a means by which you try to make yourself completely irreproachable (if you haven't seen something, it's outright impossible that you copied it). It obviously doesn't follow that the converse is true. If it were, musicians would not listen to others' music, nor would painters look at others' paintings!

schoen 4 months ago

The part that they're trying to avoid is the "access" in "access and substantial similarity"

https://en.wikipedia.org/wiki/Substantial_similarity

(usual factual elements in determining the possibility of a copyright infringement in U.S. law).

I agree with you that it's possible in principle that copyright infringement would not be found even when there was evidence of access. But I think the courts would usually give the defendant a higher burden in that case. You can see in the Wikipedia article that there has been debate about whether access becomes more relevant when the similarity is greater and less relevant when the similarity is less (apparently the current Ninth Circuit standard on that is "no"?).

psychoslave 4 months ago

I agree with all that (that is, on an actual legal basis, not personal preferences about societal structure), and it's all the more frustrating to think about the power asymmetry in the era of LLMs trained on basically every written material that can be found out there, where, oh yes, you absolutely can go ahead with it regardless of how verbatim some of the code they output may be.

bayindirh 4 months ago

The more disturbing thing about LLMs is that it's "fair use" to scrape everything for them, but it's a liability for the user to use the code, text, whatever.

If it emits a large block of copyrighted material, you'll again be in legal hot water.

Considering that even fair use can be abused at will (see what GamersNexus is going through), it looks even bleaker than at first glance.

bawolff 4 months ago

That seems totally reasonable to me. A large part of fair use is about the purpose of the use. It seems like a reasonable compromise that what is fair use in one context might not be in another. I can't think of any alternative to fair use that would make more sense.

I think the only unreasonable part is that LLM companies are implicitly, or sometimes explicitly, advertising their products' output as fit for use in other projects. I think that is a false advertising problem.

bawolff 4 months ago

You're correct that copyright does not apply to ideas, only to implementations. However, if you take an existing implementation and base yours on it, generally the original work's copyright applies (there are a whole lot of details this is skimming over).

As an example, if you take a painting someone else made and try to make your own version using the original as a reference, that is probably subject to the original author's copyright. On the other hand, if you both happen to paint the same sunset, it's all OK.

I think you're more stating how you would like copyright to work, not how it actually does.

loeg 4 months ago

> I think you're more stating how you would like copyright to work, not how it actually does.

Nope. This is just how it works. I don’t care one way or another.

ggm 4 months ago

So clean room the llvm analysis and work from a spec carried out of the room.

It's a really low bar to avoid, tbh. The point is that people have hobbies. And aspects of this work can look like a hobbyist "but i don't want to do it that way" view.

As a consumer of compiler products it doesn't have to matter to me, nor as a user of compilers. It was only reading the comments and the article that brought this to mind: llvm is proof by example and is a different kind of open source; in my own personal view of code licences, it's not a barrier I would struggle to pierce.

(I'm old enough to have read the GNU Manifesto when it was first published, btw.)

mustache_kimono 4 months ago

> It's of course not ridiculous.

I have the feeling you're arguing against and about something I never said.

To clarify, I'll restate: "I believe Linus has even opined that any filesystem which was developed after Linux, whose developers are aware of Linux, could be considered a 'derived work'. [The view that any new filesystem, simply aware of, but created independent of, and after Linux, is a derived work of Linux] is of course ridiculous,..."

> It's why black box reverse engineering exists and is generally legal, while white box reverse engineering is generally illegal.

Oh, I agree a clean room implementation is generally the best legal practice. I am just not sure there are cases on point that always require a clean room implementation, because I am aware of cases which expressly don't require clean room implementations (see Sony v. Connectix and Sega Enterprises Ltd. v. Accolade, Inc). And, given the factual situation has also likely changed due to FOSS and the Internet, I am saying some of these questions are likely still open, even if you regard them as closed.

LegionMammal978 4 months ago

I agree, some of the Linux people have a very broad notion of what counts as a derived work, but I haven't seen much in the way of actual case law to support the conclusion that "white box reverse engineering is generally illegal".

Software generally receives wide protection for its 'non-literal elements', but it's not the case that every possible iota of its function falls under its protectable expression. Indeed, plenty of U.S. courts have subscribed to the "abstraction-filtration-comparison test" which explicitly describes how some characteristics of a program can be outside the scope of its protection.

But in practice, many of the big software copyright cases have been about literal byte-for-byte copying of some part of a program or data structure, so that the issue comes down to either contract law or fair-use law (which Sega v. Accolade and Google v. Oracle both fall under). The scope of fine-grained protection for 'non-literal elements' seems relatively unexplored, and the maximalist interpretation has seemingly proliferated from people wanting to avoid any risk of legal trouble.

ggm 4 months ago

Linux/Linus is not the FSF, hence the FSF's insistence on referring to GNU/Linux.

tuna74 4 months ago

FSF/Stallman wanted the whole OS (GRUB + Linux + glibc + SysV init + ...) called GNU/Linux.