Most sensible abstraction & feature set for a systems language?

Edit: Made the title read more like a question, as intended, tried to clarify some points I've made, and provided more context on my motivations.

Motivations

Systems programming is not well defined, but for the purposes of this post I define it to mean “writing software to be used as core, performant and reliable modules for other software”. Systems software would not be used to provide business value directly, but rather be part of a network of modules eventually used in application software that does provide business value. If systems software dies, so could the user. If application software dies, it can be rebooted and fixed by IT professionals. For every extra millisecond systems software spends doing its job, the application software using it spends an extra second. That's $1 million lost.

Current systems languages like C, C++ and Rust do their job in the systems programming space and have been thoroughly optimised over the years. Unfortunately, problems with ergonomics and complexity, and a tendency toward bad code and shoddy solutions, still persist.

Newer languages like Rust have tackled many of the above issues, leading to a language that is the most loved by devs (Stack Overflow 2020-2022), by far, compared to the next languages on the list. Unfortunately, critics often cite Rust's complexity, with its lifetimes and borrowing system, as well as parts of its syntax. I happen to think they're a bit verbose sometimes and not as easy to digest in one skim-through. Rust's toolchain has definitely improved the overall development experience, simplifying the retrieval, management and usage of external libraries as well as providing a uniform(ish) interface to customise and adapt to your use case. The Language Server Protocol was also a step in the right direction, allowing extra productivity if your language can properly utilise its full potential. I think rust-analyzer is among the best language servers available; combined with Rust's strong static checks, common errors are found as you type, so you don't have to manually compile → check for errors → check your code → retry.

However, despite the promising advances, I still think we're not quite at the mark of something that “just works” and provides an optimal balance of performance, ergonomics, productivity, scalability, and intuitiveness. I think this is mainly because the core idea of a “systems language” is not entirely well defined or studied as a science. Hence, the systems languages that exist merely attempt to broadly occupy the space of systems programming without moulding themselves to its nuances or thinking deeper about the ergonomics-related features one would usually expect in a higher-level language.

That said, a systems language may have to trade certain “high level” features for performance and lower latency. For instance, you probably wouldn't add a web frontend framework to the C standard library.

Classical Computing

For computers modelled on the von Neumann (VN) architecture, what is the “best” abstraction that provides an optimal balance of performance, ergonomics, productivity, etc.?

VN or “classical” computers are ubiquitous in modern computing, accounting for the vast majority of the computing being done.

One would be expected to have a set of input and output lines and a central processor that manages its own execution and any subsystems, e.g. a GPU. Your central processor could be parallelised across different core topologies, or internally parallelised with instruction- and data-level parallelism. It might have a pipeline where instruction execution is broken up into parts that can be more easily dealt with by specialised subsystems such as dispatchers, decoders, ALUs, branch units, memory caching and writeback units.

Assembly

Assembly would probably be too low-level, mapping almost 1:1 from assembly instruction to CPU instruction. It would be quite verbose and repetitive. Also, if the CPU architecture changes or new extensions are added, you're going to have to add more instructions or modify the existing ones.

It is quite neat, though, for doing something highly specialised like writing a startup function or a syscall handler. I think it makes the most sense to embed assembly in something else, rather than to write it directly. That said, I would say that Rust's core::arch::asm! is the closest to optimal usage of assembly at the moment. Assembly as an embedded DSL, or some pseudo-assembly-like syntax that allows more fine-grained tuning or control over the processor, would be quite useful and allow more rigid construction of systems, compared to inserting some random assembly file into the build process and trying to link it at certain positions or interfacing through linker script variables.
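As a concrete illustration, here is a minimal sketch of what embedding assembly in Rust looks like with the asm! macro (exposed via core::arch/std::arch). It reads the x86_64 timestamp counter, so it is x86_64-only and purely illustrative:

    use std::arch::asm;

    // Read the x86_64 timestamp counter with `rdtsc`.
    // The instruction returns the low 32 bits in EAX and the high 32 bits in EDX.
    fn rdtsc() -> u64 {
        let lo: u32;
        let hi: u32;
        unsafe {
            asm!("rdtsc", out("eax") lo, out("edx") hi, options(nomem, nostack));
        }
        ((hi as u64) << 32) | (lo as u64)
    }

    fn main() {
        println!("tsc = {}", rdtsc());
    }

The point is that the assembly lives inside a normal function with typed inputs and outputs, so the compiler can check register constraints and inline it, instead of you hand-linking a separate .s file.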

C

Something like C makes sense. You have “high level” concepts like composite types (structs, enums, unions), arrays of a given element type, functions, and pointers to memory addresses in any kind of memory context, such as the heap or the stack.

I think this is a good level of abstraction for building systems. You have pretty much a procedural programming language that is able to interface with memory and IO the way a modern computer with a single CPU core (and RAM with MMIO) does. You just read and write to arbitrary addresses, and as long as you know the underlying hardware layout and subsystems, you can write code that does very useful things, such as acting as a supervisor (aka an OS) for the hardware.

You don't even have to deal with the hardware directly most of the time, either. Just build your abstract data types and functions that wrap around the dynamic parts. Even better if you aren't writing bare-metal code and can just use the standard library. That way int main() { printf("Hi Mum!"); } just works. The standard library provides an avenue for portable code and powerful functions that do a lot of work for the amount of code you write.

Unfortunately, you also get quite a few issues, e.g. buffer overflows and weak typing, which lets you cast anything to anything else, creating the possibility of all sorts of problems when you aren't careful.

I also don't really like C-style preprocessor directives. They just feel out of place sometimes: #ifndef blocks and the like can really break your concentration and your intuition about what the code is actually trying to do, especially since so many of them have overlapping and terrible naming schemes.

C++

C++ gives the programmer extra utilities to model more complex systems in a more efficient and intuitive manner: classes, templates, references, operator overloading, etc. I've also noticed that as the complexity increases, the more “math like” the language becomes. C has functions and ADTs; C++ adds HKT-like templates, higher-order functions, and the <functional> library.

The standard library can be quite useful and is loaded with many, many features. Maybe too many. I quite like the random (std::mt19937), <functional>, <chrono> (clocks), containers (vector, map, etc.), IO, and numerics libraries. It also makes sense that these features are built as a library rather than into the language itself. So maybe this is also quite a decent level of abstraction.

Unlike C-style projects, I find C++ projects to be much more readable (maybe because C++ devs realised how bad it would look if it were written like a C project?). It could be because the powerful class-based abstractions and templates allow you to write more expressive code with less of it. A lot of C code also uses annoying naming schemes such as __init. In C you'd layer typedef on #define on typedef until you got the level of abstraction you wanted. In C++ you can build it almost directly.

However, like C, similar safety issues are still there. You can use “good practices” like explicitly writing static_cast<T>, using std::shared_ptr, and building abstractions and code that is actually safe (which you can do in C anyway). But if you can do something unsafely and nothing forces you not to, it's going to be hard to resist the temptation to just write whatever code you feel is going to work and solve the problem at hand, without going out of your way to be safe. That's also how you accumulate technical debt.

Debugging

I hate debugging. I also hate printing to stdout.

I guess if you run into a problem that isn't easy to solve by looking through the code, you'd want to run a debugger and step through execution to gain a better view of what the program is actually doing and any hidden things you didn't see.

I feel like if you have to use a debugger, then there's something inherently problematic with the language, especially if you have to use it a lot. That being said, I think the “proper” way to solve all of this is to have a language with stronger static analysis and an ergonomic, strict formal verification scheme that detects most of the problems that could arise. Rust and Eiffel seem to be on the right track.

Build systems

I don't mind CMake + clangd if the project was set up well. That means the right directory structure for organising your source files, headers, libraries, assets and scripts. VS Code and CLion both have pretty good integration with CMake, and clangd is able to spot a bunch of common problems and issues to do with your intent. Did you want your constructor to be explicit? Or would an implicit {a, b} constructor do?

C/C++ don't have very good packaging systems though. I guess it's not the easiest thing to fix after decades of technical debt and code-size explosion. To use a library, you'd specify a remote CMake/Meson/Make repo through something like FetchContent in your own build system. Worse, if you don't have a modern build system and simply use something like make/autotools, you'll have zero integration with your IDE.

If there's a perfect build and packaging system, I'd say it would be something along the lines of cargo, elm and pipenv. I don't see the big deal with piling so many options onto one single tool; it makes more sense to split them up, like how Rust splits its toolchain into rustup, cargo and rustc.

Documentation

Easily one of the biggest issues with using someone else's code, or trying to work on the same code. C/C++ has a reasonable tool in Doxygen, but many projects don't even seem interested in using it. I do prefer something like pydoc or rustdoc, which results in something a bit more modern and navigable (search bar and all that jazz).

Inlined docs can be very frustrating to look at sometimes. Some C files are 50% documentation and 50% code with terrible naming schemes. Docs and code may be interspersed randomly, breaking the reader's concentration every time they encounter a comment block that may not even be that useful, since it mostly reflects what the author thought was important at the time, not what is actually important. If the docs were mostly placed right above each item, like a function, class, or global variable, then I think that would make the most sense.

If there were also a way to automatically summarise your comments or generate examples for each function, it would make using someone else's interface much less annoying. Again, Rust's Markdown doc examples are a step in the right direction.

Rust

Rust tackles many safety-related issues with C/C++. It adds a borrow checker, explicit lifetimes, and some runtime pseudo-safety like panic!() on out-of-bounds access rather than UB or getting segfaulted by the MMU/OS.
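For example, the sketch below (nothing beyond the standard library) shows the difference: an out-of-bounds index that would be undefined behaviour in C becomes a deterministic panic, and the .get() form turns it into a value you have to handle:

    fn main() {
        let buf = vec![1u8, 2, 3];
        // Take the index from the command line so the compiler can't
        // prove at compile time that it's out of range.
        let i: usize = std::env::args()
            .nth(1)
            .and_then(|s| s.parse().ok())
            .unwrap_or(10);

        // Checked access: out of range gives None instead of a crash or UB.
        println!("{:?}", buf.get(i));

        // Plain indexing: in C this would be an out-of-bounds read (UB);
        // Rust's bounds check turns it into a panic with a clear message.
        println!("{}", buf[i]);
    }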

It also has some nice features like sum types, immutability by default, and a clear separation of behaviour vs data, with traits and structs. That way, the first thing you think about is composition and generics, which often allow better performance, e.g. no vtable lookups for virtual methods. Rust seems to encourage better programming practices and prevent technical debt from accumulating as much as in a C/C++ project. Just look at Linux...
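A small sketch of what that separation looks like in practice (the Shape/Circle/Square names are just made up for illustration): the generic version is monomorphised with no vtable, while the dyn version opts into dynamic dispatch explicitly:

    trait Shape {
        fn area(&self) -> f64;
    }

    struct Circle { r: f64 }
    struct Square { s: f64 }

    impl Shape for Circle { fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r } }
    impl Shape for Square { fn area(&self) -> f64 { self.s * self.s } }

    // Static dispatch: monomorphised per concrete type, no vtable lookup.
    fn print_area_static<T: Shape>(shape: &T) {
        println!("{}", shape.area());
    }

    // Dynamic dispatch: goes through a vtable, like a C++ virtual call.
    fn print_area_dyn(shape: &dyn Shape) {
        println!("{}", shape.area());
    }

    fn main() {
        print_area_static(&Circle { r: 1.0 });
        print_area_dyn(&Square { s: 2.0 });
    }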

Rust also has composable FP features like C++'s std::reduce available by default through Iterator, plus all your usual map-reduce functions. The fact that Iterator is re-exported in std::prelude is also nice, encouraging its use over the more rugged OOP style a C++ programmer would probably reach for. I also like the idea of std::prelude in general, where a lot of the common items are imported into your namespace by default. Though I think certain C/C++ compilers are lenient enough to resolve standard functions even if you didn't include the right header.
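For instance, a typical iterator pipeline, assuming nothing beyond the standard prelude:

    fn main() {
        let xs = [1, 2, 3, 4, 5];
        // Filter, map and reduce composed lazily on the iterator itself,
        // roughly what a std::transform + std::reduce pipeline does in C++.
        let sum_of_even_squares: i32 = xs
            .iter()
            .filter(|&&x| x % 2 == 0)
            .map(|&x| x * x)
            .sum();
        assert_eq!(sum_of_even_squares, 20);
        println!("{sum_of_even_squares}");
    }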

Other languages

Other languages of interest are Go, Scala, D, Nim, Zig and V.

I thought Go was pretty cool, but I think it might sit at too high a level of abstraction to really be considered a systems language; it's more of an application language. I do quite like the simplicity of goroutines for parallel execution. I don't really like Java-style package semantics though; I'd rather have a more rigid and implicit project structure. The fact that Go doesn't have a central package registry is also kinda meh. Maybe you don't need a package registry per se, but some easy way to upload a package to some /namespace/package and then download from that /namespace/package seamlessly would be really nice.

Scala is in a similar boat, though it's quite interesting in that it's pretty good for building distributed systems, and Scala Native has made a bunch of progress. But I think it's a little too expressive and complex.

Then we get to D, which seems like a better and less complex version of C++. I thought dub was decent, but it seems like no one really uses it. It can interface with C++ though, which could be useful. I also like its syntax (except the semicolons, which I find distracting) and find it very “sensible” for a systems language. Maybe D is a good candidate for the optimal abstraction without being overly complex.

Nim is also quite nice. Nimble is great to use, and I think Nim's syntax is the most readable of these languages. It has a built-in GC, which turns me off a bit; I'd prefer a more sophisticated memory management system that lets you write code that looks like Nim but with pretty much zero runtime cost. I've seen some benchmarks and they do seem decent though.

I haven't used Zig much, but I thought it was decent. It's kind of like what C would be if it were modernised on all fronts. It includes some great features like tagged unions and C FFI with @cImport.

I thought V was interesting. It's like Rust and Zig, but it also aims to have a rich standard library including graphics and servers. I've heard there were some issues with memory management and performance though, and I haven't used it much.

These languages generally have decent enough toolchains, build systems, etc. By no means are they perfect or intuitive in all the features they offer, but they do work all right. Their IDE integration can be lacking quite a bit, partly because they're quite new. I'm not the best with them, so maybe I missed a bunch of features that might prove key to solving the optimality problem.

There are also some newer languages like Odin, Jai and Ante that caught my eye. I'm still looking into them, so I can't say too much, but they do seem to address many problems with performance and safety while considering ergonomics and “high level” functionality.

Language server & IDE

A language server should be a paramount feature for any language, imo. The added ergonomics and efficiency in terms of autocompletion/snippets, go-to-definition, early error checking (like rust-analyzer), refactoring help, auto-formatting, type inference and other hints, hovering over an identifier for its definition, etc. all make using the language so much better.

Not to mention, if you're working on the same thing with other people, you can also have tools like GitLens to view each line's commit and commit message, add a specific line or change to your VCS with a hotkey rather than through the terminal, etc.

Testing

Another area is writing effective unit tests to ensure each atomic unit of functionality works the way you expect it to. That way when you combine everything together, it should just work.

In Rust, you can simply write a #[test] function like any other function. Compared to ScalaTest, C++ GoogleTest, etc., cargo test is great as there isn't anything extra you have to set up, or any extra cognitive load, just to verify that a function does what you expect it to do. Coupled with a good language server, you can simply click “run test” above a test function to run that specific test instead of the entire suite.
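A minimal sketch of what that looks like (the checked_add wrapper is made up for illustration; run it with cargo test):

    pub fn checked_add(a: u8, b: u8) -> Option<u8> {
        // Returns None on overflow instead of wrapping or aborting.
        a.checked_add(b)
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn overflow_is_caught() {
            assert_eq!(checked_add(200, 100), None);
            assert_eq!(checked_add(1, 2), Some(3));
        }
    }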

Formal Verification

Unfortunately, there don't seem to be any good built-in tools for proving code validity. There's Coq and other tools, but they're not the easiest to use, and it would be much better if they could be embedded within your language like a Rust unit test.

I think this is one of the most interesting areas to explore further. Design by contract is a related idea, and if these concepts could be applied in a sensible manner, perhaps they could simplify systems development by a lot.
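To make the design-by-contract idea concrete, here is a rough sketch using plain assertions in Rust (the mean function is made up for illustration). The pre- and postconditions here are only runtime checks, not the static proofs a real verification tool would provide, which is exactly the gap being pointed at:

    // Precondition: the slice must be non-empty.
    // Postcondition: the result lies between the smallest and largest input.
    fn mean(xs: &[f64]) -> f64 {
        assert!(!xs.is_empty(), "mean() requires a non-empty slice");
        let m = xs.iter().sum::<f64>() / xs.len() as f64;
        debug_assert!(xs.iter().cloned().fold(f64::INFINITY, f64::min) <= m);
        debug_assert!(m <= xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max));
        m
    }

    fn main() {
        assert_eq!(mean(&[1.0, 2.0, 3.0]), 2.0);
    }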

Optimal systems language?

Quite a bit of progress has been made in terms of safety and ergonomics. In terms of performance, tons of C/C++ code has been heavily optimised over the years, and it does work: just look at any major game engine, operating system, or VM/runtime like Node.js or CPython. Rust has taken strides in the right direction, further improving the development environment and attempting to tackle some of the key issues of C/C++.

But I think we're still lacking in some areas. Maybe some way to integrate a GC-like feature in a zero-ish-cost way for a cleaner and simpler design. Maybe deeper consideration of static analysis, formal verification, data-oriented patterns, functional patterns, ownership patterns, modular patterns and so on. More rigid project structures, and built-in usage of external tools such as sed that would usually be invoked via scripts. Better development environments, and perhaps the possibility of a language-environment fusion to allow for maximum productivity, ergonomics and cooperation. VS Code's Live Share feature is a step in this direction, and perhaps extensions to GitLens could make public code development even better.

Maybe something along the lines of an “ECS” (entity component system) style language, where the only “features” you have at the very core are the three atomic ones: entities, components, and systems. Your core and standard libraries would then build on that framework to expose generally useful functionality for building performant and reliable systems software.
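To picture what those three atomic features could look like, here is a very rough sketch in ordinary Rust. All the names (World, Position, Velocity, movement_system) are made up purely for illustration; an actual ECS-centric language would presumably make entities, components and systems first-class instead of library-level conventions:

    #[derive(Clone, Copy, Debug)]
    struct Position { x: f32, y: f32 }

    #[derive(Clone, Copy, Debug)]
    struct Velocity { dx: f32, dy: f32 }

    // An entity is just an index; each component kind lives in its own array.
    struct World {
        positions: Vec<Option<Position>>,
        velocities: Vec<Option<Velocity>>,
    }

    impl World {
        fn spawn(&mut self, pos: Option<Position>, vel: Option<Velocity>) -> usize {
            self.positions.push(pos);
            self.velocities.push(vel);
            self.positions.len() - 1
        }
    }

    // A system is just a function that walks the component arrays.
    fn movement_system(world: &mut World, dt: f32) {
        for (pos, vel) in world.positions.iter_mut().zip(&world.velocities) {
            if let (Some(p), Some(v)) = (pos.as_mut(), vel) {
                p.x += v.dx * dt;
                p.y += v.dy * dt;
            }
        }
    }

    fn main() {
        let mut world = World { positions: Vec::new(), velocities: Vec::new() };
        let e = world.spawn(Some(Position { x: 0.0, y: 0.0 }),
                            Some(Velocity { dx: 1.0, dy: 0.5 }));
        movement_system(&mut world, 2.0);
        println!("{:?}", world.positions[e]); // Some(Position { x: 2.0, y: 1.0 })
    }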