Delphi 2 can compile large .pas files at 1.2M lines per second (twitter.com/pervognsen)
199 points by luu on Oct 9, 2020 | 200 comments


If you want to be able to compile a certain language fast, translation units need to be modular. I.e. if Source1 is referenced by Source2 and Source3, you should be able to compile Source1 once, save its public part into a binary structure with logarithmic access time, and then just load it when compiling Source2 and Source3.

This works splendidly with Pascal because each .pas file is split into interface and implementation sections. The interface section is a contextless description of what the module exports, that can be easily translated to a bunch of hash tables for functions, types, etc, that will be blazingly fast.

It's a whole different story with C/C++. In order to support preprocessor macros and C++ templates, referencing a module means recursively "including" a bunch of header files (i.e. reparsing them from scratch), meaning O(n^2) complexity where n is the number of modules. You can speed it up with precompiled headers, but you will still need to separately parse each sequence of references (e.g. #include <Module1.h> \n #include <Module2.h> and #include <Module2.h> \n #include <Module1.h> will need to be precompiled separately). C++20 modules do address this, but since C/C++ is all about backwards compatibility, it's still a hot mess in real-world projects.

That said, C# solves this problem near perfectly. Each .cs file can be easily translated to a mergeable and order-independent "public" part (i.e. a hierarchical hash table of public types), and that part can be reused when building anything that references this file. It is also interesting how C# designers achieved most of the usability of C++ templates by using constrained generic types that are actually expanded at runtime (JIT IL-to-native translation, to be more specific).


Rob Pike on include files [0]:

> Simple rule: include files should never include include files. If instead they state (in comments or implicitly) what files they need to have included first, the problem of deciding which files to include is pushed to the user (programmer) but in a way that's easy to handle and that, by construction, avoids multiple inclusions. Multiple inclusions are a bane of systems programming. It's not rare to have files included five or more times to compile a single C source file. The Unix /usr/include/sys stuff is terrible this way.

Plan 9 compilation is very fast. The system is small, but it also restricts #include use: no recursive #include, no guards (example [2]), each header has a corresponding library, and headers are included in C files only [1]:

    #include <u.h>
    #include <libc.h>

    void
    main(void)
    {
        ...
    }
Remember the Go language demo's compilation time? Same authors.

[0] http://www.lysator.liu.se/c/pikestyle.html

[1] https://www.lysator.liu.se/c/plan9c.html

[2] https://github.com/brho/plan9/blob/master/sys/include/bio.h


I would strongly recommend that people new to C who are reading this comment not follow this advice. It might have made sense back when it was written, but it's not how modern C is typically written, and it would be very annoying for modern C coders to put up with these shenanigans.

Multiple inclusions are hardly a problem nowadays, with header guards it means that subsequent inclusions won't actually include anything and the performance impact is fairly negligible the vast majority of the time. C compilers are pretty damn fast these days and in my experience it's the optimization pass that slows things down the most, not parsing includes.

Also, in general I tend to take what Rob Pike says with a grain of salt: he's a very smart guy and he definitely says interesting things, but he has weird idiosyncrasies. He's definitely worth listening to, but I wouldn't follow his advice blindly. See his weird take that "syntax highlighting is juvenile", for instance.


Headers not including headers is still good advice. It's not so much about fast compiles (at least at first, while a project is small), but about not letting header inclusions grow out of control (see almost every C++ project with more than a few hundred source files: it's almost impossible to clean up unused headers, and even when that happens, after a few weeks it's back to the old mess).

See here for a relatively recent take on this:

https://ourmachinery.com/post/physical-design/


The only practical issue with "out of control" includes is still compilation time though, especially if modifying a single file causes many other files to rebuild because they include it directly or indirectly. Beyond that I really don't care about it, I don't usually bother to micromanage my #includes, I don't really see the point.

Beyond that I think that very complicated include graphs are usually a sign that there's an architectural problem with the application, forbidding includes within includes treats the symptom, not really the core of the issue.

As a quick example of some code I wrote not long ago: I have a microcontroller communicating with an external chip over a serial link using a custom protocol. I have a "serial.h" file that describes the API to send and receive serial messages, then a "protocol.h" that implements the higher level protocol. protocol.h includes serial.h since it uses it as a lower layer, it's effectively a dependency. I don't really think it's a problem.

So I'm not really sure that it makes a lot of sense to forbid includes within includes as a rule of thumb. I guess if I was to make a rule it would be "include trees are fine, include graphs are bad"? As in, there shouldn't be cycles in the include graph?


Yeah - so a layer lawyer would argue that the only thing that needs to talk to serial.h would be the protocol - but in the real world of deployments that actually work in the field, layer lawyers often get in the way, and being able to bypass the protocol to talk directly to the phy is a thing - which means you end up including the phy in multiple places.

Purity vs Practicality.


Headers not including other headers is a great thing IMO.

If you’re coming into a code base with new eyes there’s less confusion as to where things came from.

Of course if you’re reading code - ctags, and cscope are invaluable. I’ve yet to use this fancy new LSP stuff but it seems popular.

Convenience headers can be OK occasionally, but it kind of flies in the face of writing modular software at times.

Syntax highlighting has caught a few mistakes for me in the past but I do think it’s kind of a crutch.


Not including headers in header files can be confusing.

Say you maintain a library (lib1.h) that exports an API that takes in an argument defined in lib2.h.

If you do not include lib2.h in lib1.h, then the users of your API need to include lib2.h before lib1.h in every source file and header. With a larger number of dependencies, this can become unwieldy for users.

In addition, future maintainers of your library may also find it confusing why things break when includes are reordered.


Most modern C compilers are aware of include guards: if they notice that the _entire_ file is enclosed in include guards, they won't read it a second time.


I haven't opened up my C compiler to verify this, but even for ancient compilers that would be a non-issue, as they see only what the preprocessor hands them. Re-reading (i.e. parsing by the preprocessor) an include file isn't so bad either, as it is already cached.

I find this focus on compilation speed curious. At my last workplace a full build took a good share of the day (!) and was typically done overnight. Building during the daytime was discouraged, as it happened on our workstations, slowing down everything else. We found ways to cope, chiefly learning which subsets are autarkic, i.e. finding the least general Makefile which, when invoked, would catch your recent changes and wouldn't traverse too much of the tree. Not ideal, but workable.


> full build took a good share of the day

Yikes! Was this with or without some sort of cluster support?

Which language?

Is this due to legacy practices, or is it simply a difficult build?

I am asking because I am involved with a parallel build project, and often there is plenty of low-hanging fruit.


> Yikes! Was this with or without some sort of cluster support?

Each developer built on their own workstation. We meant to look into a build cluster, but we never got around to it.

> Which language?

C++ mostly old style, some C++11, quite a bit of template programming (but I haven't seen anything too fancy).

> Is this due to legacy practices, or is it simply a difficult build?

It sure was an old code base; not sure what you mean by legacy practices (but that might make it a yes ;-} ). Not sure either what would make it difficult, but there were C++ templates and it was fairly large (the sum of the resulting object files was on the order of a few hundred MiB).


That breaks undefs before includes.


Well yeah. If you make normal, idiomatic use of include guards, compilers will be able to apply this optimization. If you do weird stuff, they'll be forced to re-read the file.

https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.htm...


That is a major usability setback. It gets worse when headers A and B both require header C. Let's say your source already includes A and C and you want to add B. Now you need to manually read through the comments to understand where exactly in the include list you need to plug it.

This will become completely unmanageable once your include count surpasses 20-something, which is normal for large projects.


As with much of Pike's advice, it's worth remembering where it comes from: Unix, specifically AT&T-style Unix (and later Plan 9), where a fairly disciplined (programming-wise, at least :P) team of very good, and oftentimes very experienced developers wrote a lot of small, interconnecting utilities.

If your whole source code is written in that manner, and written by people who write code in that manner, this advice is easy to follow.

On the other hand, there aren't many files in the Plan 9 source tree that are longer than, what, 3,000 lines? 4,000? Attempting to shoehorn that, and many other well-intended nuggets of wisdom from the Unix "heritage", into your average commercial codebase, with dozens of third-party libraries and whatnot, is unworkable (I know, I tried).

Some programming advice pearls translate well into any project. "Avoid gratuitous comments like 'increment i'; use comments to explain the reasoning behind functionality, not to describe what the source code already adequately describes" can be applied to any source tree, starting at any point. But some of them -- such as this one -- only work if you write and organize your code in a particular way.

FWIW, I think that this bit of advice was useful back when Pike wrote it, in the age of slow, buggy C compilers (although some would argue we're still in that era!) munching code off of network-mounted drives. Nowadays, not so much -- I think it has readability advantages but I doubt the performance gain (if any is left) is worth the extra programmers' effort. I like to follow it when I'm writing small utilities but I don't really fuss about it for larger projects, and I've never been curious to see how fast it is. I believe (but never measured it, so take it with a grain of salt!) that it's one of those things that 30+ years of computer systems evolution (in the biological sense -- "advancement" is probably too strong a word) has made obsolete.


Usability is easy to fix. Replace the comment with a #pragma, or better, list dependencies in a separate file (like a gemspec). Add a #require keyword which resolves the order automatically.

That's advice from February 21, 1989. They implemented an OS and some GUI applications, for example the acme text editor [1]. Maybe it is not that hard.

[1] https://github.com/brho/plan9/blob/master/sys/src/cmd/acme/a...


I think that's (a big part of) the whole point: splitting a project into too many small headers which depend on each other is bad. Including headers in headers just hides the problem, but it's still there under the hood, and it grows worse the bigger the project becomes until it's too late to do anything about it. That's how you get hours-long compile times in large C++ projects.

Instead use few big headers, for instance one public-interface header per system/module which only contains declarations (no inline or template code), and at most one private header for each module (or better none at all). Then keep the dependency tree between modules/systems as flat as possible.

This is especially important for large projects.


The preprocessor can be expected to read files and scan to the corresponding #endif very fast. Reading from cache is also very fast. C++ compilation bogs down when templates are involved. You want to split those (in a way which makes sense to the human reader), so that only the ones needed for a given compilation unit are instantiated. Further, the link stage, which is difficult to parallelize, is a bottleneck.


It's not just about compile speed, but also about staying sane while working on the project ;)


> Now you need to manually read through

C already has similar problems that are worse. For any given function used in a program, you have no guarantee that you can locate where that function is implemented in a straightforward way. In the worst cases, you have to resort to using dumb hacks like ctags or a project-wide recursive grep. It makes it hard to read any given program in the Knuthian way—laid up in bed with a printout to study how it works.

(Go has the same problem. When I started seriously looking at Go, I was surprised that it copied C here, given the strong influence from Oberon.)


This sounds like something that could be solved by a separate tool one can plug into the build process (similar to rcorder, which determines the order in which to execute rc files on BSD systems based on their dependency specifications).


Or use a language with normal 'import' support.


He's not wrong per se but this shows how utterly broken C is. And macros are a "bandage" that make the problem worse.

> include files should never include include files

Cool, there are two possible takeaways from this:

1) Why aren't include files generated automatically from C files then? (If they can't include anything, this means they shouldn't have any "code", so they can't call anything, so they can't include or reference anything else.)

2) Why aren't they a compilation target? (Yes, I know modern GCC allows you to do that.) But why doesn't C specify a binary representation for them?


He addressed the performance issue. Just what the parent asked for: separate interface and implementation.

Looks like you ask "why convention, not restriction?" It solved their problems while maintaining interoperation with UNIX world [1]. If convention is not enough for you, you may define format and tooling.

[1] http://doc.cat-v.org/plan_9/4th_edition/papers/ape


Because ISO doesn't force language implementers to do such tooling, which is also why there is UB to start with.


Agree with the first part, but I’m not sure where you’re going with the second.


The unwillingness to standardise specific language features back in the 80s, during C89, because they would break pseudo-compatible K&R compilers like Small-C, or force vendors of now long-dead platforms to actually spend a little more effort improving their toolchains.


A variant of this is to mandate that every header #define a documented symbol, and then have a #ifndef guard for this around the inclusion of the header.


The standard practice of putting the ifndef guard inside each header is specifically optimized by compilers, though I don't recall where I read that.


I don't know why this is downvoted. This is exactly how it is.

The first time the compiler reads the include file, it remembers where the corresponding #endif is for each #if. The next time the file is included, the compiler can skip over the whole included file. There is no performance advantage from outsourcing the conditional to every including location.

Putting an ifndef guard at each location where the file is included is not workable. It's terrible boilerplate: to the relatively simple '#include <A.h>', another variable that must be kept synchronized with dependencies gets added, plus 2-3 lines for the conditional, at each including location.


IIRC, this is documented in the CPP part of the GCC manual. Whenever a header file is completely enclosed in an ifndef guard, and that guard macro is already defined, it will skip the whole file.

Another option is "#pragma once"; it's non-standard, but every popular C compiler (even MSVC) understands it, and has the advantage that there's no risk of accidentally using the same include guard for more than one header file.


Windows compilers were the first to have it, naturally MSVC understands it.


All production compilers now support `#pragma once`, so the traditional `#include` guard is almost totally obsolete.

The only plausible exceptions are in proprietary compilers for microcontroller SDKs.


I very rarely encounter #pragma once in the wild personally, and I still always use old school #ifndef guards myself. I can't really see a reason to change: it's not a huge amount of typing, it works everywhere, it's very explicit and not magical.

For instance, off the top of my head I don't remember how #pragma once decides what constitutes the same include file. If you have symbolic links, or absolute and relative paths, for instance, how does it work? I'm sure I can find the answer in the compiler's documentation, but with #ifndef I don't even have to; it's obvious what's going to happen.

I would personally argue that old school guards should be preferred because of this and the fact that they're standard, but I don't feel super strongly about it.


I've seen terrible bugs in the past where there were multiple incompatible versions of a header file in a codebase, but nobody noticed because both headers used the same guard variable name, so the header that was included first always won.


Yes, this is the usual failure mode. It happens quite often, because nothing checks for unique or correct guard names.

Failing to switch to `#pragma once` when it becomes permissible is reckless and irresponsible.


I'd think it would be much easier for a preprocessor to determine that it is done with a file once it encounters that pragma, instead of having to scan for the corresponding #endif (not sure if it would dare to scan from the back and hope that all #if/#ifdef and #endif pairs match, or whether it diligently checks those every time).


Wouldn't it be as simple as maintaining a bloom filter containing the absolute paths of every included file encountered so far? Whenever you encounter #pragma once you check if the file is already present in the bloom filter and return early if it is. If it's a new file you add the file name to the bloom filter and continue processing.

It's probably even more memory-efficient (albeit insignificant in the grand scheme of compiler memory usage) for the compiler to just maintain the bloom filter and not have to deal with hundreds of otherwise unused preprocessor symbols, one per include file.


The typical argument against pragma once is that it's unclear what should happen if two different hard link paths are used to include a file with the same inode (along with similar situations on Windows or with files on network drives). That is to say, in general, there is no consistent and portable "true" absolute path that uniquely identifies file(s) as conceptually having the same underlying storage.

In practice, I have never actually seen a problem arise. The most likely real trigger would probably be a mix of including a library through a path that indicates a specific version and including it through a path that uses a "latest stable" link. That is already fragile and asking for trouble, though.


Bloom filter can have false positives.


You may imagine that you are helping the compiler avoid error, but the fact is that real compilers, when they encounter `#include` guards, translate them implicitly to `#pragma once` and move on. The compiler is better at it than you are.

Meanwhile, you are exposing yourself to subtle bugs resulting from multiple headers that define the same guard name, as noted nearby. So, your imagined virtue is folly.


Pascal also was originally designed for easy parsing and compilation - single pass compiler with simple recursive-descent parser.

Maybe Delphi compiler works the same way.


This is also what I am thinking. Even though I am a fan of Delphi/Lazarus for what they deliver - native GUI programs that generally have no dependencies - I think some parts of the Pascal language were designed for the ease of the compiler and less for the programmer.

Here are a few features that prevent Pascal from being my go-to language:

* Having all function variables declared in var clauses at the start, far away from where they are first used. It eventually gets littered with loop indices and other temporary variables.

* "interface" sections that can easily grow (C/C++ header files are also somewhat similarly inconvenient).

But then again, these languages have been around since computers were not as powerful. And perhaps keeping these rules is the price of not breaking what works.

Still, I doubt language design alone is enough to make compilation fast (i.e. why FPC is not as fast). There must have been a lot of sorcery in the Borland compilers that we will never know. Also they must have taken advantage of only supporting Windows (and only x86?) back then.


Don't forget the most irritating restriction of all - no circular dependencies between PAS files. That one could get really painful later on in bigger projects.

The Delphi compiler was really great, but it was meant primarily for GUI business apps and not any other kind of coding. Result: the compiler hardly optimised. Borland didn't advertise new Delphi releases by referencing benchmark scores for example, but rather new productivity features.

Building a fast compiler is not sorcery. Look at Java: javac starts out interpreted every time you run it, but it can still compile a large amount of source code very fast. Then at runtime, HotSpot's C1 can produce machine code from bytecode at a staggering speed. The resulting code even has respectable performance, with the slower (but still fast) C2 compiler only kicking in where it's really needed.


Turbo Pascal has supported circular dependencies since Turbo Pascal 5.5 for MS-DOS.


Only with some restrictions. I can't remember what they were but it was something like a circular dep between two interface sections wasn't possible, but between two implementation sections was.


Correct, but there was a way.


Oh yeah, I take this for granted with C# all the time! This is what made C# my go-to prototyping language. Circular dependencies might be "bad", but it takes more planning to avoid them when coding from scratch. C++ is also a bit of a pain with this. Furthermore, C# doesn't need an equivalent of C/C++ headers or Pascal's interface sections.


> single pass compiler with simple recursive-descent parser.

> Maybe Delphi compiler works the same way.

Delphi is indeed a single-pass compiler. It does not use a normal recursive-descent parser, but a hand-coded parser. Other than that, yes - the Delphi language today is still designed to be extremely efficient to parse and it compiles a million lines of code in seconds.


Recursive descent parsers are usually hand-coded and vice versa.


> That said, C# solves this problem near perfectly.

Hadn't it already been solved in Java even better before? It doesn't look right to praise C# for resolving the problem when C# merely took that idea from Java. C# came up with the templates part though.

What C# didn't copy was the idea of turning every source file into a binary file. A library is just a zip archive of those .class binary files. What's the advantage for C# to not have .class files and to pack everything into a 'proprietary' library format? Why haven't they copied the zip-idea, too?


Gentle reminder that the architect of C# was snatched by Microsoft from Borland. Before that he was the main guy at Borland for Pascal/Delphi. So I'm not so sure he simply copied anything from Java.

https://en.wikipedia.org/wiki/Anders_Hejlsberg


His first job at Microsoft was creating J++.

He was not snatched, he left Borland unhappy with its management and Microsoft was the obvious choice as next employer.


> Gentle reminder that architect of C# was snatched by Microsoft from Borland.

snatching wutt? They could have paid more to keep him.

Not a Microsoft fan either, but Borland managers have no right to complain :)


No duplication. There's a single file where every type reference or method reference exists once. Note that it's not a library file. The compiled dll or exe has no relationship to the original way it was stored in .cs file. Instead it's a set of tables which hold all types, all members.


Partial types.


The main reason is to have a library-scope name table and a bunch of global indexes. Can't beat that in terms of performance.


Worth noting that Java has this too now, it's called the "jimage" format. However it's not used for redistribution of compiled artifacts, that's still JARs. Instead a jimage file is created when JARs are "jlinked" into a runtime image.

This requires modules and it's a new technique that post-dates Java 8, so not many Java shops use it yet. However, Java 9+ comes optimised this way by default; that's what the 'modules' file is in the JDK lib directory. Class files have their strings extracted and placed into a single compressed string table, and their offsets are then placed into a "perfect hash table" that's guaranteed to satisfy lookups in one or two probes.


You can look back at least as far as Ada83 to find a specification for a library manager as part of the system, where precompiled modules are stored in a format that lends itself to faster compilation.


However the Ada library manager has a reputation for being complicated, slow, and fragile. Gnat was able to compile Ada much faster by using a C-style compilation model instead! https://dwheeler.com/essays/make-it-simple-dewar.html


Oh, definitely, but just because the implementation sucks doesn't mean that the idea isn't there.

Modula-3 was also pretty slow compiling, but it also compiled interfaces so that it could refer to them. Unfortunately, the underlying objects could change during a compilation because of partial revelation, so parts of the system would have to be recompiled when the builder found those cases. I'm sure Modula-2 could have been faster in that regard, not having the opportunity for a type to be elaborated more in later compilation units.


> In order to support preprocessor macros and C++ templates, referencing a module means recursively "including" a bunch of header files (i.e. reparsing them from scratch), meaning O(n^2) complexity where n is the number of modules. You can speed it up with precompiled headers, but you will still need to separately parse each sequence of references

Not sure I really get your point here. Even if you remove macros and templates, reading headers is actually necessary just for the sake of knowing the size of structures, the offsets of attributes, etc.

How can _any_ language compile a unit passing a struct Foo on the stack without knowing the size of Foo (and thus having to read, say, foo.h)?

Now I agree that if you play with the preprocessor to have conditional things in your headers, reparsing will be necessary, but I guess this has been solved by include guards/pragma once a long time ago. 99% of preprocessor usage is project-wide configuration, so wrapping whole headers is fine.

Not saying this is the best solution, but in practice, that shouldn't make much difference.

A subtle distinction that can make quite a difference, though, is to provide independent forward declarations. A lot of the time, your compiler will just need to know the symbols and signatures (e.g. when passing by reference), and providing the full structure definition is overkill and slower. I don't see enough people paying attention to that, which is a pity.


>How can _any_ language compile a unit passing a struct Foo on the stack without knowing the size of Foo.

The idea is not to compile a unit without knowing the size of Foo. It's about making sure you figure out sizeof(Foo) in near constant time.

With C#/Pascal, it means:

* For each referenced module load its symbol table into RAM. If implemented cleverly, this is literally one read() call.

* For each referenced symbol, search each hash table until you find it. Since the hash tables are only built during compilation, you can auto-size them so that an average lookup takes just 1-2 steps.

* Bonus point: for libraries, you can merge the hash tables. So n is now the number of libraries, not the number of files.

With C++ you are stuck reparsing megabytes of code at the beginning of each translation unit. It's millions of times slower than reading a couple of pages from disk to RAM.


I feel stupid but I still don't understand what would prevent a C++ compiler to do exactly what you just described.

The preprocessor does not have side effects. That means whenever a header is compiled, you could store a hash of all used defines and their checksum. Whenever that file should be read again, compare the checksum of the used defines to the one stored. If they are the same, you can reuse the precompiled header. If not, create a new version of the precompiled header for a different set of defines.

99.9% of the time, every single header is included with the exact same set of defines.


Ok. This is what prevents a C/C++ compiler from doing that:

  config.h

    #define I_LIKE_FEATURE_X

  b.h
    #include "config.h"

    struct App {
      #ifdef I_LIKE_FEATURE_X
      FeatureXImplementation *fx;
      #endif
    };
The preprocessor can change the size of structs (even redeclare them several times, or create new ones with unique names or ...)


I don't see why this is a problem for reusing precompiled headers in the way the parent post suggested. If the compiler records the value of every #define when it precompiles b.h, if b.h is #included again with the same #defines set, the compiler can reuse the same precompiled version of b.h.

Indeed, if the compiler/build system detects that b.h has been #included with a different set of #defines, it might be a good idea to issue a warning, because that could be an indication that the codebase is about to footgun itself.


That was already accounted for in my answer:

> whenever a header is compiled, you could store a hash of all used defines and their checksum. Whenever that file should be read again, compare the checksum of the used defines to the one stored. If they are the same, you can reuse the precompiled header. If not, create a new version of the precompiled header for a different set of defines.

In this example, when compiling b.h, you can store that this compiled header is for when state<I_LIKE_FEATURE_X=1>. Next time you have to include it, check if the state matches. If it does, use the precompiled header. If not, create a new version for state<I_LIKE_FEATURE_X=0>. As already mentioned, it is quite rare for a single project to include the same file with a different configuration of defines that would change it. 99.9% of the time you would just -DI_LIKE_FEATURE_X project-wide, so a single precompiled header would be necessary.


> For each referenced module load its symbol table into RAM. If implemented cleverly, this is literally one read() call.

That might easily be a lot slower than building a huge hash table at the start.

But there is a huge advantage to be had if the effort of building the large hash table can be shared between all files that are simultaneously built.


Pascal/Delphi do not use regular object files, instead they have DPU/TPU files which both contain parsed headers and object code.

Those files are not compatible between major versions, which probably means they are designed to be as easy to parse as possible and to map closely to internal compiler structures. This means including a unit is way faster than re-parsing the unit's text.

Additional speculation: those DPU files might contain a copy of relevant definitions (like names of fields, sizes and offsets). This means if you have "program" which uses "my_unit" which uses "high_level_comms" which uses "Handle" defined in "low_level_comms", then the definition of "Handle" could be copied directly into my_unit.DPU. So when you compile "program", and it includes "my_unit", all you need to read is "my_unit.DPU". You do not care about the total size of the "high_level_comms" and "low_level_comms" files.
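That speculation can be modelled like this (a toy sketch, not the actual DPU format; a real compiler would presumably copy only the definitions a unit actually references rather than everything):

```python
# Toy model of compiled units whose interface inlines the definitions they
# transitively re-export, so a dependent only has to read a single file.
class CompiledUnit:
    def __init__(self, name, own_defs, used_units=()):
        self.name = name
        self.symbols = {}
        for u in used_units:          # copy needed definitions at compile time
            self.symbols.update(u.symbols)
        self.symbols.update(own_defs)

low  = CompiledUnit("low_level_comms",  {"Handle": "record(fd: int)"})
high = CompiledUnit("high_level_comms", {"Connect": "func -> Handle"}, [low])
my   = CompiledUnit("my_unit",          {"Talk": "proc(Handle)"},      [high])

# Compiling "program" against my_unit needs only my_unit's own file:
assert my.symbols["Handle"] == "record(fd: int)"
```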

Unfortunately, in C/C++, when people '#include <iostream>', they can now use "std::plus", "mbrtowc" and "timer_getoverrun". Which means including high-level modules has to bring tens of thousands of lines into translation units. (Yes, the pimpl pattern, well-designed interfaces, and forward declarations can fix this, but few people bother. The stdlib certainly does not.)


Compiled Delphi unit files include a serialisation of the symbol table.

Import definitions are not included. There's an import symbol for each import that has a pointer to the definition information which is updated when the used unit is loaded. This linking is fast, and uses a combo of three things: hash code, symbol name and symbol version. Hash codes in the Delphi compiler are cached in each symbol, so each symbol linked is a hash probe and a few string compares at most. The symbol version is a hash of its (original) definition. If there's a version mismatch, Delphi knows it needs to recompile the unit.
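A rough sketch of that linking scheme (hypothetical names; the real compiler's data structures are of course more involved):

```python
import hashlib

def version_of(definition: str) -> str:
    # Symbol version = hash of its (original) definition.
    return hashlib.sha256(definition.encode()).hexdigest()[:8]

class Unit:
    def __init__(self, exports):
        # name -> (cached hash code, definition, version)
        self.symbols = {n: (hash(n), d, version_of(d)) for n, d in exports.items()}

def link_import(importer_sym, exporter):
    """Resolve one import symbol: a hash probe plus a few string compares,
    then a version check against the definition the importer was built with."""
    name, expected_version = importer_sym
    entry = exporter.symbols.get(name)       # hash probe + string compare
    if entry is None or entry[2] != expected_version:
        return "RECOMPILE"                   # definition changed -> recompile
    return entry[1]

low = Unit({"Handle": "record(fd: int)"})
imp = ("Handle", version_of("record(fd: int)"))   # import recorded at compile time
assert link_import(imp, low) == "record(fd: int)"

low_changed = Unit({"Handle": "record(fd: int; flags: int)"})
assert link_import(imp, low_changed) == "RECOMPILE"
```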

(I spent about 6 years working on the compiler at Borland / Codegear / Embarcadero.)


C/C++ object files (and shared libraries) can include DWARF info that effectively gives you the same featureset as Pascal "parsed headers".


Not everyone uses DWARF though.


C++20 interface modules are just the same.


> How can _any_ language compile a unit passing a struct Foo on the stack without knowing the size of Foo (and thus having to read, say, foo.h)?

By having a runtime with higher-level instructions like "change that field", the onus of knowing where the field is moves onto the runtime. Another answer is that it's the difference between needing to parse text every time and having some binary intermediate description (which both Delphi and C# have; compare .obj to .dcu and an assembly). This means there is less work to do when a dependency changes in incremental compilation.
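A toy model of the first answer (all names hypothetical): the "compiled" code refers to fields by name, and the runtime's layout table resolves the offset, so callers never need the struct's header:

```python
# The caller is compiled against field *names* only; the runtime owns the
# layout (field -> offset) and resolves it when the instruction executes.
class Layout:
    def __init__(self, fields):
        self.offset = {f: i for i, f in enumerate(fields)}

class Runtime:
    def __init__(self):
        self.layouts = {}

    def set_field(self, obj, layout_name, field, value):
        # "change that field" instruction: offset resolved at run time
        obj[self.layouts[layout_name].offset[field]] = value

rt = Runtime()
rt.layouts["Foo"] = Layout(["a", "b"])   # layout known only to the runtime
foo = [0, 0]
rt.set_field(foo, "Foo", "b", 42)        # caller never saw Foo's definition
assert foo == [0, 42]
```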


> Another answer is that it's the difference between needing to parse text every time and having some binary intermediate description

This is an implementation detail to me. Most C++ compilers _do_ compile headers if you tell them to.

> By having a runtime with higher-level instructions like "change that field", the onus of knowing where the field is moves onto the runtime

Alright, but that's not the same language anymore. With such a runtime you don't compile natively anymore; you compile for a VM that will dynamically allocate stack space and translate addresses.


> This is an implementation detail to me. Most C++ compilers _do_ compile headers if you tell them to.

C and C++ compilers will pre-compile one header if you ask nicely, the first one you include. Compiling multiple independent headers is very hard, though.

C and C++'s header time complexity hell is not an “implementation detail”, every compiler has to suffer with it. The C preprocessor is a fundamental design problem.


> C and C++ compilers will pre-compile one header if you ask nicely, the first one you include. Compiling multiple independent headers is very hard, though.

Are you sure this is the case? My interpretation of clang's documentation does not confirm this. From what I understand, at the very least headers are pre-parsed to an AST. At best, they are fully precompiled. I don't see mentions of restrictions on the number or nesting of headers.

> The C preprocessor is a fundamental design problem.

Come on, this is a huge overstatement. While I can agree that a textual replacement based preprocessor is not the sexiest of things, it does have the advantage of being pretty much limitless, simple to grasp, and implement.


> The use case for precompiled headers is relatively simple: when there is a common set of headers that is included in nearly every source file in the project, we precompile that bundle of headers into a __single__ precompiled header (PCH file).

From https://clang.llvm.org/docs/PCHInternals.html#design-philoso... (emphasis mine)


> While I can agree that a textual replacement based preprocessor is not the sexiest of things, it does have the advantage of being pretty much limitless, simple to grasp, and implement.

…and is a fundamental performance bottleneck. Why is it an overstatement? I agree it's simple and flexible, but compilation performance suffers regardless.


> I guess this has been solved by include guards/pragma once a long time ago

include guards don't really solve this problem, because the preprocessor needs to read every line to find the end of the include guard. The pragma once solution works, but it is hard to implement for every system (what happens when a file has different names but the same content?)


Microsoft's compiler has an optimization that means it doesn't even open the header file a second time if the include guard surrounded all non-comment tokens. I imagine gcc and clang do something similar.

Of course, they also recommend using "#pragma once" unless you're really concerned about portability.

Source: https://docs.microsoft.com/en-us/cpp/preprocessor/once?view=...

> The compiler recognizes the include guard idiom, and implements the multiple-include optimization the same way as the #pragma once directive if no non-comment code or preprocessor directive comes before or after the standard form of the idiom:
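The multiple-include optimization can be sketched like so (a simplified model; a real preprocessor must handle comments, line continuations, and tokenization far more carefully):

```python
import re

def guard_macro(text: str):
    """Return the include-guard macro if the file matches the standard idiom
    (no non-comment tokens outside the #ifndef/#define/#endif), else None."""
    lines = [l.strip() for l in text.splitlines()
             if l.strip() and not l.strip().startswith("//")]
    if len(lines) >= 2:
        m = re.match(r"#ifndef\s+(\w+)$", lines[0])
        if m and lines[1] == f"#define {m.group(1)}" and lines[-1] == "#endif":
            return m.group(1)
    return None

class Preprocessor:
    def __init__(self, files):
        self.files, self.defined, self.opened = files, set(), []

    def include(self, name):
        guard = guard_macro(self.files[name])
        if guard in self.defined:
            return                       # optimization: don't even reopen the file
        self.opened.append(name)
        if guard:
            self.defined.add(guard)

files = {"a.h": "#ifndef A_H\n#define A_H\nstruct A { int x; };\n#endif\n"}
pp = Preprocessor(files)
pp.include("a.h")
pp.include("a.h")
assert pp.opened == ["a.h"]              # second include skipped without any I/O
```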


This is all nice, but not portable. C++ has no formal concept of a "builder", every new invocation of the compiler is independent, so it doesn't know how a previous invocation parsed a particular header file.


>The pragma once solution works, but it is hard to implement for every system (what happens when a file has different names but the same content?)

I would say that qualifies as "good enough". The amount of headache saved from accidentally having 2 guards with the same name is worth way more than the requirement NOT to have multiple symbolic paths to the same file used at the same time.

That said, modern compilers just use the inode number (and a similar thing on Windows), which covers about 99.9% of real-world systems where you want to build.
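A sketch of the inode-based identity check (using Python's os.stat as a stand-in for what a compiler would do; note it also shows the limitation raised above — same content in a distinct file still counts as a different header):

```python
import os, tempfile

def file_identity(path):
    """Identify an include file by (device, inode), so different paths to the
    same file (symlinks, hard links) count as one include."""
    st = os.stat(path)                    # follows symlinks
    return (st.st_dev, st.st_ino)

with tempfile.TemporaryDirectory() as d:
    real = os.path.join(d, "config.h")
    with open(real, "w") as f:
        f.write("#pragma once\nint x;\n")
    alias = os.path.join(d, "alias.h")
    os.symlink(real, alias)               # second path to the same file
    copy = os.path.join(d, "copy.h")
    with open(copy, "w") as f:
        f.write("#pragma once\nint x;\n") # same bytes, different inode
    same_file = file_identity(real) == file_identity(alias)    # True
    same_content = file_identity(real) == file_identity(copy)  # False
```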


Related question, how do C and Go compare on compilation speed in similar codebases? Neither of them have monomorphized generics, and codegen loads should be similar.

This will tell whether pragmas / include guards have largely solved the problem.


I've seen demos of Jonathan Blow's language Jai where he shows how fast his current video game project in Jai compiles. He's doing the opposite of what you say: just give all input files to one compiler invocation without headers or incremental compilation, in order to let it handle concurrency properly. He still ends up compiling at 1mil/s. The added benefit is that his program can get access to the entire AST and modify it during the build.


Visual C++ pre-compiled headers, alongside incremental compilation and linking, make it relatively fast though.

Plus, in Windows shops we tend to compile C++ libraries into native artifacts or COM libraries, so compile times only become a problem if you are building the whole world from scratch or doing crazy meta-programming without extern templates.

VC++ modules support is looking quite promising.

Then again, I do enjoy spending most of my time in Java/.NET land where compile times are concerned.


A more detailed answer (but still somewhat vague) for those wondering things like "how long are the lines" and "on what hardware" can be found in the linked thread: "110 cycles per source byte".

Of course, that can vary significantly based on what those bytes actually represent (e.g. scanning through comments should take far fewer cycles than parsing keywords, numerical constants, or other expressions) so the actual "texture" of code being compiled will make a difference.

...and as someone who has spent a lot of time decompiling binaries, the lack of (any!) optimisations Delphi has reflects itself in the code it generates: It's extremely fast because it generates code extremely stupidly. Redundant moves, duplicated instructions(!), and other perplexingly stupid things are frequently seen. I suppose some of that might be a reflection of the quality of the source it was compiled from, but normally those don't show through with C compilers even at O0.


Delphi programs are still mainly compiled for Windows desktops. I have noticed the binary for the 120K-line MPR system I developed for my client has crept up from about 8MB to 20MB since 2007. My current Delphi 10.2 takes 7.2 seconds to do a full compile, or about 2 seconds if I just hit the run button and it only has to re-compile the modules I was working on; this has remained pretty consistent over the 13 years since I first wrote it, despite improvements to the hardware.

I think people can read too much into your statement about the stupid code the compiler generates (which I cannot confirm or deny), as the final exe binary is lightning fast without any optimisations. All performance issues I have ever encountered in Delphi go back to network calls such as REST APIs, MySQL libraries, printing, network drives, etc., and good multi-threading can hide most of them. There are optimisation levels in Delphi, but in 20 years I have never felt the need to go off the default.

As a developer, I will take near instant compilation over optimisation any day.


> As a developer, I will take near instant compilation over optimisation any day.

Not if your technical marketing department publishes results from performance tests comparing it with the competition.

I remember from my use of Turbo Pascal (way back) that it was impressively fast (the CP/M version on a Z80A!) and produced at least reasonably good code (better than the awful Small C compiler of that time, partly due to the different calling convention). Moving on to C and later C++, I never had such fast turnaround times anymore, but didn't miss them much in a statically typed language either.


Except it is always possible to choose what one wants, C and C++ compilers aren't the only game in town with optimization flags.


I wrote a comment about a week ago in this very forum that a non-optimizing compiler should be able to compile about 1MLOC/s [0]. That seems to stay reasonably true a week later.

What is more interesting is what tradeoffs you make to produce fast code. Chez Scheme produces code that is plenty fast compared to most other Scheme compilers, and does so at almost 100KLOC/s on my computer (on generated code that is relatively simple and already macro-expanded). They have strict limits on the time complexity of optimization passes, and even though some people have whined about it, it seems to have served Chez Scheme well.

[0]: https://news.ycombinator.com/item?id=24660468


Tweet text for the Twitter haters:

> I wrote a synthetic program generator that can target different languages so I can get measurements across a bunch of different languages and compilers while varying the generator parameters. So far the champ among modern compilers is TCC and Delphi among historical compilers.

> Even when running under a Windows 98 VM, Delphi 2 can compile large .pas files at 1.2 million lines per second. TCC comes in at 0.7 million lines per second. This is on a 2.5 GHz Skylake laptop.

> All the modern Pascal compilers I could find did very poorly on the same benchmark workloads. Lazarus/FreePascal at 55 kloc/sec, Embarcadero Delphi at 15 kloc/sec.

So 1.2 million lines/sec, running through a VM, on laptop hardware.


I suspect those modern compilers are also optimizing more than Delphi 2 was, and are probably parsing a somewhat larger language.

Granted, it could be a case of a code base adapting to change poorly over time.


I follow the tweet author (Per) and I know he's a big fan of building compilers in the style of Niklaus Wirth. I think the idea is that modern compilers generate very inefficient code first, and then have many expensive optimization passes, while Wirth advocated for simpler compilers that just generated reasonably efficient code in a single pass.

> parsing a somehow larger language

That would also explain it, but I think it's appropriate to acknowledge the effect it has on compile times when you add complexity to the language. A lot of folks avoid C++ features (or the language entirely) simply because they would inflate compilation times too much, and it clearly wasn't a priority for the language committee when those features were added.


Due to certain language features that have been added, like generics, the binaries can also be a lot larger. So it could very well be IO constraints.

At work we currently use Delphi 10.3 (10.4 was recently released), and it compiles and links ~1Mloc in less than 10 seconds, on my older 4770k with a Samsung 850 SATA SSD.


I can't think why Windows 98 would slow it down.

If the VM is using VM acceleration, it would be near full speed. The only reason 98 matters is that it's probably 32-bit not 64-bit. But if Delphi runs on Windows 98 then it probably is 32-bit too.


No probably about it. Win98 predates AMD64, so it is 32-bit only. Delphi 2 would also be 32-bit.


FAT32 is not a very fast filesystem. Things like caching are necessary to make it performant. Also, the IO subsystems in Win98 sometimes chose slow paths or interfaces.


Have you got any benchmarks to back up FAT32 being slow? FAT32 has no journaling, ACLs, transactions, CoW, nor any of the other advancements that many modern file systems do (remember we are talking FAT32 and not exFAT which modern systems use when falling back to a FAT32-like file system).

Yes caching was required but that was as much to work around slow storage mediums (as was the norm back then) and in fact that capability is built into the OS rather than the file system and is used by all file systems. This is the same for Windows 9x, NT, and even Linux and UNIXes too (ZFS on Linux is the one exception I know of where caching happens within the file system driver).

Sure, Win 98 might be missing a few IO optimisations (I think DMA might even be disabled by default), but it's also a much lighter OS than modern contemporaries, and that will make a hell of a difference too. Remember, 98 was designed to run on hardware less powerful than a 10-year-old smartphone, and the entire OS has an install footprint of around 300MB. There are going to be fewer background processes (particularly on a VM installed specifically for Delphi 2 benchmarks). The only drawback is you wouldn't have SMP; but that doesn't always help with compilers anyway because they are traditionally single-threaded processes.


FAT32 isn't often included these days in Linux FS benchmarks [0] but some older benchmarks [1] also show noticeable performance differences, especially when the workload is more meaningful than copy files. Lack of features does not imply efficient design. Caching was optional with Windows 98 and sometimes was turned off by default, depending on how the drive was formatted (partition type) and the IO driver (BIOS, LBA, OEM) loaded. The referenced Linux benchmarks probably limit the variability here since I think Linux tends to follow the same code paths, independent of the partition types.

BTW, Windows benchmarks tend to follow the same pattern of FAT32 being slower than newer filesystems. [2][3][4]

[0] https://www.phoronix.com/scan.php?page=article&item=linux_us...

[1] https://web.archive.org/web/20010627181258/http://bulma.lug....

[2] https://www.flexense.com/fat32_exfat_ntfs_usb3_performance_c...

[3] https://www.sami-lehtinen.net/blog/exfat-fat32-ntfs-real-lif...

[4] https://www.tomshardware.com/reviews/ssd-file-system-ntfs,31...


> FAT32 isn't often included these days in Linux FS benchmarks [0]

Running sqlite3 on a USB stick isn't really the same as a one-time sequential read from a virtual hard disk, though. Even FAT32 on Windows 98 could handle sequential reads just fine. In fact, if you look at the other benchmarks in the link you supplied -- the ones doing tasks more equivalent to a compiler's -- FAT32 actually comes out pretty fast.

> but some older benchmarks [1] also show noticeable performance differences, especially when the workload is more meaningful than copy files.

But the workload in question isn't more meaningful. A compiler is "just" reading a few files and then writing a few new files. It's going to be more CPU and memory intensive than disk heavy.

> Lack of features does not imply efficient design.

I never suggested it does. However, for simple workloads like a sequential read and write from a virtual hard drive, where your host has already abstracted away the problems of physical hardware with optimised drivers, a lack of features in the guest operating system could conceivably be a more efficient design. It's not that far-fetched a comment, given that's the principle behind unikernels.

> Caching was optional with Windows 98 and sometimes was turned off by default

It's optional in every OS (even now) but is rarely ever turned off by default under normal operating conditions -- this is as true for Windows 98 as it is for Linux.

> The referenced Linux benchmarks probably limit the variability here since I think Linux tends to follow the same code paths, independent of the partition types.

There's a few things to consider when looking at these file system benchmarks:

1. They are all running on Linux so none of them actually demonstrate the FAT32 driver in question

2. They're writing direct to physical media (which isn't the case with the Delphi 2 benchmark).

3. The workload you're demonstrating isn't equivalent to the workload of a compiler. I wasn't suggesting people should run databases on FAT32 (tbh I wasn't suggesting people should run FAT32 at all).

I would be interested in the same Delphi 2 benchmark but with Delphi running bare metal in WINE


Not surprising that TCC is slower; Pascal is much easier to parse than C.


More to the point, Wirth _designed_ Pascal to be one-pass translatable. Anders Hejlsberg took advantage of this in his succession of Pascal compilers: BLS Pascal (= Nascom Pascal), Compas Pascal, and Turbo Pascal. (Delphi is an evolution of Turbo Pascal but I don't know if he was involved).


Anders Hejlsberg was the chief architect for Delphi at Borland, and at Microsoft he was lead architect for C# and Typescript.


I always find it interesting how Danes are great at making programming languages. Anders perhaps being the most prolific among them.


I'm curious which other Danes have made great programming languages too?

Not that I doubt you, but Anders Hejlsberg is just so incredibly prolific and has developed so many great programming languages over such a long time, that it's easy to mistake him for a bunch of different Danish people at once! ;) It's hard to believe one person did all that!

The world also has the great Danes to thank for the Hundeprutterutchebane.

https://en.wikipedia.org/wiki/Hundeprutterutchebane

https://www.youtube.com/watch?v=oL8az1fV1Tw


Danes have been disproportionately successful at creating high impact languages and tools. (Canadians too)

Here’s a list.

Anders Hejlsberg (came up with Turbo Pascal, lead architect for C#)

Bjarne Stroustrup (came up with C++)

David Heinemeier Hansson (created Ruby on Rails)

Lars Bak (created V8 Javascript VM for Google Chrome)

Peter Naur (Turing Award winner, contributed to ALGOL 60)

Rasmus Lerdorf (created PHP, born in Greenland)

https://www.friism.com/famous-danish-programmers/


Dane here. I have a few more for the list (and surely many I forgot):

- You mentioned Peter Naur, but I must point out that he's the 'N' in BNF

- Per Brinch Hansen (concurrent Pascal, brought us monitors)

- Ole Lehrmann Madsen (BETA)

...

I think it's a network effect. Early practitioners in Denmark had a PL focus and fostered the next generation.


And J++.


It is hard to overstate now what a breakthrough the original Turbo Pascal was in development tools for the PC. RIP Borland - for a while there you were really the business.


RIP Pascal as well. Delphi wasn't without its frustrations, but Pascal was a far better language than people give it credit for these days.


Why? See how Free Pascal Compiler is alive: https://github.com/graemeg/freepascal/commits/master And its IDE: https://github.com/graemeg/lazarus/commits/upstream

And we have new versions of Delphi every year, with a free Community edition: https://beta.embarcadero.com/products/delphi/starter

PS, I myself write all my software with Delphi, and this is the newest one: https://docxmanager.com - something that turns MS Word into something like WordPress :)


I’m aware Pascal is still around (I have Lazarus installed and use it from time to time) but given how dominant Pascal once was, it’s fair to say the Pascal development community is a fraction of what it once was (even smaller if you compare it to the popularity of other languages right now).


Having been there, I start to see some similarities in the way JetBrains is attempting to go everywhere and replace OS vendors' own IDEs and languages.


YES! Me too, I'm developing in Go now and their Goland IDE really reminds me of some of the best of the old days @ Borland.


Worth noting that JetBrains was founded in 2000, at a time when Borland Delphi was at its peak.


Delphi was extremely fast, and even the later, much heavier releases such as Delphi 4 were a lot faster back then on older machines than Lazarus/FreePascal is today on modern hardware.

To give some figures, I've installed an old non-legit version of Delphi 7 on Linux through Wine and timed some compilation tests, from start of compilation to application running. On Lazarus, an empty form takes something like 28 seconds to compile the first time (uncached), while from the 2nd time on it takes 2 to 3 seconds. A form filled with objects takes 3 seconds on Delphi 7 uncached, and about 1 second cached. All this under Wine, on an i5 machine with a mechanical disk drive; the difference is astounding. I wish Lazarus/FreePascal received some optimizations to get closer to the speed of Delphi.


Faster than FreePascal? And I thought FreePascal was already damn fast, well, at least before I knew Go.


Yes, hence why Go compile times, while welcome, are only surprising to a generation lost in scripting languages and C++ compile times with crazy meta-programming tricks.


Stats like this remind me how insanely slow the Rust compiler is. It's possible to write a compiler which spends 110 cycles per source byte. How many cycles per source byte does the Rust compiler consume? It seems like it's 4-5 orders of magnitude slower, which is so crazy!

Why is it so hard to optimize? Are we just being lazy? What's going on?


Perhaps a better question is, what are compiler authors optimizing for? A compiler that runs in 3 seconds but produces code that's half the speed of a compiler that runs in 3 minutes is a worse option than the latter in plenty of use cases. And if we really care about minimizing compile times (and if we're being a little pedantic) we can just use an interpreted language and achieve zero compile times, by virtue of not doing any compiling at all.

Rust and Pascal have different use cases in mind. I think a better question would be what are the different design philosophies between the two languages, and how this impacts compiler implementations.


There is an amount of time spent compiling which has a qualitative effect on my programming. I start to feel it around ten seconds, definitely by thirty; three minutes is way over the line.

I start to be reluctant to make small tweaks. I want to program in 'blocks', timing compiles with bathroom breaks. My frame of mind is just wrecked, and I'm fairly convinced that code quality suffers from this.

I like what Zig is doing, by taking such care to keep unoptimized builds as fast as possible. Release builds can run overnight for all I care, I'm fine with taking arbitrary amounts of time to get the fastest end product I can have.

But I firmly believe that a language which can't give me a build in a handful of seconds is missing out, and it's a real pity Rust is in that category.

It's possible that a language with type inference and a borrow checker simply must have slow compile times, that it's inherent. I doubt that somehow, though I concede that one-pass compiling a la Turbo Pascal is likely impossible.


Are those your compile times with incremental compilation?


Those are real compile times I've experienced; the post is about what it feels like when compiling takes "too long" and approximately what "too long" is, rather than what it's like to use a particular compiler on a specific project at this moment in time.


Pascal's original role was as a teaching language. Students will invariably compile multiple times making many errors along the way, but never run their final iteration in production.

For those who don't remember "funny money" I'll point out that, not only did we wear onions on our belts, but CPU usage was charged back in the day.

:beratna owte wameku ere nebula:


> not only did we wear onions on our belts

It was the style at the time.


Well, ideally you have both! For what I’m working on right now, a plain debug compile takes an hour. Since I usually interact with a debug build for just a few minutes (if I’m not playing tricks) I would really like a “make this faster to compile but slower to run” option. (I doubt one could exist, but you never know.)


Except you are forgetting that with many native compilers I can choose to wait 3 seconds or 3 minutes; I am not stuck with a fixed compile time, because their toolchains allow me to choose how long I am willing to wait, including the option to use binary libraries from their package managers.

D, C++, Ada, Eiffel, Delphi,...


>Are we just being lazy?

"We" being who here? I'm pretty sure the Rust devs work like crazy on the language and its features -- they're not being "lazy". At best, they deprioritize compiler speed work, which is not the same as laziness.


Performing semantic analysis on a language like Rust is much harder than on Pascal, i.e. to compile a Rust file you have to perform static analysis as part of the compilation.

It can be faster, though. D is a very big language and I can compile the entire compiler in a second flat on my machine (i.e. it's cheap enough that I don't bother caching the resulting object file to include in my project).


Static analysis is a minor part of Rust compile cost. The main cost is from optimization, as well as from redundant work caused by monomorphized generic code or custom macros.


The rust compiler is pretty slow even in debug mode, with optimizations turned off. (Although macros and monomorphization still happen in debug mode).

In general I don't buy the complexity argument. Rust is more complex than Go, D or Delphi. But it's not 50x more complex, or whatever we would guess from the compilation speed difference.

I suspect a lot of the compilation time bloat is a result of all the intermediate representations the code goes through, in order to play nice with LLVM's optimizers and codegen. (Or to put it another way, I suspect a compiler optimized for development mode iteration has the potential to be way faster than rustc.) If you mostly don't care about runtime performance, a debugging compiler could probably also skip monomorphization and just compile everything once and use dynamic dispatch for generics.


Rust doesn't have to be much slower but when your compiler has three IRs to go through it doesn't help.

D is a pretty complex language, but it's explicitly designed around compile times in many locations.

Changing the basic implementation of language features probably won't help a huge amount; you could probably get a lot back with a simple backend that does no optimisations other than register allocation. (The main D compiler can do more than this, but it lags behind LLVM by quite a way, so it's not useful for fast code.)


Three IRs can actually help; it's not that simple. For example, https://blog.rust-lang.org/inside-rust/2019/12/02/const-prop... is an example of performing an optimization on an earlier IR, which gives you a 2%-10% speed boost.

> Even though LLVM has its own constant propagation pass, we see improvements because our pass operates on MIR while it is still generic. The more concrete instances of a generic function that are instantiated, the larger the payoff from this optimization.

It can be counter-intuitive, but it just really depends on the details. Of course, a single pass compiler will be faster to compile than a more complicated one, but it's not as straightforward as "more IRs == slower."


My point was more along the lines that the language is sufficiently complex to lose the bigger picture when it comes to compile-time speed, rather than that the compiler is definitely slower due to said IRs.


D can be a good language for comparing compile-time speeds, as it has LDC (the LLVM D compiler), GDC (the GCC D compiler), and the official compiler optimised for speed. This way we can see how much overhead stuff like LLVM adds, and how much better the perf is.


This long series of blog posts ("How to speed up the Rust compiler") will be of interest: https://blog.mozilla.org/nnethercote/2020/09/08/how-to-speed...

I think they've all been discussed on HN at some point.


LLVM.

The race between the most used compilers is in producing the fastest code. There are only a few folks who care about correctness and compile times.

LLVM, while its drawbacks are not often highlighted, is a monolith and only slightly better than GCC in some areas, compile times being one of them.

Not to forget "fast" compilers don't apply many optimizations.

There are some rust specific reasons too, like lack of binary dependencies.


I cut my teeth on Delphi 2. Didn't realize how good I had it! Solid IDE that Just Worked right out of the box with point and click GUI builder. And the builds were fast and IDE responsive even on "average" hardware of the day.


And packaging and installation was a doddle, if I remember correctly. Everything just rolled up into one installable exe?


I loved Delphi. And Borland and Borland-progeny products.

I have a shrink-wrapped box of Turbo Pascal 7 for DOS.

It's been long enough since physical acquisition of software that I'm not even sure how to say that. "Box of Turbo Pascal" seems awkward.


A gift (c. 1997?) from my parents of a "box" of Turbo Pascal is easily one of the most influential moments of my life. It was such a great programming environment and the docs (books!) were fantastic and quite accessible. I'm still chasing the dragon of the feeling of programming I had back then.


Absolutely. The rapid feedback from the near instant compile was so addictive. I would be in flow state for so long. If only I knew about unit tests back then.

I really only get that these days with toy challenge problems. That might also have a lot to do with the nature of programming now. So much is based on googling known solutions and scouring examples, etc. Probably a better use of time, but you don't get the ego trip of feeling like a wizard.


The first version wasn't even a box - a paperback book with floppy disks. Imagine that - printed documentation!


My first foray into programming was reading the GW BASIC manual and then asking my father whether he could get me GW BASIC; I got QBasic instead, which nicely integrated all the text from that tome in the IDE. Right now I wonder what children will start with and how it would compare to my journey. Personally I feel, while JS in a browser is so much more capable, it's all but beginner-friendly and its main benefit to beginners is just ubiquity.


I'm the author of the tweet. Here's some more context:

The main purpose of the testing was to measure compile time scaling on simple code that has straightforward equivalents in all languages. This meant integer and float operations, compound expressions making use of the full range of arithmetic and bitwise operators, local variables, function definitions and calls. So the code wasn't trivial (unlike most of the marketing-oriented tests or benchmarks I came across), but I also wasn't claiming that this was directly representative of a real-world code base of comparable size.

I was working on a fast compiler (which generated machine code somewhere between gcc -O0 and -O1 in code quality) and wanted to have something to compare it across languages and compilers while being able to easily vary test parameters like total code size, size of each module, complexity of the module graph, identifier/whitespace/comment length distribution, etc.

"Lines of code" is a pretty poor metric for engineering but unlike "tokens of code" or "bytes of code" it's something for which programmers have an intuitive sense of scale, so it made more sense for a tweet. Tokens of code is the most useful code size parameter for measuring one-pass compiler performance if you have to pick a single number. In an expression like "x + 1" you can assign the cost of the parsing, type checking and code generation of the expression (separate from the sub-expressions "x" and "1") to the "+" token. Even the cost of lexing is often dominated by the per-token cost (the switch jump on the leading byte is usually a forced branch mispredict of ~15 cycles) if you do the per-byte handling for variable-length tokens like identifiers efficiently. [1] If you fix the token distribution, you get almost-linear scaling with token count. [2]

[1] Identifier bytes make up 60-70% of a typical code base, so micro-optimization can really pay off here if the rest of the compiler is fast. I used a SIMD method with a mispredict-free fast path for identifiers shorter than 16/32 bytes, with vectorized scanning and hashing, and the symbol table lookup/insertion was done at the same time and tuned such that it was mispredict-free 90% of the time for lookups and only had 1 mispredict for inserts which occur the first time a given identifier is seen in a module.

[2] Except for threshold effects when your working set starts pushing you out of a level of the cache hierarchy. There's two factors: the size of your symbol working set, and your utilization of symbol hash tables. E.g. if a region of code is using a tiny subset of a huge symbol table, each loaded cacheline from the hash table is only expected to contain one symbol from your working set due to the pseudo-random distribution of hashes, but that caps out pretty quickly. With 64-byte cachelines and interleaved key/value pairs of 16 bytes, this part can't get worse than 25% cache utilization. So the worst-case footprint from the symbol working set is one 64-byte cacheline per symbol in the working set plus whatever associated symbol data is accessed. This utilization is improved if you put more symbol data directly into the hash table cacheline but unfortunately that would mean you don't have stable pointers to that data when you need to rehash.


I did a similar test some years ago: https://imgur.com/a/jQUav#xVgi2ZA (Have you tried plotting some data?) I found some interesting and perhaps surprising results. (But I also acknowledge that it was very far from real-world code, any conclusions drawn from that were only for fun.)


Yeah, I remember seeing your measurements at the time.

I did plots, but for most of the compilers I only measured at a handful of points in the parameter space since it took forever to run and I wanted to be able to regenerate the measurements in a reasonable amount of time if I changed the generator or parameters. I did dense multi-parameter measurements and model fitting for my own compiler; that's how it started, so the comparative data for other languages and compilers came later.


Turbo Pascal compiled thousands of lines a second on a 286. When I moved to Turbo C I was shocked at how long it took to compile code. What took 3 seconds now took 15.


If that scaled completely linearly - I'm sure it doesn't, but it might be close enough - it would mean that it could compile a few kloc in less time than most interpreters/VMs take to boot up.

For me, "instant" compiler and test feedback would be the ideal development experience. (Plus a good debugger)

I think sadly there are some features (e.g. HM-style type inference) that I want to have but would be an order of magnitude or two slower than this. But still, I'd love to see a compiler for a modern language use something like this as a goal. Even 10kloc/thread*sec would be quite good, now that we have 16+ thread CPUs readily available.


The "other" fast compiler (tcc) can do this:

https://bellard.org/tcc/tcc-doc.html

> TCC can be invoked from scripts, just as shell scripts. You just need to add #!/usr/local/bin/tcc -run at the start of your C source:

  #!/usr/local/bin/tcc -run
  #include <stdio.h>

  int main() 
  {
    printf("Hello World\n");
    return 0;
  }
And famously, with some patches it can compile the Linux kernel at boot time (in reasonable time):

https://bellard.org/tcc/tccboot.html


Back in the days of floppy drives, an interesting technical challenge was: write a system that boots, compiles itself, and enters its REPL before the drive mechanics manage to hit full RPM.

What would the equivalent be on current systems? I suppose it might be possible to write a system that boots in a single bandwidth-delay product, but somehow that doesn't have the same attraction.

ando wok fash nawit towsh nagut (running light without overbyte)


Before the flash drive heat throttles itself?


There used to be a Borland Museum that had free copies of Turbo Pascal and Turbo C, but it is down now; every link I have for it is dead. I was hoping they'd release Delphi 2.0 or 1.0, since it is 16-bit Windows.


The old museum site is currently offline, but if you visit the Delphi 25th anniversary site (https://delphi.embarcadero.com/) you can find in the menu at the top a link to download "Historic Delphi 1 Client/Server Install ISO"


Maybe we need to make a museum site museum.


Windows 3.1 and Windows 95 can run in DOSBox; just install the Borland software in a DOSBox machine.

Edit: I got DOSEMU mixed up with DOSBox.


Delphi 2 was the version that introduced the 32-bit compiler and ran on Windows 95. My impression is that Embarcadero (who bought and own the rights to Delphi) still have customers with legacy Delphi applications who pay them for their newer backward-compatible Delphi product, so they're unlikely to release the older versions for free. It's also a bit of a hassle to run on modern PCs. With Turbo Pascal you can just use DOSBox but you need to run Delphi on an old Windows version. If it runs in Wine, that might be the easiest option.



Thanks


There's a video of this on YouTube; it always fascinated me. I asked an old-timer about it, and he said it was designed to be simple to compile and the compiler was written in assembly. Not sure how truthful this is; I would love to hear more about how and why it's so fast. I would also love to someday see Free Pascal be just as fast. It's a shame Borland didn't try some sort of collaboration with FPC. I am sure FPC would welcome a joint effort, a "you help us, we help you" kind of deal.


I think optimization passes are what take the most time in a compiler. Older compilers did a lot less.


Well then, at least with -O0, similar compile times should be achievable?


Modern compilers are still pass-based with an optimisation-friendly IR even if you skip the actual optimisation passes. A single-pass compiler will be faster because it does not have the overhead of doing things in multiple passes.


-O0 still does some optimization from what I can find online. Maybe someone knows more specifics since the manual is pretty brief.


I can only confirm that -O0 performs some optimization, which can be annoying when single-stepping through the code (cough, cough) using a debugger. Hence gcc offers -Og to suppress those optimizations (not sure, whether that in itself causes some debugging info to be added though).


Turbo Pascal was single-pass: parse and compile in one go.


I think a very early Turbo Pascal was in assembly, but I would find that hard to believe of Delphi.


It isn't that much of a stretch to believe that they kept most of the compiler in assembly. The language itself didn't change that much from Borland/Turbo Pascal 6.0/7.0 to Delphi iirc.

Also, there was a source code leak about a decade ago of Turbo Pascal 6.0 that showed the compiler was still 100% x86 assembly. Just found it via Yandex - it was pretty hard to find with Google. (I only looked at the directory listings of the source code filenames just now, at some sourceforge-like site.)


As a young lad, I studied the internals of BLS Pascal (the great-grandfather of Turbo Pascal) and was blown away by the cleverness of much of the code. I had never seen such clever machine code before and have ever since considered Anders a bit of a genius.


Anders isn't a 'bit of a genius' when it comes to language/runtime design; he is straight up one of the geniuses.

I'm 40, I've programmed since '87, and the languages I've truly loved using in the real world have always been Anders languages in some way (TP -> Delphi -> C#/.NET -> TypeScript). He has amazing taste in what he implements and when; he just has an eye for practical languages that solve common problems nicely.



Free Pascal and Lazarus - the open source alternative to Delphi, is also extremely fast.

A demo that I put up some time ago of developing a very simple GUI Hello World app, from starting a fresh project to running it, in less than 42 seconds.

https://www.youtube.com/watch?v=XsCDqzL3k7k


And here I am with a Gtk-rs project that takes 18 minutes to compile from scratch, with an empty frame and a couple of menu entries, on a Core Duo.

My test project to validate compile time improvements on each Rust release.

The C++ original with Gtkmm takes less than 10 minutes.

Really miss Delphi like compile times, so then I just end up doing something on Qt, Lazarus or Eiffel Community instead.


I have fond memories of working in Turbo Pascal. Compiling and running a program felt instantaneous.


Reminds me of a comic I saw in an old Apple ][ magazine:

(One kid talking to another, with dad in the background hunched over an Apple ][.)

"Daddy's playing Pascal. That's where you try and see how many dots you can get before it beeps and you say nasty words."


Not surprised [0]. Pascal was always very fast to compile, especially with Borland tools.

[0] https://prog21.dadgum.com/47.html


Good old times ... Thank you for the nostalgic reminder.


Yup. Delphi was always stupidly fast.

I recall first time using another IDE I thought the compiler had hung. Nope...just slow.


It doesn't matter. Most of the time now is spent on code optimization. And it's becoming more significant every day.


Pascal compiles to some sort of bytecode, doesn't it?

One would think it's an easier task than all the fancy x86 business?


Early Pascals did. P-code [0], it was called. I don't think it was really about ease of compilation; it was about ease of portability, at least early on.

[0] https://en.wikipedia.org/wiki/P-code_machine


It was, and it was key to why Pascal became so successful. Porting the VM of the UCSD p-machine was so much easier than writing a new backend, so Pascal spread like wildfire. I think I even saw a Commodore 64 port at some point.


Not compilers released by Borland/successors or Free Pascal. It is all native code.


Now let's compare performance after implementing 2 decades worth of missing program optimization techniques

(edit: downvotes? I'm sorry to ruin the romanticizing, but surely it should come as no surprise that these old compilers aren't fast due to black magic)


For C compilers like gcc and clang it's comparing against -O0. Anyway, there are two issues with that long-term trend in compiler design from my perspective. The first is that the optimizing path is too slow and they don't offer an intermediate point on the Pareto frontier that reflects my trade-off preferences because too much of the slowness is architectural in origin. But the second and much worse problem is that the non-optimizing path is barely any faster. You can't make a fast non-optimizing or somewhat-optimizing compiler by taking the architecture of a heavyweight optimizing compiler and setting some internal flags and skipping a few passes. You need the fast compilation path to completely dictate the architecture from end to end, with the constraint of still supporting optimizing backends, and you need a development culture that treats the fast path's performance as inviolate and sacrosanct.


I only want that on a few of my builds.

Most days I could be happy with 1 second builds.

Considering that I don’t have a million lines of code, I think 100,000 lines of code a second should cover it.


My prat comment about C++ is the compiler writers are stuck in a trap. Because compiling C++ is slow they add optimizations to the compiler (also written in C++) to make the code faster. But that makes the compiler even slower. So they add even more optimizations to the compiler to make it run faster...


No.

C++ is slow to compile due to some historical decisions like preprocessor.

But the C++ standards committee has come to rely on optimizing compilers with powerful inliners and optimizers to implement new "zero cost" [sic] abstractions as convoluted template libraries with lots of corner cases (to say nothing of incomprehensible error messages and debug build performance).

Efficiently compiling C++ effectively __requires__ a top-notch optimizer because of the disconnect between the standards committee and implementers. Moreover, relying on optimizers often results in non-obvious performance traps.

The stupidity of the standards committee and their sheer insistence on nothing but being able to sell more books every year is why even modern C++ is such a horrible development experience.


This description is absolutely at variance with the reality of C++ Standard activity. Compiler and Standard Library implementers are intimately involved in all phases of standardization. Book sales have much less effect than authors would like; they would prefer many fewer changes.

Furthermore, C++ coding experience improves markedly with each new Standard, released on a 3-year cadence. C++20 is nicer than '17, which is nicer than '14, which is nicer than '11, which is much nicer than '03, which is a little nicer than '98. '23 will be nicer than all of them.


Eh? I don't understand what kind of involvement. I occasionally read C++ proposals, and also read blogs from some very good C++ programmers.

The "everything implemented as a library" pattern so pervasive in new C++ releases is not doing the language any good. All of those are half-baked implementations of what exists as a simple, orthogonal language feature in languages like OCaml. The "implement as library" sentiment has gone so far that it hurts, among other things:

* Debug build performance

* Compilation times (a big thing)

* Quality of error messages

* Reliance on not-so-reliable optimizations, often causing performance regressions (there are many instances of this happening)

Even if clang/gcc developers are involved, they probably don't bother about the 1000::template shittery because their compilers have already got comparable-quality inliners, and it is hard for competition to surface here. Or they might bother but can't be sufficiently involved with the C++ standards committee crowd.

> Furthermore, C++ coding experience improves markedly with each new Standard, released on a 3-year cadence.

And don't tell me my knowledge of the last standard is obsolete and I have to read the shiny new C++ book to learn Modern C++. Another consequence is the scarcity of freely available resources accessible to students, due to the rapid change.


You may invent whichever activities suit your imagination, but I can state with absolute authority of decades' experience that they are utterly divorced from any objective reality.

Most people learning the newer features of C++ do not turn first to books, but rely on cppreference.com, blog posts, and on conference videos easily found on youtube.

The best way to learn the new features is to use them.


Beauty is in the eye of the beholder. Clearly C++20 has more features than C++17, which had more features than C++14, etc., but nicer? There are some features which are nice, but the whole set? And some of those features solve problems one doesn't have outside of C++, while other problems remain. I think C++ is the Algol 68 of our days. It won't be improved upon, but (hopefully soon) replaced.


I can only report my own experience, but programming with concepts is definitely better than without, with generic lambdas better than without, with template variadics better than without, with pass-by-move better than without.

It would be lovely to have a more modern language with C++'s capabilities, but no such language is even approaching the horizon. Rust aspires to such a role, but more people pick up C++ each week than have ever coded one line of Rust. Adoption, judging from conference attendance and youtube viewership, is today growing faster than ever.

You may imagine C++ stagnating, but only by abandoning reality. Essentially all of the highest-paid development for the most demanding applications is today done in C++. For reasons.

There are only two kinds of language: those people complain about, and those no one uses. C++ is complained about more than any except Java or Javascript.


You don't get it: the language is terrible from a compilation point of view. I would say "designed", but that would be an insult to people who actually did/do design languages.

You can't actually fix it, you can only make it worse.

Simple point: adding optimizations doesn't speed C++ compilers up faster than having to do those optimizations slows them down.

Solution, stop using C++ for anything.

The most hopeful thing in the last 5 years is the move to remove C++ from new languages tool chains. Good because C++ is hopelessly broken.


> Simple point: adding optimizations doesn't speed C++ compilers up faster than having to do those optimizations slows them down.

Well that's an obvious point.

> Solution, stop using C++ for anything.

We can't, realistically.

I know Rust is great in many aspects. I know a lot of software in C++ could be written better even in an AOT'd GC language, with equal or better performance, because of the many C++ performance pitfalls that appear at scale.

But we can't throw away all the code already written. There is a big talent pool for C++, and the supposedly better languages have a nearly nonexistent talent pool. (Go doesn't count; such a boilerplate-heavy language hampers readability at scale.)


I wonder what these "performance pitfalls at scale" are that other languages succeed in avoiding. Normally such an expression would refer to abstraction overhead, but this is where C++ excels. Cache nonlinearities, with pipeline stalls and branch prediction failures, affect all languages. Control of memory footprint is essential to avoiding those, but C++ exceeds others there, too.


I don't care about optimizations, I just want it to compile fast so that I can test if my new code solves the bug.

Before final release to QA, we can run the compiler over night on the CI and get a hyper-optimized build.

Or not, since it is a compiled language it will likely be fast enough anyway.


So what? A compiler should catch bugs as soon as possible, as much as possible, at compile time. If it can do this fast, great! If not, then not that great, but still far better than chewing through billions of lines of code and doing nothing, leaving all your bugs to run time.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: