
The Case for D


Andrei Alexandrescu

 

D could be best described as a high-level systems programming language

Andrei Alexandrescu is the author of Modern C++ Design and The D Programming Language. He can be contacted at erdani.org/.

 


Let's see why the D programming language is worth a serious look.

 

Of course, I'm not deluding myself that it's an easy task to convince you. We programmers are a strange bunch in the way we form and keep language preferences. The knee-jerk reaction of a programmer when eyeing a The XYZ Programming Language book on a bookstore shelf is something like, "All right. I'll give myself 30 seconds to find something I don't like about XYZ." Acquiring expertise in a programming language is a long and arduous process, and satisfaction is delayed and uncertain. Trying to find quick reasons to avoid such an endeavor is a survival instinct: the stakes are high and the investment is risky, so having the ability to make a rapid negative decision early in the process can be a huge relief.

 

That being said, learning and using a programming language can be fun. By and large, coding in a language is fun if the language does a satisfactory job at fulfilling the principles that the coder using it holds in high esteem. Any misalignment causes the programmer to regard the language as, for example, sloppy and insecure or self-righteous and tedious. A language can't possibly fulfill everyone's needs and taste at the same time as many of them are contradictory, so it must carefully commit to a few fundamental coordinates that put it on the landscape of programming languages.

 

So what's the deal with D? You might have heard of it already -- the language with a name like a pun taken a bit too far; annoyingly mentioned now and then on newsgroups dedicated to other languages before the off-topic police reprimands the guilty; praised by an enthusiastic friend all too often; or simply as the result of an idle online search à la "I bet some loser on this big large Internet defined a language called D, let's see... oh, look!"

 

In this article I provide a broad overview, which means by necessity I use concepts and features without introducing them rigorously as long as they are reasonably intuitive.

 

Let's take a brief look at some of D's fundamental features. Be warned that many features or limitations come with qualifications that make their boundaries fuzzy. So if you read something that doesn't quite please you, don't let that bother you too much: the next sentence may contain a redeeming addendum. For example, say you read "D has garbage collection" and you get a familiar frozen chill up the spine that stops in the cerebrum with the imperious command "touch the rabbit foot and stay away." If you are patient, you'll find out that D has constructors and destructors with which you can implement deterministic lifetime of objects.
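To make that concrete, here is a minimal sketch (the names are invented for illustration, not taken from the article) showing that a struct's destructor runs deterministically at scope exit rather than at the garbage collector's whim:

```d
import std.stdio;

int releases;   // counts destructor runs, purely for demonstration

struct Resource {
    string name;
    ~this() { ++releases; }   // runs deterministically at scope exit
}

int demonstrate() {
    releases = 0;
    {
        auto r = Resource("db handle");
        assert(releases == 0);   // still alive inside its scope
    }                            // destructor fires right here
    return releases;
}

void main() {
    writeln(demonstrate());   // prints 1
}
```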

 

But Before Getting Into It...

 

Before getting into the thick of things, there are a few things you should know. First and foremost, if you kind of considered looking into D for whatever reason, this time is not "as good as any," it's in fact much better than others if you're looking for the edge given by early adoption. D has been evolving at a breakneck pace but in relative silence, and a lot of awesome things have been and are being done about it that are starting to become known just about now -- some literally in this very article. At this writing, my book The D Programming Language is 40% complete and available for pre-order at Amazon. Safari's Rough Cuts subscription-based service makes advance chapters available here.

 

There are two major versions of the language -- D1 and D2. This article focuses on D2 exclusively. D1 is stable (will undergo no other changes but bug fixes), and D2 is a major revision of the language that sacrificed some backwards compatibility for the sake of doing things consistently right, and for adding a few crucial features related to manycores and generic programming. In the process, the language's complexity has increased, which is in fact a good indicator because no language in actual use has ever gotten smaller. Even languages that started with the stated intent to be "small and beautiful" inevitably grew with use. (Yes, even Lisp. Spare me.) Although programmers dream of the idea of small, simple languages, when they wake up they seem to only want more modeling power. D's state of transition is putting yours truly in the unenviable position of dealing with a moving target. I opted for writing an article that ages nicely at the expense of being occasionally frustrating in that it describes features that are in the works or are incompletely implemented.

 

The official D compiler is available for free off digitalmars.com on major desktop platforms (Windows, Mac, and Linux). Other implementations are underway, notably including a .NET port and one using the LLVM infrastructure as backend. There are also two essential D libraries: the official one, Phobos, and a community-driven library called Tango. Tango, designed for D1, is being ported to D2, and Phobos (which was frustratingly small and quirky in its D1 iteration) is undergoing major changes and additions to take full advantage of D2's capabilities. (There is, unsurprisingly, an amount of politics and bickering about which library is better, but competition seems to spur both into being as good as they can be.)

 

Last but definitely not least, two windowing libraries complete the language's offering quite spectacularly. The mature library DWT is a direct port of Java's SWT. A newer development is that the immensely popular Qt Software windowing library has recently released a D binding (in alpha as of this writing). This is no small news as Qt is a great (the best if you listen to the right people) library for developing portable GUI applications. The two libraries fully take D into "the GUIth dimension."

 

D Fundamentals

 

D could be best described as a high-level systems programming language. It encompasses features that are normally found in higher-level and even scripting languages -- such as a rapid edit-run cycle, garbage collection, built-in hashtables, or a permission to omit many type declarations -- but also low-level features such as pointers, storage management in a manual (à la C's malloc/free) or semi-automatic (using constructors, destructors, and a unique scope statement) manner, and generally the same direct relationship with memory that C and C++ programmers know and love. In fact, D can link and call C functions directly with no intervening translation layer. The entire C standard library is directly available to D programs. However, you'd very rarely feel compelled to go that low because D's own facilities are often more powerful, safer, and just as efficient. By and large, D makes a strong statement that convenience and efficiency are not necessarily at odds. Aside from the higher-level topics that we'll discuss soon, no description of D would be complete without mentioning its attention to detail: all variables are initialized, unless you initialize them with void; arrays and associative arrays are intuitive and easy on the eyes; iteration is clean; NaN is actually used; overloading rules can be understood; support for documentation and unit testing is built-in. D is multi-paradigm, meaning that it fosters writing code in object-oriented, generic, functional, and procedural style within a seamless and remarkably small package. The following bite-sized sections give away some generalities about D.
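A few of those amenities fit in one bite-sized sketch: default initialization, NaN for uninitialized floats, built-in associative arrays, and built-in unit tests (the function name is invented for the example):

```d
import std.stdio;
import std.math : isNaN;

int lookup() {
    int x;                 // automatically initialized to 0
    double d;
    assert(isNaN(d));      // floats default to NaN, so stale reads are obvious
    int[string] ages;      // built-in associative array
    ages["Alice"] = 32;
    return ages["Alice"] + x;
}

unittest {                 // built-in unit testing: compile with -unittest
    assert(lookup() == 32);
}

void main() {
    writeln(lookup());     // prints 32
}
```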

 

Hello, Cheapshot

 

Let's get that pesky syntax out of the way. So, without further ado:

 

import std.stdio;

void main() {
    writeln("Hello, world!");
}

 

Syntax is like people's outfits -- rationally, we understand it shouldn't make much of a difference and that it's shallow to care about it too much, but on the other hand we can't stop noticing it. (I remember the girl in red from The Matrix to this day.) For many of us, D has much of the "next door" familiar looks in that it adopted the C-style syntax also present in C++, Java, and C#. (I assume you are familiar with one of these, so I don't need to explain that D has par for the course features such as integers, floating-point numbers, arrays, iteration, and recursion.)

 

Speaking of other languages, please allow a cheapshot at the C and C++ versions of "Hello, world." The classic C version, as lifted straight from the second edition of K&R, looks like this:

 

#include <stdio.h>
main()
{
    printf("hello, world\n");
}

 

and the equally classic C++ version is (note the added enthusiasm):

 

#include <iostream>
int main() {
    std::cout << "Hello, world!\n";
}

 

Many comparisons of the popular first program written in various languages revolve around code length and amount of information needed to understand the sample. Let's take a different route by discussing correctness, namely: what happens if, for whatever reason, writing the greeting to the standard output fails? Well, the C program ignores the error because it doesn't check the value returned by printf. To tell the truth, it's actually a bit worse; although on my system it compiles flag-free and runs, C's "hello world" returns an unpredictable number to the operating system because it falls through the end of main. (On my machine, it exits with 13, which got me a little scared. Then I realized why: "hello, world\n" has 13 characters; printf returns the number of characters printed, so it deposits 13 in the EAX register; the exit code luckily doesn't touch that register; so that's ultimately what the OS sees.) It turns out that the program as written is not even correct under the C89 or C99 standards. After a bit of searching, The Internet seems to agree that the right way to open the hailing frequencies in C is:

 

#include <stdio.h>
int main() {
    printf("hello, world\n");
    return 0;
}

 

which does little in the way of correctness because it replaces an unpredictable return with one that always claims success, whether or not printing succeeded.

 

The C++ program is guaranteed to return 0 from main if you forgot to return, but also ignores the error because, um, at program start std::cout.exceptions() is zero and nobody checks for std::cout.bad() after the output. So both programs will claim success even if they failed to print the message for whatever reason. The corrected C and C++ versions of the global greet lose a little of their lip gloss:

 

#include <stdio.h>
int main() {
    return printf("hello, world\n") < 0;
}

 

and

 

#include <iostream>
int main() {
    std::cout << "Hello, world!\n";
    return std::cout.bad();
}

 

Further investigation reveals that the classic "hello, world" for other languages such as Java (code omitted due to space limitations), J# (a language completely -- I mean completely -- unrelated to Java), or Perl, also claim success in all cases. You'd almost think it's a conspiracy, but fortunately the likes of Python and C# come to the rescue by throwing an exception.

 

How does the D version fare? Well, it doesn't need any change: writeln throws on failure, and an exception issued by main causes the exception's message to be printed to the standard error stream (if possible) and the program to exit with a failure exit code. In short, the right thing is done automatically. I wouldn't have taken this cheapshot if it weren't for two reasons. One, it's fun to imagine the street riots of millions of betrayed programmers crying how their "Hello, world" program has been a sham. (Picture the slogans: "Hello, world! Exit code 13. Coincidence?" or "Hello, sheeple! Wake up!" etc.) Second, the example is not isolated, but illustrative for a pervasive pattern -- D attempts not only to allow you to do the right thing, it systematically attempts to make the right thing synonymous to the path of least resistance whenever possible. And it turns out they can be synonymous more often than one might think. (And before you fix my code, "void main()" is legal D and does what you think it should. Language lawyers who destroy noobs writing "void main()" instead of "int main()" in C++ newsgroups would need to find another favorite pastime if they switch to D.)

 

Heck, I planned to discuss syntax and ended up with semantics. Getting back to syntax, there is one notable departure from C++, C#, and Java: D uses T!(X, Y, Z) instead of T<X, Y, Z> (and T!(X) or simply T!X for T<X>) to denote parameterized types, and for good reasons. The choice of angular brackets, when clashing with the use of '<', '>', and '>>' as comparison and shift operators, has been a huge source of parsing problems for C++, leading to a hecatomb of special rules and arbitrary disambiguations, not to mention the world's least known syntax object.template fun<arg>(). If one of your C++ fellow coders has Superman-level confidence, ask them what that syntax does and you'll see Kryptonite at work. Java and C# also adopted the angular brackets but wisely chose to disallow arithmetic expressions as parameters, thus preemptively crippling the option to ever add them later. D extends the traditional unary operator '!' to binary uses and goes with the classic parentheses (which (I'm sure) you always pair properly).
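For illustration, here's the ! instantiation syntax on a made-up parameterized type (Pair is a hypothetical example, not a library artifact):

```d
// Pair is an invented example type used only to show the syntax.
struct Pair(A, B) {
    A first;
    B second;
}

void main() {
    auto p = Pair!(int, string)(42, "hi");  // T!(X, Y) instead of T<X, Y>
    Pair!(double, double) q;                // declarations work the same way
    assert(p.first == 42 && p.second == "hi");
}
```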

 

Compilation Model

 

D's unit of compilation, protection, and modularity is the file. The unit of packaging is a directory. And that's about as sophisticated as it goes. There's no pretense that the program source code would really feel better in a super-duper database. This approach uses a "database" tuned by the best of us for a long time, integrating perfectly with version control, backup, OS-grade protection, journaling, what have you, and also makes for a low entry barrier for development as all you need is an editor and a compiler. Speaking of which, specialized tool support is at this time scarce, but you can find things like the emacs mode d-mode, vim support, the Eclipse plug-in Descent, the Linux debugger ZeroBugs, and the full IDE Poseidon.

 

Generating code is a classic two-stroke compile and link cycle, but that happens considerably faster than in most similar environments, for two reasons, no, three.

 

  • One, the language's grammar allows separate and highly optimized lexing, parsing, and analysis steps.
  • Two, you can easily instruct the compiler to not generate many object files like most compilers do, and instead construct everything in memory and make only one linear commit to disk.
  • Three, Walter Bright, the creator and original implementor of D, is an inveterate expert in optimization.

 

This low latency means you can use D as a heck of an interpreter (the shebang notation is supported, too).

 

D has a true module system that supports separate compilation and generates and uses module summaries (highbrowspeak for "header files") automatically from source, so you don't need to worry about maintaining redundant files separately, unless you really wish to, in which case you can. Yep, that stops that nag right in mid-sentence.

Memory Model and Manycores

 

Given that D can call C functions directly, it may seem that D builds straight on C's memory model. That might be good news if it weren't for the pink elephant in the room dangerously close to that Ming-dynasty vase: manycores -- massively parallel architectures that throw processing power at you like it's going out of style, if only you could use it. Manycores are here, and C's way of dealing with them is extremely pedestrian and error-prone. Other procedural and object-oriented languages have made only modest improvements, a state of affairs that marked a recrudescence of functional languages that rely on immutability to elegantly sidestep many parallelism-related problems.

 

Being a relative newcomer, D is in the enviable position of placing a much more informed bet when it comes to threading. And D bets the farm on a memory model that is in a certain way radically different from many others. You see, old-school threads worked like this: you call a primitive to start a new thread and the new thread instantly sees and can touch any data in the program. Optionally and with obscure OS-dependent means, a thread could also acquire the so-called thread-private data for its own use. In short, memory is by default shared across all threads. This has caused problems yesterday, and it makes today a living hell. Yesterday's problems were caused by the unwieldy nature of concurrent updates: it's very hard to properly track and synchronize things such that data stays in good shape throughout. But people were putting up with it because the notion of shared memory was a close model to the reality in hardware, and as such was efficient when gotten right. Now is where we're getting to the "living hell" part -- nowadays, memory is in fact increasingly less shared. Today's reality in hardware is that processors communicate with memory through a deep memory hierarchy, a large part of which is private to each core. So not only shared memory is hard to work with, it turns out to be quickly becoming the slower way of doing things because it is increasingly removed from physical reality.

 

While traditional languages were wrestling with all of these problems, functional languages took a principled position stemming from mathematical purity: we're not interested in modeling hardware, they said, but we'd like to model true math. And math for the most part does not have mutation and is time-invariant, which makes it an ideal candidate for parallel computing. (Imagine the moment when one of those first mathematicians-turned-programmers heard about parallel computing -- they must have slapped their forehead: 'Wait a minute!...') It was well noted in functional programming circles that such a computational model does inherently favor out-of-order, parallel execution, but that potential was more of a latent energy than a realized goal until recent times. Today, it becomes increasingly clear that a functional, mutation-free style of writing programs will be highly relevant for at least parts of a serious application that wants to tap into parallel processing.

 

So where's D positioning itself in all this? There's one essential concept forming the basis of D's approach to parallelism:

 

Memory is thread-private by default, shared on demand.

 

In D, all memory is by default private to the thread using it; even unwitting globals are allocated per-thread. When sharing is desired, objects can be qualified with shared, which means that they are visible from several threads at once. Crucially, the type system knows about shared data and limits what can be done with it to ensure that proper synchronization mechanisms are used throughout. This model avoids very elegantly a host of thorny problems related to synchronization of access in default-shared threaded languages. In those languages, the type system has no idea which data is supposed to be shared and which isn't so it often relies on the honor system -- the programmer is trusted to annotate shared data appropriately. Then complicated rules explain what happens in various scenarios involving unshared data, shared annotated data, data that's not annotated yet still shared, and combinations of the above -- in a very clear manner so all five people who can understand them will understand them, and everybody calls it a day.
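A small sketch of the default-private model (using core.thread and core.atomic; the names are invented for the example). The module-level int is thread-local, so each thread writes its own copy, while the shared int is visible to both threads and accessed atomically:

```d
import core.thread;
import core.atomic : atomicOp, atomicLoad, atomicStore;

int counter;        // module-level, yet thread-local: each thread has its own copy
shared int total;   // explicitly shared across threads

int demo() {
    atomicStore(total, 0);
    counter = 1;
    atomicOp!"+="(total, 1);     // shared data is accessed with synchronization
    auto t = new Thread({
        counter = 100;           // touches only the new thread's own copy
        atomicOp!"+="(total, 1);
    });
    t.start();
    t.join();
    assert(counter == 1);        // our copy is untouched by the other thread
    return atomicLoad(total);
}

void main() {
    import std.stdio : writeln;
    writeln(demo());             // prints 2
}
```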

 

Support for manycores is a very active area of research and development, and a good model has not been found yet. Starting with the solid foundation of a default-private memory model, D is incrementally deploying amenities that don't restrict its options: pure functions, lock-free primitives, good old lock-based programming, message queues (planned), and more. More advanced features such as ownership types are being discussed.

 

 

Immutability

 

So far so good, but what happened to all that waxing about the purity of math, immutability, and functional-style code? D acknowledges the crucial role that functional-style programming and immutability have for solid parallel programs (and not only parallel, for that matter), so it defines immutable as a qualifier for data that never, ever changes. At the same time D also recognizes that mutation is often the best means to a goal, not to mention the style of programming that is familiar to many of us. D's answer is rather interesting, as it encompasses mutable data and immutable data in a seamless whole.

 

Why is immutable data awesome? Sharing immutable data across threads never needs synchronization, and no synchronization is really the fastest synchronization around. The trick is to make sure that read-only really means read-only, otherwise all guarantees fall apart. To support that important aspect of parallel programs, D provides an unparalleled (there goes the lowest of all literary devices right there) support for mixed functional and imperative programming. Data adorned with the immutable qualifier provides a strong static guarantee -- a correctly typed program cannot change immutable data. Moreover, immutability is deep -- if you are in immutable territory and follow a reference, you'll always stay in immutable territory. (Why? Otherwise, it all comes unglued as you think you share immutable data but end up unwittingly sharing mutable data, in which case we're back to the complicated rules we wanted to avoid in the first place.) Entire subgraphs of the interconnected objects in a program can be "painted" immutable with ease. The type system knows where they are and allows free thread-sharing for them and also optimizes their access more aggressively in single-threaded code, too.
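A sketch of how the qualifier behaves; the commented-out lines show what the type system rejects, and the helper function is invented for the example:

```d
immutable(int)[] freeze(int[] src) {
    return src.idup;    // a fresh immutable copy; returning src itself is rejected
}

void main() {
    immutable int[] data = [1, 2, 3];
    // data[0] = 4;              // error: cannot modify immutable data
    immutable(int)[] tail = data[1 .. $];  // deep: slices of immutable stay immutable
    assert(tail == [2, 3]);

    int[] mut = [1, 2, 3];
    auto frozen = freeze(mut);
    mut[0] = 99;
    assert(frozen[0] == 1);      // the immutable copy is unaffected
}
```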

 

Is D the first language to have proposed a default-private memory model? Not at all. What sets D apart is that it has integrated default-private thread memory with immutable and mutable data under one system. The temptation is high to get into more detail about that, but let's leave that for another day and continue the overview.

 

Safety High On the List

 

Being a systems-level language, D allows extremely efficient and equally dangerous constructs: it allows unmanaged pointers, manual memory management, and casts that can break the most careful design into pieces.

 

However, D also has a simple mechanism to mark a module as "safe," and a corresponding compilation mode that forces memory safety. Successfully compiling code under that subset of the language -- affectionately dubbed "SafeD" -- does not guarantee that your code is portable, that you used only sound programming practices, or that you don't need unit tests. SafeD is focused only on eliminating memory corruption possibilities. Safe modules (or triggering safe compilation mode) impose extra semantic checks that disallow at compilation time all dangerous language features such as forging pointers or escaping addresses of stack variables.

 

In SafeD you cannot have memory corruption. Safe modules should form the bulk of a large application, whereas "system" modules should be rare and far between, and also undergo increased attention during code reviews. Plenty of good applications can be written entirely in SafeD, but for something like a memory allocator you'd have to get your hands greasy. And it's great that you don't need to escape to a different language for certain parts of your application. At the time of this writing, SafeD is not finished, but is an area of active development.
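In today's compilers this idea surfaces as per-function @safe/@system attributes rather than only a per-module switch; here is a sketch of the division of labor (function names are invented):

```d
@safe int sum(const int[] a) {
    int total = 0;
    foreach (x; a) total += x;   // bounds-checked; no pointer forging possible here
    return total;
}

@system int* forge(size_t addr) {
    return cast(int*) addr;      // forging a pointer: only legal in @system code
}

void main() @safe {
    assert(sum([1, 2, 3]) == 6);
    // auto p = forge(0xdead);   // error: cannot call @system function from @safe code
}
```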

 

Read My Lips: No More Axe

 

D is multi-paradigm, which is a pretentious way of saying that it doesn't have an axe to grind. D got the memo. Everything is not necessarily an object, a function, a list, a hashtable, or the Tooth Fairy. It depends on you what you make it. Programming in D can therefore feel liberating because when you want to solve a problem you don't need to spend time thinking of ways to adapt it to your hammer (axe?) of choice. Now, truth be told, freedom comes with responsibility: you now need to spend time figuring out what design would best fit a given problem.

 

By refusing to commit to One True Way, D follows the tradition started by C++, with the advantage that D provides more support for each paradigm in turn, better integration between various paradigms, and considerably less friction in following any and all of them. This is the advantage of a good pupil; obviously D owes a lot to C++, as well as less eclectic languages such as Java, Haskell, Eiffel, Javascript, Python, and Lisp. (Actually most languages owe their diction to Lisp, some just won't admit it.)

 

A good example of D's eclectic nature is resource management. Some languages bet on the notion that garbage collection is all you need for managing resources. C++ programmers recognize the merit of RAII and some say it's everything needed for resource management. Each group lacks intimate experience with the tools of the other, which leads to comical debates in which the parties don't even understand each other's arguments. The truth is that neither approach is sufficient, for which reason D breaks with monoculture.

 

Object-Oriented Features

 

In D you get structs and then you get classes. They share many amenities but have different charters: structs are value types, whereas classes are meant for dynamic polymorphism and are accessed solely by reference. That way confusions, slicing-related bugs, and comments à la // No! Do NOT inherit! do not exist. When you design a type, you decide upfront whether it'll be a monomorphic value or a polymorphic reference. C++ famously allows defining ambiguous-gender types, but their use is rare, error-prone, and objectionable enough to warrant simply avoiding them by design.

 

D's object-orientation offering is similar to Java's and C#'s: single inheritance of implementation, multiple inheritance of interface. That makes Java and C# code remarkably easy to port into a working D implementation. D decided to forgo language-level support for multiple inheritance of implementation, but also doesn't go with the sour-grapes theory "Multiple Inheritance is Evil: How an Amulet Can Help." Instead, D simply acknowledges the difficulty in making multiple inheritance of state work efficiently and in useful ways. To provide most of the benefits of multiple inheritance at controllable cost, D allows a type to use multiple subtyping like this:

 

class WidgetBase { ... }
class Gadget { ... }
class Widget : WidgetBase, Interface1, Interface2 {
    Gadget getGadget() { ... }
    alias getGadget this; // Widget subtypes Gadget!
}

 

The alias introduction works like this: whenever a Gadget is expected but all you have is a Widget, the compiler calls getGadget and obtains it. The call is entirely transparent, because if it weren't, that wouldn't be subtyping; it would be something frustratingly close to it. (If you felt that was an innuendo, it probably was.) Moreover, getGadget has complete discretion over completing the task -- it may return e.g. a sub-object of this or a brand new object. You'd still need to do some routing to intercept method calls if you need to, which sounds like a lot of boilerplate coding, but here's where D's reflection and code generation abilities come to fore (see below). The basic idea is that D allows you to subtype as you need via alias this. You can even subtype int if you feel like it.
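Here's a compilable elaboration of the idea (Gadget's method and the free function are invented for the example):

```d
class Gadget {
    int power() { return 9000; }
}

class Widget {
    private Gadget g;
    this() { g = new Gadget; }
    Gadget getGadget() { return g; }
    alias getGadget this;        // Widget now subtypes Gadget
}

int usePower(Gadget g) { return g.power(); }

void main() {
    auto w = new Widget;
    assert(usePower(w) == 9000); // implicit: the compiler inserts w.getGadget()
    assert(w.power() == 9000);   // Gadget methods usable directly through Widget
}
```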

 

D has integrated other tried and true techniques from experience with object orientation, such as an explicit override keyword to avoid accidental overriding, signals and slots, and a technique I can't mention because it's trademarked, so let's call it contract programming.

 

Functional Programming

 

Quick, how do you define a functional-style Fibonacci function?

 

uint fib(uint n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

 

I confess to entertaining fantasies. One of these fantasies has it that I go back in time and somehow eradicate this implementation of Fibonacci such that no Computer Science teacher ever teaches it. (Next on the list are bubble sort and the O(n log n)-space quicksort implementation. But fib outdoes both by a large margin. Also, killing Hitler or Stalin is dicey as it has hard-to-assess consequences, whereas killing fib is just good.) fib takes exponential time to complete and as such promotes nothing but ignorance of complexity and of the costs of computation, a "cute excuses sloppy" attitude, and SUV driving. You know how bad exponential is? fib(10) and fib(20) take negligible time on my machine, whereas fib(50) takes nineteen and a half minutes. In all likelihood, evaluating fib(1000) will outlast humankind, which gives me solace because it's what we deserve if we continue teaching bad programming.

 

Fine, so then what does a "green" functional Fibonacci implementation look like?

 

uint fib(uint n) {
    uint iter(uint i, uint fib_1, uint fib_2) {
        return i == n ? fib_2 : iter(i + 1, fib_1 + fib_2, fib_1);
    }
    return iter(0, 1, 0);
}

 

The revised version takes negligible time to complete fib(50). The implementation now takes O(n) time, and tail call optimization (which D implements) takes care of the space complexity. The problem is that the new fib kind of lost its glory. Essentially the revised implementation maintains two state variables in the disguise of function parameters, so we might as well come clean and write the straight loop that iter made unnecessarily obscure:

 

uint fib(uint n) {
    uint fib_1 = 1, fib_2 = 0;
    foreach (i; 0 .. n) {
        auto t = fib_1;
        fib_1 += fib_2;
        fib_2 = t;
    }
    return fib_2;
}

 

but (shriek of horror) this is not functional anymore! Look at all that disgusting mutation going on in the loop! One mistaken step, and we fell all the way from the peaks of mathematical purity down to the unsophisticatedness of the unwashed masses.

 

But if we sit for a minute and think of it, the iterative fib is not that unwashed. If you think of it as a black box, fib always outputs the same thing for a given input, and after all pure is what pure does. The fact that it uses private state may make it less functional in letter, but not in spirit. Pulling carefully on that thread, we reach a very interesting conclusion: as long as the mutable state in a function is entirely transitory (i.e., allocated on the stack) and private (i.e., not passed along by reference to functions that may taint it), then the function can be considered pure.

 

And that's how D defines functional purity: you can use mutation in the implementation of a pure function, as long as it's transitory and private. You can then put pure in that function's signature and the compiler will compile it without a hitch:

 

pure uint fib(uint n) { ... iterative implementation ... }

 

The way D relaxes purity is quite useful because you're getting the best of both worlds: iron-clad functional purity guarantees, and comfortable implementation when iteration is the preferred method. If that's not cool, I don't know what is.
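Filling in the elided body, the iterative implementation really does compile as a pure function, because fib_1, fib_2, and t are transitory and private:

```d
import std.stdio;

// Mutation inside is fine: all mutable state lives on the stack and never escapes.
pure uint fib(uint n) {
    uint fib_1 = 1, fib_2 = 0;
    foreach (i; 0 .. n) {
        auto t = fib_1;
        fib_1 += fib_2;
        fib_2 = t;
    }
    return fib_2;
}

void main() {
    writeln(fib(10));   // prints 55
}
```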

 

Last but not least, functional languages have another way of defining a Fibonacci sequence: as a so-called infinite list. Instead of a function, you define a lazy infinite list that gives you more Fibonacci numbers the more you read from it. D's standard library offers a pretty cool way of defining such lazy lists. For example, the code below outputs the first 50 Fibonacci numbers (you'd need to import std.range):

 

foreach (f; take(recurrence!("a[n-1] + a[n-2]")(0, 1), 50)) {
    writeln(f);
}

 

That's not a one-liner, it's a half-liner! The invocation of recurrence means, create an infinite list with the recurrence formula a[n] = a[n-1] + a[n-2] starting with numbers 0 and 1. In all this there is no dynamic memory allocation, no indirect function invocation, and no non-renewable resources used. The code is pretty much equivalent to the loop in the iterative implementation. To see how that can be done, you may want to read through the next section.

 

Generic Programming

 

(You know the kind of caution you feel when you want to describe to a friend a movie, a book, or some music you really like but don't want to spoil by overselling? That's the kind of caution I feel as I approach the subject of generic programming in D.) Generic programming has several definitions -- even the neutrality of the Wikipedia article on it is being disputed at the time of this writing. Some people refer to generic programming as "programming with parameterized types a.k.a. templates or generics," whereas others mean "expressing algorithms in the most generic form that preserves their complexity guarantees." I'll discuss a bit of the former in this section, and a bit of the latter in the next section.

 

D offers parameterized structs, classes, and functions with a simple syntax, for example here's a min function:

 

auto min(T)(T a, T b) { return b < a ? b : a; }
...
auto x = min(4, 5);

 

T is a type parameter, and a and b are regular function parameters. The auto return type leaves it to the compiler to figure out what type min returns. Here's the embryo of a list:

 

class List(T) { T value; List next; ... }
...
List!int numbers;

 

The fun only starts here. There's too much to tell in a short article to do the subject justice, so the next few paragraphs offer "deltas" -- differences from the languages with generics that you might already know.

 

Parameter Kinds. Generic parameters are not limited to types: numbers (integral and floating-point), strings, compile-time values of struct type, and aliases are accepted as well. An alias parameter is any symbolic entity in a program, which can in turn refer to a value, a type, a function, or even a template. (That's how D elegantly sidesteps the infinite regression of template template template... parameters: just pass it as an alias.) Alias parameters are also instrumental in defining lambda functions. Variable-length parameter lists are allowed, too.
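Here is a small sketch of an alias parameter in action (applyTwice and increment are illustrative names of my own, not library facilities):

```d
// fun is an alias parameter: any callable symbol -- a named
// function or a lambda -- binds at compile time, so there is no
// indirect call at run time.
auto applyTwice(alias fun, T)(T x)
{
    return fun(fun(x));
}

int increment(int x) { return x + 1; }

// pass a named function...
auto a = applyTwice!increment(5);     // 7
// ...or a lambda, just as easily
auto b = applyTwice!(x => x * 10)(5); // 500
```

This is the mechanism the earlier sort!((x, y) { return x > y; }) call relies on.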

 

String Manipulation. Passing strings to templates would be next to useless if there were no meaningful compile-time manipulation of strings. D offers full string manipulation capabilities during compilation (concatenation, indexing, selecting a substring, iterating, comparison, and so on).
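For instance, all of the following happens during compilation (a minimal sketch; greeting is a made-up name):

```d
// enum forces compile-time evaluation; static assert checks the
// results while compiling, before any code runs.
enum greeting = "hello, " ~ "world";        // concatenation
static assert(greeting[0] == 'h');          // indexing
static assert(greeting[7 .. $] == "world"); // slicing
static assert(greeting.length == 12);       // computed comparison
```

If any of these checks failed, the program would simply not compile.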

 

Code Generation: The Assembler of Generic Programming. Manipulating strings during compilation may be interesting, but is confined to the data flatland. What takes things into space is the ability to convert strings into code (by use of the mixin expression). Remember the recurrence example? It passed the recurrence formula for Fibonacci sequences into a library facility by means of a string. That facility in turn converted the string into code and provided arguments to it. As another example, here's how you sort ranges in D:

 

// define an array of integers
auto arr = [ 1, 3, 5, 2 ];
// sort increasingly (default)
sort(arr);
// decreasingly, using a lambda
sort!((x, y) { return x > y; })(arr);
// decreasingly, using code generation; comparison is
// a string with conventional parameters a and b
sort!("a > b")(arr);

 

Code generation is very powerful because it allows implementing entire features without a need for language-level support. For example, D lacks bitfields, but the standard module std.bitmanip defines a facility implementing them fully and efficiently.
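A minimal sketch of how that facility is used (assuming the std.bitmanip.bitfields interface, where the declared widths must add up to the size of an integral type -- here 3 + 3 + 2 = 8 bits):

```d
import std.bitmanip : bitfields;

struct RGB
{
    // bitfields generates the string of a declaration with packed
    // getters and setters r, g, b; mixin turns that string into code.
    mixin(bitfields!(
        uint, "r", 3,
        uint, "g", 3,
        uint, "b", 2));
}
```

The whole struct occupies a single byte, yet each field reads and writes like an ordinary member, all without any bitfield support in the language itself.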

 

Introspection. In a way, introspection (i.e., the ability to inspect a code entity) is the complement of code generation because it looks at code instead of generating it. It also offers support for code generation -- for example, introspection provides the information for generating a parsing function for some enumerated value. At this time, introspection is only partially supported. A better design has been blueprinted and the implementation is "on the list," so please stay tuned for more about that.

 

is and static if. Anyone who's written a non-trivial C++ template knows both the necessity and the encumbrances of (a) figuring out whether some code "would compile" and deciding what to do if yes vs. if not, and (b) checking for Boolean conditions statically and compiling in different code on each branch. In D, the Boolean compile-time expression is(typeof(expr)) yields true if expr is a valid expression, and false otherwise (without aborting compilation). Also, static if looks pretty much like if, except it operates during compilation on any valid D compile-time Boolean expression (i.e., #if done right). I can easily say these two features alone slash the complexity of generic code in half, and it filled me with chagrin that C++0x includes neither.
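A brief sketch combining the two features (describe is a made-up name):

```d
// is(typeof(expr)) asks "would this expression compile?" without
// aborting compilation; static if then selects which branch is
// actually compiled into the instantiation.
string describe(T)(T value)
{
    static if (is(typeof(value + value)))
        return "supports addition";
    else
        return "no addition";
}
```

Each instantiation contains only the branch that was selected; the other branch is never even compiled for that type.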

 

But Wait, There's... Well, You Know. Generic programming is a vast play field, and although D covers it with a surprisingly compact conceptual package, it would be hard to discuss matters further without giving more involved information. D has more to offer, such as customized error messages, constrained templates à la C++0x concepts (just a tad simpler -- what's a couple of orders of magnitude between friends?), tuples, a unique feature called "local instantiation" (crucial for flexible and efficient lambdas), and, if you call within the next five minutes, a knife that can cut through a frozen can of soda.

A Word on the Standard Library

 

This subject is a bit sensitive politically because, as mentioned, there are two full-fledged libraries that can be used with D, Phobos and Tango. I only worked on the former, so I will limit my comments to it. For my money, ever since the STL appeared, the landscape of containers+algorithms libraries has forever changed. It's changed so much, in fact, that every similar library developed after the STL but in ignorance of it runs serious risks of looking goofy. (Consequently, a bit of advice I'd give a programmer in any language is to understand the STL.) This is not because STL is a perfect library -- it isn't. It is inextricably tied to the strengths and weaknesses of C++; for example, it's efficient but has poor support for higher-order programming. Its symbiosis with C++ also makes it difficult for non-C++ programmers to understand the STL in the abstract, because it's hard to see its essence through all the nuts and bolts. Furthermore, STL has its own faults; for example, its conceptual framework fails to properly capture a host of containers and ways of iterating them.

 

STL's main merit was to reframe the entire question of what it means to write a library of fundamental containers and algorithms, and to redefine the process of writing one in the wake of the answer. The question STL asked was: "What's the minimum an algorithm could ever ask from the topology of the data it's operating on?" Surprisingly, most library implementers and even some algorithm pundits were treating the topic without due rigor. STL's angle put it in stark contrast to a unifying interface view in which, for example, it's okay to unify indexed access in arrays and linked lists because the topological aspect of performing it can be written off as just an implementation detail. STL revealed the demerit of such an approach because, for example, it's disingenuous to implement as little as a linear search by using a unifying interface (unless you enjoy waiting for quadratic algorithms to terminate). These are well-known truths to anyone serious in the least about algorithms, but somehow there was a disconnect between the understanding of algorithms and their most general implementations in a programming language. Although I was conversant with algorithm fundamentals, I can now say I had never really thought about what the pure, quintessential, Platonic linear search is until I first saw it in the STL 15 years ago.

 

That's a roundabout way of saying that Phobos (places to look at in the online documentation: std.algorithm and std.range) is a lot about the STL. If you ask me, Phobos' offering of algorithms is considerably better than STL's, and for two reasons. One, Phobos has the obvious advantage of climbing on the shoulders of giants (not to mention the toes of dwarfs). Two, it uses a superior language to its fullest.

 

Ranges are Hot, Iterators are Not. Probably the most visible departure from the STL is that there are no iterators in Phobos. The iterator abstraction is replaced with a range abstraction that is just as efficient but offers vastly better encapsulation, verifiability, and abstraction power. (If you think about it, none of an iterator's primitive operations is naturally checkable. That's just bizarre.) Code using ranges is just as fast as code using iterators, but safer and terser -- no more for loops too long to fit on one line. In fact, thinking and coding with ranges is so much terser that new idioms emerge -- idioms that may be thinkable with iterators but are way too cumbersome to carry through. For example, you might have thought of a chain function that iterates two sequences one after another. Very useful. But chain with iterators takes four iterators and returns two, which makes it too ham-fisted to be of any use. In contrast, chain with ranges takes two ranges and returns one. Furthermore, you can use variadic arguments to have chain accept any number of ranges -- and all of a sudden, we can avail ourselves of a very useful function. chain is actually implemented in the standard module std.range. As an example, here's how you can iterate through three arrays:

 

int[] a, b, c;
...
foreach (e; chain(a, b, c)) { ... use e ... }

 

Note that the arrays are not concatenated! chain leaves them in place and only crawls them in sequence. This means you might think you could change elements in the original arrays by means of chain -- which is entirely true. For example, guess what this code does:

 

sort(chain(a, b, c));

 

You're right -- the collective contents of the three arrays have been sorted, and, without modifying the size of the arrays, the elements have been efficiently arranged such that the smallest come in a and so on. This is just a small example of the possibilities offered by ranges and range combinators in conjunction with algorithms.
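To make that concrete, here is a minimal sketch (sortAcross is an illustrative wrapper of my own, not a library function; the values are arbitrary):

```d
import std.algorithm : sort;
import std.range : chain;

// Sorts the collective contents of three slices in place, across
// array boundaries, without concatenating or reallocating anything.
void sortAcross(int[] a, int[] b, int[] c)
{
    sort(chain(a, b, c));
}
```

Because D arrays are slices referring to their elements, the caller observes the rearrangement directly: after sortAcross([5, 2], [9, 1], [3])-style use on named slices, the first slice holds the smallest elements, and so on.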

 

Laziness to Infinity and Beyond. STL algorithms (and many others) are eager: by the time they return, they've finished their job. In contrast, Phobos uses lazy evaluation whenever it makes sense. By doing so, Phobos acquires better composition capabilities and the ability to operate with infinite ranges. For example, consider the prototypical higher-order function map (popular in functional programming circles, not to be confused with the homonymous STL data structure) that applies a given function to each element in a range. If map insisted on being eager, there'd be two problems.

 

  • First, it would have to allocate new space to store the result (e.g., a list or an array).
  • Second, it would have to consume the range in its entirety before returning control to the caller.

 

The first is an efficiency problem: memory allocation could and should be avoided in many cases (for example, the caller wants to just look at each result of map in turn). The second is a problem of principles: eager map simply can't deal with infinite ranges because it would loop forever.

 

That's why Phobos defines map to return a lazy range -- it incrementally makes progress as you consume elements from it. In contrast, the reduce function (in a way a converse of map) is eager. Some functions need both lazy and eager versions. For example, retro(r) returns a range that iterates the given range r backwards, whereas reverse(r) reverses r in place.
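A sketch of lazy map at work over an infinite range (using the current std.range argument order, where take receives the range first; firstFiveSquares is a made-up name):

```d
import std.algorithm : map;
import std.array : array;
import std.range : recurrence, take;

uint[] firstFiveSquares()
{
    // naturals is infinite; map attaches lazily, computing nothing yet
    auto naturals = recurrence!"a[n-1] + 1"(0u); // 0, 1, 2, ...
    auto squares  = naturals.map!(x => x * x);   // still no work done
    // only consuming the range forces evaluation, and only for
    // as many elements as are actually requested
    return squares.take(5).array;
}
```

Neither recurrence nor map loops forever here: each computes one element at a time, on demand, and take bounds the demand.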

 

Conclusion

 

There are more things to talk about, even in an overview, such as unit tests, UTF strings, compile-time function evaluation (a sort of D interpreter running during compilation of a D program), dynamic closures, and many others. But with any luck, your curiosity has been piqued. If you are looking for a system-level language without the agonizing pain, an application language without the boredom, a principled language without the attitude, or -- most importantly -- a weighted combination thereof, then D may be for you.

 

If you feel like asking further questions, write the author, or better yet, tune to the Usenet server news.digitalmars.com and post to the digitalmars.d newsgroup -- the hub of a vibrant community.

 

Acknowledgments

 

Many thanks to Scott Meyers who pointed out the necessity of such an article and suggested its title. I have gotten excellent reviews, feedback, and suggestions from Bill Baxter, Jason House, John "Eljay" Love-Jensen, Denis Koroskin, leonardo maffi (sic), Petru Mărginean, Bartosz Milewski, Derek Parnell, Brad Roberts, Joel Salomon, Benjamin Shropshire, David Simcha, Florin Trofin, Cristian Vlasceanu, and Walter Bright.
