How an old Drawbridge helped Microsoft bring SQL Server to Linux | Ars Technica
When in March this year Microsoft announced that it was bringing SQL Server to Linux the reaction was one of surprise, with the announcement prompting two big questions: why and how?
SQL Server is one of Microsoft’s major Windows applications, helping to drive Windows sales and keep people on the Windows platform. If you can run SQL Server on Linux, well, that’s one less reason to use Windows.
And while SQL Server does share heritage with Sybase SQL Server (now called SAP ASE), a Unix database server that Microsoft ported to Windows, that happened a long time ago. Since 1993, when Sybase and Microsoft went their separate ways, the products have diverged and, for the last 23 years, Microsoft SQL Server has been strictly a Windows application. That doesn’t normally make for easy porting.
Reaching the customers where they are
Even as a Windows-exclusive offering, SQL Server has widespread adoption, taking a substantial chunk of the paid database market. But being Windows-specific has meant that, equally, there was always a proportion of the market that SQL Server couldn’t reach. It doesn’t much matter what the merits of the database engine really are; it was off-limits to organizations that standardized on Linux.
Rohan Kumar, general manager of database systems at Microsoft, told us that there had long been some level of demand from customers who wanted the flexibility to pick a database without being forced onto a specific operating system but that over the past two to three years these calls had become louder. Containerization and Docker, in particular, were cited as stimulating this demand; organizations want to take advantage of these features to streamline their deployments and management, but at the same time, they want SQL Server’s features and its not-as-cripplingly-expensive-as-Oracle licensing.
(Windows Server 2016 supports Docker-compatible containers, too, but the broader point remains: companies want to be able to take advantage of platform features as they become available without tying their hands when it comes to decisions about applications.)
In the past, Microsoft might have ignored these requests. In the past, Microsoft did ignore these requests. But the company is changing and is increasingly going after opportunities that it might once have ignored. This has taken many forms over the past few years. The release of Office for iPad, for example, cedes what might otherwise have been a unique advantage for Windows-based platforms, but the opportunity to consolidate Office’s position across the industry and strengthen the Office 365 subscription value proposition has outweighed that concern.
Reaching more customers with more features
SQL Server is seeing two changes that we might not have expected from the Microsoft of old. One of these is already available today; with Service Pack 1 for SQL Server 2016, Microsoft radically changed the way the database’s different versions and price points worked. Traditionally, the different variants of SQL Server—Express, Standard, Enterprise—have offered two kinds of variation. They have different levels of scalability—Express is limited to a certain amount of RAM and certain sizes of database—and different features. Some high-end features, such as extensive encryption support, were unavailable in the cheaper versions of the database.
With Service Pack 1, the feature differences were substantially eliminated, leaving only the scalability differences. This means that even the free Express version, for example, has the same encryption capabilities as the expensive versions; it just can’t be used to host such big databases.
Scott Guthrie, executive vice president for cloud and enterprise, told us that this kind of change came with some risks, but it also offered opportunities. He pointed to Visual Studio Community Edition, released in 2014, as a kind of precursor to this. Making a fully featured version of Visual Studio available for free was a decision that could have cost revenue as paying developers moved to the free version (prior to Community Edition, the Express Edition products were free, but these had certain functional constraints not found in Community Edition), but Guthrie told us that the company had actually seen Visual Studio revenue grow. Teams and companies that otherwise might not have used Visual Studio at all used the product and became drawn to it.
Guthrie expresses a similar hope for Service Pack 1. Sure, some companies may have been paying for SQL Server solely to access features not found in the Express SKU, and they may now save that money, but the company hopes that this will be offset by developers making greater use of SQL Server in general, making greater use of SQL Server’s complex features (ones previously available only in expensive versions), and seeing increased sales as these new applications need the scalability that the paid SKUs provide.
Kumar told us that expectations for SQL Server on Linux were similar. There was some nervousness and trepidation when the decision was first made to bring SQL Server to Linux—would it simply cut Windows revenue as companies switched to the cheaper server operating system?—but as the project progressed, the team grew more confident that it would open up SQL Server to a whole set of customers who previously couldn’t even contemplate it because of its OS dependency.
Moreover, he said that without offering this kind of flexibility, there were customers using SQL Server that were likely to defect to other options anyway to reduce their platform lock-in. SQL Server for Linux carries risks, but so does keeping SQL Server restricted to Windows. The opportunity was felt to outweigh these concerns.
The initial response has been encouraging, with 21,000 people signing up to use the Linux preview and 3,000 to 4,000 using it extensively.
Lowering the Drawbridge
The decision to go ahead with SQL Server for Linux was made about 18 months ago. The question then became, how to do it?
It was seen as essential that SQL Server on Linux have identical semantics and performance to SQL Server on Windows, right down to the level of database file compatibility. This would be very difficult to do if the software were forked to have two separate versions each with their own approach to file I/O, memory management, threading, and so on.
But SQL Server is a large application, and although its interactions with Windows are relatively narrow—things like the graphical management tools are remaining Windows-only, at least for the time being, so a large part of the Windows API surface is avoided—it still uses about 1,500 Win32 API calls. Supporting all of these on Linux would be a major undertaking.
Drawbridge runs an almost complete operating system within a process.
The first piece of the puzzle was a Microsoft Research Project that was completed in 2011 called Drawbridge. The Drawbridge project explored a new approach to process virtualization and isolation with two major elements: a picoprocess, which is a lightweight process that has access to a small number of about 50 low-level kernel features, and a Library OS (LibOS), which is a modified operating system stack designed to run within a picoprocess.
An application and LibOS run within the picoprocess together, with LibOS providing all the operating system-like functionality that the application depends on, such as threading, virtual memory management, and a full set of file I/O features. LibOS talks to the underlying kernel using those 50 or so API calls.
This approach offers much of the isolation and security that virtual machines offer—each picoprocess has its own LibOS, so even if the application is compromised and the kernel attacked, it’s only the LibOS “kernel” running in user mode, rather than the real kernel running below—but with lower overheads. LibOS doesn’t need to provide an entire operating system’s worth of functionality; it can be restricted to only the APIs and services that a specific application needs. This cuts the per-application memory and processor overhead when compared to running each individual application within a virtual machine.
Though the Drawbridge project ended in 2011, Microsoft has drawn inspiration from it already. The picoprocess concept is used for the Windows Subsystem for Linux. Linux processes on Windows are “empty” processes that omit the usual Windows libraries. In fact, they can’t even make Windows kernel function calls. WSL doesn’t use a Library OS. Instead, it uses a kernel-mode component that offers the WSL picoprocesses the ability to make Linux kernel function calls; that component offers a subset of the Linux API built on top of the Windows NT kernel.
SQL Server’s operating system on top of an operating system
The second major element to SQL Server for Linux is a piece of work that the SQL Server team did for SQL Server 2005, called SQLOS.
Dependent as it was on Windows, SQL Server is actually something of an anomaly when it comes to Windows programs, because it avoids using Windows as much as it can. Windows, as a general purpose OS, offers all manner of functionality. It has a filesystem, with complex data caching. It has a thread scheduler, designed for everything from single core machines up to monstrous servers with hundreds of cores; it has a memory manager that works with anything from a browser on a machine with 2GB RAM to a database on a machine with 1TB RAM. It spans a huge range of configurations and workloads, and it does so with a single kernel that works the same way (albeit with a small number of tunable options) across the board.
SQL Server, however, is a performance-critical application that often (though not always) runs on dedicated hardware. The engineers understand its workload very well and want to optimize for that workload and that workload alone. Accordingly, they’ve designed SQL Server to handle many of the tasks that the operating system would normally handle.
For example, most Windows applications use the Windows file cache to improve disk I/O performance. SQL Server, however, uses unbuffered I/O, bypassing the Windows cache entirely, and instead implements its own internal caching system. Similarly, SQL Server handles scheduling and switching between threads, another task traditionally handled by the OS.
SQLOS (called “SOS” in this diagram) also replicates much of an operating system’s functionality.
Thanks to SQLOS, the core SQL Server engine doesn’t actually depend on Windows very much. SQLOS uses a few low-level features from the Windows kernel and then implements its own “operating system” on top of them.
In other words, SQLOS is doing a lot of the kind of work that LibOS does.
Not every part of SQL Server uses SQLOS. The SQL Server Agent service, used for scheduling tasks within the database, relies on regular OS handling of file I/O and process scheduling and so on. So too do elements such as SQL Server’s XML support (which uses Microsoft’s MSXML XML library) and SQLCLR, the .NET runtime embedded within the database. These need something closer to full Win32 to run—and again, that sounds a lot like LibOS as it provides a version of Win32, or at least a large part of it.
Taking the best of both worlds
Accordingly, Microsoft took these ideas and merged them. The final snapshot of the LibOS code from Drawbridge was merged with SQLOS, producing what Microsoft calls SQLPAL, the SQL Platform Abstraction Layer. Whereas the Drawbridge LibOS offered Windows-style thread scheduling, I/O management, and so on, SQLPAL took the SQL Server-optimized code from SQLOS for these features, creating a kind of cut-down version of Win32 with its performance tailored to SQL Server’s needs.
SQLPAL’s Win32 support is not complete—around 1 percent (81MB) of Windows libraries are used, with SQLPAL itself being another 8MB—but it’s sufficient to run MSXML, SQL Server Agent, and many of the other essential parts of SQL Server. While these components used the regular Windows scheduler, I/O, and memory management in older SQL Server, with SQLPAL even these parts now use the optimized, specialized routines.
Drawbridge’s LibraryOS combined with SQLOS gives us SQLPAL.
Beneath SQLPAL is a layer called a “host extension” (the equivalent in Drawbridge was called a Platform Abstraction Layer) that provides the bridge between the underlying platform, either Linux or Windows. Everything above the host extension is common code; the host extension has Win32 and Linux versions.
Indeed, the things running on top of the host extension are not merely common code; they’re common binaries. SQL Server on Linux is not a Linux executable in Linux’s ELF format. It’s a Windows executable, in Windows’ PE format. You could in principle take it and run it on Windows.
The design means that in broad strokes, the performance of SQL Server on Linux and SQL Server on Windows will be matched; the underlying OS is in many regards bypassed completely except for the set of calls that the host extension makes. However, a few pieces are performance critical, such as disk I/O. In these areas, Microsoft has cut back the amount of code between SQL Server and the underlying platform, enabling SQL Server to make more or less direct calls to the underlying operating system APIs. In some cases this requires a small amount of translation code—for example, SQL Server is built to use Windows’ scatter/gather I/O, which uses different memory layouts to Linux’s vectored I/O, so there must be conversion between the two—but often even this isn’t needed.
Work in progress
In the current preview release, the merging of SQLOS and LibOS is still ongoing. Currently, SQL Server includes its own SQLOS, sitting atop the SQLPAL. Microsoft is working to remove SQLOS and make SQL Server call into SQLPAL directly. When this is done, there will be a single SQLPAL with both SQL Server and all its ancillary processes and libraries running on top of it.
There are still parts of SQL Server that won’t be in the initial stable release of SQL Server for Linux. Some of these, such as a SQL Server feature called FileTables, are tricky because they depend on specific features of the NTFS filesystem. Similarly, integrated Windows authentication depends on Windows. It’s not clear how these features will be brought to Linux, if indeed they are. Other missing features, such as Full Text Search and Reporting Services, are likely to be brought over eventually; they just require additional developer time and so won’t be ready in time for the initial release. The long-term goal is to have every part of SQL Server that makes sense to run on Linux running on Linux.
Testers of the preview have even expressed a desire for this to include SQL Server’s graphical management tools. Currently, SQL Server for Linux offers only command-line-based management. The graphical tools run on Windows only at the moment, though they can be used to manage the database running on both operating systems.
To be truly credible as a Linux database, there’s more work than just porting SQL Server itself. Application support is also important. Microsoft has taken some steps toward this already—for example, the JDBC driver, used to access SQL Server from Java applications, was recently open sourced—but there’s much more work to do to ensure that SQL Server is on the same footing as other Linux database engines when it comes to support from management front-ends and software libraries. Kumar told us that this is an area of ongoing effort and that the importance is recognized.
With SQL Server on Linux, the Drawbridge project has been one of Microsoft’s more obviously fruitful pieces of research. Both the picoprocess concept and the LibOS concept have been used and are—albeit separately from one another—or will be, part of shipping software. Early indications suggest strong interest in SQL Server for Linux, and while the company is still a little way off, the end goal of having just “SQL Server,” rather than SQL Server for Windows and SQL Server for Linux, looks attainable.