November 06, 2009

nice ZIL device....

Working together with James McPherson, I've been able to get a driver for a really nifty device from DDRdrive -- see here -- working with OpenSolaris.



The device itself has interesting applications -- but I suspect the real killer application for it is in use as a "Logzilla" (ZIL) to accelerate synchronous write loads with ZFS. The only limitation on performance is the software and the single PCIe lane. (We're really right up against the PCIe single lane limit.)

From the driver perspective, it's really interesting because it works with only modest changes to blk2scsa (which I'll be integrating into Nevada soon). The driver itself is tiny -- only about 950 lines of code. It's proven to be a great validation of the concept of blk2scsa -- while I never intended blk2scsa for use with high performance devices, I'm ecstatic that it's as performant as it is. (Sorry, I can't post performance numbers here -- at least not yet.)

If you already have one of these devices, and want to test it in a non-production environment running OpenSolaris (no Solaris 10 yet!), please drop me a line. I'm willing to work with a few people to get some more testing done.

November Tokyo Linux User Group Meeting

The Tokyo Linux UG will have a technical meeting & nomikai on Saturday November 14th. Stop by.

DTrace Cheatsheet

A while ago Jonathan Adams posted an mdb cheatsheet, summarizing the syntax and commands of the Solaris Modular Debugger. It's a pretty handy reference, and last week I used it as a handout when teaching a class that included both mdb and DTrace. Well, I have the mdb cheatsheet - what about a DTrace one?

Here is my DTrace cheatsheet, styled by Todd. This doesn't include all of DTrace - it includes what I use most frequently. As a reminder, I flicked through the DTraceToolkit to see what I frequently used in those scripts.

If you learn what everything does on that one page cheatsheet including the one-liners, you'll know a significant amount of DTrace. If something has been summarized too much on that cheatsheet, refer to the full DTrace Guide.

November 05, 2009

OpenSolaris Sparc snv125 LiveCD

Following numerous requests from users, I've prepared a Sparc LiveCD based on osol-1002-125-ai-sparc.iso (see my previous post). Please remember that this ISO is an UNOFFICIAL version; I created it only for my tests. With this ISO you can test OpenSolaris on your hardware, try the AI tools, or install the system from the LiveCD manually, using the steps from my old posts (i.e. you need to create a zpool, copy all the LiveCD contents to the rootfs, disable the live-media services, change vfstab, etc.).
The ISO is temporarily available here.

gnome-terminal window sizing

A quick post on this, since I'm pretty sure more people run into the same problem. Usually the first thing I do after opening a new gnome-terminal is resize it so that it doesn't overlap other windows (or vice versa), or because I have a big screen, or because I know I'll need a really wide or tall terminal. Resizing all the time gets annoying, so I've added the following option to the gnome-panel launcher that opens a new terminal:

gnome-terminal --geometry=80x52

FIPS Capable OpenSSL for OpenSolaris

Earlier this morning I integrated into the SFW consolidation the changes for

PSARC/2009/507 FIPS Capable OpenSSL
6562055 FIPS-capable version of OpenSSL

A FIPS Capable OpenSSL is a regular OpenSSL built with the OpenSSL FIPS 140-2 Object Module which has been certified by NIST to be 140-2 compliant. It can be used in both a FIPS mode and as a regular OpenSSL. The certification for the OpenSSL FIPS 140-2 Object module was very unusual in that it was given for the source code instead of for a binary object. As long as the certified source has not been modified in any way and the security policy is strictly followed when building the source the certification remains valid. If you would like to see how OpenSSL is built for OpenSolaris, the full source for the SFW consolidation can be downloaded here.

The only application included in OpenSolaris which works properly in FIPS mode with the FIPS Capable OpenSSL is openssl(1).

$ LD_LIBRARY_PATH=/lib/openssl/fips-140 OPENSSL_FIPS=1 openssl version
OpenSSL 0.9.8k-fips 25 Mar 2009 (+ security fixes for: CVE-2009-1377 CVE-2009-1378 CVE-2009-1379 CVE-2009-2409)

Unfortunately it was not possible to simply replace the existing version of OpenSSL with the FIPS Capable OpenSSL. The main issue was one of performance - the OpenSSL FIPS Object Module is based on older code than the 0.9.8k release and performs poorly on some newer CPUs. We didn't feel that it was viable to introduce such a performance regression especially as most people aren't interested in a FIPS Capable OpenSSL. More information can be found in the PSARC mail logs.

We may deliver a version of SunSSH which can be run in a "FIPS mode" which will make use of the FIPS Capable OpenSSL.

The FIPS Capable OpenSSL will be available in build 128.

No need for fsck in ZFS

Recently there was an article at OSNews, "Should ZFS Have a fsck Tool?". Well, no, it shouldn't. I wanted to write an explanation of why that is the case, but Joerg got there first and there is no point in me repeating him. So if you wonder why ZFS doesn't need a fsck tool, read Joerg's blog entry about it.

Update: Recent Cloud Security Happenings

I have to say that it has been a very busy couple of weeks. That said, I am happy to report that there is a lot to show for everyone's effort. We have been able to publish quite a lot of new and updated content, and I figured that it might be a good time to shine a spotlight on some of the more interesting items. Without further ado...

Going forward, we are going to try and bring together all of the Cloud Computing security content on our brand new Sun.COM Cloud Security home page. Be sure to check it out regularly!

More is coming, don't miss it!

November 04, 2009

OSCON 2009: FreeBSD

The last "people" talk that I went to at OSCON was Marshall Kirk McKusick's talk "Building and Running an Open-Source Community". I was interested in this talk for a couple reasons. First, I don't know much about how the BSD communities work, and I'm always interested in how large open-source communities do things. Second, I was at Berkeley during the time of the Computer Systems Research Group (CSRG). And while I got to know some of the CSRG staff, I was not directly involved with the development of Berkeley Unix. So I wanted to find out more about what was going on while I was there.

Kirk mentioned that the CSRG started up in the 1970s, after Bill Joy was already at Berkeley. At first, they didn't use a source code control system. Then around 1980 they started using SCCS. There are various reasons for using a source code control system, such as making it easy to review changes if a regression is discovered. For the CSRG, introducing SCCS enabled better productivity for the CSRG staff. Although they still reviewed all the changes that were checked in prior to a release, they could hand off some of the mechanics, such as merging patches and testing, to trusted committers.

This basic structure, with a core team, a group of committers, and a group of developers, is still used today for FreeBSD development. Kirk mentioned a couple details that I thought were interesting. In particular, he said that most developers don't want to be committers. This is usually because they don't want to be involved that much; they just have a change or two that they want to see made. Kirk also mentioned that committers are held to higher standards for things like email etiquette. And all changes must be reviewed by at least one other committer.

The FreeBSD core team is 9 people who are nominated from and elected by the committers every 2 years. They maintain the FreeBSD roadmap, they resolve conflicts between committers, and they admit and remove committers.

Kirk pointed out that people can contribute to FreeBSD in ways other than writing code. They can write documentation, they can do testing, they can do release engineering, and so forth. These people can be committers, too, and there's no relative weighting between code committers and other committers. This also means that these other folks can be elected to the core team. In fact, the latest election brought an advocacy/marketing committer onto the core team. At the time of the talk, there were 390 committers and around 6,000 developers.

FreeBSD does a stable .0 release every couple years. The stable branch has a 5-year lifetime and a binary compatibility guarantee. Minor (dot) releases happen on the stable branch every 6 months or so. Development happens on the trunk, and important bug fixes are merged into the stable branch. They use Subversion and provide a CVS mirror.

The pre-release freeze times vary: for a new stable branch it's about a month, and for a dot release it's about a week. I'm not sure how to compare those times with the times for OpenSolaris. For example, OpenSolaris freezes are gradual: first there's a period where only bug fixes are allowed in (no new features), then there's a period where only fixes for stopper bugs are allowed in. I wish I'd thought to ask Kirk for more details on how FreeBSD manages freezes.

One of the ways the BSD license is different from the GPL or the CDDL is that it lets people make proprietary changes to the code. Kirk said that this does happen, but those changes are usually specific to a particular product. Because they aren't generally interesting, the FreeBSD project probably wouldn't take the changes even if they were offered.

Crossbow paper wins best paper award at Usenix LISA09 and BOF schedule

We had submitted another paper to the Usenix LISA 2009 conference, which is being held in Baltimore, MD from Nov 3-5, 2009. The paper is titled Crossbow Virtual Wire: Network in a Box. Yesterday we were informed that our paper won the Best Paper award for the conference. Woohoo!!

I met many people here at LISA who are already using Crossbow in very interesting ways, and I got many requests to hold a BOF while we are here. So I hit up our marketing VP for some beer budget (can't have a BOF without drinks), and we are now having a Crossbow and Solaris Networking BOF on Nov 4th, 2009 from 10.30 to 11.30pm in the Dover AB conference room. The venue details can be found on the Usenix LISA site here. If you are already at the conference or in the general area of Maryland, Virginia, DC, etc., please do come by. It would be good to attach faces to names, and we will have chilled beer. We will also be showing the Virtual Wire Builder kit for building your own virtual network (all available in open source form).

Once again, BOF details are
  • Crossbow & Solaris Networking BOF at Usenix Lisa 2009
  • Place: Marriott Waterfront Hotel, Baltimore, MD
  • Date: Nov 4th, 2009
  • Time: 10.30-11.30pm
  • Agenda: Virtual Wire Builder kit, Open discussion, Beer

Hope to see you there.

NEW: Solaris 10 Security Deep Dive Presentation

Today, I am very happy to announce the availability of a new Solaris 10 Security Deep Dive training. This version has been updated for Solaris 10 10/2009 (also known as Update 8). From a security perspective, there have only been a few updates since my last posted version, but it is always good to be current. Items added in this new version include: ZFS user and group quotas, ZFS pre-defined ACL sets, NTPv4, and nss_ldap shadowAccount support. In addition, there was a bit of cleanup throughout and a new example was added for Trusted Extensions.

As usual, I have made this content available in both OpenDocument Format (ODF) and PDF. If you are using Microsoft Office, you can use the Sun MS Office ODF Plugin to read the source document.

For those of you who have downloaded one of the previous versions, thank you! There have been nearly 8,000 downloads of this presentation so far! If you have not had a chance, I would encourage you to download and check out a copy today. It is really amazing how many new and updated security features and capabilities there are in Solaris 10. If you have been away from Solaris (even Solaris 10) for a while, I am sure you will be shocked with what you can do today! As always, feedback is greatly appreciated!

Take care!

Glenn

links for 2009-11-04

☞ Random Monday Grab-bag

November 03, 2009

moinakg


Following BeleniX 0.8 Alpha, which included exciting features such as Google Widgets, WebKit, and KDE 4.2.4 (all built with GCC 4.4.0) in addition to GNOME 2.26, BeleniX 0.8 Beta1 is now available. It brings improvements (bug fixes and functionality) to the KDE 4.3.1 desktop and other apps, along with new package additions. Several patches/fixes for various packages were taken from the Fedora Core 11 repository.

You need to use the Network Installer in order to install this 0.8 Beta1 Release. The Network Installer will not touch your current environment in any way. It creates a new Boot Environment and installs into that. Your current environment remains as the default one.

You can see the full announcement here: http://www.belenix.org/content/BeleniX-08-Beta1-available-Network-Installer

SUNWgtk2-print-cups / SUNWgtk2-print-papi in OpenSolaris 126

As part of the preparation for making CUPS the default on OpenSolaris, I have split two packages, SUNWgtk2-print-cups and SUNWgtk2-print-papi, out of SUNWgtk2 in OpenSolaris build 126.

Why did I do that?
The primary reason is that once LP ceases to be the default print system on the LiveCD, having the PAPI print backend on the LiveCD without /usr/lib/libpapi.so would cause every application that has a print dialog to *CRASH*.

Splitting this out allows the PAPI print backend not to be installed on the liveCD when CUPS becomes default and allows applications to continue to work properly.

When will CUPS as default happen?
The basic code to switch to CUPS as the default is in b127; however, a lot of package refactoring is still being worked on so that CUPS will be slimmer than LP on the LiveCD. The credit for that belongs to Gowtham T (and, as usual, Norm as the adviser).

As for when CUPS as the default becomes a reality on the LiveCD, that is all in the capable hands of Dave Miner and David Comay :).

So in the meantime, in b126/127, you may have to do:

$ pfexec pkg install SUNWgtk2-print-cups

if you are already using CUPS and notice that the printers you used to see are no longer visible in the print dialog.

Why can't this be fixed automatically?
It seems that until facets are implemented in IPS, I cannot easily specify some of the interdependencies. (See the discussion threads here and here.)

While googling, I was really excited to see that Bart is implementing facets with this bug.

GREAT News!

building an ON IPS repository

I've been working, with the gracious help of Mark, on making the ON consolidation create a pkg(5) repository as part of the build process. If you ever build the ON consolidation from source, this is probably interesting to you.

Our changes are destined to be integrated into the main ON gate, which should happen in November 2009 sometime (though that's subject to change and doesn't constitute a promise). We've tried to make it easy for folks to build their own ON IPS repositories for testing in advance of integration of our changes. You can access the latest instructions for building your own ON repository in the README which lives in our development mercurial repository.

If you do want to try this out, I strongly recommend subscribing to on-ips-dev@opensolaris.org, as that's where we're answering questions, giving heads-ups about important changes, and having development conversations. We've got some sizable changes coming over the next few weeks, including a protocmp which works on IPS manifests.

I'm really enjoying using the same tools as we expect our customers to use. It's now pkg image-update to update my ON development bits rather than the development-only tool bfu. pkg image-update is at least as fast as bfu, if not faster, especially over a slower link. That's because only the bits which have actually changed between versions are downloaded and updated by pkg(5). Nice. And nice that our normal upgrade experience is now as blindingly fast as ON developers have come to expect.

November 02, 2009

ZFS Deduplication Integrated!

It took longer than expected, but it has finally been integrated! Read Jeff Bonwick's post on ZFS dedup.
PSARC 2009/571 ZFS Deduplication Properties
6677093 zfs should have dedup capability
You can find code changes here.

Olhar Digital III (English)

Recently two of my kids took part in a television show about their experience with GNU/Linux. The program is in Brazilian Portuguese, but I found this YouTube version with English subtitles. So, if you want to watch… P.S.: I thought they were learning MS Windows at school… now I know I will need [...]


Immutable Service Containers @ Amazon EC2

Just in time for the OpenSolaris Developer Conference, we were able to publish new Immutable Service Containers images directly to the Amazon Web Services Elastic Compute Cloud (EC2) environment. Previously, I talked about creating ISCs using our security enhanced OpenSolaris 2009.06 AMIs. Today, I am happy to announce that we have taken the next logical step by making available AMIs that fully incorporate the ISC changes. If you want to try out this configuration, simply provision an Immutable Service Containers AMI on EC2. We have made AMIs available in both the U.S. (ami-48c32021) and European (ami-78567d0c) regions. As always, we would love to get your feedback on these images and what you would like to see next!

Take care!

Immutable Service Containers on Amazon EC2

Back in June, we released the very first security hardened virtual machine images for the Amazon Web Services Elastic Compute Cloud (EC2) environment. These original images were based upon the OpenSolaris 2008.11 release and were configured in accordance with the guidelines published by Sun and the Center for Internet Security. Since its initial release, we have provided an update to offer this image in the European Region. In August, we took another step forward with the release of a security-enhanced image based upon the OpenSolaris 2009.06 release. This image went beyond just the simple hardening of its predecessor to add functionality such as encrypted swap, non-executable stacks and auditing that was enabled by default. With such a strong foundation, it should have been no surprise that it was likely to be used as a foundation for layered functionality. Just this month, for example, we announced the release of an image pre-configured with Drupal (v6.10) along with Apache (v2.2), MySQL (v5.0), and PHP (v5.2).

In parallel, the Immutable Service Containers project was announced back in June. This project was focused on the creation of secure execution environments for services. One of the key deliverables from this project has been the OpenSolaris ISC Construction Kit (Preview) that transforms an OpenSolaris 2009.06 system into an ISC configuration. Interestingly, several of the functional elements used today as part of the security-enhanced AMIs actually got their start as part of the ISC Construction Kit.

This brings us to today. For the first time, we have been able to create ISCs in the Cloud on Amazon EC2! Using the OpenSolaris ISC Construction Kit and the security-enhanced OpenSolaris 2009.06 AMI, we have deployed an ISC that exposes a representative service (in this case, a web server).

HELLO WORLD!

The nice thing about this is that the installation process was essentially the same as the one we used to create our pre-configured OVF image. There were two settings that needed to be adjusted in order for the ISC Construction Kit to properly work on EC2:

export ISC_SVCS_DOCK="fs network zone encrypted_scratch"
export ISC_DOCK_NET_IF_NAME="xnf0"

These two parameters had to be set before running the iscadm.ksh command. The first parameter simply removes steps that have already been completed in the base AMI (or are not needed for EC2). The second parameter changes the network interface name from e1000g0 (default) to xnf0 which is needed on EC2. That's all there was to it!

If you are interested in ISCs and how you can use them in your environment, I would love to hear from you! Also, just in case you missed it, I had the pleasure of joining Hal Stern to discuss ISCs on a recent Innovating@Sun podcast. Check it out and send us your feedback! Thanks in advance!

Take care!

Immutable Service Containers @ OSDevCon

Just wanted to let everyone know of a new Immutable Service Containers technical presentation (ODF, PDF) that has been posted. This version was delivered last week in Dresden, Germany at the OpenSolaris Developer Conference. This presentation has all of the latest and greatest information particularly on the OpenSolaris ISC Construction Kit. As always, I would love to hear your comments and feedback!

Take care!

☞ Random Weekend Grab-bag

ZFS Deduplication

You knew this day was coming: ZFS now has built-in deduplication.

If you already know what dedup is and why you want it, you can skip the next couple of sections. For everyone else, let's start with a little background.

What is it?

Deduplication is the process of eliminating duplicate copies of data. Dedup is generally either file-level, block-level, or byte-level. Chunks of data -- files, blocks, or byte ranges -- are checksummed using some hash function that uniquely identifies data with very high probability. When using a secure hash like SHA256, the probability of a hash collision is about 2^-256 = 10^-77 or, in more familiar notation, 0.00000000000000000000000000000000000000000000000000000000000000000000000000001. For reference, this is 50 orders of magnitude less likely than an undetected, uncorrected ECC memory error on the most reliable hardware you can buy.
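As a quick sanity check on that figure, the exponent can be worked out directly. This is only a back-of-the-envelope sketch in Python; the variable names are illustrative, and it assumes the hash output is uniformly distributed.

```python
import math

# Probability that two different blocks share a given 256-bit hash,
# assuming the hash output is uniformly distributed.
collision_probability = 2.0 ** -256

# Equivalent power of ten: log10(2^-256) = -256 * log10(2)
power_of_ten = -256 * math.log10(2)

print(f"2^-256 = {collision_probability:.3e}")
print(f"log10(2^-256) = {power_of_ten:.2f}")
```

The value comes out just under 10^-77, which is where the rounded figure above comes from.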

Chunks of data are remembered in a table of some sort that maps the data's checksum to its storage location and reference count. When you store another copy of existing data, instead of allocating new space on disk, the dedup code just increments the reference count on the existing data. When data is highly replicated, which is typical of backup servers, virtual machine images, and source code repositories, deduplication can reduce space consumption not just by percentages, but by multiples.
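The table described above can be sketched in a few lines of Python. This is a toy model, not the ZFS implementation: the class name `DedupStore`, the in-memory list standing in for a disk, and the use of `hashlib.sha256` are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy block store: maps a checksum to [storage location, reference count]."""

    def __init__(self):
        self.table = {}   # checksum -> [location, refcount]
        self.blocks = []  # simulated disk: one list slot per stored block

    def write(self, data: bytes) -> int:
        """Store a block and return its location, reusing an existing copy if any."""
        key = hashlib.sha256(data).digest()
        entry = self.table.get(key)
        if entry is not None:
            # Duplicate: no new space is allocated, only the refcount grows.
            entry[1] += 1
            return entry[0]
        # New data: allocate a slot and remember it with a refcount of 1.
        self.blocks.append(data)
        location = len(self.blocks) - 1
        self.table[key] = [location, 1]
        return location
```

Writing the same block twice returns the same location and leaves only one copy in `blocks`; only the reference count changes.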

What to dedup: Files, blocks, or bytes?

Data can be deduplicated at the level of files, blocks, or bytes.

File-level assigns a hash signature to an entire file. File-level dedup has the lowest overhead when the natural granularity of data duplication is whole files, but it also has significant limitations: any change to any block in the file requires recomputing the checksum of the whole file, which means that if even one block changes, any space savings is lost because the two versions of the file are no longer identical. This is fine when the expected workload is something like JPEG or MPEG files, but is completely ineffective when managing things like virtual machine images, which are mostly identical but differ in a few blocks.

Block-level dedup has somewhat higher overhead than file-level dedup when whole files are duplicated, but unlike file-level dedup, it handles block-level data such as virtual machine images extremely well. Most of a VM image is duplicated data -- namely, a copy of the guest operating system -- but some blocks are unique to each VM. With block-level dedup, only the blocks that are unique to each VM consume additional storage space. All other blocks are shared.
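To make the difference concrete, here is a small Python sketch that compares how much space two nearly identical "VM images" would need under file-level and block-level dedup. The 4 KB block size, the helper names, and the synthetic images are all assumptions chosen for illustration; real images and real block sizes will differ.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a ZFS default


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def file_level_stored(files) -> int:
    """Bytes stored when only whole-file duplicates are eliminated."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


def block_level_stored(files) -> int:
    """Bytes stored when duplicate blocks are eliminated across all files."""
    unique = {}
    for f in files:
        for block in split_blocks(f):
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


# Two synthetic "VM images": eight shared guest-OS blocks plus one unique block each.
base = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
vm_a = base + b"A" * BLOCK_SIZE
vm_b = base + b"B" * BLOCK_SIZE

print("file-level bytes stored: ", file_level_stored([vm_a, vm_b]))
print("block-level bytes stored:", block_level_stored([vm_a, vm_b]))
```

Because the two images differ in one block, file-level dedup sees two distinct files and stores both in full, while block-level dedup stores the eight shared blocks once plus the two unique ones.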

Byte-level dedup is in principle the most general, but it is also the most costly because the dedup code must compute 'anchor points' to determine where the regions of duplicated vs. unique data begin and end. Nevertheless, this approach is ideal for certain mail servers, in which an attachment may appear many times but not necessarily be block-aligned in each user's inbox. This type of deduplication is generally best left to the application (e.g. Exchange server), because the application understands the data it's managing and can easily eliminate duplicates internally rather than relying on the storage system to find them after the fact.

ZFS provides block-level deduplication because this is the finest granularity that makes sense for a general-purpose storage system. Block-level dedup also maps naturally to ZFS's 256-bit block checksums, which provide unique block signatures for all blocks in a storage pool as long as the checksum function is cryptographically strong (e.g. SHA256).

When to dedup: now or later?

In addition to the file/block/byte-level distinction described above, deduplication can be either synchronous (aka real-time or in-line) or asynchronous (aka batch or off-line). In synchronous dedup, duplicates are eliminated as they appear. In asynchronous dedup, duplicates are stored on disk and eliminated later (e.g. at night). Asynchronous dedup is typically employed on storage systems that have limited CPU power and/or limited multithreading to minimize the impact on daytime performance. Given sufficient computing power, synchronous dedup is preferable because it never wastes space and never does needless disk writes of already-existing data.

ZFS deduplication is synchronous. ZFS assumes a highly multithreaded operating system (Solaris) and a hardware environment in which CPU cycles (GHz times cores times sockets) are proliferating much faster than I/O. This has been the general trend for the last twenty years, and the underlying physics suggests that it will continue.

How do I use it?

Ah, finally, the part you've really been waiting for.

If you have a storage pool named 'tank' and you want to use dedup, just type this:

zfs set dedup=on tank

That's it.

Like all zfs properties, the 'dedup' property follows the usual rules for ZFS dataset property inheritance. Thus, even though deduplication has pool-wide scope, you can opt in or opt out on a per-dataset basis.

What are the tradeoffs?

It all depends on your data.

If your data doesn't contain any duplicates, enabling dedup will add overhead (a more CPU-intensive checksum and on-disk dedup table entries) without providing any benefit. If your data does contain duplicates, enabling dedup will both save space and increase performance. The space savings are obvious; the performance improvement is due to the elimination of disk writes when storing duplicate data, plus the reduced memory footprint due to many applications sharing the same pages of memory.

Most storage environments contain a mix of data that is mostly unique and data that is mostly replicated. ZFS deduplication is per-dataset, which means you can selectively enable dedup only where it is likely to help. For example, suppose you have a storage pool containing home directories, virtual machine images, and source code repositories. You might choose to enable dedup as follows:

zfs set dedup=off tank/home

zfs set dedup=on tank/vm

zfs set dedup=on tank/src
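Because the property is inherited, child datasets pick up the setting of their nearest ancestor that has one. The Python sketch below models that lookup in a simplified way; the function name `effective_property`, the dictionary of local settings, and the child dataset names are hypothetical, and real ZFS inheritance has more cases than this.

```python
def effective_property(dataset: str, local_settings: dict, default: str = "off") -> str:
    """Return the value from the nearest ancestor (or the dataset itself)
    that has a locally set value, falling back to the default."""
    name = dataset
    while True:
        if name in local_settings:
            return local_settings[name]
        if "/" not in name:
            # Reached the pool's root dataset with nothing set locally.
            return default
        name = name.rsplit("/", 1)[0]  # step up to the parent dataset


# Local settings matching the three commands above.
settings = {"tank/home": "off", "tank/vm": "on", "tank/src": "on"}

# Hypothetical child datasets, used only to show the lookup.
for ds in ("tank/vm/guest1", "tank/home/alice", "tank/scratch"):
    print(ds, "->", effective_property(ds, settings))
```

In this model a dataset created under tank/vm is deduplicated without any further configuration, while one under tank/home is not.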

Trust or verify?

If you accept the mathematical claim that a secure hash like SHA256 has only a 2^-256 probability of producing the same output given two different inputs, then it is reasonable to assume that when two blocks have the same checksum, they are in fact the same block. You can trust the hash. An enormous amount of the world's commerce operates on this assumption, including your daily credit card transactions. However, if this makes you uneasy, that's OK: ZFS provides a 'verify' option that performs a full comparison of every incoming block with any alleged duplicate to ensure that they really are the same, and ZFS resolves the conflict if not. To enable this variant of dedup, just specify 'verify' instead of 'on':

zfs set dedup=verify tank

Selecting a checksum

Given the ability to detect hash collisions as described above, it is possible to use much weaker (but faster) hash functions in combination with the 'verify' option to provide faster dedup. ZFS offers this option for the fletcher4 checksum, which is quite fast:

zfs set dedup=fletcher4,verify tank

The tradeoff is that unlike SHA256, fletcher4 is not a pseudo-random hash function, and therefore cannot be trusted not to collide. It is therefore only suitable for dedup when combined with the 'verify' option, which detects and resolves hash collisions. On systems with a very high data ingest rate of largely duplicate data, this may provide better overall performance than a secure hash without collision verification.
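The idea of pairing a weak checksum with verification can be sketched as follows. This Python example is an illustration only: `weak_checksum` is a deliberately poor stand-in and is not fletcher4, the class name is hypothetical, and keeping colliding blocks side by side in a list is just one assumed way to resolve a collision, not a description of what ZFS does internally.

```python
def weak_checksum(data: bytes) -> int:
    """Deliberately weak stand-in checksum (NOT fletcher4): byte sum modulo 2**16."""
    return sum(data) % 65536


class VerifyingDedupStore:
    """Toy dedup store that never trusts a checksum match on its own."""

    def __init__(self):
        self.table = {}  # checksum -> list of [block, refcount]

    def write(self, data: bytes) -> str:
        bucket = self.table.setdefault(weak_checksum(data), [])
        for entry in bucket:
            # 'verify': compare the full contents, not just the checksum.
            if entry[0] == data:
                entry[1] += 1
                return "deduplicated"
        # Either a brand-new checksum or a collision with different contents.
        bucket.append([data, 1])
        return "stored"

    def stored_blocks(self) -> int:
        return sum(len(bucket) for bucket in self.table.values())
```

The blocks b"ab" and b"ba" have the same byte sum, so they collide under this checksum, yet the byte-for-byte comparison keeps them as two separate blocks; writing b"ab" a second time is recognized as a true duplicate.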

Unfortunately, because there are so many variables that affect performance, I cannot offer any absolute guidance on which is better. However, if you are willing to make the investment to experiment with different checksum/verify options on your data, the payoff may be substantial. Otherwise, just stick with the default provided by setting dedup=on; it's cryptographically strong and it's still pretty fast.

Scalability and performance

Most dedup solutions only work on a limited amount of data -- a handful of terabytes -- because they require their dedup tables to be resident in memory.

ZFS places no restrictions on your ability to dedup. You can dedup a petabyte if you're so inclined. The performance of ZFS dedup will follow the obvious trajectory: it will be fastest when the DDTs (dedup tables) fit in memory, a little slower when they spill over into the L2ARC, and much slower when they have to be read from disk. The topic of dedup performance could easily fill many blog entries -- and it will over time -- but the point I want to emphasize here is that there are no limits in ZFS dedup. ZFS dedup scales to any capacity on any platform, even a laptop; it just goes faster as you give it more hardware.

Acknowledgements

Bill Moore and I developed the first dedup prototype in two very intense days in December 2008. Mark Maybee and Matt Ahrens helped us navigate the interactions of this mostly-SPA code change with the ARC and DMU. Our initial prototype was quite primitive: it didn't support gang blocks, ditto blocks, out-of-space, and various other real-world conditions. However, it confirmed that the basic approach we'd been planning for several years was sound: namely, to use the 256-bit block checksums in ZFS as hash signatures for dedup.

Over the next several months Bill and I tag-teamed the work so that at least one of us could make forward progress while the other dealt with some random interrupt of the day.

As we approached the end game, Matt Ahrens and Adam Leventhal developed several optimizations for the ZAP to minimize DDT space consumption both on disk and in memory, key factors in dedup performance. George Wilson stepped in to help with, well, just about everything, as he always does.

For final code review George and I flew to Colorado where many folks generously lent their time and expertise: Mark Maybee, Neil Perrin, Lori Alt, Eric Taylor, and Tim Haley.

Our test team, led by Robin Guo, pounded on the code and made a couple of great finds -- which were actually latent bugs exposed by some new, tighter ASSERTs in the dedup code.

My family (Cathy, Andrew, David, and Galen) demonstrated enormous patience as the project became all-consuming for the last few months. On more than one occasion one of the kids has asked whether we can do something and then immediately followed their own question with, "Let me guess: after dedup is done."

Well, kids, dedup is done. We're going to have some fun now.

ZFS deduplication has been integrated

Really great news: the ZFS deduplication project has been integrated into the ON source base.

The funny thing is that I was having a conversation with a friend yesterday and said that I was willing to bet money on ZFS dedup going in within the next two weeks. I never expected it to happen that quickly!

Headless Sun xVM VirtualBox guests via SMF

I finally had some extra time this weekend to sit down and finish an SMF manifest to support headless VirtualBox guests. Overall, I am quite pleased with the result:

The source for the service manifest and method script is available here. These files should be copied to /var/svc/manifest/application/virtualbox/ and /lib/svc/method/, respectively.

I have also created an IPS package for OpenSolaris users; if you are interested in installing the package, you will need to add my private IPS repository first:
% pfexec pkg set-authority -O http://arf.ubound.org/pkg/ arf.ubound.org
Once the repository has been added, you can then install the virtualbox-headless package:
% pfexec pkg install virtualbox-headless
Once the files are in place, you may need to import the manifest manually; since there are no default instances, svc:/system/manifest-import:default has a tendency to skip the manifest during a bulk import (this issue seems to affect the IPS package as well):
% pfexec svccfg import /var/svc/manifest/application/virtualbox/virtualbox-headless.xml
Once you have imported the manifest, it is time to add each guest you wish to manage. I have written the manifest such that a guest is identified by the name of the instance itself. For example, if I wanted to add a guest named qnx641, I would issue the following:
% pfexec svccfg -s virtualbox/headless add qnx641
You may then enable the instance via svcadm(1M):
% pfexec svcadm enable virtualbox/headless:qnx641
Two properties are provided to control the start and stop behavior of each instance:
  1. The vbox/start_type property corresponds to the --type argument passed to VBoxManage startvm; by default, it is set to headless. Possible values are gui, sdl, vrdp, and headless; however, only the last two really make any sense when used with SMF.

  2. The vbox/stop_method property corresponds to the argument passed to VBoxManage controlvm which is responsible for stopping the instance; by default, it is set to savestate. Possible values are pause, resume, reset, poweroff, savestate, acpipowerbutton, and acpisleepbutton.
For those unfamiliar with SMF, these properties may be set by using this pattern:
% pfexec svccfg -s virtualbox/headless:qnx641
svc:/application/virtualbox/headless:qnx641> addpg vbox application
svc:/application/virtualbox/headless:qnx641> setprop vbox/stop_method = astring: "poweroff"
^D
Since this is a "wait" model service, it does not properly support a user shutting down the guest outside of svcadm(1M) invocations; the service will continue to report its status as online. In this case, a simple disable or restart will resolve the problem.

Enjoy!

Translating the OpenSolaris Bible: Japanese

There is an interesting discussion going on in the Japan OpenSolaris Community. Ken Okubo has been floating the idea of translating the OpenSolaris Bible into Japanese. The thread is getting long and it looks like there is some progress. Translating a book that is over a thousand pages long as a community project is a big deal. If you'd like to contribute, ping Ken on ug-jposug at opensolaris dot org. He's a good guy. Sign up to the JPOSUG list here.

Imagine how helpful it would be to get such a book localized into Japanese. It would be a fantastic community-building tool.

OpenSolaris Development Build 126

I updated to OpenSolaris developer build 126 today. Good so far. Go get it here. Remember: these are development builds. Read the release notes. File bugs. Get involved. Enjoy. Also, go here for a Free CD of the OpenSolaris 2009.06 product.

Setting up new Collectives on XWiki

I will be starting to set up new Collectives on XWiki later this week. For the past few months we've had a temporary moratorium on creating new infrastructure on the site due to the website transition. The interim period was way longer than we expected. Apologies for that. I'll clean out the queue this week in the order in which the requests came in over the past few months. Also, if you have been waiting to submit Collective proposals to your Community Groups for new Projects or User Groups, please feel free to move ahead now. The same applies for new Community Groups getting approval from the OGB. I only get involved in this process on the implementation end, and everything you need to know about that is documented here: Collective Life Cycle Process.

OSDevCon Images

Here are some images from OSDevCon in Germany last week. I grabbed them off of advocacy-discuss from Wolfgang, Karim, and Nicolas. And I see Teresa was taping the event, so watch the OSDevCon site for video (presos already there). I am really bummed I couldn't go this year. But I have been totally swamped (slightly overwhelmed, actually) and sick, and so the schedule just made it impossible. I am seriously cutting back this year. Need to get back to some sort of balance for my own sanity and health. Anyway, the conference looked very cool. I continue to be impressed with the OpenSolaris User Groups as they just go about the day-to-day business of building community.

Photos: here, here, here. osdevcon09 tag on Flickr here. If more crop up, I will update this post.

November 01, 2009

xxxxSolaris?

Here's something in the category of "things that make you go wha?!?": The OpenSolaris Security Summit has been renamed to simply the "Solaris" Security Summit.

If we've been looking for the first shot fired at OpenSolaris, this would seem to be it. The question is: what's next? When you combine this with the recent resurrection of "Solaris Next" (aka: Solaris 10++), it starts to suggest that something is in the works, undoubtedly orchestrated by Oracle.

Now, at this point I'm not jumping to any conclusions, and I don't think you should either. Oracle's intentions seem fairly clear at the moment and entirely positive for the future of Solaris and SPARC; and we know that X86 is also a part of that vision. Turning some love away from the OpenSolaris distro towards Solaris will be a welcome change for large enterprise customers, and undoubtedly a motivating factor.

My advice is to watch and wait... the wheels are turning.

If folks from Oracle/Sun are reading this: do what you wish with the Solaris product roadmap, but the community and source for Solaris are a critical part of a successful future. Please feel free to reassure us that we won't lose that. I personally rely on access to the source for problem analysis and research on a daily basis, and having access to Solaris developers, both badged and unbadged, is something I never want to be without again.

October 31, 2009

Halloween 2009

Happy Z-Day everyone!

Breaking with tradition somewhat, I'm not sure there are going to be any fireworks photos here this year: we're down at my parents' house, and are less likely to get the sort of sustained shelling that we normally experience in Raheny each year.

On the plus side, we had homemade pumpkin soup for lunch, so I was at least able to get this shot. If there are any fireworks later on, I'll update this post - but otherwise, here's some Halloween cheer:

The day's been great so far - lovely birthday presents (a Merino base-layer and a copy of Neverwhere from the lovely missus, and DVDs of the first two Ice Age films from E (I think she had an ulterior motive there!) and a nice fleece from my folks)

I also popped out for a birthday run around Greystones this afternoon, only 10k but it was enough for me to realise I'm far from being back in running form: the recovery is going to last a few more weeks I think!

My folks are baby-sitting tonight, so myself and the missus get to go out for a grown-up dinner, which I'm really looking forward to. Happy Halloween!

Update: There were fireworks after all, here's a few shots: fantastic, the tradition continues!

OpenSolaris at the Tokyo Open Source Conference

Here are some images from the Fall 2009 Tokyo Open Source Conference. The OpenSolaris community participated with presentations from Reiko Saito and Masafumi Ohta and a booth full of demos for the weekend event. There are some NetBeans and Linux guys mixed in here as well. There were dozens and dozens of communities there.

☞ Choice Drivers

  • It may be an advertising stunt, but the videos are great and it embodies the idea central to open source that people contribute readily to things they get a kick out of.
  • "But to suggest that taking ecstasy is less dangerous than horse-riding, or that cannabis is safer than alcohol and tobacco - however true that may be - is to say the unsayable in the political drugs debate" -- The UK has a government that would rather appear OK to the Daily Mail reader than actually do what's right according to the experts advising them. It's true in the case of drugs, and it's true in the case of the internet and downloads. Watching Labour erode its core of support as it desperately tries to win over the Conservatives' heartland.
  • Good list, although I disagree with a few of the choices, which seem to prioritise simplicity over safety (for example, there's no way I will use Empathy for IM without OTR).

☞ Competition

  • Carlo Piana (Europe's answer to Eben Moglen) once again delivers a clear analysis, this time showing how Amazon's announcement of hosted MySQL in the cloud punches a hole in Stallman's argument against the Oracle acquisition. Looking forward to hearing from Stallman both why Carlo is wrong and why dual-license is good for software freedom.
  • Finally Flickr has a serious competitor.
    (tags: Cat)