
February 10, 2010

IBM 2010: Customers in Revolt

From my own experience, their salespeople are very aggressive, with an attitude of "sell first and let someone else worry later." While I always take vendor claims with a grain of salt, I have learnt to double- or even triple-check IBM's claims.

February 09, 2010

Check out latest development build of Web Stack

If you are using Cool Stack or Web Stack for your AMP (Apache, PHP and MySQL) stack needs on either Solaris 10 or Red Hat Enterprise Linux, then you might want to check out the latest development build update of Web Stack. This build delivers PHP 5.2.12, Apache 2.2.14, etc.

For more information on how to update to this build, please refer to my earlier blog post on this topic. If you run into any issues, please use our forum for troubleshooting tips.

Medialib (SUNWmlib) and Adobe flash plugin for Firefox

Hi All,

So I'm just playing with B132 on the Acer One 8.9".

Even though I downloaded the Flash plugin from Adobe and installed it to /usr/lib/firefox/plugins, it would not show up in Firefox.

I found out in the end that you need to have SUNWmlib installed for the Flash plugin to work.

I also tested the attachment in http://defect.opensolaris.org/bz/show_bug.cgi?id=14099; after a restart, both the Acer One 8.9" and the MSI Wind could run Compiz.

Tokyo2Point0 020810

I went to the Tokyo2Point0 event last night. There were 250 people there, so it was a packed house for sure. It was really nice to catch up with a bunch of people; I haven't been to one of these events in many months. Just been too busy. It was also good to see Michael Sullivan give a short talk on the OpenSolaris Bible Translation Project.

Power 7

Now let's wait for some benchmarks. I only wish Solaris ran on them as well; right now you have to go either the legacy AIX route or the not-so-mature Linux route, and neither is an ideal choice.

☞ Leaving a bad taste

Japan OpenSolaris Community at OSC

The OpenSolaris Community in Japan will participate at the Spring Tokyo Open Source Conference with three talks from Keiichi Oono, Kenichi Mizoguchi, and Masafumi Ohta on February 27th. See Ohta-san's announcement in Japanese and English.


Auth Update: Early

We had planned to update auth.opensolaris.org this week, but Alan and Martin finished this phase of the work early and deployed the upgrade last Friday. It's always cool to get something done, tested, and out the door early. This latest version of auth.opensolaris.org offers the following changes:
  • New public information screens displaying much more detail about user, collective, and governance relationships (these screens will be accessible via each XWiki Collective in the near future as well).
  • The ability to download the data from the public info screens in multiple formats.
  • New screens in each private user account displaying summary data from all the user's relationships with start and end dates.
  • The addition of eight languages (so Auth is now localized into 25 languages).
  • Some miscellaneous bug fixes and probably some stuff I missed.
Also, some of the elements on the auth.opensolaris.org page (headers and footers, basically) are now drawn via a new web service that has also been localized, so as we integrate all of the subsites with auth.opensolaris.org we'll start to layer a common look and feel across the entire site. This will take some time and come together in pieces, but the latest step is encouraging. Also, when the new SCM Console at repo.opensolaris.org is deployed, it will be localized as well (the first set of localizations is already done). Please note that all of these content localizations are contributions from the i18n/l10n community, so people from around the world are directly helping evolve the site. If the community didn't contribute this work, the site would be in one language: English. So, these contributions are huge. Here's how to contribute site localizations.

And finally, there has been a bit of confusion on some lists recently about how the community is organized and the various roles/rights people have on the site. If anyone has any questions, please read the Roles & Collectives document first. It's the only document on the site that explains all the roles and all the collectives and all of the website and governance privileges. Send questions to website-discuss.

Conspiracy Theory

Well, like many of you, I remember all the criticism Sun and the OpenSolaris project received at the start (the license, the company behind it, etc.). I don’t think I’m radical about open source software; I have used GPL, BSD, CDDL, and even proprietary software. I have my personal opinion about it, but I don’t [...]


New pkg(5) license functionality

For those of you that require license acceptance or display for your packages, you should be aware that starting with OpenSolaris development build b131, new functionality will be delivered to support license acceptance and display.

The following fixes and enhancements were integrated:

  • 5943 add 'must-accept' attribute to license actions
  • 5586 licenseinfo api needs to expose license action attributes
  • 13155 add 'must-display' attribute to license actions
  • 13158 change pkg.client.api to understand and require license acceptance
  • 13160 pkg(1) needs update for client api license acceptance changes 

These changes were made to support packages that require display or acceptance of license related data during package install and update operations.  As a result of these changes, the pkg(1) client and client API require explicit acceptance and/or indication of license display during install and update operations.

Please note that (currently) no packages exist that use this functionality, but they are expected in the near future.

Client Considerations

If any of the license actions contained in a package being installed or updated have must-accept=true, the following pkg(1) subcommands require that the new --accept option be provided before the operation will proceed:

  • install
  • fix
  • image-update
  • change-variant
  • change-facet

In addition, all of the above subcommands now also have a --licenses option to display the payload of all the licenses for packages that are part of the operation.  For example:

pkg install -n --licenses foo

The above command would display all of the licenses for the packages that would be installed or updated if the package 'foo' were installed or updated.  If the --accept option is not provided, and a license requires acceptance, the pkg(1) client will now exit with exit code 6, indicating license acceptance failure.  If a license requires display, the pkg(1) client will display it during install/update operations; this cannot be suppressed.
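The acceptance and display rules above fit in a few lines of logic. The following is a minimal Python sketch under stated assumptions. `LicenseAction` and `plan_licenses()` are hypothetical names, not part of pkg(1) or pkg.client.api. The sketch leaves out `-n` dry runs and the order in which pkg(1) prints anything. Only the exit code 6 comes from the post.

```python
from dataclasses import dataclass

EXIT_OK = 0
EXIT_LICENSE_FAILURE = 6  # exit code the post gives for license acceptance failure


@dataclass(frozen=True)
class LicenseAction:
    """Hypothetical stand-in for a license action's attributes."""
    license: str
    must_accept: bool = False
    must_display: bool = False


def plan_licenses(actions, accept=False, show_all=False):
    """Return (exit_code, names of licenses to display) for one operation.

    accept   -- models the --accept option
    show_all -- models the --licenses option
    """
    # Any must-accept license blocks the operation unless --accept was given.
    if not accept and any(a.must_accept for a in actions):
        return EXIT_LICENSE_FAILURE, []
    # must-display licenses are always shown; --licenses shows every license.
    shown = [a.license for a in actions if show_all or a.must_display]
    return EXIT_OK, shown


if __name__ == "__main__":
    plan = [
        LicenseAction("libc.copyright", must_display=True),
        LicenseAction("libc.license", must_accept=True),
    ]
    print(plan_licenses(plan))               # blocked: no --accept
    print(plan_licenses(plan, accept=True))  # proceeds, shows libc.copyright
```

The acceptance check runs before any display decision. This mirrors the post's statement that the operation will not proceed without --accept. The post does not say whether pkg(1) shows licenses before it fails, so the sketch does not model that.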

Client API Changes

Version 29:
Incompatible with clients using versions 0-28:
    The ImageInterface class has changed as follows:

  • set_plan_license_status() was added.  This is used to indicate whether licenses for the packages being operated on have been accepted or displayed.  Clients must do this if the related license requires acceptance or display.

    The LicenseInfo class has changed as follows:

  • get_text() may now trigger a remote retrieval of the license payload if needed to return the text.
  • The related package FMRI and license attributes are now properties: fmri, license, must_accept, and must_display.

    The PlanDescription class has changed as follows:

  • get_changes() is now a generator function.
  • get_licenses() was added to allow clients to retrieve the list of licenses related to the plan's operations as well as the current accepted and displayed status of each.  Please note that this function returns _all_ licenses related to the operation, not just those that require acceptance or display.

Publication Considerations

To use this new license acceptance functionality, simply add must-accept=true or must-display=true (as appropriate) to license actions in your package manifest.  An example pkgsend sequence might look like this:

open licensed@1.3,5.11-0
add depend type=require fmri=baz@1.0
add file /tmp/libc.so.1 mode=0555 owner=root group=bin path=/lib/libc.so.1
add license /tmp/libc.copyright license=libc.copyright must-display=True
add license /tmp/libc.license license=libc.license must-accept=True
close

Please note that this functionality is not supported before build 131, and that you should use this functionality sparingly.  must-accept=true should not be placed on the majority of open source licenses (BSD, GPL, etc.) and must-display should only be set if absolutely necessary. 
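You can parse the license lines of a pkgsend sequence like the one above to check which attributes a manifest actually sets. This Python sketch handles only the simple `key=value` form shown in the example, not the full pkg(5) action syntax. `parse_license_action()` is a hypothetical helper. The example writes `True` and the prose writes `true`, so the sketch compares case-insensitively. The post does not say how pkg(5) itself treats case.

```python
import shlex


def parse_license_action(line):
    """Parse a pkgsend-style 'add license <src> key=value ...' line (simplified)."""
    tokens = shlex.split(line)
    if tokens[:2] != ["add", "license"] or len(tokens) < 3:
        raise ValueError("not an 'add license' line: %r" % line)
    attrs = {}
    for tok in tokens[3:]:
        key, sep, value = tok.partition("=")
        if not sep:
            raise ValueError("malformed attribute: %r" % tok)
        attrs[key] = value
    return {
        "source": tokens[2],
        "license": attrs.get("license"),
        # Case-insensitive comparison: an assumption, see the lead-in.
        "must_accept": attrs.get("must-accept", "false").lower() == "true",
        "must_display": attrs.get("must-display", "false").lower() == "true",
    }


if __name__ == "__main__":
    print(parse_license_action(
        "add license /tmp/libc.license license=libc.license must-accept=True"))
```

The post advises using these attributes sparingly. A check like this could flag any manifest that sets must-accept on a license that does not need it.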

Comments or concerns should be sent to the pkg-discuss mailing list on opensolaris.org.

February 08, 2010

Simplifying Publisher Configuration

Changeset 1736 in the pkg(5) gate made the following changes:

  11522 pkg should require publisher prefix to match repository information
  7156 client image api needs image creation interface
  12744 update_publisher over-zealously testing publisher validity
  14203 image-create usage doesn't mention mirror / origin options

What changed?

The pkg(1) client and the pkg.client.api will now attempt to retrieve publisher configuration information from package repositories.  This information is used to auto-configure new publishers, validate configuration requests, and to update the configuration of existing publishers.

These changes were made with a focus on simplifying publisher addition and updates of existing publisher configuration.

Further improvements and additions to this functionality are planned for future releases.

Are there any pkg.depotd(1m) changes?

While the pkg.depotd(1m) program did not change, clients are now more reliant on its correct configuration.

As already mentioned in the past, please be certain that you set the publisher.prefix property found in the repository's cfg_cache file correctly.  The pkg.depotd man page contains instructions on how to set this property.

Please also note that the repository.origins and/or repository.mirrors properties should be set as clients will use these to automatically configure new publishers and to update existing publishers.  You might also want to consider providing a description so that users have an idea of what sort of packages the repository contains.

Part of an example cfg_cache might look like this:

[publisher]
prefix = example.com
alias = None
...
[repository]
origins = http://pkg-us.example.com/,http://pkg-ca.example.com/
mirrors = http://pkg-us-mirror1.example.com/,http://pkg-ca-mirror1.example.com/
...
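Clients now depend on these values, so it can help to check them before starting the depot. The excerpt looks INI-style, so this Python sketch reads it with the standard-library configparser. That is an assumption, since pkg.depotd may use its own parser. The sample also drops the `...` lines from the excerpt above. `read_publisher_config()` is a hypothetical helper.

```python
import configparser

# The cfg_cache excerpt from above, minus the elided "..." lines.
SAMPLE_CFG_CACHE = """\
[publisher]
prefix = example.com
alias = None

[repository]
origins = http://pkg-us.example.com/,http://pkg-ca.example.com/
mirrors = http://pkg-us-mirror1.example.com/,http://pkg-ca-mirror1.example.com/
"""


def read_publisher_config(text):
    """Extract the publisher prefix and the comma-separated repository URIs."""
    cp = configparser.ConfigParser()
    cp.read_string(text)

    def uris(option):
        raw = cp.get("repository", option, fallback="")
        return [u.strip() for u in raw.split(",") if u.strip()]

    return {
        "prefix": cp.get("publisher", "prefix"),
        "origins": uris("origins"),
        "mirrors": uris("mirrors"),
    }


if __name__ == "__main__":
    cfg = read_publisher_config(SAMPLE_CFG_CACHE)
    if cfg["prefix"] in ("", "None"):
        print("publisher.prefix is not set")
    print(cfg)
```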

How has pkg(1) image-create changed?

The client remains completely compatible syntax-wise, so no changes to existing scripts are required, and behaviour when using the older syntax (other than validating publisher configuration) remains unchanged.

However, image-create now also accepts -p <uri>, where <uri> is the URI of a package repository. Example usage is as follows:

pkg image-create -p <uri> /target

When using this syntax, image-create will retrieve all of the publisher configuration information from the target package repository and add all of the publishers it finds to the image.

How has pkg(1) set-publisher changed?

The syntax of the set-publisher command remains backwards compatible, so no changes to existing scripts, etc. are required.  However, set-publisher now accepts the -p option which accepts the URI of a package repository from which all publisher configuration information will be retrieved.

Any new publishers found in the retrieved configuration information will be added, while existing ones will be updated if the provided URI is already in the list of configured origins for the publisher being updated.

If a publisher name is provided, then -p will only use publisher configuration information that matches the provided name.

As an example, the old syntax to add the contrib repository might have been:

pkg set-publisher -g http://pkg.opensolaris.org/contrib \
  contrib.opensolaris.org

The new syntax is:

pkg set-publisher -p http://pkg.opensolaris.org/contrib
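The add/update rules for -p described above amount to a small merge function. This is a minimal Python sketch, not the pkg.client.api code. Publishers are modelled as plain dicts keyed by prefix, and `merge_publishers()` is a hypothetical name. The post does not say what happens to an existing publisher whose origins do not include the URI, so the sketch simply reports it as skipped.

```python
import copy


def merge_publishers(configured, retrieved, uri, name=None):
    """Merge publisher config retrieved from the repository at `uri`.

    configured, retrieved -- dicts: prefix -> {"origins": [...], ...}
    name -- optional publisher name; only matching config is used
    Returns (merged, added, updated, skipped); `configured` is not modified.
    """
    merged = copy.deepcopy(configured)
    added, updated, skipped = [], [], []
    for prefix, cfg in retrieved.items():
        if name is not None and prefix != name:
            continue
        if prefix not in merged:
            # New publishers are added.
            merged[prefix] = copy.deepcopy(cfg)
            added.append(prefix)
        elif uri in merged[prefix].get("origins", []):
            # Existing publishers are updated only if uri is already an origin.
            merged[prefix].update(copy.deepcopy(cfg))
            updated.append(prefix)
        else:
            skipped.append(prefix)
    return merged, added, updated, skipped
```

Working on a copy keeps the caller's configuration intact if something later in the operation fails. Whether the real client behaves that way is not stated in the post.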

How has pkg.client.api changed?

Incompatible with clients using API versions 0-31.

The ImageInterface class has changed as follows:

  • The add_publisher and update_publisher methods now validate the image's publisher configuration against the origins of the publisher.  If any of the origins are found to not match, an UnknownRepositoryPublishers exception will be raised. If one of the new repository origins does not provide publisher configuration information or it is incomplete, a RepoPubConfigUnavailable exception will be raised.

The pkg.client.api module has changed as follows:

  • A new method named image_create has been added.  See 'pydoc pkg.client.api' for details.


The pkg.client.api_errors module has changed as follows:

  • UnknownRepositoryPublishers, RepoPubConfigUnavailable, and UnknownErrors exceptions have been added for use by the pkg.client.api.  API consumers are reminded that they should catch all ApiException class exceptions, although catching specific exception subclasses for case-by-case handling in addition to that is acceptable.
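In practice, the catch-order advice above means handling specific subclasses first and ending with the ApiException catch-all. The exception classes in this sketch are local stand-ins that reuse the names from the post. They are not imported from pkg.client.api_errors, and their constructors are not meant to match the real ones. `try_add_publisher()` is hypothetical, and `UnknownErrors` is not modelled.

```python
class ApiException(Exception):
    """Stand-in for pkg.client.api_errors.ApiException (not the real class)."""


class UnknownRepositoryPublishers(ApiException):
    """Stand-in: a repository origin does not match the publisher."""


class RepoPubConfigUnavailable(ApiException):
    """Stand-in: the repository provided no (or incomplete) publisher config."""


def try_add_publisher(add_publisher, *args, **kwargs):
    """Call an add_publisher-like callable and classify any API error."""
    try:
        add_publisher(*args, **kwargs)
    except UnknownRepositoryPublishers:
        return "origin does not match publisher configuration"
    except RepoPubConfigUnavailable:
        return "repository did not provide publisher configuration"
    except ApiException as e:
        # Catch-all for every other API error, as the post recommends.
        return "other API error: %s" % e
    return "ok"
```

Exceptions outside the ApiException hierarchy, such as a plain ValueError, still propagate, so genuine bugs are not silently swallowed.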


Feedback is welcomed on the pkg-discuss mailing list on opensolaris.org.

Have You Read the Release Notes?

I can't count how many times I've read in the various OpenSolaris forums "Read the release notes."

It's true, the OpenSolaris release notes are chock full of good information. However, where are these mysterious release notes?

The release notes are posted by David Comay to the osol-announce and indiana-discuss mailing lists. However, you have to filter through all the other traffic on those lists to find them.

I've thought about setting up a wiki pointing to them, but that's just one more thing I'd forget to maintain. Instead, here's a quick Google search that seems to do the trick. If you can think of a better way to customize it, please let me know.

☞ Community Matters

OpenSolaris: My Original Pre Launch Email in 2005

Earlier today I was thinking about the original "good luck" email I sent to the OpenSolaris Pilot Community just before we opened the project in June of 2005. Fortunately, the opensolaris-discuss public archive actually goes back 9 months before we launched, so this mail survives in the open and from the other threads you get a glimpse into some of the very earliest conversations taking place when the project was private. Anyway, what strikes me is how different the situation was back then, how utterly conservative we were, and how my thinking has changed as a result of my experiences all along the way. A day after I sent this email, we opened. See my opening blog here, and the result of that opening announcement here. History. Always enlightening.

[osol-discuss] Good Luck and Thank You

Jim Grisanzio Jim.Grisanzio at Sun.COM
Mon Jun 13 17:27:01 PDT 2005

Hello, OpenSource Pilot Community.

I just wanted to chime in before the fur really flies around here:

 Good Luck, and Thank You!

You all deserve Sun's thanks for your efforts and your patience this 
year. It should be wild day tomorrow, for sure, so light up those blogs 
and start talking, guys. The engineers are leading this launch tomorrow, 
make no mistake about it.

Oh, and if you want to bring someone into the program, you *don't* have 
to call me and sign another f****** NDA. Just do it. I can't tell you 
how happy I am to not have to dig out another NDA. Not that I could read 
the damn thing but whatever. It's such a cold way to start a friendly 
little conversation, don't you think? Also, I've tried to honor as many 
of your requests (and those from internal people) as possible to get 
people into the program. We ended up with 145, but quite frankly, dozens 
and dozens of developers never made it in due to lack of time or 
resources. We even had a dozen Chinese engineers all briefed, 
translated, and NDA-signed but couldn't get export control approval in 
time. It drove me nuts for three months. I'm more than a bit pissed 
about that one.

Anyway, I hope you are happy with the results of what we are all 
releasing. The core team here has worked almost non-stop for weeks on 
this to get ready for the final push. We wanted to do more, you know 
that, but hey, look at where we were last year and look at the potential 
tomorrow brings. Also, the OpenSolaris team internally really has been 
genuine in their intentions, I can assure you. At times we've not been 
as open as we could have been -- we get that -- but I hope you believe 
me when I say that many people on the team fought hard on your behalf 
all year long. Every time you told us we were full of shit on something 
we took it to heart and it went up line. There were a few, ah, heated, 
conversations regarding some of the issues that were discussed in the 
pilot. We won some and we lost some, but every time we moved a little 
closer to our goal of openness. As you've seen, this stuff takes time. I 
wish we could have exposed more of that process to you. Next time it 
will probably be easier to do that.

As this program has grown it's garnered attention from all across Sun 
and from Sun's competitors and supporters. Just recently, I've heard 
from executives and engineers traveling to South America and to Asia, 
and they report that there *absolutely* is massive community interest 
out there. Even Wall Street has noticed. Some people are probably a bit 
confused since the Solaris community was supposed to be dead by now. 
Well, too bad. It's too late. They lost their window of opportunity to 
crush us. Our next step is to stay positive and to engage the interest 
we know is there, make it tangible, and grow this OpenSolaris community.

In a very real way, you've all been part of something special here. 
You've helped change this company and potentially an entire market along 
the way. Some people may not know this quite yet, but they'll surely 
find out tomorrow. You are some of the most knowledgeable people in the 
world about Solaris, and you've help make OpenSolaris a possibility. 
Congratulations and we'll see you on the other side.

Jim

February 07, 2010

☞ Worrying Trend

☞ The Advance of Open

Win the War, Write the History

It matters greatly who wins the war because the winners write the history, and they rarely, if ever, characterize events accurately. That's what makes history fun. It's a puzzle and it's always changing. In this case I'm talking about Caesar, who, starting in 58 B.C., destroyed the Celts in Gaul (France), killed and enslaved millions, took the gold, propagandized the history, and went on to rule Rome as dictator. Nice guy. That is, if you like vicious dudes running psychotic military dictatorships. But whatever. The point is that the Romans won, so their view of things survived throughout the ages. But I'm more interested in what was lost. What did the Romans conveniently leave out of their history?

For that, check out The Primitive Celts, an entertaining and fascinating look at the Celts, whom the Romans dismissed as mere barbarians. But were they? It seems some archaeologists are discovering that the Celts actually had a highly developed society, with the most advanced calendar of the time and a sophisticated economy based on a variety of trades. They mined gold all across Europe, and they built a vast network of roads to facilitate international trade. Generally, the contrast with Rome was nearly total. Where the Celts decentralized things into a web-like, community-like structure, the Romans centralized them into a rigid hierarchy. And that proved a critical and fatal difference, at least in ancient times. Centralization won. Big time, actually.

But I wonder if that distinction remains true today. What's the better concept around which to build a society in 2010? And, more importantly, who wins the war when these differences collide for whatever reason? Surely the world today is substantially different from when the Romans were wrecking the place two thousand years ago, but would their systems prevail today? You can look at this from the perspective of a country or a company or even a project. It's just the management of resources to achieve a goal. Nothing more. But my question is: which is better, and who wins now?

February 06, 2010

Nexenta = flying an F29 with a Wii remote and other highlights of the last few days

Just since Wednesday of this week here are a few things that have happened:

  • Three new partners have joined Nexenta from different parts of the world
  • A major central European partner — one of the largest server resellers in Europe — has just about completed work for their public launch at CEBIT as a Nexenta Certified partner in early March; a significant print ad campaign will be kicked off at CEBIT as well featuring NexentaStor
  • One of the world’s top technology companies took delivery of a HA Cluster with attached JBODs from PogoLinux and reached out to us and Pogo to say this is great and they’d like us to talk to their corporate headquarters about a larger relationship.  Corporate is speaking to us Monday about how we can work together to satisfy their ‘infinite demand’ for storage.
  • Another of our name brand customers has purchased more systems in Europe, China and India.  Our partner Inprove in the Netherlands is working with this customer in Europe.  Learn more via Twitter update here.
  • One of the largest server makers in the world suggested that perhaps they could start a line of pre-certified NexentaStor appliances.   Certification could start as soon as early March.  We are working with them to define which predefined configurations make the most sense.
  • Ongoing progress with a couple of NexentaStor.org ‘plug-in’ developers including one who has a really interesting enclosure and disk monitoring solution that we expect to be available shortly.
  • We believe we were the first storage company to check in code with Citrix / Xen to support the new StorageLink disaster recovery capabilities.  We will officially announce this after Citrix / Xen review.  Learn more about the NexentaStor StorageLink plug-in here.
  • A 192TB system for part of the Dutch government went live.  A photo is available via this Tweet.
  • An evaluator said that Nexenta, thanks to the use of an easy interface plus the Solaris kernel, is like “flying a F29 with a Wii remote.”  See tweet here.
  • 154 new registered users since Wednesday midnight
  • Several press references including:
    • Interview with Evan on why ZFS is all about community, community, community Here
    • An interview from Linux Magazine on Nexenta.org and CEBIT was published Here
  • Another few investors pinged us to see about investing.  Since we do not need money, we look to be a good investment ;)
  • FOSDEM (Free and OpenSource Developer’s European Meeting) kicked off in Brussels.  Sorry to miss it this year.  But StormOS, based on NCP, is there with free disks that include NCP with pNFS.  Find Andy, get free code!  See his Tweet here.
  • Registration for our first community oriented ZFS and NexentaStor training started.  This will be held in the famous Atlanta Athletic Club.  For those golf nuts out there, this was the home course of Bobby Jones.  Click here to learn more and register.

And, yes, lots of work.  For example, several new systems were installed for our automated test and certification solution, approximately 60 customers and countless prospects have had their questions answered, and there has been significant recruiting work on our immediate priorities, including inside sales, sales engineering, software development, and support.

Hopefully this behind the scenes look into life at a fast growing start-up is of interest to some of you out there.

B132 rge fix MSI Wind

Hi All,

So, to fix the B132 rge PCIe problems on the MSI Wind, you need to copy the B130 /kernel/drv/rge somewhere on the system after the pkg image-update. On the first boot after the B132 update, copy the stored rge into /kernel/drv and reboot; the errors should go away.

For some reason, Xorg on my system is trying to load the 64-bit libglx.so, which stops Compiz from running. You can just link /usr/lib/xorg/modules/extensions/GL/libglx.so back to /usr/lib/xorg/modules/extensions/libglx.so.

What's pissing me off is that during a restart the link gets set back to 64/libglx.so; I'm not sure what's doing this yet.

The USB SD card reader is not working now either, and CPU 1 is still running at 100% after a resume.

Dave


February 05, 2010

Building latest PostgreSQL on OpenSolaris

I am moving my PostgreSQL-on-OpenSolaris related entries to a new external blog, since this work is not part of my $dayjob anymore. I hope you update your bookmarks too.

Read "Building latest PostgreSQL CVS Head on OpenSolaris".


OpenSolaris Rocks Serbia

Here is a nice example from Serbia demonstrating the value of building a local OpenSolaris community. It can lead to some very interesting organizations paying very close attention to what you are doing. Congrats, guys! Some of the OpenSolaris User Groups are doing some really interesting work out there, and they are contributing to the overall community in a very big way.

Data Corruption - ZFS saves the day, again

We came across an interesting issue with data corruption that I think some of you might find useful. While preparing a new cluster deployment and filling it up with data, we suddenly started to see the messages below:

XXX cl_runtime: [ID 856360 kern.warning] WARNING: QUORUM_GENERIC: quorum_read_keys error:
Reading the registration keys failed on quorum device /dev/did/rdsk/d7s2 with error 22.

The d7 quorum device was marked offline and we could not bring it online again. There isn't much in the documentation about this message, except that it is probably a firmware problem on the disk array and that we should contact the vendor. But let's first investigate what is really going on.

By looking at the source code, I found that the above message is printed from within quorum_device_generic_impl::quorum_read_keys(), and it will only happen if quorum_pgre_key_read() returns 22. Strictly, it happens for any value other than 0 or EACCES, but the syslog message already suggests that the return code is 22.

quorum_pgre_key_read() calls quorum_scsi_sector_read() and passes its return code on as its own. quorum_scsi_sector_read() will return an error only if quorum_ioctl_with_retries() returns an error or if one of the data checks fails (a checksum mismatch or a PGRE ID mismatch).

This is the relevant source code:

int
quorum_scsi_sector_read(
[...]
    error = quorum_ioctl_with_retries(vnode_ptr, USCSICMD, (intptr_t)&ucmd,
        &retval);
    if (error != 0) {
        CMM_TRACE(("quorum_scsi_sector_read: ioctl USCSICMD "
            "returned error (%d).\n", error));
        kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
        return (error);
    }

    //
    // Calculate and compare the checksum if check_data is true.
    // Also, validate the pgres_id string at the beg of the sector.
    //
    if (check_data) {
        PGRE_CALCCHKSUM(chksum, sector, iptr);

        // Compare the checksum.
        if (PGRE_GETCHKSUM(sector) != chksum) {
            CMM_TRACE(("quorum_scsi_sector_read: "
                "checksum mismatch.\n"));
            kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
            return (EINVAL);
        }

        //
        // Validate the PGRE string at the beg of the sector.
        // It should contain PGRE_ID_LEAD_STRING[1|2].
        //
        if ((os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING1,
            strlen(PGRE_ID_LEAD_STRING1)) != 0) &&
            (os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING2,
            strlen(PGRE_ID_LEAD_STRING2)) != 0)) {
            CMM_TRACE(("quorum_scsi_sector_read: pgre id "
                "mismatch. The sector id is %s.\n",
                sector->pgres_id));
            kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
            return (EINVAL);
        }

    }
    kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);

    return (error);
}
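Both data checks in the excerpt above return the same EINVAL (22), so the return code alone cannot distinguish a checksum failure from a bad PGRE ID. Only the CMM_TRACE() message differs. The Python sketch below models that flow under loud assumptions. The post does not show the real PGRE_CALCCHKSUM algorithm, the sector layout, or the PGRE_ID_LEAD_STRING1/2 values. The sketch therefore uses an XOR of 32-bit words, puts the checksum in the last four bytes, and uses made-up lead strings.

```python
import errno
import struct

SECTOR_SIZE = 512  # assumption: one 512-byte sector
# Hypothetical stand-ins for PGRE_ID_LEAD_STRING1/2 (real values not shown).
LEAD_STRINGS = (b"PGRE_ID_1", b"PGRE_ID_2")


def calc_checksum(body):
    """Assumed checksum: XOR of big-endian 32-bit words (NOT PGRE_CALCCHKSUM)."""
    chk = 0
    for (word,) in struct.iter_unpack(">I", body):
        chk ^= word
    return chk


def make_sector(lead):
    """Build a test sector: lead string, zero padding, trailing checksum."""
    body = lead.ljust(SECTOR_SIZE - 4, b"\0")
    return body + struct.pack(">I", calc_checksum(body))


def validate_sector(sector, check_data=True):
    """Return (errno, trace message) for the data checks after a good ioctl."""
    if not check_data:
        return 0, None
    body = sector[:-4]
    (stored,) = struct.unpack(">I", sector[-4:])
    if calc_checksum(body) != stored:           # checked first, as in the C++
        return errno.EINVAL, "checksum mismatch"
    if not any(sector.startswith(s) for s in LEAD_STRINGS):
        return errno.EINVAL, "pgre id mismatch"
    return 0, None
```

On Solaris, as on Linux, EINVAL is 22, which matches the error in the syslog message. Only the message string tells the two failures apart, which is why the trace text is worth capturing.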

With a simple DTrace script I could verify if the quorum_scsi_sector_read() does indeed return with 22 and also I could print what else is going on within the function:

56 -> __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555744942019 enter
56 -> __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555744957176 enter
56 <- __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555745089857 rc: 0
56 -> __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745108310 enter
56 -> __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745120941 enter
56 -> __1cCosHsprintf6FpcpkcE_v_ 6308555745134231 enter
56 <- __1cCosHsprintf6FpcpkcE_v_ 6308555745148729 rc: 2890607504684
56 <- __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745162898 rc: 1886718112
56 <- __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745175529 rc: 1886718112
56 <- __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555745188599 rc: 22

From the above output we know that quorum_ioctl_with_retries() returns 0, so the EINVAL must come from one of the two data checks: a checksum mismatch or a PGRE ID mismatch. CMM_TRACE() is being called above (the dbg_print_buf::dbprintf calls), and there are only three CMM_TRACE() calls in the function, so let's check with DTrace which one it is:

21 -> __1cNdbg_print_bufIdbprintf6MpcE_v_ 6309628794339298 quorum_scsi_sector_read: checksum mismatch.
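The narrowing-down step can also be done mechanically over the return lines of such a trace. This Python sketch assumes the line format of the output shown earlier: CPU, `<-`, mangled symbol, timestamp, then an `rc:` value. `return_codes()` and `diagnose()` are hypothetical helpers. The embedded sample is an abbreviated copy of that trace.

```python
import re

# Return lines look like:
#   56 <- __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555745089857 rc: 0
RETURN_LINE = re.compile(r"^\s*\d+\s+<-\s+(\S+)\s+\d+\s+rc:\s+(-?\d+)\s*$")

# Abbreviated copy of the trace from the post (entry lines are ignored).
SAMPLE_TRACE = """\
56 -> __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555744942019 enter
56 <- __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555745089857 rc: 0
56 <- __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555745188599 rc: 22
"""


def return_codes(trace, names):
    """Map each plain function name to the last rc logged for a symbol containing it."""
    rcs = {}
    for line in trace.splitlines():
        m = RETURN_LINE.match(line)
        if not m:
            continue
        symbol, rc = m.group(1), int(m.group(2))
        for name in names:
            if name in symbol:  # the mangled names embed the plain name
                rcs[name] = rc
    return rcs


def diagnose(rcs):
    """Apply the post's reasoning to the two return codes."""
    ioctl_rc = rcs.get("quorum_ioctl_with_retries")
    read_rc = rcs.get("quorum_scsi_sector_read")
    if read_rc in (None, 0):
        return "no sector read failure traced"
    if ioctl_rc is None:
        return "inconclusive: ioctl return not traced"
    if ioctl_rc != 0:
        return "ioctl failed"
    return "data check failed (checksum or PGRE id mismatch)"


if __name__ == "__main__":
    rcs = return_codes(SAMPLE_TRACE,
                       ["quorum_ioctl_with_retries", "quorum_scsi_sector_read"])
    print(rcs)
    print(diagnose(rcs))
```

Run on the sample, the script reports an ioctl rc of 0 and a sector-read rc of 22 and classifies the failure as a data-check failure. It cannot say which check failed, which is why the second DTrace run on the CMM_TRACE() text was needed.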

So now I knew exactly which part of the code was causing the quorum device to be marked offline. The issue might have been caused by many things: a bug in the disk array firmware, a problem in the SAN, a bug in an HBA's firmware, a bug in the qlc driver, a bug in the SC software, or something else. However, because the issue suggests data corruption, and we are loading the cluster with a copy of a database, we might have a bigger issue than just an offline quorum device.

The configuration is such that we are using ZFS to mirror between two disk arrays. We had been restoring a couple of TBs of data into the pool and had read almost nothing back. Thankfully it is ZFS, so we could force a re-check of all data in the pool, and I did. ZFS found 14 corrupted blocks and even identified which file was affected. The interesting thing here is that for all of these blocks, both copies on both sides of the mirror were affected. This almost eliminates the possibility of a firmware problem on the disk arrays and suggests that the issue was caused by something misbehaving on the host itself. There is still a possibility of an issue in the SAN as well. It is very unlikely to be a bug in ZFS, as the corruption also affected the reservation keys, which have basically nothing to do with ZFS at all.

Since then we have kept writing more and more data into the pool, and I keep repeating scrubs, but I'm not getting any new corrupted blocks, nor is the quorum device misbehaving. I fixed it by temporarily adding another quorum device, removing the original and re-adding it, and then removing the temporary one.

While I still have to find out what caused the data corruption, the most important thing here is ZFS. Just think about it: what would have happened if we were running on any other file system, like UFS, VxFS, ext3, ext4, JFS, XFS, ...? Well, almost anything could have happened. Some data could have been corrupted, some files lost, or the system could have crashed. fsck could have been forced to run for many hours and still not been able to fix the file system, and it definitely wouldn't have been able to detect any data corruption within files. Or everything would run fine for days or months, and then the system would suddenly panic, etc., when an application tried to access the corrupted blocks for the first time. Thanks to ZFS, what actually happened? All corrupted blocks were identified. Unfortunately, both mirrored copies were affected, so ZFS can't fix them, but it did identify the single file affected by all of these blocks. We can just remove the file, which is only 2GB, and restore it again. And all of this happened while the system was running; we didn't even have to stop the restore or start from the beginning. Most importantly, there is no uncertainty about the state of the file system or the data within it.

The other important conclusion is that DTrace is a sysadmin's best friend :)


Building latest PostgreSQL CVS Head on OpenSolaris

With the talk about PostgreSQL 9.0 alpha 5, I thought it was time for me to try another CVS head build on OpenSolaris. This time it was on my home desktop, which runs OpenSolaris 2009.06.
First I wanted to download the CVS head; the instructions are on the wiki page. However, before following them I needed to install CVS on my OpenSolaris instance:

# pkg install SUNWcvs

Then, using the instructions, I created a copy of the CVS repository and created my own project workspace.

I already had the Sun compilers on my setup. If you don't, or if you have an old copy, you can always install or upgrade them with:
# pkg install sunstudio
So I started my task of building the new binaries on OpenSolaris. I found it to be a bit bumpy.
Here is my configure command with my standard options:
$ ./configure --prefix=$HOME/project CFLAGS="-xO3 -xarch=native -xspace -W0,-Lt -W2,-Rcond_elim -Xa  -xildoff -xc99=none -xCC" --without-readline

However, I hit my first problem when running make: the build needs GNU make, which was not installed on my desktop.
Back to the package manager:

# pkg install SUNWgmake

I continued with make, which proceeded and then eventually stopped again due to a missing bison. (I wonder why the configure script did not catch that?)

# pkg install SUNWbison

Now that bison was installed, I ran make again. Strangely, it failed again.
I figured it still did not find bison. I had to rerun ./configure, and after that gmake picked up bison.

However, it failed again:

gmake[3]: Entering directory `/export/home/postgres/project/pgsql.project/src/backend/parser'
/usr/bin/bison -d  -o gram.c gram.y
gmake[3]: *** [gram.c] Broken Pipe
gmake[3]: Leaving directory `/export/home/postgres/project/pgsql.project/src/backend/parser'
gmake[2]: *** [parser/gram.h] Error 2

This one was not easy to solve. I thought bison was probably buggy and was about to give up. Then I decided to give truss a shot to figure out what was happening:

$ truss -f /usr/bin/bison -d -o gram.c gram.y 2> /tmp/bisontruss.txt

Going through bisontruss.txt, I found:

10451:  fcntl(6, F_DUP2FD, 0x00000001)                  = 1
10451:  close(6)                                        = 0
10451:  execve("/usr/sfw/bin/gm4", 0x08047D00, 0x08047DA0) Err#2 ENOENT

I had no clue what gm4 does, but it was clearly missing.
I used the search feature of the OpenSolaris package manager to see if there was a package that delivers it:

# pkg search gm4
INDEX      ACTION    VALUE                     PACKAGE
.....
basename   link      usr/sfw/bin/gm4           pkg:/SUNWgm4@1.4.2-0.111

As they say on TV: yep, there's a pkg for that.

# pkg install SUNWgm4
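gm4 is GNU m4, which bison runs to expand its parser skeleton files, so bison fails here even though bison itself is installed. Spotting a failed exec like this in a long truss log can be scripted. A minimal sketch, assuming truss output in the format shown above; the function name missing_from_truss is hypothetical:

```shell
#!/bin/sh
# Print the unique paths that traced processes tried to use but that did
# not exist, from truss output lines such as:
#   10451:  execve("/usr/sfw/bin/gm4", 0x08047D00, 0x08047DA0) Err#2 ENOENT
# Usage: missing_from_truss /tmp/bisontruss.txt
missing_from_truss() {
    # 1. keep only calls that failed with ENOENT
    # 2. keep only calls whose first argument is a quoted path
    # 3. strip each line down to that path, then de-duplicate
    grep 'Err#2 ENOENT' "$1" |
        grep '("' |
        sed 's/^[^"]*"\([^"]*\)".*/\1/' |
        sort -u
}
```

Expect some noise: the runtime linker and libraries typically probe paths that legitimately do not exist, so a failed execve() like the gm4 one is usually the most telling entry.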

And now back to gmake.

Darn, another package missing:
gmake[3]: Entering directory `/export/home/postgres/project/pgsql.project/src/backend/bootstrap'
***
ERROR: `flex' is missing on your system. It is needed to create the
file `bootscanner.c'. You can either get flex from a GNU mirror site
or download an official distribution of PostgreSQL, which contains
pre-packaged flex output.
***
# pkg search -r flex
INDEX      ACTION    VALUE                     PACKAGE
....
basename   link      usr/sfw/bin/flex          pkg:/SUNWflexlex@2.5.33-0.111

# pkg install SUNWflexlex

(Of course, gmake won't use it until you rerun the configure statement.)

Finally, after a long time (it sure seemed to take a long time), gmake seemed to hang while compiling preproc.c:

"preproc.y", line 13548: warning: line number in #line directive must be less than or equal to 32767

This is just not my day.
I cleaned up the build. At this point I realized I had forgotten to enable DTrace probes in my configure statement, and since I am using a 64-bit kernel, I also decided to build 64-bit binaries. Retrying with the slightly modified configure still did not solve the loop on preproc.y, which just ran at 100% CPU. I left it running for a long while (until my patience ran out) to see if it would finish. It did not.

I finally updated my sunstudio binaries and retried gmake. This time gmake finished successfully. (Hence I changed my wording above to say install/upgrade sunstudio.)

After that, a quick gmake install:

$ gmake install

and the binaries are ready.
A quick check:
$ initdb
$ pg_ctl start -l server.log
$ psql postgres
psql (8.5devel)
Type "help" for help.

postgres=#

For a second I was expecting to see psql (9.0devel), but this is still acceptable. :-)

In summary, you want to do the following before building the latest PostgreSQL source on OpenSolaris:

# pkg install SUNWcvs sunstudio SUNWgmake SUNWbison SUNWgm4 SUNWflexlex

and then run the configure script:
$ ./configure --prefix=$HOME/project CFLAGS="-m64 -xO3  -xarch=native -xspace -W0,-Lt -W2,-Rcond_elim -Xa  -xildoff -xc99=none -xCC" --without-readline --enable-dtrace DTRACEFLAGS="-64"

And with gmake I am ready to use the latest build of PostgreSQL on OpenSolaris.
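Since bison, gm4 and flex only turned up as missing partway through gmake, a quick pre-flight check before running configure can save a few rebuild cycles. A minimal sketch, assuming the tool locations seen in this build (/usr/bin/bison, /usr/sfw/bin/gm4, /usr/sfw/bin/flex); the function name check_tools is hypothetical:

```shell
#!/bin/sh
# Hypothetical pre-flight check for the build tools used above.
# Each argument is either a command name (looked up on $PATH) or an
# absolute path (checked for an executable file). Prints the missing
# entries, space-separated, and returns non-zero if anything is missing.
check_tools() {
    missing=""
    for tool in "$@"; do
        case "$tool" in
            /*) [ -x "$tool" ] && continue ;;
            *)  command -v "$tool" >/dev/null 2>&1 && continue ;;
        esac
        missing="$missing $tool"
    done
    echo "${missing# }"
    [ -z "$missing" ]
}

# Example use before ./configure (paths as seen in this build):
# check_tools cvs gmake /usr/bin/bison /usr/sfw/bin/gm4 /usr/sfw/bin/flex ||
#     echo "install the missing packages first" >&2
```

Absolute paths are checked directly because bison execs /usr/sfw/bin/gm4 by full path, so having some other m4 on $PATH would not help.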


Building International Communities in Tokyo

Here are two really nice articles in the Japan Times talking about the international tech community in Tokyo:
The articles describe the meta community here, and that's where we OpenSolaris guys hang out. By contributing to the larger community, we've found that the OpenSolaris community here is growing and earning its way right alongside everyone else. There are language and culture barriers to overcome, but we are all making a great deal of progress. It's quite common now to find OpenSolaris developers, administrators, and users participating in multiple international communities, which, of course, helps us to learn in return. And the Web 2.0 community is growing in size and diversity as well. Also, since the local tech community is well connected globally, we can extend our reach around the world just by interacting right here at home. Here's my photo archive as well (mostly Linux & OpenSolaris).

February 04, 2010

Jonathan Says Goodbye via Twitter Haiku

The message was simple:

Today's my last day at Sun. I'll miss it. Seems only fitting to end on a #haiku. Financial crisis/Stalled too many customers/CEO no more

Please post your thoughts on Jonathan's leaving. It's a mixed emotion... on one hand he set some great goals and put a fire under things. A lot of us believed in him. And yet, he failed to execute and ultimately was responsible for Sun's demise. Could someone else have done a better job and still kept the culture alive? I honestly don't know.

I'll continue to stay neutral on the subject and reserve judgment until the behind-the-scenes stories trickle out over the next months and years. Jonathan screwed up, yes, but I think that Jonathan also got screwed himself, more than we realize. Time will tell.

In other news, Oracle is finally doing what has needed to be done for years: Oracle to Revamp Sun Supply Chain. One of the biggest customer complaints for years has been the inability to get timely delivery of systems. It's good to see signs of that era ending.

Also, Project Darkstar & Kenai are being axed. Project Kenai, a SourceForge-like project hosting service provided free by Sun, will close its doors on April 2nd 2010. You have until then to get stuff out. One of the most important projects there, Immutable Service Containers (ISC), has moved to OpenSolaris.org.

VB 3.1.4_BETA2 available

http://download.virtualbox.org/virtualbox/3.1.4_BETA2/

B131 with Ubuntu 9.10 and Windows 7

Hi All,

So I've been working on a Samsung NC20, trying to get a solid base running with Ubuntu 9.10 and an AT&T 3G USB modem.

I was creating the best hardware platform for a JavaFX project to run on; I'm sending the Samsung to the JavaFX team today.

I was left with the AT&T 3G USB modem, so I thought, why not install Ubuntu on the MSI Wind.

It already had Windows 7 and OpenSolaris B131 and a 40GB FAT32 partition between the two OSes, so I decided to install Ubuntu 9.10 there.

Now the MSI will boot Windows 7, Ubuntu 9.10 or OpenSolaris B131.

I may even try to get the AT&T modem working under OpenSolaris.

☞ More on H.264

Missing audio packages

I have learned that at least two packages, SUNWaudioemu10k and SUNWaudiosolo, are not part of the "standard" ("entire"?) install of OpenSolaris b131. If you're looking for either of these, you should do "pfexec pkg install SUNWaudioemu10k" or "pfexec pkg install SUNWaudiosolo".

Hopefully we'll get this sorted out before the next official release.

Update: Apparently (according to the expert I talked to) this problem only affects systems updating with pkg image-update. If you install a fresh system, the audio packages should be installed.

Scalability FUD

Yesterday I saw yet another round of the Linux vs. Solaris scalability debate. The Linux fans were loudly proclaiming that the claim of Solaris' superior scalability is FUD, citing evidence like the Cray XT class of systems, which use thousands of processors in a system running Linux.

The problem with comparing (or even considering!) the systems in the Top500 supercomputers when talking about "scalability" is simply that those systems are irrelevant for the typical "scalability" debate -- at least as it pertains to operating system kernels.

Irrelevant?! Yes. Irrelevant. Let me explain.

First, one must consider the typical environment and problems that are dealt with in the HPC arena. In HPC (High Performance Computing), the scientific problems considered are usually fully compute bound. That is to say, they spend the vast majority of their time in "user" and only a minuscule amount of time in "sys". I'd expect to find very, very few calls to inter-thread synchronization (like mutex locking) in such applications.

Second, these systems are used by users who are willing (and expect, and often need) to write custom software to deal with highly parallel architectures. The software deployed into these environments is tuned for situations where the synchronization cost between processors is expected to be "relatively" high. Granted, the architectures still attempt to minimize such costs, using very highly optimized message passing busses and the like.

Third, many of these systems (most? all?) are based on systems that don't actually run a single system image. There is not a single universal addressable memory space visible to all processors: at least not without high NUMA costs requiring special programming to deliver good performance, and frequently not at all. In many ways, these systems can be considered "clusters" of compute nodes around a highly optimized network. Certainly, programming systems like the XT5 is likely to be similar in many respects to programming software for clusters using more traditional network interconnects. An extreme example of this kind of software is SETI@home, where the interconnect (the global Internet) can be extremely slow compared to compute power.

So why does any of this matter?

It matters because most traditional software is designed without NUMA-specific optimizations, or even cluster-specific optimizations. More traditional software used in commercial applications like databases, web servers, business logic systems, or even servers for MMORPGs spend a much larger percentage of their time in the kernel, either performing some fashion of I/O or inter-thread communication (including synchronization like mutex locks and such.)

Consider a massive non-clustered database. (Note that these days many databases are designed for clustered operation.) In this situation, there will be some kind of central coordinator for locking and table access, and such, plus a vast number of I/O operations to storage, and a vast number of hits against common memory. These kinds of systems spend a lot more time doing work in the operating system kernel. This situation is going to exercise the kernel a lot more fully, and give a much truer picture of "kernel scalability" -- at least as the arguments are made by the folks arguing for or against Solaris or Linux superiority.
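A back-of-the-envelope way to see why time spent in the kernel and on lock synchronization matters so much is Amdahl's law (the post itself doesn't invoke it). As a rough sketch, assume a fraction s of a workload's single-processor run time is serialized, for example while threads queue on a contended lock, and the rest parallelizes perfectly across N processors. The values s = 0.001 and s = 0.05 below are illustrative, not measurements:

```latex
S(N) = \frac{1}{s + \dfrac{1 - s}{N}}

S(64)\big|_{s = 0.001} = \frac{1}{0.001 + 0.999/64} \approx \frac{1}{0.01661} \approx 60.2

S(64)\big|_{s = 0.05} = \frac{1}{0.05 + 0.95/64} \approx \frac{1}{0.06484} \approx 15.4
```

Under these assumptions, a nearly lock-free, compute-bound HPC code keeps most of the benefit of 64 processors, while a workload serialized just 5% of the time gets roughly a quarter of it, which is the intuition behind treating lock-heavy commercial workloads as the more demanding test of kernel scalability.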

Solaris aficionados claim it is more scalable in handling workloads of this nature: that a single SMP system image supporting traditional programming approaches (e.g. a single monolithic process made up of many threads) will experience better scalability on a Solaris system than on a Linux system.

I've not measured it, so I can't say for sure. But having been in both kernels (and many others), I can say that the visual evidence from reading the code is that Solaris seems like it ought to scale better in this respect than any of the other commonly available free operating systems. If you don't believe me, measure it -- and post your results online. It would be wonderful to have some quantitative data here.

Linux supporters, please, please stop pointing at the Top500 as evidence for Linux claims of superior scalability, though. If there are some more traditional commercial kinds of single-system deployments that can support your claim, then let's hear about them!

Recovering an OpenSolaris system

Before I decided to blog this, I first searched to check whether other blogs discuss the techniques required to recover a system. It turns out sriram's blog talks about using the beadm technique, but here is a slightly more elaborate version.

Our local admin was trying to upgrade the AI server to a newer build. The system was installed with OpenSolaris 2009.06 and he had published a number of install services. However, a simple pkg image-update failed miserably, and so did pkg install SUNWipkg. A quick investigation revealed that someone had added an older 2008.11 version of SUNWipkg (pkg verify is your friend). Normally an upgrade would still have been possible, but the user had added the SVR4 version of SUNWipkg, so the system was not upgradable. A reinstall was not the answer he wanted to hear.

So here's what we did:

1. Created a new BE from the install snapshot and activated it.

2. Rebooted to the new BE and mounted the old one.

3. Ran pkg image-update on the old BE.

4. Upon successful completion, activated the old BE and rebooted it.

These are the commands; all of them were run as root. The name of the original BE was opensolaris.

1. beadm create -e opensolaris@install opensolaris-1

2. beadm activate opensolaris-1; reboot

3. (after logging in) beadm mount opensolaris /mnt

4. pkg -R /mnt install SUNWipkg

5. pkg -R /mnt image-update

6. beadm activate opensolaris; reboot





Over Chasm

The waiting is over, the deal is done. Sun + Oracle = Oracle. I am glad I made it across, but it is sad to see old friends depart. The move to Oracle will definitely require adapting to new ways, which at times will be challenging. But for me it's a milestone in my life. I will be a bit nostalgic about the past, but it's time to move on, look at the future and hope it's bright.

February 03, 2010

Linux & Solaris

What's the Future of Linux and Solaris at Oracle?: Larry Ellison: "We've been in the open source business a very long time. We've been a distributor of Apache and we have our own version of Linux  ... We have no problems having both Linux and Solaris and we want to make them both better ... I'm a Linux fan and if you want Linux we have the best Linux in the world. If you want UNIX, we have the best UNIX in the world."

Works for me. I already use both systems and participate in both communities. 

Web Server 7 Update 8 addresses critical Security Issues

Sun recently published a Sun Alert for newly discovered security vulnerabilities in Sun Web Server and immediately released updates to the Web Server 7 and Web Server 6.1 release trains to address them.

If you are running Sun Java System / iPlanet / Sun ONE Web Server, we strongly urge you to upgrade to this latest update.