Sunday, March 21, 2010

a PowerShell provider for local Mercurial repositories

PowerShell and Providers

Anyone working on Windows should try PowerShell for command-line tasks and general administrative scripting. Comparisons with bash or zsh are instructive but ultimately beside the point, because PowerShell occupies a niche of its own: it is a closed-source shell tightly interwoven with the .NET platform and Microsoft products. (The level of marketing spin in its name is also unusual. Doesn't having "power" in your name give the impression that you're compensating for something?)

Yet its distinctiveness is combined with many borrowed ideas from *nix, one of which is the emulation of file-system interfaces for many kinds of data. In PowerShell these are "providers" that show up like disk drives (i.e. a name followed by a colon). "get-psprovider" produces a handy list that includes providers for aliases, the shell environment, variables, and the beloved (ha-ha) registry. The obvious advantage here is uniformity. Users can employ the same commands and pipelines regardless of the actual nature of the data source.

PowerShell's collection of providers is extensible. As someone who uses both PowerShell and Mercurial, I decided to write a read-only PowerShell provider for local Mercurial repositories. The Windows PowerShell 2.0 SDK contains plenty of samples and documentation for such projects. In fact, I got pretty far merely by lightly adapting the SDK's TemplateProvider and AccessDBSampleProvider files and following the instructions. The superclass implementations mostly sufficed.

Unlike an extension, this provider doesn't add new features to Mercurial itself, and even a thin translation layer costs something in speed and space, however little. Its real purpose is to act as "glue" between the data in a Mercurial repository and the versatile capabilities of PowerShell, so the practical payoff is whatever an inventive PowerShell user can imagine. Such a user could accomplish the same objective by composing "hg" commands directly and parsing their output, but going through this provider should be less work overall.

Usage

The project is at http://bitbucket.org/artvandalay/mercurialpsprovider/ . Installation consists of picking a directory listed in $env:PSModulePath, creating a "MercurialProvider" subdirectory in it, downloading the release DLL into that subdirectory, and finally entering the following (as usual, add this line to your profile to load the module automatically on startup):
Import-Module MercurialProvider
To make a "drive" that exposes a repository (recall throughout that PowerShell is case-insensitive):
new-psdrive -psprovider Mercurial -name DriveName -root PathToRepositoryRoot
One way to specify the path is to first run "cd RepositoryRoot" and then pass "(pwd)" as the root parameter. (You probably know this, but "cd" and "pwd" are built-in aliases for the cmdlets "Set-Location" and "Get-Location". I prefer my shell commands short and cryptic, so if I mention a command you don't recognize, try running it through "gi alias:Cmd", for instance "gi alias:gi".)

As with any provider, removing the drive is "remove-psdrive DriveName", and all it does is drop the drive name for the repository; the cmdlets for this provider never modify the Mercurial repository in any way. However, whenever the repository changes, perhaps through a regular hg commit or rebase, commands may return stale output because of the 10-changeset cache that the drive keeps to speed up operation significantly. This applies even to a simple commit to the default branch, which changes the changeset that "DriveName:\default" (the changeset identified by "default") refers to. The cache could be cleared by removing and recreating the drive, but it's less trouble to clear it with Clear-Item (alias "cli"):
cli DriveName:
The repository drive is a three-level hierarchy that includes the repository/drive "root" itself. The middle level holds named branches, and the bottom level holds changesets, so the longest allowed path looks like "DriveName:\NamedBranch\ChangesetIdentifier". The changeset identifier can take any form Mercurial accepts (revision number, short hash, or tag), because the provider ultimately passes it to the Mercurial command line as entered. If the path ends with a changeset identifier, the branch name portion is ignored, although for a quicker response it should still be a valid branch name such as "default".
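To make the path rules concrete, here is a minimal sketch in Python (not the provider's C# code) of how such a path could be split into its three levels and reduced to the single revision string handed to hg. The names split_repo_path and revision_for_hg are hypothetical, and the sketch assumes backslash-separated paths only.

```python
# Hypothetical helper, not the provider's actual code: splits a path of the
# form "DriveName:\NamedBranch\ChangesetIdentifier" into its three levels.
from typing import Optional, Tuple


def split_repo_path(path: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (drive, branch, changeset); absent levels come back as None."""
    drive, sep, rest = path.partition(":")
    if not sep:
        raise ValueError(f"not a drive-qualified path: {path!r}")
    parts = [p for p in rest.split("\\") if p]  # ignore empty segments
    if len(parts) > 2:
        raise ValueError(f"deeper than branch\\changeset: {path!r}")
    branch = parts[0] if parts else None
    changeset = parts[1] if len(parts) == 2 else None
    return drive, branch, changeset


def revision_for_hg(path: str) -> Optional[str]:
    """Choose the revision string to pass to hg unchanged.

    A changeset identifier wins and the branch segment is ignored; a bare
    branch path falls back to the branch name, which Mercurial resolves to
    that branch's tipmost head. The drive root yields None (no single rev).
    """
    _, branch, changeset = split_repo_path(path)
    return changeset if changeset is not None else branch


if __name__ == "__main__":
    print(split_repo_path("Repo:\\default\\42"))      # ('Repo', 'default', '42')
    print(revision_for_hg("Repo:\\default"))          # default
    print(revision_for_hg("Repo:\\whatever\\tip"))    # tip
```

Passing the identifier through untouched is what lets revision numbers, short hashes, and tags all work without the provider understanding any of them.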

Cmdlets that retrieve items or content should work fine. Each changeset item has the properties you'd expect from "hg log" output, such as revision number, description, and author. Each changeset's content is the patch that "hg di -c" prints. In PowerShell-speak, you could get 1) the names of all branches or 2) the "tipmost" changesets of the branches (PowerShell named parameters are case-insensitive and need only be long enough to be unambiguous, so -Name can be shortened to -n at the cost of readability):
1) ls DriveName: -n  
2) ls DriveName:
the 1) tipmost changeset of one branch or 2) that changeset's patch:
1) gi DriveName:\BranchName  
2) gc DriveName:\BranchName
and likewise for any known changeset:
1) gi DriveName:\AnyBranchName\ChangesetIdentifier   
2) gc DriveName:\AnyBranchName\ChangesetIdentifier
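The item/content split maps naturally onto two Mercurial invocations. A minimal sketch, assuming the provider starts one hg process per request: metadata for "gi" comes from "hg log" in XML style and the patch for "gc" from "hg diff -c". The options used (--repository, log -r, --style xml, diff -c) are real hg options; the function names and the exact argument order are illustrative assumptions.

```python
# Hypothetical sketch: the hg command lines that could back "gi" (item) and
# "gc" (content) for one changeset. The hg options are real; the function
# names and the one-process-per-request shape are assumptions.
import subprocess
from typing import List


def item_command(repo_root: str, rev: str) -> List[str]:
    """Metadata for one changeset, in the XML style the provider parses."""
    return ["hg", "--repository", repo_root, "log", "-r", rev, "--style", "xml"]


def content_command(repo_root: str, rev: str) -> List[str]:
    """The changeset's patch, i.e. what "hg di -c" prints."""
    return ["hg", "--repository", repo_root, "diff", "-c", rev]


def run_hg(args: List[str]) -> List[str]:
    """Run hg and return stdout as lines; raises CalledProcessError on failure."""
    out = subprocess.run(args, capture_output=True, text=True, check=True).stdout
    return out.splitlines()
```

With hg on the PATH, something like run_hg(content_command(r"C:\src\myrepo", "tip")) would return the patch lines for the tip changeset (the repository path here is made up).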
When obtaining a list of changeset items, use the PowerShell-standard "Filter" parameter to pass extra arguments to "hg log" where feasible; as PowerShell Help states, filtering is more efficient when done at the source rather than by a long pipeline of cmdlets. Also consider assigning the command's output to a PowerShell variable ("$myvar = ls [...]") so you can reuse the results without rerunning the command. For example, to get 1) all commits by Fred or 2) all commits by Barney to branch "dontuse" with a message containing "oops" (-r is short for "Recurse" and -fi is short for "Filter"):
1) ls DriveName: -r -fi "-u Fred"  
2) ls DriveName:\dontuse -fi "-u Barney -k oops"
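Here is a sketch of how a Filter string could end up on the hg command line, assuming it is split shell-style and appended verbatim after any branch restriction. -b/--branch, -u/--user and -k/--keyword are real "hg log" options; log_command is a hypothetical name, and the argument order is an assumption.

```python
# Hypothetical sketch: building the "hg log" call behind a listing such as
# ls DriveName:\dontuse -fi "-u Barney -k oops". -b/--branch, -u/--user and
# -k/--keyword are real hg log options; log_command itself is illustrative.
import shlex
from typing import List, Optional


def log_command(repo_root: str, branch: Optional[str] = None,
                filter_text: str = "") -> List[str]:
    args = ["hg", "--repository", repo_root, "log", "--style", "xml"]
    if branch:                       # listing under DriveName:\branch
        args += ["-b", branch]
    # The Filter string is split shell-style and passed through unchanged,
    # so hg (not PowerShell) does the filtering at the source.
    # Note: shlex uses POSIX rules, so backslashes act as escapes.
    args += shlex.split(filter_text)
    return args


if __name__ == "__main__":
    print(log_command("C:/repo", "dontuse", "-u Barney -k oops"))
```

Because hg does the selection, only matching changesets ever reach PowerShell, which is the efficiency argument the Help text makes.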

Innards

One of the lesser-publicized features of current Mercurial is an XML style for command-line output, requested with "--style xml" in the same way one requests "--style compact" or "--style changelog". Thanks to this XML output option and the .NET Framework's ability to run processes and slice 'n' dice XML (all hail XPathNavigator), reading changesets takes little code, and the rest of the provider is relatively uninteresting path-handling code lifted directly from AccessDBSampleProvider.
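For a feel of how little parsing the XML style needs, here is a sketch in Python that uses ElementTree where the provider uses .NET's XPathNavigator. SAMPLE is hand-written to approximate "hg log --style xml" output (real output can also carry tag and parent elements), and the ChangesetInfo fields are assumptions, not the provider's actual class.

```python
# Sketch only: Python's ElementTree stands in for .NET's XPathNavigator.
# SAMPLE is hand-written to approximate "hg log --style xml" output; the
# ChangesetInfo fields below are assumptions, not the provider's real class.
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

SAMPLE = """<?xml version="1.0"?>
<log>
<logentry revision="1" node="2222222222222222222222222222222222222222">
<branch>dontuse</branch>
<author email="barney@example.com">Barney</author>
<date>2010-03-20T10:00:00-05:00</date>
<msg xml:space="preserve">oops, wrong branch</msg>
</logentry>
<logentry revision="0" node="1111111111111111111111111111111111111111">
<author email="fred@example.com">Fred</author>
<date>2010-03-19T09:00:00-05:00</date>
<msg xml:space="preserve">initial commit</msg>
</logentry>
</log>"""


@dataclass
class ChangesetInfo:
    revision: int
    node: str
    author: str
    email: Optional[str]
    date: str
    branch: str
    description: str


def parse_hg_log_xml(text: str) -> List[ChangesetInfo]:
    """Turn the XML from 'hg log --style xml' into ChangesetInfo records."""
    changesets = []
    for entry in ET.fromstring(text).iter("logentry"):
        author = entry.find("author")
        changesets.append(ChangesetInfo(
            revision=int(entry.get("revision", "-1")),
            node=entry.get("node", ""),
            author=(author.text or "") if author is not None else "",
            email=author.get("email") if author is not None else None,
            date=entry.findtext("date", default=""),
            # the xml style omits <branch> for the default branch
            branch=entry.findtext("branch", default="default"),
            description=entry.findtext("msg", default=""),
        ))
    return changesets


if __name__ == "__main__":
    for cs in parse_hg_log_xml(SAMPLE):
        print(cs.revision, cs.branch, cs.author)
```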

A detail of provider development that surprised me is the sheer number of method calls a single PowerShell command triggers, particularly calls to "ItemExists". That is partly why a drive-internal cache, a queue of the most recently retrieved ChangesetInfo objects, is so vital to execution speed. With the cache, the provider only needs to run "hg" once per PowerShell command (assuming it's not one of the more complicated commands...). The cache also speeds up later commands that request the same changesets.
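A minimal sketch of such a cache, assuming entries are keyed by the changeset identifier exactly as typed and that the oldest-used entry is evicted once ten are held. The post only says "a queue of most-recently retrieved ChangesetInfo objects", so the OrderedDict-based LRU policy and the RecentCache name are assumptions.

```python
# Hypothetical sketch of the drive's changeset cache: the 10 most recently
# retrieved entries, keyed by the identifier as typed. LRU eviction via
# OrderedDict is an assumption; the post only says "a queue of most-recently
# retrieved ChangesetInfo objects".
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

V = TypeVar("V")


class RecentCache(Generic[V]):
    def __init__(self, fetch: Callable[[str], V], capacity: int = 10) -> None:
        self._fetch = fetch        # e.g. runs "hg log -r ID --style xml" and parses it
        self._capacity = capacity
        self._items: "OrderedDict[str, V]" = OrderedDict()
        self.misses = 0            # number of times fetch (i.e. hg) actually ran

    def get(self, key: str) -> V:
        if key in self._items:
            self._items.move_to_end(key)       # now the most recently used
            return self._items[key]
        self.misses += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)    # evict the least recently used
        return value

    def clear(self) -> None:
        """Roughly what 'cli DriveName:' is for: forget every entry so that
        identifiers such as 'default' are resolved again after a commit."""
        self._items.clear()


if __name__ == "__main__":
    cache = RecentCache(lambda rev: f"changeset {rev}")
    for _ in range(5):          # stands in for repeated ItemExists/GetItem calls
        cache.get("default")
    print(cache.misses)         # 1
```

Keying by the identifier as typed is also why such a cache goes stale: "default" keeps mapping to the old changeset until the entry is evicted or cleared.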
