I Work On Software: namespace


Tuesday, March 3, 2009

Importing DTD to XSD

Update: Using the trick below to handle the "lang" attribute has been nothing but a headache. BizTalk apparently understands that http://www.w3.org/XML/1998/namespace is a special namespace, and that the xml: prefix is reserved for it, but that doesn't stop a BizTalk map from generating something like xmlns:ns0="http://www.w3.org/XML/1998/namespace" ns0:lang="EN". What I ended up doing was creating a new attribute named "lang" in the local namespace and using a custom pipeline component on the send side to tack the "xml:" prefix onto it as it goes out the door. I'm told that using a text editor to add the "xml:" namespace declaration to the schema that uses the xml:lang attribute works, but the declaration gets erased the next time you open and save the schema in the BizTalk schema editor, so I don't consider that a workaround.
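The renaming that the send-side pipeline component performs can be sketched outside BizTalk. The Python below only illustrates the transformation; it is not the component itself (which would be .NET code), and the function name and the sample Document element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The namespace that the reserved "xml" prefix always maps to.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def promote_lang_attribute(xml_text: str) -> str:
    """Move a local "lang" attribute on the root element to xml:lang."""
    root = ET.fromstring(xml_text)
    value = root.attrib.pop("lang", None)
    if value is not None:
        # Setting the namespace-qualified name makes the serializer emit
        # the reserved "xml:" prefix.
        root.set(f"{{{XML_NS}}}lang", value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical outgoing message with the locally named attribute.
outgoing = promote_lang_attribute('<Document lang="EN"><Body /></Document>')
print(outgoing)
```

Python's ElementTree treats the "xml" prefix as built in, so the result carries xml:lang without the extra ns0 declaration that the BizTalk map produced.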

I just spent a few hours fumbling around with the DTD -> XSD import tool included in the BizTalk Visual Studio tools, and I thought I'd share my experience. There are a number of small, helpful tips here regarding the tool itself, as well as manipulating schemas and playing around with namespaces (one important one in particular).

The tool is available if you right-click a BizTalk project in Solution Explorer, choose Add> Add Generated Items... and use the Generate Schemas wizard. There are three options shown: DTD Schema, XDR Schema, and Well-Formed XML. Two of these tools, DTD and Well-Formed XML, are not available right off the bat - you need to run a couple of script files to install them.

Before we even get there though, there's an important hotfix here that you must install. The library that contains the DTD converter is completely broken out of the box and needs to be replaced with this hotfix before running the script file to install it.

Once you've obtained and run the hotfix, navigate to %programfiles%\Microsoft BizTalk Server 2006\SDK\Utilities\Schema Generator and run the two .vbs scripts there, InstallDTD and InstallWFX. These scripts will copy the DLLs in that folder (one of which was just updated by the hotfix) to the appropriate location where they can be used by Visual Studio. You may need to restart Visual Studio after running the scripts.

Head back to Add> Add Generated Items...> Generate Schemas and feed the DTD -> XSD tool a DTD schema. What you'll get is a big jumbly mess of node definitions, all at the root level. See my post on root_reference and displayroot_reference for more information about this and how to "fix" it.

So now I've got my DTD imported, but I've got one more problem: The root node on my schema, "Document", has an attribute that uses the "xml" namespace prefix. This isn't represented in the DTD at all, so it's understandable that it isn't reflected in the XSD schema, but the sample documents I have all show that the "lang" attribute uses the "xml" namespace prefix. My schema validates just fine, but trying to validate my sample documents fails.

The most important thing to know here is that the "xml" namespace is a special case - XML parsers should universally understand that the "xml" prefix is implicitly reserved to resolve to the namespace "http://www.w3.org/XML/1998/namespace". Defined in this namespace are a couple of attributes, one of them being "lang". So if this namespace is supposed to be implicitly understood, why is schema validation getting hung up on it?
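The implicit binding is easy to see with any namespace-aware parser. As a minimal sketch (using Python's standard library rather than BizTalk, with a made-up sample document), an undeclared xml: prefix parses fine while any other undeclared prefix is rejected:

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Note that there is no xmlns:xml declaration anywhere in this document.
sample = '<Document xml:lang="EN">Hello</Document>'

root = ET.fromstring(sample)
# The parser resolves the reserved prefix to the special namespace by itself.
lang = root.get(f"{{{XML_NS}}}lang")
print(root.attrib)

# Any other undeclared prefix is a well-formedness error.
try:
    ET.fromstring('<Document foo:lang="EN" />')
    unbound_prefix_accepted = True
except ET.ParseError:
    unbound_prefix_accepted = False
print(unbound_prefix_accepted)
```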

The reason is that BizTalk and its tools understand the namespace, but they don't inherently know what's contained in it. Our schema needs to reference another schema that defines the types in the http://www.w3.org/XML/1998/namespace namespace. If I were to take one of my sample documents and import it using the Well-Formed XML -> XSD generator (a great tool, but be careful - it can only define the nodes present in the particular sample document you use), I would get a second schema, referenced in the first, that defines the attributes available in http://www.w3.org/XML/1998/namespace. The Well-Formed XML -> XSD generator knows about this namespace, and knows it needs a schema that defines those types. Unfortunately, the way it resolves the problem really isn't the best way of going about it - if you deploy the project as-is, you'll get a warning that a schema is already deployed that defines types in http://www.w3.org/XML/1998/namespace.

The BizTalk product team has already defined a schema that contains the types in the http://www.w3.org/XML/1998/namespace namespace. The schema is called "BTS.xml", and it is located in the Microsoft.BizTalk.GlobalPropertySchemas assembly, which by default is referenced in every BizTalk project and deployed to the BizTalk.System application. To reference the schema, open your document schema, click the "Schema" root node, and in the Properties pane, click the Imports property. This will surface the hidden ellipsis button - click this to open the Imports dialog. Select "XSD Import" in the dropdown and click Add to open the BizTalk Type Picker. In the Type Picker, select References> Microsoft.BizTalk.GlobalPropertySchemas> Schemas> BTS.xml and click OK. BTS.xml will be added as an XSD import with a default prefix like "ns0". This is fine, but it is more appropriate to change the prefix to "xml", which is specifically reserved for this namespace (note: by using the prefix "xml", no new prefix/namespace declaration will appear in the root xs:schema node, since the "xml" prefix is implicitly understood. Using any other prefix will cause a new namespace declaration to appear).

The reference has been added, but there's still one more thing to fix: the schema still thinks that the "lang" attribute is part of its own namespace, not http://www.w3.org/XML/1998/namespace. To fix this, close the schema and re-open it using the XML editor. Scroll down to where the "lang" attribute is defined and replace the entire attribute definition with the following: <xs:attribute ref="xml:lang" use="required" />. You can change the value of the "use" attribute depending on your needs, but the key is that you are using "ref" instead of "name", and you have specified the "xml" prefix.
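For orientation, here is roughly what the edited part of a schema looks like. The schema below is a trimmed-down, hypothetical stand-in (a real BizTalk schema carries a target namespace, annotations and the full element tree); it is embedded in a small Python script only so that the fragment can be checked for well-formedness.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical minimal schema: only the import of the xml namespace and the
# xml:lang reference matter here. No schemaLocation is given because the real
# location of BTS.xml is resolved by BizTalk, not by this sketch.
SCHEMA = f"""
<xs:schema xmlns:xs="{XS_NS}">
  <xs:import namespace="{XML_NS}" />
  <xs:element name="Document">
    <xs:complexType>
      <xs:attribute ref="xml:lang" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(SCHEMA)
attribute = schema.find(f".//{{{XS_NS}}}attribute")
# The attribute is referenced ("ref"), not declared locally ("name").
print(attribute.get("ref"), attribute.get("use"), attribute.get("name"))
```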

In the XML editor, you will get a blue underline with the message that the attribute is not defined, but this is because the editor can't get to the schema since it's referenced in a remote assembly. Now, if I validate a sample document against the schema, it works perfectly. When I deploy this project to BizTalk, it will automatically reference the BTS.xml schema that has already been deployed.

Posted by nw at 10:46 AM

Labels: BizTalk, dtd, import, namespace, schema, xml, xsd

Wednesday, August 1, 2007

On BizTalk, Assemblies and Deployment, pt. 1

This is the first article in a multi-part series about assemblies and how they relate to BizTalk. This part covers what you need to know about assemblies in general in order to understand what they do and how to deploy them properly. I don't endeavor to cover every detail of .NET assemblies and how they are put together, but this hits most of the big topics. It's a bit long, so feel free to just skim.

The concept of assemblies and the GAC can be very confusing to a BizTalk developer who hasn't interacted with these ideas in .NET before. Fortunately, if you don't want to spend the time, there's no need to have a complete, detailed understanding of what assemblies are and how they work. By grasping a few simple ideas, working with assemblies in BizTalk becomes very easy, and gives you great insight into how BizTalk and .NET actually work with assemblies.

To get a good overall idea of what an assembly is, Wikipedia has a fairly descriptive and high-level article about it: http://en.wikipedia.org/wiki/.net_assembly.

Here's the short-short version: an assembly contains your compiled code, resources, and some metadata about that code and those resources. That's about it. Typically, a "project" in Visual Studio (which can consist of multiple code files, resources, references to other assemblies, etc.) equates to an assembly - a single .dll or .exe file. When you build that project it compiles your code and puts all of those other resources together in a big pile and stuffs it into a .dll or .exe. Additionally, it creates an "index" of machine-readable metadata that is incredibly useful - it contains all the information about what callable methods are in that code, it powers Visual Studio's IntelliSense functionality, and generally provides enough information to let everything outside of the assembly know what it does - just not how exactly it does it.

Whenever you create any kind of application, all of the code and resources used to run it are in one or more .dll or .exe files. If you make a lot of little one-off "toolbelt" applications that are contained in a single Visual Studio project, that .exe that you get after you build has all of the code in it. If you were to spread the code over multiple projects and add the appropriate references within those projects, then when you ran the .exe and did something in your application that required code in one of those assemblies, the assembly it needs must be in the right place or the application will throw an exception. What is "the right place" for an assembly? I'll discuss that momentarily.

The next important bit about assemblies is how they are named. A name might seem trivial, but it contains a lot of information and is a guaranteed unique identifier for the assembly. The full name of an assembly has four parts: The short name, the version, the culture, and the public key token.

The short name is typically the name of the assembly without the file extension, e.g. NW.Applications.MyAssembly. This short name is also typically used as the namespace of all of the modules in the assembly - a namespace is simply something that helps to uniquely identify a module. There's lots of code in the universe, and it's highly likely that for every module, there's another module out there with the same name - let's take a hypothetical module called NumberCruncher. The namespace is like a surname that gets tacked onto it, usually containing a company name or a product name, that guarantees that those two modules with the same name are still unique - NumberCruncher in the NW.Applications.MyAssembly namespace is different from the NumberCruncher module in the ABC.Software.EnterpriseApps namespace. For this reason, many assembly names are like my sample one above: multiple tokens separated with dots, each token representing some kind of arbitrary hierarchy that's unique to my company or organization.
The version number is presented like so: 1.0.4.0. The names for each of those values are "major version," "minor version," "build" and "revision." Major version is typically used to signify the product version: Is this SuperToolbox 2 or 3? Minor version represents things like service packs or patches. Build number is typically incremented by the developer every time the assembly is built - this number is often represented by four digits (filling in zeroes if necessary) because builds happen thousands of times. The last number can be used for things like hot fixes. Some developers may set the last two numbers based on the date and time the build was completed. Here's the important bit about versions: Two versions of the same assembly can co-exist within the GAC (keep reading), and are unique! Note that an assembly has an "assembly version" and a "file version." The one that really counts here is the assembly version. These two version numbers don't have to be the same: one way to manage version numbers is to only increment the file version while you are developing. This helps avoid confusion with version numbers when you keep redeploying your code for testing - the assembly version always stays the same, but the file version (which does not make an assembly unique, but can still be viewed) can be used to determine exactly what build you are using.
Culture provides information about the language that the assembly is presented in (human language, not programming language). This will typically be "neutral."
Public key token: a 16-character hexadecimal string derived from the public half of a public-key cryptography pair (it is a shortened hash of the public key, not the key itself). I won't discuss the details of how public-key cryptography works here, but the short and long of it is that only people who have the private half of the key can generate assemblies that carry that token. This token essentially ensures that the assembly has come from a certain author and is guaranteed authentic. An assembly does not have to be given a public key token. If it has one, it can be called a "strongly-named assembly." Assigning a strong name is done in Visual Studio using a .snk file, which can be generated with the sn.exe command-line tool (sn -k) from a Visual Studio command prompt. An assembly must be strongly-named to be placed in the GAC.
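Put together, the four parts form the full name in the familiar "Name, Version=..., Culture=..., PublicKeyToken=..." form. As a rough sketch in Python (the helper name is made up, and the token in the example is a placeholder rather than a real key token), the full name can be split back into its parts like this:

```python
from typing import NamedTuple


class AssemblyName(NamedTuple):
    short_name: str
    version: tuple  # (major, minor, build, revision)
    culture: str
    public_key_token: str


def parse_full_name(full_name: str) -> AssemblyName:
    """Split "Name, Version=..., Culture=..., PublicKeyToken=..." into parts."""
    short_name, *pairs = [part.strip() for part in full_name.split(",")]
    fields = dict(pair.split("=", 1) for pair in pairs)
    version = tuple(int(number) for number in fields["Version"].split("."))
    return AssemblyName(
        short_name, version, fields["Culture"], fields["PublicKeyToken"]
    )


# The token below is a made-up placeholder, not a real key token.
name = parse_full_name(
    "NW.Applications.MyAssembly, Version=1.0.4.0, "
    "Culture=neutral, PublicKeyToken=0123456789abcdef"
)
major, minor, build, revision = name.version
print(name)
```

Two full names are the same assembly only if all four parts match, which is why the parsed result is a single comparable tuple.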

So... what's the GAC? The GAC is the Global Assembly Cache, a universal repository of strong-named assemblies on your machine. Any assembly placed here is globally accessible and can be referenced easily and shared by multiple applications. An assembly can co-exist here with other assemblies that have the same short name, so long as they have different versions. Try browsing to C:\Windows\Assembly (the Assembly folder in your Windows install directory, wherever that might be): what you will see isn't actually a folder on your disk, but a specially-crafted view of all the assemblies in the GAC. There's a folder structure under there, but it's generally irrelevant to human beings. Try right-clicking on a few assemblies and click Properties to see interesting info about them.
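One way to picture the co-existence rule is to treat the GAC as a lookup table keyed by the full four-part name. This is a toy model for illustration only (the real GAC is a managed folder structure, and the function, paths and token here are invented):

```python
# A toy stand-in for the GAC: entries are keyed by the full identity.
gac = {}


def gac_install(short_name, version, culture, token, path):
    """Register an assembly under its four-part identity."""
    gac[(short_name, version, culture, token)] = path


TOKEN = "0123456789abcdef"  # made-up placeholder token
NAME = "NW.Applications.MyAssembly"

gac_install(NAME, "1.0.4.0", "neutral", TOKEN, "v1/MyAssembly.dll")
gac_install(NAME, "2.0.0.0", "neutral", TOKEN, "v2/MyAssembly.dll")
# Installing the same identity again (as a forced reinstall would) replaces
# the existing entry instead of adding a third one.
gac_install(NAME, "1.0.4.0", "neutral", TOKEN, "v1-rebuilt/MyAssembly.dll")

versions = sorted(key[1] for key in gac)
print(versions)
```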

This is where I get to the part about where assemblies need to be placed so they can be used. The part of .NET that finds assemblies when it needs them is called Fusion. If you are running an application that has references to other assemblies and it needs one of those other assemblies, Fusion kicks in and looks for assemblies in the following places in this order (this is direct from the .NET Assembly article on Wikipedia):

If the assembly is strongly named it will first look in the GAC (your app knows if the assembly is strongly named because it captures this information from the assembly when you add a reference to it in your project).
Fusion will then look for redirection information in the application's configuration file. If the library is strongly named then this can specify that another version should be loaded, or it can specify an absolute address of a folder on the local hard disk, or the URL of a file on a web server. If the library is not strongly named, then the configuration file can specify a subfolder beneath the application folder to be used in the search path.
Fusion will then look for the assembly in the application folder with either the extension .exe or .dll.
Fusion will look for a subfolder with the same name as the short name (PE file name) of the assembly and then looks for the assembly in that folder with either the extension .exe or .dll.
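The search order in the steps above can be sketched as a small lookup function. This is only a model of the list as quoted, not how Fusion is actually implemented; the function name, the "GAC:" marker and the in-memory sets standing in for the GAC and the file system are all assumptions.

```python
def probe(short_name, strongly_named, gac_names, files,
          app_dir="app", config_path=None):
    """Return the first location where the assembly is found, or None."""
    # Step 1: strongly named assemblies are looked up in the GAC first.
    if strongly_named and short_name in gac_names:
        return f"GAC:{short_name}"
    # Step 2: a location supplied by the application's configuration file.
    if config_path is not None and config_path in files:
        return config_path
    # Step 3: the application folder; step 4: a subfolder named after
    # the assembly's short name.
    for folder in (app_dir, f"{app_dir}/{short_name}"):
        for extension in (".dll", ".exe"):
            candidate = f"{folder}/{short_name}{extension}"
            if candidate in files:
                return candidate
    return None


NAME = "NW.Applications.MyAssembly"
files = {f"app/{NAME}/{NAME}.dll"}

found_private = probe(NAME, False, set(), files)
found_gac = probe(NAME, True, {NAME}, files)
missing = probe("Other.Assembly", False, set(), files)
print(found_private, found_gac, missing)
```

When nothing is found (the last call), the real runtime raises an exception at the point where the assembly is needed, which is the failure described earlier.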

So, as you can see, you can essentially put an assembly anywhere as long as you configure your application properly. However, for ease of use and universal understanding, most people will either GAC their assemblies or put them in the application folder. Ah, I almost forgot to mention how to GAC an assembly: use gacutil.exe (it should already be on the path in a Visual Studio Tools command prompt; on my machine it's located at C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin\gacutil.exe) with the /i switch and the path to the assembly to GAC. You can use /if to "force" the installation, for example if the assembly already exists and you are reinstalling.
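If you script your deployments, the same call can be wrapped in a few lines. The sketch below assumes gacutil.exe is on the path (as in a Visual Studio command prompt); the helper names and the assembly path are hypothetical, and only the /i and /if switches mentioned above are used.

```python
import subprocess


def gacutil_command(assembly_path, force=False, gacutil="gacutil.exe"):
    """Build the gacutil argument list: /i installs, /if forces a reinstall."""
    switch = "/if" if force else "/i"
    return [gacutil, switch, assembly_path]


def install_to_gac(assembly_path, force=False):
    """Run gacutil; raises CalledProcessError if it reports a failure."""
    return subprocess.run(gacutil_command(assembly_path, force), check=True)


# Hypothetical build output path; nothing is executed here, the command
# is only assembled and printed.
command = gacutil_command(r"C:\Builds\NW.Applications.MyAssembly.dll", force=True)
print(command)
```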

That's it for the lecture on assemblies. Next post I'll talk about how assemblies and the GAC work with BizTalk - how to deploy and redeploy assemblies, what needs to be GACed and what doesn't, where to put things, etc.

Posted by nw at 8:25 AM

Labels: .NET, assemblies, assembly, deploy, GAC, gacutil, namespace, version
