deleted 70 characters in body
Jamal

This is intended to be part of a generalised solution to the problem of converting any (with some minor restrictions) CSV content into XML. The restrictions on the CSV and the purpose of the schema should be apparent from the annotations.

  1. Is it suitable for non-destructive round-trip transformations from .csv to .xml and back again to .csv?
  2. Is the schema clear and readable enough?
  3. Is there a simpler way to do the same thing?
  4. Are there any obvious errors?

Once polished, this schema and its associated XSLT style-sheets will be put to good use in the public domain under a Creative Commons license.

Use cases

Case 1

Lines ending in CR LF, including the last line.

The CSV:

Case 2

As case 1, but with line endings as just LF.

Case 3

Lines ending in CR LF, including the last line.

Case 4

Same as case 3, but the last line ends at EOF rather than with a line terminator. In other words, the last byte of the file is the UTF-8 code for 'w'.

Same XML!

Case 5

Empty file. The size of the file is zero.

Case 6

The file has one byte: the UTF-8 code for LF.

Case 7

CSV encoding errors

Case 8

Specific application where CSV looks like:

In this specific application, the header is always present, with columns in the specified order. The conversion takes two steps:

  1. Transform .csv into .xcsv, using a generic library XSLT style-sheet.
  2. Transform .xcsv into the application-specific structure as above, using a trivial XSLT style-sheet.

Case 9

This use case demonstrates the necessary XML encoding on a lexical level for & and < in raw data. No special encoding is required at the XML parser API level.

The CSV:

please review is not needed
Caridorc

XML Schema for an XML representation of CSV

This is intended to be part of a generalised solution to the problem of converting any (with some minor restrictions) CSV content into XML. The restrictions on the CSV and the purpose of the schema should be apparent from the annotations.

Add use cases
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 elementFormDefault="qualified"
 targetNamespace="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 version="1.0">
 <xs:import
 namespace="http://www.w3.org/XML/1998/namespace" 
 schemaLocation="xml.xsd"/>
 <xs:element name="comma-separated-single-line-values">
 <xs:annotation><xs:documentation xml:lang="en">
 This schema describes an XML representation of a subset of csv content.
 The format described by this schema, here-after referred to as "xcsv"
 is part of a generalised solution to the problem of converting
 general csv files into suitable XML, and the reverse transform.

 The restrictions on the csv content are:
 * The csv file is encoded either in UTF-8 or UTF-16. If UTF-16, a BOM
 is required.
 * The cell values of the csv may not contain the CR or LF characters.
 Essentially, we are restricted to single-line values.

 The xcsv format was developed by Sean B. Durkin&#x85;
 www.seanbdurkin.id.au
 </xs:documentation></xs:annotation>

 <xs:complexType>
 <xs:sequence>
 
 <xs:element ref="xcsv:notice" minOccurs="0" maxOccurs="1"/>
 <xs:element name="row" minOccurs="0" maxOccurs="unbounded">
 <xs:annotation><xs:documentation xml:lang="en">
 A row element represents a "row" or "line" in the csv file. Rows contain values.
 </xs:documentation>
 <xs:appinfo>
 <example>
 <csv-line>apple,"banana","red, white and blue","quote this("")"</csv-line>
 <row>
 <value>apple</value>
 <value>banana</value>
 <value>red, white and blue</value>
 <value>quote this(")</value>
 </row>
 </example>
 </xs:appinfo>
 </xs:annotation> 
 <xs:choice minOccurs="1" maxOccurs="unbounded">
 <xs:annotation><xs:documentation xml:lang="en">
 Empty rows are not possible in csv. We must have at least one value or one error.
 </xs:documentation></xs:annotation>
 <xs:element name="value">
 <xs:annotation><xs:documentation xml:lang="en">
 A value element represents a decoded (model) csv "value" or "cell".
 If the encoded value in the lexical csv was of a quoted form, then
 the element content here is the decoded or model form. In other words,
 the delimiting double-quote marks are stripped out and the internal
 escaped double-quotes are de-escaped.
 </xs:documentation></xs:annotation>
 <xs:simpleType>
 <xs:restriction base="xs:string">
 <xs:pattern value="[^\n]*"/>
 <xs:whiteSpace value="preserve"/>
 <xs:annotation><xs:documentation xml:lang="en">
 Cell values must fit this pattern because of the single-line restriction
 that we placed on the csv values.
 </xs:documentation></xs:annotation>
 </xs:restriction>
 </xs:simpleType>
 </xs:element>
 <xs:group ref="xcsv:errorGroup">
 <xs:annotation><xs:documentation xml:lang="en">
 An error can be recorded here as a child of row, if there was an encoding
 error in the csv for that row.
 </xs:documentation></xs:annotation>
 </xs:group>
 </xs:choice>
 </xs:element>

 <xs:group ref="xcsv:errorGroup">
 <xs:annotation><xs:documentation xml:lang="en">
 An error can be recorded here as a child of the comma-separated-values element,
 if there was an i/o error in the transformational process. For example:
 CSV file not found.
 </xs:documentation></xs:annotation>
 </xs:group>
 </xs:sequence>
 <xs:attribute name="xcsv-version" type="xs:decimal"
 fixed="1.0" use="required"/>
 </xs:complexType>
 </xs:element>
 <xs:element name="comma-separated-multiline-values">
 <xs:annotation><xs:documentation xml:lang="en">
 Similar to xcsv:comma-separated-single-line-values but allows multi-line values.
 </xs:documentation></xs:annotation>
 <xs:complexType>
 <xs:sequence>
 <xs:element ref="xcsv:notice" minOccurs="0" maxOccurs="1"/>
 <xs:element name="row" minOccurs="0" maxOccurs="unbounded">
 <xs:choice minOccurs="1" maxOccurs="unbounded">
 <xs:element name="value">
 <xs:simpleType>
 <xs:restriction base="xs:string">
 <xs:whiteSpace value="preserve"/>
 </xs:restriction>
 </xs:simpleType>
 </xs:element>
 <xs:group ref="xcsv:errorGroup">
 </xs:group>
 </xs:choice>
 </xs:element>
 <xs:group ref="xcsv:errorGroup">
 </xs:group>
 </xs:sequence>
 <xs:attribute name="xcsv-version" type="xs:decimal"
 fixed="1.0" use="required"/>
 </xs:complexType>
 </xs:element>
 <xs:element name="notice" type="xcsv:notice-en" />
 <xs:annotation><xs:documentation xml:lang="en">
 This is an optional element below comma-separated-single-line-values or
 comma-separated-multiline-values that looks like the example.
 </xs:documentation>
 <xs:appinfo>
 <example>
 <notice xml:lang="en">The xcsv format was developed by Sean B. Durkin&#x85;www.seanbdurkin.id.au</notice>
 </example>
 </xs:appinfo></xs:annotation>
 <xs:complexType name="notice-en">
 <xs:simpleContent>
 <xs:extension base="xcsv:notice-content-en">
 <xs:attribute ref="xml:lang" use="required" fixed="en" />
 </xs:extension>
 </xs:simpleContent>
 </xs:complexType>
 <xs:simpleType name="notice-content-en">
 <xs:restriction base="xs:string">
 <xs:enumeration value="The xcsv format was developed by Sean B. Durkin&#x85;www.seanbdurkin.id.au"/>
 </xs:restriction>
 </xs:simpleType>
 <xs:element />
 <xs:group name="errorGroup">
 <xs:annotation><xs:documentation xml:lang="en">
 This is an error node/message in one or more languages.
 </xs:documentation>
 <xs:appinfo>
 <example>
 <error error-code="2">
 <message xml:lang="en">Quoted value not terminated.</message>
 <message xml:lang="ru">Цитируется значение не прекращается.</message>
 <error-data>"</error-data>
 </error>
 </example> 
 <example>
 <error error-code="3">
 <message xml:lang="en">Quoted value incorrectly terminated.</message>
 <message xml:lang="ru">Цитируется значение неправильно прекращено.</message>
 </error>
 </example>
 </xs:appinfo> 
 </xs:annotation>
 <xs:element name="error">
 <xs:element name="message" minOccurs="1" maxOccurs="unbounded" type="xcsv:string-with-lang" />
 <xs:annotation><xs:documentation xml:lang="en">
 Although there can be multiple messages, there should only be at most one per language.
 </xs:documentation></xs:annotation>
 <xs:element name="error-data" minOccurs="0" maxOccurs="1" >
 <xs:simpleContent>
 <xs:restriction base="xs:string">
 <xs:whiteSpace value="preserve"/>
 </xs:restriction>
 </xs:simpleContent>
 </xs:element>
 <xs:attribute name="error-code" type="xs:positiveInteger" default="1" />
 <xs:annotation><xs:documentation xml:lang="en">
 Each different kind of error should be associated with a unique error code.
 A map for the error codes is outside the scope of this schema, except to say the following:
 * zero (0) is not permitted as an error code.
 * one (1) means a general or uncategorised error. (Try to avoid this!)
 </xs:documentation></xs:annotation>
 </xs:element>
 </xs:group>
 <xs:complexType name="string-with-lang">
 <xs:annotation><xs:documentation xml:lang="en">
 This is an element with text content in some language as indicated
 by the xml:lang attribute.
 </xs:documentation></xs:annotation>
 <xs:simpleContent>
 <xs:extension base="xs:string">
 <xs:attribute ref="xml:lang" use="required" default="en" />
 </xs:extension>
 </xs:simpleContent>
 </xs:complexType>
</xs:schema>
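
As a rough illustration of the reverse transform mentioned in the schema annotation, here is a minimal Python sketch that re-encodes decoded cell values as CSV lines. The function names `encode_cell` and `rows_to_csv` are hypothetical, and the quoting rule (quote only when a value contains a comma or a double-quote) is an assumption, not something the schema mandates.

```python
def encode_cell(text):
    """Return the lexical CSV form of one decoded cell value."""
    # Assumed rule: quote only when needed, doubling any embedded double-quote.
    if "," in text or '"' in text:
        return '"' + text.replace('"', '""') + '"'
    return text


def rows_to_csv(rows, eol="\r\n"):
    """Join rows (lists of decoded cell strings) into CSV text, one line per row."""
    return "".join(",".join(encode_cell(cell) for cell in row) + eol for row in rows)
```

With a rule like this, a cell such as "banana" that was quoted in the source CSV without needing to be comes back unquoted, so the sketch round-trips the decoded values rather than the exact lexical form.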

Use cases

Case 1 Lines ending in CR LF, including the last line. The CSV:

1st name,2nd name
Sean,Brendan,"Durkin"
""","""
<This is a place-marker for an empty row>
"",

The XML equivalent (schema valid):

<xcsv:comma-separated-values
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 xmlns:xml="http://www.w3.org/XML/1998/namespace"
 xcsv-version="1.0">
 <xcsv:notice xml:lang="en">The xcsv format was developed by Sean B. Durkin&#x85;www.seanbdurkin.id.au</xcsv:notice>
 <xcsv:row>
 <xcsv:value>1st name</xcsv:value> <xcsv:value>2nd name</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>Sean</xcsv:value> <xcsv:value>Brendan</xcsv:value> <xcsv:value>Durkin</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>","</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value />
 </xcsv:row>
 <xcsv:row>
 <xcsv:value /> <xcsv:value />
 </xcsv:row>
</xcsv:comma-separated-values>
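
For readers who want to experiment with case 1 outside XSLT, the mapping can be approximated in Python. This is only a sketch under stated assumptions: it uses the standard `csv` module rather than the author's style-sheet, the function name `csv_to_xcsv` is hypothetical, the optional notice element is omitted, and an empty line is mapped to a row with a single empty value to mirror the example above.

```python
import csv
import io
import xml.etree.ElementTree as ET

XCSV_NS = "http://seanbdurkin.id.au/xslt/xcsv.xsd"


def csv_to_xcsv(text):
    """Build an xcsv element tree from CSV text (hypothetical helper)."""
    ET.register_namespace("xcsv", XCSV_NS)
    root = ET.Element(f"{{{XCSV_NS}}}comma-separated-values", {"xcsv-version": "1.0"})
    # newline="" leaves CR LF untranslated so the csv module sees the raw line endings.
    for cells in csv.reader(io.StringIO(text, newline="")):
        row = ET.SubElement(root, f"{{{XCSV_NS}}}row")
        # The csv module reports an empty line as []; represent it as one empty value.
        for cell in cells or [""]:
            ET.SubElement(row, f"{{{XCSV_NS}}}value").text = cell
    return root


sample = '1st name,2nd name\r\nSean,Brendan,"Durkin"\r\n""","""\r\n\r\n"",\r\n'
print(ET.tostring(csv_to_xcsv(sample), encoding="unicode"))
```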

Case 2 As case 1, but with line endings as just LF.

XML as case 1.

Case 3 Lines ending in CR LF, including the last line.

The CSV:

Fruit,Colour
Banana,Yellow

The XML equivalent (schema valid):

<xcsv:comma-separated-values
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 xmlns:xml="http://www.w3.org/XML/1998/namespace"
 xcsv-version="1.0">
 <xcsv:row>
 <xcsv:value>Fruit</xcsv:value> <xcsv:value>Colour</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>Banana</xcsv:value> <xcsv:value>Yellow</xcsv:value>
 </xcsv:row>
</xcsv:comma-separated-values>

Case 4 Same as case 3, but the last line ends at EOF rather than with a line terminator. In other words, the last byte of the file is the UTF-8 code for 'w'.

Same XML!
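
Cases 2 to 4 all say that the choice of line terminator, and whether the last line has one, should not change the result. A quick way to check that expectation against an existing parser is sketched below with Python's standard `csv` module; `parse_rows` is a hypothetical helper, and the module's behaviour stands in for the XSLT transform here.

```python
import csv
import io


def parse_rows(text):
    """Parse CSV text into a list of rows of decoded cell values."""
    # newline="" keeps line endings untranslated so the csv module handles them itself.
    return list(csv.reader(io.StringIO(text, newline="")))


crlf = "Fruit,Colour\r\nBanana,Yellow\r\n"      # case 3
lf = "Fruit,Colour\nBanana,Yellow\n"            # LF-only variant, as in case 2
no_final_eol = "Fruit,Colour\r\nBanana,Yellow"  # case 4: file ends with 'w'

# All three inputs decode to the same rows.
assert parse_rows(crlf) == parse_rows(lf) == parse_rows(no_final_eol)
```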

Case 5 Empty file. The size of the file is zero.

Valid XML instance:

<xcsv:comma-separated-values
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 xcsv-version="1.0" />

Case 6. The file has one byte: the UTF-8 code for LF.

CSV:

LF

Valid XML instance:

Same XML as case 5!

Case 7. CSV encoding errors

The CSV (not valid):

Fruit,"Colour
Banana,"Yell"ow

The valid XML instance:

<xcsv:comma-separated-values
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 xmlns:xml="http://www.w3.org/XML/1998/namespace"
 xcsv-version="1.0">
 <xcsv:row>
 <xcsv:value>Fruit</xcsv:value>
 <xcsv:error error-code="2">
 <xcsv:message xml:lang="en">Quoted value not terminated.</xcsv:message>
 <xcsv:error-data>"</xcsv:error-data>
 </xcsv:error>
 <xcsv:value>Colour</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>Banana</xcsv:value>
 <xcsv:error error-code="3">
 <xcsv:message xml:lang="en">Quoted value incorrectly terminated.</xcsv:message>
 <xcsv:error-data>"</xcsv:error-data>
 </xcsv:error>
 <xcsv:value>Yell"ow</xcsv:value>
 </xcsv:row>
</xcsv:comma-separated-values>
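
One way to arrive at output shaped like case 7 is a small hand-written scanner for a single line. The sketch below is an assumption about how such a scanner might behave, not the author's XSLT: `parse_line` is a hypothetical name, the recovery strategy (emit the error, then keep the raw text as a value) is inferred from the example above, and only error codes 2 and 3 are handled.

```python
def parse_line(line):
    """Split one CSV line into ("value", text) and ("error", code) items."""
    items = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            j, cell, closed = i + 1, [], False
            while j < n:
                if line[j] != '"':
                    cell.append(line[j])
                    j += 1
                elif j + 1 < n and line[j + 1] == '"':
                    cell.append('"')  # "" inside quotes is one literal quote
                    j += 2
                else:
                    closed = True     # a lone quote closes the cell
                    j += 1
                    break
            if not closed:
                # Error code 2: quoted value not terminated; keep the raw remainder.
                items += [("error", 2), ("value", line[i + 1:])]
                return items
            if j < n and line[j] != ",":
                # Error code 3: text follows the closing quote; keep raw text up to the next comma.
                end = line.find(",", j)
                end = n if end == -1 else end
                items += [("error", 3), ("value", line[i + 1:end])]
                j = end
            else:
                items.append(("value", "".join(cell)))
        else:
            j = line.find(",", i)
            j = n if j == -1 else j
            items.append(("value", line[i:j]))
        if j >= n:
            return items
        i = j + 1
```

Each ("error", code) item would correspond to an error element carrying that error-code, placed before the value it relates to.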

Case 8. Specific application where the CSV looks like:

1st name,2nd name
Sean,Durkin
"Peter","Pan"

In this specific application, the header is always there, with columns in the specified order. The application-specific structure is:

<people>
 <person first-name="Sean" last-name="Durkin" />
 <person first-name="Peter" last-name="Pan" />
</people>

This takes two steps:

  1. Transform csv into xcsv, using a generic library XSLT style-sheet.
  2. Transform xcsv into the application-specific structure as above, using a trivial XSLT style-sheet.
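
To make the second step concrete, here is a minimal Python sketch of the same idea, standing in for the trivial XSLT style-sheet. The function name `xcsv_to_people` is hypothetical, and the attribute name `last-name` for the second column is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"xcsv": "http://seanbdurkin.id.au/xslt/xcsv.xsd"}


def xcsv_to_people(root):
    """Map xcsv rows to <person> elements, skipping the header row."""
    people = ET.Element("people")
    rows = root.findall("xcsv:row", NS)
    for row in rows[1:]:  # the header is assumed to be always present
        cells = [value.text or "" for value in row.findall("xcsv:value", NS)]
        # Columns are assumed to be in the fixed order: first name, then second name.
        ET.SubElement(people, "person", {"first-name": cells[0], "last-name": cells[1]})
    return people
```

Because the generic step has already dealt with quoting and line endings, the application-specific step only has to pick cells by position.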

Case 9 This use case demonstrates the necessary XML encoding on a lexical level for & and < in raw data. No special encoding is required at the XML parser API level.

The CSV:

 Character,Name
 &,Ampersand
 <,Less than

The equivalent schema-valid XML instance:

<xcsv:comma-separated-values
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 xmlns:xml="http://www.w3.org/XML/1998/namespace"
 xcsv-version="1.0">
 <xcsv:row>
 <xcsv:value>Character</xcsv:value> <xcsv:value>Name</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>&amp;</xcsv:value> <xcsv:value>Ampersand</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>&lt;</xcsv:value> <xcsv:value>Less than</xcsv:value>
 </xcsv:row>
</xcsv:comma-separated-values>
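
The distinction in case 9 between the lexical level and the parser API level can be seen with any XML library. The Python sketch below uses the standard `xml.etree.ElementTree` module; `serialize_value` is a hypothetical helper and the un-namespaced element name `value` is used only to keep the example short.

```python
import xml.etree.ElementTree as ET


def serialize_value(text):
    """Serialize one cell value; the library applies the lexical escaping."""
    value = ET.Element("value")
    value.text = text  # set at the API level, no manual escaping
    return ET.tostring(value, encoding="unicode")


for raw in ("&", "<"):
    lexical = serialize_value(raw)
    # Parsing the serialized form gives back the raw character.
    assert ET.fromstring(lexical).text == raw
    print(lexical)
```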
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 elementFormDefault="qualified"
 targetNamespace="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 version="1.0">
 <xs:import
 namespace="http://www.w3.org/XML/1998/namespace" 
 schemaLocation="xml.xsd"/>
 <xs:element name="comma-separated-values">
 <xs:annotation><xs:documentation xml:lang="en">
 This schema describes an XML representation of a subset of csv content.
 The format described by this schema, here-after referred to as "xcsv"
 is part of a generalised solution to the problem of converting
 general csv files into suitable XML, and the reverse transform.

 The restrictions on the csv content are:
 * The csv file is encoded either in UTF-8 or UTF-16. If UTF-16, a BOM
 is required.
 * The cell values of the csv may not contain the CR or LF characters.
 Essentially, we are restricted to single-line values.

 The xcsv format was developed by Sean B. Durkin&#x85;
 www.seanbdurkin.id.au
 </xs:documentation></xs:annotation>

 <xs:complexType>
 <xs:sequence>
 
 <xs:element name="notice" type="xcsv:notice-en" minOccurs="0" maxOccurs="1"/>
 <xs:annotation><xs:documentation xml:lang="en">
 This is an optional element at the top that looks like the example.
 </xs:documentation>
 <xs:appinfo>
 <example>
 <notice xml:lang="en">The xcsv format was developed by Sean B. Durkin&#x85;www.seanbdurkin.id.au</notice>
 </example>
 </xs:appinfo></xs:annotation>
 <xs:complexType name="notice-en">
 <xs:simpleContent>
 <xs:extension base="xcsv:notice-content-en">
 <xs:attribute ref="xml:lang" use="required" fixed="en" />
 </xs:extension>
 </xs:simpleContent>
 </xs:complexType>
 <xs:simpleType name="notice-content-en">
 <xs:restriction base="xs:string">
 <xs:enumeration value="The xcsv format was developed by Sean B. Durkin&#x85;www.seanbdurkin.id.au"/>
 </xs:restriction>
 </xs:simpleType>
 <xs:element />

 <xs:element name="row" minOccurs="0" maxOccurs="unbounded">
 <xs:annotation><xs:documentation xml:lang="en">
 A row element represents a "row" or "line" in the csv file. Rows contain values.
 </xs:documentation>
 <xs:appinfo>
 <example>
 <csv-line>apple,"banana","red, white and blue","quote this("")"</csv-line>
 <row>
 <value>apple</value>
 <value quoted="true">banana</value>
 <value>red, white and blue</value>
 <value quoted="true">quote this(")</value>
 </row>
 </example>
 </xs:appinfo>
 </xs:annotation> 
 <xs:choice minOccurs="1" maxOccurs="unbounded">
 <xs:annotation><xs:documentation xml:lang="en">
 Empty rows are not possible in csv. We must have at least one value or one error.
 </xs:documentation></xs:annotation>
 <xs:element name="value" type="xcsv:single-line-with-quoted">
 <xs:annotation><xs:documentation xml:lang="en">
 A value element represents a decoded (model) csv "value" or "cell".
 If the encoded value in the lexical csv was of a quoted form, then
 the element content here is the decoded or model form. In other words,
 the delimiting double-quote marks are stripped out and the internal
 escaped double-quotes are de-escaped.
 </xs:documentation></xs:annotation>
 <xs:simpleType name="single-line">
 <xs:restriction base="xs:string">
 <xs:pattern value="[^\n]*"/>
 <xs:annotation><xs:documentation xml:lang="en">
 Cell values must fit this pattern because of the single-line restriction
 that we placed on the csv values.
 </xs:documentation></xs:annotation>
 </xs:restriction>
 </xs:simpleType>
 <xs:complexType name="single-line-with-quoted">
 <xs:simpleContent>
 <xs:extension base="xcsv:single-line">
 <xs:attribute name="quoted" type="xs:boolean" default="false"/>
 <xs:annotation><xs:documentation xml:lang="en">
 This attribute is True if the original lexical csv value was encoded
 in the quoted form, and False otherwise.
 
 If the attribute is not present, and the value contains a comma,
 or begins with a double-quote, the default for this attribute is to
 be deemed as True.
 
 If the attribute is not present, but the value neither contains a
 comma nor begins with a double-quote, then the default for this 
 attribute is to be deemed as False.
 </xs:documentation></xs:annotation>
 </xs:extension>
 </xs:simpleContent>
 </xs:complexType>
 </xs:element>
 <xs:group ref="xcsv:errorGroup">
 <xs:annotation><xs:documentation xml:lang="en">
 An error can be recorded here as a child of row, if there was an encoding
 error in the csv for that row.
 </xs:documentation></xs:annotation>
 </xs:group>
 </xs:choice>
 </xs:element>

 <xs:group ref="xcsv:errorGroup">
 <xs:annotation><xs:documentation xml:lang="en">
 An error can be recorded here as a child of the comma-separated-values element,
 if there was an i/o error in the transformational process. For example:
 CSV file not found.
 </xs:documentation></xs:annotation>
 </xs:group>
 </xs:sequence>
 <xs:attribute name="xcsv-version" type="xs:decimal"
 fixed="1.0" use="required"/>
 </xs:complexType>
 </xs:element>
 <xs:group name="errorGroup">
 <xs:annotation><xs:documentation xml:lang="en">
 This is an error node/message in one or more languages.
 </xs:documentation>
 <xs:appinfo>
 <example>
 <error error-code="2">
 <message xml:lang="en">Quoted value not terminated.</message>
 <message xml:lang="ru">Quoted value not terminated.</message>
 </error>
 </example> 
 <example>
 <error error-code="3">
 <message xml:lang="en">Quoted value incorrectly terminated.</message>
 <message xml:lang="ru">Quoted value incorrectly terminated.</message>
 </error>
 </example>
 </xs:appinfo>
 </xs:annotation>
 <xs:element name="error">
 <xs:element name="message" minOccurs="1" maxOccurs="unbounded" type="xcsv:string-with-lang" />
 <xs:annotation><xs:documentation xml:lang="en">
 Although there can be multiple messages, there should only be at most one per language.
 </xs:documentation></xs:annotation>
 <xs:attribute name="error-code" type="xs:integer" default="1" />
 <xs:annotation><xs:documentation xml:lang="en">
 Each different kind of error should be associated with a unique error code.
 A map for the error codes is outside the scope of this schema, except to say the following:
 * zero (0) is not permitted as an error code.
 * one (1) means a general or uncategorised error. (Try to avoid this!)
 </xs:documentation></xs:annotation>
 </xs:element>
 </xs:group>
 <xs:complexType name="string-with-lang">
 <xs:annotation><xs:documentation xml:lang="en">
 This is an element with text content in some language as indicated
 by the xml:lang attribute.
 </xs:documentation></xs:annotation>
 <xs:simpleContent>
 <xs:extension base="xs:string">
 <xs:attribute ref="xml:lang" use="required" default="en" />
 </xs:extension>
 </xs:simpleContent>
 </xs:complexType>
</xs:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 elementFormDefault="qualified"
 targetNamespace="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 version="1.0">
 <xs:import
 namespace="http://www.w3.org/XML/1998/namespace" 
 schemaLocation="xml.xsd"/>
 <xs:element name="comma-separated-single-line-values">
 <xs:annotation><xs:documentation xml:lang="en">
 This schema describes an XML representation of a subset of csv content.
 The format described by this schema, here-after referred to as "xcsv"
 is part of a generalised solution to the problem of converting
 general csv files into suitable XML, and the reverse transform.
 The restrictions on the csv content are:
 * The csv file is encoded either in UTF-8 or UTF-16. If UTF-16, a BOM
 is required.
 * The cell values of the csv may not contain the CR or LF characters.
 Essentially, we are restricted to single-line values.
 The xcsv format was developed by Sean B. Durkin&#x85;
 www.seanbdurkin.id.au
 </xs:documentation></xs:annotation>
 <xs:complexType>
 <xs:sequence>
 <xs:element ref="xcsv:notice" minOccurs="0" maxOccurs="1"/>
 <xs:element name="row" minOccurs="0" maxOccurs="unbounded">
 <xs:annotation><xs:documentation xml:lang="en">
 A row element represents a "row" or "line" in the csv file. Rows contain values.
 </xs:documentation>
 <xs:appinfo>
 <example>
 <csv-line>apple,"banana","red, white and blue","quote this("")"</csv-line>
 <row>
 <value>apple</value>
 <value>banana</value>
 <value>red, white and blue</value>
 <value>quote this(")</value>
 </row>
 </example>
 </xs:appinfo>
 </xs:annotation> 
 <xs:choice minOccurs="1" maxOccurs="unbounded">
 <xs:annotation><xs:documentation xml:lang="en">
 Empty rows are not possible in csv. We must have at least one value or one error.
 </xs:documentation></xs:annotation>
 <xs:element name="value">
 <xs:annotation><xs:documentation xml:lang="en">
 A value element represents a decoded (model) csv "value" or "cell".
 If the encoded value in the lexical csv was of a quoted form, then
 the element content here is the decoded or model form. In other words,
 the delimiting double-quote marks are stripped out and the internal
 escaped double-quotes are de-escaped.
 </xs:documentation></xs:annotation>
 <xs:simpleType>
 <xs:restriction base="xs:string">
 <xs:pattern value="[^\n]*"/>
 <xs:whiteSpace value="preserve"/>
 <xs:annotation><xs:documentation xml:lang="en">
 Cell values must fit this pattern because of the single-line restriction
 that we placed on the csv values.
 </xs:documentation></xs:annotation>
 </xs:restriction>
 </xs:simpleType>
 </xs:element>
 <xs:group ref="xcsv:errorGroup">
 <xs:annotation><xs:documentation xml:lang="en">
 An error can be recorded here as a child of row, if there was an encoding
 error in the csv for that row.
 </xs:documentation></xs:annotation>
 </xs:group>
 </xs:choice>
 </xs:element>
 <xs:group ref="xcsv:errorGroup">
 <xs:annotation><xs:documentation xml:lang="en">
 An error can be recorded here as a child of the comma-separated-values element,
 if there was an i/o error in the transformational process. For example:
 CSV file not found.
 </xs:documentation></xs:annotation>
 </xs:group>
 </xs:sequence>
 <xs:attribute name="xcsv-version" type="xs:decimal"
 fixed="1.0" use="required"/>
 </xs:complexType>
 </xs:element>
 <xs:element name="comma-separated-multiline-values">
 <xs:annotation><xs:documentation xml:lang="en">
 Similar to xcsv:comma-separated-single-line-values but allows multi-line values.
 </xs:documentation></xs:annotation>
 <xs:complexType>
 <xs:sequence>
 <xs:element ref="xcsv:notice" minOccurs="0" maxOccurs="1"/>
 <xs:element name="row" minOccurs="0" maxOccurs="unbounded">
 <xs:choice minOccurs="1" maxOccurs="unbounded">
 <xs:element name="value">
 <xs:simpleType>
 <xs:restriction base="xs:string">
 <xs:whiteSpace value="preserve"/>
 </xs:restriction>
 </xs:simpleType>
 </xs:element>
 <xs:group ref="xcsv:errorGroup">
 </xs:group>
 </xs:choice>
 </xs:element>
 <xs:group ref="xcsv:errorGroup">
 </xs:group>
 </xs:sequence>
 <xs:attribute name="xcsv-version" type="xs:decimal"
 fixed="1.0" use="required"/>
 </xs:complexType>
 </xs:element>
 <xs:element name="notice" type="xcsv:notice-en" />
 <xs:annotation><xs:documentation xml:lang="en">
 This is an optional element below comma-separated-single-line-values or
 comma-separated-multiline-values that looks like the example.
 </xs:documentation>
 <xs:appinfo>
 <example>
 <notice xml:lang="en">The xcsv format was developed by Sean B. Durkin&#x85;www.seanbdurkin.id.au</notice>
 </example>
 </xs:appinfo></xs:annotation>
 <xs:complexType name="notice-en">
 <xs:simpleContent>
 <xs:extension base="xcsv:notice-content-en">
 <xs:attribute ref="xml:lang" use="required" fixed="en" />
 </xs:extension>
 </xs:simpleContent>
 </xs:complexType>
 <xs:simpleType name="notice-content-en">
 <xs:restriction base="xs:string">
 <xs:enumeration value="The xcsv format was developed by Sean B. Durkin&#x85;www.seanbdurkin.id.au"/>
 </xs:restriction>
 </xs:simpleType>
 <xs:element />
 <xs:group name="errorGroup">
 <xs:annotation><xs:documentation xml:lang="en">
 This is an error node/message in one or more languages.
 </xs:documentation>
 <xs:appinfo>
 <example>
 <error error-code="2">
 <message xml:lang="en">Quoted value not terminated.</message>
 <message xml:lang="ru">Цитируется значение не прекращается.</message>
 <error-data>"</error-data>
 </error>
 </example> 
 <example>
 <error error-code="3">
 <message xml:lang="en">Quoted value incorrectly terminated.</message>
 <message xml:lang="ru">Цитируется значение неправильно прекращено.</message>
 </error>
 </example>
 </xs:appinfo> 
 </xs:annotation>
 <xs:sequence>
 <xs:element name="error">
 <xs:complexType>
 <xs:sequence>
 <xs:element name="message" minOccurs="1" maxOccurs="unbounded" type="xcsv:string-with-lang">
 <xs:annotation><xs:documentation xml:lang="en">
 Although there can be multiple messages, there should be at most one per language.
 </xs:documentation></xs:annotation>
 </xs:element>
 <xs:element name="error-data" minOccurs="0" maxOccurs="1">
 <xs:simpleType>
 <xs:restriction base="xs:string">
 <xs:whiteSpace value="preserve"/>
 </xs:restriction>
 </xs:simpleType>
 </xs:element>
 </xs:sequence>
 <xs:attribute name="error-code" type="xs:positiveInteger" default="1">
 <xs:annotation><xs:documentation xml:lang="en">
 Each different kind of error should be associated with a unique error code.
 A map for the error codes is outside the scope of this schema, except to say the following:
 * one (1) means a general or uncategorised error. (Try to avoid this!)
 </xs:documentation></xs:annotation>
 </xs:attribute>
 </xs:complexType>
 </xs:element>
 </xs:sequence>
 </xs:group>
 <xs:complexType name="string-with-lang">
 <xs:annotation><xs:documentation xml:lang="en">
 This is an element with text content in some language as indicated
 by the xml:lang attribute.
 </xs:documentation></xs:annotation>
 <xs:simpleContent>
 <xs:extension base="xs:string">
 <xs:attribute ref="xml:lang" use="required" />
 </xs:extension>
 </xs:simpleContent>
 </xs:complexType>
</xs:schema>

Use cases

Case 1 Lines ending in CR LF, including the last line.

The CSV:

1st name,2nd name
Sean,Brendan,"Durkin"
""","""
<This is a place-marker for an empty row>
"",

The XML equivalent (schema valid):

<xcsv:comma-separated-values
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 xmlns:xml="http://www.w3.org/XML/1998/namespace"
 xcsv-version="1.0">
 <xcsv:notice xml:lang="en">The xcsv format was developed by Sean B. Durkin&#x85;www.seanbdurkin.id.au</xcsv:notice>
 <xcsv:row>
 <xcsv:value>1st name</xcsv:value> <xcsv:value>2nd name</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>Sean</xcsv:value> <xcsv:value>Brendan</xcsv:value> <xcsv:value>Durkin</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>","</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value />
 </xcsv:row>
 <xcsv:row>
 <xcsv:value /> <xcsv:value />
 </xcsv:row>
</xcsv:comma-separated-values>

Case 2 As case 1, but with line endings as just LF.

XML as case 1.
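Cases 1 and 2 can be sketched in Python, purely as an illustration and not as the XSLT library itself. The helper names `csv_to_xcsv` and `rows_of` are hypothetical, the optional notice element is left out, and Python's `csv` module is assumed to be an acceptable stand-in for the CSV dialect described here.

```python
import csv
import io
import xml.etree.ElementTree as ET

XCSV_NS = "http://seanbdurkin.id.au/xslt/xcsv.xsd"  # namespace used in the examples


def csv_to_xcsv(text):
    """Build an xcsv element tree from CSV text (hypothetical helper)."""
    ET.register_namespace("xcsv", XCSV_NS)
    root = ET.Element(f"{{{XCSV_NS}}}comma-separated-values",
                      {"xcsv-version": "1.0"})
    # newline="" leaves CR LF and LF untouched, so the csv module
    # recognises either line ending itself.
    for fields in csv.reader(io.StringIO(text, newline="")):
        row = ET.SubElement(root, f"{{{XCSV_NS}}}row")
        # csv.reader yields [] for an empty line; case 1 shows such a
        # row as a single empty value.
        for field in fields or [""]:
            ET.SubElement(row, f"{{{XCSV_NS}}}value").text = field
    return root


def rows_of(root):
    """Return the values of each row as plain lists, for comparison."""
    return [[value.text or "" for value in row] for row in root]


lines = ['1st name,2nd name', 'Sean,Brendan,"Durkin"', '""","""', '', '"",']
crlf = "".join(line + "\r\n" for line in lines)  # case 1
lf = "".join(line + "\n" for line in lines)      # case 2

# Both line-ending styles give the same rows and values.
print(rows_of(csv_to_xcsv(crlf)) == rows_of(csv_to_xcsv(lf)))
```

Note that the quotes around `"Durkin"` do not survive the conversion: only the value is recorded, not how it was quoted.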

Case 3 Lines ending in CR LF, including the last line.

The CSV:

Fruit,Colour
Banana,Yellow

The XML equivalent (schema valid):

<xcsv:comma-separated-values
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 xmlns:xml="http://www.w3.org/XML/1998/namespace"
 xcsv-version="1.0">
 <xcsv:row>
 <xcsv:value>Fruit</xcsv:value> <xcsv:value>Colour</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>Banana</xcsv:value> <xcsv:value>Yellow</xcsv:value>
 </xcsv:row>
</xcsv:comma-separated-values>

Case 4 Same as case 3, but the last line ends at EOF. In other words, the last byte of the file is the UTF-8 code for 'w'.

Same XML as case 3.

Case 5 Empty file. The size of the file is zero.

Valid XML instance:

<xcsv:comma-separated-values
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 xcsv-version="1.0" />

Case 6. The file has one byte: the UTF-8 code for LF.

CSV:

LF

Valid XML instance:

Same XML as case 5!
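The line-ending rules of cases 3 to 6 can be sketched as follows. This is a rough illustration using Python's `csv` module, not the XSLT implementation. `csv_rows` is a hypothetical helper, and treating a lone line terminator like an empty file is an assumption taken from case 6. The cases do not say how a file of two LFs should map.

```python
import csv
import io


def csv_rows(text):
    """Split CSV text into rows of values (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    # Assumption: a file holding nothing but one line terminator is
    # treated like an empty file (case 6), not as one empty row.
    if rows == [[]]:
        return []
    return rows


case3 = "Fruit,Colour\r\nBanana,Yellow\r\n"  # last line ends in CR LF
case4 = "Fruit,Colour\r\nBanana,Yellow"      # last line ends at EOF
case5 = ""                                   # empty file
case6 = "\n"                                 # a single LF

print(csv_rows(case3) == csv_rows(case4))
print(csv_rows(case5) == csv_rows(case6))
```

Because cases 3 and 4 map to the same XML, a transformation back to CSV can reproduce the rows and values but not necessarily the original bytes. The same holds for cases 5 and 6.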

Case 7. CSV encoding errors

The CSV (not valid):

Fruit,"Colour
Banana,"Yell"ow

The valid XML instance:

<xcsv:comma-separated-values
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 xmlns:xml="http://www.w3.org/XML/1998/namespace"
 xcsv-version="1.0">
 <xcsv:row>
 <xcsv:value>Fruit</xcsv:value>
 <xcsv:error error-code="2">
 <xcsv:message xml:lang="en">Quoted value not terminated.</xcsv:message>
 <xcsv:error-data>"</xcsv:error-data>
 </xcsv:error>
 <xcsv:value>Colour</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>Banana</xcsv:value>
 <xcsv:error error-code="3">
 <xcsv:message xml:lang="en">Quoted value incorrectly terminated.</xcsv:message>
 <xcsv:error-data>"</xcsv:error-data>
 </xcsv:error>
 <xcsv:value>Yell"ow</xcsv:value>
 </xcsv:row>
</xcsv:comma-separated-values>

Case 8. Specific application where the CSV looks like:

1st name,2nd name
Sean,Durkin
"Peter","Pan"

In this specific application, the header is always present, with columns in the specified order. The application-specific XML looks like:

<people>
 <person first-name="Sean" second-name="Durkin" />
 <person first-name="Peter" second-name="Pan" />
</people>

The conversion takes two steps:

  1. Transform the CSV into xcsv, using a generic library XSLT style-sheet.
  2. Transform the xcsv into the application-specific structure as above, using a trivial XSLT style-sheet.
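The second step could look roughly like the sketch below. Python's ElementTree stands in for the XSLT style-sheet, so this is not the style-sheet itself. The function name `xcsv_to_people` is hypothetical, and the attribute names `first-name` and `second-name` are assumptions chosen to mirror the two header columns.

```python
import xml.etree.ElementTree as ET

NS = {"xcsv": "http://seanbdurkin.id.au/xslt/xcsv.xsd"}

# The xcsv that step 1 would be expected to produce for the case 8 CSV.
XCSV = """\
<xcsv:comma-separated-values
    xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
    xcsv-version="1.0">
  <xcsv:row><xcsv:value>1st name</xcsv:value><xcsv:value>2nd name</xcsv:value></xcsv:row>
  <xcsv:row><xcsv:value>Sean</xcsv:value><xcsv:value>Durkin</xcsv:value></xcsv:row>
  <xcsv:row><xcsv:value>Peter</xcsv:value><xcsv:value>Pan</xcsv:value></xcsv:row>
</xcsv:comma-separated-values>
"""


def xcsv_to_people(root):
    """Map xcsv rows to person elements (hypothetical helper)."""
    people = ET.Element("people")
    rows = root.findall("xcsv:row", NS)
    for row in rows[1:]:  # the first row is the header, which is always present
        # Each data row is assumed to hold exactly two values, in header order.
        first, second = (v.text or "" for v in row.findall("xcsv:value", NS))
        ET.SubElement(people, "person",
                      {"first-name": first, "second-name": second})
    return people


people = xcsv_to_people(ET.fromstring(XCSV))
print(ET.tostring(people, encoding="unicode"))
```

The sketch relies on the guarantee stated for this application: the header row is always there and the columns keep their order, so the header can be skipped rather than inspected.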

Case 9 This use case demonstrates the XML encoding needed on the lexical level for & and < in raw data. No special encoding is required at the XML parser API level.

The CSV:

Character,Name
&,Ampersand
<,Less than

The equivalent schema-valid XML instance:

<xcsv:comma-separated-values
 xmlns:xcsv="http://seanbdurkin.id.au/xslt/xcsv.xsd"
 xmlns:xml="http://www.w3.org/XML/1998/namespace"
 xcsv-version="1.0">
 <xcsv:row>
 <xcsv:value>Character</xcsv:value> <xcsv:value>Name</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>&amp;</xcsv:value> <xcsv:value>Ampersand</xcsv:value>
 </xcsv:row>
 <xcsv:row>
 <xcsv:value>&lt;</xcsv:value> <xcsv:value>Less than</xcsv:value>
 </xcsv:row>
</xcsv:comma-separated-values>
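The difference between the lexical level and the parser API level can be seen in a small Python sketch. ElementTree is used here only as an example of an XML API, and the helper name `round_trip` is hypothetical.

```python
import xml.etree.ElementTree as ET

XCSV_NS = "http://seanbdurkin.id.au/xslt/xcsv.xsd"


def round_trip(raw):
    """Serialise one raw value and parse it back (hypothetical helper)."""
    value = ET.Element(f"{{{XCSV_NS}}}value")
    value.text = raw  # API level: the raw character, no escaping by hand
    lexical = ET.tostring(value, encoding="unicode")  # lexical level
    return lexical, ET.fromstring(lexical).text


for raw in ("&", "<"):
    lexical, parsed = round_trip(raw)
    # The serialised text carries the escaped form; the parser
    # returns the raw character again.
    print(lexical, parsed == raw)
```

The escaping is done by the serialiser and undone by the parser, so code that works through the API never has to deal with `&amp;` or `&lt;` itself.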