clarin.sru.fcs.xml.writer

class clarin.sru.fcs.xml.writer.FCSRecordXMLStreamWriter

Bases: object

This class provides several helper methods for writing records in the CLARIN-FCS record schema. These methods do not cover the full spectrum of all variations of records that are permitted by the CLARIN-FCS specification.

See also

  • CLARIN FCS specification, section “Operation searchRetrieve”

static startResource(writer: ContentHandler, pid: str | None = None, ref: str | None = None) → None

Write the start of a resource (i.e. the <Resource> element). Calls to this method need to be balanced with calls to the endResource method.

Parameters:
  • writer – the xml.sax.handler.ContentHandler to use

  • pid – the persistent identifier of this resource or None, if not applicable. Defaults to None.

  • ref – the reference of this resource or None, if not applicable. Defaults to None.

static endResource(writer: ContentHandler) → None

Write the end of a resource (i.e. the </Resource> element). Calls to this method need to be balanced with calls to the startResource method.

Parameters:

writer – the xml.sax.handler.ContentHandler to use
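The writer argument can be any xml.sax.handler.ContentHandler. The standard library's xml.sax.saxutils.XMLGenerator is a convenient choice because it serializes SAX events to a text stream. The stdlib-only sketch below mimics the balanced start/end pair for a resource. The helpers start_resource and end_resource are hypothetical stand-ins, not this module's code, and the namespace URI and fcs prefix are assumptions taken from the CLARIN-FCS specification.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesNSImpl

# Assumption: namespace URI and prefix of <Resource> per the CLARIN-FCS specification.
FCS_NS = "http://clarin.eu/fcs/resource"
FCS_PREFIX = "fcs"


def start_resource(writer, pid=None, ref=None):
    """Hypothetical stand-in for FCSRecordXMLStreamWriter.startResource."""
    values, qnames = {}, {}
    # Only emit the optional attributes that were actually given.
    for local, value in (("pid", pid), ("ref", ref)):
        if value is not None:
            values[(None, local)] = value
            qnames[(None, local)] = local
    writer.startElementNS((FCS_NS, "Resource"), None, AttributesNSImpl(values, qnames))


def end_resource(writer):
    """Hypothetical stand-in for FCSRecordXMLStreamWriter.endResource."""
    writer.endElementNS((FCS_NS, "Resource"), None)


out = io.StringIO()
gen = XMLGenerator(out, encoding="utf-8")  # a ContentHandler usable as "writer"
gen.startPrefixMapping(FCS_PREFIX, FCS_NS)
start_resource(gen, pid="hdl:1234/abc", ref="https://example.org/res/1")
end_resource(gen)  # balances the start_resource call above
gen.endPrefixMapping(FCS_PREFIX)
print(out.getvalue())
```

The pid and ref values are illustrative placeholders.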

static startResourceFragment(writer: ContentHandler, pid: str | None = None, ref: str | None = None) → None

Write the start of a resource fragment (i.e. the <ResourceFragment> element). Calls to this method need to be balanced with calls to the endResourceFragment method.

Parameters:
  • writer – the xml.sax.handler.ContentHandler to use

  • pid – the persistent identifier of this resource fragment or None, if not applicable. Defaults to None.

  • ref – the reference of this resource fragment or None, if not applicable. Defaults to None.

static endResourceFragment(writer: ContentHandler) → None

Write the end of a resource fragment (i.e. the </ResourceFragment> element). Calls to this method need to be balanced with calls to the startResourceFragment method.

Parameters:

writer – the xml.sax.handler.ContentHandler to use
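Because every start call must be matched by its end call, and a fragment is written inside its resource, a stack-tracking ContentHandler is a simple way to catch ordering mistakes in tests. BalanceChecker below is a hypothetical test helper, not part of this module. The events are written by hand to mimic a resource containing one fragment, and the namespace URI is an assumption from the CLARIN-FCS specification. In a real test the checker could be passed as the writer argument to the start/end methods.

```python
from xml.sax.handler import ContentHandler


class BalanceChecker(ContentHandler):
    """Hypothetical test helper that fails on unbalanced element events."""

    def __init__(self):
        super().__init__()
        self.stack = []     # names of the currently open elements
        self.max_depth = 0  # deepest nesting observed

    def _push(self, name):
        self.stack.append(name)
        self.max_depth = max(self.max_depth, len(self.stack))

    def _pop(self, name):
        if not self.stack or self.stack[-1] != name:
            raise ValueError(f"unbalanced end of {name!r}")
        self.stack.pop()

    # Handle both namespace-aware and plain SAX element events.
    def startElementNS(self, name, qname, attrs):
        self._push(name)

    def endElementNS(self, name, qname):
        self._pop(name)

    def startElement(self, name, attrs):
        self._push(name)

    def endElement(self, name):
        self._pop(name)

    def is_balanced(self):
        return not self.stack


# Assumption: resource namespace as defined by the CLARIN-FCS specification.
FCS_NS = "http://clarin.eu/fcs/resource"

# Hand-written events mimicking startResource, startResourceFragment and the two end calls.
checker = BalanceChecker()
checker.startElementNS((FCS_NS, "Resource"), None, None)
checker.startElementNS((FCS_NS, "ResourceFragment"), None, None)
checker.endElementNS((FCS_NS, "ResourceFragment"), None)
checker.endElementNS((FCS_NS, "Resource"), None)
print(checker.is_balanced(), checker.max_depth)  # True 2
```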

static startDataView(writer: ContentHandler, mimetype: str) → None

Write the start of a data view (i.e. the <DataView> element). Calls to this method need to be balanced with calls to the endDataView method.

Parameters:
  • writer – the xml.sax.handler.ContentHandler to use

  • mimetype – the MIME type of this data view

static endDataView(writer: ContentHandler) → None

Write the end of a data view (i.e. the </DataView> element). Calls to this method need to be balanced with calls to the startDataView method.

Parameters:

writer – the xml.sax.handler.ContentHandler to use
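In the CLARIN-FCS specification the MIME type is carried by the type attribute of the DataView element, for example application/x-clarin-fcs-hits+xml for HITS or application/x-clarin-fcs-adv+xml for the Advanced Data View. The stdlib sketch below writes an empty data view with XMLGenerator, using plain startElement with a prefixed name for brevity. The namespace, the attribute name, and the MIME string are assumptions taken from the specification, not read from this module.

```python
import io
from xml.sax.saxutils import XMLGenerator

# Assumptions from the CLARIN-FCS specification (not read from this module).
FCS_NS = "http://clarin.eu/fcs/resource"
HITS_MIMETYPE = "application/x-clarin-fcs-hits+xml"

out = io.StringIO()
gen = XMLGenerator(out, encoding="utf-8")
# Roughly what startDataView(gen, HITS_MIMETYPE) is expected to open ...
gen.startElement("fcs:DataView", {"xmlns:fcs": FCS_NS, "type": HITS_MIMETYPE})
# ... the data view payload (e.g. a HITS result) would be written here ...
gen.endElement("fcs:DataView")  # ... and what endDataView(gen) closes.
print(out.getvalue())
```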

static writeKWICDataView(writer: ContentHandler, left: str | None, keyword: str, right: str | None) → None

[Deprecated] Use the HITS data view instead! Convenience method to write a KWIC data view. It automatically performs the calls to startDataView and endDataView.

Parameters:
  • writer – the xml.sax.handler.ContentHandler to use

  • left – the left context of the KWIC or None if not applicable

  • keyword – the keyword of the KWIC

  • right – the right context of the KWIC or None if not applicable
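For orientation, the legacy KWIC payload places the optional left and right contexts in c elements around a kw keyword element. The ElementTree sketch below builds that shape. The namespace URI, element names, and the omission of absent contexts are assumptions based on the legacy FCS 1.0 format, not this module's output; new code should prefer the HITS data view, as the deprecation note says.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: legacy FCS 1.0 KWIC namespace and element names.
KWIC_NS = "http://clarin.eu/fcs/1.0/kwic"
ET.register_namespace("kwic", KWIC_NS)


def build_kwic(left: Optional[str], keyword: str, right: Optional[str]) -> ET.Element:
    """Hypothetical illustration of the KWIC structure; not this module's code."""
    root = ET.Element(f"{{{KWIC_NS}}}kwic")
    if left is not None:  # the left context is optional
        ET.SubElement(root, f"{{{KWIC_NS}}}c", type="left").text = left
    ET.SubElement(root, f"{{{KWIC_NS}}}kw").text = keyword
    if right is not None:  # the right context is optional
        ET.SubElement(root, f"{{{KWIC_NS}}}c", type="right").text = right
    return root


print(ET.tostring(build_kwic("The quick", "fox", None), encoding="unicode"))
```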

static writeSingleHitHitsDataView(writer: ContentHandler, left: str | None, hit: str, right: str | None) → None

Convenience method to write a simple HITS data view containing a single hit. It automatically performs the calls to startDataView and endDataView.

Parameters:
  • writer – the xml.sax.handler.ContentHandler to use

  • left – the left context of the hit or None if not applicable

  • hit – the actual hit, which will be highlighted

  • right – the right context of the hit or None if not applicable
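A HITS payload is mixed content: plain text with the highlighted part wrapped in a hit element. The ElementTree sketch below shows that shape for one hit. It assumes the hits namespace and the Result/Hit element names from the CLARIN-FCS HITS data view specification, and it joins the parts with single spaces; whether this module inserts such separators is also an assumption, so compare with real output if spacing matters.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: HITS data view namespace and element names from the FCS specification.
HITS_NS = "http://clarin.eu/fcs/dataview/hits"
ET.register_namespace("hits", HITS_NS)


def build_single_hit(left: Optional[str], hit: str, right: Optional[str]) -> ET.Element:
    """Hypothetical sketch: left context, highlighted hit, right context."""
    result = ET.Element(f"{{{HITS_NS}}}Result")
    # Text before the first child element is stored in .text ...
    result.text = f"{left} " if left is not None else None
    marker = ET.SubElement(result, f"{{{HITS_NS}}}Hit")
    marker.text = hit
    # ... and text after a child element is stored in that child's .tail.
    marker.tail = f" {right}" if right is not None else None
    return result


serialized = ET.tostring(build_single_hit("The quick brown", "fox", "jumps"), encoding="unicode")
print(serialized)
```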

static writeHitsDataView(writer: ContentHandler, text: str, hits: List[Tuple[int, int]], second_is_length: bool) → None

Convenience method to write a HITS data view that may contain several hits. It automatically performs the calls to startDataView and endDataView.

Parameters:
  • writer – the xml.sax.handler.ContentHandler to use

  • text – the text content of the data view, in which the hits are marked

  • hits – a list of (start offset, end offset or length) tuples, one per hit marker in the text content

  • second_is_length – if True, the second element of each tuple in hits is interpreted as a length; if False, it is interpreted as an end offset
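The hits tuples are offsets into text. The helper below is a hypothetical, pure-Python illustration of how second_is_length changes their interpretation: it normalizes every (start, second) pair to a (start, end) range and splits text into plain and highlighted pieces. It assumes zero-based, end-exclusive offsets (Python slice semantics) and non-overlapping hits; the documentation above does not state this module's exact conventions.

```python
from typing import List, Tuple


def split_hits(text: str, hits: List[Tuple[int, int]],
               second_is_length: bool) -> List[Tuple[str, bool]]:
    """Return (piece, is_hit) pairs covering all of ``text`` (hypothetical helper)."""
    # Normalize every tuple to a (start, end) pair, ordered by start offset.
    ranges = sorted(
        (start, start + second if second_is_length else second) for start, second in hits
    )
    pieces: List[Tuple[str, bool]] = []
    pos = 0
    for start, end in ranges:
        if not (pos <= start <= end <= len(text)):
            raise ValueError(f"invalid or overlapping hit ({start}, {end})")
        if start > pos:
            pieces.append((text[pos:start], False))  # plain text before the hit
        pieces.append((text[start:end], True))      # the highlighted hit
        pos = end
    if pos < len(text):
        pieces.append((text[pos:], False))          # trailing plain text
    return pieces


text = "The quick brown fox jumps"
# The same two hits ("quick" and "fox"), given once as lengths and once as end offsets.
print(split_hits(text, [(4, 5), (16, 3)], second_is_length=True))
print(split_hits(text, [(4, 9), (16, 19)], second_is_length=False))
```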

static writeResourceWithSingleHitHitsDataView(writer: ContentHandler, pid: str | None, ref: str | None, left: str | None, hit: str, right: str | None)

Convenience method to write a resource containing a simple HITS data view with a single hit. It automatically performs the calls to startResource and endResource.

The following code (arguments omitted) would accomplish the same result:

...
FCSRecordXMLStreamWriter.startResource(...)
FCSRecordXMLStreamWriter.writeSingleHitHitsDataView(...)
FCSRecordXMLStreamWriter.endResource(...)
...

Parameters:
  • writer – the xml.sax.handler.ContentHandler to use

  • pid – the persistent identifier of this resource or None, if not applicable.

  • ref – the reference of this resource or None, if not applicable.

  • left – the left context of the hit or None if not applicable

  • hit – the actual hit, which will be highlighted

  • right – the right context of the hit or None if not applicable

Raises:

TypeError – if writer is None

static writeResourceWithHitsDataView(writer: ContentHandler, pid: str | None, ref: str | None, text: str, hits: List[Tuple[int, int]], second_is_length: bool)

Convenience method to write a resource containing a HITS data view that may contain several hits. It automatically performs the calls to startResource and endResource.

The following code (arguments omitted) would accomplish the same result:

...
FCSRecordXMLStreamWriter.startResource(...)
FCSRecordXMLStreamWriter.writeHitsDataView(...)
FCSRecordXMLStreamWriter.endResource(...)
...

Parameters:
  • writer – the xml.sax.handler.ContentHandler to use

  • pid – the persistent identifier of this resource or None, if not applicable.

  • ref – the reference of this resource or None, if not applicable.

  • text – the text content of the data view, in which the hits are marked

  • hits – a list of (start offset, end offset or length) tuples, one per hit marker in the text content

  • second_is_length – if True, the second element of each tuple in hits is interpreted as a length; if False, it is interpreted as an end offset

Raises:

TypeError – if writer is None

static writeResourceWithKWICDataView(writer: ContentHandler, pid: str | None, ref: str | None, left: str | None, keyword: str, right: str | None) → None

[Deprecated] Convenience method for writing a record with a KWIC data view.

The following code (arguments omitted) would accomplish the same result:

...
FCSRecordXMLStreamWriter.startResource(...)
FCSRecordXMLStreamWriter.writeKWICDataView(...)
FCSRecordXMLStreamWriter.endResource(...)
...

Parameters:
  • writer – the xml.sax.handler.ContentHandler to use

  • pid – the persistent identifier of this resource or None, if not applicable.

  • ref – the reference of this resource or None, if not applicable.

  • left – the left context of the KWIC or None if not applicable

  • keyword – the keyword of the KWIC

  • right – the right context of the KWIC or None if not applicable

Note

Only use this method if you need compatibility with legacy FCS applications.

static writeResourceWithHitsDataViewLegacy(writer: ContentHandler, pid: str | None, ref: str | None, left: str | None, hit: str, right: str | None) → None

[Deprecated] Convenience method for writing a record with both a HITS and a KWIC data view. This method is intended for applications that want to ensure compatibility with legacy CLARIN-FCS clients.

The following code (arguments omitted) would accomplish the same result:

...
FCSRecordXMLStreamWriter.startResource(...)
FCSRecordXMLStreamWriter.writeSingleHitHitsDataView(...)
FCSRecordXMLStreamWriter.writeKWICDataView(...)
FCSRecordXMLStreamWriter.endResource(...)
...

Parameters:
  • writer – the xml.sax.handler.ContentHandler to use

  • pid – the persistent identifier of this resource or None, if not applicable.

  • ref – the reference of this resource or None, if not applicable.

  • left – the left context of the hit or None if not applicable

  • hit – the actual hit, which will be highlighted

  • right – the right context of the hit or None if not applicable

Note

Only use this method if you need compatibility with legacy FCS applications.

class clarin.sru.fcs.xml.writer.SpanOffsetUnit(value)

Bases: str, Enum

An enumeration of the units that can be used for span offsets in an Advanced Data View.

ITEM = 'item'
TIMESTAMP = 'timestamp'
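Because SpanOffsetUnit mixes in str, its members compare equal to their plain string values, and a member can be looked up from a value such as one read from configuration. The snippet re-declares the enumeration exactly as listed above so that it runs without this package installed.

```python
from enum import Enum


class SpanOffsetUnit(str, Enum):
    """Re-declaration of the enumeration shown above, for a standalone demo."""

    ITEM = "item"
    TIMESTAMP = "timestamp"


unit = SpanOffsetUnit("timestamp")       # lookup by value
print(unit is SpanOffsetUnit.TIMESTAMP)  # True
print(unit == "timestamp")               # True, thanks to the str mixin
print(unit.value)                        # timestamp
```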

class clarin.sru.fcs.xml.writer.Segment(id: str | int, start: int, end: int, ref: str | None = None)

Bases: object

id: str | int
start: int
end: int
ref: str | None = None

class clarin.sru.fcs.xml.writer.Span(segment: clarin.sru.fcs.xml.writer.Segment, value: str | None, altValue: str | None, highlight: str | int | None)

Bases: object

segment: Segment
value: str | None
altValue: str | None
highlight: str | int | None
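Segment and Span are plain data holders: a segment is an identified offset range, and a span attaches a value (plus an optional alternate value and highlight) to a segment; the layer a span belongs to is given separately through AdvancedDataViewWriter.addSpan. The documented signatures look like those generated for dataclasses, so the standalone re-declaration below assumes that. The example values are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Segment:
    """Assumed dataclass mirroring the documented Segment fields."""

    id: Union[str, int]
    start: int
    end: int
    ref: Optional[str] = None


@dataclass
class Span:
    """Assumed dataclass mirroring the documented Span fields."""

    segment: Segment
    value: Optional[str]
    altValue: Optional[str]
    highlight: Union[str, int, None]


# One offset range can be shared by spans that belong to different layers.
seg = Segment(id=1, start=4, end=9)
word = Span(segment=seg, value="quick", altValue=None, highlight=1)
pos = Span(segment=seg, value="ADJ", altValue=None, highlight=None)
print(word.segment is pos.segment)  # True
```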

class clarin.sru.fcs.xml.writer.AdvancedDataViewWriter(unit: SpanOffsetUnit)

Bases: object

Helper class for serializing Advanced Data Views. An instance can be reused to write several data views (call reset() between them), but it is not thread-safe. This helper can also serialize HITS Data Views.

[Constructor]

Parameters:

unit – the unit to be used for span offsets

Raises:

TypeError – if unit is None

unit

the unit to be used for span offsets

reset()

Reset the writer for writing a new data view (instance).

addSpan(layer_id: str, start: int, end: int, value: str | None = None, altValue: str | None = None, highlight: int | None = -1)

Add a span to the layer identified by layer_id.

Parameters:
  • layer_id – the span’s layer id

  • start – the span’s start offset

  • end – the span’s end offset

  • value – the span’s content value or None

  • altValue – the span’s alternate value or None

  • highlight – the span’s highlight group, or -1 (the default) if the span is not highlighted
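To make the roles of layer_id, start, and end concrete, here is one plausible way an Advanced Data View writer could record spans: spans are grouped per layer, and spans with the same (start, end) range share one segment. This is a hypothetical sketch (MiniAdvWriter, add_span, and its attributes are invented for illustration), not a description of this class's internals; altValue is left out for brevity, and treating -1 as "not highlighted" is an assumption based on the default value.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class MiniAdvWriter:
    """Hypothetical sketch of span bookkeeping; not the library implementation."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # (start, end) -> segment id, in order of first use
        self.segments: Dict[Tuple[int, int], str] = {}
        # layer id -> list of (segment id, value, highlight)
        self.layers: Dict[str, List[Tuple[str, Optional[str], Optional[int]]]] = defaultdict(list)

    def add_span(self, layer_id: str, start: int, end: int,
                 value: Optional[str] = None, highlight: Optional[int] = -1) -> None:
        if start < 0 or end < start:
            raise ValueError("invalid span offsets")
        # Reuse the segment id if this offset range was seen before.
        seg_id = self.segments.setdefault((start, end), f"s{len(self.segments) + 1}")
        # Assumption: -1 (the default) means "not highlighted".
        self.layers[layer_id].append((seg_id, value, None if highlight == -1 else highlight))


w = MiniAdvWriter()
w.add_span("word", 0, 3, "The")
w.add_span("word", 4, 9, "quick", highlight=1)
w.add_span("pos", 4, 9, "ADJ")
print(w.segments)  # {(0, 3): 's1', (4, 9): 's2'}
print(dict(w.layers))
```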

writeAdvancedDataView(writer: ContentHandler)

Write the Advanced Data View to the output stream.

Parameters:

writer – the xml.sax.handler.ContentHandler to use
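For orientation, the Advanced Data View format of the CLARIN-FCS 2.0 specification lists every segment once and then, per layer, the spans that point at those segments by id. The ElementTree sketch below produces that shape for a small hand-made example. The namespace URI, the element and attribute names, and the unit attribute on the root are assumptions based on the specification, not output captured from writeAdvancedDataView; highlights are omitted because their exact serialization is not documented here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Assumption: namespace and element/attribute names of the FCS 2.0 Advanced Data View.
ADV_NS = "http://clarin.eu/fcs/dataview/advanced"
ET.register_namespace("adv", ADV_NS)


def q(local: str) -> str:
    """Qualify a local name with the ADV namespace (ElementTree's {uri}name form)."""
    return f"{{{ADV_NS}}}{local}"


def build_advanced(unit: str,
                   segments: Dict[str, Tuple[int, int]],
                   layers: Dict[str, List[Tuple[str, str]]]) -> ET.Element:
    """Hypothetical sketch: segments are listed once, spans refer to them by id."""
    root = ET.Element(q("Advanced"), unit=unit)
    segs = ET.SubElement(root, q("Segments"))
    for seg_id, (start, end) in segments.items():
        ET.SubElement(segs, q("Segment"), id=seg_id, start=str(start), end=str(end))
    lays = ET.SubElement(root, q("Layers"))
    for layer_id, spans in layers.items():
        layer = ET.SubElement(lays, q("Layer"), id=layer_id)
        for seg_ref, value in spans:
            ET.SubElement(layer, q("Span"), ref=seg_ref).text = value
    return root


adv = build_advanced(
    "item",
    {"s1": (0, 3), "s2": (4, 9)},
    {"word": [("s1", "The"), ("s2", "quick")], "pos": [("s2", "ADJ")]},
)
print(ET.tostring(adv, encoding="unicode"))
```

Real layer ids are typically URIs; "word" and "pos" are placeholders.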

writeHitsDataView(writer: ContentHandler, layer_id: str)

Convenience method to write a HITS Data View.

Parameters:
  • writer – the xml.sax.handler.ContentHandler to use

  • layer_id – the layer id of the layer to be serialized as HITS Data View