clarin.sru.fcs.xml.writer
- class clarin.sru.fcs.xml.writer.FCSRecordXMLStreamWriter
Bases: object
This class provides several helper methods for writing records in the CLARIN-FCS record schema. These methods do not cover the full spectrum of all variations of records that are permitted by the CLARIN-FCS specification.
See also
CLARIN FCS specification, section “Operation searchRetrieve”
- static startResource(writer: ContentHandler, pid: str | None = None, ref: str | None = None) -> None
Write the start of a resource (i.e. the <Resource> element). Calls to this method need to be balanced with calls to the endResource method.
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
pid – the persistent identifier of this resource or None, if not applicable. Defaults to None.
ref – the reference of this resource or None, if not applicable. Defaults to None.
- static endResource(writer: ContentHandler) -> None
Write the end of a resource (i.e. the </Resource> element). Calls to this method need to be balanced with calls to the startResource method.
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
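The balancing requirement for start and end calls can be illustrated with the standard library alone. The following is a minimal sketch, not the library's implementation: `element` and `render_resource` are hypothetical helpers, XML namespaces are omitted for brevity, and `xml.sax.saxutils.XMLGenerator` stands in as the ContentHandler. A context manager is one way to make sure that every start tag gets its matching end tag.

```python
import io
from contextlib import contextmanager
from xml.sax.saxutils import XMLGenerator


@contextmanager
def element(writer, name, attrs=None):
    """Hypothetical helper: emit a start tag and always emit the matching end tag."""
    writer.startElement(name, attrs or {})
    try:
        yield writer
    finally:
        # Runs even if the body raises, so start and end stay balanced.
        writer.endElement(name)


def render_resource(pid=None, ref=None):
    """Serialize an empty <Resource> element (no namespaces) and return it as a string."""
    out = io.StringIO()
    writer = XMLGenerator(out, encoding="utf-8")
    # Only emit attributes that are set, following the
    # "None if not applicable" convention of pid and ref.
    attrs = {k: v for k, v in (("pid", pid), ("ref", ref)) if v is not None}
    writer.startDocument()
    with element(writer, "Resource", attrs):
        pass  # data views or resource fragments would be written here
    writer.endDocument()
    return out.getvalue()


print(render_resource(pid="hdl:1234"))
```

With the real class, the same pattern would wrap a startResource call and an endResource call around the body.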
- static startResourceFragment(writer: ContentHandler, pid: str | None = None, ref: str | None = None) -> None
Write the start of a resource fragment (i.e. the <ResourceFragment> element). Calls to this method need to be balanced with calls to the endResourceFragment method.
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
pid – the persistent identifier of this resource or None, if not applicable. Defaults to None.
ref – the reference of this resource or None, if not applicable. Defaults to None.
- static endResourceFragment(writer: ContentHandler) -> None
Write the end of a resource fragment (i.e. the </ResourceFragment> element). Calls to this method need to be balanced with calls to the startResourceFragment method.
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
- static startDataView(writer: ContentHandler, mimetype: str) -> None
Write the start of a data view (i.e. the <DataView> element). Calls to this method need to be balanced with calls to the endDataView method.
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
mimetype – the MIME type of this data view
- static endDataView(writer: ContentHandler) -> None
Write the end of a data view (i.e. the </DataView> element). Calls to this method need to be balanced with calls to the startDataView method.
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
- static writeKWICDataView(writer: ContentHandler, left: str | None, keyword: str, right: str | None) -> None
[Deprecated] Use the HITS data view instead! Convenience method to write a KWIC data view. It automatically performs the calls to startDataView and endDataView.
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
left – the left context of the KWIC or None if not applicable
keyword – the keyword of the KWIC
right – the right context of the KWIC or None if not applicable
- static writeSingleHitHitsDataView(writer: ContentHandler, left: str | None, hit: str, right: str | None) -> None
Convenience method to write a simple HITS data view. It automatically performs the calls to startDataView and endDataView.
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
left – the left context of the hit or None if not applicable
hit – the actual hit, which will be highlighted
right – the right context of the hit or None if not applicable
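To make the roles of left, hit and right concrete, here is a rough sketch of the mixed content that a single-hit HITS data view amounts to. It is not the library's code: `write_single_hit` is a hypothetical function, namespaces are left out, and the element names and MIME type are assumptions based on a reading of the CLARIN-FCS specification. Whether the library inserts any whitespace between context and hit is not covered here.

```python
import io
from xml.sax.saxutils import XMLGenerator

# Assumed MIME type of the HITS data view (taken from the FCS specification).
HITS_MIMETYPE = "application/x-clarin-fcs-hits+xml"


def write_single_hit(writer, left, hit, right):
    """Hypothetical sketch: one highlighted hit with optional left/right context."""
    writer.startElement("DataView", {"type": HITS_MIMETYPE})
    writer.startElement("Result", {})
    if left:  # skipped when None (not applicable)
        writer.characters(left)
    writer.startElement("Hit", {})
    writer.characters(hit)
    writer.endElement("Hit")
    if right:  # skipped when None (not applicable)
        writer.characters(right)
    writer.endElement("Result")
    writer.endElement("DataView")


out = io.StringIO()
write_single_hit(XMLGenerator(out), "the ", "quick", " brown fox")
print(out.getvalue())
```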
- static writeHitsDataView(writer: ContentHandler, text: str, hits: List[Tuple[int, int]], second_is_length: bool) -> None
Convenience method to write a simple HITS data view. It automatically performs the calls to startDataView and endDataView.
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
text – the text content of the hit
hits – a list of tuples (start offset, end offset or length) for the hit markers in the text content
second_is_length – if True, the second element of each tuple in the hits list is interpreted as a length; if False, it is interpreted as an end offset
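The two interpretations of the hits tuples can be sketched as follows. `split_by_hits` is a hypothetical helper, not part of the library. It assumes that end offsets are exclusive (as in Python slicing) and that hits do not overlap; the library may handle these cases differently.

```python
from typing import List, Tuple


def split_by_hits(
    text: str, hits: List[Tuple[int, int]], second_is_length: bool
) -> List[Tuple[str, bool]]:
    """Split text into (fragment, is_hit) pieces according to the hit markers."""
    pieces: List[Tuple[str, bool]] = []
    pos = 0
    for start, second in sorted(hits):
        # Normalize both conventions to an exclusive end offset.
        end = start + second if second_is_length else second
        if not (pos <= start < end <= len(text)):
            raise ValueError(f"invalid or overlapping hit: ({start}, {second})")
        if start > pos:
            pieces.append((text[pos:start], False))  # plain text before the hit
        pieces.append((text[start:end], True))  # the highlighted hit
        pos = end
    if pos < len(text):
        pieces.append((text[pos:], False))  # trailing plain text
    return pieces


text = "the quick brown fox"
# The same two hits, once as (start, length) and once as (start, end).
as_length = split_by_hits(text, [(4, 5), (16, 3)], second_is_length=True)
as_end = split_by_hits(text, [(4, 9), (16, 19)], second_is_length=False)
print(as_length == as_end)
```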
- static writeResourceWithSingleHitHitsDataView(writer: ContentHandler, pid: str | None, ref: str | None, left: str | None, hit: str, right: str | None)
Convenience method to write a simple HITS data view. It automatically performs the calls to startResource and endResource.
The following code (arguments omitted) would accomplish the same result:
...
FCSRecordXMLStreamWriter.startResource(...)
FCSRecordXMLStreamWriter.writeSingleHitHitsDataView(...)
FCSRecordXMLStreamWriter.endResource(...)
...
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
pid – the persistent identifier of this resource or None, if not applicable.
ref – the reference of this resource or None, if not applicable.
left – the left context of the hit or None if not applicable
hit – the actual hit, which will be highlighted
right – the right context of the hit or None if not applicable
- Raises:
TypeError – if writer is None
- static writeResourceWithHitsDataView(writer: ContentHandler, pid: str | None, ref: str | None, text: str, hits: List[Tuple[int, int]], second_is_length: bool)
Convenience method to write a simple HITS data view. It automatically performs the calls to startResource and endResource.
The following code (arguments omitted) would accomplish the same result:
...
FCSRecordXMLStreamWriter.startResource(...)
FCSRecordXMLStreamWriter.writeHitsDataView(...)
FCSRecordXMLStreamWriter.endResource(...)
...
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
pid – the persistent identifier of this resource or None, if not applicable.
ref – the reference of this resource or None, if not applicable.
text – the text content of the hit
hits – a list of tuples (start offset, end offset or length) for the hit markers in the text content
second_is_length – if True, the second element of each tuple in the hits list is interpreted as a length; if False, it is interpreted as an end offset
- Raises:
TypeError – if writer is None
- static writeResourceWithKWICDataView(writer: ContentHandler, pid: str | None, ref: str | None, left: str | None, keyword: str, right: str | None) -> None
[Deprecated] Convenience method for writing a record with a KWIC data view.
The following code (arguments omitted) would accomplish the same result:
...
FCSRecordXMLStreamWriter.startResource(...)
FCSRecordXMLStreamWriter.writeKWICDataView(...)
FCSRecordXMLStreamWriter.endResource(...)
...
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
pid – the persistent identifier of this resource or None, if not applicable.
ref – the reference of this resource or None, if not applicable.
left – the left context of the KWIC or None if not applicable
keyword – the keyword of the KWIC
right – the right context of the KWIC or None if not applicable
Note
Use this method only if you need compatibility with legacy FCS applications.
- static writeResourceWithHitsDataViewLegacy(writer: ContentHandler, pid: str | None, ref: str | None, left: str | None, hit: str, right: str | None) -> None
[Deprecated] Convenience method for writing a record with a HITS and a KWIC data view. This method is intended for applications that want to ensure compatibility with legacy CLARIN-FCS clients.
The following code (arguments omitted) would accomplish the same result:
...
FCSRecordXMLStreamWriter.startResource(...)
FCSRecordXMLStreamWriter.writeSingleHitHitsDataView(...)
FCSRecordXMLStreamWriter.writeKWICDataView(...)
FCSRecordXMLStreamWriter.endResource(...)
...
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
pid – the persistent identifier of this resource or None, if not applicable.
ref – the reference of this resource or None, if not applicable.
left – the left context of the hit or None if not applicable
hit – the actual hit, which will be highlighted
right – the right context of the hit or None if not applicable
Note
Use this method only if you need compatibility with legacy FCS applications.
- class clarin.sru.fcs.xml.writer.SpanOffsetUnit(value)
An enumeration.
- ITEM = 'item'
- TIMESTAMP = 'timestamp'
- class clarin.sru.fcs.xml.writer.Segment(id: str | int, start: int, end: int, ref: str | NoneType = None)
Bases: object
- class clarin.sru.fcs.xml.writer.Span(segment: clarin.sru.fcs.xml.writer.Segment, value: str | NoneType, altValue: str | NoneType, highlight: str | int | NoneType)
Bases: object
- class clarin.sru.fcs.xml.writer.AdvancedDataViewWriter(unit: SpanOffsetUnit)
Bases: object
Helper class for serializing Advanced Data Views. It can be used for writing more than once, but it is not thread-safe. This helper can also serialize HITS Data Views.
[Constructor]
- Parameters:
unit – the unit to be used for span offsets
- Raises:
TypeError – if unit is None
- unit
the unit to be used for span offsets
- addSpan(layer_id: str, start: int, end: int, value: str | None = None, altValue: str | None = None, highlight: int | None = -1)
Add a span.
- Parameters:
layer_id – the span’s layer id
start – the span’s start offset
end – the span’s end offset
value – the span’s content value or None. Defaults to None.
altValue – the span’s alternate value or None. Defaults to None.
highlight – the span’s highlight value or None. Defaults to -1.
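The Segment and Span classes suggest a data model in which spans from different layers refer to offset ranges. One way such a model could work is sketched below, under the assumption (not confirmed by this page) that spans with identical start and end offsets share a single segment. `SketchSegment`, `SketchSpan` and `SpanCollector` are hypothetical stand-ins, not the library's classes, and the short layer ids are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SketchSegment:
    """Hypothetical offset range, loosely modelled on Segment(id, start, end)."""
    id: str
    start: int
    end: int


@dataclass
class SketchSpan:
    """Hypothetical annotation value attached to a segment."""
    segment: SketchSegment
    value: Optional[str]


class SpanCollector:
    """Hypothetical stand-in; collects spans per layer and reuses segments."""

    def __init__(self) -> None:
        self.segments: Dict[Tuple[int, int], SketchSegment] = {}
        self.layers: Dict[str, List[SketchSpan]] = {}

    def add_span(self, layer_id: str, start: int, end: int,
                 value: Optional[str] = None) -> SketchSpan:
        if start < 0 or end < start:
            raise ValueError("offsets must satisfy 0 <= start <= end")
        key = (start, end)
        segment = self.segments.get(key)
        if segment is None:
            # First span for this offset range: create a new segment.
            segment = SketchSegment(f"s{len(self.segments)}", start, end)
            self.segments[key] = segment
        span = SketchSpan(segment, value)
        self.layers.setdefault(layer_id, []).append(span)
        return span


collector = SpanCollector()
word = collector.add_span("word", 0, 3, "The")
pos = collector.add_span("pos", 0, 3, "DET")
print(word.segment is pos.segment)
```

Sharing segments keeps the annotations of different layers aligned, since a token and its part-of-speech tag then point at the same offset range.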
- writeAdvancedDataView(writer: ContentHandler)
Write the Advanced Data View to the output stream.
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
- writeHitsDataView(writer: ContentHandler, layer_id: str)
Convenience method to write a HITS Data View.
- Parameters:
writer – the xml.sax.handler.ContentHandler to use
layer_id – the layer id of the layer to be serialized as HITS Data View