Usage
A brief tour!
The plugin automatically detects uploaded resources that can be added to the datastore, based on the resource's format. Currently the accepted formats are:
- CSV - csv, application/csv
- TSV - tsv
- XLS (old Excel) - xls, application/vnd.ms-excel
- XLSX (new Excel) - xlsx, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
If a resource uses one of these formats, an attempt will be made to add the uploaded file, or the file at the resource's URL, to the datastore. Note that only the first sheet of a multi-sheet XLS or XLSX file will be processed.
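As a rough illustration of the format check described above, the following Python sketch tests a resource's format string against the listed formats. The helper name `is_datastore_format` and the case-insensitive, whitespace-tolerant matching are assumptions for illustration, not the plugin's own code.

```python
from typing import Optional

# Format strings taken from the list above. The helper below is hypothetical.
ACCEPTED_FORMATS = {
    "csv", "application/csv",
    "tsv",
    "xls", "application/vnd.ms-excel",
    "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def is_datastore_format(resource_format: Optional[str]) -> bool:
    """Return True if a resource's format suggests it can be ingested (assumed matching rules)."""
    if not resource_format:
        return False
    # Assumption: matching ignores case and surrounding whitespace.
    return resource_format.strip().lower() in ACCEPTED_FORMATS


print(is_datastore_format("CSV"), is_datastore_format("pdf"))  # prints: True False
```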
Adding data to the datastore is accomplished in two steps:
- Ingesting the records into MongoDB. One document per unique record ID stores all versions of that record, and the documents for a given resource are stored in a collection named after the resource's ID. For more information on the structure of these documents, see the Splitgill repository.
- Indexing the documents from MongoDB into Elasticsearch. A single index holds all versions of the records, with one Elasticsearch document created per version of each record. The index is named after the resource's ID, with the configured prefix prepended. For more information on the structure of these indexed documents, see the Splitgill repository.
Ingestion and indexing are run in the background using CKAN's job queue.
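The two-step layout can be modelled in plain Python, with dictionaries standing in for the MongoDB collection and the Elasticsearch documents. This is a simplified, hypothetical model: the real document structures are defined by Splitgill, and the field names (`_id`, `versions`, `id`, `version`, `data`), the record data, and the `example-` prefix are made up here.

```python
from typing import Dict, List


def index_name(prefix: str, resource_id: str) -> str:
    # The index is named after the resource ID with the configured prefix prepended.
    return f"{prefix}{resource_id}"


def ingest(collection: Dict[str, dict], record_id: str, version: int, data: dict) -> None:
    # Step 1 (simplified): one document per unique record ID holds every version.
    doc = collection.setdefault(record_id, {"_id": record_id, "versions": {}})
    doc["versions"][version] = data


def to_index_docs(collection: Dict[str, dict]) -> List[dict]:
    # Step 2 (simplified): one search document per version of each record.
    docs = []
    for record_id, doc in collection.items():
        for version, data in sorted(doc["versions"].items()):
            docs.append({"id": record_id, "version": version, "data": data})
    return docs


# Usage: two versions of record "r1" and one version of record "r2".
collection: Dict[str, dict] = {}
ingest(collection, "r1", 1, {"genus": "Quercus"})
ingest(collection, "r1", 2, {"genus": "Fagus"})
ingest(collection, "r2", 1, {"genus": "Betula"})
print(len(collection), len(to_index_docs(collection)), index_name("example-", "abc123"))
# prints: 2 3 example-abc123
```

Keeping every version in one stored document while emitting one indexed document per version is what lets a search target the state of the data at any given version.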
Once data has been added to the datastore, it can be searched using the datastore_search action or the more advanced datastore_search_raw action.
The datastore_search action closely mirrors the default CKAN datastore action of the same name.
The datastore_search_raw action allows users to query the datastore using raw Elasticsearch queries, unlocking the full range of features it provides.
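Both actions can be called through CKAN's standard action API, which takes a POST to `/api/3/action/<action name>` with a JSON body. The sketch below only builds (and does not send) a datastore_search_raw request using Python's standard library. It assumes the action takes the resource ID as `resource_id` and the raw Elasticsearch query as `search`; check the action documentation for the exact parameters. The site URL, resource ID, and the `data.genus` field path are placeholders.

```python
import json
import urllib.request
from typing import Optional


def build_search_raw_request(ckan_url: str, resource_id: str, query: dict,
                             api_token: Optional[str] = None) -> urllib.request.Request:
    # Assumed parameter names: "resource_id" and "search" (the raw Elasticsearch query).
    payload = {"resource_id": resource_id, "search": query}
    request = urllib.request.Request(
        url=f"{ckan_url.rstrip('/')}/api/3/action/datastore_search_raw",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if api_token:
        # CKAN API tokens are sent in the Authorization header.
        request.add_header("Authorization", api_token)
    return request


# Placeholder values; "data.genus" is a hypothetical field path.
req = build_search_raw_request(
    "https://ckan.example.org/",
    "00000000-0000-0000-0000-000000000000",
    {"query": {"match": {"data.genus": "Quercus"}}, "size": 10},
)
# To actually send it: urllib.request.urlopen(req)
```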
Actions
All of this extension's actions are fully documented, including all parameters and results.
Commands
vds
- initdb: ensure the tables needed by this plugin exist.
  ckan -c $CONFIG_FILE versioned-datastore initdb
- reindex: reindex either a specific resource or all resources (omit the resource ID to reindex all resources).
  ckan -c $CONFIG_FILE versioned-datastore reindex $OPTIONAL_RESOURCE_ID
Interfaces
IVersionedDatastore
This is the most general interface.
Here is a brief overview of its functions:
- datastore_modify_data_dict - allows modification of the data dict before it is validated and used to create the search object
- datastore_modify_search - allows modification of the search before it is made. This is analogous to IDatastore.datastore_search, but instead of a query dict, an elasticsearch-dsl Search object is passed around
- datastore_modify_result - allows modification of the result after the search
- datastore_modify_fields - allows modification of the field definitions before they are returned with the results of a datastore_search
- datastore_modify_index_doc - allows modification of a resource's data during indexing
- datastore_is_read_only_resource - allows implementors to designate certain resources as read-only
- datastore_after_indexing - allows implementors to hook into the completion of an indexing task
See the interface definition in this plugin for more details about these functions.
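Implementing one of these hooks means defining a method with the matching name on your plugin class. The sketch below is a hypothetical datastore_is_read_only_resource implementation; it assumes the hook receives a resource ID and returns a boolean, so check the interface definition for the actual signature. The resource IDs are placeholders, and the CKAN plugin boilerplate is described in comments rather than imported so that the sketch runs on its own.

```python
class ReadOnlyResourcesPlugin:
    """Hypothetical plugin marking a fixed set of resources as read-only.

    In a real CKAN extension this class would also inherit from
    ckan.plugins.SingletonPlugin and declare implements(IVersionedDatastore)
    (the interface import path is left to the plugin's own documentation).
    Those lines are omitted so the sketch runs without CKAN installed.
    """

    # Placeholder resource IDs that should not be modified.
    READ_ONLY_IDS = frozenset({"resource-id-1", "resource-id-2"})

    def datastore_is_read_only_resource(self, resource_id):
        # Assumed signature: return True to mark the resource as read-only.
        return resource_id in self.READ_ONLY_IDS


plugin = ReadOnlyResourcesPlugin()
print(plugin.datastore_is_read_only_resource("resource-id-1"))  # prints: True
```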
IVersionedDatastoreQuery
This interface handles hooks and functions specifically relating to search queries.
- get_query_schemas - allows registering custom query schemas
IVersionedDatastoreDownloads
This interface handles hooks and functions specifically relating to downloads.
- download_modify_notifier_start_templates - modify the templates used when sending notifications that a download has started
- download_modify_notifier_end_templates - modify the templates used when sending notifications that a download has ended
- download_modify_notifier_error_templates - modify the templates used when sending notifications that a download has failed
- download_modify_notifier_template_context - modify the context/variables used to populate the notification templates
- download_derivative_generators - extend or modify the list of derivative generators
- download_file_servers - extend or modify the list of file servers
- download_notifiers - extend or modify the list of notifiers
- download_data_transformations - extend or modify the list of data transformations
- download_modify_manifest - modify the manifest included in the download file
- download_before_run - modify args before any search is run or files generated
- download_after_run - hook notifying that a download has finished (whether failed or completed)
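The download hooks follow the same pattern. As a hypothetical example, a plugin could use download_modify_manifest to add a licence note to every manifest. This sketch assumes the hook is given the manifest as a dict together with the download request and must return the (possibly modified) manifest; both the signature and the `licence` key are assumptions, so check the interface definition before relying on them.

```python
class ManifestLicencePlugin:
    """Hypothetical plugin adding a licence entry to download manifests.

    A real implementation would also inherit from ckan.plugins.SingletonPlugin
    and declare implements(IVersionedDatastoreDownloads); omitted here so the
    sketch runs without CKAN installed.
    """

    # Placeholder licence text.
    LICENCE = "CC0 1.0"

    def download_modify_manifest(self, manifest, request):
        # Assumed signature; `request` is unused in this sketch.
        updated = dict(manifest)  # copy so the caller's dict is not mutated
        updated["licence"] = self.LICENCE
        return updated


original = {"files": []}
print(ManifestLicencePlugin().download_modify_manifest(original, None))
# prints: {'files': [], 'licence': 'CC0 1.0'}
```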