More progress bars and embedding providers

Version 1.11.0 of refinery adds progress bars to labeling functions and attribute calculation. You can now also use OpenAI's and Cohere's embedding API services, whereas previously HuggingFace was the only source of embedding models. The new version also removes the GDPR badges from the bricks integrator and lets you secure project exports with a password.

Progress bar for labeling functions and attribute calculation

Labeling functions are a powerful tool to capture domain knowledge and support a semi-automated data labeling approach, while attribute calculation allows you to transform or enrich data in refinery. Previously, when running a labeling function or attribute calculation, there was no indication of how long the task would take to finish. Depending on the code being executed and the size of the dataset, a run could be very short or could take a while. The newly introduced progress bar shows how many records have already been processed and gives a visual cue of how much work is left until the run is completed.
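The idea behind such a progress bar can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not refinery's actual implementation: the helper names `run_with_progress` and `render_bar`, the example labeling function `contains_refund`, and the use of plain dictionaries as records are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


def render_bar(processed: int, total: int, width: int = 20) -> str:
    """Render a text progress bar such as '[#####...............] 1/4'."""
    # Guard against an empty dataset to avoid dividing by zero.
    filled = width * processed // total if total else width
    return f"[{'#' * filled}{'.' * (width - filled)}] {processed}/{total}"


def run_with_progress(
    records: List[Dict[str, str]],
    labeling_function: Callable[[Dict[str, str]], Optional[str]],
    on_progress: Callable[[int, int], None],
) -> List[Optional[str]]:
    """Apply a labeling function to every record, reporting progress after each one."""
    total = len(records)
    results = []
    for processed, record in enumerate(records, start=1):
        results.append(labeling_function(record))
        on_progress(processed, total)
    return results


def contains_refund(record: Dict[str, str]) -> Optional[str]:
    """Hypothetical labeling function: label records that mention a refund."""
    return "refund" if "refund" in record["text"].lower() else None


records = [
    {"text": "I want a refund"},
    {"text": "Great product"},
    {"text": "Refund please"},
    {"text": "ok"},
]

# Collect the rendered bars instead of printing them, so they can be inspected.
updates: List[str] = []
labels = run_with_progress(
    records,
    contains_refund,
    lambda processed, total: updates.append(render_bar(processed, total)),
)
```

Reporting progress through a callback keeps the labeling logic independent of how the progress is displayed, so the same loop could feed a terminal bar or a UI component.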

OpenAI and Cohere as additional embedding providers

The embedding services of OpenAI and Cohere are now available in refinery. Previously, only open-source models from HuggingFace and embeddings like bag of words or tf-idf could be used. Those options make it possible to generate embeddings from text data locally and free of charge.

While there are many excellent embedding models on HuggingFace, the embedding services from OpenAI and Cohere have a reputation for very high quality and state-of-the-art embeddings. Especially for enterprise use cases, these paid services are an interesting alternative.

Neither the OpenAI nor the Cohere embedding service relies on models that are downloaded and executed locally; both are instead accessed via an API. Because of this, you are required to accept their terms of service before you can use these embedding services.

Removing the GDPR indicator of the bricks integrator

Transparency and data security are of the utmost importance to us. In previous versions, the bricks integrator showed a badge indicating whether a brick module is compliant with the EU GDPR regulations or not. In refinery 1.11.0 we have removed this badge again, because we simply cannot take responsibility for any third-party services or APIs that are used within a brick module.

We have no way to verify whether third parties comply with GDPR, and we think it would be wrong to give you any positive or negative indication in this regard when the situation is not really clear.

The GDPR badges have been removed from the bricks integrator, which can be found in:

  • Labeling functions
  • Active learners
  • Attribute calculation
  • Record IDE (accessible via the labeling page)

Please note that Kern AI has no control over third parties and there is always a risk in using an API service that is not offered by Kern AI. Apart from that, our platform still adheres to the ISO 27001 norm and all of our managed services are still hosted within the European Union to comply with GDPR regulations.

Password-protected project exports

Keeping your data and your passwords safe is important. This applies not only to the projects that you have in refinery, but also to the ones that you want to export. To provide project exports with additional security, you can now set an optional password for encryption when creating a project snapshot.

When a password-protected project snapshot is imported into refinery again, the password is needed to import it. That way, sensitive data can be exported and stored in a snapshot without the contents of the snapshot file being immediately accessible.

Rework of the refinery-entry for a cleaner tech-stack

The refinery-entry pages (login, registration, forgot password, verification and settings) have been refactored from Handlebars to React. These pages should behave the same as before; the difference is that all components are now built in React to provide a cleaner tech stack.

Minor changes

  • Added configuration to display pre-wrapped and pre-lined records and fixed line-break bug #256
  • Fixed bug in the version overview page
  • Updated package versions for security