Our rating system

For each AI-powered tool featured on these information pages, we provide ratings in three categories:

  • Accuracy and Quality
  • Flexibility and Features
  • Data Security and Privacy

Each tool has been thoroughly tested and evaluated in all three categories, and the ratings will be updated should the tool or our standards change significantly. Below we provide the context and reasoning behind each rating category, along with some recommendations for interpreting the ratings.

Accuracy and Quality

Most AI-powered tools are prompt-based, meaning that the user has to supply the instructions for the model to follow. How the model interprets these instructions, and how accurately it follows them, is part of the evaluation for this rating. Examples include the interpretation of concepts such as "left" and "right" by image generators, the understanding of complex, field-specific definitions and terminology by LLMs, and the recognition of complex sentences and words during the transcription of audio files.

We combine this with an assessment of output quality, which can include whether an LLM cites the sources of its information, the absence of blurriness or deformations in generated images, or the types of journals selected by literature search engines.

A rating of two stars or fewer indicates that the model may not produce reliable or trustworthy output, and may require more critical reflection or user guidance than comparable tools.

Flexibility and Features

Each tool has its own strengths and weaknesses. Some LLMs provide excellent support for coding, some have access to the internet so they can include more recent information in their answers, and others can interpret images and/or audio in addition to text. The number of unique and relevant features in a model determines a large part of this rating. For open-source models, this can also include variants of the model (such as different sizes or supported languages). For commercial models, it also includes the option to disable these features when they are undesired. When features or even core functionalities of a tool are locked behind a paywall, the rating is lowered.

Tools that offer no relevant features to free users will never be recommended on these pages, as recommending them could increase inequality between users.

Data Security and Privacy

At Wageningen University, both students and employees are legally obligated to ensure that no sensitive data is shared with parties that should not have access to such information. This covers not only sensitive personal information such as names, phone numbers, and addresses: in the academic world, even a research question, a hypothesis, the companies involved in a project, or a specific methodology should be considered sensitive and may therefore not be shared. Because commercial models may store the data entered into them for use in training future versions, we should be extremely careful with how these models are used. Models that offer the option to disable the use of input data for future training, or that state that the information provided is not used for training (where this claim is reasonably credible), are awarded a higher rating. Open-source models that run on your own device (and therefore do not send data to external parties) will receive a high rating.

The transparency of the model's developer is also accounted for in this rating. If the training data is well documented, does not violate copyright or other intellectual property rights, and respects the privacy of users and others, the rating is increased as well. However, this weighs less heavily on the overall rating than the sharing of data with external parties, as the latter is considered a more serious concern and a potential legal violation.

A model with a rating of two stars or fewer in this category should be avoided, and a safer alternative tool should be used instead. If this is not possible, limit your exchange of information with the model as much as possible. Sharing sensitive information is never allowed.