Simple fair-evaluator extension
This extension extends the connectors by automatically extracting the metadata of a file. That metadata is used to create the asset and to calculate its FAIRness score. We test the fair-evaluator extension with the existing EDC sample transfer-01-file-transfer, which transfers a test file from one connector to another. Before transferring the file, the sample registers it as an asset, but the file's metadata has to be hardcoded when the asset object is created. We extend the transfer-01-file-transfer sample by automating asset creation and by obtaining a FAIRness score for each asset with the fair-evaluator.
The fair-evaluator extension is based on the FAIRness evaluator developed by Abu Ibne Bayazid. We use that repository, with minor changes, to calculate the FAIRness score in the fair-evaluator extension; our main contribution is the automated extraction of metadata. Our implementation covers both retrievable and extractable metadata. To test the approach, we chose PDF and Word documents as input data, because these formats are very common, especially for research articles and reports. More details about the extension can be found in this document.
How to use the extension
To extract metadata from text files we use NLP/ML models. We chose Python to run these models because of its flexibility and its extensive collection of NLP/ML packages.
Before using the extension, install a recent Python version on the system, together with the required packages listed in the requirements.txt file. The whole list of requirements can be installed with the following command:
pip install -r requirements.txt
The text_metadata_extractor.py Python file contains a set of functions that extract metadata (title, keywords, summary…) from PDF and Word documents.
This file is invoked by the main Java class. There are several ways of calling a Python script from Java; we chose the ProcessBuilder class. The output of the Python script is then redirected to the Java side, where it is used in the next step.
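The ProcessBuilder approach described above can be sketched as follows. This is a minimal, hypothetical helper: the ScriptRunner class and its method names are ours, not part of the extension, and it assumes the extractor is started as `<interpreter> text_metadata_extractor.py <file>` and prints its result to standard output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: runs an external script (e.g. text_metadata_extractor.py)
 * with ProcessBuilder and returns everything the script printed.
 */
class ScriptRunner {

    /** Builds a command such as ["python3", "text_metadata_extractor.py", "paper.pdf"]. */
    static List<String> buildCommand(String interpreter, String script, String... args) {
        List<String> command = new ArrayList<>();
        command.add(interpreter);
        command.add(script);
        Collections.addAll(command, args);
        return command;
    }

    /** Starts the process, collects its output and fails if the exit code is non-zero. */
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so Python error messages end up in the returned text.
        builder.redirectErrorStream(true);
        Process process = builder.start();

        // Read the output before waitFor(): a full pipe buffer could otherwise block the child.
        String output;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            output = reader.lines().collect(Collectors.joining(System.lineSeparator()));
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + command + " failed with exit code "
                    + exitCode + ":" + System.lineSeparator() + output);
        }
        return output;
    }
}
```

A call such as `ScriptRunner.run(ScriptRunner.buildCommand("python3", "text_metadata_extractor.py", "paper.pdf"))` would return the extractor's printed output. Passing the arguments as a list (rather than one shell string) means file paths containing spaces need no quoting.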
The extension can be used by adding it as a dependency in the Gradle file of another extension. In this case, the transfer-file-local extension (part of the provider connector) uses the fair-evaluator. We add the dependency with the following line in its Gradle file:
implementation(project(":transfer:transfer-01-file-transfer:fair-evaluator"))
Then inject a dependency of class FairnessScoreEvaluator by adding the following lines in FileTransferExtension:
@Inject
private FairnessScoreEvaluator fairnessScoreEvaluator;
Once the dependency is injected, we have an object of class FairnessScoreEvaluator. Using this object, we can add properties automatically while creating an asset:
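// Extract the file's metadata and compute its FAIRness score; the result is used as the asset's properties
var fairnessProperties = fairnessScoreEvaluator.getFAIRScore(assetPath.toString(), assetPID, assetDataLink, assetVersion);
var assetId = "test-document";
// Build the asset with the generated properties instead of hardcoded metadata
var asset = Asset.Builder.newInstance()
    .id(assetId)
    .properties(fairnessProperties)
    .build();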
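The properties passed to `Asset.Builder.properties(...)` are derived from the extractor's output. How the extension actually encodes that output is not described here. Purely for illustration, the sketch below assumes the Python script prints one `key=value` pair per line and shows how such output could be turned into a `Map<String, Object>` of the kind the builder accepts. The MetadataParser class and the line format are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for extractor output written as one "key=value" pair per line. */
class MetadataParser {

    static Map<String, Object> parse(String scriptOutput) {
        // LinkedHashMap keeps the properties in the order the script printed them.
        Map<String, Object> properties = new LinkedHashMap<>();
        for (String line : scriptOutput.split("\\R")) {
            int separator = line.indexOf('=');
            if (separator < 0) {
                continue; // blank line or log message, not a key=value pair
            }
            // Only the first '=' splits key and value, so values may contain '=' (e.g. URLs).
            String key = line.substring(0, separator).trim();
            String value = line.substring(separator + 1).trim();
            if (!key.isEmpty()) {
                properties.put(key, value);
            }
        }
        return properties;
    }
}
```

Splitting on the first `=` rather than on `:` keeps keys containing colons intact, which matters because EDC property keys are often namespaced with colons.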
Running the connectors:
First, build the JAR files for the provider and consumer connectors:
.\gradlew transfer:transfer-01-file-transfer:file-transfer-provider:build
.\gradlew transfer:transfer-01-file-transfer:file-transfer-consumer:build
Then run the provider connector in one terminal:
java -Dedc.fs.config=transfer/transfer-01-file-transfer/file-transfer-provider/config.properties -jar transfer/transfer-01-file-transfer/file-transfer-provider/build/libs/provider.jar
Run the consumer connector in another terminal:
java -Dedc.fs.config=transfer/transfer-01-file-transfer/file-transfer-consumer/config.properties -jar transfer/transfer-01-file-transfer/file-transfer-consumer/build/libs/consumer.jar
Since the fair-evaluator is a dependency of a provider extension, it runs together with the provider connector. The provider connector creates the in-memory asset object with the help of the fair-evaluator and transfer-file-local extensions.
Now, in a third terminal, we can request the provider's catalog from the consumer side:
curl -X POST "http://localhost:9192/api/v1/management/catalog/request" --header "X-Api-Key: password" --header "Content-Type: application/json" --data-raw "{\"providerUrl\": \"http://localhost:8282/api/v1/ids/data\"}"
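If you prefer to query the catalog from Java instead of curl, the request above can be reproduced with the JDK's built-in `java.net.http.HttpClient` (Java 11+). This is a hedged sketch: the CatalogClient class and its method names are hypothetical, and the URL, API key and provider URL are simply the values from the curl command.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client that sends the same catalog request as the curl command. */
class CatalogClient {

    static final String MANAGEMENT_URL = "http://localhost:9192/api/v1/management/catalog/request";

    /** Builds the POST request with the API key header and the JSON body. */
    static HttpRequest buildCatalogRequest(String apiKey, String providerUrl) {
        // Simple concatenation: assumes providerUrl contains no characters that need JSON escaping.
        String body = "{\"providerUrl\": \"" + providerUrl + "\"}";
        return HttpRequest.newBuilder(URI.create(MANAGEMENT_URL))
                .header("X-Api-Key", apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Sends the request and returns the raw JSON catalog; requires both connectors to be running. */
    static String fetchCatalog(String apiKey, String providerUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildCatalogRequest(apiKey, providerUrl), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("Catalog request failed with HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

For example, `CatalogClient.fetchCatalog("password", "http://localhost:8282/api/v1/ids/data")` returns the same JSON as the curl call. Keeping request construction separate from sending lets the request be inspected without a running connector.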
This API returns the assets, contract offers and policies available in the provider's catalog. Below we can see an example. To test the extension with an English document, we used this paper and obtained the following properties:
Note:
The sample supports filesystem-based configuration; the source file path should be updated here and the target file path here.
Now that the asset has been created, we can use the consumer connector to request the file transfer. This process is the same as in transfer-01-file-transfer and is outside the scope of this documentation. The steps to run the complete file transfer process can be found here.