From 9b08277c9a86e68960b88b6d6cfa78552d236337 Mon Sep 17 00:00:00 2001
From: Aviparna Biswas
Date: Tue, 21 May 2024 16:15:57 +0000
Subject: [PATCH] Update README.md

---
 README.md | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 61b2b17..76c05a7 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,26 @@
-# Doccano
+# Labeling text using Doccano
 
-Labeling text using Doccano
\ No newline at end of file
+Doccano is an open-source text annotation tool. It can be used to create labeled datasets for:
+
+- Text classification
+- Entity extraction
+- Sequence-to-sequence translation
+
+Doccano can be used to create labeled data for training the `EntityRecognizer` model in `arcgis.learn`.
+
+Doccano was created by Hiroki Nakayama, Takahiro Kubo, Junya Kamura, Yasufumi Taniguchi, and Xu Liang.
+
+## How to label training data for named entity recognition with Doccano
+
+1. After Doccano has been deployed to the local machine, go to the Doccano homepage and log in with your credentials.
+2. Select the appropriate project type.
+3. If you need to import data for annotation, go to 'Dataset' in the left panel, then click 'Actions' > 'Import dataset'.
+4. Select 'JSONL', click 'Select file(s)', and point it to the reports file (`docanno_deployment\reports_label.jsonl`). **Alternatively, text documents can be uploaded using the 'Plain text' option.**
+5. After the file has been imported, the documents appear on the screen.
+6. Click 'Start annotation' in the top menu bar.
+7. Read through each document (use the bottom navigation bar to move between documents). Select a span of text with your mouse and choose the relevant label.
+8. New labels can also be created by navigating to 'Labels' in the left panel.
+9. Once all the documents have been labeled, go to 'Dataset' > 'Actions' > 'Export dataset'.
+10. Select 'JSONL(Text-Labels)'.
+11. Set an export file name.
+12. Click 'Export'.
-- 
GitLab
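
The export produced by the final steps can be checked before it is used for training. The sketch below is a minimal, unofficial example of reading such a file. It assumes each exported line is a JSON object with a `text` field and a `label` (or, in some Doccano versions, `labels`) field holding `[start, end, label_name]` triples; the exact keys may differ in your Doccano version. The function name `read_doccano_spans` and the sample record are hypothetical and exist only for illustration.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def read_doccano_spans(lines: Iterable[str]) -> Iterator[Tuple[str, List[Tuple[str, str]]]]:
    """Yield (text, [(span_text, label_name), ...]) for each exported record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the export
        record = json.loads(line)
        text = record["text"]
        # Assumption: spans are stored under "label"; fall back to "labels".
        spans = record.get("label") or record.get("labels") or []
        # Each span is assumed to be [start_offset, end_offset, label_name].
        yield text, [(text[start:end], name) for start, end, name in spans]


# Hypothetical record in the assumed export format.
sample = ['{"text": "Theft reported at 12 Main St.", "label": [[18, 28, "Address"]]}']

for text, entities in read_doccano_spans(sample):
    print(entities)
```

To inspect a real export, pass an open file handle instead of the sample list, for example `read_doccano_spans(open("export.jsonl", encoding="utf-8"))`, where `export.jsonl` stands for whatever file name was set during export.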