awesome-nlp-polish

A curated list of resources for Natural Language Processing (NLP) in Polish: models, tools, and datasets.

Models

SlavicBert - multilingual BERT model. The repository covers Bulgarian, Czech, Polish, and Russian.
Allegro BERT - not yet published as of 12.2019, but a poster is available - https://conference.mlinpl.org/pdf/CfC_AllPosters.pdf
Word2vec Polish models - http://dsmodels.nlp.ipipan.waw.pl/w2v.html
FastText Polish model from Facebook, trained on Common Crawl and Wikipedia
FastText Polish model

Useful articles or projects

Word embeddings and language models for Polish (Word2vec, fastText, GloVe, ELMo) - https://github.com/sdadas/polish-nlp-resources
Polish Word Embeddings Review - an evaluation of Polish word embeddings prepared by various research groups, carried out on a word analogy task - https://github.com/Ermlab/polish-word-embeddings-review
Computational Linguistics in Poland (CLiP) - http://clip.ipipan.waw.pl/ - the website contains comprehensive information about tools, resources, research centers, and projects related to NLP for Polish
AGH DSP - various projects involving the Polish language, mainly speech - http://www.dsp.agh.edu.pl/pl:research:main
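
The word analogy task mentioned above asks a model to complete "a is to b as c is to ?" using vector arithmetic. The snippet below is a minimal sketch of that idea, not the evaluation code from the linked review. The vectors are toy 2-D values invented for illustration, and `TOY_VECTORS`, `cosine`, and `solve_analogy` are hypothetical names; a real evaluation would load one of the pretrained Polish models listed above.

```python
import math

# Toy 2-D vectors, invented for illustration only (not taken from any real model).
# Dimension 0 loosely encodes "royalty", dimension 1 loosely encodes "gender".
TOY_VECTORS = {
    "mężczyzna": [0.0, 1.0],
    "kobieta": [0.0, -1.0],
    "król": [1.0, 1.0],
    "królowa": [1.0, -1.0],
    "dom": [-1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the nearest word to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded, as is usual in analogy benchmarks.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(solve_analogy(TOY_VECTORS, "mężczyzna", "król", "kobieta"))
```

An embedding review of this kind typically reports the share of analogy questions answered correctly, which makes models from different research groups directly comparable.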

Research papers

"Evaluation of Sentence Representations in Polish" - Sławomir Dadas, Michał Perełkiewicz, Rafał Poswiata 2019 https://arxiv.org/pdf/1910.11834.pdf

Datasets

The KLEJ (Kompleksowa Lista Ewaluacji Językowych) benchmark is a set of nine evaluation tasks for Polish language understanding.
Wroclaw Corpus of Consumer Reviews Sentiment (WCCRS)
PolEval datasets:
- Hate speech classification - participants distinguish between normal/non-harmful tweets (class: 0) and tweets that contain any kind of harmful information (class: 1), including cyberbullying, hate speech, and related phenomena: [PolEval 2019 Task6] [Ermlab mirror GDrive]
Ermlab Opineo dataset - https://github.com/Ermlab/pl-sentiment-analysis - GDrive
HateSpeech corpus - the current version contains over 2000 posts crawled from the public Polish web. They represent various types and degrees of offensive language directed at minorities (e.g. ethnic, racial). The data were annotated manually. http://zil.ipipan.waw.pl/HateSpeech
Polish Speech Corpus (DSP AGH) - 55 hours of annotated Polish speech - http://www.dsp.agh.edu.pl/en:resources:korpusmowy
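
For the binary hate speech task listed above (class 0 for non-harmful tweets, class 1 for harmful ones), predictions are commonly scored with precision, recall, and F1 on the harmful class. The snippet below is a minimal sketch under that assumption; this list does not state the official PolEval scoring procedure, and `binary_scores` and the sample labels are hypothetical.

```python
def binary_scores(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # harmful, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # harmless, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # harmful, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels for five tweets: 1 = harmful, 0 = non-harmful.
gold = [0, 1, 1, 0, 1]
predicted = [0, 1, 0, 0, 1]
print(binary_scores(gold, predicted))
```

Because harmful tweets are usually a small minority in such datasets, F1 on class 1 tends to be more informative than plain accuracy.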

Research Groups

Contributors

People who contribute to this project.

Krzysztof Sopyła - https://ksopyla.com LinkedIn
