[xExtension-ReadingTime] Reading time of Japanese articles is too short #292

Open
hkcomori opened this issue Feb 14, 2025 · 3 comments
Labels: good first issue, help wanted, javascript, xExtension-ReadingTime

Comments

@hkcomori

Word counts seem to be incorrect for languages such as Japanese, where words are not separated by spaces.
As a result, the reading time calculated from word counts is much shorter than the actual reading time.

I think it would be better to calculate the reading time from the number of letters, because it is difficult to count words accurately in these languages.

Therefore, it would be nice to be able to configure the following:

  • Source metric: whether to calculate the reading time from words or from letters. (Default: words)
  • Conversion factor: factor used to convert the source metric into reading time. (Default: 300)
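A minimal sketch of how the two proposed settings could fit together, in JavaScript like the extension's own code. The settings object, the function names (`countWords`, `countLetters`, `estimateReadingMinutes`), reading the conversion factor as "units per minute", and rounding up to whole minutes are all assumptions for illustration, not the extension's actual API.

```javascript
// Hypothetical settings mirroring the two proposed options.
// Assumption: the conversion factor means "source units per minute".
const DEFAULT_SETTINGS = { metric: 'words', factor: 300 };

// Count whitespace-separated words (spaces, tabs and newlines all separate words).
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Count Unicode letters (general category L); spaces, punctuation and digits are ignored.
// Kanji, hiragana and katakana all belong to category L.
function countLetters(text) {
  return (text.match(/\p{L}/gu) || []).length;
}

// Estimate reading time in whole minutes, rounding up (assumption).
function estimateReadingMinutes(text, settings = DEFAULT_SETTINGS) {
  const { metric, factor } = { ...DEFAULT_SETTINGS, ...settings };
  if (!(factor > 0)) {
    throw new RangeError('factor must be a positive number');
  }
  const count = metric === 'letters' ? countLetters(text) : countWords(text);
  return Math.ceil(count / factor);
}
```

With the defaults, 600 space-separated words give 2 minutes. A 301-character run of kana counts as a single "word" (1 minute), but with `metric: 'letters'` it gives 2 minutes.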
@math-GH added the xExtension-ReadingTime label Feb 15, 2025
@Alkarex
Member

Alkarex commented Feb 20, 2025

Before adding a new option, I think it would be worth trying to make the current word-counting function more robust:

```javascript
reading_time.textContent = reading_time.textContent.replace(/(^\s*)|(\s*$)/gi, ''); // trim leading and trailing white-space
reading_time.textContent = reading_time.textContent.replace(/[ ]{2,}/gi, ' '); // collapse 2 or more spaces into 1
reading_time.textContent = reading_time.textContent.replace(/\n /, '\n'); // remove a space after a newline (no g flag: first occurrence only)
return reading_time.textContent.split(' ').length; // count words by splitting on single spaces only
```
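To see why this count is too low for Japanese, here is a hypothetical standalone port of the snippet above that works on a plain string instead of reading and rewriting `reading_time.textContent` (the name `countWordsLikeCurrent` is made up for illustration). Because it splits on single spaces only, a Japanese sentence without spaces counts as one word, words separated only by a newline are merged, and an empty text counts as one word.

```javascript
// Hypothetical port of the quoted logic, operating on a string.
function countWordsLikeCurrent(text) {
  let t = text.replace(/(^\s*)|(\s*$)/gi, ''); // trim leading and trailing white-space
  t = t.replace(/[ ]{2,}/gi, ' ');              // collapse runs of spaces into one
  t = t.replace(/\n /, '\n');                   // first "newline + space" only (no g flag)
  return t.split(' ').length;                   // newlines and tabs are not separators
}

console.log(countWordsLikeCurrent('吾輩は猫である。名前はまだ無い。')); // 1
console.log(countWordsLikeCurrent('one\ntwo three'));                  // 2 ("one\ntwo" is one piece)
console.log(countWordsLikeCurrent(''));                                // 1 ([''] has length 1)
```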

One idea to investigate would be to add the number of ideograms to the number of Latin words.
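A rough sketch of that idea using Unicode property escapes. It assumes "ideogram" means a character with Unicode `Script=Han`, and that a Latin word is a run of Latin-script letters optionally joined by an apostrophe; the function name is hypothetical. Under this definition, kana (and Hangul, etc.) are not counted at all.

```javascript
// Hypothetical: count Han characters individually, plus runs of Latin letters as words.
function countIdeogramsPlusLatinWords(text) {
  const ideograms = text.match(/\p{Script=Han}/gu) || [];
  // "don't" and "don’t" count as one word thanks to the optional apostrophe group.
  const latinWords = text.match(/\p{Script=Latin}+(?:['’]\p{Script=Latin}+)*/gu) || [];
  return ideograms.length + latinWords.length;
}
```

For example, `'Hello world 吾輩は猫である'` gives 2 Latin words plus 3 Han characters (吾, 輩, 猫), so 5, while a kana-only sentence such as `'これはペンです'` gives 0.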

@Alkarex added the help wanted, good first issue, and javascript labels Feb 20, 2025
@hkcomori
Author

In my experience, there does not seem to be a correlation between the number of Japanese ideograms (i.e., kanji) and reading time.

I did some more research on how to count words.
It requires morphological analysis, and there seem to be several libraries for that (e.g., MeCab, kuromoji.js).
However, it seems difficult to get right: for example, the results are incorrect when proper nouns appear.
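Not something proposed in this thread, but for comparison with MeCab and kuromoji.js: JavaScript's built-in `Intl.Segmenter` with word granularity segments Japanese text using ICU's dictionary-based word breaking, without an extra library. This is a hedged sketch with a hypothetical function name; it is prone to the same kind of errors on unknown proper nouns, and the exact segmentation depends on the ICU data shipped with the runtime.

```javascript
// Hypothetical helper: count word-like segments using the built-in Intl.Segmenter.
function countWordsWithSegmenter(text, locale = 'ja') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  let count = 0;
  for (const { isWordLike } of segmenter.segment(text)) {
    // Spaces and punctuation are segments too, but they are not word-like.
    if (isWordLike) count += 1;
  }
  return count;
}
```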

So I believe it is much more accurate and stable to calculate the reading time based on the number of letters.

@Alkarex
Member

Alkarex commented Feb 21, 2025

We could try to use letters for all languages. PRs and tests welcome.
