This is an old post, written in 2010.
http://texlexan.sourceforge.net/
TexLexAn is a project for an automatic text analyzer, classifier and summarizer.
It can be used to:
- Estimate the reading time and the reading difficulty.
- Categorize a text (automatic classifier).
- List keywords.
- Summarize by extraction.
- Count repetitions and estimate the ratio of basic words.
- Look for possible plagiarism.
- Evaluate sentiments.
- Archive & retrieve documents.
- Build a knowledge base.
It works with:
- URL links (uses wget).
- Text and HTML files.
- PDF, ODT, PPT and DOC files (these require pdftotext, odt2txt, ppthtml and antiword to be installed).
TexLexAn programs are tested on Ubuntu 8.04/9.04/10.04 and FreeBSD 6.2. The binaries included in the package work on Ubuntu 10.04.
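To make the role of the external tools listed above more concrete, here is a minimal sketch in Python of how an input could be matched to the converter that turns it into text. This is not TexLexAn's own code: the names `CONVERTERS`, `converter_command` and `missing_tools` are hypothetical, and the only things taken from the post are the tool names. The sketch assumes each tool is called with the file as its argument and writes to standard output (`pdftotext` needs `-` as the output name for that, and `wget` needs `-O -`).

```python
import shutil
from pathlib import Path

# Hypothetical mapping from file extension to the external tool named in the post.
# Each entry is an argv template; "{path}" is replaced by the input file.
CONVERTERS = {
    ".pdf": ["pdftotext", "{path}", "-"],  # "-" sends the extracted text to stdout
    ".odt": ["odt2txt", "{path}"],
    ".ppt": ["ppthtml", "{path}"],         # produces HTML rather than plain text
    ".doc": ["antiword", "{path}"],
}


def converter_command(source):
    """Return the command that would fetch or convert `source`.

    Returns None when no external tool is needed (plain text, HTML, unknown types).
    """
    if source.startswith(("http://", "https://")):
        # Quiet download, written to stdout instead of a file.
        return ["wget", "-q", "-O", "-", source]
    template = CONVERTERS.get(Path(source).suffix.lower())
    if template is None:
        return None
    return [arg.format(path=source) for arg in template]


def missing_tools():
    """List the external tools that are not found on the PATH."""
    tools = {"wget"} | {template[0] for template in CONVERTERS.values()}
    return sorted(tool for tool in tools if shutil.which(tool) is None)


if __name__ == "__main__":
    print(converter_command("report.pdf"))
    print("Not installed:", missing_tools())
```

The returned list can be passed to `subprocess.run(..., capture_output=True)` to collect the text. Checking `missing_tools()` first gives a clear message when one of the converters is not installed.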
Author: Jean-Pierre Redonnet (last update: 2010/08/13)