Wednesday, August 10, 2016

TexLexAn: Analyze, Classify and Summarize Any Text

This is a fairly old post; it was written in 2010.


 http://texlexan.sourceforge.net/

TexLexAn is a project to build an automatic text analyzer, classifier and summarizer.

It can be used to:
  • Estimate the reading time and the reading difficulty.
  • Categorize a text (automatic classifier).
  • List keywords.
  • Summarize by extraction.
  • Count repetitions and estimate the ratio of basic words.
  • Look for possible plagiarism.
  • Evaluate sentiments.
  • Archive & retrieve documents.
  • Build a knowledge base.
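
To give an idea of what the simplest of these measurements involve, here is a minimal Python sketch of a reading-time estimate and a repetition ratio. It is not TexLexAn's code: the 200 words-per-minute reading speed, the word pattern and the function names are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Assumed average reading speed; not a value taken from TexLexAn.
WORDS_PER_MINUTE = 200


def words_of(text):
    """Return the lowercase words of a text (simple ASCII word pattern)."""
    return [w.lower() for w in re.findall(r"[A-Za-z']+", text)]


def reading_time_minutes(text, wpm=WORDS_PER_MINUTE):
    """Estimate the reading time as word count divided by reading speed."""
    return len(words_of(text)) / wpm


def repetition_ratio(text):
    """Fraction of words that repeat a word already seen in the text."""
    words = words_of(text)
    if not words:
        return 0.0
    counts = Counter(words)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(words)
```

A real analyzer would also need a list of basic words and a difficulty measure, which this sketch leaves out.
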
It works with:
  • URL links (using wget).
  • Text and HTML files.
  • PDF, ODT, PPT and DOC files (these require pdftotext, odt2txt, ppthtml and antiword to be installed).
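
The list above amounts to picking an external converter for each kind of input. The Python sketch below shows one way such a dispatch could look. It only builds the command line and does not run it. The table layout and the function name are hypothetical, and the exact options TexLexAn passes to these tools are not stated here, so the arguments shown are simply the common way to make each tool write to standard output.

```python
import os

# Hypothetical mapping from file extension to converter command.
# "{path}" is a placeholder replaced by the actual file name.
CONVERTERS = {
    ".pdf": ["pdftotext", "{path}", "-"],  # "-" sends the text to stdout
    ".odt": ["odt2txt", "{path}"],
    ".ppt": ["ppthtml", "{path}"],
    ".doc": ["antiword", "{path}"],
}


def conversion_command(source):
    """Return the command that turns `source` into text, or None.

    None means no converter is needed (plain text or HTML file).
    """
    if source.startswith(("http://", "https://")):
        # Fetch the page quietly and write it to stdout.
        return ["wget", "-q", "-O", "-", source]
    extension = os.path.splitext(source)[1].lower()
    template = CONVERTERS.get(extension)
    if template is None:
        return None
    return [source if part == "{path}" else part for part in template]
```

The returned list can be handed to `subprocess.run(command, capture_output=True)` to collect the converted text, provided the corresponding tool is installed.
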
The summarizer extracts the most relevant sentences from the text. It simplifies them slightly by removing passages between brackets and repeated sentences, and by replacing deadwood expressions with their shortest forms (mode VIII).
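
The steps described above can be sketched in a few lines of Python. This is an illustration only, not TexLexAn's algorithm: the word-frequency scoring is a stand-in for whatever relevance measure the program really uses, the deadwood table holds a few made-up example entries, and all names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical deadwood expressions and their shorter forms.
DEADWOOD = {
    "in order to": "to",
    "due to the fact that": "because",
    "at this point in time": "now",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def simplify(sentence):
    """Drop bracketed passages and shorten deadwood expressions."""
    sentence = re.sub(r"\s*[\(\[][^\)\]]*[\)\]]", "", sentence)
    for long_form, short_form in DEADWOOD.items():
        sentence = re.sub(re.escape(long_form), short_form, sentence,
                          flags=re.IGNORECASE)
    return sentence


def summarize(text, max_sentences=2):
    """Keep the highest-scoring distinct sentences, in original order."""
    sentences, seen = [], set()
    for sentence in split_sentences(text):
        sentence = simplify(sentence)
        key = sentence.lower()
        if key not in seen:  # skip repeated sentences
            seen.add(key)
            sentences.append(sentence)

    # Assumed relevance measure: average frequency of a sentence's words.
    freq = Counter(w for s in sentences for w in re.findall(r"[a-z']+", s.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return [s for s in sentences if s in top]
```

Returning the selected sentences in their original order keeps the extract readable, which matters more for a summary than the ranking itself.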

TexLexAn programs have been tested on Ubuntu 8.04, 9.04 and 10.04 and on FreeBSD 6.2. The binaries included in the package work on Ubuntu 10.04.

Author: Jean-Pierre Redonnet (last update: 2010/08/13)

