Buerki, Andreas ORCID: https://orcid.org/0000-0003-2151-3246 2013. N-Gram Processor. GitHub.
Abstract
The N-Gram Processor is a set of scripts and a Perl module allowing the creation and processing of n-gram lists out of text files. The feature set of the N-Gram Processor is simple enough:

- creation of word n-gram lists out of input text, with n-gram frequencies
- listing of document counts (in how many documents an n-gram occurs)
- combination of large numbers of lists (of one n) into a single list
- Unicode support
- support for processing of reasonably large corpora (depending on hardware)
- support for processing of annotated corpora

Please refer to the manual for a more detailed description. The NGP is a branch of the Ngram Statistics Package (NSP, v1.09) by Ted Pedersen and collaborators, including code of the v1.10 re-write by Bjoern Wilmsmann.
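
As a rough illustration of what an n-gram list with frequencies and document counts contains, the following minimal Python sketch may help. It is not the N-Gram Processor's code or interface (the NGP itself is written in Perl). The function names `ngrams` and `count_ngrams`, the whitespace tokenisation, and the sample documents are all assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous word n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(documents, n):
    """Count n-gram frequencies and document counts.

    documents: iterable of strings, one string per document.
    Returns two Counters keyed by n-gram tuple:
    total frequency, and number of documents containing the n-gram.
    """
    frequency = Counter()
    document_count = Counter()
    for text in documents:
        # Assumption: naive tokenisation (lowercase, split on whitespace).
        grams = ngrams(text.lower().split(), n)
        frequency.update(grams)
        # A set counts each n-gram at most once per document.
        document_count.update(set(grams))
    return frequency, document_count


# Hypothetical three-document corpus.
docs = [
    "the cat sat on the mat",
    "the cat saw the cat",
    "a dog sat on the mat",
]
freq, doc_count = count_ngrams(docs, 2)
print(freq[("the", "cat")], doc_count[("the", "cat")])  # 3 2
```

In this sketch, "the cat" occurs three times overall but in only two documents, which is the distinction between n-gram frequency and document count. Lists for the same n produced from different inputs could be combined by adding the `Counter` objects together.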
Item Type: | Other |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | English, Communication and Philosophy |
Subjects: | P Language and Literature > P Philology. Linguistics; Q Science > QA Mathematics > QA76 Computer software |
Publisher: | GitHub |
Last Modified: | 28 Oct 2022 10:23 |
URI: | https://orca.cardiff.ac.uk/id/eprint/78130 |