BART

Coreference Resolution for English

I am one of the authors of BART, a modular framework for coreference resolution. BART originated as a joint effort in the JHU Summer Workshop project "Exploiting Lexical and Encyclopedic Resources for Entity Disambiguation", and it is actively used and developed by multiple research groups.

Discriminative Parser

Lexicalized Parsing with Morphology
The code for the parser from (Versley and Rehbein, 2009) is available as source code from Bitbucket. It includes parts that are based on code from Helmut Schmid's BitPar (included with kind permission) and is available under terms similar to BitPar's, i.e. for non-commercial/research purposes only.

Python interface to CWB

Efficient Access to large Corpora
The Open Corpus Workbench (CWB) allows you to efficiently store and query large (>100M words) corpora. cwb-python is a Python interface (similar to the existing Perl one) that allows you to quickly retrieve, e.g., a certain sentence, or the occurrences of a certain word.

Blog posts

The brave new world of search engines
In an earlier post, I talked about Google's current search results in terms of personalization, and whether to like it or not. This post looks at another aspect of Google search in 2011: what it does with complex queries. (more...)

Simple Pattern extraction from Google n-grams
Google has released n-gram datasets for multiple languages, including English and German. For my needs (lots of patterns, with lemmatization), writing a small bit of C++ allows me to extract pattern instances in bulk, more quickly and comfortably than with bzgrep. (more...)
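To give a rough idea of what bulk pattern extraction looks like, here is a minimal sketch in Python (not the C++ code from the post). It assumes each input line has the layout "w1 w2 w3<TAB>count", which may differ from the dataset you actually have, and it leaves out lemmatization entirely. The function names parse_ngram_lines and extract_pattern are made up for this illustration.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_ngram_lines(lines: Iterable[str]) -> Iterator[Tuple[Tuple[str, ...], int]]:
    """Yield (tokens, count) pairs from lines assumed to look like 'w1 w2 w3<TAB>count'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        ngram, _, count = line.rpartition("\t")
        if not ngram or not count.isdigit():
            continue  # skip lines that do not fit the assumed layout
        yield tuple(ngram.split(" ")), int(count)


def extract_pattern(lines: Iterable[str], pattern: str) -> Counter:
    """Sum the counts of the captured slot fillers for every n-gram matching `pattern`."""
    regex = re.compile(pattern)
    totals: Counter = Counter()
    for tokens, count in parse_ngram_lines(lines):
        match = regex.fullmatch(" ".join(tokens))
        if match:
            totals[match.groups()] += count
    return totals


# Tiny in-memory sample standing in for a real n-gram file (made-up counts).
sample = [
    "cats and dogs\t120\n",
    "cats and mice\t30\n",
    "cats or dogs\t7\n",
    "cats and dogs\t5\n",
    "broken line without count\n",
]
result = extract_pattern(sample, r"(\w+) and (\w+)")
for fillers, total in result.most_common():
    print(fillers, total)
```

For a compressed file, the same function can be fed a line iterator from bz2.open(path, "rt") in the standard library, so that the data is streamed rather than loaded into memory at once.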

Where to buy Music
After spending a disproportionate amount of time searching for nice music that I want to buy, I decided to compile this list of internet shops that sell music in MP3 format to German citizens. (And no, I can't/won't use iTunes unless they make a Linux client.)

Useful links

WCDG parser.
The Weighted Constraint Dependency Grammar parser is one of the best parsers for German that you can get. It's available under an open source license and there is an online demo.

BitPar and SFST.
Helmut Schmid has written several tools that may come in useful in your next NLP application, including the TreeTagger, a decision-tree-based part-of-speech tagger; BitPar, a fast PCFG parsing engine; and SFST, a set of highly useful tools for finite-state morphological analysis.

Conditional Random Fields.
Hanna Wallach has a very useful link collection on Conditional Random Fields. I'd especially recommend her tutorial on CRFs (which is also the introductory part of her MSc thesis) as well as Simon Lacoste-Julien's tutorial on SVMs, graphical models, and Max-Margin Markov Networks (also linked there).

Nice blogs

Language Log
NLPers
hunch.net
Technologies du Langage
Earning my Turns
Leiter Reports