The brave new world of search engines
In an earlier post, I talked about Google's current search results in terms of personalization, and whether or not to like it. This post looks at another aspect of Google search in 2011: what it does with complex queries.
Simple pattern extraction from Google n-grams
Google has released n-gram datasets for multiple languages, including English and German. For my needs (lots of patterns, with lemmatization), a small C++ program lets me extract pattern instances in bulk, more quickly and comfortably than with bzgrep.
Where to buy music
After spending a disproportionate amount of time searching for
music that I want to buy, I decided to compile
this list of online shops that sell music
in MP3 format to customers in Germany. (And no, I can't/won't use iTunes
unless they make a Linux client.)
WCDG parser.
The
Weighted Constraint Dependency Grammar (WCDG) parser is one of the best
parsers for German that you can get. It is available under an open-source
license, and there is an online demo.
BitPar and SFST.
Helmut Schmid
has written several tools that may come in handy in your next NLP application,
including the TreeTagger, a part-of-speech tagger based on decision trees;
BitPar, a fast PCFG parsing engine; and SFST, a set of highly useful tools
for finite-state morphological analysis.
Conditional Random Fields.
Hanna Wallach has a very useful link collection
on Conditional Random Fields. I especially
recommend her tutorial on CRFs (which is also the
introductory part of her MSc thesis) as well as Simon Lacoste-Julien's
tutorial on SVMs, graphical models, and Max-Margin Markov Networks
(also linked there).