I’m using Java 7 within Eclipse in combination with Git, Maven and Jenkins for professional development and deployment.
ElasticSearch is great for implementing customised and advanced search solutions, recommendation engines and realtime analytics.
Nutch provides a solid and robust web crawler out of the box. Combined with ElasticSearch, it's a good foundation for building custom search solutions.
I’m using this NoSQL storage in the backend. It’s schema-free, flexible and compatible with JSON.
UIMA is the Unstructured Information Management Architecture from Apache, originally created by IBM. It’s useful for chaining the different annotation and analysis steps that are common in Natural Language Processing and Text Mining.
You can think of it as a data analysis pipeline.
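To make the pipeline idea concrete, here is a minimal sketch in plain Java. It does not use the UIMA API: the `Document`, `Annotator`, `TokenAnnotator` and `CapitalisedAnnotator` names are hypothetical and only illustrate how each step reads a shared document and adds its own annotations for the next step to build on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelineSketch {

    /** Shared document that every step reads and adds annotations to. */
    static class Document {
        final String text;
        final List<String> annotations = new ArrayList<String>();

        Document(String text) {
            this.text = text;
        }
    }

    /** One analysis step (hypothetical interface, not part of UIMA). */
    interface Annotator {
        void process(Document doc);
    }

    /** Splits the text on whitespace and records every token. */
    static class TokenAnnotator implements Annotator {
        public void process(Document doc) {
            for (String token : doc.text.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    doc.annotations.add("Token:" + token);
                }
            }
        }
    }

    /** Builds on the token step: marks tokens that start with an upper-case letter. */
    static class CapitalisedAnnotator implements Annotator {
        public void process(Document doc) {
            List<String> found = new ArrayList<String>();
            for (String annotation : doc.annotations) {
                if (annotation.startsWith("Token:")) {
                    String token = annotation.substring("Token:".length());
                    if (Character.isUpperCase(token.charAt(0))) {
                        found.add("Capitalised:" + token);
                    }
                }
            }
            doc.annotations.addAll(found);
        }
    }

    /** Runs the steps in order over a single document. */
    static Document run(String text, List<Annotator> steps) {
        Document doc = new Document(text);
        for (Annotator step : steps) {
            step.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Annotator> steps = Arrays.<Annotator>asList(
                new TokenAnnotator(), new CapitalisedAnnotator());
        Document doc = run("Nutch feeds ElasticSearch", steps);
        System.out.println(doc.annotations);
    }
}
```

The order of the steps matters here: the second annotator only finds something because the first one has already added tokens. That dependency between steps is what the pipeline picture is meant to convey.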