Automated retrieval and extraction of training course information from unstructured web pages