HTMLCorpus Scraper is a tool for scraping text from web content for use in topic modelling. The tool can scrape an unlimited number of URLs to a maximum depth of 7.
It is helpful for producing a corpus of texts for machine learning purposes. It produces either a CSV file or a corpus of text files, which you can use in your machine learning program for topic modelling.
- Extract article text from an unlimited number of URLs.
- Extract articles as .txt files or .csv files.
- Superfast scraping process with real-time progress updates.
- Extracted data is also saved to a non-structured database for advanced users interested in querying the data.
- Many more cool features. Check out our demo!
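
To make the depth limit and CSV output described above more concrete, here is a minimal Python sketch of a depth-limited crawl. It is not HTMLCorpus Scraper's actual implementation: the names `crawl`, `rows_to_csv`, and the `fetch` callable are hypothetical, the column layout (`url`, `depth`, `text`) is an assumption, and the text extraction is deliberately naive (all visible text, not just the article body).

```python
import csv
import io
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

MAX_DEPTH = 7  # maximum depth stated in the description above


class _LinkAndTextParser(HTMLParser):
    """Collects href targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def crawl(start_urls, fetch, max_depth=MAX_DEPTH):
    """Breadth-first crawl returning a list of (url, depth, text) rows.

    `fetch` is a hypothetical callable mapping a URL to an HTML string,
    or to None when the page cannot be retrieved.
    """
    start = list(dict.fromkeys(start_urls))  # drop duplicates, keep order
    seen = set(start)
    queue = deque((url, 0) for url in start)
    rows = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = _LinkAndTextParser()
        parser.feed(html)
        rows.append((url, depth, " ".join(parser.text_parts)))
        if depth < max_depth:  # do not follow links past the depth limit
            for href in parser.links:
                target = urljoin(url, href)
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return rows


def rows_to_csv(rows):
    """Serialise crawl rows to CSV text with a header line (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "depth", "text"])
    writer.writerows(rows)
    return buf.getvalue()
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; in practice it could wrap `urllib.request.urlopen` or a similar call, and the resulting CSV text could be written to disk and loaded by a topic modelling pipeline.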