I need an object-oriented PHP library that can scrape public data from Yooying, Facebook (FB), YouTube (YT), and Twitter (TW).
The data will be scraped based on search terms or parameters such as words, phrases, or hashtags. In addition, the library should be able to provide a list of current trending Topics AND Hashtags for EACH social network. The library should apply PAGINATION to avoid overloading the social networks' servers.
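As a rough illustration of the PAGINATION requirement, here is a minimal sketch of a cursor-based pager. The function name `paginate`, the `$fetchPage` callable, and the `['items' => ..., 'next' => ...]` page shape are all assumptions made for this example; the real page format will depend on each network.

```php
<?php
declare(strict_types=1);

/**
 * Lazily walks a paged result set, issuing one request per page.
 *
 * $fetchPage is a hypothetical callable: it receives a cursor (null for the
 * first page) and returns ['items' => array, 'next' => ?string].
 */
function paginate(callable $fetchPage, int $maxPages = 5, int $delayMicros = 0): Generator
{
    $cursor = null;
    for ($page = 0; $page < $maxPages; $page++) {
        if ($page > 0 && $delayMicros > 0) {
            usleep($delayMicros); // pause between consecutive page requests
        }
        $result = $fetchPage($cursor);
        foreach ($result['items'] as $item) {
            yield $item;
        }
        $cursor = $result['next'] ?? null;
        if ($cursor === null) {
            return; // the source reports no further pages
        }
    }
}

// Stand-in data source used instead of a real network call.
$pages = [
    'start' => ['items' => ['a', 'b'], 'next' => 'p2'],
    'p2'    => ['items' => ['c'], 'next' => null],
];
$fetch = function (?string $cursor) use ($pages): array {
    return $pages[$cursor ?? 'start'];
};

// Collect every item across all pages (keys are discarded).
$items = iterator_to_array(paginate($fetch), false);
```

Because the pager is a generator, the caller can stop iterating early and no further pages are requested. The `$maxPages` cap is a hard upper bound on requests per search.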
The results should be returned as postobjects or VIDEOS with all their attributes, such as postobject ID, postobject OWNER USERNAME, postobject DESCRIPTION (TEXT, IMAGES, or VIDEO), MENTIONS (@username), LIKES, COMMENTS, SHARES, and VIEWS (for videos). The library should also provide a way to query by USERNAME from the returned results in order to get FOLLOWERS (or SUBSCRIBERS) and LOCATION (country), if specified.
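To make the expected result shape concrete, here is a minimal sketch of a result object carrying the attributes listed above. The class name `ScrapedPost`, its property names, and the use of a null `views` value to mark non-video results are assumptions for illustration only. It uses PHP 7.4 typed properties.

```php
<?php
declare(strict_types=1);

/** Hypothetical value object for one scraped post or video. */
final class ScrapedPost
{
    public string $id;
    public string $ownerUsername;
    public string $description;
    /** @var string[] usernames mentioned in the post, without the "@" */
    public array $mentions = [];
    public int $likes = 0;
    public int $comments = 0;
    public int $shares = 0;
    /** View count; null when the result is not a video. */
    public ?int $views = null;

    public function __construct(string $id, string $ownerUsername, string $description)
    {
        $this->id = $id;
        $this->ownerUsername = $ownerUsername;
        $this->description = $description;
    }

    public function isVideo(): bool
    {
        return $this->views !== null;
    }

    /** Plain-array form, convenient for JSON encoding or storage. */
    public function toArray(): array
    {
        return get_object_vars($this);
    }
}

$post = new ScrapedPost('p1', 'alice', 'Hello @bob');
$post->mentions = ['bob'];
$post->views = 120;
```

A separate user object (followers or subscribers, country) could follow the same pattern and be looked up by the `ownerUsername` of a returned result.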
The library should be able to switch dynamically between IP addresses (proxies) and scrape data at reasonable time intervals in order to AVOID being blocked. You should provide a list of at least 10 different pre-loaded IP proxies so that the library can be used right away.
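One possible way to handle proxy switching and request spacing is a round-robin rotator plus a randomized delay, sketched below. The names `ProxyRotator`, `politeDelay`, and `applyProxy` are hypothetical, and the addresses come from the 192.0.2.0/24 documentation range, so they are placeholders rather than working proxies. `CURLOPT_PROXY` is the standard cURL option for routing a request through a proxy.

```php
<?php
declare(strict_types=1);

/** Hypothetical round-robin proxy pool. */
final class ProxyRotator
{
    /** @var string[] */
    private array $proxies;
    private int $index = 0;

    public function __construct(array $proxies)
    {
        if ($proxies === []) {
            throw new InvalidArgumentException('At least one proxy is required.');
        }
        $this->proxies = array_values($proxies);
    }

    /** Returns the next proxy, wrapping around at the end of the list. */
    public function next(): string
    {
        $proxy = $this->proxies[$this->index];
        $this->index = ($this->index + 1) % count($this->proxies);
        return $proxy;
    }
}

/** Sleeps for a random interval in [$minMs, $maxMs] and returns the chosen value. */
function politeDelay(int $minMs, int $maxMs): int
{
    $ms = random_int($minMs, $maxMs);
    usleep($ms * 1000);
    return $ms;
}

/** Points an existing cURL handle at the given proxy ("host:port"). */
function applyProxy($curlHandle, string $proxy): void
{
    curl_setopt($curlHandle, CURLOPT_PROXY, $proxy);
}

// Placeholder addresses only; replace with the real pre-loaded list.
$rotator = new ProxyRotator(['192.0.2.10:8080', '192.0.2.11:8080', '192.0.2.12:8080']);
$first  = $rotator->next();
$second = $rotator->next();
$waited = politeDelay(1, 5);
```

In a real run, each request would call `$rotator->next()`, pass the result to `applyProxy()`, and call `politeDelay()` with a much larger range (seconds rather than milliseconds) before the next request.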
The library should use a standard open-source HTML DOM parser. DO NOT use regular expressions to parse HTML!
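For reference, this is roughly what DOM-based extraction looks like with PHP's bundled `DOMDocument` and `DOMXPath` classes, with no regular expressions involved. The markup, class names, and `data-id` attribute are invented for the example; real pages will need their own XPath queries. The function name `extractPosts` is also hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Extracts id, author, and text from each <div class="post"> in the markup.
 * The markup structure is an assumption made for this sketch.
 */
function extractPosts(string $html): array
{
    if ($html === '') {
        return [];
    }

    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $posts = [];
    foreach ($xpath->query('//div[@class="post"]') as $node) {
        // Queries starting with "." are evaluated relative to the post node.
        $author = $xpath->query('.//span[@class="author"]', $node)->item(0);
        $text   = $xpath->query('.//p[@class="text"]', $node)->item(0);
        $posts[] = [
            'id'     => $node->getAttribute('data-id'),
            'author' => $author ? trim($author->textContent) : '',
            'text'   => $text ? trim($text->textContent) : '',
        ];
    }
    return $posts;
}

$html = '<div class="post" data-id="42">'
      . '<span class="author">alice</span>'
      . '<p class="text">Hello <a href="/bob">@bob</a></p>'
      . '</div>';

$posts = extractPosts($html);
```

Third-party open-source parsers would also satisfy the requirement; the built-in DOM extension is used here only because it needs no extra dependency.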
There is no need for a user interface in CSS, HTML, or JS; I just need the OOP PHP code, ready to use as a library. The library should be ready to plug into the libraries folder of Joomla CMS. There is no need to make it a component or module. I want a library that can be called and instantiated through a Factory constructor and easily used within any other existing class.
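The Factory-style entry point could look something like the sketch below. The names `ScraperInterface`, `StubScraper`, and `ScraperFactory` are hypothetical, the stub returns no data, and the alias map assumes that FB, YT, and TW stand for Facebook, YouTube, and Twitter.

```php
<?php
declare(strict_types=1);

/** Hypothetical contract shared by every network-specific scraper. */
interface ScraperInterface
{
    public function network(): string;

    /** @return array list of results for the search term */
    public function search(string $term): array;
}

/** Placeholder implementation; a real scraper would fetch and parse pages. */
final class StubScraper implements ScraperInterface
{
    private string $network;

    public function __construct(string $network)
    {
        $this->network = $network;
    }

    public function network(): string
    {
        return $this->network;
    }

    public function search(string $term): array
    {
        return []; // no network access in this sketch
    }
}

/** Hypothetical factory: maps a short network key to a scraper instance. */
final class ScraperFactory
{
    // Assumed meaning of the abbreviations used in the brief.
    private const NETWORKS = [
        'yooying' => 'Yooying',
        'fb'      => 'Facebook',
        'yt'      => 'YouTube',
        'tw'      => 'Twitter',
    ];

    public static function create(string $network): ScraperInterface
    {
        $key = strtolower(trim($network));
        if (!isset(self::NETWORKS[$key])) {
            throw new InvalidArgumentException("Unsupported network: {$network}");
        }
        return new StubScraper(self::NETWORKS[$key]);
    }
}

$scraper = ScraperFactory::create('YT');
$results = $scraper->search('#php');
```

Any existing class can then obtain a scraper with a single static call and depend only on the interface. In Joomla 3.x, classes placed under the libraries folder are typically made autoloadable with `JLoader::registerPrefix()`; the exact prefix and folder name are left open here.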
If everything is delivered as expected and I am satisfied with the product, I will award you a follow-up project, which is the second stage of this one.
Technologies: OO PHP, Joomla, cURL
Timeline: 2 weeks