What data can you provide me?
Technically, we can extract and deliver any data you can see on a website. However, every project involves legal considerations that need to be taken into account (scraping behind a login, complying with terms and conditions, complying with data privacy and copyright laws, etc.). When you submit your project request, our solution architecture team will work with our legal department to review your proposed project and make sure it won’t breach any legal best practices.
Extracting data from public websites is legal in many cases. In certain cases, however, it is illegal or against web scraping best practices because of specific data ownership laws that govern the data being extracted. Typically, this is the case when the terms and conditions explicitly state that web scraping isn’t allowed, or when extracting the data would breach data privacy or copyright laws.
Which data extraction solution is right for me?
At Scrapinghub, we have a data extraction solution to suit any requirement. We can offer once-off data dumps, data subscriptions or professional services arrangements to help you get the data you need in the way you need it.
When you submit your project request, a member of our solution architecture team will discuss your project requirements with you in more detail and propose the best solution to meet your needs.
How does your project scoping and execution process work?
Once you’ve submitted your project request, a member of our solution architecture team will reach out to you to set up a project discovery call. On that call, the solution architect will discuss your project in detail and gather the information they need to develop the optimal solution for your requirements. Within a couple of days, they will present this solution to you for your approval.
What technology are your crawlers built with?
All our crawlers are built using Scrapy, the open source web scraping framework our founders created. Additionally, we use numerous other open source frameworks that we’ve developed, which ensures you’re not locked into using proprietary technology. We use Crawlera as our proxy solution and Splash as a headless browser if one is required.
How do you ensure quality of the data?
At Scrapinghub, we specialise in developing data extraction solutions for projects with mission-critical business requirements. As a result, our number one priority is delivering high-quality data to our clients. To accomplish this, we have implemented a four-layer QA process that continuously monitors the health of our crawls and the quality of the extracted data.
How are setup fees calculated?
Setup fees may apply, subject to the nature of the project (once-off, subscription, custom), the complexity of the sites and the number of records being extracted. Our solution architecture team will assess each site on a case-by-case basis, and you will be provided with a final quote for approval.
What support do you offer?
Support is available to all of our customers. We offer support for coverage issues, missed deliveries, minor site changes, etc. If a larger change on the site means the spider needs a complete overhaul, this may fall outside our standard support offering and may incur additional cost. However, this is rare.
Can you provide the source code?
Yes, we can provide you access to the source code.
Do you offer data samples before purchasing?
Yes, if sample data is available for the data source. This can be provided in either CSV or JSON format. If it is a new source we have not crawled before, sample data will be provided after development kicks off, which happens after purchase.
Do you support one-time data extraction?
Yes, we support one-time extraction. Get in touch to tell us your requirements or find out more here.
How will I receive my data, and in what format?
We offer many delivery types, such as FTP, SFTP, AWS S3, Google Cloud Storage, email, Dropbox and Google Drive. Formats for delivery can be CSV, JSON, JSON Lines or XML.
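To give a feel for how the four delivery formats differ, here is a minimal Python sketch that renders the same two records as CSV, JSON, JSON Lines and XML using only the standard library. The records, the field names (name, price), the XML element names (items, item) and the helper function names are all hypothetical and chosen for illustration; the actual layout of a delivered file depends on your project.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records; field names are illustrative only.
records = [
    {"name": "Widget", "price": "9.99"},
    {"name": "Gadget", "price": "19.50"},
]


def to_csv(rows):
    """One header row, then one line per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """A single JSON array containing every record."""
    return json.dumps(rows, indent=2)


def to_jsonlines(rows):
    """One standalone JSON object per line, convenient for streaming."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def to_xml(rows):
    """An assumed <items>/<item> layout with one child element per field."""
    root = ET.Element("items")
    for row in rows:
        item = ET.SubElement(root, "item")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for label, render in [
        ("CSV", to_csv),
        ("JSON", to_json),
        ("JSON Lines", to_jsonlines),
        ("XML", to_xml),
    ]:
        print(f"--- {label} ---")
        print(render(records))
```

As a rough guide, CSV suits flat, spreadsheet-style data, JSON and XML can carry nested fields, and JSON Lines is easy to process one record at a time for large deliveries.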