Description of Octoparse
Octoparse is a data scraping tool that lets users collect publicly available data without writing code. It offers features such as auto-proxy and session timing setup to get past systems that block scraping. Octoparse applies machine learning algorithms to quickly recognize and extract data from complex websites. It can handle various types of data, including text, links, image URLs, and HTML.
Here are step-by-step instructions for setting up proxy parameters in Octoparse:
- Download and install Octoparse from the developer's site, then launch the application.
- Click "+New" in the upper left corner to create a new task. From the options offered, choose "Custom Task".

- In the URL input field, enter the URL of the page you want to collect data from. In this example, we'll use the site "books.toscrape.com". Then press the Save button.

- After the page loads, click on the "Settings" button in the upper right corner.

- At the bottom, find the section named "Anti-block settings".
- Check the box next to "Proxy Server Enable". The proxy settings and the "Configuration" button will then appear.

- Press the Configuration button and a popup window will appear. Copy and paste your stableproxy server addresses into the corresponding field. Addresses must be in IP:PORT format.
- Rotating residential proxies: under IP Selection, specify an address for rotating proxies. For example, we'll choose de-1.stableproxy.com.

- Set a switch timer, based on the session type and your preference.
- Press the Confirm button to save changes.
- To verify that the proxy integration with Octoparse is in place, make sure a checkmark appears in the "Anti-block settings" section in front of the "Configuration" button.
- To save changes, press the Save button.
- You will return to the main screen of the page you are analyzing.
- Click the light bulb icon to open it, then choose whether to go through the site page by page or to enable scrolling.
- After selecting an option, click "Create workflow".

- Select an element on the page that you want to analyze, for example, "Mystery". Click on it and select "Extract the text of the selected element".
- A popup window will appear. Click Save in the upper right corner and then Run.
- The next window lists the available run options. Choose the one that suits you best (some options may require payment). In our case, we will choose "Local Extraction" and "Standard Mode".

- A new page will open and the scraping process will begin. You can stop and resume the process at any time.
- As this is just an example, we'll stop here. Confirm that you want to stop the run.
- Statistics about your task will be displayed. Choose when to export the data: now or later; this time we'll choose "now".
- The last popup window will ask you to choose a format for the extracted data.
- Choose the most suitable data format.
Done! Your device is now set up and ready for the target task: advanced data collection from web pages using Octoparse.
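The proxy configuration step above asks for addresses in IP:PORT format, and a malformed entry is easy to miss when pasting a long list. The following is a minimal Python sketch, not part of Octoparse, for checking a list before pasting it into the Configuration window. The function names `is_valid_proxy` and `split_proxies` are hypothetical, the port 8000 and the address 203.0.113.5 are placeholders (use the values from your own proxy account), and accepting a hostname such as de-1.stableproxy.com in place of an IP is an assumption based on the rotating proxy example.

```python
import ipaddress
import re

# One or more dot-separated labels of letters, digits, and hyphens.
# Accepting hostnames (not only IPs) is an assumption for this sketch.
_HOST_RE = re.compile(
    r"^(?=.{1,253}$)"
    r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
    r"(?:\.[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$"
)


def is_valid_proxy(entry: str) -> bool:
    """Return True if entry looks like HOST:PORT with a port in 1..65535."""
    host, sep, port = entry.strip().rpartition(":")
    if not sep or not host:
        return False
    if not (port.isascii() and port.isdigit()):
        return False
    if not 1 <= int(port) <= 65535:
        return False
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        pass
    # Dotted numbers that failed IPv4 parsing (e.g. 999.1.1.1) are rejected
    # instead of being treated as hostnames.
    if re.fullmatch(r"[0-9.]+", host):
        return False
    return bool(_HOST_RE.match(host))


def split_proxies(lines):
    """Split raw lines into (valid, invalid) entries, skipping blank lines."""
    valid, invalid = [], []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        (valid if is_valid_proxy(entry) else invalid).append(entry)
    return valid, invalid


if __name__ == "__main__":
    # Placeholder entries; replace with the addresses from your proxy provider.
    raw = ["203.0.113.5:8000", "de-1.stableproxy.com:8000", "203.0.113.5"]
    ok, bad = split_proxies(raw)
    print("Paste these:", ok)
    print("Fix these:", bad)
```

Only the entries reported as valid should be pasted into the proxy field; anything in the second list is missing a port, has an out-of-range port, or has a malformed host.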