Learn how to create an Azure Function using a custom docker image to run a Selenium web scraper in Python

René Bremer

Selenium is the standard tool for automated web browser testing. On top of that, Selenium is a popular tool for web scraping. When creating a web scraper in Azure, Azure Functions is a logical candidate to run your code in. However, the default Azure Functions image does not contain the dependencies that Selenium requires. In this blog, a web scraper in Azure Functions is created as follows:

  • Create and deploy a docker image as an Azure Function with Selenium
  • Scrape websites periodically and store the results

The architecture of the web scraper is depicted below.

In the remainder, the steps to deploy and run your web scraper in Azure Functions are discussed. For details on how to secure your Azure Functions, see this blog. For details on how to create a custom docker image with OpenCV in Azure Functions, see here and the DockerFile here.

The base Azure Function image does not contain the chromium packages that are necessary to run the Selenium webdriver. This project creates a custom docker image with the required libraries such that it can be run as an Azure Function. The following steps are executed:

  • B01. Install prerequisites
  • B02. Clone project from GIT
  • B03. Create docker image using docker desktop
  • B04. Create Azure Function and deploy docker image

See also the architecture below.

B01. Install prerequisites

The following prerequisites need to be installed:

B02. Clone project from GIT

Run the command below to clone the project from git. If you did not install git, the zip file can also be downloaded and extracted.

git clone https://github.com/rebremer/azure-function-selenium.git

In this project, the following files can be found:

  • TimeTrigger/__init__.py: Python file that contains all code to scrape websites. This Azure Function is time triggered
  • HttpTrigger/__init__.py: Same as the previous bullet, however, this function is HTTP triggered and can be run from a browser
  • DockerFile: File that contains all commands to create the docker image that will be used in the next step
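To give an idea of what the scraping code in the two __init__.py files might look like, here is a minimal sketch. It is not the project's actual code: the function names scrape_links and unique_links are hypothetical, the Selenium 4 API is assumed, and it is assumed that chromium and chrome driver are on the PATH inside the container. The three Chrome flags are the ones commonly needed to run a browser in a container without a display.

```python
from typing import List, Optional

# Flags commonly needed to run Chrome inside a container without a display.
HEADLESS_FLAGS = ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def unique_links(hrefs: List[Optional[str]]) -> List[str]:
    """Drop empty values and duplicates while keeping the original order."""
    seen = set()
    result = []
    for href in hrefs:
        if href and href not in seen:
            seen.add(href)
            result.append(href)
    return result


def scrape_links(url: str) -> List[str]:
    """Open url in headless Chrome and return the href of every <a> element."""
    # Imported here so the module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    for flag in HEADLESS_FLAGS:
        options.add_argument(flag)

    # Assumption: chromedriver is discoverable on the PATH of the image.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        anchors = driver.find_elements(By.TAG_NAME, "a")
        return unique_links([a.get_attribute("href") for a in anchors])
    finally:
        # Always close the browser, otherwise Chrome processes pile up.
        driver.quit()
```

Inside a time triggered or HTTP triggered function, scrape_links would be called from the function's main entry point with the website to scrape.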

B03. Create docker image using docker desktop

Run the following commands to build an image that installs chromium, chrome driver and selenium on top of the Azure Function base image:

# Variables
$acr_id = "<<your acr>>.azurecr.io"
# Create docker image using docker desktop
docker login $acr_id -u <<your username>> -p <<your password>>
docker build --tag $acr_id/selenium .
# Push docker image to Azure Container Registry
docker push $acr_id/selenium:latest
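Before pushing, it can be useful to verify that the image really contains a browser and a driver. The sketch below is one possible check, meant to be run with the Python interpreter inside the container. The executable names are assumptions and differ per base image and package manager, and the helper names are hypothetical.

```python
import shutil
from typing import Dict, Iterable, Optional

# Candidate executable names; these are assumptions and vary per base image.
BROWSER_NAMES = ("chromium", "chromium-browser", "google-chrome")
DRIVER_NAMES = ("chromedriver",)


def first_on_path(names: Iterable[str]) -> Optional[str]:
    """Return the full path of the first executable found on PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def check_selenium_dependencies() -> Dict[str, Optional[str]]:
    """Report where the browser and the driver were found (None if missing)."""
    return {
        "browser": first_on_path(BROWSER_NAMES),
        "driver": first_on_path(DRIVER_NAMES),
    }


if __name__ == "__main__":
    for component, path in check_selenium_dependencies().items():
        print(f"{component}: {path or 'NOT FOUND'}")
```

If either entry is reported as not found, Selenium will most likely fail to start the browser once the function is deployed.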

B04. Create Azure Function and deploy docker image

Run the following commands to create an Azure Function and deploy the docker image from Azure Container Registry.

# Variables
$rg = "<<your resource group name>>"
$loc = "<<your location>>"
$plan = "<<your azure function plan P1v2>>"
$stor = "<<your storage account adhering to function>>"
$fun = "<<your azure function name>>"
$acr_id = "<<your acr>>.azurecr.io"
# Create resource group, storage account and app service plan
az group create -n $rg -l $loc
az storage account create -n $stor -g $rg --sku Standard_LRS
az appservice plan create --name $plan --resource-group $rg --sku P1v2 --is-linux
# Create Azure Function using docker image
az functionapp create --resource-group $rg --os-type Linux --plan $plan --deployment-container-image-name $acr_id/selenium:latest --name $fun --storage-account $stor

The Azure Function that was deployed in the previous step contains a time triggered function and an HTTP triggered function. In this part, the function will be triggered, scrape websites and store the results in a data lake account. The following steps are executed:

  • B11. Create data lake account
  • B12. Run HTTP trigger Function

See also the architecture below.

B11. Create data lake account and update function

Execute the following commands to create a data lake account in Azure and update the settings of the function.

# Variables
$rg = "<<your resource group name>>"
$loc = "<<your location>>"
$fun = "<<your azure function name>>"
$adls = "<<your storage account>>"
$sub_id = "<<your subscription id>>"
$container_name = "scraperesults"
# Create adlsgen2
az storage account create --name $adls --resource-group $rg --location $loc --sku Standard_RAGRS --kind StorageV2 --enable-hierarchical-namespace true
az storage container create --account-name $adls -n $container_name
# Assign identity to function and set params
az webapp identity assign --name $fun --resource-group $rg
az functionapp config appsettings set --name $fun --resource-group $rg --settings par_storage_account_name=$adls par_storage_container_name=$container_name
# Give fun MI RBAC role to ADLS gen 2 account
$fun_object_id = az functionapp identity show --name $fun --resource-group $rg --query 'principalId' -o tsv
New-AzRoleAssignment -ObjectId $fun_object_id -RoleDefinitionName "Storage Blob Data Contributor" -Scope "/subscriptions/$sub_id/resourceGroups/$rg/providers/Microsoft.Storage/storageAccounts/$adls/blobServices/default"

B12. Run Function

The time triggered function will run periodically to scrape websites. However, there is also an HTTP triggered function. Once its URL is retrieved, it can be pasted in the browser and run, see also below.
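Instead of a browser, the HTTP triggered function can also be called from a script. The sketch below assumes the default Azure Functions URL layout (https://<app>.azurewebsites.net/api/<function>), that the function is named after its folder (HttpTrigger), and that a function key is passed in the code query parameter when the function is not anonymous. The helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def function_url(app_name: str, function_name: str, key: Optional[str] = None) -> str:
    """Build the URL of an HTTP triggered function, optionally with a function key."""
    url = f"https://{app_name}.azurewebsites.net/api/{function_name}"
    if key:
        # Function keys often contain characters such as '=' that must be escaped.
        url += "?" + urlencode({"code": key})
    return url


def call_function(url: str, timeout: float = 300.0) -> str:
    """Send a GET request to the function and return the response body as text."""
    # Scraping can take a while, hence the generous default timeout.
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, call_function(function_url("<<your azure function name>>", "HttpTrigger", "<<your function key>>")) would trigger one scrape run and return whatever the function responds with.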

Once the function has run, the results are stored in the data lake account, see also below.
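How the results end up in the data lake is not shown in this blog, so the following is only a sketch of one way to do it. It assumes the azure-identity and azure-storage-file-datalake packages are installed in the image and that the function's managed identity has the role assigned in B11. The app settings par_storage_account_name and par_storage_container_name are the ones set in B11; the file naming scheme and the helper names are hypothetical.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def result_path(now: Optional[datetime] = None) -> str:
    """Build a dated file path so that each run writes to its own file."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y/%m/%d/scrape_%H%M%S.txt")


def store_results(text: str) -> str:
    """Upload text to the data lake container and return the path that was written."""
    # Imported here so the module still loads where the Azure SDK is not installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    # App settings are exposed to the function as environment variables.
    account = os.environ["par_storage_account_name"]
    container = os.environ["par_storage_container_name"]

    # DefaultAzureCredential picks up the managed identity when running in Azure.
    service = DataLakeServiceClient(
        account_url=f"https://{account}.dfs.core.windows.net",
        credential=DefaultAzureCredential(),
    )
    path = result_path()
    file_client = service.get_file_system_client(container).get_file_client(path)
    file_client.upload_data(text.encode("utf-8"), overwrite=True)
    return path
```

Using a managed identity keeps storage keys out of the function's configuration, which is why only the account and container names are passed as settings.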

Selenium is a popular tool for web scraping. However, the default Azure Functions image does not contain the dependencies that Selenium requires. In this blog, a web scraper in Azure Functions is created that installs these dependencies as follows:

  • Create and deploy a docker image as an Azure Function with Selenium
  • Scrape websites periodically and store the results

The architecture of the web scraper is depicted below.
