Python, Boto and AWS EC2


Most, if not all, software companies have adopted cloud infrastructure and services, and AWS in particular is very popular. The intention of this post is to host a few examples of using boto to work with one of the services available on AWS, namely EC2. More likely than not, you will need a mechanism to programmatically fire up a few instances, shut them down, filter instances, and send remote commands to them, to say the least.

Filter instances based on tag names from the AWS inventory

EC2 instances on AWS can have as many tags (key: value pairs) as required, for purposes like identifying an instance or a set of instances. Tags are also useful when the instance you are working on frequently needs to shut down and boot again and you haven’t set up an Elastic IP, because the public IP address is bound to change between a stop and the next start. Although you could argue for using the private IP to find an instance, that isn’t very effective when you have a lot of instances (>100).

Boto2
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1', aws_access_key_id='aws_access_id', aws_secret_access_key='aws_secret')
# tag filters use the form 'tag:<key>'; replace tagName and value with your own tag
reservations = conn.get_all_instances(filters={'tag:tagName': 'value'})
public_ips = [each_instance.ip_address for r in reservations for each_instance in r.instances]
# each_instance.private_ip_address  to get the private ip address of the instance
Boto3
import boto3
session = boto3.session.Session(aws_access_key_id='aws_access_id',
                                aws_secret_access_key='aws_secret',
                                region_name='us-east-1')
 
ec2 = session.resource('ec2')
instances = ec2.instances.filter(
    Filters=[{'Name':'tag:purpose', 'Values':['intelligence']}
])
public_ips = [each_instance.public_ip_address for each_instance in instances]
# each_instance.private_ip_address to get the private ip address of the instance
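When several tags have to match at once, the Filters list can be built from a plain dictionary. The helper below is only a sketch: build_tag_filters is a hypothetical name, not part of boto3, and it assumes nothing beyond the 'tag:<key>' filter naming used above.

```python
def build_tag_filters(tags):
    """Turn {'purpose': 'intelligence'} into the Filters list boto3 expects."""
    filters = []
    for key, value in sorted(tags.items()):
        # A single string becomes a one-element list; other iterables are copied.
        values = [value] if isinstance(value, str) else list(value)
        filters.append({'Name': 'tag:' + key, 'Values': values})
    return filters


if __name__ == '__main__':
    print(build_tag_filters({'purpose': 'intelligence', 'env': ['dev', 'qa']}))
```

The result can be passed straight to ec2.instances.filter(Filters=...). Separate filters narrow the result together, while several values inside one filter act as alternatives.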
Boot/Shutdown an instance/instances from the AWS inventory

Using boto, you can boot/shutdown/terminate instances.

Boto2
def start_stop_terminate_instance(instance_ids, conn, action='start'):
    if action == 'start':
        conn.start_instances(instance_ids=instance_ids)
    elif action == 'stop':
        conn.stop_instances(instance_ids=instance_ids)
    elif action == 'terminate':
        conn.terminate_instances(instance_ids=instance_ids)
Boto3
def start_stop_terminate_instance(instance_ids, conn, action='start'):
    if action == 'start':
        conn.instances.filter(InstanceIds=instance_ids).start()
    elif action == 'stop':
        conn.instances.filter(InstanceIds=instance_ids).stop()
    elif action == 'terminate':
        conn.instances.filter(InstanceIds=instance_ids).terminate()
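The Boto3 version repeats the same call three times, so it can also be written as a lookup on the action name. This is a sketch rather than a recommended API: apply_action and VALID_ACTIONS are hypothetical names, and the ec2 argument is assumed to be the resource object returned by session.resource('ec2'). The guard against an empty ID list is a precaution added here, not something the original code does.

```python
VALID_ACTIONS = ('start', 'stop', 'terminate')


def apply_action(ec2, instance_ids, action='start'):
    """Call start(), stop() or terminate() on the selected instances."""
    if action not in VALID_ACTIONS:
        raise ValueError('unsupported action: %r' % (action,))
    if not instance_ids:
        # Assumption: refuse to act when no IDs are given, so that nothing
        # is started, stopped or terminated by accident.
        return None
    collection = ec2.instances.filter(InstanceIds=instance_ids)
    # Look up the method with the same name as the action and call it.
    return getattr(collection, action)()
```

A typo such as action='stpo' now raises a ValueError instead of silently doing nothing.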
Create Instances based on various metrics

Boto makes use of the AWS APIs, which also allow creating instances. An EC2 instance has various properties. The most common is the instance type. Types group instances into families based on metrics such as compute power, performance and bandwidth. Commonly used general-purpose families are t2, m4 and m3; c5, c4 and c3 are compute optimized. For a process/application that leans towards in-memory activities, you’d use x1, r4 or r3. There are other families too, but the ones mentioned above are quite common. Within a family, the size (nano, micro, small, large, xlarge, 2xlarge, 4xlarge, 8xlarge, 10xlarge, etc.) sets how much CPU and memory the instance gets, so a full type reads like t2.small. Other properties of an instance include the instance ID, the key pair used to make a secure connection to the instance, tag names, display names, security groups, attached storage IDs, etc. Using boto we can create one or multiple instances based on these parameters.
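An instance type string such as t2.small combines the family and the size, separated by a dot. The small helper below, a sketch with the hypothetical name split_instance_type, pulls the two apart, which can be handy for validating input before asking AWS to create anything.

```python
def split_instance_type(instance_type):
    """Split 't2.small' into its family ('t2') and size ('small')."""
    family, sep, size = instance_type.partition('.')
    if not sep or not family or not size:
        raise ValueError('expected "<family>.<size>", got %r' % (instance_type,))
    return family, size


if __name__ == '__main__':
    for name in ('t2.small', 'c5.4xlarge', 'r4.large'):
        print(name, '->', split_instance_type(name))
```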

Boto2
import boto.ec2
conn = boto.ec2.connect_to_region('us-east-1', aws_access_key_id='aws_access_id', aws_secret_access_key='aws_secret')
conn.run_instances(
    'ami-ag139jf',
    min_count=10, 
    max_count=100,
    key_name='myKey',
    instance_type='t2.small',
    security_group_ids=['sg-4512']  # security_groups expects group names; IDs go here
)
Boto3
import boto3
session = boto3.session.Session(aws_access_key_id='aws_access_id',
                                aws_secret_access_key='aws_secret',
                                region_name='us-east-1')
 
ec2 = session.resource('ec2')
ec2.create_instances(
    ImageId='ami-ag139jf', 
    MinCount=10, 
    MaxCount=100, 
    InstanceType='t2.small',
    KeyName='myKey',
    SecurityGroupIds=['sg-4512']  # SecurityGroups expects group names; IDs go here
)
Send remote commands to an EC2 instance

Paramiko can be used to connect to a remote instance, send commands to be executed, and read the standard output/error to act accordingly.

import paramiko

# path_to_pem_file and instance_ip are placeholders for your own values
key = paramiko.RSAKey.from_private_key_file(path_to_pem_file)
client = paramiko.SSHClient()
# AutoAddPolicy accepts host keys it has not seen before
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

# Connect to the instance
try:
    # using username, public ip address and the pem file, create connection to the instance
    client.connect(hostname=instance_ip, username="ubuntu", pkey=key)

    # Execute command remotely.
    stdin, stdout, stderr = client.exec_command("ls -l")
    print(stdout.read().decode())
except Exception as e:
    print(e)
finally:
    # Close the connection whether or not the command succeeded
    client.close()
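To act on the result of a remote command, it helps to collect the exit status along with the output. The function below is a sketch with a hypothetical name, run_remote. It assumes a client that is already connected and behaves like paramiko.SSHClient, meaning exec_command() returns three file-like objects and stdout.channel.recv_exit_status() reports the exit code.

```python
def run_remote(client, command):
    """Run a command on a connected SSH client and return (status, out, err)."""
    stdin, stdout, stderr = client.exec_command(command)
    # Read both streams first, then ask for the exit status of the command.
    output = stdout.read().decode('utf-8', errors='replace')
    errors = stderr.read().decode('utf-8', errors='replace')
    exit_status = stdout.channel.recv_exit_status()
    return exit_status, output, errors
```

A caller can then branch on the status, for example treating any non-zero value as a failure and logging the captured error text.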

Web Scraping using Golang


Web scraping can be beneficial to individuals and companies. The intention of this post is to host a set of examples of web scraping using Golang and goquery. I will be using GitHub’s trending page https://github.com/trending throughout this post, mainly because it is well suited for applying various goquery methods. There are two other versions of this article, which replicate the same set of examples in Python and NodeJS.

Installation

go get github.com/PuerkitoBio/goquery

Get the HTML of a page
package main
import (
    "log"
    "io"
    "os"
    "net/http"
)

func ScrapeHTML(){
    resp, err := http.Get("https://github.com/trending")
    if err != nil{
        log.Fatal(err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != 200{
        log.Fatalf("status code error: %d %s", resp.StatusCode, resp.Status)
    }
    io.Copy(os.Stdout, resp.Body)    
}

func main(){
    ScrapeHTML()
}

Using goquery (a Golang library) to get the title of a page

package main
import (
    "fmt"
    "log"
    "net/http"
    "github.com/PuerkitoBio/goquery"
)

func ScrapeHTML(){
    resp, err := http.Get("https://github.com/trending")
    if err != nil{
        log.Fatal(err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != 200{
        log.Fatalf("status code error: %d %s", resp.StatusCode, resp.Status)
    }

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(doc.Find("title").Text())
   
}

func main(){
    ScrapeHTML()
}
Output

$ go run example.go
Trending repositories on GitHub today · GitHub

Using goquery to find a single element or multiple elements by tag name
package main
import (
    "fmt"
    "log"
    "strings"
    "net/http"
    "github.com/PuerkitoBio/goquery"
)


func scrapeUsingTagNames(){
    resp, err := http.Get("https://github.com/trending")
    if err != nil{
        log.Fatal(err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != 200{
        log.Fatalf("Status code error: %d %s", resp.StatusCode, resp.Status)
    }
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil{
        log.Fatal(err)
    }
    fmt.Println(doc.Find("title").Text())

    doc.Find("ol li").Each(func(i int, s *goquery.Selection){
        fmt.Println(strings.TrimSpace(s.Find("h3").Text()))
    })
}

func main(){
    scrapeUsingTagNames()
}
Output


$ go run example.go
Trending repositories on GitHub today · GitHubAsset 1Asset 1
you-dont-need / You-Dont-Need-Momentjs
ripienaar / free-for-dev
Nozbe / WatermelonDB
cjbarber / ToolsOfTheTrade
byoungd / English-level-up-tips-for-Chinese
TheAlgorithms / Python
thedaviddias / Front-End-Checklist
zziz / pwc
dawnlabs / carbon
CyC2018 / CS-Notes
Avik-Jain / 100-Days-Of-ML-Code
donnemartin / system-design-primer
mariusandra / pigeon-maps
Snailclimb / JavaGuide
JavaNoober / BackgroundLibrary
crossoverJie / JCSprout
Microsoft / nni
PansonPanson / Java-Notes
date-fns / date-fns
sindresorhus / ky
mciastek / sal
rwv / chinese-dos-games
vuejs / vue
GoogleCloudPlatform / open-match
lin-xin / vue-manage-system

Getting Attributes of an element
package main
import (
    "fmt"
    "log"
    "net/http"
    "github.com/PuerkitoBio/goquery"
)

func scrapeAttributes(){
    resp, err := http.Get("https://github.com/trending")
    if err != nil{
        log.Fatal(err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != 200{
        log.Fatalf("Status code error: %d %s", resp.StatusCode, resp.Status)
    }
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil{
        log.Fatal(err)
    }
    fmt.Println(doc.Find("title").Text())

    doc.Find("ol li").Each(func(i int, s *goquery.Selection){
        href, has_attr := s.Find("a").First().Attr("href")
        if has_attr{
            fmt.Println("https://github.com" + href)
        }

    })
}

func main(){
    scrapeAttributes()
}

Output


$ go run example.go
Trending repositories on GitHub today · GitHubAsset 1Asset 1
https://github.com/you-dont-need/You-Dont-Need-Momentjs
https://github.com/ripienaar/free-for-dev
https://github.com/Nozbe/WatermelonDB
https://github.com/cjbarber/ToolsOfTheTrade
https://github.com/byoungd/English-level-up-tips-for-Chinese
https://github.com/TheAlgorithms/Python
https://github.com/thedaviddias/Front-End-Checklist
https://github.com/zziz/pwc
https://github.com/dawnlabs/carbon
https://github.com/CyC2018/CS-Notes
https://github.com/Avik-Jain/100-Days-Of-ML-Code
https://github.com/donnemartin/system-design-primer
https://github.com/mariusandra/pigeon-maps
https://github.com/Snailclimb/JavaGuide
https://github.com/JavaNoober/BackgroundLibrary
https://github.com/crossoverJie/JCSprout
https://github.com/Microsoft/nni
https://github.com/PansonPanson/Java-Notes
https://github.com/date-fns/date-fns
https://github.com/sindresorhus/ky
https://github.com/mciastek/sal
https://github.com/rwv/chinese-dos-games
https://github.com/vuejs/vue
https://github.com/GoogleCloudPlatform/open-match
https://github.com/lin-xin/vue-manage-system

 

Using class name or other attributes to get element
package main
import (
    "fmt"
    "log"
    "strings"
    "net/http"
    "github.com/PuerkitoBio/goquery"
)
func scrapeViaClassName(){
    resp, err := http.Get("https://github.com/trending")
    if err != nil{
        log.Fatal(err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != 200{
        log.Fatalf("Status code error: %d %s", resp.StatusCode, resp.Status)
    }
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil{
        log.Fatal(err)
    }
    fmt.Println(doc.Find("title").Text())

    doc.Find("ol li").Each(func(i int, s *goquery.Selection){
        fmt.Println(strings.TrimSpace(s.Find(".float-sm-right").Text()))
    })
}


func main(){
    scrapeViaClassName()
}
Output


$ go run example.go
Trending repositories on GitHub today · GitHub
625 stars today
476 stars today
407 stars today
392 stars today
332 stars today
316 stars today
304 stars today
274 stars today
249 stars today
201 stars today
206 stars today
188 stars today
192 stars today
165 stars today
154 stars today
141 stars today
153 stars today
146 stars today
153 stars today
149 stars today
145 stars today
134 stars today
124 stars today
137 stars today
117 stars today
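The scraped text is still a string such as "625 stars today". To sort or compare repositories, the number has to be extracted first. The sketch below is written in Python, the language of one of the companion versions of this article; parse_stars_today is a hypothetical helper name, and the thousands-separator handling is an assumption about how larger counts might be rendered.

```python
import re


def parse_stars_today(text):
    """Return the integer from text such as '625 stars today', or None."""
    match = re.search(r'(\d[\d,]*)\s+stars?\s+today', text)
    if match is None:
        return None
    # Drop thousands separators such as the comma in '1,204'.
    return int(match.group(1).replace(',', ''))


if __name__ == '__main__':
    print(parse_stars_today('625 stars today'))
```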

Navigate the children of an element
package main
import (
    "fmt"
    "log"
    "strings"
    "net/http"
    "github.com/PuerkitoBio/goquery"
)
func navigateChildrens(){
    resp, err := http.Get("https://github.com/trending")
    if err != nil{
        log.Fatal(err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != 200{
        log.Fatalf("Status code error: %d %s", resp.StatusCode, resp.Status)
    }
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil{
        log.Fatal(err)
    }
    fmt.Println(doc.Find("title").Text())
    olSelection := doc.Find("ol")
    olSelection.Children().Each(func(i int, s *goquery.Selection){ // using .Children() on the ol selection to get all li
        fmt.Println(strings.TrimSpace(s.Find("h3").Text()))
    })
}


func main(){
    navigateChildrens()
}
Output


$ go run example.go
Trending repositories on GitHub today · GitHub
you-dont-need / You-Dont-Need-Momentjs
ripienaar / free-for-dev
Nozbe / WatermelonDB
cjbarber / ToolsOfTheTrade
byoungd / English-level-up-tips-for-Chinese
TheAlgorithms / Python
thedaviddias / Front-End-Checklist
zziz / pwc
dawnlabs / carbon
CyC2018 / CS-Notes
Avik-Jain / 100-Days-Of-ML-Code
donnemartin / system-design-primer
mariusandra / pigeon-maps
Snailclimb / JavaGuide
JavaNoober / BackgroundLibrary
crossoverJie / JCSprout
Microsoft / nni
PansonPanson / Java-Notes
date-fns / date-fns
sindresorhus / ky
mciastek / sal
rwv / chinese-dos-games
vuejs / vue
GoogleCloudPlatform / open-match
lin-xin / vue-manage-system

The .Children() method returns only the immediate children of the selected element.
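The difference between immediate children and all descendants is easy to see on a small nested list. The sketch below uses Python's standard xml.etree.ElementTree instead of goquery, so it only illustrates the idea; the markup in SNIPPET is made up for the example.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<ol>
  <li>first <ul><li>nested</li></ul></li>
  <li>second</li>
</ol>
"""

ol = ET.fromstring(SNIPPET.strip())

# Iterating an element yields only its immediate children.
immediate = [child.tag for child in ol]
# iter('li') walks the whole subtree, so the nested <li> is included.
descendants = list(ol.iter('li'))

print(len(immediate), len(descendants))  # prints "2 3"
```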

Navigating previous and next siblings of an element
package main
import (
    "fmt"
    "log"
    "strings"
    "net/http"
    "github.com/PuerkitoBio/goquery"
)

func navigateSiblings(){
    resp, err := http.Get("https://github.com/trending")
    if err != nil{
        log.Fatal(err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != 200{
        log.Fatalf("Status code error: %d %s", resp.StatusCode, resp.Status)
    }
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil{
        log.Fatal(err)
    }
    fmt.Println(doc.Find("title").Text())
    liSelection := doc.Find("ol li")
    fifthElement := liSelection.Eq(4) // using Eq() and passing the index we can navigate to the element with given index
    fmt.Println(strings.TrimSpace(fifthElement.Find("h3").Text()))
    fourthElement := fifthElement.Prev()
    fmt.Println(strings.TrimSpace(fourthElement.Find("h3").Text()))
    sixthElement := fifthElement.Next()
    fmt.Println(strings.TrimSpace(sixthElement.Find("h3").Text()))
}


func main(){
    navigateSiblings()
}
Output


$ go run example.go
Trending repositories on GitHub today · GitHub
byoungd / English-level-up-tips-for-Chinese
cjbarber / ToolsOfTheTrade
TheAlgorithms / Python

Putting it all together (GitHub Trending Scraper using Golang)
package main
import (
    "fmt"
    "log"
    "strings"
    "net/http"
    "github.com/PuerkitoBio/goquery"
)

func githubTrendingScraper(){
    resp, err := http.Get("https://github.com/trending")
    if err != nil{
        log.Fatal(err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != 200{
        log.Fatalf("Status code error: %d %s", resp.StatusCode, resp.Status)
    }
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil{
        log.Fatal(err)
    }

    fmt.Println(doc.Find("title").Text())
    doc.Find("ol li").Each(func (i int, s *goquery.Selection){
        repositoryName := strings.TrimSpace(s.Find("h3").Text())
        totalStarsToday := strings.TrimSpace(s.Find(".float-sm-right").Text())
        // Only prepend the host when the link actually has an href
        repoURL := "No valid url found"
        if href, ok := s.Find("a").Attr("href"); ok {
            repoURL = "https://github.com" + href
        }
        fmt.Println(repositoryName, "\t", totalStarsToday, "\t", repoURL)
    })

}


func main(){
    githubTrendingScraper()
}
Output


$ go run example.go
Trending repositories on GitHub today · GitHub
you-dont-need / You-Dont-Need-Momentjs      625 stars today https://github.com/you-dont-need/You-Dont-Need-Momentjs
ripienaar / free-for-dev                    476 stars today https://github.com/ripienaar/free-for-dev
Nozbe / WatermelonDB                        407 stars today https://github.com/Nozbe/WatermelonDB
cjbarber / ToolsOfTheTrade                  392 stars today https://github.com/cjbarber/ToolsOfTheTrade
byoungd / English-level-up-tips-for-Chinese 332 stars today https://github.com/byoungd/English-level-up-tips-for-Chinese
TheAlgorithms / Python                      316 stars today https://github.com/TheAlgorithms/Python
thedaviddias / Front-End-Checklist          304 stars today https://github.com/thedaviddias/Front-End-Checklist
zziz / pwc                                  274 stars today https://github.com/zziz/pwc
dawnlabs / carbon                           249 stars today https://github.com/dawnlabs/carbon
CyC2018 / CS-Notes                          201 stars today https://github.com/CyC2018/CS-Notes
Avik-Jain / 100-Days-Of-ML-Code             206 stars today https://github.com/Avik-Jain/100-Days-Of-ML-Code
donnemartin / system-design-primer          188 stars today https://github.com/donnemartin/system-design-primer
mariusandra / pigeon-maps                   192 stars today https://github.com/mariusandra/pigeon-maps
Snailclimb / JavaGuide                      165 stars today https://github.com/Snailclimb/JavaGuide
JavaNoober / BackgroundLibrary              154 stars today https://github.com/JavaNoober/BackgroundLibrary
crossoverJie / JCSprout                     141 stars today https://github.com/crossoverJie/JCSprout
Microsoft / nni                             153 stars today https://github.com/Microsoft/nni
PansonPanson / Java-Notes                   146 stars today https://github.com/PansonPanson/Java-Notes
date-fns / date-fns                         153 stars today https://github.com/date-fns/date-fns
sindresorhus / ky                           149 stars today https://github.com/sindresorhus/ky
mciastek / sal                              145 stars today https://github.com/mciastek/sal
rwv / chinese-dos-games                     134 stars today https://github.com/rwv/chinese-dos-games
vuejs / vue                                 124 stars today https://github.com/vuejs/vue
GoogleCloudPlatform / open-match            137 stars today https://github.com/GoogleCloudPlatform/open-match
lin-xin / vue-manage-system                 117 stars today https://github.com/lin-xin/vue-manage-system

vis.js Network Examples


The intention of this post is to host example code snippets so people can take ideas from them and build great visualizations of their own using VisJS. VisJS is a dynamic, browser-based visualization library. The library is designed to be easy to use, to handle large amounts of dynamic data, and to enable manipulation of and interaction with the data. The library consists of the components DataSet, Timeline, Network, Graph2d and Graph3d.

VisJS network [Nodes as images with label]
<!doctype html>
<html>
<head>
  <title>Bhishan's Services | TheTaraNights</title>

  <style type="text/css">

    body {
      font: 10pt arial;
    }
    #mynetwork {
      width: 100%;
      height: 900px;
      border: 1px solid lightgray;
      background-color:#333333;
    }
  </style>

  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.js"></script>
  <link href="https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis-network.min.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
  <script type="text/javascript">
    var nodes = null;
    var edges = null;
    var network = null;

    // Called when the Visualization API is loaded.
    function draw() {
      // create people.
      var DIR = 'https://www.thetaranights.com/wp-content/uploads/2018/fiverr_reviews/';

      var ratings = '<span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><br>';
      nodes = [
        {id: 1, shape: 'circularImage', image: "https://avatars3.githubusercontent.com/u/6163574?s=400&u=c396866f2c1ca2a709c1ece0d5a352e0f6e1a865&v=4", label:"Bhishan", size:60},
        // {id: 1, shape: 'circularImage', image: DIR + "bhishan.png", label:"Bhishan", size:60},

        {id: 2,  shape: 'circularImage', image: DIR + 'Benf97/cropped(1).png', label:"Benf97" },
        {id: 3,  shape: 'circularImage', image: DIR + 'Bigyoyo55/cropped(2).png', label:"Bigyoyo55"},
        {id: 4,  shape: 'circularImage', image: DIR + 'Btpope22/cropped(3).png', label:"Btpope22"},
        {id: 5,  shape: 'circularImage', image: DIR + 'Chrisraven/cropped(4).png', label: "Chrisraven"},
        {id: 6,  shape: 'circularImage', image: DIR + 'Cnnsbs/cropped(5).png', label:"Cnnsbs"},
        {id: 7,  shape: 'circularImage', image: DIR + 'Danielemariotto/cropped(6).png', label:"Danielemariotto"},
        {id: 8,  shape: 'circularImage', image: DIR + 'Davideguerrini/cropped(7).png', label:"Davideguerrini"},
        {id: 9,  shape: 'circularImage', image: DIR + 'Den_bdt/cropped(8).png', label:"Den_bdt"},
        {id: 10, shape: 'circularImage', image: DIR + 'Devinsays/cropped(9).png', label:"Devinsays"},

      ];

      var container = document.getElementById('mynetwork');
      var data = {
        nodes: nodes,
      };
      var options = {
        physics: {
          barnesHut: {
            avoidOverlap: 1,
            centralGravity: 0.2,
          },
        },
        nodes: {
          size: 40,
          color: {
            background: '#006400'
          },
          // label font settings belong under nodes.font
          font: {color: '#eeeeee', size: 30},
        },
      };
      network = new vis.Network(container, data, options);

    }
  </script>

</head>

<body onload="draw()">

<div id="mynetwork"></div>

</body>
</html>

VisJs Edges between nodes

<!doctype html>
<html>
<head>
  <title>Bhishan's Services | TheTaraNights</title>

  <style type="text/css">

    body {
      font: 10pt arial;
    }
    #mynetwork {
      width: 100%;
      height: 900px;
      border: 1px solid lightgray;
      background-color:#333333;
    }
  </style>

  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.js"></script>
  <link href="https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis-network.min.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
  <script type="text/javascript">

    var nodes = null;
    var edges = null;
    var network = null;

    // Called when the Visualization API is loaded.
    function draw() {
      // create people.
      var DIR = 'https://www.thetaranights.com/wp-content/uploads/2018/fiverr_reviews/';

      var ratings = '<span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><br>';
      nodes = [
        {id: 1, shape: 'circularImage', image: "https://avatars3.githubusercontent.com/u/6163574?s=400&u=c396866f2c1ca2a709c1ece0d5a352e0f6e1a865&v=4", label:"Bhishan", size:60},
        {id: 2,  shape: 'circularImage', image: DIR + 'Benf97/cropped(1).png', label:"Benf97" },
        {id: 3,  shape: 'circularImage', image: DIR + 'Bigyoyo55/cropped(2).png', label:"Bigyoyo55"},
        {id: 4,  shape: 'circularImage', image: DIR + 'Btpope22/cropped(3).png', label:"Btpope22"},
        {id: 5,  shape: 'circularImage', image: DIR + 'Chrisraven/cropped(4).png', label: "Chrisraven"},
        {id: 6,  shape: 'circularImage', image: DIR + 'Cnnsbs/cropped(5).png', label:"Cnnsbs"},
        {id: 7,  shape: 'circularImage', image: DIR + 'Danielemariotto/cropped(6).png', label:"Danielemariotto"},
        {id: 8,  shape: 'circularImage', image: DIR + 'Davideguerrini/cropped(7).png', label:"Davideguerrini"},
        {id: 9,  shape: 'circularImage', image: DIR + 'Den_bdt/cropped(8).png', label:"Den_bdt"},
        {id: 10, shape: 'circularImage', image: DIR + 'Devinsays/cropped(9).png', label:"Devinsays"},

      ];

      // create connections between people

      edges = [];
      for(var i =2; i<11; i++){
        edges.push({from: 1, to: i});
      }

      var container = document.getElementById('mynetwork');
      var data = {
        nodes: nodes,
        edges: edges
      };
      var options = {

        nodes: {
            size:40,
              color: {
              background: '#006400'
            },
            font:{color:'#eeeeee', "size": 10},
        },

      };
      network = new vis.Network(container, data, options);

    }
  </script>

</head>

<body onload="draw()">

<div id="mynetwork"></div>

</body>
</html>

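When the node list comes from a script instead of being typed by hand, the same star-shaped edge list can be generated and emitted as JSON for the page to load. The sketch below is in Python; star_edges is a hypothetical helper that mirrors the for-loop in the example above, where node 1 is connected to every other node.

```python
import json


def star_edges(hub_id, node_ids):
    """Connect the hub to every other node id, skipping the hub itself."""
    return [{'from': hub_id, 'to': node_id}
            for node_id in node_ids if node_id != hub_id]


if __name__ == '__main__':
    # Nodes 1..10 as in the example, with node 1 as the hub.
    print(json.dumps(star_edges(1, range(1, 11))))
```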
VisJS onclick() and onhover()
<!doctype html>
<html>
<head>
  <title>Bhishan's Services | TheTaraNights</title>

  <style type="text/css">

    body {
      font: 10pt arial;
    }
    #mynetwork {
      width: 100%;
      height: 900px;
      border: 1px solid lightgray;
      background-color:#333333;
    }
  </style>

  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.js"></script>
  <link href="https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis-network.min.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
  <script type="text/javascript">
    // var DIR = 'https://www.thetaranights.com/wp-content/uploads/2018/fiverr_reviews/';

    var nodes = null;
    var edges = null;
    var network = null;

    // Called when the Visualization API is loaded.
    function draw() {
      // create people.
      var DIR = 'https://www.thetaranights.com/wp-content/uploads/2018/fiverr_reviews/';

      var ratings = '<span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><br>';
      nodes = [
        {id: 1, shape: 'circularImage', image: "https://avatars3.githubusercontent.com/u/6163574?s=400&u=c396866f2c1ca2a709c1ece0d5a352e0f6e1a865&v=4", label:"Bhishan", size:60},
        {id: 2,  shape: 'circularImage', image: DIR + 'Benf97/cropped(1).png', label:"Benf97", "title": ratings + 'This guy is honestly amazing! <br>I have never experienced such pure dedication and commitment to a task, despite struggles, and a final product of which is outstanding!'},
        {id: 3,  shape: 'circularImage', image: DIR + 'Bigyoyo55/cropped(2).png', label:"Bigyoyo55", "title": ratings + "Very good seller mastering python and script development ! <br>Thanks for all<br>" + ratings + "Very good python developper, and very patient !<br> Thanks for all"},
        {id: 4,  shape: 'circularImage', image: DIR + 'Btpope22/cropped(3).png', label:"Btpope22", "title": ratings + "I really needed help getting anaconda installed on my laptop - it wasn't working correctly.<br> He was available right away and quickly assessed then fixed my issue."},
        {id: 5,  shape: 'circularImage', image: DIR + 'Chrisraven/cropped(4).png', label: "Chrisraven", "title": ratings + "Outstanding Experience!"},
        {id: 6,  shape: 'circularImage', image: DIR + 'Cnnsbs/cropped(5).png', label:"Cnnsbs", "title": ratings + "Awesome!!! I want to ask for work next time. <br>He wokred exact what I want."},
        {id: 7,  shape: 'circularImage', image: DIR + 'Danielemariotto/cropped(6).png', label:"Danielemariotto", "title": ratings + "Outstanding Experience!<br>" + ratings + "Outstanding Experience!"},
        {id: 8,  shape: 'circularImage', image: DIR + 'Davideguerrini/cropped(7).png', label:"Davideguerrini", "title": ratings + "A really great experience.<br> Bhishan Is very quick and professional. <br>A pleasure to work with him!"},
        {id: 9,  shape: 'circularImage', image: DIR + 'Den_bdt/cropped(8).png', label:"Den_bdt", "title": ratings + "Outstanding Experience!"},
        {id: 10, shape: 'circularImage', image: DIR + 'Devinsays/cropped(9).png', label:"Devinsays", "title": ratings + "Bhishan built a script for us to get Facebook data through their API.<br> Delivered quickly, quality job.<br>" + ratings + "Bhishan got the Alexa rank for about 2,000 websites we had in a spreadsheet."},

      ];

      // create connections between people

      edges = [];
      for(var i =2; i<11; i++){
        edges.push({from: 1, to: i});
      }

      var container = document.getElementById('mynetwork');
      var data = {
        nodes: nodes,
        edges: edges
      };
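      var options = {

        nodes: {
            size:40,
              color: {
              background: '#006400'
            },
            font:{color:'#eeeeee', "size": 10},
        },
        // hoverNode only fires when interaction.hover is enabled
        interaction: {hover: true},

      };
      network = new vis.Network(container, data, options);

      network.on("click", function (params) {
          // params.nodes lists the ids of the clicked nodes
          console.log("click on nodes:", params.nodes);
      });
      network.on("hoverNode", function (params) {
          // params.node is the id of the hovered node
          console.log("hover over node:", params.node);
      });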


    }
  </script>

</head>

<body onload="draw()">

<div id="mynetwork"></div>

</body>
</html>

Putting it all together [Bhishan’s Freelance Network]
<!doctype html>
<html>
<head>
  <title>Bhishan's Services | TheTaraNights</title>

  <style type="text/css">
    .checked {
      color: orange;
    }
    body {
      font: 10pt arial;
    }
    #mynetwork {
      width: 100%;
      height: 900px;
      border: 1px solid lightgray;
      background-color:#333333;
    }
  </style>

  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis.js"></script>
  <link href="https://cdnjs.cloudflare.com/ajax/libs/vis/4.21.0/vis-network.min.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
  <script type="text/javascript">

    var nodes = null;
    var edges = null;
    var network = null;

    // Called when the Visualization API is loaded.
    function draw() {
      // create people.

      var DIR = 'https://www.thetaranights.com/wp-content/uploads/2018/fiverr_reviews/';


      var ratings = '<span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><span class="fa fa-star checked"></span><br>';
      nodes = [
        {id: 2,  shape: 'circularImage', image: DIR + 'Benf97/cropped(1).png', label:"Benf97", "title": ratings + 'This guy is honestly amazing! <br>I have never experienced such pure dedication and commitment to a task, despite struggles, and a final product of which is outstanding!'},
        {id: 3,  shape: 'circularImage', image: DIR + 'Bigyoyo55/cropped(2).png', label:"Bigyoyo55", "title": ratings + "Very good seller mastering python and script development ! <br>Thanks for all<br>" + ratings + "Very good python developper, and very patient !<br> Thanks for all"},
        {id: 4,  shape: 'circularImage', image: DIR + 'Btpope22/cropped(3).png', label:"Btpope22", "title": ratings + "I really needed help getting anaconda installed on my laptop - it wasn't working correctly.<br> He was available right away and quickly assessed then fixed my issue."},
        {id: 5,  shape: 'circularImage', image: DIR + 'Chrisraven/cropped(4).png', label: "Chrisraven", "title": ratings + "Outstanding Experience!"},
        {id: 6,  shape: 'circularImage', image: DIR + 'Cnnsbs/cropped(5).png', label:"Cnnsbs", "title": ratings + "Awesome!!! I want to ask for work next time. <br>He wokred exact what I want."},
        {id: 7,  shape: 'circularImage', image: DIR + 'Danielemariotto/cropped(6).png', label:"Danielemariotto", "title": ratings + "Outstanding Experience!<br>" + ratings + "Outstanding Experience!"},
        {id: 8,  shape: 'circularImage', image: DIR + 'Davideguerrini/cropped(7).png', label:"Davideguerrini", "title": ratings + "A really great experience.<br> Bhishan Is very quick and professional. <br>A pleasure to work with him!"},
        {id: 9,  shape: 'circularImage', image: DIR + 'Den_bdt/cropped(8).png', label:"Den_bdt", "title": ratings + "Outstanding Experience!"},
        {id: 10, shape: 'circularImage', image: DIR + 'Devinsays/cropped(9).png', label:"Devinsays", "title": ratings + "Bhishan built a script for us to get Facebook data through their API.<br> Delivered quickly, quality job.<br>" + ratings + "Bhishan got the Alexa rank for about 2,000 websites we had in a spreadsheet."},
        {id: 11, shape: 'circularImage', image: DIR + 'Dxbst18/cropped(10).png', label:"Dxbst18", "title": ratings + "Outstanding Experience!"},
        {id: 12, shape: 'circularImage', image: DIR + 'Elirosenblatt/cropped(11).png', label:"Elirosenblatt", "title": ratings + "Bhishan, as always, was quick to help troubleshoot and went beyond my expectations.<br> A++ highly recommended!<br>" + ratings + "bhishan did a FANTASTIC job. Great communication, and script worked perfectly.<br> Would for sure do business with him again!"},
        {id: 13, shape: 'circularImage', image: DIR + 'Everinearnest/cropped(12).png', label:"Everinearnest", "title": ratings + "Bhishan is a skilled troubleshooter who had a lot of patience with the unusual problems I was experiencing.<br> He is a very good communicator who explains the rationale for what he is doing, and is a pleasure to work with. <br>Highly recommended."},
        {id: 14, shape: 'circularImage', image: DIR + 'Faizallala/cropped(13).png', label:"Faizallala", "title": ratings + "Outstanding Experience!"},
        {id: 15, shape: 'circularImage', image: DIR + 'Geoman2222/cropped(14).png', label:"Geoman2222", "title": ratings + "SOOOO Fast He was Awsome<br>" + ratings + "Very Knowledgeable and concurred any task I gave him"},
        {id: 16, shape: 'circularImage', image: DIR + 'Gmorinan/cropped(15).png', label:"Gmorinan", "title": ratings + "Outstanding"},
        {id: 17,  shape: 'circularImage', image: DIR + 'Babasanfoor/cropped.png', label:"Babasanfoor", "title":ratings + "great experience<br>" + ratings + 'Excellent communication, excellent proficiency, will (and already have) order again.'},
        {id: 18,  shape: 'circularImage', image: DIR + 'Goaway77/cropped(16).png', label:"Goaway77", "title": ratings + "He solved the problem! Thanks a lot, A+++ coder! Highly recommended! "},
        {id: 19,  shape: 'circularImage', image: DIR + 'Hartlepool/cropped(17).png', label:"Hartlepool", "title": ratings + "Great job, very helpful"},
        {id: 20,  shape: 'circularImage', image: DIR + 'Hkinq1/cropped(18).png', label:"Hkinq1", "title": ratings + "Outstanding Experience!"},
        {id: 21,  shape: 'circularImage', image: DIR + 'Ikeqian/cropped(19).png', label:"Ikeqian", "title": ratings + "Pretty good. <br>Very fast and helpful!"},
        {id: 22,  shape: 'circularImage', image: DIR + 'Iveksl2/cropped(20).png', label:"Iveksl2", "title": ratings + "Outstanding Experience!<br>" + ratings + "bhishan wrote readable / concise code. <br>Did a good job"},
        {id: 23,  shape: 'circularImage', image: DIR + 'Jamoxie/cropped(21).png', label:"Jamoxie", "title": ratings + "Bhishan is the best coder that I've worked with: clear and thorough communication, well documented code, fast working and delivers before deadlines.<br> Also he exceeded my request with a feature that I didn't think could be added yet, so I could not be happier with this project. <br>Will be back soon!"},
        {id: 24,  shape: 'circularImage', image: DIR + 'Jeamer/cropped(22).png', label:"Jeamer", "title": ratings + "Outstanding Experience!"},
        {id: 25,  shape: 'circularImage', image: DIR + 'Jeffmactech/cropped(23).png', label:"Jeffmactech", "title": ratings + "Delivered EXACTLY what I needed! Thanks!"},
        {id: 26,  shape: 'circularImage', image: DIR + 'Jlondonuk/cropped(24).png', label:"Jlondonuk", "title": ratings + "Absolutely fantastic service! <br>Definitely buying again. I am so impressed!<br> Thank you Seller was kind, assuring, professional and completed it quickly!"},
        {id: 27,  shape: 'circularImage', image: DIR + 'Kiwibloke11/cropped(25).png', label:"Kiwibloke11", "title": ratings + "absolute legend. nailed it quickly, lots of communication will use again"},
        {id: 28,  shape: 'circularImage', image: DIR + 'Lauro1986/cropped(26).png', label:"Lauro1986", "title": ratings + "He was great. Project was delivered very quickly, very knowledgeable.<br> I recommend!"},
        {id: 29,  shape: 'circularImage', image: DIR + 'Manuel13/cropped(27).png', label:"Manuel13", "title": ratings + "Creative and outstanding work!<br> A real pro within this realm."},
        {id: 30,  shape: 'circularImage', image: DIR + 'Matheusroriz/cropped(28).png', label:"Matheusroriz", "title": ratings + "Awesome! He was very helpful with everything!<br> We will certainly do business again!"},
        {id: 31,  shape: 'circularImage', image: DIR + 'Michaelgrinberg/cropped(29).png', label:"Michaelgrinberg", "title": ratings + "I needed to parse a string using Python inside Zapier. <br>The final goal was to make sure that the utm parameters get passed on to the CRM.<br> Bhishan was able to deliver a script that works well, does the job.<br> He was ahead of time and very friendly. <br>I would recommend the gig and its owner to everybody who needs to do a similar job."},
        {id: 32,  shape: 'circularImage', image: DIR + 'Newenglandmedia/cropped(30).png', label:"Newenglandmedia", "title": ratings + "The code is very well written and works.<br> Bhishan parameterized the items that I was looking for in the script so that I could take it from here.<br> Look forward to working with him again. "},
        {id: 33,  shape: 'circularImage', image: DIR + 'Nielsdo/cropped(31).png', label:"Nielsdo", "title": ratings + "Perfect!"},
        {id: 34,  shape: 'circularImage', image: DIR + 'Nilsbor/cropped(32).png', label:"Nilsbor", "title": ratings + "Everything was good"},
        {id: 35,  shape: 'circularImage', image: DIR + 'Paulolpduarte/cropped(33).png', label:"Paulolpduarte", "title": ratings + "Excellent job! Recommended!"},
        {id: 36,  shape: 'circularImage', image: DIR + 'Picture93/cropped(34).png', label:"Picture93", "title": ratings + "Outstanding Experience!<br>" + ratings + "Outstanding Experience!"},
        {id: 37,  shape: 'circularImage', image: DIR + 'Qms123456/cropped(35).png', label:"Qms123456", "title": ratings + "Outstanding Experience!"},
        {id: 38,  shape: 'circularImage', image: DIR + 'Rayrock2014/cropped(36).png', label:"Rayrock2014", "title": ratings + "As usual excellent<br>" + ratings + "Brilliant<br>" + ratings + "Awesome engineer"},
        {id: 39,  shape: 'circularImage', image: DIR + 'Realrocker/cropped(37).png', label:"Realrocker", "title": ratings + "Once again great service...Thanks</br>" + ratings + "Great Seller! <br>Exactly as requested, will hire him again and again"},
        {id: 40,  shape: 'circularImage', image: DIR + 'Ritesh2407/cropped(38).png', label:"Ritesh2407", "title": ratings + "I have previously worked with Bishan and he is great Python Guy to go for. <br>Always recommended.<br>" + ratings + "Outstanding Experience!"},
        {id: 41,  shape: 'circularImage', image: DIR + 'Roverfanclub/cropped(39).png', label:"Roverfanclub", "title": ratings + "Bhishan provided an excellent programmatic solution in a very short turnaround. <br>Will definitely look to him for solutions to future programming problems. "},
        {id: 42,  shape: 'circularImage', image: DIR + 'Samartking/cropped(40).png', label:"Samartking", "title": ratings + "Great seller! Highly recommended!<br>" + ratings + "!!!!Outstanding experience!!!!! <br>This seller is a great person and his work is Outstanding! <br>I definitely buy again!"},
        {id: 43,  shape: 'circularImage', image: DIR + 'Shahaleelal/cropped(41).png', label:"Shahaleelal", "title": ratings + "outstanding"},
        {id: 44,  shape: 'circularImage', image: DIR + 'Somo_king/cropped(42).png', label:"Somo_king", "title": ratings + "he is a life saver. thank you"},
        {id: 45,  shape: 'circularImage', image: DIR + 'Stirling198/cropped(43).png', label:"Stirling198", "title": ratings + "Outstanding Experience!"},
        {id: 46,  shape: 'circularImage', image: DIR + 'Subzerom/cropped(44).png', label:"Subzerom", "title": ratings + "Excellent Seller! Highly Skilled, great communication, very fast"},
        {id: 47,  shape: 'circularImage', image: DIR + 'Topdrawersrq/cropped(45).png', label:"Topdrawersrq", "title": ratings + "Outstanding Experience!"},
        {id: 48,  shape: 'circularImage', image: DIR + 'Vatsaldesai/cropped(46).png', label:"Vatsaldesai", "title": ratings + "quick delivery..excellent work"},
        {id: 49,  shape: 'circularImage', image: DIR + 'Vivekgarg172/cropped(47).png', label:"Vivekgarg172", "title": ratings + "Outstanding Experience!<br>" + ratings + "Great experience! <br>Bishan is very responsive and quality of work is great."},
        {id: 50,  shape: 'circularImage', image: DIR + 'Vseomedia/cropped(48).png', label:"Vseomedia", "title": ratings + "Great. Several works with him.<br>" + ratings + "i'll contact him again.<br> Great job as i request it."},
        {id: 51,  shape: 'circularImage', image: DIR + 'Webm87/cropped(49).png', label:"Webm87", "title": ratings + "Outstanding Experience!"},
        {id: 52,  shape: 'circularImage', image: DIR + 'Xspdo2/cropped(50).png', label:"Xspdo2", "title": ratings + "Excellente!!<br>" + ratings + "Excellente!!<br>" + ratings + "Quality Work!! Excelente Dev!"},



      ];


      var container = document.getElementById('mynetwork');
      var data = {
        nodes: nodes,
      };
      var options = {
        interaction:{hover:true},
        font:{
          size: 100,
        },

        physics: {
          barnesHut: {
            avoidOverlap: 1,
            centralGravity: 0.2,
          },
          repulsion:{
            nodeDistance: 1000,
          },
        },

        nodes: {
          size:90,
            color: {
            background: '#006400'
          },
          font:{color:'#eeeeee', size: 30},
        },

      };
      network = new vis.Network(container, data, options);
      network.on("click", function (params) {
          params.event = "[original event]";
 
      });
      network.on("hoverNode", function (params) {
          params.event = "[original event]";

      });

      window.onresize = function() {network.fit();}

    }
  </script>

</head>

<body onload="draw()">

<div id="mynetwork"></div>

</body>
</html>



Web Scraping with NodeJS

- - JavaScript, Tutorials

Web scraping interests many businesses and individuals because of the large amount of quantitative data available online. The data collected can support the growth of an organization or a personal business. In this post, we will see through examples how NodeJS can be used to scrape content from a website. The intention of this post is to host example code snippets that people can adapt to build scrapers for their own needs using the cheerio module in NodeJS. I will use GitHub's trending page https://github.com/trending for all the examples, because it lends itself well to demonstrating various cheerio methods.

Installation


npm install --save promise request-promise cheerio

Get html of a page:
var Promise = require("promise");
var request = require("request-promise");
var cheerio = require("cheerio");

function getHtml(){
    return new Promise(function(resolve, reject){
        request("https://github.com/trending")
            .then(cheerio.load)
            .then(console.log)
            .then(resolve)
            .catch(reject); // forward a failed request to the caller
    });
}
Using cheerio to get title from a page
function getHtml(){
    return new Promise(function(resolve, reject){
        request("https://github.com/trending")
            .then(cheerio.load, requestError)
            .then(scrapeContent)
            .then(resolve)
            .catch(reject); // forward any error to the caller
    });
}


function scrapeContent($){
    console.log($('title').text());
}

function requestError() {
    console.log("The trending page could not be loaded!");
    throw Error("Could not fetch html!");

}


getHtml();
Find single element by tag name, find multiple elements by tag name
function getHtml(){
    return new Promise(function(resolve, reject){
        request("https://github.com/trending")
            .then(cheerio.load, requestError)
            .then(scrapeContent)
            .then(resolve)
            .catch(reject); // forward any error to the caller
    });
}


function scrapeContent($){
    //console.log($);
    console.log($('title').text());
    var all_li_elements = $('ol').find('li');
    all_li_elements.each(function(item){
        console.log($(this).find('h3').text().trim());
    });
    
}

function requestError() {
    console.log("The trending page could not be loaded!");
    throw Error("Could not fetch html!");
}


getHtml();
Output


Trending repositories on GitHub today · GitHub
Microsoft / FASTER
MichaelMure / git-bug
google / python-fire
Droogans / unmaintainable-code
Avik-Jain / 100-Days-Of-ML-Code
pwxcoo / chinese-xinhua
JuliaLang / julia
r-spacex / SpaceX-API
IEEEKeralaSection / rescuekerala
react-tools / react-move
imhuay / Interview_Notes-Chinese
crossoverJie / JCSprout
benhoyt / goawk
salesforce / TransmogrifAI
Jeffail / benthos
aykevl / tinygo
astorfi / Deep-Learning-World
vuejs / vue
firehol / netdata
loveRandy / vue-cli3.0-vueadmin
trekhleb / javascript-algorithms
palmerhq / react-async-elements
messeb / ios-project-env-setup
jianstm / Schedule
kholia / OSX-KVM

Getting Attributes of an element
function getHtml(){
    return new Promise(function(resolve, reject){
        request("https://github.com/trending")
            .then(cheerio.load, requestError)
            .then(scrapeContent)
            .then(resolve)
            .catch(reject); // forward any error to the caller
    });
}


function scrapeContent($){
    //console.log($);
    console.log($('title').text());
    var all_li_elements = $('ol').find('li');
    all_li_elements.each(function(item){
        // href already starts with "/", which is why the URLs in the output contain "//"
        console.log("https://github.com/" + $(this).find('a').attr('href'));
    });
}

function requestError() {
    console.log("The trending page could not be loaded!");
    throw Error("Could not fetch html!");

}


getHtml();
Output


Trending repositories on GitHub today · GitHub
https://github.com//Microsoft/FASTER
https://github.com//MichaelMure/git-bug
https://github.com//google/python-fire
https://github.com//Droogans/unmaintainable-code
https://github.com//Avik-Jain/100-Days-Of-ML-Code
https://github.com//pwxcoo/chinese-xinhua
https://github.com//JuliaLang/julia
https://github.com//r-spacex/SpaceX-API
https://github.com//IEEEKeralaSection/rescuekerala
https://github.com//react-tools/react-move
https://github.com//imhuay/Interview_Notes-Chinese
https://github.com//crossoverJie/JCSprout
https://github.com//benhoyt/goawk
https://github.com//salesforce/TransmogrifAI
https://github.com//Jeffail/benthos
https://github.com//aykevl/tinygo
https://github.com//astorfi/Deep-Learning-World
https://github.com//vuejs/vue
https://github.com//firehol/netdata
https://github.com//loveRandy/vue-cli3.0-vueadmin
https://github.com//trekhleb/javascript-algorithms
https://github.com//palmerhq/react-async-elements
https://github.com//messeb/ios-project-env-setup
https://github.com//jianstm/Schedule
https://github.com//kholia/OSX-KVM

Using class name or other attributes to get element
function getHtml(){
    return new Promise(function(resolve, reject){
        request("https://github.com/trending")
            .then(cheerio.load, requestError)
            .then(scrapeContent)
            .then(resolve)
            .catch(reject); // forward any error to the caller
    });
}


function scrapeContent($){
    //console.log($);
    console.log($('title').text());
    var all_li_elements = $('ol').find('li');
    all_li_elements.each(function(item){
        console.log($(this).find('.float-sm-right').text().trim());
    });
}

function requestError() {
    console.log("The trending page could not be loaded!");
    throw Error("Could not fetch html!");

}


getHtml();
Output
Trending  repositories on GitHub today · GitHubAsset 1Asset 1
880 stars today
744 stars today
614 stars today
311 stars today
191 stars today
182 stars today
178 stars today
179 stars today
103 stars today
152 stars today
134 stars today
129 stars today
128 stars today
126 stars today
125 stars today
122 stars today
104 stars today
99 stars today
107 stars today
108 stars today
101 stars today
102 stars today
89 stars today
88 stars today
76 stars today
Navigating the children of an element
function getHtml(){
    return new Promise(function(resolve, reject){
        request("https://github.com/trending")
            .then(cheerio.load, requestError)
            .then(scrapeContent)
            .then(resolve)
            .catch(reject); // forward any error to the caller
    });
}


function scrapeContent($){
    //console.log($);
    console.log($('title').text());
    var all_li_elements = $('ol').children(); // using children()
    all_li_elements.each(function(item){
        console.log($(this).find('h3').text().trim());
    });
}

function requestError() {
    console.log("The trending page could not be loaded!");
    throw Error("Could not fetch html!");

}


getHtml();
Output


Trending repositories on GitHub today · GitHub
Microsoft / FASTER
MichaelMure / git-bug
google / python-fire
Droogans / unmaintainable-code
Avik-Jain / 100-Days-Of-ML-Code
pwxcoo / chinese-xinhua
JuliaLang / julia
r-spacex / SpaceX-API
IEEEKeralaSection / rescuekerala
react-tools / react-move
imhuay / Interview_Notes-Chinese
crossoverJie / JCSprout
benhoyt / goawk
salesforce / TransmogrifAI
Jeffail / benthos
aykevl / tinygo
astorfi / Deep-Learning-World
vuejs / vue
firehol / netdata
loveRandy / vue-cli3.0-vueadmin
trekhleb / javascript-algorithms
palmerhq / react-async-elements
messeb / ios-project-env-setup
jianstm / Schedule
kholia / OSX-KVM

The .children() method returns only the immediate children of the parent element.

Navigating previous and next siblings of an element
function getHtml(){
    return new Promise(function(resolve, reject){
        request("https://github.com/trending")
            .then(cheerio.load, requestError)
            .then(scrapeContent)
            .then(resolve)
            .catch(reject); // forward any error to the caller
    });
}


function scrapeContent($){
    //console.log($);
    console.log($('title').text());
    var fifth_li_element = $('ol').find('li').eq(4); // use eq(index) to access via indices.
    var fourth_li_element = fifth_li_element.prev(); // previous sibling
    console.log(fourth_li_element.find('h3').text().trim());
    var sixth_li_element = fifth_li_element.next(); // next sibling
    console.log(sixth_li_element.find('h3').text().trim());
}

function requestError() {
    console.log("The trending page could not be loaded!");
    throw Error("Could not fetch html!");

}


getHtml();
Putting it all together (GitHub Trending Scraper)
function getHtml(){
    return new Promise(function(resolve, reject){
        request("https://github.com/trending")
            .then(cheerio.load, requestError)
            .then(scrapeContent)
            .then(resolve)
            .catch(reject); // forward any error to the caller
    });
}


function scrapeContent($){
    //console.log($);
    var all_li_elements = $('ol').find('li');
    all_li_elements.each(function(item){
        // declare with var so these do not leak as implicit globals
        var repository_name = $(this).find('h3').text().trim();
        var total_stars_today = $(this).find('.float-sm-right').text().trim();
        var repository_url = 'https://github.com/' + $(this).find('a').attr('href');
        console.log(repository_name, "\t", total_stars_today, "\t", repository_url);
    });
}

function requestError() {
    console.log("The trending page could not be loaded!");
    throw Error("Could not fetch html!");

}

getHtml();
Output
Microsoft / FASTER                946 stars today      https://github.com//Microsoft/FASTER
google / python-fire              647 stars today      https://github.com//google/python-fire
MichaelMure / git-bug             578 stars today      https://github.com//MichaelMure/git-bug
Droogans / unmaintainable-code    228 stars today      https://github.com//Droogans/unmaintainable-code
pwxcoo / chinese-xinhua           198 stars today      https://github.com//pwxcoo/chinese-xinhua
Avik-Jain / 100-Days-Of-ML-Code   188 stars today      https://github.com//Avik-Jain/100-Days-Of-ML-Code
JuliaLang / julia                 170 stars today      https://github.com//JuliaLang/julia
aykevl / tinygo                   169 stars today      https://github.com//aykevl/tinygo
IEEEKeralaSection / rescuekerala  96 stars today       https://github.com//IEEEKeralaSection/rescuekerala
benhoyt / goawk                   152 stars today      https://github.com//benhoyt/goawk
r-spacex / SpaceX-API             150 stars today      https://github.com//r-spacex/SpaceX-API
crossoverJie / JCSprout           129 stars today      https://github.com//crossoverJie/JCSprout
imhuay / Interview_Notes-Chinese  121 stars today      https://github.com//imhuay/Interview_Notes-Chinese
trekhleb / javascript-algorithms  113 stars today      https://github.com//trekhleb/javascript-algorithms
salesforce / TransmogrifAI        107 stars today      https://github.com//salesforce/TransmogrifAI
loveRandy / vue-cli3.0-vueadmin   110 stars today      https://github.com//loveRandy/vue-cli3.0-vueadmin
astorfi / Deep-Learning-World     102 stars today      https://github.com//astorfi/Deep-Learning-World
vuejs / vue                       97 stars today       https://github.com//vuejs/vue
Jeffail / benthos                 104 stars today      https://github.com//Jeffail/benthos
palmerhq / react-async-elements   92 stars today       https://github.com//palmerhq/react-async-elements
firehol / netdata                 89 stars today       https://github.com//firehol/netdata
jesseduffield / lazygit           84 stars today       https://github.com//jesseduffield/lazygit
jianstm / Schedule                75 stars today       https://github.com//jianstm/Schedule
shadowsocks/shadowsocks-windows   67 stars today       https://github.com//shadowsocks/shadowsocks-windows
mawww / kakoune                   71 stars today       https://github.com//mawww/kakoune

There is another version of this article that uses the same set of examples with BeautifulSoup in Python. You should read it too: https://www.thetaranights.com/web-scraping-beautifulsoup-python/

Google APIs and Python – Part II

- - Python, Tutorials

Google services are cool, and you can build products and services around them. We will see through examples how you can use various Google services such as Sheets, Slides and Drive through Python. I hope people can take ideas from the following example to do amazing things with Google services. There is a part one to this article, where I walked through the procedure to enable Google APIs, the installation of the required packages in Python and authentication, and demonstrated individual examples of the Sheets, Drive and Slides APIs: https://www.thetaranights.com/brief-introduction-to-google-apissheets-slides-drive/ . In this article, however, we will integrate the Sheets, Drive and Slides APIs together.

The Idea

We will use data from a sheet that contains some statistics about a few applications/websites. The end goal is to create a presentation slide, add a background image to it, add content from the sheet to the slide, and do a few other cool things. All the resources used in the following examples are public, so you can follow along.

I will be using the following resources throughout the example

Create Presentation Slides from Sheets data and Drive images using Python
from googleapiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

TEMPLATE_FILE = "TEM_F"

SCOPES = ('https://www.googleapis.com/auth/spreadsheets','https://www.googleapis.com/auth/drive')

CLIENT_SECRET = 'client_secret_760822340075-i0ark1h51pnbhii5dgafug3k4g1nodb8.apps.googleusercontent.com.json' # download from google console after activating apis

# storage.json does not have to exist yet: on the first run you will be prompted to grant access to
# the Google resources on your account, and a token with the requested privileges is saved to it.
store = file.Storage('storage.json')

credz = store.get()

if not credz or credz.invalid:
    flow = client.flow_from_clientsecrets(CLIENT_SECRET, SCOPES)
    credz = tools.run_flow(flow, store)

HTTP = credz.authorize(Http())

SHEETS = discovery.build('sheets', 'v4', http=HTTP)

SLIDES = discovery.build('slides', 'v1', http=HTTP)

DRIVE = discovery.build('drive', 'v3', http=HTTP)


presentation_template_file_id = "1wLimfuGw1pqZZJvkc15lOfD7LSEAWCjuIlhbrOaiulE" # the template has been made public.
# name of the presentation file
DATA = {'name':'MobileApplicationsReport'}

PRESENTATION_ID = DRIVE.files().copy(body=DATA, fileId=presentation_template_file_id).execute()['id']
print(PRESENTATION_ID)

sheet_ID = '1xpjQkF692lNnTsfOckVll2OPTa659ZCuK3JezDSkris' # the sheet where we fetch data from to populate to the slides.

application_statistics = SHEETS.spreadsheets().values().get(range='Sheet1', spreadsheetId=sheet_ID).execute().get('values') # all the data from the sheet as lists including headers.

print(application_statistics)

presentation_details = SLIDES.presentations().get(presentationId=PRESENTATION_ID).execute()

slides_data = presentation_details.get('slides', [])[0]

page_id = slides_data['objectId'] # page id of the first slide of the presentation.

for each_data in application_statistics[1:]: # skip the headers.
    # duplicate slide for the next cycle before replacing content on a slide since we are using method of replacing text from the slide to populate data.
    reqs = [{"duplicateObject": {"objectId": page_id}}]
    copy_slide_rsp = SLIDES.presentations().batchUpdate(body={'requests':reqs}, presentationId=PRESENTATION_ID).execute()
    
    IMG_ID = each_data[10] # the id of the image present on google drive which we intend to have as a background image to this particular slide.
    img_url = '%s&access_token=%s' % (DRIVE.files().get_media(fileId=IMG_ID).uri, credz.access_token)
    print("Image url", img_url)

    # prepare a bulk requests that basically replaces the text from the template with the actual data from the sheets.
    bulk_requests = [
        {'updatePageProperties':{'objectId':page_id, 'pageProperties':{'pageBackgroundFill':{'stretchedPictureFill':{'contentUrl':img_url}}}, 'fields':'pageBackgroundFill'}},
        {'replaceAllText':{'containsText':{'text':'{{SHOWCASE  NAME}}', 'matchCase':True}, 'replaceText':each_data[1], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{DESCRIPTION}}', 'matchCase':True}, 'replaceText':each_data[2], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{COMPOSITION}}', 'matchCase':True}, 'replaceText':each_data[3], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{IMPRESSIONS}}', 'matchCase':True}, 'replaceText':each_data[8], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{VIDEO VIEWS}}', 'matchCase':True}, 'replaceText':each_data[7], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{USERS}}', 'matchCase':True}, 'replaceText':each_data[6], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{MOBILE}}', 'matchCase':True}, 'replaceText':each_data[9], "pageObjectIds": [page_id]}}
    ]
    bulk_update_response = SLIDES.presentations().batchUpdate(body={'requests':bulk_requests}, presentationId=PRESENTATION_ID, fields='').execute().get('replies')

    page_id = copy_slide_rsp['replies'][0]['duplicateObject']['objectId'] # update the page id as the one that was duplicated so we now can work on this slide.

delete_final_page = SLIDES.presentations().batchUpdate(body={'requests':[{"deleteObject": {"objectId": page_id}}]}, presentationId=PRESENTATION_ID, fields='').execute().get('replies')
Output Presentation Created:

After the above program ran successfully, the following presentation was generated:
https://docs.google.com/presentation/d/1h9YqUnCWu5pxXmW3rs_9rKmMsVeJM9I8nIBGbr25pME/edit?usp=sharing
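
The list of `replaceAllText` requests in the script above repeats the same structure for every placeholder. As a rough sketch, it could be generated from a mapping instead. The helper name `build_replace_requests`, the `PLACEHOLDER_COLUMNS` mapping and the `'page_1'` id are my own illustrative assumptions, and the sample row is the one from the sheet shown in part one. The sketch only builds the request bodies and does not call the Slides API.

```python
# Hypothetical mapping: template placeholder -> column index in a sheet row.
# The indices mirror the ones used in the script above.
PLACEHOLDER_COLUMNS = {
    '{{DESCRIPTION}}': 2,
    '{{COMPOSITION}}': 3,
    '{{USERS}}': 6,
    '{{VIDEO VIEWS}}': 7,
    '{{IMPRESSIONS}}': 8,
    '{{MOBILE}}': 9,
}


def build_replace_requests(row, page_id, placeholder_columns=PLACEHOLDER_COLUMNS):
    """Return one replaceAllText request per placeholder, limited to one slide."""
    requests = []
    for placeholder, column in placeholder_columns.items():
        requests.append({
            'replaceAllText': {
                'containsText': {'text': placeholder, 'matchCase': True},
                'replaceText': row[column],
                'pageObjectIds': [page_id],
            }
        })
    return requests


# Sample row taken from the sheet; 'page_1' is a made-up slide id.
row = ['Sports', 'Sports Fans', 'People who likes sports', 'online data', '12',
       'Sports Fans', '1000', '1000', '1000', '1000', 'sports-fans.jpg']
requests = build_replace_requests(row, 'page_1')
print(len(requests))
print(requests[0]['replaceAllText']['replaceText'])
```

The resulting list could be passed as the `requests` value in the body of `batchUpdate`, next to the `updatePageProperties` request for the background image.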

Brief Introduction to Google APIs(Sheets, Slides, Drive)

- - Python, Tutorials

The intention of this post is to familiarize you with using Google APIs from Python. Google services are cool, and you can build products and services around them. We will see through examples how you can use various Google services such as Sheets, Slides and Drive through Python. I hope people can take ideas from the following examples to do amazing things with Google services. In order to work with Google services via their APIs, we first need to create a project on the Google console with the specific APIs enabled. For the scope of this article, we will need the Sheets API, Slides API and Drive API enabled.

I will be using the following resources throughout the examples

Installation of libraries and setup

pip install --upgrade google-api-python-client oauth2client

Creating a project on Google Console and enabling APIs

1. Open the Google console https://console.cloud.google.com/apis/dashboard
2. Create a new project

[Screenshot: Create a new project on Google console]

3. Name the project

[Screenshot: Name new project]

4. Enable Sheets, Slides and Drive APIs

[Screenshot: Enable Google APIs]

[Screenshot: Enable Drive API as well as Slides and Sheets APIs]

5. Create credentials and download them

[Screenshot: Create credentials for the project and download it]

Authentication

We need the credentials that were downloaded from the Google console for authentication. Google creates an access token that is used to access and work on Google resources. The token does expire; when it does, we will be prompted in a browser to grant the application access to the specified resources on our Google account.

>>> from googleapiclient import discovery
>>> from httplib2 import Http
>>> from oauth2client import file, client, tools
>>> SCOPES = ('https://www.googleapis.com/auth/spreadsheets','https://www.googleapis.com/auth/drive')
>>> CLIENT_SECRET = 'client_secret_760822340075-i0ark1h51pnbhii5dgafug3k4g1nodb8.apps.googleusercontent.com.json'
>>> store = file.Storage('token.json')
>>> credz = store.get()
/home/bhishan-1504/googleapis/googleapienv/lib/python3.6/site-packages/oauth2client/_helpers.py:255: UserWarning: Cannot access token.json: No such file or directory
  warnings.warn(_MISSING_FILE_MESSAGE.format(filename))
>>> 

>>> if not credz or credz.invalid:
...     flow = client.flow_from_clientsecrets(CLIENT_SECRET, SCOPES)
...     credz = tools.run_flow(flow, store)
...

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?client_id=760822340075-i0ark1h51pnbhii5dgafug3k4g1nodb8.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2F&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fspreadsheets+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&access_type=offline&response_type=code

If your browser is on a different machine then exit and re-run this
application with the command-line parameter

  --noauth_local_webserver

Created new window in existing browser session.
Authentication successful.
>>>

Note: All of the examples below need authentication. I am leaving it out of the examples underneath to avoid redundancy.

Reading a google spreadsheet using Python
>>> HTTP = credz.authorize(Http())
>>> SHEETS = discovery.build('sheets', 'v4', http=HTTP)
>>> sheet_ID = '1xpjQkF692lNnTsfOckVll2OPTa659ZCuK3JezDSkris'
>>> spreadsheet_read = SHEETS.spreadsheets().values().get(range='Sheet1', spreadsheetId=sheet_ID).execute()
>>> spreadsheet_read
{'range': 'Sheet1!A1:Z1001', 'majorDimension': 'ROWS', 'values': [['Category', 'Showcase Name', 'Description', 'Audience Composition', 'ID', 'Audience Name', 'Users', 'Video Views', 'Impressions', 'Mobile Impressions', 'Image'], ['Sports', 'Sports Fans', 'People who likes sports', 'online data', '12', 'Sports Fans', '1000', '1000', '1000', '1000', 'sports-fans.jpg']]}
>>> spreadsheet_values = spreadsheet_read['values']
>>> spreadsheet_values
[['Category', 'Showcase Name', 'Description', 'Audience Composition', 'ID', 'Audience Name', 'Users', 'Video Views', 'Impressions', 'Mobile Impressions', 'Image'], ['Sports', 'Sports Fans', 'People who likes sports', 'online data', '12', 'Sports Fans', '1000', '1000', '1000', '1000', 'sports-fans.jpg']]
>>>
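The values list returned above has the header row first and the data rows after it. Here is a minimal sketch, assuming that layout, of turning it into a list of dictionaries keyed by column name. rows_to_dicts is a hypothetical helper of mine, not part of the Sheets API, and the sample data is a shortened copy of the output above.

```python
def rows_to_dicts(values):
    """Pair every data row with the header row (assumed to be values[0])."""
    if not values:
        return []
    header, *rows = values
    records = []
    for row in rows:
        # Rows can come back shorter than the header when trailing cells
        # are empty, so pad them before zipping.
        padded = row + [''] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


# Shortened copy of the spreadsheet_values shown above.
sample_values = [
    ['Category', 'Showcase Name', 'ID', 'Image'],
    ['Sports', 'Sports Fans', '12', 'sports-fans.jpg'],
]
records = rows_to_dicts(sample_values)
print(records[0]['Showcase Name'])  # Sports Fans
```

Working with dictionaries means later code can refer to a column by name instead of by position.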
Search and Download a file from drive using Python
>>> import io
>>> from googleapiclient.http import MediaIoBaseDownload
>>> DRIVE = discovery.build('drive', 'v3', http=HTTP)
>>> resp = DRIVE.files().list(q="name='{filepath}'".format(filepath="P1150264.JPG")).execute()
>>> resp
{'kind': 'drive#fileList', 'incompleteSearch': False, 'files': [{'kind': 'drive#file', 'id': '0B54qUrMD2GDIa0FMdkxLMmpoZVU', 'name': 'P1150264.JPG', 'mimeType': 'image/jpeg'}]}
>>> file_id = resp['files'][0]['id']
>>> file_id
'0B54qUrMD2GDIa0FMdkxLMmpoZVU'
>>> file_request = DRIVE.files().get_media(fileId=file_id)
>>> fh = io.BytesIO()
>>> downloader = MediaIoBaseDownload(fh, file_request)
>>> done = False
>>> while done is False:
...     status, done = downloader.next_chunk()
...     print("Download {status}".format(status=status.progress() * 100))
...
Download 100.0
>>>
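The loop above only fills the in-memory buffer fh; nothing is written to disk yet. Here is a small sketch of persisting it. save_buffer and the target path are assumptions for illustration, and a plain io.BytesIO stands in for the downloaded data.

```python
import io
import os
import tempfile


def save_buffer(buffer, path):
    """Write the full contents of an in-memory buffer to path."""
    with open(path, 'wb') as out:
        # getvalue() returns all bytes regardless of the current position.
        out.write(buffer.getvalue())
    return os.path.getsize(path)


# Stand-in for the buffer filled by MediaIoBaseDownload.
fh = io.BytesIO(b'fake image bytes')
target = os.path.join(tempfile.mkdtemp(), 'P1150264.JPG')
written = save_buffer(fh, target)
print(written)  # 16
```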

Google Slides API

Create a blank presentation using Python
>>> SLIDES = discovery.build('slides', 'v1', http=HTTP)
>>> body = {'title': 'AutomatedPresentation'}
>>> presentation_request = SLIDES.presentations().create(body=body).execute()
>>> presentation_request['presentationId']
'1ROJOeVFaA4PbC2voR5EFddohxQlZvUkrdi1dsJUks9c'
>>>

Follow the link to see the presentation the above code snippet creates.
https://docs.google.com/presentation/d/1ROJOeVFaA4PbC2voR5EFddohxQlZvUkrdi1dsJUks9c/edit?usp=sharing

Creating presentation using existing template from drive
>>> TEMPLATE_FILE = 'TEM_F'
>>> SLIDES = discovery.build('slides', 'v1', http=HTTP)
>>> DRIVE = discovery.build('drive', 'v3', http=HTTP)
>>> rsp = DRIVE.files().list(q="name='%s'"% TEMPLATE_FILE).execute()['files'][0]
>>> rsp
{'kind': 'drive#file', 'id': '1wLimfuGw1pqZZJvkc15lOfD7LSEAWCjuIlhbrOaiulE', 'name': 'TEM_F', 'mimeType': 'application/vnd.google-apps.presentation'}
>>> DATA = {'name': 'PresentationUsingTemplate'}
>>> create_presentation_request = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute()
>>> presentation_id = create_presentation_request['id']
>>> presentation_id
'10iDjayeyVkVSp5F6eQIqzpISAqjFlbqG4_jdYDAFJG4'
>>>

Follow the link to see the presentation the above code snippet creates. https://docs.google.com/presentation/d/10iDjayeyVkVSp5F6eQIqzpISAqjFlbqG4_jdYDAFJG4/edit?usp=sharing

Adding background image to a slide
>>> SLIDES = discovery.build('slides', 'v1', http=HTTP)
>>> DRIVE = discovery.build('drive', 'v3', http=HTTP)
>>> rsp = DRIVE.files().list(q="name='%s'"% TEMPLATE_FILE).execute()['files'][0]
>>> rsp
{'kind': 'drive#file', 'id': '1wLimfuGw1pqZZJvkc15lOfD7LSEAWCjuIlhbrOaiulE', 'name': 'TEM_F', 'mimeType': 'application/vnd.google-apps.presentation'}
>>> DATA = {'name': 'PresentationUsingTemplatePlusBackgroundImage'}
>>> create_presentation_request = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute()
>>> presentation_id = create_presentation_request['id']
>>> presentation_id
'1cxpaH19h582Q4Ot3b5GL9U6ETl9myqE3JlX4_Fa35e8'
>>> IMG_FILE = "sports-fans.jpg"
>>> img_file_request = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute()['files'][0]
>>> img_url = '%s&access_token=%s' % (DRIVE.files().get_media(fileId=img_file_request['id']).uri, credz.access_token)
>>> img_url
'https://www.googleapis.com/drive/v3/files/0B54qUrMD2GDIa2syZWF3OE5xSUk?alt=media&access_token=ya29.Glz3BcRtfadsfGzKwUQ-6llroeaMfdasfdaffadsjfdXiOewDdHqhdgBef2euMm9OMxGXyXF-axwZ0gFBwH2-T6qS29qmpc-H3ELcyh7CDZCbfzn7DTNJkugoA'

>>> presentation_details = SLIDES.presentations().get(presentationId=presentation_id).execute()
>>> first_slide_data = presentation_details.get('slides', [])[0]
>>> first_slide_id = first_slide_data['objectId']
>>> first_slide_id
'p3'
>>> bulk_reqs = [{'updatePageProperties':{'objectId':first_slide_id, 'pageProperties':{'pageBackgroundFill':{'stretchedPictureFill':{'contentUrl':img_url}}}, 'fields':'pageBackgroundFill'}}]
>>> bulk_update_req = SLIDES.presentations().batchUpdate(body={'requests':bulk_reqs}, presentationId=presentation_id).execute()

Follow the link to see the presentation the above code snippet creates.
https://docs.google.com/presentation/d/1cxpaH19h582Q4Ot3b5GL9U6ETl9myqE3JlX4_Fa35e8/edit?usp=sharing
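The request dictionary in the snippet above is easy to mistype when it is written on one line. Here is a sketch that builds the same structure with a small function, so that one request per slide can be generated. background_image_request, the slide ids and the image URL are hypothetical names for illustration only.

```python
def background_image_request(slide_id, image_url):
    """Build one updatePageProperties request that sets a slide background."""
    return {
        'updatePageProperties': {
            'objectId': slide_id,
            'pageProperties': {
                'pageBackgroundFill': {
                    'stretchedPictureFill': {'contentUrl': image_url}
                }
            },
            # Field mask: only the background fill is updated.
            'fields': 'pageBackgroundFill',
        }
    }


# Hypothetical slide ids and image URL.
slide_ids = ['p3', 'p4']
requests_body = {
    'requests': [
        background_image_request(sid, 'https://example.com/bg.jpg')
        for sid in slide_ids
    ]
}
print(len(requests_body['requests']))  # 2
```

A body built this way can be passed to batchUpdate the same way as in the snippet above.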

In a follow-up post to this one, we will focus on integrating the Slides, Sheets and Drive APIs together. We shall take spreadsheet data and populate presentation slides with it.
To be continued…

Update

Published the second part to this article. https://www.thetaranights.com/google-apis-and-python-part-ii/

Python filter() built-in


The filter built-in takes a function and an iterable, and returns an iterator that yields only those elements of the iterable for which the function returns a truthy value. What makes this possible is the equal status of every object in Python, which was one of the main goals of the language. Remember that even a function is an object in Python, and hence it can be assigned to a variable, passed as an argument to another function, etc.


filter(function or None, iterable)

The first argument is the function to which each element of the iterable is passed for evaluation. If None is passed instead of a function, the elements themselves are tested for truthiness.

Other than the function object, the filter built-in takes exactly one iterable, from which the arguments for the function are taken.

Filter takes two arguments
>>> def isdivisibleby2(x):
...     if x % 2 == 0:
...         return True
...     return False
...
>>> filter([1,2,3,4])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: filter expected 2 arguments, got 1
>>> filter(isdivisibleby2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: filter expected 2 arguments, got 1
>>> filter(isdivisibleby2, [1,2,3,4], [5,6,7,8])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: filter expected 2 arguments, got 3
>>>
Filter Example
>>> def isdivisibleby2(x):
...     if x % 2 == 0:
...         return True
...     return False
...
>>> filtered_list = filter(isdivisibleby2, [1, 2, 3, 4])
>>> filtered_list
<filter object at 0x7f04cb644da0>
>>> list(filtered_list)
[2, 4]
>>>
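For comparison, here is a short sketch of the same filtering written three ways: with a named function, with a lambda, and with a list comprehension. It also shows that a filter object is a one-shot iterator. The variable names are mine.

```python
def isdivisibleby2(x):
    return x % 2 == 0


numbers = [1, 2, 3, 4]

with_filter = list(filter(isdivisibleby2, numbers))
with_lambda = list(filter(lambda x: x % 2 == 0, numbers))
with_comprehension = [x for x in numbers if x % 2 == 0]
print(with_filter, with_lambda, with_comprehension)  # [2, 4] [2, 4] [2, 4]

# A filter object can be consumed only once.
evens = filter(isdivisibleby2, numbers)
first_pass = list(evens)   # [2, 4]
second_pass = list(evens)  # [] because the iterator is already exhausted
print(first_pass, second_pass)
```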
Filter evaluates Truthy and Falsy

The filter built-in returns an iterator containing only those values for which the function (passed as the first argument to filter) returns a truthy value. Empty sequences and collections such as an empty list [] or an empty dictionary {}, the number 0, None and False are considered false values, or falsy. Almost anything else is considered truthy. You should read this post on Truthy and Falsy concepts in Python. https://www.thetaranights.com/idiomatic-python-use-of-falsy-and-truthy-concepts/

>>> def arbitrary_function(x):
...     return x
...
>>> filtered_list = filter(arbitrary_function, [1, 2, 3, 4])
>>> filtered_list
<filter object at 0x7f04cb5e9550>
>>> list(filtered_list)
[1, 2, 3, 4]
>>>
>>> def arbitrary_function(x):
...     return 0 # any of False, None, [], {}
...
>>> filtered_list = filter(arbitrary_function, [1, 2, 3, 4])
>>> filtered_list
<filter object at 0x7f04cb5e92b0>
>>> list(filtered_list)
[]
>>>
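The signature above reads filter(function or None, iterable). Here is a quick sketch of the None form, which keeps only the truthy elements without needing a function at all. The sample list is made up for illustration.

```python
mixed = [0, 1, '', 'a', None, [], [1], {}, False, True]

# With None as the first argument, each element is tested for truthiness.
truthy_only = list(filter(None, mixed))
print(truthy_only)  # [1, 'a', [1], True]

# Passing bool as the function gives the same result.
same_result = list(filter(bool, mixed))
print(same_result == truthy_only)  # True
```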

Python map() built-in


The map built-in takes a function and one or more iterables, and returns an iterator that yields the result of calling the function with the elements of those iterables. What makes this possible is the equal status of every object in Python, which was one of the main goals of the language. Remember that even a function is an object in Python, and hence it can be assigned to a variable, passed as an argument to a function, etc.


map(func, *iterables)

The first argument is the function to which the elements of the following iterables are passed for evaluation.

Other than the function object, the map built-in needs at least one iterable and can take several. Each call to the function takes one argument from each of the iterables.

Map takes at least two arguments
>>> def square(x):
...     return x**2
...
>>> map(square)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: map() must have at least two arguments.
>>>
Map Example
>>> def square(x):
...     return x**2
...
>>> squared = map(square, [1,2,3,4,5])
>>> squared
<map object at 0x7f1948bbbef0>
>>> list(squared)
[1, 4, 9, 16, 25]
>>>
Map could take multiple iterables
>>> def add_and_square(x, y):
...     return (x+y)**2
...
>>> added_and_squared = map(add_and_square, [1,2,3,4], [5,6,7,8])
>>> added_and_squared
<map object at 0x7f1948b79518>
>>> list(added_and_squared)
[36, 64, 100, 144]
>>>
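As a sketch of the alternatives, the same multi-iterable call can be written with a lambda or with zip inside a list comprehension. Functions from the operator module also work as the first argument. The variable names are mine.

```python
import operator

xs = [1, 2, 3, 4]
ys = [5, 6, 7, 8]

with_lambda = list(map(lambda x, y: (x + y) ** 2, xs, ys))
with_zip = [(x + y) ** 2 for x, y in zip(xs, ys)]
print(with_lambda)  # [36, 64, 100, 144]
print(with_zip)     # [36, 64, 100, 144]

# operator.add takes two arguments, one from each iterable.
sums = list(map(operator.add, xs, ys))
print(sums)  # [6, 8, 10, 12]
```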
When you pass iterables of varying length
>>> def add_and_square(x, y):
...     return (x+y)**2
...
>>> added_and_squared = map(add_and_square, [1,2,3,4], [5,6,7,8, 9])
>>> added_and_squared
<map object at 0x7f1948b795f8>
>>> list(added_and_squared)
[36, 64, 100, 144]
>>>

When you pass iterables of varying length to the map built-in, it stops as soon as the shortest iterable is exhausted, so the extra elements of the longer ones are ignored.
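If dropping the extra elements is not what you want, one option is to pad the shorter iterable yourself. Here is a minimal sketch using itertools.zip_longest and itertools.starmap. The fill value of 0 is my assumption; pick whatever default makes sense for your function.

```python
from itertools import starmap, zip_longest


def add_and_square(x, y):
    return (x + y) ** 2


xs = [1, 2, 3, 4]
ys = [5, 6, 7, 8, 9]

# map stops at the shortest iterable.
truncated = list(map(add_and_square, xs, ys))
print(truncated)  # [36, 64, 100, 144]

# zip_longest pads the shorter iterable, starmap unpacks each pair.
padded = list(starmap(add_and_square, zip_longest(xs, ys, fillvalue=0)))
print(padded)  # [36, 64, 100, 144, 81]
```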

Examples of Browser Automations using Selenium in Python


Browser automation is one of the coolest things to do, especially when there is a real purpose to it. Through this post, I intend to host a set of examples on browser automation using Selenium in Python so that people can take ideas from the code snippets below and adapt them to their needs. Selenium allows just about any kind of interaction with browser elements and hence is a go-to tool for tasks requiring user interaction and JavaScript support.

Installation:


pip install selenium
Download chromedriver from http://chromedriver.chromium.org/downloads
Download phantomjs from http://phantomjs.org/download.html

Login to a website using selenium
>>> from selenium import webdriver
>>> from selenium.webdriver.common.keys import Keys
>>> executable_path = "/home/bhishan-1504/Downloads/chromedriver_linux64/chromedriver"
>>> browser = webdriver.Chrome(executable_path=executable_path)
>>> browser.get("https://github.com/login")
>>> username_field = browser.find_element_by_name("login")
>>> password_field = browser.find_element_by_name("password")
>>> username_field.send_keys("bhishan")
>>> password_field.send_keys("password")
>>> password_field.send_keys(Keys.RETURN)
>>>
Switching proxy with selenium

Selenium is used a lot for web scraping, but it is just as effective for web interactions. Suppose you have to cast votes in a competition that allows one vote per IP address. The following example demonstrates how you would use Selenium to perform a repetitive task (casting a vote in this case) from various IP addresses.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = "somedummysite.com/voting/bhishan.php" # url not made public


def cast_vote(proxy):
    # route the PhantomJS traffic through the given proxy
    service_args = [
        '--proxy=' + proxy,
        '--proxy-type=http',
    ]
    print(service_args)
    browser = webdriver.PhantomJS(service_args=service_args)

    browser.get(url)
    try:
        # wait up to 10 seconds for the vote button to show up
        cast_vote_element = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, 'vote'))
        )
    except TimeoutException:
        print("Cast vote button not available. Seems like you have used this IP already!")
        browser.quit()
        return
    cast_vote_element.click()
    browser.quit()


def main():
    # proxies.txt holds one proxy address per line
    with open('proxies.txt', 'r') as f:
        for each_ip in f:
            cast_vote(each_ip.strip())


if __name__ == '__main__':
    main()
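The proxy file handling can be tested without a browser. Here is a sketch of reading the proxy list defensively and building the PhantomJS arguments. load_proxies and build_service_args are hypothetical helpers, and the host:port addresses and the # comment convention are my assumptions about the file format.

```python
import os
import tempfile


def load_proxies(path):
    """Return proxies from a text file, one per line, skipping blanks and # comments."""
    proxies = []
    with open(path, 'r') as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                proxies.append(line)
    return proxies


def build_service_args(proxy, proxy_type='http'):
    """Build the same service_args list the script above passes to PhantomJS."""
    return ['--proxy=' + proxy, '--proxy-type=' + proxy_type]


# Throwaway file with made-up addresses, for illustration only.
sample_path = os.path.join(tempfile.mkdtemp(), 'proxies.txt')
with open(sample_path, 'w') as f:
    f.write('10.0.0.1:8080\n\n# disabled\n10.0.0.2:3128\n')

proxies = load_proxies(sample_path)
print(proxies)  # ['10.0.0.1:8080', '10.0.0.2:3128']
print(build_service_args(proxies[0]))  # ['--proxy=10.0.0.1:8080', '--proxy-type=http']
```

Skipping blank lines up front avoids launching a browser with an empty --proxy= argument.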
Execute JavaScript using selenium

There could be cases where you’d want to execute JavaScript on the browser instance. The example below depicts one such scenario. Remember how, when a post in your Facebook News Feed has hundreds of thousands of comments, you have to click monotonously to expand the comment threads? The example below does that through Selenium, but it has an even bigger purpose: the following code snippet loops over a few thousand Facebook URLs (each relating to a post), expands the comment threads and prints the page as a PDF file. This was part of a larger program that did something with the PDF files, which isn’t relevant to this post. Here is a link to the JavaScript code that the program below uses to expand the comments on Facebook posts. I don’t even remember where I found it though.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
import selenium.common.exceptions  # binds the name selenium used in the except clause below

import json
import time


# get the js to be executed.

with open('js_code.txt', 'r') as f:
    js_code = f.read()

executable_path = '/home/bhishan-1504/Downloads/chromedriver_linux64/chromedriver'


appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}


profile = {"printing.print_preview_sticky_settings.appState": json.dumps(appState), 'savefile.default_directory': "/home/bhishan-1504/secret_project/"}

profile["download.prompt_for_download"] = False
profile["profile.default_content_setting_values.notifications"] = 2
chrome_options = webdriver.ChromeOptions()

chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument('--kiosk-printing')

# chrome_options.add_argument("download.default_directory=/home/bhishan-1504/secret_project/")
browser = webdriver.Chrome(executable_path=executable_path, chrome_options=chrome_options)

def save_pdf(count):
    browser.execute_script("document.title=" + str(count) + ";")
    browser.execute_script('window.print();')
    time.sleep(1)


def visit_page(url, count):
    browser.get(url)
    try:
        home_btn = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.LINK_TEXT, "Home"))
        )
    except selenium.common.exceptions.TimeoutException:
        print("Didn’t work out!")
        return

    browser.execute_script(js_code)
    time.sleep(7)
    save_pdf(count)




if __name__ == '__main__':
    count = 1
    # loop through the text file and pass to visit page function.
    with open('urls.txt', 'r') as f:
        for each_url in f.readlines():
            visit_page(each_url.strip(), count)
            count += 1
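Two small pieces of the script above can be sketched and checked without launching Chrome. The script sets document.title before printing, presumably so that every PDF gets a distinct name. title_script below builds that JavaScript with json.dumps so the value is always a properly quoted string, and numbered_urls replaces the manual counter with enumerate. Both helpers and the example.com URLs are hypothetical.

```python
import json


def title_script(count):
    """Return JavaScript that sets document.title to the count as a string."""
    # json.dumps yields a valid, quoted JavaScript string literal.
    return 'document.title = ' + json.dumps(str(count)) + ';'


def numbered_urls(lines, start=1):
    """Return an iterator of (count, url) pairs, skipping blank lines."""
    cleaned = (line.strip() for line in lines)
    return enumerate((url for url in cleaned if url), start=start)


sample_lines = ['https://example.com/post/1\n', '\n', 'https://example.com/post/2\n']
pairs = list(numbered_urls(sample_lines))
print(pairs)  # [(1, 'https://example.com/post/1'), (2, 'https://example.com/post/2')]
print(title_script(1))  # document.title = "1";
```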

I recently published an article on Web Scraping using BeautifulSoup. You should read it.