network analysis

This plot shows the network of co-authors on published MIS-C studies indexed in PubMed. MIS-C (multisystem inflammatory syndrome in children) is an inflammatory disease related to Kawasaki disease that affects children who have had COVID-19. I plotted it to demonstrate how publicly available data can be used to perform a network analysis.

To get started, search for MIS-C on PubMed and download the full dataset as a CSV file. The export uses a typical citation-style layout. The following Perl script parses it into a non-repeating edge list of co-authors (pass the CSV file as the argument to the script):

#!/usr/bin/env perl

use warnings;
use strict;
use Unicode::Collate;

my $inFile = $ARGV[0];
my $outFile = "pubmed-authors.txt";

my @aut;
my @eic;
my @newArray;
my $list;
my %main;
my $var;
my $lc = 0;

# stop immediately if the input cannot be read
open(IN, "<", $inFile) || die "can't open $inFile: $!\n";

# read and split the csv to pull out the authors
for $_(<IN>){
    my @l = split('\",\"', $_);
    my $authList = $l[2];
    # skip lines without an authors field (e.g. an unquoted header row)
    next unless defined $authList;
    $lc++;
    
    $authList =~ s/\.//g;
    $authList =~ s/;.*//g;
    $authList =~ s/ //g;
    my @authors = split(',', $authList);

    # create author in main array if doesn't exist
    for $_(@authors){
        my $author = $_;
        if(exists($main{$author})){
            $list = $main{$author};
        }
        else {
            $list = "";
        }
        $main{$author} = $list;
    }

    # compile list of non-repeating co-authors for each author
    for $_(@authors){
        my $author1 = $_;
        for $_(@authors){
            my $author2 = $_;
            unless($author1 eq $author2){
                $list = $main{$author1};
                my $list2 = $main{$author2};
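                # match whole names only, so "LeeK" does not match inside "LeeKH"
                unless($list =~ m/(?:^|\s)\Q$author2\E(?:\s|\z)/){
                    unless($list2 =~ m/(?:^|\s)\Q$author1\E(?:\s|\z)/){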
                        unless($author2 eq "etal"){
                            $list = $list." ".$author2;
                            $main{$author1} = $list;
                        }
                    }
                }
            }
        }
    }
}
close IN;

# sort the authors
for my $key(keys %main){
    my $val = $main{$key};
    @aut = split(" ", $val);
    @aut = sort @aut;
    my $newOrder = join(" ", @aut);
    $main{$key} = $newOrder;
}

# format the output as an edgelist to run in R igraph
for my $key(keys %main){
    my $val = $main{$key};
    @eic = split(" ", $val);
    for $_(@eic){
        my $pair = "$key,$_\n";
        push(@newArray, $pair);
    }
}

@newArray = sort @newArray;

# save results (eliminate etal) to a txt file
# overwrite rather than append, so re-running does not duplicate edges
open(OUT, ">", $outFile) || die "can't open $outFile: $!\n";
for $_(@newArray){
    unless($_ =~ m/^etal/){
        print OUT $_;
    }
}
close OUT;

This should provide an edge list export formatted like:
author1,author2
author1,author3
author2,author4
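
The de-duplication step can also be sketched with a hash keyed on the sorted pair, which avoids string matching against a growing list. This is only an illustrative alternative, not the script above: the sub name coauthor_edges and the sample author names are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the co-author pairs for one paper that have not been seen before.
# %$seen persists across calls, so a pair emitted for an earlier paper
# is not emitted again.
sub coauthor_edges {
    my ($seen, @authors) = @_;
    my @edges;
    for my $i (0 .. $#authors) {
        for my $j ($i + 1 .. $#authors) {
            # sort the two names so that A,B and B,A share one key
            my ($first, $second) = sort ($authors[$i], $authors[$j]);
            next if $first eq $second;            # same name listed twice
            next if $seen->{"$first,$second"}++;  # pair already emitted
            push @edges, "$first,$second";
        }
    }
    return @edges;
}

my %seen;
my @papers = (
    [qw(SmithJ LeeK PatelR)],     # hypothetical author lists
    [qw(LeeK SmithJ GarciaM)],
);
for my $paper (@papers) {
    print "$_\n" for coauthor_edges(\%seen, @$paper);
}
```

The pair LeeK,SmithJ appears in both sample papers but is printed only once, so the two papers yield five edges in total.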

That pubmed-authors.txt file can be read into an R script that uses the igraph library to create this plot:

library(igraph)

# Read the edge list produced by the Perl script
setwd("~/pubmed/")
ln <- readLines("pubmed-authors.txt")

# Store as matrix with from to indices
# (the Perl script strips spaces, so edges look like "author1,author2")
auth_mtx <- do.call(rbind, strsplit(ln[grep(".+,.+", ln)], ","))
auth_g <- graph_from_data_frame(apply(auth_mtx, 2, as.character))
deg <- igraph::degree(auth_g, mode="all")

# this allows you to check the connectivity per author
d <- data.frame(V = as.vector(V(auth_g)$name),
                Count = deg)

# set background to black
par(bg="black")

# To map communities by color
g.com <- fastgreedy.community(as.undirected(auth_g))
V(auth_g)$color <- g.com$membership + 1

# To define colors for individual authors
#V(auth_g)$color <- ifelse(V(auth_g)$name %in% "author1", "red", 
#    ifelse(V(auth_g)$name %in% "author2", "green", 
#    ifelse(V(auth_g)$name %in% "author3", "orange", "blue")))

# plot using igraph
plot.igraph(auth_g, 
    layout=layout.kamada.kawai, 
    vertex.label=NA, 
    vertex.size=deg/10, 
    vertex.color = adjustcolor(V(auth_g)$color,alpha.f=0.75), 
    edge.arrow.mode=0)