Network analysis

This plot shows a network of co-authors of published MIS-C studies indexed in PubMed. MIS-C (multisystem inflammatory syndrome in children) is an inflammatory disease related to Kawasaki disease that affects children who have been infected with COVID-19. I plotted it to demonstrate how publicly available data can be used to perform a network analysis.
To get started, search for MIS-C on PubMed and download the full dataset as a CSV file. The export is formatted in a typical citation style. The following Perl script parses it into a non-repeating edge list of co-authors (pass the CSV file as an argument to the script):
#!/usr/bin/env perl
use warnings;
use strict;
use Unicode::Collate;
my $inFile = $ARGV[0];
my $outFile = "pubmed-authors.txt";
my @aut;
my @eic;
my @newArray;
my $list;
my %main;
my $var;
my $lc = 0;
# stop immediately if the input file cannot be opened
open(IN, "<", $inFile) || die "can't open $inFile: $!\n";
# read and split the csv to pull out the authors
while (my $line = <IN>){
    my @l = split('\",\"', $line);
    my $authList = $l[2];
    $lc++;
    # skip lines that have no author column (for example an unquoted header row)
    next unless defined $authList;
    $authList =~ s/\.//g;
    $authList =~ s/;.*//g;
    $authList =~ s/ //g;
    my @authors = split(',', $authList);
    # create author in the %main hash if it doesn't exist
    for my $author (@authors){
        $main{$author} = "" unless exists($main{$author});
    }
    # compile list of non-repeating co-authors for each author
    for my $author1 (@authors){
        for my $author2 (@authors){
            next if $author1 eq $author2;
            next if $author2 eq "etal";
            # compare whole space-separated names; \Q...\E keeps name characters literal,
            # so "LiJ" is not mistaken for "LiJX"
            next if $main{$author1} =~ m/(?:^| )\Q$author2\E(?: |$)/;
            next if $main{$author2} =~ m/(?:^| )\Q$author1\E(?: |$)/;
            $main{$author1} .= " ".$author2;
        }
    }
}
close IN;
# sort the authors
for my $key(keys %main){
    my $val = $main{$key};
    @aut = split(" ", $val);
    @aut = sort @aut;
    my $newOrder = join(" ", @aut);
    $main{$key} = $newOrder;
}
# format the output as an edgelist to run in R igraph
for my $key(keys %main){
    my $val = $main{$key};
    @eic = split(" ", $val);
    for $_(@eic){
        my $pair = "$key,$_\n";
        push(@newArray, $pair);
    }
}
@newArray = sort @newArray;
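# save results (eliminate etal) to a txt file
# open with ">" so that re-running the script overwrites the file instead of appending duplicate edges
open(OUT, ">", $outFile) || die "can't open $outFile: $!\n";
for my $pair (@newArray){
    print OUT $pair unless $pair =~ m/^etal/;
}
close OUT;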
This should provide an edge list export formatted like:
author1,author2
author1,author3
author2,author4
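For readers who prefer a CSV-aware parser to splitting on quote characters, the same idea can be sketched in Python with the standard csv module. This is only an illustrative sketch, not the script used for the plot: the function names normalize and coauthor_edges are hypothetical, the author column is assumed to sit at index 2 as in the Perl script, and the sample rows are made up. Unlike the Perl version, each pair is written in alphabetical order.

```python
import csv
import io
from itertools import combinations


def normalize(name):
    """Mirror the Perl cleanup: drop periods and spaces from an author name."""
    return name.replace(".", "").replace(" ", "")


def coauthor_edges(rows, author_col=2):
    """Return sorted, de-duplicated (author, author) pairs from CSV rows.

    author_col=2 is an assumption taken from the Perl script above.
    """
    edges = set()
    for row in rows:
        if len(row) <= author_col:
            continue  # skip short or malformed rows
        # keep only the text before the first semicolon, as the Perl script does
        field = row[author_col].split(";")[0]
        authors = {normalize(a) for a in field.split(",")}
        authors.discard("")
        authors.discard("etal")
        # one undirected edge per unordered pair of co-authors
        for a, b in combinations(sorted(authors), 2):
            edges.add((a, b))
    return sorted(edges)


# made-up sample rows, not real PubMed records
sample = io.StringIO(
    '"1","Paper A","Smith J, Doe A, Lee K.","J Example 2021"\n'
    '"2","Paper B","Doe A, Smith J, et al.","J Example 2022"\n'
)
edges = coauthor_edges(csv.reader(sample))
for a, b in edges:
    print(f"{a},{b}")
```

Keeping the pairs in a set means a collaboration repeated across several papers still yields a single edge, which is the same de-duplication the Perl script performs with its string matching.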
That pubmed-authors.txt file can be read into an R script that uses the igraph library to create this plot:
library(igraph)
# NEW method from file
setwd("~/pubmed/")
ln <- readLines("pubmed-authors.txt")
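# Store as a two-column matrix of from/to author names
# (the Perl output has no space after the comma, so match on "," alone)
auth_mtx <- do.call(rbind, strsplit(ln[grep(".+,.+", ln)], ","))
auth_g <- graph_from_data_frame(as.data.frame(auth_mtx, stringsAsFactors = FALSE))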
deg <- igraph::degree(auth_g, mode="all")
# this allows you to check the connectivity per author
d <- data.frame(V = as.vector(V(auth_g)$name),
                Count = deg)
# set background to black
par(bg="black")
# To map communities by color
g.com <- fastgreedy.community(as.undirected(auth_g))
V(auth_g)$color <- g.com$membership + 1
# To define colors for individual authors (uncomment all three lines to use)
#V(auth_g)$color <- ifelse(V(auth_g)$name %in% "author1", "red",
#                   ifelse(V(auth_g)$name %in% "author2", "green",
#                   ifelse(V(auth_g)$name %in% "author3", "orange", "blue")))
# plot using igraph
# vertex size scales with deg, the degree vector computed above
plot.igraph(auth_g,
            layout=layout.kamada.kawai,
            vertex.label=NA,
            vertex.size=deg/10,
            vertex.color = adjustcolor(V(auth_g)$color,alpha.f=0.75),
            edge.arrow.mode=0)
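As a quick sanity check on the degree values that drive the vertex sizes, the co-author counts can also be tallied straight from the edge list without igraph. The sketch below is a hypothetical helper (degree_table is not part of any library) and assumes the edge list has no duplicate pairs, in which case the number of distinct neighbours should agree with igraph's degree with mode="all".

```python
def degree_table(edge_lines):
    """Count how many distinct co-authors each author has in an edge list."""
    neighbours = {}
    for line in edge_lines:
        parts = line.strip().split(",")
        if len(parts) != 2 or not all(parts):
            continue  # ignore blank or malformed lines
        a, b = parts
        # an undirected edge adds each author to the other's neighbour set
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {name: len(others) for name, others in neighbours.items()}


# the placeholder edge list from the example output above
lines = ["author1,author2", "author1,author3", "author2,author4"]
deg = degree_table(lines)

# list authors from most to least connected
for name, count in sorted(deg.items(), key=lambda kv: (-kv[1], kv[0])):
    print(name, count)
```

To run it on the real data, pass the lines of pubmed-authors.txt to degree_table in place of the placeholder list.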