What?

A network of colleagues. A family network. A network of friends. All of us are part of various networks, and I find these networks interesting. For example, insights into your extended network of colleagues might very well get you your next job. Knowing your family's network might introduce you to a relative you never knew about. In the same vein, the faculty in a university form a network: they collaborate with each other and teach together. The network of faculty in the Department of Physics at the Indian Institute of Technology Madras is what I am interested in here.

Why?

I can give you many reasons why I am interested in such a network of physicists, but the first and foremost is to understand how the network evolves. Just as each university is made up of departments, each department is made up of numerous labs headed by faculty. Depending on the faculty, some labs choose to collaborate with one another. Such collaborations can be better understood from networks.

Secondly, labs evolve: undergraduate and graduate students work in the labs over the course of their college life, and part of that work is publishing papers in journals and conferences.

This evolution of a lab, a department and a university is what interests me.

As I mentioned earlier, understanding such networks can help you in ways you couldn't have imagined. In the same light, understanding the network of a department might help individual faculty members find possible collaborators pursuing similar work.

How?

As I mentioned earlier, each lab is made up of faculty, graduate and undergraduate students, and various other people who help take care of the lab's day-to-day work. The pursuit of a research goal and common academic interests is what brought these people together in the lab. A common outcome of such a pursuit is papers in journals and conferences. Note that papers are not the only indicator of research being pursued, but they have become an indicator of a researcher's career. Such papers are what I used to create this network of faculty members in the Physics department at IIT Madras.

Take a look at the Recent Publications page of the Dept. of Physics, IIT Madras. This seems to be a list of most, if not all, of the papers published by the various faculty and their labs. One of the columns in the table is the list of authors on a particular paper. These lists of authors are what I use to generate my network.
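
To make the idea concrete, here is a minimal sketch of how the author list of a single paper can turn into links between co-authors. The author names and the helper coauthor_pairs are made up for illustration; they are not taken from the department's page or from the code further down.

```python
from itertools import combinations


def coauthor_pairs(authors):
    """Return every unordered pair of distinct co-authors on one paper."""
    # sort and de-duplicate so that each pair comes out in a stable order
    unique = sorted(set(authors))
    return list(combinations(unique, 2))


# hypothetical author list for a single paper
paper = ["a. kumar", "b. rao", "c. iyer"]
pairs = coauthor_pairs(paper)
print(pairs)
```

Each pair becomes one edge of the network, so a paper with three authors contributes three edges and a single-author paper contributes none.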

Below you will find the exact code I used to extract the author lists of various papers from that webpage and make a network. Once I had generated an appropriate network/graph, I saved it in a file, force.json. I then used D3 to load and visualize the network.

Note that there are erroneous nodes in the network/graph you see below. I am working on cleaning it up and removing such errors. Also note that the network you see below is made up of ~180 papers, or 6 pages. A network made up of the authors from the full ~900 papers available on the webpage would be too big to visualize here.
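
One plausible source of such erroneous nodes is the same name being written in slightly different ways (extra spaces, stray punctuation, leftover empty strings). Below is a rough sketch of a possible cleanup step, using only the standard library. It is not part of the code that produced the network; normalize_author and the sample tokens are hypothetical.

```python
import re


def normalize_author(raw):
    """Normalize one author token; return None if nothing usable is left."""
    name = raw.lower()
    # drop '*' markers attached to some names
    name = name.replace("*", " ")
    # collapse runs of whitespace and trim the ends
    name = re.sub(r"\s+", " ", name).strip()
    # remove stray leading/trailing punctuation left over from splitting
    name = name.strip(" ,;")
    return name or None


# hypothetical raw tokens, as they might come out of a split on ","
raw_tokens = ["  A.  Kumar*", "a. kumar", "", " ; "]
cleaned = [n for n in map(normalize_author, raw_tokens) if n is not None]
print(sorted(set(cleaned)))
```

A step like this only merges spellings that differ in case, spacing or punctuation. Variants such as initials versus full first names would still show up as separate nodes.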

Hover over the individual nodes/points on the network and the name of the faculty member or student will be displayed. As always, I would love to hear any comments or feedback from you. Please create an issue on the GitHub repository with your comments, feedback or critique. Thank you.

import json

import pandas
import networkx as nx
from networkx.readwrite import json_graph


url_template = 'https://physics.iitm.ac.in/researchinfo?page={}'
authorlist = []


# change 8 to 30 for the full ~900 papers
for page in range(8):
    # we can pass a url as the first argument to pandas.read_html
    # and it returns a list of data frames, one per table on the page
    df_list = pandas.read_html(url_template.format(page),
                               header=0,
                               index_col=0)
    df = df_list[0]

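    # column containing author names needs to be cleaned
    df.Authors = df.Authors.str.lower()
    df.Authors = df.Authors.str.strip()
    # regex=False: '*' on its own is not a valid regular expression
    df.Authors = df.Authors.str.replace('*', ' ', regex=False)
    # match 'and' only as a whole word, so that names ending in
    # 'and' (e.g. 'anand') are not split in two
    df.Authors = df.Authors.str.replace(r'\band\b', ',', regex=True)
    df.Authors = df.Authors.str.replace('&', ',', regex=False)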

    # Split column containing authors on ","
    # split is a data frame, i.e. a 2D array
    split = df['Authors'].str.split(',', expand=True)
    split.columns = ['Authors_split_{0}'.format(col)
                     for col in range(len(split.columns))]

    # strip author names of whitespaces
    for column in split:
        split[column] = split[column].str.strip()

    # each row contains authors of a paper
    # the row might contain NaNs, which is why we use dropna
    for row in range(len(split)-1):
        authorlist.append(list(split.iloc[row].dropna()))


G = nx.Graph()

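# link each author to the other authors on each paper
for authors in authorlist:
    # there might be empty strings or whitespaces in the author list
    authors = [a for a in authors if a.strip() != '']
    for pos, node1 in enumerate(authors):
        # adding the node first keeps single-author papers in the graph
        G.add_node(node1)
        # start at pos + 1 so that an author is not linked to themselves
        for node2 in authors[pos + 1:]:
            if node1 != node2:
                G.add_edge(node1, node2)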

# label each node with the author's name
# (G.nodes replaces G.node, which was removed in networkx 2.4)
for n in G:
    G.nodes[n]['name'] = n

# draw the graph using networkx's Graph object
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, node_size=100, node_color='blue')
nx.draw_networkx_edges(G, pos, edge_color='green')
nx.draw_networkx_labels(G, pos, font_color='red')

# convert the Graph object into a JSON-serializable dict
# and write it to force.json, which is then loaded with D3
d = json_graph.node_link_data(G)
with open('force.json', 'w') as f:
    json.dump(d, f)
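
If you have not seen the node-link format before, here is a small standard-library sketch of roughly what ends up in force.json: a list of nodes and a list of links between them. The exact keys written by json_graph.node_link_data depend on the networkx version, so treat the to_node_link helper, the key names and the sample edges below as illustrative assumptions rather than the library's exact output.

```python
import json


def to_node_link(edges):
    """Build a node-link style dict from a list of (author, author) pairs."""
    # collect every author that appears in at least one edge
    names = sorted({name for edge in edges for name in edge})
    return {
        "nodes": [{"id": name, "name": name} for name in names],
        "links": [{"source": a, "target": b} for a, b in edges],
    }


# hypothetical co-authorship edges
edges = [("a. kumar", "b. rao"), ("b. rao", "c. iyer")]
data = to_node_link(edges)
print(json.dumps(data, indent=2))
```

The "name" entry on each node mirrors the attribute set on the graph above, and is the kind of field a hover label can be read from.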