Creating Co-Occurrence Graph of Drugs-Chemicals-PMIDs

This notebook builds a graph database that links drugs and oxidative-stress-related chemicals to the PubMed abstracts in which they occur.

Nodes are created for drugs, chemicals, articles, and MeSH terms.

import neo4j_functions.driver as neo4j_driver
import pandas as pd
import importlib

drug_list_df = pd.read_csv('lib/Drug list total 04.05.19   - Overview Drug list.csv')
drug_occurance_df = pd.read_csv('lib/Drug_PMID_occurances.csv')

chemical_list_df = pd.read_csv('lib/Oxidative Stress Text Mining Targets 4.1 - Summary of Oxidative Stress.csv')
chemical_occurance_df = pd.read_csv('lib/Chemical_PMID_occurances.csv')

Merging Drug List with Drug Occurrence Data Sets

Duplicate drug names in the lab-provided list are merged; where possible, the entry with an associated drug category is kept.
The deduplicated list of drug names is merged with a dataframe of drug occurrences generated on the CaseOLAP cloud instance.
- The notebook used to generate the drug occurrence list is located at /home/ubuntu/RotationStd/elasticsearch/chemical_drug_elastic_occurance.ipynb
The final merged dataframe is saved into the import folder of the Neo4j instance.
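
A minimal sketch of the deduplication and many-to-one merge described above, run on made-up toy data rather than the lab files. All `toy_` names and values are hypothetical; only the column names and pandas calls mirror the notebook.

```python
import pandas as pd

# Toy stand-in for the lab-provided drug list (values are made up).
toy_drug_list = pd.DataFrame({
    'Name': ['Acarbose', 'Acarbose', 'Metformin'],
    'Drug Category': [None, 'Alpha-glucosidase Inhibitors', None],
})

# sort_values places missing values last by default, so the row with a
# category sorts first and is the one kept by drop_duplicates(keep='first').
toy_deduped = (
    toy_drug_list
    .sort_values(by='Drug Category')
    .drop_duplicates(subset=['Name'], keep='first')
)

# Toy stand-in for the occurrence table: one row per (drug, PMID) pair.
toy_occurrence = pd.DataFrame({
    'drug': ['Acarbose', 'Acarbose', 'Aspirin'],
    'PMID': [1, 2, 3],
})

# The inner merge keeps only drugs present in both tables; validate='m:1'
# raises an error if a drug name is still duplicated on the right-hand side.
toy_merged = toy_occurrence.merge(
    toy_deduped.rename(columns={'Name': 'drug', 'Drug Category': 'category'}),
    how='inner',
    validate='m:1',
)
print(toy_merged)
```

In this toy case the uncategorized Acarbose row is dropped, and the Aspirin occurrence disappears because Aspirin is not in the drug list.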

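# Remove duplicate drug names, keeping the row that has a drug category where possible
# (sort_values places NaN categories last, so keep='first' prefers rows with a category)
deduped_drug_list = drug_list_df.sort_values(by='Drug Category').drop_duplicates(subset=['Name'], keep='first')

# Strip list brackets and quotes from the MeSH column
# (regex=False treats '[' and ']' as literal characters rather than regex syntax)
drug_occurance_df['MeSH'] = (
    drug_occurance_df['MeSH']
    .str.replace('[', '', regex=False)
    .str.replace(']', '', regex=False)
    .str.replace("'", '', regex=False)
)

# Merge the drug occurrence table with the deduplicated drug list on their shared columns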
drug_list_occurance_df = drug_occurance_df.merge(
    deduped_drug_list.rename(columns={
        'Name': 'drug',
        'Drug Category': 'category',
        'MeSH Descriptor': 'drug_mesh',
    }),
    how='inner',
    validate='m:1'
)
# Strip surrounding whitespace from drug names
drug_list_occurance_df['drug'] = drug_list_occurance_df['drug'].str.strip()

# Replace missing values: empty MeSH, NaN category, and NaN drug_mesh become 'None';
# NaN synonyms fall back to the drug name
drug_list_occurance_df.loc[drug_list_occurance_df.MeSH == '', 'MeSH'] = 'None'
drug_list_occurance_df['category'] = drug_list_occurance_df['category'].fillna('None')
drug_list_occurance_df['drug_mesh'] = drug_list_occurance_df['drug_mesh'].fillna('None')
drug_list_occurance_df['Synonyms'] = drug_list_occurance_df['Synonyms'].fillna(drug_list_occurance_df['drug'])

# Saving file to import area of local neo4j instance
drug_list_occurance_file = '/Users/akre96/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-dc2bbd3b-84e9-421e-8594-9fe29be9bb02/installation-3.5.6/import/drug_list_occurance.csv'
drug_list_occurance_df.to_csv(drug_list_occurance_file, index=False)
drug_list_occurance_df.head()

	MeSH	PMID	abstract	title	drug	category	#	Synonyms	drug_mesh	MeSH tree(s)	Common adverse effects	Dosage (freq/amount/time/delivery)	Duration (time)	Pham Action
0	Actinomycetales, chemistry, enzymology, Adenos...	8784428	a phosphotransferase which modifies the alpha ...	Acarbose 7-phosphotransferase from Actinoplane...	Acarbose	Alpha-glucosidase Inhibitors	54	Acarbosa, Acarbose, Acarbosum	Acarbose	D09.698.629.802.100	Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos...	3/25-50-100mg/day/po	4-8 weeks intervals	Glycoside \nHydrolase Inhibitors
1	Acarbose, Adult, Blood Glucose, metabolism, Cl...	6350115	in a double blind study we have compared the e...	Effect of acarbose, pectin, a combination of a...	Acarbose	Alpha-glucosidase Inhibitors	54	Acarbosa, Acarbose, Acarbosum	Acarbose	D09.698.629.802.100	Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos...	3/25-50-100mg/day/po	4-8 weeks intervals	Glycoside \nHydrolase Inhibitors
2	Acarbose, Adult, Aged, Blood Glucose, metaboli...	9663365	acarbose is an alpha glucosidase inhibitor app...	Effects of beano on the tolerability and pharm...	Acarbose	Alpha-glucosidase Inhibitors	54	Acarbosa, Acarbose, Acarbosum	Acarbose	D09.698.629.802.100	Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos...	3/25-50-100mg/day/po	4-8 weeks intervals	Glycoside \nHydrolase Inhibitors
3	Acarbose, administration & dosage, Animals, Bo...	11779583	as alpha glucosidase inhibitor, the antidiabet...	Chronic acarbose-feeding increases GLUT1 prote...	Acarbose	Alpha-glucosidase Inhibitors	54	Acarbosa, Acarbose, Acarbosum	Acarbose	D09.698.629.802.100	Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos...	3/25-50-100mg/day/po	4-8 weeks intervals	Glycoside \nHydrolase Inhibitors
4	Acarbose, Aged, Blood Glucose, metabolism, Dia...	9428831	to compare the therapeutic potential of acarbo...	Efficacy of 24-week monotherapy with acarbose,...	Acarbose	Alpha-glucosidase Inhibitors	54	Acarbosa, Acarbose, Acarbosum	Acarbose	D09.698.629.802.100	Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos...	3/25-50-100mg/day/po	4-8 weeks intervals	Glycoside \nHydrolase Inhibitors
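
The bracket-and-quote stripping applied to the MeSH column can be illustrated on a single string. The sketch below assumes the raw column holds the string form of a Python list, which the notebook does not state explicitly; the helper name `strip_list_markup` and the sample values are hypothetical. It also shows one edge case worth checking in the real data: a descriptor that itself contains `', '` would be split apart when the import query later calls `split(row.MeSH, ', ')`.

```python
import ast


def strip_list_markup(raw):
    """Mirror the notebook's three replace calls on a single string."""
    return raw.replace('[', '').replace(']', '').replace("'", '')


raw_mesh = "['Acarbose', 'Adult', 'Blood Glucose']"
cleaned = strip_list_markup(raw_mesh)
print(cleaned)               # Acarbose, Adult, Blood Glucose
print(cleaned.split(', '))   # ['Acarbose', 'Adult', 'Blood Glucose']

# An empty list collapses to an empty string, the case the notebook maps to 'None'.
print(repr(strip_list_markup('[]')))  # ''

# Hypothetical edge case: a term that itself contains ', ' is split in two.
tricky = "['Diabetes Mellitus, Type 2', 'Adult']"
print(strip_list_markup(tricky).split(', '))
# ['Diabetes Mellitus', 'Type 2', 'Adult']

# ast.literal_eval parses the list literal and keeps such terms intact.
print(ast.literal_eval(tricky))
# ['Diabetes Mellitus, Type 2', 'Adult']
```

If such descriptors occur in the data, parsing the list and re-joining with a separator that cannot appear in a term may be a safer choice than stripping characters.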

Creating Neo4j Graph Database for Drug Occurrence in PMIDs

The Neo4j driver is initialized.
A query is formed to import the data from the list generated in the previous section of this notebook:
- Loading the CSV
- Creating drug entities with name, category, and synonyms fields
- Creating article entities with PMID, abstract, title, and MeSH fields
- Creating MeSH entities for each drug's MeSH descriptor, linked to the drug by HAS_MESH edges
- Creating edges labeled OCCURANCE connecting each drug to the PMIDs that reference it

importlib.reload(neo4j_driver)
driver = neo4j_driver.driver(uri = "bolt://localhost:7687", user = "neo4j", password = "drug1234")

# Build the import query; the redundant second OCCURANCE merge has been removed
import_data_query = (
    "LOAD CSV WITH HEADERS FROM %s AS row"
    " MERGE (drug:Drug {name: row.drug, category: row.category, synonyms: row.Synonyms})"
    " MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, %s)})"
    " MERGE (drug)-[:OCCURANCE]->(article)"
    " MERGE (drugmesh:MeSH {name: row.drug_mesh})"
    " MERGE (drug)-[:HAS_MESH]->(drugmesh)"
    % ('"file:///' + 'drug_list_occurance.csv' + '"', "', '")
)
print('Query:\n\t', import_data_query)
with driver.driver.session() as session:
    result = session.run(import_data_query)

Query:
     LOAD CSV WITH HEADERS FROM "file:///drug_list_occurance.csv" AS row MERGE (drug:Drug {name: row.drug, category: row.category, synonyms: row.Synonyms}) MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, ', ')}) MERGE (drug)-[:OCCURANCE]->(article) MERGE (drugmesh:MeSH {name: row.drug_mesh}) MERGE (drug)-[:HAS_MESH]->(drugmesh)

Merging Chemical List with Chemical Occurrence Data Sets

deduped_chem_list = chemical_list_df\
    .dropna(subset=['Molecule/Enzyme/Protein'])\
    .sort_values(by='Molecular and Functional Categories')\
    .drop_duplicates(subset=['Molecule/Enzyme/Protein'], keep='first')\
    .fillna('None')

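# Strip list brackets and quotes from the MeSH column
# (regex=False treats '[' and ']' as literal characters rather than regex syntax)
chemical_occurance_df['MeSH'] = (
    chemical_occurance_df['MeSH']
    .str.replace('[', '', regex=False)
    .str.replace(']', '', regex=False)
    .str.replace("'", '', regex=False)
)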
chem_list_occurance_df = chemical_occurance_df.merge(
    deduped_chem_list.rename(columns={
        'Molecule/Enzyme/Protein': 'chemical',
        'Chemical Formula': 'formula',
        'Molecular and Functional Categories': 'GO_MF',
        'Biological Events of Oxidative Stress': 'GO_Oxidative_Stress',
        'MeSH Heading': 'chemical_mesh'
    }),
    how='inner',
    validate='m:1'
).fillna('None')
chem_list_occurance_df['chemical'] = chem_list_occurance_df.chemical.str.strip()
chem_list_occurance_df.loc[chem_list_occurance_df.MeSH == '', 'MeSH'] = 'None'
chem_list_occurance_df.head()

	MeSH	PMID	abstract	title	chemical	GO_Oxidative_Stress	GO_MF	chemical_mesh	MeSH Supplementary	MeSH tree numbers	formula	Examples	Pharm Actions	Tree Numbers	References
0	None	31368101	coronary spasm plays an important role in the ...	Association of East Asian Variant Aldehyde Deh...	4-hydroxy-2-nonenal (4-HNE)	135	Lipid Peroxidation Products	Aldehydes	4-hydroxy-2-nonenal	D02.047	C9H16O2	4-HNE, MDA	Cross-Linking Reagents	D27.720.470.410.210	None
1	Acetylcholinesterase, metabolism, Aldehydes, m...	10463393	we have investigated the effect of soman induc...	Increased levels of nitrogen oxides and lipid ...	4-hydroxy-2-nonenal (4-HNE)	135	Lipid Peroxidation Products	Aldehydes	4-hydroxy-2-nonenal	D02.047	C9H16O2	4-HNE, MDA	Cross-Linking Reagents	D27.720.470.410.210	None
2	Aldehydes, chemistry, Amines, chemistry, Benzy...	8448343	the reaction of trans 4 hydroxy 2 nonenal (4 h...	Pyrrole formation from 4-hydroxynonenal and pr...	4-hydroxy-2-nonenal (4-HNE)	135	Lipid Peroxidation Products	Aldehydes	4-hydroxy-2-nonenal	D02.047	C9H16O2	4-HNE, MDA	Cross-Linking Reagents	D27.720.470.410.210	None
3	Animals, Blood-Brain Barrier, metabolism, path...	29775963	brain ischemic preconditioning (ipc) with mild...	Brain ischemic preconditioning protects agains...	4-hydroxy-2-nonenal (4-HNE)	135	Lipid Peroxidation Products	Aldehydes	4-hydroxy-2-nonenal	D02.047	C9H16O2	4-HNE, MDA	Cross-Linking Reagents	D27.720.470.410.210	None
4	Alzheimer Disease, drug therapy, enzymology, p...	30218858	excessive production of amyloid β (aβ) induced...	Neuro-protective effects of aloperine in an Al...	4-hydroxy-2-nonenal (4-HNE)	135	Lipid Peroxidation Products	Aldehydes	4-hydroxy-2-nonenal	D02.047	C9H16O2	4-HNE, MDA	Cross-Linking Reagents	D27.720.470.410.210	None

# Saving file to import area of local neo4j instance
chem_list_occurance_file = '/Users/akre96/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-dc2bbd3b-84e9-421e-8594-9fe29be9bb02/installation-3.5.6/import/chem_list_occurance.csv'
chem_list_occurance_df.to_csv(chem_list_occurance_file, index=False)

Adding Chemical Occurrence in PMIDs to the Neo4j Graph Database

A query is formed to import the data from the list generated in the previous section of this notebook:
- Loading the CSV
- Creating chemical entities with name, example, and formula fields
- Merging article entities with PMID, abstract, title, and MeSH fields
- Creating MeSH entities for each chemical's MeSH heading, linked to the chemical by HAS_MESH edges
- Creating edges labeled OCCURANCE connecting each chemical to the PMIDs that reference it

import_chemical_data_query = (
    "LOAD CSV WITH HEADERS FROM %s AS row"
    " MERGE (chem:Chemical {name: row.chemical, example: row.Examples, formula: row.formula})"
    " MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, %s)})"
    " MERGE (mesh:MeSH {name: row.chemical_mesh})"
    " MERGE (chem)-[:OCCURANCE]->(article)"
    " MERGE (chem)-[:HAS_MESH]->(mesh)"
    % ('"file:///' + 'chem_list_occurance.csv' + '"', "', '")
)
print('Query:\n\t', import_chemical_data_query)
with driver.driver.session() as session:
    result = session.run(import_chemical_data_query)

Query:
     LOAD CSV WITH HEADERS FROM "file:///chem_list_occurance.csv" AS row MERGE (chem:Chemical {name: row.chemical, example: row.Examples, formula: row.formula}) MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, ', ')}) MERGE (mesh:MeSH {name: row.chemical_mesh}) MERGE (chem)-[:OCCURANCE]->(article) MERGE (chem)-[:HAS_MESH]->(mesh)
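
One point to keep in mind with the queries above: a Cypher MERGE with a property map matches an existing node only if every listed property is equal. An article that appears in both CSV files is therefore reused only when its PMID, abstract, title, and MeSH values are identical in both. The sketch below shows one possible alternative, not the notebook's method: match each node on a single key property and set the remaining properties with ON CREATE SET. The function name `build_keyed_import_query` is hypothetical, and treating `PMID` and `name` as unique keys is an assumption.

```python
def build_keyed_import_query(csv_name, list_separator=', '):
    """Return a LOAD CSV query that matches each node on one key property.

    Assumption: PMID identifies an article and name identifies a
    chemical or MeSH term.
    """
    return (
        'LOAD CSV WITH HEADERS FROM "file:///{csv}" AS row'
        " MERGE (chem:Chemical {{name: row.chemical}})"
        " ON CREATE SET chem.example = row.Examples, chem.formula = row.formula"
        " MERGE (article:Article {{PMID: row.PMID}})"
        " ON CREATE SET article.abstract = row.abstract, article.title = row.title,"
        " article.MeSH = split(row.MeSH, '{sep}')"
        " MERGE (mesh:MeSH {{name: row.chemical_mesh}})"
        " MERGE (chem)-[:OCCURANCE]->(article)"
        " MERGE (chem)-[:HAS_MESH]->(mesh)"
    ).format(csv=csv_name, sep=list_separator)


keyed_query = build_keyed_import_query('chem_list_occurance.csv')
print('Query:\n\t', keyed_query)
```

The resulting string can be passed to `session.run` in the same way as the queries in this notebook. With this form, an article already created by the drug import keeps its existing properties and only gains new edges.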

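Adding MeSH Descriptors from Articles as MeSH Nodes

Adds a MeSH node for each descriptor in every article's MeSH list and links it to the article with a HAS_MESH edge.
Deletes the placeholder 'None' node, which stands in for missing MeSH values, along with its edges.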

article_mesh_descriptors_query = (
    "MATCH (article:Article)"
    " UNWIND article.MeSH AS m"
    " MERGE (artMesh:MeSH {name: m})"
    " MERGE (article)-[:HAS_MESH]->(artMesh)"
)
print('Query:\n\t', article_mesh_descriptors_query)
with driver.driver.session() as session:
    result = session.run(article_mesh_descriptors_query)

Query:
     MATCH (article:Article) UNWIND article.MeSH AS m MERGE (artMesh:MeSH {name: m}) MERGE (article)-[:HAS_MESH]->(artMesh)

delete_none_mesh_descriptors_query = (
    "MATCH (m:MeSH {name: 'None'})"
    " DETACH DELETE m"
)
print('Query:\n\t', delete_none_mesh_descriptors_query)
with driver.driver.session() as session:
    result = session.run(delete_none_mesh_descriptors_query)

Query:
     MATCH (m:MeSH {name: 'None'}) DETACH DELETE m
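
The effect of the two queries above can be mimicked in plain Python, without a database, to make the UNWIND, MERGE, and DETACH DELETE steps concrete. This is only an analogy on made-up toy data: the PMIDs, descriptors, and variable names are hypothetical, and Python sets stand in for Neo4j's node and relationship stores.

```python
# Toy articles keyed by PMID (made-up values), each with a MeSH list property.
articles = {
    '111': ['Acarbose', 'Adult'],
    '222': ['Adult', 'Blood Glucose'],
    '333': ['None'],
}

mesh_nodes = set()   # one node per distinct descriptor name
has_mesh = set()     # (PMID, descriptor) edges

# UNWIND yields one row per list element; MERGE creates a node or edge
# only if an identical one does not already exist, which a set models here.
for pmid, descriptors in articles.items():
    for name in descriptors:
        mesh_nodes.add(name)
        has_mesh.add((pmid, name))

# DETACH DELETE removes the node together with every edge attached to it.
mesh_nodes.discard('None')
has_mesh = {(pmid, name) for pmid, name in has_mesh if name != 'None'}

print(sorted(mesh_nodes))
# ['Acarbose', 'Adult', 'Blood Glucose']
print(sorted(has_mesh))
# [('111', 'Acarbose'), ('111', 'Adult'), ('222', 'Adult'), ('222', 'Blood Glucose')]
```

Note that, as in the toy `articles` dictionary, deleting the 'None' node does not change the MeSH list stored as a property on each article.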