Creating Co-Occurrence Graph of Drugs-Chemicals-PMIDs

This notebook creates a graph database that stores drugs and chemicals related to oxidative stress together with their occurrences in PubMed abstracts.

Nodes are created for drugs, chemicals, articles, and MeSH terms.

import neo4j_functions.driver as neo4j_driver
import pandas as pd
import importlib
drug_list_df = pd.read_csv('lib/Drug list total 04.05.19   - Overview Drug list.csv')
drug_occurance_df = pd.read_csv('lib/Drug_PMID_occurances.csv')

chemical_list_df = pd.read_csv('lib/Oxidative Stress Text Mining Targets 4.1 - Summary of Oxidative Stress.csv')
chemical_occurance_df = pd.read_csv('lib/Chemical_PMID_occurances.csv')

Merging Drug List with Drug Occurrence Data Sets

  1. Duplicate drug names in the lab-provided list are merged; where possible, the entry with an associated category is kept.
  2. The deduplicated list of drug names is merged with a dataframe of drug occurrences generated on the CaseOLAP cloud instance.
    • The notebook used to generate the drug occurrence list is located at /home/ubuntu/RotationStd/elasticsearch/chemical_drug_elastic_occurance.ipynb
  3. The final merged dataframe is saved into the import folder of the Neo4j instance.
# Remove duplicate drug names; sorting puts rows with a NaN category last,
# so keep='first' retains a categorized row when one exists
deduped_drug_list = drug_list_df.sort_values(by='Drug Category').drop_duplicates(subset=['Name'], keep='first')
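The deduplication above relies on pandas sorting missing values last by default. A minimal sketch on toy rows (the table below is made up for illustration and is not the lab-provided list) shows which duplicate survives:

```python
import pandas as pd

# Toy stand-in for the lab-provided list (rows are made up for illustration).
toy_drug_list = pd.DataFrame({
    'Name': ['Acarbose', 'Acarbose', 'Metformin'],
    'Drug Category': [None, 'Alpha-glucosidase Inhibitors', None],
})

# sort_values places missing categories last (na_position='last' is the default),
# so keep='first' retains the categorized row whenever one exists.
toy_deduped = (
    toy_drug_list
    .sort_values(by='Drug Category')
    .drop_duplicates(subset=['Name'], keep='first')
)
print(toy_deduped)
```

A name that only ever appears without a category is still kept, with its category left missing.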
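# Strip brackets and quotes from the stringified MeSH lists
# (regex=False makes '[' and ']' literal characters in every pandas version)
drug_occurance_df['MeSH'] = drug_occurance_df['MeSH'].str.replace('[', '', regex=False).str.replace(']', '', regex=False).str.replace("'", '', regex=False)

# Merge the drug occurrence list with the deduplicated drug list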
drug_list_occurance_df = drug_occurance_df.merge(
    deduped_drug_list.rename(columns={
        'Name': 'drug',
        'Drug Category': 'category',
        'MeSH Descriptor': 'drug_mesh',
    }),
    how='inner',
    validate='m:1'
)
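Stripping brackets and apostrophes works as long as no MeSH term contains those characters, and the later split on ', ' would also break any heading that itself contains a comma. If the MeSH column holds Python list literals (an assumption based on the characters being stripped), parsing them is a safer alternative. The helper name `parse_mesh` and the '|' delimiter are hypothetical; the delimiter passed to `split` in the Cypher query would have to change to match.

```python
import ast

import pandas as pd


def parse_mesh(cell, delimiter='|'):
    """Turn a stringified list such as "['A', 'B, C']" into 'A|B, C'."""
    if pd.isna(cell):
        return 'None'
    terms = ast.literal_eval(cell)  # safely evaluates the list literal
    return delimiter.join(terms) if terms else 'None'


# Made-up cells: a normal list, an empty list, and a missing value.
toy_mesh = pd.Series(["['Acarbose', 'Diabetes Mellitus, Type 2']", "[]", None])
parsed = toy_mesh.apply(parse_mesh).tolist()
print(parsed)
```

The comma inside the second term survives because the terms are joined with a character that does not occur in them.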
# Replace missing values: an empty MeSH string and NaN category or drug_mesh become 'None';
# NaN synonyms are replaced with the drug name
drug_list_occurance_df['drug'] = drug_list_occurance_df['drug'].str.strip()
drug_list_occurance_df.loc[drug_list_occurance_df.MeSH == '', 'MeSH'] = 'None'

drug_list_occurance_df.loc[drug_list_occurance_df.category.isnull(), 'category'] = 'None'
drug_list_occurance_df.loc[drug_list_occurance_df.drug_mesh.isnull(), 'drug_mesh'] = 'None'
drug_list_occurance_df.loc[drug_list_occurance_df.Synonyms.isnull(), 'Synonyms'] = drug_list_occurance_df[drug_list_occurance_df.Synonyms.isnull()].drug
# Saving file to import area of local neo4j instance
drug_list_occurance_file = '/Users/akre96/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-dc2bbd3b-84e9-421e-8594-9fe29be9bb02/installation-3.5.6/import/drug_list_occurance.csv'
drug_list_occurance_df.to_csv(drug_list_occurance_file, index=False)
drug_list_occurance_df.head()
MeSH PMID abstract title drug category # Synonyms drug_mesh MeSH tree(s) Common adverse effects Dosage (freq/amount/time/delivery) Duration (time) Pham Action
0 Actinomycetales, chemistry, enzymology, Adenos... 8784428 a phosphotransferase which modifies the alpha ... Acarbose 7-phosphotransferase from Actinoplane... Acarbose Alpha-glucosidase Inhibitors 54 Acarbosa, Acarbose, Acarbosum Acarbose D09.698.629.802.100 Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos... 3/25-50-100mg/day/po 4-8 weeks intervals Glycoside \nHydrolase Inhibitors
1 Acarbose, Adult, Blood Glucose, metabolism, Cl... 6350115 in a double blind study we have compared the e... Effect of acarbose, pectin, a combination of a... Acarbose Alpha-glucosidase Inhibitors 54 Acarbosa, Acarbose, Acarbosum Acarbose D09.698.629.802.100 Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos... 3/25-50-100mg/day/po 4-8 weeks intervals Glycoside \nHydrolase Inhibitors
2 Acarbose, Adult, Aged, Blood Glucose, metaboli... 9663365 acarbose is an alpha glucosidase inhibitor app... Effects of beano on the tolerability and pharm... Acarbose Alpha-glucosidase Inhibitors 54 Acarbosa, Acarbose, Acarbosum Acarbose D09.698.629.802.100 Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos... 3/25-50-100mg/day/po 4-8 weeks intervals Glycoside \nHydrolase Inhibitors
3 Acarbose, administration & dosage, Animals, Bo... 11779583 as alpha glucosidase inhibitor, the antidiabet... Chronic acarbose-feeding increases GLUT1 prote... Acarbose Alpha-glucosidase Inhibitors 54 Acarbosa, Acarbose, Acarbosum Acarbose D09.698.629.802.100 Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos... 3/25-50-100mg/day/po 4-8 weeks intervals Glycoside \nHydrolase Inhibitors
4 Acarbose, Aged, Blood Glucose, metabolism, Dia... 9428831 to compare the therapeutic potential of acarbo... Efficacy of 24-week monotherapy with acarbose,... Acarbose Alpha-glucosidase Inhibitors 54 Acarbosa, Acarbose, Acarbosum Acarbose D09.698.629.802.100 Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos... 3/25-50-100mg/day/po 4-8 weeks intervals Glycoside \nHydrolase Inhibitors

Creating Neo4j Graph Database for Drug Occurrence in PMIDs

  1. The Neo4j driver is initialized.
  2. A query is formed to import the data from the list generated in the previous section of this notebook:
    • Loading the CSV
    • Creating drug entities with name, category, and synonym fields
    • Creating article entities with PMID, abstract, title, and MeSH fields
    • Creating edges labeled OCCURANCE connecting each drug to the articles (PMIDs) that mention it
    • Creating MeSH entities for each drug's MeSH descriptor, connected to the drug by HAS_MESH edges
importlib.reload(neo4j_driver)
driver = neo4j_driver.driver(uri = "bolt://localhost:7687", user = "neo4j", password = "drug1234")
import_data_query = (
    "LOAD CSV WITH HEADERS FROM %s AS row"
    " MERGE (drug:Drug {name: row.drug, category: row.category, synonyms: row.Synonyms})"
    " MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, %s)})"
    " MERGE (drugmesh:MeSH {name: row.drug_mesh})"
    " MERGE (drug)-[:OCCURANCE]->(article)"
    " MERGE (drug)-[:HAS_MESH]->(drugmesh)"
    % ('"file:///' + 'drug_list_occurance.csv' + '"', "', '")
)
print('Query:\n\t', import_data_query)
with driver.driver.session() as session:
    result = session.run(import_data_query)
Query:
     LOAD CSV WITH HEADERS FROM "file:///drug_list_occurance.csv" AS row MERGE (drug:Drug {name: row.drug, category: row.category, synonyms: row.Synonyms}) MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, ', ')}) MERGE (drugmesh:MeSH {name: row.drug_mesh}) MERGE (drug)-[:OCCURANCE]->(article) MERGE (drug)-[:HAS_MESH]->(drugmesh)
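MERGE matches on every property in its pattern, so rows describing the same entity that differ in any one property would produce separate nodes. One possible variation, offered as a sketch rather than as this notebook's method, merges on a single identifying property, fills in the rest with ON CREATE SET, and passes the file URL and delimiter as query parameters instead of using string formatting. The function name `import_drug_csv` is hypothetical, and `session` is assumed to be any object with a `run(query, **parameters)` method, such as a session from the official neo4j Python driver.

```python
DRUG_IMPORT_QUERY = (
    "LOAD CSV WITH HEADERS FROM $url AS row"
    " MERGE (drug:Drug {name: row.drug})"
    "   ON CREATE SET drug.category = row.category, drug.synonyms = row.Synonyms"
    " MERGE (article:Article {PMID: row.PMID})"
    "   ON CREATE SET article.abstract = row.abstract, article.title = row.title,"
    "                 article.MeSH = split(row.MeSH, $delimiter)"
    " MERGE (drugmesh:MeSH {name: row.drug_mesh})"
    " MERGE (drug)-[:OCCURANCE]->(article)"
    " MERGE (drug)-[:HAS_MESH]->(drugmesh)"
)


def import_drug_csv(session, csv_name='drug_list_occurance.csv', delimiter=', '):
    """Run the import; csv_name is resolved relative to the Neo4j import folder."""
    # Keyword arguments are sent to the server as the $url and $delimiter parameters.
    return session.run(DRUG_IMPORT_QUERY, url='file:///' + csv_name, delimiter=delimiter)
```

With the session used elsewhere in this notebook, the call would look like `import_drug_csv(session)` inside a `with driver.driver.session() as session:` block.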

Merging Chemical List with Chemical Occurrence Data Sets

deduped_chem_list = chemical_list_df\
    .dropna(subset=['Molecule/Enzyme/Protein'])\
    .sort_values(by='Molecular and Functional Categories')\
    .drop_duplicates(subset=['Molecule/Enzyme/Protein'], keep='first')\
    .fillna('None')
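# Strip brackets and quotes from the stringified MeSH lists (regex=False keeps '[' and ']' literal)
chemical_occurance_df['MeSH'] = chemical_occurance_df['MeSH'].str.replace('[', '', regex=False).str.replace(']', '', regex=False).str.replace("'", '', regex=False)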
chem_list_occurance_df = chemical_occurance_df.merge(
    deduped_chem_list.rename(columns={
        'Molecule/Enzyme/Protein': 'chemical',
        'Chemical Formula': 'formula',
        'Molecular and Functional Categories': 'GO_MF',
        'Biological Events of Oxidative Stress': 'GO_Oxidative_Stress',
        'MeSH Heading': 'chemical_mesh'
    }),
    how='inner',
    validate='m:1'
).fillna('None')
chem_list_occurance_df['chemical'] = chem_list_occurance_df.chemical.str.strip()
chem_list_occurance_df.loc[chem_list_occurance_df.MeSH == '', 'MeSH'] = 'None'
chem_list_occurance_df.head()
MeSH PMID abstract title chemical GO_Oxidative_Stress GO_MF chemical_mesh MeSH Supplementary MeSH tree numbers formula Examples Pharm Actions Tree Numbers References
0 None 31368101 coronary spasm plays an important role in the ... Association of East Asian Variant Aldehyde Deh... 4-hydroxy-2-nonenal (4-HNE) 135 Lipid Peroxidation Products Aldehydes 4-hydroxy-2-nonenal D02.047 C9H16O2 4-HNE, MDA Cross-Linking Reagents D27.720.470.410.210 None
1 Acetylcholinesterase, metabolism, Aldehydes, m... 10463393 we have investigated the effect of soman induc... Increased levels of nitrogen oxides and lipid ... 4-hydroxy-2-nonenal (4-HNE) 135 Lipid Peroxidation Products Aldehydes 4-hydroxy-2-nonenal D02.047 C9H16O2 4-HNE, MDA Cross-Linking Reagents D27.720.470.410.210 None
2 Aldehydes, chemistry, Amines, chemistry, Benzy... 8448343 the reaction of trans 4 hydroxy 2 nonenal (4 h... Pyrrole formation from 4-hydroxynonenal and pr... 4-hydroxy-2-nonenal (4-HNE) 135 Lipid Peroxidation Products Aldehydes 4-hydroxy-2-nonenal D02.047 C9H16O2 4-HNE, MDA Cross-Linking Reagents D27.720.470.410.210 None
3 Animals, Blood-Brain Barrier, metabolism, path... 29775963 brain ischemic preconditioning (ipc) with mild... Brain ischemic preconditioning protects agains... 4-hydroxy-2-nonenal (4-HNE) 135 Lipid Peroxidation Products Aldehydes 4-hydroxy-2-nonenal D02.047 C9H16O2 4-HNE, MDA Cross-Linking Reagents D27.720.470.410.210 None
4 Alzheimer Disease, drug therapy, enzymology, p... 30218858 excessive production of amyloid β (aβ) induced... Neuro-protective effects of aloperine in an Al... 4-hydroxy-2-nonenal (4-HNE) 135 Lipid Peroxidation Products Aldehydes 4-hydroxy-2-nonenal D02.047 C9H16O2 4-HNE, MDA Cross-Linking Reagents D27.720.470.410.210 None
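Both merges in this notebook pass `validate='m:1'`, which makes pandas raise `MergeError` if the right-hand lookup table still contains duplicate keys. Because no `on` argument is given, the join uses every column name the two frames share. A small sketch with made-up rows:

```python
import pandas as pd
from pandas.errors import MergeError

# Made-up occurrence rows and lookup table sharing only the 'chemical' column.
occurrences = pd.DataFrame({'chemical': ['A', 'A', 'B'], 'PMID': [1, 2, 3]})
lookup = pd.DataFrame({'chemical': ['A', 'B'], 'formula': ['X1', 'Y2']})

# Many occurrence rows map to one lookup row, so validation passes.
merged = occurrences.merge(lookup, how='inner', validate='m:1')
print(merged)

# A duplicated key on the right-hand side violates m:1 and raises MergeError.
duplicated_lookup = pd.concat([lookup, lookup.iloc[[0]]], ignore_index=True)
try:
    occurrences.merge(duplicated_lookup, how='inner', validate='m:1')
    raised = False
except MergeError:
    raised = True
print('MergeError raised:', raised)
```

This is why the lists are deduplicated before merging: a leftover duplicate name fails loudly instead of silently multiplying occurrence rows.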
# Saving file to import area of local neo4j instance
chem_list_occurance_file = '/Users/akre96/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-dc2bbd3b-84e9-421e-8594-9fe29be9bb02/installation-3.5.6/import/chem_list_occurance.csv'
chem_list_occurance_df.to_csv(chem_list_occurance_file, index=False)

Adding Chemical Occurrence in PMIDs to the Neo4j Graph Database

  1. A query is formed to import the data from the list generated in the previous section of this notebook:
    • Loading the CSV
    • Creating chemical entities with name, example, and formula fields
    • Merging article entities with PMID, abstract, title, and MeSH fields
    • Creating MeSH entities for each chemical's MeSH heading, connected to the chemical by HAS_MESH edges
    • Creating edges labeled OCCURANCE connecting each chemical to the articles (PMIDs) that mention it
import_chemical_data_query = (
    "LOAD CSV WITH HEADERS FROM %s AS row"
    " MERGE (chem:Chemical {name: row.chemical, example: row.Examples, formula: row.formula})"
    " MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, %s)})"
    " MERGE (mesh:MeSH {name: row.chemical_mesh})"
    " MERGE (chem)-[:OCCURANCE]->(article)"
    " MERGE (chem)-[:HAS_MESH]->(mesh)"
    % ('"file:///' + 'chem_list_occurance.csv' + '"', "', '")
)
print('Query:\n\t', import_chemical_data_query)
with driver.driver.session() as session:
    result = session.run(import_chemical_data_query)
Query:
     LOAD CSV WITH HEADERS FROM "file:///chem_list_occurance.csv" AS row MERGE (chem:Chemical {name: row.chemical, example: row.Examples, formula: row.formula}) MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, ', ')}) MERGE (mesh:MeSH {name: row.chemical_mesh}) MERGE (chem)-[:OCCURANCE]->(article) MERGE (chem)-[:HAS_MESH]->(mesh)

Adding MeSH Descriptors from Articles as MeSH Nodes

  1. Adds a MeSH node for each descriptor in every article's MeSH list and links the article to it with a HAS_MESH edge
  2. Deletes the 'None' placeholder node
article_mesh_descriptors_query = (
    "MATCH (article:Article)"
    " UNWIND article.MeSH AS m"
    " MERGE (artMesh:MeSH {name: m})"
    " MERGE (article)-[:HAS_MESH]->(artMesh)"
)
print('Query:\n\t', article_mesh_descriptors_query)
with driver.driver.session() as session:
    result = session.run(article_mesh_descriptors_query)
Query:
     MATCH (article:Article) UNWIND article.MeSH AS m MERGE (artMesh:MeSH {name: m}) MERGE (article)-[:HAS_MESH]->(artMesh)
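UNWIND expands each article's MeSH list into one row per term, and MERGE then creates each distinct term only once. The same reshaping can be pictured in pandas with `explode`, here on made-up PMIDs and terms:

```python
import pandas as pd

# Made-up articles, each carrying a list of MeSH terms.
articles = pd.DataFrame({
    'PMID': [1, 2],
    'MeSH': [['Acarbose', 'Adult'], ['Adult', 'Aldehydes']],
})

# One row per (article, term) pair, analogous to UNWIND article.MeSH AS m.
pairs = articles.explode('MeSH').reset_index(drop=True)

# Distinct terms correspond to the MeSH nodes that MERGE would create.
mesh_nodes = sorted(pairs['MeSH'].unique())
print(pairs)
print(mesh_nodes)
```

Each row of `pairs` corresponds to one HAS_MESH edge, while a term shared by several articles maps to a single node.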
delete_none_mesh_descriptors_query = (
    "MATCH (m:MeSH {name: 'None'})"
    " DETACH DELETE m"
)
print('Query:\n\t', delete_none_mesh_descriptors_query)
with driver.driver.session() as session:
    result = session.run(delete_none_mesh_descriptors_query)
Query:
     MATCH (m:MeSH {name: 'None'}) DETACH DELETE m
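Once both imports have run, drug-chemical co-occurrence can be read off the graph by looking for articles linked to both a drug and a chemical. The query below is a sketch of one way to do that, not a step from the original notebook; the function name `top_cooccurrences` and the `limit` parameter are hypothetical, and `session` is assumed to behave like a session from the official neo4j Python driver.

```python
COOCCURRENCE_QUERY = (
    "MATCH (d:Drug)-[:OCCURANCE]->(a:Article)<-[:OCCURANCE]-(c:Chemical)"
    " RETURN d.name AS drug, c.name AS chemical, count(DISTINCT a) AS shared_articles"
    " ORDER BY shared_articles DESC"
    " LIMIT $limit"
)


def top_cooccurrences(session, limit=10):
    """Return drug/chemical pairs ranked by the number of articles mentioning both."""
    result = session.run(COOCCURRENCE_QUERY, limit=limit)
    # Materialize the rows while the session is still open.
    return [dict(record) for record in result]
```

It would be called as `top_cooccurrences(session)` inside a `with driver.driver.session() as session:` block, like the other queries in this notebook.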