Creating Co-Occurrence Graph of Drugs-Chemicals-PMIDs
This notebook creates a graph database storing the relations between drugs and chemicals related to oxidative stress and their occurrence in PubMed abstracts.
Nodes are created for drugs, chemicals, articles, and MeSH terms.
import neo4j_functions.driver as neo4j_driver
import pandas as pd
import importlib
drug_list_df = pd.read_csv('lib/Drug list total 04.05.19 - Overview Drug list.csv')
drug_occurance_df = pd.read_csv('lib/Drug_PMID_occurances.csv')
chemical_list_df = pd.read_csv('lib/Oxidative Stress Text Mining Targets 4.1 - Summary of Oxidative Stress.csv')
chemical_occurance_df = pd.read_csv('lib/Chemical_PMID_occurances.csv')
Merging Drug List with Drug Occurrence Data Sets
- Duplicate drug names in the lab-provided list are merged; the entry with an associated category is kept if possible
- The deduplicated list of drug names is merged with a dataframe of drug occurrences generated on the CaseOLAP cloud instance
- The notebook used to generate the drug occurrence list is located at /home/ubuntu/RotationStd/elasticsearch/chemical_drug_elastic_occurance.ipynb
- The final merged dataframe is saved into the import folder of the Neo4j instance
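The deduplication rule in the first bullet depends on `sort_values` placing missing values last by default, so `drop_duplicates(keep='first')` keeps a categorized row whenever one exists. A minimal sketch on made-up rows (the drug names and categories below are illustrative and not taken from the lab list):

```python
import pandas as pd

# Hypothetical toy rows: 'Metformin' appears twice, once without a category
toy_drugs = pd.DataFrame({
    'Name': ['Metformin', 'Acarbose', 'Metformin'],
    'Drug Category': [None, 'Alpha-glucosidase Inhibitors', 'Biguanides'],
})

# sort_values uses na_position='last' by default, so for each name a
# categorized row sorts ahead of an uncategorized one
deduped = (
    toy_drugs
    .sort_values(by='Drug Category')
    .drop_duplicates(subset=['Name'], keep='first')
)
print(deduped)
```

If two rows for the same name both carry a category, this approach keeps whichever category sorts first alphabetically, which may or may not be the preferred one.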
# Removing Duplicate drug names, keeping version with a drug category if possible
deduped_drug_list = drug_list_df.sort_values(by='Drug Category').drop_duplicates(subset=['Name'], keep='first')
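# Stripping list punctuation ([, ], ') from the stringified MeSH lists
# regex=False treats '[' and ']' as literal characters in every pandas version
drug_occurance_df['MeSH'] = (
    drug_occurance_df['MeSH']
    .str.replace('[', '', regex=False)
    .str.replace(']', '', regex=False)
    .str.replace("'", '', regex=False)
)
# Merging drug list with drug occurance list
drug_list_occurance_df = drug_occurance_df.merge(
    deduped_drug_list.rename(columns={
        'Name': 'drug',
        'Drug Category': 'category',
        'MeSH Descriptor': 'drug_mesh',
    }),
    how='inner',
    validate='m:1'
)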
# Values with NaN for category or synonym replaced
# NaN for synonym replaced with drug name, category replaced with 'None'
drug_list_occurance_df['drug'] = drug_list_occurance_df['drug'].str.strip()
drug_list_occurance_df.loc[drug_list_occurance_df.MeSH == '', 'MeSH'] = 'None'
drug_list_occurance_df.loc[drug_list_occurance_df.category.isnull(), 'category'] = 'None'
drug_list_occurance_df.loc[drug_list_occurance_df.drug_mesh.isnull(), 'drug_mesh'] = 'None'
drug_list_occurance_df.loc[drug_list_occurance_df.Synonyms.isnull(), 'Synonyms'] = drug_list_occurance_df[drug_list_occurance_df.Synonyms.isnull()].drug
# Saving file to import area of local neo4j instance
drug_list_occurance_file = '/Users/akre96/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-dc2bbd3b-84e9-421e-8594-9fe29be9bb02/installation-3.5.6/import/drug_list_occurance.csv'
drug_list_occurance_df.to_csv(drug_list_occurance_file, index=False)
drug_list_occurance_df.head()
| | MeSH | PMID | abstract | title | drug | category | # | Synonyms | drug_mesh | MeSH tree(s) | Common adverse effects | Dosage (freq/amount/time/delivery) | Duration (time) | Pham Action |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Actinomycetales, chemistry, enzymology, Adenos... | 8784428 | a phosphotransferase which modifies the alpha ... | Acarbose 7-phosphotransferase from Actinoplane... | Acarbose | Alpha-glucosidase Inhibitors | 54 | Acarbosa, Acarbose, Acarbosum | Acarbose | D09.698.629.802.100 | Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos... | 3/25-50-100mg/day/po | 4-8 weeks intervals | Glycoside \nHydrolase Inhibitors |
1 | Acarbose, Adult, Blood Glucose, metabolism, Cl... | 6350115 | in a double blind study we have compared the e... | Effect of acarbose, pectin, a combination of a... | Acarbose | Alpha-glucosidase Inhibitors | 54 | Acarbosa, Acarbose, Acarbosum | Acarbose | D09.698.629.802.100 | Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos... | 3/25-50-100mg/day/po | 4-8 weeks intervals | Glycoside \nHydrolase Inhibitors |
2 | Acarbose, Adult, Aged, Blood Glucose, metaboli... | 9663365 | acarbose is an alpha glucosidase inhibitor app... | Effects of beano on the tolerability and pharm... | Acarbose | Alpha-glucosidase Inhibitors | 54 | Acarbosa, Acarbose, Acarbosum | Acarbose | D09.698.629.802.100 | Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos... | 3/25-50-100mg/day/po | 4-8 weeks intervals | Glycoside \nHydrolase Inhibitors |
3 | Acarbose, administration & dosage, Animals, Bo... | 11779583 | as alpha glucosidase inhibitor, the antidiabet... | Chronic acarbose-feeding increases GLUT1 prote... | Acarbose | Alpha-glucosidase Inhibitors | 54 | Acarbosa, Acarbose, Acarbosum | Acarbose | D09.698.629.802.100 | Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos... | 3/25-50-100mg/day/po | 4-8 weeks intervals | Glycoside \nHydrolase Inhibitors |
4 | Acarbose, Aged, Blood Glucose, metabolism, Dia... | 9428831 | to compare the therapeutic potential of acarbo... | Efficacy of 24-week monotherapy with acarbose,... | Acarbose | Alpha-glucosidase Inhibitors | 54 | Acarbosa, Acarbose, Acarbosum | Acarbose | D09.698.629.802.100 | Hypoglycaemia, Hypoglycaemic \ncoma, pneumatos... | 3/25-50-100mg/day/po | 4-8 weeks intervals | Glycoside \nHydrolase Inhibitors |
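The merge above uses `how='inner'` with `validate='m:1'`. A small sketch of what those two options do, on hypothetical rows (the PMIDs and categories are invented for illustration): the inner join drops occurrences whose drug is missing from the drug list, and the validation raises an error if the drug list still contains duplicate names.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical occurrence rows; 'Unlisted' has no entry in the drug list
occurrences = pd.DataFrame({
    'drug': ['Acarbose', 'Acarbose', 'Unlisted'],
    'PMID': [1, 2, 3],
})
# Drug list that was NOT deduplicated: 'Acarbose' appears twice
drug_info_dupes = pd.DataFrame({
    'drug': ['Acarbose', 'Acarbose'],
    'category': ['Alpha-glucosidase Inhibitors', None],
})

# validate='m:1' requires the merge keys on the right side to be unique
try:
    occurrences.merge(drug_info_dupes, how='inner', validate='m:1')
    merge_failed = False
except MergeError:
    merge_failed = True

# After removing the duplicate, the same merge succeeds
clean_info = drug_info_dupes.dropna(subset=['category'])
merged = occurrences.merge(clean_info, how='inner', validate='m:1')
print(merge_failed, len(merged))
```

Because no `on=` argument is given, pandas joins on every column name the two frames share, so it is worth checking that only the intended key column overlaps.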
Creating Neo4J Graph Database for Drug Occurrence in PMIDs
- Neo4J driver is initialized
- A query is formed to import the data from the list generated in the previous section of this notebook:
  - Loading the csv
  - Creating drug entities with name, category, and synonym fields
  - Creating article entities with PMID, abstract, title, and MeSH fields
  - Creating MeSH entities for each drug's MeSH descriptor, linked to the drug by HAS_MESH edges
  - Creating edges labeled OCCURANCE connecting each drug to the articles (PMIDs) that mention it
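The article MeSH field described above is produced in two stages: list punctuation is stripped in pandas, and the remaining string is split on `', '` inside the Cypher query. The sketch below reproduces both stages in plain Python on one hypothetical value, assuming the CSV stores MeSH terms as a stringified Python list. It also shows a caveat: a descriptor that itself contains `', '` (such as "Diabetes Mellitus, Type 2") is broken into two terms. Parsing with `ast.literal_eval` is shown only as a possible alternative, not as the notebook's method.

```python
import ast

# Hypothetical raw cell value, assumed to be a stringified Python list
raw = "['Acarbose', 'Diabetes Mellitus, Type 2', 'Blood Glucose']"

# Same character stripping as the notebook's str.replace chain
cleaned = raw.replace('[', '').replace(']', '').replace("'", '')

# Python stand-in for Cypher's split(row.MeSH, ', ')
split_terms = cleaned.split(', ')

# Alternative (an assumption): parse the list literal so embedded commas survive
parsed_terms = ast.literal_eval(raw)

print(split_terms)
print(parsed_terms)
```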
importlib.reload(neo4j_driver)
driver = neo4j_driver.driver(uri = "bolt://localhost:7687", user = "neo4j", password = "drug1234")
import_data_query = (
    "LOAD CSV WITH HEADERS FROM %s AS row"
    " MERGE (drug:Drug {name: row.drug, category: row.category, synonyms: row.Synonyms})"
    " MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, %s)})"
    " MERGE (drugmesh:MeSH {name: row.drug_mesh})"
    " MERGE (drug)-[:OCCURANCE]->(article)"
    " MERGE (drug)-[:HAS_MESH]->(drugmesh)"
    % ('"file:///' + 'drug_list_occurance.csv' + '"', "', '")
)
print('Query:\n\t', import_data_query)
with driver.driver.session() as session:
    result = session.run(import_data_query)
Query:
LOAD CSV WITH HEADERS FROM "file:///drug_list_occurance.csv" AS row MERGE (drug:Drug {name: row.drug, category: row.category, synonyms: row.Synonyms}) MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, ', ')}) MERGE (drugmesh:MeSH {name: row.drug_mesh}) MERGE (drug)-[:OCCURANCE]->(article) MERGE (drug)-[:HAS_MESH]->(drugmesh)
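The query above merges each node on its full property map, so two rows that share a key but differ in any other property (for example, the same drug name with two different categories) would produce two separate nodes. One possible variation, sketched here as an assumption rather than as the notebook's approach, merges on a single key property and fills in the remaining properties with `ON CREATE SET`. The helper name `build_drug_import_query` is hypothetical.

```python
def build_drug_import_query(csv_name, delimiter=', '):
    """Build a LOAD CSV query that merges each node on one key property.

    Hypothetical helper: nodes are matched on name or PMID only, and the
    remaining properties are written when the node is first created.
    """
    return (
        f'LOAD CSV WITH HEADERS FROM "file:///{csv_name}" AS row'
        " MERGE (drug:Drug {name: row.drug})"
        " ON CREATE SET drug.category = row.category, drug.synonyms = row.Synonyms"
        " MERGE (article:Article {PMID: row.PMID})"
        " ON CREATE SET article.abstract = row.abstract, article.title = row.title,"
        f" article.MeSH = split(row.MeSH, '{delimiter}')"
        " MERGE (drugmesh:MeSH {name: row.drug_mesh})"
        " MERGE (drug)-[:OCCURANCE]->(article)"
        " MERGE (drug)-[:HAS_MESH]->(drugmesh)"
    )


key_only_query = build_drug_import_query('drug_list_occurance.csv')
print('Query:\n\t', key_only_query)
```

The resulting string could be passed to `session.run` in the same way as the original query. Whether key-only merging is preferable depends on whether rows with conflicting properties should collapse into one node.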
Merging Chemical List with Chemical Occurrence Data Sets
deduped_chem_list = chemical_list_df\
.dropna(subset=['Molecule/Enzyme/Protein'])\
.sort_values(by='Molecular and Functional Categories')\
.drop_duplicates(subset=['Molecule/Enzyme/Protein'], keep='first')\
.fillna('None')
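# Stripping list punctuation ([, ], ') from the stringified MeSH lists
# regex=False treats '[' and ']' as literal characters in every pandas version
chemical_occurance_df['MeSH'] = (
    chemical_occurance_df['MeSH']
    .str.replace('[', '', regex=False)
    .str.replace(']', '', regex=False)
    .str.replace("'", '', regex=False)
)
# Merging chemical list with chemical occurance list
chem_list_occurance_df = chemical_occurance_df.merge(
    deduped_chem_list.rename(columns={
        'Molecule/Enzyme/Protein': 'chemical',
        'Chemical Formula': 'formula',
        'Molecular and Functional Categories': 'GO_MF',
        'Biological Events of Oxidative Stress': 'GO_Oxidative_Stress',
        'MeSH Heading': 'chemical_mesh'
    }),
    how='inner',
    validate='m:1'
).fillna('None')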
chem_list_occurance_df['chemical'] = chem_list_occurance_df.chemical.str.strip()
chem_list_occurance_df.loc[chem_list_occurance_df.MeSH == '', 'MeSH'] = 'None'
chem_list_occurance_df.head()
| | MeSH | PMID | abstract | title | chemical | GO_Oxidative_Stress | GO_MF | chemical_mesh | MeSH Supplementary | MeSH tree numbers | formula | Examples | Pharm Actions | Tree Numbers | References |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | None | 31368101 | coronary spasm plays an important role in the ... | Association of East Asian Variant Aldehyde Deh... | 4-hydroxy-2-nonenal (4-HNE) | 135 | Lipid Peroxidation Products | Aldehydes | 4-hydroxy-2-nonenal | D02.047 | C9H16O2 | 4-HNE, MDA | Cross-Linking Reagents | D27.720.470.410.210 | None |
1 | Acetylcholinesterase, metabolism, Aldehydes, m... | 10463393 | we have investigated the effect of soman induc... | Increased levels of nitrogen oxides and lipid ... | 4-hydroxy-2-nonenal (4-HNE) | 135 | Lipid Peroxidation Products | Aldehydes | 4-hydroxy-2-nonenal | D02.047 | C9H16O2 | 4-HNE, MDA | Cross-Linking Reagents | D27.720.470.410.210 | None |
2 | Aldehydes, chemistry, Amines, chemistry, Benzy... | 8448343 | the reaction of trans 4 hydroxy 2 nonenal (4 h... | Pyrrole formation from 4-hydroxynonenal and pr... | 4-hydroxy-2-nonenal (4-HNE) | 135 | Lipid Peroxidation Products | Aldehydes | 4-hydroxy-2-nonenal | D02.047 | C9H16O2 | 4-HNE, MDA | Cross-Linking Reagents | D27.720.470.410.210 | None |
3 | Animals, Blood-Brain Barrier, metabolism, path... | 29775963 | brain ischemic preconditioning (ipc) with mild... | Brain ischemic preconditioning protects agains... | 4-hydroxy-2-nonenal (4-HNE) | 135 | Lipid Peroxidation Products | Aldehydes | 4-hydroxy-2-nonenal | D02.047 | C9H16O2 | 4-HNE, MDA | Cross-Linking Reagents | D27.720.470.410.210 | None |
4 | Alzheimer Disease, drug therapy, enzymology, p... | 30218858 | excessive production of amyloid β (aβ) induced... | Neuro-protective effects of aloperine in an Al... | 4-hydroxy-2-nonenal (4-HNE) | 135 | Lipid Peroxidation Products | Aldehydes | 4-hydroxy-2-nonenal | D02.047 | C9H16O2 | 4-HNE, MDA | Cross-Linking Reagents | D27.720.470.410.210 | None |
# Saving file to import area of local neo4j instance
chem_list_occurance_file = '/Users/akre96/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-dc2bbd3b-84e9-421e-8594-9fe29be9bb02/installation-3.5.6/import/chem_list_occurance.csv'
chem_list_occurance_df.to_csv(chem_list_occurance_file, index=False)
Adding to Neo4J Graph Database for Chemical Occurrence in PMIDs
- A query is formed to import the data from the list generated in the previous section of this notebook:
  - Loading the csv
  - Creating chemical entities with name, example, and formula fields
  - Merging article entities with PMID, abstract, title, and MeSH fields
  - Merging MeSH entities for each chemical's MeSH heading, linked to the chemical by HAS_MESH edges
  - Creating edges labeled OCCURANCE connecting each chemical to the articles (PMIDs) that mention it
import_chemical_data_query = (
"LOAD CSV WITH HEADERS FROM %s AS row"
" MERGE (chem:Chemical {name: row.chemical, example: row.Examples, formula: row.formula})"
" MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, %s)})"
" MERGE (mesh:MeSH {name: row.chemical_mesh})"
" MERGE (chem)-[:OCCURANCE]->(article)"
" MERGE (chem)-[:HAS_MESH]->(mesh)"
% ('"file:///' + 'chem_list_occurance.csv' + '"', "', '")
)
print('Query:\n\t', import_chemical_data_query)
with driver.driver.session() as session:
    result = session.run(import_chemical_data_query)
Query:
LOAD CSV WITH HEADERS FROM "file:///chem_list_occurance.csv" AS row MERGE (chem:Chemical {name: row.chemical, example: row.Examples, formula: row.formula}) MERGE (article:Article {PMID: row.PMID, abstract: row.abstract, title: row.title, MeSH: split(row.MeSH, ', ')}) MERGE (mesh:MeSH {name: row.chemical_mesh}) MERGE (chem)-[:OCCURANCE]->(article) MERGE (chem)-[:HAS_MESH]->(mesh)
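Before or after running the import, it can help to estimate from the dataframe how many nodes and edges the chemical CSV should contribute. The sketch below does this on a hypothetical toy frame that uses the same column names as `chem_list_occurance_df`. The values are invented, and the counts describe this CSV alone, since Article and MeSH nodes are shared with the drug import.

```python
import pandas as pd

# Hypothetical toy rows using the column names from chem_list_occurance_df
toy_chem = pd.DataFrame({
    'chemical': ['4-HNE', '4-HNE', 'MDA', 'MDA'],
    'PMID': [101, 102, 102, 102],
    'chemical_mesh': ['Aldehydes', 'Aldehydes',
                      'Malondialdehyde', 'Malondialdehyde'],
})

# MERGE creates each distinct node or relationship only once, so the
# expected counts are the numbers of distinct keys and distinct key pairs
expected = {
    'Chemical nodes': toy_chem['chemical'].nunique(),
    'Article nodes': toy_chem['PMID'].nunique(),
    'OCCURANCE edges': len(toy_chem[['chemical', 'PMID']].drop_duplicates()),
    'HAS_MESH edges': len(toy_chem[['chemical', 'chemical_mesh']].drop_duplicates()),
}
print(expected)
```

These figures can then be compared against counts queried from the graph, keeping in mind that nodes merged on a full property map may be more numerous than the distinct keys suggest.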
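Adding MeSH Descriptors from Articles as MeSH Nodes
- Adds a MeSH node for each descriptor in an article's MeSH list and links it to the article with a HAS_MESH edge
- Deletes the placeholder 'None' MeSH node along with its relationships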
article_mesh_descriptors_query = (
"MATCH (article:Article)"
" UNWIND article.MeSH AS m"
" MERGE (artMesh:MeSH {name: m})"
" MERGE (article)-[:HAS_MESH]->(artMesh)"
)
print('Query:\n\t', article_mesh_descriptors_query)
with driver.driver.session() as session:
    result = session.run(article_mesh_descriptors_query)
Query:
MATCH (article:Article) UNWIND article.MeSH AS m MERGE (artMesh:MeSH {name: m}) MERGE (article)-[:HAS_MESH]->(artMesh)
delete_none_mesh_descriptors_query = (
"MATCH (m:MeSH {name: 'None'})"
" DETACH DELETE m"
)
print('Query:\n\t', delete_none_mesh_descriptors_query)
with driver.driver.session() as session:
    result = session.run(delete_none_mesh_descriptors_query)
Query:
MATCH (m:MeSH {name: 'None'}) DETACH DELETE m
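The effect of the two queries above can be previewed in pandas without touching the database. This sketch uses a hypothetical toy frame (invented PMIDs and MeSH strings): `explode` plays the role of `UNWIND`, and filtering out `'None'` mirrors the `DETACH DELETE` of the placeholder node.

```python
import pandas as pd

# Hypothetical articles with the cleaned, comma-separated MeSH strings
toy_articles = pd.DataFrame({
    'PMID': [101, 102, 103],
    'MeSH': ['Acarbose, Adult', 'Adult, Blood Glucose', 'None'],
})

article_mesh_edges = (
    toy_articles
    .assign(mesh=toy_articles['MeSH'].str.split(', '))
    .explode('mesh')          # one row per (PMID, descriptor), like UNWIND
    .query("mesh != 'None'")  # mirrors removing the 'None' node and its edges
    [['PMID', 'mesh']]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Distinct descriptors correspond to the MeSH nodes MERGE would create
mesh_nodes = sorted(article_mesh_edges['mesh'].unique())
print(article_mesh_edges)
print(mesh_nodes)
```

In the graph itself, the `'None'` node is shared with drugs and chemicals that lack a MeSH descriptor, so deleting it also removes their HAS_MESH edges to that placeholder.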