Relationship type from datasource
planned
Gregory King
Just posting an update: today we released file filtering functionality, which enables the scenario illustrated by (iii) in my last post below. You can read more about it in the 26 Jan change log entry "Data Importer - Introducing file filtering".
bent.s.lund@gmail.com
Gregory King: Thanks for the update, this is a useful feature for the Data Importer!
Gregory King
Quick update: we're planning to start work soon on the ability to filter files based on a string match in a column. This would allow a file containing multiple relationship types to be subsetted within the UI, with each subset used to create a different relationship. This is similar to option (iii) that I detailed in one of my earlier posts.
If you need dynamic relationship type generation, please see here for a post-processing workaround.
bent.s.lund@gmail.com
Hi, thanks for your reply on this. To illustrate what I suggest, here is a Cypher statement that dynamically creates relationship types from the input file:
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///ProdGroup.csv' AS line FIELDTERMINATOR ';'
WITH line
// Find or create both endpoint nodes
MERGE (p1:zProduct {STI_ID: line.STI_ID})
MERGE (p2:zProduct {STI_ID: line.SUPERIOR_STI_ID})
WITH p1, p2, line
// The relationship type is taken from the REL column of each row
CALL apoc.create.relationship(p1, line.REL, {}, p2) YIELD rel
RETURN *
The variable line.REL reads the relationship type from the input CSV file.
This creates dynamically assigned relationship types between nodes of the same type.
As the Data Importer already allows importing relationships from a file and assigning properties to them, it should be a small effort to also read the relationship type from the source data.
I have not come across situations where the object types need to be treated dynamically.
Rob Piombino
Gregory King I was about to create another thread but stumbled across bent.s.lund@gmail.com's thought!
I had this same thought and have run into this limitation quite a few times. It could be very domain specific, but this would be great and would enable Data-Importer to be used for graphs containing many (i.e. >75) distinct relationshipTypes. This is commonly seen in both small & large graphs within the life sciences/biological sciences domain. It is quite common to have upwards of 100-500 distinct relationshipTypes for very large biological networks.
The ability for Data-Importer to treat the "Type" field in "Mapping Details" as either an input text field (current state) or as a drop-down/toggle selection field enabling the selection of a specific column in the input data would alleviate this limitation.
This functionality would also give Data-Importer (and in turn AuraDB/AuraDS) a convenience very similar, both in file formatting and in data modeling of flat files, to that offered by the "neo4j-admin import" tool, which is available for most Neo4j deployments but not Aura.
This could be a great way to move even closer to a "standardized/normalized" means of importing data into Neo4j in terms of file formatting (i.e. :START_ID, :END_ID, :TYPE column headers). Data-Importer would nearly (or entirely) mirror the flat-file formatting required for deployments of Neo4j where the "neo4j-admin import" tool is available and in use. This would make the transition to or from Aura even more seamless, along with alleviating some of Data-Importer's limitations.
Best Regards,
Rob
bent.s.lund@gmail.com
Rob Piombino: I found a workaround: you can import your relationships with a fixed type "RefTo" and assign each one a property, e.g. RelationType, taken from the input file the relationships are imported from. Then post-process this with Cypher to change the fixed relationship type "RefTo" to whatever the RelationType property of the relationship contains.
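A minimal sketch of what that post-processing step could look like, assuming the APOC library is installed and the relationships were imported with the type RefTo and a string property RelationType as described above. It uses the APOC procedure apoc.refactor.setType, which is one possible way to do this and not necessarily the statement used in the original workaround:

```cypher
// Assumption: every imported relationship has type RefTo and carries the
// intended type name in its RelationType property.
MATCH ()-[r:RefTo]->()
WHERE r.RelationType IS NOT NULL
// Recreate each relationship with the type named in its property
CALL apoc.refactor.setType(r, r.RelationType) YIELD output
RETURN type(output) AS newType, count(*) AS converted
```

The RelationType property stays on the converted relationships and can be removed afterwards if it is no longer needed. For a large number of relationships, the statement would probably need to be run in batches.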
Gregory King
Thanks for the feedback Bent!
We've been thinking about this too - it is indeed quite common to find datasets that are edgelists (relationship lists in Neo4j terms) that contain many different relationship types in a single file.
I thought it might be helpful to share some of our thinking and get your feedback on potential scenarios:
At one extreme, you might have all of the relationships for your model contained in a single file. All your data could conceivably be mapped into a single dynamic node and dynamic relationship, as illustrated in (i). For a complex model, this would make Data Importer's approach of sketching out a model hard to use (how do you map properties in cleanly for different Labels/Rels?) and arguably redundant in conveying your graph.
In other scenarios you may have only subsets of the relationships for your model in a single file, say only the relationships that can join two different labels. Here the model you sketch out is less abstract and only the relationship between the two specific Labels is dynamic. This is illustrated in (ii).
Another approach, regardless of how many files your relationships are contained in, is to continue modelling at the level of detail that is clearest to understand, as illustrated in (iii). In this scenario you could map the same files but with filtering applied to each file to select only the relevant relationships. For example, if you had a file people-to-movies_rels.csv that contained all the ACTED_IN, DIRECTED and PRODUCED relationships, with a relationship column, the file could be mapped to the ACTED_IN relationship with a filter applied so that only rows with relationship='ACTED_IN' would be used to generate that relationship.
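For comparison, the effect of such a filter can be sketched in plain Cypher. This only illustrates the idea and is not how Data Importer implements it. The file name and the relationship column come from the example above; the Person and Movie labels, the id property, and the person_id and movie_id columns are hypothetical:

```cypher
// Hypothetical columns: person_id, movie_id. Hypothetical key property: id.
LOAD CSV WITH HEADERS FROM 'file:///people-to-movies_rels.csv' AS line
WITH line
// Keep only the rows that describe ACTED_IN relationships
WHERE line.relationship = 'ACTED_IN'
MATCH (p:Person {id: line.person_id})
MATCH (m:Movie {id: line.movie_id})
MERGE (p)-[:ACTED_IN]->(m)
```

The same statement, with the filter value and relationship type changed, would be repeated for DIRECTED and PRODUCED, which mirrors mapping the same file to three relationships in the model.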
The different ways of doing this are, I think, a trade-off between convenience / speed of mapping and clarity of the graph model you are sketching out. Building the functionality for dynamic relationship creation (if used judiciously) or filtering could both be viable solutions. I'd love to hear your perspectives on what you could see working well.