Importing a large number of Taxonomy terms (Gene Ontology) regularly
Here is the new approach for importing a large number of taxonomy terms:
Parse the entire file into terms and their data.
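A minimal parsing sketch follows. It assumes the input is a GO file in the OBO flat-file format (one of the formats GO is distributed in), and it keeps only a few tags (`id`, `name`, `namespace`, `def`, `is_a`). The function name `parse_obo_terms` is hypothetical, not an existing Drupal or ontology-module API.

```php
<?php
// Hypothetical helper: parse the [Term] stanzas of an OBO flat file into
// arrays keyed by accession. Other stanza types such as [Typedef] are skipped.
function parse_obo_terms(string $text): array {
  $terms = [];
  $current = NULL;  // Term being built, or NULL outside a [Term] stanza.
  foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
    $line = trim($line);
    if ($line === '') {
      continue;
    }
    if ($line[0] === '[') {
      // A new stanza begins: keep the previous term if it had an ID.
      if ($current !== NULL && isset($current['id'])) {
        $terms[$current['id']] = $current;
      }
      $current = ($line === '[Term]') ? ['is_a' => []] : NULL;
      continue;
    }
    $pos = strpos($line, ':');
    if ($current === NULL || $pos === FALSE) {
      continue;  // Header lines, non-Term stanzas, or malformed lines.
    }
    $tag = substr($line, 0, $pos);
    $value = trim(substr($line, $pos + 1));
    if ($tag === 'is_a') {
      // "is_a: GO:0048308 ! organelle inheritance" becomes "GO:0048308".
      $current['is_a'][] = trim(explode('!', $value, 2)[0]);
    }
    elseif (in_array($tag, ['id', 'name', 'namespace', 'def'], TRUE)) {
      $current[$tag] = $value;
    }
  }
  if ($current !== NULL && isset($current['id'])) {
    $terms[$current['id']] = $current;
  }
  return $terms;
}
```

Keying the result by accession makes the next step (matching against existing terms by unique ID) a simple array lookup.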
Load the existing terms, matching them by name or, preferably, by a unique ID where one is available. Update each of these with an individual taxonomy_term_save call (enhanced so that a hook can perform custom operations). This is one big for loop.
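The matching and update loop could look roughly like this. Both function names are hypothetical. The lookup maps `$existing_by_id` (accession to tid) and `$existing_by_name` (name to tid) are assumed to have been loaded already, and `$save_term` stands in for taxonomy_term_save (or a wrapper that also fires the custom hook), so the sketch runs outside Drupal.

```php
<?php
// Hypothetical helper: split parsed terms into updates (an existing term
// matched, preferring the unique ID and falling back to the name) and inserts.
function split_terms(array $parsed, array $existing_by_id, array $existing_by_name): array {
  $updates = [];
  $inserts = [];
  foreach ($parsed as $accession => $term) {
    if (isset($existing_by_id[$accession])) {
      $term['tid'] = $existing_by_id[$accession];
      $updates[] = $term;
    }
    elseif (isset($term['name'], $existing_by_name[$term['name']])) {
      $term['tid'] = $existing_by_name[$term['name']];
      $updates[] = $term;
    }
    else {
      $inserts[] = $term;
    }
  }
  return [$updates, $inserts];
}

// The "big for loop": save each existing term individually through the
// injected callable (a stand-in for taxonomy_term_save plus its hook).
function update_existing_terms(array $updates, callable $save_term): int {
  $saved = 0;
  foreach ($updates as $term) {
    $save_term($term);
    $saved++;
  }
  return $saved;
}
```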
For every term that has no existing counterpart, call a bulk insert function. This also allows custom actions on save: any array keys that match the term_data schema are saved directly, and any other keys are passed to a hook to handle. It is the same approach as for updates, except that both the insert function and its hook work in bulk, so they take an array of term arrays rather than a single term array (or object).
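The split between "columns of term_data" and "everything else" can be sketched as below. The column list is assumed to be Drupal 6's term_data columns (tid, vid, name, description, weight) plus one extra column; the name `accession` for that column and the function name are hypothetical.

```php
<?php
// Hypothetical helper: for each new term, separate the keys that match
// term_data columns (saved directly) from leftover keys (handed to a hook).
function partition_term_fields(array $terms, array $columns): array {
  $rows = [];
  $extras = [];
  $allowed = array_flip($columns);
  foreach ($terms as $i => $term) {
    $rows[$i] = array_intersect_key($term, $allowed);  // Goes into term_data.
    $extras[$i] = array_diff_key($term, $allowed);     // Goes to the bulk hook.
  }
  return [$rows, $extras];
}
```

Keeping the two arrays under the same keys lets the hook step line the extra data back up with the rows that were inserted.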
At this point it would be polite to call a module's bulk hook for its custom data if the module implements one, and, if it implements only the single-term save hook, to call that hook once per term in a for loop.
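A sketch of that fallback, following Drupal's MODULE_HOOK() function-naming convention. The hook names `term_import_bulk` and `term_import` are hypothetical; inside Drupal the module list would come from the enabled modules (for example module_list()) rather than a literal array.

```php
<?php
// Hypothetical dispatcher: prefer a module's bulk hook; if it only implements
// the single-term hook, call that once per term.
function invoke_term_insert_hooks(array $modules, array $terms): void {
  foreach ($modules as $module) {
    $bulk = $module . '_term_import_bulk';
    $single = $module . '_term_import';
    if (function_exists($bulk)) {
      $bulk($terms);  // One call with every new term.
    }
    elseif (function_exists($single)) {
      foreach ($terms as $term) {
        $single($term);  // Fallback: one call per term.
      }
    }
    // Modules implementing neither hook are skipped.
  }
}
```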
In the case of Gene Ontology data, we would add one column to term_data (or a one-to-one table) and store a fair amount of information in other tables, probably directly in the ontology module's ontology_relationships table.
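As an illustration of what would go into the relationships table, parsed `is_a` links can be flattened into one row per link. The column names `subject`, `predicate` and `object` are assumptions and should be checked against the ontology module's actual schema; the function name is hypothetical.

```php
<?php
// Hypothetical helper: turn each term's is_a list into relationship rows.
// Column names are assumed, not taken from the ontology module.
function relationship_rows(array $terms): array {
  $rows = [];
  foreach ($terms as $accession => $term) {
    $parents = $term['is_a'] ?? [];
    foreach ($parents as $parent) {
      $rows[] = ['subject' => $accession, 'predicate' => 'is_a', 'object' => $parent];
    }
  }
  return $rows;
}
```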
For the bulk insert, use a single INSERT statement with several rows in its VALUES list:
INSERT INTO x (a, b)
VALUES
('1', 'one'),
('2', 'two'),
('3', 'three');
We'll have to check that this multi-row syntax works in PostgreSQL as well as MySQL (PostgreSQL only accepts multi-row VALUES lists from version 8.2 onward).
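To generate such statements from PHP, a builder along these lines could produce queries in Drupal 6 db_query() style (`%d` and `'%s'` placeholders, `{table}` braces for prefixing). The function name and the 500-row chunk size are hypothetical; chunking is there to keep each statement within server size limits.

```php
<?php
// Hypothetical builder: returns a list of [sql, args] pairs for chunked
// multi-row INSERTs. $columns maps column name => 'int' or 'string'.
function build_bulk_inserts(string $table, array $columns, array $rows, int $chunk_size = 500): array {
  $queries = [];
  foreach (array_chunk($rows, $chunk_size) as $chunk) {
    $groups = [];
    $args = [];
    foreach ($chunk as $row) {
      $placeholders = [];
      foreach ($columns as $column => $type) {
        // Integers use %d; everything else is a quoted string placeholder.
        $placeholders[] = ($type === 'int') ? '%d' : "'%s'";
        $args[] = $row[$column] ?? NULL;
      }
      $groups[] = '(' . implode(', ', $placeholders) . ')';
    }
    $sql = 'INSERT INTO {' . $table . '} (' . implode(', ', array_keys($columns)) . ') VALUES ' . implode(', ', $groups);
    $queries[] = [$sql, $args];
  }
  return $queries;
}
```

Each pair could then be run as `db_query($sql, $args)`, since Drupal 6's db_query() accepts its arguments as a single array. One catch to keep in mind: a multi-row INSERT does not hand back the tid of every new row, so the hook step may need to look the new terms up again (for example by accession).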
In the worst case, the database type can be checked in Drupal 6 with:
<?php
$GLOBALS['db_type']
?>
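That value ('mysql', 'mysqli' or 'pgsql' in Drupal 6) could drive a simple switch between the multi-row statement and one INSERT per row. This sketch assumes the server version string is available (in Drupal 6, db_version() should return it); the function name is hypothetical.

```php
<?php
// Hypothetical check: can this database take a multi-row VALUES list?
// MySQL can; PostgreSQL can from 8.2 on; anything else falls back to
// one INSERT per row.
function supports_multirow_insert(string $db_type, string $server_version): bool {
  switch ($db_type) {
    case 'mysql':
    case 'mysqli':
      return TRUE;
    case 'pgsql':
      return version_compare($server_version, '8.2', '>=');
    default:
      return FALSE;
  }
}
```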