Importing a large number of Taxonomy terms (Gene Ontology) regularly

Here is the new approach for importing a lot of taxonomy terms:

Parse the entire file into terms and their data.

Look up existing terms by name or, preferably, by a unique ID when one is available. Update each of these with an individual taxonomy_term_save call (enhanced to allow custom operations via a hook). This is one big for loop.
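A minimal sketch of the lookup step, in plain PHP so it runs outside Drupal. The function name, the 'uid' key, and the shape of the $existing map (unique ID to stored term ID) are assumptions made for illustration, not part of any existing API.

```php
<?php
// Hypothetical helper: split parsed terms into those that already exist
// (to be updated one at a time) and those that are new (to be bulk inserted).
// $existing maps a unique ID to the stored term ID; both the 'uid' key and
// this map are assumptions.
function example_partition_terms(array $parsed, array $existing) {
  $to_update = array();
  $to_insert = array();
  foreach ($parsed as $term) {
    if (isset($existing[$term['uid']])) {
      // Known term: remember its stored ID so the save updates it.
      $term['tid'] = $existing[$term['uid']];
      $to_update[] = $term;
    }
    else {
      $to_insert[] = $term;
    }
  }
  return array($to_update, $to_insert);
}
```

The first returned array would feed the for loop of individual saves, the second the bulk insert.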

For all terms with no existing counterpart, call a bulk insert function. This also allows custom actions on save: any array keys that match the term_data schema are saved, and everything else gets a chance to be handled by a hook. The approach is the same as for single saves, except that the insert function and its hook are both bulk and so take an array of term arrays rather than a single term array (or object).
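One way to separate the keys that match the term_data schema from the rest could look like the sketch below. The column list is that of Drupal 6's term_data table; the helper name and the idea of returning the leftovers for a hook are assumptions.

```php
<?php
// Columns of Drupal 6's term_data table.
$schema_keys = array('tid', 'vid', 'name', 'description', 'weight');

// Hypothetical helper: separate the keys that map to term_data columns
// from everything else, which a hook implementation could then handle.
function example_split_term(array $term, array $schema_keys) {
  $columns = array_intersect_key($term, array_flip($schema_keys));
  $extra = array_diff_key($term, $columns);
  return array($columns, $extra);
}
```

With the extra column discussed for Gene Ontology data, that column would either be added to $schema_keys or left in the second array for a hook to store.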

It would be polite here to call the bulk hook for custom handling if a module implements it and, if it does not but the module implements the single-save hook, to call that hook in a for loop instead.
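The fallback between the two hooks might be sketched as follows. It uses plain function_exists checks rather than Drupal's module system so that it runs on its own, and the hook names 'term_bulk_save' and 'term_save' are placeholders, not existing Drupal hooks.

```php
<?php
// Hypothetical dispatcher: prefer a module's bulk hook, and fall back to
// its single-save hook called once per term. Hook names are placeholders.
function example_invoke_save_hooks($module, array $terms) {
  $bulk = $module . '_term_bulk_save';
  $single = $module . '_term_save';
  if (function_exists($bulk)) {
    // Preferred: hand over the whole array of terms at once.
    $bulk($terms);
    return 'bulk';
  }
  if (function_exists($single)) {
    // Fallback: call the single-term hook in a for loop.
    foreach ($terms as $term) {
      $single($term);
    }
    return 'single';
  }
  return 'none';
}
```

Returning which path was taken is only a convenience for debugging; a real implementation would more likely loop over every module that implements either hook.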

In the case of Gene Ontology data, we would add one column to term_data (or a one-to-one table) and store a fair amount of information in other tables, probably directly in the ontology module's ontology_relationships table.

For bulk insert:

INSERT INTO x (a,b)
VALUES
('1', 'one'),
('2', 'two'),
('3', 'three')
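A statement of this shape could be generated from an array of term arrays along these lines. This is a sketch only: the helper name is hypothetical, and it emits generic '?' placeholders, whereas Drupal 6's db_query expects %d and '%s' style placeholders, so the placeholder string would need adapting there.

```php
<?php
// Hypothetical helper: build one multi-row INSERT statement with a '?'
// placeholder per value, plus the flat argument list to bind to it.
// Table and column names must come from trusted code, not user input.
// The caller should skip the query entirely when $rows is empty.
function example_bulk_insert_sql($table, array $columns, array $rows) {
  $row_sql = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
  $args = array();
  foreach ($rows as $row) {
    // Flatten values in column order so they line up with the placeholders.
    foreach ($columns as $column) {
      $args[] = $row[$column];
    }
  }
  $sql = 'INSERT INTO ' . $table . ' (' . implode(', ', $columns) . ') VALUES '
    . implode(', ', array_fill(0, count($rows), $row_sql));
  return array($sql, $args);
}
```

For a very large import it would probably be wise to send the rows in chunks rather than as one enormous statement, since databases cap the size of a single query.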

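We'll have to check that this SQL works in PostgreSQL as well as MySQL. MySQL accepts multi-row VALUES lists, and PostgreSQL does from version 8.2 on, so older PostgreSQL installations are the case to watch.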

In the worst case, the database type can be checked in Drupal 6 with:

<?php
// Set by Drupal 6's database layer: 'mysql', 'mysqli', or 'pgsql'.
$db_type = $GLOBALS['db_type'];
?>
