Importing files that need tweaking

LIME is very strict about the structure of the files it imports. To import documents from other sources that might not adhere to these rules, we need to transform the original data into something that LIME accepts.

Let’s say we have the following file to import:

name;company;email;phone;title
Erica Koss;Jacobi Inc;zoie51@donnelly.net;1-978-451-7502x7744;IT
Janna Roob;Beer, Green and Grant;giuseppe78@keebler.com;317.432.6066;Support
...

The file has a single name column that contains both the first and last name. However, the object type we are importing to has separate first name and last name properties.

A (naively simple) solution to this is to add a transformation step before we upload our import file where we split the name column into a ‘first name’ and a ‘last name’ column:

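import itertools
import tempfile

NORMALIZED_HEADERS = 'first name;last name;company;email;phone;title'


def normalize(infile, outfile):
    # Write headers to our normalized file
    outfile.write(NORMALIZED_HEADERS + '\r\n')

    # Skip the original header line and rewrite each data line
    for content in itertools.islice(infile, 1, None):
        # Strip the line ending so it doesn't end up in the last column
        name, *row = content.rstrip('\r\n').split(';')  # Get original columns
        first, last = name.split()       # Naive: assumes exactly two name parts
        row = [first, last] + row        # Put together the line
        outfile.write(';'.join(row) + '\r\n')

    # Rewind so that the file can be read from the start
    outfile.seek(0)


# ImportFiles and the client `c` are assumed to be defined already
with open('non-normative.txt', 'r', encoding='utf-8') as raw:
    # newline='' keeps the '\r\n' line endings from being translated on write
    with tempfile.TemporaryFile('w+', encoding='utf-8', newline='') as fixed:
        normalize(raw, fixed)
        f = ImportFiles(c).create(filename='non-normative.txt',
                                  content=fixed)
        f.delimiter = ';'
        f.save()

# ...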

Instead of passing the original file directly to LIME, we create a temporary file, fill it with a normalized version that stores the name in two columns, and pass that file to LIME.
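
The transformation above breaks on names that do not consist of exactly two words. As a rough sketch of a more forgiving variant, the following uses Python's standard `csv` module and treats the last word of the name as the last name. The helper names `split_name` and `normalize_rows` are hypothetical, and the "last word is the last name" rule is an assumption that will not suit every data set.

```python
import csv
import io


def split_name(name):
    """Split a full name into (first, last), using the last word as last name.

    Assumption: a single-word name becomes the first name with an empty
    last name.
    """
    parts = name.strip().rsplit(None, 1)
    if not parts:
        return '', ''
    if len(parts) == 1:
        return parts[0], ''
    return parts[0], parts[1]


def normalize_rows(infile, outfile, delimiter=';'):
    """Copy infile to outfile, replacing the first column with two name columns."""
    reader = csv.reader(infile, delimiter=delimiter)
    writer = csv.writer(outfile, delimiter=delimiter, lineterminator='\r\n')

    header = next(reader, None)
    if header is None:
        return  # Empty input, nothing to write

    writer.writerow(['first name', 'last name'] + header[1:])
    for row in reader:
        if not row:
            continue  # Skip blank lines
        first, last = split_name(row[0])
        writer.writerow([first, last] + row[1:])

    # Rewind so that the file can be read from the start
    outfile.seek(0)


# Small in-memory demonstration using a line from the example file
raw = io.StringIO(
    'name;company;email;phone;title\n'
    'Erica Koss;Jacobi Inc;zoie51@donnelly.net;1-978-451-7502x7744;IT\n'
)
fixed = io.StringIO()
normalize_rows(raw, fixed)
print(fixed.read())
```

`normalize_rows` takes the same two file arguments as `normalize`, so it could be used in its place with the temporary file before the upload.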