Data Processing 500 fixed-length files to CSV

€8-30 EUR

Completed
Posted over 9 years ago

Paid on delivery
We have 500 files with fixed-length columns, with a total of 500,000 lines. We need to transform those files into columns separated by tabs. Each file has a similar structure. The trick is that each file has slightly different column lengths (and some buggy stuff). We need a script that first automatically detects the length of each column for each file, before doing the split (with a function like unpack in Perl). We have tried the DataExtract::FixedWidth Perl library, which uses a heuristic method, but it does not work well enough. We have attached a zip with the files. Please let me know the method you are going to use to determine the column lengths.

A few inputs:
- A normal file has 11 columns: " DA" "MON" "YEAR" "FATHER LAST" "FATHER FIRST" "WIFE LAST" "WIFE FIRST" "MARRY" "DATE" "CHILD NAME" "LOCATION" "SPOUSE".
- Numeric columns are right-justified; text columns are left-justified.
- The first column is a number or "unk", and there are a few cases with a missing first column, like "[login to view URL]" (in the zip).
- Sometimes a field is stuck to the next one because it is too long (no space between them).
- There is always some text in the text fields.
- Sometimes there is no date in the first three columns, and "unk" appears instead.
- Sometimes the marriage date is present.
- There are a very few buggy files with a missing month column (the first one) ([login to view URL]).
- Sometimes there is no header (like [login to view URL]).
- There are a few buggy cases where the last column (spouse name) was cut off and put at the end of the file.

The method suggested:
- Statistically determine the length of the fields in a first pass, for example date = 5, name = 6.
- Use unpack with those numbers: my($key, $value) = unpack "A6A3";

Please let me know how you are going to determine the lengths.
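One possible reading of the "statistical first pass" suggested above is sketched here in Python rather than Perl, with string slicing standing in for unpack. It treats a character position as a column gap when it is blank in at least a given share of the rows, and takes the first non-gap position after each gap as a column start. The function names, the 0.95 threshold, and the sample layout are all assumptions made for illustration; the real files were not available, and headers, missing columns, and fields stuck together would need extra handling.

```python
from typing import List


def detect_column_starts(lines: List[str], threshold: float = 0.95) -> List[int]:
    """Infer column start offsets from positions that are blank in most rows."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    starts = []
    in_gap = True  # treat the left edge as if a gap preceded it
    for i in range(width):
        # A position counts as blank if it holds a space or lies past the row end.
        blanks = sum(1 for r in rows if i >= len(r) or r[i] == " ")
        is_gap = blanks / len(rows) >= threshold
        if in_gap and not is_gap:
            starts.append(i)
        in_gap = is_gap
    return starts


def split_fixed(line: str, starts: List[int]) -> List[str]:
    """Slice one row at the detected offsets and trim the padding."""
    line = line.rstrip("\n")
    ends = starts[1:] + [len(line)]
    return [line[s:e].strip() for s, e in zip(starts, ends)]


# Hypothetical layout: right-justified numbers, left-justified names.
layout = "{:>2} {:>2} {:>4} {:<10} {:<9}"
sample = [
    layout.format("9", "28", "1716", "ALTENHOF", "FRIEDERIC"),
    layout.format("11", "28", "1723", "ALTENHOFE", "PETER"),
    layout.format("12", "13", "1725", "ALTENHOFE", "PETER"),
    layout.format("6", "22", "1730", "ALTENHOFE", "PETER"),
]
starts = detect_column_starts(sample)
for row in sample:
    print("\t".join(split_fixed(row, starts)))
```

Each field is sliced up to the start of the next column rather than to the end of its own text, so a value that spills into the gap on a few rows is still captured whole. The differences between consecutive offsets are the field widths that an unpack template would need.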
Project ID: 6711075

About the project

14 proposals
Remote project
Active 9 years ago

Awarded to:
Sample from my parser in Perl...
$ for x in `ls -1 a*txt` ; do echo "$x"; perl [login to view URL] $x ; done
[login to view URL]
"9","28","1716","ALTENHOF","FRIEDERIC","UNK","EVE","MARIE ANG","LA AACH",""
"11","28","1723","ALTENHOFE","PETER","THEISEN","ANNE","MARIE","AACH",""
"12","13","1725","ALTENHOFE","PETER","THEISEN","ANNE","MARIE ELI","ABETHAACH",""
"6","22","1730","ALTENHOFE","PETER","THEISEN","ANNE","MARGARET","AACH",""
[login to view URL]
"8","4","1832","HIMPELER","MICHAEL","BILLEN","CATHERINE","NICOLAS","ACHTERBURG"
"1","19","1831","HIMPELER","THEODORE","SCHMIT","CATHERINE","MARIE","ACHTERBURG"
"7","1","1832","HIMPELER","THEODORE","SCHMITT","CATHERINE","ANNE CATHERINE","ACHTERBURG"
"12","17","1847","HIMPELER","THEODORE","SCHMITT","CATHERINE","CATHERINE","ACHTERBURG"
[login to view URL]
"6","9","1785","ANTONI","NICOLAS","THEIS","ELIZABETH","GIRL","AFFLER"
"11","12","1861","ARENS","NICOLAS","MUNCKLE","CATHERINE","JACOBUS","AFFLER"
"3","26","1863","ARENS","NICOLAS","MUNCKLE","CATHERINE","ELIZABETH","AFFLER"
"3","29","1865","ARENS","NICOLAS","MUNCKLE","CATHERINE","JOANNES","AFFLER"
€50 EUR in 1 day
0.0 (0 reviews)
0.0
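The sample output in the awarded bid is comma-separated with every field quoted, while the project description asks for tab-separated columns. As a rough illustration of how one writer could emit either form, here is a small Python sketch using the standard csv module. The helper name to_delimited and its parameters are hypothetical, and this is not the awarded parser, which was written in Perl and is not shown on the page.

```python
import csv
import io
from typing import Iterable, List


def to_delimited(rows: Iterable[List[str]], delimiter: str = "\t",
                 quote_all: bool = False) -> str:
    """Serialize already-split rows with the chosen delimiter and quoting."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["9", "28", "1716", "ALTENHOF", "FRIEDERIC"]]
print(to_delimited(rows, delimiter=",", quote_all=True), end="")
print(to_delimited(rows, delimiter="\t"), end="")
```

Keeping the delimiter as a parameter means the splitting logic does not have to change if the client settles on CSV instead of tab-separated output.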
14 freelancers are bidding an average of €45 EUR for this job

Please consider my bid.
Dear Sir/Madam, we have read your project description and fully understand this project. We have a solid, experienced in-house team of designers, developers and coders. We offer the best price in this market and the best quality of work. Please check your PMB for more details and award us this project.
A few reasons to select us as a service provider:
1. Post-delivery free bug-fixing support.
2. Industry-standard modern design.
3. Secure coding against SQL injection and XSS (cross-site scripting).
4. Well-documented code.
5. Regular work updates.
Thanks and regards
€50 EUR in 1 day
4.9 (81 reviews)
5.8

A proposal has not yet been provided
€60 EUR in 2 days
4.9 (138 reviews)
5.5

Hello! I did file processing using MATLAB for my PhD thesis! Please let me know if you want me to help you! Have a nice day!
€55 EUR in 0 days
5.0 (1 review)
1.5

I can provide a first draft for your approval today.
€55 EUR in 3 days
0.0 (0 reviews)
0.0

Dedication and hard work are the most important things for this type of work. Nobody is perfect in this world, so give everybody a chance to prove themselves.
€23 EUR in 1 day
0.0 (0 reviews)
0.0

I have good knowledge of Excel and MS Office, and I would like to do part-time work alongside my professional job.
€23 EUR in 3 days
0.0 (0 reviews)
0.0

Hello, I am a big fan of Excel and VBA and I think that the help you need can be provided using these tools. I have gone through some of the files, and yes, there are certain inconsistencies and bugs that need to be handled in order to get a correct and accurate version of the files, but that is manageable. I am not sure I understood correctly the type of delivery files you are looking for, because the title says CSV (comma delimited = .csv) and the description says "columns separated by a tabulation" (tab delimited = .txt). I can also see that the project description refers to assessing the length of the columns in order to make the delimitation. If you use Excel to open the files, you will see that for most of the columns the separation is done correctly (except for "MARRIAGE" and "SPOUSE NAME"), so I am not sure that identifying the lengths is actually a necessity. However, I am interested in working on this project, so I am hoping to hear from you soon. Please see below a description of the steps I would take in order to help with this project. Best Regards, Nina Zsurzs
€55 EUR in 3 days
0.0 (0 reviews)
0.0

We perform all functions on databases, including this one. We are the ones you are searching for. People call us info input officers. We can do this work easily, in even fewer days than you want it completed.
€23 EUR in 1 day
0.0 (0 reviews)
0.0

Hey, thanks for posting the project. It looks feasible and I am interested in doing it. Next steps: let's discuss and validate the complete requirements, and then I can start and get this done with the required quality. I am an Excel/Access VBA automation professional with 7+ years of experience. Let's discuss more online on chat. Thanks, Abhinav
€48 EUR in 1 day
0.0 (0 reviews)
4.2

I have done Perl projects for companies like Qualcomm that involved parsing multiple huge chip files (~100 GB), followed by processing the parsed data and displaying it in the required graphical/text/CSV format. I use Perl wherever I can to automate tasks on my Linux machine. I also have extensive experience working with PHP. I have worked as the CTO of Qulp, a web-based product innovation startup, where I developed the PHP backend using the Yii PHP framework. I faced a very similar problem of varying column lengths when processing files at Qualcomm, and used a custom regular expression to calculate the various column lengths. I have looked at the files you have provided, and each column is separated by a space character. Using this property, a regex can easily be developed to distinguish between the various columns. The data in the columns also have distinguishing properties (numerical, text, date, etc.), which will make it easier to develop the regex. The special cases you have mentioned can also be coded into the regex.
€24 EUR in 1 day
0.0 (0 reviews)
0.0
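The regex idea in the last bid relies on the columns having distinguishing properties. A minimal Python sketch of that idea, limited to the three leading date columns, might look like the following. The pattern, the group names, and the helper split_date_prefix are assumptions for illustration and not the bidder's actual expression; the bid's Perl code is not shown. Name columns can contain single spaces, so the rest of the row is returned unsplit here.

```python
import re
from typing import Optional, Tuple

# Three leading fields: one- or two-digit numbers or "unk", then a four-digit year or "unk".
DATE_PREFIX = re.compile(
    r"^\s*(?P<first>\d{1,2}|unk)\s+(?P<second>\d{1,2}|unk)\s+"
    r"(?P<year>\d{4}|unk)\s+(?P<rest>.*)$",
    re.IGNORECASE,
)


def split_date_prefix(line: str) -> Optional[Tuple[str, str, str, str]]:
    """Return the three date fields and the remaining text, or None if no match."""
    match = DATE_PREFIX.match(line)
    if match is None:
        return None  # e.g. a header row, or a row with a missing date column
    return (
        match.group("first"),
        match.group("second"),
        match.group("year"),
        match.group("rest").rstrip(),
    )


print(split_date_prefix(" 9 28 1716 ALTENHOF   FRIEDERIC"))
print(split_date_prefix("unk unk unk ALTENHOFE  PETER"))
print(split_date_prefix("ALTENHOFE  PETER"))
```

Rows that return None are the ones that need special handling, such as headers and the buggy files with a missing first column, so the same check can double as a filter for flagging them.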

About this client

PARIS, France
5.0
14
Payment method verified
Member since Jun 26, 2007

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)