
Big data text file to database

$30-250 USD

Cancelled
Posted more than 10 years ago

Paid on delivery
I need to parse every line of a very large (~650 MB) text file into rows of a SQLite3 database table. Each line of the file holds individual pieces of data separated by single spaces. The data format looks like this:

4287053 06218896 N 19801222 19810901 19881222 M171 \r\n

As seen above, each line is terminated by a carriage return and line feed. Each row should be at most 53 characters long, although I know that some rows are missing the last field (e.g., M171), and it's possible (although I haven't seen it after parsing 500,000+ lines) that other fields are missing from some rows. Similarly, I know there are some rows that appear to be repeats.

The file I need to parse is located at: [login to view URL], and documentation about the zipped file contents is located at [login to view URL]. The [login to view URL] file is updated every week, and the changes are cumulative.

The first "field" in each line is a number representing a US patent. Rows are grouped by patent number in increasing order, except that the last entries in the file start with "RE", although they are still 7 characters long. Thus, the entire file looks like this:

4287053 06218896 N 19801222 19810901 19881222 M171 \r\n
...
6497914 08193028 N 19940203 20021224 20030613 ASPN \r\n
...
RE44712 13288586 N 20111103 20140121 20131219 ASPN \r\n

The problem: The file contains millions of lines of text. Parsing this file to remove duplicate lines and create/update the database seems to take days on the hardware I have available.

I'm looking for an optimized Python script that does the following as quickly as the hardware allows:

- Perform an initial download and processing of the file to create a database of these records. Preferably, I'd like to eliminate duplicate rows by not importing them into the SQLite database.
- Download the weekly file update and parse it to update the database. Updated data is not appended to the existing file. Rather, the file is completely replaced with a new file that includes any updates to any of the previously listed patents, as well as any new patents that have issued since the previous update. Therefore, a new record could appear almost anywhere in the file. This step should also be optimized to run as quickly as the hardware allows.

Specs:

The SQLite database table looks like this:

id  patent_num  app_num   small_entity  app_filing_date  patent_issue_date  event_date  event_code
1   4287053     06218896  N             19801222         19810901           19881222    M171

id is the primary key
pat_num is text
app_num is text
small_entity is boolean
app_filing_date is a date/datetime
patent_issue_date is a date/datetime
event_date is a date/datetime
event_code is text
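The line format and column types described above could be handled by a parser along the following lines. This is only a sketch under stated assumptions: the names `EventRow`, `iso_date` and `parse_line` are hypothetical, "Y" is assumed to mean a small entity (the posting only shows "N"), lines with fewer than six fields are assumed to be unusable and are skipped, and dates are converted to ISO-8601 text because SQLite has no dedicated date type.

```python
from typing import NamedTuple, Optional


class EventRow(NamedTuple):
    """One parsed line; the name and field layout are illustrative only."""
    patent_num: str
    app_num: str
    small_entity: bool
    app_filing_date: str
    patent_issue_date: str
    event_date: str
    event_code: Optional[str]


def iso_date(yyyymmdd: str) -> str:
    """Turn 19801222 into 1980-12-22 so SQLite date functions can read it."""
    return f"{yyyymmdd[:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:8]}"


def parse_line(line: str) -> Optional[EventRow]:
    """Split one space-separated line; return None if it is too short."""
    fields = line.split()  # also drops the trailing space and \r\n
    if len(fields) < 6:
        return None  # assumption: rows missing earlier fields are skipped
    patent_num, app_num, entity, filed, issued, event = fields[:6]
    # The last field (event code) is known to be missing on some rows.
    event_code = fields[6] if len(fields) > 6 else None
    return EventRow(
        patent_num,
        app_num,
        entity == "Y",  # assumption: "Y" marks a small entity
        iso_date(filed),
        iso_date(issued),
        iso_date(event),
        event_code,
    )
```

Patent and application numbers stay as text so that leading zeros and the "RE" prefix survive.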
Project ID: 5310503

About the project

36 proposals
Remote project
Active 10 years ago

36 freelancers are bidding an average of $169 USD for this job

With a strong background in data parsing and optimisation, I would like to help you with this problem. It's not an easy one, and at the same time it is interesting and challenging for me to provide an optimised version. Before we proceed, let me clarify a few things: 1. Which OS do you want to run this on? 2. Why was SQLite chosen as the DB layer? I'm planning to use a few things that will help make the script run faster. First, we need to read the file properly: not load all the data into memory at once, but read the file line by line using a generator. The second one is related to the DB. I have had a few cases where I needed to recompile SQLite from source and enable a few options that make it run faster. I need to see if this is the case here. Other ideas will come during implementation.
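The line-by-line generator idea mentioned in this bid might look roughly like the sketch below. It is not the bidder's code: `read_records` and `batches` are hypothetical names, the file is assumed to be ASCII text, and the batch size is left to the caller.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_records(path: str) -> Iterator[str]:
    """Yield one cleaned line at a time without loading the whole file."""
    # Assumption: the file is plain ASCII; bad bytes are replaced, not fatal.
    with open(path, "r", encoding="ascii", errors="replace", newline="") as fh:
        for raw in fh:
            line = raw.rstrip("\r\n ")  # drop trailing space and CR/LF
            if line:  # skip blank lines
                yield line


def batches(items: Iterable, size: int) -> Iterator[List]:
    """Group any iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Because both functions are lazy, only one batch of lines is held in memory at a time, regardless of the file size.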
$263 USD in 1 day
5.0 (138 reviews)
7.7

Hi, I have built many converters and parsers before. I have read your requirements and can start by providing demos if you like. Thanks
$315 USD in 1 day
5.0 (117 reviews)
7.7

Hi, I am an expert in data mining and I know the best source for your required data. Let me do this work with perfection and accuracy, according to your requirements. Please contact me ASAP. Thanks
$126 USD in 3 days
4.9 (261 reviews)
7.3

Hi, Allow me to work for you. I have a considerable amount of experience in web scraping and data parsing. Kindly message me to discuss things further. Thanks, Ranjit Singh
$147 USD in 3 days
5.0 (52 reviews)
6.4

I'm very good with Python and can do this task quickly. Your code is probably running very slowly because you are not using bulk transactions. Please specify: - Are duplicates exactly the same (or are they otherwise considered different entries)? - Do you want to create the SQLite database from scratch with every update, or just insert the new items (the former might be faster if there are many updates)?
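To illustrate the bulk-transaction point, here is a minimal sketch using Python's standard `sqlite3` module. The table name `events`, the helper `load_rows` and the exact schema are assumptions made for illustration. Duplicates are taken to be rows that match in every field, and a missing event code is stored as an empty string because SQLite treats NULLs as distinct in a UNIQUE constraint.

```python
import sqlite3
from typing import Iterable, Sequence

COLUMNS = (
    "patent_num, app_num, small_entity, app_filing_date, "
    "patent_issue_date, event_date, event_code"
)

# Hypothetical schema: the UNIQUE constraint lets SQLite reject exact repeats.
SCHEMA = f"""
CREATE TABLE IF NOT EXISTS events (
    id                INTEGER PRIMARY KEY,
    patent_num        TEXT    NOT NULL,
    app_num           TEXT    NOT NULL,
    small_entity      INTEGER NOT NULL,
    app_filing_date   TEXT    NOT NULL,
    patent_issue_date TEXT    NOT NULL,
    event_date        TEXT    NOT NULL,
    event_code        TEXT    NOT NULL DEFAULT '',
    UNIQUE ({COLUMNS})
)
"""


def load_rows(conn: sqlite3.Connection, rows: Iterable[Sequence]) -> None:
    """Insert all rows in a single transaction, skipping exact duplicates."""
    conn.execute(SCHEMA)
    with conn:  # one commit for the whole batch instead of one per row
        conn.executemany(
            f"INSERT OR IGNORE INTO events ({COLUMNS}) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            rows,
        )
```

Committing once per batch rather than once per row avoids a disk sync for every insert, which is a common cause of very slow SQLite loads.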
$111 USD in 3 days
5.0 (57 reviews)
6.3

I have strong experience in Python and can do this right now. Please keep me in consideration. I look forward to working with you. Thank you.
$147 USD in 3 days
4.9 (44 reviews)
5.9

I can do this task with a Python script. I am very experienced with inserting data into a MySQL database with scripts.
$100 USD in 1 day
4.8 (66 reviews)
6.0

A proposal has not yet been provided
$74 USD in 3 days
4.7 (96 reviews)
6.1

Hello vw143513vw. Thank you for such a detailed project description. I've studied it and am ready to provide you with a Python script that will solve the task. I have experience using Python to process large amounts of data, so the script will run as fast as possible.
$250 USD in 6 days
5.0 (29 reviews)
6.0

Hi, I am a software engineer with more than 7 years of experience, and I have been using Python scripts like this for my other tasks. I will be able to deliver a customized script as per your needs.
$147 USD in 3 days
5.0 (12 reviews)
5.6

Should you accept me for this project, it would be great if you could pass along the existing files you use to parse the 650 MB source data. That way I can figure out how they can be optimized, or perhaps write a totally new parser script altogether. Thanks for considering this bid.
$184 USD in 6 days
5.0 (7 reviews)
5.0

I am an IT professional with more than 4 years of experience. I am very good at programming in Java and Python, and equally good with databases. I can write a Python script to do this task. I would like to hear about the approach you are currently using for this job; maybe we can improve it. It looks like your algorithm has a high order of complexity.
$200 USD in 3 days
4.8 (22 reviews)
5.2

Hi, I have a lot of experience with Python and processing large amounts of data. I can make use of SQLite's .import command to load data quickly.
$111 USD in 3 days
4.9 (17 reviews)
5.2

Hi, I'm very interested in helping you build your database. I have more than 7 years of experience in SQL and in database administration, modeling, and manipulation. Please do not hesitate to check my portfolio; it includes some samples of database modeling and a database administration manual. I've checked your requirements and can do this job by the deadline without a problem, because I've done many projects like this. Please send me all the other details about your database so we can start ASAP. I'm available and can start immediately. Looking forward to working with you. Thank you for your consideration. Regards
$231 USD in 5 days
4.9 (21 reviews)
5.0

Hi, I have more than 10 years of experience in MS SQL database administration. Please see my profile. Thanks.
$200 USD in 3 days
4.9 (24 reviews)
4.8

I have years of experience programming and helping clients turn ideas into websites, applications, and strategies. I am an experienced database architect, and re-purposing data is my specialty. I can start on this today and have it finished quickly. You will not be disappointed with my work. Google me for more information. U.S. location. Eastern timezone. Milestone required.
$157 USD in 1 day
4.6 (8 reviews)
4.9

Hi, I can write a Python script to parse the data into a MySQL DB. I have solid experience in data manipulation using Python as well as in developing highly optimized scripts. Please let me know if you are interested. Regards, Andriy
$255 USD in 3 days
5.0 (29 reviews)
4.6

Hi, I work as a project manager at an online retailer, and I deal with files that contain lots of records on a daily basis. Part of my job involves parsing price list updates (that contain around 100,000 records), and either: inserting new stock items into the database; revising prices of existing items; or deleting items that no longer exist in the price list. If you blindly write a procedure to do the import for you, it will run for hours on end. This is why your import is taking you so long. I know how to efficiently insert, update and delete records so that scripts take seconds (rather than hours) to run. I believe I can do the same thing for you, and do the job quickly (within a matter of hours). However, I wouldn't write the code in Python; I would use PHP. I'd also recommend that you use an SQL database rather than SQLite (I'm not sure what difference it would make, but my feeling is that if you experience performance hits, this is one of the reasons). I'd like to discuss the project a bit with you because there are certain things I'm not clear on. The part about cumulative updates and replacements and so on: I'd need you to explain to me in greater detail what you require. But in short, I'm certain I can do this project within an hour or two, and total execution time would be seconds or minutes as opposed to hours or days. Could you send me a message here? Thanks a lot, John. PS: Understood the update requirements; the project is quite straightforward.
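The insert/delete pattern this bid describes could be sketched in Python with SQLite as follows, even though the bidder proposes PHP. Everything here is an assumption for illustration: the table names `events` and `staging`, the helper names, and in particular the decision to delete rows that are absent from the new weekly file (the posting says changes are cumulative, so that step may never remove anything).

```python
import sqlite3
from typing import Iterable, Sequence, Tuple

COLS = ("patent_num", "app_num", "small_entity", "app_filing_date",
        "patent_issue_date", "event_date", "event_code")
COL_LIST = ", ".join(COLS)
MARKS = ", ".join("?" for _ in COLS)
# Row-matching condition between the staging copy and the main table.
MATCH = " AND ".join(f"s.{c} = events.{c}" for c in COLS)


def ensure_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical main table and a temporary staging table."""
    columns = ", ".join(f"{c} NOT NULL" for c in COLS)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, {columns})"
    )
    conn.execute(f"CREATE TEMP TABLE IF NOT EXISTS staging ({columns})")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS temp.staging_patent ON staging (patent_num)"
    )


def sync(conn: sqlite3.Connection, rows: Iterable[Sequence]) -> Tuple[int, int]:
    """Load the new file into staging, then apply only the differences."""
    ensure_schema(conn)
    with conn:  # a single transaction for the whole update
        conn.execute("DELETE FROM staging")
        conn.executemany(
            f"INSERT INTO staging ({COL_LIST}) VALUES ({MARKS})", rows
        )
        # EXCEPT is a set operation, so repeats inside the file collapse too.
        added = conn.execute(
            f"INSERT INTO events ({COL_LIST}) "
            f"SELECT {COL_LIST} FROM staging "
            f"EXCEPT SELECT {COL_LIST} FROM events"
        ).rowcount
        # Assumption: rows no longer present in the file should be removed.
        removed = conn.execute(
            "DELETE FROM events WHERE NOT EXISTS "
            f"(SELECT 1 FROM staging AS s WHERE {MATCH})"
        ).rowcount
    return added, removed
```

The set-based statements let the database work out what changed, instead of checking each of the millions of rows with its own query.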
$140 USD in 0 days
5.0 (9 reviews)
4.5

A proposal has not yet been provided
$222 USD in 10 days
4.4 (9 reviews)
4.8

Dear Sir, Please assign this task to me; I can assure 100% completion. I'm very familiar with MS SQL Server (SSIS, SSAS, SSRS). During the last five years I have been involved in many projects in ETL, data warehousing, and BI reporting, and I also have five years of application development experience. I would highly appreciate it if you assigned this job to me. I'm also a Microsoft Certified Technology Specialist (MCTS) for Microsoft SQL Server 2008 Business Intelligence Development and Maintenance and for Microsoft Office SharePoint Server Application Development. Please reply to me for further details. Best Regards, Jinesh
$35 USD in 3 days
5.0 (10 reviews)
3.9

About this client

Denver, United States
0.0
0
Member since Jun 29, 2001
