
Perl Data Miner - HTML parsing, HTML POST/GET queries, MySQL databases, PDF creation.

$30-105 USD

Cancelled
Posted over 13 years ago


Paid on delivery
This is a data-mining application, some of which is already finished. The data mining gathers information regarding address records in the USA. Bonuses will be given for useful, good-looking reporting and for data mining a realistic rent figure from a source I have been unable to find.

Skills required: HTML parsing, HTML POST/GET queries, MySQL databases, PDF creation.

The acceptable error rate is 1 in every 500 records. That is, 1 in every 500 records is allowed to have data import (data mining) errors; the other 499 records must have 100% of their data mined correctly.

The program must work with Strawberry Perl on Windows 7. Any additional modules/libraries used must be installable with the ppm install command, or you must provide them in a ZIP/RAR file. No build or make commands.

## Deliverables

--- Please see attached zip file ---

Command-line syntax:
perl foredownloader
-- Downloads the data right now.

perl foredownloader 10-23-2010 10-26-2010
-- Downloads the date range 10-23-2010 to 10-26-2010.

perl foredownloader 10-23-2010 10-26-2010 always
-- Downloads without asking for the confirmation response.

The program downloads the list from:
-- Search by 'Document Type' (left-hand side) [login to view URL]
-- Use types "HL,L,LISP,DEF,B,NTS,TSD,TXDUE,DETS,BETS" / Foreclosure Documents [login to view URL]
-- Click "Create Export File"

The exported file will provide the following data points:

push(@headers, 'DocumentID');
push(@headers, 'CrossPartyName');
push(@headers, 'Consideration');
push(@headers, 'Comments');
push(@headers, 'DocTypeKey');
push(@headers, 'FullName');
push(@headers, 'RecordDate');
push(@headers, 'ClerkFileNumber');
push(@headers, 'DOR1ParcelID');
push(@headers, 'Comments2');

-- If there is an error that the date range is too large (i.e. we have tried to download over 10,000 records), the program should automatically divide the date range until the download is successful. It should download all such sections and reassemble them.

After downloading the list from [login to view URL], it should show the total number of records about to be downloaded and request confirmation to start downloading.

The program should take the Parcel ID information from the exported Excel file and download the information from the Assessor website. The program should also add data fields for any URL referenced from the Assessor website, and a data field for the Assessor website itself.
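The date-range splitting described above could be sketched roughly as follows. This is a minimal illustration, not the existing foredownloader code: the sub names and the `$fetch` callback are hypothetical, and it assumes the callback returns an array ref of rows on success and undef when the site reports that the range is too large. Only core modules are used, so nothing extra needs to be installed.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert MM-DD-YYYY to a whole-day count since the epoch (UTC, so no DST issues).
sub date_to_days {
    my ($date) = @_;
    my ($m, $d, $y) = $date =~ /^(\d{2})-(\d{2})-(\d{4})$/
        or die "Bad date '$date', expected MM-DD-YYYY\n";
    return int(timegm(0, 0, 0, $d, $m - 1, $y) / 86400);
}

# Convert a day count back to MM-DD-YYYY.
sub days_to_date {
    my ($days) = @_;
    my (undef, undef, undef, $d, $m, $y) = gmtime($days * 86400);
    return sprintf('%02d-%02d-%04d', $m + 1, $d, $y + 1900);
}

# Try the whole range first; if it is too large, halve it and recurse.
# $fetch is a hypothetical callback: it returns an array ref of rows,
# or undef when the range exceeds the site's record limit.
sub download_range {
    my ($fetch, $start, $end) = @_;
    my $rows = $fetch->($start, $end);
    return @$rows if defined $rows;

    my ($s, $e) = (date_to_days($start), date_to_days($end));
    die "Single day $start still exceeds the limit\n" if $s >= $e;

    # Split into [start .. mid] and [mid+1 .. end], then reassemble in order.
    my $mid = $s + int(($e - $s) / 2);
    return (
        download_range($fetch, $start, days_to_date($mid)),
        download_range($fetch, days_to_date($mid + 1), $end),
    );
}
```

Because the two halves are returned in order, the reassembled list keeps the same ordering a single successful download would have had. A range that is still too large after shrinking to one day is reported as an error rather than retried forever.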
Example URLs
[login to view URL]
[login to view URL]
[login to view URL]:05188
[login to view URL]
[login to view URL]

and the following data points:

# GENERAL INFORMATION
push(@headers, 'Assessor URL');
push(@headers, 'Parcel NO.');
push(@headers, 'OWNER AND MAILING ADDRESS');
push(@headers, 'LOCATION ADDRESS CITY/UNINCORPORATED TOWN');
push(@headers, 'ASSESSOR DESCRIPTION');
push(@headers, 'ASSESSOR DESCRIPTION URL');
push(@headers, 'RECORDED DOCUMENT NO.');
push(@headers, 'RECORDED DOCUMENT NO. URL');
push(@headers, 'RECORDED DATE');
push(@headers, 'VESTING');

# ASSESSMENT INFORMATION AND SUPPLEMENTAL VALUE
push(@headers, 'TAX DISTRICT');
push(@headers, 'APPRAISAL YEAR');
push(@headers, 'FISCAL YEAR');
push(@headers, 'SUPPLEMENTAL IMPROVEMENT VALUE');
push(@headers, 'SUPPLEMENTAL IMPROVEMENT ACCOUNT NUMBER');

# REAL PROPERTY ASSESSED VALUE 1
push(@headers, 'FISCAL YEAR 1');
push(@headers, 'LAND 1');
push(@headers, 'IMPROVEMENTS 1');
push(@headers, 'PERSONAL PROPERTY 1');
push(@headers, 'EXEMPT 1');
push(@headers, 'GROSS ASSESSED (SUBTOTAL) 1');
push(@headers, 'TAXABLE LAND+IMP (SUBTOTAL) 1');
push(@headers, 'COMMON ELEMENT ALLOCATION ASSD 1');
push(@headers, 'TOTAL ASSESSED VALUE 1');
push(@headers, 'TOTAL TAXABLE VALUE 1');

# REAL PROPERTY ASSESSED VALUE 2
push(@headers, 'FISCAL YEAR 2');
push(@headers, 'LAND 2');
push(@headers, 'IMPROVEMENTS 2');
push(@headers, 'PERSONAL PROPERTY 2');
push(@headers, 'EXEMPT 2');
push(@headers, 'GROSS ASSESSED (SUBTOTAL) 2');
push(@headers, 'TAXABLE LAND+IMP (SUBTOTAL) 2');
push(@headers, 'COMMON ELEMENT ALLOCATION ASSD 2');
push(@headers, 'TOTAL ASSESSED VALUE 2');
push(@headers, 'TOTAL TAXABLE VALUE 2');
push(@headers, 'Teasurer Property Taxes URL');

# ESTIMATED LOT SIZE AND APPRAISAL INFORMATION
push(@headers, 'ESTIMATED SIZE');
push(@headers, 'ORIGINAL CONST. YEAR');
push(@headers, 'LAST SALE PRICE MONTH/YEAR');
push(@headers, 'LAND USE');
push(@headers, 'DWELLING UNITS');

# PRIMARY RESIDENTIAL STRUCTURE
push(@headers, 'TOTAL LIVING SQ. FT.');
push(@headers, '1ST FLOOR SQ. FT.');
push(@headers, '2ND FLOOR SQ. FT.');
push(@headers, 'BASEMENT SQ. FT.');
push(@headers, 'GARAGE SQ. FT.');
push(@headers, 'CARPORT SQ. FT.');
push(@headers, 'STORIES');
push(@headers, 'BEDROOMS');
push(@headers, 'BATHROOMS');
push(@headers, 'FIREPLACE');
push(@headers, 'ADDN/CONV');
push(@headers, 'POOL');
push(@headers, 'SPA');
push(@headers, 'TYPE OF CONSTRUCTION');
push(@headers, 'ROOF TYPE');

# ASSESSOR MAP VIEWING GUIDELINES
push(@headers, 'MAP');
push(@headers, 'MAP URL');

The program should then download all the data points from the Treasurer Property Taxes URL. Example: [login to view URL]

## The data points from the Treasurer website are not listed here, but download them all.

The program should then use a free geocoding service which allows at least 10,000 records to be geocoded per day. Any records that could not be geocoded during that day should be able to be geocoded later by running the command

perl foredownloader fixgeo

The geocoding should provide at least the following data points:

# From geocode
push(@headers, 'Geo_Number');   # Street number
push(@headers, 'Geo_Street');   # Street name
push(@headers, 'Geo_Type');     # Street type (Circle, Ave, Blvd, St., etc.)
push(@headers, 'Geo_City');
push(@headers, 'Geo_State');
push(@headers, 'Geo_Zip');
push(@headers, 'Geo_Suffix');
push(@headers, 'Geo_Prefix');   # Such as North, S., E.
push(@headers, 'Geo_Lat');
push(@headers, 'Geo_Long');

It should include a data field for the Google Maps URL of each address. Example: [login to view URL],+NV+89014&sll=36.114646,-115.172816&sspn=0.745514,1.244202&ie=UTF8&hq=&hnear=635+Pepper+Tree+Cir,+Henderson,+Clark,+Nevada+89014&z=16

The program needs to download the following from [login to view URL] for each address:
# From Eppraisal
push(@headers, 'Eppraisal');
push(@headers, 'Zillow_apprasial');

It should also download the data for Recently Sold Homes (all 5 of them): Address, Sales Price, Sale Date, Bed/Bath, Sq. Ft.

#### DATABASE WORK ####

All the data should go into MySQL, with a timestamp for the query which imported it. There should be a [login to view URL] file to hold the configuration values. Records need to be imported multiple times, each time with a different importID and timestamp. For example, when a record for Parcel=191-24-111-040 is imported on Oct 10th, it should not overwrite the record imported earlier on Oct 2nd.

There needs to be a [login to view URL] file which will create all the needed database tables.

There needs to be a [login to view URL] file which will prompt the user to confirm they really wish to delete the database tables, and then delete them.

### Bonus ###

Up to a $20 USD bonus will be given for useful reporting, such as finding which multi-family homes with 4 units were built between 1998 and 2010 with an Eppraisal between 65,000 and 200,000. The better looking the reports, the better. Using a background image (the same background for each page) and then creating a multipage PDF file, one page per address, is perfect.

### Additional BONUS ###

Up to a $20 USD bonus will be given if a realistic suggested rental price can be generated/data mined for each address. I need the realistic rent I could charge if I were to buy a property. It should take into account things such as property type (house, condo, apartment), number of bedrooms and bathrooms, square feet, etc. The more realistic the suggested rental price is, the closer to $20 USD you will get.
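One way to meet the "do not overwrite earlier imports" requirement is a composite key of import ID and parcel ID, so each import run adds new rows instead of replacing old ones. The sketch below only builds the SQL text. The table name `records`, the columns `import_id`, `imported_at` and `parcel_id`, and the sub names are all assumptions for illustration, not part of the attached code. The generated statements could be run through DBI with bound parameters.

```perl
use strict;
use warnings;

# Turn a header label such as 'Parcel NO.' into a safe column name ('parcel_no').
sub column_name {
    my ($label) = @_;
    (my $name = lc $label) =~ s/[^a-z0-9]+/_/g;   # collapse punctuation/spaces
    $name =~ s/^_+|_+$//g;                        # trim stray underscores
    $name = "c_$name" if $name =~ /^\d/;          # e.g. '1ST FLOOR SQ. FT.'
    return $name;
}

# CREATE TABLE text: the (import_id, parcel_id) key lets the same parcel
# appear once per import, so an Oct 10th import never replaces an Oct 2nd row.
sub create_table_sql {
    my (@headers) = @_;
    my @cols = map { sprintf '  `%s` TEXT', column_name($_) } @headers;
    return join "\n",
        'CREATE TABLE IF NOT EXISTS records (',
        '  import_id INT UNSIGNED NOT NULL,',
        '  imported_at DATETIME NOT NULL,',
        '  parcel_id VARCHAR(32) NOT NULL,',
        (map { "$_," } @cols),
        '  PRIMARY KEY (import_id, parcel_id)',
        ')';
}

# Parameterised INSERT: one '?' placeholder per column, in the same order.
sub insert_sql {
    my (@headers) = @_;
    my @names = ('import_id', 'imported_at', 'parcel_id',
                 map { column_name($_) } @headers);
    return sprintf 'INSERT INTO records (%s) VALUES (%s)',
        join(', ', map { "`$_`" } @names),
        join(', ', ('?') x @names);
}
```

Storing every scraped field as TEXT is a simplification that keeps the import from failing on unexpected values. Numeric columns could be tightened later if the reports need range queries, such as the Eppraisal filter mentioned in the bonus.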
Project ID: 3821009

About the project

Remote project
Active 14 years ago


About the client

United States
5.0
1
Member since Oct 20, 2010

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)